Comments (6)
@jasonbosco Thanks for the quick response. I've updated to Typesense 26.0 and included the --db-compaction-interval=21600 in the arguments. Disk usage has held around 8-9GB since then but that seems quite high for the number of documents. It could just be leftover cruft from 0.25.2. I've wiped the persistent storage, restarted the cluster and started a fresh index and will let it stew over the weekend to see what happens.
from typesense.
We've improved disk usage in v26.0 for this specific write pattern of creating new timestamped collections and deleting the old ones, as the scraper does.
Could you try upgrading to it and then setting db-compaction-interval = 21600 as a Typesense server parameter, to see if that helps?
from typesense.
After rebuilding all the data under 26.0 the disk usage under /usr/share/typesense/data is much more steady and consistent across cluster nodes. We are noticing that one node seems to be keeping everything in ./db while the other 2 appear to be moving data over to ./state/snapshot despite all containers being started with identical parameters. Any ideas why that may be?
typesense-0:
typesense-0:/$ cd /usr/share/typesense/data
typesense-0:/usr/share/typesense/data$ df -h .
Filesystem Size Used Avail Use% Mounted on
/dev/nvme2n1 9.8G 1.5G 8.3G 16% /usr/share/typesense/data
typesense-0:/usr/share/typesense/data$ du -h --max-depth=2
4.0K ./models
236K ./state/log
1.5G ./state/snapshot
8.0K ./state/meta
1.5G ./state
16K ./lost+found
3.9M ./db/archive
30M ./db
4.0K ./meta/archive
5.3M ./meta
1.5G .
typesense-1:
typesense-1:/$ cd /usr/share/typesense/data
typesense-1:/usr/share/typesense/data$ df -h .
Filesystem Size Used Avail Use% Mounted on
/dev/nvme4n1 9.8G 1.6G 8.2G 16% /usr/share/typesense/data
typesense-1:/usr/share/typesense/data$ du -h --max-depth=2
236K ./state/log
8.0K ./state/meta
1.6G ./state/snapshot
1.6G ./state
4.0K ./models
16K ./lost+found
3.9M ./db/archive
30M ./db
4.0K ./meta/archive
5.3M ./meta
1.6G .
typesense-2:
typesense-2:/$ cd /usr/share/typesense/data
typesense-2:/usr/share/typesense/data$ df -h .
Filesystem Size Used Avail Use% Mounted on
/dev/nvme3n1 9.8G 1.5G 8.3G 16% /usr/share/typesense/data
typesense-2:/usr/share/typesense/data$ du -h --max-depth=2
16K ./lost+found
3.9M ./db/archive
1.5G ./db
4.0K ./models
4.0K ./meta/archive
5.3M ./meta
236K ./state/log
680K ./state/snapshot
8.0K ./state/meta
928K ./state
1.5G .
from typesense.
Can you tell me what type of disk you are using for the data directory?
When a write arrives, it is written to the raft log and also to the store in the db directory. Every hour, a snapshot is taken in which the contents of the db directory are hard linked into the state/snapshot directory (a hard link, unlike a soft link, is a second directory entry pointing at the same inode, so the data is not duplicated). When Typesense is restarted, we replace the db directory with the contents of the db from state/snapshot. You can confirm this behavior on a Typesense server on your localhost.
So ideally, the db directory should hold more or less the same data as state/snapshot, unless a lot of writes have happened before the next snapshot occurs.
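To make the hard-linking point concrete, here is a minimal Python sketch. It is not Typesense code: the file name 000001.sst and the helper apparent_and_unique_bytes are made up for illustration, and it assumes a filesystem that supports hard links. It links one file from a db directory into state/snapshot, then totals sizes two ways: once per directory entry and once per inode.

```python
import os
import tempfile


def apparent_and_unique_bytes(root):
    """Return (apparent, unique) byte totals for regular files under root.

    apparent counts every directory entry; unique counts each inode once,
    so hard-linked copies are not double-counted.
    """
    apparent = 0
    unique = 0
    seen = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            apparent += st.st_size
            key = (st.st_dev, st.st_ino)
            if key not in seen:
                seen.add(key)
                unique += st.st_size
    return apparent, unique


def demo():
    """Hard link one file from db/ into state/snapshot/ and report the result."""
    with tempfile.TemporaryDirectory() as root:
        db_dir = os.path.join(root, "db")
        snap_dir = os.path.join(root, "state", "snapshot")
        os.makedirs(db_dir)
        os.makedirs(snap_dir)

        # Hypothetical store file; the real file names are not given in the thread.
        src = os.path.join(db_dir, "000001.sst")
        with open(src, "wb") as f:
            f.write(b"x" * 4096)

        # A hard link adds a second name for the same inode; no data is copied.
        dst = os.path.join(snap_dir, "000001.sst")
        os.link(src, dst)

        same_inode = os.stat(src).st_ino == os.stat(dst).st_ino
        link_count = os.stat(src).st_nlink
        apparent, unique = apparent_and_unique_bytes(root)
        return same_inode, link_count, apparent, unique


if __name__ == "__main__":
    print(demo())
```

Tools such as GNU du likewise count each hard-linked inode only once per run and attribute it to whichever directory they reach first, so the per-directory split between ./db and ./state/snapshot can differ between nodes even when the data on disk is the same. Whether that is what is happening in the listings above is a guess, not something confirmed in this thread.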
from typesense.