Comments (9)
@risharde The reason why deletions increase the size of the db is that generally, they write more blocks, and then unused blocks get cleaned up later. But that cleanup process is not very aggressive and doesn't really try to minimize file size. Also, a small number of tombstone entries are left behind to support incremental replication if a node disappears and returns.
A simple solution would be to use FALLOC_FL_PUNCH_HOLE to deallocate the unused extents of the file (2MB chunks, i.i.r.c.), so that the file uses less actual disk space.
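For anyone unfamiliar with hole punching, here is a minimal sketch of the idea on Linux. It is not taken from the RethinkDB source: the function names `punch_unused_extent` and `allocated_bytes` and the `EXTENT_SIZE` constant are hypothetical, and the 2MB figure is only the size recalled above, not a confirmed value. The `fallocate` call and its flags are the standard Linux ones, and the filesystem has to support them.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <stdint.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Assumed extent size (2MB as recalled in the comment above; not verified). */
#define EXTENT_SIZE ((off_t)2 * 1024 * 1024)

/* Release the disk blocks backing one unused extent.
 * FALLOC_FL_KEEP_SIZE leaves the apparent file length unchanged, so
 * offsets of the remaining extents stay valid.
 * Returns 0 on success or a negative errno value (for example
 * -EOPNOTSUPP when the filesystem cannot punch holes). */
int punch_unused_extent(int fd, uint64_t extent_index)
{
    off_t offset = (off_t)extent_index * EXTENT_SIZE;

    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  offset, EXTENT_SIZE) != 0) {
        return -errno;
    }
    return 0;
}

/* Bytes actually allocated on disk for the file (st_blocks is counted
 * in 512-byte units), or -1 on error. Comparing this before and after
 * punching shows the space that was given back. */
int64_t allocated_bytes(int fd)
{
    struct stat st;

    if (fstat(fd, &st) != 0) {
        return -1;
    }
    return (int64_t)st.st_blocks * 512;
}
```

A caller would run `punch_unused_extent(fd, i)` for each extent the garbage collector has marked free. Reads from a punched range return zeros, so this is only safe for extents that nothing references any more.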
That might still suffer from space usage by LBA entries: small log entries that record block ids and their file offsets, which get garbage collected separately and more slowly. The garbage collection parameters could be tuned, or made configurable so that garbage collection can be more aggressive. They could also be made configurable table-by-table.
Another solution would be to finish implementing the RocksDB branch, https://github.com/srh/rethinkdb/tree/sam/24x/rocks , which only lacks support for some kind of incremental replication, should a shard's replica disconnect and reconnect.
from rethinkdb.
@srh not sure if this is the best place to put this, but I think someone needs to solve the issue of tables using more storage space even though items are deleted. I would have proposed a fix, but I don't code in C, nor do I understand the internal workings of RethinkDB.
Initially I thought that recreating a table was enough, but I didn't consider having to recreate table-specific indexes etc. Copying a table and recreating its indexes seems more like a hack workaround than a resolution of the actual issue.
Your thoughts and insights on how hard it would be to fix the storage issue would be greatly appreciated.
I have uploaded early Kinetic and Windows builds, because those were not available for previous releases.
Heads up: the RPM at https://download.rethinkdb.com/repository/centos/7/x86_64/rethinkdb-2.4.3.x86_64.centos7.rpm is currently unsigned:
# rpm -q --info -p ./rethinkdb-2.4.3.x86_64.centos7.rpm
Name : rethinkdb
Version : 2.4.3
Release : 30658
Architecture: x86_64
Install Date: (not installed)
Group : Database
Size : 29149286
License : ASL 2.0
Signature : (none)
Source RPM : rethinkdb-2.4.3-30658.src.rpm
Build Date : Mon 16 Jan 2023 10:22:49 PM UTC
Build Host : 309af1f74b7a
Relocations : /
Packager : RethinkDB <[email protected]>
Vendor : RethinkDB
URL : https://www.rethinkdb.com/
Summary : RethinkDB is built to store JSON documents, and scale to multiple servers with very little effort. It has a pleasant query language that supports really useful queries like table joins and group by.
Description :
RethinkDB is built to store JSON documents, and scale to multiple servers with very little effort. It has a pleasant query language that supports really useful queries like table joins and group by.
I have my /etc/yum.repos.d/rethinkdb.repo set up as per https://rethinkdb.com/docs/install/centos/.
I was surprised to see 2.4.3 show up in the repo while there's no v2.4.3 tag at https://github.com/rethinkdb/rethinkdb/tags and https://rethinkdb.com/ still lists 2.4.2 as the latest release.
Surprised as well, although it's really good news.
Gah, I ... will get those RPMs signed when I get home. @gabor-boros asked me about this yesterday, too.
@jude I've replaced the x86_64 RPMs with signed RPMs. (Other architectures didn't have this problem because Gabor did those.)
I've also pushed the v2.4.3 tag to the main repo.
The hook to update the website off the https://github.com/rethinkdb/www or docs repos isn't working; also, running rake locally ran into some issues. I don't know when the website will get updated.
@gabor-boros tells me he will handle Docker and Brew updates.
@srh Let me take a look at the hook.
(Also, thanks for re-uploading)
@gabor-boros I also ran into issues trying to run rake locally. I was too lazy to investigate.