Comments (6)
Another example, which might be more common: because of task cancellations, CachingInputStream might not update BookKeeper even though it has downloaded the data. This could be fixed by ensuring that updateCacheAndStats is scheduled in CachingInputStream regardless of whether interrupts come in, but the proposed approach of moving the file weight update work to the getCacheStatus API would solve this case too.
from rubix.
Would one potential solution be to reject further loads into the cache above a given threshold (90% of available disk space)?
from rubix.
Would one potential solution be to reject further loads into the cache above a given threshold (90% of available disk space)?
Even if we add that new threshold, the same delayed accounting can still occur and cause this same problem.
from rubix.
It seems like to be completely accurate you'd need to evict at the same time you add to the cache.
from rubix.
Right @JamesRTaylor. The proposal tries to do that by evicting before data is added, because "evict at the same time you add to the cache" cannot be guaranteed given that additions to the cache and metadata updates happen asynchronously.
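As a rough illustration of "evict before add", here is a minimal sketch. It is not Rubix code: the class `EvictBeforeAddSketch`, the `reserve` method, and the oldest-first eviction order are all assumptions made for the example. The point it shows is that space is freed and accounted for before the new data is admitted, so the recorded usage never exceeds the capacity.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical toy cache accounting; names and eviction policy are assumptions.
final class EvictBeforeAddSketch {
    private final long capacityBytes;
    private long usedBytes = 0;
    // Insertion-ordered map, so iteration visits the oldest entries first.
    private final LinkedHashMap<String, Long> entries = new LinkedHashMap<>();

    EvictBeforeAddSketch(long capacityBytes) {
        this.capacityBytes = capacityBytes;
    }

    // Evicts oldest entries until the new data fits, then records it.
    // Returns false if the data can never fit in the cache.
    synchronized boolean reserve(String file, long sizeBytes) {
        if (sizeBytes > capacityBytes) {
            return false;
        }
        Iterator<Map.Entry<String, Long>> it = entries.entrySet().iterator();
        while (usedBytes + sizeBytes > capacityBytes && it.hasNext()) {
            usedBytes -= it.next().getValue();
            it.remove(); // eviction happens before the addition below
        }
        entries.merge(file, sizeBytes, Long::sum);
        usedBytes += sizeBytes;
        return true;
    }

    synchronized long usedBytes() {
        return usedBytes;
    }

    synchronized boolean contains(String file) {
        return entries.containsKey(file);
    }
}
```

A caller would invoke `reserve` before starting a download and skip caching when it returns false.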
from rubix.
There is another option that gets us improved (though not exact) accounting at a very low dev cost:
All data download in async warmup happens in FileDownloadRequestChain. In that process:
- Limit the size of each ReadRequest in FileDownloadRequestChain to 100MB. The slight penalty of opening additional connections for the new positional reads (due to the additional ReadRequests now formed) would be small enough to ignore.
- Update the metadata as each ReadRequest finishes downloading, instead of the existing behavior of downloading all the data in the chain and then updating the metadata.
What this gets us, without any intrusive changes, is a guarantee that the cache will never grow more than ~1GB beyond the configured threshold (10 download threads * 100MB of delayed accounting per thread).
This approach works only for async warmup, because in sync warmup the metadata update goes over Thrift, which can be too costly given the number of requests we would get. Since async warmup is the default and we want to deprecate sync warmup over time, this should be an acceptable solution.
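The chunking idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: `ChunkedDownloadSketch`, `Range`, `MetadataUpdater`, and the method names are hypothetical and are not the actual FileDownloadRequestChain API, and the remote read itself is elided. The sketch splits a byte range into pieces of at most 100MB, reports each piece to the accounting callback as soon as it completes, and computes the worst-case unaccounted bytes from the thread count.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names come from the Rubix code base.
final class ChunkedDownloadSketch {
    static final long MAX_CHUNK_BYTES = 100L * 1024 * 1024; // 100MB cap per read request

    // Half-open byte range [start, end).
    static final class Range {
        final long start;
        final long end;

        Range(long start, long end) {
            this.start = start;
            this.end = end;
        }

        long length() {
            return end - start;
        }
    }

    // Callback standing in for the metadata update (assumed interface).
    interface MetadataUpdater {
        void recordCached(Range range);
    }

    // Splits [start, end) into consecutive ranges of at most maxChunkBytes.
    static List<Range> split(long start, long end, long maxChunkBytes) {
        List<Range> chunks = new ArrayList<>();
        for (long pos = start; pos < end; pos += maxChunkBytes) {
            chunks.add(new Range(pos, Math.min(pos + maxChunkBytes, end)));
        }
        return chunks;
    }

    // Processes chunks one by one, updating accounting after each chunk
    // rather than once at the end of the whole chain.
    static long downloadChain(List<Range> chunks, MetadataUpdater updater) {
        long total = 0;
        for (Range chunk : chunks) {
            // The remote read and local write for this chunk would happen here.
            updater.recordCached(chunk);
            total += chunk.length();
        }
        return total;
    }

    // Upper bound on bytes downloaded but not yet accounted for.
    static long worstCaseUnaccountedBytes(int downloadThreads, long maxChunkBytes) {
        return downloadThreads * maxChunkBytes;
    }
}
```

With 10 download threads and a 100MB cap, `worstCaseUnaccountedBytes` gives 1,048,576,000 bytes, which matches the ~1GB bound mentioned above.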
from rubix.