Comments (4)
Of your 50-100 lookups, how many have the same key?
How long does it currently take to load the lookups?
You could convert your lookups to RocksDB-based lookups: build a new snapshot once a day and publish incremental updates via Kafka. That means building a new RocksDB instance once a day, zipping it up, and publishing it to HDFS. It also means you would need a daemon process that does change data capture and publishes the updated or new rows to Kafka.
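To make the snapshot-plus-updates idea concrete, here is a minimal sketch in plain Python. It is not Maha or RocksDB code: a dict stands in for the daily RocksDB snapshot, a list stands in for the Kafka topic, and the `ChangeEvent` type, its `ts` field, and the `apply_changes` helper are hypothetical names made up for illustration. It assumes each row carries a last-updated timestamp so that changes already baked into the snapshot can be skipped.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class ChangeEvent:
    """One updated or new row captured by the (hypothetical) CDC daemon."""
    key: str
    value: str
    ts: int  # assumed last-updated timestamp of the row


def apply_changes(snapshot: Dict[str, str],
                  snapshot_ts: int,
                  events: Iterable[ChangeEvent]) -> Dict[str, str]:
    """Overlay change events on top of a daily snapshot.

    Events at or before the snapshot timestamp are skipped because the
    snapshot already contains them.
    """
    store = dict(snapshot)  # copy, so the snapshot itself stays untouched
    for ev in events:
        if ev.ts <= snapshot_ts:
            continue
        store[ev.key] = ev.value  # upsert: covers updated and new rows
    return store


# Daily snapshot (stand-in for the RocksDB instance pulled from HDFS).
snapshot = {"a": "aa", "b": "bb"}

# Change stream (stand-in for the Kafka topic).
events = [
    ChangeEvent("a", "stale", ts=90),   # older than the snapshot, ignored
    ChangeEvent("b", "bb2", ts=110),    # updated row
    ChangeEvent("c", "cc", ts=120),     # new row
]

lookup = apply_changes(snapshot, snapshot_ts=100, events=events)
```

With this split, a node that restarts only has to load the latest snapshot and replay the changes published since then, rather than reloading the full table from the source database.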
If many of your 50-100 lookups share the same key, you could replace them with our JDBC lookup, which loads multiple values per key in a single lookup and so avoids duplicating the key space. For example, with lookups-cached-global each lookup maps one key to one value: Map(a -> aa, b -> bb) and Map(a -> 123, b -> 456). Our JDBC lookup needs just one lookup: Map(a -> (aa, 123), b -> (bb, 456)). At query time, you just specify which column you want in the extraction function.
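A rough sketch of that consolidation, in plain Python rather than actual Maha or Druid code: several single-value maps that share a key are merged into one map of key to row, and a column name picks the value out at read time. The column names `name` and `code` and the helpers `merge_lookups` and `extract` are hypothetical, chosen only to mirror the example above.

```python
from typing import Dict, List, Optional, Tuple


def merge_lookups(lookups: Dict[str, Dict[str, object]]
                  ) -> Tuple[List[str], Dict[str, tuple]]:
    """Merge per-column lookups that share a key into one multi-value lookup.

    `lookups` maps a column name to its key -> value map. The result keeps
    each key once and stores one tuple of values per key, in column order.
    """
    columns = list(lookups)
    keys = set().union(*(m.keys() for m in lookups.values()))
    merged = {k: tuple(lookups[c].get(k) for c in columns) for k in sorted(keys)}
    return columns, merged


def extract(columns: List[str], merged: Dict[str, tuple],
            key: str, column: str) -> Optional[object]:
    """Pick one column's value for a key, like choosing a column at query time."""
    row = merged.get(key)
    return None if row is None else row[columns.index(column)]


# Two one-key-to-one-value lookups over the same key space.
columns, merged = merge_lookups({
    "name": {"a": "aa", "b": "bb"},
    "code": {"a": 123, "b": 456},
})

value = extract(columns, merged, "b", "code")
```

The key set is stored once however many value columns there are, which is where the saving comes from when many lookups share a key.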
from maha.
We haven't properly monitored the loading time. For one large lookup (around 10 million entries), it takes around 45 minutes.
@vsharathchandra It might be easier to talk about this on Gitter or Hangouts.
Okay, sure, I will contact you on Gitter.
Related Issues (20)
- Reduce JDBC Lookups for Oracle HOT 10
- Support for a doubleSum aggregator in Druid HOT 2
- Integration issue with finatra at runtime HOT 1
- Allows option of storing best candidate powerset in the rocksdb
- Create QueryPipelineFactory Context Config in maha service at registry level
- Plans to support newer druid versions for maha-lookups? HOT 7
- Druid filter on facts not included in the list of request columns HOT 3
- Maha Validate query request before actually query HOT 2
- [Feature] Dim Only Query: Schema required filter validation in request
- Return all rows from Druid GroupBy query HOT 3
- Support for really large queries HOT 8
- Oracle/Postgres queries with forceFactDriven set true can ignore Selected Fact Cols
- [WIP] Dynamically generated versioned schema [old]
- [WIP] Dynamically generated versioned schema
- Missing DefaultDimensionSpec in Druid query when querying with multiple Lookup dimensions. HOT 10
- Combining data from datasources based on date range HOT 2
- MacOS Big Sur breaks PostgreSQL installations causing maha failure
- Rollup Best Candidate Calc Inconsistent with Equal Cost
- newRollUp and createSubset need to apply costMultiplier updates HOT 2
- Ability to set different Druid URLs per table to support Broker Tiering HOT 3