Comments (20)
Hi Jaromir, you are quite right, it is a problem with the index. At the moment there is no efficient way to store a large volume of distinct ids and search on them. This problem has been hanging over my head for quite some time and I'll definitely add an efficient index for cases like this one.
A symbol is a kind of enum type. It is designed for time series data where you have a relatively low number of items (stock symbols, sensors or other subjects) and a large volume of time series data associated with each of them. Symbols do not work well with counts over 100K.
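To make the distinction concrete, here is a minimal sketch of dictionary encoding, which is assumed here to be the idea behind a symbol column: each distinct string is stored once and rows refer to it by an int code. The SymbolTable class is hypothetical and is not NFSdb's implementation. With a handful of sensors the table stays tiny; with unique ids it grows as large as the data itself, which is why symbols are a poor fit for this case.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical dictionary encoder illustrating the idea behind a symbol column:
// each distinct string is stored once and rows reference it by an int code.
public class SymbolTable {
    private final Map<String, Integer> codes = new HashMap<>();
    private final List<String> values = new ArrayList<>();

    // Returns the existing code for the value, or assigns the next free one.
    public int encode(String value) {
        Integer code = codes.get(value);
        if (code == null) {
            code = values.size();
            codes.put(value, code);
            values.add(value);
        }
        return code;
    }

    public String decode(int code) {
        return values.get(code);
    }

    // Number of distinct values; this is what grows with unique ids.
    public int size() {
        return values.size();
    }

    public static void main(String[] args) {
        SymbolTable sensors = new SymbolTable();
        // Many rows, few distinct values: the table stays tiny.
        for (int i = 0; i < 1_000_000; i++) {
            sensors.encode("sensor-" + (i % 10));
        }
        System.out.println(sensors.size()); // 10

        SymbolTable ids = new SymbolTable();
        // Every row distinct: the table grows as large as the data itself.
        for (int i = 0; i < 1_000_000; i++) {
            ids.encode(Integer.toString(i));
        }
        System.out.println(ids.size()); // 1000000
    }
}
```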
One workaround that you can try is this:
$str("uniqueId").size(15).index().buckets(5000)
This creates the index you want: a hash table.
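A rough in-memory model of what a bucketed hash index does, assuming buckets(5000) sets the number of hash buckets: each value hashes to one bucket that holds row ids, and a lookup scans only that bucket, comparing stored values because distinct ids can share a bucket. BucketedStringIndex is a hypothetical class; NFSdb's on-disk layout will differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of a bucketed hash index on a string column.
// Rows are hashed into a fixed number of buckets; each bucket keeps row ids.
public class BucketedStringIndex {
    private final List<String> column = new ArrayList<>(); // row id -> value
    private final List<List<Integer>> buckets;

    public BucketedStringIndex(int bucketCount) {
        buckets = new ArrayList<>(bucketCount);
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    private int bucketOf(String value) {
        // floorMod keeps the result non-negative for negative hash codes.
        return Math.floorMod(value.hashCode(), buckets.size());
    }

    // Appends a value and records its row id in the matching bucket.
    public int append(String value) {
        int rowId = column.size();
        column.add(value);
        buckets.get(bucketOf(value)).add(rowId);
        return rowId;
    }

    // Returns the row ids holding exactly this value. Distinct values can share
    // a bucket, so each candidate is compared against the stored value.
    public List<Integer> find(String value) {
        List<Integer> result = new ArrayList<>();
        for (int rowId : buckets.get(bucketOf(value))) {
            if (column.get(rowId).equals(value)) {
                result.add(rowId);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        BucketedStringIndex index = new BucketedStringIndex(5000);
        for (int i = 0; i < 100_000; i++) {
            index.append("id-" + i);
        }
        System.out.println(index.find("id-1001")); // [1001]
    }
}
```

In this model, 100K unique ids over 5000 buckets means a lookup scans about 20 candidates on average; more buckets shorten the scan at the cost of more bucket storage.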
from questdb.
Hi Vlad, thanks for your prompt response. I'll try the workaround and let you know.
I've tried the workaround and it is considerably faster, but the performance is still much worse than a standard index with few distinct values. Is there an ETA for adding an efficient solution for cases like this?
Thanks a lot.
Jaromir
Could you give me an example of your object and its config setup? It may be possible to tweak performance as is.
I cannot give an ETA, but it's very high on the priority list. I'll keep this issue updated with progress.
OK, thanks.
Here is my object:
public class Order {
    private long id;
    private String clOrdId;
    private long timestamp;

    public void clear() {
        clOrdId = null;
    }

    public long getId() {
        return id;
    }

    public void setId(long id) {
        this.id = id;
    }

    public long getTimestamp() {
        return timestamp;
    }

    public void setTimestamp(long timestamp) {
        this.timestamp = timestamp;
    }

    public void setClOrdId(String clOrdId) {
        this.clOrdId = clOrdId;
    }

    public String getClOrdId() {
        return clOrdId;
    }
}
And the setup is here:
$(Order.class)
        .partitionBy(PartitionType.DAY)
        .location("orders-by-day")
        .key("id")
        .$str("clOrdId").index().size(15).buckets(5000)
        .$ts();
This code appends 1M Orders in 216ms without the index and 300ms with it. Is this similar to what you are getting?
JournalFactory factory = new JournalFactory(new JournalConfigurationBuilder() {{
    $(Order.class)
            .partitionBy(PartitionType.DAY)
            .$str("clOrdId").index().buckets(5000)
            .$ts()
    ;
}}.build(args[0]));

Order order = new Order();
long t = System.nanoTime();
JournalWriter<Order> w = factory.writer(Order.class);
// The first 1M appends (i < 0) serve as a warm-up; the timer restarts at i == 0,
// so only the last 1M appends plus the commit are measured.
for (int i = -1000000; i < 1000000; i++) {
    if (i == 0) {
        t = System.nanoTime();
    }
    order.setTimestamp(System.currentTimeMillis());
    order.setId(i);
    order.setClOrdId(Integer.toString(i));
    w.append(order);
}
w.commit();
// Elapsed time in nanoseconds
System.out.println(System.nanoTime() - t);
I was getting much worse numbers. I compared my example with yours and the difference was the timestamp: in my example the timestamp was calculated to distribute the items over 90 days, thus creating 90 partitions.
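Why the timestamp matters: with daily partitioning, each distinct calendar day gets its own partition. The sketch below assumes PartitionType.DAY groups rows by the UTC day of the timestamp field and counts partitions when 1M timestamps are spread evenly over 90 days versus all falling on one day (as typically happens when every row uses System.currentTimeMillis() in a short benchmark). DayPartitions is a hypothetical helper, not part of NFSdb.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of daily partitioning: rows are grouped by the UTC
// calendar day of their timestamp, one partition per distinct day.
public class DayPartitions {
    static final long DAY_MILLIS = 86_400_000L;

    static LocalDate partitionOf(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC).toLocalDate();
    }

    // Counts rows per partition when `rows` timestamps are spread evenly over `days`.
    static Map<LocalDate, Integer> spread(long startMillis, int rows, int days) {
        Map<LocalDate, Integer> counts = new TreeMap<>();
        long span = days * DAY_MILLIS;
        for (int i = 0; i < rows; i++) {
            long ts = startMillis + (span * i) / rows;
            counts.merge(partitionOf(ts), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        long start = LocalDate.of(2015, 1, 1).atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
        // 1M rows over 90 days: one partition per day.
        System.out.println(spread(start, 1_000_000, 90).size()); // 90
        // Same timestamp for every row: a single partition.
        System.out.println(spread(start, 1_000_000, 0).size()); // 1
    }
}
```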
One last question, Vlad: in the example above, how do I find an item with a specific clOrdId? The journal query does not seem to work:
journal.query().all().withSymValues("clOrdId", "1001")
Thanks again, Jaromir
With multiple partitions it is likely to be a sizing/memory issue rather than index performance. I'm going to try to tune that and let you know.
In the meantime, there are a couple of things you can try:
- $(Order.class).recordCountHint(10000): this hint is per partition. Since 1M records spread over 90 days come to roughly 10K per partition, a hint of 10K should be fine. Without it, the journal will map memory too aggressively.
- Depending on how much RAM you have, you may end up filling it up quite quickly. To avoid that, use bulkWriter() instead of writer(); it is a bit slower but much more memory-frugal:
JournalWriter<Order> w = factory.bulkWriter(Order.class);
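Since the hint is per partition, it should track the expected rows per partition rather than the total. A small helper for that arithmetic (hypothetical, not part of NFSdb):

```java
// Hypothetical helper: derives a per-partition record count hint from the
// expected total row count and the number of partitions it is spread over.
public class RecordCountHint {
    static int perPartition(long totalRows, int partitions) {
        // Ceiling division so the hint is never below the average partition size.
        return (int) ((totalRows + partitions - 1) / partitions);
    }

    public static void main(String[] args) {
        System.out.println(perPartition(1_000_000, 90)); // 11112
        System.out.println(perPartition(1_000_000, 1));  // 1000000
    }
}
```

The exact average for 1M rows over 90 days is about 11.1K, so the suggested 10000 is in the same range; the right value for real data depends on how evenly rows fall across days.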
In the current release there is no simple way to search by string. You can do it, but it takes quite a few lines of code (and an understanding of the index structure). In the snapshot, however, you can do something like this:
for (Order o : q.ds(
        q.top(1
                , q.forEachPartition(
                        q.source(w, false)
                        , q.forEachRow(
                                q.kvSource("clOrdId", q.hashSource("clOrdId", "10"))
                                , q.equalsConst("clOrdId", "10")
                        )
                )
        )
        , order
)) {
    System.out.println(o);
}
It is a little fiddly as well, but simpler than doing it all by hand :)
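For readers unfamiliar with the Q API, here is a plain-Java model of the same query shape, based on an assumed reading of the snippet above: forEachPartition walks partitions, the hash lookup (hashSource/kvSource) yields candidate rows, equalsConst discards hash collisions, and top(1) stops at the first match. KeySearch and Partition are hypothetical classes and do not reproduce NFSdb's index structure.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical plain-Java model of the query shape:
// for each partition -> hash lookup (candidates) -> equality filter -> first match.
public class KeySearch {
    // One partition: clOrdId values by row, plus a hash-code -> row ids index.
    static final class Partition {
        final List<String> clOrdIds = new ArrayList<>();
        final Map<Integer, List<Integer>> hashIndex = new HashMap<>();

        void append(String clOrdId) {
            int row = clOrdIds.size();
            clOrdIds.add(clOrdId);
            hashIndex.computeIfAbsent(clOrdId.hashCode(), h -> new ArrayList<>()).add(row);
        }
    }

    // Returns "partition:row" of the first match, or empty if there is none.
    static Optional<String> findFirst(List<Partition> partitions, String key) {
        for (int p = 0; p < partitions.size(); p++) {
            Partition part = partitions.get(p);
            // Hash lookup yields candidates; different strings may share a hash code.
            for (int row : part.hashIndex.getOrDefault(key.hashCode(), Collections.emptyList())) {
                // Equality check (the role assumed for equalsConst) removes collisions.
                if (part.clOrdIds.get(row).equals(key)) {
                    return Optional.of(p + ":" + row); // like top(1): stop at first hit
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Partition> partitions = new ArrayList<>();
        for (int p = 0; p < 3; p++) {
            Partition part = new Partition();
            for (int i = 0; i < 100; i++) {
                part.append(Integer.toString(p * 100 + i));
            }
            partitions.add(part);
        }
        System.out.println(findFirst(partitions, "10").orElse("none"));  // 0:10
        System.out.println(findFirst(partitions, "250").orElse("none")); // 2:50
        System.out.println(findFirst(partitions, "999").orElse("none")); // none
    }
}
```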
With config like this:
JournalFactory factory = new JournalFactory(new JournalConfigurationBuilder() {{
    $(Order.class).recordCountHint(10000)
            .partitionBy(PartitionType.DAY)
            .$str("clOrdId").index().buckets(100)
            .$ts()
    ;
}}.build(args[0]));
I can append 1M in 700ms without index and ~900ms with index.
For practical applications it doesn't make sense to partition 1M rows like that: it is about 126MB on disk, in underpopulated partitions. You could grow the database to 50-100M rows before considering partitioning; otherwise the cost of managing files is higher than the cost of searching if the data were all kept in one place.
I am not sure whether this will help, but perhaps you can borrow some ideas from:
https://code.google.com/p/cqengine/
https://code.google.com/p/concurrent-trees/
Hi Vlad,
thanks for your response. You are right, it does not make sense to have such small partitions. I wasn't planning to; it was just an artificial example. By the way, when do you think you will release support for string searching?
Hi Sirinath,
thanks for the tip. I think we may have come across CQEngine already; I'll double-check.
Jaromir
Hi Jaromir, the snapshot is already available in the Maven repo if you want to play with it. Just add this to your pom:
<repositories>
<repository>
<id>sonatype-snapshots</id>
<url>https://oss.sonatype.org/content/repositories/snapshots</url>
</repository>
</repositories>
I'm planning to add a hash index on int fields, so perhaps you can use "id" as the unique key. I'll release it over this weekend. Having said that, the Q API is part of a large project and will definitely take some time to stabilise and test properly, so you can use it, but I'd very much appreciate it if you treated it as beta.
I have released 2.0.1, containing support for key search (both int and string):
https://github.com/NFSdb/nfsdb/releases/tag/2.0.1
Cool. Thanks again Vlad.
My pleasure!
Hi Vlad,
I haven't found release 2.0.1 on the Maven Central Repository. Could you please publish it? I am using 2.0.2-SNAPSHOT for the time being...
Thanks, Jaromir
Hi Jaromir, you are right, the release was incomplete.
All steps are now done and it's with the Maven replication system. Please check in an hour or two.
Vlad
Thanks!
It is out, finally 👍
Great, thanks.