Hi Vlad, I recently came across your NFSDB. It looks impressive.

Unique key support about questdb HOT 20 CLOSED

questdb commented on May 22, 2024

Unique key support

from questdb.

Comments (20)

bluestreak01 commented on May 22, 2024

Hi Jaromir, you are quite right, it is a problem with the index. At the moment there is no efficient way to store a large volume of distinct ids and search on them. This problem has been hanging over my head for quite some time and I'll definitely add an efficient index for cases like this one.

A symbol is a kind of enum type. It is designed for time series data where you have a relatively low number of items (stock symbols, sensors or other subjects) and a large volume of time series data associated with each of them. Symbols do not work well with counts over 100k.

One workaround that you can try is this:

$str("uniqueId").size(15).index().buckets(5000)

This will create an index that you want, a hashtable.
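
To make the "hashtable" idea concrete, here is a minimal sketch of a bucketed hash index over a string column, written with plain Java collections. It is not NFSdb's implementation: the class `HashIndexSketch` and its methods are hypothetical names, and the in-memory lists only stand in for whatever on-disk layout the library uses. The point it illustrates is that a lookup scans a single bucket, so with a fixed bucket count the work per lookup grows with the number of rows per bucket.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy bucketed hash index over a string column (illustrative, not NFSdb code). */
public class HashIndexSketch {
    private final List<String> column = new ArrayList<>();  // row id -> value
    private final List<List<Integer>> buckets;               // bucket -> row ids

    public HashIndexSketch(int bucketCount) {
        buckets = new ArrayList<>(bucketCount);
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    private int bucketOf(String key) {
        // floorMod keeps the result non-negative even for negative hash codes
        return Math.floorMod(key.hashCode(), buckets.size());
    }

    /** Appends a value and records its row id in the bucket chosen by the hash. */
    public int append(String value) {
        int rowId = column.size();
        column.add(value);
        buckets.get(bucketOf(value)).add(rowId);
        return rowId;
    }

    /** Returns the row id of the first exact match, or -1. Only one bucket is scanned. */
    public int find(String key) {
        for (int rowId : buckets.get(bucketOf(key))) {
            if (column.get(rowId).equals(key)) {  // different keys can share a bucket
                return rowId;
            }
        }
        return -1;
    }

    /** Average number of row ids per bucket: rows divided by bucket count. */
    public double averageBucketSize() {
        return (double) column.size() / buckets.size();
    }

    public static void main(String[] args) {
        HashIndexSketch index = new HashIndexSketch(5000);
        for (int i = 0; i < 100_000; i++) {
            index.append("ORD-" + i);
        }
        System.out.println(index.find("ORD-4242"));     // prints 4242
        System.out.println(index.find("missing"));      // prints -1
        System.out.println(index.averageBucketSize());  // prints 20.0
    }
}
```

With 100,000 distinct keys and 5,000 buckets, each lookup compares against roughly 20 candidates. At 1M distinct keys the same bucket count gives about 200 candidates per lookup, which is one plausible reason a bucket count is worth tuning for high-cardinality columns.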

jaromirs commented on May 22, 2024

Hi Vlad, thanks for your prompt response. I'll try the workaround and let you know.

jaromirs commented on May 22, 2024

I've tried the workaround and it is considerably faster, but the performance is still much worse than a standard index with few distinct values. Is there any ETA for when you will add the efficient solution for a case like this?

Thanks a lot.
Jaromir

bluestreak01 commented on May 22, 2024

Could you give me an example of your object and the config setup for it? It might be possible to tweak performance as is.

I cannot give an ETA, but that's very high on the priority list. I'll keep this issue updated with progress.

jaromirs commented on May 22, 2024

OK, thanks.

Here is my object:

public class Order {
    private long id;
    private String clOrdId;
    private long timestamp;

    public void clear(){
        clOrdId = null;
    }

    public long getId() {
        return id;
    }

    public void setId(long id) {
        this.id = id;
    }

    public long getTimestamp() {
        return timestamp;
    }

    public void setTimestamp(long timestamp) {
        this.timestamp = timestamp;
    }

    public void setClOrdId(String clOrdId) {
        this.clOrdId = clOrdId;
    }

    public String getClOrdId() {
        return clOrdId;
    }
}

And the setup is here:

$(Order.class)
        .partitionBy(PartitionType.DAY)
        .location("orders-by-day")
        .key("id")
        .$str("clOrdId").index().size(15).buckets(5000)
        .$ts()

bluestreak01 commented on May 22, 2024

This code appends 1M Orders in 216ms without index and 300ms with index. Is this similar to what you are getting?

        JournalFactory factory = new JournalFactory(new JournalConfigurationBuilder() {{
            $(Order.class)
                    .partitionBy(PartitionType.DAY)
                    .$str("clOrdId").index().buckets(5000)
                    .$ts()
            ;
        }}.build(args[0]));


        Order order = new Order();

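        long t = System.nanoTime();
        JournalWriter<Order> w = factory.writer(Order.class);
        // Appends with negative i act as warm-up; only appends from i = 0 onward are timed.
        for (int i = -1000000; i < 1000000; i++) {
            if (i == 0) {
                // restart the clock once the first 1M appends are done
                t = System.nanoTime();
            }
            order.setTimestamp(System.currentTimeMillis());
            order.setId(i);
            order.setClOrdId(Integer.toString(i));
            w.append(order);
        }
        w.commit();

        // elapsed time in nanoseconds, including the commit
        System.out.println(System.nanoTime() - t);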

jaromirs commented on May 22, 2024

I was getting much worse numbers - I compared my example with yours and the difference was the timestamp. In my example the timestamp was calculated so it distributed the items among 90 days - thus creating 90 partitions.

One last question, Vlad: in the example above, how do I find an item with a specific clOrdId? It seems the journal query does not work:

journal.query().all().withSymValues("clOrdId", "1001")

Thanks again, Jaromir

bluestreak01 commented on May 22, 2024

With multiple partitions it is likely to be a sizing/memory issue rather than index performance. I'm going to try to tune that and let you know.

In the meantime there are a couple of things you can try:

$(Order.class).recordCountHint(10000): this hint is per partition. Since 1M rows are spread over 90 days, 10K per partition should be fine; otherwise the journal will be mapping memory too aggressively.
Depending on how much RAM you have, you may end up filling it quite quickly. To avoid that, use bulkWriter() instead of writer(); it is a bit slower but much more memory frugal:

JournalWriter<Order> w = factory.bulkWriter(Order.class);

In the current release there is no simple way to search on a string. You can do it, but that takes quite a few lines of code (and an understanding of the index structure). In the snapshot, however, you can do something like this:

    for (Order o : q.ds(
            q.top(1
                    , q.forEachPartition(
                            q.source(w, false)
                            , q.forEachRow(
                                    q.kvSource("clOrdId", q.hashSource("clOrdId", "10"))
                                    , q.equalsConst("clOrdId", "10")
                            )
                    )
            )
            , order
    )) {
        System.out.println(o);
    }

It is a little fiddly as well, but simpler than doing it all by hand :)
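
One way to read the query above from the inside out: the index yields candidate rows for the hashed key, an equality filter keeps only exact matches, the whole thing runs per partition, and `top(1)` stops at the first hit. The sketch below models that flow with plain Java collections. It is an interpretation of the snippet, not a description of the Q API internals: `PartitionedLookupSketch`, `Partition`, `candidates` and `findFirst` are hypothetical names, and the assumption that the equality filter exists to weed out hash collisions is mine.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a per-partition indexed lookup; no NFSdb classes are used. */
public class PartitionedLookupSketch {

    /** One partition: its rows plus a bucket -> row-id index over those rows. */
    static final class Partition {
        final List<String> rows = new ArrayList<>();
        final Map<Integer, List<Integer>> index = new HashMap<>();
        final int bucketCount;

        Partition(int bucketCount) {
            this.bucketCount = bucketCount;
        }

        int bucketOf(String key) {
            return Math.floorMod(key.hashCode(), bucketCount);
        }

        void append(String key) {
            // record the new row's id (current size) before adding the row itself
            index.computeIfAbsent(bucketOf(key), b -> new ArrayList<>()).add(rows.size());
            rows.add(key);
        }

        /** Step 1: the index returns every row id stored in the key's bucket. */
        List<Integer> candidates(String key) {
            return index.getOrDefault(bucketOf(key), Collections.<Integer>emptyList());
        }
    }

    /** Returns {partition, row} for the first exact match, or null if there is none. */
    static int[] findFirst(List<Partition> partitions, String key) {
        for (int p = 0; p < partitions.size(); p++) {        // visit each partition in order
            Partition partition = partitions.get(p);
            for (int row : partition.candidates(key)) {      // rows suggested by the index
                if (partition.rows.get(row).equals(key)) {   // step 2: exact-match filter
                    return new int[] {p, row};               // "top 1": stop at the first hit
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Partition> partitions = new ArrayList<>();
        for (int p = 0; p < 3; p++) {
            Partition partition = new Partition(4);  // few buckets, so keys collide
            for (int i = 0; i < 10; i++) {
                partition.append(Integer.toString(p * 10 + i));
            }
            partitions.add(partition);
        }
        int[] hit = findFirst(partitions, "10");
        System.out.println("partition=" + hit[0] + " row=" + hit[1]);  // partition=1 row=0
    }
}
```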

bluestreak01 commented on May 22, 2024

With config like this:

        JournalFactory factory = new JournalFactory(new JournalConfigurationBuilder() {{
            $(Order.class).recordCountHint(10000)
                    .partitionBy(PartitionType.DAY)
                    .$str("clOrdId").index().buckets(100)
                    .$ts()
            ;
        }}.build(args[0]));

I can append 1M in 700ms without index and ~900ms with index.

For practical applications it doesn't make sense to partition 1M rows of data like that. It's about 126MB on disk, underpopulated. You could be looking at growing the database to 50-100M rows before considering partitioning. Otherwise the cost of managing files is higher than the cost of searching data that is kept all in one place.
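
To put numbers on the partitioning discussed in this thread, the sketch below spreads 1M timestamps evenly over 90 days and counts how many rows land in each day partition. It uses only `java.time` and makes no NFSdb calls; `DayPartitionSketch` and its methods are hypothetical names, and cutting partitions at UTC midnight with a start at epoch zero is an assumption made for the illustration.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

/** Counts rows per day partition for evenly spread timestamps (illustrative only). */
public class DayPartitionSketch {
    static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000;

    /** Maps an epoch-millis timestamp to the UTC calendar day it falls on. */
    static LocalDate partitionOf(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC).toLocalDate();
    }

    /** Spreads `rows` timestamps evenly over `days` days and counts rows per day. */
    static Map<LocalDate, Integer> rowsPerPartition(long startMillis, int rows, int days) {
        Map<LocalDate, Integer> counts = new TreeMap<>();
        long span = days * MILLIS_PER_DAY;
        for (int i = 0; i < rows; i++) {
            long ts = startMillis + span * i / rows;  // evenly spaced inside [start, start + span)
            counts.merge(partitionOf(ts), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<LocalDate, Integer> counts = rowsPerPartition(0L, 1_000_000, 90);
        System.out.println("partitions: " + counts.size());  // partitions: 90
        System.out.println("smallest:   " + Collections.min(counts.values()));
        System.out.println("largest:    " + Collections.max(counts.values()));
    }
}
```

Each partition ends up with roughly 11,000 rows, which is in line with the 10K per-partition hint suggested earlier and shows how thinly 1M rows are spread once they are split into 90 daily files.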

sirinath commented on May 22, 2024

I am not sure if this might help. Perhaps you can borrow some ideas from:

https://code.google.com/p/cqengine/
https://code.google.com/p/concurrent-trees/

jaromirs commented on May 22, 2024

Hi Vlad,

thanks for your response. You are right, it does not make sense to have such small partitions. I wasn't planning it - it was just an artificial example. Btw: when do you think you will release the support for string searching?

Hi Sirinath,

thanks for the tip - I think we might have come across the CQEngine already - will double check.

Jaromir

bluestreak01 commented on May 22, 2024

Hi Jaromir, the snapshot is already available in the Maven repo if you want to play with it. Just add this to your pom:

<repositories>
    <repository>
        <id>sonatype-snapshots</id>
        <url>https://oss.sonatype.org/content/repositories/snapshots</url>
    </repository>
</repositories>

I'm planning to add a hash index on int fields, so perhaps you can use "id" as the unique key? I'll release over this weekend. Having said that, the Q API is part of a large project and will definitely take some time to stabilise and test properly, so you can use it, but I'd very much appreciate it if you treat it as beta.

bluestreak01 commented on May 22, 2024

I have released 2.0.1, which contains support for key search (both int and string):

https://github.com/NFSdb/nfsdb/releases/tag/2.0.1

jaromirs commented on May 22, 2024

Cool. Thanks again Vlad.

bluestreak01 commented on May 22, 2024

My pleasure!

jaromirs commented on May 22, 2024

Hi Vlad,

I haven't found Release 2.0.1 on Maven Central Repository - could you please publish it? I am using 2.0.2-SNAPSHOT for the time being ...

Thanks, Jaromir

bluestreak01 commented on May 22, 2024

Hi Jaromir, you are right, release was incomplete.

All steps are now done and it's with the Maven replication system. Please check in an hour or two.

Vlad

jaromirs commented on May 22, 2024

Thanks!

bluestreak01 commented on May 22, 2024

it is out, finally 👍

jaromirs commented on May 22, 2024

Great, thanks.
