Comments (7)

mikethemerry commented on September 4, 2024

I've just spent three evenings tracking down the same bug and have managed to figure this out in the last half hour or so.

I think this is a regression introduced by https://github.com/chroma-core/chroma/pull/2526/files

I'm still figuring out the reproduction steps, but I think the process is:

  1. Deploy chroma and create a collection using <=0.5.4 with metadata={"hnsw:space": "cosine"} or similar. In my case:

    self.collection = self.vdb.get_or_create_collection(
        name=collection_name,
        embedding_function=self.embedding_function,
        metadata={"hnsw:space": "cosine"},
    )

     This creates the collection with the 0.5.4 defaults, where sync_threshold=100 and batch_size=1000.

  2. Upgrade your client to 0.5.5.
  3. 0.5.5 now checks sync_threshold and batch_size against those existing defaults and throws the error.
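To make the suspected failure mode concrete, here is a minimal sketch of the kind of check that would reject the stored values. The rule itself (batch_size must not exceed sync_threshold) is an assumption, since the exact error text is not quoted in this thread, and `HnswParams` and `validate` are hypothetical names, not Chroma's API.

```python
from dataclasses import dataclass


@dataclass
class HnswParams:
    """Hypothetical stand-in for the HNSW settings persisted with a collection."""
    batch_size: int
    sync_threshold: int


def validate(params: HnswParams) -> None:
    # Assumed rule (not confirmed by this thread): a batch may not be larger
    # than the threshold at which the index is synced to disk.
    if params.batch_size > params.sync_threshold:
        raise ValueError(
            f"batch_size ({params.batch_size}) must be <= "
            f"sync_threshold ({params.sync_threshold})"
        )


# Values a 0.5.4-created collection carries, according to the report above.
stored_by_054 = HnswParams(batch_size=1000, sync_threshold=100)
# The same two numbers the other way round.
patched = HnswParams(batch_size=100, sync_threshold=1000)

validate(patched)  # passes silently
try:
    validate(stored_by_054)
except ValueError as exc:
    print(f"rejected: {exc}")
```

Under that assumption, a collection created with the swapped values would fail as soon as a stricter client loads it, which matches the behaviour described in the steps.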

I haven't read through all of the other HNSW changes in 0.5.5, but it looks like there are some changes to persisted properties and the like. I was actually trying to change the configured properties with different metadata definitions, but was having a lot of trouble. Specifically, this was not fixed by changing that code to:

    self.collection = self.vdb.get_or_create_collection(
        name=collection_name,
        embedding_function=self.embedding_function,
        metadata={"hnsw:space": "cosine", "sync_threshold": 1000, "batch_size": 100},
    )

As a short-term workaround, I would suggest downgrading to 0.5.4 (this has worked for me) and waiting for a patch, as 0.5.5 is still in pre-release.

tazarov commented on September 4, 2024

@dddxst and @mikethemerry, thanks for reporting and investigating this. Indeed, it was a bug (#2338) released with 0.5.4, which was fixed (#2526) in 0.5.5. The issue is that any DB created with 0.5.4 results in the validation issue you reported.

To fix the problem (ideally, we should've added a migration script to do that, but alas):

If in docker:

Connect to your docker container:

apt update && apt install sqlite3
sqlite3 /chroma/chroma/chroma.sqlite3 "update collections set config_json_str=json_set(config_json_str,'$.hnsw_configuration.batch_size',100,'$.hnsw_configuration.sync_threshold',1000) where name='test';"
# you don't have to run the below, but for consistency reasons:
sqlite3 /chroma/chroma/chroma.sqlite3 "update collection_metadata set int_value = 100 where key='hnsw:batch_size' and collection_id in (select id from collections where name='test');"
sqlite3 /chroma/chroma/chroma.sqlite3 "update collection_metadata set int_value = 1000 where key='hnsw:sync_threshold' and collection_id in (select id from collections where name='test');"
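For setups without the sqlite3 command-line tool (for example outside Docker), the same patch can be sketched with Python's standard library. This is only an illustration under assumptions: the table and column names are taken from the commands above, the metadata key for the threshold is assumed to be `hnsw:sync_threshold`, and `patch_hnsw_config` is a hypothetical helper name, not part of Chroma.

```python
import json
import sqlite3


def patch_hnsw_config(conn: sqlite3.Connection, collection_name: str,
                      batch_size: int = 100, sync_threshold: int = 1000) -> int:
    """Rewrite the stored HNSW settings of one collection.

    Returns 1 if the collection was found and patched, 0 otherwise.
    """
    row = conn.execute(
        "SELECT id, config_json_str FROM collections WHERE name = ?",
        (collection_name,),
    ).fetchone()
    if row is None:
        return 0
    collection_id, config_json_str = row

    # Edit the JSON in Python instead of relying on SQLite's json_set().
    config = json.loads(config_json_str or "{}")
    hnsw = config.setdefault("hnsw_configuration", {})
    hnsw["batch_size"] = batch_size
    hnsw["sync_threshold"] = sync_threshold

    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "UPDATE collections SET config_json_str = ? WHERE id = ?",
            (json.dumps(config), collection_id),
        )
        # Optional: keep the metadata table consistent with the config.
        for key, value in (("hnsw:batch_size", batch_size),
                           ("hnsw:sync_threshold", sync_threshold)):
            conn.execute(
                "UPDATE collection_metadata SET int_value = ? "
                "WHERE key = ? AND collection_id = ?",
                (value, key, collection_id),
            )
    return 1
```

A call might look like `patch_hnsw_config(sqlite3.connect("/chroma/chroma/chroma.sqlite3"), "test")`. It is probably wise to stop Chroma and copy the database file first, since this edits it in place.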

dodeeric commented on September 4, 2024

@mikethemerry, thanks to you, it took me only 3 minutes rather than three evenings to solve my problem...

dddxst commented on September 4, 2024

Thanks, it works after updating to 0.5.5, but an error occurs on Windows ...

dddxst commented on September 4, 2024

Thanks.

tazarov commented on September 4, 2024

Can you share the error you get on Windows?

codetheweb commented on September 4, 2024

Hey everyone, I believe this is caused by a version mismatch; this shouldn't happen if your client and server are on the same version. Please make sure that your server and client are both on 0.5.5 and let us know if this is still happening.
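A quick way to test for such a mismatch is to compare the two version strings. The sketch below only does the comparison; `release_tuple` and `same_release` are hypothetical helper names, and how the strings are obtained is left to the caller.

```python
import re


def release_tuple(version: str) -> tuple:
    """Extract the leading major.minor.patch numbers from a version string."""
    match = re.match(r"\s*(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def same_release(client_version: str, server_version: str) -> bool:
    """Return True when client and server report the same major.minor.patch."""
    return release_tuple(client_version) == release_tuple(server_version)


# Assumed usage (requires chromadb and a running server, so it is left
# commented out; the calls are believed to exist but are not verified here):
#   import chromadb
#   client = chromadb.HttpClient(host="localhost", port=8000)
#   print(same_release(chromadb.__version__, client.get_version()))
print(same_release("0.5.4", "0.5.5"))
```

Comparing only the first three numeric components means suffixes such as `.dev0` are ignored; compare the raw strings instead if a stricter check is wanted.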
