
Comments (4)

sparshbhawsar commented on August 11, 2024

Hi @tazarov, yes, the issue still occurs in version 0.5.0.

I can't provide the data because it's confidential, but I can share the code you can use to reproduce this.

import chromadb

### Using Normal Client
chroma_client = chromadb.Client()

from chromadb import Documents, EmbeddingFunction, Embeddings

class MyEmbeddingFunction(EmbeddingFunction):
    def __call__(self, input: Documents) -> Embeddings:
        embeddings = ...  # Your Embeddings (placeholder: one vector per document)
        return embeddings

collection = chroma_client.create_collection(name="test", embedding_function=MyEmbeddingFunction(), metadata={"hnsw:space": "cosine"})

# docs = Your Document

collection.add(ids=[str(i) for i in range(len(docs))], documents=[d.page_content for d in docs], metadatas=[d.metadata for d in docs])

# query_vector = Your Query Vector
collection.query(query_embeddings=[query_vector], n_results=3)


### Using Persistent Client (Saving to disk)
persistent_client = chromadb.PersistentClient(path="/path/to/save/to")

from chromadb import Documents, EmbeddingFunction, Embeddings

class MyEmbeddingFunction(EmbeddingFunction):
    def __call__(self, input: Documents) -> Embeddings:
        embeddings = ...  # Your Embeddings (placeholder: one vector per document)
        return embeddings

persistent_collection = persistent_client.create_collection(name="test", embedding_function=MyEmbeddingFunction(), metadata={"hnsw:space": "cosine"})

# docs = Your Document

persistent_collection.add(ids=[str(i) for i in range(len(docs))], documents=[d.page_content for d in docs], metadatas=[d.metadata for d in docs])

# query_vector = Your Query Vector
persistent_collection.query(query_embeddings=[query_vector], n_results=3)

from chroma.

tazarov commented on August 11, 2024

@sparshbhawsar, thanks for raising this. Do you have a short snippet of your add/query with some sample data to help with reproducing this?

Side note: Is the bug reproducible in Chroma 0.5.0?


sparshbhawsar commented on August 11, 2024

Hi @tazarov, any update on the issue?


tazarov commented on August 11, 2024

@sparshbhawsar,

I've tried with:

import chromadb

### Using Normal Client 
chroma_client = chromadb.Client()


collection = chroma_client.create_collection( name="test123", metadata={"hnsw:space": "cosine"} )

docs = ["This provides a daily snapshot of the ...", "This is the description of....","Table1","Table2"] 

collection.add(ids=[str(i) for i in range(len(docs))], documents=[d for d in docs])

qr = collection.query( query_texts=["description of snapshot table"], n_results=4)

print(qr)

### Using Persistent Client (Saving to disk)
persistent_client = chromadb.PersistentClient(path="./2134")


persistent_collection = persistent_client.create_collection( name="test", metadata={"hnsw:space": "cosine"} )

# docs = Your Document 

persistent_collection.add(ids=[str(i) for i in range(len(docs))], documents=[d for d in docs])

qr1 = persistent_collection.query( query_texts=["description of snapshot table"], n_results=4 )

print(qr1)

One thing to note about the above code is that it relies on the default embedding function (it is not great with cosine, but it works). It yields consistent results for both clients. We do a lot of testing around the consistency of things, so I wonder under what conditions you see this problem. I have two suspects:

  • Data
  • Custom Embedding functions
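
One quick way to probe the second suspect is to check whether the custom embedding function returns the same vectors when it is called twice on the same input. The following is only a minimal sketch and is not part of Chroma: `is_deterministic` and `toy_embed` are hypothetical names, and `toy_embed` is just a stand-in for your own function.

```python
import hashlib
import math
from typing import Callable, List, Sequence

Vector = List[float]


def is_deterministic(
    embed: Callable[[Sequence[str]], List[Vector]],
    docs: Sequence[str],
    tol: float = 1e-9,
) -> bool:
    """Return True if two calls to `embed` on the same docs agree within `tol`."""
    first = embed(docs)
    second = embed(docs)
    if len(first) != len(second):
        return False
    for a, b in zip(first, second):
        if len(a) != len(b):
            return False
        # Compare component by component with an absolute tolerance.
        if any(not math.isclose(x, y, rel_tol=0.0, abs_tol=tol) for x, y in zip(a, b)):
            return False
    return True


def toy_embed(docs: Sequence[str]) -> List[Vector]:
    """Hypothetical stand-in: hash each document into a fixed 8-dimensional vector."""
    vectors = []
    for doc in docs:
        digest = hashlib.sha256(doc.encode("utf-8")).digest()
        vectors.append([byte / 255.0 for byte in digest[:8]])
    return vectors
```

If the check fails for your own function, the in-memory and persistent collections may simply have been built from different vectors, which would explain differing query results without involving Chroma itself.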

I think the next step is for me to work on the first by getting a more "decent" dataset than just 4 docs. You mentioned that your dataset is private, but can you give me an indication of how many records (embeddings) you add to Chroma and whether the distances between your topK results are small or large?
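
To answer the question about distances without sharing any data, something like the sketch below could help. It assumes the result dictionaries have the shape returned by `collection.query`, with `"ids"` and `"distances"` each holding one list per query. The helper names `distance_gaps` and `same_ranking` are hypothetical and not part of Chroma.

```python
from typing import Any, Dict, List


def distance_gaps(result: Dict[str, Any], query_index: int = 0) -> List[float]:
    """Differences between consecutive top-K distances for one query."""
    distances = result["distances"][query_index]
    return [later - earlier for earlier, later in zip(distances, distances[1:])]


def same_ranking(
    result_a: Dict[str, Any], result_b: Dict[str, Any], query_index: int = 0
) -> bool:
    """True if both results return the same ids in the same order."""
    return result_a["ids"][query_index] == result_b["ids"][query_index]
```

With the snippet above, this could be called as `same_ranking(qr, qr1)` and `distance_gaps(qr)`. Very small gaps would suggest near-ties, where the approximate index may legitimately order neighbours differently between two collections.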
