Comments (4)
Hi @tazarov, yes, the issue is still present in version 0.5.0.
I can't provide the data because it's confidential, but I can share the code you can use to reproduce this.
import chromadb
from chromadb import Documents, EmbeddingFunction, Embeddings

### Using Normal Client
chroma_client = chromadb.Client()

class MyEmbeddingFunction(EmbeddingFunction):
    def __call__(self, input: Documents) -> Embeddings:
        embeddings = ...  # Your Embeddings: one vector per input document
        return embeddings

collection = chroma_client.create_collection(
    name="test", embedding_function=MyEmbeddingFunction(), metadata={"hnsw:space": "cosine"}
)
# docs = Your Document (objects with .page_content and .metadata)
collection.add(
    ids=[str(i) for i in range(len(docs))],
    documents=[d.page_content for d in docs],
    metadatas=[d.metadata for d in docs],
)
# query_vector = Your Query Vector
collection.query(query_embeddings=[query_vector], n_results=3)
import chromadb
from chromadb import Documents, EmbeddingFunction, Embeddings

### Using Persistent Client (Saving to disk)
persistent_client = chromadb.PersistentClient(path="/path/to/save/to")

class MyEmbeddingFunction(EmbeddingFunction):
    def __call__(self, input: Documents) -> Embeddings:
        embeddings = ...  # Your Embeddings: one vector per input document
        return embeddings

persistent_collection = persistent_client.create_collection(
    name="test", embedding_function=MyEmbeddingFunction(), metadata={"hnsw:space": "cosine"}
)
# docs = Your Document (objects with .page_content and .metadata)
persistent_collection.add(
    ids=[str(i) for i in range(len(docs))],
    documents=[d.page_content for d in docs],
    metadatas=[d.metadata for d in docs],
)
# query_vector = Your Query Vector
persistent_collection.query(query_embeddings=[query_vector], n_results=3)
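To put a number on the discrepancy between the two clients, one option is to compare the `ids` and `distances` returned by the two `query()` calls above. The sketch below is a hypothetical helper (`compare_query_results` is not part of Chroma); it assumes both results have the usual query-result shape, with one list per query embedding under `"ids"` and `"distances"`, and the `tol` threshold is an arbitrary choice.

```python
from typing import Any, Dict, List


def compare_query_results(
    a: Dict[str, Any], b: Dict[str, Any], tol: float = 1e-6
) -> List[Dict[str, Any]]:
    """Compare two query results query by query (hypothetical helper).

    Assumes each result holds one list per query under "ids" and "distances".
    Returns one report dict per query.
    """
    reports = []
    for q, (ids_a, ids_b) in enumerate(zip(a["ids"], b["ids"])):
        # Map each returned id to its distance, separately for each client.
        dist_a = dict(zip(ids_a, a["distances"][q]))
        dist_b = dict(zip(ids_b, b["distances"][q]))
        # Only ids returned by both clients can have their distances compared.
        shared = set(dist_a) & set(dist_b)
        max_diff = max((abs(dist_a[i] - dist_b[i]) for i in shared), default=0.0)
        reports.append({
            "query": q,
            "same_order": ids_a == ids_b,
            "only_in_a": sorted(set(ids_a) - set(ids_b)),
            "only_in_b": sorted(set(ids_b) - set(ids_a)),
            "max_distance_diff": max_diff,
            "distances_match": max_diff <= tol,
        })
    return reports
```

For example, passing the result of `collection.query(...)` as `a` and `persistent_collection.query(...)` as `b` reports, per query, whether the ranking matches, which ids only one client returned, and the largest distance difference among ids returned by both.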
from chroma.
@sparshbhawsar, thanks for raising this. Do you have a short snippet of your add/query calls with some sample data to help reproduce this?
Side note: is the bug reproducible in Chroma 0.5.0?
Hi @tazarov, any update on the issue?
I've tried with:
import chromadb

### Using Normal Client
chroma_client = chromadb.Client()
# No embedding_function is passed, so the default embedding function is used
collection = chroma_client.create_collection(name="test123", metadata={"hnsw:space": "cosine"})
docs = ["This provides a daily snapshot of the ...", "This is the description of....", "Table1", "Table2"]
collection.add(ids=[str(i) for i in range(len(docs))], documents=docs)
qr = collection.query(query_texts=["description of snapshot table"], n_results=4)
print(qr)

### Using Persistent Client (Saving to disk)
persistent_client = chromadb.PersistentClient(path="./2134")
persistent_collection = persistent_client.create_collection(name="test", metadata={"hnsw:space": "cosine"})
# Reuse the same docs list defined above
persistent_collection.add(ids=[str(i) for i in range(len(docs))], documents=docs)
qr1 = persistent_collection.query(query_texts=["description of snapshot table"], n_results=4)
print(qr1)
A few things to note about the above code: it relies on the default embedding function (which is not great with cosine, but it works), and it yields consistent results for both clients. We do a lot of testing around consistency, so I wonder under what conditions you see this problem. I have two suspects:
- Data
- Custom embedding functions
I think the next step is for me to work on the first by building a somewhat more realistic dataset than just 4 docs. You mentioned that your dataset is private, but can you give me an indication of how many records (embeddings) you add to Chroma and whether your top-k results have small or large distances between each other?
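The question about top-k distances can be answered without sharing the data. HNSW is an approximate nearest-neighbour index, so results whose distances are nearly tied may come back in a different order; an exact brute-force ranking gives a ground truth to compare both clients against. The following is a minimal pure-Python sketch, assuming you can export the stored embeddings and the query vector as lists of floats (for example via `collection.get(include=["embeddings"])`). The helper names and the `eps` threshold are hypothetical; the distance is 1 minus cosine similarity, which is how hnswlib's `cosine` space measures it.

```python
import math
from typing import List, Sequence, Tuple


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity. Assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / norm


def exact_top_k(
    query: Sequence[float], embeddings: List[Sequence[float]], k: int
) -> List[Tuple[int, float]]:
    """Brute-force (exact) top-k by cosine distance, as (index, distance) pairs."""
    scored = [(i, cosine_distance(query, e)) for i, e in enumerate(embeddings)]
    scored.sort(key=lambda pair: pair[1])  # stable sort: ties keep insertion order
    return scored[:k]


def near_ties(top_k: List[Tuple[int, float]], eps: float = 1e-3) -> List[int]:
    """Positions whose distance is within eps of the previous result's distance."""
    return [
        pos
        for pos in range(1, len(top_k))
        if top_k[pos][1] - top_k[pos - 1][1] < eps
    ]
```

If `near_ties` flags positions inside your top-k, differing orders between the two clients at those positions would be consistent with approximate search rather than data loss; a mismatch at well-separated positions would be the more interesting case to report.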
Related Issues (20)
- [Bug]: Collection.modify() got an unexpected keyword argument 'new_name'
- [Feature Request]: Combine `queryTexts` and `queryEmbeddings` into a single `query` field in the interface for the queryRecords method.
- [Feature Request]: Metadata Filters: having common tags or not
- [Bug]: Lack of Auth, headers and SSL (custom PKI/ignore) for Async client
- [Feature Request]: Singular Convenience Methods/Interfaces for Chroma Clients
- [Feature Request]: Structure Records as Objects/Dicts instead of parallel arrays.
- [Feature Request]: Match shape of return type to shape of parameter types
- [Install issue]: How to initialize the tenant and database when using a kubernetes cluster?
- [Bug]: v0.5.5 uses Pydantic 2 features while depending on Pydantic 1
- [Bug]: AttributeError: 'NoneType' object has no attribute 'get'
- [Bug]: chromadb 0.5.4 crashes on windows
- [Bug]: Unable to modify (update) collection metadata - "hnsw:space" will be lost.
- [Feature Request]: Retrieve A List Of Metadata Or A List Of Collections
- [Feature Request]: Retrieve A List Of Metadata Or Collections
- [Bug]: Error when loading chromadb collection froma docker container
- [ENH] Client / Server versioning compatibility
- [Install issue]: Authentication token not applied when using docker
- [Feature Request]: AUR package
- [Bug]: 100% appear, windows11, python 3.11.9, the error will appear when 100th data insert to collection
- [Bug]: System cache not cleared when a new database is loaded with same persitent_directory