
Comments (9)

eric-gardyn commented on May 23, 2024

From a troubleshooting session, it looks like for the 'text-embedding-3-small' embedding model, the minScore value in the findNearestNeighborsOptions of the makeDefaultFindContent setup needs to be set lower than 0.9.
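
A minimal sketch of why the threshold matters, written as standalone TypeScript rather than the framework's own code. The `ScoredChunk` type, the `filterByMinScore` helper, and the example scores are hypothetical and only illustrate the cutoff; real scores depend on your data and embedding model.

```typescript
// Hypothetical shape of a vector search hit: some text plus a similarity score.
interface ScoredChunk {
  text: string;
  score: number;
}

// Keep only hits whose score meets the threshold, mimicking a minScore cutoff.
function filterByMinScore(hits: ScoredChunk[], minScore: number): ScoredChunk[] {
  return hits.filter((hit) => hit.score >= minScore);
}

// Made-up scores for illustration only (not measured values).
const hits: ScoredChunk[] = [
  { text: "chunk A", score: 0.82 },
  { text: "chunk B", score: 0.76 },
  { text: "chunk C", score: 0.61 },
];

const strict = filterByMinScore(hits, 0.9); // nothing passes the cutoff
const relaxed = filterByMinScore(hits, 0.7); // chunk A and chunk B pass

console.log(strict.length, relaxed.length);
```

If every hit falls below the cutoff, the result list is empty, which would be consistent with the "No matching content found" response reported in this thread.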

from chatbot.

mongodben commented on May 23, 2024

could be an Atlas Vector Search Index issue.

Have you created a vector search index on the field collection with the text-embedding-3-small embeddings? (see https://mongodb.github.io/chatbot/mongodb#3-create-atlas-vector-search-index-required-for-rag)


eric-gardyn commented on May 23, 2024

yes, the index is marked as "Active" with
Primary Node:
60 (100%) indexed of 60 total

created with

{
  "fields": [
    {
      "numDimensions": 1536,
      "path": "embedding",
      "similarity": "cosine",
      "type": "vector"
    }
  ]
}

I can also inspect the "embedded_content" collection's documents.
So, I can confirm that the ingestion script worked.


mongodben commented on May 23, 2024

hmm, it's hard to help debug this without more information. would you be able to share the source code?

and just to confirm, do you have 2 vector search indexes, 1 for each collection containing embedded_content?


eric-gardyn commented on May 23, 2024

I cloned a brand new instance (from 'main' with commit 9ed093a), created new database/collection ('embedded_content'), new index ('vector_index') from new DB/collection, and ran the "quick start" example (from https://mongodb.github.io/chatbot/quick-start/).
the ingest is successful; the index shows: 78 (100%) indexed of 78 total
ran the default ui and server (from 'quick-start' folder as well),
and I got the same error: "message":"No matching content found".

config:

# MongoDB config
MONGODB_CONNECTION_URI="mongodb+srv://XXX:[email protected]/?retryWrites=true&w=majority"
VECTOR_SEARCH_INDEX_NAME="vector_index" # or whatever your index name is
MONGODB_DATABASE_NAME="gardyn-search-dev-4-32" # or whatever your database name is. must contain vector search index.

# OpenAI config
OPENAI_API_KEY="xxxxxx"
OPENAI_CHAT_COMPLETION_MODEL="gpt-4"
OPENAI_EMBEDDING_MODEL="text-embedding-3-small"

does it matter that the index name 'vector_index' is the same across several DBs?


mongodben commented on May 23, 2024

does it matter that the index name 'vector_index' is the same across several DBs?

i believe this could be the issue. though it's hard to say without looking at your code and cluster config.

try setting up different index names in the atlas UI.

atlas vector search indexes are set at the cluster level, correlating to a specific collection in a specific database.
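
One way to check whether the index on a given collection returns anything, and with what scores, is to run a `$vectorSearch` aggregation directly against that collection with no score cutoff. The sketch below only builds the pipeline. The `buildScoreInspectionPipeline` helper is hypothetical, the `text` field in the projection is an assumption about the document shape, and the three-element query vector is a placeholder (a real query vector must have the same number of dimensions as the index).

```typescript
// Hypothetical helper: build a $vectorSearch pipeline that exposes raw scores.
function buildScoreInspectionPipeline(
  indexName: string,
  path: string,
  queryVector: number[],
  limit: number
): Record<string, unknown>[] {
  return [
    {
      $vectorSearch: {
        index: indexName, // index name as defined on the target collection
        path, // field that holds the stored embedding
        queryVector, // embedding of the user's question
        numCandidates: limit * 10, // candidates considered before taking `limit`
        limit,
      },
    },
    {
      $project: {
        _id: 0,
        text: 1, // assumed field name for the chunk text
        score: { $meta: "vectorSearchScore" },
      },
    },
  ];
}

// Placeholder values for illustration only.
const pipeline = buildScoreInspectionPipeline("vector_index", "embedding", [0.1, 0.2, 0.3], 5);
console.log(JSON.stringify(pipeline, null, 2));
```

With the Node.js driver, a pipeline like this could be passed to `collection.aggregate(pipeline).toArray()` on the collection that holds the embeddings, so the index name is resolved against that collection in that database.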


eric-gardyn commented on May 23, 2024

unfortunately, it did not work:
I created a brand new cluster (M0 Sandbox), used the 'quick-start' example out-of-the-box (with mongoDB doc as a source). I used the mongodb-ui and still got the "Unfortunately, I do not know how to respond to your message.".

I also tried with the "text-embedding-3-large" embedding model, but got error

"vector field is indexed with 2048 dimensions but queried with 3072"

as the vector_index does not allow 3072:

Number of vector dimensions. Value can be between 1 and 2048, both inclusive.
Value is above the maximum of 2048.


mongodben commented on May 23, 2024

I created a brand new cluster (M0 Sandbox), used the 'quick-start' example out-of-the-box (with mongoDB doc as a source). I used the mongodb-ui and still got the "Unfortunately, I do not know how to respond to your message.".

hard to say what's going on here without more visibility into your project, but i suspect it's an index config issue.

if you'd like i'd be happy to hop on a call and work through the problem with you. also curious to learn more about your experience using the framework 😄

vector field is indexed with 2048 dimensions but queried with 3072

this is due to Atlas Vector Search limiting the max number of dimensions to index to 2048. from the docs: "You must specify a value less than 2049. You can set this field only for vector type fields."

this isn't related to the Chatbot Framework, but Atlas Vector Search itself, so it'd be hard to work around w/o using a different vector DB.
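
A small sketch of the dimension check implied here, assuming the default output sizes of the two OpenAI models (1536 for text-embedding-3-small, 3072 for text-embedding-3-large) and the 2048 limit quoted in this thread. The `checkDimensions` helper is hypothetical and not part of the framework.

```typescript
// Default output sizes for the two OpenAI models mentioned in this thread.
const DEFAULT_DIMENSIONS: Record<string, number> = {
  "text-embedding-3-small": 1536,
  "text-embedding-3-large": 3072,
};

// Maximum numDimensions according to the Atlas message quoted in this thread.
const ATLAS_MAX_DIMENSIONS = 2048;

// Hypothetical helper: return a list of problems, empty if the pair is compatible.
function checkDimensions(model: string, indexNumDimensions: number): string[] {
  const problems: string[] = [];
  const modelDims = DEFAULT_DIMENSIONS[model];
  if (modelDims === undefined) {
    problems.push(`unknown model: ${model}`);
    return problems;
  }
  if (modelDims > ATLAS_MAX_DIMENSIONS) {
    problems.push(`${model} outputs ${modelDims} dimensions, above the limit of ${ATLAS_MAX_DIMENSIONS}`);
  }
  if (modelDims !== indexNumDimensions) {
    problems.push(`index has ${indexNumDimensions} dimensions but ${model} outputs ${modelDims}`);
  }
  return problems;
}

const smallCheck = checkDimensions("text-embedding-3-small", 1536); // no problems
const largeCheck = checkDimensions("text-embedding-3-large", 2048); // two problems

console.log(smallCheck, largeCheck);
```

OpenAI's embeddings API also accepts a `dimensions` parameter for the text-embedding-3 models, which may be one way to stay under the limit; whether the framework's embedder exposes that option is not confirmed in this thread.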


mongodben commented on May 23, 2024

thanks for posting that here. will add to the docs that the similarity score can vary depending on the vector embedding model you use.

closing this issue.

