
Comments (9)

dosubot commented on June 9, 2024

To address the ValueError about embedding dimensions not aligning (512 or 1024 vs. 384), make sure the model you query with produces embeddings of the same dimensionality as the vectors already stored in your index. The output dimension is fixed by the model itself; bge-large-en-v1.5, for example, produces 1024-dimensional vectors. Here's how you can configure the embedding model:

from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(
    model_name="/home/techplus-820/w/bge-large-en-v1.5",  # adjust the model path as needed
    max_length=1024,  # maximum input token length accepted by the model
)
Settings.embed_model = embed_model

This ensures the same embedding model configuration is used for both indexing and querying, avoiding the dimension mismatch error.

If the issue persists, it might be due to a mismatch between the dimensions of the vectors in your query and those in your vector store. Ensure the embedding_dimension parameter of your vector store matches the dimension of your query vectors to resolve this.
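As a sanity check, the mismatch can be reproduced directly with plain NumPy (a hypothetical sketch using the shapes from the error, not llama_index code):

```python
import numpy as np

# Hypothetical shapes taken from the error: 22 stored vectors of
# dimension 384, queried with a 512-dimensional embedding.
stored = np.zeros((22, 384))
query = np.zeros(512)

# Cosine similarity reduces to a dot product over matching dimensions,
# so mismatched dimensions raise a ValueError.
try:
    scores = stored @ query
except ValueError as err:
    print("dimension mismatch:", err)
```

If the two shapes printed in your traceback disagree like this, the index and the query were embedded by different models.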

About Dosu
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

To continue the conversation, mention @dosu.

from llama_index.

hushuitian commented on June 9, 2024

@logan-markewich Thank you very much! The quick fix is to delete the old ./storage directory after changing the embedding model. It now works perfectly as expected. Thank you very much again for helping me get out of the rabbit hole!


dosubot commented on June 9, 2024

Hello @hushuitian! 👋 I'm Dosu, a bot here to assist you with bug fixes, answer your questions, and guide you on contributing while we wait for a human maintainer. I'm taking a look at your issue and will provide a full answer shortly.


logan-markewich commented on June 9, 2024

@hushuitian you need to make sure you are querying with the same embedding model that created the index.

If you switch embedding models, you need to build a new index


hushuitian commented on June 9, 2024

@Disiok Thanks for the suggested fix! But it doesn't work.


hushuitian commented on June 9, 2024

@hushuitian you need to make sure you are querying with the same embedding model that created the index.

If you switch embedding models, you need to build a new index

Thanks a lot for your advice! My test case uses SimpleVectorStore. In get_top_k_embeddings() of ~/.local/lib/python3.10/site-packages/llama_index/core/indices/query/embedding_utils.py,
...
embeddings_np = np.array(embeddings)
...
embeddings_np.shape is (22, 384). Where does the number 384 come from? Some default configuration somewhere? How can we change this shape to (22, 512) so that it correctly matches query_embedding_np.shape, which is (512,)?
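The 384 is not a configurable default; it is the output dimension of whichever embedding model was used when the index was built, baked into the persisted store. A minimal sketch of that behavior (plain json/tempfile, with a hypothetical file name, not the real SimpleVectorStore format):

```python
import json
import os
import tempfile

# At index time, each node's vector is persisted with the dimension of
# the embedding model used *then* (384 here). 22 nodes -> shape (22, 384).
persist_dir = tempfile.mkdtemp()
vectors = {f"node-{i}": [0.0] * 384 for i in range(22)}
path = os.path.join(persist_dir, "vector_store.json")  # hypothetical file name
with open(path, "w") as f:
    json.dump(vectors, f)

# Later, even if the query-time model now emits 512-dim vectors, loading
# the old store still yields 384-dim vectors; nothing in the query path
# reshapes them.
with open(path) as f:
    loaded = json.load(f)
dims = {len(v) for v in loaded.values()}
print(len(loaded), dims)  # 22 {384}
```

So the only way to get (22, 512) is to re-embed the 22 nodes with the new model and persist a fresh index.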


logan-markewich commented on June 9, 2024

@hushuitian there is a mismatch between the embeddings used to create the index and the embeddings used to query it. That is where the 384 and 512 come from.


logan-markewich commented on June 9, 2024

You can't change the embedding model and query an existing index, which is what you are doing

If you change the embedding model, you NEED to create the index again using VectorStoreIndex.from_documents(documents, ...) or VectorStoreIndex(nodes, ...) before you can query again


logan-markewich commented on June 9, 2024

In the notebook you are following, there is this block of code

import os
from llama_index.core import (
    StorageContext,
    VectorStoreIndex,
    load_index_from_storage,
)

if not os.path.exists("storage"):
    index = VectorStoreIndex.from_documents(docs)
    # save index to disk
    index.set_index_id("vector_index")
    index.storage_context.persist("./storage")
else:
    # rebuild storage context
    storage_context = StorageContext.from_defaults(persist_dir="storage")
    # load index
    index = load_index_from_storage(storage_context, index_id="vector_index")

If you change the embedding model, you should delete the existing storage folder so that the index is created again
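Concretely, the cleanup step before re-running the snippet above looks like this (a small sketch; shutil.rmtree is destructive, so double-check the path first):

```python
import os
import shutil

# Delete the persisted index that was built with the old embedding model,
# so the `if not os.path.exists("storage")` branch runs again and
# VectorStoreIndex.from_documents(docs) re-embeds everything with the
# new model.
if os.path.exists("storage"):
    shutil.rmtree("storage")
```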

