Comments (1)
Hey @huanghao128! I'm here to help you with any bugs, questions, or contributions you have for the repository. Let me know how I can assist you!
The issue in the BM25Retriever class at line 92 of the file llama_index/retrievers/bm25/base.py can be fixed by modifying the loop so that each retrieved node is looked up by its top_n index and paired with the score at that same index. The suggested fix is correct. Here is the corrected code:
for ix in top_n:
    nodes.append(NodeWithScore(node=self._nodes[ix], score=float(scores[ix])))
This ensures that the scores are correctly matched with their corresponding nodes. The updated _retrieve method should look like this:
def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
    if query_bundle.custom_embedding_strs or query_bundle.embedding:
        logger.warning("BM25Retriever does not support embeddings, skipping...")

    query = query_bundle.query_str
    tokenized_query = self._tokenizer(query)
    # One BM25 score per node, in the same order as self._nodes.
    scores = self.bm25.get_scores(tokenized_query)

    # Indexes of the highest-scoring nodes, best first.
    top_n = scores.argsort()[::-1][: self._similarity_top_k]

    nodes: List[NodeWithScore] = []
    for ix in top_n:
        # Use the same index for the node and its score.
        nodes.append(NodeWithScore(node=self._nodes[ix], score=float(scores[ix])))

    return nodes
This change ensures that the scores are correctly associated with the nodes they correspond to [1].
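To see why the index lookup matters, here is a minimal standalone sketch. It does not use llama_index: the node list, the score values, and both function names are made up for illustration, and the pure-Python sort stands in for scores.argsort()[::-1][:k]. The "mismatched" variant is an assumption about what the original bug looked like (zipping the ranked indexes with the unsorted score array), not a quote of the old code.

```python
from typing import List, Tuple


def top_k_with_scores(nodes: List[str], scores: List[float], k: int) -> List[Tuple[str, float]]:
    """Return the k best nodes, each paired with its own score."""
    # Indexes of the k highest scores, best first
    # (stand-in for scores.argsort()[::-1][:k] on a NumPy array).
    top_n = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Look up both the node and the score with the same index.
    return [(nodes[ix], float(scores[ix])) for ix in top_n]


def top_k_mismatched(nodes: List[str], scores: List[float], k: int) -> List[Tuple[str, float]]:
    """Hypothetical buggy variant: pairs ranked indexes with unsorted scores."""
    top_n = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # zip walks `scores` from position 0, so the scores belong to the wrong nodes.
    return [(nodes[ix], float(score)) for ix, score in zip(top_n, scores)]


# Illustrative data only.
nodes = ["doc_a", "doc_b", "doc_c"]
scores = [0.1, 2.5, 1.3]

correct = top_k_with_scores(nodes, scores, 2)
wrong = top_k_mismatched(nodes, scores, 2)

print(correct)  # [('doc_b', 2.5), ('doc_c', 1.3)]
print(wrong)    # [('doc_b', 0.1), ('doc_c', 2.5)]
```

In the mismatched output the ranking of the nodes is still right, but the reported scores are not, which is the kind of symptom the fix above addresses.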