[Question]: How do I prioritize exact matches in my retriever? (BM25 or any other) about llama_index HOT 1 OPEN

pliablepixels commented on August 16, 2024 1

Comments (1)

dosubot commented on August 16, 2024 3

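You can experiment with the k1 parameter in BM25 through the rank_bm25 library, which the BM25Retriever class from LlamaIndex relies on. k1 controls term-frequency saturation: the lower it is, the less each repeated occurrence of a term adds to a document's score. Here is an example of how you can modify the BM25Retriever class to accept a k1 parameter: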

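```python
from typing import Callable, List, Optional

# Import paths below assume the llama_index.core package layout.
from llama_index.core.base.base_retriever import BaseRetriever
from llama_index.core.callbacks.base import CallbackManager
from llama_index.core.constants import DEFAULT_SIMILARITY_TOP_K
from llama_index.core.schema import BaseNode, IndexNode


class BM25Retriever(BaseRetriever):
    # Excerpt: only the constructor is shown; the rest of the class is unchanged.
    def __init__(
        self,
        nodes: List[BaseNode],
        tokenizer: Optional[Callable[[str], List[str]]],
        similarity_top_k: int = DEFAULT_SIMILARITY_TOP_K,
        k1: float = 1.5,  # Default value for k1
        callback_manager: Optional[CallbackManager] = None,
        objects: Optional[List[IndexNode]] = None,
        object_map: Optional[dict] = None,
        verbose: bool = False,
    ) -> None:
        # Import lazily so a missing dependency produces a helpful message.
        try:
            from rank_bm25 import BM25Okapi
        except ImportError:
            raise ImportError("Please install rank_bm25: pip install rank-bm25")

        self._nodes = nodes
        # tokenize_remove_stopwords is assumed to be the default tokenizer
        # defined in the same module as the original BM25Retriever.
        self._tokenizer = tokenizer or tokenize_remove_stopwords
        self._similarity_top_k = similarity_top_k
        self._corpus = [self._tokenizer(node.get_content()) for node in self._nodes]
        self.bm25 = BM25Okapi(self._corpus, k1=k1)  # Set k1 parameter here
        super().__init__(
            callback_manager=callback_manager,
            object_map=object_map,
            objects=objects,
            verbose=verbose,
        )
```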
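
To get a feel for what k1 changes before patching the retriever, here is a minimal, standard-library-only sketch of the textbook Okapi BM25 score. It is an illustration, not rank_bm25's implementation (that library's IDF handling may differ), and the function name `bm25_scores` and the toy corpus are made up for this example. Length normalization is switched off (`b=0.0`) so only the effect of k1 is visible.

```python
import math
from collections import Counter
from typing import Dict, List


def bm25_scores(
    query: List[str],
    corpus: List[List[str]],
    k1: float = 1.5,
    b: float = 0.75,
) -> List[float]:
    """Score every tokenized document in `corpus` against `query` (textbook Okapi BM25)."""
    n_docs = len(corpus)
    avg_len = sum(len(doc) for doc in corpus) / n_docs
    # Document frequency: in how many documents each term appears.
    doc_freq: Dict[str, int] = Counter(term for doc in corpus for term in set(doc))

    scores = []
    for doc in corpus:
        term_freq = Counter(doc)
        score = 0.0
        for term in query:
            tf = term_freq[term]
            if tf == 0:
                continue
            # Smoothed IDF; the "+ 1.0" inside the log keeps it non-negative.
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            # k1 scales how quickly extra occurrences stop adding to the score.
            norm = k1 * (1.0 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1.0) / (tf + norm)
        scores.append(score)
    return scores


corpus = [
    "error code 42".split(),            # "error" appears once
    "error error error error".split(),  # "error" appears four times
    "nothing relevant here".split(),    # no match
]

for k1 in (0.0, 1.5):
    once, repeated, _ = bm25_scores(["error"], corpus, k1=k1, b=0.0)
    print(f"k1={k1}: single occurrence={once:.3f}, repeated={repeated:.3f}")
```

With k1 set to 0 the score reduces to presence or absence of the term (both matching documents score the same), while larger k1 values reward the document that repeats the term. If the goal is "does the text match at all" rather than "how often", a low k1 is the direction to try.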

If you want retrieval that depends less on term frequency, you might consider a vector-based method built on dense embedding models (e.g., Sentence Transformers). These models rank by semantic similarity rather than term frequency, so they capture meaning well but do not by themselves guarantee that exact text matches rank first. LlamaIndex supports vector-based retrieval through the VectorStoreIndex class ^[1]^[2].

Additionally, you can use the MetadataFilters class to handle exact matches in VectorStoreIndex. Here is an example:

```python
from llama_index.legacy.vector_stores.types import (
    FilterOperator,
    MetadataFilter,
    MetadataFilters,
    VectorStoreQuery,
    VectorStoreQueryMode,
)

# Create exact match filters on a metadata field
filters = MetadataFilters(
    filters=[
        MetadataFilter(key="field_name", value="exact_value", operator=FilterOperator.EQ)
    ]
)

# Create a query with the TEXT_SEARCH mode to prioritize exact matches
query = VectorStoreQuery(
    query_str="your query",
    mode=VectorStoreQueryMode.TEXT_SEARCH,
    filters=filters,
)

# Execute the query. vector_store is an existing vector store instance;
# it must support both TEXT_SEARCH mode and metadata filters.
result = vector_store.query(query)
```

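This configuration restricts results to nodes whose field_name metadata equals exact_value and asks the store to run a text search instead of an embedding search ^[3]. Note that the filter matches metadata values, not the node text, and that TEXT_SEARCH mode is only available in vector store integrations that implement it.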
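
If neither option fits, another way to prioritize exact matches is to re-rank whatever the retriever returns. The sketch below is a generic, standard-library-only illustration and not a LlamaIndex API: the function name `prioritize_exact_matches` and the sample results are hypothetical, and it assumes you have already extracted `(text, score)` pairs from your retrieved nodes.

```python
from typing import List, Tuple


def prioritize_exact_matches(
    query: str,
    results: List[Tuple[str, float]],
) -> List[Tuple[str, float]]:
    """Move results that contain the query verbatim (case-insensitive) to the front.

    Within each group (exact match / no exact match) the original
    score ordering is kept, highest score first.
    """
    needle = query.casefold()
    # Sort key: False (exact match) sorts before True, then by descending score.
    return sorted(
        results,
        key=lambda item: (needle not in item[0].casefold(), -item[1]),
    )


# Hypothetical retriever output: (node text, retriever score)
results = [
    ("Resetting the router fixes most connection issues.", 0.91),
    ("Error E1234 means the firmware update failed.", 0.62),
    ("See the firmware guide for update errors.", 0.80),
]

reranked = prioritize_exact_matches("E1234", results)
for text, score in reranked:
    print(f"{score:.2f}  {text}")
```

Here the only passage containing "E1234" is listed first even though its retriever score is the lowest. The same idea could be wrapped in a node postprocessor, but that wiring is left out because it depends on the LlamaIndex version in use.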
