Comments (1)
You can experiment with the `k1` parameter in BM25 by using the `rank_bm25` library, which the `BM25Retriever` class from LlamaIndex relies on. Here is an example of how you can modify the `BM25Retriever` class to include the `k1` parameter:

```python
from typing import Callable, List, Optional

# BaseRetriever, BaseNode, IndexNode, CallbackManager, DEFAULT_SIMILARITY_TOP_K
# and tokenize_remove_stopwords are the names already used by the LlamaIndex
# BM25 retriever module that this class modifies.


class BM25Retriever(BaseRetriever):
    def __init__(
        self,
        nodes: List[BaseNode],
        tokenizer: Optional[Callable[[str], List[str]]],
        similarity_top_k: int = DEFAULT_SIMILARITY_TOP_K,
        k1: float = 1.5,  # Default value for k1
        callback_manager: Optional[CallbackManager] = None,
        objects: Optional[List[IndexNode]] = None,
        object_map: Optional[dict] = None,
        verbose: bool = False,
    ) -> None:
        # Import lazily so a missing dependency produces a helpful message
        try:
            from rank_bm25 import BM25Okapi
        except ImportError:
            raise ImportError("Please install rank_bm25: pip install rank-bm25")

        self._nodes = nodes
        self._tokenizer = tokenizer or tokenize_remove_stopwords
        self._similarity_top_k = similarity_top_k
        self._corpus = [self._tokenizer(node.get_content()) for node in self._nodes]
        self.bm25 = BM25Okapi(self._corpus, k1=k1)  # Set k1 parameter here
        super().__init__(
            callback_manager=callback_manager,
            object_map=object_map,
            objects=objects,
            verbose=verbose,
        )
```
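To get a feel for what changing `k1` does before wiring it into a retriever, here is a minimal sketch of the term-frequency part of the Okapi BM25 formula in plain Python. The helper name `bm25_tf_weight` and the default document lengths are illustrative assumptions, not part of `rank_bm25` or LlamaIndex, and the IDF factor is left out on purpose.

```python
def bm25_tf_weight(
    tf: int,
    k1: float,
    b: float = 0.75,
    doc_len: int = 100,
    avg_doc_len: float = 100.0,
) -> float:
    """Term-frequency factor of Okapi BM25 for one query term (IDF omitted).

    Hypothetical helper for illustration only.
    """
    if tf == 0:
        return 0.0
    # Length normalization: equals 1.0 when the document has average length
    length_norm = 1.0 - b + b * (doc_len / avg_doc_len)
    return tf * (k1 + 1.0) / (tf + k1 * length_norm)


# Compare how quickly repeated occurrences of a term stop adding to the score
for k1 in (0.0, 0.5, 1.5, 3.0):
    weights = [round(bm25_tf_weight(tf, k1), 3) for tf in (1, 2, 5, 20)]
    print(f"k1={k1}: {weights}")
```

With `k1 = 0` the factor is the same for any document that contains the term at least once, so scoring depends only on whether a term matches. Larger values of `k1` let repeated occurrences keep raising the score, up to a ceiling of `k1 + 1`.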
For a retrieval engine that favors text matches over raw term frequency, you might consider a vector-based retrieval method built on dense embedding models (e.g., Sentence Transformers). These models rank results by semantic similarity rather than by how often a term occurs, which can be more effective for capturing the meaning of a text. LlamaIndex supports vector-based retrieval through the `VectorStoreIndex` class [1][2].
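As a rough illustration of how vector-based retrieval ranks documents, the sketch below scores a few hand-written vectors against a query by cosine similarity. The three-dimensional vectors, the document ids, and the `top_k` helper are made up for the example; a real setup would obtain embeddings from a model and let `VectorStoreIndex` handle storage and search.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: List[float], doc_vecs: Dict[str, List[float]], k: int = 2) -> List[str]:
    """Return the ids of the k documents most similar to the query."""
    scored = sorted(
        ((cosine_similarity(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()),
        reverse=True,
    )
    return [doc_id for _, doc_id in scored[:k]]


# Toy "embeddings" (hypothetical values, not produced by any model)
docs = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.1, 0.9, 0.0],
    "doc_c": [0.7, 0.3, 0.1],
}
query = [1.0, 0.0, 0.0]

print(top_k(query, docs))
```

Ranking here depends only on the direction of the vectors, not on how many times a word appears, which is the contrast with BM25 drawn above.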
Additionally, you can use the `MetadataFilters` class to handle exact matches in `VectorStoreIndex`. Here is an example:
```python
from llama_index.legacy.vector_stores.types import (
    FilterOperator,
    MetadataFilter,
    MetadataFilters,
    VectorStoreQuery,
    VectorStoreQueryMode,
)

# Create exact match filters
filters = MetadataFilters(
    filters=[
        MetadataFilter(key="field_name", value="exact_value", operator=FilterOperator.EQ)
    ]
)

# Create a query with the TEXT_SEARCH mode to prioritize exact matches
query = VectorStoreQuery(
    query_str="your query",
    mode=VectorStoreQueryMode.TEXT_SEARCH,
    filters=filters,
)

# Execute the query; `vector_store` is assumed to be an existing vector store
# instance whose backend supports the TEXT_SEARCH mode
result = vector_store.query(query)
```
This configuration ensures that the query prioritizes exact matches by using the `TEXT_SEARCH` mode and applying the exact match filters [3].
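Conceptually, an equality filter keeps only the nodes whose metadata value matches exactly, and the ranking step sees nothing else. The plain-Python sketch below shows that idea under simplified assumptions: nodes are plain dictionaries and `apply_eq_filters` is a hypothetical helper, not a LlamaIndex function.

```python
from typing import Any, Dict, List


def apply_eq_filters(
    nodes: List[Dict[str, Any]], filters: Dict[str, Any]
) -> List[Dict[str, Any]]:
    """Keep nodes whose metadata contains every key with exactly the given value."""
    return [
        node
        for node in nodes
        if all(
            key in node["metadata"] and node["metadata"][key] == value
            for key, value in filters.items()
        )
    ]


# Hypothetical nodes; the key and value mirror the placeholders used above
nodes = [
    {"text": "BM25 tuning notes", "metadata": {"field_name": "exact_value"}},
    {"text": "Embedding notes", "metadata": {"field_name": "other_value"}},
    {"text": "Untagged notes", "metadata": {}},
]

matches = apply_eq_filters(nodes, {"field_name": "exact_value"})
print([node["text"] for node in matches])
```

In a real vector store the filtering is typically done by the backend itself, so the exact behavior depends on which store is in use.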