Hybrid Advanced_Sementic_Search

WorkFlow

1. Introduction

This project implements a hybrid search method that uses both dense and sparse vectors. Dense vectors provide semantic understanding, while sparse vectors provide exact matching and keyword search.

  2. The Hybrid Search Approach

Hybrid search combines the strengths of dense and sparse vectors. Dense vectors capture the semantic meaning of a query, while sparse vectors match exact terms and keywords precisely. Using both means a query can retrieve documents that are related in meaning as well as documents that contain the exact wording.
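The source does not say how the dense and sparse signals are weighted against each other. One common option is a convex combination controlled by a single parameter; the sketch below assumes that scheme. The function name `hybrid_scale`, the parameter `alpha`, and the `indices`/`values` layout of the sparse vector are illustrative assumptions, not taken from the project code.

```python
from typing import Dict, List, Tuple

# Assumed sparse layout: {"indices": [...], "values": [...]}.
SparseVector = Dict[str, list]


def hybrid_scale(
    dense: List[float], sparse: SparseVector, alpha: float
) -> Tuple[List[float], SparseVector]:
    """Weight a dense/sparse query pair by a convex combination.

    alpha = 1.0 keeps only the dense (semantic) signal,
    alpha = 0.0 keeps only the sparse (keyword) signal.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    scaled_dense = [v * alpha for v in dense]
    scaled_sparse = {
        "indices": list(sparse["indices"]),
        "values": [v * (1.0 - alpha) for v in sparse["values"]],
    }
    return scaled_dense, scaled_sparse


# Example: equal weight for both signals.
dense_q, sparse_q = hybrid_scale(
    [1.0, 2.0], {"indices": [3, 7], "values": [4.0, 2.0]}, alpha=0.5
)
```

With dot-product scoring, scaling the query vectors this way is equivalent to computing `alpha * dense_score + (1 - alpha) * sparse_score` for each document.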

  3. Role of SPLADE

For the sparse vectors, we use SPLADE, a learned sparse embedding method that is reported to outperform traditional techniques like BM25 on a range of retrieval tasks. In addition to the usual benefits of sparse search, SPLADE learns term expansion: it can assign weight to terms that do not appear in the text, which addresses the vocabulary mismatch problem often faced in question-answering systems.
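To make the term-expansion idea concrete, here is a minimal sketch of the pooling step that turns masked-language-model logits into a sparse vector. It assumes the max-pooling variant, where each vocabulary weight is the maximum over tokens of log(1 + ReLU(logit)). The function name `splade_pool` and the toy three-term vocabulary are hypothetical; in practice the logits would come from the SPLADE model's masked-LM head, and the project's actual code may differ.

```python
import math
from typing import Dict, List


def splade_pool(
    logits: List[List[float]], attention_mask: List[int]
) -> Dict[int, float]:
    """Pool per-token vocabulary logits into sparse term weights.

    logits[i][j] is the logit of vocabulary term j at token position i.
    Positions whose attention_mask entry is 0 (padding) are ignored.
    Returns {vocab_index: weight} for non-zero weights only.
    """
    if not logits:
        return {}
    weights = [0.0] * len(logits[0])
    for token_logits, keep in zip(logits, attention_mask):
        if not keep:
            continue
        for j, logit in enumerate(token_logits):
            w = math.log1p(max(logit, 0.0))  # log(1 + ReLU(logit))
            if w > weights[j]:
                weights[j] = w
    return {j: w for j, w in enumerate(weights) if w > 0.0}


# Toy example: 3 token positions (the last one is padding), vocabulary of 3.
toy_logits = [[0.0, 1.0, -2.0], [3.0, 0.5, -1.0], [9.0, 9.0, 9.0]]
sparse = splade_pool(toy_logits, attention_mask=[1, 1, 0])
```

A vocabulary term can receive weight even when it is not one of the input tokens, because every position produces logits over the whole vocabulary. That is where the learned expansion comes from.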

  4. Integration of Sentence Transformers

For the dense vectors, we use a sentence transformer model trained on the MS-MARCO dataset. Sentence transformers encode the semantic context of a whole query or passage into a single vector, which makes them an integral part of our hybrid search approach.
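As a rough illustration of the dense side, the sketch below encodes text with the sentence-transformers library and ranks passages by dot product, which is the similarity the `msmarco-bert-base-dot-v5` model named in the setup section is tuned for. The helper names `encode` and `rank_by_dot` are hypothetical, and the project itself may query a vector index instead of ranking locally.

```python
from typing import List, Sequence, Tuple


def encode(
    texts: Sequence[str], model_name: str = "msmarco-bert-base-dot-v5"
) -> List[List[float]]:
    """Encode texts into dense vectors (downloads the model on first use)."""
    # Imported lazily so the ranking helper below works without the library.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    return model.encode(list(texts)).tolist()


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def rank_by_dot(
    query_vec: Sequence[float], doc_vecs: Sequence[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs, highest dot product first."""
    scores = [(i, dot(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Ranking with small hand-written vectors (no model download needed).
ranking = rank_by_dot([1.0, 0.0], [[0.0, 1.0], [2.0, 0.0], [1.0, 5.0]])
# ranking == [(1, 2.0), (2, 1.0), (0, 0.0)]
```

With real text, `rank_by_dot(encode([query])[0], encode(passages))` would give the same kind of ranking over model embeddings.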

  5. Implementation via Hugging Face Transformers

All these components (SPLADE, the sentence transformer model, and the hybrid search approach) are implemented through Hugging Face Transformers, a widely used library for state-of-the-art Natural Language Processing tasks.

  6. Conclusion

By combining sentence transformers and SPLADE in a hybrid search approach, our project provides a question-answering system that handles both semantic understanding and precise term matching. This makes it suitable for use cases that need advanced search, where results must balance exact-term precision with an understanding of context.

How to run the Project

  1. Clone the project.
  2. In the backend folder, create a .env file with the following variables:
model_id_ner=dslim/bert-base-NER
dense_model_id=msmarco-bert-base-dot-v5
sparse_model_id=naver/efficient-splade-V-large-doc
api_key=<Pinecone API key>
environment=<Pinecone environment>
index_name=<index name>
batch_size=64
  3. Start the backend: uvicorn app:app
  4. In the frontend folder, run npm install
  5. Then run npm start
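How the backend reads the .env file is not shown here, so the following is only a sketch of one way to parse and validate those settings with the standard library before starting the server. The helper names `parse_env` and `validate`, and the decision to treat every listed key as required, are assumptions rather than the project's actual code.

```python
from typing import Dict

# Keys listed in the .env step above; all are treated as required here.
REQUIRED_KEYS = (
    "model_id_ner",
    "dense_model_id",
    "sparse_model_id",
    "api_key",
    "environment",
    "index_name",
    "batch_size",
)


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, tolerating spaces around '=' and blank lines."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def validate(settings: Dict[str, str]) -> Dict[str, object]:
    """Fail early on missing keys and convert batch_size to an int."""
    missing = [key for key in REQUIRED_KEYS if not settings.get(key)]
    if missing:
        raise ValueError("missing .env keys: " + ", ".join(missing))
    checked: Dict[str, object] = dict(settings)
    checked["batch_size"] = int(settings["batch_size"])
    return checked


# Usage: settings = validate(parse_env(open(".env").read()))
```

Checking the file up front gives a clear error message when, for example, the Pinecone key is missing, rather than a failure later during indexing or querying.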

Contributors

manish06097
