awesome-semantic-search

In Semantic search with embeddings, I described how to build semantic search systems (also called neural search). These systems are used more and more as indexing techniques improve and representation learning gets better every year with new deep learning papers. The Medium post explains how to build them; this list references interesting resources on the topic so that anyone can quickly start building such systems.

  • Tutorials explain in depth how to build semantic search systems
  • Good datasets to build semantic search systems
    • TensorFlow Datasets: building search systems only requires images or text, so many TF datasets are interesting in that regard
    • Torchvision datasets: the datasets provided for vision are also interesting for this
  • Pretrained encoders make it possible to quickly build a new system without training
    • Vision+Language
      • CLIP: encodes images and text in the same space
    • Image
      • EfficientNet-B0: a simple way to encode images
      • DINO: an encoder trained with self-supervision that reaches high kNN classification performance
      • Face embeddings: compute face embeddings
    • Text
      • LaBSE: a BERT text encoder trained for similarity that puts sentences from 109 languages in the same space
    • Misc
      • Jina examples: examples of how to use pretrained encoders to build search systems
      • VectorHub: image, text, and audio encoders
  • Similarity learning allows you to build new similarity encoders
  • Indexing and approximate kNN: indexing makes it possible to create small indices that encode millions of embeddings and can be queried in milliseconds
    • Faiss: many approximate kNN algorithms (IVF, HNSW, flat, GPU, …) in C++ with a Python interface
    • Autofaiss: to use Faiss easily
    • Nmslib: a fast implementation of HNSW
    • Annoy: an approximate kNN algorithm by Spotify
    • ScaNN: an approximate kNN algorithm by Google, reported to be faster than HNSW
    • Catalyzer: training the quantizer with backpropagation
    • hora: approximate kNN implemented in Rust
  • Search pipelines allow fast serving and customization of how the indices are queried
    • Milvus: an end-to-end similarity engine, built on top of Faiss and hnswlib
    • Jina: a flexible end-to-end similarity engine
    • Haystack: a question answering pipeline for text
  • Companies: many companies are being built around semantic search systems
    • Jina is building flexible pipelines to encode and search with embeddings
    • Weaviate is building a cloud-native vector search engine
    • Pinecone is a startup building databases that index embeddings
    • Vector AI is building an encoder hub
    • Milvus builds an end-to-end open source semantic search system
    • FeatureForm's embeddinghub combines a database and kNN
    • Vespa is a kNN-based managed retrieval engine
    • Many other companies are using these systems and releasing open tools along the way, and the list would be too long to include here (for example Facebook with Faiss and self-supervision, Google with ScaNN and thousands of papers, Microsoft with SPTAG, Spotify with Annoy, Criteo with rsvd, deepr, autofaiss, …)
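
To make the encoder-plus-index idea above concrete, here is a minimal sketch of exact (brute-force) nearest-neighbour search over embeddings, which is roughly what a "flat" index does before any approximation is applied. It is not taken from any of the libraries listed: the `FlatIndex` class, the `normalize` helper, and the 3-dimensional toy vectors are all hypothetical and stand in for real encoder outputs (which typically have hundreds of dimensions). Approximate kNN methods such as IVF or HNSW exist to avoid the full scan done here.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


class FlatIndex:
    """Hypothetical exact nearest-neighbour index: stores every vector, scans all on query."""

    def __init__(self):
        self.ids = []
        self.vectors = []

    def add(self, item_id, embedding):
        # Store the unit-normalized embedding alongside its identifier.
        self.ids.append(item_id)
        self.vectors.append(normalize(embedding))

    def search(self, query, k=3):
        # Score every stored vector against the query (cosine similarity),
        # then return the k best (id, score) pairs, highest score first.
        q = normalize(query)
        scored = [
            (sum(a * b for a, b in zip(q, v)), item_id)
            for item_id, v in zip(self.ids, self.vectors)
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [(item_id, score) for score, item_id in scored[:k]]


# Toy embeddings (assumed values, not produced by a real encoder).
index = FlatIndex()
index.add("cat photo", [0.9, 0.1, 0.0])
index.add("dog photo", [0.8, 0.3, 0.1])
index.add("tax form", [0.0, 0.1, 0.9])

results = index.search([1.0, 0.2, 0.0], k=2)
for item_id, score in results:
    print(f"{item_id}: {score:.3f}")
```

The scan costs time proportional to the number of stored vectors, which is fine for thousands of items but not for millions; that is the gap the indexing libraries above are meant to fill.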

Contributors: rom1504
