
fastembed's Introduction

⚡️ What is FastEmbed?

FastEmbed is a lightweight, fast Python library built for embedding generation. We support popular text models; please open a GitHub issue if you want us to add a new model.

The default text embedding (TextEmbedding) model is Flag Embedding, which is listed on the MTEB leaderboard. It supports "query" and "passage" prefixes for the input text. The documentation includes an example of Retrieval Embedding Generation and a guide on using FastEmbed with Qdrant.
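If you don't want to add the prefixes by hand, TextEmbedding also exposes query_embed and passage_embed helpers. A minimal sketch; the exact prefix behavior depends on the model and on your installed fastembed version:

from fastembed import TextEmbedding

model = TextEmbedding()  # defaults to BAAI/bge-small-en-v1.5

# The helpers below apply the model's expected prefixes/instructions for you
query_vectors = list(model.query_embed(["What is FastEmbed?"]))
passage_vectors = list(model.passage_embed(["FastEmbed is a lightweight embedding library."]))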

📈 Why FastEmbed?

  1. Light: FastEmbed is a lightweight library with few external dependencies. We don't require a GPU and don't download GBs of PyTorch dependencies; instead we use the ONNX Runtime. This makes it a great candidate for serverless runtimes like AWS Lambda.

  2. Fast: FastEmbed is designed for speed. We use the ONNX Runtime, which is faster than PyTorch for inference, and we use data parallelism for encoding large datasets.

  3. Accurate: The default FastEmbed model is more accurate than OpenAI Ada-002. We also support an ever-expanding set of models, including several multilingual ones.

🚀 Installation

To install the FastEmbed library, pip works best. You can install it with or without GPU support:

pip install fastembed

# or with GPU support

pip install fastembed-gpu

📖 Quickstart

from fastembed import TextEmbedding
from typing import List

# Example list of documents
documents: List[str] = [
    "This is built to be faster and lighter than other embedding libraries e.g. Transformers, Sentence-Transformers, etc.",
    "fastembed is supported by and maintained by Qdrant.",
]

# This will trigger the model download and initialization
embedding_model = TextEmbedding()
print("The model BAAI/bge-small-en-v1.5 is ready to use.")

embeddings_generator = embedding_model.embed(documents)  # reminder: this is a generator
embeddings_list = list(embedding_model.embed(documents))  # you can also convert the generator to a list (and then to a numpy array)
len(embeddings_list[0])  # vector of 384 dimensions

FastEmbed supports a variety of models for different tasks and modalities. The list of all available models can be found in the documentation.
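You can also inspect the supported models programmatically; a small sketch, assuming your installed version exposes the list_supported_models class method:

from fastembed import TextEmbedding

# Each entry describes one supported model (name, output dimension, size, ...)
for model_info in TextEmbedding.list_supported_models():
    print(model_info["model"], model_info.get("dim"))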

🎒 Dense text embeddings

from fastembed import TextEmbedding

model = TextEmbedding(model_name="BAAI/bge-small-en-v1.5")
embeddings = list(model.embed(documents))

# [
#   array([-0.1115,  0.0097,  0.0052,  0.0195, ...], dtype=float32),
#   array([-0.1019,  0.0635, -0.0332,  0.0522, ...], dtype=float32)
# ]

🔱 Sparse text embeddings

  • SPLADE++
from fastembed import SparseTextEmbedding

model = SparseTextEmbedding(model_name="prithivida/Splade_PP_en_v1")
embeddings = list(model.embed(documents))

# [
#   SparseEmbedding(indices=[ 17, 123, 919, ... ], values=[0.71, 0.22, 0.39, ...]),
#   SparseEmbedding(indices=[ 38,  12,  91, ... ], values=[0.11, 0.22, 0.39, ...])
# ]

🦥 Late interaction models (aka ColBERT)

from fastembed import LateInteractionTextEmbedding

model = LateInteractionTextEmbedding(model_name="colbert-ir/colbertv2.0")
embeddings = list(model.embed(documents))

# [
#   array([
#       [-0.1115,  0.0097,  0.0052,  0.0195, ...],
#       [-0.1019,  0.0635, -0.0332,  0.0522, ...],
#   ]),
#   array([
#       [-0.9019,  0.0335, -0.0032,  0.0991, ...],
#       [-0.2115,  0.8097,  0.1052,  0.0195, ...],
#   ]),  
# ]

🖼️ Image embeddings

from fastembed import ImageEmbedding

images = [
    "./path/to/image1.jpg",
    "./path/to/image2.jpg",
]

model = ImageEmbedding(model_name="Qdrant/clip-ViT-B-32-vision")
embeddings = list(model.embed(images))

# [
#   array([-0.1115,  0.0097,  0.0052,  0.0195, ...], dtype=float32),
#   array([-0.1019,  0.0635, -0.0332,  0.0522, ...], dtype=float32)
# ]

⚡️ FastEmbed on a GPU

FastEmbed supports running on GPU devices. It requires installation of the fastembed-gpu package.

pip install fastembed-gpu

Check our example for detailed instructions and CUDA 12.x support.

from fastembed import TextEmbedding

embedding_model = TextEmbedding(
    model_name="BAAI/bge-small-en-v1.5", 
    providers=["CUDAExecutionProvider"]
)
print("The model BAAI/bge-small-en-v1.5 is ready to use on a GPU.")

Usage with Qdrant

Installation with Qdrant Client in Python:

pip install qdrant-client[fastembed]

or

pip install qdrant-client[fastembed-gpu]

On zsh you might have to quote the extras: pip install 'qdrant-client[fastembed]'.

from qdrant_client import QdrantClient

# Initialize the client
client = QdrantClient("localhost", port=6333) # For production
# client = QdrantClient(":memory:") # For small experiments

# Prepare your documents, metadata, and IDs
docs = ["Qdrant has Langchain integrations", "Qdrant also has Llama Index integrations"]
metadata = [
    {"source": "Langchain-docs"},
    {"source": "Llama-index-docs"},
]
ids = [42, 2]

# If you want to change the model:
# client.set_model("sentence-transformers/all-MiniLM-L6-v2")
# List of supported models: https://qdrant.github.io/fastembed/examples/Supported_Models

# Use the new add() instead of upsert()
# This internally calls embed() of the configured embedding model
client.add(
    collection_name="demo_collection",
    documents=docs,
    metadata=metadata,
    ids=ids
)

search_result = client.query(
    collection_name="demo_collection",
    query_text="This is a query document"
)
print(search_result)


fastembed's Issues

clip-ViT-B-32-multilingual-v1 support (PS: I can contribute)

I exported clip-ViT-B-32-multilingual-v1 to ONNX with some modifications (no effect on the output embedding).

The HF Optimum ONNX export can export this model with its (0) Transformer and (1) Pooling modules, but it cannot include the provided Dense layer. So I created a model that combines the three layers as follows:

CombinedModel

from sentence_transformers import SentenceTransformer
from sentence_transformers import models
import torch
import torch.nn as nn
import onnx
import numpy as np

class CombinedModel(nn.Module):
    def __init__(self, transformer_model, dense_model):
        super(CombinedModel, self).__init__()
        self.transformer = transformer_model
        self.dense = dense_model

    def forward(self, input_ids, attention_mask):
        outputs = self.transformer({'input_ids': input_ids, 'attention_mask': attention_mask})
        token_embeddings = outputs['token_embeddings']
        dense_output = self.dense({'sentence_embedding': token_embeddings})
        dense_output_tensor = dense_output['sentence_embedding']
        
        ### this was important for me. it took me a bit to figure out that original model takes the mean of dense output
        mean_output = torch.mean(dense_output_tensor, dim=1)
        flattened_output = mean_output.squeeze(0)
        return flattened_output

Combine the Dense layer with the original model

transformer_model = SentenceTransformer('clip-ViT-B-32-multilingual-v1', cache_folder='model_pytorch')
tokenizer = transformer_model.tokenizer

### this is from dense model configuration
dense_model = models.Dense(
    in_features=768,
    out_features=512,
    bias=False,
    activation_function= nn.Identity()
)

### load the weights from dense model binary
state_dict = torch.load('model_pytorch/sentence-transformers_clip-ViT-B-32-multilingual-v1/2_Dense/pytorch_model.bin')
dense_model.load_state_dict(state_dict)

model = CombinedModel(transformer_model, dense_model)

Export the combined model to ONNX

model.eval()

input_text = "This is a multi-lingual version of the OpenAI CLIP-ViT-B32 model. You can map text (in 50+ languages) and images to a common dense vector space such that images and the matching texts are close."

inputs = tokenizer(input_text, padding='longest', truncation=True, max_length=128, return_tensors='pt')
input_ids = inputs['input_ids']
attention_mask = inputs['attention_mask']

# Export the model
torch.onnx.export(model,               # model being run
                  (input_ids, attention_mask), # model input (or a tuple for multiple inputs)
                  "combined_model.onnx", # where to save the model (can be a file or file-like object)
                  export_params=True,        # store the trained parameter weights inside the model file
                  opset_version=17,          # the ONNX version to export the model to
                  do_constant_folding=True,  # whether to execute constant folding for optimization
                  input_names = ['input_ids', 'attention_mask'],   # the model's input names
                  output_names = ['output'], # the model's output names
                  dynamic_axes={'input_ids': {0 : 'batch_size', 1: 'seq_length'},    # variable length axes
                                'attention_mask': {0 : 'batch_size', 1: 'seq_length'},
                                'output' : {0 : 'batch_size'}})

onnx.checker.check_model("combined_model.onnx")
combined_model = onnx.load("combined_model.onnx")

Compare the original and ONNX model outputs:

import torch
import numpy as np
import onnxruntime as ort
from sentence_transformers import SentenceTransformer
from transformers import AutoTokenizer

model = SentenceTransformer('sentence-transformers/clip-ViT-B-32-multilingual-v1')
tokenizer = AutoTokenizer.from_pretrained('sentence-transformers/clip-ViT-B-32-multilingual-v1')

# Prepare the input
text = "This is an example sentence."
inputs = tokenizer(text, padding='longest', truncation=True, max_length=128, return_tensors='pt')

# Run the PyTorch model
pytorch_output =  model.encode(text, convert_to_tensor=True, device='cpu')

# Convert the inputs to numpy arrays for the ONNX model
inputs_onnx = {name: tensor.numpy() for name, tensor in inputs.items()}

# Run the ONNX model
sess = ort.InferenceSession("combined_model.onnx")
onnx_output = sess.run(None, inputs_onnx)

# Compare the outputs
print("Are the outputs close?", np.allclose(pytorch_output.detach().numpy(), onnx_output[0], atol=1e-6))

# Calculate the differences between the outputs
differences = pytorch_output.detach().numpy() - onnx_output[0]

# Print the standard deviation of the differences
print("Standard deviation of the differences:", np.std(differences))

print("pytorch_output size:", pytorch_output.size())
print("onnx_output size:", onnx_output[0].shape)

Output:

Are the outputs close? True
Standard deviation of the differences: 1.6167593e-07
pytorch_output size: torch.Size([512])
onnx_output size: (512,)

I would really like to contribute the ONNX model so that novices like me can use the ONNX version easily. I did not find a CONTRIBUTING guide, but I can contribute the model with your directions.

Load fine-tuned model using fastembed

Hi. Your work is great!
I want to load my fine-tuned model into fastembed,
but I can't find any documentation about that.
Can you tell me how to do that?

Thank you

clarify splitting in documentation

I am using embeddings to embed scientific papers. Usually, I use langchain splitters to split a paper into multiple chunks. However, it is not clear to me whether fastembed will do the splitting for me or whether I have to split everything myself (in which case I would have to run the embedding tokenizer to count tokens per paragraph).
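For reference while this gets clarified in the docs: as far as I can tell, fastembed embeds each input string as a single vector and does not split long texts for you, so chunking has to happen beforehand. A rough, purely illustrative sketch with a naive word-window splitter (a stand-in for the langchain splitters mentioned above; "paper.txt" is a hypothetical input file):

from typing import List

from fastembed import TextEmbedding

def split_into_chunks(text: str, max_words: int = 200, overlap: int = 20) -> List[str]:
    # Naive word-window splitter; a real splitter should respect sentences/tokens.
    words = text.split()
    step = max_words - overlap
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), step)]

paper_text = open("paper.txt").read()  # hypothetical input file
chunks = split_into_chunks(paper_text)

model = TextEmbedding()
chunk_embeddings = list(model.embed(chunks))  # one vector per chunk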

Qdrant giving relatively high scores when doing embeddings with `BAAI/bge-small-en-v1.5`

Hi,

I am generating embeddings for a lot of abstracts, each about 500 characters long, using fastembed and BAAI/bge-small-en-v1.5, and inserting them into Qdrant.
I also use the same setup to embed user questions in my LLM application, and then use the embedded vector to find relevant abstracts in Qdrant.

The similarity scores between the embeddings come out quite high, which does not always make sense.

Suppose the below example.

Q: Do you know any info about Atomic habits book by James Clear?

Retrieved abstract from db:
Do you ever know the book? Yeah, this is a very interesting book to read. As we told previously, reading is not kind of obligation activity to do when we have to obligate. Reading should be a habit, a good habit. By reading, you can open the new world and get the power from the world. Everything can be gained through the book. Well in brief, book is very powerful. As what we offer you right here, this natural healing is as one of reading book for you. the score of this answer is 0.72

Q:Who was the creator of dragon ball Xenoverse game?

Retrieved abstract from db:

Abstract Computational algorithms can be described in many methods and implemented in many languages. Here we present an approach using storytelling methods of computer game design in modeling some finite-state machine algorithms and applications requiring user interaction. An open source software Twine is used for the task. Interactive nonlinear stories created with Twine are applications that can be executed in a web browser. Storytelling approach provides an easy-to-understand view on computational power. the score of this record is 0.6188309

I understand that there is some similarity between the questions asked above and their corresponding retrieved abstracts. It is fine that they are retrieved, but not with such high scores; I would score these abstracts lower than 0.7 or 0.6.

This behavior forced me to increase the score_threshold of the database retrieval to 0.75.

Does this not make sense only to me, or is this the normal behavior?

Using sentence-transformers from Hugging Face previously gave lower scores for these kinds of questions.
Using fastembed with BAAI/bge-base-en-v1.5 gave high scores as well, and so did sentence-transformers/all-MiniLM-L6-v2 from fastembed.
Any ideas?
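One way to narrow this down is to compute the similarity for a single pair outside Qdrant with the same model. A small sketch; it computes cosine explicitly instead of assuming normalized vectors, and assumes the query/passage helpers are available in your fastembed version:

import numpy as np
from fastembed import TextEmbedding

model = TextEmbedding(model_name="BAAI/bge-small-en-v1.5")

question = "Do you know any info about Atomic habits book by James Clear?"
abstract = "Do you ever know the book? Yeah, this is a very interesting book to read. ..."  # the stored abstract text

q = np.array(list(model.query_embed([question]))[0])
p = np.array(list(model.passage_embed([abstract]))[0])

cosine = float(np.dot(q, p) / (np.linalg.norm(q) * np.linalg.norm(p)))
print(f"cosine similarity: {cosine:.4f}")  # compare against the score Qdrant reports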

AttributeError: 'FlagEmbedding' object has no attribute 'embed_documents'

from langchain.text_splitter import CharacterTextSplitter
from langchain.document_loaders import TextLoader

path_f = 'test.txt'

# Load and process the text
loader = TextLoader(path_f)
documents = loader.load()

#text_splitter = CharacterTextSplitter(chunk_size=300, chunk_overlap=70)
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)

with open(path_f, 'r') as file:
    content = file.read()

Hello, I usually load my txt file with the langchain TextLoader so as to split it into chunks for embedding. I was able to do the embedding with the code below, but I am unable to add it to a vector store with chromadb:

from fastembed.embedding import FlagEmbedding as Embedding

fe_model = Embedding(model_name="BAAI/bge-base-en-v1.5", max_length=512)

embeddings = fe_model.passage_embed(texts)
list(embeddings)

Code to add to chromadb

from langchain.vectorstores import Chroma

vectorstore = Chroma.from_documents(documents=texts, embedding=Embedding(), persist_directory="./db2")
print("[+]Done")

Error - AttributeError: 'FlagEmbedding' object has no attribute 'embed_documents'
What could be wrong? I usually save my embeddings this way so I can use them with a langchain LLM like Llama for free. Thanks!
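The error happens because the fastembed object does not implement langchain's Embeddings interface (embed_documents / embed_query), which Chroma.from_documents expects. Below is a minimal adapter sketch; FastEmbedEmbeddings is a hypothetical name and the langchain import path may differ across versions:

from typing import List

from fastembed.embedding import FlagEmbedding
from langchain.embeddings.base import Embeddings  # path may differ in newer langchain versions

class FastEmbedEmbeddings(Embeddings):
    """Hypothetical adapter exposing fastembed through langchain's Embeddings interface."""

    def __init__(self, model_name: str = "BAAI/bge-base-en-v1.5", max_length: int = 512):
        self.model = FlagEmbedding(model_name=model_name, max_length=max_length)

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return [vector.tolist() for vector in self.model.passage_embed(texts)]

    def embed_query(self, text: str) -> List[float]:
        return list(self.model.query_embed(text))[0].tolist()

# vectorstore = Chroma.from_documents(documents=texts, embedding=FastEmbedEmbeddings(), persist_directory="./db2")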

Allow any HF feature extraction model to be used in fastembed

There are various models on HuggingFace with the same architecture type as existing supported models (like Bert, XLM Roberta, etc.). It'll be good to allow any of them to be loaded and used in fastembed.

The corresponding model artifacts (if there is some quantization and format conversion happening behind the scenes) should ideally be stored at a permanent location (maybe on HF hub), to allow for easy reproduction.

cc @NirantK

Universal class for all type of embedding models

It looks like we currently have different base classes for Jina and all the other embedding models.
That creates a confusing user experience and in general over-complicates usage (see the tests, for example).

We need to implement a single common class for all dense text models, which should internally route to the proper internal implementation.
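A rough sketch of the routing idea, with entirely hypothetical names, just to illustrate the shape of a single user-facing class:

from typing import Any, Dict, Iterable, Type

class _DenseTextWorker:
    """Hypothetical base for internal implementations (ONNX, Jina, ...)."""

    def __init__(self, model_name: str, **kwargs: Any):
        self.model_name = model_name

    def embed(self, documents: Iterable[str]) -> Iterable[Any]:
        raise NotImplementedError

class TextEmbedding:
    """Single entry point that routes to the proper internal implementation."""

    # hypothetical registry: model name -> implementation class
    _IMPLEMENTATIONS: Dict[str, Type[_DenseTextWorker]] = {}

    def __init__(self, model_name: str, **kwargs: Any):
        impl_cls = self._IMPLEMENTATIONS.get(model_name)
        if impl_cls is None:
            raise ValueError(f"Model {model_name!r} is not supported")
        self._impl = impl_cls(model_name, **kwargs)

    def embed(self, documents: Iterable[str]) -> Iterable[Any]:
        return self._impl.embed(documents)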

Is BAAI/bge-small-en-v1.5 supported?

It looks like fast-bge-small-en-v1.5.tar.gz exists on GCS but is not complete.

    259 tokenizer_path = model_dir / "tokenizer.json"
    260 if not tokenizer_path.exists():
--> 261     raise ValueError(f"Could not find tokenizer.json in {model_dir}")
    262 model_path = model_dir / "model_optimized.onnx"
    263 if not model_path.exists():

Could you add releases to GitHub?

At the moment, there are no GitHub releases.

For context, I'm packaging this for a Linux distribution (NixOS).
This package is awesome, thank you for making it!
(Qdrant too, btw!)

Allow to use data parallelization instead of onnx multithreading

Currently, we have a single-threaded loop that feeds batches into the multi-threaded ONNX runtime.
This is fine when we want to minimize the latency of a single batch; however, for the task of encoding a lot of documents, data-level parallelism could be more efficient and give better throughput.

The suggestion is to let the user select which type of parallelism fits which situation.
In the case of the Qdrant integration: use batch-level parallelism on queries and data-level parallelism on add.

Data-level parallelism can be implemented using the Python multiprocessing library.
It is quite easy to break things with multiprocessing, so I suggest re-using the implementation from https://github.com/qdrant/qdrant-client/blob/master/qdrant_client/parallel_processor.py

If data parallelism is used, the ONNX runtime will be forked across processes and should be configured to use only one thread per fork.
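For illustration, a minimal sketch of the data-parallel pattern with plain multiprocessing (not the qdrant-client parallel_processor); it assumes each worker can afford to load its own model and omits the per-fork ONNX thread configuration:

import multiprocessing as mp
from typing import List

from fastembed import TextEmbedding

_model = None  # one model instance per worker process

def _init_worker() -> None:
    global _model
    # Ideally the ONNX session here would be limited to a single intra-op thread.
    _model = TextEmbedding()

def _embed_chunk(chunk: List[str]):
    return list(_model.embed(chunk))

def embed_data_parallel(documents: List[str], workers: int = 4, chunk_size: int = 256):
    chunks = [documents[i:i + chunk_size] for i in range(0, len(documents), chunk_size)]
    with mp.Pool(processes=workers, initializer=_init_worker) as pool:
        results = pool.map(_embed_chunk, chunks)
    return [vec for chunk_vecs in results for vec in chunk_vecs]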

Compressed file ended before the end-of-stream marker was reached

"name": "EOFError",
"message": "Compressed file ended before the end-of-stream marker was reached",
This is the code i used 
from fastembed.embedding import FlagEmbedding as Embedding
from typing import List
import numpy as np

documents: List[str] = [
    "passage: Hello, World!",
    "query: Hello, World!",  # these are two different embedding
    "passage: This is an example passage.",
    "fastembed is supported by and maintained by Qdrant.",  # You can leave out the prefix but it's recommended
]
embedding_model = Embedding(model_name="BAAI/bge-base-en", max_length=512)
embeddings: List[np.ndarray] = embedding_model.embed(documents)

"sentence-transformers/all-MiniLM-L6-v2" - incorrect embeddings and rather slow speedup.

I wrote a small unit test. Your models seem to have a couple of issues:

  • Inconsistency of created embedding vs Sentence transformers (the sentences are different) - wrong conversion?
  • No onnx-gpu
sentence_transformers=2.22
fastembed=0.5.0
torch=2.0.0
import json
import timeit

import numpy as np
from sentence_transformers import SentenceTransformer
from fastembed.embedding import FlagEmbedding

model_name_or_path="sentence-transformers/all-MiniLM-L6-v2"

model_fast = FlagEmbedding(model_name_or_path)
model_st = SentenceTransformer(model_name_or_path)

sample_sentence = [f"{list(range(i))} " for i in range(64)]

got = np.stack(list(model_fast.embed(sample_sentence)))
want = model_st.encode(sample_sentence, normalize_embeddings=True)

# FAILS here Mismatched elements: 24384 / 24576 (99.2%)
np.testing.assert_almost_equal(
    got, want
)

# 2.0177175840362906 vs 2.4251126241870224
print(
timeit.timeit(lambda: list(model_fast.embed(sample_sentence)), number=10), "vs",
timeit.timeit(lambda: model_st.encode(sample_sentence, normalize_embeddings=True), number=10))

Embedding Limit

Good Day,

I must say that this embedding tool actually does what it says: I was able to embed 8 million tokens in just 3 hours on my MacBook Pro M1, whereas Ollama embedding had been running for over 2 days before I just had to end it.

But when I used a chat model to polish the retrieved references into human-like responses, it was unable to pick some things out of the context.

So my question is: does it have a limit on how much it embeds, such that it left some things out?

Progress bar?

Thank you for this great library! It's my go-to library now instead of sentence-transformers for web services.

I was wondering... can a progress bar be added to the embedding process? Something as simple as rich or tqdm.

embeddings = model.embed(documents, progress=True)
embeddings = list(embeddings)

# progress bar here
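Until something like the proposed progress= argument exists, a workaround using only public APIs is to wrap the generator with tqdm yourself; a small sketch:

from tqdm import tqdm
from fastembed import TextEmbedding

model = TextEmbedding()
documents = ["doc one", "doc two", "doc three"]  # your corpus here

# embed() is lazy, so the bar advances as documents are actually encoded
embeddings = list(tqdm(model.embed(documents), total=len(documents)))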

Add support for custom models

For most of our customers trying to use models trained on regulatory data, it would be great if FastEmbed supported custom models.

Promote Huggingface Hub to first class citizen

The latest plans are in the most recent comment at the end

Error handling improvements will come from two main changes:

  1. Migrating away from GCP to the Hugging Face Hub completely (a download sketch follows at the end of this issue)
    1. This will reduce the edge cases we need to maintain, including file renaming and similar code
  2. For models which we push to the HF Hub, we can add a "name" and a "sources" field
    1. where the name is the HF Hub base model and sources is a list of community or Qdrant models

This issue is about the first one.

How to push models?

This is a good reference contribution: https://huggingface.co/weakit-v/bge-base-en-v1.5-onnx/tree/main

This is what we should aim to replicate as much as we can. We'll have these models under the Qdrant Huggingface Hub account instead. So they'd be something like: qdrant/bge-base-en-v1.5-onnx

{
  "name": "BAAI/bge-base-en-v1.5",
  "sources": ["qdrant/bge-base-en-v1.5-onnx", "weakit-v/bge-base-en-v1.5-onnx"]
}

We'll have to do this for each model, one at a time:

  • BAAI/bge-small-en-v1.5
  • BAAI/bge-base-en-v1.5
  • sentence-transformers/all-MiniLM-L6-v2 — do not quantize the model and push as is
  • intfloat/multilingual-e5-large
  • jinaai/jina-embeddings-v2-small-en — we should be able to retain the existing embedding implementation
  • jinaai/jina-embeddings-v2-base-en — we should be able to retain the existing embedding implementation

In this process, we deprecate the following models by not porting them from GCP to HF Hub on our account:

  1. BAAI/bge-small-en
  2. BAAI/bge-small-zh-v1.5
  3. BAAI/bge-base-en
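To make the first step concrete on the consumption side, fetching such a repo from the Hub could look roughly like this (a sketch using huggingface_hub; qdrant/bge-base-en-v1.5-onnx is the planned location, not necessarily live yet):

from huggingface_hub import snapshot_download

# Downloads (and caches) all files of the ONNX model repo and returns the local directory path
model_dir = snapshot_download(repo_id="qdrant/bge-base-en-v1.5-onnx")
print(model_dir)  # would contain e.g. the ONNX file and tokenizer.json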

please update "fastembed 0.1.1 requires tokenizers<0.14,>=0.13, but..."

Other packages keep installing updated versions of e.g. tokenizers AFTER I downgraded for fastembed...

Installing collected packages: tokenizers
  Attempting uninstall: tokenizers
    Found existing installation: tokenizers 0.13.0
    Uninstalling tokenizers-0.13.0:
      Successfully uninstalled tokenizers-0.13.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
fastembed 0.1.1 requires tokenizers<0.14,>=0.13, but you have tokenizers 0.15.0 which is incompatible.
Successfully installed tokenizers-0.15.0

EmbeddingModel is not an abstract class

Inheriting from abc.ABC does not prevent instantiating a class if it has no @abc.abstractmethod methods.
Thus we either need to add some abstract methods to prevent instantiation of EmbeddingModel, or stop inheriting from ABC.

Since we create an instance of EmbeddingModel in EmbeddingWorker, I believe we need to remove the inheritance.
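A tiny demonstration of the behavior described above (plain Python, nothing fastembed-specific):

import abc

class WithoutAbstract(abc.ABC):
    def embed(self):
        return []

class WithAbstract(abc.ABC):
    @abc.abstractmethod
    def embed(self): ...

WithoutAbstract()      # works: no abstract methods, so ABC does not block instantiation
try:
    WithAbstract()     # TypeError: can't instantiate abstract class with abstract method embed
except TypeError as err:
    print(err)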

Add support for Image/Multimodal Model

Considering Qdrant would support "any" kind of embedding, we should have a way to process image/multimodal embeddings.

Is there an existing way to do it through Fastembed?

Support BAAI/bge-large-en-v1.5

It's currently the best MTEB model (by a small margin though) but significantly larger than the base model.

Issue adding fastembed as dependency in poetry due to python requirement

Installing the current project: infinity_emb (0.0.2)

infinity_emb $poetry lock --no-update
Resolving dependencies... (0.1s)

The current project's Python requirement (>=3.10,<4.0) is not compatible with some of the required packages Python requirement:
  - fastembed requires Python >=3.8.0,<3.12, so it will not be satisfied for Python >=3.12,<4.0
[tool.poetry.dependencies]
python = ">=3.10,<4.0"
fastapi = "^0.103.2"
fastembed = {version = "^0.0.5", optional=true}

Wrap and support Sparse Vector Creation

FastEmbed should/can support sparse vector creation based on bag-of-words approaches, e.g. TF-IDF and BM25 Okapi. We can launch with existing Python implementations, e.g. https://pypi.org/project/rank-bm25/

This will help adoption of sparse vectors within the Qdrant ecosystem itself, as we can recommend this as the canonical place to create sparse vectors.
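For illustration, a self-contained sketch of what per-document sparse vectors could look like (indices plus values, matching the SparseEmbedding shape shown earlier); it hand-rolls the BM25 term weights instead of calling rank-bm25:

import math
import re
from collections import Counter
from typing import Dict, List, Tuple

def bm25_sparse_vectors(docs: List[str], k1: float = 1.5, b: float = 0.75) -> Tuple[Dict[str, int], List[dict]]:
    tokenized = [re.findall(r"\w+", d.lower()) for d in docs]
    vocab = {t: i for i, t in enumerate(sorted({t for doc in tokenized for t in doc}))}
    avgdl = sum(len(doc) for doc in tokenized) / len(tokenized)
    df = Counter(t for doc in tokenized for t in set(doc))
    n = len(docs)

    vectors = []
    for doc in tokenized:
        tf = Counter(doc)
        indices, values = [], []
        for term, freq in tf.items():
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            weight = idf * freq * (k1 + 1) / (freq + k1 * (1 - b + b * len(doc) / avgdl))
            indices.append(vocab[term])
            values.append(weight)
        vectors.append({"indices": indices, "values": values})
    return vocab, vectors

vocab, sparse = bm25_sparse_vectors(["hello sparse world", "sparse vectors with bm25"])
print(sparse[0])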

Add model sentence-transformers/multi-qa-MiniLM-L6-cos-v1

Hi, is there a way to support "sentence-transformers/multi-qa-MiniLM-L6-cos-v1"? It is similar to sentence-transformers/all-MiniLM-L6-v2.

Hopefully nothing needs to change to support this version; it was just trained on questions and answers, and might be more relevant for databases where I am trying to match specific queries that are submitted as questions.

Issue with 1.5 model

>>> Embedding("BAAI/bge-small-en-v1.5")
Was not able to download fast-bge-small-en-v1.5.tar.gz, trying BAAI-bge-small-en-v1.5.tar.gz
Traceback (most recent call last):
  File "/Users/amankishore/Library/Python/3.9/lib/python/site-packages/fastembed/embedding.py", line 339, in retrieve_model
    self.download_file_from_gcs(
  File "/Users/amankishore/Library/Python/3.9/lib/python/site-packages/fastembed/embedding.py", line 245, in download_file_from_gcs
    raise PermissionError(
PermissionError: Authentication Error: You do not have permission to access this resource. Please check your credentials.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/amankishore/Library/Python/3.9/lib/python/site-packages/fastembed/embedding.py", line 422, in __init__
    self._model_dir = self.retrieve_model(model_name, cache_dir)
  File "/Users/amankishore/Library/Python/3.9/lib/python/site-packages/fastembed/embedding.py", line 346, in retrieve_model
    self.download_file_from_gcs(
  File "/Users/amankishore/Library/Python/3.9/lib/python/site-packages/fastembed/embedding.py", line 245, in download_file_from_gcs
    raise PermissionError(
PermissionError: Authentication Error: You do not have permission to access this resource. Please check your credentials.

Host model weights on Huggingface

Big organizations put up firewalls that make it impossible to download model weights. It is unlikely that large organizations have the capacity to whitelist every single source of model weights for every derivative package. However, a blanket approval is often granted for Hugging Face (via proxies that do malware scanning and license checks).

So my ask is: would it be possible to use Hugging Face as a host? It doesn't need to be the primary host, but for poor folks behind corporate firewalls, having a secondary host on Hugging Face might just save them tons of bureaucracy. It will definitely help with adoption, as otherwise I'd be forced to stick with (sentence-)transformers.

Single cache_dir determination

Move the cache_dir logic to the Embedding class rather than the specific implementations. This is library-wide, not specific to a model.
