Comments (9)
fastembed=0.5.0
Please try v0.1.1.
from fastembed.
FYI, for "BAAI/bge-base-en" I get a cosine_sim of ~0.999. For "sentence-transformers/all-MiniLM-L6-v2" it's around 0.223.
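For anyone reproducing these numbers: the cosine_sim above is presumably the usual cosine similarity between the two libraries' vectors for the same sentence. A minimal sketch of that metric follows. The function name and the vectors are hypothetical placeholders, not real model outputs and not part of either library's API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical placeholder vectors, NOT real embeddings.
reference = [0.10, -0.20, 0.30]
same_direction = [0.20, -0.40, 0.60]  # reference scaled by 2
different = [0.30, 0.10, -0.05]

print(cosine_similarity(reference, same_direction))
print(cosine_similarity(reference, different))
```

Because the metric ignores vector length, a value near 1.0 only says the two vectors point the same way. A value like 0.223 for the same sentence and the same model name suggests the two pipelines are producing genuinely different vectors.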
Thanks for flagging this. Will investigate.
My first guess is that these are two different models, possibly because FlagEmbedding is different from Sentence Transformers.
On my system the code above still fails with v0.1.1. Have you tried the above code?
@NirantK For models, I use "sentence-transformers/all-MiniLM-L6-v2" on both sides.
sentence-transformers=2.22
fastembed=0.1.1
sentence = ["This is a test sentence."]
arrays are not almost equal to 1 decimals
Mismatched elements: 2 / 384 (0.521%)
Max absolute difference: 0.81547204
Max relative difference: 2334.82220783
x: array([ 1.4e-02, -1.9e-02, 6.3e-03, 3.0e-02, 1.8e-02, -1.5e-02,
-8.6e-03, 1.3e-02, 1.1e-02, -4.0e-03, -6.7e-04, 7.2e-03,
5.4e-03, 1.2e-02, 1.5e-03, -4.8e-03, 1.8e-02, -1.6e-02,...
y: array([ 8.4e-02, 5.8e-02, 4.5e-03, 1.1e-01, 7.1e-03, -1.8e-02,
-1.7e-02, -1.5e-02, 4.0e-02, 3.3e-02, 1.0e-01, -4.7e-02,
6.9e-03, 4.1e-02, 1.9e-02, -4.1e-02, 2.4e-02, -5.7e-02,...
Hey!
I've not done a thorough analysis, but I've also had some really quirky results with the fastembed embeddings. Some similarity scores don't make much sense at all, so I suspect the embeddings are incorrect.
Hey, I can confirm that the sentence-transformers quantization isn't perfect. The cosine similarity is lower than we'd like.
Retrieval performance didn't degrade much in a small test that I ran, but yes, this is an important issue.
Thanks for flagging this.
I am curious whether this is still present. I want to use fastembed in my Docker containers, but I'm not sure that's feasible with the mismatches.
So, this fix must resolve this issue