Comments (8)
PCA will be integrated as a feature of annlite. It reduces the embedding dimension and therefore the memory footprint. For now, we make a strong assumption: the data used to train the PCA comes from the same distribution as the indexed data, and the distribution of incrementally added data will not shift significantly.
from annlite.
How will this be integrated? Is it a way to do dimensionality reduction? How will you keep ANNLite incremental?
But why is this better than PQ? We already have PQ for "dim reduction", right?
Generally, we want to take advantage of the fact that PCA concentrates most of the information in the first components. In theory, PCA can achieve better ANN results at the same compression rate. In practice, PCA and PQ are usually used together (PCA first, then PQ) to achieve a good accuracy–space trade-off. We also found research demonstrating that the combination works: https://cs.uwaterloo.ca/~jimmylin/publications/Ma_etal_EMNLP2021.pdf
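To make the "PCA first, then PQ" idea concrete, here is a minimal NumPy sketch. It is not annlite's implementation: the helper names (`fit_pca`, `train_pq`, `encode_pq`), the synthetic data, and all sizes (128 to 32 dimensions, 8 sub-vectors, 16 centroids) are assumptions chosen only for illustration, and the PQ step is a toy k-means per sub-vector.

```python
import numpy as np

def fit_pca(x, n_components):
    """Fit PCA via SVD; return (mean, components, explained-variance ratio)."""
    mean = x.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, s, vt = np.linalg.svd(x - mean, full_matrices=False)
    var = s ** 2
    return mean, vt[:n_components], var[:n_components] / var.sum()

def train_pq(x, n_subvectors, n_centroids, n_iter=10, seed=0):
    """Train one small k-means codebook per sub-vector (toy product quantizer)."""
    rng = np.random.default_rng(seed)
    codebooks = []
    for sub in np.split(x, n_subvectors, axis=1):
        # Initialize centroids from randomly chosen training points.
        centroids = sub[rng.choice(len(sub), n_centroids, replace=False)].copy()
        for _ in range(n_iter):
            dists = ((sub[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
            assign = dists.argmin(axis=1)
            for k in range(n_centroids):
                members = sub[assign == k]
                if len(members):  # keep the old centroid if a cluster is empty
                    centroids[k] = members.mean(axis=0)
        codebooks.append(centroids)
    return codebooks

def encode_pq(x, codebooks):
    """Replace each sub-vector by the index of its nearest centroid."""
    codes = []
    for sub, centroids in zip(np.split(x, len(codebooks), axis=1), codebooks):
        dists = ((sub[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        codes.append(dists.argmin(axis=1))
    return np.stack(codes, axis=1).astype(np.uint8)

rng = np.random.default_rng(42)
# Synthetic embeddings (assumption): variance decays quickly across dimensions.
scales = 1.0 / np.arange(1, 129)
data = (rng.standard_normal((1000, 128)) * scales).astype(np.float32)

mean, components, ratio = fit_pca(data, n_components=32)
reduced = (data - mean) @ components.T                 # shape (1000, 32)
codebooks = train_pq(reduced, n_subvectors=8, n_centroids=16)
codes = encode_pq(reduced, codebooks)                  # shape (1000, 8), one byte per code

print("variance kept by 32 components:", float(ratio.sum()))
print("bytes per vector after PQ:", codes.nbytes // len(codes))
```

In this sketch a raw vector takes 128 × 4 = 512 bytes as float32, the PCA output takes 32 × 4 = 128 bytes, and the PQ code takes 8 bytes. How much accuracy survives each step depends on the real embedding distribution, so the numbers printed here say nothing about annlite's behavior on actual data.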
Hello,
would the PCA work with multiple shards of ANNLite? If so, how? Will there be a different projection for each shard, and would that be OK?
Thanks
A good practice is to fit the PCA ahead of time (offline) so that the same projection can be shared across shards.
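One way to picture this, as a rough sketch only: fit the PCA once, save its parameters to a file, and have every shard load that same file. The `Shard` class below is a hypothetical stand-in written in plain NumPy, not annlite's API, and the file name `pca.npz` and all sizes are assumptions.

```python
import os
import tempfile
import numpy as np

def fit_pca(x, n_components):
    """Fit PCA via SVD and return (mean, components)."""
    mean = x.mean(axis=0)
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:n_components]

class Shard:
    """Hypothetical stand-in for one index shard; NOT annlite's API."""

    def __init__(self, pca_path):
        # Every shard loads the same pre-fitted projection from disk.
        with np.load(pca_path) as params:
            self.mean = params["mean"]
            self.components = params["components"]
        self.vectors = np.empty((0, self.components.shape[0]), dtype=np.float32)

    def project(self, x):
        return (x - self.mean) @ self.components.T

    def index(self, x):
        self.vectors = np.vstack([self.vectors, self.project(x)])

    def search(self, query, k):
        # Brute-force L2 search in the reduced space.
        dists = np.linalg.norm(self.vectors - self.project(query), axis=1)
        order = np.argsort(dists)[:k]
        return order, dists[order]

rng = np.random.default_rng(0)
train = rng.standard_normal((500, 64)).astype(np.float32)

# Offline step: fit once and persist the parameters.
mean, components = fit_pca(train, n_components=16)
pca_path = os.path.join(tempfile.mkdtemp(), "pca.npz")
np.savez(pca_path, mean=mean, components=components)

# Online step: each shard loads the shared projection and indexes its own data.
shards = [Shard(pca_path) for _ in range(2)]
for shard, chunk in zip(shards, np.array_split(train, 2)):
    shard.index(chunk)

query = train[0:1]  # a vector that was indexed in shard 0
ids, dists = shards[0].search(query, k=3)
same_projection = np.allclose(shards[0].project(query), shards[1].project(query))
```

Because both shards apply the identical mean and components, distances computed in different shards live in the same reduced space and can be merged directly. With a separate PCA per shard, that would not hold in general.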
Hi @numb3r3, I briefly checked the Projector code, and it doesn't seem to support loading a pretrained PCA. Or am I missing something?
@jemmyshin do we have documentation or examples for this?