Comments (6)
Hello @tommykoctur, was the same docarray version used in both tests?
from annlite.
Hi,
Yes, both tests used docarray==0.21.0.
from annlite.
@tommykoctur Thanks for your reports. We will perform some experiments to identify the cause, and try to fix this issue ASAP.
from annlite.
Storing 1 document (with embedding d 768) took 0.35s vs 1.15s.
@tommykoctur What's the index size in your comparison? Insert performance is very sensitive to the size of the index that has already been built. Inserting a new point into a large index usually takes longer.
from annlite.
```python
import shutil
import time

import numpy as np
from annlite import AnnLite
from docarray import Document, DocumentArray

N = 10000    # documents indexed before the timed insert
REPEAT = 10  # number of timed runs to average over

# directly using Annlite
time_elapse_list = []
indexer = AnnLite(n_dim=768, data_path='./data')
for _ in range(REPEAT):
    # build an index of N documents (not timed)
    da = DocumentArray()
    for i in range(N):
        da.extend([Document(id=str(i), embedding=np.random.randn(768))])
    indexer.index(da)
    time.sleep(1)  # wait for indexing

    # time the insertion of one extra document
    da = DocumentArray([Document(embedding=np.random.randn(768))])
    start_time = time.time()
    indexer.index(da)
    print(indexer.stat)  # make sure that the insertion happens; you are supposed to see index_size=10001 and total_docs=10001
    time_elapse_list.append(time.time() - start_time)

    indexer.clear()
    time.sleep(1)
print(f"===> insert using Annlite: {sum(time_elapse_list) / len(time_elapse_list)}")
indexer.close()
shutil.rmtree('./data')

# using Annlite in docarray
time_elapse_list = []
indexer = DocumentArray(storage='annlite',
                        config={'n_dim': 768, 'data_path': './data'})
for _ in range(REPEAT):
    # build an index of N documents (not timed)
    with indexer:
        da = DocumentArray()
        for i in range(N):
            da.extend([Document(id=str(i), embedding=np.random.randn(768))])
        indexer.extend(da)
    time.sleep(1)

    # time the insertion of one extra document
    d = DocumentArray([Document(embedding=np.random.randn(768))])
    start_time = time.time()
    indexer.extend(d)
    print(indexer._annlite.stat)
    time_elapse_list.append(time.time() - start_time)

    indexer.clear()
    time.sleep(1)
print(f"===> insert using docarray: {sum(time_elapse_list) / len(time_elapse_list)}")
shutil.rmtree('./data')
```
Can you try this script and print out the elapsed time? @tommykoctur
I ran this script on my local machine with annlite==0.5.4 and 0.5.8. 0.5.8 is actually slightly faster than 0.5.4 (about 8 ms vs. 10 ms), since we moved part of the index logic from disk into memory. Could you double-check your index size? Insertion time differs hugely when the index size is different.
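To check how much the index size matters, one option is to time a single insert at several index sizes while keeping everything except the insert call outside the timed region. The following is only a minimal sketch: `time_single_insert`, `build_index` and `insert_one` are hypothetical names, not part of annlite or docarray, and a plain Python list stands in for the index so that the harness runs anywhere.

```python
import statistics
import time


def time_single_insert(build_index, insert_one, sizes, repeat=5):
    """Return {size: median seconds} for one insert into an index of `size` items.

    build_index(size) -> an index pre-filled with `size` items (not timed)
    insert_one(index) -> inserts exactly one new item (timed)
    Both callables are hypothetical hooks supplied by the caller.
    """
    results = {}
    for size in sizes:
        samples = []
        for _ in range(repeat):
            index = build_index(size)    # setup, outside the timed region
            start = time.perf_counter()  # monotonic, high-resolution clock
            insert_one(index)            # only this call is timed
            samples.append(time.perf_counter() - start)
        results[size] = statistics.median(samples)
    return results


# Stand-in index (a plain list), purely to show that the harness runs.
# It says nothing about how AnnLite itself scales.
timings = time_single_insert(
    build_index=lambda size: list(range(size)),
    insert_one=lambda index: index.append(-1),
    sizes=[10, 1000],
    repeat=3,
)
for size, seconds in timings.items():
    print(f"index_size={size}: median insert {seconds:.6f}s")
```

To measure a real index, `build_index` and `insert_one` would wrap the calls from the script above (building the indexer and calling `indexer.index(da)` or `indexer.extend(d)`). Any status printing should stay outside `insert_one` so that it does not count toward the measured time.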
from annlite.
Hi,
Your script doesn't seem to be problematic. I will investigate further why this is happening in our project. Thank you for now.
```
{'total_docs': 10001, 'index_size': 10001, 'n_cells': 1, 'n_dim': 768, 'n_components': None, 'metric': 'COSINE', 'is_trained': True}
{'total_docs': 10001, 'index_size': 10001, 'n_cells': 1, 'n_dim': 768, 'n_components': None, 'metric': 'COSINE', 'is_trained': True}
{'total_docs': 10001, 'index_size': 10001, 'n_cells': 1, 'n_dim': 768, 'n_components': None, 'metric': 'COSINE', 'is_trained': True}
{'total_docs': 10001, 'index_size': 10001, 'n_cells': 1, 'n_dim': 768, 'n_components': None, 'metric': 'COSINE', 'is_trained': True}
{'total_docs': 10001, 'index_size': 10001, 'n_cells': 1, 'n_dim': 768, 'n_components': None, 'metric': 'COSINE', 'is_trained': True}
{'total_docs': 10001, 'index_size': 10001, 'n_cells': 1, 'n_dim': 768, 'n_components': None, 'metric': 'COSINE', 'is_trained': True}
{'total_docs': 10001, 'index_size': 10001, 'n_cells': 1, 'n_dim': 768, 'n_components': None, 'metric': 'COSINE', 'is_trained': True}
{'total_docs': 10001, 'index_size': 10001, 'n_cells': 1, 'n_dim': 768, 'n_components': None, 'metric': 'COSINE', 'is_trained': True}
{'total_docs': 10001, 'index_size': 10001, 'n_cells': 1, 'n_dim': 768, 'n_components': None, 'metric': 'COSINE', 'is_trained': True}
===> insert using Annlite: 0.12578973770141602
{'total_docs': 10001, 'index_size': 10001, 'n_cells': 1, 'n_dim': 768, 'n_components': None, 'metric': 'COSINE', 'is_trained': True}
{'total_docs': 10001, 'index_size': 10001, 'n_cells': 1, 'n_dim': 768, 'n_components': None, 'metric': 'COSINE', 'is_trained': True}
{'total_docs': 10001, 'index_size': 10001, 'n_cells': 1, 'n_dim': 768, 'n_components': None, 'metric': 'COSINE', 'is_trained': True}
{'total_docs': 10001, 'index_size': 10001, 'n_cells': 1, 'n_dim': 768, 'n_components': None, 'metric': 'COSINE', 'is_trained': True}
{'total_docs': 10001, 'index_size': 10001, 'n_cells': 1, 'n_dim': 768, 'n_components': None, 'metric': 'COSINE', 'is_trained': True}
{'total_docs': 10001, 'index_size': 10001, 'n_cells': 1, 'n_dim': 768, 'n_components': None, 'metric': 'COSINE', 'is_trained': True}
{'total_docs': 10001, 'index_size': 10001, 'n_cells': 1, 'n_dim': 768, 'n_components': None, 'metric': 'COSINE', 'is_trained': True}
{'total_docs': 10001, 'index_size': 10001, 'n_cells': 1, 'n_dim': 768, 'n_components': None, 'metric': 'COSINE', 'is_trained': True}
{'total_docs': 10001, 'index_size': 10001, 'n_cells': 1, 'n_dim': 768, 'n_components': None, 'metric': 'COSINE', 'is_trained': True}
{'total_docs': 10001, 'index_size': 10001, 'n_cells': 1, 'n_dim': 768, 'n_components': None, 'metric': 'COSINE', 'is_trained': True}
===> insert using docarray: 0.13696033954620362
```
from annlite.