
Comments (6)

JoanFM commented on May 27, 2024

Hello @tommykoctur, was the same docarray version used in both tests?

from annlite.

tommykoctur commented on May 27, 2024

Hi,

Yes, both tests used docarray==0.21.0.


numb3r3 commented on May 27, 2024

@tommykoctur Thanks for the report. We will run some experiments to identify the cause and try to fix this issue ASAP.


numb3r3 commented on May 27, 2024

> Storing 1 document (with embedding d 768) took 0.35s vs 1.15s.

@tommykoctur What was the index size in your comparison? Insert performance is very sensitive to the size of the index that has already been built: inserting a new point into a large index usually takes longer.
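To make the point about index size concrete, here is a minimal sketch using a toy stand-in, not AnnLite itself. `ToyIndex` and `insert_cost` are hypothetical names invented for this illustration. The toy finds the nearest existing point by a linear scan on every insert, so the work per insert grows with the number of points already stored. Real graph-based indexes generally scale much more gently, so only the direction of the effect carries over, not its magnitude.

```python
import random


class ToyIndex:
    """Toy stand-in for an ANN index (NOT AnnLite's implementation).

    Each insert links the new point to its nearest existing neighbour,
    which is found by scanning every stored point.
    """

    def __init__(self):
        self.points = []
        self.links = []
        self.distance_calls = 0  # counts work done, independent of the machine

    def insert(self, vector):
        nearest, best = None, float("inf")
        for idx, other in enumerate(self.points):
            self.distance_calls += 1
            dist = sum((a - b) ** 2 for a, b in zip(vector, other))
            if dist < best:
                best, nearest = dist, idx
        self.points.append(vector)
        self.links.append(nearest)


def insert_cost(index_size, dim=8, seed=0):
    """Return the number of distance computations needed to insert
    one extra point into a toy index that already holds index_size points."""
    rng = random.Random(seed)
    index = ToyIndex()
    for _ in range(index_size):
        index.insert([rng.gauss(0, 1) for _ in range(dim)])
    before = index.distance_calls
    index.insert([rng.gauss(0, 1) for _ in range(dim)])
    return index.distance_calls - before


for size in (50, 500):
    print(size, insert_cost(size))
```

Counting distance computations instead of wall-clock time keeps the comparison reproducible; when timing a real index, the index size at the moment of insertion should be reported alongside the latency.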


jemmyshin commented on May 27, 2024

```python
from annlite import AnnLite
from docarray import Document, DocumentArray
import numpy as np
import time
import shutil

N = 10000    # number of documents indexed before the timed insert
REPEAT = 10  # number of timed insertions to average over

# Directly using AnnLite
time_elapse_list = []
indexer = AnnLite(n_dim=768, data_path='./data')
for _ in range(REPEAT):
    # Build an index of N documents first
    da = DocumentArray()
    for i in range(N):
        da.extend([Document(id=str(i), embedding=np.random.randn(768))])
    indexer.index(da)
    time.sleep(1)  # wait for indexing

    # Time the insertion of one additional document
    da = DocumentArray([Document(embedding=np.random.randn(768))])

    start_time = time.time()
    indexer.index(da)
    # Make sure that the insertion happens: you are supposed to see
    # index_size=10001 and total_docs=10001
    print(indexer.stat)
    time_elapse_list.append(time.time() - start_time)
    indexer.clear()
    time.sleep(1)
print(f"===> insert using Annlite: {sum(time_elapse_list) / len(time_elapse_list)}")

indexer.close()
shutil.rmtree('./data')

# Using AnnLite through docarray
time_elapse_list = []
indexer = DocumentArray(storage='annlite',
                        config={'n_dim': 768, 'data_path': './data'})
for _ in range(REPEAT):
    with indexer:
        # Build an index of N documents first
        da = DocumentArray()
        for i in range(N):
            da.extend([Document(id=str(i), embedding=np.random.randn(768))])
        indexer.extend(da)
        time.sleep(1)  # wait for indexing

        # Time the insertion of one additional document
        d = DocumentArray([Document(embedding=np.random.randn(768))])

        start_time = time.time()
        indexer.extend(d)
        print(indexer._annlite.stat)
        time_elapse_list.append(time.time() - start_time)
        indexer.clear()
        time.sleep(1)
print(f"===> insert using docarray: {sum(time_elapse_list) / len(time_elapse_list)}")
shutil.rmtree('./data')
```

Can you try this script and print out the elapsed time, @tommykoctur?
I ran this script on my local machine with annlite==0.5.4 and 0.5.8. In fact, 0.5.8 is even faster than 0.5.4 (10ms for 0.5.4 vs. 8ms for 0.5.8), since part of the indexing logic was moved from disk into memory. Could you double-check your index size? Insertion time differs hugely between index sizes.
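When comparing timings across releases, it can help to print the installed package versions next to the numbers. The helper below is a small sketch using only the standard library; `installed_versions` is a name made up for this example and is not part of annlite or docarray.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None  # package is not installed in this environment
    return result


print(installed_versions(["annlite", "docarray"]))
```

Pasting this line of output together with the `stat` dictionary makes it easier to see whether two runs differ in version, in index size, or in both.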


tommykoctur commented on May 27, 2024

Hi,

Your script doesn't seem to show the problem. I will investigate further why this is happening in our project. Thank you for now.

```
{'total_docs': 10001, 'index_size': 10001, 'n_cells': 1, 'n_dim': 768, 'n_components': None, 'metric': 'COSINE', 'is_trained': True}
{'total_docs': 10001, 'index_size': 10001, 'n_cells': 1, 'n_dim': 768, 'n_components': None, 'metric': 'COSINE', 'is_trained': True}
{'total_docs': 10001, 'index_size': 10001, 'n_cells': 1, 'n_dim': 768, 'n_components': None, 'metric': 'COSINE', 'is_trained': True}
{'total_docs': 10001, 'index_size': 10001, 'n_cells': 1, 'n_dim': 768, 'n_components': None, 'metric': 'COSINE', 'is_trained': True}
{'total_docs': 10001, 'index_size': 10001, 'n_cells': 1, 'n_dim': 768, 'n_components': None, 'metric': 'COSINE', 'is_trained': True}
{'total_docs': 10001, 'index_size': 10001, 'n_cells': 1, 'n_dim': 768, 'n_components': None, 'metric': 'COSINE', 'is_trained': True}
{'total_docs': 10001, 'index_size': 10001, 'n_cells': 1, 'n_dim': 768, 'n_components': None, 'metric': 'COSINE', 'is_trained': True}
{'total_docs': 10001, 'index_size': 10001, 'n_cells': 1, 'n_dim': 768, 'n_components': None, 'metric': 'COSINE', 'is_trained': True}
{'total_docs': 10001, 'index_size': 10001, 'n_cells': 1, 'n_dim': 768, 'n_components': None, 'metric': 'COSINE', 'is_trained': True}
===> insert using Annlite: 0.12578973770141602
{'total_docs': 10001, 'index_size': 10001, 'n_cells': 1, 'n_dim': 768, 'n_components': None, 'metric': 'COSINE', 'is_trained': True}
{'total_docs': 10001, 'index_size': 10001, 'n_cells': 1, 'n_dim': 768, 'n_components': None, 'metric': 'COSINE', 'is_trained': True}
{'total_docs': 10001, 'index_size': 10001, 'n_cells': 1, 'n_dim': 768, 'n_components': None, 'metric': 'COSINE', 'is_trained': True}
{'total_docs': 10001, 'index_size': 10001, 'n_cells': 1, 'n_dim': 768, 'n_components': None, 'metric': 'COSINE', 'is_trained': True}
{'total_docs': 10001, 'index_size': 10001, 'n_cells': 1, 'n_dim': 768, 'n_components': None, 'metric': 'COSINE', 'is_trained': True}
{'total_docs': 10001, 'index_size': 10001, 'n_cells': 1, 'n_dim': 768, 'n_components': None, 'metric': 'COSINE', 'is_trained': True}
{'total_docs': 10001, 'index_size': 10001, 'n_cells': 1, 'n_dim': 768, 'n_components': None, 'metric': 'COSINE', 'is_trained': True}
{'total_docs': 10001, 'index_size': 10001, 'n_cells': 1, 'n_dim': 768, 'n_components': None, 'metric': 'COSINE', 'is_trained': True}
{'total_docs': 10001, 'index_size': 10001, 'n_cells': 1, 'n_dim': 768, 'n_components': None, 'metric': 'COSINE', 'is_trained': True}
{'total_docs': 10001, 'index_size': 10001, 'n_cells': 1, 'n_dim': 768, 'n_components': None, 'metric': 'COSINE', 'is_trained': True}
===> insert using docarray: 0.13696033954620362
```
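For reference, the gap between the two reported averages can be worked out in a few lines. The two constants are copied from the output above; the variable names are only illustrative.

```python
# Averages copied from the benchmark output (seconds per single insert)
annlite_avg = 0.12578973770141602
docarray_avg = 0.13696033954620362

# Absolute gap in milliseconds and relative overhead of the docarray path
diff_ms = (docarray_avg - annlite_avg) * 1000
relative = docarray_avg / annlite_avg - 1

print(f"docarray path is {diff_ms:.1f} ms slower ({relative:.1%})")
```

On this run the docarray path is roughly 11 ms (about 9%) slower per insert at an index size of 10001, which is far from the 0.35s vs 1.15s difference in the original report.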
