Comments (5)
Where can you see that this function call (ivf_search) actually takes most of the time at query time?
from annlite.
ivf_search is where the search actually happens. BTW, I also ran a profile:
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   132                                           @line_profile
   133                                           def search_cells(
   134                                               self,
   135                                               query: np.ndarray,
   136                                               cells: np.ndarray,
   137                                               where_clause: str = '',
   138                                               where_params: Tuple = (),
   139                                               limit: int = 10,
   140                                               include_metadata: bool = False,
   141                                           ):
   142        15         15.0      1.0      0.0      topk_dists, topk_docs = [], []
   143        30         48.0      1.6      0.0      for x, cell_idx in zip(query, cells):
   144                                                   # x.shape = (self.dim,)
   145        30     910683.0  30356.1     98.5          dists, doc_ids, cells = self.ivf_search(
   146        15          3.0      0.2      0.0              x,
   147        15          3.0      0.2      0.0              cells=cell_idx,
   148        15          3.0      0.2      0.0              where_clause=where_clause,
   149        15          7.0      0.5      0.0              where_params=where_params,
   150        15          5.0      0.3      0.0              limit=limit,
   151                                                   )
   152
   153        15         32.0      2.1      0.0          topk_dists.append(dists)
   154        15        160.0     10.7      0.0          match_docs = DocumentArray()
   155       165        186.0      1.1      0.0          for dist, doc_id, cell_id in zip(dists, doc_ids, cells):
   156       150       5574.0     37.2      0.6              doc = Document(id=doc_id)
   157       150         67.0      0.4      0.0              if include_metadata:
   158       150       5568.0     37.1      0.6                  doc = self.doc_store(cell_id).get([doc_id])[0]
   159
   160       150       1736.0     11.6      0.2              doc.scores[self.metric.name.lower()].value = dist
   161       150        196.0      1.3      0.0              match_docs.append(doc)
   162        15         13.0      0.9      0.0          topk_docs.append(match_docs)
   163
   164        15          6.0      0.4      0.0      return topk_dists, topk_docs
from annlite.
Even if that is where the search happens, the boilerplate code that joins the results could actually take more time. Nevertheless, this post suggests that is not the case, at least at the search_cells level.
from annlite.
Results on table.query
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   221                                           @line_profile
   222                                           def query(
   223                                               self,
   224                                               where_clause: str = '',
   225                                               where_params: Tuple = (),
   226                                           ) -> Iterator[dict]:
   227                                               """Query the records which matches the given conditions
   228
   229                                               :param where_clause: where clause for query
   230                                               :param where_params: where parameters for query
   231                                               :return: iterator to yield matched doc
   232                                               """
   233        15         20.0      1.3      0.0      sql = 'SELECT _id, _doc_id from {table} WHERE {where} ORDER BY _id ASC;'
   234
   235                                               # where_conds = ['_deleted = ?']
   236        15         10.0      0.7      0.0      where_conds = []
   237        15          8.0      0.5      0.0      if where_clause:
   238        15         14.0      0.9      0.0          where_conds.append(where_clause)
   239        15         17.0      1.1      0.0      where_conds += ['_deleted = ?']
   240        15         15.0      1.0      0.0      where = ' and '.join(where_conds)
   241        15         54.0      3.6      0.0      sql = sql.format(table=self.name, where=where)
   242
   243                                               # params = (0,) + tuple([_converting(p) for p in where_params])
   244        15         81.0      5.4      0.0      params = tuple([_converting(p) for p in where_params]) + (0,)
   245
   246                                               # for row in self._conn.execute(f'PRAGMA index_list("{self.name}")'):
   247                                               #     print(row)
   248
   249                                               # # sql = 'EXPLAIN QUERY PLAN ' + sql
   250                                               # for row in self._conn.execute('EXPLAIN QUERY PLAN ' + sql, params):
   251                                               #     print(row)
   252
   253        15       9597.0    639.8      1.2      cursor = self._conn.execute(sql, params)
   254    500015     555813.0      1.1     68.7      for row in cursor:
   255    500000     243938.0      0.5     30.1          yield {'_id': row[0] - 1, '_doc_id': row[1]}
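In the profile above, the 15 calls yield 500,000 rows in total, roughly 33,333 rows per call, and the two per-row lines account for almost all of the time. The following is only a minimal sketch for reproducing that pattern without annlite, using the standard-library sqlite3 module. The table name cell_0, the price column, and the helper build_demo_table are hypothetical and not annlite's real schema, and the _converting step is left out. It runs a query of the same shape with a broad and a narrow filter so the cost of yielding one dict per matching row can be compared.

```python
import sqlite3
import time
from typing import Iterator, Tuple


def build_demo_table(n_rows: int) -> sqlite3.Connection:
    """Hypothetical stand-in for a cell table; not annlite's real schema."""
    conn = sqlite3.connect(':memory:')
    conn.execute(
        'CREATE TABLE cell_0 ('
        '_id INTEGER PRIMARY KEY, _doc_id TEXT, price INTEGER, _deleted INTEGER)'
    )
    conn.executemany(
        'INSERT INTO cell_0 (_doc_id, price, _deleted) VALUES (?, ?, 0)',
        ((f'doc{i}', i % 100) for i in range(n_rows)),
    )
    return conn


def query(
    conn: sqlite3.Connection,
    table: str,
    where_clause: str = '',
    where_params: Tuple = (),
) -> Iterator[dict]:
    """Same shape as the profiled method: one dict is yielded per matching row."""
    where_conds = [where_clause] if where_clause else []
    where_conds.append('_deleted = ?')
    sql = 'SELECT _id, _doc_id FROM {table} WHERE {where} ORDER BY _id ASC;'.format(
        table=table, where=' and '.join(where_conds)
    )
    for row in conn.execute(sql, tuple(where_params) + (0,)):
        yield {'_id': row[0] - 1, '_doc_id': row[1]}


conn = build_demo_table(20_000)
results = {}
for label, clause, params in [
    ('broad filter', 'price >= ?', (0,)),   # matches every row
    ('narrow filter', 'price = ?', (7,)),   # matches 1 row in 100
]:
    start = time.perf_counter()
    n_rows = sum(1 for _ in query(conn, 'cell_0', clause, params))
    elapsed = time.perf_counter() - start
    results[label] = (n_rows, elapsed)
    print(f'{label}: {n_rows} rows in {elapsed * 1e3:.2f} ms')
```

The absolute timings depend on the machine, so none are quoted here; the point is only that the loop body runs once per matching row, so a filter that matches many rows makes the Python-side iteration the dominant cost.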
from annlite.
PR #74 achieves a 3x improvement.
from annlite.