izihawa / summa
Full-text IPFS-friendly and WASM-compatible Search in Rust
License: MIT License
It should be done through the Directory interface. Additionally, a hot path for migrating (copying) from IpfsEngine to IpfsEngine should be accounted for, because such a copy may be done by copying metadata only.
To make it work, I had to do the following:
Use aiosumma from the master branch of this repo (not from PyPI).
On line 77 of aiosumma/client.py, replace **index_engine, with ipfs=ipfs
The schema specified in the quick start guide has a default_fields element, which blows up in the current version of aiosumma. It looks like this was removed in 164310a, so the docs should probably be updated to match.
$ summa-cli localhost:8082 - create-index-from-file ~/summa/schema.yaml
SERVER_RESPONDED:
Traceback (most recent call last):
File "/nix/store/l3xw4yisgv61a1lrg2rcxm2xjsjs9srx-python3-3.11.4-env/lib/python3.11/site-packages/google/protobuf/internal/python_message.py", line 577, in _GetFieldByName
return message_descriptor.fields_by_name[field_name]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
KeyError: 'default_fields'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/nix/store/7hxdkn32a2qqvacpi4fh7sr73yigv75j-python3.11-aiosumma-2.44.1/bin/.summa-cli-wrapped", line 9, in <module>
sys.exit(main())
^^^^^^
File "/nix/store/l3xw4yisgv61a1lrg2rcxm2xjsjs9srx-python3-3.11.4-env/lib/python3.11/site-packages/aiosumma/cli.py", line 25, in main
fire.Fire(client_cli, name='summa-client')
File "/nix/store/l3xw4yisgv61a1lrg2rcxm2xjsjs9srx-python3-3.11.4-env/lib/python3.11/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nix/store/l3xw4yisgv61a1lrg2rcxm2xjsjs9srx-python3-3.11.4-env/lib/python3.11/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
^^^^^^^^^^^^^^^^^^^^
File "/nix/store/l3xw4yisgv61a1lrg2rcxm2xjsjs9srx-python3-3.11.4-env/lib/python3.11/site-packages/fire/core.py", line 689, in _CallAndUpdateTrace
component = loop.run_until_complete(fn(*varargs, **kwargs))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nix/store/ng1c2jqy48p1x33j1qyg0n5anhfv31g0-python3-3.11.4/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "/nix/store/l3xw4yisgv61a1lrg2rcxm2xjsjs9srx-python3-3.11.4-env/lib/python3.11/site-packages/aiogrpcclient/base.py", line 94, in exposing_wrapper
result = await f
^^^^^^^
File "/nix/store/l3xw4yisgv61a1lrg2rcxm2xjsjs9srx-python3-3.11.4-env/lib/python3.11/site-packages/aiogrpcclient/base.py", line 39, in inner
return await method(**data)
^^^^^^^^^^^^^^^^^^^^
File "/nix/store/l3xw4yisgv61a1lrg2rcxm2xjsjs9srx-python3-3.11.4-env/lib/python3.11/site-packages/aiosumma/client.py", line 242, in create_index
index_service_pb.CreateIndexRequest(
File "/nix/store/l3xw4yisgv61a1lrg2rcxm2xjsjs9srx-python3-3.11.4-env/lib/python3.11/site-packages/google/protobuf/internal/python_message.py", line 548, in init
new_val = field.message_type._concrete_class(**field_value)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nix/store/l3xw4yisgv61a1lrg2rcxm2xjsjs9srx-python3-3.11.4-env/lib/python3.11/site-packages/google/protobuf/internal/python_message.py", line 516, in init
field = _GetFieldByName(message_descriptor, field_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nix/store/l3xw4yisgv61a1lrg2rcxm2xjsjs9srx-python3-3.11.4-env/lib/python3.11/site-packages/google/protobuf/internal/python_message.py", line 579, in _GetFieldByName
raise ValueError('Protocol message %s has no "%s" field.' %
ValueError: Protocol message IndexAttributes has no "default_fields" field.
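Until the docs are updated, one possible workaround is to strip the rejected key from the schema before sending it. The following is a minimal sketch, assuming the schema has already been loaded into a Python dict; the helper name drop_keys and the sample schema layout are hypothetical and are not part of aiosumma.

```python
from typing import Any


# Hypothetical helper, not part of aiosumma.
def drop_keys(node: Any, unsupported: frozenset = frozenset({"default_fields"})) -> Any:
    """Return a copy of `node` with the unsupported keys removed at any depth."""
    if isinstance(node, dict):
        return {
            key: drop_keys(value, unsupported)
            for key, value in node.items()
            if key not in unsupported
        }
    if isinstance(node, list):
        return [drop_keys(item, unsupported) for item in node]
    return node


# Illustrative schema only; the real quick start schema may be laid out differently.
schema = {
    "index_name": "books",
    "index_attributes": {"default_fields": ["title", "text"], "description": "demo"},
}
cleaned = drop_keys(schema)
print(cleaned)
```

The helper removes the key at any nesting level so that it does not depend on where default_fields sits in the schema. The original dict is left unchanged.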
Code in summa should be slightly refactored to support the wasm-futures-executor ThreadPool.
More precisely, we need a way to offload requests to a particular index into a separate Web Worker. Within a query to an index we may still use async execution, but that is also a topic for discussion.
This is mainly blocked by n0-computer/iroh#539.
Is it possible to use the summa query parser to ask for entries where a specific field is not null? Something like cid:<NOT NULL>?
Hi developers, may I ask you to add a Docker image that supports armv7/arm64?
Many TV boxes with armv7/arm64 CPUs, 2 GB of RAM, 16 GB of ROM, and a Linux system draw very little power, nearly 3 kWh per month, and cost only $10 each.
If you would kindly consider this request, it would help the adoption of IPFS.
At least in my region, many people could buy a TV box, install Linux on it, and then deploy summa with Docker, along with metadata of books and papers. Here, nearly every family has a TV box capable of running Docker.
Thank you!
Hi,
Thanks for creating such a powerful and creative tool.
I followed the quick start doc and it worked great (except for a small problem, #141). Then I tried IPFS Publish + WASM Browsing, but got the following error:
thread 'tokio-runtime-workers-0' panicked at 'not implemented', summa-server/src/services/index.rs:284:54
with the izihawa/summa-server:testing Docker container.
Currently, ChunkedCachingDirectory holds locks internally. Locking is excessive for read-only cases and can safely be avoided.
Currently, merge_responses is incomplete and duplicates Tantivy logic. This part of Summa needs to be refactored to reduce the codebase.
CID deletion should be supported in the Iroh Store. The lack of removal is a major blocker because there is no safe way to evict unused data.
We have an application where we would like to run search in the browser if the query can't be passed to Elasticsearch (offline, etc.). Are there any plans to provide an API compatible with Elasticsearch? Because of its ubiquity, other search engines (like Quickwit, which is based on Tantivy) also provide one. It would be great if summa provided such an API as well.
I followed the quick start guide and the server is running fine, but I get an error when trying to attach an IPFS index:
summa-cli 0.0.0.0:8082 attach-index my_lib --ipfs '{"cid": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"}'
Error:
ERROR: The function received no value for the required argument: index_engine
Usage: summa-client 0.0.0.0:8082 attach-index INDEX_NAME INDEX_ENGINE <flags>
optional flags: --merge_policy | --request_id | --session_id | --format
For detailed information on this command, run:
summa-client 0.0.0.0:8082 attach-index --help
I guess the CLI API has changed, so I tried:
summa-cli 0.0.0.0:8082 attach-index \
nexus_science \
'{"ipfs": {"chunked_cache_config": {"chunk_size": 10000 "cache_size": 10000}}}' \
--ipfs '{"cid": "xxxxxxxxxxxxxxxxxxxx"}'
and now the error is:
File "PYTHON_ENV/lib/python3.10/site-packages/aiosumma/client.py", line 72, in attach_index
index_service_pb.AttachIndexRequest(
TypeError: index_service_pb2.AttachIndexRequest() argument after ** must be a mapping, not str
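One thing worth checking: the second argument in the command above has no comma between "chunk_size": 10000 and "cache_size": 10000, so it is not valid JSON. Whether that explains the TypeError is an assumption and is not confirmed in this report. The sketch below only validates such an argument with the standard json module before it is passed to the CLI; the helper name check_mapping_argument is hypothetical.

```python
import json
from typing import Optional, Tuple


# Hypothetical helper, not part of aiosumma or summa-cli.
def check_mapping_argument(arg: str) -> Tuple[Optional[dict], Optional[str]]:
    """Return (mapping, None) if `arg` is a JSON object, else (None, reason)."""
    try:
        value = json.loads(arg)
    except json.JSONDecodeError as exc:
        return None, f"invalid JSON: {exc.msg} (char {exc.pos})"
    if not isinstance(value, dict):
        return None, f"expected a JSON object, got {type(value).__name__}"
    return value, None


# The argument exactly as typed in the report, and the same text with the comma added.
as_typed = '{"ipfs": {"chunked_cache_config": {"chunk_size": 10000 "cache_size": 10000}}}'
with_comma = '{"ipfs": {"chunked_cache_config": {"chunk_size": 10000, "cache_size": 10000}}}'

print(check_mapping_argument(as_typed))
print(check_mapping_argument(with_comma))
```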
Currently, files are duplicated until the server is restarted.
The main issue is that IrohDirectory should be properly reconfigured after a commit so that it works without files on the file system.
I followed the documentation to create an index:
summa-cli localhost:8082 - create-index-from-file schema.yaml
and got the following error:
Traceback (most recent call last):
File "/usr/local/bin/summa-cli", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.10/site-packages/aiosumma/cli.py", line 24, in main
fire.Fire(client_cli, name='summa-client')
File "/usr/local/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/usr/local/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/usr/local/lib/python3.10/site-packages/fire/core.py", line 689, in _CallAndUpdateTrace
component = loop.run_until_complete(fn(*varargs, **kwargs))
File "/usr/local/Cellar/python@3.10/3.10.9/Frameworks/Python.framework/Versions/3.10/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/usr/local/lib/python3.10/site-packages/aiogrpcclient/base.py", line 86, in exposing_wrapper
result = await fn(*bound.args, **bound.kwargs)
File "/usr/local/lib/python3.10/site-packages/aiogrpcclient/base.py", line 39, in inner
return await method(**data)
TypeError: SummaClient.create_index() missing 1 required positional argument: 'index_engine'
Hi, I would like to suggest adding the tantivy topic to your repository. This will let others find it easily at https://github.com/topics/tantivy
I tried to use the latest version of summa to index a large IPFS dataset (bafyb4iemblftubydyhfo6xhw56zrhudy2xexqb25f7awrahe3qfplse5g4).
I am using:
izihawa/summa-server:0.13.0
aiosumma==2.30.1
The indexing works and I can perform searches.
But when I run summa-cli 0.0.0.0:8082 warmup-index "xxxxx" --is-full
and wait 10-20 hours, nothing happens. The data folder size remains constant (about 20 MB) and the logs show little besides the following:
2023-02-27T21:59:50.535400Z INFO tokio-runtime-workers-17 summa_core::components::index_holder: action="warming_up"
2023-02-27T22:06:10.069516Z WARN tokio-runtime-workers-2 rustls::conn: Sending fatal alert BadCertificate
2023-02-27T22:07:38.764484Z WARN tokio-runtime-workers-11 rustls::conn: Sending fatal alert BadCertificate
2023-02-27T22:09:28.935775Z WARN tokio-runtime-workers-18 rustls::conn: Sending fatal alert BadCertificate
2023-02-27T22:41:42.821623Z WARN tokio-runtime-workers-25 rustls::conn: Sending fatal alert BadCertificate
2023-02-27T22:46:17.843424Z WARN tokio-runtime-workers-8 rustls::conn: Sending fatal alert BadCertificate
2023-02-27T22:47:53.240032Z WARN tokio-runtime-workers-20 rustls::conn: Sending fatal alert BadCertificate
2023-02-27T22:48:27.795842Z WARN tokio-runtime-workers-26 rustls::conn: Sending fatal alert BadCertificate
2023-02-27T22:50:18.449929Z WARN tokio-runtime-workers-18 rustls::conn: Sending fatal alert BadCertificate
2023-02-27T22:53:19.389835Z WARN tokio-runtime-workers-12 rustls::conn: Sending fatal alert BadCertificate
2023-02-27T22:56:08.556449Z WARN tokio-runtime-workers-15 rustls::conn: Sending fatal alert BadCertificate
2023-02-27T22:56:28.528263Z WARN tokio-runtime-workers-12 rustls::conn: Sending fatal alert BadCertificate
2023-02-27T22:59:43.933639Z WARN tokio-runtime-workers-15 rustls::conn: Sending fatal alert BadCertificate
2023-02-27T23:02:04.264890Z WARN tokio-runtime-workers-26 rustls::conn: Sending fatal alert BadCertificate
2023-02-27T23:19:00.988748Z WARN tokio-runtime-workers-10 rustls::conn: Sending fatal alert BadCertificate
2023-02-27T23:25:39.076722Z WARN tokio-runtime-workers-5 rustls::conn: Sending fatal alert BadCertificate
2023-02-27T23:30:49.920279Z WARN tokio-runtime-workers-26 rustls::conn: Sending fatal alert BadCertificate
2023-02-27T23:31:35.280767Z WARN tokio-runtime-workers-23 rustls::conn: Sending fatal alert BadCertificate
2023-02-27T23:32:08.273081Z WARN tokio-runtime-workers-1 rustls::conn: Sending fatal alert BadCertificate
2023-02-27T23:35:19.405701Z WARN tokio-runtime-workers-10 rustls::conn: Sending fatal alert BadCertificate
2023-02-27T23:37:57.243325Z WARN tokio-runtime-workers-2 rustls::conn: Sending fatal alert BadCertificate
2023-02-27T23:38:23.882930Z WARN tokio-runtime-workers-0 rustls::conn: Sending fatal alert BadCertificate
2023-02-27T23:39:45.993799Z WARN tokio-runtime-workers-26 rustls::conn: Sending fatal alert BadCertificate
2023-02-27T23:40:36.507857Z WARN tokio-runtime-workers-2 rustls::conn: Sending fatal alert BadCertificate
2023-02-27T23:44:43.510741Z WARN tokio-runtime-workers-5 rustls::conn: Sending fatal alert BadCertificate
2023-02-27T23:46:14.269627Z WARN tokio-runtime-workers-8 rustls::conn: Sending fatal alert BadCertificate
2023-02-27T23:55:48.495018Z WARN tokio-runtime-workers-15 rustls::conn: Sending fatal alert BadCertificate
2023-02-27T23:59:13.761939Z WARN tokio-runtime-workers-11 rustls::conn: Sending fatal alert BadCertificate
2023-02-28T00:01:01.027181Z WARN tokio-runtime-workers-23 rustls::conn: Sending fatal alert BadCertificate
2023-02-28T00:02:27.219973Z WARN tokio-runtime-workers-28 rustls::conn: Sending fatal alert BadCertificate
2023-02-28T00:07:29.976689Z WARN tokio-runtime-workers-8 rustls::conn: Sending fatal alert BadCertificate
2023-02-28T00:11:00.711415Z WARN tokio-runtime-workers-2 rustls::conn: Sending fatal alert BadCertificate
2023-02-28T00:11:23.335158Z WARN tokio-runtime-workers-25 rustls::conn: Sending fatal alert BadCertificate
2023-02-28T00:15:48.766198Z WARN tokio-runtime-workers-5 rustls::conn: Sending fatal alert BadCertificate
2023-02-28T00:16:41.485513Z WARN tokio-runtime-workers-5 rustls::conn: Sending fatal alert BadCertificate
2023-02-28T00:16:58.379112Z WARN tokio-runtime-workers-14 rustls::conn: Sending fatal alert BadCertificate
2023-02-28T00:23:24.459728Z WARN tokio-runtime-workers-12 rustls::conn: Sending fatal alert BadCertificate
2023-02-28T00:26:11.919729Z WARN tokio-runtime-workers-14 rustls::conn: Sending fatal alert BadCertificate
2023-02-28T00:32:17.039008Z WARN tokio-runtime-workers-26 rustls::conn: Sending fatal alert BadCertificate
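To see at a glance how often each message recurs and over what time span, the log can be grouped by level, target, and message. This is a minimal sketch that assumes every line follows the layout shown above (timestamp, level, worker name, target, message); the regular expression and the summarize function are illustrative and are not part of summa.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed layout: "<timestamp> <LEVEL> <worker> <target>: <message>"
LOG_LINE = re.compile(r"^(\S+)\s+(\w+)\s+(\S+)\s+([\w:]+):\s+(.*)$")


def summarize(lines):
    """Count (level, target, message) triples and return the covered time span."""
    counts = Counter()
    timestamps = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # skip lines that do not follow the assumed layout
        stamp, level, _worker, target, message = match.groups()
        timestamps.append(datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ"))
        counts[(level, target, message)] += 1
    span = (min(timestamps), max(timestamps)) if timestamps else None
    return counts, span


# The first three lines of the log above, used as sample input.
sample = [
    '2023-02-27T21:59:50.535400Z INFO tokio-runtime-workers-17 summa_core::components::index_holder: action="warming_up"',
    "2023-02-27T22:06:10.069516Z WARN tokio-runtime-workers-2 rustls::conn: Sending fatal alert BadCertificate",
    "2023-02-27T22:07:38.764484Z WARN tokio-runtime-workers-11 rustls::conn: Sending fatal alert BadCertificate",
]
counts, span = summarize(sample)
for (level, target, message), n in counts.most_common():
    print(n, level, target, message)
```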
Below are the ports set with docker run:
ports:
- 8082:8082 # GRPC API
- 8080:8080 # Iroh Gateway HTTP
- 4444:4444 # P2P - libp2p connection port. peers will dial your node here
- 4445:4445 # P2P
I also tried to use network_mode: host but it didn't fix it.