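Epsilla is an open-source vector database. Our focus is on ensuring the scalability, high performance, and cost-effectiveness of vector search. EpsillaDB bridges the gap between information retrieval and memory retention in large language models.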
1. Run the backend in Docker
docker pull epsilla/vectordb
docker run --pull=always -d -p 8888:8888 -v /data:/data epsilla/vectordb
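Before moving on to the client, it can help to confirm that the container is accepting connections on the published port. The following is a minimal sketch using only the Python standard library; the helper name `is_port_open` is hypothetical and not part of Epsilla, and a successful TCP connection only shows that something is listening on the port, not that the database is fully initialized.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 8888 is the host port published by the `docker run` command above.
    print("backend reachable:", is_port_open("localhost", 8888))
```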
2. Interact with the Python client
pip install pyepsilla
from pyepsilla import vectordb

# Connect to the backend started in step 1, then load and select a database.
client = vectordb.Client(host='localhost', port='8888')
client.load_db(db_name="MyDB", db_path="/data/epsilla")
client.use_db(db_name="MyDB")

# Create a table and index the "Doc" field.
client.create_table(
    table_name="MyTable",
    table_fields=[
        {"name": "ID", "dataType": "INT", "primaryKey": True},
        {"name": "Doc", "dataType": "STRING"},
    ],
    indices=[
        {"name": "Index", "field": "Doc"},
    ]
)

# Insert a few sample documents.
client.insert(
    table_name="MyTable",
    records=[
        {"ID": 1, "Doc": "Jupiter is the largest planet in our solar system."},
        {"ID": 2, "Doc": "Cheetahs are the fastest land animals, reaching speeds over 60 mph."},
        {"ID": 3, "Doc": "Vincent van Gogh painted the famous work \"Starry Night.\""},
        {"ID": 4, "Doc": "The Amazon River is the longest river in the world."},
        {"ID": 5, "Doc": "The Moon completes one orbit around Earth every 27 days."},
    ],
)

# Search with a natural-language query and return the two closest documents.
client.query(
    table_name="MyTable",
    query_text="Celestial bodies and their characteristics",
    limit=2
)

# Result
# {
#     'message': 'Query search successfully.',
#     'result': [
#         {'Doc': 'Jupiter is the largest planet in our solar system.', 'ID': 1},
#         {'Doc': 'The Moon completes one orbit around Earth every 27 days.', 'ID': 5}
#     ],
#     'statusCode': 200
# }
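To use the matches in application code, the documents have to be pulled out of the response. The sketch below assumes the response is a dictionary shaped exactly like the result shown above (`message`, `result`, `statusCode`); the helper `extract_docs` is hypothetical, and the return type of `client.query` in your pyepsilla version should be checked before relying on it.

```python
from typing import Any, Dict, List


def extract_docs(response: Dict[str, Any], field: str = "Doc") -> List[str]:
    """Return the values of `field` from each record in a query response."""
    if response.get("statusCode") != 200:
        raise RuntimeError(f"query failed: {response.get('message')}")
    return [record[field] for record in response.get("result", [])]


# Sample response copied from the result shown above.
sample = {
    "message": "Query search successfully.",
    "result": [
        {"Doc": "Jupiter is the largest planet in our solar system.", "ID": 1},
        {"Doc": "The Moon completes one orbit around Earth every 27 days.", "ID": 5},
    ],
    "statusCode": 200,
}

for doc in extract_docs(sample):
    print(doc)
```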
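- High-performance, production-scale similarity search for embedding vectors.
- Full-fledged database management system with familiar database, table, and field concepts. A vector is just another field type.
- Metadata filtering.
- Hybrid search that fuses dense and sparse vectors.
- Built-in embedding support, for a natural-language-in, natural-language-out search experience.
- Cloud-native architecture with compute-storage separation, serverless, and multi-tenancy.
- Rich ecosystem integrations, including LangChain and LlamaIndex.
- Python/JavaScript/Ruby clients and a REST API interface.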
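To give a feel for what fusing dense and sparse results can mean, here is a small sketch of reciprocal rank fusion (RRF), one widely used way to merge ranked lists. This is an illustration only: the text above does not say which fusion method Epsilla uses, and the function name, the constant `k = 60`, and the two example rankings are all assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[int]], k: int = 60) -> List[int]:
    """Fuse ranked ID lists; each ID scores the sum of 1 / (k + rank) over the lists."""
    scores: Dict[int, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for deterministic output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense_ranking = [1, 5, 2]   # hypothetical result order from a dense-vector search
sparse_ranking = [5, 4, 1]  # hypothetical result order from a sparse-vector search
fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
print(fused)
```

Documents that rank well in both lists rise to the top, while a document found by only one retriever still appears in the fused result.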
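Epsilla's core is written in C++ and leverages advanced academic parallel graph traversal techniques for vector indexing, achieving vector search 10 times faster than HNSW while maintaining precision levels of over 99.9%.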
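The text does not define how the precision figure is measured. A common way to score approximate nearest-neighbor search is recall@k: the share of the exact top-k neighbors that the approximate search also returns. The sketch below shows that metric under this assumption; the function name and the example ID lists are hypothetical and are not Epsilla benchmark data.

```python
from typing import Sequence


def recall_at_k(approx_ids: Sequence[int], exact_ids: Sequence[int], k: int) -> float:
    """Fraction of the exact top-k neighbors that the approximate top-k also returned."""
    if k <= 0:
        raise ValueError("k must be positive")
    exact_top = set(exact_ids[:k])
    if not exact_top:
        return 0.0
    hits = len(exact_top & set(approx_ids[:k]))
    return hits / len(exact_top)


exact = [3, 1, 2, 7, 9]    # hypothetical IDs from an exhaustive (exact) search
approx = [3, 2, 1, 9, 8]   # hypothetical IDs from an approximate index
print(recall_at_k(approx, exact, k=5))
```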
Try our fully managed vector DBaaS at Epsilla Cloud
1. Build the Epsilla Python bindings lib package
cd engine/scripts
# If on Ubuntu, run this first: bash setup-dev.sh
bash install_oatpp_modules.sh
cd ..
bash build.sh
ls -lh build/*.so
2. Run the test with the Python bindings libs "epsilla.so" and "libvectordb_dylib.so" from the "build" folder built in the previous step
cd engine
export PYTHONPATH=./build/
export DB_PATH=/tmp/db33
python3 test/bindings/python/test.py
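If the test script fails at import time, a quick check of the environment can narrow down the cause. The following sketch, which uses only the standard library, reports whether a module named `epsilla` can be found on the current import path and echoes the two environment variables exported above. The helper name `bindings_available` is hypothetical.

```python
import importlib.util
import os


def bindings_available(module_name: str = "epsilla") -> bool:
    """Return True if `module_name` can be located on the current import path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # DB_PATH and PYTHONPATH are the variables exported in the commands above.
    print("DB_PATH:", os.environ.get("DB_PATH", "<not set>"))
    print("PYTHONPATH:", os.environ.get("PYTHONPATH", "<not set>"))
    print("epsilla importable:", bindings_available())
```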
Here is some sample code:
import epsilla

# Load and select the database.
epsilla.load_db(db_name="db", db_path="/data/epsilla")
epsilla.use_db(db_name="db")

# Create a table with a 4-dimensional vector field that uses Euclidean distance.
epsilla.create_table(
    table_name="MyTable",
    table_fields=[
        {"name": "ID", "dataType": "INT", "primaryKey": True},
        {"name": "Doc", "dataType": "STRING"},
        {"name": "EmbeddingEuclidean", "dataType": "VECTOR_FLOAT", "dimensions": 4, "metricType": "EUCLIDEAN"}
    ]
)

# Insert records together with their embedding vectors.
epsilla.insert(
    table_name="MyTable",
    records=[
        {"ID": 1, "Doc": "Berlin", "EmbeddingEuclidean": [0.05, 0.61, 0.76, 0.74]},
        {"ID": 2, "Doc": "London", "EmbeddingEuclidean": [0.19, 0.81, 0.75, 0.11]},
        {"ID": 3, "Doc": "Moscow", "EmbeddingEuclidean": [0.36, 0.55, 0.47, 0.94]}
    ]
)

# Query by vector, filter on metadata, and include distances in the response.
(code, response) = epsilla.query(
    table_name="MyTable",
    query_field="EmbeddingEuclidean",
    response_fields=["ID", "Doc", "EmbeddingEuclidean"],
    query_vector=[0.35, 0.55, 0.47, 0.94],
    filter="ID < 6",
    limit=10,
    with_distance=True
)
print(code, response)
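As a way to sanity-check what the query above should rank first, the same search can be reproduced by brute force in plain Python. This sketch applies the `ID < 6` filter and sorts by Euclidean distance using `math.dist` (Python 3.8+). The function `brute_force_search` is hypothetical, and the distance values Epsilla reports may be scaled differently (for example squared), which the text does not specify.

```python
import math
from typing import Any, Dict, List, Tuple

records = [
    {"ID": 1, "Doc": "Berlin", "EmbeddingEuclidean": [0.05, 0.61, 0.76, 0.74]},
    {"ID": 2, "Doc": "London", "EmbeddingEuclidean": [0.19, 0.81, 0.75, 0.11]},
    {"ID": 3, "Doc": "Moscow", "EmbeddingEuclidean": [0.36, 0.55, 0.47, 0.94]},
]
query_vector = [0.35, 0.55, 0.47, 0.94]


def brute_force_search(
    rows: List[Dict[str, Any]], query: List[float], max_id: int, limit: int
) -> List[Tuple[float, str]]:
    """Filter rows by ID < max_id, then return the `limit` closest (distance, Doc) pairs."""
    candidates = [row for row in rows if row["ID"] < max_id]
    scored = [(math.dist(row["EmbeddingEuclidean"], query), row["Doc"]) for row in candidates]
    return sorted(scored)[:limit]


for distance, doc in brute_force_search(records, query_vector, max_id=6, limit=10):
    print(f"{doc}: {distance:.4f}")
```

Moscow differs from the query vector in only one coordinate, so it comes out as the nearest neighbor, followed by Berlin and then London.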