infiniflow / infinity
The AI-native database built for LLM applications, providing incredibly fast full-text and vector search
Home Page: https://infiniflow.org
License: Apache License 2.0
A short, clear and concise description of what the bug is.
Steps to reproduce the behavior. Bonus points if those are only SQL queries.
Branch: main
Parent Issue
Line 5 in e3ee494
You cannot set the generator in CMake; it is a read-only variable. It is specified by the -G option to CMake and, once picked, cannot be changed. This could be changed to a fatal error if the generator is not Ninja.
My stack is all C# and Azure. I don't want to use any Python code or interop.
A .net API please?
I use Azure RAG now.
Massive c# community.
No response
The current full-text index is based on the iresearch library, which is tightly bound to document-oriented data models and does not support real-time indexing.
We need a new full-text index implementation built from scratch, so that it works more smoothly with infinity, with higher performance and real-time indexing.
Exception occurred during concurrent operation
SizeT thread_num = 16;
SizeT total_times = 2 * 10 * 1000;
Branch: main
In the current interface of the catalog module, many functions have multiple return values. However, instead of using a tuple or pair as the return value, we currently place the outputs in the function parameters and obtain them by reference.
Use a tuple as the return value of such functions.
No response
No response
No response
Nano benchmark source code needs to be removed from the git history to reduce the overall repository size.
Currently, index creation is recorded as a logical log. The index file needs to be rebuilt when replaying the log, resulting in slow replay.
Instead, write the path of the index file flushed to disk into the WAL file.
No response
No response
Unnecessary data copy from `ColumnBuffer` to `ColumnVector`.
Read from file directly into `ColumnVector`.
Remove `ColumnBuffer`.
`GetColumnVector` in `BlockColumnEntry` loads the column of an entry from disk. The lifetime of the returned column vector's data is managed by the buffer_manager. The `Varchar` type uses `FixHeapManager` to allocate and read/load chunks; one chunk is mapped to one outline file on disk.
No response
No response
No response
Branch: main
docker image id 1f1ebe620523
Hardware: MacBook Pro, Intel Core i7
OS type: macOS Ventura 13.6.1
Others: Docker Desktop for macOS, Version 4.24.0 (122432)
# librae @ mbpl in ~/work/repo/infinity on git:main o [21:18:48]
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
infiniflow/infinity latest 1f1ebe620523 5 days ago 122MB
nodered/node-red latest aad8a8d13b50 3 months ago 549MB
# librae @ mbpl in ~/work/repo/infinity on git:main o [21:23:13]
$ docker run -d --name infinity -v /tmp/infinity/:/tmp/infinity --network=host infiniflow/infinity bash ./opt/bin/infinity
eb9bf7949bab2474fca51e3852f0ad77d38f2e49bf6fedf5cdda97af0cee80db
# librae @ mbpl in ~/work/repo/infinity on git:main o [21:25:18]
$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
eb9bf7949bab infiniflow/infinity "bash ./opt/bin/infi…" 8 seconds ago Exited (126) 7 seconds ago infinity
# librae @ mbpl in ~/work/repo/infinity on git:main o [21:25:25]
$ docker logs infinity
./opt/bin/infinity: ./opt/bin/infinity: cannot execute binary file
Expect the docker container to run successfully.
docker run -d --name infinity -v /tmp/infinity/:/tmp/infinity --network=host infiniflow/infinity bash ./opt/bin/infinity
Additional information
No response
The secondary index is used for numeric filtering. It is composed of two parts:
The mechanism of range filtering with the secondary index is as follows:
No response
No response
No response
No response
The current task-scheduling strategy is to round-robin **all** tasks in a `PlanFragment`.
For a task that depends on other tasks, plain round-robin simply schedules it on a random (next) CPU.
For example, assume a completely serial fragment of length 16 with no parallel tasks.
The current strategy will schedule the 16 tasks across 16 different CPU cores.
The problems are:
1. Some cores are allocated a not-yet-ready task, which is re-checked every time the CPU is runnable.
2. The context-switch cost is large.
The scheduler could allocate tasks that have a dependency relation to the same CPU and preserve their order.
Schedule a task only when it is runnable.
No response
No response
No response
The BOOL type should be stored bit-packed, similar to `std::bitset`.
No response
No response
No response
1. Each import creates a new segment and imports data into new blocks in that segment. Blocks that are not filled may waste disk space.
2. Compaction also removes deleted rows to save disk space.
3. The index is created at segment granularity; small segments degrade index performance.
4. Index rebuild is not addressed by this issue.
A background task scans the table periodically; if segments can be merged, it merges them.
No response
No response
No response
Allow construction of the KNN index (HNSW) in parallel.
There was no unified error message and error code before. For software errors, perhaps let infinity crash and provide a backtrace. For recoverable errors, we need an error code and error message returned to the client.
Provide a unified error code and error message to return to the client.
success
0000 ok
auth error
2001 password is wrong
2002 insufficient privilege
syntax error or access rule violation
3001 invalid username
3002 invalid password
3003 invalid db/schema name
3004 invalid table name
3005 invalid column name
3006 invalid index name
3007 invalid column definition
3008 invalid table definition
3009 invalid index definition
3010 data type mismatch
3011 name too long
3012 reserved name
3013 syntax error
3014 invalid parameter value
3015 duplicate user
3016 duplicate database
3017 duplicate table
3018 duplicate index name
3019 duplicate index
3020 no such user
3021 database does not exist
3022 table does not exist
3023 index does not exist
3024 column does not exist
3025 aggregate can't be in where clause
3026 column name in select list must appear in group by or aggregate function.
3027 no such system variable
3028 set invalid value to system variable
3029 system variable is read-only
txn error
4001 txn rollback
4002 txn conflict
insufficient resources or exceed limits
5001 disk_full
5002 out of memory
5003 too many connections
5004 configuration limit exceeded
5005 query is too complex
operation intervention
6006 query_canceled
6007 not supported
system error
7001 io_error
7002 duplicated file
7003 config file error
7004 lock file exists
7005 catalog is corrupted
7006 data corrupted
7007 index corrupted
7008 file not found
7009 dir not found
No response
No response
Parent Issue
SELECT a , b FROM test_table_star where a =4
Steps to reproduce the behavior. Bonus points if those are only SQL queries.
SELECT a , b FROM test_table_star where a =4;
Branch: main
No response
There are a lot of forward declarations of classes that are actually defined in other modules, which is incorrect.
For instance, here `TableCollectionEntry` is declared in the module logical_fusion, which contradicts the fact that it is actually defined in the module table_collection_entry.
No response
...
No response
Infinity needs the min/max value of each column in a segment/block. With this information and the condition expression, infinity can filter out some segments/blocks before the table scan.
Currently, I suppose this information will be co-located with the segment/block information in the catalog.
Feature Request
Supports SQL LIMIT clauses
e.g. select * from t1 limit 3 offset 1
COPY NATION FROM 'test/sql/copy/nation.csv' WITH ( DELIMITER ',' );
crash
Steps to reproduce the behavior. Bonus points if those are only SQL queries.
nation.csv
1,2,
3,4,
Branch: main
Restart the server after running the function.
error message:
"terminate called after throwing an instance of 'infinity::StorageException@infinity_exception'
what(): Storage Error: index_def_meta should have at least one entry @src/storage/meta/entry/table_collection_entry.cpp:410"
Branch: main
The current task model is synchronous; IO operations block the task.
Refactor tasks to allow suspend and resume when IO happens.
TODO
No response
No response
Blocking occurs when multiple threads create a Database.
Branch: main
OS: Ubuntu
Statements:
CREATE TABLE mytable (
id INTEGER PRIMARY KEY,
name VARCHAR(50),
age INTEGER
);
INSERT INTO mytable (id, name, age) VALUES (1, 'John', 30);
INSERT INTO mytable (id, name, age) VALUES (2, 'Jane', 25);
SELECT * FROM mytable;
Error Message:
Executor Error: Not value expression. @src/executor/operator/physical_insert.cpp:25
Branch: main
kould-21j0
description: Computer
width: 64 bits
capabilities: smp vsyscall32
*-core
description: Motherboard
physical id: 0
*-memory
description: System memory
physical id: 0
size: 28GiB
*-cpu
product: AMD Ryzen 7 7735H with Radeon Graphics
vendor: Advanced Micro Devices [AMD]
physical id: 1
bus info: cpu@0
version: 25.68.1
size: 2311MHz
capacity: 4828MHz
width: 64 bits
Distributor ID: Ubuntu
Description: Ubuntu 23.04
Release: 23.04
Codename: lunar
Imported 9000 rows, but in fact only 808 are present; this can be reproduced repeatedly.
After importing 9000 rows, select * from table
should display 9000 rows.
kould=> CREATE TABLE test_limit (c1 int, c2 int);
OK
----
(0 rows)
kould=> COPY test_limit FROM '/home/kould/CLionProjects/infinity-k/test/data/csv/test_limit.csv' WITH ( DELIMITER ',' );
IMPORT 9000 Rows
kould=> select * from test_limit;
Tips: Use the csv file attached below
SELECT a + 1, b FROM test_table_star
Steps to reproduce the behavior. Bonus points if those are only SQL queries.
SELECT a + 1, b FROM test_table_star
Branch: main
I created a table with a Varchar field and inserted a string into that field, after which an exception occurred.
Tips: src/function/cast/varchar_cast.h:47
create table t7 (a int primary key, z varchar(298) unique null);
insert into t7 (a, z) values (1, 'k');
Branch: main
Parent Issue
CREATE TABLE mytable (
id INTEGER PRIMARY KEY,
name VARCHAR(50),
age INTEGER
);
INSERT INTO mytable (id, name, age) VALUES (1, 'John', 30);
INSERT INTO mytable (id, name, age) VALUES (2, 'Jane', 25);
The system crashes on a SQL syntax error, e.g.
show * from t1
(where t1 is a table name)
or when pressing Tab on the keyboard while the statement has a syntax error.
Branch: main
No response
After this commit:
commit c5d004a
Author: shen yushi [email protected]
Date: Fri Dec 22 16:30:19 2023 +0800
Try to fix CI bug. Add more log. (#351)
* Fix bug: add lock in `BufferObj` when close file. Add extra log for ci debug.
* Remove lock and add log.
When I run the slt test from scratch, everything is OK. Then I shut down the server and restart it. The following crash information is given:
[23:51:37.194] [120875] [info] Load base catalog1 from: /tmp/infinity/data/catalog/META_550.delta.json
[23:51:37.196] [120875] [info] Load delta catalog1 from: /tmp/infinity/data/catalog/META_1072.delta.json
[23:51:37.197] [120875] [info] Load delta catalog1 from: /tmp/infinity/data/catalog/META_1108.delta.json
terminate called after throwing an instance of 'infinity::StorageException@infinity_exception'
what(): Storage Error: SegmentEntry::MergeFrom requires min_row_ts_ match @src/storage/meta/entry/segment_entry.cpp:46
No response
1. Clean data directory.
2. Start infinity server.
3. Run slt test.
4. After all cases passed, shutdown the server.
5. Start infinity server again, which will trigger the fault.
No response
DATE data type is not functioning
Support DATE data type
No response
No response
No response
SizeT thread_num = 1;
SizeT total_times = 2 * 10 * 1000;
Branch: main
Distributor ID: Ubuntu
Description: Ubuntu 22.04.3 LTS
Release: 22.04
Codename: jammy
https://github.com/infiniflow/infinity/blob/main/docs/build_from_source.md
Once I have git, I can use git clone, so I don't need to install git again.
sudo only works for echo
wget -O - https://apt.llvm.org/llvm-snapshot.gpg.key | sudo gpg --dearmor -o /usr/share/keyrings/llvm-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/llvm-archive-keyring.gpg] https://apt.llvm.org/jammy/ llvm-toolchain-jammy-17 main" | sudo tee /etc/apt/sources.list.d/llvm17.list
sudo apt update
sudo apt install clang-17 clang-tools-17
Installing clang-17 but using clang-18
There are dependencies on lz4 and boost, but they are not installed.
No response
Build from source on Ubuntu 22.04
No response
Supports SQL ORDER BY clauses
e.g. select * from t1 order by c1
No response
Treat ORDER BY + LIMIT as a Top-N operation.
No response
No response
No response
What is the feature?
Supports aggregate operations.
How to implement the feature?
The in-memory index is based on a lock-free B-tree for both the dictionary and postings. When dumped to disk, it is compressed according to the posting format.
No response
The default dimension of `VarcharInfo` should not be 0.
src/planner/logical_planner.cpp LogicalPlanner::BuildInsertValue
create table t3 (a int primary key, z varchar unique null);
insert into t3 (a, z) values (1, 'k');
Branch: main
SELECT test_table_star.* FROM test;
Steps to reproduce the behavior. Bonus points if those are only SQL queries.
Branch: main
i5-12500, 16c, 16GB, Ubuntu 22.04
As the title says, the system crashes when using 16 threads to run query_benchmark. Running query_benchmark with 1 thread takes about 3 s, which took 2.2~2.3 s before.
No crash and no performance downgrade.
1. Checkout d4af653975c9ce4642142d9276f3904a07ade8ac (before Add new scheduler #395):
Single-thread performance OK and no crash on the multi-threaded query benchmark.
2. Checkout ada746cfa22f37ead2edcb8dfe857a3371951736 (after Add new scheduler #395):
Single-thread performance OK, but crashes on the multi-threaded query benchmark.
3. Checkout 0d199792e228e904bb5deacf1fa8edc577a0ca74 (after Add lock when set fragment task status. #401):
Single-thread performance downgraded and crashes on the multi-threaded query benchmark.
No response