
duckdb-rs's Introduction

duckdb-rs


duckdb-rs is an ergonomic wrapper for using DuckDB from Rust. It aims to expose an interface similar to rusqlite; in fact, the initial code and even this README were forked from rusqlite, since DuckDB also exposes a SQLite-compatible API.

use duckdb::{params, Connection, Result};

// In your project, the arrow crate version must match the version used by duckdb.
// Refer to https://github.com/wangfenjin/duckdb-rs/issues/92
// You can either:
use duckdb::arrow::record_batch::RecordBatch;
// Or in your Cargo.toml, use * as the version; features can be toggled according to your needs
// arrow = { version = "*", default-features = false, features = ["prettyprint"] }
// Then you can:
// use arrow::record_batch::RecordBatch;

use duckdb::arrow::util::pretty::print_batches;

#[derive(Debug)]
struct Person {
    id: i32,
    name: String,
    data: Option<Vec<u8>>,
}

fn main() -> Result<()> {
    let conn = Connection::open_in_memory()?;

    conn.execute_batch(
        r"CREATE SEQUENCE seq;
          CREATE TABLE person (
                  id              INTEGER PRIMARY KEY DEFAULT NEXTVAL('seq'),
                  name            TEXT NOT NULL,
                  data            BLOB
                  );
        ")?;

    let me = Person {
        id: 0,
        name: "Steven".to_string(),
        data: None,
    };
    conn.execute(
        "INSERT INTO person (name, data) VALUES (?, ?)",
        params![me.name, me.data],
    )?;

    // query table by rows
    let mut stmt = conn.prepare("SELECT id, name, data FROM person")?;
    let person_iter = stmt.query_map([], |row| {
        Ok(Person {
            id: row.get(0)?,
            name: row.get(1)?,
            data: row.get(2)?,
        })
    })?;

    for person in person_iter {
        let p = person.unwrap();
        println!("ID: {}", p.id);
        println!("Found person {:?}", p);
    }

    // query table by arrow
    let rbs: Vec<RecordBatch> = stmt.query_arrow([])?.collect();
    print_batches(&rbs).unwrap();
    Ok(())
}

Notes on building duckdb and libduckdb-sys

libduckdb-sys is a separate crate from duckdb-rs that provides the Rust declarations for DuckDB's C API. By default, libduckdb-sys attempts to find a DuckDB library that already exists on your system using pkg-config, or a Vcpkg installation for MSVC ABI builds.

You can adjust this behavior in a number of ways:

  • If you use the bundled feature, libduckdb-sys will use the cc crate to compile DuckDB from source and link against that. The source is embedded in the libduckdb-sys crate; while the project is still under heavy development we update it regularly, and once things stabilize we will track stable DuckDB releases. This is probably the simplest solution to any build problems. You can enable it by adding the following to your Cargo.toml file:
    [dependencies]
    # Assumes DuckDB version 0.9.2 is used.
    duckdb = { version = "0.9.2", features = ["bundled"] }
  • When linking against a DuckDB library already on the system (so not using any of the bundled features), you can set the DUCKDB_LIB_DIR environment variable to point to a directory containing the library. You can also set the DUCKDB_INCLUDE_DIR variable to point to the directory containing duckdb.h.
  • Installing the duckdb development packages will usually be all that is required, but the build helpers for pkg-config and vcpkg have some additional configuration options. The default when using vcpkg is to link dynamically, which must be enabled by setting the VCPKGRS_DYNAMIC=1 environment variable before building.

Binding generation

We use bindgen to generate the Rust declarations from DuckDB's C header file. bindgen recommends running this as part of the build process of libraries that use it. We tried this briefly (in duckdb 0.10.0, specifically), but it had some annoyances:

  • The build time for libduckdb-sys (and therefore duckdb) increased dramatically.
  • Running bindgen requires a relatively-recent version of Clang, which many systems do not have installed by default.
  • Running bindgen also requires the DuckDB header file to be present.

So we try to avoid running bindgen at build-time by shipping pregenerated bindings for DuckDB.

If you use the bundled features, you will get pregenerated bindings for the bundled version of DuckDB. If you want to run bindgen at build time to produce your own bindings, use the buildtime_bindgen Cargo feature.

Contributing

See Contributing.md

Checklist

  • Run cargo +nightly fmt to ensure your Rust code is correctly formatted.
  • Run cargo clippy --fix --allow-dirty --all-targets --workspace --all-features -- -D warnings to fix all clippy issues.
  • Ensure cargo test --all-targets --workspace --features "modern-full extensions-full" reports no failures.

TODOs

  • Refactor the ErrorCode part; it is borrowed from rusqlite, and we should have our own
  • Support more types
  • Update duckdb.h
  • Adjust the code examples and documentation
  • Delete unused code / functions
  • Add CI
  • Publish to crates.io

License

duckdb-rs and libduckdb-sys are available under the MIT license. See the LICENSE file for more info.

duckdb-rs's Issues

Update Arrow dependency version

I have a project using the latest version of arrow (11.1.0), which seems to conflict with the version used by duckdb-rs (6.5.0):

    = note: expected reference `&arrow::datatypes::Schema`
               found reference `&arrow::datatypes::schema::Schema`
    = note: perhaps two different versions of crate `arrow` are being used?

I can't downgrade my version as I'm also attempting to use arrow-flight (also 11.1.0) which wasn't available around 6.5.0 (I'm assuming I need to keep arrow and arrow-flight in lockstep).

Is it possible to update the version of arrow used by duckdb-rs?

bundled build flags

According to #65, the released asset from the duckdb repo is different from our bundled build, as they might be compiled with different flags (e.g. on Windows).

This makes some features unsupported in our bundled build; for example, test_insert_duplicate succeeds with the released asset but fails in the bundled version built from source.

We may need to set the same flags as the released version of duckdb in build.rs.

Unable to build on WSL 2 / Ubuntu 20.04 LTS

Hi,

I keep hitting compilation errors and/or segfaults when trying to compile version 0.7.1.

Is there a problem with my buildtools toolchain or am I perhaps running out of memory related to #107 (I only have 16 GB of RAM and only about 6 GB available at the start of compilation)?

$ cargo build
   Compiling libduckdb-sys v0.7.1
The following warnings were emitted during compilation:

warning: c++: fatal error: Killed signal terminated program cc1plus
warning: compilation terminated.

error: failed to run custom build command for `libduckdb-sys v0.7.1`

Caused by:
  process didn't exit successfully: `/home/tobias/src/pq/target/debug/build/libduckdb-sys-856c36fda295fbde/build-script-build` (exit status: 1)
  --- stdout
  cargo:rerun-if-changed=duckdb/duckdb.hpp
  cargo:rerun-if-changed=duckdb/duckdb.cpp
  TARGET = Some("x86_64-unknown-linux-gnu")
  OPT_LEVEL = Some("0")
  HOST = Some("x86_64-unknown-linux-gnu")
  CXX_x86_64-unknown-linux-gnu = None
  CXX_x86_64_unknown_linux_gnu = None
  HOST_CXX = None
  CXX = None
  CXXFLAGS_x86_64-unknown-linux-gnu = None
  CXXFLAGS_x86_64_unknown_linux_gnu = None
  HOST_CXXFLAGS = None
  CXXFLAGS = None
  CRATE_CC_NO_DEFAULTS = None
  DEBUG = Some("true")
  CARGO_CFG_TARGET_FEATURE = Some("fxsr,sse,sse2")
  CXX_x86_64-unknown-linux-gnu = None
  CXX_x86_64_unknown_linux_gnu = None
  HOST_CXX = None
  CXX = None
  CXXFLAGS_x86_64-unknown-linux-gnu = None
  CXXFLAGS_x86_64_unknown_linux_gnu = None
  HOST_CXXFLAGS = None
  CXXFLAGS = None
  CRATE_CC_NO_DEFAULTS = None
  CARGO_CFG_TARGET_FEATURE = Some("fxsr,sse,sse2")
  CXX_x86_64-unknown-linux-gnu = None
  CXX_x86_64_unknown_linux_gnu = None
  HOST_CXX = None
  CXX = None
  CXXFLAGS_x86_64-unknown-linux-gnu = None
  CXXFLAGS_x86_64_unknown_linux_gnu = None
  HOST_CXXFLAGS = None
  CXXFLAGS = None
  CRATE_CC_NO_DEFAULTS = None
  CARGO_CFG_TARGET_FEATURE = Some("fxsr,sse,sse2")
  CXX_x86_64-unknown-linux-gnu = None
  CXX_x86_64_unknown_linux_gnu = None
  HOST_CXX = None
  CXX = None
  CXXFLAGS_x86_64-unknown-linux-gnu = None
  CXXFLAGS_x86_64_unknown_linux_gnu = None
  HOST_CXXFLAGS = None
  CXXFLAGS = None
  CRATE_CC_NO_DEFAULTS = None
  CARGO_CFG_TARGET_FEATURE = Some("fxsr,sse,sse2")
  CXX_x86_64-unknown-linux-gnu = None
  CXX_x86_64_unknown_linux_gnu = None
  HOST_CXX = None
  CXX = None
  CXXFLAGS_x86_64-unknown-linux-gnu = None
  CXXFLAGS_x86_64_unknown_linux_gnu = None
  HOST_CXXFLAGS = None
  CXXFLAGS = None
  CRATE_CC_NO_DEFAULTS = None
  CARGO_CFG_TARGET_FEATURE = Some("fxsr,sse,sse2")
  running: "/home/tobias/.cargo/bin/sccache" "c++" "-O0" "-ffunction-sections" "-fdata-sections" "-fPIC" "-g" "-fno-omit-frame-pointer" "-m64" "-std=c++11" "-o" "/home/tobias/src/pq/target/debug/build/libduckdb-sys-b3c7f7b3f70cbc02/out/duckdb/duckdb.o" "-c" "duckdb/duckdb.cpp"
  cargo:warning=c++: fatal error: Killed signal terminated program cc1plus
  cargo:warning=compilation terminated.
  exit status: 1

  --- stderr


  error occurred: Command "/home/tobias/.cargo/bin/sccache" "c++" "-O0" "-ffunction-sections" "-fdata-sections" "-fPIC" "-g" "-fno-omit-frame-pointer" "-m64" "-std=c++11" "-o" "/home/tobias/src/pq/target/debug/build/libduckdb-sys-b3c7f7b3f70cbc02/out/duckdb/duckdb.o" "-c" "duckdb/duckdb.cpp" with args "c++" did not execute successfully (status code exit status: 1).

Query on C Interface Arrow Data

Can someone provide an example of how to seamlessly pass Arrow data to duckdb in Rust? I have the locations of parquet file(s) in Azure ADLSv2 and am currently using object_store and arrow-rs to read those parquet files. DuckDB and Arrow work very seamlessly together in Python; is there an example someone can point to for how to do this in Rust?

Thanks in advance.
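One route that may help until there is a richer Arrow-ingestion API: let DuckDB scan the parquet files itself and collect the results back as Arrow record batches. A minimal sketch, assuming the parquet extension is available in your build and using a hypothetical local path (remote ADLS access would additionally need the relevant filesystem extensions):

use duckdb::arrow::record_batch::RecordBatch;
use duckdb::{Connection, Result};

fn main() -> Result<()> {
    let conn = Connection::open_in_memory()?;
    // 'data/example.parquet' is a hypothetical path; read_parquet also accepts globs.
    let mut stmt = conn.prepare("SELECT * FROM read_parquet('data/example.parquet')")?;
    let batches: Vec<RecordBatch> = stmt.query_arrow([])?.collect();
    println!("read {} record batches", batches.len());
    Ok(())
}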

How to enable parquet (reader)?

I am a rust newbie, so my apologies if this is a trivial question. I have successfully run the example provided in the examples folder (appender). I am trying to adjust it, by writing / reading a parquet file. The reading capability is actually more important for me at this moment.

I am following the instructions for parquet at https://duckdb.org/docs/data/parquet.
CSV import/export using Rust and duckdb works, but parquet does not, even though the CLI with the same version works on my machine.

Is there something I need to do to enable parquet? Or is there something else I am doing wrong?

Thanks again for making this!

Using the standard duckdb CLI I am able to both export and read the parquet files. Trying the same queries in Rust gives me the following errors.

  1. reading
    let mut stmt = db.prepare("SELECT id, count(1) FROM parquet_schema('userdata1.parquet') GROUP BY id")?;

Error: DuckDBFailure(Error { code: Unknown, extended_code: 1 }, Some("Catalog Error: Table Function with name parquet_schema does not exist!\nDid you mean "duckdb_schemas"?\nLINE 1: SELECT id, count(1) FROM parquet_schema('userdata1.parquet') GRO...\

  2. writing
    let pragma_t = "COPY (SELECT * FROM test) TO 'result-snappy.parquet' (FORMAT 'parquet');";
    db.execute_batch(pragma_t)?;

Error: DuckDBFailure(Error { code: Unknown, extended_code: 1 }, Some("Catalog Error: Copy Function with name parquet does not exist!\nDid you mean "csv"?"))

Environment
Mac os X - m1
rustc 1.58.1 (db9d1b20b 2022-01-20)
duckdb 0.3.1

Cargo.toml
[dependencies.duckdb]
version = "0.3.1"
features = ["bundled"]
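For reference, one possible workaround when the bundled library was built without the parquet extension is to install and load the extension at runtime before issuing parquet queries. A hedged sketch, assuming extension loading and network access are available in your build:

use duckdb::{Connection, Result};

fn main() -> Result<()> {
    let conn = Connection::open_in_memory()?;
    // Downloads the extension on first use, then loads it into this database.
    conn.execute_batch("INSTALL parquet; LOAD parquet;")?;
    conn.execute_batch(
        "COPY (SELECT 42 AS answer) TO 'result-snappy.parquet' (FORMAT 'parquet');",
    )?;
    Ok(())
}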

Unable to compile `duckdb-sys` on Windows

We were setting up Windows builds in GitHub Actions for our app and noticed that the duckdb-sys v0.4.0 crate's build script is unable to compile DuckDB from source.

There are a lot of suspicious warnings, and I've had to attach the output as a file because GitHub has a 65536 character limit on comments, but I think this is the thing that actually fails the build:

fatal error C1128: number of sections exceeded object file format limit: compile with /bigobj

(Link to full output from the build script)

I notice you've actually commented out the Windows build for this repo's CI. Does that mean this is a known issue?

https://github.com/wangfenjin/duckdb-rs/blob/30d8e3e637f8a3f9b96ca6de9e2275bc69a5bbae/.github/workflows/rust.yaml#L15-L17
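For reference, the usual fix for error C1128 when compiling a large amalgamation with the cc crate is to pass /bigobj on MSVC. A hedged sketch of what that could look like in a build script (hypothetical, not the actual libduckdb-sys build.rs):

fn main() {
    let mut build = cc::Build::new();
    build
        .cpp(true)
        .file("duckdb/duckdb.cpp")
        // MSVC only: raise the per-object section limit to avoid error C1128.
        .flag_if_supported("/bigobj");
    build.compile("duckdb");
}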

Support more data types

Currently the duckdb C API doesn't support all data types, and the duckdb_value_* and duckdb_bind_* methods are not complete.

We need to:

  • Support duckdb_bind_* api
  • Support more types in row.rs#value_ref

This is a good issue to work on if you want to become familiar with both the duckdb codebase and this Rust repo.

Re-export dependencies (in particular arrow) to avoid version incompatibilities in client crates

Hi,

The Rust Arrow crate is used in many projects through the Rust data ecosystem (duckdb-rs, DataFusion, Polars, connectorx, ...). They also rapidly cycle through major version numbers (currently v25 and I think they were on 19 when I started a few months ago). This makes it extremely difficult if not impossible to find common version numbers to use in any client crate if you want to use more than one of these projects in one crate.

The only way around this that I have found so far is to use a re-exported version from the place where it is used. DataFusion does this for example here: https://github.com/apache/arrow-datafusion/blob/572c20e904b48ea4819210b983d41b1b08c23b46/datafusion/core/src/lib.rs#L232

Please can you add the same top-level re-export to duckdb-rs? It has to be the arrow crate at the top-level because I use a number of items from it.

This is probably a one line change and I'm happy to submit a PR for it if you like.
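For reference, the crate does now expose such a re-export (the README example above already imports duckdb::arrow). The change is essentially a one-line public re-export; a minimal sketch:

// In duckdb-rs's lib.rs: re-export the exact arrow version this crate links against.
pub use arrow;

// Downstream crates can then write:
// use duckdb::arrow::record_batch::RecordBatch;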

Loading `parquet` extension results in `Assertion failed ... file parse_info.hpp` on Cargo cleaned build

TLDR; Running conn.execute_batch("LOAD parquet;")?; appears to cause the following exception on the latest version (0.8.0) of duckdb:

Assertion failed: (dynamic_cast<TARGET *>(this)), function Cast, file parse_info.hpp, line 22.

This issue was previously raised in #161. The exception still occurs after running cargo clean. A reproducible example is below.

main.rs

use duckdb::{params, Connection, Result};

// In your project, we need to keep the arrow version same as the version used in duckdb.
// Refer to https://github.com/wangfenjin/duckdb-rs/issues/92
// You can either:
// use duckdb::arrow::record_batch::RecordBatch;
// Or in your Cargo.toml, use * as the version; features can be toggled according to your needs
// arrow = { version = "*", default-features = false, features = ["prettyprint"] }
// Then you can:
use arrow::record_batch::RecordBatch;

use duckdb::arrow::util::pretty::print_batches;

#[derive(Debug)]
struct Person {
    id: i32,
    name: String,
    data: Option<Vec<u8>>,
}

fn main() -> Result<()> {
    let conn = Connection::open_in_memory()?;

    conn.execute_batch(
        r"CREATE SEQUENCE seq;
          CREATE TABLE person (
                  id              INTEGER PRIMARY KEY DEFAULT NEXTVAL('seq'),
                  name            TEXT NOT NULL,
                  data            BLOB
                  );
        ")?;

    let me = Person {
        id: 0,
        name: "Steven".to_string(),
        data: None,
    };
    conn.execute(
        "INSERT INTO person (name, data) VALUES (?, ?)",
        params![me.name, me.data],
    )?;

    // query table by rows
    let mut stmt = conn.prepare("SELECT id, name, data FROM person")?;
    let person_iter = stmt.query_map([], |row| {
        Ok(Person {
            id: row.get(0)?,
            name: row.get(1)?,
            data: row.get(2)?,
        })
    })?;

    for person in person_iter {
        println!("Found person {:?}", person.unwrap());
    }

    // query table by arrow
    let rbs: Vec<RecordBatch> = stmt.query_arrow([])?.collect();
    print_batches(&rbs).unwrap();

    conn.execute_batch("INSTALL parquet;")?;

    // Crash occurs here
    conn.execute_batch("LOAD parquet;")?;
    Ok(())
}

Cargo.toml

[package]
name = "duckdb-test"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
duckdb = { version = "0.8.0", features = ["bundled"] }
arrow = { version = "*", default-features = false, features = ["prettyprint"] }

Structure

duckdb-test on  master [?] is 📦 v0.1.0 via 🦀 v1.68.0 
❯ tree
.
├── Cargo.lock
├── Cargo.toml
└── src
    ├── data
    │   ├── int32_decimal.parquet
    │   └── yellow_tripdata_2022-01.parquet
    └── main.rs

2 directories, 5 files

OS

❯ uname -a
Darwin wills-mbp.lan 22.1.0 Darwin Kernel Version 22.1.0: Sun Oct  9 20:15:09 PDT 2022; root:xnu-8792.41.9~2/RELEASE_ARM64_T6000 arm64

Output

duckdb-test on  master [?] is 📦 v0.1.0 via 🦀 v1.68.0 took 45s 
❯ cargo clean && cargo run --bin duckdb-test
   Compiling autocfg v1.1.0
   Compiling libm v0.2.7
   Compiling cfg-if v1.0.0
   Compiling proc-macro2 v1.0.58
   Compiling quote v1.0.27
   Compiling unicode-ident v1.0.8
   Compiling libc v0.2.144
   Compiling version_check v0.9.4
   Compiling once_cell v1.17.1
   Compiling core-foundation-sys v0.8.4
   Compiling bitflags v2.3.1
   Compiling iana-time-zone v0.1.56
   Compiling static_assertions v1.1.0
   Compiling arrow-schema v39.0.0
   Compiling lexical-util v0.8.5
   Compiling syn v1.0.109
   Compiling num-traits v0.2.15
   Compiling num-integer v0.1.45
   Compiling num-bigint v0.4.3
   Compiling num-iter v0.1.43
   Compiling num-rational v0.4.1
   Compiling ahash v0.8.3
   Compiling getrandom v0.2.9
   Compiling rustversion v1.0.12
   Compiling syn v2.0.16
   Compiling memchr v2.5.0
   Compiling crc32fast v1.3.2
   Compiling serde v1.0.163
   Compiling heck v0.4.1
   Compiling hashbrown v0.13.2
   Compiling serde_json v1.0.96
   Compiling adler v1.0.2
   Compiling xattr v0.2.3
   Compiling jobserver v0.1.26
   Compiling filetime v0.2.21
   Compiling num-complex v0.4.3
   Compiling half v2.2.1
   Compiling miniz_oxide v0.7.1
   Compiling chrono v0.4.24
   Compiling lexical-write-integer v0.8.5
   Compiling lexical-parse-integer v0.8.6
   Compiling itoa v1.0.6
   Compiling ryu v1.0.13
   Compiling lexical-parse-float v0.8.5
   Compiling lexical-write-float v0.8.5
   Compiling aho-corasick v1.0.1
   Compiling flate2 v1.0.26
   Compiling cc v1.0.79
   Compiling tar v0.4.38
   Compiling unicode-width v0.1.10
   Compiling vcpkg v0.2.15
   Compiling pkg-config v0.3.27
   Compiling regex-syntax v0.7.1
   Compiling num v0.4.0
   Compiling arrow-buffer v39.0.0
   Compiling serde_derive v1.0.163
   Compiling lexical-core v0.8.5
   Compiling rust_decimal v1.29.1
   Compiling arrayvec v0.7.2
   Compiling hashlink v0.8.2
   Compiling arrow-data v39.0.0
   Compiling smallvec v1.10.0
   Compiling fallible-streaming-iterator v0.1.9
   Compiling fallible-iterator v0.2.0
   Compiling cast v0.3.0
   Compiling arrow-array v39.0.0
   Compiling strum_macros v0.24.3
   Compiling regex v1.8.1
   Compiling arrow-select v39.0.0
   Compiling arrow-arith v39.0.0
   Compiling arrow-row v39.0.0
   Compiling strum v0.24.1
   Compiling comfy-table v6.1.4
   Compiling arrow-string v39.0.0
   Compiling arrow-cast v39.0.0
   Compiling arrow-ord v39.0.0
   Compiling arrow v39.0.0
   Compiling libduckdb-sys v0.8.0
   Compiling duckdb v0.8.0
   Compiling duckdb-test v0.1.0 (/Users/will/Projects/duckdb-test)
warning: field `id` is never read
  --> src/main.rs:16:5
   |
15 | struct Person {
   |        ------ field in this struct
16 |     id: i32,
   |     ^^
   |
   = note: `Person` has a derived impl for the trait `Debug`, but this is intentionally ignored during dead code analysis
   = note: `#[warn(dead_code)]` on by default

warning: `duckdb-test` (bin "duckdb-test") generated 1 warning
    Finished dev [unoptimized + debuginfo] target(s) in 46.53s
     Running `target/debug/duckdb-test`
Found person Person { id: 1, name: "Steven", data: None }
+----+--------+------+
| id | name   | data |
+----+--------+------+
| 1  | Steven |      |
+----+--------+------+
Assertion failed: (dynamic_cast<TARGET *>(this)), function Cast, file parse_info.hpp, line 22.
zsh: abort      cargo run --bin duckdb-test

How to compile statically linked executable?

Hello, thanks for an amazing project! It helps a lot.

I'm trying to build a static executable and I already tried everything that I know :)

It's not an issue. I just need a little help. Could you give me a hint in which direction I should dig? Thanks!

WASM example

Hi! Could you provide an example that shows how to compile a small project that uses DuckDb to WASM and then is runnable in the browser? That would be a great help.

How to insert lists, maps, and structs?

Maybe I missed it, but I couldn't see how to insert a row if one of the columns is e.g. TEXT[]. I naively tried params![a_vec_of_str] but ToSql is only implemented for Vec<u8> (I guess for blobs).

The DuckDB docs aren't super clear but it looks like you're supposed to use list_create('a', 'b', 'c') but I'm not sure how to do that with params![].
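Until ToSql grows list support, one workaround is to build the list inside the SQL text rather than binding it as a parameter. A sketch, assuming the values are trusted enough to be interpolated (or supplied via DuckDB functions such as list_value):

use duckdb::{Connection, Result};

fn main() -> Result<()> {
    let conn = Connection::open_in_memory()?;
    conn.execute_batch("CREATE TABLE t (tags TEXT[]);")?;
    // DuckDB's list literal syntax...
    conn.execute("INSERT INTO t (tags) VALUES (['a', 'b', 'c'])", [])?;
    // ...or the list_value function.
    conn.execute("INSERT INTO t (tags) VALUES (list_value('a', 'b', 'c'))", [])?;
    Ok(())
}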

Baffled by problems building libduckdb-sys with Dockerfile

I am also working on the prql project, and I am trying to create a Dockerfile to encapsulate all the machinery in a reproducible Docker container. I am hitting an error while compiling libduckdb-sys.

Specifically, I'm working with the prql-query repo. I can run cargo build natively without problem. But when I attempt to use docker build -t pq . on the repo, I get the error reported in this Issue: error: failed to run custom build command for libduckdb-sys v0.5.1

Here's an excerpt showing the error message; the original report lists all the details. Any ideas on how to troubleshoot this? Many thanks.

# error from "docker build -t pq ." ...
...
#12 470.9 The following warnings were emitted during compilation:
#12 470.9
#12 470.9 warning: c++: fatal error: Killed signal terminated program cc1plus
#12 470.9 warning: compilation terminated.
#12 470.9
#12 470.9 error: failed to run custom build command for `libduckdb-sys v0.5.1`
#12 470.9
#12 470.9 Caused by:
#12 471.0   process didn't exit successfully: `/app/target/release/build/libduckdb-sys-dd2866ecae070e6b/build-script-build` (exit status: 1)
...

Problem with `stmt.query_arrow([])?.collect()`

I'm a total beginner trying to learn Rust with DuckDB. Feel free to close this if it's too basic a question; I'm just posting it here in case others have trouble getting started.

Error:

rustc: a value of type `Vec<RecordBatch>` cannot be built from an iterator over elements of type `arrow::record_batch::RecordBatch` the trait `FromIterator<arrow::record_batch::RecordBatch>` is not implemented for `Vec<RecordBatch>`

Question: What am I doing wrong? (The original post included a screenshot of the code and Cargo.toml.)

I understand that the Rust iterator does not like the Arrow RecordBatch, but how can I fix that? Did I use the wrong arrow dependency in my Cargo.toml?
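For reference, this error usually means the arrow crate named in your Cargo.toml is a different major version from the one duckdb-rs was built against, so the two RecordBatch types are distinct. One fix, assuming a duckdb version that re-exports arrow, is to use the re-exported types so the versions always match:

use duckdb::arrow::record_batch::RecordBatch; // re-exported, always version-matched
use duckdb::{Connection, Result};

fn main() -> Result<()> {
    let conn = Connection::open_in_memory()?;
    let mut stmt = conn.prepare("SELECT 1 AS x")?;
    let batches: Vec<RecordBatch> = stmt.query_arrow([])?.collect();
    println!("{} batch(es)", batches.len());
    Ok(())
}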

Is `Cargo.toml` current?

Running cargo minimal-versions check gives a bunch of errors:

error[E0432]: unresolved import `std::path::AsPath`
  --> /Users/maximilian/.cargo/registry/src/github.com-1ecc6299db9ec823/gcc-0.3.0/src/lib.rs:50:26
   |
50 | use std::path::{PathBuf, AsPath};
   |                          ^^^^^^
   |                          |
   |                          no `AsPath` in `path`
   |                          help: a similar name exists in the module: `Path`

error[E0433]: failed to resolve: could not find `old_io` in `std`
   --> /Users/maximilian/.cargo/registry/src/github.com-1ecc6299db9ec823/gcc-0.3.0/src/lib.rs:328:10
    |
328 |     std::old_io::stdio::set_stderr(Box::new(std::old_io::util::NullWriter));
    |          ^^^^^^ could not find `old_io` in `std`

error[E0433]: failed to resolve: could not find `old_io` in `std`
   --> /Users/maximilian/.cargo/registry/src/github.com-1ecc6299db9ec823/gcc-0.3.0/src/lib.rs:328:50
    |
328 |     std::old_io::stdio::set_stderr(Box::new(std::old_io::util::NullWriter));
    |                                                  ^^^^^^ could not find `old_io` in `std`

error[E0425]: cannot find function `set_exit_status` in module `env`
   --> /Users/maximilian/.cargo/registry/src/github.com-1ecc6299db9ec823/gcc-0.3.0/src/lib.rs:327:10
    |
327 |     env::set_exit_status(1);
    |          ^^^^^^^^^^^^^^^ not found in `env`

(and thank you for the excellent library! It's really helpful over at PRQL!)

Improve docs

Currently there is no way to learn how to use this library. There should be a duckdb-rs book covering best practices for common operations such as inserting, querying, etc.

Incorrect binding to unsigned integers

Even though DuckDB supports unsigned integers, this crate still binds values as signed integers, causing an overflow error:

use duckdb::*;
fn main() {
    let conn = Connection::open_in_memory().expect("open db");
    conn.execute_batch(
        r"CREATE TABLE a (x UBIGINT);").expect("create table");
    conn.execute("INSERT INTO a(x) VALUES (?)", params![0xFFFFFFFFFFFFFFFFu64]).expect("insert");
}

Output:

thread 'main' panicked at 'insert: DuckDBFailure(Error { code: Unknown, extended_code: 1 }, Some("Conversion Error: Type INT64 with value -1 can't be cast because the value is out of range for the destination type UINT64"))', src/main.rs:7:81
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

This behavior might be inherited from Rusqlite, because I know Rusqlite does something like this.
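A possible stopgap until unsigned binding is fixed is to format the value into the SQL text instead of binding it. A sketch, only reasonable because the value is a plain integer and not untrusted input:

use duckdb::{Connection, Result};

fn main() -> Result<()> {
    let conn = Connection::open_in_memory()?;
    conn.execute_batch("CREATE TABLE a (x UBIGINT);")?;
    let v: u64 = u64::MAX;
    // Interpolated directly because the ToSql binding would go through INT64.
    conn.execute(&format!("INSERT INTO a (x) VALUES ({v})"), [])?;
    Ok(())
}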

Add support for pre-release DuckDB versions (e.g. "pre" feature).

The currently released 0.6.1 version of DuckDB has a bug that causes core dumps on Linux using batch insertion (see: duckdb/duckdb#5759).

This bug has been fixed, but hasn't been released yet. It would be great if there was a simple feature like "pre" or "dev" allowing for libduckdb-sys to be built against the main branch instead of the released version.

Wasm support?

This doesn't seem to compile within a native Rust Cloudflare Worker. When trying to invoke wrangler dev we see this message: note: rust-lld: error: unable to find library -lduckdb.

Unable to build in Ubuntu linux

I am trying to build the examples, but I am unable to build on Linux. Using the bundled feature flag takes quite a long time and appears to fail.

duckdb = { version = "0.6.0", features = ["bundled" ]}

Logs

   Compiling tokio-util v0.7.4
   Compiling tokio-stream v0.1.11
The following warnings were emitted during compilation:

warning: c++: fatal error: Killed signal terminated program cc1plus
warning: compilation terminated.

error: failed to run custom build command for `libduckdb-sys v0.6.0`

Caused by:
  process didn't exit successfully: `/home/swoorup/personal/sample-rs/target/debug/build/libduckdb-sys-fa153eb3d17a1282/build-script-build` (exit status: 1)
  --- stdout
  cargo:rerun-if-changed=/usr/include/clang/15.0.2/include/stdbool.h
  cargo:rerun-if-changed=/usr/include/clang/15.0.2/include/stdint.h
  cargo:rerun-if-changed=/usr/include/stdint.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/bits/libc-header-start.h
  cargo:rerun-if-changed=/usr/include/features.h
  cargo:rerun-if-changed=/usr/include/features-time64.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/bits/wordsize.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/bits/timesize.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/bits/wordsize.h
  cargo:rerun-if-changed=/usr/include/stdc-predef.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/sys/cdefs.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/bits/wordsize.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/bits/long-double.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/gnu/stubs.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/gnu/stubs-64.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/bits/types.h
  cargo:rerun-if-changed=/usr/include/features.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/bits/wordsize.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/bits/timesize.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/bits/wordsize.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/bits/typesizes.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/bits/time64.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/bits/wchar.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/bits/wordsize.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/bits/stdint-intn.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/bits/types.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/bits/stdint-uintn.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/bits/types.h
  cargo:rerun-if-changed=/usr/include/stdlib.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/bits/libc-header-start.h
  cargo:rerun-if-changed=/usr/include/features.h
  cargo:rerun-if-changed=/usr/include/clang/15.0.2/include/stddef.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/bits/waitflags.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/bits/waitstatus.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/bits/floatn.h
  cargo:rerun-if-changed=/usr/include/features.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/bits/floatn-common.h
  cargo:rerun-if-changed=/usr/include/features.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/bits/long-double.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/sys/types.h
  cargo:rerun-if-changed=/usr/include/features.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/bits/types.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/bits/types/clock_t.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/bits/types.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/bits/types/clockid_t.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/bits/types.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/bits/types/time_t.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/bits/types.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/bits/types/timer_t.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/bits/types.h
  cargo:rerun-if-changed=/usr/include/clang/15.0.2/include/stddef.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/bits/stdint-intn.h
  cargo:rerun-if-changed=/usr/include/endian.h
  cargo:rerun-if-changed=/usr/include/features.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/bits/endian.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/bits/endianness.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/bits/byteswap.h
  cargo:rerun-if-changed=/usr/include/features.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/bits/types.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/bits/uintn-identity.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/bits/types.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/sys/select.h
  cargo:rerun-if-changed=/usr/include/features.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/bits/types.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/bits/select.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/bits/types/sigset_t.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/bits/types/__sigset_t.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/bits/types/time_t.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/bits/types/struct_timeval.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/bits/types.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/bits/types/struct_timespec.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/bits/types.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/bits/endian.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/bits/types/time_t.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/bits/pthreadtypes.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/bits/thread-shared-types.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/bits/pthreadtypes-arch.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/bits/wordsize.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/bits/atomic_wide_counter.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/bits/struct_mutex.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/bits/struct_rwlock.h
  cargo:rerun-if-changed=/usr/include/alloca.h
  cargo:rerun-if-changed=/usr/include/features.h
  cargo:rerun-if-changed=/usr/include/clang/15.0.2/include/stddef.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/bits/stdlib-float.h
  cargo:rerun-if-changed=/usr/include/x86_64-linux-gnu/bits/floatn.h
  cargo:rerun-if-changed=duckdb/duckdb.hpp
  cargo:rerun-if-changed=duckdb/duckdb.cpp
  TARGET = Some("x86_64-unknown-linux-gnu")
  OPT_LEVEL = Some("0")
  HOST = Some("x86_64-unknown-linux-gnu")
  cargo:rerun-if-env-changed=CXX_x86_64-unknown-linux-gnu
  CXX_x86_64-unknown-linux-gnu = None
  cargo:rerun-if-env-changed=CXX_x86_64_unknown_linux_gnu
  CXX_x86_64_unknown_linux_gnu = None
  cargo:rerun-if-env-changed=HOST_CXX
  HOST_CXX = None
  cargo:rerun-if-env-changed=CXX
  CXX = None
  cargo:rerun-if-env-changed=CXXFLAGS_x86_64-unknown-linux-gnu
  CXXFLAGS_x86_64-unknown-linux-gnu = None
  cargo:rerun-if-env-changed=CXXFLAGS_x86_64_unknown_linux_gnu
  CXXFLAGS_x86_64_unknown_linux_gnu = None
  cargo:rerun-if-env-changed=HOST_CXXFLAGS
  HOST_CXXFLAGS = None
  cargo:rerun-if-env-changed=CXXFLAGS
  CXXFLAGS = None
  cargo:rerun-if-env-changed=CRATE_CC_NO_DEFAULTS
  CRATE_CC_NO_DEFAULTS = None
  DEBUG = Some("true")
  CARGO_CFG_TARGET_FEATURE = Some("fxsr,llvm14-builtins-abi,sse,sse2")
  cargo:rerun-if-env-changed=CXX_x86_64-unknown-linux-gnu
  CXX_x86_64-unknown-linux-gnu = None
  cargo:rerun-if-env-changed=CXX_x86_64_unknown_linux_gnu
  CXX_x86_64_unknown_linux_gnu = None
  cargo:rerun-if-env-changed=HOST_CXX
  HOST_CXX = None
  cargo:rerun-if-env-changed=CXX
  CXX = None
  cargo:rerun-if-env-changed=CXXFLAGS_x86_64-unknown-linux-gnu
  CXXFLAGS_x86_64-unknown-linux-gnu = None
  cargo:rerun-if-env-changed=CXXFLAGS_x86_64_unknown_linux_gnu
  CXXFLAGS_x86_64_unknown_linux_gnu = None
  cargo:rerun-if-env-changed=HOST_CXXFLAGS
  HOST_CXXFLAGS = None
  cargo:rerun-if-env-changed=CXXFLAGS
  CXXFLAGS = None
  cargo:rerun-if-env-changed=CRATE_CC_NO_DEFAULTS
  CRATE_CC_NO_DEFAULTS = None
  CARGO_CFG_TARGET_FEATURE = Some("fxsr,llvm14-builtins-abi,sse,sse2")
  cargo:rerun-if-env-changed=CXX_x86_64-unknown-linux-gnu
  CXX_x86_64-unknown-linux-gnu = None
  cargo:rerun-if-env-changed=CXX_x86_64_unknown_linux_gnu
  CXX_x86_64_unknown_linux_gnu = None
  cargo:rerun-if-env-changed=HOST_CXX
  HOST_CXX = None
  cargo:rerun-if-env-changed=CXX
  CXX = None
  cargo:rerun-if-env-changed=CXXFLAGS_x86_64-unknown-linux-gnu
  CXXFLAGS_x86_64-unknown-linux-gnu = None
  cargo:rerun-if-env-changed=CXXFLAGS_x86_64_unknown_linux_gnu
  CXXFLAGS_x86_64_unknown_linux_gnu = None
  cargo:rerun-if-env-changed=HOST_CXXFLAGS
  HOST_CXXFLAGS = None
  cargo:rerun-if-env-changed=CXXFLAGS
  CXXFLAGS = None
  cargo:rerun-if-env-changed=CRATE_CC_NO_DEFAULTS
  CRATE_CC_NO_DEFAULTS = None
  CARGO_CFG_TARGET_FEATURE = Some("fxsr,llvm14-builtins-abi,sse,sse2")
  cargo:rerun-if-env-changed=CXX_x86_64-unknown-linux-gnu
  CXX_x86_64-unknown-linux-gnu = None
  cargo:rerun-if-env-changed=CXX_x86_64_unknown_linux_gnu
  CXX_x86_64_unknown_linux_gnu = None
  cargo:rerun-if-env-changed=HOST_CXX
  HOST_CXX = None
  cargo:rerun-if-env-changed=CXX
  CXX = None
  cargo:rerun-if-env-changed=CXXFLAGS_x86_64-unknown-linux-gnu
  CXXFLAGS_x86_64-unknown-linux-gnu = None
  cargo:rerun-if-env-changed=CXXFLAGS_x86_64_unknown_linux_gnu
  CXXFLAGS_x86_64_unknown_linux_gnu = None
  cargo:rerun-if-env-changed=HOST_CXXFLAGS
  HOST_CXXFLAGS = None
  cargo:rerun-if-env-changed=CXXFLAGS
  CXXFLAGS = None
  cargo:rerun-if-env-changed=CRATE_CC_NO_DEFAULTS
  CRATE_CC_NO_DEFAULTS = None
  CARGO_CFG_TARGET_FEATURE = Some("fxsr,llvm14-builtins-abi,sse,sse2")
  cargo:rerun-if-env-changed=CXX_x86_64-unknown-linux-gnu
  CXX_x86_64-unknown-linux-gnu = None
  cargo:rerun-if-env-changed=CXX_x86_64_unknown_linux_gnu
  CXX_x86_64_unknown_linux_gnu = None
  cargo:rerun-if-env-changed=HOST_CXX
  HOST_CXX = None
  cargo:rerun-if-env-changed=CXX
  CXX = None
  cargo:rerun-if-env-changed=CXXFLAGS_x86_64-unknown-linux-gnu
  CXXFLAGS_x86_64-unknown-linux-gnu = None
  cargo:rerun-if-env-changed=CXXFLAGS_x86_64_unknown_linux_gnu
  CXXFLAGS_x86_64_unknown_linux_gnu = None
  cargo:rerun-if-env-changed=HOST_CXXFLAGS
  HOST_CXXFLAGS = None
  cargo:rerun-if-env-changed=CXXFLAGS
  CXXFLAGS = None
  cargo:rerun-if-env-changed=CRATE_CC_NO_DEFAULTS
  CRATE_CC_NO_DEFAULTS = None
  CARGO_CFG_TARGET_FEATURE = Some("fxsr,llvm14-builtins-abi,sse,sse2")
  running: "c++" "-O0" "-ffunction-sections" "-fdata-sections" "-fPIC" "-g" "-fno-omit-frame-pointer" "-m64" "-std=c++11" "-o" "/home/swoorup/personal/sample-rs/target/debug/build/libduckdb-sys-0d8aca56e8cb0792/out/duckdb/duckdb.o" "-c" "duckdb/duckdb.cpp"
  cargo:warning=c++: fatal error: Killed signal terminated program cc1plus
  cargo:warning=compilation terminated.
  exit status: 1

  --- stderr


  error occurred: Command "c++" "-O0" "-ffunction-sections" "-fdata-sections" "-fPIC" "-g" "-fno-omit-frame-pointer" "-m64" "-std=c++11" "-o" "/home/swoorup/personal/sample-rs/target/debug/build/libduckdb-sys-0d8aca56e8cb0792/out/duckdb/duckdb.o" "-c" "duckdb/duckdb.cpp" with args "c++" did not execute successfully (status code exit status: 1).

Requiring re-build after removing `~/.cargo/registry/src`

Over at PRQL/prql#2870, we're hitting an issue with how duckdb 0.8.0 uses the cargo cache.

Specifically, the standard cargo cache GHA deletes ~/.cargo/registry/src when it's uploading the cache. But if that's removed, then duckdb-rs requires recompiling. This is the only dependency of ours where this occurs, and it is new after 0.7.1. It also seems to occur in 0.8.1. It's possible to test locally by cargo build, removing the path, and then cargo build again.

It's probably less visible in this repo, since the cargo cache GHA only caches dependencies.

I don't have much context for why this happens — it's very possible it's a dependency of duckdb-rs. It is quite disruptive for our CI — it takes 2-3x longer across most of our jobs.

Would you have any idea for what's causing it? Thank you!

Very high RAM consumption on build

Is it normal that the bundled crate requires >16 GB of RAM to build? It looks like the C++ compiler is compiling one giant .cpp file. Is it not possible to perform separate compilation?

Interleaved connections results in table does not exist error

If you open a connection (e.g. conn1 below) and DDL is then performed on another connection, the original connection doesn't see the new DDL.

Here's some very basic sample code that fails:

use duckdb::Connection;

fn run_query(conn: &mut Connection, query: &str) {
    let mut stmt = conn.prepare(query).unwrap();
    let query_result = stmt.query_arrow([]).unwrap();
    eprintln!("RESULT: {:?}", query_result.collect::<Vec<_>>());
}

fn main() {
    let _ = std::fs::remove_file("/tmp/test_duckdb_concurrency.duckdb");
    let url = "/tmp/test_duckdb_concurrency.duckdb";

    let mut conn = Connection::open_with_flags(
        url,
        duckdb::Config::default()
            .access_mode(duckdb::AccessMode::ReadWrite)
            .unwrap(),
    )
    .unwrap();
    run_query(&mut conn, "DROP TABLE IF EXISTS t");
    run_query(&mut conn, "CREATE TABLE t AS SELECT 1 AS a");

    let mut conn1 = Connection::open_with_flags(
        url,
        duckdb::Config::default()
            .access_mode(duckdb::AccessMode::ReadWrite)
            .unwrap(),
    )
    .unwrap();
    let mut conn2 = Connection::open_with_flags(
        url,
        duckdb::Config::default()
            .access_mode(duckdb::AccessMode::ReadWrite)
            .unwrap(),
    )
    .unwrap();
    run_query(&mut conn2, "CREATE OR REPLACE VIEW x AS SELECT * FROM t");

    run_query(&mut conn2, "SELECT * FROM x");

    run_query(&mut conn1, "SELECT * FROM t");
    run_query(&mut conn1, "SELECT * FROM x");
}
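A workaround sketch, assuming your duckdb-rs version provides Connection::try_clone: clone one connection instead of calling open several times, so both handles share the same in-process database instance and therefore the same catalog:

use duckdb::{Connection, Result};

fn main() -> Result<()> {
    let conn = Connection::open("/tmp/test_duckdb_concurrency.duckdb")?;
    conn.execute_batch("CREATE OR REPLACE TABLE t AS SELECT 1 AS a")?;

    // Second handle to the same database instance inside this process.
    let conn2 = conn.try_clone()?;
    conn2.execute_batch("CREATE OR REPLACE VIEW x AS SELECT * FROM t")?;

    // Both handles now see table t and view x.
    let n: i64 = conn.query_row("SELECT count(*) FROM x", [], |row| row.get(0))?;
    println!("rows in x: {n}");
    Ok(())
}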

Unable to construct a Connection from a raw pointer

Heya @wangfenjin,

I'm looking at using the virtual table functionality you integrated from my project (duckdb-extension-framework) and I haven't quite figured out how to fully switch. The missing piece is a way to construct a Connection object from a raw pointer in a function called by DuckDB, which is how DuckDB loads extensions.

Something like this would probably suffice:

pub unsafe fn from_cpp(db: duckdb_database) -> Result<Connection> {
    Ok(Connection {
        db: RefCell::new(unsafe { InnerConnection::new(db, false)? }),
        cache: StatementCache::with_capacity(0),
        path: None,
    })
}

which I could then use like this:

#[no_mangle]
pub unsafe extern "C" fn deltatable_init_rust(db: *mut c_void) {
    init(db.cast()).expect("init failed");
}

unsafe fn init(db: duckdb_database) -> Result<(), Box<dyn Error>> {
    let connection = Connection::from_cpp(db)?;

    connection.register_table_function::<DeltaFunction>("read_delta")?;
    Ok(())
}

I've had a look through the codebase, and can't see a way of doing this without modifying the duckdb-rs code.

Happy to be corrected if I've missed something!

Insert support with data chunk or Arrow

I can't seem to find an interface that would allow me to insert columnar data I already have in memory.

For Arrow specifically, I see the following C interfaces in duckdb itself for Arrow support, but they appear to be used for reading data out, or used in small inserts where the data is in the SQL query itself: https://github.com/duckdb/duckdb/blob/c3ba7e5b/src/include/duckdb.h#L1805

I recently opened a question on this in duckdb (duckdb/duckdb#3412) and closed it since there appeared to be support (possibly only through a C++ API with DataChunk). But, I'm thinking that since there doesn't appear to be a C API for writing to their DataChunk type that this might actually be necessary to support it from duckdb-rs.

I thought I might ask here since I think @wangfenjin has done a lot of the Arrow support in DuckDB, but I am using duckdb-rs as my primary interface.
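Until a data-chunk or Arrow ingestion API exists, the closest thing available from duckdb-rs today appears to be the row-wise Appender, which at least avoids re-parsing an INSERT statement per row. A minimal sketch:

use duckdb::{params, Connection, Result};

fn main() -> Result<()> {
    let conn = Connection::open_in_memory()?;
    conn.execute_batch("CREATE TABLE items (id INTEGER, name TEXT);")?;

    // Row-by-row insertion through the Appender API.
    let mut app = conn.appender("items")?;
    for i in 0..1_000 {
        app.append_row(params![i, format!("name-{i}")])?;
    }
    Ok(())
}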

Dealing with arrow nulls?

Hello!

I'm trying to copy an arrow record batch into a duckdb chunk via record_batch_to_duckdb_data_chunk. Generally things are working; however, it seems that items that should be null aren't handled properly. I think that on downcasting or otherwise, nulls are getting dropped, e.g. a null i32 becomes 0. I've only tried the basic primitive types.

I've created a little test here: tshauck@0d3b87b (I think validity shouldn't be null).

I'm happy to try to take a look, but any pointers where to start would be helpful.

Error on 0.8.0

Hello,

I updated from 0.7.1 to 0.8.0 and I encounter this issue:

Assertion failed: (dynamic_cast<TARGET *>(this)), function Cast, file parse_info.hpp, line 22.

It happens when I create a connection:

let db = Connection::open_in_memory()?;

When I go back to 0.7.1, the code works.

My environment is the following:

  • rustc 1.69.0 (84c898d65 2023-04-16) (built from a source tarball)
  • cargo 1.69.0 (6e9a83356 2023-04-12)
  • MacOS 13.4

Allow bundling of extensions at compilation time

Hi,

First of all a huge thank you very much for creating and maintaining this extension! It works wonderfully and I am making extensive use of it in prql-query.

I'm currently loading DuckDB extensions through remote installation, e.g. INSTALL parquet; LOAD parquet; (code example). This works fine but causes a noticeable delay on each invocation.

Discussing this with the DuckDB developers on Discord (thread), they pointed out that these extensions can be bundled at compile time:

The various extensions can be bundled by toggling the BUILD_..._EXTENSION options in the CMake configuration wherever the shared library is produced:

option(BUILD_ICU_EXTENSION "Build the ICU extension." FALSE)
option(BUILD_PARQUET_EXTENSION "Build the Parquet extension." FALSE)
option(BUILD_TPCH_EXTENSION "Build the TPC-H extension." FALSE)
option(BUILD_TPCDS_EXTENSION "Build the TPC-DS extension." FALSE)
option(BUILD_FTS_EXTENSION "Build the FTS extension." FALSE)
option(BUILD_HTTPFS_EXTENSION "Build the HTTP File System extension." FALSE)
option(BUILD_JSON_EXTENSION "Build the JSON extension." FALSE)
option(BUILD_EXCEL_EXTENSION "Build the excel extension." FALSE)
option(BUILD_INET_EXTENSION "Build the inet extension." FALSE)

Would it be possible for you to facilitate this somehow?

My guess is that this would have to be done as part of the libduckdb-sys build stage somehow but I don't understand enough of this to really attempt a PR for this myself.

"Invalid memory reference" when appending a row with a timestamp

Thanks for the crate!

I'm trying to use the Appender API but get a segfault when there is a timestamp in the row. Here's a minimal example showing what happens (and here is a repo reproducing it):

    #[test]
    fn timestamp_appender_minimal_example_sig_segv() {
        let db = Connection::open_in_memory().unwrap();

        let create_table_sql = r"
          CREATE TABLE item (
              id INTEGER NOT NULL,
              ts TIMESTAMP
          );";
        db.execute_batch(create_table_sql).unwrap();

        let mut app = db.appender("item").unwrap();
        let row_count = 10;
        for i in 0..row_count {
            app.append_row(params![i, "1970-01-01T00:00:00Z"]).unwrap();
        }

        // running 1 test
        // error: test failed, to rerun pass '--lib'
        // Caused by:
        //   process didn't exit successfully: `/path-to-repo` (signal: 11, SIGSEGV: invalid memory reference)

        let val = db
            .query_row("SELECT count(1) FROM item", [], |row| {
                <(u32,)>::try_from(row)
            })
            .unwrap();

        assert_eq!(val, (row_count,));
    }

I've enabled the chrono feature and tried with Utc::now as well. This is using the bundled version of duckdb.

I'd love to help fix the issue; would you mind pointing me in the right direction? I started a draft with failing tests here.
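While the Appender path is broken, a possible stopgap (a sketch, assuming DuckDB's implicit VARCHAR-to-TIMESTAMP cast applies to bound parameters) is to insert through a prepared INSERT statement instead:

use duckdb::{params, Connection, Result};

fn main() -> Result<()> {
    let db = Connection::open_in_memory()?;
    db.execute_batch("CREATE TABLE item (id INTEGER NOT NULL, ts TIMESTAMP);")?;
    let mut stmt = db.prepare("INSERT INTO item (id, ts) VALUES (?, ?)")?;
    for i in 0..10 {
        // The string parameter is cast to TIMESTAMP by DuckDB on insert.
        stmt.execute(params![i, "1970-01-01 00:00:00"])?;
    }
    Ok(())
}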

uuid::Uuid feature - use new DuckDB UUID datatype rather than BLOB

After duckdb 0.3.0, we should start thinking about using UUID instead of BLOB as the underlying datatype for uuid::Uuid.

A new variant of Value seems to be required so that we can stop piggybacking off of Value::Blob.

#[cfg(feature = "uuid")]
impl From<uuid::Uuid> for Value {
    #[inline]
    fn from(id: uuid::Uuid) -> Value {
        Value::Blob(id.as_bytes().to_vec())
    }
}

Is it accurate to say that we will require at least 1 PR to the duckdb repo?

  • From what I can see, /src/appender.rs makes use of APIs such as ffi::duckdb_append_blob, but an API called ffi::duckdb_append_uuid doesn't exist yet.
