Giter Club home page Giter Club logo

duckdb-wasm's Introduction

DuckDB-Wasm

DuckDB is an in-process SQL OLAP Database Management System.

DuckDB-Wasm brings DuckDB to every browser thanks to WebAssembly.

Duckdb-Wasm speaks Arrow fluently, reads Parquet, CSV and JSON files backed by Filesystem APIs or HTTP requests and has been tested with Chrome, Firefox, Safari and Node.js. Learn more about DuckDB-Wasm from our VLDB publication or the recorded talk.

Try it out at shell.duckdb.org or Observable, read the API documentation, check out the web-app examples, and chat with us on Discord.

Build from source

git clone https://github.com/duckdb/duckdb-wasm.git
cd duckdb-wasm
git submodule init
git submodule update
make

Repository Structure

Subproject Description Language
duckdb_wasm Wasm Library C++
@duckdb/duckdb-wasm Typescript API Typescript
@duckdb/duckdb-wasm-shell SQL Shell Rust
@duckdb/duckdb-wasm-app GitHub Page Typescript
@duckdb/react-duckdb React Hooks Typescript

duckdb-wasm's People

Contributors

ankoh avatar bitflower avatar candux avatar carlopi avatar chenxi8611 avatar dakror avatar dberardo-com avatar dengkunli avatar dependabot[bot] avatar domoritz avatar duanluan avatar dude0001 avatar elefeint avatar hannes avatar jcolot avatar jimmyhmiller avatar jraymakers avatar keller-mark avatar mytherin avatar o-alexandre-felipe avatar ozkatz avatar ravwojdyla avatar samansmink avatar spff avatar stratospark avatar szarnyasg avatar tishj avatar whscullin avatar y-- avatar ywelsch avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

duckdb-wasm's Issues

Prepare for public Github Actions

We should migrate gracefully from my custom runner to public actions.
We should have a dedicated job that checks up-front if the docker CI image exists.
All other jobs then depend on that image directly.

Automate NPM releases

wasm_release and the current package naming are too error prone.
Better automate that. Maybe even -dev releases.

DOMException: Failed to construct 'Worker'

I would like to construct a worker from jsdelivr but it doesn't seem to work (probably a CORS issue).

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>DuckDB WASM Test</title>
</head>
<body>
    <script src="https://cdn.jsdelivr.net/npm/@dashql/[email protected]/dist/duckdb-browser-async.js"></script>
    <script>
        async function run() {
            const DUCKDB_CONFIG = await duckdb.configure(duckdb.getJsDelivrBundles());

            console.log('config', DUCKDB_CONFIG);

            const logger = new duckdb.ConsoleLogger();
            const worker = new Worker(DUCKDB_CONFIG.mainWorker);
            const db = new duckdb.AsyncDuckDB(logger, worker);
            await db.instantiate(DUCKDB_CONFIG.mainModule, DUCKDB_CONFIG.pthreadWorker);
        }

        run();
    </script>
</body>
</html>

I think we should automatically detect when we make a cross domain request and then create a worker with importScripts as shown in https://stackoverflow.com/a/62914052/214950.

The example of loading JSON text in the README fails

The README gives (after #333):

await db.registerFileText(
    'rows.json',
    `[
    { “col1”: 1, “col2”: “foo” },
    { “col1”: 2, “col2”: “bar” },
]`,
);
await db.insertJSONFromPath('rows.json', { name: 'test' });

But, when I run that, I get TypeError: t is undefined:

Uncaught (in promise) TypeError: t is undefined
    insertJSONFromPath http://devd.io:8000/dist/viewer_duckdb.js:11833
    <anonymous> debugger eval code:6
    async* debugger eval code:9

I created a repository that reproduces the problem, you can just clone it and open up index.html in a browser to see it in action

Add .examples command

We need an .examples command in the shell to show interesting queries.
That way, we can better present features such as remote parquet scans or the database import.

Export files via /out/

We should support a pseudo output directory /out/ in the wasm filesystem.

# Exports the relation lineitem to a pseudo file somefile.csv.
COPY lineitem TO '/out/somefile.csv' ( DELIMITER '|', HEADER );

# Exports the database to a pseudo directory somedb.
EXPORT DATABASE '/out/somedb' (FORMAT PARQUET);

The shell should check these paths and provide the files to the user via a download dialog.

Open remote database files read-only

We should support opening remote database files with open().
Our filesystem already supports HTTP range requests so that's likely only targeting the front-facing WebDB api.

  • Generate DuckDB TPCH database files in CI script
  • Serve the DuckDB database files with Karma
  • Implement open API with filesystem path

Support prepared statements

WebDB should maintain a dictionary of currently prepared statements.
For now, the js-api can then just pass a stringified json object with the arguments to WASM.

Example of the C-API:

duckdb_prepared_statement stmt;
if (duckdb_prepare(con, "INSERT INTO integers VALUES (?, ?)", &stmt) == DuckDBError) {
	// handle error
}

duckdb_bind_int32(stmt, 1, 42); // the parameter index starts counting at 1!
duckdb_bind_int32(stmt, 2, 43);
// NULL as second parameter means no result set is requested
duckdb_execute_prepared(stmt, NULL);
duckdb_destroy_prepare(&stmt);

// we can also query result sets using prepared statements
if (duckdb_prepare(con, "SELECT * FROM integers WHERE i = ?", &stmt) == DuckDBError) {
	// handle error
}
duckdb_bind_int32(stmt, 1, 42);
duckdb_execute_prepared(stmt, &result);

// do something with result

// clean up
duckdb_destroy_result(&result);
duckdb_destroy_prepare(&stmt);

Add S3FS

SELECT * FROM parquet_scan('s3://<bucket>/<file>');

Would be a pretty sweet example.
Mostly building this on top of our buffered filesystem.

Support custom expressions

It would be great to support e.g. js expressions in DuckDB so that one can write (for now synchronous) code to compute some derived value. Maybe we could even support user defined aggregates.

As we discussed before, supporting async code would be pretty trick since we can't just stop and reenter the execution in wasm right now. Maybe DuckDB will add async execution in the future to support async expressions as well.

Globbing HTTP files should run HEAD requests

Right now, globbing a HTTP file returns false which means that remote scans can only be done with explicit URL registration.
Better would be to just run HEAD requests on globs.
That way, we can support SELECT * FROM 'http://some-parquet'

Attach command

Remote databases are read-only right now.
I think the much more useful approach would be to "attach" to a remote database by importing all of its tables as views.
That way, the user can still create new tables/views and scan remote tables if necessary.

Requires duckdb/duckdb#1985

Polish rust bindings

Right now, we just implement the bare minimum for the shell.
We should just maintain a rust crate for wasm bindings since that is useful on its own.

Customize loader

It would be great if there was a way to customize the http loader so that a user can override authentication and base urls easily.

Fix DuckDB version tags in CI

Right now, the shell does not show the correct DuckDB version since the tags are not properly fetched in the GitHub actions.

JsDelivr Bundles

Picking the right duckdb bundle is a little bit too inconvenient at the moment.
configure should just derive all bundle urls via jsdelivr if not provided explicitly.

Issue with aggregation

If I try and do the following it works fine in the normal DuckDB build (using python for example):

import duckdb
cursor=duckdb.connect()
cursor.execute("create table products as select * from 'Products.csv'")
cursor.execute("select productgroup FROM p group by productgroup").fetchall()

However, if trying it in wasm I get an error about binding:

duckdb> select ProductGroup FROM products GROUP BY ProductGroup;
Error: Binder Error: Referenced column "productgroup" not found in FROM clause!
Candidate bindings: "products.ProductGroup"
LINE 1: ...ct ProductGroup FROM products GROUP BY ProductGroup;

Here's a 30s video as well if that's helpful: https://gyazo.com/a909ade3348d4f8ecfe2183d209dd56f.

And here's the file:

$ cat > Products.csv
ProductGroup,Product,Year,Quarter,Revenue,Units,Count,Product Key,Reseller,Product Info,QuarterAsNumber
Electronics,Phone,2018,Q1,103,7,1,2018-Q1,Sony,Format=XML; <Properties>…,1
Electronics,Phone,2018,Q1,102,4,1,2018-Q1,Sony,Format=XML; <Properties>…,1
Electronics,Phone,2019,Q1,98,12,1,2019-Q1,Sony,Format=XML; <Properties>…,1
Electronics,Computer,2018,Q1,104,3,1,2018-Q1,Samsung,Format=XML; <Properties>…,1
Electronics,Computer,2019,Q1,83,7,1,2019-Q1,Google,Format=XML; <Properties>…,1
Media,Theater,2018,Q1,17,4,1,2018-Q1,Sony,Format=XML; <Properties>…,1
Media,Theater,2019,Q1,20,7,1,2019-Q1,Sony,Format=XML; <Properties>…,1
Media,Movies,2018,Q1,25,12,1,2018-Q1,Microsoft,Format=XML; <Properties>…,1
Media,Movies,2019,Q1,26,13,1,2019-Q1,Sony,Format=XML; <Properties>…,1
Electronics,Phone,2018,Q2,105,5,1,2018-Q2,Samsung,Format=XML; <Properties>…,2
Electronics,Phone,2019,Q2,82,15,1,2019-Q2,LG,Format=XML; <Properties>…,2
Electronics,Computer,2018,Q2,99,4,1,2018-Q2,LG,Format=XML; <Properties>…,2
Electronics,Computer,2019,Q2,84,20,1,2019-Q2,Sony,Format=XML; <Properties>…,2
Media,Theater,2018,Q2,17,4,1,2018-Q2,Microsoft,Format=XML; <Properties>…,2
Media,Theater,2019,Q2,22,5,1,2019-Q2,Sony,Format=XML; <Properties>…,2
Media,Movies,2018,Q2,25,12,1,2018-Q2,Samsung,Format=XML; <Properties>…,2
Media,Movies,2019,Q2,26,14,1,2019-Q2,Google,Format=XML; <Properties>…,2
Electronics,Phone,2000,Q1,103,7,1,2000-Q1,Sony,Format=XML; <Properties>…,1
Electronics,Phone,2001,Q1,102,4,1,2001-Q1,Sony,Format=XML; <Properties>…,1
Electronics,Phone,2002,Q1,98,12,1,2002-Q1,Microsoft,Format=XML; <Properties>…,1
Electronics,Computer,2003,Q1,104,3,1,2003-Q1,Sony,Format=XML; <Properties>…,1
Electronics,Computer,2004,Q1,83,7,1,2004-Q1,Samsung,Format=XML; <Properties>…,1
Media,Theater,2005,Q1,17,4,1,2005-Q1,Google,Format=XML; <Properties>…,1
Media,Theater,2006,Q1,20,7,1,2006-Q1,Sony,Format=XML; <Properties>…,1
Media,Movies,2007,Q1,25,12,1,2007-Q1,Sony,Format=XML; <Properties>…,1
Media,Movies,2008,Q1,26,13,1,2008-Q1,Microsoft,Format=XML; <Properties>…,1
Electronics,Phone,2009,Q2,105,5,1,2009-Q2,Sony,Format=XML; <Properties>…,2
Electronics,Phone,2010,Q2,82,15,1,2010-Q2,Sony,Format=XML; <Properties>…,2
Electronics,Computer,2011,Q2,99,4,1,2011-Q2,Sony,Format=XML; <Properties>…,2
Electronics,Computer,2012,Q2,84,20,1,2012-Q2,Sony,Format=XML; <Properties>…,2
Media,Theater,2013,Q2,17,4,1,2013-Q2,Sony,Format=XML; <Properties>…,2
Media,Theater,2014,Q2,22,5,1,2014-Q2,Sony,Format=XML; <Properties>…,2
Media,Movies,2015,Q2,25,12,1,2015-Q2,Sony,Format=XML; <Properties>…,2
Media,Movies,2016,Q2,26,14,1,2016-Q2,Samsung,Format=XML; <Properties>…,2
Media,Movies,2017,Q1,26,13,1,2017-Q1,Google,Format=XML; <Properties>…,1
Electronics,Phone,2018,Q2,105,5,1,2018-Q2,Sony,Format=XML; <Properties>…,2
Electronics,Phone,2019,Q2,82,15,1,2019-Q2,Sony,Format=XML; <Properties>…,2
Electronics,Computer,2020,Q2,99,4,1,2020-Q2,Microsoft,Format=XML; <Properties>…,2
Electronics,Phone,2020,Q1,103,7,1,2020-Q1,Sony,Format=XML; <Properties>…,1
Electronics,Phone,2020,Q2,102,4,1,2020-Q2,Samsung,Format=XML; <Properties>…,2
Electronics,Phone,2020,Q3,98,12,1,2020-Q3,LG,Format=XML; <Properties>…,3
Electronics,Computer,2020,Q4,104,3,1,2020-Q4,LG,Format=XML; <Properties>…,4
Electronics,Computer,2020,Q1,83,7,1,2020-Q1,Sony,Format=XML; <Properties>…,1
Media,Theater,2020,Q1,17,4,1,2020-Q1,Microsoft,Format=XML; <Properties>…,1
Media,Theater,2020,Q1,20,7,1,2020-Q1,Sony,Format=XML; <Properties>…,1

Ship -Os variants

They are slower, but roughly half the size.
We should ship an -Os variant of every wasm module and let the user select either small module size or performance.

Import from native objects

Someone may want to import an existing array of objects or object of columns into DuckDB. If we support importing Arrow Tables, we would solve this issue.

Parking Lot based Shared Lock

Our shared lock is based on parking lots and has a rarely occurring bug where all threads keep spinning.
Replaced with std::shared_mutex for now here.

Need stress tests for the parking lot.
This is only relevant for the experimental pthread bundle and does not effect single-threaded builds.

Replace terminal with alternate greeter on smartphones

We just don't support too narrow screens at the moment.
Fall back to a different greeter with error description.

Especially the printing of wide arrow tables is just not possible on small screens.
We could maybe just make the font size small enough in these cases but an alternate greeter will do for now.

Test embedding of shell

We should test the shell embedding in the CI since it will be used in duckdb-web and dashql.

Terminal error logger

Right now, some errors in the shell are still only logged to the console.
Make sure the most important errors are printed to xterm.

Implement importing of databases

We should support importing database files via HTTP.
This shouldn't be too hard since we already support HTTP files in the wasm filesystem.

The shell should also expose this via URL parameter.
Demo with university schema and TPC-H 0.01.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.