Giter Club home page Giter Club logo

zksql's Introduction

ZKSQL: Verifiable and Efficient Query Evaluation with Zero-Knowledge Proofs


Authors

Xiling Li, Chenkai Weng, Yongxin Xu, Xiao Wang, Jennie Rogers

{xiling.li@,ckweng@u.@, yongxinxu2022@u.,wangxiao@cs., jennie@}northwestern.edu


Abstract

Individuals and organizations are using databases to store personal information at an unprecedented rate. This creates a quandary for data providers. They are responsible for protecting the privacy of individuals described in their database. On the other hand, data providers are sometimes required to provide statistics about their data instead of sharing it wholesale with strong assurances that these answers are correct and complete such as in regulatory filings for the US SEC and other goverment organizations.

We introduce a system, ZKSQL, that provides authenticated answers to ad-hoc SQL queries with zero-knowledge proofs. Its proofs show that the answers are correct and sound with respect to the database's contents and they do not divulge any information about its input records. This system constructs proofs over the steps in a query's evaluation and it accelerates this process with authenticated set operations. We validate the efficiency of this approach over a suite of TPC-H queries and our results show that ZKSQL achieves two orders of magnitude speedup over the baseline.


Dependencies

  • PostgreSQL 9.5+
  • Apache Calcite 1.8+
  • Apache Maven 3+
  • Java 8+
  • JavaCC 5.0+
  • Maven 4+
  • Python 2.7+
  • cmake 3.11+
  • Protobuf (protobuf-c-compiler protobuf-compiler in brew)
  • libpqxx 6.2.5
  • libgflags-dev
  • libgrpc++-dev, libgrpc-dev

Docker

In this work, we recommend to run the code with docker setup as shown in:

docker_setup.md

Otherwise, you setup and run the code as shown in the following sections.


Setup

Install the dependencies above as needed

Configure psql and load 60k, 120k and 240k databases

Install zk set operations, emp toolkits (VOLE-based protocols) and pqxx

./setup.sh

Frontend

In our experiments, we generate our plans for backend based on queries in TPC-H benchmark and plans can be found:

cd src/main/cpp/conf/plans/

To cutomize plan, you should use maven to manage all required dependencies:

mvn compile

Then you can parse a SQL query to its canonicalized query tree (in json):

mvn compile exec:java -Dexec.mainClass="org.vaultdb.ParseSqlToJson" -Dexec.args="<db name> <file with SQL query>  <path to write output file>"

For example, to prepare a query for the tpch database in PostgreSQL with the query stored in conf/workload/tpch/queries/01.sql writing the query tree to conf/workload/tpch/plans/01.json, run:

mvn compile exec:java -Dexec.mainClass="org.vaultdb.ParseSqlToJson" -Dexec.args="tpch   conf/workload/tpch/queries/01.sql  conf/workload/tpch/plans"

Backend

Build:

cd src/main/cpp
cmake .

Confirm database in use:

# db_name_ = tpch_60k/tpch_120k/tpch_240k for Alice and db_name_ = tpch_zk_bob for Bob
vim ./test/support/zk_base_test.h

Make TPC-H tests:

make tpch_test

Run tests for Alice (prover) and Bob (verifier) concurrently in separate machines:

# Alice machine
sudo chmod +x ./run-alice.sh
./run-alice.sh tpch_test "alice_ip_address"
# Bob machine
sudo chmod +x ./run-bob.sh
./run-bob.sh tpch_test "alice_ip_address"

To switch databases in use, please modify:

# Create an empty copy of your new DB for Bob (the verifier) with table definitions alone and no rows
# Bob will use this for schema inference
# E.g., db_name_ = "tpch_zk_bob" for Bob
vim ./test/support/zk_base_test.h
make tpch_test

References

Xiling Li and Chenkai Weng and Yongxin Xu and Xiao Wang and Jennie Rogers. ZKSQL: Verifiable and Efficient Query Evaluation with Zero-Knowledge Proofs, In Proceedings of the VLDB Endowment (PVLDB), Volume 16, Issue 8, 2023.

Xiao Wang, Alex J. Malozemoff, and Jonathan Katz. EMP-toolkit: Efficient MultiParty computation toolkit. https://github.com/emp-toolkit, 2016.

Kang Yang, Pratik Sarkar, Chenkai Weng, and Xiao Wang. 2021. QuickSilver: Efficient and Affordable Zero-Knowledge Proofs for Circuits and Polynomials over Any Field. In Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security (CCS '21). Association for Computing Machinery, New York, NY, USA, 2986โ€“3001. https://doi.org/10.1145/3460120.3484556

Nicholas Franzese, Jonathan Katz, Steve Lu, Rafail Ostrovsky, Xiao Wang, and Chenkai Weng. 2021. Constant-Overhead Zero-Knowledge for RAM Programs. In Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security (CCS '21). Association for Computing Machinery, New York, NY, USA, 178โ€“191. https://doi.org/10.1145/3460120.3484800


Acknowledgement

This work is supported in part by DARPA under Contract No.HR001120C0087, NSF awards #2016240, #1846447, #2236819, and research awards from Facebook and Google. The views, opinions, and/or findings expressed are those of the author(s) and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government.

zksql's People

Contributors

jennierogers avatar xilinggrantli avatar

Stargazers

Yijun Lee avatar timofey avatar  avatar Jack Gilcrest avatar misoobu avatar  avatar Erhan avatar Vu Vo avatar Rb avatar  avatar Alex Sheluchin avatar alper avatar novalunosis.eth avatar  avatar Eray avatar  avatar Kevin Mai-Husan Chia avatar mark avatar Hins avatar Secant avatar  avatar

Watchers

Xiao Wang avatar  avatar  avatar

zksql's Issues

Segmentation fault when running tpch_test

After following the deployment steps in the docker_setup.md file and running the ./bin/tpch_test in the two containers, Bob's process triggers a segmentation fault:

  • Alice
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from TpcHTest
[ RUN      ] TpcHTest.tpch_q3
connected
Initial memory util: 898318336 bytes.
Plaintext runtime: 0.018357s
net_recv_data
  • Bob
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from TpcHTest
[ RUN      ] TpcHTest.tpch_q3
connected
Initial memory util: 719974400 bytes.
Plaintext runtime: 0.019346s
Time: 3e-06s,...,Validating Project (1) ... tuple count: 0...success!
Time: 1e-06s,...,Validating Filter (2) ... tuple count: 0...success!
Time: 2e-06s,...,Validating Project (3) ... tuple count: 0...success!
Time: 2e-06s,...,Validating Project (5) ... tuple count: 0...success!
Time: 0s,...,Validating Filter (6) ... tuple count: 0...success!
Segmentation fault (core dumped)

Bob appears to always read 0 tuples, while Alice seems to block on a network read:

  • Alice's backtrace
#0  __GI___libc_read (nbytes=1048576, buf=0x7f245386b010, fd=4) at ../sysdeps/unix/sysv/linux/read.c:26
#1  __GI___libc_read (fd=4, buf=0x7f245386b010, nbytes=1048576) at ../sysdeps/unix/sysv/linux/read.c:24
#2  0x00007f2454707b9f in _IO_new_file_underflow (fp=0x556df7e8be10) at libioP.h:948
#3  0x00007f24547063b8 in __GI__IO_file_xsgetn (fp=0x556df7e8be10, data=<optimized out>, n=32) at fileops.c:1322
#4  0x00007f24546f9ee3 in __GI__IO_fread (buf=0x7ffc5bdecef0, size=1, count=32, fp=0x556df7e8be10) at iofread.c:38
#5  0x0000556df693e7ca in emp::NetIO::recv_data_internal (this=0x556df7e8b530, data=0x7ffc5bdecef0, len=32) at /usr/local/include/emp-tool/io/net_io_channel.h:139
#6  0x0000556df694c767 in emp::BoolIO<emp::NetIO>::recv_data_internal (this=0x556df7e8bff0, data=0x7ffc5bdecef0, len=32) at /usr/local/include/emp-zk/emp-zk-bool/bool_io.h:69
#7  0x0000556df694b632 in emp::IOChannel<emp::BoolIO<emp::NetIO> >::recv_data (this=0x556df7e8bff0, data=0x7ffc5bdecef0, nbyte=32) at /usr/local/include/emp-tool/io/io_channel.h:19
#8  0x0000556df694dfca in emp::IOChannel<emp::BoolIO<emp::NetIO> >::recv_block(long long __vector(2)*, unsigned long) (this=0x556df7e8bff0, data=0x7ffc5bdecef0, nblock=2)
at /usr/local/include/emp-tool/io/io_channel.h:27
#9  0x0000556df69506b9 in OTPre<emp::BoolIO<emp::NetIO> >::recv(long long __vector(2)*, bool const*, int, emp::BoolIO<emp::NetIO>*, int) (this=0x7f244c001630, data=0x556df7efe9c0, 
    b=0x556df7e5cfa0, length=13, io2=0x556df7e8bff0, s=0) at /usr/local/include/emp-ot/ferret/preot.h:84
  • Bob's backtrace
#0  0x00007fdeef5534fd in F2kOSTriple<emp::BoolIO<emp::NetIO> >::compute_add_const(long long __vector(2)&, long long __vector(2)&, long long __vector(2) const&, long long __vector(2) const&, long long __vector(2) const&) (this=0x55b808365f50, valb=..., macb=..., 
    vala=..., maca=..., c=...) at /usr/local/include/emp-zk/extensions/ram-zk/ostriple.h:90
#1  0x00007fdeef552167 in ZKSet<emp::BoolIO<emp::NetIO> >::inn_prdt_bch4(long long __vector(2)&, long long __vector(2)&, std::vector<long long __vector(2), std::allocator<long long __vector(2)> > const&, std::vector<long long __vector(2), std::allocator<long long __vector(2)> > const&, long long __vector(2), int) (this=0x55b808365710, val=..., mac=..., X=std::vector of length 0, capacity 0, MAC=std::vector of length 0, capacity 0, r=..., count=0) at /usr/local/include/emp-zk-set/zk_set_utils.hpp:196
#2  0x00007fdeef550b68 in ZKSet<emp::BoolIO<emp::NetIO> >::check_set_equality(std::vector<long long __vector(2), std::allocator<long long __vector(2)> > const&, std::vector<long long __vector(2), std::allocator<long long __vector(2)> > const&, std::vector<long long __vector(2), std::allocator<long long __vector(2)> > const&, std::vector<long long __vector(2), std::allocator<long long __vector(2)> > const&, int) (this=0x55b808365710, lval=std::vector of length 0, capacity 0, lmac=std::vector of length 0, capacity 0, 
    rval=std::vector of length 0, capacity 0, rmac=std::vector of length 0, capacity 0, count=0) at /usr/local/include/emp-zk-set/zk_set_utils.hpp:218
#3  0x00007fdeef54fad5 in ZKSet<emp::BoolIO<emp::NetIO> >::equalInternal (this=0x55b808365710, lhs=0x0, rhs=0x0, count=0, block_sz=40) at /usr/local/include/emp-zk-set/zk_set_equal.hpp:36
#4  0x00007fdeef54e9a9 in ZKSet<emp::BoolIO<emp::NetIO> >::equal (this=0x55b808365710, lhs=0x0, rhs=0x0, count=0, block_sz=40) at /usr/local/include/emp-zk-set/zk_set.h:38
#5  0x00007fdeef54bb87 in vaultdb::Sort::runSelf (this=0x7ffce5d9f590) at /home/vaultdb/zksql/src/main/cpp/operators/sort.cpp:87
#6  0x00007fdeef5274d2 in vaultdb::Operator::run (this=0x7ffce5d9f590) at /home/vaultdb/zksql/src/main/cpp/operators/operator.cpp:58
#7  0x00007fdeef55c9a5 in vaultdb::Join::sortByDummyTag (this=0x55b8082f60a0, table=...) at /home/vaultdb/zksql/src/main/cpp/operators/join.cpp:304
#8  0x00007fdeef55bc8f in vaultdb::Join::zeroOutDupes (this=0x55b8082f60a0, table=...) at /home/vaultdb/zksql/src/main/cpp/operators/join.cpp:221
#9  0x00007fdeef55d0b3 in vaultdb::Join::verifyEquality (this=0x55b8082f60a0, lhs=..., rhs=..., isForeignKey=false, offset=0) at /home/vaultdb/zksql/src/main/cpp/operators/join.cpp:350

Looking at SqlInput::runQuery, it seems that Bob's table will have local_output->getTupleCount() * local_output->getSchema()->size() bits, but the local_output->getTupleCount() is always zero, as Bob's database only contains the schema. Making Bob read from the same database as Alice works:

[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from TpcHTest
[ RUN      ] TpcHTest.tpch_q3
connected
Initial memory util: 720433152 bytes.
Plaintext runtime: 0.02695s
Time: 0.005993s,...Validating Project (1)... tuple count: 1500...success!
Time: 0.016764s,...Validating Filter (2)... tuple count: 1500...success!
Time: 0.000845s,...Validating Project (3)... tuple count: 1500...success!
Time: 0.089406s,...Validating Project (5)... tuple count: 15000...success!
Time: 1.80902s,...Validating Filter (6)... tuple count: 15000...success!
Time: 8.14428s,...Validating HashKeyedJoin (7)... tuple count: 15000...success!
Time: 0.474293s,...Validating Project (9)... tuple count: 60175...success!
Time: 4.11588s,...Validating Filter (10)... tuple count: 60175...success!
Time: 75.774s,...Validating Project (11)... tuple count: 60175...success!
Time: 49.0794s,...Validating HashKeyedJoin (12)... tuple count: 60175...success!
Time: 0.382136s,...Validating Project (13)... tuple count: 60175...success!
Time: 7.16988s,...Validating Sort (14)... tuple count: 60175...success!
Time: 8.23881s,...Validating GroupByAggregate (14)... tuple count: 60175...success!
Time: 2.0583s,...Validating Sort (15)... tuple count: 10...success!
Time: 3.3e-05s,...Validating Project (16)... tuple count: 10...success!
Total communication cost: 53710848 bytes
ZK runtime: 160.358s
CPU clock ticks: 1.39216e+08
CPU clock ticks per second: 139.216
[Linux]Peak resident set size: 4719714304 bytes, current memory size: 967020544 bytes.
Memory: 4719714304 bytes
[       OK ] TpcHTest.tpch_q3 (188274 ms)
[----------] 1 test from TpcHTest (188274 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (188274 ms total)
[  PASSED  ] 1 test.

Is there something I am missing? Shouldn't Bob's database have no data?
Thanks for the help.

Inconsistency frontend logical plan

Hi, as I was to trying to use the frontend to parse a sql query (e.g. "q1") to logical plan, it seems that the resulting logical plan is inconsistent with the preset plans in https://github.com/vaultdb/zksql/blob/main/src/main/cpp/conf/plans/zk-q1.json. I've listed the query and the parsed logical plan below.

Any help will be appreciated, thanks :).

The query:

WITH q1_projection AS (
    SELECT l_returnflag, l_linestatus, l_quantity, l_discount, l_extendedprice, l_extendedprice * (1.0 - l_discount) AS disc_price, l_extendedprice * (1.0 - l_discount) * (1.0 + l_tax) AS charge, l_shipdate
    FROM lineitem
    ORDER BY l_returnflag, l_linestatus)
SELECT
     l_returnflag,
     l_linestatus,
     SUM(l_quantity) as sum_qty,
     SUM(l_extendedprice) as sum_base_price,
     SUM(disc_price) as sum_disc_price,
     SUM(charge) as sum_charge,
     AVG(l_quantity) as avg_qty,
     AVG(l_extendedprice) as avg_price,
     AVG(l_discount) as avg_disc,
     COUNT(*) as count_order
FROM
    q1_projection
WHERE
     l_shipdate <= date '1998-08-03'
GROUP BY
   l_returnflag,
   l_linestatus
ORDER BY
    l_returnflag,
    l_linestatus;

results from frontend

{
  "rels": [
    {
      "id": "0",
      "relOp": "LogicalValues",
      "type": [
        {
          "type": "CHAR",
          "nullable": true,
          "precision": 1,
          "name": "l_returnflag"
        },
        {
          "type": "CHAR",
          "nullable": true,
          "precision": 1,
          "name": "l_linestatus"
        },
        {
          "type": "BIGINT",
          "nullable": true,
          "name": "l_quantity"
        },
        {
          "type": "BIGINT",
          "nullable": true,
          "name": "l_extendedprice"
        },
        {
          "type": "DECIMAL",
          "nullable": true,
          "precision": 19,
          "scale": 1,
          "name": "disc_price"
        },
        {
          "type": "DECIMAL",
          "nullable": true,
          "precision": 19,
          "scale": 2,
          "name": "charge"
        },
        {
          "type": "BIGINT",
          "nullable": true,
          "name": "l_discount"
        }
      ],
      "tuples": [],
      "inputs": []
    },
    {
      "id": "1",
      "relOp": "LogicalAggregate",
      "group": [
        0,
        1
      ],
      "aggs": [
        {
          "agg": {
            "name": "SUM",
            "kind": "SUM",
            "syntax": "FUNCTION"
          },
          "type": {
            "type": "BIGINT",
            "nullable": true
          },
          "distinct": false,
          "operands": [
            2
          ],
          "name": "sum_qty"
        },
        {
          "agg": {
            "name": "SUM",
            "kind": "SUM",
            "syntax": "FUNCTION"
          },
          "type": {
            "type": "BIGINT",
            "nullable": true
          },
          "distinct": false,
          "operands": [
            3
          ],
          "name": "sum_base_price"
        },
        {
          "agg": {
            "name": "SUM",
            "kind": "SUM",
            "syntax": "FUNCTION"
          },
          "type": {
            "type": "DECIMAL",
            "nullable": true,
            "precision": 19,
            "scale": 1
          },
          "distinct": false,
          "operands": [
            4
          ],
          "name": "sum_disc_price"
        },
        {
          "agg": {
            "name": "SUM",
            "kind": "SUM",
            "syntax": "FUNCTION"
          },
          "type": {
            "type": "DECIMAL",
            "nullable": true,
            "precision": 19,
            "scale": 2
          },
          "distinct": false,
          "operands": [
            5
          ],
          "name": "sum_charge"
        },
        {
          "agg": {
            "name": "AVG",
            "kind": "AVG",
            "syntax": "FUNCTION"
          },
          "type": {
            "type": "BIGINT",
            "nullable": true
          },
          "distinct": false,
          "operands": [
            2
          ],
          "name": "avg_qty"
        },
        {
          "agg": {
            "name": "AVG",
            "kind": "AVG",
            "syntax": "FUNCTION"
          },
          "type": {
            "type": "BIGINT",
            "nullable": true
          },
          "distinct": false,
          "operands": [
            3
          ],
          "name": "avg_price"
        },
        {
          "agg": {
            "name": "AVG",
            "kind": "AVG",
            "syntax": "FUNCTION"
          },
          "type": {
            "type": "BIGINT",
            "nullable": true
          },
          "distinct": false,
          "operands": [
            6
          ],
          "name": "avg_disc"
        },
        {
          "agg": {
            "name": "COUNT",
            "kind": "COUNT",
            "syntax": "FUNCTION_STAR"
          },
          "type": {
            "type": "BIGINT",
            "nullable": false
          },
          "distinct": false,
          "operands": [],
          "name": "count_order"
        }
      ]
    },
    {
      "id": "2",
      "relOp": "LogicalSort",
      "collation": [
        {
          "field": 0,
          "direction": "ASCENDING",
          "nulls": "LAST"
        },
        {
          "field": 1,
          "direction": "ASCENDING",
          "nulls": "LAST"
        }
      ]
    }
  ]
}

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.