vaultdb / zksql Goto Github PK

View Code? Open in Web Editor NEW

21.0 3.0 5.0 18.38 MB

ZKSQL: Verifiable and Efficient Query Evaluation with Zero-Knowledge Proofs

License: Other

Shell 0.49% CMake 2.06% C++ 75.15% C 0.06% Java 22.25%

zksql's Introduction

ZKSQL: Verifiable and Efficient Query Evaluation with Zero-Knowledge Proofs

Authors

Xiling Li, Chenkai Weng, Yongxin Xu, Xiao Wang, Jennie Rogers

{xiling.li@,ckweng@u.@, yongxinxu2022@u.,wangxiao@cs., jennie@}northwestern.edu

Abstract

Individuals and organizations are using databases to store personal information at an unprecedented rate. This creates a quandary for data providers. They are responsible for protecting the privacy of individuals described in their database. On the other hand, data providers are sometimes required to provide statistics about their data instead of sharing it wholesale with strong assurances that these answers are correct and complete such as in regulatory filings for the US SEC and other goverment organizations.

We introduce a system, ZKSQL, that provides authenticated answers to ad-hoc SQL queries with zero-knowledge proofs. Its proofs show that the answers are correct and sound with respect to the database's contents and they do not divulge any information about its input records. This system constructs proofs over the steps in a query's evaluation and it accelerates this process with authenticated set operations. We validate the efficiency of this approach over a suite of TPC-H queries and our results show that ZKSQL achieves two orders of magnitude speedup over the baseline.

Dependencies

PostgreSQL 9.5+
Apache Calcite 1.8+
Apache Maven 3+
Java 8+
JavaCC 5.0+
Maven 4+
Python 2.7+
cmake 3.11+
Protobuf (protobuf-c-compiler protobuf-compiler in brew)
libpqxx 6.2.5
libgflags-dev
libgrpc++-dev, libgrpc-dev

Docker

In this work, we recommend to run the code with docker setup as shown in:

docker_setup.md

Otherwise, you setup and run the code as shown in the following sections.

Setup

Install the dependencies above as needed

Configure psql and load 60k, 120k and 240k databases

Install zk set operations, emp toolkits (VOLE-based protocols) and pqxx

./setup.sh

Frontend

In our experiments, we generate our plans for backend based on queries in TPC-H benchmark and plans can be found:

cd src/main/cpp/conf/plans/

To cutomize plan, you should use maven to manage all required dependencies:

mvn compile

Then you can parse a SQL query to its canonicalized query tree (in json):

mvn compile exec:java -Dexec.mainClass="org.vaultdb.ParseSqlToJson" -Dexec.args="<db name> <file with SQL query>  <path to write output file>"

For example, to prepare a query for the tpch database in PostgreSQL with the query stored in conf/workload/tpch/queries/01.sql writing the query tree to conf/workload/tpch/plans/01.json, run:

mvn compile exec:java -Dexec.mainClass="org.vaultdb.ParseSqlToJson" -Dexec.args="tpch   conf/workload/tpch/queries/01.sql  conf/workload/tpch/plans"

Backend

Build:

cd src/main/cpp
cmake .

Confirm database in use:

# db_name_ = tpch_60k/tpch_120k/tpch_240k for Alice and db_name_ = tpch_zk_bob for Bob
vim ./test/support/zk_base_test.h

Make TPC-H tests:

make tpch_test

Run tests for Alice (prover) and Bob (verifier) concurrently in separate machines:

# Alice machine
sudo chmod +x ./run-alice.sh
./run-alice.sh tpch_test "alice_ip_address"
# Bob machine
sudo chmod +x ./run-bob.sh
./run-bob.sh tpch_test "alice_ip_address"

To switch databases in use, please modify:

# Create an empty copy of your new DB for Bob (the verifier) with table definitions alone and no rows
# Bob will use this for schema inference
# E.g., db_name_ = "tpch_zk_bob" for Bob
vim ./test/support/zk_base_test.h
make tpch_test

References

Xiling Li and Chenkai Weng and Yongxin Xu and Xiao Wang and Jennie Rogers. ZKSQL: Verifiable and Efficient Query Evaluation with Zero-Knowledge Proofs, In Proceedings of the VLDB Endowment (PVLDB), Volume 16, Issue 8, 2023.

Xiao Wang, Alex J. Malozemoff, and Jonathan Katz. EMP-toolkit: Efficient MultiParty computation toolkit. https://github.com/emp-toolkit, 2016.

Kang Yang, Pratik Sarkar, Chenkai Weng, and Xiao Wang. 2021. QuickSilver: Efficient and Affordable Zero-Knowledge Proofs for Circuits and Polynomials over Any Field. In Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security (CCS '21). Association for Computing Machinery, New York, NY, USA, 2986–3001. https://doi.org/10.1145/3460120.3484556

Nicholas Franzese, Jonathan Katz, Steve Lu, Rafail Ostrovsky, Xiao Wang, and Chenkai Weng. 2021. Constant-Overhead Zero-Knowledge for RAM Programs. In Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security (CCS '21). Association for Computing Machinery, New York, NY, USA, 178–191. https://doi.org/10.1145/3460120.3484800

Acknowledgement

This work is supported in part by DARPA under Contract No.HR001120C0087, NSF awards #2016240, #1846447, #2236819, and research awards from Facebook and Google. The views, opinions, and/or findings expressed are those of the author(s) and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government.

zksql's People

Contributors

Stargazers

Watchers

Forkers

qaz97877 brucefj jamie-cui binadamu-isiyoonekana lengxiaoyi

zksql's Issues

Segmentation fault when running tpch_test

After following the deployment steps in the docker_setup.md file and running the ./bin/tpch_test in the two containers, Bob's process triggers a segmentation fault:

Alice

[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from TpcHTest
[ RUN      ] TpcHTest.tpch_q3
connected
Initial memory util: 898318336 bytes.
Plaintext runtime: 0.018357s
net_recv_data

[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from TpcHTest
[ RUN      ] TpcHTest.tpch_q3
connected
Initial memory util: 719974400 bytes.
Plaintext runtime: 0.019346s
Time: 3e-06s,...,Validating Project (1) ... tuple count: 0...success!
Time: 1e-06s,...,Validating Filter (2) ... tuple count: 0...success!
Time: 2e-06s,...,Validating Project (3) ... tuple count: 0...success!
Time: 2e-06s,...,Validating Project (5) ... tuple count: 0...success!
Time: 0s,...,Validating Filter (6) ... tuple count: 0...success!
Segmentation fault (core dumped)

Bob appears to always read 0 tuples, while Alice seems to block on a network read:

Alice's backtrace

#0  __GI___libc_read (nbytes=1048576, buf=0x7f245386b010, fd=4) at ../sysdeps/unix/sysv/linux/read.c:26
#1  __GI___libc_read (fd=4, buf=0x7f245386b010, nbytes=1048576) at ../sysdeps/unix/sysv/linux/read.c:24
#2  0x00007f2454707b9f in _IO_new_file_underflow (fp=0x556df7e8be10) at libioP.h:948
#3  0x00007f24547063b8 in __GI__IO_file_xsgetn (fp=0x556df7e8be10, data=<optimized out>, n=32) at fileops.c:1322
#4  0x00007f24546f9ee3 in __GI__IO_fread (buf=0x7ffc5bdecef0, size=1, count=32, fp=0x556df7e8be10) at iofread.c:38
#5  0x0000556df693e7ca in emp::NetIO::recv_data_internal (this=0x556df7e8b530, data=0x7ffc5bdecef0, len=32) at /usr/local/include/emp-tool/io/net_io_channel.h:139
#6  0x0000556df694c767 in emp::BoolIO<emp::NetIO>::recv_data_internal (this=0x556df7e8bff0, data=0x7ffc5bdecef0, len=32) at /usr/local/include/emp-zk/emp-zk-bool/bool_io.h:69
#7  0x0000556df694b632 in emp::IOChannel<emp::BoolIO<emp::NetIO> >::recv_data (this=0x556df7e8bff0, data=0x7ffc5bdecef0, nbyte=32) at /usr/local/include/emp-tool/io/io_channel.h:19
#8  0x0000556df694dfca in emp::IOChannel<emp::BoolIO<emp::NetIO> >::recv_block(long long __vector(2)*, unsigned long) (this=0x556df7e8bff0, data=0x7ffc5bdecef0, nblock=2)
at /usr/local/include/emp-tool/io/io_channel.h:27
#9  0x0000556df69506b9 in OTPre<emp::BoolIO<emp::NetIO> >::recv(long long __vector(2)*, bool const*, int, emp::BoolIO<emp::NetIO>*, int) (this=0x7f244c001630, data=0x556df7efe9c0, 
    b=0x556df7e5cfa0, length=13, io2=0x556df7e8bff0, s=0) at /usr/local/include/emp-ot/ferret/preot.h:84

Bob's backtrace

#0  0x00007fdeef5534fd in F2kOSTriple<emp::BoolIO<emp::NetIO> >::compute_add_const(long long __vector(2)&, long long __vector(2)&, long long __vector(2) const&, long long __vector(2) const&, long long __vector(2) const&) (this=0x55b808365f50, valb=..., macb=..., 
    vala=..., maca=..., c=...) at /usr/local/include/emp-zk/extensions/ram-zk/ostriple.h:90
#1  0x00007fdeef552167 in ZKSet<emp::BoolIO<emp::NetIO> >::inn_prdt_bch4(long long __vector(2)&, long long __vector(2)&, std::vector<long long __vector(2), std::allocator<long long __vector(2)> > const&, std::vector<long long __vector(2), std::allocator<long long __vector(2)> > const&, long long __vector(2), int) (this=0x55b808365710, val=..., mac=..., X=std::vector of length 0, capacity 0, MAC=std::vector of length 0, capacity 0, r=..., count=0) at /usr/local/include/emp-zk-set/zk_set_utils.hpp:196
#2  0x00007fdeef550b68 in ZKSet<emp::BoolIO<emp::NetIO> >::check_set_equality(std::vector<long long __vector(2), std::allocator<long long __vector(2)> > const&, std::vector<long long __vector(2), std::allocator<long long __vector(2)> > const&, std::vector<long long __vector(2), std::allocator<long long __vector(2)> > const&, std::vector<long long __vector(2), std::allocator<long long __vector(2)> > const&, int) (this=0x55b808365710, lval=std::vector of length 0, capacity 0, lmac=std::vector of length 0, capacity 0, 
    rval=std::vector of length 0, capacity 0, rmac=std::vector of length 0, capacity 0, count=0) at /usr/local/include/emp-zk-set/zk_set_utils.hpp:218
#3  0x00007fdeef54fad5 in ZKSet<emp::BoolIO<emp::NetIO> >::equalInternal (this=0x55b808365710, lhs=0x0, rhs=0x0, count=0, block_sz=40) at /usr/local/include/emp-zk-set/zk_set_equal.hpp:36
#4  0x00007fdeef54e9a9 in ZKSet<emp::BoolIO<emp::NetIO> >::equal (this=0x55b808365710, lhs=0x0, rhs=0x0, count=0, block_sz=40) at /usr/local/include/emp-zk-set/zk_set.h:38
#5  0x00007fdeef54bb87 in vaultdb::Sort::runSelf (this=0x7ffce5d9f590) at /home/vaultdb/zksql/src/main/cpp/operators/sort.cpp:87
#6  0x00007fdeef5274d2 in vaultdb::Operator::run (this=0x7ffce5d9f590) at /home/vaultdb/zksql/src/main/cpp/operators/operator.cpp:58
#7  0x00007fdeef55c9a5 in vaultdb::Join::sortByDummyTag (this=0x55b8082f60a0, table=...) at /home/vaultdb/zksql/src/main/cpp/operators/join.cpp:304
#8  0x00007fdeef55bc8f in vaultdb::Join::zeroOutDupes (this=0x55b8082f60a0, table=...) at /home/vaultdb/zksql/src/main/cpp/operators/join.cpp:221
#9  0x00007fdeef55d0b3 in vaultdb::Join::verifyEquality (this=0x55b8082f60a0, lhs=..., rhs=..., isForeignKey=false, offset=0) at /home/vaultdb/zksql/src/main/cpp/operators/join.cpp:350

Looking at SqlInput::runQuery, it seems that Bob's table will have local_output->getTupleCount() * local_output->getSchema()->size() bits, but the local_output->getTupleCount() is always zero, as Bob's database only contains the schema. Making Bob read from the same database as Alice works:

[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from TpcHTest
[ RUN      ] TpcHTest.tpch_q3
connected
Initial memory util: 720433152 bytes.
Plaintext runtime: 0.02695s
Time: 0.005993s,...Validating Project (1)... tuple count: 1500...success!
Time: 0.016764s,...Validating Filter (2)... tuple count: 1500...success!
Time: 0.000845s,...Validating Project (3)... tuple count: 1500...success!
Time: 0.089406s,...Validating Project (5)... tuple count: 15000...success!
Time: 1.80902s,...Validating Filter (6)... tuple count: 15000...success!
Time: 8.14428s,...Validating HashKeyedJoin (7)... tuple count: 15000...success!
Time: 0.474293s,...Validating Project (9)... tuple count: 60175...success!
Time: 4.11588s,...Validating Filter (10)... tuple count: 60175...success!
Time: 75.774s,...Validating Project (11)... tuple count: 60175...success!
Time: 49.0794s,...Validating HashKeyedJoin (12)... tuple count: 60175...success!
Time: 0.382136s,...Validating Project (13)... tuple count: 60175...success!
Time: 7.16988s,...Validating Sort (14)... tuple count: 60175...success!
Time: 8.23881s,...Validating GroupByAggregate (14)... tuple count: 60175...success!
Time: 2.0583s,...Validating Sort (15)... tuple count: 10...success!
Time: 3.3e-05s,...Validating Project (16)... tuple count: 10...success!
Total communication cost: 53710848 bytes
ZK runtime: 160.358s
CPU clock ticks: 1.39216e+08
CPU clock ticks per second: 139.216
[Linux]Peak resident set size: 4719714304 bytes, current memory size: 967020544 bytes.
Memory: 4719714304 bytes
[       OK ] TpcHTest.tpch_q3 (188274 ms)
[----------] 1 test from TpcHTest (188274 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (188274 ms total)
[  PASSED  ] 1 test.

Is there something I am missing? Shouldn't Bob's database have no data?
Thanks for the help.

Inconsistency frontend logical plan

Hi, as I was to trying to use the frontend to parse a sql query (e.g. "q1") to logical plan, it seems that the resulting logical plan is inconsistent with the preset plans in https://github.com/vaultdb/zksql/blob/main/src/main/cpp/conf/plans/zk-q1.json. I've listed the query and the parsed logical plan below.

Any help will be appreciated, thanks :).

The query:

WITH q1_projection AS (
    SELECT l_returnflag, l_linestatus, l_quantity, l_discount, l_extendedprice, l_extendedprice * (1.0 - l_discount) AS disc_price, l_extendedprice * (1.0 - l_discount) * (1.0 + l_tax) AS charge, l_shipdate
    FROM lineitem
    ORDER BY l_returnflag, l_linestatus)
SELECT
     l_returnflag,
     l_linestatus,
     SUM(l_quantity) as sum_qty,
     SUM(l_extendedprice) as sum_base_price,
     SUM(disc_price) as sum_disc_price,
     SUM(charge) as sum_charge,
     AVG(l_quantity) as avg_qty,
     AVG(l_extendedprice) as avg_price,
     AVG(l_discount) as avg_disc,
     COUNT(*) as count_order
FROM
    q1_projection
WHERE
     l_shipdate <= date '1998-08-03'
GROUP BY
   l_returnflag,
   l_linestatus
ORDER BY
    l_returnflag,
    l_linestatus;

results from frontend

{
  "rels": [
    {
      "id": "0",
      "relOp": "LogicalValues",
      "type": [
        {
          "type": "CHAR",
          "nullable": true,
          "precision": 1,
          "name": "l_returnflag"
        },
        {
          "type": "CHAR",
          "nullable": true,
          "precision": 1,
          "name": "l_linestatus"
        },
        {
          "type": "BIGINT",
          "nullable": true,
          "name": "l_quantity"
        },
        {
          "type": "BIGINT",
          "nullable": true,
          "name": "l_extendedprice"
        },
        {
          "type": "DECIMAL",
          "nullable": true,
          "precision": 19,
          "scale": 1,
          "name": "disc_price"
        },
        {
          "type": "DECIMAL",
          "nullable": true,
          "precision": 19,
          "scale": 2,
          "name": "charge"
        },
        {
          "type": "BIGINT",
          "nullable": true,
          "name": "l_discount"
        }
      ],
      "tuples": [],
      "inputs": []
    },
    {
      "id": "1",
      "relOp": "LogicalAggregate",
      "group": [
        0,
        1
      ],
      "aggs": [
        {
          "agg": {
            "name": "SUM",
            "kind": "SUM",
            "syntax": "FUNCTION"
          },
          "type": {
            "type": "BIGINT",
            "nullable": true
          },
          "distinct": false,
          "operands": [
            2
          ],
          "name": "sum_qty"
        },
        {
          "agg": {
            "name": "SUM",
            "kind": "SUM",
            "syntax": "FUNCTION"
          },
          "type": {
            "type": "BIGINT",
            "nullable": true
          },
          "distinct": false,
          "operands": [
            3
          ],
          "name": "sum_base_price"
        },
        {
          "agg": {
            "name": "SUM",
            "kind": "SUM",
            "syntax": "FUNCTION"
          },
          "type": {
            "type": "DECIMAL",
            "nullable": true,
            "precision": 19,
            "scale": 1
          },
          "distinct": false,
          "operands": [
            4
          ],
          "name": "sum_disc_price"
        },
        {
          "agg": {
            "name": "SUM",
            "kind": "SUM",
            "syntax": "FUNCTION"
          },
          "type": {
            "type": "DECIMAL",
            "nullable": true,
            "precision": 19,
            "scale": 2
          },
          "distinct": false,
          "operands": [
            5
          ],
          "name": "sum_charge"
        },
        {
          "agg": {
            "name": "AVG",
            "kind": "AVG",
            "syntax": "FUNCTION"
          },
          "type": {
            "type": "BIGINT",
            "nullable": true
          },
          "distinct": false,
          "operands": [
            2
          ],
          "name": "avg_qty"
        },
        {
          "agg": {
            "name": "AVG",
            "kind": "AVG",
            "syntax": "FUNCTION"
          },
          "type": {
            "type": "BIGINT",
            "nullable": true
          },
          "distinct": false,
          "operands": [
            3
          ],
          "name": "avg_price"
        },
        {
          "agg": {
            "name": "AVG",
            "kind": "AVG",
            "syntax": "FUNCTION"
          },
          "type": {
            "type": "BIGINT",
            "nullable": true
          },
          "distinct": false,
          "operands": [
            6
          ],
          "name": "avg_disc"
        },
        {
          "agg": {
            "name": "COUNT",
            "kind": "COUNT",
            "syntax": "FUNCTION_STAR"
          },
          "type": {
            "type": "BIGINT",
            "nullable": false
          },
          "distinct": false,
          "operands": [],
          "name": "count_order"
        }
      ]
    },
    {
      "id": "2",
      "relOp": "LogicalSort",
      "collation": [
        {
          "field": 0,
          "direction": "ASCENDING",
          "nulls": "LAST"
        },
        {
          "field": 1,
          "direction": "ASCENDING",
          "nulls": "LAST"
        }
      ]
    }
  ]
}

Missing files in docker container

When I enable the latest docker image, I can't find any useful files in the vaultdb directory

vaultdb / zksql Goto Github PK

zksql's Introduction

ZKSQL: Verifiable and Efficient Query Evaluation with Zero-Knowledge Proofs

Authors

Abstract

Dependencies

Docker

Setup

Frontend

Backend

References

Acknowledgement

zksql's People

Contributors

Stargazers

Watchers

Forkers

zksql's Issues

Recommend Projects

Recommend Topics

Recommend Org