
peloton's Introduction


UPDATE 2019-03-17

The Peloton project is dead. We have abandoned this repository and moved on to build a new DBMS. There are several engineering techniques and designs for supporting autonomous operations that we learned from this first system, and we are doing a much better job of implementing them in the second system.

We will not accept pull requests for this repository. We will also not respond to questions or problems that you may have with running this software.

We will announce the new system later in 2019.

What Is Peloton?

  • A self-driving SQL database management system.
  • Integrated artificial intelligence components that enable autonomous optimization.
  • Native support for byte-addressable non-volatile memory (NVM) storage technology.
  • Lock-free multi-version concurrency control to support real-time analytics.
  • Postgres network-protocol and JDBC compatible.
  • High-performance, lock-free Bw-Tree for indexing.
  • 100% Open-Source (Apache Software License v2.0).

What Problem Does Peloton Solve?

During the last two decades, researchers and vendors have built advisory tools to assist database administrators in system tuning and physical design. This work is incomplete because these tools still require a human to make the final decisions about any changes to the database, and they are reactionary measures that fix problems after they occur.

A new architecture is needed for a truly "self-driving" database management system (DBMS) that is designed for autonomous operation. This is different from earlier attempts because all aspects of the system are controlled by an integrated planning component. In addition to optimizing the system for the current workload, it predicts future workload trends so that the system can prepare itself accordingly. This removes the need for a human to determine the right changes and reduces the time taken to deploy them, keeping the DBMS tuned for high performance. The complexity of managing these systems has surpassed the abilities of human experts.

Peloton is a relational database management system designed for fully autonomous optimization of hybrid workloads. See the peloton wiki for more information.

Installation

Check out the installation instructions.

Supported Platforms

The Wiki also contains a list of supported platforms.

Development / Contributing

We invite you to help us build the future of self-driving DBMSs. Please see the contributing guide for details.

Issues

Before reporting a problem, please check the how to file an issue guide.

Status

Technology preview: currently unsupported, possibly due to incomplete functionality or unsuitability for production use.

Contributors

See the people page for the full listing of contributors.

License

Copyright (c) 2014-2018 CMU Database Group
Licensed under the Apache License.


peloton's Issues

DBMS-Client Interface

The values returned by the DBMS should convey the reason for the failure of a query.

  1. We need to add a mechanism for conveying this information using different error codes similar to Postgres.
  2. We need to revamp our exception handling subsystem to let the compiler do the work for us.

`insert on conflict do update` transaction failed

Tested on 583d1be (Apr 9) and 71d7a68 (Mar 13).

postgres=# select * from test;
 key | value | flag | size 
-----+-------+------+------
(0 rows)

postgres=# INSERT INTO test (key, value, flag, size) VALUES ('1', '1', 1, 1) ON CONFLICT (key) DO UPDATE SET value = excluded.value, flag = excluded.flag, size = excluded.size;
INSERT 0 1
postgres=# select * from test;
 key | value | flag | size 
-----+-------+------+------
 1   | 1     |    1 |    1
(1 row)

postgres=# INSERT INTO test (key, value, flag, size) VALUES ('1', '2', 1, 1) ON CONFLICT (key) DO UPDATE SET value = excluded.value, flag = excluded.flag, size = excluded.size;
ERROR:  transaction failed
19:54:18,379 [../../src/backend/bridge/ddl/ddl_index.cpp:143:CreateIndex] INFO  - Created index(44886)  test_pkey on test.                                        [1/1647]
19:54:23,046 [../../src/backend/bridge/dml/mapper/mapper_seq_scan.cpp:44:TransformSeqScan] INFO  - SeqScan: database oid 12111 table oid 44880: test
19:54:23,046 [../../src/backend/bridge/dml/mapper/mapper_utils.cpp:270:BuildPredicateFromQual] INFO  - Predicate:
19:54:23,046 [../../src/backend/bridge/dml/mapper/mapper_utils.cpp:272:BuildPredicateFromQual] INFO  - NULL
19:54:23,046 [../../src/backend/bridge/dml/mapper/mapper_utils.cpp:89:GetGenericInfoFromScanState] INFO  - project_info : Target List: < DEST_column_id , expression >
DirectMap List: < NEW_col_id , <tuple_idx , OLD_col_id>  > 
<0, <0, 0> >
<1, <0, 1> >
<2, <0, 2> >
<3, <0, 3> >

19:54:23,046 [../../src/backend/bridge/dml/mapper/mapper_utils.cpp:123:GetGenericInfoFromScanState] INFO  - Pure direct map projection.
19:54:23,109 [../../src/backend/bridge/dml/mapper/mapper_utils.cpp:42:BuildParams] INFO  - Built 0 params 
19:54:23,109 [../../src/backend/concurrency/optimistic_txn_manager.cpp:265:CommitTransaction] INFO  - Committing peloton txn : 2 
19:54:28,416 [../../src/backend/bridge/dml/mapper/dml_utils.cpp:197:PrepareModifyTableState] INFO  - CMD_INSERT
19:54:28,416 [../../src/backend/bridge/dml/mapper/mapper_modify_table.cpp:39:TransformModifyTable] INFO  - CMD_INSERT
19:54:28,416 [../../src/backend/bridge/dml/mapper/mapper_modify_table.cpp:83:TransformInsert] INFO  - Insert into: database oid 12111 table oid 44880: test
19:54:28,416 [../../src/backend/bridge/dml/mapper/mapper_utils.cpp:42:BuildParams] INFO  - Built 0 params 
19:54:28,416 [../../src/backend/concurrency/optimistic_txn_manager.cpp:265:CommitTransaction] INFO  - Committing peloton txn : 3 
19:54:30,187 [../../src/backend/bridge/dml/mapper/mapper_seq_scan.cpp:44:TransformSeqScan] INFO  - SeqScan: database oid 12111 table oid 44880: test
19:54:30,187 [../../src/backend/bridge/dml/mapper/mapper_utils.cpp:270:BuildPredicateFromQual] INFO  - Predicate:
19:54:30,187 [../../src/backend/bridge/dml/mapper/mapper_utils.cpp:272:BuildPredicateFromQual] INFO  - NULL
19:54:30,187 [../../src/backend/bridge/dml/mapper/mapper_utils.cpp:89:GetGenericInfoFromScanState] INFO  - project_info : Target List: < DEST_column_id , expression >
DirectMap List: < NEW_col_id , <tuple_idx , OLD_col_id>  > 
<0, <0, 0> >
<1, <0, 1> >
<2, <0, 2> >
<3, <0, 3> >

19:54:30,187 [../../src/backend/bridge/dml/mapper/mapper_utils.cpp:123:GetGenericInfoFromScanState] INFO  - Pure direct map projection.
19:54:30,187 [../../src/backend/bridge/dml/mapper/mapper_utils.cpp:42:BuildParams] INFO  - Built 0 params 
19:54:30,188 [../../src/backend/concurrency/optimistic_txn_manager.cpp:265:CommitTransaction] INFO  - Committing peloton txn : 4 
19:54:33,467 [../../src/backend/bridge/dml/mapper/dml_utils.cpp:197:PrepareModifyTableState] INFO  - CMD_INSERT
19:54:33,467 [../../src/backend/bridge/dml/mapper/mapper_modify_table.cpp:39:TransformModifyTable] INFO  - CMD_INSERT
19:54:33,467 [../../src/backend/bridge/dml/mapper/mapper_modify_table.cpp:83:TransformInsert] INFO  - Insert into: database oid 12111 table oid 44880: test
19:54:33,468 [../../src/backend/bridge/dml/mapper/mapper_utils.cpp:42:BuildParams] INFO  - Built 0 params 
19:54:33,468 [../../src/backend/storage/data_table.cpp:228:InsertTuple] WARN  - Index constraint violated
19:54:33,468 [../../src/backend/concurrency/optimistic_txn_manager.cpp:448:AbortTransaction] INFO  - Aborting peloton txn : 5 
ERROR:  transaction failed
STATEMENT:  INSERT INTO test (key, value, flag, size) VALUES ('1', '2', 1, 1) ON CONFLICT (key) DO UPDATE SET value = excluded.value, flag = excluded.flag, size = excluded.size;

Constraints

Constraints give you as much control over the data in your tables as you wish. They serve a very important role in DBMSs. We currently only validate null and unique constraints. We need to add more support for constraints.

  1. The primary key and foreign key information should not be stored within each column in the catalog. This should be stored separately as a constraint.
  2. We should add support for foreign key constraints and check constraints in Postgres.
  3. We need a constraint testing suite.
  4. We should handle primary key constraints as a combination of NOT NULL and UNIQUE constraints.

Isolation Levels

We use a multi-version concurrency control protocol. Right now, we only support the default snapshot isolation level.

  1. We need to identify anomalies and avoid them in our protocol.
  2. We need to support higher isolation levels, like [serializable](https://en.wikipedia.org/wiki/Isolation_%28database_systems%29#Isolation_levels).
  3. We need a testing suite for validating different isolation levels.

Prepared Statements not working/supported?

Tested on 6a88a58 (Apr 12) and 71d7a68 (Mar 13).

CREATE TABLE test ( key VARCHAR(200) PRIMARY KEY, value VARCHAR(2048), flag smallint, size smallint );

PREPARE GET (text) AS SELECT key, flag, size, value FROM TEST WHERE key = $1;
EXECUTE GET('1');
postgres=# PREPARE GET (text) AS SELECT key, flag, size, value FROM TEST WHERE key = $1;
PREPARE
postgres=# EXECUTE GET('1');
 key | flag | size | value 
-----+------+------+-------
(0 rows)

postgres=# EXECUTE GET('1');
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
10:36:54,405 [../../src/backend/bridge/dml/mapper/dml_utils.cpp:845:CopyExprState] TRACE - ExprState tag : 400 , Expr tag : 303 
10:36:54,405 [../../src/backend/bridge/dml/mapper/dml_utils.cpp:845:CopyExprState] TRACE - ExprState tag : 400 , Expr tag : 303 
10:36:54,405 [../../src/backend/bridge/dml/mapper/dml_utils.cpp:845:CopyExprState] TRACE - ExprState tag : 400 , Expr tag : 303 
10:36:54,405 [../../src/backend/bridge/dml/mapper/dml_utils.cpp:845:CopyExprState] TRACE - ExprState tag : 400 , Expr tag : 303 
10:36:54,405 [../../src/backend/bridge/dml/mapper/mapper_index_scan.cpp:75:TransformIndexScan] INFO  - Index scan on test using oid 20342, index name: test_pkey
10:36:54,405 [../../src/backend/bridge/dml/mapper/mapper_index_scan.cpp:79:TransformIndexScan] TRACE - Scan order: 1
10:36:54,405 [../../src/backend/bridge/dml/mapper/mapper_index_scan.cpp:86:TransformIndexScan] TRACE - num of scan keys = 1, num of runtime key = 0
10:36:54,405 [../../src/backend/bridge/dml/tuple/tuple_transformer.cpp:144:GetValue] TRACE - len = 1 , text = "1"
10:36:54,405 [../../src/backend/bridge/dml/mapper/mapper_index_scan.cpp:197:BuildScanKey] TRACE - key no: 1
10:36:54,405 [../../src/backend/bridge/dml/mapper/mapper_index_scan.cpp:215:BuildScanKey] INFO  - key >= VARCHAR::[1]"1"[@140435297918393]
10:36:54,405 [../../src/backend/bridge/dml/expr/expr_transformer.cpp:47:TransformExpr] TRACE - Null expression
10:36:54,405 [../../src/backend/bridge/dml/mapper/mapper_utils.cpp:270:BuildPredicateFromQual] INFO  - Predicate:
10:36:54,405 [../../src/backend/bridge/dml/mapper/mapper_utils.cpp:272:BuildPredicateFromQual] INFO  - NULL
10:36:54,405 [../../src/backend/bridge/dml/mapper/mapper_utils.cpp:89:GetGenericInfoFromScanState] INFO  - project_info : Target List: < DEST_column_id , expression >
DirectMap List: < NEW_col_id , <tuple_idx , OLD_col_id>  > 
<0, <0, 0> >
<1, <0, 2> >
<2, <0, 3> >
<3, <0, 1> >

10:36:54,405 [../../src/backend/bridge/dml/mapper/mapper_utils.cpp:123:GetGenericInfoFromScanState] INFO  - Pure direct map projection.
10:36:54,406 [../../src/backend/bridge/dml/executor/plan_executor.cpp:51:ExecutePlan] TRACE - PlanExecutor Start 
10:36:54,406 [../../src/backend/bridge/dml/executor/plan_executor.cpp:67:ExecutePlan] TRACE - Txn ID = 3 
10:36:54,406 [../../src/backend/bridge/dml/executor/plan_executor.cpp:68:ExecutePlan] TRACE - Building the executor tree
10:36:54,406 [../../src/backend/bridge/dml/tuple/tuple_transformer.cpp:144:GetValue] TRACE - len = 1 , text = "1"
10:36:54,406 [../../src/backend/bridge/dml/mapper/mapper_utils.cpp:42:BuildParams] INFO  - Built 1 params 
10:36:54,406 [../../src/backend/bridge/dml/executor/plan_executor.cpp:77:ExecutePlan] TRACE - Initializing the executor tree
10:36:54,406 [../../src/backend/bridge/dml/executor/plan_executor.cpp:89:ExecutePlan] TRACE - Running the executor tree
10:36:54,406 [../../src/backend/executor/index_scan_executor.cpp:103:DExecute] INFO  - Index Scan executor :: 0 child
10:36:54,406 [../../src/backend/index/btree_index.cpp:116:Scan] TRACE - Special case : 1 
10:36:54,406 [../../src/backend/index/index.cpp:164:ConstructLowerBoundTuple] TRACE - Column itr : 0  Placeholder : 1 
10:36:54,406 [../../src/backend/index/index.cpp:179:ConstructLowerBoundTuple] TRACE - Lower Bound Tuple :: (VARCHAR::[1]"1"[@140435299163641])

10:36:54,406 [../../src/backend/index/btree_index.cpp:131:Scan] TRACE - All constraints are equal : 1 
10:36:54,406 [../../src/backend/executor/index_scan_executor.cpp:181:ExecIndexLookup] INFO  - Tuple_locations.size(): 0
10:36:54,406 [../../src/backend/bridge/dml/executor/plan_executor.cpp:129:ExecutePlan] TRACE - About to commit: single stmt: 1, init_failure: 0, status: 1
10:36:54,406 [../../src/backend/concurrency/transaction_manager.cpp:254:CommitTransaction] INFO  - Committing peloton txn : 3 
10:36:54,406 [../../src/backend/concurrency/transaction_manager.cpp:222:EndCommitPhase] TRACE - update lcid worked : 3 
10:36:54,406 [../../src/backend/bridge/ddl/ddl.cpp:45:ProcessUtility] TRACE - Process Utility
10:36:54,406 [../../src/backend/bridge/ddl/ddl.cpp:100:ProcessUtility] WARN  - unrecognized node type: 764
10:36:55,103 [../../src/backend/bridge/dml/mapper/dml_utils.cpp:845:CopyExprState] TRACE - ExprState tag : 400 , Expr tag : 303 
10:36:55,103 [../../src/backend/bridge/dml/mapper/dml_utils.cpp:845:CopyExprState] TRACE - ExprState tag : 400 , Expr tag : 303 
10:36:55,103 [../../src/backend/bridge/dml/mapper/dml_utils.cpp:845:CopyExprState] TRACE - ExprState tag : 400 , Expr tag : 303 
10:36:55,103 [../../src/backend/bridge/dml/mapper/dml_utils.cpp:845:CopyExprState] TRACE - ExprState tag : 400 , Expr tag : 303 
10:36:55,103 [../../src/backend/bridge/dml/mapper/dml_utils.cpp:845:CopyExprState] TRACE - ExprState tag : 400 , Expr tag : 305 
10:36:55,103 [../../src/backend/bridge/dml/mapper/mapper_index_scan.cpp:75:TransformIndexScan] INFO  - Index scan on test using oid 20342, index name: test_pkey
10:36:55,103 [../../src/backend/bridge/dml/mapper/mapper_index_scan.cpp:79:TransformIndexScan] TRACE - Scan order: 1
10:36:55,103 [../../src/backend/bridge/dml/mapper/mapper_index_scan.cpp:86:TransformIndexScan] TRACE - num of scan keys = 1, num of runtime key = 1
10:36:55,103 [../../src/backend/bridge/dml/expr/expr_transformer.cpp:442:TransformParam] TRACE - Handle EXTREN PARAM
10:36:55,103 [../../src/backend/expression/parameter_value_expression.cpp:24:ParameterValueExpression] TRACE - ParameterValueExpression 0
10:36:55,103 [../../src/backend/bridge/dml/mapper/mapper_index_scan.cpp:245:BuildRuntimeKey] TRACE - Runtime scankey Expr: + Expression[VALUE_PARAMETER, 31]
   OptimizedParameter[0]

Fatal error under concurrent txns / multiple connections

Peloton fails when I have multiple psql connections submitting queries concurrently.

This is the script I used to submit queries:

psql < insert.sql &
psql < insert.sql &
psql < insert.sql &
psql < insert.sql &
psql < insert.sql &
psql < insert.sql &
psql < insert.sql &
psql < insert.sql &
psql < insert.sql &

This is the content of insert.sql (only one line)

INSERT INTO A1 VALUES(1, 0);

This is the error message I got:

psql: FATAL: could not find relation mapping for relation "pg_tablespace", OID 1213
psql: FATAL: could not find relation mapping for relation "pg_proc", OID 1255
psql: FATAL: could not find relation mapping for relation "pg_type", OID 1247
psql: FATAL: could not find relation mapping for relation "pg_tablespace", OID 1213
INSERT 0 1
ERROR: transaction failed
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
connection to server was lost
connection to server was lost
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
connection to server was lost
connection to server was lost

This is what I found in the peloton logs (not very informative):

18:17:32,447 [../../src/backend/bridge/dml/mapper/dml_utils.cpp:197:PrepareModifyTableState] INFO - CMD_INSERT
18:17:32,447 [../../src/backend/bridge/dml/mapper/mapper_modify_table.cpp:39:TransformModifyTable] INFO - CMD_INSERT
18:17:32,447 [../../src/backend/bridge/dml/mapper/mapper_modify_table.cpp:83:TransformInsert] INFO - Insert into: database oid 12145 table oid 12148: a1
2016-04-13 18:17:32.457 EDT :: FATAL: invalid syntax in time zone file "Default", line 325
2016-04-13 18:17:32.460 EDT yifan yifan :: FATAL: could not find relation mapping for relation "pg_tablespace", OID 1213
2016-04-13 18:17:32.461 EDT yifan yifan :: FATAL: could not find relation mapping for relation "pg_proc", OID 1255
2016-04-13 18:17:32.463 EDT yifan yifan :: FATAL: could not find relation mapping for relation "pg_type", OID 1247
2016-04-13 18:17:32.463 EDT yifan yifan :: FATAL: could not find relation mapping for relation "pg_tablespace", OID 1213
18:17:32,481 [../../src/backend/bridge/dml/mapper/dml_utils.cpp:197:PrepareModifyTableState] INFO - CMD_INSERT
18:17:32,481 [../../src/backend/bridge/dml/mapper/mapper_modify_table.cpp:39:TransformModifyTable] INFO - CMD_INSERT
18:17:32,481 [../../src/backend/bridge/dml/mapper/mapper_modify_table.cpp:83:TransformInsert] INFO - Insert into: database oid 12145 table oid 12148: a1

I can reproduce this problem with the most recent version of peloton using the above script once every ~5 times. Has anyone seen the same issue?

Does the nested_loop_join_executor really support nested loop + index scan?

I suspect it does not work when nl != nullptr, and I think the relevant part is wrong.
In the following code that handles nested loop + index scan, I see several problems:

  1. It is an independent loop; children_[1]->SetContext(value, 1); would only keep the value related to the last left_tile_row_itr.

  2. It never runs Execute() of the index scan for each left_tile_row_itr at all.

    /*
     * Go over every pair of tuples in left (outer plan)
     * and pass the join key to the executor and inner plan (right)
     */
    if (nl != nullptr && !left_result_tiles_.empty()) {
      // nl is supposed to be set, but this is for the original version
      left_tile = left_result_tiles_.back().get();
      for (auto left_tile_row_itr : *left_tile) {
        expression::ContainerTuple<executor::LogicalTile> left_tuple(
            left_tile, left_tile_row_itr);
        ListCell *lc = nullptr;
        foreach (lc, nl->nestParams) {
          NestLoopParam *nlp = (NestLoopParam *)lfirst(lc);
          // int paramno = nlp->paramno;
          // Var *paramval = nlp->paramval;

          /*
           * pass the join keys to executor params and set the flag = 1
           */
          Value value = left_tuple.GetValue(nlp->paramval->varattno - 1);
          executor_context_->ClearParams();
          executor_context_->SetParams(value);
          executor_context_->SetParamsExec(1);
          children_[1]->ClearContext();
          children_[1]->SetContext(value, 1);

          /* Flag parameter value as changed */
          // innerPlan->chgParam = bms_add_member(innerPlan->chgParam, paramno);
        }  // end foreach
      }  // end for
    }  // end if

How to filter a column with given predicates?

I saw this example:

    expression::AbstractExpression *predicate =
        expression::ExpressionUtil::ConstantValueFactory(Value::GetFalse());

    expression::AbstractExpression *tuple_value_expr =
        expression::ExpressionUtil::TupleValueFactory(0, 0);

    Value constant_value = ValueFactory::GetIntegerValue(20);

    expression::AbstractExpression *constant_value_expr =
        expression::ExpressionUtil::ConstantValueFactory(constant_value);

    expression::AbstractExpression *equality_expr =
        expression::ExpressionUtil::ComparisonFactory(
            EXPRESSION_TYPE_COMPARE_EQUAL, tuple_value_expr, constant_value_expr);

    predicate = expression::ExpressionUtil::ConjunctionFactory(
        EXPRESSION_TYPE_CONJUNCTION_OR, predicate, equality_expr);

Basically, this code creates a predicate for select * from A where value of column 0 = 20 and tuple id = 0, since the example uses tuple_value_expr = expression::ExpressionUtil::TupleValueFactory(0, 0);

The question: how do I create a predicate for "select * from A where value of column 0 = 20"?

Join Operators and Types

  1. Modify nested loop, hash, and merge join executors to handle cases where one of the tables is empty or both are empty.
  2. Add radix join executor.
  3. Add support for more join types like semi join, anti join, and cross join.
  4. Extend the join testing suite.

Scheduler

Parallelize sequential scans. We need to use TBB (Intel Threading Building Blocks).

Main-Memory Oriented Planner

We could swap out the Postgres planner with a main-memory oriented extensible planner.

  1. Get an ANTLR-based C++ SQL parser.
  2. Handle all the basic SQL parse trees in TPC-C and TPC-H in the planner.
  3. Add a planner testing suite.
  4. Integrate the planner with our Postgres-planner compatible execution engine.
  5. Optionally, add support for distribution operators.

Views

  1. Add support for views, both static and dynamic.

Link error when building tests

I pulled the latest version code and compiled code in master branch, but get the following error:

libtool: link: cannot find the library '../third_party/gmock/libgmock.la' or unhandled argument '../third_party/gmock/libgmock.la'

However, I can see this library in the Makefile.am in root_dir,

include $(top_srcdir)/third_party/gmock/Makefile.am

I used make check.

I have no idea what's wrong. Can anyone help fix this?

Multithreaded insert over JDBC doesn't work

Concurrently loading a TPC-H dataset using OLTPBench doesn't seem to be working. Loading the tables serially through OLTPBench does, however, seem to be working.

I haven't dug deep into exactly where the problem is, but figured I'd create a ticket to track the issue.

Scalability

We need a testing suite that evaluates the scalability of the entire DBMS on the TPC-C benchmark.

  1. A testing suite for doing automated scalability analysis. We should test the scalability of the different subsystems -- like the transaction manager, the index structures, etc.
  2. Figuring out scalability bottlenecks while running the benchmark and fixing them.

Triggers

A trigger is a specification that the database should automatically execute a particular function whenever a certain type of operation is performed. Triggers can be defined to execute either before or after any INSERT, UPDATE, or DELETE operation, either once per modified row, or once per SQL statement. If a trigger event occurs, the trigger's function is called at the appropriate time to handle the event.

  1. Add support for defining triggers.
  2. Add support for processing triggers.

Update query failure: Type INVALID

When running the following sequence of queries:

insert into test (k, d) values ('123123', '1');
select * from test;
update test set d=CONCAT('prepend',d) where k='123123';

This sequence of queries executes fine in PostgreSQL. In Peloton, the update fails, and the row is deleted after the failed execution.

user=# select * from test;
 k | d 
---+---
(0 rows)

user=# insert into test (k, d) values ('123123', '1');                         
INSERT 0 1
user=# select * from test;
   k    | d 
--------+---
 123123 | 1
(1 row)
user=# update test set d=CONCAT('prepend',d) where k='123123';                 
ERROR:  Peloton exception :: Type INVALID does not match with VARCHARType INVALID can't be cast as VARCHAR...
user=# select * from test;
 k | d 
---+---
(0 rows)
user=# insert into test (k, d) values ('123123', '1');                         
WARNING:  GUC nest level = 1 at transaction start
INSERT 0 1
user=# select * from test;
   k    | d 
--------+---
 123123 | 1
(1 row)

user=# 
user=# update test set d=CONCAT('prepend',d) where k='123123';                 
ERROR:  Peloton exception :: Type INVALID does not match with VARCHARType INVALID can't be cast as VARCHAR...
user=# insert into test (k, d) values ('123123', '1');                         
WARNING:  GUC nest level = 1 at transaction start
ERROR:  transaction failed

Optimize non-inlined value processing

Store the non-inlined data in a manner optimized for faster comparisons. For instance, a long string can be represented by its start and end letters and a pointer to the actual string.

PostgreSQL wire protocol

Implement the PostgreSQL wire protocol in C++11. This would be invaluable to many upcoming open-source DBMSs.

  1. Here's the reference.
  2. There's a nice question related to this on StackOverflow. A simple Python implementation of a subset of this protocol is also given.

Debugging logger

We rely on an ad-hoc macro based logger.

  1. We need to ensure that the default logging levels are sane. Right now, we default to LOG_LEVEL_INFO when DEBUG is not defined, and default to LOG_LEVEL_OFF when NDEBUG is defined.
  2. We need to decide whether we would like to have a trailing newline by default or not.
  3. We need to inherit all printable objects from an abstract class, and force them to implement a Debug function. Then, we can overload the I/O operator just once for this class.

[Wiki] Installation script

Running the dependency script gives the error message "There are problems and -y was used without --force-yes".

Adding the --force-yes flag before -y solves this error.

My OS: Ubuntu 14.04.3 desktop amd64bit

Not picking up libmm14 on Ubuntu 14.04

$ ./configure
checking for mm_version in -lmm... no
configure: error: Please install libmm library : libmm14

This is after installing the libmm14 package.

Index Data Structures

We should refactor our index API. We might want to compare it with other in-memory DBMSs like MemSQL.

  1. Add support for new index data structures -- like a concurrent hash table or a skip list.
  2. Compare different indexes in a bake-off with different workloads.
  3. Upgrade the index testing framework.
  4. Use covering indexes to improve query performance.

configure: error: Please install zlib library : libzlib1g-dev

checking for inflateEnd in -lz... no
checking zlib.h usability... no
checking zlib.h presence... no
checking for zlib.h... no
configure: error: Please install zlib library : libzlib1g-dev

As far as I know, libzlib1g-dev doesn't exist on Ubuntu. I think you mean zlib1g-dev.

atomic problem in sequential scan

Currently, we cannot guarantee atomicity when performing a sequential scan. Consider the scenario where a transaction txn_A updates a tuple with key=10. This action essentially contains three sub-actions: (1) create a local version V_new with the latest value assigned; (2) make the local version V_new globally visible by setting its begin and end timestamps; (3) change the older version V_old's end timestamp. If a transaction txn_B performs a sequential scan to search for tuples with key=10, it may find nothing! The key reason is that txn_A's sub-actions (2) and (3) are not performed atomically. It is possible that txn_B first reads V_new and finds it invisible, then txn_A performs (2) and (3), and after that txn_B reads V_old and finds it also invisible! This problem requires special support from the garbage-collection component.

Exhaustive Generic Join Test Cases

We need to test all the different combinations of tile groups, value types, join types, and join algorithms. This needs to be generic so that we can easily re-use the code whenever we add a new join executor.

configure: error: Cannot find ssl libraries

checking if ssl is wanted... yes
configure: error: Cannot find ssl libraries

It doesn't tell me to install the libssl-dev package like you do for the other missing dependencies.

Refactor CASE expression type

Right now, CASE expressions are handled between CaseExpression and OperatorCaseWhenExpression, where the former relies on the latter throwing exceptions to determine whether WHEN clauses are false. This is unnecessarily complex.

One solution is to have a single CaseExpression class that accepts a list of pairs of expressions as clauses. The first component of the pair is the "when" condition (either boolean or a value whose type is comparable against the initial expression if one is provided), and the second component is the "then" expression. An initial idea is something like:

typedef std::pair<AbstractExpression*, AbstractExpression*> WhenClause;

class CaseExpression {
 public:
  CaseExpression(AbstractExpression* initial,
                 std::vector<WhenClause>& clauses,
                 AbstractExpression* default_expr);  // `default` is a reserved word in C++
};

Evaluation of a case expression is as simple as iterating over the clauses, evaluating the first component of the pair, and if true, returning the evaluation of the second component of the pair.

Schema changes

Handling schema changes (DDL) along with concurrent DML operations is a challenging problem faced by several companies.

  1. Add support for any missing DDL operations.
  2. Handle concurrent DDL operations in a transactionally consistent and performant manner.

CleanPlan is not working properly

create table test1(a int);
create table test2(a int);
insert into test1 values (1);
insert into test2 values (2);

select test1.a + 1 from test1,test2;

If this sequence of queries is executed, Peloton fails with a segmentation fault.
Our team looked into the stack trace and found that the following line causes the error.
If we comment out the "delete root", it no longer segfaults, but it may leak memory.

https://github.com/cmu-db/peloton/blob/master/src/backend/bridge/dml/mapper/mapper.cpp#L195

I have included the stack trace:

#0  0x0000000000000085 in ?? ()
#1  0x00007ffff6d7d776 in std::default_delete<peloton::catalog::Schema const>::operator() (this=0x7fffdc0bef10, __ptr=0x7fffdc0bee60) at /usr/include/c++/4.8/bits/unique_ptr.h:67
#2  0x00007ffff6d7d511 in std::unique_ptr<peloton::catalog::Schema const, std::default_delete<peloton::catalog::Schema const> >::~unique_ptr (this=0x7fffdc0bef10, __in_chrg=<optimized out>) at /usr/include/c++/4.8/bits/unique_ptr.h:184
#3  0x00007ffff6d7d816 in peloton::planner::ProjectionPlan::~ProjectionPlan (this=0x7fffdc0beee0, __in_chrg=<optimized out>) at ../../src/backend/planner/projection_plan.h:29
#4  0x00007ffff6d7d862 in peloton::planner::ProjectionPlan::~ProjectionPlan (this=0x7fffdc0beee0, __in_chrg=<optimized out>) at ../../src/backend/planner/projection_plan.h:29
#5  0x00007ffff6d79b65 in peloton::bridge::PlanTransformer::CleanPlan (root=0x7fffdc0beee0) at ../../src/backend/bridge/dml/mapper/mapper.cpp:194
#6  0x00007ffff6d7b7eb in std::_Sp_counted_deleter<peloton::planner::AbstractPlan const*, void (*)(peloton::planner::AbstractPlan const*), std::allocator<int>, (__gnu_cxx::_Lock_policy)2>::_M_dispose (this=0x7fffdc0d3870) at /usr/include/c++/4.8/bits/shared_ptr_base.h:347
#7  0x00007ffff7779882 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7fffdc0d3870) at /usr/include/c++/4.8/bits/shared_ptr_base.h:144
#8  0x00007ffff777960f in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x7fffeaae0bf8, __in_chrg=<optimized out>) at /usr/include/c++/4.8/bits/shared_ptr_base.h:546
#9  0x00007ffff77795b2 in std::__shared_ptr<peloton::planner::AbstractPlan const, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x7fffeaae0bf0, __in_chrg=<optimized out>) at /usr/include/c++/4.8/bits/shared_ptr_base.h:781
#10 0x00007ffff77795cc in std::shared_ptr<peloton::planner::AbstractPlan const>::~shared_ptr (this=0x7fffeaae0bf0, __in_chrg=<optimized out>) at /usr/include/c++/4.8/bits/shared_ptr.h:93
#11 0x00007ffff7778f73 in peloton_dml (planstate=0x7fffdc0e7670, sendTuples=true, dest=0x7fffdc0b89a0, tuple_desc=0x7fffdc0e97f8, prepStmtName=0x0) at ../../src/postgres/backend/postmaster/peloton.cpp:201
#12 0x00007ffff7620733 in peloton_ExecutePlan (estate=0x7fffdc0e7560, planstate=0x7fffdc0e7670, operation=CMD_SELECT, sendTuples=true, numberTuples=0, direction=ForwardScanDirection, dest=0x7fffdc0b89a0, tupDesc=0x7fffdc0e97f8, prepStmtName=0x0) at ../../src/postgres/backend/executor/execMain.cpp:1658
#13 0x00007ffff761e863 in standard_ExecutorRun (queryDesc=0x7fffdc0be910, direction=ForwardScanDirection, count=0) at ../../src/postgres/backend/executor/execMain.cpp:378
#14 0x00007ffff761e6d1 in ExecutorRun (queryDesc=0x7fffdc0be910, direction=ForwardScanDirection, count=0) at ../../src/postgres/backend/executor/execMain.cpp:300
#15 0x00007ffff7808e2b in PortalRunSelect (portal=0x7fffdc0b68d0, forward=true, count=0, dest=0x7fffdc0b89a0) at ../../src/postgres/backend/tcop/pquery.cpp:860
#16 0x00007ffff7808a7c in PortalRun (portal=0x7fffdc0b68d0, count=9223372036854775807, isTopLevel=true, dest=0x7fffdc0b89a0, altdest=0x7fffdc0b89a0, completionTag=0x7fffeaae0fe0 "") at ../../src/postgres/backend/tcop/pquery.cpp:714
#17 0x00007ffff78021e4 in exec_simple_query (query_string=0x7fffdc068130 "select test1.a + 1 from test1,test2;") at ../../src/postgres/backend/tcop/postgres.cpp:1118
#18 0x00007ffff7806b3a in PostgresMain (argc=1, argv=0x7fffdc001f88, dbname=0x7fffdc000c78 "postgres", username=0x7fffdc000c10 "vagrant") at ../../src/postgres/backend/tcop/postgres.cpp:4054
#19 0x00007ffff778bd10 in BackendRun (port=0x60a620) at ../../src/postgres/backend/postmaster/postmaster.cpp:4015
#20 0x00007ffff778b0cb in BackendTask (bn=0x60a7e0, port=0x60a620, param=0x60db60) at ../../src/postgres/backend/postmaster/postmaster.cpp:3634
#21 0x00007ffff7790118 in std::_Bind_simple<void (*(bkend*, Port*, BackendParameters*))(bkend*, Port*, BackendParameters*)>::_M_invoke<0ul, 1ul, 2ul>(std::_Index_tuple<0ul, 1ul, 2ul>) (this=0x6063f0) at /usr/include/c++/4.8/functional:1732
#22 0x00007ffff778ff39 in std::_Bind_simple<void (*(bkend*, Port*, BackendParameters*))(bkend*, Port*, BackendParameters*)>::operator()() (this=0x6063f0) at /usr/include/c++/4.8/functional:1720
#23 0x00007ffff778fe88 in std::thread::_Impl<std::_Bind_simple<void (*(bkend*, Port*, BackendParameters*))(bkend*, Port*, BackendParameters*)> >::_M_run() (this=0x6063d8) at /usr/include/c++/4.8/thread:115
#24 0x00007ffff56a0a60 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#25 0x00007ffff4c79182 in start_thread (arg=0x7fffeaae8700) at pthread_create.c:312
#26 0x00007ffff632f47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Transaction support

Integrate postgres and n-store

We already have transaction support in n-store. Do we need to do extra work for Postgres?

insert into select statement does not work

After running the following statements, I get a segmentation fault.

create table a(test int);
create table b(test int);

insert into a values(1);
insert into a values(2);

insert into b select * from a;

I have also attached the stack trace and logs.
It seems that Peloton supports only regular INSERT statements.

20:28:37,957 [../../src/backend/bridge/dml/mapper/dml_utils.cpp:197:PrepareModifyTableState] INFO  - CMD_INSERT
20:28:37,957 [../../src/backend/bridge/dml/mapper/dml_utils.cpp:247:PrepareInsertState] ERROR - Unsupported child type of Insert: 109
20:28:37,957 [../../src/backend/bridge/dml/mapper/mapper_modify_table.cpp:39:TransformModifyTable] INFO  - CMD_INSERT
20:28:37,957 [../../src/backend/bridge/dml/mapper/mapper_modify_table.cpp:84:TransformInsert] INFO  - Insert into: database oid 12111 table oid 298832: c

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffea8f4700 (LWP 27224)]
0x00007ffff6d45c7b in peloton::bridge::PlanTransformer::TransformInsert (mt_plan_state=0x7fffdc1487c0, options=...) at ../../src/backend/bridge/dml/mapper/mapper_modify_table.cpp:86
86        AbstractPlanState *sub_planstate = mt_plan_state->mt_plans[0];
(gdb) where
#0  0x00007ffff6d45c7b in peloton::bridge::PlanTransformer::TransformInsert (mt_plan_state=0x7fffdc1487c0, options=...) at ../../src/backend/bridge/dml/mapper/mapper_modify_table.cpp:86
#1  0x00007ffff6d4599e in peloton::bridge::PlanTransformer::TransformModifyTable (mt_plan_state=0x7fffdc1487c0, options=...) at ../../src/backend/bridge/dml/mapper/mapper_modify_table.cpp:40
#2  0x00007ffff6d438ae in peloton::bridge::PlanTransformer::TransformPlan (planstate=0x7fffdc1487c0, options=...) at ../../src/backend/bridge/dml/mapper/mapper.cpp:88
#3  0x00007ffff6d43725 in peloton::bridge::PlanTransformer::TransformPlan (this=0x7fffea8edda0, planstate=0x7fffdc1487c0, prepStmtName=0x0) at ../../src/backend/bridge/dml/mapper/mapper.cpp:62
#4  0x00007ffff7779030 in peloton_dml (planstate=0x7fffdc147af0, sendTuples=false, dest=0x7fffdc13d890, tuple_desc=0x7fffdc1485e0, prepStmtName=0x0) at ../../src/postgres/backend/postmaster/peloton.cpp:192
#5  0x00007ffff7620823 in peloton_ExecutePlan (estate=0x7fffdc1478a0, planstate=0x7fffdc147af0, operation=CMD_INSERT, sendTuples=false, numberTuples=0, direction=ForwardScanDirection, dest=0x7fffdc13d890, tupDesc=0x7fffdc1485e0, prepStmtName=0x0) at ../../src/postgres/backend/executor/execMain.cpp:1658
#6  0x00007ffff761e953 in standard_ExecutorRun (queryDesc=0x7fffdc141880, direction=ForwardScanDirection, count=0) at ../../src/postgres/backend/executor/execMain.cpp:378
#7  0x00007ffff761e7c1 in ExecutorRun (queryDesc=0x7fffdc141880, direction=ForwardScanDirection, count=0) at ../../src/postgres/backend/executor/execMain.cpp:300
#8  0x00007ffff7807e65 in ProcessQuery (plan=0x7fffdc141208, sourceText=0x7fffdc068130 "insert into c\nselect * from a;", params=0x0, dest=0x7fffdc13d890, completionTag=0x7fffea8ed0e0 "", prepStmtName=0x0) at ../../src/postgres/backend/tcop/pquery.cpp:164
#9  0x00007ffff7809854 in PortalRunMulti (portal=0x7fffdc143880, isTopLevel=true, dest=0x7fffdc13d890, altdest=0x7fffdc13d890, completionTag=0x7fffea8ed0e0 "") at ../../src/postgres/backend/tcop/pquery.cpp:1158
#10 0x00007ffff7808d68 in PortalRun (portal=0x7fffdc143880, count=9223372036854775807, isTopLevel=true, dest=0x7fffdc13d890, altdest=0x7fffdc13d890, completionTag=0x7fffea8ed0e0 "") at ../../src/postgres/backend/tcop/pquery.cpp:739
#11 0x00007ffff78023f6 in exec_simple_query (query_string=0x7fffdc068130 "insert into c\nselect * from a;") at ../../src/postgres/backend/tcop/postgres.cpp:1118
#12 0x00007ffff7806d4c in PostgresMain (argc=1, argv=0x7fffdc001f88, dbname=0x7fffdc000c78 "postgres", username=0x7fffdc000c10 "vagrant") at ../../src/postgres/backend/tcop/postgres.cpp:4054
#13 0x00007ffff778bf22 in BackendRun (port=0x60df70) at ../../src/postgres/backend/postmaster/postmaster.cpp:4015
#14 0x00007ffff778b2dd in BackendTask (bn=0x60e130, port=0x60df70, param=0x62ef60) at ../../src/postgres/backend/postmaster/postmaster.cpp:3634
#15 0x00007ffff779032a in std::_Bind_simple<void (*(bkend*, Port*, BackendParameters*))(bkend*, Port*, BackendParameters*)>::_M_invoke<0ul, 1ul, 2ul>(std::_Index_tuple<0ul, 1ul, 2ul>) (this=0x614300) at /usr/include/c++/4.8/functional:1732
#16 0x00007ffff779014b in std::_Bind_simple<void (*(bkend*, Port*, BackendParameters*))(bkend*, Port*, BackendParameters*)>::operator()() (this=0x614300) at /usr/include/c++/4.8/functional:1720
#17 0x00007ffff779009a in std::thread::_Impl<std::_Bind_simple<void (*(bkend*, Port*, BackendParameters*))(bkend*, Port*, BackendParameters*)> >::_M_run() (this=0x6142e8) at /usr/include/c++/4.8/thread:115
#18 0x00007ffff560da60 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#19 0x00007ffff4be6182 in start_thread (arg=0x7fffea8f4700) at pthread_create.c:312
#20 0x00007ffff629c47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Map plan to operators

(a) Parse plan - get info from postgres catalog
Currently, we can print plans in n-store [ src/backend/bridge.c ].

(b) Return output tuples to postgres
We need to transform from nstore::storage::Tuple -> postgres::TupleTableSlot.

(c) Finish Insert and Seq scan operators.
