citusdata / citus
Distributed PostgreSQL as an extension
Home Page: https://www.citusdata.com
License: GNU Affero General Public License v3.0
TPC-DS benchmarks are new and target mixed workloads. The TPC-DS website shows that there are currently no published benchmark results.
We considered publishing benchmark results for PostgreSQL. However, the benchmark looked too long and complex for us to prioritize this ahead of other activities.
Funnel and cohort queries in SQL (PostgreSQL) are hard to execute in human real-time. The session analytics package improves funnel query performance by 10-100x, thanks to an array-based execution model.
http://www.redbook.io/ch8-interactive.html ("array-based" executor)
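To illustrate the array-based model referenced above, here is a hypothetical sketch: instead of joining the events table to itself once per funnel step, collect each user's events into a time-ordered array and scan it once. The event names and data are made up.

```python
# Array-based funnel matching: one pass over each user's ordered event array,
# advancing a step counter, instead of one self-join per funnel step.
def completed_funnel(events, funnel):
    """events: time-ordered list of event names for one user."""
    step = 0
    for event in events:
        if event == funnel[step]:
            step += 1
            if step == len(funnel):
                return True
    return False

users = {
    "alice": ["visit", "signup", "purchase"],
    "bob": ["visit", "purchase"],  # skipped signup, so not converted
}
funnel = ["visit", "signup", "purchase"]
converted = [u for u, ev in users.items() if completed_funnel(ev, funnel)]
print(converted)  # -> ['alice']
```

The single sequential pass over pre-grouped arrays is what makes this model so much cheaper than repeated joins.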
We need to consider writing a tutorial for the session analytics package on a single node. For multiple nodes, this issue also relates to #41.
We need to have a tutorial that shows how to ingest time-series data into Citus and to run fast aggregations on this data. This subtask covers installing HyperLogLog and topN extensions and aggregate functions for them out of the box.
This item is a subtask of #3
The new masterless approach plans to use replication groups. These replication groups would have PostgreSQL databases set up as primary and secondaries.
This task investigates current fail-over solutions for PostgreSQL, understands their use and popularity, and documents them.
Some of the current fail-over solutions have external dependencies, such as etcd or ZooKeeper. If we decide to incorporate these systems, this task also relates to #13 and #14.
We could consider writing a spark_fdw (foreign data wrapper) to enable querying data in Spark.
Or we could build a tight integration between Spark and PostgreSQL / Citus. In this scenario, Spark manages distributed roll-ups and PostgreSQL acts as the presentation layer.
Users manually install our packages on each node and edit the relevant config. We need simpler multi-node install steps. We could do this with an installation script that uses SSH.
We need to have a tutorial that shows how to ingest time-series data into Citus and to run fast aggregations on this data. This subtask covers writing an example client script that reads example data in real time and inserts it into the database.
This item is a subtask of #3
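A minimal sketch of what such an example client could look like, assuming the tutorial wants batched inserts (batching amortizes per-statement overhead; the actual INSERT call is elided, and the row shape is illustrative):

```python
# Buffer incoming rows into fixed-size batches before handing them to the
# database; the final partial batch is flushed at end of stream.
def batched(stream, batch_size=1000):
    batch = []
    for row in stream:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush whatever remains
        yield batch

rows = ({"ts": i, "value": i * 2} for i in range(2500))
batches = list(batched(rows, batch_size=1000))
print([len(b) for b in batches])  # -> [1000, 1000, 500]
```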
We currently document upgrades between Citus versions here: https://www.citusdata.com/documentation/citusdb-documentation/admin_guide/upgrading_citusdb.html
We should revise these steps and test for the upgrade from 4.0 to the new extension.
Once we pick and implement an approach for roll-up tables in #38, we need to gather feedback and create content to communicate this approach.
PostgreSQL's materialized views don't get updated on-demand. Users need to refresh a materialized view, and the refresh command discards old contents and completely replaces the contents of a materialized view.
Commercial databases incrementally update a materialized view's contents -- this is particularly helpful for aggregate queries. This task involves improving PostgreSQL's materialized views for incremental updates.
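The incremental-update idea can be sketched in a few lines, assuming a simple COUNT/SUM aggregate view. Rather than recomputing every group from scratch (which is what a full refresh does), only the delta from newly ingested rows is applied. Names and data are illustrative.

```python
# Incremental maintenance of an aggregate view: group key -> (count, sum).
view = {}

def apply_delta(view, new_rows):
    """Fold newly ingested (key, amount) rows into the existing aggregates."""
    for key, amount in new_rows:
        count, total = view.get(key, (0, 0))
        view[key] = (count + 1, total + amount)

apply_delta(view, [("us", 10), ("eu", 5), ("us", 7)])
apply_delta(view, [("us", 3)])  # incremental update; no full recompute
print(view)  # -> {'us': (3, 20), 'eu': (1, 5)}
```

COUNT and SUM are the easy cases; aggregates like MIN/MAX or DISTINCT counts need extra bookkeeping on deletes, which is part of why this task is non-trivial.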
Most new users start by trying out unofficial Citus Docker images. We also currently don't have an install story for OS X other than building from source. We need simpler single-node install (Docker image) and/or OS X install instructions.
We need to create bucket(s) for non-matching values in single repartition outer joins.
We need to have a tutorial that shows how to ingest time-series data into Citus and to run fast aggregations on this data. This subtask introduces a cache in front of immutable shards (for historical data) to cache query fragment results.
This item is a subtask of #3
Citus 5.0 propagates ALTER TABLE and CREATE INDEX commands to worker nodes. We implemented this feature using Citus' current replication model. We also decided to switch to using 2PC (or pg_paxos) once the metadata propagation changes were implemented.
This issue tracks v2 of the DDL propagation changes and depends on #19.
We need to propagate metadata changes to all nodes. That is, when the user creates a distributed table or creates new shards for that distributed table, we need to propagate these changes to all nodes in the cluster. For this, we could use the 2PC protocol built into Postgres, or pg_paxos.
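The 2PC flow mentioned above can be sketched as follows. This is a toy coordinator, not Citus code; the real implementation would use PostgreSQL's PREPARE TRANSACTION / COMMIT PREPARED, and the Node class here is a stand-in.

```python
# Two-phase commit sketch: every node must prepare before any node commits;
# one failed prepare aborts the change everywhere.
class Node:
    def __init__(self, name, healthy=True):
        self.name, self.healthy, self.state = name, healthy, "idle"

    def prepare(self, change):
        if not self.healthy:
            return False
        self.state = "prepared"  # change durably staged, not yet visible
        return True

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"

def two_phase_commit(nodes, change):
    # Phase 1: ask every node to prepare the metadata change.
    if all(n.prepare(change) for n in nodes):
        for n in nodes:  # Phase 2: commit everywhere
            n.commit()
        return True
    for n in nodes:  # any failure aborts on all nodes
        n.abort()
    return False

nodes = [Node("worker1"), Node("worker2")]
print(two_phase_commit(nodes, "ADD SHARD"))  # -> True
```

2PC blocks if the coordinator dies between the phases, which is the availability gap that a consensus-based alternative like pg_paxos addresses.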
The new masterless approach plans to use replication groups. These replication groups would have PostgreSQL databases set up as primary and secondaries.
Most users get confused by PostgreSQL's documentation on streaming replication. This chapter communicates various alternatives and feels like a choose-your-own-adventure guide. We need to find a way to better articulate how streaming replication works -- @anarazel had an internal presentation that was pretty insightful.
Citus users currently create aggregate tables in the following way: they pick a distribution column, for example customer_id. They then ingest all event data related to the customer_id. They then create roll-up tables for per-hour and per-day aggregates on customer_id. Since all tables are hash partitioned on customer_id, both raw data and roll-up tables end up being co-located on the same node.
Citus users currently use PL/pgSQL or simple scripts to create these roll-up tables. We need to make this simpler. One way is by offering a UDF that propagates certain DDL commands.
This issue could also be a duplicate of #11.
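The per-hour roll-up described above amounts to the following computation. This is an illustrative sketch, not the proposed UDF: group raw events by (customer_id, hour) so the roll-up stays keyed on the distribution column and lands on the same node as the raw data.

```python
from collections import defaultdict

def hourly_rollup(events):
    """events: iterable of (customer_id, unix_ts, value) tuples."""
    rollup = defaultdict(int)
    for customer_id, ts, value in events:
        hour = ts - ts % 3600  # truncate the timestamp to the hour
        rollup[(customer_id, hour)] += value
    return dict(rollup)

events = [(7, 1000, 2), (7, 1500, 3), (7, 4000, 1), (8, 1200, 5)]
print(hourly_rollup(events))
# -> {(7, 0): 5, (7, 3600): 1, (8, 0): 5}
```

Because both the raw table and the roll-up include customer_id, hash partitioning co-locates them, and the aggregation can run shard-locally with no data movement.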
We split the subselect push down project into three projects. This task refers to complex subselect queries that can be pushed down to worker nodes for human real-time queries. These complex queries are mostly applicable in the context of session and funnel analytics queries.
Write a user-defined function that extends and propagates DELETE commands to all the shards. This function doesn't need to be as safe as functionality that's built into Citus and should help in removing the need for customers to write their own scripts.
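A hypothetical sketch of what such a function would generate: rewrite one logical DELETE into per-shard statements by appending the shard id to the table name (Citus names shard tables tablename_shardid). The shard ids and WHERE clause below are made up.

```python
# Expand one logical DELETE into one statement per shard placement table.
def shard_delete_statements(table, where_clause, shard_ids):
    return [
        f"DELETE FROM {table}_{shard_id} WHERE {where_clause};"
        for shard_id in shard_ids
    ]

stmts = shard_delete_statements("events", "created_at < '2016-01-01'",
                                [102008, 102009])
print(stmts[0])
# -> DELETE FROM events_102008 WHERE created_at < '2016-01-01';
```

A production version would also need to resolve shard ids from the metadata tables and run the statements on the right workers; this sketch only shows the rewrite step.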
We need to have a tutorial that shows how to ingest time-series data into Citus and to run fast aggregations on this data. This subtask covers writing an example server script to help with installation and server start-up.
This item is a subtask of #3
When users have questions around master node failover, we currently point them to relevant sections in the PostgreSQL manual. We need to have more streamlined steps / scripts around setting up streaming replication and a load balancer.
This task may need a requirements document.
Citus currently replicates incoming changes to all shard placements. If a shard placement is unavailable, Citus marks the placement as invalid.
This approach is different than having all changes go through a primary in a replication group. We need to answer compatibility questions with Citus' existing replication model.
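The existing behavior described above can be sketched as follows. The placement objects and state constants are stand-ins (the numeric values are modeled on Citus' shard-state metadata, where 1 means finalized and 3 means inactive, but treat that as an assumption).

```python
# Statement replication: apply a write to every placement of a shard and mark
# any placement whose write fails as invalid so later reads skip it.
STATE_VALID, STATE_INVALID = 1, 3

def apply_write(placements, write_fn):
    for p in placements:
        try:
            write_fn(p)
        except ConnectionError:
            p["state"] = STATE_INVALID  # placement diverged; repair it later

placements = [{"node": "w1", "state": STATE_VALID},
              {"node": "w2", "state": STATE_VALID}]

def write(p):
    if p["node"] == "w2":  # simulate an unreachable node
        raise ConnectionError
apply_write(placements, write)
print([p["state"] for p in placements])  # -> [1, 3]
```

Under the replication-group model, by contrast, the write would go only to the group's primary and reach secondaries via streaming replication, which is exactly the compatibility question this task needs to answer.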
The new masterless approach plans to use replication groups. These replication groups would have PostgreSQL databases set up as primary and secondaries.
We will first investigate existing solutions in #20 and come up with a design document. This task then implements and tests the picked solution.
Some of the current fail-over solutions have external dependencies, such as etcd or ZooKeeper. If we decide to incorporate these systems, this task also relates to #13 and #14.
The new masterless approach plans to use replication groups. These replication groups would have PostgreSQL databases set up as primary and secondaries.
We need to write scripts / functions to easily set up and configure these replication groups.
When writing this logic, if we introduce new dependencies such as a new scripting language, we should also think about incorporating them into #13 and #14.
Improve and test the parallel loading script to replace copy_from_distributed_table in CitusDB 5
Integrate pg_shard + CitusDB
We offer several extensions to our customers to enable human real-time queries. We could consider packaging these extensions together and communicating their benefits. The extensions I can think of are HyperLogLog (HLL), topN, histogram, and approximate percentile.
More documentation: more on data modeling for distributed systems, how our users set up pgBouncer, and items that were cut out from docs v1
We have several open items on repartitioned subselects. We need to revisit these items and implement them.
We need to have a tutorial that shows how to ingest time-series data into Citus and to run fast aggregations on this data. This subtask covers how we intend to distribute and have Citus installed for the tutorials (apt-get, Docker, VM, etc.).
This item is a subtask of #3
Simplify data migration from PostgreSQL (local tables) to Citus (distributed tables).
This issue has several components to it, and each one would be beneficial in isolation:
- Adding a tenant_id column to your tables and then backfilling data. This particular item comes up frequently in engineering sessions.
- Propagating the tenant_id column to the corresponding tables.
We currently have join order planning logic spread across the join order and logical planners. This item unifies these two code paths.
We currently have join order planning logic spread across the join order and logical planners. This item comes up with a design to unify these two code paths.
We need to have a tutorial that shows how to ingest time-series data into Citus and to run fast aggregations on this data. These tutorials should enable customers to set up a Citus cluster themselves and run OLAP queries on real-time data.
A rough breakdown of these tasks includes:
We shouldn't push down filters on outer join output in re-partition joins.
We currently don't have a native way to bulk insert data into hash partitioned tables. The built-in copy_to_distributed_table script only supports certain COPY arguments. More importantly, this script uses triggers to ingest data and therefore doesn't have desirable performance characteristics.
What are we building? [bulk ingest into hash-partitioned tables?]
COPY for CitusDB with the goal of:
Who are we building it for?
Users of hash-partitioned and range-partitioned tables, e.g. co-located join and key-value scenarios. Could be used for experimenting with sample data, initial data load, or loading production data. The capacity is limited to a certain number of cores in a single-master world (which allows tens of thousands of rows/s per core).
What is the user experience?
The current code has a configurable transaction manager, which allows for 3 models:
2PC model:
regular model:
choice model:
Superuser privileges are required for COPY .. FROM 'file', but not for \COPY .. FROM 'file'.
Performance on a typical cluster: What must/should be our throughput on a single core? On multiple cores?
Should be 100x faster than copy_to_distributed_table on a single core, scalable by the number of cores.
Failure semantics: What must/should be our behavior on (a) bad data and (b) node failures?
The current code has a configurable transaction manager which offers 2 models:
2PC model:
(a) roll back transaction
(b) worker failure before copy - mark placement as inactive
worker failure during copy - roll back transaction
worker failure during commit - roll back/forward transaction upon recovery
master failure - roll back/forward transaction upon recovery
regular model:
(a) roll back transaction
(b) worker failure before - mark placement as inactive
worker failure during copy - roll back transaction
worker failure during commit - leave partially copied data
master failure before or during copy - roll back transaction
master failure during commit - leave partially copied data
choice model:
Delivery mechanism: How do we deliver this to the customer? Is this a script, a binary, or something that gets checked into product?
As part of the CitusDB extension.
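The core routing step such a COPY needs can be sketched as follows. Python's built-in hash() stands in for PostgreSQL's type-specific hash functions, and the modulo mapping is a simplification of Citus' hash-range shard intervals; both are assumptions for illustration.

```python
# Route each incoming row to a shard by hashing its distribution column.
def route_row(row, dist_column, shard_count):
    # Citus assigns each shard a slice of the int32 hash space; a stable
    # hash taken modulo the shard count is a simplified stand-in.
    return hash(row[dist_column]) % shard_count

rows = [{"customer_id": i, "value": i} for i in range(1000)]
shards = [route_row(r, "customer_id", 8) for r in rows]
assert set(shards) <= set(range(8))  # every row lands on a valid shard
```

Once rows are bucketed per shard, the loader can stream each bucket to its worker in parallel, which is where the claimed per-core throughput comes from.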
Citus currently has a master node that holds authoritative metadata. This issue is to remove the master node from the Citus architecture and make sure that all nodes can ingest data and hold metadata.
The subtasks for this project are:
Citus users copy data into their cluster by starting up csql and using the \stage command. This way, data doesn't flow through the master node and the master doesn't become a bottleneck.
This approach however has the usability drawback that users can't ingest data into append partitioned tables using standard PostgreSQL connectors. We need to offer a more native and also scalable alternative to \stage.
We need to have a tutorial that shows how to ingest time-series data into Citus and to run fast aggregations on this data. This subtask revisits the Examples section in our documentation.
This item is a subtask of #3
The task tracker executor runs into a performance bottleneck when assigning and tracking a high number of tasks (1M+). @marcocitus has a change that batches task assignment and task tracking queries together, and this change notably improves the task tracker executor's performance.
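The batching idea can be sketched like this. The table and column names in the generated SQL are illustrative, not the task tracker's actual schema.

```python
# Group task-status lookups so each batch costs one round trip instead of
# one round trip per task.
def batch_status_queries(task_ids, batch_size=32):
    batches = [task_ids[i:i + batch_size]
               for i in range(0, len(task_ids), batch_size)]
    return [f"SELECT task_id, status FROM task_tracker"
            f" WHERE task_id IN ({', '.join(map(str, b))});"
            for b in batches]

queries = batch_status_queries(list(range(100)), batch_size=32)
print(len(queries))  # -> 4 round trips instead of 100
```

With a million tasks, cutting round trips by the batch factor is what makes the difference the issue describes.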
Subquery push downs currently have a separate code path to enable pushing down of outer joins. Also, repartitioned subselects don't yet have support for joins. We need to integrate outer join logic with the current subquery logic.
Most Citus customers use a Kafka queue before they ingest data into the database. We need to investigate their use and have a better integration story between Kafka and Citus.
Kafka uses the Java runtime. This task may therefore relate to #4.
We need to generate a replacement for prunable outer joins (e.g. a join with an empty table).
Our documentation refers to the user-defined function master_aggregate_table_shards to help users create distributed materialized views. We need to implement this function or remove the reference to it from our documentation.
The mechanism through which we implement this function could potentially be similar to #10
When users load large data sets (from S3 or files), these datasets might have a few bad records. Most data warehousing solutions can be configured to skip over a predefined number of bad lines.
This has also been discussed for PostgreSQL: https://wiki.postgresql.org/wiki/Error_logging_in_COPY
This task proposes to extend COPY to skip over a configurable number of records.
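A sketch of the proposed behavior, using Python's csv module as a stand-in for COPY's parser: skip up to max_errors bad records, and fail the load only once the threshold is exceeded. The two-column (id, name) schema and the error limit are illustrative.

```python
import csv
import io

def copy_with_error_limit(data, max_errors):
    """Parse rows, tolerating up to max_errors malformed records."""
    good, errors = [], 0
    for lineno, row in enumerate(csv.reader(io.StringIO(data)), start=1):
        try:
            good.append((int(row[0]), row[1]))  # expect (id, name)
        except (IndexError, ValueError):
            errors += 1  # a bad record: wrong arity or unparsable id
            if errors > max_errors:
                raise ValueError(f"too many bad records (line {lineno})")
    return good, errors

data = "1,alice\nbad-line\n2,bob\n"
rows, skipped = copy_with_error_limit(data, max_errors=5)
print(rows, skipped)  # -> [(1, 'alice'), (2, 'bob')] 1
```

As the linked PostgreSQL wiki page discusses, a real implementation would also want to log the rejected lines somewhere the user can inspect, not just count them.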
@marcocitus has a pull request out for distributed EXPLAIN. Marco mentioned that @samay-sharma could be a good person to review these changes. We need to document what distributed EXPLAIN covers and review the pull request.
Citus 5.0 has partial support for outer join queries. We'd now like to refactor and expand on this feature's scope.
We need to implement repartitioned subselects over multiple tables. Internally, we refer to this project as subselect #2.
PostgreSQL can easily be extended to create aggregate tables as raw data gets ingested into the database.
We explored several methods to create roll-up tables and also presented a tutorial on it. We could evaluate our learnings from this tutorial and pick and implement an approach: https://www.youtube.com/watch?v=0ybz6zuXCPo
We need to have separate requirements and design documents. We started working on these as one document, but haven't reached a conclusion yet.