citusdata / citus
Distributed PostgreSQL as an extension
Home Page: https://www.citusdata.com
License: GNU Affero General Public License v3.0
TPC-DS benchmarks are new and target mixed workloads. The TPC-DS website shows that there are currently no published benchmark results.
We considered publishing benchmark results for PostgreSQL. However, the benchmark looked too long and complex for us to prioritize this ahead of other activities.
Funnel and cohort queries in SQL (PostgreSQL) are hard to execute in human real-time. The session analytics package improves funnel query performance by 10-100x, thanks to an array-based execution model.
http://www.redbook.io/ch8-interactive.html ("array-based" executor)
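To illustrate the array-based model referenced above, here is a hypothetical sketch: instead of joining the events table to itself once per funnel step, collect each user's events into a time-ordered array and scan it once. The event names and data are made up.

```python
# Array-based funnel matching: one pass over each user's ordered event array,
# advancing a step counter, instead of one self-join per funnel step.
def completed_funnel(events, funnel):
    """events: time-ordered list of event names for one user."""
    step = 0
    for event in events:
        if event == funnel[step]:
            step += 1
            if step == len(funnel):
                return True
    return False

users = {
    "alice": ["visit", "signup", "purchase"],
    "bob": ["visit", "purchase"],  # skipped signup, so not converted
}
funnel = ["visit", "signup", "purchase"]
converted = [u for u, ev in users.items() if completed_funnel(ev, funnel)]
print(converted)  # -> ['alice']
```

The single sequential pass over pre-grouped arrays is what makes this model so much cheaper than repeated joins.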
We need to consider writing a tutorial for the session analytics package on a single node. For multiple nodes, this issue also relates to #41.
We need to have a tutorial that shows how to ingest time-series data into Citus and to run fast aggregations on this data. This subtask covers installing HyperLogLog and topN extensions and aggregate functions for them out of the box.
This item is a subtask of #3
The new masterless approach plans to use replication groups. These replication groups would have PostgreSQL databases set up as primary and secondaries.
This task investigates current fail-over solutions for PostgreSQL, understands their use and popularity, and documents them.
Some of the current fail-over solutions have external dependencies, such as etcd or ZooKeeper. If we decide to incorporate these systems, this task also relates to #13 and #14.
We could consider writing a spark_fdw (foreign data wrapper) to enable querying data in Spark.
Or we could build a tight integration between Spark and PostgreSQL / Citus. In this scenario, Spark manages distributed roll-ups and PostgreSQL acts as the presentation layer.
Users manually install our packages on each node and edit the relevant config. We need simpler multi-node install steps. We could do this with an installation script that uses SSH.
We need to have a tutorial that shows how to ingest time-series data into Citus and to run fast aggregations on this data. This subtask covers writing an example client script that reads example data in real time and inserts it into the database.
This item is a subtask of #3
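A minimal sketch of what such an example client could look like, assuming the tutorial wants batched inserts (batching amortizes per-statement overhead; the actual INSERT call is elided, and the row shape is illustrative):

```python
# Buffer incoming rows into fixed-size batches before handing them to the
# database; the final partial batch is flushed at end of stream.
def batched(stream, batch_size=1000):
    batch = []
    for row in stream:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush whatever remains
        yield batch

rows = ({"ts": i, "value": i * 2} for i in range(2500))
batches = list(batched(rows, batch_size=1000))
print([len(b) for b in batches])  # -> [1000, 1000, 500]
```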
We currently document upgrades between Citus versions here: https://www.citusdata.com/documentation/citusdb-documentation/admin_guide/upgrading_citusdb.html
We should revise these steps and test for the upgrade from 4.0 to the new extension.
Once we pick and implement an approach for roll-up tables in #38, we need to gather feedback and create content to communicate this approach.
PostgreSQL's materialized views don't get updated on-demand. Users need to refresh a materialized view, and the refresh command discards old contents and completely replaces the contents of a materialized view.
Commercial databases incrementally update a materialized view's contents -- this is particularly helpful for aggregate queries. This task involves improving PostgreSQL's materialized views for incremental updates.
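The incremental-update idea can be sketched in a few lines, assuming a simple COUNT/SUM aggregate view. Rather than recomputing every group from scratch (which is what a full refresh does), only the delta from newly ingested rows is applied. Names and data are illustrative.

```python
# Incremental maintenance of an aggregate view: group key -> (count, sum).
view = {}

def apply_delta(view, new_rows):
    """Fold newly ingested (key, amount) rows into the existing aggregates."""
    for key, amount in new_rows:
        count, total = view.get(key, (0, 0))
        view[key] = (count + 1, total + amount)

apply_delta(view, [("us", 10), ("eu", 5), ("us", 7)])
apply_delta(view, [("us", 3)])  # incremental update; no full recompute
print(view)  # -> {'us': (3, 20), 'eu': (1, 5)}
```

COUNT and SUM are the easy cases; aggregates like MIN/MAX or DISTINCT counts need extra bookkeeping on deletes, which is part of why this task is non-trivial.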
Most new users start by trying out unofficial Citus Docker images. We also currently don't have an install story for OS X other than building from source. We need simpler single-node install (Docker image) and/or OS X install instructions.
We need to create bucket(s) for non-matching values in single repartition outer joins.
We need to have a tutorial that shows how to ingest time-series data into Citus and to run fast aggregations on this data. This subtask introduces a cache in front of immutable shards (for historical data) to cache query fragment results.
This item is a subtask of #3
Citus 5.0 propagates ALTER TABLE and CREATE INDEX commands to worker nodes. We implemented this feature using Citus' current replication model. We also decided to switch to using 2PC (or pg_paxos) once the metadata propagation changes were implemented.
This issue tracks v2 of the DDL propagation changes and depends on #19.
We need to propagate metadata changes to all nodes. That is, when the user creates a distributed table or creates new shards for that distributed table, we need to propagate these changes to all nodes in the cluster. For this, we could use the 2PC protocol built into Postgres, or pg_paxos.
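The 2PC flow mentioned above can be sketched as follows. This is a toy coordinator, not Citus code; the real implementation would use PostgreSQL's PREPARE TRANSACTION / COMMIT PREPARED, and the Node class here is a stand-in.

```python
# Two-phase commit sketch: every node must prepare before any node commits;
# one failed prepare aborts the change everywhere.
class Node:
    def __init__(self, name, healthy=True):
        self.name, self.healthy, self.state = name, healthy, "idle"

    def prepare(self, change):
        if not self.healthy:
            return False
        self.state = "prepared"  # change durably staged, not yet visible
        return True

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"

def two_phase_commit(nodes, change):
    # Phase 1: ask every node to prepare the metadata change.
    if all(n.prepare(change) for n in nodes):
        for n in nodes:  # Phase 2: commit everywhere
            n.commit()
        return True
    for n in nodes:  # any failure aborts on all nodes
        n.abort()
    return False

nodes = [Node("worker1"), Node("worker2")]
print(two_phase_commit(nodes, "ADD SHARD"))  # -> True
```

2PC blocks if the coordinator dies between the phases, which is the availability gap that a consensus-based alternative like pg_paxos addresses.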
The new masterless approach plans to use replication groups. These replication groups would have PostgreSQL databases set up as primary and secondaries.
Most users get confused by PostgreSQL's documentation on streaming replication. This chapter communicates various alternatives and feels like a choose-your-own-adventure guide. We need to find a way to better articulate how streaming replication works -- @anarazel had an internal presentation that was pretty insightful.
Citus users currently create aggregate tables in the following way: they pick a distribution column, for example customer_id. They then ingest all event data related to the customer_id. They then create roll-up tables for per-hour and per-day aggregates on customer_id. Since all tables are hash partitioned on customer_id, both raw data and roll-up tables end up being co-located on the same node.
Citus users currently use PL/pgSQL or simple scripts to create these roll-up tables. We need to make this simpler. One way is by offering a UDF that propagates certain DDL commands.
This issue could also be a duplicate of #11.
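The per-hour roll-up described above amounts to the following computation. This is an illustrative sketch, not the proposed UDF: group raw events by (customer_id, hour) so the roll-up stays keyed on the distribution column and lands on the same node as the raw data.

```python
from collections import defaultdict

def hourly_rollup(events):
    """events: iterable of (customer_id, unix_ts, value) tuples."""
    rollup = defaultdict(int)
    for customer_id, ts, value in events:
        hour = ts - ts % 3600  # truncate the timestamp to the hour
        rollup[(customer_id, hour)] += value
    return dict(rollup)

events = [(7, 1000, 2), (7, 1500, 3), (7, 4000, 1), (8, 1200, 5)]
print(hourly_rollup(events))
# -> {(7, 0): 5, (7, 3600): 1, (8, 0): 5}
```

Because both the raw table and the roll-up include customer_id, hash partitioning co-locates them, and the aggregation can run shard-locally with no data movement.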
We split the subselect push down project into three projects. This task refers to complex subselect queries that can be pushed down to worker nodes for human real-time queries. These complex queries are mostly applicable in the context of session and funnel analytics queries.
Write a user-defined function that extends and propagates DELETE commands to all the shards. This function doesn't need to be as safe as functionality that's built into Citus and should help in removing the need for customers to write their own scripts.
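A hypothetical sketch of what such a function would generate: rewrite one logical DELETE into per-shard statements by appending the shard id to the table name (Citus names shard tables tablename_shardid). The shard ids and WHERE clause below are made up.

```python
# Expand one logical DELETE into one statement per shard placement table.
def shard_delete_statements(table, where_clause, shard_ids):
    return [
        f"DELETE FROM {table}_{shard_id} WHERE {where_clause};"
        for shard_id in shard_ids
    ]

stmts = shard_delete_statements("events", "created_at < '2016-01-01'",
                                [102008, 102009])
print(stmts[0])
# -> DELETE FROM events_102008 WHERE created_at < '2016-01-01';
```

A production version would also need to resolve shard ids from the metadata tables and run the statements on the right workers; this sketch only shows the rewrite step.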
We need to have a tutorial that shows how to ingest time-series data into Citus and to run fast aggregations on this data. This subtask covers writing an example server script to help with installation and server start-up.
This item is a subtask of #3
When users have questions around master node failover, we currently point them to relevant sections in the PostgreSQL manual. We need to have more streamlined steps / scripts around setting up streaming replication and a load balancer.
This task may need a requirements document.
Citus currently replicates incoming changes to all shard placements. If a shard placement is unavailable, Citus marks the placement as invalid.
This approach is different than having all changes go through a primary in a replication group. We need to answer compatibility questions with Citus' existing replication model.
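The existing behavior described above can be sketched as follows. The placement objects and state constants are stand-ins (the numeric values are modeled on Citus' shard-state metadata, where 1 means finalized and 3 means inactive, but treat that as an assumption).

```python
# Statement replication: apply a write to every placement of a shard and mark
# any placement whose write fails as invalid so later reads skip it.
STATE_VALID, STATE_INVALID = 1, 3

def apply_write(placements, write_fn):
    for p in placements:
        try:
            write_fn(p)
        except ConnectionError:
            p["state"] = STATE_INVALID  # placement diverged; repair it later

placements = [{"node": "w1", "state": STATE_VALID},
              {"node": "w2", "state": STATE_VALID}]

def write(p):
    if p["node"] == "w2":  # simulate an unreachable node
        raise ConnectionError
apply_write(placements, write)
print([p["state"] for p in placements])  # -> [1, 3]
```

Under the replication-group model, by contrast, the write would go only to the group's primary and reach secondaries via streaming replication, which is exactly the compatibility question this task needs to answer.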
The new masterless approach plans to use replication groups. These replication groups would have PostgreSQL databases set up as primary and secondaries.
We will first investigate existing solutions in #20 and come up with a design document. This task then implements and tests the picked solution.
Some of the current fail-over solutions have external dependencies, such as etcd or ZooKeeper. If we decide to incorporate these systems, this task also relates to #13 and #14.
The new masterless approach plans to use replication groups. These replication groups would have PostgreSQL databases set up as primary and secondaries.
We need to write scripts / functions to easily set up and configure these replication groups.
When writing this logic, if we introduce new dependencies such as a new scripting language, we should also think about incorporating them into #13 and #14.
Improve and test the parallel loading script to replace copy_from_distributed_table in CitusDB 5
Integrate pg_shard + CitusDB
We offer several extensions to our customers to enable human real-time queries. We could consider packaging these extensions together and communicating their benefits. The extensions I can think of are HyperLogLog (HLL), topN, histogram, and approximate percentile.
More documentation: more on data modeling for distributed systems, how our users set up pgBouncer, and items that were cut out from docs v1
We have several open items on repartitioned subselects. We need to revisit these items and implement them.
We need to have a tutorial that shows how to ingest time-series data into Citus and to run fast aggregations on this data. This subtask covers how we intend to distribute and have Citus installed for the tutorials (apt-get, Docker, VM, etc.).
This item is a subtask of #3
Simplify data migration from PostgreSQL (local tables) to Citus (distributed tables).
This issue has several components to it, and each one would be beneficial in isolation:
- Adding a tenant_id column to your tables and then backfilling data. This particular item comes up frequently in engineering sessions.
- Propagating the tenant_id column to the corresponding tables.
We currently have join order planning logic spread across the join order and logical planners. This item unifies these two code paths.
We currently have join order planning logic spread across the join order and logical planners. This item comes up with a design to unify these two code paths.
We need to have a tutorial that shows how to ingest time-series data into Citus and to run fast aggregations on this data. These tutorials should enable customers to set up a Citus cluster themselves and run OLAP queries on real-time data.
A rough breakdown of these tasks includes:
We shouldn't push down filters on outer join output in re-partition joins.
We currently don't have a native way to bulk insert data into hash partitioned tables. The built-in copy_to_distributed_table script only supports certain COPY arguments. More importantly, this script uses triggers to ingest data and therefore doesn't have desirable performance characteristics.
What are we building? [bulk ingest into hash-partitioned tables?]
COPY for CitusDB with the goal of:
Who are we building it for?
Users of hash-partitioned and range-partitioned tables, e.g. co-located join and key-value scenarios. Could be used for experimenting with sample data, initial data load, or loading production data. The capacity is limited to a certain number of cores in a single-master world (which allows tens of thousands of rows/s per core).
What is the user experience?
The current code has a configurable transaction manager, which allows for 3 models:
2PC model:
regular model:
choice model:
Superuser privileges are required for COPY .. FROM 'file', but not for \COPY .. FROM 'file'.
Performance on a typical cluster: What must/should be our throughput on a single core? On multiple cores?
Should be 100x faster than copy_to_distributed_table on a single core, scalable by the number of cores.
Failure semantics: What must/should be our behavior on (a) bad data and (b) node failures?
The current code has a configurable transaction manager which offers 2 models:
2PC model:
(a) roll back transaction
(b) worker failure before copy - mark placement as inactive
worker failure during copy - roll back transaction
worker failure during commit - roll back/forward transaction upon recovery
master failure - roll back/forward transaction upon recovery
regular model:
(a) roll back transaction
(b) worker failure before - mark placement as inactive
worker failure during copy - roll back transaction
worker failure during commit - leave partially copied data
master failure before or during copy - roll back transaction
master failure during commit - leave partially copied data
choice model:
Delivery mechanism: How do we deliver this to the customer? Is this a script, a binary, or something that gets checked into product?
As part of the CitusDB extension.
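The core routing step such a COPY needs can be sketched as follows. Python's built-in hash() stands in for PostgreSQL's type-specific hash functions, and the modulo mapping is a simplification of Citus' hash-range shard intervals; both are assumptions for illustration.

```python
# Route each incoming row to a shard by hashing its distribution column.
def route_row(row, dist_column, shard_count):
    # Citus assigns each shard a slice of the int32 hash space; a stable
    # hash taken modulo the shard count is a simplified stand-in.
    return hash(row[dist_column]) % shard_count

rows = [{"customer_id": i, "value": i} for i in range(1000)]
shards = [route_row(r, "customer_id", 8) for r in rows]
assert set(shards) <= set(range(8))  # every row lands on a valid shard
```

Once rows are bucketed per shard, the loader can stream each bucket to its worker in parallel, which is where the claimed per-core throughput comes from.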
Citus currently has a master node that holds authoritative metadata. This issue is to remove the master node from the Citus architecture and make sure that all nodes can ingest data and hold metadata.
The subtasks for this project are:
Citus users copy data into their cluster by starting up csql and using the \stage command. This way, data doesn't flow through the master node and the master doesn't become a bottleneck.
This approach however has the usability drawback that users can't ingest data into append partitioned tables using standard PostgreSQL connectors. We need to offer a more native and also scalable alternative to \stage.
We need to have a tutorial that shows how to ingest time-series data into Citus and to run fast aggregations on this data. This subtask revisits the Examples section in our documentation.
This item is a subtask of #3
The task tracker executor runs into a performance bottleneck when assigning and tracking a high number of tasks (1M+). @marcocitus has a change that batches task assignment and task tracking queries together, and this change notably improves the task tracker executor's performance.
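The batching idea can be sketched like this. The table and column names in the generated SQL are illustrative, not the task tracker's actual schema.

```python
# Group task-status lookups so each batch costs one round trip instead of
# one round trip per task.
def batch_status_queries(task_ids, batch_size=32):
    batches = [task_ids[i:i + batch_size]
               for i in range(0, len(task_ids), batch_size)]
    return [f"SELECT task_id, status FROM task_tracker"
            f" WHERE task_id IN ({', '.join(map(str, b))});"
            for b in batches]

queries = batch_status_queries(list(range(100)), batch_size=32)
print(len(queries))  # -> 4 round trips instead of 100
```

With a million tasks, cutting round trips by the batch factor is what makes the difference the issue describes.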
Subquery push downs currently have a separate code path to enable pushing down of outer joins. Also, repartitioned subselects don't yet have support for joins. We need to integrate outer join logic with the current subquery logic.
Most Citus customers use a Kafka queue before they ingest data into the database. We need to investigate their use and have a better integration story between Kafka and Citus.
Kafka uses the Java runtime. This task may therefore relate to #4.
We need to generate a replacement for prunable outer joins (e.g. a join with an empty table).
Our documentation refers to the user-defined function master_aggregate_table_shards to help users create distributed materialized views. We need to implement this function or remove the reference to it from our documentation.
The mechanism through which we implement this function could potentially be similar to #10
When users load large data sets (from S3 or files), these datasets might have a few bad records. Most data warehousing solutions can be configured to skip over a predefined number of bad lines.
This has also been discussed for PostgreSQL: https://wiki.postgresql.org/wiki/Error_logging_in_COPY
This task proposes to extend COPY to skip over a configurable number of records.
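A sketch of the proposed behavior, using Python's csv module as a stand-in for COPY's parser: skip up to max_errors bad records, and fail the load only once the threshold is exceeded. The two-column (id, name) schema and the error limit are illustrative.

```python
import csv
import io

def copy_with_error_limit(data, max_errors):
    """Parse rows, tolerating up to max_errors malformed records."""
    good, errors = [], 0
    for lineno, row in enumerate(csv.reader(io.StringIO(data)), start=1):
        try:
            good.append((int(row[0]), row[1]))  # expect (id, name)
        except (IndexError, ValueError):
            errors += 1  # a bad record: wrong arity or unparsable id
            if errors > max_errors:
                raise ValueError(f"too many bad records (line {lineno})")
    return good, errors

data = "1,alice\nbad-line\n2,bob\n"
rows, skipped = copy_with_error_limit(data, max_errors=5)
print(rows, skipped)  # -> [(1, 'alice'), (2, 'bob')] 1
```

As the linked PostgreSQL wiki page discusses, a real implementation would also want to log the rejected lines somewhere the user can inspect, not just count them.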
@marcocitus has a pull request out for distributed EXPLAIN. Marco mentioned that @samay-sharma could be a good person to review these changes. We need to document what distributed EXPLAIN covers and review the pull request.
Citus 5.0 has partial support for outer join queries. We'd now like to refactor and expand on this feature's scope.
We need to implement repartitioned subselects over multiple tables. Internally, we refer to this project as subselect #2.
PostgreSQL can easily be extended to create aggregate tables as raw data gets ingested into the database.
We explored several methods to create roll-up tables and also presented a tutorial on it. We could evaluate our learnings from this tutorial and pick and implement an approach: https://www.youtube.com/watch?v=0ybz6zuXCPo
We need to have separate requirements and design documents. We started working on these as one document, but haven't reached a conclusion yet.