ClickHouse Examples

ClickHouse Blog data

A collection of assets backing the official ClickHouse blog posts, including:

  • DDL statements;
  • SQL queries;
  • a collection of agent (Vector, Fluent Bit, etc.) configurations for analyzing Kubernetes logs in ClickHouse;
  • and more.

ClickHouse docker compose recipes

A list of ClickHouse recipes using docker compose:

  • ClickHouse single node with Keeper
  • ClickHouse single node with Keeper and IMDB dataset
  • ClickHouse and Dagster
  • ClickHouse and Grafana
  • ClickHouse and MSSQL Server 2022
  • ClickHouse and MinIO S3
  • ClickHouse and LDAP (OpenLDAP)
  • ClickHouse and Postgres
  • ClickHouse and Vector syslog and apache demo data
  • ClickHouse Cluster: 2 CH nodes - 3 ClickHouse Keeper (1 Shard 2 Replicas)
  • ClickHouse Cluster: 2 CH nodes - 3 ClickHouse Keeper (2 Shards 1 Replica)
  • ClickHouse Cluster: 4 CH nodes - 3 ClickHouse Keeper (2 Shards 2 Replicas)
  • ClickHouse Cluster: 4 CH nodes - 3 ClickHouse Keeper (2 Shards 2 Replicas) with inter-nodes and keeper digest authentication
  • ClickHouse Cluster: 2 CH nodes - 3 ClickHouse Keeper (1 Shard 2 Replicas) - CH Proxy LB
  • ClickHouse Cluster: 2 CH nodes - 3 ClickHouse Keeper (2 Shards 1 Replica) - CH Proxy LB
  • ClickHouse Cluster: 4 CH nodes - 3 ClickHouse Keeper (2 Shards 2 Replicas) - CH Proxy LB

These recipes are meant to provide a quick-and-dirty way to get started and to try out a specific ClickHouse integration or clustered environment locally.

Last but not least, feel free to contribute by submitting a PR!

Contributors

abstractart, danroscigno, dependabot[bot], derekchia, dermasmid, garrettthomaskth, gingerwizard, juliojimenez, leartbeqiraj1, li-zeyuan, mneedham, nellicus, slavanorm, tom-clickhouse

Issues

large data loads - Format-specific variants

While the approach of the initial version of the data load script is generic, robust, and works for all formats, we can optimize performance by exploiting file-type-specific knowledge and available metadata, thereby avoiding unnecessary reads.

For example, ClickHouse SQL queries can access (and our script could potentially utilize) an exhaustive list of Parquet metadata. All numeric columns in a Parquet file carry metadata describing the minimum and maximum values per row group. Starting with version 23.8, ClickHouse automatically exploits this metadata at query time to speed up queries that filter on numeric columns in Parquet files. Our script could utilize this as an alternative to rowNumberInAllBlocks by allowing parallel reading within a Parquet file.
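As a sketch of what this metadata access looks like, ClickHouse provides a ParquetMetadata input format for inspecting a file's row-group statistics (the file name here is hypothetical, and the exact output columns may vary by ClickHouse version):

```sql
-- Inspect per-row-group metadata, including min/max statistics
-- for numeric columns (assumes a local file named data.parquet):
SELECT row_groups
FROM file('data.parquet', ParquetMetadata);

-- From 23.8, a filter on a numeric column can skip entire row groups
-- whose min/max range cannot match the predicate:
SELECT count()
FROM file('data.parquet', Parquet)
WHERE price > 100;
```

A loader script could read those per-row-group ranges first and then assign disjoint row groups to parallel readers.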

large data loads - Parallel script instances

Performance of the data load mechanism could be (drastically) improved by an orchestrator that runs parallel instances of the script, each handling a distinct subset of the files. This would require a separate staging table per script instance, and in a multi-server cluster, ideally a different server per subset of files, in order to fully utilize all available CPU cores.

large data loads - Variants without rowNumberInAllBlocks

In the initial version of the data load script, we explicitly limit the level of parallel processing to keep the rowNumberInAllBlocks function deterministic. There are also approaches that split a large file's rows into evenly sized, repeatable batches without the rowNumberInAllBlocks function. The rows could be dynamically assigned to n buckets by applying a modulo-n operation to (1) an existing unique key column or, more generically, to (2) the hash value of all columns (e.g., WHERE halfMD5(*) % n = bucket_nr). The former requires explicit knowledge of the stored data values, while the latter can be slower than the rowNumberInAllBlocks-based approach.
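The two bucketing variants above can be sketched as follows. This is illustrative only: the staging table, the S3 URL, the id column, and the choice of n = 4 buckets are all hypothetical.

```sql
-- n = 4 buckets; each script instance processes one bucket_nr in 0..3.

-- (1) Using an existing unique key column (requires schema knowledge):
INSERT INTO staging_table
SELECT * FROM s3('https://bucket.s3.amazonaws.com/data/file.parquet', Parquet)
WHERE id % 4 = 0;  -- bucket_nr = 0

-- (2) Generic: hash over all columns (no schema knowledge, but slower,
--     since every row must be hashed):
INSERT INTO staging_table
SELECT * FROM s3('https://bucket.s3.amazonaws.com/data/file.parquet', Parquet)
WHERE halfMD5(*) % 4 = 0;  -- bucket_nr = 0
```

Because both predicates are deterministic functions of the row contents, re-running a failed bucket reproduces exactly the same batch, which is what makes the load retryable.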

Provide an example with ClickHouse username/password

Please provide a simple docker-compose example where one can

  • Spin up a ClickHouse server locally
  • Make the database accessible via a database name, username, and password
  • User should be able to connect to it using a general-purpose database client, e.g. DBeaver or DbVisualizer

Currently, I am able to spin up the server using docker compose up but do not know how to configure the database name, username, and password. Even editing users.xml and re-running docker compose up does not help.
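A minimal sketch of such a recipe, assuming the official clickhouse/clickhouse-server image, which reads the CLICKHOUSE_DB, CLICKHOUSE_USER, and CLICKHOUSE_PASSWORD environment variables on first start (the database name, user, and password below are placeholders):

```yaml
services:
  clickhouse:
    image: clickhouse/clickhouse-server:latest
    environment:
      CLICKHOUSE_DB: mydb          # database created on first start
      CLICKHOUSE_USER: myuser      # user created on first start
      CLICKHOUSE_PASSWORD: secret
    ports:
      - "8123:8123"   # HTTP interface (used by JDBC clients such as DBeaver)
      - "9000:9000"   # native TCP protocol
```

A general-purpose client like DBeaver or DbVisualizer can then connect to localhost:8123 with those credentials. Note that these variables only take effect on a fresh data directory; they do not modify an already-initialized volume.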

Upgrade clickhouse-kafka-connect version

There is a need to upgrade the clickhouse-kafka-connect version from confluent-hub install --no-prompt clickhouse/clickhouse-kafka-connect:0.0.17 to confluent-hub install --no-prompt clickhouse/clickhouse-kafka-connect:v1.0.14, because the currently pinned version no longer seems to be available.
