Cassandra Data Modeling

Cassandra is a partitioned row store, where rows are organized into tables with a required primary key.

The first component of a table’s primary key is the partition key; within a partition, rows are clustered (sorted) by the remaining primary key columns. Other columns may be indexed independently of the primary key.
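As a rough mental model of this layout, the following Python sketch keeps rows in a dictionary keyed by partition key, with each partition sorted by clustering key. It is a toy illustration, not Cassandra code: the class `PartitionedRowStore` and its methods are hypothetical names, and replication, storage, and secondary indexes are left out.

```python
import bisect
from collections import defaultdict


class PartitionedRowStore:
    """Toy in-memory model: partition key -> rows kept sorted by clustering key."""

    def __init__(self):
        # Each partition holds a sorted list of (clustering_key, row) pairs.
        self._partitions = defaultdict(list)

    def insert(self, partition_key, clustering_key, row):
        partition = self._partitions[partition_key]
        keys = [ck for ck, _ in partition]
        idx = bisect.bisect_left(keys, clustering_key)
        if idx < len(partition) and partition[idx][0] == clustering_key:
            # Same full primary key: overwrite the existing row (upsert).
            partition[idx] = (clustering_key, row)
        else:
            partition.insert(idx, (clustering_key, row))

    def slice(self, partition_key, low, high):
        """Return rows of one partition whose clustering key is in [low, high]."""
        partition = self._partitions.get(partition_key, [])
        keys = [ck for ck, _ in partition]
        start = bisect.bisect_left(keys, low)
        end = bisect.bisect_right(keys, high)
        return [row for _, row in partition[start:end]]
```

A range read touches a single partition and returns a contiguous run of already-sorted rows, which is why queries that fix the partition key and bound the clustering columns tend to be cheap.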

This layout encourages pervasive denormalization: result sets are "pre-built" at update time, rather than assembled by expensive joins across the cluster.

Sample Tables:

CREATE TABLE sensor_readings (
    sensorID uuid,
    time_bucket int,
    timestamp bigint,
    reading decimal,
    PRIMARY KEY ((sensorID, time_bucket), timestamp)
) WITH CLUSTERING ORDER BY (timestamp DESC);

SELECT * FROM sensor_readings
WHERE sensorID = 53755080-4676-11e4-916c-0800200c9a66
  AND time_bucket IN (1411840800, 1411844400)
  AND timestamp >= 1411841700
  AND timestamp <= 1411845300;
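The client has to work out which `time_bucket` values a time range covers before it can issue a query like the one above. The sketch below shows one way to do that, assuming one-hour buckets expressed in epoch seconds (an inference from the sample values, which are 3600 apart and aligned to the hour). The function names are hypothetical.

```python
BUCKET_SECONDS = 3600  # assumption: one-hour buckets, inferred from the sample values


def time_bucket(timestamp: int) -> int:
    """Round an epoch-seconds timestamp down to the start of its bucket."""
    return timestamp - (timestamp % BUCKET_SECONDS)


def buckets_for_range(start: int, end: int) -> list[int]:
    """List every bucket that overlaps the inclusive range [start, end]."""
    if end < start:
        raise ValueError("end must not be earlier than start")
    return list(range(time_bucket(start), time_bucket(end) + 1, BUCKET_SECONDS))


# The time range from the sample query maps to the two buckets in its IN clause.
print(buckets_for_range(1411841700, 1411845300))  # [1411840800, 1411844400]
```

The same `time_bucket` function would be applied on the write path so that each reading lands in the partition the read path expects.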

CREATE TABLE IF NOT EXISTS ${keyspace}.traces (
    trace_id blob,
    span_id bigint,
    span_hash bigint,
    parent_id bigint,
    operation_name text,
    flags int,
    start_time bigint,
    duration bigint,
    tags list<frozen<keyvalue>>,
    logs list<frozen<log>>,
    refs list<frozen<span_ref>>,
    process frozen<process>,
    PRIMARY KEY (trace_id, span_id, span_hash)
) WITH compaction = {
    'compaction_window_size': '1',
    'compaction_window_unit': 'HOURS',
    'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy'
}
AND dclocal_read_repair_chance = 0.0
AND default_time_to_live = ${trace_ttl}
AND speculative_retry = 'NONE'
AND gc_grace_seconds = 10800; -- 3 hours of downtime acceptable on nodes

CREATE TABLE IF NOT EXISTS ${keyspace}.duration_index (
    service_name text,    // service name
    operation_name text,  // operation name, or blank for queries without span name
    bucket timestamp,     // time bucket, the start_time of the given span rounded to an hour
    duration bigint,      // span duration, in microseconds
    start_time bigint,
    trace_id blob,
    PRIMARY KEY ((service_name, operation_name, bucket), duration, start_time, trace_id)
) WITH CLUSTERING ORDER BY (duration DESC, start_time DESC)
AND compaction = {
    'compaction_window_size': '1',
    'compaction_window_unit': 'HOURS',
    'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy'
}
AND dclocal_read_repair_chance = 0.0
AND default_time_to_live = ${trace_ttl}
AND speculative_retry = 'NONE'
AND gc_grace_seconds = 10800; -- 3 hours of downtime acceptable on nodes

Sequential writes can cause hot spots: if the application tends to write or update a sequential block of rows at a time, and those rows share a partition key, the writes will not be distributed across the cluster. They all go to the same node (and its replicas). This is frequently a problem for applications dealing with timestamped data.
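The effect can be illustrated with a small simulation. This is only a sketch under stated assumptions: the four-node cluster, the `sensor-N` identifiers, and the MD5-modulo placement in `node_for` are all hypothetical stand-ins, not Cassandra's actual partitioner or token ring, and replication is ignored.

```python
import hashlib
from collections import Counter

NODES = 4  # hypothetical cluster size


def node_for(partition_key: str) -> int:
    """Map a partition key to a node; MD5-modulo is a stand-in for token ownership."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NODES


def write_distribution(partition_keys) -> Counter:
    """Count how many writes land on each node."""
    return Counter(node_for(key) for key in partition_keys)


bucket = 1411840800
sensors = [f"sensor-{i}" for i in range(200)]

# Partition key is the time bucket alone: every write in the hour shares one partition.
by_bucket_only = write_distribution(str(bucket) for _ in sensors)

# Partition key is (sensor, bucket): each sensor gets its own partition per hour.
by_sensor_and_bucket = write_distribution(f"{s}:{bucket}" for s in sensors)

print("bucket only:", dict(by_bucket_only))
print("sensor + bucket:", dict(by_sensor_and_bucket))
```

With the bucket alone as the key, all 200 writes are counted against a single node, while the compound key spreads them out. This mirrors the choice of `(sensorID, time_bucket)` as the partition key in the `sensor_readings` table.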

Contributors

sunilsoni
