
Cosmos DB


Common database interface for various database backends. Primarily meant for applications built on Tendermint, such as the Cosmos SDK, but can be used independently of these as well.

Minimum Go Version

Go 1.19+

Supported Database Backends

  • MemDB [stable]: An in-memory database using Google's B-tree package. Offers very high performance for reads, writes, and range scans, but is not durable and will lose all data on process exit. Does not support transactions. Suitable for e.g. caches, working sets, and tests. Used for IAVL working sets when the pruning strategy allows it.

  • GoLevelDB: a pure Go implementation of LevelDB (see below). Currently the default on-disk database used in the Cosmos SDK.

  • LevelDB (cleveldb): LevelDB via the levigo Go wrapper. Uses LSM-trees for on-disk storage, which perform well for write-heavy workloads, particularly on spinning disks, but require periodic compaction to maintain decent read performance and reclaim disk space. Does not support transactions.

  • RocksDB: A Go wrapper around RocksDB. Similarly to LevelDB (above) it uses LSM-trees for on-disk storage, but is optimized for fast storage media such as SSDs and memory. Supports atomic transactions, but not full ACID transactions.

  • Pebble: a RocksDB/LevelDB-inspired key-value database in Go that uses the RocksDB file format and LSM-trees for on-disk storage. Supports snapshots.

Meta-databases

  • PrefixDB [stable]: A database which wraps another database and uses a static prefix for all keys. This allows multiple logical databases to be stored in a common underlying database by using different namespaces. Used by the Cosmos SDK to give different modules their own namespaced database in a single application database.

Tests

To test common databases, run make test. If all databases are available on the local machine, use make test-all to test them all.

Contributors

alatuszam, catshaark, dependabot-preview[bot], dependabot[bot], erikgrinaker, facundomedica, faddat, itsdevbear, jayt106, julienrbrt, luohaha, mark-rushakoff, melekes, mvdan, odeke-em, robert-zaremba, roysc, stumble, tac0turtle, tessr, williambanfield, yihuang


Issues

tune default rocksdb options

Current Default

  • target_file_size_multiplier = 1
  • block_size = 4096
  • OptimizeLevelStyleCompaction(512M) implies
    • target_file_size_base = 64M
    • snappy/lz4 compression types

Problem

  • sst file size is capped at 64M, producing a large number of files
  • small block_size leads to larger index/filter blocks and less efficient compression
  • could use stronger compression at the lower levels; zstd with a preset dictionary is pretty good according to our tests.

Tuning For DB Size

  • Increase sst file sizes at the lower levels to 300M+; set target_file_size_multiplier = 2?
  • Increase block_size to 32k.
  • Use stronger compression at the lower levels: zstd with a preset dictionary.

We managed to reduce a testnet node's application.db from 256G to 174G by doing a manual compaction with the new parameters.
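Expressed as a RocksDB OPTIONS-file fragment, the tuning above might look like this (a sketch, not the final configuration; the compression-per-level ladder and the 300M base size are illustrative values):

```ini
[CFOptions "default"]
  # larger files at the lower levels -> far fewer sst files
  target_file_size_base=314572800
  target_file_size_multiplier=2
  # lighter compression at the top levels, zstd at the lower (larger) levels
  compression_per_level=kLZ4Compression:kLZ4Compression:kZSTD:kZSTD:kZSTD:kZSTD:kZSTD
  optimize_filters_for_hits=true
  level_compaction_dynamic_level_bytes=true

[TableOptions/BlockBasedTable "default"]
  # bigger blocks -> smaller index/filter blocks, better compression ratio
  block_size=32768
  format_version=4
```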

Other Options

  • optimize_filters_for_hits = 1
  • level_compaction_dynamic_level_bytes = true
  • format_version = 4: more efficient index/filter blocks: http://rocksdb.org/blog/2019/03/08/format-version-4.html
  • format_version = 5 with optimize_filters_for_memory=true and jemalloc: more memory-efficient bloom filters.
  • the ribbon filter seems a good trade-off, saving memory and disk space.

MemTable Optimizations

  • memtable_whole_key_filtering
  • memtable_prefix_bloom_size_ratio

Pure Go: Do we want that?

The only reason this repository has ever needed a fairly complex build system is that leveldb and rocksdb both use cgo.

Since we're adding pebble, which outperforms rocksdb, and since leveldb isn't much of a standout while goleveldb is more ergonomic, there's a question of whether we'd like to make the databases pure Go, mainly for ergonomics.

support rocksdb column family

Column families fit well with the different IAVL stores: at the very least it becomes easier to inspect the DB state of each store, and DB options could potentially be tuned per store.

But it seems hard to do that transparently without changing a few APIs.

Add Size() method for Batch interface

This is a sub-task of this issue. As discussed there, we need to divide each module's commit into many smaller DB batches of a pre-configured size to improve the performance of the underlying LSM backend. Hence we need to add a Size() method to the Batch interface so that we can constrain each DB batch to the pre-configured size.

Add GetNearest method to the DB interface

Summary

There is an issue in cosmos-sdk: Make KVStore interface have methods to getNearest entry.

To address this issue, we can add a new method, GetNearest to the DB interface and implement it for all supported storage engines. This method should have a similar cost to the Get operation and provide a more efficient alternative to iterators for finding the nearest key.

// GetNearest retrieves the nearest key to the given key.
// If 'ascending' is true, the method returns the smallest key greater than the given key.
// If 'ascending' is false, the method returns the largest key smaller than the given key.
// In case there is no key in the desired direction, the method returns nil.
// CONTRACT: key, value read-only []byte
GetNearest(key []byte, ascending bool) ([]byte, error)

Once implemented, the method can be used in the cosmos-sdk repo to add a GetNearest function to the KVStore interface, as described in the original issue.

introduce DB options when creating new backend DB

When testing Cronos functionality, we found that the changes in tendermint/tm-db#218 made the DB consume too much RAM. Instead of hardcoding the maximum open file number, we would prefer it to be adjustable based on the instance's resources.
Therefore,

We may introduce DBOptions into dbCreator, i.e.

type (
	dbCreator func(name string, dir string, opts DBOptions) (DB, error)

	DBOptions interface {
		Get(string) interface{}
	}
)

and introduce a new method, then update the original NewDB implementation:

func NewDBwithOptions(name string, backend BackendType, dir string, opts DBOptions) (DB, error) {

func NewDB(name string, backend BackendType, dir string) (DB, error) {
	return NewDBwithOptions(name, backend, dir, nil)
}

Therefore, it will be easier to make DB adjustments to the Cosmos SDK when the DB supports certain options.

remove cleveldb

I'm not aware of anyone successfully or preferentially using cleveldb, so I think that we should remove it.

Make a second DB "IterKey" API that returns a key to the caller that can potentially mutate

(X-post of cometbft/cometbft-db#156 )

Currently the Iterator.Key() API makes a copy of the key it gets from the database, because the database's iterator returns a buffer it will mutate on the subsequent .Next() call for heap efficiency. This extra copy causes very large heap allocations (and time overheads) on query-serving nodes, and a 1% time overhead to the entire state machine time for Osmosis.

On a heap allocation profile of a query-serving Osmosis RPC node over an hour, 450 gigabytes were allocated from this API. On spot-check, none of the big callers need this copying behavior. (160GB was removed by a tendermint update, but the remaining 290GB is still from this API.)


In the state machine, we see 1% of state machine execution time is blocked on copying this key, again in situations where I don't think we need any of this either.


Proposal: Add a new method KeyMut() to the Iterator interface. The caller must not mutate the returned key, and should expect that it may be mutated by the next .Next() call.

I'm not stoked about the naming of this method, so I'm happy to hear better ideas.

Benchmarks

So, real-world performance comparisons have shown that PebbleDB is much faster, but in unexpected ways:

  • each node will consume 10x less write throughput
  • each node will consume 27x less read throughput
  • each node will deliver 95%-105% of the query traffic

meaning that users can serve dramatically more traffic from the same hardware.

Previously we benchmarked by cloning iavl and replacing its database dependency with our fork of tm-db.

I'd like to have benchmarks here so that we can observe changes commit by commit.

Thoughts on style?
