
Iggy





Iggy is a persistent message streaming platform written in Rust, supporting QUIC, TCP (custom binary specification) and HTTP (regular REST API) transport protocols. Currently running as a single server, it allows creating streams, topics, partitions and segments, and sending/receiving messages to/from them. The messages are stored on disk as an append-only log and are persisted between restarts.

The goal of the project is to make a distributed streaming platform (running as a cluster) that can scale horizontally and handle millions of messages per second (it is already very fast; see the benchmarks below).

Iggy provides exceptionally high throughput and performance while utilizing minimal computing resources.

This is not yet another extension running on top of existing infrastructure, such as Kafka or a SQL database.

Iggy is a persistent message streaming log built from the ground up using low-level I/O for speed and efficiency.

The name is an abbreviation of the Italian Greyhound - small yet extremely fast dogs, the best in their class. Just like my lovely Fabio & Cookie ❤️


Features

  • Highly performant, persistent append-only log for message streaming
  • Very high throughput for both writes and reads
  • Low latency and predictable resource usage thanks to Rust being a compiled language with no garbage collector
  • User authentication and authorization with granular permissions and PATs (Personal Access Tokens)
  • Support for multiple streams, topics and partitions
  • Support for multiple transport protocols (QUIC, TCP, HTTP)
  • Fully operational RESTful API which can be optionally enabled
  • Available client SDK in multiple languages
  • Works directly with binary data (no enforced schema or serialization/deserialization)
  • Configurable server features (e.g. caching, segment size, data flush interval, transport protocols etc.)
  • Possibility of storing the consumer offsets on the server
  • Multiple ways of polling the messages (see the sketch after this list):
    • By offset (using the indexes)
    • By timestamp (using the time indexes)
    • First/Last N messages
    • Next N messages for the specific consumer
  • Possibility of auto committing the offset (e.g. to achieve at-most-once delivery)
  • Consumer groups providing message ordering and horizontal scaling across connected clients
  • Message expiry with auto deletion based on a configurable retention policy
  • Additional features such as server-side message deduplication
  • TLS support for all transport protocols (TCP, QUIC, HTTPS)
  • Optional server-side as well as client-side data encryption using AES-256-GCM
  • Optional metadata support in the form of message headers
  • Built-in CLI to manage the streaming server
  • Built-in benchmarking app to test the performance
  • Single binary deployment (no external dependencies)
  • Running as a single node (no cluster support yet)
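
To make the polling modes above concrete, here is a minimal sketch that models them as a Rust enum. It is purely illustrative: the names and shapes are assumptions, not the SDK's actual types.

#[derive(Debug)]
enum PollingStrategy {
    Offset(u64),    // by offset, resolved through the index
    Timestamp(u64), // by timestamp, resolved through the time index
    First(u32),     // first N messages in the partition
    Last(u32),      // last N messages in the partition
    Next(u32),      // next N messages after the consumer's stored offset
}

fn main() {
    // Poll starting from offset 0, as in the quick start below.
    let strategy = PollingStrategy::Offset(0);
    println!("{strategy:?}");
}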

Roadmap

  • Streaming server caching and I/O improvements
  • Low level optimizations (zero-copy etc.)
  • Clustering & data replication
  • Rich console CLI
  • Advanced Web UI
  • Developer friendly SDK supporting multiple languages
  • Plugins & extensions support

For the detailed information about current progress, please refer to the project board.


Supported language SDKs (work in progress)


CLI

The brand new, rich, interactive CLI is being implemented under the cli project to provide the best developer experience. It will be a great addition to the Web UI, especially for developers who prefer console tools.


Web UI

There's an ongoing effort to build an administrative web UI for the server, which will allow managing streams, topics, partitions, messages and so on. Check the Web UI repository.



Docker

You can find the Dockerfile and docker-compose in the root of the repository. To build and start the server, run: docker compose up.

Additionally, you can run the CLI, which is available in the running container, by executing: docker exec -it iggy-server /iggy/iggy.

Keep in mind that running the container on an OS other than Linux, where Docker runs inside a VM, might result in significant performance degradation.

The official images can be found here; simply type docker pull iggyrs/iggy.


Configuration

The default configuration can be found in the server.toml file (used by default) or server.json, both located in the configs directory.

The configuration file is loaded from the current working directory, but you can specify a custom path by setting the IGGY_CONFIG_PATH environment variable, for example export IGGY_CONFIG_PATH=configs/server.json (or the equivalent command for your OS).

For the detailed documentation of the configuration file, please refer to the configuration section.


Quick start

Build the project (the longer compilation time is due to LTO enabled in the release profile):

cargo build

Run the tests:

cargo test

Start the server:

cargo r --bin iggy-server

Please note that all the commands below use the iggy binary, which is part of the release (the cli sub-crate).

Create a stream with ID 1 named dev, using the default credentials and the tcp transport (available transports: quic, tcp, http; default: tcp):

cargo r --bin iggy -- --transport tcp --username iggy --password iggy stream create 1 dev

List available streams:

cargo r --bin iggy -- --username iggy --password iggy stream list

Get stream details (ID 1):

cargo r --bin iggy -- -u iggy -p iggy stream get 1

Create a topic named sample with ID 1 for stream dev (ID 1), with 2 partitions (IDs 1 and 2) and disabled message expiry (0 seconds):

cargo r --bin iggy -- -u iggy -p iggy topic create dev 1 2 sample

List available topics for stream dev (ID 1):

cargo r --bin iggy -- -u iggy -p iggy topic list dev

Get topic details (ID 1) for stream dev (ID 1):

cargo r --bin iggy -- -u iggy -p iggy topic get 1 1

Send a message 'hello world' (ID 1) to the stream dev (ID 1), topic sample (ID 1) and partition 1:

cargo r --bin iggy -- -u iggy -p iggy message send --partition-id 1 dev sample "hello world"

Send another message 'lorem ipsum' (ID 2) to the same stream, topic and partition:

cargo r --bin iggy -- -u iggy -p iggy message send --partition-id 1 dev sample "lorem ipsum"

Poll messages by a regular consumer with ID 1 from the stream dev (ID 1), topic sample (ID 1) and partition with ID 1, starting with offset 0 and messages count 2, with auto commit (storing the consumer offset on the server):

cargo r --bin iggy -- -u iggy -p iggy message poll --consumer 1 --offset 0 --message-count 2 --auto-commit dev sample 1

Finally, restart the server to see that it is able to load the persisted data.

The HTTP API endpoints can be found in server.http file, which can be used with REST Client extension for VS Code.

To see detailed logs from the CLI/server, run it with the RUST_LOG=trace environment variable. See the screenshots below:

Screenshots: files structure, server start, CLI start, server restart.


Examples

You can find the sample consumer & producer applications under the examples directory. The purpose of these apps is to showcase the usage of the client SDK. To find out more about building the applications, please refer to the getting started guide.

To run the example, first start the server with cargo r --bin iggy-server and then run the producer and consumer apps with cargo r --example message-envelope-producer and cargo r --example message-envelope-consumer respectively.

You can start multiple producers and consumers at the same time to see how the messages are handled across multiple clients. Check the Args struct to see the available options, such as the transport protocol, stream, topic, partition, consumer ID, message size etc.

By default, the consumer polls the messages using the next available offset with auto commit enabled, storing its offset on the server. With this approach, you can easily achieve at-most-once delivery.
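
As a rough illustration of that flow, here is a sketch against a simplified, hypothetical client interface; the real SDK API differs in details such as type and method names.

struct Message {
    offset: u64,
    payload: Vec<u8>,
}

trait Client {
    // Polls up to `count` messages after the consumer's stored offset; with
    // `auto_commit` the server persists the new offset before returning.
    fn poll_next(&mut self, consumer_id: u32, count: u32, auto_commit: bool) -> Vec<Message>;
}

fn consume(client: &mut dyn Client) {
    loop {
        // The offset is committed as soon as the messages are handed out, so a
        // crash during processing loses them: at-most-once delivery.
        for message in client.poll_next(1, 10, true) {
            println!("offset {}: {} bytes", message.offset, message.payload.len());
        }
    }
}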



Benchmarks

To benchmark the project, first build the project in release mode:

cargo build --release

Then, run the benchmarking app with the desired options:

  1. Sending (writing) benchmark

    cargo r --bin iggy-bench -r -- -c -v send tcp
  2. Polling (reading) benchmark

    cargo r --bin iggy-bench -r -- -c -v poll tcp
  3. Parallel sending and polling benchmark

    cargo r --bin iggy-bench -r -- -c -v send-and-poll tcp

These benchmarks will start the server with the default configuration, create a stream, topic and partition, and then send or poll the messages. The default configuration is optimized for the best performance, so you might want to tweak it for your needs. If you need more options, please refer to the iggy-bench subcommands help and examples. For example, to run the benchmark against an already started server, provide the additional argument --server-address 0.0.0.0:8090.

Depending on the hardware, transport protocol (quic, tcp or http) and payload size (messages-per-batch * message-size), you can expect over 4000 MB/s (e.g. 4 million messages of 1 KB each per second) throughput for writes and 6000 MB/s for reads. These results have been achieved on an Apple M1 Max with 64 GB RAM.



iggy's Issues

Use separate struct of bundle of messages

Something like

struct Messages {
    messages: Vec<Arc<Message>>,
    total_size_bytes: usize,
}

so that we can insert batches into the cache faster, without iterating in the extend method in buffer.rs.
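
A possible shape of that bundle, as a minimal sketch with a stand-in Message type (the real types live in the iggy crates):

use std::sync::Arc;

struct Message {
    payload: Vec<u8>,
}

struct Messages {
    messages: Vec<Arc<Message>>,
    total_size_bytes: usize,
}

impl Messages {
    fn new(messages: Vec<Arc<Message>>) -> Self {
        // Sum the sizes once, at construction; the cache can then update its
        // accounting in O(1) instead of iterating over every message.
        let total_size_bytes = messages.iter().map(|m| m.payload.len()).sum();
        Self { messages, total_size_bytes }
    }
}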

PoC: accessible iggy-cmd from web GUI

The idea is to start an SSH server inside the iggy-server Docker container, add a new user with minimal privileges, and add iggy-cmd to PATH.

In the web GUI, it should be possible to ssh user@server:port and see the output of the terminal.

The main purpose of this is to access that terminal and use iggy-cmd through it (so it would be good to have a nice SSH welcome message).

Metrics: add cache hit KPI

So we can know what is happening with the cache. Also, consider some other metrics (not cache related) that are worth tracking.

Offering my help to build stuff

Hey mate, this project looks great.

I am building a video streaming platform: videocall-rs https://github.com/security-union/videocall-rs

We currently use NATS; it works great, but I am very interested in contributing to a pure Rust message streaming platform that is more aligned with our stack and values.

What really caught my attention is that you are planning to add a REST API; with NATS we all end up just writing this by hand, so this really makes me happy.

How can we help you? What are you working on?

CC @griffobeid

Run `iggy-server` with just one tokio worker

To verify that there aren't any deadlocks, and to run perf tests.

EDIT:
Just to add - this is a pure exercise that may result in no code being committed to master. In other words, it's just a task to check that everything is fine ;)
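
For reference, a minimal sketch of pinning tokio's multi-thread runtime to a single worker, assuming the server builds its runtime explicitly rather than via the #[tokio::main] macro:

fn main() -> std::io::Result<()> {
    // Build a multi-thread runtime with exactly one worker thread.
    let runtime = tokio::runtime::Builder::new_multi_thread()
        .worker_threads(1)
        .enable_all()
        .build()?;
    runtime.block_on(async {
        // Start the server's main task here and watch for stalls/deadlocks.
    });
    Ok(())
}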

Fix build.rs errors when publishing a crate

When trying to publish a crate with cargo publish, it fails with:

Caused by:
  Source directory was modified by build.rs during cargo publish. Build scripts should not modify anything outside of OUT_DIR.
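
The pattern cargo expects is for build scripts to write generated files only under OUT_DIR. A minimal build.rs sketch of that pattern:

// build.rs
use std::{env, fs, path::Path};

fn main() {
    // OUT_DIR is provided by cargo; writing here keeps the source tree clean.
    let out_dir = env::var("OUT_DIR").expect("OUT_DIR is set by cargo");
    let dest = Path::new(&out_dir).join("generated.rs");
    fs::write(&dest, "pub const GENERATED: &str = \"ok\";\n").unwrap();
    // The crate can then pull the file in with:
    // include!(concat!(env!("OUT_DIR"), "/generated.rs"));
}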

Custom Error type

Refactor the existing Error into a more concise (and easier to use) component with helper methods for converting the values to/from string, int etc.
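
A minimal sketch of the direction this could take, with illustrative variant names and one example conversion helper:

use std::fmt;

#[derive(Debug)]
enum IggyError {
    InvalidFormat,
    ResourceNotFound(u32),
}

impl fmt::Display for IggyError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            IggyError::InvalidFormat => write!(f, "invalid format"),
            IggyError::ResourceNotFound(id) => write!(f, "resource {id} not found"),
        }
    }
}

impl std::error::Error for IggyError {}

// Helper conversion, e.g. from a failed numeric parse.
impl From<std::num::ParseIntError> for IggyError {
    fn from(_: std::num::ParseIntError) -> Self {
        IggyError::InvalidFormat
    }
}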

Indicators or spinners for the CLI

Maybe we could make use of some spinners or other kinds of indicators displaying the overall status, ongoing command execution etc.? Available crates: spinoff, indicatif. I've tried to use indicatif once on the server side, but had some issues with the tracing logger integration, even when using tracing-indicatif.

Connection string

Implement a connection string for IggyClient to automatically connect and authenticate the client connection.
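
As an illustration only, a sketch of parsing a hypothetical iggy://user:password@host:port format; the actual scheme and fields are yet to be decided:

// Returns (username, password, address) on success.
fn parse(connection_string: &str) -> Option<(String, String, String)> {
    let rest = connection_string.strip_prefix("iggy://")?;
    let (credentials, address) = rest.split_once('@')?;
    let (username, password) = credentials.split_once(':')?;
    Some((username.to_string(), password.to_string(), address.to_string()))
}

fn main() {
    let parsed = parse("iggy://iggy:iggy@127.0.0.1:8090");
    assert!(parsed.is_some());
}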

Allow passing unique consumer id or name

For now, when calling poll_messages(), the underlying consumer or consumer group makes use of a u32 ID. Investigate the usage of Identifier instead (the same as when working with streams, topics, users or other resources) to improve the developer experience by allowing either a numeric or a text ID.

Fix load_messages_to_cache

When iggy-server is starting and the cache is enabled, it tries to load everything into Vec<Arc<Messages>>. If there are more messages on the SSD than there is available memory, the process gets killed by the system. The aim of this issue is to fix the problem by proposing a new API, something like:

async fn get_newest_messages_by_size(size_bytes: u64)

and iterating starting from the newest segments.

This way we would fetch only the amount we need, not everything.
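
A synchronous sketch of that idea with simplified stand-in types (the proposed API would be async and operate on the real segment structs):

struct Segment {
    size_bytes: u64,
}

// Walk segments from newest to oldest, stopping once the byte budget is spent.
fn get_newest_messages_by_size(segments: &[Segment], size_bytes: u64) -> Vec<&Segment> {
    let mut budget = size_bytes;
    let mut selected = Vec::new();
    // Segments are assumed to be ordered oldest-to-newest, so iterate in reverse.
    for segment in segments.iter().rev() {
        if budget < segment.size_bytes {
            break;
        }
        budget -= segment.size_bytes;
        selected.push(segment);
    }
    selected
}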

Delete segments command

Similar to the existing DeletePartitions command, we could introduce another kind of "administrative" request for deleting segments. Given a specific stream ID + topic ID + partition ID and a segments_count, the command should remove that many segments starting from the oldest one (the lowest start offset in the log filename). The manual segments removal should act in the same way as the automated message expiration feature.

Allow passing numeric or string resource ID in new CLI

In order to make the new CLI compatible with the existing API, we could allow the user to provide either a numeric ID (e.g. Stream = 1) or a string ID (e.g. Topic = "orders") for resources such as stream, topic and user.

For example, the SDK command looks like this:

pub struct DeleteTopic {
    pub stream_id: Identifier,
    pub topic_id: Identifier,
} 

And the CLI command, which currently uses plain numeric IDs, could either use the same Identifier wrapper or just make use of String to work similarly:

pub(crate) struct TopicDelete {
    stream_id: u32,
    topic_id: u32,
}

The only exceptions are CreateStream and CreateTopic commands (in the latter, you can use the Identifier for stream_id only).
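
A sketch of how the CLI could accept either form, assuming a simple "all digits means numeric" parsing rule:

enum Identifier {
    Numeric(u32),
    Named(String),
}

impl std::str::FromStr for Identifier {
    type Err = std::convert::Infallible;

    fn from_str(value: &str) -> Result<Self, Self::Err> {
        // Anything that parses as u32 is treated as a numeric ID.
        Ok(match value.parse::<u32>() {
            Ok(id) => Identifier::Numeric(id),
            Err(_) => Identifier::Named(value.to_string()),
        })
    }
}

fn main() {
    let by_id: Identifier = "1".parse().unwrap();
    let by_name: Identifier = "orders".parse().unwrap();
    assert!(matches!(by_id, Identifier::Numeric(1)));
    assert!(matches!(by_name, Identifier::Named(_)));
}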

Fix nested env var config overriding

The current implementation of ConfigProvider uses the following code to allow overriding the server configuration via environment variables:

let config: Result<ServerConfig, figment::Error> = config_builder
    .merge(Env::prefixed("IGGY_").split("_"))
    .extract();

For example, the following setting:

[system.database]
path = "database"

Can be easily overwritten with:

export IGGY_SYSTEM_DATABASE_PATH=mydb

However, this will not work with underscored field names. For example, the following setting:

[quic.certificate]
self_signed = true

will not be updated with:

export IGGY_QUIC_SELF_SIGNED=false

More information about this issue can be found here and in the mentioned nesting docs.
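
A standalone sketch reproducing the problem, assuming figment and serde as dependencies: with split("_"), the nested key is split one level too deep, so it never reaches the self_signed field.

use figment::{providers::Env, Figment};
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct Certificate {
    self_signed: bool,
}

#[derive(Debug, Deserialize)]
struct Quic {
    certificate: Certificate,
}

#[derive(Debug, Deserialize)]
struct ServerConfig {
    quic: Quic,
}

// Run with: IGGY_QUIC_CERTIFICATE_SELF_SIGNED=false cargo run
fn main() {
    // split("_") treats every underscore as a nesting separator, so the key
    // above is read as quic.certificate.self.signed and the self_signed field
    // stays unset - extraction fails with a missing-field error.
    let config: Result<ServerConfig, figment::Error> = Figment::new()
        .merge(Env::prefixed("IGGY_").split("_"))
        .extract();
    println!("{config:?}");
}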

Indicators or spinners for the server

Maybe we could make use of some spinners or other kinds of indicators displaying the overall status, ongoing command execution etc.? Available crates: spinoff, indicatif. I've tried to use indicatif once on the server side, but had some issues with the tracing logger integration, even when using tracing-indicatif.
