
Ranger

This is an experiment to define a generic range-based sharding interface which services can implement in order to have their workloads automatically balanced, and a generic controller to perform that balancing. It's designed in particular to support stateful workloads, which need to move around large amounts of data in order to rebalance, but should be useful to stateless workloads, too.

Ranger is just a toy today, with various critical features missing and hardly any tests, so it's not yet suitable for any purpose under any circumstances. I'm working on it (on the side) because I'm acquainted with many systems which would benefit from such a thing existing. Please drop me a line if you're interested in collaborating.

Examples

Interface

Services implement rangelet.Node:

  • GetLoadInfo(rID ranje.Ident) LoadInfo
  • PrepareAddRange(rm RangeMeta, parents []Parent) error
  • AddRange(rid RangeID) error
  • PrepareDropRange(rid RangeID) error
  • DropRange(rid RangeID) error

This is a Go interface, but it's all gRPC+protobufs under the hood. There are no other implementations today, but it's a goal to avoid doing anything which would make it difficult to implement Rangelets in other languages.
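
Here's a minimal sketch of what a node implementation might look like, written against the method list above. The stand-in types below (RangeID, LoadInfo, RangeMeta, Parent) are placeholders for the real rangelet/ranje types, which differ in detail, and the bodies are illustrative rather than Ranger's actual API.

package kvnode

import "sync"

// Stand-ins for the identifiers named in the interface above. The list mixes
// ranje.Ident and RangeID; they're collapsed into one type here for brevity.
type RangeID uint64

type LoadInfo struct{ Keys int }

type RangeMeta struct {
    Ident      RangeID
    Start, End []byte
}

type Parent struct{ Meta RangeMeta }

// kvNode is a toy in-memory key/value node. A real service would move actual
// data around during the Prepare* steps.
type kvNode struct {
    mu   sync.Mutex
    data map[RangeID]map[string][]byte
}

func newKVNode() *kvNode {
    return &kvNode{data: map[RangeID]map[string][]byte{}}
}

func (n *kvNode) GetLoadInfo(rID RangeID) LoadInfo {
    n.mu.Lock()
    defer n.mu.Unlock()
    return LoadInfo{Keys: len(n.data[rID])}
}

func (n *kvNode) PrepareAddRange(rm RangeMeta, parents []Parent) error {
    // Allocate storage for the incoming range (and, for splits/joins, fetch
    // data from the parent ranges), but don't serve it yet.
    n.mu.Lock()
    defer n.mu.Unlock()
    n.data[rm.Ident] = map[string][]byte{}
    return nil
}

func (n *kvNode) AddRange(rID RangeID) error {
    // The placement is committed: start serving reads and writes.
    return nil
}

func (n *kvNode) PrepareDropRange(rID RangeID) error {
    // Stop accepting writes so the range can be handed off safely.
    return nil
}

func (n *kvNode) DropRange(rID RangeID) error {
    // The placement is gone for good: release the data.
    n.mu.Lock()
    defer n.mu.Unlock()
    delete(n.data, rID)
    return nil
}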

Design

[Architecture diagram: ranger-diagram-v1]

Here's how it works, at a high level.
The main components are:

  • Keyspace: Stores the desired state of ranges and placements. Provides an interface to create new ranges by splitting and joining. (Ranges cannot currently be destroyed; only obsoleted, in case the history is needed.) Provides an interface to create and destroy placements, in order to designate which node(s) each range should be placed on.
  • Roster: Watches (external) service discovery to maintain a list of nodes (on which ranges can be placed). Relays messages from other components (e.g. the orchestrator) to the nodes. Periodically probes those nodes to monitor their health, and the state of the ranges placed on them. For now, provides an interface for other components to find a node suitable for range placement.
  • Orchestrator: Reconciles the difference between the desired state (from the keyspace) and the current state (from the roster), somewhat like a Kubernetes controller (see the sketch after this list).
  • Rangelet: Runs inside of nodes. Receives RPCs from the roster, and calls methods of the rangelet.Node interface to notify nodes of changes to the set of ranges placed on them. Provides some useful helper methods to simplify node development.
  • Balancer: External component. Simple implementation(s) provided, but can be replaced for more complex services. Fetches the state of nodes, ranges, and placements from the orchestrator, and sends split and join RPCs in order to spread ranges evenly across nodes.
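
To make the orchestrator's role a little more concrete, here's a heavily simplified and entirely hypothetical sketch of a reconcile pass. None of these names are Ranger's, and the real orchestrator drives each placement through a state machine with retries rather than firing one-shot callbacks.

package orchsketch

// RangeID and NodeID are stand-ins, as in the node sketch above.
type RangeID uint64
type NodeID string

// reconcile compares desired placements (from the keyspace) with actual
// placements (from the roster) and kicks off whatever work is needed to
// converge. Real ranges can have multiple placements; one node per range
// is assumed here to keep the comparison readable.
func reconcile(desired, actual map[RangeID]NodeID, place, drop func(NodeID, RangeID)) {
    for rID, want := range desired {
        if got, ok := actual[rID]; !ok || got != want {
            place(want, rID) // missing or misplaced: start placing on the wanted node
        }
    }
    for rID, got := range actual {
        if want, ok := desired[rID]; !ok || want != got {
            drop(got, rID) // stale placement: start draining and dropping it
        }
    }
}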

Both Persister and Discovery are simple interfaces to pluggable storage systems. Only Consul is supported by either for now, but adding support for other systems (e.g. ZooKeeper, etcd) should be easy enough in future.
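
As a rough illustration of how small those pluggable interfaces can be, they might look something like the following; the names and signatures here are guesses for the sake of the example, not the definitions in the repo.

package plugsketch

// Remote describes one discovered node. Illustrative only.
type Remote struct {
    Ident string // node identifier
    Host  string
    Port  int
}

// Discovery finds the nodes currently advertising a service name,
// e.g. via a Consul catalog query.
type Discovery interface {
    Discover(svcName string) ([]Remote, error)
}

// Persister durably stores the keyspace's range state so the controller
// can recover it after a restart.
type Persister interface {
    GetRanges() ([]byte, error)
    PutRanges([]byte) error
}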

The green boxes are storage nodes. These are implemented entirely (except the rangelet) by the service owner, to perform the actual work that Ranger is so helpfully sharding and balancing. Services may receive their data via HTTP or RPC, and so may provide a client library to route requests to the appropriate node(s), or may forward requests between themselves. (Ranger doesn't provide any help with that part today, but likely will in future.) Alternatively, services may pull relevant work from e.g. a message queue.
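
Since that routing isn't provided yet, a client library (or a node forwarding a misdirected request) has to do something like the following itself: keep a table of range boundaries and node addresses, fetched out-of-band, and look up the node owning each key. This is a hypothetical sketch, not part of Ranger.

package routesketch

// rangeInfo is one row of a client-side routing table.
type rangeInfo struct {
    start, end string // half-open interval [start, end); empty end means +inf
    nodeAddr   string // host:port of the node holding the range
}

// routeKey returns the address of the node responsible for key, or false if
// no range in the table covers it (which usually means the table is stale).
func routeKey(table []rangeInfo, key string) (string, bool) {
    for _, r := range table {
        if key >= r.start && (r.end == "" || key < r.end) {
            return r.nodeAddr, true
        }
    }
    return "", false
}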

For example node implementations, see the examples directory.
For more complex examples, read the Slicer and Shard Manager papers.

Client

Ranger includes a command line client, rangerctl, which is a thin wrapper around the gRPC interface to the orchestrator. This is currently the primary means of inspecting and balancing data across a cluster.

$ ./rangerctl -h
Usage: ./rangerctl [-addr=host:port] <action> [<args>]

Action and args must be one of:
  - ranges
  - range <rangeID>
  - nodes
  - node <nodeID>
  - move <rangeID> [<nodeID>]
  - split <rangeID> <boundary> [<nodeID>] [<nodeID>]
  - join <rangeID> <rangeID> [<nodeID>]

Flags:
  -addr string
        controller address (default "localhost:5000")
  -request
        print gRPC request instead of sending it
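
For example, moving a range onto a particular node, or previewing a split without sending it, might look like this (the range ID, node IDs, and boundary key here are made up):

$ ./rangerctl -addr=localhost:5000 move 3 node-aaa
$ ./rangerctl -request split 3 k node-bbb node-ccc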

State Machines

TODO

Related Work

I've taken ideas from most of these systems. I'll expand this doc soon to clarify what came from each. But for now, here are some links:

License

MIT
