
exonum's Issues

Add basic peer discovery mechanism.

It is proposed to broadcast a newer Connect message, sent by a given PublicKey, to all existing peers in exonum::node::NodeHandler.handle_connect(&mut self, message: Connect).
This would make it easy to add new full nodes to the blockchain network.
The change should be coordinated with #14 and include a limit on how frequently a new Connect message from the same PublicKey is handled.
https://*************/projects/22/tasks/1307.

Implement new leader election algorithm

Update the leader election algorithm to provide weak censorship resistance.

The changes are as follows:

  1. Every author of an accepted proposal moves to a disabled state for F blocks (during the next F blocks it has no right to create new block proposals). The node behaves as usual in all other activities, including voting for a new block, signing messages, etc.
  2. We need to shuffle the possible leader nodes in a deterministic manner. To do so, we take a permutation over M = N - F validators. The permutation number is calculated as T = Hash(H) mod M!. Such a calculation provides a uniform distribution of orders, so that Byzantine validators end up randomly distributed among the leader positions for the current height H (see the sketch below).
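A minimal sketch (not the exonum implementation) of picking the T-th permutation of the M eligible validators via the factorial number system; here `t` is assumed to be Hash(H) mod M! already reduced to a u64 for illustration (a real implementation needs big-integer arithmetic for M > 20):

fn nth_permutation(m: usize, mut t: u64) -> Vec<usize> {
    // factorials[i] = i!
    let mut factorials = vec![1u64; m];
    for i in 1..m {
        factorials[i] = factorials[i - 1] * i as u64;
    }
    let mut pool: Vec<usize> = (0..m).collect(); // validator indices 0..M-1
    let mut order = Vec::with_capacity(m);
    for i in (0..m).rev() {
        // pick the (t / i!)-th remaining validator, then continue with the remainder
        let idx = (t / factorials[i]) as usize;
        t %= factorials[i];
        order.push(pool.remove(idx));
    }
    order
}

Since T is derived from Hash(H), every honest validator computes the same leader order for height H.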

LevelDB doesn't link on Ubuntu 12.04

To compile on Travis we need to use the trusty image (`dist: trusty`).
The errors look like:

/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../lib/libleveldb.a(env_posix.o): In function `leveldb::(anonymous namespace)::PosixEnv::Schedule(void (*)(void*), void*)':

(.text+0xaf2): undefined reference to `operator delete(void*)'

/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../lib/libleveldb.a(env_posix.o): In function `leveldb::(anonymous namespace)::PosixEnv::Schedule(void (*)(void*), void*)':

(.text+0xb8b): undefined reference to `std::__throw_bad_alloc()'

The error is probably in the C++ runtime linkage (libstdc++ is not being linked).

Use `Duration` for timeouts

Currently we are using SystemTime for timeouts, for example:

pub fn add_status_timeout(&mut self) {
    let time = self.channel.get_time() + Duration::from_millis(self.status_timeout());
    self.channel.add_timeout(NodeTimeout::Status, time);
}

Duration can be used instead:

add_timeout(NodeTimeout::Status, Duration::from_millis(self.status_timeout()));

However, a straightforward implementation will change timeout behavior because timeouts are handled through the same channel as other events. Perhaps we need separate channels/queues.
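A minimal sketch of what the call site could look like if the channel resolved absolute deadlines internally (this signature is an assumption, not the current API):

pub fn add_status_timeout(&mut self) {
    // the channel is assumed to compute `now + duration` itself,
    // so callers only deal with relative Duration values
    self.channel
        .add_timeout(NodeTimeout::Status, Duration::from_millis(self.status_timeout()));
}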

Managing node through managing API ("RPC")

Proposed methods:

  • blockchain-info () - current info on the blockchain (current height, etc.); covered by #131 and earlier
  • block-hash () - block hash by height; covered by #131 and earlier
  • block-header () - block header with signatures, by height/hash; covered by #131
  • tx () - return transaction by hash; probably a separate method or parameter to omit the proof of inclusion into the block; covered by #131
  • mempool-info () - private; mempool size. Closed as part of #224
  • mempool-list (hash) - private; get the status (mempool, committed, unknown) of a transaction, its body, and its location if committed. Closed as part of #224
  • stop () - stops the server; deferred to #149
  • help () - command help. Not relevant for a RESTful API; documentation instead.
  • generate () - generate one or more blocks in demo mode?? Not needed; we're in demo mode by default 🚸
  • propose-get () - return a propose; deferred to #151
  • propose-submit () - broadcast a propose; deferred to #151
  • (prevote|precommit)-(get|submit) - handles to atomic operations of Byzantine behavior; deferred to #151
  • viewing consensus messages from other nodes? Currently these aren't persisted, apart from precommits; deferred to #152
  • prioritize () - change tx priority in the pool; deferred to #150
  • network-info () - private; static protocol_id, static network_id, list of mounted services. Closed as part of #224
  • network-totals () - traffic etc. (we don't collect this info now); deferred to #153
  • peer-info () - private; get all incoming/outgoing connections, the peers map of state, and actual reconnects. Closed as part of #224
  • peer-add (ip-address) - private; generate Event(ConnecTo(ip-address)). Closed as part of #224
  • peer-remove () - deferred; to be controlled by reading the config during node restart (#14)

Generic fabric for clap configuration

There should be a way to create a configuration for a real network:

In an internal discussion we decided to split configuration creation into the following steps (a CLI sketch follows the list):

  1. Create a config template.
  2. Each validator adds itself to this template in some order.
  3. The template is then propagated to all validators.
  4. Each validator finalizes this template into its own config.
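A rough CLI sketch of these steps, assuming clap 2.x; the binary and subcommand names are hypothetical, and step 3 (propagating the template) happens out of band:

extern crate clap;
use clap::{App, Arg, SubCommand};

fn main() {
    let matches = App::new("exonum-config")
        .subcommand(SubCommand::with_name("generate-template")
            .about("Step 1: create the common config template"))
        .subcommand(SubCommand::with_name("add-validator")
            .about("Step 2: append this validator's keys to the template")
            .arg(Arg::with_name("TEMPLATE").required(true)))
        .subcommand(SubCommand::with_name("finalize")
            .about("Step 4: produce this node's own config from the shared template")
            .arg(Arg::with_name("TEMPLATE").required(true)))
        .get_matches();

    match matches.subcommand_name() {
        Some("generate-template") => { /* write an empty template */ }
        Some("add-validator") => { /* merge this validator's public keys into the template */ }
        Some("finalize") => { /* emit the node-local config */ }
        _ => {}
    }
}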

Make the consensus algorithm time-independent

  1. Remove time from Propose, Block, and Request* messages, together with the logic that validates it.
  2. Add time to Precommit messages (the current time at the moment of message creation; it is not validated in any way upon receipt) — see the sketch below.
  3. Update the tests and make sure everything works.
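A rough sketch of the proposed Precommit layout in the message! notation used elsewhere in these issues; the constants and byte offsets are hypothetical:

message! {
    Precommit {
        const MESSAGE_TYPE = 4;  // hypothetical
        const SIZE = 92;         // hypothetical

        validator:      u32         [00 => 04]
        height:         u64         [04 => 12]
        round:          u32         [12 => 16]
        propose_hash:   &Hash       [16 => 48]
        block_hash:     &Hash       [48 => 80]
        time:           SystemTime  [80 => 92]  // set at creation; not validated on receipt
    }
}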

The merge request and the code are being pushed to GitLab for now.

Investigate external/internal ip for network discovery.

The fact that a node initially sends a statically defined addr of itself in the Connect message may be problematic for deploying nodes across different networks/organizations.
In the general case, a node cannot know its own IP, as seen by another peer, without using external services.
Moreover, a node's IP may differ from different peers' perspectives.
#16
https://*********/projects/22/tasks/1307

precommits verification

@defuz @alekseysidorov: found one likely bug in precommit verification.
It seems there is no code to verify that precommits are from distinct validators. Replicating a single precommit self.state.majority_count() times would suffice to pass verification (a possible fix is sketched after the snippet below).

        let precommits = msg.precommits();
        if precommits.len() < self.state.majority_count() ||
           precommits.len() > self.state.validators().len() {
            error!("Received block without consensus, block={:?}", msg);
            return;
        }
        let precommit_round = precommits[0].round();
        for precommit in &precommits {
            let r = self.verify_precommit(&block_hash, block.height(), precommit_round, precommit);
            if let Err(e) = r {
                error!("{}, block={:?}", e, msg);
                return;
            }
        }
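A possible fix sketch: count only precommits from distinct validators before comparing against the majority threshold (field and method names follow the snippet above):

use std::collections::HashSet;

let mut seen = HashSet::new();
for precommit in &precommits {
    // reject a block whose precommits repeat the same validator
    if !seen.insert(precommit.validator()) {
        error!("Duplicate precommit from validator {:?}, block={:?}",
               precommit.validator(), msg);
        return;
    }
}
if seen.len() < self.state.majority_count() {
    error!("Received block without consensus, block={:?}", msg);
    return;
}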

Test time-independent consensus algorithm in denial scenarios 5/1 and 6/2 (no txs load).

propose_timeout=500 in all cases. These are empty blocks.

  1. For
  • 6 and 8 nodes
  • round_timeout = 3000
  • status_timeout = 5000
    performance is poor: many (1-6) rounds per empty block.

5-12 blocks per minute on 6 nodes.

  2. For
  • 6 and 8 nodes
  • round_timeout = 3000
  • status_timeout = 3000

74 blocks per minute on 6 nodes.

  3. For
  • 5/1 and 6/2 nodes (5/1 means 6 validators total, 1 stopped for maintenance or due to denial).
  • round_timeout = 3000
  • status_timeout = 1000
    No data.

This is related to #2 and #29.

Refactor error handling

As proposed in #39, it would be convenient to change functions returning () that contain code like

if some_bad_case {
  error!("ERROR MESSAGE");
  return;
}

or

let val = match get() {
    Ok(val) => val,
    Err(err) => {
        error!("{:?}", err);
        return;
    }
};

into functions returning Result<(), Error>, so the code above could be rewritten as:

let val = get()?;
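A minimal sketch of a handler after the change, assuming a handler-level Error type (handle_msg, do_something, and the Error variant are hypothetical):

fn handle_msg(&mut self) -> Result<(), Error> {
    if some_bad_case {
        // return the error to the caller instead of logging and silently returning
        return Err(Error::Other("ERROR MESSAGE".into())); // Error variant is hypothetical
    }
    let val = get()?; // propagates the error upward
    do_something(val);
    Ok(())
}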

Core schema and its use in services

I've run into a problem understanding how the core schema interacts with service code (using the anchoring service as an example). I'm in the dark here, so a clarification would be helpful.

I don't quite understand why the core exposes its schema as a part of its public interface. (Which leads to some questionable choices, such as having the notion of configurations and especially configuration changes embedded into the core - whereas there is a separate service for that.) It could be more developer-friendly to have a pseudo-service interface for the core. Furthermore, this hypothetical interface is similar in its goal to the one used now for service HTTP GET requests; only the middleware could automatically decide not to provide Merkle proofs in the case of inter-service interaction within a full node. This interface could return, via dedicated methods:

  • a list of current validators/admins
  • current height
  • block at given height
  • tx with given hash
  • block with given hash

And so on. Now, the anchoring service has an optional dependency on the configuration change service (e.g., in order to change the anchoring address), and it should probably:

  • understand if the config change service is available
  • interact with that service (again, via GET-methods) if it is available, in order to get the following config

Perhaps, I'm misunderstanding something, but I would describe the current approach as hacking the core (e.g., with get_following_configuration and the like) just in case it runs with one particular service. Is this done for efficiency reasons?

Proposed solution: A good solution would require inter-process communication. A good place to start seems to be treating a View passed to the transaction's execute method as the execution context of the transaction. Then, it could be passed to other service calls (ideally implicitly - middleware should take care of that). Behind the scenes, an execution context would correspond to many things, including the DB view, but we would want to hide these details from service developers, right?

So, instead of

pub fn execute(&self, view: &View) {
    let schema = Schema::new(view);
    let actual_cfg = schema.get_actual_configuration().unwrap();
    let validators = actual_cfg.validators;
}

it would look like

pub fn execute(&self, context: &ExecutionContext) {
    // narrow() notation is taken from CORBA
    let service = context.get_service(CoreService::SERVICE_ID).narrow::<CoreService>();
    let validators = service.get_validators(context);
}

Sorry for my Rust, but you probably get the idea.

Network discovery failure via RequestPeers

This was observed to sometimes result in

  1. network discovery problems: not all validators received transaction updates, which resulted in them having empty tx pools and broadcasting empty proposals.
  2. network partitioning (stop/start all validators and watch them fail to continue making progress on blocks).

    pub fn handle_request_peers(&mut self, msg: RequestPeers) {
        let peers: Vec<Connect> = self.state.peers().iter().map(|(_, b)| b.clone()).collect();
        for peer in peers {
            self.send_to_peer(*msg.from(), peer.raw());
        }
    }
------
    pub fn send_to_peer(&mut self, public_key: PublicKey, message: &RawMessage) {
        if let Some(conn) = self.state.peers().get(&public_key) {
            trace!("Send to addr: {}", conn.addr());
            self.channel.send_to(&conn.addr(), message.clone());
        } else {
            warn!("Hasn't connection with peer {:?}", public_key);
        }
    }

If node A missed node B's Connect, node A won't send its peers to B upon being requested.

Proposed fix: add addr and time fields to RequestPeers, effectively combining Connect and RequestPeers (and combining their handling logic, too); see the handler sketch below.

        addr:           SocketAddr  [32 => 38]
        time:           SystemTime  [38 => 50]
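A sketch of the corresponding handler change (accessor names are assumptions): reply to the address carried in the request itself, so a missed Connect no longer blocks the answer:

    pub fn handle_request_peers(&mut self, msg: RequestPeers) {
        let peers: Vec<Connect> = self.state.peers().iter().map(|(_, b)| b.clone()).collect();
        for peer in peers {
            // send directly to the addr from the message (the new field proposed above)
            // instead of looking the sender up in state.peers()
            self.channel.send_to(&msg.addr(), peer.raw().clone());
        }
    }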

Get rid of unwrap in fn public_key_of(&self, id: ValidatorId)

@alekseysidorov Accidentally spotted this tiny method. https://github.com/exonum/exonum-core/blob/master/exonum/src/node/consensus.rs#L766
It's likely to cause panics when verifying incoming consensus messages from rogue nodes (a panic-free sketch follows the call-site list):

   `handle_propose`
   --> src/node/consensus.rs:106:28
    |
106 |             let key = self.public_key_of(msg.validator());
    |                            ^^^^^^^^^^^^^

   `handle_prevote`
   --> src/node/consensus.rs:246:28
    |
246 |             let key = self.public_key_of(prevote.validator());
    |                            ^^^^^^^^^^^^^

   --> src/node/consensus.rs:285:32
    |
285 |                 let key = self.public_key_of(validator);
    |                                ^^^^^^^^^^^^^

   `handle_precommit`
   --> src/node/consensus.rs:340:25
    |
340 |         let peer = self.public_key_of(msg.validator());
    |                         ^^^^^^^^^^^^^

   --> src/node/consensus.rs:676:28
    |
676 |             let key = self.public_key_of(validator);
    |                            ^^^^^^^^^^^^^
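A panic-free sketch: return an Option and let each call site bail out on out-of-range validator ids (the exact validator-list accessor is an assumption):

pub fn public_key_of(&self, id: ValidatorId) -> Option<&PublicKey> {
    self.state.validators().get(id as usize)
}

// at a call site:
let key = match self.public_key_of(msg.validator()) {
    Some(key) => key,
    None => {
        error!("Message from unknown validator id={}, msg={:?}", msg.validator(), msg);
        return;
    }
};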

Separate build instructions from readme.md and update & translate them

Currently there is a bug. Scenario:

  • Clean repository
  • Mac OS X 10.12.3
  • run cargo test --all
Compiling lazy_static v0.2.2
error: linking with `cc` failed: exit code: 1
****
  = note: ld: library not found for -lleveldb
clang: error: linker command failed with exit code 1 (use -v to see invocation)

Add whitelist support

For now, anyone can connect to a node with a self-generated (public_key, secret_key) pair.
We should add a filter that rejects connections from nodes with an unauthorized public_key; a minimal sketch is below.
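A minimal sketch, assuming the whitelist is a set of allowed public keys kept in the node configuration (field and method names are hypothetical):

use std::collections::HashSet;

fn is_authorized(whitelist: &HashSet<PublicKey>, peer_key: &PublicKey) -> bool {
    whitelist.contains(peer_key)
}

// in handle_connect (hypothetical placement):
// if !is_authorized(&self.whitelist, message.pub_key()) {
//     warn!("Rejected connection from non-whitelisted key {:?}", message.pub_key());
//     return;
// }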

Exonum does not support big-endian architectures

At least two modules implicitly assume that the current hardware is little-endian:

  • Storage
  • Messages

This does not seem to be a critical issue because most modern hardware is little-endian; a sketch of making the byte order explicit is below.
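A sketch of making the byte order explicit with the byteorder crate instead of relying on the host being little-endian (the helper below is illustrative, not existing code):

extern crate byteorder;
use byteorder::{ByteOrder, LittleEndian};

// read a u64 field out of a message/storage buffer with a fixed byte order
fn read_u64_le(buf: &[u8]) -> u64 {
    LittleEndian::read_u64(&buf[0..8])
}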

Refactor message! and storage_value!

  1. message! and storage_value! share the same code to generate packed-like structures, so we need to
    move the shared code into a common module.
  2. message! and storage_value! have borrowed fields, so we can't derive Deserialize for them
  3. message! should semantically depend on the service
  4. there is a lot of boilerplate code the user needs to write: const MESSAGE_TYPE, const SIZE, const SERVICE_ID, and [from => to] for each field

For the first iteration we decided to:

  • Make separate traits that implement the exonum JSON Deserialize and Serialize aspects (partially implemented in #71)
  • Implement Field for arrays of Field (fixes #32)
  • The main idea is that the code should be well documented, so we should not use associated types for return values.

Use tuple struct instead of simple `type`

Currently we have "typedefs" for some things like height, round, etc.:

pub type Round = u32;
pub type Height = u64;
pub type ValidatorId = u32;

Instead, they can be made into tuple structs, as sketched below.
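A sketch of what that could look like:

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Round(pub u32);

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Height(pub u64);

// fn foo(round: Round, validator: ValidatorId) can no longer be called
// with the arguments swapped, at the cost of writing round.0 for the raw value.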

Advantages:

  • Prevents possible (though probably unlikely) errors, for example swapping arguments in fn foo(Round, ValidatorId).
  • Forces consistency: currently we have many places where raw types are used instead of our typedefs.
  • "Cool typesafety". 😆

Disadvantages:

  • Additional "boilerplate": round.0 instead of round if we need the underlying value.
  • ?

I can make such refactoring if we decide that we need it.

Remove profiler_service

Why?

  • This is the only service that remains in exonum-core
  • All it does is call flame_dump
  • Calling flame_dump from the execute method of a transaction is a good example of bad design :)

What to do:

  • Use conditional compilation for calling flame_dump (see the sketch below)
  • The right place for doing it is the (currently non-existent) handle_terminate method of Node (look at this for more info)
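A sketch of the conditional compilation, with a hypothetical feature name:

#[cfg(feature = "flame_profile")]
fn dump_profile() {
    flame_dump();
}

#[cfg(not(feature = "flame_profile"))]
fn dump_profile() {
    // no-op in normal builds
}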

Tracking issue for 0.1 release

Features & code changes:

@defuz:

  • Implement iterators for storage, refactoring storage key #7 #58
  • Panic handling during transaction execution #59
  • Add developer notes #8

@alekseysidorov:

  • Combine public/private API endpoints from services #66 #53
  • Dynamic IDs for services #65

@gisochre:

  • Network discovery failure via RequestPeers #73
  • Transaction location within a block #77

@DarkEld3r:

  • "Propose timeout" refactoring #49
  • Review methods and functions naming into whole project #55
  • Sending status message after every block and reset timeout #63
  • Separate keys for signing consensus messages and transactions #62
  • Documenting consensus messages #48

@vldm:

  • Ser/de, refactoring messages #17 #32
  • Whitelist support for full nodes #14
  • Generic fabric for clap configuration (assistant @alekseysidorov) #61
  • Tx generator, running benchmarks #54
  • Handling of mempool filling #64
  • Verify profiling #52
  • Managing node through managing API ("RPC") (assistant @alekseysidorov) #60

@deniskolodin:

  • Modifying block structure #138

Documentation #111:

Each responsible person provides a separate PR which adds #![deny(missing_docs)] to their modules. After that, we add #![deny(missing_docs)] for exonum as a whole.

Release process

TBD

Documenting consensus messages

It was determined in #46 that consensus messages (e.g., Propose) are not sufficiently documented at the moment. Each such message could be commented like this:

// Request connected peers from the node `to`.
//
// ### Processing
//   * The message is authenticated by the pubkey `from`. 
//     It must be in the receiver's full node list
//   * If the message is properly authorized, the node responds with...
//
// ### Generation
// A node generates `RequestPeers` under such and such conditions...
message! {
    RequestPeers {
        // ... fields ...
    }
}

Note that consensus messages are slightly different from transaction messages defined by services; neither the Processing nor the Generation section can be straightforwardly translated to transaction messages (although these messages should probably be documented too). This is because tx message processing is encapsulated in the execute() method of the transaction (i.e., it can be documented there), and there are no specific rules as to when ordinary tx messages are generated.

Proposed solution: I think some documentation for consensus messages is needed both here and in general Exonum docs. Message descriptions here could be useful in order to verify that messages are processed and generated as intended without needing to consult an external source. And they can be copy-pasted to the general docs if necessary.

Consensus on the threshold of 1/3 sleeping validators

After merging PR #6 (issue #2), we are going to have a problem with consensus at the threshold of 1/3 sleeping validators. Proposed solution: if 1/3 of the validators send messages for round R or higher, then validators in lower rounds should jump to round R (a sketch is below).
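A sketch of the proposed rule; the state accessors are hypothetical:

fn maybe_jump_round(&mut self, observed_round: Round) {
    // number of distinct validators seen sending messages for observed_round or higher
    let senders = self.state.validators_seen_at_or_above(observed_round); // hypothetical
    // if at least 1/3 of all validators are already there, catch up to that round
    if senders * 3 >= self.state.validators().len() && observed_round > self.state.round() {
        self.state.jump_to_round(observed_round); // hypothetical
    }
}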
