honey-badger-testing

A collection of scripts to test the Honey Badger BFT integration in Parity

The term localnet refers to a local testnet. The term remotenet refers to a testnet that is accessible via SSH; it requires SSH config entries following the scheme hbbft1, hbbft2, ..., hbbft999, as well as corresponding files in the testnet nodes directory.

Project setup

This project requires other projects to be checked out in parallel to ./honey-badger-testing (in the same parent directory).

Depending on the features you need, this is the bare minimum:

  • Rust (cargo)
  • NPM
  • npm install -g @openzeppelin/contracts
git clone https://github.com/DMDcoin/diamond-contracts-core.git
cd diamond-contracts-core && npm ci && cd ..

git clone https://github.com/DMDcoin/diamond-node.git


git clone https://github.com/DMDcoin/honey-badger-testing.git
cd honey-badger-testing
npm ci
npm run localnet-create-mnemonic

SSH setup

The project expects that you have SSH access to the servers where you want to deploy the test network. The SSH servers need to be registered in the SSH config file (on Linux, usually ~/.ssh/config). You can have as many test servers as you want. The SSH servers need to follow this naming scheme: hbbft1, hbbft2, ...
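Instead of writing the hbbftN entries by hand, they can be generated. The following is a minimal TypeScript sketch, assuming an OpenSSH client config; the TestServer shape, the helper name and the example addresses are hypothetical, and only the Host, HostName, User and Port keywords come from OpenSSH.

```typescript
// Hypothetical description of one test server; field names are illustrative.
interface TestServer {
  hostName: string; // IP address or DNS name of the machine
  user: string;     // SSH login user
  port?: number;    // SSH port, defaults to 22
}

// Builds ~/.ssh/config entries named hbbft1, hbbft2, ... in list order,
// so the position in the list decides the node number.
function buildSshConfig(servers: TestServer[]): string {
  const entries = servers.map((s, i) =>
    [
      `Host hbbft${i + 1}`,
      `  HostName ${s.hostName}`,
      `  User ${s.user}`,
      `  Port ${s.port ?? 22}`,
    ].join("\n"),
  );
  return entries.join("\n\n") + "\n";
}

// Usage: append the output of
// buildSshConfig([{ hostName: "10.0.0.11", user: "hbbft" }, { hostName: "10.0.0.12", user: "hbbft" }])
// to ~/.ssh/config, then check the connection with: ssh hbbft1
```

Generating the entries keeps the numbering gap-free, which matters because the remotenet scripts address nodes by these names.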

creating a localnet

deploying a remote net

A remotenet can be deployed from a localnet. It is advised to deploy only fresh (never started) localnets.

deployment of a testnet on remote machines

The following examples target all nodes (-- -a).

# pulls the network specified in the settings (networkGitRepo and networkGitRepoBranch)
npm run remotenet-git-clone-network  -- -a
npm run remotenet-deploy-from-localnet -- -a 
# run the update from git async first.
npm run remotenet-binary-update-from-git-async -- -a
# confirm the success by doing the update sync.
npm run remotenet-binary-update-from-git -- -a

Now we need to generate communication information between the peers. The following script generates a reserved peers file for the RPC port on the deployed network.
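OpenEthereum-based nodes such as diamond-node accept a reserved peers file (--reserved-peers) that lists one enode URL per line. A minimal sketch of producing such a file, assuming the public key, IP address and devp2p port of every node are already known; PeerInfo and the helper names are hypothetical and not this repository's script.

```typescript
// Hypothetical per-node data collected from the deployed network.
interface PeerInfo {
  publicKey: string; // 64-byte node public key as 128 hex chars, 0x prefix optional
  host: string;      // IP address reachable by the other peers
  port: number;      // devp2p port (30303 by default)
}

// Formats a single enode URL and rejects malformed keys early.
function toEnode(p: PeerInfo): string {
  const key = p.publicKey.replace(/^0x/i, "").toLowerCase();
  if (!/^[0-9a-f]{128}$/.test(key)) {
    throw new Error(`invalid node public key for ${p.host}`);
  }
  return `enode://${key}@${p.host}:${p.port}`;
}

// The reserved peers file contains one enode URL per line.
function buildReservedPeers(peers: PeerInfo[]): string {
  return peers.map(toEnode).join("\n") + "\n";
}
```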

diamond indexer

The diamond indexer is a service that indexes the POSDAO contracts into a Postgres DB.

git clone https://github.com/DMDcoin/honey-badger-testing.git

Performance Test Scripts

The test scripts are implemented in Node.js v10; install and run them as usual:

npm ci

The following tests are available:

  • latency1 (1 transaction every 1-10 seconds)
  • latency2 (1 transaction every 1-10 seconds, with a background base load of 10 tx/second)
  • throughput1 (~70 transactions per second)
  • throughput2 (~7 × 70 transactions per second, distributed across multiple nodes in the network)
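The pacing of the latency tests can be illustrated with a small scheduler. This is a sketch of "one transaction every 1-10 seconds", not the repository's implementation; the function name and the injectable random source are assumptions. For comparison, throughput2 amounts to roughly 7 × 70 = 490 transactions per second in total.

```typescript
// Returns send offsets (ms since test start) for a latency-style test:
// send one transaction, then pause a random 1 to 10 seconds before the next.
// `rng` must return values in [0, 1); inject a fixed one for reproducible runs.
function latencySchedule(
  count: number,
  rng: () => number = Math.random,
  minMs = 1000,
  maxMs = 10000,
): number[] {
  const offsets: number[] = [];
  let t = 0;
  for (let i = 0; i < count; i++) {
    offsets.push(t);
    // floor(rng() * (max - min + 1)) is in [0, max - min], so the pause is in [min, max].
    t += minMs + Math.floor(rng() * (maxMs - minMs + 1));
  }
  return offsets;
}
```

For latency2, the background base load of 10 tx/second corresponds to one extra transaction every 100 ms, sent independently of this schedule.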

The tests are described in further detail here: https://github.com/artis-eco/wiki/wiki/Honey-Badger-BFT-Hypothesis-Testing

All tests can be run with the npm run runAllTests command.

The tests are configured in the ./config directory.

When starting from a new test chain, the test accounts have to be funded first: npm run feedAccounts

Test Results

The tests write their results into the output directory, which is not tracked by the Git repository. Results that need to be analysed have to be transferred manually to the jupyter/data directory.

Test Result Analysis

Jupyter notebooks are used to analyse the test results; please refer to jupyter/README.md.

Testnet Setup Scripts

This repository contains scripts to automatically generate config files to set up a hbbft test network of arbitrary size.

SSH Setup

The remotenet system is built on named SSH hosts, so any setup that SSH can reach is supported: a network infrastructure on localhost, on localhost within a (para)virtualized VM, on remote VMs, on real hardware, and so on.

The system expects the nodes to be numbered as follows:

  • hbbft1
  • hbbft2
  • ...

Introduction

We are using Docker to quickly spin up and down a test network of any size.

One desired property of the setup is the ability to replace individual Docker nodes with locally running nodes for the purpose of interactive debugging.

We achieve this by mapping each node's ports to the Docker bridge address and letting all nodes communicate through this bridge address. Locally running nodes can bind to that interface as well, allowing for a mix of Docker and local nodes.
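The port mapping can be expressed with Docker's publish syntax (-p ip:hostPort:containerPort). A sketch under assumptions: the base ports, the per-node offset and the helper name are hypothetical and must match the generated configs; 172.17.0.1 is only the common default of the docker0 bridge.

```typescript
// Hypothetical port layout: node i (1-based) uses p2pBasePort + i for devp2p
// and rpcBasePort + i for JSON-RPC.
interface PortLayout {
  bridgeIp: string;    // Docker bridge address, commonly 172.17.0.1
  p2pBasePort: number;
  rpcBasePort: number;
}

// Builds `docker run` publish arguments that bind the node's ports to the
// bridge address only, using the same port inside and outside the container.
// A local node replacing container i can then bind bridgeIp:port itself.
function publishArgs(layout: PortLayout, nodeIndex: number): string[] {
  const p2p = layout.p2pBasePort + nodeIndex;
  const rpc = layout.rpcBasePort + nodeIndex;
  return [
    "-p", `${layout.bridgeIp}:${p2p}:${p2p}`,
    "-p", `${layout.bridgeIp}:${rpc}:${rpc}`,
  ];
}
```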

Usage

Requirements:

  • The following repositories cloned at the same directory level as this repository
  • Python >= 3.6
  • Docker

To generate the configs for n nodes, cd into this repository and execute:

cd pumba
./setup_testnet.py n

Where "n" has to be replaced by a number >= 1 denoting the number of validator node configs to generate.

The script also supports generating configs for NAT/extip setups: simply add the external IP address as an argument to the script.

./setup_testnet.py n ext_ip

Where "ext_ip" has to be replaced by the external IP address to use.
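When the script is driven from TypeScript tooling, the two arguments can be validated before Python is invoked. A minimal sketch; the helper names are hypothetical, and it assumes the external address is IPv4.

```typescript
// Minimal IPv4 check (dotted quad, each octet 0-255); IPv6 is not handled.
function isIPv4(s: string): boolean {
  const parts = s.split(".");
  return (
    parts.length === 4 &&
    parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255)
  );
}

// Builds the argument list for ./setup_testnet.py n [ext_ip].
function setupTestnetArgs(n: number, extIp?: string): string[] {
  if (!Number.isInteger(n) || n < 1) {
    throw new Error(`n must be an integer >= 1, got ${n}`);
  }
  if (extIp === undefined) {
    return [String(n)];
  }
  if (!isIPv4(extIp)) {
    throw new Error(`not an IPv4 address: ${extIp}`);
  }
  return [String(n), extIp];
}
```

The returned array can be passed as the args parameter of Node's child_process.execFileSync, which avoids shell quoting issues.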

Folder Structure

To be compatible with both local and Docker nodes we have to use an appropriate directory structure.

For the sake of simplicity we choose a single directory containing all configs and data to be mounted into a Docker volume.

Caveat: Filesystem performance inside a Docker volume may be significantly slower than inside the container. For that reason we may reconsider the approach of sharing the "data" folder through a Docker volume.

Block Number Tracking

Currently the first block has to be found manually in the CSV. We could fix this by recording the block number before the test starts, for example by writing it into a file.
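A sketch of that idea, assuming a small JSON marker file (the file name, the field names and the "block" CSV column are hypothetical). The marker would be written with fs.writeFileSync right before the test starts, using the current block number (for example from web3's eth.getBlockNumber()), and used later to cut the CSV.

```typescript
// Hypothetical marker content, e.g. stored as output/start-block.json.
interface StartMarker {
  startBlock: number;
  startedAt: string; // ISO timestamp
}

function serializeStartMarker(startBlock: number, now: Date = new Date()): string {
  const marker: StartMarker = { startBlock, startedAt: now.toISOString() };
  return JSON.stringify(marker);
}

function parseStartMarker(text: string): StartMarker {
  const m = JSON.parse(text) as StartMarker;
  if (!Number.isInteger(m.startBlock) || m.startBlock < 0) {
    throw new Error("invalid start block in marker file");
  }
  return m;
}

// Keeps the header plus all rows recorded at or after the start block.
// Assumes a comma-separated CSV with a header row and a block number column.
function rowsFromStartBlock(csv: string, startBlock: number, blockColumn = "block"): string[] {
  const [header, ...rows] = csv.trim().split("\n");
  const idx = header.split(",").indexOf(blockColumn);
  if (idx < 0) throw new Error(`column ${blockColumn} not found`);
  return [header, ...rows.filter((r) => Number(r.split(",")[idx]) >= startBlock)];
}
```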

Managing Network

building diamond node fresh

Remove the existing installation and get a fresh one as defined in the repository:

npm run remotenet-git-delete-node
npm run remotenet-git-setup-build-from-source

Building the Node Software

npm run remotenet-git-pull-node-and-build

Or, if many nodes have to be built, do it asynchronously:

npm run remotenet-binary-update-from-git-async

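Tips for managing nodes

Example: stop a node and build the latest version from git

# quote the value: unquoted, the shell splits it into separate words and NODE_TARGET stays empty
export NODE_TARGET="-s hbbft10"
# "--" passes the target to the script instead of to npm itself (like "-- -a" above)
npm run remotenet-stop -- $NODE_TARGET && npm run remotenet-git-pull-and-build -- $NODE_TARGET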

Contributors

d10r, dforsten, surfingnerd


honey-badger-testing's Issues

grafana: Sync status

Make a panel for sync_status.

Syncing nodes are contributing to the current epoch.
Is the tracked sync status sometimes wrong?

DMDcoin/diamond-node#27

r.register_gauge(
    "sync_status",
    "WaitingPeers(0), SnapshotManifest(1), SnapshotData(2), SnapshotWaiting(3), Blocks(4), Idle(5), Waiting(6), NewBlocks(7)",
    match self.eth_handler.sync.status().state {
        SyncState::WaitingPeers => 0,
        SyncState::SnapshotManifest => 1,
        SyncState::SnapshotData => 2,
        SyncState::SnapshotWaiting => 3,
        SyncState::Blocks => 4,
        SyncState::Idle => 5,
        SyncState::Waiting => 6,
        SyncState::NewBlocks => 7,
    },
);
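For the Grafana panel, the integer gauge values have to be mapped back to state names (for example through the panel's value mappings). A small sketch of that mapping, taken directly from the gauge description string above; the helper names are hypothetical.

```typescript
// Index = gauge value, as documented in the register_gauge description.
const SYNC_STATES = [
  "WaitingPeers",
  "SnapshotManifest",
  "SnapshotData",
  "SnapshotWaiting",
  "Blocks",
  "Idle",
  "Waiting",
  "NewBlocks",
] as const;

type SyncStateName = (typeof SYNC_STATES)[number];

// Translates a scraped gauge value into its state name.
function syncStateName(value: number): SyncStateName | "Unknown" {
  return Number.isInteger(value) && value >= 0 && value < SYNC_STATES.length
    ? SYNC_STATES[value]
    : "Unknown";
}

// "0" -> "WaitingPeers", ..., usable when setting up value mappings by hand.
function valueMappings(): Record<string, SyncStateName> {
  const map: Record<string, SyncStateName> = {};
  SYNC_STATES.forEach((name, i) => {
    map[String(i)] = name;
  });
  return map;
}
```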

chainspec adapting service for fork support

Only localnet support is required for this kind of setup!

adapting real network

  • step 1: generate a regular network (example: 4 nodes + rpc)
  • step 2: use chainspec from remotenet (raw pull from github)
  • step 3: adapt the downloaded remotenet chainspec with a fork section
  • step 4: copy adapted localnet chainspec to all localnet nodes

The localnet now behaves like every other localnet.
Contract upgrades can be deployed, and the associated diamond-node version can be upgraded.

DMDcoin/diamond-node#98

testnetwork for testing fork support

Similar to the application on the real network, except:

  • step 2 is different: instead of downloading the spec from GitHub, a spec for a local testnet has to be downloaded.

prerequisites

  • port network creation to TypeScript, and refactor the network creation script so it can perform individual steps.

support for multiple local testnets

The system is currently able to hold only one testnet, located in the directory testnet/nodes.
The nodes in this directory are used for several use cases:

  • regression and integration tests to verify that the system still works after changes.
  • storing local data about remotenets; this data is consumed by the npm run remotenet-??? commands

In this situation it is not possible to run a regression/integration test during a development cycle in which remotenet commands are also used frequently, so the nodes directory has to be "renamed".
The currently used scheme is:

  • nodes for the current version
  • nodes-current for a parked remotenet info pool
  • nodes-backup for some interesting scenario to analyse later

We need a foolproof solution where it is always possible to just start tests without renaming directories.
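One possible direction, sketched under assumptions: every network gets a name and its own directory following the existing nodes-&lt;suffix&gt; convention, so tests can default to their own name and never touch the parked remotenet data. The function name and the naming rule are hypothetical.

```typescript
// Keeps testnet/nodes as the default and maps every other named network to
// testnet/nodes-<name>, matching the nodes-current / nodes-backup convention.
function nodesDir(networkName?: string, baseDir = "testnet"): string {
  if (networkName === undefined || networkName === "") {
    return `${baseDir}/nodes`;
  }
  // Restrict names so they cannot escape the base directory (no "/" or "..").
  if (!/^[a-z0-9][a-z0-9-]*$/i.test(networkName)) {
    throw new Error(`invalid network name: ${networkName}`);
  }
  return `${baseDir}/nodes-${networkName}`;
}
```

With this, an integration test could run in nodesDir("regression") while remotenet commands keep using nodesDir("current"), without any renaming.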

move typechain generation to postinstall

Don't check in generated typechain files.
We finally found a good solution for typechain-generated files,
using npm postinstall triggers:

"postinstall":"typechain"

restructure testfiles

Currently the tests, the helper functions and the quality-of-life features are all in the root of the src directory.
It is getting more and more messy.

The configs do not make it much easier, since some config options don't make sense for every runner.

Separate

  • runners
  • logic
  • tests
  • QOL Features
  • helpers

integration tests: auto run all integration tests

We have put a lot of effort into developing the integration tests:
#3

Those test the Contracts together with the Node-software to ensure that everything works together.

Integration tests are known as testruns: they clean up old data, build a new blockchain, and run the specified tests against the new network.
Some of those tests require a large network (~25 nodes) and need a lot of CPU and RAM.
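The testrun flow (clean up, build a fresh chain, run tests) can be sketched as a small orchestrator for running all integration tests automatically. The interfaces and step functions are hypothetical placeholders for the repository's runners; tests run sequentially because they share one network and large networks already use a lot of CPU and RAM.

```typescript
// Injected steps of one testrun; names are hypothetical.
interface TestrunSteps {
  cleanup: () => Promise<void>;
  buildChain: (nodeCount: number) => Promise<void>;
  tests: Array<{ name: string; run: () => Promise<void> }>;
}

interface TestResult {
  name: string;
  ok: boolean;
  error?: string;
}

async function runTestrun(steps: TestrunSteps, nodeCount: number): Promise<TestResult[]> {
  await steps.cleanup();              // remove data of previous runs
  await steps.buildChain(nodeCount);  // generate and start a fresh network
  const results: TestResult[] = [];
  for (const t of steps.tests) {
    try {
      await t.run();
      results.push({ name: t.name, ok: true });
    } catch (e) {
      // A failing test is recorded, the remaining tests still run.
      results.push({ name: t.name, ok: false, error: String(e) });
    }
  }
  return results;
}
```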

postgres integration

Currently we have some CSV exports, but further processing requires the data to be hosted in a DB.

  • add the number of validators to the block data in the DB.

grafana renderer service

It is best for the UI to directly display rendered Grafana panels.
Grafana offers an image renderer that allows that.
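With the image renderer installed, Grafana serves a single panel as an image under /render/d-solo/&lt;dashboard-uid&gt;/&lt;slug&gt;. A sketch of building such a URL for the UI; the dashboard UID, slug, panel id, orgId=1 and the default size and time range are placeholders.

```typescript
interface PanelRenderOptions {
  grafanaBase: string;  // e.g. http://localhost:3000
  dashboardUid: string; // placeholder
  slug: string;         // dashboard slug, placeholder
  panelId: number;
  width?: number;
  height?: number;
  from?: string;        // Grafana time range, e.g. "now-6h"
  to?: string;
}

// Builds the render URL for one panel; query values are URL-encoded.
function panelRenderUrl(o: PanelRenderOptions): string {
  const params: Array<[string, string]> = [
    ["orgId", "1"],
    ["panelId", String(o.panelId)],
    ["width", String(o.width ?? 1000)],
    ["height", String(o.height ?? 500)],
    ["from", o.from ?? "now-6h"],
    ["to", o.to ?? "now"],
  ];
  const query = params
    .map(([k, v]) => `${encodeURIComponent(k)}=${encodeURIComponent(v)}`)
    .join("&");
  const base = o.grafanaBase.replace(/\/+$/, "");
  return `${base}/render/d-solo/${encodeURIComponent(o.dashboardUid)}/${encodeURIComponent(o.slug)}?${query}`;
}
```

The UI can use the result as the src of an img element; access control still has to be configured on the Grafana side.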

automated test for automatic restaking feature

Testing for: DMDcoin/diamond-contracts-core#43

  • including the impact of small delegate stakes on validators.

  • figure out if the blockchain will be able to create a block within 1 minute.

  • plot a graph of the impacts

  • Network generation

    • with 25 nodes (1 node could be enough, since it "should" scale linearly, but 25 nodes expose wrong expectations). For a local test, 7 nodes should do fine, so that the local machine is not overloaded.
    • The epoch length should be long enough for the long-lasting block to be created within 1 epoch, and short enough
      to get results within a convenient timeframe. Since the target block creation time is 1 minute, 10 minutes should be a good value, which also gives enough time for the key generation.
    • The network needs to be generated with only 1 validator; the remaining validators need to be filled in later, so they become regular validators that can be staked on, instead of MOC validators.
  • maybe develop with the capability to test it both on a remotenet and a localnet (current obstacle: #97) (no, we do not need remotenet support)

  • Increase the number of delegators each block by a configurable step (100 ?) for each epoch.

  • figure out the block that does the reward call, and measure the time consumed for creating the block.

  • write out each created epoch switch as a CSV line.

  • create a plot (libre office calc?) to visualize the result.

  • highlight the breakpoint of the target time of 1 Minute, so we know the hard minimum boundary for delegator min stake.

  • append the data and visualisation to this issue.
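The CSV output and the breakpoint from the list above can be sketched as follows. The column layout and names are hypothetical; the breakpoint is the first delegator count at which creating the reward block took longer than the 1 minute target.

```typescript
// One CSV line per epoch switch (hypothetical columns).
interface EpochSample {
  epoch: number;
  delegators: number;    // number of delegators at the reward call
  rewardBlockMs: number; // time needed to create the block with the reward call
}

function toCsvLine(s: EpochSample): string {
  return `${s.epoch},${s.delegators},${s.rewardBlockMs}`;
}

// Returns the sample with the lowest delegator count whose reward block took
// longer than the target, or undefined if the target was never exceeded.
function findBreakpoint(samples: EpochSample[], targetMs = 60000): EpochSample | undefined {
  return [...samples]
    .sort((a, b) => a.delegators - b.delegators)
    .find((s) => s.rewardBlockMs > targetMs);
}
```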

regression test: Staking and pick up as validators

Prerequisite:

Candidate node is running, and not registered as validator node.
Maximum number of Nodes not reached yet.

Action

A staker adds a pool for the candidate node.

Expected result

The node gets picked up as a validator at the next epoch switch.

Automate TestNet Deployment

For #3

Create a script that simplifies testnet deployment of the current configuration.
It should be able to run testnet nodes in a screen session.
It would be nice to have minimal sanity checks.

Alpha 2 DB Size

  • make forecasts of when the DB size becomes a problem at the current growth rate.
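A first rough forecast can assume linear growth between two size measurements. This is a simplification (growth may change with the number of validators or the load), and the names are hypothetical. For example, 100 GB on day 0 and 110 GB on day 10 means 1 GB per day, so a 200 GB limit is reached after another 90 days.

```typescript
interface SizeSample {
  day: number;    // day of the measurement
  sizeGb: number; // DB size on that day
}

// Days remaining after sample b until limitGb is reached at the growth rate
// observed between a and b; Infinity if the DB is not growing.
function daysUntilLimit(a: SizeSample, b: SizeSample, limitGb: number): number {
  if (b.day <= a.day) throw new Error("samples must be in chronological order");
  const growthPerDay = (b.sizeGb - a.sizeGb) / (b.day - a.day);
  if (growthPerDay <= 0) return Infinity;
  return (limitGb - b.sizeGb) / growthPerDay;
}
```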

Performance Analysis Script: plateau finder

As a metric for performance improvements we need a script that builds up load until it detects that the network cannot keep up anymore.
As a result we get:

  • transactions processed per second.
  • average block speed.

In addition, we get insight into how much the average block speed drops in favor of bigger blocks.

test design:

  • Create and fund x (x=100) test accounts. (Unfortunately, web3.js limits how many accounts we can use here before it crashes.)

  • Send n transactions for each of the x accounts.

  • n becomes larger and larger.

  • Measure the performance of the blockchain and append it to a performance result CSV.

  • Extend this by scaling up the network, so we get different graphs for each network size.

  • Make long-running tests so we see the impact of DB size on performance.
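The core loop of the plateau finder can be sketched as below. measureTps is a hypothetical callback that sends n transactions per test account and returns the observed transactions per second; a real implementation would be asynchronous and should average several measurements, because single samples are noisy.

```typescript
interface PlateauResult {
  load: number; // n at which the best throughput was measured
  tps: number;  // transactions per second at that load
}

// Increases n step by step until throughput no longer improves by more than
// `tolerance` (relative), i.e. the network cannot keep up anymore.
function findPlateau(
  measureTps: (n: number) => number,
  startN: number,
  stepN: number,
  maxN: number,
  tolerance = 0.01,
): PlateauResult | undefined {
  if (stepN <= 0) throw new Error("stepN must be positive");
  let best: PlateauResult | undefined;
  for (let n = startN; n <= maxN; n += stepN) {
    const tps = measureTps(n);
    if (best !== undefined && tps <= best.tps * (1 + tolerance)) {
      return best; // more load no longer yields meaningfully more throughput
    }
    best = { load: n, tps };
  }
  return best; // maxN reached before a plateau was observed
}
```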

regression test: Unavailability handling caused by Node shutdown

A node is treated as available.
Once it has been chosen to write its PARTs and ACKs
and misses doing so during the key generation time window,
it must be excluded from the set and flagged as "unavailable".

The validators who managed to write their key shares stay in the validator set.

If there are enough additional potential validators,
a replacement validator is chosen at random.
If the system runs out of potential validators,
the HBBFT validator set continues running with fewer validators.

In a new key generation time window, all validators get another chance to write their keys.
(See also: StakingHbbftBase.currentKeyGenExtraTimeWindow)
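The expected outcome of a missed key generation window can be written down as a simplified model, which a regression test could compare against the on-chain state. This is not the contract code; the function and its inputs are hypothetical, and the random choice is injectable so tests are deterministic.

```typescript
interface KeyGenOutcome {
  validators: string[];  // validators continuing in the set
  unavailable: string[]; // flagged as unavailable
}

// Removes pending validators that did not write their keys, flags them as
// unavailable, and refills free seats at random from the remaining pool.
// If the pool runs dry, the set simply continues with fewer validators.
function resolveKeyGen(
  pending: string[],
  wroteKeys: Set<string>,
  candidatePool: string[],
  rng: () => number = Math.random,
): KeyGenOutcome {
  const validators = pending.filter((v) => wroteKeys.has(v));
  const unavailable = pending.filter((v) => !wroteKeys.has(v));
  const pool = candidatePool.filter((c) => pending.indexOf(c) < 0);
  while (validators.length < pending.length && pool.length > 0) {
    const i = Math.floor(rng() * pool.length);
    validators.push(pool.splice(i, 1)[0]);
  }
  return { validators, unavailable };
}
```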

Suggested Test:

Settings:

  • Phase 1 length: 60 seconds
  • 1 MOC/RPC node + 3 regular nodes

Actions:

  • Wait for the epoch switch (so we have a full Phase 1 time window left).
  • Stake on all 3 regular nodes.
  • Shut down RegularNode1.
  • Key generation should throw out Node1, mark it unavailable,
    and continue with 2 HBBFT consensus nodes.
  • Boot up Node1 again.
  • Node1 should notify the system that it became available again.
  • Wait for the epoch switch.
  • Node1, Node2 and Node3 should all be pending validators again.
  • The nodes should be able to write their keys.
  • The HBBFT system now runs with 3 consensus nodes.

validator without ACKS scraper

Develop a data scraper that analyses the blockchain and finds occurrences of validators that wrote their PART but never wrote their ACK.
We noticed this anomaly multiple times:
DMDcoin/openethereum-3.x#65

  • can use the blockscout API, including GraphQL

  • should work from the tail of the blockchain to the front (newer data is more interesting)

  • should evaluate transactions and parse the data field instead of reading the contract data.

  • detect cases

  • prove the thesis: a case leads to a node dropout in the upcoming epoch.
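Once the transactions' data fields are decoded, the detection itself is a set difference per epoch. A sketch under assumptions: the DecodedTx shape is hypothetical, and the method names writePart and writeAcks are assumed to match the key generation contract's ABI.

```typescript
// Hypothetical input: transactions already decoded from their data field.
interface DecodedTx {
  from: string;   // validator (mining) address
  epoch: number;  // epoch (key generation round) the call belongs to
  method: string; // decoded function name
}

// Returns "epoch:address" entries for validators that wrote a PART but no ACK.
function partsWithoutAcks(txs: DecodedTx[]): string[] {
  const parts = new Set<string>();
  const acks = new Set<string>();
  for (const tx of txs) {
    const key = `${tx.epoch}:${tx.from.toLowerCase()}`; // addresses compared case-insensitively
    if (tx.method === "writePart") parts.add(key);
    else if (tx.method === "writeAcks") acks.add(key);
  }
  const result: string[] = [];
  parts.forEach((k) => {
    if (!acks.has(k)) result.push(k);
  });
  return result;
}
```

Each reported case can then be checked against the validator set of the following epoch to test the dropout thesis.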

Status Message missing

Sometimes the status message (Block Import...) is missing in the node output.
It could be a lock problem.

Create setup for hbbft Testnet1.

  • latest openethereum implementation for hbbft
  • latest hbbft-posdao-contracts deployed in the genesis block
  • analytic functions to verify basic functions of the hbbft-posdao-contracts
  • integrate blockscout into honey-badger-testing

extract core logic into a shared library

The core logic for contract interaction is useful for other projects as well.
diamond-ui (dmd-ui), the replacement for posdao-ui, already uses it.
We should separate it into a shared library.

automatic client restart upon unavailability.

We can watch the blockchain for client-unavailable events.
If we see one, we can restart the node automatically, possibly even clearing the cache.
This could be implemented in the watchdog service.
(A more sophisticated implementation would separate the watchdog from the actions.)
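The separation of watchdog and actions can be sketched as a pure function that turns unavailability events into restart requests; executing the requests (for example over SSH) is left to a separate action layer. The event and request shapes are hypothetical.

```typescript
// Hypothetical event emitted when the chain flags a validator as unavailable.
interface UnavailableEvent {
  validator: string;
  block: number;
}

interface RestartRequest {
  sshHost: string;     // e.g. "hbbft3"
  clearCache: boolean;
  reason: string;
}

// Watchdog part: maps events to restart requests for nodes we operate.
// hostByValidator uses lower-case addresses as keys.
function restartRequests(
  events: UnavailableEvent[],
  hostByValidator: Map<string, string>,
  clearCache = false,
): RestartRequest[] {
  const seen = new Set<string>();
  const requests: RestartRequest[] = [];
  for (const e of events) {
    const host = hostByValidator.get(e.validator.toLowerCase());
    if (host === undefined || seen.has(host)) continue; // not ours, or already queued
    seen.add(host);
    requests.push({ sshHost: host, clearCache, reason: `unavailable at block ${e.block}` });
  }
  return requests;
}
```

Keeping this part free of side effects makes it easy to test and lets the action layer add rate limiting or manual confirmation later.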

regression Test: Unstaking and removing as validators

Prerequisite

Continuation of #12
A validator has staked on a node,
and the node got picked as a validator.

Action

Request unstaking, then unstake.

Expected result

Node will not get picked in the next validator selection round.
After the locking period, the staking can get removed by the pool owner.

localnet vs remotenet as config param ?

Currently the config does not tell whether a network is a localnet or a remotenet.
In particular, a localnet can be deployed to become a remotenet;
in this case the localnet should not be used anymore.

The project structure defines 3 sets of functions concerning networks:

  • localnet-xyz: interacts with local processes and local files; these functions cannot be applied to a remotenet.
  • remotenet-xyz: interacts via SSH with remote nodes; these functions cannot be applied to a localnet.
  • net-xyz: interacts with RPC only; these functions can be applied to localnets and remotenets.

There is also testnet-xyz, which in essence means localnet, but usually refers to integration tests for a specific feature.
This testnet naming pattern should be reworked, either by specifying features and restrictions that only apply to integration tests, or by replacing it completely with localnet.

Tests that are currently written for net-xyz cannot spin up and shut down the localnet themselves, and are therefore not yet suitable for automatic integration tests; that issue should be addressed.
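If the config gained such a parameter, the naming convention could be enforced before a script runs. A sketch; the NetworkKind field and the guard function are hypothetical, while the script prefixes follow the three sets listed above.

```typescript
// Hypothetical config field marking what kind of network a config describes.
type NetworkKind = "localnet" | "remotenet";

// localnet-* only on localnets, remotenet-* only on remotenets, net-* on both.
function isCommandAllowed(script: string, kind: NetworkKind): boolean {
  if (script.startsWith("localnet-")) return kind === "localnet";
  if (script.startsWith("remotenet-")) return kind === "remotenet";
  if (script.startsWith("net-")) return true;
  return false; // unknown prefix (e.g. testnet-*): refuse rather than guess
}
```

Deploying a localnet as a remotenet would then also switch its kind to "remotenet", which blocks further localnet-* commands on it.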

testnetwork: n out of m validator nodes network generation.

The current test scripts only support an "all in" testnet setup.
That means it is possible to create a testnet with 7 nodes, but all 7 nodes start off as validator nodes at genesis.
It is already prepared in the script here: 0d4a357

A possible implementation option would be to implement this in the
open ethereum hbbft config generator.
https://github.com/SurfingNerd/openethereum-3.x/blob/surfingnerd/fix-hbbft-config-gen/crates/ethcore/src/engines/hbbft/hbbft_config_generator/src/main.rs

That implementation would make a lot of regression tests easier and blend in very well with the existing test suite
(even with pumba support)
#3
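The core of an n out of m setup is deciding which generated nodes enter the genesis validator set. A sketch with hypothetical names: the first n nodes become initial validators, the remaining m - n only run the node software and can be staked on later.

```typescript
// Splits m generated nodes into n initial validators and m - n candidates.
function splitNodes(nodes: string[], n: number): { validators: string[]; candidates: string[] } {
  if (!Number.isInteger(n) || n < 1 || n > nodes.length) {
    throw new Error(`n must be an integer between 1 and ${nodes.length}`);
  }
  return { validators: nodes.slice(0, n), candidates: nodes.slice(n) };
}
```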

extend contract upgrade tool

During an upgrade, some contracts might require initialization, in case the contract did not have a reference to another contract before.

Question:
Is there a safe method to figure this out programmatically?

A1:
IsInitialized() is always false, because the contract is missing its storage until the upgrade is complete.

Integration Test definition

Define a long running test with real diamond-nodes that test network stability and provide valuable statistics:

Actions to execute:

  • filling Reward Contracts payout pots
  • claim reward (obsolete! restaking implementation in progress)
  • Staking and pick up as validators #12
  • Unstaking and removing as validators #13
  • fall back to single validator node #19
  • DPOS Staking
  • Unavailability handling caused by Node shutdown #18
  • Boot of unavailable Node leads to availability #88
  • Malice Reports (executing automated node shutdown)
  • rebooting and re-adding malicious nodes as legitimate nodes
  • sync a fresh node from origin (hard fork topic)
  • economics: lost stakes vanishing
  • economics: late claim dilution
  • economics: no claim dilution

Tracking to be done:

  • storage impact on Blockchain with increasing validators count
  • block time impact on blockchain with increasing validators count
  • coineconomy: Staking returns
  • coineconomy: Pool sizes

Quality of life features:

  • automate testNet deployment. #4
  • n out of m test network creation #10

Automating the execution of CI Tests

It might also be possible, with a hack in OpenEthereum, to speed up time by a huge factor, so we can run the tests with realistic values such as a 24-hour epoch length and a 30-minute transition time.
Time would simply fast-forward, simulating 1 day in just a few seconds.
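The arithmetic behind this idea, assuming a target of 5 wall-clock seconds per simulated day (an illustrative value): the speedup is 86400 / 5 = 17280, so a 30-minute transition window lasts only about 0.1 s of real time. The factor therefore may have to stay low enough for key generation transactions to still be included in time.

```typescript
const SECONDS_PER_DAY = 24 * 60 * 60; // 86400

// speedup = simulated duration / wall-clock duration
function speedup(simulatedSeconds: number, wallClockSeconds: number): number {
  return simulatedSeconds / wallClockSeconds;
}

// Real time that a simulated duration takes at a given speedup.
function wallClockSeconds(simulatedSeconds: number, factor: number): number {
  return simulatedSeconds / factor;
}

const factor = speedup(SECONDS_PER_DAY, 5);           // 1 day in an assumed 5 s
const transition = wallClockSeconds(30 * 60, factor); // 30-minute transition window
```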

This test series might take a huge amount of CPU, memory and time.
It would be nice to make the log output available on a web interface.
