dmdcoin / honey-badger-testing

This project forked from artis-eco/honey-badger-testing

Various test scripts for analyzing and exercising the Honey Badger BFT integration into Parity

JavaScript 0.18% Shell 0.16% Dockerfile 0.04% Python 0.11% Jupyter Notebook 89.06% TypeScript 10.23% PLpgSQL 0.21%

honey-badger-testing's Introduction

honey-badger-testing

A collection of scripts to test the Honey Badger BFT integration in Parity

The term localnet refers to a local testnet. The term remotenet refers to a testnet that is accessible via SSH; it requires SSH config entries following the scheme hbbft1, hbbft2, ... hbbft999, as well as corresponding files in the testnet nodes directory.

Project setup

This project requires other projects to be cloned next to ./honey-badger-testing (in the same parent directory).

Depending on the features you need, this is the bare minimum:

Rust (cargo)
NPM
npm install -g @openzeppelin/contracts

git clone https://github.com/DMDcoin/diamond-contracts-core.git
cd diamond-contracts-core && npm ci && cd ..

git clone https://github.com/DMDcoin/diamond-node.git


git clone https://github.com/DMDcoin/honey-badger-testing.git
cd honey-badger-testing
npm ci
npm run localnet-create-mnemonic

SSH setup

The project expects that you have SSH access to the servers where you want to deploy the test network. The SSH servers need to be registered in the SSH config file (on Linux, ~/.ssh/config). You can have as many test servers as you want; they need to be registered following the naming scheme hbbft1, hbbft2, ...
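The SSH entries can be written by hand; the following is only a minimal sketch of generating them, assuming each test server is described by an address and a login user. `renderSshConfig` and `SshHost` are hypothetical names, not part of this repository; `Host`, `HostName` and `User` are standard ssh_config keywords.

```typescript
// Hypothetical helper (not part of this repository): renders ~/.ssh/config
// entries that follow the hbbft1, hbbft2, ... naming scheme.
interface SshHost {
  hostName: string; // IP address or DNS name of the test server (assumed input)
  user: string; // login user on that server (assumed input)
}

function renderSshConfig(hosts: SshHost[]): string {
  // the n-th server in the list becomes the alias hbbft<n>
  const entries = hosts.map((h, i) =>
    [`Host hbbft${i + 1}`, `    HostName ${h.hostName}`, `    User ${h.user}`].join("\n"),
  );
  return entries.join("\n\n") + "\n";
}

// Example with two servers from the documentation address range.
console.log(
  renderSshConfig([
    { hostName: "192.0.2.10", user: "dmd" },
    { hostName: "192.0.2.11", user: "dmd" },
  ]),
);
```

The output can be appended to ~/.ssh/config; `ssh hbbft1` is a quick way to check that an alias resolves.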

creating a localnet

deploying a remote net

A remotenet can be deployed from a localnet. It is advised to deploy only fresh (never started) localnets.

deployment of a testnet on remote machines

The following examples target all nodes (-- -a).

# pulls the network specified in the settings networkGitRepo and networkGitRepoBranch
npm run remotenet-git-clone-network  -- -a
npm run remotenet-deploy-from-localnet -- -a 
# run the update from git async first.
npm run remotenet-binary-update-from-git-async -- -a
# confirm the success by doing the update sync.
npm run remotenet-binary-update-from-git -- -a
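The comments above describe a two-phase pattern: update all nodes concurrently first, then confirm with a sequential run. A minimal sketch of that pattern, assuming the per-host work can be expressed as an async task; `updateAll` and `HostTask` are hypothetical names, and the actual npm scripts may be organized differently.

```typescript
// Hypothetical sketch of "run async first, confirm sync afterwards".
type HostTask = (host: string) => Promise<void>;

async function updateAll(hosts: string[], task: HostTask): Promise<string[]> {
  // Phase 1: start every update at once and record which ones succeeded.
  const ok = await Promise.all(hosts.map((h) => task(h).then(() => true, () => false)));
  const failed = hosts.filter((_, i) => !ok[i]);

  // Phase 2: rerun only the failed hosts, one after another, so errors
  // show up in order and do not interleave.
  const stillFailing: string[] = [];
  for (const host of failed) {
    try {
      await task(host);
    } catch (_err) {
      stillFailing.push(host);
    }
  }
  return stillFailing; // hosts that need manual attention
}
```

Running the sequential pass only on failed hosts keeps the slow path short when most nodes update cleanly.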

Now we need to generate communication information between the peers. The following script generates a reserved peers file for the RPC port on the deployed network.
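OpenEthereum's `--reserved-peers` option reads a file with one enode URL per line; diamond-node, as a fork, presumably accepts the same format (worth verifying). A minimal sketch that builds such a file's content, assuming the node public keys, hosts and ports are already known; `PeerInfo` and `reservedPeersFile` are hypothetical names.

```typescript
// Hypothetical sketch: one enode URL per line, enode://<pubkey>@<host>:<port>.
interface PeerInfo {
  publicKey: string; // 64-byte node public key as 128 hex chars (0x prefix optional)
  host: string; // IP address reachable by the other nodes
  port: number; // devp2p port of the node
}

function reservedPeersFile(peers: PeerInfo[]): string {
  const lines = peers.map((p) => {
    const key = p.publicKey.replace(/^0x/, "").toLowerCase();
    // reject malformed keys early instead of producing an unusable file
    if (!/^[0-9a-f]{128}$/.test(key)) {
      throw new Error(`invalid node public key for ${p.host}`);
    }
    return `enode://${key}@${p.host}:${p.port}`;
  });
  return lines.join("\n") + "\n";
}
```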

diamond indexer

The diamond indexer is a service that indexes the POSDAO contracts into a Postgres DB.

git clone https://github.com/DMDcoin/honey-badger-testing.git

Performance Test Scripts

The test scripts are implemented using Node.js v10; install and run them as usual:

npm ci

The following tests are available:

latency1 (1 transaction every 1-10 seconds)
latency2 (1 transaction every 1-10 seconds, with a background base load of 10 tx/second)
throughput1 (~70 transactions per second)
throughput2 (~7 * 70 transactions per second, distributed across multiple nodes in the network)
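The load profiles can be sketched as two small functions, assuming a uniform random pause for the latency tests and an even split of the total rate for throughput2. Function names are hypothetical; the real runners and their parameters live in this repository's sources and ./config.

```typescript
// Hypothetical sketch of the load profiles listed above.

// latency1/latency2: pause a random 1-10 seconds before each single transaction.
function nextLatencyDelayMs(random: () => number = Math.random): number {
  // random() is in [0, 1), so the result lies in 1000..10000 ms inclusive
  return 1000 + Math.floor(random() * 9001);
}

// throughput2: spread a total rate (e.g. 7 * 70 tx/s) evenly over the nodes.
function splitRate(totalTxPerSecond: number, nodeCount: number): number[] {
  const base = Math.floor(totalTxPerSecond / nodeCount);
  const remainder = totalTxPerSecond % nodeCount;
  // the first `remainder` nodes send one extra transaction per second
  return Array.from({ length: nodeCount }, (_, i) => base + (i < remainder ? 1 : 0));
}

const perNode = splitRate(7 * 70, 7); // 70 tx/s for each of the 7 nodes
```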

The tests are described in more detail here: https://github.com/artis-eco/wiki/wiki/Honey-Badger-BFT-Hypothesis-Testing

It is possible to run all tests using the npm run runAllTests command.

The tests are configured in the ./config directory.

When starting from a new test chain, the test accounts need to be funded first: npm run feedAccounts

Test Results

The tests write test results into the output directory. This directory is not tracked by the Git repository. Test results that need to be analysed must be transferred manually to the jupyter/data directory.

Test Result Analysis

Jupyter notebooks are used to analyse the test results. Please refer to jupyter/README.md.

Testnet Setup Scripts

This repository contains scripts to automatically generate config files to set up a hbbft test network of arbitrary size.

SSH Setup

The remotenet system is based on named SSH hosts. Therefore every setup that can be reached through SSH is supported: network infrastructure on localhost, localhost within a (para)virtualized VM, remote VMs, real hardware, ...

The system expects the nodes to be numbered as follows:

hbbft1
hbbft2
...

Introduction

We are using Docker to quickly spin up and down a test network of any size.

One desired property of the setup is the ability to replace individual Docker nodes with locally running nodes for the purpose of interactive debugging.

We achieve this property by mapping the nodes' ports to the Docker bridge address and letting all nodes communicate through this bridge address. Locally running nodes can bind to that interface as well, allowing for a mix of Docker and local nodes.
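A minimal sketch of the port layout described above, using Docker's `-p <ip>:<hostPort>:<containerPort>` publish syntax. The bridge address 172.17.0.1 (the usual default docker0 address) and the base ports are assumptions, not values taken from this repository's setup scripts.

```typescript
// Hypothetical sketch: publish every node's ports on the Docker bridge address.
const BRIDGE_IP = "172.17.0.1"; // assumption: default docker0 address

function portMappings(nodeCount: number, baseP2pPort = 30300, baseRpcPort = 8540): string[][] {
  const mappings: string[][] = [];
  for (let i = 1; i <= nodeCount; i++) {
    const p2p = baseP2pPort + i;
    const rpc = baseRpcPort + i;
    // same port number inside and outside the container keeps the layout
    // easy to follow when a local node takes over one of the slots
    mappings.push([`${BRIDGE_IP}:${p2p}:${p2p}`, `${BRIDGE_IP}:${rpc}:${rpc}`]);
  }
  return mappings;
}

// Each entry can be passed to `docker run` as `-p <mapping>`.
portMappings(3).forEach((m, i) => console.log(`node${i + 1}: -p ${m.join(" -p ")}`));
```

To debug node k locally, its container is stopped and a local node is started with the same ports bound to the bridge address, so the other nodes keep reaching it at the same endpoint.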

Usage

Requirements:

The following repositories cloned at the same directory level as this repository
- diamond-node (git@github.com:dmdcoin/diamond-node.git)
- diamond-contracts-core (git@github.com:dmdcoin/diamond-contracts-core.git)
Python >= 3.6
Docker

To generate the configs for n nodes, cd into this repository and execute:

cd pumba
./setup_testnet.py n

Where "n" has to be replaced by a number >=1 denoting the number of validator node configs to generate.

The script also supports generating configs for NAT/extip setups. Simply add the external IP address as an argument to the script.

./setup_testnet.py n ext_ip

Where "ext_ip" has to be replaced by the external IP address to use.

Folder Structure

To be compatible with both local and Docker nodes we have to use an appropriate directory structure.

For the sake of simplicity we choose a single directory containing all configs and data to be mounted into a Docker volume.

Caveat: Filesystem performance inside a Docker volume may be significantly slower than inside the container. We may reconsider the approach of sharing the "data" folder through a Docker volume for that reason.

Block Number Tracking

Currently the first block of a test run has to be found manually in the CSV. We could fix this by memorizing the block number before we start, for example by writing it into a file.
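A minimal sketch of the proposed fix, assuming the start block has been recorded before the test and the results CSV has a header row with a column named `blockNumber`. The column name and `rowsFromBlock` are hypothetical; the real CSV layout may differ.

```typescript
// Hypothetical sketch: keep only CSV rows produced since the recorded start block.
function rowsFromBlock(csv: string, startBlock: number): string[] {
  const [header, ...rows] = csv.trim().split("\n");
  const col = header.split(",").indexOf("blockNumber"); // assumed column name
  if (col < 0) throw new Error("no blockNumber column");
  return rows.filter((row) => Number(row.split(",")[col]) >= startBlock);
}
```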

Managing Network

building diamond node fresh

Remove the existing installation and get a new one as defined in the repository:

npm run remotenet-git-delete-node
npm run remotenet-git-setup-build-from-source

Building the Node Software

npm run remotenet-git-pull-node-and-build

Or, if a lot of nodes have to be built, do it async:

npm run remotenet-binary-update-from-git-async

Tips for managing nodes

Example: stop a node and build the latest version from git

export NODE_TARGET="-s hbbft10"
npm run remotenet-stop -- $NODE_TARGET && npm run remotenet-git-pull-and-build -- $NODE_TARGET

honey-badger-testing's People

Contributors

surfingnerd dforsten

honey-badger-testing's Issues

diamond db: calculate rewards

... or should it be developed in the smart contracts?

branding of client software

openethereum will get a new name, such as diamond-node.
DMDcoin/openethereum-3.x#34

grafana: Sync status

Make a panel for sync_status.

Syncing nodes are contributing to the current epoch.
Is the tracked sync status sometimes wrong?

DMDcoin/diamond-node#27

r.register_gauge(
    "sync_status",
    "WaitingPeers(0), SnapshotManifest(1), SnapshotData(2), SnapshotWaiting(3), Blocks(4), Idle(5), Waiting(6), NewBlocks(7)",
    match self.eth_handler.sync.status().state {
        SyncState::WaitingPeers => 0,
        SyncState::SnapshotManifest => 1,
        SyncState::SnapshotData => 2,
        SyncState::SnapshotWaiting => 3,
        SyncState::Blocks => 4,
        SyncState::Idle => 5,
        SyncState::Waiting => 6,
        SyncState::NewBlocks => 7,
    },
);
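For the Grafana panel, the numeric gauge has to be translated back into state names. A minimal sketch of that lookup, taking the order from the gauge description above; `syncStateName` is a hypothetical helper. The same table could also be entered as value mappings in the Grafana panel itself.

```typescript
// Decodes the sync_status gauge value using the order from the gauge description.
const SYNC_STATES = [
  "WaitingPeers", "SnapshotManifest", "SnapshotData", "SnapshotWaiting",
  "Blocks", "Idle", "Waiting", "NewBlocks",
] as const;

function syncStateName(value: number): string {
  // anything outside 0..7 (or non-integer) is reported instead of guessed
  return Number.isInteger(value) && value >= 0 && value < SYNC_STATES.length
    ? SYNC_STATES[value]
    : `unknown(${value})`;
}
```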

contract size limitation

We should adapt the maximum contract size to what we really need.

chainspec adapting service for fork support

only localnet support is required for this kind of setup!

adapting real network

step 1: generate a regular network (example: 4 nodes + rpc)
step 2: use chainspec from remotenet (raw pull from github)
step 3: adapt the downloaded remotenet chainspec with a fork section
step 4: copy adapted localnet chainspec to all localnet nodes

The localnet now behaves like every other localnet.
Contract upgrades can be deployed, and the associated diamond-node version can be upgraded.

DMDcoin/diamond-node#98

testnetwork for testing fork support

Similar to the application on the real network.

Step 2 is different: instead of downloading the spec from GitHub, a spec for a local testnet has to be downloaded.

prerequisites

Port network creation to TypeScript and refactor the network creation script so it is able to perform individual steps.

support for multiple local testnets

The system is currently able to hold only one testnet, located in the directory testnet/nodes.
The nodes in this directory are used for several use cases:

regression and integration tests to verify that the system still works after changes.
storing local data about remotenets; this data is consumed by the npm run remotenet-??? commands

In this situation it is not possible to run a regression/integration test during a development cycle in which remotenet commands are also used often. This results in the requirement to "rename" the directory for the nodes.
The currently used scheme is:

nodes for the current version
nodes-current for a parked remotenet info pool
nodes-backup for some interesting scenario to analyse later

We need a foolproof solution where it is always possible to just start tests without renaming directories.
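One possible direction, sketched under assumptions: select the nodes directory by name instead of renaming it, so a test run and a parked remotenet can coexist. The `nodes-<name>` layout reuses the existing nodes-current / nodes-backup pattern; `nodesDir` and the idea of passing the name via an environment variable such as a hypothetical TESTNET_NAME are illustrative only.

```typescript
// Hypothetical sketch: resolve the nodes directory from a testnet name.
function nodesDir(name: string | undefined): string {
  // no name: keep today's default location
  if (name === undefined || name === "") return "testnet/nodes";
  // restrict names so they cannot escape the testnet directory
  if (!/^[A-Za-z0-9_-]+$/.test(name)) throw new Error(`invalid testnet name: ${name}`);
  return `testnet/nodes-${name}`;
}
```

An integration test run could pick a unique name per run (for example one derived from a timestamp), so it never touches the directory that the remotenet commands use.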

getCurrentTestnetState: fix or remove script

move typechain generation to postinstall

Don't check in generated typechain files.
Finally found a good solution for typechain-generated files:
using postinstall triggers in npm:

"postinstall":"typechain"

Network Spec creation after merge of contracts

prometheus: create scraping config for nodes

Create a script that generates a scrape config for Prometheus.
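A minimal sketch of such a generator, emitting a `scrape_configs` section with `job_name`, `scrape_interval` and `static_configs` (standard Prometheus configuration keys). It assumes the nodes expose metrics (OpenEthereum-style `--metrics`) on the default `/metrics` path; the job name, interval and the port in the example are assumptions to replace with the values the nodes actually run with.

```typescript
// Hypothetical sketch: render a Prometheus scrape config for all nodes.
function prometheusScrapeConfig(targets: string[], jobName = "diamond-nodes", interval = "5s"): string {
  const lines = [
    "scrape_configs:",
    `  - job_name: ${jobName}`,
    `    scrape_interval: ${interval}`,
    "    static_configs:",
    "      - targets:",
    // one "host:port" entry per node
    ...targets.map((t) => `          - "${t}"`),
  ];
  return lines.join("\n") + "\n";
}

// Example: two nodes, metrics port 3000 assumed.
console.log(prometheusScrapeConfig(["192.0.2.10:3000", "192.0.2.11:3000"]));
```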

Hbbft Spec generation: Support for setting initial Funds account

Currently the hbbft spec generator takes the template and the account mentioned in the spec, but it should take the first account provided by the mnemonic.
This is required by several automation tasks, e.g. automated testing.

migrate to openzeppelin upgrade contracts

test before alpha 2 network start

just in time fetching of new information

running as a service
fetches just in time data on the newest blocks
pushes data into the postgres db

restructure testfiles

Currently the tests, the helper functions and the quality-of-life features are all in the root of the src directory.
It is getting more and more messy.

The configs do not make it much easier, since some config options don't make sense for every runner.

Separate

runners
logic
tests
QOL Features
helpers

DMD-DB fill script: handle RPC instabilities.

Currently the DB fill service crashes if the RPC goes offline.
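One common way to survive RPC outages is to retry each RPC call with exponential backoff instead of letting the process crash. A minimal sketch, assuming every RPC request can be wrapped as an async function; `withRetry` and its default limits are hypothetical choices, not the service's current behavior.

```typescript
// Hypothetical sketch: retry an RPC call with exponential backoff.
async function withRetry<T>(
  call: () => Promise<T>,
  maxAttempts = 8,
  baseDelayMs = 500,
  maxDelayMs = 30000,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      // give up after the last attempt and let the caller decide
      if (attempt >= maxAttempts) throw err;
      // 500 ms, 1 s, 2 s, ... capped at maxDelayMs
      const delay = Math.min(maxDelayMs, baseDelayMs * 2 ** (attempt - 1));
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

Wrapping each block fetch this way lets the service pause during an outage and continue from the same block once the RPC is back.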

configuration for alpha3

integration tests: auto run all integration tests

We have put a lot of effort into developing the integration tests:
#3

These test the contracts together with the node software to ensure that everything works together.

Integration tests are known as testruns: they clean up old data, build a new blockchain, and run the specified tests against the new network.
Some of those tests require a large network (~25 nodes) and need a lot of CPU and RAM.

alpha2 economy test parameters

create a testnetwork with the parameters defined for the alpha 2 network.

Max Block Gas: 300 * MGas
60 Times Faster

postgres integration

currently we have some csv exports, but for further processing we require the data to be hosted on a DB

add num of validators to the DB of block data.

Hbbft Spec Generation: Support for lost coins.

requirement from:
DMDcoin/diamond-contracts-core#172

grafana renderer service

it is best for the UI to directly display rendered grafana panels
grafana offers a grafana renderer that allows that.

integrate grafana in honey-badger-testing

Useful for debugging local test networks.
Currently it is a lot of work to set up the Grafana instance, although the DB is already well integrated.

automated testing for phoenix protocol

stopping and starting nodes during the processing
develop as test-first so we can reproduce the current problem with the node software

automated test for automatic restaking feature

Testing for: DMDcoin/diamond-contracts-core#43

regression test: Staking and pick up as validators

Prerequisite:

Candidate node is running, and not registered as validator node.
Maximum number of Nodes not reached yet.

Action

A staker adds a pool for the candidate node.

Expected result

The node gets picked up as a validator at the next epoch switch.

Automate TestNet Deployment

For #3

Create a script that simplifies testnet deployment of the current configuration.
It can run the testnet node in a screen session.
It would be nice to have minimal sanity checks.

Alpha 2 DB Size

Make forecasts of when the DB size will become a problem at the current growth rate.

Performance Analysis Script: plateau finder

As a metric for performance improvements, we need a script that builds up load until it sees that the network cannot keep up anymore.
as a result we get:

transactions processed per second.
average block speed.

In addition, we get insight into how much the average block speed drops in favor of bigger blocks.

test design:

Create and fund x (x=100) test accounts. (Unfortunately we are limited here by web3.js in how many accounts it supports before crashing.)
Send n transactions for each x.
n becomes larger and larger.
Measure the performance of the blockchain and append it to a performance result CSV.
Extend by scaling up the network, so we have different graphs for each network size.
make long running tests so we see the impact of DB size on the Performance.
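The core loop of the plateau finder can be sketched independently of web3.js: raise the offered load step by step and stop once the processed throughput no longer grows noticeably. The `measure` callback stands for one load round (send n transactions per account, wait, read back tx/s and average block time) and is an assumption; the growth factor and the 5 % threshold are illustrative defaults.

```typescript
// Hypothetical sketch of the plateau finder loop.
interface Sample {
  offeredTxPerSecond: number;
  processedTxPerSecond: number;
  avgBlockTimeSeconds: number;
}

async function findPlateau(
  measure: (offeredTxPerSecond: number) => Promise<Sample>,
  startLoad = 10,
  growthFactor = 1.5,
  minGain = 0.05, // stop when throughput grows by less than 5 %
  maxSteps = 20,
): Promise<Sample[]> {
  const samples: Sample[] = []; // one row per step for the performance CSV
  let load = startLoad;
  for (let step = 0; step < maxSteps; step++) {
    const s = await measure(load);
    samples.push(s);
    const prev = samples[samples.length - 2];
    // plateau: more offered load no longer yields meaningfully more throughput
    if (prev && s.processedTxPerSecond < prev.processedTxPerSecond * (1 + minGain)) break;
    load = Math.ceil(load * growthFactor);
  }
  return samples;
}
```

The collected samples also show how the average block time changes as blocks get bigger, which is the second question raised above.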

regression test: Unavailability handling caused by Node shutdown

A node is treated as available.
Once it has been chosen to write its PARTs and ACKs,
and it fails to do so during the key generation time window,
it must be excluded from the set and flagged as "unavailable".

The validators who managed to write their key shares stay in the validator set.

If there are enough additional potential validators,
a replacement validator is chosen at random.
If the system is running out of potential validators,
the HBBFT validator set continues running with a smaller validator set.

In a new key generation time window - all validators get another chance for writing their keys.
(See also: StakingHbbftBase.currentKeyGenExtraTimeWindow)

Suggested Test:

Settings
60 seconds length of Phase 1
1 MOC/RPC + 3 Regular Nodes.
Actions:
Wait for Epoch Switch (so we have a full Phase 1 time window left)
Stake on all 3 Regular Nodes.
Shut down RegularNode1.
Key Generation should throw out Node1 and Mark it unavailable.
And continue with 2 HBBFT Consensus Nodes.
Boot up Node1 again.
Node1 should notify the system that it became available again.
Wait for Epoch Switch.
Node1-2-3 should all be a pending validator again.
The nodes should be able to write their Key.
The HBBFT System runs now with 3 consensus nodes.

validator without ACKS scraper

Develop a data scraper that analyses the blockchain and finds occurrences of validators that wrote their PART but never wrote their ACK.
We noticed this anomaly multiple times:
DMDcoin/openethereum-3.x#65

can use blockscout API including GraphQL
should work from the tail of the blockchain to the front (newer data is more interesting).
should evaluate transactions and parse the data field instead of reading the contract data.
detect cases
prove thesis: a case leads to a node dropout in the upcoming epoch.
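Once the transactions' data fields have been decoded, the detection step itself is simple set arithmetic per epoch. A minimal sketch, assuming decoded transactions carry the sender, the function name and the target epoch; the names `writePart` and `writeAcks` are assumed from the KeyGenHistory contract and should be checked against the deployed ABI.

```typescript
// Hypothetical sketch: find validators that wrote a PART but no ACKs, per epoch.
interface KeyGenTx {
  from: string; // validator address that sent the transaction
  method: string; // decoded function name (assumed: "writePart" / "writeAcks")
  epoch: number; // upcoming epoch the key generation is for
}

function partWithoutAcks(txs: KeyGenTx[]): Map<number, string[]> {
  const parts = new Map<number, Set<string>>();
  const acks = new Map<number, Set<string>>();
  for (const tx of txs) {
    const target = tx.method === "writePart" ? parts : tx.method === "writeAcks" ? acks : undefined;
    if (!target) continue; // ignore unrelated transactions
    const senders = target.get(tx.epoch) ?? new Set<string>();
    senders.add(tx.from.toLowerCase());
    target.set(tx.epoch, senders);
  }
  const result = new Map<number, string[]>();
  parts.forEach((senders, epoch) => {
    const acked = acks.get(epoch) ?? new Set<string>();
    const missing = Array.from(senders).filter((a) => !acked.has(a));
    if (missing.length > 0) result.set(epoch, missing);
  });
  return result;
}
```

Each reported epoch/address pair is a candidate case for checking the thesis against the validator set of the following epoch.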

Status Message missing

Sometimes we miss the status message (Block Import...) in the nodes.
it could be a lock problem.

early epoch end: automated test

DMDcoin/diamond-node#87

mimic claiming pot dilution

... and write a service that distributes the coin to the delta pot and the DAO.

automatic restaking implications

Since automatic restaking has changed, these functionalities of honey-badger-testing are obsolete and need to be removed / rewritten.
See contract changes:
DMDcoin/diamond-contracts-core#43

Create setup for hbbft Testnet1.

latest openethereum implementation for hbbft
latest hbbft-posdao-contracts deployed in the genesis block
analytic functions to verify basic functions of the hbbft-posdao-contracts
integrate blockscout into honey-badger-testing

extract core logic into a shared library

The core logic for contract interaction is useful for other projects as well.
diamond-ui (dmd-ui), replacement of posdao-ui already uses it.
We should separate it into a shared library.

automatic client restart upon unavailability.

We can watch the blockchain for client-unavailable events.
If we see such an event, we can auto-restart the node, possibly even clearing the cache.
This could be implemented in the watchdog service.
(A more sophisticated implementation would separate the watchdog and the actions.)

regression Test: Unstaking and removing as validators

prerequisite

Continuation of #12
Validator has staked on a node,
The node got picked as validator.

action

Requesting Unstaking and Unstaking

expected result

Node will not get picked in the next validator selection round.
After the locking period, the staking can get removed by the pool owner.

regression test: fall back to single validator node

We are suffering from a special case where the system cannot close the block.
Expected Problem: reward() call at closing the block throws an error.
Task: write a regression test that leads exactly to that problem.
More about the problem: DMDcoin/diamond-contracts-core#86

localnet vs remotenet as config param?

Currently the config does not tell whether a network is a localnet or a remotenet.
In particular, a localnet can be deployed to become a remotenet.
In this case the localnet should not be used anymore.

The project structure defines three sets of functions concerning networks:

localnet-xyz : interacts with local processes and local files - function can not be applied on a remotenet.
remotenet-xyz : interacts via SSH with remote nodes - functions can not be applied on a localnet
net-xyz: interacts with RPC only, functions can be applied to localnet and remotenets.

There is also testnet-xyz, which essentially means localnet, but usually refers to integration tests for a specific feature.
This testnet naming pattern should be reworked, either by specifying features and restrictions that only apply to integration tests, or by replacing it completely with localnet.

Tests that are currently written for net are unable to spin up and shut down the 'localnet', and are therefore currently not suitable for automatic integration tests. That issue should be addressed.

activate prometheus & grafana

This would be a huge improvement for our analytical abilities.

adapt spec.json generation

because of

lost coins testing
openzeppelin upgradability

filling reward contracts payout pots

Integration test:
Automated filling of the reward payout pots.
Prerequisite of claiming tests.

testnetwork: n out of m validator nodes network generation.

The current test scripts only support an "all in" testnet setup.
Meaning: it is possible to create a testnet with 7 nodes, but all 7 nodes start off as validator nodes from the beginning.
It is already prepared in the script here: 0d4a357
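The "n out of m" idea amounts to splitting the generated node list into initial validators and candidates. A minimal sketch of that split, assuming the generator already produces an ordered list of m node configs; `splitValidators` is a hypothetical helper, and the candidates would still need a staked pool to be picked up later (compare the "Staking and pick up as validators" regression test).

```typescript
// Hypothetical sketch: the first n of m generated nodes start as validators.
function splitValidators<T>(nodes: T[], n: number): { validators: T[]; candidates: T[] } {
  if (!Number.isInteger(n) || n < 1 || n > nodes.length) {
    throw new RangeError(`n must be between 1 and ${nodes.length}`);
  }
  return { validators: nodes.slice(0, n), candidates: nodes.slice(n) };
}
```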

A possible implementation option would be to implement this in the
open ethereum hbbft config generator.
https://github.com/SurfingNerd/openethereum-3.x/blob/surfingnerd/fix-hbbft-config-gen/crates/ethcore/src/engines/hbbft/hbbft_config_generator/src/main.rs

That implementation would make a lot of regression tests easier and blend in very well in the existing test suite
(even with pumba support)
#3

extend contract upgrade tool

During Upgrade, some contracts might require an initialization, in the case the contract did not have a reference to another contract before.

Question:
Is there a Safe Method to figure it out programmatic ?

A1:
IsInitialized() is always false, because the contract is missing its storage until the upgrade is complete.

Integration Test definition

Define a long running test with real diamond-nodes that test network stability and provide valuable statistics:

Actions execute

Trackings to be done:

storage impact on Blockchain with increasing validators count
block time impact on blockchain with increasing validators count
coineconomy: Staking returns
coineconomy: Pool sizes

Quality of life features:

automate testNet deployment. #4
n out of m test network creation #10

Automating the execution of CI Tests

#100

it might also be possible with a hack in openEthereum to fast spin the time by a huge factor, so we can run the tests with realistic values like 24 hours epoch length and 30 minutes transition time length.
Time would just fast forward - simulating 1 day in just a few seconds.

This testseries might take quite a huge amount of CPU/Memory and Time.
It might be nice to make the log-output available on a web interface.