celestiaorg / test-infra

Testing infrastructure for the Celestia Network

License: Apache License 2.0

Dockerfile 1.64% Go 94.25% Makefile 4.11%

celestia testing testground

test-infra's People

bidon15 dougefresh rysiman whatsa4 evan-forbes r3kt-eth vgonkivs rach-id dorucioclea musloda

test-infra's Issues

docs: create the process of transferring bugs to actionable items

This issue requires defining how we treat bugs.
The main criteria for the doc to focus on are:

How we treat bugs that are not covered by any existing test-cases
What to do if the bug is hard and expensive to reproduce (limitations section)
Response time to bugs by urgency levels
When to reject/close/obsolete the existing bug

manual: light client connecting to another light client

By network topology design, a light client should not connect to another light client.
We can try to break this by using mutual peers, or the trusted peer of the active light client, to sync the new one.

testground/app: refactor repetitive steps into reusable components

Introduction 🖖

As @Wondertan has correctly mentioned, having redundant code for Celestia App network creation is not a sustainable approach. This issue addresses that problem.

What needs to be done 🧐

Test-steps like InitChain, CreateKey, AddGenAccounts, GenTx and CollectGenTxs should be refactored into components in the appkit to make the overall process of bootstrapping an app's network better.
For instance, the RunApp func currently contains mostly the same code that InitValidator has for application-specific instances.

References 👀

PR that caused this issue creation: #23

investigation/testground: multiple barriers are causing some of the instances to freeze

When we want instances to reach a specific state and then continue the test logic, we use the state mechanism in the sync client.
However, after more than one use of state and barrier, instances sometimes freeze because they are waiting on each other for an event to happen.

We need to create a simple test-case that abuses the barrier multiple times to see whether this is reliably reproducible.

infra: schedule execution of tests

When to execute tests, and at which stages, are still open questions to be answered:

Which branches can wait longer for the majority of tests to run
Which stages are defined in the pipeline to start with

manual: dasing for big messages

Pre-Reqs:
In Terminal 1:

# 16000 random bytes are roughly 16 kilobytes (printed as 32000 hex characters)
msg=$(openssl rand -hex 16000)

pay for message testing

for 1mb message
for 16mb message
for 32mb message

Steps to do:

In Terminal 1: Have a dedicated, synced celestia-app/celestia full node and an account to create payForMessage TXs
In Terminal 2: Have a light client synced with the celestia full node
In Terminal 1: Execute the command with different message sizes (1 MB, 16 MB and 32 MB)

celestia-appd tx payment payForMessage 0102030405060708 $msg --from <acc_addr> --keyring-backend test --chain-id devnet-2
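
The larger messages can be generated the same way as the openssl one-liner in the pre-reqs. Below is a minimal Go sketch of that step, not part of the original issue. It assumes 1 MB means 10^6 bytes and that the message is passed hex-encoded, as in the command above. The function name randomHexMessage is hypothetical.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// randomHexMessage returns nBytes of random data encoded as a hex string,
// mirroring `openssl rand -hex <nBytes>`. The result has 2*nBytes characters.
func randomHexMessage(nBytes int) (string, error) {
	buf := make([]byte, nBytes)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return hex.EncodeToString(buf), nil
}

func main() {
	const mb = 1_000_000 // assumption: 1 MB = 10^6 bytes
	for _, size := range []int{1 * mb, 16 * mb, 32 * mb} {
		msg, err := randomHexMessage(size)
		if err != nil {
			panic(err)
		}
		// Only the lengths are printed; the message itself is too large to show.
		fmt.Printf("%d bytes -> %d hex characters\n", size, len(msg))
	}
}
```

Note that the hex string is twice as long as the raw message, so the size that matters for the test is the byte count, not the string length.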

testground/app: make chain-id and moniker as test-params

Currently, chain-id and monikers are defined as variables in the *.go files, which is not ideal.
Relocating the definitions of those two into the manifest.toml and composition.toml files, respectively, is preferable.

testground/app: app consistently produces blocks every minute

Description to be added @Bidon15

exp: Testground case study for e2e tests

Go through the docs and execute tests using testground to reach a clear decision on the pros and cons of using it for our e2e tests

docker: push the docker image build for celestia-app to org docker hub

As celestia-app is the first thing we test against, we should push the celestia-app docker image to the org's docker hub first

docs: create a kick-off test plan

We need a test plan that covers the basics of our repos.
The test plan should be based on, and linked to, the existing spec/ADR so that we do not lose track of what is being tested.
Creation of test cases should be done in a separate issue that will be linked to this one.
The pipeline for executing tests should be done in a separate issue that will be linked to this one.

infra: hive forking and tuning

We need to revisit what is needed in the repo, to define what:

can be used for our testing needs
should be removed in the present state of the repo
functionality is lacking to achieve our testing needs
dashboard we can use to track all the tests

testground/tests: implement TC-003

After finishing #60, we need to adapt the code-base to reflect

Syncing after some amount of blocks
Create more composition files reflecting data
Better documentation tracing
Measure the sync time to get a baseline figure for further comparisons with new implementation of the p2p stack

Ref:

testground/node: use WithListenAddr instead of config for node initialization

Ref: https://github.com/celestiaorg/test-infra/pull/27/files#r915810034

We need to get rid of the Config init that exists just for the listen address.
First, we need to add an option in the Celestia Node repo.

testground/tests: implement TC-002

We already have all the presets done in node_sync.go
The only things left to do are:

Create more composition files reflecting data
Adjust the bandwidths
Better documentation tracing
Turn the target block height into a runenv param to avoid hard-coding the end of the test

Ref:

testground/celestia-app: create basic validator steps creation

To finish the basic celestia-app chapter, we need the list below done, so that pre-generated keys/configs are not the only solution:

keys add creation
adding genesis account
sending genesis.json across N amount of instances
gentx per instance
sending gentxs to orchestrator
collect-gentx by the orchestrator

testground/tests: implement TC-001

We already have all the presets done in init_val.go
The only things left to do are:

Create more composition files reflecting data sets
Adjust the bandwidths
Better documentation tracing

Ref:

infra/metrics: Dashboard for monitoring of the cluster itself

Now that we are on the finishing line of #25, we need to show the data charts that the cluster is generating
https://docs.testground.ai/v/master/runner-library/cluster-k8s/monitoring#cluster-wide-resources-utilisation

This is important, as we need to analyse whether hardware resources will be a bottleneck in future big network tests

Ref: #32

hive: basic test for celestia-core + celestia-app

This test-case should test celestia-core in the following scenario:

Pre-reqs:

Spin up the sim env
Load config(genesis file, etc) for the celestia-core + app
Startup the core+app instances
Check the genesis file is consumed correctly

Steps:

Post the data via the cli of the app
Check data consumption

Motivation: we need to smoke-test this and get our hands dirty in order to complete the burning gas fee test-case (ref: #39)

infra: connect to Influx DB and populate data

After we finish some details for #25, we need to execute a dry-run with a basic test scenario to see what data is actually populated in Influx DB for further analysis

testground/app/node: prepare downgrade of dependencies

Our test-plans are based on tendermint v0.35 for both app/sdk, which are dependencies of node.
We also need to downgrade to stay aligned.
Ref: celestiaorg/celestia-node#951

testground/app: move home path creation to a params .toml file

Same as #35, there are moments when the test designer wants to use different paths for home directory, when testing celestia-app

We need to move it from the code-base to the test params section of the .toml file

docs: Test-case creation

Creating a successful test-case requires each of these criteria to be met:

A Test plan -> test suite should be linked to it
What part of the spec/ADR the test-case is testing
Steps should be human readable
If a bug is found with the test-case, the bug should be linked to the test-case to avoid further duplicates

testground/infra/app: collect metrics

We need to gather the data from the app-nodes to further analyze the test-run.
Tendermint has a built-in metrics collection module that requires three steps from us:

Open up the port in the dockerfile
Configure the settings option
Forward the metrics to testground's infra to collect the metrics

https://docs.tendermint.com/master/nodes/metrics.html

testground/doc: add explanation how to run test-plan in readme file

We need documentation that will explain to the user how to run our test-plans and where to find documentation to learn more about testground as well as how to setup infrastructure

[EPIC] Failure and Recovery cases for block with withheld data

Background

Original Message:

We should do network tests for cases like if a block producer withholds some of the block:
(1) will full nodes reconstruct it in case the block is recoverable, and how long will it take?
(2) after the block is reconstructed - what happens to light clients that have hanging DAS queries?
(3) if the erasure coding is invalid, will a bad encoding fraud proof be generated, and how long will it take?

Introduction

This epic covers the creation, implementation and reporting of the test-plan (or plans)

Docs/Test-Plan creation

TBA

Test-Plan implementation

#96
Big Network tests for Sq Sz 128/256
DASing from a Full Node that has reconstructed a block

Test-Plan Execution & Reporting

#126
Big Network

Notes

manual: measure footprint from fully synced devnet-2

Steps to do:

1. Install celestia-app and celestia-node. Install docs: https://docs.celestia.org/nodes/overview
2. Start/Sync the celestia-app non-validating instance with the chain
3. Start the Celestia Bridge node with a remote flag pointing to a synced celestia-app, using the genesis trusted hash (block=1)
4. Wait until the Celestia Bridge becomes fully synced
5. Repeat 2-3-4 for the Full Node
6. Initialize the Celestia Light Node with the genesis trusted hash and the trusted peer of the synced Celestia Full Node
7. Start the Celestia Light Client
8. Wait until the Celestia Light Client is fully synced
9. Using sudo du -sh <path>, measure the disk space of the following directories:

.celestia-app
.celestia-bridge
.celestia-full
.celestia-light

testground/node/app: create PFD steps

Introduction 🖖

In both celestia-app and celestia-node, we can submit a PFD. However, it is not straightforward in either of them

Celestia Application part 🔧

In celestia-app, we can already use the wrapcli approach to submit a plain PFD.
Still, we need to implement opening the grpc rpc endpoint from app to other celestia-node types such as light/full

Celestia Node part 🧱

Pre-Requisites 📦

Celestia-App needs a way to learn that the node is asking for its account to be funded
Celestia-Node should check the balance before starting PFD

What needs to be done 🧐

In celestia-node, we are currently using the RPC approach
https://docs.celestia.org/developers/node-tutorial#connect-to-a-public-core-endpoint

We might need to wrap this RPC part, as users usually use these APIs
https://docs.celestia.org/developers/node-tutorial#submit-a-pfd-transaction

infra: Increase cluster resources

According to our upcoming test-plan implementation #55, we need to prepare the cluster to accommodate:

100 Celestia Application Validators (4 cores/ 6Gb) => 400 cores / 600 Gb
100 Celestia Bridge Nodes (4 cores/ 6Gb) => 400 cores / 600 Gb
50 Celestia Full Nodes (4 cores / 4 Gb) => 200 cores / 200 Gb
1000 Celestia Light Nodes (2 cores / 3 Gb) => 2000 cores / 3000 Gb

The total for all this is 3000 cores / 4400 Gb
Taking into consideration sidecar/influx and the testground's daemon, we definitely need more power
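
The totals can be re-derived from the per-group figures in the list above: 400 + 400 + 200 + 2000 = 3000 cores and 600 + 600 + 200 + 3000 = 4400 Gb. A minimal Go sketch of that arithmetic follows, so the numbers are easy to recompute when the composition changes. The type and variable names are hypothetical, and overhead for sidecar/influx/daemon is not included.

```go
package main

import "fmt"

// nodeGroup describes one row of the sizing list: how many instances,
// and the cores / memory (in Gb) requested per instance.
type nodeGroup struct {
	name  string
	count int
	cores int
	memGb int
}

// plannedGroups copies the figures from the list above.
var plannedGroups = []nodeGroup{
	{name: "app validators", count: 100, cores: 4, memGb: 6},
	{name: "bridge nodes", count: 100, cores: 4, memGb: 6},
	{name: "full nodes", count: 50, cores: 4, memGb: 4},
	{name: "light nodes", count: 1000, cores: 2, memGb: 3},
}

// totals sums cores and memory over all groups.
func totals(groups []nodeGroup) (cores, memGb int) {
	for _, g := range groups {
		cores += g.count * g.cores
		memGb += g.count * g.memGb
	}
	return cores, memGb
}

func main() {
	for _, g := range plannedGroups {
		fmt.Printf("%-15s %5d cores %5d Gb\n", g.name, g.count*g.cores, g.count*g.memGb)
	}
	cores, memGb := totals(plannedGroups)
	fmt.Printf("total: %d cores / %d Gb\n", cores, memGb)
}
```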

docker: create celestia org account for dockerhub

The goal is to have a celestia organisation in docker hub, where we can store the docker images of all org repos

ci: tests execution env

We need to decide which CI to run the tests on, as well as at which stages to run them.
This requires analysis of existing tools like circle/travis/ga/etc.

testground/app: Configure any node to be a bootstrapper as a pre-requisite

Introduction 📜

In order to start the chain, our validators should find each other.
Currently, we add them as p2p.persistent-peers in config.toml using sync.Client from testground.

Ideally, we need to configure any of the existing nodes to be a bootstrapper for the others by editing the config.toml file

Bootstrap Mode 📡

What to do:

Create a new topic, where any validator can become a bootstrap-peer for the others by publishing its peerid to the topic
The others listen on the topic to receive this peerid (or a set of peerids)
We need a new WrapCLI func that changes p2p.bootstrap-peers for those nodes that want to find others
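
The three steps above can be sketched without testground at all. In this minimal Go sketch a plain Go channel stands in for the sync topic, and all names (peerAddr, collectBootstrapPeers) are hypothetical. It also assumes the value written to p2p.bootstrap-peers is a comma-separated list of id@host:port entries, which is the peer address format tendermint uses for persistent peers. The real implementation would publish and subscribe through the testground sync client instead.

```go
package main

import (
	"fmt"
	"strings"
)

// peerAddr is a hypothetical payload that a bootstrapper publishes to the topic.
type peerAddr struct {
	ID   string
	Host string
	Port int
}

// String renders the peer as id@host:port.
func (p peerAddr) String() string {
	return fmt.Sprintf("%s@%s:%d", p.ID, p.Host, p.Port)
}

// collectBootstrapPeers reads up to want announcements from the topic and
// returns them as a comma-separated value, skipping the node's own entry.
// It stops early if the topic is closed.
func collectBootstrapPeers(topic <-chan peerAddr, want int, selfID string) string {
	var addrs []string
	for i := 0; i < want; i++ {
		p, ok := <-topic
		if !ok {
			break
		}
		if p.ID == selfID {
			continue
		}
		addrs = append(addrs, p.String())
	}
	return strings.Join(addrs, ",")
}

func main() {
	// The buffered channel plays the role of the sync topic in this sketch.
	topic := make(chan peerAddr, 2)
	topic <- peerAddr{ID: "node-a", Host: "10.0.0.1", Port: 26656}
	topic <- peerAddr{ID: "node-b", Host: "10.0.0.2", Port: 26656}
	close(topic)

	// node-b collects everyone except itself; the result is what the new
	// WrapCLI func would write into p2p.bootstrap-peers.
	fmt.Println(collectBootstrapPeers(topic, 2, "node-b"))
}
```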

docs: Test-plan for full node(with no core node's setup) communication with other full nodes

As we write the first test-plan relying on ADR #002: Devnet Celestia Core <> Celestia Node Communication, it will be easier to read if we have 2 test plans rather than 1 bigger test plan.

The main focus of this test-plan is to cover cases where a CFN with no CCN should communicate with a CFN (embed/remote types) in order to receive data

testground: experiment with celestia-app/celestia-node

After finishing #5, we need to apply the knowledge gained to our app/node products

testground/app: move account params to a .toml file as test params

As a continuation of the app effort started in #34, it will be beneficial to set up token allocation as well as keyring backend settings as test parameters in the manifest.toml instead of in the code-base itself.

This will be really good for the #31, too

investigation/testground: mustpublish/mustsubscribe

We need to create a simple test-plan that abuses the must* commands in the sync client.
From time to time, we can neither publish nor receive information from events properly

testground/app: create a seed node

A seed node connects the peers to each other, so the peers are no longer required to add one another as persistent peers

In the future, this can help us minimize the number of inbound/outbound peers per validator when testing bandwidth vs big blocks

docker: rework plan's dockerimage

The initial docker image is not tied to an influx db.
This should be fixed, and we should also refactor some of the points

simple local test-net using docker compose

Suggestion

Modify the existing approach in tendermint to run a local testnet using docker-compose.

References:

Instructions:

https://docs.tendermint.com/v0.32/networks/docker-compose.html#build

It should be a low-hanging fruit to use this approach to spin up a celestia-node IMO. That way we create the genesis file once and put it directly into the docker containers before creating them (no need to scp stuff around etc).

[EPIC][Celestia-App]: Refactor existing features to reusable components

Introduction 🖖

This epic contains all the features that need to be refactored out as reusable components for future ease of test development.
As spinning up Celestia Application instances is a pre-requisite for every test-run, we need to make this setup process less redundant, as well as let the test designer configure those components as they wish via a .toml file

What needs to be done 🧐

hive: basic test for celestia-core

This test-case should test celestia-core in the following scenario:

Spin up the sim env
Load config(genesis file, etc) for the celestia-core
Startup the core instance
Check the genesis file is consumed correctly

testground/app/node: Fund accounts mechanism

Introduction 🖖

Celestia Node full/light need to have funds in order to start submitting PFDs.
Creating a sync topic for that specific case will solve the task

What we have as a reference point 👀

We already have a sync topic for sharing accounts between app's for funding genesis accounts (ref: #24).
To be exact, these parts:

What needs to be done 🧐

To get this done we need:

A new topic for funding accounts
New WrapCLI command for app
Node publishes to the topic for funding
App subscribes to all events and funds celes accounts accordingly

Ref: #31

celestia-app: smoke suite

We need to create a smoke suite for celestia-app without hive, due to issues with #3.
Implementation ideas that could be used for this (sorted by ease and knowledge):

vanilla docker compose
dockertest
testground

Choose a cloud provider
Install all dependencies required to run a testground's test-plan
Setting up a monitoring dashboard for runs
Including test-runs into GA (e.g. cadence)

Materials to read:

implement test-plan #001 for celestia-node

The goal of this issue is to have tests running for celestia-node based on test-plan #1

More info here: docs/test_plans/tp-001-devnet-full-node.md

testground: bandwidth params as manifest.toml test values

Currently, bandwidth params are hardcoded in the go files, which is inconvenient because we want to expand experiments to cases where the end user has 100/256/320/512/1k mbps bandwidth
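
Once the values move into manifest.toml they arrive as strings and need converting. Below is a minimal Go sketch of such a conversion, not part of the original issue. It assumes the values are given in megabits per second, that "1k" means 1000 mbps, and that the target unit is bits per second. The function name parseMbps is hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMbps converts a test-param value such as "256" or "1k" (megabits per
// second) into bits per second. A trailing "k" multiplies the value by 1000.
func parseMbps(value string) (uint64, error) {
	s := strings.TrimSpace(strings.ToLower(value))
	mult := uint64(1)
	if strings.HasSuffix(s, "k") {
		mult = 1000
		s = strings.TrimSuffix(s, "k")
	}
	n, err := strconv.ParseUint(s, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid bandwidth %q: %w", value, err)
	}
	return n * mult * 1_000_000, nil
}

func main() {
	// The values listed in the issue text.
	for _, v := range strings.Split("100/256/320/512/1k", "/") {
		bps, err := parseMbps(v)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%s mbps = %d bits/s\n", v, bps)
	}
}
```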
