
Network Performance Framework

Run performance tests on network and system software by executing snippets of bash scripts on a cluster, following a simple definition file. For instance, the following test file measures iPerf2 performance:

%info
IPerf 2 Throughput Experiment

%config
n_runs=5
var_names={PARALLEL:Number of parallel connections,WINDOW:Window size (kB),THROUGHPUT:Throughput}

%variables
PARALLEL=[1-8]
WINDOW={16,512}
TIME=2

%script@server
iperf -s

%script@client delay=1
//Launch the program, copy the output to a log
iperf -c ${server:0:ip} -w ${WINDOW}k -t $TIME -P $PARALLEL 2>&1 | tee iperf.log
//Parse the log to find the throughput
result=$(cat iperf.log | grep -ioE "[0-9.]+ [kmg]bits" | tail -n 1)
//Give the throughput to NPF through stdout
echo "RESULT-THROUGHPUT $result"

When launching NPF with:

npf-run --test tests/tcp/01-iperf.npf \
        --cluster client=machine01.cluster.com server=machine02.cluster.com

NPF will automatically produce the graph below. Configuration options let you easily change the graph type and many other settings; check the wiki to see different graphs displaying the same data.

[sample graph]

Test files let you define a matrix of parameters to try many combinations of variables (see the documentation for a description of the possible definitions, such as values, ranges, etc.) for each test, and report performance results and their evolution for each combination of variables.
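As an illustration of that matrix expansion, the variables of the iPerf example above produce the following Cartesian product (a conceptual sketch in plain Python, not NPF's actual implementation):

```python
from itertools import product

# The variable definitions from the example above: a range and a set.
variables = {
    "PARALLEL": list(range(1, 9)),  # PARALLEL=[1-8]
    "WINDOW": [16, 512],            # WINDOW={16,512}
}

# NPF tests every combination; with n_runs=5, each one is run 5 times.
combinations = [dict(zip(variables, values))
                for values in product(*variables.values())]
print(len(combinations))  # 8 * 2 = 16 combinations
```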

Finally, a graph is built and statistical results may be computed for each test, showing the difference between variable values, between different software versions, or the evolution of performance across commits.

Test files are simple to write and easy to share; we therefore encourage users to publish their ".npf" scripts alongside their code, so that other users can reproduce their results and graphs.

NPF supports running a given test across a cluster, allowing you to try your tests in many different configurations very quickly and on serious hardware.

Documentation

The documentation is available on Read the Docs!

Quick Installation

NPF is built with Python 3 and published on PyPI, so it can be installed with pip:

pip3 install --user npf

At run time, NPF uses SSH and can benefit from sudo and NFS; see the run-time dependencies in the documentation for more information.

With docker

We provide a Dockerfile to use npf.

docker build --tag npf .
docker run -it npf npf-compare ...

Big picture

Your .npf test file is composed of a series of sections, as in the example given above. The sections describe the scripts to run, where to run them, which variables should be tested and over which ranges, configuration parameters such as timeouts or graph colors, etc. Each section is described in more detail in the "writing test script" documentation.

When launching NPF, you also give the names of one or more repositories: files located in the repo folder that describe software to download, install, and compile so that everything is in place when your experiment is launched. They follow the format described in repo/README.md. Repositories can also be bypassed by using the local fake repository.

Your test script also defines a few script roles, such as client or server in the example above. When you actually launch your experiment, you must specify which machine (physical or virtual) takes each role. For simple cases, passing the address of a machine with --cluster role=machine is enough. When you need to define parameters such as IPs and MAC addresses, you can write a cluster file describing each machine in detail. See the cluster documentation for more details.
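For illustration, a cluster file for the client role might look like the sketch below (only the 0:ifname and 0:ip keys appear elsewhere in this document; the file name cluster/client.node and the other keys are assumptions, so refer to the cluster documentation for the authoritative format):

```
addr=machine01.cluster.com
0:ifname=eth0
0:ip=10.0.0.1
0:mac=aa:bb:cc:dd:ee:01
```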

Where to continue from here?

Have you read the writing tests documentation? Then take inspiration from the test script files in tests/, and write your own!

How to distribute your test scripts, modules and repo files?

We welcome merge requests for generic additions! But you can also keep your files in your own experimentation folder: NPF always looks for a file first in "./repo" for repo files, "./modules" for modules, and "./cluster" for machine definitions.

Contributors

aliireza, cffs, dependabot[bot], dpcodebiz, hamidgh09, louisna, massimogirondi, panoleramix, simcornelis, supercoolin, tbarbette


Issues

Generate a Jupyter Notebook that produces graphs

Fiddling with NPF's graph configuration could be avoided if the result of an NPF execution were a Jupyter Notebook. The user could then directly tweak the resulting graphs or further explore the collected data.

The objective is to generate a notebook that imports the data and generates the same graph as NPF.
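A sketch of the kind of cell such a generated notebook could contain, assuming the collected data is exported with one column per variable and per result (the column names below are taken from the iPerf example earlier in this document; the values are made up):

```python
import pandas as pd

# Stand-in for NPF's collected data: one row per run.
df = pd.DataFrame({
    "PARALLEL":   [1, 1, 2, 2, 1, 1, 2, 2],
    "WINDOW":     [16, 16, 16, 16, 512, 512, 512, 512],
    "THROUGHPUT": [940, 950, 1800, 1750, 960, 955, 1850, 1820],
})

# Aggregate runs into the table the default graph is drawn from;
# the notebook would then tweak and plot it, e.g. with pivot.plot().
pivot = df.pivot_table(index="PARALLEL", columns="WINDOW",
                       values="THROUGHPUT", aggfunc="mean")
print(pivot)
```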

When not using NFS, automatically create remote path

To flatten the learning curve, it would be great to be able to use NPF without NFS without having to handle "path=" and folder creation on the remote machines.

For now, the remote path defaults to the local path, so the user has to create a matching path on each remote machine. Ideally, when no remote path is given, NPF would create an "npf" folder in the remote home directory, or something similar.

Escape saved files

As experienced by @aliireza, if certain characters (at least any of "{}:=") are used in variable names or values, parsing of the results file will fail.

Initially I chose a custom format to keep the results file readable, but the format has since been extended and the file is not readable anyway, so pickling should be fine.
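Pickling would indeed sidestep the escaping problem entirely, since keys and values are stored verbatim; a minimal sketch (the shape of the results dict here is an assumption):

```python
import pickle

# Variable names and values may contain "{}:=" or anything else;
# pickle stores them verbatim, so no escaping is needed.
results = {"PARALLEL=1,WINDOW={16:512}": [940.0, 950.0]}

blob = pickle.dumps(results)
restored = pickle.loads(blob)
assert restored == results  # round-trips despite the special characters
```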

Better output format

I think it would be better to produce a Pandas/Dask data frame rather than CSV files, as data frames are easier to post-process.

It also seems the CSV output cannot represent multiple variables in some cases; for instance, the following CSVs have one variable value embedded in the name of the file.

[screenshot of the generated CSV files]

Cluster file and special characters

Hello,

I have some interfaces that have dashes in their names. For example sn-p1, so in the cluster file we have:
0:ifname=sn-p1

But then in npf the variable pc:0:ifname returns only the part before the dash: sn.
I have tried using quotes (" or ') and escaping (\), but they do not work.
Is this the intended behavior? Is there some other escape character?

Crash in the ZLT experimental design

While performing a zero-loss throughput search, processing crashes when there are no values left to try, either because the minimum value still causes loss, or because the target value computed while converging on an acceptable value turns out to be outside the range provided by the user.

As an example, take a rate defined as RATE=[1-100#1], an experimental design defined with --exp-design "zlt(RATE,RX-GOODPUT-GBPS-PKTGEN)", and the following results for a specific test case:

NDESC:256,NTHREADS:4,RATE:100,SIZE:1024,TABLE:0={RX-RATE-PPS:6970137.0,6966369.0,7057956.0},{RX-RATE-MBPS:58214.1460744,58182.6912224,58947.6194112},{LOSE-RATE:0.40676482083724,0.40717707911177,0.3993470714069},{AVG-LAT:0.0006643464506172799,0.0006664189814814801,0.0006661439043209901},{RX-GOODPUT-GBPS-PKTGEN:59.0,59.0,60.0}
NDESC:256,NTHREADS:4,RATE:59,SIZE:1024,TABLE:0={RX-RATE-PPS:6836607.0,6863517.0,6870717.0},{RX-RATE-MBPS:57098.6279176,57323.3810496,57383.4996384},{LOSE-RATE:0.010669608963823,0.0069656616030801,0.0058126685257428},{AVG-LAT:0.00011578858024691,0.00014424035493827,0.00012892361111111},{RX-GOODPUT-GBPS-PKTGEN:59.0,58.0,58.0}
NDESC:256,NTHREADS:4,RATE:57,SIZE:1024,TABLE:0={RX-RATE-PPS:6615077.0,6455461.0,6664933.0},{RX-RATE-MBPS:55248.3990632,53915.3320736,55664.7847008},{LOSE-RATE:0.0085888474008111,0.032754681330904,0.0014417885279341},{AVG-LAT:8.163117283950599e-05,0.00015808757716049,5.8790509259258994e-05},{RX-GOODPUT-GBPS-PKTGEN:57.0,55.0,57.0}
NDESC:256,NTHREADS:4,RATE:55,SIZE:1024,TABLE:0={RX-RATE-MBPS:7.76e-05,52443.03136,53690.6564448},{LOSE-RATE:0.999999984462,0.024391036299443,0.0014013166576273},{AVG-LAT:0.0,0.00016096643518518998,5.4939814814815e-05},{RX-GOODPUT-GBPS-PKTGEN:0.0,53.0,55.0},{RX-RATE-PPS:0,6279182.0,6428567.0}
NDESC:256,NTHREADS:4,RATE:16,SIZE:1024,TABLE:0={RX-RATE-PPS:1867116.0,0,1865846.0},{RX-RATE-MBPS:15593.3998472,0,15582.8079272},{LOSE-RATE:0.0012343911120845,1.0,0.0023091008216815},{AVG-LAT:4.462808641975299e-05,0.0,4.6456404320988e-05},{RX-GOODPUT-GBPS-PKTGEN:15.0,0.0,15.0}
NDESC:256,NTHREADS:4,RATE:3,SIZE:1024,TABLE:0={RX-RATE-PPS:350057.0,350135.0,350179.0},{RX-RATE-MBPS:2922.960672,2923.612128,2923.9650536},{LOSE-RATE:0.0015578834120213,0.0013718684820204,0.001360875180768},{AVG-LAT:0.0025012793209877,4.5064429012345994e-05,4.6386574074073996e-05},{RX-GOODPUT-GBPS-PKTGEN:2.0,2.0,2.0}

NPF crashes with the following traceback:

Traceback (most recent call last):
  File "(removed path to npf)/npf/npf-compare.py", line 72, in <module>
    main()
  File "(removed path to npf)/npf/npf-compare.py", line 60, in main
    series, time_series = comparator.run(test_name=args.test_files,
  File "(removed path to npf)/npf/npf/test_driver.py", line 30, in run
    build, data_dataset, time_dataset = regressor.regress_all_tests(
  File "(removed path to npf)/npf/npf/regression.py", line 198, in regress_all_tests
    all_results,time_results, init_done = test.execute_all(
  File "(removed path to npf)/npf/npf/test.py", line 1198, in execute_all
    for root_variables in all_variables:
  File "(removed path to npf)/npf/npf/expdesign/zltexp.py", line 102, in __next__
    next_val = max(filter(lambda x : x < target,left_to_try))
ValueError: max() arg is an empty sequence

I bet this is because RATE=3 leads to a GOODPUT of 2, i.e. a drop rate of 1, which leads to searching for a rate below 2-1=1; no such rate exists, as 1 is the lowest possible rate.
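One defensive fix, sketched here against the traceback above (this is not the actual NPF code), is to give max() a default value and treat an exhausted search space as the end of the iteration rather than letting max() raise:

```python
def next_rate(left_to_try, target):
    """Highest remaining rate strictly below target, or None when the
    search space is exhausted (the case that crashed above)."""
    return max((x for x in left_to_try if x < target), default=None)

assert next_rate([1, 2, 3], 2) == 1
# Previously a ValueError from max() on an empty sequence:
assert next_rate([1, 2, 3], 1) is None
```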

Go through an intermediate CSV before doing graphs

  • People get confused about the cache, the graphs, and the CSVs.
  • When you create a figure, then move on to other experiments and add new variables, you can't rebuild the first figure without removing the new variables, because the cache doesn't know which variables were added later. E.g., if you make a graph about TCP, then decide to try CONGESTION={vegas,cubic,bbr}, you'll have to re-run the old tests, because before that point the congestion control was whatever the system default happened to be.
    In a paper rush, it can be stressful to have to re-run tests or deactivate new variables to rebuild the graph (and dangerous: imagine BBR was set system-wide, then you're actually not in the same conditions).

So the idea would be to keep the cache as hidden as possible, and to always export a CSV that is used to create the graphs. The npf commands would continue to build graphs automatically, but a new npf-graph command would allow rebuilding the very same graph from the CSV.

The remaining question is what the appropriate CSV format would be, knowing that we have multiple output variables, multiple runs per parameter combination, and multiple series when using npf-compare.

Imagine we compare netperf and iperf, with one variable ZEROCOPY that can take the values 0 and 1, two output results THROUGHPUT and LATENCY, and 2 runs:

series,run_number,ZEROCOPY,THROUGHPUT,LATENCY
iperf,1,0,...
iperf,1,1,...
iperf,2,0,...
iperf,2,1,...
netperf,1,0,...
netperf,1,1,...
netperf,2,0,...
netperf,2,1,...
The remaining problem is that some outputs (results) can have multiple values within the same run. We could use another (slightly non-standard) separator to pack multiple results into a single column, e.g. the "+" sign (a ";" might be misinterpreted by CSV parsers).
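For what it is worth, a "+"-packed column stays easy to consume; a sketch with pandas, using made-up values (this assumes the proposed format, which is not implemented):

```python
from io import StringIO
import pandas as pd

# Hypothetical CSV where "+" packs multiple result values into one cell.
csv = """series,run_number,ZEROCOPY,THROUGHPUT
iperf,1,0,9.1+9.3
iperf,1,1,9.8+9.7
"""

df = pd.read_csv(StringIO(csv))
# Unpack: split on "+" and explode to one row per individual value.
df["THROUGHPUT"] = df["THROUGHPUT"].str.split("+").apply(
    lambda vals: [float(v) for v in vals])
long = df.explode("THROUGHPUT").reset_index(drop=True)
print(len(long))  # 4 rows, two per original line
```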

Any input on this?

Use a certain amount of clients programmatically

For now, one can run a script (a role) on multiple servers by repeating it on the argument list (e.g. --cluster client=server20 client=server21 ...). It would be nice to have a parameter that selects a given number of those clients programmatically.

Evaluation of dynamic variables during the initialization phase

If one defines something like:

%variables
CPU=[1-8]

%late_variables
THREADS=EXPAND( $(( $CPU + 1 )) )

Then, during initialization (creation of files, or %init sections), an error appears because at that point the evaluation of late_variables has no value for CPU yet; CPU only receives a value for each individual run.

The solution is either a dependency graph (ouch), or perhaps a parameter on late_variables specifying that it should not be evaluated during the initialization phase. The latter is probably easier.

Missing config in example scripts and documentation

Hello, I was reading the docs and I couldn't run any of the provided simple scripts. I then dug into the repository and found that the following test could not run either:
tests/examples/tests-readme.npf
It gives the following error, the same as for the scripts in the documentation:
This npf script has no default repository

I found that tests/examples/math.npf has a particular line that actually solves the above error:

%config
default_repo=local //No program under test

I would propose including this line in the broken tests and adding a paragraph to the documentation about the local repo.

Thank you.

(I know it is a minor thing, but as someone coming into first contact with your (really interesting) tool, I find the ability to run the simple scripts and build up knowledge very useful.)

Finish package dependencies

For instance, snort needs libdumbnet-dev to build (or -devel on Red Hat derivatives). The "package" method is provisioned but not implemented yet; it would be nice to finish it so that system-level dependencies are installed automatically.

Support for CDFs graph

Following #13, native support for CDFs would be nice.

The idea is that for a CDF, all results of each run would be plotted as a line.

Either a single test produces many RESULT-XXX lines on stdout, giving us many points from which to draw a CDF, with the percentage of points on Y and the value range on X.

Or, one has "noise" variables and gathers all results across the noise into a single variable using var_aggregate={VAR1+VAR2+VAR3:all}.

Maybe I should add a prefix to automatically "hide" such variables, i.e. one would write a variable such as @LOSS=[0-10] to express that we don't really care about the result for a particular LOSS rate, but want to re-run tests for many LOSS values and observe their impact on the variance (through a CDF, error bars, or a boxplot).
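For reference, the CDF itself is cheap to derive once all the points are collected; a plain-Python sketch of the computation such support would perform:

```python
def empirical_cdf(samples):
    """Return (value, fraction_of_points <= value) pairs for plotting a CDF."""
    xs = sorted(samples)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]

# Many RESULT-XXX points from one run, e.g. per-packet latencies.
points = [4.0, 1.0, 3.0, 2.0]
cdf = empirical_cdf(points)
print(cdf)  # [(1.0, 0.25), (2.0, 0.5), (3.0, 0.75), (4.0, 1.0)]
```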

Remote compilation

When destination machines have different architectures, we need to compile remotely and use different software builds on each remote machine.

The cluster files already have an "arch" field, but it does not do anything yet.

Exception when plotting after experiment

I'm currently trying to run this command:

python3 /usr/local/bin/npf-run.py local --test ./script.npf --graph-filename ./results/graph.pdf --variables LIMIT_TIME=14 --cluster client=$CLIENT server=$SERVER --graph-size 12 10 --single-output ./results/out.csv --cluster-autosave --result-path ./results/

With the following variables in script.npf:

%variables
CPU=[1-1]
WL=1000
FREQ={1000,2000,3000,3900}
TIME=10

GEN_THREADS=8
GEN_BURST=8
GEN_RX_THREADS=8
GEN_FLOWSIZE=32

SLEEP_MODE={sleep_mode1,sleep_mode2,sleep_mode3}
SLEEP_DELTA={1,2,4,8,16,32}
GEN_RATE={4000000}
BURST_SIZE={32}
GEN_LENGTH=64
MINFREQ=1000
LIMIT=1000000000

FASTCLICK_PATH=/root/fastclick_sleep_modes/

%config
n_runs=5
results_expect={WATT,RAM,THROUGHPUT,PPS}
graph_filter_by={WATT:DROPPEDPC>0.01,MAXWATT:DROPPEDPC>0.01,LAT99:DROPPEDPC>0.01}
accept_zero={BEGIN_POLL,BEGIN_C1,BEGIN_C1E,BEGIN_C6,END_POLL,END_C1,END_C1E,END_C6}
...
[Experiment descriptions]
...

%import@client fastclick-play-single-mt

While the (in my case very long) testing completes without any issue, npf crashes afterwards, during what seems to be the plotting phase. This is the Python stack trace that appears:

ERROR: When trying to export serie local:
Traceback (most recent call last):
  File "/usr/local/bin/npf-run.py", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/npf_run.py", line 262, in main
    grapher.graph(series=[(test, build, all_results)] + g_series,
  File "/usr/local/lib/python3.10/dist-packages/npf/grapher.py", line 706, in graph
    raise(e)
  File "/usr/local/lib/python3.10/dist-packages/npf/grapher.py", line 703, in graph
    all_results_df = pd.concat([all_results_df,x_df],ignore_index = True, axis=0)
  File "/usr/local/lib/python3.10/dist-packages/pandas/core/reshape/concat.py", line 393, in concat
    return op.get_result()
  File "/usr/local/lib/python3.10/dist-packages/pandas/core/reshape/concat.py", line 678, in get_result
    indexers[ax] = obj_labels.get_indexer(new_labels)
  File "/usr/local/lib/python3.10/dist-packages/pandas/core/indexes/base.py", line 3882, in get_indexer
    raise InvalidIndexError(self._requires_unique_msg)
pandas.errors.InvalidIndexError: Reindexing only valid with uniquely valued Index objects

Is there a solution to that problem? πŸ˜ƒ
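For what it's worth, that InvalidIndexError is typically raised by pd.concat when one of the frames carries duplicate labels; a minimal reproduction and workaround (the data here is illustrative only, and says nothing about where NPF introduces the duplicate):

```python
import pandas as pd

# A frame with a duplicated column label next to a frame without it.
a = pd.DataFrame([[1, 2]], columns=["X", "X"])
b = pd.DataFrame([[3]], columns=["X"])

try:
    pd.concat([a, b], ignore_index=True, axis=0)
    failed = False
except pd.errors.InvalidIndexError:
    failed = True  # "Reindexing only valid with uniquely valued Index objects"

# Workaround: drop duplicate labels before concatenating.
a = a.loc[:, ~a.columns.duplicated()]
merged = pd.concat([a, b], ignore_index=True, axis=0)
print(merged["X"].tolist())  # [1, 3]
```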

%cleanup

There's %exit per script, but we'd need a global cleanup that runs afterwards, like %init.

ModuleNotFoundError: No module named 'npf.types.web'

Hi! I installed NPF directly from pip, but when I try to run it, it gives the following error:

$ npf-run
Traceback (most recent call last):
  File "/home/user/.local/bin/npf-run", line 5, in <module>
    from npf_run import main
  File "/home/user/.local/lib/python3.8/site-packages/npf_run.py", line 11, in <module>
    from npf.regression import *
  File "/home/user/.local/lib/python3.8/site-packages/npf/regression.py", line 3, in <module>
    from npf.grapher import *
  File "/home/user/.local/lib/python3.8/site-packages/npf/grapher.py", line 9, in <module>
    from npf.types.web.web import prepare_web_export
ModuleNotFoundError: No module named 'npf.types.web'

I've already exported ~/.local/bin to PATH so that npf-run can be run directly, but that does not seem to be enough.

Here is my NPF version:

$ pip list | grep npf
npf                    1.0.50              
npf-web-extension      0.6.4

And here is what pip says when installing:

$ pip install --user npf
Requirement already satisfied: npf in /home/user/.local/lib/python3.8/site-packages (1.0.50)
Requirement already satisfied: numpy in /home/user/.local/lib/python3.8/site-packages (from npf) (1.24.4)
Requirement already satisfied: scipy in /home/user/.local/lib/python3.8/site-packages (from npf) (1.10.1)
Requirement already satisfied: importlib-metadata in /home/user/.local/lib/python3.8/site-packages (from npf) (6.8.0)
Requirement already satisfied: npf-web-extension>=0.6.4 in /home/user/.local/lib/python3.8/site-packages (from npf) (0.6.4)
Requirement already satisfied: scikit-learn in /home/user/.local/lib/python3.8/site-packages (from npf) (1.3.0)
Requirement already satisfied: asteval in /home/user/.local/lib/python3.8/site-packages (from npf) (0.9.31)
Requirement already satisfied: typing in /home/user/.local/lib/python3.8/site-packages (from npf) (3.7.4.3)
Requirement already satisfied: packaging in /home/user/.local/lib/python3.8/site-packages (from npf) (23.1)
Requirement already satisfied: matplotlib in /home/user/.local/lib/python3.8/site-packages (from npf) (3.7.2)
Requirement already satisfied: pandas in /home/user/.local/lib/python3.8/site-packages (from npf) (2.0.3)
Requirement already satisfied: pyasn1 in /home/user/.local/lib/python3.8/site-packages (from npf) (0.5.0)
Requirement already satisfied: pydotplus in /home/user/.local/lib/python3.8/site-packages (from npf) (2.0.2)
Requirement already satisfied: require-python-3 in /home/user/.local/lib/python3.8/site-packages (from npf) (1)
Requirement already satisfied: webcolors in /home/user/.local/lib/python3.8/site-packages (from npf) (1.13)
Requirement already satisfied: paramiko in /usr/lib/python3/dist-packages (from npf) (2.6.0)
Requirement already satisfied: gitpython in /home/user/.local/lib/python3.8/site-packages (from npf) (3.1.32)
Requirement already satisfied: colorama in /usr/lib/python3/dist-packages (from npf) (0.4.3)
Requirement already satisfied: pygtrie in /home/user/.local/lib/python3.8/site-packages (from npf) (2.5.0)
Requirement already satisfied: natsort in /home/user/.local/lib/python3.8/site-packages (from npf) (8.4.0)
Requirement already satisfied: regex in /home/user/.local/lib/python3.8/site-packages (from npf) (2023.6.3)
Requirement already satisfied: cryptography==41.0.0 in /home/user/.local/lib/python3.8/site-packages (from npf) (41.0.0)
Requirement already satisfied: gitdb in /home/user/.local/lib/python3.8/site-packages (from npf) (4.0.10)
Requirement already satisfied: ordered-set; python_version >= "3.7.0" in /home/user/.local/lib/python3.8/site-packages (from npf) (4.1.0)
Requirement already satisfied: zipp>=0.5 in /home/user/.local/lib/python3.8/site-packages (from importlib-metadata->npf) (3.16.2)
Requirement already satisfied: threadpoolctl>=2.0.0 in /home/user/.local/lib/python3.8/site-packages (from scikit-learn->npf) (3.2.0)
Requirement already satisfied: joblib>=1.1.1 in /home/user/.local/lib/python3.8/site-packages (from scikit-learn->npf) (1.3.1)
Requirement already satisfied: cycler>=0.10 in /home/user/.local/lib/python3.8/site-packages (from matplotlib->npf) (0.11.0)
Requirement already satisfied: contourpy>=1.0.1 in /home/user/.local/lib/python3.8/site-packages (from matplotlib->npf) (1.1.0)
Requirement already satisfied: fonttools>=4.22.0 in /home/user/.local/lib/python3.8/site-packages (from matplotlib->npf) (4.41.1)
Requirement already satisfied: python-dateutil>=2.7 in /home/user/.local/lib/python3.8/site-packages (from matplotlib->npf) (2.8.2)
Requirement already satisfied: pyparsing<3.1,>=2.3.1 in /home/user/.local/lib/python3.8/site-packages (from matplotlib->npf) (3.0.9)
Requirement already satisfied: pillow>=6.2.0 in /usr/lib/python3/dist-packages (from matplotlib->npf) (7.0.0)
Requirement already satisfied: importlib-resources>=3.2.0; python_version < "3.10" in /home/user/.local/lib/python3.8/site-packages (from matplotlib->npf) (6.0.0)
Requirement already satisfied: kiwisolver>=1.0.1 in /home/user/.local/lib/python3.8/site-packages (from matplotlib->npf) (1.4.4)
Requirement already satisfied: pytz>=2020.1 in /home/user/.local/lib/python3.8/site-packages (from pandas->npf) (2023.3)
Requirement already satisfied: tzdata>=2022.1 in /home/user/.local/lib/python3.8/site-packages (from pandas->npf) (2023.3)
Requirement already satisfied: cffi>=1.12 in /home/user/.local/lib/python3.8/site-packages (from cryptography==41.0.0->npf) (1.15.1)
Requirement already satisfied: smmap<6,>=3.0.1 in /home/user/.local/lib/python3.8/site-packages (from gitdb->npf) (5.0.0)
Requirement already satisfied: six>=1.5 in /usr/lib/python3/dist-packages (from python-dateutil>=2.7->matplotlib->npf) (1.14.0)
Requirement already satisfied: pycparser in /home/user/.local/lib/python3.8/site-packages (from cffi>=1.12->cryptography==41.0.0->npf) (2.21)

I'm not really familiar with Python, so sorry if I missed some basic steps. Thank you!

Add logging of what NPF is doing

Create log files to keep track of what NPF did and to keep a trace of any issues. This could be enabled through an optional parameter.
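Such logging could be a thin layer over Python's standard logging module; a sketch (the logger name and messages are illustrative, and it writes to an in-memory buffer where NPF would use a file selected by the optional parameter):

```python
import io
import logging

# In NPF this buffer would be a log file chosen via the optional parameter.
buf = io.StringIO()
handler = logging.StreamHandler(buf)
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))

log = logging.getLogger("npf-sketch")
log.setLevel(logging.DEBUG)
log.addHandler(handler)

log.info("Starting test %s on role %s", "01-iperf.npf", "client")
print(buf.getvalue())
```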

Push experiment data to a server or a database

To facilitate data extraction, allow NPF to be configured to automatically push results to a server or a database. Ideally this should be very configurable, in order to be compatible with many server and database types.
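As one concrete shape this could take, results could be appended to a relational table keyed by test, run, and variable; a sketch with SQLite (the schema is purely an assumption):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a real deployment would target a server DB
conn.execute("""CREATE TABLE results
                (test TEXT, run INTEGER, variable TEXT, value REAL)""")

# One row per collected result value.
rows = [("01-iperf.npf", 1, "THROUGHPUT", 941.0),
        ("01-iperf.npf", 2, "THROUGHPUT", 953.0)]
conn.executemany("INSERT INTO results VALUES (?, ?, ?, ?)", rows)

(count,) = conn.execute("SELECT COUNT(*) FROM results").fetchone()
print(count)  # 2
```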
