phoebe's Introduction

Phoeβe

Idea

Phoeβe (/ˈfiːbi/) wants to add basic artificial intelligence capabilities to the Linux OS.

What problem Phoeβe wants to solve

System-level tuning is a very complex activity. It requires knowledge of several (all?) of the layers that compose the system and of how they interact with each other, and quite often it also requires intimate knowledge of how those layers are implemented.

Another big aspect of running systems is dealing with failure. Do not think of failure as a machine catching fire, but rather as an overloaded system, caused by misconfiguration, which could lead to starvation of the available resources.

In many circumstances, operators are used to dealing with telemetry, live charts, alerts, etc., which help them identify the offending machine(s) and (re)act to fix any potential issues.

However, one question comes to mind: wouldn't it be awesome if the machine could auto-tune itself and provide a self-healing capability to the user? Well, if that is enough to trigger your interest then this is what Phoeβe aims to provide.

Phoeβe uses system telemetry as the input to its brain and produces a large set of settings, which are applied to the running system. The decision made by the brain is continuously re-evaluated (taking the grace_period setting into account) to eventually offer the best possible setup.

Architecture

Phoeβe is designed with a plugin architecture in mind, providing an interface for new functionality to be added with ease.

Plugins are loaded at runtime and registered with the main body of execution. The only requirement is to implement the interface dictated by the plugin_t structure. The file network_plugin.c is a very good example of how to implement a new plugin for Phoeβe.

Disclaimer

The mathematical model implemented is a super-basic one, which implements a machine-learning 101 approach: input * weight + bias. It does not use any fancy techniques and the complexity is close to zero.
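The formula above can be sketched in a few lines. This is only an illustration, not the project's implementation: the weight names and the bias mirror the weights and bias keys of the sample settings.json, while the input values and the idea of summing one weighted term per metric are assumptions made for the example.

```python
def weighted_score(inputs, weights, bias):
    """Return the sum of input * weight over all weighted metrics, plus bias."""
    return sum(inputs[name] * weights[name] for name in weights) + bias


# Weights and bias as in the sample settings.json (key suffix "_weight" dropped).
weights = {
    "transfer_rate": 0.8,
    "drop_rate": 0.1,
    "errors_rate": 0.05,
    "fifo_errors_rate": 0.05,
}

# Hypothetical telemetry sample, invented for illustration only.
inputs = {
    "transfer_rate": 1000.0,
    "drop_rate": 2.0,
    "errors_rate": 0.0,
    "fifo_errors_rate": 0.0,
}

score = weighted_score(inputs, weights, bias=10)
print(score)
```

With these made-up inputs the transfer rate dominates the result, which is what a weight of 0.8 against 0.1 and 0.05 would suggest.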

The plan is to eventually migrate to a model created in TensorFlow and exported so that Phoeβe can use it, but we are not there yet.

10,000-foot view

The code allows for both training and inference. All the knobs that modify the run-time behavior of the implementation are configurable via the settings.json file, where each parameter is explained in detail.

In the inference case, when a match is found, the identified kernel parameters are configured accordingly.

The inference loop runs every N seconds; the value is configurable via inference_loop_period. Choose a smaller value for the system to react more quickly to a change in conditions, or a larger value for it to react more slowly.

The code has a dedicated stats collection thread, which periodically collects system statistics and populates the structures used by the inference loop. The statistics are collected every N seconds; the value is configurable via stats_collection_period. Depending on the overall network demands, choose a smaller value to react more quickly to network events, or a larger value to react more slowly.

If a high traffic rate is seen on the network and a matching entry is found, the code will not consider any lower values for a certain period of time. That period is configurable via grace_period in the settings.json file.

This behavior has been implemented to avoid reconfiguring the system too often and to prevent sudden reconfiguration caused by network spikes.
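The grace-period rule can be pictured with a small sketch. The class and method names below are hypothetical and are not taken from the Phoeβe sources; the sketch assumes that equal or higher rates are always accepted and that lower rates are accepted only once grace_period seconds have passed since the last accepted value.

```python
import time


class GracePeriodGate:
    """Hypothetical helper: ignore lower rates until grace_period has elapsed."""

    def __init__(self, grace_period, clock=time.monotonic):
        self.grace_period = grace_period
        self.clock = clock          # injectable so the logic can be tested
        self.current_rate = None    # last accepted transfer rate
        self.applied_at = None      # time at which it was accepted

    def should_apply(self, new_rate):
        """Return True if settings for new_rate should be applied now."""
        now = self.clock()
        if self.current_rate is None or new_rate >= self.current_rate:
            accept = True
        else:
            # Lower rate: only accept after the grace period has elapsed.
            accept = (now - self.applied_at) >= self.grace_period
        if accept:
            self.current_rate = new_rate
            self.applied_at = now
        return accept
```

In this sketch a short dip in traffic right after a peak is ignored, while a sustained drop is picked up once the grace period is over.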

The code also supports a few approximation functions, selectable via the settings.json file.

The approximation functions adjust the tolerance value, which is calculated at runtime, so that the user can fine-tune the matching criteria. Depending on the approximation function chosen, the matching criteria become narrower or broader.
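As a rough illustration of how an approximation function could narrow or broaden a match, the sketch below maps the approx_function codes listed in settings.json to Python functions. Several things here are assumptions rather than facts about Phoeβe: that "power-of-two" means squaring the tolerance, that "log" means the natural logarithm, and that a match is a simple absolute-difference test against the adjusted tolerance. The function names are hypothetical.

```python
import math

# Codes follow the approx_function values documented in settings.json.
APPROX_FUNCTIONS = {
    0: lambda t: t,          # no approximation function
    1: math.sqrt,            # square-root
    2: lambda t: t ** 2,     # power-of-two (assumed to mean squaring)
    3: math.log10,           # log10
    4: math.log,             # log (assumed to be the natural logarithm)
}


def approximate_tolerance(tolerance, approx_function):
    """Apply the selected approximation function to the tolerance."""
    return APPROX_FUNCTIONS[approx_function](tolerance)


def matches(measured, stored, tolerance, approx_function=0):
    """Assumed matching rule: the rates differ by at most the adjusted tolerance."""
    adjusted = approximate_tolerance(tolerance, approx_function)
    return abs(measured - stored) <= adjusted
```

Under these assumptions, a tolerance of 100 shrinks to 10 with the square-root function (a narrower match) and grows to 10000 when squared (a broader match). The logarithmic options raise ValueError for a tolerance of zero or less.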

Settings

Below is a detailed explanation of the configuration options available in settings.json, their possible values, and the effect they have. (Note that this is not valid JSON because of the comments; remove the lines starting with double forward slashes if you use it.)

{
    "app_settings": {

        // path where application is expecting to find plugins to load
        "plugins_path": "/home/mvarlese/REPOS/phoebe/bin",

        // max_learning_values: number of values learnt per iteration
        "max_learning_values": 1000,

        // save trained data to file every saving_loop value
        "saving_loop": 10,

        // accuracy: the level of accuracy to find a potential entry
        // given the transfer rate considered.
        //
        // MaxValue: Undefined, MinValue: 0.00..1
        // Probably not very intuitive: a higher number corresponds to
        // a higher accuracy level.
        "accuracy": 0.5,

        // approx_function: the approximation function applied
        // to the calculated tolerance value used to find a
        // matching entry in values.
        //
        // Possible values:
        // 0 = no approximation function
        // 1 = square-root
        // 2 = power-of-two
        // 3 = log10
        // 4 = log
        "approx_function": 0,

        // grace_period: the time which must be elapsed
        // before applying new settings for a lower
        // transfer rate than the one previously measured.
        "grace_period": 10,

        // stats_collection_period: the amount of time which
        // has to elapse between stats collections.
        // It is expressed in seconds but it accepts non-integer
        // values; i.e. 0.5 represents half a second
        "stats_collection_period": 0.5,

        // inference_loop_period: the time which must be
        // elapsed before running a new inference evaluation
        "inference_loop_period": 1

    },

    "labels": {
        // geography: valid options are EMEA, NA, LAT, APAC, NOT_SET
        "geography": "NOT_SET",
        // business: valid options are RETAIL, AUTOMOTIVE, SERVICE, NOT_SET
        "business": "NOT_SET",
        // behavior: valid options are THROUGHPUT, LATENCY, POWER
        "behavior": "THROUGHPUT"
    },

    "weights":{
        "transfer_rate_weight": 0.8,
        "drop_rate_weight" : 0.1,
        "errors_rate_weight" : 0.05,
        "fifo_errors_rate_weight" : 0.05
    },

    "bias": 10
}
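Since the annotated file above is not valid JSON, the comment lines have to be dropped before parsing. The following is a minimal sketch of doing that programmatically instead of by hand; load_commented_json is a hypothetical helper, not part of Phoeβe, and it only handles lines that consist entirely of a // comment, which is the only comment form used above.

```python
import json


def load_commented_json(text):
    """Parse JSON text after dropping lines that contain only a // comment."""
    kept = [
        line for line in text.splitlines()
        if not line.lstrip().startswith("//")
    ]
    return json.loads("\n".join(kept))


# Shortened excerpt of the annotated settings shown above.
sample = """
{
    "app_settings": {
        // grace_period: the time which must be elapsed
        // before applying new settings for a lower
        // transfer rate than the one previously measured.
        "grace_period": 10,
        "stats_collection_period": 0.5
    },
    "bias": 10
}
"""

settings = load_commented_json(sample)
print(settings["app_settings"]["grace_period"])  # prints 10
```

Filtering whole lines leaves string values untouched, so a value such as a URL containing // would not be mangled.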

Building and installation

See BUILDING.md for build instructions. Packages for various distributions can be found in the Open Build Service.

Running

The code supports multiple modes of operation:

  • Training mode:
    ./build/src/phoebe -f ./csv_files/rates.csv -m training -s settings.json
  • Inference mode:
    ./build/src/phoebe -f ./csv_files/rates_trained_data.csv -i wlan0 -m inference -s settings.json

Feedback / Input / Collaboration

If you are curious about the project and want more information, please, do reach out to [email protected].
I will be more than happy to talk with you about this project and about other initiatives in this area.

phoebe's People

Contributors

asmorodskyi, dcermak, kkaempf, mge1512, mslacken, mvarlese, shunghsiyu

phoebe's Issues

Unable to build using meson

I am trying to build using meson as described in the README, but after I execute the third line it gives an error:
[image]

Please check whether it is the same for others.

collect_stats tries to access sysfs

The collect_stats.py script tries to query the current cpu frequency and governor from sysfs:

with open(SYSFS_CPU_PATH + 'cpu0/cpufreq/scaling_governor') as f:

Unfortunately, this fails in GitHub Actions with:

Traceback (most recent call last):
  File "/__w/phoebe/phoebe/scripts/collect_stats.py", line 306, in <module>
    main(sys.argv[1], settings, count)
  File "/__w/phoebe/phoebe/scripts/collect_stats.py", line 271, in main
    collect_stats(
  File "/__w/phoebe/phoebe/scripts/collect_stats.py", line 187, in collect_stats
    with open(SYSFS_CPU_PATH + 'cpu0/cpufreq/scaling_governor') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor'

I suspect that this is caused by the GitHub Actions runner not allowing the CI action to query this information, to prevent it from modifying the CPU behavior.

@shunghsiyu I was told you introduced this; can we remove it for the time being?
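As an alternative to removing the query, the read could be made tolerant of a missing sysfs entry. The sketch below is only a suggestion under assumptions: read_sysfs_value is a hypothetical helper that does not exist in collect_stats.py, and the "unknown" fallback value is invented for the example.

```python
def read_sysfs_value(path, default=None):
    """Return the stripped contents of a sysfs file, or default if unreadable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        # FileNotFoundError and PermissionError are both subclasses of OSError.
        return default


SYSFS_CPU_PATH = "/sys/devices/system/cpu/"

governor = read_sysfs_value(
    SYSFS_CPU_PATH + "cpu0/cpufreq/scaling_governor",
    default="unknown",  # hypothetical placeholder for environments without cpufreq
)
print(governor)
```

On a runner without cpufreq support this yields the fallback instead of raising FileNotFoundError, so the rest of the collection can continue.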

meson - room for improvements

  1. The scripts folder is copied empty.
    Expected: it should contain the Python scripts.
  2. Attempting to run scripts/collect_stats.py ends with: Module not found "_phobe"
    Expected: the meson script is supposed to install the needed Python module.

Towards a proof-of-concept for a wider audience

I believe this project can gain more traction if we can show a wider audience that adding artificial intelligence capabilities to the Linux OS works, i.e. that auto-tuning does indeed yield better performance. (I'm leaving out the self-healing part for now, as it seems harder than auto-tuning.)

By a wider audience I mean that someone with little knowledge of systems and artificial intelligence should be able to set up a scenario (or a benchmark), run the project, and easily observe that performance improves when auto-tuning is in action (ideally with all of that done by a single script). To achieve that, there are still quite a few challenges ahead.

First off, the scenario should be easy to set up. Right now we use TRex in our target scenario, which is not the easiest to set up; while its performance is superior, it supports far fewer network interface cards than the Linux kernel. This is easily solved by switching to other packet generators (e.g. iperf3, sockperf, ab, etc.), and is a minor issue compared to the next one.

Now, addressing the elephant in the room.

The core of Phoeβe lies in its brain, the decision-making engine that takes system telemetry as input and outputs a set of system-level parameters that improve the system's performance when applied.

But so far we have not been able to prove that this most crucial piece of the project works, that is, to show that it can output system settings that do improve the system's performance. This is the second (and the major) issue that has prevented us from showcasing the project to a wider audience.

I hope this proposal makes sense. If so, perhaps we can proceed to a discussion on how we can improve the decision engine (more data points for csv_files/rates.csv? more metrics collected for the decision engine?) and arrive at a simpler setup.

Add detailed installation instructions

I tried setting up phoebe on a new Tumbleweed VM today and I had a couple of issues with missing dependencies:

  1. libnl-3.0 was missing
  2. json-c was missing
  3. I had a couple of issues with cmocka. After building it from source, I tried building phoebe and ran into linker issues. Installing RPMs from rpmfind fixed the issue. Could somebody confirm whether this is the expected way to fulfil the cmocka dependency? I haven't used it before and I was wondering if I was missing something in the process.

It'd be great to have more detailed instructions for building the repository. I could create an INSTALL.md file with instructions for building on Tumbleweed to begin with.
