coinboot's People

Contributors

frzb

coinboot's Issues

PoC: plugin for NVIDIA driver

Coinboot should support NVIDIA GPUs as well as AMD GPUs.

So we have to come up with a plugin that provides the proprietary NVIDIA GPU driver.
The reference GPU for this PoC is an NVIDIA P106-100.

Add plugin metadata key for used kernel version

The current metadata structure of the plugin creation files looks like this:

plugin: AMDGPU-Pro Polaris
archive_name: amdgpupro_polaris
version: 20.50-1234664
description: AMD Polaris GPU (RX500/RX400 family) firmware and driver with support for OpenCL 1.2
maintainer: Gunter Miegel <[email protected]>
source: https://www.amd.com/en/support/kb/release-notes/rn-amdgpu-unified-linux-20-50
run: |
  <plugin creation code>

As Coinboot progresses, we might bring up multiple kernel versions such as 5.11.0-46-generic or 5.13.0-25-generic.
For plugins that have a kernel dependency, we have to reflect that in the plugin metadata by adding the mandatory key kernel.
For plugins that don't have a kernel dependency, I propose setting kernel to the value all.
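A minimal sketch of how the proposed kernel key could be evaluated, assuming the metadata has already been parsed into a dictionary. The function name plugin_applies and the error handling are hypothetical and not part of Coinboot.

```python
def plugin_applies(metadata, running_kernel):
    """Return True if a plugin may be loaded on the given kernel.

    Hypothetical helper: 'metadata' is the parsed plugin metadata,
    'running_kernel' is a release string such as '5.13.0-25-generic'.
    """
    # The issue proposes 'kernel' as a mandatory key, so a missing key is an error.
    if "kernel" not in metadata:
        raise ValueError("plugin metadata is missing the mandatory key 'kernel'")
    kernel = metadata["kernel"]
    # 'all' marks plugins without a kernel dependency.
    return kernel == "all" or kernel == running_kernel
```

With this sketch, a plugin declaring kernel: 5.13.0-25-generic would be skipped on a node running 5.11.0-46-generic, while a plugin declaring kernel: all would apply to both.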

Add Swap based on zram with full memory size

Currently no swap is enabled.
This becomes a constraint when applications allocate a lot of memory on systems at the lower end, with only 4 GB of memory. In the worst case, the OOM killer stops the allocating process.
This currently happens with Teamred Miner in multi-GPU setups with only 4 GB of system memory.

We should enable swap backed by a zram device covering the full memory capacity or 8 GB, whichever is smaller.
This is similar to how it was done successfully for Fedora 34.

https://fedoraproject.org/wiki/Changes/Scale_ZRAM_to_full_memory_size
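The sizing rule proposed above can be sketched as follows. The function name is hypothetical, and treating "8 GB" as 8 GiB is an assumption made for this illustration.

```python
GIB = 1024 ** 3
# Assumption: the 8 GB cap from the proposal is interpreted as 8 GiB.
ZRAM_CAP_BYTES = 8 * GIB


def zram_size_bytes(mem_total_bytes, cap_bytes=ZRAM_CAP_BYTES):
    """Size the zram device to the full memory or the cap, whichever is smaller."""
    return min(mem_total_bytes, cap_bytes)
```

For a node with 4 GiB of memory this yields a 4 GiB zram device; a node with 16 GiB would be capped at 8 GiB.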

setlocale: LC_ALL: cannot change locale (en_US.UTF-8)

coinboot-initramfs-5.4.0-58-generic seems to have an issue with the locale.

During login/start of the image the following error message is displayed:

warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)

Compose fails with error message

I used a freshly cloned, unmodified checkout of the coinboot master branch and tried to run

docker-compose up
which failed with the following error message:

Building coinboot
Traceback (most recent call last):
  File "/usr/bin/docker-compose", line 11, in <module>
    load_entry_point('docker-compose==1.17.1', 'console_scripts', 'docker-compose')()
  File "/usr/lib/python2.7/dist-packages/compose/cli/main.py", line 68, in main
    command()
  File "/usr/lib/python2.7/dist-packages/compose/cli/main.py", line 121, in perform_command
    handler(command, command_options)
  File "/usr/lib/python2.7/dist-packages/compose/cli/main.py", line 952, in up
    start=not no_start
  File "/usr/lib/python2.7/dist-packages/compose/project.py", line 431, in up
    svc.ensure_image_exists(do_build=do_build)
  File "/usr/lib/python2.7/dist-packages/compose/service.py", line 318, in ensure_image_exists
    self.build()
  File "/usr/lib/python2.7/dist-packages/compose/service.py", line 923, in build
    shmsize=parse_bytes(build_opts.get('shm_size')) if build_opts.get('shm_size') else None,
TypeError: build() got an unexpected keyword argument 'stream'

CI/CD: verify_and_release: Run Coinboot server and boot workers shows error on dnsmasq start

Despite the error message, the CI/CD pipeline succeeds.

We have to identify whether this is a real problem or just a side effect of installing dnsmasq to run a preflight check with dnsmasq against our dnsmasq configuration file.

https://github.com/frzb/coinboot/runs/4657577120?check_suite_focus=true#step:5:375

[snip]
Job for dnsmasq.service failed because the control process exited with error code.
See "systemctl status dnsmasq.service" and "journalctl -xe" for details.
invoke-rc.d: initscript dnsmasq, action "start" failed.
● dnsmasq.service - dnsmasq - A lightweight DHCP and caching DNS server
     Loaded: loaded (/lib/systemd/system/dnsmasq.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Wed 2021-12-29 10:07:33 UTC; 6ms ago
    Process: 2975 ExecStartPre=/usr/sbin/dnsmasq --test (code=exited, status=0/SUCCESS)
    Process: 2976 ExecStart=/etc/init.d/dnsmasq systemd-exec (code=exited, status=2)

Dec 29 10:07:33 fv-az246-793 systemd[1]: Starting dnsmasq - A lightweight DHCP and caching DNS server...
Dec 29 10:07:33 fv-az246-793 dnsmasq[2975]: dnsmasq: syntax check OK.
Dec 29 10:07:33 fv-az246-793 dnsmasq[2976]: dnsmasq: failed to create listening socket for port 53: Address already in use
Dec 29 10:07:33 fv-az246-793 systemd[1]: dnsmasq.service: Control process exited, code=exited, status=2/INVALIDARGUMENT
Dec 29 10:07:33 fv-az246-793 dnsmasq[2976]: failed to create listening socket for port 53: Address already in use
Dec 29 10:07:33 fv-az246-793 systemd[1]: dnsmasq.service: Failed with result 'exit-code'.
Dec 29 10:07:33 fv-az246-793 dnsmasq[2976]: FAILED to start up
Dec 29 10:07:33 fv-az246-793 systemd[1]: Failed to start dnsmasq - A lightweight DHCP and caching DNS server.
[snap]

modprobe: can't load module zstd (kernel/crypto/zstd.ko): unknown symbol in module, or unknown parameter

Booting a focal suite kernel 5.15.0-48-generic with the zram boot flag ends in a kernel panic:

$ qemu-system-x86_64 -kernel coinboot-vmlinuz-5.15.0-48-generic -initrd coinboot-initramfs-5.15.0-48-generic  -m 4096 -smp 2 -nographic  -serial mon:stdio -append "console=ttyS0 net.ifnames=0 biosdevname=0 break=skip_loading_plugins zram"

[...]

Honoring zram kernel arg
[    4.269253] zram: Added device: zram0
[    4.284727] zstd: Unknown symbol ZSTD_initCCtx (err -2)
[    4.285098] zstd: Unknown symbol ZSTD_getParams (err -2)
[    4.285335] zstd: Unknown symbol ZSTD_CCtxWorkspaceBound (err -2)
[    4.285609] zstd: Unknown symbol ZSTD_compressCCtx (err -2)
modprobe: can't load module zstd (kernel/crypto/zstd.ko): unknown symbol in module, or unknown parameter
[    4.360702] Can't allocate a compression stream
[    4.361164] zram: Cannot initialise zstd compressing backend
sh: write error: Cannot allocate memory

Add centralized network share

A centralized network share on the Coinboot server that is mounted in the filesystem of the Coinboot nodes would be an improvement for operations.
Such a network share would make it easier to store persistent data and to access data needed during operations, like firmware images or kernel dumps.

WebDAV is the preferred option because it requires no privileges from the Docker host that the Coinboot server container runs on.

dnsmasq: Increase maximum number of concurrent TFTP connections allowed

From the dnsmasq manpage

--tftp-max=
Set the maximum number of concurrent TFTP connections allowed. This defaults to 50. When serving a large number of TFTP connections, per-process file descriptor limits may be encountered. Dnsmasq needs one file descriptor for each concurrent TFTP connection and one file descriptor per unique file (plus a few others). So serving the same file simultaneously to n clients will use require about n + 10 file descriptors, serving different files simultaneously to n clients will require about (2*n) + 10 descriptors. If --tftp-port-range is given, that can affect the number of concurrent connections.

The default tftp-max value of 50 is obviously too low for scenarios where hundreds of nodes boot at the same time.
In such scenarios, congestion-like situations have been observed with the default value of tftp-max, with many nodes appearing to be stuck in the boot process.
An ad-hoc adjustment to tftp-max=4096 in conf/dnsmasq/coinboot.conf resolved the situation.

We need to raise the default value of tftp-max to a sensible value.

The configuration should be verified by some load testing in the CI pipeline with software like fbender which can also benchmark TFTP servers.
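To judge what a given tftp-max value means for the per-process file descriptor limit, the two approximations quoted from the manpage can be written down directly. This is only a rough sketch; the function names are hypothetical and the results are the manpage's "about" figures, not exact counts.

```python
def tftp_fds_same_file(n_clients):
    """Approximate descriptors when all clients fetch the same file (n + 10)."""
    return n_clients + 10


def tftp_fds_distinct_files(n_clients):
    """Approximate descriptors when every client fetches a different file (2*n + 10)."""
    return 2 * n_clients + 10
```

For the ad-hoc value of tftp-max=4096 this gives roughly 4106 descriptors when all nodes fetch the same file and roughly 8202 in the worst case of all-different files, so the file descriptor limit of the dnsmasq process has to be considered together with tftp-max.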

Reproducible rootfs creation

While I was thinking about the rework of the release scheme, I realized that the build process has to be reproducible so that we do not end up with a moving target. The main target of this effort is the rootfs build with debirf, which is based on debootstrap.

The people at Debian already addressed this topic:

https://wiki.debian.org/ReproducibleInstalls

So we should find out which software we should use for creating a reproducible rootfs build.

Remove Xenial support

We no longer support Coinboot images based on Ubuntu Xenial.

We have to remove the support for Xenial from our build scripts and pipelines.

If Loki becomes unreachable Containers are stuck

This is a known upstream issue:

The driver keeps all logs in memory and will drop log entries if Loki is not reachable and if the quantity of max_retries has been exceeded. To avoid the dropping of log entries, setting max_retries to zero allows unlimited retries; the driver will continue trying forever until Loki is again reachable. Trying forever may have undesired consequences, because the Docker daemon will wait for the Loki driver to process all logs of a container, until the container is removed. Thus, the Docker daemon might wait forever if the container is stuck.

Feeding the Docker container logs into Loki via Promtail or Fluentd, rather than through the Loki Docker logging driver, should be sufficient as a workaround.

debirf: Integrate package caching support of debootstrap

Recent releases of debootstrap support caching of packages.

From the manpage:

--cache-dir=DIR
Cache .deb files under directory. It should be an absolute path.

To improve the developer experience, we should integrate this into debirf to speed up repeated local builds.
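As a rough illustration of the option quoted above, the following sketch only assembles a debootstrap command line with --cache-dir and enforces the absolute-path requirement. How debirf actually invokes debootstrap is not shown in this issue, so the helper name and the example paths are assumptions.

```python
import os


def debootstrap_argv(suite, target, cache_dir, mirror=None):
    """Build a debootstrap argument list that uses a package cache directory.

    Hypothetical helper; it does not run debootstrap itself.
    """
    # The manpage states the cache directory should be an absolute path.
    if not os.path.isabs(cache_dir):
        raise ValueError("cache_dir should be an absolute path")
    argv = ["debootstrap", f"--cache-dir={cache_dir}", suite, target]
    if mirror is not None:
        argv.append(mirror)
    return argv
```

The resulting list could be passed to subprocess.run on a build host, with the cache directory kept between builds so that already downloaded .deb files are reused.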

Evaluate using a custom dpkg configuration for dropping unnecessary files

By using a custom configuration for dpkg we can control which files to drop during install of a package.

For example: /etc/dpkg/dpkg.cfg.d/01_coinboot

# block documentation
path-exclude /usr/share/doc/*
# keep copyright files for legal reasons
path-include /usr/share/doc/*/copyright
path-exclude /usr/share/man/*
path-exclude /usr/share/groff/*
path-exclude /usr/share/info/*
# lintian stuff is small, but really unnecessary
path-exclude /usr/share/lintian/*
path-exclude /usr/share/linda/*
# block non-us locales
path-exclude /usr/share/locale/*
path-include /usr/share/locale/en*

Inspired by: https://wiki.ubuntu.com/ReducingDiskFootprint#Drop_unnecessary_files
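The interplay of path-exclude and path-include can be approximated with a short sketch: dpkg processes the rules in the given order, and the last rule that matches a file decides. The code below uses Python's fnmatch as a stand-in for dpkg's glob matching, which is an assumption and may differ from dpkg in details; the rule list is a subset of the example configuration.

```python
from fnmatch import fnmatchcase

# Subset of the example rules, in file order.
RULES = [
    ("exclude", "/usr/share/doc/*"),
    ("include", "/usr/share/doc/*/copyright"),
    ("exclude", "/usr/share/man/*"),
    ("exclude", "/usr/share/locale/*"),
    ("include", "/usr/share/locale/en*"),
]


def is_installed(path, rules=RULES):
    """Return True if 'path' would be kept under the given filter rules."""
    keep = True  # files not matched by any rule are installed
    for action, pattern in rules:
        # '*' also matches '/', as in dpkg's path filters.
        if fnmatchcase(path, pattern):
            keep = action == "include"  # the last matching rule wins
    return keep
```

Under these rules /usr/share/doc/bash/copyright is kept while other files below /usr/share/doc are dropped, and only locale directories starting with en survive.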

debirf: Create persistent shared host key for SSH

For SSH, host keys are an essential security feature against man-in-the-middle attacks.
On each start of a Coinboot node, a new host key is generated by the OpenSSH daemon. It either has to be acknowledged when initiating an SSH connection or is ignored altogether through the SSH client configuration.

In a controlled cluster environment, where access only happens within the local network and the risk of man-in-the-middle attacks is minimal, sharing host keys is acceptable. So we have to:

Find a way to create a persistent shared host key.
