bluebanquise's Introduction

BlueBanquise

Web site: https://bluebanquise.com

📢 The main branch is under active development for now. Consider using a stable branch for production. 📢

What is BlueBanquise

BlueBanquise is a group of coherent Ansible collections and tools, designed to deploy and manage large groups of hosts (clusters of nodes).

The BlueBanquise collections are generic and can adapt to any kind of architecture (High Performance Computing clusters, university or enterprise infrastructures, Blender render farm, K8S cluster, etc.). Particular attention is paid to scalability for very large clusters.

When "stacked" together, the collections and tools are called the BlueBanquise stack.

Collections

The following collections are available. Please note that for now, only the infrastructure collection of BlueBanquise is considered stable.

  • infrastructure: the core of the stack, focused on providing roles and tools to deploy hosts and configure vital services.
  • hardware: specific hardware support roles (GPU, interconnect, etc.).
  • file system: support for local or network FS roles.
  • hpc: High Performance Computing related roles.
  • containers: containers related roles.
  • high availability: HA and load balancing related roles.
  • logging: system logging related roles (different from monitoring).
  • monitoring: cluster monitoring related roles.
  • security: system security related roles.

The infrastructure collection should be compatible with all target Linux distributions (RHEL 8, RHEL 9, Debian 11, Debian 12, OpenSuse Leap 15, Ubuntu 20.04, Ubuntu 22.04). Other collections do not support all these distributions (support is added on demand).

Note that a few features are still limited on Ubuntu and Debian (mainly network configuration); I am working on it.

License

The BlueBanquise repository is under the MIT license, except the BlueBanquise documentation, which is under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Quickstart

We assume here that you already have a recent version of Ansible installed and configured. If you are new to Ansible, you can use the provided generic tutorial.

1. Core variables and Jinja2 extensions

In order to use BlueBanquise collections, you need the core variables, that contain the logic (BlueBanquise relies on a centralized logic to easily impact all roles at once).

To install core variables, you can either:

  • Copy file bb_core.yml into your inventory at group_vars/all/ level
  • Or install the commons collection and invoke the vars plugin at ansible-playbook execution, using ANSIBLE_VARS_ENABLED=ansible.builtin.host_group_vars,bluebanquise.commons.core

While the first solution is simpler, the second lets you use the Galaxy update mechanism to keep your core logic up to date (mainly for bug fixes).

In both cases, you need to enable some Jinja2 extensions at run time. To do so, either:

  • Add them to your ansible.cfg file (see example at ansible.cfg) by adding jinja2_extensions = jinja2.ext.loopcontrols,jinja2.ext.do
  • Or invoke the extensions at ansible-playbook execution, using ANSIBLE_JINJA2_EXTENSIONS=jinja2.ext.loopcontrols,jinja2.ext.do

Note that not all roles need this core logic, and that all logic variables are prefixed by j2_.
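
For readers who launch ansible-playbook from a wrapper script, the two environment variables above can be set programmatically. The following is only a minimal sketch: the function name build_ansible_env is hypothetical, and only the two variable names and their values come from the quickstart.

```python
import os

# Values copied from the quickstart above.
VARS_ENABLED = "ansible.builtin.host_group_vars,bluebanquise.commons.core"
JINJA2_EXTENSIONS = "jinja2.ext.loopcontrols,jinja2.ext.do"


def build_ansible_env(base_env=None):
    """Return a copy of the environment with both settings applied.

    build_ansible_env is a hypothetical helper, not part of BlueBanquise.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["ANSIBLE_VARS_ENABLED"] = VARS_ENABLED
    env["ANSIBLE_JINJA2_EXTENSIONS"] = JINJA2_EXTENSIONS
    return env


if __name__ == "__main__":
    # Start from an empty environment so only the two settings are printed.
    env = build_ansible_env({})
    for key in sorted(env):
        print(f"{key}={env[key]}")
```

The returned dictionary can then be passed as the env argument of subprocess.run when calling ansible-playbook, which keeps the settings local to that one invocation.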

2. Install collections

To install BlueBanquise collections, you can use the ansible-galaxy command:

ansible-galaxy collection install git+https://github.com/bluebanquise/bluebanquise.git#/collections/commons,master -vvv --upgrade
ansible-galaxy collection install git+https://github.com/bluebanquise/bluebanquise.git#/collections/infrastructure,master -vvv --upgrade

3. Create inventory

To create your inventory, you can use the provided datamodel and the README embedded in each role (for example, for the pxe_stack role, you can rely on its README.md).

4. Create playbooks

You can invoke BlueBanquise roles using full name:

---
- name: managements playbook
  hosts: "fn_management"
  roles:
    - role: bluebanquise.infrastructure.dhcp_server
      tags: dhcp_server
    - role: bluebanquise.infrastructure.pxe_stack
      tags: pxe_stack

If you are not running Ansible as root, remember to pass the -b (--become) argument to the ansible-playbook command.

5. Read documentation

It is advised to read the documentation at https://bluebanquise.com/documentation/ to understand the basic concepts of the stack.

Resources

Documentation

The stack documentation is available on the BlueBanquise website, in the documentation subfolder.

Note that each role embeds its own README, with detailed usage description.

Packages

The stack packages are available in the repositories subfolder.

Supported software environment

The stack aims at supporting a maximum range of hardware, CPU architectures, and Linux distributions.

Currently tested and supported distributions (other derivatives may work) are:

Operating System family  Operating System distribution  Tested versions  Architectures    Notes
Red Hat                  RHEL                           7, 8, 9          x86_64, aarch64  √
Red Hat                  Rocky Linux                    8, 9             x86_64, aarch64  √
Red Hat                  CentOS                         7, 8             x86_64, aarch64  √
Red Hat                  CentOS Stream                  8                x86_64, aarch64  √
Red Hat                  Alma Linux                     8, 9             x86_64, aarch64  √
Debian                   Ubuntu                         20.04, 22.04     x86_64, arm64    √. Diskless not supported for now.
Debian                   Debian                         11, 12           x86_64, arm64    √. Diskless not supported for now.
Suse                     SLES                           15               x86_64, aarch64  √. Diskless not supported for now.
Suse                     OpenSuse Leap                  15               x86_64, aarch64  √. Diskless not supported for now.

Ansible >= 4.10.0 is mandatory for BlueBanquise to run properly.

Please note that EL 7 systems (CentOS 7, RHEL 7, etc.) are now supported on a best-effort basis only.

The project

BlueBanquise is part of the Algoric project from the Fabrique du Loch FabLab, located in Brittany - France.

It is a revamp of the older Banquise stack, which was based on Salt.

The BlueBanquise project is a 100% open source project, not managed by a company, and will remain under the MIT license.

The name

You may wonder where this name comes from:

bluebanquise's People

Contributors

aldarrie, alissonzuza, aolloh, bouriquet, btravouillon, dilassert, gavillom, giacomo-mcevoy, ginomcevoy, hmescaler, johnnykeats, jpm38, lmagdanello, loar38, marbolangos, mp-bull, neilmunday, osmocl, oxedions, patrick-legi, pietersdavid, pigay, remyd1, rezib, santos-lucas, sla31, sobrase, strus38, thiagocardozo, wlln

bluebanquise's Issues

roles/core/time - not working on ubi8 image

Hi,
Molecule testing shows the following error:
TASK [time : Install packages] *************************************************
fatal: [instance]: FAILED! => {"changed": false, "failures": ["No package chrony available."], "msg": "Failed to install some of the specified packages", "rc": 1, "results": []}

Report enhancement

The report role still needs to be finished:

  • plugins
  • remove ansible-inventory-grapher and ansible-playbook-grapher
  • add more patterns
  • finish network graph generation
  • add system map and racks map

Missing external_* checks in some roles

Hey,

I'm trying to use BlueBanquise to deploy a cluster stack. I'm not interested in using external servers at all, so I just removed the values from the example in my inventory. The file (/etc/ansible/inventory/group_vars/general_settings/external.yml) looks like this:

external_time:
  time_server:
    server: # List of possible time servers
    pool: # List of possible time pools
  time_client:
    server: 
    pool: 

external_dns:
  dns_server: # set as forwarders in named.conf
  dns_client: # set directly on client side in resolv.conf

# Hosts defined here will be automatically added into /etc/hosts and into DNS configuration
external_hosts:

Then I got the following error in several roles (at least hostsfile, dns_client, dns_server):

AnsibleError: Unexpected templating type error occurred on (#### Blue Banquise file ####
): 'NoneType' object is not iterable

It seems the templates need to check that these lists are defined and not empty before iterating over them.

Mathis
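
The failure reported above comes from YAML loading a key with no value as None, which cannot be iterated. As a minimal sketch of the guard the issue asks for, written in Python rather than in the roles' Jinja2 templates (as_list is a hypothetical helper name, not something the roles provide):

```python
def as_list(value):
    """Normalize an inventory value so it is always safe to iterate.

    Hypothetical helper: None (an empty YAML key) becomes an empty list,
    a scalar becomes a one-item list, and a list is returned as a copy.
    """
    if value is None:
        return []
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]


# An empty `dns_server:` key in external.yml is loaded as None.
external_dns = {"dns_server": None, "dns_client": None}

for forwarder in as_list(external_dns["dns_server"]):
    print(f"forwarder {forwarder}")  # never reached, and no TypeError either
```

Inside a template, a comparable effect can usually be obtained with the Jinja2 default filter in its boolean form, for example `external_dns.dns_server | default([], true)`, which substitutes the empty list when the value is undefined or None.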

[CI] ci: implement unit tests for roles with Molecule

Issue to track the work to implement CI/CD unit tests of the roles with Molecule. Please refer to the instructions below. You can use PR #242 as an example.

Core roles

Advanced core roles

  • advanced_dhcp_server (#428)
  • advanced_dns_server (#438)

Addons roles

  • clone
  • clustershell (#174)
  • diskless
  • nic_nmcli
  • ofed
  • ofed_sm
  • openldap_client
  • openldap_server
  • prometheus_client (#430)
  • prometheus_server (#430)
  • report
  • slurm
  • users_basic (#174)

How to

For developers, install virtualenv in your environment:

$ pip install --user virtualenv

At the root of your git clone, create a new virtual environment in directory venv-molecule and activate it:

$ virtualenv venv-molecule
$ source venv-molecule/bin/activate
(venv-molecule) $ 

Install the required modules in the virtual environment:

(venv-molecule) $ pip install 'molecule[docker]>3' ansible-lint flake8

Change your current directory to the role you want to update and initialize the default molecule scenario:

(venv-molecule) $ cd roles/core/conman/
(venv-molecule) $ molecule init scenario --role-name conman
--> Initializing new scenario default...
Initialized scenario in /git/bluebanquise/roles/core/conman/molecule/default successfully.

Tune, commit, test, open a PR.

Once completed, deactivate your virtual environment:

(venv-molecule) $ deactivate 
$ 

Note: If you run Fedora 31, Docker will not run by default due to the switch to cgroups v2. However, it is still possible to run most of the tests locally using the podman driver (molecule create -d podman).

disklessset.py: static code analysis error

I'm concerned by the F821 undefined name 'node'.

$ flake8 . --statistics --ignore E501,E226
./roles/addons/diskless/files/disklessset.py:17:1: F401 'shutil.copy2' imported but unused
./roles/addons/diskless/files/disklessset.py:20:1: F401 'pwd' imported but unused
./roles/addons/diskless/files/disklessset.py:21:1: F401 'grp' imported but unused
./roles/addons/diskless/files/disklessset.py:22:1: F401 're' imported but unused
./roles/addons/diskless/files/disklessset.py:23:1: F401 'hashlib' imported but unused
./roles/addons/diskless/files/disklessset.py:25:1: F401 'base64.urlsafe_b64encode as encode' imported but unused
./roles/addons/diskless/files/disklessset.py:26:1: F401 'base64.urlsafe_b64decode as decode' imported but unused
./roles/addons/diskless/files/disklessset.py:27:1: F401 'getpass.getpass' imported but unused
./roles/addons/diskless/files/disklessset.py:405:78: F821 undefined name 'node'
8     F401 'shutil.copy2' imported but unused
1     F821 undefined name 'node'
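
To illustrate why F821 deserves more attention than the F401 warnings: an unused import is harmless at run time, whereas an undefined name raises NameError as soon as the offending line executes, possibly only on a rarely taken branch. The sketch below uses a hypothetical function and is not the code from disklessset.py.

```python
def describe(nodes, verbose=False):
    """Hypothetical function: the bug hides in a branch that is rarely taken."""
    summary = f"{len(nodes)} node(s)"
    if verbose:
        # `node` is never defined in this scope: flake8 reports F821 here,
        # and Python raises NameError only when this line actually runs.
        summary += f", last: {node}"  # noqa: F821
    return summary


print(describe(["c001", "c002"]))  # the common path works fine

try:
    describe(["c001", "c002"], verbose=True)
except NameError as exc:
    print(f"NameError: {exc}")
```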

Need to have 2 distinct services for tftp and http in pxe chain boot

The BlueBanquise stack uses the same IP address for the TFTP service and the HTTP service in the PXE chain boot: the pxe_ip variable in the network definition.
networks:
  ice1-1:
    subnet: 10.10.0.0
    prefix: 16
    netmask: 255.255.0.0
    broadcast: 10.10.255.255
    dhcp_unknown_range: 10.10.254.1 10.10.254.254
    gateway: 10.10.255.254
    is_in_dhcp: true
    is_in_dns: true
    services_ip:
      pxe_ip: 10.10.0.1

This works fine when the services run on the same host/VM/container and therefore expose the same IP address for both services.
When deploying services like dhcp, tftp, http, dns etc. inside containers running on various bare metal nodes, orchestrated by Kubernetes, we have one service IP address per service. For that reason, we need a PXE chain boot solution in which we can configure a tftp_ip address and a pxe_ip address to access the TFTP server and the HTTP server respectively.
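
One possible shape for this request, sketched in Python under clear assumptions: tftp_ip is the key proposed in this issue and does not exist in the datamodel shown above, and resolve_boot_ips is a hypothetical helper. Falling back to pxe_ip keeps existing inventories working unchanged.

```python
def resolve_boot_ips(services_ip):
    """Return (tftp_ip, http_ip) for the PXE chain boot.

    `tftp_ip` is the key proposed in this issue (hypothetical); when it is
    absent or empty, fall back to `pxe_ip`, i.e. the current behaviour.
    """
    http_ip = services_ip["pxe_ip"]
    tftp_ip = services_ip.get("tftp_ip") or http_ip
    return tftp_ip, http_ip


# Current inventories: both services share one address.
print(resolve_boot_ips({"pxe_ip": "10.10.0.1"}))

# Containerized services: one address per service.
print(resolve_boot_ips({"pxe_ip": "10.10.0.1", "tftp_ip": "10.10.0.2"}))
```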

Improve IPXE ROM robustness

Hi,

Can you provide a more reliable iPXE script (mainly for the DHCP part)? It works perfectly with good switches, but with old switches the interface configuration command hangs and times out after 5 retries.

File : bluebanquise_standard.ipxe

Command to be changed :
ifconf --configurator dhcp || shell

Example provided by the iPXE website:
:retry
ifconf --configurator dhcp || goto retry

or

:retry
ifconf -c dhcp && isset ${filename} || goto retry

roles/core/time - not working on ubuntu18.04

Molecule testing is not working on ubuntu18.04:
TASK [time : Install packages] *************************************************
fatal: [instance]: FAILED! => {"changed": false, "msg": "No package matching 'chrony' is available"}

SELinux issues with NFS /home

When upgrading the tumulus cluster to 1.2.0, we found an SELinux issue on login nodes, where /home for users is mounted over NFS.

An SELinux boolean must be set.

Need the dns server to listen and allow query on all networks

In a Kubernetes context, the DNS server is started in a container and only knows the Kubernetes network (which is managed by Kubernetes, not by BlueBanquise).
In this case, we must remove the listen-on and allow-query sections and keep the default behavior for the server to work.

For example:

cat /etc/ansible/kubernetes-dns.patch
 
--- roles/core/dns_server/templates/named.conf.j2.bb    2020-02-05 14:51:21.871668303 +0100
+++ roles/core/dns_server/templates/named.conf.j2       2020-02-05 14:52:41.327668303 +0100
@@ -12,14 +12,14 @@
 {% set main_ntw = (ntw | first | join | trim) %}
 
 options {
-  listen-on port 53 {
-    127.0.0.1;
-{% for network in (networks | select('match',(j2_current_iceberg_network+'-[0-9]+')) | list | unique | sort) %}
-{% if networks[network]['is_in_dns'] is defined and networks[network]['is_in_dns'] == true %}
-    {{networks[network]['services_ip']['dns_ip']}};
-{% endif %}
-{% endfor %}
-  };
+  #listen-on port 53 {
+    #127.0.0.1;
+#{% for network in (networks | select('match',(j2_current_iceberg_network+'-[0-9]+')) | list | unique | sort) %}
+#{% if networks[network]['is_in_dns'] is defined and networks[network]['is_in_dns'] == true %}
+    #{{networks[network]['services_ip']['dns_ip']}};
+#{% endif %}
+#{% endfor %}
+  #};
 
   listen-on-v6 port 53 { ::1; };
   directory    "/var/named";
@@ -27,14 +27,14 @@ options {
   statistics-file "/var/named/data/named_stats.txt";
   memstatistics-file "/var/named/data/named_mem_stats.txt";
 
-  allow-query {
-    localhost;
-{% for network in (networks | select('match',(j2_current_iceberg_network+'-[0-9]+')) | list | unique | sort) %}
-{% if (networks[network]['is_in_dns'] is defined and not none) and (networks[network]['is_in_dns'] == true) %}
-    {{networks[network]['subnet']}}/{{networks[network]['prefix']}};
-{% endif %}
-{% endfor %}
-  };
+  #allow-query {
+    #localhost;
+#{% for network in (networks | select('match',(j2_current_iceberg_network+'-[0-9]+')) | list | unique | sort) %}
+#{% if (networks[network]['is_in_dns'] is defined and not none) and (networks[network]['is_in_dns'] == true) %}
+    #{{networks[network]['subnet']}}/{{networks[network]['prefix']}};
+#{% endif %}
+#{% endfor %}
+  #};
 
 {% if external_dns.dns_server is defined and not none and external_dns.dns_server %}
   forwarders {

RHEL 8 support

Yo Ox,

As discussed, I will have a look at how to integrate RHEL 8 into BB.
Can you give me the rights, and create the branch for my devs?
Thanks

JK

EFI not managed

Yo Ox,

I found that EFI is badly managed: sanboot in menu.ipxe does not work in EFI, and the iPXE USB and ISO images do not contain the EFI part.

OK if I take care of that? The CINES SMP node boots in EFI, so it is currently broken.

Documentation is missing some parts

Missing parts:

  • Ansible training does not cover tasks (when, loop, with_items, etc.)
  • equipment_profiles values are not properly explained in the documentation, and available values are not provided.
  • Web site upgrade with new documentation

adding a BMC entry with option_match in inventory doesn't work

Hi,
I found a new issue.
When I add a BMC entry with an option82 match, the entry is not added to the DHCP file.
On the other hand, the entry with a MAC address works.

          bmc:
            name: pm0-bmc05
            #mac: 08:00:27:0d:f8:b2
            option_match: |
              substring (option agent.remote-id, 0 , 64) = "0"
              and option agent.circuit-id = 01:00:10
              and (substring(option vendor-class-identifier, 0, 10) = "XXXX S BMC")
            ip4: 10.0.0.106
            network: ice1-1

Kind regards,

Need carrier timeout handling at dracut

On many hardware platforms, the NIC carrier wait times out too quickly in dracut, leaving osdeploy or diskless boots without their kickstart or image.

Running a simple dhclient in the dracut shell shows that an IP is easy to get, but dracut gives up too early.

We need a simple way in bootset to add the right kernel parameters, asking dracut to wait longer and retry more.
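
As a rough sketch of what bootset could append to the kernel command line, assuming the dracut version in use supports the rd.net.timeout.carrier and rd.net.dhcp.retry options described in dracut.cmdline(7). The helper name and the default values below are illustrative assumptions, not tested recommendations.

```python
def dracut_network_params(carrier_timeout=30, dhcp_retry=3):
    """Build extra kernel parameters asking dracut to wait and retry more.

    Hypothetical helper; the option names come from dracut.cmdline(7),
    the default values are only placeholders for illustration.
    """
    params = [
        f"rd.net.timeout.carrier={carrier_timeout}",  # seconds to wait for carrier
        f"rd.net.dhcp.retry={dhcp_retry}",            # number of DHCP attempts
    ]
    return " ".join(params)


print(dracut_network_params())
```

Exposing the two values as per-node or per-profile settings would let slow hardware get longer timeouts without penalizing the rest of the cluster.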

Clonezilla for Microsoft Windows

Tumulus has 2 Windows 10 systems.

We are using Clonezilla to backup/reinstall them. We need support for basic imaging based on Clonezilla live.

1.3.0 Milestone.

Merge core time roles to one

Chrony roles are nearly the same between server and client. Merging them like for slurm would simplify the list.

Remove python scripts from templates

Hi,

I think the code of Python scripts (bootset) should not be included in Ansible templates.

Moving this code to a more standard path and making the code "static" (by declaring your CLI in a setup.py file and removing the sources from the role templates) could be a great enhancement.

Configuration of the bootset executable can be done through an external configuration file (let's say /etc/bootset/bootset.conf) and templated by Ansible.
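
A minimal sketch of how a static bootset could read such a file with the standard configparser module. The section and option names (paths, output_dir, defaults, retries) are hypothetical and only stand in for whatever Ansible would template.

```python
import configparser

# Hypothetical content of /etc/bootset/bootset.conf, as Ansible could template it.
EXAMPLE_CONF = """
[paths]
output_dir = /srv/bootset
"""


def load_config(text):
    """Parse configuration text; real code would call parser.read(path) instead."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser


config = load_config(EXAMPLE_CONF)
print(config.get("paths", "output_dir"))
# A fallback keeps the script usable when an option or section is not set.
print(config.getint("defaults", "retries", fallback=3))
```

With this split, the Python sources never change between clusters; only the templated configuration file does.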

dns_server role: remove "@ IN A 10.0.0.1 mngt IN A 10.0.0.1" from forward template

These 3 lines are not needed and cause the dns_server role to fail when executed in a chroot context.

diff -r roles/core/dns_server/templates/forward.j2 roles.bkp/core/dns_server/templates/forward.j2
12a13,15
> @ IN A {{network_interfaces[j2_node_main_network_interface]['ip4']}}
>
> {{inventory_hostname}} IN A {{network_interfaces[j2_node_main_network_interface]['ip4']}}

Prometheus role need equipment_profiles affinity

The Prometheus addon role needs the ability to define alerts at equipment_profiles granularity.

We need to move alerts from a file to a template, and use Jinja inside with jobs. We also need to find out how to define an alert for multiple jobs.

It would also be interesting to define a Grafana role bound to Prometheus and Alert Manager.

Bluebanquise roles should support chroot ansible_connection type

Hey,

For several weeks now, we have been using the amazing BlueBanquise roles to manage our cluster and, above all, to customize our overlayfs diskless images.
This allows us to configure diskful and diskless systems the same way, which really eases cluster management.

Some roles use the Ansible "service" task, which does not work well with the chroot connection plugin. It actually starts the services inside the diskless image chroot, where we only expect the service to be enabled. This can cause trouble on the node where we generate the diskless image.

It would be amazing if the BlueBanquise core roles could take care of this need (and I think we need to think about this a little).

Mathis

bootset should allow to write output files on an alternate root path

For now, bootset generates output configuration files on the local root file system.
This feature request is a requirement for deployment types whose services run on a different node than the one hosting the configuration files locally.
As an example, we have dhcp, tftp, http services running inside containers on different hosts (orchestrated by kubernetes for instance). These services need to have access to pxe and repository data files from a shared storage file system.

A wrong ifcfg file is generated even if no IP is set in the NIC declaration

We need to declare a NIC in a new network, but we can't set an IP as the IP will be managed by Kubernetes.
I tried adding a test to skip the ifcfg generation if no ipv4 is set, in the nic role.
For example:

- name: Set NIC configuration
  template:
    src: ifcfg.j2
    dest: /etc/sysconfig/network-scripts/ifcfg-{{item}}
    owner: root
    group: root
    mode: 0644
  with_items: "{{network_interfaces}}"
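  # "not none" alone is always true in Jinja2, so test the value explicitly
  when: network_interfaces[item]['ip4'] is defined and network_interfaces[item]['ip4'] is not none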

Without the test, an ifcfg file is generated, but with empty parameters.
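
The same selection logic, sketched in Python for clarity: only interfaces whose ip4 value is actually set are kept. The helper name and the interface names are hypothetical; the dictionary layout mirrors the network_interfaces[item]['ip4'] access used in the task above.

```python
def interfaces_with_ip4(network_interfaces):
    """Return the names of interfaces whose `ip4` is set to a non-empty value.

    Hypothetical helper: a missing key, None, or an empty string all count
    as "no IP", so no ifcfg file would be generated for those interfaces.
    """
    return [
        name
        for name, settings in (network_interfaces or {}).items()
        if (settings or {}).get("ip4")
    ]


nics = {
    "enp0s3": {"ip4": "10.10.0.1", "network": "ice1-1"},
    "enp0s8": {"network": "kube"},                 # IP managed elsewhere
    "enp0s9": {"ip4": None, "network": "kube"},    # key present but empty
}
print(interfaces_with_ip4(nics))
```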

Nyancat

Where is the nyancat ? Where is the display message ?

Feature request: add support for slave dns

I was reviewing the dns_client and dns_server roles and noticed there is a maximum number of one DNS server per network.

Could we consider adding support for secondary DNS servers?
We can discuss this in this thread then I can push a PR if accepted.

bootset refactoring

Hi @oxedions,

I just submitted a bootset refactoring PR here: #42

It can be improved, but I think it is a better base for improvement.

TODO :

  • OO class
  • Add error handling
  • More flexibility (we should be able to develop additional plugins to add more boot methods)
  • Use logging lib instead of print
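
The TODO items above could fit together roughly as follows. This is only a sketch of one possible design, not the code of the PR: every class, function, and boot method name below is hypothetical.

```python
import logging

logger = logging.getLogger("bootset")

# Registry of available boot methods; plugins add themselves at import time.
BOOT_METHODS = {}


def boot_method(name):
    """Class decorator registering a boot method under `name`."""
    def register(cls):
        BOOT_METHODS[name] = cls
        return cls
    return register


class BootMethod:
    """Base class: subclasses return the text to write for one node."""

    def render(self, node):
        raise NotImplementedError


@boot_method("disk")
class DiskBoot(BootMethod):
    def render(self, node):
        return f"# {node}: boot from local disk\n"


def set_boot(node, method):
    """Render the boot file content for `node`, with explicit error handling."""
    try:
        cls = BOOT_METHODS[method]
    except KeyError:
        logger.error("unknown boot method %r for node %s", method, node)
        raise ValueError(f"unknown boot method: {method}") from None
    logger.info("setting %s to boot method %s", node, method)
    return cls().render(node)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    print(set_boot("c001", "disk"), end="")
```

A new boot method then only needs a decorated subclass in its own module, and log verbosity is controlled by the caller instead of being hard-coded in print calls.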

Need script and directory tags in pxe_stack role

When deploying services like dhcp, tftp, http, dns etc. inside containers running on various bare metal nodes, we use the Ansible "chroot" connector to generate the configuration files on a shared file system mounted on all nodes.
We want to use the standard roles (dhcp, dns, pxe_stack ...) for that purpose, but we do NOT want to start the services on the bare metal management node. So we use the template tag, but for the pxe_stack role we also need to create the output directories and copy the bootswitch CGI script. A directory tag and a script tag are therefore needed.

How to organize tasks when mixed distros

Yo Ox,

I was wondering, how should we organize tasks when dealing with multiple distros?

You know, with centos_7.yml, ubuntu_18.yml etc. files, because for some tasks handling packages alone is not enough :-( you also have to deal with paths or services...

time: do not define both server and pool together

The role time defines both server and pool in chrony configuration if both are defined in the inventory.

[root@management1 ~]# head -6 /etc/chrony.conf 
#### Blue Banquise file ####
## Ansible managed: modified on 2020-04-07 03:52:27 by root on management1

# External servers/pool if asked
server 0.fr.pool.ntp.org
pool pool.ntp.org iburst maxsources 3

We should change this behaviour to define either server(s) or pool. I would default to pool.
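
The proposed behaviour can be sketched as a small selection function: use the pools when any are defined, otherwise use the servers. The function name is hypothetical, and the iburst option is only an illustrative choice for the generated lines.

```python
def chrony_sources(servers=None, pools=None):
    """Return chrony source lines, preferring pools over servers.

    Hypothetical helper: None is accepted for either argument, since an
    empty inventory key is loaded as None.
    """
    pools = pools or []
    servers = servers or []
    if pools:
        return [f"pool {name} iburst" for name in pools]
    return [f"server {name} iburst" for name in servers]


# Both defined in the inventory: only the pool is written.
print(chrony_sources(servers=["0.fr.pool.ntp.org"], pools=["pool.ntp.org"]))
```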

See:

Improve quality, add services enabling/disabling variable

Hi,

I submitted a PR (see #31) containing the following changes:

Major changes:

  • Add boolean to disable all services (HA)
  • Implement tags + tag standardization (package, template, service)
  • Implement handlers

Minor changes:

  • Use new package module syntax (using list in name instead of loops)
  • Replace some "with_X" by loop
  • Minor fixes

The issue #19 can also be updated (contribution guidelines should be updated before closing)
