operos's People

Contributors

dyachuk, gwklok, rlisagor

operos's Issues

System upgrade - initial version

  • OS build:

    • SFS files should be versioned
  • Controller:

    • Teamster

      • Periodic version check against Gatekeeper (Pax Automa upgrade server)
      • If updates available:
        • Download SFS files
        • Store flag in etcd that update is pending
      • Keep track of the Operos version on each worker (from the workers' version check)
      • Update kick API:
        • Update bootloader config
        • Reboot
      • Worker version check API
      • Algorithm for rebooting of worker nodes. First version will be a simple one-by-one rolling reboot, but allow for more sophisticated algorithms in the future.
    • Waterfront

      • Toggle to enable/disable the update system
      • Notify user when controller upgrade is ready
      • Allow user to kick off upgrade via Teamster
      • Display update status and version information for each worker node
  • Workers:

    • Update agent:
      • Periodic version check against Teamster
      • Response to check: whether to reboot the worker node
      • If yes, drain self and reboot
      • Uncordon on startup
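
The one-by-one rolling reboot described above, with room for more sophisticated algorithms later, could be sketched roughly as follows. This is only an illustration in Python, not Teamster's actual code: the `Worker` record, `plan_rolling_reboot`, and the pluggable `strategy` argument are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Worker:
    # Hypothetical record; Teamster's real data model is not shown in this issue.
    name: str
    version: str


# A strategy takes the outdated workers and returns batches to reboot together.
Strategy = Callable[[List[Worker]], List[List[Worker]]]


def one_by_one(outdated: List[Worker]) -> List[List[Worker]]:
    """Simplest strategy: every batch holds exactly one worker."""
    return [[w] for w in outdated]


def plan_rolling_reboot(workers: Iterable[Worker], target_version: str,
                        strategy: Strategy = one_by_one) -> List[List[str]]:
    """Return the reboot order as batches of worker names.

    Workers already on the target version are skipped.
    """
    outdated = sorted((w for w in workers if w.version != target_version),
                      key=lambda w: w.name)
    return [[w.name for w in batch] for batch in strategy(outdated)]
```

A different algorithm (for example, rebooting several workers per batch) would only need a new strategy function; the planner itself stays unchanged.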

Trying to do a clean build on a clean install of ArchLinux, No Joy

Trying to understand if the project has been abandoned or what is going on.

I have set up a physical machine with a clean install of Arch Linux and installed Docker, Go, VirtualBox, OpenSSH, Packer, and Git. I cloned the repo and built the Arch Linux Packer image using the instructions from the website. That worked once I had figured out all the dependencies and adjusted some of the timings.

I then cloned this repo, and the build fails in the waterfront project on the protocol buffer packages, at the ProtoBufferIsVersion3 check in the autogenerated file.

My intention was to provide some pull requests back, but I wanted to test them first. I have been testing with Hyper-V, and it turns out that the current set of scripts assumes that all available drives are to be used when it boots the PODs. On Hyper-V you cannot disable the floppy drives, so the scripts try to set up the floppy drive as a storage device and hang. A simple check in the loop for whether the device is a floppy device (/dev/fd?) would prevent that. I would also add some changes to Readme.md covering the prerequisites (Docker, OpenSSH, etc.) required on a fresh install.
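
The proposed floppy check could look something like the sketch below. It is written in Python purely for illustration; the actual Operos scripts are not shown in this issue and are probably shell, and `usable_storage_devices` is a hypothetical helper name.

```python
import re
from typing import Iterable, List

# On Linux, floppy drives show up as /dev/fd0, /dev/fd1, and so on.
_FLOPPY = re.compile(r"^/dev/fd\d+$")


def usable_storage_devices(devices: Iterable[str]) -> List[str]:
    """Drop floppy devices from a list of candidate block devices."""
    return [d for d in devices if not _FLOPPY.match(d)]
```

With this filter in front of the partitioning loop, a Hyper-V guest that exposes /dev/fd0 would only hand real disks to the storage setup.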

Dynamically tune Ceph placement groups as nodes are added

Teamster should dynamically adjust the placement group size for the cluster as new nodes are brought online (with a sufficient soak time in case of flappy nodes). See http://docs.ceph.com/docs/master/rados/operations/placement-groups/ for more information. Additionally, the current config is designed for very small clusters (e.g. two worker nodes) and needs to ramp up the minimum number of written replicas for objects in the pool as the number of worker nodes increases.
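
As a rough sketch of the calculation involved, the following Python uses the rule of thumb commonly cited from the Ceph documentation: about 100 placement groups per OSD, divided by the replica count, rounded up to a power of two. Both function names and the soak-time bookkeeping (a mapping from node name to the time it came online) are assumptions made for illustration, not Teamster's implementation.

```python
from typing import Dict, List


def suggested_pg_count(osd_count: int, replicas: int, pgs_per_osd: int = 100) -> int:
    """(OSDs * pgs_per_osd) / replicas, rounded up to the next power of two."""
    if osd_count < 1 or replicas < 1:
        raise ValueError("osd_count and replicas must be positive")
    raw = osd_count * pgs_per_osd / replicas
    power = 1
    while power < raw:
        power *= 2
    return power


def soaked_nodes(online_since: Dict[str, float], now: float,
                 soak_seconds: float) -> List[str]:
    """Nodes that have stayed online for at least soak_seconds.

    Only these nodes should count towards a placement group change,
    so that a flapping node does not trigger a resize.
    """
    return sorted(n for n, t in online_since.items() if now - t >= soak_seconds)
```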

Metrics forwarding

The Operos controller collects system metrics. These are displayed through the waterfront UI and are also used by various components of the overall system for tuning (e.g. Kubernetes, software updates). However, this metric collection system is not designed to replace a larger organizational metrics collection system (i.e. it does not cover business-level or application metrics), nor is it designed for long-term retention. It is desirable to create aggregates that mix the two, so ensure that the system metrics pipeline can accept an endpoint to which metrics are forwarded for aggregation and long-term storage.

Allow administrator to change cluster network default route

Currently we set up the controller to act as the default router for the cluster network to simplify getting started. However, we also reserve the first 10 IPs in the cluster network for network devices not managed by Operos, and one of these could very well be a firewall/router handling cluster ingress and egress. Modify the installer to allow the administrator to override the default.
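
A minimal sketch of how an installer might list the reserved addresses and validate an overridden default route, using Python's standard ipaddress module. It assumes that "the first 10 IPs" means the first ten host addresses of the cluster subnet; the function names and the example subnet are hypothetical.

```python
import ipaddress
from itertools import islice
from typing import List

RESERVED_COUNT = 10  # number of reserved addresses, taken from this issue


def reserved_addresses(cidr: str) -> List[str]:
    """First RESERVED_COUNT host addresses of the cluster network."""
    net = ipaddress.ip_network(cidr)
    return [str(a) for a in islice(net.hosts(), RESERVED_COUNT)]


def validate_default_route(cidr: str, gateway: str) -> str:
    """Accept an administrator-supplied gateway only if it lies in the cluster network."""
    net = ipaddress.ip_network(cidr)
    gw = ipaddress.ip_address(gateway)
    if gw not in net:
        raise ValueError(f"{gateway} is not inside {cidr}")
    return str(gw)
```

An installer prompt could offer the controller as the default and accept any address that passes `validate_default_route`, for example one of the reserved addresses assigned to an external firewall.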

Nodes with NVMe disks

Operos v0.2.81
5 node Cluster
2x 4core,16gig ram, 128 sata ssd
3x 8core, 32gig ram, 256 NVMe ssd
Everything was deploying fine until I got to my nodes that have the NVMe drives.
Hung on
Starting Download container images from controller
Failed on
Docker Socket for the API
Worker partitions intialization

I used Send Diagnostics and can only assume the diagnostics were sent to the controller, but I can't find the directory.

Log message forwarding

Similar to the metrics ticket #19, add the configuration and machinery to forward log messages to an outside system or service for long-term retention/processing.

opsctl command line tool

For every action that an administrator or user can perform through the waterfront shell, we want to provide the capability of doing so from the command line or from scripts.

Enable RBAC

  • apiserver manifest should include:
    • --authorization-mode=Node,RBAC
    • --admission-control=...,NodeRestriction,...
  • Kubelet certs should have:
    • usr = system:node:<nodeName>
    • grp = system:nodes
  • All built-in manifests should have service accounts with appropriate permissions / bindings:
    • kubedns
    • rbd-provisioner
    • kube-scheduler
    • kube-controller-manager
    • kube-proxy
    • waterfront
    • calico
    • prometheus
    • node-exporter
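
For the kubelet certificate requirement above: Kubernetes x509 client authentication takes the user name from the certificate's Common Name and the groups from its Organization fields. The Python sketch below builds the matching subject. The helper names are hypothetical, and the output string simply follows the form accepted by `openssl req -subj`.

```python
from typing import Dict


def kubelet_cert_subject(node_name: str) -> Dict[str, str]:
    """Subject fields for a kubelet client certificate.

    CN becomes the user (system:node:<nodeName>) and O becomes the
    group (system:nodes), as required by the Node authorizer.
    """
    if not node_name:
        raise ValueError("node_name must not be empty")
    return {"CN": f"system:node:{node_name}", "O": "system:nodes"}


def openssl_subj(node_name: str) -> str:
    """Render the subject in the form used by `openssl req -subj`."""
    s = kubelet_cert_subject(node_name)
    return f"/O={s['O']}/CN={s['CN']}"
```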

Unattended install

It should be possible to install Operos without having to use the installer console. Ideally, this would work by supplying a JSON/YAML formatted set of answers to the questions asked by the installer.
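
One possible shape for the answer file handling, sketched in Python. The key names in `REQUIRED_KEYS` are invented for illustration; the real set of installer questions is not listed in this issue.

```python
import json
from typing import Dict

# Hypothetical answer keys; the actual installer questions may differ.
REQUIRED_KEYS = ("controller_disk", "public_interface", "private_interface")


def load_answers(text: str) -> Dict[str, object]:
    """Parse a JSON answer file and fail early if any answer is missing."""
    answers = json.loads(text)
    if not isinstance(answers, dict):
        raise ValueError("answer file must contain a JSON object")
    missing = [k for k in REQUIRED_KEYS if k not in answers]
    if missing:
        raise ValueError("missing answers: " + ", ".join(missing))
    return answers
```

Failing before any disk is touched matters more in an unattended install than in the console installer, since nobody is watching to correct a bad answer.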

Enable worker full disk encryption

Teamster currently generates a key per worker to be used by LUKS to encrypt the block devices in worker nodes. The make-partitions script should use this key to encrypt both the ephemeral and persistent (ceph osd) volumes.
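
As a sketch of what the encryption step might involve, the Python below only builds the cryptsetup command lines for one volume and does not execute anything. The function name, the key file path, and the mapper name are assumptions; the make-partitions script itself is not shown here and would apply the same steps to both the ephemeral and the persistent volume.

```python
from typing import List


def luks_commands(device: str, key_file: str, mapper_name: str) -> List[List[str]]:
    """Commands to format a block device with LUKS and then open it.

    --batch-mode suppresses the interactive confirmation, and
    --key-file supplies the per-worker key instead of a passphrase prompt.
    """
    return [
        ["cryptsetup", "luksFormat", "--batch-mode", "--key-file", key_file, device],
        ["cryptsetup", "open", "--type", "luks", "--key-file", key_file,
         device, mapper_name],
    ]
```

The filesystem or Ceph OSD would then be created on /dev/mapper/<mapper_name> rather than on the raw device.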

TLS certificate management for waterfront

Allow administrators to upload a TLS certificate and private key to secure the waterfront front end. Possibly support some kind of automatic method to obtain a TLS certificate that browsers trust out of the box (e.g. Let's Encrypt). However, in the typical use case the Operos cluster front end is not directly connected to the internet, so this might be difficult.

Prospector collect LLDP information for each online ethernet interface on startup

This ticket is to add the capability to the prospector client to gather the LLDP information from each ethernet interface before registering with Teamster. Teamster will use this information to determine rack topology and adjust the Ceph CRUSH map in larger clusters. In the future it will also be used for dealing with ingress/load balancing into a cluster (e.g. a landing switch connected to an upstream provider advertises LLDP info foobar). That information will be available to Kubernetes so that a daemonset providing the ingress machinery may be launched.
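
To illustrate how LLDP data might feed a rack topology, the Python sketch below groups nodes by the switch chassis ID their interfaces report. It assumes that nodes attached to the same switch share a rack, which is a simplification, and `group_by_switch` and the input shape are hypothetical rather than part of prospector or Teamster.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_switch(reports: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Map each LLDP chassis ID to the sorted list of nodes that see it.

    `reports` maps a node name to the chassis IDs learned on its
    online ethernet interfaces.
    """
    racks = defaultdict(set)
    for node, chassis_ids in reports.items():
        for chassis in chassis_ids:
            racks[chassis].add(node)
    return {chassis: sorted(nodes) for chassis, nodes in sorted(racks.items())}
```

Each resulting group could then become one rack bucket when the CRUSH map is adjusted.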

OIDC support in Waterfront

Waterfront should be an OIDC client. The JWT received from the OIDC provider will be passed down to kube-dashboard and kube-apiserver.

localhost not in /etc/hosts

In some environments, kube-apiserver fails to start. This seems to be caused by the fact that localhost is not listed explicitly in /etc/hosts.
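
A possible check and fix, sketched in Python. It treats the content of /etc/hosts as text and appends a localhost entry only when none is present; the helper names are hypothetical, and where such a check would run in Operos (image build or boot time) is left open.

```python
def has_localhost_entry(hosts_text: str) -> bool:
    """True if any non-comment line maps an address to the name localhost."""
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()
        # fields[0] is the address; the remaining fields are host names.
        if len(fields) >= 2 and "localhost" in fields[1:]:
            return True
    return False


def ensure_localhost(hosts_text: str) -> str:
    """Return hosts_text with a 127.0.0.1 localhost line appended if missing."""
    if has_localhost_entry(hosts_text):
        return hosts_text
    separator = "" if hosts_text.endswith("\n") or not hosts_text else "\n"
    return hosts_text + separator + "127.0.0.1 localhost\n"
```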

GPU support

Add support for GPUs/accelerators.
This is not really a technical task. It is more a matter of wading through pages of legalese to figure out whether it is possible to distribute the proprietary drivers normally involved and, if the binaries cannot be distributed cleanly, figuring out a user-friendly way for users to add them.

Infiniband support

Add InfiniBand support to Operos for networking.
Modify installer to allow administrator to use IB interfaces for cluster topology.

Advanced cluster network topology

Currently the controller has a very simple network setup: a user-facing interface (waterfront/kube API) and what we call the private interface, which carries the cluster data/control plane (everything else).

Allow the administrator to split the cluster data/control plane between physical interfaces on the controller.
Allow those interfaces to include virtual interfaces as well (e.g. VLANs); a controller could then potentially need only one interface.
Traffic splitting should focus on the control plane (PXE boot, image delivery), Ceph data plane, and the interpod communication fabric.
