
go-choria's Introduction

Choria Broker and Server

Choria is a framework for building Control Planes, Orchestration Systems and Programmable Infrastructure.

This is a daemon and related tools written in Go that host services and autonomous agents, and generally provide a secure hosting environment for callable logic that you can interact with from code.

Additionally, this is the foundational technology for a monitoring pipeline called Choria Scout.

More information about the project can be found on Choria.IO.

go-choria's People

Contributors: ananace, bastelfreak, fklajn, jeffmccune, jpluscplusm, mpepping, mrbanzai, optiz0r, ploubser, ripienaar, smortex, treydock, vjanelle

go-choria's Issues

keep stats about protocol issues

  • JSON parse failures
  • JSON schema validation failures
  • Signing failures
  • messages that had their signatures validated
  • messages that had invalid signatures
  • invalid certificates received
  • general protocol errors - missing certs, unparseable certs etc
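A minimal sketch of such counters using the standard library's expvar package; the metric names here are illustrative assumptions, not a final naming scheme:

```go
package main

import (
	"expvar"
	"fmt"
)

// Counters for protocol level events, incremented from the security and
// protocol layers; expvar exposes them automatically on /debug/vars once
// an HTTP listener is running. All names below are hypothetical.
var (
	jsonParseFailures   = expvar.NewInt("choria_protocol_json_parse_failures")
	schemaFailures      = expvar.NewInt("choria_protocol_schema_validation_failures")
	signingFailures     = expvar.NewInt("choria_protocol_signing_failures")
	validSignatures     = expvar.NewInt("choria_protocol_valid_signatures")
	invalidSignatures   = expvar.NewInt("choria_protocol_invalid_signatures")
	invalidCertificates = expvar.NewInt("choria_protocol_invalid_certificates")
	protocolErrors      = expvar.NewInt("choria_protocol_errors")
)

func main() {
	// e.g. on a signature validation failure:
	invalidSignatures.Add(1)
	fmt.Println("invalid signatures so far:", invalidSignatures.Value())
}
```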

add buildinfo

Instead of showing the build details in choria --version, we should have a buildinfo sub command and have --version show just the version.

This currently breaks man page generation from kingpin

Build instructions and considerations

Hi!

I finally managed to build Choria on FreeBSD. The process was unexpectedly complicated, and I am not sure the process I used to build the code is exactly right.

Because I have no real experience with Go-based programs, I looked at how other Go software packages in the FreeBSD ports tree are built, and since there are a bunch of differences between them, it might make sense to consider a few points.

How I built choria

In case I did this utterly wrong, let me start by explaining how I built Choria:

git clone https://github.com/choria-io/go-choria
cd go-choria
go get
go build

This produced a working go-choria binary in the working directory \o/.

However, a ~/go directory was created and a lot of source code was downloaded there (I assume it's all the choria dependencies… 76 MB 😨).

Reproducible builds

After creating a new user account and building choria the same way (but at a different date), the content of the ~/go directory is not strictly the same (89 MB this time). Maybe I am wrong, but my guess is that the dependencies are not fetched at a particular commit, and at some point the build may break because a dependency got updated.

In order to package Choria on FreeBSD, the checksum of all sources must be registered so that they are checked before building. Most go applications in the FreeBSD ports tree have all their dependencies included in their source code (!), generally in a vendor directory (e.g. aptly, hub, syncthing, etc). While this is arguably ugly, it offers the benefit of reproducibility. Another option would be to add information about each dependency's version / commit, so that they could be checked. This is for example done for the go-cve-dictionary port, based on a Gopkg.lock file in upstream's repository (a lot more work for porters, but far less ugly at the repository level).

Moving into one of these directions (or something similar) would be awesome.

Binary name

The repository is called go-choria, and go build produces a go-choria binary. Can you confirm that the file should be renamed to choria when installed on the end-user system?

nats-io/go-nats dependency registered twice

It appears that the nats-io/go-nats dependency is registered twice in glide.lock, first here:

go-choria/glide.lock

Lines 46 to 50 in 8352d2b

- name: github.com/nats-io/go-nats
  version: d66cb54e6b7bdd93f0b28afc8450d84c780dfb68
  subpackages:
  - encoders/builtin
  - util

then here (with the name nats-io/nats that redirects to nats-io/go-nats):

go-choria/glide.lock

Lines 55 to 56 in 8352d2b

- name: github.com/nats-io/nats
  version: d66cb54e6b7bdd93f0b28afc8450d84c780dfb68

Notice how the second entry has the same commit as the first entry.

This was discovered while scripting dependency extraction from glide.yaml. I'll try to submit a PR that fixes the issue.

use os.Getuid rather than user.Current

At present user.Current() is used in a few places; unfortunately this does not work well when cross compiling since cgo isn't available then.

Annoyingly, Go is supposed to work around this by falling back to os.Getuid in those cases, but that appears not to work: running a cross compiled choria still produces errors about this when determining SSL paths etc.

So rip those out and use os.Getuid directly

keep stats in the connector

  • initial connect tries
  • initial connect total time
  • numbers of each type of message - direct, federated, broadcast etc
  • connection disconnects
  • connection reconnects
  • connection closed
  • connection errors

Keep stats of the federation broker

Old fed broker keeps stats and publishes this on the wire regularly so that mco federation observe can report on the global state.

We need at least something compatible for now, but down the line I'd like graphite/prometheus etc emitters.

Keep stats and expose

Keep stats using the https://github.com/rcrowley/go-metrics library:

  • Adapters
  • Federation Brokers
  • Network Server
  • Registration

Expose those via the expvar method for now, later we'll do graphite, prometheus etc

It should listen on plugin.choria.stats_port and be off by default
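A sketch of the off-by-default listener; statsServer returns nil when no port is configured. The handler path and wiring are assumptions:

```go
package main

import (
	"expvar"
	"fmt"
	"net/http"
)

// statsServer builds the stats listener for plugin.choria.stats_port;
// a port of 0 - the default - means stats stay disabled and nil is returned.
func statsServer(port int) *http.Server {
	if port == 0 {
		return nil
	}

	mux := http.NewServeMux()
	// expvar.Handler serves all registered expvar variables as JSON
	mux.Handle("/debug/vars", expvar.Handler())

	return &http.Server{Addr: fmt.Sprintf(":%d", port), Handler: mux}
}

func main() {
	if s := statsServer(8222); s != nil {
		fmt.Println("would serve stats on", s.Addr)
		// go s.ListenAndServe()
	}
}
```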

allow extra agents to be compiled in

It should be possible for files to be patched into the build process and built conditionally.

So say you have github.com/foo/go-foo-agent, you might patch in a file server/additional_agents_foo.go:

// +build foo

package server

import (
	"context"
	"fmt"

	fooagent "github.com/foo/go-foo-agent"

	"github.com/choria-io/go-choria/choria"
	"github.com/choria-io/go-choria/server/agents"
	"github.com/sirupsen/logrus"
)

func init() {
	registerAdditionalAgent(func(ctx context.Context, mgr *agents.Manager, connector choria.InstanceConnector, log *logrus.Entry) error {
		fa, err := fooagent.New(mgr)
		if err != nil {
			return fmt.Errorf("could not create foo agent: %s", err)
		}

		return mgr.RegisterAgent(ctx, "foo_agent", fa, connector)
	})
}

and then a go build -tags 'foo' should activate this agent

add a network broker

NATS is easily embedded, so let's make a choria broker --config /.... command that runs an embedded broker.

It should:

  • always use SSL as taken from the choria configs
  • support clusters with SRV resolution and manual configs
  • have minimal config
  • log to normal choria places/formats
  • not expose its own stats just yet, but a future integration for stats must be able to get at those

Proposed configs are:

  • plugin.choria.network_client_port
  • plugin.choria.network_peer_port
  • plugin.choria.network_peer_user
  • plugin.choria.network_peer_password
  • plugin.choria.network_peers

Down the line, as a feature that can be enabled using a compiler flag, it should support FIPS via something like https://github.com/spacemonkeygo/openssl which would let people build against the system openssl.

NATS doesn't play well with this library, so internally we'd open a listening port that takes normal TLS connections and routes them internally to the plain text NATS port. A future mcollective connector would then support the same - TLS connection to plain text NATS. Thus we can elevate NATS to FIPS compliance via a managed TLS proxy.

revisit 'mco rpc'

The default mco rpc tool is a bit meh and probably contributes greatly to the difficulty in using mcollective.

It was written before DDLs even existed and was never revisited.

This should be revisited to focus more on the problem it should solve, towards that I have come to the following basic sketch:

Goals:

  • Focus on what users will most often want to see by creating a dynamic interface
  • Rethink some of the user facing terminology, remove RPC in favor of Request etc
  • Gradually expose the complexity inherent in a generic RPC client rather than by default produce a wall of text
  • Have tab completion for every part of the cli, the mco completion can already do this
  • Use $PAGER for things like --action-doc if more than $LINES of output

There is one possibility I also want to explore, and that is a more interactive client that prompts you using the :prompt defined in the DDL. You'd start it up like choria puppet --interactive and it would ask you questions via prompts, defaults etc and construct the request - and possibly show you what command would have produced the same outcome, as a learning tool. This way people with almost no experience can interactively learn the system.

Some future suggestions:

  • Make some file where you can specify on a per agent/action basis defaults you always wish to apply, like say --batch or --noop for the runonce agent or whatever via @trevor-vaughan

The code used to produce the output can be found at https://gist.github.com/ripienaar/f68d2a9031b35f9dc3d467c9d85886ee - it just prints stuff and doesn't actually make requests.

Default action, show available agents

$ choria
Choria client version x.x.x

Usage: choria <agent> <action> [agent options] [request options]

Available agents:

  package        Install and uninstall software packages
  puppet         Run Puppet agent, get its status, and enable/disable it
  rpcutil        General helpful actions that expose stats and internals to SimpleRPC clients
  service        Start and stop system services

See choria <agent> --help for details about the agent

Per agent generated details

$ choria puppet
Puppet agent version 1.11.1

Usage: choria puppet <action> [agent options] [request options]

Run Puppet agent, get its status, and enable/disable it

Available actions:

  disable                Disable the Puppet agent
  enable                 Enable the Puppet agent
  last_run_summary       Get the summary of the last Puppet run
  resource               Evaluate Puppet RAL resources
  runonce                Invoke a single Puppet run
  status                 Get the current status of the Puppet agent

See choria puppet <action> --help for details about one of the actions

Per action view

Here we focus on showing the available inputs the action takes, turning them into --foo style flags and showing them as options.

All the old RPC noise is hidden by default behind --filter-help and --request-help; an additional --action-doc exists to show the DDL produced doc for the action.

Ideally these options would show as much as possible from the DDL - things like data type and default - but we have limited screen real estate.

$ choria puppet runonce --help
Puppet agent version 1.11.1

Usage: choria puppet runonce [agent options] [request options]

Run Puppet agent, get its status, and enable/disable it

Options for the runonce action:

    Use --action-doc to get details about these such as types, defaults and valid values

Optional options:
        --force                      Will force a run immediately else subject to default splay time
        --server                     Address and port of the Puppet Master in server:port format
        --tags                       Restrict the Puppet run to a comma list of tags
        --noop                       Do a Puppet dry run
        --splay                      Sleep for a period before initiating the run
        --splaylimit                 Maximum amount of time to sleep before run
        --environment                Which Puppet environment to run
        --use_cached_catalog         Determine if to use the cached catalog or not

Additional help:
        --action-doc                 View the documentation for the runonce action
        --filter-help                Help on selecting which nodes to act on
        --request-help               View a full set of request options

Per action DDL doc

This needs some iteration it really is just to show the idea here:

$ choria puppet runonce --action-doc
Puppet agent version 1.11.1

Definition of the runonce action

Action inputs:

  Optional options:
    environment (String):
      Description: Which Puppet environment to run
           Prompt: Environment
         Required: false
          Default: nil
       Max Length: 50
       Validation: puppet_variable

    force (Boolean):
      Description: Will force a run immediately else subject to default splay time
           Prompt: Force
         Required: false
          Default: nil
       Max Length: 0
       Validation: none

  <snip>


  Action outputs:

    initiated_at:
      Description: Timestamp of when the runonce command was issued
       Display As: Initiated at
          Default: 0

    summary:
      Description: Summary of command run
       Display As: Summary
          Default:

Host filters help

Supplying --filter-help will add just the filter options. For demo purposes this is just a copy/paste from mco rpc; some refining will be needed to make this suck a bit less.

$ choria puppet runonce --filter-help
Puppet agent version 1.11.1

Usage: choria puppet runonce [agent options] [request options]

Run Puppet agent, get its status, and enable/disable it

Options for the runonce action:

    Use --action-doc to get details about these such as types, defaults and valid values

Optional options:
        --force                      Will force a run immediately else subject to default splay time
        --server                     Address and port of the Puppet Master in server:port format
        --tags                       Restrict the Puppet run to a comma list of tags
        --noop                       Do a Puppet dry run
        --splay                      Sleep for a period before initiating the run
        --splaylimit                 Maximum amount of time to sleep before run
        --environment                Which Puppet environment to run
        --use_cached_catalog         Determine if to use the cached catalog or not

Additional help:
        --action-doc                 View the documentation for the runonce action
        --filter-help                Help on selecting which nodes to act on
        --request-help               View a full set of request options

Host Filters:
    -W, --with FILTER                Combined classes and facts filter
    -S, --select FILTER              Compound filter combining facts and classes
    -F, --wf, --with-fact fact=val   Match hosts with a certain fact
    -C, --wc, --with-class CLASS     Match hosts with a certain config management class
    -A, --wa, --with-agent AGENT     Match hosts with a certain agent
    -I, --wi, --with-identity IDENT  Match hosts with a certain configured identity

Full request help

For demo purposes this is just a copy/paste from mco rpc; some refining will be needed to make this suck a bit less.

$ choria puppet runonce --request-help
Puppet agent version 1.11.1

Usage: choria puppet runonce [agent options] [request options]

Run Puppet agent, get its status, and enable/disable it

Options for the runonce action:

    Use --action-doc to get details about these such as types, defaults and valid values

Optional options:
        --force                      Will force a run immediately else subject to default splay time
        --server                     Address and port of the Puppet Master in server:port format
        --tags                       Restrict the Puppet run to a comma list of tags
        --noop                       Do a Puppet dry run
        --splay                      Sleep for a period before initiating the run
        --splaylimit                 Maximum amount of time to sleep before run
        --environment                Which Puppet environment to run
        --use_cached_catalog         Determine if to use the cached catalog or not

Additional help:
        --action-doc                 View the documentation for the runonce action
        --filter-help                Help on selecting which nodes to act on
        --request-help               View a full set of request options

Request Modifiers:
        --no-results, --nr           Do not process results, just send request
        --np, --no-progress          Do not show the progress bar
    -1, --one                        Send request to only one discovered node
        --batch SIZE                 Do requests in batches
        --batch-sleep SECONDS        Sleep time between batches
        --limit-seed NUMBER          Seed value for deterministic random batching
        --limit-nodes, --ln, --limit COUNT
                                     Send request to only a subset of nodes, can be a percentage
    -j, --json                       Produce JSON output
        --display MODE               Influence how results are displayed. One of ok, all or failed
    -c, --config FILE                Load configuration from file rather than default
    -v, --verbose                    Be verbose

    -T, --target COLLECTIVE          Target messages to a specific sub collective
        --dt, --discovery-timeout SECONDS
                                     Timeout for doing discovery
    -t, --timeout SECONDS            Timeout for calling remote agents
    -q, --quiet                      Do not be verbose
        --ttl TTL                    Set the message validity period
        --reply-to TARGET            Set a custom target for replies
        --dm, --disc-method METHOD   Which discovery method to use
        --do, --disc-option OPTION   Options to pass to the discovery method
        --nodes FILE                 List of nodes to address
        --publish_timeout TIMEOUT    Timeout for publishing requests to remote agents.
        --threaded                   Start publishing requests and receiving responses in threaded mode.
        --sort                       Sort the output of a request before processing.
        --connection-timeout TIMEOUT Set the timeout for establishing a connection to the middleware

explore live provisioning

In large setups the desired middleware etc might not be known upfront when the machine is being installed.

Imagine a machine built in a DC so large that there are multiple choria networks in the same DC, or perhaps you just want to create separate networks for whatever reason.

I imagine a process like this:

  • At start look for a configuration file with choria.provision=1 set, go into provision mode if a provision server was compiled in

At this point if in provision mode it will try to connect to a compiled in nats server, or maybe one given on the CLI:

  • Connects to the provision collective
  • Publishes metadata without splay every n seconds
  • Wait for something to interact with the provision agent to tell it its configuration

The provisioning agent:

  • Receives a request to store configuration - which includes choria.provision=0
  • Writes the configuration and re-execs itself with the new config

We now have a normal configured choria, it:

  • Starts the provision agent should there be a provision URL compiled in
  • The provision agent exposes just a reprovision action that makes it write choria.provision=1, copy over logging and registration settings from the running instance, and reload

Any configuration file that gets loaded into the framework - even ones passed into it - should adjust itself this way when provisioning is on:

  • Turn off federation
  • Set main_collective and collectives to provisioning
  • Set registration interval to 120
  • Disable registration splay
  • Set the file_content registration target to choria.provisioning_data
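The configure-then-restart step above could use exec so the process is replaced in place; a rough sketch, where the CLI argument layout is made up for illustration:

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// reexecArgs builds the argument vector used to restart the daemon with
// its freshly written configuration; the flag layout is hypothetical.
func reexecArgs(self string, configFile string) []string {
	return []string{self, "server", "run", "--config", configFile}
}

// reexec replaces the running process with a new instance reading the
// provisioned configuration (Unix only, via execve).
func reexec(configFile string) error {
	self, err := os.Executable()
	if err != nil {
		return err
	}

	return syscall.Exec(self, reexecArgs(self, configFile), os.Environ())
}

func main() {
	fmt.Println(reexecArgs("/usr/sbin/choria", "/etc/choria/server.conf"))
}
```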

check request TTLs

The protocol layer does not check TTLs, so we have to do it in server.handleRawMessage.

fix TLS route connections

In version 0.0.2, trying to set up TLS routes yields:

{"component":"network_broker","level":"debug","msg":"192.168.88.39:5222 - rid:2 - TLS route handshake error: x509: certificate signed by unknown authority","time":"2017-12-10T13:51:44Z"}
{"component":"network_broker","level":"debug","msg":"192.168.88.39:5222 - rid:2 - Router connection closed","time":"2017-12-10T13:51:44Z"}

Appears we're missing some TLS setup from NATS still

cache dns lookups

The way the go nats package resolves servers results in many concurrent DNS lookups in every worker in every federation broker and adapter.

This stuff should be shared and cached - even a 5 second cache will help a ton

improve writing adapters

The current adapter is the first possible thing that worked - hacky, and it just served the need I had at the time.

A better adapter framework should exist. NATS ingest seems likely to be the most prolific use, so that should be a parameter to a well written package; the other side can stay roughly as it is now, but the whole thing should be written around channels and context for plumbing rather than the meh way it is now.
