
etcd-mesos's People

Contributors

dallasmarlow, jdef, ozdanborne, pires, spacejam, sttts

etcd-mesos's Issues

add support for mesos persistent volumes for optional recovery

Optionally support Mesos persistent volumes for etcd storage. When they are in use:

  • the etcd-mesos scheduler checks ZK to see whether persistent volumes have been used previously
  • if so, only start an initial etcd node if it can reuse a previous persistent volume, performing the same --force-new-cluster setup used by the reseed codepath to clear stale cluster metadata
  • if not, create persistent volumes when starting nodes

other required changes:

  • on total cluster loss, don't give up: wait indefinitely for a volume to become available again
  • when a majority of the previous volumes are available and no nodes are alive, start a new cluster using all of them - raft should recover on its own
  • when only a minority are available, this gets trickier: compare raft state in a safe way without exposing ports to clients, pick the volume with the longest log to act as the reseed source, and destroy the other volumes
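
The decision table above could be sketched as follows. This is a hypothetical sketch, not the scheduler's actual state machine; all names (`planRecovery`, the `RecoveryAction` constants) are made up for illustration:

```go
package main

import "fmt"

// RecoveryAction is what the scheduler does on startup when persistent
// volumes are enabled. All names here are illustrative only.
type RecoveryAction int

const (
	WaitForVolumes       RecoveryAction = iota // total loss: block until a volume returns
	RestartFromVolumes                         // majority of volumes back: raft recovers
	ReseedFromBestVolume                       // minority available: pick the longest log
)

// planRecovery encodes the decision table: clusterSize is the desired
// cluster size, volumesAvailable the previously-used volumes we can reach.
func planRecovery(clusterSize, volumesAvailable int) RecoveryAction {
	switch {
	case volumesAvailable == 0:
		return WaitForVolumes
	case volumesAvailable > clusterSize/2:
		return RestartFromVolumes
	default:
		return ReseedFromBestVolume
	}
}

func main() {
	fmt.Println(planRecovery(3, 0), planRecovery(3, 2), planRecovery(3, 1)) // prints "0 1 2"
}
```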

simple http stats endpoint

desirable stats:

  1. raft index for each etcd peer
  2. is the last health check passing?
  3. counter of reseed events since scheduler started
  4. counter of etcd tasks launched
  5. current etcd term for cluster
  6. aggregated etcd stats (reads, writes, latency)
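
A minimal shape for such an endpoint might look like the sketch below. The struct fields and JSON keys are illustrative, not a settled API; counters would be bumped atomically from the scheduler's event loop and served from the existing admin port:

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync/atomic"
)

// Stats mirrors the wishlist above; field names are illustrative only.
type Stats struct {
	RaftIndexByPeer map[string]uint64 `json:"raft_index_by_peer"`
	HealthCheckOK   bool              `json:"last_health_check_ok"`
	ReseedEvents    int64             `json:"reseed_events"`
	TasksLaunched   int64             `json:"tasks_launched"`
	ClusterTerm     uint64            `json:"cluster_term"`
}

type statsServer struct{ reseeds, launches int64 }

// snapshot assembles a consistent view of the counters.
func (s *statsServer) snapshot() Stats {
	return Stats{
		ReseedEvents:  atomic.LoadInt64(&s.reseeds),
		TasksLaunched: atomic.LoadInt64(&s.launches),
	}
}

func main() {
	s := &statsServer{}
	atomic.AddInt64(&s.launches, 3)
	atomic.AddInt64(&s.reseeds, 1)
	// In the scheduler this would back an admin-port handler, roughly:
	//   http.HandleFunc("/stats", func(w http.ResponseWriter, r *http.Request) {
	//       json.NewEncoder(w).Encode(s.snapshot())
	//   })
	b, _ := json.Marshal(s.snapshot())
	fmt.Println(string(b))
}
```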

nuke PumpTheBrakes

PumpTheBrakes was added as a development stopgap before proper state verification was performed prior to launching new tasks. It's probably safe to remove now, but we should run fault injection without it first to build confidence.

support manual cluster reseed

situation:

  1. a previous etcd-mesos cluster got into an unrecoverable state
  2. the operator performed a manual backup of an etcd server

Currently, the only way to seed a new cluster from such a backup is to manually traverse and duplicate keys in an etcd server. We should support an operator supplying a restore argument containing a compressed etcd storage directory, which will be used to initialize a new cluster.

configurable static port

Some environments simply can't rely on SRV records or mesos state.json. We should provide an option to only accept offers that include a specific port, with the caveat that this makes it harder to schedule on an arbitrary node.
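
The offer-side check is simple: the required port must fall in one of the offer's port ranges. A sketch with a stand-in range type (the real one lives in mesosproto) and a hypothetical flag name:

```go
package main

import "fmt"

// portRange mirrors the begin/end pairs carried in a mesos offer's "ports"
// resource; the real type lives in mesosproto, this is a stand-in.
type portRange struct{ begin, end uint64 }

// offersPort reports whether a fixed, operator-configured port (a
// hypothetical --static-port style flag) is contained in the offer, so the
// scheduler can decline offers that can't satisfy it.
func offersPort(ranges []portRange, port uint64) bool {
	for _, r := range ranges {
		if port >= r.begin && port <= r.end {
			return true
		}
	}
	return false
}

func main() {
	ranges := []portRange{{31000, 32000}}
	fmt.Println(offersPort(ranges, 31234), offersPort(ranges, 4001)) // prints "true false"
}
```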

automatic backups to s3/hdfs/nfs/gcs

Every backup-interval seconds, an etcd executor should check whether the etcd server running under it is the elected leader. If it is, it should back up, compress, and upload a copy of the storage directory to s3/hdfs/nfs/gcs.

Accept a flag to set framework name

The framework name is used to build Mesos-DNS A records. By default Mesos sets this to the FrameworkID, which dependent services cannot predict.

etcd deployment fails with DCOS if framework found in Zookeeper

I set up etcd on my cluster using the DCOS CLI a first time and it worked. I then uninstalled it. A couple of days later I decided to reinstall, but since then every installation has failed.
It seems the reason is that the framework is found in Zookeeper but fails to restore. Here is the failure trace from the stderr file in mesos (I only changed the IPs to x.x.x.x (agent) and y.y.y.y (mesos master)):

+ /work/bin/etcd-mesos-scheduler -alsologtostderr=true -framework-name=etcd -cluster-size=3 -master=zk://master.mesos:2181/mesos -zk-framework-persist=zk://master.mesos:2181/etcd -v=1 -auto-reseed=true -reseed-timeout=240 -sandbox-disk-limit=4096 -sandbox-cpu-limit=1 -sandbox-mem-limit=2048 -admin-port=3356 -driver-port=3357 -artifact-port=3358 -framework-weburi=http://etcd.marathon.mesos:3356/stats
I0222 04:14:30.573426       7 app.go:218] Found stored framework ID in Zookeeper, attempting to re-use: b9ff885a-c67e-4ec5-89cc-3b9d8fc0ef54-0003
I0222 04:14:30.575267       7 scheduler.go:209] found failover_timeout = 168h0m0s
I0222 04:14:30.575363       7 scheduler.go:323] Initializing mesos scheduler driver
I0222 04:14:30.575473       7 scheduler.go:792] Starting the scheduler driver...
I0222 04:14:30.575552       7 http_transporter.go:407] listening on x.x.x.x port 3357
I0222 04:14:30.575588       7 scheduler.go:809] Mesos scheduler driver started with PID=scheduler(1)@10.32.0.4:3357
I0222 04:14:30.575625       7 scheduler.go:821] starting master detector *zoo.MasterDetector: &{client:<nil> leaderNode: bootstrapLock:{w:{state:0 sema:0} writerSem:0 readerSem:0 readerCount:0 readerWait:0} bootstrapFunc:0x7991c0 ignoreInstalled:0 minDetectorCyclePeriod:1000000000 done:0xc2080548a0 cancel:0x7991b0}
I0222 04:14:30.575746       7 scheduler.go:999] Scheduler driver running.  Waiting to be stopped.
I0222 04:14:30.575776       7 scheduler.go:663] running instances: 0 desired: 3 offers: 0
I0222 04:14:30.575799       7 scheduler.go:671] PeriodicLaunchRequestor skipping due to Immutable scheduler state.
I0222 04:14:30.575811       7 scheduler.go:1033] Admin HTTP interface Listening on port 3356
I0222 04:14:30.607180       7 scheduler.go:374] New master master@y.y.y.y:5050 detected
I0222 04:14:30.607306       7 scheduler.go:435] No credentials were provided. Attempting to register scheduler without authentication.
I0222 04:14:30.607466       7 scheduler.go:922] Reregistering with master: master@y.y.y.y:5050
I0222 04:14:30.607656       7 scheduler.go:881] will retry registration in 1.254807398s if necessary
I0222 04:14:30.610527       7 scheduler.go:769] Handling framework error event.
I0222 04:14:30.610636       7 scheduler.go:1081] Aborting framework [&FrameworkID{Value:*b9ff885a-c67e-4ec5-89cc-3b9d8fc0ef54-0003,XXX_unrecognized:[],}]
I0222 04:14:30.610890       7 scheduler.go:1062] stopping messenger
I0222 04:14:30.610985       7 messenger.go:269] stopping messenger..
I0222 04:14:30.611076       7 http_transporter.go:476] stopping HTTP transport
I0222 04:14:30.611168       7 scheduler.go:1065] Stop() complete with status DRIVER_ABORTED error <nil>
I0222 04:14:30.611262       7 scheduler.go:1051] Sending error via withScheduler: Framework has been removed
I0222 04:14:30.611366       7 scheduler.go:298] stopping scheduler event queue..
I0222 04:14:30.611504       7 http_transporter.go:450] HTTP server stopped because of shutdown
I0222 04:14:30.611598       7 scheduler.go:444] Scheduler received error: Framework has been removed
I0222 04:14:30.611687       7 scheduler.go:444] Scheduler received error: Framework has been removed
I0222 04:14:30.611779       7 scheduler.go:250] finished processing scheduler events

Any suggestions on how to fix the deployment?

investigate patching + upstreaming proxy to bail out or requery when reconnected to a different clusterid

The current etcd proxy, when using discovery-srv to find peers from SRV records, never requeries the SRV record. As a result, when it loses connectivity to the cluster's servers, it blocks forever trying to reconnect to the last known nodes. If a different etcd cluster comes up and happens to reuse the same ports on one of the hosts, the proxy will start using it as the source of truth. Note that reseeding also generates a new cluster ID. Can we have the proxy bail out if it detects that it has ever talked to nodes with two different cluster IDs?
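
The guard being proposed is small: remember the first cluster ID seen and refuse to continue on any mismatch, so a supervisor can restart the proxy and re-resolve the SRV record. A sketch (etcd's proxy has no such hook today; `clusterGuard` is hypothetical):

```go
package main

import (
	"errors"
	"fmt"
)

// clusterGuard remembers the first cluster ID the proxy talks to and errors
// out on any mismatch. Hypothetical sketch of the upstream patch.
type clusterGuard struct{ id string }

var errClusterChanged = errors.New("saw responses from two different cluster IDs; bailing out to requery SRV")

// check is called with the cluster ID carried in each backend response.
func (g *clusterGuard) check(clusterID string) error {
	if g.id == "" {
		g.id = clusterID // first contact pins the cluster identity
		return nil
	}
	if g.id != clusterID {
		return errClusterChanged // a reseed or port reuse swapped clusters
	}
	return nil
}

func main() {
	g := &clusterGuard{}
	fmt.Println(g.check("36e76486cf49fa77"))        // prints "<nil>"
	fmt.Println(g.check("deadbeefdeadbeef") != nil) // prints "true": reseed minted a new ID
}
```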

determine healthy disk size

during load testing, some etcd nodes crashed during log compaction with the following error:

2015/08/17 17:24:41 etcdserver: raft save state and entries error: write etcd_data/member/wal/000000000000001f-00000000007582d2.wal: no space left on device

As can be seen in etcd-io/etcd#3300, etcd can use double the required disk space to complete a compaction. We need to determine a sandbox disk size that is safe for most workloads.

release channels for 0.22 and 0.24+0.23

We should create branches for different release channels. My thinking is to have branches corresponding to mesos versions: each pulls in version-agnostic changes from master and applies its own version-specific logic (no persistent volume support in the 0.22 branch, etc.).

tune down backoffs

In some cases, progress cannot be made until an RPC to an unhealthy etcd instance times out. Investigate turning down some of the backoff timeouts.

bail out if reregistered version != registered version

Because the mesos-go bindings are version-specific, and we'd like to support mesos 0.22 through 0.24, if we reregister and the master version has increased we should bail out so that a wrapper script can restart us with the proper mesos-go version. There is a race here - the master may be upgraded after the detector wrapper runs but before registration completes - but the window is small.
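
The bail-out check reduces to a dotted-version comparison between the version recorded at registration and the one seen at reregistration. A sketch (the function name and the exact exit mechanism are assumptions):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// versionLess compares dotted mesos version strings, e.g. "0.22.1" < "0.24.0".
// Non-numeric components compare as zero; good enough for mesos releases.
func versionLess(a, b string) bool {
	as, bs := strings.Split(a, "."), strings.Split(b, ".")
	for i := 0; i < len(as) && i < len(bs); i++ {
		ai, _ := strconv.Atoi(as[i])
		bi, _ := strconv.Atoi(bs[i])
		if ai != bi {
			return ai < bi
		}
	}
	return len(as) < len(bs)
}

func main() {
	registered, reregistered := "0.22.1", "0.24.0"
	if versionLess(registered, reregistered) {
		// In the scheduler this would be an os.Exit so the wrapper script
		// can relaunch the binary built against the newer mesos-go.
		fmt.Println("master upgraded; exiting so the wrapper can relaunch")
	}
}
```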

automatic re-seeding sounds dangerous

Hey, I just stumbled across the project, and I'm concerned about the automatic re-seeding. If you can't access a majority of an etcd/raft cluster, you don't know if you've lost committed data. Rolling with it breaks the promise that etcd/raft makes to its clients, so automatically trying to recover seems like it could cause a lot of trouble. Making this the default behavior is even worse.

Do you have evidence that automating this is even necessary in practice? I like automated systems too, but I'd rather get human approval any time I'm admitting possible data loss.

cc @philips

mesos version detector script

Create a script that detects the mesos version and launches the appropriate etcd-mesos binary compiled for that version of mesos-go.

missing latest mesos-go in Godeps?

$ make
rm bin/etcd-*
rm -f /home/vagrant/etcd-mesos-workspace/src/github.com/mesosphere/etcd-mesos/Godeps/_workspace/src/""github.com/mesosphere"/etcd-mesos"
mkdir -p /home/vagrant/etcd-mesos-workspace/src/github.com/mesosphere/etcd-mesos/Godeps/_workspace/src/"github.com/mesosphere"
ln -s /home/vagrant/etcd-mesos-workspace/src/github.com/mesosphere/etcd-mesos /home/vagrant/etcd-mesos-workspace/src/github.com/mesosphere/etcd-mesos/Godeps/_workspace/src/""github.com/mesosphere"/etcd-mesos"
go build -o bin/etcd-mesos-executor cmd/etcd-mesos-executor/app.go
go build -o bin/etcd-mesos-scheduler cmd/etcd-mesos-scheduler/app.go
# github.com/mesosphere/etcd-mesos/scheduler
Godeps/_workspace/src/github.com/mesosphere/etcd-mesos/scheduler/scheduler.go:964: unknown mesosproto.TaskInfo field 'Discovery' in struct literal
make: *** [bin/etcd-mesos-scheduler] Error 2

add docs

  • architecture
  • differences between etcd and etcd-mesos
  • configuration
  • deployment & administration guide

claim zk ephemeral znode before registering

This allows multiple instances of the scheduler to run in HA mode if desired, and prevents multiple nodes from alternately kicking each other off the mesos master when they register.

use mesos DiscoveryInfo to cut down on the number of returned ports in mesos-dns

Kubernetes-mesos is currently unable to use etcd-mesos, as the etcd proxy that will be local to the k8s components fails to parse the mesos-dns response if it is larger than 512 bytes (presumed 512; 768 failed but 112 worked). We need to cut down on the number of ports returned, which is currently 6 per etcd instance (udp and tcp for each of the client, peer, and reseed listeners).

add tests for reseed logic

tests should cover:

  • ensuring that the node with the highest raft index is chosen
  • ensuring that when the first node fails to come online, the next one is attempted
  • ensuring that no nodes are killed unless a node has been successfully established as the new seed
  • ensuring that when a new seed is chosen and the scheduler dies before killing the old nodes, it knows on restart to kill the unhealthy stale cluster (it should perform a new reseed, but pick the same node as before, since it has the higher raft index)
  • ensuring that the above logic works when the scheduler starts while nodes are in a livelocked state
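
The first two bullets boil down to candidate ordering, which is easy to cover with a table-driven check like the sketch below (`pickSeed` is a stand-in name; the real selection lives in the scheduler's reseed path):

```go
package main

import (
	"fmt"
	"sort"
)

// pickSeed returns peer names ordered by descending raft index: the order in
// which reseed should be attempted. Stand-in for the scheduler's selection.
func pickSeed(index map[string]uint64) []string {
	peers := make([]string, 0, len(index))
	for p := range index {
		peers = append(peers, p)
	}
	sort.Slice(peers, func(i, j int) bool { return index[peers[i]] > index[peers[j]] })
	return peers
}

func main() {
	// Table-driven cases of the kind the issue asks for.
	cases := []struct {
		index map[string]uint64
		want  string // expected first reseed candidate
	}{
		{map[string]uint64{"etcd-1": 100, "etcd-2": 250, "etcd-3": 180}, "etcd-2"},
		{map[string]uint64{"etcd-1": 7}, "etcd-1"},
	}
	for _, c := range cases {
		if got := pickSeed(c.index)[0]; got != c.want {
			panic(fmt.Sprintf("got %s, want %s", got, c.want))
		}
	}
	fmt.Println("ok")
}
```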

bad peer created on master restart

A bad peer is brought up when the master reconnects. Log from the failing member:

2015-09-28 18:39:19.540160 I | etcdmain: etcd Version: 2.2.0
2015-09-28 18:39:40.070822 I | etcdmain: Git SHA: e4561dd
2015-09-28 18:39:15.417100 I | etcdmain: Go Version: go1.5
2015-09-28 18:39:15.417108 I | etcdmain: Go OS/Arch: linux/amd64
2015-09-28 18:39:15.417120 I | etcdmain: setting maximum number of CPUs to 4, total number of available CPUs is 4
2015-09-28 18:39:15.417148 N | etcdmain: the server is already initialized as member before, starting as etcd member...
2015-09-28 18:39:15.417490 I | etcdmain: listening for peers on http://localhost.localdomain:31003
2015-09-28 18:39:15.417604 I | etcdmain: listening for client requests on http://localhost.localdomain:31004
2015-09-28 18:39:15.437030 I | netutil: resolving localhost.localdomain:31000 to 127.0.0.1:31000
2015-09-28 18:39:15.437101 I | netutil: resolving localhost.localdomain:31003 to 127.0.0.1:31003
2015-09-28 18:39:15.437187 I | etcdmain: stopping listening for client requests on http://localhost.localdomain:31004
2015-09-28 18:39:15.437204 I | etcdmain: stopping listening for peers on http://localhost.localdomain:31003
2015-09-28 18:39:15.437220 C | etcdmain: error validating peerURLs {ClusterID:36e76486cf49fa77 Members:[&{ID:6429236700d6f390 RaftAttributes:{PeerURLs:[http://localhost.localdomain:32000]} Attributes:{Name:etcd-1443480720 ClientURLs:[http://localhost.localdomain:32001]}} &{ID:aaccb2ea791d9ae1 RaftAttributes:{PeerURLs:[http://localhost.localdomain:31000]} Attributes:{Name:etcd-1443480719 ClientURLs:[http://localhost.localdomain:31001]}}] RemovedMemberIDs:[]}: unmatched member while checking PeerURLs

This is possibly the result of clearing peers on disconnect/reconnect and then syncing with the master, which itself has an incomplete view. The quick fix is to hold off on reconciling long enough for slaves to check in. A better fix may be to persist known tasks in ZK.

fix logging and log rotation

executor logging is currently borked. we need to log:

  • fetcher info
  • executor output
  • etcd output

with rotation enabled for the etcd log, and possibly the executor log.

cluster re-seed support

When a cluster experiences livelock for reseed-timeout seconds, the scheduler should:

  1. determine the liveness of each etcd server
  2. determine the raft index of each live etcd server
  3. for each server, from the highest raft index to the lowest, try to reseed a cluster using that node. If it succeeds, try to create at least cluster-size / 2 NEW slave instances, and kill the other previous members if that succeeds.
  4. if none of the reseeds succeed, do NOTHING - an operator needs to perform a manual backup and restore, and we don't want to kill our tasks in the meantime, which would cause their mesos sandboxes to be rm'd.
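
Steps 3 and 4 can be sketched as the attempt loop below. The injected functions are stand-ins for the real work (the --force-new-cluster restart, task launches, task kills), and candidates are assumed to arrive already sorted by descending raft index:

```go
package main

import "fmt"

// reseed walks candidates (sorted by descending raft index) and attempts
// --force-new-cluster on each via tryReseed. On success it launches fresh
// members and only then kills the stale ones. If every attempt fails it does
// nothing, leaving sandboxes intact for a manual backup. All parameters here
// are illustrative stand-ins.
func reseed(candidates []string, tryReseed func(string) bool, launchNew func() int, killOld func(seed string)) bool {
	for _, node := range candidates {
		if !tryReseed(node) {
			continue // fall through to the next-highest raft index
		}
		// Only kill the previous members once enough fresh members joined.
		if launchNew() >= len(candidates)/2 {
			killOld(node)
		}
		return true
	}
	return false // operator intervention required; leave tasks untouched
}

func main() {
	killed := false
	ok := reseed([]string{"etcd-2", "etcd-1"},
		func(n string) bool { return n == "etcd-1" }, // etcd-2 fails to come online
		func() int { return 1 },                      // one fresh member launched
		func(seed string) { killed = true })
	fmt.Println(ok, killed) // prints "true true"
}
```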
