
constructr's People

Contributors

berardino, everpeace, fcecilia, gerson24, gitter-badger, hrenovcik, hseeberger, jasongoodwin, markusjura, matsluni, nick-nachos, raboof, rubendg, sergigp

constructr's Issues

Handle MemberRemoved

In the case of a graceful removal of a member node, ConstructR needs to either remove the respective entry from the KV store or shut down the system. I think I prefer the latter, because a system using ConstructR is intended as a cluster member, right?
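
A minimal sketch of the shut-down option, assuming a plain watcher actor rather than the actual ConstructR internals (the class name is made up for illustration):

import akka.actor.{ Actor, ActorLogging, Props }
import akka.cluster.Cluster
import akka.cluster.ClusterEvent.MemberRemoved

// Hypothetical watcher: terminate the ActorSystem once this node has been
// gracefully removed from the cluster.
class SelfRemovedWatcher extends Actor with ActorLogging {
  private val cluster = Cluster(context.system)

  override def preStart(): Unit = cluster.subscribe(self, classOf[MemberRemoved])
  override def postStop(): Unit = cluster.unsubscribe(self)

  override def receive: Receive = {
    case MemberRemoved(member, _) if member.address == cluster.selfAddress =>
      log.warning("Removed from the cluster, terminating the actor system")
      context.system.terminate()
    case _ => // ignore other cluster messages, e.g. the initial CurrentClusterState
  }
}

object SelfRemovedWatcher {
  def props: Props = Props(new SelfRemovedWatcher)
}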

'docker-machine ip default' fails

When trying to run the tests, I usually get:

[info] * de.heikoseeberger.constructr.akka.MultiNodeConsulConstructrSpec
[JVM-5] Error saving host to store: remove /home/aengelen/.docker/machine/machines/default/config.json: no such file or directory
[JVM-5] *** RUN ABORTED ***
[JVM-5]   java.lang.ExceptionInInitializerError:
[JVM-5]   at de.heikoseeberger.constructr.akka.MultiNodeConsulConstructrSpec.<init>(MultiNodeConsulConstructrSpec.scala:67)
[JVM-5]   at de.heikoseeberger.constructr.akka.MultiNodeConsulConstructrSpecMultiJvmNode5.<init>(MultiNodeConsulConstructrSpec.scala:65)
[JVM-5]   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
[JVM-5]   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
[JVM-5]   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
[JVM-5]   at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
[JVM-5]   at java.lang.Class.newInstance(Class.java:442)
[JVM-5]   at org.scalatest.tools.Runner$.genSuiteConfig(Runner.scala:2644)
[JVM-5]   at org.scalatest.tools.Runner$$anonfun$37.apply(Runner.scala:2461)
[JVM-5]   at org.scalatest.tools.Runner$$anonfun$37.apply(Runner.scala:2460)
[JVM-5]   ...
[JVM-5]   Cause: java.lang.RuntimeException: Nonzero exit value: 1
[JVM-5]   at scala.sys.package$.error(package.scala:27)
[JVM-5]   at scala.sys.process.ProcessBuilderImpl$AbstractBuilder.slurp(ProcessBuilderImpl.scala:132)
[JVM-5]   at scala.sys.process.ProcessBuilderImpl$AbstractBuilder.$bang$bang(ProcessBuilderImpl.scala:102)
[JVM-5]   at de.heikoseeberger.constructr.akka.ConsulConstructrMultiNodeConfig$.<init>(MultiNodeConsulConstructrSpec.scala:40)
[JVM-5]   at de.heikoseeberger.constructr.akka.ConsulConstructrMultiNodeConfig$.<clinit>(MultiNodeConsulConstructrSpec.scala)
[JVM-5]   at de.heikoseeberger.constructr.akka.MultiNodeConsulConstructrSpec.<init>(MultiNodeConsulConstructrSpec.scala:67)
[JVM-5]   at de.heikoseeberger.constructr.akka.MultiNodeConsulConstructrSpecMultiJvmNode5.<init>(MultiNodeConsulConstructrSpec.scala:65)
[JVM-5]   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
[JVM-5]   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
[JVM-5]   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
[JVM-5]   ...

... which suggests 'docker-machine ip default' fails. The strange thing is that it sometimes works for the Etcd test.

Running 'docker-machine ip default' on the command-line works fine.

I'm running on Linux.

When hard-coding the docker-machine IP, the tests run without further problems.

Improve error message when adding self to consul fails

putKeyWithSession returns Success(false) when the PUT call to Consul returns status code 200 but the body of the response indicates that the update has not taken place (https://consul.io/docs/agent/http/kv.html).

In that case, the if result filter in ConsulCoordination.addSelf turns the result into a Failure(_) with the generic error message java.util.NoSuchElementException: Future.filter predicate is not satisfied, and there is no obvious clue in the stack trace either.
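
A sketch of why the message is so generic and how a more descriptive failure could be produced; putKeyWithSession is abstracted as a plain function here, this is not the actual ConsulCoordination code:

import scala.concurrent.{ ExecutionContext, Future }

// The guard in a for-comprehension over Futures is what produces
// "java.util.NoSuchElementException: Future.filter predicate is not satisfied".
def addSelfFiltered(putKeyWithSession: () => Future[Boolean])(implicit ec: ExecutionContext): Future[Unit] =
  for {
    result <- putKeyWithSession()
    if result // fails with the generic NoSuchElementException when result is false
  } yield ()

// A variant that fails with an explicit, descriptive error instead:
def addSelfDescriptive(putKeyWithSession: () => Future[Boolean])(implicit ec: ExecutionContext): Future[Unit] =
  putKeyWithSession().flatMap {
    case true  => Future.successful(())
    case false => Future.failed(new IllegalStateException("Consul did not apply the PUT: the session could not acquire the key"))
  }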

Using the consul service catalog

When using consul, ConstructR currently uses the consul KV store both for storing the lock that determines whether there is already a seed node initializing, and for storing the addresses of seed nodes.

Wouldn't it make sense to use the consul service catalog to keep track of the addresses of seed nodes?

Rethink failure handling

Currently ConstructR is pretty aggressive when it comes to failure: in many cases the system is terminated. Furthermore, it's not easy to customize failure handling.

My current thoughts are:

  • Make things as simple as possible
  • By default the system should not terminate in the face of failure

ZK

Any plans to add support for zookeeper?

Nice project!

Outgoing Connection 'caches' the IP in Coordination

In Constructr.scala the connection is initialized and the resolved IPs are cached, so the same one is used in every flow materialization. As a result, if that Consul node goes down, ConstructR won't try another IP.

val connection = Http()(context.system).outgoingConnection(host, port)
Coordination("akka", context.system.name, context.system.settings.config)(connection, ActorMaterializer())

This behaviour can be observed when the coordination cluster nodes are addressed via DNS: the name is resolved once and the same IP is reused afterwards.

Possible related info:

akka/akka#19419
https://gitter.im/akka/akka?at=569552acee13050b38a32091
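
A sketch of one possible direction, not a drop-in fix: instead of materializing a single cached outgoingConnection (which pins one resolved IP for the lifetime of the actor), each coordination call could go through the host connection pool via singleRequest, so connections are (re-)established per request:

import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.model.{ HttpRequest, HttpResponse }
import akka.stream.ActorMaterializer
import scala.concurrent.Future

// Hypothetical helper, not part of ConstructR: send every coordination request
// through the connection pool instead of a single pre-materialized connection flow.
def send(request: HttpRequest)(implicit system: ActorSystem, mat: ActorMaterializer): Future[HttpResponse] =
  Http()(system).singleRequest(request)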

Uniform treatment of outdated refresh

As described in #22, etcd doesn't really care about refreshing an outdated (TTLed) entry; it simply creates a new one. Consul, in contrast, returns NotFound when trying to refresh an outdated session.

Therefore we need to change the general behavior to transition back to AddingSelf when refreshing runs into this issue. For etcd we can simply add prevExist=true to the PUT to also get a NotFound.

A RefreshResult needs to be introduced with the existing Refreshed and a new SelfNotFound as subtypes.
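
A minimal sketch of the proposed type; the names are taken from this issue, the exact shape is an assumption:

// Result of a refresh attempt against the coordination backend.
sealed trait RefreshResult

object RefreshResult {
  case object Refreshed    extends RefreshResult // entry was still present and has been refreshed
  case object SelfNotFound extends RefreshResult // entry has expired and is gone, transition back to AddingSelf
}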

Isolated mode

I have an application that is intended to run clustered, but e.g. for local development it might be convenient to be able to run a single, isolated instance without connecting to etcd/consul.

What would be a convenient way to achieve that? Perhaps we could introduce a 'dummy' coordination backend, and I could add an option to select it in my application?
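
One possible shape, sketched against a simplified, hypothetical coordination interface; this is not the actual ConstructR Coordination trait, just an illustration of a backend where every operation succeeds locally without contacting etcd/consul:

import scala.concurrent.Future
import scala.concurrent.duration.FiniteDuration

// Hypothetical, simplified interface for illustration only.
trait SimpleCoordination[N] {
  def getNodes(): Future[Set[N]]
  def lock(self: N, ttl: FiniteDuration): Future[Boolean]
  def addSelf(self: N, ttl: FiniteDuration): Future[Unit]
  def refresh(self: N, ttl: FiniteDuration): Future[Unit]
}

// 'Dummy' backend: the node always sees an empty cluster, always wins the lock
// and never talks to an external coordination service.
class NoopCoordination[N] extends SimpleCoordination[N] {
  override def getNodes(): Future[Set[N]]                          = Future.successful(Set.empty)
  override def lock(self: N, ttl: FiniteDuration): Future[Boolean] = Future.successful(true)
  override def addSelf(self: N, ttl: FiniteDuration): Future[Unit] = Future.successful(())
  override def refresh(self: N, ttl: FiniteDuration): Future[Unit] = Future.successful(())
}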

Consul support

Excellent work! Thanks for sharing this. Do you have any plans to support Consul in addition to etcd in the near future? Would you accept PRs in this regard?

More control over what happens when coordination fails

The README still mentions: If something goes wrong, e.g. a timeout (after configurable retries are exhausted) when interacting with the coordination service, ConstructR by default terminates its ActorSystem. At least for constructr-akka this can be changed by providing a custom SupervisorStrategy to the manually started Constructr actor, but be sure you know what you are doing.

The latter is no longer the case, right? de.heikoseeberger.constructr.akka.Constructr is final, fixes its supervisor strategy to SupervisorStrategy.stoppingStrategy, and terminates its ActorSystem when coordination terminates.

I'd like to have some more control over what to do when coordination fails.

Allow existing nodes in ConsulCoordination.addSelf

If a node is restarted quickly, addSelf doesn't behave correctly, since a previous session can already have acquired the key, so the new session cannot obtain the lock. The former session will only expire after the TTL, and only then will the key disappear.

Make TTL-factor rules consistent

The README (and reference.conf) mention:

ttl-factor // Must be greater than 1 + (coordination-timeout * (1 + coordination-retries) / refresh-interval)!

However, ConstructrMachineSettings checks:

require(
  ttlFactor > 1 + coordinationTimeout / refreshInterval,
  s"ttl-factor must be greater than one plus coordination-timeout divided by refresh-interval, but was $ttlFactor!"
)

In other words: the automated check does not take into account the allowed number of coordination retries.

Do we want to make those consistent?
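
If they should match the README, a sketch of a check following the documented formula could look like this (the additional setting name coordinationRetries is an assumption, alongside the names already used in ConstructrMachineSettings):

require(
  ttlFactor > 1 + coordinationTimeout * (1 + coordinationRetries) / refreshInterval,
  s"ttl-factor must be greater than 1 + (coordination-timeout * (1 + coordination-retries) / refresh-interval), but was $ttlFactor!"
)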

Cleanup ttl handling

  • ttl handling in ConstructrMachine is confusing
  • ttl documentation is inconsistent (see #64)
  • ttl should always be a FiniteDuration

constructr-cassandra not found

Adding
resolvers += Resolver.bintrayRepo("hseeberger", "maven")

libraryDependencies ++= Vector(
"de.heikoseeberger" %% "constructr-cassandra" % "0.13.2",
...
)

in my build.sbt results in an unresolved dependency.

Note: Unresolved dependencies path:
[warn] de.heikoseeberger:constructr-cassandra_2.11:0.13.2 (/Users/elio/progetti/iptm/build.sbt#L29-70)
[warn] +- eu.sia.innhub:iptm_2.11:1.0
sbt.ResolveException: unresolved dependency: de.heikoseeberger#constructr-cassandra_2.11;0.13.2: not found

The library seems not to be present on bintray (http://dl.bintray.com/hseeberger/maven/de/heikoseeberger/)

[cas] Use official cassandra image

Currently we use hseeberger/cassandra, which is based on a PR against the official cassandra image. However, the PR won't be accepted, and hseeberger/cassandra will be dropped.

Using RUN we can rewrite docker-entrypoint.sh to use the seed-provider from ConstructR.

Check that ttl-factor > 1

While README.md and the comments in reference.conf say that "ttl-factor must be greater than one", that's never actually checked.
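
A minimal sketch of the missing check, assuming it would live next to the existing require calls in ConstructrMachineSettings:

require(ttlFactor > 1, s"ttl-factor must be greater than 1, but was $ttlFactor!")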

Not handling MemberJoined and MemberUp cluster events

It is possible that Akka cluster MemberJoined and MemberUp events are sent while the ConstructrMachine is in state AddingSelf. In this state the events are unhandled, and therefore the following messages are written as warnings to the log:

2016-09-01T13:38:46Z MacBook-Pro-6.local WARN  ConstructrMachine [sourceThread=conductr-akka.actor.default-dispatcher-2, akkaTimestamp=13:38:46.609UTC, akkaSource=akka.tcp://[email protected]:9024/user/reaper/constructr/constructr-machine, sourceActorSystem=conductr] - unhandled event MemberUp(Member(address = akka.tcp://[email protected]:9044, status = Up)) in state AddingSelf
2016-09-01T13:38:46Z MacBook-Pro-6.local WARN  ConstructrMachine [sourceThread=conductr-akka.actor.default-dispatcher-30, akkaSource=akka.tcp://[email protected]:9034/user/reaper/constructr/constructr-machine, sourceActorSystem=conductr, akkaTimestamp=13:38:46.141UTC] - unhandled event MemberJoined(Member(address = akka.tcp://[email protected]:9054, status = Joining)) in state AddingSelf

The current code first unsubscribes from the Akka cluster events before moving into the AddingSelf state. However, there might be a race condition: other MemberUp or MemberJoined events can still be received while Akka hasn't yet processed the unsubscription.
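
One way to swallow these late events instead of letting them reach the generic unhandled-event logging, sketched as a fragment assumed to sit inside the ConstructrMachine FSM definition (not the actual implementation):

import akka.cluster.ClusterEvent.{ MemberJoined, MemberUp }

whenUnhandled {
  case Event(_: MemberJoined | _: MemberUp, _) =>
    // Cluster events that raced the unsubscription; harmless while in AddingSelf.
    stay()
}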

Rethink coordination timeout handling

Currently coordination timeouts are handled via uniform retries. But as most coordination operations aren't idempotent (yet), this is a source of trouble. Also, dealing with TTLs gets nasty in the face of retries. I suggest the following changes:

  • GettingNodes: transition to BeforeGettingNodes, i.e. effectively unlimited retries
  • Locking: transition to BeforeGettingNodes, i.e. effectively unlimited retries; also make lock idempotent by first reading
  • AddingSelf: use explicit retries, fail after exhausted
  • Refreshing: unlimited retries

In all cases a warning or even an error should be logged.
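
To make the proposal concrete, here is a hypothetical sketch of the per-state retry policy expressed as data; the type, the state keys, and the retry count are illustrations, not the actual ConstructrMachine code:

sealed trait RetryPolicy
case object UnlimitedRetries              extends RetryPolicy
final case class LimitedRetries(max: Int) extends RetryPolicy

// GettingNodes and Locking fall back to BeforeGettingNodes, which effectively
// means unlimited retries; AddingSelf fails once its explicit retries are exhausted.
val retryPolicyByState: Map[String, RetryPolicy] = Map(
  "GettingNodes" -> UnlimitedRetries,
  "Locking"      -> UnlimitedRetries,
  "AddingSelf"   -> LimitedRetries(max = 2), // hypothetical retry count
  "Refreshing"   -> UnlimitedRetries
)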

Allow 201 Created as status code for EtcdCoordination.refresh

When, for some reason (a GC pause or other delays), refreshing doesn't happen in time, the entry in the etcd store has disappeared and hence the refresh results in a 201 Created instead of a 200 OK. This is perfectly fine, since the system is still up and running.
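
A sketch of the proposed change, not the actual EtcdCoordination code: treat both status codes as a successful refresh.

import akka.http.scaladsl.model.HttpResponse
import akka.http.scaladsl.model.StatusCodes.{ Created, OK }

def isRefreshSuccess(response: HttpResponse): Boolean =
  response.status match {
    case OK      => true // entry still existed and was refreshed
    case Created => true // entry had expired and was re-created
    case _       => false
  }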
