marathon's Introduction

Project Status

Support for DC/OS ends on October 31, 2021. We will continue to provide support for our current DC/OS customers per their contracts, of course. However, we will no longer be investing in new features or capabilities or maintaining the related repositories. If a customer wishes to continue use of the DC/OS Enterprise platform or other non-free DC/OS components, the customer can purchase an End-of-Life License or Perpetual Use License, however support is not included in these licenses and continued use of DC/OS will be at your own discretion and risk. We apologize for any inconvenience this may have caused.

We want to thank all of our loyal customers, particularly those DC/OS users who were fellow pioneers in the growth of the cloud native landscape from the beginning.

Marathon is a production-proven Apache Mesos framework for container orchestration. DC/OS is the easiest way to start using Marathon. Issues are tracked in JIRA.

Marathon provides a REST API for starting, stopping, and scaling applications. Marathon is written in Scala and can run in highly-available mode by running multiple copies. The state of running tasks gets stored in the Mesos state abstraction.
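As a rough illustration of the start, scale, and stop operations, the sketch below builds the corresponding HTTP requests in Python. It assumes the `/v2/apps` endpoints of recent Marathon versions and a Marathon instance at `http://localhost:8080`; the helper names (`build_request`, `start_app`, `scale_app`, `stop_app`, `send`) are hypothetical and not part of Marathon.

```python
import json
import urllib.request

# Assumption: a Marathon instance listening locally on port 8080.
MARATHON_URL = "http://localhost:8080"


def build_request(method, path, payload=None):
    """Build (but do not send) a JSON request against the Marathon REST API."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    return urllib.request.Request(
        MARATHON_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )


def start_app(app_id, cmd, instances=1, cpus=0.1, mem=32):
    """Request that creates an application; app_id starts with a slash."""
    app = {"id": app_id, "cmd": cmd, "instances": instances, "cpus": cpus, "mem": mem}
    return build_request("POST", "/v2/apps", app)


def scale_app(app_id, instances):
    """Request that changes the number of instances of an existing app."""
    return build_request("PUT", "/v2/apps" + app_id, {"instances": instances})


def stop_app(app_id):
    """Request that removes the application and its tasks."""
    return build_request("DELETE", "/v2/apps" + app_id)


def send(request):
    """Send a prepared request; this needs a reachable Marathon instance."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `send(start_app("/sleep", "sleep 600"))` would submit a one-instance app, and `send(scale_app("/sleep", 3))` would scale it to three instances. Building and sending are kept separate so the payloads can be inspected without a running cluster.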

Marathon is also used as a meta framework: you can start other Mesos frameworks such as Chronos or Storm with it to ensure they survive machine failures. It can launch anything that can be launched in a standard shell. In fact, you can even start other Marathon instances via Marathon.

Features

  • HA - run any number of Marathon schedulers, but only one gets elected as leader; if you access a non-leader, your request gets proxied to the current leader
  • Constraints - e.g., only one instance of an application per rack, node, etc.
  • Service Discovery & Load Balancing via HAProxy or the events API (see below).
  • Health Checks: check your application's health via HTTP or TCP checks.
  • Event Subscription lets you supply an HTTP endpoint to receive notifications, for example to integrate with an external load balancer.
  • Marathon UI
  • JSON/REST API for easy integration and scriptability
  • Basic Auth and SSL
  • Metrics: query them at /metrics in JSON format, push them to systems like Graphite, StatsD and DataDog, or scrape them using Prometheus.
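To make the Constraints and Health Checks features more concrete, here is a minimal sketch of an app definition that uses both. The field names follow the v2 app definition format as we understand it, so please check them against the Marathon version you run; the function name `app_definition` and all numeric values are illustrative assumptions.

```python
import json


def app_definition(app_id, cmd, instances):
    """Return an app definition with one constraint and one HTTP health check."""
    return {
        "id": app_id,
        "cmd": cmd,
        "instances": instances,
        "cpus": 0.1,
        "mem": 64,
        # Constraint: place at most one instance of this app on each host.
        "constraints": [["hostname", "UNIQUE"]],
        # Health check: poll an HTTP path on the first port assigned to the task.
        "healthChecks": [
            {
                "protocol": "HTTP",
                "path": "/",
                "portIndex": 0,
                "gracePeriodSeconds": 30,
                "intervalSeconds": 10,
                "timeoutSeconds": 5,
                "maxConsecutiveFailures": 3,
            }
        ],
    }


def to_json(definition):
    """Serialize the definition the way it would be sent to the REST API."""
    return json.dumps(definition, indent=2, sort_keys=True)
```

The resulting JSON document can be submitted to the REST API like any other app definition.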

Documentation

Marathon documentation is available on the Marathon GitHub pages site.

Documentation for installing and configuring the full Mesosphere stack including Mesos and Marathon is available on the Mesosphere website.

Issue Tracking

Marathon uses JIRA to track issues. You can browse existing issues or file a new issue with your GitHub account.

Note for users of GitHub issues: All existing issues have been migrated and closed, and a reference to the related JIRA issue has been added as a comment. We leave the GitHub issues available for reference. Going forward, please always use JIRA.

Contributing

We heartily welcome external contributions to Marathon's documentation. Documentation should be committed to the master branch and published to our GitHub pages site using the instructions in docs/README.md.

Setting Up And Running Marathon

Dependencies

Marathon has the following compile-time dependencies:

  • sbt - A build tool for Scala. Instructions for installing sbt on Mac OS X and Linux are available in the sbt documentation.
  • JDK 1.8+

For run-time, Marathon has the following dependencies:

  • libmesos - JNI bindings for talking to Apache Mesos master. Look at the Install Mesos section for instructions to get libmesos.
  • Apache Zookeeper - You can have a separate Zookeeper installation specifically for Marathon, or you can use the same Zookeeper used by Mesos.

Installation

Getting started with DC/OS

By far the easiest way to get Marathon running is to use DC/OS. Marathon comes pre-bundled with DC/OS.

Install Mesos

Marathon requires libmesos, a shared object library that contains the JNI bindings Marathon uses to talk to the Mesos master. libmesos comes as part of the Apache Mesos installation. There are three options for installing Apache Mesos.

Installing Mesos from prepackaged releases

Instructions on how to install prepackaged releases of Mesos are available in the Marathon docs.

Building Mesos from source

NOTE: Choose this option only if building Marathon from source, else there might be version incompatibility between pre-packaged releases of Marathon and Mesos built from source.

You can find the instructions for compiling Mesos from source in the Apache Mesos getting started docs. If you want Mesos to install libraries and executables in a non-default location use the --prefix option during configuration as follows:

./configure --prefix=<path to Mesos installation>

Running make install will install libmesos (libmesos.so on Linux and libmesos.dylib on Mac OS X) in the install directory.

Using the Mesos Version Manager

NOTE: Choose this option only if building Marathon from source, else there might be version incompatibility between pre-packaged releases of Marathon and Mesos built from source.

The Mesos Version Manager (mvm) compiles, configures, and manages multiple versions of Apache Mesos. It allows switching between versions quickly, making it easy to test Marathon against different versions of Mesos.

Prerequisites

The Mesos Version Manager assumes that all dependencies of Apache Mesos are readily installed.
Please refer to the Apache Mesos getting started docs for instructions on how to set up the build environment.

MVM compiles Mesos with SSL support by default, which requires openssl and libevent to be installed.
On macOS, the packages can be installed using brew: brew install openssl libevent
On CentOS, the packages can be installed using yum: sudo yum install -y libevent-devel openssl-devel

Usage

The script can be run as follows:

    cd marathon
    cd tools
    ./mvm.sh <VERSION> [SHELL]

The following command will launch a bash shell configured for Mesos 1.2.0: ./mvm.sh 1.2.0 bash

You should consider placing the script into a folder in your shell's PATH if you are using it regularly.

The mvm script accepts three different formats as version name:

  1. Version tags from the Mesos repository. Use ./mvm.sh --tags in order to obtain a list of available tags.
  2. Commit hashes from the Mesos repository.
  3. The --latest flag, which automatically chooses the latest development version: ./mvm.sh --latest.

mvm will automatically download and compile Apache Mesos if necessary. It will then spawn a new bash shell with the chosen version of Mesos activated.
For more information see ./mvm.sh --help.

Note: You will have to re-run the script if you wish to use Mesos after closing the shell. See ./mvm.sh --help for information on how to permanently configure your shell for mvm to avoid this.

Install Marathon

Instructions on how to install prepackaged releases are available in the Marathon docs. Alternatively, you can build Marathon from source.

Building from Source
  1. To build Marathon from source, check out this repo and use sbt to build a universal:

    git clone https://github.com/mesosphere/marathon.git
    cd marathon
    sbt 'run --master localhost:5050 --zk zk://localhost:2181/marathon'
    

    Troubleshooting

    1. Failure in retrieval of IP address of the local machine will result in an error and may look like this:

      Failed to obtain the IP address for '<local-machine>'; the DNS service may not be able to resolve it: nodename nor servname provided, or not known

      Make sure that LIBPROCESS_IP environment variable is set.

      export LIBPROCESS_IP="127.0.0.1"
      
    2. When the MESOS_NATIVE_JAVA_LIBRARY environment variable is not set, the following error may occur:

      java.lang.UnsatisfiedLinkError: no mesos in java.library.path...

      Make sure that MESOS_NATIVE_JAVA_LIBRARY environment variable is set.

      export MESOS_NATIVE_JAVA_LIBRARY="/path/to/mesos/lib/libmesos.dylib"
      
  2. Run sbt universal:packageZipTarball to package Marathon as a txz file containing bin/marathon fully packaged.

  3. Run cd tools/packager; make tag-docker for a local Marathon docker image.
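The two environment variables from the troubleshooting notes above can be checked before starting sbt. The following is a small sketch of such a preflight check; the function name `missing_settings` is hypothetical, and an unset LIBPROCESS_IP is only a problem when the local host name cannot be resolved, so treat the output as hints rather than hard errors.

```python
import os

# Variables mentioned in the troubleshooting notes.
CHECKED_VARS = ("LIBPROCESS_IP", "MESOS_NATIVE_JAVA_LIBRARY")


def missing_settings(env=None):
    """Return hints about settings that may cause the errors described above."""
    env = os.environ if env is None else env
    problems = []
    for name in CHECKED_VARS:
        if not env.get(name):
            problems.append(name + " is not set")
    # If the library path is set, it should point to an existing file.
    library = env.get("MESOS_NATIVE_JAVA_LIBRARY")
    if library and not os.path.isfile(library):
        problems.append("MESOS_NATIVE_JAVA_LIBRARY points to a missing file: " + library)
    return problems


if __name__ == "__main__":
    for problem in missing_settings():
        print("warning: " + problem)
```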

Running in Development Mode

Mesos local mode allows you to run Marathon without launching a full Mesos cluster. It is meant for experimentation and not recommended for production use. Note that you still need to run ZooKeeper for storing state. The following command launches Marathon on Mesos in local mode. Point your web browser to http://localhost:8080 to see the Marathon UI.

    mesos-local
    sbt 'run --master localhost:5050 --zk zk://localhost:2181/marathon'

For more information on how to run Marathon in production and configuration options, see the Marathon docs.
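The --zk value used above names one or more ZooKeeper hosts followed by the path under which Marathon stores its state. The sketch below splits such a value into its parts. It assumes the simple form zk://host:port[,host:port...]/path without credentials, and the function name `parse_zk_url` is hypothetical.

```python
# ZooKeeper's default client port, used when a host entry omits the port.
DEFAULT_ZK_PORT = 2181


def parse_zk_url(url):
    """Split a zk:// URL into a list of (host, port) pairs and a state path."""
    prefix = "zk://"
    if not url.startswith(prefix):
        raise ValueError("expected a URL starting with zk://")
    hosts_part, slash, path = url[len(prefix):].partition("/")
    if not hosts_part or not slash or not path:
        raise ValueError("expected zk://host:port[,host:port...]/path")
    hosts = []
    for entry in hosts_part.split(","):
        host, _, port = entry.partition(":")
        hosts.append((host, int(port) if port else DEFAULT_ZK_PORT))
    return hosts, "/" + path
```

With the development-mode command above, `parse_zk_url("zk://localhost:2181/marathon")` yields a single host entry for localhost on port 2181 and the state path /marathon.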

Developing Marathon

See developing Marathon in the docs.

Marathon Clients

Companies using Marathon

Across all installations Marathon is managing applications on more than 100,000 nodes world-wide. These are some of the companies using it:

Not in the list? Open a pull request and add yourself!

Help

Have you found an issue? Feel free to report it using our JIRA Issues page. In order to speed up response times, we ask you to provide as much information on how to reproduce the problem as possible. If the issue is related in any way to the web UI, we kindly ask you to use the gui label.

If you have questions, please post on the Marathon Framework email list.

You can find Marathon support in the #marathon channel, and Mesos support in the #mesos channel, on freenode (IRC). Alternatively, check out the same channels on the Mesos Slack (request an invitation here).

The team at Mesosphere is also happy to answer any questions.

If you'd like to take part in design research and test new features in Marathon before they're released, please add your name to our UX Research list.

Authors

Marathon was created by Tobias Knaup and Florian Leibert and continues to be developed by the team at Mesosphere and by many contributors from the community.

Acknowledgements

YourKit, LLC

YourKit supports open source projects with its full-featured Java Profiler. YourKit, LLC is the creator of YourKit Java Profiler and YourKit .NET Profiler, innovative and intelligent tools for profiling Java and .NET applications.

marathon's People

Contributors

ajain13, alenkacz, aneumann82, aquamatthias, cmaloney, connordoyle, drexin, everpeace, florianleibert, gkleiman, guenter, ichernetsky, janisz, jasongilanfarr, jdef, jeschkies, kensipe, lloesche, meichstedt, meln1k, mlunoe, sascala, solidsnack, ssk2, ssorallen, sttts, timcharper, unterstein, wavesoft, zen-dog

marathon's Issues

Marathon continues to try to download resources it can't access

When creating a task with a tarball that is not accessible, Marathon retries indefinitely to start the app.

The screenshot below shows Marathon trying to run the following command a bunch of times:

marathon start -i rails-app -C 'cd rails_test_app && bundle exec rails s' \
  -n 1 -c 1 -m 99 -u http://datacentercomputer.s3.amazonaws.com/rails_test_app.tgz

[Screenshot taken 2013-07-23 at 3:45:33 pm]

Error when running `./bin/start.bash -m localhost:5000`

I am trying to run Marathon against a mesos-local setup, and I am getting this error:

Exception in thread "main" com.google.inject.CreationException: Guice creation errors:

1) Error injecting constructor, org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /marathon/apps
  at mesosphere.marathon.AppRegistry.<init>(AppRegistry.scala:24)
  at mesosphere.marathon.MarathonModule.configure(MarathonModule.scala:35)
  while locating mesosphere.marathon.AppRegistry
    for parameter 3 at mesosphere.marathon.MarathonScheduler.<init>(MarathonScheduler.scala:20)
  at mesosphere.marathon.MarathonModule.configure(MarathonModule.scala:34)
  while locating mesosphere.marathon.MarathonScheduler
    for parameter 4 at mesosphere.marathon.MarathonSchedulerService.<init>(MarathonSchedulerService.scala:30)
  at mesosphere.marathon.MarathonModule.configure(MarathonModule.scala:33)
  while locating mesosphere.marathon.MarathonSchedulerService
    for parameter 0 at mesosphere.marathon.api.RedirectFilter.<init>(RedirectFilter.scala:13)
  at mesosphere.marathon.api.MarathonRestModule.configureServlets(MarathonRestModule.scala:19)
  while locating mesosphere.marathon.api.RedirectFilter

1 error
    at com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:435)
    at com.google.inject.internal.InternalInjectorCreator.injectDynamically(InternalInjectorCreator.java:183)
    at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:109)
    at com.google.inject.Guice.createInjector(Guice.java:95)
    at com.google.inject.Guice.createInjector(Guice.java:72)
    at mesosphere.Application$class.run(Application.scala:31)
    at mesosphere.marathon.Main$.run(Main.scala:13)
    at mesosphere.marathon.Main$delayedInit$body.apply(Main.scala:29)
    at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
    at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
    at scala.App$$anonfun$main$1.apply(App.scala:71)
    at scala.App$$anonfun$main$1.apply(App.scala:71)
    at scala.collection.immutable.List.foreach(List.scala:309)
    at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:32)
    at scala.App$class.main(App.scala:71)
    at mesosphere.marathon.Main$.main(Main.scala:13)
    at mesosphere.marathon.Main.main(Main.scala)
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /marathon/apps
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
    at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1249)
    at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1277)
    at mesosphere.marathon.AppRegistry.<init>(AppRegistry.scala:36)
    at mesosphere.marathon.AppRegistry$$FastClassByGuice$$c6172113.newInstance(<generated>)
    at com.google.inject.internal.cglib.reflect.$FastConstructor.newInstance(FastConstructor.java:40)
    at com.google.inject.internal.DefaultConstructionProxyFactory$1.newInstance(DefaultConstructionProxyFactory.java:60)
    at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:85)
    at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:254)
    at com.google.inject.internal.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:46)
    at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1031)
    at com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40)
    at com.google.inject.Scopes$1$1.get(Scopes.java:65)
    at com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:40)
    at com.google.inject.internal.SingleParameterInjector.inject(SingleParameterInjector.java:38)
    at com.google.inject.internal.SingleParameterInjector.getAll(SingleParameterInjector.java:62)
    at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:84)
    at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:254)
    at com.google.inject.internal.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:46)
    at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1031)
    at com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40)
    at com.google.inject.Scopes$1$1.get(Scopes.java:65)
    at com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:40)
    at com.google.inject.internal.SingleParameterInjector.inject(SingleParameterInjector.java:38)
    at com.google.inject.internal.SingleParameterInjector.getAll(SingleParameterInjector.java:62)
    at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:84)
    at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:254)
    at com.google.inject.internal.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:46)
    at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1031)
    at com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40)
    at com.google.inject.Scopes$1$1.get(Scopes.java:65)
    at com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:40)
    at com.google.inject.internal.SingleParameterInjector.inject(SingleParameterInjector.java:38)
    at com.google.inject.internal.SingleParameterInjector.getAll(SingleParameterInjector.java:62)
    at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:84)
    at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:254)
    at com.google.inject.internal.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:46)
    at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1031)
    at com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40)
    at com.google.inject.Scopes$1$1.get(Scopes.java:65)
    at com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:40)
    at com.google.inject.internal.InternalInjectorCreator$1.call(InternalInjectorCreator.java:204)
    at com.google.inject.internal.InternalInjectorCreator$1.call(InternalInjectorCreator.java:198)
    at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1024)
    at com.google.inject.internal.InternalInjectorCreator.loadEagerSingletons(InternalInjectorCreator.java:198)
    at com.google.inject.internal.InternalInjectorCreator.injectDynamically(InternalInjectorCreator.java:179)
    ... 15 more

option to allow marathon to launch/stop tasks on slave in parallel to improve Docker integration

Hi,

Marathon starts and stops tasks on slaves serially. This is not ideal for launching lightweight Docker containers: it ends up taking almost 60 seconds to launch 10 containers, which would likely take less than 10 seconds if launched in parallel. Also, the 6 seconds to start a Docker container is suspicious; they launch in less than 1 second when started manually.

I tested starting 10 containers on a slave and they had the following start times in seconds:
1,7,12,18,23,29,35,40,46,52 etc.

5472ce303424 scivm/scientific-python-2.7:latest /usr/bin/supervisord 10 seconds ago Up 9 seconds 0.0.0.0:31012->8888/tcp, 22/tcp furious_wright

991550b65821 scivm/scientific-python-2.7:latest /usr/bin/supervisord 16 seconds ago Up 15 seconds 0.0.0.0:31020->8888/tcp, 22/tcp kickass_franklin

4eb86a81c0f4 scivm/scientific-python-2.7:latest /usr/bin/supervisord 21 seconds ago Up 20 seconds 0.0.0.0:31022->8888/tcp, 22/tcp focused_bohr

aa343d797e32 scivm/scientific-python-2.7:latest /usr/bin/supervisord 27 seconds ago Up 26 seconds 0.0.0.0:31156->8888/tcp, 22/tcp berserk_mclean

c40145cb3b0f scivm/scientific-python-2.7:latest /usr/bin/supervisord 32 seconds ago Up 31 seconds 0.0.0.0:31635->8888/tcp, 22/tcp berserk_thompson

5cc0d81a6ec1 scivm/scientific-python-2.7:latest /usr/bin/supervisord 38 seconds ago Up 38 seconds 0.0.0.0:31116->8888/tcp, 22/tcp sad_davinci

fa2fc55f303f scivm/scientific-python-2.7:latest /usr/bin/supervisord 44 seconds ago Up 43 seconds 0.0.0.0:31009->8888/tcp, 22/tcp furious_franklin

45bf7beb1ea5 scivm/scientific-python-2.7:latest /usr/bin/supervisord 49 seconds ago Up 48 seconds 0.0.0.0:31098->8888/tcp, 22/tcp hungry_albattani

df8779b6f721 scivm/scientific-python-2.7:latest /usr/bin/supervisord 55 seconds ago Up 54 seconds 0.0.0.0:31667->8888/tcp, 22/tcp drunk_mclean

f3419635b590 scivm/scientific-python-2.7:latest /usr/bin/supervisord About a minute ago Up About a minute 0.0.0.0:31642->8888/tcp, 22/tcp tender_pasteur

Thanks,
Mike

Add endpoint to push changes to apps

Customers can build tools that keep a connection open to Marathon so they can receive status updates, i.e. host:port being added/removed from an app. Use case is load balancing.
Could use HTTP long polling or Websockets. The advantage is that building tools against an HTTP API would be simple.

Example logging configuration

hello,

As a sysadmin, I would appreciate an example of how to make Marathon log to a file on the local filesystem.

Thanks

Support dynamic port allocations

Allow marathon to pick a port for the application and expose this via an environment variable that's used when starting the application.

The host:port should be persisted in zookeeper as well under:
/marathon/services/service_name/seq_node

This should contain the endpoints as well as host:id pairs as JSON.

@guenter

If a resource offer can accommodate multiple jobs in the queue, schedule multiple jobs at once

The current scheduling logic is fairly naive; it boils down to:

for each offer
  task = queue.peek
  if offer.matches(task)
    launch(task)
  else
    decline(offer)
    queue.add(task)
end

There are a couple of things that could be improved:

  • Use an offer to launch multiple tasks if it's big enough
  • Don't give up after the first task doesn't match, try other tasks in the queue
  • Try to satisfy constraints first, then tasks without constraints
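The first two improvements could look roughly like the sketch below. This is not Marathon's scheduler code: the `Task` and `Offer` classes, the `fits` check, and `match_offer` are hypothetical stand-ins that only consider CPU and memory, and constraint handling (the third point) is left out.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    cpus: float
    mem: float


@dataclass
class Offer:
    cpus: float
    mem: float


def fits(offer, task):
    """True if the offer still has enough CPU and memory for the task."""
    return task.cpus <= offer.cpus and task.mem <= offer.mem


def match_offer(offer, queue):
    """Pick every queued task that fits into the offer, in queue order.

    Tasks that do not fit are skipped instead of ending the search and
    are put back into the queue in their original order.
    """
    remaining = Offer(offer.cpus, offer.mem)
    launched = []
    skipped = deque()
    while queue:
        task = queue.popleft()
        if fits(remaining, task):
            # Reserve the task's share of the offer before looking further.
            remaining.cpus -= task.cpus
            remaining.mem -= task.mem
            launched.append(task)
        else:
            skipped.append(task)
    queue.extend(skipped)
    return launched
```

An empty result would mean the offer should be declined. Because skipped tasks keep their position relative to each other, a large task that cannot be placed yet does not block smaller tasks queued behind it.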

Show app status in web UI

Apps registered with Marathon are displayed in the UI, but whether they are actually running is not. Let's add that.

Docker Trusted Build

I read about the Docker Trusted Builds today and my first thought (well, after mesos itself) would be to package marathon. Is this something that's being worked on? If it's not, I'd love to tackle getting marathon to build in a Dockerfile so it can use the trusted build system.

Is this test failure spurious?

Unfortunately, adding .addExecutorIds(ExecutorID.newBuilder.setValue("exe0")) to the offer doesn't fix it. Not sure that's the right place to add it, in any event.

-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running mesosphere.mesos.TaskBuilderTest
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.774 sec <<< FAILURE!

Results :

Tests in error: 
  testBuildIfMatches(mesosphere.mesos.TaskBuilderTest): Message missing required fields: executor_id

Tests run: 1, Failures: 0, Errors: 1, Skipped: 0

[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 10.635s
[INFO] Finished at: Thu Aug 22 20:46:18 UTC 2013
[INFO] Final Memory: 9M/23M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.10:test (default-test) on project marathon: There are test failures.
[ERROR] 
[ERROR] Please refer to /Users/solidsnack/flobi/marathon/target/surefire-reports for the individual test results.
[ERROR] -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException

CLI first pass

create a simple CLI that supports

  • start app
  • scale no of instances
  • stop app

Can't get Marathon example running on Mesos 0.14.2

I am trying to get the example "http POST localhost:8080/v1/apps/start id=sleep cmd='sleep 600' instances=1 mem=128 cpus=1" working on the latest version of Marathon, but it fails and I see a TASK_FAILED in the Mesos UI.

I ran mesos examples and they work fine.

Below is the mesos-slave log. I am running the following on the same machine:

  1. single mesos master
  2. single mesos slave
  3. zookeeper
  4. marathon

Mesos is running without ZooKeeper. Please let me know if I am missing anything here.

Thanks.


I1215 08:29:28.001943 6218 slave.cpp:2178] Executor 'sleep_0-1387096167104' of framework 201312121834-813934784-5050-1492-0000 has exited with status 1
I1215 08:29:28.015235 6218 slave.cpp:1798] Handling status update TASK_FAILED (UUID: d59aa563-a2ca-408c-b33b-09228d091824) for task sleep_0-1387096167104 of framework 201312121834-813934784-5050-1492-0000 from @0.0.0.0:0
I1215 08:29:28.022212 6218 status_update_manager.cpp:300] Received status update TASK_FAILED (UUID: d59aa563-a2ca-408c-b33b-09228d091824) for task sleep_0-1387096167104 of framework 201312121834-813934784-5050-1492-0000
I1215 08:29:28.023475 6218 status_update_manager.cpp:351] Forwarding status update TASK_FAILED (UUID: d59aa563-a2ca-408c-b33b-09228d091824) for task sleep_0-1387096167104 of framework 201312121834-813934784-5050-1492-0000 to [email protected]:5050
W1215 08:29:28.040544 6217 process_isolator.cpp:265] Failed to kill the process tree rooted at pid 18735: Failed to find process 18735
I1215 08:29:28.048215 6217 process_isolator.cpp:298] Asked to update resources for an unknown/killed executor 'sleep_0-1387096167104' of framework 201312121834-813934784-5050-1492-0000
I1215 08:29:28.062522 6220 status_update_manager.cpp:375] Received status update acknowledgement (UUID: d59aa563-a2ca-408c-b33b-09228d091824) for task sleep_0-1387096167104 of framework 201312121834-813934784-5050-1492-0000
I1215 08:29:28.064841 6218 slave.cpp:2308] Cleaning up executor 'sleep_0-1387096167104' of framework 201312121834-813934784-5050-1492-0000
I1215 08:29:28.068897 6219 gc.cpp:56] Scheduling '/tmp/mesos/slaves/201312150731-813934784-5050-31922-1/frameworks/201312121834-813934784-5050-1492-0000/executors/sleep_0-1387096167104/runs/d00b20ba-8fd3-4537-8c24-259b2c781ed0' for removal
I1215 08:29:28.070014 6219 gc.cpp:56] Scheduling '/tmp/mesos/slaves/201312150731-813934784-5050-31922-1/frameworks/201312121834-813934784-5050-1492-0000/executors/sleep_0-1387096167104' for removal
I1215 08:29:28.069262 6218 slave.cpp:2380] Cleaning up framework 201312121834-813934784-5050-1492-0000
I1215 08:29:28.073884 6219 status_update_manager.cpp:262] Closing status update streams for framework 201312121834-813934784-5050-1492-0000
I1215 08:29:28.074201 6218 gc.cpp:56] Scheduling '/tmp/mesos/slaves/201312150731-813934784-5050-31922-1/frameworks/201312121834-813934784-5050-1492-0000' for removal
I1215 08:29:28.107458 6218 slave.cpp:793] Got assigned task sleep_0-1387096168101 for framework 201312121834-813934784-5050-1492-0000
I1215 08:29:28.113899 6219 gc.cpp:84] Unscheduling '/tmp/mesos/slaves/201312150731-813934784-5050-31922-1/frameworks/201312121834-813934784-5050-1492-0000' for removal
I1215 08:29:28.118167 6218 slave.cpp:904] Launching task sleep_0-1387096168101 for framework 201312121834-813934784-5050-1492-0000
I1215 08:29:28.121764 6218 paths.hpp:336] Created executor directory '/tmp/mesos/slaves/201312150731-813934784-5050-31922-1/frameworks/201312121834-813934784-5050-1492-0000/executors/sleep_0-1387096168101/runs/4584597e-9150-46a2-ba18-a609af9d0619'
I1215 08:29:28.123733 6219 process_isolator.cpp:100] Launching sleep_0-1387096168101 (/home/ubuntu/mesos-0.14.2/src/mesos-executor) in /tmp/mesos/slaves/201312150731-813934784-5050-31922-1/frameworks/201312121834-813934784-5050-1492-0000/executors/sleep_0-1387096168101/runs/4584597e-9150-46a2-ba18-a609af9d0619 with resources ' for framework 201312121834-813934784-5050-1492-0000
I1215 08:29:28.124042 6218 slave.cpp:1015] Queuing task 'sleep_0-1387096168101' for executor sleep_0-1387096168101 of framework '201312121834-813934784-5050-1492-0000
I1215 08:29:28.130666 6219 process_isolator.cpp:163] Forked executor at 18771
I1215 08:29:28.134117 6218 slave.cpp:536] Successfully attached file '/tmp/mesos/slaves/201312150731-813934784-5050-31922-1/frameworks/201312121834-813934784-5050-1492-0000/executors/sleep_0-1387096168101/runs/4584597e-9150-46a2-ba18-a609af9d0619'

Test fails on Ubuntu 13.04

Running the build on Ubuntu 13.04 fails on a test. The same build passes on wheezy.
Both are running OpenJDK.

mvn test -B
dhardy@viaduc436:~/SOURCE/GIT/marathon$ mvn test -B
[INFO] Scanning for projects...
[INFO]                                                                         
[INFO] ------------------------------------------------------------------------
[INFO] Building marathon 0.2.1
[INFO] ------------------------------------------------------------------------
[INFO] 
[INFO] --- maven-resources-plugin:2.3:resources (default-resources) @ marathon ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 10 resources
[INFO] 
[INFO] --- scala-maven-plugin:3.1.5:add-source (scala-compile-first) @ marathon ---
[INFO] Add Source directory: /home/dhardy/SOURCE/GIT/marathon/src/main/scala
[INFO] Add Test Source directory: /home/dhardy/SOURCE/GIT/marathon/src/test/scala
[INFO] 
[INFO] --- scala-maven-plugin:3.1.5:compile (scala-compile-first) @ marathon ---
[WARNING]  Expected all dependencies to require Scala version: 2.10.3
[WARNING]  mesosphere:marathon:0.2.1 requires scala version: 2.10.3
[WARNING]  com.typesafe.akka:akka-actor_2.10:2.2.3 requires scala version: 2.10.2
[WARNING] Multiple versions of scala libraries detected!
[INFO] Nothing to compile - all classes are up to date
[INFO] 
[INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ marathon ---
[INFO] Nothing to compile - all classes are up to date
[INFO] 
[INFO] --- maven-resources-plugin:2.3:testResources (default-testResources) @ marathon ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /home/dhardy/SOURCE/GIT/marathon/src/test/resources
[INFO] 
[INFO] --- scala-maven-plugin:3.1.5:testCompile (scala-test-compile) @ marathon ---
[WARNING]  Expected all dependencies to require Scala version: 2.10.3
[WARNING]  mesosphere:marathon:0.2.1 requires scala version: 2.10.3
[WARNING]  com.typesafe.akka:akka-actor_2.10:2.2.3 requires scala version: 2.10.2
[WARNING] Multiple versions of scala libraries detected!
[INFO] Nothing to compile - all classes are up to date
[INFO] 
[INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ marathon ---
[INFO] Nothing to compile - all classes are up to date
[INFO] 
[INFO] --- maven-surefire-plugin:2.10:test (default-test) @ marathon ---
[INFO] Surefire report directory: /home/dhardy/SOURCE/GIT/marathon/target/surefire-reports

-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running mesosphere.marathon.api.v1.AppDefinitionTest
log4j:WARN No appenders could be found for logger (org.jboss.logging).
log4j:WARN Please initialize the log4j system properly.
Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.424 sec <<< FAILURE!
Running mesosphere.marathon.api.v1.json.ConstraintModuleTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.113 sec
Running mesosphere.marathon.TaskTrackerTest
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.164 sec
Running mesosphere.marathon.MarathonSchedulerTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.146 sec
Running mesosphere.mesos.TaskBuilderTest
déc. 02, 2013 11:37:56 AM mesosphere.mesos.TaskBuilder offerMatches
INFO: Met all constraints.
déc. 02, 2013 11:37:56 AM mesosphere.mesos.TaskBuilder offerMatches
WARNING: Did not meet a constraint in an offer.
déc. 02, 2013 11:37:56 AM mesosphere.mesos.TaskBuilder buildIfMatches
INFO: No matching offer (need 1.0 CPUs, 64.0 mem, 0 ports) : id {
  value: "1"
}
framework_id {
  value: "marathon"
}
slave_id {
  value: "slave0"
}
hostname: "beta"
resources {
  name: "cpus"
  type: SCALAR
  scalar {
    value: 4.0
  }
  role: "*"
}
resources {
  name: "mem"
  type: SCALAR
  scalar {
    value: 16000.0
  }
  role: "*"
}
resources {
  name: "ports"
  type: RANGES
  ranges {
    range {
      begin: 31000
      end: 32000
    }
  }
  role: "*"
}
attributes {
  name: "rackid"
  type: TEXT
  text {
    value: "1"
  }
}

déc. 02, 2013 11:37:56 AM mesosphere.mesos.TaskBuilder offerMatches
INFO: Met all constraints.
déc. 02, 2013 11:37:56 AM mesosphere.mesos.TaskBuilder offerMatches
WARNING: Did not meet a constraint in an offer.
déc. 02, 2013 11:37:56 AM mesosphere.mesos.TaskBuilder buildIfMatches
INFO: No matching offer (need 1.0 CPUs, 64.0 mem, 0 ports) : id {
  value: "1"
}
framework_id {
  value: "marathon"
}
slave_id {
  value: "slave0"
}
hostname: "alpha"
resources {
  name: "cpus"
  type: SCALAR
  scalar {
    value: 4.0
  }
  role: "*"
}
resources {
  name: "mem"
  type: SCALAR
  scalar {
    value: 16000.0
  }
  role: "*"
}
resources {
  name: "ports"
  type: RANGES
  ranges {
    range {
      begin: 31000
      end: 32000
    }
  }
  role: "*"
}
attributes {
  name: "rackid"
  type: TEXT
  text {
    value: "3"
  }
}

déc. 02, 2013 11:37:56 AM mesosphere.mesos.TaskBuilder offerMatches
INFO: Met all constraints.
Tests run: 11, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.029 sec
Running mesosphere.mesos.ConstraintsTest
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.006 sec
Running mesosphere.util.RateLimitersTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.014 sec

Results :

Failed tests:   testValidation(mesosphere.marathon.api.v1.AppDefinitionTest)

Tests run: 24, Failures: 1, Errors: 0, Skipped: 0

[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 4.149s
[INFO] Finished at: Mon Dec 02 11:37:57 CET 2013
[INFO] Final Memory: 18M/341M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.10:test (default-test) on project marathon: There are test failures.
[ERROR] 
[ERROR] Please refer to /home/dhardy/SOURCE/GIT/marathon/target/surefire-reports for the individual test results.
[ERROR] -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException

Content of target/surefire-reports/mesosphere.marathon.api.v1.AppDefinitionTest.xml

<?xml version="1.0" encoding="UTF-8" ?>
<testsuite failures="1" time="0.423" errors="0" skipped="0" tests="3" name="mesosphere.marathon.api.v1.AppDefinitionTest">
  <properties>
    <property name="java.runtime.name" value="OpenJDK Runtime Environment"/>
    <property name="sun.boot.library.path" value="/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/amd64"/>
    <property name="java.vm.version" value="23.7-b01"/>
    <property name="java.vm.vendor" value="Oracle Corporation"/>
    <property name="java.vendor.url" value="http://java.oracle.com/"/>
    <property name="path.separator" value=":"/>
    <property name="guice.disable.misplaced.annotation.check" value="true"/>
    <property name="java.vm.name" value="OpenJDK 64-Bit Server VM"/>
    <property name="file.encoding.pkg" value="sun.io"/>
    <property name="user.country" value="FR"/>
    <property name="sun.java.launcher" value="SUN_STANDARD"/>
    <property name="sun.os.patch.level" value="unknown"/>
    <property name="java.vm.specification.name" value="Java Virtual Machine Specification"/>
    <property name="user.dir" value="/home/dhardy/SOURCE/GIT/marathon"/>
    <property name="java.runtime.version" value="1.7.0_25-b30"/>
    <property name="java.awt.graphicsenv" value="sun.awt.X11GraphicsEnvironment"/>
    <property name="java.endorsed.dirs" value="/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/endorsed"/>
    <property name="os.arch" value="amd64"/>
    <property name="java.io.tmpdir" value="/tmp"/>
    <property name="line.separator" value="
"/>
    <property name="java.vm.specification.vendor" value="Oracle Corporation"/>
    <property name="os.name" value="Linux"/>
    <property name="classworlds.conf" value="/usr/share/maven/bin/m2.conf"/>
    <property name="sun.jnu.encoding" value="UTF-8"/>
    <property name="java.library.path" value="/usr/java/packages/lib/amd64:/usr/lib/jni:/lib:/usr/lib"/>
    <property name="java.specification.name" value="Java Platform API Specification"/>
    <property name="java.class.version" value="51.0"/>
    <property name="sun.management.compiler" value="HotSpot 64-Bit Tiered Compilers"/>
    <property name="os.version" value="3.8.0-33-generic"/>
    <property name="user.home" value="/home/dhardy"/>
    <property name="user.timezone" value="Europe/Paris"/>
    <property name="java.awt.printerjob" value="sun.print.PSPrinterJob"/>
    <property name="file.encoding" value="UTF-8"/>
    <property name="java.specification.version" value="1.7"/>
    <property name="user.name" value="dhardy"/>
    <property name="java.class.path" value="/usr/share/maven/boot/plexus-classworlds-2.x.jar"/>
    <property name="java.vm.specification.version" value="1.7"/>
    <property name="sun.arch.data.model" value="64"/>
    <property name="java.home" value="/usr/lib/jvm/java-7-openjdk-amd64/jre"/>
    <property name="sun.java.command" value="org.codehaus.plexus.classworlds.launcher.Launcher test -B"/>
    <property name="java.specification.vendor" value="Oracle Corporation"/>
    <property name="user.language" value="fr"/>
    <property name="awt.toolkit" value="sun.awt.X11.XToolkit"/>
    <property name="java.vm.info" value="mixed mode"/>
    <property name="java.version" value="1.7.0_25"/>
    <property name="java.ext.dirs" value="/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/ext:/usr/java/packages/lib/ext"/>
    <property name="securerandom.source" value="file:/dev/./urandom"/>
    <property name="sun.boot.class.path" value="/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/resources.jar:/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/rt.jar:/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/sunrsasign.jar:/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/jsse.jar:/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/jce.jar:/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/charsets.jar:/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/netx.jar:/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/plugin.jar:/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/rhino.jar:/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/jfr.jar:/usr/lib/jvm/java-7-openjdk-amd64/jre/classes"/>
    <property name="java.vendor" value="Oracle Corporation"/>
    <property name="maven.home" value="/usr/share/maven"/>
    <property name="file.separator" value="/"/>
    <property name="java.vendor.url.bug" value="http://bugreport.sun.com/bugreport/"/>
    <property name="sun.cpu.endian" value="little"/>
    <property name="sun.io.unicode.encoding" value="UnicodeLittle"/>
    <property name="sun.desktop" value="gnome"/>
    <property name="sun.cpu.isalist" value=""/>
  </properties>
  <testcase time="0.151" classname="mesosphere.marathon.api.v1.AppDefinitionTest" name="testToProto"/>
  <testcase time="0.212" classname="mesosphere.marathon.api.v1.AppDefinitionTest" name="testValidation">
    <failure type="java.lang.AssertionError">java.lang.AssertionError
    at org.junit.Assert.fail(Assert.java:86)
    at org.junit.Assert.assertTrue(Assert.java:41)
    at org.junit.Assert.assertTrue(Assert.java:52)
    at mesosphere.marathon.api.v1.AppDefinitionTest.shouldViolate$1(AppDefinitionTest.scala:64)
    at mesosphere.marathon.api.v1.AppDefinitionTest.testValidation(AppDefinitionTest.scala:76)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
    at org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:53)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:123)
    at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:104)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:164)
    at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:110)
    at org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireStarter.java:175)
    at org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcessWhenForked(SurefireStarter.java:107)
    at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:68)
</failure>
    <system-err>log4j:WARN No appenders could be found for logger (org.jboss.logging).
log4j:WARN Please initialize the log4j system properly.
</system-err>
  </testcase>
  <testcase time="0.012" classname="mesosphere.marathon.api.v1.AppDefinitionTest" name="testMergeFromProto"/>
</testsuite>

Marathon/Docker scaling up/down issue

I'm not sure exactly where this issue lies, but I'm logging it here. Please let me know if I need to move it to a different project.

To reproduce, I believe you can do the following:

  1. Have Marathon, Mesos, a dockerized Riak or Tomcat, and mesos-docker running
  2. Assume we have 4 CPUs available and scale to 5 instances of the app
  3. You will see 4 instances of the app running
  4. Now scale the number of instances to 0
  5. You will still see one instance running in "docker ps", although the UI shows 0
  6. Now if you scale it to 2 and back to 0, you will still see one instance running

The only way to stop that instance is to run "docker stop " on it.

"v1/tasks/kill" returns 200 even in failure

If Marathon receives a request to kill or kill & scale a task, the endpoint always returns a 200 response code even if the requested task or app does not exist.

  1. Passing nothing at all returns a 200. A 404 is more appropriate:

    $ http POST localhost:8080/v1/tasks/kill?appId=id=
    
    HTTP/1.1 200 OK
    Content-Type: application/json
    Server: Jetty(8.y.z-SNAPSHOT)
    Transfer-Encoding: chunked
    
    []
    
  2. Passing a valid appId but an invalid id (task ID) returns a 200, but a 404 or a 410 is more appropriate.

Case 2 is particularly problematic because the app is scaled only if there is actually a task to kill. Suppose you request the list of tasks to find a task ID, the task then dies or is killed, and you call "v1/tasks/kill" for that task with scale=true. Because 200 is the only response right now, you will think you have scaled down your app, but Marathon will actually have silently done nothing.
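The status codes proposed above could be sketched as a small decision function. This is only an illustration of the suggestion in this report, not Marathon's actual handler; the function name and its boolean parameters are hypothetical.

```python
def proposed_kill_status(app_exists: bool, task_exists: bool) -> int:
    """Return the HTTP status this report suggests for POST v1/tasks/kill.

    Hypothetical helper: the two flags stand for lookups the server
    would perform before killing anything.
    """
    if not app_exists:
        # Case 1: nothing matched at all, so 404 is suggested.
        return 404
    if not task_exists:
        # Case 2: the report suggests 404 or 410; 410 is picked here
        # only to keep the two cases distinguishable.
        return 410
    # A task was found and killed (and the app scaled if requested).
    return 200
```

With a non-200 answer in case 2, a client that sent scale=true would know that the app was not scaled down.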

Marathon/Mesos won't launch more than one app despite CPU being available (i.e. 0.1 < 0.9)

I'm not sure if this is a Mesos or Marathon problem, but tasks stay queued when less than 1 CPU is available, even though the tasks require less than what is available (say, 0.9). I'm using Mesos 0.14, which may not like floats for resources. An easy workaround, if you're just trying to test this, is to add --resources=cpus:10 to your slave arguments or, if you use Chef, "--resources=cpus:#{str(node['cpu']['total'] * node['mesos']['overcommit_factor'])} ".
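As a rough sketch of the workaround, the flag value is just the CPU count multiplied by an overcommit factor. The helper name and the sample numbers below are assumptions for illustration; only the --resources=cpus:N form comes from the report.

```python
def resources_flag(total_cpus: int, overcommit_factor: float) -> str:
    """Build the slave argument used in the workaround.

    Hypothetical helper: mirrors the Chef expression above, which
    multiplies the node's CPU count by an overcommit factor.
    """
    cpus = total_cpus * overcommit_factor
    # ":g" drops a trailing ".0" so whole numbers print as integers.
    return f"--resources=cpus:{cpus:g}"


print(resources_flag(4, 2.5))
```

Advertising a whole number of CPUs larger than 1 sidesteps the fractional comparison that appears to keep tasks queued.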

Overhaul API

Ideas:

  • Make it fully REST, e.g. DELETE /v2/apps/sleep to delete an app, PUT /v2/apps/sleep to scale etc.
  • Add support for updating apps: resources, command, etc.
  • Simplify some clumsy bits, e.g. constraints
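To make the first idea concrete, here is a sketch of what client calls against such an API might look like. The /v2 routes are the proposals from the list above; the base address, the helper names, and the {"instances": N} payload are assumptions, and the requests are only built, not sent.

```python
import json
import urllib.request

BASE = "http://localhost:8080"  # assumed local Marathon address


def delete_app(app_id: str) -> urllib.request.Request:
    """Build a DELETE /v2/apps/<app_id> request (proposed route)."""
    return urllib.request.Request(f"{BASE}/v2/apps/{app_id}", method="DELETE")


def scale_app(app_id: str, instances: int) -> urllib.request.Request:
    """Build a PUT /v2/apps/<app_id> request that sets the instance count.

    The JSON body shape is an assumption, not a documented format.
    """
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE}/v2/apps/{app_id}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = scale_app("sleep", 3)
print(req.get_method(), req.full_url)
```

The resource URL stays the same for both operations, and the HTTP verb alone says what happens to the app.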

MESOS_HOME and MESOS_NATIVE_LIBRARY settings not mentioned in docs

When starting Marathon, it looks for the environment variables MESOS_HOME and MESOS_NATIVE_LIBRARY for the Mesos build directory and the Mesos native library, respectively. If they aren't set, Marathon prints warning-ish messages:

MESOS_HOME is not set. Defaulting to /usr/local
MESOS_NATIVE_LIBRARY set to /usr/local/lib/libmesos.dylib

When building and working on both Mesos and Marathon locally, it's useful to point Marathon to your local Mesos. The docs should mention the two environment variables and what they do.
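The lookup could be sketched roughly as follows. The /usr/local default and the libmesos.dylib path are taken from the messages above; deriving the library path from MESOS_HOME is an assumption, and the function is a hypothetical illustration rather than Marathon's startup code.

```python
import posixpath


def resolve_mesos_paths(env: dict) -> tuple:
    """Return (mesos_home, native_library) from an environment mapping.

    Assumption: when MESOS_NATIVE_LIBRARY is unset, the library is
    looked up under <MESOS_HOME>/lib, as the messages above suggest.
    """
    home = env.get("MESOS_HOME", "/usr/local")
    default_lib = posixpath.join(home, "lib", "libmesos.dylib")
    lib = env.get("MESOS_NATIVE_LIBRARY", default_lib)
    return home, lib


# Pointing Marathon at a local Mesos build (the path is an example only).
print(resolve_mesos_paths({"MESOS_HOME": "/opt/mesos"}))
```

In practice you would pass os.environ instead of a literal dict, after exporting the two variables in the shell that starts Marathon.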

StorageException when killing two tasks in succession

When sending two requests to kill two separate tasks on the same app at the "same time" (within the same second), I get a mesosphere.marathon.StorageException consistently:

Note: this only happens with scale=true. I have only reproduced it when trying to scale down.

Jan 06, 2014 4:52:27 PM mesosphere.chaos.http.ChaosRequestLog write
INFO: 0:0:0:0:0:0:0:1 -  -  [07/Jan/2014:00:52:27 +0000] "POST /v1/tasks/kill?appId=asdf&id=asdf_2-1389055935232&scale=true HTTP/1.1" 200 2 ...
Jan 06, 2014 4:52:27 PM mesosphere.marathon.api.v1.MarathonExceptionMapper toResponse
WARNING: 
mesosphere.marathon.StorageException: Failed to store asdf
  at mesosphere.marathon.state.MarathonStore$$anonfun$store$1$$anonfun$apply$1.apply(MarathonStore.scala:35)
  at mesosphere.marathon.state.MarathonStore$$anonfun$store$1$$anonfun$apply$1.apply(MarathonStore.scala:33)
  at scala.util.Success$$anonfun$map$1.apply(Try.scala:206)
  at scala.util.Try$.apply(Try.scala:161)
  at scala.util.Success.map(Try.scala:206)
  at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
  at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
  at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
  at scala.concurrent.impl.ExecutionContextImpl$$anon$3.exec(ExecutionContextImpl.scala:107)
  at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
  at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
  at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
  at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
  at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Jan 06, 2014 4:52:27 PM mesosphere.chaos.http.ChaosRequestLog write
INFO: 0:0:0:0:0:0:0:1 -  -  [07/Jan/2014:00:52:27 +0000] "POST /v1/tasks/kill?appId=asdf&id=asdf_3-1389055940237&scale=true HTTP/1.1" 500 34 ...

Marathon non-master UI not redirecting to master UI when using --ha flag

I am running two Marathon instances using the --ha flag, both pointing to the same ZooKeeper ensemble for leader election. When I navigate to the non-leader node, I am not redirected to the leader UI. I can confirm that my API requests to endpoints such as /v1/apps do get redirected properly, though.

Here are some examples:

http://nonmaster:8080 -> no redirection and I don't see the current configured apps

http://nonmaster:8080/v1/apps -> redirected to master marathon instance showing configured apps

http://nonmaster:8080/foo -> redirected to master marathon instance to page not found

Is this expected behavior?

Marathon process stays alive, doesn't respond to signals, after suicide

This happens consistently when my laptop goes to sleep and wakes up. It looks like losing the connection to ZooKeeper and then reconnecting causes it to die but not kill its process.

After waking up, it doesn't respond to signals, e.g. Ctrl + C. A kill -9 is the only way to get rid of the process.

I1209 09:27:40.438854 546488320 detector.cpp:427] Master detector (scheduler(1)@10.1.10.15:57667)  found 0 registered masters
I1209 09:27:40.438875 546488320 detector.cpp:448] Master detector (scheduler(1)@10.1.10.15:57667) couldn't find any masters
Dec 09, 2013 9:27:40 AM mesosphere.marathon.MarathonScheduler disconnected
WARNING: Disconnected
Dec 09, 2013 9:27:40 AM mesosphere.marathon.MarathonScheduler suicide
SEVERE: Committing suicide

Marathon dead / zombie tasks

If Marathon is offline when tasks die, it doesn't get the failure/finish notification and thinks the tasks are still running.
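One way a scheduler could detect such tasks is to compare the set it tracks against the set that is actually running. This is a minimal sketch, assuming both are available as lists of task IDs; the function and the IDs are hypothetical and not part of Marathon's API.

```python
def stale_tasks(tracked, running):
    """Return tracked task IDs that are no longer actually running."""
    return sorted(set(tracked) - set(running))


# The scheduler still tracks three tasks, but only one survived the outage.
print(stale_tasks(["app_0", "app_1", "app_2"], ["app_1"]))
```

Any ID returned here is a task the scheduler would wrongly count as alive until it is reconciled.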

MarathonTask's port is likely to be inconsistent with AppDefinition's port

#31 (7490ed4) made TaskBuilder#getPort a non-deterministic method.

TaskBuilder#getPort is called from two places (as of 35cf445):

In MarathonScheduler, the acquired port number is stored in MarathonTask. But the port number stored in AppDefinition was acquired in TaskBuilder. These two port numbers are very likely to differ.
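The effect can be illustrated with a small simulation. `get_port` below is a hypothetical stand-in for a non-deterministic port picker, not the real TaskBuilder#getPort, and the 31000-32000 range is borrowed from the resource offers in the test log above.

```python
import random


def get_port(begin: int, end: int, rng: random.Random) -> int:
    """Hypothetical non-deterministic picker: any port in [begin, end]."""
    return rng.randint(begin, end)


def launch_inconsistent(rng: random.Random):
    """Pick the port twice, once per call site, as described above."""
    task_port = get_port(31000, 32000, rng)  # stored in the task
    app_port = get_port(31000, 32000, rng)   # stored in the app definition
    return task_port, app_port


def launch_consistent(rng: random.Random):
    """Pick the port once and hand the same value to both records."""
    port = get_port(31000, 32000, rng)
    return port, port


rng = random.Random(0)
trials = [launch_inconsistent(rng) for _ in range(1000)]
mismatches = sum(1 for task_port, app_port in trials if task_port != app_port)
print(f"{mismatches} of 1000 launches stored two different ports")
```

Picking the port once and passing it to both call sites, as in `launch_consistent`, is one possible way to keep the two records in agreement.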
