ciscocloud / marathon-consul
bridge Marathon information to Consul KV
License: Apache License 2.0
I noticed that marathon-consul seems to have trouble creating names for apps started from a group in Marathon.
If we create a simple group:
{ "groups": {"id": "a", "apps": [{"id":"b", ...}]}}
and start the group with curl -XPUT http://marathon/v2/groups/foo ...
Marathon starts an app named /foo/a/b,
but marathon-consul stores it under the key marathon/foo-a-b/tasks/foo_a_b.c76b66f4-0f16-11e5-bcfb-56847afe9799
Is this expected behavior? It is rather unwieldy to use keys of this form from consul-template.
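The key above can be reproduced with a small transformation. The Go sketch below is inferred only from this example (app /foo/a/b stored as marathon/foo-a-b/tasks/foo_a_b.<uuid>). The function names are hypothetical and are not marathon-consul's actual code. It also assumes that Marathon builds task IDs by replacing the slashes in the app ID with underscores and appending "." plus a UUID, which matches the task IDs quoted elsewhere in this thread.

```go
package main

import "strings"

// consulAppKey is a hypothetical helper reproducing the observed layout:
// drop the leading slash of the Marathon app ID and turn the remaining
// slashes into dashes, e.g. "/foo/a/b" -> "marathon/foo-a-b".
func consulAppKey(prefix, appID string) string {
	return prefix + "/" + strings.ReplaceAll(strings.TrimPrefix(appID, "/"), "/", "-")
}

// marathonTaskIDPrefix shows the part of a Marathon task ID before the
// ".<uuid>" suffix (assumed: slashes replaced by underscores).
func marathonTaskIDPrefix(appID string) string {
	return strings.ReplaceAll(strings.TrimPrefix(appID, "/"), "/", "_")
}

// consulTaskKey nests a full task ID under the app's "tasks" folder.
func consulTaskKey(prefix, appID, taskID string) string {
	return consulAppKey(prefix, appID) + "/tasks/" + taskID
}
```

Under this mapping, /foo/a-b and /foo-a/b both become marathon/foo-a-b. A template author therefore cannot recover the original group path from the key alone, which is one concrete reason these keys are awkward to use from consul-template.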
Any chance the repo could show an example of how to use it with, say, consul-template?
I'm assuming we need to grab taskStatus
from the stored JSON:
{{with $t := key "marathon/TASKNAME/tasks/SUBTASKNAME" | parseJSON}}
{{if $t}}
{{$t.taskStatus}}
{{end}}
{{end}}
But how do we iterate over the tasks in the KV store? marathon-consul
seems to store both a single key and a folder:
MARATHON/
TASKNAME/
TASKNAME
Thanks in advance for chiming in.
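Whatever template syntax is used, iterating over tasks amounts to listing every key under marathon/<app>/tasks/ and decoding each value. The Go sketch below does this over an in-memory slice. kvPair is a hypothetical stand-in for the entries a Consul client returns; the official Go client's KV().List(prefix, nil) returns pairs with Key and Value fields. The sketch assumes each task value is JSON with a taskStatus field, as in the template above. It skips the app-level key and anything nested more than one level below tasks/.

```go
package main

import (
	"encoding/json"
	"strings"
)

// kvPair is a hypothetical stand-in for one Consul KV entry.
type kvPair struct {
	Key   string
	Value []byte
}

// taskRecord holds the only field this sketch reads from a stored task.
type taskRecord struct {
	TaskStatus string `json:"taskStatus"`
}

// taskStatuses maps task ID -> taskStatus for every key directly under
// "<appKey>/tasks/". Values that are not valid JSON are skipped.
func taskStatuses(appKey string, pairs []kvPair) map[string]string {
	prefix := appKey + "/tasks/"
	out := make(map[string]string)
	for _, p := range pairs {
		if !strings.HasPrefix(p.Key, prefix) {
			continue // app-level key, or a different app
		}
		taskID := strings.TrimPrefix(p.Key, prefix)
		if taskID == "" || strings.Contains(taskID, "/") {
			continue // the folder itself, or a deeper nested key
		}
		var rec taskRecord
		if err := json.Unmarshal(p.Value, &rec); err != nil {
			continue
		}
		out[taskID] = rec.TaskStatus
	}
	return out
}
```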
So I was testing out the new code and noticed something interesting.
To test, I cleared the Marathon data from the Consul KV tree and then restarted marathon-consul. The KV tree was repopulated, except that there were no taskStatus values, because Marathon only provides these in status update events.
Unfortunately, I had been using taskStatus to decide whether to add a record to my service config, and since the field was empty, nothing was added (well, it was, but commented out).
So it appears that resync works: the apps were repopulated in Consul, but taskStatus will not change until the app is restarted.
Mesos 0.24 was released recently, and I was not able to make marathon-consul work with it. marathon-consul logs nothing, even in debug mode, so I have no diagnostic information. After downgrading Mesos, it works as usual.
BTW, Marathon 0.11 was also released, so it would be worth checking compatibility with it too.
This strikes me as a bit odd: I would have expected the information about which hosts are running which tasks to be included in what is passed along to Consul.
Hi,
I'm running marathon-consul as a Docker container deployed by Puppet. Here is the process:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
a9adf6b884bd3475211305f049082cd907447fdee890d36467b5609369541cb7 ciscocloud/marathon-consul "/launch.sh --marathon-location=172.17.42.1:8080 --registry=172.17.42.1:8500 --log-level=debug" 23 minutes ago Up 23 minutes 0.0.0.0:4000->4000/tcp marathon-consul
As you can see, I've set both the registry and Marathon locations to the Docker bridge IP.
Then I start Marathon with the callback flag and create a callback:
curl -X POST 'http://localhost:8080/v2/eventSubscriptions?callbackUrl=http://localhost:4000/events' -v
I've also tried creating a callback with the Docker bridge IP:
curl -X POST 'http://172.17.42.1:8080/v2/eventSubscriptions?callbackUrl=http://172.17.42.1:4000/events' -v
In both cases, when running marathon-consul
in debug mode, I see the following:
time="2015-10-18T17:28:22Z" level=info msg="handling event" eventType="status_update_event"
time="2015-10-18T17:28:22Z" level=debug msg="{\"slaveId\":\"20151017-195725-3111233728-5050-17500-S0\",\"taskId\":\"basic-0.838db8f8-75bd-11e5-b38e-5e84bfba7cb5\",\"taskStatus\":\"TASK_KILLED\",\"message\":\"Command terminated with signal Terminated\",\"appId\":\"/basic-0\",\"host\":\"mesos0\",\"ports\":[20777],\"version\":\"2015-10-18T17:27:27.950Z\",\"eventType\":\"status_update_event\",\"timestamp\":\"2015-10-18T17:28:22.309Z\"}"
time="2015-10-18T17:29:18Z" level=info msg="handling event" eventType="api_post_event"
time="2015-10-18T17:29:18Z" level=info msg="[ERROR] response generated error: Get http://127.0.0.1:8500/v1/kv/marathon/basic-0: dial tcp 127.0.0.1:8500: connection refused"
time="2015-10-18T17:29:18Z" level=debug msg="{\"clientIp\":\"10.2.9.151\",\"uri\":\"/v2/apps//basic-0\",\"appDefinition\":{\"id\":\"/basic-0\",\"cmd\":null,\"args\":null,\"user\":null,\"env\":{},\"instances\":1,\"cpus\":1.0,\"mem\":128.0,\"disk\":0.0,\"executor\":\"\",\"constraints\":[],\"uris\":[],\"storeUrls\":[],\"ports\":[0],\"requirePorts\":false,\"backoffSeconds\":1,\"backoffFactor\":1.15,\"maxLaunchDelaySeconds\":3600,\"container\":null,\"healthChecks\":[],\"dependencies\":[],\"upgradeStrategy\":{\"minimumHealthCapacity\":1.0,\"maximumOverCapacity\":1.0},\"labels\":{},\"acceptedResourceRoles\":null,\"version\":\"2015-10-18T17:29:18.558Z\"},\"eventType\":\"api_post_event\",\"timestamp\":\"2015-10-18T17:29:18.560Z\"}"
time="2015-10-18T17:29:18Z" level=info msg="not handling event" eventType="group_change_success"
I can't seem to work out why the query is going to '127.0.0.1:8500' rather than the configured registry address. Any help would be appreciated!
Cheers,
root@mesos0:~# dpkg -l | grep -E 'marathon|mesos'
ii marathon 0.9.0-1.0.381.debian77 amd64 Cluster-wide init and control system for services running on Apache Mesos
ii mesos 0.22.1-1.0.debian78 amd64 Cluster resource manager with efficient resource isolation
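The 127.0.0.1:8500 in the error is the default address used by HashiCorp's Consul Go client. That suggests the --registry value was not applied at all. Later reports in this thread pass --registry with an explicit http:// scheme, so --registry=http://172.17.42.1:8500 may be worth trying. Go's url.Parse also rejects the bare form 172.17.42.1:8500 ("first path segment in URL cannot contain colon"), so a client that parses the flag as a URL could fail on it and fall back to defaults. This is an inference, not a confirmed cause. The sketch below (a hypothetical normalizeRegistry, not marathon-consul's flag handling) adds a default scheme and returns an error instead of silently falling back.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// normalizeRegistry is a hypothetical helper: it accepts either
// "host:port" or "scheme://host:port", defaults the scheme to http,
// and reports an error rather than letting a caller use a default address.
func normalizeRegistry(raw string) (*url.URL, error) {
	s := strings.TrimSpace(raw)
	if s == "" {
		return nil, fmt.Errorf("registry address is empty")
	}
	if !strings.Contains(s, "://") {
		s = "http://" + s // bare host:port would otherwise fail url.Parse
	}
	u, err := url.Parse(s)
	if err != nil {
		return nil, fmt.Errorf("invalid registry %q: %w", raw, err)
	}
	if u.Host == "" {
		return nil, fmt.Errorf("registry %q has no host", raw)
	}
	return u, nil
}
```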
Hello,
Great contributions to the Mesos ecosystem here with Mantl and the Consul and general hashi-* tooling. Thanks! I notice that the mesos-consul
project seems to be much more up to date and more active. Is it replacing this project entirely?
Hi
With @dankraw we are working on registering tasks based on Marathon events. Basically, this means merging mesos-consul's functionality into marathon-consul. Are you interested in this feature, or should we make it a standalone application? We are using a lot of your code, and development takes place here
EDIT:
It was a misconfiguration problem.
Running marathon-consul on one of my Mesos slaves does not work when using the Consul DNS service.
sudo docker run -i -t ciscocloud/marathon-consul --marathon-location=marathon.service.consul:8080 --registry=http://consul.service.consul:8500 --registry=http://consul.service.consul:8500 --log-level=debug
I get the following:
DEBU[0000] asking Marathon for its version location=marathon.service.consul:8080
INFO[0000] syncing apps
DEBU[0000] asking Marathon for apps location=marathon.service.consul:8080
ERRO[0003] Get http://:@marathon.service.consul:8080/v2/info: dial tcp: lookup marathon.service.consul: no such host location=marathon.service.consul:8080 protocol=http statusCode=???
WARN[0003] version parsing failed, assuming >= 0.9.0 error=Get http://:@marathon.service.consul:8080/v2/info: dial tcp: lookup marathon.service.consul: no such host
INFO[0003] detected Marathon events endpoint version=0.9.0
ERRO[0003] Get http://:@marathon.service.consul:8080/v2/apps: dial tcp: lookup marathon.service.consul: no such host location=marathon.service.consul:8080 protocol=http statusCode=???
ERRO[0003] HTTP request for /v2/events failed! error=Get http://:@marathon.service.consul:8080/v2/events: dial tcp: lookup marathon.service.consul: no such host
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x0 pc=0x402a74]
goroutine 1 [running]:
main.SubscribeToEventStream(0xc20808c300, 0x7ffe4aee0ef4, 0x1c, 0x7895c0, 0x4, 0xc20803ad80, 0x0, 0xc20801cd80)
/go/src/github.com/CiscoCloud/marathon-consul/main.go:65 +0x1264
main.main()
/go/src/github.com/CiscoCloud/marathon-consul/main.go:54 +0xa05
goroutine 17 [syscall, locked to thread]:
runtime.goexit()
/usr/lib/go/src/runtime/asm_amd64.s:2232 +0x1
Any hints on building this URL dynamically?
If I use the IP of one of the Marathon masters, it resolves the Consul IP correctly, so resolving this should be easy (if I am passing the Marathon URL correctly).
Currently the app is only a webhook system. In order to do things as described in #3, we need to break some of the functionality out into smaller, composable pieces.
Work for this is being tracked on the feature/refactoring branch.
Marathon has a new server-sent events endpoint in 0.9. It would simplify configuration to just open a connection to Marathon instead of configuring Marathon to push events.
The http://0.0.0.0:4000/health endpoint is not working in the latest version.
I guess it is less important since, with newer versions of Marathon, /events is used instead of a subscription. However, it would be nice to know the health of the container. This would match mesos-consul, which has had a health check added.
If an app is started by Marathon before marathon-consul is running, marathon-consul does not update the Consul state for that app until the app is restarted.
This can be observed by stopping an app, stopping marathon-consul, starting the previously stopped app, and then starting marathon-consul. Since marathon-consul receives no event, the Consul state is not changed.
I think a useful remedy would be to have marathon-consul query the state of all apps at startup and replace any values under the /kv/marathon path with the current state of the world. If nothing has changed, this should be idempotent, but if an app's state changed while marathon-consul was away, this will get things back on track.
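The startup resync proposed above boils down to a diff between what Marathon reports now and what is stored under the marathon/ prefix. The sketch below assumes both sides have already been flattened into key -> value maps; fetching them from Marathon and Consul is left out, and reconcile is a hypothetical name. The desired map must contain every key the bridge owns (apps and tasks). Otherwise, live task keys would be scheduled for deletion. Writing only changed keys and deleting only stale ones is what makes repeated runs idempotent.

```go
package main

import (
	"bytes"
	"sort"
)

// reconcile compares the desired state built from Marathon with what is
// stored in Consul. It returns keys to write (new or changed values) and
// keys to delete (present in Consul but no longer in Marathon).
func reconcile(desired, stored map[string][]byte) (puts map[string][]byte, deletes []string) {
	puts = make(map[string][]byte)
	for k, v := range desired {
		if old, ok := stored[k]; !ok || !bytes.Equal(old, v) {
			puts[k] = v
		}
	}
	for k := range stored {
		if _, ok := desired[k]; !ok {
			deletes = append(deletes, k)
		}
	}
	sort.Strings(deletes) // deterministic order for logging
	return puts, deletes
}
```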
I can't understand the logic of port mappings and service ports in Marathon when using marathon-consul.
Here is a JSON file describing an app running on Marathon, and the resulting entry in my Consul KV store:
https://gist.github.com/tgermain/32e0387ba9f1491a2aad
Note the ports section in the KV store:
"ports":[
30501,
10002
],
The specified port 30501 is still there, and servicePort=0
has been assigned a free port on the host, here 10002.
Here is a slightly different JSON file describing the same app with a different portMapping, and the resulting entry in my Consul KV store: https://gist.github.com/tgermain/add38414effe1557d612
Here is the problem :
"ports":[
0
],
There is no trace of port 30501, and marathon-consul (and consul-template) interpret this as "easy-python-app is listening on port 0 on the host".
As far as I know, nothing can actually be reached on port 0.
Any hints? Am I missing something in the port mapping definition? Is this related to Marathon rather than marathon-consul?
First, thanks for the great mesos and marathon tooling -- it's all been extremely solid and reliable!
I am using marathon-consul along with haproxy-consul for load balancing.
As soon as a task is running, it's added to Consul and thus to the HAProxy config. If an application has a non-trivial startup time, I start to see 503s. That is, tasks are added to the LB before they report healthy.
I've been trying to figure out the best way to solve this. The app definition in Consul does include health check info, so in theory I could translate the Marathon checks into HAProxy checks in my haproxy-consul template. In practice, though, the Marathon checks can be more complex than HAProxy can express (e.g. some of the apps use "COMMAND" Marathon health checks). It also seems non-trivial to do the translation in the template, and HAProxy and Marathon might not agree.
My current thinking is that it makes sense to leave health checks to Marathon and include their results in the KV store (perhaps at /<app>/<task>/healthCheck?).
Then, in the template, a task whose app defines a health check would only be rendered if the task's health check is passing (i.e. not unknown or failing).
I believe this can be done by adding support for the following event types from the event bus:
add_health_check_event
remove_health_check_event
failed_health_check_event
health_status_changed_event
and including this new information in the KV.
I am happy to submit a PR for this and would love any feedback on this approach, or on whether there's a better way.
thanks!
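Below is a minimal sketch of the proposed health_status_changed_event handling. It assumes the event carries appId, taskId, and a boolean alive field, as described in Marathon 0.x's event bus documentation; later Marathon releases changed some field names, so check against the version in use. The healthKV function is hypothetical. Its key layout, marathon/<app>/tasks/<task>/healthCheck, is one reading of the /<app>/<task>/healthCheck path proposed above, reusing the dash-separated app naming seen earlier in this thread.

```go
package main

import (
	"encoding/json"
	"strings"
)

// healthStatusChanged holds the fields this sketch reads from a Marathon
// health_status_changed_event (field names assumed from Marathon 0.x).
type healthStatusChanged struct {
	EventType string `json:"eventType"`
	AppID     string `json:"appId"`
	TaskID    string `json:"taskId"`
	Alive     bool   `json:"alive"`
}

// healthKV turns the event into a hypothetical KV write:
// <prefix>/<app>/tasks/<task>/healthCheck -> "passing" or "failing".
// ok is false for malformed input or any other event type.
func healthKV(prefix string, raw []byte) (key, value string, ok bool) {
	var ev healthStatusChanged
	if err := json.Unmarshal(raw, &ev); err != nil || ev.EventType != "health_status_changed_event" {
		return "", "", false
	}
	app := strings.ReplaceAll(strings.TrimPrefix(ev.AppID, "/"), "/", "-")
	value = "failing"
	if ev.Alive {
		value = "passing"
	}
	return prefix + "/" + app + "/tasks/" + ev.TaskID + "/healthCheck", value, true
}
```

A template could then render a task only when its healthCheck key reads "passing", treating a missing key as unknown.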
When deploying a group to Marathon, we see only the task information appear in Consul.
After we restart marathon-consul, all the app config and tasks appear.
K/V Before Restart:
marathon/group-test-app1/tasks/group-test_app1.string
marathon/group-test-app2/tasks/group-test_app2.string
K/V After Restart:
marathon/group-test-app1
marathon/group-test-app1/tasks/group-test_app1.string
marathon/group-test-app2
marathon/group-test-app2/tasks/group-test_app2.string
We are using Marathon 0.8.1.
Here is an example config:
{
"id": "group-test",
"apps": [
{
"id": "/group-test/app1",
"cmd": null,
"cpus": 0.1,
"mem": 256.0,
"instances": 1,
"container": {
"type": "DOCKER",
"docker": {
"image": "nginx",
"network": "BRIDGE",
"portMappings": [
{ "containerPort": 80, "hostPort": 0, "servicePort": 0, "protocol": "tcp" }
]
}
}
},
{
"id": "/group-test/app2",
"cmd": null,
"cpus": 0.1,
"mem": 256.0,
"instances": 1,
"container": {
"type": "DOCKER",
"docker": {
"image": "nginx",
"network": "BRIDGE",
"portMappings": [
{ "containerPort": 80, "hostPort": 0, "servicePort": 0, "protocol": "tcp" }
]
}
}
}
],
"dependencies": []
}
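The before/after listing is consistent with task keys being written from status_update_event, while app keys are only written on api_post_event or during the full resync at startup. An earlier log in this thread also shows group_change_success reaching "not handling event". This is an inference from the symptoms, not a confirmed diagnosis. The sketch below shows one possible remedy as a hypothetical dispatcher (routeEvent and its action names are not marathon-consul code): group and deployment events trigger an app-level sync, and status updates keep writing task keys.

```go
package main

// eventAction says what a bridge would do with one Marathon event.
type eventAction int

const (
	ignoreEvent eventAction = iota
	writeTask               // update the task key from the event body
	syncApps                // re-read app definitions from Marathon's API
)

// routeEvent is a hypothetical dispatcher. Routing group and deployment
// events to syncApps is the assumed change, on the assumption that apps
// deployed through /v2/groups may not produce api_post_event.
func routeEvent(eventType string) eventAction {
	switch eventType {
	case "status_update_event":
		return writeTask
	case "api_post_event", "group_change_success", "deployment_success":
		return syncApps
	default:
		return ignoreEvent
	}
}
```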
My Consul server is behind a reverse proxy, and I am passing the arguments below to marathon-consul for Consul auth, but it is not getting authenticated. Can somebody please take a look?
"args": [
"--registry=http://172.28.58.230:8500",
"--registry-auth=admin:$apr1$0xUM.ti7$dQuRPxHFuNH42LK1iqcMR1",
"--marathon-location=dev.marathon.example.com:443",
"--marathon-protocol=https",
"--marathon-username=marathon",
"--marathon-password=marathon123"
]
Error:
time="2018-01-11T12:08:57Z" level=error msg="body generated error" error="Unexpected response code: 401 (<html>
<head><title>401 Authorization Required</title></head>
<body bgcolor="white">
<center><h1>401 Authorization Required</h1></center>
<hr><center>nginx/1.10.1</center>
</body>
</html>
)" eventType="status_update_event"
time="2018-01-11T12:08:57Z" level=info msg="handling event" eventType="status_update_event"
time="2018-01-11T12:08:57Z" level=error msg="body generated error" error="Unexpected response code: 401 (<html>
<head><title>401 Authorization Required</title></head>
<body bgcolor="white">
<center><h1>401 Authorization Required</h1></center>
<hr><center>nginx/1.10.1</center>