Comments (10)
Apparently it gets caught here:

    // Regular expression to extract runSpecId from instanceId
    // See: https://github.com/mesosphere/marathon/blob/v1.4.0-RC4/src/main/scala/mesosphere/marathon/core/instance/Instance.scala#L244
    var instanceIDRegex = regexp.MustCompile(`^(.+)\.(instance-|marathon-)([^\.]+)$`)

    func (t TaskHealthChange) TaskID() apps.TaskID {
        if t.ID != "" {
            return t.ID
        }
        return apps.TaskID(instanceIDRegex.ReplaceAllString(t.InstanceID, "$1.$3"))
    }

If I read that correctly, it seems that the ID is empty, so the code falls back to the regex replacement.
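To see what that fallback produces, the replacement can be run in isolation. This is a minimal sketch, assuming the regex quoted in this comment and an instance ID of the shape that appears in the logs in this thread; `derivedTaskID` is a hypothetical helper name, not a function from marathon-consul.

```go
package main

import (
	"fmt"
	"regexp"
)

// Same pattern as the one quoted from task_health_change.go.
var instanceIDRegex = regexp.MustCompile(`^(.+)\.(instance-|marathon-)([^\.]+)$`)

// derivedTaskID is a hypothetical stand-in for the fallback branch of TaskID():
// it drops the "instance-" or "marathon-" prefix from the last ID segment.
func derivedTaskID(instanceID string) string {
	return instanceIDRegex.ReplaceAllString(instanceID, "$1.$3")
}

func main() {
	instanceID := "pyt_my-service-svc.instance-5373f6f3-f523-11e9-9b69-02429c8e109f"
	fmt.Println(derivedTaskID(instanceID))
	// prints: pyt_my-service-svc.5373f6f3-f523-11e9-9b69-02429c8e109f
}
```

The derived value has neither the `instance-` prefix nor a `._app.1` suffix, so it cannot equal a task ID such as `pyt_my-service-svc.instance-5373f6f3-f523-11e9-9b69-02429c8e109f._app.1`, which would be consistent with a "Task not found" result.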
from marathon-consul.
I would like to add to this bug, as we are hitting the same thing, and I can add some extra information.
time="2019-10-22T23:25:19Z" level=debug msg="Service is running" Id=pyt_my-service-svc.instance-c98bdb62-f522-11e9-9b69-02429c8e109f._app.1_my-service-svc_31650
time="2019-10-22T23:26:08Z" level=info msg="Got StatusEvent" Id=pyt_my-service-svc.instance-c98bdb62-f522-11e9-9b69-02429c8e109f._app.1 TaskStatus=TASK_KILLED
time="2019-10-22T23:26:08Z" level=info msg=Deregistering Address=10.130.37.52 Id=pyt_my-service-svc.instance-c98bdb62-f522-11e9-9b69-02429c8e109f._app.1_my-service-svc_31650
time="2019-10-22T23:26:09Z" level=info msg="Got StatusEvent" Id=pyt_my-service-svc.instance-5373f6f3-f523-11e9-9b69-02429c8e109f._app.1 TaskStatus=TASK_STARTING
time="2019-10-22T23:26:09Z" level=debug msg="Not handled task status" Id=pyt_my-service-svc.instance-5373f6f3-f523-11e9-9b69-02429c8e109f._app.1 taskStatus=TASK_STARTING
time="2019-10-22T23:26:09Z" level=info msg="Got StatusEvent" Id=pyt_my-service-svc.instance-5373f6f3-f523-11e9-9b69-02429c8e109f._app.1 TaskStatus=TASK_RUNNING
time="2019-10-22T23:26:09Z" level=debug msg="Not handled task status" Id=pyt_my-service-svc.instance-5373f6f3-f523-11e9-9b69-02429c8e109f._app.1 taskStatus=TASK_RUNNING
time="2019-10-22T23:26:24Z" level=info msg="Got HealthStatusEvent" Id=pyt_my-service-svc.5373f6f3-f523-11e9-9b69-02429c8e109f
time="2019-10-22T23:26:24Z" level=debug msg="Asking Marathon for /pyt/my-service-svc" Location="ip-10-130-37-72:8080"
time="2019-10-22T23:26:24Z" level=debug msg="Sending GET request to marathon" Location="ip-10-xx-xx-72:8080" Protocol=http Uri="/v2/apps//pyt/my-service-svc?embed=apps.tasks"
time="2019-10-22T23:26:24Z" level=info msg="Got StatusEvent" Id=pyt_my-service-svc.instance-5373f6f3-f523-11e9-9b69-02429c8e109f._app.1 TaskStatus=TASK_RUNNING
time="2019-10-22T23:26:24Z" level=debug msg="Not handled task status" Id=pyt_my-service-svc.instance-5373f6f3-f523-11e9-9b69-02429c8e109f._app.1 taskStatus=TASK_RUNNING
time="2019-10-22T23:26:24Z" level=error msg="Task not found" Id=pyt_my-service-svc.5373f6f3-f523-11e9-9b69-02429c8e109f
time="2019-10-22T23:30:19Z" level=info msg="Ignoring health check of type COMMAND" Address=10.xx.xx.52 Id=/pyt/my-service-svc
time="2019-10-22T23:30:19Z" level=info msg=Registering Address=10.xx.xx.52 EnableTagOverride=false Id=pyt_my-service-svc.instance-5373f6f3-f523-11e9-9b69-02429c8e109f._app.1_my-service-svc_31586 Name=my-service-svc Port=31586 Tags="[marathon haproxy_fqdn=my-service-svc.core-us.jumio.link haproxy_http_app_public haproxy_http_app_waf_backend=detection-only haproxy_httpchk=GET /api/internal/monitoring/healthchecks haproxy_http_app haproxy_http_app_waf_backend=detection-only marathon-task:pyt_my-service-svc.instance-5373f6f3-f523-11e9-9b69-02429c8e109f._app.1]"
These are the curated logs from a normal Marathon deploy.
The task is correctly deregistered, but it cannot re-register, though this only occurs in the register task.
After some minutes, when the sync-all task is called, the registration of the service works.
msg=Registering Address=10.xx.xx.52 EnableTagOverride=false Id=pyt_my-service-svc.instance-5373f6f3-f523-11e9-9b69-02429c8e109f._app.1_my-service-svc_31586
Unfortunately, the sync is quite infrequent, and this causes downtime for us. We will no doubt increase the frequency of the sync until this issue can be resolved, but this is not ideal.
(In this example, we are talking about 3.5 minutes before the service is registered in Consul again.)
We are working on a fix and will hopefully submit a merge request fairly soon.
We are now testing with 1.5.1 (Marathon 1.9.100 from the yum repo http://repos.mesosphere.com/el/7/$basearch/) and we see the same behavior.
Just to debug, here are some code changes which make it work (but are no real fix in any sense of that combination of words ;)
    diff marathon-consul-master/events/task_health_change.go go/src/github.com/allegro/marathon-consul/events/task_health_change.go
    24c24
    < var instanceIDRegex = regexp.MustCompile(`^(.+)\.(instance-|marathon-)([^\.]+)$`)
    ---
    > var instanceIDRegex = regexp.MustCompile(`^(.+)$`)
    30c30
    < return apps.TaskID(instanceIDRegex.ReplaceAllString(t.InstanceID, "$1.$3"))
    ---
    > return apps.TaskID(instanceIDRegex.ReplaceAllString(t.InstanceID, "$1._app.1"))
So essentially I just take the complete instanceID and append ._app.1 to it, which causes the task to be found.
However, if the task has more than one instance it will fail. Also, the regex change just gives back the input, which should be done in a proper way (but hey, I'm an operator, not a programmer).
The data given by the health status change event seems to be
data: {"appId":"/demo/hello-world","instanceId":"demo_hello-world.instance-b98f6bf3-eb61-11e9-af4c-02423f927891","version":"2019-10-10T13:27:36.878Z","alive":true,"eventType":"health_status_changed_event","timestamp":"2019-10-10T13:27:41.929Z"}
So if we can match that instanceId to a field in the task list, instead of the id field (which does include the ._app.# suffix), we should be in good shape.
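One way to picture that matching is a prefix comparison against the task IDs. This is only a sketch under assumptions: it assumes task IDs have the form instance ID plus a dot-separated suffix such as `._app.1` (as the logs in this thread suggest), the `healthEvent` and `task` types and the `tasksForInstance` helper are hypothetical and not taken from marathon-consul, and the task list is illustrative data.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// healthEvent holds the fields of interest from a health_status_changed_event payload.
type healthEvent struct {
	AppID      string `json:"appId"`
	InstanceID string `json:"instanceId"`
	Alive      bool   `json:"alive"`
}

// task is a hypothetical, trimmed-down stand-in for an entry in the app's task list.
type task struct {
	ID string
}

// tasksForInstance returns the tasks whose ID equals the instance ID or
// starts with the instance ID followed by a dot (for example "._app.1").
func tasksForInstance(tasks []task, instanceID string) []task {
	var matches []task
	for _, t := range tasks {
		if t.ID == instanceID || strings.HasPrefix(t.ID, instanceID+".") {
			matches = append(matches, t)
		}
	}
	return matches
}

func main() {
	payload := `{"appId":"/demo/hello-world","instanceId":"demo_hello-world.instance-b98f6bf3-eb61-11e9-af4c-02423f927891","alive":true}`
	var ev healthEvent
	if err := json.Unmarshal([]byte(payload), &ev); err != nil {
		panic(err)
	}
	// Illustrative task list; the second ID is made up for contrast.
	tasks := []task{
		{ID: "demo_hello-world.instance-b98f6bf3-eb61-11e9-af4c-02423f927891._app.1"},
		{ID: "demo_hello-world.instance-00000000-0000-0000-0000-000000000000._app.1"},
	}
	for _, t := range tasksForInstance(tasks, ev.InstanceID) {
		fmt.Println(t.ID)
	}
}
```

Unlike hard-coding `._app.1`, a prefix match does not depend on the exact suffix, though whether every matching task should be treated the same way is a separate question this sketch does not answer.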
Is there any update on this topic?
There is a PR, but it has a build failure which JurrianFahner can't reproduce. Not sure what is holding up the reply to his question.
The thing is we stopped developing this project. We still use it in production, but it runs on an older version of Marathon and we only use a subset of features. We don't plan to keep it up to date with Marathon releases.
That being said, we'd be happy to accept PRs and release them for you. We don't want to invest time in debugging the Travis build, though.
Even better, you could fork this project and continue its development. If you decide so, we can make it official and provide a link in our README.
OK, thanks for the update. We'll consider our options at this point.