
Comments (37)

aaithal avatar aaithal commented on May 13, 2024 22

Hello,

One of the reasons for not having utilization metrics more granular than service name is the ephemeral nature of Task IDs. Publishing utilization metrics by task IDs can lead to metric spam as most of these tasks are short-lived by nature. It's also very hard to alarm on something as ephemeral as task IDs. Having something that's more human readable and generated makes it easier to do these things.

Emitting utilization metrics aggregated by task definition family and revision is a middle ground that we have considered as an alternative. Is that something you think would be helpful?

Thanks,
Anirudh
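To make the proposed middle ground concrete, here is a minimal sketch of aggregating per-task samples by task definition family and revision. The sample data, the tuple layout, and the function name `aggregate_by_task_definition` are all hypothetical and only illustrate the grouping idea, not how ECS actually computes or publishes metrics.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-task samples: (task_id, "family:revision", cpu_percent).
samples = [
    ("task-a1", "web:12", 35.0),
    ("task-b2", "web:12", 95.0),
    ("task-c3", "worker:4", 20.0),
    ("task-d4", "worker:4", 30.0),
]


def aggregate_by_task_definition(samples):
    """Group CPU samples by task definition and return avg/max/count per group."""
    grouped = defaultdict(list)
    for _task_id, task_def, cpu in samples:
        # The ephemeral task ID is dropped; only the stable name is kept.
        grouped[task_def].append(cpu)
    return {
        task_def: {"avg": mean(values), "max": max(values), "count": len(values)}
        for task_def, values in grouped.items()
    }


summary = aggregate_by_task_definition(samples)
for task_def, stats in sorted(summary.items()):
    print(task_def, stats)
```

The dimension stays stable across task restarts, so it can be alarmed on, but individual tasks are still not distinguishable within a group.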

from containers-roadmap.

akshayram-wolverine avatar akshayram-wolverine commented on May 13, 2024 9

Hi everyone,

This feature is now in preview: https://aws.amazon.com/about-aws/whats-new/2019/07/introducing-container-insights-for-ecs-and-aws-fargate-in-preview/

Look forward to feedback!

from containers-roadmap.

vcolano avatar vcolano commented on May 13, 2024 9

The docs explicitly state that this is not available for AWS Batch: "Currently, Container Insights isn't supported in AWS Batch."

When will this be supported for Batch?

from containers-roadmap.

ayush-san avatar ayush-san commented on May 13, 2024 7

Is there any timeline for it to be supported for batch too?

from containers-roadmap.

jonathonsim avatar jonathonsim commented on May 13, 2024 5

+1 for adding a dimension for TaskName to these metrics. Without it we can't really get a full picture of what's running on the cluster, only what's running in a service. Tasks scheduled by things like CloudWatch Events are invisible.

from containers-roadmap.

esbie avatar esbie commented on May 13, 2024 5

It seems that for ECS, task-level metrics were not added to CloudWatch Container Insights. I only see "TaskDefinitionFamily" in the ECS supported dimensions. https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Container-Insights-metrics-ECS.html

from containers-roadmap.

coultn avatar coultn commented on May 13, 2024 4

Thanks for the feedback. I wanted to let you know that the ECS team is aware of this issue, and that it is under active consideration. We always appreciate +1's and additional details on use cases.

from containers-roadmap.

danielfosbery avatar danielfosbery commented on May 13, 2024 4

+1 This would be really helpful. We have a service running that hits 100% max CPU but with an average CPU of about 40%. Some tasks are doing more work than others, and without task-level stats it is very hard to debug which tasks are running at capacity and why.

from containers-roadmap.

christianblunden avatar christianblunden commented on May 13, 2024 3

+1

from containers-roadmap.

blues4ugrl avatar blues4ugrl commented on May 13, 2024 3

Another reason for insight into task-level metrics is for helping debug issues. I have Service A and it runs 30 tasks. By other means (i.e. alerting on CloudWatch events from the ECS Agent) I get notification that 1 or 2 tasks get stopped due to an OutOfMemoryError. When I view service-level metrics and look at max memory utilization during the timeframe that said tasks are stopped, the max utilization is < 80%.

According to documentation:

Service memory utilization (metrics that are filtered by ClusterName and ServiceName) is measured as the total memory in use by the tasks that belong to the service, divided by the total memory that is reserved for the tasks that belong to the service.

Out of my 30 tasks, only 2 of them were stopped due to memory pressure. What about the other tasks? Are they only utilizing a small percentage compared to the 2 that fell over? Or, were they high in utilization as well and only 2 tasks hit that breaking point? Knowing that makes a difference - either you don't have enough capacity overall or you have some code that in certain data scenarios is using a ton of memory.

If you already know the "total memory in use by the tasks that belong to the service" to be able to show us the overall utilization, I'm hoping that based on the conversations/feedback above, you'll find a way to expose it that makes sense to those looking for it. Thanks for listening! :)
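
As a rough illustration of the formula quoted above, the sketch below uses made-up numbers (30 tasks reserving 1024 MiB each, two of them at their limit) to show how a service-level ratio can stay under 80% while individual tasks are at 100%. The figures and the function name `service_memory_utilization` are assumptions for illustration only; they are not taken from the actual service.

```python
def service_memory_utilization(used_mib, reserved_mib):
    """Total memory in use divided by total memory reserved, as a percentage."""
    return 100.0 * sum(used_mib) / sum(reserved_mib)


# Hypothetical service: 30 tasks, each reserving 1024 MiB.
reserved = [1024] * 30
# 28 tasks use 768 MiB (75%); 2 tasks are at their 1024 MiB limit.
used = [768] * 28 + [1024] * 2

service_pct = service_memory_utilization(used, reserved)
per_task_pct = [100.0 * u / r for u, r in zip(used, reserved)]

print(f"service-level utilization: {service_pct:.1f}%")
print(f"highest single-task utilization: {max(per_task_pct):.1f}%")
```

With these numbers the service-level figure is about 76.7%, so the two tasks at their limit only show up in a per-task view.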

from containers-roadmap.

jonathonsim avatar jonathonsim commented on May 13, 2024 1

@aaithal - I think that would give us what we need for our use case

from containers-roadmap.

mdamir avatar mdamir commented on May 13, 2024 1

Yes, @aaithal, that can be helpful. An alternative could be a completely new metric such as "maxMemoryUtilization", which would track the maximum memory utilization among all tasks in an ECS service.

from containers-roadmap.

bashilbers avatar bashilbers commented on May 13, 2024 1

+1. A task's container name does not change that much, right? Could that be used as an alternative dimension for task metrics?

from containers-roadmap.

waffleshop avatar waffleshop commented on May 13, 2024 1

+1

It's hard for me to recommend ECS as a container solution without being able to monitor basic container-level metrics. The burden is being pushed on your consumers to develop our own means of container resource monitoring.

I wrote PowerShell and Python scripts to ship these metrics to CloudWatch, but depending on the number of containers you're running across your environments, the cost can be quite ridiculous. I recommend shipping these metrics to another monitoring solution if you have lots of containers.

from containers-roadmap.

akshayram-wolverine avatar akshayram-wolverine commented on May 13, 2024 1

Shipped! More info here: https://aws.amazon.com/about-aws/whats-new/2019/08/container-monitoring-for-amazon-ecs-eks-and-kubernetes-is-now-available-in-amazon-cloudwatch/

from containers-roadmap.

billyshambrook avatar billyshambrook commented on May 13, 2024

Is this a limitation of the agent or cloudwatch?

from containers-roadmap.

atifrizwan89 avatar atifrizwan89 commented on May 13, 2024

+1 for task level monitoring

from containers-roadmap.

skatenerd avatar skatenerd commented on May 13, 2024

+1 it would be nice to have an official response at least telling us whether cloudwatch will eventually offer this

from containers-roadmap.

bramswenson avatar bramswenson commented on May 13, 2024

@billyshambrook More likely a limitation of ECS itself, and the metrics it is emitting to Cloudwatch. The current dimensions are ClusterName and ServiceName:
http://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/ecs-metricscollected.html

from containers-roadmap.

jrodr12 avatar jrodr12 commented on May 13, 2024

+1

from containers-roadmap.

mandeepbal avatar mandeepbal commented on May 13, 2024

+1

from containers-roadmap.

dustinbolton avatar dustinbolton commented on May 13, 2024

+1

from containers-roadmap.

hlopezvg avatar hlopezvg commented on May 13, 2024

+1

from containers-roadmap.

milanbrahmbhatt avatar milanbrahmbhatt commented on May 13, 2024

+1

from containers-roadmap.

ryanpagel avatar ryanpagel commented on May 13, 2024

+1

from containers-roadmap.

vimmis avatar vimmis commented on May 13, 2024

+1

from containers-roadmap.

kbhandar avatar kbhandar commented on May 13, 2024

+1

from containers-roadmap.

sandeepboyapati avatar sandeepboyapati commented on May 13, 2024

+1

from containers-roadmap.

kandoiNikhil avatar kandoiNikhil commented on May 13, 2024

+1

from containers-roadmap.

DionJones615 avatar DionJones615 commented on May 13, 2024

+1

from containers-roadmap.

abby-fuller avatar abby-fuller commented on May 13, 2024

moving this over to the containers roadmap since this is a feature request and not an ecs-agent issue.

from containers-roadmap.

nicolas-modsy avatar nicolas-modsy commented on May 13, 2024

+1

from containers-roadmap.

deleugpn avatar deleugpn commented on May 13, 2024

My use case is that I often see some of my services with max CPU utilization near 100% and min utilization near 10%. I can only assume that some tasks are working hard while others are being lazy, but I don't know which. I'd like to know so I could either find out why or at least kill them and get a better one.

from containers-roadmap.

enricopesce avatar enricopesce commented on May 13, 2024

+1

from containers-roadmap.

medbensalem avatar medbensalem commented on May 13, 2024

+1

from containers-roadmap.

gdanielson avatar gdanielson commented on May 13, 2024

A big +1 👍 for task-level resource tracking. Since the inside of a running container is normally so opaque, any additional information on run-time state is extremely valuable when things do not go according to plan.

from containers-roadmap.

sasuolanderSito avatar sasuolanderSito commented on May 13, 2024

Technically, is it possible to turn on Container Insights in a Batch compute environment by running the following?

aws ecs update-cluster-settings --cluster BatchComputeEnviromentClusterEC2 --settings "name=containerInsights,value=enabled"

Compute environment for a fargate seems to be just a normal EC2 cluster.
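
For reference, the same request could be sketched in Python against the boto3 ECS client's `update_cluster_settings` call. The helper `enable_container_insights` and the `RecordingClient` stand-in are hypothetical names used here so the snippet runs without AWS credentials. Whether the setting takes effect on a cluster managed by AWS Batch is exactly the open question above, and this sketch does not answer it.

```python
def enable_container_insights(ecs_client, cluster_name):
    """Ask ECS to enable the containerInsights setting on the given cluster."""
    return ecs_client.update_cluster_settings(
        cluster=cluster_name,
        settings=[{"name": "containerInsights", "value": "enabled"}],
    )


class RecordingClient:
    """Hypothetical stand-in for a boto3 ECS client; it only records the call."""

    def __init__(self):
        self.calls = []

    def update_cluster_settings(self, **kwargs):
        self.calls.append(kwargs)
        # Echo the request back in a simplified, made-up response shape.
        return {"cluster": {"clusterName": kwargs["cluster"],
                            "settings": kwargs["settings"]}}


client = RecordingClient()
response = enable_container_insights(client, "BatchComputeEnviromentClusterEC2")
print(response["cluster"]["settings"])
```

Against a real account, the stand-in would be replaced with `boto3.client("ecs")`, assuming boto3 is installed and credentials are configured.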

from containers-roadmap.
