Comments (15)

mrogers950 commented on June 22, 2024

@felixkrohn yes, thanks for your help! The pprof data shows what I expected: the daemon's actual heap usage is only a small percentage of the total reported by the cluster.
Here it is only about 7 MB in total:
[image]

This coincides with what I found about the space reserved by the Go runtime, which I tried to outline briefly here: https://mrogers950.gitlab.io/golang/2021/03/12/wild-crazy-golang-mem/
So I believe the high usage will be addressed by golang/go#44167 (referenced by golang/go#43699).

But I think we can now support pod limits properly, because the daemon pods are more robust and should be able to handle an occasional restart caused by OOM. I'll work on a PR for that.

from file-integrity-operator.

jhrozek commented on June 22, 2024

Thank you for filing the issue. We'll look into it next sprint.
While the resource limits are something we wanted to set either way, we also want to see if we can find the root cause of the leak.

mrogers950 commented on June 22, 2024

@felixkrohn would you be able to run through the steps outlined in https://mrogers950.gitlab.io/openshift/2021/04/12/fio-profile/?
It will enable pprof for the ds pods, but requires a container build from source. If you can capture the heap data at a few points (like once a few days in, then again the next week), that could be useful for us to take a look at. I've traced the same slow leak myself and it would be good to have a comparison.

openshift-bot commented on June 22, 2024

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot commented on June 22, 2024

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-bot commented on June 22, 2024

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

openshift-ci-robot commented on June 22, 2024

@openshift-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

felixkrohn commented on June 22, 2024

Would it be possible to re-open this issue? After running for a week, the pods consume about 3 GiB of RAM each.
A current workaround could be to set namespaced defaults, but I find this less elegant.

JAORMX commented on June 22, 2024

@felixkrohn what version are you using?

felixkrohn commented on June 22, 2024

0.1.13 as distributed by RH on operatorhub (image: http://quay.io/file-integrity-operator/file-integrity-operator:0.1.13)

felixkrohn commented on June 22, 2024

Is there anything I can do to help you debug this? (We're not yet running it in production.)
[image: 2021-04-12 09_09_21-Prometheus Time Series Collection and Processing Server]

JAORMX commented on June 22, 2024

@felixkrohn we'll look into it.

felixkrohn commented on June 22, 2024

@mrogers950 Thanks to the great how-to 👍 I got it running, and I will send you the .gz files next week (don't hesitate to remind me should I forget...).

felixkrohn commented on June 22, 2024

Did the traces help in any way?
Would it be OK to add memory limits (somewhere between 500M and 1000M) to the f-i-o deployment, or do you expect this could cause unwanted side effects or even reduce the reliability of the results?

felixkrohn commented on June 22, 2024

Great news! Thanks for the update.
