Comments (15)
@felixkrohn Yes, thanks for your help! The pprof data shows what I expected: the daemon's actual heap usage is only a small percentage of the total reported by the cluster.
Here the heap is only about 7 MB total.
This matches what I found about the space reserved by the Go runtime, which I tried to outline briefly here: https://mrogers950.gitlab.io/golang/2021/03/12/wild-crazy-golang-mem/
So I believe the high usage will be addressed by golang/go#44167 (referenced by golang/go#43699).
But I think we can now support pod limits properly, because the daemon pods are more robust and should be able to handle an occasional OOM restart. I'll work on a PR for that.
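For anyone who wants to see the gap between live heap and runtime-reserved memory on their own machine, here is a minimal sketch using the standard `runtime.MemStats` counters. It is not code from the operator; the helper name `heapVsReserved` is made up for illustration, and the numbers it prints will differ from what the cluster reports for a pod.

```go
package main

import (
	"fmt"
	"runtime"
)

// heapVsReserved is a hypothetical helper (not part of the operator).
// It returns the bytes of allocated heap objects (HeapAlloc) and the total
// bytes the Go runtime has obtained from the OS (Sys).
func heapVsReserved() (heapAlloc, sys uint64) {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.HeapAlloc, m.Sys
}

func main() {
	heapAlloc, sys := heapVsReserved()
	// Sys covers the heap plus stacks and runtime-internal structures, so it
	// is always at least as large as HeapAlloc and often much larger.
	fmt.Printf("HeapAlloc: %d KiB\n", heapAlloc/1024)
	fmt.Printf("Sys:       %d KiB\n", sys/1024)
}
```

A heap profile roughly corresponds to the first number, while memory accounting outside the process is closer to the second, which is one way the two views can disagree.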
from file-integrity-operator.
Thank you for filing the issue. We'll look into it next sprint.
While the resource limits are something we wanted to set either way, we also want to see if we can find the root cause of the leak.
@felixkrohn would you be able to run with the steps outlined in https://mrogers950.gitlab.io/openshift/2021/04/12/fio-profile/ ?
It will enable pprof for the ds pods, but requires a container build from source. If you can capture the heap data at a few points (like once a few days in, then again the next week), that could be useful for us to take a look at. I've traced the same slow leak myself and it would be good to have a comparison.
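As a rough idea of what enabling pprof in a Go daemon can look like, here is a minimal sketch built on the standard `net/http/pprof` package. It is not the change described in the linked post; the helper names `newPprofMux` and `heapProfileSize` are hypothetical, and the sketch only checks in-process that a heap profile can be fetched.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/pprof"
)

// newPprofMux is a hypothetical helper that registers the standard pprof
// index page and the heap profile handler on a dedicated mux.
func newPprofMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/debug/pprof/", pprof.Index)
	mux.Handle("/debug/pprof/heap", pprof.Handler("heap"))
	return mux
}

// heapProfileSize requests the heap profile from the handler in-process and
// returns the HTTP status code and the size of the profile in bytes.
func heapProfileSize(h http.Handler) (status, size int) {
	req := httptest.NewRequest(http.MethodGet, "/debug/pprof/heap", nil)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code, rec.Body.Len()
}

func main() {
	// A real daemon would serve this mux with http.ListenAndServe on an
	// address of its choosing; here the handler is only exercised locally.
	status, size := heapProfileSize(newPprofMux())
	fmt.Printf("status=%d, heap profile bytes=%d\n", status, size)
}
```

Saving the response body of `/debug/pprof/heap` at different points in time gives profiles that `go tool pprof` can open and compare.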
Issues go stale after 90d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle rotten
/remove-lifecycle stale
Rotten issues close after 30d of inactivity.
Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.
/close
@openshift-bot: Closing this issue.
In response to this:
Rotten issues close after 30d of inactivity.
Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.
/close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Would it be possible to re-open this issue? After running for a week, the pods consume about 3 GiB of RAM each.
A workaround for now could be to set namespaced defaults, but I find this less elegant.
@felixkrohn What version are you using?
0.1.13 as distributed by RH on operatorhub (image: http://quay.io/file-integrity-operator/file-integrity-operator:0.1.13)
Is there anything I can do to help you debug this? (we're not yet running it in production)
@felixkrohn We'll look into it.
@mrogers950 Thanks to the great how-to 👍 I got it running, and I will send you the .gz files next week (don't hesitate to remind me should I forget...).
Did the traces help in any way?
Would it be OK to add memory limits (something between 500M and 1000M) to the f-i-o deployment, or do you expect this could cause unwanted side effects or even reduce the reliability of the results?
Great news! Thanks for the update.