Comments (6)
from go-ceph.
Going to give some context for any users who run into a similar problem. Looking at the code, you can see how I handle closing the connection (via `defer`). For some reason, that is not freeing up resources. I decided to use `heaptrack`, and lo and behold, `rados_create2` leaks (see pic).

If I drop the `defer` and instead just call `Destroy`/`Shutdown` where needed (on error, and before exit), there is no memory leak:

```go
// Called explicitly on each error path and before exit, instead of via defer.
ioctx.Destroy()
conn.Shutdown()
```

It's really weird; it's as if the deferred calls to `conn.Shutdown()` or `ioctx.Destroy()` never get called.
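For anyone hitting the same symptom, it may help to rule out Go itself first. A `defer` registered after a resource is opened runs on every return path, including early error returns, and multiple deferred calls run in last-in, first-out order. The sketch below is illustrative only: `fakeConn` and `fakeIOContext` are hypothetical stand-ins for `*rados.Conn` and `*rados.IOContext` (no librados is involved), and `doWork` just records the order in which cleanup runs.

```go
package main

import (
	"errors"
	"fmt"
)

// fakeConn and fakeIOContext are hypothetical stand-ins for *rados.Conn and
// *rados.IOContext; they only record which cleanup calls were made.
type fakeConn struct{ log *[]string }

func (c *fakeConn) Shutdown() { *c.log = append(*c.log, "conn.Shutdown") }

type fakeIOContext struct{ log *[]string }

func (i *fakeIOContext) Destroy() { *i.log = append(*i.log, "ioctx.Destroy") }

// doWork mirrors the "open, defer cleanup, maybe fail" shape from the comment.
// fail simulates an error after both resources have been opened.
func doWork(log *[]string, fail bool) error {
	conn := &fakeConn{log: log}
	defer conn.Shutdown() // registered first, so it runs last

	ioctx := &fakeIOContext{log: log}
	defer ioctx.Destroy() // registered second, so it runs first

	if fail {
		return errors.New("simulated failure")
	}
	return nil
}

func main() {
	for _, fail := range []bool{false, true} {
		var calls []string
		err := doWork(&calls, fail)
		fmt.Println("fail:", fail, "err:", err, "cleanup:", calls)
	}
}
```

Both the success and the error path report the same cleanup sequence (`ioctx.Destroy` then `conn.Shutdown`), which is the same order the explicit calls above use.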
@phlogistonjohn go-ceph does indeed have a memory leak related to how `Shutdown` is called. See the code above and golang/go#43363 (comment).

If you return a `*rados.Conn` and then immediately `defer conn.Shutdown()`, it will not be evaluated properly and will therefore leak. I have worked around this by simply calling `Shutdown()`/`Destroy()` at the end of whatever I'm doing. However, it may also be possible to use `defer func() { conn.Shutdown() }()` to keep the code more idiomatic.
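The two `defer` forms differ in one well-defined way: in `defer conn.Shutdown()` the receiver `conn` is evaluated when the `defer` statement executes, while in `defer func() { conn.Shutdown() }()` the closure reads `conn` only when the surrounding function returns. The sketch below uses a hypothetical `conn` type (not go-ceph's) whose `Shutdown` records its name, and reassigns the variable after the `defer` (for example, a hypothetical reconnect) to show which value each form shuts down.

```go
package main

import "fmt"

// conn is a hypothetical stand-in for *rados.Conn; Shutdown records its name.
type conn struct{ name string }

var shutdowns []string

func (c *conn) Shutdown() { shutdowns = append(shutdowns, c.name) }

// direct defers the method call: the receiver (the "first" *conn) is captured
// when the defer statement runs, so the later reassignment is not seen.
func direct() {
	c := &conn{name: "first"}
	defer c.Shutdown()
	c = &conn{name: "second"} // "second" is never shut down here
}

// viaClosure defers a closure: c is read when the closure runs at return,
// so it shuts down whatever c points to at that moment.
func viaClosure() {
	c := &conn{name: "first"}
	defer func() { c.Shutdown() }()
	c = &conn{name: "second"} // now "first" is the one left without Shutdown
}

func main() {
	direct()
	viaClosure()
	fmt.Println(shutdowns)
}
```

If `conn` is never reassigned after the `defer`, the two forms shut down the same object and are equivalent.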
Since a `defer` statement evaluates its receiver and arguments at `defer` time, `freeConn(c)` and/or `ensureConnected` are buggy in some way.

Basically, the connection I return after `conn.Connect()` isn't "complete" enough for `defer conn.Shutdown()` to properly reap resources. I might experiment with it more, but if this is done rapidly in a goroutine, the connections can leak.
OK, thanks for the update. Without much investigation on my part yet, an issue with `Shutdown` seems more plausible to me. I'm reopening this issue since it was automatically closed by the other PR. We'll look into it soon.
It's very hard to reproduce. Heaptrack indicates it affects only about 6% of connections. I tried a custom program that just ran these checks endlessly, and it never leaked. In the program where I discovered this, it takes 10+ hours for the leak to occur. I made my own custom changes to go-ceph: I tried removing `runtime.SetFinalizer`, and I tried freeing directly if `conn.cluster != nil`, and it still leaked.
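For context on the two experiments above, a Go wrapper around a C handle commonly pairs an explicit, idempotent `Shutdown` (guarded by a nil check on the handle) with `runtime.SetFinalizer` as a safety net for callers who forget to clean up. The sketch below shows that general pattern only; `handle`, `freeHandle`, `Conn`, and `NewConn` are hypothetical names standing in for the real cgo types and calls, and this is not go-ceph's actual implementation.

```go
package main

import (
	"fmt"
	"runtime"
)

// handle is a hypothetical stand-in for a C resource such as a rados_t.
type handle struct{ id int }

var freed []int

// freeHandle is a hypothetical stand-in for the C cleanup call.
func freeHandle(h *handle) { freed = append(freed, h.id) }

// Conn wraps the handle; cluster is nil once the handle has been released.
type Conn struct{ cluster *handle }

// NewConn creates a wrapper and registers a finalizer as a safety net.
func NewConn(id int) *Conn {
	c := &Conn{cluster: &handle{id: id}}
	runtime.SetFinalizer(c, func(c *Conn) { c.Shutdown() })
	return c
}

// Shutdown releases the handle at most once thanks to the nil check,
// then clears the finalizer because it is no longer needed.
func (c *Conn) Shutdown() {
	if c.cluster != nil {
		freeHandle(c.cluster)
		c.cluster = nil
	}
	runtime.SetFinalizer(c, nil)
}

func main() {
	c := NewConn(1)
	c.Shutdown()
	c.Shutdown() // second call is a no-op
	fmt.Println("freed handles:", freed)
}
```

Because an explicit `Shutdown` clears the finalizer, removing the finalizer should make no difference for callers that always shut down, which matches the "no difference" result reported in this thread.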
@shell-skrimp So, I'm a bit confused now. Which of your findings from above (defer vs. direct call, etc.) are still valid? Now it sounds like there is a leak no matter what. To be honest, it even sounds like there might be a race within Ceph itself. But I just started to look into this, so it's just a gut feeling.
@ansiwen Neither is valid. I thought direct calls were better than `defer` because initial testing showed no memory leak, but in the end there was still a leak; it just took thousands of new connections to the Ceph cluster to reproduce.

What I did in my testing:
- Removed `runtime.SetFinalizer`; no difference.
- Removed `defer` and freed/closed/destroyed directly; no difference.
- Changed `Shutdown` to `if c.cluster != nil { c.rados_shutdown(...) }` (going by memory on this one); no difference.
In the meantime, I switched to a long-lived Ceph connection, and that seems to have fixed the issue for now.
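The long-lived connection workaround can be sketched with `sync.Once`: the connection is created on first use and then shared by all callers, instead of being created and shut down per operation. `Conn`, `dial`, and `sharedConn` are hypothetical names; with go-ceph, `dial` would presumably wrap the caller's usual setup (such as `rados.NewConn`, loading the config, and `Connect`), which is an assumption rather than code from this thread.

```go
package main

import (
	"fmt"
	"sync"
)

// Conn is a hypothetical stand-in for *rados.Conn.
type Conn struct{ id int }

var dials int

// dial is a hypothetical stand-in for creating and connecting a client.
func dial() (*Conn, error) {
	dials++
	return &Conn{id: dials}, nil
}

var (
	once    sync.Once
	shared  *Conn
	dialErr error
)

// sharedConn returns one process-wide connection, creating it on first use.
// Callers must not call Shutdown on it; it lives until the process exits.
func sharedConn() (*Conn, error) {
	once.Do(func() { shared, dialErr = dial() })
	return shared, dialErr
}

func main() {
	c1, err := sharedConn()
	if err != nil {
		fmt.Println("dial failed:", err)
		return
	}
	c2, _ := sharedConn()
	fmt.Println("same connection:", c1 == c2, "dials:", dials)
}
```

One design caveat: `sync.Once` runs its function only once, so if the first `dial` fails, the error is cached and never retried. A production version might need a mutex-guarded retry instead.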
Related Issues (20)
- CI job 'check' emits a warning about go.mod
- rgw: Empty usage problem
- TestPingMonitor crashes
- Enhance `GetPoolStats()` Method to Include `Num_bytes_available` Field in `PoolStat`
- APIs pending stability updates in v0.24.0
- Support bucket scope quota
- Add support for rbd_resize2
- Implement subvolume quiesce API
- Need squid branch support
- Pacific CI jobs are failing with package dependencies
- Should `Resize()` after `EncryptionLoad()` account for the encryption header space?
- Quiesce test failing for pre-squid
- TestCloneSubVolumeSnapshot failing on ceph main branch
- APIs pending stability updates in v0.27.0
- APIs pending stability updates in v0.28.0
- TestRadosGWTestSuite/TestUserBucket is consistently failing in CI
- API call to set image QoS
- CI failures with pre-reef and main jobs
- APIs pending stability updates in v0.29.0
- Build error on ioctx_octopus.go even will using -tags