Comments (8)
Hi @fscheiner, short answer: no it hasn't been merged in. Whether it's still needed I don't know.
from gct.
Whether it's still needed I don't know.
Well, if it wasn't changed, I assume it's still a problem, also because of the description in globus/globus-toolkit#63. If the deadlock behaviour can be easily reproduced, we can just compare the results of (1) using the unmodified source and (2) using a "patched" source, to determine if it's needed or not.
from gct.
@msalle
I seem to remember observing globus-gridftp-server
processes that keep running w/o producing any load after past transfers were finished. This could be related to the deadlocks described in globus/globus-toolkit#63. So I will try to reproduce this behaviour with the current GCT code locally and then compare to the behaviour with @bbockelm's patch included.
from gct.
The lock is already held during fork().
The fork is done in globus_l_gfs_spawn_child():
gct/gridftp/server/src/globus_gridftp_server.c
Lines 439 to 443 in da14279
And though this function does not lock the mutex, it is called from
gct/gridftp/server/src/globus_gridftp_server.c
Lines 1154 to 1165 in da14279
gct/gridftp/server/src/globus_gridftp_server.c
Line 1198 in da14279
gct/gridftp/server/src/globus_gridftp_server.c
Lines 1270 to 1273 in da14279
I.e. the lock is already held when globus_l_gfs_spawn_child() is called. There is no need to lock it again in this function. Locking the same mutex again from the same thread would be undefined behaviour.
from gct.
Never mind - I now realise that the original PR was for a different mutex than this one. Please disregard my comment.
from gct.
Hi again.
I looked at this as part of preparing my "fix some compiler warnings" PR #179.
I now think that my comment above was not so off the mark after all.
In the source file in question - gridftp/server/src/globus_gridftp_server.c - the mutex globus_l_gfs_xio_server->mutex is never locked or released. All manipulations of globus_l_gfs_xio_server (assignments, function calls, ...) are (or at least are supposed to be) protected by holding the global mutex globus_l_gfs_mutex. So any locking of mutex globus_l_gfs_xio_server->mutex should not be able to persist if the global mutex is released, and hence my comment above is valid.
However, there is a flaw to this logic - which I think is a bug. For some reason, the lock is released here:
inside the globus_l_gfs_spawn_child() function, even tough the logic demands that the mutex should be locked for the duration of this function. The logic demands that the mutex stays locked after returning from the call to globus_l_gfs_spawn_child() and is released by the call to globus_mutex_unlock on line 1270:
gct/gridftp/server/src/globus_gridftp_server.c
Lines 1192 to 1286 in da14279
As can be seen above, there are some function calls after the return from globus_l_gfs_spawn_child() than uses globus_l_gfs_xio_server, that could potentially screw up the locking on mutex globus_l_gfs_xio_server->mutex without the global lock being locked due to the early release of the lock on line 538.
This unlock also makes no sense for a different reason. Every other call to globus_xio_register_close() is done while the mutex is locked. Here the mutex is released just before calling this function, which makes the call questionable for a different reason.
Dropping the call to release the lock early would fix this.
The direct call to globus_l_gfs_close_cb() in case the callback registration fails must then be handled in some way, since the callback tries to acquire the lock. This could be
if(result != GLOBUS_SUCCESS)
{
globus_mutex_unlock(&globus_l_gfs_mutex);
globus_l_gfs_close_cb(handle, result, NULL);
globus_mutex_lock(&globus_l_gfs_mutex);
}
But the callback function is:
gct/gridftp/server/src/globus_gridftp_server.c
Lines 583 to 601 in da14279
So this would be equivalent to:
if(result != GLOBUS_SUCCESS)
{
globus_mutex_unlock(&globus_l_gfs_mutex);
globus_mutex_lock(&globus_l_gfs_mutex);
globus_i_gfs_connection_closed();
globus_mutex_unlock(&globus_l_gfs_mutex);
globus_mutex_lock(&globus_l_gfs_mutex);
}
i.e., one could just write:
if(result != GLOBUS_SUCCESS)
{
globus_i_gfs_connection_closed();
}
Which would be consistent with the rest of the code in the file, since in other places where, if a callback registration fails in a place where the lock is held, the backup is to execute the code of the callback without the lock/unlock, e.g.:
gct/gridftp/server/src/globus_gridftp_server.c
Lines 905 to 920 in da14279
gct/gridftp/server/src/globus_gridftp_server.c
Lines 866 to 885 in da14279
Coincidently, this change is proposed in the PR #179:
https://github.com/gridcf/gct/pull/179/files#diff-f76f3cae29cf9ff2977b4124b6177a87eb0b1500c72908a7ca6194a6c0e21bd8
from gct.
This came in some 18 years ago. Prior to that, the child was spawned before the locking of globus_l_gfs_mutex
had happened (see
gct/gridftp/server/src/globus_gridftp_server.c
Lines 435 to 462 in 7e6fa4e
...for details). Not sure what the reasons were to change that back then.
But is globus_l_gfs_mutex
for the parent the same mutex as globus_l_gfs_xio_server->mutex
for the spawned child? I ask, because if not, I don't see a chance for a deadlock in the child. But as @bbockelm detected one in globus/globus-toolkit#63, there must have been one in the code (at least back then).
Won't there be a real chance for a deadlock for the parent in
gct/gridftp/server/src/globus_gridftp_server.c
Lines 539 to 543 in da14279
...,after removing the
globus_mutex_unlock()
in line 538, because inside the call in line 539 there happens a globus_mutex_lock()
on handle->context->mutex
(see gct/xio/src/globus_xio_handle.c
Line 2604 in da14279
globus_l_gfs_mutex
- can't succeed until globus_l_gfs_mutex
is unlocked in line 1270.
UPDATE: But maybe not, because if both were the same, the deadlock would - without any changes to line 538 above - already happen in the call to globus_xio_register_open()
in
gct/gridftp/server/src/globus_gridftp_server.c
Line 1180 in da14279
globus_l_xio_register_open()
which then calls globus_mutex_lock(&handle->context->mutex)
before the call to globus_l_gfs_spawn_child()
happens).
I'll check if the changes from #179 show any differences in behaviour for the Globus GridFTP server compared to the current code state (@da14279). I still wonder how the mentioned deadlocks could happen.
from gct.
I assume this is no longer relevant or needed with the inclusion of #179 (specifically https://github.com/gridcf/gct/pull/179/files#diff-f76f3cae29cf9ff2977b4124b6177a87eb0b1500c72908a7ca6194a6c0e21bd8).
from gct.
Related Issues (20)
- fail to compiler gct-6.2 because of openssl HOT 3
- Can't install gct-toolkit release gct-6.2.20210826 HOT 13
- fail to globus-job-run becasue of no permission to access tmp directory on execution node
- globus-gridftp, globus-gram5 and globus-gsi not found HOT 1
- globus_gsi_cert_utils_error.c:42: possible missing "," ? HOT 5
- globus-job-run fails because the job manager failed to create an internal script argument file HOT 2
- where is MDS in GT6 HOT 2
- globus-job-run fails because of no permission to tmp directory HOT 2
- DNS error on repo.gridcf.org HOT 3
- TLSv1.3 handling incorrectly assumes exactly two tickets will be sent
- Weak GSSAPIKexAlgorithms ciphers detected HOT 5
- grid-proxy-init w/OpenSSL 3.x: Weakly encrypted PKCS#12 keystores can't be processed HOT 1
- pipeline doesn't work: ERROR: too many url strings specified HOT 6
- Typo in globus_gsi_system_config.c HOT 1
- autoreconf failure: files not found HOT 1
- Build error: undefined reference to `FIPS_mode' HOT 9
- confusion between ASN1_UTCTIME and ASN1_GENERALIZEDTIME HOT 5
- Lack of IO error checks generate incorrect file checksums HOT 4
- Unknown/unsupported OpenSSL version ("30100040 (OpenSSL 3.1.4 24 Oct 2023)") HOT 9
- RHEL9 clients and dCache on java-17 compatibility HOT 22
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from gct.