This directory contains C++ test code for testing the locking of a single file by multiple processes. It was developed to test the behavior of the Lustre "global locking" feature, which can be configured on or off. Problems arose when testing global locking with casacore. In these tests, it appeared as though using global locking with CASA resulted in deadlock.
This test code is written in C++ (`-std=c++20`) and it uses cppzmq for queueing up actions for the parallel processes. The locking options are:
- read lock on file immediately
- write lock on file immediately
- read lock on file at a specific time
- write lock on file at a specific time
- release lock immediately (whether held or not)
- release lock immediately (if held)
- release lock at a specific time (whether held or not)
- release lock at a specific time (if held)
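
To make the difference between the lock and release variants above concrete, here is a minimal sketch of how they could map onto POSIX `fcntl( )` advisory locks. It assumes `fcntl( )` is the locking primitive, which this README does not state, and the `FileLock` class and its member names are hypothetical and are not taken from the test code.

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <string>

// Hypothetical wrapper (not the repository's code): one whole-file advisory lock.
// Assumption: locking is done with POSIX fcntl( ) record locks.
class FileLock {
public:
    explicit FileLock( const std::string &path )
        : fd_( open( path.c_str( ), O_RDWR | O_CREAT, 0644 ) ) { }
    ~FileLock( ) { if ( fd_ >= 0 ) close( fd_ ); }   // closing also drops the lock
    FileLock( const FileLock & ) = delete;
    FileLock &operator=( const FileLock & ) = delete;

    bool read_lock( )  { return apply( F_RDLCK ); }  // shared lock
    bool write_lock( ) { return apply( F_WRLCK ); }  // exclusive lock

    // "release (whether held or not)": always issue the unlock request
    bool release( ) { return apply( F_UNLCK ); }

    // "release (if held)": only unlock when this object acquired a lock
    bool release_if_held( ) { return held_ ? apply( F_UNLCK ) : false; }

    bool held( ) const { return held_; }

private:
    bool apply( short type ) {
        if ( fd_ < 0 ) return false;                 // open( ) failed earlier
        struct flock fl{ };
        fl.l_type = type;
        fl.l_whence = SEEK_SET;
        fl.l_start = 0;
        fl.l_len = 0;                                // 0 means "to end of file"
        // F_SETLKW blocks until the request can be granted
        if ( fcntl( fd_, F_SETLKW, &fl ) != 0 ) return false;
        held_ = ( type != F_UNLCK );
        return true;
    }

    int fd_;
    bool held_ = false;
};
```

With `fcntl( )`, unlocking a region that is not locked is not an error, so in this sketch the "if held" variant has to track the lock state itself.
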
The only other command that the locking processes accept is `stop`, which causes them to exit.
First install cppzmq. Then running `make` should build the test executables, running `make clean` should remove the built files, and running `make package` should create a tar file in `..` (the parent directory).
The test can be run using `./controller /tmp/lock-file 10`, which will start 10 locking processes that will try to lock `/tmp/lock-file` at the same time.
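
Independent processes only contend "at the same time" if they all wait for a shared absolute deadline. The sketch below shows one way a timed action could be implemented with `std::this_thread::sleep_until`. The helper name `run_at` is hypothetical, and the use of `std::chrono::system_clock` is an assumption, since the clock used by the test code is not described here.

```cpp
#include <chrono>
#include <thread>

// Hypothetical helper (not the repository's code): block until an absolute
// wall-clock time, then run the action and return its result.
template <typename Action>
auto run_at( std::chrono::system_clock::time_point when, Action action ) {
    std::this_thread::sleep_until( when );   // returns immediately if 'when' has passed
    return action( );
}
```

A locking process could call, for example, `run_at( deadline, [&]{ return lock.write_lock( ); } )`, where `lock` is whatever lock object the process uses. If every process receives the same `deadline`, all lock attempts start at roughly the same moment.
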
The current `controller.cc` code has all of the locking processes just attempt to acquire a write lock and then release it. This is done with the code:

    send( "file@"+lock_file );
    send( at(now( ) + seconds(1),"write") );
    send( "releaseif" );
    recv( []( const pair<pid_t,string> &r ) -> bool {
              cout << "Controller received: " << get<1>(r) << " from " << get<0>(r) << endl;
              return true;     /*** true means don't retain result in return ***/
          } );
    send( "stop" );

The locking processes only send a result back to the controller when they encounter an empty command queue. This implementation is flexible and allows for testing complicated locking behavior.
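
The "report only when the queue is empty" behavior can be illustrated with a small worker loop. This is a sketch under assumptions and not the actual implementation: the function `drain`, the `execute` and `report` callbacks, and the choice to report only the last result are all hypothetical. In the real code the commands arrive over cppzmq rather than through a `std::deque`.

```cpp
#include <deque>
#include <functional>
#include <string>
#include <vector>

// Hypothetical worker loop (not the repository's code): run queued commands in
// order and send one result back only after the queue has been emptied.
// Returns false when "stop" is seen, true otherwise.
bool drain( std::deque<std::string> &queue,
            const std::function<std::string( const std::string & )> &execute,
            const std::function<void( const std::string & )> &report ) {
    std::string last;
    bool ran = false;
    while ( ! queue.empty( ) ) {
        std::string cmd = queue.front( );
        queue.pop_front( );
        if ( cmd == "stop" ) return false;   // exit without reporting
        last = execute( cmd );               // e.g. "write", "releaseif"
        ran = true;
    }
    if ( ran ) report( last );               // assumption: only the last result is sent
    return true;
}
```

Under this scheme the controller can queue several commands back to back and receives a single message per process once that process has worked through all of them.
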
Lustre "global" file locking (i.e. the locking across nodes that a distributed filesystem is expected to do) has not worked reliably. Because of this, "local" locking has been used, i.e. file locking that is limited to a single node. Switching to "global" file locking resulted in deadlock.