Giter Club home page Giter Club logo

Comments (24)

mwinkel-dev avatar mwinkel-dev commented on July 22, 2024 6

Using the undocumented OFD values for MacOS works!!! (Thanks to Darren for suggesting the "evil" experiment.)

An experimental build with OFD locks on Intel MacOS Ventura was able to run mdstcl and execute the deco command without throwing errors. And jTraverser was able to open and edit trees (including modifying data). Plus the TreeSegmentTest (threaded version) passed.

We should probably use the "evil" solution to fix this MacOS Ventura locking issue. Nonetheless, before using an undocumented feature of the kernel, would be good for the team to at least briefly discuss the pros / cons of doing so.

from mdsplus.

mwinkel-dev avatar mwinkel-dev commented on July 22, 2024 1

Thanks for digging into the MacOS Ventura kernel. I will conduct that "evil" test later today.

from mdsplus.

mwinkel-dev avatar mwinkel-dev commented on July 22, 2024

Encountered this same locking issue when used jTraverser to open a tree. This was when using an experimental build of MDSplus for MacOS Ventura 13.4.1 on Apple Silicon.

Note that Josh's bug was for MacOS Ventura on Intel.

from mdsplus.

mwinkel-dev avatar mwinkel-dev commented on July 22, 2024

Another easy way to trigger the bug is to add a node. Following output was generated by an experimental build of MDSplus for MacOS Ventura 13.4.1 on Apple Silicon.

TCL> edit bug_2599 /shot=-1 /new
TCL> add node some_text/usage=text
Error adding node some_text
Error was: %TREE-W-LOCK_FAILURE, Error locking file, perhaps NFSLOCKING not enabled on this system
mdsdcl: --> failed on line 'add node some_text/usage=text'
TCL> 
TCL> add node a_num/usage=num
Error adding node a_num
Error was: %TREE-W-LOCK_FAILURE, Error locking file, perhaps NFSLOCKING not enabled on this system
mdsdcl: --> failed on line 'add node a_num/usage=num'

from mdsplus.

mwinkel-dev avatar mwinkel-dev commented on July 22, 2024

The "Open File Descriptor" (OFD) locks are failing on MacOS Ventura. Using the LLDB debugger, traced the problem to treeshr, and specifically to the io_lock_local() routine in RemoteAccess.c.

The problem is on the fcntl() call in the following code block. If is failing and thus setting err to -1.

  if (use_ofd_locks == 1)
  {
    flock.l_pid = 0;
    err = fcntl(fd, nowait ? F_OFD_SETLK : F_OFD_SETLKW, &flock);
    if (err != 0 && errno == EINVAL)

This investigation triggered the bug by adding a node to a tree. Experiment done with MacOS Ventura running on Apple Silicon, using an experimental build of MDSplus.

from mdsplus.

mwinkel-dev avatar mwinkel-dev commented on July 22, 2024

Disabling the OFD locks (and instead using "process" locks) allows both mdstcl and jTraverser to work.

Although "process" locks are OK for a single program that is multi-threaded, the "open file descriptor" locks are better if have multiple programs simultaneously accessing the same tree. This difference in the locking merits more investigation and discussion before a decision is made regarding a fix.

mwinkel@lap-winkel y_trees % mdstcl
TCL> edit xxx /shot=-1 /new
TCL> add node a/usage=num
TCL> add node b/usage=text
TCL> dir

\XXX::TOP

 :A            :B           

Total of 2 nodes.
TCL> write
TCL> close
TCL> quit

image

The above examples were with an experimental build of MDSplus for MacOS Ventura running on Apple Silicon. (Also need to run these examples on MacOS Ventura for Intel.)

from mdsplus.

joshStillerman avatar joshStillerman commented on July 22, 2024

I would recommend putting in this fix with a #ifdef for apple while we investigate

from mdsplus.

dgarnier avatar dgarnier commented on July 22, 2024

What is that file descriptor pointing to? The actual tree file, or some temporary lock file?

from mdsplus.

mwinkel-dev avatar mwinkel-dev commented on July 22, 2024

Because this bug is blocking issue #2597, it inherits the "U.S. Priority" label from that issue.

from mdsplus.

mwinkel-dev avatar mwinkel-dev commented on July 22, 2024

The LLDB debugger confirms that the file descriptors are pointing to the tree file, and not to a temporary lock file.

from mdsplus.

mwinkel-dev avatar mwinkel-dev commented on July 22, 2024

Using "process" locks instead of "open file descriptor" locks also eliminates the error message associated with the deco command (see initial bug report way above).

However, now the deco command segfaults (when running an experimental MDSplus for MacOS Ventura on Apple Silicon). If the segfault also occurs on MacOS Ventura for Intel, it will be submitted as a separate issue.

from mdsplus.

mwinkel-dev avatar mwinkel-dev commented on July 22, 2024

Using "process" locks instead of "OFD" locks works on Intel MacOS Ventura 13.5. There are no error messages about locking, nor segfaults. For this experiment, used an experimental "CMake" build of MDSplus that was done on an Intel Mac.

Additional testing with Aarch64 (aka Arm64) Ubuntu22 also didn't trigger a segfault with the deco command. The segfault is specific to Apple Silicon (aka Arm64) MacOS Ventura. Thus is just a part of Issue #2597 to port MDSplus to MacOS Ventura on Apple Silicon.

from mdsplus.

mwinkel-dev avatar mwinkel-dev commented on July 22, 2024

The previous post is not quite correct. The TreeSegmentTest fails on Intel MacOS Ventura when using "process" locks. (Experiments with Arm64 Ubuntu 22 show that the test requires "OFD" locks to pass.). However, all other C and Python tests pass (on Intel MacOS Ventura) with "process" locks.

If "OFD" locks are used on Intel MacOS Ventura, then three tests fail: TreeSegmentTest, TreeDeleteNodeTest and mdslib_ctest.

As for Apple Silicon MacOS Ventura, there are ~20 tests that segfault. (Perhaps a compiler flag is missing.). Conjecture is that when the segfaults are fixed, that the "process vs. OFD" locking issue triggered by TreeSegmentTest will also show up in the Apple Silicon build of MDSplus.

from mdsplus.

mwinkel-dev avatar mwinkel-dev commented on July 22, 2024

Correction to my post above (the one with the jTraverser screenshot) -- which got the facts backwards. The OFD locks are needed for multi-threaded programs; the standard "record" locks (bound to a process) are for synchronizing file access by cooperating processes.

from mdsplus.

mwinkel-dev avatar mwinkel-dev commented on July 22, 2024

The OFD locks do not appear in the fcntl.h file that ships with Apple's Xcode development tool.

On MacOS Ventura, the file is found at this location:

/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/fcntl.h

And although the PureDarwin project added OFD locks a few years ago, it is unclear if those changes are in MacOS Ventura. (Both MacOS and PureDarwin are derived from Apple's open source Darwin project, but the forks have likely diverged.)

from mdsplus.

mwinkel-dev avatar mwinkel-dev commented on July 22, 2024

The OFD locks fail on both the APFS and HFS+ file systems. (HFS+ is also known as "Mac OS Extended".)

from mdsplus.

dgarnier avatar dgarnier commented on July 22, 2024

The source code for ventura kernel is here: https://github.com/apple-oss-distributions/xnu. The kernel code for OFD locks are there, in the rel/xnu-8792 branch... which is the kernel for Ventura.

What I'm not understanding is how we even got OFD locks to compile? (see below!)

If they aren't defined in fcntl.h. In the kernel they are defined in xnu/bsd/sys/fcntl.h:

#ifdef PRIVATE
#define F_OFD_SETLK             90      /* Acquire or release open file description lock */
#define F_OFD_SETLKW            91      /* (as F_OFD_SETLK but blocking if conflicting lock) */
#define F_OFD_GETLK             92      /* Examine OFD lock */

These definitions seem to be added for xnu-3247... (and did not exist in kernel before that time.)

Internally this is implemented using the vfs calls.. where this is implemented. So.. looks like a private API.. which probably matches your non-finding of it in the library.

However it currently compiles... there is some code that puts THIS in:

#ifndef F_OFD_SETLK
#define F_OFD_GETLK 36
#define F_OFD_SETLK 37
#define F_OFD_SETLKW 38

So.. that was basically undocumented behavior at that point. Who knows what it is supposed to do? I assume it did a no-op.

I wonder what would happen if you add:

#ifndef F_OFD_SETLK
#if defined (__APPLE__)
#define F_OFD_SETLK      90    
#define F_OFD_SETLKW   91 
#define F_OFD_GETLK      92    
#else
#define F_OFD_GETLK 36
#define F_OFD_SETLK 37
#define F_OFD_SETLKW 38
#endif

This is definitely evil, but...

from mdsplus.

mwinkel-dev avatar mwinkel-dev commented on July 22, 2024

Hacked up the TreeSegmentTest so that when used with standard record locks, it launches processes (not threads). And confirmed that the test runs successfully with three processes (each single-threaded) writing segments concurrently to the same tree. The test used the default settings: 3 writing processes, 100 segments written by each, with every segment having 10,000 elements.

Thus if the undocumented OFD locks in MacOS Ventura don't work, the fallback option for MacOS users of MDSplus will be to only use single-threaded programs for I/O to trees.

The test was performed with an experimental build of MDSplus using CMake on Intel MacOS Ventura.

from mdsplus.

dgarnier avatar dgarnier commented on July 22, 2024

Looking back at the history of this code, seems to me that this change maybe should just be reversed. This is the merge request that effectively was evil, basically assuming that if it isn't windows, its linux. The threaded behavior was there before, so you can see where we could put it back.

#2163

Oh.. and obviously, if we want MacOS as a first class citizen, we need to run unittests on it as well.

from mdsplus.

dgarnier avatar dgarnier commented on July 22, 2024

I found another project on Github that uses my "evil" solution. Who knows.
https://github.com/acronis/pcs-io/blob/a246d934f8fc389346eed0b843c3fd3b00494182/libpcs_io/pcs_compat.h#L96

Also.. here's a blog post that says that the apple shipped version of sqlite3, which is custom and built into MacOS.. also uses this private api.

https://bonsaidb.io/blog/acid-on-apple/

from mdsplus.

mwinkel-dev avatar mwinkel-dev commented on July 22, 2024

Issue #2163 changed many files, but has been in place for two or three years. My hunch (perhaps incorrect) is that it would be safer to keep #2163 and instead just make the minor changes needed to get MDSplus to work with MacOS Ventura.

Issue #2198 is a related change that is incompatible with the existing multi-threaded TreeSegmentTest, because if an operating system does not have OFD locks, then #2198 instead uses standard record locks (not thread-safe). If we allow fallback to record locks, then TreeSegmentTest needs to support that scenario too.

  • On an OS that supports OFD locks, TreeSegmentTest runs its multi-threaded code (e.g., three writer threads).
  • On an OS that only supports record locks, TreeSegmentTest launches processes (e.g., three writer programs).

from mdsplus.

dgarnier avatar dgarnier commented on July 22, 2024

We should probably use the "evil" solution to fix this MacOS Ventura locking issue. Nonetheless, before using an undocumented feature of the kernel, would be good for the team to at least briefly discuss the pros / cons of doing so.

Given that Apple is using it, and it's been built in for 5 past MacOS versions, I think its pretty safe. Watching MacOS evolve over the past 20 years, I think this is much more likely to move from private API to public, then to be removed. It adds compatibility with other codes, is much better than the broken SysV and POSIX semantics, and I can't see a security issue with it.

Of course, we should document the code and keep the test software so the fault shows up quickly with never OS versions.

from mdsplus.

mwinkel-dev avatar mwinkel-dev commented on July 22, 2024

Apple introduced the OFD locks in the kernel, xnu-3247.1.106, which was used in MacOS El Capitan back in 2015. (Thus the OFD locks have been present in 8 versions of MacOS.) And the #defines for the OFD locks have been constant over that period (no change in the numbers).

So I concur that it is reasonable to use the undocumented OFD locks in MDSplus.

from mdsplus.

mwinkel-dev avatar mwinkel-dev commented on July 22, 2024

Apple uses open source software in MacOS, thus publishes the internals on GitHub. For more information about the OFD locks, refer to the following links.

https://opensource.apple.com/releases/

  • click on "Releases" in upper right corner of the page
  • select the desired release of MacOS, and click to go to GitHub
  • then navigate to: xnu/bsd/sys/fcntl.h

https://github.com/apple-oss-distributions/xnu/blob/rel/xnu-3247/bsd/sys/fcntl.h

  • the OFD locks were introduced in xnu-3247
  • OS X El Capitan 10.11 (released 30-Sep-2015) is based on xnu-3247.1.106

from mdsplus.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.