This is important because containers either have a single entrypoint or a programmatically predictable (internally modular) set of entrypoints (e.g., see https://sci-f.github.io).
<p>
This Oxford Common File Layout (OCFL) specification describes an application-independent approach to the storage
of digital information in a structured, transparent, and predictable manner. It is designed to promote long-term
object management best practices within digital repositories.
</p>
<p>
This specification covers two principle areas:
</p>
Right now the Scientific Filesystem (scif, above) has internal modularity for the software inside, and it has a spot for data, but no specification for how that data is organized. This would be a really strong use case for how the standard can work with containers.
Best Practices
We might want to be conservative about saying "best practices." The document is developing a standard, so by definition it is something like that, but many will hear "best practices" and immediately find reasons why it is not the absolute best.
object management best practices within digital repositories.
What if instead we say "reproducible practices" or "standardized practices" and use a more descriptive word than "best"?
Two Specifications
This part
<li>Structure. A normative specification of the nature of an OCFL Object (the "object-at-rest");</li>
<li>Client Behaviours. A set of recommendations for how OCFL Objects should be acted upon (the
"object-in-motion")
</li>
I think there are two (very big and different) things here that would eventually need to work together. The first part to tackle, I think, is just the first point, and to do so with the second in mind. The second is the language for interacting with it. This again smells a lot like scif --> https://sci-f.github.io/spec-v1, the main difference being that scif has user interface commands that "feel" a lot like how you would interact with a container (e.g., "run", "exec", "shell") but, notably, it doesn't have to be installed in a container.
I think it would be stronger to package these two things somewhat separately, because even if you branded them under the same label, I can see use cases where someone might want one or the other (but not both).
<p>The OCFL initiative arose from a need to have well-defined application-independent file management within
digital repositories.</p>
Digital repositories such as? A file system in an old library is different from object storage, which is different from a flat database, which is different from a relational one.
<p>A general observation is that the contents of a digital repository -- that is, the digital files and metadata
that an institution might wish to manage -- are largely stable. Once content has been accessioned, it is
unlikely to change significantly over its lifetime. This is in contrast to the software applications that
manage these contents, which are ephemeral, requiring constant updating and replacement. Thus, transitions
between application-specific methods of file management to support software upgrades and replacement cycles
This is a really important point - I don't think institutions even know what these metadata are; at least, the ones that have awareness of it are farther along. This is probably more challenging than making the standard itself: going across software and archives in general, for data or otherwise, and deciding what the "important metadata" are. There needs to be a whole workflow to create groups, assign responsibilities, and go through change cycles just for doing that. This is another reason I would set aside the set of functions for now - the needs and functions for interacting with the data structure are probably going to vary by domain, and might also be better carried out by third-party software whose tools understand the data structure.
An OCFL Object is a group of one or more content bitstreams (data and metadata), and their administrative
information that are together identified by a URI. The object may contain a sequence of versions of the
bitstreams that represent the evolution of the object's contents.
</p>
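The quoted paragraph describes an object as a URI-identified group of bitstreams carrying a sequence of versions. A minimal in-memory sketch of that idea, just to pin down the shape being discussed (the class, field, and method names are my own illustrative assumptions, not terms from the draft):

```python
# Minimal sketch of the "sequence of versions" idea from the quoted paragraph.
# All names here (OCFLObjectSketch, add_version, head) are illustrative
# assumptions, not spec terms.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Version:
    # maps a logical file path -> content digest of that bitstream
    state: Dict[str, str]


@dataclass
class OCFLObjectSketch:
    uri: str                                    # identifier for the whole object
    versions: List[Version] = field(default_factory=list)

    def add_version(self, state: Dict[str, str]) -> str:
        # each new version captures a full snapshot of the object's contents
        self.versions.append(Version(state=dict(state)))
        return f"v{len(self.versions)}"         # v1, v2, ... in order of addition

    def head(self) -> Version:
        # the most recent version represents the object's current contents
        return self.versions[-1]
```

A usage pass: `obj = OCFLObjectSketch(uri="urn:example:obj1")`, then `obj.add_version({"a.txt": "digest1"})` yields `"v1"`, and a second `add_version` with a changed digest yields `"v2"` while leaving `"v1"` intact, which is the "evolution of the object's contents" the paragraph describes.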
What do these permissions look like? POSIX? Something else? How does this vary based on traditional NFS vs an object store or a signed container?
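For concreteness on the POSIX half of that question, this is roughly what a POSIX-style permission readout looks like via the standard library; it is purely illustrative of what "POSIX permissions" would mean here, not something the draft defines, and it has no direct analogue on an object store:

```python
# Purely illustrative: a POSIX-style permission readout using the standard
# library, to ground the question above. Not defined by the draft spec.
import os
import stat


def describe_mode(path: str) -> str:
    mode = os.stat(path).st_mode
    # stat.filemode renders the familiar ls-style string,
    # e.g. '-rw-r--r--' for a file or 'drwxr-xr-x' for a directory
    return stat.filemode(mode)
```

An object store typically replaces this per-file mode with bucket/object ACLs or signed URLs, which is exactly why the question of where permissions live seems worth answering in the spec.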
[object_root]
├── 0=ocfl_object_1.0
├── inventory.jsonld
├── inventory.jsonld.sha512
├── logs
│   └── .keep
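To make the layout concrete, here is a hedged sketch of checking a directory against the example tree above. The file names (`0=ocfl_object_1.0`, `inventory.jsonld`, `inventory.jsonld.sha512`) come straight from the example; the helper name and the validation behavior are my own assumptions, not requirements from the draft:

```python
# Hypothetical sketch: does a directory look like the object root drawn above?
# File names are taken from the example tree; the function name and its exact
# checks are my assumptions, not rules from the draft spec.
import hashlib
from pathlib import Path


def looks_like_object_root(root: str) -> bool:
    path = Path(root)
    # the NAMASTE-style conformance declaration marks the object root
    if not (path / "0=ocfl_object_1.0").is_file():
        return False
    inventory = path / "inventory.jsonld"
    digest_file = path / "inventory.jsonld.sha512"
    if not (inventory.is_file() and digest_file.is_file()):
        return False
    # assume the sidecar's first token is the SHA-512 of the inventory file
    expected = digest_file.read_text().split()[0]
    actual = hashlib.sha512(inventory.read_bytes()).hexdigest()
    return expected == actual
```

Notably, a check like this only assumes a filesystem-like namespace with key lookups, which speaks to the question above: the same layout could in principle be mirrored onto object-store keys.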
So this is scoped to a filesystem then? A filesystem in a container?
Another gut reaction is that this sounds a lot like what people would describe as a "data container" but just with a required organization inside. Again a little bit like scif, since it has the /scif root and a predictable structure for the contents within :)
Are there any example use cases? That might be useful.
So I would say this is really great so far! Maybe for this first draft, focus on just the organization and a plan for how the community works around it? Then solutions (or a need for a general set of functions) for interacting with the data structures will start to emerge.