samvera / hydra-works


A Ruby gem implementation of the PCDM Works domain model based on the Samvera software stack

License: Other

Ruby 100.00%

ruby pcdm core-components samvera-community

hydra-works's Introduction

Hydra::Works


What is hydra-works?

The Hydra::Works gem implements the PCDM Works data model using ActiveFedora-based models. In addition to the models, Hydra::Works includes associated behaviors around the broad concept of describable "works" or intellectual entities, the need for which was expressed by a variety of Samvera community use cases.

Product Owner & Maintenance

hydra-works was a Core Component of the Samvera Community. Given a decline in available labor required for maintenance, this project no longer has a dedicated Product Owner. The documentation for what this means can be found here.

Product Owner

Vacant

Until a Product Owner has been identified, we ask that you please direct all requests for support, bug reports, and general questions to the #dev Channel on the Samvera Slack.

Help

The Samvera community is here to help. Please see our support guide.

Getting Started

The PCDM Works domain model includes the following high-level entities:

Collection: a pcdm:Collection that indirectly contains zero or more Works and zero or more Collections
Work: a pcdm:Object that holds zero or more FileSets and zero or more Works
FileSet: a pcdm:Object that groups one or more related pcdm:Files, such as an original file (e.g., PDF document), its derivatives (e.g., a thumbnail), and extracted full-text
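
The containment rules in this list can be sketched in plain Ruby, without ActiveFedora. This is only an illustration of which member types each entity accepts: the class names `SketchCollection`, `SketchWork`, and `SketchFileSet`, and the `add_member` method, are hypothetical and are not part of the gem's API.

```ruby
# Illustrative only: plain-Ruby stand-ins for the three PCDM Works entities.
# None of these classes or methods come from hydra-works itself.
class SketchNode
  attr_reader :members

  def initialize
    @members = []
  end

  # Adds a member if its class name is in this node type's ALLOWED list.
  def add_member(member)
    unless self.class::ALLOWED.include?(member.class.name)
      raise ArgumentError, "#{self.class} cannot contain #{member.class}"
    end
    @members << member
    self
  end
end

class SketchFileSet < SketchNode
  ALLOWED = [].freeze # a FileSet groups files, not Works or Collections
end

class SketchWork < SketchNode
  ALLOWED = %w[SketchWork SketchFileSet].freeze
end

class SketchCollection < SketchNode
  ALLOWED = %w[SketchCollection SketchWork].freeze
end
```

With these stand-ins, adding a work to a collection succeeds, while adding a collection to a work raises an `ArgumentError`.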

View a diagram of the Hydra::Works domain model.

The model provides the following behaviors:

Characterization of original files within FileSets
Generation of derivatives from original files
Virus checking of original files
Full-text extraction from original files

Dependencies

Check out the Hydra::Derivatives README for dependencies.

Additional dependencies required for specs

ClamAV

Mac installation

$ brew install clamav
$ cp /usr/local/etc/clamav/freshclam.conf.sample /usr/local/etc/clamav/freshclam.conf
$ freshclam

Installation

Add these lines to your application's Gemfile:

gem 'hydra-works', '~> 0.15'

And then execute:

$ bundle install

Or install it yourself:

$ gem install hydra-works

Usage

Usage involves extending the behavior provided by this gem. In your application, you can create Hydra::Works-based models like so:

class Collection < ActiveFedora::Base
  include Hydra::Works::CollectionBehavior
end

class Book < ActiveFedora::Base
  include Hydra::Works::WorkBehavior
end

class Page < ActiveFedora::Base
  include Hydra::Works::FileSetBehavior
end

collection = Collection.create
book = Book.create
page = Page.create

collection.members << book
collection.save

book.members << page
book.save

file = page.files.build
file.content = "The quick brown fox jumped over the lazy dog."
page.save

Virus Detection

To turn on virus detection, install ClamAV on your system and add the clamby gem to your Gemfile:

gem 'clamby'

Then include the VirusCheck module in your FileSet class:

class Page < ActiveFedora::Base
  include Hydra::Works::FileSetBehavior
  include Hydra::Works::VirusCheck
end

Access controls

We are using Web ACL as implemented by hydra-access-controls.

How to contribute

If you'd like to contribute to this effort, please check out the contributing guidelines.

Development

Testing with the continuous integration server

You can test Hydra::Works using the same process as our continuous integration server. To do that, run the default rake task, which downloads Solr and Fedora, starts them, and runs the tests for you.

rake

Testing manually

If you want to run the tests manually, first start Solr and FCRepo. To start Solr:

solr_wrapper -v -d solr/config/ -n hydra-test -p 8985

To start FCRepo, open another shell and run:

fcrepo_wrapper -v -p 8986 --no-jms

Note that you won't find these ports mentioned in this codebase, because the testing behavior is inherited from ActiveFedora.

Now you’re ready to run the tests. In the directory where hydra-works is installed, run:

rake works:spec

Acknowledgments

This software has been developed by and is brought to you by the Samvera community. Learn more at the Samvera website.


hydra-works's Issues

Hydra::Works::GenericFile Behaviors

The points below are from @elrayle's original comments in the code. I took out a reference to Hydra::Works::File, since that is a class we decided not to implement. Which of these behaviors actually need to be implemented in hydra-works?

Hydra::Works::GenericWork can NOT aggregate Hydra::PCDM::Collection
Hydra::Works::GenericWork can NOT aggregate Hydra::Works::Collection
Hydra::Works::GenericWork can NOT aggregate Works::GenericWork unless it is also a Hydra::Works::GenericFile
Hydra::Works::GenericWork can aggregate Hydra::Works::GenericFile

Question regarding ICLA and asking for Use Cases via pull request

Given that a few of the participants in the conversation haven't signed ICLAs, does it make sense that we use the pull request model for the code? This is documentation and brainstorming.

I'm a little unclear. Any ideas? @escowles

Hydra::Works::Collection Behaviors

Hydra::Works::Collection can NOT aggregate Hydra::PCDM::Collection unless it is also a Hydra::Works::Collection
Allow Collection to have both GenericWorks and Collections as members

Generate vocabulary for works rdf_types

Temporarily created a vocabulary in /lib/hydra/works/vocab/works_terms.rb

Need to generate this vocabulary once the terms and properties are decided. See samvera/hydra-pcdm#24 for information on how this was done for pcdm.

GenericFile persists as a PCDM Object and hasFiles that are PCDM Files

Reference: Hydra Works PCDM Diagram

Related: #57, #59

service object: AutoGenerateThumbnail

Create a service in Hydra::Works at /lib/hydra/works/services/file/auto_generate_thumbnail.rb

module Hydra::Works
  class AutoGenerateThumbnail

    ##
    # Auto-generate a thumbnail for an existing file.
    #
    # @param [String] :path_to_source_file that serves as the base for the derivative thumbnail
    # @param [String] :path_to_target_dir where the generated thumbnail should be saved
    #
    # @return [String] the path with filename of the generated file

    def self.call( path_to_source_file, path_to_target_dir )

        # TODO write code to generate a thumbnail

    end
  end
end

Work generic_file validation is wrong

https://github.com/projecthydra-labs/hydra-works/blob/master/lib/hydra/works/models/concerns/work_behavior.rb#L41-L42

It validates that each file is a Works::GenericFile. However, Sufia will want to provide its own "GenericFile", which does not satisfy the current validation:

     ArgumentError:
       each file must be a Hydra::Works::GenericFile
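
One possible way to relax the check, sketched here under assumptions, is to validate against a shared behavior module rather than a concrete class. The names `GenericFileBehavior`, `WorksGenericFile`, `SufiaGenericFile`, and `validate_files!` are hypothetical stand-ins and are not taken from hydra-works or Sufia.

```ruby
# Hypothetical marker module standing in for a shared "generic file" behavior.
module GenericFileBehavior
end

# Stand-in for a class defined by the gem.
class WorksGenericFile
  include GenericFileBehavior
end

# Stand-in for an application-level class (for example one provided by Sufia)
# that does not inherit from WorksGenericFile.
class SufiaGenericFile
  include GenericFileBehavior
end

# Accepts any object whose class includes the behavior module,
# regardless of its position in the class hierarchy.
def validate_files!(files)
  files.each do |file|
    next if file.is_a?(GenericFileBehavior)

    raise ArgumentError, 'each file must include GenericFileBehavior'
  end
  true
end
```

Because `is_a?` also matches included modules, both stand-in classes pass the check, while an unrelated object still raises an `ArgumentError`.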

Hydra::Works::GenericWork Behaviors

Hydra::Works::GenericWork can NOT aggregate Hydra::PCDM::Collection
Hydra::Works::GenericWork can NOT aggregate Hydra::Works::Collection
Hydra::Works::GenericWork can NOT aggregate Works::GenericWork unless it is also a Hydra::Works::GenericFile
Hydra::Works::GenericWork can aggregate Hydra::Works::GenericFile
Allow GenericWork to have both GenericWorks and GenericFiles as members

Use Case: Collections, Admin Sets, Display Sets

From Julie Rudder: https://wiki.duraspace.org/display/hydra/Collections%2C+Admin+Sets%2C+Display+Sets

(Putting here so the link doesn't get lost)

Start Sufia pcdm integration and report bugs

Within the pcdm-integration branch of sufia,

run the tests with pcdm & hydra-works as dependencies, look through the failures and record tickets for any flaws in pcdm and hydra-works that it kicks up.
start refactoring sufia to use hydra works GenericFile and GenericWork

Solrize GenericWork

Initial guess would be that you'd want works to solrize their GenericFile's metadata? Do you reindex after a generic file is saved?

service: upload to generic file

# Upload a file to a generic file, optionally running auto-generation services to create variants.
#
# @param [Hydra::Works::GenericFile] :generic_file into which to upload the file
# @param [String] :path_to_file path to the file being uploaded
# @param [Hash] :auto_gen_services info for auto-generating files from the uploaded content file
#
# @return [Hydra::Works::GenericFile] the updated generic file

Questions:

A generic_file is defined as one content file + variants that are auto-generated. Do others agree?
If the generic_file already has files, is the new file considered a new version of the existing content file?
What predicate should be used for identifying the purpose of the uploaded content file and each auto-gen file? (e.g. use - what is the full URI?)
What values should be used for the 'purpose' predicate? (e.g. file.use = "content" | "thumbnail" | "extracted_text")

This was samvera/hydra-pcdm#74

Use Case: Conference Event

Goal and Reason

Given an event (a conference), with many sub-events with many creative works created by many entities about related but different topics:

As a user:

I want to see a hierarchical list of all sub-events that can be sorted by date, title, or creator while maintaining the order of any sub-components (for example, parts 1-3 of a single music performance are kept together and in order).
I want to see a timeline of events with dates and times, so that there is greater context for the event.
I want to see descriptive metadata for the entire event (the conference).
I want to see descriptive metadata for each sub-event (for example, a musical demonstration as part of a presentation).
I want to see technical metadata for each file (let's say an .mp3).
I want to see components by descriptive characteristics, so that I could for example access all the performances of electronic percussion music.
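
The first requirement (sorting sub-events while keeping each sub-event's parts together and in order) can be sketched in plain Ruby. The `SubEvent` struct, its fields, and the `sorted_listing` helper are hypothetical and only illustrate the intended behavior.

```ruby
require 'date'

# Hypothetical stand-in for a sub-event; parts is an already ordered array.
SubEvent = Struct.new(:title, :creator, :date, :parts)

# Sorts sub-events by +key+ (:title, :creator or :date) and returns a flat
# listing. Each sub-event's parts array is never reordered, so the parts stay
# together under their sub-event and keep their stored order.
# Note: sort_by is not guaranteed to be stable for equal keys.
def sorted_listing(sub_events, key)
  sub_events.sort_by { |event| event[key] }.flat_map do |event|
    [event.title] + event.parts.map { |part| "  #{part}" }
  end
end
```

Sorting by a different key changes the order of the sub-events only, never the order of the parts within a sub-event.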

Add tests for collections, generic_works, and generic_files

Write tests equivalent to those in PCDM for each of the models in HydraWorks.

Allow addressing of segments of files as Works

Although deferred at the F2F in Portland, there are use cases that require the addressing of segments of files as Works. Some examples include:

An audio CD ripped to a single file, with offsets as to the tracks or other segmentation.
A digitized image that depicts two pages in a spread, and there is metadata about the individual pages
The division of a video of a musical performance based on different bands playing at different times

This requires multiple Works to be associated with parts of the same File. To use the book case, there would be one Work for each page, and each page would need to refer to the segment of the File that depicts it. If there are URIs for the Files, then fragment URIs could be used to refer to the areas.
The larger question is about the use of hasFile ... which of the two pages has the file? They can't both have it as that would break the files-associated-with-exactly-one-work rule.
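
If Files do have URIs, the fragment approach could follow the W3C Media Fragments syntax (`#t=start,end` for a time range, `#xywh=x,y,w,h` for a pixel region). The sketch below only builds such URIs with Ruby's standard `uri` library; the helper names and the example URLs are hypothetical, and it does not answer the question of which Work "has" the File.

```ruby
require 'uri'

# Returns +file_uri+ with a temporal fragment addressing the range
# from +start_sec+ to +end_sec+ (in seconds).
def temporal_fragment(file_uri, start_sec, end_sec)
  uri = URI.parse(file_uri)
  uri.fragment = "t=#{start_sec},#{end_sec}"
  uri.to_s
end

# Returns +file_uri+ with a spatial fragment addressing a rectangular
# region given in pixels.
def spatial_fragment(file_uri, x, y, width, height)
  uri = URI.parse(file_uri)
  uri.fragment = "xywh=#{x},#{y},#{width},#{height}"
  uri.to_s
end

# A track on a ripped CD, and the left-hand page of a two-page spread.
puts temporal_fragment('http://example.org/files/cd.wav', 0, 215)
puts spatial_fragment('http://example.org/files/spread.tif', 0, 0, 1200, 1800)
```

Under this approach each page-level Work would reference the same File URI with a different fragment, rather than owning a separate File.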

[Integration] Update Sufia to rely on the 3 Hydra::Works classes

Update Sufia GenericFile, GenericWork and Collection to be subclasses of the corresponding Hydra::Works classes.
Report on any bugs this causes if they require changes in hydra-works or hydra-pcdm
If the Sufia test suite passes with this integration, submit a PR to Sufia

Note: If this integration triggers necessary changes in sufia or sufia-models, record the tickets in that repository, not here.

Target ActiveFedora 9

Currently this works with the ActiveFedora 7 API, but UCSD and PSU need it for ActiveFedora 8

Procedure for next steps

With the use case creation in full swing, we'll want to look to the next step. Below is a straw person proposal:

I believe it makes sense to put an announcement on the Hydra Tech Call that we are in the "brainstorming" phase of use cases. We should also announce when the brainstorming phase will wind down (Oct 22 is my proposal) and the consolidation and formalization begins.

The next phase would be to produce the normalized use cases for the next Hydra Tech Call (Oct 29).

From that point, I don't know how we'd proceed, but it gets us one step further along.

Allow Collection to have both GenericWorks and Collections as members

Should LinkedResources be a part of the model?

2098979#diff-dfdaffa041fae792413d39860382f74cR11

Should the linked resource be a property or an entity?

(P. S. presently the LinkedResource model is missing from this repo, but one example is here https://github.com/curationexperts/worthwhile/blob/master/app/models/worthwhile/linked_resource.rb)
This is an architecture question, so it would be great if we can get comments from those who care about modeling.

GenericCollection has files() interface

Issue: generic_collection lacks files() interface but a collection can have associated files.

Use ActiveTriples mechanism for setting multiple types when available

Affects:

collection_behavior.rb
work_behavior.rb
file_behavior.rb

Replace the following code: (substitute appropriate type for WorksTerms.Collection)

    def initialize(*args)
      super(*args)

      t = get_values(:type)
      t << RDFVocabularies::WorksTerms.Collection
      set_value(:type,t)
    end

With something like... (TBD depending on ActiveTriples implementation)

    included do
      type RDFVocabularies::WorksTerms.Collection
    end

Should Hydra::Works be an Engine

Given that Hydra::Works is a modeling exercise, do we need the verbosity of a Rails engine? The Rails engine requires continual rebuilding of a dummy application.

It could instead be a Railtie. Or a gem that leverages ActiveSupport and ActiveModel.

Create a Hydra::Works PCDM Diagram

Make a diagram based on the Sufia PCDM Diagram and add more detail around how pcdm:File metadata is expressed

Use Case: Page level representation (parts of Works)

A multi-page text (such as a book, magazine, article, or similar) may be represented by a variety of different resources with different uses. Contributed towards refining #8

Given a simple object that could be rendered in a page turning interface:

The Work
- is-described-by Descriptive metadata about the Work
- has-representation PDF of the entire Work
- has-part-order [Page 1, Page 2, Page 3, ...]
- has-part Page 1
  - is-described-by Descriptive metadata about Page 1
  - has-representation JPG Access copy of digitized Page 1
  - has-representation JP2/Tiff Master copy of digitized Page 1
  - has-representation ALTO XML of Page 1's textual content
- has-part Page 2
  - ...
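
The outline above maps onto a small data structure in which the part order is stored separately from the parts themselves. This is a sketch under assumptions: the struct names, the `use` labels, and the filenames are invented for illustration and do not come from hydra-works.

```ruby
# Hypothetical stand-ins for the entities in the outline above.
PageRepresentation = Struct.new(:use, :filename)
PagePart = Struct.new(:id, :representations)

PagedWork = Struct.new(:title, :part_order, :parts) do
  # Returns the filename of the +use+ representation for each page,
  # following part_order rather than the order in which parts are stored.
  # Pages without a matching representation yield nil.
  def files_for(use)
    by_id = parts.to_h { |part| [part.id, part] }
    part_order.map do |id|
      rep = by_id.fetch(id).representations.find { |r| r.use == use }
      rep && rep.filename
    end
  end
end
```

A page-turning interface could then call `files_for(:access)` to get the access copies in reading order.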

Diagram Options for Inheritance vs. Mixins (and Namespaces) with Works

Create the contributing guidelines for Use Cases

As a contributor to the Hydra::Works project.
I want to know the process for contributing my use cases.
So that I can have my voice heard.

Documentation: Add working example

Develop and test a working example and update README.md.

Remove redundant declaration of PCDM predicates in hydra-works

hydra-pcdm defines PCDMTerms here: https://github.com/projecthydra-labs/hydra-pcdm/blob/master/lib/hydra/pcdm/vocab/pcdm_terms.rb

Does that mean this code in hydra-works is redundant? https://github.com/projecthydra-labs/hydra-works/blob/master/lib/hydra/works.rb#L10-L24

service object: CreateFileWithUpload

Create a service in Hydra::Works at /lib/hydra/works/services/file/create_file_with_upload.rb

module Hydra::Works
  class CreateFileWithUpload

    ##
    # Upload a file to Fedora and create the PCDM File for it.
    #
    # @param [String] :path_to_file path to the file being uploaded
    #
    # @return [Hydra::PCDM::File] the newly uploaded file

    def self.call( path_to_file )
        # NOTE: This is not the exact ordering for these tasks.  There may be additional tasks.
        # TODO create a PCDM File
        # TODO upload the file to Fedora
        # TODO anything with technical metadata here?
    end
  end
end

Allow GenericWork to have both GenericWorks and GenericFiles as members

Provide a definition of scope of "Work"?

My understanding is that it is a compound or complex object (e.g., a resource that has parts, which may themselves have parts). It is not the bibliographic notion of an abstract Work (as opposed to a physical Item that embodies the Work).

It would be good to come to a common understanding of the definition and thus scope of the effort before starting in on modeling.

Works::Collection persists as a PCDM Collection and can have GenericWorks as members

See Discussion in #18

Reference: Hydra Works PCDM Diagram

Related: #57, #58

setup service architecture

[moved from samvera/hydra-pcdm#57 by @elrayle]
create the following directory structure in Hydra::Works

\lib\hydra\works\services
     \collection
     \generic_work
     \generic_file
     \file

Remove WithEditors

I don't think this code is relevant to the core concerns of this gem.

What definition of Set/List/Collection is in scope?

From #9 and #17 and to clarify #8 ... which notions of collections (either unordered sets, or ordered lists) are in scope for discussion and which aren't?

Should hydra:contains actually be a nested node in Fedora 4?

For example, the GenericWork hasFile predicate is a sub-property of hydra:contains. Should we model this in Fedora-4 by placing the file resource at a subpath of the work?

Update Code Shredding diagram to reflect current Code Shredding spreadsheet

Update the diagram in the CurationEngine overview document:

The Diagram: GenericFiles in CuratonConcern Gem

The Spreadsheet: Code Shredding in Sufia and Hydra::Works

Audio example

Putting this out there because I'm guessing it is out of scope. However here is an example of what we will be tackling next.

Audio Example. Use case for a multi-part item where both order and hierarchy matter. http://server1.variations2.indiana.edu/variations/cgi-bin/access.pl?id=ABF3712

For classical music, the structure is assigned to match the liner notes for the work, mostly. Video works could have similar structures, for example an opera that has acts and solos within acts.

--Assigned structure supports navigation. For example, when a user clicks on a track, that part of the audio plays.

--Each track/part could correlate to separate files OR the parts could point to a time marker on one larger file.

--For each “master file”, multiple access derivatives are made to support various quality viewing experiences (which would have its own technical metadata, etc.)

--Tracks may have access control needs that are different than the intellectual work. For example, a class may only have access to 5 out of 20 tracks.

--Individual tracks/parts may be added to playlists (in the future when they exist)

Use Case: Non-repository (Solr-only) works

Our main search-and-discovery interface, Virgo (http://search.lib.virginia.edu), uses a Solr index built from a variety of sources in addition to our repository (e.g., our Sirsi/Dynix OPAC, HathiTrust, MARCXML linking to licensed content on the web, etc). These sources have their own external mechanisms for maintaining their metadata and for transforming it into Solr documents without any associated Fedora object.

We have a continuing requirement to be able to search across all items' metadata (whether the items are physical or digital) so we have a need to be able to treat Works (essentially, each thing that has a Solr record) in a coherent manner regardless of the origin of their metadata.

There may not be a lot of institutions with a similar requirement, so we wouldn't be looking for direct support of this use case per se -- we just want to ensure that the baseline Hydra::Work is extensible enough to allow it to be supported.

GenericWork has GenericFiles as members and persists as a PCDM Object

Reference: Hydra Works PCDM Diagram

Related to #58, #59

service object: MoveGenericFileToGenericWork

module Hydra::Works
  class MoveGenericFileToGenericWork

    ##
    # Move a generic file from one generic work to another
    #
    # @param [Hydra::Works::GenericWork] :old_parent_work where the generic file currently lives
    # @param [Hydra::Works::GenericWork] :new_parent_work where the generic file is being moved
    # @param [Hydra::Works::GenericFile] :child_generic_file being moved
    #
    # @return [Hydra::Works::GenericWork] the destination generic work

    def self.call( old_parent_work, new_parent_work, child_generic_file )
        # TODO validate all parameters
        # TODO create association between new_parent_work and child_generic_file
        # TODO remove association between old_parent_work and child_generic_file

        # NOTE: This is not a copy.  No new fedora objects should be created as part of this process.
    end
  end
end
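
The three TODO steps can be sketched with plain Ruby objects in place of persisted works. `WorkStub` and `MoveMemberSketch` are hypothetical names, and the in-memory `members` array only stands in for whatever association the real models would use.

```ruby
# Hypothetical stand-in for a work: a title and an array of members.
WorkStub = Struct.new(:title, :members)

class MoveMemberSketch
  # Moves +child+ from +old_parent+ to +new_parent+ without copying it.
  # Returns the destination work.
  def self.call(old_parent, new_parent, child)
    # Step 1: validate the parameters.
    raise ArgumentError, 'parents must differ' if old_parent.equal?(new_parent)
    unless old_parent.members.include?(child)
      raise ArgumentError, 'child is not a member of old_parent'
    end

    # Step 2: associate with the new parent first, so the child is never
    # left without a parent if the next step fails.
    new_parent.members << child unless new_parent.members.include?(child)

    # Step 3: remove the association with the old parent.
    old_parent.members.delete(child)
    new_parent
  end
end
```

The same `child` object ends up in the destination's members, which mirrors the note that the move must not create new objects.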

Provide a sample use case or template

We would do well to provide at least one sample use case, or a use case template, so that potential contributors know what we're looking for. #9, #10, and #11 are quite different from one another. Is that variation OK? How much detail are we looking for, and how much do we care about how they're formatted (we talked previously about user stories, and we're talking now about use cases, and these are different and equally useful things).

JSON-LD Context for ORE

There is one, but in my opinion the proxy construction makes our use cases harder:
http://www.openarchives.org/ore/0.9/jsonld#proxies

I think the intuitive context would generate the following single tree:

{
  "@id": "http://example.com/aggr",
  "@type": "ore:Aggregation",
  "iana:first": "_:p1",
  "iana:last": "_:p2",
  "aggregates": [
    {
      "@id": "http://example.org/1",
      "proxy": {
        "@id": "_:p1",
        "@type": "Proxy",
        "iana:next": "_:p2",
        "proxyIn": "http://example.com/aggr"
      }
    },
    {
      "@id": "http://example.org/2",
      "proxy": {
        "@id": "_:p2",
        "@type": "Proxy",
        "iana:prev": "_:p1",
        "proxyIn": "http://example.com/aggr"
      }
    }
  ]
}

Rather than the two flat lists that the currently proposed context produces:

{
  "@id": "http://example.com/aggr",
  "@type": "ore:Aggregation",
  "iana:first": "_:p1",
  "iana:last": "_:p2",
  "aggregates": [
    "http://example.org/1",
    "http://example.org/2"
  ],
  "proxies": [
    {
      "@id": "_:p1",
      "@type": "Proxy",
      "iana:next": "_:p2",
      "proxyFor": "http://example.org/1"
    },
    {
      "@id": "_:p2",
      "@type": "Proxy",
      "iana:prev": "_:p1",
      "proxyFor": "http://example.org/2"
    }
  ]
}

Thoughts?
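
To make the comparison concrete, the flat form can be rewritten into the nested form mechanically. The sketch below uses only Ruby's standard `json` library and treats the documents as plain JSON, without any JSON-LD processing; `nest_proxies` is a hypothetical helper name, and the key names (`proxy`, `proxyIn`, `proxyFor`) are the ones used in the two examples above.

```ruby
require 'json'

# Converts the flat form (parallel "aggregates" and "proxies" lists) into the
# nested form, in which each aggregated resource carries its own proxy.
def nest_proxies(flat)
  proxies_by_target = flat.fetch('proxies', []).to_h { |p| [p['proxyFor'], p] }

  nested = flat.reject { |key, _| key == 'proxies' }
  nested['aggregates'] = flat.fetch('aggregates', []).map do |resource_id|
    entry = { '@id' => resource_id }
    proxy = proxies_by_target[resource_id]
    if proxy
      # Replace the proxy's pointer to the resource with a pointer back
      # to the aggregation, as in the nested example.
      nested_proxy = proxy.reject { |key, _| key == 'proxyFor' }
      nested_proxy['proxyIn'] = flat['@id']
      entry['proxy'] = nested_proxy
    end
    entry
  end
  nested
end

flat = {
  '@id' => 'http://example.com/aggr',
  'aggregates' => ['http://example.org/1'],
  'proxies' => [{ '@id' => '_:p1', '@type' => 'Proxy', 'proxyFor' => 'http://example.org/1' }]
}
puts JSON.pretty_generate(nest_proxies(flat))
```

The conversion needs a lookup from resource to proxy, which is the extra join a consumer of the flat form would have to perform.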

Test that triples are correctly set in fedora objects

Right now, the tests validate behaviors. There are no tests that validate that expected predicates and values are set in the fedora object.

Ex.

Are generic works associated with a collection via the pcdm:hasMember predicate?
Are related objects associated with a collection via the ore:aggregates predicate?
etc.

Update README and Guidelines for Contributing to reflect current developments

The README is incomplete and out of date. It needs to

Declare what functionality is provided by the gem
Declare the intended uses of the gem
Declare the relationship between this code and Sufia, sufia-models, Worthwhile and Curate
Provide Instructions for Installation/Use
Provide links to background documentation (ie. Application Profile) with context explaining the relationship between that documentation and this code

The Guidelines for Contributing still say that we are soliciting Use Cases. They don't say anything about contributing code or submitting issues other than Use Cases. They should

Provide guidelines for contributing code
Provide guidelines for submitting issues and pull requests
Is there a template for this somewhere in the Hydra space? If not, should we create one and apply it to all of the repositories?

Use Case: Research Dataset

A research dataset containing a set of files organized into top-level categories of preparatory materials, raw data files, statistics, and visualization images, with multiple files in each category. The visualization images are further organized in a hierarchy by type, and then by X/Y/Z axis.

The dataset as a whole has descriptive metadata describing the research project, the research team, etc.
The top-level categories and the hierarchy of visualization images have titles.
Each individual file has a title and technical metadata.

Implement Thumbnail file type

Relates to samvera-deprecated/sufia#1010

Build a thumbnail file class in HydraWorks such that an implementation could do:

class MyGenericFile < Hydra::Works::GenericFile
  has_one :thumbnail, predicate: ::RDF::URI("http://hydraworks.namespace/hasThumbnail"), class_name: "HydraWorks::Thumbnail"
end

and triples associated with MyGenericFile would include <> hasThumbnail HydraWorks::Thumbnail

The thumbnail class extends Hydra::PCDM::File and has its own rdf type as per duraspace/pcdm#5

Specifying additional technical metadata

Hydra::PCDM::File classes will specify a minimal amount of technical metadata (see samvera/hydra-pcdm#16), but this should be augmentable at the Hydra::Works integration level so that an implementor may add their own technical metadata.

Should we provide integration tests that ensure this is feasible and demonstrate how to do it?

Use Cases: Co-Sponsoring?

I'm not sure if this will actually happen because we are all special ❄️ ;) But I can imagine the possibility of multiple institutions having similar use cases.

So if I wanted to co-sponsor(+1) a use case that another institution created, what practice do we want to recommend for supporting this?

Would it make sense to open a PR with a co-sponsor line with @username and/or institution?
