Hydra::Works

Code: Gem Version Build Status Coverage Status Code Climate

Docs: Contribution Guidelines Apache 2.0 License API Docs

Community Support: Samvera Community Slack

What is hydra-works?

The Hydra::Works gem implements the PCDM Works data model using ActiveFedora-based models. In addition to the models, Hydra::Works includes associated behaviors around the broad concept of describable "works" or intellectual entities, the need for which was expressed by a variety of Samvera community use cases.

Product Owner & Maintenance

hydra-works was a Core Component of the Samvera Community. Given a decline in available labor required for maintenance, this project no longer has a dedicated Product Owner. The documentation for what this means can be found here.

Product Owner

Vacant

Until a Product Owner has been identified, we ask that you please direct all requests for support, bug reports, and general questions to the #dev Channel on the Samvera Slack.

Help

The Samvera community is here to help. Please see our support guide.

Getting Started

The PCDM Works domain model includes the following high-level entities:

  • Collection: a pcdm:Collection that indirectly contains zero or more Works and zero or more Collections
  • Work: a pcdm:Object that holds zero or more FileSets and zero or more Works
  • FileSet: a pcdm:Object that groups one or more related pcdm:Files, such as an original file (e.g., PDF document), its derivatives (e.g., a thumbnail), and extracted full-text

View a diagram of the Hydra::Works domain model.
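
As a rough illustration of the containment rules listed above, the following plain-Ruby sketch models the three entities without ActiveFedora. The class names (SketchCollection, SketchWork, SketchFileSet) and the add_member method are hypothetical and are not part of the Hydra::Works API. The sketch also assumes a strict reading of the list, in which anything not named as a permitted member is rejected.

```ruby
# Plain-Ruby sketch of the PCDM Works containment rules.
# All names here are hypothetical; none come from Hydra::Works.
class SketchFileSet
  attr_reader :files

  def initialize
    @files = []
  end
end

class SketchWork
  attr_reader :members

  def initialize
    @members = []
  end

  # Assumption: a Work may hold FileSets and other Works, nothing else.
  def add_member(member)
    unless member.is_a?(SketchFileSet) || member.is_a?(SketchWork)
      raise ArgumentError, "a Work cannot hold a #{member.class}"
    end
    @members << member
    self
  end
end

class SketchCollection
  attr_reader :members

  def initialize
    @members = []
  end

  # Assumption: a Collection may contain Works and other Collections, nothing else.
  def add_member(member)
    unless member.is_a?(SketchWork) || member.is_a?(SketchCollection)
      raise ArgumentError, "a Collection cannot contain a #{member.class}"
    end
    @members << member
    self
  end
end

collection = SketchCollection.new
book = SketchWork.new
page = SketchFileSet.new

collection.add_member(book)
book.add_member(page)
```

In this sketch, adding the FileSet directly to the Collection would raise an ArgumentError, because a FileSet is only reachable through a Work.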

The model provides the following behaviors:

  • Characterization of original files within FileSets
  • Generation of derivatives from original files
  • Virus checking of original files
  • Full-text extraction from original files

Dependencies

Check out the Hydra::Derivatives README for dependencies.

Additional dependencies required for specs

ClamAV

  • Mac installation
    $ brew install clamav
    $ cp /usr/local/etc/clamav/freshclam.conf.sample /usr/local/etc/clamav/freshclam.conf
    $ freshclam
    

Installation

Add this line to your application's Gemfile:

gem 'hydra-works', '~> 0.15'

And then execute:

$ bundle install

Or install it yourself:

$ gem install hydra-works

Usage

Usage involves extending the behavior provided by this gem. In your application, you can create Hydra::Works-based models like so:

class Collection < ActiveFedora::Base
  include Hydra::Works::CollectionBehavior
end

class Book < ActiveFedora::Base
  include Hydra::Works::WorkBehavior
end

class Page < ActiveFedora::Base
  include Hydra::Works::FileSetBehavior
end

collection = Collection.create
book = Book.create
page = Page.create

collection.members << book
collection.save

book.members << page
book.save

file = page.files.build
file.content = "The quick brown fox jumped over the lazy dog."
page.save

Virus Detection

To turn on virus detection, install ClamAV on your system and add the clamby gem to your Gemfile:

gem 'clamby'

Then include the VirusCheck module in your FileSet class:

class Page < ActiveFedora::Base
  include Hydra::Works::FileSetBehavior
  include Hydra::Works::VirusCheck
end

Access controls

We are using Web ACL as implemented by hydra-access-controls.

How to contribute

If you'd like to contribute to this effort, please check out the contributing guidelines.

Development

Testing with the continuous integration server

You can test Hydra::Works using the same process as our continuous integration server. To do that, run the default rake task which will download Solr and Fedora, start them, and run the tests for you.

rake

Testing manually

If you want to run the tests manually, first start Solr and FCRepo. To start Solr:

solr_wrapper -v -d solr/config/ -n hydra-test -p 8985

To start FCRepo, open another shell and run:

fcrepo_wrapper -v -p 8986 --no-jms

Note that you won't find these ports mentioned in this codebase, because the testing behavior is inherited from ActiveFedora.

Now you’re ready to run the tests. In the directory where hydra-works is installed, run:

rake works:spec

Acknowledgments

This software has been developed by and is brought to you by the Samvera community. Learn more at the Samvera website.

hydra-works's People

Contributors

afred, atz, awead, barmintor, blancoj, bmquinn, botimer, carolyncole, cbeer, cjcolvar, dchandekstark, dunn, eliotjordan, elrayle, escowles, flyingzumwalt, grosscol, hackartisan, hectorcorrea, jcoyne, jeremyf, jrgriffiniii, kefo, kevinreiss, little9, mark-dce, mjgiarlo, scherztc, tampakis, tpendragon

hydra-works's Issues

Hydra::Works::GenericFile Behaviors

The points below are from @elrayle's original comments in the code. I took out a reference to Hydra::Works::File, since that is a class we decided not to implement. Which of these are behaviors that actually need to be implemented in hydra-works?

  • Hydra::Works::GenericWork can NOT aggregate Hydra::PCDM::Collection
  • Hydra::Works::GenericWork can NOT aggregate Hydra::Works::Collection
  • Hydra::Works::GenericWork can NOT aggregate Works::GenericWork unless it is also a Hydra::Works::GenericFile
  • Hydra::Works::GenericWork can aggregate Hydra::Works::GenericFile

Hydra::Works::Collection Behaviors

The points below are from @elrayle's original comments in the code. I took out a reference to Hydra::Works::File, since that is a class we decided not to implement. Which of these are behaviors that actually need to be implemented in hydra-works?

  • Hydra::Works::Collection can NOT aggregate Hydra::PCDM::Collection unless it is also a Hydra::Works::Collection
  • Allow Collection to have both GenericWorks and Collections as members

service object: AutoGenerateThumbnail

Create a service in Hydra::Works at /lib/hydra/works/services/file/auto_generate_thumbnail.rb

module Hydra::Works
  class AutoGenerateThumbnail

    ##
    # Auto-generate a thumbnail for an existing file.
    #
    # @param [String] :path_to_source_file that serves as the base for the derivative thumbnail
    # @param [String] :path_to_target_dir where the generated thumbnail should be saved
    #
    # @return [String] the path with filename of the generated file

    def self.call( path_to_source_file, path_to_target_dir )

      # TODO write code to generate a thumbnail

    end
  end
end

Hydra::Works::GenericWork Behaviors

The points below are from @elrayle's original comments in the code. I took out a reference to Hydra::Works::File, since that is a class we decided not to implement. Which of these are behaviors that actually need to be implemented in hydra-works?

  • Hydra::Works::GenericWork can NOT aggregate Hydra::PCDM::Collection
  • Hydra::Works::GenericWork can NOT aggregate Hydra::Works::Collection
  • Hydra::Works::GenericWork can NOT aggregate Works::GenericWork unless it is also a Hydra::Works::GenericFile
  • Hydra::Works::GenericWork can aggregate Hydra::Works::GenericFile
  • Allow GenericWork to have both GenericWorks and GenericFiles as members

Solrize GenericWork

My initial guess is that you'd want works to solrize their GenericFiles' metadata. Do you reindex after a generic file is saved?

service: upload to generic file

# Upload a file to a generic file, optionally running auto-generation services to create variants.
#
# @param [Hydra::Works::GenericFile] :generic_file into which to upload the file
# @param [String] :path_to_file path to the file being uploaded
# @param [Hash] :auto_gen_services info for auto-generating files from the uploaded content file
#
# @return [Hydra::Works::GenericFile] the updated generic file

Questions:

  • A generic_file is defined as one content file + variants that are auto-generated. Do others agree?
  • If the generic_file already has files, is the new file considered a new version of the existing content file?
  • What predicate should be used for identifying the purpose of the uploaded content file and each auto-gen file? (e.g. use - what is the full URI?)
  • What values should be used for the 'purpose' predicate? (e.g. file.use = "content" | "thumbnail" | "extracted_text")

This was samvera/hydra-pcdm#74

Use Case: Conference Event

Sponsors

UC San Diego Library (Metadata Services): @MetadataDeluxe @Juliane666 @remerjohnson

Goal and Reason

Given an event (a conference), with many sub-events with many creative works created by many entities about related but different topics:

As a user:

  • I want to see a hierarchical list of all sub-events that can be sorted by date, title, or creator while maintaining the order of any sub-components (for example, parts 1-3 of a single music performance are kept together and in order).
  • I want to see a timeline of events with dates and times, so that there is greater context for the event.
  • I want to see descriptive metadata for the entire event (the conference).
  • I want to see descriptive metadata for each sub-event (for example, a musical demonstration as part of a presentation).
  • I want to see technical metadata for each file (let's say an .mp3).
  • I want to see components by descriptive characteristics, so that I could for example access all the performances of electronic percussion music.

Allow addressing of segments of files as Works

Although deferred at the F2F in Portland, there are use cases that require the addressing of segments of files as Works. Some examples include:

  • An audio CD ripped to a single file, with offsets as to the tracks or other segmentation.
  • A digitized image that depicts two pages in a spread, and there is metadata about the individual pages
  • The division of a video of a musical performance based on different bands playing at different times

This requires multiple Works to be associated with parts of the same File. To use the book case, there would be one Work for each page, and each page would need to refer to the segment of the File that depicts it. If there are URIs for the Files, then fragment URIs could be used to refer to the areas.
The larger question is about the use of hasFile ... which of the two pages has the file? They can't both have it as that would break the files-associated-with-exactly-one-work rule.
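
One way to picture the fragment-URI idea is with the W3C Media Fragments syntax, where `#t=start,end` addresses a time range and `#xywh=x,y,w,h` addresses a pixel region. The sketch below uses only Ruby's standard library. The file URIs, helper names, and offsets are made up for illustration, and the sketch does not attempt to answer the hasFile question raised here.

```ruby
require "uri"

# Hypothetical helpers: build fragment URIs that point at a segment of a file.
# The fragment syntax follows W3C Media Fragments (t= for time, xywh= for regions).

# Address a time range (in seconds) within an audio or video file.
def time_segment_uri(file_uri, start_sec, end_sec)
  uri = URI.parse(file_uri)
  uri.fragment = "t=#{start_sec},#{end_sec}"
  uri.to_s
end

# Address a rectangular pixel region within an image file.
def region_segment_uri(file_uri, x, y, width, height)
  uri = URI.parse(file_uri)
  uri.fragment = "xywh=#{x},#{y},#{width},#{height}"
  uri.to_s
end

# One ripped CD file, two tracks addressed by offsets (example values).
cd_file = "http://example.org/files/cd1"
track_uris = [[0, 215], [215, 431]].map { |from, to| time_segment_uri(cd_file, from, to) }

# One scanned two-page spread, left and right pages addressed as regions.
spread = "http://example.org/files/spread1"
left_page = region_segment_uri(spread, 0, 0, 1200, 1800)
right_page = region_segment_uri(spread, 1200, 0, 1200, 1800)
```

Each page Work could then refer to its own fragment URI while the underlying File remains a single resource.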

[Integration] Update Sufia to rely on the 3 Hydra::Works classes

  • Update Sufia GenericFile, GenericWork and Collection to be subclasses of the corresponding Hydra::Works classes.
  • Report on any bugs this causes if they require changes in hydra-works or hydra-pcdm
  • If the Sufia test suite passes with this integration, submit a PR to Sufia

Note: If this integration triggers necessary changes in sufia or sufia-models, record the tickets in that repository, not here.

Target ActiveFedora 9

Currently this works with the ActiveFedora 7 API, but UCSD and PSU need it for ActiveFedora 8

Procedure for next steps

With the use case creation in full swing, we'll want to look to the next step. Below is a straw person proposal:

I believe it makes sense to put an announcement on the Hydra Tech Call that we are in the "brainstorming" phase of use cases. We should also announce when the brainstorming phase will wind down (Oct 22 is my proposal) and the consolidation and formalization begins.

The next phase would be to produce the normalized use cases for the next Hydra Tech Call (Oct 29).

From that point, I don't know how we'd proceed, but it gets us one step further along.

Use ActiveTriples mechanism for setting multiple types when available

Affects:

  • collection_behavior.rb
  • work_behavior.rb
  • file_behavior.rb

Replace the following code: (substitute appropriate type for WorksTerms.Collection)

    def initialize(*args)
      super(*args)

      t = get_values(:type)
      t << RDFVocabularies::WorksTerms.Collection
      set_value(:type,t)
    end

With something like... (TBD depending on ActiveTriples implementation)

    included do
      type RDFVocabularies::WorksTerms.Collection
    end

Should Hydra::Works be an Engine

Given that Hydra::Works is a modeling exercise, do we need the verbosity of a Rails engine? The Rails engine requires continual rebuilding of a dummy application.

It could instead be a Railtie. Or a gem that leverages ActiveSupport and ActiveModel.

Use Case: Page level representation (parts of Works)

A multi-page text (such as a book, magazine, article, or similar) may be represented by a variety of different resources with different uses. Contributed towards refining #8

Given a simple object that could be rendered in a page turning interface:

  • The Work
    • is-described-by Descriptive metadata about the Work
    • has-representation PDF of the entire Work
    • has-part-order [Page 1, Page 2, Page 3, ...]
    • has-part Page 1
      • is-described-by Descriptive metadata about Page 1
      • has-representation JPG Access copy of digitized Page 1
      • has-representation JP2/Tiff Master copy of digitized Page 1
      • has-representation ALTO XML of Page 1's textual content
    • has-part Page 2
      • ...
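
The outline above can be sketched as plain Ruby data, which makes the separation between membership (has-part) and ordering (has-part-order) explicit. All struct and field names below are hypothetical stand-ins and do not correspond to Hydra::Works classes or predicates.

```ruby
# Hypothetical plain-Ruby structures mirroring the outline above.
SketchRepresentation = Struct.new(:file_format, :role)
SketchPage = Struct.new(:label, :metadata, :representations)

SketchPagedWork = Struct.new(:metadata, :representations, :parts, :part_order) do
  # Return the pages in the order given by part_order, not insertion order.
  def ordered_parts
    by_label = parts.map { |page| [page.label, page] }.to_h
    part_order.map { |label| by_label.fetch(label) }
  end
end

page1 = SketchPage.new(
  "Page 1",
  { title: "Page 1" },
  [
    SketchRepresentation.new("JPG", :access_copy),
    SketchRepresentation.new("JP2", :master_copy),
    SketchRepresentation.new("ALTO XML", :textual_content)
  ]
)
page2 = SketchPage.new("Page 2", { title: "Page 2" }, [SketchRepresentation.new("JPG", :access_copy)])

work = SketchPagedWork.new(
  { title: "Example Book" },
  [SketchRepresentation.new("PDF", :entire_work)],
  [page2, page1],          # has-part: stored in arbitrary order
  ["Page 1", "Page 2"]     # has-part-order: the reading order
)

ordered_labels = work.ordered_parts.map(&:label)
```

Keeping the order in a separate list means a page-turning interface can follow part_order without depending on how the parts happen to be stored.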

service object: CreateFileWithUpload

Create a service in Hydra::Works at /lib/hydra/works/services/file/create_file_with_upload.rb

module Hydra::Works
  class CreateFileWithUpload

    ##
    # Upload a file to Fedora and create the PCDM File for it.
    #
    # @param [String] :path_to_file path to the file being uploaded
    #
    # @return [Hydra::PCDM::File] the newly uploaded file

    def self.call( path_to_file )
      # NOTE: This is not the exact ordering for these tasks.  There may be additional tasks.
      # TODO create a PCDM File
      # TODO upload the file to Fedora
      # TODO anything with technical metadata here?
    end
  end
end

Provide a definition of scope of "Work"?

My understanding is that it is a compound or complex object (e.g., a resource that has parts, which may themselves have parts). It is not the bibliographic notion of an abstract Work (as opposed to a physical Item that embodies the Work).

It would be good to come to a common understanding of the definition and thus scope of the effort before starting in on modeling.

Remove WithEditors

I don't think this code is relevant to the core concerns of this gem.

Audio example

Putting this out there because I'm guessing it is out of scope. However here is an example of what we will be tackling next.

Audio Example. Use case for a multi-part item where both order and hierarchy matter. http://server1.variations2.indiana.edu/variations/cgi-bin/access.pl?id=ABF3712

For classical music, the structure is assigned to match the liner notes for the work, mostly. Video works could have similar structures, for example an opera that has acts and solos within acts.

  • Assigned structure supports navigation. For example, when a user clicks on a track, that part of the audio plays.
  • Each track/part could correlate to separate files OR the parts could point to a time marker on one larger file.
  • For each “master file”, multiple access derivatives are made to support various quality viewing experiences (which would have its own technical metadata, etc.)
  • Tracks may have access control needs that are different than the intellectual work. For example, a class may only have access to 5 out of 20 tracks.
  • Individual tracks/parts may be added to playlists (in the future when they exist)

Use Case: Non-repository (Solr-only) works

Our main search-and-discovery interface, Virgo (http://search.lib.virginia.edu), uses a Solr index built from a variety of sources in addition to our repository (e.g., our Sirsi/Dynix OPAC, HathiTrust, MARCXML linking to licensed content on the web, etc). These sources have their own external mechanisms for maintaining their metadata and for transforming it into Solr documents without any associated Fedora object.

We have a continuing requirement to be able to search across all items' metadata (whether the items are physical or digital) so we have a need to be able to treat Works (essentially, each thing that has a Solr record) in a coherent manner regardless of the origin of their metadata.

There may not be a lot of institutions with a similar requirement, so we wouldn't be looking for direct support of this use case per se -- we just want to ensure that the baseline Hydra::Work is extensible enough to allow it to be supported.

service object: MoveGenericFileToGenericWork

module Hydra::Works
  class MoveGenericFileToGenericWork

    ##
    # Move a generic file from one generic work to another
    #
    # @param [Hydra::Works::GenericWork] :old_parent_work where the generic file currently lives
    # @param [Hydra::Works::GenericWork] :new_parent_work where the generic file is being moved
    # @param [Hydra::Works::GenericFile] :child_generic_file being moved
    #
    # @return [Hydra::Works::GenericWork] the destination generic work

    def self.call( old_parent_work, new_parent_work, child_generic_file )
      # TODO validate all parameters
      # TODO create association between new_parent_work and child_generic_file
      # TODO remove association between old_parent_work and child_generic_file

      # NOTE: This is not a copy.  No new fedora objects should be created as part of this process.
    end
  end
end
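
To make the intended move semantics concrete, here is a minimal in-memory sketch of the three TODO steps. The Sketch* classes are hypothetical stand-ins with no Fedora or ActiveFedora behavior, and the association is assumed to be a simple array; a real implementation would work against the gem's own associations.

```ruby
# In-memory stand-ins; these class names are hypothetical.
class SketchGenericFile; end

class SketchGenericWork
  attr_reader :generic_files

  def initialize
    @generic_files = []
  end
end

class SketchMoveGenericFile
  # Move child from old_parent to new_parent without copying it.
  # Returns the destination work, matching the documented return value.
  def self.call(old_parent, new_parent, child)
    # Validate all parameters.
    [old_parent, new_parent].each do |work|
      raise ArgumentError, "expected a work, got #{work.class}" unless work.is_a?(SketchGenericWork)
    end
    raise ArgumentError, "expected a generic file, got #{child.class}" unless child.is_a?(SketchGenericFile)
    raise ArgumentError, "file is not in the old parent" unless old_parent.generic_files.include?(child)

    # Moving within the same work is a no-op.
    return new_parent if old_parent.equal?(new_parent)

    # Associate with the new parent first so the file is never left without one.
    new_parent.generic_files << child unless new_parent.generic_files.include?(child)
    old_parent.generic_files.delete(child)
    new_parent
  end
end

old_work = SketchGenericWork.new
new_work = SketchGenericWork.new
file = SketchGenericFile.new
old_work.generic_files << file

result = SketchMoveGenericFile.call(old_work, new_work, file)
```

The same object instance ends up in the destination work, which reflects the note that a move must not create a copy.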

Provide a sample use case or template

We would do well to provide at least one sample use case, or a use case template, so that potential contributors know what we're looking for. #9, #10, and #11 are quite different from one another. Is that variation OK? How much detail are we looking for, and how much do we care about how they're formatted (we talked previously about user stories, and we're talking now about use cases, and these are different and equally useful things).

JSON-LD Context for ORE

There is one, but in my opinion the proxy construction makes our use cases harder:
http://www.openarchives.org/ore/0.9/jsonld#proxies

I think the intuitive context would generate the following single tree:

{
  "@id": "http://example.com/aggr",
  "@type": "ore:Aggregation",
  "iana:first": "_:p1",
  "iana:last": "_:p2",
  "aggregates": [
    {
      "@id": "http://example.org/1",
      "proxy": {
        "@id": "_:p1",
        "@type": "Proxy",
        "iana:next": "_:p2",
        "proxyIn": "http://example.com/aggr"
      }
    },
    {
      "@id": "http://example.org/2",
      "proxy": {
        "@id": "_:p2",
        "@type": "Proxy",
        "iana:prev": "_:p1",
        "proxyIn": "http://example.com/aggr"
      }
    }
  ]
}

Rather than the two flat lists that the currently proposed context produces:

{
  "@id": "http://example.com/aggr",
  "@type": "ore:Aggregation",
  "iana:first": "_:p1",
  "iana:last": "_:p2",
  "aggregates": [
    "http://example.org/1",
    "http://example.org/2"
  ],
  "proxies": [
    {
      "@id": "_:p1",
      "@type": "Proxy",
      "iana:next": "_:p2",
      "proxyFor": "http://example.org/1"
    },
    {
      "@id": "_:p2",
      "@type": "Proxy",
      "iana:prev": "_:p1",
      "proxyFor": "http://example.org/2"
    }
  ]
}

Thoughts?
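
For comparison, the relationship between the two shapes can be sketched in Ruby using only the standard json library. The helper name nest_proxies is hypothetical, and the key names are simply those used in the two examples above; this operates on plain parsed JSON and does no JSON-LD context processing.

```ruby
require "json"

# Hypothetical helper: fold the flat "proxies" list into the "aggregates" list,
# producing the nested single-tree shape.
def nest_proxies(flat)
  # Index each proxy by the resource it stands for.
  proxies_by_target = flat.fetch("proxies").map { |proxy| [proxy.fetch("proxyFor"), proxy] }.to_h

  aggregates = flat.fetch("aggregates").map do |target_id|
    proxy = proxies_by_target.fetch(target_id)
    # Inside the tree the target is implied by nesting, so drop proxyFor
    # and record the owning aggregation as proxyIn instead.
    nested_proxy = proxy.reject { |key, _| key == "proxyFor" }
    nested_proxy["proxyIn"] = flat.fetch("@id")
    { "@id" => target_id, "proxy" => nested_proxy }
  end

  flat.reject { |key, _| key == "proxies" }.merge("aggregates" => aggregates)
end

flat = JSON.parse(<<~FLAT_JSON)
  {
    "@id": "http://example.com/aggr",
    "@type": "ore:Aggregation",
    "iana:first": "_:p1",
    "iana:last": "_:p2",
    "aggregates": ["http://example.org/1", "http://example.org/2"],
    "proxies": [
      { "@id": "_:p1", "@type": "Proxy", "iana:next": "_:p2", "proxyFor": "http://example.org/1" },
      { "@id": "_:p2", "@type": "Proxy", "iana:prev": "_:p1", "proxyFor": "http://example.org/2" }
    ]
  }
FLAT_JSON

nested = nest_proxies(flat)
puts JSON.pretty_generate(nested)
```

The fact that the conversion needs a lookup by proxyFor illustrates the extra join that consumers of the flat form have to perform.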

Test that triples are correctly set in fedora objects

Right now, the tests validate behaviors. There are no tests that validate that expected predicates and values are set in the fedora object.

Ex.

  • Are generic works associated with a collection via the pcdm:hasMember predicate?
  • Are related objects associated with a collection via the ore:aggregates predicate?
    etc.

Update README and Guidelines for Contributing to reflect current developments

The README is incomplete and out of date. It needs to

  • Declare what functionality is provided by the gem
  • Declare the intended uses of the gem
  • Declare the relationship between this code and Sufia, sufia-models, Worthwhile and Curate
  • Provide Instructions for Installation/Use
  • Provide links to background documentation (i.e., Application Profile) with context explaining the relationship between that documentation and this code

The Guidelines for Contributing still say that we are soliciting Use Cases. They don't say anything about contributing code or submitting issues other than Use Cases. They should

  • Provide guidelines for contributing code
  • Provide guidelines for submitting issues and pull requests
    Is there a template for this somewhere in the Hydra space? If not, should we create one and apply it to all of the repositories?

Use Case: Research Dataset

A research dataset containing a set of files organized into top-level categories of preparatory materials, raw data files, statistics, and visualization images, with multiple files in each category. The visualization images are further organized in a hierarchy by type, and then by X/Y/Z axis.

  • The dataset as a whole has descriptive metadata describing the research project, the research team, etc.
  • The top-level categories and the hierarchy of visualization images have titles.
  • Each individual file has a title and technical metadata.

Implement Thumbnail file type

Relates to samvera-deprecated/sufia#1010

Build a thumbnail file class in HydraWorks such that an implementation could do:

class MyGenericFile < Hydra::Works::GenericFile
  has_one :thumbnail, predicate: ::RDF::URI("http://hydraworks.namespace/hasThumbnail"), class_name: "HydraWorks::Thumbnail"
end

and triples associated with MyGenericFile would include <> hasThumbnail HydraWorks::Thumbnail

The thumbnail class extends Hydra::PCDM::File and has its own rdf type as per duraspace/pcdm#5

Specifying additional technical metadata

Hydra::PCDM::File classes will specify some amount of minimal technical metadata (see samvera/hydra-pcdm#16), but this should be augmentable at the Hydra::Works integration level so an implementor may add more of their own additional technical metadata.

Should we provide integration tests that ensure this is feasible and demonstrates how to do it?

Use Cases: Co-Sponsoring?

I'm not sure if this will actually happen because we are all special ❄️ ;) But I can imagine the possibility of multiple institutions having similar use cases.

So if I wanted to co-sponsor(+1) a use case that another institution created, what practice do we want to recommend for supporting this?

Would it make sense to open a PR with a co-sponsor line with @username and/or institution?
