
Comments (20)

jlhardes avatar jlhardes commented on August 25, 2024

Not sure if this is helpful here, but baseline technical metadata properties across file formats/types from the Hydra Technical Metadata Subgroup are available: https://docs.google.com/document/d/1SZCpSIdlGfXgoYrAnW2eRKlIt6O-1ADIDDhmLrvxeLc/edit#heading=h.a8hurtypz8qi

from hydra-pcdm.

jcoyne avatar jcoyne commented on August 25, 2024

In a block like so:

metadata do
  type PCDM::File
  property :foo, ...
end


acoburn avatar acoburn commented on August 25, 2024

Reiterating @jlhardes's comment about the technical metadata profile -- this is now available on the duraspace wiki at: https://wiki.duraspace.org/display/hydra/Technical+Metadata+Application+Profile


hectorcorrea avatar hectorcorrea commented on August 25, 2024

@jcoyne Just to make sure I understand what I need to add here.

Is the goal of this ticket to have something like this at the end (assuming I get the correct Ruby classes for each of those predicates) ?

    metadata do
      configure type: RDFVocabularies::PCDMTerms.File
      property :label, predicate: ::RDF::RDFS.label

      # TODO: Get the proper Ruby classes for these predicates
      property :file_name, predicate: ::ebucore.file 
      property :file_size, predicate: ::ebucore.fileSize 
      property :date_created, predicate: ::ebucore.dateCreated
      property :file_hash, predicate: ::premis.hasMessageDigest
      property :mime_type, predicate: ::ebucore.hasMimeType
      property :date_modified, predicate: ::ebucore.dateModified
      property :file_format, predicate: ::pronom.puid
      property :byte_order, predicate: ::sweetjpl.byteOrder
    end
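
One way to picture the missing "Ruby classes" for these predicates is as small vocabulary objects that expand a term into a full predicate URI. The following is only a minimal sketch in plain Ruby: the `Vocab` helper is hypothetical, and the namespace URIs are assumptions that are not stated anywhere in this thread. A real implementation would more likely rely on proper RDF vocabulary classes.

```ruby
# Hypothetical helper: expands a term name into a full predicate URI.
# The namespace URIs below are assumptions, not taken from this thread.
class Vocab
  def initialize(base_uri)
    @base_uri = base_uri
  end

  # Builds the URI for a single term, e.g. EBUCORE[:fileSize].
  def [](term)
    "#{@base_uri}#{term}"
  end
end

EBUCORE = Vocab.new("http://www.ebu.ch/metadata/ontologies/ebucore/ebucore#")
PREMIS  = Vocab.new("http://www.loc.gov/premis/rdf/v1#")

# Property name => predicate URI, mirroring part of the block above.
TECH_METADATA = {
  file_size:     EBUCORE[:fileSize],
  date_created:  EBUCORE[:dateCreated],
  mime_type:     EBUCORE[:hasMimeType],
  date_modified: EBUCORE[:dateModified],
  file_hash:     PREMIS[:hasMessageDigest]
}.freeze

TECH_METADATA.each { |name, uri| puts "#{name} -> #{uri}" }
```

The pronom and sweetjpl predicates are left out of the sketch on purpose, since their namespaces are not given here.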


tpendragon avatar tpendragon commented on August 25, 2024

👍 Pronom's impossible, but besides that


awead avatar awead commented on August 25, 2024

Should hydra-pcdm be concerned about what kind of tech metadata it is, or just that it has any kind of tech metadata?


jlhardes avatar jlhardes commented on August 25, 2024

For this sprint, I think it makes sense to go with required properties (File Name and File Size) at a minimum. If recommended properties can also be included (Label, Date Created, File Hash, File Format Type, and Has Mime Type) that would make it more complete. The optional fields can probably be safely ignored for the sprint - they aren't completely workable anyway (pronom:puid, for example).


awead avatar awead commented on August 25, 2024

@jlhardes I agree. My question was really more about the schema. Are we enforcing a techdata schema at this level? I guess it doesn't matter, since it's RDF, if an implementer wants to use a different one, they just add it in. I think we can flesh out those details at the hydra-works level, with integration tests that serve as an example of someone who would want to build their own PCDM-approved object and add additional technical metadata.


hectorcorrea avatar hectorcorrea commented on August 25, 2024

@jlhardes This is good to know since I've got File Size working (Fedora does that automatically using premis:hasSize as the predicate)

I need to talk with Esme about hasOriginalName since Fedora seems to do it out of the box but I cannot get it to work.


hectorcorrea avatar hectorcorrea commented on August 25, 2024

@jlhardes Question: Is it OK if we use the equivalent predicates indicated in the document that you posted (e.g. use "premis:hasSize" instead of "ebucore:fileSize") or should we use the one indicated at the top of each one (e.g. "ebucore:fileSize") ?


jlhardes avatar jlhardes commented on August 25, 2024

If we can stick with the properties at the top (Property name: ebucore:fileSize) that might make things easier for the sprint (not so much to implement). Those Property names are the main ones we'd like to see implemented anyway for technical metadata. The equivalent properties are listed to help explain the property and to provide options if the property we're listing can't be used for some reason.


acoburn avatar acoburn commented on August 25, 2024

@hectorcorrea the logic behind using ebucore was that it is a comprehensive vocabulary for technical metadata. So rather than splitting the technical metadata properties across lots of different vocabularies (nfo, exif, dc, premis, etc, etc), it would be much more sane to start with a well supported, single vocabulary.


hectorcorrea avatar hectorcorrea commented on August 25, 2024

@jlhardes @acoburn thanks for the background info. I'll look into implementing it with ebucore then.


hectorcorrea avatar hectorcorrea commented on August 25, 2024

@jlhardes @acoburn Fedora automatically calculates and stores (as read-only) the following properties: premis:fileSize, fedora:digest, and fedora:mimetype.

I could add three separate properties with ebucore predicates as the document recommends, but they would have to be set manually and would run the risk of having different values than what Fedora already stores. Do we really want to do that or should we stay with the Fedora provided properties?

//cc: @awead @jcoyne (thoughts?)


acoburn avatar acoburn commented on August 25, 2024

@hectorcorrea part of the thinking here was that if an external tool (e.g. FITS) calculates these values, they can be put into the ebucore properties (since the existing properties are managed by the server and hence read-only). The advantages of using the additional properties include:

  • There is a clearer line between server managed properties and externally managed properties (potentially useful for provenance)
  • The fedora: namespace is much less widely used than ebucore: and so potentially less transferable
  • By keeping as much technical metadata within a single namespace, you make the data (potentially) more useable in a LOD context
  • If an application chooses to use a different hashing algorithm than SHA1, that option is available
  • If there is a mismatch between the server managed properties and an externally generated value, that might be useful for certain types of preservation activities.

The disadvantages of using the additional properties are:

  • data duplication
  • more code to write / manage

That said, I don't actually have a strong opinion one way or the other. @jlhardes thoughts?
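
To make the hashing and mismatch points concrete, here is a minimal sketch in plain Ruby. It assumes the server-managed digest is SHA-1 (as the list above implies) and uses a hypothetical in-memory string in place of a stored binary. It does not call Fedora or FITS; it only illustrates the comparison.

```ruby
require "digest"

# Hypothetical stand-in for the bytes of a stored binary.
content = "example file content"

# Assumption: the server-managed digest is SHA-1, as implied above.
server_managed = Digest::SHA1.hexdigest(content)

# An external tool could record other algorithms under its own property.
external = {
  sha1:   Digest::SHA1.hexdigest(content),
  sha256: Digest::SHA256.hexdigest(content)
}

# A mismatch between the two SHA-1 values would flag the file for review.
fixity_ok = (external[:sha1] == server_managed)
puts "fixity ok: #{fixity_ok}"
```

Keeping the externally generated value in its own property is what makes this kind of comparison possible at all.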


awead avatar awead commented on August 25, 2024

Yes, 👍 to that. But, I don't think hydra-pcdm should have any opinions about what tech data you're using or what you're using to create it. It should just allow you to use whichever tool and schema you prefer.


jcoyne avatar jcoyne commented on August 25, 2024

@awead I disagree with that somewhat. I think it should provide an opinionated default. You should be allowed to do something else though.


jlhardes avatar jlhardes commented on August 25, 2024

We had some discussion about these properties in relation to properties that are already in Fedora and I wasn't quite sure which of these mapped, so you've helped clear that up, @hectorcorrea - thanks!

I don't actually understand how it works to NOT use what we are implementing on this sprint. It seems like we want to see a baseline of technical metadata across all Hydra implementations using PCDM to make things easier going across systems and sharing externally. I understand that we don't want to limit people's implementations by making these properties using these predicates a requirement but it seems like we do want to encourage their use.

I think for that reason and for the longer term it's better to go with a more externally-useful standard, so I'd stick with premis:hasMessageDigest, ebucore:hasMimeType, and ebucore:fileSize to express those properties, even though it is a bit of duplication.

Additionally, I don't think premis:fileSize actually exists (http://id.loc.gov/ontologies/premis.html) - at least not in RDF premis. I think the premis property might be hasSize so if Fedora is using premis:fileSize, I'm not sure what ontology is actually being used.


awead avatar awead commented on August 25, 2024

@jcoyne agreed. If there's any additional tech metadata you want beyond what Fedora is giving you already, then it should be as easy as simply including a module with the additional properties. Any implementation would then override that module, or more realistically, just include their own. The side effect is that you may have extra triples with different predicates but duplicate object content.

So, if we use @jlhardes's recommendations, you'd have two triples with the checksum, fedora:digest and premis:hasMessageDigest. And two for mime type: fedora:mimeType and ebucore:hasMimeType (assuming their object values can be the same). I think @hectorcorrea meant premis:hasSize. That's what comes back from Fedora if you do a GET request on the binary's fcr:metadata node.
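
The "include a module, or include your own" idea could look roughly like the sketch below. Everything in it is hypothetical: `PropertyDSL` is a tiny stand-in for the real `property` macro, the module and class names are made up, and predicates are plain prefixed strings rather than RDF terms.

```ruby
# Hypothetical minimal DSL standing in for the real `property` macro.
module PropertyDSL
  def properties
    @properties ||= {}
  end

  def property(name, predicate:)
    properties[name] = predicate
  end
end

# Opinionated default: the predicates recommended earlier in the thread.
module DefaultTechMetadata
  def self.included(base)
    base.extend(PropertyDSL)
    base.property :file_hash, predicate: "premis:hasMessageDigest"
    base.property :mime_type, predicate: "ebucore:hasMimeType"
    base.property :file_size, predicate: "ebucore:fileSize"
  end
end

# An implementer's own module, using a different predicate for size.
module CustomTechMetadata
  def self.included(base)
    base.extend(PropertyDSL)
    base.property :file_size, predicate: "premis:hasSize"
  end
end

class DefaultFile
  include DefaultTechMetadata
end

class CustomFile
  include CustomTechMetadata
end

puts DefaultFile.properties.keys.inspect
puts CustomFile.properties.inspect
```

With this shape the default stays opinionated, while an implementation that wants a different schema simply includes a different module.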


hectorcorrea avatar hectorcorrea commented on August 25, 2024

So I went ahead and implemented the additional properties. The only caveats with the current implementation are:

  1. Fedora considers PREMIS.hasMessageDigest a server-managed property and therefore it does not let us change the value of this property (i.e. this is a read-only property.)
  2. Property pronom:puid, indicated in the document linked by @jlhardes, was not implemented since this ontology hasn't been published.
