emleditor's People

Contributors

jimmyrocks, majestc96, roblbaker

emleditor's Issues

upload_data_package(): 32 MB limit

This function currently enforces a 4 MB maximum file size, as per the API documentation. However, further testing has determined that the actual file size limit is 32 MB. Increase the file size limit to 32 MB (or just shy of that).
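
A minimal sketch of such a size check, offered only as an illustration. The helper name check_upload_size(), the 0.5 MB safety margin, and the reading of "MB" as 1024^2 bytes are all assumptions, not part of EMLeditor or the DataStore API.

```r
# Hypothetical helper (not part of EMLeditor): flags files too large to upload.
# Assumes a 32 MB limit, counted as 1024^2 bytes per MB, minus a safety margin.
check_upload_size <- function(paths, limit_mb = 32, margin_mb = 0.5) {
  limit_bytes <- (limit_mb - margin_mb) * 1024^2
  sizes <- file.size(paths)  # NA for files that do not exist
  too_big <- !is.na(sizes) & sizes > limit_bytes
  data.frame(file = basename(paths), bytes = sizes, too_big = too_big)
}

# Usage: write a small temporary file and check it
tmp <- tempfile(fileext = ".csv")
writeLines(c("a,b", "1,2"), tmp)
size_report <- check_upload_size(tmp)
```

Keeping the limit as an argument means it can be adjusted again without editing the function body if the API limit changes.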

Add ability to revise Notes

Add the ability for users to revise the Notes section (to do away with stray characters if detected by an as-yet-unwritten DPchecker function). Likely similar to how set_abstract works.

set_content_units: add attribute tag

Text-based recognition of content unit links for metadata extraction in DataStore is not very robust and is prone to errors or people inadvertently using the same text string for non-content unit links.

Add an attribute tag to the element of each content unit link such that it reads:

set_creator_orgs: look-up table for park units

Currently, the set_creator_orgs() function allows people to input whatever they want as the organizationName. This can lead to discrepancies ("ROMN", "Rocky Mountain Network", "The Rocky Mountain Network", "Network, Rocky Mountain", "NPS Rocky Mountain Network", etc.).

Add a parameter to the function that takes in a park unit and uses that unit to automatically populate the organizationName element in metadata.
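
As a rough sketch of the look-up idea: a named vector maps a unit code to one canonical organization name. The table contents and the helper name lookup_org_name() are hypothetical; a real table would come from an authoritative source rather than being typed in by hand.

```r
# Hypothetical look-up table for illustration only.
unit_names <- c(
  ROMN = "Rocky Mountain Network",
  ROMO = "Rocky Mountain National Park"
)

# Hypothetical helper: returns the canonical name for a unit code.
lookup_org_name <- function(unit_code, lut = unit_names) {
  code <- toupper(trimws(unit_code))
  if (!code %in% names(lut)) {
    stop("Unknown unit code: ", unit_code)
  }
  unname(lut[code])
}

org_name <- lookup_org_name("romn")
```

Because the name is derived from the code, every package from the same unit ends up with the same organizationName string.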

add ability to insert orcid to eml

We want to start using ORCiDs more consistently; allow EML authors to add in ORCiDs without having to re-run EMLassemblyline

set_orcid() ?

set.CUI cannot overwrite existing CUI

Currently, the set.CUI function stops if existing CUI is detected in any of the additionalMetadata elements. This needs to be fixed to enable overwriting of just the additionalMetadata element containing CUI, if it exists. For instance:

myneweml <- set.CUI(emlObject, "PUBFUL")

sets CUI to PUBFUL. But:

mynewesteml <- set.CUI(myneweml, "PUBVER")

stops with an error message that reports CUI already exists instead of replacing PUBFUL with PUBVER.
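
One possible shape for the overwrite logic, sketched on a plain nested list rather than a real EML object. The layout (additionalMetadata entries holding metadata$CUI), the function name set_cui_sketch(), and the "note" entry are assumptions made for illustration.

```r
# Simplified stand-in for an EML object; the element layout is assumed.
set_cui_sketch <- function(eml, code) {
  add_meta <- eml$additionalMetadata
  # Find which additionalMetadata entries (if any) already hold a CUI code
  has_cui <- vapply(add_meta, function(x) !is.null(x$metadata$CUI), logical(1))
  new_entry <- list(metadata = list(CUI = code))
  if (any(has_cui)) {
    add_meta[[which(has_cui)[1]]] <- new_entry      # overwrite only that entry
  } else {
    add_meta[[length(add_meta) + 1]] <- new_entry   # nothing to replace: append
  }
  eml$additionalMetadata <- add_meta
  eml
}

eml <- list(additionalMetadata = list(list(metadata = list(note = "unrelated entry"))))
eml <- set_cui_sketch(eml, "PUBFUL")
eml <- set_cui_sketch(eml, "PUBVER")
```

After both calls the unrelated entry is untouched and there is still only one CUI entry, now holding the second code.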

set_int_rights: odd behavior

set_int_rights works fine if you use set_cui, export to .xml, reimport to R and then call set_int_rights. However, it throws an error if you call set_cui and then directly call set_int_rights. It should be able to handle both cases.

Add ability to revise Methods

Add a function similar to set_abstract to let users revise the Methods section, particularly if stray characters are detected via an as yet unwritten DPchecker function.

Set_nps_publisher function

Is this really worth doing?

I'm getting a lot of mixed messaging from users. On the one hand, people seem to like the idea of an automated pipeline: the less work they have to do, the better. On the other hand, under the current implementation some things are automatically handled in the background (like setting the publisher), and that has raised eyebrows. Feedback has not been 100% positive: it seems people also want control over what is being done to the metadata.

Unfortunately, for most NPS data publications the publisher is defined as Fort Collins and there are a lot of specific fields to set for the publisher. Giving people control over this will inevitably result in critical errors such as mismatches between the publisher and the ROR ID (will people even know what that is?) that will get passed up to DataCite, data.gov, etc. The solution appears to be to write a manual function with no options that gives people control over when they set the publisher and allows them to manually set the publisher (but no control over what the publisher is set to).

  1. Remove .set_publisher from the utils file.
  2. Remove calls to .set_publisher from within set_ functions; remove NPS=TRUE from function arguments.
  3. Write a set_NPSpublisher function that must be manually run and will correctly set the publisher.
  4. set_NPSpublisher will have no options so that the publisher info is correctly specified.
  5. set_NPSpublisher documentation will point people to set_publisher for full manual control over all fields in case they really don't want to have NPS be the publisher.
  6. This will likely mean having to re-think how the "by or for NPS" fields are generated.

Write over data urls

There is a problem with the current workflow. Currently users would use EMLassemblyline to generate an initial EML document and EMLeditor to edit it. However, EMLassemblyline requires having URLs to the data files. This means users must initiate a draft reference on DataStore prior to running any of the EMLeditor functions. The set_datastore_doi() function adds a DOI to metadata by initiating a draft reference (and files are then uploaded to the "correct" draft reference using upload_data_package()). But because this initiates a new draft reference, the data urls are no longer accurate.

The solution involves a change in workflow such that a temporary data URL is generated during EAL EML creation and is then replaced by the correct/updated URL when set_datastore_doi() is called. This means that users don't end up generating multiple draft references and the URLs are correct.

set_datastore_doi behaves oddly

  • Says to use set_datastore_doi if there is no DOI... but set_datastore_doi was just called
  • Input prompt isn't on the right line
  • Warning message: "in get_ds_id(eml_object) your eml lacks an alternate identifier tag please use the set_doi fucntion to add your doi". Fix this!
  • Can non-required fields be given empty strings instead of "string"?

add function set_creator_org

EMLassemblyline does not appear to support including organizations as creators, but many NPS data packages will have organizations (such as NPS, a network, or a park) as creators. Write a function, set_creator_org() that:

  1. Adds an organization as a creator to the existing creators
  2. Allows for ROR IDs for the creator organization

issue with set_cui/set_int_rights

A user reports that even after using set_cui(), the set_int_rights() function reports there is no CUI and says to use the set_cui() function to set the CUI before setting intellectual rights. Clearly there is something funny going on with one or both of the functions.

Proposed solution: find the bug, fix the bug, and hope it doesn't spawn more little bugs.

Add citations to EML

EML should contain any/all citations used when generating the EML: for instance, if someone cites a protocol or SOP on DataStore, that should go into the Citations. If someone references a publication or textbook used when generating the methodology, that should go in the Citations.

  1. This is underway but was largely abandoned and could use substantial work
  2. The function should take in a list (file?) of references in bibtex format
  3. The function should insert the references into metadata
  4. The references should be appropriately aligned with the XML formatting and not just dumped into the citations. This will allow them to be pulled out and re-used later (for instance, to be sent up to DataCite, etc.)

incorporate elevation into geographicCoverage

Geographic coverage (other than content units) rarely includes elevation. USGS allows for a bulk API request, which might be used to add elevation to geographicCoverage elements in metadata. This would definitely be a different function than set_content_units().

USGS bulk api request (bulk point query).

add ability to update dataTable urls

Legacy versions of EMLeditor did not update the urls for dataTables if/when someone changed or added a DOI to the metadata. There should be a function (set_data_urls) to update the dataTable urls to correspond to the DOI.

NPS ROR

Have you seen a standard spot to put the NPS ROR in EML? Perhaps there isn't a high-level location that is appropriate, but it would be used at the level of each individual contributor/creator instead. Since there is a single ROR for all of NPS perhaps this package could help with inserting that into the right spot(s).

I don't think this is a high priority, but clearly of some relevance as we saw it when we registered a DOI through DataCite. There is a DataStore connection to unravel as well (since it stores an affiliation but not a ROR at this point).

add NARA dissemination codes to set_CUI

Add the following dissemination codes to the set_CUI function pick list:
PUBLIC (technically not a NARA dissemination code; replaces PUBFUL and PUBVER)
FEDONLY (replaces NPSONLY)

set_datastore_doi needs more brakes

Apparently people are creating lots of draft references using set_datastore_doi().

Take another look at this function and see if there are any additional ways to add some brakes to excessive draft reference creation.

add a function to inject producing units info

Add a function to capture producing park unit codes - a good place is likely additional metadata. Do not need to store geographic coordinates, just the code (typically 4 letters).

An alternative place to store this info is in the metadataProvider element.

set_content_units

Add the ability to remove content units (or make this a separate function?)

... in case someone adds a content unit by mistake or something. Though I guess "replace content units" does this pretty effectively, so long as you want to replace it with something rather than replace it with nothing? Replacing with nothing seems like a pretty rare scenario.

Add validation checks to EMLeditor

People want some way to check their EML (beyond write_readme). DPchecker has functions to perform many of these checks, but this will require people running multiple independent functions. DPchecker has a wrapper function that runs ALL DPchecker functions at once, but many of them require access to the data files in addition to the metadata and would fail when only checking metadata.

Write a wrapper function for EMLeditor that runs just the EML-specific functions from DPchecker.
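
A sketch of what such a wrapper might look like. The check functions below are hypothetical stand-ins defined inline, not actual DPchecker functions; the point is only the pattern of running a named list of metadata-only checks and collecting results instead of stopping at the first failure.

```r
# Runs each check against the EML object and records pass/warning/error.
run_checks <- function(eml, checks) {
  results <- lapply(checks, function(f) {
    tryCatch(
      {
        f(eml)
        "pass"
      },
      warning = function(w) paste("warning:", conditionMessage(w)),
      error = function(e) paste("error:", conditionMessage(e))
    )
  })
  data.frame(
    check = names(checks),
    result = unlist(results, use.names = FALSE),
    stringsAsFactors = FALSE
  )
}

# Hypothetical metadata-only checks (stand-ins for the real ones)
checks <- list(
  has_title = function(eml) if (is.null(eml$dataset$title)) stop("no title"),
  has_abstract = function(eml) if (is.null(eml$dataset$abstract)) warning("no abstract")
)

report <- run_checks(list(dataset = list(title = "T")), checks)
```

Here the first check passes and the second is reported as a warning, so the user sees every problem in one run.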

Make sure set_int_rights works with updated set_CUI

After changing the picklist of dissemination codes in set_CUI(), make sure that the set_int_rights() function still works with the new picklist. Right now it will limit the type of license you can choose from depending on what you have set the CUI to. Make sure that functionality is retained and up to date.

Add ability to re-order creators

set_creator_org allows users to add an organization to the list of creators, but it always appends it as the last creator. Users may want to update the order of the creators in metadata (in order to update the author order on DataStore, etc).

Write a function, set_creator_order that allows users to update their creator order (and delete creators)
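
A minimal sketch of the reordering step, assuming creators are held as a plain list. The helper name reorder_creators() and the example people are hypothetical. Leaving an index out of new_order drops that creator, which covers the delete case.

```r
# Hypothetical helper: reorder (and optionally drop) creators by position.
reorder_creators <- function(creators, new_order) {
  stopifnot(
    all(new_order %in% seq_along(creators)),  # only valid positions
    anyDuplicated(new_order) == 0             # no creator listed twice
  )
  creators[new_order]
}

# Made-up creators for illustration
creators <- list(
  list(individualName = list(givenName = "Ada", surName = "Example")),
  list(individualName = list(givenName = "Ben", surName = "Sample")),
  list(organizationName = "Rocky Mountain Network")
)

# Put the organization first, keep the first person, drop the second person
reordered <- reorder_creators(creators, c(3, 1))
```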

write_readme function

The 'write_readme' function appears to have a bug. When I export the readme .txt file I'm getting a note that the CUI and Park Unit Content have not been defined. I believe both of these attributes are defined in the XML being summarized in the readme.txt, so I am puzzled why this warning is appearing.

Warning messages:
1: In get_content_units(eml_object) :
No Park Unit Connections specified. Use the set_content_units() function to add Park Unit Connections.
2: In get_cui(eml_object) :
CUI not properly specified. User set_cui to update the CUI code.

Here's a link to the EML being summarized in the readme.txt
https://doimspp-my.sharepoint.com/:u:/g/personal/ksherrill_nps_gov/EYjafZEDKQhLpuxJnUnGUvIBK_USRWVOovGZozbjEAR4Kw

Sarah comments

  • set.NPSpublisher (and other "set" functions)
    • Consider adding a "replace_existing = c("ask", "yes", "no")" parameter to better support both interactive and non-interactive use
  • get.unitPolygon
    • Is this redundant with QCkit::qc_getParkPolygonIRMA?
  • get.beginDate/endDate
    • Consider accepting date format as argument (w/sensible default)
  • get.citation
    • Can we get publisher from the metadata instead of hard-coding?
  • get/set.parkUnits
    • Should we always use this fxn instead of setting more precise bounding coords?
  • When are you supposed to use set.DOI vs new.DOI?
  • How do the DOI prefixes work?
  • write.readme
    • default outfile to "" so that users have option to just dump to console

set_content_units

The set_content_units function appears to have a bug when you set force=FALSE and NPS=FALSE. I wanted to manually add a non-NPS geographicCoverage to the XML from within R. When selecting option 2 (Add to the existing Unit Connections), it duplicates the existing units.

##Code below is what I was using:
park_units <- c("ROMO", "GRSA", "YELL")

in_eml2 <- set_content_units(in_eml, park_units, force=FALSE, NPS=FALSE)
Your metadata already has the following Content Unit Links Specified:
NPS Content Unit Link: ROMO
NPS Content Unit Link: GRSA
NPS Content Unit Link: YELL
Do you want to

1: Retain the existing Unit Connections
2: Add to the exsiting Unit Connections
3: Replace the existing Unit Connections2
#After selecting 2:
Your metadata now has the following Content Unit Links Specified:
NPS Content Unit Link: ROMO
NPS Content Unit Link: GRSA
NPS Content Unit Link: YELL
NPS Content Unit Link: ROMO
NPS Content Unit Link: GRSA
NPS Content Unit Link: YELL

#The get_content_units() function also appears to not be working. This is what I get after selecting 2 above:

get_content_units(in_eml)
[1] "NPS Content Unit Links: "
Warning message:
In get_content_units(in_eml) :
No Park Unit Connections specified. Use the set_content_units() function to add Park Unit Connections.
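
For the "add to existing" option shown above, one way to avoid duplicated links is to merge the two sets of codes with union() rather than concatenating them. This is only a sketch on plain character vectors; the variable names are made up, and how the links are actually stored in the EML is not shown here.

```r
existing <- c("ROMO", "GRSA", "YELL")   # units already in the metadata
requested <- c("ROMO", "GRSA", "YELL")  # units passed to the function

# union() keeps each code once, preserving first-seen order
combined <- union(existing, requested)
```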

get_author_list doesn't handle organizations

get_author_list only returns authors that are individuals and does not handle authors (creators) that are organizations. Improve get_author_list to include organizations as well as individuals.
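
A sketch of how both kinds of creator could be turned into a display label, assuming each creator is a list carrying either an individualName (with givenName and surName) or an organizationName, as in the EML schema. The helper name creator_label() and the example creators are hypothetical.

```r
# Hypothetical helper: one label per creator, person or organization.
creator_label <- function(creator) {
  ind <- creator$individualName
  if (!is.null(ind)) {
    trimws(paste(ind$givenName, ind$surName))
  } else if (!is.null(creator$organizationName)) {
    creator$organizationName
  } else {
    NA_character_
  }
}

# Made-up creators for illustration
creators <- list(
  list(individualName = list(givenName = "Ada", surName = "Example")),
  list(organizationName = "Rocky Mountain Network")
)

author_labels <- vapply(creators, creator_label, character(1))
```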

set_content_units: multiple units

When setting content units, if multiple units are specified, the EML is correct, but the message displayed to the screen is confusing. Currently:

mymeta5<-set_content_units(mymeta5, c("ROMO", "YELL"))
No previous Content Unit Links Detected
Your Content Unit Links have been set to: ROMOYELL.

Fix this so that if multiple content units are specified the message printed to the screen is more intuitive/confidence inspiring.
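
The run-together "ROMOYELL" suggests the codes are pasted without a separator. A small sketch of a clearer message, using a hypothetical helper name:

```r
# Hypothetical helper: builds the confirmation message with separated codes.
format_units_message <- function(units) {
  paste0(
    "Your Content Unit Links have been set to: ",
    paste(units, collapse = ", "),
    "."
  )
}

msg <- format_units_message(c("ROMO", "YELL"))
cat(msg, "\n")
```

With two units this prints "Your Content Unit Links have been set to: ROMO, YELL."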

Add "force" option to set_ class functions to enable scripting

Currently most of the "set_" class functions are fairly interactive and prompt the user to decide if they really want to overwrite existing metadata. They also provide the user with detailed feedback on the result of the function ("Your metadata has a title, 'Title'. Are you sure you want to change it?"; "Your new title is, 'Title'."). This is great for new users, but can complicate things for advanced users who would like to script these functions.

Adding a "force" option would turn off all of the warnings and feedback and just do whatever the user tells the function to do and trust that the user knows what they are doing. This approach will facilitate scripting by advanced users.

For an example, see the set_title function, which currently has a "force" option that defaults to FALSE.
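
The general pattern might look like the sketch below. It is not the actual set_title implementation: the function name set_title_sketch() and the plain-list EML stand-in are assumptions, and the prompt wording is borrowed from the example messages above.

```r
# Sketch of a "force" switch; eml is a plain list standing in for an EML object.
set_title_sketch <- function(eml, new_title, force = FALSE) {
  old_title <- eml$dataset$title
  if (!force && !is.null(old_title)) {
    if (!interactive()) {
      stop("A title already exists; use force = TRUE to overwrite it in scripts.")
    }
    choice <- utils::menu(
      c("Yes", "No"),
      title = paste0("Your metadata has a title, \"", old_title,
                     "\". Are you sure you want to change it?")
    )
    if (choice != 1) return(eml)  # user declined: leave metadata unchanged
  }
  eml$dataset$title <- new_title
  if (!force) message("Your new title is, \"", new_title, "\".")
  eml
}

# Scripted use: no prompts, no feedback
eml <- list(dataset = list(title = "Old title"))
eml <- set_title_sketch(eml, "New title", force = TRUE)
```

Failing loudly in a non-interactive session when force is FALSE is one design choice; silently keeping the old value would be another.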

set_content_units: cross-ref with park units

Currently, set_content_units accepts non-park units (any text string). It would be nice if it cross-referenced the list of park units provided with the list of available park units to verify them.

This can be accomplished via API calls so that a list of park units doesn't need to be maintained within EMLeditor.

let set_cui, set_int_rights accept upper & lower case

Right now set_cui() parameters are all upper case (e.g., "PUBLIC") and set_int_rights() parameters are all lower case (e.g., "public"). We probably want to store the CUI dissemination codes as upper case ("PUBLIC") and the intellectual rights/license name as lower case ("public") in the metadata to be consistent with the external sources they are tied to. However, we can make it easier on users by accepting upper, lower, or title case as user input and translating it to the appropriate case for storage in metadata.
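
A small sketch of the normalization step. The helper name normalize_code() is hypothetical, and the allowed values shown are only the codes mentioned in these issues, not the full pick lists.

```r
# Hypothetical helper: accept any case, store in the required case.
normalize_code <- function(x, allowed, case = c("upper", "lower")) {
  case <- match.arg(case)
  x <- trimws(x)
  x <- if (case == "upper") toupper(x) else tolower(x)
  if (!x %in% allowed) {
    stop("Unrecognized value: ", x)
  }
  x
}

# CUI codes are stored upper case, license names lower case
cui <- normalize_code("Public", allowed = c("PUBLIC", "FEDONLY"), case = "upper")
license <- normalize_code("PUBLIC", allowed = "public", case = "lower")
```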
