Library Carpentry: FAIR Data & Software
Home Page: https://librarycarpentry.org/lc-fair-research
License: Other
The challenge below is taken from the Sprint GoogleDoc.
Activity suggestion:
Pick an article and use the Crossref API to view its metadata (e.g. http://api.crossref.org/v1/works/10.1371/journal.pone.0237703).
Alternative: Check the metadata of a Zenodo Record (which uses DataCite).
How could we improve the metadata?
Solution:
Tip: To view the JSON results of the Crossref API more clearly in the browser, use a browser extension such as JSONView.
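Beyond eyeballing the JSON in a browser, the same inspection can be scripted. The sketch below is a minimal, hypothetical example: it parses an abbreviated, made-up Crossref-style record (a real response from the `/works` endpoint contains many more fields) and flags empty fields as candidates for metadata improvement.

```python
import json

# Abbreviated, made-up sample shaped like a Crossref /works response.
# A real record (e.g. for 10.1371/journal.pone.0237703) has many more fields.
sample = """
{
  "message": {
    "DOI": "10.1371/journal.pone.0237703",
    "type": "journal-article",
    "title": ["Example article title"],
    "author": [{"given": "Ada", "family": "Lovelace"}],
    "license": []
  }
}
"""

record = json.loads(sample)["message"]

# Flag metadata fields that are missing or empty -- a quick way to spot
# where the record could be improved (here, no license information).
gaps = [key for key, value in record.items() if not value]
print("DOI:", record["DOI"])
print("Fields needing improvement:", gaps)
```

Learners could be asked to run the same check against the live API response and discuss which of the flagged fields actually matter for findability.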
Is anyone interested in expanding on these exercises and reviewing them? You could also add them to the main lesson under Findable. Don't forget to add the solution to the lesson.
Don't forget to check out the comments in the google doc.
Any further comments on these challenges? Please add them to this issue!
If your Maintainer team has decided not to participate in the June 2019 lesson release, please close this issue.
To have this lesson included in the 18 June 2019 release, please confirm that the following items are true:
When all checkboxes above are completed, this lesson will be added to the 18 June lesson release. Please leave a comment on carpentries/lesson-infrastructure#26 or contact Erin Becker with questions ([email protected]).
This is a suggestion for all lessons: when citing articles, datasets, or other research objects, let's use their persistent identifiers when they have one.
As a follow-up to #11, lesson contributors should be requested to use persistent identifiers (PIDs) to articles, datasets and other research objects.
I propose (as suggested in #38 (comment)) that the pull request template include a message or checklist item reminding authors to use PIDs, or that an automated check be run on the contents.
Consider the utility of creating a public Zotero library and exporting it to a bibliography page in this lesson under the Extras menu.
@zkamvar
We need to update the style for this lesson so that jump lists (anchors for different sections) become automatically available for the different heading levels.
Please see TIBHannover/2018-07-09-FAIR-Data-and-Software#10 for a discussion of whether to transfer it somewhere, maybe here, or keep on developing it as Google Slides.
Learning objectives are the highest level and focus on what the learner will learn.
I wanted to open this up for discussion before anyone started work on it: the Introduction episode references the "research lifecycle" and then uses this illustration which instead calls it the "scientific lifecycle". Given the generally narrower usage of the word "science" in English (usually referring to only the natural sciences) than other European languages (where it is mostly a proper synonym for "research"), should we change "science" to "research" here? Or perhaps clarify our definition of "science"?
In light of the myriad of "Research Data Life cycle figures" available including https://www.dcc.ac.uk/guidance/curation-lifecycle-model
I would like to create one specifically for this lesson and would therefore like to ask you to add what you think should be included under the following themes:
1. Planning
2. Managing
3. Sharing
4. Preservation and Publication
Please feel free to edit/add as you see fit.
Learning objectives are the highest level and focus on what the learner will learn.
For episode 2, Findable (https://librarycarpentry.org/lc-fair-research/02-findable/index.html), here is some suggested text for “Choosing the right repository”
Funding agencies are another resource for recommendations on choosing a data repository. The National Institutes of Health, for example, provides a list of Open Domain-Specific Data Sharing Repositories.
Journals may also have requirements specifying which data repositories are to be used for sharing data associated with a published article. For example, the author guidelines for Systematic Biology state that "All data files and online-only appendices should be uploaded to Dryad", and "All nucleotide sequence data and alignments must be submitted to GenBank or EMBL before the paper can be published. In addition, all data matrices and resulting trees must be submitted to TreeBASE."
Learning objectives are the highest level and focus on what the learner will learn.
The idea below is taken from the Sprint GoogleDoc.
Is anyone interested in expanding on these exercises and reviewing them? Don't forget to add the solution to the lesson.
Don't forget to check out the comments in the google doc.
Any further comments on this exercise? Please add them to this issue!
Learning objectives are the highest level and focus on what the learner will learn.
[…] collection of guiding principles to make data Findable, Accessible, Interoperable, and Re-usable. This module provides a number of lessons to ensure that a researcher’s data is properly managed and published so that it enables reproducible research.
The original repo is a bit sleepy, but I worked in two different forks on some minor fixes and a new exercise. Please see the little bubbles on https://github.com/ReproNim/module-FAIR-data/network.
The challenges below are taken from the Sprint GoogleDoc.
Challenge 1:
arXiv is a preprint repository for physics, math, computer science and related disciplines. It allows researchers to share and access their work before it is formally published.
Go to arXiv: does arXiv use DOIs?
Compare these two papers:
https://arxiv.org/abs/2008.09350
https://arxiv.org/abs/2008.00287
Which one of them has a persistent identifier?
Challenge 2:
Look at this paper [link]. Click on the ‘pdf’ link to download it. Do a full-text search by using control + F or command + F and search for ‘http’. Did the author use DOIs for their data and software?
Challenge 3:
What is the problem with referring to your code and software only with a URL [example] without providing a DOI?
Is anyone interested in expanding on these exercises and reviewing them? You could also add them to the main lesson under Findable. Don't forget to add the solution to the lesson.
Don't forget to check out the comments in the google doc.
Any further comments on these challenges? Please add them to this issue!
Some of us teach postgraduate students and researchers, some of us teach librarians, some of us teach data stewards and people supporting research infrastructure.
We know that it's good to have a crossover audience who can inform each other, but without detailed knowledge of our different learner profiles, it's difficult to diagnose exactly the problems we want to fix.
Fleshing out learner personas will help us understand our target audience and better understand which activities are best going to suit them.
Hi @maneesha can you create a lesson team for the FAIR lesson and invite:
Thanks!
Here are some resources that are reusable that will complement the licenses episode.
To be added here: https://github.com/LibraryCarpentry/lc-fair-research/blob/gh-pages/_episodes/06-licenses.md
Learning objectives are the highest level and focus on what the learner will learn.
The FAIR principles were originally developed for research data: discussions as to how (or even if) they apply to software are still ongoing in the wider research community. Issue raised by @nehamoopen during a Zoom call.
Two suggestions were made:
Learning objectives are the highest level and focus on what the learner will learn.
The challenge below is taken from the Sprint GoogleDoc.
ORCID + DOI Autoupdate.
Suggestion:
If you don't have an ORCID account, register for one and activate it (if you don't want an official ORCID iD, you can use the Sandbox environment instead: https://sandbox.orcid.org/)
Apply desired privacy settings to the data in your ORCID profile
Use the Search & Link Tool/Wizard to connect your ORCID iD with Crossref Metadata Search and DataCite. See if there are already works authored by you that you can import via the wizard.
Upload a work (e.g. your most recent presentation) to Zenodo or Figshare (both use DataCite DOIs). Remember to fill in the metadata correctly and add your ORCID iD to it.
Wait a couple of minutes and the work should appear in your ORCID record.
Comment: Zenodo mints a DOI for each uploaded version and another one for the complete collection of versions. If we only upload one work, it looks as if two DOIs were minted for the same object. This leads to a duplicate in the ORCID record, but it can also be seen as an opportunity to explain the "combine" option in ORCID. https://orcid.org/blog/2020/06/18/new-features-alert-combining-work-items
Is anyone interested in expanding on these exercises and reviewing them? You could also add them to the main lesson under Findable. Don't forget to add the solution to the lesson.
Don't forget to check out the comments in the google doc.
Any further comments on these challenges? Please add them to this issue!
This exercise example is specific to interoperability of oceanographic data, if that is appropriate.
Using the data server ERDDAP, along with the Climate and Forecast conventions, combine temperature data from multiple sources and make a profile or timeseries plot.
Maybe building off this example Jupyter notebook, but reducing some of the complexity with xarray and the various functions introduced.
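As a rough sketch of what the simplified exercise could look like, the snippet below combines two synthetic temperature time series with xarray. The data and station split are invented stand-ins for what would really come from ERDDAP (e.g. via `xr.open_dataset` on a griddap URL); the point is that shared CF-style attributes (`standard_name`, `units`) are what make the two sources combinable.

```python
import pandas as pd
import xarray as xr

# Two synthetic temperature series standing in for data pulled from
# different ERDDAP servers. In the real exercise each source would be
# opened from an ERDDAP griddap/tabledap URL instead.
times_a = pd.date_range("2021-01-01", periods=4, freq="D")
times_b = pd.date_range("2021-01-05", periods=4, freq="D")

ds_a = xr.Dataset({"temperature": ("time", [10.0, 10.5, 11.0, 10.8])},
                  coords={"time": times_a})
ds_b = xr.Dataset({"temperature": ("time", [11.2, 11.5, 11.1, 10.9])},
                  coords={"time": times_b})

# CF convention metadata makes the sources interoperable: the same
# variable name, standard_name, and units allow a clean concatenation.
for ds in (ds_a, ds_b):
    ds["temperature"].attrs.update(
        standard_name="sea_water_temperature", units="degree_Celsius")

combined = xr.concat([ds_a, ds_b], dim="time")
print(combined["temperature"].size)  # 8 values spanning both sources
```

From `combined`, a timeseries plot is one `combined["temperature"].plot()` call away, which keeps the focus on the interoperability point rather than on plotting code.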
The idea below is taken from the Sprint GoogleDoc.
Is anyone interested in expanding on this exercise and reviewing it? Don't forget to add the solution to the lesson.
Don't forget to check out the comments in the google doc.
Any further comments on this? Please add them to this issue!
Only the "Findable" lesson has questions listed. These might need review and questions need to be added for the other lessons as well.
The links in this line item in CONTRIBUTING.md are broken.
If you wish to change this lesson, please work in https://github.com/swcarpentry/FIXME, which can be viewed at https://swcarpentry.github.io/FIXME.
Determining learning objectives will help shape the content on each page.
Learning objectives could be drawn from any number of existing FAIR training materials.
The Reference page under "Extras" is a useful lesson tool, but requires all the episodes to have clear key points to draw from.
It could also be useful to rename "References" to "Glossary", to avoid confusion for any user expecting a bibliography or list of references.
In the first sentence under "Description" on page https://librarycarpentry.org/Top-10-FAIR//2019/09/09/nanotechnology/, the word "interoperability" is currently misspelled as "interopera_bility" and should be fixed.
I'm a member of The Carpentries Core Team and I'm submitting this issue on behalf of another member of the community. In most cases, I won't be able to follow up or provide more details other than what I'm providing below.
I would like to contribute to the new FAIR Data lesson as part of the checkout process:
https://librarycarpentry.org/lc-fair-research/01-introduction/index.html
I've read the whole lesson and I believe it is very well done already.
My suggestion: In the penultimate section of the Introduction "How does "FAIR" translate to your institution or workplace?" the first question goes: "Does your institutional data management policy refer to FAIR principles?"
There are many research institutions and libraries that do not yet have a data management policy and/or may not be planning to adopt one in the near future, so this question could alienate some participants. It would be better to change the wording to something along the lines of: "If your institution has a data management policy, does it refer to the FAIR principles?" OR "Does your institution have a data management policy that refers to the FAIR principles?"
I'm a member of The Carpentries staff and I'm submitting this issue on behalf of another member of the community. In most cases, I won't be able to follow up or provide more details other than what I'm providing below.
I noticed that in the lesson https://librarycarpentry.org/lc-fair-research/07-assessment/index.html under the "Planning" section there was a heading for Data and Software Management Plans, but no supporting text or links. I would like to recommend the following text and links as resources for Data Management Plans:
Here are two sites that help you create Data Management Plans:
DMPonline - https://dmponline.dcc.ac.uk/
DMPTool - https://dmptool.org/
Many organizations and funding agencies provide sample Data Management Plans. For example, the Inter-university Consortium for Political and Social Research (ICPSR) provides a sample data plan for data deposited in its repository - https://www.icpsr.umich.edu/web/pages/datamanagement/dmp/plan.html.
@Karvovskaya and I wanted some feedback on the exercises for the Findable episode before we start fleshing it out. Suggestions for improvement, incl. how to do them totally differently are welcome!
I've organized the exercises according to the (sub-)principles for some structure.
F1: (Meta) data are assigned globally unique and persistent identifiers / DOIs
Challenge 1:
Compare these two papers from arXiv, a preprint repository for physics, math, computer science, and related disciplines that allows researchers to share and access their work before it is formally published:
https://arxiv.org/abs/2008.09350
https://arxiv.org/abs/2008.00287
Which one of them has a persistent identifier?
Challenge 2:
Look at this paper [link to be included]. Click on the ‘pdf’ link to download it. Do a full-text search by using control + F or command + F and search for ‘http’. Did the author use DOIs for their data and software?
Challenge 3:
What is the problem with referring to your code and software only with a URL [example to be included] without providing a DOI?
F2: Data are described with rich metadata
F3: Metadata clearly and explicitly include the identifier of the data they describe
We could provide an exercise where a dataset/software is provided and learners have to extract and fill out metadata fields based on it. If possible, it would be nice to accept only correctly typed answers - no typos, etc. - because those little errors affect the links between content.
Example exercise for inspiration: https://sites.uwm.edu/dltre/metadata/exercises/
The depth of this exercise can range from something simple like the three images in the previous link, or we could have sample exercises that follow specific schemes/standards like DDI, DataCite, discipline-specific standards.
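If the exercise follows the DataCite scheme, the "correctly typed answers only" check could even be automated. The sketch below is a hypothetical illustration: the learner answer, the placeholder DOI, and the `check_answer` helper are all invented for this example; only the six mandatory DataCite properties (Identifier, Creator, Title, Publisher, PublicationYear, ResourceType) come from the standard.

```python
# The six mandatory properties of the DataCite metadata schema.
REQUIRED_FIELDS = {
    "identifier", "creator", "title", "publisher",
    "publicationYear", "resourceType",
}

def check_answer(answer: dict) -> list:
    """Return the required DataCite fields that are missing or left blank."""
    return sorted(f for f in REQUIRED_FIELDS
                  if not str(answer.get(f, "")).strip())

# Hypothetical learner answer for the metadata-extraction exercise.
learner_answer = {
    "identifier": "10.5281/zenodo.0000000",  # placeholder DOI, not real
    "creator": "Lovelace, Ada",
    "title": "Example dataset",
    "publisher": "Zenodo",
    "publicationYear": "",                   # left blank by the learner
}

print(check_answer(learner_answer))  # → ['publicationYear', 'resourceType']
```

A check like this gives learners immediate feedback on which links between metadata and content they have broken, without an instructor marking each answer by hand.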
Also, this is what is currently on the lesson website: automatic ORCID profile update when a DOI is minted; RelatedIdentifiers linking papers, data, and software in Zenodo.
F4: (Meta)data are registered or indexed in a searchable resource
Perhaps we could use Zenodo’s Sandbox for learners to ‘upload’ the data + metadata?
We could also provide some example datasets/software and have learners select the most appropriate (discipline-specific) repository from a list we give them/they can search for the repo themselves.