
Introduction

About

This repo is to facilitate a call for contributions to the report that the Horizon 2020 Commission Expert Group on Turning FAIR Data into Reality (E03464) is compiling for the European Commission. The call is also available from the Commission's website.

The remit and objectives of the FAIR Data Expert Group

The European Commission has established an Expert Group on FAIR data to support the Research and Innovation policy development on Open Science. The overall objective of the Expert Group is turning the FAIR data principles into an operational reality, to ensure that research data are Findable, Accessible, Interoperable and Reusable. The Group will address five specific objectives:

  1. To develop recommendations on what needs to be done to turn each component of the FAIR data principles into reality
  2. To propose indicators to measure progress on each of the FAIR components
  3. To provide input to the proposed European Open Science Cloud (EOSC) action plan on how to make data FAIR
  4. To contribute to the evaluation of the Horizon 2020 Data Management Plan (DMP) template and development of associated sector / discipline-specific guidance
  5. To provide input on the issue of costing and financing data management activities

The Group will review existing initiatives and analyse the components of FAIR to recommend how they could be implemented and supported. Several topics will be addressed in the report, including research data culture, skills, incentives, service components, data management planning, metrics to evaluate FAIR, cost models and sustainability. Further information is available on the FAIR data expert group webpage.

What to contribute?

We invite contributions on each proposed section of the report as follows:

Concepts - why FAIR?

In the concepts and context section of the report, we will introduce FAIR and other similar principles that promote effective data documentation and sharing to enable reuse. These will include the OECD principles, the Royal Society concept of ‘intelligent openness’ and the RDA Global Digital Object Cloud. This section will present clear definitions of what it means for data to be FAIR, and why this is important. We will analyse the relationship between FAIR and the principle that data should be as open as possible and as closed as necessary. It will be important to understand - in terms of concepts and definitions - what are the component attributes which enable data to be Findable, Accessible, Interoperable and Reusable. Identifying such definitions will allow us to explore and assess proposals for degrees or scales of FAIRness, which will help researchers and institutions with implementation. The role of data selection and case studies on the benefits of FAIR will also be considered.

Specific questions to guide external contributions are:

  • Why do we need FAIR data? What challenges does science face without it?
  • What case studies can be shared of FAIR data in practice and the benefits this brings?
  • What is a good definition of X, where X is a key concept related to FAIR data?
  • What other principles or guidelines that parallel FAIR should be considered?
  • What are the characteristics that allow data to be Findable, Accessible, Interoperable and Reusable?
  • How are these attributes best deployed in a model that presents a scale of FAIRness (e.g. increasingly FAIR Digital Objects) to assist implementation?
  • Are there any issues you have encountered, from a conceptual and definitional perspective, in interpreting the FAIR principles?
  • What models or criteria exist for identifying ‘valuable’ data that define what should be retained?

Research data culture

In the research data culture section of the report, we will consider who the critical players in the research community are and what each of them needs to do to make FAIR data a reality. This will address research culture, workflows and skills requirements. We will also critically assess how the FAIR principles apply to disciplines outside the life sciences - where they originated - and whether differences in working practices necessitate amendments. This section will also address the European Commission template for FAIR Data Management Plans. Here, we will assess the appropriateness of the existing EC template and make recommendations for discipline-specific and machine-actionable approaches.

Specific questions to guide external contributions are:

  • How FAIR are current research practices in different parts of the research ecosystem (e.g., disciplines, sectors, geographic regions…)?
  • What are good examples of aggregating large amounts of data of different origins, and how does this offer new scientific possibilities?
  • What are the key barriers to FAIR practices (e.g. lack of metadata standards, domain repositories, data sharing norms, …)?
  • What could different players do to increase the FAIRness of data within their remit? How have research workflows actually been adapted in order to make data more FAIR? How much of this can be automated?
  • What training opportunities and career paths are needed to support researchers and other players in the research ecosystem with data management and sharing?
  • What improvements could be made to the current EC approach to DMPs?
  • How can DMPs become more integrated and machine-actionable?

Making FAIR data real

In the making FAIR data real section of the report, we will consider the operational challenges and components needed at a national and international level to provide a FAIR data ecosystem. This will address the role of repositories, registries, standards, identifiers, workflows, skills and legal interoperability. The report will explore how national and domain infrastructures interoperate and make recommendations for the European Open Science Cloud (EOSC) roadmap.

Specific questions to guide external contributions are:

  • To what extent are the FAIR principles alone sufficient to reduce fragmentation and increase interoperability? The principles have great potential to steer stakeholders towards more efficient data sharing and reuse, but perhaps additional measures and more specifics are needed to guide implementation?
  • What are the necessary components of a FAIR data ecosystem in terms of technologies, standards, legal framework, skills etc?
  • What existing components can be built on, and are there promising examples of joined-up architectures and interoperability around research data such as those based on Digital Objects?
  • Do we need a layered approach to tackle the complexity of building a global data infrastructure ecosystem, and if so, what are the layers?
  • Which global initiatives are working on relevant architectural frameworks to put FAIR into practice?
  • A large proportion of data-driven research has been shown to not be reproducible. Do we need to turn to automated processing guided by documented workflows, and if so how should this be organised?
  • What kind of roles and professions are required to put the FAIR principles into place?

Measuring change

In the measuring change section of the report, we will focus on metrics, incentives and methods to assess the FAIRness of data and repositories. Current career progression for academic researchers is deeply dependent on author-centric metrics linked to publications. Researchers who devote time and expertise to data curation and sharing are not sufficiently rewarded for this. We welcome examples of incentives that are changing practice towards FAIR, such as data being recognised in research assessment exercises, data sharing and curation forming part of promotion criteria, or data being required in reference lists in grant proposals. Emerging models for assessing the FAIRness of data and repositories, such as those from DANS and 4TU, will also be reviewed. Feedback on these methods or examples of how organisations plan to assess FAIR are welcomed.

Specific questions to guide external contributions are:

  • What metrics are you using to track the impact of research data?
  • How are you incentivising FAIR data practices, and which methods are working best?
  • How can FAIR data practice be incorporated into career progression criteria?
  • Do repositories plan to include a FAIR rating alongside dataset metadata?
  • Are the FAIR criteria a useful way to assess the trustworthiness of data repositories?
  • What factors should be monitored to evidence a change in practice towards FAIR data?
  • Given variations in disciplinary practices, are different factors more/less relevant to note a shift in practice in certain domains?
  • How long will it take to see measurable change across the broad research community or specific subsets thereof?

Facilitating change

In the facilitating change section of the report, we will consider what enablers are needed in terms of data policy, investment and skills. Recognising that many of the issues faced are social rather than technical, we will make recommendations for creating data stewardship career paths and recognising these roles. Cost models for investing in the FAIR data ecosystem at a national and international level, and recognising additional resources required in proposals via DMPs will also be covered.

Specific questions to guide external contributions are:

  • What sustainable business models exist for data infrastructure investments?
  • How can policymakers on an EU and member state level ensure there is sufficient, sustainable funding to support a FAIR data ecosystem?
  • Can lessons be learned from the existing practice of costing RDM activities into proposals?
  • How do we build the community of 500,000 data stewards suggested for EOSC?
  • What data stewardship roles and career trajectories need to be supported?
  • What incentives are needed for individuals to drive the transition to FAIR practices?
  • How can we promote cross-disciplinary and cross-border re-use of data?

How and when to contribute?

This GitHub repository has been established to enable the community to contribute suggestions and resources in the open. Please see the guidelines and examples on how to contribute. We suggest that you review the existing tickets first and add to these if your suggestion is related to previously shared ideas and resources. When you have something different to share, proceed to raise a new issue.

We will also host two open community sessions to allow people to contribute ideas to the FAIR data expert group in real-time and in a moderated forum to allow discussion with the group. These will take place on Monday 3rd July from 14:00-16:00 CEST and Wednesday 26th July from 10:00-12:00 CEST. Further details on how to join will be circulated nearer the time and detailed on a dedicated subpage of this repository.

Your contributions will be shared with the group members responsible for the associated topic. We may request further information or invite external experts to address the group at future telecons or face-to-face meetings.

The call for contributions is open between 12 June 2017 and 31 July 2017. Early input is encouraged to allow time for follow-up discussion.

Members of the expert group were present at the European Open Science Cloud summit on 12th June and will be attending the Estonian Presidency event in Tallinn on 12-13th October 2017. Further information on our progress will be shared through this repository on an ongoing basis.

People

Contributors

daniel-mietchen, simonhodson99, sjdcc


Issues

What is the first step in "F"?

I would like to take the case of a new data repository, in an environment with a nebulous level of hierarchy, and ask the question: "How does this data repository get found?"

One may consider many African countries and institutes, which may have data repositories, but which are "invisible".

How is data currently discovered?

Data repositories are all over the place. Many of them serve such a restricted purpose that it makes no sense to talk of "finding" them, because everyone who needs to know about them already does. But what about the ones which could be FAIR (i.e. their technology and licensing allow the AIR part), but are not F because nobody knows about them?

Typically the way things are found is via "search". A single point of entry for finding data is really useful - and it's reasonable to assume that this will depend on the community in practice. Astronomers have ADS, biologists have a whole mess of databases, climate scientists have ESGF, Earth observers have [GEO](http://www.geoportal.org/), etc. (OK, these are comparing apples and oranges in many cases.)

Sometimes there's a federation based on a metadata standard like DDI...

But for the data to be returned in a search, the repositories still need to be indexed, monitored etc.
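To make "being indexed" concrete: many repositories expose their metadata through the OAI-PMH protocol, and harvesters build their search indexes by polling such endpoints. A minimal sketch of the request a harvester issues, assuming a made-up endpoint URL for illustration:

```python
from urllib.parse import urlencode

# OAI-PMH is a widely used protocol for exposing repository metadata to
# harvesters/indexers. A harvester periodically issues ListRecords
# requests like this and ingests the returned Dublin Core records.
def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        # Incremental harvesting: only records changed since this date.
        params["from"] = from_date
    return base_url + "?" + urlencode(params)

# Hypothetical repository endpoint, for illustration only.
url = list_records_url("https://repo.example.org/oai", from_date="2017-06-01")
```

A repository that offers such an endpoint still has to be registered with a harvester or registry before anyone polls it - which is exactly the oversight gap described below.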

In the best case scenario, perhaps there's an overarching body at the national or community level which proposes best practice in bringing new repositories in, how they are evaluated and supported, etc.

In the case of African countries, there is rarely any such oversight, and much data either gets lost over time or is simply never seen by anyone. Essentially, we can never check the AIR bits, because we never F the data!

What is being done to support the inclusion of new repositories?

  1. What technology guidelines are there for creating a FAIR-friendly repository?
  2. What policy guidelines are there for "registering" the repository with an indexing service or authority?
  3. For monitoring purposes, how does one communicate with the repository maintainer about the state and compliance of their repository?

These are clearly complex and subtle questions with no single answer. I'd be happy to narrow down any aspects in discussion.

How FAIR is FAIR?

Several commentators have begun critiquing the FAIR principles themselves.

FAIRmetrics score on the profile of each published dataset?

I have a naive idea about a rose diagram of FAIR metrics on the profile of each published dataset. The idea is derived from altmetrics, which are now widely used for scientific publications.

A rose diagram with an overall data quality value could be very transparent and attractive on the profile of a dataset. The challenge, as already mentioned by a few colleagues, is finding methods to quantitatively estimate the four elements of FAIR. Altmetrics use data from social media to estimate the impact of a publication; for findability, accessibility, interoperability and reusability, there could be automated ways to gather records if the dataset has a unique identifier such as a DOI or URI. Still, we need to carefully design a model and a mechanism to evaluate each item in FAIR, and then generate an overall score.
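A minimal sketch of how such an overall score might be computed, assuming hypothetical per-dimension sub-scores in [0, 1]; how those sub-scores are derived from automated checks is exactly the open modelling question raised above:

```python
# Hypothetical per-dataset sub-scores in [0, 1], one per FAIR dimension.
# A real rating model would derive these from automated checks on the
# dataset's identifier, metadata, licence, formats, etc.
fair_scores = {
    "findable": 0.9,
    "accessible": 0.7,
    "interoperable": 0.4,
    "reusable": 0.6,
}

def overall_score(scores, weights=None):
    """Combine the four FAIR sub-scores into one 0-100 value."""
    weights = weights or {k: 1.0 for k in scores}
    total = sum(scores[k] * weights[k] for k in scores)
    return round(100 * total / sum(weights.values()))
```

The four sub-scores would feed the petals of the rose diagram, while the combined value is the single "badge" number shown on the dataset profile.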

Research Data Culture Questions - community perspective

We had a project in Germany in Baden-Württemberg (bwFDM-Communities) and intensively interviewed almost 800 scientists from all disciplines in our federal state about their data handling and further needs. So we have a very detailed overview. The results are only available in German. However, I will try to extract short answers to your data culture questions today from a community perspective.

How FAIR are current research practices in different parts of the research ecosystem (e.g., disciplines, sectors, geographic regions…)?

  • You should first consider that half of our scientists were happy with data availability in their disciplines. They might not always know what they are missing, and a good third were unhappy, but it is hard to motivate beyond that without presenting clear individual benefits. FAIR practices vary a lot, even within a discipline (like architecture or zoology), but there are also connections beyond specific domains. I once had this picture on a slide to increase our understanding, and it might be helpful for categorising from a different perspective (data contributes to a scientific goal and is more or less important): https://doi.org/10.6084/m9.figshare.5117728
  • Please also keep the long tail of science in mind. In our survey, only 10% of the researchers had already published data openly. That there "is" data can be seen when this number is compared with the 40% who had already shared data on request and the 90% who had already shared data within a project or their research group. This is not fully representative, because small groups were overrepresented in our survey (group-wise questioning), but these numbers can still serve as rough estimates of the situation in Germany three years ago.

What are good examples of aggregating large amounts of data of different origins, and how does this offer new scientific possibilities?

What are the key barriers to FAIR practices (e.g. lack of metadata standards, domain repositories, data sharing norms, …)?

In our survey (Chapters 2.3, 3.3 and 5.2 of http://bwfdm.scc.kit.edu/downloads/Abschlussbericht.pdf, in German!) we collected:

  • 243 user stories that wished for a better-suited repository, or for any useful repository at all.
    Domain repositories were preferred over institutional ones, especially when a data-search need was formulated for the respondent's own research progress.
    "Institutional" repositories were wanted for "archiving" data (though not preferred there either). These wishes almost always had a specific origin, e.g.: keeping full control over access, (automatically) linking to one's own publications, central visibility, storing large amounts of data, reliability, trustworthiness and similar properties.
  • 199 user stories that wished for a better scientific culture in their disciplines
  • 122 user stories that wished for more consistent (meta)data formats and standards
  • 89 user stories that wished for more compatible software
  • 38 user stories that wished for clearer guidelines on the formats, archival processes etc. to use
  • Trust is also very important. A lot of data will not be trusted enough to encourage others to build their own work on it, especially when the corresponding paper is not in a high-impact journal. This has to do with the replication crisis: https://doi.org/10.1038%2F533452a

What could different players do to increase the FAIRness of data within their remit? How have research workflows actually been adapted in order to make data more FAIR? How much of this can be automated?

  • In my opinion we need a certificate for OAIS-conformant organisational structures at universities (not repository or technology issues). That means checking that, for each created set of data, there is a person in charge of making it FAIR, and that this is controlled. It is not enough to say "the scientists should..."; scientists are just "producers". The management and responsibility roles have to be clearly defined from the group domain upwards, cover the cases of departing scientists, and incorporate professional data managers. This is not difficult to do, but difficult to enforce.
  • We should teach more about automation possibilities and support these in specific disciplines. This is an easy way to combine better data structures and reproducibility with higher research speed.
  • We have a lot of different pilot projects in Baden-Württemberg that try to make data more FAIR. To wrap it up I would answer two things:
    First: the publication of data is moving earlier in the scientific workflow. Often this is possible, although there are "fears". To work around them, I would suggest giving data the same credit as the publication it directly supports, if it is published up to two years beforehand, even if it is not mentioned in the publication. That way, even if another group overtakes one's own "discovery" by stealing and "reproducing" the data, one could claim the same discovery without writing a paper, provided one's own data is FAIR enough. (That is the idea in a nutshell.) This incentive would wipe out most "sharing fears" and FAIR problems.
    Second: scientific workflows are being built more robust, reproducible and standardised. The most difficult thing here is transferring single developments across universities or domains. I have no full solution for that. What we are trying now is to establish a local professional network to support this, and to discuss some aspects of further use with all new "data projects" from the start. But it also depends on the incentive to use others' work, which runs up against the "not invented here" syndrome.
  • Critical players in disciplines can be service or data providers (like huge instruments) that can foster a good culture in whole disciplines by their terms of use.
  • Repositories should invest in user friendly functionality and get closer to the users e.g. by:
  1. export functions to generic data formats
  2. (web)software to visualize data
  3. (web)software to analyze data (e.g. statistically) and search by available analysis output
  4. discussion site for each data set (e.g. with answers from the author)

What training opportunities and career paths are needed to support researchers and other players in the research ecosystem with data management and sharing?

  • A researcher who contributes mainly to a group's infrastructure (by caring for the data) needs a real perspective.

What improvements could be made to the current EC approach to DMPs?
How can DMPs become more integrated and machine-actionable?

  • I would like to see standardised, nicely coloured (and machine-readable) data-flow pictures at the end of DMPs: what the data sources are, what the processing steps are, and where all products and primary data will be published and/or archived. Then reviewers can easily see whether the important data is included in the plan and whether it "ends" green, red or yellow. Nobody wants to read or write these things in longish DMPs.
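The suggestion above can be sketched as a tiny machine-actionable data-flow description. All field names and values here are hypothetical, just to illustrate the idea of a traffic-light check a reviewer (or a validator script) could run:

```python
# A hypothetical machine-readable DMP data flow: sources, processing
# steps, and where each end product goes, with a traffic-light status.
dmp_flow = {
    "sources": [{"name": "sensor measurements", "format": "CSV"}],
    "steps": [{"name": "quality filtering", "tool": "in-house script"}],
    "outputs": [
        {"name": "cleaned dataset", "deposit": "domain repository",
         "status": "green"},   # archived and openly published
        {"name": "raw logs", "deposit": None,
         "status": "red"},     # no archiving planned
    ],
}

def flow_status(flow):
    """Worst-case traffic light over all outputs (the flow's 'ends')."""
    order = {"green": 0, "yellow": 1, "red": 2}
    return max((out["status"] for out in flow["outputs"]), key=order.get)
```

A reviewer's tooling could then flag any plan whose flow does not "end" green, without anyone reading the prose.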

Is the assiduous collection of metadata and its aggregation via a DOI agency an essential operation for FAIR Data?

Comment/question.

I believe the FAIRness of data must be associated with the quality of the metadata associated with it.

My question is whether there is agreement that assiduous collection of metadata and its aggregation via a DOI agency is an essential operation?

My comment is that FAIR metadata assiduously collected against a specified schema does not seem common yet. Thus http://doi.org/b88d is an analysis of some of the most scientifically important datasets collected in recent years, together with the metadata associated (or not) with them. How are we to interpret that such a well-resourced research project is not assiduous in its metadata dissemination?

FAIR EOSC: FAIR e-Infrastructure Service vs FAIR Research Infrastructure Services

At the EOSC Summit I expected considerable discussion about the differences between FAIR data services:

  1. EOSC e-Infrastructure services : file storage, general catalogues, AAI, metadata needed for finding files, compute resources, etc
  2. Research Infrastructures using the EOSC services : data sets, subject specific catalogues, specialist Commons

But there was little discussion about (1); it was mostly about (2). In fact there seemed to be a huge disconnect between the two. Most of (2) use commercial services, not EU services.
The NIH FAIR Data Commons RFA (https://commonfund.nih.gov/sites/default/files/RM-17-026_CommonsPilotPhase.pdf) recognises this and actively encourages partnership with commercial service providers.

We need:

  • common FAIR services
  • federated services
  • to clarify and respect the roles of EOSC and RI services

FAIR initiatives in the rare disease community

The shared (live) version of this document can be found here: https://goo.gl/Dr1sHT

FAIR initiatives in the rare disease community

Purpose of this document

The purpose of this document is to inform EU high level experts about emerging activity towards implementation of FAIR data principles, and the organisation thereof, in the European/international community of rare and undiagnosed disease stakeholders, particularly with respect to research infrastructure (RD-Connect, E-Rare, ELIXIR, BBMRI), patient organisations such as EURORDIS, and health care expert centres for rare diseases organised via 'European Reference Networks'. The document is drafted in the context of the call for consultation: https://github.com/FAIR-Data-EG/consultation

Emerging FAIR activity in the rare disease domain

Synopsis

Significant activity is emerging within the rare disease community towards implementing FAIR data principles. The goal is to significantly increase the efficiency and potential of safely combining and exchanging various types of information across various types of data sources under various (complex) models of ownership. Activities include pilot projects and workshops supported by RD-Action, RD-Connect, ELIXIR, BBMRI, and national organisations. These are supported and monitored by a cross-project, global ‘rare disease linked data and ontology task force’. The next step is to organise and professionalise FAIR data services between rare disease stakeholders (e.g. rare disease expert centres organised in European Reference Networks) and solution providers (e.g. Orphanet, RD-Connect, ELIXIR, BBMRI, the Global Alliance for Genomics and Health). A group of operational leaders in the field (see contacts below) are currently exploring the concept of a GO-FAIR implementation network as a framework for this organisation. The group welcomes any further consultation with European Commission experts when desired.

Background and history

Rare diseases (RDs) are considered a challenge for Europe (Commission Communication on Europe's challenges in the field of rare diseases, 2008) because they raise specific problems: poor recognition leading to diagnostic delay and inappropriate management, and limited knowledge of natural history and pathophysiology leading to insufficient development of new therapies. Around 50% of RD patients are considered un- or misdiagnosed, and only 5% of all RDs have a therapy available. To significantly improve these statistics for all patients with rare and undiagnosed conditions, an exponential increase in efficiency is required in all steps towards diagnosis and therapy, including the way data is handled at all stages. The low prevalence and the specificity of rare diseases mean that a global, multi-stakeholder approach, intended to gather specific expertise and to build transversal, shared strategies, is necessary. Data generation and sharing, and the ability to interrogate data across resources (under well-defined conditions), are key and urgent elements of this strategy. With over 6000 diseases (source: Orphanet) and a high degree of diversity across institutes and countries, the data landscape for rare diseases is highly fragmented and heterogeneous. Data stewardship, the development of common semantic standards, and a federated approach are considered essential to achieve the desired increase in efficiency. For example, the Human Phenotype Ontology and the Orphanet Rare Diseases Ontology are already widely endorsed as milestones towards data interoperability, for which they acquired the status of 'recognized resource' from IRDiRC, the international consortium of rare disease research funding agencies. An application for the FAIR guiding principles to become an IRDiRC recognized resource is in preparation.
Progress towards the implementation of FAIR principles is further marked by pilot projects, workshops, and active discussion in fora organised by EURORDIS and RD-Action or online (e.g. [1,2]). A pilot study supported by RD-Connect, ELIXIR, and BBMRI tested interoperability services for enabling queries across biobanks and registries (an extensive report of the study recommendations is available through ELIXIR upon request). Numerous 'Bring Your Own Data' workshops were organised for hands-on experience with these techniques [3], in turn serving as inspiration for defining the FAIR guiding principles [4]. These activities were the basis for the organisation of a cross-project, multi-stakeholder 'rare disease data linkage plan' with additional support from RD-Connect, ELIXIR-EXCELERATE, ELIXIR, BBMRI, and patient organisation representatives. At the same time, ethical and legal constraints associated with access and reusability are addressed through collaborations between IRDiRC and the Global Alliance for Genomics and Health, and in organisations such as BBMRI. Automation of (re-)consent and privacy-preserving record linkage is under investigation. The currently running rare disease data linkage plan serves as a front-runner of a FAIR implementation service for the rare disease domain.

European Reference Networks for rare diseases

The recently started 'European Reference Networks' (ERNs) for rare diseases emphasise the importance of FAIR principles in the rare disease domain. The aim of ERNs is to make expertise on specific rare diseases, organised in expert centres across Europe (health care providers), available to rare disease patients across the European Union. Furthermore, they aim to stimulate translational use of different types of data in support of disease experts and new research. Requirements are expected for exchanging information about patient conditions between institutes in different European countries, including supporting (research) data. This will encompass the need to improve the quality of data at the expert centres in terms of the FAIR principles. In the rare disease domain, extending the adoption of FAIR principles to the valuable data held by patient organisations represents an additional important challenge. Therefore, ERNs have expressed a clear interest in FAIR principles as an aid to addressing part of their information needs.

Next steps

The rare disease data linkage plan is a starting point for further organising and professionalising FAIR services in the rare disease domain. More fine-grained guidelines for the implementation of FAIR principles will need to be defined together with stakeholders, including lead representatives of ERNs, patient organisations such as EURORDIS, and solution providers such as Orphanet, ELIXIR and BBMRI. Therefore, a group of operational leaders in the rare disease domain (see contacts below) have recently started preparations towards the organisation of professional FAIR data services, using the GO-FAIR implementation network paradigm as a driver. The organisation is expected to take effect in the fourth quarter of 2017. The seed organisers welcome any further questions about the implementation of FAIR data principles in the rare disease domain and are available for further consultation by European Commission experts.

Contacts (Europe)

(In alphabetical order)

  • Virginie Bros-Facer, representative of the European organisation of rare disease patient organisation EURORDIS
  • Ronald Cornet, expert in ‘registration at the source’
  • David van Enckevort, technical leader rare disease data linkage plan
  • Victoria Hedley and Ana Rath, representatives of RD-Action, the organisation that supported the implementation of European Reference Networks and of the Orphanet nomenclature in health information systems.
  • Ana Rath and Marc Hanauer, representatives of Orphanet
  • Marco Roos*, co-lead of the ELIXIR rare disease community use case and co-lead of the rare disease data linkage plan, chair of the rare disease linked data and ontology task force.
  • Rachel Thompson, representative of RD-Connect
  • Mark Wilkinson, first author of the FAIR guiding principles

* For correspondence, please contact Marco Roos ([email protected]) and Petra van Overveld ([email protected])

References

  1. Biocuration 2017 Conference Highlights, http://monarch-initiative.blogspot.nl/2017/04/biocuration-2017-conference-highlights.html
  2. Registries of Domain-Relevant Semantic Reference Models Help Bootstrap Interoperability in Domains with Fragmented Data Resources, Roos M., Wilkinson M. D. et al., Proceedings of the Semantic Web Applications and Tools for the Life Sciences, Amsterdam 2016, http://ceur-ws.org/Vol-1795/paper16.pdf
  3. The Organisation of Bring Your Own Data (BYOD) Workshops to Make Life Science Data Linkable at the Source, Jansen M, Carta C. et al., Proceedings of the Semantic Web Applications and Tools for the Life Sciences, Amsterdam 2016, http://ceur-ws.org/Vol-1795/paper18.pdf
  4. The FAIR Guiding Principles for scientific data management and stewardship, Wilkinson M. D. et al., Scientific Data, 2016, https://www.nature.com/articles/sdata201618

FAIR Data needs FAIR tooling that is lightweight and ubiquitous

FAIR principles are all very well, but without the TOOLS for FAIR it is not possible to adopt them. To make an "app ecosystem" of FAIR tools we need a very small number of lightweight protocols and conventions, with real, adoptable, sustainable tooling.
This is not the time for monolithic systems.
Scruffy, lightweight, easy and ubiquitous ALWAYS beats fancy, heavyweight, tricky and limited, in my experience.

Data Repository Evaluation - Are the FAIR Data Principles fair?

Since the work of 4TU is mentioned under 'Measuring Change' on the about page of this knowledge pool, I would like to quickly link to the corresponding work.

  • Pre-Print Practice Research Paper "Are the FAIR Data Principles fair?"

  • Excel spreadsheet with the evaluation, statistics and graphs of 37 Dutch research data repositories / archives (also including Figshare, Mendeley Data, EUDAT B2Share, Zenodo)

  • Blog-post connecting all the dots for the IDCC 2017

Building a disciplinary, world-wide data infrastructure paper

Building a Disciplinary, World‐Wide Data Infrastructure, by Francoise Genova et al., https://doi.org/10.5334/dsj-2017-016

This paper describes the way several disciplines organised themselves to set up their disciplinary interoperability framework, commonalities and differences. It comes from a panel discussion in a session of SciDataCon 2016. It will be useful for the 'Research data culture' and 'Making FAIR data real' sections of the report.

FAIR metric form

We have developed a form to aid in the development of FAIR metrics.

The form pairs each metric descriptor with a value to be filled in:

  • Metric Identifier
  • Metric Name
  • To which principle does it apply?
  • What is being measured?
  • Why should we measure it?
  • What must be provided?
  • How do we measure it?
  • What is a valid result?
  • For which digital resource(s) is this relevant?
  • Examples of their application across types of digital resources
  • Comments

This form is published on http://fairmetrics.org
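The descriptors above could be captured as a machine-readable record. Below is a minimal sketch of that idea; the field names merely mirror the form, and the example values are hypothetical, not taken from fairmetrics.org.

```python
# Hypothetical machine-readable version of the FAIR metric form.
REQUIRED_FIELDS = [
    "metric_identifier",
    "metric_name",
    "principle",
    "what_is_measured",
    "why_measure_it",
    "what_must_be_provided",
    "how_to_measure",
    "valid_result",
    "relevant_resources",
    "examples",
]

def is_complete(metric: dict) -> bool:
    """Check that every descriptor in the form has a non-empty value."""
    return all(str(metric.get(field, "")).strip() for field in REQUIRED_FIELDS)

# Example entry (all values invented for illustration).
example_metric = {
    "metric_identifier": "FM-F1A",
    "metric_name": "Identifier uniqueness",
    "principle": "F1",
    "what_is_measured": "Whether the resource has a globally unique identifier",
    "why_measure_it": "Unique identifiers are a precondition for findability",
    "what_must_be_provided": "The identifier and its registration scheme",
    "how_to_measure": "Resolve the identifier against its registry",
    "valid_result": "Exactly one resource is returned",
    "relevant_resources": "All digital resources",
    "examples": "DOIs for datasets, ORCID iDs for people",
}

print(is_complete(example_metric))  # True: every descriptor is filled in
```

A structured record like this would let metric proposals be validated and compared automatically rather than by reading free-text forms.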

How can/should one introduce Context into FAIR data?

I have heard criticism of FAIR as representing only four of the five essential attributes of data. The missing component is “context”: without the back story associated with the data, they are impoverished.

Arguably, the FAIR metadata can provide link(s) to such back stories, but is this sufficient, and should other mechanisms, such as perhaps EventData, be promoted as well?

EBI as an example of the benefits of open data

Referring to the question: What case studies can be shared of FAIR data in practice and the benefits this brings?

The European Bioinformatics Institute (EBI) is one of the largest providers of life-science data worldwide. Almost all of its data is open and is used by scientists and industry alike. What could make the EBI an interesting example for the report is the fact that there is a relatively new impact assessment clearly outlining the financial benefits. I do not know whether such studies exist for other domains. If you are interested, a summary as well as the full report of the impact assessment can be found at http://www.ebi.ac.uk/about/our-impact

Introducing Context into FAIR data: some use-cases.

More on the context of FAIR data. Here is FAIR data embedded into the context of a journal table:

http://doi.org/10.14469/hpc/1248 The table has cells that reference the DOI of the data referred to, and uses JavaScript to retrieve the data and render it. The components of the table can be downloaded separately if readers wish to inspect them or re-assemble them into another context. In this model, the table “wrapper” is itself distinct from the article it is part of: it is hosted on a repository and hence has its own metadata.

We also use other models in which such tables are hosted by the journal as part of the HTML version of the article, e.g. http://www.rsc.org/suppdata/cc/c3/C3CC46720A/C3CC46720A/index.html In this model, the data table does not have its own DOI and hence can only weakly inherit metadata from the article itself.

It is of some interest that the major publishers in our area (chemistry) have thus far accepted the idea of creating a separate, independent version of a journal article table with its own metadata. Whether this will persist remains to be seen.

Make this repository WOW ?

I applaud the choice to make this consultation an open process. Perhaps the authors/maintainers have heard of the Mozilla Science Working Open Workshops? I think that aspects of WOW would make it easier and more attractive to contribute.

I think it would help to clarify a few aspects of this repository, though:

Personas and pathways

See http://mozillascience.github.io/working-open-workshop/personas_pathways/

  1. Who (person/group) started it? It belongs to @FAIR-Data-EG, but this has only one public member. Apart from knowing that it will inform the EOSC, it is not clear to me who is behind this.
  2. It would help to explain who the authors are (perhaps have an Authors file, with some description of how one gets included).
  3. What other forms of contribution are recognised? How are these recognised?

Any ideas or comments on this?

Should subject-specific enhancement of metadata be used by communities as a mechanism for achieving FAIRness in data?

I would like to present the following search:

https://search.datacite.org/works?query=media:chemical\/x\-gaussian*+SubjectScheme:inchikey+subject:XZYDALXOGPZGNV-UHFFFAOYSA-M+media:chemical\/x\-mnpub*

as an example of FAIR metadata associated with a FAIR dataset. The metadata from one of the two hits can be inspected at

https://data.datacite.org/application/vnd.datacite.datacite+xml/10.14469/HPC/2635

and exploits an element from the DataCite metadata schema to enhance the FAIR attributes of the dataset.

https://search.datacite.org/works/10.14469/HPC/2635 shows the media types associated with this dataset, again part of the FAIR attributes.
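A small sketch of how such subject metadata could be retrieved and inspected programmatically. The URL pattern follows the data.datacite.org links quoted above; the XML fragment below is a simplified, hypothetical excerpt of a DataCite kernel-4 record, not the full record for this DOI.

```python
import urllib.parse
import xml.etree.ElementTree as ET

def datacite_xml_url(doi: str) -> str:
    """Build the content URL in the style of the links above."""
    return ("https://data.datacite.org/application/"
            "vnd.datacite.datacite+xml/" + urllib.parse.quote(doi))

# Simplified, hypothetical fragment of a DataCite record with a
# subject-scheme-qualified subject (here the InChIKey from the search above).
SAMPLE_XML = """<resource xmlns="http://datacite.org/schema/kernel-4">
  <subjects>
    <subject subjectScheme="inchikey">XZYDALXOGPZGNV-UHFFFAOYSA-M</subject>
  </subjects>
</resource>"""

def subjects(xml_text: str) -> list:
    """Extract (scheme, value) pairs from the record's subject elements."""
    ns = {"d": "http://datacite.org/schema/kernel-4"}
    root = ET.fromstring(xml_text)
    return [(s.get("subjectScheme"), s.text)
            for s in root.findall(".//d:subject", ns)]

print(datacite_xml_url("10.14469/HPC/2635"))
print(subjects(SAMPLE_XML))  # [('inchikey', 'XZYDALXOGPZGNV-UHFFFAOYSA-M')]
```

Machine-actionable subject schemes like this are exactly what makes the search query above possible: a client can filter on `subjectScheme` rather than guessing from free-text keywords.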

My question is whether such subject-specific FAIR enhancement should be used by communities as a mechanism for achieving FAIRness?

Survey on Horizon 2020 DMP template

OpenAIRE is coordinating a survey on the Horizon 2020 Data Management Plan template. This includes questions on the process of writing a DMP, what views people have towards the H2020 template, whether there could be any improvements in terms of terminology, questions, guidance, coverage etc, priorities for improvements, and lessons learned from reviewing DMPs.

The survey is available online and open until Friday 21st July 2017.

OpenAIRE will publish a summary of the responses to the survey and recommendations on the project website. The survey results will also be used as an input to the FAIR data expert group, which will contribute to an evaluation of the Horizon 2020 DMP template and suggest future revisions.

FAIR data decisions: Lossy or lossless

One of the issues often confronting depositors of aspiring FAIR data is how much data loss to tolerate. I give just one example: crystallographic data in chemistry (often described as the gold standard in chemical data). There are the following hierarchies, with increasing data loss:

  1. The raw instrument data
  2. The processed instrument data, including "hkl" information
  3. The processed instrument data, including rich structure information but excluding "hkl" data
  4. The processed minimum dataset, which suffices for perhaps 90% of users' needs
  5. A graphical representation of the minimum dataset, as a JPEG or PDF, which itself can be lossy.

So most consumers would find, say, category 4 adequately FAIR for their needs, but some specialist users would find it too lossy and might need to go as high as category 1. The trouble is that this type of data might be as much as 10,000 times larger than the minimal set.

Unfortunately there is no easy way of specifying the degree of data loss in an aspiring FAIR dataset as metadata. And remember, this is considered the "gold standard": one finds similar situations in other types of chemical data.
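One could imagine declaring the degree of loss as explicit metadata. The sketch below encodes the crystallographic hierarchy above; note that the `loss_level` field is a hypothetical convention proposed for illustration, not an existing metadata standard, and the DOI is a placeholder.

```python
# The hierarchy from the text, from least lossy (1) to most lossy (5).
CRYSTALLOGRAPHY_LEVELS = {
    1: "raw instrument data",
    2: "processed data, including hkl information",
    3: "processed data, rich structure information, excluding hkl",
    4: "processed minimum dataset",
    5: "graphical representation (JPEG/PDF)",
}

def least_lossy(available: list) -> int:
    """Return the least lossy level (lowest number) a deposit provides."""
    return min(available)

# Hypothetical deposit record declaring which levels it offers.
deposit = {
    "doi": "10.xxxx/example",        # placeholder, not a real DOI
    "loss_level": 4,                 # hypothetical metadata field
    "levels_available": [3, 4, 5],
}

level = least_lossy(deposit["levels_available"])
print(level, "-", CRYSTALLOGRAPHY_LEVELS[level])  # 3 - processed data, ...
```

With such a field, a specialist user could discover at search time whether a deposit goes deep enough for their needs, instead of downloading the minimum dataset only to find it too lossy.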

Making FAIR data real - The community experience

You may already know all of this, but perhaps it is still somewhat helpful for answering your questions on how to make FAIR data real. Below is a brief overview of what I have learned from our survey of almost 800 scientists.

To what extent are the FAIR principles alone sufficient to reduce fragmentation and increase interoperability? The principles have great potential to influence the minds of stakeholders towards more efficient data sharing and reuse, but additional measures and more specifics are perhaps needed to guide implementation.

  • It should be clearly stated that there is also research data beyond "publication-supporting data", which is often ignored. Our survey showed very strong disciplinary differences, but especially in the life and natural sciences there was strong demand (from every second respondent) for publication of "negative" results. In other disciplines it was said that "there is no negative result". An example: dozens of researchers try the same synthesis pathway for a new molecule in a "standard way" or "second standard way", but fail, and nobody publishes it, because it is normal that "other ways" are needed (finding them is their science) and the negative result is not "publishable". A simple, trustworthy entry in a database of failures would be enough for them, but currently the knowledge is just lost. It is not a big deal in the single case, just some hours of work, but it happens thousands of times.
    So the question must be answered: "What is the data that needs to be FAIR?" 45% of our researchers said they could benefit "much" or "very much" from some kind of "negative data", but I do not see it coming automatically from being FAIR alone. The disciplines know in principle what could be needed, but are not able to change their "credit system". I think there should be funding offers for disciplines to think about their data and work on such things as a whole.
  • "Reproducibility" is often misunderstood. It should be more "press the button, there it comes" (with the possibility of exchanging input data) and not "read the paper and source code, there is all you need". Ideally, the man on the street should be able to "reproduce" science, because the scientist from another discipline is exactly like the man on the street. And by the way, the man on the street could benefit, too.

What are the necessary components of a FAIR data ecosystem in terms of technologies, standards, legal framework, skills etc?

  • Concerning skills, you will surely know http://edison-project.eu
  • Although it is not my preferred solution, maybe we need an EU-wide copyright collective to pay for the scientific software needed to reproduce science, charged each time some work is "reproduced" by people who do not have access to the necessary software. I am sure there could be a contract for widely used software, or funded projects could be restricted to software that takes part in such a contract. But of course, someone has to pay that little money (and misuse has to be prevented in a smart way). I suggest this with some qualms, because I would prefer more open solutions, but maybe we can't have our cake and eat it, too.

What existing components can be built on, and are there promising examples of joined-up architectures and interoperability around research data such as those based on Digital Objects?

  • In Germany we are beginning to have federated archiving services in some federal states (e.g. in Baden-Württemberg). This makes sense for Germany, because the federal states mainly fund their universities. However, for each "membership" in European, national or other networks, the universities will automatically try the most synergetic approach and therefore make these things compatible. This is not a "no-brainer" and is sometimes a project of its own, but it is happening at such melting points.
  • Secondly, I would build heavily on repositories, because they are close to their communities' needs.

Do we need a layered approach to tackle the complexity of building a global data infrastructure ecosystem, and if so, what are the layers?
Which global initiatives are working on relevant architectural frameworks to put FAIR into practice?

  • I am not sure whether we need layers, but the roles must be clear for all players. We should not end up in a situation where scientists run in circles asking for payment or simply the "delivery" of a service (like data deposition), because each "station" in the circle feels it is not responsible.
    So please make clear among all players what exactly universities, project funders and disciplinary or EU-level solutions should offer their scientists (e.g. as a condition for participation/cooperation).

A large proportion of data-driven research has been shown to not be reproducible. Do we need to turn to automated processing guided by documented workflows, and if so how should this be organised?

  • In my opinion this must definitely come, but strictly driven by the scientists, not by information centres. Scientists should be allowed to take part in programmes to develop their discipline's automation if they are a "relevant mass" in their discipline and have good support from central information infrastructures. There is a four-page paper by the RFII in Germany (http://www.rfii.de/?wpdmdl=2269 - you know it, I guess, although it is in German). It is impossible for me to overemphasise the importance of the scientists' role. The performing scientists of such automation projects should talk to and include "all" scientists in their discipline to avoid isolated solutions. This should be absolutely mandatory.

What kind of roles and professions are required to put the FAIR principles into place?

  • There should be a main data or "FAIR" manager at each university (even if it is just a title at the start).
  • Maybe (as a rough idea) we need to push forward the profession of "replication science" as a science in its own right, with its own professorships.
    Let us try the analogy with industry: these professorships for "data replication science" would be comparable to a specialised "quality control", but for the product "data". This is different from the current approach, in which the production lines somehow check each other. The new people would focus on checking only the data. They could give valuable feedback and impetus to the data science field. Science is an industry with high-quality and sensitive products.
  • There are also totally different "data scientists" we need, more comparable to "supply chain management" in industry, which we currently totally ignore. A company with complex products that ignored this would fail fast today (ask your factory next door). Well, I think scientific data is a complex product. So please consider a supply-chain-management perspective too, and transfer it from factories to universities and scientific data.

FAIR Maturity Model

Although we have metrics, we also recognise that there is no single definition of what is FAIR, as this depends on many factors, including stakeholders' viewpoints and purposes. Along with metrics, we need a multi-level maturity model or readiness model.

One thing is for sure: FAIR is not ALL OR NOTHING. It must be incremental.

CRIS and FAIR data infrastructure: a contribution by euroCRIS (Making FAIR work)

euroCRIS, representing the research information community in general and the CRIS community in particular (http://www.eurocris.org), is glad to actively contribute to the EOSC initiative. The expertise and experience present within euroCRIS may add value to the realisation of an optimal EOSC, and more specifically to the FAIR aspect of the infrastructure. The rationale for this is summarised in the points below.

  1. The availability and interoperability of optimal metadata is crucial for the FAIRness of a research data infrastructure. This concerns not only metadata about the datasets as such, but also metadata containing information about objects and aspects related to the data(sets), such as: publications based upon the data, the project the data resulted from, researchers and institutes involved in the research, funders that provided the grants, controlled vocabularies classifying the research and its datasets, etc. Each of these related metadata provides extra entry points that facilitate and promote the findability, as well as the interpretability and thus the (re)usefulness, of the data, and as such substantially enhances the FAIRness of the data.

  2. Typically, such a full and interrelated set of metadata about research and its products and objects is stored in a CRIS: a Current Research Information System. Therefore CRIS (and the information they contain) are valuable resources for any research data infrastructure.

  3. As part of its activities, euroCRIS has developed a standard three-layer architectural model describing and structuring the use of metadata in a research data infrastructure. In this model a distinction is made between “generic” metadata that apply to any type of research data(set) and discipline- or subject-specific metadata. The model is summarised in the attached figure:
    3LayerRD_Metadata_Model.pdf

  4. Two specific examples of CRIS and CERIF handling research datasets are in progress: one in the UK, with the ‘CERIFication of the Research Data Shared Service’ pilot project, and a second in the Netherlands (the "RDS Project"), to describe, upload and archive research datasets by means of a CRIS (a project in cooperation with DANS, the national NL data hosting organisation).

In conclusion: the success of the European Open Science Cloud (EOSC) requires efficient and effective underpinning systems (among which CRIS) based on open standards and comprehensive interlinked metadata. The latter are essential to optimise the findability, interpretability and reusability of datasets, as well as to facilitate interoperability and thus reduce the burden of collecting and reusing information. Together with authoritative persistent identifiers and standard definitions, a standard metadata exchange model provides the three pillars of interoperability (see: http://dspacecris.eurocris.org/handle/11366/567).

Amsterdam contribution to the FAIR Expert Group

The Amsterdam Economic Board and the Amsterdam Science Park
contribution to the FAIR data Expert Group Call for Contributions

With this letter the Amsterdam Metropolitan Area would like to set out its vision on FAIR data and offer its support in turning the FAIR data principles into an operational reality.
First of all, we fully support and commend the data-related efforts of the European Council and the European Commission. The Digital Single Market, the European Cloud Initiative, EOSC and the FAIR data initiatives are all part of a roadmap to innovation. Furthermore, we applaud the announced focus of the Estonian Presidency on a fifth freedom: the free movement of data (https://www.eu2017.ee/news/insights/FreeMovementOfData).

Data-driven innovation in the Amsterdam metropolitan area
Data-driven innovation is crucial for the Amsterdam Metropolitan Area in a variety of sectors such as mobility, health, sustainability, and of course advanced research. A strong signal from the Amsterdam municipality was the decision to endorse the principle of open data for all relevant information. This key choice has governance implications, but it is driving change towards a new culture of benefiting from open data, not only for public services but also for the private sector and the public at large.
Stakeholders in the Amsterdam region are cooperating to create an open cloud environment based on FAIR data. This movement is not restricted to the scientific communities and their data, but also includes other public and private sectors. We consider the FAIR principles crucial for seamless and trusted use of data across disciplines and organisations. The data-sharing initiatives in the Amsterdam region are mostly bottom-up and community-driven. Drivers for these initiatives are the emerging new business models in the open cloud, and the need to agree on organising the related trust in the communities concerned. All initiatives are cooperating with similar developments in the Netherlands and abroad, which positions the Amsterdam region as an area of expertise and as a model region for other regions and countries.
Examples of the initiatives (with stakeholders in italics) in the Amsterdam Metropolitan Area are:

Our recommendations
The following paragraphs offer our contribution for the EC expert group on FAIR data, following the proposed sections in its call for contributions.

Involve public and private partners (Section: concept- why FAIR?)
The vision on FAIR data is currently mainly framed for research communities. However, we also see value in the FAIR principles for other public and private communities. A new economy is emerging, offering products and services that add value to shared data. A prerequisite is that all involved stakeholders trust the full cycle from raw to processed data. Data are only understandable in the context of how they were produced, and as such software code, virtual environments, sensors and machines (Internet of Things) can also be regarded as FAIR data and services. Here the FAIR principles offer a common baseline for organising trust. Apart from being Findable, Accessible, Interoperable and Reusable, the data also have to be machine-understandable, allowing for automatic processing. In addition, data should be legally interoperable, with a limited set of machine-readable licence formats, whilst guaranteeing digital trust among all parties involved, including citizens. Stakeholders in the Amsterdam metropolitan region are putting effort into researching and testing such approaches.

Apply a bottom-up community strategy (Section: research data culture)
While we regard the required culture change as not restricted to the research communities, we observe that culture change is mainly driven by opportunities for creating new knowledge, new services, new business models, or new strategic partnerships. A precondition is implementing the supporting software tools that will facilitate a FAIR policy: for example, tools to support the direct upload of new data in correct formats, and tools to (re)use data in applications. Only once such tools offer an accepted default mode of working can a new data culture be expected to spread over time. It has been suggested to define “Rules of Engagement”: a kind of charter on issues such as quality, interoperability, privacy and safety. Regional, national or sectoral supervisory bodies, established by cooperating actors in the landscape of providers, could develop the mechanisms to enforce such rules. Such a bottom-up community strategy, rather than top-down governance, will facilitate adaptive developments.

Establish local support teams (Section: facilitating change)
Some facilitating mechanisms are already mentioned above, such as supporting tools, common charters, and community supervision. Various mechanisms, standards and procedures will help communities define a roadmap towards organising trust. Promotion of national, regional or sectoral networks, with adequate support to implement these mechanisms, is crucial for adopting new working processes as well as new opportunities in science, the public sector, business and society at large. Stakeholders in the Amsterdam Metropolitan Area will contribute to the call by the Netherlands and Germany to follow FAIR principles, and are considering plans to establish technical and legal support teams to assist developing networks in other countries.

We look forward to cooperating on turning the FAIR data principles into an operational reality.

Kind regards,

Nina Tellegen, Amsterdam Economic Board
Leo le Duc, Amsterdam Science Park
