Comments (8)

emmetaobrien commented on July 18, 2024

Longer term, our intent is to handle changing datasets through more clearly defined version management, in which a change in any of those factors would be represented as a distinct version of the dataset.

from conp-portal.

dbujold commented on July 18, 2024

I understand. So this implies that, for an expanding dataset with frequent (daily/weekly) releases, the DATS document will need to be updated and versioned accordingly?

emmetaobrien commented on July 18, 2024

That would be the expectation with the current model, yes.

dbujold commented on July 18, 2024

I think it would be nice to have a way to support projects with rolling releases as well. Such projects sometimes want to describe their cohort and dataset content in a standardized way, without going into the specifics of how many files there are, what size they are, etc.

emmetaobrien commented on July 18, 2024

Exactly how much data are you envisioning storing on CONP, and of what sort? Our processing involves building fixed links to every distinct file, so that needs redoing for anything that changes from release to release.

dbujold commented on July 18, 2024

Right now we have two cohorts of >5000 participants, with thousands of whole genomes, whole exomes, etc. But the data is under controlled access, which means the files wouldn't be indexed by CONP. It's the dataset's provenance that we're aiming to describe, rather than its content.

bryancaron commented on July 18, 2024

Hi David, I was discussing this briefly with Emmet this morning. Are the datasets you have in mind those from the BQC19, which we have discussed in the context of distribution through NeuroHub, or different datasets? Thanks!

dbujold commented on July 18, 2024

Hi Bryan, this one and others. We have a few cohorts supported in Bento currently, often in a rolling-release kind of way. We prepare a DATS file to annotate the datasets, but we're not always able to provide precise details about the dataset content.
