module-dataprocessing's Introduction

Lesson template for ReproNim teaching sessions

How to use this template:

  1. Go to the GitHub Importer. In the top text box, paste the URL of this repo. In the bottom field, choose either "ReproNim" (if that's an option) or your own user account, and then enter the name of the lesson/repository that you wish to create.

  2. Change the following variables in the _config.yml file (an example snippet follows this list):

    • title
    • repo
    • root
    • email (you can leave Ariel's address here, if you want).
    • start_time: the start time in minutes since midnight. For example, 9 AM is 540 (60 * 9).
  3. Edit the content in the _episodes folder, adding images (into assets/img), code (into code), data (into data) as needed. Pay particular attention to the following:

  • Sections should be named 01-first-part.md, 02-second-part.md, etc., so they are ordered in the schedule.
  • Edit the headers of each of your sections, adjusting the durations of both teaching and exercises.
  • Add coffee breaks to the lesson; this keeps the timing of each section accurate.
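
For reference, here is a minimal example of the step-2 fields in _config.yml; all values are placeholders, not defaults from the template:

```yaml
# _config.yml -- placeholder values; substitute your own
title: "My ReproNim Lesson"                      # lesson title shown on the site
repo: "https://github.com/ReproNim/my-lesson"    # URL of the repository you created
root: "/my-lesson"                               # base path used when building links
email: "contact@example.com"                     # contact address for the lesson
start_time: 540                                  # minutes since midnight; 540 = 9 AM
```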

Acknowledgment

Please see LICENSE.md for copyright, license, and acknowledgment information.


module-dataprocessing's People

Contributors

aaren, abbycabs, abought, aflaxman, alistairwalsh, arokem, bkatiemills, cdw, christinalk, djarecka, fmichonneau, gvwilson, jbpoline, jdblischak, josephmje, jpallen, montoyjh, nikhilweee, pausz, pbanaszkiewicz, pipitone, rgaiacs, rrlove, satra, synesthesiam, tbekolay, twitwi, valentina-s, wking, yarikoptic


module-dataprocessing's Issues

"Idealized" versus "good-enough" processing stream

I'm curious what would be considered an "idealized" reproducible processing stream versus a "good enough" one, and I'd like to identify the tools/skills needed to complete a "good enough" reproducible analysis. I have some hypothesized steps below, with tools listed to complete each step.

Sparse Learner's Profile

Starting from the top: a PI (or someone) hands you a bunch of DICOMs and asks you to get subcortical volumes from the structural scans (there are other, currently irrelevant DICOMs in the pile as well). The PI also wants to be able to re-run your analysis and wants the data to be publicly available (assuming all IRB/data-sharing agreements are satisfied). A sketch of that first triage step follows below.
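
As a concrete starting point, here is a minimal sketch of that triage step. The pipeline lists below suggest nibabel/afni for this; the sketch instead uses pydicom, and the directory and series-description pattern are assumptions:

```python
from pathlib import Path

import pydicom

# Triage a pile of DICOMs: keep only the structural (T1-weighted) series.
# The incoming directory and the "T1" pattern are assumptions -- adapt
# them to the actual acquisition protocol.
structurals = []
for dcm_path in Path("/data/incoming_dicoms").rglob("*.dcm"):
    header = pydicom.dcmread(dcm_path, stop_before_pixels=True)
    if "T1" in getattr(header, "SeriesDescription", ""):
        structurals.append(dcm_path)

print(f"Found {len(structurals)} structural DICOM files")
```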

An Idealized Processing Pipeline

I imagine we would be using datalad to record all our data/code/processing steps, and always be using/developing containers from the beginning. I'm not exactly sure where/how to place NIDM annotations of data/results, or which tool I should use (PyNIDM?). A sketch of the datalad scaffolding for the first steps appears after the list.

  • search through and find the relevant dicoms
    • nibabel
    • afni
  • version control the relevant dicoms
    • datalad
    • git-annex
  • convert the dicoms to nifti file format named to the BIDS standard
    • heudiconv (via docker/singularity)
    • datalad
  • deface and rename the files
    • pydeface (via docker/singularity)
    • shell
    • datalad
  • write a script that calculates subcortical volumes
    • niflows (via pip/conda env)
    • fsl
    • datalad
  • place the script in a container with all the requisite software installed
    • neurodocker
  • upload the container to a hub (docker and/or singularity)
    • docker
    • singularity
  • run the script on the data and output data in a derivatives directory
    • docker
    • singularity
  • upload the BIDS organized nifti files to some online database
    • openneuro
  • upload the code/outputs to an online repository
    • git
    • github
  • test your code against that uploaded data
    • testkraken
    • circleci
    • travisci
    • shell
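
To make the datalad bookkeeping concrete, here is a minimal sketch of the first few steps using datalad's Python API. It assumes the datalad and datalad-container packages are installed; the paths, container URL, and heudiconv arguments are placeholders:

```python
import datalad.api as dl

# Create a dataset that version-controls data, code, and provenance
dl.create(path="study", cfg_proc="text2git")

# Bring the relevant DICOMs (already copied into study/sourcedata)
# under version control
dl.save(dataset="study", path="study/sourcedata", message="Add relevant DICOMs")

# Register the heudiconv container, then run the conversion through it so
# datalad records exactly which container produced which outputs
dl.containers_add(
    name="heudiconv",
    url="dhub://nipy/heudiconv:latest",  # URL scheme and image are assumptions
    dataset="study",
)
dl.containers_run(
    "heudiconv -d 'sourcedata/{subject}/*.dcm' -s 01 -f reproin "
    "-c dcm2niix -o . -b",
    container_name="heudiconv",
    dataset="study",
)
```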

Good Enough Processing Pipeline

This version removes datalad, testing, and niflows from the processing stream, but still uses the desired software from within containers. A sketch of the container-based steps follows the list.

  • search through and find the relevant dicoms
    • nibabel
    • afni
  • convert the dicoms to nifti file format named to the BIDS standard
    • heudiconv (via docker/singularity)
  • deface and rename the files
    • pydeface (via docker/singularity)
    • shell
  • write a script that calculates subcortical volumes
    • shell
    • fsl
  • place the script in a container with all the requisite software installed
    • neurodocker
  • upload the container to a hub (docker and/or singularity)
    • docker
    • singularity
  • run the script on the data and output data in a derivatives directory
    • docker
    • singularity
  • upload the BIDS organized nifti files to some online database
    • openneuro
  • upload the code/outputs to an online repository and link to what containers you used
    • git
    • github
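
Here is a minimal sketch of the container-based steps above, assuming Docker, the nipy/heudiconv image, and an FSL installation providing run_first_all; every path, subject ID, and image tag is a placeholder:

```python
import subprocess

# Convert DICOMs to BIDS-named NIfTI files with heudiconv run from Docker
subprocess.run([
    "docker", "run", "--rm",
    "-v", "/data/dicoms:/data:ro",
    "-v", "/data/bids:/out",
    "nipy/heudiconv:latest",
    "-d", "/data/{subject}/*.dcm",
    "-s", "01", "-f", "reproin", "-c", "dcm2niix",
    "-o", "/out", "-b",
], check=True)

# Segment subcortical structures with FSL FIRST; run_first_all writes
# <prefix>_all_fast_firstseg.nii.gz by default
subprocess.run([
    "run_first_all",
    "-i", "/data/bids/sub-01/anat/sub-01_T1w.nii.gz",
    "-o", "/data/bids/derivatives/first/sub-01",
], check=True)

# fslstats -V prints the number of nonzero voxels and their total volume
# (mm^3); per-structure volumes would need per-label thresholds
subprocess.run([
    "fslstats",
    "/data/bids/derivatives/first/sub-01_all_fast_firstseg.nii.gz",
    "-V",
], check=True)
```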

I would like feedback on both the "Idealized" and "Good Enough" analyses, since I am not as knowledgeable as I would like to be about designing processing pipelines; I may not be up to date on which tools are the hot/new ones versus which will simply get the job done.

Once we pin down what we would like workshop attendees to be able to do (and hopefully this matches what they wish to do as well), I think we will have an easier time elucidating the necessary skills and modifying episodes to make sure they help build those skills.

My comments and Satra's answers on lessons 1-3 (copy of the email conversation)

Lesson 1: Core concept
General:
- you're saying at the beginning that the lesson uses the Simple Workflow, but it's not clear to me how specific parts are related to the repo. I would expect much more guidance.

can you suggest what kind of guidance?

Element 1:
- are you planning to "convert" the JSON file (that is missing for now) to the two standards you mention? It would be very useful IMO.

we should revise this. NIDM-E is not set in stone, so if we release we should have some confirmation of a NIDM-E version.

Element 2:
- I don't understand the first sentence, "...when a different dataset containing the same data or a slightly different workflow is used."

I don't see this sentence here: http://www.reproducibleimaging.org/module-dataprocessing/02-concepts/

Lesson 2: Annotate..

General:

  • there are no "elements"

elements were conceptual pieces in the concepts lesson - we can refer back to those elements, but don't expect elements to be in everything (I think).

Links don't work:

http://data.wu.ac.at/csvengine/csvm
the second looks like a markdown typo.

Data and Metadata

  • I'm not sure what the main point is here: convincing people to submit data to archives? A few sentences of intro might be nice. When do you use NDA and when NeuroVault for sharing?

yes, an introductory statement would be good there.

Lesson 3: Create and...
General:
- there are no "elements"
- IMO, a general motivation for when to use a VM versus Docker or Singularity in scientific applications is missing. You have it in your presentation, but not here.
- as I mentioned, it might be useful to work on the final versions of the videos so they are easier to use.

we should update things here with respect to the presentation.

Docker
- I know you mention BIDS-Apps, but it could be useful to point to some Python or Nipype images, so people don't have to install everything to run a Nipype workflow.

sure - the current nipype dockerfile has most things necessary.
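
For instance, a minimal sketch of pulling the community nipype/nipype image from Docker Hub and checking that Nipype is importable inside it (the "latest" tag is an assumption; check Docker Hub for current tags):

```python
import subprocess

# Pull the community Nipype image and verify the installed Nipype version,
# without installing anything on the host
subprocess.run(["docker", "pull", "nipype/nipype:latest"], check=True)
subprocess.run([
    "docker", "run", "--rm", "nipype/nipype:latest",
    "python", "-c", "import nipype; print(nipype.__version__)",
], check=True)
```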

Singularity

  • I don't see any "smooth transition" to Vagrant. I understand that Vagrant is needed when one uses OSX, or when one has to create an image on a machine without sudo (since one is root inside the VM), but it's not obvious IMO.

we should point out a few more things here, but there are a few places in the web we can draw from.

Decide/advise on aggregating what was actually taught at the recent workshops...

... e.g., in http://www.repronim.org/sfn2018-training/, "Data Processing" was taught through a "complete workflow" based on heudiconv/reproin and FSL, via containers and datalad containers-run. I think it would be valuable to absorb those in some fashion within this module, since it is the only one dealing with actual data processing.

On the other hand, I think 04-containers would best be migrated to reproducibility basics.
