module-dataprocessing's Introduction

Lesson template for ReproNim teaching sessions

How to use this template:

  1. Go to the GitHub Importer. In the top text box, paste the URL of this repo. In the bottom field, choose either "ReproNim" (if that's an option) or your own user account, and then enter the name of the lesson/repository that you wish to create.

  2. Change the following variables in the _config.yml file (an example snippet follows this list):

    • title
    • repo
    • root
    • email (you can leave Ariel's address here, if you want).
    • start_time: the start time in minutes since midnight. For example, 9 AM is 540 (60 * 9).
  3. Edit the content in the _episodes folder, adding images (into assets/img), code (into code), data (into data) as needed. Pay particular attention to the following:

  • Sections should be named 01-first-part.md, 02-second-part.md, etc., so they are ordered in the schedule.
  • Edit the headers of each of your sections, adjusting the durations of both teaching and exercises.
  • Add coffee breaks to the lesson; this keeps the timing of each section accurate.
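
For reference, here is a minimal example of the step-2 fields in _config.yml; all values are placeholders, not defaults from the template:

```yaml
# _config.yml -- placeholder values; substitute your own
title: "My ReproNim Lesson"                      # lesson title shown on the site
repo: "https://github.com/ReproNim/my-lesson"    # URL of the repository you created
root: "/my-lesson"                               # base path used when building links
email: "contact@example.com"                     # contact address for the lesson
start_time: 540                                  # minutes since midnight; 540 = 9 AM
```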

Acknowledgment

Please see LICENSE.md for copyright, license, and acknowledgment information.


module-dataprocessing's People

Contributors

aaren, abbycabs, abought, aflaxman, alistairwalsh, arokem, bkatiemills, cdw, christinalk, djarecka, fmichonneau, gvwilson, jbpoline, jdblischak, josephmje, jpallen, montoyjh, nikhilweee, pausz, pbanaszkiewicz, pipitone, rgaiacs, rrlove, satra, synesthesiam, tbekolay, twitwi, valentina-s, wking, yarikoptic


module-dataprocessing's Issues

"Idealized" versus "good-enough" processing stream

I'm curious what would be considered an "idealized" reproducible processing stream versus a "good enough" one, and I'd like to identify the tools/skills needed to complete a "good enough" reproducible analysis. I have some hypothesized steps below, with tools listed to complete each step.

Sparse Learner's Profile

Starting from the top: a PI (or someone) hands you a bunch of DICOMs and asks you to get subcortical volumes from the structural scans (there are other, currently irrelevant DICOMs in the pile as well). The PI also wants to be able to re-run your analysis and wants the data to be publicly available (assuming all IRB/data-sharing agreements are satisfied). A sketch of that first triage step follows below.
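
As a concrete starting point, here is a minimal sketch of that triage step. The pipeline lists below suggest nibabel/afni for this; the sketch instead uses pydicom, and the directory and series-description pattern are assumptions:

```python
from pathlib import Path

import pydicom

# Triage a pile of DICOMs: keep only the structural (T1-weighted) series.
# The incoming directory and the "T1" pattern are assumptions -- adapt
# them to the actual acquisition protocol.
structurals = []
for dcm_path in Path("/data/incoming_dicoms").rglob("*.dcm"):
    header = pydicom.dcmread(dcm_path, stop_before_pixels=True)
    if "T1" in getattr(header, "SeriesDescription", ""):
        structurals.append(dcm_path)

print(f"Found {len(structurals)} structural DICOM files")
```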

An Idealized Processing Pipeline

I imagine we would be using datalad to record all our data/code/processing steps, and always be using/developing containers from the beginning. I'm not exactly sure where/how to place NIDM annotations of data/results, or which tool I should use (PyNIDM?). A sketch of the datalad scaffolding for the first steps appears after the list.

  • search through and find the relevant dicoms
    • nibabel
    • afni
  • version control the relevant dicoms
    • datalad
    • git-annex
  • convert the dicoms to nifti file format named to the BIDS standard
    • heudiconv (via docker/singularity)
    • datalad
  • deface and rename the files
    • pydeface (via docker/singularity)
    • shell
    • datalad
  • write a script that calculates subcortical volumes
    • niflows (via pip/conda env)
    • fsl
    • datalad
  • place the script in a container with all the requisite software installed
    • neurodocker
  • upload the container to a hub (docker and/or singularity)
    • docker
    • singularity
  • run the script on the data and output data in a derivatives directory
    • docker
    • singularity
  • upload the BIDS organized nifti files to some online database
    • openneuro
  • upload the code/outputs to an online repository
    • git
    • github
  • test your code against that uploaded data
    • testkraken
    • circleci
    • travisci
    • shell
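
To make the datalad bookkeeping concrete, here is a minimal sketch of the first few steps using datalad's Python API. It assumes the datalad and datalad-container packages are installed; the paths, container URL, and heudiconv arguments are placeholders:

```python
import datalad.api as dl

# Create a dataset that version-controls data, code, and provenance
dl.create(path="study", cfg_proc="text2git")

# Bring the relevant DICOMs (already copied into study/sourcedata)
# under version control
dl.save(dataset="study", path="study/sourcedata", message="Add relevant DICOMs")

# Register the heudiconv container, then run the conversion through it so
# datalad records exactly which container produced which outputs
dl.containers_add(
    name="heudiconv",
    url="dhub://nipy/heudiconv:latest",  # URL scheme and image are assumptions
    dataset="study",
)
dl.containers_run(
    "heudiconv -d 'sourcedata/{subject}/*.dcm' -s 01 -f reproin "
    "-c dcm2niix -o . -b",
    container_name="heudiconv",
    dataset="study",
)
```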

Good Enough Processing Pipeline

This version removes datalad, testing, and niflows from the processing stream, but still uses the desired software from within containers. A sketch of the container-based steps follows the list.

  • search through and find the relevant dicoms
    • nibabel
    • afni
  • convert the dicoms to nifti file format named to the BIDS standard
    • heudiconv (via docker/singularity)
  • deface and rename the files
    • pydeface (via docker/singularity)
    • shell
  • write a script that calculates subcortical volumes
    • shell
    • fsl
  • place the script in a container with all the requisite software installed
    • neurodocker
  • upload the container to a hub (docker and/or singularity)
    • docker
    • singularity
  • run the script on the data and output data in a derivatives directory
    • docker
    • singularity
  • upload the BIDS organized nifti files to some online database
    • openneuro
  • upload the code/outputs to an online repository and link to what containers you used
    • git
    • github
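
Here is a minimal sketch of the container-based steps above, assuming Docker, the nipy/heudiconv image, and an FSL installation providing run_first_all; every path, subject ID, and image tag is a placeholder:

```python
import subprocess

# Convert DICOMs to BIDS-named NIfTI files with heudiconv run from Docker
subprocess.run([
    "docker", "run", "--rm",
    "-v", "/data/dicoms:/data:ro",
    "-v", "/data/bids:/out",
    "nipy/heudiconv:latest",
    "-d", "/data/{subject}/*.dcm",
    "-s", "01", "-f", "reproin", "-c", "dcm2niix",
    "-o", "/out", "-b",
], check=True)

# Segment subcortical structures with FSL FIRST; run_first_all writes
# <prefix>_all_fast_firstseg.nii.gz by default
subprocess.run([
    "run_first_all",
    "-i", "/data/bids/sub-01/anat/sub-01_T1w.nii.gz",
    "-o", "/data/bids/derivatives/first/sub-01",
], check=True)

# fslstats -V prints the number of nonzero voxels and their total volume
# (mm^3); per-structure volumes would need per-label thresholds
subprocess.run([
    "fslstats",
    "/data/bids/derivatives/first/sub-01_all_fast_firstseg.nii.gz",
    "-V",
], check=True)
```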

I would like feedback on both the "Idealized" and "Good Enough" analyses, since I am not as knowledgeable as I would like to be about designing processing pipelines; I may not be up to date on which tools are the hot/new ones versus which will simply get the job done.

Once we pin down what we would like workshop attendees to be able to do (and hopefully this matches what they wish to do as well), I think we will have an easier time elucidating the necessary skills and modifying episodes to make sure they help build those skills.

My comments and Satra's answers on lessons 1-3 (copy of the email conversation)

Lesson 1: Core concept
General:
- you're saying at the beginning that the lesson uses the Simple Workflow, but it's not clear to me how specific parts are related to the repo. I would expect much more guidance.

can you suggest what kind of guidance?

Element 1:
- are you planning to "convert" the JSON file (that is missing for now) to the two standards you mention? It would be very useful IMO.

we should revise this. NIDM-E is not set in stone, so if we release we should have some confirmation of a NIDM-E version.

Element 2:
- I don't understand the first sentence, "...when a different dataset containing the same data or a slightly different workflow is used."

I don't see this sentence here: http://www.reproducibleimaging.org/module-dataprocessing/02-concepts/

Lesson 2: Annotate..

General:

  • there are no "elements"

elements were conceptual pieces in the concepts lesson - we can refer back to those elements, but don't expect elements to be in everything (I think).

Links don't work:

http://data.wu.ac.at/csvengine/csvm
the second looks like a markdown typo.

Data and Metadata

  • I'm not sure what the main point is here: convincing people to submit data to archives? A few sentences of intro might be nice. When do you use NDA and when NeuroVault for sharing?

yes, an introductory statement would be good there.

Lesson 3: Create and...
General:
- there are no "elements"
- IMO, a general motivation for when to use a VM versus Docker or Singularity in scientific applications is missing. You have it in your presentation, but not here.
- as I mentioned, it might be useful to work on the final versions of the videos so they are easier to use.

we should update things here with respect to the presentation.

Docker
- I know you mention BIDS-Apps, but it could be useful to point to some Python or Nipype images, so people don't have to install everything to run a Nipype workflow.

sure - the current nipype dockerfile has most things necessary.
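
For instance, a minimal sketch of pulling the community nipype/nipype image from Docker Hub and checking that Nipype is importable inside it (the "latest" tag is an assumption; check Docker Hub for current tags):

```python
import subprocess

# Pull the community Nipype image and verify the installed Nipype version,
# without installing anything on the host
subprocess.run(["docker", "pull", "nipype/nipype:latest"], check=True)
subprocess.run([
    "docker", "run", "--rm", "nipype/nipype:latest",
    "python", "-c", "import nipype; print(nipype.__version__)",
], check=True)
```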

Singularity

  • I don't see any "smooth transition" to Vagrant. I understand that Vagrant is needed when one uses OSX, or when one has to create an image on a machine without sudo (since one is root inside the VM), but it's not obvious IMO.

we should point out a few more things here, but there are a few places in the web we can draw from.

Decide/advise on aggregating what was actually taught at the recent workshops...

... e.g., in http://www.repronim.org/sfn2018-training/, "Data Processing" was taught through a "complete workflow" based on heudiconv/reproin and FSL, via containers and datalad containers-run. I think it would be valuable to absorb those in some fashion within this module, since it is the only one dealing with actual data processing.

On the other hand, I think 04-containers would best be migrated to reproducibility basics.
