datalad-handbook / course
Talks and materials for workshops based on the DataLad handbook
License: Other
@mih and I will be giving a workshop on DataLad in Lucca on March 23rd-24th. This issue lists the TODOs and acts as a progress tracker.
Please extend and edit as necessary. :)
Logistics
Software
Teaching
A Basics layout has been proposed by @mih and awaits feedback from Lucca
Datalad concepts and principles
Basics of local data/code version control
Modular data management for reproducible science
Data management for collaborative science
Data publication
Outlook (what else is possible, resources, use cases)
Potential group work: small sets of people are given problems to solve with DataLad and present their solutions
It is currently structured as follows:
Monday 23 Morning session
1 Datalad concepts and principles
2 Basics of local data/code version control + Hands on: tasks to exercise basic building blocks
Monday 23 Afternoon session
1 Modular data management for reproducible science + Hands on: implement a sketch of a reproducible paper
2 Data management for collaborative science + Hands on: use your infrastructure (Gdrive) to collaborate on a demo project
Tuesday 24 Morning session
1 Data publication + Hands on: Publish data on "GitHub"
2 Outlook (what else is possible, resources, use cases)
Resources to create
I got interested in sandwhich03.svg, which comes from the pics/slides submodule, but that submodule has an SSH URL:
(git)lena:~datalad/datalad-handbook/course[master]git
$> datalad -f json_pp subdatasets pics/slides
{
"action": "subdataset",
"gitmodule_name": "pics/slides",
"gitmodule_url": "kumo.ovgu.de:/home/mih/public_html/datalad/slides",
"gitshasum": "76882e01a9194444b507491889e7d9f6d6dcb6b2",
"parentds": "/home/yoh/proj/datalad/datalad-handbook/course",
"path": "/home/yoh/proj/datalad/datalad-handbook/course/pics/slides",
"refds": "/home/yoh/proj/datalad/datalad-handbook/course",
"state": "absent",
"status": "ok",
"type": "dataset"
}
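The problem above is that `gitmodule_url` uses Git's scp-like SSH syntax (`host:path`), which typically requires SSH access to that host. A minimal sketch, assuming a record with the shape printed above, of how such URLs could be flagged automatically; the `is_ssh_url` helper and its regular expression are hypothetical heuristics, not part of DataLad.

```python
import json
import re

# A subset of the record printed by `datalad -f json_pp subdatasets pics/slides` above.
RECORD = """
{
  "gitmodule_name": "pics/slides",
  "gitmodule_url": "kumo.ovgu.de:/home/mih/public_html/datalad/slides",
  "state": "absent",
  "type": "dataset"
}
"""

# Hypothetical heuristic for Git's scp-like syntax: [user@]host:path with no
# "/" before the first ":" (Git treats strings with such a "/" as local paths).
SCP_LIKE = re.compile(r"^(?:[\w.-]+@)?[\w.-]+:")


def is_ssh_url(url: str) -> bool:
    """Return True if Git would most likely fetch this URL over SSH."""
    if url.startswith("ssh://"):
        return True
    if "://" in url:
        return False  # http(s)://, git://, file://, ... are not SSH
    return bool(SCP_LIKE.match(url))


record = json.loads(RECORD)
if is_ssh_url(record["gitmodule_url"]):
    print(f"{record['gitmodule_name']}: SSH-only URL {record['gitmodule_url']}")
```

Running this over every record of `datalad -f json subdatasets` output would list all subdatasets that others cannot clone anonymously.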
When: November 26th, 2019, 4pm
Where: Same library seminar room as before
Duration: 2 hours
Participants: 25 grad students with various backgrounds (neuroscience, psych, bio, physics, engineering, medicine); the workshop will be made compulsory
Communicated expectations on content:
Date: Jan 22nd 2020
Tentative schedule:
ReproNim: Data Versioning and Transformation with DataLad
Instructor: Adina Wagner*, Institute of Neuroscience and Medicine (INM-7)
Why Should Data Be Versioned?
Simple DataLad Transform: Retrieve, Compute, Store Results
Create a Dataset
Using DataLad with Containers on the Dataset
Rerunning and Checking Analysis Differences
Submission due: Dec. 15th
Todos:
This tool is very useful:
It can be embedded with an <iframe></iframe> tag:
<iframe src="https://directpoll.com/r?XDbzPBdJ2bAX0ZEC2YlWLumm6WtYBkChGSFh5Vwe4W" title="This is my poll" width="900" height="900"></iframe>
... instead of executing everything on my machine.
datalad run nano
The DebConf talk proposal was accepted.
Here is the abstract:
Title: DataLad - Decentralized Management of Digital Objects for Open Science
With a general awareness of a reproducibility crisis in many scientific areas and increasing importance of research data management in science and policy making, data-driven fields require convenient and scalable data management solutions. Standing on the shoulders of Git and git-annex (git-annex.branchable.com/, Joey Hess), DataLad provides a decentralized solution that enables the joint management of code, data, and complete containerized computational environments in a scalable and distributed fashion. With features such as unambiguous version control, a wide spectrum of data transport mechanisms, convenient provenance capture, and re-execution for verification or as an alternative to storage and transport, it enables and facilitates many aspects of open and reproducible science: collaboration, sharing, analytical transparency, computational reproducibility of digital research objects, and disk-space aware storage and computing workflows on infrastructure that ranges from personal laptops up to supercomputers.
In this talk, we will introduce DataLad, present its main features which should be of interest to the audience regardless of their relation to any field of science, and share the process and status of its adoption in the neuroimaging community.
Recording tips: https://debconf-video-team.pages.debian.net/docs/advice_for_recording.html
The goal is to develop a course based on the book while minimizing the amount of disconnected material, thereby making it easier to evolve the book and the course together as DataLad evolves.
The course and the book share exactly the same content, but the former is performed, while the latter serves as the syllabus.
Code examples in the book are actually executable. We use this feature to turn them into "cast" scripts. Once in that form, we can use the cast_live tools from DataLad to demo them in a course installment.
Each code example in the book needs to be equipped with a "caption" that can serve as a narrative cue in the cast script. The caption could then also be displayed in the book itself.
Each code example in the book needs a tag or label that can be used to select subsets of examples that make up a shorter but still internally consistent narrative; this aids the generation of shorter course installments.
Initially, the slides of the course material are based on the "summary" components of each chapter, plus relevant key figures. Once tailored to and validated by teaching the course, their content is fed back into the book (possibly using a new dedicated markup). Each slide contains a link to the respective part of the book, where more details are available. The link is possibly implemented as a QR code.
The order of topics in the course matches the order in the book. If it turns out that this order is suboptimal, it needs to be adjusted in both book and course. Consequently, the course starts with the basics and a uniform narrative, and ends with more standalone scenario descriptions.
The course starts with, or follows, a "pitch" that outlines an attractive take-away for the respective target audience. Any "use case" chapter is a candidate pitch.
Slide decks for course installments are based on reveal.js and are more or less fully generated from the book sources and a (set of) template(s). Each chapter has its own slide deck.
Analogous to the book, each session/chapter (in particular the early ones) must communicate in a self-evident fashion why its content/objective is important and how it applies to practical problems a target audience can relate to.
Each of these "basics" chapters is handled in a 90-minute installment.
After the initial sessions on the "basics", a number of use case descriptions can follow.
For the initial run at INM7, we will have a dedicated "How to work with the local infrastructure" session that could take place any time after (3). This will then also turn into a use case chapter in the book.
Instead of at a weekly or biweekly frequency, this course can also be taught as a 2-day block event, with the basics on day 1 and a re-cap + use cases on a (shorter) day 2.
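A minimal sketch of the caption-and-tag scheme described above, assuming each book example can be represented as a record with a caption, a command, and a set of tags, and assuming a cast script consists of `say "text"` narration lines and `run "command"` lines. That line format is an assumption about what the cast_live tools consume; the `Example` class, the `quote` and `to_cast` helpers, and the `short` tag are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Example:
    """One executable code example from the book (hypothetical representation)."""
    caption: str  # narrative cue; could also be displayed in the book
    command: str  # the shell command the book executes
    tags: set[str] = field(default_factory=set)  # labels for shorter narratives


def quote(text: str) -> str:
    """Double-quote text for bash, escaping the characters bash treats specially there."""
    for ch in ("\\", '"', "$", "`"):  # backslash first, so later escapes are not doubled
        text = text.replace(ch, "\\" + ch)
    return f'"{text}"'


def to_cast(examples: list[Example], tag: str | None = None) -> str:
    """Render examples (optionally only those carrying `tag`) as a cast script, in book order."""
    lines: list[str] = []
    for ex in examples:
        if tag is not None and tag not in ex.tags:
            continue  # not part of this shorter course installment
        lines.append(f"say {quote(ex.caption)}")
        lines.append(f"run {quote(ex.command)}")
    return "\n".join(lines) + "\n"


book = [
    Example("Create a new dataset", "datalad create -c text2git DataLad-101", {"basics", "short"}),
    Example("Inspect its history", "git log --oneline", {"basics"}),
    Example("Save a new notes file", 'datalad save -m "add notes"', {"basics", "short"}),
]
print(to_cast(book, tag="short"), end="")
```

Filtering by tag while keeping the book's order means a shorter installment still follows the same sequence of topics as the book, which is what the shared-order requirement asks for.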
This will be cool!
Registering this as a talk/workshop TODO. Info is in https://github.com/adswa/mpi-datamanagement-ws/.
Takes place November 18th, full day.
abstract:
With a growing awareness of the role of sample size and replicable results (Button et al., 2013; Turner et al., 2018), a rise of platforms, tools, and standards that aim to facilitate data sharing and management (Wiener et al., 2016), unprecedented sample sizes (e.g., UKBiobank; Bzdok & Yeo, 2017), and increasingly complex data analyses (e.g., Glasser et al., 2013; Alfaro-Almagro et al., 2018), research data management (RDM) is essential to put open and FAIR neuroimaging research into effect. But just as FAIRness and RDM cannot be an afterthought in any given scientific project, they also shouldn’t be an afterthought in the training and education of current and future generations of neuroscientists. This training has to fulfill the demands of different stakeholders in science: 1) researchers who apply RDM in their scientific projects, 2) PIs and similar personnel with management tasks who need to set out and justify plans for the implementation of RDM and FAIR principles, and 3) trainers, such as librarians or data managers, who educate users on tools and practices for FAIR science (Fothergill et al., 2019; Grisham et al., 2016). Researchers of any career level and any background need accessible, tutorial-like educational content and documentation for relevant tools and concepts to apply FAIR RDM from the get-go. Planners need high-level, non-technical information in order to make informed yet efficient decisions on whether a tool fulfils their needs. And trainers need reliable, open teaching material.
A user-driven alternative to scientific software documentation by software developers, “Documentation Crowdsourcing”, has been successfully employed by the NumPy project (Oliphant, 2006; Pawlik et al., 2015). Extending this concept beyond documentation, we have created the DataLad handbook (handbook.datalad.org) as a free & open-source, user-driven and -focused educational instrument and resource for trainers, users, and planners for (research) data management, independent of their background and skill level (Wagner et al., 2020). Drawing from the experience of creating more than 400 pages of educational material, with almost 40 independent contributors from around the world, and nearly 2 years of in-person and virtual teaching based on the handbook, I want to highlight the unique challenges of RDM training as well as its opportunities for the field of neuroscience.
I was just trying to get a glimpse of https://github.com/datalad-handbook/course/blob/0b26cb6ac9a5d6c2d5bd5473a92d0284d959ec79/talks/hhu.html, but it seems that most of the figures, e.g. the one referenced in talks/hhu.html as <img height="850" class="fragment fade-in" src="../pics/ukb_datasets.svg">, are nowhere to be found.
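To list which figures a talk references but a checkout lacks, here is a minimal sketch using only the Python standard library: it collects every `<img src=...>` in an HTML file and tests whether each local path exists relative to that file. The names `ImgSrcCollector` and `missing_images` are hypothetical, and the sketch ignores images referenced from CSS or JavaScript.

```python
from __future__ import annotations

from html.parser import HTMLParser
from pathlib import Path


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.sources: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing <img .../> via handle_startendtag.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def missing_images(html_file: Path) -> list[str]:
    """Return local image paths referenced by html_file that do not exist on disk."""
    parser = ImgSrcCollector()
    parser.feed(html_file.read_text(encoding="utf-8"))
    missing = []
    for src in parser.sources:
        if "://" in src or src.startswith("data:"):
            continue  # remote or inline images cannot be checked locally
        if not (html_file.parent / src).exists():
            missing.append(src)
    return missing
```

Called from the repository root as `missing_images(Path("talks/hhu.html"))`, it would report entries such as `../pics/ukb_datasets.svg` if they are absent. In a DataLad dataset, annexed files whose content has not been retrieved are typically broken symlinks, and an uninstalled subdataset has no files at all; `Path.exists()` reports both cases as missing.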
This is to document how to turn the handbook into cast_live scripts.
Use cast_live to "play" it.
TODO:
Get the cast_live tools to run without an obscure failure (XGetWindowProperty[_NET_WM_DESKTOP] failed (code=1))
xdotool windowactivate --sync $(xdotool getwindowfocus)
For a symposium "Open and Reproducible Neuroimaging: Integration of community developed tools from data acquisition to publication". Michael and I will each have a 15-minute slot to talk about data storage and retrieval.