Data processing pipeline for the Longitudinal Early-onset Alzheimer's Disease Study (LEADS).
This repository contains the new codebase to process LEADS MRI and PET scans using an MRI-based processing pipeline that mimics the ADNI processing pipeline. The new codebase was written by D. Schonhaut in Spring 2024 as a full rewrite of the original LEADS processing code by L. Iaccarino in 2018.
The new codebase is designed to be faster, more readable, and more modular than the original codebase. With minimal changes, the new code should also be fully extesible to other projects that collect PET and MRI data and want to adopt an MRI-based PET processing pipeline with a similar directory structure and file naming conventions.
The points below encompass, to the best of our knowledge, all substantive differences between the new LEADS pipeline and the older processing code it replaces. Non-substantive differences refer to any changes that do not affect the underlying values of processed data (for example, changes to file naming convenctions and directory organization, or changes to the code itself that don't affect the end result for files saved).
- The original codebase used SPM ImCalc to perform arithmetic operations on PET images (e.g. create an SUVR by dividing PET voxels by a scalar value). However, the ImCalc code was programmed to cast output values to an inappropriate data type (int16 for continuous values) and reduced the decimal precision vs. the input values by 4-fold. In the new codebase, SPM ImCalc is no longer used (working with Matlab arrays is faster) but all NIfTI images that hold continuous values are stored at float32 precision.
- Corrected an error in the algorithm to calculate a global amyloid SUVR and corresponding Centiloid value using the ADNI longitudinal reference region. To obtain the reference region scaling factor, our original code calculated a weighted mean across all voxels in the eroded WM, whole cerebellum, and brainstem after combining these regions into a single mask. In fact, ADNI calculates the longitudinal reference region scaling factor by calculating the mean PET signal within the eroded WM, whole cerebellum, and brainste separately, then taking a straight average across these regions, unweighted by their respective volumes.
- To create the inferior cerebellar gray matter mask that is used as a reference region for tau-PET, the cerebellar SUIT atlas is warped into the subject's native MRI space. In the original LEADS code, warped SUIT atlas files were resliced with trilinear interpolation and then re-cast as integer values. This caused edge effects with non- sensical values at the borders between different cerebellar subregions (e.g. a left subregion label appearing in R hemisphere). The new code fixes this problem by using nearest-neighbor interpolation to reslice the SUIT atlas to native MRI space.
- Also regarding to the inferior cerebrellar reference region, we now use the SUIT toolbox in MATLAB to estimate a nonlinear transformation between each subject's cerebellum in native space and the 'SUIT space' template cerebellum. Previously we had used the iy_ file defined during SPM segmentation to reverse normalize an MNI space version of the SUIT template.
- The regularization parameters used for SPM segmentation in the original LEADS codebase differed slightly from the default regularization parameters used by SPM12. (original LEADS: matlabbatch{1}.spm.spatial.preproc.warp.reg = [0 0.001 0.5 0.025 0.1];) (default params: matlabbatch{1}.spm.spatial.preproc.warp.reg = [0 0.001 0.5 0.05 0.2];). We could not find a rationale for this change and did not observe a noticeable difference in segmentation results when testing segmentation with the altered versus default parameters. For simplicity, the new codebase reverts back to the SPM12 default.
- The original LEADS codebase saved PET images that were affine transformed to MNI space at 1mm^3 resolution, while PET images that were nonlinearly warped to MNI space were saved at 1.5mm^3. In the new LEADS codebase, affine and warped MNI space images are both saved at 1.5mm^3.
The new codebase is implemented in Matlab (version 2023b) and Python (version 3.11), with Matlab serving as the primary interface for end users. The user should not need to specify paths to any programs aside from Matlab, as all paths are specified by the code to ensure that the same versions of required software are used consistently.
A few additional notes about the new codebase:
- On the PETcore VM, the codebase lives in
/mnt/coredata/processing/leads/code
and is connected to its remote repository on GitHub. - The new code is 10-20x faster than the original codebase, mostly due to the use of multithreading throughout the processing pipeline. FreeSurfer still takes ~10 hr to run per scan, although up to 16 MRIs can be processed in parallel on the PETcore VM.
- A single Matlab script,
run_leads_mri_based_processing.m
, allows the user to run all parts of the processing pipeline while interacting with the program through simple, question and answer based prompts. - The processing pipeline is divided into three modules, one of which must be specified by
the user with each call to
run_leads_mri_based_processing.m
.
From an end user's perspective, new MRI and PET data will be downloaded from LONI and manually
uploaded as one or more zip files to /mnt/coredata/processing/leads/data/newdata
. The
run_leads_mri_based_processing.m
program then needs to be run four times in the following
order:
- Run the Setup Module (Option 1, "Setup scans for processing") to move scans from
.../leads/data/newdata
to.../leads/data/raw
and schedule new MRIs for processing (note: scans are selected but not yet processed). This module takes 2-3 min to run, and at the end the user is informed of how many scans have been scheduled for processing.- Optionally, the user can now run
run_leads_mri_based_processing.m
and choose Option 2 ("View scans that are scheduled for processing") to see a full list of scheduled scans.
- Optionally, the user can now run
- Run the MRI Module (Option 3, "Process scheduled MRIs") to process new MRIs through FreeSurfer and SPM-based pipelines.
- Rerun the Setup Module to schedule new PET scans for processing. This step is necessary
because PET scans will not be scheduled until the MRI that they will be coregistered to has
been processed.
- As before, running
run_leads_mri_based_processing.m
and choosing Option 2 ("View scans that are scheduled for processing") will print a full list of scheduled PET scans.
- As before, running
- Run the PET Module (Option 4, "Process scheduled PET scans") to process new PET scans.
Here is a somewhat more detailed account of what happens during each processing module.
- Main script:
setup_leads_processing.m
- Overview: Newly uploaded data are prepared for subsequent processing
- Zip files in
.../leads/data/newdata
are unzipped - All DICOMs in
.../leads/data/newdata
are converted to NIfTI usingdcm2niix
(version v1.0.20240202). - MRI and PET scan directories are moved from
.../leads/data/newdata
to.../leads/data/raw
if they don't already exist in.../leads/data/raw
- All scan directories in
.../leads/data/raw
are identified by a recursive search for *.nii files - Filenames are parsed to resolve:
- Subject ID
- Scan type (MRI or PET; and for PET, which tracer)
- Scan acquisition date
- LONI Image ID
- For PET scans, the spatial resolution
- Each PET scan is matched to the closest MRI scan
- Orphan MRIs (those not recently acquired and not closest to any PET scan) are identified, and under default settings will not be scheduled for processing
- PET scans are audited for potential issues that would prevent processing. These include:
- Inability to parse any of the scan information above
- PET not at the expected resolution (6mm by default)
- PET and MRI scans too far apart (>365 days, by default)
- Same MRI would be used for multiple PET scans of the same tracer (e.g. two FTP timepoints)
- A search is conducted to identify which MRIs in
.../leads/data/raw
have been processed by FreeSurfer, and which have completed post-FreeSurfer, SPM-based processing, respectively - A search is conducted to identify which PET scans in
.../leads/data/raw
have been fully processed - MRIs are scheduled for processing if they have not been fully processed, are not orphans, and (for follow-up MRIs) their corresponding baseline MRI has been processed
- PET scans are scheduled for processing if they have not been fully processed, are not flagged with an issue, and their corresponding MRI has been fully processed
- The user is informed of how many MRI and PET scans are in
.../leads/data/raw
, and how many of these scans have completed processing, been flagged with one or more issues, and been scheduled for processing, respectively - Two CSV files containing scan-level information are saved to 'scans_to_process': one for MRIs and one for PET scans. These files are read by downstream scripts in the processing pipeline. Manually editing these files will affect which scans the scripts attempt to process, though could also introduce errors, so edit the CSVs with caution and at your own risk.
- For scans that are scheduled to be processed, new directories are created in
.../leads/data/processed
to store the processed data. Symbolic links are made in these new scan directories that point back to their corresponding directories in.../leads/data/raw
. A symbolic link in each PET scan directory is also created that points to the directory of the MRI it will be coregistered to.
- Zip files in
- Main script:
process_mris.m
- Overview: MRI processing is divided into two submodules:
- T1 MRIs are processed through FreeSurfer
recon-all
andsegmentBS.sh
, which segments the brainstem into subregions. This is the most time consuming part of the processing pipeline, taking ~10 hours to run on a single MRI, and currently the PETcore VM is equipped to process only up to 16 scans in parallel. In contrast, the second most time consuming part of processing is SPM segmentation, which takes about 12 minutes per scan. - Post-FreeSurfer processing is carried out in
SPM12
. The steps in order are:- FreeSurfer files are converted from .mgz to .nii and copied from the FreeSurfer directory into the processed MRI directory
- Image origins of the nu.nii, aparc+aseg.nii, and brainstem_sublabels.nii files are reset to the center-of-mass of the nu.nii. Rigid body coregistration parameters are then estimated to bring the nu.nii as close as possible to the SPM12 OldNorm T1.nii template in MNI space, and these parameters are applied to the nu.nii, aparc+aseg.nii, and brainstem_sublabels.nii images.
- The nu.nii is coregistered to the baseline nu.nii (earliest processed MRI for a given subject), and the same transform is applied to the aparc+aseg.nii and brainstem_sublabels.nii images. This step is skipped for baseline MRIs.
- The Matlab SUIT toolbox is run to estimate a nonlinear fit between the native space nu.nii and the cerebellar SUIT template.
- Reference region and target ROI masks are saved in native MRI space.
- The nu.nii is segmented into tissue probability maps, and deformation fields that map the nonlinear transforms between native MRI and MNI space are saved (y_* and iy_* files).
- The nu.nii is warped to MNI space using the y_* deformation field
- The full (12-degree) affine transform between the native space nu.nii and the OldNorm T1.nii template in MNI space is estimated and saved.
- The affine transform from the previous step is applied to the native space nu.nii
- T1 MRIs are processed through FreeSurfer
- Main script:
process_pets.m
- Overview: PET processing must be performed after the MRI that PET will be coregistered to has
been processed, and the Setup module will not schedule PET scans to be processed until this
has happened. PET processing includes the following steps, in order:
- The "raw" PET scan is copied from
.../leads/data/raw
to.../leads/data/processed
- Note: In LEADS we download already preprocessed PET data, for which reconstructed PET frames have already been realigned and averaged into a single frame spanning a tracer-specific time window of interest. Preprocessed scans have also been resliced to the same voxel size, intensity normalized against mean signal in the whole cerebellum (not FreeSurfer-based), and smoothed to approximately the same resolution. This code will therefore need to be modified before it can be used to process raw PET data (reconstructed frames) rather than preprocessed PET data.
- The PET scan is coregistered and resliced to the nu.nii (at 1mm^3 voxel sizes)
- Reference region means are calculated, and voxelwise PET SUVR images are saved.
- Reference regions in the new pipeline are not hard-coded, but rather depend on a
ref_regions.csv
file that in the.../leads/code/config
directory. To process a new PET tracer or reference region, the user should need only to add a line to this file and -- if the reference region is not already saved in the processed MRI directory -- add a new script to save the desired reference region and call it fromsave_roi_masks.m
. If the PET tracer is not recognized, it may need to be added to a different config file,scan_types_and_tracers.csv
. - Following ADNI processing, the default reference regions saved in LEADS are:
- Amyloid-PET
- Whole cerebellum (cross-sectional reference region)
- Composite white matter, consisting of the unweighted average PET signal within the brainstem, eroded subcortical white matter, and whole cerebellum (longitudinal reference region)
- Tau-PET
- Inferior cerebellar gray matter (cross-sectional reference region)
- Eroded subcortical white matter (longitudinal reference region)
- FDG-PET
- Pons
- Amyloid-PET
- Reference regions in the new pipeline are not hard-coded, but rather depend on a
- Mean PET SUVRs and ROI volumes are extracted from FreeSurfer regions in native MRI space.
- The
fsroi_list_<TRACER>.csv
files in../leads/code/config
define which ROIs are saved for each PET tracer - For amyloid-PET only, the mean SUVR within the ADNI cortical summary region is calculated and converted to Centiloids using the appropriate equation from Royse et al., 2021
- PET SUVRs are warped from native MRI space to MNI space using the y_* deformation field obtained during segmentation of the nu.nii.
- PET SUVRs are linearly transformed from native MRI space to MNI space using the affine transform estimated on the nu.nii.
- The "raw" PET scan is copied from