data-pipeline's Issues

Enhance Data Pipeline Setup Guide

Update the documentation for the data pipeline to include detailed instructions on setting up, configuring, and running the DICOM file comparison script. This documentation update will provide clarity and guidance for users, facilitating easier adoption and usage of the pipeline.

Create File Renaming Mechanism According to Specified Structure

Implement a mechanism to rename DICOM files following the proposed structure after the anonymization process. This mechanism should parse necessary information either from the original filename or DICOM metadata.

The proposed structure is the following:

AnonymizedPatientID_YYYYMMDD_MODALITY_SEQUENCE_INSTANCE.dcm

Acceptance Criteria:

  • Successfully rename DICOM files according to the specified naming convention.
  • Extract required information from DICOM metadata if available.
  • Log any errors or files that cannot be renamed due to missing information.
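A minimal sketch of how the filename could be assembled from metadata is shown below. It is not the project's implementation. The mapping of each filename part to a DICOM keyword is an assumption (in particular, using SeriesDescription for SEQUENCE is a guess), and build_filename and _clean are hypothetical helper names. The function only relies on attribute access, so it is demonstrated here with a plain stand-in object rather than a real pydicom dataset.

```python
import logging
import re
from types import SimpleNamespace

# Assumed mapping of filename parts to DICOM keywords.
# SEQUENCE -> SeriesDescription is a guess, not something the issue specifies.
REQUIRED_KEYWORDS = (
    "PatientID",          # already anonymized at this stage
    "StudyDate",          # DICOM dates are stored as YYYYMMDD
    "Modality",
    "SeriesDescription",
    "InstanceNumber",
)


def _clean(value):
    """Drop every character that is not a letter or digit (hypothetical helper)."""
    return re.sub(r"[^A-Za-z0-9]+", "", str(value))


def build_filename(ds):
    """Return the new filename, or None (after logging) if a value is missing."""
    parts = []
    for keyword in REQUIRED_KEYWORDS:
        value = getattr(ds, keyword, None)
        if value is None or _clean(value) == "":
            logging.error("Cannot rename file: missing %s", keyword)
            return None
        parts.append(_clean(value))
    return "_".join(parts) + ".dcm"


# Stand-in for a dataset; any object exposing these attributes would do.
example = SimpleNamespace(
    PatientID="ANON-0001",
    StudyDate="20240415",
    Modality="MR",
    SeriesDescription="T1 FS",
    InstanceNumber=12,
)
print(build_filename(example))  # ANON0001_20240415_MR_T1FS_12.dcm
```

Stripping non-alphanumeric characters from each part keeps the underscore unambiguous as a separator, at the cost of losing punctuation from the original values.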

Anonymization Fails for DICOM Files Due to Missing Attributes

I encountered an issue while running the anonymization process for DICOM files. The process failed for some files due to missing attributes in the DICOM dataset. Specifically, the attribute PatientAge was not found in one of the DICOM files, leading to an error during anonymization.

Steps to Reproduce

  1. Run the anonymization process for a set of DICOM files.

  2. Encounter an error similar to the following:

Anonymization failed for /path/to/dicom/file: 'FileDataset' object has no attribute 'PatientAge'

Expected Behavior

The anonymization process should gracefully handle attributes that are missing from the DICOM dataset instead of raising an error. This applies to every DICOM tag the process touches, not only PatientAge.

Proposed Solution

Update the anonymization logic to check that each attribute exists before accessing it. Concretely, move the existence check ahead of any read or log statement, and apply the same pattern to every DICOM tag that is anonymized.

For instance, from:

# Anonymize Patient's Age
logging.info(f"Anonymizing Patient's Age: {getattr(ds, 'PatientAge', 'N/A')}")
if hasattr(ds, 'PatientAge'):
  ds.PatientAge = ""
  logging.info(f"PatientAge: {ds.PatientAge}")
else:
  logging.info("Attribute 'PatientAge' not found in DICOM file.")

To:

# Anonymize Patient's Age (the attribute is only read after the existence check)
if hasattr(ds, 'PatientAge'):
  logging.info(f"Anonymizing Patient's Age: {ds.PatientAge}")
  ds.PatientAge = ""
  logging.info(f"PatientAge: {ds.PatientAge}")
else:
  logging.info("Attribute 'PatientAge' not found in DICOM file.")
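Since the same check is wanted for every tag, the pattern could be factored into one loop instead of being repeated per attribute. The sketch below is one possible shape, not the project's code: blank_if_present and KEYWORDS_TO_BLANK are hypothetical names, and the keyword list is an illustrative subset. It uses only hasattr and setattr, the same operations as the snippet above, and is demonstrated with a plain stand-in object.

```python
import logging
from types import SimpleNamespace

# Illustrative subset only; the real list depends on the anonymization policy.
KEYWORDS_TO_BLANK = ("PatientName", "PatientAge", "PatientBirthDate")


def blank_if_present(ds, keywords=KEYWORDS_TO_BLANK):
    """Blank each attribute that exists and return the keywords that were missing."""
    missing = []
    for keyword in keywords:
        if hasattr(ds, keyword):
            logging.info("Anonymizing %s", keyword)
            setattr(ds, keyword, "")
        else:
            logging.info("Attribute %r not found in DICOM file.", keyword)
            missing.append(keyword)
    return missing


# Stand-in dataset without PatientAge, mirroring the failing case.
ds = SimpleNamespace(PatientName="Doe^Jane", PatientBirthDate="19700101")
missing = blank_if_present(ds)
print(missing)  # ['PatientAge']
```

Returning the missing keywords lets the caller decide whether an absent attribute is worth reporting per file or only in a summary.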

Additional Information

  • Operating System: macOS 14.4.1
  • Python Version: Python 3.9.7
  • Pydicom Version: Pydicom 2.3.0

Implement DICOM File Handling Improvements

Enhance the DICOM file comparison script by improving the handling of DICOM files. This includes implementing error handling mechanisms, validation checks, and robust file parsing to ensure reliable processing.
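One inexpensive validation check, sketched below under stated assumptions, is to look for the DICOM file signature before handing a file to the parser. Files in the DICOM Part 10 format start with a 128-byte preamble followed by the four bytes DICM. Some older files omit this header, so a failed check may be better treated as a warning than as proof that the file is invalid. has_dicom_preamble is a hypothetical helper name.

```python
import os
import tempfile


def has_dicom_preamble(path):
    """Return True when bytes 128-131 of the file are b'DICM'.

    Unreadable or missing files yield False instead of raising.
    """
    try:
        with open(path, "rb") as handle:
            handle.seek(128)
            return handle.read(4) == b"DICM"
    except OSError:
        return False


# Small demonstration with synthetic files.
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.dcm")
    bad = os.path.join(tmp, "bad.dcm")
    with open(good, "wb") as f:
        f.write(b"\x00" * 128 + b"DICM")
    with open(bad, "wb") as f:
        f.write(b"not a dicom file")
    results = (
        has_dicom_preamble(good),
        has_dicom_preamble(bad),
        has_dicom_preamble(os.path.join(tmp, "missing.dcm")),
    )
print(results)  # (True, False, False)
```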

Streamline Data Pipeline Deployment Process

Streamline the deployment process of the data pipeline by incorporating best practices and automation techniques. This optimization effort will simplify setup procedures, automate deployment tasks, and ensure compatibility with various environments.

Develop Data Transfer Script

Create a script to transfer the anonymized and renamed DICOM files to the dataset-multimodal-breast repository. This script should ensure that the data is organized correctly in the target repository.

Acceptance Criteria:

  • All anonymized DICOM files are copied to the target repository without data loss.
  • Files are organized in the target repository according to predefined directory structures.
  • Include error handling for transfer failures and logging for transfer activities.
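The criteria above could be approached along the following lines. This is a sketch only: the issue does not define the target directory structure, so mirroring the source layout is an assumption, and transfer_files is a hypothetical function name. A size comparison after each copy serves as a basic guard against truncated transfers; a checksum would be a stricter alternative.

```python
import logging
import shutil
import tempfile
from pathlib import Path


def transfer_files(source_dir, target_dir):
    """Copy every .dcm file under source_dir into target_dir, keeping relative paths.

    Returns two lists of relative paths: (copied, failed).
    """
    source_dir, target_dir = Path(source_dir), Path(target_dir)
    copied, failed = [], []
    for src in sorted(source_dir.rglob("*.dcm")):
        rel = src.relative_to(source_dir)
        dst = target_dir / rel
        try:
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)
            # Basic integrity check: the copy must have the same size.
            if dst.stat().st_size != src.stat().st_size:
                raise OSError("size mismatch after copy")
            copied.append(rel)
            logging.info("Copied %s", rel)
        except OSError as exc:
            failed.append(rel)
            logging.error("Transfer failed for %s: %s", rel, exc)
    return copied, failed


# Demonstration with a temporary directory tree.
with tempfile.TemporaryDirectory() as tmp:
    src_root = Path(tmp) / "anonymized"
    dst_root = Path(tmp) / "dataset"
    (src_root / "patient1").mkdir(parents=True)
    (src_root / "patient1" / "a.dcm").write_bytes(b"example")
    copied, failed = transfer_files(src_root, dst_root)
    copied_ok = (dst_root / "patient1" / "a.dcm").read_bytes() == b"example"
```

Collecting failures instead of stopping at the first error lets one run report every file that needs attention.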

Implement DICOM Anonymization

We need to develop a script that anonymizes DICOM images by removing or obfuscating Personally Identifiable Information (PII) contained within the DICOM file headers. The anonymization process should comply with HIPAA guidelines and other relevant standards for data privacy.

Acceptance Criteria:

  • Remove or obfuscate all PII from DICOM headers.
  • Ensure the anonymized DICOM files retain their integrity for medical research purposes.
  • Include unit tests to verify the anonymization process.
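For the unit-test criterion, one possible starting point is a check that reports which identifying attributes still carry a value after anonymization. The sketch below is illustrative and does not by itself establish HIPAA compliance: PII_KEYWORDS is a small assumed subset, remaining_pii is a hypothetical helper, and the tests use plain stand-in objects rather than real DICOM files.

```python
import unittest
from types import SimpleNamespace

# Assumed subset of identifying keywords; a real list would be much longer.
PII_KEYWORDS = ("PatientName", "PatientBirthDate", "PatientAge")


def remaining_pii(ds, keywords=PII_KEYWORDS):
    """Return the keywords whose values are still non-empty."""
    leftovers = []
    for keyword in keywords:
        value = getattr(ds, keyword, None)
        if value is not None and str(value) != "":
            leftovers.append(keyword)
    return leftovers


class RemainingPiiTest(unittest.TestCase):
    def test_blanked_dataset_reports_nothing(self):
        ds = SimpleNamespace(PatientName="", Modality="MG")
        self.assertEqual(remaining_pii(ds), [])

    def test_leftover_value_is_reported(self):
        ds = SimpleNamespace(PatientName="", PatientAge="045Y")
        self.assertEqual(remaining_pii(ds), ["PatientAge"])
```

Saved as a module, these tests can be run with python -m unittest.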

Document the Anonymization and Transfer Process

Provide comprehensive documentation for the anonymization and transfer process, including setup, execution instructions, and troubleshooting tips. This documentation will be essential for other researchers and contributors to understand and use the data-pipeline effectively.

Acceptance Criteria:

  • Documentation includes step-by-step setup and execution instructions.
  • Troubleshooting section for common issues.
  • Clear description of the anonymization process and data transfer mechanism.
