
maddi's People

Contributors

michalg04


maddi's Issues

Couldn't create the file "overlap_test_set.csv"

To execute "preprocess_genetic/create_genetic_dataset.ipynb", we need the file "overlap_test_set.csv", which is created by running "preprocess_overlap/create_overlap_dataset.ipynb".

But to execute "preprocess_overlap/create_overlap_dataset.ipynb", we need the file "vcf_select.pkl", which is created by running "preprocess_genetic/create_genetic_dataset.ipynb".

In other words, the two notebooks each depend on the other's output, so neither can be run first.

Can you please clarify this?

Doubt regarding dataset

Hello,
I have been working through your paper and came across the file "preprocess_genetic/concat_vcfs.py", which requires a "DIAGNOSIS_TABLE" dataset on line 12. Can you please share details about that dataset and, if possible, the steps to obtain it?

Regarding the source of dataset

Hello,
In the file preprocess_genetic/concat_vcfs.py we are facing multiple errors due to an incompatible dataset. Can you please share the exact link to the dataset on the ADNI website, or, even better, provide a sample VCF file?

Genetic data processing

Hello author, I am working with the genetic data in filter_vcfs.py:
start = vcf_file.find("ADNI_ID.") + len("ADNI_ID.")
end = vcf_file.find("output.vcf")
For the genetic data in the ADNI dataset, do you use .plink files or .vcf files for genetic data processing? I can't find the string "ADNI_ID." in any filename. Did you name these files yourself? Thank you for your answer.
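
For reference, here is a minimal reproduction of the slicing I am asking about, using a hypothetical filename that contains both markers (my own invention, not necessarily the real ADNI naming):

```python
# Hypothetical filename containing both markers filter_vcfs.py searches for.
# If either "ADNI_ID." or "output.vcf" is absent from the real filenames,
# find() returns -1 and the slice below silently goes wrong.
vcf_file = "ADNI.808_indiv.minGQ_21.pass.ADNI_ID.chr3.output.vcf"

start = vcf_file.find("ADNI_ID.") + len("ADNI_ID.")
end = vcf_file.find("output.vcf")
chromosome = vcf_file[start:end].strip(".")
print(chromosome)  # chr3
```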

About image collections

I am already able to access the ADNI dataset, and I am curious how you chose the image collections, because following the paper I cannot obtain the corresponding results. Can you share your method of selecting the dataset, or add me as a collaborator on your ADNI project so that I can download your image collection?

Regarding ADNI dataset

I came across your paper and am planning to implement it. We have access to the ADNI dataset through our college, but I am unable to find the specific dataset used in the source code.
Can you share the exact dataset, or help me find it by pointing it out on ADNI with steps?
I would be extremely thankful if you could help me with this.

Error when running filter_vcfs.py

When I run filter_vcfs.py, the error "Error -3 while decompressing data: invalid distance too far back" appears. How can this be resolved? The data was downloaded from ADNI using "ADNI WGS Data - GATK SNV+Indel call, chr1 of 23", decompressed using the tar command, and then fed into filter_vcfs.py.
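
One thing worth checking first (my own suggestion, not from the repo): "Error -3: invalid distance too far back" is a zlib-level error that usually indicates a corrupted or truncated gzip stream, e.g. a partial download. A sketch that tests whether a .vcf.gz decompresses cleanly end to end (the function name is mine):

```python
import gzip
import zlib

def gzip_is_intact(path, chunk_size=1 << 20):
    """Stream-decompress the whole file; return False on any corruption,
    including the "invalid distance too far back" zlib error."""
    try:
        with gzip.open(path, "rb") as f:
            while f.read(chunk_size):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        return False
```

If this returns False for a file, re-downloading it from ADNI is probably the fix, rather than changing filter_vcfs.py.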

Hello, I would like to reproduce your work

It is a great honor to read your article on Multimodal Attention-based Deep Learning for Alzheimer's Disease Diagnosis; your clear and compact presentation inspired me a lot. However, in the notebook general/diagnosis_making.ipynb, in the line new = pd.DataFrame.from_dict(d, orient='index').reset_index(), 'd' does not seem to be defined. I would also like to know where the file all_img_try1_10_31_2021.csv came from; I could not find it in the ADNI database.
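
For context, here is a hypothetical example of the shape 'd' would need for that line to run (the subject IDs and labels are my own illustration, not from the notebook):

```python
import pandas as pd

# Hypothetical dict the missing cell presumably builds: subject ID -> fields.
d = {
    "011_S_0002": {"DX": "CN"},
    "011_S_0003": {"DX": "AD"},
}

# orient='index' makes each dict key a row label; reset_index() then moves
# the subject IDs into a regular column named "index".
new = pd.DataFrame.from_dict(d, orient="index").reset_index()
print(new.columns.tolist())  # ['index', 'DX']
```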

Thanks!

I can't find any key "DX" in DXSUM_PDXCONV_ADNIALL.csv

Hello,

Thank you for opening your great study!

I got a 'DX' KeyError while running /general/diagnosis_making.ipynb.
[screenshot of the KeyError omitted]

So I checked all the keys in DXSUM_PDXCONV_ADNIALL.csv and the other CSV files. However, the only diagnosis-related keys are DXCHANGE, DXCURREN, and various other columns starting with DX; there is no plain "DX" column.

Which column do you mean?
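
In case it helps others, my current workaround guess (unverified against the authors' intent) is to derive a DX column from DXCHANGE and DXCURREN using the numeric coding I found in the ADNI data dictionary; please double-check the coding against the DATADIC table for your own download:

```python
import pandas as pd

# Assumed coding: DXCURREN uses 1=CN, 2=MCI, 3=AD; DXCHANGE uses 1-9,
# where the *current* diagnosis is CN for 1/7/9, MCI for 2/4/8, AD for 3/5/6.
DXCHANGE_TO_DX = {1: "CN", 2: "MCI", 3: "AD", 4: "MCI", 5: "AD",
                  6: "AD", 7: "CN", 8: "MCI", 9: "CN"}
DXCURREN_TO_DX = {1: "CN", 2: "MCI", 3: "AD"}

def add_dx_column(dxsum: pd.DataFrame) -> pd.DataFrame:
    """Add a text DX column, preferring DXCHANGE, falling back to DXCURREN."""
    out = dxsum.copy()
    empty = pd.Series(index=out.index, dtype=float)
    dx = out.get("DXCHANGE", empty).map(DXCHANGE_TO_DX)
    out["DX"] = dx.fillna(out.get("DXCURREN", empty).map(DXCURREN_TO_DX))
    return out
```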

Request to share preprocessed data

Dear author,
Thank you for sharing your code openly; I've learned a lot from your article. However, due to data versioning issues, many of us have encountered significant obstacles when trying to replicate the experiments. The original data is protected for privacy reasons and cannot be shared publicly. Could I kindly request that you provide the preprocessed data so that I can train the models? I would greatly appreciate it.
My email: [email protected]

doubt regarding the dataset

Hi Michal! We are a group of students from Georgia Tech working on Alzheimer's disease detection. We came across your paper MADDi and are planning to implement it. We have gained access to the ADNI dataset, but we are unable to find the MRI images folder used in your source code preprocess_images.py or the metadata file (all_img_try1_10_31_2021.csv) associated with it. Can you point us to where we can find them? We would be extremely grateful for your help.

About Datasets

Hello, thank you for sharing the code. I would like to replicate your work, and I have obtained access permission for ADNI. However, I'm facing difficulties in selecting the data according to the descriptions in the paper. If possible, could you share the dataset you filtered from ADNI with me, or provide some guidance on how to select file names or table names on the official ADNI website? My email is [email protected]. Thanks again for your work.

Need clarification in code

In the file "preprocess_genetic/concat_vcfs.py", on line 25 you create a variable named "merged" and perform many operations on it, finally producing a pickle file on line 37. But that file is not used anywhere else in the project.
Can you please clarify the purpose of that file?


Facing an error due to the missing "index" column

Hello,
In the file "preprocess_genetic/concat_vcfs.py", at line 25 you merge two VCF files on the column named "index" and later rename it to "subject". But if we print the VCF file after line 24, we get something like this: https://drive.google.com/file/d/1JGesHpv4QQje1aRVrf9pUBmuTW2yrZAU/view?usp=sharing , which does not contain an actual subject ID under the "index" column.

We tried creating the subject column by inserting the actual subject IDs, but we then get errors when concatenating the multiple VCF files.

So can you please clarify how the "subject" column should be constructed, and how exactly multiple VCF files from different subjects can be combined into a single file, given that they don't share a common column name?
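
To make my question concrete, here is a toy sketch (entirely hypothetical data) of the only way I can see to combine per-chromosome tables whose columns are subjects: transpose so subjects become the row index, align on that, and then reset_index() to get the "index" column that concat_vcfs.py renames to "subject". Is this what the code intends?

```python
import pandas as pd

# Toy stand-ins for two per-chromosome genotype tables: after dropping the
# fixed VCF columns, each remaining column is one subject, each row a variant.
chr1 = pd.DataFrame({"011_S_0002": ["0/0", "0/1"], "011_S_0003": ["1/1", "0/0"]},
                    index=["chr1:123", "chr1:456"])
chr2 = pd.DataFrame({"011_S_0002": ["0/1"], "011_S_0003": ["0/0"]},
                    index=["chr2:789"])

# Transpose so subjects are rows, align the chromosomes on the subject index,
# then surface the subject IDs as an "index" column.
merged = pd.concat([chr1.T, chr2.T], axis=1).reset_index()
print(merged["index"].tolist())  # ['011_S_0002', '011_S_0003']
```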

No .pkl files generated from pre-processing code and unable to locate diagnosis table for genetic data on ADNI

I am facing the same issue as well and also have access to ADNI. Would anyone be able to help? @michalg04: if I have made any mistakes that caused the issues below, could you please point them out? These are my issues:

Genetic Data

  • I had to add the line if vcf_file.endswith(".gz"): inside the for vcf_file in files: loop of filter_vcfs.py to prevent .vcf.gz.tbi files from being processed, as they caused errors.

  • For filter_vcfs.py, it seems that only .pkl files and log.txt should be generated. However, after iterating through all the files (the ADNI WGS (GATK) data), not a single .pkl file was produced; the only output was log.txt, which contains boolean values (nearly all, if not all, False). Issue: no pickle files are generated, so I cannot feed this data into the downstream script concat_vcfs.py.

  • I am struggling to find the labels for the genetic data used in the MADDi study, i.e. for line 12 of concat_vcfs.py, diag = pd.read_csv("YOUR_PATH_TO_DIAGNOSIS_TABLE"), I am unable to locate the diagnosis table. Issue: unable to find the diagnosis table on the ADNI website.
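
The guard from the first bullet can be sketched as a standalone filter (the helper name is mine):

```python
def select_vcf_archives(files):
    """Keep only the compressed VCFs; the .vcf.gz.tbi index files end in
    ".tbi" and made filter_vcfs.py error out when processed as VCFs."""
    return [f for f in files if f.endswith(".gz")]

print(select_vcf_archives(
    ["ADNI_ID.chr3.vcf.gz", "ADNI_ID.chr3.vcf.gz.tbi", "log.txt"]))
# ['ADNI_ID.chr3.vcf.gz']
```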

Additional issues faced during genetic data pre-processing
For: ./ADNI.808_indiv.minGQ_21.pass.ADNI_ID.chr3.vcf.gz

CSV reading complete
vcf: <pandas.io.parsers.readers.TextFileReader object at 0x7fe95fb15790>
Traceback (most recent call last):
  File "/home/user/Alzheimers/genetic_data/filter_vcfs.py", line 100, in <module>
    main()
  File "/home/user/Alzheimers/genetic_data/filter_vcfs.py", line 61, in main
    vcf = pd.concat(vcf, ignore_index=True)
  File "/home/user/anaconda3/lib/python3.11/site-packages/pandas/util/_decorators.py", line 331, in wrapper
    return func(*args, **kwargs)
  File "/home/user/anaconda3/lib/python3.11/site-packages/pandas/core/reshape/concat.py", line 368, in concat
    op = _Concatenator(
  File "/home/user/anaconda3/lib/python3.11/site-packages/pandas/core/reshape/concat.py", line 422, in __init__
    objs = list(objs)
  File "/home/user/anaconda3/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 1698, in __next__
    return self.get_chunk()
  File "/home/user/anaconda3/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 1810, in get_chunk
    return self.read(nrows=size)
  File "/home/user/anaconda3/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 1778, in read
    ) = self._engine.read(  # type: ignore[attr-defined]
  File "/home/user/anaconda3/lib/python3.11/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 230, in read
    chunks = self._reader.read_low_memory(nrows)
  File "pandas/_libs/parsers.pyx", line 820, in pandas._libs.parsers.TextReader.read_low_memory
  File "pandas/_libs/parsers.pyx", line 866, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 852, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 1973, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 817 fields in line 1476784, saw 833
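
A note on the ParserError above (my own diagnosis, not confirmed): read_csv counts fields per line, so "Expected 817 fields ... saw 833" means the rows have inconsistent field counts. With VCF input this can happen when the "##" meta-information header lines are not skipped, or when the file itself is corrupted mid-stream. A sketch of reading VCF text while dropping the meta lines (the function name is mine):

```python
import io
import pandas as pd

def read_vcf(handle):
    """Read VCF text into pandas, skipping the "##" meta-information lines
    that otherwise throw off read_csv's per-line field count."""
    lines = [line for line in handle if not line.startswith("##")]
    return pd.read_csv(io.StringIO("".join(lines)), sep="\t")

vcf_text = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "1\t123\t.\tA\tG\t50\tPASS\t.\n"
)
df = read_vcf(io.StringIO(vcf_text))
print(df.shape)  # (1, 8)
```

If the meta lines are already skipped and the error still appears that deep into the file (line 1,476,784), the download itself may be damaged, and re-fetching it is worth trying.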
