sunlabuiuc / pyhealth

A Deep Learning Python Toolkit for Healthcare Applications.

Home Page: https://pyhealth.readthedocs.io

License: MIT License

Python 94.58% Jupyter Notebook 3.57% Cython 1.80% Dockerfile 0.06%

healthcare data-mining deep-learning preprocessing clinical-data clinical-research electronic-medical-record medical-code electronic-health-record

pyhealth's Issues

Getting error while loading OMOP dataset

While loading the OMOP dataset I am getting the following error.

The results of SafeDrug model differ significantly from those in the paper.

Hi! Sorry for bothering you again.
I ran the code for GAMENet, SafeDrug and MoleRec locally. The results of the three models are as follows:

Here is the problem: the jaccard_samples of my local SafeDrug run only reaches about 0.33. In theory, SafeDrug's jaccard_samples should be similar to GAMENet's. Why is there such a big gap?
Additionally, why are the results obtained with PyHealth lower than those in the paper? I note that the sample dataset contains 14,142 visits and 5,449 patients, whereas the papers use 6,350 patients and 14,995 visits. Is that the reason?

Looking forward to your reply, and thank you!
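
For reference when comparing numbers like these, here is a minimal sketch of a sample-averaged Jaccard score. It assumes that jaccard_samples means the per-visit Jaccard index averaged over visits; that is an assumption about the metric's definition, not PyHealth's actual implementation. The function name and the toy drug codes are illustrative only.

```python
def jaccard_samples(y_true, y_pred):
    """Average the per-sample Jaccard index |T & P| / |T | P| over all samples."""
    scores = []
    for true_set, pred_set in zip(y_true, y_pred):
        union = true_set | pred_set
        if not union:
            # Both sets are empty: skip the sample instead of dividing by zero
            # (a design choice of this sketch).
            continue
        scores.append(len(true_set & pred_set) / len(union))
    return sum(scores) / len(scores) if scores else 0.0


# Toy data: two visits, each with a set of true and predicted drug codes.
y_true = [{"A01", "B02", "C03"}, {"C03"}]
y_pred = [{"A01", "B02"}, {"C03"}]
score = jaccard_samples(y_true, y_pred)
print(round(score, 4))
```

Because the score is averaged per visit, a different number of visits after preprocessing changes the population being averaged over, which is one possible source of differences between runs.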

API doc link seems to be broken

https://pyhealth.readthedocs.io/en/latest/pyhealth.html

ImportError: cannot import name 'MIMIC3BaseDataset' from 'pyhealth.datasets'

Hi,

When I was testing the example in https://pyhealth.readthedocs.io/en/latest/examples_bak.html#step-3-build-deep-learning-models, I found I could not load the dataset. Could you help look into this issue? Thank you!

> python test_retain.py
Traceback (most recent call last):
  File "test_retain.py", line 1, in <module>
    from pyhealth.datasets import MIMIC3BaseDataset
ImportError: cannot import name 'MIMIC3BaseDataset' from 'pyhealth.datasets' (/Users/anaconda3/envs/ehr/lib/python3.8/site-packages/pyhealth/datasets/__init__.py)

I also searched this repo for the class MIMIC3BaseDataset but could not find it. Any help would be appreciated!
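
One way to narrow down an ImportError like this is to list what the installed module actually exports. The sketch below is only an illustration: the helper name available_names is hypothetical, and it is demonstrated on the standard-library json module so that it runs anywhere. For this issue one would pass "pyhealth.datasets" and "MIMIC3BaseDataset" instead.

```python
import importlib


def available_names(module_name, wanted):
    """Return (found, public_names) for the installed version of a module."""
    module = importlib.import_module(module_name)
    public_names = sorted(n for n in dir(module) if not n.startswith("_"))
    return hasattr(module, wanted), public_names


# Demonstration on a standard-library module.
found, names = available_names("json", "JSONDecoder")
missing, _ = available_names("json", "MIMIC3BaseDataset")
print(found, missing)
print(names)
```

If the wanted name is absent from the list, the documentation page and the installed release are probably out of sync.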

Add contributor list (tmp2)

@all-contributors
please add @ycq091044 for code.
please add @zzachw for code.
please add @pat-jj for code.
please add @zlin7 for code.
please add @v1xerunt for code.
please add @BPDanek for code.
please add @solarsys for code.

drugrec for OMOP datasets doesn't work

from pyhealth.datasets import OMOPDataset
omop_base = OMOPDataset(
    root="https://storage.googleapis.com/pyhealth/synpuf1k_omop_cdm_5.2.2",
    tables=["condition_occurrence", "procedure_occurrence"],
    code_mapping={},
)
from pyhealth.tasks import drug_recommendation_omop_fn
omop_sample = omop_base.set_task(drug_recommendation_omop_fn)

pyhealth.calib.calibration.

Dear Sir/Madam,

I got an error in the following commands:

from pyhealth.calib.calibration.hb import HistogramBinning
cal_model = HistogramBinning(model)
cal_model.calibrate(cal_dataset=val_dataset)

val_dataset is a torch Subset of a BaseEHRDataset. Could you give some advice?

How-to-contribute section is still missing

https://github.com/yzhao062/PyHealth#how-to-contribute

too keen on contributing 😂

RxNorm codes hierarchy

Hi, I noticed something that might seem strange in the hierarchy of RxNorm (and perhaps other vocabularies).
For instance, the code 1000001 in RxNorm doesn't have any parents or children in the PyHealth hierarchy.

However, according to Athena, this code appears to have both parents and children:

This isn't specific to this code alone; it applies to many others as well. I just used 1000001 as an example.
I would like to use PyHealth to get the hierarchy of RxNorm.
Can you please check this? How can I get the hierarchy of RxNorm correctly?
Thank you!
@pat-jj

Cannot download 'https://storage.googleapis.com/pyhealth/resource/NDC_to_ATC.csv'

When initializing the MIMIC3Dataset() class, I get a urllib.error.URLError. Checking the call stack, I found that the problem lies in the download_and_read_csv() function of the CrossMap class. I think it is caused by my own Internet connection, but I hope these files can be made available for local download, so that reading a local file can serve as an alternative to downloading over the network.
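
The request above amounts to a "local file first, network second" loader. A minimal sketch of that idea follows. The function read_csv_local_first is hypothetical and is not PyHealth's download_and_read_csv(); the CSV column names and values in the demonstration are placeholders, not the real contents of NDC_to_ATC.csv.

```python
import csv
import tempfile
import urllib.error
import urllib.request
from pathlib import Path


def read_csv_local_first(url, local_dir):
    """Read the CSV from local_dir if it is already there; otherwise download it first."""
    local_dir = Path(local_dir)
    local_dir.mkdir(parents=True, exist_ok=True)
    local_path = local_dir / url.rsplit("/", 1)[-1]
    if not local_path.exists():
        try:
            urllib.request.urlretrieve(url, local_path)
        except urllib.error.URLError as err:
            # Tell the user where a manually downloaded copy should go.
            raise FileNotFoundError(
                f"Could not download {url}; place the file manually at {local_path}"
            ) from err
    with open(local_path, newline="") as handle:
        return list(csv.DictReader(handle))


URL = "https://storage.googleapis.com/pyhealth/resource/NDC_to_ATC.csv"
with tempfile.TemporaryDirectory() as tmp:
    # Simulate a manually downloaded file (placeholder columns and values).
    (Path(tmp) / "NDC_to_ATC.csv").write_text("NDC,ATC\n00000000001,A01A\n")
    rows = read_csv_local_first(URL, tmp)  # no network access is needed here
print(rows)
```

With this design, a user behind a restrictive network only has to copy the file into the cache directory once.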

The results obtained with pyhealth are much lower than in the paper.

Thanks for your great work!

But I wonder why the results reported on the PyHealth homepage are much lower than those reported in the SafeDrug paper. Also, according to the results reported by PyHealth, GAMENet performs better than SafeDrug, which contradicts the paper's results.

Below are the results reported in the SafeDrug paper,

How to use any of the models for MIMIC-III code prediction?

I am trying the MIMIC-III code prediction task. How can I feed in the data, pretrained embeddings, vocabulary, etc.? Also, how do I use the models after that? A simple minimal example would be helpful.

Thank you!

Descriptive info on the example data needed

I found your PyHealth package a valuable resource. I am trying the test_sequence_data.ipynb notebook with the example dataset. The CSV files in the /datasets/mimic/y_data/ folder are clear because their column names are self-explanatory, but the ones in the /datasets/mimic/x_data/ folder have no column names. I have read the README files and the online documentation but couldn't find anything. Can you help me with this?

BTW, it would help a lot if you could add a minimal description of the data, data processing, and training steps in the notebook, so that users don't have to spend a lot of time searching for the information elsewhere.

How to generate x_data as the data in datasets?

Hi, thanks for your great work! I am trying to run your code using the mimic-iii-demo dataset. The problem is that I don't know how to generate x_data like the data in the datasets folder (mimic or cms). I followed your instructions but only got the y_data after running generate_mortality_prediction_mimic_demo.py. Is this because the data in the datasets folder uses the full set of MIMIC-III variables, while only a few of them exist in the mimic-iii-demo data? Thanks!

Add contributor list (tmp3)

@all-contributors please add @ycq091044 for code

Hello, a question about the repository "Pandarallel"

When I run prediction tasks on the MIMIC-IV dataset, my code always ends up in a deadlock because of the size of the dataset, as in the earlier issue.

I would like to know the specific versions of 'pandas' and 'pandarallel' to use. Thanks!
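
When comparing environments for a problem like this, it helps to report the exact installed versions. The sketch below uses only the standard library (importlib.metadata, Python 3.8+); the helper name installed_versions is illustrative.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or 'not installed'."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


report = installed_versions(["pandas", "pandarallel", "pyhealth"])
for name, version in report.items():
    print(f"{name}=={version}")
```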

Bug in SafeDrug

The program crashes when I run SafeDrug for drug recommendation with the real MIMIC-III dataset. Looking into the source code, I found that the bug occurs in the function generate_molecule_info(), specifically at adjacency = Chem.GetAdjacencyMatrix(mol). https://github.com/sunlabuiuc/PyHealth/blob/5592d437abf6a06df7d41204cf56971f45e98a47/pyhealth/models/safedrug.py#L529C20-L529C20

I don't know what happened. Please provide some suggestions. Thank you.
I provide a bug case here: smile = '[F-].[Na+]'

a question about ddi rate

Hi! Sorry for bothering you again.

I added DDI rate as a metric and checked it, but the value was much smaller than it should be. Besides, it seems that the DDI matrix of PyHealth is different from those of SafeDrug and GAMENet. Could you please help me with this?

DDI of eICU dataset and omop dataset

Hi. Can we use PyHealth to calculate the DDI rate on these two datasets?

Question about MIMIC-iii dataset

Hi, I found that in the MoleRec paper, the processed MIMIC-III dataset has 6,350 patients and 14,995 visits. However, I only got 5,449 patients and 14,141 visits when using PyHealth to process this dataset. Here is my screenshot.

wrong code in pyhealth\metrics\drug_recommendation.py

Hello, when the code was updated, the DDI calculation was changed incorrectly; the original code was correct. It bothered me all morning. ^ ^

if ddi_matrix[i, j] == 1 or ddi_matrix[j, i] == 1: # wrong code
if ddi_matrix[med_i, med_j] == 1 or ddi_matrix[med_j, med_i] == 1: # Old and correct code
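
To make the difference concrete, here is a small self-contained sketch. It assumes the DDI rate is the fraction of predicted drug pairs that interact, and it uses a nested list in place of the NumPy array (with an array the lookup would read ddi_matrix[med_i, med_j]). The function name, the toy matrix, and the predictions are illustrative, not PyHealth's code.

```python
def ddi_rate(predicted_sets, ddi_matrix):
    """Fraction of predicted drug pairs that interact according to ddi_matrix."""
    all_pairs = 0
    ddi_pairs = 0
    for med_set in predicted_sets:
        for i, med_i in enumerate(med_set):
            for med_j in med_set[i + 1:]:
                all_pairs += 1
                # Look up by drug index (med_i, med_j), not by loop position (i, j).
                if ddi_matrix[med_i][med_j] == 1 or ddi_matrix[med_j][med_i] == 1:
                    ddi_pairs += 1
    return ddi_pairs / all_pairs if all_pairs else 0.0


# Toy 4-drug matrix in which only drugs 2 and 3 interact.
ddi_matrix = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]
predicted = [[2, 3], [0, 1]]  # two visits, each a list of predicted drug indices
rate = ddi_rate(predicted, ddi_matrix)
print(rate)
```

In this toy case, indexing by loop position would always read entry (0, 1) of the matrix and miss the interaction between drugs 2 and 3, which is consistent with a DDI rate that looks too small.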

Performance of SafeDrug and MoleRec

Hello, esteemed author. I noticed that when running the SafeDrug and MoleRec algorithms from your library, their performance falls far short of what is claimed in your papers and benchmarks. I would like to ask whether you used any specific parameters or techniques during testing. Thank you for your response.

question about eicu in drug recommendation task

Thank you so much for your work!
When I use eICU data for drug recommendation, I get the following error:
Key drugs has mixed nested list levels across samples.

could you please tell me how to solve this problem?

Thanks in advance
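
A quick way to locate the offending samples for an error like this is to measure how deeply each sample's list is nested. The sketch below is a hypothetical diagnostic, not PyHealth's own validation; in particular, treating an empty list as depth 1 is a choice of this sketch.

```python
def nesting_depth(value):
    """0 for a scalar, 1 for a flat list, 2 for a list of lists, and so on."""
    if not isinstance(value, list):
        return 0
    if not value:
        return 1  # an empty list is counted as a flat list here
    return 1 + max(nesting_depth(item) for item in value)


def depths_by_sample(samples, key):
    """Map each nesting depth found under `key` to the indices of the samples that have it."""
    found = {}
    for idx, sample in enumerate(samples):
        found.setdefault(nesting_depth(sample[key]), []).append(idx)
    return found


# Toy samples: the second one nests its drugs one level deeper than the others.
samples = [
    {"drugs": ["A01", "B02"]},
    {"drugs": [["A01"], ["B02", "C03"]]},
    {"drugs": []},
]
depths = depths_by_sample(samples, "drugs")
print(depths)
```

More than one key in the result means the samples disagree on nesting, and the listed indices show which samples to inspect.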

no such thing SampleDataset

PyHealth version 1.1.4
There is no SampleDataset in this version.
I think either the documentation or the code should be updated.

a question about metrics

Is there a way to parameterize DDI rate as metrics in Trainer? And output the final DDI rate in trainer.evaluate().

pip install version

Hello, I recently installed PyHealth using the command `pip install pyhealth`, and it installed version 1.1.4. However, I noticed some discrepancies between this version and the latest code available on GitHub. For example, the multilabel_metrics_fn seems to be different.

question about MIMIC-III in drug recommendation task

The MIMIC-III dataset used in many of the papers (e.g., SafeDrug, GAMENet, MoleRec) consists of 50,206 medical encounter records. After filtering out patients with only one visit, it contains 14,995 visits and 6,350 patients. The code of drug_recommendation_mimic3_fn appears to implement the same task as in the papers, but using "mimic3_ds = mimic3_ds.set_task(task_fn=drug_recommendation_mimic3_fn)" produces only 911 patients and 1,858 visits. Why is this?
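
The filtering step described above (dropping patients with only one visit) can be sketched as follows. This is a toy illustration with made-up patient and visit IDs. It is not the preprocessing used in the papers or in drug_recommendation_mimic3_fn, either of which may apply further exclusion criteria.

```python
def keep_multi_visit_patients(visits_by_patient, min_visits=2):
    """Drop patients with fewer than min_visits visits and count what remains."""
    kept = {
        patient_id: visits
        for patient_id, visits in visits_by_patient.items()
        if len(visits) >= min_visits
    }
    n_visits = sum(len(visits) for visits in kept.values())
    return kept, len(kept), n_visits


# Toy cohort: p2 has a single visit and is filtered out.
toy = {"p1": ["v1", "v2", "v3"], "p2": ["v4"], "p3": ["v5", "v6"]}
kept, n_patients, n_visits = keep_multi_visit_patients(toy)
print(n_patients, n_visits)
```

Printing the patient and visit counts after each filtering step in this way makes it easier to see which step accounts for a gap between two pipelines.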

Bug in GAMENet

Dear Sir/Madam,

When I run 'drug_recommendation_mimic4_gamenet.py' in tutorials, I get an error.

Epoch 0 / 20:   0%|          | 0/2 [00:00<?, ?it/s]
queries shape torch.Size([64, 10, 128])
prev_drugs shape torch.Size([64, 10, 147])
curr_drugs shape torch.Size([64, 147])
a_s shape torch.Size([64, 9])
DM_values shape torch.Size([64, 10, 147])
Epoch 0 / 20:   0%|          | 0/2 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/featurize/PyHealth/examples/drug_recommendation_mimic4_gamenet.py", line 103, in <module>
    model, trainer = train_gamenet(data, train_loader, val_loader)
  File "/home/featurize/PyHealth/examples/drug_recommendation_mimic4_gamenet.py", line 77, in train_gamenet
    trainer.train(
  File "/home/featurize/work/py38/lib/python3.8/site-packages/pyhealth/trainer.py", line 195, in train
    output = self.model(**data)
  File "/home/featurize/work/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/featurize/work/py38/lib/python3.8/site-packages/pyhealth/models/gamenet.py", line 410, in forward
    loss, y_prob = self.gamenet(queries, prev_drugs, curr_drugs, mask)
  File "/home/featurize/work/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/featurize/work/py38/lib/python3.8/site-packages/pyhealth/models/gamenet.py", line 211, in forward
    a_m = torch.einsum("bv,bvz->bz", a_s, DM_values.float())
  File "/home/featurize/work/py38/lib/python3.8/site-packages/torch/functional.py", line 378, in einsum
    return _VF.einsum(equation, operands)  # type: ignore[attr-defined]
RuntimeError: einsum(): subscript v has size 10 for operand 1 which does not broadcast with previously seen size 9

It seems it can be addressed like in the following figure. Please have a look.
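
The RuntimeError above can be reproduced on paper from the printed shapes alone. The sketch below is a hypothetical pure-Python checker, not part of PyTorch. It walks the input subscripts of an einsum equation and reports any subscript whose sizes cannot broadcast; it does not handle ellipses or spaces in the equation.

```python
def einsum_size_conflicts(equation, *shapes):
    """List (subscript, operand_index, new_size, seen_size) for sizes that cannot broadcast."""
    subscripts = equation.split("->")[0].split(",")
    if len(subscripts) != len(shapes):
        raise ValueError("one shape is required per operand")
    seen = {}
    conflicts = []
    for operand_index, (letters, shape) in enumerate(zip(subscripts, shapes)):
        if len(letters) != len(shape):
            raise ValueError(
                f"operand {operand_index} has {len(shape)} dims, expected {len(letters)}"
            )
        for letter, size in zip(letters, shape):
            previous = seen.get(letter)
            if previous is None or previous == 1:
                seen[letter] = size  # first sighting, or widening a size-1 dimension
            elif size != previous and size != 1:
                conflicts.append((letter, operand_index, size, previous))
    return conflicts


# Shapes taken from the log above: a_s is [64, 9], DM_values is [64, 10, 147].
conflicts = einsum_size_conflicts("bv,bvz->bz", (64, 9), (64, 10, 147))
print(conflicts)
```

The reported conflict matches the error message: subscript v has size 10 in operand 1 but size 9 in a_s, so the two tensors disagree on the number of visits being attended over.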

The Demo for LSTM on Phenotyping Prediction with GPU

cur_dataset = expdata_generator(exp_id=exp_id)
should change to
cur_dataset = expdata_generator(expdata_id=expdata_id)
Thanks~

Do models support text?

Dear Sir or Madam,

I have fed text reports as input features to the Transformer, and it runs. But I don't know whether it is meaningful to do so. Can it learn from the text as language models do?

I follow case 2 for the Transformer model, where each code is a radiology report.

"case 2. [[code1, code2]] or [[code1, code2], [code3, code4, code5], …]"

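To illustrate the concern in this question, the sketch below builds a vocabulary the way a code tokenizer typically would, by exact string match. That is an assumption about how codes are mapped to embeddings, not a statement about PyHealth's internals; the helper build_vocab and the two toy reports are made up.

```python
def build_vocab(visits):
    """Assign one integer ID per distinct token, after two special tokens."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for visit in visits:
        for code in visit:
            vocab.setdefault(code, len(vocab))
    return vocab


# Two nearly identical reports, each used as a single "code" (as in case 2).
reports = [
    ["no acute cardiopulmonary process"],
    ["no acute cardiopulmonary abnormality"],
]
report_vocab = build_vocab(reports)

# The same reports split into words before building the vocabulary.
word_vocab = build_vocab([report[0].split() for report in reports])

print(len(report_vocab), len(word_vocab))
```

Under this assumption, whole reports become unrelated IDs even when they share most of their words, so the model can only learn from reports that repeat verbatim. Splitting into words (or using a text encoder) exposes the shared content.
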
KeyError: 'logit'

Dear Sir/Madam,

I got an error when I ran the following code (data is from Pipeline 5: Sleep Staging):

cal_model = HistogramBinning(model)

cal_model = KCal(model)

cal_model = TemperatureScaling(model)

cal_model.calibrate(cal_dataset=val_dataset)
from pyhealth.trainer import Trainer
print(Trainer(model=cal_model, metrics=['cwECEt_adapt', 'accuracy']).evaluate(test_loader))

Any advice? Thank you.

Entering deadlock when parsing prescriptions

I have tested some basic code from the tutorial with the MIMIC-IV dataset, but the process hung. I pressed Ctrl-C to exit the program, and it printed the following call stack. It seems that parallel_apply runs into a deadlock (or something similar) when parsing prescriptions.

reproducing code

import logging
from pyhealth.datasets import MIMIC4Dataset

logger = logging.getLogger("pyhealth")
logger.setLevel(logging.DEBUG)

dataset = MIMIC4Dataset(
    "/home/featurize/data/mimic-iv-2.2/hosp",
    tables=["diagnoses_icd", "procedures_icd", "prescriptions", "labevents"],
    code_mapping={"NDC": ("ATC", {"target_kwargs": {"level": 3}})},
)

Callstacks after ctrl-C

Loaded NDC->ATC mapping from /home/featurize/.cache/pyhealth/medcode/NDC_to_ATC.pkl                                                                                                                                
Loaded NDC code from /home/featurize/.cache/pyhealth/medcode/NDC.pkl                                     
Loaded ATC code from /home/featurize/.cache/pyhealth/medcode/ATC.pkl                                                                                                                                               
Processing MIMIC4Dataset base dataset...            
INFO: Pandarallel will run on 6 workers.                                                                 
INFO: Pandarallel will use Memory file system to transfer data between the main process and workers.                                                                                                               
finish basic patient information parsing : 80.05470561981201s                                            
finish parsing diagnoses_icd : 134.23406291007996s                                                       
finish parsing procedures_icd : 57.97325396537781s                                                       
                                                    
^CTraceback (most recent call last):                                                                                                                                                                               
  File "main.py", line 7, in <module>
    dataset = MIMIC4Dataset(                                                                             
  File "/home/featurize/work/PyHealth/pyhealth/datasets/base_ehr_dataset.py", line 130, in __init__
    patients = self.parse_tables()                                                                                                                                                                                 
  File "/home/featurize/work/PyHealth/pyhealth/datasets/base_ehr_dataset.py", line 190, in parse_tables
    patients = getattr(self, f"parse_{table.lower()}")(patients)                                                                                                                                                   
  File "/home/featurize/work/PyHealth/pyhealth/datasets/mimic4.py", line 307, in parse_prescriptions
    group_df = group_df.parallel_apply(                                                                                                                                                                            
  File "/environment/miniconda3/envs/py38/lib/python3.8/site-packages/pandarallel/core.py", line 307, in closure
Process ForkPoolWorker-28:                                                                               
Process ForkPoolWorker-31:        
Process ForkPoolWorker-30:                                                                                                                                                                                         
Process ForkPoolWorker-32:          
Process ForkPoolWorker-33:                                                                                                                                                                                         
Process ForkPoolWorker-29:                   
    message: Tuple[int, WorkerStatus, Any] = master_workers_queue.get()                                                                                                                                            
  File "<string>", line 2, in get   
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/managers.py", line 835, in _callmethod  
Traceback (most recent call last):
Traceback (most recent call last):                                                                                                                                                                                 
Traceback (most recent call last):  
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()                    
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap                                                                                                       
    self.run()                    
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()                               
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/process.py", line 108, in run                                                                                                              
    self._target(*self._args, **self._kwargs)
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/pool.py", line 114, in worker                                                                                                              
    task = get()                    
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/pool.py", line 114, in worker    
    task = get()                                                                                                                                                                                                   
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/pool.py", line 114, in worker            
    task = get()
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/queues.py", line 355, in get
    with self._rlock:
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/queues.py", line 356, in get
    res = self._reader.recv_bytes()
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/queues.py", line 355, in get
    with self._rlock:
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes
    buf = self._recv(4)
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
KeyboardInterrupt
KeyboardInterrupt
KeyboardInterrupt

Does the InnerMap contain all ICD code IDs?

I use the following code to get all the ICD tokens.

from pyhealth.datasets import MIMIC4Dataset
from pyhealth.tasks import drug_recommendation_mimic4_fn
from pyhealth.tokenizer import Tokenizer

base_dataset2 = MIMIC4Dataset(
    root="/home/czhaobo/KnowHealth/data/physionet.org/files/mimiciv/2.0/hosp",  # 2.2 does not work well
    tables=["diagnoses_icd", "procedures_icd", "prescriptions", "labevents"],
    code_mapping={"NDC": ("ATC", {"target_kwargs": {"level": 3}})},
    dev=False,
    refresh_cache=False,  # use True on the first run
)
sample_dataset2 = base_dataset2.set_task(drug_recommendation_mimic4_fn)
tokenizer2 = Tokenizer(
    tokens=sample_dataset2.get_all_tokens(key='conditions'),
    special_tokens=["<pad>", "<unk>"],
)
tokens2 = list(tokenizer2.vocabulary.idx2token.values())
print(tokens2)
diag_sys1, proc_sys1, med_sys1 = get_stand_system('MIMIC-III')
diag_sys2, proc_sys2, med_sys2 = get_stand_system('MIMIC-IV')

But when I try to look up their names via InnerMap.lookup, I always get a KeyError. For example, H4011X0 is an ID in tokens2:
```python
from pyhealth.medcode import InnerMap

if __name__ == "__main__":
    icd9cm = InnerMap.load("ICD9CM")
    icd10cm = InnerMap.load("ICD10CM")
    print(icd9cm.lookup('H4011X0'))
    print(icd10cm.lookup('H4011X0'))
```

sequential_drug_recommendation

Dear sir/Madam,

In 'Advanced Case 2: Work on customized healthcare task', I have questions about 'sequential_drugs' and 'drugs' (see the following code). It seems 'sequential_drugs' is always empty. What is 'sequential_drugs[-1] = drugs' in the final line used for?

def sequential_drug_recommendation(patient):
    samples = []

    sequential_conditions = []
    sequential_procedures = []
    sequential_drugs = []  # does not include the current visit's drugs yet
    for visit in patient:

        # step 1: obtain feature information
        conditions = visit.get_code_list(table="DIAGNOSES_ICD")
        procedures = visit.get_code_list(table="PROCEDURES_ICD")
        drugs = visit.get_code_list(table="PRESCRIPTIONS")

        sequential_conditions.append(conditions)
        sequential_procedures.append(procedures)
        sequential_drugs.append([])  # placeholder: the current drugs are the label

        # step 2: exclusion criteria: visits without drug
        if len(drugs) == 0:
            sequential_drugs[-1] = drugs
            continue

        # step 3: assemble the samples
        samples.append(
            {
                "visit_id": visit.visit_id,
                "patient_id": patient.patient_id,
                # the following keys can be the "feature_keys" or "label_key" for initializing downstream ML model
                "sequential_conditions": sequential_conditions.copy(),
                "sequential_procedures": sequential_procedures.copy(),
                "sequential_drugs": sequential_drugs.copy(),
                "label": drugs,
            }
        )
        # reveal the current drugs to the history seen by later visits
        sequential_drugs[-1] = drugs

    return samples
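
The role of the final assignment is easier to see in a stripped-down trace. The sketch below keeps only the sequential_drugs bookkeeping from the function above and runs it on plain lists. The helper trace_sequential_drugs and the toy drug names are illustrative and not part of the tutorial.

```python
def trace_sequential_drugs(visit_drugs):
    """Return, for each kept visit, the drug history the sample sees and its label."""
    sequential_drugs = []
    seen_by_sample = []
    for drugs in visit_drugs:
        sequential_drugs.append([])  # hide the current visit's drugs: they are the label
        if len(drugs) == 0:
            continue
        # Copy the outer list so the later update does not alter this snapshot.
        seen_by_sample.append((list(sequential_drugs), drugs))
        sequential_drugs[-1] = drugs  # reveal the drugs to the next visit's history
    return seen_by_sample


history = trace_sequential_drugs([["A"], ["B", "C"], ["D"]])
for seen, label in history:
    print(seen, "->", label)
```

In this trace, sequential_drugs is not always empty: only its last entry is empty at the moment a sample is taken, so the model sees the drugs of earlier visits but not the ones it has to predict. The final assignment is what fills in that last entry for the following visit.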

HALO and EHR synthetic task

I found that HALO is combined in the branch 'main Halo 2'. Will it be added to the main branch along with the EHR synthetic task? #278

Use Pretrained Model

Hi,
How can I start training from a pre-trained model?

For example, instead of using:

from pyhealth.models import Transformer, RNN, RETAIN

model = Transformer(
    dataset=mimic3_task_ds,
    # look up what is available for "feature_keys" and "label_key" in dataset.samples[0]
    feature_keys=["conditions", "procedures", "drugs"],
    label_key="label",
    mode="binary",
)

I would like to use a pre-trained Transformer or a pre-trained BERT model instead.
Is it possible?
@pat-jj
