
pyhealth's Issues

The results of SafeDrug model differ significantly from those in the paper.

Hi! Sorry to bother you again.
I ran the code for GAMENet, SafeDrug and MoleRec locally. The results of the three models are as follows:
image
Here is the problem: the jaccard_samples of my local SafeDrug run only reaches about 0.33. In theory, the jaccard_samples of SafeDrug should be similar to that of GAMENet. Why is there such a big gap?
Additionally, why are the results obtained with pyhealth lower than in the paper? I note that the sample dataset contains 14,142 visits and 5,449 patients, which differs from the papers, where it contains 6,350 patients and 14,995 visits. Is this the reason?
image
Looking forward to your reply, and thank you!
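
For reference when comparing such numbers, here is a minimal sketch of a sample-averaged Jaccard score: the Jaccard index is computed per visit and then averaged over visits. This is an assumption about how a metric named jaccard_samples is commonly defined, not a copy of PyHealth's implementation. The function body, the handling of empty sets, and the toy drug codes are all hypothetical.

```python
def jaccard_samples(y_true, y_pred):
    """Average per-sample Jaccard index over parallel lists of label sets."""
    scores = []
    for true_set, pred_set in zip(y_true, y_pred):
        true_set, pred_set = set(true_set), set(pred_set)
        union = true_set | pred_set
        if not union:
            # Assumption: an empty-vs-empty sample counts as a perfect match.
            scores.append(1.0)
            continue
        scores.append(len(true_set & pred_set) / len(union))
    return sum(scores) / len(scores) if scores else 0.0


# Toy example with made-up drug codes for two visits.
y_true = [{"A01", "B02", "C03"}, {"A01"}]
y_pred = [{"A01", "B02"}, {"A01", "D04"}]
score = jaccard_samples(y_true, y_pred)
print(round(score, 4))
```

Because the score is averaged per visit, a different number of visits or patients after preprocessing can shift it even when the model is unchanged.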

ImportError: cannot import name 'MIMIC3BaseDataset' from 'pyhealth.datasets'

Hi,

When I was testing the example in https://pyhealth.readthedocs.io/en/latest/examples_bak.html#step-3-build-deep-learning-models, I found I could not load the dataset. Could you help look into this issue? Thank you!

> python test_retain.py
Traceback (most recent call last):
  File "test_retain.py", line 1, in <module>
    from pyhealth.datasets import MIMIC3BaseDataset
ImportError: cannot import name 'MIMIC3BaseDataset' from 'pyhealth.datasets' (/Users/anaconda3/envs/ehr/lib/python3.8/site-packages/pyhealth/datasets/__init__.py)

I also searched this repo for the class MIMIC3BaseDataset but could not find it. Any help would be appreciated!

drugrec for OMOP datasets doesn't work

from pyhealth.datasets import OMOPDataset
from pyhealth.tasks import drug_recommendation_omop_fn

omop_base = OMOPDataset(
    root="https://storage.googleapis.com/pyhealth/synpuf1k_omop_cdm_5.2.2",
    tables=["condition_occurrence", "procedure_occurrence"],
    code_mapping={},
)
omop_sample = omop_base.set_task(drug_recommendation_omop_fn)

pyhealth.calib.calibration.

Dear Sir/Madam,

I got an error when running the following commands:

from pyhealth.calib.calibration.hb import HistogramBinning
cal_model = HistogramBinning(model)
cal_model.calibrate(cal_dataset=val_dataset)

val_dataset is a torch Subset created from a BaseEHRDataset. Could you give some advice?
image
image

RxNorm codes hierarchy

Hi, I noticed something that might seem strange in the hierarchy of RxNorm (and perhaps other vocabularies).
For instance, the code 1000001 in RxNorm doesn't have any parents or children in the PyHealth hierarchy.

image
However, according to Athena, this code appears to have parents and children:
image

This isn't specific to this code alone; it applies to many others as well. I just used 1000001 as an example.
I would like to use PyHealth to get the RxNorm hierarchy.
Can you please check this? How can I get the RxNorm hierarchy correctly?
Thank you!
@pat-jj

Cannot download 'https://storage.googleapis.com/pyhealth/resource/NDC_to_ATC.csv'

When I initialize the MIMIC3Dataset() class, I get urllib.error.URLError. Checking the call stack, I found that the problem lies in the function download_and_read_csv() of the CrossMap class. I think it is caused by my own Internet connection, but I hope these files can be made available for manual download, with local file reading offered as an alternative to network download.
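
The requested behaviour could look roughly like the sketch below: read a local copy when one exists, and only fall back to the network otherwise. This is not PyHealth's download_and_read_csv(); the helper name read_csv_local_first, the cache-directory argument, and the column names in the demo row are hypothetical.

```python
import csv
import os
import tempfile
import urllib.request
from urllib.error import URLError


def read_csv_local_first(url, local_dir):
    """Read a CSV from local_dir if present; otherwise download it there first."""
    filename = os.path.basename(url)
    local_path = os.path.join(local_dir, filename)
    if not os.path.exists(local_path):
        os.makedirs(local_dir, exist_ok=True)
        try:
            urllib.request.urlretrieve(url, local_path)
        except URLError as err:
            raise RuntimeError(
                f"Could not download {url}; place {filename} in {local_dir} manually."
            ) from err
    with open(local_path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


# Demo without network access: a manually placed local copy is read directly.
with tempfile.TemporaryDirectory() as cache_dir:
    with open(os.path.join(cache_dir, "NDC_to_ATC.csv"), "w", encoding="utf-8") as f:
        f.write("NDC,ATC\n00000000000,A01A\n")  # made-up row for illustration
    rows = read_csv_local_first(
        "https://storage.googleapis.com/pyhealth/resource/NDC_to_ATC.csv", cache_dir
    )
print(rows)
```

With this pattern, a user behind a restricted connection can fetch the file once by other means and drop it into the cache directory.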

The results obtained with pyhealth are much lower than in the paper.

Thanks for your great work!

I wonder why the results reported on the PyHealth homepage are much lower than those reported in the SafeDrug paper. Also, according to the results reported by PyHealth, GAMENet performs better than SafeDrug, which contradicts the paper's results.
Screenshot 2023-08-23 19 33 22

Below are the results reported in the SafeDrug paper:

Screenshot 2023-08-23 19 46 27

Descriptive info on the example data needed

I found your PyHealth package a valuable resource. I am trying the test_sequence_data.ipynb notebook with the example dataset. The csv files in the /datasets/mimic/y_data/ folder are clear because their column names are self-explanatory, but the ones in the /datasets/mimic/x_data/ folder have no column names. I’ve read the readme files and the online documentation but couldn’t find anything. Can you help me with this?

BTW, it would help a lot if you could add a minimal description of the data, the data processing, or the training steps in the notebook, so that users don’t have to spend a lot of time hunting for the information.

How to generate x_data as the data in datasets?

Hi, thanks for your great work! I am trying to run your code using the mimic-iii-demo dataset. The problem is that I don't know how to generate x_data like the data in the datasets folder (mimic or cms). I followed your instructions but only got the y_data after running generate_mortality_prediction_mimic_demo.py. Is this because the data in the datasets folder uses the full set of mimic-iii variables, while only a few of them exist in the mimic-iii-demo data? Thanks!

Hello, a question about the "Pandarallel" package

When I run prediction tasks on the mimic-iv dataset, my code always ends up in a deadlock because of the size of mimic-iv, as in the earlier issue.

I would like to know the specific versions of 'pandas' and 'pandarallel' to use. Thanks!

Bug in SafeDrug

The program crashes when I run the SafeDrug for drug recommendation with the real MIMIC-III dataset. When I looked into the source code, I found the bug occurred in the function generate_molecule_info(), specifically, adjacency = Chem.GetAdjacencyMatrix(mol). https://github.com/sunlabuiuc/PyHealth/blob/5592d437abf6a06df7d41204cf56971f45e98a47/pyhealth/models/safedrug.py#L529C20-L529C20

I don't know what happened. Please provide some suggestions. Thank you.
Here is a failing case: smile = '[F-].[Na+]'

a question about ddi rate

Hi! Sorry to bother you again.

I added DDI rate as a metric and checked it, but the value was much smaller than it should be. Besides, it seems that the DDI matrix of PyHealth is different from those of SafeDrug and GAMENet. Could you please give me some help with this?

Question about MIMIC-iii dataset

Hi, I found that in the MoleRec paper, the processed mimic-iii dataset has 6,350 patients and 14,995 visits. However, I only got 5,449 patients and 14,141 visits when using PyHealth to process this dataset. Here is my screenshot.
image

wrong code in pyhealth\metrics\drug_recommendation.py

Hello, when you updated the code, you introduced an error in the DDI calculation; the original code was correct. It bothered me all morning. ^ ^

if ddi_matrix[i, j] == 1 or ddi_matrix[j, i] == 1: # wrong code
if ddi_matrix[med_i, med_j] == 1 or ddi_matrix[med_j, med_i] == 1: # Old and correct code
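
To make the difference concrete, here is a minimal, dependency-free sketch of a DDI rate computed over predicted drug sets. It is not the code in pyhealth/metrics/drug_recommendation.py; the function name, the nested-list matrix (instead of array indexing), and the toy data are assumptions for illustration.

```python
def ddi_rate(predicted_sets, ddi_matrix):
    """Fraction of co-predicted drug pairs that are flagged in ddi_matrix.

    predicted_sets: list of lists of drug indices, one list per visit.
    ddi_matrix: square 0/1 matrix (list of lists) indexed by drug index.
    """
    all_pairs = 0
    ddi_pairs = 0
    for meds in predicted_sets:
        for i, med_i in enumerate(meds):
            for med_j in meds[i + 1:]:
                all_pairs += 1
                # Index by the drug indices (med_i, med_j), not by the
                # positions (i, j) of the drugs inside the prediction list.
                if ddi_matrix[med_i][med_j] == 1 or ddi_matrix[med_j][med_i] == 1:
                    ddi_pairs += 1
    return ddi_pairs / all_pairs if all_pairs else 0.0


# Toy 4-drug matrix in which only drugs 2 and 3 interact.
ddi_matrix = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]
rate = ddi_rate([[2, 3], [0, 1]], ddi_matrix)
print(rate)
```

In this toy case, indexing by list position would look up entry (0, 1) for the first visit and miss the interaction between drugs 2 and 3, which is the kind of undercount the positional version can produce.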

Performance of SafeDrug and MoleRec

Hello, esteemed author. I noticed that when running the SafeDrug and MoleRec algorithms from your library, their performance falls far short of what is claimed in your papers and benchmarks. I would like to ask whether you used any specific parameters or techniques during testing. Thank you for your response.

question about eicu in drug recommendation task

Thank you so much for your work!
When I use eICU data for drug recommendation, I get the following error:
Key drugs has mixed nested list levels across samples.

Could you please tell me how to solve this problem?

Thanks in advance
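
The message suggests that the value stored under "drugs" is nested to different depths in different samples. A small diagnostic like the sketch below can locate the offending samples. It assumes samples are plain dicts of Python lists; the helper names and the toy samples are hypothetical and not part of PyHealth.

```python
def nesting_depth(value):
    """Depth of list nesting: 0 for a scalar, 1 for a flat list, and so on."""
    if not isinstance(value, list):
        return 0
    if not value:
        return 1
    return 1 + max(nesting_depth(item) for item in value)


def depths_for_key(samples, key):
    """Map each nesting depth seen under `key` to the indices of samples with it."""
    seen = {}
    for index, sample in enumerate(samples):
        seen.setdefault(nesting_depth(sample[key]), []).append(index)
    return seen


# Hypothetical samples: the second one wraps its drugs in an extra list.
samples = [
    {"drugs": ["A01", "B02"]},
    {"drugs": [["A01"], ["C03"]]},
    {"drugs": ["D04"]},
]
depths = depths_for_key(samples, "drugs")
print(depths)
```

If more than one depth is reported, the task function is producing inconsistent structures for that key and would need to be made uniform.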

a question about metrics

Is there a way to specify DDI rate as a metric in Trainer, and to output the final DDI rate in trainer.evaluate()?

pip install version

Hello, I recently installed PyHealth using the command `pip install pyhealth`, and it installed version 1.1.4. However, I noticed some discrepancies between this version and the latest code available on GitHub. For example, the multilabel_metrics_fn seems to be different.
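
When comparing against the GitHub code, it helps to confirm which release is actually installed. The sketch below uses the standard-library importlib.metadata (Python 3.8+); the helper name installed_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


print(installed_version("pyhealth"))
```

If the released version lags behind the repository, pip can also install directly from a Git URL (for example `pip install git+https://github.com/sunlabuiuc/PyHealth.git`), assuming the development branch is in an installable state.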

question about MIMIC-III in drug recommendation task

The MIMIC-III dataset used in many of the papers (e.g., SafeDrug, GAMENet, MoleRec) consists of 50,206 medical encounter records. After filtering out the patients with only one visit, it contains 14,995 visits and 6,350 patients. The code of drug_recommendation_mimic3_fn appears to define the same task as in the papers, but using "mimic3_ds = mimic3_ds.set_task(task_fn=drug_recommendation_mimic3_fn)" produces only 911 patients and 1,858 visits. Why is this?
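
The filtering step described above (dropping patients with only one visit) can be sketched as follows, which is handy for checking patient and visit counts at each stage. The function name and the toy cohort are hypothetical; this does not reproduce the preprocessing of the papers or of drug_recommendation_mimic3_fn.

```python
def filter_multi_visit(visits_by_patient, min_visits=2):
    """Keep only patients with at least `min_visits` visits."""
    return {
        patient_id: visits
        for patient_id, visits in visits_by_patient.items()
        if len(visits) >= min_visits
    }


# Hypothetical toy cohort: patient ID -> list of visit IDs.
cohort = {"p1": ["v1"], "p2": ["v2", "v3"], "p3": ["v4", "v5", "v6"]}
kept = filter_multi_visit(cohort)
num_patients = len(kept)
num_visits = sum(len(visits) for visits in kept.values())
print(num_patients, num_visits)
```

Counting before and after each exclusion rule makes it easier to see which rule accounts for a gap between the reported and the observed cohort sizes.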

Bug in GAMENet

Dear Sir/Madam,

When I run 'drug_recommendation_mimic4_gamenet.py' in tutorials, I get an error.

Epoch 0 / 20:   0%|                                                                                               | 0/2 [00:00<?, ?it/s]queries shape torch.Size([64, 10, 128])
prev_drugs shape torch.Size([64, 10, 147])
curr_drugs shape torch.Size([64, 147])
a_s shape torch.Size([64, 9])
DM_values shape torch.Size([64, 10, 147])
Epoch 0 / 20:   0%|                                                                                               | 0/2 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/featurize/PyHealth/examples/drug_recommendation_mimic4_gamenet.py", line 103, in <module>
    model, trainer = train_gamenet(data, train_loader, val_loader)
  File "/home/featurize/PyHealth/examples/drug_recommendation_mimic4_gamenet.py", line 77, in train_gamenet
    trainer.train(
  File "/home/featurize/work/py38/lib/python3.8/site-packages/pyhealth/trainer.py", line 195, in train
    output = self.model(**data)
  File "/home/featurize/work/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/featurize/work/py38/lib/python3.8/site-packages/pyhealth/models/gamenet.py", line 410, in forward
    loss, y_prob = self.gamenet(queries, prev_drugs, curr_drugs, mask)
  File "/home/featurize/work/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/featurize/work/py38/lib/python3.8/site-packages/pyhealth/models/gamenet.py", line 211, in forward
    a_m = torch.einsum("bv,bvz->bz", a_s, DM_values.float())
  File "/home/featurize/work/py38/lib/python3.8/site-packages/torch/functional.py", line 378, in einsum
    return _VF.einsum(equation, operands)  # type: ignore[attr-defined]
RuntimeError: einsum(): subscript v has size 10 for operand 1 which does not broadcast with previously seen size 9

It seems this can be fixed as shown in the following figure. Please have a look.
image
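
For readers without the figure: the traceback shows a_s with shape [64, 9] and DM_values with shape [64, 10, 147], while the contraction "bv,bvz->bz" needs both operands to agree on the size of v. The pure-Python sketch below spells out what that contraction computes and why it fails on a mismatch. It illustrates the error only and does not reproduce the fix from the figure; the function name and toy values are hypothetical.

```python
def contract_bv_bvz(a_s, dm_values):
    """Pure-Python equivalent of einsum("bv,bvz->bz", a_s, dm_values)."""
    out = []
    for weights, visit_rows in zip(a_s, dm_values):
        if len(weights) != len(visit_rows):
            raise ValueError(
                f"subscript v has size {len(visit_rows)} for operand 1 "
                f"but size {len(weights)} for operand 0"
            )
        width = len(visit_rows[0]) if visit_rows else 0
        # Weighted sum over the visit axis v for every output column z.
        out.append(
            [sum(w * row[z] for w, row in zip(weights, visit_rows)) for z in range(width)]
        )
    return out


# Matching sizes: batch of 1, v = 2 visits, z = 3 drugs.
ok = contract_bv_bvz([[0.5, 0.5]], [[[1, 0, 2], [3, 4, 0]]])
print(ok)

# Mismatched sizes: 2 weights but 3 visit rows.
try:
    contract_bv_bvz([[0.5, 0.5]], [[[1, 0, 2], [3, 4, 0], [0, 0, 1]]])
    mismatch_error = None
except ValueError as err:
    mismatch_error = str(err)
print(mismatch_error)
```

Any fix therefore has to make the attention weights and the stored drug history cover the same number of visits.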

Do models support text?

Dear Sir or Madam,

I have fed text reports as input features into the Transformer and it runs, but I don't know whether it is meaningful to do so. Can it learn from the text as language models do?

I followed case 2 for the Transformer model, where each code is a radiology report.

"case 2. [[code1, code2]] or [[code1, code2], [code3, code4, code5], …]"

KeyError: 'logit'

Dear Sir/Madam,

I got an error when I ran the following code (data is from Pipeline 5: Sleep Staging):

cal_model = HistogramBinning(model)

cal_model = KCal(model)

cal_model =TemperatureScaling(model)

cal_model.calibrate(cal_dataset=val_dataset)
from pyhealth.trainer import Trainer
print(Trainer(model=cal_model, metrics=['cwECEt_adapt', 'accuracy']).evaluate(test_loader))

image
image

Any advice? Thank you.

Entering deadlock when parsing prescriptions

I have tested some basic code from the tutorial with the MIMIC-4 dataset, but the process hung. I pressed Ctrl-C to exit the program, and it gave the following call stack. It seems that parallel_apply runs into a deadlock or something similar when parsing prescriptions.

reproducing code

import logging
from pyhealth.datasets import MIMIC4Dataset

logger = logging.getLogger("pyhealth")
logger.setLevel(logging.DEBUG)

dataset = MIMIC4Dataset(
    "/home/featurize/data/mimic-iv-2.2/hosp",
    tables=["diagnoses_icd", "procedures_icd", "prescriptions", "labevents"],
    code_mapping={"NDC": ("ATC", {"target_kwargs": {"level": 3}})},
)

Callstacks after ctrl-C

Loaded NDC->ATC mapping from /home/featurize/.cache/pyhealth/medcode/NDC_to_ATC.pkl                                                                                                                                
Loaded NDC code from /home/featurize/.cache/pyhealth/medcode/NDC.pkl                                     
Loaded ATC code from /home/featurize/.cache/pyhealth/medcode/ATC.pkl                                                                                                                                               
Processing MIMIC4Dataset base dataset...            
INFO: Pandarallel will run on 6 workers.                                                                 
INFO: Pandarallel will use Memory file system to transfer data between the main process and workers.                                                                                                               
finish basic patient information parsing : 80.05470561981201s                                            
finish parsing diagnoses_icd : 134.23406291007996s                                                       
finish parsing procedures_icd : 57.97325396537781s                                                       
                                                    
^CTraceback (most recent call last):                                                                                                                                                                               
  File "main.py", line 7, in <module>
    dataset = MIMIC4Dataset(                                                                             
  File "/home/featurize/work/PyHealth/pyhealth/datasets/base_ehr_dataset.py", line 130, in __init__
    patients = self.parse_tables()                                                                                                                                                                                 
  File "/home/featurize/work/PyHealth/pyhealth/datasets/base_ehr_dataset.py", line 190, in parse_tables
    patients = getattr(self, f"parse_{table.lower()}")(patients)                                                                                                                                                   
  File "/home/featurize/work/PyHealth/pyhealth/datasets/mimic4.py", line 307, in parse_prescriptions
    group_df = group_df.parallel_apply(                                                                                                                                                                            
  File "/environment/miniconda3/envs/py38/lib/python3.8/site-packages/pandarallel/core.py", line 307, in closure
Process ForkPoolWorker-28:                                                                               
Process ForkPoolWorker-31:        
Process ForkPoolWorker-30:                                                                                                                                                                                         
Process ForkPoolWorker-32:          
Process ForkPoolWorker-33:                                                                                                                                                                                         
Process ForkPoolWorker-29:                   
    message: Tuple[int, WorkerStatus, Any] = master_workers_queue.get()                                                                                                                                            
  File "<string>", line 2, in get   
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/managers.py", line 835, in _callmethod  
Traceback (most recent call last):
Traceback (most recent call last):                                                                                                                                                                                 
Traceback (most recent call last):  
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()                    
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap                                                                                                       
    self.run()                    
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()                               
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/process.py", line 108, in run                                                                                                              
    self._target(*self._args, **self._kwargs)
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/pool.py", line 114, in worker                                                                                                              
    task = get()                    
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/pool.py", line 114, in worker    
    task = get()                                                                                                                                                                                                   
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/pool.py", line 114, in worker            
    task = get()
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/queues.py", line 355, in get
    with self._rlock:
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/queues.py", line 356, in get
    res = self._reader.recv_bytes()
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/queues.py", line 355, in get
    with self._rlock:
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes
    buf = self._recv(4)
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
KeyboardInterrupt
KeyboardInterrupt
KeyboardInterrupt

Does the InnerMap contain all the ICD code IDs?

I use the following code to get all the icd tokens.

from pyhealth.datasets import MIMIC4Dataset
from pyhealth.tasks import drug_recommendation_mimic4_fn
from pyhealth.tokenizer import Tokenizer

base_dataset2 = MIMIC4Dataset(
    root="/home/czhaobo/KnowHealth/data/physionet.org/files/mimiciv/2.0/hosp",  # 2.2 does not work well
    tables=["diagnoses_icd", "procedures_icd", "prescriptions", "labevents"],
    code_mapping={"NDC": ("ATC", {"target_kwargs": {"level": 3}})},
    dev=False,
    refresh_cache=False,  # use True on the first run
)
sample_dataset2 = base_dataset2.set_task(drug_recommendation_mimic4_fn)
tokenizer2 = Tokenizer(
    tokens=sample_dataset2.get_all_tokens(key='conditions'),
    special_tokens=["<pad>", "<unk>"],
)
tokens2 = list(tokenizer2.vocabulary.idx2token.values())
print(tokens2)
diag_sys1, proc_sys1, med_sys1 = get_stand_system('MIMIC-III')
diag_sys2, proc_sys2, med_sys2 = get_stand_system('MIMIC-IV')

But when I try to look up their names via InnerMap.lookup, I always get a KeyError. For example, H4011X0 is an ID in tokens2:
```python
from pyhealth.medcode import InnerMap

if __name__ == "__main__":
    icd9cm = InnerMap.load("ICD9CM")
    icd10cm = InnerMap.load("ICD10CM")
    print(icd9cm.lookup('H4011X0'))
    print(icd10cm.lookup('H4011X0'))
```

sequential_drug_recommendation

Dear Sir/Madam,

In 'Advanced Case 2: Work on customized healthcare task', I have questions about 'sequential_drugs' and 'drugs' (see the following code). It seems 'sequential_drugs' is always empty. What is 'sequential_drugs[-1] = drugs' on the last line used for?

def sequential_drug_recommendation(patient):
    samples = []

    sequential_conditions = []
    sequential_procedures = []
    sequential_drugs = []  # not include the drugs now
    for visit in patient:

        # step 1: obtain feature information
        conditions = visit.get_code_list(table="DIAGNOSES_ICD")
        procedures = visit.get_code_list(table="PROCEDURES_ICD")
        drugs = visit.get_code_list(table="PRESCRIPTIONS")

        sequential_conditions.append(conditions)
        sequential_procedures.append(procedures)
        sequential_drugs.append([])

        # step 2: exclusion criteria: visits without drug
        if len(drugs) == 0:
            sequential_drugs[-1] = drugs
            continue

        # step 3: assemble the samples
        samples.append(
            {
                "visit_id": visit.visit_id,
                "patient_id": patient.patient_id,
                # the following keys can be the "feature_keys" or "label_key" for initializing downstream ML model
                "sequential_conditions": sequential_conditions.copy(),
                "sequential_procedures": sequential_procedures.copy(),
                "sequential_drugs": sequential_drugs.copy(),
                "label": drugs,
            }
        )
        sequential_drugs[-1] = drugs

    return samples
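
To see what the bookkeeping around sequential_drugs does, the sketch below replays only that part of the loop on plain lists of drug codes. It reflects one reading of the code above, not an official explanation; the function name trace_sequential_drugs and the toy visits are hypothetical.

```python
def trace_sequential_drugs(visit_drugs):
    """Replay the sequential_drugs bookkeeping on plain lists, one per visit."""
    samples = []
    sequential_drugs = []
    for drugs in visit_drugs:
        # Reserve an empty slot so the current visit's drugs are hidden from the input.
        sequential_drugs.append([])
        if len(drugs) == 0:
            continue
        samples.append({"sequential_drugs": sequential_drugs.copy(), "label": drugs})
        # Reveal this visit's drugs so that later visits see them as history.
        sequential_drugs[-1] = drugs
    return samples


samples = trace_sequential_drugs([["A"], ["B", "C"], ["D"]])
for sample in samples:
    print(sample)
```

Under this reading, only the slot for the current visit is empty in each sample, and the final assignment fills it in so that the next visit sees the earlier prescriptions as history.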

HALO and EHR synthetic task

I found that HALO is combined in the branch 'main Halo 2'. Will it be added to the main branch along with the EHR synthetic task? #278

Use Pretrained Model

Hi,
How can I train starting from a pre-trained model?

For example, instead of using:

from pyhealth.models import Transformer, RNN, RETAIN

model = Transformer(
    dataset=mimic3_task_ds,
    # look up what are available for "feature_keys" and "label_keys" in dataset.samples[0]
    feature_keys=["conditions", "procedures", "drugs"],
    label_key="label",
    mode="binary",
)

I would like to use a pre-trained Transformer or pre-trained BERT model instead.
Is it possible?
@pat-jj
