sunlabuiuc / pyhealth
A Deep Learning Python Toolkit for Healthcare Applications.
Home Page: https://pyhealth.readthedocs.io
License: MIT License
Hi! sorry for bothering you again.
I ran the code for GAMENet, SafeDrug and MoleRec locally. The results of the three models are as follows:
Here is the problem: the jaccard_samples of my local SafeDrug run only reaches about 0.33. In theory, the jaccard_samples of SafeDrug should be similar to that of GAMENet. Why is there such a big gap?
Additionally, why are the results obtained with PyHealth lower than those in the paper? I note that the sample dataset contains 14,142 visits and 5,449 patients, which differs from the papers' 6,350 patients and 14,995 visits. Is this the reason?
Looking forward to your reply, and thank you!
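For readers comparing numbers across runs, here is a minimal sketch of a sample-averaged Jaccard score, assuming that jaccard_samples means the Jaccard index computed per visit and then averaged over visits. The function name and the toy label sets are illustrative and are not taken from PyHealth.

```python
from typing import List, Set


def jaccard_samples(y_true: List[Set[str]], y_pred: List[Set[str]]) -> float:
    """Mean per-sample Jaccard index between true and predicted label sets."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same number of samples")
    scores = []
    for true_set, pred_set in zip(y_true, y_pred):
        union = true_set | pred_set
        # Assumed convention: a sample with two empty sets scores 0.
        scores.append(len(true_set & pred_set) / len(union) if union else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Toy example: two visits with made-up drug codes.
true_visits = [{"A01", "B02"}, {"C03"}]
pred_visits = [{"A01"}, {"C03", "D04"}]
score = jaccard_samples(true_visits, pred_visits)
print(score)
```

Because the score is averaged per visit, a different visit count (for example 14,142 versus 14,995) changes the population being averaged, so scores from differently filtered datasets are not directly comparable.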
Hi,
When I was testing the example in https://pyhealth.readthedocs.io/en/latest/examples_bak.html#step-3-build-deep-learning-models, I found I could not load the dataset. Could you help look into this issue? Thank you!
> python test_retain.py
Traceback (most recent call last):
File "test_retain.py", line 1, in <module>
from pyhealth.datasets import MIMIC3BaseDataset
ImportError: cannot import name 'MIMIC3BaseDataset' from 'pyhealth.datasets' (/Users/anaconda3/envs/ehr/lib/python3.8/site-packages/pyhealth/datasets/__init__.py)
I also searched this repo for the class MIMIC3BaseDataset but could not find it. Any help would be appreciated!
@all-contributors
please add @ycq091044 for code.
please add @zzachw for code.
please add @pat-jj for code.
please add @zlin7 for code.
please add @v1xerunt for code.
please add @BPDanek for code.
please add @solarsys for code.
from pyhealth.datasets import OMOPDataset
omop_base = OMOPDataset(
root="https://storage.googleapis.com/pyhealth/synpuf1k_omop_cdm_5.2.2",
tables=["condition_occurrence", "procedure_occurrence"],
code_mapping={},
)
from pyhealth.tasks import drug_recommendation_omop_fn
omop_sample = omop_base.set_task(drug_recommendation_eicu_fn)
https://github.com/yzhao062/PyHealth#how-to-contribute
too keen on contributing 😂
Hi, I noticed something that might seem strange in the RxNorm hierarchy (and perhaps in other vocabularies).
For instance, the code 1000001 in RxNorm doesn't have any parents or children in the PyHealth hierarchy.
However, according to Athena, this code does have parents and children:
This isn't specific to this code alone; it applies to many others as well. I just used 1000001 as an example.
I would like to use PyHealth to get the RxNorm hierarchy.
Can you please check this? How can I get the RxNorm hierarchy correctly?
Thank you!
@pat-jj
When I initialize the MIMIC3Dataset() class, I get urllib.error.URLError. I checked the call stack and found that the problem lies in the function download_and_read_csv() of the CrossMap class. I think it is caused by my own Internet connection, but I hope you can allow these files to be downloaded manually, so that reading from a local file can serve as an alternative to the network download.
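The requested behaviour could look roughly like the sketch below: check a local directory first and only touch the network when the file is missing. This is not PyHealth's download_and_read_csv(); the function name fetch_with_local_fallback, the local_dir argument, and the placeholder URL are all hypothetical.

```python
import os
import tempfile
import urllib.request
from urllib.error import URLError


def fetch_with_local_fallback(url: str, local_dir: str) -> str:
    """Return a local path for the file at `url`, downloading only if needed."""
    os.makedirs(local_dir, exist_ok=True)
    local_path = os.path.join(local_dir, os.path.basename(url))
    if os.path.exists(local_path):
        # A manually placed (or previously downloaded) copy wins over the network.
        return local_path
    try:
        urllib.request.urlretrieve(url, local_path)
    except URLError as err:
        # Remove any partial file so a later retry starts clean.
        if os.path.exists(local_path):
            os.remove(local_path)
        raise FileNotFoundError(
            f"Could not download {url}; place the file manually at {local_path}"
        ) from err
    return local_path


# Demo without network access: the file already exists, so nothing is downloaded.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "NDC.csv"), "w") as f:
    f.write("code,name\n")
# Placeholder URL for illustration only; it is never contacted in this demo.
resolved = fetch_with_local_fallback("https://example.invalid/medcode/NDC.csv", demo_dir)
print(resolved)
```

With this pattern, a user behind a restrictive connection can copy the files into the directory by hand, and the error message says exactly where each file is expected.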
Thanks for your great work!
But I wonder why the results reported on the PyHealth homepage are much lower than those reported in the SafeDrug paper. Also, according to the results reported by PyHealth, GAMENet performs better than SafeDrug, which contradicts the paper's results.
Below are the results reported in the SafeDrug paper:
I am trying the MIMIC-III code prediction task. How can I feed in the data, pretrained embeddings, vocab, etc.? Also, how do I use the models after that? A simple minimal example would be helpful.
Thank you!
I found your PyHealth package a valuable resource. I am trying the test_sequence_data.ipynb notebook with the example dataset. The CSV files in the /datasets/mimic/y_data/ folder are clear because their column names are self-explanatory, but the ones in the /datasets/mimic/x_data/ folder have no column names. I've read the README files and the online documentation but couldn't find anything. Can you help me with this?
BTW, it would help a lot if you could add a minimal description of the data, the data processing, and the training steps to the notebook, so that users don't have to spend a lot of time hunting for this information.
Hi, thanks for your great work! I am trying to run your code using the mimic-iii-demo dataset. My problem is that I don't know how to generate x_data like the data in the datasets folder (mimic or cms). I followed your instructions but only got the y_data after running generate_mortality_prediction_mimic_demo.py. Is this because the data in the datasets folder uses the full set of MIMIC-III variables, while only a few of them exist in the mimic-iii-demo data? Thanks!
@all-contributors please add @ycq091044 for code
When I run prediction tasks on the MIMIC-IV dataset, my code always deadlocks because of the size of MIMIC-IV, as in the earlier issue.
Could you tell me the specific versions of 'pandas' and 'pandarallel' to use? Thanks!
The program crashes when I run SafeDrug for drug recommendation with the real MIMIC-III dataset. Looking into the source code, I found that the bug occurs in the function generate_molecule_info(), specifically at adjacency = Chem.GetAdjacencyMatrix(mol). https://github.com/sunlabuiuc/PyHealth/blob/5592d437abf6a06df7d41204cf56971f45e98a47/pyhealth/models/safedrug.py#L529C20-L529C20
I don't know what happened. Please provide some suggestions. Thank you.
Here is a case that triggers the bug: smile = '[F-].[Na+]'
Hi! sorry for bothering you again.
I added the DDI rate as a metric and checked it, but the value was much smaller than it should be. Besides, it seems that the DDI matrix in PyHealth is different from the ones in SafeDrug and GAMENet. Could you please help me with this?
Hi. Can we use PyHealth to calculate the DDI rate on these two datasets?
Hello, when you updated the code, you introduced a bug in the DDI calculation; the original code was correct. It bothered me all morning. ^ ^
if ddi_matrix[i, j] == 1 or ddi_matrix[j, i] == 1: # wrong code
if ddi_matrix[med_i, med_j] == 1 or ddi_matrix[med_j, med_i] == 1: # Old and correct code
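The difference between the two lines is whether the matrix is indexed by drug indices (med_i, med_j) or by positions in the prediction list (i, j). A minimal sketch of a DDI rate computed with drug indices follows; the function name, the nested-list matrix, and the toy data are assumptions for illustration, not PyHealth's implementation.

```python
from itertools import combinations
from typing import List, Sequence


def ddi_rate(predictions: List[List[int]], ddi_matrix: Sequence[Sequence[int]]) -> float:
    """Fraction of co-prescribed drug pairs that are flagged as interacting."""
    total_pairs = 0
    ddi_pairs = 0
    for drug_indices in predictions:
        # med_i and med_j are drug indices into the matrix,
        # not positions within the prediction list.
        for med_i, med_j in combinations(drug_indices, 2):
            total_pairs += 1
            if ddi_matrix[med_i][med_j] == 1 or ddi_matrix[med_j][med_i] == 1:
                ddi_pairs += 1
    return ddi_pairs / total_pairs if total_pairs else 0.0


# Toy matrix over 4 drugs: only drugs 0 and 3 interact.
toy_ddi = [
    [0, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 0],
]
# Two visits: one prescribes drugs {0, 3}, the other drugs {1, 2, 3}.
rate = ddi_rate([[0, 3], [1, 2, 3]], toy_ddi)
print(rate)
```

Indexing by list position instead would check pairs such as (0, 1) for the second visit, which are unrelated to the drugs actually prescribed, and this can make the reported rate much smaller or larger than the true one.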
Hello, esteemed author. I noticed that when running the safedrug and molerec algorithms from your library, their performance falls far short of what is claimed in your papers and benchmarks. I would like to inquire whether you used any specific parameters or techniques during testing. Thank you for your response.
Thank you so much for your work!
When I use eICU data for drug recommendation, I get the following error:
Key drugs has mixed nested list levels across samples.
Could you please tell me how to solve this problem?
Thanks in advance
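One way to narrow down an error like this is to measure how deeply each sample's value is nested and list the samples that disagree with the majority. The sketch below is a standalone diagnostic, not PyHealth's validation code; the helper names are hypothetical, and it assumes an empty list counts as depth 1.

```python
from collections import Counter


def nesting_depth(value) -> int:
    """0 for a scalar, 1 for a flat list, 2 for a list of lists, and so on."""
    if not isinstance(value, list):
        return 0
    if not value:
        return 1  # assumption: an empty list counts as a flat list
    return 1 + max(nesting_depth(item) for item in value)


def find_mismatched_samples(samples, key):
    """Return indices of samples whose nesting depth differs from the most common one."""
    depths = [nesting_depth(sample[key]) for sample in samples]
    if not depths:
        return []
    expected = Counter(depths).most_common(1)[0][0]
    return [i for i, depth in enumerate(depths) if depth != expected]


# Toy samples: the second one stores drugs per visit instead of as a flat list.
toy_samples = [
    {"drugs": ["A", "B"]},
    {"drugs": [["A"], ["B", "C"]]},
    {"drugs": ["D"]},
]
bad = find_mismatched_samples(toy_samples, "drugs")
print(bad)
```

Running this on the samples produced by the task function shows which records need to be flattened or wrapped so that every sample uses the same structure for the key.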
pyhealth 1.1.4 version
no such thing
I think you should update either your documentation or your code.
Is there a way to use the DDI rate as a metric in Trainer, so that trainer.evaluate() outputs the final DDI rate?
Hello, I recently installed PyHealth using the command `pip install pyhealth`, and it installed version 1.1.4. However, I noticed some discrepancies between this version and the latest code available on GitHub. For example, the multilabel_metrics_fn seems to be different.
The MIMIC-III dataset used in many of the papers (e.g., SafeDrug, GAMENet, MoleRec) consists of 50,206 medical encounter records. After filtering out the patients with only one visit, it contains 14,995 visits and 6,350 patients. The code of drug_recommendation_mimic3_fn appears to implement the same task as in the papers, but using "mimic3_ds = mimic3_ds.set_task(task_fn=drug_recommendation_mimic3_fn)" produces only 911 patients and 1,858 visits. Why is this?
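To compare counts at each stage, the filtering step described above can be sketched on plain (patient_id, visit_id) pairs. This is only an illustration of "drop patients with a single visit"; it does not reproduce the preprocessing of the papers or of drug_recommendation_mimic3_fn, and the function name and toy data are made up.

```python
from collections import Counter


def keep_multi_visit_patients(visits):
    """Keep only visits of patients with at least two visits.

    `visits` is a list of (patient_id, visit_id) tuples.
    """
    counts = Counter(patient_id for patient_id, _ in visits)
    return [(pid, vid) for pid, vid in visits if counts[pid] >= 2]


toy_visits = [
    ("p1", "v1"), ("p1", "v2"),
    ("p2", "v3"),
    ("p3", "v4"), ("p3", "v5"), ("p3", "v6"),
]
kept = keep_multi_visit_patients(toy_visits)
n_patients = len({pid for pid, _ in kept})
n_visits = len(kept)
print(n_patients, n_visits)
```

Printing the patient and visit counts before and after each filter (single-visit patients, visits without drugs, and so on) makes it easier to see at which step the numbers diverge from the 6,350 patients and 14,995 visits reported in the papers.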
Dear Sir/Madam,
When I run 'drug_recommendation_mimic4_gamenet.py' in tutorials, I get an error.
Epoch 0 / 20: 0%| | 0/2 [00:00<?, ?it/s]queries shape torch.Size([64, 10, 128])
prev_drugs shape torch.Size([64, 10, 147])
curr_drugs shape torch.Size([64, 147])
a_s shape torch.Size([64, 9])
DM_values shape torch.Size([64, 10, 147])
Epoch 0 / 20: 0%| | 0/2 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/home/featurize/PyHealth/examples/drug_recommendation_mimic4_gamenet.py", line 103, in <module>
model, trainer = train_gamenet(data, train_loader, val_loader)
File "/home/featurize/PyHealth/examples/drug_recommendation_mimic4_gamenet.py", line 77, in train_gamenet
trainer.train(
File "/home/featurize/work/py38/lib/python3.8/site-packages/pyhealth/trainer.py", line 195, in train
output = self.model(**data)
File "/home/featurize/work/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/featurize/work/py38/lib/python3.8/site-packages/pyhealth/models/gamenet.py", line 410, in forward
loss, y_prob = self.gamenet(queries, prev_drugs, curr_drugs, mask)
File "/home/featurize/work/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/featurize/work/py38/lib/python3.8/site-packages/pyhealth/models/gamenet.py", line 211, in forward
a_m = torch.einsum("bv,bvz->bz", a_s, DM_values.float())
File "/home/featurize/work/py38/lib/python3.8/site-packages/torch/functional.py", line 378, in einsum
return _VF.einsum(equation, operands) # type: ignore[attr-defined]
RuntimeError: einsum(): subscript v has size 10 for operand 1 which does not broadcast with previously seen size 9
It seems it can be addressed like in the following figure. Please have a look.
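Separately from the fix referred to above, the mismatch itself can be reproduced from the printed shapes alone. The helper below is a small, strict shape checker written for illustration: it is not part of PyTorch, it ignores broadcasting of size-1 dimensions and ellipsis notation, and the name check_einsum_shapes is hypothetical.

```python
def check_einsum_shapes(equation: str, *shapes):
    """Map each subscript letter to its size, raising ValueError on a conflict."""
    inputs = equation.replace(" ", "").split("->")[0].split(",")
    if len(inputs) != len(shapes):
        raise ValueError("number of operands does not match the equation")
    sizes = {}
    for operand, (subscripts, shape) in enumerate(zip(inputs, shapes)):
        if len(subscripts) != len(shape):
            raise ValueError(
                f"operand {operand} has {len(shape)} dims, expected {len(subscripts)}"
            )
        for letter, size in zip(subscripts, shape):
            seen = sizes.setdefault(letter, size)
            if seen != size:
                raise ValueError(
                    f"subscript {letter!r}: size {size} in operand {operand} "
                    f"conflicts with previously seen size {seen}"
                )
    return sizes


# Shapes that agree on the shared subscript "v".
ok = check_einsum_shapes("bv,bvz->bz", (64, 10), (64, 10, 147))

# Shapes printed in the log above: a_s is (64, 9), DM_values is (64, 10, 147).
try:
    check_einsum_shapes("bv,bvz->bz", (64, 9), (64, 10, 147))
    mismatch = None
except ValueError as err:
    mismatch = str(err)

print(ok)
print(mismatch)
```

The check confirms that a_s covers 9 positions along "v" while DM_values covers 10, so the two tensors must be brought to the same length along that dimension before the einsum call.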
cur_dataset = expdata_generator(exp_id=exp_id)
should change to
cur_dataset = expdata_generator(expdata_id=expdata_id)
Thanks~
Dear Sir or Madam,
I have fed text reports as features into the Transformer, and it runs. But I don't know whether it is meaningful to do so. Can it learn from the text as language models do?
I follow case 2 for the Transformer model, where each code is a radiology report.
"case 2. [[code1, code2]] or [[code1, code2], [code3, code4, code5], …]"
Dear Sir/Madam,
I got an error when I ran the following code (the data is from Pipeline 5: Sleep Staging):
cal_model = HistogramBinning(model)
cal_model.calibrate(cal_dataset=val_dataset)
from pyhealth.trainer import Trainer
print(Trainer(model=cal_model, metrics=['cwECEt_adapt', 'accuracy']).evaluate(test_loader))
Any advice? Thank you.
I have tested some basic code from the tutorial with the MIMIC-IV dataset, but the process hung. I pressed Ctrl-C to exit the program, and it gave the following call stack. It seems that parallel_apply gets into a deadlock or something similar when parsing prescriptions.
Code to reproduce:
import logging
from pyhealth.datasets import MIMIC4Dataset
logger = logging.getLogger("pyhealth")
logger.setLevel(logging.DEBUG)
dataset = MIMIC4Dataset(
    "/home/featurize/data/mimic-iv-2.2/hosp",
    tables=["diagnoses_icd", "procedures_icd", "prescriptions", "labevents"],
    code_mapping={"NDC": ("ATC", {"target_kwargs": {"level": 3}})},
)
Callstacks after ctrl-C
Loaded NDC->ATC mapping from /home/featurize/.cache/pyhealth/medcode/NDC_to_ATC.pkl
Loaded NDC code from /home/featurize/.cache/pyhealth/medcode/NDC.pkl
Loaded ATC code from /home/featurize/.cache/pyhealth/medcode/ATC.pkl
Processing MIMIC4Dataset base dataset...
INFO: Pandarallel will run on 6 workers.
INFO: Pandarallel will use Memory file system to transfer data between the main process and workers.
finish basic patient information parsing : 80.05470561981201s
finish parsing diagnoses_icd : 134.23406291007996s
finish parsing procedures_icd : 57.97325396537781s
^CTraceback (most recent call last):
File "main.py", line 7, in <module>
dataset = MIMIC4Dataset(
File "/home/featurize/work/PyHealth/pyhealth/datasets/base_ehr_dataset.py", line 130, in __init__
patients = self.parse_tables()
File "/home/featurize/work/PyHealth/pyhealth/datasets/base_ehr_dataset.py", line 190, in parse_tables
patients = getattr(self, f"parse_{table.lower()}")(patients)
File "/home/featurize/work/PyHealth/pyhealth/datasets/mimic4.py", line 307, in parse_prescriptions
group_df = group_df.parallel_apply(
File "/environment/miniconda3/envs/py38/lib/python3.8/site-packages/pandarallel/core.py", line 307, in closure
Process ForkPoolWorker-28:
Process ForkPoolWorker-31:
Process ForkPoolWorker-30:
Process ForkPoolWorker-32:
Process ForkPoolWorker-33:
Process ForkPoolWorker-29:
message: Tuple[int, WorkerStatus, Any] = master_workers_queue.get()
File "<string>", line 2, in get
File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/managers.py", line 835, in _callmethod
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/pool.py", line 114, in worker
task = get()
File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/pool.py", line 114, in worker
task = get()
File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/pool.py", line 114, in worker
task = get()
File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/queues.py", line 355, in get
with self._rlock:
File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/queues.py", line 356, in get
res = self._reader.recv_bytes()
File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/queues.py", line 355, in get
with self._rlock:
File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/synchronize.py", line 95, in __enter__
return self._semlock.__enter__()
File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/connection.py", line 216, in recv_bytes
buf = self._recv_bytes(maxlength)
File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/synchronize.py", line 95, in __enter__
return self._semlock.__enter__()
File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes
buf = self._recv(4)
File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/connection.py", line 379, in _recv
chunk = read(handle, remaining)
KeyboardInterrupt
KeyboardInterrupt
KeyboardInterrupt
I use the following code to get all the icd tokens.
base_dataset2 = MIMIC4Dataset(
    root="/home/czhaobo/KnowHealth/data/physionet.org/files/mimiciv/2.0/hosp",  # 2.2 does not work well
    tables=["diagnoses_icd", "procedures_icd", "prescriptions", "labevents"],
    code_mapping={"NDC": ("ATC", {"target_kwargs": {"level": 3}})},
    dev=False,
    refresh_cache=False,  # use True on the first run
)
sample_dataset2 = base_dataset2.set_task(drug_recommendation_mimic4_fn)
tokenizer2 = Tokenizer(
    tokens=sample_dataset2.get_all_tokens(key='conditions'),
    special_tokens=["<pad>", "<unk>"],
)
tokens2 = list(tokenizer2.vocabulary.idx2token.values())
print(tokens2)
diag_sys1, proc_sys1, med_sys1 = get_stand_system('MIMIC-III')
diag_sys2, proc_sys2, med_sys2 = get_stand_system('MIMIC-IV')
But when I try to look up their names via InnerMap.lookup, I always get a KeyError. For example, H4011X0 is an ID in tokens2:
```python
from pyhealth.medcode import InnerMap

if __name__ == "__main__":
    icd9cm = InnerMap.load("ICD9CM")
    icd10cm = InnerMap.load("ICD10CM")
    print(icd9cm.lookup('H4011X0'))
    print(icd10cm.lookup('H4011X0'))
```
Dear sir/Madam,
In 'Advanced Case 2: Work on customized healthcare task', I have questions about 'sequential_drugs' and 'drugs' (see the following code). It seems 'sequential_drugs' is always empty. What is 'sequential_drugs[-1] = drugs' in the last line of the loop used for?
def sequential_drug_recommendation(patient):
    samples = []
    sequential_conditions = []
    sequential_procedures = []
    sequential_drugs = []  # drugs of the current visit are not included yet
    for visit in patient:
        # step 1: obtain feature information
        conditions = visit.get_code_list(table="DIAGNOSES_ICD")
        procedures = visit.get_code_list(table="PROCEDURES_ICD")
        drugs = visit.get_code_list(table="PRESCRIPTIONS")
        sequential_conditions.append(conditions)
        sequential_procedures.append(procedures)
        sequential_drugs.append([])
        # step 2: exclusion criteria: visits without drug
        if len(drugs) == 0:
            sequential_drugs[-1] = drugs
            continue
        # step 3: assemble the samples
        samples.append(
            {
                "visit_id": visit.visit_id,
                "patient_id": patient.patient_id,
                # the following keys can be the "feature_keys" or "label_key" for initializing downstream ML model
                "sequential_conditions": sequential_conditions.copy(),
                "sequential_procedures": sequential_procedures.copy(),
                "sequential_drugs": sequential_drugs.copy(),
                "label": drugs,
            }
        )
        sequential_drugs[-1] = drugs
    return samples
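The role of the placeholder and of the final assignment is easier to see on toy data. The sketch below mimics only the drug-history logic with plain dictionaries; it is a simplified stand-in written for illustration, not the tutorial function, and build_samples and the visit dictionaries are hypothetical.

```python
def build_samples(visits):
    """visits: list of dicts with 'visit_id' and 'drugs' (a list of codes)."""
    samples = []
    sequential_drugs = []
    for visit in visits:
        drugs = visit["drugs"]
        # Placeholder: the current visit's drugs are the label, so hide them.
        sequential_drugs.append([])
        if len(drugs) == 0:
            continue
        samples.append(
            {
                "visit_id": visit["visit_id"],
                "sequential_drugs": [list(d) for d in sequential_drugs],
                "label": drugs,
            }
        )
        # Reveal the drugs as history for the following visits.
        sequential_drugs[-1] = drugs
    return samples


toy_patient = [
    {"visit_id": "v1", "drugs": ["A", "B"]},
    {"visit_id": "v2", "drugs": ["C"]},
]
toy_out = build_samples(toy_patient)
for sample in toy_out:
    print(sample["visit_id"], sample["sequential_drugs"], sample["label"])
```

In this sketch the first sample has an empty history, while the second sample sees the drugs of the first visit followed by an empty slot for the current visit. That is why the last element always looks empty at the time a sample is created, and why the assignment after samples.append() is needed for later visits.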
I found that HALO is combined in the branch 'main Halo 2'. Will it be added to the main branch along with the EHR synthetic task? #278
Hi,
How can I do training from a pre-trained model?
For example, instead of using:
from pyhealth.models import Transformer, RNN, RETAIN
model = Transformer(
    dataset=mimic3_task_ds,
    # look up what are available for "feature_keys" and "label_keys" in dataset.samples[0]
    feature_keys=["conditions", "procedures", "drugs"],
    label_key="label",
    mode="binary",
)
I would like to use a pre-trained Transformer or a pre-trained BERT model instead.
Is it possible?
@pat-jj