Extract disease data from PubTator Central for a given query (in this case, 'heart failure'). Each disease is linked with its PMID and MeSH ID for use in later projects.
This Python script does the following:
- Obtains list of PMIDs from PubMed for a specific search query
- Converts that list into a format searchable in PubTator Central (PTC)
- Uses the PTC API to get annotated articles for the PMID list (currently limited to 100 articles, though up to 1000 should be possible). Output is in PubTator format.
- Writes the data to a CSV file, then reads it back in and parses it into a data frame.
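The conversion and request steps above can be sketched roughly as follows. This is a minimal illustration and not the script's actual code: the helper names `batch_pmids` and `build_ptc_url` are hypothetical, and the endpoint URL and the `pmids`/`concepts` parameters are assumed from the PubTator Central export API as it was documented around the time this script was written, so check them against the current service before relying on them. No request is sent here; the sketch only builds the URLs.

```python
from urllib.parse import urlencode

# Assumed PTC export endpoint for PubTator-format output (verify before use).
PTC_EXPORT_URL = (
    "https://www.ncbi.nlm.nih.gov/research/pubtator-api/"
    "publications/export/pubtator"
)


def batch_pmids(pmids, batch_size=100):
    """Yield successive slices of the PMID list, at most batch_size long."""
    for start in range(0, len(pmids), batch_size):
        yield pmids[start:start + batch_size]


def build_ptc_url(pmids, concepts="disease"):
    """Join PMIDs with commas and build one request URL for the batch."""
    params = {"pmids": ",".join(str(p) for p in pmids), "concepts": concepts}
    # safe="," keeps the comma-separated PMID list readable in the URL.
    return PTC_EXPORT_URL + "?" + urlencode(params, safe=",")


# Made-up PMIDs, used only to show the shape of the output.
example_pmids = [str(n) for n in range(1000, 1250)]
urls = [build_ptc_url(batch) for batch in batch_pmids(example_pmids)]
print(len(urls))  # one URL per batch of up to 100 PMIDs
```

Splitting the list into batches keeps each request within the 100-article limit mentioned above; raising `batch_size` would be the place to experiment with larger requests.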
This script requires:
- Biopython
- pandas
Both may be installed through pip, e.g., pip install biopython pandas
Run this script as python '.\PMID to BioC Retrieval Using PubMed and PTC APIs.py'
By default, the script searches for all documents corresponding to the query "heart failure". To search for something else, change the search query string in the script.
Output is written to "output.csv".
The data frame is not saved but may be passed to another function.
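As a rough illustration of what the parsed records look like, the sketch below pulls disease annotations out of PubTator-format text using only the standard library. It is not the script's own parser: the function name `parse_pubtator` and the column names are hypothetical, and the sample record (including PMID 12345) is made up for the example. It relies on the PubTator format's layout, in which title and abstract lines are pipe-delimited and annotation lines have six tab-separated fields (PMID, start, end, mention, type, identifier).

```python
def parse_pubtator(text):
    """Return one dict per Disease annotation found in PubTator-format text."""
    rows = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 6:
            # Title (|t|), abstract (|a|), and blank lines are skipped.
            continue
        pmid, start, end, mention, concept_type, identifier = fields[:6]
        if concept_type != "Disease":
            continue
        rows.append({
            "pmid": pmid,
            "start": int(start),
            "end": int(end),
            "mention": mention,
            "mesh_id": identifier,
        })
    return rows


# Made-up sample record, for illustration only.
sample = (
    "12345|t|Heart failure and diabetes\n"
    "12345|a|An abstract.\n"
    "12345\t0\t13\tHeart failure\tDisease\tMESH:D006333\n"
    "12345\t18\t26\tdiabetes\tDisease\tMESH:D003920\n"
    "\n"
)

rows = parse_pubtator(sample)
for row in rows:
    print(row["pmid"], row["mention"], row["mesh_id"])
```

A list of dicts like `rows` can be passed to `pandas.DataFrame(rows)` to get a data frame with one row per disease mention, which another function can then consume.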
Developed by Marlee Zinsser in the Ping Lab at UCLA while working with Harry Caufield in Fall 2019.