coderxio / dailymed-api
REST API for DailyMed SPLs
Home Page: https://coderx.io/
License: MIT License
I think this makes sense at the SPL level... but there might be more than one product NDC (e.g. 12345-6789) per SPL?
Repo is missing a license.
Add a license.
Potentially MIT license?
extract_zips.py can only handle a single .zip file, but there are multiple .zip files when the data is downloaded from DailyMed. An additional finding when deploying to the droplet was an out-of-memory error, caused by writing the entire file in one chunk. We need the ability to download a large (> 1 GB) file on the droplet with minimal memory.
All zip files are looped through and unzipped.
An out-of-memory error does not occur.
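A sketch of the two fixes using only the standard library: streaming the download in fixed-size chunks and looping over every zip in a directory (function names, paths, and the chunk size are illustrative, not the project's actual code):

```python
import glob
import shutil
import urllib.request
import zipfile

CHUNK = 1024 * 1024  # stream 1 MB at a time so memory use stays flat


def download(url, dest):
    """Stream a large (> 1 GB) file to disk without holding it in memory."""
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out, CHUNK)


def extract_all(src_dir, out_dir):
    """Unzip every .zip file in src_dir, not just a single file."""
    for path in sorted(glob.glob(f"{src_dir}/*.zip")):
        with zipfile.ZipFile(path) as zf:
            zf.extractall(out_dir)
```

`shutil.copyfileobj` copies between file objects one buffer at a time, which is what keeps the droplet's memory usage bounded regardless of file size.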
README.md has incorrect information in the steps to reproduce. Additionally, the README does not contain links to the hosted API.
Deployment steps are correct and the link to the API works.
Capture the DEA drug schedule for the product within the model.
This is a next step as we expand the scraping of the XML files. The item fits logically within our current model, i.e. at the product level.
Code does not align with PEP 8 standards.
Successfully run flake8 against the code base with no errors.
Create a branch, format the code with the Python module Black, and going forward require all code changes to be compliant.
Command line arguments have the wrong metavar. The metavar needs to be set to int for all three inputs.
Metavar identified as int.
Update get_zips with the correct metavar in argparse.
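A sketch of what the corrected argparse setup might look like. The three argument names here are hypothetical (the real ones live in get_zips); the point is that `type=int` enforces integer input while `metavar` controls how the argument renders in `--help`:

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Download DailyMed SPL zips")
    # type=int converts and validates the value; metavar="int" is what --help shows.
    parser.add_argument("--start", type=int, metavar="int", help="first file index")
    parser.add_argument("--stop", type=int, metavar="int", help="last file index")
    parser.add_argument("--workers", type=int, metavar="int", help="parallel downloads")
    return parser


args = build_parser().parse_args(["--start", "1", "--stop", "4", "--workers", "2"])
```

With this setup, a non-integer value like `--start foo` makes argparse exit with a usage error instead of passing a string through.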
Need to filter down to products: the product endpoint should filter by product_codes and not_inactive_ingredient_uniis.
Desired workflow is probably this:
SPLs should not be returned multiple times just because they have a product that has multiple inactive ingredients with the same name.
This query should return only one result, not three:
http://api.coderx.io/spl/?set_id=15776e53-0ae5-4605-914a-8a1bcd97323a&inactive_ingredient_name=oxide
Keyword arguments for "distinct" exist, but I haven't figured out how to use them, and I'm not sure this would fix it anyway: https://django-filter.readthedocs.io/en/stable/ref/filters.html?highlight=distinct#distinct
There are currently some XML parsing errors occurring with only some of the SPL zip files. The errors appear to be related to missing containerPackagedProduct tags in some SPL files; containerPackagedMedicine tags are being used instead.
We need to determine whether this is the only error in these SPL files. The solution will likely involve testing for either tag when scraping the XML files.
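One way to test for either tag when scraping, sketched with the standard library. The tag names come from the issue itself; SPL namespace handling and the surrounding Scrapy plumbing are omitted, so treat this as an assumption-laden sketch:

```python
import xml.etree.ElementTree as ET

# Some SPL files use containerPackagedProduct, others containerPackagedMedicine.
PACKAGE_TAGS = ("containerPackagedProduct", "containerPackagedMedicine")


def find_packages(root):
    """Collect package elements regardless of which tag variant the file uses."""
    packages = []
    for tag in PACKAGE_TAGS:
        packages.extend(root.iter(tag))
    return packages


old_style = ET.fromstring("<spl><containerPackagedProduct/></spl>")
new_style = ET.fromstring("<spl><containerPackagedMedicine/></spl>")
```

Both variants yield a result, so files using either tag parse without errors.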
The Scrapy run logs currently show two errors after every execution. Our setup uses Scrapy to crawl a bunch of XML files, so this error is really related to its default settings, which apply more to crawling URLs than files. The error appears to be related to a missing robots.txt file.
To fix this bug, the default Scrapy settings need to be adjusted to disable searching for this file. We should be cautious with this if we decide to actually scrape URLs in the future.
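Scrapy's robots.txt lookup is controlled by the `ROBOTSTXT_OBEY` setting (a real Scrapy setting; the comment reflects the caveat above). The settings.py change would look like:

```python
# settings.py (Scrapy project)
# Stop Scrapy from fetching robots.txt before crawling -- our "crawl" targets
# local XML files, not remote sites, so the lookup only produces log errors.
# Re-enable (True) if we ever start scraping real URLs.
ROBOTSTXT_OBEY = False
```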
Ability to filter views by the items presented in the view. For example, the /spl view could filter by product name.
Per discussion on 09/27, the recommendation was Django Filters.
Currently, users are unable to curate the returned information. Filtering would enable the user to select only information pertinent to the problem being solved. If a user needs to know NDC numbers for lisinopril, they could filter only products with the name lisinopril.
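The intended behavior, sketched in plain Python rather than django-filter (which is the recommended implementation): the product names come from the lisinopril example above, while the NDC values are made up for illustration.

```python
def filter_products(products, name):
    """Keep only products whose name matches, case-insensitively."""
    return [p for p in products if p["name"].lower() == name.lower()]


products = [
    {"name": "lisinopril", "ndc": "00000-0001"},  # hypothetical NDCs
    {"name": "Lisinopril", "ndc": "00000-0002"},
    {"name": "metformin", "ndc": "00000-0003"},
]
matches = filter_products(products, "lisinopril")
```

The user asking for lisinopril NDCs gets back only the two lisinopril rows instead of the full product list.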
Would propose we come up with some branch naming conventions, maybe something like name/issue#/description. Food for thought: what do you guys think?
This would better organize and identify branches in the repo.
ref: https://stackoverflow.com/questions/273695/what-are-some-examples-of-commonly-used-practices-for-naming-git-branches
GitHub allows you to create standardized issue and PR templates. I think it would be a good idea to come up with some basic templates for this repo. Wondering if it would be cool to actually make a modified SBAR format for this. These would be defaults but could still be adjusted in the issue or PR.
Templates I have seen in the past look like this (shout out to @toozej):
For issues:
[What needs to be done and why]
[Measurable outcome if possible]
[ways one might accomplish this task, links, documentation, alternatives, etc.]
For PRs:
Fixes org/repo#ISSUE_NUMBER
[What did you change?]
[Why did you make the changes mentioned above? What alternatives did you consider?]
The API endpoints need to include pagination.
Currently, the API is slow to load for the products endpoint, and possibly other endpoints, due to the size of the data.
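Django REST Framework supports global pagination through its settings dictionary; the setting keys below are DRF's own, while the page size of 100 is an assumption to be tuned against the products endpoint:

```python
# settings.py (Django) -- enable page-number pagination for every DRF view.
REST_FRAMEWORK = {
    "DEFAULT_PAGINATION_CLASS": "rest_framework.pagination.PageNumberPagination",
    "PAGE_SIZE": 100,  # assumed page size; tune for the products endpoint
}
```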
Need a WSGI HTTP server, e.g. gunicorn, added to the project dependency files.
n/a
I would recommend gunicorn, as it is very simple to use.
Single view set, /spl, with filtering.
The project is crawling SPL documentation, so it makes sense for the URL route to be /spl.
Problem Statement
Individual SPLs sometimes have more than one NDC within the XML (and within the "Ingredients and Appearance" section). This prevents us from using the NDC as a lookup_field.
Example: https://dailymed.nlm.nih.gov/dailymed/drugInfo.cfm?setid=2524b253-069e-4028-819c-361b888df110
Criteria for Success
Either modify the Scrapy logic to remove duplicates (if appropriate) or modify Django REST Framework to use an auto-incrementing number as the primary key and not use NDC as a lookup_field.
Success = 0 duplicate errors during Scrapy run.
Additional Information
The two XML documents I found are below:
I pulled this from the API for one of them:
GET /spl/8d4d72be-638b-11ea-918e-832cfc2ca371/
HTTP 200 OK
Allow: GET, HEAD, OPTIONS
Content-Type: application/json
Vary: Accept
{
"id": "8d4d72be-638b-11ea-918e-832cfc2ca371",
"set": "http://192.168.1.12:8000/set/2524b253-069e-4028-819c-361b888df110/",
"ndcs": [
"http://192.168.1.12:8000/ndc/50458-178-00/",
"http://192.168.1.12:8000/ndc/50458-178-15/",
"http://192.168.1.12:8000/ndc/50458-178-00/",
"http://192.168.1.12:8000/ndc/50458-178-20/",
"http://192.168.1.12:8000/ndc/50458-178-00/",
"http://192.168.1.12:8000/ndc/50458-178-12/",
"http://192.168.1.12:8000/ndc/50458-178-28/",
"http://192.168.1.12:8000/ndc/50458-178-06/",
"http://192.168.1.12:8000/ndc/50458-176-00/",
"http://192.168.1.12:8000/ndc/50458-176-15/",
"http://192.168.1.12:8000/ndc/50458-176-28/",
"http://192.168.1.12:8000/ndc/50458-176-06/"
]
}
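If the Scrapy-side option is chosen, the dedup step could be as small as the sketch below (plain Python, order-preserving since dicts keep insertion order); this only removes the repeated entries in the scraped list and does not by itself address the lookup_field question:

```python
def dedupe_ndcs(ndcs):
    """Drop repeated NDCs while keeping first-seen order."""
    return list(dict.fromkeys(ndcs))


# Abbreviated sample from the API response above, with 50458-178-00 repeated.
ndcs = ["50458-178-00", "50458-178-15", "50458-178-00", "50458-178-20"]
# dedupe_ndcs(ndcs) -> ["50458-178-00", "50458-178-15", "50458-178-20"]
```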