dailymed-api's People

Contributors

finish06, jrlegrand, yevgenybulochnik

dailymed-api's Issues

Zip extraction needs to loop through multiple zips

Problem Statement

extract_zips.py can only handle a single .zip file, but there are multiple .zip files when the data is downloaded from DailyMed. An additional finding when deploying to the droplet was out-of-memory errors caused by writing the entire file in one chunk. We need the ability to download a large (> 1 GB) file on the droplet with minimal memory.

Criteria for Success

All zip files are looped through and unzipped.
An out of memory error does not occur.

Additional Information
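A minimal sketch of the intended behavior (file layout and helper names below are assumptions, not the actual extract_zips.py API): loop over every archive in the download directory, and write HTTP responses to disk in fixed-size chunks rather than buffering the whole body.

```python
from pathlib import Path
import shutil
import zipfile

CHUNK_SIZE = 1024 * 1024  # 1 MB chunks keep peak memory small

def extract_all_zips(zip_dir, out_dir):
    """Extract every .zip in zip_dir into out_dir, one archive at a time.

    Returns the list of member names extracted across all archives.
    """
    extracted = []
    for zip_path in sorted(Path(zip_dir).glob("*.zip")):
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(out_dir)
            extracted.extend(zf.namelist())
    return extracted

def save_response_in_chunks(response, dest):
    """Write a file-like HTTP response (e.g. from urllib.request.urlopen)
    to disk in CHUNK_SIZE pieces instead of reading it all into memory."""
    with open(dest, "wb") as fh:
        shutil.copyfileobj(response, fh, CHUNK_SIZE)
```

With chunked writes, memory use stays near CHUNK_SIZE regardless of how large the download is.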

Capture DEA drug schedule within the product model

Proposal

Capture the DEA drug schedule for the product within the model.

Rationale

This is a next step as we expand the scraping of the XML files. The item fits logically within our current models, i.e. at the product level.
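As a starting point, the schedule could be read from the SPL XML's policy element. A hedged sketch with the standard library (the sample XML is simplified: real SPL files are namespaced and nest this differently, so the exact path is an assumption to verify against real files):

```python
import xml.etree.ElementTree as ET

# Simplified sample; real SPL documents are namespaced and wrap this in
# <subjectOf> inside a product (layout here is an assumption).
SAMPLE = """<product>
  <subjectOf>
    <policy classCode="DEADrugSchedule">
      <code code="C48675" displayName="CII"/>
    </policy>
  </subjectOf>
</product>"""

def dea_schedule(product_xml):
    """Return the DEA schedule display name (e.g. 'CII'), or None."""
    root = ET.fromstring(product_xml)
    node = root.find(".//policy[@classCode='DEADrugSchedule']/code")
    return node.get("displayName") if node is not None else None
```

The returned value (e.g. "CII") could then be stored as a field on the product model.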

Update code to align with PEP8 standard

Problem Statement

Code does not align with the PEP8 standard.

Criteria for Success

Successfully run flake8 against code base with no errors

Additional Information

Create a branch, process the code base with the Python module Black, and then, going forward, require all code changes to be compliant.

Update argparse metavars to proper type, i.e. int.

Problem Statement

Command-line arguments have the wrong metavar; it needs to be set to int for all three inputs.

Criteria for Success

Metavar identified as int

Additional Information

Update get_zips with correct metavar in argparse.
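A sketch of the pattern (argument names below are illustrative, not the actual get_zips flags). Note that metavar only affects how the placeholder renders in --help output; type=int is what actually converts and validates the value:

```python
import argparse

# Illustrative arguments; the real get_zips flags may differ.
parser = argparse.ArgumentParser(description="Download DailyMed zip files")
parser.add_argument("--part", type=int, metavar="INT", default=1)
parser.add_argument("--retries", type=int, metavar="INT", default=3)
parser.add_argument("--chunk-size", type=int, metavar="INT", default=1024)

args = parser.parse_args(["--part", "2", "--retries", "5"])
```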

Missing filters for product endpoint

Problem Statement

Need to filter down to products that:

  • contain given product codes
  • do not contain given inactive ingredient UNIIs

Criteria for Success

The product endpoint should filter by product_codes and not_inactive_ingredient_uniis.

Additional Information

Desired workflow is probably this:

  1. User searches for RXCUI via a user-friendly search
  2. Query RxNorm for an RXCUI https://rxnav.nlm.nih.gov/REST/ndcproperties.json?id=314200
  3. Gather distinct product codes from this response
  4. User also searches for inactive ingredient UNIIs that they can't tolerate via a user-friendly search
  5. Query DailyMed API for all products that contain ANY of the product codes from RxNorm which also do NOT contain ANY of the inactive ingredient UNIIs
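The combined filter in step 5 can be sketched in plain Python (field names are assumptions; in the API this would presumably be implemented as django-filter filters on the product queryset):

```python
def filter_products(products, product_codes, excluded_uniis):
    """Keep products whose code is in product_codes AND whose inactive
    ingredient UNIIs do not overlap excluded_uniis at all."""
    wanted = set(product_codes)
    blocked = set(excluded_uniis)
    return [
        p for p in products
        if p["product_code"] in wanted
        and blocked.isdisjoint(p["inactive_ingredient_uniis"])
    ]
```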

Initial filter setup returns duplicate SPLs if searching by things like inactive ingredients

Problem Statement

SPLs should not be returned multiple times just because they have a product that has multiple inactive ingredients with the same name.

Criteria for Success

This query should return only one result, not three:
http://api.coderx.io/spl/?set_id=15776e53-0ae5-4605-914a-8a1bcd97323a&inactive_ingredient_name=oxide

Additional Information

A "distinct" keyword argument exists, but we haven't figured out how to use it, and it's not clear it would fix this anyway: https://django-filter.readthedocs.io/en/stable/ref/filters.html?highlight=distinct#distinct
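The intended behavior can be sketched as a dedupe by set_id (in Django the equivalent fix would likely be calling .distinct() on the queryset, or the distinct=True filter argument the linked docs describe):

```python
def dedupe_spls(rows):
    """Return one row per set_id, keeping first-seen order."""
    seen = set()
    unique = []
    for row in rows:
        if row["set_id"] not in seen:
            seen.add(row["set_id"])
            unique.append(row)
    return unique
```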

Scrapy xml parsing errors

Problem Statement

There are currently some XML parsing errors occurring with only some of the SPL zip files. The errors appear to be related to missing containerPackagedProduct tags in some SPL files; containerPackagedMedicine tags are used instead.

Criteria for Success

We need to determine whether this is the only error in these SPL files. The solution will likely involve testing for either tag when scraping the XML files.
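The either-tag lookup can be sketched with the standard library (the real spider presumably uses Scrapy selectors, and SPL namespaces are omitted here for brevity):

```python
import xml.etree.ElementTree as ET

def find_container(root):
    """Return the packaging element, trying both tag spellings."""
    node = root.find(".//containerPackagedProduct")
    if node is None:
        node = root.find(".//containerPackagedMedicine")
    return node
```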

Additional Information

None

Scrapy run errors filenotfound

The Scrapy run logs currently show two errors after every execution. Our setup uses Scrapy to crawl a set of local XML files, so this error is really related to Scrapy's default settings, which are geared toward crawling URLs rather than files. The error appears to be caused by a missing robots.txt file.

To fix this bug, the default Scrapy settings need to be adjusted to disable fetching this file. We should be cautious with this if we decide to actually scrape URLs in the future.
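The fix should be a one-line change in the project's Scrapy settings (ROBOTSTXT_OBEY is a standard Scrapy setting, enabled by default in projects generated with scrapy startproject):

```python
# settings.py (Scrapy project): stop requesting robots.txt before crawling.
# Re-enable this if we ever scrape real URLs.
ROBOTSTXT_OBEY = False
```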

Stackoverflow question
Scrapy settings

Filter Views via Django Filters

Proposal

Ability to filter views by the items presented in the view. For example, the /spl view could filter by product name.
Per discussion on 09/27, the recommendation was Django Filters.

Rationale

Currently, users are unable to curate the returned information. Filtering would enable the user to select only information pertinent to the problem being solved. If a user needs to know NDC numbers for lisinopril, they could filter only products with the name lisinopril.
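The lisinopril example could behave like this plain-Python sketch (the data shape is an assumption; in the API it would presumably be a django-filter FilterSet on the /spl view):

```python
def filter_spls_by_product_name(spls, name):
    """Keep SPLs containing at least one product whose name matches
    (case-insensitive substring match)."""
    needle = name.lower()
    return [
        spl for spl in spls
        if any(needle in product.lower() for product in spl["product_names"])
    ]
```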

Standardized issue and PR templates

GitHub allows you to create standardized issue and PR templates. I think it would be a good idea to come up with some basic templates for this repo, and it might be cool to use a modified SBAR format for them. These would be defaults but could still be adjusted in each issue or PR.

Templates I have seen in the past look like this (shout out to @toozej):


For issues:

Problem Statement

[What needs to be done and why]

Criteria for Success

[Measurable outcome if possible]

Additional Information

[ways one might accomplish this task, links, documentation, alternatives, etc.]


For PRs:

Fixes org/repo#ISSUE_NUMBER

Explanation

[What did you change?]

Rationale

[Why did you make the changes mentioned above? What alternatives did you consider?]

Tests

  1. What testing did you do?
  2. Attach testing logs inside a collapsible summary block

Add WSGI http server to project poetry files

Problem Statement

Need wsgi http server, e.g. gunicorn, added to project dependency files.

Criteria for Success

n/a

Additional Information

I would recommend gunicorn, as it is very simple to use.

Single view point, i.e. /spl

Proposal

Single view set, /spl with filtering.

Rationale

The project crawls SPL documentation, so it makes sense for the URL route to be /spl.

Duplicate NDCs within a single SPL

Problem Statement
Individual SPLs sometimes list the same NDC more than once within the XML (and within the "Ingredients and Appearance" section). This prevents us from:

  1. Using NDC as a primary key
  2. Using NDC as a lookup_field

Example: https://dailymed.nlm.nih.gov/dailymed/drugInfo.cfm?setid=2524b253-069e-4028-819c-361b888df110

Criteria for Success
Either modify the Scrapy logic to remove duplicates (if appropriate), or modify Django REST Framework to use an auto-incrementing number as the primary key and stop using NDC as a lookup_field.

Success = 0 duplicate errors during Scrapy run.

Additional Information
The two XML documents I found are below:

  • 3c8e9c87-0475-444b-bbe7-99620519a581.xml
  • 8d4d72be-638b-11ea-918e-832cfc2ca371.xml

I pulled this from the API for one of them:

GET /spl/8d4d72be-638b-11ea-918e-832cfc2ca371/
HTTP 200 OK
Allow: GET, HEAD, OPTIONS
Content-Type: application/json
Vary: Accept
{
    "id": "8d4d72be-638b-11ea-918e-832cfc2ca371",
    "set": "http://192.168.1.12:8000/set/2524b253-069e-4028-819c-361b888df110/",
    "ndcs": [
        "http://192.168.1.12:8000/ndc/50458-178-00/",
        "http://192.168.1.12:8000/ndc/50458-178-15/",
        "http://192.168.1.12:8000/ndc/50458-178-00/",
        "http://192.168.1.12:8000/ndc/50458-178-20/",
        "http://192.168.1.12:8000/ndc/50458-178-00/",
        "http://192.168.1.12:8000/ndc/50458-178-12/",
        "http://192.168.1.12:8000/ndc/50458-178-28/",
        "http://192.168.1.12:8000/ndc/50458-178-06/",
        "http://192.168.1.12:8000/ndc/50458-176-00/",
        "http://192.168.1.12:8000/ndc/50458-176-15/",
        "http://192.168.1.12:8000/ndc/50458-176-28/",
        "http://192.168.1.12:8000/ndc/50458-176-06/"
    ]
}
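If removing the duplicates in the Scrapy logic turns out to be appropriate, a one-liner keeps first-seen order (a sketch; where this hooks into the spider or pipeline is an assumption):

```python
def unique_ndcs(ndcs):
    """Drop repeated NDCs while preserving first-seen order
    (dict.fromkeys keeps insertion order in Python 3.7+)."""
    return list(dict.fromkeys(ndcs))
```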
