nlpsandbox / date-annotator-example


Example implementation of the NLP Sandbox Date Annotator API

Home Page: https://nlpsandbox.io

License: Apache License 2.0

Dockerfile 1.45% Python 98.32% Shell 0.23%

nlp natural-language-processing date datetime nlp-sandbox cd2h api-service sage-bionetworks benchmarking


date-annotator-example's Issues

Match the regex used in the Java implementation

Compare Swagger code gen and openapi code gen

Push README to DockerHub in ci.yml

Add the improvements developed while working on the ROCC service

Clarify logging

Output from print() is not shown in stdout
Output from logging.info() is shown in stdout

This likely stems from the s6 logging configuration.

Identify format of the logs so they can be interpreted by ELK

@thomasyu888 The format of the NLP Tool is almost "Gold" and I'll present it on Tuesday during our technical meeting. One of the last elements that we need to figure out is the format of the logs printed to stdout and stderr so we can easily interpret them in our ELK stack. Did you come across a standard or best practices regarding this? I guess that the logs should be prefixed with at least the following information for which we need to define the format:

timestamp
message type (e.g. info, warning, error)
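As a starting point for that discussion, here is a minimal sketch of a log prefix carrying a timestamp and a message type, built with Python's standard logging module. The format string, the date format, and the logger name "date-annotator" are assumptions chosen for illustration, not a format agreed on for the ELK stack.

```python
import io
import logging

# Hypothetical prefix: timestamp, level (message type), logger name, message.
LOG_FORMAT = "%(asctime)s %(levelname)s %(name)s: %(message)s"
DATE_FORMAT = "%Y-%m-%dT%H:%M:%S%z"


def build_logger(name, stream):
    """Return a logger that writes prefixed records to the given stream."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(LOG_FORMAT, datefmt=DATE_FORMAT))
    logger = logging.getLogger(name)
    logger.handlers.clear()   # avoid duplicate handlers on repeated calls
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep records out of the root logger
    return logger


# An in-memory stream stands in for stdout so the result can be inspected.
buffer = io.StringIO()
log = build_logger("date-annotator", buffer)
log.info("annotated 3 dates")
log.warning("empty note received")
```

In a container, passing sys.stdout instead of the in-memory buffer would send the same prefixed lines to stdout.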

Figure out the best way to structure the folder src/

Motivation

This example repository for NLP Sandbox Date Annotators should implement all the best practices that come to our mind. This repo is expected to be forked by the Developers who will use the NLP Sandbox. We could also make this repo available as a GH template.

A best practice is to promote a modular implementation of the Date Annotators and other NLP Tools.

I don't have much experience with the development of Python packages / programs. Would a structure like the one below make sense?

src/library: Python library that implements the logic of the Date Annotator (and optionally publish it on PyPI?)
src/server: A Python-Flask server that implements the NLP Sandbox OpenAPI spec of the Date Annotator and uses the above library for the core business logic. The current content has been generated using openapi-generator.
src/cli: A command line interface program for the library (optional)?

@thomasyu888 @jaeddy What do you think is the best way to achieve this?

Remove nginx from the stack

nginx is not actually needed for security reasons; only the Docker network is.

Update the README to reflect this change

Add workflow to notify when new NLP Sandbox schemas are available

Use a new file named .nlpsandbox-version to track the current version of the NLP Sandbox schemas implemented by this repository.

Automatically Validate and/or Bump Version

Is your proposal related to a problem?

There is a tool version listed in both docker-compose.yml and tool_controller.py. When you release a version (which automatically gets Dockerized, thanks to our nifty ci.yml workflow), there is nothing to make sure that the version named in the tag matches the version listed in these files (it should!)

Describe the solution you'd like

If it is possible, it would be nice if the tag workflow could check that the version mentioned in the tag actually matches the version listed in these files.

It would also be nice if the tag workflow could automatically bump the version in these files after each release. E.g., after you release version 1.2.3, it automatically commits new versions of docker-compose.yml and tool_controller.py with 1.2.4 as the new version number. I think bumpversion does something like this.
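A rough sketch of the check and the bump described above, assuming the version appears in each file as a plain MAJOR.MINOR.PATCH string. The helper names and the sample file contents are hypothetical; a real workflow would read the two files from disk and might rely on a tool such as bumpversion instead.

```python
import re

# Naive pattern for MAJOR.MINOR.PATCH strings (no pre-release suffixes).
SEMVER = re.compile(r"\d+\.\d+\.\d+")


def find_versions(text):
    """Return every MAJOR.MINOR.PATCH string found in the text."""
    return set(SEMVER.findall(text))


def tag_matches(tag, file_texts):
    """True if the tag's version appears in every file's text."""
    version = tag.lstrip("v")  # accept tags written as "v1.2.3" or "1.2.3"
    return all(version in find_versions(text) for text in file_texts.values())


def bump_patch(version):
    """Return the next patch version, e.g. 1.2.3 -> 1.2.4."""
    major, minor, patch = (int(part) for part in version.split("."))
    return f"{major}.{minor}.{patch + 1}"


# Hypothetical file contents, for illustration only.
files = {
    "docker-compose.yml": "image: nlpsandbox/date-annotator-example:1.2.3",
    "tool_controller.py": "version='1.2.3'",
}
```

With these sample contents, tag_matches("1.2.3", files) passes and tag_matches("1.2.4", files) fails, which is the signal a tag workflow could use to stop a mismatched release.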

Identify if the local openapi.yaml can be customized by developers

Or is this file overwritten when updating the codebase when a new version of the OpenAPI spec is available?

If we decide that the developer can customize the local openapi.yaml file, document this in the README

Update .openapi-generator-ignore to minimize number of files to merge after a schemas update

Set up github CI to submit to challenge

should depend on the docker step
step is called nlpsandbox

Add example print message

Update to schemas 1.1.0

See changelog from nlpsandbox/nlpsandbox-schemas#200

Add workflow to check if a new OpenAPI spec has been released

Task

Create a workflow that:

Downloads the OpenAPI specification of the NLP Sandbox Date Annotator (openapi.yaml)
Compares the version of the reference spec with the one stored in the root folder of this repo
If a new version is available, automatically creates a PR that runs openapi-generator to update the API
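The comparison step could look roughly like the sketch below. It assumes the spec version is the first "version:" entry in openapi.yaml and that versions are plain dotted integers; both helper names are hypothetical, and a real workflow would more likely parse the YAML properly than use a regex.

```python
import re


def spec_version(openapi_text):
    """Read the first 'version:' value from an OpenAPI YAML document.

    Naive on purpose: a regex is used so the sketch needs no YAML library.
    """
    match = re.search(
        r"^\s*version:\s*['\"]?([\w.\-]+)['\"]?\s*$", openapi_text, re.MULTILINE
    )
    if match is None:
        raise ValueError("no version field found")
    return match.group(1)


def is_newer(reference, local):
    """True if the reference version is strictly newer than the local one.

    Assumes dotted integers only (e.g. 1.1.0); pre-release tags would fail.
    """
    def as_tuple(version):
        return tuple(int(part) for part in version.split("."))

    return as_tuple(reference) > as_tuple(local)


# Hypothetical spec excerpts standing in for the downloaded and local files.
reference_spec = "openapi: 3.0.3\ninfo:\n  title: Date Annotator\n  version: 1.1.2\n"
local_spec = "openapi: 3.0.3\ninfo:\n  title: Date Annotator\n  version: 1.1.0\n"
update_needed = is_newer(spec_version(reference_spec), spec_version(local_spec))
```

Comparing tuples of integers rather than raw strings avoids ranking 1.10.0 below 1.9.0.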

Invalid license value

ValueError: Invalid value for `license` (Apache-2.0), must be one of ['afl-3.0', 'apache-2.0', 'artistic-2.0', 'bsl-1.0', 'bsd-2-clause', 'bsd-3-clause', 'bsd-3-clause-clear', 'cc', 'cc0-1.0', 'cc-by-4.0', 'cc-by-sa-4.0', 'wtfpl', 'ecl-2.0', 'epl-1.0', 'epl-2.0', 'eupl-1.1', 'agpl-3.0', 'gpl', 'gpl-2.0', 'gpl-3.0', 'lgpl', 'lgpl-2.1', 'lgpl-3.0', 'isc', 'lppl-1.3c', 'ms-pl', 'mit', 'mpl-2.0', 'osl-3.0', 'postgresql', 'ofl-1.1', 'ncsa', 'unlicense', 'zlib']

Is there a possibility to have the schema skip case validation? (Edit: I did some research and found that there isn't a way to specify case-insensitive enums: https://stackoverflow.com/questions/60772786/case-insensitive-string-parameter-in-schema-of-openapi)
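Since the enum itself cannot be made case-insensitive, one possible workaround is to normalize the value before it reaches the validated model. The sketch below is an assumption about how that could be done; normalize_license is a hypothetical helper, and the allowed list is only a subset of the values from the error message.

```python
# Subset of the allowed values listed in the error above, for illustration.
ALLOWED_LICENSES = ("afl-3.0", "apache-2.0", "bsd-3-clause", "mit", "mpl-2.0")


def normalize_license(value, allowed=ALLOWED_LICENSES):
    """Lower-case a license identifier and check it against the allowed values."""
    candidate = value.strip().lower()
    if candidate not in allowed:
        raise ValueError(
            f"Invalid value for `license` ({value}), must be one of {list(allowed)}"
        )
    return candidate


license_id = normalize_license("Apache-2.0")
```

Here "Apache-2.0" is accepted and stored as "apache-2.0", while an unknown identifier still raises ValueError.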

Document how the initial implementation has been created using openapi-generator

Add to the README

Return Service object when querying /

Create a badge that redirects to the Leaderboard page of this NLP Task

The badge could also be used in the Documentation on Synapse

Badge icon: Synapse logo
Badge label: "Leaderboard" or "Benchmark" or "Benchmarked on nlpsandbox.io" (maybe a bit long) or just "nlpsandbox.io"

The request body for /dates has more keys than we receive from the data node server

@tschaffter: It appears that the request body is:

[
  {
    "createdAt": "2020-10-20T03:19:51.087Z",
    "createdBy": {
      "email": "[email protected]",
      "firstName": "John",
      "lastName": "Smith",
      "username": "John78"
    },
    "id": 0,
    "updatedAt": "2020-10-20T03:19:51.087Z",
    "updatedBy": {
      "email": "[email protected]",
      "firstName": "John",
      "lastName": "Smith",
      "username": "John78"
    },
    "text": "On 09-03-1999, Ms Chloe Price met with Dr Joe.",
    "type": "pathology"
  }
]

Unfortunately, we currently only receive the following from the data node server:

{"id": ....
"text":....}

This does indeed work, but I was wondering why these other fields are there. It seems that the only fields required to use this API are id and text. Do I compare the dates with the date results I get from querying the date endpoint in the data node API?

Service object doesn't seem to be returned unless full path specified

curl -X GET "http://10.23.55.45:9000"               
{
  "detail": "The requested URL was not found on the server. If you entered the URL manually please check your spelling and try again.",
  "status": 404,
  "title": "Not Found",
  "type": "about:blank"
}

Use new sage DockerHub service user

Add GH workflow

Task

Lint dockerfile and docker-compose
build and publish docker image

Document how to update codebase when API spec has been updated

Add this documentation to the project README. A section for this doc has already been created.

Review date regex

It seems that the separator (-) can be replaced by -
Consider the example below where the last character is not taken into account because not including it is also valid according to the regex.

See https://regexr.com/5l0op
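One way this kind of truncation can arise is ordered alternation: when a shorter alternative is listed first and nothing anchors the end of the match, the regex engine stops early. The patterns below are hypothetical and are not the ones used in this repository; they only illustrate the effect and one possible fix with word boundaries.

```python
import re

# Hypothetical ISO-like date pattern. The day group tries a single digit
# first, so the match can stop one character early.
LOOSE = re.compile(r"\d{4}-(0?[1-9]|1[0-2])-([1-9]|[12]\d|3[01])")

# Same pattern with word boundaries, which forces the day to be complete.
STRICT = re.compile(r"\b\d{4}-(0?[1-9]|1[0-2])-([1-9]|[12]\d|3[01])\b")

text = "Seen on 1999-03-15."
loose_match = LOOSE.search(text).group(0)    # "1999-03-1"
strict_match = STRICT.search(text).group(0)  # "1999-03-15"
```

Listing the longer alternatives first in the day group would also work here, but the boundary additionally rejects dates embedded in longer digit runs.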

Fix tox issue

Running tox generates the following error:

openapi_server/test/test_date_controller.py F                                                                                                                  [100%]

============================================================================== FAILURES ==============================================================================
_______________________________________________________________ TestDateController.test_dates_read_all _______________________________________________________________

self = <openapi_server.test.test_date_controller.TestDateController testMethod=test_dates_read_all>

        def test_dates_read_all(self):
            """Test case for dates_read_all

            Get all date annotations
            """
            note = {
      "fileName" : "260-01.xml",
      "text" : "October 3, Ms Chloe Price met with...",
      "type" : "pathology",
      "patientPublicId" : ""
    }
            headers = {
                'Accept': 'application/json',
                'Content-Type': 'application/json',
            }
            response = self.client.open(
                '/api/v1/dates',
                method='GET',
                headers=headers,
                data=json.dumps(note),
                content_type='application/json')
>           self.assert200(response,
                           'Response body is : ' + response.data.decode('utf-8'))

openapi_server/test/test_date_controller.py:38:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
.tox/py3/lib/python3.8/site-packages/flask_testing/utils.py:336: in assert200
    self.assertStatus(response, 200, message)
.tox/py3/lib/python3.8/site-packages/flask_testing/utils.py:324: in assertStatus
    self.assertEqual(response.status_code, status_code, message)
E   AssertionError: 400 != 200 : Response body is : {
E     "detail": "{'fileName': '260-01.xml', 'patientPublicId': '', 'text': 'October 3, Ms Chloe Price met with...', 'type': 'pathology'} is not of type 'array'",
E     "status": 400,
E     "title": "Bad Request",
E     "type": "about:blank"
E   }
------------------------------------------------------------------------- Captured log call --------------------------------------------------------------------------
ERROR    connexion.decorators.validation:validation.py:200 http://localhost/api/v1/dates validation error: {'fileName': '260-01.xml', 'patientPublicId': '', 'text': 'October 3, Ms Chloe Price met with...', 'type': 'pathology'} is not of type 'array'
========================================================================== warnings summary ==========================================================================
.tox/py3/lib/python3.8/site-packages/flask/_compat.py:139
  /mnt/c/Users/thoma/Documents/dev/nlp-sandbox-date-annotator-example/src/server/.tox/py3/lib/python3.8/site-packages/flask/_compat.py:139: DeprecationWarning: 'flask.json_available' is deprecated and will be removed in version 2.0.0.
    self._warn()

-- Docs: https://docs.pytest.org/en/latest/warnings.html

----------- coverage: platform linux, python 3.8.5-final-0 -----------
Name                                                 Stmts   Miss  Cover
------------------------------------------------------------------------
openapi_server/__init__.py                               0      0   100%
openapi_server/__main__.py                               9      9     0%
openapi_server/controllers/__init__.py                   0      0   100%
openapi_server/controllers/date_controller.py            9      3    67%
openapi_server/controllers/health_controller.py          6      0   100%
openapi_server/controllers/security_controller_.py       1      1     0%
openapi_server/encoder.py                               16     10    38%
openapi_server/models/__init__.py                        7      0   100%
openapi_server/models/annotation.py                     87     38    56%
openapi_server/models/base_model_.py                    31     16    48%
openapi_server/models/date_annotation.py                60     23    62%
openapi_server/models/entity.py                         51     20    61%
openapi_server/models/health.py                         22      9    59%
openapi_server/models/note.py                           86     37    57%
openapi_server/models/user.py                           50     25    50%
openapi_server/test/__init__.py                         11      0   100%
openapi_server/test/test_date_controller.py             15      1    93%
openapi_server/test/test_health_controller.py           13      1    92%
openapi_server/typing_utils.py                          15     10    33%
openapi_server/util.py                                  56     44    21%
------------------------------------------------------------------------
TOTAL                                                  545    247    55%

=========================================================== 1 failed, 1 passed, 1 warnings in 6.71 seconds ===========================================================
ERROR: InvocationError for command /mnt/c/Users/thoma/Documents/dev/nlp-sandbox-date-annotator-example/src/server/.tox/py3/bin/pytest --cov=openapi_server (exited with code 1)
______________________________________________________________________________ summary _______________________________________________________________________________
ERROR:   py3: commands failed
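The validation error above says the body "is not of type 'array'", which suggests the endpoint expected a list of notes at that revision of the spec. Assuming that reading is correct, a minimal sketch of the adjusted request body is to wrap the note in a list before serializing it:

```python
import json

# Same note as in the failing test.
note = {
    "fileName": "260-01.xml",
    "text": "October 3, Ms Chloe Price met with...",
    "type": "pathology",
    "patientPublicId": "",
}

# Wrap the single note in a list so the body is a JSON array.
body = json.dumps([note])
```

In the test, passing this body as the data argument of self.client.open should get past the array check, although other validation rules of the spec may still apply.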

Fix npx commands in README

Several commands are currently missing "@openapitools/" in "npx @openapitools/openapi-generator-cli"

Run docker scan

Review whether there are any vulnerabilities and whether they can be fixed easily.

Rename app.ini to uwsgi.ini

Minimize the code to review when updating the API server based on a new OpenAPI document

@thomasyu888 Could you try adding an OpenAPI Generator template to this microservice to make its update easier?

Documentation enhancement

The documentation in the README.md states

Evaluate the performance of a local prediction file:

docker run --rm nlpsandbox/cli evaluate

But it does not explain the other options, which are required, nor does it say how to first get or produce these files

docker run --rm nlpsandbox/cli evaluate prediction --help

Usage: nlp-cli evaluate prediction [OPTIONS]

  Evaluate the performance of a local prediction file

Options:
  --pred_filepath PATH            Prediction filepath  [required]
  --gold_filepath PATH            Gold standard filepath  [required]
  --output PATH                   Specify output json path
  --eval_type [date|person|address]
                                  Type of evaluation.
  --help                          Show this message and exit.

(base) ➜  mcw-nlpsandbox-client git:(develop) ✗

Update codebase to return Tool info

Use edge specification

Start Flask app in production mode when container starts

Update server based on latest OpenAPI spec

The OpenAPI of the Date Annotator has been updated this week following a discussion with George.

https://sage-bionetworks.github.io/nlp-sandbox-schemas/date-annotator/develop/openapi.yaml

Fix Docker badge

Now that we are pushing the image to Synapse, we currently can't get the number of pulls. Find an alternative. Also change the URL to the Docker repository on Synapse.

Specify that the controllers developed for the API will not have network access.

Update CI/CD workflow

Test for python 3.7, 3.8 and 3.9

Update to schema version 0.1.2

Troubleshooting Paul's submission

@paulheider forked this date annotator example, manually built the docker image before pushing it to Synapse. He then submitted it and received the following error that he shared with me by email:

Hello Paul-M-Heider, Your submission (id: 9719900) is invalid, below are the invalid reasons: API api/v1/tool endpoint not implemented or implemented incorrectly. Make sure correct tool object is returned. .../api/v1/ui not implemented or implemented incorrectly. Sincerely, Challenge Administrator

@paulheider I successfully performed the above operation and received the scores for the i2b2 and Mayo Clinic datasets (the MCW data site may be down; we are investigating).

I have a few questions:

Did you follow these instructions to build and push the image to Synapse?
Did you submit the Docker image to the queue NLP sandbox - Date Annotator?

A quick look at your fork reveals that you have not yet customized the tool info returned by the endpoint /api/v1/tool. This is good for now, as we should change as little as possible until we figure out what the current issue is.

Enable images in README to render in DockerHub README

https://hub.docker.com/repository/docker/nlpsandbox/date-annotator-example/general

Installation issue with pip

Ran thru a new installation, got past the conda set up of the environment and ran :

(nlp-sandbox-date-annotator-example) ➜ server git:(develop) pip install .
ERROR: Directory '.' is not installable. Neither 'setup.py' nor 'pyproject.toml' found.
(nlp-sandbox-date-annotator-example) ➜ server git:(develop)

This was on a mac 10.15.7 Catalina

(nlp-sandbox-date-annotator-example) ➜ server git:(develop) which pip
/Users/gkowalsk/opt/miniconda3/envs/nlp-sandbox-date-annotator-example/bin/pip
(nlp-sandbox-date-annotator-example) ➜ server git:(develop)

Add badges to README

Update to schemas 1.1.2

See nlpsandbox/nlpsandbox-schemas#213

Changelog

Configure branch main as default
Improve CI workflow based on nlpsandbox-schemas workflow
Push image to docker.synapse.org/syn22277123/date-annotator-example
Update implementation based on schemas 1.1.2
Update README
- Make it more concise if possible
- Describe the GH Secret to configure after forking + other parameter (e.g. ID to personal Synapse project)
Remove branch develop
