
date-annotator-example's Introduction

nlpsandbox

Home repository

date-annotator-example's People

Contributors

dependabot[bot], github-actions[bot], gkowalski, thomasyu888, tschaffter


date-annotator-example's Issues

Clarify logging

  • Output from print() does not appear in stdout
  • Output from logging.info() does appear in stdout

This likely stems from the s6 logging configuration.

Identify format of the logs so they can be interpreted by ELK

@thomasyu888 The format of the NLP Tool is almost "Gold" and I'll present it on Tuesday during our technical meeting. One of the last elements that we need to figure out is the format of the logs printed to stdout and stderr so we can easily interpret them in our ELK stack. Did you come across a standard or best practices regarding this? I guess that the logs should be prefixed with at least the following information for which we need to define the format:

  • timestamp
  • message type (e.g. info, warning, error)
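One possible shape for such log lines is a JSON object per line, which log shippers in an ELK stack can usually parse without custom patterns. The sketch below uses only the Python standard library. The field names (`timestamp`, `level`, `logger`, `message`), the logger name and the `build_logger` helper are assumptions made for illustration, not a format agreed on in this issue.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line."""

    def format(self, record):
        entry = {
            # ISO 8601 timestamp in UTC, taken from the record creation time
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            # message type, e.g. info, warning, error
            "level": record.levelname.lower(),
            "logger": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(entry)


def build_logger(name="date-annotator", stream=None):
    """Return a logger that writes JSON lines to stdout (or a given stream)."""
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]  # replace handlers so lines are not duplicated
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    build_logger().info("Date annotator started")
```

Keeping the timestamp and level as separate fields means they can be indexed directly, instead of being recovered from a text prefix.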

Figure out the best way to structure the folder src/

Motivation

This example repository for NLP Sandbox Date Annotators should implement all the best practices that come to our mind. This repo is expected to be forked by the Developers who will use the NLP Sandbox. We could also make this repo available as a GH template.

A best practice is to promote a modular implementation of the Date Annotators and other NLP Tools.

I don't have much experience with the development of Python packages / programs. Would a structure like the one below make sense?

  • src/library: Python library that implements the logic of the Date Annotator (and optionally publish it on PyPI?)
  • src/server: A Python-Flask server that implements the NLP Sandbox OpenAPI spec of the Date Annotator and uses the above library for the core business logic. The current content has been generated using openapi-generator.
  • src/cli: A command line interface program for the library (optional)?
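As a rough illustration of the split proposed above, the sketch below keeps the annotation logic in a plain function (the "library" part) and wraps it in a thin command line entry point (the "cli" part). A server would call the same function. The names `annotate_dates` and `main`, the returned dictionary keys and the simple DD-DD-DDDD pattern are all hypothetical. In a real layout the two parts would live in separate modules.

```python
import argparse
import re

# --- library part: pure logic, no I/O ---------------------------------------
# Hypothetical pattern: matches dates such as 09-03-1999 only.
DATE_PATTERN = re.compile(r"\b\d{2}-\d{2}-\d{4}\b")


def annotate_dates(text):
    """Return one annotation dict per date found in the text."""
    return [
        {"start": m.start(), "length": len(m.group()), "text": m.group()}
        for m in DATE_PATTERN.finditer(text)
    ]


# --- cli part: thin wrapper around the library ------------------------------
def main(argv=None):
    parser = argparse.ArgumentParser(description="Annotate dates in a text")
    parser.add_argument("text", help="text to annotate")
    args = parser.parse_args(argv)
    for annotation in annotate_dates(args.text):
        print(annotation)


if __name__ == "__main__":
    main()
```

Because the library function does no I/O, it can be unit-tested and published independently of the server and the CLI.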

@thomasyu888 @jaeddy What do you think is the best way to achieve this?

Remove nginx from the stack

nginx is not actually needed for security reasons; only the Docker network is.

  • Update the README to reflect this change

Automatically Validate and/or Bump Version

Is your proposal related to a problem?

There is a tool version listed in both docker-compose.yml and tool_controller.py. When you release a version (which automatically gets Dockerized, thanks to our nifty ci.yml workflow), there is nothing to make sure that the version named in the tag matches the version listed in these files (it should!)

Describe the solution you'd like

If it is possible, it would be nice if the tag workflow could check that the version mentioned in the tag actually matches the version listed in these files.

It would also be nice if the tag workflow could automatically bump the version in these files after each release. E.g., after you release version 1.2.3, it automatically commits new versions of docker-compose.yml and tool_controller.py with 1.2.4 as the new version number. I think bumpversion does something like this.
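The check and the bump could be sketched in plain Python as below, as an alternative to a dedicated tool. This assumes that tags look like `1.2.3` or `v1.2.3` and that each file contains the version as a literal `MAJOR.MINOR.PATCH` string. The helper names are hypothetical, and the file contents are passed in as strings so the logic stays independent of the repository layout.

```python
import re

# Matches semantic-version-like strings such as 1.2.3
VERSION_RE = re.compile(r"\b\d+\.\d+\.\d+\b")


def tag_to_version(tag):
    """Strip an optional leading 'v' from a tag name."""
    return tag[1:] if tag.startswith("v") else tag


def mismatched_files(tag, files):
    """Return the names of files that do not mention the tagged version.

    `files` maps a file name to its text content.
    """
    version = tag_to_version(tag)
    return [
        name
        for name, text in files.items()
        if version not in VERSION_RE.findall(text)
    ]


def bump_patch(version):
    """Return the next patch version, e.g. 1.2.3 -> 1.2.4."""
    major, minor, patch = (int(part) for part in version.split("."))
    return f"{major}.{minor}.{patch + 1}"
```

A tag workflow could fail the release when `mismatched_files` returns a non-empty list, and use `bump_patch` to prepare the follow-up commit.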

Add workflow to check if a new OpenAPI spec has been released

Task

Create a workflow that:

  1. Downloads the OpenAPI specification of the NLP Sandbox Date Annotator (openapi.yaml)
  2. Compares the version of the reference spec with the one stored in the root folder of this repo
  3. If a new version is available, automatically creates a PR that runs openapi-generator to update the API
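The comparison in step 2 could look roughly like the sketch below. To stay dependency-free it reads `info.version` with a naive line scan rather than a YAML parser, and it assumes purely numeric dotted versions. Both are simplifications, and the function names are hypothetical. Downloading the spec and opening the PR are left to the workflow itself.

```python
import re


def spec_version(openapi_text):
    """Return the info.version value from OpenAPI YAML text, or None.

    Naive scan: looks for a 'version:' key inside the top-level 'info:' block.
    """
    in_info = False
    for line in openapi_text.splitlines():
        if re.match(r"^info:\s*$", line):
            in_info = True
            continue
        if in_info:
            if line and not line[0].isspace():
                break  # reached the next top-level key
            match = re.match(
                r"^\s+version:\s*['\"]?([^'\"\s]+)['\"]?\s*$", line
            )
            if match:
                return match.group(1)
    return None


def version_key(version):
    """Turn '1.10.2' into (1, 10, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def update_available(reference_text, local_text):
    """True if the reference spec is newer than the local copy."""
    return version_key(spec_version(reference_text)) > version_key(
        spec_version(local_text)
    )
```

Comparing tuples of integers rather than raw strings avoids ranking 1.10.0 below 1.9.0.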

Invalid license value

ValueError: Invalid value for `license` (Apache-2.0), must be one of ['afl-3.0', 'apache-2.0', 'artistic-2.0', 'bsl-1.0', 'bsd-2-clause', 'bsd-3-clause', 'bsd-3-clause-clear', 'cc', 'cc0-1.0', 'cc-by-4.0', 'cc-by-sa-4.0', 'wtfpl', 'ecl-2.0', 'epl-1.0', 'epl-2.0', 'eupl-1.1', 'agpl-3.0', 'gpl', 'gpl-2.0', 'gpl-3.0', 'lgpl', 'lgpl-2.1', 'lgpl-3.0', 'isc', 'lppl-1.3c', 'ms-pl', 'mit', 'mpl-2.0', 'osl-3.0', 'postgresql', 'ofl-1.1', 'ncsa', 'unlicense', 'zlib']

Is there a possibility to have the schema skip case validation? (Edit: I did some research and found that there isn't a way to specify case-insensitive enums: https://stackoverflow.com/questions/60772786/case-insensitive-string-parameter-in-schema-of-openapi)
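Since the schema itself cannot be made case-insensitive, one workaround is to normalize the value before it reaches validation. The sketch below lowercases the input and checks it against the allowed values listed in the error message above. The helper name `normalize_license` is hypothetical, and where such a step would sit (client side or in the controller) is left open.

```python
# Allowed values copied from the ValueError message above
ALLOWED_LICENSES = {
    "afl-3.0", "apache-2.0", "artistic-2.0", "bsl-1.0", "bsd-2-clause",
    "bsd-3-clause", "bsd-3-clause-clear", "cc", "cc0-1.0", "cc-by-4.0",
    "cc-by-sa-4.0", "wtfpl", "ecl-2.0", "epl-1.0", "epl-2.0", "eupl-1.1",
    "agpl-3.0", "gpl", "gpl-2.0", "gpl-3.0", "lgpl", "lgpl-2.1", "lgpl-3.0",
    "isc", "lppl-1.3c", "ms-pl", "mit", "mpl-2.0", "osl-3.0", "postgresql",
    "ofl-1.1", "ncsa", "unlicense", "zlib",
}


def normalize_license(value):
    """Lowercase a license identifier and check it against the enum."""
    key = value.strip().lower()
    if key not in ALLOWED_LICENSES:
        raise ValueError(f"Invalid value for `license` ({value})")
    return key
```

With this in place, `Apache-2.0` is accepted and stored as `apache-2.0`.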

The request body for /dates has more keys than we receive from the data node server

@tschaffter: It appears that the request body is:

[
  {
    "createdAt": "2020-10-20T03:19:51.087Z",
    "createdBy": {
      "email": "[email protected]",
      "firstName": "John",
      "lastName": "Smith",
      "username": "John78"
    },
    "id": 0,
    "updatedAt": "2020-10-20T03:19:51.087Z",
    "updatedBy": {
      "email": "[email protected]",
      "firstName": "John",
      "lastName": "Smith",
      "username": "John78"
    },
    "text": "On 09-03-1999, Ms Chloe Price met with Dr Joe.",
    "type": "pathology"
  }
]

Unfortunately, we currently only receive the following from the data node server:

{"id": ....
"text":....}

This does work, but I was wondering why these other fields are there. It seems that the only fields required to use this API are id and text. Should I compare the dates with the results I get from querying the date endpoint in the data node API?

Add GH workflow

Task

  • Lint dockerfile and docker-compose
  • build and publish docker image

Review date regex

  • It seems that the separator (-) can be replaced by -
  • Consider the example below where the last character is not taken into account because not including it is also valid according to the regex.

image

See https://regexr.com/5l0op
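The truncation described above is typical of a pattern whose shorter alternative is tried first and which is not anchored at the end. The sketch below reproduces that effect with a hypothetical pattern (not the regex used in this repository) and shows one way to avoid it: put the longer alternative first and add a word boundary.

```python
import re

TEXT = "On 09-03-1999, Ms Chloe Price met with Dr Joe."

# Hypothetical pattern: the two-digit year is tried first and nothing forces
# the match to continue, so the match stops early.
LOOSE = re.compile(r"\d{2}-\d{2}-(?:\d{2}|\d{4})")

# Longer alternative first, plus word boundaries on both sides.
STRICT = re.compile(r"\b\d{2}-\d{2}-(?:\d{4}|\d{2})\b")


def first_match(pattern, text):
    """Return the first matched substring, or None."""
    match = pattern.search(text)
    return match.group(0) if match else None


if __name__ == "__main__":
    print(first_match(LOOSE, TEXT))   # truncated year
    print(first_match(STRICT, TEXT))  # full date
```

The trailing `\b` alone is enough to force backtracking to the four-digit year. Reordering the alternatives just makes the intent explicit.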

Fix tox issue

Running tox generates the following error:

openapi_server/test/test_date_controller.py F                                                                                                                  [100%]

============================================================================== FAILURES ==============================================================================
_______________________________________________________________ TestDateController.test_dates_read_all _______________________________________________________________

self = <openapi_server.test.test_date_controller.TestDateController testMethod=test_dates_read_all>

        def test_dates_read_all(self):
            """Test case for dates_read_all

            Get all date annotations
            """
            note = {
      "fileName" : "260-01.xml",
      "text" : "October 3, Ms Chloe Price met with...",
      "type" : "pathology",
      "patientPublicId" : ""
    }
            headers = {
                'Accept': 'application/json',
                'Content-Type': 'application/json',
            }
            response = self.client.open(
                '/api/v1/dates',
                method='GET',
                headers=headers,
                data=json.dumps(note),
                content_type='application/json')
>           self.assert200(response,
                           'Response body is : ' + response.data.decode('utf-8'))

openapi_server/test/test_date_controller.py:38:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
.tox/py3/lib/python3.8/site-packages/flask_testing/utils.py:336: in assert200
    self.assertStatus(response, 200, message)
.tox/py3/lib/python3.8/site-packages/flask_testing/utils.py:324: in assertStatus
    self.assertEqual(response.status_code, status_code, message)
E   AssertionError: 400 != 200 : Response body is : {
E     "detail": "{'fileName': '260-01.xml', 'patientPublicId': '', 'text': 'October 3, Ms Chloe Price met with...', 'type': 'pathology'} is not of type 'array'",
E     "status": 400,
E     "title": "Bad Request",
E     "type": "about:blank"
E   }
------------------------------------------------------------------------- Captured log call --------------------------------------------------------------------------
ERROR    connexion.decorators.validation:validation.py:200 http://localhost/api/v1/dates validation error: {'fileName': '260-01.xml', 'patientPublicId': '', 'text': 'October 3, Ms Chloe Price met with...', 'type': 'pathology'} is not of type 'array'
========================================================================== warnings summary ==========================================================================
.tox/py3/lib/python3.8/site-packages/flask/_compat.py:139
  /mnt/c/Users/thoma/Documents/dev/nlp-sandbox-date-annotator-example/src/server/.tox/py3/lib/python3.8/site-packages/flask/_compat.py:139: DeprecationWarning: 'flask.json_available' is deprecated and will be removed in version 2.0.0.
    self._warn()

-- Docs: https://docs.pytest.org/en/latest/warnings.html

----------- coverage: platform linux, python 3.8.5-final-0 -----------
Name                                                 Stmts   Miss  Cover
------------------------------------------------------------------------
openapi_server/__init__.py                               0      0   100%
openapi_server/__main__.py                               9      9     0%
openapi_server/controllers/__init__.py                   0      0   100%
openapi_server/controllers/date_controller.py            9      3    67%
openapi_server/controllers/health_controller.py          6      0   100%
openapi_server/controllers/security_controller_.py       1      1     0%
openapi_server/encoder.py                               16     10    38%
openapi_server/models/__init__.py                        7      0   100%
openapi_server/models/annotation.py                     87     38    56%
openapi_server/models/base_model_.py                    31     16    48%
openapi_server/models/date_annotation.py                60     23    62%
openapi_server/models/entity.py                         51     20    61%
openapi_server/models/health.py                         22      9    59%
openapi_server/models/note.py                           86     37    57%
openapi_server/models/user.py                           50     25    50%
openapi_server/test/__init__.py                         11      0   100%
openapi_server/test/test_date_controller.py             15      1    93%
openapi_server/test/test_health_controller.py           13      1    92%
openapi_server/typing_utils.py                          15     10    33%
openapi_server/util.py                                  56     44    21%
------------------------------------------------------------------------
TOTAL                                                  545    247    55%

=========================================================== 1 failed, 1 passed, 1 warnings in 6.71 seconds ===========================================================
ERROR: InvocationError for command /mnt/c/Users/thoma/Documents/dev/nlp-sandbox-date-annotator-example/src/server/.tox/py3/bin/pytest --cov=openapi_server (exited with code 1)
______________________________________________________________________________ summary _______________________________________________________________________________
ERROR:   py3: commands failed
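The 400 response says that the request body "is not of type 'array'", so the generated test sends a single note object where the endpoint's schema expects a list. One possible fix, assuming the schema is what is intended, is to wrap the note in a list before serializing it. The helper name below is hypothetical, and the sketch only covers building the payload, not the Flask test client call.

```python
import json

# Example note taken from the failing test above
note = {
    "fileName": "260-01.xml",
    "text": "October 3, Ms Chloe Price met with...",
    "type": "pathology",
    "patientPublicId": "",
}


def build_request_body(notes):
    """Serialize a sequence of notes as a JSON array."""
    return json.dumps(list(notes))


# Pass a list of notes rather than a bare dict
body = build_request_body([note])
```

If the endpoint is instead meant to accept a single note, the mismatch lies in the spec or the generated test, and both would need to be regenerated together.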

Run docker scan

Check whether there are vulnerabilities and whether they can be fixed easily.

Documentation enhancement

The documentation in the README.md states

Evaluate the performance of a local prediction file:

docker run --rm nlpsandbox/cli evaluate

But it does not explain the other options, some of which are required, nor does it say how to first get or produce these files:

docker run --rm nlpsandbox/cli evaluate prediction --help

Usage: nlp-cli evaluate prediction [OPTIONS]

  Evaluate the performance of a local prediction file

Options:
  --pred_filepath PATH            Prediction filepath  [required]
  --gold_filepath PATH            Gold standard filepath  [required]
  --output PATH                   Specify output json path
  --eval_type [date|person|address]
                                  Type of evaluation.
  --help                          Show this message and exit.


Fix Docker badge

Now that we are pushing the image to Synapse, we currently can't get the number of pulls. Find an alternative. Also change the URL to the Docker repository on Synapse.

Troubleshooting Paul's submission

@paulheider forked this date annotator example and manually built the Docker image before pushing it to Synapse. He then submitted it and received the following error, which he shared with me by email:

Hello Paul-M-Heider, Your submission (id: 9719900) is invalid, below are the invalid reasons: API api/v1/tool endpoint not implemented or implemented incorrectly. Make sure correct tool object is returned. .../api/v1/ui not implemented or implemented incorrectly. Sincerely, Challenge Administrator

@paulheider I successfully performed the above operation and received the scores for the i2b2 and Mayo Clinic dataset (MCW data site may be down, we are investigating).

I have a few questions:

  • Did you follow these instructions to build and push the image to Synapse?
  • Did you submit the Docker image to the queue NLP sandbox - Date Annotator?

A quick look at your fork reveals that you have not yet customized the tool info returned by the endpoint /api/v1/tool. This is good for now, as we should change as little as possible until we figure out what the current issue is.

Installation issue with pip

I ran through a new installation, got past the conda setup of the environment, and ran:

(nlp-sandbox-date-annotator-example) ➜ server git:(develop) pip install .
ERROR: Directory '.' is not installable. Neither 'setup.py' nor 'pyproject.toml' found.
(nlp-sandbox-date-annotator-example) ➜ server git:(develop)

This was on macOS 10.15.7 (Catalina).

(nlp-sandbox-date-annotator-example) ➜ server git:(develop) which pip
/Users/gkowalsk/opt/miniconda3/envs/nlp-sandbox-date-annotator-example/bin/pip
(nlp-sandbox-date-annotator-example) ➜ server git:(develop)

Update to schemas 1.1.2

See nlpsandbox/nlpsandbox-schemas#213

Changelog

  • Configure branch main as default
  • Improve CI workflow based on nlpsandbox-schemas workflow
  • Push image to docker.synapse.org/syn22277123/date-annotator-example
  • Update implementation based on schemas 1.1.2
  • Update README
    • Make it more concise if possible
    • Describe the GH Secret to configure after forking + other parameter (e.g. ID to personal Synapse project)
  • Remove branch develop
