nlpsandbox / date-annotator-example
Example implementation of the NLP Sandbox Date Annotator API
Home Page: https://nlpsandbox.io
License: Apache License 2.0
This likely stems from the s6 logging configuration.
@thomasyu888 The format of the NLP Tool is almost "Gold" and I'll present it on Tuesday during our technical meeting. One of the last elements that we need to figure out is the format of the logs printed to stdout and stderr so we can easily interpret them in our ELK stack. Did you come across a standard or best practices regarding this? I guess that the logs should be prefixed with at least the following information for which we need to define the format:
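One common convention for logs consumed by an ELK stack is one JSON object per line on stdout, which Logstash/Filebeat can parse without a custom grok pattern. Below is a minimal sketch of that idea; the field names (timestamp, level, service, message) and the service name are assumptions for illustration, not an agreed-upon NLP Sandbox format.

```python
# Hypothetical sketch: emit one JSON object per log line so the ELK
# stack can index records directly. Field names are assumptions.
import json
import logging
import sys
from datetime import datetime, timezone

class JsonFormatter(logging.Formatter):
    def format(self, record):
        return json.dumps({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "level": record.levelname,
            "service": "date-annotator-example",  # assumed service name
            "message": record.getMessage(),
        })

handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("date-annotator")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("annotated 12 notes")
```

Whatever format is chosen, keeping it line-delimited and machine-parseable is the main requirement for easy interpretation downstream.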
This example repository for NLP Sandbox Date Annotators should implement all the best practices that come to mind. This repo is expected to be forked by the developers who will use the NLP Sandbox. We could also make this repo available as a GitHub template.
A best practice is to promote a modular implementation of the Date Annotators and other NLP Tools.
I don't have much experience with the development of Python packages / programs. Would a structure like the one below make sense?
openapi-generator
@thomasyu888 @jaeddy What do you think is the best way to achieve this?
It's actually not needed for security reasons; only the Docker network is.
Use a new file named .nlpsandbox-version to track the current version of the NLP Sandbox schemas implemented by this repository.
There is a tool version listed in both docker-compose.yml and tool_controller.py. When you release a version (which automatically gets Dockerized, thanks to our nifty ci.yml workflow), there is nothing to make sure that the version named in the tag matches the version listed in these files (it should!).
If it is possible, it would be nice if the tag workflow could check that the version mentioned in the tag actually matches the version listed in these files.
It would also be nice if the tag workflow could automatically bump the version in these files after each release. E.g., after you release version 1.2.3, it automatically commits new versions of docker-compose.yml and tool_controller.py with 1.2.4 as the new version number. I think bumpversion does something like this.
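The check and the bump could both be sketched in a few lines of Python run by the tag workflow. The file names and the semver-extraction regex below are assumptions based on this repo's layout; bumpversion would handle the rewrite part more robustly.

```python
# Hypothetical sketch: verify a release tag matches the version pinned
# in the tracked files, then compute the next patch version.
import re

def extract_version(text):
    """Return the first semver-looking string found in the text."""
    match = re.search(r"\b\d+\.\d+\.\d+\b", text)
    if match is None:
        raise ValueError("no semver version found")
    return match.group(0)

def check_tag(tag, file_contents):
    """Exit with an error if any tracked file disagrees with the tag."""
    tag_version = tag.lstrip("v")
    for name, content in file_contents.items():
        file_version = extract_version(content)
        if file_version != tag_version:
            raise SystemExit(f"{name} pins {file_version}, tag is {tag_version}")

def next_patch(version):
    """Bump the patch component, as bumpversion's `patch` part would."""
    major, minor, patch = map(int, version.split("."))
    return f"{major}.{minor}.{patch + 1}"

# Example with fake file contents (not the real files):
files = {
    "docker-compose.yml": "image: nlpsandbox/date-annotator-example:1.2.3",
    "tool_controller.py": 'version="1.2.3"',
}
check_tag("v1.2.3", files)   # passes silently on a match
print(next_patch("1.2.3"))   # 1.2.4
```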
Or is this file overwritten when updating the codebase when a new version of the OpenAPI spec is available?
If we decide that the developer can customize the local openapi.yaml file, document this in the README
See changelog from nlpsandbox/nlpsandbox-schemas#200
Create a workflow that
openapi.yaml
ValueError: Invalid value for `license` (Apache-2.0), must be one of ['afl-3.0', 'apache-2.0', 'artistic-2.0', 'bsl-1.0', 'bsd-2-clause', 'bsd-3-clause', 'bsd-3-clause-clear', 'cc', 'cc0-1.0', 'cc-by-4.0', 'cc-by-sa-4.0', 'wtfpl', 'ecl-2.0', 'epl-1.0', 'epl-2.0', 'eupl-1.1', 'agpl-3.0', 'gpl', 'gpl-2.0', 'gpl-3.0', 'lgpl', 'lgpl-2.1', 'lgpl-3.0', 'isc', 'lppl-1.3c', 'ms-pl', 'mit', 'mpl-2.0', 'osl-3.0', 'postgresql', 'ofl-1.1', 'ncsa', 'unlicense', 'zlib']
Is there a possibility to have the schema avoid case validation? (Edit: I did some research and found that there isn't a way to specify case-insensitive enums: https://stackoverflow.com/questions/60772786/case-insensitive-string-parameter-in-schema-of-openapi)
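Since OpenAPI enums are case-sensitive, one server-side workaround is to normalize the value before it reaches validation. This is only a sketch of that idea; treating "Apache-2.0" as equivalent to "apache-2.0" is an assumption about the intended behavior, and the allowed list below is a subset of the enum from the error above.

```python
# Hypothetical workaround: lowercase the license id before validating,
# since the OpenAPI enum cannot be made case-insensitive.
ALLOWED_LICENSES = {"apache-2.0", "mit", "gpl-3.0"}  # subset for brevity

def normalize_license(value):
    """Lowercase so e.g. 'Apache-2.0' validates as 'apache-2.0'."""
    normalized = value.lower()
    if normalized not in ALLOWED_LICENSES:
        raise ValueError(f"Invalid value for `license` ({value})")
    return normalized
```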
Add to the README
The badge could also be used in the Documentation on Synapse
@tschaffter: It appears that the request body is:
[
  {
    "createdAt": "2020-10-20T03:19:51.087Z",
    "createdBy": {
      "email": "[email protected]",
      "firstName": "John",
      "lastName": "Smith",
      "username": "John78"
    },
    "id": 0,
    "updatedAt": "2020-10-20T03:19:51.087Z",
    "updatedBy": {
      "email": "[email protected]",
      "firstName": "John",
      "lastName": "Smith",
      "username": "John78"
    },
    "text": "On 09-03-1999, Ms Chloe Price met with Dr Joe.",
    "type": "pathology"
  }
]
Unfortunately, we currently only receive the below from the data node server:
{"id": ....
"text":....}
This does indeed work, but I was just wondering why there were these other fields. It seems like the only fields that are required to use this API are the id and text. Do I compare the dates with the date results I get from querying the date endpoint in the data node API?
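If the comparison does come down to matching predictions against the gold date annotations pulled from the data node, a plain strict-match baseline can be sketched by treating each annotation as a (start, length) span. The field names follow the DateAnnotation schema; the precision/recall definitions here are an illustrative baseline, not necessarily the benchmark's official metric.

```python
# Hedged sketch: compare predicted vs. gold date annotations by exact
# (start, length) span match and report precision/recall.
def strict_match_scores(gold, predicted):
    gold_spans = {(a["start"], a["length"]) for a in gold}
    pred_spans = {(a["start"], a["length"]) for a in predicted}
    tp = len(gold_spans & pred_spans)  # true positives: exact span matches
    precision = tp / len(pred_spans) if pred_spans else 0.0
    recall = tp / len(gold_spans) if gold_spans else 0.0
    return precision, recall

gold = [{"start": 3, "length": 10}, {"start": 42, "length": 4}]
pred = [{"start": 3, "length": 10}]
print(strict_match_scores(gold, pred))  # (1.0, 0.5)
```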
curl -X GET "http://10.23.55.45:9000"
{
  "detail": "The requested URL was not found on the server. If you entered the URL manually please check your spelling and try again.",
  "status": 404,
  "title": "Not Found",
  "type": "about:blank"
}
Add this documentation to the project README. A section for this doc has already been created.
(-) can be replaced by -
Running tox generates the following error:
openapi_server/test/test_date_controller.py F [100%]
============================================================================== FAILURES ==============================================================================
_______________________________________________________________ TestDateController.test_dates_read_all _______________________________________________________________
self = <openapi_server.test.test_date_controller.TestDateController testMethod=test_dates_read_all>
    def test_dates_read_all(self):
        """Test case for dates_read_all
        Get all date annotations
        """
        note = {
            "fileName": "260-01.xml",
            "text": "October 3, Ms Chloe Price met with...",
            "type": "pathology",
            "patientPublicId": ""
        }
        headers = {
            'Accept': 'application/json',
            'Content-Type': 'application/json',
        }
        response = self.client.open(
            '/api/v1/dates',
            method='GET',
            headers=headers,
            data=json.dumps(note),
            content_type='application/json')
>       self.assert200(response,
            'Response body is : ' + response.data.decode('utf-8'))
openapi_server/test/test_date_controller.py:38:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
.tox/py3/lib/python3.8/site-packages/flask_testing/utils.py:336: in assert200
self.assertStatus(response, 200, message)
.tox/py3/lib/python3.8/site-packages/flask_testing/utils.py:324: in assertStatus
self.assertEqual(response.status_code, status_code, message)
E AssertionError: 400 != 200 : Response body is : {
E "detail": "{'fileName': '260-01.xml', 'patientPublicId': '', 'text': 'October 3, Ms Chloe Price met with...', 'type': 'pathology'} is not of type 'array'",
E "status": 400,
E "title": "Bad Request",
E "type": "about:blank"
E }
------------------------------------------------------------------------- Captured log call --------------------------------------------------------------------------
ERROR connexion.decorators.validation:validation.py:200 http://localhost/api/v1/dates validation error: {'fileName': '260-01.xml', 'patientPublicId': '', 'text': 'October 3, Ms Chloe Price met with...', 'type': 'pathology'} is not of type 'array'
========================================================================== warnings summary ==========================================================================
.tox/py3/lib/python3.8/site-packages/flask/_compat.py:139
/mnt/c/Users/thoma/Documents/dev/nlp-sandbox-date-annotator-example/src/server/.tox/py3/lib/python3.8/site-packages/flask/_compat.py:139: DeprecationWarning: 'flask.json_available' is deprecated and will be removed in version 2.0.0.
self._warn()
-- Docs: https://docs.pytest.org/en/latest/warnings.html
----------- coverage: platform linux, python 3.8.5-final-0 -----------
Name Stmts Miss Cover
------------------------------------------------------------------------
openapi_server/__init__.py 0 0 100%
openapi_server/__main__.py 9 9 0%
openapi_server/controllers/__init__.py 0 0 100%
openapi_server/controllers/date_controller.py 9 3 67%
openapi_server/controllers/health_controller.py 6 0 100%
openapi_server/controllers/security_controller_.py 1 1 0%
openapi_server/encoder.py 16 10 38%
openapi_server/models/__init__.py 7 0 100%
openapi_server/models/annotation.py 87 38 56%
openapi_server/models/base_model_.py 31 16 48%
openapi_server/models/date_annotation.py 60 23 62%
openapi_server/models/entity.py 51 20 61%
openapi_server/models/health.py 22 9 59%
openapi_server/models/note.py 86 37 57%
openapi_server/models/user.py 50 25 50%
openapi_server/test/__init__.py 11 0 100%
openapi_server/test/test_date_controller.py 15 1 93%
openapi_server/test/test_health_controller.py 13 1 92%
openapi_server/typing_utils.py 15 10 33%
openapi_server/util.py 56 44 21%
------------------------------------------------------------------------
TOTAL 545 247 55%
=========================================================== 1 failed, 1 passed, 1 warnings in 6.71 seconds ===========================================================
ERROR: InvocationError for command /mnt/c/Users/thoma/Documents/dev/nlp-sandbox-date-annotator-example/src/server/.tox/py3/bin/pytest --cov=openapi_server (exited with code 1)
______________________________________________________________________________ summary _______________________________________________________________________________
ERROR: py3: commands failed
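The 400 in the log above says the request body "is not of type 'array'": the dates endpoint validates against an array of notes, while the test serializes a single object. Assuming the generated test client shown in the traceback, a minimal fix is to wrap the note in a list before serializing:

```python
# The schema expects an array of notes, not a bare note object.
import json

note = {
    "fileName": "260-01.xml",
    "text": "October 3, Ms Chloe Price met with...",
    "type": "pathology",
    "patientPublicId": "",
}

# In the test, pass data=json.dumps([note]) instead of json.dumps(note).
payload = json.dumps([note])
print(json.loads(payload)[0]["type"])  # pathology
```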
Several commands are currently missing "@OpenAPITools" in "npx @openapitools/openapi-generator-cli"
Review whether there are vulnerabilities and whether they can be fixed easily.
Related to Sage-Bionetworks/research-benchmarking-technology#7
@thomasyu888 Could you try adding an OpenAPI Generator template to this microservice, to see whether it makes updating it easier?
The documentation in the README.md states:
Evaluate the performance of a local prediction file:
docker run --rm nlpsandbox/cli evaluate
but it does not explain the other options, which are required, nor how to first get / produce these files:
docker run --rm nlpsandbox/cli evaluate prediction --help
Usage: nlp-cli evaluate prediction [OPTIONS]
Evaluate the performance of a local prediction file
Options:
--pred_filepath PATH Prediction filepath [required]
--gold_filepath PATH Gold standard filepath [required]
--output PATH Specify output json path
--eval_type [date|person|address]
Type of evaluation.
--help Show this message and exit.
(base) ➜ mcw-nlpsandbox-client git:(develop) ✗
Use edge specification
The OpenAPI of the Date Annotator has been updated this week following a discussion with George.
https://sage-bionetworks.github.io/nlp-sandbox-schemas/date-annotator/develop/openapi.yaml
Now that we are pushing the image to Synapse, we currently can't get the number of pulls. Find an alternative. Also change the URL to the Docker repository on Synapse.
@paulheider forked this date annotator example, manually built the docker image before pushing it to Synapse. He then submitted it and received the following error that he shared with me by email:
Hello Paul-M-Heider, Your submission (id: 9719900) is invalid, below are the invalid reasons: API api/v1/tool endpoint not implemented or implemented incorrectly. Make sure correct tool object is returned. .../api/v1/ui not
implemented or implemented incorrectly. Sincerely, Challenge Administrator
@paulheider I successfully performed the above operation and received the scores for the i2b2 and Mayo Clinic dataset (MCW data site may be down, we are investigating).
I have a few questions:
NLP sandbox - Date Annotator
A quick look at your fork reveals that you have not yet customized the tool info returned by the endpoint /api/v1/tool. This is good for now, as we should change as little as possible before we figure out what the current issue is.
Ran through a new installation, got past the conda setup of the environment, and ran:
(nlp-sandbox-date-annotator-example) ➜ server git:(develop) pip install .
ERROR: Directory '.' is not installable. Neither 'setup.py' nor 'pyproject.toml' found.
(nlp-sandbox-date-annotator-example) ➜ server git:(develop)
This was on a mac 10.15.7 Catalina
(nlp-sandbox-date-annotator-example) ➜ server git:(develop) which pip
/Users/gkowalsk/opt/miniconda3/envs/nlp-sandbox-date-annotator-example/bin/pip
(nlp-sandbox-date-annotator-example) ➜ server git:(develop)
See nlpsandbox/nlpsandbox-schemas#213
Use main as the default branch instead of develop