
Project Insight

NLP as a Service


Contents

  1. Introduction
  2. Installation
  3. Project Details
  4. License

Introduction

Project Insight is designed to deliver NLP as a service, with a code base for both the front-end GUI (Streamlit) and the back-end server (FastAPI), using transformer models for various downstream NLP tasks.

The downstream NLP tasks covered:

  • News Classification

  • Entity Recognition

  • Sentiment Analysis

  • Summarization

  • Information Extraction (to do)

The user can select different models from the drop-down to run inference.

Users can also call the back-end FastAPI server directly for command-line inference.
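As a sketch of such a direct call, the snippet below builds a predict request against the backend. The /api/v1/&lt;service&gt;/predict route, the host and port, and the payload fields (model, text, query) are inferred from the NGINX logs and the Swagger request body shown later in this page, so treat them as assumptions rather than a documented client API:

```python
import json

# Assumed NGINX entry point, taken from the "localhost:8080" URLs in the logs below.
BASE_URL = "http://localhost:8080"


def build_predict_request(service: str, model: str, text: str, query: str = ""):
    """Build the URL and JSON payload for a predict call.

    The /api/v1/<service>/predict route and the payload fields are
    inferred from the logs and Swagger example elsewhere in this page.
    """
    url = f"{BASE_URL}/api/v1/{service}/predict"
    payload = {"model": model, "text": text, "query": query}
    return url, payload


if __name__ == "__main__":
    import requests  # third-party; pip install requests

    url, payload = build_predict_request(
        "classification", "distilbert", "Sensex climbs 300 points on IT stocks"
    )
    response = requests.post(url, json=payload)
    print(json.dumps(response.json(), indent=2))
```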

Features of the solution

  • Python Code Base: Built using FastAPI and Streamlit, making the complete code base Python.
  • Expandable: The backend is designed so that it can be expanded with more transformer-based models, which then become available in the front-end app automatically.
  • Micro-Services: The backend is designed with a microservices architecture, with a Dockerfile for each service, and leverages NGINX as a reverse proxy to each independently running service.
    • This makes it easy to update, maintain, start, and stop individual NLP services.
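A reverse-proxy setup of this shape can be sketched with the NGINX fragment below. The listen port 8080 matches the URLs in the logs later in this page, but the upstream service names and internal ports are illustrative assumptions, not the repo's actual nginx configuration:

```nginx
# Illustrative only: upstream names and internal ports are assumptions.
http {
    server {
        listen 8080;

        # Each NLP microservice is proxied under its own path prefix,
        # so services can be started and stopped independently.
        location /api/v1/classification/ {
            proxy_pass http://classification:8001;
        }
        location /api/v1/ner/ {
            proxy_pass http://ner:8002;
        }
    }
}
```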

Installation

  • Clone the repo.
  • Run Docker Compose to spin up the FastAPI-based back-end services.
  • Run the Streamlit app with the streamlit run command.

Setup and Documentation

  1. Download the models

    • Download the models from here
    • Save them in the specific model folders inside the src_fastapi folder.
  2. Running the backend service.

    • Go to the src_fastapi folder
    • Run the Docker Compose command
    $ cd src_fastapi
    src_fastapi:~$ sudo docker-compose up -d
  3. Running the frontend app.

    • Go to the src_streamlit folder
    • Run the app with the streamlit run command
    $ cd src_streamlit
    src_streamlit:~$ streamlit run NLPfiy.py
  4. Access the FastAPI documentation: Since this is a microservice-based design, every NLP task has its own separate documentation

Project Details

Demonstration

Project Insight Demo

Directory Details

  • Front End: Front-end code is in the src_streamlit folder, along with its Dockerfile and requirements.txt.

  • Back End: Back-end code is in the src_fastapi folder.

    • This folder contains a directory for each task: classification, ner, summary, etc.
    • Each NLP task is implemented as a microservice, with its own FastAPI server, requirements, and Dockerfile, so they can be independently maintained and managed.
    • Each NLP task has its own folder, and within it each trained model has a folder of its own. For example:
    - sentiment
        > app
            > api
                > distilbert
                    - model.bin
                    - network.py
                    - tokenizer files
                > roberta
                    - model.bin
                    - network.py
                    - tokenizer files
    
    • For each new model under each service, a new folder will have to be added.

    • Each model folder will need the following files:

      • Model bin file.
      • Tokenizer files
      • network.py, defining the model class if a customized model is used.
    • config.json: This file contains the details of the models in the backend and the dataset they are trained on.
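Given the config.json layout shown in the "How to Add a new Model" section, the backend's model listing (what the front-end drop-down ultimately shows) can be sketched as below. The function name and the exact way the backend reads the file are assumptions; only the config structure is taken from this page:

```python
import json


def list_models(config: dict, service: str) -> list:
    """Return the model names registered for one service.

    Assumes the {"<service>": {"model-1": {"name": ..., "info": ...}}}
    layout shown in the config.json example on this page.
    """
    return [entry["name"] for entry in config.get(service, {}).values()]


if __name__ == "__main__":
    # Assumes config.json sits in the working directory.
    with open("config.json") as f:
        config = json.load(f)
    print(list_models(config, "classification"))
```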

How to Add a new Model

  1. Fine-tune a transformer model for the specific task. You can leverage the transformers-tutorials

  2. Save the model files, tokenizer files and also create a network.py script if using a customized training network.

  3. Create a directory within the NLP task with directory_name as the model name and save all the files in this directory.

  4. Update the config.json with the model details and dataset details.

  5. Update <service>pro.py with the correct imports and the conditions where the model is loaded. For example, for a new BERT model in the classification task, do the following:

    • Create a new directory in classification/app/api/ named bert.

    • Update config.json with the following:

      "classification": {
      "model-1": {
          "name": "DistilBERT",
          "info": "This model is trained on News Aggregator Dataset from UC Irvin Machine Learning Repository. The news headlines are classified into 4 categories: **Business**, **Science and Technology**, **Entertainment**, **Health**. [New Dataset](https://archive.ics.uci.edu/ml/datasets/News+Aggregator)"
      },
      "model-2": {
          "name": "BERT",
          "info": "Model Info"
      }
      }
    • Update classificationpro.py with the following snippets:

      Only if a customized class is used:

      from classification.bert import BertClass

      In the section where the model is selected:

      if model == "bert":
          self.model = BertClass()
          self.tokenizer = BertTokenizerFast.from_pretrained(self.path)
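The if/elif selection above can also be organized as a small registry dict, which keeps per-model additions to one line. This is a sketch of the pattern, not the repo's actual classificationpro.py: the class names stand in for the real network.py classes, and the tokenizer loading is commented out since it needs the downloaded files on disk:

```python
class DistilbertClass:  # placeholder for classification/app/api/distilbert/network.py
    pass


class BertClass:  # placeholder for the new classification/app/api/bert/network.py
    pass


# One entry per model directory; adding a model becomes a one-line change here.
MODEL_REGISTRY = {
    "distilbert": DistilbertClass,
    "bert": BertClass,
}


class ClassificationProcessor:
    def __init__(self, model: str):
        if model not in MODEL_REGISTRY:
            raise ValueError(f"Unknown model: {model!r}")
        # Folder holding model.bin and the tokenizer files for this model.
        self.path = f"./app/api/{model}/"
        self.model = MODEL_REGISTRY[model]()
        # self.tokenizer = AutoTokenizer.from_pretrained(self.path)  # needs files on disk
```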

License

This project is licensed under the GPL-3.0 License - see the LICENSE.md file for details

insight's People

Contributors

abhimishra91, dependabot[bot]


insight's Issues

Backend server giving errors on running

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py", line 354, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib/python3.6/http/client.py", line 1264, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1310, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1259, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1038, in _send_output
    self.send(msg)
  File "/usr/lib/python3.6/http/client.py", line 976, in send
    self.connect()
  File "/usr/local/lib/python3.6/dist-packages/docker/transport/unixconn.py", line 43, in connect
    sock.connect(self.unix_socket)
FileNotFoundError: [Errno 2] No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py", line 638, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/usr/local/lib/python3.6/dist-packages/urllib3/util/retry.py", line 368, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/local/lib/python3.6/dist-packages/urllib3/packages/six.py", line 685, in reraise
    raise value.with_traceback(tb)
  File "/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py", line 354, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib/python3.6/http/client.py", line 1264, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1310, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1259, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1038, in _send_output
    self.send(msg)
  File "/usr/lib/python3.6/http/client.py", line 976, in send
    self.connect()
  File "/usr/local/lib/python3.6/dist-packages/docker/transport/unixconn.py", line 43, in connect
    sock.connect(self.unix_socket)
urllib3.exceptions.ProtocolError: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/docker/api/client.py", line 205, in _retrieve_server_version
    return self.version(api_version=False)["ApiVersion"]
  File "/usr/local/lib/python3.6/dist-packages/docker/api/daemon.py", line 181, in version
    return self._result(self._get(url), json=True)
  File "/usr/local/lib/python3.6/dist-packages/docker/utils/decorators.py", line 46, in inner
    return f(self, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/docker/api/client.py", line 228, in _get
    return self.get(url, **self._set_request_timeout(kwargs))
  File "/usr/local/lib/python3.6/dist-packages/requests/sessions.py", line 543, in get
    return self.request('GET', url, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/requests/sessions.py", line 530, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.6/dist-packages/requests/sessions.py", line 643, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/requests/adapters.py", line 498, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/docker-compose", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/compose/cli/main.py", line 67, in main
    command()
  File "/usr/local/lib/python3.6/dist-packages/compose/cli/main.py", line 123, in perform_command
    project = project_from_options('.', options)
  File "/usr/local/lib/python3.6/dist-packages/compose/cli/command.py", line 69, in project_from_options
    environment_file=environment_file
  File "/usr/local/lib/python3.6/dist-packages/compose/cli/command.py", line 132, in get_project
    verbose=verbose, version=api_version, context=context, environment=environment
  File "/usr/local/lib/python3.6/dist-packages/compose/cli/docker_client.py", line 43, in get_client
    environment=environment, tls_version=get_tls_version(environment)
  File "/usr/local/lib/python3.6/dist-packages/compose/cli/docker_client.py", line 170, in docker_client
    client = APIClient(**kwargs)
  File "/usr/local/lib/python3.6/dist-packages/docker/api/client.py", line 188, in __init__
    self._version = self._retrieve_server_version()
  File "/usr/local/lib/python3.6/dist-packages/docker/api/client.py", line 213, in _retrieve_server_version
    'Error while fetching server API version: {0}'.format(e)
docker.errors.DockerException: Error while fetching server API version: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))

Failed to establish a new connection: [Errno 111] Connection refused

Hello! Nice work, it looks very good in the video. Unfortunately, I was not able to make it run. I get this error:

Insight
JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Traceback:
File "c:\users\mbene\miniconda3\lib\site-packages\streamlit\script_runner.py", line 324, in run_script
exec(code, module.dict)
File "C:\Users\mbene\Documents\GitHub\insight-master\insight-master\src_streamlit\NLPfiy.py", line 132, in
main()
File "C:\Users\mbene\Documents\GitHub\insight-master\insight-master\src_streamlit\NLPfiy.py", line 118, in main
model_details = apicall.model_list(service=service)
File "C:\Users\mbene\Documents\GitHub\insight-master\insight-master\src_streamlit\NLPfiy.py", line 24, in model_list
return json.loads(models.text)
File "c:\users\mbene\miniconda3\lib\json_init
.py", line 348, in loads
return _default_decoder.decode(s)
File "c:\users\mbene\miniconda3\lib\json\decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "c:\users\mbene\miniconda3\lib\json\decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None

In all of the services I get the same message (only the URL endpoint changes).

  • I already added the streamlit container to the docker network
  • I already brought the firewall down and back up
  • Created an exception in the firewall
  • Updated docker and docker-compose
  • The error is the same on Linux Ubuntu and on Windows 10
  • Installed the requirements.txt
  • I can't access localhost:8000/docs
  • I also tried docker run with the Dockerfile inside streamlit, and added it to the network

No success so far.

I would appreciate your help. Thank you!

Mauro

No model.bin in the repo

Hi Abhi Mishra

Amazing work. Really like the idea of building NLP services. I'm highly interested in augmenting the models with custom features for sentiment analysis. Is it possible for you to upload the model files (pytorch_model.bin) to try different NLP pipelines for sentiment analysis?

JSONDecodeError

After I run "streamlit run NLPfiy.py" and get the webpage, a JSONDecodeError is raised when I choose the service.
I don't know how to solve it; the error is below.
Could you please help me?
File "/home/test/ENTER/lib/python3.8/site-packages/streamlit/script_runner.py", line 332, in _run_script
exec(code, module.__dict__)
File "/home/test/insight-master/src_streamlit/NLPfiy.py", line 130, in
main()
File "/home/test/insight-master/src_streamlit/NLPfiy.py", line 116, in main
model_details = apicall.model_list(service=service)
File "/home/test/insight-master/src_streamlit/NLPfiy.py", line 24, in model_list
return json.loads(models.text)
File "/home/test/ENTER/lib/python3.8/json/init.py", line 357, in loads
return _default_decoder.decode(s)
File "/home/test/ENTER/lib/python3.8/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/home/test/ENTER/lib/python3.8/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Hello Abhishek
I tried today with the latest changes but it is still not working for me. Now I get "Internal Server Error" using the FastAPI Swagger UI (see picture)

image

If I request a GET info, I get the name and the description of the model (see next picture)

image

The front end presents the same JSON decoder error as yesterday.

image

I tried in windows and linux again.

Internal server error invoking NER predict end point

Hi,

I have all other models working except for NER, which uses spaCy. lexemes.bin seems to be missing. I've used spaCy before, but not with an unpackaged model like this appears to be. Any pointers welcomed.

This is my trace:

nginx_1           | 172.20.0.1 - - [26/Aug/2020:12:29:56 +0000] "GET /api/v1/ner/docs HTTP/1.1" 200 910 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36" "-"
ner_1             | INFO:     172.20.0.6:56800 - "GET /api/v1/ner/openapi.json HTTP/1.0" 200 OK
nginx_1           | 172.20.0.1 - - [26/Aug/2020:12:29:57 +0000] "GET /api/v1/ner/openapi.json HTTP/1.1" 200 2724 "http://localhost:8080/api/v1/ner/docs" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36" "-"
nginx_1           | 172.20.0.1 - - [26/Aug/2020:12:30:15 +0000] "GET /api/v1/ner/info HTTP/1.1" 200 163 "http://localhost:8080/api/v1/ner/docs" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36" "-"
ner_1             | INFO:     172.20.0.6:56802 - "GET /api/v1/ner/info HTTP/1.0" 200 OK
ner_1             | INFO:     172.20.0.6:56810 - "POST /api/v1/ner/predict HTTP/1.0" 500 Internal Server Error
ner_1             | ERROR:    Exception in ASGI application
ner_1             | Traceback (most recent call last):
ner_1             |   File "/usr/local/lib/python3.7/site-packages/uvicorn/protocols/http/httptools_impl.py", line 385, in run_asgi
ner_1             |     result = await app(self.scope, self.receive, self.send)
ner_1             |   File "/usr/local/lib/python3.7/site-packages/uvicorn/middleware/proxy_headers.py", line 45, in __call__
ner_1             |     return await self.app(scope, receive, send)
ner_1             |   File "/usr/local/lib/python3.7/site-packages/fastapi/applications.py", line 140, in __call__
ner_1             |     await super().__call__(scope, receive, send)
ner_1             |   File "/usr/local/lib/python3.7/site-packages/starlette/applications.py", line 134, in __call__
ner_1             |     await self.error_middleware(scope, receive, send)
ner_1             |   File "/usr/local/lib/python3.7/site-packages/starlette/middleware/errors.py", line 178, in __call__
ner_1             |     raise exc from None
ner_1             |   File "/usr/local/lib/python3.7/site-packages/starlette/middleware/errors.py", line 156, in __call__
ner_1             |     await self.app(scope, receive, _send)
ner_1             |   File "/usr/local/lib/python3.7/site-packages/starlette/exceptions.py", line 73, in __call__
ner_1             |     raise exc from None
ner_1             |   File "/usr/local/lib/python3.7/site-packages/starlette/exceptions.py", line 62, in __call__
ner_1             |     await self.app(scope, receive, sender)
ner_1             |   File "/usr/local/lib/python3.7/site-packages/starlette/routing.py", line 590, in __call__
ner_1             |     await route(scope, receive, send)
ner_1             |   File "/usr/local/lib/python3.7/site-packages/starlette/routing.py", line 208, in __call__
ner_1             |     await self.app(scope, receive, send)
ner_1             |   File "/usr/local/lib/python3.7/site-packages/starlette/routing.py", line 41, in app
ner_1             |     response = await func(request)
ner_1             |   File "/usr/local/lib/python3.7/site-packages/fastapi/routing.py", line 127, in app
ner_1             |     raw_response = await dependant.call(**values)
ner_1             |   File "./app/api/ner.py", line 46, in named_entity_recognition
ner_1             |     ner_process = NerProcessor(model=item.model.lower())
ner_1             |   File "./app/api/nerpro.py", line 21, in __init__
ner_1             |     self.model = spacy.load("./app/api/spacy/", disable=["tagger", "parser"])
ner_1             |   File "/usr/local/lib/python3.7/site-packages/spacy/__init__.py", line 21, in load
ner_1             |     return util.load_model(name, **overrides)
ner_1             |   File "/usr/local/lib/python3.7/site-packages/spacy/util.py", line 116, in load_model
ner_1             |     return load_model_from_path(Path(name), **overrides)
ner_1             |   File "/usr/local/lib/python3.7/site-packages/spacy/util.py", line 156, in load_model_from_path
ner_1             |     return nlp.from_disk(model_path)
ner_1             |   File "/usr/local/lib/python3.7/site-packages/spacy/language.py", line 647, in from_disk
ner_1             |     util.from_disk(path, deserializers, exclude)
ner_1             |   File "/usr/local/lib/python3.7/site-packages/spacy/util.py", line 511, in from_disk
ner_1             |     reader(path / key)
ner_1             |   File "/usr/local/lib/python3.7/site-packages/spacy/language.py", line 635, in <lambda>
ner_1             |     self.vocab.from_disk(p) and _fix_pretrained_vectors_name(self))),
ner_1             |   File "vocab.pyx", line 377, in spacy.vocab.Vocab.from_disk
ner_1             |   File "/usr/local/lib/python3.7/pathlib.py", line 1203, in open
ner_1             |     opener=self._opener)
ner_1             |   File "/usr/local/lib/python3.7/pathlib.py", line 1058, in _opener
ner_1             |     return self._accessor.open(self, flags, mode)
ner_1             | FileNotFoundError: [Errno 2] No such file or directory: 'app/api/spacy/vocab/lexemes.bin'
nginx_1           | 172.20.0.1 - - [26/Aug/2020:12:32:00 +0000] "POST /api/v1/ner/predict HTTP/1.1" 500 21 "http://localhost:8080/api/v1/ner/docs" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36" "-"

This is the request body I used from Swagger:

{
  "model": "spaCy",
  "text": "Dense, real valued vectors representing distributional similarity information are now a cornerstone of practical NLP. The most common way to train these vectors is the Word2vec family of algorithms. If you need to train a word2vec model, we recommend the implementation in the Python library Gensim.",
  "query": "string"
}
