
f8a-hpf-insights's Introduction


f8a-hpf-insights (maven)

(fabric8-analytics-hpf-insights)

HPF matrix factorization for companion recommendation. HPF stands for Hierarchical Poisson Factorization.


Supported ecosystems:

  • Maven - Last trained at: 2018-08-08 11:30 IST (UTC+5:30)

Build upon:

To run locally via docker-compose:

  • Set up Minio and start the Minio server so that hpf-insights is loaded as a folder inside it on startup. To use AWS S3 instead of Minio, add your AWS S3 credentials in the next step instead of the Minio credentials.
  • Create a .env file and add the credentials to it.
  • In the .env file, leave AWS_S3_ENDPOINT_URL blank to use AWS S3, or set it to http://ip:port to use Minio.
  • source .env
  • docker-compose build
  • docker-compose up
  • curl http://0.0.0.0:6006/ should return status: ok
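
The AWS_S3_ENDPOINT_URL rule from the steps above can be sketched in Python as follows. This is only an illustration of the blank-versus-URL convention; the function names resolve_s3_endpoint and describe_backend are hypothetical and are not taken from this repository.

```python
from typing import Mapping, Optional


def resolve_s3_endpoint(env: Mapping[str, str]) -> Optional[str]:
    """Return the custom endpoint URL, or None to mean 'use AWS S3 itself'."""
    # A missing or blank AWS_S3_ENDPOINT_URL selects AWS S3.
    url = env.get("AWS_S3_ENDPOINT_URL", "").strip()
    return url or None


def describe_backend(env: Mapping[str, str]) -> str:
    """Summarise which storage backend the given environment points at."""
    endpoint = resolve_s3_endpoint(env)
    if endpoint is None:
        return "AWS S3"
    return "Minio at " + endpoint
```

Calling describe_backend(os.environ) after source .env would show which backend the current shell is configured for.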

To run on dev-cluster:

  • cp secret.yaml.template secret.yaml
  • Add your AWS S3 credentials to secret.yaml
  • oc login
  • oc new-project hpf-insights
  • oc create -f secret.yaml
  • oc process -f openshift/template.yaml -o yaml | oc create -f -

    To update template.yaml and redeploy it, run oc process -f openshift/template.yaml -o yaml | oc apply -f - instead. Use apply rather than create for all subsequent re-deployments.

  • Go to your OpenShift console and expose the route
  • curl <route_URL> should return status: ok

Unit Tests

The script runtests.sh runs all unit tests and also reports the unit test coverage.

Usage:

./runtests.sh

To run load testing for recommendation API:

  • pip install locustio==0.8.1
  • Bring up the service.
  • locust -f perf_tests/locust_tests.py --host=<URL of the service>

Footnotes:

Check for all possible issues

The script named check-all.sh checks the sources for all detectable errors and issues. This script can be run without any arguments:

./check-all.sh

Expected script output:

Running all tests and checkers
  Check all BASH scripts
    OK
  Check documentation strings in all Python source file
    OK
  Detect common errors in all Python source file
    OK
  Detect dead code in all Python source file
    OK
  Run Python linter for Python source file
    OK
  Unit tests for this project
    OK
Done

Overall result
  OK

An example of script output when one error is detected:

Running all tests and checkers
  Check all BASH scripts
    Error: please look into files check-bashscripts.log and check-bashscripts.err for possible causes
  Check documentation strings in all Python source file
    OK
  Detect common errors in all Python source file
    OK
  Detect dead code in all Python source file
    OK
  Run Python linter for Python source file
    OK
  Unit tests for this project
    OK
Done

Overall result
  One error detected!

Please note that the script creates a bunch of temporary *.log and *.err files that won't be committed into the project repository.

Coding standards:

  • You can use the scripts run-linter.sh and check-docstyle.sh to check whether the code follows the PEP 8 and PEP 257 coding standards. These scripts can be run without any arguments:
./run-linter.sh
./check-docstyle.sh

The first script checks indentation, line lengths, variable names, whitespace around operators, etc. The second script checks all documentation strings for their presence and format. Please fix any warnings and errors reported by these scripts.

The list of directories containing source code that needs to be checked is stored in the file directories.txt.

Code complexity measurement

The scripts measure-cyclomatic-complexity.sh and measure-maintainability-index.sh are used to measure code complexity. These scripts can be run without any arguments:

./measure-cyclomatic-complexity.sh

and:

./measure-maintainability-index.sh

The first script measures the cyclomatic complexity of all Python sources found in the repository. Please see this table for further explanation of how to interpret the results.

The second script measures the maintainability index of all Python sources found in the repository. Please see the following link for an explanation of this measurement.

You can specify the command line option --fail-on-error if you need to check and use the exit code in your workflow. In this case the script returns 0 when no failures have been found and a non-zero value otherwise.
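
As a minimal sketch of using that exit code from a Python-based workflow step, the helper below runs a command and reports whether it exited with status 0. The helper name check_passed is hypothetical; only the --fail-on-error option and the 0 / non-zero convention come from the description above.

```python
import subprocess
import sys
from typing import Sequence


def check_passed(command: Sequence[str]) -> bool:
    """Run a checker command and report whether it exited with status 0."""
    completed = subprocess.run(list(command))
    # 0 means no failures were found; any other value signals a failure.
    return completed.returncode == 0
```

A workflow step could then call check_passed(["./measure-cyclomatic-complexity.sh", "--fail-on-error"]) and stop when it returns False.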

Dead code detection

The script detect-dead-code.sh can be used to detect dead code in the repository. This script can be run without any arguments:

./detect-dead-code.sh

Please note that due to Python's dynamic nature, static code analyzers are likely to miss some dead code. Also, code that is only called implicitly may be reported as unused.

Because of these potential problems, only code detected with more than 90% confidence is reported.

The list of directories containing source code that needs to be checked is stored in the file directories.txt.

Common issues detection

The script detect-common-errors.sh can be used to detect common errors in the repository. This script can be run without any arguments:

./detect-common-errors.sh

Please note that only semantic problems are reported.

The list of directories containing source code that needs to be checked is stored in the file directories.txt.

Check for scripts written in BASH

The script named check-bashscripts.sh can be used to check all BASH scripts (in fact: all files with the .sh extension) for various possible issues, incompatibilities, and caveats. This script can be run without any arguments:

./check-bashscripts.sh

Please see the following link for further explanation of how ShellCheck works and which issues it can detect.

Code coverage report

Code coverage is reported via codecov.io. The results can be seen at the following address:

code coverage report

Additional links:


f8a-hpf-insights's People

Contributors

abs51295, animuk, arajkumar, dgpatelgit, jmelis, lucky-suman, maorfr, miteshvp, msrb, rootavish, sara-02, sawood14012, tisnik


f8a-hpf-insights's Issues

Errors found by linter in the file training/train.py

train.py:32:9: W291 trailing whitespace
train.py:42:74: W291 trailing whitespace
train.py:44:75: W291 trailing whitespace
train.py:53:75: W291 trailing whitespace
train.py:72:27: W291 trailing whitespace
train.py:84:19: W291 trailing whitespace
train.py:119:62: W291 trailing whitespace
train.py:132:18: W291 trailing whitespace
train.py:150:29: W291 trailing whitespace
train.py:160:27: W291 trailing whitespace
train.py:197:9: E741 ambiguous variable name 'l'
train.py:225:58: W291 trailing whitespace
train.py:232:86: W291 trailing whitespace
train.py:242:65: W291 trailing whitespace
train.py:243:31: E128 continuation line under-indented for visual indent
train.py:244:31: E128 continuation line under-indented for visual indent
train.py:244:89: W291 trailing whitespace
train.py:245:31: E128 continuation line under-indented for visual indent
train.py:275:91: W291 trailing whitespace
train.py:283:1: W391 blank line at end of file

Exercise service with dependencies of a real upstream project

Let's see how the performance tests behave if we ask the service to scan an upstream project with a complex dependency tree.

@sara-02 in the list of dependencies to be sent to the endpoint, are you looking for direct dependencies only, for transitive ones as well, or is it irrelevant?

Thanks!

Karel

Strange implementation of list-files method from local_data_store.py

The method list-files differs from similar methods in other repositories:

  1. the +1 in the len() computation does not seem to be correct (and it is not tested)

  2. it has unused arguments that would make sense for this method:

src/data_store/local_data_store.py
src/data_store/local_data_store.py:22: unused variable 'max_count' (100% confidence)
src/data_store/local_data_store.py:22: unused variable 'prefix' (100% confidence)
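
For discussion, here is a minimal sketch of a file-listing helper that actually honours prefix and max_count. It is not the repository's implementation: the function name list_files and its signature are assumptions, and it uses os.path.relpath rather than any length arithmetic to build the relative paths.

```python
import os
from typing import List, Optional


def list_files(src_dir: str, prefix: Optional[str] = None,
               max_count: Optional[int] = None) -> List[str]:
    """List files under src_dir as sorted relative paths."""
    found = []
    for root, _dirs, files in os.walk(src_dir):
        for name in files:
            # Path relative to src_dir, computed without manual slicing.
            rel = os.path.relpath(os.path.join(root, name), src_dir)
            if prefix is None or rel.startswith(prefix):
                found.append(rel)
    found.sort()
    # Apply the limit only when one was requested.
    return found if max_count is None else found[:max_count]
```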

Update this repository to use Python 3.6 instead of Python 3.4

EPEL repositories now contain proper Python 3.6 packages, and at the same time Python 3.4 is being deprecated [1] [2].

It means that we need to upgrade this repository to use Python 3.6 instead of Python 3.4.

What needs to be changed AND tested:

  • all Dockerfiles
  • CICO setup
  • linter and pydocstyle scripts
  • CI and MI measurement scripts
  • script to start tests

References:
[1] https://lists.fedoraproject.org/archives/list/[email protected]/thread/EGUMKAIMPK2UD5VSHXM53BH2MBDGDWMO/
[2] https://www.reddit.com/r/CentOS/comments/azetyy/python_34_to_be_deprecated_this_month/

Common errors found in the file training/train.py

training/train.py
training/train.py:7: 'itertools' imported but unused
training/train.py:125: local variable 'count' is assigned to but never used
training/train.py:186: local variable 'e' is assigned to but never used
training/train.py:201: local variable 'e' is assigned to but never used
training/train.py:212: local variable 'e' is assigned to but never used

Add logging

Add logging support to help in better debugging.

Error in prod environment during loading & initializing the app

[2019-04-01 11:55:07 +0000] [9] [INFO] Starting gunicorn 19.9.0
[2019-04-01 11:55:07 +0000] [9] [INFO] Listening at: http://0.0.0.0:6006  (9)
[2019-04-01 11:55:07 +0000] [9] [INFO] Using worker: sync
[2019-04-01 11:55:07 +0000] [12] [INFO] Booting worker with pid: 12
[2019-04-01 11:55:16 +0000] [12] [ERROR] Exception in worker process
Traceback (most recent call last):
  File "/usr/lib/python3.4/site-packages/gunicorn/arbiter.py", line 583, in spawn_worker
    worker.init_process()
  File "/usr/lib/python3.4/site-packages/gunicorn/workers/base.py", line 129, in init_process
    self.load_wsgi()
  File "/usr/lib/python3.4/site-packages/gunicorn/workers/base.py", line 138, in load_wsgi
    self.wsgi = self.app.wsgi()
  File "/usr/lib/python3.4/site-packages/gunicorn/app/base.py", line 67, in wsgi
    self.callable = self.load()
  File "/usr/lib/python3.4/site-packages/gunicorn/app/wsgiapp.py", line 52, in load
    return self.load_wsgiapp()
  File "/usr/lib/python3.4/site-packages/gunicorn/app/wsgiapp.py", line 41, in load_wsgiapp
    return util.import_app(self.app_uri)
  File "/usr/lib/python3.4/site-packages/gunicorn/util.py", line 350, in import_app
    __import__(module)
  File "/src/flask_endpoint.py", line 55, in <module>
    app.scoring_object = HPFScoring(datastore=s3_object)
  File "/src/scoring/hpf_scoring.py", line 43, in __init__
    self.loadObjects()
  File "/src/scoring/hpf_scoring.py", line 69, in loadObjects
    0].get("package_list", {}).items()})
KeyError: 0
[2019-04-01 11:55:16 +0000] [12] [INFO] Worker exiting (pid: 12)
[2019-04-01 11:55:16 +0000] [9] [INFO] Shutting down: Master
[2019-04-01 11:55:16 +0000] [9] [INFO] Reason: Worker failed to boot.
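
KeyError: 0 is what Python raises when a dict is indexed with 0, so one possible reading of the traceback is that the loaded data was a dict where a list was expected. The issue does not confirm this, so the following is only a sketch of a defensive accessor under that assumption; the name extract_package_list is hypothetical.

```python
from typing import Any, Dict


def extract_package_list(manifest_data: Any) -> Dict[str, Any]:
    """Return the 'package_list' mapping from data shaped as [{...}] or {...}."""
    if isinstance(manifest_data, list):
        # Expected shape: a list whose first item holds package_list.
        if not manifest_data:
            return {}
        record = manifest_data[0]
    elif isinstance(manifest_data, dict):
        # Assumed alternative shape: the record itself, not wrapped in a list.
        record = manifest_data
    else:
        raise TypeError("unsupported manifest data: %r" % type(manifest_data))
    if not isinstance(record, dict):
        raise TypeError("unsupported manifest record: %r" % type(record))
    return record.get("package_list", {})
```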

Unit test criteria and REST API tests to meet F8A: Definition of Done

The following criteria should be met to satisfy F8A: Definition of Done

  • unit test setup in the repository
  • unit test coverage ~ 50% (let's start with a lower number)
  • the business logic needs to be covered (one function yet to be tested, #47)
  • all REST API endpoints covered by tests

Model gives an error with empty package list

Input:

[
    {
        "ecosystem": "maven",
        "package_list": []

    }
]

Output:

hpf-insights_1  | 2018-08-17 15:11:20,178 flask.app [ERROR] Exception on /api/v1/companion_recommendation [POST]
hpf-insights_1  | Traceback (most recent call last):
hpf-insights_1  |   File "/usr/lib64/python3.4/site-packages/flask/app.py", line 2292, in wsgi_app
hpf-insights_1  |     response = self.full_dispatch_request()
hpf-insights_1  |   File "/usr/lib64/python3.4/site-packages/flask/app.py", line 1815, in full_dispatch_request
hpf-insights_1  |     rv = self.handle_user_exception(e)
hpf-insights_1  |   File "/usr/lib64/python3.4/site-packages/flask_cors/extension.py", line 161, in wrapped_function
hpf-insights_1  |     return cors_after_request(app.make_response(f(*args, **kwargs)))
hpf-insights_1  |   File "/usr/lib64/python3.4/site-packages/flask/app.py", line 1718, in handle_user_exception
hpf-insights_1  |     reraise(exc_type, exc_value, tb)
hpf-insights_1  |   File "/usr/lib64/python3.4/site-packages/flask/_compat.py", line 35, in reraise
hpf-insights_1  |     raise value
hpf-insights_1  |   File "/usr/lib64/python3.4/site-packages/flask/app.py", line 1813, in full_dispatch_request
hpf-insights_1  |     rv = self.dispatch_request()
hpf-insights_1  |   File "/usr/lib64/python3.4/site-packages/flask/app.py", line 1799, in dispatch_request
hpf-insights_1  |     return self.view_functions[rule.endpoint](**req.view_args)
hpf-insights_1  |   File "/src/flask_endpoint.py", line 92, in hpf_scoring
hpf-insights_1  |     input_stack['package_list'])
hpf-insights_1  |   File "/src/scoring/hpf_scoring.py", line 197, in predict
hpf-insights_1  |     if len(missing_packages) / len(input_stack) < UNKNOWN_PACKAGES_THRESHOLD:
hpf-insights_1  | ZeroDivisionError: division by zero
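
The division in the last traceback frame fails because len(input_stack) is 0 for an empty package list. A minimal sketch of a guard follows; the function name has_enough_known_packages is hypothetical, and the threshold value is an assumption made only for illustration, since the real constant is defined in the project's code.

```python
from typing import Iterable, Set

# Assumed value for illustration only; not taken from the repository.
UNKNOWN_PACKAGES_THRESHOLD = 0.3


def has_enough_known_packages(input_stack: Iterable[str],
                              known_packages: Set[str]) -> bool:
    """Return True when the share of unknown packages is below the threshold."""
    stack = set(input_stack)
    if not stack:
        # Empty input: nothing to score, and the ratio would divide by zero.
        return False
    missing_packages = stack - known_packages
    return len(missing_packages) / len(stack) < UNKNOWN_PACKAGES_THRESHOLD
```

With a guard like this, the endpoint could answer an empty package_list with an empty recommendation instead of an unhandled exception.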


Not all source files are checked by linters

Source file                                  | Line count | Linter      | Docstyle
tests/__init__.py                            |     1      | ✓           | ✓
tests/unit_tests/test_flask_endpoint.py      |    45      | ✓           | ✓
tests/unit_tests/test_scoring_hpf_scoring.py |    78      | ✓           | ✓
tests/unit_tests/__init__.py                 |     1      | ✓           | ✓
tests/unit_tests/test_utils.py               |     9      | ✓           | ✓
tests/test_data/__init__.py                  |     1      | ✓           | ✓
tests/test_data/maven/scoring/__init__.py    |     1      | ✓           | ✓
src/__init__.py                              |     1      | ✓           | ✓
src/utils.py                                 |     9      | ✓           | ✓
src/flask_endpoint.py                        |   124      | ✓           | ✓
src/scoring/__init__.py                      |     1      | ✓           | ✓
src/scoring/hpf_scoring.py                   |   205      | ✓           | ✓
src/training/__init__.py                     |     1      | ✓           | ✓
PoC_code/__init__.py                         |     1      | ✓           | ✓
perf_tests/__init__.py                       |     1      | ✓           | ✓
perf_tests/locust_tests.py                   |   110      | ✓           | ✓
deployments/__init__.py                      |     1      | ✓           | ✓
training/train.py                            |   265      | Not checked | Not checked

Add QA-related scripts into separate subdirectory

Currently all QA-related scripts (check-bashscripts.sh, check-docstyle.sh) are stored in the repo's root directory. It might be worth moving them into a separate subdirectory to clean up the root directory a bit.

Missing docstrings in almost all functions in training/train.py

./train.py:1 at module level:
        D100: Missing docstring in public module
./train.py:28 in public function `load_S3`:
        D103: Missing docstring in public function
./train.py:41 in public function `load_data`:
        D103: Missing docstring in public function
./train.py:63 in public function `generate_package_id_dict`:
        D103: Missing docstring in public function
./train.py:76 in public function `generate_manifest_id_dict`:
        D103: Missing docstring in public function
./train.py:88 in public function `preprocess_raw_data`:
        D103: Missing docstring in public function
./train.py:102 in public function `preprocess_data`:
        D103: Missing docstring in public function
./train.py:117 in public function `make_user_item_df`:
        D103: Missing docstring in public function
./train.py:136 in public function `train_test_split`:
        D103: Missing docstring in public function
./train.py:154 in public function `check_unique`:
        D103: Missing docstring in public function
./train.py:161 in public function `frac`:
        D103: Missing docstring in public function
./train.py:167 in public function `extra_df`:
        D103: Missing docstring in public function
./train.py:177 in public function `recall_at_m`:
        D103: Missing docstring in public function
./train.py:192 in public function `precision_at_m`:
        D103: Missing docstring in public function
./train.py:207 in public function `precision_recall_at_m`:
        D103: Missing docstring in public function
./train.py:219 in public function `run_recommender`:
        D103: Missing docstring in public function
./train.py:230 in public function `save_model`:
        D103: Missing docstring in public function
./train.py:240 in public function `save_hyperparams`:
        D103: Missing docstring in public function
./train.py:251 in public function `save_obj`:
        D103: Missing docstring in public function
./train.py:264 in public function `train_model`:
        D103: Missing docstring in public function
