
Machine Learning

This project provides a web interface, as well as a programmatic API, for various machine learning algorithms.

Supported algorithms:

Contributing

Please adhere to contributing.md when contributing code. Pull requests that deviate from contributing.md may be labelled invalid, and closed without merging to master. These best practices will help ensure integrity when revisions of code, or issues, need to be reviewed.

Note: support and philanthropy inquiries are welcome, to further assist with development.

Configuration

Fork this project, using one of the following methods:

  • simple clone: clone the remote master branch.
  • commit hash: clone the remote master branch, then checkout a specific commit hash.
  • release tag: clone the remote branch, associated with the desired release tag.

Installation

To proceed with the installation, users will need to decide whether to use the rancher ecosystem, or docker-compose. The former will likely be less reliable, since the corresponding install script may not work consistently across different operating systems. Additionally, this project assumes rancher as the primary method to deploy and run the application; when using the docker-compose alternative, keep track of what the corresponding endpoints should be.

If users choose rancher, both docker and rancher must be installed. Docker must be installed manually, to fulfill a set of dependencies. Once completed, rancher can be installed and automatically configured by executing the provided bash script from the docker quickstart terminal:

cd /path/to/machine-learning
./install-rancher

Note: the installation and configuration of rancher have been outlined separately, if more explicit instructions are needed.

If users choose to forgo rancher, and use docker-compose, then simply install docker, as well as docker-compose. This will allow the application to be deployed from any terminal console:

cd /path/to/machine-learning
docker-compose up

Note: the installation and configuration of docker-compose have been outlined separately, if more explicit instructions are needed.

Execution

Both the web interface and the programmatic API have corresponding unit tests, which can be reviewed and implemented. It is important to remember that the installation method dictates the endpoint. More specifically, if the application was installed via rancher, the endpoint will take the form https://192.168.99.101:XXXX. However, if the docker-compose up alternative was used, the endpoint will likely change to https://localhost:XXXX, or https://127.0.0.1:XXXX.

Web Interface

The web interface can be accessed in the browser at https://192.168.99.101:8080:

[screenshot: web-interface]

The following sessions are available:

  • data_new: store the provided dataset(s) within the implemented sql database.
  • data_append: append additional dataset(s) to an existing representation (from an earlier data_new session), within the implemented sql database.
  • model_generate: using previously stored dataset(s) (from an earlier data_new, or data_append session), generate a corresponding model into the implemented nosql datastore.
  • model_predict: using a previously stored model (from an earlier model_generate session), from the implemented nosql datastore, along with user-supplied values, generate a corresponding prediction.

When using the web interface, it is important to ensure the csv, xml, or json file(s) representing the corresponding dataset(s) are properly formatted. Poorly formatted dataset(s) will fail to create the respective json dataset representation(s); subsequently, the dataset(s) will not be stored into the corresponding database tables. This will prevent any models, and subsequent predictions, from being made.

The following dataset(s) show acceptable syntax:

Note: each dependent variable value (for JSON datasets) is an array (square brackets), since each dependent variable may have multiple observations.
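
For illustration, a minimal JSON dataset could resemble the following sketch; the variable names are hypothetical placeholders, not the exact schema:

import json

# hypothetical dataset: each dependent variable maps to an array of
# observations (square brackets), since it may have multiple values
dataset = {
    'dep-variable-1': [90.5, 88.2, 93.1],
    'dep-variable-2': [44.0, 47.6]
}
print(json.dumps(dataset, indent=4))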

Programmatic Interface

The programmatic interface, or set of APIs, allows users to implement the following sessions:

  • data_new: store the provided dataset(s) within the implemented sql database.
  • data_append: append additional dataset(s) to an existing representation (from an earlier data_new session), within the implemented sql database.
  • model_generate: using previously stored dataset(s) (from an earlier data_new, or data_append session), generate a corresponding model into the implemented nosql datastore.
  • model_predict: using a previously stored model (from an earlier model_generate session), from the implemented nosql datastore, along with user-supplied values, generate a corresponding prediction.

A POST request can be implemented in python as follows:

import requests

# a valid token is obtained from the /login endpoint (see note below)
token = 'your-token-here'
# a properly formatted dataset payload (see acceptable syntax above)
json_string = '{...}'

endpoint = 'https://192.168.99.101:9090/load-data'
headers = {
    'Authorization': 'Bearer ' + token,
    'Content-Type': 'application/json'
}

requests.post(endpoint, headers=headers, data=json_string)

Note: more information, regarding how to obtain a valid token, can be reviewed in the /login documentation.
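
For example, a token could likely be acquired with a similar POST request. The following is a hypothetical sketch: the exact payload fields, and the response key, are assumptions, and should be verified against the /login documentation:

import requests

# hypothetical login: field names and response key are assumptions
response = requests.post(
    'https://192.168.99.101:9090/login',
    json={'username': 'your-username', 'password': 'your-password'}
)
token = response.json().get('token')  # assumed response key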

Note: various data attributes can be nested in the above POST request.

It is important to remember that docker-compose.development.yml defines two port forwards, each assigned to its corresponding reverse proxy. This allows port 8080 on the host to map into the webserver-web container. A similar case for the programmatic-api uses port 9090 on the host.

machine-learning's People

Contributors

jeff1evesque, kumida, protojas, vitao18


machine-learning's Issues

Create 'python/svm_analysis.py'

We will create python/svm_analysis.py, which will parse the POST data, and retrieve the respective SVM model via a mySQL database query. This query will be defined within a method from python/svm_model.py.

Note: the queried Support Vector Machine model is previously defined by the same script python/svm_model.py, when imported and called from python/svm_training.py.

Note: it is important that we escape any parameters received from php/logic_loader.php.
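
A hypothetical sketch of the intended flow follows; the helper name get_model, and the parameter keys, are illustrative assumptions:

import json

# assumed helper: the actual query method lives in 'python/svm_model.py'
from svm_model import get_model

def svm_analysis(post_data):
    # parse, and escape, the POST parameters received from
    # 'php/logic_loader.php' before touching the database
    params = json.loads(post_data)
    model = get_model(params['model_id'])  # mySQL query via svm_model.py
    return model.predict([params['values']])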

Temporary Working Environment

For the remaining week, we have a temporary Windows 7 machine:

Windows edition
Windows 7 Professional
Copyright 2009 Microsoft Corporation. All rights reserved
Service Pack 1

System
Manufacturer: Dell
Model: Precision M4600
Rating: 6.9 Windows Experience Index
Processor: Intel(R) Core(TM) i7-2760QM CPU @ 2.40GHz 2.40 GHz
Installed memory (RAM): 16.0 GB
System type: 64-bit Operating System
Pen and Touch: No Pen or Touch Input is available for the Display

Hard drive
File system: NTFS
Capacity: 237 GB

Unfortunately, we will not be setting up a dual-boot. Also, this machine already has VMware Player 5.0.2 build-1031769 installed. So, we will find the corresponding version of Ubuntu Server to work within the VMware Player.

Since the issue of setting up working environments has come up several times across other projects, it serves as a precursor for something fantastic. More specifically, we will research how to create a bootable USB drive.

Form: Research jquery autocomplete feature

We will research the practicality of the jquery autocomplete, by following a corresponding tutorial. This will be used to auto-suggest values for various input fields within the html form. These suggestions will be dynamically created based on the dataset chosen during training.

Determine Database

We will determine if we need a database scheme to store the data sets.

Add 'html_form.js'

We will initially have html_form.js be responsible for adding additional form fields.

Remove Flask and dependencies

We no longer require the use of Flask. Therefore, we will remove the submodule, and any dependencies needed for Flask:

$ sudo apt-get remove libapache2-mod-wsgi

Form: display submit button

We will only display the form submit button for the following two cases:

  • analysis session: known factors, independent variables have been provided
  • training session: classification, dependent / independent variables have been provided

Properly 'Add more' form elements

A form element, when added, is always inserted as the second element in the field element array. We prefer elements be added to the end of the field array. This will require adjusting html_form_delegator.js.

'html_form.js' creates additional form elements

html_form.js will be responsible for creating additional form fieldsets based on various datalist choices. For example, the choice of Session Type appends the corresponding fieldset, either Training Session, or Analysis Session, to the form DOM structure. Therefore, tests/php/index.php will be responsible for creating an initial form with just one fieldset:

  • Session Type

Next, we will need to remove all form fieldsets within tests/php/index.php except the one listed above. Also, we need to add a reference to html_form.js, since it was earlier renamed to html_form_delegator.js.

Create 'test/index.php' test page

We will create tests/index.php to be our test page for the Support Vector Machines (SVM). This page will pass a few parameters to the imported python/logic_loader.py: the required parameter indicating whether this is a training, or analysis instance (string); an optional object containing XML feed urls; and an optional object containing attributes of XML elements we would like to parse for.

logic_loader.py will delegate the parameters to other python scripts in order to return either a message indicating a successful training session, or a prediction.

Create 'python/logic_loader.php'

We will create python/logic_loader.php, which will be the primary script responsible for importing and instantiating other required python scripts. logic_loader.php may call upon the following scripts:

Note: this script will be passed three parameters:

  • (required) type string, indicating whether this is a training, or analysis instance
  • (optional) type object, containing XML feed urls
  • (optional) type object, containing attributes of XML elements to be parsed

If both optional parameters have not been passed in, and the required parameter indicates training, this script will prompt users to manually input data (i.e. upload a file). The latter option will be defined in this script. But, data_creator.py will not have the ability (yet) to parse the uploaded file.
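
A minimal sketch of this parameter contract, with hypothetical return values for illustration:

def logic_loader(session_type, feed_urls=None, xml_attributes=None):
    # session_type   -- required: 'training' or 'analysis'
    # feed_urls      -- optional: object containing XML feed urls
    # xml_attributes -- optional: attributes of XML elements to parse
    if session_type not in ('training', 'analysis'):
        raise ValueError("session_type must be 'training' or 'analysis'")
    if session_type == 'training' and not (feed_urls or xml_attributes):
        # neither optional parameter supplied: fall back to a manual
        # upload, which data_creator.py cannot yet parse
        return 'prompt-manual-upload'
    return 'delegate-to-downstream-scripts'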

Create backend logic to parse contents of URL

We will create python/xml_retriever.py as a generic class that will take two parameters:

  • the URL of the xml document to parse
  • an object containing the xml attributes to parse from the xml document

This class, after being instantiated, will return an object. The returned object will contain the important xml attributes we wanted to parse for (as defined by the second parameter of this class).
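
A minimal sketch, assuming the two-parameter contract described above; the class name, and method name, are illustrative:

import urllib.request
import xml.etree.ElementTree as ET

class XmlRetriever:
    def __init__(self, url, attributes):
        self.url = url                # URL of the xml document to parse
        self.attributes = attributes  # xml attributes to extract

    def retrieve(self):
        # fetch the document, then keep only elements carrying at
        # least one of the requested attributes
        tree = ET.parse(urllib.request.urlopen(self.url))
        return [
            {name: elem.get(name) for name in self.attributes if elem.get(name)}
            for elem in tree.iter()
            if any(elem.get(name) for name in self.attributes)
        ]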

Form: Add 'Training Type' datalist

Just as we have the <datalist> for the Session type, we need to create another <datalist> for Training Session Type. This field will need to be placed above the fieldset Supply Dataset within tests/php/index.php.

Research 'Support Vector Machines' (wiki)

To gain a better understanding of Support Vector Machines, we will research the following:

SVM Classification

  • General Definition
  • Requirement
  • Other Modeling Techniques (SVM in general)
  • Limitations (curse of dimensionality, applies generally to multi-dimensional models)

SVM Regression

  • Background: Regression Analysis
  • SVM: Regression Analysis

Loss Functions

  • Loss Functions: Quick Summary
  • Loss Functions: SVM Regression
  • Loss Functions: SVM Classification
  • Types of Regression

Form: Add 'Dataset Type' datalist

Just as we have the <datalist> for the Session type, we need to create another <datalist> for Dataset Type. This field will need to be placed above the datalist Training Type, and Supply Dataset within tests/php/index.php.

Adjust 'tests/php/index.php' form

We previously decided to have php as the language that distributes necessary parameters to our python scripts. Therefore, tests/php/index.php will need to be adjusted so that POST data is sent to python/logic_loader.php, not python/logic_loader.py.

Form: Add 'Analysis Model' datalist

Just as we have the <datalist> for the Session type, we need to create another <datalist> for Select Model. This field will need to be placed above the fieldset Known Factors within tests/php/index.php. The element will list all the trained models available for analysis.

Add 'scikit-learn' Submodule

We found the IRC #scikit-learn very useful. As such, we will use this implementation instead of the pyML we had previously considered. We may at a later time decide to use, or mirror the latter project to GitHub.

Form: Display 'Training Type' Fieldset

Currently, a choice on the Dataset Type fieldset creates both Supply Dataset, and Training Type fieldsets. Instead, we will add conditional javascript: when a user provides context for Supply Dataset, the Training Type fieldset will then be created in the DOM.

Mirror / Add 'pyML' Submodule

We will attempt to add pyML from SourceForge as a submodule in this GitHub repository. This will allow us to use the Support Vector Machines. However, this project doesn't exist in GitHub. We will contact the maintainer from SourceForge, Professor Asa Ben-Hur, and offer to mirror the code into GitHub.

The steps to mirror svn to GitHub amount to a fairly simple set of clone commands.

IRC #cmusphinx (08/13/14 ~ 6:40pm EST):

jeffreylevesque: nshm, was it difficult to create a mirror from svn to github, like you did with sphinx libraries?

nshm: no, thats just few commands
nshm: git svn clone
nshm: git push
nshm: there are tutorials on the web
nshm: https://www.atlassian.com/git/migration#!migration-convert

Form: 'Add more' determinant function

We will need a function that counts the number of input fields created by the Add more button, and checks whether all created field elements are defined. This function will be implemented for any fieldset whose creation depends on an input field that can be appended with additional elements. This will ensure that users provide data for all form fields, before succeeding fieldsets are available.

Research scikit-learn syntax and implementation (wiki)

We will determine the syntax requirements for both Support Vector Machine Classification, and Regression within our implementation of scikit-learn.

SVM Classification

  • Preprocessing Data Set
  • Training On Data Set
  • Implement new Classifier

SVM Regression

  • Preprocessing Data Set
  • Training On Data Set
  • Implement new Regression Model
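
A minimal sketch of this workflow, using standard scikit-learn calls on a toy dataset:

from sklearn import preprocessing, svm

# toy dataset: two independent variables per observation
X = [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0], [3.0, 0.5]]
y_class = [0, 0, 1, 1]        # classification labels
y_reg = [0.1, 0.9, 1.8, 3.2]  # regression targets

# preprocessing: scale features to zero mean and unit variance
scaler = preprocessing.StandardScaler().fit(X)
X_scaled = scaler.transform(X)

# SVM classification: train, then implement a new classifier
clf = svm.SVC().fit(X_scaled, y_class)
print(clf.predict(scaler.transform([[1.5, 0.75]])))

# SVM regression: train, then implement a new regression model
reg = svm.SVR().fit(X_scaled, y_reg)
print(reg.predict(scaler.transform([[1.5, 0.75]])))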

Form: Add 'Remove last' button

We will add Remove buttons to the immediate right of the Add more buttons. These will be responsible for removing the last respective field element within the corresponding fieldset in the form.

Documentation: Test Scripts

We need to describe our unit tests, located in the /test/ directory. Then, we need to discuss the various logs this repository produces, and how they may be useful.

HTML additional form fields

Users need to be able to add more than one uploaded file, more than one URL to an xml file for the SVM data set, and more than one dependent or independent variable.
