
Machine Learning

This project provides a web interface, as well as a programmatic API, for various machine learning algorithms.

Supported algorithms:

Contributing

Please adhere to contributing.md when contributing code. Pull requests that deviate from contributing.md may be labelled invalid, and closed without merging to master. These best practices will help ensure integrity when revisions of code, or issues, need to be reviewed.

Note: support and philanthropy inquiries are welcome, to further assist with development.

Configuration

Fork this project, using one of the following methods:

  • simple clone: clone the remote master branch.
  • commit hash: clone the remote master branch, then checkout a specific commit hash.
  • release tag: clone the remote branch, associated with the desired release tag.

Installation

To proceed with the installation, users will need to decide whether to use the rancher ecosystem, or docker-compose. The former will likely be less reliable, since the corresponding install script may not work consistently across different operating systems. Additionally, this project assumes rancher as the primary method to deploy and run the application; when using the docker-compose alternative, keep track of what the corresponding endpoints should be.

If users choose rancher, both docker and rancher must be installed. Docker must be installed manually, to fulfill a set of dependencies. Once completed, rancher can be installed and automatically configured by executing the provided bash script from the docker quickstart terminal:

cd /path/to/machine-learning
./install-rancher

Note: the installation and configuration of rancher have been outlined separately, if more explicit instructions are needed.

If users choose to forgo rancher, and use docker-compose, then simply install docker, as well as docker-compose. This will allow the application to be deployed from any terminal console:

cd /path/to/machine-learning
docker-compose up

Note: the installation and configuration of docker-compose have been outlined separately, if more explicit instructions are needed.

Execution

Both the web interface and the programmatic API have corresponding unit tests, which can be reviewed and implemented. It is important to remember that the installation method dictates the endpoint. More specifically, if the application was installed via rancher, the endpoint will take the form https://192.168.99.101:XXXX. However, if the docker-compose up alternative was used, the endpoint will likely change to https://localhost:XXXX, or https://127.0.0.1:XXXX.

Web Interface

The web interface can be accessed in the browser at https://192.168.99.101:8080:

[screenshot: web-interface]

The following sessions are available:

  • data_new: store the provided dataset(s) within the implemented sql database.
  • data_append: append additional dataset(s) to an existing representation (from an earlier data_new session), within the implemented sql database.
  • model_generate: using previously stored dataset(s) (from an earlier data_new, or data_append session), generate a corresponding model into the implemented nosql datastore.
  • model_predict: using a previously stored model (from an earlier model_generate session), from the implemented nosql datastore, along with user-supplied values, generate a corresponding prediction.

When using the web interface, it is important to ensure the csv, xml, or json file(s) representing the corresponding dataset(s) are properly formatted. Poorly formatted dataset(s) will fail to create the respective json dataset representation(s); subsequently, the dataset(s) will not be stored into the corresponding database tables. This will prevent any models, and subsequent predictions, from being made.

The following dataset(s) show acceptable syntax:

Note: each dependent variable value (for JSON datasets) is an array (square brackets), since each dependent variable may have multiple observations.
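
For illustration, a minimal JSON dataset could resemble the following sketch; the variable names are hypothetical placeholders, not the exact schema:

import json

# hypothetical dataset: each dependent variable maps to an array of
# observations (square brackets), since it may have multiple values
dataset = {
    'dep-variable-1': [90.5, 88.2, 93.1],
    'dep-variable-2': [44.0, 47.6]
}
print(json.dumps(dataset, indent=4))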

Programmatic Interface

The programmatic interface, or set of APIs, allows users to implement the following sessions:

  • data_new: store the provided dataset(s) within the implemented sql database.
  • data_append: append additional dataset(s) to an existing representation (from an earlier data_new session), within the implemented sql database.
  • model_generate: using previously stored dataset(s) (from an earlier data_new, or data_append session), generate a corresponding model into the implemented nosql datastore.
  • model_predict: using a previously stored model (from an earlier model_generate session), from the implemented nosql datastore, along with user-supplied values, generate a corresponding prediction.

A POST request can be implemented in python as follows:

import requests

# a valid token is obtained from the /login endpoint (see note below)
token = 'your-token-here'
# a properly formatted dataset payload (see acceptable syntax above)
json_string = '{...}'

endpoint = 'https://192.168.99.101:9090/load-data'
headers = {
    'Authorization': 'Bearer ' + token,
    'Content-Type': 'application/json'
}

requests.post(endpoint, headers=headers, data=json_string)

Note: more information, regarding how to obtain a valid token, can be reviewed in the /login documentation.
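
For example, a token could likely be acquired with a similar POST request. The following is a hypothetical sketch: the exact payload fields, and the response key, are assumptions, and should be verified against the /login documentation:

import requests

# hypothetical login: field names and response key are assumptions
response = requests.post(
    'https://192.168.99.101:9090/login',
    json={'username': 'your-username', 'password': 'your-password'}
)
token = response.json().get('token')  # assumed response key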

Note: various data attributes can be nested in the above POST request.

It is important to remember that docker-compose.development.yml defines two port forwards, each assigned to its corresponding reverse proxy. This allows port 8080 on the host to map into the webserver-web container. A similar case for the programmatic-api uses port 9090 on the host.

machine-learning's People

Contributors

jeff1evesque, kumida, protojas, vitao18


machine-learning's Issues

Create 'python/svm_analysis.py'

We will create python/svm_analysis.py, which will parse the POST data, and retrieve the respective SVM model via a mySQL database query. This query will be defined within a method from python/svm_model.py.

Note: the queried Support Vector Machine model is previously defined by the same script python/svm_model.py, when imported and called from python/svm_training.py.

Note: it is important that we escape any parameters received from php/logic_loader.php.
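
A hypothetical sketch of the intended flow follows; the helper name get_model, and the parameter keys, are illustrative assumptions:

import json

# assumed helper: the actual query method lives in 'python/svm_model.py'
from svm_model import get_model

def svm_analysis(post_data):
    # parse, and escape, the POST parameters received from
    # 'php/logic_loader.php' before touching the database
    params = json.loads(post_data)
    model = get_model(params['model_id'])  # mySQL query via svm_model.py
    return model.predict([params['values']])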

Temporary Working Environment

For the remaining week, we have a temporary Windows 7 machine:

Windows edition
Windows 7 Professional
Copyright 2009 Microsoft Corporation. All rights reserved
Service Pack 1

System
Manufacturer: Dell
Model: Precision M4600
Rating: 6.9 Windows Experience Index
Processor: Intel(R) Core(TM) i7-2760QM CPU @ 2.40GHz 2.40 GHz
Installed memory (RAM): 16.0 GB
System type: 64-bit Operating System
Pen and Touch: No Pen or Touch Input is available for the Display

Hard drive
File system: NTFS
Capacity: 237 GB

Unfortunately, we will not be setting up a dual-boot. Also, this machine already has VMware Player 5.0.2 build-1031769 installed. So, we will find the corresponding version of Ubuntu Server to work within the VMware Player.

Since the issue of setting up working environments has come up several times across other projects, it serves as a precursor for something fantastic. More specifically, we will research how to create a bootable USB drive.

Form: Research jquery autocomplete feature

We will research the practicality of the jquery autocomplete, by following a corresponding tutorial. This will be used to auto-suggest values for various input fields within the html form. These suggestions will be dynamically created based on the dataset chosen during training.

Determine Database

We will determine if we need a database scheme to store the data sets.

Add 'html_form.js'

We will initially have html_form.js be responsible for adding additional form fields.

Remove Flask and dependencies

We no longer require the use of Flask. Therefore, we will remove the submodule, and any dependencies needed for Flask:

$ sudo apt-get remove libapache2-mod-wsgi

Form: display submit button

We will only display the form submit button for the following two cases:

  • analysis session: known factors, independent variables have been provided
  • training session: classification, dependent / independent variables have been provided

Properly 'Add more' form elements

A form element, when added, is always inserted as the second element in the field element array. We prefer elements be added to the end of the field array. This will require adjusting html_form_delegator.js.

'html_form.js' creates additional form elements

html_form.js will be responsible for creating additional form fieldsets based on various datalist choices. For example, the choice of Session Type appends the corresponding fieldset, either Training Session, or Analysis Session, to the form DOM structure. Therefore, tests/php/index.php will be responsible for creating an initial form with just one fieldset:

  • Session Type

Next, we will need to remove all form fieldsets within tests/php/index.php except the one listed above. Also, we need to add a reference to html_form.js, since it was earlier renamed to html_form_delegator.js.

Create 'test/index.php' test page

We will create tests/index.php to be our test page for the Support Vector Machines (SVM). This page will pass a few parameters to the imported python/logic_loader.py: the required parameter indicating whether this is a training, or analysis instance (string); an optional object containing XML feed urls; and an optional object containing attributes of XML elements we would like to parse for.

logic_loader.py will delegate the parameters to other python scripts in order to return either a message indicating a successful training session, or a prediction.

Create 'python/logic_loader.php'

We will create python/logic_loader.php, which will be the primary script responsible for importing and instantiating other required python scripts. logic_loader.php may call upon the following scripts:

Note: this script will be passed three parameters:

  • (required) type string, indicating whether this is a training, or analysis instance
  • (optional) type object, containing XML feed urls
  • (optional) type object, containing attributes of XML elements to be parsed

If both optional parameters have not been passed in, and the required parameter indicates training, this script will prompt users to manually input data (i.e. upload a file). The latter option will be defined in this script. But, data_creator.py will not have the ability (yet) to parse the uploaded file.
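
A minimal sketch of this parameter contract, with hypothetical return values for illustration:

def logic_loader(session_type, feed_urls=None, xml_attributes=None):
    # session_type   -- required: 'training' or 'analysis'
    # feed_urls      -- optional: object containing XML feed urls
    # xml_attributes -- optional: attributes of XML elements to parse
    if session_type not in ('training', 'analysis'):
        raise ValueError("session_type must be 'training' or 'analysis'")
    if session_type == 'training' and not (feed_urls or xml_attributes):
        # neither optional parameter supplied: fall back to a manual
        # upload, which data_creator.py cannot yet parse
        return 'prompt-manual-upload'
    return 'delegate-to-downstream-scripts'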

Create backend logic to parse contents of URL

We will create python/xml_retriever.py as a generic class that will take two parameters:

  • the URL of the xml document to parse
  • an object containing the xml attributes to parse from the xml document

This class, after being instantiated, will return an object. The returned object will contain the important xml attributes we wanted to parse for (as defined by the second parameter of this class).
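
A minimal sketch, assuming the two-parameter contract described above; the class name, and method name, are illustrative:

import urllib.request
import xml.etree.ElementTree as ET

class XmlRetriever:
    def __init__(self, url, attributes):
        self.url = url                # URL of the xml document to parse
        self.attributes = attributes  # xml attributes to extract

    def retrieve(self):
        # fetch the document, then keep only elements carrying at
        # least one of the requested attributes
        tree = ET.parse(urllib.request.urlopen(self.url))
        return [
            {name: elem.get(name) for name in self.attributes if elem.get(name)}
            for elem in tree.iter()
            if any(elem.get(name) for name in self.attributes)
        ]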

Form: Add 'Training Type' datalist

Just as we have the <datalist> for the Session type, we need to create another <datalist> for Training Session Type. This field will need to be placed above the fieldset Supply Dataset within tests/php/index.php.

Research 'Support Vector Machines' (wiki)

To gain a better understanding of Support Vector Machines, we will research the following:

SVM Classification

  • General Definition
  • Requirement
  • Other Modeling Techniques (SVM in general)
  • Limitations (curse of dimensionality, applies generally to multi-dimensional models)

SVM Regression

  • Background: Regression Analysis
  • SVM: Regression Analysis

Loss Functions

  • Loss Functions: Quick Summary
  • Loss Functions: SVM Regression
  • Loss Functions: SVM Classification
  • Types of Regression

Form: Add 'Dataset Type' datalist

Just as we have the <datalist> for the Session type, we need to create another <datalist> for Dataset Type. This field will need to be placed above the datalist Training Type, and Supply Dataset within tests/php/index.php.

Adjust 'tests/php/index.php' form

We previously decided to have php as the language that distributes necessary parameters to our python scripts. Therefore, tests/php/index.php will need to be adjusted so that POST data is sent to python/logic_loader.php, not python/logic_loader.py.

Form: Add 'Analysis Model' datalist

Just as we have the <datalist> for the Session type, we need to create another <datalist> for Select Model. This field will need to be placed above the fieldset Known Factors within tests/php/index.php. The element will list all the trained models available for analysis.

Add 'scikit-learn' Submodule

We found the IRC #scikit-learn very useful. As such, we will use this implementation instead of the pyML we had previously considered. We may at a later time decide to use, or mirror the latter project to GitHub.

Form: Display 'Training Type' Fieldset

Currently, a choice on the Dataset Type fieldset creates both Supply Dataset, and Training Type fieldsets. Instead, we will add conditional javascript: when a user provides context for Supply Dataset, the Training Type fieldset will then be created in the DOM.

Mirror / Add 'pyML' Submodule

We will attempt to add pyML from SourceForge as a submodule in this GitHub repository. This will allow us to use the Support Vector Machines. However, this project doesn't exist in GitHub. We will contact the maintainer from SourceForge, Professor Asa Ben-Hur, and offer to mirror the code into GitHub.

The steps to mirror svn to GitHub amount to a fairly simple set of clone commands.

IRC #cmusphinx (08/13/14 ~ 6:40pm EST):

jeffreylevesque: nshm, was it difficult to create a mirror from svn to github, like you did with sphinx libraries?

nshm: no, thats just few commands
nshm: git svn clone
nshm: git push
nshm: there are tutorials on the web
nshm: https://www.atlassian.com/git/migration#!migration-convert

Form: 'Add more' determinant function

We will need a function that counts the number of input fields created by the Add more button, and checks whether all created field elements are defined. This function will be implemented for any fieldset whose creation depends on an input field that can be appended with additional elements. This will ensure that users provide data for all form fields, before succeeding fieldsets are available.

Research scikit-learn syntax and implementation (wiki)

We will determine the syntax requirements for both Support Vector Machine Classification, and Regression within our implementation of scikit-learn.

SVM Classification

  • Preprocessing Data Set
  • Training On Data Set
  • Implement new Classifier

SVM Regression

  • Preprocessing Data Set
  • Training On Data Set
  • Implement new Regression Model
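
A minimal sketch of this workflow, using standard scikit-learn calls on a toy dataset:

from sklearn import preprocessing, svm

# toy dataset: two independent variables per observation
X = [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0], [3.0, 0.5]]
y_class = [0, 0, 1, 1]        # classification labels
y_reg = [0.1, 0.9, 1.8, 3.2]  # regression targets

# preprocessing: scale features to zero mean and unit variance
scaler = preprocessing.StandardScaler().fit(X)
X_scaled = scaler.transform(X)

# SVM classification: train, then implement a new classifier
clf = svm.SVC().fit(X_scaled, y_class)
print(clf.predict(scaler.transform([[1.5, 0.75]])))

# SVM regression: train, then implement a new regression model
reg = svm.SVR().fit(X_scaled, y_reg)
print(reg.predict(scaler.transform([[1.5, 0.75]])))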

Form: Add 'Remove last' button

We will add Remove buttons to the immediate right of the Add more buttons. These will be responsible for removing the last respective field element within the corresponding fieldset in the form.

Documentation: Test Scripts

We need to describe our unit tests, located in the /test/ directory. Then, we need to discuss the various logs this repository produces, and how they may be useful.

HTML additional form fields

Users need to be able to add more than one uploaded file, more than one URL to an xml file for the SVM data set, and more than one dependent or independent variable.
