Small sample app for testing the Microsoft AutoML framework

The code is inspired by the following tutorial: Auto generate a binary classifier using the CLI and Github repo here

The code is based on training data from 30k labelled comments on Wikipedia ('Wikipedia Detox'). The data consists of a tab-separated values file (.tsv) with each line containing a comment and a value of either 1 or 0 representing whether it is an offensive comment or not. 3 different sizes of training data (100, 1000 and 30000 samples) enable testing of various levels of data quality.

Prerequisites

Install .NET Core SDK 2.2 or later from here
Install ML.NET CLI by entering the following in you terminal (on Windows you might have to log out/in after installing .NET Core SDK in order to have paths correctly setup to recognize the 'dotnet tool' command):

> dotnet tool install -g mlnet

(Optional) Install VS Code from here

Running the sample

cd into the root directory of this repo where this README.md file is placed.
Train a simple model using the small dataset and using a short amount of time:

> mlnet auto-train --task binary-classification --dataset wiki100Rows.tsv --label-column-name Label --max-exploration-time 30

This will attempt training a number of models within the given time and should output a generated model with accuracy around 83% placed in 'SampleBinaryClassification⁩/SampleBinaryClassification.Model⁩/MLModel.zip'

Run the custom console app 'CustomBinaryClassificationConsoleApp' in order to test the generated model:

> cd CustomBinaryClassificationConsoleApp
> dotnet run

You should notice that some senetences are badly predicted such as "you are pretty" which is categorized as hostile.

Now try training a model with the large dataset using the following command (note the longer training time of 180 seconds due to the larger dataset):

> cd ..
> mlnet auto-train --task binary-classification --dataset wiki30000Rows.tsv --label-column-name Label --max-exploration-time 180

This should result in a much better performing model with accuracy around 96% which you can test again by running the commands:

> cd CustomBinaryClassificationConsoleApp
> dotnet run

And now you should hopefully experience a much better prediction regarding whether or not the sentences entered are hostile.

jason6583 / automl-playground Goto Github PK

automl-playground's Introduction

Small sample app for testing the Microsoft AutoML framework

Prerequisites

Running the sample

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent