KNN classifier

"In statistics, the k-nearest neighbors algorithm (k-NN) is a non-parametric supervised learning method [...] k-NN is a type of classification where the function is only approximated locally and all computation is deferred until function evaluation" 1

This project is an implementation of the kNN algorithm, to estimate the type of Iris flower based on four properties:

Sepal length
Sepal width
Petal length
Petal width

This project is written in c++, by Michal Iakobashvili and Ohad Heines

Installation

This project relies on CMake to generate the executable file. If you don't have CMake yet, you can download it here, or alternatively, in a Debian-based system, run

$ sudo apt install cmake

Then, once cmake is installed, clone this project's repository with:

$ git clone 'https://github.com/bananabajanana/KNNClassifier.git'

Usage

You can run the code with our default input files, or alternatively put in your own. The server has it's training data in TrainingData.csv. In addition the client's input and output files can be given as arguments during compilation time, but it is recommended to put them as Unclassified.csv and Classified.csv folders as shown below.

KNNClassifier
│   ...
│
└───Client
│   │   ...
│   │   
│   └───Data
│   │   │   Unclassified.csv
│   │       
│   └───Output
│       │   Classified.csv
│
└───Server
│   │   ...
│   │   
│   └───Data
│   │   │   TrainingData.csv

You can run the project using our provided CMakeLists.txt file:

$ mkdir -p build && cd build
$ cmake ..
$ make -j && make Server && make Client
$ ./Server

After this, the compilation and linkage of the codes is complete and the Server is running. All that is left is to run the Client program from a different terminal, as following:

$ cd {path}/build
$ ./Client {IPath} {OPath}

Replacing {path} with the folder path to the project's repository on your personal computer.

In addition, {IPath} should be replaced with the path to the input file to be classified, and {OPath} with the path to output the data to. It is also possible to keep not specify {IPath} and {OPath} arguments, which will cause the program to work with the default arguments specified above.

Implementation

Server

We first created a Flower object, that will store a single flower's type and parameters. The flower's type is stored as an enum, with four options: the three possible types, and an undefined option.

enum typeIris { versicolor, virginica, setosa, undifined };

class Flower {
private:
    typeIris type;
    const NPoint character;
    ...
};

The NPoint is a representation of the Flower's parameters, and functions as a point in an n-dimensional coordinate system (in 4d with the current implementation example, but can easily be expanded).

Our code implements the algorithm with three different possible distance functions: Euclidean distance, Manhattan distance, and Chebyshev distance, but to allow the addition of other distance functions, we implemented generic code with an abstract distance class.

class DistanceCalc {
public:
    virtual double distance(NPoint p1, NPoint p2) = 0;
    ...
};

In addition, to keep track of all our different types of distances implemented, we made a DistancesData builder class.

class DistancesData {
    /**
     * @return all the types of distance calculators that can be used.
     */
    static std::vector<DistanceCalc*>& getAllTypes();
};

Finally, we created a Classifier class. This class identifies a given vector of flowers based on an input list of already identified Irises. This is implemented step-by-step according to the kNN algorithm, first finding the k closest neighbors to the unidentifiable flower, and then finding which category is most common among them.

With these implemented, the server runs indefinitely in the following loop:

graph TD;
    Listen-->Accept;
    Accept-->DataFromUser;
    DataFromUser-->OutputClass;
    OutputClass-->DataFromUser;
    OutputClass-->UserDisconnects;
    UserDisconnects-->Accept;

Listen: The server waits for a user to connect.
Accept: The accept stage in the connection process.
DataFromUser: The server receives information about an unclassified flower from the user.
OutputClass: The server sends to the user the classification of the flower.
UserDisconnects: The current user disconnects, allowing the server to interact with a new user.

All the transitions are managed by a middle step we'll call Select. The Select is in charge of managing the following things:

Gracefully closing the server in case of internal errors.
Managing timeouts: if there is a non-active client - kick it, and in any case keep listening for further actions.
If there are no connected clients, a client must be connecting, and we'll accept it.
If there is an active client, we'll communicate with it through the server protocol.

Server Parameters

Port = 6969 We chose this port number since it is not a commonly used port and is not part of the super-user port range (0-1024). 2
Timeout = 5sec We gave clients a generous amount of time before kicking them out.
Buffer Size = 128 Since the user can only send one flower info at a time, we limited the user's message size to 128 bytes.

Client

The client is a sort of simple demo for interaction with the server. It takes an input file from the user, and sends the flower information - one by one - to the server, while outputting the given classifications to an output file.

graph TD;
    ReadLine-.No lines left.->Exit;
    ReadLine-->SendInfo;
    SendInfo-->WriteLine;
    WriteLine-->ReadLine;

ReadLine: The client program reads a line from the input file.
SendInfo: The client sends the flower info previously read to the server.
WriteLine: The client writes the given class to the output file.
Exit: If there are no lines left, the client closes its connection with the server.

Authors

Acknowledgements

For a better understanding of the algorithms, c++ language, and multiple dependencies, we used the following sites:

bananabajanana / knnclassifier2.0 Goto Github PK

knnclassifier2.0's Introduction

KNN classifier

This project is written in c++, by Michal Iakobashvili and Ohad Heines

Installation

Usage

In addition, {IPath} should be replaced with the path to the input file to be classified, and {OPath} with the path to output the data to. It is also possible to keep not specify {IPath} and {OPath} arguments, which will cause the program to work with the default arguments specified above.

Implementation

Server

Server Parameters

Client

Authors

Acknowledgements

knnclassifier2.0's People

Contributors

Watchers

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent