Giter Club home page Giter Club logo

knnclassifier2.0's Introduction

KNN classifier

"In statistics, the k-nearest neighbors algorithm (k-NN) is a non-parametric supervised learning method [...] k-NN is a type of classification where the function is only approximated locally and all computation is deferred until function evaluation" 1

This project is an implementation of the kNN algorithm, to estimate the type of Iris flower based on four properties:

  • Sepal length
  • Sepal width
  • Petal length
  • Petal width

This project is written in c++, by Michal Iakobashvili and Ohad Heines


Installation

This project relies on CMake to generate the executable file. If you don't have CMake yet, you can download it here, or alternatively, in a Debian-based system, run

$ sudo apt install cmake

Then, once cmake is installed, clone this project's repository with:

$ git clone 'https://github.com/bananabajanana/KNNClassifier.git'

Usage

You can run the code with our default input files, or alternatively put in your own. The server has it's training data in TrainingData.csv. In addition the client's input and output files can be given as arguments during compilation time, but it is recommended to put them as Unclassified.csv and Classified.csv folders as shown below.

KNNClassifier
│   ...
│
└───Client
│   │   ...
│   │   
│   └───Data
│   │   │   Unclassified.csv
│   │       
│   └───Output
│       │   Classified.csv
│
└───Server
│   │   ...
│   │   
│   └───Data
│   │   │   TrainingData.csv

You can run the project using our provided CMakeLists.txt file:

$ mkdir -p build && cd build
$ cmake ..
$ make -j && make Server && make Client
$ ./Server

After this, the compilation and linkage of the codes is complete and the Server is running. All that is left is to run the Client program from a different terminal, as following:

$ cd {path}/build
$ ./Client {IPath} {OPath}

Replacing {path} with the folder path to the project's repository on your personal computer.

In addition, {IPath} should be replaced with the path to the input file to be classified, and {OPath} with the path to output the data to. It is also possible to keep not specify {IPath} and {OPath} arguments, which will cause the program to work with the default arguments specified above.

Implementation

Server

We first created a Flower object, that will store a single flower's type and parameters. The flower's type is stored as an enum, with four options: the three possible types, and an undefined option.

enum typeIris { versicolor, virginica, setosa, undifined };

class Flower {
private:
    typeIris type;
    const NPoint character;
    ...
};

The NPoint is a representation of the Flower's parameters, and functions as a point in an n-dimensional coordinate system (in 4d with the current implementation example, but can easily be expanded).

Our code implements the algorithm with three different possible distance functions: Euclidean distance, Manhattan distance, and Chebyshev distance, but to allow the addition of other distance functions, we implemented generic code with an abstract distance class.

class DistanceCalc {
public:
    virtual double distance(NPoint p1, NPoint p2) = 0;
    ...
};

In addition, to keep track of all our different types of distances implemented, we made a DistancesData builder class.

class DistancesData {
    /**
     * @return all the types of distance calculators that can be used.
     */
    static std::vector<DistanceCalc*>& getAllTypes();
};

Finally, we created a Classifier class. This class identifies a given vector of flowers based on an input list of already identified Irises. This is implemented step-by-step according to the kNN algorithm, first finding the k closest neighbors to the unidentifiable flower, and then finding which category is most common among them.

With these implemented, the server runs indefinitely in the following loop:

graph TD;
    Listen-->Accept;
    Accept-->DataFromUser;
    DataFromUser-->OutputClass;
    OutputClass-->DataFromUser;
    OutputClass-->UserDisconnects;
    UserDisconnects-->Accept;
  • Listen: The server waits for a user to connect.
  • Accept: The accept stage in the connection process.
  • DataFromUser: The server receives information about an unclassified flower from the user.
  • OutputClass: The server sends to the user the classification of the flower.
  • UserDisconnects: The current user disconnects, allowing the server to interact with a new user.

All the transitions are managed by a middle step we'll call Select. The Select is in charge of managing the following things:

  1. Gracefully closing the server in case of internal errors.
  2. Managing timeouts: if there is a non-active client - kick it, and in any case keep listening for further actions.
  3. If there are no connected clients, a client must be connecting, and we'll accept it.
  4. If there is an active client, we'll communicate with it through the server protocol.

Server Parameters

  • Port = 6969 We chose this port number since it is not a commonly used port and is not part of the super-user port range (0-1024). 2
  • Timeout = 5sec We gave clients a generous amount of time before kicking them out.
  • Buffer Size = 128 Since the user can only send one flower info at a time, we limited the user's message size to 128 bytes.

Client

The client is a sort of simple demo for interaction with the server. It takes an input file from the user, and sends the flower information - one by one - to the server, while outputting the given classifications to an output file.

graph TD;
    ReadLine-.No lines left.->Exit;
    ReadLine-->SendInfo;
    SendInfo-->WriteLine;
    WriteLine-->ReadLine;
    
  • ReadLine: The client program reads a line from the input file.
  • SendInfo: The client sends the flower info previously read to the server.
  • WriteLine: The client writes the given class to the output file.
  • Exit: If there are no lines left, the client closes its connection with the server.

Authors


Acknowledgements

For a better understanding of the algorithms, c++ language, and multiple dependencies, we used the following sites:

knnclassifier2.0's People

Contributors

bananabajanana avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.