Project MaRz

Machine Learning in Real-Time with Fuzzification

This repository contains the core code files of the MaRz project.

The files here are working, tested code used in the MaRz project. Some aspects of the project are still in development, but everything in this repository is working code, not test or scratch files.

Dependencies

  • numpy for dataset management
  • unittest for testing
  • sklearn for splitting and analysis

For specific experiments

  • matplotlib.pyplot and seaborn for charts
  • tensorflow for heart disease comparison

How to Use

Making a query with MaRz happens in three phases: Preprocessing, Hyper-boxing, and Querying. For full tests, an entire small dataset (about 5000 lines or fewer) can be processed in a single run.

Format and Preprocessing

To ensure compatibility, datasets should be formatted as 2D numpy arrays with one row per input and one column per attribute. The last column of the dataset should hold the target outputs of the data.
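As an illustration of this layout, here is a minimal sketch with a made-up toy dataset. The values and the variable names (dataset, features, targets) are hypothetical and are not taken from the MaRz code.

```python
import numpy as np

# Hypothetical toy dataset: 4 entries, 2 feature columns, then 1 target column.
dataset = np.array([
    [0.5, 10.0, 1.0],
    [1.5, 12.0, 0.0],
    [0.7, 11.0, 1.0],
    [2.0,  9.0, 0.0],
])

features = dataset[:, :-1]  # every column except the last
targets = dataset[:, -1]    # the last column holds the target outputs

print(features.shape)  # (4, 2)
print(targets.shape)   # (4,)
```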

The preprocessing done in dataset_preprocessing.generate_index_table creates an index table for the dataset. Each column of the table holds the row indices of the corresponding dataset column, in the order they would appear if the dataset were sorted (stably) by that column.
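The idea of a per-column index table can be sketched with numpy's stable argsort. This is only an approximation of what the description above implies; the function name sketch_index_table is hypothetical, and whether the real table includes the target column is an assumption here (this sketch sorts every column).

```python
import numpy as np

def sketch_index_table(dataset):
    """Column j holds the row indices that would stably sort the dataset by column j."""
    # kind="stable" keeps rows with equal values in their original order.
    return np.argsort(dataset, axis=0, kind="stable")

dataset = np.array([
    [3.0, 1.0, 0.0],
    [1.0, 2.0, 1.0],
    [2.0, 1.0, 0.0],
])
index_table = sketch_index_table(dataset)
print(index_table.tolist())  # [[1, 0, 0], [2, 2, 2], [0, 1, 1]]
```

Reading down column 0 gives rows 1, 2, 0, which is the order of the values 1.0, 2.0, 3.0. In column 1 the tied values in rows 0 and 2 keep their original order because the sort is stable.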

The second part of preprocessing is to generate the base fuzzy width for the dataset. This is done with dataset_preprocessing.get_base_fuzzy, which takes a properly formatted dataset and returns a list of value ranges for each feature of the dataset, referred to as the base_fuzzy.
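A minimal sketch of the base fuzzy computation follows, assuming that "value range" means the maximum minus the minimum of each feature column and that the target column is excluded. Both points are assumptions, and sketch_base_fuzzy is a hypothetical stand-in rather than the actual get_base_fuzzy.

```python
import numpy as np

def sketch_base_fuzzy(dataset):
    """Return max - min for each feature column (target column excluded)."""
    features = dataset[:, :-1]
    return (features.max(axis=0) - features.min(axis=0)).tolist()

# Hypothetical toy dataset: 2 feature columns and a target column.
dataset = np.array([
    [0.5, 10.0, 1.0],
    [1.5, 12.0, 0.0],
    [0.7, 11.0, 1.0],
    [2.0,  9.0, 0.0],
])
base_fuzzy = sketch_base_fuzzy(dataset)
print(base_fuzzy)  # [1.5, 3.0]
```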

Hyper-boxing

At this point, an input is needed. An input is a list of features the same size as a single entry in the dataset, but without a target on the end.

Generating an appropriate hyperbox for the data is a small tuning process, which uses the get_alpha function in get_alpha_sorted.py. This version uses an index table as generated in the preprocessing. The unsorted version does not use an index table, but it is less efficient.

The alpha value is a number between 0 and 1 by which each value of the base_fuzzy is multiplied. The scaled widths define a hyperbox around the input that contains the input as well as a minimum of n additional datapoints from the dataset, where n is given by the points argument. get_alpha also returns a container of the indices of the points within the hyper-box.
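The tuning idea can be sketched as a brute-force search, closer in spirit to the unsorted version than to the index-table one. Everything here is an assumption made for illustration: the function names, the choice to treat alpha * base_fuzzy as a half-width on each side of the input, and the fixed schedule of candidate alpha values. The real get_alpha may search differently.

```python
import numpy as np

def sketch_points_in_box(features, query, base_fuzzy, alpha):
    """Indices of rows whose every feature lies within alpha * base_fuzzy of the query."""
    half_widths = alpha * np.asarray(base_fuzzy)  # assumed: half-width per feature
    inside = np.all(np.abs(features - query) <= half_widths, axis=1)
    return np.flatnonzero(inside)

def sketch_get_alpha(features, query, base_fuzzy, points, steps=20):
    """Try alpha = 1/steps, 2/steps, ..., 1 and stop at the first box with enough points."""
    for k in range(1, steps + 1):
        alpha = k / steps
        indices = sketch_points_in_box(features, query, base_fuzzy, alpha)
        if len(indices) >= points:
            return alpha, indices
    # Fall back to the largest box tried if the dataset never supplies enough points.
    return 1.0, indices

features = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]])
base_fuzzy = [4.0, 4.0]
query = np.array([2.1, 2.1])

alpha, indices = sketch_get_alpha(features, query, base_fuzzy, points=3)
print(alpha, indices.tolist())  # 0.3 [1, 2, 3]
```

A finer step schedule or a bisection over alpha would give a tighter box at the cost of more box checks; the coarse linear schedule is used here only to keep the sketch short.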

Querying

Querying MaRz with the chosen input uses the marz_get_output.get_output function, which applies the fuzzy calculation to the points in the hyper-box and produces a prediction for the input. This output is a decimal value appropriate to the targets of the dataset.
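The fuzzy calculation itself is not spelled out here, so the following is only a generic stand-in and not the MaRz formula. It assumes a triangular membership per feature (1 at the input, 0 at the box edge), averages the memberships into one weight per point, and returns the weighted mean of the targets. The function name sketch_get_output and its parameters are hypothetical.

```python
import numpy as np

def sketch_get_output(dataset, indices, query, base_fuzzy, alpha):
    """Weighted mean of the box targets, using an assumed triangular membership."""
    box = dataset[indices]
    half_widths = alpha * np.asarray(base_fuzzy)
    # Assumed membership: 1 when a feature equals the query, falling to 0 at the box edge.
    membership = 1.0 - np.abs(box[:, :-1] - query) / half_widths
    weights = np.clip(membership, 0.0, 1.0).mean(axis=1)
    if weights.sum() == 0.0:
        # Every point sits on the box edge; fall back to a plain mean.
        return float(box[:, -1].mean())
    return float(np.average(box[:, -1], weights=weights))

# Hypothetical one-feature dataset with targets 10.0 and 20.0.
dataset = np.array([[1.0, 10.0], [3.0, 20.0]])
indices = np.array([0, 1])

prediction = sketch_get_output(dataset, indices, np.array([2.0]), [2.0], alpha=1.0)
print(prediction)  # 15.0
```

The input sits halfway between the two points, so both get equal weight and the prediction is the midpoint of their targets.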

Full Tests

The run_dataset.py file makes it convenient to process a full dataset and get back two lists containing the real and predicted values returned when each line of a dataset is given as input to the MaRz process. These output lists can be used to calculate an accuracy score, with the entire dataset acting as the "testing split." For a code example, see the airfoil_data.py experiment.
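As a sketch of how two such lists might be scored, the snippet below computes RMSE and R² with numpy. The lists are made-up values, the function name sketch_scores is hypothetical, and the choice of metrics is an assumption; the experiments may use different scores (for example from sklearn).

```python
import numpy as np

def sketch_scores(real, predicted):
    """Return (rmse, r2) for two equal-length sequences of values."""
    real_arr = np.asarray(real, dtype=float)
    pred_arr = np.asarray(predicted, dtype=float)
    residual = np.sum((real_arr - pred_arr) ** 2)       # unexplained variation
    total = np.sum((real_arr - real_arr.mean()) ** 2)   # total variation
    rmse = float(np.sqrt(residual / len(real_arr)))
    r2 = float(1.0 - residual / total)
    return rmse, r2

# Hypothetical stand-ins for the real and predicted lists.
real = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

rmse, r2 = sketch_scores(real, predicted)
print(round(rmse, 4), round(r2, 4))
```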

In order to handle columns where every value is the same, the base_fuzzy for that column is converted from 0 to 0.000001, to prevent division by 0 downstream.
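The guard described above can be sketched as a small post-processing step on the base_fuzzy list. The function name sketch_guard_zero_ranges is hypothetical; only the replacement value 0.000001 comes from the description.

```python
import numpy as np

def sketch_guard_zero_ranges(base_fuzzy, floor=0.000001):
    """Replace zero ranges (constant columns) with a tiny positive width."""
    ranges = np.asarray(base_fuzzy, dtype=float)
    return np.where(ranges == 0.0, floor, ranges).tolist()

guarded = sketch_guard_zero_ranges([1.5, 0.0, 3.0])
print(guarded)  # [1.5, 1e-06, 3.0]
```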

Citation

This code is currently under anonymous submission for the 2024 AAAI Conference. External sources are cited where they are used.

Contributors

  • s-gorrin
