

fforest

fforest splits your database into multiple sub-databases, a configuration mainly used by data scientists and machine-learning researchers.
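As a rough illustration of this kind of split, the sketch below shuffles the instances, holds out a test set, and deals the remaining train instances into several sub-train sets. The function name `split_database`, the 20% test ratio, and the round-robin dealing are assumptions for illustration only; they are not fforest's actual splitting method.

```python
import random
from typing import List, Sequence, Tuple


def split_database(instances: Sequence, test_ratio: float = 0.2,
                   n_sub_train: int = 3, seed: int = 0) -> Tuple[List, List[List]]:
    """Hypothetical split: one test set plus n_sub_train disjoint sub-train sets."""
    rng = random.Random(seed)  # seeded so the split is reproducible
    shuffled = list(instances)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    test, train = shuffled[:n_test], shuffled[n_test:]
    # Deal the train instances round-robin into n_sub_train chunks.
    sub_trains = [train[i::n_sub_train] for i in range(n_sub_train)]
    return test, sub_trains


if __name__ == "__main__":
    test, sub_trains = split_database(range(10))
    print(len(test), [len(s) for s in sub_trains])  # 2 [3, 3, 2]
```

Every instance ends up in exactly one subset; a bootstrap (sampling with replacement) would be a common alternative for building forests.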

Using the Salammbo software, fforest constructs decision trees and fuzzy decision trees from the sub-sub-train databases with a handful of triangular norms. It then computes several kinds of information about the databases, such as:

  • Whether a tree correctly classified a given instance.
  • The quality of each tree.
  • How difficult an instance is to classify.
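Minimum, product and Łukasiewicz are three classic triangular norms, typically used in fuzzy trees to combine membership degrees; which t-norms fforest actually hands to Salammbo is not stated here, so these three are assumptions. The same sketch derives the three kinds of information listed above from a table of predictions, under assumed definitions: a tree's quality is its accuracy, and an instance's difficulty is the fraction of trees that misclassify it. The function `evaluate` and the tree names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


# Three classic t-norms; fforest's actual choice of t-norms is assumed, not confirmed.
def t_min(a: float, b: float) -> float:
    return min(a, b)


def t_product(a: float, b: float) -> float:
    return a * b


def t_lukasiewicz(a: float, b: float) -> float:
    return max(0.0, a + b - 1.0)


def evaluate(predictions: Dict[str, Sequence[str]], truth: Sequence[str]
             ) -> Tuple[Dict[str, List[bool]], Dict[str, float], List[float]]:
    """predictions maps a (hypothetical) tree name to its predicted class per instance."""
    # correct[tree][i] is True when that tree classified instance i correctly.
    correct = {tree: [p == t for p, t in zip(preds, truth)]
               for tree, preds in predictions.items()}
    # Assumed tree quality: plain accuracy over all instances.
    quality = {tree: sum(flags) / len(flags) for tree, flags in correct.items()}
    # Assumed instance difficulty: fraction of trees that misclassified it.
    difficulty = [sum(not flags[i] for flags in correct.values()) / len(correct)
                  for i in range(len(truth))]
    return correct, quality, difficulty


if __name__ == "__main__":
    truth = ["yes", "no", "yes", "no"]
    predictions = {"tree_a": ["yes", "no", "no", "no"],
                   "tree_b": ["yes", "yes", "no", "no"]}
    _, quality, difficulty = evaluate(predictions, truth)
    print(quality)     # {'tree_a': 0.75, 'tree_b': 0.5}
    print(difficulty)  # [0.0, 0.5, 1.0, 0.0]
    print(t_min(0.7, 0.6), t_product(0.7, 0.6), t_lukasiewicz(0.7, 0.6))
```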

Installation

fforest is available on PyPI. To install it, run

$ pip install fforest

in a shell. The package provides one new command:

$ fforest

Usage

For regular usage, you only have to describe how your data is stored in your CSV file, and the software handles the rest. The most commonly used options are:

--encoding-input=ENCODING
--delimiter-input=CHAR
--quoting-input=QUOTING
--quote-character-input=CHAR
--have-header
--class=NAME
--identifier=ID

The software cannot detect whether these options are misconfigured, and a wrong setting can lead to unpredictable results. Be careful!
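fforest is written in Python, so these options most likely end up as parameters of the standard `csv` module; that correspondence is an assumption, as are the function name `read_rows` and every quoting name other than `nonnumeric` (the only one used in this README). The sketch shows how the delimiter, quoting, quote character and header settings change what a reader returns, and why a wrong setting silently produces different data.

```python
import csv
import io
from typing import Iterable, List, Optional, Tuple

# Assumed mapping from --quoting-input values to csv module constants.
QUOTING = {
    "all": csv.QUOTE_ALL,
    "minimal": csv.QUOTE_MINIMAL,
    "nonnumeric": csv.QUOTE_NONNUMERIC,
    "none": csv.QUOTE_NONE,
}


def read_rows(stream: Iterable[str], delimiter: str = ",",
              quoting: str = "minimal", quote_char: str = '"',
              have_header: bool = False) -> Tuple[Optional[list], List[list]]:
    """Hypothetical reader: return (header or None, data rows)."""
    # With "nonnumeric", every unquoted field is converted to float, so an
    # unquoted header row would raise ValueError here.
    reader = csv.reader(stream, delimiter=delimiter,
                        quoting=QUOTING[quoting], quotechar=quote_char)
    rows = list(reader)
    if have_header:
        return rows[0], rows[1:]
    return None, rows


if __name__ == "__main__":
    # Semicolon-separated data with a header, like --delimiter-input=';' --have-header.
    header, rows = read_rows(io.StringIO('age;job;y\n30;"admin.";no\n'),
                             delimiter=";", have_header=True)
    print(header, rows)  # ['age', 'job', 'y'] [['30', 'admin.', 'no']]
    # Space-separated numeric data, like --delimiter-input=' ' --quoting-input nonnumeric.
    _, rows = read_rows(io.StringIO("1 2.5 0\n3 4 1\n"),
                        delimiter=" ", quoting="nonnumeric")
    print(rows)  # [[1.0, 2.5, 0.0], [3.0, 4.0, 1.0]]
```

To read an actual file, open it with the encoding given by --encoding-input and `newline=""` (as the `csv` documentation recommends), then pass the file object as `stream`.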

For example, here are the parameters used to test the software with two databases from the fforest/test/data directory:

$ fforest fforest/test/data/bank.csv --delimiter-input=';' --have-header --class y
$ fforest fforest/test/data/australian.dat --delimiter-input=' ' --quoting-input nonnumeric --class -1
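The two commands select the class column differently: by header name (`--class y`) in the first, and apparently by position (`--class -1`, presumably the last column of a header-less file) in the second. The helper below is a hypothetical sketch of that resolution, assuming Python-style negative indices; the name `resolve_class_column` is invented and this is not fforest's code.

```python
from typing import List, Optional


def resolve_class_column(spec: str, header: Optional[List[str]], n_columns: int) -> int:
    """Hypothetical: turn a --class value into a zero-based column index."""
    # First try the value as a column name (e.g. --class y with --have-header).
    if header is not None and spec in header:
        return header.index(spec)
    # Otherwise treat it as a position (e.g. --class -1 for the last column).
    try:
        index = int(spec)
    except ValueError:
        raise ValueError("unknown class column: {!r}".format(spec)) from None
    if not -n_columns <= index < n_columns:
        raise ValueError("class index {} out of range for {} columns".format(index, n_columns))
    # Python-style negative indices count from the end of the row.
    return index % n_columns


if __name__ == "__main__":
    print(resolve_class_column("y", ["age", "job", "y"], 3))  # 2
    print(resolve_class_column("-1", None, 15))               # 14
```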

The software is designed to be highly modular, so many more options are available; do not hesitate to consult the help page.

Requirements

  • This software does not work on Windows, because Salammbo, the binary used to build fuzzy trees, only runs on GNU/Linux distributions.
  • This software runs with Python 3.5 and above. It has not been tested with earlier Python 3 versions, but because of its type hints, it should work with Python 3.4.
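A script wrapping fforest could check both requirements before doing any work. This is an illustrative sketch: the function name `check_environment` is hypothetical, and fforest itself may not perform these checks.

```python
import sys


def check_environment() -> None:
    """Hypothetical pre-flight check mirroring the requirements above."""
    # Salammbo only runs on GNU/Linux, so reject any other platform early.
    if not sys.platform.startswith("linux"):
        raise RuntimeError("fforest needs GNU/Linux (Salammbo is Linux-only), "
                           "found platform {!r}".format(sys.platform))
    # The README targets Python 3.5 and above.
    if sys.version_info < (3, 5):
        raise RuntimeError("fforest requires Python 3.5 or above")


if __name__ == "__main__":
    check_environment()
    print("environment looks compatible")
```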

Dependencies

  • docopt >= 0.6.2
  • numpy >= 1.12.1
  • scipy >= 0.19.0

Contributing

  1. Fork the project.
  2. Create your feature branch: git checkout -b my-new-feature
  3. Commit your changes: git commit -am 'Added some cool feature !'
  4. Push to the branch: git push origin my-new-feature
  5. Submit a pull request.

Todo

  • Implement the guess option for the --have-header parameter using csv.Sniffer.has_header: https://docs.python.org/3/library/csv.html#csv.Sniffer.has_header
  • Implement the guess option for the --delimiter, --quoting, --quote-character and --encoding parameters.
  • Add more verbosity-related messages.
  • Add progress bars using the progressbar2 package: https://pypi.python.org/pypi/progressbar2
  • Support other input database formats (each format must be converted to CSV during the preprocessing phase).
  • Round floating-point values, following https://stackoverflow.com/a/1317578
  • Add a --max-instances-at-once parameter, which forces the functions that load entire databases into memory to load only a fixed number of instances at a time.
  • Clean up the documentation of all non-main entry points in the entry_points_documentation.xml file.
  • The code in the args_cleaner.py and init_environment.py modules works but is messy, and therefore difficult to maintain. Better ideas are welcome.
  • Implement the random option as a splitting method.
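The `csv.Sniffer` class linked in the first item can already guess the delimiter, the quote character and the presence of a header from a sample of the file. A minimal sketch of such a guess option follows; the function name `guess_format` is hypothetical, and since `Sniffer.sniff` raises `csv.Error` when it cannot decide, a real implementation would still need a fallback. The standard library offers no encoding detection, so the encoding guess is not covered.

```python
import csv
from typing import Optional, Tuple


def guess_format(sample: str, delimiters: Optional[str] = None) -> Tuple[str, str, bool]:
    """Hypothetical guess option: return (delimiter, quote character, has header)."""
    sniffer = csv.Sniffer()
    # sniff() raises csv.Error if it cannot determine the delimiter.
    dialect = sniffer.sniff(sample, delimiters=delimiters)
    # has_header() compares the first row against the types/lengths of later rows.
    return dialect.delimiter, dialect.quotechar, sniffer.has_header(sample)


if __name__ == "__main__":
    sample = "age;job;y\n30;admin;no\n45;technician;yes\n52;manager;no\n"
    print(guess_format(sample, delimiters=";, \t"))  # (';', '"', True)
```

Passing only a sample (for example the first few kilobytes of the file) keeps the guess cheap on large databases.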

Acknowledgments

I would like to thank the LFI team of the LIP6 laboratory, and especially Mr. Christophe Marsala, for helping me throughout my internship and for providing the knowledge and resources needed to build this software.

License

This project is licensed under the !!! License - see the LICENSE.txt file for details.

References

  • C. Marsala and B. Bouchon-Meunier, “An adaptable system to construct fuzzy decision trees,” in Proc. of NAFIPS’99 (North American Fuzzy Information Processing Society), New York, USA, 1999, pp. 223–227. (http://webia.lip6.fr/~marsala/articles/1999-nafips.pdf)
  • C. Marsala and M. Rifqi, “Summarizing Fuzzy Decision Forest by subclass discovery,” in Proc. of the IEEE International Conference on Fuzzy Systems (FUZZ-IEEE’2013), Hyderabad, India, July 2013, pp. 1–6. (http://hal.upmc.fr/hal-01198855)
  • P. W. Cooper, “The hypersphere in pattern recognition,” Information and Control, vol. 5, pp. 324–346, December 1962.
  • P. W. Cooper, “A Note on an Adaptive Hypersphere Decision Boundary,” IEEE Transactions on Electronic Computers, vol. EC-15, no. 6, pp. 948–949, December 1966.
  • J. Wang, P. Neskovic and L. N. Cooper, “Pattern Classification via Single Spheres,” Lecture Notes in Computer Science, vol. 3735.
  • J. Forest, “Caractérisation de classes par la découverte automatique de sous-classes,” thesis, Université Pierre et Marie Curie, Laboratoire d’informatique de Paris 6, September 2009.
  • C. Marsala, “Apprentissage artificiel et raisonnement flou,” thesis, Université Pierre et Marie Curie – Paris VI, November 2010.
