Giter Club home page Giter Club logo

set-tree's Introduction

Set-Tree

Extending decision trees to process sets

This is the official repository for the paper: "Trees with Attention for Set Prediction Tasks" (ICML21). http://proceedings.mlr.press/v139/hirsch21a.html

This repository contains a prototypical implementaion of Set-Tree and GBeST (Gradient Boosted Set-Tree) algorithms

Getting Started

The Set-Tree package can be downloaded from PIP: pip install settree

We also supply the code and datasets for reproducing our experimetns under exps folder.

Background and motivation

In many machine learning applications, each record represents a set of items. A set is an unordered group of items, the number of items may differ between different sets. Problems comprised from sets of items are present in diverse fields, from particle physics and cosmology to statistics and computer graphics. In this work, we present a novel tree-based algorithm for processing sets.

set_problems

The model

Set-Tree model comprised from two components:

  1. Set-compatible split criteria: we specifically support the familly of split criteria defined by the following equation and parametrized by alpha and beta.
  2. Attention-Sets: a mechanism for allplying the split criteria to subsets of the input. The attention-sets are derived forn previous split-criteria and allows the model to learn more complex set-functions.

Implementation

The current implementation is based on Sklean's BaseEstimator class and is fully compatible with Sklearn. It contains two main components: SetDataset object, that receives records as attribute. records is a list of numpy arrays, each array with shape (n_i, d) represents a single record (set). d is the dimention of each item in the record and n_i is the number of items in the i's record and may differ between records. The second componnent is SetTree model inherited from Sklean's BaseEstimator and has simillar attributes.

When configuring Set-Tree one should also configure:

  • operations : list of the operations to be used
  • use_attention_set : binary flag for activating the attention-sets mechanism
  • attention_set_limit : the number of ancestors levels to derive attention-sets from
  • use_attention_set_comp : binary flag for activating the attention-sets compatibility option

A simplified code snippet for training Set-Tree:

import settree
import numpy as np

set_data = settree.SetDataset(records=[np.random.randn(2,5) for _ in range(10)])
labels = np.random.randn(10) >= 0.5
set_tree_model = settree.SetTree(classifier=True,
                                 criterion='entropy',
                                 operations=settree.OPERATIONS,
                                 use_attention_set=True,
                                 use_attention_set_comp=True,
                                 attention_set_limit=5,
                                 max_depth=10)
set_tree_model.fit(set_data, labels)

A simplified code snippet for training GBeST:

import settree
import numpy as np

set_data = settree.SetDataset(records=[np.random.randn(2,5) for _ in range(10)])
labels = np.random.randn(10) >= 0.5
gbest_model = settree.GradientBoostedSetTreeClassifier(learning_rate=0.1, 
                                                       n_estimators=10,
                                                       criterion='mse',
                                                       operations=settree.OPERATIONS,
                                                       use_attention_set=True,
                                                       use_attention_set_comp=True,
                                                       attention_set_limit=5,
                                                       max_depth=10)
gbest_model.fit(set_data, labels)

For further details and examples see: example.ipynb.

Citation

If you use Set-Tree in your work, please cite:

@InProceedings{pmlr-v139-hirsch21a,
  title = 	 {Trees with Attention for Set Prediction Tasks},
  author =       {Hirsch, Roy and Gilad-Bachrach, Ran},
  booktitle = 	 {Proceedings of the 38th International Conference on Machine Learning},
  pages = 	 {4250--4261},
  year = 	 {2021},
  editor = 	 {Meila, Marina and Zhang, Tong},
  volume = 	 {139},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {18--24 Jul},
  publisher =    {PMLR}
}

License

Set-Tree is MIT licensed, as found in the LICENSE file.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.