Giter Club home page Giter Club logo

synbols's Introduction

#Synbols

Probing Learning Algorithms with Synthetic Datasets

CircleCI Documentation Status

Synbols

Progress in the field of machine learning has been fueled by the introduction of benchmark datasets pushing the limits of existing algorithms. Enabling the design of datasets to test specific properties and failure modes of learning algorithms is thus a problem of high interest, as it has a direct impact on innovation in the field. In this sense, we introduce Synbols โ€” Synthetic Symbols โ€” a tool for rapidly generating new datasets with a rich composition of latent features rendered in low resolution images. Synbols leverages the large amount of symbols available in the Unicode standard and the wide range of artistic font provided by the open font community. Our tool's high-level interface provides a language for rapidly generating new distributions on the latent features, including various types of textures and occlusions. To showcase the versatility of Synbols, we use it to dissect the limitations and flaws in standard learning algorithms in various learning setups including supervised learning, active learning, out of distribution generalization, unsupervised representation learning, and object counting.

[paper]

Installation

The easiest way to install Synbols is via PyPI. Simply run the following command:

pip install synbols

Software dependencies

Synbols relies on fonts and system packages. To ensure reproducibility, we provide a Docker image with everything preinstalled. Thus, the only dependency is Docker (see here to install).

Usage

Using predefined generators

$ synbols-datasets --help
$ synbols-datasets --dataset=some-large-occlusion --n_samples=1000 --seed=42

Generating some-large-occlusion dataset. Info: With probability 20%, add a large occlusion over the existing symbol.
Preview generated.
 35%|############################2                                                   | 353/1000 [00:05<00:10, 63.38it/s]

Defining your own generator

Examples of how to create new datasets can be found in the examples directory.

def translation(rng):
    """Generates translations uniformly from (-2, 2), going outside of the box."""
    return tuple(rng.uniform(low=-2, high=2, size=2))


# Modifies the default attribute sampler to fix the scale to a constant and the (x,y) translation to a new distribution
attr_sampler = basic_attribute_sampler(scale=0.5, translation=translation)

generate_and_write_dataset(dataset_path, attr_sampler, n_samples)

To generate your dataset, you need to run your code in the Synbols runtime environment. This is done using the synbols command as follows:

synbols mydataset.py --foo bar

Contact

For any bug or feature requests, please create an issue.

synbols's People

Contributors

recursix avatar aldro61 avatar prlz77 avatar mattcraddock avatar david-vazquez avatar optimass avatar issamlaradji avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.