Mixture of experts layers for Keras

This repository contains Keras layers implementing convolutional and dense mixture of experts models.

Dense mixture of experts layer

The file DenseMoE.py contains a Keras layer implementing a dense mixture of experts model:
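
In standard form (this is a sketch; f denotes the expert activation, h the gating activation, K the number of experts, and W_k, b_k, V, c are learned weights), the layer computes a gated combination of K dense experts:

    y = \sum_{k=1}^{K} g_k(\mathbf{x}) \, f(\mathbf{W}_k \mathbf{x} + \mathbf{b}_k),
    \qquad
    g(\mathbf{x}) = h(\mathbf{V} \mathbf{x} + \mathbf{c})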

This layer can be used in the same way as a Dense layer. Some of its main arguments are as follows:

  • units: the output dimensionality
  • n_experts: the number of experts (K in the expression above)
  • expert_activation: activation function for the expert model (f in the expression above)
  • gating_activation: activation function for the gating model (h in the expression above)

Please see DenseMoE.py for additional arguments.
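
For illustration, a minimal usage sketch (this assumes the layer can be imported directly from DenseMoE.py and that its constructor takes the arguments listed above; please check DenseMoE.py for the exact signature):

    # Minimal usage sketch: DenseMoE as a drop-in replacement for Dense.
    # The import path and constructor arguments below are assumptions based
    # on the description above; see DenseMoE.py for the exact interface.
    from keras.models import Sequential
    from keras.layers import Dense
    from DenseMoE import DenseMoE

    model = Sequential()
    model.add(DenseMoE(units=64,
                       n_experts=2,
                       expert_activation='relu',
                       gating_activation='softmax',
                       input_shape=(100,)))
    model.add(Dense(10, activation='softmax'))
    model.compile(optimizer='adam', loss='categorical_crossentropy')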

Convolutional mixture of experts layer

The file ConvolutionalMoE.py contains Keras layers implementing 1D, 2D, and 3D convolutional mixture of experts models:
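
In analogy with the dense case (again a sketch, with f the expert activation and g the gating model as above), each expert replaces the matrix product with a convolution:

    y = \sum_{k=1}^{K} g_k(\mathbf{x}) \, f(\mathbf{W}_k * \mathbf{x} + \mathbf{b}_k)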

where * denotes a convolution operation. These layers can be used in the same way as the corresponding standard convolutional layers (Conv1D, Conv2D, Conv3D).
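
As with the dense layer, a minimal usage sketch (the class name Conv2DMoE and its argument names are assumptions based on the description above; please check ConvolutionalMoE.py for the exact class names and signatures):

    # Minimal usage sketch: a 2D convolutional mixture of experts layer used
    # like a standard Conv2D layer. The class name and arguments are
    # assumptions; see ConvolutionalMoE.py for the exact interface.
    from keras.models import Sequential
    from keras.layers import Flatten, Dense
    from ConvolutionalMoE import Conv2DMoE

    model = Sequential()
    model.add(Conv2DMoE(32, (3, 3),
                        n_experts=2,
                        expert_activation='relu',
                        gating_activation='softmax',
                        input_shape=(32, 32, 3)))
    model.add(Flatten())
    model.add(Dense(10, activation='softmax'))
    model.compile(optimizer='adam', loss='categorical_crossentropy')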

Examples

The file conv_moe_demo.py contains an example demonstrating how to use these layers. It is based on the cifar10_cnn.py file in the keras/examples folder: it builds a simple deep convolutional network, using either standard convolutional and dense layers or the corresponding mixture of experts layers, and compares the performance of the two models on the CIFAR-10 image recognition benchmark.

The figure below compares the validation accuracy of the standard convolutional model with that of the mixture of experts model with two experts (K=2; I have observed signs of overfitting for larger numbers of experts). The error bars are standard errors over 4 independent repetitions. The mixture of experts model performs significantly better than the standard convolutional model, although admittedly this is not a controlled experiment, as the mixture of experts model has more parameters (please let me know if you would be interested in helping me run more controlled experiments comparing the performance of these two models).

The file jacobian_moe_demo.py contains another example illustrating how to use the dense mixture of experts layer. The example here essentially implements the simulations reported in this blog post.

I have tested these examples with TensorFlow v1.8.0 and Keras v2.0.9 on my laptop, without using a GPU. Other configurations may or may not work. Please let me know if you have any trouble running these examples.
