EM

We demonstrate the use of expectation-maximization (EM) for topic modeling on the NIPS dataset and for image segmentation on selected images.

The data can be found in the /data folder. The images, including the generated segmented images, can be found in the /images folder.

Python 3.5 is required to run the code. We tested with, and recommend, Anaconda and Jupyter Notebook. The code and output can be seen in the .ipynb files. The .py files were generated by Jupyter and may require minor modification to run successfully. Additionally, popular Python modules such as NumPy, scikit-learn and SciPy are required to run the code.

The code was optimized for performance by replacing loops with linear algebra operations.
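As an illustration of this kind of optimization, the sketch below computes a document-by-topic log-likelihood table twice, once with explicit loops and once as a single matrix product. The arrays X and log_p and both function names are hypothetical toy stand-ins, not taken from the notebooks.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy stand-ins (assumptions): 6 documents, 8 words, 3 topics.
X = rng.integers(0, 5, size=(6, 8)).astype(float)    # document-by-word counts
log_p = np.log(rng.dirichlet(np.ones(8), size=3))    # topic-by-word log probabilities


def loglik_loops(X, log_p):
    """Sum of count * log-probability for every (document, topic) pair, with loops."""
    n_docs, n_words = X.shape
    n_topics = log_p.shape[0]
    out = np.zeros((n_docs, n_topics))
    for i in range(n_docs):
        for j in range(n_topics):
            for k in range(n_words):
                out[i, j] += X[i, k] * log_p[j, k]
    return out


def loglik_matmul(X, log_p):
    """The same table computed as one matrix product."""
    return X @ log_p.T


assert np.allclose(loglik_loops(X, log_p), loglik_matmul(X, log_p))
```

Both functions return the same table; the matrix product hands the inner loops to NumPy's compiled routines.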

Topic Models

We used a multivariate multinomial distribution model to cluster documents into 30 topics based on the distribution of words.

We used SciPy's CSR matrix to optimize performance, since the dataset is sparse. In addition, we initialized EM with starting points taken from the results of scikit-learn's k-means.

To avoid underflow/overflow problems in the computation of w, we used the log-sum-exp trick.

Subsequently, we plotted the value of Q as it varies over the iterations of EM.

Upon convergence, we graphed the probability with which each topic is selected and tabulated the 10 most probable words for each topic.
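A minimal sketch of this setup, using toy data rather than the NIPS dataset: the counts are stored as a CSR matrix, k-means labels are turned into hard initial assignments, and those assignments give initial topic parameters. The names W0, pi0 and p0 and the add-one smoothing are assumptions made for illustration; the notebooks may initialize differently.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
dense = rng.poisson(0.3, size=(40, 25))       # toy counts, mostly zeros
X = csr_matrix(dense, dtype=np.float64)       # sparse document-by-word matrix

n_topics = 3                                  # toy value; the README uses 30
kmeans = KMeans(n_clusters=n_topics, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)                # KMeans accepts CSR input

# One-hot (hard) assignments derived from the k-means labels.
W0 = np.zeros((X.shape[0], n_topics))
W0[np.arange(X.shape[0]), labels] = 1.0

# Initial topic weights and word probabilities.
pi0 = W0.mean(axis=0)
counts = np.asarray(X.T @ W0).T + 1.0         # add-one smoothing (assumption)
p0 = counts / counts.sum(axis=1, keepdims=True)
```

Each row of p0 is a probability distribution over words, so EM can start from a point that already reflects some cluster structure.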
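The sketch below shows why the trick matters, assuming w denotes the E-step weights (the posterior probability of each topic for each document). The function name e_step_weights and the toy inputs are hypothetical. With long documents, exponentiating the log-likelihoods directly underflows to zero, whereas normalizing in log space first keeps the weights well defined.

```python
import numpy as np
from scipy.special import logsumexp


def e_step_weights(X, log_p, log_pi):
    """Posterior topic weights, normalized in log space for stability."""
    log_joint = X @ log_p.T + log_pi                         # (n_docs, n_topics)
    log_norm = logsumexp(log_joint, axis=1, keepdims=True)   # log of each row's sum
    return np.exp(log_joint - log_norm)


# Toy inputs (assumptions): two long documents, two words, two topics.
X = np.array([[1000.0, 600.0], [400.0, 1600.0]])
log_p = np.log(np.array([[0.6, 0.4], [0.3, 0.7]]))
log_pi = np.log(np.array([0.5, 0.5]))

naive = np.exp(X @ log_p.T + log_pi)   # every entry underflows to 0.0
w = e_step_weights(X, log_p, log_pi)   # rows are finite and sum to 1
```

Dividing naive by its row sums would give 0/0, while w assigns each document almost entirely to its better-fitting topic.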
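For reference, one way to compute such a value at each iteration is sketched below. It assumes Q denotes the expected complete-data log-likelihood of the multinomial mixture, which is the usual quantity tracked in EM; the function name q_value and the toy inputs are hypothetical.

```python
import numpy as np


def q_value(X, W, log_p, log_pi):
    """Sum over documents i and topics j of w_ij * (x_i . log p_j + log pi_j)."""
    return float(np.sum(W * (X @ log_p.T + log_pi)))


# Toy inputs (assumptions): two documents, two words, two topics.
X = np.array([[3.0, 1.0], [0.0, 4.0]])
W = np.array([[0.9, 0.1], [0.2, 0.8]])
log_p = np.log(np.array([[0.7, 0.3], [0.2, 0.8]]))
log_pi = np.log(np.array([0.5, 0.5]))

# Appending one value per EM iteration yields the series to plot.
q_history = [q_value(X, W, log_p, log_pi)]
```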
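Tabulating the most probable words comes down to sorting each topic's word probabilities. The following sketch assumes a topic-by-word probability matrix p and a vocabulary list vocab; both, along with the function name top_words, are hypothetical placeholders.

```python
import numpy as np


def top_words(p, vocab, n_top=10):
    """Return the n_top most probable words for each topic, most probable first."""
    order = np.argsort(-p, axis=1)[:, :n_top]    # column indices by descending probability
    return [[vocab[k] for k in row] for row in order]


rng = np.random.default_rng(0)
vocab = ["word{}".format(k) for k in range(50)]  # placeholder vocabulary
p = rng.dirichlet(np.ones(50), size=4)           # 4 toy topics over 50 words
table = top_words(p, vocab)
```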

The relevant published notebook can be viewed here, courtesy of Anaconda Cloud.

Image Segmentation

We used a multivariate normal distribution model to segment images into 10, 20 and 50 clusters. We used a different convergence threshold for each image, in proportion to the value of Q.

As with the topic model, we initialized EM with starting points from k-means, but we also experimented with several different starting points for the sunset image. The resulting segmentations showed little variation.

The relevant published notebook can be viewed here, courtesy of Anaconda Cloud.
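The segmentation procedure can be sketched roughly as follows. This is a simplified illustration and not the notebook's code: it assumes identity covariance for every cluster, treats each pixel as a 3-vector, and stops when the change in Q is small relative to the size of Q. The function name em_segment, the parameter rel_tol and the toy pixel data are all hypothetical.

```python
import numpy as np
from scipy.special import logsumexp


def em_segment(pixels, mu, pi, rel_tol=1e-4, max_iter=100):
    """EM for a normal mixture with identity covariance (an assumption).

    pixels: (n, 3) floats; mu: (k, 3) initial means; pi: (k,) initial weights.
    """
    n = pixels.shape[0]
    q_old = None
    for _ in range(max_iter):
        # E-step: log pi_j - 0.5 * ||x_i - mu_j||^2, normalized with log-sum-exp.
        sq_dist = ((pixels[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        log_joint = np.log(pi) - 0.5 * sq_dist
        W = np.exp(log_joint - logsumexp(log_joint, axis=1, keepdims=True))

        # M-step: weighted means and mixing weights (clipped to avoid empty clusters).
        weight_sum = np.maximum(W.sum(axis=0), 1e-12)
        mu = (W.T @ pixels) / weight_sum[:, None]
        pi = weight_sum / n

        # Q at the parameters used in this E-step, up to an additive constant.
        q_new = float(np.sum(W * log_joint))
        # Threshold scales with the magnitude of Q.
        if q_old is not None and abs(q_new - q_old) <= rel_tol * abs(q_old):
            break
        q_old = q_new
    return W.argmax(axis=1), mu, pi


# Toy "image" (assumption): 50 dark pixels and 50 bright pixels.
rng = np.random.default_rng(0)
pixels = np.vstack([rng.normal(20.0, 5.0, size=(50, 3)),
                    rng.normal(200.0, 5.0, size=(50, 3))])
mu0 = np.array([[0.0, 0.0, 0.0], [255.0, 255.0, 255.0]])  # could come from k-means instead
pi0 = np.array([0.5, 0.5])
labels, mu, pi = em_segment(pixels, mu0, pi0)
```

Reshaping labels to the image's height and width, and replacing each label with its cluster mean, would give a segmented image. Rerunning with different mu0 values is one way to check sensitivity to the starting point.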

References

For theory and model construction, please refer to the chapter "Clustering using Probability Models" in David Forsyth's textbook on Applied Machine Learning.

Contributors

zpahuja
