Learning the Latent "Look"

This code provides an implementation of the research paper:

Learning the Latent "Look": Unsupervised Discovery of a Style-Coherent Embedding from Fashion Images
Wei-Lin Hsiao, Kristen Grauman
International Conference on Computer Vision (ICCV), 2017

See our project page for more detail.

This project is based on the MALLET polylingual LDA implementation, with two minor changes:
We smooth the topic/word counts into topic/word probabilities.
For stability, after convergence we draw 100 more samples and average the learned topic/word probabilities over them.
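
The two changes listed above can be sketched as follows. This is a minimal illustration in Python, not the code in this repository (which is Java). It assumes the usual symmetric-Dirichlet estimate phi[k][w] = (n[k][w] + beta) / (n[k] + V * beta) for the smoothing step and a plain arithmetic mean over the extra samples for the averaging step. The function names and the toy counts are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def smooth_counts(counts: List[List[int]], beta: float) -> Matrix:
    """Turn topic/word counts into probabilities with additive smoothing.

    Assumed form: phi[k][w] = (n[k][w] + beta) / (sum_w n[k][w] + V * beta),
    where V is the vocabulary size.
    """
    vocab_size = len(counts[0])
    phi = []
    for row in counts:
        denom = sum(row) + vocab_size * beta
        phi.append([(n + beta) / denom for n in row])
    return phi


def average_samples(samples: List[Matrix]) -> Matrix:
    """Element-wise mean of several topic/word probability matrices."""
    num_samples = len(samples)
    num_topics = len(samples[0])
    vocab_size = len(samples[0][0])
    return [
        [sum(s[k][w] for s in samples) / num_samples for w in range(vocab_size)]
        for k in range(num_topics)
    ]


# Toy example: 2 topics, 3 words, counts from two hypothetical Gibbs samples.
sample_counts = [
    [[4, 0, 1], [0, 3, 2]],
    [[3, 1, 1], [1, 3, 1]],
]
beta = 0.01  # placeholder value, not a recommendation
phi_avg = average_samples([smooth_counts(c, beta) for c in sample_counts])
```

Because every smoothed sample is a proper distribution over words, each row of the averaged matrix still sums to 1, and no word ends up with probability exactly zero.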

Installation

To build a Mallet 2.0 development release, you must have the Apache Ant build tool installed. From the command prompt, change to the mallet directory and then type ant.

If ant finishes with "BUILD SUCCESSFUL", Mallet is now ready to use.

Usage

  1. Prepare your documents. Each document occupies a single line of the file, and every line has the form:
docID language_name word_1 word_2 ...

The documents for each language go in their own file. The Nth document in language 1 is assumed to have the same topic distribution (though a different vocabulary) as the Nth document in language 2, and so on. If a language lacks the Nth document, leave the word list on that line blank:

docID language_name 

An example corpus can be downloaded here.
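
As a rough sketch of this layout, the Python snippet below writes one file per language from aligned documents and emits a line with only the docID and language name when a language has no Nth document. The helper name, the language labels, and the attribute words are made up for illustration. Only the line format comes from the description above.

```python
import os
import tempfile
from typing import Dict, List, Optional


def write_language_files(
    docs: Dict[str, List[Optional[List[str]]]], out_dir: str
) -> Dict[str, str]:
    """Write one aligned corpus file per language.

    docs maps a language name to its list of documents. Entry N of every
    language describes the same underlying item; None marks a missing document.
    """
    lengths = {len(d) for d in docs.values()}
    if len(lengths) != 1:
        raise ValueError("every language needs the same number of documents")
    paths = {}
    for language, documents in docs.items():
        path = os.path.join(out_dir, f"{language}.txt")
        with open(path, "w", encoding="utf-8") as handle:
            for doc_id, words in enumerate(documents):
                # A missing document keeps its docID and language name and
                # simply has no words after them.
                handle.write(f"{doc_id} {language} " + " ".join(words or []) + "\n")
        paths[language] = path
    return paths


# Hypothetical two-language corpus; the second item has no "lower" document.
corpus = {
    "upper": [["long_sleeve", "floral"], ["sleeveless", "denim"]],
    "lower": [["skirt", "pleated"], None],
}
out_dir = tempfile.mkdtemp()
paths = write_language_files(corpus, out_dir)
```

Using the row index as the docID keeps the files aligned by construction: line N of every file refers to the same item.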

  2. Import the documents for each language. The --token-regex value shown here is the one appropriate for our attribute vocabulary.
bin/mallet import-file --input <document_filename> --output <sequence_filename> --keep-sequence --token-regex '[\p{L}\p{N}_<>/-]+|[\p{P}]+'
  3. Train a model.
bin/mallet run cc.mallet.topics.PolylingualTopicModel --language-inputs <language1_sequence> <language2_sequence> ... <languageN_sequence> --alpha <alpha_val> --beta <beta_val> --num-topics <number_of_topics> --output-average-doc-topics <theta_save_to_filename> --output-average-topic-keys <topic_top_words_save_to_filename> --output-average-key-probs <phi_save_to_filename>
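
When there are several languages, the import and training commands above can be assembled programmatically. The following is a small Python sketch under stated assumptions: the helper names, file names, and numeric values are placeholders invented for illustration, while the subcommands and flags are exactly the ones shown above. The commands are built as argument lists so that the token regex needs no shell quoting.

```python
import shlex
from typing import List

# Token regex copied from the import step above.
TOKEN_REGEX = r"[\p{L}\p{N}_<>/-]+|[\p{P}]+"


def import_command(document_file: str, sequence_file: str) -> List[str]:
    """Argument list for importing one language's documents."""
    return [
        "bin/mallet", "import-file",
        "--input", document_file,
        "--output", sequence_file,
        "--keep-sequence",
        "--token-regex", TOKEN_REGEX,
    ]


def train_command(
    sequence_files: List[str],
    alpha: float,
    beta: float,
    num_topics: int,
    theta_file: str,
    topic_keys_file: str,
    phi_file: str,
) -> List[str]:
    """Argument list for training the polylingual topic model."""
    return [
        "bin/mallet", "run", "cc.mallet.topics.PolylingualTopicModel",
        "--language-inputs", *sequence_files,
        "--alpha", str(alpha),
        "--beta", str(beta),
        "--num-topics", str(num_topics),
        "--output-average-doc-topics", theta_file,
        "--output-average-topic-keys", topic_keys_file,
        "--output-average-key-probs", phi_file,
    ]


# Placeholder file names and hyperparameters, not recommended settings.
import_cmd = import_command("upper.txt", "upper.seq")
train_cmd = train_command(
    ["upper.seq", "lower.seq"],
    alpha=1.0, beta=0.01, num_topics=10,
    theta_file="theta.txt", topic_keys_file="topic_keys.txt", phi_file="phi.txt",
)

# Print shell-quoted versions for inspection. To execute them, one option is
# subprocess.run(cmd, check=True, cwd=<mallet directory>).
print(shlex.join(import_cmd))
print(shlex.join(train_cmd))
```

Note that shlex.join requires Python 3.8 or later.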

For details about the commands please visit the API documentation and website at: http://mallet.cs.umass.edu/
