MMFeat

Multi-modal features toolkit in Python, developed at the University of Cambridge Computer Laboratory. The aim of this toolkit is to make it easier for researchers to use multi-modal features. Both image and sound (i.e., visual and auditory representations) are supported.

The following models are currently available:

  1. CNN: Convolutional neural network representations for images
  2. BoVW: Bag-of-visual-words for images, using DSIFT local descriptors
  3. BoAW: Bag-of-audio-words for sound files, using MFCC local descriptors
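The two bag-of-words models share one idea: each local descriptor (DSIFT for images, MFCC for sound) is assigned to its nearest cluster centroid, and a file is represented by the counts of those assignments. The following is a minimal pure-Python sketch of that quantization step, not the toolkit's own implementation. The function names and the toy two-dimensional descriptors are hypothetical, and the centroids are assumed to have been learned beforehand (for example with k-means).

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bag_of_words(descriptors, centroids):
    """Count how many descriptors fall closest to each centroid."""
    histogram = [0] * len(centroids)
    for descriptor in descriptors:
        # Index of the centroid nearest to this descriptor.
        nearest = min(
            range(len(centroids)),
            key=lambda i: squared_distance(descriptor, centroids[i]),
        )
        histogram[nearest] += 1
    return histogram


# Toy data (hypothetical): two centroids, three local descriptors.
centroids = [(0.0, 0.0), (10.0, 10.0)]
descriptors = [(0.5, 0.2), (9.0, 11.0), (1.0, -1.0)]
print(bag_of_words(descriptors, centroids))  # prints [2, 1]
```

The length of the resulting vector equals the number of centroids, which is what the k setting of the bag-of-words models controls.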

Getting started

The following dependencies need to be installed: numpy, scipy, scikit-learn and yaml. You may need to install Pillow for reading images from files into arrays. If you want to use the CNN model, you will also need to install Caffe. For BoAW you will need to install librosa as well.

Installing the main dependencies on Ubuntu:

sudo apt-get install build-essential python-dev python-setuptools \
                python-numpy python-scipy python-sklearn python-yaml
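To see which of the dependencies are already importable, a small check along the following lines may help. This is a sketch, not part of the toolkit: the helper name `missing_modules` is hypothetical, and it assumes a Python 3 interpreter for the check itself. The import names are the usual ones for these packages (scikit-learn imports as `sklearn`, Pillow as `PIL`, PyYAML as `yaml`).

```python
import importlib.util

# Import names of the dependencies, grouped by when they are needed.
REQUIRED = ["numpy", "scipy", "sklearn", "yaml"]
OPTIONAL = {
    "PIL": "reading images from files",
    "caffe": "the CNN model",
    "librosa": "the BoAW model",
}


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


for name in missing_modules(REQUIRED):
    print("missing required dependency:", name)
for name in missing_modules(OPTIONAL):
    print("missing optional dependency:", name, "(needed for", OPTIONAL[name] + ")")
```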

Citations

If you use this toolkit in your work, please cite the following paper:

D. Kiela (2016). MMFEAT: A Toolkit for Extracting Multi-Modal Features. Proceedings of ACL 2016: System Demonstrations, Berlin, Germany.

Tools

The toolkit comes with two tools that can be run from the command line and do not require any knowledge of Python.

miner.py

For mining images or sound files. Before you can use the miner you need to acquire API keys from Google, Bing, FreeSound or Flickr and set them in miner.yaml (see miner-example.yaml for an example). ImageNet does not require an API key. The query_file argument should point to a file that contains a list of queries, one query per line. Usage:

miner.py [-h] [-n NUM_FILES]
                {bing,google,freesound,flickr,imagenet} query_file data_dir

Examples:

# Get 10 images per query term from Bing and store in a data directory
python miner.py -n 10 bing list_of_queries.txt ./img_data_dir
# Get 100 sound files per query term from FreeSound and store in a data directory
python miner.py -n 100 freesound list_of_queries.txt ./sound_data_dir
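The query file is plain text with one query per line. If the queries already live in a Python list, a sketch such as the following can write them out. The helper `write_query_file` is hypothetical and not part of the toolkit; stripping whitespace and dropping blank or duplicate entries is an assumption made here for tidiness, not a documented requirement of miner.py.

```python
import os
import tempfile


def write_query_file(path, queries):
    """Write one query per line, skipping blanks and duplicates."""
    seen = set()
    cleaned = []
    for query in queries:
        query = query.strip()
        if query and query not in seen:
            seen.add(query)
            cleaned.append(query)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("".join(query + "\n" for query in cleaned))
    return cleaned


# Example: write a small query list into a temporary directory.
query_path = os.path.join(tempfile.mkdtemp(), "list_of_queries.txt")
print(write_query_file(query_path, ["elephant", "happiness", " elephant "]))
```

The resulting file can then be passed as the query_file argument.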

extract.py

For extracting representations from a data directory. The data directory needs to contain an index file (index.pkl), which is either generated automatically by the miner or constructed manually. Usage:

extract.py [-h] [-gpu] [-k K] [-c CENTROIDS] [-o {pickle,json,csv}]
                  [-s SAMPLE_FILES] [-m {vgg,alexnet}] [-v]
                  {boaw,bovw,cnn} data_dir out_file

Examples:

# Extract BoVW representations with k=100, sampling 10% for clustering, and store as a Python pickle.
python extract.py -k 100 -s 0.1 bovw ./img_data_dir ./output_vectors.pkl
# Extract CNN representations, using an AlexNet on a GPU, and store as a JSON file.
python extract.py -gpu -o json cnn ./img_data_dir ./output_vectors.json
# Extract BoAW representation with k=300, sampling 50% for clustering, and store as a CSV file.
python extract.py -k 300 -s 0.5 -o csv boaw ./sound_data_dir ./output_vectors.csv
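The three output formats can be read back with the Python standard library. The sketch below picks a reader from the file extension, matching the file names used in the examples above. The helper `load_vectors` is hypothetical, and the layout of the stored object is not described here, so inspect what comes back before relying on it. Pickle files should only be loaded when they come from a trusted source.

```python
import csv
import json
import os
import pickle
import tempfile


def load_vectors(path):
    """Load an output file, choosing the reader from the file extension."""
    extension = os.path.splitext(path)[1].lower()
    if extension == ".pkl":
        with open(path, "rb") as handle:
            return pickle.load(handle)
    if extension == ".json":
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
    if extension == ".csv":
        with open(path, newline="", encoding="utf-8") as handle:
            return list(csv.reader(handle))  # one list of strings per row
    raise ValueError("unrecognised extension: " + extension)


# Round trip with toy data (hypothetical content, not real toolkit output).
demo_path = os.path.join(tempfile.mkdtemp(), "output_vectors.json")
with open(demo_path, "w", encoding="utf-8") as handle:
    json.dump({"elephant": [0.1, 0.2]}, handle)
print(type(load_vectors(demo_path)))
```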

To extract layers from the CNN you need to tell the toolkit where it can find Caffe. For example (run this, or simply add to your ~/.bashrc):

export CAFFE_ROOT_PATH="/usr/local/caffe/"
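Before starting a long CNN extraction, it can be worth confirming that the variable is visible to Python and points at an existing directory. The following is a small sketch; the helper `caffe_root_status` is hypothetical, and it only checks the environment variable, not whether Caffe itself is built correctly.

```python
import os


def caffe_root_status(environ=None):
    """Return 'unset', 'missing' or 'ok' for the CAFFE_ROOT_PATH setting."""
    env = os.environ if environ is None else environ
    path = env.get("CAFFE_ROOT_PATH")
    if not path:
        return "unset"      # variable not exported in this shell
    if not os.path.isdir(path):
        return "missing"    # variable set, but the directory does not exist
    return "ok"


print("CAFFE_ROOT_PATH:", caffe_root_status())
```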

Demos

1. Similarity and relatedness (1-simrel)

The demo downloads images from either Google or Bing and creates BoVW or CNN representations. It then evaluates similarity and relatedness (i.e., Spearman correlation with human similarity ratings) on the well-known MEN and SimLex-999 datasets. See, e.g., Learning Image Embeddings using Convolutional Neural Networks for Improved Multi-Modal Semantics.
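The evaluation idea can be illustrated without the toolkit: score each word pair by the cosine similarity of its two representations, then compute the Spearman rank correlation between those scores and the human ratings. The pure-Python sketch below is not the demo's code; the vectors and ratings are toy numbers invented for illustration and are not taken from MEN or SimLex-999.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy representations and ratings (hypothetical values).
vectors = {"car": [1.0, 0.1], "automobile": [0.9, 0.2], "banana": [0.1, 1.0]}
pairs = [("car", "automobile"), ("car", "banana"), ("automobile", "banana")]
human_ratings = [9.0, 1.0, 1.5]
model_scores = [cosine(vectors[a], vectors[b]) for a, b in pairs]
print(spearman(model_scores, human_ratings))
```

With scipy installed, `scipy.stats.spearmanr` computes the same statistic.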

2. ESP Game dataset (2-esp)

The demo downloads the ESP Game dataset sample and extracts it. It then builds an index from the label lookup and obtains BoVW or CNN representations for the thumbnail images. The representations are stored in a file for later use.

3. Matlab interfacing (3-matlab)

A simple demo to show that you can get local descriptors from Matlab and load them. This means you can use VLFeat or other libraries for getting descriptors (for instance, PHOW) as well.

4. Music instrument clustering (4-instruments)

The demo downloads sound files for 8 instruments of two classes and obtains auditory representations. It then clusters the representations and reports the outcomes. See Multi- and Cross-Modal Semantics Beyond Vision: Grounding in Auditory Perception.

5. Image dispersion scores (5-dispersion)

The demo downloads images for "elephant" and "happiness" and calculates the image dispersion scores of these concepts. See Improving Multi-Modal Representations Using Image Dispersion: Why Less is Sometimes More.
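As a rough illustration, and assuming the definition used in the paper cited above, the image dispersion of a concept is the average cosine distance over all pairs of its image representations. A concrete concept such as "elephant" would then be expected to score lower than an abstract one such as "happiness". The sketch below is not the demo's code, and the vectors are toy values.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """One minus the cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def image_dispersion(vectors):
    """Average pairwise cosine distance over all unordered pairs."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        raise ValueError("need at least two vectors")
    return sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)


# Toy image representations (hypothetical): similar images vs. varied images.
similar_images = [[1.0, 0.1], [0.9, 0.2], [1.0, 0.0]]
varied_images = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
print(image_dispersion(similar_images) < image_dispersion(varied_images))
```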

6. Image search plot (6-searchplot)

A simple plotting demo of images returned by various search engines. Requires matplotlib.

7. CNN layers (7-cnnlayers)

Shows how you can transfer different layers from the CNN models.

8. ImageNet (8-imagenet)

Uses ImageNet to retrieve images for provided synsets. This requires NLTK and the NLTK WordNet corpus to be installed.
