Giter Club home page Giter Club logo

huse's Introduction

HUSE

PyTorch implementation of HUSE: Hierarchical Universal Semantic Embeddings

Classification task involving cross-modal representation learning corresponding to images and text. HUSE projects images and text into a shared latent space by using a shared classification layer for image and text modalities. It incorporates semantic information into a universal embedding space. The original paper referenced can be found here.

Data

For this implementation, images from the a retail store has been used. The images can be found in ./images folder, and the corresponding .csv files for both training and validation can be found at ./data/ folder.

Prerequisites

  • python >= 3.6
  • pytorch 1.4.0
  • pytorch-pretrained-bert 0.6.2 find it here.
  • sentence-transformers 0.2.6.1 find it here.

Usage

Train model

python main.py train --csv_file_path </path/to/train/csv/file> --batch_size 4 --epochs 100 --save_model_dir </path/to/save-model/folder> --cuda 1
  • --csv_file_path path to the csv file which contains the train image paths and text, default is ./data/train_data.csv
  • --batch_size batch size used while training, default is 4
  • --epochs number of training epochs, default is 100
  • --save_model_dir path to folder where trained model will be saved, default is ./saved_models/
  • --cuda set it to 1 for running on GPU, 0 for CPU. (default is 1)

Evalutating the model

python main.py eval --csv_file_path </path/to/eval/csv/file> --model </path/to/saved/model> --cuda 1
  • --csv_file_path path to the csv file which contains the validation image paths and text, default is ./data/eval_data.csv
  • --model saved model to be used for evaluating the images
  • --cuda set it to 1 for running on GPU, 0 for CPU. (default is 1)

Refer to ./main.py for other command line arguments.

Model Architecture

model_high_level Image embeddings and text embeddings are passed through their respective towers to get universal embeddings. These embeddings are passed through a shared hidden layer. The semantic graph is used to regularize the universal embedding space.

Citation

@misc{narayana2019huse,
title={HUSE: Hierarchical Universal Semantic Embeddings},
author={Pradyumna Narayana and Aniket Pednekar and Abishek Krishnamoorthy and Kazoo Sone and Sugato Basu},
year={2019},
eprint={1911.05978},
archivePrefix={arXiv},
primaryClass={cs.CV}
 }

huse's People

Contributors

amanjain1397 avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.