Multimodal-Image-Retrieval

This work presents a multi-modal medical image retrieval approach that combines visual and textual features to improve retrieval performance. SIFT features capture the important visual characteristics of the medical images, and Latent Dirichlet Allocation (LDA) represents the topics of the clustered SIFT features. Two fusion techniques were tried for deriving the composite feature set: early fusion and late fusion. In early fusion, features obtained from an autoencoder and a modified VGG-16 model were used. The late fusion approach was implemented as an ensemble of visual and textual features, aided by SVM-based classification to improve retrieval performance. In the experiments, performance dropped when the textual features were incorporated, which indicates that the co-occurrence matrix was not an effective way of fusing the textual and visual features in this case. Further attempts to decrease sparsity with an autoencoder and with VGG features did not improve performance. Separating the textual and visual components through late fusion gave better results: the performance of the visual-features-only model improved when the result list was re-ranked using an independently trained text classifier. This outperformed the early fusion approaches proposed in this work as well as those described in other contemporary works.
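The re-ranking step of the late fusion approach can be illustrated with a small sketch. It assumes one possible rule that the description above does not spell out: results whose text-predicted class matches the query's class are moved to the front, and the visual order is kept within each group. The function name `rerank_by_text_class`, the image ids, and the class labels are hypothetical and are not taken from `evaluate_late_fusion.py`.

```python
from typing import Dict, List, Sequence


def rerank_by_text_class(
    visual_ranking: Sequence[str],
    predicted_class: Dict[str, str],
    query_class: str,
) -> List[str]:
    """Move results whose text-predicted class matches the query's to the front.

    The visual order is preserved inside each group (a stable partition).
    This is an assumed re-ranking rule, used here only for illustration.
    """
    matches = [img for img in visual_ranking if predicted_class.get(img) == query_class]
    others = [img for img in visual_ranking if predicted_class.get(img) != query_class]
    return matches + others


# Hypothetical image ids, already sorted by visual similarity to the query.
visual_ranking = ["img_07", "img_03", "img_11", "img_02"]
# Hypothetical labels predicted by a separately trained text classifier.
predicted_class = {
    "img_07": "chest",
    "img_03": "hand",
    "img_11": "hand",
    "img_02": "chest",
}

reranked = rerank_by_text_class(visual_ranking, predicted_class, query_class="hand")
print(reranked)  # ['img_03', 'img_11', 'img_07', 'img_02']
```

Because the text classifier is only applied to an existing result list, the visual and textual models stay independent, which matches the separation that late fusion relies on.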

File descriptions

| File | Description |
| --- | --- |
| step1_image.py | Generates the visual bag of words |
| tf_kmeans.py | TensorFlow K-Means used in step1_image.py |
| lda_image.py | Computes the latent visual topics |
| step_1_text.py | Textual feature extraction |
| step_2_text.py | SVM on textual features |
| co_occurence.py | get_vistex() computes the co-occurrence matrix described in the early fusion approach |
| autoencoder.py | The autoencoder used for reducing sparsity in the early fusion approach |
| vgg_net.py | The VGG network used for reducing sparsity in the early fusion approach |
| metrics.py | The metrics used for evaluation |
| evaluate_just_visual.py | Computes the performance using only the visual features |
| evaluate_vistex.py | Computes the performance using only the co-occurrence matrix |
| evaluate_autoencoder.py | Computes the performance using only the co-occurrence matrix compressed with the autoencoder |
| evaluate_vgg.py | Computes the performance using only the co-occurrence matrix compressed with VGG |
| evaluate_late_fusion.py | The late fusion approach and its evaluation |
| Other utility code | gen_path_list.py, irma_reader.py |
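The evaluation scripts compare ranked result lists against ground truth. The exact contents of metrics.py are not listed here, so the following is only a sketch of two metrics commonly used for retrieval, precision at k and average precision. The function names and sample data are assumptions for illustration.

```python
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k results that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked[:k]
    return sum(1 for item in top_k if item in relevant) / k


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Sum of precision@i at each rank i holding a relevant item, divided by
    the total number of relevant items."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


# Hypothetical ranked list for one query and its set of relevant images.
ranked = ["img_03", "img_07", "img_11", "img_02"]
relevant = {"img_03", "img_11"}

p_at_2 = precision_at_k(ranked, relevant, k=2)
ap = average_precision(ranked, relevant)
print(p_at_2, round(ap, 4))
```

Averaging `average_precision` over all queries gives mean average precision, a single number that can be compared across the visual-only, early fusion, and late fusion runs.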

Contributors

vikram-mm, suhasbs, aditya5558
