siamese_net's Introduction

A siamese-like model with hard-mining for image and semantic fused data

This code is the reference implementation of a Siamese architecture applied to cross-temporal matching of geographical data, used in the publication: Margarita Khokhlova, Valerie Gouet-Brunet, Nathalie Abadie, and Liming Chen. "Cross-year multi-modal image retrieval using siamese networks". To appear in the proceedings of the 27th IEEE International Conference on Image Processing (2020).

The architecture proposed is used to learn the descriptors for aerial images of the same geographic zone taken 15 years apart. Both images and semantic labels are used in an early fusion scenario to produce a compact descriptor, which can then be exploited in an image retrieval task.

The model and dataloaders can be found in the corresponding files. The model is implemented in Keras; the architecture is shown below.

[Figure: model architecture]

The architecture is based on classical Siamese network implementations (see the Keras Siamese demo or any other), but modified for my custom data pairs and for an early fusion scenario with multi-modal data (i.e. the network takes two images as input). The backbone is ResNet50. Binary cross-entropy loss is used in this version.

The dataloaders are all custom. Hard mining is performed by pre-calculating embeddings with the current network weights and creating positive and negative pairs of images. Hard samples can be re-computed several times during training to mine for new hard pairs. In the current implementation, hard mining happens every 5 steps. An example of the input image pairs (positive pairs) is shown below.

[Figure: example of positive input image pairs]
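The hard-mining step described above can be illustrated with a minimal NumPy sketch. This is not the code from train_siamese.py: the function name `mine_hard_pairs`, the use of squared Euclidean distance, and the choice of exactly one hardest negative per anchor are all assumptions made for illustration. It assumes that row i of both embedding arrays describes the same geographic zone.

```python
import numpy as np


def mine_hard_pairs(emb_a, emb_b):
    """Build positive and hard-negative index pairs (hypothetical helper).

    emb_a, emb_b: arrays of shape (n, d); row i of emb_a and row i of emb_b
    are assumed to be embeddings of the same zone at two different dates.
    Returns (pairs, labels): pairs has shape (2n, 2) with indices
    (index into emb_a, index into emb_b); labels is 1 for positives, 0 for negatives.
    """
    # Pairwise squared Euclidean distances between all anchors and candidates.
    diff = emb_a[:, None, :] - emb_b[None, :, :]
    dist = np.sum(diff ** 2, axis=-1)

    # Exclude the true match, then take the closest remaining candidate:
    # the non-matching sample the current weights confuse most with the anchor.
    masked = dist.copy()
    np.fill_diagonal(masked, np.inf)
    hard_neg = masked.argmin(axis=1)

    idx = np.arange(dist.shape[0])
    positives = np.stack([idx, idx], axis=1)
    negatives = np.stack([idx, hard_neg], axis=1)
    pairs = np.concatenate([positives, negatives])
    labels = np.concatenate([np.ones(len(idx)), np.zeros(len(idx))])
    return pairs, labels


# Toy embeddings standing in for descriptors computed with the current weights.
emb_a = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
emb_b = np.array([[0.0, 0.1], [0.9, 0.0], [5.0, 5.2]])
pairs, labels = mine_hard_pairs(emb_a, emb_b)
```

In a training loop, a function like this would be called again after the embeddings are re-computed, so that the negative pairs follow the current state of the network.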

The main files:

  • model_for_siamese.py - model definition
  • train_siamese.py - training with hard mining and a binary cross-entropy (recommended) or focal loss

Unfortunately, we do not provide the final dataset for this work, but the unprocessed version can be found on the IGN website. The data are called BD TOPO and BD Ortho: https://www.data.gouv.fr/en/datasets/bd-ortho-r-50-cm/

mAP@5 for unique image correspondence retrieval is used as the evaluation metric, with retrieval performed by an unsupervised KNN over the computed image descriptors.
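As a rough illustration of this metric, the sketch below computes mAP@k when each query has exactly one correct match in the gallery. In that case the average precision of a query reduces to 1/rank of the correct item if it appears in the top k, and 0 otherwise. The function name `map_at_k`, the brute-force nearest-neighbour search, and the squared Euclidean distance are assumptions for illustration, not the evaluation code of this repository.

```python
import numpy as np


def map_at_k(query_emb, gallery_emb, k=5):
    """mAP@k for unique correspondences (hypothetical helper).

    Row i of query_emb is assumed to match row i of gallery_emb and nothing else.
    """
    # Brute-force KNN: squared Euclidean distance from every query to every gallery item.
    diff = query_emb[:, None, :] - gallery_emb[None, :, :]
    dist = np.sum(diff ** 2, axis=-1)
    topk = np.argsort(dist, axis=1)[:, :k]

    # hits[i, j] is True if the j-th retrieved item is the correct match of query i.
    targets = np.arange(len(query_emb))[:, None]
    hits = topk == targets

    # With one relevant item per query, AP@k is 1/rank of the hit, or 0 if missed.
    ranks = np.arange(1, topk.shape[1] + 1)
    ap = (hits / ranks).sum(axis=1)
    return float(ap.mean())


# Toy 1-D descriptors: each query finds its true match in second place.
queries = np.array([[0.0], [1.0]])
gallery = np.array([[0.6], [0.4]])
score = map_at_k(queries, gallery, k=5)
```

For a large gallery, the brute-force distance matrix would normally be replaced by a dedicated nearest-neighbour index; the metric itself stays the same.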

The final descriptor dimension can be tuned. I got the best results with 128, which also has the advantage of being smaller, but 256 seems to give similar performance. 512 tends to be less stable to train, but we did not perform a complete hyper-parameter search for this descriptor size. The mAP@5 curves for 128 and 256 are shown below.

[Figure: mAP@5 curves for descriptor sizes 128 and 256]
