
Facial-Expression-Recognition

This note introduces the ViT_SE, Difference of Embeddings, and Disentangled Difference of Embeddings models for the facial expression recognition (FER) task.

📝 Contents

  • Experimental Model Introduction

    • ViT_SE
    • Difference of Embeddings
    • Disentangled Difference of Embeddings
  • Experimental Results

    • Dataset
      • AffectNet small
      • Camera two of CTBC Dataset (collected by MISLAB)
    • Experimental Results of AffectNet small
    • Experimental Results of CTBC Dataset
    • Ablation Study of Difference of Embeddings Model

📝 Experimental Model Introduction

  • ViT_SE

    Learning Vision Transformer with Squeeze and Excitation for Facial Expression Recognition
    The authors observe that the Vision Transformer gradually shifts from global attention to local attention, so they add an SE (squeeze-and-excitation) block to recalibrate the relationships among the local attention features.

    Model Structure

    (model structure diagram)

    Input shape: [Batch_size, 3, 224, 224]

    How to implement?

    1. Install the transformers library

       pip install transformers

    2. Read the following resources

    Then you can visit the official introduction pages to learn the API of the ViT model, or see the introduction and implementation here. Otherwise, you can refer directly to the source code. A minimal sketch follows.
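    Below is a minimal sketch of loading a ViT backbone through the transformers API and recalibrating its patch-token features with an SE block, in the spirit of the model above. The checkpoint name, embedding sizes, and SE placement are illustrative assumptions, not the authors' exact configuration.

    ```python
    import torch
    from transformers import ViTModel

    # Load a ViT backbone (this checkpoint choice is an assumption; the note
    # does not say which pretrained weights the authors used).
    vit = ViTModel.from_pretrained("google/vit-base-patch16-224-in21k")

    # Input shape matches the note: [Batch_size, 3, 224, 224]
    pixel_values = torch.randn(2, 3, 224, 224)
    outputs = vit(pixel_values=pixel_values)

    # last_hidden_state: [B, 197, 768] = 1 [CLS] token + 14*14 patch tokens
    patch_tokens = outputs.last_hidden_state[:, 1:, :]

    class SEBlock(torch.nn.Module):
        """Squeeze-and-excitation over token features: pool across tokens,
        then gate each channel to recalibrate the local features."""
        def __init__(self, channels: int, reduction: int = 16):
            super().__init__()
            self.fc1 = torch.nn.Linear(channels, channels // reduction)
            self.fc2 = torch.nn.Linear(channels // reduction, channels)

        def forward(self, x):                  # x: [B, N, C]
            s = x.mean(dim=1)                  # squeeze: average over tokens
            s = torch.sigmoid(self.fc2(torch.relu(self.fc1(s))))
            return x * s.unsqueeze(1)          # excite: rescale each channel

    recalibrated = SEBlock(768)(patch_tokens)  # [2, 196, 768]
    ```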

  • Difference of Embeddings

    This model uses the difference of embeddings (between the Neutral and target expressions) together with the target expression embedding to classify the target expression.

    Model Structure

    Difference_of_Embeddings model input shape: [Batchsize, 2, Cin, Height, Width]

    You must feed two images (the Neutral image and the target expression image) into the model at the same time.

    How to implement?

    Use a feature extractor based on MobileNetV3-Large and pretrained on ImageNet to infer the embeddings of the Neutral image and the target expression image; then concatenate the difference of the embeddings with the target expression embedding to infer the class of the target expression image.

    You can refer to the source code here. A minimal sketch follows.
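    Below is a minimal sketch of this design under stated assumptions: the 960-d MobileNetV3-Large feature size comes from torchvision's implementation, while the single linear classifier head and the 8-class output are illustrative, not necessarily the authors' exact architecture.

    ```python
    import torch
    import torch.nn as nn
    from torchvision.models import mobilenet_v3_large, MobileNet_V3_Large_Weights

    class DifferenceOfEmbeddings(nn.Module):
        def __init__(self, num_classes: int = 8):
            super().__init__()
            backbone = mobilenet_v3_large(
                weights=MobileNet_V3_Large_Weights.IMAGENET1K_V1)
            self.features = backbone.features    # ImageNet-pretrained extractor
            self.pool = nn.AdaptiveAvgPool2d(1)  # -> 960-d embedding
            self.classifier = nn.Linear(2 * 960, num_classes)

        def embed(self, x):
            return self.pool(self.features(x)).flatten(1)   # [B, 960]

        def forward(self, pair):
            # pair: [B, 2, C, H, W]; index 0 = Neutral, index 1 = target
            e_neutral = self.embed(pair[:, 0])
            e_target = self.embed(pair[:, 1])
            diff = e_target - e_neutral          # expression variation
            return self.classifier(torch.cat([diff, e_target], dim=1))

    model = DifferenceOfEmbeddings()
    logits = model(torch.randn(4, 2, 3, 224, 224))  # [4, num_classes]
    ```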

  • Disentangled Difference of Embeddings

    This model is separated into two parts.

    1. The first part learns the Emotion feature extractor, which aims to ignore the identities of different people.
    2. The second part uses the Emotion feature extractor to get the emotion embeddings of the Neutral image and the target expression image. The difference of the two embeddings represents the variation between the Neutral and target emotions; it is concatenated with the target expression embedding to classify the target expression image.

    Model Structure

    Pretrained_Emotion_Encoder model input shape: [Batchsize, Cin, Height, Width]
    Disentangled_Difference_of_Embeddings model input shape: [Batchsize, 2, Cin, Height, Width]

    How to implement?

    This model is trained in two steps (see the sketch after this list):

    1. Train the Pretrained_Emotion_Encoder by using the concatenation of the Identity embedding and the Emotion embedding to classify the expressions. (The pretrained Identity Encoder is a ResNet50 trained on MS1M and fine-tuned on VGGFace2.)
    2. Use the pretrained Emotion Encoder to infer the embeddings of the Neutral image and the target expression image, then concatenate the difference of the embeddings with the target expression embedding to infer the class of the target expression image.
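    Below is a minimal, runnable sketch of the two steps. The real encoders are replaced by toy linear layers so the sketch is self-contained: in practice the identity branch would be the ResNet50 (MS1M, fine-tuned on VGGFace2) mentioned above, and the emotion branch a CNN. The 512-d embedding size and 8 classes are assumptions.

    ```python
    import torch
    import torch.nn as nn

    # Toy stand-ins for the real encoders (see lead-in above).
    identity_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 512))
    emotion_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 512))
    head = nn.Linear(2 * 512, 8)

    # Step 1: classify expressions from [identity ; emotion] embeddings, but
    # update only the emotion branch and the head; the identity branch stays
    # frozen, pushing identity information out of the emotion embedding.
    optimizer = torch.optim.Adam(
        list(emotion_encoder.parameters()) + list(head.parameters()))
    images = torch.randn(4, 3, 224, 224)
    labels = torch.randint(0, 8, (4,))

    with torch.no_grad():
        id_emb = identity_encoder(images)      # frozen identity features
    emo_emb = emotion_encoder(images)          # trainable emotion features
    loss = nn.functional.cross_entropy(
        head(torch.cat([id_emb, emo_emb], dim=1)), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Step 2: reuse the pretrained emotion encoder as in Difference of
    # Embeddings: embed both images, concatenate [difference ; target], classify.
    classifier = nn.Linear(2 * 512, 8)
    neutral = torch.randn(4, 3, 224, 224)
    target = torch.randn(4, 3, 224, 224)
    with torch.no_grad():
        e_n, e_t = emotion_encoder(neutral), emotion_encoder(target)
    logits = classifier(torch.cat([e_t - e_n, e_t], dim=1))   # [4, 8]
    ```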

📝 Experimental Results

  • Dataset

    1. AffectNet small

    AffectNet is an annotated dataset collected in the wild that contains more than 1M facial images. In this experiment, we use only sampled images from the manually annotated part of AffectNet; these images cover 8 discrete facial expressions.

    Preprocessing procedures
    1. Randomly sample 10% of the pictures for each of the eight labels from the original manually annotated AffectNet
    2. Crop the face region (a cropping sketch follows the table below)
    3. Resize to 224×224

    Training and testing data

    | Expression    | Neutral | Happiness | Sadness | Surprise | Fear | Disgust | Anger | Contempt |
    |---------------|---------|-----------|---------|----------|------|---------|-------|----------|
    | Training data | 7487    | 13441     | 2545    | 1409     | 637  | 380     | 2488  | 375      |
    | Testing data  | 500     | 500       | 500     | 500      | 500  | 500     | 500   | 500      |
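    The face-cropping step above can be done with any face detector; a minimal sketch using facenet-pytorch's MTCNN (an assumed choice, with a hypothetical file name) is shown here, cropping and resizing to 224×224 in one call:

    ```python
    from facenet_pytorch import MTCNN
    from PIL import Image

    mtcnn = MTCNN(image_size=224, margin=0)   # detect, crop, and resize

    img = Image.open("sample.jpg")            # hypothetical input image
    face = mtcnn(img)                         # tensor [3, 224, 224], or None
                                              # if no face is detected
    ```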

    2. CTBC

    CTBC is an annotated dataset with 7 labels and 10 subjects, collected by MISLAB. In the following experiments, we use 10-fold validation on this dataset. Due to RAM limits on Colab, we use only part of the front-face images (camera 2) to train our model in some experiments.

    Preprocessing procedures for the CTBC experiments

    First procedure:
    1. Sample 16081 cam2 images
    2. Resize to 224×224

    Second procedure:
    1. Sample all of the cam2 data
    2. Resize to 128×128
  • Experimental Results of AffectNet small

    The experimental results demonstrate that Disentangled_Difference_of_Embeddings improves accuracy on AffectNet small by using the pretrained Emotion Encoder to extract only the emotion features, which mitigates the identity-variation problem in AffectNet.

    • Experimental Settings

      | Hyperparameter | Value         |
      |----------------|---------------|
      | Training data  | 28762         |
      | Testing data   | 4000          |
      | Batch size     | 32            |
      | Epochs         | 30            |
      | Optimizer      | Adam          |
      | Loss function  | Cross Entropy |
    • Experimental Results Comparison table of AffectNet small

  • Experimental Results of CTBC Dataset

    • Experimental Settings

      | Hyperparameter    | Value              |
      |-------------------|--------------------|
      | Batch size        | 32                 |
      | Epochs            | 30                 |
      | Optimizer         | Adam               |
      | Loss function     | Cross Entropy      |
      | Validation method | 10-fold validation |
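      Since the validation method is 10-fold validation, a minimal sketch of the split using scikit-learn's KFold is shown below. Training and evaluation calls are omitted, and whether the folds are split by subject is not specified above, so plain KFold is an assumption:

      ```python
      import numpy as np
      from sklearn.model_selection import KFold

      indices = np.arange(16081)     # one index per sampled cam2 image
      kf = KFold(n_splits=10, shuffle=True, random_state=0)

      for fold, (train_idx, test_idx) in enumerate(kf.split(indices)):
          # train on 9 folds, evaluate on the held-out fold (code omitted)
          print(f"fold {fold}: train={len(train_idx)}, test={len(test_idx)}")
      ```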

    • Experimental Results

  • Ablation Study of Difference of Embeddings

    The experimental results demonstrate that combining the baseline with the difference of embeddings, which represents the variation of expressions, improves expression recognition performance.

    • Experimental Settings

      | Hyperparameter    | Value              |
      |-------------------|--------------------|
      | Batch size        | 32                 |
      | Epochs            | 30                 |
      | Optimizer         | Adam               |
      | Loss function     | Cross Entropy      |
      | Validation method | 10-fold validation |

    • Experimental Results
