
face-emotion-recognition's Introduction

HSEmotion (High-Speed face Emotion recognition) library


This repository contains code that was developed by A. Savchenko during his research at the HSE University and Sber AI Lab.

Usage

The Python packages hsemotion and hsemotion-onnx were prepared to simplify the usage of our models for facial expression recognition and extraction of visual emotional embeddings. They can be installed via pip:

    pip install hsemotion
    pip install hsemotion-onnx
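
For example, the recognizer from the hsemotion package can be used roughly as follows. This is a minimal sketch based on the snippets in the issues below; the model name and image path are placeholders, and the image is an already detected face crop converted to RGB, as in test_hsemotion_package.ipynb:

import cv2
from hsemotion.facial_emotions import HSEmotionRecognizer

fer = HSEmotionRecognizer(model_name='enet_b0_8_best_vgaf', device='cpu')  # or device='cuda'

frame_bgr = cv2.imread('face_crop.jpg')                 # placeholder path to a cropped face image
frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)  # OpenCV reads BGR; the notebooks pass RGB to the recognizer
emotion, scores = fer.predict_emotions(frame_rgb, logits=True)
print(emotion, scores)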

In order to run our code on the datasets, please first prepare them using our TensorFlow notebooks: train_emotions.ipynb, AFEW_train.ipynb and VGAF_train.ipynb.

If you want to run our mobile application, please run the following scripts inside the mobile_app folder:

python to_tflite.py
python to_pytorchlite.py

NOTE: The models have been updated so that they should work with recent versions of the timm library. However, the v0.1 EfficientNet models for PyTorch are based on the old timm 0.4.5 package, so exactly that version should be installed to use them:

pip install timm==0.4.5

News

  • Our models let our team HSEmotion take second place in the Compound Expression Recognition Challenge and third place in the Action Unit Detection Challenge at the sixth Affective Behavior Analysis in-the-Wild (ABAW) Competition.
  • The paper "Facial Expression Recognition with Adaptive Frame Rate based on Multiple Testing Correction" was accepted as an oral talk at ICML 2023. The source code to reproduce the results of this paper is available in this repository; see the "Adaptive Frame Rate" subsections in abaw3_train.ipynb and train_emotions-pytorch-afew-vgaf.ipynb.
  • Our models let our team HSE-NN take first place in the Learning from Synthetic Data (LSD) Challenge and third place in the Multi-Task Learning (MTL) Challenge at the fourth ABAW Competition.
  • Our models let our team HSE-NN take third place in the Multi-Task Learning Challenge, fourth place in the Valence-Arousal and Expression Challenges, and fifth place in the Action Unit Detection Challenge at the third Affective Behavior Analysis in-the-Wild (ABAW) Competition. Our approach is presented in the paper accepted at the CVPR 2022 ABAW Workshop.

Details

All models were pre-trained for the face identification task on the VGGFace2 dataset. The SAM code was borrowed to train the PyTorch models.

We provide several models that obtained state-of-the-art results on the AffectNet dataset. The facial features extracted by these models lead to state-of-the-art accuracy among face-only models on video datasets from the EmotiW 2019 and 2020 challenges: AFEW (Acted Facial Expressions in the Wild), VGAF (Video-level Group AFfect) and EngageWild; and on the ABAW CVPR 2022 and ECCV 2022 challenges: Learning from Synthetic Data (LSD) and Multi-Task Learning (MTL).

Here are the performance metrics (accuracy on AffectNet, AFEW and VGAF, and F1-score on LSD) on the validation sets of the above-mentioned datasets, together with the mean inference time and model size for our models on a Samsung Fold 3 device with a Qualcomm 888 CPU and Android 12:

| Model | AffectNet (8 classes) | AffectNet (7 classes) | AFEW | VGAF | LSD | MTL | Inference time, ms | Model size, MB |
|---|---|---|---|---|---|---|---|---|
| mobilenet_7.h5 | - | 64.71 | 55.35 | 68.92 | - | 1.099 | 16 ± 5 | 14 |
| enet_b0_8_best_afew.pt | 60.95 | 64.63 | 59.89 | 66.80 | 59.32 | 1.110 | 59 ± 26 | 16 |
| enet_b0_8_best_vgaf.pt | 61.32 | 64.57 | 55.14 | 68.29 | 59.72 | 1.123 | 59 ± 26 | 16 |
| enet_b0_8_va_mtl.pt | 61.93 | 64.94 | 56.73 | 66.58 | 60.94 | 1.276 | 60 ± 32 | 16 |
| enet_b0_7.pt | - | 65.74 | 56.99 | 65.18 | - | 1.111 | 59 ± 26 | 16 |
| enet_b2_7.pt | - | 66.34 | 59.63 | 69.84 | - | 1.134 | 191 ± 18 | 30 |
| enet_b2_8.pt | 63.03 | 66.29 | 57.78 | 70.23 | 52.06 | 1.147 | 191 ± 18 | 30 |
| enet_b2_8_best.pt | 63.125 | 66.51 | 56.73 | 71.12 | - | - | 191 ± 18 | 30 |

Please note that we report the accuracies for AFEW and VGAF only on the subsets in which MTCNN detects facial regions. The code also computes the overall accuracy on the complete test set, which is slightly lower due to missing faces or failed face detection.

Research papers

If you use our models, please cite the following papers:

@inproceedings{savchenko2023facial,
  title     = {Facial Expression Recognition with Adaptive Frame Rate based on Multiple Testing Correction},
  author    = {Savchenko, Andrey},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning (ICML)},
  pages     = {30119--30129},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  url       = {https://proceedings.mlr.press/v202/savchenko23a.html}
}
@inproceedings{savchenko2021facial,
  title={Facial expression and attributes recognition based on multi-task learning of lightweight neural networks},
  author={Savchenko, Andrey V.},
  booktitle={Proceedings of the 19th International Symposium on Intelligent Systems and Informatics (SISY)},
  pages={119--124},
  year={2021},
  organization={IEEE},
  url={https://arxiv.org/abs/2103.17107}
}
@inproceedings{Savchenko_2022_CVPRW,
  author    = {Savchenko, Andrey V.},
  title     = {Video-Based Frame-Level Facial Analysis of Affective Behavior on Mobile Devices Using EfficientNets},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
  month     = {June},
  year      = {2022},
  pages     = {2359--2366},
  url={https://arxiv.org/abs/2103.17107}
}
@inproceedings{Savchenko_2022_ECCVW,
  author    = {Savchenko, Andrey V.},
  title     = {{MT-EmotiEffNet} for Multi-task Human Affective Behavior Analysis and Learning from Synthetic Data},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV 2022) Workshops},
  pages={45--59},
  year={2023},
  organization={Springer},
  url={https://arxiv.org/abs/2207.09508}
}
@article{savchenko2022classifying,
  title={Classifying emotions and engagement in online learning based on a single facial expression recognition neural network},
  author={Savchenko, Andrey V and Savchenko, Lyudmila V and Makarov, Ilya},
  journal={IEEE Transactions on Affective Computing},
  year={2022},
  publisher={IEEE},
  url={https://ieeexplore.ieee.org/document/9815154}
}

face-emotion-recognition's People

Contributors

av-savchenko, sunggukcha


face-emotion-recognition's Issues

Additional files

What additional files need to be created in the project, and what should their contents be?
I would be happy if a file tree for the project files could be listed.
For example, the files referenced by this code:
AFFECT_DATA_DIR=ALL_DATA_DIR#+'AffectNet/'
AFFECT_TRAIN_DATA_DIR = AFFECT_DATA_DIR+'full_res/train'
AFFECT_VAL_DATA_DIR = AFFECT_DATA_DIR+'full_res/val'
AFFECT_IMG_TRAIN_DATA_DIR = AFFECT_DATA_DIR+str(INPUT_SIZE[0])+'/train'
AFFECT_IMG_VAL_DATA_DIR = AFFECT_DATA_DIR+str(INPUT_SIZE[0])+'/val'
AFFECT_TRAIN_ORIG_DATA_DIR = AFFECT_DATA_DIR+'orig/train'
AFFECT_VAL_ORIG_DATA_DIR = AFFECT_DATA_DIR+'orig/val'

Request for help in the Training Script for enet_b0_8_best_afew.onnx model

Hello,

I'm reaching out to seek guidance regarding the enet_b0_8_best_afew.onnx model. I'm interested in utilizing this model, but I'm encountering challenges in locating the relevant training script to generate this particular model.

While searching, I came across these scripts: GitHub Link. Additionally, I found that the model is placed under the affectnet folder: GitHub Link.

My understanding is that the model was trained with EfficientNet-B0 on the VGAF dataset, based on the code in VGAF_train.ipynb. However, I am unsure if this is the correct procedure or if there are additional steps involved.

Could someone please provide me with more information or guidance on how to properly train and obtain the enet_b0_8_best_afew.onnx model? Any assistance you could offer would be greatly appreciated.

Thank you for your time and support.

Preprocessing of images to run inference

Hello, thank you very much for your work.

I am trying to preprocess a batch of images (I have my own dataset) the way you prepared your data. I'm following the notebook train_emotions.ipynb, as it is in TensorFlow and I'm using that framework.

I have a question about the steps of the preprocessing, so I would like to ask you if you can tell me the correct steps. These are the steps I'm following, let me know if I'm right or if something is missing:

  1. I already have my images with the faces detected and cropped, i.e., I have a dataset full of cropped faces.

  2. img = cv2.imread(img_path)

  3. img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

  4. img = cv2.resize(img,(224,224))

  5. Then your notebook shows that you apply a normalization:

    def mobilenet_preprocess_input(x,**kwargs):
        x[..., 0] -= 103.939
        x[..., 1] -= 116.779
        x[..., 2] -= 123.68
        return x
    preprocessing_function=mobilenet_preprocess_input

Here I am having an issue because the subtraction between an integer array and a float cannot be cast in place, so I changed it to

def mobilenet_preprocess_input(x,**kwargs):
    x[..., 0] = x[..., 0] - 103.939
    x[..., 1] = x[..., 1] - 116.779
    x[..., 2] = x[..., 2] - 123.68
    return x
preprocessing_function=mobilenet_preprocess_input

So, let me know if the process I'm following is correct or if there's something missing.

Thank you!
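
For reference, here are the steps above gathered into one function. This is only a sketch of the pipeline described in this question, not an official preprocessing script; casting the image to float32 before the subtraction is an assumption that sidesteps the integer/float casting issue mentioned above:

import cv2
import numpy as np

def mobilenet_preprocess_input(x, **kwargs):
    x[..., 0] -= 103.939
    x[..., 1] -= 116.779
    x[..., 2] -= 123.68
    return x

def load_face(img_path):
    img = cv2.imread(img_path)                  # step 2: read the already cropped face
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # step 3: BGR -> RGB
    img = cv2.resize(img, (224, 224))           # step 4: resize to the network input size
    img = img.astype(np.float32)                # assumed cast so the float means can be subtracted in place
    return mobilenet_preprocess_input(img)      # step 5: per-channel mean subtraction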

Why is multi-task learning one head?

Thank you for your wonderful code sharing.

I'm looking at the multi-task learning code in train_emotions-pytorch.ipynb.

I would like to know why you used FER, Valence and Arousal in one head instead of dividing them into separate heads.

Thank you.

about the facial video

Pardon. Could you please tell me how to obtain the facial videos from the student's or teacher's device used in your paper? Thanks for your reply!

Accuracy on AffectNet is worse than expected

I am using the HSEmotion package to classify emotions on the AffectNet validation set but my accuracies are worse than what is in the README:

| Model | README accuracy | My accuracy |
|---|---|---|
| enet_b0_8_best_afew | 60.95 | 58.31 |
| enet_b0_8_best_vgaf | 61.32 | 59.66 |
| enet_b0_8_va_mtl | 61.93 | 59.04 |
| enet_b2_8 | 63.03 | 61.59 |
| enet_b2_8_best | 63.125 | 61.24 |

What do you think might be causing this discrepancy?

Here's an excerpt of how I'm using the package:

from hsemotion.facial_emotions import HSEmotionRecognizer
fer=HSEmotionRecognizer(model_name=model_name,device=device)

def predict_score(img_path):
    frame_bgr = cv2.imread(img_path)
    emotion,scores=fer.predict_emotions(frame_bgr,logits=True)
    return EMOTION_MAPPING[emotion]

running code on colab

Hello,
I want to run the code on Google Colab with the pre-trained models. After reading the README, I'm trying to run display_emotions.ipynb, but I'm getting errors.
Is display_emotions.ipynb the right place to start?

Error while testing

Hello and thank you for everything!
What do you think is the cause of this error? Whatever I searched for, I did not find a suitable answer.
[screenshot of the error attached in the original issue]

docker image

Hello, could you provide a Docker image? It would make it more convenient for me to study the code.
May all go well with you!

Request for a demo code

Hi,

Is there a demonstration code available for running videos or webcam feeds? Thank you.

Scoring error

Hello. Thanks for sharing your great works.

Your scoring script is correct only if the dataset size is divisible by the batch size: the per-batch accuracy is averaged over the number of batches, so a smaller final batch is weighted incorrectly.

epoch_val_accuracy = 0
epoch_val_loss = 0
for data, label in test_loader:
    data = data.to(device)
    label = label.to(device)

    val_output = model(data)
    val_loss = criterion(val_output, label)

    acc = (val_output.argmax(dim=1) == label).float().mean()
    epoch_val_accuracy += acc / len(test_loader)
    epoch_val_loss += val_loss / len(test_loader)

My version follows, where length is the size of the dataset (not the length of the dataloader).

loss = 0.0
accuracy = 0.0
for (images, emotions) in tqdm(dataloader):
    images = images.cuda()
    emotions = emotions.cuda()
    preds = model(images)
    # loss
    loss += criterion(preds, emotions) / 1
    # accuracy
    preds = torch.argmax(preds, dim=1)
    acc = torch.eq(preds, emotions).sum()
    accuracy += acc
loss /= length
accuracy /= length

AttributeError: 'EfficientNet' object has no attribute 'act1'

Thank you for the great work! I tried to use the EfficientNet model to predict facial emotions. I used code from test_hsemotion_package.ipynb. I downloaded the xx.pt file from the models folder and loaded it from a local directory. The model loaded well, but an error occurs when executing fer.predict_emotions(frame,logits=True).
Here is my code.

import os
import numpy as np
import matplotlib.pyplot as plt
import cv2
from PIL import Image
import torch
from hsemotion.facial_emotions import HSEmotionRecognizer

use_cuda = torch.cuda.is_available()
device = 'cuda' if use_cuda else 'cpu'

model_path='../models/affectnet_emotions/enet_b2_8_best.pt'

fer=HSEmotionRecognizer(model_name=model_path,device=device)
# model = torch.load(model_path, map_location=torch.device('cpu'))

fpath='../test_images/0_alamy_adoration_emotion_2_22.jpg'
frame_bgr = cv2.imread(fpath)
plt.figure(figsize=(5, 5))
frame = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
plt.axis('off')
plt.imshow(frame)

emotion, scores=fer.predict_emotions(frame,logits=True)
plt.figure(figsize=(3, 3))
plt.axis('off')
plt.imshow(frame)
plt.title(emotion)

The error is:

raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'EfficientNet' object has no attribute 'act1'

My timm version is 0.4.5. Any idea on how to solve this issue?

afew test accuracy is around 55%

Hi, thanks for your great work. Now I'm following your work on the AFEW dataset, but I only got around 55% accuracy by running AFEW_train.ipynb and train_emotions-pytorch.ipynb. When I read your code, I found that you used the enet_b0_8_va_mtl.pt model in train_emotions-pytorch.ipynb and the mobilenet_7.h5 model in AFEW_train.ipynb. I am confused about which model is the right one and what I should do to reproduce the 59% accuracy on the AFEW dataset.
thank you for your replies!

Question about inference

Hello.

Thanks for your code. I've tried to run your code on a webcam, and for that I needed to load the model properly from pretrained weights. I took a look at your train_emotions-pytorch notebook and saw that you load the model as

model=timm.create_model('tf_efficientnet_b0_ns', pretrained=False)
model.classifier=torch.nn.Identity()
model.load_state_dict(torch.load('../models/pretrained_faces/state_vggface2_enet0_new.pt'))

Then you add model.classifier=nn.Sequential(nn.Linear(in_features=1280, out_features=num_classes)) and train your model. If I got it right, you provide pre-trained weights in the models/affectnet_emotions folder. But the question is, how do I load them? I thought it should be something like this

model=timm.create_model('tf_efficientnet_b0_ns', pretrained=False)
model.classifier=nn.Sequential(nn.Linear(in_features=1280, out_features=8))
model.load_state_dict(torch.load('../models/affectnet_emotions/enet_b0_8_best_afew.pt'))

But no, I got an error: AttributeError: 'EfficientNet' object has no attribute 'copy'. So my guess is wrong then. Should I load it in some different way? Thanks in advance.
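
For comparison, the facial_emotions.py source quoted in a later issue loads the checkpoint simply with model=torch.load(path), which suggests the released .pt files are full serialized modules rather than state dicts. A hedged sketch of loading one directly (the file path is the one from the question; the 224x224 dummy input is an assumption used only to check the output shape):

import torch

# assumption: the .pt file stores the whole model object (classifier included),
# so torch.load returns a ready-to-use module and load_state_dict is not needed
model = torch.load('../models/affectnet_emotions/enet_b0_8_best_afew.pt', map_location='cpu')
model.eval()
with torch.no_grad():
    logits = model(torch.zeros(1, 3, 224, 224))  # dummy input just to inspect the output shape
print(logits.shape)                              # expected (1, 8) for the 8-class model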

Valence and arousal

Hello again!
I've read your paper and I've seen that you use the circumplex model's variables arousal and valence.
How do those variables appear in the code? I can't find them :(
Thank you,
Amaia

Provide the validation script/notebook.

Hi,

I am fond of your work and papers, but I cannot find any validation script to reproduce your results, especially the best result with EfficientNet-B2 (8 classes) on AffectNet.

Or could you please provide a separate script to pre-process the input images so that we can validate the weights provided in your GitHub repository?

Thank you,

Quick question (face-emotion recognition)

Hello!
I'm trying to use your code, and I don't understand the range of the emotion scores once you add a picture and run the code.
For example:
Happy: -4.876
Angry: 0.987654
And so on. What range can these emotion values take?
Thank you very much,
Amaia

Confidence range for inference using python library

Hi,

First of all, thank you so much for such a convenient setup to use!

I'm using the face emotion Python library in my code with model_name = 'enet_b0_8_best_afew'.
I was wondering what the range of the confidence values returned by the library (or by this model in particular) is.
I wasn't able to figure that out.

Thank you

dataset question

I cannot find the Aff-Wild2 dataset. Could you please share it with me? I would be very grateful.

How to apply Adaptive Frame Rate model on new dataset?

Hi! Thanks for your excellent work. Could I ask how I can apply the model from the Adaptive Frame Rate paper to a new video dataset? The notebook provided doesn't elaborate on this explicitly. Thanks for your help.

about multi-task learning

Hi,
Thank you for your great job!

I've read your paper, and I have a question about multi-task learning.

Am I right that while training all the new heads (emotions, age, gender, etc.) the lower layers of the CNN are frozen, meaning you train only the heads for every task? So, am I right that age, gender and ethnicity do not influence the facial emotion recognition features? As I see it, you used EfficientNet as a backbone, added a dense layer as a classifier, and trained this architecture to get emotion labels. I can't understand how adding the face attribute recognition task influences the FER accuracy, given that during training the common part of the network architecture is frozen :(

Missing enet_b0_8_va_mtl.ptl

Hi

I am trying to run your Android example. It tries to load the model file "enet_b0_8_va_mtl.ptl", which appears to be missing from the repository.

Is this a file I should generate myself? If so, I would really appreciate it if you could point me to where this can be done.

Thank you!

An error when running the code

When running AFEW_train.ipynb,
an error occurred:

could not broadcast input array from shape (0,112,3) into shape (60,112,3)
at facial_analysis.py line 274:
tmp[dy[k]-1:edy[k],dx[k]-1:edx[k],:] = img[y[k]-1:ey[k],x[k]-1:ex[k],:]

Why does this occur? Could you please fix it?

How to prepare the AffectNet dataset?

I have downloaded the AffectNet dataset. It consists of images and .npy annotation files in the folder layout below.

images/
    0.jpg
annotations/
    0_exp.npy
    0_aro.npy
    0_val.npy
    0_lnd.npy

I'm trying to create a CSV file with the same columns as mentioned in https://github.com/HSE-asavchenko/face-emotion-recognition/blob/main/src/train_emotions.ipynb:

['subDirectory_filePath', 'face_x', 'face_y', 'face_width','face_height', 'facial_landmarks', 'expression', 'valence', 'arousal']

However, I can't seem to find any information in the dataset concerning face x, face y, or face width.

  1. Do I need to calculate them? If so, could you please upload the dataset preparation file?
  2. Is 'facial_landmarks' an array of size 136?
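
As a starting point, the annotation files listed above can be inspected with NumPy to see what each one actually holds (for example, whether the landmark file contains 136 values). This is only a hedged sketch; the file names are taken from the layout above, and nothing is assumed about their contents beyond being .npy arrays:

import numpy as np

# file names follow the layout shown above; adjust the paths to your download
for suffix in ('exp', 'aro', 'val', 'lnd'):
    arr = np.load(f'annotations/0_{suffix}.npy', allow_pickle=True)
    print(suffix, arr.shape, arr)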

A few suggestions.

Hello!

I have a couple of ideas:

  1. Could you please add a text description of the differences between the models, especially between the b0 and b2 types?
  2. Please consider adding hsemotion-onnx package to the pip repository.

Age-gender-ethnicity model giving the same output for different inputs

class CNN(object):

    def __init__(self, model_filepath):

        self.model_filepath = model_filepath
        self.load_graph(model_filepath = self.model_filepath)

    def load_graph(self, model_filepath):
        print('Loading model...')
        self.graph = tf.Graph()
        self.sess = tf.compat.v1.InteractiveSession(graph = self.graph)

        with tf.compat.v1.gfile.GFile(model_filepath, 'rb') as f:
            graph_def = tf.compat.v1.GraphDef()
            graph_def.ParseFromString(f.read())

        print('Check out the input placeholders:')
        nodes = [n.name + ' => ' + n.op for n in graph_def.node if n.op in ('Placeholder')]
        for node in nodes:
            print(node)

        # Define input tensor
        self.input = tf.compat.v1.placeholder(np.float32, shape = [None, 224, 224, 3], name='input')
        # self.dropout_rate = tf.placeholder(tf.float32, shape = [], name = 'dropout_rate')

        tf.import_graph_def(graph_def, {'input_1': self.input})

        print('Model loading complete!')

        # Get layer names
        layers = [op.name for op in self.graph.get_operations()]
        for layer in layers:
            print(layer)

    def test(self, data):

        # Know your output node names
        output_tensor1 = self.graph.get_tensor_by_name('import/age_pred/Softmax:0')
        output_tensor2 = self.graph.get_tensor_by_name('import/gender_pred/Sigmoid:0')
        output_tensor3 = self.graph.get_tensor_by_name('import/ethnicity_pred/Softmax:0')
        output = self.sess.run([output_tensor1, output_tensor2, output_tensor3], feed_dict = {self.input: data})

        return output

Using this code, I load "age_gender_ethnicity_224_deep-03-0.13-0.97-0.88.pb" and predict with it. But when predicting on different images, I get the same output array every time.

[array([[0.01319346, 0.00229602, 0.00176407, 0.00270929, 0.01408699, 0.00574261, 0.00756087, 0.01012164, 0.01221055, 0.01821703, 0.01120028, 0.00936489, 0.01003029, 0.00912451, 0.00813381, 0.00894791, 0.01277262, 0.01034999, 0.01053109, 0.0133063 , 0.01423471, 0.01610439, 0.01528896, 0.01825454, 0.01722076, 0.01933933, 0.01908059, 0.01899827, 0.01919533, 0.0278129 , 0.02204996, 0.02146631, 0.02125309, 0.02146868, 0.02230236, 0.02054285, 0.02096066, 0.01976574, 0.01990371, 0.02064857, 0.01843528, 0.01697922, 0.01610838, 0.01458549, 0.01581902, 0.01377539, 0.01298613, 0.01378927, 0.01191105, 0.01335083, 0.01154454, 0.01118198, 0.01019558, 0.01038121, 0.00920709, 0.00902615, 0.00936321, 0.00969135, 0.00867239, 0.00838663, 0.00797724, 0.00756043, 0.00890809, 0.00758041, 0.00743711, 0.00584346, 0.00555749, 0.00639214, 0.0061864 , 0.00784793, 0.00532241, 0.00567684, 0.00481544, 0.0052173 , 0.00513186, 0.00394571, 0.00415856, 0.00384584, 0.00452774, 0.0041736 , 0.00328163, 0.00327138, 0.00297012, 0.00369216, 0.00284221, 0.00255897, 0.00285459, 0.00232105, 0.00228869, 0.00218005, 0.0021927 , 0.00236659, 0.00233843, 0.00204793, 0.00209861, 0.00231407, 0.00145706, 0.00179674, 0.00186183, 0.00221309]], dtype=float32), array([[0.62949586]], dtype=float32), array([[0.21338916, 0.19771543, 0.19809113, 0.19525865, 0.19554558]], dtype=float32)]
Is there something I am missing, or is this .pb file not meant for prediction?

Error on CPU with enet_b2_8_best model

The following error occurs while running the enet_b2_8_best model on CPU with the latest hsemotion package from git:

...
    fer = HSEmotionRecognizer(model_name = model_name)
  File "/home/build/.local/lib/python3.10/site-packages/hsemotion/facial_emotions.py", line 49, in __init__
    model=torch.load(path)
  File "/home/build/.local/lib/python3.10/site-packages/torch/serialization.py", line 712, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/home/build/.local/lib/python3.10/site-packages/torch/serialization.py", line 1049, in _load
    result = unpickler.load()
  File "/home/build/.local/lib/python3.10/site-packages/torch/serialization.py", line 1019, in persistent_load
    load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
  File "/home/build/.local/lib/python3.10/site-packages/torch/serialization.py", line 1001, in load_tensor
    wrap_storage=restore_location(storage, location),
  File "/home/build/.local/lib/python3.10/site-packages/torch/serialization.py", line 175, in default_restore_location
    result = fn(storage, location)
  File "/home/build/.local/lib/python3.10/site-packages/torch/serialization.py", line 152, in _cuda_deserialize
    device = validate_cuda_device(location)
  File "/home/build/.local/lib/python3.10/site-packages/torch/serialization.py", line 136, in validate_cuda_device
    raise RuntimeError('Attempting to deserialize object on a CUDA '
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

Meanwhile, the enet_b0_8_best_afew model works fine with the same script on the same CPU.
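
As the traceback itself recommends, one possible workaround (a sketch, not an official fix; the path to the downloaded .pt file is a placeholder) is to load the checkpoint manually with map_location so that CUDA-saved tensors are mapped to the CPU:

import torch

# the RuntimeError above suggests exactly this: map storages to the CPU at load time
model = torch.load('enet_b2_8_best.pt', map_location=torch.device('cpu'))
model.eval()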

Is it possible to predict action units?

In the paper "Frame-level Prediction of Facial Expressions, Valence, Arousal and Action Units for Mobile Devices" predictions are made for AUs. I wonder if this is possible with the code provided here?

The issue with converting .pt to .onnx.

Hi, excellent work, and thank you for sharing!
I have a question regarding converting .pt to .onnx using your convert_pt_to_onnx.py script. I encountered an issue where, despite installing timm==0.4.5, it gives an error stating there is no timm.layer. I understand this is likely due to the timm version, but updating to a newer version causes other unexpected issues. Do you have any suggestions? Thanks.

Provide the setting for ENetB0

Hi,

Could you please share the settings used to train ENetB0 to reach the accuracy of 61.32 from the report?

I can only reproduce about 57%.

Problem with the version of TensorFlow.

Hi!

Thanks for your great work. However, there are some problems when I try to run display_emotions.ipynb. I think there is an issue with the versions of TensorFlow and NumPy. Could you please give a more detailed description of the versions of TensorFlow and the other packages used in the project?

Thanks a lot!

range of valence and arousal

Thank you for your great model.

Is it correct that the range of valence and arousal for enet_b0_8_va_mtl.pt is [-1, 1]?

Based on the AffectNet dataset, it looks like [-1, 1], but I want to know the exact range.

Thank you.

Cannot load pretrained models

 File "/Users/xxx/Library/Python/3.8/lib/python/site-packages/timm/models/efficientnet_blocks.py", line 47, in forward
    return x * self.gate(x_se)
  File "/Users/xxx/Library/Python/3.8/lib/python/site-packages/torch/nn/modules/module.py", line 947, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'SqueezeExcite' object has no attribute 'gate'

Request for guidelines on training on multitask classification

Hi,

I'm looking to train a custom dataset for multitask classification and I'm interested in trying to train this model. Could you please provide some guidelines or advice on how to proceed with this? Any help would be greatly appreciated!

Thank you.

about pretrained_faces

In the folder "models/pretrained_faces", I can only find a weight file for EfficientNet-B0 called "state_vggface2_enet0_new.pt". Could you also upload the weight file for EfficientNet-B2, which might be called "state_vggface2_enet2_new.pt"? Thank you!

Question about this work.

Dear Andrey Savchenko,

I'm a student and am going to build a small system to detect students' emotions for my thesis. While searching for a solution, I found your work, but I can't run https://github.com/HSE-asavchenko/face-emotion-recognition/blob/main/src/affectnet/train_emotions.ipynb with the current version of the AffectNet dataset. Please correct me if I'm wrong. My question is: can I run https://github.com/HSE-asavchenko/face-emotion-recognition/blob/main/src/affectnet/train_affectnet_march2021_pytorch.ipynb with MobileNet? I intend to build a small client-side application that detects emotions and then sends the results to a server.

Many thanks,

Son Nguyen.

Fine tuning code

How can I fine-tune it on a custom dataset? Could you provide a walkthrough notebook, including the dataset creation approach as well?
Thanks

real-time classify emotion

Hello.

Thanks for your code. I am trying to run your code on a webcam and am facing a problem.

This is my code, adapted from your "AFEW_train.ipynb", for real-time face emotion recognition.

Now I'm having a problem predicting the facial emotion from "x_score_norm", which has shape (None, 4096).
Can you help me classify the face into 7 emotions using this variable?

import dlib
import os
from PIL import Image
import cv2
from sklearn import preprocessing
#from keras.preprocessing.image import img_to_array

import numpy as np
from skimage import transform as trans

import tensorflow as tf
## mtcnn
from sklearn.ensemble import RandomForestClassifier

from facial_analysis import FacialImageProcessing

from tensorflow.keras.models import load_model,Model


## for error
#config = tf.ConfigProto()

#config.gpu_options.allow_growth = True

#tf.Session(config=config)

idx_to_class={0: 'Anger', 1: 'Disgust', 2: 'Fear', 3: 'Happiness', 4: 'Neutral', 5: 'Sadness', 6: 'Surprise'}
base_model=load_model('../models/affectnet_emotions/mobilenet_7.h5')


#base_model = load_model('../models/pretrained_faces/age_gender_tf2_224_deep-03-0.13-0.97.h5')
#base_model = torch.load("../models/pretrained_faces/state_vggface2_enet0_new.pt")

feature_extractor_model=Model(base_model.input,[base_model.get_layer('global_pooling').output,base_model.get_layer('feats').output,base_model.output])
feature_extractor_model.summary()
_,w,h,_=feature_extractor_model.input.shape


#afew_model = load_model('C:/Users/ccaa9/PycharmProjects/real-time_recognition/face-emotion-recognition/models/affectnet_emotion/enet_b0_8_best_afew.pt')



imgProcessing=FacialImageProcessing(False)
print(tf.__version__)

landmark_model = 'shape_predictor_68_face_landmarks.dat'  # assumed path: landmark_model was undefined in the original snippet
landmark_detector = dlib.shape_predictor(landmark_model)
emotion_to_index = {'Angry':0, 'Disgust':1, 'Fear':2, 'Happy':3, 'Neutral':4, 'Sad':5, 'Surprise':6}
INPUT_SIZE = (224,224)
### extract frames
def get_iou(bb1, bb2):
    """
    Calculate the Intersection over Union (IoU) of two bounding boxes.

    Parameters
    ----------
    bb1 : array
        order: {'x1', 'y1', 'x2', 'y2'}
        The (x1, y1) position is at the top left corner,
        the (x2, y2) position is at the bottom right corner
    bb2 : array
        order: {'x1', 'y1', 'x2', 'y2'}
        The (x1, y1) position is at the top left corner,
        the (x2, y2) position is at the bottom right corner

    Returns
    -------
    float
        in [0, 1]
    """

    # determine the coordinates of the intersection rectangle
    x_left = max(bb1[0], bb2[0])
    y_top = max(bb1[1], bb2[1])
    x_right = min(bb1[2], bb2[2])
    y_bottom = min(bb1[3], bb2[3])

    if x_right < x_left or y_bottom < y_top:
        return 0.0

    # The intersection of two axis-aligned bounding boxes is always an
    # axis-aligned bounding box
    intersection_area = (x_right - x_left) * (y_bottom - y_top)

    # compute the area of both AABBs
    bb1_area = (bb1[2] - bb1[0]) * (bb1[3] - bb1[1])
    bb2_area = (bb2[2] - bb2[0]) * (bb2[3] - bb2[1])

    # compute the intersection over union by taking the intersection
    # area and dividing it by the sum of prediction + ground-truth
    # areas - the interesection area
    iou = intersection_area / float(bb1_area + bb2_area - intersection_area)
    return iou

#print(get_iou([10,10,20,20],[15,15,25,25]))

def preprocess(img, bbox=None, landmark=None, **kwargs):
    M = None
    image_size = [224,224]
    src = np.array([
      [30.2946, 51.6963],
      [65.5318, 51.5014],
      [48.0252, 71.7366],
      [33.5493, 92.3655],
      [62.7299, 92.2041] ], dtype=np.float32 )
    if image_size[1]==224:
        src[:,0] += 8.0
    src*=2
    if landmark is not None:
        dst = landmark.astype(np.float32)

        tform = trans.SimilarityTransform()
        #dst=dst[:3]
        #src=src[:3]
        #print(dst.shape,src.shape,dst,src)
        tform.estimate(dst, src)
        M = tform.params[0:2,:]
        #M = cv2.estimateRigidTransform( dst.reshape(1,5,2), src.reshape(1,5,2), False)
        #print(M)

    if M is None:
        if bbox is None: #use center crop
            det = np.zeros(4, dtype=np.int32)
            det[0] = int(img.shape[1]*0.0625)
            det[1] = int(img.shape[0]*0.0625)
            det[2] = img.shape[1] - det[0]
            det[3] = img.shape[0] - det[1]
        else:
              det = bbox
        margin = kwargs.get('margin', 44)
        bb = np.zeros(4, dtype=np.int32)
        bb[0] = np.maximum(det[0]-margin//2, 0)
        bb[1] = np.maximum(det[1]-margin//2, 0)
        bb[2] = np.minimum(det[2]+margin//2, img.shape[1])
        bb[3] = np.minimum(det[3]+margin//2, img.shape[0])
        ret = img[bb[1]:bb[3],bb[0]:bb[2],:]
        if len(image_size)>0:
              ret = cv2.resize(ret, (image_size[1], image_size[0]))
        return ret
    else: #do align using landmark
        assert len(image_size)==2
        warped = cv2.warpAffine(img,M,(image_size[1],image_size[0]), borderValue = 0.0)
        return warped


# landmark detection using dlib
def landmark(image, face):

    # find 68 landmark points on the face
    landmarks = landmark_detector(image, face)

    # create list to contain landmarks
    landmark_list = []

    # append (x, y) in landmark_list
    for p in landmarks.parts():
        landmark_list.append([p.x, p.y])
        cv2.circle(image, (p.x, p.y), 2, (255, 255, 255), -1)

def mobilenet_preprocess_input(x,**kwargs):
    x[..., 0] -= 103.939
    x[..., 1] -= 116.779
    x[..., 2] -= 123.68
    return x


def get_features_scores(image):
    filename2features = {}
    X_global_features, X_feats, X_scores, X_isface = [], [], [], []
    images = image
    images_10 = []
    i = 0
    for imgs in images:
        X_isface.append(True)  # making bbox means has face! so always have faces

        images_10.append(imgs)
        inp = preprocessing_function(np.array(images_10, dtype=np.float32))
        global_features, feats, scores = feature_extractor_model.predict(inp)
        print(global_features.shape,feats.shape,scores.shape)


        if len(X_feats) == 0:
            X_feats = feats
            X_global_features = global_features
            X_scores = scores
        else:
            X_feats = np.concatenate((X_feats, feats), axis=0)
            X_global_features = np.concatenate((X_global_features, global_features), axis=0)
            X_scores = np.concatenate((X_scores, scores), axis=0)

    print("global", X_global_features)
    X_isface = np.array(X_isface)
        # print(X_global_features.shape,X_feats.shape,X_scores.shape)

    filename2features[i] = (X_global_features, X_feats, X_scores, X_isface)
    i += 1

    return filename2features

## create dataset ==> concat function scores

USE_ALL_FEATURES = True


def create_dataset(filename2features):
    x = []
    y = []
    has_faces = []
    ind = 0
    features = filename2features[0]
    total_features = None

    if USE_ALL_FEATURES and True:
        print('here')
        #for face in [1, 0]:
        cur_features = features[ind]

        #if len(cur_features) == 0:
        #    continue
        weight = len(cur_features) / len(features[ind])
        mean_features = np.mean(cur_features, axis=0)
        std_features = np.std(cur_features, axis=0)
        max_features = np.max(cur_features, axis=0)
        min_features = np.min(cur_features, axis=0)

        # join several features together
        feature = np.concatenate((mean_features, std_features, min_features, max_features), axis=None)
        print("Feature", feature)
        if total_features is None:
            total_features = weight * feature
        else:
            total_features += weight * feature
    has_faces.append(1)
    print("total_Features : ", total_features)


    if total_features is not None:
        print("totla features is not none")
        x.append(total_features)
        #y.append(emotion_to_index[category])


    print("out of for moon")
    x = np.array(x)
    y = np.array(y)
    has_faces = np.array(has_faces)
    #print("x : ", x.shape, "y :", y.shape, "has_face : ", has_faces)
    #return x, y, has_faces
    return x #, y, has_faces


## dlib
detector = dlib.get_frontal_face_detector()


## main

preprocessing_function=mobilenet_preprocess_input
# webcam open
cap = cv2.VideoCapture(0,cv2.CAP_DSHOW)

print('camera is opened width: {0}, height: {1}'.format(cap.get(3), cap.get(4)))


if cap.isOpened():
    print('width: {}, height : {}'.format(cap.get(3), cap.get(4)))

while(cap.isOpened()):

    ret, image = cap.read()

    if ret:

        frame = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

        bounding_boxes, points = imgProcessing.detect_faces(frame)
        points = points.T

        faceframe_10 = []
        for i in range(10):
            for bbox, p in zip(bounding_boxes, points):
                box = bbox.astype(int)
                x1, y1, x2, y2 = box[0:4]
                # face_img=frame[y1:y2,x1:x2,:]

                p = p.reshape((2, 5)).T

                #top, left, bottom, right = box[0:4]
                #face=dlib.rectangle(left, top, right, bottom)
                face=dlib.rectangle(x1, y1, x2, y2)

                face_img = preprocess(frame, box, p)  ## CROPPED AND ALIGNED
                face_img = cv2.cvtColor(face_img, cv2.COLOR_BGR2RGB)


                ###show aligned cropped image
                cv2.imshow("face_img", face_img)

                ### draw bounding box on original image
                cv2.rectangle(image, (face.left() - 5, face.top() - 5), (face.right() + 5, face.bottom() + 5),
                              (0, 186, 255), 3)
                
                cv2.imshow("original", image)

                faceframe_10.append(face_img)
                
        n = 0
        ## feature score threw aligned cropped image
        features_scores = get_features_scores(faceframe_10)
        x_score= create_dataset(features_scores)
        n += 1

        ## normalization
        x_score_norm = preprocessing.normalize(x_score, norm='l2')
        
        ## ? how to use x_score_norm to predict emotion??

    else:
        print("error")


    if cv2.waitKey(25) & 0xFF == ord('q'):
        record = False
        break

cap.release()
cv2.destroyAllWindows()

Can I get a multi-task learning model file?

Thank you for the good paper.

I'm interested in your work and would like to look into the multi-task learning side, so I'm opening an issue.
I have two questions.

  1. Multi-task learning training
     1. If my understanding is correct, is it right that you freeze the weights of the backbone and train only the weights of the heads?
     2. In this case, is it practically the same as training each head separately?
  2. Multi-task learning model
     Could I get the multi-task learning model file that you trained? I'd like to check arousal, valence, etc.

Time of training

Hello.
I would like to know how much time it took you to train on all of AffectNet.
Also, what type of GPU did you use, and how many GPUs?

question about training on my own data

Thank you for sharing your models and code.
In my work I want to train a 3-class face emotion recognition model (8 classes is too much for me) on my own data using PyTorch, and I hope to train my classifier based on enet_b0_8_best_afew.pt (just train the classifier with the backbone frozen).
I really don't want to train from scratch O_O
But I don't know how to train from it. Can you give me some suggestions?

Or can you tell me which of the training notebooks I should use? I can't tell the difference between them.
[screenshot of the training notebooks attached in the original issue]
