
face-emotion-recognition's Introduction

HSEmotion (High-Speed face Emotion recognition) library


This repository contains code that was developed by A. Savchenko during his research at the HSE University and Sber AI Lab.

Usage

The Python packages hsemotion and hsemotion-onnx were prepared to simplify the usage of our models for facial expression recognition and extraction of visual emotional embeddings. They can be installed via pip:

    pip install hsemotion
    pip install hsemotion-onnx
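
For example, the recognizer from the hsemotion package can be used roughly as follows. This is a minimal sketch based on the snippets in the issues below; the model name and image path are placeholders, and the image is an already detected face crop converted to RGB, as in test_hsemotion_package.ipynb:

import cv2
from hsemotion.facial_emotions import HSEmotionRecognizer

fer = HSEmotionRecognizer(model_name='enet_b0_8_best_vgaf', device='cpu')  # or device='cuda'

frame_bgr = cv2.imread('face_crop.jpg')                 # placeholder path to a cropped face image
frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)  # OpenCV reads BGR; the notebooks pass RGB to the recognizer
emotion, scores = fer.predict_emotions(frame_rgb, logits=True)
print(emotion, scores)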

In order to run our code on the datasets, please first prepare them using our TensorFlow notebooks: train_emotions.ipynb, AFEW_train.ipynb and VGAF_train.ipynb.

If you want to run our mobile application, please run the following scripts inside the mobile_app folder:

python to_tflite.py
python to_pytorchlite.py

NOTE: The models have been updated so that they should work with recent versions of the timm library. However, the v0.1 EfficientNet models for PyTorch are based on the old timm 0.4.5 package, so exactly that version should be installed to use them:

pip install timm==0.4.5

News

  • Our models let our team HSEmotion take second place in the Compound Expression Recognition Challenge and third place in the Action Unit Detection Challenge at the sixth Affective Behavior Analysis in-the-Wild (ABAW) Competition.
  • The paper "Facial Expression Recognition with Adaptive Frame Rate based on Multiple Testing Correction" was accepted as an oral talk at ICML 2023. The source code to reproduce the results of this paper is available in this repository; see the "Adaptive Frame Rate" subsections in abaw3_train.ipynb and train_emotions-pytorch-afew-vgaf.ipynb.
  • Our models let our team HSE-NN take first place in the Learning from Synthetic Data (LSD) Challenge and third place in the Multi-Task Learning (MTL) Challenge at the fourth ABAW Competition.
  • Our models let our team HSE-NN take third place in the Multi-Task Learning Challenge, fourth place in the Valence-Arousal and Expression Challenges, and fifth place in the Action Unit Detection Challenge at the third Affective Behavior Analysis in-the-Wild (ABAW) Competition. Our approach is presented in the paper accepted at the CVPR 2022 ABAW Workshop.

Details

All models were pre-trained for the face identification task on the VGGFace2 dataset. The SAM code was borrowed to train the PyTorch models.

We provide several models that obtained state-of-the-art results on the AffectNet dataset. The facial features extracted by these models lead to state-of-the-art accuracy among face-only models on video datasets from the EmotiW 2019 and 2020 challenges: AFEW (Acted Facial Expressions in the Wild), VGAF (Video-level Group AFfect) and EngageWild; and on the ABAW CVPR 2022 and ECCV 2022 challenges: Learning from Synthetic Data (LSD) and Multi-Task Learning (MTL).

Here are the performance metrics (accuracy on AffectNet, AFEW and VGAF, and F1-score on LSD) on the validation sets of the above-mentioned datasets, together with the mean inference time and model size for our models on a Samsung Fold 3 device with a Qualcomm 888 CPU and Android 12:

| Model | AffectNet (8 classes) | AffectNet (7 classes) | AFEW | VGAF | LSD | MTL | Inference time, ms | Model size, MB |
|---|---|---|---|---|---|---|---|---|
| mobilenet_7.h5 | - | 64.71 | 55.35 | 68.92 | - | 1.099 | 16 ± 5 | 14 |
| enet_b0_8_best_afew.pt | 60.95 | 64.63 | 59.89 | 66.80 | 59.32 | 1.110 | 59 ± 26 | 16 |
| enet_b0_8_best_vgaf.pt | 61.32 | 64.57 | 55.14 | 68.29 | 59.72 | 1.123 | 59 ± 26 | 16 |
| enet_b0_8_va_mtl.pt | 61.93 | 64.94 | 56.73 | 66.58 | 60.94 | 1.276 | 60 ± 32 | 16 |
| enet_b0_7.pt | - | 65.74 | 56.99 | 65.18 | - | 1.111 | 59 ± 26 | 16 |
| enet_b2_7.pt | - | 66.34 | 59.63 | 69.84 | - | 1.134 | 191 ± 18 | 30 |
| enet_b2_8.pt | 63.03 | 66.29 | 57.78 | 70.23 | 52.06 | 1.147 | 191 ± 18 | 30 |
| enet_b2_8_best.pt | 63.125 | 66.51 | 56.73 | 71.12 | - | - | 191 ± 18 | 30 |

Please note that we report the accuracies for AFEW and VGAF only on the subsets in which MTCNN detects facial regions. The code also computes the overall accuracy on the complete test set, which is slightly lower due to missing faces or failed face detection.

Research papers

If you use our models, please cite the following papers:

@inproceedings{savchenko2023facial,
  title     = {Facial Expression Recognition with Adaptive Frame Rate based on Multiple Testing Correction},
  author    = {Savchenko, Andrey},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning (ICML)},
  pages     = {30119--30129},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  url       = {https://proceedings.mlr.press/v202/savchenko23a.html}
}
@inproceedings{savchenko2021facial,
  title={Facial expression and attributes recognition based on multi-task learning of lightweight neural networks},
  author={Savchenko, Andrey V.},
  booktitle={Proceedings of the 19th International Symposium on Intelligent Systems and Informatics (SISY)},
  pages={119--124},
  year={2021},
  organization={IEEE},
  url={https://arxiv.org/abs/2103.17107}
}
@inproceedings{Savchenko_2022_CVPRW,
  author    = {Savchenko, Andrey V.},
  title     = {Video-Based Frame-Level Facial Analysis of Affective Behavior on Mobile Devices Using EfficientNets},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
  month     = {June},
  year      = {2022},
  pages     = {2359--2366},
  url={https://arxiv.org/abs/2103.17107}
}
@inproceedings{Savchenko_2022_ECCVW,
  author    = {Savchenko, Andrey V.},
  title     = {{MT-EmotiEffNet} for Multi-task Human Affective Behavior Analysis and Learning from Synthetic Data},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV 2022) Workshops},
  pages={45--59},
  year={2023},
  organization={Springer},
  url={https://arxiv.org/abs/2207.09508}
}
@article{savchenko2022classifying,
  title={Classifying emotions and engagement in online learning based on a single facial expression recognition neural network},
  author={Savchenko, Andrey V and Savchenko, Lyudmila V and Makarov, Ilya},
  journal={IEEE Transactions on Affective Computing},
  year={2022},
  publisher={IEEE},
  url={https://ieeexplore.ieee.org/document/9815154}
}

face-emotion-recognition's People

Contributors

av-savchenko, sunggukcha


face-emotion-recognition's Issues

Additional files

What additional files need to be created in the project, and what should their contents be?
I would be happy if a file tree for the project files could be listed.
For example, the files referenced by this code:
AFFECT_DATA_DIR=ALL_DATA_DIR#+'AffectNet/'
AFFECT_TRAIN_DATA_DIR = AFFECT_DATA_DIR+'full_res/train'
AFFECT_VAL_DATA_DIR = AFFECT_DATA_DIR+'full_res/val'
AFFECT_IMG_TRAIN_DATA_DIR = AFFECT_DATA_DIR+str(INPUT_SIZE[0])+'/train'
AFFECT_IMG_VAL_DATA_DIR = AFFECT_DATA_DIR+str(INPUT_SIZE[0])+'/val'
AFFECT_TRAIN_ORIG_DATA_DIR = AFFECT_DATA_DIR+'orig/train'
AFFECT_VAL_ORIG_DATA_DIR = AFFECT_DATA_DIR+'orig/val'

Request for help in the Training Script for enet_b0_8_best_afew.onnx model

Hello,

I'm reaching out to seek guidance regarding the enet_b0_8_best_afew.onnx model. I'm interested in utilizing this model, but I'm encountering challenges in locating the relevant training script to generate this particular model.

While searching, I came across these scripts: GitHub Link. Additionally, I found that the model is placed under the affectnet folder: GitHub Link.

My understanding is that the model was trained with EfficientNet-B0 on the VGAF dataset, based on the code in VGAF_train.ipynb. However, I am unsure if this is the correct procedure or if there are additional steps involved.

Could someone please provide me with more information or guidance on how to properly train and obtain the enet_b0_8_best_afew.onnx model? Any assistance you could offer would be greatly appreciated.

Thank you for your time and support.

Preprocessing of images to run inference

Hello, thank you very much for your work.

I am trying to preprocess a batch of images (I have my own dataset) the way you prepared your data. I'm following the notebook train_emotions.ipynb, as it is in TensorFlow and I'm using that framework.

I have a question about the steps of the preprocessing, so I would like to ask you if you can tell me the correct steps. These are the steps I'm following, let me know if I'm right or if something is missing:

  1. I already have my images with the faces detected and cropped, i.e., I have a dataset full of cropped faces.

  2. img = cv2.imread(img_path)

  3. img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

  4. img = cv2.resize(img,(224,224))

  5. Then your notebook shows that you apply a normalization:

    def mobilenet_preprocess_input(x,**kwargs):
        x[..., 0] -= 103.939
        x[..., 1] -= 116.779
        x[..., 2] -= 123.68
        return x
    preprocessing_function=mobilenet_preprocess_input

Here I am having an issue because the subtraction between an integer array and a float cannot be cast in place, so I changed it to

def mobilenet_preprocess_input(x,**kwargs):
    x[..., 0] = x[..., 0] - 103.939
    x[..., 1] = x[..., 1] - 116.779
    x[..., 2] = x[..., 2] - 123.68
    return x
preprocessing_function=mobilenet_preprocess_input

So, let me know if the process I'm following is correct or if there's something missing.

Thank you!
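
For reference, here are the steps above gathered into one function. This is only a sketch of the pipeline described in this question, not an official preprocessing script; casting the image to float32 before the subtraction is an assumption that sidesteps the integer/float casting issue mentioned above:

import cv2
import numpy as np

def mobilenet_preprocess_input(x, **kwargs):
    x[..., 0] -= 103.939
    x[..., 1] -= 116.779
    x[..., 2] -= 123.68
    return x

def load_face(img_path):
    img = cv2.imread(img_path)                  # step 2: read the already cropped face
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # step 3: BGR -> RGB
    img = cv2.resize(img, (224, 224))           # step 4: resize to the network input size
    img = img.astype(np.float32)                # assumed cast so the float means can be subtracted in place
    return mobilenet_preprocess_input(img)      # step 5: per-channel mean subtraction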

Why is multi-task learning one head?

Thank you for your wonderful code sharing.

I'm looking at the multi-task learning code in train_emotions-pytorch.ipynb.

I would like to know why you used FER, Valence and Arousal in one head instead of dividing them into separate heads.

Thank you.

about the facial video

Pardon. Could you please tell me how to obtain the facial videos from the student's or teacher's device used in your paper? Thanks for your reply!

Accuracy on AffectNet is worse than expected

I am using the HSEmotion package to classify emotions on the AffectNet validation set but my accuracies are worse than what is in the README:

| Model | README accuracy | My accuracy |
|---|---|---|
| enet_b0_8_best_afew | 60.95 | 58.31 |
| enet_b0_8_best_vgaf | 61.32 | 59.66 |
| enet_b0_8_va_mtl | 61.93 | 59.04 |
| enet_b2_8 | 63.03 | 61.59 |
| enet_b2_8_best | 63.125 | 61.24 |

What do you think might be causing this discrepancy?

Here's an excerpt of how I'm using the package:

from hsemotion.facial_emotions import HSEmotionRecognizer
fer=HSEmotionRecognizer(model_name=model_name,device=device)

def predict_score(img_path):
    frame_bgr = cv2.imread(img_path)
    emotion,scores=fer.predict_emotions(frame_bgr,logits=True)
    return EMOTION_MAPPING[emotion]

running code on colab

Hello,
I want to run the code on Google Colab with the pre-trained models. After reading the README, I'm trying to run display_emotions.ipynb, but I'm getting errors.
Is display_emotions.ipynb the right place to start?

Error while testing

Hello and thank you for everything!
What do you think is the cause of this error? Whatever I searched for, I did not find a suitable answer.
[screenshot of the error attached in the original issue]

docker image

Hello, could you provide a Docker image? It would make it more convenient for me to study the code.
May all go well with you!

Request for a demo code

Hi,

Is there a demonstration code available for running videos or webcam feeds? Thank you.

Scoring error

Hello. Thanks for sharing your great works.

Your scoring script is correct only if the dataset size is divisible by the batch size: the per-batch accuracy is averaged over the number of batches, so a smaller final batch is weighted incorrectly.

epoch_val_accuracy = 0
epoch_val_loss = 0
for data, label in test_loader:
    data = data.to(device)
    label = label.to(device)

    val_output = model(data)
    val_loss = criterion(val_output, label)

    acc = (val_output.argmax(dim=1) == label).float().mean()
    epoch_val_accuracy += acc / len(test_loader)
    epoch_val_loss += val_loss / len(test_loader)

My version follows, where length is the size of the dataset (not the length of the dataloader).

loss = 0.0
accuracy = 0.0
for (images, emotions) in tqdm(dataloader):
    images = images.cuda()
    emotions = emotions.cuda()
    preds = model(images)
    # loss
    loss += criterion(preds, emotions) / 1
    # accuracy
    preds = torch.argmax(preds, dim=1)
    acc = torch.eq(preds, emotions).sum()
    accuracy += acc
loss /= length
accuracy /= length

AttributeError: 'EfficientNet' object has no attribute 'act1'

Thank you for the great work! I tried to use the EfficientNet model to predict facial emotions. I used code from test_hsemotion_package.ipynb. I downloaded the xx.pt file from the models folder and loaded it from a local directory. The model loaded well, but an error occurs when executing fer.predict_emotions(frame,logits=True).
Here is my code.

import os
import numpy as np
import matplotlib.pyplot as plt
import cv2
from PIL import Image
import torch
from hsemotion.facial_emotions import HSEmotionRecognizer

use_cuda = torch.cuda.is_available()
device = 'cuda' if use_cuda else 'cpu'

model_path='../models/affectnet_emotions/enet_b2_8_best.pt'

fer=HSEmotionRecognizer(model_name=model_path,device=device)
# model = torch.load(model_path, map_location=torch.device('cpu'))

fpath='../test_images/0_alamy_adoration_emotion_2_22.jpg'
frame_bgr = cv2.imread(fpath)
plt.figure(figsize=(5, 5))
frame = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
plt.axis('off')
plt.imshow(frame)

emotion, scores=fer.predict_emotions(frame,logits=True)
plt.figure(figsize=(3, 3))
plt.axis('off')
plt.imshow(frame)
plt.title(emotion)

The error is:

raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'EfficientNet' object has no attribute 'act1'

My timm version is 0.4.5. Any idea on how to solve this issue?

afew test accuracy is around 55%

Hi, thanks for your great work. Now I'm following your work on the AFEW dataset, but I only got around 55% accuracy by running AFEW_train.ipynb and train_emotions-pytorch.ipynb. When I read your code, I found that you used the enet_b0_8_va_mtl.pt model in train_emotions-pytorch.ipynb and the mobilenet_7.h5 model in AFEW_train.ipynb. I am confused about which model is the right one and what I should do to reproduce the 59% accuracy on the AFEW dataset.
thank you for your replies!

Question about inference

Hello.

Thanks for your code. I've tried to run your code on a webcam, and for that I needed to load the model properly from pretrained weights. I took a look at your train_emotions-pytorch notebook and saw that you load the model as

model=timm.create_model('tf_efficientnet_b0_ns', pretrained=False)
model.classifier=torch.nn.Identity()
model.load_state_dict(torch.load('../models/pretrained_faces/state_vggface2_enet0_new.pt'))

Then you add model.classifier=nn.Sequential(nn.Linear(in_features=1280, out_features=num_classes)) and train your model. If I got it right, you provide pre-trained weights in the models/affectnet_emotions folder. But the question is, how do I load them? I thought it should be something like this

model=timm.create_model('tf_efficientnet_b0_ns', pretrained=False)
model.classifier=nn.Sequential(nn.Linear(in_features=1280, out_features=8))
model.load_state_dict(torch.load('../models/affectnet_emotions/enet_b0_8_best_afew.pt'))

But no, I got an error: AttributeError: 'EfficientNet' object has no attribute 'copy'. So my guess is wrong then. Should I load it in some different way? Thanks in advance.
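
For comparison, the facial_emotions.py source quoted in a later issue loads the checkpoint simply with model=torch.load(path), which suggests the released .pt files are full serialized modules rather than state dicts. A hedged sketch of loading one directly (the file path is the one from the question; the 224x224 dummy input is an assumption used only to check the output shape):

import torch

# assumption: the .pt file stores the whole model object (classifier included),
# so torch.load returns a ready-to-use module and load_state_dict is not needed
model = torch.load('../models/affectnet_emotions/enet_b0_8_best_afew.pt', map_location='cpu')
model.eval()
with torch.no_grad():
    logits = model(torch.zeros(1, 3, 224, 224))  # dummy input just to inspect the output shape
print(logits.shape)                              # expected (1, 8) for the 8-class model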

Valence and arousal

Hello again!
I've read your paper and I've seen that you use the circumplex model's variables arousal and valence.
How do those variables appear in the code? I can't find them :(
Thank you,
Amaia

Provide the validation script/notebook.

Hi,

I am fond of your work and papers, but I cannot find any validation script to reproduce your results, especially the best result with EfficientNet-B2 (8 classes) on AffectNet.

Or could you please provide a separate script to pre-process the input images so that we can validate the weights provided in your GitHub repository?

Thank you,

Quick question (face-emotion recognition)

Hello!
I'm trying to use your code, and I don't understand the range of the emotion scores once you add a picture and run the code.
For example:
Happy: -4.876
Angry: 0.987654
And so on. What range can these emotion values take?
Thank you very much,
Amaia

Confidence range for inference using python library

Hi,

First of all, thank you so much for such a convenient setup to use!

I'm using the face emotion Python library in my code with model_name = 'enet_b0_8_best_afew'.
I was wondering what the range of the confidence values returned by the library (or by this model in particular) is.
I wasn't able to figure that out.

Thank you

dataset question

I cannot find the Aff-Wild2 dataset. Could you please share it with me? I would be very grateful.

How to apply Adaptive Frame Rate model on new dataset?

Hi! Thanks for your excellent work. Could I ask how I can apply the model from the Adaptive Frame Rate paper to a new video dataset? The notebook provided doesn't elaborate on this explicitly. Thanks for your help.

about multi-task learning

Hi,
Thank you for your great job!

I've read your paper, and I have a question about multi-task learning.

Am I right that while training all the new heads (emotions, age, gender, etc.) the lower layers of the CNN are frozen, meaning you train only the heads for every task? So, am I right that age, gender and ethnicity do not influence the facial emotion recognition features? As I see it, you used EfficientNet as a backbone, added a dense layer as a classifier, and trained this architecture to get emotion labels. I can't understand how adding the face attribute recognition task influences the FER accuracy, given that during training the common part of the network architecture is frozen :(

Missing enet_b0_8_va_mtl.ptl

Hi

I am trying to run your Android example. It tries to load the model file "enet_b0_8_va_mtl.ptl", which appears to be missing from the repository.

Is this a file I should generate myself? If so, I would really appreciate it if you could point me to where this can be done.

Thank you!

An error when running the code

When running AFEW_train.ipynb,
an error occurred:

could not broadcast input array from shape (0,112,3) into shape (60,112,3)
at facial_analysis.py line 274:
tmp[dy[k]-1:edy[k],dx[k]-1:edx[k],:] = img[y[k]-1:ey[k],x[k]-1:ex[k],:]

Why does this occur? Could you please fix it?

How to prepare the AffectNet dataset?

I have downloaded the AffectNet dataset. It consists of images and .npy annotation files in the folder layout below.

images/
    0.jpg
annotations/
    0_exp.npy
    0_aro.npy
    0_val.npy
    0_lnd.npy

I'm trying to create a CSV file with the same columns as mentioned in https://github.com/HSE-asavchenko/face-emotion-recognition/blob/main/src/train_emotions.ipynb:

['subDirectory_filePath', 'face_x', 'face_y', 'face_width','face_height', 'facial_landmarks', 'expression', 'valence', 'arousal']

However, I can't seem to find any information in the dataset concerning face x, face y, or face width.

  1. Do I need to calculate them? If so, could you please upload the dataset preparation file?
  2. Is 'facial_landmarks' an array of size 136?
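
As a starting point, the annotation files listed above can be inspected with NumPy to see what each one actually holds (for example, whether the landmark file contains 136 values). This is only a hedged sketch; the file names are taken from the layout above, and nothing is assumed about their contents beyond being .npy arrays:

import numpy as np

# file names follow the layout shown above; adjust the paths to your download
for suffix in ('exp', 'aro', 'val', 'lnd'):
    arr = np.load(f'annotations/0_{suffix}.npy', allow_pickle=True)
    print(suffix, arr.shape, arr)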

A few suggestions.

Hello!

I have a couple of ideas:

  1. Could you please add a text description of the differences between the models, especially between the b0 and b2 types?
  2. Please consider adding hsemotion-onnx package to the pip repository.

Age-gender-ethnicity model giving the same output for different inputs

class CNN(object):

    def __init__(self, model_filepath):

        self.model_filepath = model_filepath
        self.load_graph(model_filepath = self.model_filepath)

    def load_graph(self, model_filepath):
        print('Loading model...')
        self.graph = tf.Graph()
        self.sess = tf.compat.v1.InteractiveSession(graph = self.graph)

        with tf.compat.v1.gfile.GFile(model_filepath, 'rb') as f:
            graph_def = tf.compat.v1.GraphDef()
            graph_def.ParseFromString(f.read())

        print('Check out the input placeholders:')
        nodes = [n.name + ' => ' + n.op for n in graph_def.node if n.op in ('Placeholder')]
        for node in nodes:
            print(node)

        # Define input tensor
        self.input = tf.compat.v1.placeholder(np.float32, shape = [None, 224, 224, 3], name='input')
        # self.dropout_rate = tf.placeholder(tf.float32, shape = [], name = 'dropout_rate')

        tf.import_graph_def(graph_def, {'input_1': self.input})

        print('Model loading complete!')

        # Get layer names
        layers = [op.name for op in self.graph.get_operations()]
        for layer in layers:
            print(layer)

    def test(self, data):

        # Know your output node names
        output_tensor1 = self.graph.get_tensor_by_name('import/age_pred/Softmax:0')
        output_tensor2 = self.graph.get_tensor_by_name('import/gender_pred/Sigmoid:0')
        output_tensor3 = self.graph.get_tensor_by_name('import/ethnicity_pred/Softmax:0')
        output = self.sess.run([output_tensor1, output_tensor2, output_tensor3], feed_dict = {self.input: data})

        return output

Using this code, I load "age_gender_ethnicity_224_deep-03-0.13-0.97-0.88.pb" and predict with it. But when predicting on different images, I get the same output array every time.

[array([[0.01319346, 0.00229602, 0.00176407, 0.00270929, 0.01408699, 0.00574261, 0.00756087, 0.01012164, 0.01221055, 0.01821703, 0.01120028, 0.00936489, 0.01003029, 0.00912451, 0.00813381, 0.00894791, 0.01277262, 0.01034999, 0.01053109, 0.0133063 , 0.01423471, 0.01610439, 0.01528896, 0.01825454, 0.01722076, 0.01933933, 0.01908059, 0.01899827, 0.01919533, 0.0278129 , 0.02204996, 0.02146631, 0.02125309, 0.02146868, 0.02230236, 0.02054285, 0.02096066, 0.01976574, 0.01990371, 0.02064857, 0.01843528, 0.01697922, 0.01610838, 0.01458549, 0.01581902, 0.01377539, 0.01298613, 0.01378927, 0.01191105, 0.01335083, 0.01154454, 0.01118198, 0.01019558, 0.01038121, 0.00920709, 0.00902615, 0.00936321, 0.00969135, 0.00867239, 0.00838663, 0.00797724, 0.00756043, 0.00890809, 0.00758041, 0.00743711, 0.00584346, 0.00555749, 0.00639214, 0.0061864 , 0.00784793, 0.00532241, 0.00567684, 0.00481544, 0.0052173 , 0.00513186, 0.00394571, 0.00415856, 0.00384584, 0.00452774, 0.0041736 , 0.00328163, 0.00327138, 0.00297012, 0.00369216, 0.00284221, 0.00255897, 0.00285459, 0.00232105, 0.00228869, 0.00218005, 0.0021927 , 0.00236659, 0.00233843, 0.00204793, 0.00209861, 0.00231407, 0.00145706, 0.00179674, 0.00186183, 0.00221309]], dtype=float32), array([[0.62949586]], dtype=float32), array([[0.21338916, 0.19771543, 0.19809113, 0.19525865, 0.19554558]], dtype=float32)]
Is there something I am missing, or is this .pb file not meant for prediction?

Error on CPU with enet_b2_8_best model

The following error occurs while running the enet_b2_8_best model on CPU with the latest hsemotion package from git:

...
    fer = HSEmotionRecognizer(model_name = model_name)
  File "/home/build/.local/lib/python3.10/site-packages/hsemotion/facial_emotions.py", line 49, in __init__
    model=torch.load(path)
  File "/home/build/.local/lib/python3.10/site-packages/torch/serialization.py", line 712, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/home/build/.local/lib/python3.10/site-packages/torch/serialization.py", line 1049, in _load
    result = unpickler.load()
  File "/home/build/.local/lib/python3.10/site-packages/torch/serialization.py", line 1019, in persistent_load
    load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
  File "/home/build/.local/lib/python3.10/site-packages/torch/serialization.py", line 1001, in load_tensor
    wrap_storage=restore_location(storage, location),
  File "/home/build/.local/lib/python3.10/site-packages/torch/serialization.py", line 175, in default_restore_location
    result = fn(storage, location)
  File "/home/build/.local/lib/python3.10/site-packages/torch/serialization.py", line 152, in _cuda_deserialize
    device = validate_cuda_device(location)
  File "/home/build/.local/lib/python3.10/site-packages/torch/serialization.py", line 136, in validate_cuda_device
    raise RuntimeError('Attempting to deserialize object on a CUDA '
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

Meanwhile, the enet_b0_8_best_afew model works fine with the same script on the same CPU.
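
As the traceback itself recommends, one possible workaround (a sketch, not an official fix; the path to the downloaded .pt file is a placeholder) is to load the checkpoint manually with map_location so that CUDA-saved tensors are mapped to the CPU:

import torch

# the RuntimeError above suggests exactly this: map storages to the CPU at load time
model = torch.load('enet_b2_8_best.pt', map_location=torch.device('cpu'))
model.eval()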

Is it possible to predict action units?

In the paper "Frame-level Prediction of Facial Expressions, Valence, Arousal and Action Units for Mobile Devices" predictions are made for AUs. I wonder if this is possible with the code provided here?

The issue with converting .pt to .onnx.

Hi, excellent work, and thank you for sharing!
I have a question regarding converting .pt to .onnx using your convert_pt_to_onnx.py script. I encountered an issue where, despite installing timm==0.4.5, it gives an error stating there is no timm.layer. I understand this is likely due to the timm version, but updating to a newer version causes other unexpected issues. Do you have any suggestions? Thanks.

Provide the setting for ENetB0

Hi,

Could you please share the settings used to train ENetB0 to reach the accuracy of 61.32 from the report?

I can only reproduce about 57%.

Problem with the version of TensorFlow.

Hi!

Thanks for your great work. However, there are some problems when I try to run display_emotions.ipynb. I think there is an issue with the versions of TensorFlow and NumPy. Could you please give a more detailed description of the versions of TensorFlow and the other packages used in the project?

Thanks a lot!

range of valence and arousal

Thank you for your great model.

Is it correct that the range of valence and arousal for enet_b0_8_va_mtl.pt is [-1, 1]?

Based on the AffectNet dataset, it looks like [-1, 1], but I want to know the exact range.

Thank you.

Cannot load pretrained models

 File "/Users/xxx/Library/Python/3.8/lib/python/site-packages/timm/models/efficientnet_blocks.py", line 47, in forward
    return x * self.gate(x_se)
  File "/Users/xxx/Library/Python/3.8/lib/python/site-packages/torch/nn/modules/module.py", line 947, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'SqueezeExcite' object has no attribute 'gate'

Request for guidelines on training on multitask classification

Hi,

I'm looking to train a custom dataset for multitask classification and I'm interested in trying to train this model. Could you please provide some guidelines or advice on how to proceed with this? Any help would be greatly appreciated!

Thank you.

about pretrained_faces

In the folder "models/pretrained_faces", I can only find a weight file for EfficientNet-B0 called "state_vggface2_enet0_new.pt". Could you also upload the weight file for EfficientNet-B2, which might be called "state_vggface2_enet2_new.pt"? Thank you!

Question about this work.

Dear Andrey Savchenko,

I'm a student and am going to build a small system to detect students' emotions for my thesis. While searching for a solution, I found your work, but I can't run https://github.com/HSE-asavchenko/face-emotion-recognition/blob/main/src/affectnet/train_emotions.ipynb with the current version of the AffectNet dataset. Please correct me if I'm wrong. My question is: can I run https://github.com/HSE-asavchenko/face-emotion-recognition/blob/main/src/affectnet/train_affectnet_march2021_pytorch.ipynb with MobileNet? I intend to build a small client-side application that detects emotions and then sends the results to a server.

Many thanks,

Son Nguyen.

Fine tuning code

How can I fine-tune it on a custom dataset? Could you provide a walkthrough notebook, including the dataset creation approach as well?
Thanks

real-time classify emotion

Hello.

Thanks for your code. I am trying to run your code on a webcam and am facing a problem.

This is my code, adapted from your "AFEW_train.ipynb", for real-time face emotion recognition.

Now I'm having a problem predicting the facial emotion from "x_score_norm", which has shape (None, 4096).
Can you help me classify the face into 7 emotions using this variable?

import dlib
import os
from PIL import Image
import cv2
from sklearn import preprocessing
#from keras.preprocessing.image import img_to_array

import numpy as np
from skimage import transform as trans

import tensorflow as tf
## mtcnn
from sklearn.ensemble import RandomForestClassifier

from facial_analysis import FacialImageProcessing

from tensorflow.keras.models import load_model,Model


## for error
#config = tf.ConfigProto()

#config.gpu_options.allow_growth = True

#tf.Session(config=config)

idx_to_class={0: 'Anger', 1: 'Disgust', 2: 'Fear', 3: 'Happiness', 4: 'Neutral', 5: 'Sadness', 6: 'Surprise'}
base_model=load_model('../models/affectnet_emotions/mobilenet_7.h5')


#base_model = load_model('../models/pretrained_faces/age_gender_tf2_224_deep-03-0.13-0.97.h5')
#base_model = torch.load("../models/pretrained_faces/state_vggface2_enet0_new.pt")

feature_extractor_model=Model(base_model.input,[base_model.get_layer('global_pooling').output,base_model.get_layer('feats').output,base_model.output])
feature_extractor_model.summary()
_,w,h,_=feature_extractor_model.input.shape


#afew_model = load_model('C:/Users/ccaa9/PycharmProjects/real-time_recognition/face-emotion-recognition/models/affectnet_emotion/enet_b0_8_best_afew.pt')



imgProcessing=FacialImageProcessing(False)
print(tf.__version__)

landmark_model = 'shape_predictor_68_face_landmarks.dat'  # assumed path: landmark_model was undefined in the original snippet
landmark_detector = dlib.shape_predictor(landmark_model)
emotion_to_index = {'Angry':0, 'Disgust':1, 'Fear':2, 'Happy':3, 'Neutral':4, 'Sad':5, 'Surprise':6}
INPUT_SIZE = (224,224)
### extract frames
def get_iou(bb1, bb2):
    """
    Calculate the Intersection over Union (IoU) of two bounding boxes.

    Parameters
    ----------
    bb1 : array
        order: {'x1', 'y1', 'x2', 'y2'}
        The (x1, y1) position is at the top left corner,
        the (x2, y2) position is at the bottom right corner
    bb2 : array
        order: {'x1', 'y1', 'x2', 'y2'}
        The (x1, y1) position is at the top left corner,
        the (x2, y2) position is at the bottom right corner

    Returns
    -------
    float
        in [0, 1]
    """

    # determine the coordinates of the intersection rectangle
    x_left = max(bb1[0], bb2[0])
    y_top = max(bb1[1], bb2[1])
    x_right = min(bb1[2], bb2[2])
    y_bottom = min(bb1[3], bb2[3])

    if x_right < x_left or y_bottom < y_top:
        return 0.0

    # The intersection of two axis-aligned bounding boxes is always an
    # axis-aligned bounding box
    intersection_area = (x_right - x_left) * (y_bottom - y_top)

    # compute the area of both AABBs
    bb1_area = (bb1[2] - bb1[0]) * (bb1[3] - bb1[1])
    bb2_area = (bb2[2] - bb2[0]) * (bb2[3] - bb2[1])

    # compute the intersection over union by taking the intersection
    # area and dividing it by the sum of prediction + ground-truth
    # areas - the interesection area
    iou = intersection_area / float(bb1_area + bb2_area - intersection_area)
    return iou

#print(get_iou([10,10,20,20],[15,15,25,25]))

def preprocess(img, bbox=None, landmark=None, **kwargs):
    M = None
    image_size = [224,224]
    src = np.array([
      [30.2946, 51.6963],
      [65.5318, 51.5014],
      [48.0252, 71.7366],
      [33.5493, 92.3655],
      [62.7299, 92.2041] ], dtype=np.float32 )
    if image_size[1]==224:
        src[:,0] += 8.0
    src*=2
    if landmark is not None:
        dst = landmark.astype(np.float32)

        tform = trans.SimilarityTransform()
        #dst=dst[:3]
        #src=src[:3]
        #print(dst.shape,src.shape,dst,src)
        tform.estimate(dst, src)
        M = tform.params[0:2,:]
        #M = cv2.estimateRigidTransform( dst.reshape(1,5,2), src.reshape(1,5,2), False)
        #print(M)

    if M is None:
        if bbox is None: #use center crop
            det = np.zeros(4, dtype=np.int32)
            det[0] = int(img.shape[1]*0.0625)
            det[1] = int(img.shape[0]*0.0625)
            det[2] = img.shape[1] - det[0]
            det[3] = img.shape[0] - det[1]
        else:
              det = bbox
        margin = kwargs.get('margin', 44)
        bb = np.zeros(4, dtype=np.int32)
        bb[0] = np.maximum(det[0]-margin//2, 0)
        bb[1] = np.maximum(det[1]-margin//2, 0)
        bb[2] = np.minimum(det[2]+margin//2, img.shape[1])
        bb[3] = np.minimum(det[3]+margin//2, img.shape[0])
        ret = img[bb[1]:bb[3],bb[0]:bb[2],:]
        if len(image_size)>0:
              ret = cv2.resize(ret, (image_size[1], image_size[0]))
        return ret
    else: #do align using landmark
        assert len(image_size)==2
        warped = cv2.warpAffine(img,M,(image_size[1],image_size[0]), borderValue = 0.0)
        return warped


# landmark detection using dlib
def landmark(image, face):

    # find 68 landmark points on the face
    landmarks = landmark_detector(image, face)

    # create list to contain landmarks
    landmark_list = []

    # append (x, y) in landmark_list
    for p in landmarks.parts():
        landmark_list.append([p.x, p.y])
        cv2.circle(image, (p.x, p.y), 2, (255, 255, 255), -1)

def mobilenet_preprocess_input(x,**kwargs):
    x[..., 0] -= 103.939
    x[..., 1] -= 116.779
    x[..., 2] -= 123.68
    return x


def get_features_scores(image):
    filename2features = {}
    X_global_features, X_feats, X_scores, X_isface = [], [], [], []
    images = image
    images_10 = []
    i = 0
    for imgs in images:
        X_isface.append(True)  # making bbox means has face! so always have faces

        images_10.append(imgs)
        inp = preprocessing_function(np.array(images_10, dtype=np.float32))
        global_features, feats, scores = feature_extractor_model.predict(inp)
        print(global_features.shape,feats.shape,scores.shape)


        if len(X_feats) == 0:
            X_feats = feats
            X_global_features = global_features
            X_scores = scores
        else:
            X_feats = np.concatenate((X_feats, feats), axis=0)
            X_global_features = np.concatenate((X_global_features, global_features), axis=0)
            X_scores = np.concatenate((X_scores, scores), axis=0)

    print("global", X_global_features)
    X_isface = np.array(X_isface)
        # print(X_global_features.shape,X_feats.shape,X_scores.shape)

    filename2features[i] = (X_global_features, X_feats, X_scores, X_isface)
    i += 1

    return filename2features

## create dataset ==> concat function scores

USE_ALL_FEATURES = True


def create_dataset(filename2features):
    x = []
    y = []
    has_faces = []
    ind = 0
    features = filename2features[0]
    total_features = None

    if USE_ALL_FEATURES and True:
        print('here')
        #for face in [1, 0]:
        cur_features = features[ind]

        #if len(cur_features) == 0:
        #    continue
        weight = len(cur_features) / len(features[ind])
        mean_features = np.mean(cur_features, axis=0)
        std_features = np.std(cur_features, axis=0)
        max_features = np.max(cur_features, axis=0)
        min_features = np.min(cur_features, axis=0)

        # join several features together
        feature = np.concatenate((mean_features, std_features, min_features, max_features), axis=None)
        print("Feature", feature)
        if total_features is None:
            total_features = weight * feature
        else:
            total_features += weight * feature
    has_faces.append(1)
    print("total_Features : ", total_features)


    if total_features is not None:
        print("totla features is not none")
        x.append(total_features)
        #y.append(emotion_to_index[category])


    print("out of for moon")
    x = np.array(x)
    y = np.array(y)
    has_faces = np.array(has_faces)
    #print("x : ", x.shape, "y :", y.shape, "has_face : ", has_faces)
    #return x, y, has_faces
    return x #, y, has_faces


## dlib
detector = dlib.get_frontal_face_detector()


## main

preprocessing_function=mobilenet_preprocess_input
# webcam open
cap = cv2.VideoCapture(0,cv2.CAP_DSHOW)

print('camera is opened width: {0}, height: {1}'.format(cap.get(3), cap.get(4)))


if cap.isOpened():
    print('width: {}, height : {}'.format(cap.get(3), cap.get(4)))

while(cap.isOpened()):

    ret, image = cap.read()

    if ret:

        frame = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

        bounding_boxes, points = imgProcessing.detect_faces(frame)
        points = points.T

        faceframe_10 = []
        for i in range(10):
            for bbox, p in zip(bounding_boxes, points):
                box = bbox.astype(int)
                x1, y1, x2, y2 = box[0:4]
                # face_img=frame[y1:y2,x1:x2,:]

                p = p.reshape((2, 5)).T

                #top, left, bottom, right = box[0:4]
                #face=dlib.rectangle(left, top, right, bottom)
                face=dlib.rectangle(x1, y1, x2, y2)

                face_img = preprocess(frame, box, p)  ## CROPPED AND ALIGNED
                face_img = cv2.cvtColor(face_img, cv2.COLOR_BGR2RGB)


                ###show aligned cropped image
                cv2.imshow("face_img", face_img)

                ### draw bounding box on original image
                cv2.rectangle(image, (face.left() - 5, face.top() - 5), (face.right() + 5, face.bottom() + 5),
                              (0, 186, 255), 3)
                
                cv2.imshow("original", image)

                faceframe_10.append(face_img)
                
        n = 0
        ## feature score threw aligned cropped image
        features_scores = get_features_scores(faceframe_10)
        x_score= create_dataset(features_scores)
        n += 1

        ## normalization
        x_score_norm = preprocessing.normalize(x_score, norm='l2')
        
        ## ? how to use x_score_norm to predict emotion??

    else:
        print("error")


    if cv2.waitKey(25) & 0xFF == ord('q'):
        record = False
        break

cap.release()
cv2.destroyAllWindows()

Can I get a multi-task learning model file?

Thank you for the good paper.

I'm interested in your work and would like to look into the multi-task learning side, so I'm opening an issue.
I have two questions.

  1. Multi-task learning training
     1. If my understanding is correct, is it right that you freeze the weights of the backbone and train only the weights of the heads?
     2. In this case, is it practically the same as training each head separately?
  2. Multi-task learning model
     Could I get the multi-task learning model file that you trained? I'd like to check arousal, valence, etc.

Time of training

Hello.
I would like to know how much time it took you to train on all of AffectNet.
Also, what type of GPU did you use, and how many GPUs?

question about training on my own data

Thank you for sharing your models and code.
In my work I want to train a 3-class face emotion recognition model (8 classes is too much for me) on my own data using PyTorch, and I hope to train my classifier based on enet_b0_8_best_afew.pt (just train the classifier with the backbone frozen).
I really don't want to train from scratch O_O
But I don't know how to train from it. Can you give me some suggestions?

Or can you tell me which of the training notebooks I should use? I can't tell the difference between them.
[screenshot of the training notebooks attached in the original issue]
