MiVOLO: Multi-input Transformer for Age and Gender Estimation

MiVOLO: Multi-input Transformer for Age and Gender Estimation, Maksim Kuprashevich, Irina Tolstykh, 2023 arXiv 2307.04616

[Paper] [Demo] [BibTex] [Data]

MiVOLO pretrained models

Gender & Age recognition performance.

| Model | Type | Dataset | Age MAE | Age CS@5 | Gender Accuracy | Download |
|---|---|---|---|---|---|---|
| volo_d1 | face_only, age | IMDB-cleaned | 4.29 | 67.71 | - | checkpoint |
| volo_d1 | face_only, age, gender | IMDB-cleaned | 4.22 | 68.68 | 99.38 | checkpoint |
| mivolo_d1 | face_body, age, gender | IMDB-cleaned | 4.24 [face+body], 6.87 [body] | 68.32 [face+body], 46.32 [body] | 99.46 [face+body], 96.48 [body] | checkpoint |
| volo_d1 | face_only, age | UTKFace | 4.23 | 69.72 | - | checkpoint |
| volo_d1 | face_only, age, gender | UTKFace | 4.23 | 69.78 | 97.69 | checkpoint |
| mivolo_d1 | face_body, age, gender | Lagenda | 3.99 [face+body] | 71.27 [face+body] | 97.36 [face+body] | demo |
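
For readers unfamiliar with the metrics in the table: Age MAE is the mean absolute error in years, and CS@5 is the cumulative score at 5, i.e. the percentage of samples whose absolute age error is at most 5 years. The following is a minimal sketch of how such numbers could be computed. The function names are illustrative and are not taken from this repository, whose evaluation code may differ in detail.

```python
from typing import Sequence


def _check_inputs(y_true: Sequence, y_pred: Sequence) -> None:
    """Reject empty or mismatched inputs before computing a metric."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")


def age_mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error between true and predicted ages, in years."""
    _check_inputs(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def cumulative_score(
    y_true: Sequence[float], y_pred: Sequence[float], threshold: float = 5.0
) -> float:
    """Percentage of samples whose absolute age error is <= threshold."""
    _check_inputs(y_true, y_pred)
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= threshold)
    return 100.0 * hits / len(y_true)


def gender_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Percentage of samples whose predicted gender label matches the truth."""
    _check_inputs(y_true, y_pred)
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return 100.0 * hits / len(y_true)
```

With true ages `[25, 40, 8, 63]` and predictions `[27, 33, 8, 60]`, the absolute errors are 2, 7, 0 and 3, so `age_mae` returns 3.0 and `cumulative_score` returns 75.0.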

Dataset

Please cite our paper if you use any of this data!

  • Lagenda dataset: images and annotation.

  • IMDB-clean: follow these instructions to get images and download our annotations.

  • UTK dataset: original full images and our annotations (split from the article, random full split).

  • Adience dataset: follow these instructions to get images and download our annotations.

    After downloading them, your data directory should look something like this:

    data
    └── Adience
        ├── annotations  (folder with our annotations)
        ├── aligned      (will not be used)
        ├── faces
        ├── fold_0_data.txt
        ├── fold_1_data.txt
        ├── fold_2_data.txt
        ├── fold_3_data.txt
        └── fold_4_data.txt

    We use coarse aligned images from faces/ dir.

    Using our detector we found a face bbox for each image (see tools/prepare_adience.py).

    This dataset has five folds. The performance metric is accuracy on five-fold cross validation.

    | images before removal | fold 0 | fold 1 | fold 2 | fold 3 | fold 4 |
    |---|---|---|---|---|---|
    | 19,370 | 4,484 | 3,730 | 3,894 | 3,446 | 3,816 |

    Incomplete data

    | only age not found | only gender not found | SUM |
    |---|---|---|
    | 40 | 1,170 | 1,210 (6.2 %) |

    Removed data

    | failed to process image | age and gender not found | SUM |
    |---|---|---|
    | 0 | 708 | 708 (3.6 %) |

    Genders

    | female | male |
    |---|---|
    | 9,372 | 8,120 |

    Ages (8 classes) after mapping to non-overlapping age intervals

    | 0-2 | 4-6 | 8-12 | 15-20 | 25-32 | 38-43 | 48-53 | 60-100 |
    |---|---|---|---|---|---|---|---|
    | 2,509 | 2,140 | 2,293 | 1,791 | 5,589 | 2,490 | 909 | 901 |

  • FairFace dataset: follow these instructions to get images and download our annotations.

    After downloading them, your data directory should look something like this:

    data
    └── FairFace
        ├── annotations  (folder with our annotations)
        ├── fairface-img-margin025-trainval   (will not be used)
        │   ├── train
        │   └── val
        ├── fairface-img-margin125-trainval
        │   ├── train
        │   └── val
        ├── fairface_label_train.csv
        └── fairface_label_val.csv
    

    We use aligned images from fairface-img-margin125-trainval/ dir.

    Using our detector we found a face bbox for each image and added a person bbox if it was possible (see tools/prepare_fairface.py).

    This dataset has 2 splits: train and val. The performance metric is accuracy on validation.

    | images train | images val |
    |---|---|
    | 86,744 | 10,954 |

    Genders for validation

    | female | male |
    |---|---|
    | 5,162 | 5,792 |

    Ages for validation (9 classes):

    | 0-2 | 3-9 | 10-19 | 20-29 | 30-39 | 40-49 | 50-59 | 60-69 | 70+ |
    |---|---|---|---|---|---|---|---|---|
    | 199 | 1,356 | 1,181 | 3,300 | 2,330 | 1,353 | 796 | 321 | 118 |
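
Because Adience and FairFace are scored by classification accuracy over age intervals, a model that predicts a continuous age needs its output mapped to an interval first. Below is a minimal sketch for the nine FairFace classes listed above. The helper names are hypothetical, and the handling of fractional ages (an age belongs to the last interval whose lower bound it has reached) is an assumption; the repository's own evaluation code may map ages differently.

```python
from bisect import bisect_right
from typing import Sequence

# Lower bounds of the nine FairFace age classes:
# 0-2, 3-9, 10-19, 20-29, 30-39, 40-49, 50-59, 60-69, 70+
FAIRFACE_LOWER_BOUNDS = [0, 3, 10, 20, 30, 40, 50, 60, 70]
FAIRFACE_LABELS = [
    "0-2", "3-9", "10-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70+",
]


def age_to_fairface_class(age: float) -> int:
    """Map an age in years to a FairFace class index in the range 0..8."""
    if age < 0:
        raise ValueError("age must be non-negative")
    # bisect_right counts how many lower bounds are <= age.
    return bisect_right(FAIRFACE_LOWER_BOUNDS, age) - 1


def age_class_accuracy(
    true_ages: Sequence[float], pred_ages: Sequence[float]
) -> float:
    """Percentage of samples whose predicted age falls in the true class."""
    if len(true_ages) == 0 or len(true_ages) != len(pred_ages):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(
        1
        for t, p in zip(true_ages, pred_ages)
        if age_to_fairface_class(t) == age_to_fairface_class(p)
    )
    return 100.0 * hits / len(true_ages)
```

For example, `FAIRFACE_LABELS[age_to_fairface_class(29.9)]` is `"20-29"`, while an age of 30 maps to `"30-39"`.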

Install

Install PyTorch 1.13+ and the other requirements:

pip install -r requirements.txt
pip install .

Demo

  1. Download body + face detector model to models/yolov8x_person_face.pt
  2. Download mivolo checkpoint to models/mivolo_imbd.pth.tar
wget https://variety.com/wp-content/uploads/2023/04/MCDNOHA_SP001.jpg -O jennifer_lawrence.jpg

python3 demo.py \
--input "jennifer_lawrence.jpg" \
--output "output" \
--detector-weights "models/yolov8x_person_face.pt" \
--checkpoint "models/mivolo_imbd.pth.tar" \
--device "cuda:0" \
--with-persons \
--draw

To run the demo on a YouTube video:

python3 demo.py \
--input "https://www.youtube.com/shorts/pVh32k0hGEI" \
--output "output" \
--detector-weights "models/yolov8x_person_face.pt" \
--checkpoint "models/mivolo_imbd.pth.tar" \
--device "cuda:0" \
--draw \
--with-persons

To run the demo on camera input:

python3 demo.py \
--input "cam" \
--output "output" \
--detector-weights "models/yolov8x_person_face.pt" \
--checkpoint "models/mivolo_imbd.pth.tar" \
--device "cuda:0" \
--draw \
--with-persons
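
The three invocations above differ only in the `--input` value. If the demo is launched from Python, the argument list can be assembled in one place. The following is a minimal sketch under that assumption: `build_demo_command` is a hypothetical helper, not part of this repository, and it uses only the flags shown in the commands above.

```python
import shlex
from typing import List


def build_demo_command(
    input_source: str,
    output_dir: str = "output",
    detector_weights: str = "models/yolov8x_person_face.pt",
    checkpoint: str = "models/mivolo_imbd.pth.tar",
    device: str = "cuda:0",
    with_persons: bool = True,
    draw: bool = True,
) -> List[str]:
    """Return the argv list for demo.py; input_source is a file, URL, or "cam"."""
    cmd = [
        "python3", "demo.py",
        "--input", input_source,
        "--output", output_dir,
        # strip() guards against stray whitespace in copied paths
        "--detector-weights", detector_weights.strip(),
        "--checkpoint", checkpoint.strip(),
        "--device", device,
    ]
    if with_persons:
        cmd.append("--with-persons")
    if draw:
        cmd.append("--draw")
    return cmd


if __name__ == "__main__":
    sources = (
        "jennifer_lawrence.jpg",
        "https://www.youtube.com/shorts/pVh32k0hGEI",
        "cam",
    )
    for source in sources:
        # Print the shell-quoted command instead of executing it.
        print(shlex.join(build_demo_command(source)))
```

To actually run a command, pass the returned list to `subprocess.run(cmd, check=True)` from the repository root.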

Validation

To reproduce validation metrics:

  1. Download the prepared annotations for imdb-clean / utk / adience / lagenda / fairface.
  2. Download a checkpoint.
  3. Run validation:
python3 eval_pretrained.py \
  --dataset_images /path/to/dataset/utk/images \
  --dataset_annotations /path/to/dataset/utk/annotation \
  --dataset_name utk \
  --split valid \
  --batch-size 512 \
  --checkpoint models/mivolo_imbd.pth.tar \
  --half \
  --with-persons \
  --device "cuda:0"

Supported dataset names: "utk", "imdb", "lagenda", "fairface", "adience".
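
When validating on several datasets, it can help to reject an unsupported name before launching a long run. The sketch below builds the `eval_pretrained.py` argument list and checks the name against the list above. `build_eval_command` is a hypothetical helper, and the `<root>/<name>/images` and `<root>/<name>/annotation` layout is an assumption extrapolated from the `utk` example; other datasets may need different paths or split names.

```python
from pathlib import PurePosixPath
from typing import List

SUPPORTED_DATASETS = ("utk", "imdb", "lagenda", "fairface", "adience")


def build_eval_command(
    dataset_name: str,
    dataset_root: str,
    checkpoint: str = "models/mivolo_imbd.pth.tar",
    split: str = "valid",
    batch_size: int = 512,
    device: str = "cuda:0",
    half: bool = True,
    with_persons: bool = True,
) -> List[str]:
    """Return the argv list for eval_pretrained.py for one dataset."""
    if dataset_name not in SUPPORTED_DATASETS:
        raise ValueError(
            f"unsupported dataset {dataset_name!r}; "
            f"expected one of {', '.join(SUPPORTED_DATASETS)}"
        )
    # Assumed layout: <dataset_root>/<dataset_name>/{images,annotation}
    root = PurePosixPath(dataset_root) / dataset_name
    cmd = [
        "python3", "eval_pretrained.py",
        "--dataset_images", str(root / "images"),
        "--dataset_annotations", str(root / "annotation"),
        "--dataset_name", dataset_name,
        "--split", split,
        "--batch-size", str(batch_size),
        "--checkpoint", checkpoint,
        "--device", device,
    ]
    if half:
        cmd.append("--half")
    if with_persons:
        cmd.append("--with-persons")
    return cmd
```

Calling `build_eval_command("utk", "/path/to/dataset")` reproduces the arguments of the command shown above, while a misspelled name such as `"imbd"` raises a `ValueError` immediately.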

License

Please see here.

Citing

If you use our models, code, or dataset, we kindly ask you to cite the following paper and give the repository a ⭐

@article{mivolo2023,
   Author = {Maksim Kuprashevich and Irina Tolstykh},
   Title = {MiVOLO: Multi-input Transformer for Age and Gender Estimation},
   Year = {2023},
   Eprint = {arXiv:2307.04616},
}
