
google-mediapipe's Introduction

Google MediaPipe for Pose Estimation

MediaPipe is a cross-platform framework for building multimodal applied machine learning pipelines including inference models and media processing functions.

The main purpose of this repo is to:

  • Customize output of MediaPipe solutions
  • Customize visualization of 2D & 3D outputs
  • Demo some simple applications in Python (refer to Demo Overview)
  • Demo some simple applications in JavaScript (refer to the java folder)

Pose Estimation with Input Color Image

Advantages of Google MediaPipe compared to other SOTA frameworks (e.g. FrankMocap, CMU OpenPose, DeepPoseKit, DeepLabCut, MinimalHand):

  • Fast: Runs at near-realtime rates on CPU and even on mobile devices
  • Open-source: Code is freely available on GitHub (although details of the network models are not released)
  • User-friendly: For the Python API, a simple pip install mediapipe works (the C++ API is much more troublesome to build and use)
  • Cross-platform: Works across Android, iOS, desktop, JavaScript and web (Note: this repo only focuses on using the Python API on desktop)
  • ML Solutions: Apart from face, hand, body and object pose estimation, MediaPipe offers an array of machine learning applications; refer to their GitHub for more details

Features

Latest MediaPipe Python API version 0.8.9.1 (Released 14 Dec 2021) features:

Face Detect (2D face detection)

Face Mesh (468/478 3D face landmarks)

Hands (21 3D landmarks and able to support multiple hands, 2 levels of model complexity) (NEW world coordinates)

Body Pose (33 3D landmarks for whole body, 3 levels of model complexity)

Holistic (Face + Hands + Body) (A total of 543/535 landmarks: 468 face + 2 x 21 hands + 33/25 pose)

Objectron (3D object detection and tracking) (4 possible objects: Shoe / Chair / Camera / Cup)

Selfie Segmentation (Segments human for selfie effect/video conferencing)

Note: The above videos were presented at the CVPR 2020 Fourth Workshop on Computer Vision for AR/VR; interested readers can refer to the link for other related works.

Installation

The simplest way to run our implementation is to use Anaconda.

You can create an Anaconda environment called mp with

conda env create -f environment.yaml
conda activate mp

Demo Overview

  • Single Image
  • Video Input
  • Gesture Recognition
  • Rock Paper Scissors Game
  • Measure Hand ROM
  • Measure Wrist and Forearm ROM
  • Face Mask
  • Triangulate Points for 3D Pose
  • 3D Skeleton
  • 3D Object Detection
  • Selfie Segmentation

Usage

5 different modes are available, and sample images are located in the data/sample/ folder

python 00_image.py --mode face_detect
python 00_image.py --mode face
python 00_image.py --mode hand
python 00_image.py --mode body
python 00_image.py --mode holistic

Note: The sample images of a subject with body markers are adapted from An Asian-centric human movement database capturing activities of daily living, and the image of Mona Lisa is adapted from Wiki

5 different modes are available, and video capture can be done online through a webcam or offline from your own .mp4 file

python 01_video.py --mode face_detect
python 01_video.py --mode face
python 01_video.py --mode hand
python 01_video.py --mode body
python 01_video.py --mode holistic

Note: It runs at around 10 to 30 FPS on CPU, depending on the mode selected. The video demonstrating supported mini-squats is adapted from the National Stroke Association

2 modes are available: use eval mode to perform recognition of 11 gestures, and use train mode to log your own training data

python 02_gesture.py --mode eval
python 02_gesture.py --mode train

Note: A simple but effective k-nearest neighbors (KNN) algorithm is used as the classifier. For the hand gesture recognition demo, since 3D hand joints are available, we can compute flexion joint angles as the feature vector and use it to classify different hand poses. Since 3D body joints are not yet as reliable, the normalized pairwise distances between predefined lists of joints, as described in MediaPipe Pose Classification, could also be used as the feature vector for KNN.
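
As an illustration of this approach, here is a minimal sketch (not the repo's actual implementation) of computing a flexion angle at a joint and classifying a feature vector of such angles with KNN; the training data and angle values below are synthetic:

```python
import numpy as np

def flexion_angle(a, b, c):
    """Angle at joint b (degrees) formed by 3D points a-b-c."""
    v1, v2 = a - b, c - b
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def knn_classify(feature, train_features, train_labels, k=3):
    """Classify a feature vector of joint angles by majority vote
    of its k nearest neighbors in Euclidean distance."""
    dists = np.linalg.norm(train_features - feature, axis=1)
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(train_labels[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Synthetic example: small angles ~ clenched fist, large angles ~ open hand
train_f = np.array([[10., 10.], [12., 11.], [170., 168.], [172., 165.]])
train_l = np.array(['fist', 'fist', 'open', 'open'])
```

In practice the feature vector would contain one flexion angle per finger joint, computed from the 21 MediaPipe hand landmarks.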

A simple game of rock paper scissors requires a pair of hands facing the camera

python 03_game_rps.py

For another game, Flappy Bird, refer to this GitHub repo

2 modes are available: use eval mode to perform hand ROM recognition, and use train mode to log your own training data

python 04_hand_rom.py --mode eval
python 04_hand_rom.py --mode train

3 modes are available, and the user has to specify the side of the hand to be measured:

  • 0: Wrist flexion/extension
  • 1: Wrist radial/ulnar deviation
  • 2: Forearm pronation/supination

python 05_wrist_rom.py --mode 0 --side right
python 05_wrist_rom.py --mode 1 --side right
python 05_wrist_rom.py --mode 2 --side right
python 05_wrist_rom.py --mode 0 --side left
python 05_wrist_rom.py --mode 1 --side left
python 05_wrist_rom.py --mode 2 --side left

Note: For measuring forearm pronation/supination, the camera has to be placed at the same level as the hand, such that the palmar side of the hand directly faces the camera. For measuring wrist ROM, the camera has to be placed such that the upper body of the subject is visible; refer to the wrist_XXX.png example images in the data/sample/ folder. The wrist images are adapted from Goni Wrist Flexion, Extension, Radial & Ulnar Deviation
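
A wrist ROM measurement of this kind boils down to an angle between two landmark vectors. Here is a minimal sketch (an assumption about the general approach, not the repo's actual code) using three hypothetical 2D landmark positions:

```python
import numpy as np

def wrist_rom_deg(elbow, wrist, index_mcp):
    """Wrist flexion/extension ROM in degrees: deviation of the hand
    vector (wrist -> index MCP) from the forearm line (elbow -> wrist).
    Points are 2D image coordinates; 0 deg means the hand is in line
    with the forearm (neutral position)."""
    forearm = np.asarray(wrist) - np.asarray(elbow)
    hand = np.asarray(index_mcp) - np.asarray(wrist)
    cos = np.dot(forearm, hand) / (np.linalg.norm(forearm) * np.linalg.norm(hand))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))
```

This is why camera placement matters: the angle is measured in the image plane, so the plane of wrist motion must be parallel to the camera sensor for the projected angle to approximate the true joint angle.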

Overlay a 3D face mask on the detected face in the image plane

python 06_face_mask.py

Note: The face image is adapted from MediaPipe 3D Face Transform

Estimating 3D body pose from a single 2D image is an ill-posed and extremely challenging problem. One way to reconstruct 3D body pose is to make use of a multiview setup and perform triangulation. For offline testing with the CMU Panoptic Dataset, follow the instructions in the PanopticStudio Toolbox to download a sample dataset 171204_pose1_sample into the data/ folder

python 07_triangulate.py --mode body --use_panoptic_dataset
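
The core of multiview triangulation is the direct linear transform (DLT): each view contributes two linear constraints on the homogeneous 3D point, which are solved by SVD. A minimal two-view sketch (a standard formulation, not necessarily the exact implementation in 07_triangulate.py):

```python
import numpy as np

def triangulate_dlt(P1, P2, pt1, pt2):
    """Triangulate one 3D point from two views.
    P1, P2: 3x4 camera projection matrices.
    pt1, pt2: 2D image coordinates of the same joint in each view."""
    A = np.stack([
        pt1[0] * P1[2] - P1[0],   # x1 constraint from view 1
        pt1[1] * P1[2] - P1[1],   # y1 constraint from view 1
        pt2[0] * P2[2] - P2[0],   # x2 constraint from view 2
        pt2[1] * P2[2] - P2[1],   # y2 constraint from view 2
    ])
    _, _, Vt = np.linalg.svd(A)   # least-squares solution = last right singular vector
    X = Vt[-1]
    return X[:3] / X[3]           # dehomogenize
```

With the Panoptic dataset, the projection matrices come from the calibration file of each HD camera, and the 2D joints come from running MediaPipe on each view.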

3D pose estimation is available in full-body mode, and this demo displays the estimated 3D skeleton of the hand and/or body. 3 different modes are available, and video capture can be done online through a webcam or offline from your own .mp4 file

python 08_skeleton_3D.py --mode hand
python 08_skeleton_3D.py --mode body
python 08_skeleton_3D.py --mode holistic
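
To draw a 3D skeleton, the (33, 3) array of body world landmarks (in meters, hip-centered) is connected into bone segments. A small sketch of that bookkeeping, using a subset of the official 33-landmark topology (the actual demo renders with Open3D):

```python
import numpy as np

# A few of MediaPipe Pose's landmark connections, per the official topology:
# 11/12 = shoulders, 13/14 = elbows, 23/24 = hips, 25/26 = knees
BONES = [(11, 13), (12, 14), (11, 12), (23, 24),
         (11, 23), (12, 24), (23, 25), (24, 26)]

def bone_lengths(world_landmarks):
    """Given a (33, 3) array of world landmarks in meters,
    return the length of each bone segment (useful as a sanity
    check before rendering the skeleton)."""
    pts = np.asarray(world_landmarks)
    return {b: float(np.linalg.norm(pts[b[0]] - pts[b[1]])) for b in BONES}
```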

4 different modes are available, and a sample image is located in the data/sample/ folder. Currently supports 4 classes: Shoe / Chair / Cup / Camera.

python 09_objectron.py --mode shoe
python 09_objectron.py --mode chair
python 09_objectron.py --mode cup
python 09_objectron.py --mode camera

2 modes are available. The landscape model has fewer FLOPs than the general model and may run faster. Selfie segmentation works best for selfie effects and video conferencing, where the person is close (< 2 m) to the camera.

python 10_segmentation.py --mode general
python 10_segmentation.py --mode landscape
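
The segmentation model outputs a per-pixel person-probability mask; the typical selfie effect thresholds it and composites the frame over a replacement background. A minimal sketch of that compositing step (the mask here is a hand-made array standing in for the model output):

```python
import numpy as np

def composite(frame, mask, background, threshold=0.5):
    """Blend a frame over a background using a segmentation mask.
    frame, background: HxWx3 uint8 images; mask: HxW float in [0, 1],
    where higher values mean 'person'."""
    cond = (mask > threshold)[..., None]      # HxWx1 bool, broadcast over channels
    return np.where(cond, frame, background)  # person pixels from frame, rest from bg
```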

Limitations:

Estimating 3D pose from a single 2D image is an ill-posed and extremely challenging problem, so the ROM measurements may not be accurate! Please refer to the respective model cards for more details on other types of limitations such as lighting, motion blur, occlusion, image resolution, etc.

google-mediapipe's People

Contributors

guanming001


google-mediapipe's Issues

Calibration of stereo camera based on world landmarks

Hi,
I have seen you also have a module based on a stereo camera. I wonder if you'd be interested in the idea of calculating camera calibration (basically only the relative position) using just a pose from 2 cameras?
The idea is that you capture points in 2D and also have a 3D model from the world landmarks. We assume that there is one valid model, for example from the 1st camera. If we apply solvePnP with the first model to the 2D points of the first and second cameras, and negate (I think) the 3D coordinates, we can get the actual pose of both cameras. This method will most likely be far from chessboard calibration in precision, but if we capture a few samples and take only the most important joints, it might lead to a good enough estimate to use with MediaPipe.
I am just writing about this feature in case you are interested and find it applicable to your solution.
If it is not applicable to your project, or you simply don't have time, then sorry and I shall close this issue :)
Thanks,
Greg

Demo of body 3d pose in space

Hello.
I was looking at 05_wrist_rom, where I can see the hand in 3D being properly positioned along all 3 axes, that is left, right and depth, with the default parameters.
In the new MediaPipe there is also position in meters, but the origin is at the hips. Could you consider making a demo of body pose located in space, the same as you now have for the hand?
Thank you for the great examples.

mediapipe, import error :DLL load failed: The specified module could not be found.

This happens while importing the mediapipe module:

import mediapipe
Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    import mediapipe
  File "C:\Users\new user\AppData\Local\Programs\Python\Python37\lib\site-packages\mediapipe\__init__.py", line 16, in <module>
    from mediapipe.python import *
  File "C:\Users\new user\AppData\Local\Programs\Python\Python37\lib\site-packages\mediapipe\python\__init__.py", line 17, in <module>
    from mediapipe.python._framework_bindings import resource_util
ImportError: DLL load failed: The specified module could not be found.

How do I solve this problem?

Behaviour of 08_skeleton_3D

Hi.
I'd like to ask how this demo is supposed to work. I can see in the Open3D window the same thing as in the "img 2D" window.
Thanks

TypeError: Descriptors cannot not be created directly.

python .\code\00_image.py --mode face_detect
Traceback (most recent call last):
  File ".\code\00_image.py", line 16, in <module>
    from utils_mediapipe import MediaPipeFaceDetect, MediaPipeFace, MediaPipeHand, MediaPipeBody, MediaPipeHolistic
  File "D:\CV2022\mymediapipe\code\utils_mediapipe.py", line 9, in <module>
    import mediapipe as mp
  File "D:\Anaconda3\envs\mediapipe\lib\site-packages\mediapipe\__init__.py", line 17, in <module>
    import mediapipe.python.solutions as solutions
  File "D:\Anaconda3\envs\mediapipe\lib\site-packages\mediapipe\python\solutions\__init__.py", line 17, in <module>
    import mediapipe.python.solutions.drawing_styles
  File "D:\Anaconda3\envs\mediapipe\lib\site-packages\mediapipe\python\solutions\drawing_styles.py", line 20, in <module>
    from mediapipe.python.solutions.drawing_utils import DrawingSpec
  File "D:\Anaconda3\envs\mediapipe\lib\site-packages\mediapipe\python\solutions\drawing_utils.py", line 25, in <module>
    from mediapipe.framework.formats import detection_pb2
  File "D:\Anaconda3\envs\mediapipe\lib\site-packages\mediapipe\framework\formats\detection_pb2.py", line 16, in <module>
    from mediapipe.framework.formats import location_data_pb2 as mediapipe_dot_framework_dot_formats_dot_location__data__pb2
  File "D:\Anaconda3\envs\mediapipe\lib\site-packages\mediapipe\framework\formats\location_data_pb2.py", line 16, in <module>
    from mediapipe.framework.formats.annotation import rasterization_pb2 as mediapipe_dot_framework_dot_formats_dot_annotation_dot_rasterization__pb2
  File "D:\Anaconda3\envs\mediapipe\lib\site-packages\mediapipe\framework\formats\annotation\rasterization_pb2.py", line 36, in <module>
    _descriptor.FieldDescriptor(
  File "D:\Anaconda3\envs\mediapipe\lib\site-packages\google\protobuf\descriptor.py", line 560, in __new__
    _message.Message._CheckCalledFromGeneratedFile()
TypeError: Descriptors cannot not be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
 1. Downgrade the protobuf package to 3.20.x or lower.
 2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).

More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates
