mmmmmm44 / vtuber-python-unity

A VTuber implementation (both 3D and Live2D) using Python and Unity. It provides face movement tracking, eye blink detection, iris detection and tracking, and mouth movement tracking using the CPU only.

License: MIT License

C# 54.64% Python 45.36%

python unity vtuber live2d 3d face-tracking facemesh mediapipe mediapipe-facemesh opencv

vtuber-python-unity's Introduction

VTuber Python Unity Tutorial

A VTuber implementation (both 3D and Live2D) using Python and Unity. It supports face movement tracking, eye blink detection, iris detection and tracking, and mouth movement tracking using the CPU only.

Usage

Live2D Demo

UnityChan 3D Demo

Features

Facial landmark detection and movement tracking powered by Facemesh from Mediapipe.
Detection of various facial expressions, including eye blinking, iris and mouth movements.
Runs smoothly at 30 FPS on the CPU alone for the aforementioned features.
Simple and clean UI for adjusting the detection sensitivity in Unity.
Save/load mechanism for storing and restoring your preferences in Unity.
Includes sample Unity projects for both 3D and Live2D models.
Supports Windows and M1 machines.
Detailed and thorough explanation videos (with EN and ZH subtitles) playlist
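
Blink detection of this kind is often built on the eye aspect ratio (EAR), the ratio of eyelid opening to eye width. The sketch below is an illustration only: the six-point layout, the function names and the 0.2 threshold are assumptions, not taken from facial_features.py.

```python
import math

def eye_aspect_ratio(p1, p2, p3, p4, p5, p6):
    """EAR from six (x, y) eye points (assumed layout).

    p1 and p4 are the eye corners, p2 and p3 lie on the upper lid,
    p6 and p5 are the matching points on the lower lid.
    """
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    horizontal = 2.0 * math.dist(p1, p4)
    return vertical / horizontal

def is_blinking(ear, threshold=0.2):
    # The threshold is a hypothetical starting value; tune it per camera.
    return ear < threshold

open_eye = eye_aspect_ratio((0, 0), (1, 1), (3, 1), (4, 0), (3, -1), (1, -1))
closed_eye = eye_aspect_ratio((0, 0), (1, 0.1), (3, 0.1), (4, 0), (3, -0.1), (1, -0.1))
print(open_eye, is_blinking(open_eye))
print(closed_eye, is_blinking(closed_eye))
```

Because the ratio divides by the eye width, it stays roughly constant when the face moves closer to or further from the camera.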

File Explanation

File	Description
main.py	The main program
facial_landmark.py	The module which is used to detect your face and generate the facial landmarks.
pose_estimator.py	The module which estimates the pose/orientation of your head based on the landmarks.
stabilizer.py	Implementation of Kalman Filter to stabilize the values.
facial_features.py	Various facial features detection implementation, including blinking, iris detection and mouth movement.
model.txt	The points of the 3D Canonical model used in Mediapipe. Source file
UnityAssets	Whole Unity Projects (in packages) and Scripts (old version) for both 3D (UnityChan) and Live2D (Hiyori) models
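
To give a feel for what stabilizing a value means, here is a minimal one-dimensional Kalman filter. It is a simplified sketch, not the code in stabilizer.py: the class name is hypothetical, and it tracks a single scalar state, whereas the main.py listing further down constructs Stabilizer with state_num=2 and measure_num=1. The two covariance defaults reuse the 0.1 values from that listing.

```python
class ScalarKalman:
    """Simplified scalar Kalman filter (illustrative, not stabilizer.py)."""

    def __init__(self, cov_process=0.1, cov_measure=0.1):
        self.q = cov_process      # process noise: how fast the true value may drift
        self.r = cov_measure      # measurement noise: how jittery the input is
        self.x = 0.0              # current estimate
        self.p = 1.0              # variance of the estimate
        self.initialized = False

    def update(self, z):
        if not self.initialized:
            # Start from the first measurement instead of from zero.
            self.x = z
            self.initialized = True
            return self.x
        self.p += self.q                      # predict: uncertainty grows
        gain = self.p / (self.p + self.r)     # Kalman gain, between 0 and 1
        self.x += gain * (z - self.x)         # correct towards the measurement
        self.p *= (1.0 - gain)                # uncertainty shrinks after correcting
        return self.x

smoother = ScalarKalman()
raw = [10.0, 12.0, 8.0, 11.0, 9.0]
smoothed = [smoother.update(v) for v in raw]
print(smoothed)
```

Raising cov_measure relative to cov_process makes the output smoother but slower to follow real movement.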

Background

Using avatars for streaming, content creation and VR gaming has been gaining popularity, especially with the boom of Hololive and related companies and their active appearances on social media platforms such as YouTube and Twitter. Curious about the technology behind it, I created this project after some research.

Existing projects rely on Dlib, which, although it provides reliable and accurate facial landmark detection, requires a decent graphics card to run. With the recent FaceMesh model in Mediapipe, however, accurate detection and tracking run smoothly on the CPU alone, which makes the project usable on computers with mediocre graphics cards and on laptops with integrated graphics.

How To Use

Clone this project into your directory

git clone https://github.com/mmmmmm44/VTuber-Python-Unity.git
cd "VTuber-Python-Unity"

Setup

Create an empty Unity 3D project.
Import either the Live2D or UnityChan3D package to your project. The corresponding SDKs have been included already. (Except that I removed the voice files of UnityChan in the UnityChan Unity package due to the file size limit on GitHub.)
- last edit: 16-05-2022
[NEW] In the Game window of Unity, create a 9:16 portrait ratio display mode and save it. Then the UI will return to normal. (Or you can adjust the ratio in the AspectGridLayoutCellSize.cs attached to each panel GameObject, namely "Setting Panel" and "TCP Setting Panel".) Tutorial
Run the Scene. Click the Setting button to switch on the TCP server. There should be a "Waiting for connection..." log message showing in the Console of Unity.
Run the following command in a terminal (arguments in brackets are optional):

python main.py --connect [--debug] [--port PORT]

Enjoy

Custom Setup (YT Tutorial)

For Live2D model

Video Walkthrough: Click Me!

Download the Cubism SDK For Unity from this website and the sample model used (桃瀬ひより) from this website
Create an empty Unity 3D project, and import the Cubism SDK. Unzip the model and drag the whole folder to the Project window of the Unity Project.
Drag the live2D model's prefab into the scene. Run the scene immediately to allow the model to be shown in Scene and Game window.
Adjust the camera's position, background and projection properties. If the model is projected strangely, changing the camera's projection from Perspective to Orthographic worked for me.
Drag the HiyoriController.cs to the Hiyori GameObject. Adjust the parameters in the inspector
Run the scene.
Run the following command in a terminal (arguments in brackets are optional):

python main.py --connect [--debug] [--port PORT]

Enjoy

For 3D Model (UnityChan)

Video Walkthrough: Click Me!

Download the UnityChan model from the website. Go to "Data Download", accept the terms and agreements, and select the first one. Unzip the file.
Create an empty Unity 3D Project. Drag the unzipped folder to the Project Window of the project.
Go to UnityChan\Prefabs and Drag the "unitychan" prefab into the scene.
Adjust the camera's position, background and field of view.
Drag the UnityChanControl.cs script onto the prefab. Change the update mode of the attached Animator to "Animate Physics" and the Controller to UnityChanLocomotions. (Crucial) Adjust the variables in the inspector. Disable other attached scripts except AutoBlink and UnityChanController. You may tick the box "is Auto Blink Active" in UnityChanController to enable auto blinking (enable the AutoBlink script when ticked).
Run the scene first.
Run the following command in a terminal (arguments in brackets are optional):

python main.py --connect [--debug] [--port PORT]

Enjoy

Make sure the Unity scene is running before you start the Python script.
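
One way to confirm that the Unity side is ready is to probe the port before launching main.py. The helper below is a small sketch using only the standard library; the function name is hypothetical, and 127.0.0.1 with port 5066 assume Unity runs on the same machine with the default port.

```python
import socket

def is_port_open(host, port, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError)
        # when nothing is listening or the host is unreachable.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    if is_port_open("127.0.0.1", 5066):
        print("Unity TCP server is reachable, main.py --connect should work")
    else:
        print("Nothing is listening: start the scene and switch on the TCP server")
```

Note that the probe itself opens and closes a connection, so whether the Unity server tolerates that short-lived client is an assumption to verify.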

The complete Unity project, with extras such as the UI system, save/load preferences, and custom port UI, is available after importing the provided Unity packages.

Options (main.py)

-h, --help                           show this help message and exit

--connect                            connect to the unity character

--port PORT                          specify the port of the connection to 
                                     unity. Have to be the same as in Unity.

--cam CAM                            specify the camera number to use 
                                     if you have multiple cameras connected
                                     to the computer.
                                     (for cv2.VideoCapture(CAM))

--debug                              show the raw values of the detection 
                                     in the terminal

Examples

Connecting to the Unity Project with default port, showing the sent data.

python main.py --connect --debug

Connecting to the Unity Project with custom port (e.g. 5077), showing the sent data.

python main.py --connect --debug --port 5077
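
Judging by the main.py listing posted in the issues further down, each frame is sent as one line of space-separated floats with four decimals. The sketch below mirrors that formatting and adds a decoder for inspecting the output; the field names follow the variables in that listing, while encode, decode and FIELDS are hypothetical helpers. It assumes one complete message per payload, which a TCP stream does not guarantee.

```python
FIELDS = (
    "roll", "pitch", "yaw",
    "ear_left", "ear_right",
    "x_ratio_left", "y_ratio_left", "x_ratio_right", "y_ratio_right",
    "mar", "mouth_distance",
)

def encode(values):
    """Format the values the same way send_info_to_unity does."""
    msg = "%.4f " * len(values) % tuple(values)
    return msg.encode("utf-8")

def decode(payload):
    """Turn one received message back into a name -> value mapping."""
    numbers = [float(token) for token in payload.decode("utf-8").split()]
    return dict(zip(FIELDS, numbers))

payload = encode((1.5, -2.25, 0.0, 0.3, 0.3, 0.5, 0.5, 0.5, 0.5, 0.1, 12.0))
print(payload)
print(decode(payload))
```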

Development Environment

Python 3.8.5
Numpy 1.19.2
OpenCV 4.5.1
Mediapipe 0.8.9.1
Unity 2020.3.12f1

(Later versions should be supported as well.)

(For Windows, it is recommended to run this project using Anaconda and create a virtual environment before installing such packages.)

The whole project was developed and run on a laptop with an Intel Core i5-8250U, 16GB RAM and integrated graphics only.

The same project was tested on an M1 Max device, running under Rosetta.

References/ Credits

Detect 468 Face Landmarks in Real-time | OpenCV Python | Computer Vision - Murtaza's Workshop - Robotics and AI

Eye motion tracking - Opencv with Python - Pysource

Project	Author	LICENSE
head-pose-estimation	Yin Guobing	LICENSE
VTuber_Unity	AI葵	LICENSE
VTuber-MomoseHiyori	KennardWang	LICENSE

Hiyori Momose's model

Position	Creator
Illustration	Kani Biimu [Twitter @kani_biimu]
Modeling	Live2D Inc.

License

MIT

The Unity Chan model in the Unity Packages provided is distributed under Unity-Chan License © Unity Technologies Japan/UCL. A separate copy of that license is included in UnityAssets/Licenses/UCL2_0

vtuber-python-unity's Issues

ConnectionRefusedError: [WinError 10061]

Traceback (most recent call last):

ConnectionRefusedError: [WinError 10061] Connection failed because the target computer refused to connect.

I followed your YouTube video, and this is the result even though I ran the Unity program first.
Can anyone tell me what the problem is? Do I have to change the port number?

There's an error like this in unity. Is it okay to ignore it?
It's a package downloaded from Dropbox, but it's still like this.....

How to set these values?

TCP cannot connect although Unity is running

INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Traceback (most recent call last):
File ".\main.py", line 239, in <module>
main()
File ".\main.py", line 92, in main
socket = init_TCP()
File ".\main.py", line 37, in init_TCP
s.connect(address)
ConnectionRefusedError: [WinError 10061] No connection could be made because the target machine actively refused it
[ WARN:[email protected]] global D:\a\opencv-python\opencv-python\opencv\modules\videoio\src\cap_msmf.cpp (539) `anonymous-namespace'::SourceReaderCB::~SourceReaderCB terminating async callback

Why iris_image_points sometimes goes out of range

If you get a "list index out of range" error in main.py at iris_image_points[j, 0] = faces[0][j + 468][0], check how your face detector is initialized.

You need to set refine_landmarks to True, so that the detector also outputs the refined iris landmarks (indices 468 and above).
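
A guard like the following turns the bare IndexError into a clearer message. It is a sketch: split_landmarks is a hypothetical helper, and the counts assume 468 mesh points plus the 10 extra iris points that main.py reads from index 468 onwards.

```python
NUM_MESH_LANDMARKS = 468   # base face mesh points
NUM_IRIS_LANDMARKS = 10    # extra points read by main.py for the irises

def split_landmarks(face):
    """Split one detected face into mesh points and iris points."""
    expected = NUM_MESH_LANDMARKS + NUM_IRIS_LANDMARKS
    if len(face) < expected:
        raise ValueError(
            "got %d landmarks, expected %d: was the detector created "
            "with refine_landmarks=True?" % (len(face), expected)
        )
    mesh = [tuple(p[:2]) for p in face[:NUM_MESH_LANDMARKS]]
    iris = [tuple(p[:2]) for p in face[NUM_MESH_LANDMARKS:expected]]
    return mesh, iris

# Fake faces standing in for detector output: lists of (x, y) pairs.
refined_face = [(i, 2 * i) for i in range(478)]
mesh, iris = split_landmarks(refined_face)
print(len(mesh), len(iris))
```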

3D custom model question

If I design a 3D model myself, do I need to follow the topology of the Mediapipe face landmarks?

barracuda facemesh & other model

Hi @mmmmmm44 ,

Hope you are all well !

I was wondering how complicated it would be to re-use https://github.com/keijiro/FaceMeshBarracuda with your project.

Do you think it is possible ? Can you help me do so ?

Thanks for any insights or inputs on that.

Cheers,
Luc Michalski

Why is the pitch always -90?

I print the pitch value, and it is always -90. How can I fix it?
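
The main.py listing in a later issue computes pitch as -(180 + degrees(rvec_x)) clipped to [-90, 90]. The sketch below reproduces only that arithmetic in plain Python (pitch_from_rvec_x is a hypothetical name) to show when the result saturates. It offers one thing to check, not a confirmed diagnosis.

```python
import math

def pitch_from_rvec_x(rvec_x):
    """Same arithmetic as main.py: -(180 + degrees(x)), clipped to [-90, 90]."""
    raw = -(180.0 + math.degrees(rvec_x))
    return max(-90.0, min(90.0, raw))

# x component of the rotation vector, in radians
for rvec_x in (-math.pi, -math.pi + 0.1, 0.0, math.pi):
    print("rvec_x = %+.3f -> pitch = %+.2f" % (rvec_x, pitch_from_rvec_x(rvec_x)))
```

With this formula the pitch only leaves -90 when the x component is within about 90 degrees of -pi radians, so printing the raw rvec values (the commented-out print lines in main.py) shows whether the solver output falls outside that range.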

I can't change the UnityChan character.

2022-07-17.23-31-47.mp4

I tested this, but UnityChan is not moving.

What should I do?

How to set the port in Unity

Hello, my Python script is already running, but it cannot communicate with Unity. What should I do at this step?

I can connect without error, but it is so slow

I tried printing log messages to find out where it slows down.

I started at 5:50:00,
but that is the time the window opened.

This is my main.py code.
I changed a few things.

Both the original code and my code take too long to open the camera (2 min 28 sec).

"""
Main program to run the detection and TCP
"""

from argparse import ArgumentParser
import cv2
import mediapipe as mp
import numpy as np

# for TCP connection with unity
import socket

# face detection and facial landmark
from facial_landmark import FaceMeshDetector

# pose estimation and stablization
from pose_estimator import PoseEstimator
from stabilizer import Stabilizer

# Miscellaneous detections (eyes/ mouth...)
from facial_features import FacialFeatures, Eyes

import sys

# global variable
port = 5066         # have to be same as unity

# init TCP connection with unity
# return the socket connected
def init_TCP():
    port = args.port

    # '127.0.0.1' = 'localhost' = your computer internal data transmission IP
    # address = ('127.0.0.1', port)
    # address = ('121.160.178.145', port)
    address = ('172.30.1.31', port)
    # address = ('192.168.0.107', port)

    try:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.connect(address)
        # print(socket.gethostbyname(socket.gethostname()) + "::" + str(port))
        print("Connected to address:", socket.gethostbyname(socket.gethostname()) + ":" + str(port))
        return s
    except OSError as e:
        print("Error while connecting :: %s" % e)
        
        # quit the script if connection fails (e.g. Unity server side quits suddenly)
        sys.exit()

    # s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # # print(socket.gethostbyname(socket.gethostname()))
    # s.connect(address)
    # return s

def send_info_to_unity(s, args):
    msg = '%.4f ' * len(args) % args

    try:
        s.send(bytes(msg, "utf-8"))
    except socket.error as e:
        print("error while sending :: " + str(e))

        # quit the script if connection fails (e.g. Unity server side quits suddenly)
        sys.exit()

def print_debug_msg(args):
    msg = '%.4f ' * len(args) % args
    print(msg)

def main():

    print("start")
    # use internal webcam/ USB camera (index taken from --cam)
    cap = cv2.VideoCapture(args.cam)
    print("cap include success")

    # IP cam (android only), with the app "IP Webcam"
    # url = 'http://192.168.0.102:4747/video'
    # url = 'https://192.168.0.102:8080/video'
    # cap = cv2.VideoCapture(url)

    # Facemesh
    detector = FaceMeshDetector()
    print("detector include success")

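    # get a sample frame for pose estimation img
    success, img = cap.read()
    if not success:
        # without a first frame, img is None and img.shape below would fail
        sys.exit("Could not read a frame from the camera.")
    print("cap read success")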

    # Pose estimation related
    pose_estimator = PoseEstimator((img.shape[0], img.shape[1]))
    print("pose_estimator include success")
    image_points = np.zeros((pose_estimator.model_points_full.shape[0], 2))
    print("image_points include success")

    # extra 10 points due to new attention model (in iris detection)
    iris_image_points = np.zeros((10, 2))
    print("iris_image_points include success")

    # Introduce scalar stabilizers for pose.
    pose_stabilizers = [Stabilizer(
        state_num=2,
        measure_num=1,
        cov_process=0.1,
        cov_measure=0.1) for _ in range(6)]
    print("pose_stabilizers include success")

    # for eyes
    eyes_stabilizers = [Stabilizer(
        state_num=2,
        measure_num=1,
        cov_process=0.1,
        cov_measure=0.1) for _ in range(6)]
    print("eyes_stabilizers include success")

    # for mouth_dist
    mouth_dist_stabilizer = Stabilizer(
        state_num=2,
        measure_num=1,
        cov_process=0.1,
        cov_measure=0.1
    )
    print("mouth_dist_stabilizer include success")


    # Initialize TCP connection
    if args.connect:
        socket = init_TCP()
        print("socket init success")

    while cap.isOpened():
        success, img = cap.read()

        if not success:
            print("Ignoring empty camera frame.")
            continue

        # Pose estimation by 3 steps:
        # 1. detect face;
        # 2. detect landmarks;
        # 3. estimate pose

        # first two steps
        img_facemesh, faces = detector.findFaceMesh(img)

        # flip the input image so that it matches the facemesh stuff
        img = cv2.flip(img, 1)

        # if there is any face detected
        if faces:
            # only get the first face
            for i in range(len(image_points)):
                image_points[i, 0] = faces[0][i][0]
                image_points[i, 1] = faces[0][i][1]
                
            # for refined landmarks around iris
            for j in range(len(iris_image_points)):
                iris_image_points[j, 0] = faces[0][j + 468][0]
                iris_image_points[j, 1] = faces[0][j + 468][1]

            # The third step: pose estimation
            # pose: [[rvec], [tvec]]
            pose = pose_estimator.solve_pose_by_all_points(image_points)

            x_ratio_left, y_ratio_left = FacialFeatures.detect_iris(image_points, iris_image_points, Eyes.LEFT)
            x_ratio_right, y_ratio_right = FacialFeatures.detect_iris(image_points, iris_image_points, Eyes.RIGHT)


            ear_left = FacialFeatures.eye_aspect_ratio(image_points, Eyes.LEFT)
            ear_right = FacialFeatures.eye_aspect_ratio(image_points, Eyes.RIGHT)

            pose_eye = [ear_left, ear_right, x_ratio_left, y_ratio_left, x_ratio_right, y_ratio_right]

            mar = FacialFeatures.mouth_aspect_ratio(image_points)
            mouth_distance = FacialFeatures.mouth_distance(image_points)

            # print("left eye: %.2f, %.2f" % (x_ratio_left, y_ratio_left))
            # print("right eye: %.2f, %.2f" % (x_ratio_right, y_ratio_right))

            # print("rvec (y) = (%f): " % (pose[0][1]))
            # print("rvec (x, y, z) = (%f, %f, %f): " % (pose[0][0], pose[0][1], pose[0][2]))
            # print("tvec (x, y, z) = (%f, %f, %f): " % (pose[1][0], pose[1][1], pose[1][2]))

            # Stabilize the pose.
            steady_pose = []
            pose_np = np.array(pose).flatten()

            for value, ps_stb in zip(pose_np, pose_stabilizers):
                ps_stb.update([value])
                steady_pose.append(ps_stb.state[0])

            steady_pose = np.reshape(steady_pose, (-1, 3))

            # stabilize the eyes value
            steady_pose_eye = []
            for value, ps_stb in zip(pose_eye, eyes_stabilizers):
                ps_stb.update([value])
                steady_pose_eye.append(ps_stb.state[0])

            mouth_dist_stabilizer.update([mouth_distance])
            steady_mouth_dist = mouth_dist_stabilizer.state[0]

            # uncomment the rvec line to check the raw values
            # print("rvec steady (x, y, z) = (%f, %f, %f): " % (steady_pose[0][0], steady_pose[0][1], steady_pose[0][2]))
            # print("tvec steady (x, y, z) = (%f, %f, %f): " % (steady_pose[1][0], steady_pose[1][1], steady_pose[1][2]))

            # calculate the roll/ pitch/ yaw
            # roll: +ve when the axis pointing upward
            # pitch: +ve when we look upward
            # yaw: +ve when we look left
            roll = np.clip(np.degrees(steady_pose[0][1]), -90, 90)
            pitch = np.clip(-(180 + np.degrees(steady_pose[0][0])), -90, 90)
            yaw =  np.clip(np.degrees(steady_pose[0][2]), -90, 90)

            # print("Roll: %.2f, Pitch: %.2f, Yaw: %.2f" % (roll, pitch, yaw))
            # print("left eye: %.2f, %.2f; right eye %.2f, %.2f"
            #     % (steady_pose_eye[0], steady_pose_eye[1], steady_pose_eye[2], steady_pose_eye[3]))
            # print("EAR_LEFT: %.2f; EAR_RIGHT: %.2f" % (ear_left, ear_right))
            # print("MAR: %.2f; Mouth Distance: %.2f" % (mar, steady_mouth_dist))

            # send info to unity
            if args.connect:

                # for sending to live2d model (Hiyori)
                send_info_to_unity(socket,
                    (roll, pitch, yaw,
                    ear_left, ear_right, x_ratio_left, y_ratio_left, x_ratio_right, y_ratio_right,
                    mar, mouth_distance)
                )

            # print the sent values in the terminal
            if args.debug:
                print_debug_msg((roll, pitch, yaw,
                        ear_left, ear_right, x_ratio_left, y_ratio_left, x_ratio_right, y_ratio_right,
                        mar, mouth_distance))


            # pose_estimator.draw_annotation_box(img, pose[0], pose[1], color=(255, 128, 128))

            # pose_estimator.draw_axis(img, pose[0], pose[1])

            pose_estimator.draw_axes(img_facemesh, steady_pose[0], steady_pose[1])

        else:
            # reset our pose estimator
            pose_estimator = PoseEstimator((img_facemesh.shape[0], img_facemesh.shape[1]))

        cv2.imshow('Facial landmark', img_facemesh)
   
        # press "q" to leave
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    cap.release()
    cv2.destroyAllWindows()


if __name__ == "__main__":

    parser = ArgumentParser()

    parser.add_argument("--connect", action="store_true",
                        help="connect to unity character",
                        default=False)

    parser.add_argument("--port", type=int, 
                        help="specify the port of the connection to unity. Have to be the same as in Unity", 
                        default=5066)

    parser.add_argument("--cam", type=int,
                        help="specify the camera number if you have multiple cameras",
                        default=1)

    parser.add_argument("--debug", action="store_true",
                        help="showing raw values of detection in the terminal",
                        default=False)

    args = parser.parse_args()

    # demo code
    main()

What should I do to make it faster?
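
Rather than reading timestamps off print statements, each start-up step can be timed directly. The context manager below is a standard-library sketch; timed and the label names are hypothetical, and the sleep stands in for the real calls such as cv2.VideoCapture(...) or FaceMeshDetector().

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label, results):
    """Record how long the wrapped block takes, in seconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start

timings = {}
with timed("open camera", timings):
    time.sleep(0.05)   # placeholder for: cap = cv2.VideoCapture(args.cam)
with timed("create detector", timings):
    time.sleep(0.01)   # placeholder for: detector = FaceMeshDetector()

for label, seconds in timings.items():
    print("%-16s %.3f s" % (label, seconds))
```

If opening the camera turns out to be the slow step on Windows, a commonly reported workaround is to request the DirectShow backend with cv2.VideoCapture(index, cv2.CAP_DSHOW). Whether that helps on this particular setup is an assumption to test.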

Live2D chitoge problem

One eye isn't closing, the other isn't opening, and the mouth won't open either.
The only thing that works is moving the head. I tried changing some settings and managed to open both eyes, but now they won't blink.

Or maybe Chitoge's model is different from Hiyori's model? I know they both come from the Live2D sample data, so why isn't Chitoge's face working?
Hiyori's model works fine, but the eye blinking is somewhat glitchy if I'm not close to the camera.

main.py isn't opening any window

I'm unable to get Python and Unity connected together.
I'm getting this issue:

Apart from this, whenever I run my main.py file, it doesn't open any window. Facial landmark detection works properly on its own.

Can you please help me out?
Also, the Unity window shows no animation.

Motion of hair components

Hello, thanks for your work; it's very interesting. I liked it, so I tried the Live2D part. But when I follow the steps and shake my head, the character's hair doesn't move slightly as shown. How should I set it up? Thank you!

import error, how to resolve?

How to track eye pose?

Hi, I found that the eye pose doesn't animate the model. Is there any way to support eye pose correctly?

Can I run it on Ubuntu 16.04?

Error, Can you help me?

NullReferenceException: Object reference not set to an instance of an object
UnityChanController.Start () (at Assets/Scripts/UnityChanController.cs:66)

The code is:

    GameObject.FindWithTag("GameController").GetComponent<UISystem>().LoadData();
    GameObject.FindWithTag("GameController").GetComponent<UISystem>().InitUI();

[Tutorial] Create a 9:16 game aspect ratio in Unity so that the UI displays correctly

Thank you for your interest in this project. I did not expect that much attention on YouTube when I uploaded it last year.

Background

I changed the game's aspect ratio from landscape 4:3 to portrait 9:16 in order to show the full body of the model. This ratio is common in VTuber capture software on mobile, and it lets streamers slowly reveal new outfits from toe to head.

I also tuned the UI panels for this aspect ratio. Unity does not offer it by default, but it takes only a few steps to add it and get a properly laid-out UI.

Steps

Here I am using the UnityChan project as a demo. The same procedure applies to the Live2D Unity project.

After importing a package, your UI should look like this, with widgets and buttons displaced from their intended positions.

Click the aspect ratio area. A drop-down menu should be shown.

Click the "+" button at the bottom of the drop-down menu.

A menu is shown for you to input the custom 9:16 aspect ratio. Follow the picture below to fill it in, then press "OK".

Select the newly added aspect ratio, and the UI components return to their correct positions.

Add-ons

If you are unhappy with the UI, want to adjust the height of each column for a landscape 4:3 display, or have any other reason, the height of each column can be adjusted on each panel, namely "Setting Panel" (child of "Character Setting Canvas") and "TCP Panel". Each has a script named "AdjustGridLayoutCellSize" attached, with a parameter called "Cell Ratio". Adjust this value to increase or decrease the height of the columns.

(Hint: the meaning is height = width of the panel (Cell size X of Grid Layout Group) * "Cell Ratio")
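
As a quick worked example of that relation, the snippet below evaluates it for a couple of values. The function name and the numbers are hypothetical and only illustrate the arithmetic; the actual script is a Unity C# component.

```python
def cell_height(panel_width, cell_ratio):
    """height = width of the panel (Cell size X) * Cell Ratio"""
    return panel_width * cell_ratio

# A 400-unit-wide panel with two different Cell Ratio settings.
print(cell_height(400, 0.25))
print(cell_height(400, 0.5))
```

Doubling "Cell Ratio" doubles the row height for the same panel width.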

Thank you for your interest in this project.

I hope you can enjoy this project.

mmmmmm44