
mediapipe-samples's People

Contributors

chanwooleeme, dakyz, dependabot[bot], duy-maimanh, ewwwgiddings, haruiz, hheydary, jenperson, joezoug, khanhlvg, kinarr, ktonthat, kuaashish, linchenn, markmcd, mohammad3id, morganchen12, neilblaze, nutsiepully, paultr, priankakariatyml, sator-imaging, satoren, schmidt-sebastian, st-tuanmai, thatfiredev, unixxxx, vis-wa, woodyhoko, yuedev

mediapipe-samples's Issues

Mediapipe Pose livestream segmentation causes Python to quit

I am trying to run MediaPipe Pose in live-stream mode:

import cv2
import mediapipe as mp

BaseOptions = mp.tasks.BaseOptions
PoseLandmarker = mp.tasks.vision.PoseLandmarker
PoseLandmarkerOptions = mp.tasks.vision.PoseLandmarkerOptions
PoseLandmarkerResult = mp.tasks.vision.PoseLandmarkerResult
VisionRunningMode = mp.tasks.vision.RunningMode

options = PoseLandmarkerOptions(
    base_options=BaseOptions(model_asset_path='pose_landmarker_full.task'),
    running_mode=VisionRunningMode.LIVE_STREAM,
    output_segmentation_masks=True,
    result_callback=print_result)  # print_result is defined below

with PoseLandmarker.create_from_options(options) as pose:
    cap = cv2.VideoCapture(0)
    i = 0
    while True:
        success, image = cap.read()
        if not success:
            break
        img_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=img_rgb)
        pose.detect_async(mp_image, i)  # frame index used as the ms timestamp
        i += 1

However, if I try to output the segmentation mask, the Python interpreter quits without any error message.
I have narrowed it down to this line:

def print_result(result: PoseLandmarkerResult, output_image: mp.Image, timestamp_ms: int):
    alpha = result.segmentation_masks[0].numpy_view()

If that line is just alpha = result.segmentation_masks, then MediaPipe runs and alpha is a list whose elements are mediapipe.python._framework_bindings.image.Image objects, but as soon as I ask for the pixel data with .numpy_view(), Python quits without any message.
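
A minimal way to narrow this down further (a sketch, not a confirmed fix) is to run the same model in IMAGE mode on a single captured frame and call numpy_view() there; if that works, the problem is specific to the LIVE_STREAM callback path rather than to the mask output itself. The option and class names below match the Tasks API used above; the single-frame capture is only for illustration.

import cv2
import mediapipe as mp

BaseOptions = mp.tasks.BaseOptions
PoseLandmarker = mp.tasks.vision.PoseLandmarker
PoseLandmarkerOptions = mp.tasks.vision.PoseLandmarkerOptions
VisionRunningMode = mp.tasks.vision.RunningMode

# IMAGE-mode variant of the options above, with segmentation masks still enabled.
image_options = PoseLandmarkerOptions(
    base_options=BaseOptions(model_asset_path='pose_landmarker_full.task'),
    running_mode=VisionRunningMode.IMAGE,
    output_segmentation_masks=True)

with PoseLandmarker.create_from_options(image_options) as landmarker:
    cap = cv2.VideoCapture(0)
    ok, frame = cap.read()
    cap.release()
    if ok:
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        result = landmarker.detect(mp.Image(image_format=mp.ImageFormat.SRGB, data=rgb))
        if result.segmentation_masks:
            mask = result.segmentation_masks[0].numpy_view()  # does this succeed outside the callback?
            print(mask.shape, mask.dtype)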

Low framerate in new segmentation solution

The new segmentation solution for the webcam has much lower quality than the legacy selfie segmentation; at the very least, the framerate is noticeably lower. The old solution ran very smoothly. Is there anything that can be tuned to get comparable results?

Using NNAPI with MediaPipe

I want to use the Android NNAPI for hand landmark detection.
How can I do this?
Thank you.

Is there any difference between this framework and https://github.com/google/mediapipe on the Android platform?

Is there any difference between this framework and https://github.com/google/mediapipe on the Android platform? Or are they compatible?

I found that the Maven dependencies are different:
com.google.mediapipe:solution-core:latest.release
implementation 'com.google.mediapipe:solution-core:latest.release'

com.google.mediapipe:tasks-vision:0.1.0-alpha-5
implementation 'com.google.mediapipe:tasks-vision:0.1.0-alpha-5'

Can one be migrated to the other? And are the .tflite model files they use compatible?

FLAME indices correspondence

Hello.

I am using your FaceLandmarker found here: https://github.com/googlesamples/mediapipe/blob/main/examples/face_landmarker/python/%5BMediaPipe_Python_Tasks%5D_Face_Landmarker.ipynb

Currently I am trying to use the dense landmarks while training a FLAME-based 3D model. Is there a correspondence between the dense landmarks you produce (478 points) and the FLAME vertices? This repo (https://github.com/Zielon/metrical-tracker/tree/master/flame/mediapipe) seems to have found a correspondence for 105 points, but not for the entire 478 points.

GPU accelerated Whisper inference in Mediapipe?

Hi there,
I really like the idea of low-code ML dev tools.
Especially the GPU-accelerated inference on Android devices!

@st-duymai & @PaulTR:

  1. Is there a Whisper (audio-to-text) demo in scope, with streamed or recorded audio transcribed to text in real time? I believe it could be done in real time with GPU support, given that there are binaries that run close to real time on CPU.
  2. If not, could one use .tflite models (like this one) to achieve the above with your framework?

Thank you in advance for your time and reply!

Optimized MediaPipe HandGestureRecognizer (JS) example

I came across the CodePen link of the MediaPipe HandGestureRecognizer (JS) example from here, and I'm curious whether the following changes (or rather updates) can be applied to it to optimize it and improve code readability.


  1. Arrow Function has been implemented to increase conciseness and readability while adhering to ES6 standards.
async function runDemo() {
  const vision = await FilesetResolver.forVisionTasks(
    "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@latest/wasm"
  );
  gestureRecognizer = await GestureRecognizer.createFromOptions(vision, {
    baseOptions: {
      modelAssetPath:
        "https://storage.googleapis.com/mediapipe-tasks/gesture_recognizer/gesture_recognizer.task"
    },
    runningMode: runningMode
  });
  demosSection.classList.remove("invisible");
}
runDemo();

⬇️ (Updated Code)

const runDemo = async () => {
  const vision = await FilesetResolver.forVisionTasks(
    "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@latest/wasm"
  );
  gestureRecognizer = await GestureRecognizer.createFromOptions(vision, {
    baseOptions: {
      modelAssetPath:
        "https://storage.googleapis.com/mediapipe-tasks/gesture_recognizer/gesture_recognizer.task"
    },
    runningMode: runningMode
  });
  demosSection.classList.remove("invisible");
};
runDemo();

  2. Avoid the unnecessary assignment of the runningMode variable, which reduces the risk of introducing bugs (if it is reused). We pass the string directly to the setOptions() method instead of assigning "IMAGE" to the runningMode variable, which simplifies the code and makes it more concise.
  if (runningMode === "VIDEO") {
    runningMode = "IMAGE";
    await gestureRecognizer.setOptions({ runningMode: runningMode });
  }

⬇️ (Updated Code)

  if (runningMode === "VIDEO") {
    await gestureRecognizer.setOptions({ runningMode: "IMAGE" });
  }

  3. Use let instead of var in the loop (adhering to ES6 standards), making it more concise and easier to read.
  const allCanvas = event.target.parentNode.getElementsByClassName("canvas");
  for (var i = allCanvas.length - 1; i >= 0; i--) {
    const n = allCanvas[i];
    n.parentNode.removeChild(n);
  }

⬇️ (Updated Code)

  const allCanvas = event.target.parentNode.getElementsByClassName("canvas");
  for (let i = allCanvas.length - 1; i >= 0; i--) {
    allCanvas[i].parentNode.removeChild(allCanvas[i]);
  }

  4. Leveraging the optional chaining operator in an arrow function makes it more concise and readable and avoids the need for additional checks.
function hasGetUserMedia() {
  return !!(navigator.mediaDevices && navigator.mediaDevices.getUserMedia);
}

⬇️ (Updated Code)

const hasGetUserMedia = () => !!(navigator.mediaDevices?.getUserMedia);

  5. Optimizing using a ternary operator, which avoids unnecessary repetition of code.
  if (webcamRunning === true) {
    webcamRunning = false;
    enableWebcamButton.innerText = "ENABLE PREDICTIONS";
  } else {
    webcamRunning = true;
    enableWebcamButton.innerText = "DISABLE PREDICITONS";
  }

⬇️ (Updated Code)

  webcamRunning = !webcamRunning;
  enableWebcamButton.innerText = webcamRunning ? "DISABLE PREDICTIONS" : "ENABLE PREDICTIONS";

  6. Optimized using an arrow function and optional chaining, reducing verbose code. Moreover, instead of comparing a boolean to true with === true (which is unnecessary), the boolean itself can be used as the condition.
async function predictWebcam() {
  const webcamElement = document.getElementById("webcam");
  // Now let's start detecting the stream.
  if (runningMode === "IMAGE") {
    runningMode = "VIDEO";
    await gestureRecognizer.setOptions({ runningMode: runningMode });
  }
  let nowInMs = Date.now();
  const results = gestureRecognizer.recognizeForVideo(video, nowInMs);

  canvasCtx.save();
  canvasCtx.clearRect(0, 0, canvasElement.width, canvasElement.height);

  canvasElement.style.height = videoHeight;
  webcamElement.style.height = videoHeight;
  canvasElement.style.width = videoWidth;
  webcamElement.style.width = videoWidth;
  if (results.landmarks) {
    for (const landmarks of results.landmarks) {
      drawConnectors(canvasCtx, landmarks, HAND_CONNECTIONS, {
        color: "#00FF00",
        lineWidth: 5
      });
      drawLandmarks(canvasCtx, landmarks, { color: "#FF0000", lineWidth: 2 });
    }
  }
  canvasCtx.restore();
  if (results.gestures.length > 0) {
    gestureOutput.style.display = "block";
    gestureOutput.style.width = videoWidth;
    gestureOutput.innerText =
      "GestureRecognizer: " +
      results.gestures[0][0].categoryName +
      "\n Confidence: " +
      Math.round(parseFloat(results.gestures[0][0].score) * 100) +
      "%";
  } else {
    gestureOutput.style.display = "none";
  }
  // Call this function again to keep predicting when the browser is ready.
  if (webcamRunning === true) {
    window.requestAnimationFrame(predictWebcam);
  }
}

⬇️ (Updated Code)

const predictWebcam = async () => {
  const webcamElement = document.getElementById("webcam");
  // Now let's start detecting the stream.
  if (runningMode === "IMAGE") {
    runningMode = "VIDEO";
    await gestureRecognizer.setOptions({ runningMode: runningMode });
  }
  let nowInMs = Date.now();
  const results = gestureRecognizer.recognizeForVideo(video, nowInMs);

  canvasCtx.save();
  canvasCtx.clearRect(0, 0, canvasElement.width, canvasElement.height);

  canvasElement.style.height = videoHeight;
  webcamElement.style.height = videoHeight;
  canvasElement.style.width = videoWidth;
  webcamElement.style.width = videoWidth;
  results.landmarks?.forEach((landmarks) => {
    drawConnectors(canvasCtx, landmarks, HAND_CONNECTIONS, {
      color: "#00FF00",
      lineWidth: 5
    });
    drawLandmarks(canvasCtx, landmarks, { color: "#FF0000", lineWidth: 2 });
  });

  canvasCtx.restore();
  if (results.gestures.length > 0) {
    gestureOutput.style.display = "block";
    gestureOutput.style.width = videoWidth;
    const categoryName = results.gestures[0][0].categoryName;
    const score = Math.round(parseFloat(results.gestures[0][0].score) * 100);
    gestureOutput.innerText = `GestureRecognizer: ${categoryName}\n Confidence: ${score}%`;
  } else {
    gestureOutput.style.display = "none";
  }
  // Call this function again to keep predicting when the browser is ready.
  if (webcamRunning) {
    window.requestAnimationFrame(predictWebcam);
  }
}

💡 The updated CodePen example can be accessed here.
cc: @jenperson

OS : Ubuntu (22.04 LTS, x64)   /   Windows 10 Pro (x64)
Browser :

  • Google Chrome — Version 111.0.5563.147 (Official Build) (64-bit)
  • Microsoft Edge [Version 112.0.1722.39 (Official build) (64-bit)]

Fail to initialize gesture recognizer

Starting the gesture recognizer app on an arm64-v8a device works fine, but starting it on an armeabi-v7a device (same Android version, 11) runs into the following problems:

E/tflite: The supplied buffer is not 4-bytes aligned
E/tflite: The model allocation is null/empty
E/native: E20221114 20:29:51.589087  3371 graph.cc:472] Could not build model from the provided pre-loaded flatbuffer: The model allocation is null/empty
W/System.err: com.google.mediapipe.framework.MediaPipeException: unknown: Could not build model from the provided pre-loaded flatbuffer: The model allocation is null/empty
W/System.err:     at com.google.mediapipe.framework.Graph.nativeStartRunningGraph(Native Method)
W/System.err:     at com.google.mediapipe.framework.Graph.startRunningGraph(Graph.java:336)
W/System.err:     at com.google.mediapipe.tasks.core.TaskRunner.create(TaskRunner.java:71)
W/System.err:     at com.google.mediapipe.tasks.vision.gesturerecognizer.GestureRecognizer.createFromOptions(GestureRecognizer.java:194)
W/System.err:     at com.google.mediapipe.examples.gesturerecognizer.GestureRecognizerHelper.setupGestureRecognizer(GestureRecognizerHelper.kt:95)
W/System.err:     at com.google.mediapipe.examples.gesturerecognizer.GestureRecognizerHelper.<init>(GestureRecognizerHelper.kt:50)
W/System.err:     at com.google.mediapipe.examples.gesturerecognizer.fragment.CameraFragment.onViewCreated$lambda-4(CameraFragment.kt:147)
W/System.err:     at com.google.mediapipe.examples.gesturerecognizer.fragment.CameraFragment.$r8$lambda$xiZI6LDAjMBw-J7vyrjSe_CLWo0(Unknown Source:0)
W/System.err:     at com.google.mediapipe.examples.gesturerecognizer.fragment.CameraFragment$$ExternalSyntheticLambda13.run(Unknown Source:2)
W/System.err:     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
W/System.err:     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
W/System.err:     at java.lang.Thread.run(Thread.java:923)
E/GestureRecognizerHelper 56784282: MP Task Vision failed to load the task with error: unknown: Could not build model from the provided pre-loaded flatbuffer: The model allocation is null/empty

I guess the error depends on the device architecture. What can you advise in this situation?

audio_classification_live_stream - Python

Hi,

I got the following error:

Traceback (most recent call last):  
 File "...\mediapipe\examples\audio_classifier\python\audio_classification_live_stream\classify.py", line 134, in <module>
    main()  
 File "...\mediapipe\examples\audio_classifier\python\audio_classification_live_stream\classify.py", line 129, in main
    run(args.model, int(args.maxResults), float(args.scoreThreshold),  
File "...\mediapipe\examples\audio_classifier\python\audio_classification_live_stream\classify.py", line 57, in run
    classifier = audio.AudioClassifier.create_from_options(options)
 File "...\mediapipe\examples\audio_classifier\python\audio_classification_live_stream\venv\lib\site-packages\mediapipe\tasks\python\audio\audio_classifier.py", line 204, in create_from_options
    return cls(  
 File "...\mediapipe\examples\audio_classifier\python\audio_classification_live_stream\venv\lib\site-packages\mediapipe\tasks\python\audio\core\base_audio_task_api.py", line 64, in __init__
    self._runner = _TaskRunner.create(graph_config, packet_callback)

RuntimeError: ValidatedGraphConfig Initialization failed.
No registered object with name: mediapipe::tasks::audio::audio_classifier::AudioClassifierGraph; Unable to find Calculator "mediapipe.tasks.audio.audio_classifier.AudioClassifierGraph"

Process finished with exit code 1

I use PyCharm on Windows 10.
I tried Python 3.9 and 3.10.
I tried the provided code, and I also tried downloaded models:

base_options = mp.tasks.BaseOptions(model_asset_path=model)
base_options = mp.tasks.BaseOptions(model_asset_path='lite-model_yamnet_classification_tflite_1.tflite')
base_options = mp.tasks.BaseOptions(model_asset_path='yamnet_audio_classifier_with_metadata.tflite')
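
One cheap first check (an assumption about a common cause, not a confirmed diagnosis): "No registered object ... AudioClassifierGraph" errors are often associated with an outdated mediapipe wheel that predates the audio Tasks API, so it is worth confirming which version the PyCharm venv actually resolved.

# Hedged diagnostic: print the mediapipe version the interpreter/venv is using.
# The audio Tasks API generally requires a reasonably recent release; if the
# version looks old, upgrading the wheel (pip install --upgrade mediapipe) is
# a sensible next step. This is an assumption to verify, not a guaranteed fix.
import mediapipe as mp
print(mp.__version__)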

kotlin.UninitializedPropertyAccessException: lateinit property gestureRecognizerHelper has not been initialized

Hi: using Android Studio Electric Eel and the defaults, but the package com.google.mediapipe.examples.gesturerecognizer.fragment throws

E/AndroidRuntime: FATAL EXCEPTION: main
Process: com.google.mediapipe.examples.gesturerecognizer, PID: 6318
kotlin.UninitializedPropertyAccessException: lateinit property gestureRecognizerHelper has not been initialized
at com.google.mediapipe.examples.gesturerecognizer.fragment.CameraFragment$initBottomSheetControls$7.onItemSelected(CameraFragment.kt:235)
at android.widget.AdapterView.fireOnSelected(AdapterView.java:957)
at android.widget.AdapterView.dispatchOnItemSelected(AdapterView.java:946)
at android.widget.AdapterView.-$$Nest$mdispatchOnItemSelected(Unknown Source:0)
at android.widget.AdapterView$SelectionNotifier.run(AdapterView.java:910)
at android.os.Handler.handleCallback(Handler.java:942)
at android.os.Handler.dispatchMessage(Handler.java:99)
at android.os.Looper.loopOnce(Looper.java:201)
at android.os.Looper.loop(Looper.java:288)
at android.app.ActivityThread.main(ActivityThread.java:7872)
at java.lang.reflect.Method.invoke(Native Method)
at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:548)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:936)
I/tflite: Initialized TensorFlow Lite runtime.
W/libc: Access denied finding property "ro.mediatek.platform"
W/libc: Access denied finding property "ro.chipname"
W/libc: Access denied finding property "ro.hardware.chipname"
I/tflite: Created TensorFlow Lite XNNPACK delegate for CPU.
I/CameraManagerGlobal: Connecting to camera service
D/CameraRepository: Added camera: 0
I/Process: Sending signal. PID: 6318 SIG: 9

on a Pixel 7.

Presence and visibility for individual landmarks

The PoseLandmarkerResult only returns x, y, and z for the normalized and world landmarks. Is there any way to get the presence and visibility scores? According to the guide, we should have access to these attributes. Is the source code available anywhere?

Thanks a lot for the new solution - any help would be much appreciated!
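
For reference, recent Python releases of the Tasks API appear to expose visibility and presence fields on the NormalizedLandmark container; whether they are actually populated may depend on the model and version, so the sketch below is an assumption to verify rather than documented behavior.

# Hedged sketch: print per-landmark visibility/presence, assuming the installed
# mediapipe version exposes these fields on its landmark containers.
def dump_landmark_scores(result):  # result: a PoseLandmarkerResult
    for person in result.pose_landmarks:
        for idx, lm in enumerate(person):
            # visibility/presence may be None if the model/version does not fill them in
            print(idx, lm.x, lm.y, lm.z,
                  getattr(lm, 'visibility', None), getattr(lm, 'presence', None))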

some problems with the rear camera

When I switch the phone camera to the back camera (CameraSelector.LENS_FACING_BACK) and it recognizes a hand, the nodes and line segments refreshed in real time do not land accurately on the hand but are slightly offset. This does not happen after switching to the front camera. How can this problem be handled or optimized?

Audio embedding model for MediaPipe Tasks

I tried searching for a pre-trained audio embedding model for the MediaPipe Tasks API but was unable to find one. Does a model exist for this job?

I have raised PR #66 to fill in the missing AudioEmbedder example and am waiting for the .tflite model.

How to run on darwin/iOS?

Hey all 👋

The previous version of MediaPipe had first-class C++ support, with the ability to run on iOS and documentation for it.

Is this dropped from solutions like FaceMesh V2? I'm only seeing Android, Python and Web guides.

How to run python example on GPU?

I followed the Python example in the Colab notebook (installed mediapipe via pip), but the GPU did not seem to be used. Is there a specific mediapipe version or installation option needed to use the GPU?
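
For what it's worth, recent mediapipe pip releases appear to accept a delegate setting on BaseOptions; whether GPU inference is actually available depends on the platform and the installed build, so treat the sketch below as an option to try rather than a guarantee (the model path is a placeholder).

from mediapipe.tasks import python
from mediapipe.tasks.python import vision

# Hedged sketch: request the GPU delegate if the installed release supports it.
# BaseOptions.Delegate is assumed to exist in recent versions; fall back to the
# default CPU delegate if it does not, or if the platform has no GPU path.
options = vision.PoseLandmarkerOptions(
    base_options=python.BaseOptions(
        model_asset_path='pose_landmarker_full.task',   # placeholder model path
        delegate=python.BaseOptions.Delegate.GPU),
    running_mode=vision.RunningMode.IMAGE)
landmarker = vision.PoseLandmarker.create_from_options(options)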

Hand tracking is slower than the old Mediapipe for Android

In the old MediaPipe sample app my test phone could track hands in real time; in this version it lags behind. What could be the problem? The first thought that comes to mind is that the CameraX analysis use case is slower. The old MediaPipe sample used an older CameraX version and handled the SurfaceView manually, if my memory serves me right. Is there any way I can work around this and get the same speed in this version? Forgive me if I am asking anything obvious, as I am still rather new to Android development.

How to run two models at the same time on android?

Hello, I really need help.
I need to collect the coordinates of the points of both hands and the face from a single frame, so I have combined the code for detecting face points and hand points.
Now I have two files in my project, FaceLandmarkerHelper and HandLandmarkerHelper. Nothing has changed in them except the names of the inherited LandmarkerListener functions onError, onResults, onEmpty, which are now onErrorFace/onErrorHand, onResultsFace/onResultsHand, onEmptyFace/onEmptyHand.
In the CameraFragment, I similarly split the variables into hand and face versions: imageAnalyzer and backgroundExecutor. I also added handLandmarkerHelper and faceLandmarkerHelper variables. The class itself implements HandLandmarkerHelper.LandmarkerListener and FaceLandmarkerHelper.LandmarkerListener.
All of the split variables have duplicated code.

Declaring variables:

private lateinit var handLandmarkerHelper: HandLandmarkerHelper
private lateinit var faceLandmarkerHelper: FaceLandmarkerHelper

private val viewModel: MainViewModel by activityViewModels()
private var preview: Preview? = null 

private var handImageAnalyzer: ImageAnalysis? = null
private var faceImageAnalyzer: ImageAnalysis? = null

private var camera: Camera? = null 
private var cameraProvider: ProcessCameraProvider? = null 
private var cameraFacing = CameraSelector.LENS_FACING_FRONT

private lateinit var handBackgroundExecutor: ExecutorService
private lateinit var faceBackgroundExecutor: ExecutorService 

In the function onViewCreated:

    super.onViewCreated(view, savedInstanceState)

    // Initialize our background executor
    handBackgroundExecutor = Executors.newSingleThreadExecutor()
    faceBackgroundExecutor = Executors.newSingleThreadExecutor()

    // Wait for the views to be properly laid out
    fragmentCameraBinding.viewFinder.post {
        // Set up the camera and its use cases
        setUpCamera() // the setUpCamera method of the current class
    }

    // Create the HandLandmarkerHelper that will handle the inference
    handBackgroundExecutor.execute {
        handLandmarkerHelper = HandLandmarkerHelper(
            context = requireContext(),
            runningMode = RunningMode.LIVE_STREAM,
            minHandDetectionConfidence = viewModel.currentMinHandDetectionConfidence,
            minHandTrackingConfidence = viewModel.currentMinHandTrackingConfidence,
            minHandPresenceConfidence = viewModel.currentMinHandPresenceConfidence,
            maxNumHands = viewModel.currentMaxHands,
            currentDelegate = viewModel.currentDelegate,
            handLandmarkerHelperListener = this
        )
    }
    faceBackgroundExecutor.execute {
        faceLandmarkerHelper = FaceLandmarkerHelper(
            context = requireContext(),
            runningMode = RunningMode.LIVE_STREAM,
            minFaceDetectionConfidence = viewModel.currentMinFaceDetectionConfidence,
            minFaceTrackingConfidence = viewModel.currentMinFaceTrackingConfidence,
            minFacePresenceConfidence = viewModel.currentMinFacePresenceConfidence,
            maxNumFaces = viewModel.currentMaxFaces,
            currentDelegate = viewModel.currentDelegate,
            faceLandmarkerHelperListener = this
        )
    }

Part of the function bindCameraUseCases:

    // ImageAnalysis. Using RGBA 8888 to match how our models work
    handImageAnalyzer =
        ImageAnalysis.Builder().setTargetAspectRatio(AspectRatio.RATIO_4_3)
            .setTargetRotation(fragmentCameraBinding.viewFinder.display.rotation)
            .setBackpressureStrategy(ImageAnalysis.STRATEGY_KEEP_ONLY_LATEST)
            .setOutputImageFormat(ImageAnalysis.OUTPUT_IMAGE_FORMAT_RGBA_8888)
            .build()
            // The analyzer can then be assigned to the instance
            .also {
                it.setAnalyzer(handBackgroundExecutor) { image ->
                    detectHand(image) 
                }
            }

    faceImageAnalyzer =
        ImageAnalysis.Builder().setTargetAspectRatio(AspectRatio.RATIO_4_3)
            .setTargetRotation(fragmentCameraBinding.viewFinder.display.rotation)
            .setBackpressureStrategy(ImageAnalysis.STRATEGY_KEEP_ONLY_LATEST)
            .setOutputImageFormat(ImageAnalysis.OUTPUT_IMAGE_FORMAT_RGBA_8888)
            .build()
            // The analyzer can then be assigned to the instance
            .also {
                it.setAnalyzer(faceBackgroundExecutor) { image ->
                    detectFace(image)
                }
            }


    // Must unbind the use-cases before rebinding them
    cameraProvider.unbindAll()

Is it possible to pass two analyzers to bindToLifecycle? Continuation of the function bindCameraUseCases:

    try {
        // A variable number of use-cases can be passed here -
        // camera provides access to CameraControl & CameraInfo
        camera = cameraProvider.bindToLifecycle(
            this, cameraSelector, preview, handImageAnalyzer (or faceImageAnalyzer)
        )

        // Attach the viewfinder's surface provider to preview use case
        preview?.setSurfaceProvider(fragmentCameraBinding.viewFinder.surfaceProvider)
    } catch (exc: Exception) {
        Log.e(TAG, "Use case binding failed", exc)
    }

Or maybe somehow combine both analyzers in one?
imageAnalyzer = handImageAnalyzer + faceImageAnalyzer

If you create one imageAnalyzer and one backgroundExecutor, then in this code:

    imageAnalyzer =
        ImageAnalysis.Builder().setTargetAspectRatio(AspectRatio.RATIO_4_3)
            .setTargetRotation(fragmentCameraBinding.viewFinder.display.rotation)
            .setBackpressureStrategy(ImageAnalysis.STRATEGY_KEEP_ONLY_LATEST)
            .setOutputImageFormat(ImageAnalysis.OUTPUT_IMAGE_FORMAT_RGBA_8888)
            .build()
            // The analyzer can then be assigned to the instance
            .also {
                it.setAnalyzer(backgroundExecutor) { image ->
                    detectFace(image)
                }
                it.setAnalyzer(backgroundExecutor) { image ->
                    detectHand(image)
                }
            }

the latter setAnalyzer call will prevail (here, detectHand).

Issue with importing mediapipe library using ES6 modules in Workers

I am writing to report an issue I have encountered while importing the mediapipe library using ES6 modules in Workers. When attempting to import mediapipe, I consistently receive the following error: "TypeError: Failed to execute 'importScripts' on 'WorkerGlobalScope': Module scripts don't support importScripts()." This error occurs when I set the 'type' attribute to 'module' when connecting the Worker, as shown in the code snippet below:

new Worker(this.workerScriptUrl, { type: 'module' });

After investigating the issue, it has become apparent that the mediapipe library relies on importScripts to import WebAssembly (wasm) files. Unfortunately, due to the limitation of module scripts in Workers, importScripts is not supported, resulting in the aforementioned error. Consequently, I am unable to utilize the mediapipe library within Workers when using ES6 modules.

I kindly request your assistance in resolving this matter. I propose the implementation of an optional workaround to address this issue. Specifically, it would be greatly appreciated if you could introduce an exception that allows an alternative method, such as using fetch or any other suitable approach, to import the wasm files instead of relying on importScripts. This adjustment would facilitate the seamless integration of the mediapipe library with ES6 modules in Workers, enabling developers to effectively leverage its functionalities.

Thank you for your attention to this matter. I eagerly await your response and any guidance you can provide to help resolve this issue.

image segmentation canvas live-streaming mode seems not showing full Image

Hello, I was trying to modify the image segmentation example. In video live-stream mode, I try to draw points onto the image, but they do not appear centered. When I debug the image, it seems that the video within the canvas is cropped.
Code (screenshot omitted):
I call this code within setResults:
drawExpectedLocation(mask_with_center.nativeObjAddr)
and I convert mask_with_center into a bitmap and display it:

val mask_with_center = Mat(outputHeight, outputWidth, CvType.CV_8UC4, Scalar.all(0.0))
drawExpectedLocation(mask_with_center.nativeObjAddr)
val image = Bitmap.createBitmap(
    mask_dif_largest.cols(),
    mask_dif_largest.rows(),
    Bitmap.Config.ARGB_8888
)
Utils.matToBitmap(mask_with_center, image)
val scaleFactor = when (runningMode) {
    RunningMode.IMAGE,
    RunningMode.VIDEO -> {
        min(width * 1f / outputWidth, height * 1f / outputHeight)
    }
    RunningMode.LIVE_STREAM -> {
        // PreviewView is in FILL_START mode. So we need to scale up the
        // landmarks to match with the size that the captured images will be
        // displayed.
        max(width * 1f / outputWidth, height * 1f / outputHeight)
    }
}
val scaleWidth = (outputWidth * scaleFactor).toInt()
val scaleHeight = (outputHeight * scaleFactor).toInt()
scaleBitmap = Bitmap.createScaledBitmap(
    image, scaleWidth, scaleHeight, false
)

and I got a result like this (screenshot omitted):
the two circles are not centered... I think it's because the image within the canvas is cropped. Please help me.

All CodePen examples for `vision` are failing

Description

The following CodePen demos (examples) are failing because of dependency incompatibility for MediaPipe Tasks Vision package:

  1. Background Segmenter (web)
  2. MediaPipe GestureRecognizer (web)
  3. MediaPipe HandLandmarker (web)
  4. MediaPipe Image Classifier (web)
  5. MediaPipe Image Embedder (web)
  6. MediaPipe Image Segmentation (web)
  7. MediaPipe Interactive Image Segmentation (web)
  8. MediaPipe Object Detection (web)

I initially assumed that it was an issue with CodePen, but I can confirm the problem is real since I've tested most of them locally too.

cc: @jenperson, @PaulTR

Screenshot 🖼️

The error log shows an issue with building the "@mediapipe/tasks-vision" package [@mediapipe/[email protected]]

(screenshot omitted)

Solution ❔

💡 0.1.0-alpha-10 was released just 6 hours ago, while 0.1.0-alpha-9 was released 2 days ago. Release versions can be explored over here.

The easy fix is to update the CDN → https://cdn.skypack.dev/@mediapipe/tasks-vision@latest with either https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision (stable) or better "https://cdn.skypack.dev/@mediapipe/[email protected]" (latest stable) & you're good to go! 🚀


P.S.: Please consider adding an Issue Template as well as Stale bot for this repository or better import the same one from here & set the config value accordingly. If assigned, I'd love to raise a PR for the same! 😄


OS : Ubuntu (22.04 LTS, x64)   /   Windows 10 Pro (x64)
Browser :

  • Google Chrome — Version 111.0.5563.147 (Official Build) (64-bit)
  • Microsoft Edge [Version 112.0.1722.39 (Official build) (64-bit)]

Exception in HandLandmarker sample: lateinit property handLandmarkerHelper has not been initialized

E/AndroidRuntime: FATAL EXCEPTION: main
Process: com.google.mediapipe.examples.handlandmarker, PID: 22940
kotlin.UninitializedPropertyAccessException: lateinit property handLandmarkerHelper has not been initialized
at com.google.mediapipe.examples.handlandmarker.fragment.CameraFragment$initBottomSheetControls$9.onItemSelected(CameraFragment.kt:249)
at android.widget.AdapterView.fireOnSelected(AdapterView.java:979)
at android.widget.AdapterView.dispatchOnItemSelected(AdapterView.java:968)
at android.widget.AdapterView.-$$Nest$mdispatchOnItemSelected(Unknown Source:0)
at android.widget.AdapterView$SelectionNotifier.run(AdapterView.java:932)
at android.os.Handler.handleCallback(Handler.java:942)
at android.os.Handler.dispatchMessage(Handler.java:99)
at android.os.Looper.loopOnce(Looper.java:240)
at android.os.Looper.loop(Looper.java:351)

I got this error on my OnePlus 9R device. I can see that handLandmarkerHelper is initialized in onViewCreated() via a concurrent executor; I guess that's the reason for the crash. Is it possible to initialize it on the main thread?

NormalizedLandmark depth (z-axis) scale/denormalization

I am trying to use the x, y, and z axes of the landmarks output from your latest FaceLandmarker model. It is mentioned here that the z magnitude follows the scale of the x-axis. I tried de-normalizing the z-axis based on the x-axis but got inaccurate results.

Can you please clarify how to correctly de-normalize the depth (z-axis) values?
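
For what it's worth, a common reading of "the z magnitude follows the scale of the x-axis" is that both are normalized by the image width, so a rough pixel-space conversion (an assumption to validate against your data, not an official formula) would be:

# Hedged sketch: convert a normalized face landmark to rough pixel-space values,
# assuming z shares the x-axis scale, i.e. both are normalized by image width.
def landmark_to_pixels(lm, image_width, image_height):
    x_px = lm.x * image_width
    y_px = lm.y * image_height
    z_px = lm.z * image_width   # assumption: z uses the same scale as x
    return x_px, y_px, z_px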

Dense face landmark definition

Hello, is there a definition of the location of each landmark in the dense model? If so, how does it correspond to the vertices of the FLAME, GHUM, or PhoMoH models?

Thanks!

Regarding GSoC 2023

Hi,
I am Devarsh Mavani, a third-year Information Technology student at Vishwakarma Government Engineering College. I am looking forward to participating in GSoC under TensorFlow this year. I am interested in the project "Interactive Web Demos using the MediaPipe Machine Learning Library". To give you a bit of my background, I have done the ML Specialization and the Deep Learning course on Coursera. I have also done a few deep-learning-related projects, one of them being sign language recognition on Android (using TFLite). I participated in GSoC last year under MIT App Inventor and have over a year of experience in open-source development.
I find MediaPipe really interesting and want to contribute by making demos for it. I have a couple of demos in mind that we could build, such as a sign language recognition app and a virtual paint app. I wanted to discuss what kind of demo web applications the mentors expect from me as part of GSoC. Will it be up to the students, or will it be discussed with us beforehand? I am working on a proposal for this project, so I wanted to clarify some of my doubts. It would be really helpful if I could connect with a potential mentor.

Thank You,
Devarsh Mavani

image segmentation on ipad safari

I'm trying to troubleshoot why, only on my iPad in Safari, I get a full white mask back every time. The segmentation works fine on desktop, but I'm not entirely sure how else to test or debug this. I do get a base64 image from the segmenter, but it's always just a white image on iPad Safari.

I just updated my computer and iPad to the latest iOS/Safari.

useEffect(() => {
  const loadModel = async () => {
    const audio = await FilesetResolver.forVisionTasks(
      "https://cdn.jsdelivr.net/npm/@mediapipe/[email protected]/wasm"
    );
    const options = {
      baseOptions: {
        modelAssetPath: "/models/selfie_multiclass.tflite",
        delegate: "GPU",
      },
      runningMode: "IMAGE",
      outputCategoryMask: true,
      outputConfidenceMasks: false,
      displayNamesLocale: "en",
    };
    const segmenter = await ImageSegmenter.createFromOptions(audio, options);
    setImageSegmenter(segmenter);
  };

  loadModel();
}, []);

const initializeCamera = async () => {
  try {
    const constraints = {
      audio: false,
      video: { facingMode: "user", width: 512, height: 768 },
    };
    const stream = await navigator.mediaDevices.getUserMedia(constraints);
    const videoElement = videoRef.current;
    videoElement.srcObject = stream;
    videoElement.onloadedmetadata = () => {
      videoElement.play();
      setIsCameraInitialized(true);
    };
  } catch (error) {
    console.error("Error accessing webcam:", error);
    console.error(error.message);
  }
};

const takeSelfie = async () => {
  const videoElement = videoRef.current;
  const canvasElement = canvasRef.current;
  const canvasCtx = canvasElement.getContext("2d");

  const { videoWidth, videoHeight } = videoElement;
  canvasElement.width = 512;
  canvasElement.height = 768;

  canvasCtx.drawImage(videoElement, 0, 0, 512, 768);

  const imageData = canvasCtx.getImageData(0, 0, 512, 768);
  const imageUrl = canvasElement.toDataURL();
  setImageURL(imageUrl);
  await imageSegmenter.segment(imageData, callback);
};

const callback = (result) => {
  const canvasElement = canvasRef.current;
  const canvasCtx = canvasElement.getContext("2d");

  const { width, height } = canvasElement;
  const categoryMask = result.categoryMask.getAsUint8Array();

  const imageData = canvasCtx.getImageData(0, 0, 512, 768);
  const data = imageData.data;
  const headColor = [0, 0, 0, 255]; // Black color for head and hair
  const whiteColor = [255, 255, 255, 255]; // White color for other parts

  for (let i = 0; i < categoryMask.length; i++) {
    const categoryIndex = categoryMask[i];

    if (
      categoryIndex === 1 ||
      categoryIndex === 2 ||
      categoryIndex === 3 ||
      categoryIndex === 5
    ) {
      // Head and hair category indices
      data[i * 4] = headColor[0];
      data[i * 4 + 1] = headColor[1];
      data[i * 4 + 2] = headColor[2];
      data[i * 4 + 3] = headColor[3];
    } else {
      data[i * 4] = whiteColor[0];
      data[i * 4 + 1] = whiteColor[1];
      data[i * 4 + 2] = whiteColor[2];
      data[i * 4 + 3] = whiteColor[3];
    }
  }

  canvasCtx.putImageData(imageData, 0, 0);

  const segmentedImageURL = canvasElement.toDataURL();
  setSegmentedImageURL(segmentedImageURL);
};

UninitializedPropertyAccessException

Getting the following error when I start the app

E/AndroidRuntime: FATAL EXCEPTION: main
Process: com.google.mediapipe.examples.poselandmarker, PID: 23887
kotlin.UninitializedPropertyAccessException: lateinit property poseLandmarkerHelper has not been initialized
at com.google.mediapipe.examples.poselandmarker.fragment.CameraFragment$initBottomSheetControls$8.onItemSelected(CameraFragment.kt:254)
at android.widget.AdapterView.fireOnSelected(AdapterView.java:957)
at android.widget.AdapterView.dispatchOnItemSelected(AdapterView.java:946)
at android.widget.AdapterView.-$$Nest$mdispatchOnItemSelected(Unknown Source:0)
at android.widget.AdapterView$SelectionNotifier.run(AdapterView.java:910)
at android.os.Handler.handleCallback(Handler.java:942)
at android.os.Handler.dispatchMessage(Handler.java:99)
at android.os.Looper.loopOnce(Looper.java:201)
at android.os.Looper.loop(Looper.java:288)
at android.app.ActivityThread.main(ActivityThread.java:7884)
at java.lang.reflect.Method.invoke(Native Method)
at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:548)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:936)

AAR Unity porting error

While converting the gesturerecognizer example into an AAR and inserting it into Unity to call an activity and get a value back, I get an error like the log below, and it says that initialization failed.

native  com.DefaultCompany.SampleAPI  I  I20230329 11:05:49.779655 14280 resource_util_android.cc:77] Successfully loaded: gesture_recognizer.task

QCamera  [email protected]  I  <HAL><I> openCamera: 794: [KPI Perf]: X PROFILE_OPEN_CAMERA camera id 0, rc: 0

 LGCameraPerf-8996ac_OOS [email protected]  E  powerHintInternal_LGE: 353: mEnable = 0, enable = 1 PowerHint::CAMERA_STREAMING = 12

LGCameraPerf-8996ac_OOS [email protected]  E  powerHintInternal_LGE: 376: powerHint = 11, enable = 1,

QCamera  [email protected]  I  <HAL><I> initialize: 1097: E :mCameraId = 0 mState = 1

 sensors_ha...otionAccel [email protected]  D  processInd: LP2: X: 0.296753 Y: 1.117096 Z: 9.717972 SAM TS: 2602640251 HAL TS:79423187446215 elapsedRealtimeNano:79423270247980

sensors_hal_Ctx         [email protected]  D  poll:polldata:1, sensor:54, type:499898101, x:0.296753 y:1.117096 z:9.717972

sensors_hal_Util        [email protected]  D  waitForResponse: timeout=0

BluetoothRemoteDevices  com.android.bluetooth    D  Property type: 1

 6519-6749  BluetoothRemoteDevices  com.android.bluetooth   W  Skip name update for C0:F0:FB:27:E3:C2

 QCOM PowerHAL  [email protected]   I  Preview power hint start

BluetoothRemoteDevices  com.android.bluetooth  D  Property type: 4

 BluetoothRemoteDevices  com.android.bluetooth    W  Skip class update for C0:F0:FB:27:E3:C2

native                  com.DefaultCompany.SampleAPI         I  I20230329 11:05:49.785691 14280 hand_gesture_recognizer_graph.cc:250] Custom gesture classifier is not defined.

QCamera   [email protected]  I  <HAL><I> initialize: 1130: X

 tflite   com.DefaultCompany.SampleAPI         E  The supplied buffer is not 4-bytes aligned

tflite    com.DefaultCompany.SampleAPI         E  The model allocation is null/empty

 native     com.DefaultCompany.SampleAPI         E  E20230329 11:05:49.786113 14280 graph.cc:472] Could not build model from the provided pre-loaded flatbuffer: The model allocation is null/empty

GestureRec...r 41116847 com.DefaultCompany.SampleAPI         E  MP Task Vision failed to load the task with error: unknown: Could not build model from the provided pre-loaded flatbuffer: The model allocation is null/empty

Why does this happen? :( I don't know, because I don't have much knowledge of TensorFlow.
Note: I confirmed that it works well when the AAR is used through Android Studio.

How could I use VIDEO mode on gesture recognizer?

Hello,
I've tried to read video frames into a NumPy array.
Did I miss something when constructing the input for the recognizer?

import random
import ctypes 
from PIL import Image
with vision.GestureRecognizer.create_from_options(options) as recognizer:
  cap = cv2.VideoCapture('TRAIN_300.mp4')
  print("==== Video Info. ===== ")
  #print(cv2.CAP_PROP_FRAME_WIDTH) 
  #print(cv2.CAP_PROP_FRAME_HEIGHT)
  fps = cv2.CAP_PROP_FPS
  #print(fps)
  timestamps = [cv2.CAP_PROP_POS_MSEC]
  calc_timestamps = [0.0]
  timearray = []
  
  frameCount = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
  frameWidth = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
  frameHeight = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

  buf = np.empty((frameCount, frameHeight, frameWidth, 3), np.dtype('uint8'))

  fc = 0
  ret = True

  while (fc < frameCount  and ret):
    ret, buf[fc] = cap.read()
    fc += 1 
    timestamps.append(cap.get(cv2.CAP_PROP_POS_MSEC))
    ts = cap.get(cv2.CAP_PROP_POS_MSEC)
    cts = calc_timestamps[-1] + 1000/fps
    timearray.append(abs(ts - cts))
  cap.release()
 
  frame_timestamp_ms = timearray[9]
  print(type(buf[9]))
  mp_image = mp.Image(format=ImageFormat.SRGB, data=np.stack(buf[9]))
  
  gesture_recognition_result = recognizer.recognize_for_video(mp_image,frame_timestamp_ms)
 

  #numpy_frame_from_opencv = np.stack(frames, axis=0) # dimensions (T, H, W, C)
  
  #print(len(numpy_frame_from_opencv))
   
  cv2.destroyAllWindows()


==== Video Info. =====
<class 'numpy.ndarray'>
W20230204 14:13:15.370810 88347 gesture_recognizer_graph.cc:122] Hand Gesture Recognizer contains CPU only ops. Sets HandGestureRecognizerGraph acceleartion to Xnnpack.
I20230204 14:13:15.374961 88347 hand_gesture_recognizer_graph.cc:250] Custom gesture classifier is not defined.

TypeError Traceback (most recent call last)
Cell In[10], line 35
33 frame_timestamp_ms = timearray[9]
34 print(type(buf[9]))
---> 35 mp_image = mp.Image(format=ImageFormat.SRGB, data=np.stack(buf[9]))
37 gesture_recognition_result = recognizer.recognize_for_video(mp_image,frame_timestamp_ms)
40 #numpy_frame_from_opencv = np.stack(frames, axis=0) # dimensions (T, H, W, C)
41
42 #print(len(numpy_frame_from_opencv))

TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
1. mediapipe.python._framework_bindings.image.Image(image_format: mediapipe::ImageFormat_Format, data: numpy.ndarray[numpy.uint8])
2. mediapipe.python._framework_bindings.image.Image(image_format: mediapipe::ImageFormat_Format, data: numpy.ndarray[numpy.uint16])
3. mediapipe.python._framework_bindings.image.Image(image_format: mediapipe::ImageFormat_Format, data: numpy.ndarray[numpy.float32])

Invoked with: kwargs: format=<ImageFormat.SRGB: 1>, data=array([[[113, 123, 106],
[113, 123, 106],
[113, 123, 106],
...,
[149, 162, 144],
[149, 162, 144],
[147, 160, 142]],

   [[114, 124, 107],
    [114, 124, 107],
    [114, 124, 107],
    ...,
    [149, 162, 144],
    [147, 160, 142],
    [147, 160, 142]],

   [[114, 124, 107],
    [114, 124, 107],
    [114, 124, 107],
    ...,
    [147, 160, 142],
    [146, 159, 141],
    [146, 159, 141]],

   ...,

   [[ 38,  43,  41],
    [ 52,  57,  55],
    [ 68,  74,  69],
    ...,
    [ 19,  24,  22],
    [ 20,  25,  23],
    [ 20,  25,  23]],

   [[ 68,  73,  71],
    [ 92,  97,  95],
    [104, 110, 105],
    ...,
    [ 18,  23,  21],
    [ 19,  24,  22],
    [ 19,  24,  22]],

   [[ 49,  54,  52],
    [ 46,  51,  49],
    [ 62,  68,  63],
    ...,
    [ 18,  23,  21],
    [ 19,  24,  22],
    [ 20,  25,  23]]], dtype=uint8)
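
For what it's worth, the error above lists the supported constructor signatures, and they take image_format rather than format. A corrected sketch, continuing the variable names from the snippet above (buf, timearray, recognizer) and making no other claims about the rest of the pipeline:

import cv2
import mediapipe as mp

# Hedged sketch: build the Image with the keyword the bindings expect
# (image_format, per the supported signatures in the error above). OpenCV frames
# are BGR, so a BGR->RGB conversion is likely wanted as well.
frame_rgb = cv2.cvtColor(buf[9], cv2.COLOR_BGR2RGB)
mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=frame_rgb)
# recognize_for_video expects an integer ms timestamp that increases across calls.
frame_timestamp_ms = int(timearray[9])
result = recognizer.recognize_for_video(mp_image, frame_timestamp_ms)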

Hi Paul, this is regarding Google Summer of Code 2023

Dear Paul ,

I hope this message finds you well. I am interested in contributing to your Android MediaPipe machine learning app development project listed in GSoC '23, have also sent you a proposal regarding it, and would like to know whether there is a repository available on GitHub.
If there is already a repository available, could you please share the link with me? Alternatively, if there is no repository available, would you consider creating one so that contributors like myself can easily contribute to the project?
Thank you for considering my request. I look forward to hearing from you soon.
Hope you remember me well!
Thanking you in advance and sorry for creating an issue like this on Github.
Best regards,
Aakash

outputType : CONFIDENCE_MASK seems weird in Android

Hi all, I don't know whether this repo is the right place to ask this, but since I am using the example from this repo, I will ask anyway.

So, I am trying to modify
https://github.com/googlesamples/mediapipe/tree/main/examples/image_segmentation/android
by changing
.setOutputType(ImageSegmenter.ImageSegmenterOptions.OutputType.CATEGORY_MASK)
into
.setOutputType(ImageSegmenter.ImageSegmenterOptions.OutputType.CONFIDENCE_MASK)
This is the code (screenshot omitted).

I read on https://developers.google.com/mediapipe/solutions/vision/image_segmenter/android (screenshot omitted) that the confidence mask values should be probabilities, so I print the result (screenshots omitted), but the values do not look like probabilities (screenshot omitted).
Note that I am using hair_segmentation.tflite, which has only 2 output categories, background and hair.

Please help; I cannot find another resource regarding this.

READMEs

We will need the top-level and app-level READMEs updated.

Tests needed

We currently need tests for the image classifier and object detection examples.
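
As a starting point, here is a hedged sketch of what a minimal smoke test for the Python image classifier example could look like (the model and image paths are placeholders, and pytest is an assumed test runner, not something the repo currently prescribes):

# Hedged sketch of a smoke test for the image classifier example.
# efficientnet_lite0.tflite and test_image.jpg are placeholder assets.
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision


def test_image_classifier_returns_categories():
    options = vision.ImageClassifierOptions(
        base_options=python.BaseOptions(model_asset_path='efficientnet_lite0.tflite'),
        max_results=3)
    with vision.ImageClassifier.create_from_options(options) as classifier:
        image = mp.Image.create_from_file('test_image.jpg')
        result = classifier.classify(image)
        # At least one classification head with at least one category is expected.
        assert result.classifications
        assert result.classifications[0].categories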
