google-ai-edge / mediapipe-samples
License: Apache License 2.0
Gesture recognizer cannot run on Windows
I am trying to run MediaPipe Pose in live-stream mode:

import cv2
import mediapipe as mp

BaseOptions = mp.tasks.BaseOptions
PoseLandmarker = mp.tasks.vision.PoseLandmarker
PoseLandmarkerOptions = mp.tasks.vision.PoseLandmarkerOptions
PoseLandmarkerResult = mp.tasks.vision.PoseLandmarkerResult
VisionRunningMode = mp.tasks.vision.RunningMode

options = PoseLandmarkerOptions(
    base_options=BaseOptions(model_asset_path='pose_landmarker_full.task'),
    running_mode=VisionRunningMode.LIVE_STREAM,
    output_segmentation_masks=True,
    result_callback=print_result)

with PoseLandmarker.create_from_options(options) as pose:
    cap = cv2.VideoCapture(0)
    i = 0
    while cap.isOpened():
        success, image = cap.read()
        if not success:
            break
        imgRGB = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=imgRGB)
        pose.detect_async(mp_image, i)  # timestamps must be monotonically increasing
        i += 1
However, if I try to output the segmentation mask, the Python interpreter quits without any error messages.
I have narrowed it down to this line:

def print_result(result: PoseLandmarkerResult, output_image: mp.Image, timestamp_ms: int):
    alpha = result.segmentation_masks[0].numpy_view()

If that line is alpha = result.segmentation_masks instead, then MediaPipe runs and alpha is of type mediapipe.python._framework_bindings.image.Image, but as soon as I try to get the pixel data with .numpy_view(), Python quits without any message.
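A hedged workaround sketch, not a confirmed fix: copy the mask into a NumPy array that Python owns inside the callback, so no view into the mp.Image buffer outlives it. Whether this avoids the crash depends on the mediapipe version.

import numpy as np

def print_result(result, output_image, timestamp_ms):
    if result.segmentation_masks:
        # np.copy() detaches the pixel data from the mp.Image backing buffer
        alpha = np.copy(result.segmentation_masks[0].numpy_view())
        print(timestamp_ms, alpha.shape, alpha.dtype)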
The background segmenter tutorial has a bug: it blurs everything instead of blurring just the background.
Has anyone created a version of the example below using streaming video?
https://github.com/googlesamples/mediapipe/tree/main/examples/pose_landmarker/python
The new webcam segmentation solution has much lower quality than the old legacy selfie segmentation; at the very least the frame rate is obviously much lower. The old solution ran very smoothly. Is there anything that can be tuned to get the same results?
I want to use Android NNAPI in hand landmark detection.
How can I do that?
Thank you.
Is there any difference between this framework and https://github.com/google/mediapipe on the Android platform? Or are they compatible?
I found that the Maven dependencies are different:

implementation 'com.google.mediapipe:solution-core:latest.release'
implementation 'com.google.mediapipe:tasks-vision:0.1.0-alpha-5'

Can one be migrated to the other? Or are the xx.tflite model files they use compatible?
Hi, following the guide I can run pose estimation on a human, but the pose_landmarker_result confuses me. Both pose_landmarks and pose_world_landmarks contain a z value, and both are described as relative depth, yet their values differ. What exactly are their definitions?
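For context, my reading of the documentation (worth verifying against the official docs) is that the two lists use different coordinate systems; a small sketch contrasting them:

# pose_landmarks: x and y normalized to [0, 1] by image width/height;
# z is a relative depth with the hips midpoint as origin, using roughly
# the same scale as x (smaller values are closer to the camera).
# pose_world_landmarks: real-world 3D coordinates in meters, with the
# origin at the midpoint between the hips.
norm = result.pose_landmarks[0][0]        # e.g. landmark 0 (nose)
world = result.pose_world_landmarks[0][0]
print(norm.x, norm.y, norm.z)     # unitless, image-relative
print(world.x, world.y, world.z)  # meters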
Hello.
I am using your FaceLandmarker found here: https://github.com/googlesamples/mediapipe/blob/main/examples/face_landmarker/python/%5BMediaPipe_Python_Tasks%5D_Face_Landmarker.ipynb
Currently I am trying to use the dense landmarks while training a FLAME-based 3D model. Is there a correspondence between the dense landmarks you produce (478 points) and the FLAME vertices? This repo (https://github.com/Zielon/metrical-tracker/tree/master/flame/mediapipe) seems to have found a correspondence for 105 points, but not the entire 478.
Hi there,
I really like the idea of low-code ML dev tools, especially the GPU-accelerated inference on Android devices!
@st-duymai & @PaulTR:
Thank you for your time and reply in advance!
I came across the CodePen link of the MediaPipe HandGestureRecognizer (JS) example from here, and I'm curious whether the following changes (or rather updates) could be applied to it to optimize it and improve code readability.
async function runDemo() {
const vision = await FilesetResolver.forVisionTasks(
"https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@latest/wasm"
);
gestureRecognizer = await GestureRecognizer.createFromOptions(vision, {
baseOptions: {
modelAssetPath:
"https://storage.googleapis.com/mediapipe-tasks/gesture_recognizer/gesture_recognizer.task"
},
runningMode: runningMode
});
demosSection.classList.remove("invisible");
}
runDemo();
⬇️ (Updated Code)
const runDemo = async () => {
const vision = await FilesetResolver.forVisionTasks(
"https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@latest/wasm"
);
gestureRecognizer = await GestureRecognizer.createFromOptions(vision, {
baseOptions: {
modelAssetPath:
"https://storage.googleapis.com/mediapipe-tasks/gesture_recognizer/gesture_recognizer.task"
},
runningMode: runningMode
});
demosSection.classList.remove("invisible");
};
runDemo();
Reassigning the runningMode variable is avoided, which reduces the risk of introducing bugs if the variable is reused. We pass the string directly to the setOptions() method instead of first assigning "IMAGE" to runningMode, which simplifies the code and makes it more concise.

if (runningMode === "VIDEO") {
  runningMode = "IMAGE";
  await gestureRecognizer.setOptions({ runningMode: runningMode });
}
⬇️ (Updated Code)
if (runningMode === "VIDEO") {
await gestureRecognizer.setOptions({ runningMode: "IMAGE" });
}
Using let instead of var in the loop (adhering to ES6 standards) makes it more concise and easier to read.

const allCanvas = event.target.parentNode.getElementsByClassName("canvas");
for (var i = allCanvas.length - 1; i >= 0; i--) {
const n = allCanvas[i];
n.parentNode.removeChild(n);
}
⬇️ (Updated Code)
const allCanvas = event.target.parentNode.getElementsByClassName("canvas");
for (let i = allCanvas.length - 1; i >= 0; i--) {
allCanvas[i].parentNode.removeChild(allCanvas[i]);
}
function hasGetUserMedia() {
return !!(navigator.mediaDevices && navigator.mediaDevices.getUserMedia);
}
⬇️ (Updated Code)
const hasGetUserMedia = () => !!(navigator.mediaDevices?.getUserMedia);
if (webcamRunning === true) {
webcamRunning = false;
enableWebcamButton.innerText = "ENABLE PREDICTIONS";
} else {
webcamRunning = true;
enableWebcamButton.innerText = "DISABLE PREDICITONS";
}
⬇️ (Updated Code)
webcamRunning = !webcamRunning;
enableWebcamButton.innerText = webcamRunning ? "DISABLE PREDICTIONS" : "ENABLE PREDICTIONS";
Using === true to compare a boolean value to true is unnecessary; the boolean itself can be used as the condition.

async function predictWebcam() {
const webcamElement = document.getElementById("webcam");
// Now let's start detecting the stream.
if (runningMode === "IMAGE") {
runningMode = "VIDEO";
await gestureRecognizer.setOptions({ runningMode: runningMode });
}
let nowInMs = Date.now();
const results = gestureRecognizer.recognizeForVideo(video, nowInMs);
canvasCtx.save();
canvasCtx.clearRect(0, 0, canvasElement.width, canvasElement.height);
canvasElement.style.height = videoHeight;
webcamElement.style.height = videoHeight;
canvasElement.style.width = videoWidth;
webcamElement.style.width = videoWidth;
if (results.landmarks) {
for (const landmarks of results.landmarks) {
drawConnectors(canvasCtx, landmarks, HAND_CONNECTIONS, {
color: "#00FF00",
lineWidth: 5
});
drawLandmarks(canvasCtx, landmarks, { color: "#FF0000", lineWidth: 2 });
}
}
canvasCtx.restore();
if (results.gestures.length > 0) {
gestureOutput.style.display = "block";
gestureOutput.style.width = videoWidth;
gestureOutput.innerText =
"GestureRecognizer: " +
results.gestures[0][0].categoryName +
"\n Confidence: " +
Math.round(parseFloat(results.gestures[0][0].score) * 100) +
"%";
} else {
gestureOutput.style.display = "none";
}
// Call this function again to keep predicting when the browser is ready.
if (webcamRunning === true) {
window.requestAnimationFrame(predictWebcam);
}
}
⬇️ (Updated Code)
const predictWebcam = async () => {
const webcamElement = document.getElementById("webcam");
// Now let's start detecting the stream.
if (runningMode === "IMAGE") {
runningMode = "VIDEO";
await gestureRecognizer.setOptions({ runningMode: runningMode });
}
let nowInMs = Date.now();
const results = gestureRecognizer.recognizeForVideo(video, nowInMs);
canvasCtx.save();
canvasCtx.clearRect(0, 0, canvasElement.width, canvasElement.height);
canvasElement.style.height = videoHeight;
webcamElement.style.height = videoHeight;
canvasElement.style.width = videoWidth;
webcamElement.style.width = videoWidth;
results.landmarks?.forEach((landmarks) => {
drawConnectors(canvasCtx, landmarks, HAND_CONNECTIONS, {
color: "#00FF00",
lineWidth: 5
});
drawLandmarks(canvasCtx, landmarks, { color: "#FF0000", lineWidth: 2 });
});
canvasCtx.restore();
if (results.gestures.length > 0) {
gestureOutput.style.display = "block";
gestureOutput.style.width = videoWidth;
const categoryName = results.gestures[0][0].categoryName;
const score = Math.round(parseFloat(results.gestures[0][0].score) * 100);
gestureOutput.innerText = `GestureRecognizer: ${categoryName}\n Confidence: ${score}%`;
} else {
gestureOutput.style.display = "none";
}
// Call this function again to keep predicting when the browser is ready.
if (webcamRunning) {
window.requestAnimationFrame(predictWebcam);
}
};
💡 The updated CodePen example can be accessed here.
cc: @jenperson
OS : Ubuntu (22.04 LTS, x64) / Windows 10 Pro (x64)
Browser :
If I start the gesture recognizer app on an arm64-v8a device, everything is OK. But starting it on an armeabi-v7a device (same Android version, 11) runs into some problems:
E/tflite: The supplied buffer is not 4-bytes aligned
E/tflite: The model allocation is null/empty
E/native: E20221114 20:29:51.589087 3371 graph.cc:472] Could not build model from the provided pre-loaded flatbuffer: The model allocation is null/empty
W/System.err: com.google.mediapipe.framework.MediaPipeException: unknown: Could not build model from the provided pre-loaded flatbuffer: The model allocation is null/empty
W/System.err: at com.google.mediapipe.framework.Graph.nativeStartRunningGraph(Native Method)
W/System.err: at com.google.mediapipe.framework.Graph.startRunningGraph(Graph.java:336)
W/System.err: at com.google.mediapipe.tasks.core.TaskRunner.create(TaskRunner.java:71)
W/System.err: at com.google.mediapipe.tasks.vision.gesturerecognizer.GestureRecognizer.createFromOptions(GestureRecognizer.java:194)
W/System.err: at com.google.mediapipe.examples.gesturerecognizer.GestureRecognizerHelper.setupGestureRecognizer(GestureRecognizerHelper.kt:95)
W/System.err: at com.google.mediapipe.examples.gesturerecognizer.GestureRecognizerHelper.<init>(GestureRecognizerHelper.kt:50)
W/System.err: at com.google.mediapipe.examples.gesturerecognizer.fragment.CameraFragment.onViewCreated$lambda-4(CameraFragment.kt:147)
W/System.err: at com.google.mediapipe.examples.gesturerecognizer.fragment.CameraFragment.$r8$lambda$xiZI6LDAjMBw-J7vyrjSe_CLWo0(Unknown Source:0)
W/System.err: at com.google.mediapipe.examples.gesturerecognizer.fragment.CameraFragment$$ExternalSyntheticLambda13.run(Unknown Source:2)
W/System.err: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
W/System.err: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
W/System.err: at java.lang.Thread.run(Thread.java:923)
E/GestureRecognizerHelper 56784282: MP Task Vision failed to load the task with error: unknown: Could not build model from the provided pre-loaded flatbuffer: The model allocation is null/empty
I guess the error depends on the device architecture. What can you advise in this situation?
Hi,
I got the following error:
Traceback (most recent call last):
File "...\mediapipe\examples\audio_classifier\python\audio_classification_live_stream\classify.py", line 134, in <module>
main()
File "...\mediapipe\examples\audio_classifier\python\audio_classification_live_stream\classify.py", line 129, in main
run(args.model, int(args.maxResults), float(args.scoreThreshold),
File "...\mediapipe\examples\audio_classifier\python\audio_classification_live_stream\classify.py", line 57, in run
classifier = audio.AudioClassifier.create_from_options(options)
File "...\mediapipe\examples\audio_classifier\python\audio_classification_live_stream\venv\lib\site-packages\mediapipe\tasks\python\audio\audio_classifier.py", line 204, in create_from_options
return cls(
File "...\mediapipe\examples\audio_classifier\python\audio_classification_live_stream\venv\lib\site-packages\mediapipe\tasks\python\audio\core\base_audio_task_api.py", line 64, in __init__
self._runner = _TaskRunner.create(graph_config, packet_callback)
RuntimeError: ValidatedGraphConfig Initialization failed.
No registered object with name: mediapipe::tasks::audio::audio_classifier::AudioClassifierGraph; Unable to find Calculator "mediapipe.tasks.audio.audio_classifier.AudioClassifierGraph"
Process finished with exit code 1
I use PyCharm on Windows 10.
I tried Python 3.9 and 3.10.
I tried the provided code, and I also tried using downloaded models:
base_options = mp.tasks.BaseOptions(model_asset_path=model)
base_options = mp.tasks.BaseOptions(model_asset_path='lite-model_yamnet_classification_tflite_1.tflite')
base_options = mp.tasks.BaseOptions(model_asset_path='yamnet_audio_classifier_with_metadata.tflite')
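For what it's worth, a minimal repro that isolates classifier creation from the rest of the script might help narrow this down; a sketch (the model path is a placeholder for one of the files above):

import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import audio

# If AudioClassifierGraph is not registered in this build, this alone
# should reproduce the RuntimeError, independent of the demo script.
options = audio.AudioClassifierOptions(
    base_options=python.BaseOptions(
        model_asset_path='yamnet_audio_classifier_with_metadata.tflite'))
with audio.AudioClassifier.create_from_options(options) as classifier:
    print("AudioClassifierGraph registered and classifier created OK")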
Hi: using Android Studio Electric Eel and the defaults, the package com.google.mediapipe.examples.gesturerecognizer.fragment throws
E/AndroidRuntime: FATAL EXCEPTION: main
Process: com.google.mediapipe.examples.gesturerecognizer, PID: 6318
kotlin.UninitializedPropertyAccessException: lateinit property gestureRecognizerHelper has not been initialized
at com.google.mediapipe.examples.gesturerecognizer.fragment.CameraFragment$initBottomSheetControls$7.onItemSelected(CameraFragment.kt:235)
at android.widget.AdapterView.fireOnSelected(AdapterView.java:957)
at android.widget.AdapterView.dispatchOnItemSelected(AdapterView.java:946)
at android.widget.AdapterView.-$$Nest$mdispatchOnItemSelected(Unknown Source:0)
at android.widget.AdapterView$SelectionNotifier.run(AdapterView.java:910)
at android.os.Handler.handleCallback(Handler.java:942)
at android.os.Handler.dispatchMessage(Handler.java:99)
at android.os.Looper.loopOnce(Looper.java:201)
at android.os.Looper.loop(Looper.java:288)
at android.app.ActivityThread.main(ActivityThread.java:7872)
at java.lang.reflect.Method.invoke(Native Method)
at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:548)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:936)
I/tflite: Initialized TensorFlow Lite runtime.
W/libc: Access denied finding property "ro.mediatek.platform"
W/libc: Access denied finding property "ro.chipname"
W/libc: Access denied finding property "ro.hardware.chipname"
I/tflite: Created TensorFlow Lite XNNPACK delegate for CPU.
I/CameraManagerGlobal: Connecting to camera service
D/CameraRepository: Added camera: 0
I/Process: Sending signal. PID: 6318 SIG: 9
on a Pixel 7.
The PoseLandmarkerResult only returns x, y, and z for the normalized and world landmarks. Is there any way to get the presence and visibility scores? According to the guide, we should have access to these attributes. Is the source code available anywhere?
Thanks a lot for the new solution; any help would be much appreciated!
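In case it helps, in the Python Tasks API the landmark containers do declare visibility and presence fields; a minimal sketch of reading them (whether they are actually populated depends on the model and version, so treat that as an assumption to verify):

# result is a PoseLandmarkerResult from the Python Tasks API
for lm in result.pose_landmarks[0]:
    # visibility/presence exist on NormalizedLandmark but may be unset
    print(lm.x, lm.y, lm.z, lm.visibility, lm.presence)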
I saw a paragraph stating: "Otherwise, a lightweight hand tracking algorithm is used to determine the location of the hand(s) for subsequent landmark detection."
How does this lightweight hand tracking algorithm work?
Thanks.
When I switch the phone camera to the back one (CameraSelector.LENS_FACING_BACK) and it recognizes a hand, the landmarks and line segments refreshed in real time do not fall accurately on the hand but are slightly offset. This does not happen after switching to the front camera. How can this be optimized or dealt with?
I searched for a pre-trained audio embedding model for the MediaPipe Task API but was unable to find one. Does a model exist for this job?
I have raised a PR (#66) to fill in the missing audio embedder example; it is waiting for the .tflite model.
Hello,
I am trying to implement the Android application for the image segmentation task (https://github.com/googlesamples/mediapipe/tree/main/examples/image_segmentation/android).
I have an h5 model trained on my own data.
In the documentation at https://developers.google.com/mediapipe/solutions/vision/pose_landmarker/android, it is stated that the result will look like this:
PoseLandmarkerResult:
Landmarks:
Landmark #0:
x : 0.638852
y : 0.671197
z : 0.129959
visibility : 0.9999997615814209
presence : 0.9999984502792358
But in my implementation I only get the x, y, and z values, not visibility and presence.
Does MediaPipe only support CPU and GPU?
If MediaPipe supports NPU, could you let me know how to use it?
Hey all 👋
The previous version of MediaPipe had first-class C++ support, with the ability to run on iOS and documentation for this.
Has this been dropped from solutions like FaceMesh V2? I'm only seeing Android, Python, and Web guides.
I followed the Python example in the Colab notebook (installed mediapipe via pip), but the GPU does not seem to be used. Is there a specific mediapipe version or installation option needed to use a GPU device?
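One hedged pointer: recent mediapipe Python releases expose a delegate field on BaseOptions; a minimal sketch (whether GPU inference is actually available depends on the platform and build, and the model path here is just a placeholder):

import mediapipe as mp

base_options = mp.tasks.BaseOptions(
    model_asset_path='pose_landmarker_full.task',  # placeholder path
    delegate=mp.tasks.BaseOptions.Delegate.GPU,    # may error on unsupported builds
)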
In the old MediaPipe sample app my test phone could track hands in real time; in this version it lags behind. What could be the problem? My first thought is that the CameraX analysis path is slower: the old sample used an older CameraX version and, if memory serves, handled the SurfaceView manually. Is there any way to work around this and get the same speed in this version? Forgive me if I am asking anything obvious; I am still rather new to Android development.
I followed the page https://developers.google.com/mediapipe/solutions/vision/gesture_recognizer/customize and trained a custom model, gesture_recognizer.task, that can recognize rock, paper, and scissors.
I found that the Android SDK can only load one model, so how can I use this model combined with the default one (https://developers.google.com/mediapipe/solutions/vision/gesture_recognizer#models)?
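I know of no documented way to merge two .task bundles, but one hedged idea is to run two recognizer instances side by side and merge their outputs; a Python sketch of the idea (model paths and the merging rule are assumptions):

import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision

def make_recognizer(path):
    return vision.GestureRecognizer.create_from_options(
        vision.GestureRecognizerOptions(
            base_options=python.BaseOptions(model_asset_path=path)))

default_rec = make_recognizer('gesture_recognizer.task')  # canned gestures
custom_rec = make_recognizer('rock_paper_scissors.task')  # hypothetical custom model

image = mp.Image.create_from_file('hand.jpg')
results = [r.recognize(image) for r in (default_rec, custom_rec)]
# Keep whichever recognizer reports the most confident top gesture.
best = max((res for res in results if res.gestures),
           key=lambda res: res.gestures[0][0].score,
           default=None)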
The face_landmarker_v2_with_blendshapes.task seems to only support CPU?
Which other models support GPU?
Hello, I really need help.
I need to collect the coordinates of the points of both hands and the face from one frame, so I have combined the code that finds face points with the code that finds hand points.
Now I have two files in my project, FaceLandmarkerHelper and HandLandmarkerHelper. Nothing has changed in them except the names of the inherited LandmarkerListener functions: onError, onResults, and onEmpty are now onErrorFace/onErrorHand, onResultsFace/onResultsHand, and onEmptyFace/onEmptyHand.
In the CameraFragment I similarly split the variables into hand and face versions: imageAnalyzer and backgroundExecutor, and added the variables handLandmarkerHelper and faceLandmarkerHelper. The class itself inherits from HandLandmarkerHelper.LandmarkerListener and FaceLandmarkerHelper.LandmarkerListener.
All the split variables have duplicated code.
Declaring variables:
private lateinit var handLandmarkerHelper: HandLandmarkerHelper
private lateinit var faceLandmarkerHelper: FaceLandmarkerHelper
private val viewModel: MainViewModel by activityViewModels()
private var preview: Preview? = null
private var handImageAnalyzer: ImageAnalysis? = null
private var faceImageAnalyzer: ImageAnalysis? = null
private var camera: Camera? = null
private var cameraProvider: ProcessCameraProvider? = null
private var cameraFacing = CameraSelector.LENS_FACING_FRONT
private lateinit var handBackgroundExecutor: ExecutorService
private lateinit var faceBackgroundExecutor: ExecutorService
In the function onViewCreated:
super.onViewCreated(view, savedInstanceState)
// Initialize our background executor
handBackgroundExecutor = Executors.newSingleThreadExecutor()
faceBackgroundExecutor = Executors.newSingleThreadExecutor()
// Wait for the views to be properly laid out
fragmentCameraBinding.viewFinder.post {
// Set up the camera and its use cases
setUpCamera() // the setUpCamera() method of this class
}
// Create the HandLandmarkerHelper that will handle the inference
handBackgroundExecutor.execute {
handLandmarkerHelper = HandLandmarkerHelper(
context = requireContext(),
runningMode = RunningMode.LIVE_STREAM,
minHandDetectionConfidence = viewModel.currentMinHandDetectionConfidence,
minHandTrackingConfidence = viewModel.currentMinHandTrackingConfidence,
minHandPresenceConfidence = viewModel.currentMinHandPresenceConfidence,
maxNumHands = viewModel.currentMaxHands,
currentDelegate = viewModel.currentDelegate,
handLandmarkerHelperListener = this
)
}
faceBackgroundExecutor.execute {
faceLandmarkerHelper = FaceLandmarkerHelper(
context = requireContext(),
runningMode = RunningMode.LIVE_STREAM,
minFaceDetectionConfidence = viewModel.currentMinFaceDetectionConfidence,
minFaceTrackingConfidence = viewModel.currentMinFaceTrackingConfidence,
minFacePresenceConfidence = viewModel.currentMinFacePresenceConfidence,
maxNumFaces = viewModel.currentMaxFaces,
currentDelegate = viewModel.currentDelegate,
faceLandmarkerHelperListener = this
)
}
Part of the function bindCameraUseCases:
// ImageAnalysis. Using RGBA 8888 to match how our models work
handImageAnalyzer =
ImageAnalysis.Builder().setTargetAspectRatio(AspectRatio.RATIO_4_3)
.setTargetRotation(fragmentCameraBinding.viewFinder.display.rotation)
.setBackpressureStrategy(ImageAnalysis.STRATEGY_KEEP_ONLY_LATEST)
.setOutputImageFormat(ImageAnalysis.OUTPUT_IMAGE_FORMAT_RGBA_8888)
.build()
// The analyzer can then be assigned to the instance
.also {
it.setAnalyzer(handBackgroundExecutor) { image ->
detectHand(image)
}
}
faceImageAnalyzer =
ImageAnalysis.Builder().setTargetAspectRatio(AspectRatio.RATIO_4_3)
.setTargetRotation(fragmentCameraBinding.viewFinder.display.rotation)
.setBackpressureStrategy(ImageAnalysis.STRATEGY_KEEP_ONLY_LATEST)
.setOutputImageFormat(ImageAnalysis.OUTPUT_IMAGE_FORMAT_RGBA_8888)
.build()
// The analyzer can then be assigned to the instance
.also {
it.setAnalyzer(faceBackgroundExecutor) { image ->
detectFace(image)
}
}
// Must unbind the use-cases before rebinding them
cameraProvider.unbindAll()
Is it possible to pass two analyzers to bindToLifecycle? Continuation of the function bindCameraUseCases:
try {
// A variable number of use-cases can be passed here -
// camera provides access to CameraControl & CameraInfo
camera = cameraProvider.bindToLifecycle(
this, cameraSelector, preview, handImageAnalyzer (or faceImageAnalyzer)
)
// Attach the viewfinder's surface provider to preview use case
preview?.setSurfaceProvider(fragmentCameraBinding.viewFinder.surfaceProvider)
} catch (exc: Exception) {
Log.e(TAG, "Use case binding failed", exc)
}
Or can the two analyzers somehow be combined into one (pseudocode)?
imageAnalyzer = handImageAnalyzer + faceImageAnalyzer
If you create one imageAnalyzer and one backgroundExecutor, then in this code:
imageAnalyzer =
ImageAnalysis.Builder().setTargetAspectRatio(AspectRatio.RATIO_4_3)
.setTargetRotation(fragmentCameraBinding.viewFinder.display.rotation)
.setBackpressureStrategy(ImageAnalysis.STRATEGY_KEEP_ONLY_LATEST)
.setOutputImageFormat(ImageAnalysis.OUTPUT_IMAGE_FORMAT_RGBA_8888)
.build()
// The analyzer can then be assigned to the instance
.also {
it.setAnalyzer(backgroundExecutor) { image ->
detectFace(image)
}
it.setAnalyzer(backgroundExecutor) { image ->
detectHand(image)
}
}
the latter setAnalyzer call will prevail (here, detectHand), because an ImageAnalysis instance holds only one analyzer at a time.
I am writing to report an issue I have encountered while importing the mediapipe library using ES6 modules in Workers. When attempting to import mediapipe, I consistently receive the following error: "TypeError: Failed to execute 'importScripts' on 'WorkerGlobalScope': Module scripts don't support importScripts()." This error occurs when I set the 'type' attribute to 'module' when connecting the Worker, as shown in the code snippet below:
new Worker(this.workerScriptUrl, { type: 'module' });
After investigating the issue, it has become apparent that the mediapipe library relies on importScripts to import WebAssembly (wasm) files. Unfortunately, due to the limitation of module scripts in Workers, importScripts is not supported, resulting in the aforementioned error. Consequently, I am unable to utilize the mediapipe library within Workers when using ES6 modules.
I kindly request your assistance in resolving this matter. I propose the implementation of an optional workaround to address this issue. Specifically, it would be greatly appreciated if you could introduce an exception that allows an alternative method, such as using fetch or any other suitable approach, to import the wasm files instead of relying on importScripts. This adjustment would facilitate the seamless integration of the mediapipe library with ES6 modules in Workers, enabling developers to effectively leverage its functionalities.
Thank you for your attention to this matter. I eagerly await your response and any guidance you can provide to help resolve this issue.
Recently y'all posted excellent work: https://ai.googleblog.com/2023/06/on-device-diffusion-plugins-for.html
That article links to the MediaPipe image generation app at https://mediapipe.page.link/on-device-diffusion-plugins, but it redirects to /.
Where's the code?
cc the wizards: @YangNaruto @Tingbopku @jiuqiant @lu-wang-g @khanhlvg @lee-ju
Hello, I am trying to modify the image segmentation example. In video live-stream mode I try to draw points onto the image, but they are not centered. Debugging the image, it seems that the video within the canvas is cropped.
Code:
I call this code within setResults:
drawExpectedLocation(mask_with_center.nativeObjAddr)
and I convert mask_with_center into a bitmap and show it:
val mask_with_center = Mat(outputHeight, outputWidth, CvType.CV_8UC4,Scalar.all(0.0))
drawExpectedLocation(mask_with_center.nativeObjAddr)
val image = Bitmap.createBitmap(
mask_dif_largest.cols(),
mask_dif_largest.rows(),
Bitmap.Config.ARGB_8888
);
Utils.matToBitmap(mask_with_center, image);
val scaleFactor = when (runningMode) {
RunningMode.IMAGE,
RunningMode.VIDEO -> {
min(width * 1f / outputWidth, height * 1f / outputHeight)
}
RunningMode.LIVE_STREAM -> {
// PreviewView is in FILL_START mode. So we need to scale up the
// landmarks to match with the size that the captured images will be
// displayed.
max(width * 1f / outputWidth, height * 1f / outputHeight)
}
}
val scaleWidth = (outputWidth * scaleFactor).toInt()
val scaleHeight = (outputHeight * scaleFactor).toInt()
scaleBitmap = Bitmap.createScaledBitmap(
image, scaleWidth, scaleHeight, false
)
and I got a result like this:
The two circles are not centered. I think this is because the image within the canvas is cropped. Please help me.
The following CodePen demos (examples) are failing because of a dependency incompatibility in the MediaPipe Tasks Vision package:
I initially assumed that it was an issue with CodePen, but I can confirm that it's valid since I've tested most of them locally too.
cc: @jenperson, @PaulTR
The error log reports an issue with building the "@mediapipe/tasks-vision" package [@mediapipe/tasks-vision@<version>].
💡 0.1.0-alpha-10 was released just 6 hours ago, while 0.1.0-alpha-9 was released 2 days ago. Release versions can be explored over here.
The easy fix is to update the CDN URL https://cdn.skypack.dev/@mediapipe/tasks-vision@latest to either https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision (stable) or, better, a pinned @mediapipe/tasks-vision version on cdn.skypack.dev (latest stable), and you're good to go! 🚀
P.S.: Please consider adding an issue template as well as a stale bot for this repository, or better, import the same ones from here and set the config values accordingly. If assigned, I'd love to raise a PR for the same! 😄
OS : Ubuntu (22.04 LTS, x64) / Windows 10 Pro (x64)
Browser :
On the documentation page:
https://developers.google.com/mediapipe/solutions/vision/image_segmenter/web_js
it is mentioned that segmentForVideo() can also be run in a worker in order to improve performance.
I couldn't find a working example for this. The problem is that importScripts() is not supported in module workers, but the mediapipe npm module seems to use this function. Thus I'm getting:
I'm running this all in the renderer process of an Electron app.
E/AndroidRuntime: FATAL EXCEPTION: main
Process: com.google.mediapipe.examples.handlandmarker, PID: 22940
kotlin.UninitializedPropertyAccessException: lateinit property handLandmarkerHelper has not been initialized
at com.google.mediapipe.examples.handlandmarker.fragment.CameraFragment$initBottomSheetControls$9.onItemSelected(CameraFragment.kt:249)
at android.widget.AdapterView.fireOnSelected(AdapterView.java:979)
at android.widget.AdapterView.dispatchOnItemSelected(AdapterView.java:968)
at android.widget.AdapterView.-$$Nest$mdispatchOnItemSelected(Unknown Source:0)
at android.widget.AdapterView$SelectionNotifier.run(AdapterView.java:932)
at android.os.Handler.handleCallback(Handler.java:942)
at android.os.Handler.dispatchMessage(Handler.java:99)
at android.os.Looper.loopOnce(Looper.java:240)
at android.os.Looper.loop(Looper.java:351)
I got this error on my OnePlus 9R device. I can see that handLandmarkerHelper is initialized in onViewCreated() via a background executor; I guess that is the reason for the crash. Is it possible to initialize it on the main thread?
I am trying to use the x, y, and z values of the landmarks output by your latest FaceLandmarker model. It is mentioned here that the z magnitude follows the scale of the x-axis. I tried de-normalizing the z-axis based on the x-axis but got inaccurate results.
Can you please clarify how to correctly de-normalize the depth (z-axis) values?
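For reference, the straightforward reading of "z follows the scale of the x-axis" would be the sketch below; this is my interpretation rather than a confirmed recipe, and the resulting z is still only a relative depth, not a metric distance:

def denormalize(landmark, image_width, image_height):
    # x and y are normalized by image width/height; z follows x's scale,
    # so it is multiplied by the image width as well.
    x_px = landmark.x * image_width
    y_px = landmark.y * image_height
    z_px = landmark.z * image_width
    return x_px, y_px, z_px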
Hello, is there a definition of the location of each landmark in the dense model? If so, how does it correspond to the vertices of the FLAME, GHUM, or PhoMoH models?
Thanks!
Hi,
My name is Devarsh Mavani, a 3rd-year Information Technology student at Vishwakarma Government Engineering College. I am looking forward to participating in GSoC under TensorFlow this year, and I am interested in the project "Interactive Web Demos using the MediaPipe Machine Learning Library". To give you a bit of my background, I have completed the ML Specialization and Deep Learning courses on Coursera and have done a few deep-learning projects, one of them being sign language recognition on Android (using TFLite). I participated in GSoC last year under MIT App Inventor, and I have over a year of experience in open-source development.
I find MediaPipe really interesting and want to contribute by building demos for it. I have a couple of demos in mind, such as a sign language recognition app and a virtual paint app. I wanted to ask what kind of demo web applications the mentors expect from us as part of GSoC. Will it be up to the students, or will it be discussed with us beforehand? I am working on a proposal for this project, so I wanted to clarify some of my doubts. It would be really helpful if I could connect with a potential mentor.
Thank You,
Devarsh Mavani
Trying to troubleshoot why, on my iPad in Safari only, I get a full white mask back every time. The segmentation works fine on desktop, but I'm not entirely sure how else to test/debug this. I do get a base64 image from the segmenter, but it's always just a white image on iPad Safari.
I just updated my computer and iPad to the latest iOS/Safari.
useEffect(() => {
  const loadModel = async () => {
    const audio = await FilesetResolver.forVisionTasks(
      "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@<version>/wasm"
    );
    const options = {
      baseOptions: {
        modelAssetPath: "/models/selfie_multiclass.tflite",
        delegate: "GPU",
      },
      runningMode: "IMAGE",
      outputCategoryMask: true,
      outputConfidenceMasks: false,
      displayNamesLocale: "en",
    };
    const segmenter = await ImageSegmenter.createFromOptions(audio, options);
    setImageSegmenter(segmenter);
  };
  loadModel();
}, []);

const initializeCamera = async () => {
  try {
    const constraints = {
      audio: false,
      video: { facingMode: "user", width: 512, height: 768 },
    };
    const stream = await navigator.mediaDevices.getUserMedia(constraints);
    const videoElement = videoRef.current;
    videoElement.srcObject = stream;
    videoElement.onloadedmetadata = () => {
      videoElement.play();
      setIsCameraInitialized(true);
    };
  } catch (error) {
    console.error("Error accessing webcam:", error);
    console.error(error.message);
  }
};

const takeSelfie = async () => {
  const videoElement = videoRef.current;
  const canvasElement = canvasRef.current;
  const canvasCtx = canvasElement.getContext("2d");
  canvasElement.width = 512;
  canvasElement.height = 768;
  canvasCtx.drawImage(videoElement, 0, 0, 512, 768);
  const imageData = canvasCtx.getImageData(0, 0, 512, 768);
  const imageUrl = canvasElement.toDataURL();
  setImageURL(imageUrl);
  await imageSegmenter.segment(imageData, callback);
};

const callback = (result) => {
  const canvasElement = canvasRef.current;
  const canvasCtx = canvasElement.getContext("2d");
  const categoryMask = result.categoryMask.getAsUint8Array();
  const imageData = canvasCtx.getImageData(0, 0, 512, 768);
  const data = imageData.data;
  const headColor = [0, 0, 0, 255]; // Black for head and hair
  const whiteColor = [255, 255, 255, 255]; // White for other parts
  for (let i = 0; i < categoryMask.length; i++) {
    const categoryIndex = categoryMask[i];
    if (categoryIndex === 1 || categoryIndex === 2 || categoryIndex === 3 || categoryIndex === 5) {
      // Head and hair category indices
      data[i * 4] = headColor[0];
      data[i * 4 + 1] = headColor[1];
      data[i * 4 + 2] = headColor[2];
      data[i * 4 + 3] = headColor[3];
    } else {
      data[i * 4] = whiteColor[0];
      data[i * 4 + 1] = whiteColor[1];
      data[i * 4 + 2] = whiteColor[2];
      data[i * 4 + 3] = whiteColor[3];
    }
  }
  canvasCtx.putImageData(imageData, 0, 0);
  const segmentedImageURL = canvasElement.toDataURL();
  setSegmentedImageURL(segmentedImageURL);
};
Getting the following error when I start the app
E/AndroidRuntime: FATAL EXCEPTION: main
Process: com.google.mediapipe.examples.poselandmarker, PID: 23887
kotlin.UninitializedPropertyAccessException: lateinit property poseLandmarkerHelper has not been initialized
at com.google.mediapipe.examples.poselandmarker.fragment.CameraFragment$initBottomSheetControls$8.onItemSelected(CameraFragment.kt:254)
at android.widget.AdapterView.fireOnSelected(AdapterView.java:957)
at android.widget.AdapterView.dispatchOnItemSelected(AdapterView.java:946)
at android.widget.AdapterView.-$$Nest$mdispatchOnItemSelected(Unknown Source:0)
at android.widget.AdapterView$SelectionNotifier.run(AdapterView.java:910)
at android.os.Handler.handleCallback(Handler.java:942)
at android.os.Handler.dispatchMessage(Handler.java:99)
at android.os.Looper.loopOnce(Looper.java:201)
at android.os.Looper.loop(Looper.java:288)
at android.app.ActivityThread.main(ActivityThread.java:7884)
at java.lang.reflect.Method.invoke(Native Method)
at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:548)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:936)
While converting the gesture recognizer example into an AAR and inserting it into Unity to call an activity and get a value, I get the error in the log below, which says that initialization failed.
native com.DefaultCompany.SampleAPI I I20230329 11:05:49.779655 14280 resource_util_android.cc:77] Successfully loaded: gesture_recognizer.task
QCamera [email protected] I <HAL><I> openCamera: 794: [KPI Perf]: X PROFILE_OPEN_CAMERA camera id 0, rc: 0
LGCameraPerf-8996ac_OOS [email protected] E powerHintInternal_LGE: 353: mEnable = 0, enable = 1 PowerHint::CAMERA_STREAMING = 12
LGCameraPerf-8996ac_OOS [email protected] E powerHintInternal_LGE: 376: powerHint = 11, enable = 1,
QCamera [email protected] I <HAL><I> initialize: 1097: E :mCameraId = 0 mState = 1
sensors_ha...otionAccel [email protected] D processInd: LP2: X: 0.296753 Y: 1.117096 Z: 9.717972 SAM TS: 2602640251 HAL TS:79423187446215 elapsedRealtimeNano:79423270247980
sensors_hal_Ctx [email protected] D poll:polldata:1, sensor:54, type:499898101, x:0.296753 y:1.117096 z:9.717972
sensors_hal_Util [email protected] D waitForResponse: timeout=0
BluetoothRemoteDevices com.android.bluetooth D Property type: 1
6519-6749 BluetoothRemoteDevices com.android.bluetooth W Skip name update for C0:F0:FB:27:E3:C2
QCOM PowerHAL [email protected] I Preview power hint start
BluetoothRemoteDevices com.android.bluetooth D Property type: 4
BluetoothRemoteDevices com.android.bluetooth W Skip class update for C0:F0:FB:27:E3:C2
native com.DefaultCompany.SampleAPI I I20230329 11:05:49.785691 14280 hand_gesture_recognizer_graph.cc:250] Custom gesture classifier is not defined.
QCamera [email protected] I <HAL><I> initialize: 1130: X
tflite com.DefaultCompany.SampleAPI E The supplied buffer is not 4-bytes aligned
tflite com.DefaultCompany.SampleAPI E The model allocation is null/empty
native com.DefaultCompany.SampleAPI E E20230329 11:05:49.786113 14280 graph.cc:472] Could not build model from the provided pre-loaded flatbuffer: The model allocation is null/empty
GestureRec...r 41116847 com.DefaultCompany.SampleAPI E MP Task Vision failed to load the task with error: unknown: Could not build model from the provided pre-loaded flatbuffer: The model allocation is null/empty
Why does this log appear? :( I don't know, because I don't have much knowledge about TensorFlow.
+) I confirmed that it works well when inserting the AAR through Android Studio.
Hello,
I've tried to read video frames into a numpy array. Did I miss something when making an input for the recognizer?
import random
import ctypes
from PIL import Image
import cv2
import numpy as np
import mediapipe as mp
from mediapipe import ImageFormat
from mediapipe.tasks.python import vision

# `options` (GestureRecognizerOptions) is defined earlier in the notebook.
with vision.GestureRecognizer.create_from_options(options) as recognizer:
    cap = cv2.VideoCapture('TRAIN_300.mp4')
    print("==== Video Info. =====")
    # print(cv2.CAP_PROP_FRAME_WIDTH)
    # print(cv2.CAP_PROP_FRAME_HEIGHT)
    fps = cv2.CAP_PROP_FPS
    # print(fps)
    timestamps = [cv2.CAP_PROP_POS_MSEC]
    calc_timestamps = [0.0]
    timearray = []
    frameCount = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    frameWidth = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    frameHeight = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    buf = np.empty((frameCount, frameHeight, frameWidth, 3), np.dtype('uint8'))
    fc = 0
    ret = True
    while fc < frameCount and ret:
        ret, buf[fc] = cap.read()
        fc += 1
        timestamps.append(cap.get(cv2.CAP_PROP_POS_MSEC))
        ts = cap.get(cv2.CAP_PROP_POS_MSEC)
        cts = calc_timestamps[-1] + 1000 / fps
        timearray.append(abs(ts - cts))
    cap.release()

    frame_timestamp_ms = timearray[9]
    print(type(buf[9]))
    mp_image = mp.Image(format=ImageFormat.SRGB, data=np.stack(buf[9]))
    gesture_recognition_result = recognizer.recognize_for_video(mp_image, frame_timestamp_ms)
    # numpy_frame_from_opencv = np.stack(frames, axis=0)  # dimensions (T, H, W, C)
    # print(len(numpy_frame_from_opencv))

cv2.destroyAllWindows()
TypeError Traceback (most recent call last)
Cell In[10], line 35
33 frame_timestamp_ms = timearray[9]
34 print(type(buf[9]))
---> 35 mp_image = mp.Image(format=ImageFormat.SRGB, data=np.stack(buf[9]))
37 gesture_recognition_result = recognizer.recognize_for_video(mp_image,frame_timestamp_ms)
40 #numpy_frame_from_opencv = np.stack(frames, axis=0) # dimensions (T, H, W, C)
41
42 #print(len(numpy_frame_from_opencv))
TypeError: init(): incompatible constructor arguments. The following argument types are supported:
1. mediapipe.python._framework_bindings.image.Image(image_format: mediapipe::ImageFormat_Format, data: numpy.ndarray[numpy.uint8])
2. mediapipe.python._framework_bindings.image.Image(image_format: mediapipe::ImageFormat_Format, data: numpy.ndarray[numpy.uint16])
3. mediapipe.python._framework_bindings.image.Image(image_format: mediapipe::ImageFormat_Format, data: numpy.ndarray[numpy.float32])
Invoked with: kwargs: format=<ImageFormat.SRGB: 1>, data=array([[[113, 123, 106], ..., [ 20, 25, 23]]], dtype=uint8)
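The constructor signatures in the error message point at the fix: the keyword is image_format, not format. A hedged corrected call, continuing the snippet above (converting BGR to RGB and casting the timestamp to int are my additional assumptions about what the recognizer expects):

frame_rgb = cv2.cvtColor(buf[9], cv2.COLOR_BGR2RGB)  # OpenCV reads frames as BGR
mp_image = mp.Image(image_format=mp.ImageFormat.SRGB,
                    data=np.ascontiguousarray(frame_rgb))
gesture_recognition_result = recognizer.recognize_for_video(
    mp_image, int(frame_timestamp_ms))  # timestamp must be an int in ms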
Can you please share a small piece of code with which I can run multiclass selfie segmentation on my local machine?
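A minimal sketch of what that could look like with the Python Tasks API; the model filename, option names, and class-id handling are assumptions to verify against the current docs:

import cv2
import numpy as np
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision

# Assumes the multiclass selfie segmentation model was downloaded locally.
options = vision.ImageSegmenterOptions(
    base_options=python.BaseOptions(model_asset_path='selfie_multiclass_256x256.tflite'),
    output_category_mask=True)

with vision.ImageSegmenter.create_from_options(options) as segmenter:
    rgb = cv2.cvtColor(cv2.imread('input.jpg'), cv2.COLOR_BGR2RGB)
    result = segmenter.segment(mp.Image(image_format=mp.ImageFormat.SRGB, data=rgb))
    category_mask = result.category_mask.numpy_view()  # one class id per pixel
    # Scale class ids into a visible grayscale image for quick inspection.
    cv2.imwrite('mask.png', (category_mask * 40).astype(np.uint8))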
Dear Paul,
I hope this message finds you well. I am interested in contributing to your Android MediaPipe machine learning app development project listed in GSoC '23, and I have also sent you a proposal for it. I would like to know whether there is a repository available on GitHub.
If there is already a repository available, could you please share the link with me? Alternatively, if there is no repository, would you consider creating one so that contributors like myself can easily contribute to the project?
Thank you for considering my request. I look forward to hearing from you soon.
Hope you remember me well!
Thanking you in advance and sorry for creating an issue like this on Github.
Best regards,
Aakash
We will need to add a new tab to image classification to import images/videos. This will follow the object detection example.
Hi all, I don't know whether this repo is the right place to ask this, but since I am using the example from this repo, I will ask anyway.
So, I am trying to modify
https://github.com/googlesamples/mediapipe/tree/main/examples/image_segmentation/android
by changing
.setOutputType(ImageSegmenter.ImageSegmenterOptions.OutputType.CATEGORY_MASK)
into
.setOutputType(ImageSegmenter.ImageSegmenterOptions.OutputType.CONFIDENCE_MASK)
This is the code: (snippet elided)
I read on https://developers.google.com/mediapipe/solutions/vision/image_segmenter/android about the confidence mask (quote elided), so I print the result (code and output elided), and the values do not look like probabilities.
Note that I am using hair_segmentation.tflite, which has only two output classes: background and hair.
Please help; I cannot find any other resource on this.
We will need the top-level and app-level READMEs updated.
We currently need tests for the image classifier and object detection examples.