Right now, there is no way to request the Reality camera data in order to do computer

Stub out the API for getting access to camera data for computer vision about webxr-polyfill HOT 6 OPEN

mozilla commented on May 17, 2024

Stub out the API for getting access to camera data for computer vision

from webxr-polyfill.

Comments (6)

blairmacintyre commented on May 17, 2024

Chatting with NingXin Hu, he suggested mediacapture-worker (https://w3c.github.io/mediacapture-worker/) might be a good starting point for how to structure a vision worker.

That is essentially what I have imagined: a worker-like setup where WebXR can execute custom CV code (perhaps in WebAssembly, JavaScript or even on the GPU) for each video frame. We have the opportunity to provide whatever necessary data we want (e.g., camera intrinsics, pose of camera relative to some frame of reference, other time-synchronized sensor data such as accelerometers/gyros, etc.) as well.
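
To make the per-frame data concrete, here is a minimal TypeScript sketch of what such a frame record and handler could look like. Every name in it (`CameraIntrinsics`, `VisionFrame`, `VisionFrameHandler`, `projectToPixel`) is hypothetical and not part of WebXR or any shipped API; it also assumes a pinhole camera looking down +Z and a single-channel 8-bit image.

```typescript
// Hypothetical shapes for illustration only; not a real WebXR interface.
interface CameraIntrinsics {
  fx: number; // focal length along x, in pixels
  fy: number; // focal length along y, in pixels
  cx: number; // principal point x, in pixels
  cy: number; // principal point y, in pixels
}

interface VisionFrame {
  timestamp: number;              // capture time, in milliseconds
  width: number;
  height: number;
  pixels: Uint8Array;             // assumed grayscale, one byte per pixel
  intrinsics?: CameraIntrinsics;  // optional: not every camera reports them
  cameraPose: number[];           // 4x4 column-major matrix in some reference frame
}

// The callback a vision worker would run once per video frame.
type VisionFrameHandler = (frame: VisionFrame) => void;

// Pinhole projection of a point in camera space (camera looks down +Z)
// to pixel coordinates. Returns null for points at or behind the camera.
function projectToPixel(
  k: CameraIntrinsics,
  p: [number, number, number]
): [number, number] | null {
  const [x, y, z] = p;
  if (z <= 0) return null;
  return [k.fx * (x / z) + k.cx, k.fy * (y / z) + k.cy];
}

// A trivial per-frame computation, standing in for real CV code.
function meanIntensity(frame: VisionFrame): number {
  if (frame.pixels.length === 0) return 0;
  let sum = 0;
  for (const value of frame.pixels) sum += value;
  return sum / frame.pixels.length;
}
```

A CV routine that has the intrinsics can relate 3D points to pixels directly, which is why it seems worth carrying them in the frame even as an optional field.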

Some of this (sensor data) might be best provided via a separate sensor API (assuming we can leverage shared memory to share it between workers). I think (when we look at modern camera APIs) we might want to consider assuming we have things like intrinsics for each camera: at least, make this an optional field.

Having the camera video not just be assumed to be “the video we are overlaying AR onto” is essential, I think: we want to support see-through devices with cameras (like HoloLens), multi-camera devices, and devices (like Vive) that have cameras that don’t align/cover the camera view.

We should assume that we can provide the camera pose relative to some “display” frame of reference. CV for AR (in general) has been greatly hampered by not knowing the calibrated structure of the display and sensor package, but when you look at real devices (e.g., HoloLens, Vive, etc.) that have cameras, the relationships between the device coordinate system and camera, along with the camera intrinsics, are pre-calibrated. ARKit and ARCore will also provide this information on mobile, and I assume any custom HMD will be able to provide it for any attached devices.
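
As a rough illustration of why a pre-calibrated camera-to-display transform is useful, the sketch below composes it with a tracked display pose to get the camera pose in world space. It assumes 4x4 column-major matrices (the layout WebGL uses); `Mat4`, `multiply`, `translation` and `worldFromCamera` are names made up for this example.

```typescript
// 4x4 matrix stored as 16 numbers in column-major order (an assumption
// for this sketch, chosen to match the WebGL convention).
type Mat4 = number[];

// Standard matrix product a * b for column-major 4x4 matrices.
function multiply(a: Mat4, b: Mat4): Mat4 {
  const out: Mat4 = new Array<number>(16).fill(0);
  for (let col = 0; col < 4; col++) {
    for (let row = 0; row < 4; row++) {
      let sum = 0;
      for (let k = 0; k < 4; k++) {
        sum += a[k * 4 + row] * b[col * 4 + k];
      }
      out[col * 4 + row] = sum;
    }
  }
  return out;
}

// Pure translation matrix, used here to keep the example easy to check.
function translation(x: number, y: number, z: number): Mat4 {
  return [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, x, y, z, 1];
}

// worldFromDisplay: tracked pose of the display, updated every frame.
// displayFromCamera: fixed offset of the camera on the device, calibrated once.
function worldFromCamera(worldFromDisplay: Mat4, displayFromCamera: Mat4): Mat4 {
  return multiply(worldFromDisplay, displayFromCamera);
}

// Example: display at (1, 1.5, 0); camera mounted 0.25 above and 0.5 in front of it.
const cameraInWorld = worldFromCamera(
  translation(1, 1.5, 0),
  translation(0, 0.25, -0.5)
);
```

Only the display pose changes per frame; the camera offset is a constant the device can report once.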

TrevorFSmith commented on May 17, 2024

The MediaCapture Worker doc is marked as inactive, so probably not going to help on the implementation side, but I agree that the pattern is one that could work for this.

Yes, we need to handle camera data of varying types and FOV coverage with intrinsics to inform the CV algorithms.

blairmacintyre commented on May 17, 2024

Yes, I don't mean "use it": we don't want to use WebRTC at all, directly. What I envision, eventually, might be a way to "add in" WebRTC sources to the worker structure, but for now, I think the video sources would be accessed and configured via WebXR, because we only want ones that really have the information we need, and that we can access efficiently.

I was thinking of the patterns, yes.

huningxin commented on May 17, 2024

Agreed. I don't suggest taking the MediaCapture Worker spec as is.

We (with Mozilla folks) previously tried to bring CV to the web. We made some progress on MediaCapture Worker for off-main-thread processing, an ImageBitmap extension for efficient access to captured image data, a MediaCapture depth extension for depth camera access, and OpenCV.js for CV algorithms on the web (asm.js at that time, now with wasm support).

I think we can leverage the experience gained from that previous work to benefit the CV use cases in WebXR.

huningxin commented on May 17, 2024

I am thinking of two use scenarios of camera data:

1. upload camera data to WebGL for rendering, e.g. for HoloLens or Vive
2. send camera data to a worker for marker detection, for ARKit/ARCore, HoloLens

The first case can be handled by the main thread. The second case needs to be handled by a worker thread.

This requires representing the camera data with an opaque handle. The handle supports uploading the image data to the GPU if the data is in CPU memory, and skips the upload if the data is already in GPU memory. The handle also supports copying the camera data to the WebAssembly heap for the CPU processing case. It should avoid the unnecessary color conversion and memory copies of the current mediastream -> video -> canvas pipeline. The ImageBitmap extension is a good fit here.
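
A minimal sketch of the handle idea, with stand-ins for the real pieces: `CameraFrameHandle` and its methods are hypothetical names, the GPU upload is a caller-supplied callback rather than an actual WebGL call, and the "WebAssembly heap" is just a `Uint8Array` such as a view over a module's memory.

```typescript
// Hypothetical opaque handle; callers never touch the pixel storage directly.
class CameraFrameHandle {
  private constructor(
    private onGPU: boolean,
    private cpuPixels: Uint8Array | null
  ) {}

  // Frame that was delivered in CPU memory.
  static fromCPU(pixels: Uint8Array): CameraFrameHandle {
    return new CameraFrameHandle(false, pixels);
  }

  // Frame that the platform already holds as a GPU texture.
  static fromGPU(): CameraFrameHandle {
    return new CameraFrameHandle(true, null);
  }

  // Uploads only when the data is not yet on the GPU.
  // Returns true if an upload was actually performed.
  uploadToGPU(upload: (pixels: Uint8Array) => void): boolean {
    if (this.onGPU || this.cpuPixels === null) return false;
    upload(this.cpuPixels);
    this.onGPU = true;
    return true;
  }

  // Copies the pixels into a heap view at the given byte offset and
  // returns the number of bytes written. A GPU-only frame would need a
  // readback, which this sketch does not model.
  copyToHeap(heap: Uint8Array, offset: number): number {
    if (this.cpuPixels === null) {
      throw new Error("frame is GPU-only; readback not modelled in this sketch");
    }
    heap.set(this.cpuPixels, offset);
    return this.cpuPixels.length;
  }
}
```

With this shape, the rendering path calls `uploadToGPU` and pays for a copy only when one is needed, while the worker path calls `copyToHeap` once and hands the offset to its CV code.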

huningxin commented on May 17, 2024

Initiated an API sketch, mozilla/webxr-api#18, for discussion.
