
Comments (6)

blairmacintyre commented on May 17, 2024

Chatting with NingXin Hu, he suggested mediacapture-worker (https://w3c.github.io/mediacapture-worker/) might be a good starting point for how to structure a vision worker.

That is essentially what I have imagined: a worker-like setup where WebXR can execute custom CV code (perhaps in WebAssembly, JavaScript, or even on the GPU) for each video frame. We also have the opportunity to provide whatever additional data is needed (e.g., camera intrinsics, the pose of the camera relative to some frame of reference, and other time-synchronized sensor data such as accelerometer/gyro readings).

Some of this (sensor data) might be best provided via a separate sensor API (assuming we can leverage shared memory to share it between workers). I think (when we look at modern camera APIs) we might want to consider assuming we have things like intrinsics for each camera: at least, make this an optional field.
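To make the "optional intrinsics" idea concrete, here is a minimal sketch of what a per-frame payload could look like, together with a standard pinhole projection that a CV worker might run when intrinsics are present. All names (`VisionFrame`, `CameraIntrinsics`, `projectToPixel`) are hypothetical and not taken from any spec, and the camera is assumed to look down +z (the usual CV convention, which differs from WebXR's -z forward).

```typescript
// Hypothetical per-frame payload; none of these names come from a published spec.
interface CameraIntrinsics {
  fx: number; // focal length in pixels along x
  fy: number; // focal length in pixels along y
  cx: number; // principal point x, in pixels
  cy: number; // principal point y, in pixels
}

interface VisionFrame {
  timestamp: number;             // capture time, used to sync with other sensor data
  width: number;
  height: number;
  intrinsics?: CameraIntrinsics; // optional: not every camera can report them
}

type Vec3 = [number, number, number];

// Pinhole projection of a camera-space point to pixel coordinates.
// Assumes the camera looks down +z. Returns null when the frame has no
// intrinsics or the point is not in front of the camera.
function projectToPixel(frame: VisionFrame, p: Vec3): [number, number] | null {
  const k = frame.intrinsics;
  if (!k || p[2] <= 0) {
    return null;
  }
  return [k.fx * (p[0] / p[2]) + k.cx, k.fy * (p[1] / p[2]) + k.cy];
}
```

Code that receives a frame without intrinsics gets `null` back and can fall back to an uncalibrated path rather than failing.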

Not assuming that the camera video is simply “the video we are overlaying AR onto” is essential, I think: we want to support see-through devices with cameras (like HoloLens), multi-camera devices, and devices (like Vive) whose cameras don’t align with or cover the camera view.

We should assume that we can provide the camera pose relative to some “display” frame of reference. CV for AR (in general) has been greatly hampered by not knowing the calibrated structure of the display and sensor package, but when you look at real devices that have cameras (e.g., HoloLens, Vive, etc.), the relationships between the device coordinate system and the camera, along with the camera intrinsics, are pre-calibrated. ARKit and ARCore will also provide this information on mobile, and I assume any custom HMD will be able to provide it for any attached devices.
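As a rough illustration of why a pre-calibrated camera-to-display relationship is useful: if the platform reports the display pose in some world frame and the fixed camera pose relative to the display, the camera pose in the world frame is just the product of the two transforms. The sketch below assumes 4x4 column-major matrices; `Mat4`, `translation`, and `multiply` are hypothetical helpers written for this example.

```typescript
// 4x4 transform stored column-major: element (row r, column c) is at index c * 4 + r.
type Mat4 = number[];

// Hypothetical helper: a pure translation matrix.
function translation(x: number, y: number, z: number): Mat4 {
  return [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, x, y, z, 1];
}

// Matrix product a * b for column-major 4x4 matrices.
function multiply(a: Mat4, b: Mat4): Mat4 {
  const out: Mat4 = [];
  for (let i = 0; i < 16; i++) {
    out.push(0);
  }
  for (let c = 0; c < 4; c++) {
    for (let r = 0; r < 4; r++) {
      for (let k = 0; k < 4; k++) {
        out[c * 4 + r] += a[k * 4 + r] * b[c * 4 + k];
      }
    }
  }
  return out;
}

// Example values (made up): the display sits at (1, 2, 3) in the world frame, and
// the camera is mounted 10 cm to the right of the display origin.
const worldFromDisplay = translation(1, 2, 3);     // tracked per frame
const displayFromCamera = translation(0.1, 0, 0);  // pre-calibrated, fixed per device
const worldFromCamera = multiply(worldFromDisplay, displayFromCamera);
```

Because `displayFromCamera` is fixed per device, only the display pose has to be tracked per frame; the camera pose follows from one matrix multiply.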

from webxr-polyfill.

TrevorFSmith commented on May 17, 2024

The MediaCapture Worker doc is marked as inactive, so it probably won't help on the implementation side, but I agree that the pattern could work for this.

Yes, we need to handle camera data of varying types and FOV coverage with intrinsics to inform the CV algorithms.


blairmacintyre commented on May 17, 2024

Yes, I don't mean "use it": we don't want to use WebRTC at all, directly. What I envision, eventually, might be a way to "add in" WebRTC sources to the worker structure, but for now, I think the video sources would be accessed and configured via WebXR, because we only want ones that really have the information we need, and that we can access efficiently.

I was thinking of the patterns, yes.


huningxin commented on May 17, 2024

Agreed. I am not suggesting we take the MediaCapture Worker spec as is.

We (with Mozilla folks) previously tried to bring CV to the web. We made some progress on the MediaCapture worker for off-main-thread processing, an ImageBitmap extension for efficient access to captured image data, a MediaCapture depth extension for depth camera access, and OpenCV.js for CV algorithms on the web (asm.js at that time; it now supports wasm).

I think we can leverage the experience gained from that previous work to benefit the CV use cases in WebXR.


huningxin commented on May 17, 2024

I am thinking of two usage scenarios for camera data:

  1. upload camera data to WebGL for rendering, e.g. for HoloLens or Vive
  2. send camera data to a worker for marker detection, for ARKit/ARCore, HoloLens

The first case can be handled by the main thread. The second case needs to be handled by a worker thread.

This requires representing the camera data with an opaque handle. The handle supports uploading the image data to the GPU if the data is in CPU memory, or skipping that step if the data is already in GPU memory. The handle also supports copying the camera data to the WebAssembly heap for the CPU processing case. It should avoid the unnecessary color conversions and memory copies of the current mediastream -> video -> canvas pipeline. The ImageBitmap extension is a good fit here.
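A rough sketch of what such an opaque handle could look like, using a mock in place of real GPU or WebAssembly resources. Everything here (`CameraFrameHandle`, `MockFrameHandle`, the `uploads` counter, and the numeric texture id) is hypothetical and only illustrates the two paths described above: upload-if-needed for rendering, and a single copy into a heap-like buffer for CPU processing.

```typescript
// Hypothetical opaque handle; callers never touch the underlying pixels directly.
interface CameraFrameHandle {
  readonly byteLength: number;
  // Returns a (mock) texture id, uploading only if the data is still in CPU memory.
  uploadToGPU(): number;
  // Copies the frame bytes into a caller-provided buffer, e.g. a view of a wasm heap.
  copyToHeap(heap: Uint8Array, offset: number): void;
}

class MockFrameHandle implements CameraFrameHandle {
  readonly byteLength: number;
  uploads = 0; // counts simulated CPU -> GPU transfers, for illustration only

  constructor(private readonly pixels: Uint8Array, private location: "cpu" | "gpu") {
    this.byteLength = pixels.length;
  }

  uploadToGPU(): number {
    if (this.location === "cpu") {
      this.uploads += 1;     // a real implementation would upload the pixels here
      this.location = "gpu"; // later calls skip the upload
    }
    return 1; // stand-in texture id
  }

  copyToHeap(heap: Uint8Array, offset: number): void {
    if (offset < 0 || offset + this.pixels.length > heap.length) {
      throw new RangeError("frame does not fit in heap at the given offset");
    }
    heap.set(this.pixels, offset); // one copy, no color conversion
  }
}
```

In a browser, the heap argument would typically be a `Uint8Array` view over a `WebAssembly.Memory` buffer, and the texture id would be a real WebGL texture; both are replaced with stand-ins here.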


huningxin commented on May 17, 2024

Initiated an API sketch, mozilla/webxr-api#18, for discussion.
