As described in immersive-web/webxr#394, as we move away from allowing AR content inline, there's still a desire to let developers build 2D UI with the DOM for cases like phone AR, where displaying DOM elements together with the AR stream should be fairly trivial. (Previously the same effect would have been achieved by using the fullscreen API on a parent element that contained both the AR canvas and some overlaid DOM elements.) Given that we want to take advantage of the web as our platform whenever possible, it would be incredibly unfortunate to lose this ability in pursuit of an explicit AR mode.
For reference, consider this image of Pokemon Go:
If it were built with WebXR, the animal (and probably the ball at the bottom of the screen, since it has an associated throwing animation) would ideally be rendered with WebGL as part of the session's core rAF loop, while the other UI elements would be standard DOM elements that the UA simply composites over the AR content. (The name of the animal floating over its head is a special case that I'll address shortly.)
However, while the core need is to retain DOM support for phone AR, there's also a desire to potentially surface that DOM content on headsets, with the idea being that the AR content would be fully immersive while the UA composites in the DOM portion somehow to ensure that the user can still access it. Exactly how that would appear is an open question, and one we probably want to leave up to the UA to avoid prescribing unproven UX patterns. Some possibilities I could see are:
- Having the overlay DOM be a floating, moveable window in space
- Attaching it to your wrist
- Pinning it to a wall
- Showing/hiding it with the push of a button
Doing so would likely be considered a "compatibility" mode, and would definitely not be a path we'd encourage for developers explicitly targeting headset AR. The big benefit is that it would let AR content built for the more common devices (phone AR) remain accessible on more advanced devices, immediately increasing the amount of content available to them.
That said, there have also been concerns that supporting DOM like this in headsets could be difficult for some platforms, or that it would be hard to make it a good user experience. As such, I'm reluctant to say that supporting a DOM overlay should be required for all devices that support AR. Regardless, I believe we should offer the right signals and tools in all cases so that developers can explicitly create experiences optimized for whichever devices they choose to support.
So this issue is simply for discussing how we should go about supporting those overlays and what availability guarantees that mode should have.
Some other considerations:
- In my mind this is explicitly different from a DOM-to-texture or DOM layer solution, which would primarily be aimed at showing DOM content in a developer-controlled way in 3D space. While you certainly could use such a mechanism to achieve the same effect, it involves many more technical, security, and ergonomics issues, doesn't feel like the right fit for simple "I just want a couple of DOM buttons" cases, and its complexity would likely prevent us from shipping any time soon.
- As pointed out by @blairmacintyre at the AR F2F, it would be ideal to ensure that the UA can optimize for cases where DOM UI isn't desired, skipping the DOM tree and compositor processing it would otherwise have to do.
- We expect that developers will attempt to align DOM elements with 3D-rendered elements, probably involving gratuitous amounts of matrix math and CSS 3D transforms. See, for example, the name floating above the animal's head in the image above. Problematically, this kind of alignment would probably work OK for phone AR but be next to useless in headset AR, especially if the developer has no sense of where the DOM content is relative to the AR content. It's an open question for me whether we want to accept this as inevitable and encourage this type of use by providing spatial mapping functions, or whether we should discourage it for the sake of headset AR by explicitly making it difficult to reverse-engineer the spatial mapping between DOM and AR content.
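To make the alignment concern concrete, this is roughly the math a developer would write to pin a DOM label above a 3D object: project a world-space point through the view and projection matrices into CSS pixels of a full-screen overlay. This is a minimal sketch, not a proposed API. It assumes column-major 4x4 matrices as WebGL and WebXR use (for example an `XRView`'s `projectionMatrix`, with the view matrix taken from `view.transform.inverse.matrix`), and the helper names `transformVec4` and `projectToCssPixels` are hypothetical.

```typescript
// A 4x4 matrix stored in column-major order, the layout WebGL and WebXR use.
type Mat4 = ArrayLike<number>;
type Vec4 = [number, number, number, number];

// Multiply a column-major 4x4 matrix by a homogeneous column vector.
function transformVec4(m: Mat4, v: Vec4): Vec4 {
  const [x, y, z, w] = v;
  return [
    m[0] * x + m[4] * y + m[8] * z + m[12] * w,
    m[1] * x + m[5] * y + m[9] * z + m[13] * w,
    m[2] * x + m[6] * y + m[10] * z + m[14] * w,
    m[3] * x + m[7] * y + m[11] * z + m[15] * w,
  ];
}

interface CssPoint {
  x: number; // pixels from the left edge of the overlay
  y: number; // pixels from the top edge of the overlay
}

// Hypothetical helper: project a world-space point (e.g. a spot just above
// the animal's head) into CSS pixel coordinates of a full-screen overlay.
// Returns null if the point is behind the camera.
function projectToCssPixels(
  worldPoint: [number, number, number],
  viewMatrix: Mat4,
  projectionMatrix: Mat4,
  cssWidth: number,
  cssHeight: number,
): CssPoint | null {
  // World space -> eye space -> clip space.
  const eye = transformVec4(viewMatrix, [worldPoint[0], worldPoint[1], worldPoint[2], 1]);
  const clip = transformVec4(projectionMatrix, eye);
  if (clip[3] <= 0) {
    return null; // Behind the camera; there is no sensible screen position.
  }
  // Perspective divide to normalized device coordinates in [-1, 1].
  const ndcX = clip[0] / clip[3];
  const ndcY = clip[1] / clip[3];
  // NDC -> CSS pixels. CSS y grows downward, so flip the y axis.
  return {
    x: ((ndcX + 1) / 2) * cssWidth,
    y: ((1 - ndcY) / 2) * cssHeight,
  };
}
```

On phone AR a developer might then apply the result every frame with something like `label.style.transform = \`translate(${p.x}px, ${p.y}px)\``. Note the assumption baked into the sketch: the overlay is a flat rectangle that coincides with the camera image. If a headset UA instead floats the overlay in space or attaches it to the user's wrist, the computed pixels no longer correspond to anything the user sees, which is exactly why this pattern breaks down there.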