
Comments (13)

Huangying-Zhan commented on May 22, 2024

Hi @mauronano, poses of the sensors at different timestamps are available in research mode, and there is an unprojection model for the depth sensor as well. Using the unprojection model, you can get the 3D coordinates of the points in the depth map. Knowing the relative pose between the depth sensor and the RGB camera, you can then transform the 3D points into the RGB camera coordinate system and project them into the RGB camera view, which means you can create depth maps for the RGB camera view. In this case, the RGB frame and the depth frame do not need to be acquired at the same time. However, if the time gap between them is too large, other issues such as occlusion come into play.
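For illustration, here is a minimal NumPy sketch of that idea. It assumes you have a per-pixel unprojection lookup table `unproj_uv` giving the (x, y) coordinates on the z = 1 plane for each depth pixel (this is the role of the unprojection model mentioned above), and that depth values are radial distances along the pixel ray; both the table format and the depth convention should be checked against the research-mode documentation for your sensor.

```python
import numpy as np

def unproject_depth(depth, unproj_uv):
    """Turn a depth image into 3D points in the depth-camera frame.

    depth:     (H, W) depth values in metres
    unproj_uv: (H, W, 2) per-pixel (x, y) on the z = 1 plane (unprojection model)
    """
    rays = np.dstack([unproj_uv, np.ones_like(depth)]).astype(float)  # (H, W, 3) ray directions
    rays /= np.linalg.norm(rays, axis=2, keepdims=True)               # normalise to unit length
    points = rays * depth[..., None]                                  # scale by radial depth
    return points[depth > 0]                                          # keep only valid pixels

def depth_points_to_rgb_frame(points_depth, T_rgb_from_depth):
    """Move Nx3 points from the depth-camera frame into the RGB-camera frame
    using a 4x4 relative pose (column-vector convention)."""
    hom = np.hstack([points_depth, np.ones((len(points_depth), 1))])  # Nx4 homogeneous points
    return (T_rgb_from_depth @ hom.T).T[:, :3]
```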


FracturedShader commented on May 22, 2024

@mauronano, what @Huangying-Zhan said is exactly right. In my case I have an additional set of steps where I keep the latest frame from each of the streams, but only send the data from the last frame of each if the wearer hasn't moved much in the last second. This is mainly to prevent blurry images, but it also makes debugging a little easier.

While the CameraStreamCoordinateMapper and CameraStreamCorrelation samples seem to indicate that you can directly convert from one image to the other, I found this to simply not be true, likely because those two samples use a stationary Kinect, which behaves differently. In my case I ended up doing: ColorProjectionMatrix * ColorViewMatrix * DepthSpaceToColorSpace * InverseDepthViewMatrix * DepthCameraPoint. It's a whole ordeal, but all in all a pretty standard 2D -> 3D -> 3D -> 2D graphical space conversion pipeline.

To get the DepthSpaceToColorSpace matrix you can use the TryGetTransformTo method with the two SpatialCoordinateSystems, like they do in the MediaFrameReaderContext::FrameArrived function. Since your previous question shows you are using Unity, be sure to convert the matrices before doing anything with them. The Mixed Reality Toolkit has a preview sample that shows how to do that.
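As a minimal sketch of that chain in the column-vector convention (point on the right), with all matrices assumed to be 4x4 and already brought into a common handedness, and with names mirroring the comment above rather than any particular API:

```python
import numpy as np

def depth_point_to_color_pixel(depth_cam_point, depth_view, depth_to_color,
                               color_view, color_proj, width, height):
    """ColorProjection * ColorView * DepthSpaceToColorSpace * inv(DepthView) * p,
    followed by the perspective divide and the NDC-to-pixel mapping."""
    p = np.append(depth_cam_point, 1.0)                       # homogeneous 3D point
    clip = color_proj @ color_view @ depth_to_color @ np.linalg.inv(depth_view) @ p
    ndc = clip[:2] / clip[3]                                   # perspective divide -> [-1, 1]
    u = (ndc[0] + 1.0) * 0.5 * width                           # x: [-1, 1] -> [0, width]
    v = (1.0 - (ndc[1] + 1.0) * 0.5) * height                  # flip y: image origin is top-left
    return u, v
```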


FracturedShader commented on May 22, 2024

I would like to add that the projection returned by the streams for the color image is very wrong; I ended up needing to fake it. Once you get it dialed in, though, you can capture things as small as individual wires pretty well.

[Image: Thin wire]
[Image: Junction box]

Folds in fabric come out pretty well too. Keep in mind that the data is rough and full of holes.

[Image: Shopping bag]


maurosyl commented on May 22, 2024

@Huangying-Zhan, @FracturedShader, thank you both for your answers. Just to make sure I understand what you suggest: after I manage to get the 3D coordinates of the unprojected depth pixels (which you refer to as "DepthCameraPoint"), I use the CameraViewTransform matrix provided by the Recorder tool to map them to the depth camera space. Regarding DepthSpaceToColorSpace, I'm not sure about the meaning of the "FrameToOrigin" matrix produced at the place in the code you pointed at; maybe grasping that could help me better understand the math behind this transformation. As for the "ColorViewMatrix", which stores the camera extrinsics, should I edit the Recorder project to provide that too? At the moment it returns only the parameters of the low-resolution RGB cameras and not the 1280x720 one.


FracturedShader commented on May 22, 2024

@mauronano, unfortunately I ended up writing my own custom application. I haven't actually tried to use the Recorder project directly. As a result I don't know much about how it specifically works. You may just have to poke around and make modifications as you see fit to get the data you need.


Huangying-Zhan commented on May 22, 2024

@mauronano, suppose you have 3D points in depth camera space; if you want to get the corresponding 3D points in color camera space, you need the relative pose between the depth camera and the color camera. The recorder app contains an example of getting the absolute pose of each sensor, which you can check from here. From the absolute poses you can then compute the relative pose.
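For example, if `T_world_from_depth` and `T_world_from_color` are the 4x4 absolute poses of the two sensors (camera-to-world, column-vector convention; the names here are just placeholders for whatever the recorder exposes), a sketch of the relative pose is:

```python
import numpy as np

def relative_pose(T_world_from_depth, T_world_from_color):
    """Transform that maps points from depth-camera space into color-camera space:
    T_color_from_depth = inv(T_world_from_color) @ T_world_from_depth."""
    return np.linalg.inv(T_world_from_color) @ T_world_from_depth
```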


maurosyl commented on May 22, 2024

I want to thank you both very much for your answers, as I'm only just getting into computer vision and the material I find online is still pretty obscure to me. I have one last question: can I compute the alignment once, offline, and then use it to map every depth frame onto my RGB frames, or is it something I should do continuously?


Huangying-Zhan commented on May 22, 2024

@mauronano, the depth frames and RGB frames are generally taken at different timestamps. This means that, if you have many RGB-D pairs, the relative poses (between RGB and D) for these pairs are not constant, so you can't use a single transformation for all the alignments; you need to compute it for each RGB-D pair. If the RGB-D pairs were always taken at the same timestamp, a single transformation would be enough for all the alignments. Unfortunately, that is not the case here.
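In practice that just means recomputing the relative pose for every matched pair, for example by pairing each depth frame with the RGB frame closest in time. A small sketch, assuming hypothetical per-sensor lists of (timestamp, 4x4 camera-to-world pose) tuples:

```python
import numpy as np

def per_pair_transforms(depth_poses, color_poses, max_gap=0.05):
    """For each depth frame, find the nearest-in-time RGB frame and compute a
    depth-to-color transform for that specific pair."""
    pairs = []
    for t_d, T_world_from_depth in depth_poses:
        t_c, T_world_from_color = min(color_poses, key=lambda pc: abs(pc[0] - t_d))
        if abs(t_c - t_d) <= max_gap:                          # skip pairs too far apart in time
            T_color_from_depth = np.linalg.inv(T_world_from_color) @ T_world_from_depth
            pairs.append((t_d, T_color_from_depth))
    return pairs
```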


pranaabdhawan commented on May 22, 2024

@mauronano, were you able to convert between the two coordinate systems? I tried the mapping by going from 2D depth -> world space -> 2D RGB, using the matrices provided by the CSV files in the recorder app. I am following this to project into the RGB space: https://docs.microsoft.com/en-us/windows/mixed-reality/locatable-camera. Somehow I see the point clouds for the two views, but they have some offset in each coordinate. Maybe the multiplication by 0.5 etc. as given in the shader code example (above link) should not be implemented exactly as written. Do you know the correct approach? Thanks!
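For what it's worth, the 0.5 scaling in that shader snippet is just the usual mapping from projection space ([-1, 1] on both axes) to pixel coordinates with a y-flip; a minimal sketch, with the 1280x720 color resolution taken as an assumption:

```python
def projected_to_pixel(x_proj, y_proj, width=1280, height=720):
    """Map projection-space coordinates in [-1, 1] to pixel coordinates,
    flipping y because the image origin is at the top-left corner."""
    x_px = width * (x_proj + 1.0) / 2.0
    y_px = height * (1.0 - (y_proj + 1.0) / 2.0)
    return x_px, y_px
```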


cyberj0g commented on May 22, 2024

For anyone who comes here from Google, see my implementation of such a mapping. It might not be the best code, but it is fast and accurate enough for my purpose.


LisaVelten commented on May 22, 2024

Hi everyone,

I am working on the same problem. I cannot get my depth images and HD images aligned; the following picture visualizes the problem. I try to align a calibration pattern: first I filter the 3D depth points that belong to the calibration pattern, then I project these points into the HD image.

[Image: CalibrationPattern_ViewPointLeft]

To figure out the problem I investigated the meaning of the transformation matrices in some detail. In the following I outline my understanding of them and then describe how I try to align my images. I would appreciate your help very much!

1. CameraCoordinateSystem (MFSampleExtension_Spatial_CameraCoordinateSystem)
In the HoloLensForCV sample this coordinate system is used to obtain the "FrameToOrigin" transformation, which is computed by transforming the CameraCoordinateSystem into the OriginFrameOfReference (lines 140-142 in MediaFrameReaderContext.cpp).

I still do not exactly know what is described by this transformation. What is meant by "frame"?

Through experimenting I found that the translation vector changes when I move, and the changes make sense: if I move forward, the z-component becomes smaller. This agrees with the coordinate system in the image below, where the z-axis points opposite to the viewing direction.

[Image: coordinatesystems]

The same applies to moving left or right: moving right makes the x-component increase. The y-component stays roughly constant, which makes sense as I am not moving up or down.
What I am really uncertain about is the rotational part of the transformation matrix: it is almost an identity matrix. The rotation of my head seems to be contained in the CameraViewTransform, which I describe in the second point.

As far as I understand, the FrameToOrigin Matrix looks as follows:
[1, 0, 0, 0,
0, 1, 0, 0,
0, 0, 1, 0,
x, y, z, 1]

To me, "FrameToOrigin" seems to describe the relation between a fixed point on the HoloLens and the origin (the origin is defined each time the app is started, which helps to map each frame to a common frame of reference). In the image above the origin is probably the "App-specific Coordinate System".

2. CameraViewTransform (MFSampleExtension_Spatial_CameraViewTransform)
The CameraViewTransform is saved directly with each frame (in contrast to FrameToOrigin, no further transformation is necessary).

The rotation of the head seems to be stored in the rotational part of this matrix. I tested this by rotating my head around the y-axis. If I turn about 180° to the right around my y-axis, the rotational part looks as follows:
[0, 0, 1,
0, 1, 0,
-1, 0, 0].
This corresponds to a rotation around the y-axis, as expected.

The translational part seems to stay roughly constant. This would make sense if it described the translation between the fixed point on the HoloLens and the respective camera (HD or depth). However, I would expect the translational part to stay exactly the same, and this is not the case: it is only approximately constant.

If I do not turn my head (rotational part is an Identity Matrix) the CameraViewTransform looks as follows:

CameraViewTransform for HD Camera
[1, 0, 0, 0,
0, 1, 0, 0,
0, 0, 1, 0,
0.00631712, -0.184793, 0.145006, 1]

CameraViewTransform for Depth Camera
[1, 0, 0, 0,
0, 1, 0, 0,
0, 0, 1, 0,
0.00798517, -0.184793, 0.0537722, 1]

So the CameraViewTransform seems to capture the rotation of the user's head. What is captured by the translational part? If the translational part is the distance between a fixed point on the HoloLens and the respective camera, why is it not always exactly the same?

3. CameraProjectionTransform (MFSampleExtension_Spatial_CameraProjectionTransform)
This transformation is described on the following github page:
https://github.com/MicrosoftDocs/mixed-reality/blob/5b32451f0fff3dc20048db49277752643118b347/mixed-reality-docs/locatable-camera.md

However, it is still unclear to me what the terms A and B mean.

My aim is to map between the depth camera and the HD camera of the HoloLens. To do this I do the following:

  1. I record images with the Recorder tool of the HoloLensForCV sample.
  2. I take a depth image and look for the corresponding HD image by checking the timestamps.
  3. I use the unprojection mapping to find the 3D points in the camera view space of the depth camera.
  4. I transform the 3D points from the depth camera view to the HD camera view and project them onto the image plane, using the following transformation (a NumPy sketch of this chain is given below):
    Pixel Coordinates = [3D depth point, 1] * inv(CameraViewTransform_Depth) * FrameToOrigin_Depth * inv(FrameToOrigin_HD) * CameraViewTransform_HD * CameraProjectionTransform_HD
These pixel coordinates are in the range -1 to 1 and need to be adjusted to the 1280x720 image size. This is done as follows:
 x_rgb = 1280 * (PixelCoordinates.x + 1) / 2;
 y_rgb = 720 * (1 - ((PixelCoordinates.y + 1) / 2));
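For concreteness, here is a minimal NumPy sketch of that chain in the row-vector convention used above (point on the left), with the perspective divide by w written out explicitly; the matrix names are placeholders for the values read from the recorder output:

```python
import numpy as np

def depth_point_to_hd_pixel(p_depth, view_depth, frame_to_origin_depth,
                            frame_to_origin_hd, view_hd, proj_hd,
                            width=1280, height=720):
    """Row-vector convention (p' = p * M), matching the formula above."""
    p = np.append(p_depth, 1.0)                                # homogeneous 3D point, 1x4
    chain = (np.linalg.inv(view_depth) @ frame_to_origin_depth
             @ np.linalg.inv(frame_to_origin_hd) @ view_hd @ proj_hd)
    q = p @ chain                                              # homogeneous projected point
    x_proj, y_proj = q[0] / q[3], q[1] / q[3]                  # perspective divide -> [-1, 1]
    x_rgb = width * (x_proj + 1.0) / 2.0
    y_rgb = height * (1.0 - (y_proj + 1.0) / 2.0)              # flip y: image origin top-left
    return x_rgb, y_rgb
```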

Result: when I transform my detections from the depth camera to the HD camera, the objects (in this case the calibration pattern) are not 100% aligned, so I am trying to figure out where the misalignment comes from. Am I misunderstanding the transformation matrices, or has anyone experienced similar problems?

The problem might occur if the spatial mapping of the HoloLens is not working correctly, which can happen if the HoloLens cannot find enough features to map the room. I therefore tested my setup in different rooms, especially smaller rooms with more clutter in the background (so that the HoloLens can find more features), but the problem still occurs. As outlined above, the rough structure of the transformations seems correct, and I have no idea how to test the transformation matrices further to pin down the problem.

I would appreciate your help very much! Thanks a lot in advance!
Lisa


cxnvcarol commented on May 22, 2024

Hello. I haven't tried it myself, but it looks like you're very close to the solution. Could the error come from which matrices you are using for the reprojection? (Each camera has its own CameraViewTransform and FrameToOriginTransform, even if they're very close to each other.)

Also, if it helps, I have personally found it useful to check the ArUco sample in the HoloLensForCV project to understand how to use these matrices. In the attachment I've extracted the important lines; in that case they already have the correspondence and they triangulate the 3D point back to world coordinates. Let us know if you find the solution, I'd appreciate it.
2DPairTo3DPipelineHololens_arucoSample.pdf


LisaVelten commented on May 22, 2024

I thought it might be better to open a new case (#119), as this one is already closed. I found a solution for the correct alignment, which I noted down in the comments there. The exact content of the transformation matrices is still unclear to me, so case #119 is still open.

