Comments (13)
Hi @mauronano, poses of the sensors at different timestamps are available in Research Mode. There is also an unprojection model for the depth sensor, so you can use it to get the 3D coordinates of the points in the depth map. Knowing the relative pose between the depth sensor and the RGB camera, you can then transform those 3D points into the RGB camera coordinate system and project them into the RGB camera view, which means you can create depth maps for the RGB camera view. In this case, the RGB frame and depth frame do not need to be acquired at the same time. However, if the time gap between them is too large, other issues appear, such as occlusion.
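As a concrete illustration of the steps above, here is a minimal sketch (my own, with hypothetical names; it assumes a simple pinhole intrinsic matrix for the RGB camera and that the depth points have already been unprojected) of re-projecting depth points into the RGB view:

```python
import numpy as np

def depth_to_rgb_depthmap(depth_pts_3d, T_depth_to_rgb, K_rgb, rgb_shape):
    """Project 3D points from the depth-camera frame into the RGB view,
    producing a sparse depth map in RGB image coordinates.

    depth_pts_3d   : (N, 3) points already unprojected from the depth map
    T_depth_to_rgb : (4, 4) relative pose, depth frame -> RGB frame
    K_rgb          : (3, 3) RGB camera intrinsics (pinhole model assumption)
    rgb_shape      : (height, width) of the RGB image
    """
    n = depth_pts_3d.shape[0]
    homog = np.hstack([depth_pts_3d, np.ones((n, 1))])   # homogeneous (N, 4)
    pts_rgb = (T_depth_to_rgb @ homog.T).T[:, :3]        # into RGB camera frame
    in_front = pts_rgb[:, 2] > 0                         # keep points ahead of camera
    pts_rgb = pts_rgb[in_front]
    uv = (K_rgb @ pts_rgb.T).T
    uv = uv[:, :2] / uv[:, 2:3]                          # perspective divide
    h, w = rgb_shape
    depth_map = np.zeros((h, w), dtype=np.float32)
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)
    ok = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    depth_map[v[ok], u[ok]] = pts_rgb[ok, 2]             # z-depth in RGB frame
    return depth_map
```

Note that this sketch does no z-buffering: overlapping points simply overwrite each other, which is exactly where the occlusion issue mentioned above shows up.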
from hololensforcv.
@mauronano, what @Huangying-Zhan said is exactly right. In my case I have an additional set of steps where I keep the latest frame from each of the streams, but only send the data from the last frame of each if the wearer hasn't moved much in the last second. This is mainly to prevent blurry images, but it also makes debugging a little easier. While the CameraStreamCoordinateMapper and CameraStreamCorrelation samples seem to indicate that you can directly convert from one image to the other, I found that simply not to be true, likely because those two samples use a stationary Kinect, which behaves differently. In my case I ended up doing: ColorProjectionMatrix * ColorViewMatrix * DepthSpaceToColorSpace * InverseDepthViewMatrix * DepthCameraPoint. It's a whole ordeal, but all in all a pretty standard 2D->3D->3D->2D graphical space conversion pipeline. To get the DepthSpaceToColorSpace matrix you can use the TryGetTransformTo method with the two SpatialCoordinateSystems, like they do in the MediaFrameReaderContext::FrameArrived function. Seeing in your previous question that you are using Unity, be sure to convert the matrices before doing anything with them. The Mixed Reality Toolkit has a preview sample that shows how to do that.
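As a side note on TryGetTransformTo: in the row-vector (point * M) convention these samples use, the matrix it returns can equivalently be composed from the two FrameToOrigin transforms. A minimal sketch (my own names, not repository code):

```python
import numpy as np

def depth_space_to_color_space(frame_to_origin_depth, frame_to_origin_color):
    """Compose "depth frame -> origin" with "origin -> color frame" to get
    the DepthSpaceToColorSpace matrix (row-vector convention, point * M)."""
    return frame_to_origin_depth @ np.linalg.inv(frame_to_origin_color)
```

On-device you would still call TryGetTransformTo directly; this only shows what that transform amounts to when both FrameToOrigin matrices are known.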
I would like to add that the projection returned by the streams for the color image is very wrong; I ended up needing to fake it. Once you get it dialed in, though, you can capture things as small as individual wires pretty well.
Folds in fabric come out well too. Keep in mind that the data is rough and full of holes.
@Huangying-Zhan, @FracturedShader, thank you both for your answers. Just to make sure I understand what you suggest: after I manage to get the 3D coordinates of the unprojected depth pixels (which you refer to as "DepthCameraPoint"), I use the cameraViewTransform matrix provided by the Recorder tool to map them to the depth camera space. Regarding the DepthSpaceToColorSpace, I'm not sure about the meaning of the "FrameToOrigin" matrix produced at the place in the code you pointed at; maybe grasping that would help me better understand the math behind this transformation. As for the "ColorViewMatrix", which stores the camera extrinsics, should I edit the Recorder project to provide that too? At the moment it returns only the parameters of the low-resolution RGB cameras and not the 1280x720 one.
@mauronano, unfortunately I ended up writing my own custom application. I haven't actually tried to use the Recorder project directly. As a result I don't know much about how it specifically works. You may just have to poke around and make modifications as you see fit to get the data you need.
@mauronano, suppose you have 3D points in depth camera space; if you want to get 3D points in color camera space, you need the relative pose between the depth camera and the color camera. In the recorder app there is an example of getting the absolute pose of each sensor, which you can check here. From the absolute poses you can then compute the relative pose.
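The last step can be written in one line. A sketch (hypothetical names; the pose matrices are assumed to be 4x4 camera-to-world transforms in the column-vector convention):

```python
import numpy as np

def relative_pose(pose_depth, pose_color):
    """Combine two absolute (camera -> world) poses into the relative
    transform taking points from the depth camera frame to the color
    camera frame: T_depth_to_color = inv(pose_color) @ pose_depth."""
    return np.linalg.inv(pose_color) @ pose_depth
```

If your matrices are in the HoloLens row-vector convention instead, transpose everything (or reverse the multiplication order).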
I want to thank you both very much for your answers, as I'm only getting started with computer vision and the material I find online is still pretty obscure to me. I have one last question: can I compute the alignment only once, offline, and then use it to map every depth frame onto my RGB frames, or is it something I should do continuously?
@mauronano, the depth frames and RGB frames are generally taken at different timestamps. This means that if you have many RGB-D pairs, the relative poses (between RGB and D) of these pairs are not constant, so you can't use a single transformation for all the alignments; you need to do this for each RGB-D pair. If the RGB-D pairs were always taken at the same timestamp, a single transformation would be enough for all the alignments, but unfortunately that is not the case here.
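So in practice you end up pairing each depth frame with the closest color frame in time and computing one transform per pair. A minimal sketch of the pairing step (hypothetical names; timestamps are assumed sorted ascending, in any common unit):

```python
import bisect

def match_nearest(depth_timestamps, color_timestamps, max_gap):
    """Pair each depth frame with the nearest-in-time color frame,
    discarding pairs whose time gap exceeds max_gap.
    Returns a list of (depth_index, color_index) pairs."""
    pairs = []
    for i, t in enumerate(depth_timestamps):
        j = bisect.bisect_left(color_timestamps, t)
        # candidates: the color frame just before and just after t
        best = None
        for k in (j - 1, j):
            if 0 <= k < len(color_timestamps):
                if best is None or abs(color_timestamps[k] - t) < abs(color_timestamps[best] - t):
                    best = k
        if best is not None and abs(color_timestamps[best] - t) <= max_gap:
            pairs.append((i, best))
    return pairs
```

Each surviving pair then gets its own depth-to-color transform computed from the poses at those two timestamps.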
@mauronano, were you able to convert between the two coordinate systems? I tried the mapping by going 2D depth -> world space -> 2D RGB, using the matrices provided in the CSV file from the recorder app. To project into the RGB space I am following this: https://docs.microsoft.com/en-us/windows/mixed-reality/locatable-camera. Somehow I see the point clouds for the two views, but they have some offset in each coordinate. Maybe the multiplication by 0.5 etc. given in the shader code example (above link) should not be implemented exactly as written. Do you know the correct approach? Thanks!
For anyone who comes here from Google, see my implementation of such a mapping. It might not be the best code, but it works fast and is accurate enough for my purpose.
Hi everyone,
I am working on the same problem. I cannot get my depth images and HD images aligned; the following picture visualizes the issue. I try to align a calibration pattern: first I filter the 3D depth points that belong to the calibration pattern, then I project these points into the HD image.
To figure out the problem, I investigated the meaning of the transformation matrices in some detail. In the following I outline my understanding of them and then describe how I try to align my images. I would appreciate your help very much!
1. CameraCoordinateSystem (MFSampleExtension_Spatial_CameraCoordinateSystem)
In the HoloLensForCV example this coordinate system is used to obtain the "FrameToOrigin" transformation, which is obtained by transforming the CameraCoordinateSystem to the OriginFrameOfReference (lines 140-142 in MediaFrameReaderContext.cpp).
I still do not know exactly what this transformation describes. What is meant by "frame"?
Through experimenting I found that the translation vector changes when moving, and the changes make sense: if I move forward, the z-component becomes smaller. This agrees with the coordinate system in the image below, where the z-axis points in the opposite direction of the image plane.
The same applies to moving left or right: moving right makes the x-component increase, while the y-component stays roughly constant, which makes sense since I am not moving up or down.
What I am really uncertain about is the rotational part of the transformation matrix: it is almost an identity matrix. The rotation of my head seems to be contained in the CameraViewTransform instead, which I describe in the second point.
As far as I understand, the FrameToOrigin Matrix looks as follows:
[1, 0, 0, 0,
0, 1, 0, 0,
0, 0, 1, 0,
x, y, z, 1]
To me, "FrameToOrigin" seems to describe the relation between a fixed point on the HoloLens and the origin (the origin is defined each time the app is started; this helps map each frame to a common frame of reference). In the image above, the origin is probably the "App-specific Coordinate System".
2. CameraViewTransform (MFSampleExtension_Spatial_CameraViewTransform )
The CameraViewTransform is saved directly with each frame (in contrast to FrameToOrigin, no further transformation is necessary).
The rotation of the head seems to be saved in the rotational part of this matrix. I tested this by rotating my head around the y-axis. If I turn about 90° to the right around my y-axis, the rotational part looks as follows:
[0, 0, 1,
0, 1, 0,
-1, 0, 0].
This corresponds to a 90° rotation around the y-axis, which is what we expect.
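One way to double-check the angle encoded by such a rotational part, regardless of the axis, is the trace formula theta = arccos((trace(R) - 1) / 2). (A half turn of 180° around y would instead give the rotational part diag(-1, 1, -1).)

```python
import numpy as np

# Recover the rotation angle from the rotational part above via
# theta = arccos((trace(R) - 1) / 2).
R = np.array([[0, 0, 1],
              [0, 1, 0],
              [-1, 0, 0]], dtype=float)
angle_deg = np.degrees(np.arccos((np.trace(R) - 1) / 2))
print(angle_deg)  # ~90 degrees: a quarter turn about y, not a half turn
```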
The translational part seems to stay roughly constant. This would make sense if it described the translation between a fixed point on the HoloLens and the respective camera (HD or depth). However, I would expect it to stay exactly the same, and it does not; it is only approximately equal.
If I do not turn my head (the rotational part is an identity matrix), the CameraViewTransform looks as follows:
CameraViewTransform for HD Camera
[1, 0, 0, 0,
0, 1, 0, 0,
0, 0, 1, 0,
0.00631712, -0.184793, 0.145006, 1]
CameraViewTransform for Depth Camera
[1, 0, 0, 0,
0, 1, 0, 0,
0, 0, 1, 0,
0.00798517, -0.184793, 0.0537722, 1]
So the CameraViewTransform seems to capture the rotation of the user's head. But what is captured by the translational part? If it is the offset between a fixed point on the HoloLens and the respective camera, why is it not always exactly the same?
3. CameraProjectionTransform (MFSampleExtension_Spatial_CameraProjectionTransform)
This transformation is described on the following github page:
https://github.com/MicrosoftDocs/mixed-reality/blob/5b32451f0fff3dc20048db49277752643118b347/mixed-reality-docs/locatable-camera.md
However, one thing is still unclear to me: what is the meaning of the terms A and B?
My aim is to map between the depth camera and the hd camera of the HoloLens. To do this I do the following:
- I record images with the Recorder Tool of the HoloLensForCV sample
- I take a depth image and look for the corresponding hd image by checking the timestamps.
- I use the unprojection mapping to find the 3D points in the CameraViewSpace of the Depth Camera.
- I transform the 3D points from the depth camera view to the HD camera view and project them onto the image plane. I use the following transformations:
Pixel Coordinates = [3D depth point, 1] * inv(CameraViewTransform_Depth) * FrameToOrigin_Depth * inv(FrameToOrigin_HD) * CameraViewTransform_HD * CameraProjectionTransform_HD
These pixel coordinates are in the range -1 to 1 and need to be adjusted to the 720x1280 image size, as follows:
x_rgb = 1280 * (PixelCoordinates.x + 1) / 2;
y_rgb = 720 * (1 - ((PixelCoordinates.y +1)/2));
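Put together, the chain above can be sketched as a single function (my own shorthand names, not repository code; row-vector convention with the point on the left):

```python
import numpy as np

def depth_to_hd_pixel(p_depth, view_d, f2o_d, f2o_hd, view_hd, proj_hd):
    """Depth camera view -> depth frame -> origin -> HD frame -> HD view
    -> clip space -> 720x1280 pixel coordinates (row-vector convention)."""
    p = np.append(p_depth, 1.0)                      # homogeneous row vector
    clip = (p @ np.linalg.inv(view_d) @ f2o_d
              @ np.linalg.inv(f2o_hd) @ view_hd @ proj_hd)
    ndc = clip[:2] / clip[3]                         # perspective divide to [-1, 1]
    x_rgb = 1280 * (ndc[0] + 1) / 2
    y_rgb = 720 * (1 - (ndc[1] + 1) / 2)             # flip y: NDC up vs. image rows down
    return x_rgb, y_rgb
```

With a real CameraProjectionTransform the w component of clip comes from the point's depth, so the divide by clip[3] matters; with identity matrices a point on the optical axis lands in the image center.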
Result: when transforming my detections from the depth camera to the HD camera image, the objects (in this case the calibration pattern) are not 100% aligned, so I am trying to figure out where the misalignment comes from. Am I misunderstanding the transformation matrices, or has anyone experienced similar problems?
The problem might occur if the spatial mapping of the HoloLens is not working 100% correctly, which can happen if the HoloLens cannot find enough features to map the room. I therefore tested my setup in different rooms, especially smaller rooms with more clutter in the background (so that the HoloLens can find more features), but the problem still occurs. As outlined above, the rough appearance of the transformations seems correct, and I have no idea how to test the matrices further to pin down the problem.
I would appreciate your help very much! Thanks a lot in advance!
Lisa
Hello. I haven't tried it myself, but it looks like you're very close to the solution. Could the error come from which matrices you are using for the reprojection? (Each camera has its own CameraViewTransform and FrameToOriginTransform, even if they are very close to each other.)
Also, if it helps, I personally found it useful to check the ArUco sample in the HoloLensForCV project to understand how to use these matrices. In the attachment I've extracted the important lines; in this case they already have the correspondence and are triangulating the 3D point back to world coordinates. Let us know if you find the solution, I'd appreciate it.
2DPairTo3DPipelineHololens_arucoSample.pdf
I thought it might be better to open a new case (#119), as this one is already closed. I found a solution for the correct alignment, which I noted down in its comments. The exact content of the transformation matrices is still unclear to me, so case #119 is still open.