Comments (9)

borongyuan commented on May 24, 2024

Good news, HF-Net now works on OAK cameras. Performance in preliminary tests looks pretty good, slightly faster than SuperPoint with 320×200 input. If the global head is removed, it has acceptable real-time performance even with 640×400 input. I won't provide this cropped model because we definitely want to use the full HF-Net. You can download the converted blob file from my new repository.
https://github.com/Factor-Robotics/depthai-hfnet/raw/main/blobs/hfnet_200x320_5shave.blob

It should be noted that I did not use the previous ONNX file for conversion, but took a different route: I converted the TensorFlow model directly to OpenVINO IR and then compiled it into a blob file. I guess this may avoid some model conversion problems, such as the FP16 overflow issue that the SuperPoint model ran into with L2 Norm. Below is the OpenVINO IR visualized in Netron. In the previous ONNX file, L2 Norm was broken down into a series of basic operations, while in OpenVINO IR it is a single NormalizeL2 layer.
[Images: Netron graph views, "saved_model" and "xml"]
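
To illustrate why a decomposed L2 Norm can be fragile in FP16, here is a minimal sketch in pure Python. It emulates FP16 rounding with the standard `struct` module. The helper names (`to_fp16`, `l2_normalize_naive`, `l2_normalize_prescaled`) and the toy activations are made up for this illustration, and it does not reproduce what OpenVINO or the OAK device actually execute.

```python
import math
import struct


def to_fp16(x):
    """Round x to the nearest FP16 value; out-of-range values become +/-inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except (OverflowError, struct.error):
        return math.copysign(math.inf, x)


def l2_normalize_naive(v):
    """Square -> ReduceSum -> Sqrt -> Div, rounding every intermediate to FP16."""
    acc = 0.0
    for x in v:
        acc = to_fp16(acc + to_fp16(x * x))
    norm = to_fp16(math.sqrt(acc))
    if norm == 0.0:
        return list(v)
    return [to_fp16(x / norm) for x in v]


def l2_normalize_prescaled(v):
    """Same steps, but divide by the largest magnitude first to keep squares small."""
    peak = max(abs(x) for x in v)
    if peak == 0.0:
        return list(v)
    return l2_normalize_naive([to_fp16(x / peak) for x in v])


activations = [300.0, 300.0, 300.0, 300.0]   # 300^2 = 90000 exceeds the FP16 max of 65504
naive = l2_normalize_naive(activations)
prescaled = l2_normalize_prescaled(activations)
print(naive)       # the squares overflow to inf, so every output collapses to 0.0
print(prescaled)   # four equal values normalize to 0.5 each
```

A fused layer such as NormalizeL2 can, in principle, keep the intermediate sum at higher precision, which would be one reason it behaves better than the decomposed graph; whether it does so on this hardware is not confirmed here.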

I've added the local head part in #1193 and it looks good. It is also easy to add the global head part, but I am a little unclear about your previous definition of GlobalDescriptor. I know how to fill data_, but what exactly do type_ and info_ refer to? Should type=0 be for cmr_lidarloop and type=1 for HF-Net/NetVLAD? Or should type=0 be for all descriptors in vector form?

from rtabmap.

borongyuan commented on May 24, 2024

I have updated the blob files. The real problem with the global head is caused by NormalizeL2 in OpenVINO: elements after intra-normalization are all 1 or -1. This is not data overflow or underflow, but rather a failure to sum along the specified dim. I managed to work around the problem, and now all outputs look reasonable.
However, there seems to be another issue with this part. HF-Net's intra-normalization seems to use the wrong dim (ethz-asl/hfnet#71). Even so, the model may still give correct results, because it is distilled from NetVLAD rather than trained directly; it is just running in an unexpected way. We will verify the results later.
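
The all-1-or-minus-1 symptom is what an L2 normalization produces when the sum over the axis never happens, since each element is then divided by its own magnitude. A toy sketch in plain Python, assuming a layout of one row per cluster (the function names and values are illustrative, not taken from HF-Net or OpenVINO):

```python
import math


def normalize_per_cluster(residuals, eps=1e-12):
    """Intended behaviour: scale each cluster's residual vector to unit length."""
    out = []
    for row in residuals:
        norm = math.sqrt(sum(x * x for x in row))
        out.append([x / max(norm, eps) for x in row])
    return out


def normalize_without_reduction(residuals, eps=1e-12):
    """Broken behaviour: no sum over the axis, so each x is divided by |x|."""
    return [[x / max(math.sqrt(x * x), eps) for x in row] for row in residuals]


residuals = [[3.0, -4.0], [0.5, 2.0]]   # 2 clusters x 2 dims, toy values
good = normalize_per_cluster(residuals)
bad = normalize_without_reduction(residuals)
print(good)   # each row now has unit length
print(bad)    # [[1.0, -1.0], [1.0, 1.0]]
```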

hellovuong commented on May 24, 2024

Hi, I am also looking at implementing a global descriptor for rtabmap. Some repos have already implemented HF-Net with TensorRT, which I tested. Should we consider that? It should not be too hard to add to the rtabmap codebase. I just don't know how to integrate it with RTAB-Map's memory management for image retrieval.

matlabbe commented on May 24, 2024

As explained in #1105, we added an interface to feed a global descriptor (can be any format, see also rtabmap_python to easily compress/uncompress numpy matrices) at the same time as image or lidar data to the rtabmap node. For images, the easiest way would be to make a node combining the image topics into a RGBDImage topic (including the global descriptor field), connected as input to the rtabmap node (with subscribe_rgbd:=true). The global descriptor will be saved in the database for each node added to the map's graph.

Currently, there is no internal loop closure detector based on global descriptors. An external loop closure detector can get the global descriptors of all nodes in WM (and LTM) by calling the service /rtabmap/get_map_data (with graphOnly=true, you get only features data to avoid downloading all images). To handle memory management (when used), that external loop closure node would also need to subscribe to the rtabmap/mapData topic to get the GlobalDescriptor linked to the latest added node (ID) or any nodes retrieved from LTM that were not downloaded on start. When a loop closure is detected, you can call the /rtabmap/add_link service to add the constraint to the internal graph. This is roughly how the cmr_lidarloop approach did it with a lidar global descriptor.
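
The bookkeeping described above could be sketched roughly as follows, without any ROS dependency. Everything here is hypothetical: the class, its methods, the thresholds, and the `add_link` callback are stand-ins for the service and topic handling, and what a real /rtabmap/add_link request must contain is not covered.

```python
import math


class ExternalLoopDetector:
    """Keeps one global descriptor per node ID and proposes loop closures."""

    def __init__(self, add_link, min_similarity=0.8, min_id_gap=10):
        self.add_link = add_link          # stand-in for a call to /rtabmap/add_link
        self.min_similarity = min_similarity
        self.min_id_gap = min_id_gap      # ignore nodes added just before the query
        self.descriptors = {}             # node ID -> unit-length descriptor

    @staticmethod
    def _unit(v):
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / norm for x in v]

    def load_map(self, nodes):
        """Seed from a one-off download (stand-in for /rtabmap/get_map_data)."""
        for node_id, descriptor in nodes.items():
            self.descriptors[node_id] = self._unit(descriptor)

    def on_map_data(self, node_id, descriptor):
        """Handle one node from the mapData topic: match it, then store it."""
        query = self._unit(descriptor)
        best_id, best_score = None, -1.0
        for other_id, other in self.descriptors.items():
            if abs(node_id - other_id) < self.min_id_gap:
                continue
            score = sum(a * b for a, b in zip(query, other))   # cosine similarity
            if score > best_score:
                best_id, best_score = other_id, score
        self.descriptors[node_id] = query
        if best_id is not None and best_score >= self.min_similarity:
            self.add_link(node_id, best_id, best_score)
            return best_id
        return None


links = []
detector = ExternalLoopDetector(lambda a, b, score: links.append((a, b)))
detector.load_map({1: [1.0, 0.0, 0.0], 2: [0.0, 1.0, 0.0]})
detector.on_map_data(50, [0.9, 0.1, 0.0])   # close to node 1
print(links)   # [(50, 1)]
```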

Back in the day, when we added the GlobalDescriptor table in the database, the goal was indeed to add NETVLAD global descriptor support to improve/combine with the actual loop closure detection done inside RTAB-Map (currently based only on local visual descriptors, a.k.a. bags-of-words approach). Currently the external loop closure detection seems to be the most flexible for any loop closure detection approach, though it requires ROS/ROS2. I guess to do it internally in standalone version we would need to add python global descriptor approach (like we did for external python ML local keypoints/descriptors or for ML feature matching) to avoid re-implementing in c++ every new ML global descriptors coming up. It has been a while since I read on the subject, but is there a common way to do global descriptor matching in current state-of-the-art that could be used with many global descriptors? Is just a naive nearest neighbor approach between global descriptors enough to find the closest ones? If so, it could be worth putting in the time to implement it inside RTAB-Map so that we can better integrate loop closure hypotheses between global descriptors and BOW (combine them or switch between them).

I think we would not have to modify the memory management approach, as it is based on the current actual loop closure hypotheses to know which ones to retrieve from LTM first (loop closure hypotheses would already include the score of the global descriptor matching).

borongyuan commented on May 24, 2024

It has been a while since I read on the subject, but is there a common way to do global descriptor matching in current state-of-the-art that could be used with many global descriptors? Is just a naive nearest neighbor approach between global descriptors enough to find the closest ones?

In fact, my understanding of this part was fairly vague before. I just tend to introduce deep learning methods in loop closure detection, because they should be able to adapt to changing environments better. For odometry I tend to use traditional methods because this part is well-modeled. I'm also curious about what form the global descriptor can take. cmr_lidarloop and VLAD both use feature vectors, and KNN is used for matching. I did a rough search and there didn't seem to be any better options for matching. The summary here can serve as a reference. Therefore, I think we can first implement global retrieval based on KNN.
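
As a rough idea of what KNN retrieval over global descriptors looks like, here is a brute-force sketch in plain Python. The `knn` function, the Euclidean metric, and the 3-element toy descriptors are assumptions for illustration; a KD-tree library would replace the linear scan for larger maps.

```python
import heapq
import math


def knn(query, database, k=3):
    """Return the k (distance, node ID) pairs closest to query, nearest first."""
    scored = ((math.dist(query, vec), node_id) for node_id, vec in database.items())
    return heapq.nsmallest(k, scored)


database = {
    1: [0.0, 0.0, 1.0],
    2: [0.0, 1.0, 0.0],
    3: [0.1, 0.0, 0.9],
}
neighbors = knn([0.0, 0.0, 1.0], database, k=2)
print([node_id for _, node_id in neighbors])   # [1, 3]
```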

My plan is to add nanoflann support first #906. This should help both with performance and cross-platform usage. Then I can add ANMS along the way (#1127).

I guess to do it internally in standalone version we would need to add python global descriptor approach (like we did for external python ML local keypoints/descriptors or for ML feature matching) to avoid re-implementing in c++ every new ML global descriptors coming up.

Using onnxruntime seems to be much more convenient. But I'm new to dotnet.
[Screenshot: onnxruntime.ai]

borongyuan commented on May 24, 2024

My plan is to add nanoflann support first #906.

The biggest obstacle to adding nanoflann is maintaining compatibility of parameters. Adding it to the end of GMS can lead to confusing logic. Any suggestions?

matlabbe commented on May 24, 2024

I think matching global descriptors with KNN could be a good first step. For the nanoflann implementation, I left a new comment on #906 on how that could be integrated.

matlabbe commented on May 24, 2024

That looks great! Note that #1193 has been in Draft mode for a while; just checking whether that is on purpose or whether it is ready to be reviewed/merged.

For GlobalDescriptor:

int type_;
cv::Mat info_;
cv::Mat data_;

These fields don't have a fixed purpose yet. The CMR loop closure detector is using type=0 on their side. I would keep type=0 for types that rtabmap cannot handle internally (e.g., user-defined external descriptors like CMR). We could officially set the NETVLAD descriptor as type=1. The info field is optional; it was used for CMR to describe the vectors inside the data field. For NETVLAD, we could just set the type=1 and data fields.
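
To make the proposed convention concrete, here is a small Python mirror of the three fields. It is only a sketch: the class name, the constant names, and the helper are hypothetical, and type=1 for NETVLAD is the proposal from this comment rather than an established value.

```python
from dataclasses import dataclass, field
from typing import List

TYPE_USER_DEFINED = 0   # external descriptors rtabmap does not interpret (e.g. CMR)
TYPE_NETVLAD = 1        # proposed in this thread, not an official constant


@dataclass
class GlobalDescriptorSketch:
    type: int
    data: List[float]
    info: List[float] = field(default_factory=list)   # optional, may stay empty


def make_netvlad_descriptor(vector):
    """Fill only type and data, leaving info empty as suggested for NETVLAD."""
    return GlobalDescriptorSketch(type=TYPE_NETVLAD, data=list(vector))


desc = make_netvlad_descriptor([0.1, 0.2, 0.3])
print(desc.type, len(desc.data), desc.info)   # 1 3 []
```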

borongyuan commented on May 24, 2024

I filled the global descriptor. Unfortunately this part of the data still looks wrong. All values are close to 0.015625. This means that a numerical overflow occurred somewhere. I need to debug the model again. My main suspicion is the ReduceSum layer. It's often dangerous in FP16 inference. ReduceSum was also generated when L2 Norm was decomposed in the previous SuperPoint model.
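
One way to read the 0.015625 value: it equals 1/64, i.e. 1/sqrt(4096). Assuming the global descriptor has 4096 elements and ends with an L2 normalization (both are assumptions here, since the thread does not state them), that is exactly what comes out when every element entering the final normalization has the same magnitude. The sketch below only checks this arithmetic; it does not identify which layer is at fault.

```python
import math

DIM = 4096   # assumed descriptor length, not stated in this thread


def l2_normalize(v):
    """Scale v to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


flat = l2_normalize([1.0] * DIM)   # any constant magnitude gives the same result
print(flat[0])                     # 0.015625
print(1.0 / math.sqrt(DIM))        # 0.015625
```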

I submitted the code first because, after debugging the model, I will only need to update the blob file. The model may require scaling in some parts.
