
monovit's People

Contributors

zxcqlf

monovit's Issues

Training from scratch on custom data

Hello Zhaocq,

First of all, thank you very much for sharing your amazing work. I managed to integrate the model into the Monodepth training pipeline and train it starting from the pre-trained weights that you provided. Even so, the results were not great: for far objects and scenes the model does not keep a smooth disparity or interpret the scene properly. The top-right image corresponds to the ZED2 output depth.
[image]

My goal is to train it from "scratch" (i.e., starting only from the ImageNet-pre-trained weights), because I would like to add a semi-supervised term to the loss function that uses ground-truth depth (LiDAR or ZED2 depth), so as to obtain metric depth and pose and keep consistency for far elements of the image, inspired by this paper: https://arxiv.org/pdf/1910.01765.pdf
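For illustration, a minimal sketch of what such a semi-supervised term could look like, assuming monodepth2-style (B, 1, H, W) depth tensors; the helper name and the weight are hypothetical, not from the repo:

import torch

def supervised_depth_loss(pred_depth, gt_depth, weight=0.1):
    """Hypothetical L1 term on pixels with a valid GT measurement.

    pred_depth, gt_depth: (B, 1, H, W); gt_depth is 0 where the
    LiDAR/ZED2 measurement is missing.
    """
    mask = gt_depth > 0                      # supervise only measured pixels
    if not mask.any():
        return pred_depth.new_zeros(())      # no valid GT in this batch
    return weight * (pred_depth[mask] - gt_depth[mask]).abs().mean()

In a monodepth2-style trainer, this term would simply be added to losses["loss"] next to the photometric and smoothness terms.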

Here I attach some sample images that I am using for training:

[images]

I am using the same parameters that you mention in the experiments section, where you explain how you trained from scratch starting from the ImageNet-pre-trained models, combined with the information provided in the Monodepth and Monodepth2 papers.
[image]

When training starts, the output for the first mini-batch is as shown in the following image:
[image]

However, the outputs for the following mini-batches of the same epoch look as follows, which suggests that the network is not training properly:
[images]

I am using the ZED2 camera, which has a baseline of 12 cm between the lenses, and I rectify the images before inference.
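For what it's worth, monodepth2 hard-codes a 0.1-unit stereo baseline in its KITTI dataloader and multiplies the predicted depth by 5.4 at test time because KITTI's true baseline is 54 cm; by the same logic, a 12 cm ZED2 baseline would need a factor of 1.2. A sketch of that convention (variable names are illustrative):

import numpy as np

NORMALISED_BASELINE = 0.1   # the fixed baseline monodepth2 bakes into stereo_T
ZED2_BASELINE_M = 0.12      # 12 cm between the ZED2 lenses

def stereo_transform(side, do_flip):
    # 4x4 pose of the opposite camera, following monodepth2's convention
    stereo_T = np.eye(4, dtype=np.float32)
    baseline_sign = -1 if do_flip else 1
    side_sign = -1 if side == "l" else 1
    stereo_T[0, 3] = side_sign * baseline_sign * NORMALISED_BASELINE
    return stereo_T

# metric depth = predicted depth * (true baseline / normalised baseline)
STEREO_SCALE_FACTOR = ZED2_BASELINE_M / NORMALISED_BASELINE  # 1.2 for the ZED2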

Given this information, would you be able to tell what I may be missing from the papers and the provided code that is preventing the model from training?

Thank you very much for your time.

Can you share more details about the evaluation with the DrivingStereo dataset?

I'm interested in evaluating monocular depth estimation methods on the DrivingStereo dataset, but since the dataset is focused on stereo matching, I'm not yet sure how to do this. The implementation of your experiments could serve as a reference for me and for other people trying to conduct similar experiments.

Thanks in advance!
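For reference, DrivingStereo provides depth maps in the KITTI format (uint16 PNGs, metres = pixel value / 256), so the usual monocular protocol should carry over: resize the predicted disparity to the GT resolution, invert it, mask the valid depth range, and median-scale. A sketch with hypothetical inputs (abs_rel and rmse shown; the other metrics follow the same pattern):

import cv2
import numpy as np

def load_drivingstereo_depth(path):
    # uint16 PNG, metres = value / 256 (KITTI convention); 0 = no measurement
    depth_png = cv2.imread(path, cv2.IMREAD_UNCHANGED)
    return depth_png.astype(np.float32) / 256.0

def evaluate_pair(pred_disp, gt_depth, min_depth=1e-3, max_depth=80.0):
    # pred_disp: network disparity (any positive scale), 2-D float array
    h, w = gt_depth.shape
    pred_depth = 1.0 / np.maximum(cv2.resize(pred_disp, (w, h)), 1e-7)
    mask = (gt_depth > min_depth) & (gt_depth < max_depth)
    pred, gt = pred_depth[mask], gt_depth[mask]
    pred *= np.median(gt) / np.median(pred)        # per-image median scaling
    pred = np.clip(pred, min_depth, max_depth)
    abs_rel = np.mean(np.abs(gt - pred) / gt)
    rmse = np.sqrt(np.mean((gt - pred) ** 2))
    return abs_rel, rmse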

An evaluation for the DrivingStereo dataset

Dear author,

Thank you for your fantastic contribution. I noticed that you report results for the DrivingStereo dataset in the paper; however, I cannot locate any corresponding code for it. DrivingStereo differs from the KITTI dataset and does not come with a loading toolkit :(

Thank you for your time and assistance!

Pose Network weights

First of all, thank you for sharing your nice work.

Could you share the weights of the pose network in addition to the depth network weights?

Thank you.

Revise the setting of the depth network

For training, please download monodepth2, replace the depth network, and revise the setting of the depth network, the optimizer and learning rate according to trainer.py.

I read the sentence above, but I don't know how to modify trainer.py. Can you help me?
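For anyone else stuck here, a minimal sketch of the substitution inside monodepth2's Trainer.__init__; the learning rates (1e-4 overall, a smaller 5e-5 for the pretrained encoder) are assumptions to be checked against MonoViT's trainer.py:

# inside monodepth2's Trainer.__init__ (sketch; the values are assumptions)
# 'networks' is monodepth2's package with MonoViT's encoder/decoder files added
import torch.optim as optim

self.models["encoder"] = networks.mpvit_small()      # MonoViT encoder
self.models["encoder"].to(self.device)

self.models["depth"] = networks.DepthDecoder()       # MonoViT depth decoder
self.models["depth"].to(self.device)
self.parameters_to_train += list(self.models["depth"].parameters())

# AdamW with a separate, smaller learning rate for the pretrained encoder
self.model_optimizer = optim.AdamW([
    {"params": self.parameters_to_train, "lr": 1e-4},
    {"params": list(self.models["encoder"].parameters()), "lr": 5e-5},
])
self.model_lr_scheduler = optim.lr_scheduler.ExponentialLR(
    self.model_optimizer, 0.9)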

The model and loaded state dict do not match exactly.

Hello,

I would like to express my gratitude for your outstanding work. I am currently working with monodepth2 and have encountered a model-mismatch error while attempting to replace the depth network. The specific error I am facing is as follows:

[image upload incomplete: 1119.png]

I would be immensely grateful if you could provide me with some suggestions or advice on how to address this issue.
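As a generic PyTorch workaround (not MonoViT-specific), one can load only the checkpoint entries whose names and shapes match the model, and inspect what was skipped:

import torch

def load_matching_weights(model, ckpt_path, device="cpu"):
    # keep only entries whose name and shape agree with the model
    ckpt = torch.load(ckpt_path, map_location=device)
    state = ckpt.get("state_dict", ckpt)   # some checkpoints nest the weights
    model_state = model.state_dict()
    filtered = {k: v for k, v in state.items()
                if k in model_state and v.shape == model_state[k].shape}
    model.load_state_dict(filtered, strict=False)
    return sorted(set(model_state) - set(filtered))  # still randomly initialised

The returned list shows which layers stayed randomly initialised, which usually points at the mismatched setting.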

Discussion: how about implementing joint CNN & Transformer on the PoseNet backbone?

Thanks for your paper and repo. I am working on self-supervised odometry estimation, and I am interested in your approach to depth prediction with local and global context.

Just asking for your opinion: if I implemented your architecture on the PoseNet side, what advantage could be expected? Thanks!!

Evaluating on a folder of images

Do you have a script equivalent to monodepth2's test_simple.py for MonoViT? Or is there a way to use your evaluate_depth.py on a folder of RGB images?
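In the meantime, a test_simple.py-style sketch for MonoViT, assuming the encoder/decoder classes and checkpoint names used elsewhere in these issues (mpvit_small, DepthDecoder, encoder.pth, depth.pth; verify against the repo):

import glob
import torch
from PIL import Image
from torchvision import transforms

import networks  # MonoViT's networks package (assumed importable)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

encoder = networks.mpvit_small().to(device).eval()
decoder = networks.DepthDecoder().to(device).eval()

def load_filtered(model, path):
    # checkpoints may carry bookkeeping keys ('height', 'width', ...);
    # keep only entries that exist in the model
    loaded = torch.load(path, map_location=device)
    model_dict = model.state_dict()
    model.load_state_dict(
        {k: v for k, v in loaded.items() if k in model_dict}, strict=False)

load_filtered(encoder, "encoder.pth")  # hypothetical checkpoint names
load_filtered(decoder, "depth.pth")

to_tensor = transforms.ToTensor()
with torch.no_grad():
    for path in sorted(glob.glob("images/*.png")):  # hypothetical folder
        img = Image.open(path).convert("RGB").resize((640, 192))
        inputs = to_tensor(img).unsqueeze(0).to(device)
        outputs = decoder(encoder(inputs))
        disp = outputs[("disp", 0)]  # monodepth2-style output dict (assumed)
        print(path, disp.shape)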

About the ablation study

In the paper, the ablation is performed by removing parts of the MPViT module, which comes from another paper. What is the reasoning behind this choice? Usually an ablation study removes the components proposed in the paper itself.

How to replace monodepth2's depth network with MonoViT's?

Hi,

Thanks for your fantastic work. I am a bit confused by the following description:
"please download monodepth2, replace the depth network, and revise the setting of the depth network, the optimizer and learning rate according to trainer.py."

  1. Does "depth network" mean replacing the folder of monodepth2 with MonoViT's?
  2. There's no in trainer.py of MonoViT.

Can you provide more info on how to do the replacement? Thanks in advance.

load checkpoint

Training stage: about "For training, please download monodepth2, replace the depth network, and revise the setting of the depth network". Can you give some details? It raises some errors for me.

Questions about the size of feature maps

Hello, author, thanks for your remarkable work.
I noticed that you changed the stride (from 2 to 1) of the second conv block in the stem to get an H/2 × W/2 feature map. After the first "Joint CNN & Transformer Layer", the feature map is downsampled by a factor of two again, to H/4 × W/4. But according to the MPViT paper, the first "Joint CNN & Transformer Layer" should not change the height and width of the feature map. Did you make any additional changes?
[images]
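For context, MPViT's stem is two 3x3 conv-BN-activation blocks, each with stride 2 (H/4 output); the modification described above only sets the second stride to 1. A sketch of the difference (layer structure and activation are assumptions, not copied from the repo):

import torch.nn as nn

def conv_bn_act(in_ch, out_ch, stride):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.Hardswish(inplace=True),
    )

# MPViT stem:   stride 2 + stride 2 -> H/4 x W/4
# MonoViT stem: stride 2 + stride 1 -> H/2 x W/2 (the change discussed above)
stem = nn.Sequential(conv_bn_act(3, 64, stride=2),
                     conv_bn_act(64, 64, stride=1))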

mmcv version problem

The following error occurs because of an mmcv version problem. How did you solve it?

ModuleNotFoundError: No module named 'mmcv._ext'

After making modifications to monodepth2, the results are significantly different from what was expected.

class Trainer:
    def __init__(self, options):
        ...

        # self.models["encoder"] = networks.ResnetEncoder(
        #     self.opt.num_layers, self.opt.weights_init == "pretrained")
        self.models['encoder'] = networks.mpvit_small()
        self.models["encoder"].to(self.device)
        # self.parameters_to_train += list(self.models["encoder"].parameters())

        self.models["depth"] = networks.DepthDecoder()
        self.models["depth"].to(self.device)
        # self.parameters_to_train += list(self.models["depth"].parameters())

        if self.use_pose_net:
            if self.opt.pose_model_type == "separate_resnet":
                self.models["pose_encoder"] = networks.ResnetEncoder(
                    self.opt.num_layers,
                    self.opt.weights_init == "pretrained",
                    num_input_images=self.num_pose_frames)

                self.models["pose_encoder"].to(self.device)
                self.parameters_to_train += list(self.models["pose_encoder"].parameters())

                self.models["pose"] = networks.PoseDecoder(
                    self.models["pose_encoder"].num_ch_enc,
                    num_input_features=1,
                    num_frames_to_predict_for=2)

            elif self.opt.pose_model_type == "shared":
                self.models["pose"] = networks.PoseDecoder(
                    self.models["encoder"].num_ch_enc, self.num_pose_frames)

            elif self.opt.pose_model_type == "posecnn":
                self.models["pose"] = networks.PoseCNN(
                    self.num_input_frames if self.opt.pose_model_input == "all" else 2)

            self.models["pose"].to(self.device)
            self.parameters_to_train += list(self.models["pose"].parameters())

        if self.opt.predictive_mask:
            assert self.opt.disable_automasking, \
                "When using predictive_mask, please disable automasking with --disable_automasking"

            self.models["predictive_mask"] = networks.DepthDecoder()
            self.models["predictive_mask"].to(self.device)
            self.parameters_to_train += list(self.models["predictive_mask"].parameters())

        # self.model_optimizer = optim.Adam(self.parameters_to_train, self.opt.learning_rate)
        # self.model_lr_scheduler = optim.lr_scheduler.StepLR(
        #     self.model_optimizer, self.opt.scheduler_step_size, 0.1)


        ####################
        ####   MonoViT  ####
        ####################
        # self.model_optimizer = optim.AdamW(self.parameters_to_train, self.opt.learning_rate)
        self.params = [{
            "params": self.parameters_to_train,
            "lr": 1e-4
            #"weight_decay": 0.01
            },
            {
            "params": list(self.models["encoder"].parameters()),
            "lr": self.opt.learning_rate
            # "weight_decay": 0.01
        }]
        self.model_optimizer = optim.AdamW(self.params)
        self.model_lr_scheduler = optim.lr_scheduler.ExponentialLR(
            self.model_optimizer, 0.9)

        ...

I made the modifications to the monodepth2 trainer according to your README, but the results I obtained after training were significantly different from those in your paper. I suspect my modifications are not correct, but I cannot find the error myself. Could you take a look and see where I may have made a mistake? Thank you very much for your open-source contribution; I hope to receive your help.

About module replacement

Hello, I would like to replace the Convolutional Block with a new module. Could you tell me which part of the code this corresponds to?

Reproduction failed

MonoViT's results are impressive, thanks for your work. But I had a little problem reproducing them.
In trainer.py, I wrote:
self.models["encoder"] = networks.mpvit_small()
self.models["encoder"].to(self.device)

self.models["encoder"].num_ch_enc = [64, 128, 216, 288, 288]
self.models["depth"] = networks.DepthDecoder()
self.models["depth"].to(self.device)
self.parameters_to_train += list(self.models["depth"].parameters())
and I did not put self.models["encoder"] in self.parameters_to_train. The learning rate and optimizer are the same as the ones you give.
Everything else keeps the same settings as monodepth2.
My environment:
torch 1.12.1+cu116 pypi_0 pypi
torchaudio 0.12.1+cu116 pypi_0 pypi
torchvision 0.13.1+cu116 pypi_0 pypi
The results obtained:
abs_rel | sq_rel | rmse  | rmse_log | a1    | a2    | a3
0.106   | 0.766  | 4.491 | 0.182    | 0.893 | 0.965 | 0.983
It is very different from your result. Can you give me some hints, or send your related files to me:
[email protected]

Which net to use for pose network?

Thanks for the great work and for open-sourcing your code!

I have a question about the pose network: which network should I use for it? According to the paper I should use a "lightweight ResNet18", but I failed to find it in the repo, and the posecnn used in monodepth2 does not seem to be a ResNet. Is there something I missed?

Best regards,
Liancheng
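For context, monodepth2's default pose network ("separate_resnet") is in fact a ResNet18 encoder taking two concatenated frames, followed by a small pose decoder, which matches the paper's "lightweight ResNet18"; PoseCNN is only used with the "posecnn" option. Using monodepth2's own classes:

import networks  # monodepth2's networks package

pose_encoder = networks.ResnetEncoder(
    18, pretrained=True, num_input_images=2)   # ResNet18 on 2 stacked frames
pose_decoder = networks.PoseDecoder(
    pose_encoder.num_ch_enc,
    num_input_features=1,
    num_frames_to_predict_for=2)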

Missing scripts

Dear authors, thanks so much for releasing your code! I was wondering whether some scripts have not yet been uploaded to the repo. What would be the expected date for the full repo to be released, so that training and evaluation can be reproduced?

Unable to reproduce MPViT-base correctly.

Dear author,

Thank you for your fantastic contribution! However, I had some problems reproducing the results of MPViT-base. I'd really appreciate it if you could help me check what the problem is :-) @zxcqlf

I evaluated my MPViT-base model on KITTI and got the following results:
[image: KITTI evaluation metrics screenshot]

I think it may be because I set num_ch_enc or ch_enc incorrectly in the depth decoder. Would you help me confirm what the correct values should be?

  1. Firstly, I modified the DepthDecoder class in hr_decoder.py by changing self.num_ch_dec to np.array([64, 64, 128, 256, 512]) as shown below.
class DepthDecoder(nn.Module):
    def __init__(self, ch_enc = [64,128,216,288,288], scales=range(4),num_ch_enc = [ 64, 64, 128, 256, 512 ], num_output_channels=1):
        super(DepthDecoder, self).__init__()
        self.num_output_channels = num_output_channels
        self.num_ch_enc = num_ch_enc
        self.ch_enc = ch_enc
        self.scales = scales
        # self.num_ch_dec = np.array([16, 32, 64, 128, 256])  # mpvit_small
        self.num_ch_dec = np.array([64, 64, 128, 256, 512])  # mpvit_base
  2. Secondly, in trainer.py, I reassigned the ch_enc and num_ch_enc arguments of DepthDecoder. It looks like this:
class Trainer:
    def __init__(self, options, ngpus_per_node=None):
        ... ...
        self.models["encoder"] = networks.mpvit_base()
        self.models["encoder"].to(self.device)
        # self.parameters_to_train += list(self.models["encoder"].parameters())

        self.models["depth"] = networks.DepthDecoder(ch_enc=[128, 224, 368, 480, 480], num_ch_enc = [128,128,256,512,1024])
        self.models["depth"].to(self.device)
        self.parameters_to_train += list(self.models["depth"].parameters())
        ... ...
  3. Finally, in evaluate_depth.py, I changed the parameters of the encoder and decoder.
def evaluate(opt,ngpus_per_node=None):
        ... ...
        encoder = networks.mpvit_base().to(device) #networks.ResnetEncoder(opt.num_layers, False)
        encoder.num_ch_enc = [128, 224, 368, 480, 480] # = networks.ResnetEncoder(opt.num_layers, False)

        depth_decoder = networks.DepthDecoder(ch_enc=[128,224,368,480,480], num_ch_enc = [128,128,256,512,1024]).to(device)
        ... ...
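One way to pin down the correct ch_enc empirically is a dummy forward pass through the encoder, reading off the channel count of each returned feature map (a generic sketch, assuming the encoder returns a list of feature maps as in the snippets above):

import torch
import networks  # MonoViT's networks package

encoder = networks.mpvit_base().eval()
with torch.no_grad():
    feats = encoder(torch.randn(1, 3, 192, 640))   # KITTI-sized dummy input
print([f.shape[1] for f in feats])            # channels per scale -> ch_enc
print([tuple(f.shape[-2:]) for f in feats])   # spatial sizes, to check strides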

As a supplement, my training loss looks like this:

[image: training loss screenshot]

Thank you for your time and assistance!
