
midas's Issues

How many GPUs did you use?

Hi, since MidasNet is a very large model, how many GPUs did you use, and how long did it take to train the model? Since the batch size is not large (8 for each dataset), would multi-GPU training hurt performance, given that the encoder contains many batch normalization layers?

Thanks.

What kind of activation function do you use during training?

Hi,
I see that ReLU is used during evaluation (in this repo). What about during training?
Another question: invalid pixels, such as the sky, are not included when calculating the loss.
How can we ensure that the network outputs the correct depth (zero or negative) for background regions such as the sky?

Training code

Do you plan to release your training code sometime in the future? It would be really helpful to advance the research on monocular depth estimation!

If not, can you explain how Pareto optimality is ensured during training? It seems that the training pipeline would also need an undo step: whenever the Pareto optimum is reached and the next backpropagation update disturbs this state, that update would have to be reversed.

How to convert RedWeb dataset labels to disparity [0, 1]?

Hi,
In the RedWeb dataset, the label is given as a PNG file. The nearest object has depth 0 and the background (sky or similar) has depth 255.
How do I convert it to disparity in [0, 1], as suggested in the paper?
What about other datasets, such as MegaDepth?

Is my code correct?

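import cv2
import numpy as np

eps = 0.1
# label_path points to a RedWeb label PNG (0 = nearest, 255 = background/sky)
label = cv2.imread(label_path, cv2.IMREAD_GRAYSCALE).astype(np.float32)
sky_mask = (label == 255)
disparity = 1 / (label + eps)
disparity[sky_mask] = 0
# min-max normalize to [0, 1]
disparity = (disparity - disparity.min()) / (disparity.max() - disparity.min())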

Wrong predictions?

I did a test run with a book on a flat surface. On the output the background depth isn't uniform. Is there something I'm missing?

Loss function or training code

Hi!

Thanks for sharing the source code, this is a great contribution!
I have searched in the repository, but I didn't find the loss function that you have described in the paper (considering the normalization procedure for disparity maps).
Do you have plans to share it, or even to share the training code?

Thanks.

Depth map estimated distance

Hello, the project I am working on requires estimating the distance to an object with a monocular camera. The usual method is to obtain the depth value from a depth camera and then calculate the distance using the camera_factor, but I did not see any related variable in the paper. I would like to know whether I can use the depth generated by this network to estimate distance. I am a student who has just started learning this field and hope to get some help. Thank you.

Input size during inference

Hello!

You write that during training images are "randomly cropped and resized to 384×384". If we leave augmentation aside, images are essentially resized so that the shorter dimension becomes 384. However, during inference we resize so that the longer dimension is 384.

For example, suppose all the inputs are 1,280x720. During training, if we do the central crop, then the image is downsampled by the factor 720 / 384. If the square crop is not 720x720, but rather, say, 672x672, then the image is downsampled by the factor 672 / 384. During inference, however, we take the 1,280x720 image and downsample it by the much higher factor 1280 / 384, so it becomes 384x224 (there's a tiny distortion because the shorter dimension must be divisible by 32). Instead, we could go for something like 672x384 and get input features approximately of the same size as during training.
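
To make the arithmetic above concrete, here is a minimal sketch that computes the network input size for both strategies. The function name `resized_shape` and the mode name "upper_bound" are illustrative assumptions, not taken from the repository ("lower_bound" does appear as a `resize_method` value in a transform quoted elsewhere in this listing). Rounding to the nearest multiple of 32 is assumed because it reproduces the 384x224 figure quoted above.

```python
def resized_shape(width, height, target=384, multiple=32, mode="upper_bound"):
    """Return (new_width, new_height) after an aspect-preserving resize.

    mode="upper_bound": the longer side is scaled to `target`.
    mode="lower_bound": the shorter side is scaled to `target`.
    Both sides are then rounded to the nearest multiple of `multiple`.
    """
    if mode == "upper_bound":
        scale = target / max(width, height)
    elif mode == "lower_bound":
        scale = target / min(width, height)
    else:
        raise ValueError(f"unknown mode: {mode}")

    def snap(size):
        # Round the scaled size to the nearest multiple, never below one multiple.
        return max(multiple, int(round(size * scale / multiple)) * multiple)

    return snap(width), snap(height)


print(resized_shape(1280, 720, mode="upper_bound"))  # (384, 224)
print(resized_shape(1280, 720, mode="lower_bound"))  # (672, 384)
```

With a 1280x720 input, the second call gives roughly the same object scale as a 384x384 training crop, at the cost of a larger input tensor.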

I don't know how important this is, but I would much appreciate if you could shed some light on the reasoning behind your choice.

Got unexpected keyword argument "groups"

python3 run.py
initialize
device: cuda
Loading weights: model-f46da743.pt
Downloading: "https://github.com/facebookresearch/WSL-Images/archive/master.zip" to /home/prikshet/.cache/torch/hub/master.zip
Traceback (most recent call last):
File "run.py", line 105, in <module>
run(INPUT_PATH, OUTPUT_PATH, MODEL_PATH)
File "run.py", line 29, in run
model = MidasNet(model_path, non_negative=True)
File "/home/prikshet/midas/midas/midas_net.py", line 30, in __init__
self.pretrained, self.scratch = _make_encoder(features, use_pretrained)
File "/home/prikshet/midas/midas/blocks.py", line 6, in _make_encoder
pretrained = _make_pretrained_resnext101_wsl(use_pretrained)
File "/home/prikshet/midas/midas/blocks.py", line 26, in _make_pretrained_resnext101_wsl
resnet = torch.hub.load("facebookresearch/WSL-Images", "resnext101_32x8d_wsl")
File "/usr/local/lib/python3.6/dist-packages/torch/hub.py", line 354, in load
model = entry(*args, **kwargs)
File "/home/prikshet/.cache/torch/hub/facebookresearch_WSL-Images_master/hubconf.py", line 39, in resnext101_32x8d_wsl
return _resnext('resnext101_32x8d', Bottleneck, [3, 4, 23, 3], True, progress, **kwargs)
File "/home/prikshet/.cache/torch/hub/facebookresearch_WSL-Images_master/hubconf.py", line 23, in _resnext
model = ResNet(block, layers, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'groups'

Testing code

Hi,
I'm having trouble reproducing your results at test time. Could you also share an example script (e.g., for NYU or TUM) to test your network?

No such file or directory: 'model-f46da743.pt'

Hello

I got this error. Did I put the .pt file in the wrong location? (see attachment)

initialize
device: cuda
Loading weights: model-f46da743.pt
Using cache found in C:\Users\gregb/.cache\torch\hub\facebookresearch_WSL-Images_master
Traceback (most recent call last):
File "run.py", line 105, in <module>
run(INPUT_PATH, OUTPUT_PATH, MODEL_PATH)
File "run.py", line 29, in run
model = MidasNet(model_path, non_negative=True)
File "C:\Users\gregb\Documents\Python\MiDaS\midas\midas_net.py", line 47, in __init__
self.load(path)
File "C:\Users\gregb\Documents\Python\MiDaS\midas\base_model.py", line 11, in load
parameters = torch.load(path)
File "C:\Users\gregb\anaconda3\envs\3DP\lib\site-packages\torch\serialization.py", line 525, in load
with _open_file_like(f, 'rb') as opened_file:
File "C:\Users\gregb\anaconda3\envs\3DP\lib\site-packages\torch\serialization.py", line 212, in _open_file_like
return _open_file(name_or_buffer, mode)
File "C:\Users\gregb\anaconda3\envs\3DP\lib\site-packages\torch\serialization.py", line 193, in __init__
super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'model-f46da743.pt'

Fine tuning

Hi, I want to fine-tune your model for some specific purposes. I would be very grateful if you could provide your training scripts with metrics, or give a piece of advice about which part of the model I can successfully fine-tune. Thanks!

Scale of model outputs

Hi! Great work! My question is about the model outputs. You mentioned that you learn disparity values that are shifted into the [0, 1] range. However, the values at inference are around 10000. Where does this difference come from?

The code of scale- and shift- invariant loss

Hi! Thanks for your excellent paper!
I want to reproduce your training procedure, but there is no SSI loss implementation here.
Could you share your PyTorch implementation of the scale- and shift-invariant loss?

How can I visualize the point cloud?

Hi,

Thanks for your great work. I want to know how I can visualize the point cloud from the depth image with Open3D. Regarding the camera intrinsics, I don't know what the intrinsics.json file should contain. Could you give me an example of this file?
Another question: when do you plan to release the scripts that were used to produce the dataset?

Thank you very much!
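
As a rough illustration of what the intrinsics are used for, the sketch below back-projects a depth map to a point cloud with the standard pinhole model. The function name `backproject` is hypothetical, and `fx`, `fy`, `cx`, `cy` are the focal lengths and principal point that an intrinsics file would normally carry. It assumes the input is already a metric depth map; the network output in this repository is relative inverse depth, so it would first need to be aligned and inverted.

```python
import numpy as np


def backproject(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map to an (H*W, 3) array of XYZ points.

    Uses the pinhole camera model: x = (u - cx) * z / fx, y = (v - cy) * z / fy.
    """
    h, w = depth.shape
    # u runs along columns (image x), v runs along rows (image y).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.astype(np.float64)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)


# Toy example: a flat 2x2 depth map one unit away from the camera.
points = backproject(np.ones((2, 2)), fx=1.0, fy=1.0, cx=0.0, cy=0.0)
print(points.shape)  # (4, 3)
```

The resulting array can be handed to any point cloud viewer that accepts an N x 3 array of coordinates.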

Measure distance by the depth map

How could I measure the distance by the output depth map? What is the unit of it?
From test_simple.py,

# PREDICTION
input_image = input_image.to(device)
features = encoder(input_image)
outputs = depth_decoder(features)

disp = outputs[("disp", 0)]

How could I measure the absolute distance of each pixel from the disp tensor (does disp mean disparity, or is it depth)? Thanks.

KITTI Numbers

The KITTI set reported in the paper is said to have 161 images.
(from Supplementary Material Section C,

"For KITTI we used the intersection of the official validation set for depth estimation (with improved ground-truth depth [69]) and the Eigen test split [60] (161 images)".)

I assume that is a mix of Depth Benchmark Val and Eigen Test. These seem to have only 145 common images. Could you please point me to the exact datasets that were used?

Here's the code I wrote to find the intersection

#!/usr/bin/python
import re

benchmark_val_file = "splits/benchmark/val_files.txt"
eigen_test_file = "splits/eigen/test_files.txt"

zfill=True

with open(benchmark_val_file, 'r') as f:
    val_set= set()
    for line in f.readlines():
        dir_name, img_num = line.split()[:2]
        if zfill:
            val_set.add((dir_name, img_num.zfill(10)))
        else:
            val_set.add((dir_name, img_num))

with open(eigen_test_file, 'r') as f:
    eigen_set = set() 
    for line in f.readlines():
        dir_name, img_num = line.split()[:2]
        eigen_set.add((dir_name, img_num))

print(len(val_set.intersection(eigen_set)))

Great improvement. Training code release?

The new model generates a more accurate depthmap than its previous version. Can you share the change you made? Also, is there a timeline for releasing the training code and the training data processing pipeline?

Non-cuda compatibility

Please add a switch for running the code without CUDA. It has been a real pain to refactor the code to run it on my MacBook.

Scale and shift in inference

Hello!

Suppose during inference you get values in some interval [a,b]. Then for visualization you scale and shift them into some region, say [0, 1]. Now, there are two ways to do this: (x - a) / (b - a) and (b - x) / (b - a). Naturally, you choose the first way; however, I do not actually see how we are guaranteed that the scale must be positive.

Looking at your loss function: https://gist.github.com/dvdhfnr/732c26b61a0e63a0abc8a5d769dbebd0 - you just use least squares and can easily get positive or negative scale.

Since the network has to learn ordinal relationships (near vs. far), it seems intuitive that the scale would be positive for all images if it is positive for some; however, I am not sure we are guaranteed even that. Or is it something you did during training?

Also, I see that the paper proposes using the median and mean absolute deviation instead. So, what did you end up using?
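
For reference, the median and mean-absolute-deviation normalization mentioned above can be sketched as follows. This is only an illustration of that normalization under my reading of the paper, not the authors' code; the function name `normalize_median_mad` and the `mask` argument (True for valid pixels) are assumptions. Note that the scale computed this way is non-negative by construction, so this normalization alone cannot flip near and far.

```python
import numpy as np


def normalize_median_mad(disparity, mask):
    """Shift by the median and scale by the mean absolute deviation.

    Statistics are computed over valid pixels only (mask == True),
    and the same shift and scale are applied to the whole map.
    """
    valid = disparity[mask].astype(np.float64)
    shift = np.median(valid)
    scale = np.mean(np.abs(valid - shift))
    if scale == 0:
        raise ValueError("constant input: scale is zero")
    return (disparity - shift) / scale


d = np.array([[1.0, 2.0], [3.0, 100.0]])
m = np.array([[True, True], [True, False]])  # treat the outlier as invalid
print(normalize_median_mad(d, m)[m])  # [-1.5  0.   1.5]
```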

Slow image transformation

Hi,
I don't think this is an issue, so sorry for posting it here.
Processing an image with your model is really slow. Do you have a method for speeding it up?
Sorry again for posting this as an issue, but I don't know how else to get in contact.

What does the predicted depth signify?

A "prediction" gives the following:

[[2496.0127 2495.973 2495.9888 ... 855.7698 855.57666 856.0468 ]
[2495.9575 2495.9158 2495.9329 ... 855.4036 855.20917 855.68256]
[2495.9797 2495.9387 2495.9556 ... 855.55426 855.36035 855.83234]
...
[3245.7551 3245.7756 3245.7664 ... 2852.5774 2852.4922 2852.702 ]
[3245.7275 3245.7478 3245.739 ... 2852.4827 2852.397 2852.6072 ]
[3245.7974 3245.8179 3245.809 ... 2852.7156 2852.6309 2852.8398 ]]

What are the units of these numbers: m, mm, ft? Of course, these numbers aren't disparities (since the images aren't that wide). So what do these numbers represent? How can this prediction be converted to actual depth, given the camera intrinsics?
Thanks

About getting results in meters unit

@ranftlr Thank you for the work. I'm trying to apply it with Myriad X VPU.
So I would like to ask whether the unknown scale and shift mentioned in #36 are linear parameters?
For example, for each frame, can I find a linear equation like "P = D * scale + shift" that maps the depth map values "D" to absolute physical measurements "P" by placing a ruler of known scale in the view?
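
A minimal sketch of such a per-frame fit is given below. It assumes, as the paper describes, that the prediction is inverse depth up to an unknown scale and shift, so the line is fitted against 1 / distance rather than against distance directly. The function names and the idea of marking pixels with known distances through a boolean `mask` are illustrative assumptions; at least two pixels with different known distances are needed.

```python
import numpy as np


def fit_scale_shift(pred, known_depth_m, mask):
    """Least-squares fit of scale and shift so that
    scale * pred + shift ~= 1 / known_depth_m on the masked pixels."""
    x = pred[mask].astype(np.float64)
    y = 1.0 / known_depth_m[mask].astype(np.float64)
    design = np.stack([x, np.ones_like(x)], axis=1)
    (scale, shift), *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(scale), float(shift)


def to_metric_depth(pred, scale, shift, eps=1e-6):
    """Apply the fitted line in inverse-depth space, then invert."""
    inverse_depth = np.maximum(scale * pred + shift, eps)
    return 1.0 / inverse_depth


# Synthetic check: build "ground truth" from a known scale and shift.
pred = np.array([1.0, 2.0, 4.0, 8.0])
true_depth = 1.0 / (0.1 * pred + 0.05)
mask = np.ones_like(pred, dtype=bool)
scale, shift = fit_scale_shift(pred, true_depth, mask)
metric = to_metric_depth(pred, scale, shift)
```

The fitted values are only valid for the frame they were estimated on; a different image generally needs its own scale and shift.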

How can I convert it to an ONNX model?

I am trying to convert the model to ONNX, but I get the following error:

File "to_onnx.py", line 72, in <module>
export_model(model, img_input, export_model_name)
File "to_onnx.py", line 30, in export_model
torch.onnx.export(model, input, export_model_name, verbose=False, export_params=True, opset_version=11)
File "C:\Users\yyyy\Anaconda3\envs\torchreid\lib\site-packages\torch\onnx\__init__.py", line 148, in export
strip_doc_string, dynamic_axes, keep_initializers_as_inputs)
File "C:\Users\yyyy\Anaconda3\envs\torchreid\lib\site-packages\torch\onnx\utils.py", line 66, in export
dynamic_axes=dynamic_axes, keep_initializers_as_inputs=keep_initializers_as_inputs)
File "C:\Users\yyyy\Anaconda3\envs\torchreid\lib\site-packages\torch\onnx\utils.py", line 416, in _export
fixed_batch_size=fixed_batch_size)
File "C:\Users\yyyy\Anaconda3\envs\torchreid\lib\site-packages\torch\onnx\utils.py", line 279, in _model_to_graph
graph, torch_out = _trace_and_get_graph_from_model(model, args, training)
File "C:\Users\yyyy\Anaconda3\envs\torchreid\lib\site-packages\torch\onnx\utils.py", line 236, in _trace_and_get_graph_from_model
trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph(model, args, _force_outplace=True, return_inputs_states=True)
File "C:\Users\yyyy\Anaconda3\envs\torchreid\lib\site-packages\torch\jit\__init__.py", line 277, in _get_trace_graph
outs = ONNXTracedModule(f, _force_outplace, return_inputs, return_inputs_states)(*args, **kwargs)
File "C:\Users\yyyy\Anaconda3\envs\torchreid\lib\site-packages\torch\nn\modules\module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "C:\Users\yyyy\Anaconda3\envs\torchreid\lib\site-packages\torch\jit\__init__.py", line 332, in forward
in_vars, in_desc = _flatten(args)
RuntimeError: Only tuples, lists and Variables supported as JIT inputs/outputs. Dictionaries and strings are also accepted but their usage is not recommended. But got unsupported type numpy.ndarray

to_onnx.py

import os
import glob
import torch
import utils
import cv2
import numpy as np  # needed by test_model_accuracy below

from torchvision.transforms import Compose
from models.midas_net import MidasNet
from models.transforms import Resize, NormalizeImage, PrepareForNet

import onnx
import onnxruntime

def test_model_accuracy(export_model_name, raw_output, input):    
    ort_session = onnxruntime.InferenceSession(export_model_name)

    def to_numpy(tensor):
        return tensor.detach().cpu().numpy() if tensor.requires_grad else tensor.cpu().numpy()

    # compute ONNX Runtime output prediction
    ort_inputs = {ort_session.get_inputs()[0].name: to_numpy(input)}
    ort_outs = ort_session.run(None, ort_inputs)	

    # compare ONNX Runtime and PyTorch results
    np.testing.assert_allclose(to_numpy(raw_output), ort_outs[0], rtol=1e-03, atol=1e-05)

    print("Exported model has been tested with ONNXRuntime, and the result looks good!")		

def export_model(model, input, export_model_name):
    torch.onnx.export(model, input, export_model_name, verbose=False, export_params=True, opset_version=11)	
    onnx_model = onnx.load(export_model_name)    
    onnx.checker.check_model(onnx_model)
    graph_output = onnx.helper.printable_graph(onnx_model.graph)
    with open("graph_output.txt", mode="w") as fout:
        fout.write(graph_output)
		
device = torch.device("cpu")

 # load network
model_path = "model.pt"
model = MidasNet(model_path, non_negative=True)

transform = Compose(
        [
            Resize(
                384,
                384,
                resize_target=None,
                keep_aspect_ratio=True,
                ensure_multiple_of=32,
                resize_method="lower_bound",
                image_interpolation_method=cv2.INTER_CUBIC,
            ),
            NormalizeImage(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
            PrepareForNet(),
        ]
)

model.to(device)
model.eval()

img = utils.read_image("input/line_up_00.jpg")
img_input = transform({"image": img})["image"]

# compute
#with torch.no_grad():
sample = torch.from_numpy(img_input).to(device).unsqueeze(0)
print("sample type = ", type(sample), ", shape of sample = ", sample.shape)
print(sample)	
prediction = model.forward(sample)
export_model_name = "midas.onnx"	
export_model(model, img_input, export_model_name)

Environment:

PyTorch 1.4.0 (installed via Anaconda)
OS: Windows 10, 64-bit

What does run.py script return?

Hi! I am trying to get your repository working in a simple inference mode so that I can evaluate the quality on the NYUv2 dataset. As far as I understand, your run.py script returns the inverse logarithm of depth scaled by some coefficient. Am I right? (At least that gives the best metrics, although I saw that you said you predict inverse depth.)

I also have another question: you use a ResNet network as the backbone. But, as far as I understand, you apply it to unnormalized images (i.e., images that do not have zero-mean intensities and unit standard deviations), while the ResNet was trained on normalized images. Is this right, and if so, why do you do that?

Why use inverse depth?

Hi @dvdhfnr, I don't understand why using inverse depth differs from using the original depth. Does inverse depth require some special processing? Thanks.

MiDaS-v2 was trained using the median + MAD

Thanks for your work! The paper states that MiDaS-v2 was trained using the median + MAD. Is there anything special about the implementation details? I implemented it, but the training result is much worse than with the former loss (I removed the ReLU at the end of the net). Maybe I made some mistakes. Could you release your implementation of the loss function?

Loss implementations

Hi,

I read in #43 that you did not plan on releasing the training code. Can you still share the implementations you used for the losses?

Thanks!

How to obtain the ground-truth disparity from PWC-Net

Hello,

Thanks for releasing the code. What an amazing project!
I have some questions. I cannot get ground-truth disparity maps as good as yours, and I hope you can help.

  1. How did you modify the PWC-Net code? Did you replace the 2D correlation with a 1D correlation?
  2. Did you train PWC-Net in supervised or unsupervised mode?
  3. If you trained PWC-Net in unsupervised mode, which unsupervised loss did you use?
  4. Can you release your trained PWC-Net model?
    Thanks very much.

Perry

Loss functions

Hi, the loss functions used to train MiDaS are very simple, i.e., a trimmed L1 loss and a gradient loss. Have you tried other loss functions, such as a normal loss or BerHu? Or did you try them and find that they didn't work well?

Thanks.
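
For readers unfamiliar with the trimmed L1 term mentioned above, here is a small sketch of the idea: the largest residuals are discarded before averaging, so that outliers in the ground truth have less influence. The 80% keep ratio and the 1/(2M) normalization follow my reading of the paper and should be treated as assumptions. The function name `trimmed_mae` is hypothetical, and NumPy is used here only to show the arithmetic, so this is not a drop-in training loss.

```python
import numpy as np


def trimmed_mae(pred, target, mask, keep=0.8):
    """Mean absolute error over the smallest `keep` fraction of residuals.

    pred and target are assumed to be already normalized (scale and shift
    removed); mask selects the valid pixels.
    """
    residual = np.abs(pred[mask] - target[mask])
    m = residual.size
    if m == 0:
        return 0.0
    kept = np.sort(residual)[: int(keep * m)]  # drop the largest residuals
    return float(kept.sum() / (2.0 * m))


pred = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
target = np.zeros(5)
mask = np.ones(5, dtype=bool)
print(trimmed_mae(pred, target, mask))  # 1.0 (the residual of 100 is dropped)
```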

Pytorch Errors

I got this error after running MiDaS 2.1

initialize
device: cpu
Loading weights: model-f6b98070.pt
Using cache found in C:\Users\gregb/.cache\torch\hub\facebookresearch_WSL-Images_master
Traceback (most recent call last):
File "run.py", line 151, in <module>
run(args.input_path, args.output_path, args.model_weights, args.model_type, args.optimize)
File "run.py", line 32, in run
model = MidasNet(model_path, non_negative=True)
File "C:\Users\gregb\Documents\Python\MiDaS\midas\midas_net.py", line 47, in __init__
self.load(path)
File "C:\Users\gregb\Documents\Python\MiDaS\midas\base_model.py", line 11, in load
parameters = torch.load(path, map_location=torch.device('cpu'))
File "C:\Users\gregb\anaconda3\envs\3DP\lib\site-packages\torch\serialization.py", line 527, in load
with _open_zipfile_reader(f) as opened_zipfile:
File "C:\Users\gregb\anaconda3\envs\3DP\lib\site-packages\torch\serialization.py", line 224, in __init__
super(_open_zipfile_reader, self).__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: version_ <= kMaxSupportedFileFormatVersion INTERNAL ASSERT FAILED at ..\caffe2\serialize\inline_container.cc:132, please report a bug to PyTorch. Attempted to read a PyTorch file with version 3, but the maximum supported version for reading is 2. Your PyTorch installation may be too old. (init at ..\caffe2\serialize\inline_container.cc:132)
(no backtrace available)

How important is a consistent loss to the generalizability?

Do you think the same multi-objective training with different loss functions would yield the same type of generalizable performance? In MiDaS, all training was directly supervised with the same loss function, regardless of dataset. Have you considered or evaluated what the impact of blending different losses would be? For example, take an SSIM loss like Monodepth's for training on stereo image pairs or video frames, while also training on the datasets you analyzed in this work. From the original paper, it seems like it should work (they evaluate in a truly multi-task setting), but I am curious whether the cross-dataset generalizability would hold up as well.

Did you fill the holes in the groundtruth before training midasnet?

I tried to reproduce the experimental results of MiDaS v2 but failed. The edges of instances produced by my models are not as clear as those of the official model.

I suspect that, because of the way the ground-truth depth is obtained, these 6 datasets contain many areas that are masked out when calculating the loss. Most of these areas cover the edges of objects or scenes. I thought this might be the reason.

So did you preprocess the ground-truth depth maps to fill these holes? If not, how did you deal with these holes, which are important for producing depth maps with clear edges? Did you simply mask them out?

Thanks.

FileNotFoundError: [Errno 2] No such file or directory: 'MiDaS/model.pt'

Hi, everyone!

I am glad to see this great project, but I have run into the issue above.
It happens at startup, after running python main.py --config argument.yml
The file really doesn't exist. So my next question is: where can I find this file, or how can I create it?

The full console message

(3DP) D:\Projects\Python\3d_photo>python main.py --config argument.yml
  0%|                                                                                            | 0/1 [00:00<?, ?it/s]Current Source ==>  moon
initialize
device: cpu
  0%|                                                                                            | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "main.py", line 47, in <module>
    config['MiDaS_model_ckpt'], MonoDepthNet, MiDaS_utils, target_w=640)
  File "D:\Projects\Python\3d_photo\MiDaS\run.py", line 29, in run_depth
    model = Net(model_path)
  File "D:\Projects\Python\3d_photo\MiDaS\monodepth_net.py", line 52, in __init__
    self.load(path)
  File "D:\Projects\Python\3d_photo\MiDaS\monodepth_net.py", line 88, in load
    parameters = torch.load(path)
  File "D:\Programs\miniconda3\envs\3DP\lib\site-packages\torch\serialization.py", line 525, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "D:\Programs\miniconda3\envs\3DP\lib\site-packages\torch\serialization.py", line 212, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "D:\Programs\miniconda3\envs\3DP\lib\site-packages\torch\serialization.py", line 193, in __init__
    super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'MiDaS/model.pt'

Depth in float32 in meters units

Hello! Thanks for your work!

I have two questions:

  1. What is the .pfm format and what is it used for?
  2. When opening the .png depth maps, how can I convert them to float32 values in meters?
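
Assuming the first question refers to PFM (Portable Float Map) files, the sketch below reads one into a float32 array. A PFM file has a three-line text header: "Pf" for one channel or "PF" for three, then width and height, then a scale whose sign gives the byte order. The header is followed by raw 32-bit floats stored bottom row first. The function name `read_pfm` is chosen here for illustration. Note that the values read this way are still relative, so getting meters would additionally require a known scale and shift.

```python
import numpy as np


def read_pfm(path):
    """Read a PFM file and return a float32 array with the top row first."""
    with open(path, "rb") as f:
        header = f.readline().decode("ascii").strip()
        if header not in ("PF", "Pf"):
            raise ValueError("not a PFM file")
        channels = 3 if header == "PF" else 1
        width, height = map(int, f.readline().decode("ascii").split())
        scale = float(f.readline().decode("ascii").strip())
        # A negative scale means little-endian data, positive means big-endian.
        dtype = "<f4" if scale < 0 else ">f4"
        data = np.frombuffer(f.read(), dtype=dtype, count=width * height * channels)
    shape = (height, width, 3) if channels == 3 else (height, width)
    # Rows are stored bottom-to-top, so flip vertically.
    return np.flipud(data.reshape(shape)).astype(np.float32)
```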

Can't run script from another folder

When running MiDaS from another folder:

python ../MiDaS/run.py 
initialize
device: cuda
Loading weights:  model-f46da743.pt
Using cache found in /home/3dsf/.cache/torch/hub/facebookresearch_WSL-Images_master
Traceback (most recent call last):
  File "../MiDaS/run.py", line 105, in <module>
    run(INPUT_PATH, OUTPUT_PATH, MODEL_PATH)
  File "../MiDaS/run.py", line 29, in run
    model = MidasNet(model_path, non_negative=True)
  File "/home/3dsf/MiDaS/midas/midas_net.py", line 47, in __init__
    self.load(path)
  File "/home/3dsf/MiDaS/midas/base_model.py", line 11, in load
    parameters = torch.load(path)
  File "/home/3dsf/MiDaS/envs/lib/python3.7/site-packages/torch/serialization.py", line 381, in load
    f = open(f, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: 'model-f46da743.pt'

This could be considered a feature request, I guess. Anyway, great job: I've tested it several times, and here is a Magic Eye video made using MiDaS.
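
One possible way to make the weights path independent of the current working directory is to resolve it relative to the script file rather than the shell's location. The helper name `resolve_weights` below is hypothetical and not part of the repository; inside a script it would typically be called with `__file__` as the second argument.

```python
from pathlib import Path


def resolve_weights(filename, script_file):
    """Return an absolute path to `filename` located next to `script_file`."""
    return Path(script_file).resolve().parent / filename


# With a script at /home/user/MiDaS/run.py, the weights are looked up in
# /home/user/MiDaS regardless of where python was started from.
print(resolve_weights("model-f46da743.pt", "/home/user/MiDaS/run.py"))
```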

How to get clear edges

Hi, thank you for your work!
I trained the model following your work and found that the released model produces clearer object edges. Which part of the method is responsible for the clearer edges?
Thanks in advance!

Camera Intrinsics

Here is a visualization of the combined rgb image taken with my phone and depth inferred by MiDaS.
[image: RGB image combined with MiDaS-inferred depth]

I used the intrinsic values from the phone camera, but I'm not sure that makes sense, since the projected depth is much greater than the ground truth.

Is the depth just relative within the image or should I expect to be able to achieve a realistic rough depth value?
