epro-pnp-v2's People

Contributors

lakonik

epro-pnp-v2's Issues

inference is non-deterministic?

Hi!

I've been doing inference with the following script, using the previous repo and this config:

from mmcv.parallel import MMDataParallel
from mmdet.datasets import build_dataloader
from epropnp_det.datasets.builder import build_dataset
from epropnp_det.apis.inference import init_detector
from mmcv import Config
import torch
from mmdet.apis import set_random_seed

set_random_seed(0, deterministic=True)

config_file = 'configs/epropnp_det_basic.py'
checkpoint_file = '/path/to/checkpoint/file'
device = 'cuda:0'
cfg = Config.fromfile(config_file)
distributed = False
samples_per_gpu = cfg.data.val.pop('samples_per_gpu', 1)
samples_per_gpu = 1
dataset = build_dataset(cfg.data.val)
model = init_detector(cfg, checkpoint_file, device=device)
model.test_cfg['debug'] = ['orient']
model = MMDataParallel(model, device_ids=[0])

data_loader = build_dataloader(
    dataset,
    samples_per_gpu=samples_per_gpu,
    workers_per_gpu=cfg.data.workers_per_gpu,
    dist=distributed,
    shuffle=False)

for i, data in enumerate(data_loader):
    with torch.no_grad():
        result = model(return_loss=False, rescale=True, **data)
    print(result[0]["orient_logprob"][0].shape)
    print(result[0]["bbox_results"][0].shape)
    print(result[0]["bbox_3d_results"][0].shape)
    print("------------------------------------")
    if i == 20:
        break

print('2nd for cycle')

for i, data in enumerate(data_loader):

    with torch.no_grad():
        result = model(return_loss=False, rescale=True, **data)
    print(result[0]["orient_logprob"][0].shape)
    print(result[0]["bbox_results"][0].shape)
    print(result[0]["bbox_3d_results"][0].shape)
    print("------------------------------------")

    logprob = result[0]["orient_logprob"]
    bbox_3d = result[0]["bbox_3d_results"]
    if i == 20:
        break

This way I'm printing the shapes of the results for cars in each image. The first dimension of each shape corresponds to the number of detected objects in the image. I noticed that, despite setting the seed, I sometimes (in every run of 2×20 iterations so far) get a different number of detections between the two passes over the same dataloader (separated by print('2nd for cycle')).

Outputs for the above script:
FIRST ITERATION:

(1, 128)
(1, 5)
(1, 20)
------------------------------------
(0, 128)
(0, 5)
(0, 20)
------------------------------------
(5, 128)
(5, 5)
(5, 20)
------------------------------------
(33, 128)
(33, 5)
(33, 20)
------------------------------------
(14, 128)
(14, 5)
(14, 20)
------------------------------------
(2, 128)
(2, 5)
(2, 20)
------------------------------------
(5, 128)
(5, 5)
(5, 20)
------------------------------------
(0, 128)
(0, 5)
(0, 20)
------------------------------------
(4, 128)
(4, 5)
(4, 20)
------------------------------------
(35, 128)
(35, 5)
(35, 20)
------------------------------------
(12, 128)
(12, 5)
(12, 20)
------------------------------------
(1, 128)
(1, 5)
(1, 20)
------------------------------------
(2, 128)
(2, 5)
(2, 20)
------------------------------------
(0, 128)
(0, 5)
(0, 20)
------------------------------------
(5, 128)
(5, 5)
(5, 20)
------------------------------------
(33, 128)
(33, 5)
(33, 20)
------------------------------------
(15, 128)
(15, 5)
(15, 20)
------------------------------------
(0, 128)
(0, 5)
(0, 20)
------------------------------------
(3, 128)
(3, 5)
(3, 20)
------------------------------------
(0, 128)
(0, 5)
(0, 20)
------------------------------------
(3, 128)
(3, 5)
(3, 20)

SECOND ITERATION:

(1, 128)
(1, 5)
(1, 20)
------------------------------------
(0, 128)
(0, 5)
(0, 20)
------------------------------------
(6, 128)
(6, 5)
(6, 20)
------------------------------------
(32, 128)
(32, 5)
(32, 20)
------------------------------------
(15, 128)
(15, 5)
(15, 20)
------------------------------------
(2, 128)
(2, 5)
(2, 20)
------------------------------------
(5, 128)
(5, 5)
(5, 20)
------------------------------------
(0, 128)
(0, 5)
(0, 20)
------------------------------------
(5, 128)
(5, 5)
(5, 20)
------------------------------------
(34, 128)
(34, 5)
(34, 20)
------------------------------------
(12, 128)
(12, 5)
(12, 20)
------------------------------------
(1, 128)
(1, 5)
(1, 20)
------------------------------------
(2, 128)
(2, 5)
(2, 20)
------------------------------------
(0, 128)
(0, 5)
(0, 20)
------------------------------------
(4, 128)
(4, 5)
(4, 20)
------------------------------------
(32, 128)
(32, 5)
(32, 20)
------------------------------------
(12, 128)
(12, 5)
(12, 20)
------------------------------------
(0, 128)
(0, 5)
(0, 20)
------------------------------------
(3, 128)
(3, 5)
(3, 20)
------------------------------------
(0, 128)
(0, 5)
(0, 20)
------------------------------------
(3, 128)
(3, 5)
(3, 20)

As you can see, over these iterations there are 8 differences in the detected object counts. Only one difference is larger than 1: 12 instead of 15.

What could be the cause of this? Maybe the non-deterministic nature of the PnP solver?
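
For reference, one way to test that hypothesis would be to reset the RNG state immediately before each pass, so that both passes consume the same random stream. This is only a minimal sketch reusing model and data_loader from the script above; if the random sampling inside the solver is the source, the two passes should then agree exactly:

import torch
from mmdet.apis import set_random_seed

# Sketch: if the differences come from random sampling inside the solver
# (e.g. the Monte Carlo / AMIS steps), re-seeding right before each pass
# should make both passes consume identical RNG streams and match.
def run_pass(model, data_loader, max_iters=20):
    counts = []
    for i, data in enumerate(data_loader):
        with torch.no_grad():
            result = model(return_loss=False, rescale=True, **data)
        counts.append(result[0]["bbox_3d_results"][0].shape[0])  # detections per image
        if i == max_iters:
            break
    return counts

set_random_seed(0, deterministic=True)
first = run_pass(model, data_loader)
set_random_seed(0, deterministic=True)   # reset RNG state before the 2nd pass
second = run_pass(model, data_loader)
print(first == second)
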
Thanks in advance for the help!

monitoring perspective inference?

Hello author, your work is great! My custom dataset consists of scenes captured from a surveillance (monitoring) viewpoint, i.e. a tilted camera. I want to run inference on these images. Is there a way to do this just by modifying the configuration? (I do not have the corresponding annotations for retraining, so I cannot retrain at this time.)

Object detector training code

Hi there,
Thanks for sharing all the work and code, very interesting.

I was wondering if you have any code for training the object detectors mentioned in your papers? This repo only contains the code for the "second" stage of your work.

Kind regards,
Chris

coordinate issue

Dear hansheng,

Thanks for the second version. I am wondering how to change the coordinates in the following setting: x3d is fixed, and x2d and w2d are predicted. Currently the x2d is [0,1,2,....]×[0,1,2,....], while x3d and x2d lie within [0,1]. In my case, x3d is fixed as [0,1,2,....]×[0,1,2,....]×[depth]; how should I normalize it? Thank you very much.
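
To make the setup concrete, this is roughly the kind of per-axis rescaling I have in mind (just a sketch; W, H and max_depth are placeholder values, not names from the repo):

import torch

# Hypothetical sketch: a fixed x3d grid of [0..W-1] x [0..H-1] x [depth],
# rescaled so that every coordinate lies in [0, 1].
W, H, max_depth = 64, 48, 80.0
us, vs = torch.meshgrid(torch.arange(W, dtype=torch.float32),
                        torch.arange(H, dtype=torch.float32), indexing='ij')
depth = torch.full_like(us, 40.0)                     # placeholder constant depth
x3d = torch.stack([us / (W - 1), vs / (H - 1), depth / max_depth], dim=-1)
print(x3d.shape)   # (W, H, 3), every coordinate now in [0, 1]
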

Why is CLS_ORIENTATION False for barriers in dataset?

I see that in the code:

CLS_ORIENTATION = [True, True, True, True, True, True, True, True, False, False]

CLS_ORIENTATION is set to false for barriers. Why is it this way?

nuScenes states about the TP metrics in the detection task:

We omit measurements for classes where they are not well defined: AVE for cones and barriers since they are stationary; AOE of cones since they do not have a well defined orientation; and AAE for cones and barriers since there are no attributes defined on these classes.

So orientation should be computed for barriers too.

Or did I misunderstand the meaning of CLS_ORIENTATION?

mini-dataset output JSON file

Thank you for the exciting paper and for providing the code!
I need the JSON output file of the mini dataset for my bachelor project; could you please share it with me?
Here is my email:
[email protected]

How to understand the weight w2d?

w2d : Shape (num_obj, num_points, 2)
I'm sorry, but after reading the papers and code for a long time, I still haven't understood the physical meaning of w2d.
For 2D matching, the weight is a score of shape (num_obj, num_points, 1), which represents the matching probability between a pair of 2D feature points. But why does this 3D-2D weight have two columns, and what is their specific meaning?
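
My current guess (which may well be wrong, so please correct me) is that the two columns weight the x and y components of each point's reprojection residual separately, roughly like the sketch below (project_fn and the cost form are placeholders, not code from the repo):

import torch

# Sketch of my current reading: column 0 of w2d weights the x residual and
# column 1 weights the y residual, so the PnP cost becomes an anisotropic
# weighted least squares over the 2D points.
def weighted_reproj_cost(x3d, x2d, w2d, pose, project_fn):
    # x3d: (num_obj, num_points, 3); x2d, w2d: (num_obj, num_points, 2)
    proj = project_fn(x3d, pose)                  # (num_obj, num_points, 2)
    residual = (proj - x2d) * w2d                 # weight x and y separately
    return residual.pow(2).sum(dim=(-2, -1))      # one scalar cost per object
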

Question about image shapes

Hello!
I have some questions about image shapes:

  • I see that the images coming out of the dataloader after the pipelines are at 1600x672 resolution. But the backbone is a ResNet-101 pretrained on ImageNet, which I thought accepts 224x224 images. If that were true, the images would be resized by the backbone while the ground truths stayed at the original scale, which confuses me. For example:
    I see in the code that the center predictions in the FCOS head are based on strides, so they would correspond to the 224x224 images, but the GT 2D centers come from the 1600x672 annotations, so they wouldn't match.
    So how does this work? My intuition is that ResNet isn't actually restricted to 224x224 here, but I couldn't find any evidence (see the sketch after this list).

  • In multiple places the code makes it seem as if the images in a batch are not of the same shape (but that can't be the case, right?):

    img_shapes = cam_intrinsic.new_tensor([img_meta['img_shape'][:2] for img_meta in img_metas])
    ori_shapes = cam_intrinsic.new_tensor([img_meta['ori_shape'][:2] for img_meta in img_metas])

    This part of the code I really don't understand because I think 'batch_input_shape' and 'img_shape' are always the same here, so this will be an all-zero mask:
    with default_timers['FCOS head forward time']:
        batch_size = mlvl_feats[0].size(0)
        input_img_h, input_img_w = img_metas[0]['batch_input_shape']
        img_masks = mlvl_feats[0].new_ones(
            (batch_size, input_img_h, input_img_w))
        for img_id in range(batch_size):
            img_h, img_w, _ = img_metas[img_id]['img_shape']
            img_masks[img_id, :img_h, :img_w] = 0
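
Regarding the first point above, here is a quick check I tried (a sketch only, using torchvision's stock ResNet-101 as a stand-in for the actual backbone and assuming a recent PyTorch/torchvision, so the exact numbers are just illustrative):

import torch
import torchvision

# The backbone is fully convolutional, so it accepts any input resolution;
# nothing gets resized to 224x224 internally. (Stand-in model, not the repo's.)
backbone = torchvision.models.resnet101(weights=None)
features = torch.nn.Sequential(*list(backbone.children())[:-2]).eval()  # drop avgpool/fc

with torch.no_grad():
    feat = features(torch.randn(1, 3, 672, 1600))   # a padded 1600x672 input
print(feat.shape)   # torch.Size([1, 2048, 21, 50]) -- stride 32, spatial dims scale with input

# FCOS-style heads place one point per feature location; multiplying by the
# stride maps it back to input-image pixels, so predictions can be compared
# directly with the 1600x672 ground-truth centers.
stride = 32
ys, xs = torch.meshgrid(torch.arange(feat.shape[-2]), torch.arange(feat.shape[-1]),
                        indexing='ij')
centers = torch.stack([(xs + 0.5) * stride, (ys + 0.5) * stride], dim=-1)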

Thanks in advance for the help!

Could a new type of loss be introduced for classes?

In EPro-PnP-Det_v2, if we want to improve the classification performance, could a new type of loss theoretically be introduced with the help of the deformable correspondence head?

I was thinking about how the yaw angle distribution corresponds to different classes. During the AMIS algorithm we could take the generated rotation distribution and evaluate it from 0 to 2π at some density, then feed this distribution to a simple network that classifies based on the yaw angle. Maybe this isn't suitable for all classes, but it might be useful to train a binary classifier for pedestrians and cones (which can be mixed up by classifiers based purely on image inputs) and add its scores to the corresponding ones in the FCOS detection head with some weighting.

Or could we just use these orientation log-probs for this purpose?

if 'orient' in debug:
    orient_bins = getattr(self.test_cfg, 'orient_bins', 128)
    orient_grid = torch.linspace(
        0, 2 * np.pi * (orient_bins - 1) / orient_bins,
        steps=orient_bins, device=x3d.device)
    # (orient_bins, num_obj, 4)
    pose_grid = pose_opt[None].expand(orient_bins, -1, -1).clone()
    pose_grid[..., 3] = orient_grid[None, :, None]
    cost = evaluate_pnp(
        x3d, x2d, w2d, pose_grid, self.camera, self.cost_fun, out_cost=True)[1]
    orient_logprob = cost.neg().log_softmax(dim=0) + np.log(orient_bins / (2 * np.pi))
    orient_logprob = orient_logprob.transpose(1, 0).cpu().numpy()
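
Concretely, what I'm imagining is something along these lines (a rough sketch with hypothetical names like YawClassifier; none of this is code from the repo):

import torch.nn as nn

# Hypothetical: treat the 128-bin yaw log-density as a feature vector, train a
# small classifier on it, and blend its scores into the FCOS classification
# scores with some weighting.
class YawClassifier(nn.Module):
    def __init__(self, orient_bins=128, num_classes=10):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(orient_bins, 64),
            nn.ReLU(inplace=True),
            nn.Linear(64, num_classes))

    def forward(self, orient_logprob):        # (num_obj, orient_bins)
        return self.mlp(orient_logprob)

# Training-time usage (assuming orient_logprob is kept as a tensor in the graph,
# i.e. not the detached .cpu().numpy() debug output above):
# cls_logits = yaw_classifier(orient_logprob)
# loss_yaw_cls = nn.functional.cross_entropy(cls_logits, gt_labels)
# fused_scores = fcos_scores + 0.1 * cls_logits.softmax(dim=-1)  # arbitrary weight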

This is just an idea, and my question is: could this theoretically work? Can it be backpropagated at all?

Thanks in advance for the answer, and for the previous ones too, they've been very useful.
