
ifc's People

Contributors

sukjunhwang


ifc's Issues

COCO pretrained model

Thanks for the code.
It would be great if you could also share the COCO pretrained model.

evaluation error

After running the following command to evaluate:

python projects/IFC/train_net.py --num-gpus 8 --eval-only --config-file projects/IFC/configs/base_ytvis.yaml MODEL.WEIGHTS pretrained_weights/coco_r50.pth INPUT.SAMPLING_FRAME_NUM 5

an error occurred:

  File "/SSD_DISK/users/yanghongye/projects/rvos/IFC/projects/IFC/ifc/ifc.py", line 221, in forward
    video_output.update(clip_results)
  File "/SSD_DISK/users/yanghongye/projects/rvos/IFC/projects/IFC/ifc/structures/clip_output.py", line 103, in update
    input_clip.frame_idx] = input_clip.mask_logits[left_idx]
RuntimeError: shape mismatch: value tensor of shape [100, 5, 45, 80] cannot be broadcast to indexing result of shape [50, 5, 45, 80]

And I changed

num_max_inst = 50

to

num_max_inst = 100

but the error still occurred when updating the second clip of the video:

  File "/SSD_DISK/users/yanghongye/projects/rvos/IFC/projects/IFC/ifc/ifc.py", line 221, in forward
    video_output.update(clip_results)
  File "/SSD_DISK/users/yanghongye/projects/rvos/IFC/projects/IFC/ifc/structures/clip_output.py", line 103, in update
    input_clip.frame_idx] = input_clip.mask_logits[left_idx]
RuntimeError: shape mismatch: value tensor of shape [5, 5, 45, 80] cannot be broadcast to indexing result of shape [0, 5, 45, 80]

Could you help me solve it?
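
(For context, the failing line is a plain indexed tensor assignment, so the error can be reproduced in isolation; below is a minimal standalone sketch, not IFC code, with the shapes copied from the first traceback:)

    import torch

    # 50 instance slots were allocated for the whole video (num_max_inst = 50),
    # but the incoming clip carries mask logits for 100 instances, so the
    # indexed assignment cannot broadcast 100 values into 50 slots.
    video_masks = torch.zeros(50, 5, 45, 80)    # allocated video-level slots
    clip_logits = torch.randn(100, 5, 45, 80)   # per-clip predictions
    video_masks[torch.arange(50)] = clip_logits
    # RuntimeError: shape mismatch: value tensor of shape [100, 5, 45, 80]
    # cannot be broadcast to indexing result of shape [50, 5, 45, 80]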

Inference on COCO images

I trained a model using the base_coco.yaml config, but when I wanted to visualize the detection results on COCO images instead of videos, I encountered the same problem as #5.

May I ask how to use the base_coco.yaml config to run inference on some images and obtain visualized results?

FPS measurement

Hi, thanks for the amazing work!
I wanted to ask how you compute the FPS in the semi-online setup and how it depends on the stride S and the clip size T.
Taking the T=5 & S=1 scenario (the one reported in the main results table), the model takes 5 frames as input at a time, 4 of which overlap from window to window (is this correct?). This means that only 1 effectively new frame is predicted per step, as the other 4 frames are part of the overlap used to compute the matching.
With this in mind, how do you compute the FPS? I guess it is not computed counting just the 1 effectively new frame, as then the FPS would be directly proportional to the stride for a fixed clip size T.
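
(To make the ambiguity concrete, here is toy arithmetic with entirely made-up numbers, not figures from the paper:)

    # Hypothetical arithmetic only; the wall time below is invented.
    N, T, S = 36, 5, 1                   # video frames, clip size, stride
    windows = (N - T) // S + 1           # 32 overlapping windows
    frames_processed = windows * T       # 160 frames pass through the model
    wall_time = 4.0                      # seconds (made-up measurement)
    print(N / wall_time)                 # 9.0  FPS, counting video frames
    print(frames_processed / wall_time)  # 40.0 FPS, counting processed frames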

Thanks a lot for your clarifications!!

Code explanation

Hello,

First of all, great paper! I just have one question. Would you mind helping me understand why only the last feature map is used in the transformer? Aren't you losing information by discarding the others?

src, mask = features[-1].decompose()
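
(For context, decompose() follows the DETR convention; a minimal sketch of the NestedTensor container it comes from, assuming the standard DETR definition:)

    import torch

    class NestedTensor:
        """DETR-style container pairing a padded image batch with its padding mask."""
        def __init__(self, tensors: torch.Tensor, mask: torch.Tensor):
            self.tensors = tensors   # (B, C, H, W) padded image batch
            self.mask = mask         # (B, H, W), True at padded pixels

        def decompose(self):
            return self.tensors, self.mask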

How to generate instance segmentation predictions without bounding boxes, classes, edges, and probabilities?

When I finish training for instance segmentation and use demo.py to generate masks, I get the result shown in the first image.

The first image includes the box, the class (0 in this case), and the probability.

Also, the segmented objects are outlined with differently colored edges.

I want to ask how to generate masks like the second image: an image without edges, boxes, probabilities, or classes.

Hope someone can help, and thank you so much.
[screenshot: the two example outputs]
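
(One possible approach, sketched as a hypothetical helper that bypasses demo.py's Visualizer entirely; the function name and blending factor are made up:)

    import numpy as np

    def draw_plain_masks(image, pred_masks, colors, alpha=0.5):
        """Overlay solid-color masks with no boxes, labels, scores, or edges.

        image:      (H, W, 3) uint8 array
        pred_masks: (N, H, W) boolean array, e.g. instances.pred_masks.numpy()
        colors:     list of N (r, g, b) tuples
        """
        out = image.astype(np.float64)
        for mask, color in zip(pred_masks, colors):
            out[mask] = (1 - alpha) * out[mask] + alpha * np.asarray(color, float)
        return out.astype(np.uint8)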

Question about batch size vs num frames

Hello again,

I have one last question that I'm still unclear about. In this implementation, the size of the input fed into the network is (B x C x H x W), with B being the number of frames, correct? Or is it actually (B x F x C x H x W), with F being the number of frames?
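
(For what it's worth, the transformer forward() quoted in a later issue recovers the batch size via bs = src.shape[0] // self.num_frames, which suggests frames are flattened into the batch dimension; a toy illustration with made-up sizes:)

    import torch

    B, T, C, H, W = 2, 5, 3, 360, 640
    clips = torch.randn(B, T, C, H, W)
    backbone_input = clips.flatten(0, 1)   # (B*T, C, H, W) == (10, 3, 360, 640)
    assert backbone_input.shape[0] // T == B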

How does the pre-training process affect the final performance?

Thanks for your wonderful work!
I noticed that in your paper, before training IFC on the VIS dataset, you first add an extra pretraining stage on the COCO dataset by setting T to 1. This implies that the memory tokens and all bus layers are also pretrained during this stage.
So I'm wondering how this stage influences the final performance on VIS. If we do not pretrain the memory tokens and bus layers on COCO, what happens to the final performance on the YouTube-VIS dataset?
Hoping for your reply, and thank you again.

Questions about Memory

Thanks for your great work.

I have two questions about memory_bus and memory_pos.

The first one:
In the paper, memory tokens help features in different frames communicate with each other.
However, in the code, it seems the communication is designed to happen between layers.

        for layer_idx in range(self.num_layers):
            # append the M memory tokens to the HW spatial tokens of each frame
            output = torch.cat((output, memory_bus))

            output = self.enc_layers[layer_idx](output, src_mask=mask,
                                                src_key_padding_mask=src_key_padding_mask, pos=pos)
            # split the spatial tokens and the memory tokens back apart
            output, memory_bus = output[:hw, :, :], output[hw:, :, :]

            # reshape so each bus layer mixes the M tokens across the T frames
            memory_bus = memory_bus.view(M, bs, t, c).permute(2, 1, 0, 3).flatten(1, 2)  # T x BM x C
            memory_bus = self.bus_layers[layer_idx](memory_bus)
            memory_bus = memory_bus.view(t, bs, M, c).permute(2, 1, 0, 3).flatten(1, 2)  # M x BT x C
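
(A quick standalone check, with toy sizes, that the two reshapes are exact inverses around the bus layer:)

    import torch

    M, bs, t, c = 8, 2, 5, 256
    bus = torch.randn(M, bs * t, c)                                     # M x BT x C
    as_time = bus.view(M, bs, t, c).permute(2, 1, 0, 3).flatten(1, 2)   # T x BM x C
    back = as_time.view(t, bs, M, c).permute(2, 1, 0, 3).flatten(1, 2)  # M x BT x C
    assert torch.equal(bus, back)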

The second one:
It seems self.memory_bus and self.memory_pos are not updated across frames. Intuitively, I guess it would be helpful if they were updated along with the frames.

memory_bus = self.memory_bus

        self.memory_bus = torch.nn.Parameter(torch.randn(num_memory_bus, d_model))
        self.memory_pos = torch.nn.Parameter(torch.randn(num_memory_bus, d_model))
        if num_memory_bus:
            nn.init.kaiming_normal_(self.memory_bus, mode="fan_out", nonlinearity="relu")
            nn.init.kaiming_normal_(self.memory_pos, mode="fan_out", nonlinearity="relu")

        self.return_intermediate_dec = return_intermediate_dec

        self.d_model = d_model
        self.nhead = nhead

    def _reset_parameters(self):
        for p in self.parameters():
            if p.dim() > 1:
                nn.init.xavier_uniform_(p)

    def pad_zero(self, x, pad, dim=0):
        if x is None:
            return None
        pad_shape = list(x.shape)
        pad_shape[dim] = pad
        return torch.cat((x, x.new_zeros(pad_shape)), dim=dim)

    def forward(self, src, mask, query_embed, pos_embed, is_train):
        # prepare for enc-dec
        bs = src.shape[0] // self.num_frames if is_train else 1
        t = src.shape[0] // bs
        _, c, h, w = src.shape

        memory_bus = self.memory_bus
        memory_pos = self.memory_pos

        # encoder
        src = src.view(bs*t, c, h*w).permute(2, 0, 1)               # HW, BT, C
        frame_pos = pos_embed.view(bs*t, c, h*w).permute(2, 0, 1)   # HW, BT, C
        frame_mask = mask.view(bs*t, h*w)                           # BT, HW

        src, memory_bus = self.encoder(src, memory_bus, memory_pos, src_key_padding_mask=frame_mask, pos=frame_pos, is_train=is_train)

        # decoder
        dec_src = src.view(h*w, bs, t, c).permute(2, 0, 1, 3).flatten(0,1)   # THW, B, C
        query_embed = query_embed.unsqueeze(1).repeat(1, bs, 1)     # Q, B, C
        tgt = torch.zeros_like(query_embed)

        dec_pos = pos_embed.view(bs, t, c, h*w).permute(1, 3, 0, 2).flatten(0,1)   # THW, B, C
        dec_mask = mask.view(bs, t*h*w)                             # B, THW

        clip_hs = self.clip_decoder(tgt, dec_src, memory_bus, memory_pos, memory_key_padding_mask=dec_mask,
                                    pos=dec_pos, query_pos=query_embed, is_train=is_train)

        ret_memory = src.permute(1,2,0).reshape(bs*t, c, h, w)

        return clip_hs, ret_memory

Do I misunderstand something?
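
(For concreteness, a toy sketch of the variant this question hints at, where the tokens produced for one clip seed the next clip; this is purely hypothetical and not part of the repo:)

    import torch
    import torch.nn as nn

    class RecurrentBusToy(nn.Module):
        """Hypothetical: carry memory tokens across clips instead of resetting."""
        def __init__(self, num_tokens=8, dim=32):
            super().__init__()
            self.memory_bus = nn.Parameter(torch.randn(num_tokens, dim))
            self.mix = nn.Linear(dim, dim)   # stand-in for the real encoder

        def forward(self, clip_feats):
            bus = self.memory_bus            # learned init, used for clip 0 only
            states = []
            for feats in clip_feats:         # feats: (HW, dim) stand-in features
                bus = torch.tanh(self.mix(bus + feats.mean(0)))
                states.append(bus)
            return states

    states = RecurrentBusToy()([torch.randn(45 * 80, 32) for _ in range(3)])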

Use model at inference

Hey,
first things first: Great paper!
I am currently trying to run your model at inference and therefore used the script demo/demo.py with the arguments
--config-file ifc_repo/configs/COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_3x.yaml
--output <path_to_output_file>
--video-input <path_to_input_file>
--opts MODEL.WEIGHTS detectron2://COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_3x/138205316/model_final_a3ec72.pkl

Everything works fine, but I think that's not using your model, right? Putting r101.pth for WEIGHTS and R101_ytvis.yaml for the config file does not work ("KeyError: 'Non-existent config key: MODEL.IFC'"). So how can I use your pretrained model at inference just to visualize and test the results?
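
(A likely cause, stated as an assumption: detectron2's stock demo.py never registers IFC's extra config keys, so merging R101_ytvis.yaml fails on MODEL.IFC. IFC's train_net.py follows the usual detectron2 project pattern of calling an add-config helper before merging the yaml; the exact import path below is a guess based on that convention:)

    # Sketch assuming a detectron2-style add_ifc_config helper exists;
    # the name and import path are assumptions, check projects/IFC/ifc.
    from detectron2.config import get_cfg
    from ifc import add_ifc_config   # hypothetical import path

    cfg = get_cfg()
    add_ifc_config(cfg)              # register the MODEL.IFC.* keys first
    cfg.merge_from_file("projects/IFC/configs/R101_ytvis.yaml")
    cfg.MODEL.WEIGHTS = "r101.pth"   # then point at the YTVIS-trained weights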
