hkchengrex / cutie Goto Github PK


494.0 3.0 54.0 2.78 MB

[CVPR 2024 Highlight] Putting the Object Back Into Video Object Segmentation

Home Page: https://hkchengrex.com/Cutie/

License: MIT License

Python 100.00%

computer-vision deep-learning pytorch segmentation video-editing video-object-segmentation video-segmentation cvpr2024


cutie's Issues

point supervision

Thanks for your great work.

Since Cutie adopts point supervision during training to reduce memory requirements, I replaced my loss function with yours, but it did not reduce video memory usage.
I also replaced your loss function with the original XMem one, modified as shown below, and again found the memory usage to be almost identical.
Do you have any idea what this is about?

import torch
import torch.nn as nn
import torch.nn.functional as F

from collections import defaultdict
from cutie.utils.tensor_utils import cls_to_one_hot


def dice_loss(mask: torch.Tensor, soft_gt: torch.Tensor) -> torch.Tensor:
    # mask: T*C*H*W
    # soft_gt: T*C*H*W
    # ignores the background
    mask = mask[:, 1:].flatten(start_dim=2)
    gt = soft_gt[:, 1:].float().flatten(start_dim=2)
    numerator = 2 * (mask * gt).sum(-1)
    denominator = mask.sum(-1) + gt.sum(-1)
    loss = 1 - (numerator + 1) / (denominator + 1)
    return loss.sum(0).mean()


# https://stackoverflow.com/questions/63735255/how-do-i-compute-bootstrapped-cross-entropy-loss-in-pytorch
class BootstrappedCE(nn.Module):
    def __init__(self, start_warm, end_warm, top_p=0.15):
        super().__init__()

        self.start_warm = start_warm
        self.end_warm = end_warm
        self.top_p = top_p

    def forward(self, input, target, it):
        if it < self.start_warm:
            return F.cross_entropy(input, target), 1.0

        raw_loss = F.cross_entropy(input, target, reduction='none').view(-1)
        num_pixels = raw_loss.numel()

        if it > self.end_warm:
            this_p = self.top_p
        else:
            this_p = self.top_p + (1-self.top_p)*((self.end_warm-it)/(self.end_warm-self.start_warm))
        loss, _ = torch.topk(raw_loss, int(num_pixels * this_p), sorted=False)
        return loss.mean(), this_p


class LossComputerXMem:
    def __init__(self, config=None):
        super().__init__()
        # self.config = config
        self.bce = BootstrappedCE(20000, 70000)

    def compute(self, data, num_objects, it):
        losses = defaultdict(float)

        b, num_frames = data['rgb'].shape[:2]
        t_range = range(1, num_frames)

        for bi in range(b):
            logits = torch.stack([data[f'logits_{ti}'][bi, :num_objects[bi] + 1] for ti in t_range], dim=0)
            cls_gt = data['cls_gt'][bi, 1:]  # remove gt for the first frame
            soft_gt = cls_to_one_hot(cls_gt, num_objects[bi])

            loss, _ = self.bce(logits, soft_gt, it)
            losses[f'ce_loss'] += (loss / b)

            loss = dice_loss(logits.softmax(dim=1), soft_gt)
            losses[f'dice_loss'] += (loss / b)

            aux = [data[f'aux_{ti}'] for ti in t_range]
            if 'sensory_logits' in aux[0]:
                sensory_log = torch.stack(
                    [a['sensory_logits'][bi, :num_objects[bi] + 1] for a in aux], dim=0)

                loss, _ = self.bce(sensory_log, F.interpolate(soft_gt, scale_factor=1/16), it)
                losses[f'aux_sensory_ce'] += (loss / b)

                loss = dice_loss(sensory_log.softmax(dim=1), F.interpolate(soft_gt, scale_factor=1/16))
                losses[f'aux_sensory_dice'] += (loss / b)

            if 'q_logits' in aux[0]:
                num_levels = aux[0]['q_logits'].shape[2]

                for l in range(num_levels):
                    query_log = torch.stack(
                        [a['q_logits'][bi, :num_objects[bi] + 1, l] for a in aux], dim=0)

                    loss, _ = self.bce(query_log, F.interpolate(soft_gt, scale_factor=1 / 16), it)
                    losses[f'aux_query_ce_l{l}'] += (loss / b)

                    loss = dice_loss(query_log.softmax(dim=1), F.interpolate(soft_gt, scale_factor=1/16))
                    losses[f'aux_query_dice_l{l}'] += (loss / b)

        losses['total_loss'] = sum(losses.values())

        return losses
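For reference, point supervision here means evaluating the loss at a fixed number of sampled pixel locations instead of at every pixel. Below is a minimal sketch, assuming uniformly random points and bilinear sampling with F.grid_sample. The helper names point_sample and point_ce_loss, and num_points=12544, are hypothetical. Cutie's own sampling strategy may differ, for example uncertainty-based sampling. Sampling only shrinks the tensors involved in the loss itself; the network's forward activations are unchanged.

```python
import torch
import torch.nn.functional as F


def point_sample(x: torch.Tensor, coords: torch.Tensor) -> torch.Tensor:
    """Bilinearly sample x (N, C, H, W) at coords (N, P, 2) given as (x, y) in [0, 1]."""
    # grid_sample expects a grid of shape (N, H_out, W_out, 2) with values in [-1, 1]
    grid = coords.unsqueeze(1) * 2.0 - 1.0  # (N, 1, P, 2)
    out = F.grid_sample(x, grid, mode="bilinear", padding_mode="border", align_corners=False)
    return out.squeeze(2)  # (N, C, P)


def point_ce_loss(logits: torch.Tensor, soft_gt: torch.Tensor, num_points: int = 12544) -> torch.Tensor:
    """Cross-entropy evaluated only at uniformly sampled points (hypothetical helper)."""
    n = logits.shape[0]
    coords = torch.rand(n, num_points, 2, device=logits.device)
    point_logits = point_sample(logits, coords)       # (N, C, P)
    point_gt = point_sample(soft_gt.float(), coords)  # (N, C, P) soft targets
    # With probability targets, cross_entropy accepts matching (N, C, P) input and target
    return F.cross_entropy(point_logits, point_gt)
```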

TypeError: cannot pickle '_thread.lock' object

I followed the instructions to run the example and completed the following steps:

Use left-click for foreground annotation on the first frame
Click the "commit to permanent memory" button
Click "Propagate forward"

Then I got the error below:

(cutie) PS E:\Cloneman\Cutie> python interactive_demo.py --video ./examples/example.mp4 --num_objects 1
Using device: cuda
Single object: False
Workspace is in: ./workspace\example
117 images found.
Exception ignored in: <function ResourceManager.__del__ at 0x00000230E0A572E0>
Traceback (most recent call last):
  File "E:\Cloneman\Cutie\gui\resource_manager.py", line 137, in __del__
    for _ in range(self.num_save_threads):
AttributeError: 'ResourceManager' object has no attribute 'num_save_threads'

Traceback (most recent call last):
  File "E:\Cloneman\Cutie\gui\main_controller.py", line 272, in on_forward_propagation
    self.on_propagate()
  File "E:\Cloneman\Cutie\gui\main_controller.py", line 315, in on_propagate
    for data in loader:
  File "F:\SystemTools\miniconda3\envs\cutie\lib\site-packages\torch\utils\data\dataloader.py", line 438, in __iter__
    return self._get_iterator()
  File "F:\SystemTools\miniconda3\envs\cutie\lib\site-packages\torch\utils\data\dataloader.py", line 386, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "F:\SystemTools\miniconda3\envs\cutie\lib\site-packages\torch\utils\data\dataloader.py", line 1039, in __init__
    w.start()
  File "F:\SystemTools\miniconda3\envs\cutie\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "F:\SystemTools\miniconda3\envs\cutie\lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "F:\SystemTools\miniconda3\envs\cutie\lib\multiprocessing\context.py", line 336, in _Popen
    return Popen(process_obj)
  File "F:\SystemTools\miniconda3\envs\cutie\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
    reduction.dump(process_obj, to_child)
  File "F:\SystemTools\miniconda3\envs\cutie\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle '_thread.lock' object

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "F:\SystemTools\miniconda3\envs\cutie\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "F:\SystemTools\miniconda3\envs\cutie\lib\multiprocessing\spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input

Is there something wrong with my operation?
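The traceback shows the DataLoader starting a worker process through popen_spawn_win32. The spawn method pickles the objects handed to the worker, and any object that holds a threading.Lock cannot be pickled, which is the TypeError above. Below is a minimal sketch of one generic workaround, using a hypothetical LockHoldingSource class: drop the lock in __getstate__ and recreate it in __setstate__. Setting num_workers=0 on the DataLoader avoids pickling altogether, because no worker processes are started. Whether either option applies to Cutie's GUI code is not confirmed here.

```python
import pickle
import threading


class LockHoldingSource:
    """Hypothetical stand-in for an object (e.g. a frame source) that owns a lock."""

    def __init__(self, frames):
        self.frames = list(frames)
        self.lock = threading.Lock()

    def __getstate__(self):
        # Copy the instance dict and drop the unpicklable lock
        state = self.__dict__.copy()
        del state["lock"]
        return state

    def __setstate__(self, state):
        # Restore attributes and give the unpickled copy its own fresh lock
        self.__dict__.update(state)
        self.lock = threading.Lock()
```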

About .pth to ONNX conversion

Dear hkchengrex,
When converting the .pth checkpoint to ONNX, I am confused by self._add_memory in inference_core.py. How can this function be converted to ONNX? Thanks.
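An ONNX graph is a pure function of its inputs. Python-side state that a method like self._add_memory mutates between calls is therefore not captured when the model is traced for export. One common pattern is to make that state explicit: pass the current memory in as tensors and return the updated memory as outputs. The sketch below uses a hypothetical AddMemoryStep module with a simplified concatenate-and-truncate update. It is not Cutie's actual memory logic.

```python
import torch
import torch.nn as nn


class AddMemoryStep(nn.Module):
    """Hypothetical stateless memory update: old memory in, new memory out."""

    def __init__(self, max_tokens: int):
        super().__init__()
        self.max_tokens = max_tokens

    def forward(self, mem_key, mem_value, new_key, new_value):
        # Concatenate along the token axis (last dim), then keep only the newest tokens
        key = torch.cat([mem_key, new_key], dim=-1)[..., -self.max_tokens:]
        value = torch.cat([mem_value, new_value], dim=-1)[..., -self.max_tokens:]
        return key, value
```

The exported graph would then be run once per frame, and the caller would feed the returned key and value back in on the next call. Passing dynamic_axes to torch.onnx.export for the token dimension is one way to let the memory size vary.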

Attempting to deserialize object on a CUDA

I ran into a problem. Yesterday I tested the project, but it showed this error:
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
What should I do?

Congratulations on your new great work

Congratulations on your great new work! Is it possible to change the click mode in Cutie? XMem offers scribble, click, and free modes, which I find more convenient.

more detailed tips for interactive tool

Can someone explain all the options available in the interactive demo?
An example of the kind of explanation I have in mind:
left mouse click: first click selects foreground *******
save soft mask during propagation: click this for ******
etc.

I hope it's not too much work, and that it's possible to include in the tips (or in the getting started guide) how to use gui_config.yaml, image sequences, and videos, how to segment objects, etc.

Simple cli inference script

Great work!

Will you provide a simple command-line interface to process a video (or a video's image frames) given object mask(s)?
Maybe something like what was done for XMem++: https://github.com/max810/XMem2#use-xmem-command-line-and-python-interface

I can see that eval_vos.py provides something like this with dataset=generic, but that script seems intended more for large batch processing of datasets.

Donation / Sponsoring?

Hi there!

I am a big fan of this project, thank you for your work!
I would like to show my support through a donation.
However, I have not found any link to a donation / sponsoring page.

Would you mind providing me with one?
Thanks!

Soft Masks

I noticed that when saving soft masks, the first frame where I start propagation from will have a binary mask instead of soft mask. Is there any way to correct that so I can get the soft mask on every frame?

Also, is there any way to get the benefits of both the soft mask and the binary mask together?
What I mean is, the binary mask follows the mask that is drawn in the gui perfectly, but the edges are rough and temporally unstable. Simply smoothing or blurring the edges doesn't look good because they are unstable.
The soft mask has fantastic edges and temporal stability! But there can be a lot of gray areas where things should be solid.
I think it would be amazing if I could get the mask which has solid areas which follow the gui, but also have the feathered edges with temporal stability like the softmasks. Do you have any idea how that can be achieved?
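One possible heuristic, not something Cutie provides, has three parts. The eroded interior of the binary mask stays fully opaque. The soft mask's values are used only in a thin band around the binary boundary. Everything outside that band is transparent. Below is a minimal sketch using max-pooling for square-kernel dilation and erosion. hybrid_alpha and the band width are hypothetical. Because the band follows the binary mask, some temporal instability of the binary edge can still show through.

```python
import torch
import torch.nn.functional as F


def dilate(mask: torch.Tensor, radius: int) -> torch.Tensor:
    """Binary dilation of an (H, W) {0, 1} float mask with a (2r+1) x (2r+1) square kernel."""
    k = 2 * radius + 1
    return F.max_pool2d(mask[None, None], k, stride=1, padding=radius)[0, 0]


def erode(mask: torch.Tensor, radius: int) -> torch.Tensor:
    # Erosion is the complement of dilating the complement
    return 1.0 - dilate(1.0 - mask, radius)


def hybrid_alpha(binary: torch.Tensor, soft: torch.Tensor, band: int = 3) -> torch.Tensor:
    """Solid interior from the binary mask, soft edges only near the binary boundary."""
    core = erode(binary, band)   # pixels safely inside the binary mask
    outer = dilate(binary, band)  # region where soft edge values are allowed
    alpha = torch.where(outer > 0, soft, torch.zeros_like(soft))
    return torch.where(core > 0, torch.ones_like(soft), alpha)
```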

Change the number of tracked objects while running Cutie

This is very interesting work and it has helped me a lot in my project. However, when I try to change the number of tracked objects, I get an error. Could you give me some suggestions?

ModuleNotFoundError: No module named 'cutie'

I followed the installation instructions and everything went smoothly until I tried to run:
python interactive_demo.py

I get this error:

(cutie) D:\AI\Cutie>python interactive_demo.py
Traceback (most recent call last):
  File "D:\AI\Cutie\interactive_demo.py", line 20, in <module>
    from gui.main_controller import MainController
  File "D:\AI\Cutie\gui\main_controller.py", line 20, in <module>
    from cutie.model.cutie import CUTIE
ModuleNotFoundError: No module named 'cutie'

I also tried this:
pip install cutie

But I still get the same error.

I'm not a programmer, so I don't understand how to fix it.
Any suggestions on how to fix this and get Cutie to launch so I can test the demo?

Thanks ahead 🙏
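The traceback means Python cannot find the repository's own cutie package. pip install cutie installs whatever is published under that name on PyPI, which is not necessarily this project. The project's README installs the repository itself with an editable pip install run from the repository root. Below is a small diagnostic sketch that reports where, if anywhere, a module would be imported from; the helper name where_is is hypothetical. Run it from the same environment and working directory as interactive_demo.py.

```python
import importlib.util
from typing import Optional


def where_is(module_name: str) -> Optional[str]:
    """Return the file a top-level module would be imported from, or None if not importable."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None
    # Regular packages report their __init__.py; namespace packages have no origin file
    return spec.origin or "<namespace package>"


if __name__ == "__main__":
    print("cutie ->", where_is("cutie"))
```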

convert interactive_demo to .exe

Thank you for your great work!
Is it possible to convert the .py scripts of this project into an .exe?

Real-Time/Streaming Support

Amazing work @hkchengrex! I’ve been looking at applying some of your work (XMem and now Cutie) to some of our real-time robotics applications.

I was wondering if you had any pointers/entry points for running Cutie in real-time (5-10 Hz) given a live camera feed. Any advice would be extremely appreciated!
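One generic pattern for live feeds, independent of Cutie, uses a producer thread that pushes camera frames into a queue. The consumer then processes only the newest queued frame each iteration and drops stale ones, so latency stays bounded when inference is slower than the camera. Below is a minimal sketch. process_latest, STOP and step_fn are hypothetical names, and step_fn stands in for whatever per-frame call the tracker exposes. Dropping frames means the tracker sees larger motion between consecutive steps.

```python
import queue
from typing import Any, Callable, List

STOP = object()  # sentinel a producer puts on the queue to end the stream


def process_latest(frames: "queue.Queue[Any]", step_fn: Callable[[Any], Any]) -> List[Any]:
    """Run step_fn on the newest available frame each iteration, skipping older ones."""
    outputs: List[Any] = []
    stop = False
    while not stop:
        batch = [frames.get()]  # block until at least one item arrives
        while True:  # then take everything else that is already waiting
            try:
                batch.append(frames.get_nowait())
            except queue.Empty:
                break
        pending = []
        for item in batch:
            if item is STOP:  # identity check, safe for array-valued frames
                stop = True
                break
            pending.append(item)
        if pending:
            outputs.append(step_fn(pending[-1]))  # older frames in this batch are dropped
    return outputs
```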

Error importing mask

Hi, the Import mask button currently does not work.
It seems that in the file 'main_controller.py', line 556 needs to be changed
from: file_name = self._open_file('Mask')
to: file_name = self.gui.open_file('Mask')

Does permanent memory refer to long-term memory?

Thanks for this great work.

I saw many operations involving the permanent memory but found no description of it in the paper.
How is it used with other memories (sensory, working, long-term)?

Errors installing on Windows

Attempting to install this on Windows results in errors related to the packages cchardet and netifaces.
I believe these packages may not be well maintained and only work on old versions of Python. Can anyone confirm whether they have successfully installed Cutie on Windows with a specific Python version?

ERROR: Failed building wheel for cchardet
ERROR: Failed building wheel for netifaces

I found this alternate package for cchardet which allowed the installation to progress, but I have not found a solution for netifaces yet.
https://pypi.org/project/faust-cchardet/
Are these packages critical to the use of the application, or is it possible they could be removed?

Tips on training on single GPU

Training crashes every time I train your model on a single GPU (NVIDIA GeForce RTX 3090, 24 GB) due to insufficient memory. Can you give me some tips to fix that?
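A generic PyTorch technique for fitting training into less GPU memory is gradient accumulation. You run several small micro-batches, accumulate their gradients, and step the optimizer once. The effective batch size is preserved, while peak activation memory scales with the micro-batch. Below is a minimal sketch with hypothetical names. It is not wired into Cutie's trainer, and whether Cutie's training config exposes an equivalent option is not stated here.

```python
from typing import Callable, Iterable, Tuple

import torch
import torch.nn as nn


def train_with_accumulation(
    model: nn.Module,
    batches: Iterable[Tuple[torch.Tensor, torch.Tensor]],
    optimizer: torch.optim.Optimizer,
    loss_fn: Callable[[torch.Tensor, torch.Tensor], torch.Tensor],
    accum_steps: int = 4,
) -> int:
    """Accumulate gradients over accum_steps micro-batches per optimizer step; returns steps taken."""
    model.train()
    optimizer.zero_grad(set_to_none=True)
    steps = 0
    for i, (x, y) in enumerate(batches, start=1):
        # With equal-sized micro-batches and a mean-reduced loss, dividing here makes the
        # accumulated gradient match that of one large batch
        loss = loss_fn(model(x), y) / accum_steps
        loss.backward()
        if i % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad(set_to_none=True)
            steps += 1
    # Any trailing micro-batches that do not fill a full group are not stepped
    return steps
```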

Question about the experimental result of STCN in Table 1

I tested STCN on the MOSE validation set, but your paper reports a different result.

What I've done so far:

Downloaded the pre-trained STCN model (s03 model)
Ran inference on the MOSE validation set using STCN's eval_generic.py
Uploaded the results to the MOSE CodaLab server

The score I got on MOSE CodaLab was 0.254048895.

Do you have any idea what causes this discrepancy?

Can Cutie be extended to real-time stream inference?

Thank you for such an excellent project. As a newcomer to the field of video segmentation, I have two questions:

From a design perspective, can Cutie be extended to real-time stream inference?
Is it necessary to set the number of objects to be tracked in advance for video input?

I would greatly appreciate it if I could receive an answer. Thanks

Unwanted objects getting segmented and tracked (without any human/mask input)

Hello authors,

Thank you for creating this wonderful tool and for open sourcing the repository.

I am facing an issue with the interactive_demo script: unwanted objects that I did not provide any mask for (not even through clicks) are being segmented out of nowhere and tracked.

Below are the input and output videos.

outdoor_grass.mov

visualization_davis.mp4

To be specific, I pass the video path using --video and do not provide any masks for the hand. Then, I click "forward propagate".

I was expecting nothing to get detected and tracked, but the hand is getting detected and tracked.

As I understand it, the model only tracks objects that the user provides a mask for. Is my understanding correct? Please clarify.

Thank you!

Inference speed/ memory optimisation

Hi @hkchengrex, great work! Is it possible to optimise inference by running at half precision (fp16), and is the amp flag in the config meant for that? Also, could you give some pointers on quantising the model, perhaps to int8, if that is possible? My motivation is to run the model with less memory and less GPU computation.
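The GUI traceback elsewhere on this page shows main_controller wrapping calls in autocast(self.device, enabled=self.amp ...), which suggests the amp flag toggles PyTorch automatic mixed precision. Below is a minimal, generic sketch of mixed-precision inference with torch.autocast. run_mixed_precision is a hypothetical helper, and using float16 on CUDA and bfloat16 elsewhere is an assumption. Int8 quantisation is not covered.

```python
import torch
import torch.nn as nn


@torch.inference_mode()
def run_mixed_precision(model: nn.Module, x: torch.Tensor, device_type: str) -> torch.Tensor:
    """Run a forward pass under autocast so eligible ops execute in reduced precision."""
    dtype = torch.float16 if device_type == "cuda" else torch.bfloat16
    with torch.autocast(device_type=device_type, dtype=dtype):
        return model(x)
```

Unlike model.half(), autocast keeps the weights in float32 and chooses a precision per operation. It mainly reduces activation memory and compute, not the memory taken by the weights.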

Issue of Object number in Inference

Hi, thanks for the impressive work and code!

I think there might be an issue in the inference stage: https://github.com/hkchengrex/Cutie/blob/main/cutie/inference/inference_core.py#L281

I am wondering whether if tmp_id >= pred_prob_no_bg.shape[0]: should be if tmp_id + 1 >= pred_prob_no_bg.shape[0]:

if tmp_id >= pred_prob_no_bg.shape[0]:
    new_masks.append(this_mask.unsqueeze(0))
else:
    # +1 for padding the background channel
    pred_prob_no_bg[tmp_id + 1] = this_mask

When tmp_id == pred_prob_no_bg.shape[0] - 1, pred_prob_no_bg[tmp_id + 1] = this_mask will raise an error, since tmp_id + 1 (e.g., tmp_id = 4, tmp_id + 1 = 5) exceeds the channel dimension of pred_prob_no_bg (size 5; indices start at 0, so the largest valid index is 4).

Runtime error: forward() got an unexpected keyword argument 'average_attn_weights'

I got this error when running:

python cutie/eval_vos.py dataset=generic image_directory=examples/images mask_directory=examples/masks size=480
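The error suggests that the installed PyTorch's nn.MultiheadAttention.forward does not accept average_attn_weights. As far as I know, that argument was added in PyTorch 1.11, so an older install would produce this message. Below is a small check that inspects the signature of the installed version; the helper name is hypothetical.

```python
import inspect

import torch
import torch.nn as nn


def mha_supports_average_attn_weights() -> bool:
    """True if this PyTorch's MultiheadAttention.forward accepts average_attn_weights."""
    params = inspect.signature(nn.MultiheadAttention.forward).parameters
    return "average_attn_weights" in params


if __name__ == "__main__":
    print(torch.__version__, mha_supports_average_attn_weights())
```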

Can we add the RMBG-1.4 model?

Can we add the RMBG-1.4 model?

License for Commercial use

Thanks again for the great project.

As far as my tests go, your project works much better and faster, and is easier to use, than any similar project I have tested.

Is there any possibility of changing the license so that the inference part of your project can be used in a commercial product?

The current GPL-3.0 license prevents this.

Question: How to change dilation radius / padding of mask?

Hi Rex and team, thank you for making this useful tool. 🙏

How can we change the dilation radius (also known as padding width) of the mask?
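The GUI setting that controls this is not shown on this page. The sketch below only illustrates what a dilation radius (padding width) of r pixels means: every pixel within Euclidean distance r of the mask becomes foreground. dilate_mask is a hypothetical helper built on a disk-shaped kernel and F.conv2d.

```python
import torch
import torch.nn.functional as F


def dilate_mask(mask: torch.Tensor, radius: int) -> torch.Tensor:
    """Dilate a binary (H, W) mask by `radius` pixels using a disk-shaped kernel."""
    if radius <= 0:
        return mask.clone()
    offsets = torch.arange(-radius, radius + 1)
    ys, xs = torch.meshgrid(offsets, offsets, indexing="ij")
    disk = ((xs ** 2 + ys ** 2) <= radius ** 2).float()
    # Count how many disk pixels land on the mask; any hit means the output pixel is foreground
    hits = F.conv2d(mask.float()[None, None], disk[None, None], padding=radius)[0, 0]
    return (hits > 0).to(mask.dtype)
```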

Congratulations on your new great work

Congratulations on your great new work. I'm replicating your work on the MOSE dataset. Could you show the correct format for MOSE submissions? For example, what is the frame interval, and what is the structure of the submitted file?

Get list of frames committed to memory?

Hi, I would like to add an indicator in the GUI that shows which frames have been committed to memory.
I could not find in the code where you keep a list of the frames that are in memory.
Can you advise how I can obtain that information?

Training Setting

Hi, sorry to bother you again. 👋

I found that Cutie uses an auxiliary loss, and the scaling factor for that loss is 0.1 in the paper, but it is 0.01 in the code.
Am I looking for the wrong settings?

In addition, Cutie does not provide a model pre-trained with BL30K, as your previous works (STCN, XMem) did.
Is this because the new datasets (MOSE, or your MEGA setting) are enough to make up for the previous shortage of training data? 😯 Did you pre-train it with BL30K?

How to use the BURST dataset?

Question about the trainer

In XMem, I found that the trainer calls "self.xmem.eval()".
Here, it calls "self.cutie.train()".
Could you explain the difference?
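In PyTorch, module.train() and module.eval() only flip the training flag that layers such as BatchNorm and Dropout consult. Neither call turns gradients on or off. This page does not say why XMem calls eval() during training. One common reason for doing so is to keep BatchNorm running statistics frozen. Below is a minimal sketch of the difference on a standalone BatchNorm1d layer.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(4)
x = torch.randn(8, 4) * 3 + 5

bn.train()
_ = bn(x)  # training mode: normalizes with batch statistics and updates running_mean
mean_after_train = bn.running_mean.clone()

bn.eval()
_ = bn(x)  # eval mode: normalizes with the running statistics and leaves them unchanged
mean_after_eval = bn.running_mean.clone()

requires_grad_in_eval = bn.weight.requires_grad  # eval() does not disable gradients
```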

Brush painting instead of auto segmentation

I am very eager to use Cutie in my projects since XMem worked so well for me.

However, running Cutie for the very first time recently, I noticed that the function to "paint" on the image to create a mask seems to be gone from the new GUI used in Cutie.
That function was a very useful and important feature in my work with the XMem GUI, since the "left click - auto segmentation" frequently does not yield the right result and is tedious to refine on more complicated segmentation tasks.

Could you implement the "paint brush" function again?

Change the output to polygon

This is an amazing repo, this help me a lot in my work.
I have a question, Is there any efficient way to convert the output from mask to polygon when I have multiple objects?

Bugs when running on mac

Wrong device mapping on checkpoint load

Traceback (most recent call last):
  File "/opt/gh/Cutie/interactive_demo.py", line 75, in <module>
    ex = MainController(cfg)
  File "/opt/gh/Cutie/gui/main_controller.py", line 59, in __init__
    self.initialize_networks()
  File "/opt/gh/Cutie/gui/main_controller.py", line 129, in initialize_networks
    model_weights = torch.load(self.cfg.weights)
  File "/opt/miniconda/envs/cutie/lib/python3.10/site-packages/torch/serialization.py", line 1014, in load
    return _load(opened_zipfile,
  File "/opt/miniconda/envs/cutie/lib/python3.10/site-packages/torch/serialization.py", line 1422, in _load
    result = unpickler.load()
  File "/opt/miniconda/envs/cutie/lib/python3.10/site-packages/torch/serialization.py", line 1392, in persistent_load
    typed_storage = load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
  File "/opt/miniconda/envs/cutie/lib/python3.10/site-packages/torch/serialization.py", line 1366, in load_tensor
    wrap_storage=restore_location(storage, location),
  File "/opt/miniconda/envs/cutie/lib/python3.10/site-packages/torch/serialization.py", line 381, in default_restore_location
    result = fn(storage, location)
  File "/opt/miniconda/envs/cutie/lib/python3.10/site-packages/torch/serialization.py", line 274, in _cuda_deserialize
    device = validate_cuda_device(location)
  File "/opt/miniconda/envs/cutie/lib/python3.10/site-packages/torch/serialization.py", line 258, in validate_cuda_device
    raise RuntimeError('Attempting to deserialize object on a CUDA '
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

Fix:

In main_controller.py, change the checkpoint loading to:

model_weights = torch.load(self.cfg.weights, map_location=self.device)
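A slightly more general version of the same fix, as a sketch: pick the best available device once and pass it as map_location, so the same checkpoint loads on CUDA, Apple MPS, or CPU-only machines. pick_device and load_weights are hypothetical helpers. The getattr guard covers PyTorch builds that predate the MPS backend.

```python
from typing import Optional

import torch


def pick_device() -> str:
    """Prefer CUDA, then Apple's MPS backend, then CPU."""
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


def load_weights(path, device: Optional[str] = None):
    """Load a checkpoint, remapping every stored tensor onto `device`."""
    device = device or pick_device()
    return torch.load(path, map_location=device)
```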

autocast seemingly doesn't work with mps

Traceback (most recent call last):
  File "/opt/gh/Cutie/gui/gui.py", line 411, in on_mouse_press
    self.click_fn(action, ex, ey)
  File "/opt/gh/Cutie/gui/main_controller.py", line 152, in click_fn
    with autocast(self.device, enabled=self.amp and self.device != 'mps'):
  File "/opt/miniconda/envs/cutie/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 241, in __init__
    raise RuntimeError(
RuntimeError: User specified an unsupported autocast device_type 'mps'

Fix

Anywhere autocast is used in main_controller.py, I changed it to the following (there's probably a more elegant way):

with autocast('cpu' if self.device == 'mps' else self.device, enabled=self.amp and self.device != 'mps'):
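A possibly tidier variant, as a sketch, puts the decision in one hypothetical helper. It returns a no-op context (contextlib.nullcontext) when autocast is disabled or the device is mps, and a torch.autocast context otherwise. It strips any device index such as cuda:0, because autocast's device_type expects only the device type, and it assumes the device is given as a string. Newer PyTorch releases may support autocast on mps directly, so the mps special case reflects the version in the traceback above.

```python
import contextlib

import torch


def autocast_for(device: str, enabled: bool):
    """Return an autocast context for `device`, or a no-op context where autocast is skipped."""
    device_type = device.split(":")[0]  # "cuda:0" -> "cuda"
    if not enabled or device_type == "mps":
        return contextlib.nullcontext()
    return torch.autocast(device_type=device_type)
```

Usage would then be with autocast_for(self.device, self.amp): at each call site.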

Trying to reproduce SAMTrack-esque demo with Cutie

Hi there, thanks so much for making this! This is really cool!

I'm trying to adapt this to a SAMTrack-like demo (similar to https://github.com/z-x-yang/Segment-and-Track-Anything/blob/main/demo_instseg.ipynb). I have an existing image-level segmentation system that can give me masks for the objects I want to track, and I want to run this system at intervals over the video to start tracking new objects as they come into frame. However, when implementing this, I'm running into tensor errors. In this case, the error occurs in the self.sensory_update call during chunk-by-chunk inference in the mask encoder: concatenating the inputs "g" and "h" fails because they have different sizes.

Is there a way to adapt the current demo to follow this sort of "online segmentation and tracking" framework (assuming I have a separate source for ID-based masks so I just need to somehow get tracked masks without state from Cutie along with merging the tracked and segmented masks and updating the Cutie memory)?
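Independent of the sensory_update error, the merging step itself can be sketched generically in three parts. Compare each detector mask against the currently tracked objects by IoU. Treat low-overlap detections as new objects. Paint them into the ID mask under fresh IDs without overwriting existing tracks. find_new_objects, merge_new_objects and the 0.5 threshold are hypothetical. How the merged mask is then handed to Cutie, and how its object count is updated, is not shown here.

```python
from typing import List

import numpy as np


def find_new_objects(tracked: np.ndarray, detections: List[np.ndarray],
                     iou_thresh: float = 0.5) -> List[np.ndarray]:
    """Return detections whose best IoU against every tracked object is below iou_thresh."""
    tracked_ids = [i for i in np.unique(tracked) if i != 0]
    new = []
    for det in detections:
        det = det.astype(bool)
        best = 0.0
        for obj_id in tracked_ids:
            obj = tracked == obj_id
            union = np.logical_or(det, obj).sum()
            if union:
                best = max(best, np.logical_and(det, obj).sum() / union)
        if best < iou_thresh:
            new.append(det)
    return new


def merge_new_objects(tracked: np.ndarray, new_masks: List[np.ndarray]) -> np.ndarray:
    """Paint new objects into a copy of the ID mask under fresh IDs, only on background pixels."""
    merged = tracked.copy()
    next_id = int(merged.max()) + 1
    for m in new_masks:
        merged[m & (merged == 0)] = next_id  # never overwrite existing tracks
        next_id += 1
    return merged
```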

Code issue

Hi, Cheng!

Thanks for your great work!

Our work is based on the XMem framework, and we now want to try your new vos_dataset.py, but we found a problem:
In the original XMem, transforms.Normalize is written directly inside vos_dataset.py, but in Cutie it is written in the model.
Is there any difference between the two? If they are the same, would it be better to write it back into vos_dataset.py to make it easier for people who use the XMem framework to migrate your new vos_dataset.py?

Best wishes.

https://github.com/hkchengrex/XMem/blob/9ea04795564dcff06b6570132aed7eedba94d9b8/dataset/vos_dataset.py#L92-L95

Cutie/cutie/dataset/vos_dataset.py, line 128 in e812cf3:

transforms.ToTensor(),

Cutie/cutie/model/cutie.py, line 58 in e812cf3:

image = (image - self.pixel_mean) / self.pixel_std
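Mathematically the two placements compute the same thing, (x - mean) / std per channel; the difference is only where the operation runs. Below is a minimal sketch comparing them, assuming ImageNet statistics as placeholder values. NormalizeInModel is a hypothetical module. Registering mean and std as buffers means they move with model.to(device), so the normalisation runs on the GPU alongside the rest of the forward pass.

```python
import torch
import torch.nn as nn

MEAN = [0.485, 0.456, 0.406]  # assumed ImageNet statistics
STD = [0.229, 0.224, 0.225]


def normalize_in_dataset(image: torch.Tensor) -> torch.Tensor:
    """Dataset-side normalisation of a (C, H, W) tensor in [0, 1], as transforms.Normalize does."""
    mean = torch.tensor(MEAN).view(-1, 1, 1)
    std = torch.tensor(STD).view(-1, 1, 1)
    return (image - mean) / std


class NormalizeInModel(nn.Module):
    """Normalisation as the first step of a model's forward pass."""

    def __init__(self):
        super().__init__()
        # Buffers follow .to(device) but are not trainable parameters
        self.register_buffer("pixel_mean", torch.tensor(MEAN).view(-1, 1, 1), persistent=False)
        self.register_buffer("pixel_std", torch.tensor(STD).view(-1, 1, 1), persistent=False)

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        # Broadcasts over (C, H, W) or (B, C, H, W) inputs
        return (image - self.pixel_mean) / self.pixel_std
```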

VRAM usage with amp

I have been testing the amp setting, and I am a little confused by the results. With Cutie's default settings, I see less VRAM usage when amp is disabled. Only when increasing max_internal_size do I get any VRAM benefit from enabling it.
Each test run was conducted after a fresh restart of the application.

For a short clip with only 79 frames, it used 2.5 GB with amp: True, and only 1.8 GB with amp: False.

For a clip that is 1888 frames long, I left all memory settings at their defaults, except that I increased the long-term memory size so that I could measure memory usage without it being purged.
With amp: True, the entire clip completed, but usage ended up right at 12 GB, which is the limit of my GPU.
With amp: False, the entire clip was processed and used 11 GB.

With the longer clip again, I increased max_internal_size to 720. This time I did see a huge benefit from amp: True.
With amp: True, it processed 160 frames before stopping because it ran out of VRAM.
With amp: False, it processed only about 65 frames.

Taking max_internal_size back down a bit, to 540:
With amp: True, I was able to process 1055 frames.
With amp: False, I was able to process 755 frames.

So basically, at a max_internal_size of 480 or lower, amp is harmful to VRAM usage, and the more max_internal_size is increased, the more benefit amp provides.
Can you confirm if this result makes sense? I am not sure if it could just be something peculiar to my own system, or if this result is expected.

Not tracking properly

Hi Rex, I'm a big fan and user of your work.

I'm trying to get Cutie to work, but it seems to be failing to track my masks. I have the examples running fine, but when I try to apply this to a video of a drone, it totally fails.

Frame 1 (mask generated externally)

Frame 4 (hasn't moved from initial position)

Frame 37 (track begins to vanish)

I've tried different configs, but no matter what, Cutie does this for this particular object. Do you have any idea what the issue might be? I can use XMem, but Cutie seems to perform better.

Memory settings

I was wondering if you could provide an overview of the memory settings on the right side of the GUI. While I understand what most of them do at a high level, from a practical standpoint I am not sure when, or whether, I should change them.

What are some situations where it might be beneficial to increase them? Are higher settings just better if you have a lot of vram?
Also, I'm not quite sure what "memory frame every (r)" actually does.
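Assume the setting behaves like XMem's mem_every, where r means a frame is added to working memory every r frames. The sketch below lists which frame indices would then become memory frames. memory_frame_indices is a hypothetical helper, and the real code may also treat annotated or committed frames specially. A smaller r stores more frames, which generally costs more memory and compute per frame in exchange for denser temporal context.

```python
from typing import List


def memory_frame_indices(num_frames: int, r: int) -> List[int]:
    """Frame indices stored in memory if frame 0 is stored and then every r-th frame after it."""
    if r < 1:
        raise ValueError("r must be >= 1")
    return list(range(0, num_frames, r))
```

For a 100-frame clip, r=5 would store 20 frames while r=10 would store 10.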

Inquiry About AVA and HACS Files in the BURST Dataset

Hello,

First of all, thank you for your outstanding work. I am currently attempting to reproduce training results on the Mega dataset and have encountered some difficulties.

I've noticed that I'm missing some files from the AVA and HACS collections, which should be located in the .../BURST/frames/... directory. There seems to be no mention of the use of these files in the training process in the article. Could you please clarify if these files were used in the training? And if so, is there a way to obtain them without a university email?

Question: Extract ALPHA result?

Is there a way to export a PNG sequence or video with the result as an alpha channel?

For example:
Only the tracked subject, masked (soft or binary) without the background, instead of applying the alpha manually in compositing software.

If not, will you consider to add such feature? 🙏
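As a sketch of what such an export involves, assume the frame is an (H, W, 3) uint8 RGB array and the mask is an (H, W) array with values in [0, 1] (soft) or {0, 1} (binary). Stack the mask as a fourth channel and save the result as an RGBA PNG with Pillow. to_rgba and save_rgba_png are hypothetical helpers, not an existing Cutie option.

```python
import numpy as np
from PIL import Image


def to_rgba(rgb: np.ndarray, alpha: np.ndarray) -> np.ndarray:
    """Stack an (H, W, 3) uint8 frame with an (H, W) alpha mask in [0, 1] into (H, W, 4) uint8."""
    a = np.clip(alpha.astype(np.float32), 0.0, 1.0)
    a8 = np.round(a * 255).astype(np.uint8)
    return np.dstack([rgb.astype(np.uint8), a8])


def save_rgba_png(rgb: np.ndarray, alpha: np.ndarray, path: str) -> None:
    # Pillow infers RGBA mode from an (H, W, 4) uint8 array
    Image.fromarray(to_rgba(rgb, alpha)).save(path)
```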

Is there a way to solve the noise caused by camera disturbance?

Camera motion introduces a lot of false target information into the video frames. Is there a way to deal with the noise caused by camera disturbance?

Interpolation alignment

Have you checked whether this line is affected by pytorch/pytorch#34808?

Cutie/cutie/inference/inference_core.py, line 223 in 555d016:

mode='nearest',

I have not analyzed all the interpolations you use across PyTorch and OpenCV, so I am not sure whether you are impacted.
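As I understand it, the linked PyTorch issue concerns how mode='nearest' picks source pixels with floor(dst_index * scale). This is offset by half a pixel relative to libraries that sample pixel centres. mode='nearest-exact', available in newer PyTorch versions, uses floor((dst_index + 0.5) * scale) instead. The small sketch below shows the difference on a 2x downsample. Whether it matters for the interpolation in inference_core.py is not established here.

```python
import torch
import torch.nn.functional as F

x = torch.arange(4.0).view(1, 1, 4)  # a 1-D "image" with pixel values 0, 1, 2, 3

# 'nearest' samples source index floor(i * 2): picks pixels 0 and 2
legacy = F.interpolate(x, size=2, mode="nearest")

# 'nearest-exact' samples floor((i + 0.5) * 2): picks pixels 1 and 3
exact = F.interpolate(x, size=2, mode="nearest-exact")
```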

Question: restarting inference.

First of all, I really appreciate your great work.
I have a question about inference.

I'm trying to integrate this model into my application, and I want to restart object tracking whenever a trigger that provides a new input mask and input image is called.

It seems that InferenceCore's methods clear_memory and clear_sensory_memory could work for this.
How can I clear memory of object and restart tracking?

Thanks!

About BURST

Hi!

First of all, thanks for your great work and code, specifically the new Dataloader and BURST tools.
However, I have some questions about the following:

Code issue
convert_burst_to_vos_train.py seems to have several problems:

#L28: should be removed
#L65: output_folder should be visualize_folder

BURST Tools
If I understand correctly, I can use convert_burst_to_vos_train.py to convert the BURST masks from JSON to PNG.

python scripts/convert_burst_to_vos_train.py --json_path ../BURST/annotations/train/train.json --input_path ../BURST/frames/train --output_path ../BURST/vos_train/train # generate the train split, which has all masks
python scripts/convert_burst_to_vos_train.py --json_path ../BURST/annotations/val/first_frame_annotations.json --input_path ../BURST/frames/val --output_path ../BURST/vos_train/val_first_annotations # generate the val split, which has the first annotation of each object
python scripts/convert_burst_to_vos_train.py --json_path ../BURST/annotations/val/all_classes.json --input_path ../BURST/frames/val --output_path ../BURST/vos_train/val_all_annotations # generate the val split, which has all annotations of each object

I used all_classes.json to generate full masks for the val split, then used mask_to_burst_json.py to convert them back into a prediction JSON.
When I evaluated the prediction JSON, the metric was not 100 for either the val or the test split.

If these two tools are exactly correct and lossless (i.e., JSON to PNG and PNG to JSON), why is the metric not 100?
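One possible source of loss, not verified against BURST here: a PNG ID mask stores one object per pixel. If annotations of different objects overlap, flattening them into a PNG and splitting them back cannot recover the overlapped pixels of the object that was painted first. Below is a minimal sketch with hypothetical helpers that shows the effect on two overlapping masks.

```python
from typing import List

import numpy as np


def flatten_to_id_map(masks: List[np.ndarray]) -> np.ndarray:
    """Paint boolean masks into one ID map; later objects overwrite earlier ones where they overlap."""
    id_map = np.zeros(masks[0].shape, dtype=np.uint8)
    for obj_id, m in enumerate(masks, start=1):
        id_map[m] = obj_id
    return id_map


def split_id_map(id_map: np.ndarray, num_objects: int) -> List[np.ndarray]:
    """Recover one boolean mask per object ID from the flattened map."""
    return [id_map == obj_id for obj_id in range(1, num_objects + 1)]


def iou(a: np.ndarray, b: np.ndarray) -> float:
    union = np.logical_or(a, b).sum()
    return 1.0 if union == 0 else float(np.logical_and(a, b).sum() / union)
```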

Your precomputed_results

I downloaded your results (cutie-base-wmose_burst-val.zip) to verify the correctness of my evaluation tool for BURST-benchmark.
Although the metric is the same as yours, when I use mask_to_burst_json.py, I find that all the PNGs in Annotations are empty (null), like this.

As a result, they cannot be used with mask_to_burst_json.py. I would like to know the reason for this.

Sorry for the many questions, thank you for your patient reply!

Improve mask deletion in GUI

Thanks for your great work on this family of models.

One thing that is difficult with the UI is deleting object tracks once they have started. I tried resetting an object in a frame, but it's not trivial to propagate those deletions backward in time.

Some suggestions:

- Show object presence over time, essentially a boolean grid of "objects" by "time"
  - Have the ability to select and delete time ranges for a single object
- Have the ability to delete portions of the mask (a mask brush)
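The first two suggestions can be sketched on top of per-frame ID masks, assuming each frame's output is an (H, W) integer array with 0 as background. presence_grid and delete_range are hypothetical helpers, not existing GUI functions.

```python
from typing import List

import numpy as np


def presence_grid(id_masks: List[np.ndarray], num_objects: int) -> np.ndarray:
    """Boolean (num_objects, num_frames) grid: True where object k+1 appears in frame t."""
    grid = np.zeros((num_objects, len(id_masks)), dtype=bool)
    for t, m in enumerate(id_masks):
        ids = np.unique(m)
        ids = ids[(ids > 0) & (ids <= num_objects)]
        grid[ids - 1, t] = True
    return grid


def delete_range(id_masks: List[np.ndarray], obj_id: int, start: int, end: int) -> None:
    """Erase obj_id (set it to background) in frames start..end-1, in place."""
    for t in range(start, end):
        id_masks[t][id_masks[t] == obj_id] = 0
```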
