liheyoung / unimatch
[CVPR 2023] Revisiting Weak-to-Strong Consistency in Semi-Supervised Semantic Segmentation
Home Page: https://arxiv.org/abs/2208.09910
License: MIT License
Hello, I appreciate your generosity in sharing your code. I have attempted to replicate your work on two GTX 4090s with a combined memory of 24 GB. However, during training on the Cityscapes dataset, the GPU memory proved insufficient, forcing me to reduce the batch size to 1 and the backbone to ResNet-50. Despite these adjustments, the program still consumes about 23 GB. How much memory was typically required during your own training?
1. The first question:
example:
loss_u_s2 = criterion_u(pred_u_s2, mask_u_w_cutmixed2)
loss_u_s2 = loss_u_s2 * ((conf_u_w_cutmixed2 >= cfg['conf_thresh']) & (ignore_mask_cutmixed2 != 255))
loss_u_s2 = torch.sum(loss_u_s2) / torch.sum(ignore_mask_cutmixed2 != 255).item()
Taking the loss above as an example: after the unsupervised loss is computed per pixel, the loss map is filtered by two conditions (the second line). The third line then divides by a pixel count. Can this be understood as a normalizing weight? And why is the denominator the region satisfying ignore_mask != 255, rather than the region kept by both conditions?
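For anyone puzzling over the same normalization, here is a minimal sketch (not the repository's exact code; `masked_unsup_loss` and all names are invented for illustration) of masking a per-pixel loss by confidence and validity, then dividing by the count of valid pixels rather than kept pixels:

```python
import torch

# Minimal sketch of confidence-masked loss normalization (invented names).
def masked_unsup_loss(loss_map, conf, ignore_mask, conf_thresh=0.95):
    # keep only pixels that are confident AND not ignored
    keep = (conf >= conf_thresh) & (ignore_mask != 255)
    loss_map = loss_map * keep
    # divide by the number of *valid* (non-ignored) pixels, not the number of
    # kept pixels: low-confidence pixels then contribute exactly 0 loss
    # instead of being renormalized away
    return loss_map.sum() / (ignore_mask != 255).sum().clamp(min=1)

loss_map = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
conf = torch.tensor([[0.99, 0.50], [0.97, 0.99]])
ignore = torch.tensor([[0, 0], [0, 255]])
loss = masked_unsup_loss(loss_map, conf, ignore)  # (1 + 3) kept / 3 valid pixels
```

Dividing by the valid-pixel count means the threshold acts as a soft down-weighting of the whole loss, which may be what the "normalization of weight" intuition in the question is pointing at.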
2. The second question:
Since this paper targets semantic segmentation, you adopt two conditions when generating pseudo labels for the unlabeled data: argmax() and a confidence threshold greater than 0.95. My current research is a binary classification task, so 0.5 implicitly becomes the first condition. If I want to borrow your idea, do I still need the 0.95 threshold?
3. The third question (possibly related to the second):
If I don't use 0.95 as the threshold, can the weight of the unlabeled loss be fixed at 0.5 or 0.25? Do you think there are other good options?
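As a hedged illustration of how questions 2 and 3 interact in a binary setting (an assumption-laden sketch, not the authors' recipe; `binary_pseudo_label` is an invented name): the 0.5 cut produces the pseudo label itself, while a stricter gate on the predicted class's probability can still play the role of the 0.95 threshold:

```python
import torch

# Sketch for a binary task: pseudo label from thresholding sigmoid at 0.5,
# plus a separate, stricter confidence gate deciding which pixels join the loss.
def binary_pseudo_label(logits, conf_thresh=0.95):
    prob = torch.sigmoid(logits)
    pseudo = (prob > 0.5).long()          # argmax-equivalent for two classes
    conf = torch.maximum(prob, 1 - prob)  # confidence of the predicted class
    mask = conf >= conf_thresh            # keep only confident pixels
    return pseudo, mask

logits = torch.tensor([3.0, 0.1, -4.0])
pseudo, mask = binary_pseudo_label(logits)
```

Here the middle prediction (sigmoid ≈ 0.52) still gets a pseudo label of 1 but is excluded from the loss, so the 0.5 cut and the stricter gate serve different purposes.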
hello! Thanks for the good research!
I tried to use ResNet-50/101, but the download link given under Pretrained Backbone does not work.
please check! 🙏 thank you!
Thank you so much for your wonderful work! Have you considered introducing a teacher model (EMA) to produce the output of the weak view? In semi-supervised semantic segmentation, a teacher model can produce a more robust output.
I was training on Cityscapes with the same config setup but only 2 GPUs. I got an error which seems to suggest that I cannot train the model with a batch size of 1 per GPU because of the batch-norm layer. However, I saw you successfully trained with batch size 1 in your Cityscapes training log. I wonder how this could be fixed, thank you!
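A common workaround for BatchNorm failing at batch size 1 per GPU is converting the model to SyncBatchNorm before wrapping it in DistributedDataParallel, so normalization statistics are pooled across GPUs. A hedged sketch (whether the repository's script already does this should be checked against the source):

```python
import torch
import torch.nn as nn

# BatchNorm cannot compute statistics from a single sample in train mode, but
# SyncBatchNorm pools statistics across processes, so an effective batch of
# (1 per GPU) x N GPUs works under DistributedDataParallel.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8))
model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
# In a distributed run, one would then wrap it (local_rank set by the launcher):
# model = nn.parallel.DistributedDataParallel(model.cuda(), device_ids=[local_rank])
```

This only helps in multi-process training; with a single process and batch size 1, the usual alternative is a larger batch or freezing/replacing the BN layers.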
Hi,
Great work. I have some confusion about the supervised baseline and UniMatch comparison on the Pascal VOC 2012 dataset.
Root Cause (first observed failure):
[0]:
time : 2023-06-19_18:05:03
host : gpu1
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 139953)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
Your model is based on the FixMatch method. I see that FlexMatch is also mentioned in your paper, so why not use FlexMatch as the baseline?
I see that your feature-perturbation implementation applies dropout to the original features and concatenates the result with the originals as decoder input; in other words, the loss computed this way updates both the encoder and the decoder.
Have you tried detaching the dropped features before feeding them into the decoder? That way the dropped-feature stream would train only the decoder, i.e. gradients would flow through the decoder alone.
Where do the pre-trained weights of ResNet-50 come from?
I would like to know where the pre-trained weights provided in the README were obtained. I found a significant gap between results trained from the ResNet-50 weights on the official PyTorch website and results trained from the weights linked in the README.
pytorch resnet50: https://download.pytorch.org/models/resnet50-19c8e357.pth
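When weights from two sources are suspected to differ, a quick sanity check is to diff the two state_dicts before training. A small illustrative helper (`diff_state_dicts` is an invented name, demonstrated on toy modules rather than the actual checkpoints):

```python
import torch
from torch import nn

# Compare two state_dicts: report keys present in only one of them, and keys
# whose tensors differ in shape or value.
def diff_state_dicts(sd_a, sd_b, atol=1e-6):
    missing = set(sd_a) ^ set(sd_b)
    changed = [k for k in set(sd_a) & set(sd_b)
               if sd_a[k].shape != sd_b[k].shape
               or not torch.allclose(sd_a[k].float(), sd_b[k].float(), atol=atol)]
    return missing, changed

# Toy demo: two independently initialized layers share keys but differ in values.
a = nn.Linear(4, 2).state_dict()
b = nn.Linear(4, 2).state_dict()
missing, changed = diff_state_dicts(a, b)
```

Running this on the torchvision checkpoint versus the README-provided one would show immediately whether they are merely differently packaged or genuinely different weights.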
I can find the Cityscapes and COCO weights, but couldn't find any PASCAL VOC weight file. Reproducing your code without modification (config checked, see below), I see a gap of 2-3%: I got 76.91 on the 732 split, while yours is 79.9. Could you upload a weight file for PASCAL VOC? Thank you. Really appreciate your work, by the way!
[2023-04-10 19:41:31,486][ INFO] {'backbone': 'resnet101',
'batch_size': 2,
'conf_thresh': 0.95,
'config': 'configs/pascal.yaml',
'criterion': {'kwargs': {'ignore_index': 255}, 'name': 'CELoss'},
'crop_size': 321,
'data_root': '/data2/ksy/PASCALVOC2012/',
'dataset': 'pascal',
'dilations': [6, 12, 18],
'epochs': 80,
'labeled_id_path': 'splits/pascal/732/labeled.txt',
'local_rank': 0,
'lr': 0.001,
'lr_multi': 10.0,
'model': 'deeplabv3plus',
'nclass': 21,
'ngpus': 2,
'port': 20024,
'replace_stride_with_dilation': [False, False, True],
'save_path': 'exp/pascal/unimatch/r101/732',
'unlabeled_id_path': 'splits/pascal/732/unlabeled.txt'}
Hello, I have read your excellent semi-supervised semantic segmentation method closely, and I noticed the change detection content in the additional experiments. How did you apply this method to the change detection task? I made simple modifications to the source code to turn it into binary classification, but I don't know the details of the other parts. If you have the source code, could you upload the relevant parts, or describe the implementation details of applying it to change detection?
Hello, first of all, thank you very much for your contribution!
My question: while trying to train on my own dataset (medical vessel segmentation, so I directly used more_scenarios/medical), I tried to modify the files under more_scenarios/medical/spilts/acdc and found many files like the following.
What I would like to ask about:
more_scenarios/medical/spilts/mydataset/1/labeled.text
and more_scenarios/medical/spilts/mydataset/1/unlabeled.text
Thanks again for your contribution, and please kindly advise!
Amazing work! Simple and effective!
I noticed that building "ignore_mask" requires the "mask" (dataset/semi.py).
Then I noticed that "ignore_mask" and "ignore_mask_mix" are involved in building the loss, so is some label (GT mask) information used in computing the loss on unlabeled data? (unimatch.py, and also fixmatch.py)
Thank you!
Hello, when training and running inference locally, I found that whether cuDNN acceleration is used (torch.backends.cudnn.enabled) makes a significant difference. I noticed your ST++ code leaves it unset (at the default), while the UniMatch code sets it to True.
What I would like to ask:
1. Should torch.backends.cudnn.enabled be set consistently between training and testing? (I noticed that if it is inconsistent, the results seem to come out wrong?)
2. What is the default value of torch.backends.cudnn.enabled? (I could not find the default described in the official documentation, so I am asking you.)
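For what it's worth, these flags can simply be printed to check the defaults on a given install (the commented values are what recent PyTorch releases ship with; verify against your own version):

```python
import torch

# Inspect the cuDNN-related flags on this PyTorch install.
print(torch.backends.cudnn.enabled)        # True by default
print(torch.backends.cudnn.benchmark)      # False by default
print(torch.backends.cudnn.deterministic)  # False by default
```

Since cuDNN may select different (non-deterministic) kernels than the fallback implementations, keeping the flag consistent between training and testing is the safer choice when comparing numbers.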
Hi, thanks for the great work!
I found that you use an additional dataloader for the CutMix sources.
Since most prior work mixes using images from the same loaded batch, I was wondering what the intention is behind drawing the mixing sources from another batch, and whether the common setting (a single dataloader, mixing within the same batch) would affect the training result. Thanks!
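For comparison, the "common setting" the question mentions can be sketched as in-batch CutMix, where each image borrows a patch from another image of the same batch (`inbatch_cutmix` is an invented name; the box is fixed here for clarity, whereas real CutMix samples it randomly):

```python
import torch

# Minimal in-batch CutMix sketch: pair each image with the previous one in the
# batch via a roll, then paste a rectangular region from that partner.
def inbatch_cutmix(imgs, box):
    """box = (y0, y1, x0, x1); paste that region from a rolled batch."""
    mixed = imgs.clone()
    src = torch.roll(imgs, shifts=1, dims=0)   # image i mixes with image i-1
    y0, y1, x0, x1 = box
    mixed[:, :, y0:y1, x0:x1] = src[:, :, y0:y1, x0:x1]
    return mixed

imgs = torch.arange(2 * 1 * 4 * 4, dtype=torch.float32).reshape(2, 1, 4, 4)
mixed = inbatch_cutmix(imgs, (0, 2, 0, 2))
```

A separate dataloader decouples the mix source from the current batch (useful, e.g., when per-GPU batches are tiny), but this in-batch form is the setup most prior work uses.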
Hello, your work has inspired me a lot! But I have a question about the weak augmentation of the raw image. The paper mentions applying weak perturbations such as crop and flip to the input images at the image level. However, I couldn't find the specific code for these perturbations. Could you let me know which perturbations were applied and point to the relevant code? Thank you very much!
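The standard weak recipe described in this line of work is random horizontal flip plus random crop, applied identically to the image and its mask. A hedged sketch under that assumption (`weak_augment` is an invented name, not the repository's transform code):

```python
import random
import torch

# Sketch of typical "weak" augmentations: random horizontal flip + random crop,
# with the mask transformed in lockstep so labels stay aligned with pixels.
def weak_augment(img, mask, crop=2):
    if random.random() < 0.5:                      # horizontal flip
        img, mask = img.flip(-1), mask.flip(-1)
    h, w = img.shape[-2:]
    y = random.randint(0, h - crop)                # random crop origin
    x = random.randint(0, w - crop)
    return img[..., y:y+crop, x:x+crop], mask[..., y:y+crop, x:x+crop]

img = torch.rand(3, 4, 4)
mask = torch.randint(0, 21, (4, 4))
img_w, mask_w = weak_augment(img, mask)
```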
Is there any difference between the DeepLabv3+ and ResNet code here and the code in your previous work (ST++)?
I noticed there seems to be a difference in the parameter settings.
Hi,
Thanks for sharing your code. I found that only the config file for PASCAL at 321x321 is provided. Could you please provide your training settings for high-resolution (513x513) training, or do they share the same settings? Thanks.
Did you use CutMix to reproduce the results in the PASCAL tables?
Hi, Thanks for your great job,
Can a single GPU be used for training? How do I evaluate after training; could you provide an eval.sh file?
Hello, what do the numbers at the start of the log file names under training_logs mean? For example, the 1464, 183 and 366 in 1464-run1.log, 183-run1.log and 366-run1.log under training-logs/Pascal-VOC-2012/High-Quality-Split-Size321/ResNet-101/.
Hi, thank you for the work!
Not sure if I'm missing something, but why are masks for the unlabeled images included in the splits folder? I see that in semi.py we process the masks for the unlabeled images as well, whereas according to Algorithm 1 in the paper the predicted mask comes from the perturbed images.
Is it possible to run this model on a dataset for which we don't have masks for the unlabeled images?
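One plausible workaround, assuming the unlabeled ground truth is only consumed as a valid-pixel mask (an assumption to verify against the source, not a confirmed property of the code): substitute an all-valid dummy mask of the right size:

```python
import numpy as np

# Hypothetical workaround, not from the repository: if the ground truth of the
# unlabeled images is only used to mark invalid/ignored pixels (value 255),
# an all-zero dummy mask of the right size makes every pixel count as valid.
def dummy_ignore_mask(height, width):
    return np.zeros((height, width), dtype=np.uint8)

mask = dummy_ignore_mask(6, 8)
```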
A nice work!
Is the data split of the Cityscapes dataset the same as that of the previous work (i.e., ST++)? I noticed there were many splits in the previous work. Is it the same as theirs? Thank you!
Hello, where is the specific code for the feature perturbation? I couldn't find it; could you help point it out? Thanks!
Hello, I would like to ask about the file segmentclass.zip: it is not quite the same as the official dataset, and I'd like to know what the difference is.
I noticed that, compared to the previous version, the latest code removes 'multi_grid' from the backbone (ResNet) and changes settings such as the dilations.
May I ask whether the adjusted version improves performance?
Hi @LiheYoung, thanks for maintaining the list of Awesome Semi-Supervised Semantic Segmentation!
Could you kindly add the following paper into the list? Thanks!
Could you provide pre-trained models on the Cityscapes dataset (under the different data partitions)?
I think it would help other researchers quickly reproduce this excellent work, thank you!
Hi,
Great work! I was wondering where your pretrained backbone comes from?
Renaud
Hello
In the DusPerb framework it is said that a shared weak view of an image is used to supervise the two strong views. But in the UniMatch code I see that each strong view has a corresponding weak pseudo label generated from the forward pass of the weakly augmented image: a weak1 pseudo label supervises the strong1 image and a weak2 pseudo label the strong2 image. The two strong views are generated by pasting different random patches, so they each need their own pseudo label created in the weak manner. Could you help me with this?
Also, a second question: does the DusPerb framework by itself, without feature perturbation, outperform the FixMatch baseline?
Many thanks
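For reference, the paper's description of dual-stream perturbation can be sketched as one detached weak prediction supervising both strong views. This is a simplified illustration that ignores CutMix, which would require a correspondingly mixed copy of the label per view (`dual_stream_loss` is an invented name):

```python
import torch
import torch.nn.functional as F

# Sketch of dual-stream supervision: ONE weak view produces the pseudo label,
# and that SAME label supervises both strong views, gated by confidence.
def dual_stream_loss(pred_w, pred_s1, pred_s2, conf_thresh=0.95):
    prob_w = pred_w.detach().softmax(dim=1)     # no gradient through weak view
    conf, pseudo = prob_w.max(dim=1)            # confidence + hard pseudo label
    keep = (conf >= conf_thresh).float()        # confidence gate
    l1 = (F.cross_entropy(pred_s1, pseudo, reduction="none") * keep).mean()
    l2 = (F.cross_entropy(pred_s2, pseudo, reduction="none") * keep).mean()
    return (l1 + l2) / 2

pred_w = torch.randn(2, 21, 8, 8)               # weak-view logits (N, C, H, W)
loss = dual_stream_loss(pred_w, torch.randn(2, 21, 8, 8), torch.randn(2, 21, 8, 8))
```

Under this reading, the per-view "weak1/weak2" labels in the code would just be per-view CutMix-rearranged copies of the same underlying weak prediction, not two independent weak forward passes.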
It still doesn't work and reports an error. Could you give me some pointers?
warnings.warn(
usage: launch.py [-h] [--nnodes NNODES] [--nproc_per_node NPROC_PER_NODE] [--rdzv_backend RDZV_BACKEND] [--rdzv_endpoint RDZV_ENDPOINT] [--rdzv_id RDZV_ID] [--rdzv_conf RDZV_CONF] [--standalone]
[--max_restarts MAX_RESTARTS] [--monitor_interval MONITOR_INTERVAL] [--start_method {spawn,fork,forkserver}] [--role ROLE] [-m] [--no_python] [--run_path] [--log_dir LOG_DIR]
[-r REDIRECTS] [-t TEE] [--node_rank NODE_RANK] [--master_addr MASTER_ADDR] [--master_port MASTER_PORT] [--use_env]
training_script ...
launch.py: error: the following arguments are required: training_script, training_script_args
train.sh: 28: unimatch.py: not found
Hello,
Would it be easy to reproduce the results for the PASCAL dataset on a single GPU? What learning rate is required when training on a single GPU to reproduce the FixMatch / UniMatch results?
Thanks
Great work! According to your paper, UniMatch performs well on most splits, especially small ones (like 92 and 183). However, when I tried to reproduce it and explore further, I found an interesting phenomenon.
In most semi-supervised semantic segmentation methods, a larger crop size (513 vs. 321) usually improves performance. On the 92 split, UniMatch achieves good performance (74.5-75.0) at crop size 321. At crop size 513x513, however, performance only reaches around 72.5-73.0, and overfitting appears very early, leading to degradation.
I noticed that the results reported in the paper are also at 321 crop size. Have you run experiments at 513 crop size on the small splits (92 or 183)?
Attached is the mIoU curve during training on the 92 split at 513 crop size. The peak (72.8) is reached at about 10 epochs; after 80 epochs the model sits around 68.
criterion_l = ProbOhemCrossEntropy2d(**cfg['criterion']['kwargs']).cuda(local_rank)
, but it isn't mentioned in the paper. Could you please tell me the details of how to use the OHEM loss?
Hello, after training on my own dataset and obtaining the weights, how do I use them to run prediction on the test set?
Hello, when implementing FixMatch, does your noisy-label filtering discard low-confidence pixels and keep only high-confidence pixels for training, or does it filter out entire low-confidence images, so that only pixels from high-confidence images participate in training?
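The two filtering granularities the question contrasts can be sketched like this (invented helper names; per-pixel keeps individual confident pixels, while per-image keeps or drops a whole image by, say, its mean confidence):

```python
import torch

# Per-pixel filtering: each pixel is judged on its own confidence.
def pixel_mask(conf, t=0.95):
    return conf >= t                              # (N, H, W) boolean, per pixel

# Per-image filtering: the whole image is kept or dropped together,
# here using mean confidence as the (illustrative) image-level criterion.
def image_mask(conf, t=0.95):
    keep = conf.flatten(1).mean(dim=1) >= t       # (N,) boolean, per image
    return keep[:, None, None].expand_as(conf)

conf = torch.tensor([[[0.99, 0.10],
                      [0.99, 0.99]]])             # one 2x2 confidence map
```

On this toy map, per-pixel filtering keeps three of four pixels, while per-image filtering drops the whole image because its mean confidence (≈0.77) misses the threshold.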
Hi, thanks for your work.
I want to ask why the CPS result on Cityscapes differs from the original paper.
Hello, in the medical scenario I replaced the UNet backbone with DeepLabv3+. But since medical images are single-channel, I cannot use the pretrained weights and have to train from scratch, and the following problem appeared. I excerpted one epoch of the log for clarity: in the evaluation stage, every Dice score is 0. Is this normal, or where might the problem be? I would appreciate your advice.
[2023-04-10 16:48:11,793][ INFO] ===========> Epoch: 3, LR: 0.00095, Previous best: 0.00
[2023-04-10 16:48:11,981][ INFO] Iters: 0, Total loss: 0.349
[2023-04-10 16:49:00,196][ INFO] Iters: 379, Total loss: 0.165
[2023-04-10 16:49:48,422][ INFO] Iters: 758, Total loss: 0.163
[2023-04-10 16:50:37,289][ INFO] Iters: 1137, Total loss: 0.161
[2023-04-10 16:51:25,787][ INFO] Iters: 1516, Total loss: 0.160
[2023-04-10 16:52:14,801][ INFO] Iters: 1895, Total loss: 0.159
[2023-04-10 16:53:03,536][ INFO] Iters: 2274, Total loss: 0.159
[2023-04-10 16:53:52,277][ INFO] Iters: 2653, Total loss: 0.157
[2023-04-10 16:54:40,751][ INFO] Iters: 3032, Total loss: 0.156
[2023-04-10 16:54:57,018][ INFO] ***** Evaluation ***** >>>> Class [0 Right Ventricle] Dice: 0.00
[2023-04-10 16:54:57,019][ INFO] ***** Evaluation ***** >>>> Class [1 Myocardium] Dice: 0.00
[2023-04-10 16:54:57,019][ INFO] ***** Evaluation ***** >>>> Class [2 Left Ventricle] Dice: 0.00
[2023-04-10 16:54:57,019][ INFO] ***** Evaluation ***** >>>> MeanDice: 0.00
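For anyone debugging a log like the one above: computing Dice on a tiny hand-made example separates a broken metric from collapsed predictions. A minimal per-class Dice sketch (`dice_per_class` is an invented helper; class 0 is treated as background):

```python
import numpy as np

# Minimal per-class Dice to sanity-check an evaluation pipeline: if this gives
# non-zero scores on a toy example but the training log stays at 0.00 for every
# class, the model output (not the metric) is the likely culprit, e.g.
# predictions collapsing to background after the input-channel change.
def dice_per_class(pred, gt, num_classes):
    scores = []
    for c in range(1, num_classes):               # skip background class 0
        p, g = pred == c, gt == c
        denom = p.sum() + g.sum()
        scores.append(2.0 * (p & g).sum() / denom if denom else 0.0)
    return scores

pred = np.array([[1, 1], [0, 2]])
gt = np.array([[1, 0], [0, 2]])
scores = dice_per_class(pred, gt, 3)
```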