songqi-github / attanet

AttaNet for real-time semantic segmentation.

Python 100.00%

real-time semantic-segmentation pytorch cityscapes aaai2021 scene-parsing

attanet's Issues

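Question about accuracy and speed testing

Hello, I would like to know whether the 70.1% mIoU accuracy was obtained with an input size of 512 * 1024 or 1024 * 2048.
The paper says that for the speed test the input is first resized to 512 * 1024 and the output is then resized back to the original size. In the code, however, accuracy is evaluated with scale=1.0 by default, that is, at 1024 * 2048. This confuses me: the speed and accuracy tests do not seem to match. Could you please clarify?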

FPS question

Could you provide the code for the FPS test? I can't reproduce the reported FPS with my own code.

I've downloaded the dataset from https://www.cityscapes-dataset.com/file-handling/?packageID=3: leftImg8bit_trainvaltest.zip (11GB) [md5], left 8-bit images - train, val, and test sets (5000 images).
The archive contains three folders (train, val, test), but your code refers to a /gtFine/train folder, which does not seem to belong to any of them. Could you explain where this folder comes from and what it is for? Thanks!
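For context, Cityscapes ships its fine annotations in a separate archive (gtFine_trainvaltest.zip) that unpacks into a gtFine folder mirroring the train/val/test/city layout of leftImg8bit. The sketch below shows how an image path can be mapped to its label path under that layout. The helper name `label_path_for` is hypothetical, and the choice of the `_gtFine_labelIds.png` suffix is an assumption; this repository's loader may read a different label file.

```python
from pathlib import PurePosixPath


def label_path_for(image_path, label_suffix="_gtFine_labelIds.png"):
    """Map a leftImg8bit image path to the matching gtFine label path.

    Assumes the standard Cityscapes layout:
      leftImg8bit/<split>/<city>/<name>_leftImg8bit.png
      gtFine/<split>/<city>/<name>_gtFine_labelIds.png
    """
    p = PurePosixPath(image_path)
    parts = list(p.parts)
    # Swap the image root folder for the annotation root folder.
    parts[parts.index("leftImg8bit")] = "gtFine"
    # Swap the image file suffix for the label file suffix.
    name = p.name.replace("_leftImg8bit.png", label_suffix)
    return str(PurePosixPath(*parts[:-1], name))


print(label_path_for(
    "data/leftImg8bit/train/aachen/aachen_000000_000019_leftImg8bit.png"
))
```

Under this layout, /gtFine/train holds the ground-truth masks for the images in /leftImg8bit/train.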

memory usage and fps on CPU-Only

@songqi-github
Your work shows potential, though some things require clarification:

How much memory is required for inference on a 1024 x 1024 image using only the CPU?
For CPU-only inference, what is the FPS at 1024*1024?

2 class segmentation

Hi there,
Can AttaNet work with high accuracy when having only 2 classes for segmentation?

Quantization and Pruning

@songqi-github
Have you considered quantizing the ResNet18 model > training the model > pruning the trained model > fine-tuning?
I believe that the final ResNet18 model could achieve much higher efficiency and FPS while maintaining almost the same accuracy.

Loading the pretrained model

When I loaded the pretrained model 'resnet18-5c106cde.pth', an error occurred during training, as shown below.

Missing key(s) in state_dict: "head.resnet.conv1.weight", "head.resnet.bn1.weight", "head.resnet.bn1.bias", "head.resnet.bn1.running_mean", "head.resnet.bn1.running_var",.......
Unexpected key(s) in state_dict: "conv1.weight", "bn1.running_mean", "bn1.running_var", "bn1.weight", "bn1.bias", "layer1.0.conv1.weight", "layer1.0.bn1.running_mean",........
Do you know how to fix it?
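The error message suggests that the checkpoint holds bare ResNet-18 keys ("conv1.weight", ...) while the model expects the same keys under a "head.resnet." prefix. One possible workaround, sketched below, is to add that prefix before loading. The helper `add_prefix` is hypothetical and not part of this repository; the prefix is taken from the error message above.

```python
from collections import OrderedDict


def add_prefix(state_dict, prefix="head.resnet."):
    """Return a copy of state_dict with prefix prepended to keys that lack it."""
    remapped = OrderedDict()
    for key, value in state_dict.items():
        new_key = key if key.startswith(prefix) else prefix + key
        remapped[new_key] = value
    return remapped


# Hypothetical usage with PyTorch (not executed here):
#   import torch
#   backbone = torch.load("resnet18-5c106cde.pth", map_location="cpu")
#   # strict=False tolerates keys present on only one side, such as the
#   # classifier weights in the checkpoint or non-backbone layers in the model.
#   model.load_state_dict(add_prefix(backbone), strict=False)

print(list(add_prefix({"conv1.weight": 0, "bn1.bias": 1}).keys()))
```

An alternative under the same assumption is to load the unmodified checkpoint directly into the backbone submodule rather than into the full model.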

The Baidu Netdisk link for the trained models has expired; could you upload them again?

There is no train.py

Hi, this is good work on real-time semantic segmentation. However, there is no training file here. Could you upload it when you have time? I also have another question: what is meant by "The training settings require 8 GPU with at least 11GB memory."?

Is there a Python file for the ADE20K dataset, in addition to the one for Cityscapes?

The release of the code

Hi there,
When is the code expected to be released?

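About speed testing

Hello author, I have some questions about the speed test. Is it correct that, when measuring speed, you first downscale the image to a smaller resolution such as 512x1024, feed the downscaled image into the model, and then upscale the output back to 1024x2048? If so, the resolution of the speed test cannot be reported as 1024x2048; it is 512x1024. Otherwise, accuracy should be measured with the same procedure, or the comparison is not fair.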

Quantitative results of AttaNet

Hi, thanks for your great work on AttaNet; I'm very interested in your research.
After reading the paper and reviewing the code, I'm confused about the inference speed and evaluation results of the method.

AttaNet is tested with an input size of 512x1024 and achieves 130 FPS with a ResNet-18 backbone, while the mIoU is evaluated with crop_eval and flip testing; see AttaNet/evaluate.py, line 59 in 32fd818:

def crop_eval(self, im):

Therefore, the mIoU (78.5 on ResNet-18) is evaluated with crop and flip while the inference time is measured by a single 512x1024 input. The inference time and the evaluated results might not be consistent.

Suggestion: replace your teaser

Based on the following issues:

#16
#13

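The teaser results of this paper are completely wrong.

A real-time model cannot report accuracy from multi-scale crop testing while reporting speed at a small image resolution.

With full-image testing, AttaNet only reaches a little under 76.

In addition, the speed test does not call CUDA synchronization, so the speed measurement is also incorrect.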
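To illustrate the synchronization point: CUDA kernels launch asynchronously, so reading the clock without waiting for the GPU can time only the kernel launches. Below is a minimal timing sketch, not the authors' benchmark code. The function `measure_fps` and its parameters are hypothetical; with PyTorch, one would pass `torch.cuda.synchronize` as `sync`.

```python
import time


def measure_fps(run_once, n_warmup=10, n_iters=100, sync=None):
    """Average frames per second of run_once().

    sync is called right before each clock read so that any pending
    asynchronous work (e.g. queued GPU kernels) is included in the timing.
    """
    if sync is None:
        sync = lambda: None  # nothing to wait for on a synchronous backend
    for _ in range(n_warmup):  # warm-up iterations are not timed
        run_once()
    sync()
    start = time.perf_counter()
    for _ in range(n_iters):
        run_once()
    sync()
    elapsed = time.perf_counter() - start
    return n_iters / elapsed


# Hypothetical usage with PyTorch (not executed here):
#   import torch
#   x = torch.randn(1, 3, 512, 1024, device="cuda")
#   with torch.no_grad():
#       fps = measure_fps(lambda: model(x), sync=torch.cuda.synchronize)
```

The input resolution used in such a measurement should be the same one used for the reported accuracy, which is the concern raised in the issues above.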

Visualization results of AFM, SAM, and predictions

Your experimental results left a deep impression on me. This is very good work. Could you upload your visualization code, covering AFM, SAM, and the prediction results?

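A small question about the test set

When evaluating on the test set, are images downsampled to 512*1024 before being fed to the model? Are they cropped to (1024,1024) as in your evaluate script? And are accuracy-boosting tricks such as multi-scale testing used at test time?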

Structural issues with the Strip Attention Module

While reading the SAM part of the code alongside the paper, I found that the operations on Q and K do not correspond to each other. Simply swapping the names of Q and K in the code does not fix this, because the subsequent transpose is still applied to Q.