
Spatial-Assistant Encoder-Decoder Network for Real Time Semantic Segmentation

The official implementation of "Spatial-Assistant Encoder-Decoder Network for Real Time Semantic Segmentation"

arXiv link: SANet

English | 中文

SANet

Comparison of SANet and other models: inference speed and accuracy of real-time models on the Cityscapes dataset

SANet network structure

SANet Network Structure

APPPM

APPPM did not come about overnight; its prototype is DAPPM. Our first idea was to reduce the number of DAPPM branches, which is indeed good for speed, and we tracked how accuracy changed (we saw almost no degradation), but we eventually abandoned this idea.

After that, we drew on the idea behind SPPF, which applies the "replace one 5x5 convolution with two 3x3 convolutions" trick to the pooling layers, and then tried applying the idea of asymmetric convolution to the pooling layers as well. In our experiments, the SPPF idea seemed to hurt accuracy (we did not study this thoroughly, so we cannot be completely sure), while the asymmetric pooling approach, APPPM, proved feasible.

Figures: DAPPM, DAPPM with reduced branches, SPPF variant, SPPF variant with addition

The feasibility of APPPM is easy to understand. Suppose you have a 30x30 feature map. If you halve it with ordinary pooling, the resolution drops directly to 15x15, and a convolution placed after the pooling layer can only extract features once, at 15x15 resolution. With asymmetric pooling, the resolution is first reduced to 30x15 and then to 15x15; if convolutions still follow each pooling step, features can additionally be extracted at the more detailed intermediate resolution.
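
To make the idea concrete, here is a minimal PyTorch sketch of asymmetric pooling (module name and channel sizes are illustrative only, not SANet's actual APPPM implementation):

```python
import torch
import torch.nn as nn

class AsymmetricPoolBlock(nn.Module):
    """Illustrative sketch: pool one spatial axis at a time so a convolution can
    extract features at the intermediate resolution (e.g. 30x15) before the
    second axis is reduced (15x15)."""
    def __init__(self, channels):
        super().__init__()
        # 1x2 pooling halves only the width: 30x30 -> 30x15
        self.pool_w = nn.AvgPool2d(kernel_size=(1, 2), stride=(1, 2))
        self.conv_w = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        # 2x1 pooling then halves the height: 30x15 -> 15x15
        self.pool_h = nn.AvgPool2d(kernel_size=(2, 1), stride=(2, 1))
        self.conv_h = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        x = self.conv_w(self.pool_w(x))  # features at the intermediate 30x15 resolution
        x = self.conv_h(self.pool_h(x))  # features at the fully pooled 15x15 resolution
        return x

# x = torch.randn(1, 64, 30, 30); AsymmetricPoolBlock(64)(x).shape -> (1, 64, 15, 15)
```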

Of course, as the figures above show, we also tried replacing ordinary convolutions with asymmetric convolutions as mentioned in DMRNet, and the feature-reuse operation mentioned in DDRNet. In the end, weighing speed against accuracy, we proposed APPPM. (The module was originally called TAPPM, and it still appears as TAPPM in SANet's model code.)

SAD

Specific structure of Simple Attention Decoder

Use

Prepare data

Download the Cityscapes and Camvid datasets from their websites. If the Camvid page reports "Website not found!", try downloading the Camvid dataset from the Motion-based Segmentation and Recognition Dataset page or from Kaggle.

To further validate the model, we also trained on the GTAV dataset, which contains nearly 25,000 images.

pre-training

In real-time semantic segmentation, ImageNet pre-training is a common practice. If you wish to pre-train on ImageNet, you can refer to our method; we used this project for ImageNet pre-training.

train

Download the pre-trained weights we provided and put them into pretrained_models/imagenet/

Configure training parameters in the yaml file under the config folder, such as ExpName (experiment name), ROOT (dataset directory), END_EPOCH (training rounds), etc.

Start training with our preset script train.sh, or use the following command:

python tools/train.py --cfg configs/cityscapes/sanet_cityscapes_S.yaml

evaluate

Download the training weights we provided and put them into pretrained_models/cityscapes/ or pretrained_models/camvid/

Configure evaluation parameters in the yaml file in the config folder, such as ExpName

python tools/train.py --cfg=configs/cityscapes/sanet_cityscapes_S.yaml

If you wish to submit your test-set results to Cityscapes, change the TEST_SET parameter in the yaml file under the config folder.

pre-trained weights

ImageNet

Model SANet-S SANet-M SANet-L
Link SANet-imagenet-S SANet-imagenet-M SANet-imagenet-L

Cityscapes

Model Val (% mIoU) Test (% mIoU) FPS
SANet-S 78.6 \ 79.9 77.2 \ 78.4 65.1
SANet-M 78.8 \ 80.2 77.6 \ 78.8 52.7
SANet-L 79.2 \ 80.6 78.1 \ 79.0 39.6

Camvid

Model Test (% mIoU) FPS
SANet-S 78.8 147
SANet-M 79.5 126

GTAV

Model Test (% mIoU)
PIDNet-S(no pretrain) 38.2
SANet-S(no pretrain) 38.5
PIDNet-S(Cityscapes) 45.0
SANet-S(Cityscapes) 48.0

Speed

The speed test follows the DDRNet and PIDNet test methodology; the speed-test script is models/sanet_speed.py
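
For reference, below is a minimal sketch of the usual DDRNet/PIDNet-style timing protocol (warm-up iterations, then timing synchronized forward passes); the exact settings used in models/sanet_speed.py may differ:

```python
import time
import torch

@torch.no_grad()
def measure_fps(model, input_size=(1, 3, 1024, 2048), warmup=50, iters=200, device="cuda"):
    """Rough FPS measurement sketch: warm up, synchronize, then time repeated forward passes."""
    model = model.to(device).eval()
    x = torch.randn(*input_size, device=device)
    for _ in range(warmup):          # warm-up so lazy initialization and autotuning do not skew timing
        model(x)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        model(x)
    torch.cuda.synchronize()
    return iters / (time.time() - start)
```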

Tools

In computer vision, richer image visualizations are helpful, but the tools provided by different projects vary. The tools we used are provided here in the hope that they can help others.

Segmentation Image

Segmentation maps are one of the most frequently used image representations for semantic segmentation.


The detailed code is in tools/generate_segmentation_image.py, along with other configuration options.
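
As a rough illustration of how such a segmentation image can be produced (the palette below is only a three-class example, not the script's actual color map; use your dataset's own palette):

```python
import numpy as np
from PIL import Image

# Hypothetical 3-class palette; in practice use the dataset's own color map
# (e.g. the 19 Cityscapes train-ID colors).
PALETTE = np.array([[128, 64, 128],   # road
                    [70, 70, 70],     # building
                    [220, 20, 60]],   # person
                   dtype=np.uint8)

def colorize(pred):
    """Map a (H, W) array of class IDs to an (H, W, 3) RGB segmentation image."""
    return Image.fromarray(PALETTE[pred])

# pred = logits.argmax(dim=1)[0].cpu().numpy(); colorize(pred).save("seg.png")
```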

Boundary Image

A boundary map is an image that shows the boundary of a single object.


The detailed code is in tools/generate_segmentation_image.py; set boundary to True and adjust the other configuration options.
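
A minimal sketch of one common way to extract such a boundary map with OpenCV (illustrative only, not necessarily how the script does it):

```python
import cv2
import numpy as np

def boundary_map(label, obj_id, thickness=3):
    """Extract the boundary of a single object/class ID from a (H, W) label map.
    A common approach: the difference between dilation and erosion of the mask."""
    mask = (label == obj_id).astype(np.uint8)
    kernel = np.ones((thickness, thickness), np.uint8)
    return cv2.dilate(mask, kernel) - cv2.erode(mask, kernel)  # 1 on the boundary, 0 elsewhere
```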

Heat Map Image

In image segmentation tasks, heat maps can be used to show, for each pixel, how strongly it is associated with a given category or object. Each category has a corresponding heat map that shows the distribution of that category's pixels.


The detailed code is in tools/heat_map_drawing/heat_map_generator.py, you need to import pytorch_grad_cam and configure it before using it.
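
A minimal usage sketch of pytorch_grad_cam for a segmentation model (the target layer, category, and mask are placeholders you must adapt to your own model; this is not the exact content of heat_map_generator.py):

```python
import numpy as np
import torch
from pytorch_grad_cam import GradCAM
from pytorch_grad_cam.utils.image import show_cam_on_image

class SegTarget:
    """Score for one category: sum of that category's logits inside a mask."""
    def __init__(self, category, mask):
        self.category, self.mask = category, torch.from_numpy(mask)

    def __call__(self, output):          # output: (C, H, W) logits for one sample
        return (output[self.category] * self.mask.to(output.device)).sum()

# Assumed usage (model, model.final_layer, rgb_img, input_tensor, pred, category are placeholders):
# cam = GradCAM(model=model, target_layers=[model.final_layer])
# mask = (pred == category).astype(np.float32)
# grayscale_cam = cam(input_tensor=input_tensor, targets=[SegTarget(category, mask)])[0]
# heatmap = show_cam_on_image(rgb_img, grayscale_cam, use_rgb=True)
```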

Receptive Field

The receptive field is the region of the input that influences a given neuron or convolutional kernel in a deep learning model.

Receptive Field Image

The detailed code is in tools/receptive_field_generator/main.py, along with other configuration options.

Receptive Field Calculations

The Receptive Field calculation tool is at tools/receptive_field_tools/sanet_receptive_field_calculator.py.
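
For reference, the standard analytical recursion for the receptive field of a chain of convolution/pooling layers is sketched below; the calculator tool may implement this differently, so treat it as illustrative:

```python
def receptive_field(layers):
    """Analytical receptive field for a chain of (kernel, stride) layers:
    r_i = r_{i-1} + (k_i - 1) * j_{i-1},  j_i = j_{i-1} * s_i,
    where r is the receptive field and j the cumulative stride (jump)."""
    r, j = 1, 1
    for k, s in layers:
        r += (k - 1) * j
        j *= s
    return r

# Example: three stride-2 3x3 convolutions
print(receptive_field([(3, 2), (3, 2), (3, 2)]))  # -> 15
```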

Multi-category boundary losses

This is an approach mentioned in MSFNet. Unlike the common boundary loss, which has only two classes (boundary or not), the multi-category boundary loss splits boundaries into multiple classes (which category each boundary belongs to) according to the dataset's categories. We tried to incorporate this approach into SANet at the time, but could not find the corresponding code provided by MSFNet, so we re-implemented this multi-category approach. However, the SANet variant with this loss did not perform well.

The detailed code is in tools/multi_class_boundary_detection/multi_class_boundary_detection.py, along with other configuration options.
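
A minimal sketch of how per-class boundary targets can be built (illustrative only; this is neither MSFNet's original code nor exactly our re-implementation):

```python
import cv2
import numpy as np

def multi_class_boundaries(label, num_classes, thickness=3):
    """Build a (num_classes, H, W) boundary target: channel c is 1 where the
    boundary of class c's region lies, so each boundary pixel carries category
    information instead of a plain yes/no boundary label."""
    kernel = np.ones((thickness, thickness), np.uint8)
    targets = np.zeros((num_classes, *label.shape), dtype=np.uint8)
    for c in range(num_classes):
        mask = (label == c).astype(np.uint8)
        if mask.any():
            targets[c] = cv2.dilate(mask, kernel) - cv2.erode(mask, kernel)
    return targets
```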

Citation

@misc{wang2023spatialassistant,
      title={Spatial-Assistant Encoder-Decoder Network for Real Time Semantic Segmentation}, 
      author={Yalun Wang and Shidong Chen and Huicong Bian and Weixiao Li and Qin Lu},
      year={2023},
      eprint={2309.10519},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Issues

NaN and Inf found in input tensor

Hi:
When I run the training code for Cityscapes, there are some abnormal values: the semantic loss stays at 0, the SB loss is always NaN, and "Nan or Inf found in input tensor" appears. Could you please give us some advice on how to solve this? Many thanks.

Dataset about Camvid

Excuse me, when I downloaded the Camvid dataset, I found that the Camvid list (\data\list\camvid) has the wrong serial numbers for the annotations. The annotation names in the list look like 0001TP_006870_L.png, but the downloaded Camvid annotations are named like 0001TP_007860.
Could you please provide the correct dataset list, or a download link to your copy of the dataset? Many thanks.
