Giter Club home page Giter Club logo

crowd_analysis's Introduction

Crowd estimation: CSRNet

Update 08-06-2020: Fixed broken link to dataset. I tried to upload the weights but they're too heavy for Github to handle. You can train the model and download it by just running the Colab notebook.

Train a Congested Scene Recognition network (CSRNet) using the methodology of the paper. This repo is an updated version from the other ones as it works on Python 3.7 and fixes some inconsistencies from the original repo.

If you wish to train the model from scratch, you can do so with the provided Colab notebook.

Context

Estimating the number of people in a crowd has been an open area of research. There are a lot of applications that come in handy such as crowd monitoring, not neessarily for people, as it can also be used to calculate the number of bacteria in a sample, or the number of cars on a traffic jam.

There had been three approaches for tackling this challenge:

  1. Detection-based methods: Train an object detector and then count the total number of detections. The problem here is that detectors perform poorly on congested scenes due to highly-occluded targets.
  2. Regression-based methods: Train a regressor that takes as input an image and directly outputs the total number of objects. As mentioned by Li et al. [1], saliency is overlooked, thus causing inaccurate results.
  3. CNN-based methods: Use fully Convolutional Neural Networks (fCNNs) that generate density maps not only used for counting, but also to model crowd distributions. This approach has outperformed the other ones, and the most popular structure is Multi-column Convolutional Neural Network (MCNN), introduced by Zhang et al. [2].

Li et al. [1] followed the approach of Zhang et al. [2] when it comes to generating an output density map such that its sum is equal to the total number of people, given an input image (see Figure 1). As opposed to the MCNN structure, they introduced a deeper fCNN model which yields state-of-the-art results on the ShanghaiTech, UCSD, WorldExpo'10, UCF_CC_50, and Trancos datasets.

Density map
Figure 1. Groundtruth samples. First row is the input image and second row is the resulting density map

Datasets

For the moment only the ShanghaiTech dataset (both parts A and B) is available for training, but part of my future work is to support other datasets for training, validation and testing. Either way, here are the links for all of them.

Training and data generation

Model training supports both GPU and CPU, the former being recommended. If you wish to train the model from sratch then you can use the provided Colab notebook or clone this repo, download a dataset (e.g. Shanghai dataset) and do:

foo@bar:crowd_estimation$ mv path/to/dataset_folder Shanghai
foo@bar:crowd_estimation$ python make_dataset.py Shanghai/ Shanghai_A
foo@bar:crowd_estimation$ python train.py part_A/trainval.json part_A/test.json
foo@bar:crowd_estimation$ python validation.py path/to/model.pth.tar part_A/test.json

There are a couple of flags for training such as allowing data augmentation (-a True/False) and continue training a model (-p path/to/model.pth.tar).

Experiments

After training the model for ~50 epochs these were the results (see Figure 2 to visually check the output of the model):

Model MAE MSE SSIM
Shanghai_A 82.941 131.376 0.697
Shanghai_B 10.611 16.282 0.977

As we can see, metrics for part B are similar to those mentioned in the paper. Nonetheless, the model for part A still needs to continue training. I'll do that as soon as Colab lets me use its GPU instances again.

Crowd counting
Figure 2. Outputs of a trained CSRNet on Shanghai part A. First row is input image, second row is groundtruth density map, and third row shows the outputs of the model

Future Work

  • Make training available for all datasets mentioned in the paper.
  • Upload the trained models ready for downloading.

Reference list

[1] Li, Y., Zhang, X., & Chen, D. (2018). Csrnet: Dilated convolutional neural networks for understanding the highly congested scenes. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1091-1100).

[2] Zhang, Y., Zhou, D., Chen, S., Gao, S., & Ma, Y. (2016). Single-image crowd counting via multi-column convolutional neural network. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 589-597).

crowd_analysis's People

Contributors

tempdata73 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.