
bounding-box-prediction's Introduction

Pedestrian bounding box prediction library:

Introduction:

This is the official code for the papers "Pedestrian Intention Prediction: A Multi-task Perspective", accepted and published in hEART 2020 (the 9th Symposium of the European Association for Research in Transportation) and "Pedestrian 3D Bounding Box Prediction", accepted and published in hEART 2022 (the 10th Symposium of the European Association for Research in Transportation).

Abstracts:

Pedestrian Intention Prediction: A Multi-task Perspective
In order to be globally deployed, autonomous cars must guarantee the safety of pedestrians. This is why forecasting pedestrians' intentions sufficiently in advance is one of the most critical and challenging tasks for autonomous vehicles. This work addresses the problem by jointly predicting the intention and visual states of pedestrians. In terms of visual states, whereas previous work focused on x-y coordinates, we also predict the size, and indeed the whole bounding box, of the pedestrian. The method is a recurrent neural network in a multi-task learning approach: one head predicts the intention of the pedestrian for each of its future positions, and another predicts the visual states of the pedestrian. Experiments on the JAAD dataset show the superiority of our method over previous works for intention prediction. Also, despite its simpler architecture (more than two times faster), its bounding box prediction performance is comparable to that of much more complex architectures.

Pedestrian 3D Bounding Box Prediction
Safety is still the main issue of autonomous driving, and in order to be globally deployed, autonomous vehicles need to predict pedestrians' motions sufficiently in advance. While there is a lot of research on coarse-grained (human center) and fine-grained (human body keypoint) prediction, we focus on 3D bounding boxes, which are reasonable estimates of humans that do not require modeling complex motion details for autonomous vehicles. This gives the flexibility to predict over longer horizons in real-world settings. We introduce this new problem and present a simple yet effective model for pedestrians' 3D bounding box prediction. The method follows an encoder-decoder architecture based on recurrent neural networks, and our experiments show its effectiveness on both a synthetic (JTA) and a real-world (nuScenes) dataset. The learned representation contains useful information that enhances the performance of other tasks, such as action anticipation.

Contents

  • Repository structure
  • Proposed method
  • Results
  • Installation
  • Datasets
  • Training/Testing
  • Tested Environments
  • Citations

Repository structure


├── datasets                : Scripts for loading different datasets
│   ├── jaad.py
│   ├── jta.py
│   └── nuscenes.py
├── preprocess              : Scripts for preprocessing
│   ├── jaad_preprocessor.py
│   ├── jta_preprocessor.py
│   ├── nu_preprocessor.py
│   └── nu_split.py
├── visualization           : Scripts for visualizing the results
│   └── visualize.py
├── train.py                : Script for training PV-LSTM
├── test.py                 : Script for testing PV-LSTM
├── networks.py             : Script containing the implementation of the network
└── utils.py                : Script containing necessary math and transformation functions

Proposed method


Our proposed multitask Position-Speed-LSTM (PV-LSTM) architecture.

2D bounding box predictions with crossing intention

3D bounding box predictions with attribute
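
The exact model is implemented in networks.py. As a rough, hypothetical illustration of the idea described above (an encoder-decoder recurrent network with one head regressing future bounding boxes via their frame-to-frame speeds and a second head classifying crossing intention), the PyTorch sketch below may help. The class name, fusion scheme, and default sizes are assumptions for illustration, not the repository's code.

import torch
import torch.nn as nn

# Illustrative sketch only -- NOT the implementation in networks.py.
# Boxes are assumed to be 4-vectors (e.g. x, y, w, h); "speed" means the
# frame-to-frame difference of those vectors.
class TwoHeadBoxPredictor(nn.Module):
    def __init__(self, box_dim=4, hidden_size=512, pred_len=16):
        super().__init__()
        self.pred_len = pred_len
        self.pos_encoder = nn.LSTM(box_dim, hidden_size, batch_first=True)
        self.speed_encoder = nn.LSTM(box_dim, hidden_size, batch_first=True)
        self.decoder = nn.LSTMCell(box_dim, hidden_size)
        self.speed_head = nn.Linear(hidden_size, box_dim)   # future box speeds
        self.crossing_head = nn.Linear(hidden_size, 2)      # per-step crossing logits

    def forward(self, obs_pos, obs_speed):
        # obs_pos, obs_speed: (batch, obs_len, box_dim)
        _, (h_p, _) = self.pos_encoder(obs_pos)
        _, (h_s, c_s) = self.speed_encoder(obs_speed)
        h, c = h_p[-1] + h_s[-1], c_s[-1]    # fuse the two encodings (simple sum)
        inp = obs_speed[:, -1]               # start from the last observed speed
        speeds, crossings = [], []
        for _ in range(self.pred_len):
            h, c = self.decoder(inp, (h, c))
            step_speed = self.speed_head(h)
            speeds.append(step_speed)
            crossings.append(self.crossing_head(h))
            inp = step_speed                 # feed the prediction back in
        return torch.stack(speeds, 1), torch.stack(crossings, 1)

# Shape check with random data: both outputs cover the prediction horizon.
net = TwoHeadBoxPredictor()
s, c = net(torch.randn(8, 16, 4), torch.randn(8, 16, 4))
print(s.shape, c.shape)   # torch.Size([8, 16, 4]) torch.Size([8, 16, 2])

Predicted speeds can be accumulated back onto the last observed box to recover future bounding boxes, which appears to be the role of utils.speed2pos in test.py.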

Results


Example of outputs on the 2D real-world JAAD dataset

Example of outputs on the 3D synthetic JTA dataset

Example of outputs on the 3D real-world nuScenes dataset

Installation


Start by cloning this repository:

git clone https://github.com/vita-epfl/bounding-box-prediction.git
cd bounding-box-prediction

Create a new conda environment (Python 3.7):

conda create -n pv-lstm python=3.7
conda activate pv-lstm

And install the dependencies:

pip install -r requirements.txt

Datasets


Currently supporting the following datasets:

  • JAAD
  • JTA
  • nuScenes

The network only takes bounding box annotations as input, so videos and images are needed only for visualization.

For the JAAD and JTA datasets, the preprocessing script first saves files containing all available samples to a preprocessed_annotations folder (created in the dataset's home directory by default). These files are then used during data loading, allowing sequences with different input/output/stride values to be created.

For nuScenes, however, the final .csv data files are generated directly, so there is no preprocessed_annotations folder.
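
As a rough illustration of how the input/output/stride values carve sequences out of the saved per-pedestrian annotations, here is a small, hypothetical sketch; the real logic lives in the scripts under datasets/, and the field name 'bbox' and the function name below are made up for the example.

# Hypothetical windowing sketch -- not the repository's dataloader.
def make_sequences(track, input_len, output_len, stride):
    """Slide a window over one pedestrian track (a list of per-frame dicts
    holding a 'bbox') and return (observed, future) bounding-box chunks."""
    seq_len = input_len + output_len
    samples = []
    for start in range(0, len(track) - seq_len + 1, stride):
        window = track[start:start + seq_len]
        obs = [frame["bbox"] for frame in window[:input_len]]
        fut = [frame["bbox"] for frame in window[input_len:]]
        samples.append((obs, fut))
    return samples

# A 48-frame track with 16 observed + 16 future frames and stride 16 yields two samples.
track = [{"bbox": [i, i, 50, 100]} for i in range(48)]
print(len(make_sequences(track, 16, 16, 16)))   # -> 2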

JAAD

Clone the JAAD repository and copy its jaad_data.py file into this project's preprocess folder:

git clone https://github.com/ykotseruba/JAAD
cd JAAD
cp jaad_data.py ./bounding-box-prediction/preprocess/

Run the preprocessing script. The train/val/test ratios must each be in [0, 1] and sum to 1:

python3 preprocess/jaad_preprocessor.py --data_dir=/path/to/JAAD --train_ratio=0.7 --val_ratio=0.2 --test_ratio=0.1 

For visualization, download the JAAD clips (UNRESIZED) and unzip them in the videos folder. Then run the split_clips_to_frames.sh script to convert the JAAD videos into frames. NOTE: each frame is placed in a folder under the corresponding scene folder, and the extracted frames take about 169 GB of space.
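
If you prefer Python over the provided shell script, the following hedged sketch does the same frame extraction with OpenCV; split_clips_to_frames.sh remains the reference, and the output naming and .mp4 extension here are assumptions.

# Hypothetical alternative to split_clips_to_frames.sh.
import glob
import os
import cv2

def clips_to_frames(video_dir, out_dir):
    # Write each clip's frames into its own folder, named after the clip.
    for clip in sorted(glob.glob(os.path.join(video_dir, "*.mp4"))):
        scene = os.path.splitext(os.path.basename(clip))[0]
        scene_dir = os.path.join(out_dir, scene)
        os.makedirs(scene_dir, exist_ok=True)
        cap = cv2.VideoCapture(clip)
        idx = 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            cv2.imwrite(os.path.join(scene_dir, f"{idx:05d}.png"), frame)
            idx += 1
        cap.release()

# clips_to_frames("/path/to/JAAD/videos", "/path/to/JAAD/frames")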

JTA

Clone the JTA repository and preprocess the dataset according to the original train/val/test splits:

git clone https://github.com/fabbrimatteo/JTA-Dataset
python3 preprocess/jta_preprocessor.py --data_dir=/path/to/JTA 

For visualization, download the full dataset following the instructions in the official repository.

nuScenes

Clone the nuScenes devkit repository and copy over the folder required for preprocessing:

git clone https://github.com/nutonomy/nuscenes-devkit
cd nuscenes-devkit/python-sdk
cp -r nuscenes/ ../bounding-box-prediction/preprocess/

Download the nuScenes dataset from the official website under the heading Full dataset (v1.0). The full Trainval split was used, but the Mini split is great for quick tests. NOTE: the Test split was not used because it has no annotations.

Preprocess the dataset using the custom train/val/test split specified in nu_split.py. The version should match the downloaded dataset (e.g. v1.0-mini). The input and output lengths default to 4 unless otherwise specified:

python preprocess/nu_preprocessor.py --data_dir=/path/to/nuscenes --version=v1.0-mini --input=4 --output=4

Training/Testing


CLI

required arguments:
  --data_dir        Path to dataset
  --dataset         Dataset name
  --out_dir         Path to store outputs (model checkpoints, logs)
  --task            Task the network performs; choose between '2D_bounding_box-intention',
                    '3D_bounding_box', '3D_bounding_box-attribute'
  --input           Sequence input length in frames 
  --output          Sequence output length in frames
  --stride          Sequence stride in frames 

optional arguments:
  --is_3D           Whether the dataset is 3D (default: False)
  --dtype           Data type train/test/val (default: None)
  --from_file       Whether to load existing csv data (default: None)
  --save            Whether to save loaded data (default: True)
  --log_name        Name for output files (default: None)
  --loader_workers  Number of data loader workers (default: 10)
  --loader_shuffle  Shuffle during loading (default: True)
  --pin_memory      Data loading pin memory (default: False)
  --device          GPU device (default: 'cuda')
  --batch_size      Batch size (default: 100)
  --n_epochs        Training epochs (default: 100)
  --hidden_size     Network hidden layer size (default: 512)
  --hardtanh_limit  (default: 100)
  --skip            How many frames to skip, where 1 means consecutive frames (default: 1)
  --lr              Learning rate (default: 1e-5)
  --lr_scheduler    Whether to use a learning rate scheduler (default: False)

Examples

Commands for training different datasets:

python3 train.py --data_dir=/path/to/JAAD/processed_annotations --dataset=jaad --out_dir=/path/to/output --n_epochs=100 --task='2D_bounding_box-intention' --input=16 --output=16 --stride=16

python3 train.py --data_dir=/path/to/jta-dataset/preprocessed_annotations --dataset=jta --out_dir=/path/to/output --n_epochs=100 --task='3D_bounding_box' --input=16 --output=16 --stride=32

python3 train.py --data_dir=/path/to/nuscenes --dataset=nuscenes --out_dir=/path/to/output --n_epochs=100 --task='3D_bounding_box-attribute' --input=4 --output=4 --stride=8

Test the network by running the command:

python3 test.py --data_dir=/path/to/JAAD/processed_annotations --dataset=jaad --out_dir=/path/to/output --task='2D_bounding_box-intention' 

Tested Environments


  • Ubuntu 18.04, CUDA 10.1
  • Windows 10, CUDA 10.1

Citations

@inproceedings{bouhsain2020pedestrian,
  title     = {Pedestrian Intention Prediction: A Multi-task Perspective},
  author    = {Bouhsain, Smail and Saadatnejad, Saeed and Alahi, Alexandre},
  booktitle = {European Association for Research in Transportation (hEART)},
  year      = {2020},
}

@inproceedings{saadatnejad2022pedestrian,
  title     = {Pedestrian 3D Bounding Box Prediction},
  author    = {Saadatnejad, Saeed and Ju, Yi Zhou and Alahi, Alexandre},
  booktitle = {European Association for Research in Transportation (hEART)},
  year      = {2022},
}

bounding-box-prediction's People

Contributors

celinna, saeedsaadatnejad, smail8


bounding-box-prediction's Issues

imbalanced cross/ not cross labels

I find that only about 10% of the samples have a crossing label, and I am not sure why. Could you help check the ratio of positive samples?

If the data is indeed imbalanced, other metrics such as the recall and F1-score of intention prediction may be important.

Can I run JAAD dataset alone with current version of the model?

I have noticed that this model uses three datasets: JAAD, JTA, and nuScenes. Is the current version of the model on GitHub able to run on JAAD alone (excluding JTA and nuScenes)? At this stage I only want to use the JAAD dataset; do I need to modify any files to do that?

Problem with downloading JAAD clips

Hi,
Thanks for making your work available on GitHub.
I tried to download the JAAD clips but the link is not working for me. Is there another way to access the data?

Thank you in advance.
Zeynab

Visualisation of the result

Hi, thank you for sharing your work.

I want to make predictions and visualize them on a video:

  1. Could you suggest an available and efficient bounding box detector whose code I can use?

  2. How do I feed its output into the model, i.e. where should I make changes so it works?

Best regards,

Can't pickle <class '__main__.args'>: it's not the same object as __main__.args

Sorry sir, there are some problems when I run your code.

Traceback (most recent call last):
  File "D:\anaconda\envs\open\lib\site-packages\IPython\core\interactiveshell.py", line 3524, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-5-9a191744a90c>", line 1, in <module>
    for idx, (obs_s, target_s, obs_p, target_p, target_c, label_c) in enumerate(train):
  File "D:\anaconda\envs\open\lib\site-packages\torch\utils\data\dataloader.py", line 368, in __iter__
    return self._get_iterator()
  File "D:\anaconda\envs\open\lib\site-packages\torch\utils\data\dataloader.py", line 314, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "D:\anaconda\envs\open\lib\site-packages\torch\utils\data\dataloader.py", line 927, in __init__
    w.start()
  File "D:\anaconda\envs\open\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)
  File "D:\anaconda\envs\open\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "D:\anaconda\envs\open\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "D:\anaconda\envs\open\lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
    reduction.dump(process_obj, to_child)
  File "D:\anaconda\envs\open\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <class '__main__.args'>: it's not the same object as __main__.args

I don't know how to solve it. Could you help me to solve it? Thank you!

visualization

How can I visualize the results in 2D when using the JAAD dataset? Thank you for your help.

Intention prediction

Hi,
I am running your code with the JAAD dataset. Without making any changes to the code, when I calculate the F1 score for intention prediction and state prediction it is basically zero, and the same values of intention accuracy (0.8717) and state accuracy (0.9257) keep repeating.

Can you see why this happens?
Thank you

you are losing the batch information


Why do you do this? You are losing your batch information, since you go from:

obs_len x batch_size x 2

To:

obs_len x hidden_dim

This does not make sense.

How to visualize the outputs of the network?

Hi @SaeedSaadatnejad,

Thank you for this repository 💯

We are working on your interesting paper entitled "Pedestrian Intention Prediction: A Multi-task Perspective".

Could you please help us visualize (plot the bounding box and its label) the outputs of the network on each image, as you have done in the following image:

/Images/visualizations.png

We know it should be possible by modifying the test.py file, but we did not understand some of the outputs, such as:

with torch.no_grad():
    speed_preds, crossing_preds, intentions = net(speed=obs_s, pos=obs_p, average=True)
    speed_loss = mse(speed_preds, target_s)/100

obs_s is the computed speed for each pedestrian? Calculated as [new position given by the detector] - [last estimated position]?
obs_p is the new position (of each pedestrian) given by the detector?
speed_preds is the estimated speed given by the network?
crossing_preds is the estimated position given by the network? We think the estimated positions are calculated later, at line 99?
intentions is the predicted intention given by the network?

crossing_loss = 0
for i in range(target_c.shape[1]):
    crossing_loss += bce(crossing_preds[:,i], target_c[:,i])
crossing_loss /= target_c.shape[1]
avg_epoch_val_s_loss += float(speed_loss)
avg_epoch_val_c_loss += float(crossing_loss)

Does this code compute the error on the estimated position "crossing_preds", or the intersection between bounding boxes?
preds_p = utils.speed2pos(speed_preds, obs_p)
ade += float(utils.ADE_c(preds_p, target_p))
fde += float(utils.FDE_c(preds_p, target_p))
aiou += float(utils.AIOU(preds_p, target_p))
fiou += float(utils.FIOU(preds_p, target_p))

preds_p represents the estimated position? What about ade, fde, aiou, fiou?
target_c = target_c[:,:,1].view(-1).cpu().numpy()
crossing_preds = np.argmax(crossing_preds.view(-1,2).detach().cpu().numpy(), axis=1)
label_c = label_c.view(-1).cpu().numpy()
intentions = intentions.view(-1).detach().cpu().numpy()
state_preds.extend(crossing_preds)
state_targets.extend(target_c)
intent_preds.extend(intentions)
intent_targets.extend(label_c)

label_c represents the ground-truth? and intentions represents the estimated/predicted intention?

What about state_preds, state_targets, intent_preds, intent_targets?

Thank you in advance for your help!

Feature Idea: Video Prediction

Hey, thanks for sharing your work!

Would it be possible to run the prediction and visualize it on "in the wild" videos?
I understand that the code would have to detect the bounding boxes of pedestrians and their speeds and feed these into the model, correct?
Are there any plans to provide such functionality, or any hints on how to do this?

Kind regards!

Removing all crossing labels?

data = data.drop(data[data.crossing_obs.apply(lambda x: 1. in x)].index)

Doesn't this line remove any samples whose labels contain 1 (crossing), effectively leaving only 0 (not crossing)? I may not be understanding this properly, please correct me if I'm wrong.

Thanks in advance.
