
TUTR: Trajectory Unified Transformer for Pedestrian Trajectory Prediction

This is the official implementation for TUTR: Trajectory Unified Transformer for Pedestrian Trajectory Prediction. [Paper]

Abstract -- Pedestrian trajectory prediction is an essential link to understanding human behavior. Recent work achieves state-of-the-art performance gained from hand-designed post-processing, e.g., clustering. However, this post-processing suffers from expensive inference time and neglects the probability that the predicted trajectory disturbs downstream safety decisions. In this paper, we present Trajectory Unified TRansformer, called TUTR, which unifies the trajectory prediction components, social interaction, and multimodal trajectory prediction, into a transformer encoder-decoder architecture to effectively remove the need for post-processing. Specifically, TUTR parses the relationships across various motion modes using an explicit global prediction and an implicit mode-level transformer encoder. Then, TUTR attends to the social interactions with neighbors by a social-level transformer decoder. Finally, a dual prediction forecasts diverse trajectories and corresponding probabilities in parallel without post-processing. TUTR achieves state-of-the-art accuracy performance and improvements in inference speed of about $10 \times $ - $40 \times$ compared to previous well-tuned state-of-the-art methods using post-processing.

@InProceedings{Shi_2023_ICCV,
    author    = {Shi, Liushuai and Wang, Le and Zhou, Sanping and Hua, Gang},
    title     = {Trajectory Unified Transformer for Pedestrian Trajectory Prediction},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {9675-9684}
}

framework

TUTR employs an encoder-decoder transformer architecture to forecast future motion behaviors. First, the global prediction generates general motion modes. Then, the general motion modes, concatenated with the observed embedding, serve as the input tokens of a mode-level transformer encoder. Subsequently, the encoder output attends to the social interactions through a social-level decoder. Finally, two shared prediction heads in the dual prediction produce the dual results: predicted trajectories and their corresponding probabilities.
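The pipeline above can be sketched end to end. The following is a minimal NumPy illustration of the data flow only, not the official implementation: the dimensions, the single-head attention stand-in, and the use of addition in place of the paper's concatenation are all simplifications.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 32          # embedding dimension (illustrative)
K = 20          # number of general motion modes
T_PRED = 12     # prediction horizon
N_NEI = 4       # number of neighbors

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, kv):
    # Single-head scaled dot-product attention, standing in for a transformer layer.
    return softmax(q @ kv.T / np.sqrt(q.shape[-1])) @ kv

# 1) Global prediction proposes K general motion-mode tokens; the observed
#    trajectory is embedded into a single vector.
mode_tokens = rng.normal(size=(K, D))
obs_embedding = rng.normal(size=(1, D))

# 2) Mode-level encoder: mode tokens combined with the observed embedding
#    (addition here as a stand-in for the paper's concatenation).
encoder_in = mode_tokens + obs_embedding
encoder_out = attention(encoder_in, encoder_in)            # shape (K, D)

# 3) Social-level decoder: encoder output attends to neighbor embeddings.
neighbors = rng.normal(size=(N_NEI, D))
decoder_out = attention(encoder_out, neighbors)            # shape (K, D)

# 4) Dual prediction: shared heads emit trajectories and mode probabilities in parallel.
W_traj = rng.normal(size=(D, T_PRED * 2))
W_prob = rng.normal(size=(D, 1))
trajectories = (decoder_out @ W_traj).reshape(K, T_PRED, 2)
probabilities = softmax((decoder_out @ W_prob).ravel())

print(trajectories.shape, probabilities.shape)  # (20, 12, 2) (20,)
```

Because every mode is decoded in one parallel pass with its own probability, no clustering or other post-processing is needed to obtain diverse candidates.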

performance & speed

TUTR achieves a balance between accuracy and inference speed.

(A) compares the performance of TUTR against other SOTA methods; the line segments for MemoNet and SocialVAE show how their performance changes with and without post-processing. (B) compares the inference speed of TUTR against the post-processing methods.

visualization

Motion Modes

Prediction

Dependencies

python==3.7.12

numpy==1.19.0

scikit-learn==1.0.2

torch==1.12.1+cu113

torchvision==0.13.1+cu113

Dataset

Please download the dataset and extract it into the directory './dataset/' like this:

./dataset/sdd_train.pkl
./dataset/sdd_test.pkl

train && evaluation (GPU VERSION)

$ python train.py --dataset_name <dataset_name> --hp_config <config_file> --gpu <gpu_index>

Examples for the SDD and ETH-UCY datasets:

python train.py --dataset_name sdd --hp_config config/sdd.py --gpu 0
python train.py --dataset_name eth --hp_config config/eth.py --gpu 0
python train.py --dataset_name hotel --hp_config config/hotel.py --gpu 0
python train.py --dataset_name univ --hp_config config/univ.py --gpu 0
python train.py --dataset_name zara1 --hp_config config/zara1.py --gpu 0
python train.py --dataset_name zara2 --hp_config config/zara2.py --gpu 0

Acknowledgement

The pre-processed datasets are obtained from the dataloader of [SocialVAE].

You can generate pkl-dataset like this:

python get_data_pkl.py --train data/eth/train --test data/eth/test --config config/eth.py

Other datasets can be generated by modifying the corresponding dataset names, for example:

python get_data_pkl.py --train data/sdd/train --test data/sdd/test --config config/sdd.py

License

This repository is licensed under Apache 2.0.


Issues

How to process the original sdd dataset

The SDD data here seems different from the original SDD dataset. For example, the original dataset offers a bounding box per instance and has different trajectory lengths for different instances, but your dataset has only a position (x, y) and a fixed trajectory length per instance.
Could you tell me how you processed it? Thank you very much.

About the clf_loss calculation

Hello, firstly thank you for sharing such an excellent work.
While running the code you provided, I found a possible problem in the part that calculates clf_loss. The code uses the same loss function as in the paper, cross-entropy loss. In that case, for the clf_loss calculation, should soft_label be replaced with closest_mode_indices.squeeze(), or do you have another calculation method? I hope to get your reply; thanks again for sharing your excellent work.
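To make the question concrete, here is a small NumPy sketch of the two labeling choices for a classification loss over K modes. The logits, the distances, and the distance-based construction of the soft label are all made-up illustrations; only the names echo the issue's soft_label and closest_mode_indices.

```python
import numpy as np

def log_softmax(x):
    m = x.max()
    return x - m - np.log(np.exp(x - m).sum())

K = 5
logits = np.array([2.0, 0.5, 0.1, -1.0, 0.3])   # predicted mode scores (made up)
closest_mode_index = 0                            # mode nearest the ground truth

# Hard label: one-hot on the closest mode (what closest_mode_indices.squeeze() implies).
hard = np.zeros(K)
hard[closest_mode_index] = 1.0
ce_hard = -(hard * log_softmax(logits)).sum()

# Soft label: a distribution over all modes, e.g. distance-based weights (illustrative).
distances = np.array([0.1, 0.5, 0.9, 2.0, 0.7])
soft = np.exp(-distances) / np.exp(-distances).sum()
ce_soft = -(soft * log_softmax(logits)).sum()

# Hard CE rewards only the single closest mode; soft CE spreads credit across modes,
# which can keep more of the K modes active during training.
print(round(ce_hard, 4), round(ce_soft, 4))
```

With a one-hot target, cross-entropy reduces to the negative log-probability of the closest mode, so the two choices coincide only when the soft label collapses to one-hot.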

A question about real-world testing

Hello, author!
In the code you shared, the test stage generates multi-mode predictions, and the ADE/FDE computation uses the mode closest to the ground truth. Isn't that somewhat problematic?
In a real deployment scenario there is no ground truth, so how should the mode be chosen?
Many thanks for your answer!
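The two selection rules at stake in this issue can be sketched with made-up NumPy tensors: best-of-K against the ground truth for evaluation (the standard minADE protocol), and probability-based selection when no ground truth exists. Variable names and shapes are illustrative, not the repository's.

```python
import numpy as np

rng = np.random.default_rng(1)
K, T_PRED = 20, 12

# Hypothetical dual-prediction outputs: K candidate trajectories plus one probability each.
pred_trajs = rng.normal(size=(K, T_PRED, 2))
probs = rng.random(K)
probs /= probs.sum()
gt = rng.normal(size=(T_PRED, 2))  # ground truth, available only during evaluation

# Evaluation (GT available): best-of-K, the standard minADE/minFDE protocol.
ade_per_mode = np.linalg.norm(pred_trajs - gt, axis=-1).mean(axis=-1)
eval_mode = int(ade_per_mode.argmin())

# Deployment (no GT): fall back on the predicted probabilities, e.g. the argmax mode,
# or keep the full distribution over modes for a downstream planner.
deploy_mode = int(probs.argmax())

print(eval_mode, deploy_mode)
```

Best-of-K is a benchmark convention for measuring the quality of the predicted distribution, not a deployable decision rule; at inference time the probabilities from the dual prediction are what a system can actually act on.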

visualization

Thank you for the amazing paper. Could you please share the visualization script?
Thank you

Pretrain model?

Hi, Thanks for your amazing work.
I tried to reproduce the results for the Hotel dataset following your provided code without any modification.
I got 0.13/0.21 for the Hotel dataset instead of 0.11/0.18.

Can you share the pre-trained model for this?

Datasets link

Hi, thanks for your work.
When I tried to download the datasets via the provided link, it asked me to apply
for permission. Could you please make the link public?
Thanks.

dataset

How do I use the content represented by the generated pkl file? Could you answer this? Thank you @lssiair
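Since the schema of the generated pkl file is not documented here, the sketch below only shows the generic way to open and inspect one with Python's pickle module. It round-trips a toy dictionary whose layout is purely hypothetical; the real file produced by get_data_pkl.py may be structured differently.

```python
import os
import pickle
import tempfile

import numpy as np

# The README's ./dataset/sdd_train.pkl is not available here, so this demo
# round-trips a toy object; inspecting the real file works the same way.
toy = {"trajectories": np.zeros((3, 20, 2))}  # hypothetical layout, not the actual schema

path = os.path.join(tempfile.mkdtemp(), "sdd_train.pkl")
with open(path, "wb") as f:
    pickle.dump(toy, f)

with open(path, "rb") as f:
    data = pickle.load(f)

# First step with any unknown pkl: print its type and top-level structure.
print(type(data).__name__, list(data.keys()), data["trajectories"].shape)
```

Printing the type and top-level keys (or, for lists and arrays, the length and element shapes) is usually enough to reverse-engineer how the dataloader consumes the file.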

make our own dataset

Hi, thanks for your work.
I'm trying to make our own dataset; could you please share how to do it?
Thanks.
