We experiment with the use of static cameras for pedestrian intention prediction. Details are in the attached report. Here, we describe usage.
What we used to run the experiments
- Python 2.7.3
- Tensorflow 1.3.0
- Ubuntu 14.04
- NVIDIA GTX 780M
```
├── dataset
│   ├── all
│   │   ├── raw : folder containing the raw recordings
│   │   ├── processed : folder containing the cropped/annotated recordings
│   │   ├── annotations : folder containing the (approximate) ground truth text files, i.e. the output of the Mask R-CNN
│   │   └── crops : folder containing the pedestrian crops
│   ├── train : folder containing the training set
│   │   ├── annotations
│   │   └── crops
│   └── test : folder containing the testing set
│       ├── annotations
│       └── crops
├── annotator
│   ├── mask-rcnn.pytorch : folder containing the Mask R-CNN people detector
│   ├── hungarian_tracker.py : python script that runs the Hungarian tracker
│   ├── assign_labels.py : python script that assigns the label (cross / did not cross) to each pedestrian
│   ├── crop_images.py : python script that crops the pedestrians and places them in /dataset/crops
│   ├── annotate.sh : shell script that runs the full annotation pipeline
│   └── annotate-sample1.sh : shell script that sets the parameters for the sample1 video
├── intention_prediction
│   ├── models : pretrained model weights
│   ├── results : evaluation results
│   ├── scripts : folder containing the model and data loader python scripts
│   ├── train.py : training script
│   ├── train.sh : shell script that launches training
│   ├── evaluate_lausanne.ipynb : ipython notebook to visualize results on the lausanne dataset
│   └── guidedbp_lausanne.ipynb : ipython notebook to visualize guided-backprop results on the lausanne dataset
├── images : images used for this github repository
├── report.pdf : report
├── slides_midterm.pptx : midterm presentation slides
└── slides_final.pptx : final presentation slides
```
To augment the dataset of [1], we recorded new videos that can be automatically annotated. In the following, we describe the steps needed to generate the (approximate) ground truths of your own recordings. The example below is run on the video /dataset/all/raw/sample1.MP4. A shell script including all the steps has been included at /annotator/annotate-sample1.sh.
- Follow the instructions at https://github.com/roytseng-tw/Detectron.pytorch to set up the Mask R-CNN.
- Set the region of interest that the annotator will operate on, given in the order (top left x coordinate, top left y coordinate, width, height). The values for the example below are (0, 875, 650, 250). Use ffmpeg to crop the video and place the output in /dataset/all/processed/sample1.MP4. Note that ffmpeg's crop filter expects its arguments as width:height:x:y.

```sh
ffmpeg -y -i ROOT/dataset/all/raw/sample1.MP4 -filter:v "crop=650:250:0:875" ROOT/dataset/all/processed/sample1.MP4
```
- Run the Mask R-CNN people detector on the input video at /dataset/all/processed/sample1.MP4. The output is a csv text file at /dataset/all/annotations/sample1.txt containing the detections. Note that the UID of each detection is initialized as -1; each detection is given a unique UID when the tracker is run.

```sh
cd ROOT/annotator/mask-rcnn.pytorch
python3 -u tools/infer_simple.py --dataset keypoints_coco2017 \
    --cfg ./configs/baselines/e2e_keypoint_rcnn_R-50-FPN_1x.yaml \
    --load_detectron ./data/weights/R-50-FPN-1x.pkl \
    --image_dir ../../dataset/all/processed/sample1.MP4 \
    --output_dir ../../dataset/all/annotations/sample1.txt
```
```python
import pandas as pd

# inspect the detections written by the detector
df = pd.read_csv("sample1.txt")
print(df)
```
Frame no | UID | tlx | tly | width | height | score
---|---|---|---|---|---|---
1 | -1 | 474 | 12 | 20 | 56 | 0.995529
- Run hungarian.py and set the `maximum_allowed_distance` (pixels) and `maximum_limbo_lifetime` (frames) parameters. The `maximum_allowed_distance` prevents a detection at `t1` from being assigned to a detection at `t2` if their distance is above this value. The `maximum_limbo_lifetime` stops the tracker from looking for a correspondence for any object that has remained in limbo, i.e. without a successful match, for longer than this duration. Running hungarian.py appends a new column `label` to the dataframe and saves the csv file to /dataset/all/annotations/sample1-modified.txt. A minimal sketch of the matching step is shown after the command below.

```sh
cd ROOT
python3 ./annotator/hungarian.py ./dataset/all/annotations/sample1.txt --maximum_allowed_distance 50 --maximum_limbo_lifetime 60
```
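The assignment step itself can be sketched as follows. This is a minimal illustration, not the repository's implementation: it assumes detections are reduced to their centre points (the helper name and data layout are hypothetical) and uses `scipy.optimize.linear_sum_assignment` for the Hungarian algorithm, gating matches by `maximum_allowed_distance`.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_detections(prev_centres, curr_centres, maximum_allowed_distance=50):
    """Match detection centres at t1 to detection centres at t2.

    Returns (prev_idx, curr_idx) pairs; detections left unmatched would
    enter "limbo" in the full tracker until maximum_limbo_lifetime expires.
    """
    # pairwise Euclidean distances between all previous and current centres
    cost = np.linalg.norm(prev_centres[:, None, :] - curr_centres[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)  # optimal one-to-one assignment
    # reject pairs that are further apart than maximum_allowed_distance
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= maximum_allowed_distance]

matches = match_detections(np.array([[474.0, 12.0]]), np.array([[480.0, 15.0]]))
print(matches)  # [(0, 0)]
```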
- Specify the crossing region in the cropped image, then run classify_trajectories.py to determine the pedestrians that crossed the road. Running classify_trajectories.py appends the column `cross`, which states whether the pedestrian eventually crossed the road, and the column `incrossing`, which states whether the pedestrian is currently in the crossing or on the sidewalk. Note that the value of `cross` for each pedestrian stays the same throughout their lifetime. A sketch of the underlying test is shown after the command below.

```sh
cd ROOT
python3 ./annotator/classify_trajectories.py --filename ./dataset/all/annotations/sample1-modified.txt --tl 565 70 --tr 650 70 --br 650 300 --bl 465 300
```
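A flag like `incrossing` can be derived with a point-in-polygon test against the quadrilateral given by `--tl`, `--tr`, `--br` and `--bl`. A minimal sketch, assuming matplotlib and a hypothetical helper name (the actual script may implement this differently):

```python
from matplotlib.path import Path

def in_crossing(centre, tl, tr, br, bl):
    """Return True if a pedestrian's centre point lies inside the crossing."""
    crossing = Path([tl, tr, br, bl])  # quadrilateral from the CLI arguments
    return crossing.contains_point(centre)

# example with the corner values used for sample1
print(in_crossing((600, 150), (565, 70), (650, 70), (650, 300), (465, 300)))  # True
```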
- Crop the pedestrians to create the dataset (see `crop_images.py`). The images of each pedestrian will be located at /dataset/all/crops/<video_name>/<pedestrian_id>/<frame_number>.png. For example, the image of pedestrian 2 at frame 1000 for the video sample1.MP4 will be located at /dataset/all/crops/sample1/0000000002/0000001000.png. A minimal sketch of this step is shown below.
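The cropping step amounts to cutting each tracked detection's bounding box out of its frame and saving it under the zero-padded scheme above. A minimal sketch, assuming OpenCV and the annotation columns shown earlier (this is an illustration, not the repository's crop_images.py):

```python
import os
import cv2
import pandas as pd

df = pd.read_csv("sample1-modified.txt")
video = cv2.VideoCapture("sample1.MP4")

frame_no = 0
while True:
    ok, frame = video.read()
    if not ok:
        break
    frame_no += 1
    # crop every tracked pedestrian (UID != -1) detected in this frame
    dets = df[(df["Frame no"] == frame_no) & (df["UID"] != -1)]
    for _, d in dets.iterrows():
        x, y, w, h = int(d["tlx"]), int(d["tly"]), int(d["width"]), int(d["height"])
        out_dir = os.path.join("crops", "sample1", "%010d" % int(d["UID"]))
        os.makedirs(out_dir, exist_ok=True)
        cv2.imwrite(os.path.join(out_dir, "%010d.png" % frame_no), frame[y:y + h, x:x + w])
```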
- OPTIONAL: The results of the tracker can be visualized by running `annotate_video.py`.
We remind users that a shell script including all the steps has been included at /annotator/annotate-sample1.sh, and that the folder /dataset/all/crops/sample1 and the file /dataset/all/annotations/sample1.txt must be moved to /dataset/train/crops/sample1 and /dataset/train/annotations/sample1.txt respectively for the sample to be used in the training set.
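For example, with ROOT denoting the repository root as in the commands above:

```sh
# move the annotated sample from the staging area into the training set
mv ROOT/dataset/all/crops/sample1 ROOT/dataset/train/crops/sample1
mv ROOT/dataset/all/annotations/sample1.txt ROOT/dataset/train/annotations/sample1.txt
```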
We built a simple CNN-LSTM as a baseline for our study. Details are in the report. Run `train.sh` to train the architecture. The model will be saved at /intention_prediction/models/.
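As a rough illustration of a CNN-LSTM of this kind, here is a schematic PyTorch sketch with assumed layer sizes; it is not the repository's actual model (see /intention_prediction/scripts for that). A CNN encodes each pedestrian crop, an LSTM aggregates the per-frame features, and a linear head emits a crossing / not-crossing decision at every timestep.

```python
import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    """Per-frame CNN features fed to an LSTM; binary output per timestep."""

    def __init__(self, hidden_size=128):
        super().__init__()
        self.cnn = nn.Sequential(  # assumed small encoder
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.lstm = nn.LSTM(64, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, 2)  # crossing / not crossing

    def forward(self, clips):  # clips: (batch, time, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1)).view(b, t, -1)
        out, _ = self.lstm(feats)
        return self.fc(out)  # one decision per timestep

logits = CNNLSTM()(torch.randn(2, 10, 3, 64, 64))
print(logits.shape)  # torch.Size([2, 10, 2])
```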
Run `evaluate_lausanne.ipynb` to get visual results when classifying at every timestep. In the gif below, a green bounding box indicates a decision of "not crossing" while a red bounding box indicates a decision of "crossing".
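The overlay itself is simple to reproduce; a minimal sketch with OpenCV (assumed; the notebook may render the boxes differently):

```python
import cv2

def draw_decision(frame, box, crossing):
    """Draw a red box for a "crossing" decision, green for "not crossing"."""
    x, y, w, h = box
    color = (0, 0, 255) if crossing else (0, 255, 0)  # OpenCV uses BGR
    cv2.rectangle(frame, (x, y), (x + w, y + h), color, 2)
    return frame
```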
Run `guidedbp_lausanne.ipynb` to visualize guided-backprop results at every timestep, i.e. which input pixels most influenced each decision. In the images below, a green bounding box indicates a decision of "not crossing" while a red bounding box indicates a decision of "crossing".
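Guided backpropagation modifies the ReLU backward pass so that only positive gradients flow back through units that were active in the forward pass. A minimal PyTorch sketch of that rule (an illustration, not the notebook's code):

```python
import torch
import torch.nn as nn

def guided_relu_hook(module, grad_input, grad_output):
    # keep only positive gradients; ReLU's own backward already zeroes
    # the positions that were inactive in the forward pass
    return (torch.clamp(grad_input[0], min=0.0),)

def guided_backprop(model, clip):
    for m in model.modules():
        if isinstance(m, nn.ReLU):
            m.register_backward_hook(guided_relu_hook)
    clip = clip.clone().requires_grad_(True)
    model(clip).max().backward()  # top score w.r.t. the input pixels
    return clip.grad              # saliency map over the input
```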