ImVisible: Pedestrian Traffic Light Dataset, LytNet Neural Network, and Mobile Application for the Visually Impaired

An implementation of the work presented in the following two papers: https://arxiv.org/abs/1907.09706 and https://arxiv.org/abs/1909.09598.

Introduction

This project consists of three sections. First, we provide an image dataset of street intersections, labelled with the color of the corresponding pedestrian traffic light and the position of the zebra crossing in the image. Second, we provide a neural network adapted from MobileNet v2 (LytNet) that accepts a larger input size while still running at near real-time speeds on an iPhone 7. Third, we provide a demo iOS application that runs LytNet and outputs the appropriate information onto the phone screen.

Pedestrian-Traffic-Lights (PTL) is a high-quality image dataset of street intersections, created for the detection of pedestrian traffic lights and zebra crossings. Images vary in weather, in position and orientation relative to the traffic light and zebra crossing, and in the size and type of intersection.

Stats

| | Training | Validation | Testing | Total |
| --- | --- | --- | --- | --- |
| Number of Images | 3456 | 864 | 739 | 5059 |
| Percentage | 68.3% | 17.1% | 14.6% | 100% |

Use these statistics for image normalization:

    mean = [120.56737612047593, 119.16664454573734, 113.84554638827127]
    std = [66.32028460114392, 65.09469952002551, 65.67726614496246]
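
For reference, a minimal normalization sketch, assuming images are loaded as HxWx3 arrays in the 0-255 range (the scale these statistics appear to use):

    import numpy as np
    import torch

    # Per-channel dataset statistics (0-255 pixel scale).
    mean = np.array([120.56737612047593, 119.16664454573734, 113.84554638827127])
    std = np.array([66.32028460114392, 65.09469952002551, 65.67726614496246])

    def normalize(image):
        """HxWx3 uint8 image -> normalized 3xHxW float tensor."""
        image = (np.asarray(image, dtype=np.float32) - mean) / std
        return torch.from_numpy(image).permute(2, 0, 1).float()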

Labels

Each row of the csv files in the annotations folder contains a label for an image in the form:

[file_name, class, x1, y1, x2, y2, blocked tag].

An example is:

['IMG_2178.JPG', '2', '1040', '1712', '3210', '3016', 'not_blocked'].

Note that all labels are stored as strings, so it is necessary to cast the coordinates to integers in Python.
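
A minimal parsing sketch (the file name is a placeholder; assumes a plain comma-separated layout with no header row):

    import csv

    with open('testing_file.csv') as f:  # placeholder path
        for row in csv.reader(f):
            file_name, light_class, x1, y1, x2, y2, blocked = row
            light_class = int(light_class)
            x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)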

Classes are as follows:

0: Red

1: Green

2: Countdown Green

3: Countdown Blank

4: None

With the following distribution:

| | Red | Green | Countdown Green | Countdown Blank | None |
| --- | --- | --- | --- | --- | --- |
| Number of Images | 1477 | 1303 | 963 | 904 | 412 |
| Percentage | 29.2% | 25.8% | 19.0% | 17.9% | 8.1% |

Images may contain multiple pedestrian traffic lights, in which case the intended "main" traffic light was chosen for the label.

The coordinates represent the start and end points of the midline of the zebra crossing. They are labelled as positions on the original 4032x3024 image, so if a different resolution is used, it is important to convert the coordinates to the appropriate values or normalize them to the range [0, 1].
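
A sketch of the remapping, assuming the label space is the stated 4032x3024 resolution:

    # Scale midline endpoints from the original 4032x3024 label space to another
    # resolution; pass new_w = new_h = 1 to normalize into the [0, 1] range.
    ORIG_W, ORIG_H = 4032, 3024

    def rescale_midline(x1, y1, x2, y2, new_w, new_h):
        sx, sy = new_w / ORIG_W, new_h / ORIG_H
        return x1 * sx, y1 * sy, x2 * sx, y2 * sy

    # e.g. rescale_midline(1040, 1712, 3210, 3016, 768, 576)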

Download

Annotations can be downloaded from the annotations folder in this repo. There are three downloadable versions of the dataset. With our network, the 876x657 resolution images were used during training to accommodate random cropping. The 768x576 version was used during validation and testing without a random crop.

The 4032x3024 full resolution dataset is out! Download here: Part 1, Part 2.

Model

LytNet V1

We created our own PyTorch neural network, LytNet, which can be accessed from the Model folder in this repo. The folder contains both the code and the weights obtained after training on the dataset. Given an input image, our model returns the appropriate color of the traffic light, along with two image coordinates representing the predicted endpoints of the zebra crossing.
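
A minimal inference sketch; the module name and the two-output format follow the code in the Model folder, while the weight-file path is an assumption:

    import torch
    from LYTNet import LYTNet  # model definition from the Model folder

    model = LYTNet()
    model.load_state_dict(torch.load('LytNetV1_weights', map_location='cpu'))
    model.eval()

    # One normalized 768x576 RGB image in NCHW layout (see the stats above).
    image = torch.zeros(1, 3, 576, 768)  # placeholder input
    with torch.no_grad():
        pred_classes, pred_direction = model(image)
    light = torch.argmax(pred_classes, dim=1)  # index into the five classes listed above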

Here are the precisions and recalls for each class:

| | Red | Green | Countdown Green | Countdown Blank |
| --- | --- | --- | --- | --- |
| Precision | 0.97 | 0.94 | 0.99 | 0.86 |
| Recall | 0.96 | 0.94 | 0.96 | 0.92 |

Here are the endpoint errors:

| | Number of Images | Angle Error (degrees) | Startpoint Error | Endpoint Error |
| --- | --- | --- | --- | --- |
| Unblocked | 594 | 5.86 | 0.0725 | 0.0476 |
| Blocked | 145 | 7.97 | 0.0918 | 0.0649 |
| All | 739 | 6.27 | 0.0763 | 0.0510 |

Our network is adapted from MobileNet v2, with a larger input size of 768x576 designed for image classification tasks that involve a smaller object within the image (such as a traffic light). Certain layers from MobileNet v2 were removed so that the network runs at a near real-time frame rate (21 fps) while still maintaining high accuracy.

The structure of LytNet V1 is shown in the architecture figure in the repository.

LytNet V2

Our network has been updated, achieving better accuracy. Below is a comparison between our two networks:

| Metric | Network | Red | Green | Countdown Green | Countdown Blank |
| --- | --- | --- | --- | --- | --- |
| Precision | LytNet V1 | 0.97 | 0.94 | 0.99 | 0.86 |
| Precision | LytNet V2 | 0.98 | 0.95 | 0.99 | 0.93 |
| Recall | LytNet V1 | 0.96 | 0.94 | 0.96 | 0.92 |
| Recall | LytNet V2 | 0.96 | 0.96 | 0.97 | 0.97 |

| Network | Angle Error (degrees) | Startpoint Error | Endpoint Error |
| --- | --- | --- | --- |
| LytNet V1 | 6.27 | 0.0763 | 0.0510 |
| LytNet V2 | 6.15 | 0.0759 | 0.0477 |

Application

A demo iOS application is also provided. Requirements are iOS 11 and above. The application continuously iterates through the flowchart shown in the repository.

To use the application, open the LYTNet demo Xcode project and add a developer team. Build the application on a device running iOS 12.0 or above.

Citations

Please consider citing our papers in your publications if this project helped with your research. The BibTeX references are as follows:

@InProceedings{yu2019lytnet,
  title = {LYTNet: A Convolutional Neural Network for Real-Time Pedestrian Traffic Lights and Zebra Crossing Recognition for the Visually Impaired},
  author = {Yu, Samuel and Lee, Heon and Kim, John},
  booktitle = {Computer Analysis of Images and Patterns (CAIP)},
  month = {Aug},
  year = {2019}
}
@InProceedings{yu2019lytnetv2,
  author = {Yu, Samuel and Lee, Heon and Kim, Junghoon},
  title = {Street Crossing Aid Using Light-Weight CNNs for the Visually Impaired},
  booktitle = {The IEEE International Conference on Computer Vision (ICCV) Workshops},
  month = {Oct},
  year = {2019}
}

Issues

Expected input[32, 768, 3, 576] to have 3 channels, but got 768 channels instead.

Hi, I just set the correct paths to the annotations and to your datasets, ran 'training.py' locally, and got the following error:

RuntimeError: Given groups=1, weight of size 32 3 3 3, expected input[32, 768, 3, 576] to have 3 channels, but got 768 channels instead

Also a warning:

UserWarning: nn.init.xavier_normal is now deprecated in favor of nn.init.xavier_normal_.
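
Not an official fix for the channel error above, but the reported shape [32, 768, 3, 576] has the channel axis in position 2, while Conv2d expects [N, C, H, W]; reordering the axes of the batch (a hypothetical `batch` tensor here) before the forward pass should resolve it:

    # Hypothetical workaround: move the channel axis into position 1.
    batch = batch.permute(0, 2, 3, 1)  # [32, 768, 3, 576] -> [32, 3, 576, 768]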

LYTNETV2 not working, while LYTNET works

With the same files, training.py crashes only when using LytNetV2, with the following error:

RuntimeError: Given input size: (960x9x12). Calculated output size: (960x0x1). Output size is too small

From zebra crossing line to traffic light position

Hello,

We're working on changing the labels to predict the position of the traffic light instead of the endpoints of the zebra crossing. We have finished labeling and have verified that the new coordinates are correct. The new (x1, y1, x2, y2) refers to the upper-left corner of the traffic light box (p1) and the bottom-right corner of the traffic light box (p2), so as to form a bounding box.

Problem with LytNet: the per-class precisions always remain [0.30, 0.29, 0, 0, 0], even after the 600th epoch. Do you have any suggestions on how to set the coordinates so that LytNet predicts the position of the traffic light?

Thank you very much in advance

How to use your model for real time image

Hi guys! First of all, amazing work! I am currently working on a project that uses a Jetson Nano and a CSI camera. It's a detector meant to help visually impaired people. I am currently trying your model, but I am having trouble making it read real-time footage. Can you tell me what to search for or what to learn? Thanks!
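
Not part of the original question, but a minimal OpenCV capture-loop sketch for feeding live frames to the model; the device index is an assumption, and CSI cameras often need a GStreamer pipeline string instead:

    import cv2

    cap = cv2.VideoCapture(0)  # CSI cameras may need a GStreamer pipeline here
    while cap.isOpened():
        ok, frame = cap.read()  # BGR frame from the camera
        if not ok:
            break
        frame = cv2.resize(frame, (768, 576))
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        # ...normalize and run the model on `frame` as in the snippets above...
    cap.release()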

using model input parameter issue

    import torch
    import torch.nn as nn
    from LYTNet import LYTNet
    from LYTNetV2 import LYTNetV2

    from torch.utils.data import DataLoader
    from dataset import TrafficLightDataset

    MODEL_PATH = './LytNetV1_weights'
    device = torch.device('cpu')
    model = LYTNet()
    model.load_state_dict(torch.load(MODEL_PATH, map_location=device))
    model.eval()

    test_file_loc = './traffic/testing_file.csv'
    test_image_directory = './traffic/PTL_Dataset_768x576'

    import numpy as np
    from PIL import Image
    size = (768, 576)
    im = Image.open('./traffic/PTL_Dataset_768x576/john_IMG_0671.jpg')
    #im = pilimg.open('./traffic/PTL_Dataset_768x576/heon_IMG_0776.jpg')

    im = im.resize(size)
    im.show()

    pix = np.array(im)
    pix = torch.Tensor(pix).type(torch.FloatTensor)
    #print(pix.shape)

    pix = pix.unsqueeze(0)
    pix = pix.view([1, -1, 576, 768])
    #print(pix.shape)

    pred_classes, pred_direc = model(pix)
    _, predicted = torch.max(pred_classes, 1)
    print(predicted)

It runs, and the output was "tensor([4])".
But when I feed it a green light image, it still outputs "tensor([4])" for almost every green light image.
I think there is a problem with the input parameters.
Please help!
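
Not part of the original report, but two likely culprits stand out: `pix.view([1, -1, 576, 768])` scrambles the pixel layout instead of reordering axes, and the dataset mean/std are never applied. A hedged sketch of corrected preprocessing:

    # Hypothetical fix: normalize with the dataset statistics, then use
    # permute (not view) to reorder the HxWx3 array to CxHxW before batching.
    mean = np.array([120.56737612047593, 119.16664454573734, 113.84554638827127])
    std = np.array([66.32028460114392, 65.09469952002551, 65.67726614496246])

    pix = (np.asarray(im, dtype=np.float32) - mean) / std              # 576x768x3
    pix = torch.from_numpy(pix).permute(2, 0, 1).unsqueeze(0).float()  # 1x3x576x768
    pred_classes, pred_direc = model(pix)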

Is there a requirements.txt?

Hi. I have some issues running training with the latest version of PyTorch.
I don't have time to fix them, so I just want to downgrade, but I don't know which version I should use.

Where can I get the dataset?

Hello,
For this kind of application, I searched for a long while, and the implementation is very good! However, the Pedestrian-Traffic-Light (PTL) dataset you mention in the README is not in the repository, and I can't find it anywhere else on the internet. I'd like to train some other models with it to see how they perform.
Can you please tell me how to get the images of the dataset? Thanks!

Question about the dataset

Hello, I downloaded the 768x576 and 876x657 versions of the dataset that you uploaded, but I did not see any label files inside. Where can the label files be downloaded?

Clarification on Labeling Resolution for Coordinate Values

Thanks for the dataset!
I have noticed that the values for x1, y1, x2, y2 are not in normalized form.
I would like to utilize these labels for smaller image resolutions for my custom model.

Therefore, I would appreciate it if you could specify the resolution used when labeling, so I can remap the coordinates.

Once again, thank you for your work!

@samuelyu2002
