Background
Traffic signs play a significant role in the modern traffic system, and their importance has only grown in recent years with the emergence of autonomous vehicles. The problem we aim to solve is to use computer vision techniques to detect stop signs under different driving scenarios, including extreme cases.
Motivation
With the development of autonomous vehicles, recognizing stop signs plays a very important role. Although this may eventually be solved by roadside units sending instructions to connected autonomous vehicles, for now we still need to rely on the vehicles themselves to recognize these signs. Furthermore, vehicles that can recognize traffic signs improve the driving experience and safety for drivers. Most drivers have experienced driving too fast on the highway to read a road guidance sign clearly, and ending up heading in the wrong direction. In other cases, drivers may miss a speed limit sign in the dark and get a speeding ticket.
These scenarios can be avoided if we develop an application that automatically recognizes traffic signs. Vehicles equipped with such an application can extract the key information and either read it aloud or show it on the head-up display (HUD).
For our project, we focus on stop sign detection.
Current state-of-the-art
Generally speaking, stop sign detection technology is already well developed. It is a significant part of advanced driver-assistance systems (ADAS). Such technologies are already used broadly by vehicle manufacturers such as BMW, Ford, and Volvo to improve performance and safety, and most cars with an ADAS can detect stop signs.
The cornerstones of stop sign detection are two facts:
- The appearance of stop signs is standardized in the U.S.: the MUTCD specifies the design and size of all traffic signs.
- Cutting-edge algorithms enable cars to analyze and understand the content of stop signs.
The concept was brought up in 1968, but the recent blossom was triggered by breakthroughs in hardware and machine learning algorithms.
The following are some widely used approaches to the problem:
- Histogram of oriented gradients (HOG) + supervised learning (SVM, random forest)
- Deep learning approaches (RCNN, Mask-RCNN, Fast-RCNN, YOLO, ...)
Structure of the webpage
In the rest of this webpage, we describe the two approaches we implemented: a deep learning approach and a traditional computer vision approach combined with an RBF kernel SVM classifier.
Approach 1 - deep learning
Overview
In this part, we used YOLOv3 (You Only Look Once, version 3) to perform traffic sign detection with the COCO dataset and the LISA Traffic Sign Dataset (LISA), and we evaluated our results on pictures taken under challenging conditions.
First try: YOLOv3 + COCO
COCO data set
COCO is a large-scale object detection, segmentation, and captioning dataset. We chose COCO because we did not need to reformat the annotations, and COCO has a large set of training images containing stop signs.
The resolution of the training data is high and all pictures were taken under normal conditions.
A peek at the COCO dataset:
Input data preparation
We used COCO API to preprocess our data using the following command:
# get the IDs of all images containing the 'stop sign' category
from pycocotools.coco import COCO

coco = COCO('annotations/instances_train2017.json')  # example annotation path
catIds = coco.getCatIds(catNms=['stop sign'])
imgIds = coco.getImgIds(catIds=catIds)
YOLOv3
YOLOv3 is a very famous object detector, which has the following advantages:
- Fast: can perform detection on live video at 30 FPS
- Accurate: the pre-trained model has fairly good accuracy
- Active community: solutions to common problems are easy to find
Training
We used the following packages and facilities to help us train the model:
- OS: Ubuntu 18.04 LTS
- CUDA: 10.01
- GPU: RTX 2080 Ti
- OPENCV: 3.4.4
Testing results
Our result for normal cases was good, with an IoU of 0.973, as shown below:
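For reference, the IoU (intersection over union) between a predicted and a ground-truth box can be computed with the standard formulation below; this is a generic sketch, not necessarily the exact evaluation code we used.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)
```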
Extreme cases
Although YOLOv3 performed well on normal images, it failed for all extreme cases. We summarize the extreme cases as follows:
- Blur
- Darkness
- Dirty lens
- Over-exposure
- Rain
- Snow
- Haze
Second try: YOLO-Tiny + LISA
LISA
To address the extreme cases, we switched to another dataset, the LISA Traffic Sign Dataset, which contains 47 US sign types. Here are some example images from the dataset.
YOLO-Tiny
While YOLOv3 has 102 layers, YOLO-Tiny has only nine, which makes it much faster to train.
Image pre-processing
Our algorithm for image pre-processing is shown as follows:
Training
The configuration of the YOLO-Tiny model is shown as follows:
- batch = 64
- subdivisions = 8
- max_batches = 60000
- steps = 48000, 54000
- momentum = 0.9
- learning_rate = 0.001
- burn_in = 1000
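In darknet's .cfg syntax, these hyperparameters map onto the [net] section roughly as follows; the policy and scales lines are assumptions taken from the stock yolov3-tiny configuration.

```ini
[net]
batch=64
subdivisions=8
momentum=0.9
learning_rate=0.001
burn_in=1000
max_batches=60000
policy=steps
steps=48000,54000
scales=.1,.1
```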
Testing results
Here is part of the testing results, where the model successfully detects stop signs in dark environments.
Approach 2 - SVM + HOG
Algorithm
Our second approach combines an SVM with HOG features. It includes image processing, ROI (region of interest) selection, HOG feature extraction, and supervised learning.
Our detailed algorithm is shown below.
Datasets
Original datasets
We downloaded around 135 pictures containing one or more stop signs from the Internet to form our original positive dataset. Some of them are shown below.
For the negative dataset, we downloaded 415 pictures from [1]. None of these pictures contain stop signs. Here are some of them.
Positive training dataset
The generation of the positive training dataset requires a set of image processing steps. Here we use an example image [2] to illustrate how we crop the ROI to generate our positive training dataset.
Step 1, convert the original image to the HSV color space. HSV stands for hue, saturation, and value, and is more robust for color segmentation. We define a color range to select all the red objects, as shown below.
Step 2, we use a set of morphological transformations, including dilation and erosion, to remove the noise in the image.
Step 3, we fill in holes within each connected component. As you will see, after filling the holes, the text "STOP" in the stop sign is removed. In this way, we remove unnecessary contours and decrease the number of ROI candidates, which reduces the running time and increases detection accuracy.
Step 4, we set a threshold and discard all contours smaller than it. This removes small (unnecessary) objects.
Step 5, for each contour (connected component), we find its top-left and bottom-right coordinates. By multiplying these coordinates by a slack factor (here we used 1.1), we can draw the bounding boxes. Each subimage in a bounding box is an ROI candidate.
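A sketch of applying the slack factor; here the box is grown about its center, which is one reasonable reading of the scaling, and clipped to the image bounds.

```python
def expand_box(x1, y1, x2, y2, img_w, img_h, slack=1.1):
    """Grow a box about its center by `slack`, clipped to the image."""
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    hw, hh = (x2 - x1) * slack / 2, (y2 - y1) * slack / 2
    return (max(0, int(cx - hw)), max(0, int(cy - hh)),
            min(img_w, int(cx + hw)), min(img_h, int(cy + hh)))
```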
Step 6, removing all bounding boxes that are contained within larger bounding boxes gives us the final ROI candidates.
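The nested-box removal can be sketched as follows (a quadratic scan, which is fine for a handful of candidates per image).

```python
def contains(outer, inner):
    """True if box `inner` lies entirely inside box `outer` (x1, y1, x2, y2)."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])

def drop_nested(boxes):
    """Keep only boxes not contained in any other box."""
    return [b for b in boxes
            if not any(contains(o, b) for o in boxes if o != b)]
```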
Step 7, we extract the subimage within each bounding box and scale them all to the same size.
Step 8, we remove all the false positives and keep only the correct ones in our positive training dataset.
The image below shows part of our positive training dataset, which covers normal conditions as well as extreme conditions including rain, snow, blur, and darkness.
Negative training dataset
We resized our original negative images (which contain no stop signs) to a fixed size to form the negative training dataset. Part of the dataset is shown below.
Training (HOG + SVM)
We extracted HOG features for each image in our positive and negative training datasets. We then fed each feature vector, along with its label, into a weighted RBF kernel SVM classifier for training.
Testing
Given a test image (take the example image we used above), we extract ROI candidates following the same steps used to generate the positive training dataset. For the example image, the testing steps are as follows.
That is, given the extracted ROIs, we compute HOG features for each of them, feed the features into our trained RBF kernel SVM classifier, and test whether each ROI is positive or negative. If positive, we select that ROI and draw a green bounding box around it. Our final result is shown below.
Results
We tested our model on images taken under extreme weather conditions as well as images with complex backgrounds (a lot of confusing red objects that are very similar to stop signs). Our model was robust to these conditions and accurately detected the stop signs in these images.
Furthermore, we drove around the city of Madison, WI on both rainy and sunny days and took videos. Our model again robustly detected and tracked stop signs under different scenarios. The videos are shown below.
Discussion
Approach 1 - deep learning
The YOLO-Tiny + LISA model can detect stop signs efficiently in real time. After we applied image preprocessing to images that are blurred, dark, or over-exposed, the model also did well on these extreme cases.
Approach 2 - SVM + HOG
Running time
The SVM + HOG approach is relatively fast compared to other models. For example, a one-minute video takes about two minutes to process, and even very small stop signs can be detected.
Actually, the running time largely depends on how much accuracy we want to achieve. If we raise the contour-size threshold, more small objects are filtered out and we can only detect stop signs that are large enough (in pixels). However, if we want to detect faraway stop signs, we have to lower the threshold, which takes more time to process.
Pros
Here are some of the advantages of our HOG + SVM model:
- The running time is good compared to other models, especially deep learning models.
- It can achieve high accuracy if the threshold is small enough.
- It detects robustly under most extreme weather conditions, especially cloudy, rainy, and snowy weather.
- It is easy to implement and does not need expensive hardware.
Cons
Although the results (images and videos) show that our model is robust to many extreme conditions, there are still cases where it does not work well:
- When the scene is extremely dark, our color mask simply filters out the stop signs.
- When red objects cover or touch part of a stop sign, the image processing pipeline merges them into a single object, and the classifier votes "no".
Code
The code is stored at https://github.com/lichengxihuang/CS766_Project.
References
[1] http://vision.stanford.edu/projects/OPTIMOL/category/stop-sign/catmain.html
[2] https://www.brooklynpaper.com/assets/photos/40/14/dtg-dumbo-stop-sign-safest-ever-2017-04-07-bk01_z.jpg