Stop Sign Detection

Han Cao, Xi Chen, Lichengxi Huang

Background

Traffic signs play a significant role in the modern traffic system, and they have become even more important in recent years with the emergence of autonomous vehicles. The problem we aim to solve is to use computer vision techniques to detect stop signs under different driving scenarios, including extreme cases.


Motivation

With the development of autonomous vehicles, recognizing stop signs plays a very important role. In the future, connected autonomous vehicles may solve this problem with roadside units that send instructions to vehicles, but today we still need to rely on the vehicles themselves to recognize these signs. Furthermore, vehicles that can recognize traffic signs improve driving experience and safety. Most drivers have experienced driving too fast on a highway to read a guidance sign clearly, and then heading in the wrong direction. In other cases, drivers may miss a speed limit sign in the dark and end up with a speeding ticket.

These scenarios can all be avoided if we can develop an application that automatically recognizes traffic signs. Vehicles equipped with such an application can extract the key information and either read it aloud or display it on a head-up display (HUD).

For our project, we focus on stop sign detection.


Current state-of-the-art

Generally speaking, stop sign detection technology is already well developed. It is a significant part of advanced driver-assistance systems (ADAS). Such technologies are already used broadly by vehicle manufacturers, including BMW, Ford and Volvo, to improve performance and safety, and most cars with an ADAS can detect stop signs.

The cornerstones of the stop signs detection are based on two facts:

  1. The features of stop signs are standardized in the U.S.: the MUTCD publishes a standard for the design and size of all traffic signs.
  2. The cutting-edge algorithms enable cars to analyze and understand the content of stop signs.

The concept was brought up in 1968, but the recent blossom was triggered by breakthroughs in hardware and machine learning algorithms.

Widely used approaches to the problem include traditional computer vision pipelines (color segmentation and shape detection combined with classifiers such as SVMs) and deep learning detectors such as the YOLO family.


Structure of the webpage

In the rest of this webpage, we discuss the two approaches we implemented: a deep learning approach, and a traditional computer vision approach combined with an RBF kernel SVM classifier.


Approach 1 - deep learning

Overview

In this part, we used YOLOv3 (You Only Look Once, version 3) to perform traffic sign detection with the COCO dataset and the LISA Traffic Sign Dataset (LISA), and we examined our results on pictures taken under challenging conditions.

First try: YOLOv3 + COCO

COCO data set

COCO is a large-scale object detection, segmentation, and captioning dataset. We chose COCO because we did not need to reformat the annotations, and it has a large set of training images containing stop signs.

The resolution of the training data is high and all pictures were taken under normal conditions.

A peek of COCO data set:

approach1_COCO_peek

Input data preparation

We used the COCO API to preprocess our data (the annotation file path below is illustrative):

# load the annotations and get all images containing the 'stop sign' category
# (note: the COCO category name contains a space, not an underscore)
from pycocotools.coco import COCO
coco = COCO('annotations/instances_train2017.json')
catIds = coco.getCatIds(catNms=['stop sign'])
imgIds = coco.getImgIds(catIds=catIds)

YOLOv3

YOLOv3 is a very famous object detector with the following advantages:

  1. It is fast, since it predicts all bounding boxes in a single forward pass over the image.
  2. It predicts at three different scales, which helps with small objects such as distant signs.
  3. Pretrained weights and mature open-source implementations are readily available.

Training

We used the following packages and facilities to help us train the model:

Testing results

Our result for normal cases was good, with an IoU of 0.973, as shown below:

approach1_COCO_result

Extreme cases

Although YOLOv3 performed well on normal images, it failed on all extreme cases, which we summarize as follows:

  1. Blur

  2. Darkness

  3. Dirty lens

  4. Over-exposure

  5. Rain

  6. Snow

  7. Haze

Second try: YOLO-Tiny + LISA

LISA

To handle the extreme cases, we switched to another dataset, the LISA Traffic Sign Dataset, which contains 47 US sign types. Here are some example images from the dataset.

approach1_LISA_peek

YOLO-Tiny

Instead of the 102 layers of YOLOv3, YOLO-Tiny has only nine layers, which makes it much faster to train.

Image pre-processing

Our algorithm for image pre-processing is shown as follows:

approach1_image_preprocessing
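The figure above summarizes the pipeline. As one concrete example of the kind of correction involved, here is a minimal gamma-correction sketch for dark and over-exposed frames; this is an illustration, not the project's actual preprocessing code:

```python
import numpy as np

def gamma_correct(img, gamma):
    # per-pixel gamma correction via a lookup table; gamma < 1 brightens
    # dark frames, gamma > 1 tones down over-exposed ones
    lut = ((np.arange(256) / 255.0) ** gamma * 255.0).astype(np.uint8)
    return lut[img]

# brighten a synthetic dark frame
dark = np.full((4, 4), 40, dtype=np.uint8)
bright = gamma_correct(dark, 0.5)
```

The lookup table makes the correction a single indexing operation per frame, which matters when processing video in real time.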

Training

The configuration of the YOLO-Tiny model is shown as follows:

Testing results

Here is part of the testing results, where the model successfully detects stop signs in dark environments.

approach1_show_1
approach1_show_2

Approach 2 - SVM + HOG

Algorithm

Our second approach combines an SVM classifier with HOG features. It includes image processing, ROI (region of interest) selection, HOG feature extraction and supervised learning.

Our detailed algorithm is shown below.

algorithm

Datasets

Original datasets

We collected around 135 pictures, each containing one or more stop signs, from the Internet to form our original positive dataset. Some of them are shown below.

orig_pos1 orig_pos2 orig_pos3 orig_pos4 orig_pos5 orig_pos6

For the negative dataset, we downloaded 415 pictures from [1]; none of them contain any stop signs. Here are some of them.

orig_neg1 orig_neg2 orig_neg3

Positive training dataset

Generating the positive training dataset requires a set of image processing steps. Here we use an example image [2] to illustrate how we crop the ROIs to generate our positive training dataset.

Step 1, convert the original image to the HSV color space. HSV stands for hue, saturation and value, and is more robust for color segmentation. We define a color range to select all the red objects, as shown below.

ex1

Step 2, we apply a set of morphological transformations, including dilation and erosion, to remove the noise in the image.

ex2

Step 3, we fill in the holes within each connected component. As you will see, after filling the holes, the text "STOP" in the stop sign is removed. In this way, we can remove unnecessary contours and decrease the number of ROI candidates, which reduces the running time and hence increases the detection accuracy.

ex3

Step 4, we set a threshold and discard all the contours smaller than it. With this, we can remove small (unnecessary) objects.

ex4

Step 5, for each contour, or connected component, we find its top-left and bottom-right coordinates. By multiplying these coordinates by a slack factor (here we used 1.1), we can draw the bounding boxes. Each subimage in a bounding box is an ROI candidate.

ex5

Step 6, removing all the smaller bounding boxes that are contained in larger bounding boxes gives us the final ROI candidates.

ex6
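The containment check in this step can be sketched in a few lines (box format and helper names are ours):

```python
def contains(outer, inner):
    # True if box `inner` lies completely inside box `outer`;
    # boxes are (x1, y1, x2, y2)
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])

def remove_nested(boxes):
    # step 6: drop every box contained in some other box
    return [b for b in boxes
            if not any(o != b and contains(o, b) for o in boxes)]

candidates = [(10, 10, 80, 80), (20, 20, 40, 40), (90, 90, 120, 120)]
rois = remove_nested(candidates)
```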

Step 7, we select the subimage within each bounding box and scale them all to the same size.

ex7

Step 8, remove all the false positives, and keep only the correct ones in our positive training dataset.

ex8

The image below shows part of our positive training dataset, which covers normal conditions as well as some extreme conditions, including rain, snow, blur and darkness.

pos_dataset

Negative training dataset

We resized our original negative images (images that do not contain stop signs) to a fixed size to form the negative training dataset. Part of the dataset is shown below.

neg_dataset

Training (HOG + SVM)

We extracted HOG features from each image in our positive and negative training datasets. We then fed these features and their labels into a weighted RBF kernel SVM classifier for training.

Testing

Given a test image (take the example image we used above for generating the positive training dataset), we extract ROI candidates following the same steps used to generate the positive training dataset. For the example image, the testing steps are as follows.

testing

That is, given the extracted ROIs, we extract HOG features for each of them, feed the features into our trained RBF kernel SVM classifier, and test whether the ROI is positive or negative. If positive, we select that ROI and draw a green bounding box around it. Our final result is shown below.

testing1

Results

We tested our model on images taken under extreme weather conditions as well as images with complex backgrounds (containing many confusing red objects that look similar to stop signs). Our model was very robust to these conditions and detected the stop signs in these images accurately.

results1
results2

Furthermore, we drove around the city of Madison, WI on both rainy and sunny days and recorded videos. Our model again showed a robust ability to detect and track stop signs under different scenarios. The videos are shown below.


Discussion

Approach 1 - deep learning

The YOLO-Tiny + LISA model can detect stop signs efficiently in real time. After we applied image preprocessing to images that are blurred, dark or over-exposed, the model also did well on these extreme cases.

Approach 2 - SVM + HOG

Running time

The SVM + HOG approach is relatively fast compared to other models: a one-minute video takes about two minutes to process, and even very small stop signs can be detected.

Actually, the running time largely depends on how much accuracy we want to achieve. If we make the size threshold larger, more small objects are filtered out and we can only detect stop signs that are large enough (in pixels). However, if we want to detect faraway stop signs, we have to make the threshold smaller, which takes more time to process.

Pros

Here are some of the advantages of our HOG + SVM model:

  1. The running time is good compared to other models, especially deep learning models.
  2. It can achieve high accuracy if the threshold is small enough.
  3. It can detect robustly under most extreme weather conditions, especially in cloudy, rainy and snowy weather.
  4. It is easy to implement and it does not need expensive hardware.

Cons

Although the results (images and videos) show that our model is very robust to some of the extreme conditions, there are still certain cases where this model does not work well.


Code

The code is stored at https://github.com/lichengxihuang/CS766_Project.


References

  1. http://vision.stanford.edu/projects/OPTIMOL/category/stop-sign/catmain.html
  2. https://www.brooklynpaper.com/assets/photos/40/14/dtg-dumbo-stop-sign-safest-ever-2017-04-07-bk01_z.jpg