As a Gov.Tech and PaaS (Platform as a Service) operator, eValidation Austria wants to make the customer journey as easy and as fast as possible for its customers. Because Switzerland is a non-EU country that borders Austria directly, car and passenger traffic at the border is heavy. Besides ordinary commuters, travelers and day shoppers also have to pass through customs control. The problem is obvious: long queues and unavoidable traffic jams.
Yes, because of the current situation there are almost no traffic jams, but a vaccine is in sight and we will be right back where we were.
The upcoming requirement for customs to validate purchased goods digitally makes it possible to tackle this problem. However, customs doesn’t know in advance who is coming, whether they carry goods at all and, if they do, whether those goods need to be inspected. Working this out at the border takes additional time and is what generally causes the queues.
One simple solution would be to track users via GPS through an app on the traveler’s mobile device. Even if we used the data only for this use case, it would be a form of surveillance that is not necessary and that we don’t want to implement.
Our approach is to detect the license plate ahead of customs offices and border exit points and to sort travelers into drive-through and stop2inspect lanes. This means drivers can focus on the road and don’t need to watch their mobile device for notifications.
As a side note, this was the project for my bachelor thesis. If you are interested in the whole thesis, you can read it here. Because the work is already one and a half years old, the following approach might not be cutting edge. However, it works well, and I will post an update on the approach soon, so stay tuned.
The license plate recognition system
The simplest way to describe the system would be: “A 3-step object detector based on YOLOv3.”
As with almost every problem, this one is easier to solve when we divide it into smaller ones. In the domain of license plate recognition, these are:
- Detecting vehicles (could be skipped when the camera is already focused on the area or zoomed in)
- Detecting the license plate
- Reading the characters
Why did we use YOLOv3? Because at the time it offered the best trade-off between accuracy and performance among object detection architectures. Another argument was that an open-source implementation of YOLOv3 was available (link to the repository).
Step 1: region of interest detection
Even though YOLOv3 comes pre-trained on the COCO dataset, which contains cars, buses, trucks, etc., we collected and hand-labeled our own dataset, because we only want the front and back of vehicles, where the license plate is located, to be detected. If, for example, the camera is not static and cars could appear side-on in the image, we don’t want them to be detected and passed on to the license plate detection. This saves computation time and reduces false positive detections, because some vehicles have text printed on the side.
Step 2: license plate detection
After we have detected the region of interest and cropped it, the cropped image is passed to the next YOLOv3 detector to get the bounding box of the license plate. One more hint: be generous with the bounding box when you label your license plate dataset, so that YOLOv3 still returns a bounding box containing all characters in difficult scenarios (bad lighting, different viewing angles, etc.).
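The hand-over between two detectors can be sketched as follows. This is a minimal sketch, not the code from the thesis: the function name `expand_box`, the `(x1, y1, x2, y2)` pixel box format and the 10% margin are assumptions made for illustration. It enlarges a detected box slightly before cropping, so that a box that is a little too tight does not cut off the plate.

```python
def expand_box(box, image_width, image_height, margin=0.1):
    """Enlarge a box by `margin` (a fraction of its size) on every side.

    `box` is (x1, y1, x2, y2) in pixels. The result is clamped to the
    image so it can be used directly for slicing. The name, box format
    and default margin are illustrative assumptions.
    """
    x1, y1, x2, y2 = box
    pad_x = int(round((x2 - x1) * margin))
    pad_y = int(round((y2 - y1) * margin))
    return (
        max(0, x1 - pad_x),
        max(0, y1 - pad_y),
        min(image_width, x2 + pad_x),
        min(image_height, y2 + pad_y),
    )


# Example: a vehicle box found in a 640x480 frame.
left, top, right, bottom = expand_box((100, 200, 300, 400), 640, 480)
# With a NumPy image, the crop for the next detector would then be
# frame[top:bottom, left:right].
```

Whether a margin at crop time helps depends on how the training crops were produced, so treat the value as something to tune.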
Step 3: character recognition
When you think of character recognition, Tesseract probably comes to mind. However, Tesseract recognizes characters in the context of words and sentences, which doesn’t suit this use case. Different colors and background noise also make it even more challenging for Tesseract.
You can also try to break the problem down further into character segmentation and classification, which you often see in older approaches to license plate detection or character recognition in general.
Or you simply do it in one step with YOLOv3. Again, like in your license plate dataset, try to be generous with the bounding box while labeling the characters of license plates. It will help you later when you search for the right non-maximum suppression threshold.
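To show where the non-maximum suppression threshold comes into play, here is a minimal sketch of turning character detections into a string. It assumes that each detection is a hypothetical `(label, confidence, box)` tuple with `box = (x1, y1, x2, y2)`, that the plate has a single line of text, and that suppression is applied across all classes, since two different characters should not occupy the same spot. The threshold of 0.4 is only a placeholder.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(detections, iou_threshold=0.4):
    """Keep the most confident detection among heavily overlapping ones."""
    kept = []
    for det in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(iou(det[2], other[2]) <= iou_threshold for other in kept):
            kept.append(det)
    return kept


def read_plate(detections, iou_threshold=0.4):
    """Suppress duplicates, then read the characters from left to right."""
    kept = non_max_suppression(detections, iou_threshold)
    kept.sort(key=lambda d: d[2][0])  # sort by the left edge of each box
    return "".join(label for label, _, _ in kept)


# Example: "1" and "I" compete for the same position; "1" is more confident.
detections = [
    ("W", 0.90, (0, 0, 10, 20)),
    ("1", 0.80, (12, 0, 22, 20)),
    ("I", 0.50, (13, 0, 23, 20)),
    ("2", 0.85, (24, 0, 34, 20)),
]
plate_text = read_plate(detections)
```

A threshold that is too high keeps both competing characters, while one that is too low can drop neighbors that are packed closely together, which is why generous but consistent labels make the search easier.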
The most difficult problem in license plate detection systems is still deciding whether a character is a “0” or an “O”. It is not only the artificial neural network that struggles. If you read these characters without knowing the font, you might also have a hard time deciding which is the number and which is the letter. On top of that, every country has its own font for license plates. Even the European Union has not managed to agree on one standard font, and only a few countries (Germany, Netherlands, etc.) use a font in which “0” and “O” are easy to tell apart. Depending on your use case and your goal, however, you can implement workarounds.
Every country has its own rules for character positions. For Austria you can find a detailed article on Wikipedia. If you read through the article, you will notice that there are a lot of exceptions. If you only want to detect Austrian license plates, it is doable, but already difficult.
Now imagine you want to detect license plates from every country in the European Union and all neighboring countries. Implementing rules and exceptions isn’t what you want to do in this case.
Again, we went the easy way: we simply don’t differentiate between “0” and “O”. We found that the errors caused by trying to tell the two apart do more harm than the rare case of two license plates that are identical except for a swapped “0” and “O”.
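One way to implement this is to map both characters to the same symbol before comparing or storing a plate. The sketch below is an assumption about how this could look, not the original code. The helper names are made up, and it additionally removes whitespace and ignores case.

```python
def normalize_plate(text):
    """Map a recognized plate to a canonical form in which O equals 0.

    Illustrative helper: uppercases the text, drops whitespace and
    replaces the letter "O" with the digit "0".
    """
    return "".join(text.upper().split()).replace("O", "0")


def same_plate(a, b):
    """Compare two readings while ignoring the 0/O distinction."""
    return normalize_plate(a) == normalize_plate(b)


# Example: both readings refer to the same plate after normalization.
match = same_plate("W 1O0 AB", "w100ab")
```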
Temporal redundancy is an easy and effective algorithm for reducing your error rate when you get more than one image of the same license plate. It groups the strings detected for the same license plate and gives you the most likely one by majority vote.
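The majority vote itself can be sketched in a few lines. The sketch assumes that the readings have already been grouped per vehicle, for example by a tracker, which is not shown here. The function name is hypothetical, and empty readings are skipped.

```python
from collections import Counter


def most_likely_plate(readings):
    """Return the reading that occurs most often, or None if there is none.

    `readings` holds the strings recognized for one vehicle across
    several frames. Empty strings (frames without a result) are ignored.
    """
    counts = Counter(reading for reading in readings if reading)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Example: one frame misread "3" as "8" and one frame gave no result.
best = most_likely_plate(["W123AB", "W128AB", "W123AB", "", "W123AB"])
```

Voting on whole strings is the simplest variant. Voting per character position is a possible refinement when readings often differ in only one character.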
Putting it together
Put together, the architecture of the system looks like this:
All these parts are executed in different threads, which run simultaneously and communicate via queues. This makes it possible to run the three neural networks on different GPUs to increase the overall performance of the system.
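The thread and queue layout can be sketched with the Python standard library. This is a minimal sketch under stated assumptions: the three `detect_*` and `read_*` functions are hypothetical stand-ins that only build strings, whereas in a real system each worker would hold its own YOLOv3 network, possibly placed on its own GPU.

```python
import queue
import threading

STOP = object()  # sentinel that tells a stage to shut down


def stage(work, inbox, outbox):
    """Apply `work` to every item from `inbox` and pass the result on."""
    while True:
        item = inbox.get()
        if item is STOP:
            outbox.put(STOP)  # forward the shutdown signal downstream
            break
        outbox.put(work(item))


# Hypothetical stand-ins for the three detectors.
def detect_vehicle(frame):
    return f"vehicle({frame})"


def detect_plate(vehicle_crop):
    return f"plate({vehicle_crop})"


def read_characters(plate_crop):
    return f"text({plate_crop})"


def run_pipeline(frames):
    """Push frames through three stages, each running in its own thread."""
    steps = [detect_vehicle, detect_plate, read_characters]
    queues = [queue.Queue() for _ in range(len(steps) + 1)]
    threads = [
        threading.Thread(target=stage, args=(work, queues[i], queues[i + 1]))
        for i, work in enumerate(steps)
    ]
    for thread in threads:
        thread.start()
    for frame in frames:
        queues[0].put(frame)
    queues[0].put(STOP)

    results = []
    while True:
        item = queues[-1].get()
        if item is STOP:
            break
        results.append(item)
    for thread in threads:
        thread.join()
    return results


results = run_pipeline(["frame1", "frame2"])
```

Because every stage is a single thread reading from a FIFO queue, results come out in the order the frames went in. While one stage works on a frame, the previous stage can already process the next one.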