Published: April 29, 2022
I saw Tesla’s Full Self-Driving navigation panel the other day, and I was amazed by how accurately the real world was rendered in it. Buildings, vehicles, pedestrians, and even trees were fully re-created in this virtual world and fed to the autopilot model.
I decided to recreate what this module does, i.e., analyze a video taken in the real world and identify the moving objects in it.
Tesla is probably using machine learning for this; however, I would like to go with a simpler approach, for the three reasons below:
1. As Eugene Yan nicely puts it, the first rule of machine learning is to start without machine learning. Whenever we are trying to solve a data problem, it is a good idea to start with simple heuristics and see whether reasonable accuracy can be achieved before moving on to more complex models.
2. If we were to train a model that identifies moving objects in a video, that would be possible; however, we would need labeled data for the model to train on. Creating labels is a costly (and boring) exercise, and I would like to avoid it as much as possible.
3. Machine learning identifies patterns in data. If a data scientist already has an idea of a solution (the patterns) and that solution can easily be tested, it is always a great idea (and more fun) to try simple heuristics first. This way, we end up with an explanatory model rather than a predictive one, which is a black box most of the time.
We first deconstruct the video into its frames and try to identify moving objects by tracking the color of each pixel from frame to frame. A color is represented by a combination of Red, Green, and Blue (RGB) values, each a number between 0 and 255. Isn’t it amazing that the colors, images, and videos we see are just huge matrices of numbers?
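As a minimal sketch of this step (assuming Python with OpenCV and a placeholder file name, driving.mp4), here is how the frames can be pulled out as matrices of numbers. Note that OpenCV stores the channels in BGR order rather than RGB:

```python
import cv2

# Open the video and read it frame by frame; each frame comes back as a
# NumPy array of shape (height, width, 3) holding values between 0 and 255.
cap = cv2.VideoCapture("driving.mp4")
ret, frame = cap.read()
if ret:
    print(frame.shape)      # e.g. (720, 1280, 3)
    print(frame[100, 200])  # the B, G, R values of a single pixel
cap.release()
```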
A moving object is identified when there is a significant change in the color of a pixel between frames. For example, when a pixel’s RGB changes from purple (180, 70, 250) to green (40, 200, 120) and back to purple (180, 70, 250), we can confidently say that an object passed through that pixel, provided the camera is stable. By tracking the color of a pixel frame by frame, we can tag a point as a “moving object” whenever we observe a color change above a threshold.
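In code, this boils down to differencing consecutive frames and thresholding the result. A sketch, again assuming OpenCV; the threshold of 100 is just a placeholder value:

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("driving.mp4")
_, prev = cap.read()
_, curr = cap.read()
cap.release()

# Sum the absolute change of the three channels for every pixel; casting to
# int first avoids uint8 wraparound when subtracting.
THRESHOLD = 100
diff = np.abs(curr.astype(int) - prev.astype(int)).sum(axis=2)
moving = diff > THRESHOLD  # boolean mask of pixels flagged as moving
print(moving.sum(), "pixels changed significantly")
```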
Whenever a moving object is detected at a pixel, we change that pixel’s color to green to highlight the object. We also convert the video to grayscale so the highlighted objects stand out. The video below shows how this logic processes a video and finds the moving objects. Pretty cool, huh?
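One way to implement the highlighting, given the boolean mask from the previous sketch: convert the frame to grayscale, expand it back to three channels so color can be drawn on it, and paint the flagged pixels green:

```python
import cv2

def highlight(frame, moving):
    # Grayscale background, then back to 3 channels so green can be drawn on it.
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    out = cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)
    out[moving] = (0, 255, 0)  # green in OpenCV's BGR order
    return out
```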
We have created a rule-based moving-object identifier by looking at the color of each pixel in a video. The model works well for identifying cars, pedestrians, and even ferries. However, it has limitations: it requires a stable camera, and a reasonable color-change threshold must be chosen. If the threshold is too low, we end up identifying waves in a river as moving objects; if it is too high, we can miss small moving objects such as pets.
Here is the code for the model.
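Putting the pieces above together (same assumptions: OpenCV, placeholder file names, and a placeholder threshold of 100), the full pipeline reads a video, flags the moving pixels, highlights them in green on a grayscale background, and writes the result to a new file:

```python
import cv2
import numpy as np

THRESHOLD = 100  # minimum summed BGR change for a pixel to count as moving
INPUT, OUTPUT = "driving.mp4", "highlighted.mp4"  # placeholder file names

cap = cv2.VideoCapture(INPUT)
fps = cap.get(cv2.CAP_PROP_FPS)
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
writer = cv2.VideoWriter(OUTPUT, cv2.VideoWriter_fourcc(*"mp4v"), fps, (width, height))

ret, prev = cap.read()
while ret:
    ret, frame = cap.read()
    if not ret:
        break
    # Flag pixels whose color changed more than the threshold since the last frame.
    diff = np.abs(frame.astype(int) - prev.astype(int)).sum(axis=2)
    moving = diff > THRESHOLD
    # Grayscale background with the moving pixels painted green.
    out = cv2.cvtColor(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY), cv2.COLOR_GRAY2BGR)
    out[moving] = (0, 255, 0)
    writer.write(out)
    prev = frame

cap.release()
writer.release()
```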
Happy hacking!