Why Teach Robots How to See? Nine Burning Questions About Computer Vision
How does a driverless car distinguish between pedestrians and trees? And how can Face ID tell whether it’s you or a thief logging into your phone? These and other tasks are tackled by specialists in the field of computer vision. In this article, Sergey Shavetov, associate professor and deputy dean at the Faculty of Control Systems and Robotics, explains whether a robot or an AI can be taught to see the world the way a human does, why it is so hard, and what will happen when it finally becomes possible.
What is computer vision?
Computer vision is a rather broad field of knowledge. It studies ways to receive, process, and analyze digital images and video streams. If we split the term into its parts, we can talk about low-level algorithms (which deal with acquiring a digital video image, representing it, filtering it, and reducing noise), mid-level algorithms (image recognition and detection of structural elements), and high-level algorithms (semantic image analysis, classification of structural elements, and their interpretation). If we consider mechatronics, robotics, and digital control systems, then machine vision is the more appropriate term for the systems that give robots the sense of sight. It is a combination of technical devices – sensors and signal converters – and information-processing algorithms that enable a robot to perceive and interpret its surroundings. In robotics, not only regular digital cameras but also rangefinders, lidars, velocimeters, accelerometers, compasses, and other devices are used as sensors.
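To make the "low-level" stage concrete, here is a minimal sketch of one of its typical operations: noise reduction with a 3x3 box filter, written in pure Python for readability. The image, its values, and the filter choice are all invented for illustration; real systems would use a library such as OpenCV or NumPy.

```python
def box_blur(image):
    """Low-level filtering: average each interior pixel with its
    8 neighbours to suppress noise (border pixels are left as-is)."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            total = sum(image[y + dy][x + dx]
                        for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            out[y][x] = total // 9
    return out

noisy = [
    [10, 10, 10, 10],
    [10, 90, 10, 10],   # a single bright "noise" pixel
    [10, 10, 10, 10],
    [10, 10, 10, 10],
]
smoothed = box_blur(noisy)
print(smoothed[1][1])  # the outlier is pulled toward its neighbours: 18
```

Mid- and high-level algorithms would then operate on such cleaned-up images, looking first for structure and then for meaning.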
Why do we need computer vision?
Computer vision systems can be used for various purposes – for instance, to create security or face/object recognition systems, quality control systems for production sites, driverless cars, image enhancement tools, 3D scene reconstruction, etc. If we generalize, the main task of computer or machine vision is to assist in automating household and production processes, to improve their safety and reliability, and to eliminate the human factor.
Can an AI or robot be taught to “see”?
In a sense, yes, but it is not an easy task. What machine vision needs to do is hard to formalize: humans classify objects subconsciously. The sheer variability of the objects around us and of their qualities (such as brightness and geometry) means that even the slightest change in an image creates immense difficulties for computer vision algorithms. For a long time, people used algorithms that operated on individual images and analyzed each of them as a separate element: they determined its borders, identified the object in the image, and extracted its characteristic properties. This approach is now referred to as “classic”. In 2013, neural networks boomed, and from then on they were used alongside machine learning algorithms in all kinds of computer vision tasks.
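The "classic" approach described above can be sketched in a few lines: instead of learning from data, hand-written code looks for an object's borders as sharp changes in brightness. The image, threshold, and gradient rule below are toy choices for illustration, not a production edge detector.

```python
def horizontal_edges(image, threshold=50):
    """Classic border detection: mark a pixel as an edge wherever the
    horizontal intensity gradient |I(x+1) - I(x-1)| exceeds a threshold."""
    h, w = len(image), len(image[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(1, w - 1):
            gradient = abs(image[y][x + 1] - image[y][x - 1])
            if gradient > threshold:
                edges[y][x] = 1
    return edges

# A dark region next to a bright region: the border between them is found.
image = [[0, 0, 0, 200, 200, 200] for _ in range(3)]
print(horizontal_edges(image)[0])  # [0, 0, 1, 1, 0, 0]
```

The weakness is exactly the one named above: every rule and threshold is tuned by hand, so a small change in lighting or geometry can break the detector.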
What changed with the advance in neural networks?
The neural network approach was a real breakthrough. Networks allowed researchers to stop perfecting analytical algorithms and instead build expert databases, which served as training data for neural networks tailored to specialized tasks. But that is where the main difficulty is hidden: for training, you need not only a database of reference objects, so that the algorithm learns to recognize them, but also examples of images where the object in question is absent or hard to identify.
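A toy sketch can show why both kinds of examples matter. Instead of a real neural network, the learned model below is a tiny nearest-centroid classifier; the "pedestrian" feature vectors (say, height-to-width ratio and vertical symmetry) and their values are entirely invented for illustration.

```python
def centroid(vectors):
    """Average of a list of feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def train(positives, negatives):
    """Learn one prototype per class from labelled examples."""
    return {"pos": centroid(positives), "neg": centroid(negatives)}

def classify(model, sample):
    """Assign the sample to whichever prototype is closer."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    near_pos = dist2(sample, model["pos"]) < dist2(sample, model["neg"])
    return "pedestrian" if near_pos else "background"

# Invented feature vectors: (height/width ratio, vertical symmetry).
positives = [[3.0, 0.9], [2.8, 0.8], [3.2, 0.85]]   # images with pedestrians
negatives = [[1.0, 0.3], [0.9, 0.2], [1.1, 0.25]]   # images without them
model = train(positives, negatives)
print(classify(model, [2.9, 0.8]))   # pedestrian
print(classify(model, [1.0, 0.3]))   # background
```

Delete the `negatives` list and there is nothing for the model to contrast against: it would call everything a pedestrian, which is exactly the failure mode the interview warns about.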
What does it look like?
Here is an example from the field of driverless cars: let’s say a neural network can reliably identify pedestrians in good weather. But what will it do when the sky is overcast? What if it rains or snows? What if it’s dark out and everything is in black and white because of infrared lighting? Then the network needs additional training – and that requires a lot of time and effort.
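One common way to reduce that cost (an assumption on my part, not something the interview describes) is data augmentation: instead of filming new footage for every weather condition, existing training images are transformed to simulate them. A minimal sketch, with invented pixel values:

```python
import random

def darken(image, factor=0.4):
    """Scale every pixel down, simulating an overcast or dim scene."""
    return [[int(p * factor) for p in row] for row in image]

def add_noise(image, amplitude=10, seed=42):
    """Perturb pixels slightly, simulating rain, snow, or sensor noise."""
    rng = random.Random(seed)
    return [[max(0, min(255, p + rng.randint(-amplitude, amplitude)))
             for p in row] for row in image]

sunny = [[200, 210], [190, 205]]      # a tiny "good weather" image
overcast = darken(sunny)
print(overcast)                        # [[80, 84], [76, 82]]
rainy = add_noise(sunny)               # same scene, corrupted
```

Each transformed copy gives the network one more "bad weather" example without anyone driving a test car through a storm.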
What are the main achievements?
Active use of driverless vehicles can be called a great success in the field of machine vision – there are haul trucks, railroad trains, and cars. Another example is the social credit system in China: cameras are everywhere, and the respective state services know almost all there is to know about every citizen. It is a huge technological advance of the kind once only described in Orwell’s 1984. In actual fact, there is no difficulty in putting video cameras around a city, uniting the images acquired from them into a single video space, and tracking certain people or cars moving around the city in real time. It sounds like something from a sci-fi novel, but it is fairly easy to implement.
How is computer vision used in various areas?
There are a lot of examples: identifying a criminal in a crowd; tracking the pupils of drivers to monitor their fatigue; AR technologies with software adding details to the scene based on computer vision algorithms; quality control on assembly lines; detecting fights and other deviant behaviour in subway carriages, trains or buses (these are actual tasks solved by companies like Russian Railways).
What are researchers working on now?
On the one hand, there is the problem of contemporary neural networks: creating suitable datasets (labeled image databases) takes a lot of time, and the data acquired is hard to transfer to real-world conditions. On the other hand, the main problem of classic algorithms is their narrow specialization in specific tasks and the difficulty of scaling them, which means that sooner or later they need to be replaced with machine learning and then with neural networks. One of the relevant tasks today is developing algorithms that could solve both classic and new computer vision tasks based on correctly formalized data and analytical descriptions, without any additional training cycles. Thus, we need a certain symbiosis – a combination of the two existing approaches.
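One way to picture that symbiosis is a pipeline where a hand-crafted (classic) feature extractor feeds a learned decision rule, so neither part works alone. The features, data, and threshold-learning step below are all invented to illustrate the shape of the idea, not any specific research result.

```python
def extract_features(image):
    """Classic stage: hand-written, analytical summary of the image."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    contrast = max(pixels) - min(pixels)
    return [mean, contrast]

def learn_threshold(positives, negatives):
    """Learned stage: place a decision boundary on the contrast feature
    halfway between the two classes seen in the training data."""
    lo = max(extract_features(img)[1] for img in negatives)
    hi = min(extract_features(img)[1] for img in positives)
    return (lo + hi) / 2

def predict(image, threshold):
    """Hybrid prediction: classic features, learned threshold."""
    return "object" if extract_features(image)[1] > threshold else "empty"

# Toy data: "object" images have high contrast, "empty" ones are nearly flat.
positives = [[[0, 255], [0, 255]], [[10, 240], [5, 250]]]
negatives = [[[100, 110], [105, 108]], [[90, 95], [92, 94]]]
t = learn_threshold(positives, negatives)
print(predict([[0, 200], [0, 220]], t))  # object
```

The classic stage contributes the formalized, analytical description; the learned stage contributes the part that would be painful to tune by hand – which is the division of labour the interview argues for.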
How can this technology change our lives in the future?
Thanks to computer vision, we will see more and more of our routine tasks automated. Instead of security staff tracking a person across dozens of cameras, it will be possible to simply select them at the entrance and trace their whole route. In flexible digital production, no manual software tweaking or resetting will be needed – it will all be done automatically. Unmanned cars, trains, and trucks are no longer a fantasy but a reality; mass implementation of these technologies is just a question of time. The number of people doing routine jobs will definitely decline. It turns out the sci-fi novelists were right: whether we like it or not, we cannot stop the advance of technology.