Scientist Evgeny Belyaev on the Challenges of Video Encoding, Limits of Compression, and Future of Streaming
Evgeny Belyaev has been working on various projects that deal with video encoding for over a decade. In this interview with ITMO.NEWS, he talks about the nature of video encoding, working in this field, and how it feels to return home after years abroad.
What is it that you do?
My research interests lie in the field of video data compression and transmission and, in general, all of my work falls into this category. There are many different challenges and tasks in this area: one such task is to perfect the quality of current video compression algorithms, which means working on developing video formats such as MPEG-2 and MPEG-4. Currently, H.264 or MPEG-4 Part 10, Advanced Video Coding (H.264/AVC) is the most frequently used video coding standard.
It was released in 2003 and is present in almost every existing device and video streaming service. There is an unspoken rule that every 10 years a new standard emerges that is twice as good as the previous one, meaning that it lets us compress a video to half the size while retaining the same visual quality. In 2013, the H.265 High Efficiency Video Coding (H.265/HEVC) standard was released, and it will supplant H.264/AVC in the near future. However, even before 2013, research was already under way on the next standard, and right now H.266, known as Future Video Coding (H.266/FVC), is under development. I also took part in this initiative, working on improving some of the algorithms that make up these codecs. In 2013, I participated in the Grand Video Compression Challenge (San Jose, CA, USA), where I received a finalist award for reducing the size of H.265/HEVC video bit streams by about 0.5-1.5% without an increase in complexity.
How are these codecs and algorithms evaluated?
There are a number of standard tools used to evaluate these algorithms. They are all publicly available and are usually test videos of different types that vary in resolution and content: some are more static, while others involve a lot of active movement. If an algorithm doesn’t affect the quality of the video, then only the compression is evaluated, but if the algorithm changes both the compression and the quality, then we build rate-quality curves and use them to compare the codecs. Obviously, if a codec achieves higher compression at the same quality, it is the better one.
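As a rough illustration of such a comparison, the sketch below measures quality with PSNR (an assumption; the interview does not name a metric) and interpolates each codec's rate-quality curve to find the bitrate needed for the same quality. The function names and the measurement points are hypothetical and only serve the example.

```python
import math

def psnr(original, decoded, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    return float("inf") if mse == 0 else 10 * math.log10(peak ** 2 / mse)

def rate_at_quality(curve, target_psnr):
    """Linearly interpolate the bitrate needed to reach target_psnr.

    curve is a list of (bitrate_kbps, psnr_db) points sorted by bitrate.
    """
    for (r0, q0), (r1, q1) in zip(curve, curve[1:]):
        if q0 <= target_psnr <= q1:
            t = (target_psnr - q0) / (q1 - q0)
            return r0 + t * (r1 - r0)
    raise ValueError("target quality outside the measured range")

# Hypothetical rate-quality points for two codecs (not real measurements).
codec_a = [(500, 32.0), (1000, 35.0), (2000, 38.0)]
codec_b = [(250, 32.0), (500, 35.0), (1000, 38.0)]

# Fraction of bitrate saved by codec B at the same quality of 36 dB.
saving = 1 - rate_at_quality(codec_b, 36.0) / rate_at_quality(codec_a, 36.0)
print(f"Codec B saves {saving:.0%} of the bitrate at 36 dB")
```

With these made-up points, codec B needs half the bitrate of codec A at equal quality, which is the "two times better" relationship between successive standards mentioned earlier.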
What is it that helps make algorithms more efficient?
One of the most important parts of an encoding algorithm is block-based motion compensation. A simplified version of how video encoding works is this: we know that adjacent frames are similar, so we divide a frame into blocks and then search for similar blocks in the previous decoded frame. Then, instead of compressing each block itself (as in JPEG), we compress the difference between the current block and a similar block from the previous frame, as well as the displacement, or motion vector, between these blocks. In MPEG-2, this search was conducted only for 16x16 blocks, while in H.264 it can already be done for 16x16, 8x8, 8x4, 4x4 blocks and so on. This allows us to better compress different moving objects, but it requires more computational resources to select the best block size for each part of a frame.
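The block search described above can be sketched as follows. This is a minimal full-search example that assumes the sum of absolute differences (SAD) as the matching cost and a fixed block size; real encoders use faster searches and several block sizes. All function names here are hypothetical.

```python
def sad(cur, ref, cy, cx, ry, rx, n):
    """Sum of absolute differences between two n x n blocks."""
    return sum(abs(cur[cy + i][cx + j] - ref[ry + i][rx + j])
               for i in range(n) for j in range(n))

def motion_search(cur, ref, by, bx, n, radius):
    """Full search in ref for the block of cur whose top-left corner is (by, bx).

    Returns (dy, dx, cost) for the displacement with the lowest SAD.
    """
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ry, rx = by + dy, bx + dx
            if 0 <= ry <= h - n and 0 <= rx <= w - n:
                cost = sad(cur, ref, by, bx, ry, rx, n)
                if best is None or cost < best[2]:
                    best = (dy, dx, cost)
    return best

def residual(cur, ref, by, bx, n, dy, dx):
    """Difference block that would be compressed instead of the block itself."""
    return [[cur[by + i][bx + j] - ref[by + dy + i][bx + dx + j]
             for j in range(n)] for i in range(n)]

# Toy 8x8 reference frame, and a current frame shifted 2 rows down, 1 column right.
ref = [[y * 16 + x for x in range(8)] for y in range(8)]
cur = [[ref[(y - 2) % 8][(x - 1) % 8] for x in range(8)] for y in range(8)]

dy, dx, cost = motion_search(cur, ref, by=4, bx=4, n=4, radius=2)
print("motion vector:", (dy, dx), "SAD:", cost)
```

In this toy case the search finds an exact match, so the residual block is all zeros and only the motion vector carries information.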
Is there a hard limit for video compression? It isn’t really possible to compress a video forever, right?
I would say that a limit is not really known, because there are no comprehensive models of a video source; it can only be estimated. There is only circumstantial evidence that each new standard is getting harder and harder to achieve. At some point, we will either have to invent something completely new to revolutionize the field or we will have to focus on specific applications, where some additional knowledge about a video source is available. For example, I'm working on the task of inspecting various types of infrastructure with aerial drones. A camera is mounted on a drone and the drone creates a map of a certain area. The drone flies in a zigzag pattern and the images overlap when the drone returns to where it has previously been. This additional knowledge can be used to improve the compression rate in comparison with conventional coding as in H.264 or H.265. Unlike in movies, we can form a map while we are recording and use it as a reference for the motion compensation part of the codec. Moreover, if such inspections are scheduled regularly, we can assume that the basic infrastructure of an area has remained the same and simply use a previously created map as a reference. My preliminary experiments show that this allows us to reduce the video bitrate to half of what traditional coding requires.
What are some other examples of specific video encodings?
Another interesting property of the conventional codecs is that they are all very sensitive to the loss of information. You must have noticed it when watching digital TV: when a frame is missing, you can see artifacts that look like colored squares. In these cases, the abovementioned motion compensation becomes a nuisance, because even if only a small part of the frame is missing, the error propagates quickly through the following frames until the next keyframe is received. Therefore, if the video is being transmitted over a very unreliable channel, as in car-to-car communications, other video coding algorithms should be used.
You can see in this GIF that with only 1% of packet loss the quality of the standard H.264 codec (below) is comparable to a specific codec for this task (above), but for larger packet losses (10% and 20%) it becomes noticeable that H.264 cannot manage. Usually, there are algorithms for resending lost data, which is why you don’t see this kind of problem on video-on-demand services like YouTube, but when a user is sending data to multiple users or in the case of video conference services like Skype, retransmission is not feasible and we have to manage the video data losses.
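The way a single loss spreads until the next keyframe can be illustrated with a deliberately simplified model. The sketch assumes that every non-key frame is predicted only from the frame before it and that a loss affects a whole frame; the function name and parameters are hypothetical.

```python
def corrupted_frames(num_frames, keyframe_interval, lost):
    """Return the indices of frames that cannot be decoded correctly.

    Simplified model: frame i is a keyframe when i % keyframe_interval == 0,
    and every other frame depends on the frame immediately before it.
    """
    corrupted = []
    broken = False
    for i in range(num_frames):
        if i % keyframe_interval == 0:
            broken = False  # a keyframe does not depend on earlier frames
        if i in lost:
            broken = True   # the loss damages this frame and all that follow
        if broken:
            corrupted.append(i)
    return corrupted

# One lost frame out of 12, with a keyframe every 6 frames.
print(corrupted_frames(12, 6, {2}))
```

In this model, losing frame 2 damages frames 2 through 5, and the picture recovers only at the keyframe in position 6.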
We have developed a video encoding algorithm based on the wavelet transform, which is used in the JPEG2000 format, for instance. Instead of block-based motion compensation, we apply a 1-D wavelet transform to a group of frames in the temporal direction, and then a 2-D spatial wavelet transform to each frame. Then we encode and transmit each wavelet sub-band separately from the others. This helps us avoid error propagation even at high data loss rates. This codec does not provide a better compression ratio than codecs that use block-based motion compensation, but it is preferable when the communication channel is very unreliable.
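The order of the two transforms can be sketched with a single-level Haar wavelet. The choice of Haar, the single decomposition level, and all names below are assumptions made for illustration; the interview does not say which wavelet or how many levels the actual codec uses.

```python
def haar_1d(values):
    """One-level Haar transform of an even-length list: (averages, differences)."""
    low = [(values[i] + values[i + 1]) / 2 for i in range(0, len(values), 2)]
    high = [(values[i] - values[i + 1]) / 2 for i in range(0, len(values), 2)]
    return low, high

def transform_columns(matrix):
    """Apply haar_1d down every column and return (low, high) matrices."""
    pairs = [haar_1d(list(col)) for col in zip(*matrix)]
    low = [list(row) for row in zip(*[p[0] for p in pairs])]
    high = [list(row) for row in zip(*[p[1] for p in pairs])]
    return low, high

def haar_2d(frame):
    """One-level 2-D Haar transform: rows first, then columns."""
    rows = [haar_1d(row) for row in frame]
    row_low = [r[0] for r in rows]
    row_high = [r[1] for r in rows]
    ll, lh = transform_columns(row_low)
    hl, hh = transform_columns(row_high)
    return {"LL": ll, "LH": lh, "HL": hl, "HH": hh}

def temporal_then_spatial(frames):
    """1-D Haar across pairs of frames in time, then 2-D Haar on each result."""
    h, w = len(frames[0]), len(frames[0][0])
    t_low, t_high = [], []
    for k in range(0, len(frames), 2):
        a, b = frames[k], frames[k + 1]
        t_low.append([[(a[y][x] + b[y][x]) / 2 for x in range(w)] for y in range(h)])
        t_high.append([[(a[y][x] - b[y][x]) / 2 for x in range(w)] for y in range(h)])
    return {"t_low": [haar_2d(f) for f in t_low],
            "t_high": [haar_2d(f) for f in t_high]}

# Two identical flat 4x4 frames: all energy ends up in one sub-band.
frames = [[[8] * 4 for _ in range(4)] for _ in range(2)]
subbands = temporal_then_spatial(frames)
print(subbands["t_low"][0]["LL"])
```

Each entry of the returned dictionaries is one sub-band that could be encoded and sent on its own, so losing one of them degrades the picture without affecting how the others are decoded.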
You have worked both in Russia and in Europe. How did you get into this field, and does it make a difference where you work?
Russia is part of an international community and adopts the same European standards. However, there are not many publications on video compression coming from Russia. When I visit various conferences on the topic, I often hear Russian names and meet Russian people, but they are all usually affiliated with European and American institutes or companies.
When I was a student at the St. Petersburg State University of Aerospace Instrumentation (SUAI), we had a number of very prominent scientists working in Information Theory there. In the USSR, there were two schools in the field: one at the Institute for Information Transmission Problems in Moscow, and another here at SUAI. When I was a student, I was offered a job at my department that consisted of creating a specific video encoding algorithm for a Russian-American company, which is how I got into this field.
After my candidate of science thesis defense at SUAI, I moved to Finland for work, and then to Denmark. It is very comfortable to work there: there is almost no paperwork, everything is very well organized, and there are instructions for everything. The first thing I noticed there was the amount of free time that I now had and how convenient everything was. Their culture is also very different from ours: for instance, being ten minutes late is considered very impolite and so is not keeping your promises, even the small ones. I have visited the USA and China, and I have to say that of all the places I have seen, Scandinavia and Finland are the most noticeably different; it’s like a different world.
What was the reason for your visit to China? What was it like?
I have received several grants from the National Natural Science Foundation of China, which works with young scientists from all over the world. I have a working relationship with a team at Xidian University, where I work on arithmetic coding, which is an important part of video compression. Usually, I visit China for two to four weeks every couple of years. In China, there is a problem with communication, since the vast majority of local people, except for researchers in the universities, do not speak English. Moreover, the local food is very different from European cuisine. On the other hand, there are a lot of interesting places to see, and the pace of change surprises me each time I visit. I think it is hard to deny that Chinese scientific and technological work is very prominent in the world nowadays.
Now you work at ITMO University. What led you to return to Russia and what made you choose ITMO?
Moving abroad gives a person a lot of experience. You understand that the world isn’t really that big and that you can move and work anywhere, that people, though very different, have more in common than you can imagine. I learned a lot there: how to write papers, how to learn languages, and I even got a PhD. But there are disadvantages. It’s much harder to communicate with relatives and friends, and there is no familiar food. Though one can easily survive without any one of these things, when you are suddenly deprived of them all, you really start to miss home.
At some point, I figured I had learned everything there was to learn, but the disadvantages never went anywhere. So I decided to return home.
I began searching for a place to return to. From the media, I learned about the 5-100 Russian Academic Excellence Project and looked up the sites of the universities. On most of these sites, I didn’t find proper instructions for someone in my position abroad; only those of SFU and ITMO were easy to understand. My previous experience was with European universities, where it was really simple, and only at ITMO University was it just as easy. Also, two of the professors I had worked with at SUAI had moved to ITMO, so it was really nice to be able to come back to them and to establish contact with ITMO’s scientific groups.
Currently, I am working on drone video encoding, arithmetic compression and compressive sensing. ITMO University has great conditions for conducting research; here, it is possible to focus on your work and do it well, and this is exactly what I was looking for.