Denis Ivanko, a second-year PhD student at ITMO's Department of Speech Information Systems, will go to the University of Ulm to work on speech recognition systems.
"We are developing a system that will recognize speech in both audio and video formats simultaneously: it will read lips and recognize speech recorded with a microphone. We will combine these two modalities to attain the most precise results," shares the researcher.
The system will make use of machine learning and neural networks. The developers already have a prototype, a basic version that they plan to improve further.
According to Denis, the University of Ulm has a strong track record in developing image recognition technologies and dialog-based applications, that is, systems that are already in use and can be applied to the project.
Although speech recognition is a popular field of research, experts state that systems like Google Voice, Siri and the like still have a long way to go. That is why it is so important to teach machines to recognize human speech on a level close to that of humans. In crowded places like the subway, railroad stations or airports, such systems struggle to recognize speech accurately.
"We add image recognition to the common approach so as to distinguish our 'target' speaker from others who talk nearby. Another novelty we use is a high-speed camera. Earlier research with ordinary cameras showed that people talk too fast for them, and much information is lost. This is why we've decided to use a high-speed camera at 200 FPS, which allows us to track a speaker’s lip movements more precisely," shares Denis Ivanko.
The student's work will mostly involve fundamental research, as the project’s developers have yet to create a commercial application.
Another student who will go to the University of Ulm is Alexei Romanenko from the Department of Speech Information Systems. Alexei will conduct research on robust speech recognition for low-resource languages.
Low-resource languages are languages for which language technologies, such as speech recognition, are underdeveloped.
"Let us take Georgian as an example. A lot of people speak it. Yet, there are no decent speech recognition systems for it. Also, there are not enough materials, such as text data or audio files, to create them. My task is to use modern technologies to develop methods for providing reliable, high-quality speech recognition for it," explains Alexei Romanenko.
In Germany, the PhD student will conduct research focused on developing a methodology for creating automatic speech recognition systems for low-resource languages.
Fedor Glushenko, a second-year PhD student at the Instrumentation Technologies Department, will go to the University of Applied Sciences Emden/Leer, where he will focus on automation technology as part of the Industry 4.0 concept.
His research concerns the operation of a molding machine that can mold different items from polymers. The process is quite complex, which makes it a good candidate for automation. German researchers and companies are actively studying and introducing Industry 4.0 technologies aimed at digitalizing enterprises.
"At the university I am going to, I’ll be given access to different manipulators and sensors, and I will try to learn the basic skills of creating networks for such systems. The knowledge I acquire will then be used for developing the laboratory of molding processes," shares the PhD student.
The student's research will mainly focus on developing an automated technological process for manufacturing optical products from polymer materials.