Speech Information Technology Programs at ITMO University

Experts claim that by 2020, half of all online requests will be made by voice, and a third of them won’t require any interaction with a display. Speech information systems are becoming more sophisticated and are gaining popularity with ordinary users. But what is this field of computer science really about, and where can one get a relevant education? Read on to learn more about ITMO’s Department of Speech Information Systems, which was launched in collaboration with Speech Technology Center, an international company that focuses on speech information technologies.

What is the reason behind speech systems’ popularity?

Speech information technologies don’t just recognize and analyze human speech; they can also synthesize it. The field covers speech imitation, context analysis, text-to-speech conversion, and all kinds of tasks associated with voice biometrics. It is considered one of the most difficult areas of computer science, as it stands at the intersection of several complex disciplines: linguistics, mathematics, and programming. Nevertheless, it is also one of the most rapidly developing.

The reasons behind their popularity are quite obvious. An average person can type about 40 words per minute but speaks three to four times faster. What is more, by 2017, programs had attained a level of speech recognition almost on par with that of humans. According to the Internet Trends Report, the accuracy of speech technologies has increased from 70 to 95 percent over the past four years.
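To put these figures in perspective, here is a minimal arithmetic sketch in Python. It assumes nothing beyond the numbers quoted above; the function names are hypothetical and chosen only for illustration, and treating "error rate" as 100 minus accuracy is a simplifying assumption.

```python
# Figures quoted in the article.
TYPING_WPM = 40           # average typing speed, words per minute
SPEECH_SPEEDUP = (3, 4)   # speech is "three to four times faster"


def speaking_rate_range(typing_wpm, speedup):
    """Return the (low, high) speaking rate implied by the typing speed."""
    low, high = speedup
    return typing_wpm * low, typing_wpm * high


def error_rate_reduction(old_accuracy, new_accuracy):
    """Return how many times the error rate shrank (accuracies in percent).

    Assumption: error rate is simply 100 minus accuracy.
    """
    old_error = 100 - old_accuracy
    new_error = 100 - new_accuracy
    return old_error / new_error


low_wpm, high_wpm = speaking_rate_range(TYPING_WPM, SPEECH_SPEEDUP)
reduction = error_rate_reduction(70, 95)

print(f"Implied speaking rate: {low_wpm} to {high_wpm} words per minute")
print(f"Error rate shrank by a factor of {reduction:g}")
```

Under these assumptions, the implied speaking rate is 120 to 160 words per minute, and the move from 70 to 95 percent accuracy corresponds to the error rate dropping from 30 to 5 percent, a sixfold reduction.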

Where does one learn to develop speech systems?

ITMO University has offered training in speech information technologies since 2011, when its Department of Speech Information Systems was founded with the help of Speech Technology Center (STC). For over twenty years, this established company has been developing innovative systems for high-quality recording, processing and analysis of audiovisual information, as well as speech recognition systems. STC’s solutions have been introduced in more than 70 countries, including the USA and countries of Latin America, the Middle East, and Europe.

Key features of ITMO’s educational program

Practice-oriented education

The program’s lecturers are leading specialists from Speech Technology Center JSC and professionals from other renowned scientific and commercial organizations. From the very beginning of the program, Master’s students can participate in real projects. The best students automatically become the company’s interns and take part in project work under the guidance of its experts.

Internships at STC closely mirror a real job: interns are given a workplace and access to the company’s systems, and they are paid for successfully completed projects.

Vladimir Kabanov, senior lecturer at the department, stresses that students get the opportunity to work in teams on real commercial projects throughout the program. What is more, the internship does not interfere with their studies in any way.

Collaborations with international universities and research centers

Apart from interning at STC, the program’s Master’s students can also intern at international universities through a collaboration with Ulm University (Germany) within the framework of the DAAD’s Leonhard Euler Program. Yuri Matveev, head of the Department of Speech Information Systems, notes that professors from Ulm University visit the department each semester to listen to the students’ reports; the most successful students are invited to continue their research in Germany as part of their pre-graduation internships.

Best interns get employed by STC

Students can be hired by the company as early as the end of their first year of the Master’s program. This way, they continue to study while already working for STC.

For instance, this is how Dmitry Ubskiy, once an intern at the company and now part of a team working on promising new research, joined STC.

“When Dmitry came to work with us, he was a total newbie. At first, we gave him tasks on adjusting our systems, and now he is already part of a team that is working on promising research projects and developing new speech recognition algorithms,” shares Kirill Levin, head of STC’s R&D Department. “By now, he’s already completed his Master’s program, but he continues to study and do research; he’s recently started a PhD program and has also presented two reports at major international conferences. We are also considering sending him abroad for an internship, and we’ve already agreed on a double degree program in Germany.”

Students can get their PhD degrees at European universities

The department collaborates with leading universities in Germany, France, Finland, the Czech Republic and Italy on both Master’s and PhD programs. For instance, it already runs double degree PhD programs with the University of Eastern Finland in Joensuu (Finland), Université du Maine in Le Mans (France), the University of West Bohemia (the Czech Republic) and Ulm University (Germany).

Which current STC projects do ITMO students participate in?

From the first year of their Master’s program, students of the department who have passed the company’s testing get the opportunity to participate in its projects under the guidance of experienced employees. According to Kirill Levin, these are mostly projects in STC’s traditional areas of focus, such as automatic speech recognition and synthesis, as well as voice biometrics.

NeuroEar: a project to help robots with spatial orientation

This April, STC launched its new NeuroEar project. Its main purpose is to develop a common platform for machine hearing technology.

“In essence, it is all about describing the sound scene. For example, if you close your eyes and someone asks you to describe what is happening around you, you can do so based on the sounds you hear. You will say that you hear the noise of the street and my voice; you will be able to describe what I am talking about and my emotional state, and discern some additional sounds. We want to create a platform that will be able to do that automatically,” explains Kirill Levin. “What do we need this for? For instance, say you want to build a robot. You may well be proficient in kinematics and mechanics and develop the computer intelligence that will control the robot’s movements, but you will still have to give your robot the ability to perceive its environment and assess the situation it’s in: where to turn its head, where to go, what to answer if it’s asked a question. Today’s developers don’t really have much choice in this matter: they take a speech recognition technology from one developer and a spatial orientation technology from another, but having to search for those can really slow down the development process. To avoid that, we want to create a common platform that will cover all those issues.”

What sets the platform’s development apart is that it will rely on more than just machine learning technologies. For this project, STC’s specialists will collaborate with various institutes of the Russian Academy of Sciences, as well as other research centers, such as the Kurchatov Institute and the acoustics laboratories of medical institutions. The joint efforts of IT specialists, biologists and medical specialists will help to understand how exactly the human brain processes auditory information.

The project has just been launched, and the company is actively assembling the research team.

“We have a lot of open positions. We are looking for young people who know the basics of machine learning and are well-versed in mathematical statistics and probability theory, that is, who have a solid foundation. Most importantly, we are looking for people who are not afraid to learn new things, as they will have to solve unconventional problems,” say the company’s representatives.

STC’s contest on speech synthesis

At the Find IT event in April, STC’s specialists showed how to do simple speech synthesis in just an hour, and in mid-June the company will launch a major speech synthesis contest; both teams and individual participants are welcome. A special jury will judge the projects solely on the quality of speech synthesis. Participants will get basic instructions from the company’s experts and will have to improve their systems to reduce noise and make the results as close to a human voice as possible. The contest winner will receive a cash prize of 100,000 rubles.

Elena Nikisheva, Vladimir Kabanov and Kirill Levin

In May, STC will also launch a series of meetups for IT specialists, where representatives of different companies and students will get an opportunity to share experience and establish useful contacts. The first event will be dedicated to high-load systems. According to Elena Nikisheva, the company’s director for human resources, such meetups will take place twice every three months.

Elena Menshikova

Vasilii Perov
