Nine students from different universities who had passed a qualifying test on neural networks took part in the first Algorithm summer school. They came to St. Petersburg from different cities to master the fundamentals of machine learning under the guidance of experienced practitioners in the field.
"The test was to create a system for recognizing road signs in images, based on an existing database that has been publicly available online since 2012. The system had to detect a road sign in an image and then classify it. We were free to choose any libraries and approaches; only the result mattered. After completing the task, we were invited to an interview, where several of us were selected for the program. I think the test task was highly relevant, as major companies have been focusing on driverless car systems for the past two years, and road sign recognition is an important part of that. In other words, we were given a truly relevant problem to solve right from the start," says Ivan Kremnev from ITMO's Department of Advanced Mathematics.
Algorithm's participants
After the list of the school's participants was published, the organizers invited the students to work on a real project from STC (Speech Technology Center). As part of the summer school's program, they were to build a speech recognition system working on audio, on video, and on a combination of the two. For their experiments, they used STC's database of videos showing people pronouncing numbers in Russian. They were also given a baseline system to familiarize themselves with speech recognition technology. The main task was to improve this baseline system so as to minimize recognition errors using both lip movements and voice.
"We've been collaborating with ITMO University for five years already; for instance, we co-organized the Department of Speech Information Systems. When the idea of a summer school came up, we didn't hesitate for a second. We already had an affiliated department, lecturers, a venue, and equipment. This school is not our first experience in education. For many years we've been searching for the right formats, holding contests, workshops, and so on, but this is our first program in a school format. And I have to say it has been a really successful one, as it lets us teach every student, spark their interest, and assess them objectively over the course of our work together," comments Kirill Levin, Head of STC's Research Department.
The participants worked in groups of three. Each team took a different approach to teaching the system to recognize what a person said (from a vocabulary of 10 words, all of them numbers spoken in Russian). For instance, one team relied on a paper on audio-visual synthesis of Russian speech, which states that lip positions can be grouped into 14 visemes (visual phonemes) that cover all the sounds a person pronounces. So the team began by reorganizing the database provided by the organizers according to these visemes.
"After a week and a half of study, during which we tried to do everything on our own, we had two days to build a working system. Some teams built systems that took in data and returned words directly. Our system, by contrast, returned phonemes, that is, individual sounds. Then, following the classical approach, we had to use a decoder to convert these phonemes into words. This step didn't require machine learning," says Ivan Kremnev.
Upon completing the program, the students received certificates, and the top three were invited to continue working on the project as employees of STC's Research Department. According to the organizers, the company is also ready to consider hiring the other participants; in the future, they plan to make the school an annual event.
Kirill Levin
"The skills you have now are what allow you to adapt to modern reality. STC takes this very seriously: people adapt easily, but companies do not. That is why we are now starting to work on AI-related technologies such as chatbots and Big Data, not just speech and image recognition. Together, all of this will one day make up artificial intelligence," explains Kirill Levin.