Alexander Belozerchik has been working in the field of speech technologies for some 20 years, having entered it right after he graduated from Herzen State Pedagogical University’s Faculty of Physics and Astronomy in 1995. That said, he’s had to master the lion’s share of technologies and methods on his own. Despite his long history of collaboration with the Speech Technology Center and his extensive expertise, Alexander decided to get another degree and applied for the Speech Information Systems Master’s program at ITMO University.

Can you start by telling us about your profession? What is speech technology and what exactly do you do?

I’ve been working at the Speech Technology Center for almost 20 years now. Before that, I also worked in the field of speech technologies, but at a forensic laboratory, where I examined phonograms of speech signals. We analyzed the recordings and identified the speakers, examined the phonograms for signs of editing, and solved other diagnostic tasks.

I was invited to join the Speech Technology Center as an expert on identifying people by their voices. At that time, there were no neural networks or machine learning applications. Speech technologies worked in a different way: we would extract signals and features that made clear sense from the point of view of physics and study them using conventional methods of spectral analysis. The models and methods for making decisions at that time were completely different, too.

Over time, the company developed and new in-demand products started to appear – this created a need for not only experts and programmers but also other IT specialists: technical writers, product managers, analysts, tech support specialists. Thus, my work at the Speech Technology Center organically grew into product-oriented activities. 

Speech Technology Center. Credit: social media

I’ve dealt with many of the company’s products, but I specialize in multichannel recording systems and speech data analysis. Previously, these were just recorders, but now they are entire hardware and software complexes that carry out high-load recording of speech data, spanning hundreds and even thousands of data channels, along with its special processing and analysis. In terms of their software architecture, these are rather advanced systems using speech recognition, natural language processing, and analysis of texts derived from the recognized speech. Such systems are needed, for example, at major call centers which get hundreds of thousands of calls a day: all these calls are registered, analyzed, and used to make various managerial decisions, such as improving the quality of work or introducing new services.

I work with speech technologies as applied to the analysis of customer service within the framework of mass-market products such as bank loans and mobile services.

How did you initially get into this field?

To some degree, it was by luck, but who knows what makes a coincidence and what makes a pattern? I’m very grateful to Herzen State Pedagogical University – it had a really good department of experimental physics. And those who gravitated towards research had a wealth of opportunities to hone their research skills using adequate experimental facilities. It was there, at Nikolay Divin’s laboratory, that I started doing acoustic research.

Credit: shutterstock.com

Of course, at the time I didn’t ask myself questions about how the speech apparatus is arranged in terms of physiology, how it connects to the brain, and what speech is as a phenomenon. I just focused on acoustics and defended my thesis on thermoacoustic autogeneration. Based on the results of this work, I was recommended to one of the forensic laboratories in the city, where I started to focus specifically on speech acoustics and various technical applications related to the study of phonograms of speech signals.

I worked at this laboratory for five years. Naturally, we all had specific training, but I had to acquire a huge swathe of knowledge on my own via practical work with the material, talking to my colleagues, as well as studying literature on the topic. 

What did your work look like back then, how exactly did you go about analyzing speech?

We worked in special software for spectral study of sound signals. We would input the sounds from ordinary tape cassettes into a computer using analog-to-digital converters and boards. There are special techniques that allow you to identify the speaker on the basis of spectral, temporal, and cepstral features.

The second, but no less important, part of the research was multilevel linguistic analysis covering phonetics, vocabulary, grammar, and prosody. For example, you can guess a lot about a person, their profession, age, social status, and so on based on the words they use, while the way they pronounce certain sounds can give you a fairly confident understanding of their place of birth and of how their language skills were formed.

Speech spectrogram. Credit: izotope.com

With such vast experience and a professional record like yours, why did you decide to return to university studies? Couldn’t you yourself pass for an expert?

In some ways I could, in others I undoubtedly couldn’t. 

When I observed what the young people coming to work at the Speech Technology Center were capable of, I didn’t fully understand what they’d been taught. I felt that they knew something I didn’t, but at the same time, it seemed like they hadn’t heard about some things I considered important. I was very interested in delving into this so that I wouldn’t have unwarranted expectations of the young people coming to the center. I also didn’t want them to perceive me as someone from the past, someone with no understanding of the modern things they take for granted.

What’s more, even though I’d been working in the field of speech technologies for a long time, I didn’t consider my knowledge sufficient; I wasn’t fully confident that I knew everything I needed to know. I wanted to get a real, high-quality education in the field of IT. I toyed with this idea for a couple of years, thinking, “Isn’t it too late for me? I seem to be working well enough without it, I don’t experience any problems.” But if you get it into your head to do something, you’d better do it.

I looked at the curriculum and saw things that were completely new to me: everything concerning data analysis and machine learning. We, of course, weren’t taught this back in the day, and now it is something of a must-have.

So, Master’s studies became something of a challenge for me. What’s more, I also needed to draw a logical line under my professional development. Imagine that you’ve been doing something your whole life, you have lots of skills and achievements, but you can’t consider yourself a fully fledged specialist because you haven’t gotten any comprehensive training in this field, one that would form a definitive set of knowledge, competencies and skills. Otherwise, you risk remaining a craftsperson, whereas a professional shouldn’t have any weak spots or gaps in their knowledge.

Students of ITMO's Speech Information Systems program

What kind of relationship did you have with your lecturers? Did you ever get a feeling that you knew something better than they did? Or that you knew everything already?

The staff are exceptional – they are all highly qualified specialists, and I’m grateful that I got the chance to attend their courses. 

When it came to modern areas related to machine learning, data analysis, and automatic natural language analysis, I soaked everything up like a sponge, since it was largely new to me. There were other subjects I more or less knew my way around, such as digital signal processing, and I did better at those. In that case, I even helped my classmates with practical tasks.

There were things I had some understanding of from reading relevant literature, but it was still a pleasure for me to take the courses, which allowed me to get a bigger picture, for example, of the advances in speech psychology, the connection between speech and thinking. This knowledge, among other things, is used in the development of speech artificial intelligence – systems that are able to maintain a conversation with a person as if they were another human and not a robot. This is one of the most cutting-edge topics in modern speech technologies. 

Despite the fact that the program is aimed at speech specialists, it also offers a good curriculum for those who focus on multimodal biometrics and face identification. I learned many interesting things in this field, too.

Students of ITMO's Speech Information Systems program

When I was studying at Herzen State Pedagogical University, the educational process was worlds apart from what I saw at ITMO. Now, everything is geared towards working on a computer: electronic learning materials, lab projects. Back in my day, we had only one computer lab, kept under lock and key, with five students taking turns to do their lab work on one computer.

Today’s students have gigabytes of code accumulating on their laptops by the time they graduate – while all I had when graduating was a giant pile of notebooks. But I’m glad that it was like this back in the day and different today, and that I have the opportunity to compare. 

How comfortable was it for you to study alongside people that were much younger than you?

I didn’t experience any psychological discomfort at all; the question never arose. My classmates and I lived regular student lives, helped each other with lab projects, shared lecture notes, and covered for each other if one of us couldn’t attend a class. The lecturers didn’t pay any attention to the age difference either; I didn’t stand out in any way.

These days, people tend to both look and feel much younger than the age indicated in their passport. I think that this applies to me too.

ITMO University

Was it difficult for you to juggle studies and full-time work?

Of course, there were times when I was snowed under with work and didn’t sleep at night because I had work and study tasks to finish. But not only is this possible, it’s the way to do it. There’ll come a time for rest, but while you’re young, you have to grind away and not be afraid of difficulties. I’m all for letting students combine studies and work. Train hard, fight easy, as the saying goes.

What would be your advice to current students?

I’d advise them not to leave everything to the last moment, as often happens. The earlier you start writing your thesis, the better it will pan out and the more interesting the result will be. I was left a bit unsatisfied with my Master’s thesis, simply because I didn’t have enough time for what I’d planned. My defense went okay, but had I started working on my thesis earlier, it would’ve gone much better.

And my main piece of advice to students would be to learn with interest, and be grateful to your lecturers.