How did you develop an interest in LLMs?
I’ve always been interested in AI and particularly in natural language processing, which I believe is the key to developing intelligent machines that can produce human-like answers. But it took me a long time to get into this field. I have loved drawing robots since childhood, and in my third year at North Ossetian State University I finally took a step from dreams towards reality. I developed a robotic manipulator and won my first grant – 500,000 rubles from the UMNIK program.
After this, I decided to launch a new startup. That’s when I started experimenting with machine learning and neural networks. I launched a mobile app that would monitor the load on the university’s server, used generative adversarial networks to generate particle flight trajectories, and even tried to make a mobile game. At that moment I realized that I was facing a dead end in mobile development – it was time to learn some new approaches.
I gained this knowledge at my first IT job. My university collaborates with the Joint Institute for Nuclear Research in Dubna – and they, in turn, work with CERN (the European Organization for Nuclear Research). At the time, CERN was conducting the ATLAS experiment, which aimed to detect ultraheavy elementary particles in the Large Hadron Collider. I was tasked with developing a database monitoring system for the experiment. Thanks to this experience, I acquired some new skills and decided that I needed to start working with AI as soon as I could.
My first significant achievement in this field was a neural network-based voice assistant for business that I developed with my team. The distinguishing feature of this assistant is that it is based on Siamese neural networks. With only a small amount of data, they recognize user intent by comparing relatively arbitrary commands with those already stored in the system. The assistant can perform various tasks, such as sending emails or connecting the user to online conferences. This startup landed in the top 20 of the annual list of the 1,000 best startups compiled by the federal project Platform for University Technological Entrepreneurship. I also won a 50,000-ruble grant to use Yandex Cloud services and a 1-million-ruble grant in the Student Startup competition run by the Foundation for Assistance to Small Innovative Enterprises.
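The comparison step described above can be illustrated with a minimal sketch. It is not the assistant's actual code: the `encode` function is a hand-written character-trigram stand-in for the trained encoder that a Siamese network would apply to both inputs, and the intents, example commands, and threshold are hypothetical.

```python
import math
from collections import Counter

def encode(text: str) -> Counter:
    """Stand-in encoder: character trigram counts.
    Assumption: a real Siamese network would use a trained neural encoder here."""
    padded = f"  {text.lower().strip()}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical reference commands already stored in the system.
KNOWN_COMMANDS = {
    "send_email": ["send an email", "write a letter to my colleague"],
    "join_conference": ["join the online conference", "connect me to the meeting"],
}

def recognize_intent(command: str, threshold: float = 0.2):
    """Return (intent, score) for the closest stored command,
    or (None, score) if nothing is similar enough."""
    query = encode(command)  # the same encoder is used for both sides
    best_intent, best_score = None, 0.0
    for intent, examples in KNOWN_COMMANDS.items():
        for example in examples:
            score = cosine(query, encode(example))
            if score > best_score:
                best_intent, best_score = intent, score
    if best_score < threshold:
        return None, best_score
    return best_intent, best_score
```

Because new intents are added by storing a few example commands rather than retraining a classifier, this kind of comparison works with very little data per intent.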
Alan Gazzaev with the robot manipulator. Photo courtesy of the author
How did you learn about ITMO? Why did you decide to study here?
In the early years of my studies, my faculty was visited by the staff of ITMO’s School of Physics and Engineering. That’s when I first learned about the university.
In my fourth year, I started looking for a Master’s program where I’d be able to continue to work in generative AI and natural language processing. I found one just like that at ITMO – Deep Learning and Generative AI.
I was able to secure a tuition-free position thanks to the Junior ML Contest, where applicants could present their ML or AI solutions. My app, AIsaacChat, which I developed with funding from the Student Startup competition, was rated highly by the experts. It’s an ensemble of neural networks that can identify a user’s intent, answer queries, and generate images.
What skills did you gain during your Master’s?
My studies here gave me freedom to develop new things and the opportunity to work on what I like. For instance, for one of my lab classes, I combined a language model and a model for automatic recognition of video and speech in Russian into a single system that understands what’s going on in each scene and can describe it. I also learned to work with reinforcement learning, as well as natural language and big data processing. For the first time in a long time, I had to solve equations with pen and paper for the course on coding theory. We had quite a few tasks and problems that took me outside the field I’d been typically involved with.
Alan Gazzaev defending a tech entrepreneurship solution at ITMO. Photo courtesy of the author
How did you start working with MTS?
When I got into my Master’s program, I knew that even though having your own startup is cool, you need real-world experience – so I sent out my CV to different companies. One of them was MWS AI (a subsidiary of MTS), but I forgot to include my contact information in the CV. Their recruiter found me through my Telegram channel, where I wrote about my work, and left me a comment there. At first, I assumed it was a scammer, but it turned out to be a legitimate MWS AI employee. He said that he chose me because I am a student at ITMO, one of the country’s top universities in AI as well as a partner of the company. After a couple of interviews, I was offered the position of junior developer and researcher on the fundamental research team, where I develop and train language models. After a year, I was promoted to mid-level developer and researcher.
In collaboration with MWS AI, you’ve developed Cotype Nano, a compact language model for generating text. What can you tell us about this product?
I am partial to small language models because thanks to their size, they are always at hand – on your smartphone, for instance. That’s why I suggested making such a model when I had just started working there. It was the first open-source Russian-language model by MWS AI; before that, they hadn’t launched any solutions in Russian. Our team tested various hypotheses for two months, but the training itself was quite quick. Its main difference from similar models is its size. Cotype Nano doesn’t require internet access and can be launched on almost any device. This guarantees not only quick access to the solution, but also its safety – no data will leak online.
Cotype Nano can solve various tasks related to text in Russian: generate code for a program, a server, or a robot, write marketing text or financial reports, answer customer queries, and create job descriptions or training materials for employees. I used the RuLLMArena benchmark to evaluate the quality of its work. My model answered 500 questions and an AI “judge” compared its answers with reference samples. Cotype Nano showed one of the best results in its category in Russian – 50.51. The model reached such a high score thanks to two-stage supervised fine-tuning (SFT): in the first stage, it was trained on code and mathematical problems, and in the second – on high-quality instructions. The hardest part of the work was to properly train a small model.
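To give a rough idea of how a judge-based benchmark can turn per-question verdicts into a single number, here is a simplified sketch. It assumes the judge labels each answer as a win, tie, or loss against the reference and that the score is a plain win rate. The `judge_score` function and the verdict labels are hypothetical, and RuLLMArena's actual scoring procedure may differ.

```python
from collections import Counter

VALID_VERDICTS = {"win", "tie", "loss"}

def judge_score(verdicts):
    """Aggregate per-question judge verdicts into a 0-100 score.

    Assumption: a win counts as 1 point, a tie as 0.5, a loss as 0,
    and the score is the average multiplied by 100.
    """
    counts = Counter(verdicts)
    unknown = set(counts) - VALID_VERDICTS
    if unknown:
        raise ValueError(f"unexpected verdicts: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no verdicts to aggregate")
    return 100.0 * (counts["win"] + 0.5 * counts["tie"]) / total
```

Under this assumption, a score near 50 would mean the model's answers are judged about as good as the reference answers on average.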
In November 2024, MWS AI successfully tested Cotype Nano on their set of problems and made it available for business. Since then, the model has been downloaded over 35,000 times, including 5,000 just in the past month.
In June 2025, you received the national award Priority: Digital – 2025 for Cotype Nano. What does this award mean to you?
We had some serious competition for the award – we were pitted against solutions from leading Russian tech companies, including Sberbank, VTB, Alfa-Bank, Nornickel, and Moscow’s IT Department. It’s a great honor to have won in the Digital Breakthrough category and a success on the national level.
After receiving the award Priority: Digital – 2025, Alan Gazzaev was invited to a session of the parliament of the North Ossetia-Alania Republic.
What’s next on your agenda?
I am still working at MWS AI and have recently graduated from my Master’s program. My thesis was named among the best ones at ITMO’s Institute of Applied Computer Science. In it, I developed an algorithm for additional training of a language model that would equip it with the ability to deliberate. Currently, ChatGPT takes some time to “think” over a question before producing one solution. But what if we could teach the model to come up with four solutions instead? This way, we’d be able to not only solve math problems, but also validate startup hypotheses or help robots map their routes.
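One simple way to make use of several candidate solutions, once a model has produced them, is to pick the answer that most of them agree on. The sketch below shows only that selection step. It is not the training algorithm from the thesis, and the `pick_by_majority` function and the sample answers are hypothetical.

```python
from collections import Counter

def pick_by_majority(candidates):
    """Pick the most frequent final answer among candidate solutions.

    Returns (answer, agreement), where agreement is the share of
    non-empty candidates that gave this answer, or None if there are none.
    """
    normalized = [c.strip().lower() for c in candidates if c.strip()]
    if not normalized:
        return None
    answer, votes = Counter(normalized).most_common(1)[0]
    return answer, votes / len(normalized)

# Four hypothetical final answers to the same math problem.
result = pick_by_majority(["42", "42 ", "41", "42"])
```

The agreement share gives a rough confidence signal: when the four solutions disagree, the question may need another attempt or a human check.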
