Crucial skills for AI engineers today

Understanding how a neural network works under the hood. How does it analyze the prompt and the context? How does it first process the entire prompt and then generate the answer token by token (prefill and decode)? How does it serve many requests at once (continuous batching) and avoid recomputing the shared parts of similar prompts by keeping them in a cache (distributed prefix caching)? These are the insights that help engineers develop and optimize high-load services.
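To make these ideas more concrete, here is a minimal Python sketch of prefill, decode, and prefix caching. All names are hypothetical toy stand-ins: real inference engines cache key/value tensors per block of tokens rather than strings, but the accounting below shows why a shared prefix only has to be computed once.

```python
# Toy illustration of prefill/decode with block-level prefix caching.
# Names and the string-based "state" are hypothetical stand-ins; real engines
# cache key/value tensors per block of tokens, not strings.

BLOCK = 4  # cache granularity, in tokens

class ToyLLM:
    def __init__(self):
        self.prefix_cache = {}    # tuple of tokens -> cached "KV state"
        self.computed_tokens = 0  # how much prefill work we actually did

    def prefill(self, prompt: str):
        tokens = tuple(prompt.split())
        # Find the longest block-aligned cached prefix of this prompt.
        hit = 0
        for n in range(BLOCK, len(tokens) + 1, BLOCK):
            if tokens[:n] in self.prefix_cache:
                hit = n
        # "Compute" only the tokens that were not already cached.
        self.computed_tokens += len(tokens) - hit
        # Store every block-aligned prefix so future requests can reuse it.
        for n in range(BLOCK, len(tokens) + 1, BLOCK):
            self.prefix_cache[tokens[:n]] = f"state({n} tokens)"
        return tokens

    def decode(self, state, max_new_tokens: int = 3):
        # Decode generates the answer one token at a time from the prefilled state.
        return [f"<token {i}>" for i in range(max_new_tokens)]

llm = ToyLLM()
llm.decode(llm.prefill("you are a travel assistant . user : what to see at lake baikal ?"))
llm.decode(llm.prefill("you are a travel assistant . user : what to see in kamchatka ?"))
print(llm.computed_tokens)  # less than the total token count: the shared prefix was reused
```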

Optimizing neural networks at the hardware level. Even the most powerful and expensive GPU, a specialized processor designed for parallel data processing, will run a workload 90 times slower if the engineer hasn't optimized the code. In practice, this means that training a neural network will take not one month but three, which is critical when the AI market changes literally every day. Engineers need to understand how the hardware they work with is structured: how the memory hierarchy and caches function, how thousands of computations run in parallel, and how the code they write maps onto the GPU itself. They also need to be familiar with the specialized languages and instruction sets used to program the GPU directly (for example, CUDA and PTX), with the NCCL library, which links multiple GPUs together, and with algorithms such as FlashAttention, which significantly speed up key operations in neural networks.
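As a small illustration of why such hardware-aware kernels matter, the sketch below (assuming PyTorch 2.x is available; the actual speedup and memory savings depend on the GPU) contrasts a naive attention implementation that materializes the full attention matrix with torch.nn.functional.scaled_dot_product_attention, which can dispatch to a fused FlashAttention-style kernel on supported hardware.

```python
# Rough comparison of naive attention vs. a fused kernel (assumes PyTorch 2.x).
# The exact speedup and memory savings depend on the GPU and sequence length.
import torch
import torch.nn.functional as F

def naive_attention(q, k, v):
    # Materializes the full (seq x seq) attention matrix in memory.
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    return torch.softmax(scores, dim=-1) @ v

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32
q, k, v = (torch.randn(1, 8, 2048, 64, device=device, dtype=dtype) for _ in range(3))

out_naive = naive_attention(q, k, v)
# May dispatch to a fused FlashAttention-style kernel on supported GPUs,
# avoiding the quadratic intermediate matrix and the extra memory traffic.
out_fused = F.scaled_dot_product_attention(q, k, v)

print((out_naive - out_fused).abs().max())  # outputs agree up to floating-point error
```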

Designing distributed systems. If you want to deploy a neural network for a million users, you will inevitably run into a problem: energy consumption can grow to dozens of gigawatts, and no country in the world has that much spare capacity. That's why it's important to know how to optimize neural networks at the level of their architecture and infrastructure.

To increase a model's overall capacity without making every request more expensive, engineers use special techniques. For example, they apply the Mixture-of-Experts architecture, where instead of one huge model, tens of thousands of small specialized submodels, or "experts," are used. For each token it processes, the neural network automatically selects and activates only the experts whose competencies are relevant to the task.
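A minimal sketch of this routing idea, in plain NumPy and not tied to any production implementation, might look like this: a small gating network scores all experts for a given token, and only the top-k of them are actually computed.

```python
# Minimal sketch of Mixture-of-Experts routing (illustrative toy, not a production design).
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 8, 2

# Each "expert" is a tiny feed-forward weight matrix.
experts = [rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts)) * 0.1  # gating network

def moe_layer(token: np.ndarray) -> np.ndarray:
    # The router scores every expert for this particular token...
    logits = token @ router
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    # ...but only the top-k experts are actually activated and computed.
    chosen = np.argsort(weights)[-top_k:]
    out = np.zeros(d_model)
    for idx in chosen:
        out += weights[idx] * (token @ experts[idx])
    return out

token = rng.standard_normal(d_model)
print(moe_layer(token).shape)  # (16,): same output size, only 2 of 8 experts computed
```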

For instance, if you send Alice the prompt "I am travelling to Lake Baikal. What should I see there?", within a few seconds you'll get a list of beautiful spots complete with descriptions, images, and routes. In practice, such a distributed system is 30–50 times more efficient than a conventional neural network running on a GPU.

Skills at the intersection of machine learning and infrastructure. These include, for example, attention optimization (multi-query attention and cross-layer attention), which helps process long contexts more efficiently and reduces the load on computational resources, as well as speculative decoding, which speeds up replies by drafting several tokens ahead and verifying them in parallel. Knowledge of these tools allows engineers to create fast, high-quality models that take infrastructure limitations into account.
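To illustrate speculative decoding, here is a simplified greedy variant with toy deterministic "models" standing in for a small draft model and a large target model (both functions are hypothetical): the draft cheaply proposes several tokens, and the target accepts the longest prefix it agrees with.

```python
# Simplified greedy speculative decoding with toy "models" (hypothetical stand-ins:
# real systems use a small draft LLM and a large target LLM).

def draft_model(tokens):
    # Cheap model: fast but sometimes wrong.
    return (tokens[-1] * 3 + 1) % 50

def target_model(tokens):
    # Expensive model: defines the "correct" next token.
    return (tokens[-1] * 3 + 1) % 97

def speculative_step(tokens, k=4):
    # 1. The draft model cheaply proposes k tokens, one by one.
    draft = list(tokens)
    for _ in range(k):
        draft.append(draft_model(draft))
    proposals = draft[len(tokens):]
    # 2. The target model checks each proposed position (a real system would
    #    score all k positions in one batched forward pass).
    accepted = []
    for i in range(k):
        correct = target_model(tokens + accepted)
        if proposals[i] == correct:
            accepted.append(proposals[i])   # draft guessed right: kept for free
        else:
            accepted.append(correct)        # first mismatch: take the target's token, stop
            break
    return tokens + accepted

seq = [7]
for _ in range(5):
    seq = speculative_step(seq)
print(seq)  # several tokens per target pass whenever the draft agrees with the target
```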

These skills can be gained at ITMO, including in joint programs with Yandex. You can learn more about them here.

Valery Stromov's lecture at ITMO's Yandex space. Photo by Dmitry Grigoryev / ITMO NEWS

What to learn in the future

Designing no-lag speech recognition and synthesis systems. Engineers improve neural networks so that they can imitate real-life communication: understand speech even when words overlap and respond like a human, with a lag of only 100–150 milliseconds. However, speech recognition and synthesis remain quite complex tasks: specialists need to account for many details so that neural networks correctly detect not only the words but also the speaker's tone. The next level is to make AI assistants more empathetic, so that they communicate with users like real people and support them in difficult moments; for that, AI must be able to recognize other people's emotions and express emotions of its own.
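As a rough sketch of where that 100–150 ms budget comes from, the toy pipeline below (all component names are hypothetical stubs) processes audio frame by frame while the user is still speaking, so that only the final reply step contributes to the perceived lag.

```python
# Toy streaming pipeline illustrating the latency budget (stub components with
# hypothetical names; real systems use incremental ASR and TTS models).
import time

FRAME_MS = 20            # audio arrives in small chunks
LATENCY_BUDGET_MS = 150  # target lag between end of speech and first reply audio

def recognize_frame(frame, partial):
    # Stub incremental recognizer: appends one "word" per frame.
    return partial + [f"w{len(partial)}"]

def reply_first_chunk(words):
    # Stub planner + TTS: returns the first chunk of synthesized reply audio.
    return b"\x00" * 320

partial, frames = [], [b"frame"] * 10  # pretend 200 ms of speech arrived
for frame in frames:
    partial = recognize_frame(frame, partial)   # work happens *while* the user speaks

speech_end = time.monotonic()
audio = reply_first_chunk(partial)              # only this step runs after speech ends
lag_ms = (time.monotonic() - speech_end) * 1000
print(f"first reply audio ({len(audio)} bytes) after {lag_ms:.1f} ms, budget {LATENCY_BUDGET_MS} ms")
```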

Creating a human-like memory that remembers and understands personal context. Imagine neural networks so personalized that they remember what users asked them a year ago, when their family members have birthdays, and what gifts those relatives have already received. To make a neural network imitate human memory, it must be trained to find the tokens that best describe a user's life and to use them in every request. Solving this task requires competencies in memory design: building data stores and caches (hot storage), algorithms for selecting the user facts that matter for the current context (update policies), optimizing how much conversation history a neural network can process at once (context window optimization), as well as knowledge of memory management and long-context optimization (long context windows).
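A minimal sketch of such a memory, under the assumption of a simple word-overlap relevance score and a fixed token budget (real systems would use embeddings and learned update policies), could look like this:

```python
# Toy personal "memory" for an assistant: a hot store of user facts plus a
# relevance-based selection step that fills a limited context window.
# The design and scoring are hypothetical simplifications.
from dataclasses import dataclass, field

@dataclass
class Memory:
    facts: list = field(default_factory=list)   # hot storage of user facts
    budget_tokens: int = 10                     # how many words of memory fit into the prompt

    def remember(self, fact: str):
        self.facts.append(fact)

    def select_for_context(self, query: str) -> list:
        # Toy relevance score: number of words shared with the query.
        q = set(query.lower().split())
        ranked = sorted(self.facts,
                        key=lambda f: len(q & set(f.lower().split())),
                        reverse=True)
        # Greedily pack the most relevant facts into the token budget.
        chosen, used = [], 0
        for fact in ranked:
            cost = len(fact.split())
            if used + cost <= self.budget_tokens:
                chosen.append(fact)
                used += cost
        return chosen

m = Memory()
m.remember("user's sister has a birthday on 12 March")
m.remember("last year the user gave her a book about Lake Baikal")
m.remember("user prefers vegetarian restaurants")
print(m.select_for_context("what should I give my sister for her birthday"))
```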

Increasing a neural network’s autonomy. To enable a neural network to understand the context of a user’s life and proactively handle routine tasks, engineers train AI assistants using reinforcement learning. If the network’s solution matches the engineer’s reference solution, it receives the maximum reward; if not, it keeps adjusting in search of a better result. That is why, in this field, an engineer’s own skills also need to be developed continuously and quite intensively.
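The reward idea can be shown with a toy loop, purely illustrative and far simpler than production RLHF-style training: a stochastic policy samples an answer, gets the maximum reward only when it matches the reference solution, and is nudged toward rewarded answers.

```python
# Toy reinforcement-learning loop: the "policy" samples an answer, a reference
# solution defines the reward, and the update nudges the policy toward rewarded
# answers. (Illustrative only; real assistants are trained with RLHF-style methods.)
import math
import random

candidates = ["book a table", "order delivery", "do nothing"]
reference = "book a table"                      # the engineer-approved solution
prefs = {c: 0.0 for c in candidates}            # policy preferences (logits)
lr = 0.5

def sample(prefs):
    # Softmax sampling over the current preferences.
    weights = [math.exp(prefs[c]) for c in candidates]
    return random.choices(candidates, weights=weights)[0]

random.seed(0)
for step in range(200):
    answer = sample(prefs)
    reward = 1.0 if answer == reference else 0.0   # maximum reward only for a match
    # Simple preference update: reinforce rewarded answers, discourage the rest.
    prefs[answer] += lr * (reward - 0.5)

print(max(prefs, key=prefs.get))  # after training, the policy prefers "book a table"
```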

Creating agent-based neural networks. Imagine that a user wants to grab lunch close to their office. Having received this request, a neural network acts as an agent: it splits this complex task into parts and distributes them among other services. Those services, in turn, look up nearby restaurants, study their menus, check table availability, compare prices, and return recommendations in a single reply. To design such agent-based systems, engineers have to strengthen their skills in building agent platforms and task orchestration, as well as in asynchronous programming and external API integration.
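A toy orchestrator of this kind might look like the asyncio sketch below; the service functions are hypothetical stand-ins for real API calls, but the fan-out-and-merge structure is the essence of the task.

```python
# Toy agent-style orchestrator: the complex request is split into subtasks that
# are sent to external services concurrently and merged into one reply.
# The service functions are hypothetical stand-ins for real API calls.
import asyncio

async def find_restaurants(area):
    await asyncio.sleep(0.1)                      # pretend this is an HTTP call
    return ["Bistro A", "Cafe B"]

async def check_menu(name):
    await asyncio.sleep(0.1)
    return {"name": name, "lunch_price": 12 if name == "Bistro A" else 9}

async def check_tables(name):
    await asyncio.sleep(0.1)
    return {"name": name, "free_table": name == "Cafe B"}

async def handle_request(area: str) -> str:
    places = await find_restaurants(area)
    # Fan out independent subtasks to external services in parallel.
    menus, tables = await asyncio.gather(
        asyncio.gather(*(check_menu(p) for p in places)),
        asyncio.gather(*(check_tables(p) for p in places)),
    )
    # Merge the partial results into a single recommendation.
    available = {t["name"] for t in tables if t["free_table"]}
    options = sorted((m for m in menus if m["name"] in available),
                     key=lambda m: m["lunch_price"])
    best = options[0]
    return f"Try {best['name']} near {area}: lunch for ${best['lunch_price']}, table is free."

print(asyncio.run(handle_request("the office")))
```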

Valery Stromov's lecture at ITMO's Yandex space. Photo by Dmitry Grigoryev / ITMO NEWS


Valery Stromov is the CEO of Alice and Smart Devices at Yandex. At the company, he heads a team of over 700 people developing a neural network-based assistant used by about 47 million people. Mr. Stromov delivered a lecture on the evolution of Alice and on the skills top engineers need today and in the future to ITMO students at the university’s Yandex space. Future events at the space will be announced in its Telegram channel, ITMOxYandex.