While sequential recommendation systems excel at capturing temporal chains of user actions, they struggle to recognize true user preferences, especially when data is scarce. Large language models (LLMs) can help uncover these hidden preferences, but using them directly is resource-intensive: a single LLM query can take over 10 minutes. The approach developed by the Russian scientists solves this problem. During the training phase, the algorithm generates a user profile – their preferences and behavioral traits – from their interaction history and text metadata. This profile is then converted into a vector, which serves as a reference point for the model's internal representations. As a result, when the system generates recommendations for a real user during inference, the resource-demanding model is no longer necessary.
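In code, the idea might look roughly like the following sketch. This is a hypothetical Python/PyTorch illustration based only on the description above: the model interface, the profile prompt, the text embedder, and the `alpha` loss weight are all assumptions, not the authors' published implementation.

```python
import torch
import torch.nn.functional as F

# A minimal sketch of the distillation idea described above, assuming a
# SASRec-style recommender. All names here (llm, text_embedder, the
# profile prompt, the alpha weight) are hypothetical illustrations.

def build_profile_vectors(users, llm, text_embedder):
    """Training-time only: ask an LLM to summarize each user's preferences
    from their interaction history and item metadata, then embed the text."""
    vectors = {}
    for user in users:
        profile_text = llm(f"Summarize this user's preferences:\n{user.history_text}")
        vectors[user.id] = text_embedder(profile_text)  # vector of size [d]
    return vectors

def training_step(model, batch, profile_vectors, alpha=0.1):
    # Standard next-item prediction loss of the base recommender,
    # plus a term that pulls the model's internal user representation
    # toward the LLM-derived profile embedding.
    user_repr, rec_loss = model(batch["item_ids"])              # [B, d], scalar
    target = torch.stack([profile_vectors[u] for u in batch["user_ids"]])
    align_loss = 1 - F.cosine_similarity(user_repr, target).mean()
    return rec_loss + alpha * align_loss

# At inference, only `model` is used: the LLM and the embedder are gone,
# which is why serving stays fast.
```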

Tests conducted on four datasets showed that adding LLM distillation to the popular SASRec and BERT4Rec models delivers a consistent gain in quality. On the ML-20M dataset, NDCG@10 grew by 5.62% and Recall@10 by 4.74% compared to the original SASRec. Meanwhile, the recommendation system generated results 190 times faster (4.37 vs. 840 seconds) than the LLM-based IDGenRec method.
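For readers unfamiliar with these metrics, here is a minimal sketch of how NDCG@10 and Recall@10 are typically computed for a single user with binary relevance. These are the standard textbook definitions, not code from the paper.

```python
import numpy as np

def recall_at_k(ranked_items, relevant, k=10):
    """Fraction of a user's relevant items that appear in the top-k list."""
    hits = len(set(ranked_items[:k]) & set(relevant))
    return hits / len(relevant)

def ndcg_at_k(ranked_items, relevant, k=10):
    """Discounted gain of hits in the top-k, normalized by the ideal ranking."""
    dcg = sum(1.0 / np.log2(i + 2)
              for i, item in enumerate(ranked_items[:k]) if item in relevant)
    ideal = sum(1.0 / np.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal

# Example: the single target item is ranked 3rd in the top-10 list.
top10 = [5, 8, 42, 1, 7, 9, 3, 2, 6, 4]
print(recall_at_k(top10, relevant=[42]))  # 1.0
print(ndcg_at_k(top10, relevant=[42]))    # 0.5
```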

“Large language models possess immense knowledge about the world and about where our preferences come from. However, using them directly for recommendations is like calling in a linguistics professor every time a question comes up in a chat. What our approach does is take the most valuable part of the ‘professor’ – their deep insight into user needs – and pass it on to a fast and lightweight ‘assistant.’ Users get personalized, instant recommendations, while businesses gain a scalable, cost-efficient solution,” notes Nikolay Tiden, the head of Sberbank’s Practical AI Center.

The developers believe their solution will make recommendation systems even better. Streaming services will offer a smarter choice of recommended movies, and marketplaces will learn to show the right products, even when a user doesn’t yet know what they want. Companies, in turn, will obtain a ready-made method that improves the quality of recommendations without having to increase computational costs, remodel architectures, or maintain a heavy LLM in production. This is especially crucial for large-scale industrial systems where every millisecond matters.

Vladislav Kulikov at ECIR 2026. Photo courtesy of the subject

The findings were presented at the 48th European Conference on Information Retrieval (ECIR 2026) in Delft, the Netherlands. 

“For this research, I developed LLM-based user profiles: I studied various prompting strategies, tested methods for aggregating interaction history, and compared different models and embedders. These profiles were then used to tune the recommendation systems, so their quality had a direct impact on the final result. I presented our results at ECIR 2026. During the poster session, many researchers approached me: they asked questions about our findings and shared the tasks they work on – it was a great exchange of experience. It was also valuable to practice my English and to see that people from different countries and backgrounds understood our work and were eager to learn more,” says Vladislav Kulikov, a student of the Artificial Intelligence Master’s program.