In the team’s generative retrieval method, the Semantic ID, an internal identifier that reflects the semantic and behavioral characteristics of catalog objects (products, content units, etc.), is recalculated on fresh data at the first stage of training; the new identifiers are then aligned with their previous versions. This approach lets the system account for current changes in customer interests without breaking compatibility, cut the cost of full retraining, and accelerate recommendation updates.
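The recalculate-then-align idea can be illustrated with a toy sketch. It is not the team’s algorithm: it assumes each Semantic ID is a single integer cluster label (real Semantic IDs are typically richer), and the function name `align_ids` and the greedy overlap matching are hypothetical choices made only for illustration. Freshly computed labels are renamed so that they agree with the previous labels wherever most items overlap.

```python
from collections import Counter


def align_ids(old_ids, new_ids):
    """Relabel freshly computed IDs to match previous ones where possible.

    old_ids / new_ids: dicts mapping item -> integer cluster label.
    Hypothetical helper; greedy overlap matching is an assumption.
    """
    # Count how many items share each (new label, old label) pair.
    overlap = Counter()
    for item, new_id in new_ids.items():
        if item in old_ids:
            overlap[(new_id, old_ids[item])] += 1

    mapping = {}
    used_old = set()
    # Take the largest overlaps first; ties are broken by the pair itself
    # so the result is deterministic.
    for (new_id, old_id), _ in sorted(
        overlap.items(), key=lambda kv: (-kv[1], kv[0])
    ):
        if new_id not in mapping and old_id not in used_old:
            mapping[new_id] = old_id
            used_old.add(old_id)

    # New labels without an old counterpart get fresh, unused labels.
    next_free = max(old_ids.values(), default=-1) + 1
    for new_id in sorted(set(new_ids.values())):
        if new_id not in mapping:
            mapping[new_id] = next_free
            next_free += 1

    return {item: mapping[new_id] for item, new_id in new_ids.items()}


old = {"a": 0, "b": 0, "c": 1, "d": 1}
new = {"a": 5, "b": 5, "c": 7, "d": 9, "e": 9}
aligned = align_ids(old, new)
```

In this toy run, items `a`, `b`, and `c` keep their previous labels, while `d` and the new item `e` share a fresh label, so anything keyed on the old labels stays valid for the unchanged items.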

Vladimir Baikalov. Photo courtesy of the subject

“This solution will benefit major digital platforms with highly dynamic customer interests that have to spend significant sums to retrain their models from scratch. These costs can be cut eightfold thanks to the new approach,” notes Vladimir Baikalov, a senior researcher at AI VK and an engineer at ITMO’s Computer Technologies Laboratory.

Previously, the industrial application of generative retrieval was complicated by the fact that Semantic IDs, which rely on collaborative signals, become outdated as audience interests and behavior patterns change rapidly. Fine-tuning on new data does not always solve the problem, whereas a full recalculation of Semantic IDs without alignment can make it harder for models to stay compatible with previous versions of the system and its components.