Modern AI performs exceptionally well in English, Chinese, and Russian, largely thanks to the vast amounts of text and speech data available in these languages. However, the same models lag behind in low-resource languages, i.e., languages that lack the quantity and quality of data needed to train AI models, such as Arabic and Kazakh. Although these languages have large numbers of native speakers, they suffer from an acute shortage of datasets, especially those needed for speech synthesis and recognition. Building speech models for low-resource languages is also expensive: it requires costly manual data collection, labeling, and processing. As a result, AI responses in these languages are of lower quality than in the most widely used ones, which prevents users from fully integrating AI into their everyday lives.

These issues will be addressed by a new laboratory set up by ITMO and MWS AI, which will specialize in efficient AI methods and data infrastructure for low-resource environments. Researchers will develop new speech synthesis and recognition models for low-resource languages, as well as accelerate and compress existing AI models such as Qwen and Llama. The algorithms and methods developed at the laboratory will be compatible with any transformer-based model, a category that covers most modern large language models. All solutions will be published on GitHub and Hugging Face.

The laboratory will be led by Ammar Ali, a senior researcher at ITMO's School of Translational Information Technologies and an expert developer at the Fundamental Research Center at MWS AI; Stamatis Lefkimmiatis, head of the Machine Learning and Fundamental Research Center at MWS AI; and Alexey Kashevnik, a senior researcher at ITMO's School of Translational Information Technologies. Its team will include employees of ITMO and MWS AI, as well as students from ITMO's Information Technologies and Programming Faculty.

“Arabic is my native language. Training models in it is much more expensive and challenging than in English. This also limits research on AI for speakers of low-resource languages. That's why we want to build AI tools that are available to everyone. In particular, we plan to collect around 100 hours of Arabic speech, which, among other things, could be used to train speech synthesis models; we also want to train Microsoft's VibeVoice to work in Russian, Kazakh, and Arabic,” says Ammar Ali, the laboratory's head and a researcher at ITMO and MWS AI.

Ammar Ali. Credit: Sberbank’s AI Journey contest


Using mathematical optimization methods such as quantization, pruning, and attention linearization, the researchers intend to develop new neural network compression methods that make models run faster and take up less space while losing no more than 5% in accuracy. Specifically, they aim to speed up existing transformer models fourfold and halve their hardware requirements. The team will also work on new cost-efficient methods for training AI for data classification, segmentation, and detection, and will assemble datasets for low-resource languages and incorporate them into current models. Finally, they will develop benchmarks to evaluate how large language models such as ChatGPT or Claude perform on new data.
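The article does not describe the laboratory's own compression techniques. As a generic illustration of the first one it names, quantization, the Python sketch below stores each weight as an 8-bit integer plus one shared scale factor instead of a full floating-point number. The symmetric per-tensor scheme, the function names, and the 7-billion-parameter model size are illustrative assumptions, not the researchers' method.

```python
"""Minimal sketch of symmetric per-tensor int8 post-training quantization.

Illustration only: a generic textbook technique, not the laboratory's
method. Weights are plain Python floats to keep the example self-contained.
"""
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Map float weights to int8 codes in [-127, 127] with one shared scale."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero tensor, where any scale works.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range.
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_int8(codes: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from int8 codes."""
    return [c * scale for c in codes]


def memory_bytes(n_params: int, bits_per_param: int) -> float:
    """Storage needed for n_params weights at the given precision."""
    return n_params * bits_per_param / 8


if __name__ == "__main__":
    weights = [0.42, -1.27, 0.003, 0.91, -0.55]
    codes, scale = quantize_int8(weights)
    restored = dequantize_int8(codes, scale)
    # Rounding error per weight is at most half a quantization step.
    max_err = max(abs(w - r) for w, r in zip(weights, restored))
    print("codes:", codes)
    print("max abs error:", max_err, "<= scale / 2 =", scale / 2)

    # Hypothetical 7-billion-parameter model: 16-bit vs 8-bit weights.
    n = 7_000_000_000
    fp16_gb = memory_bytes(n, 16) / 1e9
    int8_gb = memory_bytes(n, 8) / 1e9
    print(f"fp16: {fp16_gb:.0f} GB, int8: {int8_gb:.0f} GB")
```

Halving the bits per weight halves the memory needed to store the weights, which is one way hardware requirements can shrink; the actual effect on speed and accuracy depends on the model, the quantization scheme, and the hardware it runs on.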

In the longer run, optimized AI models could be deployed locally on resource-constrained devices such as smartphones, keeping sensitive data private for universities, businesses, and individual users. This could also reduce corporations' reliance on cloud services and provide reliable access to AI for all users.