When developing new medicines or materials, scientists need to consider not only the properties of individual molecules but also how those molecules interact with one another within the final multicomponent system, such as a cocrystal or molecular complex. These interactions determine critical properties such as drug solubility and stability, the structure and strength of materials, and the sensitivity of molecular systems, as well as binding affinity and function in biomolecular complexes.

Molecular systems are most commonly studied via experimental methods, numerical modeling, and, increasingly, AI. However, the data currently available on complex multicomponent systems is insufficient: it is heterogeneous, often poorly structured, and rarely fit for training models. A key objective is therefore to create high-quality datasets, benchmarks, and evaluation protocols for designing and comparing AI models.

These problems will be addressed by the new laboratory for digital design and modeling of macro- and supramolecular complexes, established by ITMO University and AIRI Institute. The lab’s researchers will develop algorithms to help describe the interactions of molecular components, predict the properties of multicomponent systems, and select the most promising combinations for further testing. To that end, the laboratory will conduct studies at the intersection of chemistry and AI. The research scope includes high-quality datasets and benchmarks for training and comparing models, automatic extraction of data from scientific literature, data augmentation methods, chemically and physically informed models, the properties of multicomponent systems, and AI agents for the research process.

“We want to do more than just model the behavior of individual molecules; we want to learn to handle more realistic and advanced systems: complexes, cocrystals, biomolecular interactions, and materials strongly affected by collective behavior. These studies will speed up the development processes for new materials, medications, molecular complexes, and systems with desired properties. In the long run, it will help us reduce the need for expensive experiments and accelerate the transition from an idea to a testable solution,” says Nina Gubina, the head of the new laboratory and an engineer at ITMO’s Center for AI in Chemistry. 

Nina Gubina. Credit: prostospb.team/hackathon-26

The scientists will draw on machine learning, generative AI, graph and tabular models, large language models (LLMs), active learning methods, multimodal representations, and multiagent systems. Some of their tasks will involve computer modeling, that is, automatically validating hypotheses before laboratory testing.

The core team will consist of researchers from ITMO University and AIRI Institute. On ITMO’s side, the laboratory will be co-led by Nina Gubina, an engineer at the Center for AI in Chemistry; Nikita Serov, the head of the Center for AI in Chemistry; and Michael Medvedev, the leader of the theoretical chemistry group at the N.D. Zelinsky Institute of Organic Chemistry. In addition, staff and students of ITMO’s Center for AI in Chemistry, Infochemistry Scientific Center, and related research teams will join the new lab. The AIRI Institute team will include Artur Kadurin, the director of the Centre for AI-Based Development of New Drugs, as well as researchers Kuzma Khrabrov and Artem Tsypin. Students from other universities, specifically Lomonosov Moscow State University, HSE University, and Mendeleev University of Chemical Technology, are also invited.

“It’s no longer enough to study separate molecules; we need to understand how they behave together in multicomponent systems. We’ll primarily focus on developing high-quality datasets and benchmarks for training models because a lack of structured data remains one of the main limitations in this field. We believe the research will speed up the creation of new medications, functional materials, and promising molecular systems of the future,” notes Artur Kadurin, the director of the Centre for AI-Based Development of New Drugs at AIRI Institute.

The laboratory will incorporate remote work for communication, reporting, and development. As Nina Gubina highlights, this format suits the project well because it is driven by data, code, and computer simulations.

In the first stage, the researchers plan to prepare verified datasets, benchmarks, and evaluation protocols for multicomponent molecular systems. These materials will then be used to develop predictive models and targeted search methods for systems with specific properties. One of the team’s pilot projects will focus on creating a large-scale multimodal benchmark for quantitative modeling of aptamer-protein interactions, which will help standardize diverse data and formulate evaluation protocols for comparing AI models and identifying their limitations.

Another project will be dedicated to building a cocrystallization database from information drawn directly from scientific sources. The results will benefit both automated data extraction and the prediction of cocrystal formation. In the future, the solutions developed could be combined into agent and multiagent pipelines covering the entire cycle of molecular design.