This year, the Nobel Prize in Chemistry was awarded for research performed at the intersection of chemistry and IT – the honored scientists developed machine learning algorithms to study protein properties.
For our bodies, proteins are not just building materials, but also regulators of all chemical reactions. These molecules consist of amino acids, the building blocks that make up every living being. To understand a protein’s properties, it’s necessary to know its three-dimensional structure, which is determined by its linear sequence of amino acids. Predicting and designing these protein structures is one of the key problems in drug development.
“Life could not exist without proteins. That we can now predict protein structures and design our own proteins confers the greatest benefit to humankind,” states the Nobel Committee’s official website.
In 2003, David Baker became the first to design a new protein unlike any found in nature. To do so, Baker’s group used its Rosetta software, which subsequently gave rise to Rosetta@Home, a distributed computing project for designing new proteins and predicting their 3D structures. With the help of Rosetta@Home, it may be possible to develop treatments for HIV, malaria, cancer, or Alzheimer’s disease. Since the project’s launch, Baker’s group has been developing new proteins that can be used in the creation of new drugs, vaccines, nanomaterials, and sensors.
“Anything that has to do with in silico drug modeling starts with predicting the 3D structures of proteins. The gold standard here is the experimental approach – with X-ray crystallography and nuclear magnetic resonance spectroscopy. However, this process is time-consuming and expensive. On the other hand, predicting protein structures with AI greatly accelerates any project regardless of the research field: drug modeling, diagnostics, materials science, or even fundamental studies into the nature of living things. That’s why everyone has been expecting this award – it was only a question of time,” says Ekaterina Skorb, a leading researcher at ITMO’s Infochemistry Scientific Center.
The other discovery recognized by the prize has to do with predicting protein structures based on amino acid sequences – a problem researchers had been trying to solve since the 1970s. Finally, in 2020, a breakthrough occurred: Demis Hassabis and John Jumper of Google DeepMind presented AlphaFold2, an AI model for predicting complex protein structures from their amino acid sequences. By 2022, the model had been used to predict the structures of nearly all of the 200 million proteins known to researchers. To date, over two million people from 190 countries have used this program. Thanks to this development, scientists can, for instance, study antibiotic resistance and create enzymes that break down plastic.
“This year’s Nobel laureates have succeeded in predicting protein structures with algorithms; however, the mechanisms by which these structures are formed remain to be studied. That’s what we focus on at ITMO’s Infochemistry Scientific Center by studying the fundamental aspects of molecular protein folding and trying to understand its mechanisms. In our work, we are developing infochemistry, a discipline at the intersection of IT and chemistry. It’s a vast field that includes chemoinformatics, chemotronics, triboinformatics, and other disciplines,” comments Sergey Shityakov, a leading researcher at the center.
ITMO scientists are also developing algorithms for studying protein structures. For instance, in 2022 the researchers discovered a universal constant that can be used to predict protein folding. To do so, they proposed an algorithm that calculates a molecule’s dimensionality – whether it’s 2D, 3D, or an intermediate state. Unlike other existing methods, this one allows scientists to identify the folding state in biomolecules and track its progress.
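
The article does not describe how the ITMO algorithm works. As a rough illustration of what a dimensionality between 2D and 3D can mean, the sketch below estimates a scaling dimension for a cloud of atom coordinates: it counts how many points N(r) lie within a distance r of the center and fits the exponent D in N(r) ~ r^D. The helper name `mass_dimension`, the choice of radii, and the method itself are assumptions made for this example and are not the researchers' method.

```python
import math
from bisect import bisect_right


def mass_dimension(points, radii):
    """Estimate a scaling dimension D from N(r) ~ r**D.

    N(r) is the number of points within distance r of the centroid;
    D is the least-squares slope of log N(r) against log r.
    Hypothetical helper for illustration only.
    """
    n = len(points)
    dims = len(points[0])
    centroid = [sum(p[k] for p in points) / n for k in range(dims)]
    distances = sorted(math.dist(p, centroid) for p in points)

    xs, ys = [], []
    for r in radii:
        count = bisect_right(distances, r)  # points with distance <= r
        if count > 0:
            xs.append(math.log(r))
            ys.append(math.log(count))
    if len(xs) < 2:
        raise ValueError("need at least two radii that contain points")

    # Ordinary least-squares slope of ys against xs.
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    denominator = sum((x - x_mean) ** 2 for x in xs)
    return numerator / denominator


# Toy "molecules" (assumed shapes): a straight chain, a flat sheet, a solid block.
line = [(float(x), 0.0, 0.0) for x in range(-100, 101)]
sheet = [(float(x), float(y), 0.0)
         for x in range(-40, 41) for y in range(-40, 41)]
block = [(float(x), float(y), float(z))
         for x in range(-12, 13)
         for y in range(-12, 13)
         for z in range(-12, 13)]

d_line = mass_dimension(line, [5, 10, 20, 40, 80])
d_sheet = mass_dimension(sheet, [5, 10, 20, 40])
d_block = mass_dimension(block, [3, 6, 12])
print(f"line: {d_line:.2f}, sheet: {d_sheet:.2f}, block: {d_block:.2f}")
```

On these toy shapes the estimates come out close to 1, 2, and 3 respectively. Under the same assumptions, a partially folded chain would be expected to land somewhere in between, which is the sense in which a single number could track how far folding has progressed.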