“Protein folding is the challenge of the millennium for molecular biology, and great advances have been made toward solving it. In a way, the work on this fundamental problem is akin to decoding the human genome. Quite a few research fields depend on proteins, whose functions are determined by their amino acid sequences and by the way they fold. For instance, folding determines the processes triggered by particular proteins in living cells. In pharmacology, the hydrophobic nucleus of a folded protein is responsible for interaction with drugs, while in biomedicine folding can help us unravel the mechanisms behind the progression of various diseases, as well as the best ways to treat them,” explains Dr. Sergey Shityakov, the article’s first author and head of the chemoinformatics group at the Infochemistry Scientific Center.

Protein folding is still an unsolved mystery. We know that the sequence of amino acids can determine the structure of the folded protein, but it is still unclear how the protein reaches its ultimate folded state and why the folding speed is so high given the number of intermediate states a molecule can assume on its way to folding.

In the case of the Trp-cage polypeptide used by the researchers in this study, folding from the linear to the globular form took nanoseconds. Using molecular dynamics simulations, the scientists were able to model this process.

“One of the most influential discoveries in 20th century biology was that DNA is structured like a text, a sequence of nucleotides. This text is transformed into a physical object with its own function, a globular protein molecule, through folding, where a one-dimensional chain acquires a unique shape in an aqueous medium and turns into a 3D globule with specific properties. This is a very interesting process both in terms of practice and philosophy, and we want to describe it mathematically using fractal dimensionality, which fluctuates between 1D and 3D limits,” explains Prof. Michael Nosonovsky, one of the project leads, head of the triboinformatics group at the Infochemistry Scientific Center, and a Professor at the University of Wisconsin-Milwaukee.

Prof. Michael Nosonovsky. Photo courtesy of ITMO's Infochemistry Scientific Center

The study in detail

The key quantity in the method is a power-law function with the exponent α, which is commonly used in polymer physics but was applied to proteins here for the first time. The exponent is derived from the molecule’s topology and its radius of gyration, a measure of how compact the folded protein is.
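For readers who want the underlying relation, the standard polymer-physics scaling law can be written as follows. This is the textbook form, given here as an assumption about what the exponent refers to; the exact expression used in the paper may differ. Here N is the number of residues, r_i is the position of residue i, r_cm is the centroid of the chain, and d_f is the fractal dimension.

```latex
R_g^2 = \frac{1}{N}\sum_{i=1}^{N}\left|\mathbf{r}_i - \mathbf{r}_{\mathrm{cm}}\right|^2,
\qquad
R_g \propto N^{\alpha},
\qquad
d_f = \frac{1}{\alpha}
```

In this picture, α = 1 corresponds to a one-dimensional object (d_f = 1) and α = 1/3 to a compact three-dimensional one (d_f = 3), which matches the 1D and 3D limits mentioned by Prof. Nosonovsky.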

In this article, the researchers were the first to calculate this exponent for proteins. When α approaches 0.333, the protein is in a globular, nearly folded state. At α = 1 the molecule is a fully stretched linear chain, and at α = 0.5 it behaves as an ideal chain.
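The limiting values of α can be checked on toy chains. The sketch below is an illustration under stated assumptions, not the authors' procedure: it assumes the textbook scaling R_g ∝ N^α, treats every bead as having equal mass, and estimates α as the slope of log R_g against log N for straight rods and for freely jointed random walks. All function names are hypothetical.

```python
import math
import random


def radius_of_gyration(coords):
    """RMS distance of the beads from their centroid (equal masses assumed)."""
    n = len(coords)
    centroid = [sum(p[k] for p in coords) / n for k in range(3)]
    total = sum(sum((p[k] - centroid[k]) ** 2 for k in range(3)) for p in coords)
    return math.sqrt(total / n)


def fit_alpha(lengths, rg_values):
    """Least-squares slope of log(Rg) versus log(N), i.e. alpha in Rg ~ N**alpha."""
    xs = [math.log(n) for n in lengths]
    ys = [math.log(r) for r in rg_values]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den


def rod(n, bond=1.0):
    """Fully stretched chain: n beads on a straight line."""
    return [(bond * i, 0.0, 0.0) for i in range(n)]


def random_walk(n, rng, bond=1.0):
    """Freely jointed (ideal) chain: n beads, fixed bond length, random directions."""
    pos = (0.0, 0.0, 0.0)
    coords = [pos]
    for _ in range(n - 1):
        step = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(s * s for s in step))
        pos = tuple(p + bond * s / norm for p, s in zip(pos, step))
        coords.append(pos)
    return coords


def mean_rg(n, rng, samples=100):
    """Root-mean-square Rg over several independent random walks of n beads."""
    sq = [radius_of_gyration(random_walk(n, rng)) ** 2 for _ in range(samples)]
    return math.sqrt(sum(sq) / samples)


lengths = [50, 100, 200, 400, 800]
rng = random.Random(0)  # fixed seed so the run is repeatable

alpha_rod = fit_alpha(lengths, [radius_of_gyration(rod(n)) for n in lengths])
alpha_walk = fit_alpha(lengths, [mean_rg(n, rng) for n in lengths])

print(f"stretched rod: alpha = {alpha_rod:.3f}")
print(f"ideal chain:   alpha = {alpha_walk:.3f}")
```

For the rods the fitted slope comes out close to 1, and for the random walks close to 0.5. Reproducing the globular limit near 0.333 would require a model with excluded volume and attraction between beads, which this sketch leaves out.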

To determine the values of α, the researchers used molecular dynamics to generate a large number of Trp-cage conformations. According to the team, the exponent can now be used to analyze any protein without resorting to in vitro tests.

To assess the efficiency of the new method, the researchers compared it to AlphaFold and other popular algorithms, most of which do not depict protein folding in real time. Some algorithms even require experimental data to run.

The Monte Carlo (Rosetta) method works faster, but it requires a reference structure obtained by crystallography. Machine learning-based methods, such as AlphaFold, are relatively fast and precise, but they produce only the final model and do not capture the course of folding. A protein’s globular state can be characterized by its radius of gyration, but that value changes continuously during folding, whereas the proposed method relies on a constant.

In this article, the researchers considered a protein folded once it reached a near-final metastable state in which the molecule’s hydrophobic nucleus has formed, drastically reducing the Gibbs free energy. Folding then proceeds along the molecule’s hydrophilic regions around the nucleus. This stage is also important, but it takes longer and is not accompanied by such large drops in Gibbs free energy.

Currently, the researchers are working on simulating a longer folding process that lasts 3-5 microseconds, with the aim of producing a method that will outperform AlphaFold.

What next

According to Dr. Shityakov, the new method can be applied to various globular proteins. The researchers already have a tested methodology that will allow them to create software for calculating α for different proteins, including larger ones.

Dr. Sergey Shityakov. Photo courtesy of ITMO's Infochemistry Scientific Center

“With this method, we can change the protein’s structure or modify the solution by changing its pH or adding ions, ligands, and other molecules such as melamine cyanurate or cyclodextrin. These molecules will affect the protein folding mechanism. For instance, there is a special class of proteins called chaperones (such as Hsp90), which are located in living cells and specialize in assisting the folding of other proteins. There is evidence that their normal functioning is key to keeping cells viable, while a disruption of their function can trigger cancer. Our universal method will make it possible to study such protein functions and changes in them,” concludes Dr. Shityakov.

Reference: S Shityakov, EV Skorb, M Nosonovsky. Topological bio-scaling analysis as a universal measure of protein folding. R. Soc. Open Sci. 2022.

Tamara Besedina, Infochemistry Scientific Center, ITMO University