Every person has a genome: the complete set of genetic instructions according to which an individual develops. However, every living organism also hosts another body of genetic material, called the metagenome. It is the total DNA content of the many different microorganisms that inhabit the same environment, such as bacteria, fungi and viruses. The composition of the metagenome can often point to various diseases or to a predisposition to them. Studying the microbiota, i.e. the full range of microorganisms inhabiting different parts of the human body, therefore plays a critical role in metagenomic research.

MetaFast, the software tool developed by the scientists, can rapidly compare large numbers of metagenomes. “Having studied the intestinal microflora of patients, we are able to detect microorganisms associated with a particular disease, such as diabetes, or with a predisposition to it. This forms a basis for applying personalized medicine techniques and developing new drugs. Using the results obtained with the software, biologists will be able to decide how to direct their research, because the algorithm enables them to study environments that we currently know nothing about,” says Vladimir Ulyantsev, lead developer of the algorithm and a researcher at the Computer Technologies Laboratory at ITMO University.


One of the key benefits of the program is that it works successfully with environments whose genetic content has not yet been studied. “The newly developed approach allows us to do two things at once: find all the possible gene sequences, even if they were previously unknown (the program assembles them from fragments of sequencing reads), and identify metagenomic patterns that distinguish one patient from another, for example people who do and do not suffer from a disease,” says Dmitry Alexeev, the project leader and head of MIPT’s Laboratory of Complex Biological Systems.
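To make the idea of reference-free assembly more concrete, here is a minimal Python sketch of one common way to group reads without a reference genome: cut the reads into k-mers (substrings of length k) and link k-mers that overlap by k-1 characters. This yields a de Bruijn-style graph whose connected components stand for candidate sequences, whether or not they appear in any database. The function names (`kmer_counts`, `graph_components`), the reads and the choice of k are hypothetical. The sketch deliberately ignores reverse complements, sequencing errors and coverage thresholds, and it illustrates the general technique rather than MetaFast's actual implementation.

```python
from collections import defaultdict


def kmer_counts(reads, k):
    """Count every k-mer (substring of length k) across a set of reads."""
    counts = defaultdict(int)
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return dict(counts)


def graph_components(kmers):
    """Group k-mers into connected components of a de Bruijn-style graph.

    Two k-mers are linked when the (k-1)-suffix of one equals the
    (k-1)-prefix of the other, i.e. they could be neighbours in some sequence.
    Simplification (assumption): reverse complements are not merged.
    """
    kmers = set(kmers)
    by_prefix = defaultdict(list)  # (k-1)-prefix -> k-mers starting with it
    by_suffix = defaultdict(list)  # (k-1)-suffix -> k-mers ending with it
    for km in kmers:
        by_prefix[km[:-1]].append(km)
        by_suffix[km[1:]].append(km)

    seen, components = set(), []
    for start in sorted(kmers):
        if start in seen:
            continue
        stack, comp = [start], []
        seen.add(start)
        while stack:  # depth-first search over successors and predecessors
            km = stack.pop()
            comp.append(km)
            for nb in by_prefix[km[1:]] + by_suffix[km[:-1]]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        components.append(sorted(comp))
    return components


# Hypothetical toy reads; real metagenomic reads are far longer and noisier.
reads = ["ACGTAC", "GTACGG", "TTTTCA"]
for comp in graph_components(kmer_counts(reads, k=4)):
    print(comp)
# ['ACGG', 'ACGT', 'CGTA', 'GTAC', 'TACG']
# ['TTCA', 'TTTC', 'TTTT']
```

The first two reads overlap on `GTAC`, so their k-mers fall into one component, while the third read forms a separate one; no reference sequence is consulted at any point.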

The program can be used for rapid, untargeted screening for markers of certain diseases. The results can then be verified and refined with targeted methods such as PCR (the polymerase chain reaction, a technique for making many copies of a DNA fragment). According to the researchers, the program can cut the time needed to develop new drugs several-fold.

Microorganisms that cannot be grown in vitro, such as viruses, give very ambiguous results in laboratory tests, and their DNA cannot be obtained by culturing them. The new program, however, can detect even these microorganisms. “Ninety percent of the organisms in the skin microbiota are unknown,” Dmitry Alexeev continues. “Our approach enables us to work with completely unknown material and still obtain results. The program has been tested in a wide range of environments, including those with a high number of viruses. It is even able to locate and assemble individual DNA strands.”

MetaFast is not limited to detecting pathogens. For example, the program can also be used to compare the metagenomes of people in closed, isolated populations with those of city dwellers, helping to identify bacterial strains that are highly beneficial to humans but have been lost through urbanization. Antibiotics, preservatives, colorants and supermarket food have pushed many useful bacteria out of our microflora, but these bacteria may still survive in people who live in closed populations, such as American Indians or residents of Russian villages.

MetaFast has proven highly effective in studying rare and previously unexplored metagenomes. As part of the study, the scientists analyzed metagenomes from several of the world’s largest lakes. Without any prior information about the lake microbiota samples, the program found genetic similarities between samples that had a similar chemical composition.

The researchers also used the new algorithm to study the microorganisms inhabiting the New York City subway, demonstrating its effectiveness on such complex systems. Most of the DNA assembled using MetaFast belonged to known bacteria. This supports previous theories that the subway is safe for humans and that the microbes living there suppress any flora that could be dangerous to people.

A vast amount of experimental data on various metagenomes has already been gathered worldwide. As the cost of extracting DNA falls and the sensitivity of equipment rises, the volume of data continues to grow exponentially. Despite this, most of these studies remain incomplete because of the limitations of current technology. On the one hand, scientists can partially assemble a metagenome, but piecing the “puzzle” together takes an enormous amount of time. On the other hand, they can compare individual DNA fragments with existing reference databases, but these cover only a limited number of bacteria and virtually no viruses.

The new algorithm not only combines the advantages of both approaches but also processes data at high speed. The program saves RAM because it only partially assembles and partially compares genomes, without performing a full, in-depth assembly.
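The comparison step can likewise be sketched under stated assumptions. In this hypothetical Python example, each sample is summarized as a vector holding its total k-mer count within each graph component, and samples are then compared with the Bray-Curtis dissimilarity, a common measure for abundance profiles. The component list, sample names and counts are invented for illustration, and the choice of Bray-Curtis is an assumption of this sketch rather than a description of MetaFast's internals. Because only component-level totals are kept, no full assembly of each genome is required.

```python
def feature_vector(sample_counts, components):
    """Total k-mer count of one sample within each graph component."""
    return [sum(sample_counts.get(km, 0) for km in comp) for comp in components]


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: 0.0 for identical profiles, 1.0 when no component is shared."""
    total = sum(u) + sum(v)
    if total == 0:
        return 0.0  # two empty profiles are treated as identical
    return sum(abs(a - b) for a, b in zip(u, v)) / total


def distance_matrix(samples, components):
    """Pairwise Bray-Curtis dissimilarities between all samples, in sorted name order."""
    vectors = {name: feature_vector(counts, components) for name, counts in samples.items()}
    names = sorted(vectors)
    return names, [[bray_curtis(vectors[a], vectors[b]) for b in names] for a in names]


# Hypothetical components and per-sample k-mer counts, for illustration only.
components = [["ACGT", "CGTA"], ["TTTT"], ["GGCC"]]
samples = {
    "lake_A": {"ACGT": 3, "CGTA": 1, "TTTT": 4},
    "lake_B": {"ACGT": 2, "CGTA": 2, "TTTT": 4},
    "lake_C": {"GGCC": 8},
}
names, dist = distance_matrix(samples, components)
for name, row in zip(names, dist):
    print(name, row)
# lake_A [0.0, 0.0, 1.0]
# lake_B [0.0, 0.0, 1.0]
# lake_C [1.0, 1.0, 0.0]
```

Samples `lake_A` and `lake_B` differ in individual k-mer counts but have identical component totals, so their dissimilarity is 0.0, while `lake_C` shares no component with either and scores 1.0.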


Vladimir I. Ulyantsev, Sergey V. Kazakov, Veronika B. Dubinkina, Alexander V. Tyakht, and Dmitry G. Alexeev. MetaFast: fast reference-free graph-based comparison of shotgun metagenomic data. Bioinformatics, 2016. https://doi.org/10.1093/bioinformatics/btw312