To understand how the new method works, one should think of genomic sequences as complex collections of nucleotide “words,” or k-mers, where k is the length of a combination of the nucleotides A, T, G and C. Organisms have dissimilar k-mer spectra because each of them has its own unique information-carrying sequences. A metagenome is the combination of the genomes of all the organisms in a sample, together with the genome of the host that carries them. By analyzing it, one can compare samples and find differences in their bacterial composition.
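The idea of a k-mer spectrum can be illustrated with a minimal Python sketch. It is not the program described in this article: the function name `kmer_spectrum` and the toy sequence are assumptions made for illustration only. The sketch slides a window of length k along a sequence and counts how often each "word" occurs.

```python
from collections import Counter


def kmer_spectrum(sequence: str, k: int) -> Counter:
    """Count every overlapping k-mer in a nucleotide sequence.

    Hypothetical helper for illustration; not the authors' tool.
    """
    sequence = sequence.upper()
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        # Skip windows that contain anything other than A, C, G or T
        # (for example the ambiguous base N).
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


# Toy example: the 3-mers of a seven-letter sequence.
spectrum = kmer_spectrum("ATGATGC", 3)
print(spectrum)
```

In this toy sequence the word ATG occurs twice, while TGA, GAT and TGC occur once each. Real tools for this task use far more memory-efficient data structures than a plain dictionary.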

Nucleotide sequences are read by sequencers, instruments used to analyze biological material. The resulting datasets provide detailed information about nucleotide sequences. When analyzing the metagenomes of complex biological systems such as the human gut, sea water or petroleum, one deals with large amounts of data that have to be structured. To count nucleotide combinations of different lengths quickly, and to arrange and compare sequences correctly, a dedicated algorithm is needed. That is why experts in different fields started an interdisciplinary project. Two years ago Vladimir Ulyantsev, an ITMO University researcher and programming specialist, visited Moscow to make contact with biologists. He then developed a program for counting k-mers and studying their spectra.

“The rapid development of personalized medicine is what makes this research relevant. It will help to determine which drugs patients with various diseases need. The wide applicability of these approaches is limited by their high price: the analysis of one genome or metagenome costs more than $1,000. This invention can also be applied to soil analysis and to petroleum extraction needs such as quality evaluation. Furthermore, one can study new types of bacteria,” noted Mr. Ulyantsev.

Usually, metagenomic analysis is based on comparing samples by their taxonomic composition: researchers compare the features of new samples with existing bacterial genomes. However, some unknown organisms, such as viruses, have no reference genomes to be compared with. Those parts of the sequences that have no analogues are not taken into account, even if they are important. The method developed by the Russian researchers does not require comparing new genomes with known ones, which makes it possible to analyze all parts of the sequences and obtain more precise results.
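A reference-free comparison of this kind can be sketched in a few lines of Python. The article does not say which dissimilarity measure the researchers use, so the Bray-Curtis measure below is an assumption chosen for illustration, as are the function names and the toy reads. The point is only that two samples can be compared directly through their k-mer spectra, without consulting any database of known genomes.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Pool the k-mer counts of all reads in one sample (hypothetical helper)."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two k-mer spectra.

    Counts are first converted to relative frequencies, so the result
    ranges from 0 (same spectrum) to 1 (no k-mer in common).
    This measure is an illustrative assumption, not the published method.
    """
    total_a = sum(a.values())
    total_b = sum(b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("both samples must contain at least one k-mer")
    shared = sum(
        min(a[kmer] / total_a, b[kmer] / total_b)
        for kmer in a.keys() & b.keys()
    )
    return 1.0 - shared


# Toy samples: short made-up reads, not real sequencing data.
sample_1 = kmer_counts(["ATGATGC", "GATGCA"], k=3)
sample_2 = kmer_counts(["ATGATGC", "CCCCGG"], k=3)
print(bray_curtis(sample_1, sample_2))
```

Because every k-mer contributes to the score, sequences from organisms with no known relatives still influence the result instead of being discarded.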

Read the article here.