Earth is home to millions of biological species, and this diversity is encoded at the genetic level. Animals’ anatomy, size, color patterns and habits are defined by their genes. Yet the diversity of genes themselves is not that great: to date, scientists have identified only a little over 20,000. Species therefore differ not only in the sets of genes they have but also in how these genes are arranged. In the language of comparative genomics, this is called synteny, i.e. the arrangement of genes and regulatory elements.

“Let’s take a gorilla and a chimpanzee as an example,” says Ksenia Krasheninnikova, a researcher and engineer at ITMO University. “These two species have the same set of genes, but their regulatory elements and genome rearrangements create slightly different orders, which results in differences between these primates.”

Primates. Credit: shutterstock.com

Therefore, to understand how closely related two species are in evolutionary terms, scientists need to know not just their genes but also how those genes are arranged on chromosomes and how many genome fragments the species share.

“Gene sequences that retain their order in different species are called synteny blocks,” comments the researcher.
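
To make the definition concrete, here is a minimal Python sketch that finds runs of genes appearing consecutively and in the same order in two species. The gene names, the `synteny_blocks` function and its `min_genes` parameter are hypothetical and chosen only for illustration. This toy version works on annotated gene lists and ignores inverted segments, so it is not how halSynteny operates.

```python
def synteny_blocks(order_a, order_b, min_genes=2):
    """Return maximal runs of genes that are adjacent and in the same
    order in both gene lists (toy definition, forward orientation only)."""
    # Position of every gene in species B, for constant-time lookup.
    pos_b = {gene: i for i, gene in enumerate(order_b)}
    blocks, current = [], []

    def close_run():
        # Keep the current run only if it is long enough to count as a block.
        if len(current) >= min_genes:
            blocks.append(list(current))
        current.clear()

    for gene in order_a:
        if gene not in pos_b:
            # Gene missing in species B: the run is interrupted.
            close_run()
            continue
        if current and pos_b[gene] == pos_b[current[-1]] + 1:
            # Gene directly follows the previous one in species B as well.
            current.append(gene)
        else:
            close_run()
            current.append(gene)
    close_run()
    return blocks


# Hypothetical gene orders: the same six genes, with two segments swapped.
species_a = ["g1", "g2", "g3", "g4", "g5", "g6"]
species_b = ["g4", "g5", "g6", "g1", "g2", "g3"]

for block in synteny_blocks(species_a, species_b):
    print(block)
```

In this made-up example both species carry the same genes, but a rearrangement has split them into two blocks, each of which keeps its internal order.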

In search of a new tool

Identifying these synteny blocks is an essential task for geneticists: it helps them better understand the mechanisms of evolution and speciation. But looking for the blocks manually is not feasible, because the amount of data is simply too large. Mammalian genomes consist of billions of base pairs, so processing them without big data technologies is next to impossible.

Then again, finding a suitable software tool for processing such data can be quite hard: some algorithms are very slow, others do not support entire families of data storage formats, and some simply cannot cope with today’s tasks.

“Genomics is a constantly developing science, and the quality of genome assemblies is constantly increasing,” explains Ksenia Krasheninnikova. “Some old trustworthy programs just can’t work with the vast amounts of data that we get nowadays. They simply weren’t designed for this much data.”

Ksenia Krasheninnikova. Credit: personal archive

For this reason, scientists are creating more advanced programs that can handle the new kinds of tasks that have emerged as the field develops. This is exactly what a research team that included scientists from ITMO’s Laboratory of Genomic Diversity has done.

Comparing a cat to a dog

The new software tool was named halSynteny. According to its authors, it can search for synteny blocks better and faster than other programs developed for this purpose. What’s more, halSynteny works with data in two standard and well-documented formats.

“Our goal was to create an algorithm that could be easily applied to accessible data,” says Ksenia, who is the first author of the study. “Some approaches to identifying synteny blocks rely on annotating genes in advance; our method is different. We don’t use any additional annotation. We use an alignment-based method, in which parts of one genome are aligned with parts of another genome according to their degree of similarity. This way, we can identify homologous parts, that is, parts that share the same origin.”
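
The annotation-free idea can be illustrated with a short Python sketch that merges neighboring alignment blocks into larger synteny blocks. It assumes the alignment is already available as half-open coordinate intervals on one chromosome pair and one strand. The function `merge_collinear`, its `max_gap` and `min_len` parameters and all coordinates are hypothetical, and the greedy merging shown here is a deliberate simplification rather than the algorithm implemented in halSynteny.

```python
from typing import List, Tuple

# One aligned segment: (a_start, a_end, b_start, b_end), half-open intervals.
# Assumption: all segments lie on the same chromosome pair and the same strand.
Block = Tuple[int, int, int, int]


def merge_collinear(blocks: List[Block], max_gap: int = 10,
                    min_len: int = 20) -> List[Block]:
    """Greedily merge alignment blocks that follow each other in both
    genomes with gaps of at most max_gap, then drop short results."""
    merged: List[Block] = []
    for a_start, a_end, b_start, b_end in sorted(blocks):
        if merged:
            pa_start, pa_end, pb_start, pb_end = merged[-1]
            gap_a = a_start - pa_end  # distance in genome A
            gap_b = b_start - pb_end  # distance in genome B
            if 0 <= gap_a <= max_gap and 0 <= gap_b <= max_gap:
                # Same order and close enough in both genomes: extend the block.
                merged[-1] = (pa_start, a_end, pb_start, b_end)
                continue
        merged.append((a_start, a_end, b_start, b_end))
    # Keep only blocks that span at least min_len positions in genome A.
    return [m for m in merged if m[1] - m[0] >= min_len]


# Made-up alignment blocks, not real cat or dog coordinates.
alignment = [
    (0, 10, 100, 110),
    (12, 30, 113, 131),    # continues the first block after small gaps
    (500, 505, 900, 905),  # isolated and too short to keep
    (40, 60, 300, 320),    # far away in genome B, so a separate block
]

for block in merge_collinear(alignment):
    print(block)
```

Here the first two segments merge into one block because they stay close and in the same order in both genomes, while a jump in the second genome starts a new block.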

Cat and dog. Credit: shutterstock.com

The program runs more than twice as fast as SatsumaSynteny2, another popular tool. This performance was achieved by implementing a computationally efficient algorithm in C++.

The proposed method and software tool were tested by comparing cat and dog genomes. 

“We showed that large fragments of cat chromosomes and some fragments of dog chromosomes are united in synteny blocks, which means that they evolved from similar chromosomes of a common ancestor. This can serve as a basis for conclusions about their evolution. Previous ‘wet’ biology research demonstrated that the cat genome has changed less than the dog genome relative to the genome of their common ancestor. This can be seen by comparing them with other species outside the order Carnivora. Our results confirm these conclusions and make them more precise. This means that in a given region, the genome of the cat is similar to that of the species taken for comparison, whereas in the dog it is rearranged.”

In the future, this algorithm will be used in other comparative genomics research at ITMO University.

Reference: Ksenia Krasheninnikova, Mark Diekhans, Joel Armstrong, Aleksei Dievskii, Benedict Paten, Stephen O’Brien. halSynteny: a fast, easy-to-use conserved synteny block construction method for multiple whole-genome alignments. GigaScience, 2020. DOI: 10.1093/gigascience/giaa047