What is DNA?

It’s more or less common knowledge that cells contain proteins, fats, and carbohydrates. Along with these, nearly every cell contains deoxyribonucleic acid (DNA), which stores hereditary information. For specialists in genomics and bioinformatics, DNA is one of the main languages of biology. Its alphabet has only four letters: A, T, G, and C. These letters stand for the nitrogenous bases that DNA is made of: adenine (A), cytosine (C), guanine (G), and thymine (T). DNA is like a computer program: it might be very complex and full of kludges, and yet it works.
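
To make the "four letters" idea concrete, here is a minimal Python sketch that treats a DNA fragment as an ordinary string and counts each base. The function name `count_bases` and the example sequence are made up for illustration; real sequencing data also contains ambiguity symbols such as N, which this toy version simply rejects.

```python
from collections import Counter

# The four-letter alphabet described above.
DNA_LETTERS = "ACGT"


def count_bases(sequence):
    """Count how many times each of A, C, G, T occurs in a DNA string."""
    sequence = sequence.upper()
    unexpected = set(sequence) - set(DNA_LETTERS)
    if unexpected:
        # Assumption of this sketch: anything outside A, C, G, T is an error.
        raise ValueError(f"not DNA letters: {sorted(unexpected)}")
    counts = Counter(sequence)
    # Counter returns 0 for letters that never appear.
    return {letter: counts[letter] for letter in DNA_LETTERS}


print(count_bases("ATGCGA"))  # {'A': 2, 'C': 1, 'G': 2, 'T': 1}
```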

How is it possible to extract DNA from hair, blood, or saliva?

DNA is present in almost all body cells. The exception is erythrocytes (red blood cells), which lose their nuclei as they mature so that they can carry oxygen more efficiently. So, there is a wide range of biological materials suitable for DNA extraction. The process consists of four stages: breaking down cell membranes to release the DNA, removing the proteins bound to it, removing other impurities, and dissolving the DNA so that it can be stored. It can be done even at home, but then the DNA wouldn’t be very clean and would hardly be suitable for scientific research. The quality of extraction also affects the information you’ll get as a result, so you need specialists in molecular biology to do everything with high precision. Blood is the most convenient material because it is easy to collect. Even though erythrocytes don’t contain DNA, blood contains a great many white blood cells, which are a plentiful source of it. There’s less DNA in saliva and oral epithelium, and even less in hair. You need a blood sample to decode the entire genome with high precision, but saliva will suffice for a DNA test, where precision is not that important.

What is the difference between a genome and DNA?

DNA is the name of a molecule that contains hereditary information. A genome is the entirety of a living organism’s DNA; it contains all the information about a species or even an individual. That’s why you can say either “the human genome” or “the genome of a particular person – Peter’s or Mary’s”. On a physical level, any genome is made of chromosomes. Humans have 23 pairs of chromosomes: 23 inherited from the father and 23 from the mother, 46 overall. Once the organism is conceived and begins to grow, every new cell receives a copy of this set of chromosomes. Sometimes minor errors occur during copying; these are called somatic mutations. Sometimes the errors are serious and cause various disorders.

What does it mean to decode a genome? Who encoded it?

No one, it’s just an appropriate word to describe the process of working with genomes. Going back to the previous analogy, a genome is not only a complex program but also a very badly written one. Beyond the sequence of A, C, G, and T, it also carries many additional layers of encoded information that aren’t necessarily inherited and can change during the lifetime of an organism. These layers are called the epigenome and are studied by epigenetics. All this complexity makes it seem as if we have to "decode" the genome to study it. Plus, the term we use in Russian (расшифровать – “to decipher”, not “to decode” – Ed.) is not an entirely correct translation from English. In English, “a code” is just a system of symbols. It doesn’t imply there’s a secret or a security system. Any language is a code. A cipher, however, is a code protected against being broken. So, the English terms are less romantic than the Russian ones.

They say that you can track your genetic history with the help of your genome. Is it like a DNA test?

By “DNA test” we often mean the analysis of a few small parts of a genome whose variations are known to affect the organism. In genomic research, scientists work with a much larger amount of DNA, ideally with all of it. That’s called whole-genome research. However, fragments of the genome will suffice if we want to figure out someone’s genetic history or find out whether two people are related. This is possible thanks to mathematics and the fact that our genome contains fragments that vary from person to person, in combinations unique to each individual.
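
As a rough illustration of how variable fragments can hint at relatedness, the Python sketch below compares two people at a handful of markers and reports the fraction of gene variants (alleles) they share. The marker names, genotypes, and function names are invented for this example, and the score is a deliberately naive one: real kinship analysis uses many more markers and takes into account how common each variant is in the population.

```python
def shared_alleles(genotype_a, genotype_b):
    """Return how many alleles (0, 1 or 2) two genotypes have in common."""
    remaining = list(genotype_b)
    shared = 0
    for allele in genotype_a:
        if allele in remaining:
            remaining.remove(allele)  # each allele can be matched only once
            shared += 1
    return shared


def sharing_score(person_a, person_b):
    """Fraction of alleles shared across the markers both people have."""
    markers = person_a.keys() & person_b.keys()
    if not markers:
        raise ValueError("no markers in common")
    total = sum(shared_alleles(person_a[m], person_b[m]) for m in markers)
    return total / (2 * len(markers))


# Hypothetical genotypes: two alleles per marker, one from each parent.
parent = {"m1": ("A", "G"), "m2": ("C", "C"), "m3": ("T", "A")}
child = {"m1": ("A", "A"), "m2": ("C", "T"), "m3": ("A", "G")}
stranger = {"m1": ("G", "G"), "m2": ("T", "T"), "m3": ("G", "G")}

print(sharing_score(parent, child))     # 0.5
print(sharing_score(parent, stranger))  # lower than the parent-child score
```

In this toy data the parent and child share at least one allele at every marker, which is what inheritance predicts, while the unrelated person matches only by chance.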

How can scientists learn something about, say, ancient lions by taking blood or hair samples from their modern descendants?

Complex mathematical algorithms allow us to determine the most likely scenario from genetic data: they estimate when mutations in certain fragments of the genome took place and, in this way, how the genome arrived at its present form. It’s a mathematical time machine of sorts. Scientists from ITMO University have recently published a program that helps go back in time and see the past of a genome more precisely. One of the most exciting features of this approach is the ability to infer geographical location. We not only simulate changes in the genome, but also see how populations with specific genetic variations migrated from one place to another.

So, we take the bone from an early Paleolithic tiger and make conclusions about its entire population. But how do we separate common features from individual ones?

We try to create a model that describes scenarios of the past as precisely as possible. One sample is not enough to make the model specific. The more samples we have, the more precise our model is. We all have lots of rare genetic variations, and some of them are even unique. If we have several samples, individual variations are filtered out so that they don’t distort our results. Evolution deals with populations, not individuals.
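
The filtering step can be sketched in a few lines of Python. The idea here is simply to keep the variations seen in at least a chosen number of samples and drop those found in a single individual. The sample names, variant labels, and the `min_samples` threshold are assumptions made for this illustration; actual population-genetics pipelines use far more elaborate statistics.

```python
from collections import Counter


def shared_variants(samples, min_samples=2):
    """Keep variants that occur in at least `min_samples` different samples."""
    counts = Counter()
    for variants in samples.values():
        # set() makes sure each sample votes for a variant only once
        counts.update(set(variants))
    return {variant for variant, n in counts.items() if n >= min_samples}


# Hypothetical data: each sample lists the variants found in one individual.
samples = {
    "sample_1": {"v1", "v2", "v3"},
    "sample_2": {"v1", "v3", "v4"},
    "sample_3": {"v1", "v5"},
}

print(sorted(shared_variants(samples)))  # ['v1', 'v3']
```

With only one sample, every variant would look equally important; with three, the variants unique to one individual (v2, v4, v5 in this made-up data) drop out.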

How do we prove our conclusions if we can’t always find the remains of a species’ ancestors?

Good models are able to make predictions. The easiest way to verify our conclusions is to check whether new data agrees with the model. Sometimes it doesn’t, and the model has to be revised. The SARS-CoV-2 genome is an insightful illustration of mutations, evolution, and so on. Never before have we had such detailed data on the evolution of a species. At the same time, we get more and more data for other genomes. Over time, the models will improve: the more data we have, the better they are.

Why do we need all this?

It’s a fascinating process of exploration, but there are also lots of practical applications. Speaking of discovering the history of one’s ancestors, there is a mathematical tool used to determine so-called population bottleneck events. These are moments when the size of a population decreased significantly for some reason. Knowing when such events happened – genomic archaeology, one might say – allows us to understand how to avoid them in the future. It’s especially relevant now, when many species are endangered.