While describing research done at your lab, you mentioned that you also develop systems for computational analysis of cells. Can you tell us a little more about it? How can mathematical models bring us closer to diagnosing various diseases? 

There are many examples of how these methods are applied in studying diseases and I tried to cover some of them in my lecture. In general, our tasks are broader than that, as we are trying to understand how cells work. 

These are extremely complex systems that contain a great number of components. For instance, even at the level of genes, there are over 10,000 possible transcripts (a transcript is an RNA molecule created as a result of transcription – Ed.), according to our estimates – and those are only the most common ones. There are even more types of proteins.

We could never model the whole system due to its complexity. That is why our aim is to understand individual processes: the particular molecules, combinations, or states that actually matter. In other words, to pinpoint the functional differences between cells, for example, separating those that do their job from those that malfunction. We have to treat this as a statistical task; otherwise, it just wouldn’t work. Each individual cell differs from the others in some respects. These differences account for a lot of “noise”, which we need to ignore while paying attention to the things that are important. That is why we need statistics to interpret cellular biology.
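The statistical view described above can be illustrated with a small, purely hypothetical sketch: any single cell's measurement is dominated by cell-to-cell noise, but a real difference between two populations of cells becomes detectable once many cells are measured. All numbers here are invented for illustration.

```python
# Hypothetical illustration: per-cell measurements are noisy, but a
# population-level test separates a real shift from cell-to-cell noise.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Simulated expression of one gene in 300 "healthy" and 300 "malfunctioning"
# cells; the true difference (2.0) is smaller than the per-cell noise (3.0).
healthy = rng.normal(loc=10.0, scale=3.0, size=300)
malfunctioning = rng.normal(loc=12.0, scale=3.0, size=300)

# Comparing a single pair of cells would be uninformative;
# averaging over hundreds of cells makes the shift statistically clear.
t, p = stats.ttest_ind(healthy, malfunctioning)
print(p < 0.01)  # the shift is detectable despite the noise
```

This is the essence of treating it as a statistical task: the signal lives in the distribution over many cells, not in any one cell.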

I am not saying that such models exist today; they are still somewhere in the distant future. But single-cell measurement methods bring us one step closer to understanding. We get to see the noisy diversity of cell states and sort out the details that actually matter.

What does it look like in practice? 

Let’s say we want to identify the cell type most affected by changes during schizophrenia. Our brains have at least a hundred types of neurons, so it would not be easy to find the one subtype that is most affected. What do we do? We measure the complex states of hundreds of thousands of individual cells and analyze them, trying to pinpoint which of the neurons have changed the most.
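A toy version of this kind of analysis might look as follows. This is a hypothetical sketch, not the lab's actual pipeline: cells are grouped by type, and each type is ranked by how far its average expression profile shifts between cases and controls. The data, cell-type names, and the simple distance-based score are all invented for illustration.

```python
# Hypothetical sketch: rank cell types by how strongly their average
# expression profile shifts between case and control samples.
import numpy as np

rng = np.random.default_rng(0)
n_genes = 50
cell_types = ["type_A", "type_B", "type_C"]

def simulate(shift):
    # 200 cells per type, n_genes measurements per cell;
    # only "type_B" receives the extra case-specific shift.
    return {t: rng.normal(loc=(shift if t == "type_B" else 0.0),
                          size=(200, n_genes)) for t in cell_types}

control = simulate(0.0)
case = simulate(1.5)

def shift_score(a, b):
    # Euclidean distance between mean profiles: a crude effect size
    return float(np.linalg.norm(a.mean(axis=0) - b.mean(axis=0)))

scores = {t: shift_score(control[t], case[t]) for t in cell_types}
most_affected = max(scores, key=scores.get)
print(most_affected)  # type_B shows the largest shift
```

In a real study the score would be a proper statistical test with multiple-testing correction, but the logic of "measure many cells, compare per-type profiles, rank the shifts" is the same.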

Peter Kharchenko's open lecture at ITMO

To do that, we naturally need a practical solution for analyzing the data we’ve collected: it has to be normalized and prioritized so that we can derive statistically significant changes to work with. The next, more precise level is trying to understand which of the genes functionally caused these changes, or what signals could have initiated them. The signals, especially the exogenous ones, are in that sense more interesting, because they can be easily manipulated – something we can’t say about intracellular processes.
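The normalization step mentioned above can be sketched in a few lines. This is a generic, hypothetical example of depth normalization, not the lab's specific method: cells sequenced at different depths are rescaled so their totals match before any statistical testing. The simulated counts are invented for illustration.

```python
# Hypothetical sketch of a common first step: normalize raw per-cell counts
# so that cells with different sequencing depths become comparable.
import numpy as np

rng = np.random.default_rng(1)
counts = rng.poisson(lam=5.0, size=(100, 20)).astype(float)  # cells x genes
counts *= rng.uniform(0.5, 2.0, size=(100, 1))               # uneven depth

depth = counts.sum(axis=1, keepdims=True)    # total counts per cell
norm = counts / depth * np.median(depth)     # rescale to a common depth
log_expr = np.log1p(norm)                    # variance-stabilizing log
```

After this step, every cell has the same total, so downstream tests compare expression composition rather than sequencing depth.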

Thus, our group, as well as many others, is developing methods that tell us what to look for next. In a way, it is a microscope capable of producing a rich and clear picture, so that when we look at it, we can see the way ahead. The method itself is, obviously, not an exhaustive answer and is not directly applicable to diagnosis and treatment.

You also said that data visualization is crucial to enable researchers to apply your models. How do you work with data?

I can tell you how we’ve been doing it for around five years. We collaborate with many experimental groups, so we have to plan a lot of experiments. We are usually responsible for the initial statistical analysis, but after that we have to visualize the data to demonstrate its most statistically significant aspects.

It is not easy to visualize the data acquired from analyzing such a great number of cells. That is why we had to create a set of software tools so that biologists, and we ourselves, could quickly identify the origin of changes in a given disease. It is very time-consuming, but the good news is that the results can be reused again and again. Now it’s not just our lab that has access to them, but many other research groups as well.

Peter Kharchenko's open lecture at ITMO

Lately, you have been involved in cancer research projects. How broad of a field does your lab cover? 

I would say that half of our projects have to do with cancer research. For instance, we study prostate cancer, leukemia, and neuroblastoma – a type of cancer that affects the sympathetic nervous system and is common in children.

We are primarily interested in two aspects of this field. First of all, there is the tumor’s microenvironment, because a lot depends on it. This can be seen in the way tumors metastasize only to certain tissues. Prostate cancer, for instance, metastasizes to the bone marrow in nearly 100% of cases, meaning that something there supports this type of tumor. It is important to keep track of such patterns.

Secondly, we pay attention to the differences between cancer cells, because they are thought to underlie the cells’ resistance to treatment. With our detailed methods, we can identify these differences and try to connect them to genetic factors and response to treatment.

The other half of our projects is tissue mapping. Here, we mostly work with the brain. 

You got your Bachelor’s in Physics, then went into Computer Science, and eventually received your PhD in Biophysics. And judging by the tasks covered by your lab, each of your researchers needs to be a biologist and a physicist with good programming skills. How do you find people who can handle all of it? What qualities do you look for?

It’s not that simple. As the person responsible for recruitment, I am always aware that we need someone with a broad range of skills for all of the projects we do. For almost all of them, our researchers have to be fluent in programming. On the other hand, they have to know the principles of statistics, as these form the basis of all our data analysis.

Harvard Medical School. Credit: social networks

Next, they have to understand biology and have an interest in the field – they need to keep reading on the topic. This field is easier to delve into, but you still need the basics. Finally, there is a whole list of other skills I would love to see in our researchers, but, of course, it’s impossible to find all of them in one person.

We try to find people who specialize in some of these fields and are willing to improve their knowledge of the rest. I would say that the best-case scenario is when a researcher already has a good understanding of numerical methods, statistics, and programming, while also being interested enough in biology to spend time on it. In my experience, such researchers take less time to adapt to our work.

Two years ago, you published an article called Challenges and emerging directions in single-cell analysis. Have the challenges and directions changed over the last two years? What do you plan to work on in the future? 

The field has changed so drastically that we would now have to rewrite everything we wrote two years ago. As for emerging directions, I would first note spatial transcriptomics. Experiments in this subfield are still extremely complex and the technology is under development, but it has a lot of advantages. First of all, these experiments allow us to look at the context, and multicellular organisms depend on context by definition, so without it we only see a rather small part of the picture.

Secondly, there is an important technical advantage. In the majority of such methods, we can freeze or chemically fix the tissue and thus study it in almost exactly the same state as it was in the organism. It is a highly promising field, and we are planning to apply it in many directions, for instance in cancer research.

DNA analysis. Credit: shutterstock.com

The other promising field, in my opinion, is closely connected to computational studies: the translation, or integration, of various measurement modalities. The transcriptional state is just one projection of a cell; we can also measure the states of DNA, proteins, and other cellular molecules. With computational methods, we can learn to translate between these modalities. As I see it, it is a case of training a model that knows a particular system well enough to predict the configuration of the regulatory elements from the state of DNA or transcription.
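The translation idea can be illustrated with the simplest possible model. This is a hypothetical sketch, not any published method: a linear map from one modality (here labeled `atac`, standing in for a DNA-level measurement) to another (`rna`, transcription) is fitted on cells where both were measured, then used to predict the unmeasured modality for new cells. All dimensions and noise levels are invented.

```python
# Hypothetical sketch of "translating" between measurement modalities:
# learn a linear map from modality 1 to modality 2 on jointly measured
# cells, then impute modality 2 for cells where only modality 1 exists.
import numpy as np

rng = np.random.default_rng(2)
W_true = rng.normal(size=(10, 30))             # hidden modality-1 -> modality-2 map
atac = rng.normal(size=(500, 10))              # modality 1 for 500 training cells
rna = atac @ W_true + 0.1 * rng.normal(size=(500, 30))  # noisy modality 2

W, *_ = np.linalg.lstsq(atac, rna, rcond=None)  # fit the translation
new_atac = rng.normal(size=(5, 10))             # cells with only modality 1
predicted_rna = new_atac @ W                    # impute the missing modality
```

Real cross-modality models are, of course, nonlinear and far higher-dimensional, but the structure is the same: fit on jointly measured cells, predict where one modality is missing.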

As for the challenges, there are always plenty of them, and on many levels. In computation, for example, they have to do with the amount of data and with problems of statistics and visualization. The spatial methods, in turn, take us back to microscopy, and with it come image processing issues.

One of your colleagues also mentioned ethical challenges in his lecture today. It is a relevant topic in CRISPR/Cas research. Do ethical aspects affect your work? 

Our particular field doesn’t have such strict limitations. There are ethical rules, however, which I fully support. We are more limited in other aspects, such as funding. Even if all of the ethics committees approve your research, it is challenging to receive financing for embryonic studies, because the amount of funding is limited. But I wouldn’t even call it a limitation; it is rather a certain threshold.

It is important to underline that we do not manipulate the system; we study its natural organization, so there are not as many ethical problems for us to solve. In general, I think that the scientific community has successfully delineated the fields and the ethical problems within them. And, naturally, the closer we get to humans, the more carefully we have to watch our research.