The winners of the Skoltech Fellowship Program 2019 were announced just yesterday. You made it into the top five young scientists in the field of systems biology with a project dedicated to single cell RNA sequencing. Please tell us more about your work.
I will only start working on this project as part of the program. Nevertheless, I already have a number of projects in this field. The goal of this one is to identify the sources of variation in data from RNA sequencing of single cells.
For decades, we’ve been extracting RNA from a sample (a blood sample, for instance) and studying it as a whole, but now we can do even better than that. We can take a sample, which is a suspension of cells, and identify the RNA present in each individual cell. This helps us understand what happens in each of them.
Let’s say that you have a group of people with a disease and a control group, and you compare their RNA. You take blood samples from every person, but some people have more lymphocytes in their blood and some have fewer. You see, every sample has a very complex cellular composition.
Now, we can take a sample and split it into its cellular components straight away. This means that we can immediately learn the number of lymphocytes and their gene expression, the number of monocytes and their RNA, and so on. Biologically, the sampling is the same, but it allows us to reach a higher level of analysis and a better understanding of what’s happening in the sample.
This is extremely interesting, but there are particular issues. The RNA present in a cell is determined both by the cell type and by the particular processes taking place in that cell, such as division or a response to some external stimulus. We want to be able to understand several things at once: both the type of the cell and which processes have been activated in it. This is what my research is about.
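As a rough illustration of the difference, the sketch below groups single cell measurements by cell type and reports both the number of cells of each type and their average expression of one gene, next to the single averaged number a whole-sample measurement would give. The cell labels, the values, and the function names are made up for illustration; real data would come from a sequencing pipeline.

```python
from collections import defaultdict

# Hypothetical single cell measurements: (cell type, expression of one gene).
# Labels and values are invented for illustration only.
cells = [
    ("lymphocyte", 8.0),
    ("lymphocyte", 10.0),
    ("monocyte", 1.0),
    ("monocyte", 3.0),
    ("monocyte", 2.0),
]


def summarize_by_type(cells):
    """Return {cell_type: (number_of_cells, mean_expression)}."""
    grouped = defaultdict(list)
    for cell_type, expression in cells:
        grouped[cell_type].append(expression)
    return {
        cell_type: (len(values), sum(values) / len(values))
        for cell_type, values in grouped.items()
    }


def bulk_average(cells):
    """What a whole-sample measurement reports: one averaged number."""
    return sum(expression for _, expression in cells) / len(cells)


summary = summarize_by_type(cells)
bulk = bulk_average(cells)
```

Here the per-type summary separates how many cells of each type there are from how strongly each type expresses the gene, while the bulk value mixes the two.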
For how long have scientists focused on such tasks?
Today, single cell sequencing is at the forefront of systems biology. Only in recent years have scientists worked out how to effectively isolate cells for RNA sequencing and, more importantly, how to do that with thousands of cells at the same time. What’s more, we can now study not just the RNA but also the proteins on a cell’s surface. In other words, we can study everything at once.
How would you explain the importance of studying the RNA of single cells in layman's terms?
Different cells have different functions. To put it simply, here is an analogy: you have a pack of Skittles with candies of different colors. If you eat a handful at once, you’ll taste all of them together. You’ll know that you’re eating Skittles, but you won’t be able to tell what colors they were or how each particular candy tasted. Single cell RNA sequencing is like taking the pack, sorting the contents into groups by color (in our case, into cellular components), and trying to understand which cellular processes have been activated in these cells.
On the one hand, it is the processes that we are interested in; on the other, we want to know exactly how many cells we have. If it’s a tumor that we’re studying, it has both malignant and immune cells, and we want to know what these immune cells are doing there, how many of them there are, and how many other cells there are that do something useful. If you take a piece of the sample and analyze all of its RNA together, you will get an averaged result, while what we want is to learn about all the key players and how they operate.
When and how did you arrive at the idea for your project?
I’ve been working with this technology for several years, and I’ve come to understand that there are issues in this field that different people learned to solve by various means.
On the one hand, when you have a great number of cells, you start to analyze the data and try to understand which types of cells you have (for example, T cells, B cells, monocytes, etc.). On the other, you have the division signal. The point is that this signal changes the cells to such an extent that dividing cells become more similar to each other than to the cells of their own type. In a sense, you can say that the signal completely “upgrades” the cell.
As I’ve already said, people have learned ways of avoiding that: if they know that cell division is present in their dataset, they get rid of this signal and analyze the data in its absence. But this is very interesting data which you wouldn’t want to lose. We’d really like to be able to study all the cells in a sample together with their types and the associated processes.
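To make "getting rid of the signal" concrete, here is a minimal sketch of one way it can be done, assuming each cell already has a numeric division score: fit a straight line of a gene's expression against that score and keep only what the line does not explain. The scores, the expression values, and the function name are hypothetical, and this is a simple linear regression chosen for illustration, not a description of any particular group's pipeline.

```python
def regress_out(expression, score):
    """Remove the part of `expression` linearly explained by `score`.

    Fits expression ~ intercept + slope * score by ordinary least squares
    and returns the residuals shifted back by the mean expression, so the
    values stay on the original scale.
    """
    n = len(expression)
    mean_x = sum(score) / n
    mean_y = sum(expression) / n
    var_x = sum((x - mean_x) ** 2 for x in score)
    if var_x == 0:
        return list(expression)  # constant score: nothing to remove
    slope = sum(
        (x - mean_x) * (y - mean_y) for x, y in zip(score, expression)
    ) / var_x
    return [y - slope * (x - mean_x) for x, y in zip(score, expression)]


# Hypothetical data: four cells, a division score per cell,
# and one gene's expression in each cell.
division_score = [0.0, 1.0, 0.0, 1.0]
gene_expression = [2.0, 5.0, 4.0, 7.0]
corrected = regress_out(gene_expression, division_score)
```

After the correction, dividing and non-dividing cells no longer differ in this gene, which is convenient for identifying cell types but also shows how the information about division itself is thrown away.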
What are the practical applications of single cell RNA sequencing?
It has many uses; as for us, we have applied it to studying various diseases and their models: oncological diseases, viral diseases, atherosclerosis. In cancer research, single cell RNA sequencing is good for studying both malignant cells and the immune cells that enter the tumor to fight the disease. We can also use this technology to study how the Zika virus infects bone marrow stem cells in mice. On the whole, the applications of single cell RNA sequencing are not limited to medicine, but it’s this field that I’m personally interested in.
How are you going to organize your research work?
The project is planned to last three years, and I hope that will be enough time to do the necessary research. I’ll be doing most of my work here at ITMO University. What’s more, we also have a collaboration with Washington University in St. Louis. So, if we find that we need to generate a biological dataset in order to validate this method, that won’t pose much of a problem.
Your article has recently been published in Nature Communications. Did it have to do with a similar field?
The field was slightly different, but the tasks had a lot in common. In that research, we studied the RNA of mixed samples rather than single cells; we proposed that a sample can be considered a combination of cell types, and tried to predict the proportions of cell types in the samples. In my current research, I consider every particular cell as a combination of cell types and cell processes, and I also try to understand how to split it into components. From a mathematical standpoint, these tasks are very similar, so I aim to apply the practices from that research article to working out how it all operates at the cellular level.
You got your Bachelor’s degree at ITMO’s Computer Technologies Department. Why did you choose not to focus on industrial programming and instead become interested in science, namely systems biology?
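The idea of a sample as a combination of cell types can be sketched as a linear mixture. The minimal example below assumes just two cell types with known reference expression profiles and estimates the fraction of the first type in a mixed sample by least squares. The profiles, the three-gene setup, and the function name are hypothetical, and this is a toy illustration rather than the method from the published article.

```python
def estimate_fraction(mixed, profile_a, profile_b):
    """Least-squares estimate of p in: mixed ~ p * a + (1 - p) * b.

    Minimising sum(((mixed - b) - p * (a - b)) ** 2) over p gives
    p = sum((mixed - b) * (a - b)) / sum((a - b) ** 2),
    which is then clipped to the valid range [0, 1].
    """
    diff = [a - b for a, b in zip(profile_a, profile_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("identical reference profiles: fraction not identifiable")
    numer = sum((m - b) * d for m, b, d in zip(mixed, profile_b, diff))
    return min(1.0, max(0.0, numer / denom))


# Hypothetical reference profiles over three genes (made-up numbers).
t_cell = [10.0, 0.0, 4.0]
monocyte = [0.0, 8.0, 4.0]

# A mixed sample constructed as 25% T cells and 75% monocytes.
mixed_sample = [0.25 * a + 0.75 * b for a, b in zip(t_cell, monocyte)]
fraction_t = estimate_fraction(mixed_sample, t_cell, monocyte)
```

The same arithmetic applies whether the mixture is a whole sample made of cell types or, by analogy, a single cell's profile made of a type component and a process component; only the meaning of the reference profiles changes.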
I think my main motivation is the opportunity to generate new knowledge. Actually, I have quite a story with biology. I graduated from a lyceum with a focus on physics and mathematics, and I didn’t have a teacher who would’ve shown me that biology is indeed an interesting science. I believe that it’s a common trait of such educational establishments where the main focus is math. Therefore, I never liked chemistry and biology in my school years.
Towards the end of my Bachelor’s program, I started working with Aleksey Sergushichev (Aleksey Sergushichev is a research fellow at ITMO’s international laboratory “Computer Technologies”, head of the Bioinformatics and Systems Biology Master’s program – Ed.). My work was mostly associated with programming, but I had already started trying to learn more and to understand biology better.
For example, I watched a great video course, “Introduction to Biology” by Eric Lander, which really motivated me. Then I went to the Bioinformatics Summer School, and they had many great lecturers who spoke about this science and their research. So I thought: “Wow. It’s all totally great”. Since then, I’ve been focusing on this field.
Systems biology as a general term became widespread in the 2000s, which makes it a young research field in the context of fundamental science. What is working in this field like?
For me, systems biology is something like a toolkit for biologists. We solve fundamental tasks on a daily basis: biologists bring us data, and we analyze it and generate new hypotheses. Then, biologists return to their laboratories and conduct experiments in order to validate them.
In a sense, systems biology is similar to machine learning: it is most useful when it solves specific tasks of some particular field.
In all interdisciplinary fields, the issue of understanding specialists from other areas is ever relevant. How is this problem solved in the field of systems biology?
Right now, I’d say that I exist in between two research fields, so you could say that I act as a translator of sorts. But in the early stages, it was indeed a complex process. During the first two years, I had problems understanding my tasks: I got the general idea, but grasping the nuances was quite hard.
I was lucky to get the chance to do an internship in the USA, at Washington University in St. Louis, Department of Pathology & Immunology. I worked there in a team where about seven people knew how to code. The remaining members, over 200 people, were hardcore immunologists. Once a week, we had seminars where PhD students and postdocs from various laboratories presented their work in 15 minutes. At first, I couldn’t understand more than the first three slides of any presentation. In about a year’s time, I learned enough to understand the first ten. Taking your first steps in systems biology is not easy: you have to read a lot and keep track of the latest research in this field.
While on this subject, could you please name the key competencies that a systems biologist should have?
Without doubt, you need fundamental knowledge in the field of molecular biology and bioengineering. Nowadays, not knowing anything about CRISPR is simply unacceptable in our line of work. Without that, you won’t be able to understand any articles that are published in international journals.
On the whole, you need to keep pace with emerging technologies. You can’t focus on one task only and wait until it’s solved. It’s important to also look at what’s being done in biology and its adjacent fields. I’d even say that today, it’s hardly possible to find an article in a good international journal whose authors are all pure biologists. Solving a task from within a single research field has simply become impossible.
On the other hand, we have collaborations. You don’t have to dive too deep into another field. If a team has one person with deep knowledge of biology and another who’s proficient in bioinformatics, they will surely be able to come to an agreement and understand each other.
As part of the Skoltech Fellowship Program, you’ll be working in Russia for the next three years. On the whole, are there enough opportunities and prospects for doing systems biology in today’s Russia?
If it is bioinformatics and systems biology that we’re talking about, then yes, absolutely. The situation might be different with the biological sciences, as you need to purchase chemicals and additional equipment, have them delivered, and find additional financing for these expenses. As for systems biology, all you need is a laptop and sometimes a computer cluster, and we have plenty of such resources here in Russia. Finally, you can involve collaborators and interact with them remotely.
We also have enough competencies and specialists, both in St. Petersburg and Moscow as well as other cities. For example, we are soon leaving for a conference in Tomsk; there also are strong competencies in Kazan. What’s more, we conduct various events, including workshops, summer schools and so on, which often bring together participants from all around the country.
Apart from your work as part of the Fellowship Program, what are the projects that you are planning to conduct in the near future?
We want to study a great number of single cell RNA sequencing datasets, including those associated with cancer research. It would be great to develop a service where every researcher who focuses on this subject could open any of the publicly available datasets and have a look at the data. As of now, there’s no problem with accessing common expression datasets, but with single cell datasets it is not that simple due to their size, their complexity, and particular issues with visualization. Then again, single cell RNA sequencing datasets offer a great opportunity to learn what is expressed and where, and in which tissues, cells or cell populations a particular gene can be active. Such a service can be useful to researchers who are on the lookout for new leads.