Analysis of the transcriptome, the set of all RNA molecules in a sample, is widely used in biomedical research. Using this method, researchers can study molecular processes in a tissue and, for instance, assess the severity of a cancer. However, tissue samples can contain millions of cells of different types, which need to be distinguished from each other before we can understand exactly what processes are happening in the tissue.

To solve this problem, scientists have developed deconvolution algorithms, which decompose the data and attribute it to different cell types. This approach helps researchers understand which cell types are present in a sample, in what proportions, and how they affect the transcriptome. However, if a sample contains many different cell types, it becomes difficult to identify all of them without additional information.
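As a rough illustration of what "decomposing the data" means, the sketch below treats a bulk sample as a weighted sum of two cell-type expression profiles and recovers the weight by least squares. It assumes the reference profiles are already known, which is exactly the additional information that is often missing; the function names and all numbers are made up for this example and do not come from the study.

```python
# Toy linear mixing model: a bulk sample is a weighted sum of cell-type profiles.
# All profiles and fractions are hypothetical values chosen for illustration.

def mix(profile_a, profile_b, frac_a):
    """Expression of a mixture with frac_a of type A and the rest of type B."""
    return [frac_a * a + (1.0 - frac_a) * b for a, b in zip(profile_a, profile_b)]

def estimate_fraction(mixture, profile_a, profile_b):
    """Least-squares estimate of the type-A fraction, given both reference profiles."""
    diff = [a - b for a, b in zip(profile_a, profile_b)]
    num = sum((m - b) * d for m, b, d in zip(mixture, profile_b, diff))
    den = sum(d * d for d in diff)
    return num / den

# Expression of four genes in each pure cell type (hypothetical)
profile_a = [100.0, 5.0, 20.0, 1.0]
profile_b = [2.0, 80.0, 25.0, 40.0]

bulk = mix(profile_a, profile_b, 0.3)            # simulated bulk measurement
estimated = estimate_fraction(bulk, profile_a, profile_b)
print(round(estimated, 3))
```

With noise-free data the estimate matches the mixing fraction used to build the sample. Real samples are noisy and contain more than two cell types, so practical tools solve a larger, constrained version of the same problem.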

An international research group from ITMO University and Washington University in St. Louis (USA) has found a way to overcome this obstacle and proposed a new method for analyzing transcriptome samples. Demonstrating high accuracy, the method determines which cell types the samples contain based on the principle of mutual linearity of genes: the expression levels of two genes specific to the same cell type depend linearly on each other. The scientists used this relationship to construct networks of linearly dependent genes; analyzing these networks reveals which cells are in the samples.
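The idea of mutual linearity can be shown with a small simulation. In the sketch below, a gene specific to one cell type is assumed to be expressed in proportion to that cell type's fraction in each sample, so two genes specific to the same cell type trace the same line through the origin across samples. Cosine similarity is used here only as a simple stand-in for a linearity score; it is not claimed to be the measure used in the published method, and all values are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two expression vectors measured across samples."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

# Hypothetical fractions of two cell types in five mixed samples
frac_a = [0.1, 0.3, 0.5, 0.7, 0.9]
frac_b = [1.0 - f for f in frac_a]

# Cell-type-specific genes: expression = fraction * level in the pure cell type
gene_a1 = [f * 120.0 for f in frac_a]   # specific to type A
gene_a2 = [f * 45.0 for f in frac_a]    # also specific to type A
gene_b1 = [f * 80.0 for f in frac_b]    # specific to type B

same_type = cosine(gene_a1, gene_a2)    # genes from the same cell type
diff_type = cosine(gene_a1, gene_b1)    # genes from different cell types
print(round(same_type, 3), round(diff_type, 3))
```

Genes from the same cell type score essentially 1, while genes from different cell types score clearly lower. Linking pairs of genes with high scores is one plausible way to picture a network of linearly dependent genes.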

The researchers have shown that all deconvolution algorithms are subject to the same bias: if different cell types in the sample contain different amounts of RNA, the algorithms fail to estimate cell type proportions accurately. To test this experimentally, two types of cells with different amounts of RNA were selected and mixed in different predetermined proportions. The scientists then used various deconvolution algorithms to determine the cell type ratios.

“We saw that existing algorithms would always be wrong about the number of cells, since what they estimate is the amount of RNA in the samples and not the actual number of cells. But if we conduct our measurements by adding a particular amount of artificial RNA to each sample, the predicted cell type proportions become more accurate,” explains Konstantin Zaitsev, a researcher at ITMO University’s Laboratory of Computer Technologies.
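The difference between RNA proportions and cell proportions can be made concrete with a little arithmetic. The sketch below assumes that the relative amount of RNA per cell is known for each cell type, for example from a calibration with added artificial RNA; that assumption, the function names, and the numbers are illustrative and are not taken from the study.

```python
def rna_fraction(cell_frac_a, rna_per_cell_a, rna_per_cell_b):
    """Share of total RNA contributed by type A, given its share of cells."""
    rna_a = cell_frac_a * rna_per_cell_a
    rna_b = (1.0 - cell_frac_a) * rna_per_cell_b
    return rna_a / (rna_a + rna_b)

def cell_fraction(rna_frac_a, rna_per_cell_a, rna_per_cell_b):
    """Share of cells that are type A, recovered from its share of total RNA."""
    cells_a = rna_frac_a / rna_per_cell_a
    cells_b = (1.0 - rna_frac_a) / rna_per_cell_b
    return cells_a / (cells_a + cells_b)

# Hypothetical mixture: equal cell numbers, but type A carries 3x more RNA per cell
true_cells_a = 0.5
observed = rna_fraction(true_cells_a, 3.0, 1.0)   # what an RNA-based estimate reflects
corrected = cell_fraction(observed, 3.0, 1.0)     # rescaled by RNA content per cell
print(observed, corrected)
```

In this toy case, type A makes up half of the cells but three quarters of the RNA, so an estimate based on RNA alone overstates it; dividing by the RNA content per cell recovers the original cell fraction.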

According to the researcher, this method is best suited for analyzing mixed samples without sufficient information about their composition. This method doesn’t need any additional information, which makes it suitable for all types of tissues.

“For example, it can detect differences in cell composition after vaccination in blood samples. Using the TCGA public database (The Cancer Genome Atlas), we are already trying to identify the cell types associated with the survival of patients with different cancers,” concludes Konstantin Zaitsev.

Reference: Complete deconvolution of cellular mixtures based on linearity of transcriptional signatures. Konstantin Zaitsev et al. Nature Communications. May 17, 2019.