Denis Razbitksy, Ural Federal University

As part of the summer school I participated in a project on reconstructing phylogenetic trees. A phylogenetic tree is a type of graph depicting the evolutionary links between different species. Why do we need to master this technique for cancer research? So that we can retrace how a healthy cell turns into a cancerous one. This would allow for earlier diagnosis of oncological diseases. Nikita Alekseev, a leading research associate at ITMO University’s Computer Technologies Department, spearheads this line of research, and he became our project supervisor.

Although the project scope was reduced to only two types of mutation, deletion and duplication, the task we faced was far from easy. The summer school program included a seminar on the Markov chain Monte Carlo (MCMC) sampling method, and we decided to apply it to our project. I remembered hearing about cases when similar tasks were solved by using decision trees based on a set of machine learning algorithms, which also inspired our research approach. The largest chunk of time was spent away from the computer; we had to organize our thoughts and test different ideas. It was only after this that we started programming, though we had to limit ourselves to statistical calculations due to time constraints. We still had a great time and not only gained new and valuable skills, but also had the chance to put them into practice.
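For readers unfamiliar with MCMC, the idea can be illustrated with a minimal sketch. It is not the team's actual model: it assumes a toy setting in which every observed mutation event is either a duplication (with unknown probability p) or a deletion, and it uses a random-walk Metropolis sampler to draw values of p from the posterior under a uniform prior. The function names, the counts, and the step size are all hypothetical.

```python
import math
import random


def log_posterior(p, n_dup, n_del):
    """Unnormalized log posterior of the duplication probability p (uniform prior)."""
    if not 0.0 < p < 1.0:
        return float("-inf")  # values outside (0, 1) have zero posterior mass
    return n_dup * math.log(p) + n_del * math.log(1.0 - p)


def metropolis(n_dup, n_del, n_steps=20000, step=0.1, seed=0):
    """Random-walk Metropolis sampler for p given counts of duplications and deletions."""
    rng = random.Random(seed)
    p = 0.5  # arbitrary starting point
    current = log_posterior(p, n_dup, n_del)
    samples = []
    for _ in range(n_steps):
        # Symmetric proposal: a small uniform step around the current value.
        proposal = p + rng.uniform(-step, step)
        candidate = log_posterior(proposal, n_dup, n_del)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            p, current = proposal, candidate
        samples.append(p)
    return samples


# Hypothetical data: 30 duplications and 10 deletions.
samples = metropolis(n_dup=30, n_del=10)
kept = samples[2000:]  # discard the first steps as burn-in
posterior_mean = sum(kept) / len(kept)
```

In this toy case the posterior is a Beta distribution with mean (30 + 1) / (30 + 10 + 2), so the sampled mean can be checked against a known answer. In tree reconstruction the same accept-or-reject loop would run over candidate trees rather than a single number.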

The 2018 Bioinformatics summer school

There’s a lot of ongoing research in the field of cancer studies. Amongst the most topical subjects are cancer immunotherapy, oncogenomics, and personalized oncology. But I opted for researching phylogenetic trees because I wanted to use my probability theory and mathematical statistics knowledge for solving some real-life issues, and I think that my team and I managed to do just that.

Elena Kartysheva, St. Petersburg State University

In as little as three days I managed to establish connections with some like-minded computer scientists, and we decided to take on a project under the guidance of Konstantin Zaytsev, Professor at Washington University in St. Louis. The project bore the mysterious and long-winded name of ‘Comparative analysis of LUAD and LUSC of lung cancer using the TCGA data’, but that didn’t scare us away as we were interested in trying something new. It turned out that there are two types of lung cancer, each with a different gene expression profile. Telling them apart enables more accurate diagnoses and, therefore, more effective treatment. This is why targeted cancer treatment is a very topical and important branch of cancer research.

We talked to our supervisor and set ourselves the task of finding something interesting, such as genetic patterns and correlations, but there wasn’t much time to do that. We had to analyze the mutated genes present in lung tumors and identify how they differ from one another, which would in turn lead to a better understanding of the specific features of the two types of lung cancer.

The 2018 summer school participants

At first we didn’t have a clue how to tackle this enormous task, but later we decided to narrow it down to a binary classification problem. We began by testing different models and working with databases, although the latter wasn’t much help as we had to use a sparse data array. This process of trial and error showed that the Yandex-developed CatBoost model was the most effective, so we decided to use it for the second stage of our research project, which consisted of visualization and results analysis.

The Bioinformatics Institute has been hosting the Bioinformatics summer school since 2013. The school is an intensive one-week course attended by a hundred early-stage researchers, as well as undergraduate and PhD students majoring in mathematics, computer science, and biology and interested in bioinformatics. This year’s extensive study program took participants to the forefront of cancer research with insightful lectures on molecular biology, genetics, bioinformatics, and their use in cancer diagnostics. Students tested their newly acquired knowledge during practical classes on processing data in the Python programming language and the R statistical environment, building pipelines, and conducting molecular simulations. They were also given the opportunity to learn more about presenting their research results, building a career in bioinformatics, and compiling an attractive scientific CV. Guest lecturers included leading Russian and international scientists such as Alla Lapidus (St. Petersburg State University’s Laboratory for Algorithmic Biology), Mikhail Pyatnitsky (The V. Orekhovich Institute of Biomedical Chemistry), Alexey Sergushichev (ITMO University), Konstantin Zaytsev (Washington University in St. Louis), German Demidov (Barcelona Institute of Science and Technology / Universitat Pompeu Fabra), Konstantin Okonechnikov (German Cancer Research Center), Pavel Sinitsyn (Max Planck Institute of Biochemistry), and Ilia Minkin (Pennsylvania State University), among many others.

Organizers always strive to improve on the summer school’s program and format. Initially all projects were compulsory and continued for the whole duration of the one-week program. Now, however, the format has switched to a hackathon consisting of short-term research projects that are optional and conducted in teams.

“Since 2016, each of the summer schools has been assigned a different theme, which allows previous participants to return to the summer school if they’re interested in this year’s subject. This also gives us the chance to make our lectures more in-depth and field-specific, which is much more useful for people involved in a particular research area. Last year’s school, for instance, centered on data mining, so we had a lot of lectures on statistics, as well as machine and deep learning. The 2018 program revolved around the topic of cancer research and included classes on Cancer Atlas Consortium and disease detection. We chose this theme because it’s a popular research topic with lots of available data that bioinformatics scientists are very interested in,” explained Olga Bondareva, the Bioinformatics Institute coordinator.

More than 450 promising students and researchers from Russia and all over the world applied to take part in the 2018 Bioinformatics summer school. Divided into biology and informatics educational tracks, the final shortlist included participants from five countries and thirty cities. Organizers selected the students based on their application form, CV, and research proposal, especially focusing on applicants’ motivation, educational background, academic and scientific record, as well as other accomplishments. You don’t have to be a bioinformatics ace to apply as the school welcomes participants with a beginner level of bioinformatics knowledge.

“We look for applicants who know why they need the knowledge the school offers and how they plan to use it. It would be ideal if an applicant was working on an NGS-related project and needed more data processing skills. Unfortunately, we are often approached by people who just heard about bioinformatics and decided to give it a go,” shared Olga Bondareva.