Contents:

  1. Why it’s relevant
  2. Program tracks
  3. Career prospects
  4. Potential applicants
  5. Enrollment opportunities

A lot of data – a few experts

97 zettabytes or 97 billion terabytes – that was the total volume of data created, consumed, and stored worldwide in 2022. By 2025, it is projected to nearly double, reaching a staggering 181 zettabytes.

Data has become a tool for solving all kinds of problems. In banking, for instance, information about client transactions is analyzed to design better financial products or assess clients’ creditworthiness. Streaming services rely on data to make more accurate personal recommendations, while public services use machine learning to process hours of CCTV footage in order to monitor and prevent road accidents.

When amassed in such volumes, data cannot be processed on regular computers; instead, distributed systems are used for the purpose. These systems bring together multiple machines, each responsible for a specific task in data analysis.

These systems are built and operated by data engineers, who need to be familiar with a wide range of technologies for data storage and access, mass data processing, and server clustering, as well as have experience in backend development and in creating and maintaining databases.

In recent years, data engineers have become increasingly sought after: demand grew by 50% between 2019 and 2020 alone. According to Alexey Platonov, an associate professor at the Faculty of Software Engineering and Computer Systems and a specialist in information retrieval systems at Huawei Russia, there is still a shortage of such specialists on the market, and many of those already working in the field could use additional training.

ITMO’s new Master’s program targets these problems while also offering its students fundamental training in big data processing.

“It often happens that recent graduates have no idea about the structure of big data systems. This means that once they start working, they have to pick up a lot on the spot. Various online courses can come to the rescue, of course, but they usually lack the systematic approach and cover only single aspects or algorithms. At the same time, existing university programs don’t offer a broad fundamental overview of working with big data in distributed systems, which is why we find it important to focus on such fundamentals within this program. First, our students will learn to see how certain algorithms work and why they are used in a particular system, and then they will train to apply them in practice,” explains Alexey Platonov, who also heads the new program.

Credit: photogenica.ru


Program tracks

In the program’s first year, the curriculum focuses on the structure and principles of distributed big data systems. Over the course of two semesters, students learn how to store, compress, and index data, how to coordinate hundreds of computers, and how to write algorithms for local and distributed data processing systems. They also study the software used today to manage such systems: NoSQL databases for unstructured data, as well as the Apache Spark and Flink frameworks. Finally, they hone their skills by designing subsystems themselves as part of their practical training.
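
For a flavor of the kind of distributed processing covered in the first year, here is a minimal PySpark sketch of a simple aggregation. It is an illustration for this article rather than course material, and the input file and column names (transactions.csv, client_id, amount) are assumptions.

    # A minimal sketch of distributed processing with Apache Spark (PySpark).
    # The input file and its columns are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("demo").getOrCreate()

    # Spark splits the input into partitions and processes them in parallel
    # across the machines of the cluster.
    df = spark.read.csv("transactions.csv", header=True, inferSchema=True)

    # A simple aggregation: total amount per client.
    totals = df.groupBy("client_id").agg(F.sum("amount").alias("total_amount"))
    totals.show()

    spark.stop()

The same job runs unchanged on a single laptop or on a cluster of hundreds of machines; the framework decides how the data is split and where each partition is processed.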

“Since 80% of this program’s creators are industry veterans themselves, we know well what sorts of tasks you’ll encounter when developing actual big data systems. That’s why the entirety of practical training in the program is aimed at implementing separate parts of a unified system. For instance, in one class we’ll talk about data compression; in another – indexing; and in yet another – distributed processing. This way, the students will understand how these different subjects are interconnected and learn about the different aspects of working with big data,” explains Alexey Platonov.
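
As a toy illustration of the “indexing” piece mentioned in the quote (a sketch for this article, not an actual course assignment), an in-memory inverted index can be built in a few lines of Python:

    # A toy in-memory inverted index: it maps each word to the IDs of the
    # documents containing it. Real systems add compression, sharding,
    # and persistence on top of this idea.
    from collections import defaultdict

    def build_index(docs):
        """docs: {doc_id: text}. Returns {word: set of doc_ids}."""
        index = defaultdict(set)
        for doc_id, text in docs.items():
            for word in text.lower().split():
                index[word].add(doc_id)
        return index

    def search(index, word):
        """Return the IDs of documents containing the given word."""
        return index.get(word.lower(), set())

    docs = {1: "big data systems", 2: "distributed data processing"}
    index = build_index(docs)
    print(search(index, "data"))  # {1, 2}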

The program team’s idea is that by the end of the first year, students will possess all the skills a data engineer needs to develop, maintain, and improve big data systems.

In their second year, students will be able to choose one of two tracks:

  • ML Engineer
    Here, students will acquire the skills and expertise needed to become machine learning engineers: how machine learning works, how to develop pipelines and models, and how to efficiently integrate them into big data systems (MLOps); see the sketch after this list.
  • Data Architect
    In this track, students will learn to work as data architects, studying subjects like design patterns, stream processing, and analytical systems.
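
To make the ML Engineer track more concrete, here is a minimal pipeline sketch in Python using scikit-learn; the data is synthetic and the example is illustrative only, not taken from the curriculum.

    # A minimal, hypothetical ML pipeline: preprocessing and a model bundled
    # into a single object that can be trained, evaluated, and deployed as one
    # unit; this is the basic building block that MLOps practices manage at scale.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    # Synthetic data standing in for real features (e.g. client transactions).
    X, y = make_classification(n_samples=500, n_features=10, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    pipe = Pipeline([
        ("scale", StandardScaler()),
        ("model", LogisticRegression()),
    ])
    pipe.fit(X_train, y_train)
    print("test accuracy:", pipe.score(X_test, y_test))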

Credit: photogenica.ru


Career prospects

Depending on their chosen track, graduates can land jobs as data engineers, backend developers, data architects, and machine learning engineers. Such skills are in demand in telecom, banking, IT, and logistics: at the time of publication, there are over 700 job openings for data engineers in St. Petersburg alone. At major companies, including Sberbank, Tinkoff, Raiffeisenbank, Medsi, Yandex, and Tele2, such specialists earn up to 200,000 rubles per month at the start of their careers and can expect to reach 300,000-400,000 rubles per month as they gain experience.

Potential applicants

The program is for students who have experience developing backend applications and wish to grow in the field of data engineering and architecture. Applicants will also benefit from knowledge of algorithms, server systems, and cluster computing.

Enrollment opportunities

Prospective students can enroll in one of the following ways: