The study consisted of two parts. In the first, the authors analyzed how open-source software is used in Russia. It turned out that nearly all companies developing open solutions in Data/ML target both domestic and international markets. Users, in turn, choose solutions based on their effectiveness rather than the developer’s country of origin.
Based on expert opinions and open data from GitHub and PyPI, the researchers identified the top five projects in various categories: ML and algorithms, mathematics, infrastructure, BI visualization, data storage, and MLOps. The list of leaders included solutions from both domestic and international companies, such as CatBoost, LangChain, Spark, Metabase, NumPy, and ClickHouse.
In the second part of the study, the researchers determined the leaders in Russian open-source development based on several criteria: the number of open-source projects on Data/ML topics, usage metrics for these products in Russia, the quality of repository implementations, and the number and activity of contributors. Yandex came out as the leader on the majority of criteria, with Sberbank and T-Bank taking second and third place, respectively. The top ten also includes Postgres Pro, VK, Avito, Evrone, MTS, Selectel, and major universities, including ITMO.
In addition to the researchers’ own analysis, the study drew on the opinions of experts representing Yandex, Sberbank, T-Bank, VK, Wildberries, Rocket Control, CodeScoring, and the Moscow Institute of Physics and Technology. They helped identify the strengths and weaknesses of open-source software and the most efficient tactics for its promotion, as well as spell out the specifics of communicating with the developer community.
Based on the results of the expert survey, the researchers outlined the main trends and opinions related to the global open-source movement. In terms of platforms, the main takeaway is that GitHub continues to be regarded as the de facto standard for open source, although interest in alternative platforms (such as Gitee and GitVerse) remains strong. As for the prospects for open-source development, the experts highlighted the leading role of human intelligence in the age of AI, the demand for democratic and automated AI solutions, and the need for an international open-source community. Moreover, the survey participants stated that the notion that open-source solutions can help one’s competitors is becoming obsolete as more and more companies focus on advancing the industry as a whole. The experts also emphasized the need for financial investment in open-source projects: big tech companies have the resources for this, but for now it is crucial for them to maintain their market position.
The authors of this study are part of the ITMO OpenSource community, one of the largest of its kind in the country (around 1,000 members). ITMO’s open-source ecosystem in AI is the largest among academic ones and has the deepest level of project development. ITMO OpenSource organizes regular meetups, collaborates with the Open Data Science community, and involves students in open projects. Thanks to the study, members of the community will be able to get a clearer picture of the field and identify the best entry points and practices.