In order for scientific experiments to be replicable – and for research teams to build on each other's results – papers must describe those results clearly and transparently. That's why papers and supplementary materials (training data for models, code for algorithms and experiments, template data, etc.) are uploaded to repositories – databases for code and data that are open to other users. Each such repository should be user-friendly, with a clear structure and a description in English. Often, however, this is not the case: code and data are stored chaotically, and only their authors can make sense of the structure. Results published this way are hard for other teams to replicate or reuse, which diminishes both a paper's value and its citation count.
Individual tools can help with parts of the problem – AI coding assistants such as GitHub Copilot, or code documentation tools – but until now there has been no comprehensive solution that automates the entire cycle of repository management.
In order to make scientific data more understandable and replicable, scientists from ITMO have developed Open Source Advisor (OSA), an LLM-based tool that improves open-source code repositories. It helps create such repositories from a combination of code and the corresponding paper, generates descriptions for the entire repository and for the classes and methods within it, reports on any necessary updates, and structures the contents. The tool will be especially useful to scientists (e.g. biologists or chemists) who have no experience in commercial development or code documentation.
OSA's authors built a multiagent system based on several LLMs, each responsible for a specific task: documentation generation, code testing, or analysis. Depending on their access level, users can choose which models to use – GPT-4, LLaMA, GigaChat, and others.
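To make the architecture concrete, here is a minimal sketch of such a per-task multiagent layout: one agent each for documentation, testing, and analysis, with a pluggable model backend behind each agent. All names and interfaces here are illustrative assumptions for the sake of the example, not OSA's actual API; the stub backends stand in for real models like GPT-4, LLaMA, or GigaChat.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# A "backend" is any callable mapping a prompt string to model output.
Backend = Callable[[str], str]

@dataclass
class Agent:
    task: str          # e.g. "documentation", "testing", "analysis"
    backend: Backend   # the LLM this agent delegates to

    def run(self, repo_context: str) -> str:
        # Prefix the prompt with the task so the model knows its role.
        prompt = f"[{self.task}] {repo_context}"
        return self.backend(prompt)

class Advisor:
    """Routes each repository-improvement task to its dedicated agent."""

    def __init__(self) -> None:
        self.agents: Dict[str, Agent] = {}

    def register(self, task: str, backend: Backend) -> None:
        # The user picks which model serves which task.
        self.agents[task] = Agent(task, backend)

    def improve(self, repo_context: str) -> Dict[str, str]:
        # Run every registered agent over the same repository context.
        return {task: agent.run(repo_context)
                for task, agent in self.agents.items()}

def stub_model(name: str) -> Backend:
    """A placeholder standing in for a real LLM call."""
    return lambda prompt: f"{name} output for: {prompt}"

advisor = Advisor()
advisor.register("documentation", stub_model("gpt-4"))
advisor.register("testing", stub_model("llama"))
advisor.register("analysis", stub_model("gigachat"))

report = advisor.improve("repo: my-research-code")
print(report["documentation"])
```

The design point is the separation: each agent's model can be swapped independently (say, a local LLaMA for testing and a hosted model for documentation), which matches the article's description of choosing models based on access level.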
“The tool is currently in beta testing mode (for instance, we are discussing testing it on the AI-first platform GitVerse), with the full version slated to be launched by fall. In the future, we are planning to run OSA on ITMO servers to make it even easier to use. At the moment, we are working on automated test generation to allow users to check if their code is running correctly; additionally, we are implementing automatic development of knowledge graphs about the best practices in open-source development, which we can then use to improve a model’s quality,” shared Nikolay Nikitin, the head of the project and the R&D group of ITMO’s Research Center “Strong AI in Industry.”

Nikolay Nikitin. Photo by Dmitry Grigoryev / ITMO NEWS
So far, the scientists have tested OSA on several repositories by ITMO teams, as well as on projects by researchers from Brazil – for instance, to check the automated translation of files and folders from Portuguese into English, a feature that will make such code more accessible to the international community. Next, the researchers plan to "teach" OSA to lower the initial code readiness requirements and to tackle harder tasks within research projects.