The accuracy of ML models generally improves with the amount of training data. Nonetheless, government regulations often prohibit transferring data to third parties, such as financial or medical organizations.

A team from ITMO University and Sberbank has developed Stalactite, an AI training tool that draws on private data from several sources without the need for complex synchronization or merging. The tool was created as part of the federal project Artificial Intelligence.

“Many businesses have now reached the point where their own data is no longer enough to provide more accurate ML predictions; they need other resources. In the last two years, Chinese and Russian companies began using Vertical Federated Learning (VFL) for their transfers. As far as we know, Stalactite is one of the first such tools in Russia that can predict how different departments will perform in the future while complying with security policies,” notes Nikolay Butakov, a senior researcher at ITMO’s Research Center “Strong AI in Industry.”

Stalactite uses Python and Protobuf for secure data transfers, along with several ML algorithms that handle tabular data and images in regression and classification tasks, as well as in recommendation systems. The software can also be used for debugging, parameter selection, and environment configuration, for instance, in a test mode.
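
To give a rough sense of the vertical federated learning idea behind such tools, here is a minimal sketch in plain Python. It is not Stalactite's API, which this article does not describe; every name in it (`partial_predictions`, `local_update`, `train_vfl`, `party_a`, `party_b`) is hypothetical. Two parties hold different feature columns for the same records and exchange only partial predictions and residuals, never the raw features. The exchange is simulated in a single process, and the sketch omits the serialization and protection layers a real system would need.

```python
import random


def partial_predictions(features, weights):
    """Run locally by one party: dot product of its own features and weights."""
    return [sum(x * w for x, w in zip(row, weights)) for row in features]


def local_update(features, weights, residuals, lr):
    """One gradient step for a party, using its own features and the shared residuals."""
    n = len(features)
    grads = [
        sum(r * row[j] for r, row in zip(residuals, features)) / n
        for j in range(len(weights))
    ]
    return [w - lr * g for w, g in zip(weights, grads)]


def train_vfl(party_a, party_b, labels, lr=0.1, epochs=500):
    """Toy vertical federated linear regression with two feature holders."""
    w_a = [0.0] * len(party_a[0])
    w_b = [0.0] * len(party_b[0])
    for _ in range(epochs):
        # Each party sends only its partial predictions to the label holder.
        pred_a = partial_predictions(party_a, w_a)
        pred_b = partial_predictions(party_b, w_b)
        # The label holder combines them and returns per-sample residuals.
        residuals = [pa + pb - y for pa, pb, y in zip(pred_a, pred_b, labels)]
        # Each party updates its own weights without seeing the other's features.
        w_a = local_update(party_a, w_a, residuals, lr)
        w_b = local_update(party_b, w_b, residuals, lr)
    return w_a, w_b


# Synthetic, noise-free data (an assumption for illustration only).
random.seed(0)
party_a = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(200)]
party_b = [[random.uniform(-1, 1)] for _ in range(200)]
labels = [2.0 * a[0] - 1.0 * a[1] + 0.5 * b[0] for a, b in zip(party_a, party_b)]

w_a, w_b = train_vfl(party_a, party_b, labels)
print([round(w, 3) for w in w_a], [round(w, 3) for w in w_b])
```

In this toy setup the learned weights approach the coefficients used to generate the labels. Note that even residuals can leak some information, which is why production systems typically add further safeguards on top of this basic scheme.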

The tool will benefit developers of AI systems, who can use it to adapt their own algorithms to the VFL mode. It will also appeal to less experienced specialists thanks to its simple console interface, which launches a ready-made algorithm with a few commands and sets it up for training.

“Stalactite is a continuation of our long-term work with Russia’s leading research centers aimed at developing and improving data science tools. This year, we presented several reports at top AI conferences, some of which focused on federated learning. The resulting framework, in particular, has the potential to improve recommendation systems. With it, data scientists will be able to test various VFL algorithms for such systems to safely train models while sharing no private information directly,” says Gleb Gusev, the head of Sber AI Lab.