In mineral processing, it’s important to track the properties of mined rock and ore: oversized rocks can damage the grinding equipment, while undersized ones pass straight through it. Specialists also have to monitor the size, shape, color, and movement of bubbles during flotation, a process that separates valuable minerals from waste rock. Treated with special reagents, some of the processed particles sink, while others repel water and are carried away by the froth. In another example, it’s necessary to determine the size and moisture content of ore pellets used in iron production.
Material characteristics can be evaluated in several ways: stones are sifted through a large sieve, analyzed via laser 3D scanning, or examined with X-ray radiometric methods. Bubbles are identified either visually or through chemical analysis of the solution. Computer vision and artificial intelligence can accelerate and simplify the process. However, stones and bubbles are densely clustered objects that overlap randomly, which makes it challenging to train AI to “see” the boundaries between them and often requires manual labeling of objects.
Students of ITMO’s Computer Technologies Laboratory have developed an open-source library of computational models that automates the labeling of images with densely packed objects, such as stones or bubbles.
“There are ML and AI-based solutions for the mining industry in Russia, but they are typically not available to the public. Our approach is unique because we create open-source, universally applicable solutions. We have not just automated the labeling of stones and bubbles, but also developed a generative model that can label similar objects. Machine learning engineers would have to spend some time adapting the algorithm, but this approach would still be faster than manual labeling,” says Maria Rumyantseva, the head of the project and a PhD student at ITMO’s Information Technologies and Programming Faculty.

Maria Rumyantseva. Photo courtesy of the subject
At the core of the library are three AI solutions: the Segment Anything Model, a foundation model, isolates the target object within a bounding box; the YOLOv8s detection model segments the major objects; and a watershed segmentation algorithm handles the smaller ones. Thanks to this combination, the library can analyze an image without missing anything.
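To make the hand-off between these three stages more concrete, here is a minimal sketch in Python of how such a pipeline could be wired together using the publicly available ultralytics, segment-anything, and OpenCV packages. The checkpoint names, thresholds, and the exact division of labor between the steps are assumptions based on the description above, not the library’s actual code.

```python
# Sketch of a detect -> segment -> watershed pipeline; checkpoints and
# thresholds are assumptions, not the library's actual implementation.
import cv2
import numpy as np
from ultralytics import YOLO
from segment_anything import sam_model_registry, SamPredictor

detector = YOLO("yolov8s.pt")                                   # hypothetical weights
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")   # hypothetical checkpoint
predictor = SamPredictor(sam)

image = cv2.imread("stones.png")                                # e.g. a 512x512 frame
rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# 1) The detector proposes bounding boxes for the major objects.
boxes = detector(image)[0].boxes.xyxy.cpu().numpy()

# 2) SAM turns each box into a pixel-accurate mask.
predictor.set_image(rgb)
object_masks = []
for box in boxes:
    mask, _, _ = predictor.predict(box=box, multimask_output=False)
    object_masks.append(mask[0])

# 3) A classic watershed pass picks up the smaller objects.
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
sure_bg = cv2.dilate(binary, np.ones((3, 3), np.uint8), iterations=3)
dist = cv2.distanceTransform(binary, cv2.DIST_L2, 5)
_, sure_fg = cv2.threshold(dist, 0.5 * dist.max(), 255, 0)
sure_fg = sure_fg.astype(np.uint8)
unknown = cv2.subtract(sure_bg, sure_fg)
_, markers = cv2.connectedComponents(sure_fg)
markers += 1                        # background becomes 1, objects 2, 3, ...
markers[unknown == 255] = 0         # pixels left for the flooding to resolve
markers = cv2.watershed(image, markers)   # boundary pixels are labeled -1
```

After this runs, `markers` holds one integer label per watershed region, with boundary pixels set to -1; these regions can then be merged with the SAM masks into a single labeled image.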
“Our solution can segment more objects in the image than classic watershed segmentation: its object recall is 0.85 compared to 0.52 for the conventional method. The library also outperforms the existing method in the optical flow similarity metric (0.27 vs. 0.23) but falls behind in the temporal consistency metric of segmentation masks (0.30 vs. 0.41). These parameters indicate how stable and coherent the segmentation remains from frame to frame. Our models label objects quite quickly: approximately 600 images of 512x512 pixels per hour. The processing speed does not depend on the number of objects in the image, but labeling stones is more challenging because they can overlap with each other and have varying shapes, unlike the consistently round bubbles,” shares Egor Prokopov, one of the model’s developers and a fourth-year student of ITMO’s Faculty of Control Systems and Robotics.
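For readers unfamiliar with the metric, object recall is typically computed by matching each expert-labeled object to the model’s predictions by mask overlap. The IoU-based matching rule in the sketch below is a generic illustration with an assumed threshold, not necessarily the exact definition used by the team.

```python
import numpy as np

def object_recall(gt_masks, pred_masks, iou_threshold=0.5):
    """Fraction of ground-truth objects matched by at least one predicted mask.

    Both arguments are lists of boolean HxW arrays, one per object.
    The IoU threshold of 0.5 is an assumption for illustration.
    """
    if not gt_masks:
        return 1.0
    found = 0
    for gt in gt_masks:
        for pred in pred_masks:
            inter = np.logical_and(gt, pred).sum()
            union = np.logical_or(gt, pred).sum()
            if union > 0 and inter / union >= iou_threshold:
                found += 1
                break
    return found / len(gt_masks)
```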

A comparison of labeling done by a human and by the model (left: the original photo; center: human labeling; right: the model’s labeling). Image courtesy of Maria Rumyantseva
The solution can be applied not only in mining but also in oil refining, in the processing of bulk materials, granules, and crystals in food production, and in handling fertilizers in agriculture. For example, the team has already created a service for the company Knauf in which AI automatically analyzes the quality of the pore structure in drywall from images with up to 99% accuracy.
To adapt the models for other tasks, the researchers have prepared a method of generating datasets that ML engineers can use to train their segmentation models. The dataset includes target images of stones or bubbles and masks – black-and-white images highlighting the boundaries and shapes of each object.
To create this dataset, the developers first generated masks from text prompts using the Stable Diffusion Turbo generative model and then extracted object contours from the resulting images. Next, using the IP-Adapter tool, they trained the Stable Diffusion image generation model to reproduce the real-life appearance of stones and bubbles. The trained model can now generate images of the target objects within the mask contours.
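A rough sketch of what such a two-stage generation could look like with the publicly available diffusers library is shown below. The model IDs, prompts, and IP-Adapter wiring are assumptions rather than the team’s actual pipeline, and constraining the generated texture to the extracted contours would require an extra conditioning step (for example, inpainting or ControlNet) that is omitted here.

```python
# Sketch of the two-stage data generation (model IDs, prompts, and IP-Adapter
# wiring are assumptions, not the team's exact pipeline).
import cv2
import numpy as np
import torch
from diffusers import AutoPipelineForText2Image
from diffusers.utils import load_image

device = "cuda"

# Stage 1: generate a mask-like image from a text prompt with SD-Turbo,
# then binarize it and extract per-object contours.
turbo = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sd-turbo", torch_dtype=torch.float16
).to(device)
raw = turbo(
    "black and white mask of densely packed round stones, top view",
    num_inference_steps=1, guidance_scale=0.0,
).images[0]

gray = cv2.cvtColor(np.array(raw), cv2.COLOR_RGB2GRAY)
_, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

# Stage 2: condition a Stable Diffusion pipeline on a real reference photo
# through IP-Adapter so the generated texture resembles real stones.
sd = AutoPipelineForText2Image.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16  # example model ID
).to(device)
sd.load_ip_adapter("h94/IP-Adapter", subfolder="models",
                   weight_name="ip-adapter_sd15.bin")
reference = load_image("real_stones_photo.jpg")   # hypothetical reference image
textured = sd(
    prompt="photo of crushed ore on a conveyor belt",
    ip_adapter_image=reference,
    num_inference_steps=30,
).images[0]
# Pairing `textured` with `mask` yields one synthetic 'image - mask' sample.
```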
“Using the resulting dataset of ‘image – mask’ pairs, ML engineers can train their own stone and bubble segmentation models. Our tests show that, compared to expert labeling, the models trained this way demonstrate an object recall of 0.99. This means that the model detects nearly all of the stones marked by an expert,” explains Daria Usacheva, one of the project’s developers and a fourth-year student of ITMO’s Faculty of Control Systems and Robotics.
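For context, here is a minimal sketch of how an engineer might train a segmentation model on such ‘image – mask’ pairs. The U-Net architecture, directory layout, and hyperparameters are illustrative assumptions, not the team’s setup.

```python
# Minimal training sketch for an 'image - mask' dataset (directory layout,
# architecture, and hyperparameters are assumptions for illustration).
from pathlib import Path

import cv2
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader
import segmentation_models_pytorch as smp

class PairDataset(Dataset):
    """Loads image/mask pairs stored as PNG files with matching names."""
    def __init__(self, image_dir, mask_dir):
        self.images = sorted(Path(image_dir).glob("*.png"))
        self.mask_dir = Path(mask_dir)

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        img = cv2.cvtColor(cv2.imread(str(self.images[idx])), cv2.COLOR_BGR2RGB)
        mask = cv2.imread(str(self.mask_dir / self.images[idx].name), cv2.IMREAD_GRAYSCALE)
        img = torch.from_numpy(img).permute(2, 0, 1).float() / 255.0
        mask = torch.from_numpy((mask > 127).astype(np.float32)).unsqueeze(0)
        return img, mask

device = "cuda" if torch.cuda.is_available() else "cpu"
model = smp.Unet(encoder_name="resnet34", in_channels=3, classes=1).to(device)
loss_fn = smp.losses.DiceLoss(mode="binary")
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loader = DataLoader(PairDataset("synthetic/images", "synthetic/masks"),
                    batch_size=8, shuffle=True)

for epoch in range(10):
    for images, masks in loader:
        optimizer.zero_grad()
        logits = model(images.to(device))
        loss = loss_fn(logits, masks.to(device))
        loss.backward()
        optimizer.step()
```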
In the future, the team plans to improve the generation of masks for bubbles, train the models to label objects in video, and add an analytics system. The latter will make the models capable of not just detecting stones or bubbles, but also describing their size, shape, number, and color. The team also intends to present the model at conferences and expand its partnerships with industrial companies.
Bubble labelling visualized. Video courtesy of Maria Rumyantseva