Image Matching Challenge is an annual machine learning competition organized by Google Research. The contest brings together ML specialists tasked with generating 3D reconstructions based on photographs submitted by Google Maps users. Subsequently, these visualizations will be used in the Street View function and other Google products, including the Open Heritage project.
The competition was held for the fourth time this year and grows more challenging with each edition. Last year, participants had to combine two pictures into one panorama, whereas this year they needed to develop and implement a full-fledged 3D solution for Google Maps based on photos added by other users.
The cornerstone of this year’s competition was structure from motion (SfM), a photogrammetric technique that reconstructs the 3D structure of an object from a series of images taken from multiple viewpoints. Contestants were challenged not only to sort all the images but also to match them correctly to produce a 3D model. The task was successfully solved by Jaafar Mahmoud and Ammar Ali, PhD students at the Faculty of Control Systems and Robotics and the Information Technologies and Programming Faculty, who outperformed nearly 500 teams and finished among the top 10 gold medal winners of Image Matching Challenge 2023.
The participants noted that the key challenge of the competition was a strict 9-hour time limit. The time constraint shaped the team’s approach: instead of matching every image against every other one, they had to develop a retrieval system to select the best image pairs from a dataset. As a result, the final solution was a mix of technologies, including binary search algorithms, a deep neural network, and geometric modification of the embedding dimension space.
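The idea of retrieving a few promising pairs rather than matching everything can be illustrated with a minimal sketch. It is not the team's actual system: the function names, the toy two-dimensional embeddings, and the use of cosine similarity with a top-k cutoff are all assumptions made for illustration. In a real pipeline the embeddings would come from a neural network.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_pairs(embeddings, k=2):
    """For each image, keep only its k most similar images as match candidates.

    Returns a sorted list of unique (i, j) index pairs with i < j.
    """
    n = len(embeddings)
    pairs = set()
    for i in range(n):
        scored = sorted(
            ((cosine(embeddings[i], embeddings[j]), j) for j in range(n) if j != i),
            reverse=True,
        )
        for _, j in scored[:k]:
            pairs.add((min(i, j), max(i, j)))
    return sorted(pairs)


# Hypothetical global descriptors for four images: two views of one scene,
# two views of another.
toy_embeddings = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [0.1, 0.9],
]

selected = select_pairs(toy_embeddings, k=1)
exhaustive_count = len(list(combinations(range(len(toy_embeddings)), 2)))
print(f"{len(selected)} pairs selected instead of {exhaustive_count}: {selected}")
```

With four images the saving is small, but the number of exhaustive pairs grows quadratically with the dataset size, while the retrieved set grows roughly linearly for a fixed k. That difference is what matters under a fixed time budget.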
As a starting point, the students built a neural network to analyze the rotation angles and positioning of more than 1,500 images from the dataset. The developed algorithm successfully processed both original pictures and horizontally oriented copies, which made it possible for the developers to implement a rotation-invariant solution, resulting in a higher accuracy of image matching. Then, the participants used a search algorithm to select the image pairs that shared the most matching features and used these pairs to generate a 3D model.
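One common way to make matching tolerant to rotation is to compare a query image against a reference in each of the four 90-degree orientations and keep the best one. The sketch below shows that idea on tiny grids of numbers. It is an assumption about how rotation handling could work, not a description of the team's network, and the pixel-equality score is a stand-in for a real feature matcher.

```python
def rotate90(img):
    """Rotate a 2D grid (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def similarity(a, b):
    """Toy score: fraction of cells that are equal; 0.0 if the shapes differ."""
    if len(a) != len(b) or len(a[0]) != len(b[0]):
        return 0.0
    total = len(a) * len(a[0])
    same = sum(x == y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))
    return same / total


def best_rotation(query, reference):
    """Try 0, 90, 180 and 270 degree clockwise rotations of the query.

    Returns (angle, score) for the rotation that best matches the reference.
    """
    best_angle, best_score = 0, -1.0
    current = query
    for step in range(4):
        score = similarity(current, reference)
        if score > best_score:
            best_angle, best_score = step * 90, score
        current = rotate90(current)
    return best_angle, best_score


# Hypothetical example: the query is the reference turned on its side.
reference = [[1, 2],
             [3, 4]]
query = [[2, 4],
         [1, 3]]

angle, score = best_rotation(query, reference)
print(f"best match after rotating the query by {angle} degrees (score {score})")
```

Once the best orientation is known, the image can be rotated upright before feature matching, so that a matcher trained mostly on upright photos still finds correspondences.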
To demonstrate their technology, the students created a 3D model of ITMO’s main building. By analyzing a limited set of images (around 35) of the building's frontal section taken from different angles, the retrieval algorithm aimed to identify and group similar images. This step served as the initial input for the 3D reconstruction pipeline, where each image was individually registered. Local optimization techniques were frequently employed to enhance the accuracy of pose estimation and 3D coordinates. As a result, a 3D sparse reconstruction of the building was generated, accompanied by the determination of camera positions for each image. Subsequently, the sparse model was refined to produce a denser mesh that accurately depicted the frontal view of the building.
This is not the first time Jaafar Mahmoud and Ammar Ali have participated in international contests as a team. The teammates also competed at Image Matching Challenge in 2022. Back then, the students made the top 30 and brought home a silver medal. As shared by the winners, what makes their team successful is their diverse background: Ammar Ali specializes in machine learning and Jaafar Mahmoud in computer vision. Moreover, Ammar Ali is currently a senior research engineer at MTS AI and Jaafar Mahmoud has been working on industrial projects at the International Laboratory of Biomechatronics and Energy-Efficient Robotics for several years now.
“I am studying localization and mapping for mobile robots. Structure from motion and 3D reconstruction is something we specialize in within our projects at ITMO’s BE2R Lab (the International Laboratory of Biomechatronics and Energy-Efficient Robotics – Ed.). It gets especially difficult when we try to make the solution robust for different subsets of images, regarding its orientation, illumination, and other difficulties, while ensuring good accuracy,” comments Jaafar Mahmoud.
For his part, Ammar Ali stresses his interest in SfM as a technology that opens opportunities in a range of other fields.
“I'm a machine learning researcher, so my main contributions to this contest are based on the ML part, while other tweaks were handled by my teammate and friend Jaafar. This is a very interesting problem and I wouldn't even say it's complex. You can come up with a complex solution for any problem, even a basic classification, especially if you are after improved performance/accuracy. What makes this one exciting is its application in virtual and augmented reality and potentially a range of other fields, from self-driving cars to advanced robotics,” concludes Ammar Ali.