Hello to everyone! My name is Ammar Ali. I graduated from Tishreen University in Syria before receiving a scholarship to study in a Master’s program at ITMO, at the Information Technologies and Programming Faculty. Right now, I’m a PhD student specializing in the applications of machine learning in computer vision.
Why AI?
I wasn’t involved with AI before coming to ITMO, but I’ve grown interested in the topic. Soon, AI will play a major role in automation. I believe that in the near future, many complex systems will be replaced by artificial intelligence.
At ITMO, my research revolves around developing a driving assistance system that would generate personalized recommendations for various scenarios based on the driver’s state and external conditions. I’ve been working on various aspects of this technology since 2021. In 2023, the project continued thanks to a grant from the Russian Science Foundation (RSF).
The initial goal of the research was to develop a driver monitoring system capable of detecting the driver's state, such as drowsiness or distraction. Using only a simple RGB camera, AI models detect actions like eating, drinking, or smoking, and also assess the seat belt status, the driver’s pose, and their concentration on the road. The project then moved into a second phase focused on detecting the driver's vital signs, along with an AI-based analysis of the surrounding environment to estimate a safe speed from dynamic changes and build a 3D map for driver assistance.
The research part of the project is currently finished, but I will continue my work as part of my PhD thesis. My research focuses on developing a driver assistance system that uses a single RGB camera to analyze the driver's state and provide recommendations or warnings based on multiple factors detected using AI-based systems. Additionally, we are building a system to analyze the surrounding environment and provide recommendations for optimal speed and a 3D map.
Path to success
I first became interested in developing machine learning algorithms as a Master’s student.
In 2021, Ammar Ali won his first major prize in a contest organized by the USA’s National Oceanic and Atmospheric Administration (NOAA). As a contestant, he succeeded in developing the most efficient model for predicting errors in reports produced by navigational spacecraft based on satellite data about solar wind parameters. One of the contest’s requirements was that the model had to base its forecast on real-time data.
Even though the main motivator the first time I took part in a contest was the prize ($15,000 for first place), I quickly realized that such events are a great way to develop your problem-solving skills and to learn about the latest research in a given field. Contests help with studies and research, too – the experience you acquire lets you solve problems, build baselines, and accomplish tasks much more quickly. So, after a short break during which I wrote my Master’s thesis, I continued to take part in competitions, now as a PhD student. To find events relevant to my interests, I monitor announcements on DrivenData, Kaggle, DSWorks, AICrowd, and ods.ai.
The most difficult, yet most interesting, aspect of competitions is that you have to work within strict limits on time and resources. You must meet deadlines, gauge your own capabilities, and generally work in far more demanding conditions than a regular developer at a commercial company.
Having taken part in various competitions, I’ve developed my own approach to preparation. At the basic level, I compile all the latest achievements in the relevant field and study open-source solutions. It’s unlikely that within the two months a contest typically lasts someone will develop a radically different solution – so open source gives you a great head start. Then comes the brainstorming stage, when you try to improve and fine-tune the basic tool by applying additional methods and strategies.
Conditions and teams
I pay special attention to tasks in the field of computer vision, as that’s the area I’m far better versed in than any other field of AI. As a rule, my personal interest decides whether I’ll take part in a particular contest; at the same time, I weigh the time I’ll need to spend on developing a solution.
I often compete solo, but I’m open to joining a team, too. For instance, in 2022 and 2023 I took part in the Image Matching Challenge (IMC), organized by Google Research, with my friend Jaafar Mahmoud, who is also a PhD student at ITMO, but at the Faculty of Control Systems and Robotics. Jaafar has a great deal of experience in fields that are new to me, and the right combination of competencies is the key to success. In 2023, it helped us become one of the top 10 teams; working on a short timeline, we were able to develop an optimal algorithm for creating 3D reconstructions of scenes based on user-submitted photos from Google Maps.
AI Journey
In 2023, I took part in Sber’s AI Journey contest for the third time.
In 2021, Ammar Ali took first place in the AITrain track. The contestants were tasked with developing an algorithm that could use computer vision to identify dangerous objects on railroads and warn train drivers. As input data, the developers received photographs taken by cameras installed on electric trains. A year later, Ammar won three out of the AI Journey Contest’s four tracks, despite competing on his own rather than as part of a team.
At the latest AI Journey contest, I signed up for two tracks [which ones?]. Another two – RecSys and PersonalAI – weren’t interesting to me, as they involved recommendation systems. As for RescueAI, I didn’t have enough time left for it, even though the task did revolve around biological data, which seemed interesting.
In the end, I won first place in the contest centered on sign language recognition and third place in the contest on multimodal conversational ML models, as well as a special prize in the HumanEval category.
For the sign language recognition task, my solution was based on MViT with a tiny architecture variant, which is suitable for fast inference on lightweight devices. The architecture itself is a 3D transformer encoder – I used varied sampling, multi-stage training, iterative validation, and complex data-splitting criteria to improve the results. The solution also had to fit several constraints: for instance, the model had to run at twice real-time speed on CPU devices, which ruled out most additional tricks and ensemble methods. The baseline scored 0.72/0.69 on the public/private leaderboard; my solution reached 0.84/0.83 – an improvement of more than 10 percentage points on the accuracy metric across about 1,000 different gestures. The gap between the first- and second-place solutions was about 2%.
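To give a rough idea of what a "3D transformer encoder over video" looks like, here is a minimal, illustrative PyTorch sketch of a tiny video transformer for gesture classification. This is not the contest solution; the class, layer sizes, and parameter names are all invented for the example, assuming standard `torch.nn` building blocks:

```python
import torch
import torch.nn as nn

class TinyVideoTransformer(nn.Module):
    """Illustrative sketch: 3D patch embedding + transformer encoder.

    Splits a short RGB clip into spatio-temporal patches, runs them
    through a small transformer encoder, and mean-pools the tokens
    into class logits (e.g. ~1,000 gesture classes).
    """

    def __init__(self, num_classes=1000, dim=96, depth=4, heads=4,
                 clip_len=16, size=112, patch=(2, 16, 16)):
        super().__init__()
        # 3D patch embedding: one conv over (time, height, width)
        self.embed = nn.Conv3d(3, dim, kernel_size=patch, stride=patch)
        n_patches = (clip_len // patch[0]) * (size // patch[1]) ** 2
        self.pos = nn.Parameter(torch.zeros(1, n_patches, dim))
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=dim * 4,
            batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):                 # x: (B, 3, T, H, W)
        z = self.embed(x)                 # (B, dim, T', H', W')
        z = z.flatten(2).transpose(1, 2)  # (B, n_patches, dim)
        z = self.encoder(z + self.pos)
        return self.head(z.mean(dim=1))   # mean-pool tokens -> logits

model = TinyVideoTransformer(num_classes=1000)
clip = torch.randn(1, 3, 16, 112, 112)    # one 16-frame RGB clip
logits = model(clip)
print(logits.shape)  # -> torch.Size([1, 1000])
```

Keeping `dim` and `depth` this small is what makes CPU real-time inference plausible; a production system would also trade accuracy against latency via the patch size and clip length.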
By the way, the solution from the last track, EqualAI, will probably become part of my PhD thesis. It would be interesting to extend my driving assistance system with a sign language recognition feature and, in that way, provide more ways for people to interact with the driver.
Interview by Ekaterina Derik