Media Research Group

Could you tell us a bit about the Media Research Group? When and why was it created?

Media Research Group is one of the research teams of the Machine Learning Laboratory as part of ITMO University’s International Laboratory “Computer Technologies”. Our team was created three years ago. We deal with research in machine learning and social media analytics.

This field of study came to Russia from Singapore, continuing a long-term cooperation between the Machine Learning Lab and the National University of Singapore. We have an agreement, and lots of our students have gone to Singapore. When I was working there, I was invited to perform social media analysis in Russia.

Aleksandr Farseev at a conference

How big is your team?

Initially, there were just two of us – me and a PhD student. Recently, our team has become a bit bigger. Students used to join us as they were working on their Master’s theses. Now they have graduated and two of them remained at the lab. Right now there are five of us: me, two PhD students, and two Master’s students.

Overall, the machine learning lab is doing great when it comes to team building. I’m not always in Russia but I know that our lab is one of the most close-knit labs at ITMO. There are many students who’d rather do real research than simply pursue a salary somewhere else. It’s important because in Russia the research industry is in its early phase. The more people stay at laboratories, the better chances our country has of growing and becoming a worldwide leader, for example, in social media research. My task is to help us develop and that’s what my team has been engaged in for the past three years.

A revolution in digital marketing

Credit: shutterstock.com

You are working on an article for Habr about the revolutionary impact artificial intelligence has had on marketing. Could you tell us about it, too?

Digital marketing is one of the most obvious fields to apply machine learning and social media analysis in. Most digital advertising is found on social media and search engines. All research related to data analysis can be applied in digital marketing.

Overall, there are two ways to apply social media data analytics. The first is to analyze common interests or diseases for public services and social science. The second one is digital marketing – sales growth based on a detailed understanding of one’s target audience and better quality of targeting. You can also apply data analysis in HR, but that’s more specific.

How did AI contribute to this revolution?

Artificial intelligence made a revolutionary breakthrough much earlier than people began to talk about it. You can see its results by looking at the targeting systems used by search engines and social networks worldwide. VK, Yandex, Google – they all have millions of users and billions of transactions. That much data cannot realistically be processed manually: in order to collect, moderate, and apply it, you need machine learning and other, more advanced AI technologies.

As a result, once social networks, search engines, and their advertising business models appeared, so did the AI technologies that allowed us to manage these processes. There was a need to manage [ad-selling] auctions and targeting. We needed to understand the audience and be able to reach it at the right time with the right offer.

These days, there is a set of additional tools that allow advertising agencies and brands to use machine learning technologies to set up branding systems. For the past few years, it has worked like this: services like Google let you target millions of words and a huge number of interests, but marketing specialists don’t know how to pick the ones they need. That’s why many platforms have appeared that balance this system and make the lives of marketing specialists easier.

Credit: shutterstock.com

Some people are frightened by these things. People joke about paranoia and surveillance – “I said I’d like some tea and then ads for tea appeared on my screen”. Are there any robots that make targeting algorithms less intrusive?

Facebook and Google spend a lot of resources on making their ads relevant. If ads aren’t relevant, they will lose what they value the most – their users. Overall, there are three “players” – users, brands, and social networks. Artificial intelligence is supposed to balance the interests of them all.

I think that this paranoia is caused by the fact that people don’t realize that nothing is free. If something exists, then someone paid for it: a taxpayer pays for “free” healthcare and users pay for web content by watching ads. Once users understand that services they use are paid for by advertising, the paranoia will disappear. Everyone will be able to choose whether they want to be a part of social media, use their mobile phones, and allow tracking. It will be a personal choice.

You said that artificial intelligence can help find balance between users, brands, and social networks. How so?

Let’s take Facebook, for example. Right now they are working hard on fighting virtual users and fake content generated by robots. Lots of people and companies create bots that produce fake content. Bots don’t need any rest – they can work 24 hours a day. If you created a big enough network of them, you could influence the opinions of people all over the world.

Facebook tries to fight that. They create models that can distinguish fake content from real posts. It’s important to understand what was posted by bots, if there is a trend in the way it’s spreading, and how the bots communicate with each other. Once it’s all clear, the entire chain can be blocked and it won’t spread anymore. It’s a fascinating and extremely complex task because bots don’t post the same stuff – they train on content made by real people. In this way, Facebook’s artificial intelligence must constantly train, too, in order to avoid accidentally blocking real content. It’s extremely hard because millions of messages are sent every minute.

So it’s a constant race, like the one between viruses and antiviruses?

Yes, something like that, but in another field.

Trump the bachelor and a fake blogger

Blogger Maya. Credit: b9.com.br

What projects is your team working on?

Facebook, Google, and other high tech companies mostly deal with engineering cases based on research by scientists like us. Tasks that we work on are more about research. We are trying to see which technologies would allow us to solve the tasks I mentioned above.

We do research in two fields: user profiling based on analysis of their content and content generation based on existing images, texts, and videos.

Could you tell us more about these tasks? What is profiling?

Profiling is analysis of the content written by users. It’s required in order to understand who these users are, what they are into, and what type of personality they have. ITMO.NEWS published a story a few years ago about how our algorithm profiled Trump as a single man.

How does it work? We all have a social image that we show to people: our age, location, profile picture, and relationship status. However, there is also psychographic information that describes us as we actually are. It’s reflected in the content we produce. This data can create a whole other image based on content analysis. Preferably, there should be various types of data available: videos, check-ins, text posts, and pictures. This can give us an idea of who the person actually is and what they care about.

Going back to the Trump case, we analyzed both him and other presidential candidates back then (referring to the 2016 US presidential election – Ed.). Other candidates ended up with correct predictions, but Trump didn’t. It means that our precision was higher than 80%. What does it tell us? It tells us that the model was working well, but Trump’s demographic qualities didn’t match his psychographic behavior. If we take a look at Trump’s tweets without knowing who wrote them, we will probably think that someone much younger posted them – not a married man in his 70s who is an important political figure. The algorithm thought so, too.

Credit: shutterstock.com

Why does it matter? Because people working in marketing live in their own world. In their perception, only 35-to-40-year-old women who care about motherhood and nothing else buy products for children. In reality, it’s not like that – aunts, uncles, and fathers may also buy products for children. Mothers, on the other hand, can be into basketball and video games, not only raising their kids.

However, marketing specialists, as a rule, don’t consider this. Algorithms are able to understand users and perform more detailed targeting, sometimes by offering a product via another one. For example, we can advertise diapers by showing users information about fitness centers where they can become fit after giving birth. These offers aren’t related directly, but artificial intelligence can draw connections between them. That’s why it’s so popular.

What about content generation?

You can generate new content based on existing materials and it will look genuine. That’s what Facebook is trying to fight. But what we’re doing is perfectly legal because we always warn users if our content was generated automatically and it’s up to them to decide whether they want to see it or not.

We began to deal with this last year. Back then, we created specific modules that later became a foundation for Maya and her advertising. We thought that if a neural network can generate faces, then it can also generate dynamic content. For example, a fast food venue releases a banner with an ad for a new burger. Based on this banner, we can generate 100 other versions of it and determine which one is most popular among users. If this were done manually, it would take an enormous amount of time. This field has a lot of potential.
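The selection step described above, choosing the most popular of many generated banners, can be sketched roughly as follows. This is only an illustration and not the team's actual pipeline: it assumes popularity is measured by click-through rate, and the names `VariantStats`, `most_popular`, and `min_impressions`, along with all the numbers, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VariantStats:
    """Display statistics for one generated banner (hypothetical structure)."""
    name: str
    impressions: int
    clicks: int

    @property
    def ctr(self) -> float:
        # Click-through rate; assumed here as the popularity measure.
        return self.clicks / self.impressions if self.impressions else 0.0


def most_popular(variants, min_impressions=100):
    """Return the variant with the highest CTR among those shown often enough."""
    eligible = [v for v in variants if v.impressions >= min_impressions]
    if not eligible:
        raise ValueError("no variant has enough impressions to compare")
    return max(eligible, key=lambda v: v.ctr)


# Made-up numbers, for illustration only.
stats = [
    VariantStats("banner_001", impressions=1000, clicks=12),
    VariantStats("banner_002", impressions=1000, clicks=31),
    VariantStats("banner_003", impressions=50, clicks=5),  # too few impressions
]
best = most_popular(stats)
print(best.name, f"{best.ctr:.1%}")
```

The impression threshold is a design choice of this sketch: without it, a banner shown only a handful of times could win on a few lucky clicks.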

COVID and social network monopoly

Credit: shutterstock.com

But as I understand, your projects aren’t just about advertising on social media?

Publications and presentations at conferences are the key performance indicators for us. We have recently published a paper about analytics related to COVID in a great medical journal. We analyzed the populations of various countries and the spread of the virus among them. We tried to understand whether analytics by WHO were absolutely correct and helpful, or if there are non-medical factors – such as political systems, accessibility of testing, hospital preparedness, and mentality of people – that influence the numbers.

We saw that COVID appears to spread faster in the countries that are most ready for a pandemic. This may be because they have lots of opportunities for testing, whereas in countries with lower numbers the disease is simply not always diagnosed. As a result, all the statistical data we have is inconsistent, because in some countries people get tested even involuntarily, whereas in others this data is confidential.

We also analyzed data on the death toll and connected it with chronic diseases and external factors such as air pollution. We concluded that air pollution has more impact on mortality than chronic diseases. As a result, many recommendations that seem logical at first can actually be disputed, because the spread of diseases and the reporting on it depend on the political situation in many ways. This should be taken into consideration.

Credit: shutterstock.com

When you talked about problems that Facebook has to face, you mentioned how bots can influence opinions worldwide. At the same time, many media sources complain that their information doesn’t reach users through social networks. Have you researched this problem?

You have named a very interesting problem – the content doesn’t reach people. Allow me to specify that this happens not because there are too many bots, but because social networks make money from advertising. If they showed everyone content for free, they would profit less. If you take a look at the average reach rates, you’ll see that they are going down.

Reach is the number of followers who see the content we post. For example, if I have a million followers, how many of them will see something I shared on my page? In 2011 it would have been 26% of them – so, 260,000 people. In 2018 the percentage was only 0.2%, or 2,000 people. At the same time, Facebook’s profit from advertising grew proportionally!
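The arithmetic behind these figures can be checked with a minimal sketch. The function name `organic_reach` is illustrative. The only inputs are the follower count and the two reach rates quoted in the answer.

```python
def organic_reach(followers: int, reach_rate: float) -> int:
    """Number of followers expected to see a post at the given reach rate."""
    return round(followers * reach_rate)


followers = 1_000_000
reach_2011 = organic_reach(followers, 0.26)   # 26% reach rate quoted for 2011
reach_2018 = organic_reach(followers, 0.002)  # 0.2% reach rate quoted for 2018

print(reach_2011)                # 260000
print(reach_2018)                # 2000
print(reach_2011 // reach_2018)  # 130
```

With these rates, the same page reaches 130 times fewer followers in 2018 than in 2011.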

What does it mean? It means that there is a monopoly on social networks. If too much content is available for free, it won’t be profitable for the networks. Of course, it’s all disguised as caring about the relevance of content. As the amount of organic content gets smaller, it also becomes more important. Facebook is doing great because they pay attention to making the content relevant, but still – everywhere, from China to the USA, the reach rate is in decline. Once we are able to reduce the monopoly on digital advertising, a business revolution will come. I think we should work in this direction rather than fight problems related to machine learning. But that’s already a philosophical matter.