RENOIR Project: dealing with the chaos that is user data
In 2018, the total volume of user-generated data amounted to 33 ZB, and according to forecasts by IDC, it will reach 175 ZB by 2025. The driving factor of this process will be the data from entertainment platforms, IoT devices, performance-tuning devices, and metadata. All of that is very important from the standpoint of analytics and contextualization of information.
The goal of the RENOIR project is to develop new mechanisms for processing social information. Members of this consortium, which has brought together a dozen universities, laboratories, and organizations, are focused on reconstructing how information spreads in social networks. This includes, for instance, the spread of information, including rumors, on Twitter and Facebook, as well as the dynamics of news and news topics over time.
The project is based on three key principles: data acquisition, data mining/machine learning, and complex systems modeling. Some of the particular issues examined by the consortium's participants include predicting how information on various topics propagates in mass media and analyzing the mechanics of that propagation, searching for information sources, and uncovering hidden information channels. The staff of RENOIR also stress that their work opens up new opportunities for businesses by giving them access to innovative methods and tools for data analysis.
RENOIR receives funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement and has brought together laboratories and universities from different countries.
Among them are four key partners: the Warsaw University of Technology (Poland), the Wroclaw University of Science and Technology (Poland), the Jozef Stefan Institute (Slovenia), and the Slovenian Press Agency STA, as well as 11 other partners including Stanford University, the University of California, and ITMO University.
One of the project's notable features is the level of cooperation between its participants: the partners regularly visit each other's universities and companies in order to exchange experience. On the whole, RENOIR comprises five levels, or work packages. These involve the exchange of knowledge and innovations in the field of data infrastructure, the development of innovative solutions in the field of data processing and analysis, the exchange of knowledge and innovation on data mining and machine learning for reverse engineering of social information processing, and more.
Joanna Toruniewska's work at ITMO University
Joanna Toruniewska, a PhD student from the Warsaw University of Technology, has come to ITMO to conduct research as part of the RENOIR project. In Poland, she mostly focused on working with social data and modeling complex systems with the help of coevolutionary algorithms.
"Working with a coevolutionary model means that we can change the connections and nodes states in a particular network. I am currently working on a coevolutionary q-voter model. What this means is that we have a network and we can portray all of the network's agents that have a positive or negative opinion about something. And these agents can change either their own opinion (their agent/node condition in the network), or change their “friends” who don't share their opinions," explains the researcher.
Joanna gives the following situation as an example: imagine that you have a group of people who are connected in one way or another. Some like green tables, others prefer yellow ones. These groups can meet and discuss their preferences, and then the agents decide whether they should change them. For instance, if the participants believe that their friendship is more important than their preferences, they can change their preferences but keep their friends. If not, those who aren't ready to change their preferences will sever connections with those who think differently and start searching for like-minded people.
A q-voter model has a parameter that helps predict such a situation's outcome: if it's high, then the participants are more likely to sever their ties and begin searching for like-minded peers; if not, they decide to maintain their bonds and change their opinions instead. It's an important parameter that allows researchers to identify how a community operates as a whole, and how opinions and social bonds change.
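The choice described above, between changing an opinion and changing a friend, can be illustrated with a minimal Python sketch. It is not the researcher's code: the function names, the random starting network, and the exact update rule are assumptions made for illustration. The article does not name the parameter; here it is called p and treated as a rewiring probability, so a high p favors severing ties and a low p favors changing opinions. The sketch also counts how many times opinions change, one of the system properties mentioned in the article.

```python
import random


def coevolving_q_voter(n=200, avg_degree=4, q=2, p=0.5, steps=20000, seed=0):
    """Illustrative coevolving q-voter simulation (assumed update rule).

    Each step, a random agent consults q of its neighbors. If all of them
    disagree with the agent, the agent either rewires one of those links to a
    like-minded agent (probability p) or adopts their opinion (probability 1 - p).
    """
    rng = random.Random(seed)
    opinion = [rng.choice((-1, 1)) for _ in range(n)]
    neighbors = [set() for _ in range(n)]

    # Assumption: start from a random graph with a fixed number of links.
    target_edges = n * avg_degree // 2
    edges = 0
    while edges < target_edges:
        a, b = rng.sample(range(n), 2)
        if b not in neighbors[a]:
            neighbors[a].add(b)
            neighbors[b].add(a)
            edges += 1

    flips = 0      # number of opinion changes
    rewirings = 0  # number of severed-and-replaced links
    for _ in range(steps):
        i = rng.randrange(n)
        if len(neighbors[i]) < q:
            continue
        panel = rng.sample(sorted(neighbors[i]), q)
        # With two opinions, "all q disagree with i" also means they are unanimous.
        if any(opinion[j] == opinion[i] for j in panel):
            continue
        if rng.random() < p:
            # Keep the opinion, change the friend.
            candidates = [k for k in range(n)
                          if k != i and opinion[k] == opinion[i]
                          and k not in neighbors[i]]
            if not candidates:
                continue
            old, new = rng.choice(panel), rng.choice(candidates)
            neighbors[i].discard(old)
            neighbors[old].discard(i)
            neighbors[i].add(new)
            neighbors[new].add(i)
            rewirings += 1
        else:
            # Keep the friends, change the opinion.
            opinion[i] = -opinion[i]
            flips += 1
    return opinion, neighbors, flips, rewirings


def discordant_links(opinion, neighbors):
    """Count links that connect agents holding different opinions."""
    return sum(1 for i, nbrs in enumerate(neighbors)
               for j in nbrs if i < j and opinion[i] != opinion[j])
```

Running coevolving_q_voter with several values of p and comparing the returned flips and rewirings, or the result of discordant_links, gives a rough picture of how the balance between opinion change and tie severing shifts. Networks large enough to be informative make this loop slow in pure Python, which is consistent with the need for high-performance computing discussed below.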
"It is possible to model the dissemination of opinions. In order to study how a system's particular properties change, for example, how many times people change their opinions, you have to conduct such modeling for an extensive network” explains Joanna Toruniewska. “Still, to conduct such modeling within a reasonable time frame you need top-grade equipment such as a multiprocessing cluster. That’s why I requested the help of the specialists at ITMO University, as they are very well-versed in the field of high-performance computing. Here at ITMO University I have the opportunity to work with such systems, for example, conduct massive parallel computations, and make simulations way faster. During my visit, I focused on optimizing the code's runtime so that I would be able to generate results in reasonable time."
Pitfalls of working with social data
Joanna has been part of the RENOIR project for several years now, having completed a series of studies. The project's main goal is to understand the process of information distribution. For example, if the researchers have a large amount of data from social media, they can study which methods and processes are important in this regard, and why particular bits of information become popular while others don't. These findings can be used to develop methods for distributing important information. What's more, it is possible to identify the resources that contribute to misinformation and tell genuine information apart from fakes. One of the obvious applications of such research is in marketing, as many companies have long been interested in instruments that can help them promote particular information.
Nevertheless, there are issues that make working with social data a complex task. The amount of social data is absolutely massive and studying it requires quality equipment capable of high-performance computing.
When working with such amounts of information, researchers have to decide which data is the most important and choose only that which will help fulfill the project's goals and produce relevant results.
"Right now we are trying to learn all we can about both the model and the results we get. As part of the RENOIR project, we collaborate with sociologists who consult us and verify our results," explains Joanna.
Joanna plans to use her work at ITMO University to obtain clear modeling results that will help explain how the system's processes operate.
"What I'm currently working on is just one part of a bigger problem. And I'm sure that the specialists at ITMO University have the competencies that make collaboration on such problems possible. I hope that in the future we will get many opportunities to continue this joint effort,” comments Joanna Toruniewska. “Social networks are a source of a great many data, which is why we can conduct lots of different research that will help us understand how communities function. We have to, of course, consider that it's people's behavior in social networks that we are dealing with, but it still has to do with their real lives, as well. We have never had this many opportunities to study human behavior.”