Sentiment analysis

Ksenia Mukhina and Vasiliy Boychuk, researchers from ITMO University’s eScience Research Institute, have found a variety of uses for software that analyzes the emotions of groups of people based on their social media posts. The first large-scale test of this technology was presented at the booth of the “Sirius” youth camp in Sochi. The aim was to show the city and the emotions it evokes in people; the gathered data was visualized on a map of the city in an interactive app. The researchers outlined three large zones for their test: Sochi, Krasnaya Polyana and Adler. The social networks chosen for the study were Instagram, Twitter and VK.

The researchers had to gather an adequate amount of data and use an algorithm to determine the sentiment: each analyzed post contained a photograph that was evaluated by a convolutional neural network. Each facial expression was assigned one of eight emotions, including anger, indifference, neutral state, sadness, surprise, happiness and grief. In addition to the photos themselves, the accompanying text of each post was analyzed for positivity or negativity. The final evaluation was then added to the map, where one could see which places evoke happiness and which ones evoke sadness.
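The map aggregation described above can be illustrated with a short sketch. This is not the researchers' implementation: it assumes every post has already been reduced to coordinates plus one emotion label by an upstream classifier, and it groups posts into map cells simply by rounding coordinates. The names `Post`, `cell_of` and `dominant_emotion_by_cell`, the cell size, and the sample points are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Post(NamedTuple):
    lat: float
    lon: float
    emotion: str  # label assumed to come from an upstream photo classifier


def cell_of(post, precision=2):
    """Map a post to a grid cell by rounding its coordinates (assumed scheme)."""
    return (round(post.lat, precision), round(post.lon, precision))


def dominant_emotion_by_cell(posts, precision=2):
    """Return the most frequent emotion label for every grid cell."""
    counts = defaultdict(Counter)
    for post in posts:
        counts[cell_of(post, precision)][post.emotion] += 1
    return {cell: c.most_common(1)[0][0] for cell, c in counts.items()}


# Made-up sample points, for illustration only.
posts = [
    Post(43.581, 39.721, "happiness"),
    Post(43.583, 39.722, "happiness"),
    Post(43.584, 39.723, "sadness"),
    Post(43.68, 40.21, "surprise"),
]
print(dominant_emotion_by_cell(posts))
```

A real system would also need to fold in the text polarity of each post; how the two signals were combined is not stated in the article, so the sketch leaves it out.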

Since automatic algorithms have a certain percentage of error, the developers allowed people to evaluate the posts on their own. These human evaluations will then be used to improve the algorithm’s accuracy. The system was also shown at VK Fest 2016, although there it was limited to the area of St. Petersburg’s 300-Years Park and only used data from VK. The project’s intention was to show which spots are of most interest to people, which ones are the most photographed, and so on. The results of both studies can be seen in this video.

The researchers have also used emotion analysis to test the hypothesis that emergency situations at stadiums during football games are connected to the emotions seen in photos taken there. The study used data from Instagram posts geo-tagged to various stadiums. During the first stage of the study it emerged that games that involved fighting among fans had twice the level of “anger” detected in the photos compared with games that did not involve a conflict. The team proposes to use the study’s results to develop a method for predicting fights and other conflicts at football matches.

“We used mass media to collect information on fights and other emergency situations at football games involving the Zenit football club. It was important for us to examine the impact of such situations on the fans. The emergencies include things like the use of pyrotechnics, throwing objects at the audience or onto the field, and offensive banners or chants. We examined games in the period from 2013 to 2015 – approximately 10 matches; we later expanded our pool by adding other teams’ games. There were not that many fights, so we used official stats from the Russian Football Union to look at other occurrences that could threaten the health or psyche of fans. Overall, of the 700 games that occurred in three seasons, nearly half involved some sort of emergency situation. The data can be gathered at any time: before, during or after a game. Our system can use historical data to analyze the mood of a stadium’s audience in real time,” – explains Mr. Boychuk.


Ms. Mukhina and her colleagues from the eScience Research Institute have developed a project that would provide tourists with a list of spots popular among the locals. The analysis was based on Instagram posts. This social network’s defining characteristic, says Ms. Mukhina, is that people post more positive content there.

“We can look at posts made by a city’s local populace and identify the places that are evaluated positively. We can then provide tourists with a list of places that are popular among the locals. We used data for 2016 gathered from the profiles of 59,024 users. It contains 529,251 photos geo-tagged at 17,921 spots around St. Petersburg. 23,596 users were identified as locals. In the end, we identified 44 places that are the most popular among the citizens. They can be split into five categories: places of culture (theatres, museums), restaurants (bars, cafes), landmarks (bridges, streets, etc.), parks and “others” (creative spaces, studios, etc.),” – explains Ms. Mukhina.

The system also has to be able to distinguish between a local and a tourist. The researchers assumed that people post their photos while they are in the city. They also assumed that people tend to have two 15-day vacation periods each year with at least 30 days in between. If a user posts photos from the city outside of those periods, the system assumes that they are a local. All other users are considered tourists and are therefore not included in the analysis. The developers also identified the places that are most popular with tourists and added them to a blacklist. Thus, go-to tourist spots like the Hermitage, the Russian Museum and the Peter-and-Paul Fortress were excluded from the list of “secret spots”.
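The local-versus-tourist rule above could be sketched roughly as follows. This is one possible reading of the rule, not the researchers' code: a user is treated as a tourist if all of their in-city post dates within one year fit into at most two 15-day windows whose posts are at least 30 days apart, and as a local otherwise. The function names and the exact way the gap is measured (from the last post of the first visit to the first post of the second) are assumptions.

```python
from datetime import date

VACATION_DAYS = 15  # length of one assumed vacation period
MIN_GAP_DAYS = 30   # assumed minimum gap between the two periods


def fits_in_one_period(days):
    """True if the sorted dates span at most VACATION_DAYS calendar days."""
    return (days[-1] - days[0]).days + 1 <= VACATION_DAYS


def fits_two_vacations(days):
    """True if the sorted dates fit into at most two vacation periods."""
    if fits_in_one_period(days):
        return True
    # Try every way of splitting the timeline into an earlier and a later visit.
    for i in range(1, len(days)):
        first, second = days[:i], days[i:]
        gap = (second[0] - first[-1]).days
        if (gap >= MIN_GAP_DAYS
                and fits_in_one_period(first)
                and fits_in_one_period(second)):
            return True
    return False


def is_local(post_dates):
    """Treat a user as local if vacations alone cannot explain their in-city posts."""
    days = sorted(set(post_dates))
    if not days:
        return False  # no in-city posts, nothing to classify
    return not fits_two_vacations(days)


# Made-up examples: two short visits versus posts spread over several months.
visitor = [date(2016, 1, 1), date(2016, 1, 10), date(2016, 7, 1), date(2016, 7, 12)]
resident = [date(2016, 1, 1), date(2016, 2, 1), date(2016, 3, 1), date(2016, 4, 1)]
print(is_local(visitor), is_local(resident))
```

Checking every split point keeps the sketch simple and avoids missing a valid pair of visits; with one year of posts per user the extra cost is negligible.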

Grey market analysis

Social network data analysis allows researchers to study areas that have long been hidden from the public eye. Daniil Voloshin, a postgraduate student at ITMO University’s High-Performance Computing Department, and his colleagues have performed an analysis of the grey market of sex work in St. Petersburg. The researchers were first and foremost interested in the geographical spread of venues that provided such illegal services.

Daniil Voloshin, Ksenia Mukhina and Vasiliy Boychuk

There is a sub-field of criminology called ecology theory that studies the correlation between the urban environment and the occurrence of criminal activity. In recent years, a slew of research has appeared on the clustering of certain types of illegal activities. For instance, some crimes tend to happen close to bars and pubs. Knowing this makes it possible to differentiate between types of crime and thus increase the efficiency of law enforcement bodies. In addition, it helps develop better social policies and urban planning.

The researchers decided to find out whether advertisements posted on the web, including social media, can give a picture of the overall state of the market. The scientists also benefitted from the fact that most such adverts contain not only descriptions of sex workers, but their coordinates as well. In this regard, the researchers note, Russian websites differ greatly from their European or American counterparts.

Research has shown that the providers of illegal services are successfully keeping up with the latest trends in technology: there are even applications that provide loyalty programs (bonuses, etc.) and, most importantly, coordinates. Even though the coordinates are not complete, they are enough for the researchers to assign a venue to a certain micro-district. Such research is highly beneficial to the study of hidden populations: criminals and those involved in illegal activity. Currently this data is put on the web by the perpetrators themselves, which is why it is quite easy to gather and analyze. The areas that used to be hidden from the public eye are now putting themselves out there in virtual space.

The team is currently struggling to establish the patterns and correlations among the various points on their map. There are several hypotheses as to why venues cluster: cheap rent, shared networks, etc.

The most valuable part of this information is the text. It varies from short, concise sentences to long-winded descriptions. It’s hard to establish whether the descriptions are written by the same people who provide the services, but logical connections can be made among coordinates using linguistic analysis to detect similar structures and terminology. For instance, if the spots are located far from each other but use similar descriptions, it is likely that they are part of a network.
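The idea of linking distant points through similar wording can be sketched in a few lines. The article does not say which linguistic method was used, so this example swaps in a deliberately simple one: Jaccard overlap of word sets, combined with great-circle distance. The function names, both thresholds and the placeholder listings are hypothetical.

```python
import math
import re
from itertools import combinations


def tokens(text):
    """Lowercased set of words in a description."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    """Share of words two descriptions have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def possible_network_pairs(listings, min_similarity=0.6, min_distance_km=5.0):
    """Return id pairs that are far apart but use similar wording."""
    pairs = []
    for (id_a, pos_a, text_a), (id_b, pos_b, text_b) in combinations(listings, 2):
        if (haversine_km(pos_a, pos_b) >= min_distance_km
                and jaccard(tokens(text_a), tokens(text_b)) >= min_similarity):
            pairs.append((id_a, id_b))
    return pairs


# Placeholder listings: (id, (lat, lon), description). Entirely made up.
listings = [
    ("a", (59.93, 30.36), "Cosy studio near metro, open daily, call anytime"),
    ("b", (60.05, 30.33), "cosy studio near metro open daily call now"),
    ("c", (59.85, 30.30), "large loft with parking and late hours"),
]
print(possible_network_pairs(listings))
```

Word-set overlap ignores word order and grammar, so it only approximates the "similar structures and terminology" mentioned above; it is used here because it needs no external libraries.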

This, in turn, lets researchers determine to what degree this market is organized and explore both its virtual and its spatial structure. One of the fields of deviance studies explores methods of determining the professional status of a person involved in illegal business. One of the criteria used in that process is the subject’s use of professional language. By analyzing the text descriptions, researchers can figure out how deeply the various types of people are involved in this activity.

Another source of information is reviews. Earlier on, the researchers used data from websites managed by the service providers themselves, but less structured data from social networks was no less useful for their analysis. Websites that contain tips and reviews are useful for several reasons: they provide a great deal of text, as well as dictionaries for newcomers, and these can be analyzed to increase the accuracy of the results. In some cases, the information in social networks is more credible than what is found on other websites, so it definitely has to be taken into account, explain the researchers.