New York University Professor’s Take on the Future of Digital Cities

Stanislav Sobolevsky is an Associate Professor of Practice at New York University (NYU), the Head of Urban Complexity Research Group and a co-founder of the Insight Data Labs consulting studio. His research interests include machine learning and Big Data, network analysis, human movement modeling, and the theory of differential equations. During his open lecture at ITMO University, he talked about a new scientific field, urban informatics, and in particular what issues it deals with and how it helps make cities more livable.


Thanks to the technological revolution that has been sweeping across the globe in the last couple of decades, digital technologies are gaining popularity. As more and more aspects of our life become digitized, vast amounts of data are generated. City traffic, energy consumption, taxi and public transport routes, phone calls, credit card transactions, Wi-Fi and Bluetooth data, social networks – we leave our digital traces everywhere.

Nowadays, these data are actively used in smart cities research and planning in order to make modern cities as comfortable to live in as possible. Because city systems are extremely vulnerable and complex, modern technologies play a very important role in the field of urban science.


Digital cities

The concept of digital cities implies the use of analytical approaches aimed at making people’s lives safer and more comfortable. Digital informatics is based on three pillars: urbanization (new challenges), digitalization (new solutions), and machine learning and artificial intelligence (new tools).

Urbanization has led to more than 50% of the world’s population living in cities. But it hasn’t always been so. In the 19th century, only 3% of people were city-dwellers. Some cities have grown dramatically over the last 30 years. For example, as recently as 30 years ago, Singapore, which is now one of the smartest cities in the world, was just a small town in the tropics.

Modern cities are the main energy consumers: they are responsible for more than 75% of the world’s energy consumption. Population density, heavy traffic, and large numbers of industrial enterprises present new challenges to modern society and thus create the need for new digital solutions.

As for digitalization, by 2003, humankind had produced 5,000,000,000 GB (5 exabytes) of information. Today, we generate the same amount of information in only a day or two. It’s not that people have become so much smarter. It’s just that many processes now require the use of digital technologies, and all these huge amounts of information are now logged.

Singapore. Credit:

In many cities, this information is public. In New York, for instance, all data that is not private automatically becomes public. Researchers in New York can use information on the city’s public transport, road traffic and pedestrian flows, recycling, weather, air pollution, noise levels, population, social media, crime statistics, real estate, and so on.

Certain information is harder to get, but it is still largely available. This includes anonymized mobile data, Wi-Fi usage data, credit card transaction statistics, and anonymized clinical records. These can be acquired if certain conditions are met.

As for machine learning and artificial intelligence, over the last ten years, the number of startups that use these technologies has increased significantly. However, the flip side of the coin is that people expect too much from these new tools, which often results in disappointment.

There are several possible applications of AI and machine learning technologies in urban studies.

HubCab Project. Credit:

HubCab project

Taxis are an important means of transport in many cities and one of the main contributors to traffic jams and air pollution. Sharing taxi rides would reduce these negative effects, but because it comes with certain drawbacks (such as longer journey times), this way of traveling is not very popular.

The HubCab project’s authors analyzed data on millions of taxi rides in New York and concluded that the number of rides could be reduced by 40%, which would, in turn, lead to lower prices, less traffic, and environmental benefits.
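To give a feel for how a reduction in the number of rides can be estimated from trip records, here is a minimal Python sketch. It is not the HubCab authors' method: the trip format `(pickup_minute, origin_zone, destination_zone)`, the zone names, the `max_wait` parameter, and the greedy pairing rule are all assumptions made for illustration, and the trips are toy data written by hand.

```python
def rides_after_sharing(trips, max_wait=5):
    """Greedily pair trips that share origin and destination zones and
    whose pickup times differ by at most `max_wait` minutes.

    Returns the number of rides needed once paired trips are merged.
    """
    remaining = sorted(trips)  # earliest pickup first
    rides = 0
    while remaining:
        minute, origin, dest = remaining.pop(0)
        for i, (other_minute, other_origin, other_dest) in enumerate(remaining):
            if other_minute - minute > max_wait:
                break  # later trips start too late to share this ride
            if (other_origin, other_dest) == (origin, dest):
                remaining.pop(i)  # merge the two trips into one ride
                break
        rides += 1
    return rides


# Hypothetical toy trips: (pickup minute, origin zone, destination zone)
trips = [
    (0, "A", "B"),
    (2, "A", "B"),
    (3, "A", "C"),
    (4, "A", "C"),
    (30, "A", "B"),
    (40, "B", "C"),
]

shared = rides_after_sharing(trips)
reduction = 1 - shared / len(trips)
print(f"{len(trips)} trips served by {shared} rides ({reduction:.0%} fewer)")
```

A real analysis would also have to account for detours, vehicle capacity, and how long passengers are willing to wait, none of which this sketch models.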

Mobility in developed and developing countries

Transport and communication networks lay the foundation for sustainable development in developing countries. Mobile data research gives scientists insight into many aspects of citizens’ mobility. However, such research is primarily conducted in the most developed regions of the world, which comprise only one-third of the world population. Little has been done in this field in developing countries so far.

The data the researchers worked with had been gathered from mobile phone towers in Côte d’Ivoire, which shows that developing regions can be analyzed as well. In addition to its fast urbanization, Côte d’Ivoire has a rich diversity of cultures and languages. Such contrasting social interactions gave the researchers an opportunity to look into different models of communication and mobility, as well as to understand the changing needs of a developing country.

The scholars compared the mobility figures of Côte d’Ivoire with those of the more industrialized Portugal to examine whether the mobility models created for developed regions suit developing ones. The key parameters here were the probability, average distance, and regional distribution of migration. To better understand the regional disparities, the team used a set of algorithms that identify communities in a network. Applying these algorithms to both Côte d’Ivoire and Portugal, they discovered a couple of surprising differences in the structure of the mobility networks.

Côte d’Ivoire. Credit:

For example, the official administrative borders of Côte d’Ivoire do not align well with the borders of the communities detected in the country, whereas those in Portugal do. The scientists also discovered that communities which structurally corresponded to the established tribal and cultural divisions allowed for better models of mobility than those defined by administrative borders.
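One common way to check how well a given set of borders matches the structure of a mobility network is to score each partition with Newman's modularity. The Python sketch below does this for a toy network; it is only an illustration under stated assumptions, not the researchers' actual algorithm or data. The towns, flow weights, and both partitions (`cultural` and `administrative`) are hypothetical.

```python
from collections import defaultdict


def modularity(edges, partition):
    """Newman modularity of `partition` on an undirected weighted graph.

    edges: iterable of (node_u, node_v, weight)
    partition: dict mapping each node to a community label
    """
    total = sum(w for _, _, w in edges)
    inside = defaultdict(float)  # edge weight inside each community
    degree = defaultdict(float)  # summed node strength per community
    for u, v, w in edges:
        degree[partition[u]] += w
        degree[partition[v]] += w
        if partition[u] == partition[v]:
            inside[partition[u]] += w
    return sum(
        inside[c] / total - (degree[c] / (2 * total)) ** 2 for c in degree
    )


# Hypothetical flows between six towns: two tightly knit groups, one weak link
flows = [
    ("A", "B", 10), ("B", "C", 10), ("A", "C", 10),
    ("D", "E", 10), ("E", "F", 10), ("D", "F", 10),
    ("C", "D", 1),
]

# Hypothetical partitions: one follows the groups, one cuts across them
cultural = {"A": 1, "B": 1, "C": 1, "D": 2, "E": 2, "F": 2}
administrative = {"A": 1, "B": 1, "C": 2, "D": 2, "E": 3, "F": 3}

q_cultural = modularity(flows, cultural)
q_administrative = modularity(flows, administrative)
print(f"cultural: {q_cultural:.3f}, administrative: {q_administrative:.3f}")
```

In this toy case the partition that follows the tightly connected groups scores much higher than the one that cuts across them, which is the kind of mismatch the comparison of borders is meant to reveal.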

ITMO.NEWS talked with the speaker to find out what makes a good city informatics specialist, whether it is possible for the data to be used incorrectly, and how the countries that have only just started to explore this new scientific field can benefit from the experience of their more urban-savvy counterparts.

Stanislav Sobolevsky

What requirements are there for city informatics specialists, be they data journalists or developers? Does ethics play a big role here, or is it the technical aspects that take the lead?

City informatics is, undoubtedly, a multidisciplinary scientific field; even a cursory analysis of the educational backgrounds of our students attests to this fact. We have majors in architecture, design, engineering, and maths, and such diversity of skills and talent is crucial when working with urban data. Another important aspect is a researcher’s ability to ask the right questions, which is a characteristic more often found in specialists from applied scientific fields.

Technical skills, such as statistics and data analysis and processing, do have their significance here. Also valuable is a certain scientific integrity: because we have to decode the data as accurately as possible, a city informatics specialist has to be prepared to interpret their findings in the right way, without overestimating them. But it wouldn’t be correct to say that all of these skills are an absolute must for a beginner: you can develop them in the process, but you need to be willing to learn.


Have there been any cases in which the data wasn’t used correctly?

There are projects out there where some procedural mistakes have been found. One example is Google Flu Trends: as part of this research initiative, Google specialists proposed that the number of Internet queries about falling ill is proportional to the spread of a disease. The company started making forecasts aimed at identifying where and when the epidemic would strike next, based on people googling the possible reasons for their sudden malaise. After processing these queries, the analysts then predicted the future spread of the epidemic and the regions that would be affected.

During the first two years, the model worked without any evident problems. The first serious glitch happened in 2009. Because of the intense media hype about swine flu, the public started searching for information on the disease more often than usual, and the engine greatly overestimated the spread of the epidemic. While in reality it existed as a series of unconnected cases, the users’ frantic searching created an image of a whole country infected with the disease.

This predicament resulted in some changes to the model, but three years later, these changes had the reverse effect: the engine completely failed to forecast the next epidemic because some queries were given less weight than others. The public’s interest in information about the disease had also shifted dramatically, which mattered because people now phrased their Google searches differently. Because the 2012 flu epidemic went unnoticed, the company decided to shut the project down. This is an example of scientists trying to readjust a model to improve its forecasts, only for it to lose its ability to extrapolate and fail as a result.
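The overestimate described above follows directly from the proportionality assumption, which can be shown with a few lines of Python. This is a minimal sketch, not Google's model: the function name `fit_proportional` and all numbers are hypothetical, and the fit is an ordinary least-squares line through the origin.

```python
def fit_proportional(queries, cases):
    """Least-squares fit of cases = k * queries (a line through the origin)."""
    numerator = sum(q * c for q, c in zip(queries, cases))
    denominator = sum(q * q for q in queries)
    return numerator / denominator


# Hypothetical calm-period data: searches track real cases closely
calm_queries = [100, 200, 300]
calm_cases = [10, 20, 30]
k = fit_proportional(calm_queries, calm_cases)

# Hypothetical hype week: media coverage inflates searches, not infections
hype_queries = 1000
true_cases = 40
predicted_cases = k * hype_queries

print(f"k = {k:.2f}")
print(f"predicted {predicted_cases:.0f} cases, observed {true_cases}")
```

As soon as something other than illness drives the searches, the fitted coefficient no longer applies, and the prediction drifts far from the observed count.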

Google Flu Trends. Credit:

Human behavior is so inconsistent that when we look at some event through a certain prism, we can’t expect this interpretation to work for another event a couple of years later. Such models are usually very short-lived.

Do you use any methods in your own work to detect that a specific model has lost its relevance?

It is very important for us to anticipate the developments that could require us to upgrade our model to keep it relevant. That’s why we’re constantly working on new approaches to detect such changes. We have a temporal network that allows us to see that some processes today unfold differently than they did, say, two years ago, which is a sign that the models describing them also need to catch up. We don’t see the exact changes, but we register the discrepancies with the earlier data.
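A very simple way to register a discrepancy with earlier data is to compare how some quantity is distributed in two periods. The Python sketch below uses the total variation distance for this; it is a stand-in chosen for illustration, not the temporal-network approach mentioned in the answer. The category names, counts, and the `DRIFT_THRESHOLD` value are all hypothetical.

```python
def total_variation(earlier_counts, recent_counts):
    """Total variation distance between two count distributions (0 to 1)."""
    keys = set(earlier_counts) | set(recent_counts)
    earlier_total = sum(earlier_counts.values())
    recent_total = sum(recent_counts.values())
    return 0.5 * sum(
        abs(
            earlier_counts.get(key, 0) / earlier_total
            - recent_counts.get(key, 0) / recent_total
        )
        for key in keys
    )


# Hypothetical trip counts by time of day in two periods
earlier = {"morning": 50, "midday": 20, "evening": 30}
recent = {"morning": 30, "midday": 40, "evening": 30}

DRIFT_THRESHOLD = 0.1  # assumed tolerance; would need tuning on real data

distance = total_variation(earlier, recent)
needs_update = distance > DRIFT_THRESHOLD
print(f"distance = {distance:.2f}, model needs review: {needs_update}")
```

Like the approach described in the answer, this kind of check signals that something has changed without saying exactly what; finding the cause still requires a closer look at the data.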

Some cities opt for making the data public, but others still approach this with a great deal of conservatism. Does this mean that such cities develop more slowly than their more progressive counterparts?

The development of a city is a complex and slow process in itself. If a city refuses to make its data public, this can limit its potential over the next five years, but it is likely to get there sooner or later. What is certain, though, is that such cities won’t develop the effective elements of a smart city as fast as they could if they were more open, because researchers won’t have access to the resources they need to test their models, so they’ll just keep to the few areas where they have some data to work with.

NYU. Credit:

How is modern legislation on using data for urban development research being formed? Is this process regulated by the state at all?

The majority of Western states now use the case-based regulation model. This means that the initiative comes from the industry, not the government. The court considers and rules on each case separately, and such rulings are then used as precedents when similar cases are examined in the future. However, this is still a very recent trend, and even scientists can’t yet fathom all the difficulties the field will have to face going forward.

It seems that nowadays, it is still impossible to develop a comprehensive legal framework that would cover all cases, so the case-based model stands out as the most sensible solution to this problem. The reality is that the laws on using data to advance the development of cities are influenced by the policies of big corporations; the legislators’ only role is to catch up. The companies reject or approve of certain practices, and the authorities then enshrine them in law. That’s why the market is the real trendsetter when it comes to legislative decisions regulating our scientific field.

Russia and the CIS countries have an advantage here: we can monitor how the situation changes in the countries that have opened up their urban data before us to see which difficulties our Western colleagues have encountered in this process. This gives us an opportunity to analyze their experience and adopt the best practices out there.
