What is Urban Data Science?


By 2050, an estimated 68% of the world’s population will be living in urban areas, according to United Nations projections. Globally, cities are being transformed or redeveloped so that they can continue to provide citizens with opportunities for sustainable and prosperous livelihoods, and with access to jobs, resources and amenities. Humans have long migrated to dense spatial agglomerations for intense social interaction and the exchange of goods, services, information and ideas. Urbanisation has fostered socioeconomic, technological and institutional transformations that are themselves necessary to sustain this growth. But this unprecedented development is also tightly linked to the most pressing functional and environmental challenges of our time.

Many cities around the world are on the brink of collapse, suffering from poverty and segregation, excessive consumption, pollution and associated changes in climate, declining agricultural productivity and, in exceptional cases, the submergence of land. Lacking the resources to solve such problems, some cities, such as Accra in Ghana and Cairo in Egypt, are directing further development to satellite centres. Indonesia is considering relocating its capital from Jakarta, on the island of Java (home to more than half of the Indonesian population), to the rainforests of Borneo, in part because of rising sea levels and land subsidence. Rapidly urbanising cities are also making tremendous efforts to become smarter, more sustainable, resilient and inclusive. But how?

Urban Data

In the last decade, technological advances have allowed us to embed large-scale networked systems, sensors and computers in the built environment. Urban data has emerged as a near-continuous, often real-time stream of information about many urban activities. The big data revolution, coupled with the capacity of infrastructure to be “smart”, has enticed cities and urban managers worldwide to adopt machine learning-based decision making in the hope of improving urban life. But city planning has largely been organised around loosely coupled actors: municipal and regional governments, project developers, companies and investors, and transport, water and energy operators, so data and decisions are often fragmented across them. While some communities have enjoyed the benefits of policies based on big data, machine learning and AI, many others have suffered disproportionately by being pushed to the physical and technological periphery of rapid urban development. As data scientists, and especially as engineering and policy analysts, we have a responsibility to interrogate the quality of data, the design of intelligent systems and their impact on communities.

Course Contents — What is this class about?

The primary purpose of this course is to teach future data scientists to look beyond the technical power of artificial intelligence, and to recognise the possibilities and limitations of data as well as the spatial inequalities that can emerge or deepen as a result of data-driven policy. The course engages students at the intersection of data science, urbanisation and effective communication. By interrogating the sociotechnical nature of urban problems, students should be able to approach solutions in ways that prioritise social equality and equity.

This class will train students to gather, fuse and clean data from multiple sources in order to gain useful insights into the reality of problems in urban ecosystems, to understand and estimate the implications of alternative solutions, and to communicate results effectively to a wide audience.
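To make “fuse and clean” concrete, here is a minimal Python sketch, using only the standard library, that joins two hypothetical sources on a shared neighbourhood key. The column names, neighbourhoods and values are invented for illustration, and the cleaning rules (collapsing whitespace and lowercasing names, dropping blank readings, averaging repeated readings) are assumptions rather than a prescribed method.

```python
import csv
import io

# Hypothetical extracts from two separate sources. The names and values are
# invented for illustration; they are not real measurements or census counts.
SENSORS_CSV = """neighbourhood,pm25_ugm3
 Riverside ,12.5
Old Town,
hillcrest,8.0
Old Town,15.1
"""

CENSUS_CSV = """Neighbourhood,population
riverside,5400
old town,12100
Hillcrest,3300
Harbourview,7800
"""


def normalise_key(name: str) -> str:
    """Make names comparable across sources: collapse whitespace, lowercase."""
    return " ".join(name.split()).lower()


def load_sensor_means(text: str) -> dict[str, float]:
    """Mean PM2.5 reading per neighbourhood, skipping blank readings."""
    readings: dict[str, list[float]] = {}
    for row in csv.DictReader(io.StringIO(text)):
        value = row["pm25_ugm3"].strip()
        if not value:  # a missing reading is dropped, not treated as zero
            continue
        readings.setdefault(normalise_key(row["neighbourhood"]), []).append(float(value))
    return {key: sum(vals) / len(vals) for key, vals in readings.items()}


def load_population(text: str) -> dict[str, int]:
    """Population per neighbourhood, keyed by the normalised name."""
    return {
        normalise_key(row["Neighbourhood"]): int(row["population"])
        for row in csv.DictReader(io.StringIO(text))
    }


def fuse(sensors: dict[str, float], census: dict[str, int]) -> tuple[dict[str, dict], list[str]]:
    """Inner join on the normalised key; also return keys found in only one source."""
    shared = sorted(sensors.keys() & census.keys())
    fused = {key: {"pm25_ugm3": sensors[key], "population": census[key]} for key in shared}
    unmatched = sorted(sensors.keys() ^ census.keys())
    return fused, unmatched


fused, unmatched = fuse(load_sensor_means(SENSORS_CSV), load_population(CENSUS_CSV))
print(fused["old town"])  # {'pm25_ugm3': 15.1, 'population': 12100}
print(unmatched)          # ['harbourview']
```

Harbourview appears in the census but has no sensor readings, so it falls out of the join; returning `unmatched` keeps that gap visible instead of silently excluding a community from the analysis.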

The course is divided into five major modules, each focusing on a crucial step in the lifecycle of a data science project.

  1. Obtain: Obtaining data from multiple open data sources.
  2. Scrub: Data cleaning, munging and sampling to consolidate all information into a dataset that is manageable, informative and relates to your problem.
  3. Explore: Exploratory data analysis to make sense of what your data is trying to say.
  4. Model: Estimation and modelling based on statistical tools such as regression and clustering.
  5. Interpret: Communicating results and reflections through visualisation, storytelling and interpretable summaries.
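The five steps can be sketched as a chain of small functions. This is a minimal, standard-library Python illustration under stated assumptions: the tracts, the column names `km_to_transit` and `commute_min`, and their values are hypothetical, and the Model step uses ordinary least squares as one example of the regression tools mentioned above.

```python
import csv
import io
import statistics

# Obtain: a stand-in for a file downloaded from an open data portal.
# The tracts and values are hypothetical, chosen only to exercise each step.
RAW_CSV = """tract,km_to_transit,commute_min
A,0.5,22
B,1.0,25
C,,30
D,2.0,31
E,3.0,37
F,n/a,40
"""


def obtain(text: str) -> list[dict[str, str]]:
    """Parse raw CSV text into one dict per row."""
    return list(csv.DictReader(io.StringIO(text)))


def scrub(rows: list[dict[str, str]]) -> list[tuple[float, float]]:
    """Keep only rows where both fields parse as numbers."""
    clean = []
    for row in rows:
        try:
            clean.append((float(row["km_to_transit"]), float(row["commute_min"])))
        except ValueError:
            continue  # blank or non-numeric entries such as "n/a"
    return clean


def explore(pairs: list[tuple[float, float]]) -> dict[str, float]:
    """Simple descriptive statistics for the outcome variable."""
    commutes = [y for _, y in pairs]
    return {"n": len(pairs), "mean_commute": statistics.mean(commutes),
            "min_commute": min(commutes), "max_commute": max(commutes)}


def model(pairs: list[tuple[float, float]]) -> tuple[float, float]:
    """Ordinary least squares fit of commute_min = a + b * km_to_transit."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    b = sum((x - x_bar) * (y - y_bar) for x, y in pairs) / sum((x - x_bar) ** 2 for x in xs)
    return y_bar - b * x_bar, b


def interpret(summary: dict[str, float], intercept: float, slope: float) -> str:
    """Turn the fitted numbers into a plain-language sentence."""
    return (f"Across {summary['n']} tracts, each extra km to transit is associated with "
            f"about {slope:.1f} more commute minutes (intercept {intercept:.1f}).")


pairs = scrub(obtain(RAW_CSV))
summary = explore(pairs)
intercept, slope = model(pairs)
print(interpret(summary, intercept, slope))
# Across 4 tracts, each extra km to transit is associated with about 6.0 more commute minutes (intercept 19.0).
```

Two of the six tracts are dropped during scrubbing, and the fitted slope describes an association, not a cause; both are points worth stating in the Interpret step rather than hiding.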

Pedagogical Goals

After completion of this course, you will be able to:

  • interpret and discuss data sources that are relevant and usable for a given problem.
  • manipulate data and consolidate all information into a dataset that is manageable, informative and relates to your problem.
  • describe and analyse the consolidated dataset(s) to support your problem with evidence.
  • apply statistical and machine learning models to infer results, turning data into valuable information.
  • report results and reflections through visualisation, storytelling and interpretable summaries, especially when faced with a new dataset.

… and hopefully get great data-driven policy jobs in the future where you can address issues of equity, or go on adventurous travels with an open mind.

It may be useful to keep the following description of an Urban System in mind for this course.

Urban Systems

Taking inspiration from Meerow et al. (2016), we describe an urban system as a complex, adaptive and emergent environment that, for simplicity, is organised into four functional layers. The layer of urban infrastructure and form consists of physical assets such as buildings, service infrastructure and natural capital. The layer of networked flows of energy, materials and information enables the movement of resources that address our collective needs. The layer of socioeconomic dynamics expresses how demographics interact with socially constructed values such as sustainability and equity. The layer of governance includes the organisations, institutions and businesses that shape urban systems through new development, policy and innovation.
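One way to put the four layers to work in a project is to tag each candidate dataset with the layer(s) it describes and check which layers lack evidence. The Python sketch below assumes a hypothetical catalogue; the dataset names and layer assignments are illustrative only, and the layer labels paraphrase the description of the urban system given here.

```python
from enum import Enum


class Layer(Enum):
    """Four functional layers of an urban system (after Meerow et al., 2016)."""
    INFRASTRUCTURE_FORM = "urban infrastructure and form"
    NETWORKED_FLOWS = "networked flows of energy, materials and information"
    SOCIOECONOMIC = "socioeconomic dynamics"
    GOVERNANCE = "governance"


# Hypothetical catalogue of candidate datasets, each tagged with the layer(s)
# it describes. Names are illustrative, not real open data portal entries.
CATALOGUE = {
    "building_footprints": {Layer.INFRASTRUCTURE_FORM},
    "bus_ridership": {Layer.NETWORKED_FLOWS},
    "smart_meter_energy_use": {Layer.NETWORKED_FLOWS},
    "census_income": {Layer.SOCIOECONOMIC},
    "zoning_changes": {Layer.GOVERNANCE, Layer.INFRASTRUCTURE_FORM},
}


def coverage(catalogue: dict[str, set[Layer]]) -> dict[Layer, list[str]]:
    """Group dataset names by layer so gaps in the evidence base are visible."""
    by_layer: dict[Layer, list[str]] = {layer: [] for layer in Layer}
    for name, layers in catalogue.items():
        for layer in layers:
            by_layer[layer].append(name)
    return {layer: sorted(names) for layer, names in by_layer.items()}


def missing_layers(catalogue: dict[str, set[Layer]]) -> list[Layer]:
    """Layers with no dataset tagged to them, in declaration order."""
    return [layer for layer, names in coverage(catalogue).items() if not names]


partial = {name: layers for name, layers in CATALOGUE.items() if name != "zoning_changes"}
print([layer.name for layer in missing_layers(CATALOGUE)])  # []
print([layer.name for layer in missing_layers(partial)])    # ['GOVERNANCE']
```

A plain enum and dictionary are enough here; the point is the habit of asking which parts of the urban system a dataset leaves out, not the data structure itself.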

A simplified conceptual schematic of the urban ‘system’.