- Lab 1: Tools (Sep 2, 2020)
- Lab 2: Tidy Data (Sep 9, 2020)
- Lab 3: Data Engineering (Sep 17, 2020)¹
- Lab 4: Geo-Visualisation (Sep 23, 2020)
- Lab 5: Networks and Spatial Weights (Sep 30, 2020)
- Lab 6: Linear Regression (Oct 7, 2020)
- Lab 7: Clustering (Oct 14, 2020)
- Lab 8: Points (Oct 21, 2020)
Note: Each lab will be updated no later than the Friday before its week begins.
Lab 1 - Tools
IMPORTANT: This is a supplementary notebook that covers many basics of the tools we will use in the course; it does not explain anything directly related to Urban Data Science.
Students are encouraged to read it once before getting started with the other notebooks and then keep it as a reference throughout the rest of the course. It includes some basic Python operations that serve as a refresher, practice or learning material.
If you want to explore the contents of this tutorial further on your own, the following pointers are good places to start:
- [Video] “Python as Super Glue for the Modern Scientific Workflow”, keynote speech by Prof. Joshua Bloom from UC Berkeley about how Python is used in Astronomy research.
- Gallery of interesting notebooks: a wealth of examples of Jupyter (formerly called IPython) notebooks.
- (Downey, 2012): very good general introduction to Python as a programming language and to the algorithmic way of thinking. The book is freely available in HTML and PDF.
- Downey, A. (2012). Think Python - How to Think Like a Computer Scientist. Green Tea Press.
Lab 2 - Tidy Data
This session uses the “Census socio-demographics” dataset of Liverpool, United Kingdom in two parts. The dataset for this lab is provided in the zipped lab files above.
- Table of LSOA areas in Liverpool with population counts by World region. The table is derived from the CDRC Census data pack.
- Collection of socio-demographic characteristics from the 2011 Census for the city of Liverpool.
- A good extension of this session is (Wickham, 2014). The paper is published under an Open Access license, so it is freely available on the journal’s site, and the author has also made available a public repository with the data and code used in the paper. Keep in mind that the paper, and the code that comes with it, are based on R, not Python.
- [Visualization] Python library
- [Recommended] (McKinney, 2012): An excellent introduction to Python for data analysis, with plenty of examples and code snippets (Publisher’s page link).
- NY Times article about the importance of cleaning data.
- Wickham, H. (2014). Tidy Data. Journal of Statistical Software, 59(10).
- McKinney, W. (2012). Python for data analysis: Data wrangling with Pandas, NumPy, and IPython. O’Reilly Media, Inc.
Lab 3 - Data Engineering
This session uses two datasets which are provided in the zipped lab files above.
- A dataset about wines from different countries, downloaded from Kaggle.
- A dataset scraped and collected from Goodreads.
Lab 4 - Geo-Visualisation
Lab 4 homework exercises are embedded within the lab files themselves. Complete the exercises as you work through and understand the rest of the code. There are two files for this lab,
eda. Submit one or both together as one zip file on Peer when you are done. Since this geocomputational lab is not as straightforward as other Python code, a solution set will be provided for the questions indicated in the lab after submission ends.
This session uses multiple datasets which are provided in the zipped lab files above.
- A “Census socio-demographics” dataset as well as the Ordnance Survey (OS) Geodata Pack.
- An “Index of Multiple Deprivation” dataset as well as the Ordnance Survey (OS) Geodata Pack.
- Additionally, you will need the raster file for the basemap of Liverpool. This has been assembled by Dani Arribas-Bel from the OS VectorMap District (Backdrop Raster), and it is licensed as OpenData.
- Simple datasets on mystery you can find out for yourself.
- A good introduction to the geopandas project is provided by Kelsey Jordahl, the project’s founder, in this set of slides from a 2015 talk and the companion repository.
- An additional great resource is this 4-hour workshop by Carson Farmer.
Lab 5 - Networks and Spatial Weights
This session uses multiple datasets which are all provided in the zipped lab files above.
- An “Index of Multiple Deprivation” dataset used in previous labs.
- A Brexit dataset.
This dataset contains the results of the 2016 referendum vote to leave the EU, at the local authority level. All the necessary data have been assembled for convenience in a single file that contains geographic information about each local authority in England, Wales and Scotland, as well as the vote attributes. The file is in the modern geospatial format GeoPackage, which presents several advantages over the more traditional shapefile (chief among them, the need for only a single file instead of several). The file is provided below.
Unzip and add this file to the data folder as well.
The source data used to compile the file linked above include:
- Electoral Commission data on the EU referendum results
- Local Authority District boundaries
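The single-file nature of GeoPackage mentioned above can be illustrated with Python's standard library alone, because a GeoPackage is an SQLite database whose layers are registered in a table called `gpkg_contents`. The following is a minimal sketch, not part of the lab materials: the helper names `list_gpkg_layers` and `make_demo_file`, the file name `demo.gpkg` and the layer name `local_authorities` are all hypothetical, and the demo file contains only a reduced layer registry rather than a complete, valid GeoPackage.

```python
import os
import sqlite3
import tempfile


def list_gpkg_layers(path):
    """Return (table_name, data_type) pairs registered in a GeoPackage.

    A GeoPackage is a single SQLite file; its `gpkg_contents` table
    lists every layer stored inside it.
    """
    conn = sqlite3.connect(path)
    try:
        rows = conn.execute(
            "SELECT table_name, data_type FROM gpkg_contents "
            "ORDER BY table_name"
        ).fetchall()
    finally:
        conn.close()
    return rows


def make_demo_file(path):
    """Create a tiny stand-in file holding only the layer registry.

    Assumption: this is NOT a complete GeoPackage. It has just enough
    structure (two columns of `gpkg_contents`) to demonstrate the query.
    """
    conn = sqlite3.connect(path)
    try:
        conn.execute(
            "CREATE TABLE gpkg_contents ("
            "table_name TEXT PRIMARY KEY, data_type TEXT NOT NULL)"
        )
        conn.execute(
            "INSERT INTO gpkg_contents VALUES ('local_authorities', 'features')"
        )
        conn.commit()
    finally:
        conn.close()


# Hypothetical demo file in a temporary directory
demo_path = os.path.join(tempfile.mkdtemp(), "demo.gpkg")
make_demo_file(demo_path)
layers = list_gpkg_layers(demo_path)
print(layers)
```

In the lab itself, loading the file through geopandas (for example with its `read_file` function) is the more usual route; the sketch only shows that everything lives in one file.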
Required before the practical
Watch the section on spatial weights of the SciPy’16 tutorial on Geographic Data Science with PySAL.
[YouTube - Min 1:02:55 to 1:25:40]
Watch the section on ESDA of the SciPy’16 tutorial on Geographic Data Science with PySAL.
The “User Guides” in PySAL's documentation are an excellent resource for getting to know the library better.
Lab 6 - Linear Regression
Lab 7 - Clustering
This session uses the “AirBnb listing for Inner London - MSOA level” dataset. Go to the Datasets tab to find out more information as well as instructions to download it.
Required before the practical
Watch the section on spatial clustering of the SciPy’16 tutorial on Geographic Data Science with PySAL.
[YouTube - Min 2:30:00 to 3:02:00]
- Although a bit more advanced, the documentation for scikit-learn, a world-class Python library for machine learning, is excellent and includes many examples that cover the library's entire functionality.
Lab 8 - Points
This lab uses a sample of geo-referenced tweets for the city of Liverpool. Go to the Datasets tab and check the Geo-referenced tweets section to find out more information about the data as well as instructions to download it.
Required before the practical
Watch the section on points of the SciPy’16 tutorial on Geographic Data Science with PySAL.
[YouTube - Min 1:50:00 to 2:30:00]
- A very good resource for kernel density estimation in Python is provided in this blog post by Jake VanderPlas.
- The R package spatstat provides extensive functionality to statistically describe and model point patterns. Note that this is in R, not Python.
¹ On Wednesday, Sep 16, 2020, there is no lecture, likely because of an EPA Programme session.