class: center, middle, inverse, title-slide

# Urban and socio-economic correlates of property prices in Dublin’s area
## IEEE DSAA Special Session - EnGeoData
### Damien Dupré
### Dublin City University - October 6th, 2020

---
layout: true

<div class="custom-footer"><span>DSAA EnGeoData - Dupré (2020) </span></div>

<!-- --- -->
<!-- # My Journey into Data Science -->
<!-- #### Development of the DynEmo Facial Expression Database (Master) -->
<!-- * Dynamic and spontaneous emotions -->
<!-- * Assessed with self-reports and by observers -->
<!-- #### Analysis of Emotional User Experience of Innovative Tech. (Industrial PhD) -->
<!-- * Understand users' acceptance of technologies from their emotional response -->
<!-- * Based on multivariate self-reports -->
<!-- #### Evaluation of Emotions from Facial and Physiological Measures (Industrial PostDoc) -->
<!-- * Applications to marketing, sports and automotive industries -->
<!-- * Dynamic changes with trend extraction techniques (2 patents) -->
<!-- #### Performance Prediction using Machine Learning (Academic PostDoc) -->
<!-- * Application to sport analytics -->
<!-- * Big Data treatment (> 1 million users with activities recorded in the past 5 years) -->

---
class: inverse, mline, center, middle

# 1. A Closer Look at Property Prices

---

# Market Characteristics

A housing market is specific to a country and its context.
.pull-left[
Countries differ widely in terms of:

- Architecture Styles/Composition
- Building Materials
- Legislation
- Urban features
- Population characteristics
]

.pull-right[
<img src="dsaa_2020_files/img/slides_insertimage_1.png" width="100%" style="display: block; margin: auto;" />

.center.tiny[Ratio of Housing prices adjusted for inflation (base year 2015)<br />Credit: Jeff Desjardins - Visual Capitalist (2019) [🔗](https://www.visualcapitalist.com/mapped-the-countries-with-the-highest-housing-bubble-risks/)]
]

Ireland can be considered to be in a housing bubble, as it has one of the highest ratios of inflation-adjusted housing prices in the world.

Almost 40% of the Irish population lives in greater Dublin (1.9M of 4.9M inhabitants), spread over a 318 km² urban area, resulting in one of the lowest population densities among European capitals (4,811 inhabitants/km²).

---

# Market History

Housing prices are embedded in an economic context. Despite having one of the highest economic growth rates in Europe (5.5% in 2019), Ireland suffered badly from the economic crash between 2008 and 2012.

<img src="dsaa_2020_files/slides_files/figure-html/unnamed-chunk-2-1.png" width="864" style="display: block; margin: auto;" />

.center.tiny[Distribution of properties sold in Dublin's area between 2010 and 2018]

But since 2013, prices have risen constantly, by an average of 7%, leading to a social crisis in Dublin where the number of homeless people has soared.

---

# A Data Science Insight

Every property purchase has to be published online on the [Property Price Register](https://propertypriceregister.ie):

- Since January 2010
- Filed by the owners
- Contains only the address and the price

In its current form, the Property Price Register is only used by buyers to compare the price of their future acquisition with the prices at which houses have been sold on the same street.
However, it could also be used to obtain an overall view of the price distribution by geocoding the addresses using the [Nominatim REST API from Open Street Map](http://nominatim.openstreetmap.org):

```r
library(magrittr) # provides the %>% pipe

address <- "10 Downing St, Westminster, London SW1A 2AA, United Kingdom"
# Free-form query passed through the `q` parameter of the search endpoint
api_url <- "https://nominatim.openstreetmap.org/search?q=@addr@&format=json&addressdetails=0&limit=1"

api_url %>%
  # Insert the address, percent-encoding every run of whitespace
  stringr::str_replace("\\@addr\\@", stringr::str_replace_all(address, "\\s+", "%20")) %>%
  jsonlite::fromJSON() %>%
  dplyr::select(lat, lon)
#           lat                  lon
# 1 51.50344025 -0.12770820958562096
```

---

# Spatial Density with GAM

Once the lat/lon coordinates have been obtained, it is possible to estimate the typical property price in a 2D space:

- Use of a Generalized Additive Model (GAM, see Wood, 2017, *Generalized additive models: An introduction with R*)
- Application of a soap film smoother to solve the "finite area smoothing" problem due to the coastal shape of Dublin's area

<img src="dsaa_2020_files/img/slides_insertimage_2.png" width="60%" style="display: block; margin: auto;" />

.center.tiny[Illustration of the "finite area smoothing" problem.<br />Credit: Gavin Simpson (2016) [🔗](https://fromthebottomoftheheap.net/2016/03/27/soap-film-smoothers/)]

---

# Spatial Density with GAM

.pull-left[
1- A matrix of key knots has to be drawn within the shape to predict
]

.pull-right[
<img src="dsaa_2020_files/slides_files/figure-html/unnamed-chunk-6-1.png" width="144" style="display: block; margin: auto;" />

.center.tiny[Key knots within Dublin's coastal area.]
]

2- Bayesian spline smoothing using restricted maximum likelihood is applied to estimate the degree of smoothness of the property price variation ([Wood, 2011](https://doi.org/10.1111/j.1467-9868.2010.00749.x); [Lin & Zhang, 1999](https://doi.org/10.1111/1467-9868.00183)):

```r
gam_pred <- mgcv::gam(
  # Soap film smoother ("so") over longitude/latitude, bounded by `bound`
  price ~ s(lng, lat, bs = "so", xt = list(bnd = bound)),
  data = data_dublin_2018_features,
  method = "REML",   # smoothness selected by restricted maximum likelihood
  knots = knots,     # interior knots located inside the boundary
  family = gaussian(link = "identity")
) %>%
  # gam_density() is not an mgcv function; presumably a project helper (not shown)
  gam_density(too.far = 0.05, n.grid = 100)
```

---

# Spatial Density with GAM

<img src="dsaa_2020_files/slides_files/figure-html/unnamed-chunk-8-1.png" width="504" style="display: block; margin: auto;" />

The price differences between areas suggest that underlying factors are driving them.

---
class: inverse, mline, center, middle

# 2. Urban and Socio-Economic Correlates

---

# Urban Correlates

Do property prices correlate with their distance to urban landmarks?

Calculation of the shortest distance between each property's GPS location and the nearest instance of each of 160 urban feature types from the [Overpass API of Open Street Map](https://wiki.openstreetmap.org/wiki/Overpass_API).

For example, here is the query for all the pubs in Dublin:

.pull-left[
```r
library(osmdata)
library(magrittr) # %>% and use_series()

opq("Dublin") %>%
  add_osm_feature(
    key = "amenity",
    value = "pub"
  ) %>%
  osmdata_sp() %>%
  use_series("osm_points") %>%
  plot()
```
]

.pull-right[
<img src="dsaa_2020_files/slides_files/figure-html/unnamed-chunk-9-1.png" width="288" style="display: block; margin: auto;" />

.center.tiny[Location of results obtained from the query `amenity = pub` in Dublin using the R package {osmdata}.]
]

---

# Urban Correlates

These distances are used as predictors of the property price in an XGBoost model (tree-based boosted regression, see Chen & Guestrin, 2016).

The dataset was randomly split into a training set and a test set (80% vs. 20%).

The results reveal that the model explains 44% of the variance of the property prices in the test dataset ...
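The shortest-distance predictors mentioned above could be computed along the following lines. This is only a minimal sketch in base R, assuming a great-circle (haversine) distance and toy coordinates; the helper names `haversine_m` and `nearest_distance_m`, the column names and the data are hypothetical, and the study itself may well have relied on a spatial package instead.

```r
# Great-circle (haversine) distance in metres between coordinates
# given in decimal degrees; 6371000 m is the usual mean Earth radius.
haversine_m <- function(lat1, lon1, lat2, lon2, radius = 6371000) {
  to_rad <- pi / 180
  d_lat <- (lat2 - lat1) * to_rad
  d_lon <- (lon2 - lon1) * to_rad
  a <- sin(d_lat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(d_lon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# For each property, distance to the nearest point of one feature type.
nearest_distance_m <- function(properties, feature_points) {
  vapply(
    seq_len(nrow(properties)),
    function(i) {
      min(haversine_m(
        properties$lat[i], properties$lon[i],
        feature_points$lat, feature_points$lon
      ))
    },
    numeric(1)
  )
}

# Hypothetical toy data: two properties and three pubs around Dublin.
properties <- data.frame(lat = c(53.3498, 53.3000), lon = c(-6.2603, -6.2000))
pubs <- data.frame(
  lat = c(53.3498, 53.3400, 53.3100),
  lon = c(-6.2603, -6.2700, -6.2100)
)

# One new predictor column per feature type (here only pubs).
properties$dist_pub_m <- nearest_distance_m(properties, pubs)
print(properties)
```

Repeating the last step for every feature type would give one distance column per urban feature, which is the shape of input a tree-based model expects.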
<img src="dsaa_2020_files/slides_files/figure-html/unnamed-chunk-11-1.png" width="504" style="display: block; margin: auto;" />

.center.tiny[Model's prediction accuracy using urban features *vs.* test dataset.]

... but some features are more important than others!

---

# Urban Correlates

The next step is to identify how much each feature contributes to the property price estimate. This contribution (also called "importance") measures the improvement in accuracy brought by a feature.

<table class="table" style="font-size: 14px; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> Feature Category </th> <th style="text-align:left;"> Feature Type </th> <th style="text-align:left;"> Importance </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> amenity </td> <td style="text-align:left;width: 10em; "> embassy </td> <td style="text-align:left;"> 15.2% </td> </tr> <tr> <td style="text-align:left;"> natural </td> <td style="text-align:left;width: 10em; "> grassland </td> <td style="text-align:left;"> 4.5% </td> </tr> <tr> <td style="text-align:left;"> route </td> <td style="text-align:left;width: 10em; "> bus </td> <td style="text-align:left;"> 2.3% </td> </tr> <tr> <td style="text-align:left;"> power </td> <td style="text-align:left;width: 10em; "> line </td> <td style="text-align:left;"> 2.2% </td> </tr> <tr> <td style="text-align:left;"> boundary </td> <td style="text-align:left;width: 10em; "> administrative </td> <td style="text-align:left;"> 2.2% </td> </tr> <tr> <td style="text-align:left;"> amenity </td> <td style="text-align:left;width: 10em; "> bar </td> <td style="text-align:left;"> 1.8% </td> </tr> <tr> <td style="text-align:left;"> boundary </td> <td style="text-align:left;width: 10em; "> political </td> <td style="text-align:left;"> 1.8% </td> </tr> <tr> <td style="text-align:left;"> place </td> <td style="text-align:left;width: 10em; "> island </td> <td style="text-align:left;"> 1.7% </td> </tr> <tr> <td 
style="text-align:left;"> route </td> <td style="text-align:left;width: 10em; "> ferry </td> <td style="text-align:left;"> 1.5% </td> </tr> <tr> <td style="text-align:left;"> barrier </td> <td style="text-align:left;width: 10em; "> wall </td> <td style="text-align:left;"> 1.5% </td> </tr> </tbody> </table>

.center.tiny[Top 10 most important features according to their contribution to the model.]

Proximity to **embassies** and **grasslands** contributes most to the predicted property prices.

---

# Socio-economic Correlates

Does the distribution of population characteristics correlate with property prices?

Each property is located within its "small area" (i.e. the highest-resolution administrative map unit) used for the [Irish Census 2016](https://www.cso.ie/en/).

Each small area is summarised by the percentages of 48 socio-economic features (e.g. age, employment, civil status, religion, ...).

<img src="https://shanelynnwebsite-mid9n9g1q9y8tt.netdna-ssl.com/wp-content/uploads/2017/10/electoral-divisions-dublin-city-sapmap.png" width="60%" style="display: block; margin: auto;" />

.center.tiny[Divisions of Dublin small areas.<br />Credit: Shane Lynn (2018) [🔗](https://www.shanelynn.ie/the-irish-property-price-register-geocoded-to-small-areas/)]

---

# Socio-economic Correlates

The census characteristics of the small area containing each property are used as predictors of the property price in an XGBoost model.

Again, the dataset was randomly split into a training set and a test set (80% vs. 20%).

The results reveal that the 48 socio-economic features explain 42.8% of the property price variance in the test dataset.

<img src="dsaa_2020_files/slides_files/figure-html/unnamed-chunk-15-1.png" width="504" style="display: block; margin: auto;" />

.center.tiny[Model's prediction accuracy using socio-economic features *vs.* test dataset.]
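The evaluation procedure used for both XGBoost models (random 80/20 split, variance explained on the test set, feature importance) could be sketched as follows. This is a hedged illustration on simulated data, not the code behind the slides: the feature names, the simulated prices and the hyperparameters are all assumptions, and "importance" is taken here to be the gain reported by `xgb.importance()`.

```r
library(xgboost)

set.seed(2020)

# Hypothetical simulated data standing in for the real predictors:
# 1,000 properties, 5 numeric features, price driven mostly by feature_1.
n <- 1000
features <- matrix(
  runif(n * 5),
  ncol = 5,
  dimnames = list(NULL, paste0("feature_", 1:5))
)
price <- 300000 + 200000 * features[, "feature_1"] +
  50000 * features[, "feature_2"] + rnorm(n, sd = 20000)

# Random 80% / 20% split into training and test sets.
train_idx <- sample(n, size = 0.8 * n)
dtrain <- xgb.DMatrix(features[train_idx, ], label = price[train_idx])
dtest <- xgb.DMatrix(features[-train_idx, ], label = price[-train_idx])

# Tree-based boosted regression; hyperparameters are illustrative only.
model <- xgb.train(
  params = list(objective = "reg:squarederror", max_depth = 3, eta = 0.1),
  data = dtrain,
  nrounds = 100
)

# Share of the test-set variance explained by the predictions (R squared).
pred <- predict(model, dtest)
obs <- price[-train_idx]
r_squared <- 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)
print(r_squared)

# Gain-based importance: each feature's share of the total loss reduction.
importance <- xgb.importance(model = model)
print(importance)
```

Because the gain shares sum to one, they can be read as percentages in the same way as the importance tables in these slides.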
---

# Socio-economic Correlates

<table class="table" style="font-size: 14px; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> Feature Category </th> <th style="text-align:left;"> Feature Type </th> <th style="text-align:left;"> Importance </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> carers </td> <td style="text-align:left;width: 10em; "> Provides No Care </td> <td style="text-align:left;"> 33.9% </td> </tr> <tr> <td style="text-align:left;"> housing rooms </td> <td style="text-align:left;width: 10em; "> 8 or more Rooms </td> <td style="text-align:left;"> 19.1% </td> </tr> <tr> <td style="text-align:left;"> religion </td> <td style="text-align:left;width: 10em; "> No Religion </td> <td style="text-align:left;"> 3.2% </td> </tr> <tr> <td style="text-align:left;"> population </td> <td style="text-align:left;width: 10em; "> Age 0 - 14 </td> <td style="text-align:left;"> 3.1% </td> </tr> <tr> <td style="text-align:left;"> general health </td> <td style="text-align:left;width: 10em; "> Very Good </td> <td style="text-align:left;"> 2.0% </td> </tr> <tr> <td style="text-align:left;"> housing rooms </td> <td style="text-align:left;width: 10em; "> 6 Rooms </td> <td style="text-align:left;"> 1.9% </td> </tr> <tr> <td style="text-align:left;"> population </td> <td style="text-align:left;width: 10em; "> Age 80 Plus </td> <td style="text-align:left;"> 1.7% </td> </tr> <tr> <td style="text-align:left;"> carers </td> <td style="text-align:left;width: 10em; "> 1-19 hours unpaid PW </td> <td style="text-align:left;"> 1.7% </td> </tr> <tr> <td style="text-align:left;"> housing tenure </td> <td style="text-align:left;width: 10em; "> Owner Occupier No Mortgage </td> <td style="text-align:left;"> 1.7% </td> </tr> <tr> <td style="text-align:left;"> religion </td> <td style="text-align:left;width: 10em; "> Other Catholic </td> <td style="text-align:left;"> 1.7% </td> </tr> </tbody> </table>

.center.tiny[Top 10 most important features 
according to their contribution to the model.]

The proportion of individuals not providing regular unpaid personal help in the area is the most important socio-economic feature for predicting property prices. The share of large properties in the area and the proportion of individuals reporting no religion are also relatively important.

---
class: inverse, mline, center, middle

# 3. Conclusion

---

# Conclusion

By performing a feature analysis with urban and socio-economic features, it is possible to evaluate and predict the potential price of a property.

--

- The presence of embassies or parks is a criterion that significantly influences property prices.

--

- Similarly, the characteristics of inhabitants in the area, such as religion, health and age, are correlated with the evolution of housing prices.

--

These results help explain why some areas have higher prices than others, which is relevant information not only for real estate agents in charge of property valuations but also for buyers seeking to estimate the real value of a property.

---

# Acknowledgment

Many thanks to the contributors and developers of the OpenStreetMap project.

Many thanks to the developers and contributors of R and RStudio as well as to those of the packages used:

- {tidyverse}
- {magrittr}
- {mgcv}
- {xgboost}
- {OpenStreetMap}
- {osmdata}
- {rgdal}
- {sf}
- {sp}

to name the main ones.

---
class: inverse, mline, left, middle

<img class="circle" src="https://github.com/damien-dupre.png" width="250px"/>

# Thanks for your attention, find me at...

[
@damien_dupre](http://twitter.com/damien_dupre) [
@damien-dupre](http://github.com/damien-dupre) [
damien-datasci-blog.netlify.app](https://damien-datasci-blog.netlify.app) [
damien.dupre@dcu.ie](mailto:damien.dupre@dcu.ie)