Take Home Exercise 2

Take Home Exercise 2: Focusing on Airbnb and how their expansion has impacted our economy. Using Spatial Point Patterns Analysis of Airbnb Listing in Singapore.

Sarah Chin linkedin.com/in/sarahchin99/
09-14-2021

1. Overview

Airbnb has expanded their services over 34,000 cities across 191 countries. However, Singapore is still one of the global cities that has yet to legalise short-term rentals offered by platforms such as Airbnb. Despite Singapore’s disregard of using Airbnb, there are still tools and datasets about Singapore that allows people to explore how Airbnb are used in the cities.

2. Installing and Loading the packages

packages = c('maptools', 'sf', 'raster','spatstat', 'tmap', 'onemapsgapi', 'tidyverse', 'lubridate')
for (p in packages){
if(!require(p, character.only = T)){
install.packages(p)
}
library(p,character.only = T)
}

3. Section A: Airbnb Distribution in 2019

In this section, we need to investigate if the distribution of Airbnb listings are affected by location factors such as near to existing hotels, MRT services and tourist attractions.

Before we can analyse these points, we need to import and clean our data. Firstly, we import the Airbnb data using st_read() of sf package and transform the coordinate system to 3414.

airbnb <- read.csv("Airbnb_listing_30062019/30062019.csv")

We also want to extract the number and locations of hotels and tourist attractions in Singapore to see how this competition affects the Airbnb listings.

hotels <- read.csv("OneMap_Data/hotels.csv")
tourism <- read.csv("OneMap_Data/tourism.csv")

Extracting the data for MRT stations are also important for analysis. As the data for MRT station is in shp file format, we will use the following code to extract the data.

mrt <- st_read(dsn = "TrainStation", 
                layer = "MRTLRTStnPtt")
Reading layer `MRTLRTStnPtt' from data source 
  `D:\sarahcsp\IS415_blog\_posts\2021-09-14-take-home-exercise-2\TrainStation' 
  using driver `ESRI Shapefile'
Simple feature collection with 185 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 6138.311 ymin: 27555.06 xmax: 45254.86 ymax: 47854.2
Projected CRS: SVY21

Since the MRT dataset is a shp file, it can be plotted immediately.

Since the Airbnb, hotels and tourists datasets that have been imported are in .csv format, we would need to convert them to sf for further analysis. Additionally, we need to change the coordinate system to 3414, the coordinate system of Singapore. As all of the data provided for latitude and longitude are in decimal degree format, we will assume that the data is in wgs84 Geographic Coordinate System.

airbnb_sf <- st_as_sf(airbnb, 
                       coords = c("longitude", "latitude"),
                       crs=4326) %>%
  st_transform(crs = 3414)

hotels_sf <- st_as_sf(hotels, 
                       coords = c("Lng", "Lat"),
                       crs=4326) %>%
  st_transform(crs = 3414)

tourism_sf <- st_as_sf(tourism, 
                       coords = c("Lng", "Lat"),
                       crs=4326) %>%
  st_transform(crs = 3414)

Let’s plot to review the datasets that have been provided. This is the Airbnb map using airbnb_sf.

tmap_mode("view")
tm_shape(airbnb_sf) + 
  tm_dots(alpha = 0.4, 
          col = "blue", 
          size = 0.05) +
  tm_basemap("OpenStreetMap")

Here is the hotels map using hotels_sf.

tm_shape(hotels_sf) + 
  tm_dots(alpha = 0.4, 
          col = "red", 
          size = 0.05) +
  tm_basemap("OpenStreetMap")

Here are the tourist attractions available in Singapore, using tourism_sf.

tm_shape(tourism_sf) + 
  tm_dots(alpha = 0.4, 
          col = "purple", 
          size = 0.05) +
  tm_basemap("OpenStreetMap")

As we can see from the above results for tourism_sf, there is a coordinate that is not within Singapore. This means that this point (Longitude and Latitude) could possibly be N/A. We can verify this by searching for any missing values.

sum(is.na(tourism_sf$LATITUDE))
[1] 1

From the results, we can tell that there is one N/A result in the column “LATITUDE” under the tourism_sf dataset. We shall remove that N/A value to concentrate our findings on Singapore.

tourism_sf <- tourism_sf[!is.na(tourism_sf$LATITUDE),]

After the N/A row has been removed, we can plot the graph again to see if there’s an improvement.

tm_shape(tourism_sf) + 
  tm_dots(alpha = 0.4, 
          col = "purple", 
          size = 0.05) +
  tm_basemap("OpenStreetMap")

After cleaning the tourism_sf dataset, we can finally put the 3 datasets together to see if there are any correlation between the datasets. The Airbnb dataset are highlighted in blue, the hotels dataset are highlighted in red and the tourism dataset highlighted in purple. Here’s all the datasets together:

tmap_mode("view")
tm_shape(airbnb_sf) + 
  tm_dots(alpha = 0.4, 
          col = "blue", 
          size = 0.05) +
tm_shape(hotels_sf) + 
  tm_dots(alpha = 0.4, 
          col = "red", 
          size = 0.05) +
tm_shape(tourism_sf) + 
  tm_dots(alpha = 0.4, 
          col = "purple", 
          size = 0.05) +
tm_shape(mrt) +
  tm_dots(alpha = 0.4, 
          col = "green", 
          size = 0.05) +
tm_basemap("OpenStreetMap")

From the above plotted map, we can tell that the Airbnb facilities have been spread widely over Singapore, covering places that even the hotels are not available in. On the other hand, majority of the hotels are located in the central district of Singapore with the exception of some hotels such as RM Hotel on the far west and Changi hotels in the east. However, the location of the hotels can be related to the tourism locations. As seen above, the locations of most of the tourist attractions are within the central district of Singapore as well. In order to capitalise and profit from tourists, hotels would locate themselves nearer to the tourist attractions as tourists would prefer to be nearer to these attractions. For the MRT lines, we can see that it correlates closely with the Airbnb locations, especially in the Northern region of Singapore. This could be because the Airbnb hosts believe that being near an MRT station would be more attractive to tourists for easy access to transport.

Now that we have plotted our graph, we can start the geospatial data wrangling process.

Geospatial Data Wrangling

One of the objectives in this task is to derive the kernel density maps of the Airbnb listings, hotels, MRT services and tourist attractions. In order to analyse any of the data that we have plotted so far, we would need to further clean the data with the following steps.

Step 1: Converting sf data frames to sp’s Spatial class

As the airbnb_sf, hotels_sf and tourism_sf are all in sf data frame, we would need to first convert them into Spatial class.

airbnb_spatial <- as_Spatial(airbnb_sf)
hotels_spatial <- as_Spatial(hotels_sf)
tourism_spatial <- as_Spatial(tourism_sf)
mrt_spatial <- as_Spatial(mrt)
class       : SpatialPointsDataFrame 
features    : 8293 
extent      : 7215.566, 44098.31, 25166.35, 49226.35  (xmin, xmax, ymin, ymax)
crs         : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 
variables   : 14
names       :       id,                                              name,   host_id, host_name, neighbourhood_group, neighbourhood,       room_type, price, minimum_nights, number_of_reviews, last_review, reviews_per_month, calculated_host_listings_count, availability_365 
min values  :    49091,                                                  ,     23666,          ,      Central Region,    Ang Mo Kio, Entire home/apt,     0,              1,                 0,            ,              0.01,                              1,                0 
max values  : 36053005, ZR2- NEW! Sunny & Modern Apt 4 mins to Orchard Rd, 271165196,    Zuzana,         West Region,        Yishun,     Shared room, 13999,           1000,               308,  2019-06-25,             12.09,                            277,              365 
class       : SpatialPointsDataFrame 
features    : 422 
extent      : 5939.241, 45334.18, 25379.44, 44562.4  (xmin, xmax, ymin, ymax)
crs         : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 
variables   : 7
names       :                              NAME, ADDRESSPOSTALCODE,                                 ADDRESSSTREETNAME,         HYPERLINK, TOTALROOMS,     KEEPERNAME, ICON_NAME 
min values  :                      30 BENCOOLEN,             18956,                                 1 Bayfront Avenue, 96ytlim@gmail.com,          4,  Adel Aramouni, hotel.gif 
max values  : YotelAir Singapore Changi Airport,            819666, 99 IRRAWADDY ROAD, # 22-00 ROYAL SQUARE AT NOVENA, zubair@dam.com.sg,       2561, Zhang YuanQing, hotel.gif 
class       : SpatialPointsDataFrame 
features    : 106 
extent      : 11380.23, 43659.54, 22869.34, 47596.73  (xmin, xmax, ymin, ymax)
crs         : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 
variables   : 15
names       :                                  NAME,                                                                                                                                                   DESCRIPTION,                  ADDRESSSTREETNAME,                            HYPERLINK,                                                                                                                             PHOTOURL,                                                                                          URL_PATH,                                                                                                          IMAGE_ALT_TEXT,                PHOTOCREDITS,                  LASTMODIFIED,  LATITUDE, LONGTITUDE,                                                                                             META_DESCRIPTION,                                                                                                                                                                                   OPENING_HOURS,        ICON_NAME, ADDRESSPOSTALCODE 
min values  : Adventure Cove Waterpark™ Singapore, A feat of engineering, an architectural statement and a sheer aesthetic triumph, Marina Bay Sands<sup>®</sup> has upped the ante for buildings in Singapore.,                       1 Beach Road,                   http://acm.org.sg/,                                  /content/dam/desktop/global/see-do-singapore/architecture/hajjah-fatimah-mosque-carousel01-rect.jpg, www.yoursingapore.com/en/see-do-singapore/architecture/historical/capitol-building-singapore.html,               Adults and kids of all ages who are not even science buffs will have fun at the Singapore Science Centre.,                   8Q at SAM, 2015-03-30T12:57:27.648+08:00, 1.2230965,  103.68398, A tranquil patch of imperial China in the west of Singapore is pleasant respite from the bustle of the city.,                                                                                                                                                       50th storey Skybridge, Daily, 9am –10pm, tourist_spot.gif,                 0 
max values  :            Victoria Theatre Singapore,                                                   With so many attractions packed into this 15-km stretch of beaches, you’ll never run out of things to do., Seng Poh Road and Tiong Bahru Road, https://www.pub.gov.sg/marinabarrage, www.yoursingapore.com/content/dam/desktop/global/see-do-singapore/recreation-leisure/universal-studios-singapore-carousel01-rect.jpg,      www.yoursingapore.com/en/see-do-singapore/recreation-leisure/viewpoints/singapore-flyer.html, Whether you prefer water sports, rollerblading or cycling, find a myriad of things to do at East Coast Park, Singapore., Wildlife Reserves Singapore, 2015-11-03T17:55:41.364+08:00,   1.44672,  103.97403,                                     With the Henderson Waves bridge, form meets function to stunning effect., Visits are by appointment only.Visitors must sign up in advance for heritage tours which fall on:Monday, 2pm – 3pm,Tuesday, 6.30pm – 7.30pm,Thursday, 10am – 11am,Saturday, 11am – 12pm, tourist_spot.gif,                 0 
class       : SpatialPointsDataFrame 
features    : 185 
extent      : 6138.311, 45254.86, 27555.06, 47854.2  (xmin, xmax, ymin, ymax)
crs         : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +datum=WGS84 +units=m +no_defs 
variables   : 3
names       : OBJECTID,              STN_NAME, STN_NO 
min values  :        0, ADMIRALTY MRT STATION,    BP1 
max values  :      191,    YISHUN MRT STATION,    TE3 

Step 2: Converting Spatial class into sp format

As spatstat requires the data in ppp format and there is no direct way to convert Spatial class into ppp, we need to first convert the data into Spatial object.

airbnb_sp <- as(airbnb_spatial, "SpatialPoints")
hotels_sp <- as(hotels_spatial, "SpatialPoints")
tourism_sp <- as(tourism_spatial, "SpatialPoints")
mrt_sp <- as(mrt_spatial, "SpatialPoints")
class       : SpatialPoints 
features    : 8293 
extent      : 7215.566, 44098.31, 25166.35, 49226.35  (xmin, xmax, ymin, ymax)
crs         : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 
class       : SpatialPoints 
features    : 422 
extent      : 5939.241, 45334.18, 25379.44, 44562.4  (xmin, xmax, ymin, ymax)
crs         : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 
class       : SpatialPoints 
features    : 106 
extent      : 11380.23, 43659.54, 22869.34, 47596.73  (xmin, xmax, ymin, ymax)
crs         : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 
class       : SpatialPoints 
features    : 185 
extent      : 6138.311, 45254.86, 27555.06, 47854.2  (xmin, xmax, ymin, ymax)
crs         : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +datum=WGS84 +units=m +no_defs 

Step 3: Converting sp format into spatstat’s ppp format

Now that the datasets are in sp format, we can finally convert the datasets into ppp object format.

airbnb_ppp <- as(airbnb_sp, "ppp")
hotels_ppp <- as(hotels_sp, "ppp")
tourism_ppp <- as(tourism_sp, "ppp")
mrt_ppp <- as(mrt_sp, "ppp")
Planar point pattern: 8293 points
window: rectangle = [7215.57, 44098.31] x [25166.35, 49226.35] units
Planar point pattern: 422 points
window: rectangle = [5939.24, 45334.18] x [25379.44, 44562.4] units
Planar point pattern: 106 points
window: rectangle = [11380.23, 43659.54] x [22869.34, 47596.73] units
Planar point pattern: 185 points
window: rectangle = [6138.31, 45254.86] x [27555.06, 47854.2] units

Handling duplicated points

Before we can proceed, we need to check to see if the data contains any duplicated points. We can do so by using the any(duplicated()) function.

any(duplicated(airbnb_ppp))
[1] TRUE
any(duplicated(hotels_ppp))
[1] TRUE
any(duplicated(tourism_ppp))
[1] TRUE
any(duplicated(mrt_ppp))
[1] FALSE

From the above results, we can tell that 3 datasets, excluding mrt_pp, contain duplicated points. Therefore, we need to properly handle them before moving on. We can use jittering, which is a solution that adds a small perturbation to the duplicated points, ensuring that the points do not occupy the same space. We can use this solution by using the following code:

airbnb_ppp_jit <- rjitter(airbnb_ppp, 
                             retry=TRUE, 
                             nsim=1, 
                             drop=TRUE)

hotels_ppp_jit <- rjitter(hotels_ppp, 
                             retry=TRUE, 
                             nsim=1, 
                             drop=TRUE)

tourism_ppp_jit <- rjitter(tourism_ppp, 
                             retry=TRUE, 
                             nsim=1, 
                             drop=TRUE)

After running the code above, let’s check to see if there are any duplicated points left.

any(duplicated(airbnb_ppp_jit))
[1] FALSE
any(duplicated(hotels_ppp_jit))
[1] FALSE
any(duplicated(tourism_ppp_jit))
[1] FALSE

Creating owin object

In order to properly analyse the data within Singapore, we need to create an object owin to represent the Singapore boundary as a polygonal region. We will use the MPSZ_SUBZONE_WEB_PL dataset that was used in Hands on exercise 4.

mpsz_sf <- st_read(dsn = "data", 
                layer = "MP14_SUBZONE_WEB_PL")
Reading layer `MP14_SUBZONE_WEB_PL' from data source 
  `D:\sarahcsp\IS415_blog\_posts\2021-09-14-take-home-exercise-2\data' 
  using driver `ESRI Shapefile'
Simple feature collection with 323 features and 15 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21
mpsz_spatial <- as_Spatial(mpsz_sf)

mpsz_sp <- as(mpsz_spatial, "SpatialPolygons")

mpsz_owin <- as(mpsz_sp, "owin")

Next, let’s plot mpsz_owin to see the subzones and the outline of Singapore.

Combining the owin object with the points object

Lastly, we extract the individual datasets’ points that are located within Singapore using this code:

airbnbSG_ppp <- airbnb_ppp[mpsz_owin]
hotelsSG_ppp <- hotels_ppp[mpsz_owin]
tourismSG_ppp <- tourism_ppp[mpsz_owin]
mrtSG_ppp <- mrt_ppp[mpsz_owin]

Airbnb

plot(airbnbSG_ppp)

Hotels

plot(hotelsSG_ppp)

Tourist Attractions

plot(tourismSG_ppp)

MRT Stations

plot(mrtSG_ppp)

First-Order Spatial Point Patterns Analysis

Now that we have found the ppp values of all the datasets, we can proceed to calculate the kernel density estimation (KDE) layer to visualise and explore the intensity of point processes.

Using the code chunk below, we will be able to produce and plot the kernel density of all the datasets.

Airbnb

kde_airbnb_bw <- density(airbnbSG_ppp,
                              sigma=bw.diggle,
                              edge=TRUE,
                            kernel="gaussian") 

plot(kde_airbnb_bw)

As we can see from the plot above, the output range is too small for us to make any analysis. We will need to convert the unit of measure from meter to kilometer.

Now, we will be able to re-run the code and plot the KDE map.

kde_airbnbSG.bw <- density(airbnbSG_ppp.km, sigma=bw.diggle, edge=TRUE, kernel="gaussian")
plot(kde_airbnbSG.bw, main = "Airbnb bw.ppl")

In order to plot the kernel density maps on the OpenStreetMap, we need to convert the kernel density map into a grid object which will then be converted into a raster object.

grid_kde_airbnbSG.bw <- as.SpatialGridDataFrame.im(kde_airbnb_bw)
raster_kde_airbnbSG.bw <- raster(grid_kde_airbnbSG.bw)

After converting into a raster object, we need to check if the CRS projection has been set.

crs(raster_kde_airbnbSG.bw)
CRS arguments: NA 

Note that the CRS argument is NA. This means that no projection has been assigned to the raster object. To assign the correct projection to the raster object, we use the code below:

crs(raster_kde_airbnbSG.bw) = "+init=EPSG:3414"
crs(raster_kde_airbnbSG.bw)
CRS arguments:
 +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333
+k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +units=m
+no_defs 

Now that the raster object has its projection, we can finally plot the kernel density map.

tmap_mode("view")
  tm_shape(raster_kde_airbnbSG.bw) +
  tm_raster(alpha=0.6,
            "v") +
  tm_layout(legend.position = c("right", "bottom"), frame = FALSE) +
  tm_basemap("OpenStreetMap")

Now, we will do the same KDE plotting for the rest of the datasets: Hotels, Tourist Attractions and MRT stations.

Hotels

kde_hotels_bw <- density(hotelsSG_ppp,
                              sigma=bw.diggle,
                              edge=TRUE,
                            kernel="gaussian") 

plot(kde_hotels_bw)

Similar to the Airbnb dataset, we shall convert the unit of measurement and plot the density map again.

hotelsSG_ppp.km <- rescale(hotelsSG_ppp, 1000, "km")

kde_hotelsSG.bw <- density(hotelsSG_ppp.km, sigma=bw.diggle, edge=TRUE, kernel="gaussian")
plot(kde_hotelsSG.bw, main = "Hotels bw.ppl")

Converting to grid before converting to raster.

grid_kde_hotelsSG.bw <- as.SpatialGridDataFrame.im(kde_hotels_bw)
raster_kde_hotelsSG.bw <- raster(grid_kde_hotelsSG.bw)

Checking CRS projection.

crs(raster_kde_hotelsSG.bw)
CRS arguments: NA 

Setting the CRS projection.

crs(raster_kde_hotelsSG.bw) = "+init=EPSG:3414"
crs(raster_kde_hotelsSG.bw)
CRS arguments:
 +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333
+k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +units=m
+no_defs 

Plotting the Hotels Kernel Density Map.

tmap_mode("view")
  tm_shape(raster_kde_hotelsSG.bw) +
  tm_raster(alpha=0.6,
            "v") +
  tm_layout(legend.position = c("right", "bottom"), frame = FALSE) +
  tm_basemap("OpenStreetMap")

Tourist Attractions

kde_tourism_bw <- density(tourismSG_ppp,
                              sigma=bw.diggle,
                              edge=TRUE,
                            kernel="gaussian") 

plot(kde_tourism_bw)

tourismSG_ppp.km <- rescale(tourismSG_ppp, 1000, "km")

kde_tourismSG.bw <- density(tourismSG_ppp.km, sigma=bw.diggle, edge=TRUE, kernel="gaussian")
plot(kde_tourismSG.bw, main = "Tourist Attractions bw.ppl")

Converting to grid before converting to raster.

grid_kde_tourismSG.bw <- as.SpatialGridDataFrame.im(kde_tourism_bw)
raster_kde_hotelsSG.bw <- raster(grid_kde_tourismSG.bw)

Checking CRS projection.

crs(raster_kde_hotelsSG.bw)
CRS arguments: NA 

Setting the CRS projection.

crs(raster_kde_hotelsSG.bw) = "+init=EPSG:3414"
crs(raster_kde_hotelsSG.bw)
CRS arguments:
 +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333
+k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +units=m
+no_defs 

Plotting the Hotels Kernel Density Map.

tmap_mode("view")
  tm_shape(raster_kde_hotelsSG.bw) +
  tm_raster(alpha=0.6,
            "v") +
  tm_layout(legend.position = c("right", "bottom"), frame = FALSE) +
  tm_basemap("OpenStreetMap")

MRT Stations

kde_mrt_bw <- density(mrtSG_ppp,
                              sigma=bw.diggle,
                              edge=TRUE,
                            kernel="gaussian") 

plot(kde_mrt_bw)

mrtSG_ppp.km <- rescale(mrtSG_ppp, 1000, "km")

kde_mrtSG.bw <- density(mrtSG_ppp.km, sigma=bw.diggle, edge=TRUE, kernel="gaussian")
plot(kde_mrtSG.bw, main = "MRT Stations bw.ppl")

Converting to grid before converting to raster.

grid_kde_mrtSG.bw <- as.SpatialGridDataFrame.im(kde_mrt_bw)
raster_kde_mrtSG.bw <- raster(grid_kde_mrtSG.bw)

Checking CRS projection.

crs(raster_kde_mrtSG.bw)
CRS arguments: NA 

Setting the CRS projection.

crs(raster_kde_mrtSG.bw) = "+init=EPSG:3414"
crs(raster_kde_mrtSG.bw)
CRS arguments:
 +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333
+k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +units=m
+no_defs 

Plotting the Hotels Kernel Density Map.

tmap_mode("view")
  tm_shape(raster_kde_mrtSG.bw) +
  tm_raster(alpha=0.6,
            "v") +
  tm_layout(legend.position = c("right", "bottom"), frame = FALSE) +
  tm_basemap("OpenStreetMap")

Now that the First Order Analysis has been completed, we can move on to the Second Order Spatial Point Patterns Analysis. However, we would need to analyze the spatial points using G-Function before we can do so. In order to do that, we must first conduct the Nearest Neighbour Analysis

Nearest Neighbour Analysis

For Nearest Neighbour Analysis, we will be using the Clark-Evans test of aggregation for a spatial point pattern analysis.

The test hypotheses will adjust according to the four different factors: Airbnb, Hotels, Tourist Attractions and MRT Station. Here are the hypotheses:

Ho = The distribution of each factor are randomly distributed.

H1 = The distribution of each factor are not randomly distributed.

For this hypothesis, we will use the 95% confident interval.

Airbnb

clarkevans.test(airbnbSG_ppp,
                correction="none",
                clipregion="sg_owin",
                alternative=c("clustered"),
                nsim=99)

    Clark-Evans test
    No edge correction
    Monte Carlo test based on 99 simulations of CSR with fixed n

data:  airbnbSG_ppp
R = 0.34998, p-value = 0.01
alternative hypothesis: clustered (R < 1)

Hotels

clarkevans.test(hotelsSG_ppp,
                correction="none",
                clipregion="sg_owin",
                alternative=c("clustered"),
                nsim=99)

    Clark-Evans test
    No edge correction
    Monte Carlo test based on 99 simulations of CSR with fixed n

data:  hotelsSG_ppp
R = 0.30302, p-value = 0.01
alternative hypothesis: clustered (R < 1)

Tourist Attractions

clarkevans.test(tourismSG_ppp,
                correction="none",
                clipregion="sg_owin",
                alternative=c("clustered"),
                nsim=99)

    Clark-Evans test
    No edge correction
    Monte Carlo test based on 99 simulations of CSR with fixed n

data:  tourismSG_ppp
R = 0.50233, p-value = 0.01
alternative hypothesis: clustered (R < 1)

MRT Stations

clarkevans.test(mrtSG_ppp,
                correction="none",
                clipregion="sg_owin",
                alternative=c("clustered"),
                nsim=99)

    Clark-Evans test
    No edge correction
    Monte Carlo test based on 99 simulations of CSR with fixed n

data:  mrtSG_ppp
R = 0.67075, p-value = 0.01
alternative hypothesis: clustered (R < 1)

As seen above, all the tests have resulted in alternative hypothesis: clustered (R < 1). This means that the point pattern of all the datasets have resulted in the alternative hypothesis, the clustered point pattern. Now that we have performed the Clark-Evans test, we can proceed with the Second-Order Spatial Point Patterns Analysis.

Second-Order Spatial Point Patterns Analysis

For the Second-Order Spatial Point Patterns Analysis, we will use the G-function to estimate the distribution of the different factors in a point pattern.

First, let’s take a look at the Airbnb dataset. We will use Gest() to compute the G-function of the dataset.

G_airbnb = Gest(airbnb_ppp, correction = "border")
plot(G_airbnb, xlim=c(0,500))

As seen in the graph above, the actual line, G(bord), is higher than the estimated line, G(pois). Now we will now perform the Complete Spatial Randomness Test. We will have the following hypothesis:

Ho = The distribution of Airbnb locations are randomly distributed.

H1 = The distribution of Airbnb locations are not randomly distributed.

The null hypothesis will be rejected if p-value is smaller than alpha value of 0.001.

G_airbnb.csr <- envelope(airbnb_ppp, Gest, nsim = 999)
Generating 999 simulations of CSR  ...
1, 2, 3, ......10.........20.........30.........40.........50.........60
.........70.........80.........90.........100.........110.........120
.........130.........140.........150.........160.........170.........180
.........190.........200.........210.........220.........230.........240
.........250.........260.........270.........280.........290.........300
.........310.........320.........330.........340.........350.........360
.........370.........380.........390.........400.........410.........420
.........430.........440.........450.........460.........470.........480
.........490.........500.........510.........520.........530.........540
.........550.........560.........570.........580.........590.........600
.........610.........620.........630.........640.........650.........660
.........670.........680.........690.........700.........710.........720
.........730.........740.........750.........760.........770.........780
.........790.........800.........810.........820.........830.........840
.........850.........860.........870.........880.........890.........900
.........910.........920.........930.........940.........950.........960
.........970.........980.........990........ 999.

Done.
plot(G_airbnb.csr)

Based on the results above, we can tell that there is a clustered pattern occurring in the Airbnb dataset. As we can see from the graph above, the solid line G(r) increases rapidly at a short distance which means that the data points are clustered. After conducting the Monte Carlo simulation test on the dataset, we can see that the solid line, G(r), lies above the upper envelope which means that the estimated G(r) is statistically significant.

We will now conduct the same tests on the other factors.

Hotels

G_hotels = Gest(hotelsSG_ppp, correction = "border")
plot(G_hotels, xlim=c(0,500))

We will have the following hypothesis:

Ho = The distribution of Airbnb locations are randomly distributed.

H1 = The distribution of Airbnb locations are not randomly distributed.

The null hypothesis will be rejected if p-value is smaller than alpha value of 0.001.

G_hotels.csr <- envelope(hotelsSG_ppp, Gest, nsim = 999)
Generating 999 simulations of CSR  ...
1, 2, 3, ......10 [etd 15:38] .........20 [etd 15:01] .........
30 [etd 14:45] .........40 [etd 14:35] .........50 [etd 14:48] ........
.60 [etd 14:28] .........70 [etd 14:26] .........80 [etd 14:09] .......
..90 [etd 14:03] .........100 [etd 13:54] .........110 [etd 13:41] ......
...120 [etd 13:30] .........130 [etd 13:20] .........140 [etd 13:10] .....
....150 [etd 12:59] .........160 [etd 12:49] .........170 [etd 12:41] ....
.....180 [etd 12:30] .........190 [etd 12:19] .........200 [etd 12:09] ...
......210 [etd 12:00] .........220 [etd 11:49] .........230 [etd 11:39] ..
.......240 [etd 11:29] .........250 [etd 11:21] .........260 [etd 11:10] .
........270 [etd 11:00] .........280 [etd 10:51] .........290
 [etd 10:42] .........300 [etd 10:33] .........310 [etd 10:23] .........
320 [etd 10:15] .........330 [etd 10:08] .........340 [etd 10:01] ........
.350 [etd 9:51] .........360 [etd 9:42] .........370 [etd 9:33] .......
..380 [etd 9:31] .........390 [etd 9:28] .........400 [etd 9:18] ......
...410 [etd 9:12] .........420 [etd 9:06] .........430 [etd 8:57] .....
....440 [etd 8:47] .........450 [etd 8:38] .........460 [etd 8:28] ....
.....470 [etd 8:18] .........480 [etd 8:08] .........490 [etd 7:58] ...
......500 [etd 7:48] .........510 [etd 7:39] .........520 [etd 7:29] ..
.......530 [etd 7:19] .........540 [etd 7:10] .........550 [etd 7:00] .
........560 [etd 6:50] .........570 [etd 6:40] .........580
 [etd 6:31] .........590 [etd 6:21] .........600 [etd 6:12] .........
610 [etd 6:02] .........620 [etd 5:53] .........630 [etd 5:43] ........
.640 [etd 5:33] .........650 [etd 5:24] .........660 [etd 5:14] .......
..670 [etd 5:05] .........680 [etd 4:56] .........690 [etd 4:46] ......
...700 [etd 4:39] .........710 [etd 4:32] .........720 [etd 4:22] .....
....730 [etd 4:12] .........740 [etd 4:03] .........750 [etd 3:54] ....
.....760 [etd 3:44] .........770 [etd 3:35] .........780 [etd 3:25] ...
......790 [etd 3:16] .........800 [etd 3:06] .........810 [etd 2:57] ..
.......820 [etd 2:47] .........830 [etd 2:38] .........840 [etd 2:29] .
........850 [etd 2:19] .........860 [etd 2:10] .........870
 [etd 2:00] .........880 [etd 1:51] .........890 [etd 1:42] .........
900 [etd 1:32] .........910 [etd 1:23] .........920 [etd 1:14] ........
.930 [etd 1:04] .........940 [etd 55 sec] .........950 [etd 46 sec] .......
..960 [etd 36 sec] .........970 [etd 27 sec] .........980 [etd 18 sec] ......
...990 [etd 8 sec] ........ 999.

Done.
plot(G_hotels.csr)

Based on the results above, the first portion of the estimated G(r) lies above the envelope. This means that the hotels within a certain vicinity are clustered. However, as the hotels grow further, the estimted G(r) starts to fall within the envelope, showing that the hotels seem to appear more homogeneous or random.

Tourist Attractions

G_tourism = Gest(tourismSG_ppp, correction = "border")
plot(G_tourism, xlim=c(0,500))

We will have the following hypothesis:

Ho = The distribution of Airbnb locations are randomly distributed.

H1 = The distribution of Airbnb locations are not randomly distributed.

The null hypothesis will be rejected if p-value is smaller than alpha value of 0.001.

G_tourism.csr <- envelope(tourismSG_ppp, Gest, nsim = 999)
Generating 999 simulations of CSR  ...
1, 2, 3, ......10 [etd 9:30] .........20 [etd 9:16] .........
30 [etd 9:26] .........40 [etd 9:46] .........50 [etd 9:29] ........
.60 [etd 9:19] .........70 [etd 9:07] .........80 [etd 9:04] .......
..90 [etd 8:55] .........100 [etd 8:52] .........110 [etd 8:55] ......
...120 [etd 8:48] .........130 [etd 8:39] .........140 [etd 8:31] .....
....150 [etd 8:23] .........160 [etd 8:19] .........170 [etd 8:11] ....
.....180 [etd 8:04] .........190 [etd 7:56] .........200 [etd 7:51] ...
......210 [etd 7:43] .........220 [etd 7:36] .........230 [etd 7:29] ..
.......240 [etd 7:22] .........250 [etd 7:17] .........260 [etd 7:11] .
........270 [etd 7:05] .........280 [etd 6:58] .........290
 [etd 6:53] .........300 [etd 6:47] .........310 [etd 6:40] .........
320 [etd 6:34] .........330 [etd 6:28] .........340 [etd 6:22] ........
.350 [etd 6:16] .........360 [etd 6:10] .........370 [etd 6:04] .......
..380 [etd 5:58] .........390 [etd 5:52] .........400 [etd 5:45] ......
...410 [etd 5:40] .........420 [etd 5:34] .........430 [etd 5:28] .....
....440 [etd 5:22] .........450 [etd 5:16] .........460 [etd 5:10] ....
.....470 [etd 5:04] .........480 [etd 4:58] .........490 [etd 4:52] ...
......500 [etd 4:47] .........510 [etd 4:41] .........520 [etd 4:35] ..
.......530 [etd 4:29] .........540 [etd 4:23] .........550 [etd 4:18] .
........560 [etd 4:12] .........570 [etd 4:06] .........580
 [etd 4:00] .........590 [etd 3:55] .........600 [etd 3:49] .........
610 [etd 3:43] .........620 [etd 3:37] .........630 [etd 3:32] ........
.640 [etd 3:26] .........650 [etd 3:20] .........660 [etd 3:14] .......
..670 [etd 3:08] .........680 [etd 3:03] .........690 [etd 2:57] ......
...700 [etd 2:51] .........710 [etd 2:45] .........720 [etd 2:40] .....
....730 [etd 2:34] .........740 [etd 2:28] .........750 [etd 2:22] ....
.....760 [etd 2:17] .........770 [etd 2:11] .........780 [etd 2:06] ...
......790 [etd 2:00] .........800 [etd 1:54] .........810 [etd 1:48] ..
.......820 [etd 1:43] .........830 [etd 1:37] .........840 [etd 1:31] .
........850 [etd 1:25] .........860 [etd 1:20] .........870
 [etd 1:15] .........880 [etd 1:10] .........890 [etd 1:04] .........
900 [etd 58 sec] .........910 [etd 52 sec] .........920 [etd 46 sec] ........
.930 [etd 40 sec] .........940 [etd 35 sec] .........950 [etd 29 sec] .......
..960 [etd 23 sec] .........970 [etd 17 sec] .........980 [etd 11 sec] ......
...990 [etd 5 sec] ........ 999.

Done.
plot(G_tourism.csr)

Based on the results above, the first portion of the estimated G(r) lies above the envelope. This means that the tourist attractions within a certain vicinity are clustered. However, as the tourist attractions grow further, the estimted G(r) starts to fall within the envelope, showing that the tourist attracktions seem to appear more homogeneous or random.

MRT Stations

G_mrt = Gest(mrtSG_ppp, correction = "border")
plot(G_mrt, xlim=c(0,500))

We will have the following hypothesis:

Ho = The distribution of Airbnb locations are randomly distributed.

H1 = The distribution of Airbnb locations are not randomly distributed.

The null hypothesis will be rejected if p-value is smaller than alpha value of 0.001.

G_mrt.csr <- envelope(mrtSG_ppp, Gest, nsim = 999)
Generating 999 simulations of CSR  ...
1, 2, 3, ......10 [etd 11:14] .........20 [etd 10:40] .........
30 [etd 10:29] .........40 [etd 10:23] .........50 [etd 10:27] ........
.60 [etd 10:15] .........70 [etd 10:04] .........80 [etd 9:56] .......
..90 [etd 9:50] .........100 [etd 9:44] .........110 [etd 9:38] ......
...120 [etd 9:45] .........130 [etd 9:39] .........140 [etd 9:33] .....
....150 [etd 9:25] .........160 [etd 9:24] .........170 [etd 9:17] ....
.....180 [etd 9:09] .........190 [etd 9:01] .........200 [etd 8:53] ...
......210 [etd 8:45] .........220 [etd 8:38] .........230 [etd 8:30] ..
.......240 [etd 8:22] .........250 [etd 8:14] .........260 [etd 8:08] .
........270 [etd 8:01] .........280 [etd 7:54] .........290
 [etd 7:47] .........300 [etd 7:40] .........310 [etd 7:33] .........
320 [etd 7:26] .........330 [etd 7:19] .........340 [etd 7:12] ........
.350 [etd 7:06] .........360 [etd 6:59] .........370 [etd 6:52] .......
..380 [etd 6:45] .........390 [etd 6:39] .........400 [etd 6:32] ......
...410 [etd 6:25] .........420 [etd 6:19] .........430 [etd 6:12] .....
....440 [etd 6:05] .........450 [etd 5:59] .........460 [etd 5:52] ....
.....470 [etd 5:46] .........480 [etd 5:39] .........490 [etd 5:32] ...
......500 [etd 5:26] .........510 [etd 5:20] .........520 [etd 5:13] ..
.......530 [etd 5:07] .........540 [etd 5:00] .........550 [etd 4:53] .
........560 [etd 4:47] .........570 [etd 4:40] .........580
 [etd 4:34] .........590 [etd 4:27] .........600 [etd 4:21] .........
610 [etd 4:14] .........620 [etd 4:07] .........630 [etd 4:01] ........
.640 [etd 3:54] .........650 [etd 3:48] .........660 [etd 3:41] .......
..670 [etd 3:35] .........680 [etd 3:29] .........690 [etd 3:22] ......
...700 [etd 3:16] .........710 [etd 3:09] .........720 [etd 3:02] .....
....730 [etd 2:56] .........740 [etd 2:49] .........750 [etd 2:43] ....
.....760 [etd 2:36] .........770 [etd 2:30] .........780 [etd 2:23] ...
......790 [etd 2:17] .........800 [etd 2:10] .........810 [etd 2:04] ..
.......820 [etd 1:57] .........830 [etd 1:51] .........840 [etd 1:44] .
........850 [etd 1:38] .........860 [etd 1:31] .........870
 [etd 1:25] .........880 [etd 1:18] .........890 [etd 1:12] .........
900 [etd 1:05] .........910 [etd 59 sec] .........920 [etd 52 sec] ........
.930 [etd 45 sec] .........940 [etd 39 sec] .........950 [etd 32 sec] .......
..960 [etd 26 sec] .........970 [etd 19 sec] .........980 [etd 12 sec] ......
...990 [etd 6 sec] ........ 999.

Done.
plot(G_mrt.csr)

Based on the results above, the first portion of the estimated G(r) lies above the envelope. This means that the MRT Stations within a certain vicinity are clustered. However, the estimated G(r) line falls uder the envelope and stays stagnant. This means that there is a regular pattern to the MRT Stations that are in Singapore.

Section B: Impact of Covid-19

In this section, we will be analysing the impact of Covid-19 on Airbnb businesses in Singapore. The date range that we will be looking at is from June 2019 to June 2021.

Before we start analysing, we would need to adjust the dataset accordingly to fit the time period of Covid-19. Ensure that you download the package, lubridate, before running the code below.

airbnb$date201921 <- format(as.Date(airbnb$last_review), "%Y-%m")

After ensuring that the date format has been changed, we will need to check if there are any N/A values and remove them.

sum(is.na(airbnb$date201921))
[1] 3137
airbnb <- airbnb[!is.na(airbnb$date201921),]
sum(is.na(airbnb$date201921))
[1] 0

After the N/A values have been removed, we can now proceed to filter according to the Covid-19 period, from June 2019 to June 2021.

airbnb <- airbnb[airbnb[["date201921"]] >= "2019-06",]
airbnb <- airbnb[airbnb[["date201921"]] <= "2021-06",]

We then find the number of unique values in the room_type column.

length(unique(airbnb[["room_type"]]))
[1] 3

Since there are 3 unique values, we find out what different room types there are in the dataset.

unique(airbnb[c("room_type")])
          room_type
6      Private room
37  Entire home/apt
113     Shared room

After finding out the unique values, we save them in separate datasets.

airbnb_private <- airbnb[airbnb[["room_type"]] == "Private room",]
airbnb_entirehome <- airbnb[airbnb[["room_type"]] == "Entire home/apt",]
airbnb_shared <- airbnb[airbnb[["room_type"]] == "Shared room",]

We will then convert them to sf format and ensure that their coordinate projection is 3414.

airbnb_private_sf <- st_as_sf(airbnb_private, 
                       coords = c("longitude", "latitude"),
                       crs=4326) %>%
  st_transform(crs = 3414)

airbnb_entirehome_sf <- st_as_sf(airbnb_entirehome, 
                       coords = c("longitude", "latitude"),
                       crs=4326) %>%
  st_transform(crs = 3414)

airbnb_shared_sf <- st_as_sf(airbnb_shared, 
                       coords = c("longitude", "latitude"),
                       crs=4326) %>%
  st_transform(crs = 3414)

Geospatial Data Wrangling for Covid-19

Now that we have prepared the datasets, we can continue with geospatial data wrangling. As we did previously, we will be using the following steps.

Step 1: Converting sf dataframes to sp’s Spatial class

airbnb_private <- as_Spatial(airbnb_private_sf)
airbnb_entirehome <- as_Spatial(airbnb_entirehome_sf)
airbnb_shared <- as_Spatial(airbnb_shared_sf)

Private Rooms

airbnb_private
class       : SpatialPointsDataFrame 
features    : 577 
extent      : 11965.58, 43591.65, 26182.53, 47805.48  (xmin, xmax, ymin, ymax)
crs         : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 
variables   : 15
names       :       id,                                               name,   host_id, host_name, neighbourhood_group, neighbourhood,    room_type, price, minimum_nights, number_of_reviews, last_review, reviews_per_month, calculated_host_listings_count, availability_365, date201921 
min values  :    71903, !! CozyRoom@City Center,Little India,FarrerParkMRT,     59498,   Aanchal,      Central Region,    Ang Mo Kio, Private room,    22,              1,                 1,  2019-06-01,              0.05,                              1,                0,    2019-06 
max values  : 35985397,           Yunnan Garden Corner Terrace ( Near NTU), 269092833,       Zoe,         West Region,        Yishun, Private room,  1300,            180,               294,  2019-06-25,             12.09,                            172,              365,    2019-06 

Entire Homes/Apt

airbnb_entirehome
class       : SpatialPointsDataFrame 
features    : 779 
extent      : 15782.79, 41366.15, 25320.05, 44330.08  (xmin, xmax, ymin, ymax)
crs         : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 
variables   : 15
names       :       id,                                             name,   host_id,     host_name, neighbourhood_group, neighbourhood,       room_type, price, minimum_nights, number_of_reviews, last_review, reviews_per_month, calculated_host_listings_count, availability_365, date201921 
min values  :   746929, (Central-Novena) Semi-Detached House Near Subway,    165209,              ,      Central Region,    Ang Mo Kio, Entire home/apt,     0,              1,                 1,  2019-06-01,              0.09,                              1,                0,    2019-06 
max values  : 35947264,       WOW 4BRs, Near MRT, Cozy Apartment@Geylang, 268956734, Zoe & Friends,         West Region,        Yishun, Entire home/apt,   800,             90,               308,  2019-06-25,              8.22,                            277,              365,    2019-06 

Shared Room

airbnb_shared
class       : SpatialPointsDataFrame 
features    : 55 
extent      : 18742.15, 32282.62, 29275.31, 40202.48  (xmin, xmax, ymin, ymax)
crs         : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 
variables   : 15
names       :       id,                             name,   host_id, host_name, neighbourhood_group,   neighbourhood,   room_type, price, minimum_nights, number_of_reviews, last_review, reviews_per_month, calculated_host_listings_count, availability_365, date201921 
min values  :  1678744,      12 Pax Mixed Dorm @ Kampong,    646629,     Adler,      Central Region,     Bukit Merah, Shared room,    18,              1,                 1,  2019-06-01,              0.11,                              1,                6,    2019-06 
max values  : 35773131, Unique Mixed-Gender Capsule Pods, 264230720,      Wink,         West Region, Singapore River, Shared room,   200,              5,               131,  2019-06-25,              5.08,                            107,              364,    2019-06 

Step 2: Converting Spatial class into generic sp format

airbnb_private_sp <- as(airbnb_private, "SpatialPoints")
airbnb_entirehome_sp <- as(airbnb_entirehome, "SpatialPoints")
airbnb_shared_sp <- as(airbnb_shared, "SpatialPoints")

Private Rooms

airbnb_private_sp
class       : SpatialPoints 
features    : 577 
extent      : 11965.58, 43591.65, 26182.53, 47805.48  (xmin, xmax, ymin, ymax)
crs         : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 

Entire Home/Apt

airbnb_entirehome_sp
class       : SpatialPoints 
features    : 779 
extent      : 15782.79, 41366.15, 25320.05, 44330.08  (xmin, xmax, ymin, ymax)
crs         : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 

Shared Rooms

airbnb_shared_sp
class       : SpatialPoints 
features    : 55 
extent      : 18742.15, 32282.62, 29275.31, 40202.48  (xmin, xmax, ymin, ymax)
crs         : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 

Step 3: Converting generic sp format into spatstat’s ppp format

airbnb_private_ppp <- as(airbnb_private_sp, "ppp")
airbnb_entirehome_ppp <- as(airbnb_entirehome_sp, "ppp")
airbnb_shared_ppp <- as(airbnb_shared_sp, "ppp")

Private Rooms

airbnb_private_ppp
Planar point pattern: 577 points
window: rectangle = [11965.58, 43591.65] x [26182.53, 47805.48] units

Entire Homes/Apt

airbnb_entirehome_ppp
Planar point pattern: 779 points
window: rectangle = [15782.79, 41366.15] x [25320.05, 44330.08] units

Shared Rooms

airbnb_shared_ppp
Planar point pattern: 55 points
window: rectangle = [18742.15, 32282.62] x [29275.31, 40202.48] units

Now that we have successfully converted them into the ppp format, we need to handle any possible duplicated points.

Handling Duplicated Points

any(duplicated(airbnb_private_ppp))
[1] FALSE
any(duplicated(airbnb_entirehome_ppp))
[1] FALSE
any(duplicated(airbnb_shared_ppp))
[1] FALSE

Since there are no duplicated points, we can plot the datasets onto a graph using the following code:

par(mfrow=c(2,2))
tmap_mode('view')
tm_shape(airbnb_private) +
  tm_dots(alpha=0.4, 
          size=0.05)
tmap_mode('view')
tm_shape(airbnb_entirehome) +
  tm_dots(alpha=0.4, 
          size=0.05)
tmap_mode('view')
tm_shape(airbnb_shared) +
  tm_dots(alpha=0.4, 
          size=0.05)

Creating owin object

We will use the previous code that was created for the owin object.

mpsz_sf <- st_read(dsn = "data", 
                layer = "MP14_SUBZONE_WEB_PL")
Reading layer `MP14_SUBZONE_WEB_PL' from data source 
  `D:\sarahcsp\IS415_blog\_posts\2021-09-14-take-home-exercise-2\data' 
  using driver `ESRI Shapefile'
Simple feature collection with 323 features and 15 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21
mpsz_spatial <- as_Spatial(mpsz_sf)

mpsz_sp <- as(mpsz_spatial, "SpatialPolygons")

mpsz_owin <- as(mpsz_sp, "owin")

plot(mpsz_owin)

Combining point events object and owin object

We will use the following code to extract Private Rooms, Entire Homes/Apt and Shared Rooms events that are located within Singapore.

airbnb_privateSG_ppp = airbnb_private_ppp[mpsz_owin]
airbnb_entirehomeSG_ppp = airbnb_entirehome_ppp[mpsz_owin]
airbnb_sharedSG_ppp = airbnb_shared_ppp[mpsz_owin]

Now, we plot all the ppp objects.

par(mfrow=c(2,2))
plot(airbnb_privateSG_ppp)
plot(airbnb_entirehomeSG_ppp)
plot(airbnb_sharedSG_ppp)

First-order Spatial POint Patterns Analysis on Covid-19

As what we did previously, we will be using the following code to create the kernel density map.

Private Rooms

kde_privateSG_bw <- density(airbnb_privateSG_ppp,
                              sigma=bw.diggle,
                              edge=TRUE,
                            kernel="gaussian") 
plot(kde_privateSG_bw)

As we can see from the plot above, the output range is too small for us to make any analysis. We will need to convert the unit of measure from meter to kilometer.

Now, we will be able to re-run the code and plot the KDE map.

kde_privateSG.bw <- density(privateSG_ppp.km, sigma=bw.diggle, edge=TRUE, kernel="gaussian")
plot(kde_privateSG.bw, main = "Private Rooms bw.ppl")

In order to plot the kernel density maps on the OpenStreetMap, we need to convert the kernel density map into a grid object which will then be converted into a raster object.

grid_kde_privateSG.bw <- as.SpatialGridDataFrame.im(kde_privateSG_bw)
raster_kde_privateSG.bw <- raster(grid_kde_privateSG.bw)

Let’s check if the coordinate projection has been input.

crs(raster_kde_privateSG.bw)
CRS arguments: NA 
crs(raster_kde_privateSG.bw) = "+init=EPSG:3414"
crs(raster_kde_privateSG.bw)
CRS arguments:
 +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333
+k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +units=m
+no_defs 

Now that the raster object has its projection, we can finally plot the kernel density map.

tmap_mode("view")
  tm_shape(raster_kde_privateSG.bw) +
  tm_raster(alpha=0.6,
            "v") +
  tm_layout(legend.position = c("right", "bottom"), frame = FALSE) +
  tm_basemap("OpenStreetMap")

Entire Homes/Apt

kde_entireSG_bw <- density(airbnb_entirehomeSG_ppp,
                              sigma=bw.diggle,
                              edge=TRUE,
                            kernel="gaussian") 
plot(kde_entireSG_bw)

As we can see from the plot above, the output range is too small for us to make any analysis. We will need to convert the unit of measure from meter to kilometer.

Now, we will be able to re-run the code and plot the KDE map.

kde_entireSG.bw <- density(entireSG_ppp.km, sigma=bw.diggle, edge=TRUE, kernel="gaussian")
plot(kde_entireSG.bw, main = "Entire Homes/Apt bw.ppl")

In order to plot the kernel density maps on the OpenStreetMap, we need to convert the kernel density map into a grid object which will then be converted into a raster object.

grid_kde_entireSG.bw <- as.SpatialGridDataFrame.im(kde_entireSG_bw)
raster_kde_entireSG.bw <- raster(grid_kde_entireSG.bw)

Let’s check if the coordinate projection has been input.

crs(raster_kde_entireSG.bw)
CRS arguments: NA 
crs(raster_kde_entireSG.bw) = "+init=EPSG:3414"
crs(raster_kde_entireSG.bw)
CRS arguments:
 +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333
+k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +units=m
+no_defs 

Now that the raster object has its projection, we can finally plot the kernel density map.

tmap_mode("view")
  tm_shape(raster_kde_entireSG.bw) +
  tm_raster(alpha=0.6,
            "v") +
  tm_layout(legend.position = c("right", "bottom"), frame = FALSE) +
  tm_basemap("OpenStreetMap")

Shared Rooms

kde_sharedSG_bw <- density(airbnb_sharedSG_ppp,
                              sigma=bw.diggle,
                              edge=TRUE,
                            kernel="gaussian") 
plot(kde_sharedSG_bw)

As we can see from the plot above, the output range is too small for us to make any analysis. We will need to convert the unit of measure from meter to kilometer.

Now, we will be able to re-run the code and plot the KDE map.

kde_sharedSG.bw <- density(sharedSG_ppp.km, sigma=bw.diggle, edge=TRUE, kernel="gaussian")
plot(kde_sharedSG.bw, main = "Shared Rooms bw.ppl")

In order to plot the kernel density maps on the OpenStreetMap, we need to convert the kernel density map into a grid object which will then be converted into a raster object.

grid_kde_sharedSG.bw <- as.SpatialGridDataFrame.im(kde_sharedSG_bw)
raster_kde_sharedSG.bw <- raster(grid_kde_sharedSG.bw)

Let’s check if the coordinate projection has been input.

crs(raster_kde_sharedSG.bw)
CRS arguments: NA 
crs(raster_kde_sharedSG.bw) = "+init=EPSG:3414"
crs(raster_kde_sharedSG.bw)
CRS arguments:
 +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333
+k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +units=m
+no_defs 

Now that the raster object has its projection, we can finally plot the kernel density map.

tmap_mode("view")
  tm_shape(raster_kde_sharedSG.bw) +
  tm_raster(alpha=0.6,
            "v") +
  tm_layout(legend.position = c("right", "bottom"), frame = FALSE) +
  tm_basemap("OpenStreetMap")

Nearest Neighbour Analysis

As usual, after we have completed and plotted the kernel density maps, we will proceed with the Nearest Neighbour Analysis using the Clark and Evans test. Here are our test hypotheses:

Ho = The distribution of childcare services are randomly distributed.

H1= The distribution of childcare services are not randomly distributed.

The 95% confident interval will be used.

Private Rooms

clarkevans.test(airbnb_privateSG_ppp,
                correction="none",
                clipregion="sg_owin",
                alternative=c("clustered"),
                nsim=99)

    Clark-Evans test
    No edge correction
    Monte Carlo test based on 99 simulations of CSR with fixed n

data:  airbnb_privateSG_ppp
R = 0.37025, p-value = 0.01
alternative hypothesis: clustered (R < 1)

Entire Homes/Apt

clarkevans.test(airbnb_entirehomeSG_ppp,
                correction="none",
                clipregion="sg_owin",
                alternative=c("clustered"),
                nsim=99)

    Clark-Evans test
    No edge correction
    Monte Carlo test based on 99 simulations of CSR with fixed n

data:  airbnb_entirehomeSG_ppp
R = 0.22435, p-value = 0.01
alternative hypothesis: clustered (R < 1)

Shared Rooms

clarkevans.test(airbnb_sharedSG_ppp,
                correction="none",
                clipregion="sg_owin",
                alternative=c("clustered"),
                nsim=99)

    Clark-Evans test
    No edge correction
    Monte Carlo test based on 99 simulations of CSR with fixed n

data:  airbnb_sharedSG_ppp
R = 0.16012, p-value = 0.01
alternative hypothesis: clustered (R < 1)

Second-Order Spatial Point Patterns Analysis

Similar to what we did in the previous Second-Order Spatial Point Patterns Analysis, we will be using the G Function to analyse the data.

Private Rooms

We will use the following code chunk to compute the G-function using Gest().

G_private = Gest(airbnb_privateSG_ppp, correction = "border")
plot(G_private, xlim=c(0,500))

Our hypothesis:

Ho = The distribution of Private Rooms in Singapore are randomly distributed.

H1= The distribution of Private Rooms in Singapore are not randomly distributed.

The null hypothesis will be rejected if p-value is smaller than alpha value of 0.001.

Now that our hypothesis has been created, we can conduct the Monte Carlo test with the G-function.

G_private.csr <- envelope(airbnb_privateSG_ppp, Gest, nsim = 999)
Generating 999 simulations of CSR  ...
1, 2, 3, ......10 [etd 21:03] .........20 [etd 21:52] .........
30 [etd 23:24] .........40 [etd 23:14] .........50 [etd 23:08] ........
.60 [etd 22:19] .........70 [etd 22:30] .........80 [etd 22:29] .......
..90 [etd 22:48] .........100 [etd 22:48] .........110 [etd 22:52] ......
...120 [etd 23:16] .........130 [etd 23:10] .........140 [etd 22:57] .....
....150 [etd 22:24] .........160 [etd 22:02] .........170 [etd 21:52] ....
.....180 [etd 21:20] .........190 [etd 20:49] .........200 [etd 20:24] ...
......210 [etd 19:58] .........220 [etd 19:45] .........230 [etd 19:27] ..
.......240 [etd 19:16] .........250 [etd 19:05] .........260 [etd 18:50] .
........270 [etd 18:34] .........280 [etd 18:17] .........290
 [etd 17:59] .........300 [etd 17:42] .........310 [etd 17:27] .........
320 [etd 17:15] .........330 [etd 17:16] .........340 [etd 17:33] ........
.350 [etd 17:42] .........360 [etd 17:37] .........370 [etd 17:16] .......
..380 [etd 16:50] .........390 [etd 16:26] .........400 [etd 16:02] ......
...410 [etd 15:40] .........420 [etd 15:19] .........430 [etd 15:03] .....
....440 [etd 14:41] .........450 [etd 14:21] .........460 [etd 14:00] ....
.....470 [etd 13:40] .........480 [etd 13:20] .........490 [etd 13:01] ...
......500 [etd 12:43] .........510 [etd 12:27] .........520 [etd 12:10] ..
.......530 [etd 11:54] .........540 [etd 11:36] .........550 [etd 11:18] .
........560 [etd 11:00] .........570 [etd 10:45] .........580
 [etd 10:28] .........590 [etd 10:11] .........600 [etd 9:54] .........
610 [etd 9:44] .........620 [etd 9:28] .........630 [etd 9:11] ........
.640 [etd 8:54] .........650 [etd 8:37] .........660 [etd 8:20] .......
..670 [etd 8:03] .........680 [etd 7:47] .........690 [etd 7:31] ......
...700 [etd 7:15] .........710 [etd 7:00] .........720 [etd 6:44] .....
....730 [etd 6:28] .........740 [etd 6:13] .........750 [etd 5:58] ....
.....760 [etd 5:42] .........770 [etd 5:28] .........780 [etd 5:13] ...
......790 [etd 4:58] .........800 [etd 4:43] .........810 [etd 4:29] ..
.......820 [etd 4:14] .........830 [etd 4:00] .........840 [etd 3:46] .
........850 [etd 3:33] .........860 [etd 3:19] .........870
 [etd 3:05] .........880 [etd 2:50] .........890 [etd 2:35] .........
900 [etd 2:20] .........910 [etd 2:06] .........920 [etd 1:52] ........
.930 [etd 1:37] .........940 [etd 1:23] .........950 [etd 1:09] .......
..960 [etd 55 sec] .........970 [etd 41 sec] .........980 [etd 27 sec] ......
...990 [etd 13 sec] ........ 999.

Done.
plot(G_private.csr)

Entire Homes/Apt

We will use the following code chunk to compute the G-function using Gest().

G_entirehome = Gest(airbnb_entirehomeSG_ppp, correction = "border")
plot(G_entirehome, xlim=c(0,500))

Our hypothesis:

Ho = The distribution of Entire Homes/Apt in Singapore are randomly distributed.

H1= The distribution of Entire Homes/Apt in Singapore are not randomly distributed.

The null hypothesis will be rejected if p-value is smaller than alpha value of 0.001.

Now that our hypothesis has been created, we can conduct the Monte Carlo test with the G-function.

G_entirehome.csr <- envelope(airbnb_entirehomeSG_ppp, Gest, nsim = 999)
Generating 999 simulations of CSR  ...
1, 2, 3, ......10 [etd 27:44] .........20 [etd 27:22] .........
30 [etd 26:47] .........40 [etd 25:10] .........50 [etd 23:45] ........
.60 [etd 22:53] .........70 [etd 22:17] .........80 [etd 21:45] .......
..90 [etd 9:29:43] .........100 [etd 8:28:50] .........110 [etd 2 days] ......
...120 [etd 2.8 days] .........130 [etd 2.6 days] .........140 [etd 2.4 days] .....
....150 [etd 2.2 days] .........160 [etd 2 days] .........170 [etd 1.9 days] ....
.....180 [etd 1.8 days] .........190 [etd 1.6 days] .........200 [etd 1.5 days] ...
......210 [etd 1.4 days] .........220 [etd 1.4 days] .........230 [etd 1.3 days] ..
.......240 [etd 1.2 days] .........250 [etd 1.2 days] .........260 [etd 1.1 days] .
........270 [etd 1 day] .........280 [etd 23:46:17] .........290
 [etd 22:38:09] .........300 [etd 21:34:34] .........310 [etd 20:35:07] .........
320 [etd 19:39:22] .........330 [etd 18:47:00] .........340 [etd 17:57:43] ........
.350 [etd 17:11:14] .........360 [etd 16:27:20] .........370 [etd 15:45:49] .......
..380 [etd 15:06:29] .........390 [etd 14:29:10] .........400 [etd 13:53:42] ......
...410 [etd 13:19:58] .........420 [etd 12:47:52] .........430 [etd 12:17:15] .....
....440 [etd 11:48:01] .........450 [etd 11:20:08] .........460 [etd 10:53:28] ....
.....470 [etd 10:27:54] .........480 [etd 10:03:21] .........490 [etd 9:39:48] ...
......500 [etd 9:17:16] .........510 [etd 8:55:35] .........520 [etd 8:34:42] ..
.......530 [etd 8:14:34] .........540 [etd 7:55:11] .........550 [etd 7:36:30] .
........560 [etd 7:18:33] .........570 [etd 7:01:10] .........580
 [etd 6:44:22] .........590 [etd 6:28:08] .........600 [etd 6:12:26] .........
610 [etd 5:57:15] .........620 [etd 5:42:34] .........630 [etd 5:28:23] ........
.640 [etd 5:14:37] .........650 [etd 5:01:16] .........660 [etd 4:48:19] .......
..670 [etd 4:35:44] .........680 [etd 4:23:30] .........690 [etd 4:11:37] ......
...700 [etd 4:00:05] .........710 [etd 3:48:51] .........720 [etd 3:37:56] .....
....730 [etd 3:27:19] .........740 [etd 3:16:58] .........750 [etd 3:06:54] ....
.....760 [etd 3:14:57] .........770 [etd 3:04:33] .........780 [etd 2:54:23] ...
......790 [etd 2:44:24] .........800 [etd 2:34:38] .........810 [etd 2:25:06] ..
.......820 [etd 2:15:49] .........830 [etd 2:06:44] .........840 [etd 1:57:51] .
........850 [etd 1:49:14] .........860 [etd 1:40:46] .........870
 [etd 1:32:28] .........880 [etd 1:24:22] .........890 [etd 1:16:26] .........
900 [etd 1:08:41] .........910 [etd 1:01:05] .........920 [etd 53:39] ........
.930 [etd 46:23] .........940 [etd 39:15] .........950 [etd 32:16] .......
..960 [etd 25:25] .........970 [etd 18:43] .........980 [etd 12:08] ......
...990 [etd 5:42] ........ 999.

Done.
plot(G_entirehome.csr)

Shared Rooms

We will use the following code chunk to compute the G-function using Gest().

G_shared = Gest(airbnb_sharedSG_ppp, correction = "border")
plot(G_shared, xlim=c(0,500))

Our hypothesis:

Ho = The distribution of Shared Rooms in Singapore are randomly distributed.

H1= The distribution of Shared Rooms in Singapore are not randomly distributed.

The null hypothesis will be rejected if p-value is smaller than alpha value of 0.001.

Now that our hypothesis has been created, we can conduct the Monte Carlo test with the G-function.

G_shared.csr <- envelope(airbnb_sharedSG_ppp, Gest, nsim = 999)
Generating 999 simulations of CSR  ...
1, 2, 3, ......10 [etd 8:39] .........20 [etd 8:31] .........
30 [etd 8:09] .........40 [etd 8:12] .........50 [etd 8:16] ........
.60 [etd 8:20] .........70 [etd 8:15] .........80 [etd 8:13] .......
..90 [etd 8:10] .........100 [etd 8:11] .........110 [etd 8:04] ......
...120 [etd 7:57] .........130 [etd 7:53] .........140 [etd 7:43] .....
....150 [etd 7:35] .........160 [etd 7:29] .........170 [etd 7:23] ....
.....180 [etd 7:15] .........190 [etd 7:13] .........200 [etd 7:07] ...
......210 [etd 7:02] .........220 [etd 6:57] .........230 [etd 6:52] ..
.......240 [etd 6:46] .........250 [etd 6:40] .........260 [etd 6:37] .
........270 [etd 6:32] .........280 [etd 6:26] .........290
 [etd 6:21] .........300 [etd 6:18] .........310 [etd 6:18] .........
320 [etd 6:12] .........330 [etd 6:08] .........340 [etd 6:04] ........
.350 [etd 5:59] .........360 [etd 5:54] .........370 [etd 5:49] .......
..380 [etd 5:45] .........390 [etd 5:39] .........400 [etd 5:35] ......
...410 [etd 5:32] .........420 [etd 5:28] .........430 [etd 5:24] .....
....440 [etd 5:18] .........450 [etd 5:12] .........460 [etd 5:08] ....
.....470 [etd 5:04] .........480 [etd 4:59] .........490 [etd 4:55] ...
......500 [etd 4:49] .........510 [etd 4:44] .........520 [etd 4:39] ..
.......530 [etd 4:34] .........540 [etd 4:29] .........550 [etd 4:26] .
........560 [etd 4:21] .........570 [etd 4:15] .........580
 [etd 4:12] .........590 [etd 4:06] .........600 [etd 4:00] .........
610 [etd 3:54] .........620 [etd 3:49] .........630 [etd 3:43] ........
.640 [etd 3:36] .........650 [etd 3:30] .........660 [etd 3:23] .......
..670 [etd 3:17] .........680 [etd 3:10] .........690 [etd 3:04] ......
...700 [etd 2:58] .........710 [etd 2:51] .........720 [etd 2:45] .....
....730 [etd 2:39] .........740 [etd 2:32] .........750 [etd 2:26] ....
.....760 [etd 2:20] .........770 [etd 2:14] .........780 [etd 2:08] ...
......790 [etd 2:02] .........800 [etd 1:56] .........810 [etd 1:50] ..
.......820 [etd 1:44] .........830 [etd 1:38] .........840 [etd 1:32] .
........850 [etd 1:26] .........860 [etd 1:21] .........870
 [etd 1:15] .........880 [etd 1:09] .........890 [etd 1:03] .........
900 [etd 57 sec] .........910 [etd 51 sec] .........920 [etd 45 sec] ........
.930 [etd 40 sec] .........940 [etd 34 sec] .........950 [etd 28 sec] .......
..960 [etd 22 sec] .........970 [etd 17 sec] .........980 [etd 11 sec] ......
...990 [etd 5 sec] ........ 999.

Done.
plot(G_shared.csr)