require(spatialEco)
library(rgdal)
library(foreign)

The code below demonstrates spatial join and aggregation examples using spatialeco, foreign, and rgdal packages. Some code snippets from the Spatial Data Science introduction course (authored by Luc Anselin) were used from the Center for Spatial Data Science.

Data Preparation

You must have two spatial data files to start. Lat and long column fields do not make the data spatial; it must be converted to a spatial format such as .shp (shapefile), geojson, and so forth. These formats should include the appropriate project.

Data used in this tutorial, produce carts and Community Area boundaries, were downloaded from the City of Chicago Data Portal. The community area data was exported as a shapefile. The produce cart dataset was uploaded to GeoDa, and converted to spatial format using the provided latitute and longitude.

Load Spatial Data

Import two spatial files with your data. Here, I first import my points shapefile representing produce carts in Chicago. Then second file is the area of interest that I’m aggregating my stations to; Community Areas of Chicago.

Your data must have the following extensions as a shapefile: .shp, .shx, .dbf, and .prj. The .prj file includes the projection of the data. For a spatial join, both datasets must be in the same projection!

Point Data - Produce Carts in Chicago

Read shapefile

pts<- readOGR(".","Produce_Carts")
## OGR data source with driver: ESRI Shapefile 
## Source: ".", layer: "Produce_Carts"
## with 14 features
## It has 4 fields
# Plot points for inspection
plot(pts)

# Check length of data
length(pts)
## [1] 14
# Inspect data. Note here that the produce cart location field is incorrent (should be long,lat), though we have the corrected version implemented in our spatial metadata.
head(pts@data)
##              ADDRESS LATITUDE LONGITUDE                    LOCATION
## 1  721 N LA SALLE DR 41.89558 -87.63254 (41.89557507, -87.63254328)
## 2   4602 N BROADWAY  41.96548 -87.65769 (41.96547764, -87.65768753)
## 3   25 E CHICAGO AVE 41.89658 -87.62709 (41.89658176, -87.62708766)
## 4   3758 W OGDEN AVE 41.85308 -87.71965 (41.85307985, -87.71965154)
## 5     4020 W 26TH ST 41.84435 -87.72546 (41.84435111, -87.72545519)
## 6 1601 W DIVISION ST 41.90324 -87.66755 (41.90323582, -87.66755407)

Polygon Data - Community Areas (ie. Neighorhoods) in Chicago

# Read shapefile
polys<-readOGR(".","ComArea")
## OGR data source with driver: ESRI Shapefile 
## Source: ".", layer: "ComArea"
## with 77 features
## It has 2 fields
# Plot polygons for inspection
plot(polys) 

# Inspect data.
head(polys@data)
##   ComAreaID       community
## 0        35         DOUGLAS
## 1        36         OAKLAND
## 2        37     FULLER PARK
## 3        38 GRAND BOULEVARD
## 4        39         KENWOOD
## 5         4  LINCOLN SQUARE

Spatial Join

Use spatialEco function: point.in.poly. This funciton intersects point and polygon feature classes and adds polygon attributes to points.

pts.poly <- point.in.poly(pts, polys)

Aggregate points per polygon

There are many ways to do this. In this example, I use the table function in R to create a frequency of produce carts (points) per community area (polygons).

carts_total<- table(pts.poly$ComAreaID)
cartsframe<- as.data.frame(carts_total)
head(cartsframe)
##   Var1 Freq
## 1    3    1
## 2    7    1
## 3    8    2
## 4   24    2
## 5   29    3
## 6   30    4
dim(cartsframe)
## [1] 7 2

Data Cleaning

Because there are only a few community areas represented by produce carts, we need to join frequencies to a new, empty vector that matches the Community Area length.

#Check if Var1 is a factor; if it is, we need to change.
is.factor(cartsframe$Var1)
## [1] TRUE
#We convert Var 1 from factor to level to numeric. 
nnf<- cartsframe$Var1
levels(nnf)
## [1] "3"  "7"  "8"  "24" "29" "30" "35"
nnn<- as.numeric(levels(nnf))
nnn
## [1]  3  7  8 24 29 30 35
is.factor(nnn)
## [1] FALSE
# Create a new, empty vector that is the length of Community Area polygons
narea<- max(comarea$ComAreaID)
vc<- vector(mode="numeric",length=narea)
vc
##  [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [36] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [71] 0 0 0 0 0 0 0
length(vc)
## [1] 77
vc[nnn]<- egoframe$Freq
vc
##  [1] 0 0 1 0 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 3 4 0 0 0 0 1
## [36] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [71] 0 0 0 0 0 0 0

Write data

Once we have the data set up how we’d like it, we convert to a data frame, update column names, and write to a csv. That csv can later be joined to the community area shapefile we have (using R, Geoda, or another option).

# Finalize new data frame
nid <- (1:narea)
vcframe <- data.frame(as.integer(nid),as.integer(vc))
names(vcframe)<- c("CID","producecarts")

# Write data as csv for joining in Geoda.
write.csv(vcframe,"produce-carts-by-CA.csv",row.names=FALSE)