We have two geographies: census tracts and a square grid. The grid dataset only has information on population count, while for each census tract we know the total income. We would like to apportion the income data from the census tracts to the grid cells.
This is a very common problem in geographical analysis and there are probably many ways to address it. We want the apportionment to take into account not only the spatial overlap between census tracts and grid cells but also the population of each cell. The main reason is to avoid problems with large census tracts whose residents live in only a small part of the tract's area.
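For reference, the population-weighted apportionment we have in mind can be written as a formula. The symbols are our own shorthand, not names from the data: Y_t is the total income of tract t, P_i the population of grid cell i, A_i the area of cell i, and A_{it} the area of overlap between cell i and tract t. This is only a sketch, and it assumes that population is spread evenly within each grid cell.

```latex
\hat{Y}_i \;=\; \sum_{t} Y_t \,
  \frac{P_i \, A_{it} / A_i}{\sum_{j} P_j \, A_{jt} / A_j}
```

Each tract's income is split among the pieces it overlaps in proportion to the estimated population of each piece, and the income of a cell is the sum over the pieces that fall inside it.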
Below we present a reproducible example (using R and the sf package) with the solution we have found so far, applied to a sample extracted from our geographies. We would appreciate it if others could suggest alternative (more efficient) solutions, so that we can check whether our results are correct.
library(sf)
library(dplyr)
library(readr)
# Download and load the sample data (mode = "wb" keeps the binary file intact on Windows)
download.file("https://github.com/ipeaGIT/acesso_oport/raw/master/test/shapes.RData", "shapes.RData", mode = "wb")
load("shapes.RData")
# Calculate the area of each tract
tract <- tract %>%
  mutate(area_tract = st_area(.))
# Calculate the area of each grid square
square <- square %>%
  mutate(area_square = st_area(.))
ui <-
  # Create spatial units for all intersections between the tracts and the squares (we call each one a "piece")
  st_intersection(square, tract) %>%
  # Calculate the area of each piece
  mutate(area_piece = st_area(.)) %>%
  # Compute the proportion of each tract's area that falls in that piece
  mutate(area_prop_tract = area_piece / area_tract) %>%
  # Compute the proportion of each square's area that falls in that piece
  mutate(area_prop_square = area_piece / area_square) %>%
  # Based on the square's population, estimate the population living in that piece
  mutate(pop_prop_square = square_pop * area_prop_square) %>%
  # Sum the estimated population of all pieces within each tract
  group_by(id_tract) %>%
  mutate(sum = sum(pop_prop_square)) %>%
  ungroup() %>%
  # Compute each piece's share of its tract's estimated population
  mutate(pop_prop_square_in_tract = pop_prop_square / sum) %>%
  # Apportion the tract's income to each piece according to that share
  mutate(income_piece = tract_incm * pop_prop_square_in_tract)
# Final aggregation by squares
ui_fim <- ui %>%
  # Group by square (keeping its population) and sum the income of its pieces
  group_by(id_square, square_pop) %>%
  summarise(square_income = sum(income_piece, na.rm = TRUE))
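As a rough sanity check of the logic (not of the spatial operations), the same steps can be run on a hand-made table of pieces where the answer is easy to verify by hand. Everything in this sketch is made up for illustration: the two tracts "A" and "B", the three squares, their populations and the overlap proportions are hypothetical values, not taken from shapes.RData. The column names follow the ones used above.

```r
library(dplyr)

# Hypothetical pieces, as st_intersection() might return them:
# tract A overlaps all of square 1 and half of square 2,
# tract B overlaps the other half of square 2 and all of square 3.
pieces <- data.frame(
  id_tract         = c("A", "A", "B", "B"),
  id_square        = c(1, 2, 2, 3),
  tract_incm       = c(1000, 1000, 500, 500),
  square_pop       = c(10, 20, 20, 0),
  area_prop_square = c(1, 0.5, 0.5, 1)
)

square_income <- pieces %>%
  # Estimated population living in each piece
  mutate(pop_piece = square_pop * area_prop_square) %>%
  # Share of the tract's estimated population that lives in each piece
  group_by(id_tract) %>%
  mutate(pop_share = pop_piece / sum(pop_piece)) %>%
  ungroup() %>%
  # Income apportioned to each piece, then summed by square
  mutate(income_piece = tract_incm * pop_share) %>%
  group_by(id_square) %>%
  summarise(square_income = sum(income_piece), .groups = "drop")

# Total income should be conserved: compare against the sum over distinct tracts
tract_total <- sum(distinct(pieces, id_tract, tract_incm)$tract_incm)
stopifnot(isTRUE(all.equal(sum(square_income$square_income), tract_total)))
```

The same comparison of totals can be applied to ui_fim and tract. One case to watch for: if every piece of a tract has zero estimated population, the share becomes 0/0 (NaN), and na.rm = TRUE in the final summarise() silently drops that tract's income, so the totals would no longer match.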
Thank you!
Depending on the approach to interpolation you want to use, I may have a solution for you that I've helped develop. The areal package implements areal weighted interpolation, and I use it in my own research for interpolating between U.S. census geography and grid squares. The package's website has more details and the associated vignettes. Hope this is useful!
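A minimal sketch of what a call looks like, using the sample data that ships with areal (ar_stl_race as the source tracts and ar_stl_wards as the target units) rather than the data from the question. For the question's objects, the assumed mapping would be square as the target, tract as the source and "tract_incm" as the extensive variable, with both layers in the same projected coordinate system. Note that this weights by area only, so it does not use the grid population the way the code in the question does.

```r
library(areal)

# Interpolate total population (an extensive, i.e. count-like, variable)
# from census tracts to wards, weighting by the area of overlap.
wards_pop <- aw_interpolate(
  ar_stl_wards,          # target features
  tid = WARD,            # target id column
  source = ar_stl_race,  # source features
  sid = "GEOID",         # source id column
  weight = "sum",        # weights based on the summed intersected area of each source feature
  output = "tibble",     # return a table without geometry
  extensive = "TOTAL_E"  # column(s) to apportion
)

head(wards_pop)
```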