I am trying to fit Polish local gov election results 2015 following the supperb blog by @GavinSimpson. https://www.fromthebottomoftheheap.net/2017/10/19/first-steps-with-mrf-smooths/ I join xls with shp data on 6 digit identifier (there may be a leading 0 s). I kept it a text variable. EDIT, I simplified the identifier and am now using a sequence from 1 to nrow to simplify my question.
library(tidyverse)
library(sf)
library(mgcv)
# Read data
# From https://www.gis-support.pl/downloads/gminy.zip shp file
boroughs_shp <- st_read("../../_mapy/gminy.shp",options = "ENCODING=WINDOWS-1250",
                     stringsAsFactors = FALSE ) %>% 
  st_transform(crs = 4326)%>% 
  janitor::clean_names() %>% 
# st_simplify(preserveTopology = T, dTolerance = 0.01) %>% 
  mutate(teryt=str_sub(jpt_kod_je, 1, 6)) %>% 
  select(teryt, nazwa=jpt_nazwa, geometry)
# From https://parlament2015.pkw.gov.pl/wyniki_zb/2015-gl-lis-gm.zip data file
elections_xls <-
  readxl::read_excel("data/2015-gl-lis-gm.xls",
             trim_ws = T, col_names = T) %>% 
  janitor::clean_names() %>% 
  select(teryt, liczba_wyborcow, glosy_niewazne)
elections <-
  boroughs_shp %>% fortify() %>% 
  left_join(elections_xls, by = "teryt") %>% 
  arrange(teryt) %>%
  mutate(idx = seq.int(nrow(.)) %>% as.factor(), 
         teryt = as.factor(teryt)) 
# Neighbors
boroughs_nb <-spdep::poly2nb(elections, snap = 0.01, queen = F, row.names = elections$idx )
names(boroughs_nb) <- attr(boroughs_nb, "region.id")
# Model
ctrl <- gam.control(nthreads = 4) 
m1 <- gam(glosy_niewazne ~ s(idx, bs = 'mrf', xt = list(nb = boroughs_nb)), 
          data = elections,
          offset = log(liczba_wyborcow), # number of votes
          method = 'REML', 
          control = ctrl,
          family = betar()) 
Here is the error message:
    Error in smooth.construct.mrf.smooth.spec(object, dk$data, dk$knots) : 
  mismatch between nb/polys supplied area names and data area names
In addition: Warning message:
In if (all.equal(sort(a.name), sort(levels(k))) != TRUE) stop("mismatch between nb/polys supplied area names and data area names") :
  the condition has length > 1 and only the first element will be used
elections$idx is a factor. I am using it to give names to boroughs_nb to be absolutely sure I have the same number of levels. What am I doing wrong?
EDIT: The condition mentioned in error message is met:
> all(sort(names(boroughs_nb)) == sort(levels(elections$idx)))
[1] TRUE
It seems that I solved the issue, maybe not quite realizing how it did being stat beginner.
First, not a single NA should be present in modeled data. There was one. After that the mcgv seemed to run, but it took long time (quarter of an hour) and inexplicably for me, only when I limited no of knots to k=50, with poor results (less or more and it did not return any result) and with warning to be cautious about results. 
Then I tried to remove offset=log(liczba_wyborcow) ie offset number of voters and made number of void votes per 1000 my predicted variable.
elections <-
 boroughs_shp %>%  
 left_join(elections_xls, by = "teryt") %>% na.omit() %>% 
 arrange(teryt) %>% 
 mutate(idx = row_number() %>% as.factor()) %>% 
 mutate(void_ratio=round(glosy_niewazne/liczba_wyborcow,3)*1000)
Now that it is a count, why not try change family = betar() in gam formula to poisson() - still not a good result, and then to negative binomial  family = nb()
Now my formula looks like
m1 <-
gam(
 void_ratio ~ s(
 idx,
 bs = 'mrf',
 k =500,
 xt = list(nb = boroughs_nb),
 fx = TRUE),
 data = elections_df,
 method = 'REML', 
 control = gam.control(nthreads = 4),
 family = nb()
)
It seems now to be blazingly fast and return valid results with no warnings or errors. On a laptop with 4 cores Intel Core I7 6820HQ @ 2.70GHZ 16GB Win10 it takes now 1-2 minutest to build a model.
In brief, what I changed was: remove a single NA, remove offset from formula and use negative binomial distribution.
Here is the result of what I wanted to achieve, from left to right, real rate of void votes, a rate smoothed by a model and residuals indicating discrepancies. The mcgv code let me do that.

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With