I found that function varIdent has an undesired behavior which is dependent on data order. In detail, the reference category depends on data order, see the next example
set.seed(123)
library(tidyverse)
library(nlme)
#> 
#> Attaching package: 'nlme'
#> The following object is masked from 'package:dplyr':
#> 
#>     collapse
n_smpl<-10
z0<-data.frame(sex=sample(c("F","M"),n_smpl,prob = c(0.6,0.4),replace = T) %>% factor)
summary(z0)
#>  sex  
#>  F:6  
#>  M:4
X<-model.matrix(~sex, z0)
sd_e<-1.3
e<-rnorm(nrow(z0),0,sd_e)
my_bts<-c(10,1.4)
z0$y<-(X %*% my_bts+e)%>% drop
m0<-gls(model=y~sex,weights=varIdent(form=~1|sex),data=z0[order(z0$sex),])
m1<-gls(model=y~sex,weights=varIdent(form=~1|sex),data=z0[order(z0$sex,decreasing =T),])
summary(m0)
#> Generalized least squares fit by REML
#>   Model: y ~ sex 
#>   Data: z0[order(z0$sex), ] 
#>        AIC      BIC    logLik
#>   35.82722 36.14498 -13.91361
#> 
#> Variance function:
#>  Structure: Different standard deviations per stratum
#>  Formula: ~1 | sex 
#>  Parameter estimates:
#>         F         M 
#> 1.0000000 0.5305269 
#> 
#> Coefficients:
#>                 Value Std.Error   t-value p-value
#> (Intercept) 10.344227 0.5847687 17.689434  0.0000
#> sexM         0.967754 0.6973690  1.387721  0.2026
#> 
#>  Correlation: 
#>      (Intr)
#> sexM -0.839
#> 
#> Standardized residuals:
#>         Min          Q1         Med          Q3         Max 
#> -1.38845834 -0.72023259 -0.02681155  0.85333098  1.31623645 
#> 
#> Residual standard error: 1.432385 
#> Degrees of freedom: 10 total; 8 residual
summary(m1)
#> Generalized least squares fit by REML
#>   Model: y ~ sex 
#>   Data: z0[order(z0$sex, decreasing = T), ] 
#>        AIC      BIC    logLik
#>   35.82722 36.14498 -13.91361
#> 
#> Variance function:
#>  Structure: Different standard deviations per stratum
#>  Formula: ~1 | sex 
#>  Parameter estimates:
#>        M        F 
#> 1.000000 1.884919 
#> 
#> Coefficients:
#>                 Value Std.Error   t-value p-value
#> (Intercept) 10.344227 0.5847687 17.689434  0.0000
#> sexM         0.967754 0.6973690  1.387721  0.2026
#> 
#>  Correlation: 
#>      (Intr)
#> sexM -0.839
#> 
#> Standardized residuals:
#>         Min          Q1         Med          Q3         Max 
#> -1.38845834 -0.72023259 -0.02681155  0.85333098  1.31623645 
#> 
#> Residual standard error: 0.7599187 
#> Degrees of freedom: 10 total; 8 residual
Created on 2023-04-04 by the reprex package (v2.0.1)
Look at Variance function outputs for m0 and m1 models, the reference category is different in each model. This does not happens in the mean structure component of the model.
This seems to be a bug, or at least an infelicity, in nlme. If your sex variable wasn't already a factor with defined levels I would say it was your fault, or at least expected - but it is.
I have reported this here.
A patched version that fixes this problem is available at https://github.com/bbolker/nlme: remotes::install_github("bbolker/nlme") to install it (you'll need to have development tools installed to install from source).
A slightly more compact reproducible example:
library(nlme)
dd <- transform(mtcars,
                am = factor(am))
dd <- dd[order(dd$am),]
dd_rev <- dd[order(dd$am, decreasing = TRUE),]
m0 <- gls(mpg ~ hp, weights = varIdent(form = ~1|am), data = dd)
m1 <- update(m0, data = dd_rev)
coef(m0$modelStruct$varStruct)  # 0.9776946
coef(m1$modelStruct$varStruct)  # -0.9776946
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With