
Error in reshape(): duplicate 'row names' are not allowed

Tags: r, reshape

I have wide longitudinal data that I would like to reshape into long data. This is a sample:

    id sex group id.1 sex.1 group.1    status1  beg1  end1 status2  beg2  end2
1 1000   1     a 1000     1       a Vocational  <NA> S2007      HE S2007 S2008
2 1001   1     a 1001     1       a Vocational  <NA> S2007      HE S2008 S2012
3 1004   1     a 1004     1       a Vocational  <NA> S2008     999  <NA>  <NA>
4 1006   2     a 1006     2       a Vocational  <NA> S2007    Army S2012  <NA>
5 1007   1     a 1007     1       a         HE  <NA> S2007     999  <NA>  <NA>
6 1008   1     a 1008     1       a Vocational S2013  <NA>     999  <NA>  <NA>

I need to get it in this shape, compatible with SPELL format:

  id sex group index     status   beg   end
1000   1     a     1 Vocational    NA S2007
1000   1     a     2         HE S2008 S2012
...

I am using the following command:

spell <- reshape(data, 
                 varying=names(data)[4:60],
                 direction="long",
                 idvar=c("id","sex","group"),
                 sep="")   

And I get the following error message:

Error in `row.names<-.data.frame`(`*tmp*`, value = paste(d[, idvar], times[1L],  : 
  duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique value when setting 'row.names': ‘NA.1’

I have tried setting NA values to 999 this way, but it does not work.

data[is.na(data)] <- 999

Do you know what might get this to work? Thanks a lot in advance!

Gina Zetkin asked Sep 06 '25 22:09



1 Answer

That error message indicates that you either have duplicated combinations of the id variable(s) or missing values in them.

Check for duplicates first:

with(data, any(duplicated(cbind(id, sex, group))))

If TRUE, then there's your answer.
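To see which rows are responsible rather than only whether duplicates exist, something along these lines may help. It is a minimal sketch on a made-up data frame (`toy` and its values are hypothetical stand-ins for your `data`), and it assumes the id columns are named `id`, `sex` and `group` as in the question.

```r
# Hypothetical stand-in for the asker's `data`; the values are invented.
toy <- data.frame(
  id      = c(1000, 1001, 1001, 1004),
  sex     = c(1, 1, 1, 1),
  group   = c("a", "a", "a", "a"),
  status1 = c("Vocational", "Vocational", "HE", "Vocational")
)

id_cols <- c("id", "sex", "group")

# duplicated() only flags the second and later occurrences, so scan from
# both ends to mark every row that shares its id combination with another.
dup <- duplicated(toy[id_cols]) | duplicated(toy[id_cols], fromLast = TRUE)

# Show the offending rows so they can be inspected or fixed.
toy[dup, ]
```

Scanning from both ends is a deliberate choice here: it returns all members of each duplicated group, which makes it easier to decide which row to keep.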

If FALSE, then you may have missing values in the id variable(s), possibly even entirely blank rows, most likely at the end of the data. The ‘NA.1’ in the warning hints at this: reshape() builds each row name from the id value and the time index, so a missing id yields ‘NA.1’. Blank rows can come from the source data itself or from the import step, for example calling read_excel with a range argument that covers too many rows. In any case, check the data carefully for missing values in the id variable(s). Replacing them all with 999 won't help, since any rows that were missing an id would then all share the id 999.
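One possible way to run that check and then retry the reshape is sketched below. The data frame `toy` is hypothetical (two real rows followed by two blank ones, mimicking an over-long import range), and dropping the incomplete rows is only an assumption about what you want; inspect them first if they might hold real records.

```r
# Hypothetical data: rows 3 and 4 mimic blank trailing rows from an import.
toy <- data.frame(
  id      = c(1000, 1001, NA, NA),
  sex     = c(1, 1, NA, NA),
  group   = c("a", "a", NA, NA),
  status1 = c("Vocational", "Vocational", NA, NA),
  end1    = c("S2007", "S2007", NA, NA),
  status2 = c("HE", "HE", NA, NA),
  end2    = c("S2008", "S2012", NA, NA)
)

id_cols <- c("id", "sex", "group")

# Number of missing values in each id column.
na_counts <- colSums(is.na(toy[id_cols]))
na_counts

# Keep only the rows in which every id column is present.
clean <- toy[complete.cases(toy[id_cols]), ]

# Same style of call as in the question, applied to the cleaned data.
long <- reshape(clean,
                varying   = names(clean)[4:7],
                direction = "long",
                idvar     = id_cols,
                sep       = "")
long
```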

Edward answered Sep 08 '25 12:09
