
What's happening behind the conversion of factor to numeric?

Tags: r, dplyr, magrittr

Here is a small data.frame:

e = data.frame(A=c(letters[1:5], 1:5))

I am a little bit confused regarding what's happening when I execute the following command:

unclass(e$A) %>% as.numeric()

I am getting the following output:

 [1]  6  7  8  9 10  1  2  3  4  5

Why are a through e treated as 6:10?

asked Nov 21 '25 09:11 by dondapati


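1 Answer

data.frame turns the character column into a factor (this is the default in R versions before 4.0.0, where stringsAsFactors defaulted to TRUE). This can be seen by using str(e):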

'data.frame': 10 obs. of  1 variable:
 $ A: Factor w/ 10 levels "1","2","3","4",..: 6 7 8 9 10 1 2 3 4 5

The levels of this factor are the sorted unique values, and in the sort order used here digits come before letters, as levels(e$A) shows:

 [1] "1" "2" "3" "4" "5" "a" "b" "c" "d" "e"

as.numeric converts a factor to its integer codes, i.e. the position of each value's level in levels(e$A). The first level gets the value 1 (so "1" happens to stay 1) and the sixth level gets the value 6 (so "a" becomes 6).
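The relationship between codes and levels can be checked with a short sketch. It assumes stringsAsFactors = TRUE is passed explicitly (this argument is not in the question's code; it is added here so the column is a factor on any R version):

```r
# Assumption: stringsAsFactors = TRUE is set explicitly so that the
# character column becomes a factor regardless of the R version.
e <- data.frame(A = c(letters[1:5], 1:5), stringsAsFactors = TRUE)

lv    <- levels(e$A)      # sorted unique values of the column
codes <- as.numeric(e$A)  # position of each value within lv

# Indexing the levels by the codes gives back the original strings.
recovered <- lv[codes]
print(codes)
print(recovered)
```

Each number in codes is simply an index into lv, which is why "a" maps to whatever position "a" occupies among the sorted levels.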

In this case unclass() already exposes these codes: it removes the factor class and leaves an integer vector that still carries the levels attribute. as.numeric then only converts that vector to double and drops the levels attribute.
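A minimal sketch of the two steps, using a standalone factor built with factor() instead of the data.frame column (the variable names f, u, and n are only for illustration):

```r
f <- factor(c(letters[1:5], 1:5))

u <- unclass(f)     # integer codes, levels attribute still attached
print(typeof(u))
print(attributes(u))

n <- as.numeric(u)  # plain double vector, attributes removed
print(typeof(n))
print(attributes(n))
```

The values of u and n are the same; only the storage type and the attributes differ.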

?Comparison tells us that any comparison between character vectors (such as sorting them) is based on the collating sequence of the current locale, so the order of the levels can differ between locales.
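If the level order should not depend on the locale, one option is to sort with method = "radix", which ?sort documents as using C-locale collation for character vectors. The vector x below is made up for illustration:

```r
x <- c("b", "A", "a", "1")

# Locale-dependent order (result may vary between systems).
print(sort(x))

# C-locale order: digits, then uppercase, then lowercase letters.
radix_sorted <- sort(x, method = "radix")
print(radix_sorted)

# Build a factor whose levels follow the C-locale order.
f <- factor(x, levels = radix_sorted)
print(as.numeric(f))
```

Passing levels explicitly to factor() is what fixes the codes; the default sort is only used when levels is not supplied.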

Note: this is independent of the %>%.

answered Nov 22 '25 23:11 by Axeman