I created a random forest and predicted the classes of my test set, which are living happily in a dataframe:
    row.names  class
    564028     1
    275747     1
    601137     0
    922930     1
    481988     1
    ...
The row.names attribute tells me which row was which before I performed various operations that scrambled the row order. So far so good.
Now I would like to get a general feel for the accuracy of my predictions. To do this, I need to reorder this dataframe in ascending order of the row.names attribute. That way, I can compare the predictions, row by row, to the labels, which I already know.
Forgive me for asking such a basic question, but for the life of me, I can't find a good source of information regarding how to do such a trivial task.
The documentation implores me to:
    use attr(x, "row.names") if you need to retrieve an integer-valued set of row names.
but this leaves me with nothing but NULL.
My question is, how can I use row.names which has been loyally following me around in the various incarnations of dataframes throughout my workflow? Isn't this what it is there for?
Using bracket notation on an R data frame, you can select rows by column value, by index, by name, by condition, and so on. The base R function subset() gives the same results. In addition, the dplyr package provides filter() for selecting rows from a data frame.
`.rowNamesDF<-` is a (non-generic replacement) function to set row names for data frames, with the extra argument make.names.
None of the other solutions would actually work.
It should be:
    # Assuming the data frame is called df
    df[order(as.numeric(row.names(df))), ]

Row names in R are character strings, so if the as.numeric() part is missing, the rows are sorted lexicographically: 1, 10, 11, ... and so on.
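
To make the difference concrete, here is a minimal sketch of the approach above on made-up data. The names pred, ids, and labels, and all the values in them, are assumptions for illustration only and are not taken from the question. Note also that when a data frame has a single column, indexing rows with an empty column slot returns a plain vector and drops the row names, so the sketch passes drop = FALSE to keep the result a data frame.

```r
# Hypothetical predictions whose rows were scrambled upstream.
pred <- data.frame(
  class = c(1, 1, 0, 1, 1),
  row.names = c("564028", "275747", "601137", "922930", "481988")
)

# Row names are character strings, so a plain sort is lexicographic.
ids <- c("1", "10", "2")
sort(ids)              # "1" "10" "2"
sort(as.numeric(ids))  # 1 2 10

# Reorder by the numeric value of the row names.
# drop = FALSE keeps a one-column result as a data frame.
pred_sorted <- pred[order(as.numeric(row.names(pred))), , drop = FALSE]

# Hypothetical known labels, assumed to already be in ascending row-name order.
labels <- c(1, 0, 1, 0, 1)

# Share of rows where the prediction matches the label.
accuracy <- mean(pred_sorted$class == labels)
```

This only works as a row-wise comparison if the labels really are ordered by the same ascending identifiers; if they carry their own identifiers, matching on those is the safer choice.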