 

Iterating through data frame and changing values on condition [R]

Had to make an account because this sequence of for loops has been annoying me for quite some time.

I have a data frame in R with 1000 rows and 10 columns, where every value is 1, 2, or 3. I would like to recode EVERY entry so that 1 becomes 3, 2 stays 2, and 3 becomes 1. I understand that there are easier ways to do this, such as subsetting each column and hard-coding the condition, but this isn't always ideal, as many of the data sets that I work with have up to 100 columns.

I would like to use a nested loop in order to accomplish this task -- this is what I have thus far:

for(i in seq_len(nrow(dat_trans))){
  for(j in seq_len(ncol(dat_trans))){  # length(dat_trans) alone would visit only the last column
    if(dat_trans[i,j] == 1){
      dat_trans[i,j] <- 3
    } else if(dat_trans[i,j] == 2){
      dat_trans[i,j] <- 2
    } else{
      dat_trans[i,j] <- 1
    }
  }
}

So I iterate through each column, grab every value, and change it based on the if/else condition. I am still learning R, so if you have any pointers on my code, feel free to point them out.


Silver_Surfer9 asked Sep 02 '25 05:09


2 Answers

R is a vectorized language, so you really don't need the inner loop.
Also, if you notice that 4 - "old value" = "new value", you can eliminate the if statements.

for(i in seq_len(ncol(dat_trans))){
  dat_trans[,i] <- 4 - dat_trans[,i]
}

The loop now runs once per column (10 iterations), and each iteration updates all 1000 rows of that column in a single vectorized operation, instead of visiting all 10,000 cells one at a time. This will greatly improve performance.
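To try the column-wise version end to end, here is a minimal sketch. The data frame is a hypothetical stand-in for dat_trans (random values in 1:3, same 1000 x 10 shape as in the question), and the `original` copy exists only so the result can be checked.

```r
# Hypothetical stand-in for dat_trans: 1000 rows, 10 columns, values in 1:3
set.seed(42)
dat_trans <- as.data.frame(
  matrix(sample(1:3, 1000 * 10, replace = TRUE), nrow = 1000)
)

# Keep a copy so the recode can be verified afterwards
original <- dat_trans

# One vectorized subtraction per column
for (i in seq_len(ncol(dat_trans))) {
  dat_trans[, i] <- 4 - dat_trans[, i]
}

# Spot checks: old 1s are now 3, old 2s are unchanged, old 3s are now 1
print(all(dat_trans[original == 1] == 3))
print(all(dat_trans[original == 2] == 2))
print(all(dat_trans[original == 3] == 1))
```

The same loop should work unchanged for a 100-column data set, since it only depends on ncol().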

Dave2e answered Sep 04 '25 19:09


This type of operation is a swap operation. The ways to swap values without for loops are numerous.

To set up a simple dataframe:

df <- data.frame(
  col1 = c(1,2,3),
  col2 = c(2,3,1),
  col3 = c(3,1,2)
)

Using a dummy value:

df[df==1] <- 4
df[df==3] <- 1
df[df==4] <- 3

Using a temporary variable:

dftemp <- df
df[dftemp==1] <- 3
df[dftemp==3] <- 1

Using arithmetic (subtracting each value from 4):

df <- 4 - df

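Using Boolean operations (assigning into df[] keeps the result a data frame; the comparisons on their own return a matrix):

df[] <- (df==1) * 3 + (df==2) * 2 + (df==3) * 1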

Using a bitwise xor (in case you really have a need for speed):

df[df!=2] <- sapply(df, function(x){bitwXor(2,x)})[df!=2]
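Since each snippet above overwrites df, running them one after another keeps swapping the values back and forth. A small sketch like the following, which applies each approach to its own copy, can confirm that they agree. The names a, b, m, and x are just illustrative.

```r
# Same small data frame as above
df <- data.frame(col1 = c(1, 2, 3), col2 = c(2, 3, 1), col3 = c(3, 1, 2))

# Dummy value approach, applied to a copy
a <- df
a[a == 1] <- 4
a[a == 3] <- 1
a[a == 4] <- 3

# Subtraction approach (result is still a data frame)
b <- 4 - df

# Boolean approach (comparisons on a data frame give a matrix)
m <- (df == 1) * 3 + (df == 2) * 2 + (df == 3) * 1

# Bitwise xor approach, applied to a copy
x <- df
x[x != 2] <- sapply(df, function(v) bitwXor(2, v))[df != 2]

print(all(a == b))
print(all(as.matrix(b) == m))
print(all(x == b))
```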

If a nested for loop is required, the switch function is a good option.

for(i in seq(ncol(df))){
  for(j in seq(nrow(df))){
    df[j,i] <- switch(df[j,i],3,2,1)
  }
}

Text can be used if the values are not as nicely indexed as 1, 2, and 3.

for(i in seq(ncol(df))){
  for(j in seq(nrow(df))){
    df[j,i] <- switch(as.character(df[j,i]),
                      "1" = 3,
                      "2" = 2,
                      "3" = 1)
  }
}
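If the loop itself is not a requirement, the same text-keyed mapping can probably be done without it by indexing a named vector. This is only a sketch: recode_map is a hypothetical name, and it assumes every value in df appears among the names of the map (anything else would come back as NA).

```r
df <- data.frame(col1 = c(1, 2, 3), col2 = c(2, 3, 1), col3 = c(3, 1, 2))

# Hypothetical lookup table: names are the old values, elements are the new ones
recode_map <- c("1" = 3, "2" = 2, "3" = 1)

# df[] <- lapply(...) replaces every column while keeping df a data frame;
# unname() drops the lookup names from the recoded values
df[] <- lapply(df, function(x) unname(recode_map[as.character(x)]))

print(df)
```

Because the lookup is keyed by text, the old values do not have to be consecutive integers, which is the same advantage as the character version of switch.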

Agriculturist answered Sep 04 '25 21:09