
R: Replace all values in each row of a data frame after the first NA with NA

Tags: r, na

I have a dataframe of 3500 observations and 278 variables. For each row going from the first column, I want to replace all values occurring after the first NA by NAs. For instance, I want to go from a dataframe like so:

X1 X2 X3 X4 X5
 1  3 NA  6  9
 1 NA  4  6 18
 6  7 NA  3  1 
10  1  2 NA  2 

To something like

X1 X2 X3 X4 X5
 1  3 NA NA NA
 1 NA NA NA NA
 6  7 NA NA NA 
10  1  2 NA NA   

I tried using the following for loop, but it does not terminate:

for(i in 2:3500){
 firstna <- min(which(is.na(df[i,])))
 df[i, firstna:278] <- NA
}

Is there a more efficient way to do this? Thanks in advance.

Prasad asked Dec 05 '25 00:12


2 Answers

You could do something like this:

# sample data
mat <- matrix(1, 10, 10)
set.seed(231)
mat[sample(100, 7)] <- NA

You can use apply with cumsum and is.na to keep track of where NAs need to be placed (i.e. places across the row where the cumulative sum of NAs is greater than 0). Then, use those locations to assign NAs to the original structure in the appropriate places.

mat[t(apply(is.na(mat), 1, cumsum)) > 0 ] <- NA
#     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,]    1    1    1    1    1    1   NA   NA   NA    NA
# [2,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
# [3,]    1    1    1    1    1    1    1    1    1     1
# [4,]    1    1    1    1    1    1    1    1    1     1
# [5,]    1    1    1   NA   NA   NA   NA   NA   NA    NA
# [6,]    1    1    1    1    1    1    1    1    1     1
# [7,]    1   NA   NA   NA   NA   NA   NA   NA   NA    NA
# [8,]    1    1    1    1    1    1    1    1    1     1
# [9,]    1    1    1    1    1    1    1    1    1     1
#[10,]    1    1   NA   NA   NA   NA   NA   NA   NA    NA
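
To see why this works, the one-liner can be split into its intermediate steps. The following is only a sketch: the matrix `m` and the names `na_flags`, `na_so_far` and `mask` are made up for illustration, with NAs placed as in the question's example.

```r
# Small matrix with NAs in the same positions as the question's example
m <- matrix(c( 1,  3, NA,  6,  9,
               1, NA,  4,  6, 18,
               6,  7, NA,  3,  1,
              10,  1,  2, NA,  2),
            nrow = 4, byrow = TRUE)

na_flags <- is.na(m)   # TRUE where a value is missing

# apply() over rows returns one column per input row, hence the t()
na_so_far <- t(apply(na_flags, 1, cumsum))   # running count of NAs along each row

mask <- na_so_far > 0  # TRUE from the first NA of a row onward

m[mask] <- NA          # logical matrix indexing blanks everything under the mask
m
```

Rows without any NA keep a running count of 0 everywhere, so they are left untouched.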

The same approach works fine with data frames. Using the provided example data:

d<-read.table(text="
X1 X2 X3 X4 X5
 1  3 NA  6  9
 1 NA  4  6 18
 6  7 NA  3  1 
10  1  2 NA  2 ", header=TRUE)

d[t(apply(is.na(d), 1, cumsum)) > 0 ] <- NA
#  X1 X2 X3 X4 X5
#1  1  3 NA NA NA
#2  1 NA NA NA NA
#3  6  7 NA NA NA
#4 10  1  2 NA NA
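
For comparison, the loop from the question can also be repaired. As written it starts at row 2, and for a row without any NA `min(which(...))` returns `Inf` (with a warning), so `firstna:278` cannot be built. The sketch below is not part of the original answer: the fifth row without NAs is added here to exercise that case, and `m`, `na_pos` and `res` are illustrative names. It also works on a matrix copy, on the assumption that all columns are numeric, because row-wise indexing of a data frame is comparatively slow.

```r
d <- read.table(text = "
X1 X2 X3 X4 X5
 1  3 NA  6  9
 1 NA  4  6 18
 6  7 NA  3  1
10  1  2 NA  2
 5  5  5  5  5", header = TRUE)

m <- as.matrix(d)                    # assumes all columns are numeric

for (i in seq_len(nrow(m))) {        # start at row 1, not row 2
  na_pos <- which(is.na(m[i, ]))     # positions of NAs in this row
  if (length(na_pos) > 0) {          # skip rows that contain no NA
    m[i, na_pos[1]:ncol(m)] <- NA    # blank from the first NA to the end
  }
}

res <- as.data.frame(m)
res
```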

Jota answered Dec 07 '25 14:12


We can use rowCumsums from library(matrixStats)

library(matrixStats)
d*NA^rowCumsums(+(is.na(d)))
#  X1 X2 X3 X4 X5
#1  1  3 NA NA NA
#2  1 NA NA NA NA
#3  6  7 NA NA NA
#4 10  1  2 NA NA

Or a base R option is

d*NA^do.call(cbind,Reduce(`+`,lapply(d, is.na), accumulate=TRUE))
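
Both expressions rely on the fact that in R `NA^0` evaluates to 1 while `NA` raised to a positive power stays `NA`, so the running NA count becomes a multiplier of either 1 (keep) or NA (blank). A minimal sketch of the base R variant, split into steps; the names `cnt`, `mult` and `res` are illustrative only:

```r
NA^0   # 1
NA^1   # NA

d <- read.table(text = "
X1 X2 X3 X4 X5
 1  3 NA  6  9
 1 NA  4  6 18
 6  7 NA  3  1
10  1  2 NA  2", header = TRUE)

# Running count of NAs across columns: one accumulated vector per column
cnt <- do.call(cbind, Reduce(`+`, lapply(d, is.na), accumulate = TRUE))

mult <- NA^cnt   # 1 where no NA has been seen yet, NA from the first NA onward
res  <- d * mult # multiplying by 1 keeps a value, multiplying by NA blanks it
res
```

Because the last step is a multiplication, this variant only applies when every column is numeric.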

akrun answered Dec 07 '25 14:12


