
How to remove rows that contain only missing values in R?

Tags:

r

I have a large data set with 11 columns and 100000 rows (for example) containing the values 1, 2, 3 and 4, where 4 codes a missing value. Some of the rows are completely missing, i.e. they have 4 in all 11 columns. For example:

"4"  "4"  "4"  "4"  "4"  "4"  "4"  "4"  "4"  "4"   "4"

What I need is to remove only those rows which are completely missing. In other words, I want to keep every row with fewer than 11 missing values. I have tried na.omit, but it does not work in my case.

Thanks in advance.


asked Aug 25 '11 04:08

Iftikhar

2 Answers

Perhaps your best option is to utilise R's idiom for working with missing, or NA, values. Once you have coded the missing values as NA, you can work with complete.cases to easily achieve your objective.

Create some sample data with missing values (i.e. with value 4):

set.seed(123)
m <- matrix(sample(1:4, 30, prob=c(0.3, 0.3, 0.3, 0.1), replace=TRUE), ncol=6)
m[4, ] <- rep(4, 6)

Replace all values equal to 4 with NA:

m[m==4] <- NA
m
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    1   NA    2    2    2
[2,]    2    3    3    1    2    3
[3,]    3    2    2    1    2    3
[4,]   NA   NA   NA   NA   NA   NA
[5,]   NA    3    1   NA    2    1
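
If the data are read from a text file, the recoding can possibly be done at import time instead. The following is only a sketch under that assumption: the inline text `txt` and the object name `d` are made up for illustration, and `na.strings` is the standard `read.table` argument that lists the strings to be treated as NA.

```r
# Hypothetical input: three rows, with 4 used as the missing-value code
txt <- "1 2 4
4 4 4
3 4 1"

# Treat every field equal to "4" as NA while reading
d <- read.table(text = txt, na.strings = "4")

# Number of NA values per row
rowSums(is.na(d))
```

This only makes sense when 4 is never a legitimate value in any column of the file.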

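Now you can use a variety of functions that deal with NA values. For example, complete.cases will identify, you guessed it, complete cases, i.e. rows with no NA at all. Note that this also drops rows that are only partially missing, so it is stricter than removing just the rows where every value is NA: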

m[complete.cases(m), ]

     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    2    3    3    1    2    3
[2,]    3    2    2    1    2    3

For more information, see ?complete.cases or ?na.fail in the stats package.
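
To make the difference concrete, here is a small sketch that contrasts complete.cases with a filter that drops only the rows where every value is NA. The matrix is typed in by hand (it is not the sample data above) so the result does not depend on the random seed, and the name `keep` is just an illustrative choice.

```r
# Small hand-built matrix: row 1 is partially missing, row 2 is entirely
# missing, row 3 is complete
m <- matrix(c( 1,  2, NA,
              NA, NA, NA,
               3,  1,  2),
            nrow = 3, byrow = TRUE)

# TRUE only for rows that contain no NA at all
complete.cases(m)

# TRUE for rows that have at least one non-missing value
keep <- rowSums(!is.na(m)) > 0

# drop = FALSE keeps the result a matrix even if only one row survives
m[keep, , drop = FALSE]
```

With this input, complete.cases keeps only the third row, whereas the rowSums filter keeps the first and third rows and removes only the all-NA row.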

answered Oct 17 '22 11:10

Andrie

I found this solution elsewhere and am pasting it here using Andrie's code to generate the initial data set.

First generate the data set:

set.seed(123)
m <- matrix(sample(1:4, 30, prob=c(0.3, 0.3, 0.3, 0.1), replace=TRUE), ncol=6)
m[4, ] <- rep(4, 6)
m[m==4] <- NA
m

Here is the initial data set:

1    1    NA   2    2    2
2    3    3    1    2    3
3    2    2    1    2    3
NA   NA   NA   NA   NA   NA
NA   3    1    NA   2    1

Now remove rows that only contain missing observations:

m[rowSums(is.na(m)) < ncol(m), ]

Here is the result:

1    1    NA   2    2    2
2    3    3    1    2    3
3    2    2    1    2    3
NA   3    1    NA   2    1
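
The same idea can probably be applied without recoding to NA first, by counting the 4s in each row directly. This is a minimal sketch assuming the data sit in a data frame and that 4 is never a legitimate value; the data frame `df` and the names `all_missing` and `kept` are hypothetical.

```r
# Hypothetical data frame where 4 is the missing-value code
df <- data.frame(a = c(1, 4, 4),
                 b = c(2, 4, 3),
                 c = c(4, 4, 1))

# A row is entirely missing when the number of 4s equals the number of columns
all_missing <- rowSums(df == 4) == ncol(df)

# Keep every row that has at least one value other than 4
kept <- df[!all_missing, ]
kept
```

If the data also contain genuine NA values, `df == 4` yields NA for those cells, so `rowSums` would need `na.rm = TRUE` to give a usable count.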

answered Oct 17 '22 12:10

Mark Miller
