I have a dataframe called dd2. I need to paste together the values in Left.Gene.Symbols and Right.Gene.Symbols, which I can do with the code below, but I do not want NAs pasted in where there are missing values. I want the output to look like the combination column shown in result.
mycode
# convert the string 'NA' to a real missing value (NA)
dd2[dd2 == 'NA'] <- NA
# paste the two gene-symbol columns together, separated by "*"
result <- cbind(dd2, combination = paste(dd2[, "Left.Gene.Symbols"], dd2[, "Right.Gene.Symbols"], sep = "*"))
data
dd2<- structure(c("AMLM12001KP", "AMLM12001KP", "AMLM12001KP", "AMLM12001KP",
"AMLM12001KP", "AK2", "HFM1", "HFM1", "HFM1", "HFM1", NA, "PPT",
NA, "GGT", NA), .Dim = c(5L, 3L), .Dimnames = list(NULL, c("customer_sample_id",
"Left.Gene.Symbols", "Right.Gene.Symbols")))
result
customer_sample_id Left.Gene.Symbols Right.Gene.Symbols combination
[1,] "AMLM12001KP" "AK2" NA AK2*
[2,] "AMLM12001KP" "HFM1" "PPT" HFM1*PPT
[3,] "AMLM12001KP" "HFM1" NA HFM1*
[4,] "AMLM12001KP" "HFM1" "GGT" HFM1*GGT
[5,] "AMLM12001KP" "HFM1" NA HFM1*
The na.omit() function returns the object without any rows that contain NA values, so rows with NA or NaN values are dropped. It is a quick way to remove incomplete rows in R.
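A minimal sketch of that row-dropping behaviour, using a made-up data frame df (the names df and cleaned are assumptions for illustration, not the asker's data). Note that this removes whole rows, which is different from keeping the rows and only leaving NA out of the pasted string:

```r
# Hypothetical example data: rows 2 and 3 each contain one missing value
df <- data.frame(a = c(1, NA, 3), b = c("x", "y", NA))

# na.omit() keeps only the complete rows
cleaned <- na.omit(df)
nrow(cleaned)
```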
How do I concatenate two columns in R? To concatenate two columns you can use the paste() function. For example, to combine the two columns A and B in the data frame df, you can use the following code: df['AB'] <- paste(df$A, df$B)
There are two easy methods to select the columns of an R data frame that have no missing values; the first results in a vector and the other returns a matrix. For example, if we have a data frame called df, then the first method can be used as df[,colSums(is.na(df))==0] and the second method will be used as t(na.
na.action settings within R include na.omit and na.exclude, which return the object with observations removed if they contain any missing values; the differences between omitting and excluding NAs can be seen in some prediction and residual functions.
You could do something like this, temporarily replacing NA values with the empty string "".
cbind(
dd2,
combination = paste(dd2[,2], replace(dd2[,3], is.na(dd2[,3]), ""), sep = "*")
)
# customer_sample_id Left.Gene.Symbols Right.Gene.Symbols combination
# [1,] "AMLM12001KP" "AK2" NA "AK2*"
# [2,] "AMLM12001KP" "HFM1" "PPT" "HFM1*PPT"
# [3,] "AMLM12001KP" "HFM1" NA "HFM1*"
# [4,] "AMLM12001KP" "HFM1" "GGT" "HFM1*GGT"
# [5,] "AMLM12001KP" "HFM1" NA "HFM1*"
Of course, substitute your column names for the column numbers above. I didn't write them out because they are too long.
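For reference, the same idea written with the column names from the question might look like the sketch below. It rebuilds dd2 from the question's data so it runs on its own; the intermediate variable right is an assumption added for readability.

```r
# dd2 as given in the question (a 5 x 3 character matrix)
dd2 <- structure(c("AMLM12001KP", "AMLM12001KP", "AMLM12001KP", "AMLM12001KP",
                   "AMLM12001KP", "AK2", "HFM1", "HFM1", "HFM1", "HFM1", NA, "PPT",
                   NA, "GGT", NA), .Dim = c(5L, 3L), .Dimnames = list(NULL, c("customer_sample_id",
                   "Left.Gene.Symbols", "Right.Gene.Symbols")))

# Copy the right-hand column and blank out its NAs; dd2 itself is left untouched
right <- dd2[, "Right.Gene.Symbols"]
right[is.na(right)] <- ""

# Paste by column name and attach the result as a new column
result <- cbind(dd2, combination = paste(dd2[, "Left.Gene.Symbols"], right, sep = "*"))
result[, "combination"]
```

Because only the copy is modified, the Right.Gene.Symbols column in result still shows NA, as in the desired output.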
We can use NAer() from the qdap package together with sprintf():
library(qdap)
sprintf('%s*%s', dd2[,2],NAer(dd2[,3],''))
#[1] "AK2*" "HFM1*PPT" "HFM1*" "HFM1*GGT" "HFM1*"
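If installing qdap is not an option, a base R stand-in can be sketched as follows. The helper blank_na and the vectors left and right are hypothetical names introduced here for illustration; the helper blanks NAs on either side, so it also covers a missing left-hand value.

```r
# Hypothetical helper: replace NA with "" (a base R stand-in for the NAer call above)
blank_na <- function(x) ifelse(is.na(x), "", x)

# Made-up example vectors with an NA on each side
left  <- c("AK2", "HFM1", NA)
right <- c(NA, "PPT", "GGT")

# sprintf() is vectorised, so this pastes element by element
combined <- sprintf("%s*%s", blank_na(left), blank_na(right))
combined
```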