Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

aggregating columns of data frame

Tags:

dataframe

r

I have a data.frame as follows:

>data
    ID     Orginal   Modified
    Sam_1    M         K
    Sam_1    K         M
    Sam_1    I         J
    Sam_1    M         K
    Sam_1    K         M
    Sam_2    K         M
    Sam_2    M         K
    Sam_3    J         P
    Sam_4    K         M
    Sam_4    M         K
    Sam_4    P         J 

I would like to count the for every sample number times M in column "Original" is converted to K in column "Modified" and "K" ins the column "Original" to "M" in columns Modified and report it in tab delim text file as follows:

>newdata
    ID     M_to_K_counts  K_to_M_counts 
    Sam_1     2                2 
    Sam_2     1                1
    Sam_3     0                0
    Sam_4     1                1

I tried following code but it failed:

counts=function()
{
for(i in 1:dim(rnaseqmut)[1])
{
  mk_counts=0
  km_counts=0
  if(data$Original[i]=='M' & data$Modified[i]== 'K')
    {
       mk_counts=mk_counts+1
    }
  if(data$Original[i]=='K' & data$Modified[i]== 'M')
    {
       km_counts=km_counts+1
    }
}
print(mk_counts)
print(km_counts)
}

How can I achieve my desired format.

like image 429
Carol Avatar asked Dec 02 '25 05:12

Carol


2 Answers

One option would be using data.table. Convert the 'data.frame' to 'data.table' (setDT(data)). Grouped by the 'ID' column, we get the sum of elements that are 'M' for the 'Orginal' and 'K' for 'Modified' ('MtoKcount'), similarly the 'KtoMcount' is got by doing the reverse.

library(data.table)
setDT(data)[, list(MtoKcount=sum(Orginal=='M' & Modified=='K'),
               KtoMcount = sum(Orginal=='K' & Modified=='M')), by =  ID]
#       ID MtoKcount KtoMcount
#1: Sam_1         2         2
#2: Sam_2         1         1
#3: Sam_3         0         0
#4: Sam_4         1         1

Another option is table from base R. We paste the columns other than the 'ID' column (do.call(paste0, data[-1])) and get the frequency count using table. Then, we subset the table output ('tbl') that have only 'KM' or 'MK' as its column names

 tbl <- table(data$ID,do.call(paste0, data[-1]))[,c('KM', 'MK')]
 tbl
 #      KM MK
 #Sam_1  2  2
 #Sam_2  1  1
 #Sam_3  0  0
 #Sam_4  1  1

As @user295691 mentioned in the comments, we can change the column names while pasteing.

  tbl <- with(data, table(ID, paste0(Orginal, "_to_", Modified,"_counts"))) 
  tbl[,c('K_to_M_counts', 'M_to_K_counts')]

data

data <- structure(list(ID = c("Sam_1", "Sam_1", "Sam_1", "Sam_1", 
"Sam_1", 
"Sam_2", "Sam_2", "Sam_3", "Sam_4", "Sam_4", "Sam_4"), Orginal = c("M", 
"K", "I", "M", "K", "K", "M", "J", "K", "M", "P"), Modified = c("K", 
"M", "J", "K", "M", "M", "K", "P", "M", "K", "J")), .Names = c("ID", 
"Orginal", "Modified"), class = "data.frame", row.names = c(NA, 
-11L))
like image 70
akrun Avatar answered Dec 03 '25 22:12

akrun


Base R using xtabs. Desired shape/subsetting requires transposing and fiddling with container type classes.

d<-as.matrix(ftable(xtabs(Count~Orginal+Modified+ID,transform(data,Count=1))))
as.data.frame(t(d))[,c("M_K","K_M")]
      M_K K_M
Sam_1   2   2
Sam_2   1   1
Sam_3   0   0
Sam_4   1   1
like image 34
A. Webb Avatar answered Dec 03 '25 22:12

A. Webb



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!