I'm an enthusiastic R newbie who needs some help! :)
I have a data frame that looks like this:
id<-c(100,200,300,400)
a<-c(1,1,0,1)
b<-c(1,0,1,0)
c<-c(0,0,1,1)
y=data.frame(id=id,a=a,b=b,c=c)
Here id is a unique identifier (e.g. a person), and a, b and c are dummy variables indicating whether the person has that feature (as always, 1 = TRUE).
I want R to create a matrix or data frame that has the variables a, b and c as both the column names and the row names. For the values of the matrix, R will have to count the identifiers that have a given feature (on the diagonal) or a given combination of two features (off the diagonal).
For example, IDs 100, 200 and 400 have feature a, so on the diagonal, where a and a cross, R should put 3. Only ID 100 has both features a and b, so R should put 1 where a and b cross, and so forth.
The resulting data frame will have to look like this:
l<-c("","a","b","c")
m<-c("a",3,1,1)
n<-c("b",1,2,1)
o<-c("c",1,1,2)
result<-matrix(c(l,m,n,o),nrow=4,ncol=4)
As my data set has 10 variables and hundreds of observations, I will have to automate the whole process.
Your help will be greatly appreciated. Thanks a lot!
With base R:
crossprod(as.matrix(y[,-1]))
#   a b c
# a 3 1 1
# b 1 2 1
# c 1 1 2
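For anyone wondering why this works: crossprod(x) computes t(x) %*% x, so entry [i, j] is the sum over rows of x[, i] * x[, j]. With 0/1 dummies that product is 1 only when an id has both features, which makes the sum a count. The sketch below rebuilds the example from the question and selects the dummy columns by name instead of by position; the names x and co are just illustrative, and it assumes every column other than id is a 0/1 dummy.

```r
# Rebuild the example data from the question
y <- data.frame(id = c(100, 200, 300, 400),
                a  = c(1, 1, 0, 1),
                b  = c(1, 0, 1, 0),
                c  = c(0, 0, 1, 1))

# Keep every column except the identifier (assumes all others are 0/1 dummies)
x <- as.matrix(y[, setdiff(names(y), "id")])

# Co-occurrence counts: diagonal = ids with the feature,
# off-diagonal = ids with both features
co <- crossprod(x)

# The same matrix written out explicitly
co_explicit <- t(x) %*% x

co["a", "b"]  # number of ids that have both a and b
```

Because the columns are picked by name, the same lines should carry over to a data set with 10 dummy variables, as long as the identifier column is the only non-dummy one.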