Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

compare multiple columns and create count of matches

Tags:

loops

r

I have a dataset with ID numbers for respondents' friends and bullies.

I'd like to go through all the friendship nominations and all the bully nominations in each row and get a count of the number of people they nominate as both. Any help would be great!

HAVE DATA:

ID  friend_1  friend_2  friend_3  bully_1  bully_2
1          4        12         7       12       15
2          8         6         7       18       20
3          9        18         1        2        1
4         15         7         2        7       13 
5          1        17         9       17        1
6          9        19        20       14       12
7         19        12        20        9       12
8          7         1        16        2       15 
9          1        10        12        1        7
10         7        11         9       11        7

WANT DATA:

ID  friend_1  friend_2  friend_3  bully_1  bully_2  num_both
1          4        12         7       12       15         1
2          8         6         7       18       20         0
3          9        18         1        2        1         1
4         15         7         2        7       13         1
5          1        17         9       17        1         2
6          9        19        20       14       12         0
7         19        12        20        9       12         1
8          7         1        16        2       15         0
9          1        10        12        1        7         1
10         7        11         9       11        7         2
like image 913
L. Tucker Avatar asked Nov 18 '25 08:11

L. Tucker


2 Answers

Assuming that values are unique within friend/bully groups, a simple approach would be:

apply(df[,-1], 1, function (x) sum(table(x) > 1)) 
[1] 1 0 1 1 2 0 1 0 1 2
like image 77
Ritchie Sacramento Avatar answered Nov 20 '25 20:11

Ritchie Sacramento


We can use apply row-wise and find out the the number of common friends which are present both in friend and bully columns

df$num_both <- apply(df, 1, function(x) 
      length(intersect(x[grep("friend", names(df))], x[grep("bully", names(df))])))


#   ID friend_1 friend_2 friend_3 bully_1 bully_2 num_both
#1   1        4       12        7      12      15        1
#2   2        8        6        7      18      20        0
#3   3        9       18        1       2       1        1
#4   4       15        7        2       7      13        1
#5   5        1       17        9      17       1        2
#6   6        9       19       20      14      12        0
#7   7       19       12       20       9      12        1
#8   8        7        1       16       2      15        0
#9   9        1       10       12       1       7        1
#10 10        7       11        9      11       7        2

Or if you are not a big fan of apply, you can use sapply with the same logic

friend_cols <- grep("friend", names(df))
bully_cols <- grep("bully", names(df))

sapply(seq_len(nrow(df)), function(i) 
 length(intersect(df[i, friend_cols, drop = TRUE], df[i, bully_cols, drop = TRUE])))

#[1] 1 0 1 1 2 0 1 0 1 2

EDIT

If there are some NA values and we want to exclude them we can use is.na and sum

apply(df, 1, function(x) sum(!is.na(intersect(x[friend_cols], x[bully_cols]))))
like image 40
Ronak Shah Avatar answered Nov 20 '25 21:11

Ronak Shah



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!