Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Linking records with record expansion in R [duplicate]

I have a large dataframe that looks something like that:

id x
1 123
2 123
2 125
3 125
4 200
tibble(id = c(1, 2, 2, 3, 4),
       x = c(123, 123, 125, 125, 200))

I want to link my records and expand them. I.e. where they have the same x, they should have the same id. Note, record id 2 & 3 are considered same entity because of their shared link with 1. In other words, without record id 1, records 2 and 3 will not link. I'd also like to give it the smallest id value where the records link.

Results should look something like:

id x
1 123
1 123
1 125
1 125
4 200
like image 589
sparklink Avatar asked Dec 07 '25 07:12

sparklink


1 Answers

An equivalent way to restate this problem is "if the data frame represented the 'to' and 'from' nodes of a directed graph, what would be the first ID of each unique component of the graph?"

We can show this graphically:

library(igraph)
library(tidyverse)

df <- tibble(id = c(1, 2, 2, 3, 4),
             x = c(123, 123, 125, 125, 200))

g <- graph_from_data_frame(df)

If we plot the graph, we will see that there are two unconnected clusters of nodes (called components); one for the grouped IDs 1:3, and one for ID 4:

plot(g)

We can use the function components to work out which group each ID belongs to, and get the first ID in each group:

comp <- components(g)$membership

df %>%
  mutate(group = comp[match(id, names(comp))]) %>%
  mutate(id = first(id), .by = 'group') %>%
  select(-group)
#> # A tibble: 5 x 2
#>      id     x
#>   <dbl> <dbl>
#> 1     1   123
#> 2     1   123
#> 3     1   125
#> 4     1   125
#> 5     4   200

Created on 2023-09-07 with reprex v2.0.2

like image 145
Allan Cameron Avatar answered Dec 09 '25 15:12

Allan Cameron



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!