How to identify and merge connected cases in a list of integer vectors

Question

I have a list whose elements are either single integers or integer vectors.

int_list <- list(
    c(1, 15),
    c(3, 19),
    c(2, 16),
    c(4, 19),
    c(5, 21),
    c(19, 28),
    c(28, 30),
    17
)

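As you can see, the second and fourth elements both contain the integer 19. I want to merge such cases into one vector. The merging should also follow chains: c(19, 28) shares 19 with them, and c(28, 30) shares 28 with c(19, 28), so all four end up in the same vector. The expected result for this example is therefore: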

int_list_merged <- list(
    c(1, 15),
    c(3, 4, 19, 28, 30),
    c(2, 16),
    c(5, 21),
    17
)
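Put differently, two elements are "connected" when they share at least one value, and each merged vector is a chain of such connections. A minimal sketch (using combn() and intersect(); not part of the original question) that lists the directly connected pairs of positions:

```r
int_list <- list(c(1, 15), c(3, 19), c(2, 16), c(4, 19), c(5, 21), c(19, 28), c(28, 30), 17)

# All index pairs (i, j) with i < j, one pair per column
pairs <- combn(length(int_list), 2)

# TRUE for each pair whose two elements share at least one value
shares_value <- apply(pairs, 2, function(p) {
  length(intersect(int_list[[p[1]]], int_list[[p[2]]])) > 0
})

# One connected pair per row: (2, 4), (2, 6), (4, 6), (6, 7)
connected <- t(pairs[, shares_value, drop = FALSE])
connected
```

Element 7 (c(28, 30)) shares nothing with element 2 directly; it only joins the group through element 6, which is why pairwise matching alone is not enough.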

How can I achieve this?

SamR · Accepted Answer

The requirement that list(c(1,2), c(2,3), c(3,4)) should return list(c(1,2,3,4)) means a single pass is not enough: we have to keep passing over the list until it stops changing.

We can use a recursive function that stops calling itself once that happens.

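create_merged_list <- function(l, check_finished_list = NULL) {
    # One pass: replace every element by its merge with the elements it
    # overlaps, then drop the duplicates this creates
    new_l <- unique(lapply(seq_along(l), \(i) merge_elements(l, i)))

    # Stop once this pass returns the same list that was passed to the
    # previous call (check_finished_list); otherwise recurse
    if (identical(check_finished_list, new_l)) {
        return(new_l)
    }
    create_merged_list(new_l, l)
}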

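The workhorse called by create_merged_list() is merge_elements(). For the element at position i, it finds every other element that shares at least one value with it and returns the sorted union of all of them; create_merged_list() applies it to every position and removes the resulting duplicates with unique().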

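merge_elements <- function(l, i) {
    l_compare <- l[-i]
    el <- l[[i]]
    # Positions in the flattened l_compare whose value equals some value of el
    match_vals <- which(outer(el, unlist(l_compare), \(x, y) x == y), arr.ind = TRUE)[, "col"]

    # No shared values: keep the element unchanged
    if (!length(match_vals)) {
        return(el)
    }

    # Last flattened position of each element of l_compare
    l_breaks <- cumsum(lengths(l_compare))

    # Map each matched position back to the element of l_compare it came from
    l_match_idx <- vapply(match_vals, \(x) min(which(x <= l_breaks)), integer(1))
    # Sorted union of el and every element it shares a value with
    new_el <- sort(unique(c(el, unlist(l_compare[l_match_idx]))))
    new_el
}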
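The least obvious step in merge_elements() is mapping a matched column of the outer() result back to the element it came from. A standalone sketch of just that mapping, on a small illustrative list (not taken from the answer), comparing the cumsum(lengths()) lookup with a swapped-in alternative that labels every flattened position with its element index via rep():

```r
# Illustrative comparison list; we look for the value 19
l_compare <- list(c(3, 19), c(2, 16), c(4, 19), c(28, 30))
flat <- unlist(l_compare)                # 3 19 2 16 4 19 28 30
match_vals <- which(flat == 19)          # flattened positions 2 and 6

# The answer's lookup: first element whose last position is >= the match
l_breaks <- cumsum(lengths(l_compare))   # 2 4 6 8
idx_breaks <- vapply(match_vals, \(x) min(which(x <= l_breaks)), integer(1))

# Alternative: label each flattened position with its element index
owner <- rep(seq_along(l_compare), lengths(l_compare))  # 1 1 2 2 3 3 4 4
idx_owner <- owner[match_vals]

idx_breaks  # elements 1 and 3
idx_owner   # elements 1 and 3
```

The rep() version builds the lookup table once instead of scanning l_breaks for every match; both need R >= 4.1 only because of the \(x) lambda syntax.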

Output

create_merged_list(int_list) 
# [[1]]
# [1]  1 15

# [[2]]
# [1]  3  4 19 28 30

# [[3]]
# [1]  2 16

# [[4]]
# [1]  5 21

# [[5]]
# [1] 17

identical(create_merged_list(int_list), int_list_merged) 
# [1] TRUE
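The "repeat until the list stops changing" idea can also be written as a loop instead of recursion, which avoids deep call stacks on long chains. A minimal sketch under these assumptions: iterate_until_stable() and merge_step() are hypothetical names, and merge_step() matches values with intersect() rather than the answer's outer() approach.

```r
# Hypothetical helper: one pass that replaces every element with the sorted
# union of itself and all elements sharing at least one value with it
merge_step <- function(l) {
  merged <- lapply(l, function(el) {
    overlapping <- Filter(function(other) length(intersect(el, other)) > 0, l)
    sort(unique(unlist(overlapping)))
  })
  unique(merged)
}

# Hypothetical helper: apply `step` until its output equals its input
iterate_until_stable <- function(x, step, max_iter = 1000L) {
  for (i in seq_len(max_iter)) {
    new_x <- step(x)
    if (identical(new_x, x)) return(new_x)
    x <- new_x
  }
  stop("no fixed point reached within max_iter iterations")
}

chain <- list(c(1, 2), c(2, 3), c(3, 4))
iterate_until_stable(chain, merge_step)  # list(c(1, 2, 3, 4))
```

Here each pass is compared with the list it was given, so the loop stops on the first pass that changes nothing. Separate groups stay separate: list(c(1, 2), c(2, 3), c(5, 6), c(6, 7)) ends as list(c(1, 2, 3), c(5, 6, 7)).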

Andre Wildberg · Answer

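An approach that uses expand.grid() to build all pairs of list positions, followed by two mapply() calls: one collects the values of every pair of elements that overlap, and the other collects the positions of those elements, so the elements that overlap with nothing can be appended unchanged. Note that all overlapping pairs are pooled into a single vector, so this gives the expected result here, where there is only one connected group, but two unrelated groups (e.g. list(c(1, 2), c(2, 3), c(5, 6), c(6, 7))) would be merged into one vector.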

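Merge <- function(List){
  Seq <- seq_along(List)
  # All (i, j) combinations of list positions
  ExpSeq <- expand.grid(Seq, Seq)
  # Rows with i < j, so each unordered pair is checked once
  rows <- which(upper.tri(matrix(Seq, ncol=max(Seq), nrow=max(Seq))))

  c(
    # 1) Values of every overlapping pair, pooled into one de-duplicated vector
    list(as.vector(na.omit(unique(unlist(
    mapply(\(x, y)
      ifelse(any(List[[x]] %in% List[[y]]), list(c(List[[x]], List[[y]])), NA),
        ExpSeq[rows,1], ExpSeq[rows,2])))))),

    # 2) The elements whose positions took part in no overlapping pair
    List[Seq[!Seq %in% na.omit(unique(unlist(
    mapply(\(x, y)
      ifelse(any(List[[x]] %in% List[[y]]), list(c(x, y)), NA),
        ExpSeq[rows,1], ExpSeq[rows,2]))))]]
  )
}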

Output

Merge(int_list)
# [[1]]
# [1]  3 19  4 28 30
#
# [[2]]
# [1]  1 15
#
# [[3]]
# [1]  2 16
#
# [[4]]
# [1]  5 21
#
# [[5]]
# [1] 17
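Merge() relies on expand.grid() plus upper.tri() to visit each unordered pair of list positions exactly once. A small sketch checking that idea, with utils::combn() shown as an alternative way to get the same pairs (n = 4 is an arbitrary choice for illustration):

```r
n <- 4
seq_n <- seq_len(n)

# All (Var1, Var2) combinations; Var1 varies fastest, which matches the
# column-major order of an n x n matrix
grid <- expand.grid(seq_n, seq_n)

# Linear indices of the strict upper triangle select the rows with Var1 < Var2
rows <- which(upper.tri(matrix(0, nrow = n, ncol = n)))
pairs_grid <- grid[rows, ]

# combn() yields the same unordered pairs, though in a different order
pairs_combn <- t(combn(n, 2))
```

combn() skips building the full n x n grid, which matters little here but scales better for long lists.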

Tags: algorithm, list, r, cluster-analysis

Asked by Corbjn
