For each ID, I would like to select the two Cq values that are closest to each other. I thought I had figured it out, but my result depends on the row order...
Here is an example of the form of my dataset:
df <- data.frame(ID = c("A","A","A","B","B","B","C","C","C"),
Cq = c(34.32,34.40,34.31,31.49,31.40,31.49,31.22,31.31,31.08))
ID Cq
1 A 34.32
2 A 34.40
3 A 34.31
4 B 31.49
5 B 31.40
6 B 31.49
7 C 31.22
8 C 31.31
9 C 31.08
And here is what I tried:
library(dplyr)
df4 <- df %>%
  group_by(ID) %>%
  arrange(Cq) %>%
  mutate(diffvals = Cq - lag(Cq)) %>%
  filter(row_number() == 1 | row_number() == 2)
#Output
ID Cq diffvals
1 A 34.31 NA
2 A 34.32 0.0100
3 B 31.40 NA
4 B 31.49 0.0900
5 C 31.08 NA
6 C 31.22 0.14
And the expected output:
ID Cq
1 A 34.32
2 A 34.31
3 B 31.49
4 B 31.49
5 C 31.22
6 C 31.31
I've tried sorting my dataset beforehand, but it doesn't change anything. I also tried using filter() with which.min() on diffvals, but I don't know how to extract the two smallest values.
If you have any ideas, it would help me a lot!
Thanks in advance
Here is a base R solution, where dist() computes the distances between all pairs of values within each group:
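dfout <- do.call(rbind,
                 lapply(split(df, df$ID),
                        function(v) {
                          # absolute differences between all pairs of Cq values in the group
                          d <- `diag<-`(as.matrix(dist(v$Cq)), NA)
                          # keep each pair only once (upper triangle)
                          d[lower.tri(d)] <- NA
                          # rows of the pair(s) with the smallest distance
                          v[which(d == min(d, na.rm = TRUE), arr.ind = TRUE), ]
                        }
                 ))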
which gives:
> dfout
ID Cq
A.1 A 34.32
A.3 A 34.31
B.4 B 31.49
B.6 B 31.49
C.7 C 31.22
C.8 C 31.31
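To see what the dist() step does, here is the same computation run by hand on group A only. This is an illustrative sketch: it folds the answer's two NA assignments into a single lower.tri(d, diag = TRUE) call, which masks the same cells, and it converts the index matrix to a plain vector before subsetting.

```r
# Group A from the question
v <- data.frame(ID = "A", Cq = c(34.32, 34.40, 34.31))

d <- as.matrix(dist(v$Cq))           # 3 x 3 matrix of |Cq_i - Cq_j|
d[lower.tri(d, diag = TRUE)] <- NA   # keep each pair once (upper triangle only)
d

# row/column position of the smallest remaining distance
idx <- which(d == min(d, na.rm = TRUE), arr.ind = TRUE)
rows <- as.vector(idx)               # the two row numbers forming that pair
v[rows, ]                            # rows 1 and 3: 34.32 and 34.31
```

Note that if several pairs share the minimum distance, which() returns all of them, so a group can contribute more than two rows with this approach.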
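Using dplyr, one option is to join the data with itself on ID, remove the rows that pair a value with itself, keep the pair with the smallest difference for each ID, and reshape the result to long format. Self-pairs are identified by row number rather than by value, so that two equal values (as in group B) still count as a valid pair.
library(dplyr)
library(tidyr)

df_rows <- df %>% mutate(Row = row_number())

df_rows %>%
  # every pair of rows within the same ID (`relationship` needs dplyr >= 1.1.0)
  full_join(df_rows, by = "ID", relationship = "many-to-many") %>%
  # drop self-pairs and keep each unordered pair once
  filter(Row.x < Row.y) %>%
  group_by(ID) %>%
  # pair with the smallest absolute difference
  slice(which.min(abs(Cq.x - Cq.y))) %>%
  ungroup() %>%
  pivot_longer(cols = c(Cq.x, Cq.y), values_to = "Cq") %>%
  select(ID, Cq)
# tibble output (Cq is rounded to 3 significant digits when printed)
# ID Cq
#1 A 34.3
#2 A 34.3
#3 B 31.5
#4 B 31.5
#5 C 31.2
#6 C 31.3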
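The sort-and-diff idea from the question can also be made to work: once a group is sorted, the two closest values are always neighbours, so it is enough to find the smallest consecutive difference and keep that pair of rows, instead of always keeping rows 1 and 2. A minimal base R sketch, assuming every ID has at least two rows; closest_pair is a hypothetical helper name, not taken from either answer:

```r
# Data from the question
df <- data.frame(ID = c("A","A","A","B","B","B","C","C","C"),
                 Cq = c(34.32,34.40,34.31,31.49,31.40,31.49,31.22,31.31,31.08))

# Hypothetical helper: closest pair of rows within one group.
# After sorting, the closest pair must be adjacent, so only the
# n - 1 consecutive differences need to be checked.
closest_pair <- function(v) {
  v <- v[order(v$Cq), ]   # sort this group's rows by Cq
  gaps <- diff(v$Cq)      # consecutive differences (length n - 1)
  i <- which.min(gaps)    # first smallest gap
  v[c(i, i + 1), ]        # the two rows on either side of that gap
}

res <- do.call(rbind, lapply(split(df, df$ID), closest_pair))
res
```

Because which.min() returns only the first minimum, each ID always contributes exactly two rows, even when two gaps tie.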