I have a data frame which has distances from a unit's centroid to different points. The points are identified by numbers, and I am trying to obtain a new column containing the distance to the closest object.
So the data frame looks like this:
FID <- c(12, 12, 14, 15, 17, 18)
year <- c(1990, 1994, 1983, 1953, 1957, 2000)
centroid_distance_1 <- c(220.3, 220.3, 515.6, NA, 200.2, 22)
centroid_distance_2 <- c(520, 520, 24.3, NA , NA, 51.8)
centroid_distance_3 <- c(NA, 12.8, 124.2, NA, NA, 18.8)
centroid_distance_4 <- c(725.3, 725.3, 44.2, NA, 62.9, 217.9)
sample2 <- data.frame(FID, year, centroid_distance_1, centroid_distance_2, centroid_distance_3, centroid_distance_4)
sample2
  FID year centroid_distance_1 centroid_distance_2 centroid_distance_3 centroid_distance_4
1  12 1990               220.3               520.0                  NA               725.3
2  12 1994               220.3               520.0                12.8               725.3
3  14 1983               515.6                24.3               124.2                44.2
4  15 1953                  NA                  NA                  NA                  NA
5  17 1957               200.2                  NA                  NA                62.9
6  18 2000                22.0                51.8                18.8               217.9
FID is an identifier of each unit and year is a year indicator. Each row is a FID*year pair. centroid_distance_x is the distance between the row's centroid and object x. This is a small sample of the data frame, which contains many more columns and rows.
What I am looking for is something like this:
short_distance <- c(220.3, 12.8, 24.3, NA, 62.9,18.8)
unit <- c(1, 3, 2, NA, 4, 3)
ideal.df <- data.frame(FID, year, short_distance, unit)
ideal.df
  FID year short_distance unit
1  12 1990          220.3    1
2  12 1994           12.8    3
3  14 1983           24.3    2
4  15 1953             NA   NA
5  17 1957           62.9    4
6  18 2000           18.8    3
Basically, I add one column named short_distance, which holds the lowest value a row takes across all the centroid_distance_* columns above, and one named unit, which identifies the object to which each row has the smallest distance (so if a row has its smallest value in centroid_distance_1, unit takes the value 1).
I have tried a bunch of things with dplyr and pivot and re-pivoting the dataframe but I'm really not getting there.
Thanks a lot for the help!
Another solution based on the tidyverse - using pivot_longer - could look as follows.
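library(dplyr)
library(tidyr)
library(stringr)

sample2 %>%
  # One row per FID/year/object
  pivot_longer(-c(FID, year)) %>%
  group_by(year, FID) %>%
  # Keep the smallest distance; an NA row survives only if the whole group is NA
  slice_min(value, n = 1, with_ties = FALSE) %>%
  # Strip the prefix (also works beyond 9 objects); no unit when the distance is NA
  mutate(unit = if_else(is.na(value), NA_character_,
                        str_remove(name, "^centroid_distance_"))) %>%
  select(-name, short_distance = value)

# Groups:   year, FID [6]
#     FID  year short_distance unit
#   <dbl> <dbl>          <dbl> <chr>
# 1    15  1953           NA   NA
# 2    17  1957           62.9 4
# 3    14  1983           24.3 2
# 4    12  1990          220.  1
# 5    12  1994           12.8 3
# 6    18  2000           18.8 3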
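If unit should come out numeric, as in the desired ideal.df, one option is to let pivot_longer parse the object number itself. The following is only a sketch under that assumption: it uses the names_prefix and names_transform arguments of pivot_longer, and the result name closest is just an illustrative choice, not something from the question.

```r
library(dplyr)
library(tidyr)

# Sample data from the question
sample2 <- data.frame(
  FID  = c(12, 12, 14, 15, 17, 18),
  year = c(1990, 1994, 1983, 1953, 1957, 2000),
  centroid_distance_1 = c(220.3, 220.3, 515.6, NA, 200.2, 22),
  centroid_distance_2 = c(520, 520, 24.3, NA, NA, 51.8),
  centroid_distance_3 = c(NA, 12.8, 124.2, NA, NA, 18.8),
  centroid_distance_4 = c(725.3, 725.3, 44.2, NA, 62.9, 217.9)
)

closest <- sample2 %>%
  pivot_longer(
    starts_with("centroid_distance_"),
    names_to = "unit",
    names_prefix = "centroid_distance_",        # drop the prefix, keep the number
    names_transform = list(unit = as.integer),  # store the number as an integer
    values_to = "short_distance"
  ) %>%
  group_by(FID, year) %>%
  # Smallest distance per FID/year; an NA row is kept only if the group is all NA
  slice_min(short_distance, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  # A group whose distances are all NA has no closest object
  mutate(unit = if_else(is.na(short_distance), NA_integer_, unit)) %>%
  select(FID, year, short_distance, unit)

closest
```

Grouping by FID first keeps the rows in the same order as the original data frame for this sample, so the result should be directly comparable with ideal.df.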
My first couple of attempts at this weren't working like I imagined, either - couldn't always get the NA behavior you want - but here's one that works:
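library(dplyr)
library(reshape2) # Or use tidyr if you prefer

sample2 %>%
  # Melt/unpivot to one value per row
  melt(id.vars = c("FID", "year")) %>%
  # Extract the unit number
  mutate(
    unit = sub(x = variable,
               pattern = "^centroid_distance_",
               replacement = "")
  ) %>%
  group_by(FID, year) %>%  # Group by FID and year to get one row of output for each
  arrange(value) %>%       # Put smallest distance at the top of each group (NAs sort last)
  slice_head(n = 1) %>%    # Take one row from the top of each group
  ungroup() %>%
  # Leave unit missing when every distance in the group is NA
  mutate(unit = if_else(is.na(value), NA_character_, unit)) %>%
  # Keep only the requested columns
  select(FID, year, short_distance = value, unit)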
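For comparison, the same result can be sketched in base R without reshaping at all, which may matter when the real data frame has many more rows and columns. This is a minimal sketch, assuming every distance column shares the centroid_distance_ prefix followed by the object number; the helper names dist_cols, dist_mat, closest and result are made up for illustration.

```r
# Sample data from the question
sample2 <- data.frame(
  FID  = c(12, 12, 14, 15, 17, 18),
  year = c(1990, 1994, 1983, 1953, 1957, 2000),
  centroid_distance_1 = c(220.3, 220.3, 515.6, NA, 200.2, 22),
  centroid_distance_2 = c(520, 520, 24.3, NA, NA, 51.8),
  centroid_distance_3 = c(NA, 12.8, 124.2, NA, NA, 18.8),
  centroid_distance_4 = c(725.3, 725.3, 44.2, NA, 62.9, 217.9)
)

# Distance columns, picked by their shared prefix (an assumption about the real data)
dist_cols <- grep("^centroid_distance_", names(sample2), value = TRUE)
dist_mat  <- as.matrix(sample2[dist_cols])

# Column position of the smallest distance in each row.
# which.min() skips NA and returns integer(0) when the whole row is NA.
closest <- unname(apply(dist_mat, 1, function(d) {
  i <- which.min(d)
  if (length(i) == 0) NA_integer_ else i
}))

# Object numbers parsed from the column names
unit_ids <- as.integer(sub("^centroid_distance_", "", dist_cols))

result <- data.frame(
  FID  = sample2$FID,
  year = sample2$year,
  # Matrix indexing by (row, column) pairs; a pair containing NA yields NA
  short_distance = dist_mat[cbind(seq_len(nrow(dist_mat)), closest)],
  unit = unit_ids[closest]
)

result
```

On the sample data this gives short_distance 220.3, 12.8, 24.3, NA, 62.9, 18.8 and unit 1, 3, 2, NA, 4, 3, in the original row order.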