Attribute value to new column based on values in similarly called columns

Question

I have a data frame which has distances from a unit's centroid to different points. The points are identified by numbers, and what I am trying to obtain is a new column giving the distance to the closest object.

So the data frame looks like this:

FID <- c(12, 12, 14, 15, 17, 18)
year <- c(1990, 1994, 1983, 1953, 1957, 2000)
centroid_distance_1 <- c(220.3, 220.3, 515.6, NA, 200.2, 22)
centroid_distance_2 <- c(520, 520, 24.3, NA , NA, 51.8)
centroid_distance_3 <- c(NA, 12.8, 124.2, NA, NA, 18.8)
centroid_distance_4 <- c(725.3, 725.3, 44.2, NA, 62.9, 217.9)
sample2 <- data.frame(FID, year, centroid_distance_1, centroid_distance_2, centroid_distance_3, centroid_distance_4)


sample2
  FID year centroid_distance_1 centroid_distance_2 centroid_distance_3 centroid_distance_4
1  12 1990               220.3               520.0                  NA               725.3
2  12 1994               220.3               520.0                12.8               725.3
3  14 1983               515.6                24.3               124.2                44.2
4  15 1953                  NA                  NA                  NA                  NA
5  17 1957               200.2                  NA                  NA                62.9
6  18 2000                22.0                51.8                18.8               217.9

FID is an identifier for each unit and year is a year indicator. Each row is a FID*year pair. centroid_distance_x is the distance between the row's centroid and object x. This is a small sample of the data frame, which contains many more columns and rows.

What I am looking for is something like this:

short_distance <- c(220.3, 12.8, 24.3, NA, 62.9,18.8)
unit <- c(1, 3, 2, NA, 4, 3)
ideal.df <- data.frame(FID, year, short_distance, unit)

ideal.df
  FID year short_distance unit
1  12 1990          220.3    1
2  12 1994           12.8    3
3  14 1983           24.3    2
4  15 1953             NA   NA
5  17 1957           62.9    4
6  18 2000           18.8    3

Basically, I want to add two columns: one named short_distance, which holds the smallest value in each row across all the centroid_distance_* columns, and one named unit, which identifies the object the row is closest to (so if a row has its smallest value in centroid_distance_1, unit takes the value 1).

I have tried a bunch of things with dplyr and pivot and re-pivoting the dataframe but I'm really not getting there.

Thanks a lot for the help!

rjen · Accepted Answer

Another solution based on the tidyverse, using pivot_longer, could look as follows.

library(dplyr)
library(tidyr)
library(stringr)

sample2 %>%
  pivot_longer(-c(FID, year)) %>%
  group_by(year, FID) %>%
  slice_min(value, n = 1, with_ties = FALSE) %>%
  mutate(unit = str_sub(name, -1)) %>%
  select(-name, short_distance = value)

# Groups:   year, FID [6]
#     FID  year short_distance unit 
#   <dbl> <dbl>          <dbl> <chr>
# 1    15  1953           NA   1    
# 2    17  1957           62.9 4    
# 3    14  1983           24.3 2    
# 4    12  1990          220.  1    
# 5    12  1994           12.8 3    
# 6    18  2000           18.8 3
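
Two things to watch for when adapting the pipeline above: str_sub(name, -1) keeps only the last character of the column name, so it would misread a column such as centroid_distance_12, and the all-NA row (FID 15) comes back with unit "1" rather than NA. The variant below is a sketch, not part of the original answer. It assumes every distance column is named centroid_distance_ followed by an integer, and the result name closest is made up for illustration.

```r
library(dplyr)
library(tidyr)

# Sample data from the question
sample2 <- data.frame(
  FID  = c(12, 12, 14, 15, 17, 18),
  year = c(1990, 1994, 1983, 1953, 1957, 2000),
  centroid_distance_1 = c(220.3, 220.3, 515.6, NA, 200.2, 22),
  centroid_distance_2 = c(520, 520, 24.3, NA, NA, 51.8),
  centroid_distance_3 = c(NA, 12.8, 124.2, NA, NA, 18.8),
  centroid_distance_4 = c(725.3, 725.3, 44.2, NA, 62.9, 217.9)
)

closest <- sample2 %>%
  # Strip the shared prefix so the remaining name is the object number
  pivot_longer(
    starts_with("centroid_distance_"),
    names_to = "unit",
    names_prefix = "centroid_distance_",
    values_to = "short_distance"
  ) %>%
  group_by(FID, year) %>%
  # Keep the single smallest distance per FID/year pair
  slice_min(short_distance, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  # A row with no distance at all has no closest object
  mutate(unit = if_else(is.na(short_distance), NA_integer_, as.integer(unit))) %>%
  select(FID, year, short_distance, unit)

closest
```

The groups come back ordered by FID and then year, which for this sample matches the row order of ideal.df.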

Matt Parker · Answer

My first couple of attempts at this weren't working like I imagined, either - couldn't always get the NA behavior you want - but here's one that works:

library(dplyr)
library(reshape2) # Or use tidyr if you prefer


sample2 %>%
    # Melt/unpivot to one value per row
    melt(id.vars = c("FID", "year")) %>%
    # Extract the unit number
    mutate(
        unit = sub(x = variable,
                   pattern = "^centroid_distance_",
                   replacement = "")
    ) %>%
    group_by(FID, year) %>% # Group by FID and year to get one row of output for each
    arrange(value) %>%      # Put smallest distance at the top of each group
    slice_head(n = 1)       # Take one row from the top of each group
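
For comparison, the same per-row minimum can be sketched in base R without reshaping. This is not taken from either answer. It assumes the distance columns all share the centroid_distance_ prefix followed by an integer, and the names dist_cols, d, idx, units, and result are illustrative only.

```r
# Sample data from the question
sample2 <- data.frame(
  FID  = c(12, 12, 14, 15, 17, 18),
  year = c(1990, 1994, 1983, 1953, 1957, 2000),
  centroid_distance_1 = c(220.3, 220.3, 515.6, NA, 200.2, 22),
  centroid_distance_2 = c(520, 520, 24.3, NA, NA, 51.8),
  centroid_distance_3 = c(NA, 12.8, 124.2, NA, NA, 18.8),
  centroid_distance_4 = c(725.3, 725.3, 44.2, NA, 62.9, 217.9)
)

# Pick out the distance columns and treat them as a numeric matrix
dist_cols <- grep("^centroid_distance_", names(sample2), value = TRUE)
d <- as.matrix(sample2[dist_cols])

# Column position of each row's minimum; NA when the whole row is missing
idx <- unname(apply(d, 1, function(r) {
  if (all(is.na(r))) NA_integer_ else which.min(r)
}))

# Object numbers parsed from the column names
units <- as.integer(sub("^centroid_distance_", "", dist_cols))

result <- data.frame(
  FID = sample2$FID,
  year = sample2$year,
  short_distance = d[cbind(seq_len(nrow(d)), idx)],  # NA index yields NA
  unit = units[idx]
)

result
```

Because which.min() skips missing values and returns the first minimum, ties resolve to the lowest-numbered column, and rows keep their original order.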

Tags: r, data.table, dplyr, pivot-table, tidyverse

AntVal
