Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to replace values of a variable by the most frequent value in R

Tags:

r

I am working with R and have a data frame that includes names and corresponding office addresses. But the addresses are not perfectly matched with names and I have to replace the wrong addresses based on the most frequent value. Here is an example:

ID   Clients ADDRESS    FIRST_NAME   LAST_NAME
1     A       Cobben      Cui          Z 
2     B       Beerstaat   Cui          Z
3     C       River       Cui          Z
4     D       Cobben      Cui          Z
5     E       Cobben      Cui          Z
6     F       Cobben      Cui          Z
6     F       Hart        Mike         G
7     H       Hart        Mike         G
8     I       Sparren     Mike         G

And I would like a data frame that looks like this:

ID   Clients ADDRESS    FIRST_NAME   LAST_NAME
1     A       Cobben      Cui          Z 
2     B       Cobben      Cui          Z
3     C       Cobben      Cui          Z
4     D       Cobben      Cui          Z
5     E       Cobben      Cui          Z
6     F       Cobben      Cui          Z
6     F       Hart        Mike         G
7     H       Hart        Mike         G
8     I       Hart        Mike         G

I think it can be done by creating a frequency table and I was wondering whether there is a less complicated method.

Thanks a lot!

like image 331
cara Avatar asked Nov 16 '25 10:11

cara


1 Answers

A dplyr approach using add_count and which.max may look like so:

library(dplyr)

d %>% 
  add_count(LAST_NAME, ADDRESS) %>% 
  group_by(LAST_NAME) %>% 
  mutate(ADDRESS = ADDRESS[which.max(n)]) %>% 
  ungroup() %>% 
  select(-n)
#> # A tibble: 9 × 5
#>      ID Clients ADDRESS FIRST_NAME LAST_NAME
#>   <int> <chr>   <chr>   <chr>      <chr>    
#> 1     1 A       Cobben  Cui        Z        
#> 2     2 B       Cobben  Cui        Z        
#> 3     3 C       Cobben  Cui        Z        
#> 4     4 D       Cobben  Cui        Z        
#> 5     5 E       Cobben  Cui        Z        
#> 6     6 F       Cobben  Cui        Z        
#> 7     6 F       Hart    Mike       G        
#> 8     7 H       Hart    Mike       G        
#> 9     8 I       Hart    Mike       G

DATA

d <- structure(list(ID = c(1L, 2L, 3L, 4L, 5L, 6L, 6L, 7L, 8L), Clients = c(
  "A",
  "B", "C", "D", "E", "F", "F", "H", "I"
), ADDRESS = c(
  "Cobben",
  "Beerstaat", "River", "Cobben", "Cobben", "Cobben", "Hart", "Hart",
  "Sparren"
), FIRST_NAME = c(
  "Cui", "Cui", "Cui", "Cui", "Cui",
  "Cui", "Mike", "Mike", "Mike"
), LAST_NAME = c(
  "Z", "Z", "Z",
  "Z", "Z", "Z", "G", "G", "G"
)), class = "data.frame", row.names = c(
  NA,
  -9L
))
like image 169
stefan Avatar answered Nov 18 '25 20:11

stefan



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!