I have in R the following data frame:
ID = c(rep(1,5),rep(2,3),rep(3,2),rep(4,6));ID
VAR = c("A","A","A","A","B","C","C","D",
             "E","E","F","A","B","F","C","F");VAR
CATEGORY = c("ANE","ANE","ANA","ANB","ANE","BOO","BOA","BOO",
        "CAT","CAT","DOG","ANE","ANE","DOG","FUT","DOG");CATEGORY
DATA = data.frame(ID,VAR,CATEGORY);DATA
That looks like this table below :
| ID | VAR | CATEGORY | 
|---|---|---|
| 1 | A | ANE | 
| 1 | A | ANE | 
| 1 | A | ANA | 
| 1 | A | ANB | 
| 1 | B | ANE | 
| 2 | C | BOO | 
| 2 | C | BOA | 
| 2 | D | BOO | 
| 3 | E | CAT | 
| 3 | E | CAT | 
| 4 | F | DOG | 
| 4 | A | ANE | 
| 4 | B | ANE | 
| 4 | F | DOG | 
| 4 | C | FUT | 
| 4 | F | DOG | 
ideal output given the above data frame in R I want to be like that:
| ID | TEXTS | category | 
|---|---|---|
| 1 | A | ANE | 
| 2 | C | BOO | 
| 3 | E | CAT | 
| 4 | F | DOG | 
More specifically: I want for ID say 1 to search the most common value in the column VAR which is A and then to search the most common value in the column CATEGORY related to the most common value A which is the ANE and so forth.
How can I do it in R ? Imagine that it is sample example.My real data frame contains 850.000 rows and has 14000 unique ID.
Another dplyr strategy using count and slice:
library(dplyr)
DATA %>% 
    group_by(ID) %>% 
    count(VAR, CATEGORY) %>% 
    slice(which.max(n)) %>% 
    select(-n)
     ID VAR   CATEGORY
  <dbl> <chr> <chr>   
1     1 A     ANE     
2     2 C     BOA     
3     3 E     CAT     
4     4 F     DOG  
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With