Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to sort this datset to create a proper dataframe

Tags:

r

Here is a smaple of data set:

df<-read.table (text="Id    Name    Surname Group   A1  A2  A3  Type1   Gen1    B1  B2  B3  Type2   Gen2
116 Matt    Dabi    A   4   4   4   AB  N   3   3   4   G   N
116 Matt    Dabi    D   4   3   2   D   N   4   3   2   G   N
116 Matt    Dabi    Q   NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
116 Matt    Dabi    B   NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
", header=TRUE)

Id  Name    Surname Group   A1  A2  A3  Type1   Gen1
116 Matt    Dabi    A   4   4   4   AB  N
116 Matt    Dabi    D   4   3   2   D   N
116 Matt    Dabi    Q   3   3   4   G   N
116 Matt    Dabi    B   4   3   2   G   N

I have tried df %>% na.omit()

like image 788
user330 Avatar asked Dec 31 '25 17:12

user330


2 Answers

With multiple patterns, one option is to create a spec with build_longer_spec and then use pivot_longer_spec

library(tidyr)
library(dplyr)
library(stringr)
spec <- df %>% 
    build_longer_spec(cols = matches("^[A-Za-z]+\\d+$"), 
    names_to = ".value", names_pattern = ".*(\\d+)")  %>% 
    mutate(`.value` = case_when(str_detect(`.name`, "^[AB]\\d+$") ~ 
        str_c('A', `.value`), 
       str_detect(`.name`, 'Type') ~ 'Type1',
         str_detect(`.name`, 'Gen') ~ 'Gen1')) 
pivot_longer_spec(df, spec, values_drop_na = TRUE)
# A tibble: 4 × 9
     Id Name  Surname Group    A1    A2    A3 Type1 Gen1 
  <int> <chr> <chr>   <chr> <int> <int> <int> <chr> <chr>
1   116 Matt  Dabi    A         4     4     4 AB    N    
2   116 Matt  Dabi    A         3     3     4 G     N    
3   116 Matt  Dabi    D         4     3     2 D     N    
4   116 Matt  Dabi    D         4     3     2 G     N    
like image 136
akrun Avatar answered Jan 02 '26 08:01

akrun


Here another approach using pivot_wider:

df %>%
  pivot_longer(starts_with(c("A", "B")), names_to="ID") %>%
  na.omit() %>%
  mutate(ID=sub("B", "A", ID)) %>%
  pivot_wider(names_from=c(ID))

In case of more columns, one can use:

df %>%
  pivot_longer(matches("[A-Z][1-9]"), names_to="ID") %>%
  na.omit() %>%
  mutate(ID = sub("[B-Z]", "A", ID)) %>%
  pivot_wider(names_from = ID)

given that the rows are uniquely identified otherwise.

# A tibble: 4 x 7
     Id Name  Surname Group    A1    A2    A3
  <int> <chr> <chr>   <chr> <int> <int> <int>
1   116 Matt  Dabi    A         4     4     4
2   116 Matt  Dabi    D         4     3     2
3   116 Matt  Dabi    Q         4     3     3
4   116 Matt  Dabi    B         4     2     4
like image 36
tpetzoldt Avatar answered Jan 02 '26 10:01

tpetzoldt



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!