Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to create a single column from multiple?

Tags:

dataframe

r

dplyr

I have df1:

Account   Score1 Score2 Score3 Score4 Score5 Score6  Random Random2
23        F30    G1     G5     H10    J18    NULL    3      4
42        NULL   NULL   NULL   NULL   NULL   NULL    5      6
56        D10    D11    NULL   NULL   NULL   NULL    6      2
59        X14    D3     F4     A11    A12    A13     8      2
41        D11    D12    NULL   NULL   NULL   NULL    7      7
45        C3     C10    R4     T5     NULL   NULL    1      1
30        C4     NULL   NULL   NULL   NULL   NULL    1      5
33        D2     D3     NULL   NULL   NULL   NULL    3      4

I would like to make a new data frame that takes the values from the 6 score columns and puts them into a single column named Score. If there are multiple scores, then I would want multiple rows of data.

I would like the output df2 to look like:

Account   Score
23        F30
23        G1
23        G5
23        H10
23        J18
56        D10
56        D11
59        X14
59        D3
59        F4
59        A11
59        A12
59        A13
41        D11
41        D12
45        C3
45        C10 
45        R4
45        T5
30        C4
33        D2
33        D3
like image 936
Evan Avatar asked Oct 18 '25 17:10

Evan


2 Answers

If the values are "NULL", then we can select the columns of interest, convert to long format with pivot_longer and filter out the "NULL" elements

library(dplyr)
library(tidyr)
df1 %>% 
   select(Account, starts_with("Score")) %>% 
   pivot_longer(cols = -Account, names_to = NULL, values_to = "Score") %>% 
   filter(Score != "NULL")

-output

# A tibble: 22 × 2
   Account Score
     <int> <chr>
 1      23 F30  
 2      23 G1   
 3      23 G5   
 4      23 H10  
 5      23 J18  
 6      56 D10  
 7      56 D11  
 8      59 X14  
 9      59 D3   
10      59 F4   
# … with 12 more rows

data

df1 <- structure(list(Account = c(23L, 42L, 56L, 59L, 41L, 45L, 30L, 
33L), Score1 = c("F30", "NULL", "D10", "X14", "D11", "C3", "C4", 
"D2"), Score2 = c("G1", "NULL", "D11", "D3", "D12", "C10", "NULL", 
"D3"), Score3 = c("G5", "NULL", "NULL", "F4", "NULL", "R4", "NULL", 
"NULL"), Score4 = c("H10", "NULL", "NULL", "A11", "NULL", "T5", 
"NULL", "NULL"), Score5 = c("J18", "NULL", "NULL", "A12", "NULL", 
"NULL", "NULL", "NULL"), Score6 = c("NULL", "NULL", "NULL", "A13", 
"NULL", "NULL", "NULL", "NULL"), Random = c(3L, 5L, 6L, 8L, 7L, 
1L, 1L, 3L), Random2 = c(4L, 6L, 2L, 2L, 7L, 1L, 5L, 4L)),
 class = "data.frame", row.names = c(NA, 
-8L))
like image 83
akrun Avatar answered Oct 20 '25 07:10

akrun


A base R option

u <- subset(
  cbind(df[1], stack(df[-1])),
  values != "NULL" & startsWith(as.character(ind), "Score"),
  select = -ind
)
u[order(match(u$Account, df$Account)), ]

gives

   Account values
1       23    F30
9       23     G1
17      23     G5
25      23    H10
33      23    J18
3       56    D10
11      56    D11
4       59    X14
12      59     D3
20      59     F4
28      59    A11
36      59    A12
44      59    A13
5       41    D11
13      41    D12
6       45     C3
14      45    C10
22      45     R4
30      45     T5
7       30     C4
8       33     D2
16      33     D3
like image 36
ThomasIsCoding Avatar answered Oct 20 '25 06:10

ThomasIsCoding



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!