I have a dataframe , a column of which contains colon and equal-separated strings.
data$col1
[1] "ECNT=2;HCNT=4;MAX_ED=51;MIN_ED=51;NLOD=38.78;TLOD=5.45"
[2] "ECNT=2;HCNT=8;MAX_ED=51;MIN_ED=51;NLOD=36.58;TLOD=4.05"
[3] "DB;ECNT=1;HCNT=16;MAX_ED=.;MIN_ED=.;NLOD=20.42;TLOD=5.82"
[4] "DB;ECNT=1;HCNT=4;MAX_ED=.;MIN_ED=.;NLOD=30.70;TLOD=8.03"
[5] "ECNT=2;HCNT=6;MAX_ED=7;MIN_ED=7;NLOD=41.48;TLOD=5.37"
[6] "ECNT=2;HCNT=9;MAX_ED=7;MIN_ED=7;NLOD=40.59;TLOD=5.29"
I want to extract the numbers follow the NLOD= and TLOD=, and then split it into two columns. Here is the output I want.
data
col1 TLOD NLOD
"ECNT=2;HCNT=4;MAX_ED=51;MIN_ED=51;NLOD=38.78;TLOD=5.45" 5.45 38.78
"ECNT=2;HCNT=8;MAX_ED=51;MIN_ED=51;NLOD=36.58;TLOD=4.05" 4.05 36.58
"DB;ECNT=1;HCNT=16;MAX_ED=.;MIN_ED=.;NLOD=20.42;TLOD=5.82" 5.82 20.42
"DB;ECNT=1;HCNT=4;MAX_ED=.;MIN_ED=.;NLOD=30.70;TLOD=8.03" 8.03 30.70
"ECNT=2;HCNT=6;MAX_ED=7;MIN_ED=7;NLOD=41.48;TLOD=5.37" 5.37 41.48
"ECNT=2;HCNT=9;MAX_ED=7;MIN_ED=7;NLOD=40.59;TLOD=5.29" 5.29 40.59
Any help is appreciated. Thank you.
Reproducible sample data
structure(list(col1 = c("ECNT=2;HCNT=4;MAX_ED=51;MIN_ED=51;NLOD=38.78;TLOD=5.45",
"ECNT=2;HCNT=8;MAX_ED=51;MIN_ED=51;NLOD=36.58;TLOD=4.05", "DB;ECNT=1;HCNT=16;MAX_ED=.;MIN_ED=.;NLOD=20.42;TLOD=5.82",
"DB;ECNT=1;HCNT=4;MAX_ED=.;MIN_ED=.;NLOD=30.70;TLOD=8.03", "ECNT=2;HCNT=6;MAX_ED=7;MIN_ED=7;NLOD=41.48;TLOD=5.37",
"ECNT=2;HCNT=9;MAX_ED=7;MIN_ED=7;NLOD=40.59;TLOD=5.29")), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
In base R, you can use strcapture to capture the data into separate columns.
cbind(df, strcapture('NLOD=(.*?);TLOD=(.*)', df$col1,
proto = list(NLOD = numeric(), TLOD = numeric())))
#. col1 NLOD TLOD
#1 ECNT=2;HCNT=4;MAX_ED=51;MIN_ED=51;NLOD=38.78;TLOD=5.45 38.78 5.45
#2 ECNT=2;HCNT=8;MAX_ED=51;MIN_ED=51;NLOD=36.58;TLOD=4.05 36.58 4.05
#3 DB;ECNT=1;HCNT=16;MAX_ED=.;MIN_ED=.;NLOD=20.42;TLOD=5.82 20.42 5.82
#4 DB;ECNT=1;HCNT=4;MAX_ED=.;MIN_ED=.;NLOD=30.70;TLOD=8.03 30.70 8.03
#5 ECNT=2;HCNT=6;MAX_ED=7;MIN_ED=7;NLOD=41.48;TLOD=5.37 41.48 5.37
#6 ECNT=2;HCNT=9;MAX_ED=7;MIN_ED=7;NLOD=40.59;TLOD=5.29 40.59 5.29
To look specifically for numbers you can do :
cbind(df, strcapture('NLOD=(\\d+\\.\\d+);TLOD=(\\d+\\.\\d+)', df$col1,
proto = list(NLOD = numeric(), TLOD = numeric())))
Same regex can also be used in tidyr::extract :
tidyr::extract(df, col1, c('NLOD', 'TLOD'), 'NLOD=(.*?);TLOD=(.*)', remove = FALSE)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With