
Extract words from a column of data.frame

My data has a column like this:

df <- data.frame(status = c("GET/sfuksd1567","GET/sjsh787","POST/hsfhuks","GET/sfukfiezd17","POST/fshks"), stringsAsFactors = FALSE)

I want to automatically create another column that serves as an indicator for the variable status and contains only the "GET" or "POST" part, like df$ind = c("GET","GET","POST","GET","POST").

I've tried the function substr, but I did not succeed.

Original data:

> df
           status
1  GET/sfuksd1567
2     GET/sjsh787
3    POST/hsfhuks
4 GET/sfukfiezd17
5      POST/fshks

Expected result:

> df
           status  ind
1  GET/sfuksd1567  GET
2     GET/sjsh787  GET
3    POST/hsfhuks POST
4 GET/sfukfiezd17  GET
5      POST/fshks POST
velvetrock asked Dec 03 '25 17:12


2 Answers

You could simply remove the first slash and everything after it using a regex:

df$ind <- sub("/.*", "", df$status)
df
#            status  ind
# 1  GET/sfuksd1567  GET
# 2     GET/sjsh787  GET
# 3    POST/hsfhuks POST
# 4 GET/sfukfiezd17  GET
# 5      POST/fshks POST
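
Since the question mentions substr, here is a minimal sketch of how that route could work, assuming every value contains at least one "/". regexpr finds the position of the first slash and substr keeps what comes before it. The names slash_pos and ind_substr are illustrative only and not part of the original question.

```r
df <- data.frame(status = c("GET/sfuksd1567", "GET/sjsh787", "POST/hsfhuks",
                            "GET/sfukfiezd17", "POST/fshks"),
                 stringsAsFactors = FALSE)

# Position of the first "/" in each string (fixed = TRUE: literal match, no regex)
slash_pos <- regexpr("/", df$status, fixed = TRUE)

# Keep only the characters before the slash
df$ind_substr <- substr(df$status, 1, slash_pos - 1)
```

For a value without any slash, regexpr returns -1, so substr would give an empty string rather than the whole value. The sub approach above returns the value unchanged in that case.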

Or, if you would rather not write a regex yourself, you could try

library(tidyr)
separate(df, "status", c("ind", "status"))

Or

library(data.table) ## V1.9.6+
setDT(df)[, tstrsplit(status, "/")]

Or

read.table(text = df$status, sep = "/")

The last three options simply split the status column into two separate columns.
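
If the goal is to keep status intact and only add the indicator, one possible base R sketch is to split with read.table and copy just the first piece back. The column names "ind" and "rest" passed to col.names are assumptions made for illustration, and this assumes each value contains exactly one "/".

```r
df <- data.frame(status = c("GET/sfuksd1567", "GET/sjsh787", "POST/hsfhuks",
                            "GET/sfukfiezd17", "POST/fshks"),
                 stringsAsFactors = FALSE)

# Split each string on "/" into two character columns
parts <- read.table(text = df$status, sep = "/",
                    col.names = c("ind", "rest"),
                    colClasses = "character")

# Add only the first piece; the original status column is left untouched
df$ind <- parts$ind
```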

David Arenburg answered Dec 06 '25 09:12


Given:

df <- data.frame(status = c("GET/sfuksd1567","GET/sjsh787","POST/hsfhuks","GET/sfukfiezd17","POST/fshks"), stringsAsFactors = FALSE)

You can split one row at a time:

df$ind <- sapply(seq_len(nrow(df)), function(i) strsplit(df$status[i], "/")[[1]][1])

or split the whole column once and take the first piece of each result:

df$ind <- sapply(strsplit(df$status, "/"), `[[`, 1)

Both return

df
           status  ind
1  GET/sfuksd1567  GET
2     GET/sjsh787  GET
3    POST/hsfhuks POST
4 GET/sfukfiezd17  GET
5      POST/fshks POST
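
As a small variation, vapply can stand in for sapply when you want the result to be guaranteed to be a character vector. This is only a sketch of the same strsplit idea, not a different method; fixed = TRUE is added here so that "/" is matched literally.

```r
df <- data.frame(status = c("GET/sfuksd1567", "GET/sjsh787", "POST/hsfhuks",
                            "GET/sfukfiezd17", "POST/fshks"),
                 stringsAsFactors = FALSE)

# Split once on the literal "/" and take the first piece of each result;
# character(1) makes vapply fail loudly if any piece is not a single string
df$ind <- vapply(strsplit(df$status, "/", fixed = TRUE),
                 function(parts) parts[1],
                 character(1))
```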

Benchmark:

library(microbenchmark)
microbenchmark(david = sub("/.*", "", df$status), etienne = sapply(strsplit(df$status, "/"), `[[`, 1))

Unit: microseconds
    expr    min      lq     mean  median     uq     max neval cld
   david 25.198 25.8985 27.64456 26.5980 27.298 116.189   100  a 
 etienne 62.294 63.3440 65.13979 63.8695 65.094 128.088   100   b
etienne answered Dec 06 '25 11:12


