In my data, there is a column like :
df <- data.frame(status = c("GET/sfuksd1567","GET/sjsh787","POST/hsfhuks","GET/sfukfiezd17","POST/fshks"), stringsAsFactors = FALSE)
I want to automatically create another column that acts as an indicator for the status variable and extracts only the "GET" or "POST" part, like df$ind=c("GET","GET","POST","GET","POST").
I tried the substr function, but I didn't succeed.
Original data:
> df
status
1 GET/sfuksd1567
2 GET/sjsh787
3 POST/hsfhuks
4 GET/sfukfiezd17
5 POST/fshks
Expected result:
> df
status ind
1 GET/sfuksd1567 GET
2 GET/sjsh787 GET
3 POST/hsfhuks POST
4 GET/sfukfiezd17 GET
5 POST/fshks POST
You could simply remove the first forward slash and everything after it using a regex:
df$ind <- sub("/.*", "", df$status)
df
# status ind
# 1 GET/sfuksd1567 GET
# 2 GET/sjsh787 GET
# 3 POST/hsfhuks POST
# 4 GET/sfukfiezd17 GET
# 5 POST/fshks POST
Or if you don't like regex, you could try
library(tidyr)
separate(df, "status", c("ind", "status"))
Or
library(data.table) ## V1.9.6+
setDT(df)[, tstrsplit(status, "/")]
Or
read.table(text = df$status, sep = "/")
The last three options will just split the status column into two separate columns, rather than adding ind next to the original status.
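Since the question mentions an attempt with substr, here is a minimal sketch of how that could work, assuming the part you want always ends at the first "/". The helper variable slash_pos is just a name chosen for this illustration; regexpr with fixed = TRUE looks for the literal slash without any regex.

```r
# Same sample data as in the question
df <- data.frame(status = c("GET/sfuksd1567", "GET/sjsh787", "POST/hsfhuks",
                            "GET/sfukfiezd17", "POST/fshks"),
                 stringsAsFactors = FALSE)

# Position of the first "/" in each string (fixed = TRUE: literal match, no regex)
slash_pos <- regexpr("/", df$status, fixed = TRUE)

# Keep only the characters before that position
df$ind <- substr(df$status, 1, slash_pos - 1)
df$ind
```

Note that regexpr returns -1 for a string without a slash, in which case substr gives an empty string, so you may want to handle such rows separately.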
We have:
df <- data.frame(status = c("GET/sfuksd1567","GET/sjsh787","POST/hsfhuks","GET/sfukfiezd17","POST/fshks"), stringsAsFactors = FALSE)
You can do:
# split one row at a time and keep the piece before the first "/"
df$ind <- sapply(seq_len(nrow(df)), function(x) strsplit(df$status[x], '/')[[1]][1])
or
df$ind<-sapply(strsplit(df$status,'/'),`[[`,1)
Both return
df
status ind
1 GET/sfuksd1567 GET
2 GET/sjsh787 GET
3 POST/hsfhuks POST
4 GET/sfukfiezd17 GET
5 POST/fshks POST
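A possible variant of the strsplit approach, as a sketch rather than a required change: vapply does the same job as sapply but checks that every element is a single character string, and fixed = TRUE treats "/" as a literal separator. The variable name parts is only used for this example.

```r
# Same sample data as above
df <- data.frame(status = c("GET/sfuksd1567", "GET/sjsh787", "POST/hsfhuks",
                            "GET/sfukfiezd17", "POST/fshks"),
                 stringsAsFactors = FALSE)

# Split each string on the literal "/" (no regex involved)
parts <- strsplit(df$status, "/", fixed = TRUE)

# Take the first piece of each split; vapply errors if a piece is not one string
df$ind <- vapply(parts, `[[`, FUN.VALUE = character(1), 1)
df$ind
```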
Benchmark:
library(microbenchmark)
microbenchmark(david=sub("/.*", "", df$status),etienne=sapply(strsplit(df$status,'/'),`[[`,1))
Unit: microseconds
expr min lq mean median uq max neval cld
david 25.198 25.8985 27.64456 26.5980 27.298 116.189 100 a
etienne 62.294 63.3440 65.13979 63.8695 65.094 128.088 100 b
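The timings above are for five rows only. If you want to see how the two approaches compare on more data, something like the following base R sketch could be used. The vector big is made-up test data for illustration, and the timings will depend on your machine, so no output is shown.

```r
# Hypothetical larger input: 100,000 strings of the form "GET/<n>" or "POST/<n>"
set.seed(1)
big <- paste0(sample(c("GET", "POST"), 1e5, replace = TRUE), "/", seq_len(1e5))

# Both approaches from the answers above
res_sub   <- sub("/.*", "", big)
res_split <- sapply(strsplit(big, "/"), `[[`, 1)

# Check that they agree before comparing speed
stopifnot(identical(res_sub, res_split))

# Rough timings with base R (use microbenchmark for finer measurements)
system.time(sub("/.*", "", big))
system.time(sapply(strsplit(big, "/"), `[[`, 1))
```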