
How to combine rows, separated by returns, that start and end with specific characters?

Tags: string, r, srt

I'm working with video transcript data. The data was automatically exported with a line break in the middle of some sentences. I'd like to combine each speaker's lines into a single row. The data is formatted like this:

data <- data.frame(transcript = c("00:00:03.990 --> 00:00:05.270",
 "<v Bill>I'm here to take some notes. I've",
 "heard this will be interesting.</v>",
 "00:00:05.770 --> 00:00:07.370",
 "<v Charlie>I believe you'll be correct",
 "about that, Bill.</v>",
 "00:00:10.810 --> 00:00:11.170",
 "<v Bill>Awesome.</v>"))

Intended output:

intendedData <- data.frame(transcript = c("00:00:03.990 --> 00:00:05.270",
 "<v Bill>I'm here to take some notes. I've heard this will be interesting.</v>",
 "00:00:05.770 --> 00:00:07.370",
 "<v Charlie>I believe you'll be correct about that, Bill.</v>",
 "00:00:10.810 --> 00:00:11.170",
 "<v Bill>Awesome.</v>"))

I've tried conditional statements for rows that start with <v and end with </v>, but that didn't work. Any ideas will be greatly appreciated. Thank you!

Courtney Gerver asked Oct 12 '25 17:10


1 Answer

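An approach using strsplit and paste (the same idea as @Allan Cameron, but a different execution): collapse all lines into one string, split it at the <v and </v> tags, then put the tags back on the spoken pieces.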

tmp <- trimws(strsplit(paste(data$transcript, collapse=" "), "<v|<\\/v>")[[1]])

ifelse(grepl("\\d{2}:\\d{2}:\\d{2}\\.\\d{3}", tmp), tmp, paste0("<v ", tmp, "</v>"))
[1] "00:00:03.990 --> 00:00:05.270"
[2] "<v Bill>I'm here to take some notes. I've heard this will be interesting.</v>"
[3] "00:00:05.770 --> 00:00:07.370"
[4] "<v Charlie>I believe you'll be correct about that, Bill.</v>"
[5] "00:00:10.810 --> 00:00:11.170"
[6] "<v Bill>Awesome.</v>"
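
To make the one-liner easier to follow, here is a minimal step-by-step sketch of the same idea. It assumes the transcript is available as a plain character vector; the names transcript, joined, pieces, is_time and result are illustrative and not part of the original answer.

```r
# Sample lines as a plain character vector (assumed shape of data$transcript)
transcript <- c("00:00:03.990 --> 00:00:05.270",
                "<v Bill>I'm here to take some notes. I've",
                "heard this will be interesting.</v>",
                "00:00:05.770 --> 00:00:07.370",
                "<v Charlie>I believe you'll be correct",
                "about that, Bill.</v>",
                "00:00:10.810 --> 00:00:11.170",
                "<v Bill>Awesome.</v>")

# Step 1: glue every line into one long string, joined by single spaces
joined <- paste(transcript, collapse = " ")

# Step 2: cut at each opening "<v" and closing "</v>"; strsplit consumes the
# tags themselves, and trimws drops the leftover spaces around each piece
pieces <- trimws(strsplit(joined, "<v|<\\/v>")[[1]])

# Step 3: pieces containing a timestamp pass through unchanged; every other
# piece is spoken text and gets its tags back
is_time <- grepl("\\d{2}:\\d{2}:\\d{2}\\.\\d{3}", pieces)
result <- ifelse(is_time, pieces, paste0("<v ", pieces, "</v>"))
result
```

One caveat of this technique: spoken text that itself contains "<v" or something shaped like a timestamp would be split or classified incorrectly.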

Without a temporary variable (the native pipe |> and the \(x) lambda shorthand need R 4.1 or later):

trimws(strsplit(paste(data$transcript, collapse=" "), "<v|<\\/v>")[[1]]) |> 
  (\(x) ifelse(grepl("\\d{2}:\\d{2}:\\d{2}\\.\\d{3}", x), x, paste0("<v ", x, "</v>")))()
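
For comparison, a different technique (not the one used in the answer above) is to leave the lines as they are and merge each continuation line into the line before it. The sketch below assumes every caption opens with "<v " at the start of a line and every cue begins with a timestamp followed by " -->"; the names transcript, starts, group and merged are illustrative.

```r
# Sample lines as a plain character vector (assumed shape of data$transcript)
transcript <- c("00:00:03.990 --> 00:00:05.270",
                "<v Bill>I'm here to take some notes. I've",
                "heard this will be interesting.</v>",
                "00:00:05.770 --> 00:00:07.370",
                "<v Charlie>I believe you'll be correct",
                "about that, Bill.</v>",
                "00:00:10.810 --> 00:00:11.170",
                "<v Bill>Awesome.</v>")

# A line starts a new row if it is a timestamp line or opens a <v ...> tag
starts <- grepl("^\\d{2}:\\d{2}:\\d{2}\\.\\d{3} -->", transcript) |
  startsWith(transcript, "<v ")

# Running count of row starts: continuation lines inherit the previous id
group <- cumsum(starts)

# Paste the lines of each group together with a single space
merged <- unname(vapply(split(transcript, group), paste,
                        character(1), collapse = " "))
merged
```

Because nothing is split on the tags, this version does not need to strip and re-add them, but it relies on the line-start assumptions stated above.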
