Say I have this data:
df <- data.frame(x = c("Tom: I like cheese.",
"Tom: Cheese is good.",
"Tom: Muenster is my favorite.",
"Bob: No, I like Cheddar.",
"Tom: You're wrong. I think cheddar is only good on burgers.",
"Gina: But what about American on burgers?",
"Gina: That's better.",
"Bob: Yeah, I agree with Gina.",
"Bob: American is better on burgers. Cheddar is for grating on nachos."))
I want to turn it into this data:
df <- data.frame(x = c("Tom: I like cheese. Cheese is good. Muenster is my favorite.",
"Bob: No, I like Cheddar.",
"Tom: You're wrong. I think cheddar is only good on burgers.",
"Gina: But what about American on burgers? That's better.",
"Bob: Yeah, I agree with Gina. American is better on burgers. Cheddar is for grating on nachos."))
Basically, whenever a line has the same speaker as the line immediately before it, I want to drop the name and colon and append the text to that previous line.
I am struggling to do this in a way that doesn't lump all of the "Tom:" and "Gina:" lines together and keep only the first instance of each name. When a name comes back after someone else has spoken, it should start a new group.
We can use tidyr to split the speaker and what they say into columns, then use dplyr to combine runs of the same speaker. For example:
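library(dplyr)  # consecutive_id() and .by need dplyr >= 1.1.0

df |>
  # split "Tom: I like cheese." into speaker = "Tom" and words = "I like cheese."
  tidyr::separate_wider_delim(x, ": ", names = c("speaker", "words")) |>
  # give each run of consecutive lines by the same speaker its own id
  mutate(instance = consecutive_id(speaker)) |>
  summarize(speaker = first(speaker), text = paste(words, collapse = " "), .by = instance)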
returns
instance speaker text
<int> <chr> <chr>
1 1 Tom I like cheese. Cheese is good. Muenster is my favorite.
2 2 Bob No, I like Cheddar.
3 3 Tom You're wrong. I think cheddar is only good on burgers.
4 4 Gina But what about American on burgers? That's better.
5 5 Bob Yeah, I agree with Gina. American is better on burgers. Cheddar is for grating on na…
Using data.table, split on ": ", group by rleid (a run-length id that increases each time the name changes), then paste the text back together per group:
library(data.table)
setDT(df)  # df from the question is a data.frame; convert it by reference

df[, c("name", "text") := tstrsplit(x, ": ", fixed = TRUE)
  ][, .(text = paste(text, collapse = " ")), by = .(name, rleid(name))
  ][, -2]
# name text
# <char> <char>
# 1: Tom I like cheese. Cheese is good. Muenster is my favorite.
# 2: Bob No, I like Cheddar.
# 3: Tom You're wrong. I think cheddar is only good on burgers.
# 4: Gina But what about American on burgers? That's better.
# 5: Bob Yeah, I agree with Gina. American is better on burgers. Cheddar is for grating on nachos.
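If you would rather avoid extra packages, the same idea (split off the speaker, number the runs of consecutive identical speakers, paste within each run) can be sketched in base R. This is only a minimal sketch: the names speaker, words, run_id and out are made up for illustration, and it assumes every line starts with a name followed by ": ".

```r
# Rebuild the example data from the question.
df <- data.frame(x = c("Tom: I like cheese.",
                       "Tom: Cheese is good.",
                       "Tom: Muenster is my favorite.",
                       "Bob: No, I like Cheddar.",
                       "Tom: You're wrong. I think cheddar is only good on burgers.",
                       "Gina: But what about American on burgers?",
                       "Gina: That's better.",
                       "Bob: Yeah, I agree with Gina.",
                       "Bob: American is better on burgers. Cheddar is for grating on nachos."))

# Split each line at the first ": " (assumption: every line has one).
speaker <- sub(": .*$", "", df$x)     # everything before the first ": "
words   <- sub("^[^:]*: ", "", df$x)  # everything after it

# Start a new run whenever the speaker differs from the previous line,
# so a returning speaker gets a fresh id instead of joining an old group.
run_id <- cumsum(c(TRUE, speaker[-1] != speaker[-length(speaker)]))

# Paste the words within each run and put the run's speaker back in front.
text <- tapply(words, run_id, paste, collapse = " ")
out  <- data.frame(x = paste0(speaker[!duplicated(run_id)], ": ", text))
out
```

Unlike the two answers above, this puts the name back in front of the text, so out has the single x column asked for in the question.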