I'm extracting song lyrics with Vagalume's API.
library(vagalumeR)
library(tibble)
library(stringr)
set.seed(1234)
musicas = as_tibble(topLyrics(name = "seu-jorge",
                              message = TRUE))
musica = sample(musicas$id.top, 1)
letra = lyrics(identifier = musica,
type = "id",
artist = "seu-jorge",
key = key)
However, letra is just one big block of text, and I want to split it into smaller ones:
str_split(string = as.character(letra),
"[[:upper:]]")
And this is what I get:
[1] "Pretinha" "aço tudo pelo nosso amor"
[3] "aço tudo pelo bem de nosso bem (meu bem)" " saudade é minha dor"
[5] "ue anda arrasando com meu coração" "ão"
[7] "uvide que um dia" "u te darei o céu"
[9] "eu amor junto com um anel" "ra gente se casar"
I'm pretty sure most of you don't know Portuguese, but trust me: the split drops the capital letter I'm using as a separator. How do I keep the capital letter in the smaller chunks?
You want to use a positive lookahead:
str_split(string = as.character(letra), "(?=[[:upper:]])")
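It splits at the empty string "" wherever the next character is a capital letter. A lookahead is zero-width, so the capital letter is not consumed by the match and stays at the start of the following chunk.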
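To try this without an API key, here is a minimal sketch on a made-up string. The name letra_exemplo and its contents are assumptions standing in for the real letra; the trimming and empty-piece filtering are optional cleanup, not part of the lookahead itself.

```r
library(stringr)

# Hypothetical stand-in for `letra`, so the example runs without the API.
letra_exemplo <- "Pretinha Meu amor junto com um anel Pra gente se casar"

# Split at every position that is followed by a capital letter.
# The lookahead is zero-width, so the capital letter itself is kept.
pedacos <- str_split(letra_exemplo, "(?=[[:upper:]])")[[1]]

# Optional cleanup: trim surrounding spaces and drop any empty piece
# the split may leave at the start.
pedacos <- str_trim(pedacos)
pedacos <- pedacos[pedacos != ""]

print(pedacos)
```

Each remaining chunk starts with its capital letter. Note that this splits at every capital, so proper nouns in the middle of a line would also start a new chunk.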