Split strings on capital letters

Tags:

r

I'm extracting song lyrics with Vagalume's API.

library(vagalumeR)
library(tibble)
library(stringr)
set.seed(1234)

# as_tibble() replaces the deprecated as.tibble()
musicas = as_tibble(topLyrics(name = "seu-jorge",
                              message = TRUE))

musica = sample(musicas$id.top, 1)

# `key` must hold your Vagalume API key
letra = lyrics(identifier = musica,
               type = "id",
               artist = "seu-jorge",
               key = key)

However, letra is just one big block of text, and I want to split it into smaller chunks:

str_split(string = as.character(letra),
     "[[:upper:]]")

And this is what I get:

[1] "Pretinha"                                 "aço tudo pelo nosso amor"                
[3] "aço tudo pelo bem de nosso bem (meu bem)" " saudade é minha dor"                    
[5] "ue anda arrasando com meu coração"        "ão"                                      
[7] "uvide que um dia"                         "u te darei o céu"                        
[9] "eu amor junto com um anel"                "ra gente se casar"                       

I'm pretty sure most of you don't know Portuguese, but trust me: the split drops the capital letter I'm using as a separator. How do I keep the capital letter in the smaller chunks?

Pedro Cavalcante asked Sep 06 '25 13:09



1 Answer

You want to use positive lookahead:

str_split(string = as.character(letra), "(?=[[:upper:]])")

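This splits at the empty string wherever the next character is a capital letter. Because a lookahead does not consume what it matches, the capital letter stays at the start of the following chunk.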
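
To see the difference side by side, here is a minimal sketch. The sample string and the helper name split_chunks are made up for illustration (they are not Vagalume output or part of stringr), and the helper drops empty pieces on the assumption that a split at the very start of the string can produce one.

```r
library(stringr)

# Hypothetical sample text standing in for the lyrics (not real API output).
letra <- "Hello worldThis is a testAnother line"

# Hypothetical helper: split a single string and drop any empty pieces.
split_chunks <- function(text, pattern) {
  pieces <- str_split(text, pattern)[[1]]
  pieces[pieces != ""]
}

# Consuming pattern: the capital letter itself is the separator,
# so it is removed from the result.
consumed <- split_chunks(letra, "[[:upper:]]")

# Lookahead pattern: matches the empty position just before a capital
# letter, so nothing is removed and each chunk keeps its first letter.
kept <- split_chunks(letra, "(?=[[:upper:]])")

print(consumed)
print(kept)
```

Since the lookahead removes no characters, pasting the chunks in kept back together should reproduce the original string, which is a quick way to check that nothing was lost.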

Julius Vainora answered Sep 09 '25 05:09
