I have the following data frame:
structure(list(g = c("1", "2", "3"), x = c("This is text.", "This is text too.",
"This is no text"), y = c("What is text?", "Can it eat text?",
"Maybe I will try.")), class = "data.frame", row.names = c(NA,
-3L))
I would like to count the number of words across the columns x and y and sum the counts to get one column with the total number of words used per row. It is important that I am able to subset the data. The result should look like this:
structure(list(g = c("1", "2", "3"), x = c("This is text.", "This is text too.",
"This is no text"), y = c("What is text?", "Can it eat text?",
"Maybe I will try."), z = c("6", "8", "8")), class = "data.frame", row.names = c(NA,
-3L))
I have tried using str_count(" ") with different regular expressions in combination with across or apply, but I cannot seem to get a working solution.
I did not anticipate in my original question that columns with NA cells in them would be problematic, but they are. So any solution needs to be able to handle NA cells as well.
Here is a solution using tokenizers:
library(tokenizers)
df <-
structure(list(g = c("1", "2", "3"), x = c("This is text.", "This is text too.",
"This is no text"), y = c("What is text?", "Can it eat text?",
"Maybe I will try.")), class = "data.frame", row.names = c(NA,
-3L))
df$z = tokenizers::count_words(df$x) + tokenizers::count_words(df$y)
df
#>   g                 x                 y z
#> 1 1     This is text.     What is text? 6
#> 2 2 This is text too.  Can it eat text? 8
#> 3 3   This is no text Maybe I will try. 8
If you prefer pure R:
df$z <- rowSums(
  sapply(df[, c("x", "y")], function(x)
    sapply(gregexpr("\\b\\w+\\b", x), function(x)
      if (x[[1]] > 0) length(x) else 0)))
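Note that \w+ matches each run of word characters and \b matches word boundaries, though I believe "\\w+" alone suffices, since the greedy + already extends every match to the next non-word character.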
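The pure R version above stops with an error on NA cells, because gregexpr() returns NA for them and if (NA > 0) cannot be evaluated. A minimal sketch of an NA-tolerant variant follows; the helper name count_words_na and the vector cols are made up for illustration, and the sample data is the question's data with one cell replaced by NA.

```r
# Hypothetical helper (not from any package): words per element, NA counts as 0.
count_words_na <- function(x) {
  m <- gregexpr("\\w+", x)
  vapply(m, function(pos) {
    # gregexpr() yields NA for NA input and -1 when nothing matches
    if (is.na(pos[1]) || pos[1] == -1L) 0L else length(pos)
  }, integer(1))
}

# The question's data, with one cell set to NA for illustration
df <- data.frame(
  g = c("1", "2", "3"),
  x = c("This is text.", NA, "This is no text"),
  y = c("What is text?", "Can it eat text?", "Maybe I will try.")
)

# Choose the columns to count; this is where the data can be subset
cols <- c("x", "y")
df$z <- rowSums(sapply(df[cols], count_words_na))
df
```

Changing cols is enough to count a different set of columns. With a single-row data frame sapply() returns a plain vector rather than a matrix, so rowSums() would need a matrix built another way in that case.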
One possible solution:
df$z = stringi::stri_count_words(paste(df$x, df$y))
  g                 x                 y z
1 1     This is text.     What is text? 6
2 2 This is text too.  Can it eat text? 8
3 3   This is no text Maybe I will try. 8
Or
df$z = lengths(gregexpr("\\b\\w+\\b", paste(df$x, df$y)))
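Two caveats apply to the paste() approach when NA cells are present: paste() turns NA into the literal string "NA", which is then counted as a word, and lengths(gregexpr(...)) reports 1 rather than 0 for a string with no match at all. The base R sketch below works around both under the assumption that NA should count as zero words; the names cols, clean and combined are illustrative only, and the sample data is the question's data with some cells set to NA.

```r
# The question's data, with some cells set to NA for illustration
df <- data.frame(
  g = c("1", "2", "3"),
  x = c("This is text.", NA, NA),
  y = c("What is text?", "Can it eat text?", NA)
)

cols <- c("x", "y")

# Replace NA with "" first; paste() would otherwise produce the string "NA"
clean <- lapply(df[cols], function(col) ifelse(is.na(col), "", col))
combined <- do.call(paste, clean)

# regmatches() returns character(0) when nothing matches, so a row without
# any words counts as 0 instead of 1
df$z <- lengths(regmatches(combined, gregexpr("\\w+", combined)))
df
```

Here the third row, where both cells are NA, gets a count of 0.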