I'm trying to get the average number of words per element of a character vector in R.
one <- c(9, 23, 43)
two <- c("this is a new york times article.", "short article.", "he went outside to smoke a cigarette.")
mydf <- data.frame(one, two)
mydf
# one two
# 1 9 this is a new york times article.
# 2 23 short article.
# 3 43 he went outside to smoke a cigarette.
I'm looking for a function that gives me the average number of words per element of the character vector "two".
The output here should be 5.3333 (= (7 + 2 + 7) / 3).
Here's a possibility with the qdap package:
library(qdap)
wc(mydf$two, FALSE)/nrow(mydf)
## [1] 5.333333
This is overkill but you could also do:
word_stats(mydf$two)
## all n.sent n.words n.char n.syl n.poly wps cps sps psps cpw spw pspw n.state proDF2 n.hapax n.dis grow.rate prop.dis
## 1 all 3 16 68 23 3 5.333 22.667 7.667 1 4.250 1.438 .188 3 1 12 2 .750 .125
The wps column is words per sentence, which matches the per-element average here because each element is a single sentence.
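Or, in base R, use gregexpr() to count the spaces in each string and add one. This assumes words are separated by single spaces and every string has at least two words, since a string with no space is also counted as 2:
mean(sapply(mydf$two, function(x) length(unlist(gregexpr(" ", x))) + 1))
## [1] 5.333333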
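Another base R option, not from the answers above, is to split each string on whitespace and count the pieces. This is only a sketch: it treats any run of whitespace as a word separator, and the names words and word_counts are made up for illustration. It needs R 3.2.0 or later for trimws() and lengths().

```r
# Example data from the question
two <- c("this is a new york times article.",
         "short article.",
         "he went outside to smoke a cigarette.")

# Trim leading/trailing whitespace, then split on runs of whitespace
words <- strsplit(trimws(two), "\\s+")

# lengths() returns the number of pieces (words) in each element
word_counts <- lengths(words)
word_counts
## [1] 7 2 7

# Average number of words per element
mean(word_counts)
## [1] 5.333333
```

Unlike counting spaces, this gives 1 for a single-word string and 0 for an empty one.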