
Average number of words in a character vector in R

I'm trying to get the average number of words per element of a character vector in R.

one <- c(9, 23, 43)
two <- c("this is a new york times article.", "short article.", "he went outside to smoke a cigarette.")

mydf <- data.frame(one, two)
mydf

#   one                                   two
# 1   9     this is a new york times article.
# 2  23                        short article.
# 3  43 he went outside to smoke a cigarette.

I'm looking for a function that gives me the average number of words per element of the character vector "two".

The output here should be 5.3333 (= (7 + 2 + 7) / 3).

cptn asked Oct 23 '25 22:10


2 Answers

Here's a possibility with the qdap package:

library(qdap)
wc(mydf$two, FALSE)/nrow(mydf)

## [1] 5.333333

This is overkill but you could also do:

word_stats(mydf$two)

##   all n.sent n.words n.char n.syl n.poly   wps    cps   sps psps   cpw   spw pspw n.state proDF2 n.hapax n.dis grow.rate prop.dis
## 1 all      3      16     68    23      3 5.333 22.667 7.667    1 4.250 1.438 .188       3      1      12     2      .750     .125

The wps column is words per sentence. Each element here is a single sentence (n.sent is 3), so wps matches the per-element average.

Tyler Rinker answered Oct 25 '25 10:10


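Or with gregexpr(), counting the spaces in each string and adding one:

# gregexpr() returns -1 when nothing matches, so count only positive positions;
# a one-word string then gives 1 rather than 2
mean(sapply(as.character(mydf$two), function(x) sum(gregexpr(" ", x, fixed = TRUE)[[1]] > 0) + 1))

## [1] 5.333333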
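
Counting single spaces assumes that words are always separated by exactly one space. A minimal base R sketch that splits on runs of whitespace instead, assuming "word" simply means a whitespace-delimited token (avg_words is a hypothetical helper name, not something from the answers here):

```r
# Sketch: average number of whitespace-delimited tokens per element.
# avg_words is a hypothetical helper name used only for illustration.
avg_words <- function(x) {
  x <- trimws(as.character(x))        # factor-safe; drop leading/trailing blanks
  mean(lengths(strsplit(x, "\\s+")))  # one token count per element, then average
}

two <- c("this is a new york times article.", "short article.",
         "he went outside to smoke a cigarette.")
avg_words(two)
## [1] 5.333333
```

With this definition an empty string counts as 0 words and repeated spaces are not counted twice. Punctuation stays attached to the neighbouring word, which does not change the count for the example data.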

Troy answered Oct 25 '25 12:10


