
How to count words in a multi-language text using Ruby & JavaScript

What I want to achieve is to get the word count in a multi-language text.

For example, given a text containing both English and Chinese, such as "The last Olympics was held in 北京", the count should be 8: six English words plus two Chinese characters, matching the word count in Microsoft Word.

What's the best way to do that in Ruby and in JavaScript?

larryzhao asked Nov 03 '25 06:11


1 Answer

I have a solution based on "how can i detect cjk characters in a string in ruby".

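s = 'The last Olympics was held in 北京'

class String
  # True if the string contains at least one Han, Katakana, Hiragana, or Hangul character.
  def contains_cjk?
    !!(self =~ /\p{Han}|\p{Katakana}|\p{Hiragana}|\p{Hangul}/)
  end
end

# Whitespace-separated tokens count as one word each,
# except tokens containing CJK, which count one word per character.
count = s.split.inject(0) do |sum, word|
  if word.contains_cjk?
    sum + word.length   # => ONLY works in Ruby 1.9+, where length counts characters.
                        #    In 1.8 it counts bytes, so another method is needed there.
  else
    sum + 1
  end
end
puts count  # => 8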
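For the JavaScript side, the same idea might be sketched roughly as follows. This is only a sketch under assumptions: it needs an engine with Unicode property escapes in regular expressions (ES2018 or later), and the `countWords` function and `CJK` constant are names made up here for illustration. It mirrors the Ruby logic, including its limitation that a token mixing Latin and CJK characters is counted one word per character.

```javascript
// Hypothetical helper for illustration; mirrors the Ruby snippet's logic.
// Matches a single Han, Katakana, Hiragana, or Hangul character.
const CJK = /\p{Script=Han}|\p{Script=Katakana}|\p{Script=Hiragana}|\p{Script=Hangul}/u;

function countWords(text) {
  return text
    .split(/\s+/)                          // split on runs of whitespace
    .filter((token) => token.length > 0)   // drop empty tokens from leading/trailing spaces
    .reduce((sum, token) => {
      if (CJK.test(token)) {
        // Array.from iterates by code point, so characters outside the
        // Basic Multilingual Plane are counted once rather than twice.
        return sum + Array.from(token).length;
      }
      return sum + 1;                      // any other token is one word
    }, 0);
}

console.log(countWords('The last Olympics was held in 北京')); // 8
```

Counting with `Array.from(token).length` rather than `token.length` is a deliberate choice: `length` counts UTF-16 code units, which would overcount rare Han characters encoded as surrogate pairs.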
halfelf answered Nov 04 '25 20:11


