Say I have a String. How do I determine the number of words in it? I'm trying to create an extension like:
extension String {
    var numberOfWords: Int {
        // Insert string-counting code here
    }
}
If you search "word count string swift" you'll find dozens of Stack Overflow answers and gists that tell you to split the string using str.components(separatedBy: " ").count.
DON'T USE components(separatedBy:)!!!
Many non-European languages (particularly East Asian languages) don't use spaces to split words. This will also incorrectly count hyphenated words as separate, and lone punctuation as a word.
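To make the problem concrete, here is a minimal sketch of the naive approach misbehaving. The function name naiveWordCount is hypothetical and only used for illustration; the inputs are made-up examples.

```swift
import Foundation

// Hypothetical helper for illustration: the naive approach of splitting
// on a single space and counting the resulting pieces.
func naiveWordCount(_ text: String) -> Int {
    return text.components(separatedBy: " ").count
}

print(naiveWordCount("Hello,  world"))  // 3: the double space produces an empty component
print(naiveWordCount("rock - paper"))   // 3: the lone hyphen is counted as a word
print(naiveWordCount("你好世界"))         // 1: no spaces, so the whole phrase is one "word"
print(naiveWordCount(""))               // 1: an empty string still yields one (empty) component
```

None of these results match what a reader would call the word count, which is why a tokenizer that understands word boundaries is the better tool.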
The most correct AND most performant way to solve this problem is to use either enumerateSubstrings(in:options:) or CFStringTokenizer.
// enumerateSubstrings
import Foundation

extension String {
    var numberOfWords: Int {
        var count = 0
        let range = startIndex..<endIndex
        // .byWords visits each word; .substringNotRequired skips building each substring.
        enumerateSubstrings(in: range, options: [.byWords, .substringNotRequired, .localized], { _, _, _, _ -> () in
            count += 1
        })
        return count
    }
}
OR:
// CFStringTokenizer
import Foundation

extension String {
    var numberOfWords: Int {
        // CFString ranges are measured in UTF-16 code units.
        let inputRange = CFRangeMake(0, utf16.count)
        let flag = UInt(kCFStringTokenizerUnitWord)
        let locale = CFLocaleCopyCurrent()
        let tokenizer = CFStringTokenizerCreate(kCFAllocatorDefault, self as CFString, inputRange, flag, locale)
        var tokenType = CFStringTokenizerAdvanceToNextToken(tokenizer)
        var count = 0
        // An empty token type means the tokenizer has run out of tokens.
        while tokenType != [] {
            count += 1
            tokenType = CFStringTokenizerAdvanceToNextToken(tokenizer)
        }
        return count
    }
}
Both are very performant, but enumerateSubstrings(in:options:...) is about twice as fast.
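If you want to check the speed difference on your own data, a rough timing harness along these lines should do. This is a sketch, not a rigorous benchmark: averageSeconds, wordCountEnumerating, wordCountTokenizing, the sample text, and the run count are all names and values made up for illustration, and it assumes an Apple platform where CoreFoundation is available through Foundation.

```swift
import Foundation

// Hypothetical helper (not from the answer): mean wall-clock seconds per call.
func averageSeconds(runs: Int, _ body: () -> Void) -> Double {
    precondition(runs > 0, "runs must be positive")
    let start = ProcessInfo.processInfo.systemUptime
    for _ in 0..<runs { body() }
    return (ProcessInfo.processInfo.systemUptime - start) / Double(runs)
}

// Same logic as the enumerateSubstrings extension, written as a free function.
func wordCountEnumerating(_ text: String) -> Int {
    var count = 0
    text.enumerateSubstrings(in: text.startIndex..<text.endIndex,
                             options: [.byWords, .substringNotRequired, .localized]) { _, _, _, _ in
        count += 1
    }
    return count
}

// Same logic as the CFStringTokenizer extension, written as a free function.
func wordCountTokenizing(_ text: String) -> Int {
    let range = CFRangeMake(0, text.utf16.count)
    let tokenizer = CFStringTokenizerCreate(kCFAllocatorDefault, text as CFString, range,
                                            UInt(kCFStringTokenizerUnitWord), CFLocaleCopyCurrent())
    var count = 0
    while CFStringTokenizerAdvanceToNextToken(tokenizer) != [] {
        count += 1
    }
    return count
}

// Made-up sample input; substitute text that resembles your real workload.
let sample = String(repeating: "The quick brown fox jumps over the lazy dog. ", count: 10_000)
let enumerating = averageSeconds(runs: 20) { _ = wordCountEnumerating(sample) }
let tokenizing = averageSeconds(runs: 20) { _ = wordCountTokenizing(sample) }
print("enumerateSubstrings: \(enumerating) s per run")
print("CFStringTokenizer:   \(tokenizing) s per run")
```

Build in release mode before drawing conclusions, and expect the ratio to vary with text length, language, and OS version.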
I'm shocked that nobody is pointing this out elsewhere, so I hope people searching for a solution find this.