Kotlin check for words in string

Question

I have a NSFW class that scans texts like item names and descriptions against a list of known NSFW-words.

What would be the best approach to test a list of strings like

    val nsfw = listOf(
    "badword",
    "curseword",
    "ass",
    ... 200+ more
    )

against a string like:

This is the text that contains a badword // returns true

Please note that I need to check for full words, not parts of words.

So the sentence:

The grass is green // returns false

Because grass is not a bad word.

I've tried something like this, but it doesn't check for full words.

        val result =  nsfw.filter { it in sentence.toLowerCase() }

Wiktor Stribiżew · Accepted Answer

You may build a regex like

\b(?:word1|word2|word3...)\b

See the regex demo. Then, use it with the Regex.containsMatchIn method:

val nsfw = listOf(
    "badword",
    "curseword",
    "ass"
)
val s1 = "This is the text that contains a badword"
val s2 = "The grass is green"
// Backslashes are doubled: in a regular Kotlin string, "\b" is a backspace character.
val rx = Regex("\\b(?:${nsfw.joinToString(separator="|")})\\b")
println(rx.containsMatchIn(s1)) // => true
println(rx.containsMatchIn(s2)) // => false

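Here, nsfw.joinToString(separator="|") joins the words with a pipe (the alternation operator), and "\\b(?:${nsfw.joinToString(separator="|")})\\b" wraps that alternation in a non-capturing group between two word boundaries. The backslashes are doubled because this is a regular Kotlin string literal, where a single "\b" would mean a backspace character.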
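Since the question lowercases the sentence and collects the matching words, the same regex can also be made case-insensitive and used to list what was found. This is only a minimal sketch: buildNsfwRegex and findNsfwWords are hypothetical helper names, the words are assumed to contain no regex metacharacters, and lowercase() assumes Kotlin 1.5 or newer.

```kotlin
// Hypothetical helper: builds one case-insensitive, whole-word regex from the word list.
// Assumes the words contain no regex metacharacters.
fun buildNsfwRegex(words: List<String>): Regex =
    Regex("\\b(?:${words.joinToString("|")})\\b", RegexOption.IGNORE_CASE)

// Hypothetical helper: returns each listed word found in the text,
// lowercased and without duplicates.
fun findNsfwWords(rx: Regex, text: String): List<String> =
    rx.findAll(text).map { it.value.lowercase() }.distinct().toList()

fun main() {
    val rx = buildNsfwRegex(listOf("badword", "curseword", "ass"))
    println(findNsfwWords(rx, "A BadWord and another badword")) // [badword]
    println(findNsfwWords(rx, "The grass is green"))            // []
}
```

Building the Regex once and reusing it for every item name or description avoids recompiling a 200+ word alternation on each check.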

If your words may contain special regex metacharacters, like +, ?, (, ), etc., you need to "preprocess" the nsfw values with the Regex.escape method:

val rx = Regex("\\b(?:${nsfw.map{Regex.escape(it)}.joinToString("|")})\\b")
                            ^^^^^^^^^^^^^^^^^^^^^^

One more thing: if the keywords may start or end with characters other than letters, digits and underscores, you cannot rely on \b word boundaries. You may either

Use whitespace boundaries: val rx = Regex("(?<!\\S)(?:${nsfw.map{Regex.escape(it)}.joinToString("|")})(?!\\S)")
Use unambiguous word boundaries: val rx = Regex("(?<!\\w)(?:${nsfw.map{Regex.escape(it)}.joinToString("|")})(?!\\w)")
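To see how the three boundary styles differ, here is a small comparison sketch. The keywords "@ss" and "bad!" and all function names are hypothetical and chosen only for illustration. The code assumes Kotlin on the JVM, where \w matches ASCII letters, digits and underscore by default.

```kotlin
// Escapes each keyword and joins them into one alternation.
fun alternationOf(keywords: List<String>): String =
    keywords.joinToString("|") { Regex.escape(it) }

// \b only matches between a word character and a non-word character.
fun wordBoundaryRegex(keywords: List<String>) =
    Regex("\\b(?:${alternationOf(keywords)})\\b")

// Requires whitespace (or the start/end of the text) on both sides.
fun whitespaceBoundaryRegex(keywords: List<String>) =
    Regex("(?<!\\S)(?:${alternationOf(keywords)})(?!\\S)")

// Requires that no word character touches the match on either side.
fun unambiguousBoundaryRegex(keywords: List<String>) =
    Regex("(?<!\\w)(?:${alternationOf(keywords)})(?!\\w)")

fun main() {
    // Hypothetical keywords that begin or end with a non-word character.
    val keywords = listOf("@ss", "bad!")
    val texts = listOf("what an @ss move", "what an @ss, really", "gr@ss")
    for (text in texts) {
        println(text)
        println("  \\b boundaries:         " + wordBoundaryRegex(keywords).containsMatchIn(text))
        println("  whitespace boundaries: " + whitespaceBoundaryRegex(keywords).containsMatchIn(text))
        println("  unambiguous boundaries: " + unambiguousBoundaryRegex(keywords).containsMatchIn(text))
    }
}
```

With these inputs, \b misses the standalone "@ss" (there is no word boundary between a space and "@") but wrongly matches inside "gr@ss". The whitespace version matches only when the keyword is surrounded by whitespace, so it misses "@ss," with a trailing comma. The unambiguous version matches both standalone cases and rejects "gr@ss".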

forpas · Answer

You can use split() on the string that you want to check, with whitespace as the delimiter, to get a list of its words. This does not guarantee that every word is extracted cleanly, because other separators such as dots or commas stay attached to the words. If that suits you, do this:

val nsfw = listOf(
    "badword",
    "curseword",
    "ass"
)

val str = "This is the text that contains a badword"
// "\\s+" needs a doubled backslash in a regular Kotlin string; "\s" does not compile.
val words = str.toLowerCase().split("\\s+".toRegex())
val containsBadWords = words.any { it in nsfw }
println(containsBadWords)

will print

true

If you want a list of the "bad words":

val badWords = words.filter { it in nsfw }
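If punctuation is a concern, one possible variation is to split on runs of non-word characters instead of whitespace and to keep the word list in a Set for fast lookups. This is a sketch under assumptions, not part of the original answer: findBadWords is a hypothetical helper, lowercase() assumes Kotlin 1.5 or newer, and on the JVM \W treats non-ASCII letters as separators by default.

```kotlin
// Hypothetical helper: lowercases the text, splits it on runs of non-word
// characters (spaces, commas, dots, ...) and keeps only the listed words.
fun findBadWords(text: String, nsfw: Set<String>): List<String> =
    text.lowercase()
        .split(Regex("\\W+"))
        .filter { it in nsfw }

fun main() {
    val nsfw = setOf("badword", "curseword", "ass")
    println(findBadWords("Badword, or a curseword.", nsfw)) // [badword, curseword]
    println(findBadWords("The grass is green", nsfw))       // []
}
```

This only works for list entries made of letters, digits and underscores, because any other character is treated as a separator.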

Tags: regex, android, kotlin

sn0ep
