I'm trying to get a list of the most frequent words appearing in a column.
    SELECT
        word,
        sum(nentry) AS nentry
    FROM ts_stat(
        $$
        SELECT to_tsvector('simple', body)
        FROM document
        $$
    )
    GROUP BY word
This works pretty well, but the problem is that the documents contain words in both French and English. If I use the English dictionary for stop words, the most frequent word I get is `pour`, and it's `the` when I use the French one. Those are two words I obviously want to exclude.
Is there a way to create a configuration that uses two different dictionaries for stop words?
You should create a stop word file that is the union of the French and English stop word files, and create a `simple` dictionary with that stop word file.
Then create a text search configuration that uses this dictionary for the `asciiword` and `word` token types, and use this configuration.
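As a rough sketch of those steps, something like the following should work. The names `english_french` and `english_french_stop` are made up for illustration, and it assumes you have already placed a file called `english_french.stop` (the merged contents of the stock `english.stop` and `french.stop`, one word per line) in the `tsearch_data` directory under PostgreSQL's share directory, which `pg_config --sharedir` prints.

```sql
-- Assumption: $SHAREDIR/tsearch_data/english_french.stop already exists
-- and contains the union of the English and French stop words.

-- A dictionary based on the "simple" template that lowercases each token
-- and drops it if it appears in the combined stop word file.
CREATE TEXT SEARCH DICTIONARY english_french_stop (
    TEMPLATE = pg_catalog.simple,
    STOPWORDS = english_french
);

-- Start from the built-in "simple" configuration so no stemming is applied.
CREATE TEXT SEARCH CONFIGURATION english_french (
    COPY = pg_catalog.simple
);

-- Route plain words (ASCII-only and non-ASCII) through the new dictionary.
ALTER TEXT SEARCH CONFIGURATION english_french
    ALTER MAPPING FOR asciiword, word
    WITH english_french_stop;

-- The original query, now using the new configuration.
SELECT word, nentry
FROM ts_stat(
    $$
    SELECT to_tsvector('english_french', body)
    FROM document
    $$
)
ORDER BY nentry DESC
LIMIT 20;
```

Other token types (numbers, URLs, hyphenated word parts and so on) keep the plain `simple` mapping copied from the base configuration, so they are not filtered by the stop word list. The `ORDER BY` and `LIMIT 20` are only there to show the top of the list; adjust or drop them as needed.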