is there a magic function or operator to ignore some tokens?
select to_tsvector('the quick. brown fox') @@ 'brown' -- returns true
select to_tsvector('the quick,brown fox') @@ 'brown' -- returns true
select to_tsvector('the quick.brown fox') @@ 'brown' -- returns false, should return true
select to_tsvector('the quick/brown fox') @@ 'brown' -- returns false, should return true
Yes, You Can Keep Full-Text Search in Postgres You can get even deeper and make your Postgres full-text search even more robust, by implementing features such as highlighting results, or writing your own custom dictionaries or functions.
The Good Ol' Text Search. You're probably familiar with pattern search, which has been part of the standard SQL since the beginning, and available to every single SQL-powered database: SELECT column_name FROM table_name WHERE column_name LIKE 'pattern'; That will return the rows where column_name matches the pattern .
Basic Text Matching. Full text searching in PostgreSQL is based on the match operator @@ , which returns true if a tsvector (document) matches a tsquery (query).
to_tsvector parses a textual document into tokens, reduces the tokens to lexemes, and returns a tsvector which lists the lexemes together with their positions in the document. The document is processed according to the specified or default text search configuration.
I'm afraid that you are probably stuck. If you run your terms through ts_debug you will see that 'quick.brown' is parsed as a hostname and 'quick/brown' is parsed as filesystem path. The parser really isn't that clever sadly.
My only suggestion is that you preprocess your texts to convert these tokens to spaces. You could easily create a function in plpgsql to do that.
nicg=# select ts_debug('the quick.brown fox');
                              ts_debug
---------------------------------------------------------------------
 (asciiword,"Word, all ASCII",the,{english_stem},english_stem,{})
 (blank,"Space symbols"," ",{},,)
 (host,Host,quick.brown,{simple},simple,{quick.brown})
 (blank,"Space symbols"," ",{},,)
 (asciiword,"Word, all ASCII",fox,{english_stem},english_stem,{fox})
(5 rows)
As you can see from the above you don't get tokens for quick and brown
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With