Premise: Suppose you have a table containing words, where some may be distinct and some "may overlap", meaning a longer word starts with a shorter one, eg:
---------------
| word |
---------------
| dog | *
| games | *
| stat |
| state |
| statement | *
| fulfill |
| fulfilled | *
| fulfillment | *
---------------
Question: How does one write a query that returns the list of non-overlapping + the longest overlapping words in such cases?
In the example above, the desired words are identified by a *, as per the following extended explanation:
dog and games don't overlap with anything, so they're the longest in their "solo/distinct" categorystatement overlaps with state and stat and is the longestfulfilled overlaps with fulfill and is longer (does NOT overlap with fulfillment)
fulfillment overlaps with fulfill and is longer (does NOT overlap with fulfilled)
Note: Please be aware that for the sake of simplicity, the data sample is reduced. In reality there are a few million records to query and, there are no search terms known before hand, so it's not possible to directly use constructs such as WHERE word LIKE 'stat%'. Not sure if relevant or not, but the words have a relatively short max length, eg 20.
Something like
select word
from your_table t1
where not exists (
select word
from your_table t2
where t2.word like t1.word || '_%'
)
;
The query would benefit from the existence of an index on word. But even then it may take a long time. In any case, you can give it a try and let us know what happens.
As long as you compare prefixes and if one word is entirely included as prefix of the other, you may try match_recognize to sequentially check prefix-matching in one pass.
But exists is more clear, though you should examine performace on your real dataset with index on word.
with a as ( select column_value as word from sys.odcivarchar2list( 'dog' , 'games' , 'stat' , 'state' , 'statement' , 'fulfill' , 'fulfilled' , 'fulfillment' ) ) select * from a match_recognize( order by word desc /*Longest first*/ measures a.word as word one row per match /*Exclude any number of words each of which is a prefix of the previous one */ pattern (a {- b* -}) define /*Current word is a prefix of the previous*/ b as prev(word) like word || '%' ) t| WORD | | :---------- | | statement | | games | | fulfillment | | fulfilled | | dog |
db<>fiddle here
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With