I need to define synonyms for about 100 words of my choice. For testing I am adding the entries manually:
t = {}
t.update({'Strong': ['Strong', 'Able', 'Active', 'Big', 'Energy', 'Firm',
                     'Force', 'Heavy', 'Robust', 'Secure', 'Solid',
                     'Stable', 'Steady', 'Tough', 'Vigor', 'Might',
                     'Rugged', 'Sound']})
t.update({'Fast': ['Fast', 'Agile', 'Brisk', 'Hot', 'Quick', 'Rapid',
                   'Swift', 'Accel', 'Active', 'Dash', 'Flash', 'Fly',
                   'Race', 'Snap', 'Wing', 'Streak', 'Time', 'Chop',
                   'Jiffy', 'Split', 'Bat', 'Crazy', 'Double', 'Scream',
                   'Sonic', 'Super', 'Ball', 'Speed']})
So I am creating an empty dictionary, then taking words like "Strong" and "Fast" and mapping them to lists of synonyms (which I need to be able to choose myself).
Since I only need about 100 different word mappings, is this a reasonable approach? Or is there a better way to implement this?
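For what it's worth, the plain-dict approach can be wrapped in a small lookup helper so that missing words don't raise a KeyError. This is just a sketch; `get_synonyms` is a hypothetical name, not anything from a library:

```python
# A plain dict of word -> list of synonyms, as in the question,
# plus a small helper for safe lookups (hypothetical name).
synonyms = {
    'Strong': ['Strong', 'Able', 'Robust', 'Solid'],
    'Fast': ['Fast', 'Quick', 'Rapid', 'Swift'],
}

def get_synonyms(word):
    """Return the synonym list for word, or an empty list if unknown."""
    return synonyms.get(word, [])

print(get_synonyms('Fast'))     # ['Fast', 'Quick', 'Rapid', 'Swift']
print(get_synonyms('Missing'))  # []
```

For ~100 entries this is perfectly reasonable; dict lookups are O(1) and the memory cost is negligible at that scale.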
I am also looking at using NLTK and its WordNet module. However, that module takes a while to run, and it seems to offer no way of adding my own synonyms the way I need.
You could organize your thesaurus in a graph fashion. First, keep all the words in a dictionary mapping word -> index, and then build the synonym links as an adjacency-list graph, since the graph will be sparse.
w = {'Fast': 0, 'Strong': 1, 'Able': 2, 'Active': 3, 'Big': 4, ...}
t = {0: [1, 2, 3, ...], ...}
This would scale better for large data sets, since storing small integers is cheaper than repeating the same strings in every synonym list.
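As a rough sketch of that layout (the variable names and sample words here are made up for illustration), you can build the index map and the adjacency lists from a string-keyed dict like the one in the question:

```python
# Sketch of the int-indexed layout: assign each word an index,
# then store synonym links as lists of indices (an adjacency list).
raw = {
    'Fast': ['Quick', 'Rapid'],
    'Strong': ['Robust', 'Solid'],
}

# Collect every word (keys and synonyms) and number them.
words = []   # index -> word
index = {}   # word -> index
for key, syns in raw.items():
    for w in [key] + syns:
        if w not in index:
            index[w] = len(words)
            words.append(w)

# Adjacency list: key index -> list of synonym indices.
graph = {index[k]: [index[s] for s in syns] for k, syns in raw.items()}

# Look up synonyms of 'Fast' by going index -> neighbours -> words.
print([words[i] for i in graph[index['Fast']]])  # ['Quick', 'Rapid']
```

Each word string is stored exactly once, and all the cross-references are small integers.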
In an actual thesaurus, individual words may belong to multiple sets of synonyms. For example, "fast" as in quick might be in one list, while "fast" as in secure might be in another.
I would map each word to a list of "sense groups," and then each sense group would map to a list of words.
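A minimal sketch of that two-level mapping, with made-up group ids and a hypothetical `synonyms_of` helper:

```python
# Sense groups: each group id maps to its member words, and each
# word maps back to the groups it belongs to. Group ids are made up.
groups = {
    0: ['Fast', 'Quick', 'Rapid'],   # fast as in quick
    1: ['Fast', 'Secure', 'Firm'],   # fast as in firmly fixed
}

# Invert: word -> list of group ids it appears in.
word_to_groups = {}
for gid, members in groups.items():
    for w in members:
        word_to_groups.setdefault(w, []).append(gid)

def synonyms_of(word):
    """All words sharing any sense group with `word`, excluding itself."""
    result = []
    for gid in word_to_groups.get(word, []):
        for w in groups[gid]:
            if w != word and w not in result:
                result.append(w)
    return result

print(synonyms_of('Fast'))  # ['Quick', 'Rapid', 'Secure', 'Firm']
```

This mirrors how WordNet organizes synsets: a word's synonyms depend on which sense of the word you mean.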