I am looking for a Python library, algorithm, or paper to extract a list of groceries from free text.
For example:
"One salad and two beers"
Should be converted to:
{'salad':1, 'beer': 2}
In [1]: from word2number import w2n
In [2]: print(w2n.word_to_num("One"))
1
In [3]: print(w2n.word_to_num("Two"))
2
In [4]: print(w2n.word_to_num("Thirty five"))
35
You can convert number words to integers with this package and implement the rest according to your needs.
To install the package:
pip install word2number
Update
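You can implement it like this:
from word2number import w2n

result = {}
text = "One salad and two beers"
words = text.split()
for pos, word in enumerate(words[:-1]):
    try:
        # word_to_num raises ValueError for words that are not numbers
        count = w2n.word_to_num(word)
    except ValueError:
        continue
    # Assume the item name is the word right after the number
    result[words[pos + 1]] = count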
Result
{'beers': 2, 'salad': 1}
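Note that the result keeps the plural "beers" as the key, while the question asked for "beer", and digits such as "2 beers" may also show up in real input. Below is a minimal, dependency-free sketch of one way to cover both. The `NUMBER_WORDS` table is a hypothetical stand-in for word2number that covers only a few words, and `to_count`, `singularize`, and `parse_order` are illustrative names, not part of any library.

```python
import re

# Hypothetical stand-in for word2number so the sketch has no dependencies;
# it covers only a handful of number words.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
                "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}


def to_count(token):
    """Return the quantity a token stands for, or None if it is not a number."""
    if token.isdigit():
        return int(token)
    return NUMBER_WORDS.get(token)


def singularize(noun):
    """Naive plural stripping: 'beers' -> 'beer'. Wrong for plurals like 'tomatoes'."""
    if noun.endswith("s") and not noun.endswith("ss"):
        return noun[:-1]
    return noun


def parse_order(text):
    """Map each item that directly follows a quantity to that quantity."""
    tokens = re.findall(r"[a-z]+|\d+", text.lower())
    order = {}
    for token, following in zip(tokens, tokens[1:]):
        count = to_count(token)
        if count is not None and to_count(following) is None:
            item = singularize(following)
            order[item] = order.get(item, 0) + count
    return order


print(parse_order("One salad and two beers"))  # {'salad': 1, 'beer': 2}
```

This still assumes the item is a single word immediately after the quantity, so phrases like "two cold beers" or "a bottle of wine" would need more work.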
I suggest using WordNet. You can call it from Java (via the JWNL library) or from other languages. Here is the suggestion: for each single word, check its hypernyms. For edibles, near the top of the hypernym hierarchy you will find "food, nutrient", which is probably what you want. To test this, query the word "beer" in the online version, click on the "S", and then click on "inherited hypernym". You will find this somewhere in the hierarchy:
....
    S: (n) beverage, drink, drinkable, potable (any liquid suitable for drinking) "may I take your beverage order?"
        S: (n) food, nutrient (any substance that can be metabolized by an animal to give energy and build tissue) 
          ....
You can traverse this hierarchy using the programming language of your choice. Once you have flagged all the edibles, you can catch the number (i.e. 2 in "2 beers"), and then you have all the information you need. Note that catching the numbers can itself be a decent coding task! Hope it helps!
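To make the traversal idea concrete, here is a rough Python sketch. It does not call WordNet: `HYPERNYM` is a hand-written toy map (child to parent) with the intermediate levels collapsed, and `is_edible` is an illustrative name. With real WordNet data a word can have several senses and several hypernyms per sense, so you would have to walk every path rather than a single chain.

```python
# Hand-written toy hierarchy (child -> parent), NOT real WordNet data.
# Intermediate levels (e.g. between "beer" and "beverage") are collapsed.
HYPERNYM = {
    "beer": "beverage",
    "beverage": "food",
    "salad": "dish",
    "dish": "food",
    "hammer": "tool",
}


def is_edible(word, root="food"):
    """Walk up the hypernym chain and report whether it reaches `root`."""
    seen = set()  # guards against cycles in a malformed map
    while word is not None and word not in seen:
        if word == root:
            return True
        seen.add(word)
        word = HYPERNYM.get(word)
    return False


candidates = ["salad", "beer", "hammer"]
edibles = [w for w in candidates if is_edible(w)]
print(edibles)  # ['salad', 'beer']
```

Only the words flagged this way would then be paired with the quantity that precedes them.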