Python: How to extract addresses from a sentence/paragraph (non-Regex approach)?

I was working on a project that required extracting addresses from a sentence.

For example, given the input sentence: Hi, Mr. Sam D. Richards lives here Shop No / 123, 3rd Floor, ABC Building, Behind CDE Mart, Aloha Road, 12345. If you need any help, call me on 12345678

I am trying to extract just the address, i.e. Shop No / 123, 3rd Floor, ABC Building, Behind CDE Mart, Aloha Road, 12345

What I have tried so far:

I tried Pyap, but it is also regex-based, so it does not generalize well to addresses from countries other than the US, Canada, and the UK. I realized that regex is not an option here, since neither the addresses nor the surrounding sentences follow any fixed pattern. I also tried locationtagger, but it only returns the country or the city.

Is there any better way of doing it?


1 Answer

If there is no obvious pattern for a regex, you can try an ML-based approach. This is a well-known problem called named entity recognition (NER), and it is typically solved as sequence tagging: a model is trained to predict, for each token (e.g. a word or a subword), whether it is part of an address or not.
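
To make the sequence-tagging idea concrete, here is a minimal sketch of the step after the model has run: turning per-token labels into address strings. It assumes the common BIO labelling scheme (`B-ADDR` begins an address, `I-ADDR` continues it, `O` is outside), which is a refinement of the plain "address / not address" labels; the function name `extract_address_spans` and the hand-written tags are hypothetical stand-ins for a real model's predictions.

```python
from typing import List


def extract_address_spans(tokens: List[str], tags: List[str]) -> List[str]:
    """Group consecutive B-ADDR/I-ADDR tokens into address strings (BIO scheme)."""
    spans: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "B-ADDR":
            if current:  # a new address begins; close the previous one
                spans.append(" ".join(current))
            current = [token]
        elif tag == "I-ADDR" and current:
            current.append(token)  # continue the open address
        else:
            # "O", or an I-ADDR with no open address, ends any current span
            if current:
                spans.append(" ".join(current))
            current = []
    if current:  # address running to the end of the sentence
        spans.append(" ".join(current))
    return spans


# Hand-written tags standing in for a trained model's predictions.
tokens = ["Sam", "lives", "here", "Shop", "No", "/", "123,", "Aloha", "Road,",
          "12345", ".", "Call", "me"]
tags = ["O", "O", "O", "B-ADDR", "I-ADDR", "I-ADDR", "I-ADDR", "I-ADDR",
        "I-ADDR", "I-ADDR", "O", "O", "O"]
print(extract_address_spans(tokens, tags))  # ['Shop No / 123, Aloha Road, 12345']
```

Joining tokens with single spaces is a simplification; a real pipeline would usually keep character offsets so the address can be sliced out of the original sentence with its exact spacing.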

You can look for a model that has already been trained to extract addresses (e.g. https://huggingface.co/models?search=address), or fine-tune a BERT-based model on your own dataset (here is a recipe).
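
For the fine-tuning route, each training sentence needs per-token labels. The sketch below assumes BIO labels and uses hypothetical helper names (`word_bio_labels`, `align_labels`). It turns an annotated (sentence, address) pair, like the input/output pair in the question, into word-level labels, then aligns them to subword tokens in the way Hugging Face token-classification examples commonly do: only the first subword of each word keeps the label, and special tokens and continuation subwords get -100, which PyTorch's `CrossEntropyLoss` ignores by default. With a fast tokenizer, `tokenizer(words, is_split_into_words=True).word_ids()` gives the word index of each subword (`None` for special tokens). The `word_ids` list here is made up so the snippet runs without downloading a model.

```python
from typing import List, Optional, Tuple

# Hypothetical label set; the BIO scheme is an assumption, not required by NER in general.
LABEL2ID = {"O": 0, "B-ADDR": 1, "I-ADDR": 2}
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def word_bio_labels(sentence: str, address: str) -> Tuple[List[str], List[str]]:
    """Split a sentence on whitespace and tag the words of a known address.

    Assumes the address occurs verbatim and starts/ends on word boundaries.
    """
    if not address.strip():
        raise ValueError("address must not be empty")
    start = sentence.find(address)
    if start == -1:
        raise ValueError("address not found in sentence")
    before = sentence[:start].split()
    inside = address.split()
    after = sentence[start + len(address):].split()
    labels = (["O"] * len(before)
              + ["B-ADDR"] + ["I-ADDR"] * (len(inside) - 1)
              + ["O"] * len(after))
    return before + inside + after, labels


def align_labels(word_ids: List[Optional[int]], word_labels: List[str]) -> List[int]:
    """Map word-level labels onto subword tokens for training."""
    aligned: List[int] = []
    previous: Optional[int] = None
    for wid in word_ids:
        if wid is None:
            aligned.append(IGNORE_INDEX)  # special tokens such as [CLS] / [SEP]
        elif wid != previous:
            aligned.append(LABEL2ID[word_labels[wid]])  # first subword of a word
        else:
            aligned.append(IGNORE_INDEX)  # continuation subwords are ignored
        previous = wid
    return aligned


sentence = "Sam lives here Shop No / 123, Aloha Road, 12345. Call me"
words, labels = word_bio_labels(sentence, "Shop No / 123, Aloha Road, 12345")
print(list(zip(words, labels)))

# Made-up word_ids: [CLS], then one subword per word except "Aloha" (index 7),
# which is pretended to split into two subwords, then [SEP].
word_ids = [None, 0, 1, 2, 3, 4, 5, 6, 7, 7, 8, 9, 10, 11, 12, None]
print(align_labels(word_ids, labels))
```

Labelling only the first subword is one common choice; another is to copy the word's label to every subword, which changes how predictions are mapped back to words.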

David Dale answered Jan 26 '26 10:01