I'm looking for a regex pattern to match German addresses.
The problem is that the format is irregular and varies from entry to entry. Samples:
Falcken Str. 45 F
Heinrich-Heine-Straße 62A, Berlin-Kreuzberg
Lindenstrasse 113; Kreuzberg; 10969 Berlin
Erkstrasse 7; Neuköln; 12043 Berlin
Werbellin Strasse 69; Neuköln; 12053 Berlin
Anschrift; Rudolfstrasse 8-10; Friedrichshain; 10245 Berlin
Maybachufer 3, Neukölln, 12047, Berlin, Germany (?)
Skalitzer Strasse 31-32; Kreuzberg; 10999 Berlin
Mühlen Strasse 17; Friedrichshain; 10243 Berlin
Am Flutgraben 1; Treptow; 12435 Berlin; Germany (?)
Rigaer Strasse 89; Friedrichshain; 10247 Berlin
Köpenicker Str. 12, 10997 Berlin-Kreuzberg
Schliemannstraße 27; 10437; Berlin
Michaelkirchstr. 32, 10179 Berlin
Maybachufer 44, Neukölln, 12045, Berlin, Germany
Alexanderstrasse 11; Mitte; 10178 Berlin
Café Dritter Raum - Hertzbergstr. 14 - 12055 Berlin
I've tried to split them into groups (at least [address] [zipcode] [Berlin]),
but I couldn't match all of them. The best I could come up with was
^([a-zäöüß\s\d.,-]+?)\s*([\d\s]+(?:\s?[-|+/]\s?\d+)?\s*[a-z]?)?;*\s*(\d{5})\s*(.+)?$
(thanks to another question on Stack Overflow).
Any ideas?
Irregular data leads to inconsistent results. In addition, regular expressions are not the right hammer for every crystal decanter.
From a practical point of view, I'd just parse the standardized addresses (whatever that means for German addresses), and dump the leftovers to another file for manual address correction. If most of your addresses are malformed, then you might need to get access to an address-correction database of some sort--usually commercial, and often available from the postal service involved.
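As a rough sketch of that "parse what you can, set aside the rest" approach, something like the following might work for the samples in the question. It is not a general German address parser: the separator set, the house-number shape, and the names `parse_address` and `sort_addresses` are my own assumptions drawn from the sample lines, and it assumes the street field comes before the zip code.

```python
import re

# Separators seen in the samples: ";" or ",", or a dash with whitespace on both sides.
# A bare hyphen (as in "Heinrich-Heine-Straße" or "8-10") is not treated as a separator.
SEPARATOR = re.compile(r"\s*[;,]\s*|\s+-\s+")
# German postal codes are five digits.
ZIP = re.compile(r"\b\d{5}\b")
# Street name, then a house number such as "113", "8-10", "62A" or "45 F" (assumed shapes).
STREET = re.compile(
    r"^(?P<street>\D+?)\s*(?P<number>\d+(?:\s*-\s*\d+)?(?:\s?[A-Za-z])?)$"
)


def parse_address(line):
    """Return a dict of address parts, or None if street or zip code is missing."""
    fields = [f for f in SEPARATOR.split(line.strip()) if f]
    street = number = zipcode = city = None
    for field in fields:
        zip_match = ZIP.search(field)
        if zipcode is None and zip_match:
            zipcode = zip_match.group()
            # "10969 Berlin" carries the city in the same field; "10437" does not.
            city = field[zip_match.end():].strip() or None
        elif zipcode is not None and city is None:
            # Zip code stood alone, so take the next field as the city.
            city = field
        elif street is None:
            street_match = STREET.match(field)
            if street_match:
                street = street_match.group("street").strip()
                number = street_match.group("number")
    if street is None or zipcode is None:
        return None
    return {"street": street, "number": number, "zipcode": zipcode, "city": city}


def sort_addresses(lines):
    """Split lines into parsed addresses and leftovers for manual correction."""
    parsed, leftovers = [], []
    for line in lines:
        result = parse_address(line)
        if result is None:
            leftovers.append(line)
        else:
            parsed.append(result)
    return parsed, leftovers
```

With the sample data, lines such as "Falcken Str. 45 F" and "Heinrich-Heine-Straße 62A, Berlin-Kreuzberg" have no zip code, so they end up in `leftovers`; that list is what you would write to the separate file for manual correction. Splitting into fields first keeps each pattern small, which tends to be easier to adjust than one long expression when a new format turns up.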