I'm currently working on a script to geocode a list of addresses so they're ready for analysis and visualization. This is my first time working with geocoding, so I'm struggling and not sure whether I'm missing something obvious. I'm using Google's Geocoding API.
The general outline of my (not optimized) process is to turn a column in a DataFrame containing addresses into a list. Then, I create a new list from that one using a list comprehension where each element is a subset of the data I got back.
key = ...
city = "Long Beach"
state = "CA"
addresses = df["Address"].values.tolist()
geocodes = [geocode(x,city,state,key) for x in addresses]
The actual function I'm using for the geocoding is below. It takes in the address, the city/state parameters that give a more complete address, and my API key. Then, it just makes the call and returns a list of the three elements that I'm looking for in the response.
import time
import requests

def geocode(address, city, state, key):
    time.sleep(.05)  # crude throttle: at most about 20 requests per second
    params = f"{address.lower()} {city}, {state}".replace(" ","+")
    request_url = "https://maps.googleapis.com/maps/api/geocode/json?address="+params+f"&key={key}"
    response = requests.get(request_url).json()
    # Use the first match; the third address component is taken to be the neighborhood
    neighborhood = response["results"][0]["address_components"][2]["long_name"]
    lat = response["results"][0]["geometry"]["location"]["lat"]
    lon = response["results"][0]["geometry"]["location"]["lng"]
    return [neighborhood, lat, lon]
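A side note on the parsing step above: indexing `address_components[2]` assumes the neighborhood is always the third component, and `results[0]` assumes there is always at least one match. Below is a minimal sketch of a more defensive parser, assuming each address component carries a `types` list as in the documented response format. The helper name `parse_result` and the sample payload are made up for illustration and are not real API output.

```python
def parse_result(payload):
    """Return [neighborhood, lat, lon] from a Geocoding API JSON payload.

    Hypothetical helper: returns [None, None, None] when there is no match.
    """
    results = payload.get("results", [])
    if payload.get("status") != "OK" or not results:
        return [None, None, None]
    first = results[0]
    # Look the neighborhood up by its type instead of by its position.
    neighborhood = next(
        (c["long_name"] for c in first.get("address_components", [])
         if "neighborhood" in c.get("types", [])),
        None,
    )
    location = first["geometry"]["location"]
    return [neighborhood, location["lat"], location["lng"]]


# Illustrative payload only, not a real response.
sample = {
    "status": "OK",
    "results": [{
        "address_components": [
            {"long_name": "123", "types": ["street_number"]},
            {"long_name": "Example Street", "types": ["route"]},
            {"long_name": "Example Heights", "types": ["neighborhood", "political"]},
        ],
        "geometry": {"location": {"lat": 33.77, "lng": -118.19}},
    }],
}
print(parse_result(sample))
print(parse_result({"status": "ZERO_RESULTS", "results": []}))
```

With this shape, an address that returns no results or has no neighborhood component yields `None` values instead of raising an `IndexError` or silently returning the wrong component.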
When I run it, the script will progress for a while, then fail. When it does, the traceback gives me the exceptions I'm including below this. So far, I haven't been able to find info on what this issue might be or how I should approach diagnosing the problem for Google's Geocoding API. They give info on how to interpret the request statuses, but when I'm checking the statuses that I get back before the failure, all of them are 'OK' and none of them provide an indication of why the connection is closing.
RemoteDisconnected: Remote end closed connection without response
During handling of the above exception, another exception occurred:
ProtocolError Traceback (most recent call last)
----------
ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
During handling of the above exception, another exception occurred:
ConnectionError Traceback (most recent call last)
----------
ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
Now, I have toyed around with it and ruled out a couple things so far:
1. I have tried this successfully with single calls and gotten what I needed. The problems happen when I run it on a list.
2. I've used tqdm and printed the statuses (at different points), and the script does successfully make the call, get the data back, and move to the next one many times before it fails.
3. I don't think it's a rate limiting issue. This API has no daily limits, only a Queries per Second limit of 50. The crude time.sleep(.05) in my function should be keeping it to around 20 QPS and have me under that limit.
Does anyone know what my problem might be? Or would someone explain what additional diagnostics I should be doing? Again, I'm new to geocoding and haven't experienced this issue before with the APIs I have experience with, so even help in understanding what's going on so that I can solve it myself would be greatly appreciated if no one can find the problem.
Try using the official googlemaps Python package. It uses requests.Session under the hood, and I've never had trouble with it. You may need some multithreading down the line, but if you don't have 'too many' addresses, this should do the trick:
import time
import logging
import googlemaps

key = '...'
gmaps_client = googlemaps.Client(key=key)

addresses = [
    ["8473 Manor Station Street", "Cartersville", "GA"],
    ["14 Edgewater Ave.", "Ottumwa", "IA"],
    ["42 Aspen Court", "San Diego", "CA"]
]

def geocode(address, city, state):
    time.sleep(.05)
    # The client URL-encodes the query itself, so spaces are not replaced with "+"
    params = f"{address.lower()} {city}, {state}"
    try:
        response = gmaps_client.geocode(params)[0]
        neighborhood = response["address_components"][2]["long_name"]
        lat = response["geometry"]["location"]["lat"]
        lon = response["geometry"]["location"]["lng"]
        return [neighborhood, lat, lon]
    except Exception as e:
        # Log the failure and return placeholders so one bad address does not stop the run
        logging.error(e)
        return [None, None, None]

geocodes = [geocode(*group) for group in addresses]
print(geocodes)
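If you would rather stay with plain requests, one option is to reuse a single Session and let it retry dropped connections with a backoff, since `RemoteDisconnected` only says the server closed the connection before sending a response. The following is a rough sketch under that assumption, not a confirmed fix for your case: the helper names `make_session` and `fetch_geocode`, the retry count, the backoff factor, and the list of status codes are all arbitrary choices of mine.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def make_session(total=5, backoff_factor=0.5):
    """Build a Session that retries failed requests (hypothetical helper)."""
    retry = Retry(
        total=total,                    # overall cap on retries per request
        backoff_factor=backoff_factor,  # sleep grows between attempts
        status_forcelist=[429, 500, 502, 503, 504],  # assumed retry-worthy codes
    )
    session = requests.Session()
    session.mount("https://", HTTPAdapter(max_retries=retry))
    return session


def fetch_geocode(session, address, key, timeout=10):
    """Call the Geocoding endpoint and return the decoded JSON (hypothetical helper)."""
    # requests builds and URL-encodes the query string from `params`.
    response = session.get(
        GEOCODE_URL,
        params={"address": address, "key": key},
        timeout=timeout,
    )
    response.raise_for_status()
    return response.json()


# Create the session once and reuse it for every address.
session = make_session()
```

You would then call `fetch_geocode(session, f"{address} {city}, {state}", key)` inside your loop. Wrapping that call in a `try`/`except requests.exceptions.ConnectionError` and logging the address that failed would also tell you whether the failures cluster on particular inputs or happen at random.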