I have coordinates in a Latitude dataset that each end with a letter (e.g. N).
What is the best way to retrieve only the numbers and replace the original values?
My attempt at this was:
raw['LATITUDE'] = raw.loc[(raw['LATITUDE'].str.len() == 9)].str[0:8]
But I get an AttributeError message.
AttributeError: 'DataFrame' object has no attribute 'str'
I also tried replacing the values with regex, but I wasn't sure how to make it work.
I'd appreciate any suggestions, thank you.
Okay, let's clarify a couple of things:
You seem to be working with mixed dtypes. Print out raw['LATITUDE'].apply(type).nunique() to confirm; it should be > 1.
You're working with geodata. Many of your values are invalid (the 0s); I'd recommend coercing them to NaN instead, because NaN represents missing data more meaningfully.
To fix your issue, try taking everything up to the last character (:-1):
raw['LATITUDE'] = raw['LATITUDE'].str[:-1].astype(float)
raw
LATITUDE
0 NaN
1 38.72496
2 39.90272
3 38.72927
4 39.91152
5 39.84841
6 NaN
7 NaN
8 NaN
9 39.84941
This works despite your column being of mixed dtypes, because the str accessor is designed to coerce non-string rows to NaN.
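The coercion can be seen on a small series. This is only a sketch with made-up values; `stripped` and `result` are names introduced for illustration.

```python
import pandas as pd

# Hypothetical mixed-type series: two strings and an integer 0.
s = pd.Series(["38.72496N", 0, "39.90272N"], dtype=object)

# Slicing through the str accessor only applies to string elements;
# the integer 0 has no string slice, so it comes back as NaN.
stripped = s.str[:-1]
print(stripped)

# The remaining strings parse cleanly, and NaN stays NaN.
result = stripped.astype(float)
print(result)
```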
If you wish to preserve the 0s (which I don't recommend), use a fast replacement function like np.where:
import numpy as np

# Keep 0 where the original value is 0; otherwise strip the trailing letter.
raw['LATITUDE'] = np.where(
    raw['LATITUDE'].eq(0), 0, raw['LATITUDE'].str[:-1].astype(float)
)
raw
LATITUDE
0 0.00000
1 38.72496
2 39.90272
3 38.72927
4 39.91152
5 39.84841
6 0.00000
7 0.00000
8 0.00000
9 39.84941
I don't recommend preserving the 0s because NaN marks missing data more meaningfully than 0 does.
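As for the AttributeError in the question: indexing raw.loc with only a row mask returns a DataFrame, and a DataFrame has no str accessor. Selecting the column in the same .loc call yields a Series, on which .str works. A minimal sketch, assuming a hypothetical all-string column; `mask`, `rows`, `col`, and `trimmed` are illustrative names:

```python
import pandas as pd

# Hypothetical all-string sample, just to show the shape of each result.
raw = pd.DataFrame({"LATITUDE": ["38.72496N", "39.90272N"]})
mask = raw["LATITUDE"].str.len() == 9

# Row mask only: the result is a DataFrame, so `.str` is unavailable.
rows = raw.loc[mask]
print(type(rows).__name__)

# Row mask plus column label: the result is a Series, so `.str` works.
col = raw.loc[mask, "LATITUDE"]
trimmed = col.str[0:8].astype(float)
print(trimmed)
```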
You appear to have mixed types in your series with dtype object.
Option 1
You can first attempt to convert to numeric with errors='coerce', and then fillna with everything up to the last character, converted to float:
import pandas as pd

s = pd.Series(['34.49881N', 0], dtype=object)
s = pd.to_numeric(s, errors='coerce').fillna(s.str[:-1].astype(float))
Option 2
You can also work the other way round. This is inadvisable as it is less stringent, i.e. you may find unexpected types in the result.
# Start from the original object-dtype series, not from the result of Option 1.
s = s.str[:-1].astype(float).fillna(s)
Result
In both cases, you will find:
print(s)
0 34.49881
1 0.00000
dtype: float64
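Since the question also mentions regex, here is one possible sketch of that route. It is not one of the two options above, and it assumes the trailing letter is always one of N, S, E, or W; `cleaned` is a name introduced for illustration.

```python
import pandas as pd

s = pd.Series(['34.49881N', 0], dtype=object)

# Cast everything to str first so the integer 0 becomes '0',
# then strip a single trailing compass letter (assumed: N, S, E or W).
cleaned = (
    s.astype(str)
     .str.replace(r'[NSEW]$', '', regex=True)
     .astype(float)
)
print(cleaned)
```

Like np.where in the first answer, this keeps the 0s as 0.0 rather than turning them into NaN.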