
How to replace values with substrings in pandas objects?

I have coordinates in a Latitude dataset that each end with a letter (e.g. N).

What is the best way to retrieve only the numbers and replace the original values?

My attempt at this was:

raw['LATITUDE'] = raw.loc[(raw['LATITUDE'].str.len() == 9)].str[0:8]

But I get an AttributeError message.

AttributeError: 'DataFrame' object has no attribute 'str'

I also tried replacing the values with regex but I wasn't sure how to make it successful.

I'd appreciate any suggestions, thank you.

ekim420 asked Sep 18 '25 11:09


2 Answers

Okay, let's clarify a couple of things:

  1. Your column appears to hold mixed types (strings alongside numbers). Print raw['LATITUDE'].apply(type).nunique() to confirm; the result should be greater than 1.

  2. You're working with geodata. Many of your values are invalid (the 0s). I'd recommend coercing them to NaN instead, because NaN represents missing data more meaningfully.
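As a minimal sketch of the check in point 1, assuming a small made-up frame (the values below are hypothetical stand-ins, not the asker's data):

```python
import pandas as pd

# Hypothetical sample: coordinate strings with a trailing letter, mixed with integer 0s.
raw = pd.DataFrame({"LATITUDE": [0, "38.72496N", "39.90272N", 0]}, dtype=object)

# Count how many distinct Python types live in the column.
n_types = raw["LATITUDE"].apply(type).nunique()
print(n_types)  # 2 (int and str)

# A per-type breakdown is often easier to read than the bare count.
print(raw["LATITUDE"].apply(type).value_counts())
```

A count above 1 means the column is not uniformly strings, so any string operation has to cope with the non-string rows.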

To fix your issue, try taking everything up to the last character (:-1):

raw['LATITUDE'] = raw['LATITUDE'].str[:-1].astype(float)
raw

   LATITUDE
0       NaN
1  38.72496
2  39.90272
3  38.72927
4  39.91152
5  39.84841
6       NaN
7       NaN
8       NaN
9  39.84941

This works despite your column holding mixed types, because the str accessor returns NaN for non-string elements rather than raising an error.
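To see that behaviour in isolation, and why the attempt in the question raised an AttributeError, here is a small sketch. The sample values are hypothetical, and the names `mask`, `subset` and `fixed` are introduced only for illustration:

```python
import pandas as pd

# Hypothetical sample: two integer 0s and two 9-character coordinate strings.
raw = pd.DataFrame({"LATITUDE": [0, "38.72496N", 0, "39.84941N"]}, dtype=object)

# On a Series, .str yields NaN for the non-string rows instead of raising.
stripped = raw["LATITUDE"].str[:-1]
print(stripped.isna().tolist())  # [True, False, True, False]

# raw.loc[mask] selects rows of the whole frame, so the result is a DataFrame,
# and a DataFrame has no .str accessor. That is the source of the error.
mask = raw["LATITUDE"].str.len() == 9
subset = raw.loc[mask]
print(type(subset).__name__)  # DataFrame

# Naming the column inside .loc gives a Series, on which .str is available.
fixed = raw.loc[mask, "LATITUDE"].str[:8]
print(fixed.tolist())  # ['38.72496', '39.84941']
```

Selecting the column inside .loc would repair the original attempt, though slicing the whole column with .str[:-1] as shown above is shorter.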

If you wish to preserve the 0s (which I don't recommend), use a fast replacement function like np.where:

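import numpy as np

# Keep rows equal to 0 as 0; elsewhere drop the trailing letter and convert to float
raw['LATITUDE'] = np.where(
    raw.LATITUDE.eq(0), 0, raw['LATITUDE'].str[:-1].astype(float)
)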

raw
   LATITUDE
0   0.00000
1  38.72496
2  39.90272
3  38.72927
4  39.91152
5  39.84841
6   0.00000
7   0.00000
8   0.00000
9  39.84941

I don't recommend preserving the 0s because NaN marks missing data more meaningfully than 0 does, and a latitude of 0 is itself a valid coordinate (the equator).

cs95 answered Sep 20 '25 00:09


You appear to have mixed types in your series with dtype object.

Option 1

You can first attempt to convert to numeric with errors='coerce', then fill the resulting NaNs with the string values stripped of their last character and converted to float:

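import pandas as pd

s = pd.Series(['34.49881N', 0], dtype=object)

# Numbers convert directly; strings become NaN and are then filled from the stripped values
s = pd.to_numeric(s, errors='coerce').fillna(s.str[:-1].astype(float))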

Option 2

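You can also work the other way round, starting again from the original object-dtype series. This is inadvisable as it is less strict: every non-string value is passed through unchanged, so you may find unexpected types in the result.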

s = s.str[:-1].astype(float).fillna(s)

Result

In both cases, you will find:

print(s)

0    34.49881
1     0.00000
dtype: float64
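
The question also mentions trying a regex. As a possible alternative to the two options above, and only a sketch under the assumption that every value starts with a plain decimal number, the leading numeric part can be extracted with Series.str.extract after casting everything to string. The pattern and the name `numeric` are illustrative choices:

```python
import pandas as pd

s = pd.Series(['34.49881N', 0], dtype=object)

# Cast to string so the integer 0 becomes '0', then capture the leading number.
# Assumed pattern: optional minus sign, digits, optional fractional part.
extracted = s.astype(str).str.extract(r'^(-?\d+(?:\.\d+)?)', expand=False)

# Anything that did not match the pattern ends up as NaN.
numeric = pd.to_numeric(extracted, errors='coerce')
print(numeric.tolist())  # [34.49881, 0.0]
```

This keeps the 0s as 0.0, like the options above; values that do not begin with a number become NaN.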
jpp answered Sep 19 '25 23:09