I have a pandas DataFrame whose column names are in upper snake case. I want to convert them to camel case, with the first word starting with a lowercase letter. The following code is not working for me. Please let me know how to fix it.
import pandas as pd
# Sample DataFrame with column names
data = {'RID': [1, 2, 3],
        'RUN_DATE': ['2023-01-01', '2023-01-02', '2023-01-03'],
        'PRED_VOLUME_NEXT_360': [100, 150, 200]}
df = pd.DataFrame(data)
# Convert column names to lowercase
df.columns = df.columns.str.lower()
# Convert column names to camel case with lowercase starting letter
df.columns = [col.replace('_', ' ').title().replace(' ', '').replace(col[0], col[0].lower(), 1) for col in df.columns]
# Print the DataFrame with updated column names
print(df)
I want the column names RID, RUN_DATE, PRED_VOLUME_NEXT_360 to be converted to rid, runDate, predVolumeNext360, but the code gives Rid, RunDate and PredVolumeNext360.
You could use a regex to replace _x by X:
df.columns = (df.columns.str.lower()
                .str.replace('_(.)', lambda x: x.group(1).upper(),
                             regex=True)
             )
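To see what the substitution does without a DataFrame, the same pattern can be tried on plain strings with the standard `re` module. This is only a sketch; the helper name `snake_to_lower_camel` is made up for illustration and is not part of pandas.

```python
import re

def snake_to_lower_camel(name):
    # Hypothetical helper: lowercase everything, then replace each
    # "_x" pair with the uppercased character "X".
    return re.sub(r"_(.)", lambda m: m.group(1).upper(), name.lower())

for name in ["RID", "RUN_DATE", "PRED_VOLUME_NEXT_360"]:
    print(name, "->", snake_to_lower_camel(name))
```

Digits are unaffected by `upper()`, so the `_3` in `PRED_VOLUME_NEXT_360` simply loses its underscore.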
Or with a custom function:
def to_camel(s):
    l = s.lower().split('_')
    l[1:] = [x.capitalize() for x in l[1:]]
    return ''.join(l)

df = df.rename(columns=to_camel)
Output:
   rid     runDate  predVolumeNext360
0    1  2023-01-01                100
1    2  2023-01-02                150
2    3  2023-01-03                200
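As for why the code in the question leaves the first letter capitalized: by the time the list comprehension runs, the names are already lowercase, so `col[0]` and `col[0].lower()` are the same character and the final `replace` changes nothing. A minimal sketch on a single name (the variable names here are illustrative only):

```python
col = "run_date"  # a column name after df.columns.str.lower()

# Title-casing and removing the spaces gives upper camel case.
titled = col.replace("_", " ").title().replace(" ", "")

# col[0] is already "r", so this replaces "r" with "r": a no-op.
attempt = titled.replace(col[0], col[0].lower(), 1)

# Lowercasing the first character of the title-cased string instead.
fixed = titled[:1].lower() + titled[1:]

print(titled, attempt, fixed)
```

Slicing with `titled[:1]` rather than indexing with `titled[0]` also avoids an `IndexError` if a name happens to be empty.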
For clarity, define separate functions for the conversion to lower camel case:
import pandas as pd
def to_camel_case(snake_str):
    # Capitalize every component: 'RUN_DATE' -> 'RunDate'
    return "".join(x.capitalize() for x in snake_str.lower().split("_"))

def to_lower_camel_case(snake_str):
    # Build the upper camel case string, then lowercase its first
    # character: 'RUN_DATE' -> 'RunDate' -> 'runDate'
    camel_string = to_camel_case(snake_str)
    return snake_str[0].lower() + camel_string[1:]

# Sample DataFrame with column names
data = {'RID': [1, 2, 3],
        'RUN_DATE': ['2023-01-01', '2023-01-02', '2023-01-03'],
        'PRED_VOLUME_NEXT_360': [100, 150, 200]}
df = pd.DataFrame(data)
# Convert column names to camel case with lowercase starting letter
df.columns = [to_lower_camel_case(col) for col in df.columns]
# Print the DataFrame with updated column names
print(df)
Prints:
   rid     runDate  predVolumeNext360
0    1  2023-01-01                100
1    2  2023-01-02                150
2    3  2023-01-03                200
The methods are based on this answer by jbaiter.