Let's say we have the following dataframe
    handle  image_src   image_position  image_src_1 image_src_2 image_src_3
0        a       img1                1         None        None        None
1        b       img2                2         None        None        None
2        c       img3                3         None        None        None
How can we convert this datafame into this?
    handle  image_src   image_position  image_src_1 image_src_2 image_src_3
0        a       img1                1         img1        None        None
1        b       img2                2         None        img2        None
2        c       img3                3         None        None        img3
Basically I want to fill the image_src_{image_position} column based on the value of image_position of each row.
I come up with a solution which involves manually iterating over the whole dataframe, like this
data = [
    {"handle": "a","image_src": "img1","image_position": 1,"image_src_1": None,"image_src_2": None,"image_src_3": None},
    {"handle": "b","image_src": "img2","image_position": 2,"image_src_1": None,"image_src_2": None,"image_src_3": None},
    {"handle": "c","image_src": "img3","image_position": 3,"image_src_1": None,"image_src_2": None,"image_src_3": None}
]
df = pd.DataFrame(data)
for index in range(0, len(df)):
    row = df.iloc[index]
    
    position = row["image_position"].astype("int64")
    
    df.loc[index, f"image_src_{position}"] = row["image_src"]
But as iterating over the whole dataframe is a bad thing, how can I imporove this?
First that came to my mind was .pivot() function. Though this might not be the exact thing you wanted, — the original image_src column would be dropped, the column naming would slightly differ and nan might need handling, — still it's rather a short solution.
(df.pivot(index=['handle', 'image_position'],  # to keep these columns intact
          columns='image_src',
          values='image_src')
   .add_prefix('image_src_')
   .reset_index())
What I got:
| handle | image_position | image_src_img1 | image_src_img2 | image_src_img3 | |
|---|---|---|---|---|---|
| 0 | a | 1 | img1 | nan | nan | 
| 1 | b | 2 | nan | img2 | nan | 
| 2 | c | 3 | nan | nan | img3 | 
Another take is to use image_position column as an index for the new columns, thus it would be dropped too. Also you may want to replace nan with None:
import numpy as np
(df.pivot(index='handle',
          columns='image_position',
          values='image_src')
   .add_prefix('image_src_')
   .replace({np.nan: None})
   .reset_index())
Goes like this:
| handle | image_src_1 | image_src_2 | image_src_3 | |
|---|---|---|---|---|
| 0 | a | img1 | None | None | 
| 1 | b | None | img2 | None | 
| 2 | c | None | None | img3 | 
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With