How to replicate pandas DataFrame rows and change periodically one column

I have a DataFrame df like this:

import numpy as np
import pandas as pd
df = pd.DataFrame([["A1", "B1", "C1", "P"],
                   ["A2", "B2", "C2", "P"],
                   ["A3", "B3", "C3", "P"]], columns=["col_a", "col_b", "col_c", "col_d"])


col_a  col_b   col_c col_d
A1     B1      C1    P
A2     B2      C2    P
A3     B3      C3    P
...

The result I need is to repeat each row so that col_d takes the values P, Q, R for every unique row:

col_a  col_b   col_c col_d
A1     B1      C1    P
A1     B1      C1    Q
A1     B1      C1    R

A2     B2      C2    P
A2     B2      C2    Q
A2     B2      C2    R

A3     B3      C3    P
A3     B3      C3    Q
A3     B3      C3    R
...

All I have so far is:

new_df = pd.DataFrame(np.repeat(df.values, 3, axis=0), columns=df.columns)

This duplicates the rows, but col_d is unchanged.

EDIT:

I now have another requirement: for every unique combination of col_a and col_b, I need to add a row with "S" in col_d.

The result would look like this, for instance:

col_a  col_b   col_c col_d
A1     B1      C1    P
A1     B1      C1    Q
A1     B1      C1    R
A1     B1       T    S

A2     B2      C2    P
A2     B2      C2    Q
A2     B2      C2    R
A2     B2       T    S

Thank you very much for help!

DisplayedName asked Oct 19 '25 03:10


1 Answer

Set the values of col_d with DataFrame.assign and numpy.tile:

L = ['P','Q','R']
new_df = (pd.DataFrame(np.repeat(df.values, 3, axis=0), columns=df.columns)
           .assign(col_d = np.tile(L, len(df))))

print (new_df)
  col_a col_b col_c col_d
0    A1    B1    C1     P
1    A1    B1    C1     Q
2    A1    B1    C1     R
3    A2    B2    C2     P
4    A2    B2    C2     Q
5    A2    B2    C2     R
6    A3    B3    C3     P
7    A3    B3    C3     Q
8    A3    B3    C3     R
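
To see why the two NumPy calls line up, here is a minimal sketch of how np.repeat and np.tile order their output. The rows array with its "row0"/"row1" labels is a hypothetical stand-in for the DataFrame rows, not part of the original data:

```python
import numpy as np

L = ["P", "Q", "R"]
rows = np.array(["row0", "row1"])  # hypothetical stand-in for DataFrame rows

# np.repeat repeats each element in place: row0 three times, then row1.
repeated = np.repeat(rows, len(L))

# np.tile cycles through the whole list once per row.
tiled = np.tile(L, len(rows))

print(repeated.tolist())  # ['row0', 'row0', 'row0', 'row1', 'row1', 'row1']
print(tiled.tolist())     # ['P', 'Q', 'R', 'P', 'Q', 'R']
```

Because both arrays have length len(rows) * len(L), pairing them position by position gives every row each of P, Q and R exactly once.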

A similar idea is to repeat the index values and select the duplicated rows with DataFrame.loc:

L = ['P','Q','R']
new_df = (df.loc[df.index.repeat(3)]
            .assign(col_d = np.tile(L, len(df)))
            .reset_index(drop=True))

print (new_df)
  col_a col_b col_c col_d
0    A1    B1    C1     P
1    A1    B1    C1     Q
2    A1    B1    C1     R
3    A2    B2    C2     P
4    A2    B2    C2     Q
5    A2    B2    C2     R
6    A3    B3    C3     P
7    A3    B3    C3     Q
8    A3    B3    C3     R
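
A different technique, not used in the answer above, is a cross join: drop col_d and merge with a one-column frame of labels. This is only a sketch, and it assumes pandas 1.2 or newer, where merge accepts how="cross"; the name labels is made up for illustration:

```python
import pandas as pd

df = pd.DataFrame(
    [["A1", "B1", "C1", "P"],
     ["A2", "B2", "C2", "P"],
     ["A3", "B3", "C3", "P"]],
    columns=["col_a", "col_b", "col_c", "col_d"],
)

L = ["P", "Q", "R"]
labels = pd.DataFrame({"col_d": L})  # hypothetical helper frame

# Cross join: every remaining row of df is paired with every label.
# Requires pandas >= 1.2 for how="cross".
new_df = df.drop(columns="col_d").merge(labels, how="cross")

print(new_df)
```

This avoids going through df.values, so the original column dtypes are kept, which may matter when the frame mixes numeric and string columns.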

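EDIT: To add the S rows, include 'S' in the list and set col_c to 'T' where col_d is 'S' with Series.mask: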

L = ['P','Q','R','S']
new_df = (pd.DataFrame(np.repeat(df.values, len(L), axis=0), columns=df.columns)
           .assign(col_d = np.tile(L, len(df)),
                   col_c = lambda x: x['col_c'].mask(x['col_d'].eq('S'), 'T')))

print (new_df)
   col_a col_b col_c col_d
0     A1    B1    C1     P
1     A1    B1    C1     Q
2     A1    B1    C1     R
3     A1    B1     T     S
4     A2    B2    C2     P
5     A2    B2    C2     Q
6     A2    B2    C2     R
7     A2    B2     T     S
8     A3    B3    C3     P
9     A3    B3    C3     Q
10    A3    B3    C3     R
11    A3    B3     T     S
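
The snippet above adds one S row per original row. If the same col_a/col_b pair can appear in several rows, and only one S row per unique pair is wanted, one possible variant is sketched below. The second input row (C9) is invented purely to show a repeated pair, the names expanded, s_rows and group_id are made up, and how="cross" assumes pandas 1.2 or newer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [["A1", "B1", "C1", "P"],
     ["A1", "B1", "C9", "P"],  # hypothetical row sharing col_a/col_b
     ["A2", "B2", "C2", "P"]],
    columns=["col_a", "col_b", "col_c", "col_d"],
)

# Expand every row to P, Q, R (cross join, pandas >= 1.2).
expanded = df.drop(columns="col_d").merge(
    pd.DataFrame({"col_d": ["P", "Q", "R"]}), how="cross")

# One extra row per unique (col_a, col_b) pair, with col_c='T', col_d='S'.
s_rows = (df[["col_a", "col_b"]].drop_duplicates()
            .assign(col_c="T", col_d="S"))

combined = pd.concat([expanded, s_rows], ignore_index=True)

# Number the pairs in order of first appearance, then reorder with a
# stable sort so each S row lands after the rows of its own pair.
group_id = combined.groupby(["col_a", "col_b"], sort=False).ngroup()
order = np.argsort(group_id.to_numpy(), kind="stable")
new_df = combined.iloc[order].reset_index(drop=True)

print(new_df)
```

When every col_a/col_b pair is unique, as in the question's sample data, this should give the same rows as the mask-based version.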
jezrael answered Oct 20 '25 16:10


