Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Duplicating rows and and creating ID column and repeating column in Python

Tags:

python

pandas

I have the following input dataframe:

Input Dataframe:

c1  c2  c3  c4  c5  c6  c7  c8    c9 c10 c11
56  1   2   4   1.0 1   4.0 1.0   2  2   18000.0
52  2   2   5   3.0 1   4.0 1.0   1  1   0.0
82  2   2   5   4.0 2   4.0 1.0   1  1   0.0
26  1   2   4   2.0 1   4.0 1.0   2  2   12000.0
65  1   2   4   1.0 1   4.0 23.0  2  1   324900.0

In the input datframe, I want duplicate each row 3 times. The calculated id column is the same number that repeats three times and represents the row number. The calculated type column goes 1,2,3 for each original record.

How can I do this in Python?

Expected Output:

id type  c1  c2  c3  c4  c5  c6  c7  c8   c9 c10 c11
1  1     56 1   2   4   1.0 1   4.0 1.0   2  2   18000.0
1  2     56 1   2   4   1.0 1   4.0 1.0   2  2   18000.0
1  3     56 1   2   4   1.0 1   4.0 1.0   2  2   18000.0
2  1     52 2   2   5   3.0 1   4.0 1.0   1  1   0.0
2  2     52 2   2   5   3.0 1   4.0 1.0   1  1   0.0
2  3     52 2   2   5   3.0 1   4.0 1.0   1  1   0.0
3  1     82 2   2   5   4.0 2   4.0 1.0   1  1   0.0
3  2     82 2   2   5   4.0 2   4.0 1.0   1  1   0.0
3  3     82 2   2   5   4.0 2   4.0 1.0   1  1   0.0
4  1     26 1   2   4   2.0 1   4.0 1.0   2  2   12000.0
4  2     26 1   2   4   2.0 1   4.0 1.0   2  2   12000.0
4  3     26 1   2   4   2.0 1   4.0 1.0   2  2   12000.0
5  1     65 1   2   4   1.0 1   4.0 23.0  2  1   324900.0
5  2     65 1   2   4   1.0 1   4.0 23.0  2  1   324900.0
5  3     65 1   2   4   1.0 1   4.0 23.0  2  1   324900.0

I have the following code but I don't like it because I have to use assign() twice and also I'm not sure how to calculate the id column...I just put placeholder code there. I seem to get the type column correct though.

My Attempt:

(df
     .dropna()
     .assign(id =  lambda x: range(1, len(x) + 1)
            )
      .pipe(lambda x: x.loc[x.index.repeat(3)])
      .assign(id = lambda x: np.r_[:len(x)] % 3 + 1,
              type = lambda x: np.r_[:len(x)] % 3 + 1))
like image 638
Eisen Avatar asked Oct 31 '25 08:10

Eisen


2 Answers

A cross join will do the job:

pd.merge(
    df.rename_axis("id").reset_index(),
    pd.DataFrame({"type": [1, 2, 3]}),
    how="cross",
)

# If you are using pandas 1.5 or later
pd.merge(
    df.reset_index(names="id"),
    pd.DataFrame({"type": [1, 2, 3]}),
    how="cross",
)
like image 164
Code Different Avatar answered Nov 02 '25 23:11

Code Different


A slightly more complicated way to do this using concat and groupby with cumcount:

df3 = pd.concat([df, df, df]).sort_index().reset_index(names='id')
df3['type'] = df3.groupby('id').cumcount() + 1

Output:

    id  c1  c2  c3  c4   c5  c6   c7    c8  c9  c10       c11  type
0    0  56   1   2   4  1.0   1  4.0   1.0   2    2   18000.0     1
1    0  56   1   2   4  1.0   1  4.0   1.0   2    2   18000.0     2
2    0  56   1   2   4  1.0   1  4.0   1.0   2    2   18000.0     3
3    1  52   2   2   5  3.0   1  4.0   1.0   1    1       0.0     1
4    1  52   2   2   5  3.0   1  4.0   1.0   1    1       0.0     2
5    1  52   2   2   5  3.0   1  4.0   1.0   1    1       0.0     3
6    2  82   2   2   5  4.0   2  4.0   1.0   1    1       0.0     1
7    2  82   2   2   5  4.0   2  4.0   1.0   1    1       0.0     2
8    2  82   2   2   5  4.0   2  4.0   1.0   1    1       0.0     3
9    3  26   1   2   4  2.0   1  4.0   1.0   2    2   12000.0     1
10   3  26   1   2   4  2.0   1  4.0   1.0   2    2   12000.0     2
11   3  26   1   2   4  2.0   1  4.0   1.0   2    2   12000.0     3
12   4  65   1   2   4  1.0   1  4.0  23.0   2    1  324900.0     1
13   4  65   1   2   4  1.0   1  4.0  23.0   2    1  324900.0     2
14   4  65   1   2   4  1.0   1  4.0  23.0   2    1  324900.0     3
like image 31
Nick Avatar answered Nov 02 '25 21:11

Nick