 

Pandas: make list size in a column same as in another column

I have two columns: serial_number and inv_number containing lists. If there is one inv_number for multiple serial_number, I need to make the size of inv_number's list the same as serial_number's.

    serial_number                                       inv_number
28  [С029768, С029775]                                  [101040031171, 101040031172]
29  [090020960190402011, 090020960190402009]            [210134002523, 210134002524]
31  [1094]                                              [410124000215]
32  [01]                                                [101040022094]
33  [F161B5, F17D86, F17D8D, F1825C, F1825A, F1825D]    [101040026976]

At index 33 there are 6 serial numbers but only one inventory number, so inv_number should become

[101040026976, 101040026976, 101040026976, 101040026976, 101040026976, 101040026976]

I've tried to do it by "multiplying" values to make a list (like [value] * N):

si.loc[si['inv_number'].apply(len) == 1, 'inv_number'].apply(
    lambda x: [str(x[0])] * si['serial_number'].apply(len).values
)

but it gives me an error:

UFuncTypeError: ufunc 'multiply' did not contain a loop with signature matching types (dtype('<U12'), dtype('int64')) -> None

How can I solve this problem?

Michael asked Aug 30 '25 16:08


2 Answers

Try:

mask = (df["serial_number"].str.len() > 1) & (df["inv_number"].str.len() == 1)
df.loc[mask, "inv_number"] = df["serial_number"].str.len() * df.loc[mask, "inv_number"]

print(df)

Prints:

                                       serial_number                                                                            inv_number
28                                [С029768, С029775]                                                          [101040031171, 101040031172]
29          [090020960190402011, 090020960190402009]                                                          [210134002523, 210134002524]
31                                            [1094]                                                                        [410124000215]
32                                              [01]                                                                        [101040022094]
33  [F161B5, F17D86, F17D8D, F1825C, F1825A, F1825D]  [101040026976, 101040026976, 101040026976, 101040026976, 101040026976, 101040026976]
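As for the error in the question: inside the lambda, `si['serial_number'].apply(len).values` is the whole array of lengths, not the length for the current row, so Python ends up multiplying a list by a NumPy array and NumPy raises the `UFuncTypeError`. A minimal row-wise sketch of the same idea follows. It assumes the sample data from the question rebuilt by hand as `df`, and `pad_inv` is a hypothetical helper name used only for illustration.

```python
import pandas as pd

# Sample data from the question, rebuilt by hand (assumption: values are strings).
df = pd.DataFrame(
    {
        "serial_number": [
            ["С029768", "С029775"],
            ["090020960190402011", "090020960190402009"],
            ["1094"],
            ["01"],
            ["F161B5", "F17D86", "F17D8D", "F1825C", "F1825A", "F1825D"],
        ],
        "inv_number": [
            ["101040031171", "101040031172"],
            ["210134002523", "210134002524"],
            ["410124000215"],
            ["101040022094"],
            ["101040026976"],
        ],
    },
    index=[28, 29, 31, 32, 33],
)


def pad_inv(serials, invs):
    """Hypothetical helper: repeat a single inventory number once per serial."""
    if len(invs) == 1:
        return invs * len(serials)  # list * int (a scalar), not list * array
    return invs  # leave rows with several inventory numbers untouched


# Pair each row's two lists so the multiplier is that row's own length.
df["inv_number"] = pd.Series(
    [pad_inv(s, i) for s, i in zip(df["serial_number"], df["inv_number"])],
    index=df.index,
)

print(df.loc[33, "inv_number"])
```

This is slower than the vectorised version above on large frames, but it makes the per-row logic explicit.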
Andrej Kesely answered Sep 02 '25 06:09


With map, len and mul (DataFrame.map needs pandas 2.1 or later; on older versions use applymap):

df["inv_number"] *= (-df.map(len).diff(axis=1).iloc[:, -1]).add(1)

Prints:

   serial_number                                                 inv_number
0  ['С029768', 'С029775']                                        ['101040031171', '101040031172']
1  ['090020960190402011', '090020960190402009']                  ['210134002523', '210134002524']
2  ['1094']                                                      ['410124000215']
3  ['01']                                                        ['101040022094']
4  ['F161B5', 'F17D86', 'F17D8D', 'F1825C', 'F1825A', 'F1825D']  ['101040026976', '101040026976', '101040026976', '101040026976', '101040026976', '101040026976']
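
The multiplier in the one-liner works out to len(serial_number) - len(inv_number) + 1 for each row: 1 when the two lists already have the same length, and len(serial_number) when inv_number holds a single value. A sketch that spells this out step by step, using a small illustrative frame (two made-up rows modelled on the question's data, not the full sample); the names `n_serial`, `n_inv` and `repeat` are only for illustration:

```python
import pandas as pd

# Illustrative subset modelled on the question's data (an assumption, not the real frame).
df = pd.DataFrame(
    {
        "serial_number": [["1094"], ["F161B5", "F17D86", "F17D8D"]],
        "inv_number": [["410124000215"], ["101040026976"]],
    }
)

n_serial = df["serial_number"].map(len)  # list length per row
n_inv = df["inv_number"].map(len)

# Same quantity the one-liner derives via diff(axis=1), written out directly.
repeat = n_serial - n_inv + 1

# Repeat each inv_number list by its row's multiplier (cast to a plain int).
df["inv_number"] = pd.Series(
    [inv * int(k) for inv, k in zip(df["inv_number"], repeat)],
    index=df.index,
)

print(repeat.tolist())
print(df["inv_number"].tolist())
```

Note that this formula only equalises the lengths when inv_number has exactly one element or already matches serial_number; other combinations would need a different rule.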
Timeless answered Sep 02 '25 06:09