Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python - Group combinations based on condition

Consider the following data set,

Name    Age Grade
Person1 18  50
Person2 19  100
Person3 20  0
Person4 21  -25
Person5 22  -125
Person6 23  80
Person7 24  -70

If I wanted to group together the students based on the aggregation of their grades, how would I go about it? What I mean is - I want to make groups of students such as the sum of the grades of all the students in that grade is 0 or as close to zero as possible.

So, ideally - the answer to the dataset would be something like -

  1. Group 1 ----> Person1, Person2, Person4, Person5 (because their added grades are 0)

  2. Group 2 ----> Person3 (because the grade is 0)

  3. Group 3 ----> Person6, Person7 (because the added sum is 10)

How I went on about approaching this was -

new_dt = pd.read_csv('../tt.csv')

a = list(itertools.chain.from_iterable(
    [[j for i in el for j in i] for el in itertools.combinations(new_dt.values.tolist(),i)]
    for i in range(1, len(new_dt)+1)
    )
    )

out = pd.DataFrame(a)

So, basically I am finding out all the possible combinations using itertools.combinations

    0   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20
0   Person1 18  50                                                                      
1   Person2 19  100                                                                     
2   Person3 20  0                                                                       
3   Person4 21  -25                                                                     
4   Person5 22  -125                                                                        
5   Person6 23  80                                                                      
6   Person7 24  -70                                                                     
7   Person1 18  50  Person2 19  100                                                         
8   Person1 18  50  Person3 20  0                                                           
9   Person1 18  50  Person4 21  -25                                                         
10  Person1 18  50  Person5 22  -125                                                            
11  Person1 18  50  Person6 23  80                                                          
12  Person1 18  50  Person7 24  -70                                                         
13  Person2 19  100 Person3 20  0                                                           
14  Person2 19  100 Person4 21  -25                                                         
15  Person2 19  100 Person5 22  -125                                                            
16  Person2 19  100 Person6 23  80                                                          
17  Person2 19  100 Person7 24  -70                                                         
18  Person3 20  0   Person4 21  -25                                                         
19  Person3 20  0   Person5 22  -125                                                            
20  Person3 20  0   Person6 23  80                                                          
21  Person3 20  0   Person7 24  -70                                                         
22  Person4 21  -25 Person5 22  -125                                                            
23  Person4 21  -25 Person6 23  80                                                          
24  Person4 21  -25 Person7 24  -70                                                         
25  Person5 22  -125    Person6 23  80                                                          
26  Person5 22  -125    Person7 24  -70                                                         
27  Person6 23  80  Person7 24  -70                                                         
28  Person1 18  50  Person2 19  100 Person3 20  0                                               
29  Person1 18  50  Person2 19  100 Person4 21  -25                                             
30  Person1 18  50  Person2 19  100 Person5 22  -125                                                
31  Person1 18  50  Person2 19  100 Person6 23  80                                              
32  Person1 18  50  Person2 19  100 Person7 24  -70                                             
33  Person1 18  50  Person3 20  0   Person4 21  -25                                             
34  Person1 18  50  Person3 20  0   Person5 22  -125                                                
35  Person1 18  50  Person3 20  0   Person6 23  80                                              
36  Person1 18  50  Person3 20  0   Person7 24  -70                                             
37  Person1 18  50  Person4 21  -25 Person5 22  -125                                                
38  Person1 18  50  Person4 21  -25 Person6 23  80                                              
39  Person1 18  50  Person4 21  -25 Person7 24  -70                                             
40  Person1 18  50  Person5 22  -125    Person6 23  80                                              
41  Person1 18  50  Person5 22  -125    Person7 24  -70                                             
42  Person1 18  50  Person6 23  80  Person7 24  -70                                             
43  Person2 19  100 Person3 20  0   Person4 21  -25                                             
44  Person2 19  100 Person3 20  0   Person5 22  -125                                                
45  Person2 19  100 Person3 20  0   Person6 23  80                                              
46  Person2 19  100 Person3 20  0   Person7 24  -70                                             
47  Person2 19  100 Person4 21  -25 Person5 22  -125                                                
48  Person2 19  100 Person4 21  -25 Person6 23  80                                              
49  Person2 19  100 Person4 21  -25 Person7 24  -70                                             
50  Person2 19  100 Person5 22  -125    Person6 23  80                                              
51  Person2 19  100 Person5 22  -125    Person7 24  -70                                             
52  Person2 19  100 Person6 23  80  Person7 24  -70                                             
53  Person3 20  0   Person4 21  -25 Person5 22  -125                                                
54  Person3 20  0   Person4 21  -25 Person6 23  80                                              
55  Person3 20  0   Person4 21  -25 Person7 24  -70                                             
56  Person3 20  0   Person5 22  -125    Person6 23  80                                              
57  Person3 20  0   Person5 22  -125    Person7 24  -70                                             
58  Person3 20  0   Person6 23  80  Person7 24  -70                                             
59  Person4 21  -25 Person5 22  -125    Person6 23  80                                              
60  Person4 21  -25 Person5 22  -125    Person7 24  -70                                             
61  Person4 21  -25 Person6 23  80  Person7 24  -70                                             
62  Person5 22  -125    Person6 23  80  Person7 24  -70                                             
63  Person1 18  50  Person2 19  100 Person3 20  0   Person4 21  -25                                 
64  Person1 18  50  Person2 19  100 Person3 20  0   Person5 22  -125                                    
65  Person1 18  50  Person2 19  100 Person3 20  0   Person6 23  80                                  
66  Person1 18  50  Person2 19  100 Person3 20  0   Person7 24  -70                                 
67  Person1 18  50  Person2 19  100 Person4 21  -25 Person5 22  -125                                    
68  Person1 18  50  Person2 19  100 Person4 21  -25 Person6 23  80                                  
69  Person1 18  50  Person2 19  100 Person4 21  -25 Person7 24  -70                                 
70  Person1 18  50  Person2 19  100 Person5 22  -125    Person6 23  80                                  
71  Person1 18  50  Person2 19  100 Person5 22  -125    Person7 24  -70                                 
72  Person1 18  50  Person2 19  100 Person6 23  80  Person7 24  -70                                 
73  Person1 18  50  Person3 20  0   Person4 21  -25 Person5 22  -125                                    
74  Person1 18  50  Person3 20  0   Person4 21  -25 Person6 23  80                                  
75  Person1 18  50  Person3 20  0   Person4 21  -25 Person7 24  -70                                 
76  Person1 18  50  Person3 20  0   Person5 22  -125    Person6 23  80                                  
77  Person1 18  50  Person3 20  0   Person5 22  -125    Person7 24  -70                                 
78  Person1 18  50  Person3 20  0   Person6 23  80  Person7 24  -70                                 
79  Person1 18  50  Person4 21  -25 Person5 22  -125    Person6 23  80                                  
80  Person1 18  50  Person4 21  -25 Person5 22  -125    Person7 24  -70                                 
81  Person1 18  50  Person4 21  -25 Person6 23  80  Person7 24  -70                                 
82  Person1 18  50  Person5 22  -125    Person6 23  80  Person7 24  -70                                 
83  Person2 19  100 Person3 20  0   Person4 21  -25 Person5 22  -125                                    
84  Person2 19  100 Person3 20  0   Person4 21  -25 Person6 23  80                                  
85  Person2 19  100 Person3 20  0   Person4 21  -25 Person7 24  -70                                 
86  Person2 19  100 Person3 20  0   Person5 22  -125    Person6 23  80                                  
87  Person2 19  100 Person3 20  0   Person5 22  -125    Person7 24  -70                                 
88  Person2 19  100 Person3 20  0   Person6 23  80  Person7 24  -70                                 
89  Person2 19  100 Person4 21  -25 Person5 22  -125    Person6 23  80                                  
90  Person2 19  100 Person4 21  -25 Person5 22  -125    Person7 24  -70                                 
91  Person2 19  100 Person4 21  -25 Person6 23  80  Person7 24  -70                                 
92  Person2 19  100 Person5 22  -125    Person6 23  80  Person7 24  -70                                 
93  Person3 20  0   Person4 21  -25 Person5 22  -125    Person6 23  80                                  
94  Person3 20  0   Person4 21  -25 Person5 22  -125    Person7 24  -70                                 
95  Person3 20  0   Person4 21  -25 Person6 23  80  Person7 24  -70                                 
96  Person3 20  0   Person5 22  -125    Person6 23  80  Person7 24  -70                                 
97  Person4 21  -25 Person5 22  -125    Person6 23  80  Person7 24  -70                                 
98  Person1 18  50  Person2 19  100 Person3 20  0   Person4 21  -25 Person5 22  -125                        
99  Person1 18  50  Person2 19  100 Person3 20  0   Person4 21  -25 Person6 23  80                      
100 Person1 18  50  Person2 19  100 Person3 20  0   Person4 21  -25 Person7 24  -70                     
101 Person1 18  50  Person2 19  100 Person3 20  0   Person5 22  -125    Person6 23  80                      
102 Person1 18  50  Person2 19  100 Person3 20  0   Person5 22  -125    Person7 24  -70                     
103 Person1 18  50  Person2 19  100 Person3 20  0   Person6 23  80  Person7 24  -70                     
104 Person1 18  50  Person2 19  100 Person4 21  -25 Person5 22  -125    Person6 23  80                      
105 Person1 18  50  Person2 19  100 Person4 21  -25 Person5 22  -125    Person7 24  -70                     
106 Person1 18  50  Person2 19  100 Person4 21  -25 Person6 23  80  Person7 24  -70                     
107 Person1 18  50  Person2 19  100 Person5 22  -125    Person6 23  80  Person7 24  -70                     
108 Person1 18  50  Person3 20  0   Person4 21  -25 Person5 22  -125    Person6 23  80                      
109 Person1 18  50  Person3 20  0   Person4 21  -25 Person5 22  -125    Person7 24  -70                     
110 Person1 18  50  Person3 20  0   Person4 21  -25 Person6 23  80  Person7 24  -70                     
111 Person1 18  50  Person3 20  0   Person5 22  -125    Person6 23  80  Person7 24  -70                     
112 Person1 18  50  Person4 21  -25 Person5 22  -125    Person6 23  80  Person7 24  -70                     
113 Person2 19  100 Person3 20  0   Person4 21  -25 Person5 22  -125    Person6 23  80                      
114 Person2 19  100 Person3 20  0   Person4 21  -25 Person5 22  -125    Person7 24  -70                     
115 Person2 19  100 Person3 20  0   Person4 21  -25 Person6 23  80  Person7 24  -70                     
116 Person2 19  100 Person3 20  0   Person5 22  -125    Person6 23  80  Person7 24  -70                     
117 Person2 19  100 Person4 21  -25 Person5 22  -125    Person6 23  80  Person7 24  -70                     
118 Person3 20  0   Person4 21  -25 Person5 22  -125    Person6 23  80  Person7 24  -70                     
119 Person1 18  50  Person2 19  100 Person3 20  0   Person4 21  -25 Person5 22  -125    Person6 23  80          
120 Person1 18  50  Person2 19  100 Person3 20  0   Person4 21  -25 Person5 22  -125    Person7 24  -70         
121 Person1 18  50  Person2 19  100 Person3 20  0   Person4 21  -25 Person6 23  80  Person7 24  -70         
122 Person1 18  50  Person2 19  100 Person3 20  0   Person5 22  -125    Person6 23  80  Person7 24  -70         
123 Person1 18  50  Person2 19  100 Person4 21  -25 Person5 22  -125    Person6 23  80  Person7 24  -70         
124 Person1 18  50  Person3 20  0   Person4 21  -25 Person5 22  -125    Person6 23  80  Person7 24  -70         
125 Person2 19  100 Person3 20  0   Person4 21  -25 Person5 22  -125    Person6 23  80  Person7 24  -70         
126 Person1 18  50  Person2 19  100 Person3 20  0   Person4 21  -25 Person5 22  -125    Person6 23  80  Person7 24  -70

After getting that, all the grades are in every 3rd column, so I'm adding the sums of every 3rd row to that row with

only_grades['Total'] = only_grades.sum(axis=1)
out['Total'] = only_grades['Total']
out.to_csv('../tt_2.csv')

which adds a column "Total" with the following outputs on each line,

Total
50
100
0
-25
-125
80
-70
150
50
25
-75
130
-20
100
75
-25
180
30
-25
-125
80
-70
-150
55
-95
-45
-195
10
150
125
25
230
80
25
-75
130
-20
-100
105
-45
5
-145
60
75
-25
180
30
-50
155
5
55
-95
110
-150
55
-95
-45
-195
10
-70
-220
-15
-115
125
25
230
80
0
205
55
105
-45
160
-100
105
-45
5
-145
60
-20
-170
35
-65
-50
155
5
55
-95
110
30
-120
85
-15
-70
-220
-15
-115
-140
0
205
55
105
-45
160
80
-70
135
35
-20
-170
35
-65
-90
30
-120
85
-15
-40
-140
80
-70
135
35
10
-90
-40
10

I was then wondering, I'd make the groups out of these. What do you think of this approach? And how might you make groups using the "Totals" column so that every single row in the original dataset is considered in making the groups only once, not more.

like image 269
user10141156 Avatar asked May 13 '26 18:05

user10141156


1 Answers

Your answer seems to work but its complexity is exponential since you do an exhaustive search.

This problem seems to be like the subset sum problem, where you want to find if there is a subset that sums to zero in a larger set (It is a NP problem, so you won't find any perfect solution in a polynomial time. Here is an explanation of NP problems if you do not know that).

Since there are already existing polynomial algorithms to find an approximation of the solution to the subset sum problem, I think adapting your problem to that could lead you to an acceptable solution. Here is the backtracking algorithm I would use.

Now let's adapt your problem :

1) The backtracking problem only applies on positive numbers, so the first step would be to find the minimum grade in your set. Keep it, you will need it several times. A simple for loop will do it.

2) If this minimum grade is < 0, you can add it to your whole set to have positive grades. Again a for loop will do it. If the minimum grade is >= 0, you do not have any subset making 0 other than lone students getting zero as a grade (and you have your solution).

3) Now your problem is the exact same problem that on the video, with n the size of your complete set, and m the number you added to your whole set on step 2) (which is the zero before adding). Do the algorithm, find your subset, and remove it from the set.

4) Note that your algorithm may come to the end of the graph without finding the required subset. So each time you backtrack, keep the subset and its sum, compare it to the best found so far, replace it if it is better.

5) Apply step 3) to your remaining set until you get a sum of your remaining set < m or until it is empty. If not empty, it becomes your last set.

This may not be the best solution, feel free to propose any improvement. This algorithm will prioritize low distance subsets + a last bad subset, so if your need is to have all subsets approximately good this may not be a good solution.

The complexity of the algorithm depends a lot if you find your subsets fast each time in the graph, so ordering your students by grades can improve the time complexity. And it would remove the step 1) since the minimum is then the first element.

like image 200
Baptiste Merliot Avatar answered May 16 '26 08:05

Baptiste Merliot