Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Generate All Permutations in Excel using LAMBDA

This is a frequently asked and answered question: how can I generate all permutations in Excel:

2011 2016 2017 2017 superuser 2018 2021

and now in 2022 which did not get an answer before it was closed as a duplicate, which is unfortunate because LAMBDA really changes the way that this question would be answered.

I have infrequently had the same need and get frustrated by having to reinvent a complex wheel. So, I will re-pose the question and put my own answer below. I will not mark any submissions as the answer, but invite good ideas. I am sure that my own approach can be improved upon.

Restating the 2022 question

I am trying to create a loop in excel with formulas only. What I am trying to achieve is described below. Let's say I have 3 columns as inputs: (i) Country; (ii) variable; and (iii) year. I want to expand from these inputs to then assign values to these parameters.

Inputs:

Country Variable Year
GB GDP 2015
DE area 2016
CH area 2015

Outputs:

Country Variable Year
GB GDP 2015
GB GDP 2016
GB area 2015
GB area 2016
DE GDP 2015
DE GDP 2016
DE area 2015
DE area 2016

How can I do that efficiently using Excel?

Expanding the 2018 question

I have three columns, each of which has different kinds of master data as shown below:

enter image description here

Now, I want to have all possible combinations of these three cells - like

aa kk jj
aa kk ff
aa ll jj
aa ll ff
aa mm jj
...

Can this be done with a formula. I found one formula with 2 columns, but i'm not able to extend it to 3 columns correctly

Formula with 2 columns:

=IF(ROW()-ROW($G$1)+1>COUNTA($A$2:$A$15)*COUNTA($B$2:$B$4),"",
INDEX($A$2:$A$15,INT((ROW()-ROW($G$1))/COUNTA($B$2:$B$4)+1))&
INDEX($B$2:$B$4,MOD(ROW()-ROW($G$1),COUNTA($B$2:$B$4))+1))

where G1 is the cell to place the resulting value

Common Requirements

What these have in common is that they are both trying to create an ordered set of permutations from an ordered set of symbols. They both happen to need 3 levels of symbols, but the 2018 question was asking for help to go from 2 levels to 3 and the 2021 question was asking to go from 3 levels to 5. The 2022 question is just asking for 3 levels, but the output needs to be a table.

What if we go up to 6 levels like this?

L1 L2 L3 L4 L5 L6
A F K P U 1
B G L Q V 2
C H R W 3
D X 4
E

This would generate 1'440 permutations.

L1 L2 L3 L4 L5 L6
A F K P U 1
A F K P U 2
A F K P U 3
A F K P U 4
A F K P V 1
A F K P V 2
A F K P V 3
A F K P V 4
A F K P W 1
... ... ... ... ... ...

Making a general formula that takes in any number of levels (columns) is hard. Just go through the answers provided - they each required some rocket science and until now, all the solutions had hard-coded limits on the number of columns of symbols. So can LAMBDA give us a general solution?

like image 323
mark fitzpatrick Avatar asked Jan 24 '26 08:01

mark fitzpatrick


2 Answers

Cool question and brain teezer; I just puzzled on something which is using MAKEARRAY():


Option 1:

What you'd call as "super inefficient" is creating a list of permutations when you calculate rows^columns. I think the following is not that inefficient. Let's imagine the following:

enter image description here

Formula in E1:

=LET(A,A1:C3,B,ROWS(A),C,COLUMNS(A),D,B^C,E,UNIQUE(MAKEARRAY(D,C,LAMBDA(rw,cl,INDEX(IF(A="","",A),MOD(CEILING(rw/(D/(B^cl)),1)-1,B)+1,cl)))),FILTER(E,MMULT(--(E<>""),SEQUENCE(C,,,0))=C))

In short, what this does:

  • Variables A-D are all helpers.
  • The idea then is to just use simple INDEX()s to return all values. To do so we need the right indices for both rows and columns.
  • MAKEARRAY() will make calculations relatively easy due to the recursive functionality that lambda brings. Inside these functions its basic math to return the correct indices for both these rows and columns. In fact there is no calculation needed for the columns as we simply refer to 'cl' and all the calculation for all the row indices is done through MOD(CEILING(rw/(D/(B^cl)),1)-1,B)+1.
  • The result from the above is put into UNIQUE() to use very little resources to filter out any potential duplicates and limit potential empty rows to just a single empty row.
  • FILTER() and MMULT() work nicely together to filter out any unwanted results (read; empty).

That's as compact and quick as I think I could get this. The formula will now work on any contiguous range of cells. A single cell, a single row, a single column or any 2D-range.


Option 2:

OP rightfully mentioned that option 1 could create too many tuples at the start to only later discard them. This can be inefficient. To tackle this issue (and if this is not something you want), we can use a much larger formula. Let's imagine the following data:

A B C
a d f
b e h
e
c g
g

We see that there are empty cells and duplicate values. These are the reasons option 1 would create too many tuples. To counter that I came up with a much longer formula:

=LET(A,A1:C5,B,ROWS(A),C,COLUMNS(A),D,IF(A="",NA(),A),E,MAKEARRAY(B,C,LAMBDA(rw,cl,INDEX(SORT(INDEX(D,0,cl)),rw))),F,BYCOL(E,LAMBDA(cl,COUNTA(FILTER(cl,NOT(ISERROR(cl)))))),G,MAKEARRAY(PRODUCT(F),C,LAMBDA(rw,cl,INDEX(E,MOD(CEILING(rw/IFERROR(PRODUCT(INDEX(F,SEQUENCE(C-cl,,cl+1))),1),1)-1,INDEX(F,cl))+1,cl))),UNIQUE(G))

To break this down:

  • LET() - Work with variables;
  • A - Our initial full range of cells (contiguous);
  • B - The total amount of rows of A;
  • C - The total amount of columns of A;
  • D - The formula IF(A="",NA(),A) is meant to check each value in the matrix if it's empty (string). If so, make it an error (which will make sense in the next step).
  • E - In this step the formula MAKEARRAY(B,C,LAMBDA(rw,cl,INDEX(SORT(INDEX(D,0,cl)),rw))) is sorting each column so the values are on top and all errors pushed down:
A B C
a d f
b e g
c e g
#N/A #N/A h
#N/A #N/A #N/A
  • F - This variable's formula BYCOL(E,LAMBDA(cl,COUNTA(FILTER(cl,NOT(ISERROR(cl)))))) will now count the amount of items per columns. This is needed for later use and to count all permutations. The outcome in this specific case would be {3;3;4}.
  • G - The last variable (if one chooses to use it as such) using MAKEARRAY(PRODUCT(F),C,LAMBDA(rw,cl,INDEX(E,MOD(CEILING(rw/IFERROR(PRODUCT(INDEX(F,SEQUENCE(C-cl,,cl+1))),1),1)-1,INDEX(F,cl))+1,cl))). It's rather long but each step makes sense; Get the product (all possible permutations) to calculate the total amount of rows, the columns stay the same. In the LAMBDA() we refer to all columns after the current column-indice in the F variable. It's quite a big chunk to digest and unfortunately I'm not good enough in explaining =).
  • UNIQUE(G) - The very last step is to filter out all double permutations if one chooses too.

The result:

enter image description here

Now, even though option 1 would beat option 2 in readability, the 2nd option took (with very limited testing) just a third of the time it took the 1st option to calculate. So in terms of speed, this 2nd option would be preferred.


As an alternative to the 2nd option I first had:

=LET(A,A1:C5,B,ROWS(A),C,COLUMNS(A),D,MAKEARRAY(B,C,LAMBDA(rw,cl,IF(MATCH(INDEX(A,rw,cl),INDEX(A,0,cl),0)=rw,INDEX(A,rw,cl),NA()))),E,MAKEARRAY(B,C,LAMBDA(rw,cl,INDEX(SORT(INDEX(D,0,cl)),rw))),F,BYCOL(E,LAMBDA(cl,COUNTA(UNIQUE(FILTER(cl,NOT(ISERROR(cl))))))),G,MAKEARRAY(PRODUCT(F),C,LAMBDA(rw,cl,INDEX(E,MOD(CEILING(rw/IFERROR(PRODUCT(INDEX(F,SEQUENCE(C-cl,,cl+1))),1),1)-1,INDEX(F,cl))+1,cl))),G)

This would now change the D variable to a longer formula to remove duplicates in each column beforehand. Both variations would work just fine.

like image 178
JvdV Avatar answered Jan 26 '26 23:01

JvdV


A Simple LET supported by LAMBDA

To the 2022 question, I used the following LET:

=LET( matrix, A2:E6,

   cC, COLUMNS( matrix ), cSeq, SEQUENCE( 1, cC ),
   rC, ROWS( matrix ), rSeq, SEQUENCE( rC ),
   eC, rC ^ cC, eSeq, SEQUENCE( eC,,0 ),
   unblank, IF( ISBLANK(matrix), "°|°", matrix ),
   m, UNIQUE( INDEX( unblank, MOD( INT( INT( SEQUENCE( eC, cC, 0 )/cC )/rC^SEQUENCE( 1, cC, cC-1, -1 ) ), rC ) + 1, cSeq ) ),
   FILTER( m, BYROW( IFERROR( FIND( "°|°", m ), 0 ), LAMBDA(x, SUM( x ) ) ) = 0 ) )

where the matrix that is used to generate the permutations in in A2:E6: enter image description here

This formula inserts "°|°" in place of blanks so as to avoid having the blanks recast as 0's. It eliminates duplicates by applying UNIQUE.

This is super inefficient because it generates every possible permutation, including repeats and then filters them out. For a small scale matrix, it's not a big deal, but imagine that a 100 row by 3 column matrix would generate 1'000'000 rows and then filter them down to a small subset.

There is one small benefit, however, notice the f in the yellow cell, stranded in the middle of the column and not contiguous with its peers. This formula handles that like a champ. It just integrates it into the outputs of valid permutations. That will be important in the efficient formula below.

An Efficient General LET supported by LAMBDA

As for a general LAMBDA formula, I used:

SYMBOLPERMUTATIONS =
LAMBDA( matrix,

LET(
       cC, COLUMNS( matrix ),
       cSeq, SEQUENCE( 1, cC ),
       symbolCounts, BYCOL( matrix, LAMBDA(x, SUM( --NOT( ISBLANK( x ) ) ) ) ),
       rSeq, SEQUENCE( MAX( symbolCounts )-1 ),
       permFactors, INDEX( SCAN( 1, INDEX( symbolCounts, , cC-cSeq+1), LAMBDA(a,b, a*b ) ),, cC-cSeq+1 ),
       permMods, IFERROR( INDEX( permFactors,, IF( cSeq + 1  > cC, -1, cSeq+1 ) ), 1 ),
       idx, INT( MOD( SEQUENCE( INDEX(permFactors, 1, 1),,0 ), permFactors )/permMods ) + 1,
       answer, INDEX( matrix, idx, cSeq ),
       er, OR( BYCOL( --ISBLANK(matrix), LAMBDA(x, SUM(--(INDEX(x,rSeq+1)<INDEX(x,rSeq))) ) ) ), // detect if there are stranded symbols
       IF( SUM(symbolCounts)=0, "no symbols",
             IF( er, "symbol columns must be contiguous",
                          answer ) ) )

)

enter image description here

This takes in matrix, just as above (the LET version is shown below). It delivers quite the same result, but there are significant differences:

  • It is efficient. In the example shown here, the Simple formula above would generate 5^6 = 15625 permutations just to arrive at 1440 valid ones. This generates only what is needed and does not require filtering.
  • However, it does not handle that stranded f at all. In fact, it would generate a 0 in the place of the f which is not something that a user with a ton of permutations might even notice.

For this reason, there is error detection and handling. The variable er detects if there are any stranded symbols in the matrix that are not column-wise contiguous with the ones above it using this LAMBDA Helper:

OR( BYCOL( --ISBLANK(matrix), LAMBDA(x, SUM(--(INDEX(x,rSeq+1)<INDEX(x,rSeq))) ) ) )

But this creates a new challenge and perhaps a whole new question. In the case of the stranded f, can anyone come up with a way to incorporate it?

The other error trap is to detect if the column is completely empty.

LAMBDA Magic

The magic that LAMBDA brings is from this one line that makes both formulas extensible to any number of columns without having to hard code them:

permFactors, INDEX( SCAN( 1, INDEX( symbolCounts, , cC-cSeq+1), LAMBDA(a,b, a*b ) ),, cC-cSeq+1 )

I got this from JvdV's answer to this question that was aimed specifically at trying to solve this question. Scott Craner & Bosco Yip had showed that it was basically unsolvable without LAMBDA and JvdV showed how it can be done with the LAMBDA helper SCAN.

LET Version of the Efficient Formula

=LET( matrix, A2:F6,
   cC, COLUMNS( matrix ),
   cSeq, SEQUENCE( 1, cC ),
   symbolCounts, BYCOL( matrix, LAMBDA(x, SUM( --NOT( ISBLANK( x ) ) ) ) ),
   rSeq, SEQUENCE( MAX( symbolCounts )-1 ),
   permFactors, INDEX( SCAN( 1, INDEX( symbolCounts, , cC-cSeq+1), LAMBDA(a,b, a*b ) ),, cC-cSeq+1 ),
   permMods, IFERROR( INDEX( permFactors,, IF( cSeq + 1  > cC, -1, cSeq+1 ) ), 1 ),
   idx, INT( MOD( SEQUENCE( INDEX(permFactors, 1, 1),,0 ), permFactors )/permMods ) + 1,
   answer, INDEX( matrix, idx, cSeq ),
   er, OR( BYCOL( --ISBLANK(matrix), LAMBDA(x, SUM(--(INDEX(x,rSeq+1)<INDEX(x,rSeq))) ) ) ),
   IF( SUM(symbolCounts)=0, "no symbols",
         IF( er, "symbol columns must be contiguous",
                      answer ) ) )
like image 21
mark fitzpatrick Avatar answered Jan 26 '26 23:01

mark fitzpatrick