Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Selecting such vector elements so that the sum of elements is exactly equal to the specified value

I have a vector of random positive integers. I would like to select only those elements of the vector whose sum will be exactly equal to a certain predetermined value.

Let's take an example like this. x=1:5, I am looking for elements whose sum is equal to 14. The solution is of course the vector c(2, 3, 4, 5).

Of course, there may be several solutions. Example 2. x=1:5, I'm looking for elements whose sum is equal to 7. Here, of course, should be the following three solutions:
1.c(2, 5),
2.c(3, 4),
3.c(1, 2, 4).

There may also be a situation where there will be no solutions at all. Example 3. x=c(1, 2, 7), I'm looking for elements whose sum equals 5. Of course, there are no correct solutions here.

Everything seems trivially simple if we have vectors of several elements. Here, I even came up with a few alternative solutions. However, the problem becomes when the size of the vector increases.

My vector looks like this:

x= c(236L, 407L, 51L, 308L, 72L, 9787L, 458L, 5486L, 42L, 4290L,
  31L, 3533L, 1102L, 24L, 100L, 669L, 9352L, 4091L, 2751L, 3324L,
  3193L, 245L, 86L, 98932L, 77L, 13L, 9789L, 91L, 999L, 25L, 25379L,
  9626L, 9092L, 622L, 97L, 57L, 2911L, 6L, 405L, 894L, 1760L, 9634L,
  96L, 9765L, 223L, 765L, 743L, 5960L, 14L, 50L, 89L, 348L, 5875L,
  5L, 58602L, 397L, 1181L, 94L, 954L, 7901L, 836L, 8810L, 52L,
  15L, 48L, 26L, 4L, 66L, 5265L, 80L, 282L, 231L, 76L, 661L, 7604L,
  7406L, 58L, 10L, 903L, 49446L, 80921L, 1L, 876L, 334L, 63L, 796L,
  88L, 413L, 1214L, 2983L, 9518L, 595L, 708L, 53L, 321L, 12L, 634L,
  4910L, 8984L, 465L)

I have to find at least one subset of elements whose sum will be exactly 23745. Unfortunately, I had a complete failure here. Whatever I write is calculated in hours and I don't get any correct result anyway. Does anyone have any idea how this can be solved in R? I will be grateful for even a small hint.

like image 208
blrun Avatar asked Sep 10 '25 07:09

blrun


2 Answers

I have to admit that your problem inspired me and made me wonder. I decided to face it by creating my own optimization function. And even though you got the answer that uses the gbp package (I didn't know it before), let them share my own function. Here it is findSumm.

findSumm = function(x, sfind, nmax=1, tmax=1){
  if(sum(x)<sfind) stop("Impossible solution! sum(x)<sfind!")

  fTimeSec = function() as.numeric(Sys.time()-l$tstart, units="secs")
  #The current selection of vector element
  sel = c(TRUE, rep(FALSE, length(x)-1))
  #List of intermediate states of the vector sel
  lsel = list()
  #List with a collection of parameters and results
  l = list(
    x = sort(x, TRUE),
    tstart = Sys.time(),
    chosen = list(),
    xfind = list(),
    time = c(),
    stop = FALSE,
    reason = "")

  while(TRUE) {
    #Maximum Runtime Test
    if(fTimeSec()>tmax) {
      l$reason = "Calculation time is greater than tmax.\n"
      l$stop = TRUE
      break
    }

    #Record the solution and test the number of solutions
    if(sum(l$x[sel])==sfind){
      #Save solution
      l$chosen[[length(l$chosen)+1]] = sel
      l$xfind[[length(l$xfind)+1]] = l$x[sel]
      l$time = c(l$time, fTimeSec())

      #Test the number of solutions
      if(length(l$chosen)==nmax){
        l$reason = "Already found nmax solutions.\n"
        l$stop = TRUE
        break
      }
    }

    idx = which(sel)
    if(idx[length(idx)]==length(sel)) {
      if(length(lsel)==0) break
      sel=lsel[[length(lsel)]]
      idx = which(sel)
      lsel[length(lsel)]=NULL
      sel[idx[length(idx)]]=FALSE
      sel[idx[length(idx)]+1]=TRUE
      next
    }

    if(sum(l$x[sel])>=sfind){
      sel[idx[length(idx)]]=FALSE
      sel[idx[length(idx)]+1]=TRUE
      next
    } else {
      lsel[[length(lsel)+1]] = sel  #Save the current state of sel vector
      sel[idx[length(idx)]+1]=TRUE
      next
    }
  }
  if(length(l$chosen)==0 & !l$stop) stop("No solutions!")

  l$reason = paste(l$reason, "Found", length(l$chosen),
                   "solutions in time", signif(fTimeSec(), 3), "seconds.\n")
  cat(l$reason)
  return(l)
}

Let's check how it works

findSumm(1:5, 20)$xfind
#Error in findSumm(1:5, 20) : Impossible solution! sum(x)<sfind!

findSumm(c(1,2,7), 5)$xfind
#Error in findSumm(c(1, 2, 7), 5) : No solutions!

findSumm(1:5, 14, 10, 10)$xfind
# Found 1 solutions in time 0.007 seconds.
# [[1]]
# [1] 5 4 3 2

findSumm(1:5, 5, 10, 10)$xfind
# Found 3 solutions in time 0.001 seconds.
# [[1]]
# [1] 5
#
# [[2]]
# [1] 4 1
#
# [[3]]
# [1] 3 2

findSumm(1:5, 7, 10, 10)$xfind
# Found 3 solutions in time 0.004 seconds.
# [[1]]
# [1] 5 2
#
# [[2]]
# [1] 4 3
#
# [[3]]
# [1] 4 2 1

As you can see it was doing great. Now it's time to check that on your vector x.

x= c(236L, 407L, 51L, 308L, 72L, 9787L, 458L, 5486L, 42L, 4290L,
  31L, 3533L, 1102L, 24L, 100L, 669L, 9352L, 4091L, 2751L, 3324L,
  3193L, 245L, 86L, 98932L, 77L, 13L, 9789L, 91L, 999L, 25L, 25379L,
  9626L, 9092L, 622L, 97L, 57L, 2911L, 6L, 405L, 894L, 1760L, 9634L,
  96L, 9765L, 223L, 765L, 743L, 5960L, 14L, 50L, 89L, 348L, 5875L,
  5L, 58602L, 397L, 1181L, 94L, 954L, 7901L, 836L, 8810L, 52L,
  15L, 48L, 26L, 4L, 66L, 5265L, 80L, 282L, 231L, 76L, 661L, 7604L,
  7406L, 58L, 10L, 903L, 49446L, 80921L, 1L, 876L, 334L, 63L, 796L,
  88L, 413L, 1214L, 2983L, 9518L, 595L, 708L, 53L, 321L, 12L, 634L,
  4910L, 8984L, 465L)

findSumm(x, 23745, 1, 10)$xfind[[1]]
# Already found nmax solutions.
# Found 1 solutions in time 0.008 seconds.
# [1] 9789 9787 4091   77    1

A few comments about my function. My function searches for all possible and valid combinations unless it reaches the number of valid results specified by nmax or the computation takes tmax seconds. In the case of your vector x and the sum you are looking for 23745, the number of correct solutions is enormous. I turned it on for 1 min and got the 37827 results! And the function would still find valid results at a rate of 626 solutions per second perhaps for the next 100 years! Below is a visualization of this process.

l = findSumm(x, 23745, +Inf, 60)
library(tidyverse)
library(ggpmisc)
df = tibble(
  n = 1:length(l$chosen),
  t = l$time
)

df %>% ggplot(aes(t,n))+
  geom_line(size=0.1)+
  geom_smooth(method = lm, formula = y~x)+
  stat_poly_eq(formula = y~x,
               aes(label = paste(..eq.label.., ..rr.label.., sep = "~~~")),
               parse = TRUE)

enter image description here

Finally, I decided to check the performance of my function. I did not expect a revelation because it was not written in pure C. However, I must admit that the graph below surprised me very pleasantly!

library(microbenchmark)
ggplot2::autoplot(microbenchmark(findSumm(x, 23745), 
                                 gbp1d_solver_dpp(p = x, w = x, c = 23745L), 
                                 times=100))

enter image description here

As it turned out, my function is almost 4 times faster than gbp1d_solver_dpp. I'm proud!

Since I am very stubborn and curiously inquisitive, I decided to perform one more test on a vector with a length of 1000.

x=c(234L, 1891L, 3187L, 38417L, 2155L, 6857L, 71692L, 463575L, 
    800L, 2195L, 820L, 9735L, 913L, 62685L, 920597L, 864L, 903L, 
    478L, 2828L, 99371L, 3109L, 379L, 8544L, 444L, 772L, 571L, 226L, 
    94L, 378L, 60253L, 10920L, 47626L, 671L, 45163L, 27767L, 62498L, 
    87706L, 4966L, 4615L, 14897L, 261L, 684L, 3780L, 97L, 705L, 7313L, 
    3629L, 436L, 5076L, 3198L, 731L, 56634L, 67411L, 249L, 403L, 
    82728L, 9986L, 643662L, 11045L, 934L, 8154L, 289L, 4452L, 624L, 
    4876L, 86859L, 933L, 2372L, 6493L, 773566L, 6599L, 459L, 2024L, 
    80425L, 591L, 6262L, 35033L, 89607L, 6435L, 14917L, 9559L, 67983L, 
    82365L, 88127L, 466L, 758L, 11605L, 828L, 410L, 557L, 2991L, 
    808L, 8512L, 273605L, 294L, 4666L, 27L, 26337L, 7340L, 682L, 
    46480L, 19903L, 699L, 700L, 58L, 136L, 852L, 909L, 64316L, 9109L, 
    876L, 6382L, 803L, 295L, 9539L, 26271L, 1906L, 23639L, 9022L, 
    9513L, 169L, 65427L, 861864L, 743L, 91L, 9039L, 247L, 58749L, 
    5674L, 65959L, 99126L, 7765L, 5934L, 13881L, 77696L, 66894L, 
    977L, 6279L, 46273L, 919L, 6307L, 316L, 420113L, 61336L, 70L, 
    6148L, 257L, 17804L, 14L, 989L, 16907L, 36L, 25L, 333L, 224L, 
    119L, 4000L, 9438L, 5439L, 748L, 16532L, 4847L, 939L, 9504L, 
    2782L, 424L, 64034L, 5306L, 30247L, 6636L, 3976L, 60588L, 180L, 
    78118L, 1L, 61866L, 9501L, 15834L, 66712L, 77219L, 448L, 612L, 
    5339L, 58413L, 4785L, 2191L, 35711L, 84383L, 6261L, 896L, 24353L, 
    54868L, 288L, 8059L, 867L, 687L, 94667L, 1713L, 1507L, 71048L, 
    882L, 4155L, 97230L, 49492L, 47839L, 793L, 263L, 63160L, 9062L, 
    3518L, 55956L, 6626L, 14619L, 636L, 1127L, 970L, 5512L, 118117L, 
    2370L, 802L, 98333L, 6089L, 1076L, 80L, 305L, 3995L, 437L, 49L, 
    9207L, 2021L, 7554L, 9486L, 33501L, 55745L, 967L, 24857L, 692L, 
    4148L, 464957L, 2381L, 3876L, 3246L, 1478L, 308L, 98068L, 532L, 
    4670L, 7965L, 940L, 467L, 777L, 68749L, 2739L, 23951L, 831L, 
    60763L, 12047L, 75620L, 650L, 69584L, 294122L, 41149L, 9657L, 
    780L, 153054L, 37990L, 16L, 894L, 15500L, 31873L, 3800L, 472L, 
    50989L, 8767L, 8209L, 2929L, 4751L, 38L, 47403L, 64941L, 28042L, 
    49020L, 81785L, 299L, 936L, 63136L, 3L, 42033L, 1750L, 1147L, 
    273L, 62668L, 41L, 5829L, 686L, 511L, 65019L, 842L, 88716L, 96217L, 
    9442L, 6324L, 197L, 55422L, 630L, 665L, 3921L, 726L, 766916L, 
    43944L, 9035L, 573L, 77942L, 29689L, 749L, 95240L, 281L, 1933L, 
    78265L, 812L, 854L, 17445L, 8855L, 2940L, 6057L, 46689L, 999L, 
    381L, 347L, 50199L, 161L, 534L, 804L, 99043L, 13183L, 679L, 432L, 
    38887L, 575L, 781L, 2023L, 187077L, 89498L, 85L, 16780L, 3731L, 
    45904L, 13861L, 3971L, 301L, 4175L, 9427L, 126913L, 845L, 175L, 
    1684L, 9064L, 56647L, 116L, 479672L, 6754L, 441L, 412L, 97091L, 
    4062L, 598L, 146L, 423L, 2715L, 198939L, 80577L, 76385L, 2088L, 
    139L, 647L, 246L, 85002L, 898L, 50939L, 135L, 46388L, 623L, 17928L, 
    63072L, 346L, 78582L, 16691L, 838L, 44L, 5181L, 7918L, 3650L, 
    35L, 8825L, 9758L, 22677L, 9838L, 2239L, 9001L, 96689L, 570L, 
    47373L, 507L, 6378L, 40839L, 11677L, 937874L, 2485L, 22188L, 
    20413L, 13L, 877L, 5578L, 428L, 61L, 3200L, 5444L, 85540L, 640L, 
    94460L, 310L, 6043L, 3771L, 6167L, 476L, 9365L, 1956L, 143L, 
    7841L, 4957L, 3309L, 9317L, 41434L, 97881L, 51853L, 474L, 3098L, 
    7109L, 93976L, 545L, 28475L, 2066L, 4959L, 7410L, 293L, 8246L, 
    43L, 721L, 2260L, 72854L, 100L, 61382L, 107L, 5637L, 891L, 256L, 
    442L, 84440L, 55792L, 195L, 24074L, 19L, 57376L, 59159L, 805253L, 
    193329L, 3636L, 98954L, 968L, 380L, 5203L, 90157L, 71907L, 35497L, 
    41769L, 1683L, 1984L, 5765L, 832L, 411L, 4888L, 9801L, 710L, 
    2325L, 40L, 32927L, 435L, 66L, 66301L, 94776L, 48234L, 28977L, 
    122312L, 48L, 359L, 572L, 753L, 945L, 32241L, 328L, 55976L, 128L, 
    815794L, 57894L, 576L, 60131L, 342448L, 8913L, 33506L, 20448L, 
    58750L, 637L, 82086L, 635710L, 96772L, 272L, 938L, 4863L, 737L, 
    949L, 4804L, 3446L, 92319L, 28883L, 6032L, 53970L, 9394L, 5630L, 
    71583L, 136862L, 23161L, 8545L, 54249L, 213666L, 668L, 893L, 
    881126L, 8252L, 584L, 83L, 13754L, 244156L, 530L, 64574L, 22009L, 
    89204L, 34992L, 85992L, 82697L, 50L, 95845L, 3096L, 42L, 554949L, 
    325L, 2092L, 28L, 3830L, 893583L, 625L, 3740L, 4513L, 9938L, 
    910L, 8868L, 9614L, 41281L, 27915L, 25839L, 4417L, 5730L, 2825L, 
    683L, 550L, 88838L, 9248L, 961L, 2748L, 7259L, 53220L, 2179L, 
    4036L, 46014L, 83725L, 8211L, 6957L, 6886L, 4653L, 6300L, 80437L, 
    135885L, 23745L, 9536L, 78L, 652590L, 1037L, 5293L, 492L, 7467L, 
    71685L, 890L, 5023L, 96524L, 17465L, 53665L, 21508L, 463L, 159L, 
    311L, 764L, 27534L, 71L, 2504L, 270L, 6449L, 13449L, 302L, 88L, 
    3893L, 22007L, 9208L, 680618L, 878L, 14721L, 20L, 322374L, 644L, 
    944669L, 57334L, 233L, 982L, 870L, 950L, 121L, 254L, 4226L, 45L, 
    61823L, 9626L, 58590L, 6552L, 3920L, 68L, 3644L, 35775L, 4591L, 
    636207L, 78314L, 408L, 371L, 984L, 7089L, 4679L, 2233L, 756L, 
    20527L, 178L, 80573L, 589923L, 120L, 7938L, 894842L, 6563L, 569L, 
    91110L, 620L, 786288L, 46022L, 396L, 762533L, 145964L, 7732L, 
    60L, 274L, 87869L, 227L, 6706L, 707L, 955L, 48246L, 771L, 29001L, 
    14224L, 5173L, 20215L, 7566L, 1564L, 733L, 3568L, 3570L, 39256L, 
    925L, 41577L, 348L, 68267L, 151L, 98572L, 1389L, 5421L, 69043L, 
    42434L, 27597L, 53320L, 46051L, 1686L, 59L, 361L, 747579L, 5044L, 
    73873L, 28894L, 8146L, 353L, 2622L, 664L, 349L, 90764L, 8920L, 
    716L, 14903L, 96055L, 89L, 94239L, 416L, 7896L, 232L, 5543L, 
    61664L, 6709L, 2L, 14275L, 2954L, 917416L, 3567L, 42086L, 99956L, 
    86112L, 206L, 64L, 25956L, 57112L, 425L, 6507L, 28034L, 991L, 
    8444L, 140L, 1461L, 68783L, 347633L, 87696L, 593L, 164L, 837L, 
    8793L, 965L, 8811L, 97412L, 351L, 23L, 66808L, 8308L, 14245L, 
    12519L, 3019L, 1920L, 813L, 485L, 979L, 929L, 2970L, 32447L, 
    8962L, 867973L, 40534L, 551L, 20941L, 49413L, 188L, 948L, 9018L, 
    187252L, 3919L, 45963L, 358L, 7211L, 959L, 47L, 4220L, 36086L, 
    1645L, 33056L, 300L, 29682L, 9152L, 431L, 364L, 2211L, 3779L, 
    4633L, 22500L, 33980L, 794L, 84558L, 488L, 732L, 6686L, 15042L, 
    906L, 13553L, 6115L, 153L, 866L, 3624L, 329L, 6875L, 86L, 6298L, 
    57424L, 17582L, 955879L, 40945L, 4858L, 694L, 755L, 499L, 406L, 
    564L, 874L, 1695L, 43961L, 578L, 9063L, 505L, 5856L, 4484L, 76708L, 
    712L, 23348L, 986L, 275L, 996L, 8966L, 220L, 7008L, 849L, 953460L, 
    3062L, 278L, 26L, 8547L, 16895L, 98289L, 815L, 25135L, 956L, 
    370L, 8221L, 72674L, 31711L, 73L, 41667L, 2915L, 797L, 41309L, 
    4257L, 8148L, 5723L, 2124L, 8306L, 53388L, 33520L, 680L, 893759L, 
    40133L, 94791L, 988L, 162L, 79366L, 37625L, 7125L, 50947L, 171L, 
    99558L, 166L, 90717L, 5807L, 606L, 98592L, 59207L, 966L, 61299L, 
    7553L, 9678L, 62322L, 156L, 267L, 8478L, 59554L, 2264L, 28338L, 
    899L, 9719L, 98L, 51403L, 6302L, 265L, 79929L, 101L, 5227L, 972L, 
    145L, 48018L, 90140L, 698L, 8L, 5751L, 26083L, 1295L, 78124L, 
    383L, 2776L, 80204L, 210L, 3422L, 36064L, 46L, 4953L, 20271L, 
    3916L, 767L, 601372L, 56575L, 5237L, 5621L, 6705L, 1191L, 63768L, 
    1016L, 313L, 2285L, 12489L, 2755L, 338L, 7518L, 2630L, 421L, 
    6554L, 306L, 113L, 57197L, 885L, 9445L, 37364L, 86630L, 2460L, 
    715L, 10829L, 9914L, 6635L, 229L, 525L, 839L, 3278L, 969L, 182L, 
    187L, 7022L, 554L, 6489L, 15791L, 4157L, 47048L, 9447L, 152L, 
    1419L, 22618L, 5194L, 609L, 923L, 768L, 6248L, 714L, 1159L, 825893L, 
    53492L, 19731L, 65167L, 96325L, 336L, 4443L, 843L, 62960L, 9788L, 
    35032L, 284L, 4647L, 360L, 11297L, 1515L)

findSumm(x, 9568447L)$xfind[[1]]
# Already found nmax solutions.
# Found 1 solutions in time 0.065 seconds.
# [1] 955879 953460 944669 937874 920597 917416 894842 893759 893583 881126 347633  27597      8      3      1

As you can see my findSumm function works great. It took only 0.065 seconds to extract a subset with a sum equal to 9568447L! Unfortunately, trying to run gbp1d_solver_dpp on such a long vector resulted in the error "size is too large". So I was not able to compare the performance of these two solutions with such a large vector.

like image 177
Marek Fiołka Avatar answered Sep 12 '25 21:09

Marek Fiołka


This task sounds like a 1 dimensional bin packing problem or knapsack problem, in which case there are many resources available online to help guide you.

One potential solution is to use the gbp package, e.g.

#install.packages("gbp")
library(gbp)
#> Loading required package: magrittr
#> Loading required package: data.table
x <- c(236L, 407L, 51L, 308L, 72L, 9787L, 458L, 5486L, 42L, 4290L,
     31L, 3533L, 1102L, 24L, 100L, 669L, 9352L, 4091L, 2751L, 3324L,
     3193L, 245L, 86L, 98932L, 77L, 13L, 9789L, 91L, 999L, 25L, 25379L,
     9626L, 9092L, 622L, 97L, 57L, 2911L, 6L, 405L, 894L, 1760L, 9634L,
     96L, 9765L, 223L, 765L, 743L, 5960L, 14L, 50L, 89L, 348L, 5875L,
     5L, 58602L, 397L, 1181L, 94L, 954L, 7901L, 836L, 8810L, 52L,
     15L, 48L, 26L, 4L, 66L, 5265L, 80L, 282L, 231L, 76L, 661L, 7604L,
     7406L, 58L, 10L, 903L, 49446L, 80921L, 1L, 876L, 334L, 63L, 796L,
     88L, 413L, 1214L, 2983L, 9518L, 595L, 708L, 53L, 321L, 12L, 634L,
     4910L, 8984L, 465L)

test <- gbp1d_solver_dpp(p = x, w = x, c = 23745L)

list_of_selected_items <- x[as.logical(test$k)]
list_of_selected_items
#> [1]  236   51  308  458 5486 4290   31 3533 9352
sum(list_of_selected_items)
#> [1] 23745

Created on 2021-10-18 by the reprex package (v2.0.1)

like image 29
jared_mamrot Avatar answered Sep 12 '25 22:09

jared_mamrot