
Do subset-sum instances inherently require large integers to force exponential difficulty?

I'm developing custom subset-sum algorithms and have encountered a puzzling issue: it seems difficult to generate truly "hard" subset-sum instances (i.e., forcing exponential computational effort) without using very large integers (e.g., greater than about 2^22).

I'd specifically like to know:

  • Are there known constructions or instance generators for subset-sum that reliably force exponential complexity—particularly against common subset-sum algorithms or custom heuristics—using only moderately sized integers (≤2^22)?
  • Is the hardness of subset-sum instances inherently tied to the size of integers involved, or is it possible to create computationally difficult instances purely through numerical structure and relationships, even with smaller numbers?

For context, here are some attempts I've made at generating potentially hard instances (feedback or improvements welcome):

import random
from math import gcd

def generate_exponential_instance(n):
    # n random values up to 2^22; the target is the sum of a random
    # nonempty subset, so a solution is guaranteed to exist.
    max_element = 2 ** 22
    A = [random.randint(1, max_element) for _ in range(n)]
    while True:
        mask = [random.choice([0, 1]) for _ in range(n)]
        if sum(mask) != 0:
            break
    target = sum(A[i] * mask[i] for i in range(n))
    return A, target

def generate_dense_high_values_instance(n):
    # Tightly clustered values just below 2^22; the target is the sum
    # of a random subset of half the elements.
    base = 2 ** 22 - random.randint(0, 100)
    A = [base + random.randint(0, 20) for _ in range(n)]
    target = sum(random.sample(A, k=n // 2))
    return A, target

def generate_merkle_hellman_instance(n, max_step=20):
    # Build a superincreasing private key, then disguise it with a
    # modular multiplication, as in the Merkle-Hellman knapsack scheme.
    total = 0
    private_key = []
    for _ in range(n):
        next_val = total + random.randint(1, max_step)
        private_key.append(next_val)
        total += next_val
    q = random.randint(total + 1, 2 * total)
    # r must be coprime with q for the trapdoor to be invertible.
    while True:
        r = random.randint(2, q - 1)
        if gcd(r, q) == 1:
            break
    public_key = [(r * w) % q for w in private_key]
    message = [random.randint(0, 1) for _ in range(n)]
    ciphertext = sum(b * k for b, k in zip(message, public_key))
    return public_key, ciphertext
Naseiva Khan asked Oct 29 '25


2 Answers

Subset-sum is known to be solvable in pseudopolynomial time. "Pseudopolynomial time" means the worst-case running time is bounded by a polynomial in the input length and in the largest numeric value in the input. Because a string of L bits can encode numbers as large as O(2^L), pseudopolynomial-time algorithms really do take exponential time in the worst case (that is, exponential in the input length alone). They are still considered better than "usual" exponential-time algorithms, though, because you can avoid the exponential behavior entirely by keeping the numbers small.

For example, even this simple algorithm for subset-sum:

def exists_subset_sum(num_set, target):
  # Grow the set of sums reachable by some subset of the numbers seen so far.
  reachable = {0}
  for num in num_set:
    reachable |= {prev + num for prev in reachable}
  return target in reachable

has pseudopolynomial running time. The key observation is that after the ith iteration, the reachable set contains only values in the interval [-i*M, i*M], where M = max(abs(n) for n in num_set). The size of the set is therefore always O(L*M) (where L is the size of the input in bits), and operations on it take time polynomial in L and M, so the whole algorithm runs in time polynomial in L and M. Technically, since M can be exponential in L, the time complexity of exists_subset_sum in terms of L alone is exponential. But you only get that exponential behavior if you let M grow exponentially; if you increase L without increasing M, you see at worst polynomial growth.
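
A quick way to see this bound in practice (my sketch, not part of the original answer): with a small value bound M, the reachable set stays small no matter how many numbers you add, while with large values it can nearly double at every step.

import random

def reachable_size(num_set):
    # Same reachable-set construction as above, returning only its size.
    reachable = {0}
    for num in num_set:
        reachable |= {prev + num for prev in reachable}
    return len(reachable)

small_values = [random.randint(1, 100) for _ in range(18)]  # M = 100
large_values = [random.getrandbits(60) for _ in range(18)]  # M ~ 2^60

print(reachable_size(small_values))  # at most 18 * 100 + 1 = 1801
print(reachable_size(large_values))  # almost certainly 2^18 = 262144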

You'll notice that many algorithms for solving subset-sum are also pseudopolynomial.
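
For instance, the textbook dynamic-programming formulation (a sketch assuming nonnegative inputs, not taken from the answer above) runs in O(n * target) time, which is polynomial in the value of the target but exponential in its bit length:

def subset_sum_dp(nums, target):
    # dp[s] is True iff some subset of the numbers seen so far sums to s.
    dp = [False] * (target + 1)
    dp[0] = True
    for num in nums:
        # Iterate downward so each number is used at most once.
        for s in range(target, num - 1, -1):
            if dp[s - num]:
                dp[s] = True
    return dp[target]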

HTNW answered Nov 01 '25


Part of what may be confusing you is that in classifying a problem as NP-hard or not, the size of an integer is measured in bits, not by its value. Given k inputs with a maximum value of N, it is pretty easy to solve the problem in time polynomial in kN. More than one paper has claimed to show P = NP because its author thought that this observation solved the problem.

The "size" of an integer N is lg(N), not N. For this problem in particular, the worst-case complexity is dependent on the number of inputs and the total size (in bits) of all of the inputs. Hard problems either require lots of inputs or long inputs.

Try, as your input, 100 random 100-bit numbers, and make your target half their sum.
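
A minimal sketch of that suggestion (my code; the function name is made up):

import random

def generate_yellin_instance(n=100, bits=100):
    # n random numbers of the given bit length; the target is half of
    # their total sum, which is where the hardest instances tend to lie.
    A = [random.getrandbits(bits) for _ in range(n)]
    target = sum(A) // 2
    return A, target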

Frank Yellin answered Nov 01 '25