Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find longest substring s.t the count of each character does not exceed the number of unique characters

Tags:

algorithm

Given a string, I want to find the longest substring such that the count of each character does not exceed the number of unique characters in that substring.

For example:

Input: 'aaabb'
Output: 'aabb'

I understand how to do this brute-force: Generate all substrings while tracking the count of each character and the number of distinct characters in each substring. If there's a substring where the condition is violated, we move the start of the window forward. Else, we move the end of the window forward.

The brute-force approach is in O(n^2). Is there a more efficient way to do this?

like image 607
Mark Carothers Avatar asked Oct 16 '25 16:10

Mark Carothers


1 Answers

Here's an O(n^2) algorithm, I think. (note that I don't think the O(n^2) solution given in the question works)

O(n^2) in the n >>> # of unique characters case:

Let S be the input string.

  1. Scan S and produce C the set of distinct characters in S. Let |C| denote the length of C.
  2. For each character c in C, produce a cumulative-occurrences array O_c, where each O_c[i] is the number of c in the substring S[0:i]. This is O(n * |C|).
  3. Given a start and endpoint, we can get the number of characters c in the substring via O_c[end] - O_c[start]. Obtain the maximum occurrences and the number of non-zero occurrences and use these to check validity.
  4. Using 3, check each start and endpoint in O(n^2 * |C|). If we can assume that the number of characters is small compared to n, that's O(n^2).

It seems like if there is an improvement over O(n^2), it'd be some way to avoid checking every start and endpoint in 4.

O(n) in the n >>> (# of unique characters)^2 case:

  1. If for some substring S[i:j] there are more occurrences of some character c than |C|, then any S[i:k] with k > j is an invalid substring (because all the unique characters in the string cannot make up for the current occurrences of c). Thus, we can immediately increment i once we find this condition.
  2. For substring of length l, at least one character will have at least ceil(l / |C|) occurrences (pigeonhole argument).
  3. So the maximal length of a valid substring is |C|^2.
like image 89
Kaia Avatar answered Oct 18 '25 16:10

Kaia



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!