OVERVIEW
I have a set of valid chunks that I can use to split a text (if a split is possible at all).
How can I split a given text using these chunks so that the result is optimal, i.e. uses the minimum number of chunks?
TEST SUITE
if __name__ == "__main__":
    import random
    import sys

    random.seed(1)

    # get_it_done(choices, text) is the function under test; it is not defined here.

    # 1) Testing robustness
    examples = []
    sys.stdout.write("Testing robustness...")
    N = 50
    large_number = "3141592653589793238462643383279502884197169399375105820974944592307816406286208998628034825342117067982148086513282306647093844609550582231725359408128481"
    for i in range(100):
        for j in range(i):
            # Chunks are strings, so convert the sampled integers.
            choices = [str(c) for c in random.sample(range(i), j)]
            examples.append((choices, large_number))

    for (choices, large_number) in examples:
        get_it_done(choices, large_number)

    sys.stdout.write("OK\n")

    # 2) Testing correctness
    examples = [
        # Example 1
        # Solution: ['012345678910203040506070', '80', '90', '100', '200', '300', '400', '500', '600', '700', '800', '900']
        (
            [
                "0", "1", "2", "3", "4", "5", "6", "7", "8", "9",
                "10", "20", "30", "40", "50", "60", "70", "80", "90",
                "100", "200", "300", "400", "500", "600", "700", "800", "900",
                "012345678910203040506070"
            ],
            "0123456789102030405060708090100200300400500600700800900"
        ),
        # Example 2
        # Solution: ['100']
        (
            ["0", "1", "10", "100"],
            "100"
        ),
        # Example 3
        # Solution: ['101234567891020304050', '6070809010020030040050', '0600700800900']
        (
            [
                "10", "20", "30", "40", "50", "60", "70", "80", "90",
                "012345678910203040506070",
                "101234567891020304050",
                "6070809010020030040050",
                "0600700800900"
            ],
            "10123456789102030405060708090100200300400500600700800900"
        ),
        # Example 4
        # Solution: ['12', '34', '56', '78', '90']
        (
            [
                "12", "34", "56", "78", "90",
                "890",
            ],
            "1234567890"
        ),
        # Example 5
        # Solution: ['12', '34']
        (
            [
                "1", "2", "3",
                "12", "23", "34"
            ],
            "1234"
        ),
        # Example 6
        # Solution: ['100', '10']
        (
            ["0", "1", "10", "100"],
            "10010"
        )
    ]

    score = 0
    for (choices, large_number) in examples:
        res = get_it_done(choices, large_number)
        # A result counts as correct if its chunks concatenate back to the input.
        flag = "".join(res) == large_number
        print("{0}\n{1}\n{2} --> {3}".format(
            large_number, "".join(res), res, flag))
        print('-' * 80)
        score += flag

    print(
        "Score: {0}/{1} = {2:.2f}%".format(score, len(examples), score / len(examples) * 100))

    # 3) TODO: Testing optimization, it should provide (if possible)
    # minimal cases
QUESTION
How could I solve this problem in Python without using a brute-force approach?
Using dynamic programming, you can construct a list (l[0], l[1], l[2], ..., l[n-1]), where n is the number of characters in your input string and l[i] is the minimum number of chunks needed to cover the input string up to and including character i. The overall structure would look as follows:
minValues := list with n infinity entries
for i from 0 to n-1
    for every choice c that is a suffix of input[0..i]
        if i - len(c) < 0
            newVal = 1
        else
            newVal = minValues[i - len(c)] + 1
        end if
        if (newVal < minValues[i])
            minValues[i] = newVal
            // optionally record the used chunk
        end if
    next
next
The minimum number of chunks for your entire string is then l[n-1]; if it is still infinity, the string cannot be split with the given choices. You can get the actual chunks by tracing back through the list (which requires recording the used chunk for each entry).
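The pseudocode above could be turned into Python roughly as follows. This is a minimal sketch, not a tuned implementation: it tests every choice at every position instead of using a faster suffix lookup, and it assumes that both the choices and the text are strings. The function is named get_it_done only so that it matches the call in the test suite, and returning None for an unsplittable text is an assumption.

```python
import math


def get_it_done(choices, text):
    """Return a shortest list of chunks whose concatenation is `text`,
    or None if no split exists (the None convention is an assumption)."""
    n = len(text)
    if n == 0:
        return []

    min_values = [math.inf] * n  # min_values[i]: fewest chunks covering text[0..i]
    used = [None] * n            # used[i]: chunk ending at index i in a best split

    for i in range(n):
        for c in choices:
            start = i - len(c) + 1  # index where c would have to begin
            # Skip empty chunks and chunks that are not a suffix of text[0..i].
            if not c or start < 0 or not text.startswith(c, start):
                continue
            new_val = 1 if start == 0 else min_values[start - 1] + 1
            if new_val < min_values[i]:
                min_values[i] = new_val
                used[i] = c

    if min_values[-1] == math.inf:
        return None

    # Trace back from the last character, then restore left-to-right order.
    chunks = []
    i = n - 1
    while i >= 0:
        chunks.append(used[i])
        i -= len(used[i])
    chunks.reverse()
    return chunks


print(get_it_done(["0", "1", "10", "100"], "10010"))
```

For the choices and text of Example 6 in the test suite, this prints ['100', '10'].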
Retrieving the choices that are suffixes can be sped up using a trie (of the reversed choice strings). The worst-case complexity will still be O(n * c * lc), where n is the length of the input string, c is the number of choices, and lc is the maximum length of a choice. However, this complexity will only occur for choices that are nested suffixes (e.g. 0, 10, 010, 0010, ...). In this case, the trie degenerates to a list. On average, the run time should be much lower. Under the assumption that the number of choices retrieved from the trie is always a small constant, it is O(n * lc) (in practice, the lc factor is probably smaller as well).
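One possible way to realize the trie lookup is sketched below. The nested-dict trie, the END marker key, and the names build_reversed_trie and get_it_done_trie are all illustrative assumptions rather than part of the original answer. The idea is to walk backwards from position i through the trie of reversed choices, so that every choice ending at i is found in a single pass.

```python
import math

END = None  # marker key for "a choice ends here"; never equal to a character


def build_reversed_trie(choices):
    """Build a nested-dict trie over the reversed choice strings."""
    root = {}
    for c in choices:
        if not c:
            continue  # empty chunks can never help
        node = root
        for ch in reversed(c):
            node = node.setdefault(ch, {})
        node[END] = c
    return root


def get_it_done_trie(choices, text):
    """Same dynamic program, but suffix candidates come from the trie."""
    n = len(text)
    if n == 0:
        return []
    root = build_reversed_trie(choices)
    min_values = [math.inf] * n
    used = [None] * n

    for i in range(n):
        node = root
        j = i
        # Walk left from i while the reversed text still matches a trie path.
        while j >= 0 and text[j] in node:
            node = node[text[j]]
            j -= 1
            chunk = node.get(END)
            if chunk is not None:
                # chunk equals text[j+1..i]; j is the last index before it.
                new_val = 1 if j < 0 else min_values[j] + 1
                if new_val < min_values[i]:
                    min_values[i] = new_val
                    used[i] = chunk

    if min_values[-1] == math.inf:
        return None  # assumption: None signals that no split exists

    chunks = []
    i = n - 1
    while i >= 0:
        chunks.append(used[i])
        i -= len(used[i])
    chunks.reverse()
    return chunks


print(get_it_done_trie(["0", "1", "10", "100"], "10010"))
```

Each position costs at most lc trie steps here, which is where the O(n * lc) estimate comes from when only a few choices match per position.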
Here is an example:
choices = ["0","1","10","100"]
text = "10010"

algorithm step      content of minValues
                     0       1        2         3       4
--------------------------------------------------------------
initialize          (∞     , ∞      , ∞       , ∞     , ∞     )
i = 0, c = "1"      (1 "1" , ∞      , ∞       , ∞     , ∞     )
i = 1, c = "0"      (1 "1" , 2 "0"  , ∞       , ∞     , ∞     )
i = 1, c = "10"     (1 "1" , 1 "10" , ∞       , ∞     , ∞     )
i = 2, c = "0"      (1 "1" , 1 "10" , 2 "0"   , ∞     , ∞     )
i = 2, c = "100"    (1 "1" , 1 "10" , 1 "100" , ∞     , ∞     )
i = 3, c = "1"      (1 "1" , 1 "10" , 1 "100" , 2 "1" , ∞     )
i = 4, c = "0"      (1 "1" , 1 "10" , 1 "100" , 2 "1" , 3 "0" )
i = 4, c = "10"     (1 "1" , 1 "10" , 1 "100" , 2 "1" , 2 "10")
Meaning: We can compose the string with 2 chunks. Tracing back gives the chunks in reverse order: "10", "100".