Dynamic programming quiz or comparing two blocks of text

Question

Task definition:

I try to write my own diff util. I want to implement inline-search.

Means I have two paragraphs of text. I have to constrant strings from first paragraph (p1) to strings in second paragraph (p2) in such way that sum of common words in constranted strings will be maximal.

And one important point, you can't replace the strings: I mean if you constrant p1[i] to p2[j], than you can't constrant p1[k] to p2[v] if k < i and v < j.

Little example:

Input:

You have two paragraphs:

"Very very very very"         "Very very very"
"bla bla bla"                 "Very very very very very"
"looks like a very dump text" "One more sentence"
"simple text"                 "looks like a peace of ..."
                              "quite simple"
                              "bla bla bla bla"

...and matrix where matrix[i][j] = number of common words in string p1[i] and p2[j]

3 4 0 0 0 0
0 0 0 0 0 3
0 0 0 3 0 0
0 0 0 0 1 0

Output:

You need to constrant them in next way:

----------------               "Very very very"
"Very very very very"          "Very very very very very"
"bla bla bla"                  ----------------
----------------               "One more sentence"
"looks like a very dump text"  "looks like a peace of ..."
"simple text"                  "quite simple"
----------------               "bla bla bla bla"

Or you can just form the next matrix:

(indexes of strings which have the constrants)

p1Indexes: [0, 2, 3] p2Indexes: [1, 3 ,4]

Question:

what is an efficient algorith for this task?

[Not obligatory to read] Faced difficulties:

How to pass index-collection to next iteration: I mean you need to copy all indexes on each iteration
If you want to use dynamic programing, you need to store not only a common wors number but also a two indexes collections for each possible iteration.

Solution:

public void genConditionLCS() {
    int i = -1;
    int j = -1;
    while (true) {
        int[] indexes = nextIndexes(i+1, j+1);
        i = indexes[0];
        j = indexes[1];
        if (i == -1 || j == -1) break;
        firstParagraphIndexes.add(i); 
        secondParagraphIndexes.add(j);
    }
}
private int[] nextIndexes(int i, int j) {
    if ((i > (lcs.length-1)) || (j > (lcs[0].length-1)))
        return new int[] {-1, -1};
    int a = maxBenefit(i + 1, j);
    int b = maxBenefit(i, j + 1);
    int c = maxBenefit(i + 1, j + 1) + lcs[i][j];
    if ((a == 0) && (b == 0) && (c == 0))
        return new int[]{-1, -1};
    else if (a >= b && a >= c)
        return nextIndexes(i+1, j);
    else if (b >= a && b >= c)
        return nextIndexes(i, j+1);
    else //if (c >= a && c >= b)
        return new int[]{i, j};
}

private int maxBenefit(int i, int j) {
    if ((i > lcs.length - 1) || (j > lcs[0].length - 1)) return 0;
    int res = maxBenefit[i][j];
    if (res == -1) {
        int a = maxBenefit(i + 1, j);
        int b = maxBenefit(i, j + 1);
        int c = maxBenefit(i + 1, j + 1) + lcs[i][j];
        res = max(a, b, c);
        maxBenefit[i][j] = res;
    }
    return res;
}

necromancer · Accepted Answer

Given arrays a[m] and b[n] and given a cost function: benefit(i, j) which calculates the number of common words between elements i and j, your problem can be stated as max_benefit(i, j) which means that i and j are aligned/matched and you need to find out the max benefit and alignment of the remaining part, which is: max(benefit(i + 1, j + 1) + max_benefit(i + 2, j + 2), benefit(i + 2, j + 1) + max_benefit(i + 3, j + 1), benefit(i + 3, j + 1) + max_benefit(i + 4, j + 1), ..., benefit(i + 1, j + 2) + max_benefit(i + 2, j + 3), benefit(i + 1, j + 3) + max_benefit(i + 1, j + 4), ...)

Now, when you first compute max_benefit for any pair of indexes, store the result so that you do not need to reompute it. I. e. check if you have a stored value before computing it; if not, compute it and store the value.

Re faced difficulties:

You can have the array references available as globals / class members, or you can pass the array references as 2 extra arguments: e. g. max_benefit(i, j, a, b) and benefit(i, j, a, b). The arrays won't be copied in most languages.
see the main part of this answer, you just recursively compute and store values so that you do not recompute.

Dynamic programming quiz or comparing two blocks of text

Tags:

algorithm

matrix

dynamic-programming

VB_

1 Answers

necromancer

Recent Activity

Donate For Us

Dynamic programming quiz or comparing two blocks of text

Tags:

algorithm

matrix

dynamic-programming

VB_

1 Answers

necromancer

Related questions

Recent Activity

Donate For Us