Custom XML file comparison

Tags:

java

xml

I've seen a lot of posts about XML comparison, but none of the ones I've looked at solves my problem.

We have some XML-formatted text documents (product descriptions, with headings and paragraphs) that are being updated (i.e. versioned), and I've been tasked with making change digests. That is, we want to take two consecutive versions of a file and generate a third file: the heading structure (outline) is to be preserved, but only paragraphs that have changed are to be kept, with both additions and deletions marked up.

So I've been trying to walk both DOM trees and detect additions and deletions, but I can't detect them reliably. Obviously I should be doing a diff, but I can't use a plain diff: I want to diff each element individually, and I need a fully formatted XML digest rather than a traditional diff output.
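Walking both DOM trees in lockstep is easier if each document is first flattened into an ordered list of blocks, so the two versions can be aligned with an ordinary sequence diff. Below is a minimal sketch using only the JDK's DOM parser. It assumes (hypothetically) that headings are `<heading>` elements and paragraphs are `<para>` elements, so adjust the names to your schema; `OutlineFlattener`, `Block` and `flatten` are illustrative names, not part of any library.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class OutlineFlattener {

    /** One block of the document: a heading or a paragraph, with its normalized text. */
    record Block(String kind, String text) {
        @Override
        public String toString() {
            return kind + ":" + text;
        }
    }

    /** Walks the tree in document order, collecting the (assumed) heading and para elements. */
    static void collect(Node node, List<Block> out) {
        if (node instanceof Element el) {
            String name = el.getTagName();
            if (name.equals("heading") || name.equals("para")) {
                // Collapse whitespace so pure reformatting does not count as a change
                out.add(new Block(name, el.getTextContent().trim().replaceAll("\\s+", " ")));
                return; // do not descend into inline markup inside a block
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), out);
        }
    }

    /** Parses the XML string and returns its headings and paragraphs in document order. */
    static List<Block> flatten(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<Block> blocks = new ArrayList<>();
            collect(doc.getDocumentElement(), blocks);
            return blocks;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<doc><section><heading>Intro</heading>"
                + "<para>First   paragraph.</para><para>Second.</para></section></doc>";
        // prints [heading:Intro, para:First paragraph., para:Second.]
        System.out.println(flatten(xml));
    }
}
```

Headings stay in the list, so the outline survives whatever diff you run over it afterwards.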

Any hints before I try to tackle the "Longest common subsequence problem", which is going to be a huge task?
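For what it's worth, the LCS part itself is small: the textbook dynamic-programming version fits in a few dozen lines of Java. This is a minimal sketch (the `LcsDiff`, `Op` and `Edit` names are hypothetical) that turns two lists of strings into a keep/delete/insert script. It uses O(n·m) memory, which should be fine for the paragraph count of a product description but may be too much for very long token lists.

```java
import java.util.ArrayList;
import java.util.List;

public class LcsDiff {

    enum Op { KEEP, DELETE, INSERT }

    record Edit(Op op, String item) {
        @Override
        public String toString() {
            return switch (op) {
                case KEEP -> "  " + item;
                case DELETE -> "- " + item;
                case INSERT -> "+ " + item;
            };
        }
    }

    /** Classic O(n*m) dynamic-programming LCS, followed by a walk that emits an edit script. */
    static List<Edit> diff(List<String> a, List<String> b) {
        int n = a.size(), m = b.size();
        // len[i][j] = length of the LCS of the suffixes a[i..] and b[j..]
        int[][] len = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                len[i][j] = a.get(i).equals(b.get(j))
                        ? len[i + 1][j + 1] + 1
                        : Math.max(len[i + 1][j], len[i][j + 1]);
            }
        }
        // Walk forward through the table so the edits come out in document order
        List<Edit> edits = new ArrayList<>();
        int i = 0, j = 0;
        while (i < n && j < m) {
            if (a.get(i).equals(b.get(j))) {
                edits.add(new Edit(Op.KEEP, a.get(i)));
                i++;
                j++;
            } else if (len[i + 1][j] >= len[i][j + 1]) {
                edits.add(new Edit(Op.DELETE, a.get(i++)));
            } else {
                edits.add(new Edit(Op.INSERT, b.get(j++)));
            }
        }
        // Whatever is left over on either side is a pure deletion or insertion
        while (i < n) edits.add(new Edit(Op.DELETE, a.get(i++)));
        while (j < m) edits.add(new Edit(Op.INSERT, b.get(j++)));
        return edits;
    }

    public static void main(String[] args) {
        List<String> oldVersion = List.of("Intro", "Alpha", "Beta", "Gamma");
        List<String> newVersion = List.of("Intro", "Beta", "Gamma", "Delta");
        // prints "  Intro", "- Alpha", "  Beta", "  Gamma", "+ Delta" on separate lines
        diff(oldVersion, newVersion).forEach(System.out::println);
    }
}
```

For a digest you could apply it at two levels: once over the paragraphs of the two versions (keep the headings, drop KEEP paragraphs, mark up DELETE and INSERT ones), and again over the words of an edited paragraph, e.g. `diff(Arrays.asList(oldText.split("\\s+")), Arrays.asList(newText.split("\\s+")))`, to get inline markup. Treating an adjacent DELETE/INSERT pair as one edited paragraph is a heuristic you would have to decide on yourself.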

KlaymenDK asked Dec 08 '25 22:12


2 Answers

I would suggest using XMLUnit as the differencing engine. It lets you plug in your own DifferenceListener, which is notified whenever two nodes differ. In that handler you can add the appropriate DOM nodes to your target document.

oiavorskyi answered Dec 11 '25 13:12


A professional (but not free) solution to this problem is the DeltaXML product. Buying it will probably be cheaper than building your own.

Michael Kay answered Dec 11 '25 11:12