Why use diffxml?
I’m the author of the diffxml tool for comparing XML documents. In this post I’d like to explain why you might want to use diffxml to compare XML documents rather than traditional text tools such as the UNIX diff command.
There are two things that diffxml understands and diff doesn’t: the syntax of XML documents (e.g. <br/> is equivalent to <br></br>) and the hierarchical structure they represent.
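As a small illustration of the first point, here is a sketch in Python (not part of diffxml, and assuming Python 3.8 or later for `ET.canonicalize`). It shows that the two spellings of an empty element differ as text but become identical once an XML-aware tool normalises them.

```python
import xml.etree.ElementTree as ET

# Two spellings of the same empty element.
short_form = "<br/>"
long_form = "<br></br>"

# A plain string comparison, like a line-based diff, sees two different texts.
text_equal = short_form == long_form

# Canonical XML (C14N 2.0) writes both spellings the same way.
canonical_equal = ET.canonicalize(short_form) == ET.canonicalize(long_form)

print(text_equal, canonical_equal)
```

The canonical form is used here only to demonstrate the equivalence; it says nothing about how diffxml compares documents internally.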
The advantages of understanding XML syntax are pretty easy to explain. Consider these two XML documents:
<a
>text<b/>
<c></c>
</a>
and
<a>text<b></b>
<d/>
</a>
If we compare these using diff, we get the following output:
1,3c1,2
< <a
< >text<b/>
< <c></c>
This tells us that almost every line in the document has changed. However, if we use diffxml to compare the documents, we get:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<delta>
<insert charpos="2" childno="4" name="d" nodetype="1" parent="/node()[1]"/>
<delete node="/node()[1]/node()[5]"/>
</delta>
This tells us that the difference between the documents is the insertion of an element “d” and the removal of another element (here, “c”) [1].
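To see why parsing helps, here is a minimal Python sketch that reads the same two documents and compares the child elements of the root. It is only an illustration of the idea, not how diffxml works; the names `OLD`, `NEW` and `child_tags` are made up for this example, and it ignores text nodes and nesting entirely.

```python
import xml.etree.ElementTree as ET

# The two example documents, including their original line breaks.
OLD = "<a\n>text<b/>\n<c></c>\n</a>"
NEW = "<a>text<b></b>\n<d/>\n</a>"


def child_tags(xml_text):
    """Return the tag names of the root element's child elements, in order."""
    root = ET.fromstring(xml_text)
    return [child.tag for child in root]


old_tags = child_tags(OLD)
new_tags = child_tags(NEW)

# Tags present on one side only; line breaks and <b/> vs <b></b> play no part.
inserted = [tag for tag in new_tags if tag not in old_tags]
deleted = [tag for tag in old_tags if tag not in new_tags]

print("inserted:", inserted, "deleted:", deleted)
```

Once the documents are parsed, the differences in line breaks and empty-element syntax disappear, and only the “c” and “d” elements stand out.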
The other major advantage of diffxml is that it understands the hierarchical, or “tree” structure of XML documents. It’s a little harder to explain what this means, but consider the following. The XML document:
<a><b><d/></b><c><e/></c></a>
Can be represented as:

    a
   / \
  b   c
  |   |
  d   e

And the XML document:
<a><b/><c><d/><e/></c></a>
Can be represented as:

    a
   / \
  b   c
     / \
    d   e

It’s clear from the diagrams that the only change is that the element “d” has moved from element “b” to element “c”. There is no way that a line-based differencing utility could tell us this, but diffxml gives us:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<delta>
<move childno="1" new_charpos="1" node="/node()[1]/node()[1]/node()[1]" old_charpos="1" parent="/node()[1]/node()[2]"/>
</delta>
This correctly identifies that the only difference is the move of a single element to a new parent.
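The idea of a move can be sketched in a few lines of Python. This is a toy illustration rather than diffxml's algorithm: it assumes every tag name is unique within a document (true for this example, rarely true in practice), and the names `parent_of_each` and `moves` are invented here.

```python
import xml.etree.ElementTree as ET

OLD = "<a><b><d/></b><c><e/></c></a>"
NEW = "<a><b/><c><d/><e/></c></a>"


def parent_of_each(xml_text):
    """Map each element's tag to its parent's tag (assumes unique tag names)."""
    root = ET.fromstring(xml_text)
    return {child.tag: parent.tag for parent in root.iter() for child in parent}


old_parents = parent_of_each(OLD)
new_parents = parent_of_each(NEW)

# An element present in both documents but under a different parent has moved.
moves = {
    tag: (old_parents[tag], new_parents[tag])
    for tag in old_parents
    if tag in new_parents and old_parents[tag] != new_parents[tag]
}

print(moves)
```

For these two documents the only entry is “d”, which goes from parent “b” to parent “c”. A real tool has to match nodes without relying on unique names, which is where most of the difficulty lies.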
I hope this makes it clear why tools such as diffxml, which understand the hierarchical nature of XML documents, are often a better choice than line-based equivalents for comparing them.
[1] Admittedly, the output is currently a little hard for humans to read. There are a couple of things that can be done to improve this (use proper node names instead of the node() test, and put a string in the nodetype attribute), but in the future I hope to provide some sort of graphical interface.

May 20th, 2009 at 10:07 pm
Excellent stuff! Visual Studio project (.vcproj) and solution (.sln) files are XML and I frequently get problems with merging in changes using TortoiseMerge (which is very good, but line-based).
Is it possible to use diffxml as a custom diff / merge tool with TortoiseSVN?
May 21st, 2009 at 6:59 pm
Not at the minute, but it sounds like a useful idea, so I’ll add it to the list of wanted features.
The main focus at the moment is getting the quality right; in the current version you can still expect to run into the odd bug.
June 8th, 2009 at 10:27 am
I have just come across this utility and am following a company project based on it. It’s undoubtedly an easy solution, but can we automate the process of comparing two XML files through this tool? I have been asked this question and am looking for an answer, so that we can include diffxml in our projects.
Thanks
June 8th, 2009 at 5:48 pm
I’m not 100% sure what you mean, but I think the answer is yes.
They are command-line utilities, so it’s dead simple to create a wrapper script or something. You could also directly access the Java classes, but that’s a little more work (and remember that they are GPL-licensed).
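As a rough idea of what such a wrapper might look like, here is a minimal Python sketch. The function name `run_xml_diff` is made up for this example, and both the executable name `diffxml` and the "old file, then new file" argument order are assumptions; check them against your own installation before relying on this.

```python
import subprocess


def run_xml_diff(old_path, new_path, command=("diffxml",)):
    """Run an XML diff command on two files and return (exit_code, stdout).

    The default command name and the argument order are assumptions;
    pass a different `command` tuple to match your installation.
    """
    result = subprocess.run(
        [*command, old_path, new_path],
        capture_output=True,  # collect the delta instead of printing it
        text=True,            # decode the output as text
    )
    # The exit code is returned rather than checked, because the meaning of
    # a non-zero code (differences found, or an error) is not assumed here.
    return result.returncode, result.stdout


# Example use (needs the command to be installed and on the PATH):
# exit_code, delta = run_xml_diff("old.xml", "new.xml")
```

The returned delta is just text, so a build script could save it, parse it, or simply test whether it contains any operations.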
August 27th, 2009 at 8:04 pm
Please, we absolutely need a GUI interface for this. Awesome tool!