<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Feeding the Bit Bucket &#187; diffxml</title>
	<atom:link href="http://www.adrianmouat.com/bit-bucket/tag/diffxml/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.adrianmouat.com/bit-bucket</link>
	<description>Software development thoughts and rants</description>
	<lastBuildDate>Mon, 09 Jan 2012 13:49:33 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Why use diffxml?</title>
		<link>http://www.adrianmouat.com/bit-bucket/2009/05/why-use-diffxml/</link>
		<comments>http://www.adrianmouat.com/bit-bucket/2009/05/why-use-diffxml/#comments</comments>
		<pubDate>Tue, 19 May 2009 20:19:53 +0000</pubDate>
		<dc:creator>Adrian Mouat</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[diffxml]]></category>

		<guid isPermaLink="false">http://www.adrianmouat.com/bit-bucket/?p=15</guid>
		<description><![CDATA[I&#8217;m the author of the diffxml tool for comparing XML documents. In this post I&#8217;d like to explain why you might want to use diffxml to compare XML documents rather than traditional text tools such as the UNIX diff command. There are two things that diffxml understands that diff doesn&#8217;t: the syntax of XML documents [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m the author of the <a href="http://diffxml.sf.net">diffxml</a> tool for comparing XML documents. In this post I&#8217;d like to explain why you might want to use diffxml to compare XML documents rather than traditional text tools such as the UNIX diff command.</p>
<p>There are two things that diffxml understands that diff doesn&#8217;t: the syntax of XML documents (e.g. &lt;br/&gt; is equivalent to &lt;br&gt;&lt;/br&gt;) and the hierarchical structure they represent.<span id="more-15"></span></p>
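<p>To make the syntax point concrete, here is a small Python sketch (not part of diffxml) that uses <code>xml.etree.ElementTree.canonicalize</code> from the standard library (Python 3.8 or later). Canonical XML writes every empty element as a start/end tag pair, so both spellings reduce to the same string, while a plain text comparison treats them as different.</p>

```python
import xml.etree.ElementTree as ET

# Canonical XML (C14N 2.0) serializes every empty element as a
# start/end tag pair, so both spellings reduce to the same string.
short_form = ET.canonicalize("<br/>")
long_form = ET.canonicalize("<br></br>")

print(short_form)               # <br></br>
print(short_form == long_form)  # True

# A plain string comparison, which is what a text tool sees, disagrees.
print("<br/>" == "<br></br>")   # False
```

<p>Canonicalization only removes syntactic noise; it does not by itself tell you what changed, which is the job of a differencing tool.</p>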
<p>The advantages of understanding XML syntax are pretty easy to explain. Consider these two XML documents:</p>
<blockquote><p><code>&lt;a<br />
&gt;text&lt;b/&gt;<br />
&lt;c&gt;&lt;/c&gt;<br />
&lt;/a&gt;</code></p></blockquote>
<p>and</p>
<blockquote><p><code>&lt;a&gt;text&lt;b&gt;&lt;/b&gt;<br />
&lt;d/&gt;<br />
&lt;/a&gt;</code></p></blockquote>
<p>If we compare these using diff, we get the following output:</p>
<blockquote><p><code>1,3c1,2<br />
&lt; &lt;a<br />
&lt;     &gt;text&lt;b/&gt;<br />
&lt; &lt;c&gt;&lt;/c&gt;<br />
---<br />
&gt; &lt;a&gt;text&lt;b&gt;&lt;/b&gt;<br />
&gt; &lt;d/&gt;</code></p></blockquote>
<p>This tells us that every line except the closing &lt;/a&gt; has changed, even though most of the differences are just alternative ways of writing the same markup. If we use diffxml to compare the documents instead, we get:</p>
<blockquote><p><code>&lt;?xml version="1.0" encoding="UTF-8" standalone="no"?&gt;<br />
&lt;delta&gt;<br />
&lt;insert charpos="2" childno="4" name="d" nodetype="1" parent="/node()[1]"/&gt;<br />
&lt;delete node="/node()[1]/node()[5]"/&gt;<br />
&lt;/delta&gt;</code></p></blockquote>
<p>This tells us that the only real differences between the documents are the insertion of the element &#8220;d&#8221; and the removal of the element &#8220;c&#8221;<sup><a href="http://www.adrianmouat.com/bit-bucket/2009/05/why-use-diffxml/#footnote_0_15" id="identifier_0_15" class="footnote-link footnote-identifier-link" title="Admittedly the output is a little hard for humans to read currently. There are a couple of things that can be done to improve this (use proper node names instead of using the node() axis and put a string in the nodetype attribute), but in the future I hope to provide some sort of graphical interface.">1</a></sup>.</p>
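<p>The contrast between the two views can be reproduced with a toy Python sketch. It is not diffxml&#8217;s algorithm: it runs the standard library&#8217;s <code>difflib.SequenceMatcher</code> once over the raw lines of the two example documents and once over the names of the root element&#8217;s children after parsing. The helper name <code>changes</code> is made up for this example, and the tree view only looks one level deep.</p>

```python
import difflib
import xml.etree.ElementTree as ET

# The two example documents from above.
DOC1 = "<a\n>text<b/>\n<c></c>\n</a>"
DOC2 = "<a>text<b></b>\n<d/>\n</a>"

def changes(old, new):
    """Return the non-equal opcodes difflib finds between two sequences."""
    matcher = difflib.SequenceMatcher(None, old, new)
    return [op for op in matcher.get_opcodes() if op[0] != "equal"]

# Line-based view: compare raw text line by line, as diff does.
line_changes = changes(DOC1.splitlines(), DOC2.splitlines())

# Tree-based view: compare the names of the root's child elements.
# Parsing hides the <a\n> vs <a> and <b/> vs <b></b> differences.
children1 = [child.tag for child in ET.fromstring(DOC1)]
children2 = [child.tag for child in ET.fromstring(DOC2)]
tree_changes = changes(children1, children2)

print(line_changes)          # [('replace', 0, 3, 0, 2)]
print(children1, children2)  # ['b', 'c'] ['b', 'd']
print(tree_changes)          # [('replace', 1, 2, 1, 2)]
```

<p>The line view, in 0-based half-open ranges, is the same &#8220;1,3c1,2&#8221; that diff reports, while the parsed view narrows the change down to &#8220;c&#8221; being replaced by &#8220;d&#8221;.</p>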
<p>The other major advantage of diffxml is that it understands the hierarchical, or &#8220;tree&#8221; structure of XML documents. It&#8217;s a little harder to explain what this means, but consider the following. The XML document:</p>
<blockquote><p><code>&lt;a&gt;&lt;b&gt;&lt;d/&gt;&lt;/b&gt;&lt;c&gt;&lt;e/&gt;&lt;/c&gt;&lt;/a&gt;</code></p></blockquote>
<p>Can be represented as:</p>
<p><img class="size-medium wp-image-33 alignnone" title="Tree representation of XML" src="http://www.adrianmouat.com/bit-bucket/wp-content/uploads/2009/05/tree1.png" alt="Tree representation of XML" width="79" height="102" /></p>
<p>And the XML document:</p>
<blockquote><p><code>&lt;a&gt;&lt;b/&gt;&lt;c&gt;&lt;d/&gt;&lt;e/&gt;&lt;/c&gt;&lt;/a&gt;</code></p></blockquote>
<p>Can be represented as:</p>
<p><img class="size-full wp-image-41 alignnone" title="Tree representation of XML document" src="http://www.adrianmouat.com/bit-bucket/wp-content/uploads/2009/05/tree2.png" alt="Tree representation of XML document" width="98" height="100" /></p>
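<p>If the diagrams are not visible, the same trees can be printed as indented text. This short Python sketch walks each parsed document depth first and emits one line per element; the function name <code>tree_lines</code> is hypothetical and only used for illustration.</p>

```python
import xml.etree.ElementTree as ET

def tree_lines(element, depth=0):
    """Return one indented line per element, each parent before its children."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(tree_lines(child, depth + 1))
    return lines

before = ET.fromstring("<a><b><d/></b><c><e/></c></a>")
after = ET.fromstring("<a><b/><c><d/><e/></c></a>")

print("\n".join(tree_lines(before)))
print("\n".join(tree_lines(after)))
```

<p>In the first tree &#8220;d&#8221; is indented under &#8220;b&#8221;; in the second it sits under &#8220;c&#8221; next to &#8220;e&#8221;.</p>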
<p>It&#8217;s clear from the diagram that the only change is that the element &#8220;d&#8221; has moved from element &#8220;b&#8221; to element &#8220;c&#8221;. There is no way that a line-based differencing utility could tell us this, but diffxml gives us:</p>
<blockquote><p><code>&lt;?xml version="1.0" encoding="UTF-8" standalone="no"?&gt;<br />
&lt;delta&gt;<br />
&lt;move childno="1" new_charpos="1" node="/node()[1]/node()[1]/node()[1]" old_charpos="1" parent="/node()[1]/node()[2]"/&gt;<br />
&lt;/delta&gt;</code></p></blockquote>
<p>Which correctly identifies that the only difference is the move of a single element to a new parent.</p>
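<p>As a rough illustration of why a tree view makes moves easy to spot, here is a toy Python sketch that records each element&#8217;s parent in both documents and reports the elements whose parent changed. It assumes every element name appears only once, which holds for this example but not for XML in general; diffxml has to match nodes without that shortcut, so this is not how diffxml itself works. The names <code>parent_map</code> and <code>moved_elements</code> are hypothetical.</p>

```python
import xml.etree.ElementTree as ET

def parent_map(xml_text):
    """Map each element name to its parent's name.

    Assumes element names are unique within the document.
    """
    root = ET.fromstring(xml_text)
    return {child.tag: parent.tag for parent in root.iter() for child in parent}

def moved_elements(old_xml, new_xml):
    """Return {name: (old_parent, new_parent)} for elements whose parent changed."""
    old, new = parent_map(old_xml), parent_map(new_xml)
    return {
        name: (old[name], new[name])
        for name in old.keys() & new.keys()
        if old[name] != new[name]
    }

print(moved_elements("<a><b><d/></b><c><e/></c></a>",
                     "<a><b/><c><d/><e/></c></a>"))
# {'d': ('b', 'c')}
```

<p>Both documents are a single line of text, so a line-based tool can only report that the whole line changed.</p>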
<p>I hope this makes it clear why tools such as diffxml, which understand the hierarchical nature of XML documents, are often a better choice than line-based equivalents for comparing XML documents.</p>
<ol class="footnotes"><li id="footnote_0_15" class="footnote">Admittedly the output is a little hard for humans to read currently. There are a couple of things that can be done to improve this (use proper node names instead of using the node() axis and put a string in the nodetype attribute), but in the future I hope to provide some sort of graphical interface.</li></ol>]]></content:encoded>
			<wfw:commentRss>http://www.adrianmouat.com/bit-bucket/2009/05/why-use-diffxml/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
	</channel>
</rss>

