(How To Do) XML Schema Validation
Judging by the popularity of this question on StackOverflow (and my answer), it seems that a lot of people struggle to check the validity of an XML file against an XML Schema. It’s a shame that what should be a trivial task has wasted hours of developers’ lives. In this article I’ll try to offer a few alternatives for various platforms and hopefully make things a bit simpler.
There are actually a few different options at your disposal (which is probably part of the problem). We’ll start with what is probably the fastest and easiest way to get started with validation: on-line validators, such as those offered by CoreFiling and FreeFormatter.
The CoreFiling offering only allows uploading of files, whereas FreeFormatter allows either pasting of documents or pointing to URLs (but not uploading files!). Both work fine and seem to use the Apache Xerces parser underneath the hood, which I’ll talk more about in the next section.
The problem with on-line validation is that it will quickly get frustrating if you are doing a lot of validating and need a quick turn-around time. There may also be privacy or file size reasons which mean you can’t use an on-line validator.
Command Line Validators
This is my normal choice. There are options for Mac, Windows and Linux. For Mac and Linux users, you likely already have xmllint installed, which can do validation. At a terminal type:
xmllint --noout --schema test.xsd test.xml
test.xsd is your schema file and test.xml is your XML file. You need the --noout flag or the XML file will be printed to standard out along with the validation results. If you don’t have xmllint installed, you can use your package manager to install libxml2 or grab it from www.xmlsoft.org.
The major issue with xmllint is that it doesn’t support the whole of the XML Schema standard (which is large and complex). For this reason you may want to double check with another validator. That said, I find xmllint’s error messages and ease of use make it very useful when rapidly iterating during development of XML/XSD files.
Xerces is the Daddy of XML Validators. It exists in both C++ and Java versions (I believe a customised version of Xerces is bundled in the Sun JDK). However, there isn’t a simple way to immediately run the Xerces validator from the command line. For that reason, I’ve cobbled together a Java program to solve this issue.
Either clone the git repository at https://github.com/amouat/xsd-validator.git or download the zip and unpack it. Then from a shell prompt:
./xsdv.sh test.xsd test.xml
There is also a cmd file that you can use to run xsdv from a Windows Command Prompt. More information can be found on the GitHub project page. Note that the project actually runs the XML parser bundled with Java, which is a customised version of Xerces if you are using the Oracle JRE.
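If you’d rather not pull in another project, the parser bundled with Java can also be driven directly through the standard javax.xml.validation API. The following is only a minimal sketch of that idea, not the xsdv source; the class name XsdCheck and the validate helper are names I’ve made up for illustration.

```java
import java.io.File;
import java.io.IOException;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical helper class (not part of xsdv): validates a document
// against an XSD using whatever parser ships with the running JDK.
class XsdCheck {

    /** Returns null if the document is valid, otherwise the first error message. */
    static String validate(Source xsd, Source xml) throws IOException {
        SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        try {
            // Compiling the schema fails with a SAXException if the XSD itself is broken.
            Schema schema = factory.newSchema(xsd);
            Validator validator = schema.newValidator();
            // With no ErrorHandler set, the first validation error is thrown.
            validator.validate(xml);
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: java XsdCheck schema.xsd document.xml");
            return;
        }
        String problem = validate(new StreamSource(new File(args[0])),
                                  new StreamSource(new File(args[1])));
        System.out.println(problem == null
                ? args[1] + " validates"
                : args[1] + " fails to validate: " + problem);
    }
}
```

Saved as XsdCheck.java, this should compile with javac XsdCheck.java and run as java XsdCheck test.xsd test.xml. It stops at the first error, which is fine for a quick yes/no check.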
If you’d like to use the C++ version of Xerces as opposed to the Java version, take a look at this wrapper program by Jean-Marc Vanel or the StdInParse utility that ships with Xerces C++ as an example.
I was hoping to find a command line utility for running Microsoft’s MSXML validator, but I failed to find any projects I liked. There are however several editors that use MSXML, as covered in the next section.
XML Editors Supporting Validation
The final choice is to use an editor with XSD support. I don’t have any experience of this, but if you’re doing a lot of XML editing and validation it may well be worth looking into one of the following:
- XML Notepad 2007 – Free editor from Microsoft. Uses MSXML to do validation. Windows only.
- Notepad++ – Free editor with validation plug-in available. This StackOverflow answer has more details. Windows only.
- Stylus Studio – Commercial XML editor. Claims to support all available validators. Windows only.
- OxygenXML – Commercial editor that uses both Xerces and Saxon-EE for validation. Windows/Mac/Linux.
- XMLSpy – Commercial XML editor that uses Altova’s RaptorXML processor. Windows only.
Use Multiple Validators!
Please be aware that due to the complexity of the Schema specification, there are differences between the validators; a document that is valid under one implementation may not be valid under another. For this reason you may want to mandate a single validator project- or organisation-wide, and you should consider testing with multiple implementations.
Also, the validators vary in quality of error messages; if you get stuck it is often worth running against a different validator to see if the error message is more helpful.
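When the messages themselves are the sticking point, it can also help to collect every problem a validator reports, with line and column numbers, rather than stopping at the first one. Here is a rough sketch using the standard javax.xml.validation API and a SAX ErrorHandler; the class name XsdErrorReport, the collectErrors helper and the inline sample schema are all made up for illustration, and the wording of the messages will depend on the parser bundled with your JDK.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper class: gathers all validation problems instead of
// failing on the first one.
class XsdErrorReport {

    static List<String> collectErrors(Source xsd, Source xml)
            throws SAXException, IOException {
        List<String> problems = new ArrayList<>();
        Validator validator = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(xsd)
                .newValidator();
        validator.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) {
                problems.add(describe("warning", e));
            }
            public void error(SAXParseException e) {
                // Recording instead of throwing lets validation carry on.
                problems.add(describe("error", e));
            }
            public void fatalError(SAXParseException e) throws SAXException {
                // Malformed XML: the parser cannot sensibly continue.
                throw e;
            }
        });
        validator.validate(xml);
        return problems;
    }

    static String describe(String level, SAXParseException e) {
        return level + " at line " + e.getLineNumber()
                + ", column " + e.getColumnNumber() + ": " + e.getMessage();
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample schema: a point with two integer children.
        String xsd = "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
                + "<xs:element name='point'><xs:complexType><xs:sequence>"
                + "<xs:element name='x' type='xs:int'/>"
                + "<xs:element name='y' type='xs:int'/>"
                + "</xs:sequence></xs:complexType></xs:element></xs:schema>";
        String xml = "<point><x>one</x><y>two</y></point>";
        for (String problem : collectErrors(
                new StreamSource(new StringReader(xsd)),
                new StreamSource(new StringReader(xml)))) {
            System.out.println(problem);
        }
    }
}
```

Running the same document through this and through xmllint is a cheap way to compare how two implementations describe the same mistake.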