Chip Killmar

Software Development

Pretty-Print XML from a DOM

March 25th, 2009 · 4 Comments · Java

If you’re like me, you often find it useful to convert a Java DOM into a formatted XML string. Pretty-printing a DOM Element can augment the information gleaned from your debugger when tracking down a bug, for example.

I’ll show you three ways to pretty-print XML from a Java DOM below. All of them format a Document, but each can easily be modified to serialize any Node. (Note: if you’re interested, you can download source code for this posting here.)

Method 1: TrAX

The JAXP Transformation API for XML (TrAX) is a logical first choice for pretty-printing XML.

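import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;

static String prettyPrintWithTrAX(Document document) throws TransformerException {
    // Pretty-prints a DOM document to XML using TrAX.
    // Note that a stylesheet is needed to make formatting reliable.
    TransformerFactory transformerFactory = TransformerFactory.newInstance();
    // The string is treated as a system ID (URI) for the stylesheet in Code Listing 2.
    Transformer transformer = transformerFactory.newTransformer(new StreamSource("pretty-print.xsl"));
    StringWriter stringWriter = new StringWriter();
    StreamResult streamResult = new StreamResult(stringWriter);
    DOMSource domSource = new DOMSource(document);
    transformer.transform(domSource, streamResult);
    return stringWriter.toString();
}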
Code Listing 1: Using TrAX to pretty-print XML.
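
To illustrate the point that these methods can serialize any Node, here is a minimal sketch that passes an Element rather than a Document to DOMSource. It uses the plain identity Transformer instead of the stylesheet, because the sample DOM is built in memory and so contains no whitespace nodes. The class name NodePrinter, the helper names, and the sample element names are hypothetical and are not part of the downloadable source.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper class, for illustration only.
final class NodePrinter {

    // Serializes any DOM Node (not just a Document) with the identity transformer.
    static String printNode(Node node) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        // A fragment usually should not carry its own XML declaration.
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter stringWriter = new StringWriter();
        transformer.transform(new DOMSource(node), new StreamResult(stringWriter));
        return stringWriter.toString();
    }

    // Builds <order><item><name>widget</name></item></order> in memory
    // and returns the inner <item> element.
    static Element buildSampleItem() throws Exception {
        Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element order = document.createElement("order");
        Element item = document.createElement("item");
        Element name = document.createElement("name");
        name.setTextContent("widget");
        item.appendChild(name);
        order.appendChild(item);
        document.appendChild(order);
        return item;
    }

    public static void main(String[] args) throws Exception {
        // Prints only the <item> subtree; the enclosing <order> is not serialized.
        System.out.println(printNode(buildSampleItem()));
    }
}
```

The exact indentation you get depends on the JDK and the transformer implementation in use, so treat the output layout as implementation-specific.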

 

Using the TrAX API is straightforward (Code Listing 1), but dealing with whitespace in a DOM is rather tricky. If your DOM was parsed from XML containing whitespace, that whitespace is preserved in the object model by default. Furthermore, the default TrAX indentation engine (Xalan-Java) won’t reformat existing whitespace.
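
To see the preserved whitespace for yourself, here is a small sketch that parses the same content twice, once indented and once compact, and counts the whitespace-only text nodes under the root. The class name WhitespaceNodes and its helper methods are hypothetical, and the parser is a default, non-validating DocumentBuilder.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
final class WhitespaceNodes {

    // Parses an XML string with the default (non-validating) parser.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Counts the direct children of the root that are whitespace-only text nodes.
    static int countWhitespaceChildren(Document document) {
        NodeList children = document.getDocumentElement().getChildNodes();
        int count = 0;
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE && child.getNodeValue().trim().isEmpty()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        Document indented = parse("<a>\n  <b/>\n</a>");
        Document compact = parse("<a><b/></a>");
        System.out.println(countWhitespaceChildren(indented)); // 2: the text before and after <b/>
        System.out.println(countWhitespaceChildren(compact));  // 0
    }
}
```

Those extra text nodes are what the serializer leaves untouched when it indents the output.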

Therefore, to reliably pretty-print XML with TrAX you’ll need to supply an XSLT stylesheet (Code Listing 2) that removes whitespace from the DOM during transformation. This is a subtle point about serialization using TrAX that the Xalan website doesn’t mention, and it is easily conflated with a bug found in JDK 1.5. You can read more about this phenomenon here and here.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xalan="http://xml.apache.org/xslt" version="1.0">
    <xsl:output method="xml" encoding="UTF-8" indent="yes" xalan:indent-amount="4"/>
    <!--Important!  Remove existing whitespace in DOM elements.-->
    <xsl:strip-space elements="*"/>
    <!--Identity transformation (see http://www.w3.org/TR/xslt#copying).-->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>
Code Listing 2: An XSLT stylesheet for pretty-printing.
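
If shipping a separate stylesheet file is inconvenient, one alternative is to remove the whitespace-only text nodes from the DOM yourself and then use the identity Transformer with output properties. This is a different technique from the stylesheet approach above, not a restatement of it, and it is only a sketch: the class name StripThenIndent and its method names are hypothetical, and the indent-amount property is Xalan-specific, so other transformer implementations may ignore it. Unlike xsl:strip-space, this version modifies the Document you pass in.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
final class StripThenIndent {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Removes whitespace-only text nodes in place and returns how many were removed.
    // Caution: this also drops whitespace that may be significant in mixed content.
    static int stripWhitespaceNodes(Document document) throws Exception {
        NodeList blanks = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//text()[normalize-space(.) = '']", document, XPathConstants.NODESET);
        for (int i = 0; i < blanks.getLength(); i++) {
            Node blank = blanks.item(i);
            blank.getParentNode().removeChild(blank);
        }
        return blanks.getLength();
    }

    // Strips whitespace nodes, then serializes with indentation switched on.
    static String prettyPrint(Document document) throws Exception {
        stripWhitespaceNodes(document);
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        // Xalan-specific property, mirroring xalan:indent-amount in Code Listing 2.
        transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
        StringWriter stringWriter = new StringWriter();
        transformer.transform(new DOMSource(document), new StreamResult(stringWriter));
        return stringWriter.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(prettyPrint(parse("<a>\n\n   <b>x</b>   </a>")));
    }
}
```

If you need the original DOM left intact, clone the Document first or stay with the stylesheet, which strips whitespace only in the transformation output.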

 

Method 2: DOM Level 3 Load and Save

DOM Level 3 Load and Save (LS) is the newer API for serializing, and pretty-printing, XML. Support for LS is included with JAXP 1.3, which is bundled with Sun’s JDK 1.5, but unfortunately pretty-print formatting only works in JDK 1.6 or later.

The code to use LS’s LSSerializer is more involved:

import java.io.StringWriter;
import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;

static String prettyPrintWithDOM3LS(Document document) {
    // Pretty-prints a DOM document to XML using DOM Load and Save's LSSerializer.
    // Note that the "format-pretty-print" DOM configuration parameter can only be set in JDK 1.6+.
    DOMImplementation domImplementation = document.getImplementation();
    if (domImplementation.hasFeature("LS", "3.0") && domImplementation.hasFeature("Core", "2.0")) {
        DOMImplementationLS domImplementationLS = (DOMImplementationLS) domImplementation.getFeature("LS", "3.0");
        LSSerializer lsSerializer = domImplementationLS.createLSSerializer();
        DOMConfiguration domConfiguration = lsSerializer.getDomConfig();
        if (domConfiguration.canSetParameter("format-pretty-print", Boolean.TRUE)) {
            domConfiguration.setParameter("format-pretty-print", Boolean.TRUE);
            LSOutput lsOutput = domImplementationLS.createLSOutput();
            // Controls the encoding named in the XML declaration.
            lsOutput.setEncoding("UTF-8");
            StringWriter stringWriter = new StringWriter();
            lsOutput.setCharacterStream(stringWriter);
            lsSerializer.write(document, lsOutput);
            return stringWriter.toString();
        } else {
            throw new RuntimeException("DOMConfiguration 'format-pretty-print' parameter isn't settable.");
        }
    } else {
        throw new RuntimeException("DOM 3.0 LS and/or DOM 2.0 Core not supported.");
    }
}
Code Listing 3: Pretty-printing XML with DOM Level 3 Load and Save.
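
Much of the length of the listing above comes from the LSOutput setup. LSSerializer also offers writeToString, which is shorter, but the DOM LS specification says its result is always encoded as UTF-16, so any XML declaration it emits will name that encoding rather than UTF-8. The sketch below, which assumes JDK 1.6 or later, shows the shorter form together with the standard "xml-declaration" parameter for leaving the declaration out. The class name LsWriteToString, its method names, and the sample element names are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

// Hypothetical helper class, for illustration only.
final class LsWriteToString {

    // Pretty-prints with LSSerializer.writeToString, optionally without the XML declaration.
    static String prettyPrint(Document document, boolean xmlDeclaration) {
        Object feature = document.getImplementation().getFeature("LS", "3.0");
        if (feature == null) {
            throw new IllegalStateException("DOM 3.0 LS not supported.");
        }
        LSSerializer serializer = ((DOMImplementationLS) feature).createLSSerializer();
        serializer.getDomConfig().setParameter("format-pretty-print", Boolean.TRUE);
        serializer.getDomConfig().setParameter("xml-declaration", Boolean.valueOf(xmlDeclaration));
        return serializer.writeToString(document);
    }

    // Builds <order><item>widget</item></order> in memory.
    static Document buildSample() throws Exception {
        Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element order = document.createElement("order");
        Element item = document.createElement("item");
        item.setTextContent("widget");
        order.appendChild(item);
        document.appendChild(order);
        return document;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(prettyPrint(buildSample(), false));
    }
}
```

If you need a UTF-8 declaration in the output, the LSOutput route in Code Listing 3 is the one to use.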

 

Method 3: XMLSerializer

If you’re still stuck using JDK 1.5, don’t worry. You can configure Xerces-J’s XMLSerializer directly to pretty-print XML. Although this is the easiest method, it depends on an internal class in the Xerces parser and is not JAXP compliant! You can read more in Question 11 of the JAXP FAQ.

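// Assumes Xerces-J is on the classpath; adjust the package if you use a different copy of these classes.
import java.io.IOException;
import java.io.StringWriter;
import org.apache.xml.serialize.Method;
import org.apache.xml.serialize.OutputFormat;
import org.apache.xml.serialize.XMLSerializer;
import org.w3c.dom.Document;

static String prettyPrintWithXMLSerializer(Document document) throws IOException {
    // All is not lost if you're still on JDK 1.5: just use XMLSerializer with the appropriate OutputFormat.
    // The following will pretty-print the DOM document to XML.
    StringWriter stringWriter = new StringWriter();
    // The third OutputFormat argument (true) turns indentation on.
    XMLSerializer serializer = new XMLSerializer(stringWriter, new OutputFormat(Method.XML, "UTF-8", true));
    serializer.serialize(document);
    return stringWriter.toString();
}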
Code Listing 4: Pretty-printing XML using XMLSerializer.

 

4 responses so far ↓

  • 1 Juan García // Nov 22, 2009 at 2:27 am

    http://xerces.apache.org/xerces2-j/javadocs/other/org/apache/xml/serialize/XMLSerializer.html is now effectively deprecated.

    And JDK 1.5 already reached End of Service Life:
    http://java.sun.com/products/archive/eol.policy.html
    so DOM Level 3 Load and Save API can be used without any problem.

    Please, update your post this info is too valuable to stay outdated.

  • 2 Chip Killmar // Nov 30, 2009 at 4:05 pm

    Thanks for your comment, Juan.

    In my post I link to the documentation for XMLSerializer which clearly states that it has been deprecated since Xerces 2.9.0 (released in 2006) which is some time ago.

    You’re right that JDK 1.5 has been EOL’ed, but if you’re like me you know that *many* businesses are still using it and that upgrading to 1.6 isn’t an option for them anytime soon. Method 3 is still relevant if you’re in this camp.

    Although it’s deprecated, XMLSerializer works without any problems that I’ve observed.

  • 3 Thomas U. // Jan 16, 2014 at 4:21 am

    Thanks a lot, this works in Java 1.6!

  • 4 Ludovic Kuty // May 13, 2014 at 12:34 pm

    Fantastic post, thanks
