The Java SE 1.4 platform included the "Crimson" reference implementation for JAXP 1.1. The Java SE 5 platform includes a reference implementation for JAXP 1.3 based on the Apache "Xerces" library.
Because these implementations come from entirely different codebases, and because the JAXP standard has evolved from 1.1 to 1.3, there are some subtle differences between the implementations, even though they both conform to the JAXP standard. These two factors combine to create the compatibility issues described in this guide.
However, while XML applications written for 1.4 do suffer some incompatibilities, JAXP 1.3 in the Java SE 5 platform provides some compelling advantages:
- Validation of XML documents against schemas, using the javax.xml.validation package.
- XML Schema datatypes in javax.xml.datatype, including Gregorian Calendar dates and times that previously had no natural analogues in the Java platform.
- Namespace support in javax.xml.namespace.
- Support for XML 1.1 and Namespaces in XML 1.1, which provide better support for Unicode characters in tag names and namespaces.
- DOM Level 3 Core features, including:
  - Type information (TypeInfo).
  - Error handling and identification of an error's location.
  - Bootstrapping of a DOMImplementation.
  - UserDataHandler, which gets called whenever a node is cloned, removed, or renamed.
- XPath APIs in javax.xml.xpath, which provide a more Java-friendly way to use an XPath expression. The API was designed to be applicable to any data model that implements the interface. You can use it now to process the reference implementation DOM. In the future, it is likely to be available when processing a JDOM or StAX data model, as well.

That's the good news. The bad news is that some compatibility issues have survived all attempts at eradication. The remainder of this document discusses those issues.
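As a minimal sketch of the javax.xml.validation API mentioned in the list above, the following class compiles a small W3C XML Schema and validates two documents against it. The class name, the schema, and the `age` element are hypothetical and chosen only for illustration; the calls themselves are the standard SchemaFactory, Schema, and Validator APIs.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class SchemaValidationSketch {

    // Hypothetical schema: a single <age> element whose content must be an xs:int.
    private static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='age' type='xs:int'/>"
        + "</xs:schema>";

    /** Returns true if xml is valid against XSD, false if the validator reports an error. */
    public static boolean isValid(String xml) throws IOException, SAXException {
        SchemaFactory factory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        Validator validator = schema.newValidator();
        try {
            // With no ErrorHandler set, the Validator throws on the first error.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid("<age>42</age>"));        // a valid xs:int
        System.out.println(isValid("<age>forty-two</age>")); // not an xs:int
    }
}
```

In a real application you would typically compile the Schema once and reuse it, since Schema objects are thread-safe; Validator instances are not, so each thread should create its own.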
While the reference implementation in Java SE 1.4 supported the DOM Level 2 API, the implementation in Java SE 5 supports the DOM Level 3 family of APIs. This section covers the impact of those changes on programs that used the JAXP 1.1 reference implementation:
For more information, see the complete list of changes in the DOM Level 3 Changes appendix.
In DOM Level 3, additional methods were defined in the following interfaces:
The added methods only affect applications that implement the interfaces directly, and only then when the application is recompiled. Applications that use the factory methods to obtain implementation classes for these interfaces will have no problems.
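The following sketch, which assumes Java 8 or later for the lambda, shows an application that obtains its DOM through DocumentBuilderFactory rather than implementing the org.w3c.dom interfaces itself, and can therefore call DOM Level 3 methods such as Node.getTextContent() and Node.setUserData() directly. The class name and sample document are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.UserDataHandler;
import org.xml.sax.InputSource;

public class DomLevel3Sketch {

    static Element parseRoot(String xml) throws Exception {
        // The DOM implementation comes from the JAXP factory, so the
        // application never implements the org.w3c.dom interfaces itself.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement();
    }

    /** DOM Level 3: getTextContent() concatenates all descendant text. */
    public static String textOf(String xml) throws Exception {
        return parseRoot(xml).getTextContent();
    }

    /** DOM Level 3: returns true if a UserDataHandler is notified when the node is cloned. */
    public static boolean handlerSeesClone(String xml) throws Exception {
        Element root = parseRoot(xml);
        final boolean[] cloned = {false};
        UserDataHandler handler = (operation, key, data, src, dst) -> {
            if (operation == UserDataHandler.NODE_CLONED) {
                cloned[0] = true;
            }
        };
        root.setUserData("origin", "parsed", handler);
        root.cloneNode(true);
        return cloned[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<greeting>Hello, <b>DOM</b> 3</greeting>";
        System.out.println(textOf(xml));
        System.out.println(handlerSeesClone(xml));
    }
}
```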
These changes affect an application that reads XML data into a DOM, makes modifications, and then writes it out in a way that preserves the original formatting.
In JAXP 1.1, extraneous whitespace was automatically removed on input, and a single property (ignoringLexicalInfo) was set to false to preserve lexical information such as entity nodes and CDATA nodes. Including the additional nodes made the DOM somewhat more complex to process, but because they were there, adding whitespace on output (indentation and newlines) produced a highly readable, formatted version of the XML data that closely approximated the input.
In JAXP 1.3, the application uses four DocumentBuilderFactory methods to determine how much lexical (formatting) information is available to process:

- setCoalescing(): converts CDATA nodes to Text nodes and appends them to an adjacent Text node (if any).
- setExpandEntityReferences(): expands entity reference nodes.
- setIgnoringComments(): ignores comments.
- setIgnoringElementContentWhitespace(): ignores whitespace that is not a significant part of element content. Because the parser must know the element's content model to recognize such whitespace, this setting only takes effect when the parser is validating.

The coalescing, ignoringComments, and ignoringElementContentWhitespace properties default to false, while expandEntityReferences defaults to true. To preserve all the lexical information necessary to reconstruct the incoming document in its original form, leave the first three at false and set expandEntityReferences to false. Setting all four to true lets you construct the simplest possible DOM, so the application can focus on the data's semantic content without having to worry about lexical syntax details.
Note: When adding new nodes, the application must add any indentation and newline formatting that is needed for readability, since it is not provided automatically.
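A minimal sketch of the two ends of that spectrum follows. The class name and sample document are hypothetical. The preserving configuration turns entity expansion off explicitly, because DocumentBuilderFactory expands entity references unless told otherwise; the sample contains no entity references, so that setting does not change the output here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class LexicalSettingsSketch {

    static final String XML = "<doc><!-- note --><![CDATA[a<b]]>text</doc>";

    /** Keeps comments and CDATA sections, and leaves entity references unexpanded. */
    static DocumentBuilderFactory preservingFactory() {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setExpandEntityReferences(false); // this one defaults to true
        return f;
    }

    /** Produces the simplest DOM: no comments, CDATA merged into adjacent text. */
    static DocumentBuilderFactory simplestFactory() {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setCoalescing(true);
        f.setExpandEntityReferences(true);
        f.setIgnoringComments(true);
        f.setIgnoringElementContentWhitespace(true); // only takes effect when validating
        return f;
    }

    /** Parses XML and lists the node names of the root element's children. */
    static String childNodeNames(DocumentBuilderFactory f) throws Exception {
        Document doc = f.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
        StringBuilder sb = new StringBuilder();
        for (Node n = doc.getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
            sb.append(sb.length() == 0 ? "" : " ").append(n.getNodeName());
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(childNodeNames(preservingFactory())); // #comment #cdata-section #text
        System.out.println(childNodeNames(simplestFactory()));   // #text
    }
}
```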
Following are the changes made between SAX 2.0.0 and SAX 2.0.2 that might affect compatibility:
DeclHandler.externalEntityDecl now requires the parser to return the absolute system identifier, for consistency with DTDHandler.unparsedEntityDecl. This may cause some incompatibilities.

In SAX 2.0.1, an application can set ErrorHandler, EntityResolver, ContentHandler, or DTDHandler to null. This is a relaxation of the previous restriction in SAX 2.0, which generated a NullPointerException (NPE) in such circumstances.
So the following code is legal in JAXP 1.3:
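    import javax.xml.parsers.SAXParser;
    import javax.xml.parsers.SAXParserFactory;
    import org.xml.sax.XMLReader;

    SAXParserFactory spf = SAXParserFactory.newInstance();
    SAXParser sp = spf.newSAXParser();
    XMLReader reader = sp.getXMLReader();
    // Each of these calls threw a NullPointerException under SAX 2.0.
    reader.setErrorHandler(null);
    reader.setContentHandler(null);
    reader.setEntityResolver(null);
    reader.setDTDHandler(null);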
The resolveEntity() method in the DefaultHandler class now declares IOException, as well as SAXException, matching the signature of the EntityResolver interface that it implements. (Before, DefaultHandler.resolveEntity() declared only SAXException.)

The vast majority of applications are unaffected by this change, because very few applications use DefaultHandler in such a way that they will run into a problem.
The only way an application can be affected is if it overrides the resolveEntity() method and also invokes super.resolveEntity(). In that case, the application won't compile in Java SE 5 until the method is modified to handle an IOException that super.resolveEntity() could throw.
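For illustration, here is a sketch of the affected pattern: a DefaultHandler subclass that overrides resolveEntity(), resolves one entity itself, and delegates everything else to super.resolveEntity(). The `urn:example:greeting` system identifier and the class name are hypothetical. Because the superclass method declares IOException, the override must declare it (as here) or catch it.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class ResolveEntitySketch extends DefaultHandler {

    // Hypothetical system identifier that this handler resolves in memory.
    static final String LOCAL_ID = "urn:example:greeting";

    private final StringBuilder text = new StringBuilder();

    @Override
    public InputSource resolveEntity(String publicId, String systemId)
            throws IOException, SAXException { // IOException must be declared or caught
        if (LOCAL_ID.equals(systemId)) {
            return new InputSource(new StringReader("hello"));
        }
        // Delegating to DefaultHandler is what exposes the IOException.
        return super.resolveEntity(publicId, systemId);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    /** Parses xml and returns all character data reported to this handler. */
    public static String parse(String xml) throws Exception {
        ResolveEntitySketch handler = new ResolveEntitySketch();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.text.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE doc [<!ENTITY g SYSTEM '" + LOCAL_ID + "'>]><doc>&g;</doc>";
        System.out.println(parse(xml));
    }
}
```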
The following new features are recognized:

- http://xml.org/sax/features/external-general-entities: to include external general entities.
- http://xml.org/sax/features/external-parameter-entities: to include external parameter entities and the external DTD subset.

The following new property is also recognized:

- http://xml.org/sax/properties/xml-string: to get the string of characters associated with the current event.

For a complete list of Xerces features and properties, see http://xml.apache.org/xerces2-j/features.html and http://xml.apache.org/xerces2-j/properties.html.
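A minimal sketch of switching the two entity features off through the standard XMLReader.setFeature() call, as an application that does not want to load external entities might do. The class name is hypothetical; the feature URIs are the ones listed above.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

public class SaxFeaturesSketch {

    static final String GENERAL_ENTITIES =
        "http://xml.org/sax/features/external-general-entities";
    static final String PARAMETER_ENTITIES =
        "http://xml.org/sax/features/external-parameter-entities";

    /** Creates a reader that does not include external general or parameter entities. */
    public static XMLReader newReader() throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        // setFeature throws SAXNotRecognizedException or SAXNotSupportedException
        // if the parser cannot honor the request.
        reader.setFeature(GENERAL_ENTITIES, false);
        reader.setFeature(PARAMETER_ENTITIES, false);
        return reader;
    }

    public static void main(String[] args) throws Exception {
        XMLReader reader = newReader();
        System.out.println(reader.getFeature(GENERAL_ENTITIES));
        System.out.println(reader.getFeature(PARAMETER_ENTITIES));
    }
}
```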
Note: One point of compatibility is also worth mentioning. Namespace recognition was turned off by default in Java SE 1.4 (JAXP 1.1). For backward compatibility, that policy is continued in Java SE 5 (JAXP 1.3). However, namespace recognition is turned on by default in the official SAX implementation at https://sourceforge.net/projects/sax/. While not strictly a compatibility issue from the standpoint of JAXP, it is an issue that sometimes comes as a surprise.
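Because the JAXP default differs from the standalone SAX distribution, code that relies on namespace information should request it explicitly. The sketch below (class name and sample document hypothetical) parses the same element with and without SAXParserFactory.setNamespaceAware(true) and reports what startElement() receives. Without namespace processing the namespace URI is reported as an empty string; with it, the parser reports `urn:example` and the local name `root`.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class NamespaceAwareSketch {

    static final String XML = "<p:root xmlns:p='urn:example'/>";

    /** Returns "uri|localName|qName" as reported for the (only) root element. */
    public static String rootNames(boolean namespaceAware) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(namespaceAware); // the JAXP default is false
        final StringBuilder result = new StringBuilder();
        factory.newSAXParser().parse(new InputSource(new StringReader(XML)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                result.append(uri).append('|').append(localName).append('|').append(qName);
            }
        });
        return result.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(rootNames(false));
        System.out.println(rootNames(true));
    }
}
```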
Code that uses the standard JAXP APIs to create and access an XSL transformer does not need to be changed. The output will be the same, but will in general be produced much faster, since the XSLTC compiling transformer will be used by default, instead of the interpreting Xalan transformer.
Note: There is no significant difference between Xalan and XSLTC performance for a single run on a small data set, as when you are developing and testing an XSL stylesheet. But there is a major performance benefit when using XSLTC on anything larger.
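The standard JAXP calls for running a stylesheet are unchanged. One way to get the most out of a compiling processor like XSLTC across many inputs is to compile the stylesheet once into a Templates object and create a Transformer from it for each run, as in this sketch. The stylesheet and class name are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class TemplatesSketch {

    // Hypothetical stylesheet used only for illustration.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>Hello, <xsl:value-of select='/name'/>!</xsl:template>"
        + "</xsl:stylesheet>";

    // Compile the stylesheet once; Templates objects are thread-safe and reusable.
    static final Templates TEMPLATES = compile(XSL);

    static Templates compile(String xsl) {
        try {
            return TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(xsl)));
        } catch (TransformerConfigurationException e) {
            throw new IllegalStateException("stylesheet did not compile", e);
        }
    }

    /** Runs the precompiled stylesheet; each call gets its own (non-thread-safe) Transformer. */
    public static String transform(String xml) throws TransformerException {
        StringWriter out = new StringWriter();
        TEMPLATES.newTransformer().transform(
            new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        for (String name : new String[] {"World", "XSLTC"}) {
            System.out.println(transform("<name>" + name + "</name>"));
        }
    }
}
```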
JAXP 1.3 provides the standard XPath API for evaluating XPath expressions, and we encourage users to use this API. Xalan-interpretive is not included in the reference implementation. If an application explicitly uses the Xalan XPath API to evaluate a standalone XPath expression (one that is not part of an XSLT stylesheet), you'll need to download and install the Apache libraries for Xalan and put them on the classpath.
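A minimal sketch of the standard javax.xml.xpath API evaluating expressions against a DOM built with DocumentBuilderFactory. The sample document and class name are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathSketch {

    // Hypothetical sample data.
    static final String XML =
        "<books><book year='2004'>JAXP</book><book year='1999'>XSLT</book></books>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    /** evaluate() without a return type yields the string value of the result. */
    public static String firstTitle(Document doc) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate("/books/book[1]", doc);
    }

    /** Asking for NODESET returns every matching node as a NodeList. */
    public static int countAfter2000(Document doc) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList books = (NodeList) xpath.evaluate(
            "/books/book[@year > 2000]", doc, XPathConstants.NODESET);
        return books.getLength();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        System.out.println(firstTitle(doc));
        System.out.println(countAfter2000(doc));
    }
}
```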
In JAXP 1.3, the Apache libraries used by the reference implementation have new package names. This change does not affect applications that confine themselves to using the standard JAXP APIs. But applications that access implementation-specific features of the XML processors defined in previous JAXP versions will have to be modified to take into account the package names that changed in JAXP 1.3.
The change has several effects on previous applications:
In Java SE 1.4, the fact that JAXP was built into the Java platform was a mixed blessing. On the one hand, an application could rely on the fact that it was there. On the other hand, most applications needed features and bug fixes that were available in later versions.
But adding newer libraries to the classpath had no effect, because classes built into the platform always take precedence over the classpath. The solution for that problem in 1.4 was to use the endorsed standards mechanism. However, that was a new mechanism, and one that frequently placed an additional burden on the end user, as well as on the application developer.
The solution in the JAXP 1.3 reference implementation is to change the package names of the Apache libraries used in the implementation. That change lets you reference newer Apache libraries in the classpath, so application developers can use them in the same way that they would use any other additions to the Java platform.
The new names given to the Apache packages in the JAXP 1.3 reference implementation are shown below:
| Component | JAXP 1.1 | JAXP 1.3 |
|---|---|---|
| JAXP | org.apache.crimson | -/- (no equivalent) |
| | org.apache.xml | com.sun.org.apache.xml.internal |
| XSLT | org.apache.xalan | com.sun.org.apache.xalan.internal |
Applications that specify system properties on the command line with -D, in the JRE's lib/jaxp.properties file, or by hard-coding them into the application generally do so to access functionality that is not present in the standard APIs.
JAXP 1.3 contains many new additions. When upgrading such applications, it is advisable to look for standard APIs in the javax.xml.* packages that do the same job, because that's the best way to avoid having to change the application again in the future. If absolutely necessary (either because of functionality restrictions or lack of time to investigate the new APIs), the property values can be changed by converting old-format package names into the format:

    org.apache.somePackage → com.sun.org.apache.somePackage.internal
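The rule can be applied mechanically. The helper below is a hypothetical sketch that assumes the rule means prefixing `com.sun.` and inserting `.internal` after the top-level Apache project name. For example, a property set with `-Djavax.xml.parsers.DocumentBuilderFactory=org.apache.xerces.jaxp.DocumentBuilderFactoryImpl` would become `com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl`. The helper only rewrites names; it cannot tell whether the renamed class actually ships with the JRE (Crimson, for instance, has no counterpart).

```java
public class InternalPackageNames {

    private static final String APACHE = "org.apache.";

    /**
     * Hypothetical helper: rewrites org.apache.PROJECT[.rest] as
     * com.sun.org.apache.PROJECT.internal[.rest]. Other names are returned unchanged.
     */
    public static String toInternal(String name) {
        if (!name.startsWith(APACHE)) {
            return name;
        }
        String rest = name.substring(APACHE.length());            // e.g. "xerces.jaxp.DocumentBuilderFactoryImpl"
        int dot = rest.indexOf('.');
        String project = dot < 0 ? rest : rest.substring(0, dot); // e.g. "xerces"
        String tail = dot < 0 ? "" : rest.substring(dot);         // e.g. ".jaxp.DocumentBuilderFactoryImpl"
        return "com.sun." + APACHE + project + ".internal" + tail;
    }

    public static void main(String[] args) {
        System.out.println(toInternal("org.apache.xerces.jaxp.DocumentBuilderFactoryImpl"));
        System.out.println(toInternal("org.apache.xml"));
    }
}
```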
Similarly, internal implementation classes all use the new package names. If your application is using implementation classes (it shouldn't!) those package names will have to change, as well.
While XML does not allow recursive entity definitions, it does permit nested entity definitions, which creates the potential for denial-of-service attacks on a server that accepts XML data from external sources. For example, a SOAP document like the following, which has very deeply nested entity definitions, can consume 100% of CPU time and large amounts of memory in entity expansions:
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE foobar [
      <!ENTITY x100 "foobar">
      <!ENTITY x99 "&x100;&x100;">
      <!ENTITY x98 "&x99;&x99;">
      ...
      <!ENTITY x2 "&x3;&x3;">
      <!ENTITY x1 "&x2;&x2;">
    ]>
    <SOAP-ENV:Envelope xmlns:SOAP-ENV=...>
      <SOAP-ENV:Body>
        <ns1:aaa xmlns:ns1="urn:aaa" SOAP-ENV:encodingStyle="...">
          <foobar xsi:type="xsd:string">&x1;</foobar>
        </ns1:aaa>
      </SOAP-ENV:Body>
    </SOAP-ENV:Envelope>
A system that doesn't take in external XML data need not be concerned with the issue, but one that does can utilize one of the following safeguards to prevent the problem:
New system property to limit entity expansion: The entityExpansionLimit system property lets existing applications constrain the total number of entity expansions without recompiling the code. The parser throws a fatal error once it has reached the entity expansion limit. (By default, the limit is set to 64000.)
To set the entity expansion limit using the system property, use an option like the following on the java command line: -DentityExpansionLimit=100000
New parser feature to disallow DTDs: The application can also set the http://apache.org/xml/features/disallow-doctype-decl parser feature to true. A fatal error is then thrown if the incoming XML document contains a DOCTYPE declaration. (The default value of this feature is false.) This feature is typically useful for SOAP-based applications, where a SOAP message must not contain a Document Type Declaration.
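A minimal sketch of setting that feature through DocumentBuilderFactory.setFeature(), which was added in JAXP 1.3. The class name and sample documents are hypothetical; the parser accepts a plain document and rejects one that carries a DOCTYPE declaration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class DisallowDoctypeSketch {

    static final String DISALLOW_DOCTYPE =
        "http://apache.org/xml/features/disallow-doctype-decl";

    /** Returns true if xml parses, false if the parser rejects it. */
    public static boolean accepts(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // setFeature throws ParserConfigurationException if the feature is unsupported.
        factory.setFeature(DISALLOW_DOCTYPE, true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // DefaultHandler.fatalError() rethrows, so the failure surfaces as a SAXException.
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(accepts("<a/>"));
        System.out.println(accepts("<!DOCTYPE a><a/>"));
    }
}
```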
New feature for Secure Processing: JAXP 1.3 includes a new secure processing feature (XMLConstants.FEATURE_SECURE_PROCESSING) with which an application can configure a SAXParserFactory or DocumentBuilderFactory to get an XML processor that behaves in a secure fashion. Setting this feature to true sets the entity expansion limit to 64000. Note that this limit can be increased using the entityExpansionLimit system property.
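The sketch below enables XMLConstants.FEATURE_SECURE_PROCESSING on a DocumentBuilderFactory and feeds it documents built like the SOAP example above: each entity references the next one twice, so a single reference to x1 in a chain of depth n triggers 2^n - 1 expansions. The class name, the helper, and the depths used are hypothetical. With depth 20 (about a million expansions) the parser should stop with a fatal error once the limit is reached, while depth 3 (7 expansions) parses normally.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SecureProcessingSketch {

    /** Hypothetical helper: a chain x1..xN where each entity references the next one twice. */
    static String nestedEntities(int depth) {
        StringBuilder sb = new StringBuilder("<!DOCTYPE foobar [");
        sb.append("<!ENTITY x").append(depth).append(" 'foobar'>");
        for (int i = depth - 1; i >= 1; i--) {
            sb.append("<!ENTITY x").append(i)
              .append(" '&x").append(i + 1).append(";&x").append(i + 1).append(";'>");
        }
        return sb.append("]><foobar>&x1;</foobar>").toString();
    }

    /** Returns true if xml parses with secure processing enabled, false if it is rejected. */
    public static boolean accepts(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setErrorHandler(new DefaultHandler()); // fatalError() rethrows
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // e.g. the entity expansion limit was exceeded
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(accepts(nestedEntities(3)));  // 7 expansions
        System.out.println(accepts(nestedEntities(20))); // about a million expansions
    }
}
```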