How to create an empty DOCTYPE using W3C DOM in Java?

I am trying to read an XML document and output it into a new XML document using the W3C DOM API in Java. To handle DOCTYPEs, I am using the following code (from an input Document doc to a target File target):

TransformerFactory transfac = TransformerFactory.newInstance();
Transformer trans = transfac.newTransformer();
trans.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no"); // "no" keeps the '<?xml version="1.0"?>' declaration
trans.setOutputProperty(OutputKeys.INDENT, "yes");

// if a doctype was set, it needs to persist
if (doc.getDoctype() != null) {
    DocumentType doctype = doc.getDoctype();
    trans.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, doctype.getSystemId());
    trans.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC, doctype.getPublicId());
}

FileWriter sw = new FileWriter(target);
StreamResult result = new StreamResult(sw);
DOMSource source = new DOMSource(doc);
trans.transform(source, result);

This works fine for XML documents both with and without DOCTYPEs. However, I now get a NullPointerException when trying to transform the following input XML document:

<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE permissions >
<permissions>
  // ...
</permissions>

HTML 5 uses a similar syntax for its DOCTYPE, and it is valid. But I have no idea how to handle this with the W3C DOM API: this DOCTYPE has neither a system nor a public identifier, and setting DOCTYPE_SYSTEM to null throws an exception. Can I still use the W3C DOM API to output an empty doctype?
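To see where the null comes from, the following minimal sketch parses a document with the same empty DOCTYPE and inspects its identifiers. The class name DoctypeProbe and the helper doctypeOf are hypothetical names introduced only for this illustration, and the behaviour is assumed to be that of the parser bundled with the JDK; other DOM implementations may differ.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.xml.sax.InputSource;

public class DoctypeProbe {

    // Hypothetical helper: parse an XML string and return its DocumentType node.
    static DocumentType doctypeOf(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDoctype();
        } catch (Exception e) {
            // Wrapped so callers of this sketch need no checked-exception handling.
            throw new IllegalStateException("could not parse input", e);
        }
    }

    public static void main(String[] args) {
        DocumentType doctype = doctypeOf("<!DOCTYPE permissions ><permissions/>");
        // The doctype node exists, so a "doc.getDoctype() != null" check passes,
        // but the identifiers later handed to setOutputProperty may still be null.
        System.out.println("name:     " + doctype.getName());
        System.out.println("systemId: " + doctype.getSystemId());
        System.out.println("publicId: " + doctype.getPublicId());
    }
}
```

If both identifiers come back as null, the null check on the doctype alone is not enough: each identifier would need its own check before being passed to setOutputProperty.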

asked Oct 15 '25 07:10 by jevon


1 Answer

Although this question is two years old, it is a top result in web searches, so a pointer may save others some time. See the question "Set HTML5 doctype with XSLT", which refers to http://www.w3.org/html/wg/drafts/html/master/syntax.html#doctype-legacy-string, which says:

For the purposes of HTML generators that cannot output HTML markup with the short DOCTYPE "<!DOCTYPE html>", a DOCTYPE legacy string may be inserted into the DOCTYPE [...]

In other words, <!DOCTYPE html SYSTEM "about:legacy-compat"> or <!DOCTYPE html SYSTEM 'about:legacy-compat'>, case-insensitively except for the part in single or double quotes.

This leads to a line of Java code like this:

trans.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, "about:legacy-compat");
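Put together with the code from the question, this could look roughly like the sketch below. It is only one possible arrangement, not a definitive fix: the class name DoctypeSketch and the helpers parse and serialize are hypothetical, the output goes to a string instead of a file to keep the example self-contained, and using "about:legacy-compat" as the fallback for a non-HTML root element such as permissions is an assumption. The legacy string is defined for HTML, and an XML parser that resolves external DTDs might try to fetch it.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.xml.sax.InputSource;

public class DoctypeSketch {

    // Hypothetical helper: parse an XML string into a DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse input", e);
        }
    }

    // Hypothetical helper: serialize the document and carry over its doctype.
    static String serialize(Document doc) {
        try {
            Transformer trans = TransformerFactory.newInstance().newTransformer();
            trans.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
            trans.setOutputProperty(OutputKeys.INDENT, "yes");

            DocumentType doctype = doc.getDoctype();
            if (doctype != null) {
                String systemId = doctype.getSystemId();
                String publicId = doctype.getPublicId();
                // Output property values must not be null, so an empty doctype
                // falls back to the legacy string (an assumption of this sketch).
                trans.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM,
                        systemId != null ? systemId : "about:legacy-compat");
                if (publicId != null) {
                    trans.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC, publicId);
                }
            }

            StringWriter out = new StringWriter();
            trans.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("could not serialize document", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<!DOCTYPE permissions ><permissions/>");
        System.out.println(serialize(doc));
    }
}
```

Note that this does not produce a truly empty DOCTYPE; it substitutes a system identifier so that the transformer has something to write.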

answered Oct 17 '25 21:10 by Johannes


