Saturday, August 18, 2012

Forcing escaping of HTML characters (less-than, ampersand) in JSON using Jackson

1. The problem

Jackson handles escaping of JSON String values in minimal way using escaping where absolutely necessary: it escapes two characters by default -- double quotes and backslash -- as well as non-visible control characters. But it does not escape other characters, since this is not required for producing valid JSON documents.

There are systems, however, that may run into problems with some characters that are valid in JSON documents. There are also use cases where you might prefer to add more escaping. For example, if you are to enclose a JSON fragment in XML attribute (or Javascript code), you might want to use apostrophe (') as quote character in XML, and force escaping of all apostrophes in JSON content; this allows you to simple embed encoded JSON value without other transformations.

Another specific use case is that of escaping "HTML funny characters", like less-than, greater-than, ampersand and apostrophe characters (double-quote are escaped by default).

Let's see how you can do that with Jackson.

2. Not as easy to change as you might think

Your first thought may be that of "I'll just do it myself". The problem is two-fold:

  1. When using API via data-binding, or regular Streaming generator, you must pass unescaped String, and it will get escaped using Jackson's escaping mechanism -- you can not pre-process it (*)
  2. If you decide to post-process content after JSON gets written, you need to be careful with replacements, and this will have negative impact on performance (i.e. it is likely to double time serialization takes)

(*) actually, there is method 'JsonGenerator.writeRaw(...)' which you can use to force exact details, but its use is cumbersome and you can easily break things if you are not careful. Plus it is only applicable via Streaming API

3. Jackson (1.8) has you covered

Luckily, there is no need for you to write custom post-processing code to change details of content escaping.

Version 1.8 of Jackson added a feature to let users customize details of escaping of characters in JSON String values.
This is done by defining a CharacterEscapes object to be used by JsonGenerator; it is registered on JsonFactory. If you use data-binding, you can set this by using ObjectMapper.getJsonFactory() first, then define CharacterEscapes to use.

Functionality is handled at low-level, during writing of JSON String values; and CharacterEscapes abstract class is designed in a way to minimize performance overhead.
While there is some performance overhead (little bit of additional processing is required), it should not have significant impact unless significant portion of content requires escaping.
As usual, if you care a lot about performance, you may want to measure impact of the change with test data.

4. The Code

Here is a way to force escaping of HTML "funny characters", using functionality Jackson 1.8 (and above) have.


import org.codehaus.jackson.SerializableString;
import org.codehaus.jackson.io.CharacterEscapes;

// First, definition of what to escape public class HTMLCharacterEscapes extends CharacterEscapes { private final int[] asciiEscapes; public HTMLCharacterEscapes() {
// start with set of characters known to require escaping (double-quote, backslash etc) int[] esc = CharacterEscapes.standardAsciiEscapesForJSON();
// and force escaping of a few others: esc['<'] = CharacterEscapes.ESCAPE_STANDARD; esc['>'] = CharacterEscapes.ESCAPE_STANDARD; esc['&'] = CharacterEscapes.ESCAPE_STANDARD; esc['\''] = CharacterEscapes.ESCAPE_STANDARD; asciiEscapes = esc; }
// this method gets called for character codes 0 - 127 @Override public int[] getEscapeCodesForAscii() { return asciiEscapes; }
// and this for others; we don't need anything special here @Override public SerializableString getEscapeSequence(int ch) { // no further escaping (beyond ASCII chars) needed: return null; } }

// and then an example of how to apply it
public ObjectMapper getEscapingMapper() {
ObjectMapper mapper = new ObjectMapper();
mapper.getJsonFactory().setCharacterEscapes(new HTMLCharacterEscapes());
return mapper;
}

// so we could do:
public byte[] serializeWithEscapes(Object ob) throws IOException
{
return getEscapingMapper().writeValueAsBytes(ob);
}


And that's it.

blog comments powered by Disqus

Sponsored By


Related Blogs

(by Author (topics))

Powered By

About me

  • I am known as Cowtowncoder
  • Contact me at@yahoo.com
Check my profile to learn more.