Saturday, October 22, 2011

On prioritizing my Open Source projects, retrospect #3

(note: continuing story, see the previous installment)

1. What was the plan again?

Ok, it has been almost 8 months since the previous prioritization overview (the plan was to check after 4, but time flies when you are having fun!)
High-level priority list back then had these entries:

  1. Aalto 1.0 (complete async API, impl)
  2. ClassMate 1.0
  3. Java CacheMate, ideally 1.0
  4. Tr13 1.0
  5. Externalized Mr Bean (depending on interest)
  6. Jackson 1.8
  7. Jackson-xml-databinding 1.0
  8. Work on Smile format

2. And how have we done?

This time the hit rate was even a bit lower (than the previous one, at 50%), although there was some progress. In fact, had I checked things after 4 months, only one entry would have been completed (Jackson 1.8).

Item by item, we have:

  1. Aalto: modest progress (did write a blog entry on how to use async parsing at least); still need async SAX implementation, no 1.0 (although 0.9.7 was released right after blog entry)
  2. ClassMate: minor fixes, but no 1.0 yet
  3. CacheMate: significant progress (secondary indexes); I now have 1.0 design (for "raw" in-memory), but not yet implemented -- so kind of half-done
  4. Tr13: no progress
  5. Externalized mr Bean: no demand, no progress
  6. Jackson: 1.8 released (and even more, see below)
  7. Deferred: Externalized Mr Bean -- no work done (only some preliminary scoping)
  8. Jackson-xml-databinding: bug fixes, but no 1.0
  9. Smile format: actual progress -- Pierre from Ning implemented libsmile (C), contributed Smile-detection for unix/linux 'file' command

So it's mostly modest progress and misses this time; plan was not really aligned with what was needed. Only 3 entries had significant progress.

What went wrong? Partially it's just that huge popularity of Jackson swept away many of the plans; and conversely, lack of interest in many of the entries held them back.
But additionally, many other things got implemented. So let's look at that aspect next.

3. What was done instead?

Here are things I can remember, in loose work order:

  • LZF compression ("Ning LZF") -- much progress, quite close to 1.0
  • Jackson modules, such as Afterburner and improvements to already existing ones (scala, hibernate) -- although not yet for CSV or Joda modules (which exist in skeletal form)
  • JVM-compressor-benchmark for comparing space/time efficiency of various compressors on JVM, core done (can always add codecs)
  • Low-gc-membuffers, an experimental FIFO for byte[], with native memory buffers
  • Java merge sort (file-backed configurable efficient merge sort) -- mostly done, although not declared 1.0
  • Lzf4Hadoop, Hadoop integration for LZF compression -- basically done
  • New mode for JVM-serializers benchmark, data streams, for more balanced evaluations; implemented most common codecs
  • Jackson 1.9

Quite a list, eh? One completely new "branch" of development was related to the LZF compression codec. And continued huge demand for all things Jackson also meant that the majority of my time was spent on Jackson and its extensions.

4. Updated list

Given recent developments, popular demand, and on-going plans, here is my current thinking of main priorities:

  • Jackson CSV module: I want to add proper Jackson support for CSV, since it is still a very common (and pretty functional!) input data format, and de facto default export format for lots of data sources. And best of all, this can be done without any work on Jackson core
  • CacheMate: I really want to implement secondary caches, and have a reasonable design (in many ways similar to persistence used by Cassandra/BigTable/HBase) on how to go about it
  • Jackson 2.0: move to github, refactor, redesign, remove deprecated things -- major renovation, to lay foundation for longer term 2.x development
  • ClassMate: getting to official 1.0 would be good, as well as writing blog entry or two on actual usage
  • Jackson XML data binding: fix bugs, declare 1.0, easier to market that way. And of course document
  • Ning-compress (LZF) 1.0: already functional, and feature-wise as good as 1.0, but there are a couple of optimization tricks (by Mr Dain S, who ported Snappy to Java) that I'd still like to investigate before declaring things 1.0

Other interesting things that might get included are:

  • Aalto 1.0: it would be good to sort of declare it done by implementing Async SAX, announcing the first non-beta release
  • Externalized mr Bean (BeanMate?) still looks like a potentially useful thing that others would want to use (this above and beyond basic refactoring that Jackson 2.0 would dictate, i.e. splitting of the jar as first-level new module)
  • Standardization work for Smile?
  • Maybe even design a splittable variant of LZF (Splitty? Splitz?) -- with improved usage of length indicators (VInts), designed so implementation can be even faster than LZF (on par with Snappy java), yet allow splittability which would be very valuable for Map/Reduce tasks

I expect above list to of course have at most 50% success rate, and for other good stuff to be worked on instead. Especially with likely changes to my daytime job, with possibly changing roles at day-to-day work, changes that will likely boost priority of some other open source efforts, reduce that of others.

Tuesday, October 11, 2011

Jackson 1.9 new feature overview

Jackson 1.9 was just released. As usual, it can be downloaded from the Download page, and detailed release information can be found on the 1.9 release page.

Let's have a look into contents of this release.

1. Overview

One of focus areas on this release was once again to tackle oldest significant issues and improvement ideas; and two of major new features are long-standing issues (ability to inline/unwrap JSON values; unify annotation handling for getters/setters/fields). Another big goal was to improve ergonomics: to simplify configuration, shorten commonly used usage patterns and so on. And finally there was also intent to try to "2.0 proof" things, by trying to figure out things that need to be deprecated to allow removal of obsolete methods as well as indicate cases where improved functionality is available.

2. Major features

(note: classification of features into major, medium and minor categories is not exact science, and different users might consider different things more important than others -- here we simply use categorization that the release page uses)

Major features included in 1.9 are:

  • Allow inlining/unwrapping of child objects using @JsonUnwrapped
  • Rewrite property introspection part of framework to combine getter/setter/field annotations
  • Allow injection of values during deserialization
  • Support for 'external type id' by adding @JsonTypeInfo.As.EXTERNAL_PROPERTY
  • Allow registering instantiators (ValueInstantiator) for types

2.1 @JsonUnwrapped

Ability to map JSON like

  {
    "name" : "home",
    "latitude" : 127,
    "longitude" : 345
  }

to classes defined as:

  class Place {
    public String name;

    @JsonUnwrapped
    public Location location;
  }

  class Location { public int latitude, longitude; }

has been on many users' wish list for a while now; and with addition of @JsonUnwrapped (used as shown above) this simple structural transformation can now be achieved without custom handling.

2.2 "Unified" properties, merging ("sharing") of annotations of getters/setters/fields

Another long-standing issue has been that of isolation between annotations used by getters, setters and fields. Basically an annotation added to a getter was only ever used for serialization, and would never have any effect on deserialization; similarly, an annotation on a setter never affected serialization. While this is not a problem for many annotation use cases, it would make the following use case work quite differently from what users intuitively expect:

  class Point {
    private int w;

    @JsonProperty("width")
    public int getW() { return w; }

    public void setW(int w) { this.w = w; } // before 1.9, had to be separately renamed
  }

which would actually lead to there being two separate properties: "width" that is written out during serialization; and "w" that is expected to be received when deserializing. Many users would intuitively expect annotation to be "shared" between two parts of logically related accessors. Same issue also affects annotations like @JsonIgnore and @JsonTypeInfo, requiring use of seemingly redundant annotations.

Jackson 1.9 solves this by adding new internal representation of logical property, and merging resulting annotations using expected priorities (meaning that annotations on a getter have precedence over setter when serializing, and vice versa).
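To make the merging rule a bit more concrete, here is a minimal plain-Java sketch of the idea. It is not how Jackson implements things internally: @Named is a hypothetical stand-in for @JsonProperty (so the sketch needs no Jackson jars), and SketchPoint, MergeSketch and resolveName() are made-up names for illustration only.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Hypothetical stand-in for @JsonProperty, so that no Jackson jars are needed
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Named {
    String value();
}

class SketchPoint {
    private int w;

    @Named("width")
    public int getW() { return w; }

    public void setW(int w) { this.w = w; } // no annotation of its own
}

final class MergeSketch {
    private MergeSketch() { }

    /**
     * Resolves external name of a logical property: the accessor used for the
     * operation is checked first (getter when serializing, setter when
     * deserializing), then the other accessor, and finally the implicit name.
     */
    static String resolveName(Class<?> type, String property, boolean forSerialization) {
        String suffix = Character.toUpperCase(property.charAt(0)) + property.substring(1);
        Method getter = find(type, "get" + suffix, 0);
        Method setter = find(type, "set" + suffix, 1);
        Method first = forSerialization ? getter : setter;
        Method second = forSerialization ? setter : getter;
        for (Method m : new Method[] { first, second }) {
            if (m != null && m.isAnnotationPresent(Named.class)) {
                return m.getAnnotation(Named.class).value();
            }
        }
        return property;
    }

    // Finds a public method by name and argument count; null if there is none
    private static Method find(Class<?> type, String name, int argCount) {
        for (Method m : type.getMethods()) {
            if (m.getName().equals(name) && m.getParameterTypes().length == argCount) {
                return m;
            }
        }
        return null;
    }
}
```

With this, the name given on the getter is used in both directions, which is the "shared" behavior users tend to expect.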

There are also other more subtle changes, related to these changes. For example, class like:

  class ValueBean {
    private int value;

    @JsonProperty
    public int getValue() { return value; }
  }

can now be deserialized successfully, even without field "value" being visible or annotated: since it is joined with getter ("getValue()"), and getter is explicitly annotated, field is included as the accessor to use for assigning value for the property.

The last important benefit of this feature is that now handling of Jackson and JAXB annotations is much more similar, which should make JAXB annotations work better as a result (code was simplified significantly) -- this because JAXB had always considered annotations to be shared in this way.

2.3 Value Injection for Deserialization

Value injection here means ability to insert ("inject") values into POJOs outside of general data binding: that is, values that do not come from JSON input. Instead, values to inject are specified during configuration of ObjectMapper or ObjectReader used for data binding.

Why is this needed? Some Java types require additional context information to be able to construct POJO instances, for example. And in other cases, you may want to pre-populate values of some fields; and while there are other mechanisms (for example, you can pass an existing POJO instance for "updateValue()" method) they are quite limited.

Only two things are needed for value injection:

  1. Means to indicate properties for which values are to be injected, and
  2. Definition of values to inject

Default mechanism is to handle first part by using new annotation, @JacksonInject, so that we could have:

  public class InjectableBean
  {
    @JacksonInject("seq") private int sequenceNumber;
    public String name;
  }

and second part is handled by allowing configuration of ObjectMapper or ObjectReader instance with InjectableValues, an object that can find values to inject given value id. Value ids can be specified as either Strings, or as Classes; if Class is used, Class.getName() is used to get actual String id to use. For above POJO, we could handle deserialization as follows:

  ObjectMapper mapper = new ObjectMapper();
  Integer sequenceNumber = SequenceGenerator.next(); // or whatever
  InjectableValues inject = new InjectableValues.Std()
    .addValue("seq", sequenceNumber); // id must match the one given to @JacksonInject
  final String json = "{\"name\":\"Lucifer\"}";
  InjectableBean value = mapper.reader(InjectableBean.class).withInjectableValues(inject).readValue(json);

For more on this feature, check out FasterXML Wiki's entry on Value Injection.
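The way value ids get resolved (String used as is, Class converted with Class.getName()) can be illustrated with a small plain-Java sketch. Note that InjectablesSketch and its find() method are hypothetical names made up for this illustration; this is only a rough approximation of the lookup, not the actual InjectableValues.Std code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in that mimics id resolution; not the Jackson class itself
final class InjectablesSketch {
    private final Map<String, Object> values = new HashMap<String, Object>();

    InjectablesSketch addValue(String id, Object value) {
        values.put(id, value);
        return this;
    }

    // Class ids are stored under Class.getName(), so both id styles share one map
    InjectablesSketch addValue(Class<?> classId, Object value) {
        return addValue(classId.getName(), value);
    }

    /** Looks up value to inject; fails if nothing was registered for the id. */
    Object find(Object id) {
        String key = (id instanceof Class<?>) ? ((Class<?>) id).getName() : String.valueOf(id);
        if (!values.containsKey(key)) {
            throw new IllegalArgumentException("No injectable value with id '" + key + "'");
        }
        return values.get(key);
    }
}
```

One consequence of this design is that a value registered with a Class id can also be found with the equivalent String id (the fully-qualified class name).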

2.4 External Type Id

Jackson has had support for full polymorphic type handling since 1.5, allowing configuration of both type identifier in use (usually either a class name, or logical type name) and type inclusion mechanism (as property, as wrapper array, as single-element wrapper object).
This covers a wide range of usage scenarios, but there is one inclusion mechanism that is sometimes used but could not be supported by Jackson: that of using "external type identifier". This style of type inclusion is used by some data formats, most notably GeoJSON.

By external type identifier we mean case such as this:

 {
  "type" : "rectangle",
  "shape" :  {
   "width": 20.0,
   "height" : 40.0
  }
 }

where type is included as a property ("type") that is outside of JSON Object being typed.

With 1.9 we can support such use case by using @JsonTypeInfo with a new inclusion value:

  public class ShapeContainer
  {
    @JsonTypeInfo(use=Id.NAME, include=As.EXTERNAL_PROPERTY, property="type")
    public Shape shape;
  }

  static class Shape { }

  @JsonTypeName("rectangle") // or rely on class name, Rectangle
  static class Rectangle extends Shape { public double width, height; }

One thing to note here is that this inclusion mechanism should only be used with properties; annotating classes with @JsonTypeInfo that indicates external type identifiers can cause conflicts.
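For those who want to see what "external" means in practice, here is a rough plain-Java sketch that resolves the type of "shape" from its sibling "type" property, using plain Maps in place of parsed JSON. ExternalTypeIdSketch and readShape() are hypothetical names; actual handling within Jackson is of course more general than this hard-coded version.

```java
import java.util.Map;

// Hypothetical illustration of external type ids; not how Jackson does it internally
final class ExternalTypeIdSketch {
    private ExternalTypeIdSketch() { }

    static class Shape { }
    static class Rectangle extends Shape { double width, height; }

    /** Builds the "shape" value of a container, using sibling "type" property as type id. */
    static Shape readShape(Map<String, Object> container) {
        // Type id lives in the container, outside of the JSON Object being typed
        String typeId = (String) container.get("type");
        @SuppressWarnings("unchecked")
        Map<String, Object> props = (Map<String, Object>) container.get("shape");
        if ("rectangle".equals(typeId)) {
            Rectangle r = new Rectangle();
            r.width = ((Number) props.get("width")).doubleValue();
            r.height = ((Number) props.get("height")).doubleValue();
            return r;
        }
        throw new IllegalArgumentException("Unknown type id: " + typeId);
    }
}
```

The point is simply that the typed value itself carries no type information; the container has to supply it.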

2.5 Value instantiators

And last but not least, 1.9 also allows much more control over mechanism used to create actual POJO value instances. While Jackson 1.2 added support for @JsonCreator annotation, there has not been a way to add custom creator objects.

With 1.9, we get following pieces:

  • ValueInstantiator (abstract class), extended by objects used to create value instances
  • ValueInstantiators (interface), provider for per-type ValueInstantiator instances (as well as ValueInstantiators.Base abstract class for actual implementations)
  • Module.SetupContext method addValueInstantiators(); as well as SimpleModule method addValueInstantiator(), for adding provider(s), so modules can easily provide instantiators for types they support
  • @JsonValueInstantiator annotation that can be used as an alternative to specify instantiator used for annotated type.

Above pieces are basically enough to support all three modes of construction @JsonCreator allows (so basically @JsonCreator could be implemented as module, if we wanted!):

  1. "Default" construction that takes no arguments and uses no-argument constructor or factory method
  2. "Delegate-based" construction, in which JSON value is first bound to an intermediate type (such as java.util.Map or Jackson JsonNode), and this instance is passed to single-argument creator method
  3. "Property-based" construction, in which one or more named values (JSON properties) are bound to specified types that match creator arguments, and these are passed to creator method.

Mapping of above construction methods to ValueInstantiator methods is fairly straight-forward:

  1. Simple no-arguments construction (ValueInstantiator.createUsingDefault()): used if the other construction mechanisms are not available: consumes no JSON properties.
  2. Delegate-based construction (ValueInstantiator.createUsingDelegate(Object)): similar to annotating a single-argument constructor or factory method with @JsonCreator, but NOT specifying argument name with @JsonProperty. If specified (i.e. value instantiator indicates it supports this), JSON value for property is first bound into intermediate (delegate) type, and then this value is passed to delegate creator method. Jackson mapper will handle all the details of initial binding, passing delegate object as the argument.
  3. Property-based construction (ValueInstantiator.createFromObjectWith(Object[] args)): similar to using @JsonCreator with arguments that all have @JsonProperty annotation to specify JSON property name to bind.

It is worth noting that order in which availability of different modes is checked is reverse of above: first a check is made to see if property-based method is available; if not, then delegate-based, and finally default construction.
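The checking order can be sketched in a few lines of plain Java. CreatorSketch below is a hypothetical, much reduced stand-in for ValueInstantiator (only canCreateUsingDefault() is shown in this post; the other two method names are assumptions that follow the same naming pattern), and CreatorModeSketch is a made-up helper.

```java
// Hypothetical, reduced stand-in for ValueInstantiator: availability checks only
interface CreatorSketch {
    boolean canCreateFromObjectWith(); // property-based
    boolean canCreateUsingDelegate();  // delegate-based
    boolean canCreateUsingDefault();   // no-arguments
}

final class CreatorModeSketch {
    private CreatorModeSketch() { }

    enum Mode { PROPERTY_BASED, DELEGATE_BASED, DEFAULT }

    /** Picks construction mode in the order described: properties, delegate, default. */
    static Mode choose(CreatorSketch inst) {
        if (inst.canCreateFromObjectWith()) {
            return Mode.PROPERTY_BASED;
        }
        if (inst.canCreateUsingDelegate()) {
            return Mode.DELEGATE_BASED;
        }
        if (inst.canCreateUsingDefault()) {
            return Mode.DEFAULT;
        }
        throw new IllegalStateException("No construction mode available");
    }

    /** Convenience factory for trying out different combinations. */
    static CreatorSketch of(final boolean props, final boolean delegate, final boolean dflt) {
        return new CreatorSketch() {
            public boolean canCreateFromObjectWith() { return props; }
            public boolean canCreateUsingDelegate() { return delegate; }
            public boolean canCreateUsingDefault() { return dflt; }
        };
    }
}
```

So an instantiator that claims to support everything ends up being used in property-based mode, and default construction is only the fallback.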

Since this is possibly the most complicated new feature, I will need to defer a full example to another blog post. But let's consider a very simple ValueInstantiator implementation that just supports the default (no-argument) instantiation:

  class SimpleInstantiator extends ValueInstantiator
  {
    @Override public String getValueTypeDesc() { // only needed for error messages
      return MyType.class.getName();
    }

    @Override // yes, this creation method is available
    public boolean canCreateUsingDefault() { return true; }

    @Override
    public MyType createUsingDefault() {
      return new MyType(true);
    }
  }

and similarly you can add support for delegate- or property-based methods.

3. Other notable features

Aside from above-mentioned major features, there are many other useful improvements:

  • "mini-core" jar (jackson-mini-1.9.0.jar)
  • DeserializationConfig.Feature.UNWRAP_ROOT_VALUE
  • @JsonView for JAX-RS methods to return a specific JsonView
  • Terse(r) Visibility: ObjectMapper.setVisibility(), VisibilityChecker.with(Visibility)
  • Add standard naming-strategy implementation(s)
  • Add JsonTypeInfo.defaultSubType property to indicate type to use if class id/name missing
  • Add SimpleFilterProvider.setFailOnUnknownId() to disable throwing exception on missing filter id

"Mini core": as name suggests, there is now a new jar (jackson-mini-1.9.0.jar) that is about 40% smaller than the default one -- about 136kB or so. Size reduction is achieved by leaving out text files (LICENSE), as well as annotations, but otherwise functionality is equivalent to standard core package, i.e. supports streaming API (JsonParser/JsonGenerator, JsonFactory).

DeserializationConfig.Feature.UNWRAP_ROOT_VALUE is counterpart to SerializationConfig.Feature.WRAP_ROOT_VALUE; and there is also now a new annotation -- @JsonRootName -- that can be used to use custom wrapper name instead of the simple class name. This is useful with interoperability, as some frameworks insist on adding such wrappers.

One of few improvements to JAX-RS provider is that now you can add @JsonView annotation to JAX-RS resource methods, and if one is found, it will be set as the active Serialization View during serialization of the result value.

One nice ergonomic improvement is the ability to use much more compact configuration methods for changing default introspection visibility levels.
For example, you can use:

  objectMapper.setVisibility(JsonMethod.FIELD, JsonAutoDetect.Visibility.ANY);

to make all fields auto-detectable, regardless of their visibility. Or, to prevent all auto-detection, you could use:

  objectMapper.setVisibilityChecker(objectMapper.getVisibilityChecker()
    .with(JsonAutoDetect.Visibility.NONE));

An improvement to naming strategy support is inclusion of one "standard" naming strategy -- CAMEL_CASE_TO_LOWER_CASE_WITH_UNDERSCORES -- which converts between standard Java Bean names (that setters and getters use), and C-style names (like used by Twitter). You can enable this converter by:

  mapper.setPropertyNamingStrategy(PropertyNamingStrategy.CAMEL_CASE_TO_LOWER_CASE_WITH_UNDERSCORES);

and from there on, can consume JSON like:

 { "first_name" : "Joe" }

to bind to class like:

  public class Name { public String firstName; }

without having to use @JsonProperty to fix name mismatch.
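The kind of name conversion involved is roughly as follows. This is just a simplified plain-Java sketch, not the code Jackson uses: SnakeCaseSketch and its method are hypothetical names, and the sketch does nothing clever with runs of consecutive capital letters, which a real implementation may well treat differently.

```java
// Illustrative sketch only: a hypothetical stand-alone helper, not Jackson's own code
final class SnakeCaseSketch {
    private SnakeCaseSketch() { }

    /** Converts a Java Bean style name ("firstName") to C style ("first_name"). */
    static String toLowerCaseWithUnderscores(String beanName) {
        StringBuilder sb = new StringBuilder(beanName.length() + 4);
        for (int i = 0; i < beanName.length(); i++) {
            char c = beanName.charAt(i);
            if (Character.isUpperCase(c)) {
                // Upper-case letter starts a new word, except at the very beginning
                if (i > 0) {
                    sb.append('_');
                }
                sb.append(Character.toLowerCase(c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

For the Name class above, "firstName" maps to "first_name", which is what lets the JSON property match without annotations.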

As to sub-typing, you can now use new @JsonTypeInfo property defaultImpl (the "default sub-type" setting from the list above) to indicate, as name suggests, default implementation type to use in case where type name was missing or could not be resolved; use it like:

  @JsonTypeInfo(use=Id.NAME, include=As.PROPERTY, defaultImpl=GenericImpl.class)
  public abstract class BaseType { }

And finally, one improvement to Json Filter functionality is ability to specify that it is ok to use a filter id that does not refer to an actual filter (i.e. can not be resolved by the currently configured filter provider) -- use 'SimpleFilterProvider.setFailOnUnknownId(false)' to make this the default behavior. Missing filter is then assumed to mean "no filtering", that is, serialization is handled as if no filter was specified.
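The behavior is easy to picture with a tiny plain-Java sketch. FilterProviderSketch is a hypothetical stand-in (filters are just plain Objects here), meant only to show the strict-versus-lenient lookup, and is not the actual SimpleFilterProvider code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a filter provider; filters are plain Objects here
final class FilterProviderSketch {
    private final Map<String, Object> filtersById = new HashMap<String, Object>();
    private boolean failOnUnknownId = true; // strict by default

    FilterProviderSketch addFilter(String id, Object filter) {
        filtersById.put(id, filter);
        return this;
    }

    FilterProviderSketch setFailOnUnknownId(boolean state) {
        failOnUnknownId = state;
        return this;
    }

    /** Returns filter for the id; or null (meaning "no filtering") when lenient. */
    Object findFilter(String id) {
        Object filter = filtersById.get(id);
        if (filter == null && failOnUnknownId) {
            throw new IllegalArgumentException("No filter configured with id '" + id + "'");
        }
        return filter;
    }
}
```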



About me

  • I am known as Cowtowncoder