Friday, October 30, 2009

Good news is news too: thoughts on and by 3 quality nonline magazines

If I have said it once, I hope I have said it a thousand times. This is a fact: Americans are pampered by the big, affordable selection of quality magazines. Or at least they should be spoiled, if they actually read them -- I am still not convinced they do, after living in the States for a decade. This abundance of goodness is in contrast to the selection of daily newspapers, which are -- relatively speaking -- not much to write home about. It's almost immoral how much good stuff you can actually read from refined-tree-based non-daily print products, for a very modest fee. My specific selection of such Great Magazines contains three exemplars -- Scientific American, Fortune and National Geographic -- but I suspect I could easily form many more such triplets and still make similar statements. My choice of three is not limited by economics or supply problems, but by time constraints: I can barely keep up with these three as things are.

But I digress. After being inspired by one of many outstanding articles (it'll be the last one I mention in this entry, so more on that one later), I thought it would be fair to review one interesting and relevant article from each of the three. So here's Tatu's October 2009 American Magazine Article review.

1. How the American Auto Industry Was Put in Its Proper Place

(aka "The Auto Bailout: How We Did It" by Fortune)

Of the articles chosen, this is the most down-to-earth one. It's a condensed story of how the GM/Chrysler cleanup project was carried out by the US government's "Team Auto". It's easy reading, and outlines how well difficult tasks can sometimes be managed with a combination of good people, right timing and perhaps a bit of luck. If you had asked me to predict how well the process could possibly succeed -- I mean, all the facts were there, and the odds did not look very good -- I think I would have thought it unlikely that the end result could be as good as it seems to be. And that is based not so much on the story, which mostly explains what was done and how, but on my sense of how these things tend to go (with whatever level of business acumen a software engineer can possibly possess, whether that's above or below an average banker's talent).

I really like Fortune for articles like this: it's not a dumbed-down version (there are weeklies that dumb things down a notch; then newspapers that take it almost to imbecile level; and finally TV shows that do the lowest-common-denominator versions for the functionally illiterate), yet it manages to be very easy reading.

But that's not all: Fortune also manages to be a good magazine due to its contrarian spirit. For a business magazine it has a very independent streak, and the viewpoints presented are varied and, if possible, even something I'd call fair. It also tackles relevant and non-easy issues -- it's not just yet another WSJ (which itself may actually be one of the few examples of a good newspaper; nonetheless, it's much more predictable and thus less interesting with respect to non-daily news; but I guess that's only fair for a DAILY newspaper).

Anyhow: that's a good read, enough said.

2. Living On a Razor's Edge (by National Geographic Magazine)

And just so as not to get too well grounded in day-to-day (or year-to-year) living, it's good to mentally teleport into another time and/or place. National Geographic offers multiple articles for doing just that: learning about other countries, cultures, flora and fauna, and all combinations thereof. Picking something to showcase is not easy: multiple articles could qualify.

But all things being equal, reading about Madagascar is always a safe bet for learning something new and unusual. Even against those expectations, this story, and especially the pictures that illustrate it, stands far apart. I mean, how would one even imagine natural constructs like these cathedral-spire lookalikes? And the things that live and grow on, around and under them. Whoa. Besides, it's somewhat of an uplifting article too: for once, human development is unlikely to directly threaten the thing (indirectly, climate change can of course affect it, perhaps even destroy it, but that's still better than the gone-in-the-next-five-years odds many other exotic places are given).

3. The Rise of Vertical Farms (by Scientific American)

(see http://www.verticalfarm.com for more)

And finally, this is the article that inspired me to write about stuff others write about. The article itself is sort of mind-blowing: the idea of using skyscrapers to grow our food sounds decidedly futuristic, somewhat like the old (and by now obsolete) predictions of how everyone by 2000 would fly around with a jet pack and eat food pills for energy. But when you read the article and think about it, the first questions should be "would it really work?", "why didn't *I* think about it?" and "isn't that obvious now that I read it?"

I like the creativity aspect of the idea, as well as its immense fashionability. One of the more surprising undercurrents of progressive (and I don't mean the politically leftist label here) forward-looking thinking is that agriculture is actually not a thing of the past, a declining "industry", but something that is both very essential for humankind and also a part of the future and present, as well as the past. The only thing that has been declining with respect to farming is the share of the population it employs; its importance hasn't really diminished over time, nor will it any time soon. So although there has been a steady pace of R&D over the years, it is only now becoming obvious that farming is a big thing; there are lots of things wrong with it, but along with all the challenges there are also gigantic opportunities. This, along with the more mundane organic-food-is-cool trend, bundled with the finally-at-last-here American environmentalist awakening, is really making farming Cool with a capital C.

And this is where the idea becomes sizzling hot: hey, not only can you produce fresh food locally (where are the consumers? in cities, dummy!), it can be economically beneficial, good for your health (no, not in the "good vibes" sense of organic food, but with regard to an actual reduction in pesticide use, less time for spoilage, etc.), AND good for the environment (less land used, less water, the ability to recycle waste water and perhaps even solids; less energy for transportation). Oh yes, and also good tasting due to freshness -- fresh produce year round.

4. Common Threads (or Exercise in Deep Thinking by an Amateur Philosophist/-logist)

One more interesting thing about the "Big Three" is that they often converge around similar topics, somewhat aligned thinking, the same threads; sometimes it might not even be trivial to know which magazine ran a given article if you weren't shown it. And I don't mean this in a negative way -- it's not that the magazines are identical, or lack identity, but rather that they are varied and their topic selection thereby overlaps (without the lemming-like approach of daily news). Of course, some would call such convergence zeitgeist: different entities talking about similar things, threads that connect things that seem unrelated (like the environmentalist/naturalist NG vs. the business talk of Fortune vs. geeky SciAm). And cynics would claim I am just missing weekly-paced groupthink. Perhaps that is part of it -- there being thoughts floating around in time (as much as I hate the word, I guess I'd better use it... memes).

But I also think there's something related to a sort of national way of thinking (what is the word for that again? Volksgeist?). Beyond temporal similarities (wars are more relevant when they are going on, obviously; the significance of most events is time-bound), there is this common solution-oriented approach, and the choice of similarly current topics (not just fashionable ones, as in discussing stupid crap like celebrity gossip or politicians' marital problems) is something these magazines share. And most importantly: there is always this underlying faith in things improving over time. I suspect this is something profoundly American; something more genuine than the stomach-revoltingly-plastic flag-waving variety of Americana.

What I mean is that many articles talk about how things could be improved; it is actually quite rare to read an article where the overall tone is negative, much less one where things are declared hopeless. One could of course claim that's just good business sense (who would pay to read about bad stuff?), but that's easily rebutted: selling social porn and doom-and-gloom is the business of TV networks, and quite a profitable plan at that.

Anyway, enough soap-box philosophizing (is that a word? can we make it one if not?). Thank you for your time. And please consider subscribing to some of these great, affordable American magazines, if you don't already. I'd rather they be around during my lifetime, and maybe even my children's. There'll be more time to read when I retire. :-)

Thursday, October 29, 2009

On State of State Machines

State machines are something all programmers should recall from their basic Computer Science courses, along with other basics like binary trees and merge sort. But until fairly recently I thought they were mostly useful as low-level constructs, built by generator code like compiler compilers and regular expression packages. This contempt was fueled by having seen multiple cases of gratuitous usage: for example, state machines were used to complicate the simple task of parsing paragraphs of line-oriented configuration files when I worked at Sun. So my thinking was that state machines are only fit for compilers to produce, something non-kosher for enlightened developers.

But about two years ago I started realizing that state machines are actually nifty little devices, not only to be created by software but also by wetware. And that their main benefit can actually be the simplicity of the resulting solution -- when used in the right places.

1. Block Me Not

The first place where I realized the usefulness of state machines was within the bowels of an XML parser. Specifically, when trying to write the non-blocking (asynchronous) parsing core of the Aalto XML processor (more on this nice piece of software engineering in the future, I promise).

The challenge with writing a non-blocking parser is simple: whereas a blocking parser -- one that explicitly reads input from a stream and can block until input is available -- has full control of the control flow, including the ability to stop only when it wants to (at a token boundary; after fully parsing an element, or a comment), a non-blocking parser is at the mercy of whoever feeds it data. If there is no more data to read, a non-blocking parser has to store whatever state it has and return control to the caller, ready or not. Which basically means it may have to stop parsing at any given character -- or even better, halfway THROUGH a character, which can happen with multi-byte UTF-8 characters. And it must do so in such a way that whenever more data does become available, it is ready to resume parsing based on the newly available data.

So what is needed to do that? The ability to fully store and restore the state. And this is where the state machine made its entrance: gee, wouldn't it make sense to explicitly separate out the state, and create a state machine to handle execution? Or, in this case, a set of small state machines.

Indeed it does; and once you go that route, the implementation is not nearly as complicated as it would be if one tried to do it all using regular procedural code (which might just be infeasible altogether).
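
To make this concrete, here is a minimal sketch (my own illustration, NOT Aalto's actual code) of the general shape such a resumable component takes: all decoding state lives in member fields, so the feed method can return at an arbitrary byte boundary -- even in the middle of a multi-byte UTF-8 character -- and a later call picks up exactly where the previous one left off.

  // Sketch of the resumable-state idea (not Aalto's actual implementation):
  // state lives in fields, so feed() may stop and resume at any byte boundary.
  public class ResumableUtf8Decoder
  {
    private static final int STATE_INITIAL = 0;     // expecting start of a new character
    private static final int STATE_NEED_2_MORE = 1; // saw lead byte of a 3-byte char
    private static final int STATE_NEED_1_MORE = 2; // one continuation byte still missing

    private int state = STATE_INITIAL;
    private int pending; // partially decoded code point, if any

    private final StringBuilder output = new StringBuilder();

    // Feed one chunk of input; may stop mid-character and resume on the next call.
    public void feed(byte[] buf, int offset, int end)
    {
      for (int i = offset; i < end; ++i) {
        int b = buf[i] & 0xFF;
        switch (state) {
        case STATE_INITIAL:
          if (b < 0x80) {                  // 1-byte (ASCII) character
            output.append((char) b);
          } else if ((b & 0xE0) == 0xC0) { // lead byte of a 2-byte character
            pending = b & 0x1F;
            state = STATE_NEED_1_MORE;
          } else if ((b & 0xF0) == 0xE0) { // lead byte of a 3-byte character
            pending = b & 0x0F;
            state = STATE_NEED_2_MORE;
          } // (4-byte characters and error handling omitted for brevity)
          break;
        case STATE_NEED_2_MORE:
          pending = (pending << 6) | (b & 0x3F);
          state = STATE_NEED_1_MORE;
          break;
        case STATE_NEED_1_MORE:
          pending = (pending << 6) | (b & 0x3F);
          output.append((char) pending);
          state = STATE_INITIAL;
          break;
        }
      }
      // Input exhausted: 'state' and 'pending' hold everything needed to resume.
    }

    public String decodedSoFar() { return output.toString(); }
  }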

2. All Your Base64 Are Belong To Us

Ok, complex state keeping should be an obvious place for state machines to rule. But much smaller tasks can benefit as well.
Base64 decoding is a good example: given that decoding needs to be flexible with respect to things like white space (linefeeds at arbitrary locations), possible limits on the amount that can be decoded in one pass (with incremental parsing, as is the case with Woodstox), and the need to handle possible padding at the end, writing a method that does base64 decoding is a non-trivial task. I tried doing just that, and the resulting code was anything but elegant. I would even go as far as to call it fugly.

That is, until I realized I should apply the earlier lesson and see what would come of simple state keeping and looping. Lo and behold, the tight loop of base64 decoding is tight both in the amount of code (rather small) and processing time (pretty damn fast). The resulting state machine has just 8 states (4 characters per 24-bit unit to decode, a few more to handle padding), and the code is surprisingly simple and easy to follow (but still long enough not to be included here -- check out the Woodstox/Stax2 API class "org.codehaus.stax2.ri.typed.CharArrayBase64Decoder" if you are interested in details).
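
For illustration, here is a simplified sketch of the same idea (NOT the CharArrayBase64Decoder mentioned above, and without its incremental-feeding support): the state is simply which of the 4 characters of the current base64 unit is expected next, plus a couple of padding states.

  // Simplified illustration of a state-machine base64 decoder: whitespace is
  // allowed anywhere, '=' padding handled via states 2, 3 and 4.
  public class SimpleBase64Decoder
  {
    private final static String ALPHABET =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    public static byte[] decode(String input)
    {
      java.io.ByteArrayOutputStream out = new java.io.ByteArrayOutputStream();
      int state = 0; // 0-3: position within current 4-char unit; 4: expecting final '='
      int bits = 0;  // accumulated bits of the current 24-bit unit

      for (int i = 0; i < input.length(); ++i) {
        char c = input.charAt(i);
        if (c <= ' ') { // linefeeds and other white space allowed anywhere
          continue;
        }
        if (c == '=') { // padding: only legal as the 3rd or 4th character
          if (state == 2) {        // "xx==" -> one payload byte
            out.write(bits >> 4);
            state = 4;             // still need the second '='
          } else if (state == 3) { // "xxx=" -> two payload bytes
            out.write(bits >> 10);
            out.write((bits >> 2) & 0xFF);
            state = 0;
          } else if (state == 4) { // second '=' of "xx=="
            state = 0;
          } else {
            throw new IllegalArgumentException("Unexpected padding at "+i);
          }
          bits = 0;
          continue;
        }
        int value = ALPHABET.indexOf(c);
        if (value < 0 || state == 4) {
          throw new IllegalArgumentException("Unexpected character '"+c+"' at "+i);
        }
        bits = (bits << 6) | value;
        if (++state == 4) { // full 24-bit unit accumulated: emit 3 bytes
          out.write(bits >> 16);
          out.write((bits >> 8) & 0xFF);
          out.write(bits & 0xFF);
          state = 0;
          bits = 0;
        }
      }
      // (strict validation of a truncated trailing unit omitted for brevity)
      return out.toByteArray();
    }
  }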

3. Case of "I really should have..."

One more case where the state machine approach would probably have worked well is that of "decoding a framed XML stream".

At work, there is an analysis system that has to read gigabytes of data. The data consists of a sequence of short XML documents, separated by marker byte sequences that act as a simple framing mechanism. The task itself is simple: take a stream, split it into segments (by markers), and feed them to a parser. But making it both reliable and efficient is not quite as easy: the marker sequence consists of multiple bytes, and individual bytes of it could theoretically appear within a document; it is only the full sequence that cannot be contained within a document. Plus, for extra credit, one should try to avoid having to re-read data multiple times.

So, foolishly, I went ahead and managed to write a piece of code that does such de-framing (demultiplexing) efficiently (which is needed for the scale of processing we do). But the code looks butt ugly, and it took a fair bit of testing to make it work correctly. Unfortunately I only had the light bulb moment after writing (... and fixing) the code: would this not be a PERFECT case for writing a little state machine, where one state is used for each byte of the marker sequence?
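
To show what I mean, here is a little sketch written after the fact (it is not the actual work code, and the marker value shown is made up): the state is simply the number of marker bytes matched so far. Note that this naive mismatch handling is only fully correct for markers that cannot overlap themselves; the general case would need a KMP-style failure function.

  // Sketch of the "one state per marker byte" idea; 'matched' is the state.
  public class FrameSplitter
  {
    private final byte[] marker;     // hypothetical, e.g. "\n--FRAME--\n".getBytes()
    private int matched = 0;         // state: 0 .. marker.length
    private final java.io.ByteArrayOutputStream current = new java.io.ByteArrayOutputStream();
    private final java.util.List<byte[]> documents = new java.util.ArrayList<byte[]>();

    public FrameSplitter(byte[] marker) { this.marker = marker; }

    // Feed one chunk of the input stream; no re-reading of data is needed.
    public void feed(byte[] buf, int offset, int end)
    {
      for (int i = offset; i < end; ++i) {
        byte b = buf[i];
        if (b == marker[matched]) {
          if (++matched == marker.length) { // full marker seen: close the segment
            documents.add(current.toByteArray());
            current.reset();
            matched = 0;
          }
        } else {
          // Partial match was a false alarm: those bytes belong to the document
          current.write(marker, 0, matched);
          matched = (b == marker[0]) ? 1 : 0;
          if (matched == 0) {
            current.write(b);
          }
        }
      }
      // (bytes after the last marker remain buffered in 'current')
    }

    public java.util.List<byte[]> documentsSoFar() { return documents; }
  }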

Maybe next time I will actually consider techniques I have recently re-discovered, and apply them appropriately. :-)

Wednesday, October 28, 2009

Data Format anti-patterns: converting between secondary artifacts (like xml to json)

One commonly asked but fundamentally flawed question is "how do I convert xml to json" (or vice versa).
Given the frequency at which I have encountered it, it probably ranks high on the list of data format anti-patterns.

And just to be clear: I don't mean that there is any problem in having (or wanting to have) systems that produce data using multiple alternative data formats (views, representations). Quite the contrary: the ability to do so is at the core of REST(-like) web services, which are one useful form of web services. Rather, I think it is wrong to convert between such representations.

1. Why is it Anti-pattern?

Simply put: you should never convert from one secondary (non-authoritative) representation into another such representation. Rather, you should render your source data (which usually lives in a relational model, or as objects) into such secondary formats. So: if you need XML, map your objects to XML (using JAXB or XStream or what have you); if you need JSON, map them using Jackson. And ditto for the reverse direction.
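
As a concrete (if simplified) sketch -- the Customer class here is just a made-up example -- the same object gets rendered directly into both representations, with no XML-to-JSON conversion anywhere:

  // Render one object as XML (via JAXB) and as JSON (via Jackson); the two
  // representations never touch each other.
  import java.io.StringWriter;
  import javax.xml.bind.JAXBContext;
  import javax.xml.bind.annotation.XmlRootElement;
  import org.codehaus.jackson.map.ObjectMapper;

  public class RenderBoth
  {
    @XmlRootElement
    public static class Customer { // made-up example type
      public String name;
      public int id;
    }

    public static void main(String[] args) throws Exception
    {
      Customer c = new Customer();
      c.name = "Jane Doe";
      c.id = 123;

      // XML view, rendered straight from the object via JAXB:
      StringWriter xml = new StringWriter();
      JAXBContext.newInstance(Customer.class).createMarshaller().marshal(c, xml);

      // JSON view, rendered straight from the same object via Jackson:
      StringWriter json = new StringWriter();
      new ObjectMapper().writeValue(json, c);

      System.out.println(xml);
      System.out.println(json);
    }
  }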

This of course implies that there are cases where such a transformation might make sense: namely, when your data storage format is XML (native XML DBs) or JSON (CouchDB). In those cases you just have to worry about the practical problem of model/format impedance, similar to what happens when doing Object-Relational Mapping (ORM).

2. Ok: simple case is simple, but how about multiple mappings?

Sometimes you do need multi-step processing; for example, if your data lives in a database. Following my earlier suggestion, it would seem like you should convert directly from the relational model (storage format) into the resulting transfer format (JSON or XML). Ideally, yes: if such conversions exist. But in practice a two-phase mapping (ORM from database to objects; then from objects to XML or JSON) is more likely to work better, mostly because there are good tools for the separate phases, but fewer that do the end-to-end rendition.
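
A sketch of what such a two-phase pipeline might look like, assuming a hypothetical JPA-annotated Customer entity (transaction handling and error cases omitted):

  // Phase 1: ORM maps relational data to an object; phase 2: Jackson maps the
  // object to the transfer format. Neither phase knows about the other format.
  import javax.persistence.EntityManager;
  import org.codehaus.jackson.map.ObjectMapper;

  public class CustomerJsonService
  {
    private final EntityManager em;                         // relational -> objects
    private final ObjectMapper mapper = new ObjectMapper(); // objects -> JSON

    public CustomerJsonService(EntityManager em) { this.em = em; }

    public String customerAsJson(long id) throws java.io.IOException
    {
      Customer customer = em.find(Customer.class, id); // hypothetical JPA entity
      java.io.StringWriter sw = new java.io.StringWriter();
      mapper.writeValue(sw, customer);
      return sw.toString();
    }
  }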

Is this wrong? No. To understand why, it is necessary to understand the 3 classes of formats we are talking about:

  • Persistence (storage) format, used for storing your data: usually the relational model, but possibly something else (objects for object DBs; XML for native XML databases)
  • Processing format: objects or structs of your processing language (POJOs for Java) that you use for actual processing. Occasionally this can be something more exotic, like XML when using XSLT (or relational data for complicated reporting queries)
  • Transfer format: the serialization format used to transfer data between endpoints (or sometimes for time-shifting, i.e. saving state over a restart); it may be closely bound to the processing format (as is the case for Java serialization)

So what I am really saying is that you should not transform within a class of formats; in this case, between 2 alternative transfer formats. It is acceptable (and often sensible) to convert between classes of formats; and sometimes doing 2 transforms is simpler than trying to do one bigger one. Just not within a class.

3. Three Formats may be simpler than Just One

One more thing about the above-mentioned three formats: there is a related fallacy of thinking there is a problem if you are using multiple formats/models (like the relational model for storage, objects for processing, and XML or JSON for transfer). The assumption is that the additional transformations needed to convert between representations are wasteful enough to be a problem in and of themselves. But it should be rather obvious why there are often distinct models and formats in use: because each is optimal for a specific use case. The storage format is good for, gee, storing data; the processing model is good for efficiently massaging data; and the transfer format is good for piping it over the wire. As long as you don't add gratuitous conversions in between, transforming at the boundary is completely sensible, especially when you consider the alternative of trying to find a single model that works for all cases. One only needs to consider the "XML for everything" cluster (esp. XML for processing, aka XSLT) to see why this is an approach to avoid (or Java serialization as a transfer format -- an anti-pattern in and of itself).

Friday, October 16, 2009

Eclipse, meet Subversion, vice versa, and PLEASE play nice together

Oh well. Apparently it is too much to ask to expect Eclipse to play nice with Subversion out of the box -- witness that pesky, annoying habit of blindly copying Subversion metadata files, leading to irritating "The resource is a duplicate of .svn/..." errors, one for each directory you have within a Subversion project.

But this little blog posting saved my day. Finally. After all the useless other workarounds (*), including installing Subversion compatibility plug-ins (which could be useful per se, just not able to suppress this particular svn annoyance).

(* - yes, there are other ways to actually get rid of this warning, but they are usually less elegant; also, there are a few suggested fixes floating around that do not even resolve it.)

Sunday, October 11, 2009

Fresh new hope for JSON Schema: "Orderly" improvements afoot!

Here's something interesting related to the ongoing (if slowly moving) JSON Schema effort: the Orderly micro-language. Orderly just might be the thing that makes JSON Schema usable. There are other things that could do that too, like good tool support; but a more convenient syntax seems like the shortest route to improved usability: a custom-built DSL that does NOT (have to) use the target syntax as its own syntax. What a great idea! (Not a novel one; RelaxNG compact syntax has been around for a while, and it wasn't new either -- no matter, good ideas are good ideas.)

As the web site says: "Orderly... is an ergonomic micro-language that can round-trip to JSONSchema ... ... optional. syntactic sugar, fluff. Tools should speak JSONSchema, but for areas where humans have to read or write the schema there should be an option to expose orderly in addition to JSON". Sounds good, I like that.

We shall see if and how this works out. My personal interest is more in the area of a type definition language -- for me, validation is actually not all that interesting, mostly because I believe it can be done quite well at the (Java) object level (see the Bean Validation API). So much so that even XML Schema is used much more as a type definition language for data binding (as THE Object/XML type system for things like SOAP and JAXB) than for actual XML validation, although the original focus was squarely on validation aspects. Further indirect proof is that its main competitor, RelaxNG, which is the superior alternative for validation, isn't nearly as popular overall -- it would totally squash XML Schema if validation were the dominant use case for XML schema languages; but RelaxNG is not very useful for data binding, alas (because it allows ambiguity in the grammar, which is acceptable for validation but problematic when trying to do type inference and matching).

But I digress. I think that a prettified DSL that translates to/from JSON Schema could handle the type system aspects just as well as JSON Schema itself; which is to say, "possibly well enough to be useful". Although JSON Schema as it stands has some nasty flaws in this area (only a single type per schema? you are kidding me, right? all references via static URLs? really?), maybe it can all work out in the end with some spit and polish. The jury is still out.

Saturday, October 10, 2009

My 100 days back at the Mother of All E-Business

Time sure flies when you are having fun! Why, just today I realized it has been approximately 100 days since I (re)started at Amazon!
This realization was based on having had to reset my password last week; the cycle between mandatory password changes is 3 months.

I guess 3 months is still within a loose definition of the Honeymoon, i.e. the initial period of a positive view of one's (still) new employer, team and surroundings. Nonetheless, I am positively surprised by how good I feel about the things I work on and the people I work with. It obviously helps that the company is doing well (as well as could be expected given the macro-economic situation); but that doesn't quite compare to the joy over the high caliber of the people who make things run, and whom I have the pleasure to work with. Especially so for my team: this is only the second time in my whole career (of 14 years) that I feel I am surrounded by people who know more about a whole lot of things than I do, including core technical skills (for the curious, the first time was in the mid-to-late 90s, back in Helsinki). And the beauty of it all is that we are also working on somewhat cutting-edge systems; not merely with regards to the scale of operational things (I generally don't care how many servers you waste CPU time on, or how many gigabytes of disk space they have -- that sort of thing is cool when you are starting your career, but you quickly [should] get over it), but rather with regards to the complexity of the problem domain and the resulting (mostly) essential complexity of the solutions. We are solving not only very big but also very hard problems. And that takes time; slowly, piece by piece, new things get built, and the big beast of complexity is starved to death. Fortunately Amazon has (and groks) something that is not very common in the enterprise world: patience and a long-term view; focus on things that matter, and perseverance to see through what you need and decide to do. Now that the team has worked on the longer-term plan for multiple years, results are accumulating, and that's the most exciting thing overall.

Anyway: I am very content with what I am doing now (hi there, Sachin! can I get my raise now?). Enough said about that. No one likes people who brag about their marital bliss, luck in the lottery, or the amount of money, fame and chicks their open source activities bring about. :-)

But one final thought on the subject of work life, compared to my open source night-time hacking: the situation has always been such that the two are very loosely (if at all) connected. At first this seemed unfortunate, but over time I have come to appreciate the separation: if the two were interlinked, wouldn't it just mean I spent both my work day and a chunk of my free time on work? And would it not also be putting all my eggs in a single basket? So perhaps there is something here similar to the rule of "never start a business with a friend (or relative)" (if the business goes sour, you will be neither business partners nor friends; and even without that, there's enough tension to rip apart a friendship): it may be good to keep the open source "hobby" at arm's length from paid-for development work. Much like work life is often best kept distinct from family life -- not totally apart, of course; friends from work will be friends outside work too, and sometimes one world temporarily plunges into the other -- but at most transiently co-habiting the same temporal spectrum, mostly not. (Ha! I never thought I'd write such a long sentence outside of tech specs...)

ps. If there's anyone with solid programming skills, a background in NLP (or closely related areas), a wish to solve actual real-life important problems, and a need or desire to get a (new) job, shoot me an email. We are still hiring. Stuff will be sold over the Internet, and facts need to be extracted to support this lucrative business!

Thursday, October 08, 2009

Handling Base64-encoded binary data with Jackson

Hopefully by now you know that Woodstox can handle base64-encoded binary data for your XML use cases. You may even know that Jackson can do the same for JSON (notice that "g.writeBinary()" call in the Jackson Tutorial?).

But there is actually a bit more to know about the base64 functionality. Let's first review core base64 handling with the 3 main processing models Jackson supports.

1. Handling base64-encoded binary with Streaming API

Assuming you get JSON content like:

{
  "binary" : "hc3VyZS4="
}

you can get binary data out by, say:

  JsonParser jp = new JsonFactory().createJsonParser(jsonStr);
  jp.nextToken(); // START_OBJECT
  jp.nextValue(); // VALUE_STRING that has base64 (skips field name as that's not a value)
  byte[] data = jp.getBinaryValue();

And if you want to produce similar data, you can do:

  byte[] data = ...;
  StringWriter sw = new StringWriter();
  JsonGenerator jg = new JsonFactory().createJsonGenerator(sw);
  jg.writeStartObject();
  jg.writeFieldName("binary"); // 1.3 will have "writeBinaryField()" method
  jg.writeBinary(data, 0, data.length);
  jg.writeEndObject();
  jg.close(); // close (or flush) the generator so content reaches the StringWriter
  String jsonStr = sw.toString();

2. Handling base64-encoded binary with Data Binding

But where JsonParser and JsonGenerator make access quite easy, ObjectMapper makes it ridiculously easy. You just use 'byte[]' as the data type, and ObjectMapper binds the data as expected.

  static class Bean {
    public byte[] binary;
  }

  ObjectMapper mapper = new ObjectMapper();
  Bean bean = mapper.readValue(jsonStr, Bean.class); 
  byte[] data = bean.binary; // Want to serialize it? Sure:
  String outputStr = mapper.writeValueAsString(bean); // note: Jackson 1.3 only; otherwise use StringWriter

3. Handling base64-encoded binary with Tree Model

Handling binary data is almost as easy with the Tree Model:

  JsonNode object = mapper.readTree(jsonStr);
  JsonNode binaryNode = object.get("binary");
  byte[] data = binaryNode.getBinaryValue();

  // or construct from scratch, write?
  ObjectNode rootOb = mapper.createObjectNode();
  rootOb.put("binary", rootOb.binaryNode(data));
  outputStr = mapper.writeValueAsString(rootOb);

4. Additional Tricks

(DISCLAIMER: the following features have been tested with Jackson 1.3, which is not yet released)

But what if you actually just want to encode or decode binary data to/from Base64-encoded Strings, outside the context of JSON processing?

Turns out that you can do simple encoding and decoding quite easily. And as an additional bonus, Jackson's strong focus on performance means that the underlying codec is very efficient, even for such "extra-curricular" use (where output buffering is not utilized as it is for incremental JSON processing).
In fact, it may just be faster than commonly used alternative toolkits.

Anyway: to encode arbitrary binary data as a Base64 String, you can do:

  import org.codehaus.jackson.node.BinaryNode;

  BinaryNode n = new BinaryNode(byteArray);
  String encodedText = n.getValueAsText();

// or as one-liner: encodedText = new BinaryNode(data).getValueAsText();

and to decode a given Base64-encoded String, you can retrieve the contained binary data by:

  import org.codehaus.jackson.node.TextNode;

  TextNode n = new TextNode(encodedString);
  byte[] data = n.getBinaryValue();

// or as one-liner: data = new TextNode(encodedString).getBinaryValue();
// or, if encoded using non-standard Base64 variant, try: data = n.getBinaryValue(Base64Variants.MODIFIED_FOR_URL);

Useful? Possibly -- no need to include Jakarta Commons Codec just for Base64 handling, if you happen to use Jackson already.

Wednesday, October 07, 2009

Test First? Only if tests are the first and foremost deliverable...

Ok, given that my view on testing, unit tests, and "things of that nature" has slowly but surely evolved over time, I like reading what others have to say on the subject. The only readings I steer clear of are fanboy articles, and possibly "it sucks because I'm contrarian" counterpoints. This still leaves lots of good material. For example, "Unit testing in Coders at Work" is a delightful compilation of multiple good (and some so-so) viewpoints.

My personal favorite is the last episode: the case of "a TDD proponent and another good programmer". Although it is but a single case, I do think it suggests something simple yet fundamental: you tend to achieve whatever is your main goal. And if you consider Testing with a capital T to be the most important thing, well, you will get good tests. But what does NOT follow is that you get a stellar design or even stellar code. You just get a design and an implementation that work the way you expected them to work. Which is not a bad thing per se; just not necessarily intrinsically good. That is: a good design and implementation left untested is better than a perfectly tested but badly designed or implemented thing; it is easier to find implementation problems (bugs) than to re-design or re-implement.

That is probably my biggest misgiving regarding the "Test First" idea: it does suggest that testing comes before everything else -- not just temporally, which may or may not make sense (often it does; as often it does not), but most importantly, in priority. For me, testing is a very important supporting area: very useful (as I have said, none of my open source projects would have nearly as good quality and maintainability as they have without lots of time spent writing and maintaining test code!), but ultimately not a goal in itself, rather a tool for achieving the goals.

With respect to the article, another pleasant observation is that I tend to agree with most of the practitioners interviewed. Pragmatism seems to be a core trait of good programmers: if not a defining one, at least one strongly correlated with core competencies. I suspect that has a lot to do with the failure of the "TDD guy" in his attempt at Sudoku solving, as well as his misplaced (if understandable, considering he was writing material for his blog) focus on testing.

Saturday, October 03, 2009

Joys of unit testing: finding a 5 year old bug (dormant, harmless, but a bug nonetheless)

Well, here is something I don't do every day: write a unit test mostly to improve code coverage metrics, and find a dormant, almost 5-year-old bug. But that's exactly what I did today: the bug was found in the Jackson i/o package. And, it turns out, in 2 other libraries (Woodstox, Aalto) as well: the code has evolved over time, having originally been written for Woodstox (checked in almost 5 years ago), and then finding its way into Aalto and Jackson.

So what was the bug? It was a bug in an 'InputStream.skip()' implementation, which skipped a different number of bytes than it reported: the return value was the number that was requested, not the number that was actually skipped. A simple case of mistaken variables.
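
For illustration only (this is not the actual Jackson/Woodstox code), the pattern looked roughly like this:

  // Illustration of the bug pattern: the buggy version returned the requested
  // count instead of the number of bytes actually skipped.
  import java.io.IOException;
  import java.io.InputStream;

  abstract class BufferBackedStream extends InputStream
  {
    protected byte[] buffer;
    protected int ptr; // current read position in buffer
    protected int end; // end of valid content in buffer

    @Override
    public long skip(long requested) throws IOException
    {
      long available = end - ptr;
      long skipped = Math.min(available, requested);
      ptr += skipped;
      // The bug: 'return requested;' -- reporting the requested count even when
      // fewer bytes were available to skip. The correct return value is:
      return skipped;
    }
  }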

And why was this bug not found earlier? Because this code path was never really exercised -- just because the skip() method exists in InputStream doesn't mean it is used. Except when someone writes a unit test to exercise it.

In the end, I guess there are 2 complementary ways to think about this: first, try to unit test all the code you have; and second, do not write code you do not plan to use. That is, given that I wasn't expecting to ever use this method, perhaps I should have written it to throw an IOException right away; and if it was ever needed, then (and only then!) actually write it and add unit tests.

Thursday, October 01, 2009

Critical updates: Woodstox 4.0.6 released

This just in: Woodstox 4.0.6 has been released, and it contains just one fix; but that one is for a critical problem (text content truncation for long CDATA sections, when using XMLStreamReader.getElementText()). Upgrade is highly recommended for anyone using earlier 4.0 releases.

One more potentially useful addition is that I uploaded a "relocation" Maven pom for the otherwise non-existent artifact "wstx-asl" v4.0.6 (the real id is "woodstox-core-asl" as of 4.0; "wstx-asl" was used with 3.2 and previous releases). This was suggested by a user, to make the upgrade a bit less painful -- the problem is that Woodstox tends to be one of those ubiquitous transitive dependencies for anyone running a SOAP service (or nowadays almost any server-side XML processing system).

The next big thing should then be Jackson 1.3; stay tuned!


