Tuesday, February 15, 2011

Basic flaw with most binary formats: missing identifiable prefix (protobuf, Thrift, BSON, Avro, MsgPack)

Ok: I admit that I have many reservations regarding many existing binary data formats; and this is major reason why I worked on Smile format specification -- to develop a format that tries to address various deficiencies I have observed.

But while the full list of grievances would be long, I realized today that there is one basic design problem that is common to pretty much all formats -- at least Thrift, protobuf, BSON and MsgPack -- that is: lack of any kind of reliable, identifiable prefix. Commonly used techniques like "magic number", which is used to allow reliable type detection for things like image formats, appear unknown to binary data format designers. This is a shame.

1. The Problem

Given a piece of data (file, web resource), one important piece of metadata is its structure. While this is often available explicitly from the context, this is not always the case; and even if it could be added there are benefits to being able to automatically detect type: this can significantly simplify systems, or extend functionality by accepting multiple kinds of formats. Various graphics programs, for example, can operate on different image storage formats, without necessarily having any metadata available beyond just actual data.

So why does this matter? It helps in verifying basic correctness of interaction in many cases: if you can detect what is and what is not a valid piece of data in a format, life is much easier: you have a chance to know immediately when a piece of data is completely corrupt, or when you are being fed data in some format other than the one you expect. Or, if you support multiple formats, you can add automatic handling of differences.

2. Textual formats do it well

But let's go back to commonly used textual data formats: XML and JSON. Of these, XML specifies "xml declaration" which can be used to not only determine text encoding (UTF-8 etc) used but also the fact that data is XML. It is cleanly designed and is simple to implement. As if it was designed by people who knew what they were doing.

JSON does not define such a prefix, but specification does specify exact rules for detecting valid JSON, as well as encodings that can be used; so in practice JSON auto-detection is as easy to implement as that for XML.
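To make the JSON case concrete: RFC 4627 (section 3) points out that since the first two characters of a JSON text are always ASCII, the pattern of zero bytes in the first four octets reveals the encoding. Here is a minimal sketch of that rule; the class and method names are made up for illustration, and it deliberately ignores byte-order marks and inputs shorter than four bytes.

```java
final class JsonEncodingSniffer {
    /**
     * Guesses the Unicode encoding of a JSON document from the pattern of
     * zero bytes in its first four bytes (rule described in RFC 4627, section 3).
     */
    static String guessEncoding(byte[] b) {
        if (b.length < 4) {
            return "UTF-8"; // too short to tell; fall back to the default encoding
        }
        boolean z0 = b[0] == 0, z1 = b[1] == 0, z2 = b[2] == 0, z3 = b[3] == 0;
        if (z0 && z1 && z2 && !z3) return "UTF-32BE";  // 00 00 00 xx
        if (z0 && !z1 && z2 && !z3) return "UTF-16BE"; // 00 xx 00 xx
        if (!z0 && z1 && z2 && z3) return "UTF-32LE";  // xx 00 00 00
        if (!z0 && z1 && !z2 && z3) return "UTF-16LE"; // xx 00 xx 00
        return "UTF-8";                                // anything else
    }
}
```

Note that this only tells you how to decode the bytes, assuming they are JSON at all; checking that the content actually is JSON still requires looking at the decoded characters.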

3. But most new binary formats don't

Now; the task of defining unique (enough) header for binary formats would be even easier than that for textual formats, because structurally there is less variance: no need to allow variable text encoding, arbitrary white spaces, or other lexical sugar. It took me very little time to figure out the simple schema used by Smile to indicate its type (which in itself was inspired by design of PNG image format, an example of very good data format design).
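For illustration, here is roughly how little code a magic-prefix check takes. This is a sketch, not code from any library: the class and method names are invented. The byte values are the 8-byte PNG signature and the first three bytes of the Smile header (":)" plus a linefeed); in Smile a fourth header byte follows, carrying version and flag bits, which this sketch does not inspect.

```java
final class MagicPrefix {
    // PNG signature: the same 8 bytes start every PNG file
    static final byte[] PNG = {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};
    // Smile header: ':' ')' '\n'; a fourth byte (version and flags) follows
    static final byte[] SMILE = {':', ')', '\n'};

    /** Returns true if data begins with exactly the bytes of prefix. */
    static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    /** Names the format if a known prefix matches, "unknown" otherwise. */
    static String detect(byte[] data) {
        if (startsWith(data, PNG)) return "PNG";
        if (startsWith(data, SMILE)) return "Smile";
        return "unknown";
    }
}
```

A caller only needs the first handful of bytes of a file or stream to call detect(), which is exactly why a fixed prefix is so cheap to support.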

So you might think that binary formats would excel in this area. Unfortunately, you would be wrong.

As far as I can see, following binary data formats have little or no support for type detection:

  • Thrift does not seem to have type identifier at its format layer. There is actually small amount of metadata at RPC level (there is a message-start structure of some kind), but this only helps if you want/need to use Thrift's RPC layer. Another odd thing is that internal API actually exposes hooks that could be used to handle any type identifiers; it is as if designers were at least aware of possibility of using some markers to enclose main-level data entities.
  • protobuf does not seem to have anything to allow type detection of a given blob of protobuf data. I guess protobuf never claimed to be useful for anything beyond tightly coupled low-level system integration (although some clueless companies are apparently using it for data storage... which is just a plain old Bad Idea), so maybe I could buy argument that this is just not needed, that there is never any "arbitrary protobuf data" around. Still... adding a tiny bit of redundancy would make sense for diagnostics purposes; and given that protobuf already has some redundancy (field ids, instead of using ordering) it would seem acceptable to use first 2 or 4 bytes for this.
  • MsgPack and BSON both just define "raw" encoding, without any format identifier that I can see. This is especially puzzling since unlike protobuf and Thrift, they do not require a schema to be used; that is, they have plenty of other metadata (types, names of struct members; even length prefixes). So why make these data formats completely unidentifiable?

4. But what about Avro?

There is one exception aside from Smile, however. Avro seems to do the right thing (as far as I can read the specification) -- at least when explicitly storing Avro data in a file (I assume including map/reduce use cases, stored in HDFS): there is a simple prefix to use, as well as requirement to store the schema used. This makes sense, since my biggest concern with formats like protobuf and Thrift is that being "schema-ridden", data without schema is all but useless. Requiring that two are bundled -- when stored -- makes sense; optimizations can be used for transfer.

So Avro definitely seems better design than 4 other binary data formats listed above in this respect.

5. Why do I care?

As part of my on-going expansion of Jackson ("the universal data processor"), I am thinking of adding many more backends (to support reading and writing data in alternate data formats), to allow clean and efficient data binding to/from most any commonly used data formats. Ideally this would include binary data formats. Current plans are to include format detection functionality in such a way that new codecs can detect data they are capable of reading and writing; and this will work just fine for most existing formats that Jackson can handle (JSON, Smile, XML). I also assumed that since it would be very easy to design data formats that can be reliably detected, existing formats should be a piece of cake to detect. It is only when I started digging into details of binary data formats that the sad reality sunk in...
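To give an idea of what such pluggable detection could look like, here is a small sketch. The FormatDetector interface and all class names are hypothetical (this is not Jackson API); the only facts baked in are the Smile header bytes, the Avro container file magic ('O', 'b', 'j', followed by byte 1), and the RFC 4627 rule that a JSON text starts with an object or array. The JSON check also assumes a single-byte-per-character encoding such as UTF-8.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical contract: each codec reports whether it recognizes the leading bytes. */
interface FormatDetector {
    String formatName();
    boolean matches(byte[] head);
}

/** Detector for formats that start with a fixed byte sequence. */
final class PrefixDetector implements FormatDetector {
    private final String name;
    private final byte[] prefix;

    PrefixDetector(String name, byte[] prefix) {
        this.name = name;
        this.prefix = prefix;
    }

    public String formatName() { return name; }

    public boolean matches(byte[] head) {
        return head.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(head, prefix.length), prefix);
    }
}

/** Heuristic JSON detector: first non-whitespace byte must open an object or array. */
final class JsonDetector implements FormatDetector {
    public String formatName() { return "JSON"; }

    public boolean matches(byte[] head) {
        for (byte b : head) {
            if (b == ' ' || b == '\t' || b == '\r' || b == '\n') {
                continue; // skip leading whitespace
            }
            return b == '{' || b == '[';
        }
        return false; // empty or all whitespace
    }
}

final class FormatSniffer {
    // Detectors are tried in order; the first match wins
    static final List<FormatDetector> DETECTORS = Arrays.asList(
            new PrefixDetector("Smile", new byte[] {':', ')', '\n'}),
            new PrefixDetector("Avro container file", new byte[] {'O', 'b', 'j', 1}),
            new JsonDetector());

    static String detect(byte[] head) {
        for (FormatDetector d : DETECTORS) {
            if (d.matches(head)) {
                return d.formatName();
            }
        }
        return "unknown";
    }
}
```

With formats like protobuf or Thrift there is nothing comparable to put in the detector list: any prefix check would be a guess, which is the whole problem.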

On plus side, this makes it easier to focus on adding first rate support for data formats that are easy to detect. So I will probably prioritize Avro compatibility significantly higher than others; and I will unfortunately have to downgrade my work on adding Thrift support which would otherwise be the most important "alien" format to support (due to existing use by infrastructure I am working on).

Sunday, February 06, 2011

On prioritizing my Open Source projects, retrospect #2

(note: related to original "on prioritizing OS project", as well as first retrospect entry)

1. What was the plan again?

Ok, it has been almost 4 months since my last medium-term high-level prioritization overview. Planned list back then had these entries:

  1. Woodstox 4.1
  2. Aalto 1.0 (complete async API, impl)
  3. Jackson 1.7: focus on extensibility
  4. ClassMate 1.0
  5. Externalized Mr Bean (not dependent on Jackson)
  6. StaxMate 2.1
  7. Tr13 1.0

2. And how have we done?

Looks like we got about half of it done. Point by point:

  1. DONE: Woodstox 4.1 (with 4.1.1 patch release)
  2. Almost: Aalto 1.0 -- half-done; but significant progress, API is defined, about half of implementation work done
  3. DONE: Jackson 1.7 (with 1.7.1 and 1.7.2 patch releases)
  4. Almost: ClassMate 1.0 not completed; version 0.5.2 released, javadocs published, minor work remains
  5. Deferred: Externalized Mr Bean -- no work done (only some preliminary scoping)
  6. DONE? StaxMate 2.1 -- released 2.0.1 patch instead that contains fixes to found issues, but no new features, which would have defined 2.1.
  7. Some work done: Tr13: incremental work, but no definite 1.0 release (did release 0.2.5 patch version with cleanup)

I guess it is less than half since only 2 things were fully completed (or 3 if StaxMate 2.0.1 counts). But then again, of remaining tasks only one did not progress at all; and many are close to being completed (in fact, I was hoping to wrap up Aalto before doing update). And ones deferred were lower entries on the list.

On the other hand, I did work on a few things that were not on the list. For example:

  • Started "jackson-xml-databinding" project (after Jackson 1.7.0), got first working version (0.5.0)
  • Started multiple other Jackson extension projects (jackson-module-hibernate, jackson-module-scala), with working builds and somewhat usable code; these based on code contributed by other Jackson developers
  • Started "java-cachemate" project, designed concept and implemented in-memory size-limited-LRU-cache (used already in a production system)

This just underlines how non-linear open source development can be; it is often opportunistic -- but not necessarily in a negative way -- and heavily influenced by feedback, as well as newly discovered inter-dependencies and opportunities.

3. Updated list

Let's try guesstimating what to do going forward, then, shall we. Starting with leftovers, we could get something like:

  • Aalto 1.0: complete async implementation; do some marketing
  • ClassMate 1.0: relatively small amount of work (expose class annotations)
  • Java CacheMate: complete functionality, ideally release 1.0 version
  • Tr13: either complete 1.0, or augment with persistence options from cachemate (above)
  • Externalized Mr Bean? This is heavily dependent on external interest
  • Jackson 1.8: target most-wanted features (maybe external type id, multi-arg setters)
  • Jackson-xml-databinding 1.0: more testing, fix couple known issues
  • Work on Smile format; try to help with libsmile (C impl), maybe more formal specification; performance measurements, other advocacy; maybe even write a javascript codec

Other potential work could include:

  • StaxMate 2.1 with some new functionality
  • Woodstox 5.0, if there is interest (raise JDK minimum to 1.5, maybe convert to Maven build)
  • Jackson-module-scala: help drive 1.0 version, due to amount of interest in full Scala support
  • Jackson-module-csv: support data-binding to/from CSV -- perhaps surprisingly, much of "big data" exists as plain old CSV files...

But chances are that above lists are also incomplete... let's check back in May, on our first "anniversary" retrospect.

Thursday, February 03, 2011

Why do modularity, extensibility, matter?

After writing about Jackson 1.7 release, I realized that while I described what and how was done to significantly improve modularity and extensibility of Jackson, I did not talk much about why I felt both were desperately needed. So let's augment that entry with bit more background, fill in the blanks.

Two things actually go together such that while modularity in itself is somewhat useful, it is extremely important when it is coupled with extensibility (and conversely it is hard to be extensible without being modular). So I will consider them together, as "modular extensibility", in what follows.

1. Distributed development

The most obvious short-term benefit of better modularization, extensibility, is that it actually allows simple form of distributed development, as additional extension modules (and projects under which they are created) can be built independent from the core project. There are dependencies, of course -- modules may need certain features of the core library -- but this is much looser coupling than having to actually work within same codebase, coordinating changes. This alone would be worth the effort.

But the need for distribution stems from the obvious challenge with Jackson's (or any similar project's) status quo: that the core project, and its author (me) can easily become a bottleneck. This is due to coordination needed, such as code reviews, patch integration; much of which is most efficiently done with simple stop-and-wait'ish approach. While it is possible to increase concurrency within one project and codebase (with lots of additional coordination, communication, both of which are hard if activity levels of participants fluctuate), it is much easier and more efficient to do this by separate projects.

Not all projects can take the route we are taking, since one reason such modularity is possible is due to expansion of the project scope: extensions for new datatypes are "naturally modular" (conceptually at least; implementation-wise this is only now becoming true), and similarly support for non-Java JVM languages (Scala, Clojure, JRuby) and non-JSON data formats (BSON, XML, Smile). But there are many projects that could benefit from more focus on modular extensibility.

2. Reduced coupling leads to more efficient development

Reduced coupling between pieces of functionality in turn allows for much more efficient development. This is due to multiple factors: less need for coordination; efficiency in working on smaller pieces (bigger projects, as companies, have much more inherent overhead, lower productivity); shorter release cycles. Or, instead of canonically shorter development and release cycles, it is more accurate to talk about more optimal cycles: new, active projects can have shorter cycles, release more often, and more mature, slower moving (or ones with more established user base and hence bigger risks from regression) can choose slower pace. The key point is that each project can choose most optimal rate of releases, and only synchronize when some fundamental "platform" functionality is needed.

As an example, core Jackson project has released a significant new version every 3 - 6 months. While this is pretty respectable rate in itself, it is glacial pace compared to releases for, say, "jackson-xml-databinding" module, which might release new versions on weekly basis before reaching its 1.0 version.

3. Extending and expanding community

This improved efficiency is good just in itself, but I think it will actually make it easier to extend and expand community. Why? Because starting new projects and getting releases out faster should make it easier to join, get started and productive, and thereby lower threshold for participation. In fact I think that we are going to quickly double and quadruple number of active contributors quite soon, when everyone realizes potential for change; how easy it is to expand functionality in a way that everyone can share the fruits of labor. Previously best methods have been to write a blog entry about using a feature, or maybe report a bug; but now it will be trivially easy to start playing with new kinds of reusable extension functionality.

4. Modules are the new core

Given all the benefits of the increased modularity I am even thinking of further splitting much of existing "core" (meaning all components under main Jackson project; core, mapper, xc, jax-rs, mrbean, smile) as modules. All jars except for core and mapper would themselves work as modules (or similar extensions); and many features of mapper jar could be extracted out. The main reason for doing this would actually be to allow different release cycles: jax-rs component, for example, has changed relatively little since 1.0: there is no real need to release new version of it every time there is a new mapper version. In fact, of 6 jars, mapper is the only one that is constantly changing; others have evolved at much slower pace.

But even if core components were to stay within core Jackson project, most new extension functionality to be written will be done as new modules.

Saturday, January 08, 2011

On Perception of Java Verbosity

Today many software developers consider Java to be the modern-day equivalent of Cobol. This is evident from comments comparing amount of Java code needed to do tasks that can be written as one-liners using more dynamic and expressive scripting languages such as Python or Ruby. Funny how time flies -- it wasn't all THAT long ago that Java was seen as relatively concise language compared to C, due to its in-built support for things like garbage collection and standard library that contained implementations for host of things that in C were DIY (note that I did not say "due to simplicity of language itself")

1. Java verbose?

But while it is true that Java syntax can lead to code much more verbose than seems prudent (especially when traversing and modifying data structures), sometimes its reputation exceeds reality. I was reminded of this by a tweet I came across. The tweet asked "and how many lines would this be in Java", regarding a task of downloading JSON from a URL and parsing contents to extract data; something that can be done with a single line of Python (or Ruby or Perl). Implied assumption being that it would take many more lines of Java code.

2. Ain't necessarily so

This assumption is not completely baseless: if a developer was to do this as part of a service, a typical java developer might well end up with code that exceeded ten lines; and this even without code itself being badly written. I will come back to question of "why" in a minute.

But assumption is also off base, for the simple reason that it can be a one-liner even in Java; for example:


Response resp = new ObjectMapper().readValue(new URL("http://dot.com/api/?customerId=1234").openStream(),Response.class);
// or if you prefer, bind similarly as "Map<String,Object>"

(and in fact, ".openStream()" is actually unnecessary, as ObjectMapper can just take URL -- but if it didn't, one can open InputStream directly from URL, which sends request, takes response and so forth).

Code snippet just uses standard JDK URLConnection via URL, and a JSON library (Jackson in this case, but might as well be Gson, flex-json, whatever); and results in request being made, contents read, parsed and bound to an object of caller's choosing, either a Plain Old Java Object, or simple Map.

Given that it IS that simple, why was there assumption that something more was needed?

3. But often is

Above use case happens to be doable in quite concise form; but there are other tasks where Java equivalent ends up being either a call to a very specific library tailored to condense usage, or is much fluffier than equivalents in modern scripting languages. But I don't think this is the main reason for the universal appearance of Java's bloatedness, i.e. it is not just case of choosing a wrong example.

I think it is because most Java developers would actually write piece of code that spanned more than a dozen lines of code. Why? Either because:

  1. They didn't know JDK or libraries, and use much more cumbersome methods (case for less experienced developers)
  2. They actually understand complexities of the task, within context where task needs to be done.

First one is easy to understand: if you don't know your tools, you can't expect a good outcome. But second point needs more explanation.

Let's consider the same task of sending a request to a service that returns a JSON response that we need to return as an object. What possible additional things should we cover, beyond what one-liner did? Here's sampling of possible issues:

  • There is no error handling in code snippet: if there are transient problems with connection, it will just fail for good, regardless of type of problem there is
  • How about problems with service itself? Requesting unknown customer? Do we get an HTTP error response; different JSON or what?
  • Do we really want to wait for unspecified amount of time, if request can not be made? (TCP will try its damnedest to connect, so if there is an outage it'll be minutes before anything fails)
  • URL to connect to is fixed (and hard-coded), including parameters to send; should they really be hard-coded?
  • How is caching handled? What are connection details?
  • When there are failures, who is notified and how?
  • Are we happy with the default JDK URLConnection? It may not work all that well for some use cases (i.e. shouldn't we be using Apache httpclient or something?)

To cover such concerns for production systems, one probably would want much more complicated handling: possible retries for transient errors; definitely logging to indicate hard failures; way to handle error responses and indicate those to caller. Due to testing, end points being used are typically dynamically determined and passed; connection settings may need to be changed, and sometimes different parameters need to be sent. And for production systems we probably need more caching; whereas during testing we may want to disable any and all caching.

Since there are often many more aspects to cover, there is then a tendency to wrap all calls within helper objects or functionality; and if we did define something like "fetchJSONDataFromURL()", it surely would end up being more than a dozen lines of code. Yet calling functionality might still be no longer than a single Java statement.
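As a rough illustration of how quickly such a helper grows, here is a sketch that covers just two of the concerns listed above: explicit timeouts and a bounded number of retries. The class name, parameters and retry policy are all assumptions made up for this example; it treats every IOException as transient (which is too generous, since a missing resource would also be retried), logs to stderr instead of a real logger, and leaves JSON parsing, caching and error-response handling out entirely.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

final class Fetcher {
    /** Reads the whole response body once, with explicit connect and read timeouts. */
    static byte[] fetchOnce(URL url, int timeoutMillis) throws IOException {
        URLConnection conn = url.openConnection();
        conn.setConnectTimeout(timeoutMillis); // do not wait minutes for TCP to give up
        conn.setReadTimeout(timeoutMillis);
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }

    /** Tries up to maxAttempts times; rethrows the last failure if all attempts fail. */
    static byte[] fetchJSONDataFromURL(URL url, int timeoutMillis, int maxAttempts)
            throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return fetchOnce(url, timeoutMillis);
            } catch (IOException e) {
                last = e; // simplification: every I/O failure is considered retryable
                System.err.println("Attempt " + attempt + " of " + maxAttempts
                        + " failed for " + url + ": " + e);
            }
        }
        throw last;
    }
}
```

That is already around forty lines without doing anything clever, while the call site stays a single statement along the lines of "byte[] json = Fetcher.fetchJSONDataFromURL(url, 2000, 3);".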

So which one should we focus on? Helper method that is, say 50 lines long; or call to use it, which may be a one-liner? Former is what can be used to "prove" how bloated Java code is; yet it is written just once, whereas one-liners to use it are written ideally much more often.

By the way, above is not meant to say that it is ALWAYS necessary to handle all kinds of obscure error modes, or to create perfect system that is as efficient as possible. It is clearly not, and Java developers seem especially prone to over-complicating and -engineering solutions. But in other cases, happy-go-lucky approach (that I would claim is more common with "perl scripters") won't do. This is just a long way of saying that complexity of code should be based on actual requirements; and that those requirements vary widely.

4. Concise Java by Composition

I think my insight (if any) here is this: since Java, the language, offers relatively little in way of writing compact code, economical source code must come from proper use of libraries, as well as design of those libraries. Furthermore, I think many Java developers have started wrongly believing that Java code must be verbose; and that this makes perception more of a self-fulfilling prophecy. This means that to write compact Java code one absolutely MUST be familiar with libraries to use for things that JDK does not support well (or at all).

Wednesday, December 22, 2010

Experiments with advertising, Adsense vs Adbrite, experience so far

It has been a while -- almost 6 months, to be precise -- since I decided to see if there is more to on-line advertising than venerable Google AdSense. So it is time to see if I have learnt anything.

1. Summary: gain some, lose some

An overall verdict is pretty much inconclusive: I like some aspects (more control, less fluctuation with revenue); but from strictly monetary view point, change is mixed bag. Fortunately revenue we are talking about is in "trivial" range -- enough so that it does not round down to zero, but not enough to pay for hosting at current rates. So I can freely do whatever I want without risking losing any "real money". I might as well just gain StackOverflow credits.

2. Positive

Overall the main positive aspect for me is the feeling of empowerment: AdBrite gives more control for publisher, from controlling what to display to defining minimum bids, and even allowing fallbacks (typically, to, what else but AdSense!). I like this a lot, and would assume it is a non-trivial competitive advantage as well: whereas for me control is more of a nice-to-have, I know for a fact that bigger "serious" publishers REALLY want to have more control. This is most important for publishers with valuable brand to take care of.

Another smallish positive thing is that since most advertisements are cost-per-display (aka cost-per-thousand == CPM), and NOT cost-per-click (CPC), revenue stream is steadier. With AdSense, your revenue typically fluctuates wildly, unless you get lots of direct placements.

3. Negative

On downside, "guaranteed" revenue from CPM is not particularly high. In fact, little CPM that I have seen from AdSense for other sites (not this blog) is typically in the same range as what I can get from AdBrite for majority of views (AB does have wider range of CPMs, based on viewer profiling); and even if readers are often skimpy clickers, whenever there are clicks it is typically worth more than two or three thousand CPM views. So overall it is possible that AdSense might actually pay more, over time (with caveat from above that either way, very little money will change hands :-) ).

4. Other

Oh. One thing I was hoping to see was wider selection of interesting ads to display; not just same old, same old. This may or may not be true: I think that overall selection may be wider (just from looking at all the ads that get displayed, via publisher management console), but selection for individual profile is still rather limited. So I don't know if it's very different from what Google would give. I guess it makes sense, in a way, that algorithms tend to over-fit ads with (IMO) too little randomness. But I really would like some more variation, personally.

5. Conclusions

It has been a mildly interesting ride. Perhaps I should check out other potential choices? Too bad most alternatives seem to be just obnoxious irritating block-the-whole-page scams, or things that try to take over links, images; things that I would personally hate to see. I have no plans to introduce anything like that. But as usual, I am open to things that fit in well enough; something similar to AdSense or AdBrite ad systems, I guess.

Monday, December 13, 2010

Amazon Web Service (AWS), WikiLeaks: series of unfortunate events

As a current Amazon Web Service customer (as well as ex-employee of Amazon) I was sad to see reports of AWS mishandling of its WikiLeaks hosting.
My main objection is not regarding whether AWS should host the content or not, and I understand that due to self-service nature sometimes termination needs to occur after customer relationship has been initially established. But the way termination came about was a complete cluster and really makes me wonder if I want to continue using AWS or even recommend it to others.

As far as I understand, basic facts are that:

  1. WikiLeaks started hosting content with AWS
  2. AWS was contacted by Posturing and Angry US politician(s) who wants to fight WikiLeaks using intimidation tactics ("you are either with us, or you are with.... terrorists!"). Sort of like, you know, people who use terror as a weapon to further their agenda.
  3. Shortly afterwards AWS terminated hosting of said content citing "probable cause for copyright infringement", without actual request for doing so (pro-actively) -- essentially claiming WikiLeaks was "guilty until proven innocent", but without giving them a chance to present any proof.

Now: the way I see it, one of two things happened to effect step 3: either Amazon agreed to do what Lieber"man" et al asked (but lied about not having done that); or Amazon wanted to pro-actively tackle issue they knew would become problematic (opportunistic), using some suitable weasel-word section from the contract.

What should have happened is simple: AWS should have done nothing before officials presented them with a court order or valid cease-and-desist letter (or whatever equivalent is done for Patriot Act requests); and if that happened, publicly announce what they did and why. This is what other companies have done (Google, Yahoo). Or in cases of copyright infringement, similar demand by (alleged) copyright holder, accompanied with court order or whatever DMCA requires. One would think this would be easy for government to do as well for content it has produced.

So why did this not happen? Since I have no idea what sort of backchannel communication resulted in what happened, best I can do is speculate. My two favourite suggestions are that either someone called favors; or that some mid-level manager made a panic decision.

Painful, very painful to watch. It's as if someone gave themselves a wedgie just to prevent bullies from doing it...

Tuesday, October 12, 2010

Look back on "prioritizing Open Source projects" (from May 2010)

It has been more than 4 months since I wrote about my experiences with prioritization for Open Source projects, so it seems like good time to see how things have been moving.

Looks like there are two ways to look at things -- whether glass is half full or half empty -- as I have pretty much completed 50% of tasks; but not necessarily in order of priority. And this even though I publicly outlined priorities.

One positive thing is that the top entry (Java UUID Generator 3.) was just completed; and the second entry (Woodstox 4.1) is nicely in progress, to be completed within a month or two. On the other hand, other two completed tasks (both related to Jackson 1.6 that was completed a month ago) were entries listed as having the lowest priority. Some entries not on the list were also completed; specifically work with Async HTTP Client and OAuth signature calculation.

I guess I think this is reasonable outcome, as priority lists for my "hobby" development are there to help and assist, not to drive to specific business goals or to rein in my creativity. So even more important than getting things done in "right order" is that things do get done. So as long as more important or more urgent things are more likely to get worked on than less important or urgent things, overall efficiency remains brutally high, which is the way I like it. Finally, part of the reason for fluctuating order of execution is due to some tasks being more interesting than others; and working on "most interesting" things tends to maximize amount of progress (in contrast to working on less interesting but more highly prioritized things).

But to get some closure on this entry, let's consider this a completed 4-month Scrum and create an updated priority list. Here's what it might look like:

  1. Complete Woodstox 4.1 (XML Schema, other user requested features) -- carry-over from the original list
  2. Aalto 1.0: finalize async API, implementation
  3. Jackson 1.7: focus on extensibility (module registration, contextual serializers)
  4. ClassMate (1.0?) -- library for fully resolving generic types; based on Jackson code
  5. External version of Mr Bean (from Jackson 1.6)
  6. StaxMate 2.1? (from the original list)
  7. Tr13 1.0? (from the original list)
  8. ... and then re-consider

This is an incomplete list and I expect roughly similar completion rate if I was to look back again in 4 months. Maybe I should start doing quarterly project reviews just for fun. :-)

Tuesday, June 29, 2010

Experiments in advertising, here goes nothing (aka Welcome, AdBrite!)

Ok let's talk about something that is quite visible to you dear readers, but something that you have probably managed to ignore automatically. Yes, I am talking about those commercial decorations on margin of these pages. But please, don't change the channel quite yet. :-)

1. Advertising Changes... yay!

So what's up there? After being a very small AdSense publisher for few years, I figured that I might well retire before ever seeing another check for ads displayed on this blog; so it might be time to explore options: if not to get higher yields then at least maybe get more interesting ads. I also generally root for underdogs, and at this point Google is the ultimate uber-dog if there ever was one. So why not partner up with some other advertising puppies.

Given these loose goals about the only criteria for finding a replacement would be that it is not Google. And, well, ideally it should not be Apple, and preferably not Microsoft. But latter two are negotiable constraints (in fact, I am tempted to check out M$'s PubCenter; if for nothing else due to its catchy name!).

2. So... ?

But enough background discussion: in the end, I decided to change my ad provider from the big G to an unknown-before-about-a-week-ago company called AdBrite. Mostly because they topped this Handy List of Google Adsense Alternatives. And finally, as of today, I bothered to change blog templates for the change to take effect.

3. Can hardly contain my excitement <yawn>

At this point I am curious to see what kind of ads they might be pushing to my blog. I sort of wish it was something that lots of people found totally repugnant yet completely fascinating... but chances for that are probably low. We'll see -- maybe I need to cycle through a variety of ad sales networks before choosing my poison.

4. Commercial Proposal by Author

By the way, if anyone actually wants to advertise here -- buy a section for month-by-month advertising, selling something that relates to something I have written about -- let me know. I am open to bids and can show Google Analytics statistics for pricing, so you have a fair idea of what you'd get.
The only limit I will put is that the monthly ad space rental fee has to be a non-zero positive number in full US dollars. :-)
(you can consider it as the auction starting price)

Friday, April 09, 2010

Rock on Kohsuke!

The term "rock star programmer" is thrown around casually when discussing the best software developers. But as with music, true stars are few and far between. While knowing the lifestyle can help, you've got to have the chops, be able to influence and inspire others, and obviously deliver the goods to fill the stadiums, and data centers.

In the Java enterprise programming world there are few more worthy of being called a rock star than Kohsuke Kawaguchi. The list of projects he has single-handedly built is vast; the list of projects he has contributed to immense; and his coding speed mighty fast (as confirmed by his use of the term POTD, Project of the Day -- very, very few individuals write sizable systems literally in a day!). It all makes you wonder whether he is actually a mere human being at all (maybe he's the twin brother of Jon Skeet?!). For those not in the know, the list of things he has authored or contributed to contains such programming pearls as Multi-Schema Validator, Sun JAXB (v2) and JAX-WS implementations, Hudson, Maven, Glassfish, Xerces, Args4j, Com4j, and so on and on (for a more complete list, check out his profile at Ohloh; read and weep).

But to the point: it seems that Mr. Kawaguchi is now moving on from the sinking ship formerly known as Sun. This is not a sad thing per se (we all gotta move on at some point), nor unexpected -- a steady stream of Sun people leaving Oracle has been and will be going on for a while -- but it still feels strange. End of an era in a way; gradual shutting down of the Sun brand. Image of a lonely cowboy riding against Sun settings (pun intended) comes to mind.

Anyway: rock on Kohsuke, onnea & lycka till! I look forward to seeing exactly what awesomeness you will come up with next!

Friday, March 26, 2010

Welcome HP, So Long Dell (and don't let the door hit you in the ass on your way out)

(warning: this is another rant. Sorry!)

Here's another improvement in my daily life: after more than a year of space-shuttle-lift-off noise, short-but-brutish uptimes, and countless curses, family's lean old Dell XP "work"station is out for good. Its only agreeable attribute was its slim neat looks (and sort of neat mechanism used for case, allowing its easy opening -- too bad there's not much to do even if you can open it easily). Good riddance, music to my ears.

The replacement, HP Pavilion Slimline, actually looks every bit as good as its predecessor. But otherwise the two are polar opposites: new box is quiet, as reliable as expected (i.e., "just works"), and its only design flaw is that it comes with an OS written by a company based in Redmond. But that I can live with, since it's not my work machine. :-)
And even Windows seems to have improved a bit between versions (new one has Windows 7, previous one whatever preceded Vista).

Anyway: I just thought I'd share my distaste for Dell products (maybe I should actually include "not-so-good" lists on my semi-professional home page?) now that I am getting rid of them.

It all started a couple of years ago, when I decided to stop wasting my time on building my own PCs from components (which made sense after college, as it could save some money). I figured that with the time I spent building PCs, and then troubleshooting problems with components, it just didn't make a whole lot of sense. And so I thought I'd go with something that other customers in general had found usable: back then Dell had the highest customer ratings of all PC companies; and save for one friend of mine (who had already fought with Dell's phone "support" people, due to problems with memory chips that were failing, and that no amount of rebooting would ever fix), I wasn't aware of huge problems with the company or its products.

What I found out by experience makes me suspect that the company that had gotten good reviews had been abducted by aliens, and replaced by an ersatz replica or something. The correlation between happy customers and the company that produced the crap I bought just is not there. I am not talking about customer support (no point calling them wrt. a badly designed piece of hardware, IMO; it is not something a script-reading underpaid remote helper can help a lot with), but rather about quality of hardware. My experience beyond the PC fiasco was that their products are competitively priced, but have low quality. For example, the laser printer that I bought to replace a trusty old Apple writer (which, after having been bought second hand, served us for 8 years, for a total lifetime of probably 15 years; and would have worked well, but I couldn't find new toner cartridges for reasonable prices any more!) was inexpensive, and worked fine for a while. Like, maybe a year. And then broke down. The only things left are the LCD monitors, which I have to admit were reasonably priced, and still work. In fact both are still in active use. So I guess they do produce something other than lemons.

Thinking about that last sentence: I guess I could put my feelings into a fitting slogan: Dell -- General Motors of Computers.
Feel free to quote.

ps. I am happy to admit that after kicking that incompetent CEO of theirs out, HP seems to have made a nice comeback. Good for them, and us.

About me

  • I am known as Cowtowncoder
  • Contact me at@yahoo.com
Check my profile to learn more.