Tuesday, July 25, 2006

Maximum TPS (c10k!)

With all the benchmarks testing how many Transactions Per Second a web service can crank through, it is sometimes easy to forget that serving web service requests should fundamentally be quite simple and fast. Even with today's multi-gigahertz number-crunching beasts of CPUs, the numbers quoted for SOAP-based web services can apparently still be expressed with just 2 digits ("we improved SOAP TPS from 20 to 80! breakthrough in efficiency!" [yes, I should dig up the reference]). This is pathetic.

So why do I think it's pathetic that getting up to 100 TPS was considered a huge achievement? What would be a more reasonable baseline performance to expect? I set out to establish some sort of baseline with a very simple experiment: writing a web service (in the basic sense of the term: a server that serves HTTP GET requests) that:

  1. Is based on Servlet API, and runs on a servlet container (Tomcat in this case)
  2. Serves basic HTTP GET requests
  3. Returns very simple (non-xml) payload, along with a result code.

In addition to the dead simple Tomcat-deployed service, I obviously also need a matching simple client: one that can run multiple threads that send simple requests and verify responses.
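The whole set-up can be sketched in miniature with plain JDK classes. To be clear, this is not the actual benchmark code: it uses a raw socket loop instead of a Servlet deployed on Tomcat so that the example is self-contained, and the thread and request counts are illustrative.

```java
import java.io.*;
import java.net.*;
import java.util.concurrent.atomic.AtomicInteger;

public class ToyBenchmark {

    // Toy server: accepts connections one at a time, reads the request
    // headers, and answers every request with a short static body.
    static void serve(ServerSocket ss, int requestCount) throws IOException {
        byte[] body = "OK".getBytes("US-ASCII");
        byte[] headers = ("HTTP/1.1 200 OK\r\nContent-Length: " + body.length
                + "\r\nConnection: close\r\n\r\n").getBytes("US-ASCII");
        for (int i = 0; i < requestCount; ++i) {
            Socket s = ss.accept();
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(s.getInputStream(), "US-ASCII"));
            String line;
            // consume request line and headers, up to the empty line
            while ((line = in.readLine()) != null && line.length() > 0) { }
            OutputStream out = s.getOutputStream();
            out.write(headers);
            out.write(body);
            out.flush();
            s.close();
        }
    }

    // Runs 'threads' client threads, each sending 'perThread' GET requests;
    // returns the number of 200 responses received.
    static int runBenchmark(final int threads, final int perThread) throws Exception {
        final ServerSocket ss = new ServerSocket(0); // ephemeral local port
        final int port = ss.getLocalPort();
        Thread server = new Thread(new Runnable() {
            public void run() {
                try { serve(ss, threads * perThread); }
                catch (IOException e) { throw new RuntimeException(e); }
            }
        });
        server.start();
        final AtomicInteger ok = new AtomicInteger();
        Thread[] clients = new Thread[threads];
        for (int i = 0; i < threads; ++i) {
            clients[i] = new Thread(new Runnable() {
                public void run() {
                    try {
                        for (int r = 0; r < perThread; ++r) {
                            HttpURLConnection conn = (HttpURLConnection)
                                new URL("http://127.0.0.1:" + port + "/ping").openConnection();
                            if (conn.getResponseCode() == 200) ok.incrementAndGet();
                            conn.getInputStream().close();
                        }
                    } catch (IOException e) { throw new RuntimeException(e); }
                }
            });
            clients[i].start();
        }
        for (Thread t : clients) t.join();
        server.join();
        ss.close();
        return ok.get();
    }

    public static void main(String[] args) throws Exception {
        long start = System.nanoTime();
        int ok = runBenchmark(4, 25);
        double secs = (System.nanoTime() - start) / 1e9;
        System.out.printf("%d requests in %.2fs (%.0f req/s)%n", ok, secs, ok / secs);
    }
}
```

Note that this toy closes the connection after every request; the real numbers below depend on connection reuse, which is exactly why the pipelining observation matters.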

Running the server on a single-CPU 2 GHz Intel Xeon box, and a 10-thread client on another similar box, over LAN, results in throughput of slightly over 4000 (!) requests per second. Yes, that's right: FOUR THOUSAND. And CPU load on the server is light, at around 15%. Adding another client box results in an almost linear increase: almost 7500 TPS, with slightly below 50% CPU utilization. A third one can boost this up to almost 9000 TPS. So with a little tweaking, such a toy service could conceivably serve up to 10k requests per second, with Plain Old Http (although, it's worth noting, this does require HTTP 1.1 pipelining -- without pipelining one gets less than 50% of the throughput).

Now, the service as tested is but a toy: it serves a short static string as a response and does not even do any logging. So what about the simplest possible POX service: an xml response to HTTP GET? With a set-up similar to the first case, but returning a payload of almost 4 kB of XML (a list of web service methods along with arguments and descriptions, as a simple xml structure), and with full request-level logging, 3 multi-threaded clients can get throughput of slightly over 3000 requests per second on the same hardware. And this with no funky optimization, just basic Woodstox XMLStreamWriters wrapped in the StaxMate output framework. The xml generation itself is dynamic, starting from a configured list of services. This example service method was actually cut from a real web service: it will be interesting to see how the actual 'active' service methods (ones that need to access other backend services) will fare.
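As a sketch of what that response generation looks like, here is the same idea using the standard StAX XMLStreamWriter API directly (Woodstox is a StAX implementation; StaxMate is a convenience layer on top). The element names and method list here are made up for illustration, not taken from the actual service:

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class ServiceListWriter {

    // Streams a small "list of service methods" document element by
    // element, with no in-memory tree (DOM) ever built.
    public static String writeServiceList(String[][] methods) throws XMLStreamException {
        StringWriter sw = new StringWriter();
        // With Woodstox on the classpath, newInstance() would return its
        // implementation; the code itself is implementation-neutral.
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(sw);
        w.writeStartDocument();
        w.writeStartElement("methods");
        for (String[] m : methods) {
            w.writeStartElement("method");
            w.writeAttribute("name", m[0]);
            w.writeStartElement("description");
            w.writeCharacters(m[1]);
            w.writeEndElement(); // description
            w.writeEndElement(); // method
        }
        w.writeEndElement(); // methods
        w.writeEndDocument();
        w.close();
        return sw.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(writeServiceList(new String[][] {
            { "getUser", "Fetch a user record by id" },
            { "listUsers", "List all known users" },
        }));
    }
}
```

In a servlet, the writer would be constructed over the response's output stream instead of a StringWriter, so the payload is streamed out as it is generated.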

At this point, the trend should be clear: add more things, and get lower throughput. But the level of throughput is still an order of magnitude above what many consider the maximum throughput obtainable. So what happens between fully static service invocations (like the ones mentioned above) and more dynamic ones, to slow things down by a factor of 20? What kind of catastrophe can lead to such scalability degradation? One significant problem is that if the service itself has to call other services (or make database queries) -- something that usually happens -- latency of requests increases on the server side, and so does the number of threads needed to serve parallel requests (due to the Servlet processing model). So even if the request itself is heavily I/O-bound (just waiting for another service to reply), thread scheduling overhead starts mounting, and uses CPU. And this is where projects like AsyncWeb start to matter a lot... but like they say, that's another story. ;-)
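The thread-count pressure can be quantified with Little's law: the number of requests in flight equals arrival rate times latency, and in the blocking Servlet model each in-flight request pins one thread. A back-of-the-envelope sketch, with illustrative numbers of my own choosing:

```java
public class ThreadEstimate {

    // Little's law: requests in flight = arrival rate * latency. With the
    // blocking Servlet model, each in-flight request pins one thread.
    public static int threadsNeeded(double requestsPerSecond, double latencySeconds) {
        return (int) Math.ceil(requestsPerSecond * latencySeconds);
    }

    public static void main(String[] args) {
        // Illustrative numbers: a ~1 ms in-process request vs. one that
        // spends 200 ms waiting on a backend call, both at 3000 req/s.
        System.out.println("in-process (1 ms):      " + threadsNeeded(3000, 0.001) + " threads");
        System.out.println("backend-bound (200 ms): " + threadsNeeded(3000, 0.200) + " threads");
    }
}
```

Going from a handful of threads to hundreds is where scheduling and context-switching overhead starts eating the CPU, even though each individual request does almost no work.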

Anyway: as simple and naive as the examples given above are, they should provide food for thought about what "optimal performance" really means, and what should be achievable. For me it's clear that I want my services to serve "a thousand or more" requests per second. One hundred is just... well, so late-90s. :-)

"Wish I was there" -- OSCon 2006 in Portland

Tis once again the season for OSCon. I wish I was there, since the content is of particular interest this year: there are multiple interesting sessions about efficient XML processing, in addition to other generally interesting sessions.

Of particular interest to me was "Building a High Performance XML Router with AsyncWeb and XFire", both because Woodstox powers XFire and because AsyncWeb is a very interesting (and relatively new) piece of server-side technology. Whereas XFire is a second generation SOAP processor, AsyncWeb can be viewed as a second generation Servlet-like container (hopefully also influencing the design of the next generation of the standard Servlet API itself?). Combining the two should allow significantly more efficient processing of xml messages, both for implementing web services and for building infrastructure that routes those messages. Realistically, the processing speed of a single web service box should be counted in thousands, not in dozens as seems to be the case with first generation systems (ones using an in-memory DOM model, such as Axis 1, with standard servlet containers). So this marriage of "best of breed" components seems very interesting. Stay tuned!

So here is hoping that we will see more convergence with "next generation" xml and message processing tools. Perhaps it will even be possible to get back to high processing speeds, partially lost to naive request/reply processing (compared to dedicated connections of, say, CORBA), and partially to heavy-weight text processing (xml parsing) tools. It is only with such efficiency that the increased scalability and higher modularity of Service-Oriented systems really start to pay off. Up until now it has too often been just an expensive and unwieldy architectural experiment.

Saturday, July 22, 2006

Woodstox 3.0rc2 released

(editor's note: cowtown bloggers have been enjoying the unusually warm weather of the Pacific US northwest -- apologies for the lack of news, we'll try to round 'em up and write for y'all's reading pleasure!)

Another release candidate of the Woodstox XML processor was released yesterday. Since the goal now is just to solidify the release, there are no big changes. However, all bugs reported against the first release candidate were resolved. Some improvements were also made to the build process, so that pom files for Maven should now be properly generated, as requested. For a detailed list of changes, check the changelog.

With a limited number of bug reports (and none that were regressions), here's hoping that the final 3.0 release can be done in the next few weeks. The 3.0 release itself will (when finalized) also mean that some other dependent/related projects can proceed. For example:

  • StaxMate project could release its 1.0 version. It has been slow going (although I prefer "slow cooked" over "stagnant"...), but at least the API and feature set are getting quite stable.
  • Woodstox feature set could be cleaned up for 4.0. Some of planned changes are:
    • Remove support for suppressing linefeed normalization. It doesn't seem very useful, but requires quite a bit of internal coding to work.
    • Move base JDK requirement from 1.2 to 1.4. This will simplify code, since no work-arounds are needed for features not found in 1.2 or 1.3 (LinkedHashMap and such).
  • I could spend more time documenting new 3.0 features like the pluggable bi-directional validation system (todo: blog about "How do I validate XML output against an RNG schema"!)


