From the Annals of Leaky Abstractions

Last week I created a new Java package named “net.gredler.app.converter” during a bit of refactoring. I know, I know. Pretty impressive stuff. But there’s more.

If you’ve used Eclipse before, you know that it provides feedback as you type, alerting you if your package name is not valid as is. For example, if you type “net.gredler.app.”, Eclipse will helpfully throw up the following error:

Invalid package name. A package name cannot start or end with a dot.

Well, I eventually got to “net.gredler.app.con”, and received the following error message:

Invalid package name. con is an invalid name on this platform.

Weird, no? It turns out that there are some limitations on directory names in Windows: you can’t have directories named “con”, “prn”, “aux” or “nul”, among others.

Apparently these were reserved words in DOS back in the day, and this restriction has propagated to the latest versions of Windows in the name of backward compatibility.
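
For the curious, here is a minimal sketch of what such a check could look like, assuming Java 9 or later. The class and method names are made up for illustration, and this is not a claim about how Eclipse implements its validation. The COM and LPT entries are the rest of the Windows reserved device names, the "among others" mentioned above.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration; not part of Eclipse or the JDK.
final class WindowsReservedNames {

    // Device names that Windows refuses as file or directory names.
    private static final Set<String> RESERVED = Set.of(
        "CON", "PRN", "AUX", "NUL",
        "COM1", "COM2", "COM3", "COM4", "COM5", "COM6", "COM7", "COM8", "COM9",
        "LPT1", "LPT2", "LPT3", "LPT4", "LPT5", "LPT6", "LPT7", "LPT8", "LPT9");

    private WindowsReservedNames() {
    }

    /**
     * Returns the first segment of the package name that collides with a
     * reserved device name (compared case-insensitively), or null if none does.
     */
    static String firstReservedSegment(String packageName) {
        for (String segment : packageName.split("\\.")) {
            if (RESERVED.contains(segment.toUpperCase(Locale.ROOT))) {
                return segment;
            }
        }
        return null;
    }
}
```

Only whole segments are compared, so "net.gredler.app.con" is flagged while "net.gredler.app.converter" passes.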

So if you’re coding in Linux or Mac OS X and want to ensure that your Java web application isn’t deployable on Windows, adding a package named “con” ought to do the trick (assuming your servlet container explodes WARs) ;-)

Maven Enhancements to Keep an Eye On

MNG-3397: More concise POM syntax (more info here).
MNG-3379: Concurrent artifact resolution (more info here).
MNG-2315: Easy mass transitive dependency exclusions.
MNG-1977: Global transitive dependency exclusions.

Have I missed any?

HtmlUnit 2.2 Released

A new version of HtmlUnit, the Java headless browser, has been released. The main purpose of this library is to enable scalable, performant pure-Java integration testing of web applications. HtmlUnit can also be used to scrape the web, and drives a number of other open source libraries, including Canoo WebTest, WebDriver, Celerity, Schnell, and JSFUnit.

Highlights of changes incorporated in version 2.2 include:

- Better handling of ill-formed HTML.
- Enhancements in the areas of performance and memory usage.
- Enhanced API for dealing with attachments.
- Enhanced API for dealing with proxies.
- Use of a (temporary) forked version of Rhino to fix many JavaScript bugs.
- More than 80 bugfixes and enhancements overall.

Please see the changelog for more information.

HtmlUnit 2.2 is available via the central Maven repository, or may be downloaded directly here.

JavaScript Isn’t a Toy Language (Anymore)!

So… JavaScript. When did you begin to feel that this crufty, popup-enabling, slightly-better-than-VB programming language for the unwashed masses might actually merit a second look?

Was it the first time you used Google Maps and realized you were moving the map without reloading the entire page?

Was it when Sun decided to include Rhino in the JDK?

Or when you browsed the Dojo codebase and realized that Java doesn’t have a monopoly on obtuse, enterprisey, over-architected design?

No? Maybe you figured it out when 60+ companies got together and decided it was worth the effort to start the OpenAjax Alliance in order to formalize common sense best practices for JavaScript libraries.

I know! It was when you (and your mother, and your coworkers, and all of their extended families) read Steve Yegge’s NBL blog post.

Actually, maybe it was when you decided to add John Resig to your blogroll.

Me?

The other week I read this article by John, in which he mentions the big O performance characteristics of a certain JavaScript benchmark. It doesn’t matter what benchmark, just focus on the important part here: big O. In an article on JavaScript. Big O. And JavaScript. Big O. JavaScript. And nary a raised eyebrow among the comments; almost the complete opposite, actually!

It’s almost like it’s respectable, or som’n.

URL.hashCode() Considered Harmful

I just cut HtmlUnit’s build time by about 20% by changing four lines of code. How? HtmlUnit keeps a small cache of web requests in a HashMap, keyed on the request URL. The problem with this is twofold:

1. The URL.hashCode() method is synchronized.
2. The URL.hashCode() method triggers DNS lookups for the URL hosts.

The impact of item 2 was magnified by the fact that some of the HtmlUnit unit tests use a mock web connection to connect to fake URLs. DNS (non)resolution of these fake URLs took an especially long time.

The fix was to key the map entries on the value of URL.toString() instead. Apparently I’m not the first person to stumble across this problem. So think twice before coding your next HashMap<URL, XXX> ;-)
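
A rough sketch of the idea, for illustration only: this is not HtmlUnit's actual cache class, and the names (ResponseCache, the String body) are placeholders I made up.

```java
import java.net.URL;
import java.util.HashMap;
import java.util.Map;

// Hypothetical cache for illustration; not HtmlUnit's real implementation.
final class ResponseCache {

    // Keyed on the URL's string form. String.hashCode() is plain computation,
    // so no host name resolution happens on put() or get().
    private final Map<String, String> entries = new HashMap<>();

    void put(URL url, String body) {
        entries.put(url.toString(), body);
    }

    /** Returns the cached body, or null if the URL has not been cached. */
    String get(URL url) {
        return entries.get(url.toString());
    }

    int size() {
        return entries.size();
    }
}
```

One trade-off to be aware of: URL.equals() treats two hosts as equal if they resolve to the same IP address, while string keys only match when the text is identical. For a request cache that is probably what you want anyway.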

HtmlUnit 2.1 Released

The HtmlUnit team is pleased to announce a new release of HtmlUnit. This latest version includes a number of bug fixes and performance enhancements, and sports excellent support for GWT, jQuery and Sarissa, decent support for Prototype and Dojo, and basic support for YUI. Please see the changelog for more details.

In related news, we’ve (temporarily) forked the Rhino JavaScript engine in order to add browser-compatible JavaScript behavior which is slowly making its way into the Rhino project proper. The most important of these changes (so far) is definition-order property iteration. All of this should be available in the next version; many thanks to Marc Guillemot for his work in this area.

Anyway, give it a whirl and let us know what you think!

Thomas Paine on Software Design

I draw my idea of the form of government software from a principle in nature which no art can overturn, viz. that the more simple any thing is, the less liable it is to be disordered, and the easier repaired when disordered;…

Thomas Paine, Common Sense (1776)

Hibernate: Trouble in Paradise

I’ve written before about the problems which seem to crop up when you introduce Hibernate into your project. Unfortunately, I’m becoming more and more convinced that these are not isolated issues. Hibernate has a dependency management problem.

Things were not always this way. Hibernate was once independent, unbeholden to outside influences. Sure, you’ve always needed a somewhat-thicker-than-average skin in order to file bugs or ask questions in the forums. An extra helping of patience never hurt, either. But nowadays it seems like you need all of these things, plus the time and skill necessary to patch a JAR or two.

We recently upgraded to the latest Hibernate production JARs: Hibernate Core 3.2.6.GA, Hibernate Annotations 3.3.0.GA and Hibernate EntityManager 3.3.1.GA. Having already benefited from our earlier experience regarding conflicts between Spring and Hibernate, I figured this would take about five minutes: modify our root Maven2 POM, do a full build, verify everything works. Done! Not quite.

The first problem we encountered, reported 4 months ago, is that the production Hibernate EntityManager POM excludes a transitive dependency which it should not exclude. Net result? You have to hack together a custom version of Hibernate EntityManager which does not exclude this transitive dependency. Of course, this transitive dependency exists in the JBoss Maven2 repo, but not in the central repo. Nice.

Next, if you’re using a standard Maven2 Windows installation, you’ll run into this bug, because Hibernate now refuses to load JARs from directories with spaces in them (the standard location for Maven2 local repositories in Windows is in “Documents and Settings”). Very nice. Welcome to 1998!

You may notice a trend here: an old ASM dependency which causes conflicts with other libraries, a required dependency that is mistakenly excluded, and a dependency on a deprecated class in a buggy JBoss utility library.

Two of my co-workers are already suggesting we switch to TopLink. Jokingly, of course. For now. Napoleon is reputed to have said that “if they want peace, nations should avoid the pin-pricks which precede cannon shots.” Third-party libraries should likewise avoid annoying their users with irritating minutiae, or they may find these users mobilizing the artillery.

Space vs Time

A long, long time ago I took a college course, the title of which was Languages and Translation. The content of this sophomore-level course? A smörgåsbord of systems programming, heaps and stacks, pointers, *nix system calls, compilers, lex and yacc, grammars, lexical and semantic analysis, code optimization, and data representation, all taught and learned in C. While learning C. Oh, and Lisp.

This course made quite an impression, mainly because of my initial inexperience. I should explain that the path which led me to L&T went something like this:

1. I’m a senior in high school. I’m applying to college. I need to choose a major. Hmm… writing that Blackjack game on my TI calculator was pretty cool. It was a whole 50 lines of code! Plus I’m in that typing class, closing in on 20 words per minute. Maybe I’ll try Computer Science!
2. I’m a freshman at Georgia Tech. First semester. I’m taking Intro to Computing. Man, this HTML nonsense sure uses a lot of brackets! And this Microsoft Access program is impossible to use!
3. Still a freshman at Tech. Second semester. This Intro to Programming class is pretty crazy! We’re using Java for the assignments. The TA mentioned in passing that there are no pointers in Java, but I have no idea what a pointer is, so I could care less. I’m beginning to grok object-oriented programming.
4. Welcome to Languages and Translation! Malloc, Malloc, Malloc! Realloc, Calloc, Malloc! Bwahahahahahaha!

Psychological damage aside, this was a great class. Jim Greenlee, who taught the course, was both an evil bastard and a great teacher. One of the tenets of code optimization which he often highlighted was “space versus time,” the idea that you can often optimize for one at the expense of the other, but rarely for both at the same time.

For example, a compiler can decide to inline a short function in order to avoid time-consuming stack allocations, but the compiled program will be larger (less time, but more space). Of course, if space (memory) is at a premium, your compiler might instead try to recognize common code sequences and hoist them into artificial functions (less space, but more time).

Flash forward 8 years. We have a client/server application at work which uses DTOs to transfer data to and from the client, and we use Hibernate on the server to persist our BOs. A specific server call, invoked in the presence of a large amount of data, brings the application to its knees.

Immediately we jumped to conclusions — Arrgh! Hibernate is such a hog! If you don’t code things perfectly, you can’t scale! And sometimes not even then! A quick profiling session confirmed our fears. Three hours later we had bypassed Hibernate in this specific instance, coding to the JDBC API instead. Unfortunately, this wasn’t the last of our performance problems. A second profiling session indicated that we had another bottleneck in our DTO-to-BO conversion routines!

Now, something which must be understood about Hibernate’s collection semantics is that when you use Hibernate to load BO A, which has an X-to-N relationship with BO B, you should (usually) use the collection of Bs provided by Hibernate. For example, if you use Hibernate to load a UserGroup, and you want to modify the list of Users associated with said UserGroup, you should modify the existing list of Users. You should not create a new list, add Users to it, and then give the UserGroup the new list of Users. Why? Because creating a new list results in a one-shot delete, followed by N insert statements. This is usually not desired.

However, a naive approach to modifying the collection provided by Hibernate (clearing it and then adding the BOs which you know you want) is just as bad, because a call to collection.clear() also results in a one-shot delete. The best approach is one of minimal modification to the existing collection.

In the case of DTO-to-BO conversion, where the DTO representation of an object is being transferred to the corresponding BO representation, this means adding items to the BO’s collection that are in the DTO’s collection but not in the BO’s collection, and removing items from the BO’s collection that are in the BO’s collection but not in the DTO’s collection. Elements that are in both collections are simply ignored.

The obvious implementation of this algorithm looks something like this:

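// Add the items that are in the DTO's collection but not in the BO's.
for (DTO dto : dtos) {
    if (!contains(bos, dto)) add(bos, dto);
}

// Remove the items that are in the BO's collection but not in the DTO's.
// Uses an explicit java.util.Iterator: removing from bos inside a for-each
// loop over bos would throw ConcurrentModificationException.
for (Iterator<BO> it = bos.iterator(); it.hasNext(); ) {
    if (!contains(dtos, it.next())) it.remove();
}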

Unfortunately, when using lists, the contains() calls above hide a nested loop, resulting in O(n²) performance. Once n gets into the thousands, things start to get sluuuuggish. Veeeeery sluuuuggish.

The solution? Trade a little space for a lot of time! By constructing HashMaps which contain all of the BOs in the lists, keyed on business keys which uniquely identify the BOs, the contains() calls above can be performed in constant time by invoking map.containsKey(). The result is O(n) performance. Much better!
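
Here is a minimal sketch of the map-based version, assuming Java 8 or later. It is not our production code: the class name, the generic signature and the three function parameters (business key of a BO, business key of a DTO, DTO-to-BO conversion) are all placeholders I made up for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper for illustration only.
final class CollectionSync {

    private CollectionSync() {
    }

    /**
     * Modifies bos in place so that it matches dtos, touching only the
     * elements that differ. Elements present in both lists are left alone.
     */
    static <B, D, K> void sync(List<B> bos, List<D> dtos,
                               Function<B, K> boKey,
                               Function<D, K> dtoKey,
                               Function<D, B> toBo) {
        // Spend O(n) space on two indexes keyed on the business key...
        Map<K, B> bosByKey = new HashMap<>();
        for (B bo : bos) {
            bosByKey.put(boKey.apply(bo), bo);
        }
        Map<K, D> dtosByKey = new HashMap<>();
        for (D dto : dtos) {
            dtosByKey.put(dtoKey.apply(dto), dto);
        }

        // ...so that each membership test is a constant-time containsKey().
        // Remove BOs that no longer have a matching DTO.
        bos.removeIf(bo -> !dtosByKey.containsKey(boKey.apply(bo)));

        // Add BOs for DTOs that had no match in the original BO list.
        for (D dto : dtos) {
            if (!bosByKey.containsKey(dtoKey.apply(dto))) {
                bos.add(toBo.apply(dto));
            }
        }
    }
}
```

The sketch assumes business keys are unique within each list and have sane equals() and hashCode() implementations. Whether removeIf() plays nicely with a particular persistent collection implementation is something to verify rather than take from this sketch.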

Java Remoting: Protocol Benchmarks

I’ve been analyzing Java remoting protocols at work over the past couple of days, and thought I’d share some of the results here. Specifically, I’m going to share the results of our protocol benchmarking.

Every application will have different requirements in this area, but most criteria will include performance. At the very least, you will want to know how much (if any) performance you are sacrificing for the sake of your other requirements.

The Protocols

The protocols under consideration are Java’s RMI/JRMP, Oracle’s ORMI (with and without HTTP tunneling enabled), Spring’s HttpInvoker, Caucho’s Hessian and Burlap, and three flavors of Apache XML-RPC (Sun-based, HttpClient-based and Lite-based).

RMI/JRMP is Java’s default binary remoting protocol, and its big advantage is that it supports full object serialization of Serializable and Externalizable objects. RMI/JRMP has historically not been as easy to use as some of the other options, but if you’re using Spring (as we are), it’s as easy to use as anything else (except Apache XML-RPC; see below).

ORMI is OC4J’s EJB remoting protocol, which we’re currently using in production. This protocol was originally developed as part of the Orion application server, whose codebase formed the basis for OC4J. The only differentiator between ORMI and RMI/JRMP is likely to be performance. Of course, this protocol probably won’t be an option unless your serverside component is deployed on OC4J. ORMI supports tunneling via HTTP and HTTPS, should you run into firewall or proxy problems.

Spring’s HttpInvoker is another Java-to-Java binary remoting protocol. It’s also very easy to use and supports full object serialization of Serializable and Externalizable objects. The big difference between HttpInvoker and RMI/JRMP is that HttpInvoker transports data via HTTP, which makes it easier to use when you start running into proxy and firewall issues. Unfortunately, HttpInvoker isn’t an option if you’re not using Spring on both the server and the client.

Caucho’s Hessian protocol is a slim, binary cross-platform remoting protocol. Most cross-platform protocols are XML-based, and thus sacrifice a significant amount of performance in order to achieve interoperability. Hessian’s draw is that it achieves cross-platform interoperability with minimal performance degradation. However, Hessian uses a custom reflection-based serialization mechanism which can be troublesome in certain scenarios (think Hibernate proxies).

Currently in a draft state, Hessian 2 is the second incarnation of the Hessian protocol. You probably don’t want to use it in your production applications quite yet, but it looks very promising. Of course, it’s so easy to switch between Hessian and Hessian 2 that you might want to have a peek just for the heck of it [1].

As far as I can tell, Caucho’s Burlap is essentially Hessian-in-XML. As such, Burlap is not a binary protocol and should be expected to suffer accordingly in the performance department. This being the case, I haven’t been able to figure out exactly why anyone would choose to use Burlap instead of Hessian… aside from a requirement from the marketing department that your RPC be buzzword compliant.

Apache XML-RPC is Apache’s implementation of XML-RPC, an XML-based RPC specification which focuses on interoperability. Apache XML-RPC may be a bit more difficult to set up than the other options if you’re using Spring. The infrastructure for the other protocols is built into the Spring framework, but you’ll have to write the Apache XML-RPC / Spring integration yourself.

Apache XML-RPC supports a number of transport factories under the covers. The three transport factories tested were the default transport factory (based on Sun’s HttpURLConnection), the HttpClient transport factory (based on HttpClient) and the Lite transport factory (based on an Apache XML-RPC minimal HTTP client implementation).

The Configuration

The test consisted of remote invocations to a single method which takes an integer size parameter and returns a list of the specified size. The objects in the returned list each contained 10 instance variables (4 strings of about 40 characters each, 2 dates, 1 long, 1 integer, 1 double and 1 float, none of which were null). The lists returned were static, so there was no list construction overhead.

The results of the first 10 invocations were thrown away, in order to allow the VMs to “warm up.” The next 100 invocations were tallied and averaged.
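
The measurement loop boils down to something like the following sketch. This is not the actual test harness: the class name is made up, the remote invocation is represented by a plain Runnable, and the use of System.nanoTime() is an assumption for the sake of the example.

```java
// Hypothetical harness for illustration; the remote call is passed in as a Runnable.
final class RemotingBenchmark {

    static final int WARMUP_INVOCATIONS = 10;
    static final int MEASURED_INVOCATIONS = 100;

    private RemotingBenchmark() {
    }

    /** Returns the average time per measured invocation, in milliseconds. */
    static double averageMillis(Runnable invocation) {
        // Throw away the first invocations so the VMs can warm up.
        for (int i = 0; i < WARMUP_INVOCATIONS; i++) {
            invocation.run();
        }

        // Tally the remaining invocations and average them.
        long start = System.nanoTime();
        for (int i = 0; i < MEASURED_INVOCATIONS; i++) {
            invocation.run();
        }
        long elapsedNanos = System.nanoTime() - start;

        return elapsedNanos / (MEASURED_INVOCATIONS * 1_000_000.0);
    }
}
```

In use, the Runnable would wrap a call on the Spring-provided proxy for whichever protocol is being measured, with the list size fixed for that run.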

The tests were performed on a Windows XP SP2 machine with 2 GB of RAM and a 2 GHz Core 2 Duo CPU, using Sun’s JVM 1.5.0_09. The client and server were run in two separate VMs on the same host, in order to avoid spurious networking artifacts [2].

The following library versions were used: Spring 2.0.6, Jetty 6.1.5, OC4J 10.1.3.2.0, slf4j 1.4.3, log4j 1.2.14, Apache XML-RPC 3.1, Commons HttpClient 3.1, and Caucho Hessian 3.1.3.

Spring was used on both the clientside and on the serverside in order to hide the remoting details from the interface consumer and implementor. All serverside components were run inside a single Jetty servlet container, except for the ORMI serverside testing components which were run in OC4J.

Vendor extensions and GZIP requesting were enabled for all Apache XML-RPC tests. Streaming and GZIP compressing were enabled for all Apache XML-RPC tests except the Lite tests (the Lite transport factory does not support these options). These configurations represent the best performance optimizations available to the various Apache XML-RPC flavors.

The Results

The following graph illustrates the response times of the various protocols for invocations returning smaller lists:

[Image: protocol-benchmark-smaller-lists-3.png]

The following graph illustrates the response times of the various protocols for invocations returning larger lists:

[Image: protocol-benchmark-larger-lists-3.png]

Conclusion

It’s important to note that no general conclusions can be derived from the absolute numbers represented above. Rather, the numbers must be examined relative to each other.

That said, the following observations may be made:

- The binary protocols (RMI, ORMI, HttpInvoker and Hessian) are always faster than the XML-based protocols (Burlap and the Apache XML-RPC variants), except for ORMI with HTTP tunneling enabled.
- Performance is pretty even amongst the binary protocols, except for Hessian, which performs well only when compared to the XML-based protocols, and ORMI with HTTP tunneling enabled, which performs on a par with the XML-RPC variants.
- Burlap has much better performance than the XML-RPC variants.
- Native RMI and Hessian 2 have the best performance until the remote method invocations start returning larger lists, at which point vanilla ORMI takes a slight lead.
- Changing Apache XML-RPC’s transport factory does not seem to have a very large effect on performance.
- It’s amazing how fast standard ORMI is, compared to ORMI with HTTP tunneling enabled.
- Hessian 2 bears watching!

Other Interesting Links

JBoss Remoting Framework Benchmarks: Tom Elrod’s benchmark of the JBoss Remoting framework. Includes part of the Spring Remoting framework.

Nominet’s Protocol Benchmarks: Interesting benchmark of many of the protocols here considered. Intriguingly, HttpInvoker is found to have consistently better performance than RMI/JRMP, a finding which contradicts our results.

Footnotes

[1] If you’re using Spring’s Hessian integration and the latest Hessian JARs, moving to Hessian 2 is as easy as adding the following property to your HessianProxyFactoryBean:

    <property name="proxyFactory">
        <bean class="com.caucho.hessian.client.HessianProxyFactory">
            <property name="hessian2Request" value="true" />
            <property name="hessian2Reply" value="true" />
        </bean>
    </property>

[2] The tests were later run on two separate machines over a controlled network, in order to verify that the local tests are representative; they are.
