Geertjan Wielenga is the project owner of the open sourced JFugue Music NotePad. He is based in Prague, in the Czech Republic, where he works for the NetBeans Team, principally as a technical writer providing tutorials for the NetBeans Platform. He has worked as a technical writer since 1996, is Dutch and grew up in South Africa.
Kevin Bourrillion and Jared Levy are the two primary creators of the Google Collections Library, which aims to provide an extension to the Java Collections Framework. That's a pretty new and exciting development in the Java world. Time to find out more! Below, they discuss what the library is all about, its genesis, and how it will be useful to you.
Hi, Kevin and Jared. Can you briefly introduce each other?
Kevin: "About two years ago I had sent an internal Google announcement about my work on HashMultiset, which I'd just finished. I'd never met Jared, but he replied out of the blue and said he'd just go ahead and code up some Multimaps for us as a '20% project'. He'd been here about a month at the time, and I didn't know him from Adam, so I sent him a reply along the lines of 'Heh, heh, that's nice, go ahead and try, but you'll see, these things are a lot harder than they seem!' Oops. Turns out he was equal to the challenge, and in fact, by the time I came back to work from my paternity leave, Jared had even found and fixed about 97 bugs in Multiset and BiMap! As far as his background, I know he came from startups like I did, and he's a physicist by trade, but honestly, I don't have a clue where he came by this Collections mojo of his!"
Jared: "Kevin's the driving force behind Google's Java core libraries, which include lots of neat stuff besides collections, such as the Guice dependency injection framework. If it weren't for his enthusiasm, API design skills, coding ability, and willingness to do the grunge work, the Collections Library would never have been created. Plus, he's a great guy to work with. When I started at Google, I had no idea what side-project I might want to devote my '20% project' time to. A multimap class seemed like an interesting coding exercise that would help me learn generics. With Kevin's encouragement and feedback, I've stayed on the project for the last two years, writing libraries of much higher caliber than anything I've done before. Previously, Kevin did some important work on AdWords, but I'm not familiar with the details."
Kevin: "Yeah, until quite recently I was the lead engineer for billing and payments features for adwords.google.com. Laying the ground work for the Google Collections really had to fit into my 20% time while I was focusing on how to scale AdWords from supporting three forms of payment to the dozens that we accept today. You might not think so, but that was really fun work too. The people on that team are just beyond awesome. More relevant to your readers, the passion we all shared on that team for making incredibly audacious refactorings to the codebase we'd inherited is a huge part of what led to Guice being designed the way it is."
The main reason we're having this discussion is to find out about the Google Collections library. What problem does this library want to solve for us?
Kevin: "The same basic problems that the Java Collections Framework solves, just taken to the next level. Additional abstractions to more closely fit your needs, new data structures that speed up your processing, conveniences that wage war on boilerplate code everywhere. Jared and I both love to play board games and, when there's a board game that's successful, the publisher inevitably comes back and releases an 'expansion' for that game. With the expansion, you can still play the original game you already love, but you can do a little more as well -- sail over to the neighboring islands, or go build your train lines in Switzerland instead, etc. So we'd love for you to see our library as 'the unofficial expansion to the Java Collections Framework'! Whether you feel we do or don't measure up to that standard, it's the mindset we have had since the beginning. We find the gaps in java.util and we fill those in; we pick up where the JDK leaves off. Most importantly, we conform to the principles and design choices in the Java Collections as much as possible, only deviating when there's a very clear win."
Jared: "The main benefit of Google Collections is convenience for the developer. If you can write an application making use of the library, you could probably write that same application without it."
Kevin: "But just because you can do something, doesn't necessarily mean that you should!"
Jared: "Right. The library's functionality simplifies your code so it's easier to write, read, and maintain. Some library methods make commonly used one-line commands, such as creating a HashSet of a particular generic type, more concise. Other new collection classes, such as a multimap, multiset, or bimap, are helpful in certain situations. While you can survive without such features, the Google Collections Library will improve your productivity as a developer, while reducing the amount of boilerplate low-level code you need to write."
Kevin: "And don't forget, code you don't need to write is code you don't need to test. We have some issues to sort out first, but all of our unit tests will be pushed out to subversion so you can see for yourself what you think of how well-tested our code is."
What is unique about your approach? How does it differ from, for example, Apache Commons Collections?
Kevin: "Well, thank God for the Apache Commons. We'd all be in bad shape without libraries like this. That said, sadly that particular project has stalled, in a pre-generics world. They do want to adopt generics, but they recognize that this would involve a pretty nontrivial and incompatible rewrite. So far, no one seems to be actively driving such an effort. At Google we've been using Java 5 company-wide since the spring of 2005. A collections library being ungenerified was a deal-breaker for us, because we really hate getting compiler warnings. I was also concerned about the many places in which the Apache collections don't conform to the specifications of the JDK interfaces they implement."
"So, we were building the collections and related utilities that we needed, as we went along, and eventually, we had made something substantial enough to release. I don't know if our approach is absolutely 'unique,' but Google is an amazing environment for producing great results on things like this. We have a large and vocal user base of Java developers and a really healthy amount of collaboration. We have experts whose cubicles we can invade whenever we need to. And because of Google's famous '20% Time', every engineer who wants to contribute to our project is always free to do so -- and many have. Of course, Jared and I and our cohorts are merciless code-reviewers, so we've been able to keep the style and quality of the library consistent!"
Jared: "As Kevin implies, our library is the only collections library I know of, outside the JDK, built with Java 5 features: generics, enums, covariant return types, etc. When writing Java 5 code, you want a collections library that takes full advantage of the language. In addition, we put enormous effort into making the library complete, robust, and consistent with the JDK collection classes. Our collection classes were much more limited initially, but we've gradually improved them over the last two years. Since all library usage is in Google's source control system, we've had the flexibility to modify public interfaces. An open-source project like Apache Commons Collections doesn't have the freedom to change its behavior after the initial release. Since we'll lose that flexibility once Google Collections Library 1.0 is released, we're eager to receive feedback now so we can get things right."
Kevin: "Jared's right; being able to get our library into usage internally, but still make changes to it when we need to, has really helped. Sometimes it's tedious to deal with all the client changes, but we got pretty good at it. Anyway, when 1.0 is ready, it will be an API we can commit to."
Can you give us some crystal clear context by means of short examples? How would the way I do things now be different with the Google Collections library?
Kevin: "Your readers are in luck, because my friend Jesse Wilson has been writing a wonderful series of articles which he calls 'Coding in the small with Google Collections.' Each article gives a very simple before-and-after comparison to show how using our library affects your code. Here's a good example of a 'before', for Multimap:
Map<Salesperson, List<Sale>> map = new HashMap<Salesperson, List<Sale>>();
public void makeSale(Salesperson salesPerson, Sale sale) {
    List<Sale> sales = map.get(salesPerson);
    if (sales == null) {
        sales = new ArrayList<Sale>();
        map.put(salesPerson, sales);
    }
    sales.add(sale);
}
And here is 'after':
Multimap<Salesperson, Sale> multimap = new ArrayListMultimap<Salesperson, Sale>();
public void makeSale(Salesperson salesPerson, Sale sale) {
    multimap.put(salesPerson, sale);
}
"Of course, this is a really trivial example. What's great about something like Multimap is that the minute you realize you need something slightly more complex, it's already ready to handle that for you. For example, if you're removing entries as well as adding them, in the first code example you'd have to figure out what to do about pruning those empty collections. With Multimap it just happens."
"In some of the examples, you can see how adopting our library makes the size of your code much smaller. In others, you may save only a bit. But in either case, you'll notice how much more readable the code becomes; how much more directly it can now express what it's really trying to say. That's really important. And remember, even one fewer line of code to write is one fewer line you'll have to test, review, and maintain!"
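Kevin's remark about pruning empty collections is easier to appreciate with the 'before' version extended to support removal. The following is a rough sketch using only the JDK, not code from the library or from Jesse Wilson's articles. The class name SalesLedger, the method cancelSale, and the use of String in place of the undefined Salesperson and Sale types are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical hand-rolled "multimap": a Map whose values are Lists.
// String stands in for the Salesperson and Sale types, which the interview does not define.
final class SalesLedger {
    private final Map<String, List<String>> salesByPerson =
            new HashMap<String, List<String>>();

    public void makeSale(String salesPerson, String sale) {
        List<String> sales = salesByPerson.get(salesPerson);
        if (sales == null) {
            sales = new ArrayList<String>();
            salesByPerson.put(salesPerson, sales);
        }
        sales.add(sale);
    }

    // Returns true if the sale was present and has been removed.
    public boolean cancelSale(String salesPerson, String sale) {
        List<String> sales = salesByPerson.get(salesPerson);
        if (sales == null) {
            return false;
        }
        boolean removed = sales.remove(sale);
        // Prune the now-empty list so the key does not linger in the map.
        if (sales.isEmpty()) {
            salesByPerson.remove(salesPerson);
        }
        return removed;
    }

    public boolean hasSales(String salesPerson) {
        return salesByPerson.containsKey(salesPerson);
    }
}
```

Forgetting the isEmpty() check leaves keys that map to empty lists, so containsKey and the key set start reporting salespeople with no sales. That is the kind of bookkeeping Kevin says a Multimap takes care of for you.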
The library provides a number of enhancements. Which of them are especially attractive or interesting to you?
Kevin: "Bob Lee's ReferenceMap is a thrill. You never need WeakHashMap, WeakValueHashMap, WeakKeySoftValueHashMap, and so on and so on, just ReferenceMap. It handles any combination of strong, weak or soft keys, with strong, weak or soft values, and to top it off, it's fully concurrent, being backed by ConcurrentHashMap and implementing ConcurrentMap itself. Jared's Multimaps are incredibly flexible, powerful, and still understandable. They're very easy to use, but if you really read through the implementation, the complexity kind of blows your mind. I think Multimap is like the 'uber-collection'."
Jared: "Plus, the multimaps were so much fun to write. Of the programming challenges I've faced throughout my career, writing a new collections class was the most pure and satisfying. As Kevin says, the ReferenceMap is amazing. It's awesome that creating a map with weak or soft references is now as easy as creating a HashMap. ReferenceMaps are the biggest exceptions to my earlier claim that the Google Collections Library just makes things more convenient. Still, the most heavily-used features, which appear in almost every Java class I write, are static methods that reduce the number of repetitive keystrokes in your Java code. It's so convenient being able to enter commands like the following:
Map<OneClassWithALongName, AnotherClassWithALongName> map = Maps.newHashMap();
List<String> animals = Lists.immutableList("cat", "dog", "horse");
One is reminded of Groovy when one sees the above! To what extent is what you're doing by providing the Google Collections Library comparable to what scripting languages such as, specifically, Groovy are doing in relation to Java?
Kevin: "I think we all want similar things: we want writing and reading the basic constructs that make up our software to be as easy as possible, if not easier. We're addressing that problem via libraries, staying within the confines of the Java language; Groovy is doing something more audacious, and I love seeing how their efforts are paying off. (Incidentally, more than being just syntactic sugar, our 'immutableList()' method returns a custom List implementation that's designed to handle the immutable case as efficiently as it can. 'immutableSet()' doesn't, but we're probably going to swap it out so that it does.)"
Jared: "Groovy and the Google Collections share one goal in common: making Java code less verbose. It's not surprising that their solutions have some overlap, despite the differences between a language and a library. Still, Groovy has many features that Google Collections lack, and vice versa. You could probably combine them, and use a ReferenceMap or Multimap in your Groovy code, though I doubt anyone has tried that yet."
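The keystroke savings Jared mentions come from a plain Java 5 technique: a generic static factory method lets the compiler infer the type arguments from the assignment target. The sketch below illustrates that idea only and is not the library's source. The class name CollectionFactories is hypothetical, and its immutableList simply wraps a copy in an unmodifiable view, whereas Kevin notes that the real method returns a custom List implementation. The @SafeVarargs annotation needs Java 7 or later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for static factories in the style of Maps and Lists.
final class CollectionFactories {
    private CollectionFactories() {}

    // K and V are inferred from the type of the variable being assigned.
    static <K, V> Map<K, V> newHashMap() {
        return new HashMap<K, V>();
    }

    // Simplified assumption: a defensive copy behind an unmodifiable view,
    // not the custom immutable implementation described in the interview.
    @SafeVarargs
    static <E> List<E> immutableList(E... elements) {
        return Collections.unmodifiableList(new ArrayList<E>(Arrays.asList(elements)));
    }
}
```

With this in place, `Map<String, List<Integer>> m = CollectionFactories.newHashMap();` compiles without the type arguments being repeated on the right-hand side, which is the convenience the library's static methods provide.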
How did this library come about?
Kevin: "Google's development tools and processes are specifically designed to promote the sharing of common source code across the whole company. If you wanted to write library code and make it commonly available, it was always pretty easy to do so, and a great many projects could then pull in and make use of your code effortlessly. This was all great, but over time the common libraries did turn into a bit of a junk drawer. Now, there's plenty of useful stuff in a junk drawer, it's just not always easy to find amid all the clutter."
"So, back in 2005 I was working on AdWords but I was looking for a 20% project. I ended up making the unusual decision that I would spend 20% of my time just working on the common Java libraries. As far as I know, I was the first person to ever do such a crazy thing. I started writing, fixing, testing, deprecating and deleting code with wild abandon. And because I had free access to all the client source code, I never had to guess at what library code people might want; I just looked at their code to see what they really were suffering without."
"One thing I noticed was that many people were using a Map, with values of type Integer or int[], to represent a histogram. I found dozens of instances of this, and I never found any two that actually coded it exactly the same way. It was stunning how many different variations could be found for such a basically simple idea, and more than one of them had bugs. This spurred me to write Multiset, which was the first major new collection type we made, and the real birth of the com.google.common.collect package."
Jared: "When Kevin announced his multiset, that inspired me to consider working on a multimap. After Josh Bloch made some suggestions in that same e-mail thread, I couldn't resist giving it a try. Over the next year, in the limited time remaining after dealing with my main project, I enhanced the multimap, giving it the functionality of the JDK collection classes. As the collection classes matured, Kevin proposed the next logical step: open-sourcing them. For the last several months, we've been working intensely to improve the caliber of the libraries: reconsidering the method signatures, adding missing functionality, fixing incorrect special-case behavior, rewriting the Javadoc, and so on. With the assistance of many other Googlers, we carefully reviewed every line of code, trying to perfect the libraries."
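The histogram idiom Kevin describes, a Map with Integer values used as a counter, looks roughly like the following when written by hand. This is an illustrative sketch with the JDK only. The class name WordHistogram and its methods are hypothetical and do not come from any of the codebases Kevin surveyed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical hand-rolled histogram: the pattern a Multiset is meant to replace.
final class WordHistogram {
    private final Map<String, Integer> counts = new HashMap<String, Integer>();

    // Increment the count for a word, treating a missing entry as zero.
    void add(String word) {
        Integer current = counts.get(word);
        counts.put(word, current == null ? 1 : current + 1);
    }

    // Report the count for a word, again treating a missing entry as zero.
    int count(String word) {
        Integer current = counts.get(word);
        return current == null ? 0 : current;
    }
}
```

Each null check is a place where the variations and bugs Kevin mentions can creep in. A Multiset exposes the same idea directly as adding an element and asking for its count.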
Is there a connection between this library and work you have done previously, such as Guice?
Kevin: "It's funny to hear it asked that way, since the bulk of the collections library really predates Guice. In fact, it was a bit of a pain to release Guice because we had to rip out the various places where we were using the Google Collections, and do things another way instead. Some things we just blindly copied into the Guice codebase in an 'internal' package so we could keep using them. Over time, I heard from several different people who wanted to open-source their work but had this little problem of depending on the Google Collections. Max Ross, who was releasing Hibernate Shards, told me that not being able to use the Google Collections was like 'coding with one hand tied behind my back.' So I realized that releasing this library would be an enabling step that would pave the way for all kinds of other future Google open-source releases, and that got me really excited. (And yes, you can hold our feet to the fire if that doesn't happen!)"
Jared: "This is the first open-source project that I've found the time to work on. After all these years of building on other people's open-source efforts, it's nice to finally contribute something of my own."
In order to use the Google Collections Library, one would need to add one or more JARs to one's application, until the library potentially becomes part of the JDK. Has this been seen as a problem or, at least, a hindrance?
Kevin: "Sure, managing all your JARs can be a big pain, but it's probably a big pain whether you add ours onto the list or not. And as far as weight, at present our JAR is only 350K, but yes, it will grow. What we found out with Guice is that for every one person with a legitimate difficulty in depending on the JAR, there are ten who simply feel that it's deeply weird to type 'import com.google' into their code. There's not much we can do about that!"
Jared: "Considering the proliferation of third-party jar files in most Java projects, it shouldn't be too much of a hindrance to include another one. Of course, it does take time for people to hear about Google Collections, learn how to use the libraries, and integrate them with their code. However, since its learning curve is less steep than most Java libraries, I'm optimistic that the meme will spread, especially with the help of an interview like this one."
There are several 'big names' included in this project, in various capacities. Josh Bloch not the least of them, of course. What's going on here?
Kevin: "Without a doubt, the biggest name on the project is my own. Sixteen letters. Yeah -- there's a double R in there. Poor Bob Lee, he really can't compete!"
"But seriously... Josh has been a godsend. This library would never exist in any releasable form if it wasn't for all of his guidance, advice, and sometimes even his grueling line-by-line code reviews. I shudder to think about how many hours in total I have stolen from that man's life. But he's actually been happy to help us; he really believes in the value of what we're doing, and he's determined to contribute many of these collections and utilities to JDK 7. We've pulled him into our discussions so frequently not only because he's a genius and an expert in All Things Java, but specifically because it's his library that we're extending. It's extremely important to us that our library be a simple, consistent and natural extension to the Java Collections Framework. The closer that our principles, conventions and standards can be to those of the JCF, the easier our library is for our users to learn and use. And Josh has all that stuff in his head. We are so lucky."
Jared: "We should acknowledge Doug Lea, the other big name you're referring to. Basically, when we have questions we talk to Josh, and when Josh wants a second opinion he calls Doug. In addition, we should acknowledge the numerous other people who helped out, such as Mike Bostock (who, sadly, is no longer at Google) and Robert Konigsberg. Many people added new functionality, reviewed code, or suggested features we could add."
Kevin: "That list goes on and on. Some of the names you see in @author tags, like Crazy Bob, and also Cliff L. Biffle who wrote us a spiffy ConcurrentMultiset. But we couldn't possibly thank all of the people we should. Googlers are awesome people."
So you'd like this to be part of JDK 7? Do you have plans to enter the process to make that happen or how far along are you?
Kevin: "Well, Josh, Doug and I are still working on narrowing down our specific set of recommendations. I'd always imagined that the mechanics of this would work similarly to the way in which Google contributed ArrayDeque and so forth for JDK 6, as part of the ongoing maintenance of JSR 166. But now it looks more likely that this will happen via the new OpenJDK channels. I don't know much about how the OpenJDK process works yet, but I believe Doug will be a committer and we will work through him."
The library is currently listed as being at version '0.5 (ALPHA)'. So, how fragile is all of this? Could it change completely by the time we reach Beta?
Kevin: "Yes! There's nothing that we might not change. We might decide tomorrow that BiMaps are wack and just yank them from the project altogether (not likely). But don't worry -- the goal of this period is to get to a 1.0 that we can commit to. Our plan is that every release after 1.0 will be backward-compatible with the release before it, so we really have to get it 'right' before then."
Jared: "On the other hand, hundreds of Googlers are using the library already, and we've already responded to their feedback. I don't anticipate many major changes to the libraries, besides adding new features, but we won't make any commitments until version 1.0 is released."
What is the 'take away' message that you'd like readers of this interview to be left with?
Kevin: "This library has gotten where it is today in part because of a strong and healthy dialogue that we have internally among Googlers from projects all over the company. A big part of why we're releasing it now is that we'd like to expand this dialogue. If you have ideas for us, we'd love to hear them. The best way to participate is to file issues, and join our mailing list to participate in discussions. We expect the library to continue to grow to encompass much more than it does today, and we could use your help!"
Jared: "Look over the library, or at least the Javadoc, and send us your suggestions. Thanks!"
Bios.
![[image]](http://mowser.com/img?url=http%3A%2F%2Fwww.javalobby.org%2Fimages%2Fpostings%2Fgeertjan%2Fpics%2Fkevin-b.jpg)
Kevin Bourrillion is a senior software engineer at Google, Inc., where he is a full-time developer and caretaker of the common libraries shared by Google's many Java projects. He is a lead developer of the Google Collections Library, as well as of Google's dependency injection framework, Guice. He came to Google in 2004 with seven years' experience fighting for life at various Hot Silicon Valley Start-Ups. He holds a bachelor's degree in Mathematics from Bradley University.
![[image]](http://mowser.com/img?url=http%3A%2F%2Fwww.javalobby.org%2Fimages%2Fpostings%2Fgeertjan%2Fpics%2Fjared-l.jpg)
Jared Levy has been a senior software engineer at Google for the last two years. He's on the team that evaluates the quality of Google's search engine. In addition, he assists the development of the Google Collections Library. Previously, he developed enterprise applications at TenFold Corporation and speech authentication applications at Vocent Solutions. He received a Ph.D. in theoretical physics from Stanford University.