For all you know, it's just another Java library
Fri, May 2, 2008
We also had occasion to serve 2,000 simultaneous users of Buy a Feature (as in all at the same time, pounding on their keyboards). Thanks to Jetty Continuations, we were able to service all 2,000 users with 2,000 open connections to our server and an average of 700 requests per second on a dual-core Opteron with a load average of around 0.24... try that with your Rails app.
One of the customers of Buy a Feature wanted it integrated into their larger, Java-powered web portal along with 2 other systems. I did the integration. The customer asks "Where's the Scala part?" I answer "It's in this JAR file." He goes "But, your program is written in Scala, but I looked at the byte-code and it's just Java." I answer "It's Scala... but it compiles down to Java byte-code and it runs in a Java debugger and you can't tell the difference." "You're right," he says.
So, to this customer's JVM, the Scala and lift code looks, smells and tastes just like Java code. If I renamed the scala-library.jar file to apache-closures.jar, nobody would know the difference... at all.
Okay... but from each set of people I talk to, I hear some variation on the "operational risks" of using Scala.
Let's step back for a minute. There are development and team risks in using Scala.
Some Java programmers can't wrap their heads around the triple concepts of (1) type inference, (2) passing functions/higher order functions and (3) immutability as the default way of writing code. Most Ruby programmers that I've met don't have the above limitations. So, find a Ruby programmer who knows some Java libraries or find a Java programmer who's done some moonlighting with Rails or Python or JavaScript and you've got a developer who can pick up Scala in a week.
Yes, the tools in Scala-land are not as rich as the tools in Java-land. But, once again, anyone who can program Ruby can program in Scala. There's a fine Textmate bundle for Scala. I use jEdit. Steve Jenson uses emacs. Thanks to David Bernard's continuous compilation Maven plugin, you save your file and your code is compiled.
Oh... and there's the old Eclipse plugin which more or less works and has access to the Eclipse debugger and the new Eclipse plugin is reported to work quite well. And then there's the NetBeans plugin which is still raw, but getting better every week.
Even with the limitation of weak IDE support, head-to-head, people can write Scala code 2 to 10 times faster than they can write Java code, and maintaining Scala code is much easier because of Scala's strong type system and code conciseness.
But, getting back to our old friend "you can't tell it's not Java", I wrote a Scala program and compiled it with -g:vars (put all the symbols in the class file), started the program under jdb (the Java Debugger... a little more on this later) and set a breakpoint. This is what I got:
6 args.zipWithIndex.foreach(v => println(v))
v = {
_2: instance of java.lang.Integer(id=463)
_1: "Hello"
}
main[1] where
[1] foo.ScalaDB$$anonfun$main$1.apply (ScalaDB.scala:6)
[2] foo.ScalaDB$$anonfun$main$1.apply (ScalaDB.scala:6)
[3] scala.Iterator$class.foreach (Iterator.scala:387)
[4] scala.runtime.BoxedArray$$anon$2.foreach (BoxedArray.scala:45)
[5] scala.Iterable$class.foreach (Iterable.scala:256)
[6] scala.runtime.BoxedArray.foreach (BoxedArray.scala:24)
[7] foo.ScalaDB$.main (ScalaDB.scala:6)
[8] foo.ScalaDB.main (null)
main[1] print v
v = "(Hello,0)"
main[1]
A long time ago, when Java was Oak and it was being designed as a way to distribute untrusted code into set-top boxes (and later browsers), the rules defining how a program executed and what the instruction set (byte codes) meant were super important. Additionally, the semantics of the program had to be such that the Virtual Machine running the code could verify (1) that the code was well behaved and (2) that the source code and the object code had the same meaning. For example, the casting operation in Java compiles down to a byte code that checks that the object can actually be cast to the right class, and the verifier ensures that there's no code path that could put an unchecked value into a variable. Put another way, there's no way to write verifiable byte code that can put a reference to a non-String into a variable that's defined as a String. It's not just at the compiler level, but at the actual Virtual Machine level that object typing is enforced.
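As a small illustration of that run-time check (a sketch, not taken from the original program; the class and method names are made up for the example), the cast below compiles to a checkcast instruction, and the JVM itself rejects the Integer when the code runs:

```java
public class CheckedCastDemo {
    // Returns true if the JVM refuses to treat the given object as a String.
    public static boolean castToStringFails(Object value) {
        try {
            // javac compiles this cast to a checkcast instruction,
            // so the type test happens inside the JVM at run time.
            String s = (String) value;
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(castToStringFails("Hello"));
        System.out.println(castToStringFails(Integer.valueOf(0)));
    }
}
```

The source compiles cleanly because the static type is Object; the failure only shows up when the byte code executes.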
In Java 1.0 days, there was nearly a 1:1 correspondence between Java language code and Java byte code. Put another way, there was only one thing you could write in Java byte code that you could not write in Java source code (it has to do with calling super in a constructor.) There was one source code file per class file.
Java 1.1 introduced inner classes, which broke the 1:1 relationship between Java code and byte code. One of the things that inner classes introduced was access to the outer class's private instance variables by the inner class. This was done without violating the JVM's enforcement of the privacy of private variables: the compiler generated accessor methods that gave inner classes a compiler-enforced (but not JVM-enforced) way to reach private variables. But the horse was out of the barn at this point anyway, because 1.1 brought us reflection and private was no longer private.
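Both routes to a private field can be sketched in a few lines. This is an illustrative example with hypothetical names (Vault, Opener, PrivateAccessDemo), not code from Buy a Feature. Older javac versions generate the hidden accessor methods described above; as far as I know, Java 11 and later use nest-based access control instead, but the source looks the same either way:

```java
import java.lang.reflect.Field;

class Vault {
    private int secret = 41;

    // Inner class: the language lets it read and write the outer
    // instance's private field directly.
    class Opener {
        int bump() {
            secret = secret + 1;
            return secret;
        }
    }
}

public class PrivateAccessDemo {
    // Goes through the inner class to change the private field.
    public static int bumpViaInner(Vault vault) {
        return vault.new Opener().bump();
    }

    // Goes through reflection to read the private field from an unrelated class.
    public static int peek(Vault vault) {
        try {
            Field field = Vault.class.getDeclaredField("secret");
            field.setAccessible(true); // switches off the language-level access check
            return field.getInt(vault);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Vault vault = new Vault();
        System.out.println(bumpViaInner(vault));
        System.out.println(peek(vault));
    }
}
```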
An interesting thing about the JVM. From 1.0 through 1.6, there has not been a new instruction added to the JVM. Wow. Think about it. Java came out when the 486 was around. How many instructions have been added to Intel machines since 1995? The Microsoft CLR has been around since 2000 and has gone through 3 revisions and new instructions have been added at every revision and source code compiled under an older revision does not work with newer revisions. On the other hand, I have Java 1.1 compiled code that works just fine under Java 1.6. Pretty amazing.
Even to this day, Java Generics are implemented using the same JVM byte-codes that were used in 1996. This is why you get the "type erasure" warnings. The compiler knows the type, but the JVM does not... so a List<String> looks to the JVM like a List, even though the compiler will not let you pass a List<String> to something that expects a List<URL>. On the server side, where we trust the code, this is not an issue. If we were writing code for an untrusted world, we'd care a lot more about the semantics of the source code being enforced by the execution environment.
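To make erasure concrete, here is a minimal sketch (the class and method names are invented for the example). It shows that a List<String> and a List<URL> share one runtime class, and that a wrong element slipped in through a raw reference is only caught later, by the cast the compiler inserts when the element is read:

```java
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

public class ErasureDemo {
    // The type parameters exist only at compile time; the JVM sees one class.
    public static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<String>();
        List<URL> urls = new ArrayList<URL>();
        return strings.getClass() == urls.getClass();
    }

    // Through a raw List the JVM happily stores an Integer in a List<String>.
    // The mistake surfaces only when the element is read back as a String.
    @SuppressWarnings({"unchecked", "rawtypes"})
    public static boolean wrongElementCaughtOnRead() {
        List<String> strings = new ArrayList<String>();
        List raw = strings;
        raw.add(Integer.valueOf(7));
        try {
            String s = strings.get(0); // compiler-inserted cast fails here
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass());
        System.out.println(wrongElementCaughtOnRead());
    }
}
```

Without the @SuppressWarnings annotation, javac reports the unchecked warnings mentioned above for the raw-list lines.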
So, there have been no new JVM instructions since Java was released. The JVM is perhaps the best specified piece of software this side of Ada-based military projects. There are specs and slow-moving JSRs for *everything*. Turns out, this works to our benefit.
The JVM has a clearly defined interface to debugging. The information that a class file needs to provide to the JVM for line numbers, variable names, etc. is very clearly specified. Because the JVM has a limited instruction set and the type of each item on the stack and of each instance variable in a class is known and verified when the class loads, the debugging information works for anything that compiles down to Java byte code and has semantics of named local variables and named instance variables. Scala shares these semantics with Java and that's why the Scala compiler can emit byte-code that has the appropriate debugging information so that it "just works" with jdb. And, just to be clear, jdb uses the standard, well documented interface into the JVM to do debugging and *every other* IDE for the JVM uses this same interface. That means that an IDE that compiles Scala can also hook into the JVM and debug Scala. That's why debugging works with the Scala Eclipse plugin.
But, let's go back to the statement: nobody knows Scala's operational characteristics.
That's just not true. Scala's operational characteristics are the same as Java's. The Scala compiler generates byte code that is nearly identical to what the Java compiler generates. In fact, you can decompile Scala code and wind up with readable Java code, with the exception of certain constructor operations. To the JVM, Scala code and Java code are indistinguishable. The only difference is that there's a single extra library file to support Scala.
Now, in most software projects, you don't have CEOs and board members, and everybody's grandmother asking what libraries you're using. In fact, in every project I've stepped into, there have been at least 2 libraries that the senior developers did not add but somehow got introduced into the mix (I believe in library audits to make sure there are no license violations in the library mix.) So, in the normal course of business, libraries are added to projects all the time. Any moderately complex project depends on dozens of libraries. I can tell you to a 100% degree of certainty that there are libraries in that mix that will not pass the "is the company that supports them going to be around in 5 years?" test. Period. Sure, memcached will be around in 5 years and most of the memcached clients will. Slide on the other hand is "retired". And Mongrel...
Making the choice to use Scala should be a deliberate, deliberated, well reasoned choice. It has to do with developer productivity, both to build the initial product and to maintain the product through a 2-5 year lifecycle. It has to do with maintaining existing QA and operations infrastructure (for existing JVM shops) or moving to the most scalable, flexible, predictable, well tested, and well supported web infrastructure around: the JVM.
Recruiting team members who can do Scala may be a challenge. Standardizing on a development environment may be a challenge as the Scala IDE support is immature (but there's always emacs, vi, jEdit and Textmate which work just fine.) Standardizing on a coding style is a challenge. These are all people challenges and all localized to recruiting and development and management thereof. The only rational parts of the debate are the trade-off between recruiting and organizing the team and the benefits to be gained from Scala.
But, you say, what if Martin Odersky decides to take Scala in a wrong direction? Then freeze at Scala 2.7 or 2.8 or wherever you feel the break is rational. It was only last year that Kaiser moved from Java 1.3 to 1.4. Working off of 2 or 3 year old technology is normal. Running against trunk-head is not the way of an organization that's asking the question "where will Martin take Scala in 5 years?" And oh, by the way, if Martin gets off track or Scala for some reason languishes, it's most likely to be the same scenario as GJ (Generic Java... Martin's prior project that turned into Java Generics)... it's because Java 8 or Java 9 has adopted enough of Scala's features to make Scala marginal. In that case, you spend a couple of months porting the Scala code to Java XX and in the process fix some bugs.
And not to put too fine a point on it, but Martin's team runs one of the best ISVs I've ever seen. They crank out a new, feature-packed release every six months or so. They respond, sometimes within hours, to bug reports. There is an active support mechanism with some of the best coders around waiting to answer questions from newbies and old hands alike. If we were to measure the Scala team on commercial standards, they've got a longer funding runway than any private software company around and they're more responsive than almost every ISV, public or private. So what if they're academic... maybe that means they're thinking through issues rather than being code-wage slaves.
Bottom line... to anyone other than the folks with hands in the code and the folks who have to recruit and manage them, "For all you know, it's just another Java library."
#1 - pk11 2008-05-03 07:19
btw the new netbeans scala plugin is awesome:
-code completion (both java and scala)
-debugger
-syntax highlighting
-rename variables in the file
-mark occurrences
#2 - Joel Klein said:
2008-05-03 07:27
"And not to put too fine a point on it, but Martin's team runs one of the best ISVs I've ever seen." I can believe that. I was chatting with some of Odersky's students at a conference when GJ was the current project, and they were laughing about how obsessed he was with fixing bugs in the compiler.
#3 - Calum Leslie said:
2008-05-03 08:17
This is an excellent article. I personally think that for a lot of uses Scala's tool support isn't quite there yet, but it's definitely coming along apace since the language has been getting more stable. The growing stability of the language itself actually addresses some of the concerns you mention; Odersky, I believe, stated that with the book being released and so on, the newer revisions of the language are intended to be more stable. I think that Scala's getting out of its "experimental" phase and becoming really usable at any scale, and that's exciting.
The interesting thing about making Scala interact with Java is that to do so effectively, you really need to be aware of what you're doing. Scala constructs are usable from Java, but not easily usable. The interface has to be carefully written to work in both contexts fluently, but this is largely the fault of the languages being fundamentally different in emphasis rather than any compatibility issue.
Your main point is important. People should use what works best for them. What makes Scala interesting is that it's an effective language set upon a proven base, managing to mix the best of both of those worlds nicely.
#4 - anon 2008-05-03 08:25
"An interesting thing about the JVM. From 1.0 through 1.6, there has not been a new instruction added to the JVM. Wow. Think about it. Java came out when the 486 was around. How many instructions have been added to Intel machines since 1995? The Microsoft CLR has been around since 2000 and has gone through 3 revisions and new instructions have been added at every revision and source code compiled under an older revision does not work with newer revisions. On the other hand, I have Java 1.1 compiled code that works just fine under Java 1.6. Pretty amazing."
Shouldn't you have said that you have Java 1.6 compiled code that works fine under Java 1.1? I mean, to bring the point home. What your example conveys is that no instructions have been removed, not that none have been added.
#5 - Steve Yen said:
2008-05-03 09:30
Nice post!
Another data point for you: I've been using various Java debugging and profiling tools against a Scala-based project, including agentlib-based tools (hprof, jrat, etc) that instrument your program by rewriting bytecodes. These all have worked fine. All those tools were written originally without Scala in mind.
I'm also happy to report that Scala works fine with the various Java networking frameworks such as the Apache Mina NIO framework and the Sun Grizzly NIO framework. And, of course, Scala works too with simple blocking socket I/O (java.net.*).
#6 - gwenhwyfaer 2008-05-03 10:18
> On the other hand, I have Java 1.1 compiled code that works just fine under Java 1.6. Pretty amazing.
Not half as amazing as the other way around...
#7 - bob said:
2008-05-03 13:33
I agree completely with your analysis.
Scala is a language that happens to compile to Java byte codes, just like C/Fortran/etc can all compile to x86.
My big (performance, rather than operational reliability) concern about code generation is whether the Scala compiler generates bytecodes to implement its language features, or if it generates invocations to Scala's runtime frameworks. If the latter, it would be much more like a threaded interpreted language, such as Forth. (http://en.wikipedia.org/wiki/Threaded_code)
--bob
(just a nit, but new bytecodes have been added, for example, invokedynamic
http://jcp.org/en/jsr/detail?id=292)
#7.1 - David Pollak said:
2008-05-03 17:10
Bob,
The analysis is different than compilation to X86. If you compile to X86, there are different calling conventions, different stack layout conventions and there's no way a GDB will work with a compiler that I write unless I follow the same conventions that GCC follows.
Put another way, please cast your memory back to the impedance mismatches between Pascal and C calling conventions on Windows and Mac or the hassle of debugging a name-mangled C++ program with a C debugging tool. None of that was fun.
On the other hand, the JVM's byte-code is well defined, verifiable and has a single stack frame layout. Further, the semantics of Scala are very close to the semantics of Java (single inheritance, strongly typed, etc.) Scala calls on the stack intermingle with Java calls without proxy code or other things that languages such as JRuby need. Scala instances look, smell, and taste to the JVM just like Java instances and the Scala compiler emits the right class meta-information to make the Scala classes inspectable in jdb. Thus, it's possible to use jdb and other Java tools with Scala code.
As to your comment about invokedynamic... that is proposed for Java 1.7 and is not part of Java 1.6.
As to the above comments about running Java 1.6 code on a 1.1 JVM... that's not possible because the 1.1 JVM does not have the 1.6 libraries, and new methods on String, Double, Float, Int, etc. will be missing.
#7.1.1 - bob said:
2008-05-04 13:09
David, I guess I should have said VAX rather than x86. The calling conventions on the VAX were standardized across all the languages, and one DEBUGger supported all the languages, including inter-language calling, including BASIC, DIBOL, COBOL, C and Pascal and Fortran and PL/I. --bob
#7.2 - James Iry said:
2008-05-03 18:16
Bob,
> My big (performance, rather than operational reliability) concern about code generation is whether the Scala compiler generates bytecodes to implement its language features, or if it generates invocations to Scala's runtime frameworks.
The Java-like subset of the Scala language compiles to bytecode very similar to what you'd get from javac. For the more advanced features:
Closures are instances of anonymous classes that implement a simple interface.
Captured immutable variables are done exactly as in Java's inner classes.
Captured mutable variables are done with an extra level of indirection through another heap allocated object.
Pattern matching is done in part through generated code, in part through "instanceof" style checks, and in part by calling an "unapply" method on either a user-supplied or compiler-generated object. The exact mix depends on your pattern match.
Lazy parameters are syntactic sugar for 0 argument closures. Lazy vals are the same, but memoized.
Structural types are done using Java reflection, so there is some expense here.
XML is largely done via libraries.
Hopefully this gives you some feel for the performance characteristics implied by various features. As always there's no substitute for performance testing.
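For a rough feel of the first few items, here is a Java rendering of the same ideas. It is an analogy only, not the Scala compiler's actual output: Thunk and IntCell are hypothetical stand-ins for whatever interface and holder object the compiler really uses, and the memoized thunk only approximates the evaluate-once behaviour described for lazy vals.

```java
public class ClosureSketch {
    // Hypothetical "simple interface" that a zero-argument closure implements.
    public interface Thunk<T> {
        T apply();
    }

    // Hypothetical heap-allocated cell holding a captured mutable variable.
    public static final class IntCell {
        int value;
        IntCell(int initial) { value = initial; }
    }

    // Closure over an immutable local: the anonymous class keeps its own copy.
    public static Thunk<String> greet(final String name) {
        return new Thunk<String>() {
            public String apply() { return "Hello, " + name; }
        };
    }

    // Closure over a mutable variable: the closure and the enclosing method
    // share one cell, which costs an extra level of indirection.
    public static int countCalls(int times) {
        final IntCell calls = new IntCell(0);
        Thunk<Integer> increment = new Thunk<Integer>() {
            public Integer apply() {
                calls.value += 1;
                return calls.value;
            }
        };
        for (int i = 0; i < times; i++) {
            increment.apply();
        }
        return calls.value;
    }

    // Memoized thunk: the body runs on the first call only, and the
    // cached result is returned on every later call.
    public static <T> Thunk<T> memoize(final Thunk<T> body) {
        return new Thunk<T>() {
            private boolean evaluated;
            private T cached;
            public synchronized T apply() {
                if (!evaluated) {
                    cached = body.apply();
                    evaluated = true;
                }
                return cached;
            }
        };
    }
}
```

Each closure here is one small extra class plus one allocation per use, which is the kind of cost the list above is pointing at.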