New Adventures in Software


Installing GHC 6.10.1 on OS X 10.4

Posted in Haskell, Mac by Dan on November 21st, 2008


Every time I need/decide to upgrade GHC, it seems there’s a different set of hoops I need to jump through to get it working on OS X 10.4 (Tiger).  I don’t have OS X 10.5 (Leopard) and I don’t intend to buy it, so unfortunately I don’t get to use the nice-and-simple installer.  I’ve decided to write down the exact steps that I’m taking this time so that I have a reference if I need to do it again (or if somebody else needs to do the same).

I’m pretty certain that this isn’t the way I did it last time.  I seem to recall manually building the whole thing from a source tarball and having to resolve the dependencies myself.  Then again, that’s probably why I have to upgrade now - my 6.8.2 install appears to be broken.

MacPorts and Xcode

The GHC site recommends that Tiger users use MacPorts, so that’s what I’m doing.  I would have used Fink, because I already have that set up, but it doesn’t have a recent GHC build available for Tiger (6.6.2 is relatively ancient).

First, I tried to install MacPorts without upgrading Xcode.  It hung.  So then I did what I had been told (and had ignored) and downloaded the latest version of Xcode from the Apple Developer Connection.  For Tiger, the latest version is 2.5.  3.0 and above are for Leopard only.  At 903mb, the download is not exactly slimline.  After running the Xcode installer, the MacPorts installer worked properly, which was nice.

Installing GHC from MacPorts

After that, it’s supposed to be easy:

$ sudo port install ghc
Password:
sudo: port: command not found

MacPorts installs to /opt/local and I didn’t have /opt/local/bin on the path (it seems that the “postflight script” mentioned here didn’t run or didn’t work). No problem:

$ sudo /opt/local/bin/port install ghc

This is meant to download, build and install the latest GHC and all its dependencies (GMP, yet another version of Perl, etc.).  After some time had elapsed, my first attempt failed with this helpful message:

Error: Status 1 encountered during processing.

The GCC output seemed to suggest that it couldn’t find the GMP library that MacPorts had just installed.  Google revealed this to be a bug in the Portfile.  Somebody else had run into the same problem earlier the same day and the maintainer was on the case.  I left it for a day, by which time the bug had been fixed, and tried again.  This time the installation proceeded without problems, although it took a fecking long time to complete.

Paths and Symlinks

Once the install was done, I removed all traces of the previous 6.8.2 install (which was under /usr/local) and made sure that /opt/local/bin was on my path (in ~/.bash_login).

$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 6.10.1

Excellent.  The thing that prompted me to upgrade was that Haddock hadn’t been working for me since I upgraded it to the latest version.  So that was the next thing to check:

$ ./setup.hs haddock
setup.hs: haddock version >=0.6 is required but it could not be found.

It seems that for this particular build of GHC, the Haddock executable is called haddock-ghc, rather than haddock as in 6.8.2. Cabal is still looking for haddock though, so I added a symlink and everything was fine again:

$ sudo ln -s /opt/local/bin/haddock-ghc /opt/local/bin/haddock

I think I now have a working GHC 6.10.1 installation.

Want more articles like this? Subscribe to the feed.

Smaller Java

Posted in Java by Dan on November 9th, 2008


In my previous post I talked about how to reduce the size of your Java binaries without sacrificing functionality.  Using Proguard to strip out unused and redundant code, I was able to squeeze 1.4 megabytes of already-compressed JAR files (an applet plus its libraries) into 276kb.  The motivation was to reduce download times and data transfer costs for network-launched software (applets and Web Start applications).  An 80% size reduction is pretty impressive but can we do better?  Yes we can.

GZip

A JAR file (or Java ARchive) is simply a zip file with a different extension.  In other words, the contents are already compressed.  Given that we’ve already removed redundant information and zipped the files, you might think that we couldn’t compress the code much further.

Files in a zip archive are compressed individually.  In practice, better compression is often achieved by compressing an archive as a whole so that similarities across separate files can be exploited.  For this, we can use gzip.  Compressing the 276kb JAR file results in a 251kb GZipped JAR file.  This is a reduction of about 9%.  Good but not spectacular.

GZip is more effective when its input is uncompressed.  If we expand the 276kb JAR file, repack it as an uncompressed JAR (use jar -0) and then gzip that, the resultant jar.gz file is a mere 193kb.  Now that’s more like it: a 30% reduction on our already spartan 276kb binary.
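For anyone who would rather script this than run jar and gzip by hand, here is a minimal sketch of the same two steps using only java.util.zip (a JAR is just a zip, so the zip classes are enough).  The class and method names are my own invention for illustration, the sample archive is synthetic stand-in data rather than a real applet, and it assumes Java 11 or later (for String.repeat and InputStream.readAllBytes).  It makes no claim about the savings you will see; run it against your own JAR to find out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.GZIPOutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class JarGzip {

    // Copies every entry of a JAR into a new archive using the STORED method,
    // i.e. with no per-entry compression (the equivalent of `jar -0`).
    static byte[] repackStored(byte[] jarBytes) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(jarBytes));
             ZipOutputStream zip = new ZipOutputStream(out)) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                byte[] data = in.readAllBytes(); // reads to the end of the current entry
                ZipEntry stored = new ZipEntry(entry.getName());
                // STORED entries must declare their size and CRC before being written.
                stored.setMethod(ZipEntry.STORED);
                stored.setSize(data.length);
                stored.setCompressedSize(data.length);
                CRC32 crc = new CRC32();
                crc.update(data);
                stored.setCrc(crc.getValue());
                zip.putNextEntry(stored);
                zip.write(data);
                zip.closeEntry();
            }
        } // closing the ZipOutputStream writes the central directory
        return out.toByteArray();
    }

    // GZips the whole archive as one stream, so similarities across entries can be exploited.
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(data);
        }
        return out.toByteArray();
    }

    // Synthetic stand-in for a real JAR: 20 DEFLATED entries with a lot of shared text.
    static byte[] sampleJar() throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            for (int i = 0; i < 20; i++) {
                zip.putNextEntry(new ZipEntry("com/example/Chart" + i + ".txt"));
                String body = ("org/example/chart/Renderer getValue setValue " + i + "\n").repeat(10);
                zip.write(body.getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] jar = sampleJar();
        byte[] stored = repackStored(jar);
        System.out.println("deflated JAR:         " + jar.length + " bytes");
        System.out.println("gzip of deflated JAR: " + gzip(jar).length + " bytes");
        System.out.println("stored JAR:           " + stored.length + " bytes");
        System.out.println("gzip of stored JAR:   " + gzip(stored).length + " bytes");
    }
}
```

Each entry is read fully before it is written because a STORED entry has to carry its size and CRC up front.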

At this point it is worth noting that you can’t just embed a jar.gz file in an HTML page.  It won’t work.  Instead, what you need to do is set up content negotiation on the web server so that when a browser requests the vanilla applet JAR file, it receives the GZipped version.  This is actually pretty straightforward and we’ll cover that shortly.  But first, I don’t think that 193kb is anywhere near small enough.  Can we do better?  Yes we can.

Pack200

We’ve reached the limits of what we can achieve with general purpose compression techniques.  We’ve also reached a lower limit for applets, since we are reliant on the browser to decompress what it downloads.  For Web Start applications though it’s a different story.

The Sun JDK includes a little-known tool called pack200.  It is a compression utility designed specifically for compressing JAR files.  Because Pack200 understands the class file format used by the archive contents, it is able to make optimisations that are unavailable to general purpose tools.  Pack200 restructures the archive and the class files it contains and then GZips the result.

At this point I really didn’t think there was much scope for further reductions in size.  I was wrong.  That 276kb JAR that we started with, the one that started out as 1.4mb of compressed Java bytecode, the one that was squashed to just 193kb by GZip, was reduced to a tiny 81 kilobytes after Pack200 had finished with it.

Over the course of two blog posts, I’ve reduced my data transfer requirements by 94%.  Despite the difference in size, the two programs are functionally equivalent.  Of course, I shouldn’t pretend that compression is completely free.  The client machine will have to unpack the archive.  This will increase start-up processing, but since smaller files download more quickly, it’s still likely to be faster overall.
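For the record, here is the arithmetic behind the percentages in these two posts, treating 1.4mb as roughly 1400kb:

```latex
\begin{aligned}
\text{Proguard shrinking:}\quad & 1 - \frac{276}{1400} \approx 0.80 \\
\text{GZip of the compressed JAR:}\quad & 1 - \frac{251}{276} \approx 0.09 \\
\text{GZip of the uncompressed JAR:}\quad & 1 - \frac{193}{276} \approx 0.30 \\
\text{Pack200, relative to 276kb:}\quad & 1 - \frac{81}{276} \approx 0.71 \\
\text{Overall:}\quad & 1 - \frac{81}{1400} \approx 0.94
\end{aligned}
```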

Content-Negotiation

Using either Pack200 or GZip to compress network-launched Java applications requires content-negotiation on the web server.  The client tells the web server what encodings it supports and the web server responds with the most appropriate option.  In the case of applets, the client is the web browser.  None of the browsers that I have tested support Pack200, but they will all accept GZip.  For Web Start applications, the javaws launcher does accept Pack200, so that will be used where available.

Sun’s Pack200 page describes a servlet that can perform the necessary content-negotiation.  However, if you’re not already running a servlet container, it’s probably easier to use the features of your web server.  Chris Nokleberg has written some straightforward instructions on how to achieve this with Apache.
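If you do end up writing your own negotiation logic, the core decision is small.  Below is a rough sketch that picks which pre-compressed file to serve based on the Accept-Encoding request header.  It assumes the pack200-gzip encoding token and the .pack.gz / .gz file-naming convention that I understand Sun’s deployment documentation to use (check before relying on either), the class and method names are hypothetical, and it ignores wildcards and relative q-value preferences, honouring only q=0 as "not acceptable".  It needs Java 16 or later for the record.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.function.Predicate;

class EncodingNegotiator {

    // The file to serve and the Content-Encoding header to send with it (null means none).
    record Choice(String file, String contentEncoding) {}

    // Parses an Accept-Encoding header into the set of acceptable codings.
    // Parameters other than q are ignored; q=0 means "not acceptable".
    static Set<String> acceptedEncodings(String header) {
        Set<String> accepted = new HashSet<>();
        if (header == null) {
            return accepted;
        }
        for (String part : header.split(",")) {
            String[] pieces = part.split(";");
            String coding = pieces[0].trim().toLowerCase(Locale.ROOT);
            double q = 1.0;
            for (int i = 1; i < pieces.length; i++) {
                String param = pieces[i].trim().toLowerCase(Locale.ROOT);
                if (param.startsWith("q=")) {
                    try {
                        q = Double.parseDouble(param.substring(2));
                    } catch (NumberFormatException e) {
                        q = 0.0; // treat a malformed weight as a refusal
                    }
                }
            }
            if (!coding.isEmpty() && q > 0) {
                accepted.add(coding);
            }
        }
        return accepted;
    }

    // Prefers Pack200, then GZip, then the plain JAR, skipping variants that don't exist.
    static Choice choose(String jarPath, String acceptEncoding, Predicate<String> exists) {
        Set<String> accepted = acceptedEncodings(acceptEncoding);
        String packed = jarPath + ".pack.gz";
        if (accepted.contains("pack200-gzip") && exists.test(packed)) {
            return new Choice(packed, "pack200-gzip");
        }
        String gzipped = jarPath + ".gz";
        if (accepted.contains("gzip") && exists.test(gzipped)) {
            return new Choice(gzipped, "gzip");
        }
        return new Choice(jarPath, null);
    }

    public static void main(String[] args) {
        Predicate<String> allExist = file -> true;
        // A Web Start client that advertises Pack200 support.
        System.out.println(choose("app.jar", "pack200-gzip, gzip", allExist));
        // A typical browser.
        System.out.println(choose("app.jar", "gzip, deflate", allExist));
    }
}
```

Whatever sends the response should also set Content-Encoding to the chosen value (when it isn’t null) and include Vary: Accept-Encoding, so that caches don’t hand the packed file to a browser.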


The Incredible Shrinking Software

Posted in Java by Dan on November 7th, 2008


It may not seem important to those of us who develop server-side Java software, but size matters.  If you distribute your software on CD or DVD, you aren’t going to worry about 10 megabytes here and there.  The only Java developers who tend to put as much thought into optimising for size as they do into optimising for performance are those that work with JavaME and its constrained environments.  However, network-launched software, such as applets and Web Start applications, can suffer greatly from bloated binaries too.

In an era when users are used to interacting with AJAX web applications and snappy Flash-powered content, a start-up time that rivals that of a cassette game on a C64 is going to set your application apart for all the wrong reasons.  Of course, maybe your software is so good that the load time doesn’t matter and people will use it anyway.  Then you have another problem.  If you aspire to have a huge number of users for your huge application, data transfer costs are going to bite.

1995 called, it wants its RIA technology back…

So where am I going with this?  If you’ve been following along at home, you’ll know that I’ve been playing around with applets again recently.  Version 1.4.3 of this applet (the last build of its initial incarnation, circa 2002) weighed in at 20.9 kilobytes.  Version 2.0.3 (the last build from the 2005 rewrite) was a relatively bloated 39.7 kilobytes, but still smaller than many of the image files that people embed in their web pages.  But version 3.0 was/is going to be much more ambitious.  More features means more code.  And since I’d finally be giving up on AWT and moving to Swing, I really had no excuse not to replace the horrific custom graphs I’d hacked together for version 2.

Every Swing developer knows that if you need graphs, there’s really only one place you need to look: JFreeChart.  JFreeChart does everything you could ever possibly need to do with charts and graphs.  It does it well and even looks pretty good.  You can tweak just about everything in order to get exactly what you are looking for.

Your library is so fat it’s got its own postcode

There was just one problem with JFreeChart: its size.  1.6 megabytes is not huge in most contexts, but something about having a 50kb applet towing a 1.6mb dependency offended me.  Perhaps it was because it made my contribution to the whole a lot less significant, but it seemed to me a lot like a bicycle pulling a caravan.

I could have just accepted it as a fact of life.  Good, comprehensive libraries are unlikely to be small.  Broadband users would probably be able to accept the slightly longer start-up, but any dial-up users would give up long before they’d get to see anything.

So I looked at the alternatives.  Most were smaller, some were ugly and some seemed a bit too basic.  A couple looked like feasible replacements but, other than the size, I was happy with JFreeChart.  Would I be able to get the results I wanted with these more limited libraries?  I’d also have to spend some time figuring out how to use them.  As I saw it, there was only one solution.  JFreeChart would just have to become smaller.

It was pretty obvious that there was ample scope to achieve a significant reduction in JFreeChart’s size.  As already mentioned, JFreeChart is very comprehensive.  It supports several different types of charts, customisable renderers and all sorts of other optional stuff.  My applet was using only a small fraction of this functionality (line graphs and pie charts).  The rest of it could go.

The labour-intensive approach would have been to check out JFreeChart’s source code, start deleting code and hack around until something smaller emerged (and hopefully compiled).  Too much effort for me.  Which is where the “Incredible Shrinking Software” of the title comes in…

Enter Proguard

You may already be familiar with Proguard.  It’s arguably the premier open source obfuscator.  If you’ve ever wanted to make your Java software difficult to reverse-engineer, then you’ve probably already used it.  But obfuscation is only one of two complementary functions that Proguard performs.  The other is shrinking.

Obfuscation and shrinking are inextricably linked since both involve removing unnecessary information from an application’s compiled binaries.  Java class files contain information that is not necessary to run the code, JAR files contain classes that you don’t use, the classes that you do use contain methods that you will never call, and all those descriptive identifiers you used are way too verbose.

The low-hanging fruit of class file shrinkage is the debug information inserted by the compiler.  This includes the line number table that is used to insert something more useful than “unknown source” into exception stack-traces.  You can simply instruct the compiler to omit this data (-g:none) and the class files will be smaller.  The downside of this is that the information won’t be there if you need it.

Proguard goes much further than this though.  You give it one or more entry points (in this case a class that extends Applet, but it could be a class with a main method, or something else).  From this, Proguard finds code from the input JAR(s) that is definitely not used and removes it.  In addition, where it won’t break anything, members are renamed with shorter names, such as ‘a’ and ‘b’, resulting in further reductions in code size (again at the expense of easy debugging - stack traces will have neither meaningful names nor source code line numbers).
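To make the shrinking idea concrete, here is a toy model of it: treat the program as a graph in which each class or method lists what it references, walk that graph from the entry points, and anything never reached can go.  This is a simplification for illustration, not how Proguard is actually implemented.  Proguard works on real bytecode, and anything that is only accessed via reflection has to be declared explicitly with its -keep options or it will be stripped.  The member names below are made up, and renaming is not modelled.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class ReachabilityShrinker {

    // Walks the reference graph from the entry points and returns everything reached.
    static Set<String> reachable(Map<String, List<String>> references,
                                 Collection<String> entryPoints) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> work = new ArrayDeque<>(entryPoints);
        while (!work.isEmpty()) {
            String member = work.pop();
            if (seen.add(member)) {
                for (String ref : references.getOrDefault(member, List.of())) {
                    if (!seen.contains(ref)) {
                        work.push(ref);
                    }
                }
            }
        }
        return seen;
    }

    // Everything that is defined but never reached can be stripped out.
    static Set<String> removable(Map<String, List<String>> references,
                                 Collection<String> entryPoints) {
        Set<String> dead = new TreeSet<>(references.keySet());
        dead.removeAll(reachable(references, entryPoints));
        return dead;
    }

    public static void main(String[] args) {
        // Hypothetical program: an applet that only draws line graphs and pie charts.
        Map<String, List<String>> refs = Map.of(
                "MyApplet.init", List.of("LineChart.draw", "PieChart.draw"),
                "LineChart.draw", List.of("Axis.draw"),
                "PieChart.draw", List.of(),
                "Axis.draw", List.of(),
                "GanttChart.draw", List.of("Axis.draw"),
                "PolarChart.draw", List.of());
        List<String> entryPoints = List.of("MyApplet.init");
        System.out.println("kept:    " + new TreeSet<>(reachable(refs, entryPoints)));
        System.out.println("removed: " + removable(refs, entryPoints));
    }
}
```

Note that GanttChart.draw is removed even though it references Axis.draw, which is kept: reachability only flows outward from the entry points.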

How low can you go?

So, how small did I make JFreeChart?  Well, I cheated slightly by reverting from version 1.0.11 to version 1.0, which had all the functionality I needed but was only 1.3mb in size.  With my applet code weighing in at just over 100kb unobfuscated, the combined applet + JFreeChart size was 276kb after shrinking, a total size reduction (for applet and library combined) of about 80%.

Pretty good, hey?  Still bigger than I’d like though.  Proguard has taken me as far as it can, but 276kb is not the limit of my ambition.  There are still further reductions that can be made without sacrificing functionality. To be continued…


Swing Applications on OS X

Posted in Java, Mac by Dan on November 6th, 2008


This post is mostly for my future reference as I keep forgetting this information and have to search for it each time. These links demonstrate the little tweaks that you can make to your Swing applications to improve the user experience under OS X.

Make Your Swing App Go Native (Part 1, Part 2, Part 3)
Bringing your Java Application to Mac OS X (Part 1, Part 2)

Preview Version of FSA 3.0 / Full Historical FA Premier League Statistics

Posted in Java by Dan on November 6th, 2008


A while ago I wrote about my decision to resurrect and open source one of my oldest projects, the Football Statistics Applet.  Being an AWT-based Java 1.1 applet, written by my less experienced self, the code was pretty clunky.

After a period of inactivity, I’ve spent a lot of time in the last week on the new Swing UI and other enhancements.  Many of the improvements I have in mind for version 3.0 are still to be done, but the updated software is at least useful, if you are interested in football statistics.

Version 3.0 head-to-head view (OS X)

I have set up a demo page for a pre-release build of 3.0 that displays statistics for every English Premier League season since 1992.  It also generates an all-time table that combines the separate seasons into one.  If you follow English football, you may find it interesting.  Any feedback should be directed to the issue tracker.

Are applets dead?  Maybe, though Sun doesn’t think so.  Even so, the refactored FSA codebase offers opportunities for other non-applet projects in the future.


ReportNG 0.9.8 - HTML and XML reports for TestNG

Posted in Java by Dan on October 21st, 2008


Version 0.9.8 of ReportNG is now available for download.  This version addresses a couple of issues with the XML output from the JUnitXMLReporter:

The XML output now includes failed and skipped configuration methods.  Previously these were included in HTML reports but omitted from the XML.

You can now control the dialect of the XML that is generated.  The default is to use the version that TestNG’s own reporter generates.  This includes the ability to mark tests as skipped and works well with Hudson.  Not all tools recognise the <skipped> element though, so you can now set the org.uncommons.reportng.xml-dialect property to “junit” (as opposed to “testng”) and it will mark skips as failures.  This works better with Ant’s junitreport task.
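To illustrate the difference between the two dialects, here is a hypothetical sketch of how a skipped test might be rendered in each.  It is not ReportNG’s code: only the property name and its “junit” and “testng” values come from this release, it assumes the property is read as a JVM system property (e.g. -Dorg.uncommons.reportng.xml-dialect=junit), and the failure message text is invented.  The <skipped/> and <failure> elements are those of the common JUnit-style XML format.

```java
class XmlDialectSketch {

    // Property name from the ReportNG 0.9.8 release notes; everything else is hypothetical.
    static final String DIALECT_PROPERTY = "org.uncommons.reportng.xml-dialect";

    // Minimal escaping for XML attribute values (ampersand must be replaced first).
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    // Renders a skipped test as a JUnit-style <testcase> element in the given dialect.
    static String skippedTestCase(String className, String methodName, String dialect) {
        String open = "<testcase classname=\"" + escape(className)
                + "\" name=\"" + escape(methodName) + "\">";
        if ("junit".equals(dialect)) {
            // Tools such as Ant's junitreport don't understand skips, so report a failure.
            return open + "<failure message=\"Test skipped\"/></testcase>";
        }
        // Default "testng" dialect: mark the test with a <skipped/> element.
        return open + "<skipped/></testcase>";
    }

    public static void main(String[] args) {
        String dialect = System.getProperty(DIALECT_PROPERTY, "testng");
        System.out.println(skippedTestCase("com.example.FooTest", "testBar", dialect));
    }
}
```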

In addition, there have been a couple of enhancements to the HTML reporter:

There is now a separate page that collates all of the reporter log statements.

You can now specify your own stylesheet to override the default appearance of the generated report.  Just set the org.uncommons.reportng.stylesheet property to the path of your CSS file.  For example, the sample report looks like this (pictured) when using a custom Hudson-inspired stylesheet.

Thanks to Ron Saito and Mike Feinberg for the feedback and suggestions that were incorporated into this release.  If you have any problems, please use the issue tracker.  And if you come up with a good custom CSS file for the HTML reports, please consider submitting it so that it can be included in the distribution.


Java Power Tools

Posted in Java by Dan on October 13th, 2008


I’ve been keen to take a look at John Ferguson Smart’s Java Power Tools since I first found out about it.  Fortunately, it has just been added to the ACM’s online books programme so, as an ACM member, I’ve been able to read it online.

The book consists of 30 chapters, each dedicated to a different development tool.  Build tools, version control, continuous integration, testing, profiling, static analysis and issue-tracking are among the topics covered.  For most tasks, more than one option is presented.  For example, the book covers both Ant and Maven, and JUnit and TestNG.  All of the tools covered are open source and freely available.


Some of the chapters will only be of interest to beginning Java developers.  I imagine that most Java professionals already know how to use Ant and some kind of version control system.  On the other hand, the book also introduces some tools which are not so well-known, so you are sure to find something useful here.

CVS and Subversion are the version control options demonstrated.  I can’t help thinking that Git (or even Mercurial) would have been a better choice for inclusion than CVS.  Usage of distributed version control systems is growing whereas CVS has effectively been supplanted by Subversion.

Elsewhere there are no such omissions.  The author covers four different continuous integration servers: CruiseControl, Continuum, LuntBuild and Hudson.  This is probably overkill.  I haven’t used LuntBuild, but I would quickly dismiss CruiseControl and Continuum in favour of Hudson.  It would have been sufficient to cover Hudson and one other.

The coverage of testing tools is particularly thorough, and is probably the most useful part for experienced developers.  Not only does it cover JUnit 4 and TestNG, but it also goes into some detail on a variety of related tools, such as DbUnit, FEST and Selenium, and performance testing tools including JMeter and JUnitPerf.

I found the chapter on the JDK’s profiling tools to be useful and there is also a chapter on profiling from Eclipse, but nothing on the NetBeans profiler.  This is my only real gripe with the book.  Three of the chapters are Eclipse-only with no alternatives offered for users of other IDEs.  One of these is the chapter on the Jupiter code review plug-in.  ReviewBoard might have been a better choice.

All-in-all though, this is a substantially useful book.  At 910 pages it covers a broad range of topics without skimping on the necessary detail.  There are dozens of ideas for improving and automating your software development processes.

If you want more information, Meera Subbarao at JavaLobby has also reviewed Java Power Tools.


Distributed Evolutionary Algorithms with Watchmaker and Hadoop

Posted in Evolutionary Computation, Java by Dan on October 1st, 2008


One feature that has been on the TODO list of the Watchmaker Framework for Evolutionary Computation for some time is the ability to distribute the evolution across several machines.  Some time last year I started on an RMI-based solution, but I wasn’t happy with it so I deleted it and put the idea on the back burner while I concentrated on other things.  At some point I wanted to investigate using Terracotta, or possibly Hadoop, to distribute the computations.

However, it’s often the case with Open Source software that somebody smarter comes along and does the hard work for you.  I was delighted to find out today that Abdel Hakim Deneche has been busy integrating Watchmaker with the Apache Mahout project as part of Google’s Summer of Code programme.

I’d never heard of Mahout before.  According to Wikipedia, a Mahout is somebody who drives an elephant.  Apache Mahout is a sub-project of Lucene, the Java text search and indexing engine.  The Mahout project is focused on building scalable machine-learning libraries using Hadoop (presumably where the elephant connection comes in).

I haven’t yet tried using the Mahout software, but it looks like it provides a pretty straightforward way to distribute the fitness evaluations for just about any evolutionary algorithm implemented using Watchmaker.


More thoughts on Stackoverflow.com

Posted in Software Development, The Internet by Dan on September 26th, 2008


Since my previous post on the subject, Stackoverflow.com has moved from private beta to public beta.  I’ve had more time to use the site and have some more thoughts.  The criticisms here are meant to be constructive.  Hopefully the feedback from users will help the Stackoverflow team to make a good site even better.

Performance

First the good news.  The site has transitioned from private to public very well.  Jeff and his team seem to have got it right in terms of architecture and infrastructure because, even with the increased load, it remains blindingly fast.

Front Page

In terms of usability, I think there’s more that could be done to help me find the content that I’m interested in.  The default front page is, to be honest, not very useful.  New questions are coming in so fast and on so many topics that displaying the most recent questions is just noise.

I would prefer to have a personalised home page that shows me relevant questions based on my previous answering/voting history.  I realise that this is major new functionality and I’m not criticising the Stackoverflow team for not having it in the initial version; it makes sense to get the site up and running first.  However, it would be great if this could be implemented at some point.  I’m not alone on this one: it’s the second most requested feature at the moment.

Presently I’m finding stuff that I want to look at by going to the tags page and clicking on interesting topics.  But I’m sure I’m missing out on questions that would be of interest if only I could find them.

Tag Cloud

The tag cloud on the right of the front page isn’t very helpful either.  It’s ordered with the most recent first.  If I just want to view questions tagged “html”, I’m going to struggle to find the tag in the cloud.  An alphabetical ordering would be more usable.  Unfortunately, this has already been suggested and rejected.

Voting and Reputation

I outlined my concerns on the voting mechanism previously.  In the interests of being constructive, rather than just a whiny blogger, I’ve opened new issues on the Stackoverflow Uservoice page.  If you agree with me, please vote on these issues:

Addressing each of these will help in resolving The Fastest Gun in the West Problem (currently the number one voted-on issue).  The problem is that early answers get the votes and later, better answers are largely ignored.  Removing the penalty for down-voting will encourage more down votes where they are deserved (so an early answer that is later shown to be wrong is less likely to retain a high score).  Also, if a down vote was as powerful as an up vote, people might be more careful in crafting good answers as opposed to quick answers.


Source Control and Backups - More than just a good idea

Posted in Software Development by Dan on September 25th, 2008


Are there really software development teams out there that don’t use any form of proper source control at all, even the bad kind?  I’d like to think that it wasn’t the case but I’m not so naive.

There’s a reason that “Do you use source control?” is the first question on the Joel Test.  It’s because it’s the most important.  If you answer “no” to this question you shouldn’t be allowed to answer subsequent questions.  Even if the rest of your process is perfect, you score zero.  You failed at software development.  I could say that if your team doesn’t use source control it is a disaster waiting to happen, but more likely the disaster already happened and you haven’t noticed yet.

Of course, you and I aren’t nearly dumb enough to try developing anything more complex than “Hello World” without version control in place.  I’m sure I’m preaching to the converted.  The kind of people who read obscure software development blogs probably already know a few things about effective software development.

But how good are your back-ups?

You do have a back-up, don’t you?

If you don’t have a back-up you are one accidental key-stroke or one hardware failure away from scoring zero on the Joel Test (under my rules)… and failing at software development.  Hardware will fail, people will screw up, disgruntled former employees will set fire to the building.  None of these events is the real problem; failing to anticipate them and prepare is.

How often do you back-up?

There is only one right answer to this: every day.  Weekly back-ups are too costly.  Can you really afford to have your whole team redo an entire week’s work?  The first time you lose a week’s work you will switch to daily back-ups, so why not just do it now?

A melted back-up is no back-up at all

Use off-site storage.  You could physically take tapes to another location or you could upload files to a remote server.  Just don’t keep them in the same building as the originals.

Does it actually work?

Honestly, have you ever tried restoring your source control back-up onto a different machine?  The most comprehensive back-up plan imaginable is useless if you can’t restore the back-ups.  If you haven’t seen it working (recently) then it doesn’t work.  There’s a good time and a bad time to find out that your back-ups don’t work.  15 minutes after your source control server spontaneously combusted is the bad time.
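One cheap way to check a trial restore is to compare it file by file against a known-good copy.  The sketch below assumes Java 17 or later (for HexFormat) and uses class and method names of my own invention.  It walks both directory trees, hashes every file with SHA-256 and lists anything missing, extra or different.  It is no substitute for actually building the project from the restored repository, but an empty list is a good start.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HexFormat;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.stream.Stream;

class RestoreCheck {

    // SHA-256 of a byte array, as lower-case hex.
    static String sha256(byte[] data) {
        try {
            return HexFormat.of().formatHex(MessageDigest.getInstance("SHA-256").digest(data));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 should always be available", e);
        }
    }

    // Maps every regular file under root (by relative path) to its digest.
    static SortedMap<String, String> digests(Path root) throws IOException {
        SortedMap<String, String> result = new TreeMap<>();
        try (Stream<Path> paths = Files.walk(root)) {
            for (Path file : paths.filter(Files::isRegularFile).toList()) {
                result.put(root.relativize(file).toString(), sha256(Files.readAllBytes(file)));
            }
        }
        return result;
    }

    // Lists files that are missing from, extra in, or different in the restored tree.
    static List<String> differences(Path original, Path restored) throws IOException {
        SortedMap<String, String> expected = digests(original);
        SortedMap<String, String> actual = digests(restored);
        List<String> problems = new ArrayList<>();
        for (var e : expected.entrySet()) {
            String got = actual.get(e.getKey());
            if (got == null) {
                problems.add("missing: " + e.getKey());
            } else if (!got.equals(e.getValue())) {
                problems.add("differs: " + e.getKey());
            }
        }
        for (String path : actual.keySet()) {
            if (!expected.containsKey(path)) {
                problems.add("extra: " + path);
            }
        }
        return problems;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: java RestoreCheck.java <known-good-dir> <restored-dir>");
            return;
        }
        List<String> problems = differences(Paths.get(args[0]), Paths.get(args[1]));
        problems.forEach(System.out::println);
        System.out.println(problems.isEmpty() ? "restore matches" : problems.size() + " problem(s)");
    }
}
```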

Are you still here?  You should be checking those back-up tapes…

UPDATE: The good people of Stackoverflow are discussing what could possibly be a good excuse for not using source control.
