xpkg: generic package management software
Many projects benefit from package-based distribution with dependencies. Famously, Perl has CPAN and Debian has apt. It can be argued these two projects are alive and successful because of CPAN and apt. Packages can be downloaded from central package repositories and installed on a local machine. If a package declares dependencies on other packages then those packages are installed as well. Many package management systems are quite similar and it would make sense if they could use the same package management software on the client and server.
xpkg is just such a set of generic package management tools. I've extracted these tools from the xjs project, and they will be able to work on the client and server, on any operating system, for any project that could benefit from package distribution. Think of things like a central emacs extensions repository, a central C library repository, a central anything plugin repository. There are plenty of projects that could benefit from package-based distribution and a central package repository but cannot justify writing the software to manage the packages on the client or server. xpkg to the rescue.
Here is how it will work on a UNIX-like system.
First, install the generic xpkg package management client software.
curl -O http://xpkg.michaux.ca/releases/XPKG-0.01.tar.gz
tar xvzf XPKG-0.01.tar.gz
cd XPKG-0.01
perl Makefile.PL
make
sudo make install
Second, set up a "package set" named ourcobol to manage system-wide Cobol packages. (Note this is an example. The sources don't actually exist.)
sudo xpkg create ourcobol \
source=http://cobol.org/packages \
source=https://mycompany.com/internal_cobol_packages \
architecture=cobol85 \
architecture=cobol \
libdir=/usr/local/lib/cobol \
bindir=/usr/local/bin
Third, install the cobol-on-rails public package, the company-specific get-rich-quick code and all their dependencies in the ourcobol package set.
sudo xpkg install ourcobol cobol-on-rails
sudo xpkg install ourcobol get-rich-quick
Easy.
Suppose just one system user, sue, would like to manage her local emacs extensions from some central repository with a package set sue-emacs.
xpkg create sue-emacs \
source=http://emacs-packages.com/packages \
libdir=/home/sue/.elisp
xpkg install sue-emacs snippets
Many package sets, all with their own configuration variables/installation directories, can be managed on the same system by a single xpkg install.
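The dependency behavior described at the top of this post (installing a package also installs whatever it declares as dependencies) amounts to a depth-first walk over a package index. Here is a minimal sketch in JavaScript. It is not xpkg's actual Perl code: the index contents, the `dependencies` field and the `installOrder` function are all hypothetical, invented only to illustrate the idea.

```javascript
// Hypothetical package index: names map to declared dependencies.
// None of these dependency lists come from a real repository.
var index = {
  'cobol-on-rails': { dependencies: ['cobol-http', 'cobol-orm'] },
  'cobol-http':     { dependencies: ['cobol-strings'] },
  'cobol-orm':      { dependencies: ['cobol-strings'] },
  'cobol-strings':  { dependencies: [] }
};

// Return the packages to install, dependencies first, each exactly once.
function installOrder(index, name) {
  var order = [];
  var state = {}; // package name -> 'visiting' or 'done'

  function visit(pkg) {
    if (state[pkg] === 'done') { return; }
    if (state[pkg] === 'visiting') {
      throw new Error('dependency cycle at ' + pkg);
    }
    if (!Object.prototype.hasOwnProperty.call(index, pkg)) {
      throw new Error('unknown package ' + pkg);
    }
    state[pkg] = 'visiting';
    index[pkg].dependencies.forEach(function (dep) { visit(dep); });
    state[pkg] = 'done';
    order.push(pkg);
  }

  visit(name);
  return order;
}

var order = installOrder(index, 'cobol-on-rails');
console.log(order.join(' '));
// cobol-strings cobol-http cobol-orm cobol-on-rails
```

A real client would also have to compare versions and skip packages already present in the package set, which this sketch leaves out.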
I'm currently writing an xpkg implementation in Perl. Other xpkg implementations could be written in Lisp, C, Python or Java, for example, but Perl happens to be a perfect fit for this project. I never quite thought I'd be excited about Perl for anything but it really does suit this project well. Since Perl is on virtually every UNIX-like system already and can work on Microsoft Windows (from what I understand) it keeps the apparent installation cost low. The only other "neutral" language option was C and that really seems like development overkill as Perl does all the necessary things for package management.
xjs is distributed completely through xpkg. This removes a great deal of complexity from the xjs project as it doesn't have to bootstrap itself to run its own package management system written in JavaScript. I've talked a bit with the folks from the Helma NG project. They seem interested in having a drop-in package management system. I'm looking for other projects that would also be interested in having free and easy package management software to ensure the xpkg code is general enough for their needs. If you are interested please leave a comment below.
Fork v0.2 client-side JavaScript planning discussions
I've been working on version 0.2 of the Fork client-side JavaScript library. I released version 0.1.1 almost a year and a half ago and have been using it in production without incident ever since. Along with some API changes, the truly fundamental goal for version 0.2 is making it easier to use robust feature testing in the application layers built on top of the Fork library. The mainstream client-side JavaScript libraries do not really provide any support for developers wishing to employ the progressive enhancement and feature testing philosophy (though they often pay lip service to it). That is hard to believe given these ideas are well accepted as the right way to script browsers.
I've posted some messages about version 0.2 to the Fork discussion group and I hope some folks would like to join in the discussion. In particular, I think the following messages discuss some of the most interesting aspects of client-side library design for the browser (a hostile runtime environment).
An Important Pair of Parens
Every single time I read code like the following it seems like someone has particularly gone out of his way to trick me and waste a little bit of my time.
var f = function() {
  // function body code
  // ...
  // ...
}();
I happen to read code from top to bottom. You might too. When I see the first line above, I think "ok, this function literal is being assigned to f." Then I proceed reading the function and investing energy thinking about the function and what it will do when f is called. The trailing parens after the closing brace are the nasty little surprise waiting for me when I get to the bottom, after all that investment. That is when I learn that the function literal is not being assigned to f but rather the function literal is being called and its return value (perhaps an integer, for example) is being assigned to f. It would have been nice to have known this when I started reading the function. An early warning would be especially appreciated for functions over a screen long (perhaps with many comments) where the trailing pair of parens is not even visible without scrolling the page.
Yes, I know I should always look first to see if the function is being automatically evaluated, but normal assignment of a function literal is so much more common than automatic evaluation that it is a knee-jerk reaction to assume it is a normal assignment. Habitual assumptions that things are normal are hard to break. (If someone changes your editor's key bindings how well can you type?)
I always write the above type of code like the following.
var f = (function(){
  // function body code
  // ...
  // ...
})();
That extra opening paren before function lets me know something unusual is happening. That little bit of extra info really helps when reading code top to bottom by giving the automatic execution of a function literal a little prefix syntax of its own. This is only a convention but a very appreciated one.
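To make the difference concrete, here is a small runnable sketch. The names `f` and `g` and the value 42 are only for illustration.

```javascript
// Without the call parens, f holds the function itself.
var f = function () {
  return 42;
};

// With the call parens, g holds the return value. The opening paren
// before `function` announces this on the very first line.
var g = (function () {
  return 42;
})();

console.log(typeof f);  // "function"
console.log(typeof g);  // "number"
console.log(f() === g); // true
```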
It turns out the extra set of parens is necessary when using automatic function execution for the module pattern. If an entire file is inside one of these anonymous functions the parens must be like the following.
(function() {
  // function body code
  // ...
  // ...
})();
So it is more consistent to always use the extra set of parens to indicate automatic execution, and it will make your code less painful for others to read. Please, for the good of your code's readers, always include the extra pair of parens.
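The reason the parens are required at the start of a statement is that a statement beginning with the function keyword is parsed as a function declaration, which needs a name and cannot be called on the spot. The sketch below checks this by handing both forms to the parser through the Function constructor. The helper name `parses` is made up for this example.

```javascript
// Try to compile a snippet of source as a function body and report
// whether the parser accepts it. Nothing is actually executed.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) { return false; }
    throw e;
  }
}

// At the start of a statement, `function` begins a function declaration,
// which needs a name, so the bare form is rejected.
var bare = parses('function() { var hidden = 1; }();');

// The leading paren makes it a function expression, which can be called.
var wrapped = parses('(function() { var hidden = 1; })();');

console.log(bare);    // false
console.log(wrapped); // true
```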
Module Pattern Provides No Privacy...at least not in JavaScript(TM)
The module pattern has been discussed many times and has shown how ECMAScript has the ability to encapsulate data as "private" variables by using closures.
Today, in a comment on my blog, a reader, haysmark, points out that Mozilla's JavaScript(TM), the implementation in Firefox, has a second argument extension to eval that allows external code to spy on otherwise private variables.
Try the examples below in Firefox.
// Getting "private" variables
var obj = (function() {
  var a = 21;
  return {
    // public function must reference 'a'
    fn: function() {a;}
  };
})();
var foo;
eval('foo=a', obj.fn);
console.log(foo); // 21

// Setting "private" variables
var obj = (function() {
  var a = 21;
  return {
    getA: function(){return a;},
    alertA: function(){alert(a);}
  };
})();
console.log(obj.getA()); // 21
eval('a=3', obj.getA);
console.log(obj.getA()); // 3
obj.alertA(); // 3
This use of eval delivers a blow to the usefulness of statements like "JavaScript provides the means to construct durable objects that can perfectly guard their state by using a variation of the Module Pattern." Perhaps an ECMAScript implementation with no extensions can provide such security but one of the most important implementations, JavaScript(TM) in Firefox, apparently does not.
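For comparison, here is a sketch of what the "getting" example does in an engine that follows the ECMAScript specification without Mozilla's extension. There, eval uses only its first argument and ignores the rest, so the string is evaluated in the caller's scope, where the private variable is not visible. The names `vault`, `secretValue` and `trySpy` are made up for this illustration.

```javascript
var vault = (function () {
  var secretValue = 21;
  return {
    getSecret: function () { return secretValue; }
  };
})();

// Attempt the same trick as above. Without the extension the second
// argument is ignored and the lookup of secretValue fails.
function trySpy() {
  try {
    return eval('secretValue', vault.getSecret);
  } catch (e) {
    return e.name;
  }
}

var spyResult = trySpy();
console.log(spyResult);         // "ReferenceError"
console.log(vault.getSecret()); // 21
```

In a Firefox version that still has the extension, the same call would presumably return 21 instead of throwing.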
This use of eval, however, doesn't make the module pattern useless. Its primary benefits are modularizing code so similarly named variables do not collide and protecting you or other developers from accidentally violating a programming interface. The module pattern also makes it possible to do OOP-like things without using the keywords new, this and prototype, which generally makes code more robust.
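As an illustration of that last point, here is a minimal sketch of an object built with closures only. The `makeCounter` function and its method names are hypothetical, chosen just for this example.

```javascript
// A counter built with closures only: no new, this or prototype.
function makeCounter(start) {
  var count = start; // visible only to the functions below

  function increment() {
    count += 1;
    return count;
  }

  function value() {
    return count;
  }

  return { increment: increment, value: value };
}

var counter = makeCounter(5);

// Methods keep working when detached from the object because they
// never depend on `this`.
var bump = counter.increment;
bump();
bump();

console.log(counter.value()); // 7
```

Passing `counter.increment` around as a callback is safe here, which is one way the pattern avoids a common class of bugs.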
So the module pattern is still good. It just doesn't provide any security in a major browser.
Thanks to haysmark for the comment today. A big part of the reason I blog is to put ideas out there to see if they are shot down. I'm interested to see if this idea is shot down.
Update June 27, 2008 This post has been discussed at some other places: Caja Discussion Group, Ajaxian, Simon Willison's Blog
Update July 2, 2008 Apparently this post mattered and the second argument to eval will be gone in Firefox 3.1. More discussion on Douglas Crockford's Blog, Ajaxian, and John Resig's blog.
I didn't necessarily think JavaScript(TM) should be changed at all. Patching this security hole doesn't make JavaScript "secure". Since there are so many ECMAScript implementations, the ECMAScript specification allows for implementation extensions to the language, and there is no governing body certifying all publicly available implementations as "secure", the language will never be secure. I just thought this example was kind of a neat oddity and it seemed many others found it surprising as well. If security is a concern then ensure all the code you allow in your page from third parties is safe. Projects like Caja and ADsafe, which define a safe subset of JavaScript and/or automate checking of third party code, are more likely the way forward. I have also been thinking legal business agreements would be a good idea before allowing third party script on pages.
Don't Choose Your Middleware Language or Architecture: they are consequences
I'm writing web applications and need to decide on every piece of technology involved. What is a logical set of choices?
I want the client-side barrier to be as low as possible. By default this means the client-side will be HTML, CSS and JavaScript. I won't be using Flash or Java Applets.
On the server-side, I want the data stored in a proven reliable system. There are many technologies on the rise (e.g. CouchDB) but right now it seems that a relational database management system (RDBMS) like PostgreSQL is the solid choice. Data storage is not where I want to be on the cutting edge.
So what do the above choices mean for my server-side middleware language and architecture?
Don't repeat yourself (DRY) is a good programming practice. If the same logic must be shared on the client and the server then how will that be done in a DRY manner? If the middleware language is not JavaScript then the code could be written in the middleware language and compiled to JavaScript. This is a debugging nightmare since the client-side involves many different implementations each with many different bugs. Debugging compiled code in one language and then finding the error in source code written in another is not a welcome thought. Another option is to write an interpreter in JavaScript for the middleware language and then run that interpreter in the browser. Client-side performance would be very poor. With either of these options there is a whole lot of code that must be written (i.e. compiler or interpreter) before I can start programming my business concerns.
DRY is about more than just sharing source code. Needing to switch between two languages all day long means repeating the process of becoming an expert in two languages. Knowing multiple languages is a very good thing and I've studied several at least at an introductory level; however, becoming a true expert in a language takes years. One day I might be an expert in just one language. It makes sense to me to get the most return on my invested time and learn one language well and program production code better on both the client and server sides rather than splitting my time between two languages and writing mediocre code everywhere.
So my server-side language will be JavaScript which, though not my first choice, just so happens to be a fine language.
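Here is a minimal sketch of what sharing one piece of logic between client and server can look like once both sides run JavaScript. The validation rule and the `isValidUsername` name are hypothetical, and the guarded export assumes a CommonJS-style server environment, which is only one possibility.

```javascript
// One rule, written once, used by browser form code and server handlers.
// The rule itself (3 to 16 letters, digits or underscores) is made up.
function isValidUsername(name) {
  return typeof name === 'string' && /^[A-Za-z0-9_]{3,16}$/.test(name);
}

// Export for a CommonJS-style server environment when one is present.
// In a browser the function is simply a global loaded by a script tag.
if (typeof module !== 'undefined' && module.exports) {
  module.exports.isValidUsername = isValidUsername;
}

console.log(isValidUsername('sue_42')); // true
console.log(isValidUsername('x'));      // false
```

The browser can use the function to give immediate feedback in a form, and the server can run the very same function again before touching the database.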
Persisting data in the database is a fundamental activity of a web application. The object-relational impedance mismatch is well known and is a result of choosing an RDBMS for its strengths but still wanting an object-oriented programming paradigm. An object-relational mapper (ORM) is used to bridge this gap. Since the gap is reasonably wide, the ORM is a substantial piece of software. This is another big piece of software that needs to be written before business logic can be expressed. There are many ORMs available but whenever the object model doesn't quite match a normalized database schema for some complex data the whole deal seems to fall apart and it is a great struggle to wrestle the ORM into submission.
There are many examples of data sets which we manipulate daily without an ORM. For example, the UNIX operating system provides all sorts of tools for you to create, read, edit, move and delete the files on your hard disk. When you are at the terminal command line and use ls -l you receive rows of data you then loop over or pipe to grep. This is not object-oriented and it works well. In our server-side programming of a web application, why don't we just build tools (i.e. functions) which abstract database access and operate on sets?
So I won't be using an ORM. I'll embrace the idea of sets.
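A minimal sketch of the idea, assuming query results arrive as plain arrays of row objects. The rows and the `where` and `select` helpers are hypothetical stand-ins, not part of any particular database library.

```javascript
// Stand-in for rows returned by a SQL query; the data is made up.
var rows = [
  { name: 'notes.txt', owner: 'sue', size: 120 },
  { name: 'app.js',    owner: 'bob', size: 4096 },
  { name: 'todo.txt',  owner: 'sue', size: 48 }
];

// Small tools that take a set of rows and return a new set.
function where(rows, predicate) {
  return rows.filter(predicate);
}

function select(rows, columns) {
  return rows.map(function (row) {
    var picked = {};
    columns.forEach(function (column) { picked[column] = row[column]; });
    return picked;
  });
}

// Comparable to `ls -l | grep sue`, then keeping two columns.
var sueFiles = select(
  where(rows, function (row) { return row.owner === 'sue'; }),
  ['name', 'size']
);

console.log(JSON.stringify(sueFiles));
// [{"name":"notes.txt","size":120},{"name":"todo.txt","size":48}]
```

Each tool is an ordinary function over a set, so tools compose the way commands do in a shell pipeline and no class hierarchy has to mirror the schema.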
My choices of client-side and data storage technologies have driven the resulting middleware language and architecture. It is more common that developers do it the other way around. For example, they first decide to program the server-side in Java or Ruby with a very object-oriented style because they like that language and style of programming. Then to indulge this whim they must develop thick adapter layers on either side of this middleware to interface with the database and client. Witness Hibernate and GWT in Java or ActiveRecord, RJS and HotRuby in Ruby. Something is wrong with the order of decisions being made if they result in a need for such large adapter layers to apparently "make things easier." It seems to me that going with the flow of the constraints outside the middleware layer will result in a much thinner middle layer.
Maybe JavaScript is not my favorite language but as consolation at least it has lambdas. Dealing with sets may sometimes be a bit more work or sometimes less than dealing with objects. Most importantly the other options for middleware language and architecture involve far more work, software and bugs.
(No need to bother with the "What about the libraries?" issue. I know the Perl folks are dedicated to CPAN and dying to bring this up whenever they can. If I weren't writing for the web then JavaScript wouldn't be the default language. I'd probably be using C or Scheme (but I certainly wouldn't be using Perl!). Anyway, wrapping jars for Rhino is a snap so the point is relatively moot.)