
Met up with Mike Meffie (an AFS Developer of Sine Nomine) and got a shuttle from the hotel to the 'Visitors Lobby'; only to find out that EACH BUILDING has a visitors lobby. One neat thing, Google provides free bikes (beach cruisers) to anyone who needs them. According to the receptionist, any bike that isn't locked down is considered public property at Google. However, it's hard to pedal a bike and hold a briefcase; so off we went hiking several blocks to the correct building. Mike was smart enough to use a backpack, but hiked with me regardless.
The food was quite good, a reasonably healthy breakfast including fresh fruit (very ripe kiwi, and a good assortment). The coffee was decent as well! After much discussion, it was decided that Mike & I would work towards migrating the community CVS repository over to git. Because git sees the world as 'patch sets' instead of just individual file changes, migrating it from a view of the 'Deltas' makes the most sense. The new git repo. (when complete) should match 1:1 to the Delta history. There was a good amount of teasing as to whether Mike and I could make any measurable progress in 2 days. Derrick was able to provide pre-processed delta patches and the bare CVS repo. (though we spent a good amount of the day just transferring things around and determining what machine should be used for development).
Lunch (rather tasty sandwiches) and after lunch snacks were provided; Google definitely doesn't skimp on the catering. Made good progress for one day of combined work, we now have a clear strategy for processing the deltas and initial code that is showing strong promise. Much teasing ensued that Mike & I should not be allowed to eat if we did not have the git repo. ready for use. Dinner was a big group affair of food, beer and Kerberos.
Day Two: After arriving with Mike Meffie via the shuttle, we found out that Tom Keiser (also of Sine Nomine) had been left behind! The shuttle driver was kind enough to go pick up Tom (who ended up at a related, but DIFFERENT hotel than the conference recommended) and bring him for questioning (or development, as the case may be). Determined that the major issue in applying the deltas was simply due to inconsistencies in what the 'base' import should consist of... After several rounds of cleanup, all but a few of the deltas (and those were fixed by hand) applied cleanly!
On the food side, Google outdid itself with these cornbread 'pizzas' that were extremely good. Once we started having a few branches to play with, things came together quickly... generating much buzz and excitement (at least, for us). We all split off for dinner, with a few of us escorting Tom to his train then getting some Indian food (on a rather busy day, as it was the 'Festival of Lights').
In Conclusion: We were able to get a clean specification WITH CONSENSUS for how we want to produce the public git repository. The specifications are even available on the OpenAFS wiki. The tools (found at '/afs/sinenomine.net/public/openafs/projects/git_work/') to produce this repo. are all in a rough working form, with only the 'merge' tool still needing some development effort. All of these efforts were definitely facilitated by Google providing a comfortable work environment, a solid internet connection and good food to keep us fueled through it all.
Things to do now:
One of our clients wanted confirmation that PostgreSQL will have no problem handling 1000 or 2000 databases on a single database cluster. I remember testing some years ago, probably on Postgres 7.2 or 7.3, creating 1000 or so databases and finding that it worked fine. But that was a long time ago, the software has changed, and I thought I should make sure my old experiment results still stand.
There's a PostgreSQL FAQ question, "What is the maximum size for a row, a table, and a database?" but no mention of the maximum number (or more importantly, maximum practical number) of databases per cluster. So I threw together a test script to create 10,000 databases, each with between (randomly) 1-5 tables with 2 columns each (INTEGER and TEXT), each getting randomly between 1-10 inserts with random data up to 100 or so characters in the TEXT field.
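For anyone who wants to repeat the experiment, here is a minimal sketch of what such a generator could look like. It is not the original test script: the database and table names (test_db_N, t_N), the column names, and the seed parameter are all assumptions made for illustration. It only builds the SQL text, which could then be fed to psql.

```python
import random
import string


def random_text(rng, max_len=100):
    """Return a random lowercase string of 1..max_len characters."""
    length = rng.randint(1, max_len)
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def database_statements(db_index, rng):
    """Yield the statements for one test database.

    The names test_db_N and t_N are hypothetical, chosen for this sketch.
    """
    db_name = f"test_db_{db_index}"
    yield f"CREATE DATABASE {db_name};"
    yield f"\\connect {db_name}"  # psql meta-command to switch databases
    # 1-5 tables, each with an INTEGER and a TEXT column
    for t in range(1, rng.randint(1, 5) + 1):
        yield f"CREATE TABLE t_{t} (id INTEGER, body TEXT);"
        # 1-10 rows of random data per table
        for row in range(1, rng.randint(1, 10) + 1):
            yield f"INSERT INTO t_{t} VALUES ({row}, '{random_text(rng)}');"


def generate_script(num_databases, seed=0):
    """Return the full list of statements for num_databases databases."""
    rng = random.Random(seed)
    lines = []
    for i in range(1, num_databases + 1):
        lines.extend(database_statements(i, rng))
    return lines


if __name__ == "__main__":
    # Print a small script; raise the count to 10000 for the full test.
    print("\n".join(generate_script(3)))
```

Seeding the random generator makes a run repeatable, which helps when comparing timings between Postgres versions.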
I ran the test on PostgreSQL 8.1, the default that ships with Red Hat Enterprise Linux 5 (x86_64). The hardware was a desktop-class HP with an Intel Core 2 @ 1.86 GHz that wasn't always idle.
The short answer: Postgres 8.1 handles 10,000 databases just fine. \l in psql generates a long list of databases, of course, but returns quickly enough. Ad-hoc concurrency testing was fine. Running queries, inserts, etc. on a handpicked group of the various play databases worked fine, including while new databases were being created. During the creation process, the last database creates seemed about as fast as the first. The whole run took 2.75 hours.
This all is hardly a big surprise, but maybe by documenting it I'll save someone the bother of running their own test in the future.
Addendum: The actual limit on this platform is probably 31995 databases, because each database occupies a subdirectory in data/base/ and the ext3 filesystem has a limit of 31998 sub-directories per one directory, stemming from its limit of 32000 links per inode. The other 5 would be ., .., template0, template1, and postgres. (Thanks, Wikipedia.)
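The arithmetic behind that number, spelled out as a tiny sketch. The constant and variable names are mine; the figures are the ones quoted above.

```python
EXT3_MAX_LINKS_PER_INODE = 32000  # figure quoted in the text

# Two of those links are the ones the text lists as "." and "..".
max_subdirs = EXT3_MAX_LINKS_PER_INODE - 2

# Databases that exist in a fresh cluster, per the text.
default_databases = ["template0", "template1", "postgres"]
max_user_databases = max_subdirs - len(default_databases)

print(max_subdirs, max_user_databases)
```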
As my colleague Jon mentioned, the Presidential Youth Debates launched its full debate content this week. And, as Jon also mentioned, the mix of tools involved was fairly interesting:
Our use of Postgres for this project was not particularly special, and is simply a reflection of our policy that for new systems we use Postgres by default. So I won't discuss the Postgres usage further (though it pains me to ignore my favorite piece of the software stack).
Radiant
Dan Collis-Puro, who has done a fair amount of CMS-focused work throughout his career, was the initial engineer on this project and chose Radiant as the backbone of the site. He organized the content within Radiant, configured the Page Attachments extension for use with Amazon's S3 (Simple Storage Service), and designed the organization of videos and thumbnails for easy administration through the standard Radiant admin interface. Furthermore, prior to the release of the debate videos, Dan built a question submission and moderation facility as a Radiant extension, through which users could submit questions that might ultimately get passed along to the candidates for the debate.
In the last few days prior to launch, it fell to me to get the new debate materials into production, and we had to reorganize the way we wanted to lay out the campaign videos and associated content. Because the initial implementation relied purely on conventions in how page parts and page attachments are used, the reorganization was straightforward: it required no code tweaks and was managed purely through the CMS. It ended up being quite -- dare I say it? -- an agile solution. (Agility! Baked right in! Because it's opinionated software! Where's my Mac? It just works! Think Same.)
For managing small, simple, straightforward sites, Radiant has much to recommend it. For instance:
That said, there are a number of things for which one quickly longs within Radiant:
The in-place editing UI features could presumably be added to Radiant given a reasonable degree of patience. The page attachment criticisms also seem achievable. The versioning, however, is a more fundamental issue. Many CMSes attempt to solve this problem many different ways, and ultimately things tend to get unpleasant. I tend to think that CMSes would do well to learn from version control systems like Git in their design; beyond that, integrate with Git: dump the content down to some intelligent serialized format and integrate with git branching, checkin, checkout, pushing, etc. That dandy, glorious future is not easily realized.
To be clear: Radiant is a very useful, effective, straightforward tool; I would be remiss not to emphasize that the things it does well are more important than the areas that need improvement. As is the case with most software, it could be better. I'd happily use/recommend it for most content management cases I've encountered.
Amazon S3
I knew it was only a matter of time before I got to play with Amazon S3. Having read about it, I felt like I pretty much knew what to expect. And the expectations were largely correct: it's been mostly reliable, fairly straightforward, and its cost-effectiveness will have to be determined over time. A few things did take me by surprise, though:
So, yeah, Amazon S3 has worked fine and been fine and generally not offended me overmuch.
Varnish
The Presidential Youth Debate project had a number of high-profile sponsors potentially capable of generating significant usage spikes. Given the simplicity of the public-facing portion of the site (read-only content, no forms to submit), scaling out with a caching reverse proxy server was a great option. Fortunately, Varnish makes it pretty easy; basic Varnish configuration is simple, and putting it in place took relatively little time.
Why go with Varnish? It's designed from the ground up to be fast and scalable (check out the architecture notes for an interesting technical read). The time-based caching of resources is a nice approach in this case; we can have the cached representations live for a couple of minutes, which effectively takes the load off of Apache/Rails (we're running Rails with Phusion Passenger) while refreshing frequently enough for little CMS-driven tweaks to percolate up in a timely fashion. Furthermore, it's not a custom caching design, instead relying upon the fundamentals of caching in HTTP itself. Varnish, with its Varnish Configuration Language (VCL), is extremely flexible and configurable, allowing us to easily do things like ignore cookies, normalize domain names (though I ultimately did this in Apache), normalize the annoying Accept-Encoding header values, etc. Furthermore, if the cache interval is too long for a particular change, Varnish gives you a straightforward, expressive way of purging cached representations, which came in handy on a number of occasions close to launch time.
A number of us at End Point have been interested in Varnish for some time. We've made some core patches: JT Justman tracked down a caching bug when using Edge-Side Includes (ESI), and Charles Curley and JT have done some work to add native gzip/deflate support in Varnish, though that remains to be released upstream. We've also prototyped a system relying on ESI and message-driven cache purging for an up-to-date, high-speed, extremely scalable architecture. (That particular project hasn't gone into production yet due to the degree of effort required to refactor much of the underlying app to fit the design, though it may still come to pass next year -- I hope!)
Getting Varnish to play nice with Radiant was a non-issue, because the relative simplicity of the site feature set and content did not require specialized handling of any particular resource: one cache interval was good for all pages. Consequently, rather than fretting about having Radiant issue Cache-Control headers on a per-page basis (which may have been fairly unpleasant, though I didn't look into it deeply; eventually I'll need to, though, having gotten modestly hooked on Radiant and less-modestly hooked on Varnish), the setup was refreshingly simple:
Header always set Cache-Control "public, max-age=120"
Header always set Vary "Accept-Encoding"
The Cache-Control header tells clients (Varnish in this case) that it's acceptable to cache representations for 120 seconds, and that all representations are valid for all users ("public"). We can, if we want, use VCL to clean this out of the representation Varnish passes along to clients (i.e. browsers) so that browsers don't cache automatically, instead relying on conditional GET. The Vary header tells caching clients (again, primarily Varnish here) to consider the "Accept-Encoding" header value of a request when keying cached representations.
An entirely separate domain exists that is not fronted by Varnish and allows access to the Radiant admin. We could have it fronted by Varnish with caching deactivated, but the configuration we used keeps things clean and simple. We use some simple VCL to tell Varnish to ignore cookies (in case of Rails sessions on the public site) and to normalize the Accept-Encoding header value to one of "gzip" or "deflate" (or none at all), which avoids caching different versions of the same representation due to inconsistent header values submitted by competing browsers.
Getting all that sorted was, as stated, refreshingly easy. It was a little less easy, surprisingly, to deal with logging. The main Varnish daemon (varnishd) logs to a shared memory block. The logs just sit there (and presumably eventually get overwritten) unless consumed by another process. A varnishlog utility, which can be run as a one-off or as a daemon, reads in the logs and outputs them in various ways. Furthermore, a varnishncsa utility outputs logging information in an Apache/NCSA-inspired "combined log" format (though it includes full URLs in the request string rather than just the path portion, presumably due to the possibility of Varnish fronting many different domains).
Neither one of these is particularly complicated, though the varnishlog output is reportedly quite verbose and may need frequent rotation, and when run in daemon mode, both will re-open the log file to which they write upon receiving SIGHUP, meaning they'll play nice with log rotation routines. I found myself repeatedly wishing, however, that they both interfaced with syslog.
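Stepping back from logging to the Accept-Encoding normalization mentioned earlier: the idea is easy to sketch outside of Varnish. The following is plain Python standing in for the VCL, not the VCL we deployed, and the function names are hypothetical. It assumes gzip is preferred over deflate when a browser offers both, and it ignores q-values for brevity.

```python
def normalize_accept_encoding(header_value):
    """Collapse an Accept-Encoding header to "gzip", "deflate", or None."""
    if not header_value:
        return None
    # Split "gzip, deflate;q=0.5" into bare lowercase coding names.
    codings = [part.split(";")[0].strip().lower()
               for part in header_value.split(",")]
    # Assumption: prefer gzip when both are offered.
    for preferred in ("gzip", "deflate"):
        if preferred in codings:
            return preferred
    return None


def cache_key(url, accept_encoding):
    """Key a cached representation by URL plus the normalized encoding,
    which is what honoring "Vary: Accept-Encoding" amounts to."""
    return (url, normalize_accept_encoding(accept_encoding))


if __name__ == "__main__":
    # Two browsers with differently spelled headers share one cache entry.
    print(cache_key("/debate", "gzip, deflate"))
    print(cache_key("/debate", "gzip,deflate,sdch"))
```

Without the normalization step, each distinct spelling of the header would produce its own cached copy of the same page.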
So, I'm very happy with Varnish at this point. Being a jerk, I must nevertheless publicly pick a few nits:
That's all I have to whine about, so either I'm insufficiently observant or the software effectively solves the problem it set out to address. These options are not mutually exclusive.
I'm definitely looking forward to further work with Varnish. This project didn't get into ESI support at all, but the native ESI support, combined with the high-performance caching, seems like a real win, potentially allowing for simplification of resource design in the application server, since documents can be constructed by the edge server (Varnish in this case) from multiple components. That sort of approach to design calls into question many of the standard practices seen in popular (and unpopular) application servers (namely, high-level templating with "pages" fitting into an overall "layout") but could help engineers maintain component encapsulation and think through more effectively the URL space, resource privacy, and scoping considerations (whether or not a resource varies per user, by context, etc.). But I digress. Shocking.
The Doxygen gave those new to OpenAFS code a chance to look under the covers of OpenAFS. Doxygen produces pretty nice output from simple formatting commands, so it's really just a matter of making comments follow some basic rules. Sample Doxygen output (from some previous work) can be seen here, and some of the new Doxygen changes made to OpenAFS are already upstream.
The Demand Attach work focused on the interprocess communications pieces, namely the FSSYNC & SALVSYNC channels, specifying requirements and outlining the approaches for implementing bi-directional communications so that the failure of one process would not leave a volume in an indeterminate state. Some coding was done to address some specific locking issues, but the design and implementation of better interprocess volume state management is still an open issue.
The OpenAFS Roadmap discussion revolved around 3 major pieces: CVS to Git conversion, Demand Attach, and Rx OSD. DAFS is in the 1.5.x branch currently, but Rx OSD is not. The general consensus was that DAFS plus some of Rx OSD might be able to go into a stable 1.6 release in Q1 of 2009, which would also let the Windows and Unix stable branches merge back together.
However, the major goal in the short term is to get the CVS to Git migration done to make development more streamlined. Derrick Brashear, Mike Meffie, and Fabrizio Manfredi are all working on this.
The 1.6 merge, DAFS, and Rx OSD are all still very much works-in-progress in terms of getting them into a stable release together. While DAFS and Rx OSD have each been used in production by some OpenAFS installations, there is a lot more work to be done to integrate them into a stable OpenAFS release.
Overall, the hackathon went very well, with some new AFS developers trained, and some progress made on existing projects. Many thanks to the Ohio Linux Fest for their support, and to Mike Meffie specifically for his efforts in coordinating the hackathon.
This afternoon was the launch of Walden University's Presidential Youth Debate website, which features 14 questions and video responses from Presidential candidates Barack Obama and John McCain. The video responses are about 44 minutes long overall.
The site has a fairly simple feature set but is technologically interesting for us. It was developed by Dan Collis-Puro and Ethan Rowe using Radiant, PostgreSQL, CentOS Linux, Ruby on Rails, Phusion Passenger, Apache, Varnish, and Amazon S3.
Nice work, guys!
Over the weekend, geo-location hacking geeks converged in Portland, OR for the first WhereCampPDX. Topics of discussion ranged from where to find fruit in urban areas and the creepiness of location-aware dating software, to disaster management and using personal location data in times of crisis. I was part of the planning team, and was happy and proud that we brought together nearly 100 people over the course of three days.
WhereCampPDX is an unconference -- no sessions were planned in advance and all the participants were charged with responsibility for their own experience. This is an Open Spaces model for managing large group meetings. Photos of all the session topics are available here.
Many conversations were documented at the drop.io site. Some groups have decided to continue to meet; in particular, the What Now, Portland? group had an intense few hours. One participant thought she was going to spend the morning playing Cruel2BKind out in the Saturday Market pavilion on Sunday, but ended up so engaged and deep in discussion that she never left her chair. I came away inspired that people were able to talk about their passions and totally lose track of time.
One topic that came up repeatedly was crisis management and how locative tech might be able to help in the event of a disaster. This is an emerging topic and there's so much energy around it - expect a mini-conference on how to bridge the gap between people and the technologies that could help them this spring.
And people who work hard together, should play hard too! Check out this video of the PacManhattan game and other great coverage of the conference on WebMonkey.com.
![[image]](http://mowser.com/img?url=http%3A%2F%2Fdb.endpoint.com%2Fpg-conf-08%2Fgroup-photo-for-blog.jpg)
I attended the PostgreSQL Conference West and had a great time again this year. My photos of the event are up here:
http://db.endpoint.com/pg-conf-08
In addition, I shot some footage of the event in an attempt to highlight the benefits of the conference, Postgres itself, and the community strengths. I'm looking for a talented Editor willing to donate time; if none volunteer then I'll probably do it in January. My guess is that there will be several web sites willing to host it for free when it's done.
The Code Sprint was really interesting. Selena Deckelmann gave everyone a lot of ideas to get the most out of the time available for hacking code. At regular intervals, each team shared the progress they made and received candy as a reward. It was neat to see other people hacking on and committing changes to the Postgres source tree in meatspace.
Bruce Momjian's Postgres training covered a wide gamut of information about Postgres. He polled everyone in the room for their particular needs, which varied from administration to performance, then tailored the training to cover information relating to those needs in particular detail. Those who attended reported that they learned a great deal of new information from the training. From here, a lot of folks went out to continue interacting with Postgres people, but I headed for home.
Windowing Functions were covered by David Fetter in a talk that addressed ways to make OLAP easier with new features coming to Postgres 8.4. Functionality that used to be slow and difficult in client-side applications can be handled easily right in the database. I made a note to check this out when 8.4 hits the streets.
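To give a flavor of what a windowing function does, here is a small sketch. It is not taken from David's talk. Since 8.4 isn't out yet, it runs against the SQLite bundled with Python (which needs to be version 3.25 or newer for window functions), and the sales table and its columns are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table for illustration only.
conn.execute("CREATE TABLE sales (region TEXT, rep TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("east", "ann", 300),
        ("east", "bob", 200),
        ("west", "cy", 150),
        ("west", "dee", 400),
        ("west", "eve", 100),
    ],
)

# Rank each rep within their region and show the regional total on every row,
# without collapsing rows the way GROUP BY would.
rows = conn.execute(
    """
    SELECT region, rep, amount,
           rank() OVER (PARTITION BY region ORDER BY amount DESC) AS rank_in_region,
           sum(amount) OVER (PARTITION BY region) AS region_total
    FROM sales
    ORDER BY region, rank_in_region
    """
).fetchall()

for row in rows:
    print(row)
```

Doing the same in a client application would mean fetching all the rows, grouping them, and sorting each group by hand.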
Jesse Young spoke about using Linux-HA + DRBD to build high availability Postgres clusters. It is working very well for him in over 30 different server installations; he proved this by taking down a production server in the middle of the presentation and demonstrating the rapid transition to the failover server. Just set-it-and-forget-it. I was able to weigh the advantages and disadvantages compared to other clustering options such as shared disk (e.g. GFS) and Postgres-specific replication options (Slony, Postgres Replicator, Bucardo, etc.).
In his talk, PostgreSQL Optimizer Exposed, Tom Raney delved into a variety of interesting topics. He described the general workings of the optimizer, then showed a variety of interesting plans that are evaluated for the example query, how each plan was measured for cost, and why the cost varied. He uncovered several interesting facts, such as demonstrating that the Materialization step (pushing sorts to disk that are too large for memory) doesn't increase the cost associated with that plan. Tom Lane explained that this would rarely, if ever, affect real world results, but that is the kind of information made obvious in the Visual Planner, but hidden by textual EXPLAIN ANALYZE. Tom Raney also demonstrated the three-fold difference (in one case) between the cost of the clustered index and the rest. Optimizing query performance is one of my favorite pastimes, so I enjoyed this talk a lot.
I learned a bit about what was going on in Postgres community organizations during Joshua Drake's talk, "PostgreSQL - The happening since EAST". PostgreSQL.US and other organizations are doing a lot to increase awareness of Postgres among education, government, business, and other developers. The point was made that we should do as much as we can to reach out to widely prevalent PHP applications and web hosting providers.
Common Table Expressions (CTE) were given a good explanation by David Fetter in his talk about doing Trees and More in SQL. Having worked on Nested Set and Adjacency List models, I was very interested in this new feature coming to 8.4. Starting with a simple recursive example, David built on it slide-by-slide until he had built and executed a brute force solution to a Traveling Salesman Problem (for a small number of cities in Italy) using only plain SQL. I'm excited to try this out and measure the performance.
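The simple recursive starting point looks roughly like the sketch below. This is my own illustration, not David's slides: it walks an adjacency-list tree with WITH RECURSIVE, runs on the SQLite bundled with Python as a stand-in for Postgres 8.4, and uses a made-up category table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical adjacency-list table: each row points at its parent.
conn.execute(
    "CREATE TABLE category (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT)"
)
conn.executemany(
    "INSERT INTO category VALUES (?, ?, ?)",
    [
        (1, None, "root"),
        (2, 1, "fruit"),
        (3, 1, "vegetables"),
        (4, 2, "citrus"),
        (5, 4, "lemon"),
    ],
)

# The first SELECT seeds the recursion with the root row; the second joins
# children onto rows already found, one level deeper each time.
tree = conn.execute(
    """
    WITH RECURSIVE subtree(id, name, depth) AS (
        SELECT id, name, 0 FROM category WHERE parent_id IS NULL
        UNION ALL
        SELECT c.id, c.name, s.depth + 1
        FROM category AS c
        JOIN subtree AS s ON c.parent_id = s.id
    )
    SELECT name, depth FROM subtree ORDER BY depth, name
    """
).fetchall()

for name, depth in tree:
    print("  " * depth + name)
```

With a plain adjacency list and no CTE, getting the whole tree takes one query per level or a procedural loop, which is what makes this feature attractive.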
Mark Wong & Gabrielle Roth presented the results of testing that they completed. Selena also covered that information in her post about Testing filesystems and RAID. After that we talked Perl on the way to the Portland Paramount for the party.
On Sunday, I sat in on "Developing a PL (Procedural Language) for PostgreSQL", by Joshua Tolley, as he carefully explained the parts and steps involved. LOLCODE humor peppered the useful information on the related Postgres libraries, Bison usage, and pitfalls.
I was glad to see Catalyst boosted in Matt Trout's presentation. He very quickly covered the design and advantages of Catalyst, DBIx::Class, and Postgres as they related to the implementation of a high profile and complex web site. It was very informative to see the Class structure for the View model, which gave me several ideas to use in my own development. He demonstrated how a complex 56-way join was coded in very brief and comprehensible Perl code, relying on the underlying modules to provide the support. The explain tree was so large that it couldn't fit on the screen even in microscopic font, and even with very large data sets, the Postgres optimizer found a way to return the results in one tenth of a second. Matt also demonstrated several flaws in his design, such as how his use of rules to implement updatable views caused multi-row updates to be slower than the equivalent trigger-based system. I use Catalyst for several projects, but I think Interchange still has more advantages. I'm definitely going to take another look at DBIx::Class.
Before lunch, I asked if I could shoot a group photo, so we went to the park. Several people were not in attendance, and I didn't want to take more than a minute or two, so the shots are not as good as I would have liked. Next time I'll ask if we can plan some time for arranging the group. At lunch I had a great time talking to fellow Postgres developers and learning more about their work.
Lightning Talks followed lunch and included a variety of interesting topics. One of my favorites was "I can't do that" by Matt Trout. He explained how wrong it is to believe you can't contribute something to Postgres or any other open source project. If you think your code will be incomplete or buggy, do it anyway, because it may prompt someone else to work on it, or scrap yours and do it right. Don't think you can't contribute to documentation because of your unfamiliarity with the system, because that's exactly the advantage you have for documentation contributions: those who need the docs are in exactly your shoes.
Matt also gave the closing talk, "Perl 5 is Alive!", which was a concise, water-tight presentation of Perl 5's superiority over other development environments, including CPAN and job statistics that demonstrate its growing popularity.
Some attendees went out afterwards to finish the conference over a drink. I slept about 11 hours straight to recover from the whirlwind of weekend activity. Overall I'm grateful for the opportunity to interact with the community again and I'm excited for what the future has in store for Postgres.
At OSNews.com the article Windows x64 Watch List describes some of the key differences between 64-bit and 32-bit Windows. It's pretty interesting, and mostly pretty reasonable. But this one caught my eye:
"There are now separate system file sections for both 32-bit and 64-bit code
Windows x64's architecture keeps all 32-bit system files in a directory named "C:\WINDOWS\SysWOW64", and 64-bit system files are place in the the oddly-named "C:\WINDOWS\system32" directory. For most applications, this doesn't matter, as Windows will re-direct all 32-bit files to use "SysWOW64" automatically to avoid conflicts.
However, anyone (like us system admins) who depend on VBScripts to accomplish tasks, may have to directly reference "SysWOW64" files if needed, since re-direction doesn't apply as smoothly."
I've been using 64-bit Linux since 2005 and found there to be some learning curve there, with distributors taking different approaches to supporting 32-bit libraries and applications on a 64-bit operating system.
The Debian Etch approach is to treat the 64-bit architecture as "normal", for lack of a better word, with 64-bit libraries residing in /lib and /usr/lib as always. It's recommended to run a 32-bit chroot, with important libraries in the ia32-libs package going into /emul/ia32-linux. Ubuntu is similar, but its ia32-libs package puts its files into /usr/lib32.
The Red Hat approach called "multilib" keeps 32-bit libraries in /lib and /usr/lib with new 64-bit libraries living in /lib64 and /usr/lib64. (I mentioned this a while back while discussing building a custom Perl on 64-bit Red Hat OSes.)
Each way has its tradeoffs, and causes a bit of trouble. That's just the cost of dealing with multiple architectures in a single running OS, where no such support was previously needed.
But the Windows way? Putting your 32-bit libraries in C:\WINDOWS\SysWOW64 and your 64-bit libraries in C:\WINDOWS\system32? It hurts to see the names be exactly backwards. That's really tops for confusion.
As mentioned last week, Gabrielle Roth and I presented results from tests run in the new Postgres Performance Lab. Our slides are available on Slideshare.
Our hope is that someone in the Linux filesystem community takes up these tests and starts to produce them for other hardware, and on a more regular basis. We did have 3 people interested in running their own tests on our hardware from the talk! In the future, we plan to focus our testing most on Postgres performance.
For the record, and maybe to save confusion for someone else who runs into this:
On Red Hat Enterprise Linux 5 with SELinux in enforcing mode, Postfix cannot read ~/.forward files by default. It's probably not hard to fix -- perhaps the .forward files just need to have the right SELinux context set -- but we decided to just use /etc/aliases in this case.
I missed the news a week and a half ago that Red Hat has acquired Qumranet, makers of the Linux KVM virtualization software. They say they'll be focusing on KVM for their virtualization offerings in future versions of Red Hat Enterprise Linux, though still supporting Xen for the lifespan of RHEL 5 at least. (KVM is already in Fedora.)
Given that Ubuntu also chose KVM as their primary virtualization technology a while back, this should mean even easier use of KVM all around, perhaps making it the default choice on Linux. (Ubuntu supports other virtualization as well.)
Also, something helpful to note for RHEL virtualization users: Red Hat Network entitlements for up to 4 Xen guests carry no extra charge if entitled the right way.
In even older Red Hat news, Dag Wieers wrote about Red Hat lengthening its support lifespan for RHEL by one year for RHEL 4 and 5.
That means RHEL 5 (and thus also CentOS 5) will have full support until March 2011, new media releases until March 2012, and security updates until March 2014. And RHEL 4, despite its aging software stack, will receive security updates until February 2012!
That's very helpful in making it easier to choose the time of migration without being pushed too soon due to lack of support.
Seth Godin wrote an interesting article on the subject of competence; it resonated with me personally for a variety of reasons.
The article uses musicians, and Bob Dylan in particular, as an example of how "competence" can pale in comparison to "incompetence" in terms of the quality of the results. In particular, it asserts that competent musicians consistently play the music in question the same way, and suggests that the lack of such consistency could be thought of as incompetence. Bob Dylan thus becomes an incompetent musician who is nevertheless really great due to the emotional content of his performances; beyond that, he is a "change agent" because of his brilliance. And that's the crux of the article: the "incompetent" people are the change agents who advance the state of the art, while the "competent" people resist change and thus hold things back.
As a fairly serious practicing musician myself, I'll assert in response: this is not an accurate representation of musicianship, and the issue extends to the core of the article's argument.
Playing music the same way every time is not an indication of competence. It's an indicator of insufficient imagination and demonstrates a lack of mastery. The different musical traditions of the world vary considerably in the precision of their musical project specs (e.g. scores with full orchestrated notation versus "charts" with melody over chord symbols versus no notation at all), but the musician always has ample room to interpret. The jazz musician gives the appearance of spontaneity in wild improvisational flights of fancy, while the classical pianist playing Bach may seem to be playing things the same way twice. But there's plenty of interpretative, improvisational nuance going on in the Bach performance; it's just subtler and doesn't necessarily have to do with the order and combination of pitches played. The jazz musician's appearance of spontaneity is a studied spontaneity that is practiced and accumulated over time just like instrumental technique; the improvised material consists largely of material the musician has already mastered and played, in various combinations, many times over.
A musician who aspires to play something the same way every time is a musician who is trying to learn a piece, but who is not trying to master the piece so it can become an expressive vehicle. The musician may ultimately be able to execute the piece, but will probably not ever give a particularly compelling performance of it. Furthermore, dedicated listeners are usually better equipped than casual listeners to separate the ho-hum performances from the truly exceptional.
So, how does this relate to the business world?
The person the article categorizes as "competent" may truly be simply competent, solving problems in nearly the same way every time, delivering consistent results. It may also be that the "competent" person is truly a master, with that mastery expressed in the small details that aren't necessarily so obvious to people unfamiliar with the craft. In software engineering, it takes all kinds to make things run. There are some engineers who consistently display a great creative impulse, who think outside the cliché, who can be used to approach difficult problems with ingenuity and boldness. There are some engineers who stay more focused on a particular toolset, methodology, etc., who will not necessarily display "outside the box" thinking, but will demonstrate complete command of the tools of their craft, delivering rock-solid, maintainable solutions to problems large and small, easy and difficult.
In the world of software engineering, or any other craft that involves building stuff to spec, what does it mean to solve problems the same way every time? If we're literally talking about writing similar code over and over, then the problem needs to be re-positioned: we should be talking about building a generic solution so humans don't need to waste their time with redundant custom solutions. Furthermore, if a proven method, design pattern, etc. has worked effectively for a problem in the past, and a similar problem comes up now, disregarding the "competent" solution of relying on past success to inform today's plans would be deeply unwise. Perhaps an "incompetent" person could come up with something even better, so the important thing is to have the flexibility to embrace change when appropriate.
The real win, I think, is to have the full spectrum of possibilities adequately represented. Hire people who are really smart, take pride in their work, and who show humility. The last point is critical: the extremes of "competent" versus "incompetent" as laid out in the article arguably represent archetypal factions that cannot appreciate the value that the other faction adds. A dose of humility ensures that all parties can appreciate the others' contributions.
Now, the original article is talking about "change agents", and I'm not really addressing that. But, to go back to music for a second, let's consider a rather important figure in western art music (that tradition most people call "classical music"): Johann Sebastian Bach.
Bach was working during the phase when the "baroque" period ended and the "classical" period began. He was known and respected for his skill in composition and keyboard performance, but he was regarded as something of a relic. He was composing at the very extreme edges of the "baroque" tradition, while the simplified "classical" tradition was coming into vogue. In that light, he was not regarded as an innovator.
Yet an innovator he was. Beyond his innovative experimentation with equal-tempered instruments, Bach achieved a level of sophistication and mastery of counterpoint (the weaving together of multiple concurrent melodies so that each melody stands on its own while all work together to achieve coherent harmonic progressions) that remains unparalleled. Everything he did was a logical extension of the tradition in which he operated; his prolific body of work and the stunning mastery it demonstrates would not be possible without a complete dedication to his musical tradition. He did not set out to be different; he mastered the compositional techniques of the day to such a degree that he had complete freedom in how he exercised those techniques. Yet he approached problems using the same techniques time and time again; his music unfolds in a clear, logical manner that to the well-versed can often be quite predictable.
JS Bach was a "change agent". The western art music tradition would never be the same after him. Countless major composers that followed were heavily influenced by Bach's work. Yet his work is that of the supremely competent craftsman. A staggering, brilliant, unerring competence.
Using Vyatta to Replace Cisco Gear
At the 2008 Utah Open Source Conference I attended an interesting presentation by Tristan Rhodes about the Vyatta open source networking software. Vyatta's software is designed to replace Cisco appliances of many sorts: WAN routers, firewalls, IDSes, VPNs, and load balancers. It runs on Debian GNU/Linux, on commodity hardware or virtualized.
Two key selling points are the price/performance benefit vs. Cisco (prominently noted in Vyatta's marketing materials) and the IOS-style command-line management interface for experienced Cisco network administrators. Regular Linux interfaces are available too, though Tristan wasn't positive that writes would stick in all cases, as he's mostly used the native Linux tools for monitoring and reading, not writing.
Pretty cool stuff, and Vyatta sells pre-built appliances and support too. The Vyatta reps were handing out live CDs, but I haven't had a chance to try it out yet. Presentation details are here.
Google App Engine 101
Jonathan Ellis did a presentation and then hands-on workshop on Google App Engine, which I found especially useful because he's a longtime Python and Postgres user. His talk on SQLAlchemy last year made me think he wouldn't gloss over the huge differences in the runtime environment of GAE vs. regular Django, for example having GQL and BigTable instead of SQL and a relational database. And he didn't. They're quite different, and one is very primitive to use. I'll let you guess which one. :)
In fact, the day of the conference he wrote a blog post, App Engine Conclusions, where he says: "I've reluctantly concluded that I don't like it." His reasoning makes sense to me, and maybe it will improve enough later to be really nice. We'll see. Of course that's all ignoring the hosting lock-in too.
His presentation details are here.
Writing Documentation with Open Source Tools
Paul Frields (of the Fedora Project) and Jared Smith (of Asterisk fame) showed how to use DocBook XML to write documentation. It was a practical talk: we asked questions, and they tag-teamed the answers and live demonstrations, showing us the Red Hat tool "Publican" and GNOME's yelp documentation viewer, which can present DocBook XML natively. Good stuff, though XML sure hasn't gotten any less verbose.
The presentation details include a link to the slides.
Automated System Management with Puppet
Andrew Shafer did a presentation on Puppet, and I was sad to miss the beginning of it. But what I heard was quite enjoyable.
The message I took away is this: Without some overlap of the traditionally separate domains or disciplines of system administrator and programmer, no software tool is going to be able to magically manage all your systems for you. Puppet provides a domain-specific language for specifying what resources should be available. (Resources consist of packages, files, and services.) You still have to say what you want, but there's a nice way to do that in a cross-platform way, once. Paraphrasing Einstein, it's as simple as it can be, but not simpler.
The questions were good, but I had the feeling from a few of them that people wanted things to be simpler than possible. :)
Andrew's presentation slides tell the story pretty well even on their own.
LOLCATS
A nice bonus was the UTOSC crew giving out fortune cookies with LOLCATS fortunes. Mine read:
"given ur fortune
That was a delight. And I happened to meet up right about then with Josh Tolley, author of PL/LOLCODE.
In the past I've used virtualization mostly in server environments: Xen as a sysadmin, and VMware and Virtuozzo as a user. They have worked well enough. When there've been problems they've mostly been traceable to network configuration trouble.
Lately I've been playing with virtualization on the desktop, specifically on Ubuntu desktops, using Xen, kvm, and VirtualBox. Here are a few notes.
Xen: Requires hardware virtualization support for full virtualization, and paravirtualization is of course only for certain types of guests. It feels a little heavier on resource usage, but I haven't tried to move beyond lame anecdote to confirm that.
kvm: Rumored to have been not ready for prime time, but when used from libvirt with virt-manager, has been very nice for me. It requires hardware virtualization support. One major problem in kvm on Ubuntu 8.04 is with the CD/DVD driver when using RHEL/CentOS guests. To work around that, I used the net install and it worked fine.
VirtualBox: This was for me the simplest of all for desktop stuff. I've used both the OSE (Open Source Edition) in Ubuntu and Sun's cost-free but proprietary package on Windows Vista. The current release of VirtualBox only emulates i386 32-bit machines, though! (No 64-bit guests, though a 64-bit host is fine.) It's also been a little buggy at times -- I've had a few machine crashes when running both an OpenBSD 4.3 and a RHEL 5 guest, though I wasn't able to reproduce the problem and it's possible it wasn't a VirtualBox issue.
I should note that some manufacturers have a BIOS option to disable hardware virtualization, and that it is sometimes disabled by default. When booting a new machine, check for that, especially in servers you won't necessarily want to take down later.
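As a rough aid for that check on a Linux host, here is a minimal Python sketch that looks for the `vmx` (Intel VT-x) or `svm` (AMD-V) CPU flags in text shaped like `/proc/cpuinfo`. The function name and the sample text are made up for illustration. A flag being present suggests the CPU supports hardware virtualization; it may not tell you whether the BIOS currently has the feature switched on, so the BIOS setup screen is still the place to confirm.

```python
def virtualization_flags(cpuinfo_text):
    """Return the hardware virtualization flags found in cpuinfo-style text.

    Hypothetical helper: it only looks at lines whose key is "flags" and
    reports which of "vmx" (Intel VT-x) and "svm" (AMD-V) appear there.
    """
    found = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            found |= {"vmx", "svm"} & set(value.split())
    return found


# Made-up sample in the shape of /proc/cpuinfo, not output from a real machine.
sample = "processor\t: 0\nflags\t\t: fpu vme de pse vmx est tm2\n"
print(sorted(virtualization_flags(sample)))  # prints ['vmx']
```

On a real machine you would pass in the contents of `/proc/cpuinfo` instead of the sample string; an empty result means neither flag was listed.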
A final note about RHEL 5's net install: Why, oh why, does the installer ask for an HTTP install location as separate web site and directory entries, instead of a universally used and easy URL? And further, when the install source I'm using goes down (as download mirrors occasionally do), why are my only options to reboot or retry? Would it have been so hard to allow me the option of entering a new download URL? Yes, I know, I need to send in a patch.
Git supports many workflows; one common model that we use here at End Point is having a shared central bare repository that all developers clone from. When changes are made, the developer pushes the commit to the central repository, and other developers see the relevant changes on subsequent pulls.
We ran into an issue today where after a commit/push cycle, suddenly pulls from the shared repository were broken for downstream developers. It turns out that one of the commits had been created by root and pushed to the shared repository. The push itself worked fine, as root had read-write privileges on the filesystem; however, it meant that the loose objects which the commit created were in turn owned by root as well. Filesystem permissions on those loose objects and on the updated refs/heads/branch prevented other users from reading the appropriate files, and hence broke the pull behavior downstream.
Trying to debug this purely on the reported messages from the tool itself would have resulted in more downtime at a critical time in the client's release cycle.
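One way to spot this kind of problem directly on the filesystem is to walk the shared repository and flag anything with an unexpected owner or without group read access. The following is a minimal Python sketch of that idea, not the procedure we actually followed; the function name, the `expected_uid` parameter, and the assumption that developers reach the repository through a shared group are all illustrative.

```python
import os
import stat
from pathlib import Path


def find_permission_problems(repo_dir, expected_uid=None):
    """Return (path, reason) pairs for entries that may break shared access.

    repo_dir: directory of the shared bare repository (caller supplies it).
    expected_uid: numeric owner the entries should have; None skips that check.
    Assumes developers read the repository via group permissions.
    """
    problems = []
    for root, dirs, files in os.walk(repo_dir):
        for name in dirs + files:
            path = Path(root) / name
            info = path.lstat()
            if expected_uid is not None and info.st_uid != expected_uid:
                # e.g. loose objects created by a push made as root
                problems.append((str(path), f"owned by uid {info.st_uid}"))
            elif not info.st_mode & stat.S_IRGRP:
                problems.append((str(path), "not group-readable"))
    return problems
```

Calling `find_permission_problems("/path/to/shared.git")` (a placeholder path) and printing the result would list candidates such as root-owned files under `objects/` or `refs/heads/`, which can then be fixed by hand.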
There are a couple of morals here:
