I'm Mike

XHTML 2 vs. HTML 5

Posted: Wed, 06 Feb 2008 18:36:46 +0000

Rewind a little more than 10 years to December 18, 1997. Internet Explorer 4 had been released 3 months earlier. The Mozilla Foundation had not yet formed, and their Firefox web browser was years away from public release. There was no XMLHttpRequest… there wasn’t even XML. On that day, over a decade ago, HTML 4.0 was published as a W3C recommendation.

That’s the environment in which our current web standards were developed. Sure, improvements have been made. XHTML 1.0 became a recommendation in 2000, and CSS 2 has (kind of) been implemented by the major browser vendors. But the foundation of the Web — the common denominator that every web site is built on, from simple brochures to complex applications — has stayed essentially unchanged.

Until now, anyways. After a long lull, things seem to be changing at the W3C — there are two competing specifications under development to replace the aging HTML 4.x and XHTML 1.x standards. Both initiatives are working under the auspices of the W3C (although this was not always the case) and both are, in my opinion, vastly superior to the current crop of web markup languages. They are HTML 5 and XHTML 2.0. And if you’re reading this, you’ll probably become very familiar with one (or both) of them over the next couple of years.

Some History

Work on XHTML 2.0 began shortly after XHTML 1.1 became a recommendation in 2001. The first XHTML 2.0 working draft was released in 2002, but much of the document was in a non-normative, incomplete state (some still is). By 2004, some prominent industry stakeholders — browser vendors, web developers, designers, and content owners — had grown unhappy with the direction of the XHTML2 working group. Citing the closed nature of the W3C process, they decided to start over and develop their own standard.

So, in 2004 an independent coalition called WHATWG (Web Hypertext Application Technology Working Group) was formed. The group began working on a specification called Web Applications 1.0. In April 2007, the W3C voted overwhelmingly in favor of a proposal to adopt the group’s specification for review. The original WHATWG members began operating within the W3C as the HTML working group, and continued developing their proposal, which was renamed HTML 5. Thus, the HTML 5 working draft may one day become a W3C recommendation along with XHTML 2.0 (although that day is still far away, and the W3C has already missed several key project milestones).

Overview of XHTML 2.0

XHTML 2.0 is based solely on XML, forgoing the SGML heritage and syntax peculiarities present in current web markup. XHTML 2.0 is supposed to be a “general-purpose language,” with a minimal default feature set that is easy to extend using CSS and other technologies (XForms, XML Events, etc). It’s a modular approach that allows the XHTML2 group to focus on generic document markup, while others develop mechanisms for presentation, interactivity, document construction, etc.

Priority one for the XHTML2 working group is to further separate document content and structure from document presentation. Other goals include increased usability and accessibility, improved internationalization, more device independence, less scripting, and better integration with the Semantic Web. The group has been less concerned with backward compatibility than their predecessors (and the HTML working group), which has led them to drop some of the syntactic baggage present in earlier incarnations of HTML. The result is a cleaner, more concise language that corrects many of Web markup’s past indiscretions.

Overview of HTML 5

While XHTML 2.0 aims to be revolutionary, the HTML working group has taken a more pragmatic approach and designed HTML 5 as an evolutionary technology. That is to say, HTML 5 is an incremental step forward that remains mostly compatible with the current HTML 4/XHTML 1 standards. However, HTML 5 offers a host of changes and extensions to HTML 4/XHTML 1 that address many of the faults in these earlier specifications.

HTML 5 is about moving HTML away from document markup, and turning it into a language for web applications. To that end, much of the specification focuses on creating a more robust, feature-ful client side environment for web application development by providing a variety of APIs. Among other things, the spec stipulates that complying implementations must provide client-side persistent storage (both key/value and SQL storage engines), audio and video playback APIs, 2D drawing through the canvas element, cross-document messaging, server-sent events, and a networking API.

The HTML 5 specification maintains an SGML-like syntax that is compatible with the current HTML specifications (though some of the more esoteric features of SGML are no longer supported). Also included in the specification is a second “XML Serialization,” which allows developers to serve valid XML documents as well. Again, by maintaining an SGML-like serialization the HTML 5 working group has struck a balance between pragmatism and progress. Developers can choose to mark up content using either the HTML serialization (which looks more like HTML 4.x) or the XML serialization (which looks more like XHTML 1.x).

Similar Features

It shouldn’t be too surprising that both working groups are proposing a number of similar features. These features address familiar pain points for web developers, and should be welcome additions to the next generation of markup languages.

Removal of Presentational Elements

A number of elements have been removed from both XHTML 2.0 and HTML 5 because they are considered purely presentational. The consensus is that presentation should be handled using style sheets.

HTML 5 and XHTML 2.0 documents cannot contain these elements: basefont, big, font, s, strike, tt, and u. XHTML 2.0 also removes the small, b, i, and hr elements, while HTML 5 redefines them with non-presentational meanings. In XHTML 2.0, the hr element has been replaced with separator in an attempt to reduce confusion (since the hr element, which stands for horizontal rule, is not necessarily either of those things).

Navigation Lists

Navigation lists have been introduced in both XHTML 2.0 and HTML 5. In XHTML 2.0, navigation is marked up using the new nl element. Navigation lists must start with a child label element that defines the list title. Following the title, one or more li elements are used to mark up links. Also new in XHTML 2.0 is the ability to create a hyperlink from any element using the href attribute. Combining these features produces simple, lightweight navigation markup:

<nl>
  <label>Category</label>
  <li href="/">All</li>
  <li href="/news">News</li>
  <li href="/videos">Videos</li>
  <li href="/images">Images</li>
</nl>

In HTML 5, the new nav element has been introduced for this purpose. Unfortunately, nav is not a list element, so it cannot contain child li elements to logically organize links (perhaps a new idiom will develop). And since anchor tags are still required to create hyperlinks in HTML 5, navigation markup is not quite as elegant:

<nav>
  <h1>Category</h1>
  <ul>
    <li><a href="/">All</a></li>
    <li><a href="/news">News</a></li>
    <li><a href="/videos">Videos</a></li>
    <li><a href="/images">Images</a></li>
  </ul>
</nav>

Enhanced Forms

Both specifications have new features to create more robust, consistent forms with less scripting. In XHTML 2.0, standard HTML forms are dropped completely in favor of the more comprehensive XForms standard. The XHTML2 working group does not control this standard, but references it from the XHTML 2.0 specification. To facilitate reuse, XForms separates the data being collected from the markup of the controls. It’s a robust and powerful language, but a full description is way beyond the scope of this post. Suffice it to say, there will be a bit of a learning curve for web developers trying to get up to speed with this technology.

HTML 5 retains the familiar HTML forms, but adds several new data types to simplify development and improve usability. In HTML 5, several new types of input elements have been introduced for email addresses, URLs, dates and times, and numeric data. This will allow user agents to provide more sophisticated user interfaces (e.g., calendar date pickers), integrate with other applications (e.g., pulling addresses from Outlook or Address Book), and validate user input before posting data to the server (reducing the need for client-side JavaScript validation).

Semantic Markup

Both working groups have embraced the coming Semantic Web by allowing developers to embed richer metadata in their documents. As with forms, the XHTML2 working group has embraced a more sophisticated technology, while the HTML working group has kept things simple.

In XHTML 2.0, metadata can be embedded by using several new global attributes from the Metainformation Attributes Module. In particular, the new global role attribute is intended to describe the meaning of a given element in the context of the document. The technical term is Embedding Structured Data in Web Pages. Again, the group leverages an existing standard by referencing RDF. The technology is extremely powerful, but it’s also complicated.

The HTML working group has taken an approach that feels more like microformats by overloading the class attribute with a predefined set of reserved classes to represent various types of data. The specification currently lists seven reserved classes: copyright, error, example, issue, note, search, and warning. While overloading the class attribute like this might be confusing, it’s unlikely that user agents will render elements with these classes differently. And the class names are specific enough that there’s little worry: if an element has its class set to copyright, it’s probably a copyright whether the developer knew about the reserved classes or not.

Only in HTML 5

There are several new features that the HTML 5 specification describes that have no counterparts in XHTML 2.0.

Web Application APIs

HTML 5 introduces several APIs that will drastically improve the client-side web development environment. These APIs are what set HTML 5 apart as a proposal for a technology stack for Web Applications, rather than simply a markup language for documents. It should be noted that the details of these APIs are being worked out by the Web API working group, so they may be adopted with or without the rest of HTML 5. The new APIs, and corresponding elements are:

- A 2D drawing API using the canvas element.
- An audio and video playback API, supporting the ability to offer multiple formats to user agents, which can be used with the new video and audio elements.
- Persistent storage on the client-side with support for both key/value and SQL databases.
- An offline web application API (similar to Google Gears).
- An API that allows Web Applications to register themselves for certain protocols or MIME types.
- An editing API that can be used in combination with the global contenteditable attribute.
- A drag & drop API that can be used with the draggable attribute.
- A network API allowing Web applications to communicate using TCP.
- An API that exposes the browser history, allowing applications to add to it so they don’t break the back button.
- A cross-document messaging API.
- Server-sent events in combination with the new event-source element.

New Elements

Several new elements are being introduced by HTML 5 that aren’t available in XHTML 2.0:

- figure represents an image or graphic with a caption. A nested legend represents the caption, while a normal img element is used for the image.
- m represents text that has been marked in some way. It could be used to highlight search terms in resulting documents, for example.
- time represents dates and times.
- meter represents a measurement.
- datagrid represents an interactive tree list or tabular data.
- command represents a command that the user can invoke.
- event-source is used to “catch” server-sent events.
- output represents some type of output, such as from a calculation done through scripting.
- progress represents the completion of a task, such as downloading or performing a series of expensive operations.

In addition, several new elements will help semantically mark up the parts of a document. They’re fairly self-explanatory: section, article, header, footer, and aside. And a new dialog element is designed to represent conversations using child dt elements for the speaker’s name and dd elements for the text.

Track Users by Pinging URIs

The new ping attribute can be used on the a and area elements to do user tracking. Rather than using redirects, or relying on JavaScript, the ping attribute allows you to specify a space-separated list of URIs that should be pinged when the hyperlink is followed.
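To make the "space-separated list of URIs" concrete, here is a minimal Python sketch of how a tool might pull the ping targets out of a page. The sample markup and URIs are placeholders I made up, and this only reads the attribute; it says nothing about how a browser would actually send the pings.

```python
from html.parser import HTMLParser


class PingCollector(HTMLParser):
    """Collect (href, ping URIs) pairs from a and area elements."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "area"):
            return
        attributes = dict(attrs)
        # The ping value is a space-separated list of URIs.
        ping_uris = (attributes.get("ping") or "").split()
        if ping_uris:
            self.links.append((attributes.get("href"), ping_uris))


def collect_pings(markup):
    parser = PingCollector()
    parser.feed(markup)
    parser.close()
    return parser.links


# Hypothetical markup; the URIs are placeholders.
sample = '<a href="/news" ping="/track/a http://stats.example.com/hit">News</a>'
print(collect_pings(sample))
```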

Only in XHTML 2.0

Also notable are the following new features that are available only in XHTML 2.0.

Any Element can be a Hyperlink

In XHTML 2.0, any element can be the source of a hyperlink — the href attribute can appear on any element. With this change the a element is no longer necessary, but it is retained.

Any Element can be an Image (or other resource)

In XHTML 2.0, the img element has been dropped. No worries, though — any element can now be an image. The idea is that all images have a “long description” that is equivalent to the image itself. By placing a src attribute on any element, you’re telling the user agent to load that resource in place of the element. If, for whatever reason, the resource is unavailable, the element is used instead. This allows developers to provide multiple equivalent resources using different file formats and representations by nesting elements within one another.
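The fallback rule is easier to see in code. The following Python sketch walks a nested structure and picks the first resource that is available, falling back to the text content if none is. It is a simplification under my own assumptions: the file names are invented, "available" is just a set of URIs, namespaces are ignored, and real user agents would also consider things like media types.

```python
import xml.etree.ElementTree as ET


def resolve(element, available):
    """Return ("resource", uri) if a src can be loaded, else ("text", content)."""
    src = element.get("src")
    if src in available:
        return ("resource", src)
    # The resource is missing, so the element's own content is used instead.
    # If that content carries another src, try it in turn.
    for child in element.iter():
        if child is not element and child.get("src"):
            return resolve(child, available)
    text = " ".join("".join(element.itertext()).split())
    return ("text", text)


# Hypothetical markup: a PNG, with a GIF and then plain text as fallbacks.
markup = (
    '<p src="holiday.png">'
    '<span src="holiday.gif">An image of us on holiday.</span>'
    "</p>"
)
root = ET.fromstring(markup)
print(resolve(root, {"holiday.gif"}))
```

Passing an empty set models a user agent that cannot load either image, in which case the long description is what the reader gets.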

Lines Replace Line Breaks

The venerable br element, used to insert line breaks, has also been dropped from XHTML 2.0. The new l element is being introduced to replace it. l represents a line of text, and behaves like a span followed by a br in today’s markup.

New Heading Construct

The new h and section elements have been introduced to replace the numbered h1 through h6 elements. The goal is to accurately represent the hierarchical structure of a document. The current numbered headings are linear, not nested. By nesting section and h elements within parent sections the document structure is made explicit.
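As a rough illustration of what "explicit structure" buys you, this Python sketch derives each heading's depth purely from how deeply its section is nested. The sample document is invented and namespaces are left out to keep things short.

```python
import xml.etree.ElementTree as ET


def outline(element, depth=0):
    """Yield (depth, heading text) pairs; depth comes from section nesting."""
    for child in element:
        if child.tag == "h":
            yield depth, (child.text or "").strip()
        elif child.tag == "section":
            yield from outline(child, depth + 1)


# Hypothetical document using h and section instead of h1 through h6.
doc = ET.fromstring("""
<body>
  <h>Guide</h>
  <section>
    <h>Setup</h>
    <section>
      <h>Requirements</h>
    </section>
  </section>
  <section>
    <h>Usage</h>
  </section>
</body>
""")

for depth, title in outline(doc):
    print("  " * depth + title)
```

With numbered headings, a tool has to guess the hierarchy from the numbers; here it falls straight out of the markup.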

New Elements

The XHTML2 working group has focused on creating a more generic, simplified language. To that end, they’ve refrained from adding numerous specialized elements to represent different types of content. They argue that the new role attribute provides a mechanism for including rich metadata, making specialized elements unnecessary. That said, a couple new elements were included:

- blockcode represents computer code.
- di represents a group of related terms and definitions in a dl (definition list). This is useful for words with multiple definitions, or multiple spellings.
- handler represents a scripted event handler, with a type attribute specifying the handler language. If the user agent doesn’t understand the language, the handler’s children are processed (otherwise they’re ignored). Handlers may be nested to provide multiple implementations in various languages.

Conclusion

Both proposals look promising, with lots of new features that address common web development problems. But neither specification is an official recommendation, and it’s likely to stay that way for some time.

Despite its late start, the HTML 5 working group seems to have more industry support, and is further along in the recommendation process. Their goal is to have a complete spec, with multiple interoperable implementations, by late 2010 (as I said before, though, the W3C has already missed some milestones in the approval process). With industry support from most of the major browser vendors (the only notable exception being Microsoft) it’s likely that this specification will be implemented quickly and consistently once it’s reached a stable state.

What everyone wants to avoid is another standards war. Fortunately, since both languages support XML namespaces (or, in the case of the HTML serialization of HTML 5, DOCTYPE switching) it’s unlikely that we’ll see the sort of browser-dependent behavior we did in the 1990s. Standards wars aside, the future looks bright for web development. These new markup features and APIs will provide a rich environment for web development that should narrow the gap between Web and Desktop applications.

Why is VoIP cheaper than a standard telephone line?

Posted: Fri, 17 Aug 2007 23:35:25 +0000

Yesterday, Comcast came by to install their digital voice package at my apartment. Comcast has a special deal going on now: $24.95/month for 6 months, unlimited long distance. Skype’s even cheaper — $3/month for outgoing calls and $5/month for incoming (when they’re up). But here’s what I’m wondering: why’s it so cheap? Why is VoIP cheaper than a traditional plain old telephone service (POTS) line? Or, put another way, why is a POTS line more expensive than a VoIP line?

Let me take a moment to clarify. I understand why VoIP is cheaper for enterprise applications. Network convergence lowers the fixed cost of infrastructure, and commodity TCP/IP telecommunications equipment is a lot less expensive than specialized Public Switched Telephone Network (PSTN) equipment. What I’m wondering is why a single, residential POTS line (where fixed costs are already sunk, and there’s very little marginal cost) costs more than a VoIP connection.

In the beginning there was voice

Usually when I bring this up the first response people have is “duh, it’s the Internet — everything is cheaper online.” Competition, low overhead, etc, etc. But these people usually don’t know much about the history of the telcos, their relationship with computer networks, and the way data actually gets around the Internet. Even I had to go back to the books for some of this stuff. But keep reading: understanding this history is critical to fully appreciating the mystery behind the VoIP vs. POTS pricing riddle.

Long before computer networks became important, telephone companies were using digital communication. The first digital voice circuit was used in Chicago in 1962 (ARPANET, the predecessor to today’s Internet, wasn’t up and running until 1969). The telcos used these digital circuits to send lots of voice connections over long distances — something that analog circuits were no good at — and they continue to use them for this purpose today.

Voice communication has a few special characteristics. For one thing, it’s inherently real-time. You’d get annoyed if phone calls consisted of long periods of silence followed by several seconds of high-speed playback to catch up with the conversation on the other end. To prevent this from happening, digital voice circuits provide guaranteed Quality of Service (QoS). Once a connection is provisioned, you’ll always get exactly the amount of bandwidth you need. It’s not just bandwidth, though: latency and jitter are also carefully controlled by using small, fixed-size data packets. The point is, these networks were specially designed to facilitate voice communication.

Then along came the Internet

When computer networks began popping up in the 1980s, the telcos wanted in. They already had a lot of infrastructure so they started looking at how they could send data over their existing trunk lines. They came up with a number of technologies with varying levels of success. But there was (and is) a problem: data networks are fundamentally different than voice networks.

First, data doesn’t have the same real-time constraints voice has. Computers can handle bursty connections, so latency and jitter aren’t a big issue. Packets can arrive out of order, long after they’re requested, without causing problems. And in most cases bandwidth guarantees aren’t needed; it makes more sense to let a single computer consume all available bandwidth if it’s the only one active.

With these things in mind, the Internet Protocol (IP) was designed to provide best effort delivery. That means it doesn’t guarantee bandwidth, data frequently arrives out of order (or not at all), and latency and jitter are accepted. Sending real-time data (like voice communication) over IP is very inefficient, and a huge pain. But it’s great for sending normal data like web sites and email.

Despite these differences, the telcos had infrastructure in place, so there was a lot of incentive to use it. After a few misses, Asynchronous Transfer Mode (ATM) was designed as a compromise technology that could carry both voice and data. But, in reality, it’s much less efficient than a pure data network. The overhead for data transfers on ATM is more than 10%, compared to about one percent for an Ethernet link running full-throttle. While gigabit Ethernet is challenging the technology, to this day ATM is used on most Internet backbones. And here’s the clincher: long distance telephone calls go over the same lines.
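For the curious, here is a back-of-the-envelope check of those overhead figures in Python. The assumptions are mine: a 1500-byte IP packet, AAL5 encapsulation on the ATM side (an 8-byte trailer, then padding up to whole 48-byte cell payloads, each carried in a 53-byte cell), and only the 14-byte header plus 4-byte frame check sequence on the Ethernet side. Counting the Ethernet preamble and inter-frame gap as well would push the Ethernet figure closer to 2.5%.

```python
import math

ATM_CELL = 53          # bytes per ATM cell on the wire
ATM_PAYLOAD = 48       # payload bytes per cell (the other 5 are header)
AAL5_TRAILER = 8       # AAL5 trailer added before padding to whole cells
ETH_HEADER_FCS = 18    # 14-byte Ethernet header + 4-byte frame check sequence


def atm_overhead(packet_bytes):
    """Fraction of bytes on the wire that are not the packet itself."""
    cells = math.ceil((packet_bytes + AAL5_TRAILER) / ATM_PAYLOAD)
    wire_bytes = cells * ATM_CELL
    return (wire_bytes - packet_bytes) / wire_bytes


def ethernet_overhead(packet_bytes):
    """Same measure for Ethernet, ignoring preamble and inter-frame gap."""
    wire_bytes = packet_bytes + ETH_HEADER_FCS
    return ETH_HEADER_FCS / wire_bytes


packet = 1500  # assumed packet size in bytes
print(f"ATM:      {atm_overhead(packet):.1%}")
print(f"Ethernet: {ethernet_overhead(packet):.1%}")
```

Under these assumptions a 1500-byte packet needs 32 cells, or 1696 bytes on the wire, so 196 of those bytes are overhead. Smaller packets fare worse on ATM because the padding makes up a larger share of the last cell.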

Wrapping things up

So in the end, PSTN and VoIP phone calls go over the same network. Yet, for some reason, the technology that makes more efficient use of existing network resources (PSTN) is more expensive. VoIP layers voice on top of IP, which is not ideal for transmitting real-time data (no QoS, high jitter). IP is then layered on top of ATM, which is not ideal for transmitting data packets (high overhead). Despite all that inefficiency, VoIP providers still manage to charge less than their old school telco counterparts. What gives?

5 tools every PHP programmer should know about

Posted: Thu, 16 Aug 2007 01:21:44 +0000

After working on several large-scale PHP projects, and writing a lot of PHP code, I’ve discovered a number of tools that improve code quality, streamline rollouts, and generally make life as a PHP developer a whole lot easier. Many of these tools probably deserve a post of their own. But, since some people aren’t even aware that these tools exist, I figured I’d start there. So, without further ado, here’s my list of tools that every PHP programmer should know about.

Phing - a project build system

Phing is a project build system based on Apache Ant. The name is a recursive acronym, of sorts, that stands for PHing Is Not GNU make. Phing can do anything a traditional build system like GNU make can do, but without the steep learning curve.

The idea behind phing (and other build tools) is to evaluate a set of dependencies, then execute a set of PHP classes to properly install and configure an application. The build process is controlled by a simple XML configuration file. Out of the box, phing can perform token replacement (e.g., to change include paths on your development and production systems), execute SQL, move and copy files, run shell scripts, and more. You can also create your own custom tasks by extending the “task” class included with the package.
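If token replacement is new to you, the idea looks roughly like this. This is a Python sketch of the concept only, not Phing's implementation or its configuration syntax; the @NAME@ delimiter style, the token names, and the paths are assumptions chosen for illustration.

```python
import re

# Assumed token style: a name wrapped in @ signs, e.g. @INCLUDE_PATH@.
TOKEN = re.compile(r"@([A-Za-z0-9_.]+)@")


def replace_tokens(text, values):
    """Swap each known @NAME@ token for its value; leave unknown tokens alone."""

    def substitute(match):
        name = match.group(1)
        return values.get(name, match.group(0))

    return TOKEN.sub(substitute, text)


# Hypothetical source line and per-environment values.
template = "require_once '@INCLUDE_PATH@/config.php'; // env: @ENV@"
development = {"INCLUDE_PATH": "/home/mike/project/lib", "ENV": "dev"}
production = {"INCLUDE_PATH": "/var/www/lib", "ENV": "prod"}

print(replace_tokens(template, development))
print(replace_tokens(template, production))
```

The same source file produces a correct include path on each system, which is exactly the chore you want a build tool to handle for you.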

Phing is an invaluable tool for anyone who needs to deploy large-scale PHP applications on more than a single server. But I’ve found it useful for simple scripts, too.

Xdebug - debugger and profiler tool

Xdebug is a PHP extension that helps you debug and profile scripts. Among the most useful features of Xdebug are the new notice, warning, and error messages that are displayed after activation. If a script fails to execute properly, Xdebug will print a full stack trace in the error message, along with function names, parameter values, source files, and line numbers. A welcome feature for developers who are tired of the skimpy error reports from a default PHP install.

The extension has a number of more advanced features that allow developers to perform code coverage analysis, collect profiling information, and debug scripts interactively. The profiling functionality is particularly useful. The profiler uses a common output file format, allowing you to use tools like KCacheGrind to quickly find bottlenecks in your code. A good profiler is an essential tool for any serious developer, as it allows you to properly optimize your code while avoiding the hazards of premature optimization.

PHPUnit - unit testing framework

PHPUnit is a lightweight testing framework for PHP. It’s a complete port of JUnit 3.8.1 for PHP5, and is a member of the xUnit family of testing frameworks (which are based on a design by software patterns pioneer Kent Beck).

Unit tests form the foundation of several modern agile development methodologies, making PHPUnit a vital tool for many large-scale PHP projects. The tool can also be used to generate code coverage reports using the Xdebug extension discussed earlier, and integrates with Phing to automate testing.

Propel - object-relational mapping framework

Propel is an Object-Relational Mapping (ORM) framework for PHP5 that was originally based on the Apache Torque project. It provides a sophisticated but easy-to-use database abstraction layer that allows you to work with database entities the same way you work with normal classes and objects in PHP. Propel allows you to define your database in a simple XML format which it uses to construct the database, and generate static classes for use in your application.

Propel is integrated into the popular Symfony PHP framework (among others), which has helped keep the code base flexible, modular, and portable. The project has excellent documentation, and a great support community.

phpMyAdmin / phpPgAdmin - web-based database administration

An oldie but a goodie, phpMyAdmin is one of the most useful administrative tools available for any database (along with its PostgreSQL and SQLite cousins phpPgAdmin and phpSQLiteAdmin). It’s useful for everything from constructing and altering databases to debugging applications and making backups. This is often the first thing I install after Apache, PHP and MySQL on a LAMP server. If you use MySQL, and somehow you haven’t heard of it, install it now.

Other Stuff

There are tons of excellent tools that fill all sorts of niches, and help provide a rich environment for PHP developers — I wish I could mention them all. A few more that I’ve found useful myself are PHP Beautifier, Spyc, Creole, and Smarty. I’m sure there are tons more that I either forgot, or have never heard of. So if you know of a great PHP development tool that I left out, please post a comment and let me (and everyone else) know about it!

Database Design: Choosing a Primary Key

Posted: Tue, 14 Aug 2007 23:10:53 +0000

A good model and a proper database design form the foundation of an information system. Building the data layer is often the first critical step towards implementing a new system, and getting it right requires attention to detail and a whole lot of careful planning. A database, like any computer system, is a model of a small piece of the real world. And, like any model, it’s a narrow representation that disregards much of the complexity of the real thing.

Modern database systems rely on the relational model to store and retrieve data. The name comes from the relationship between columns in a table (not because you can relate tables to one another). In other words, relational means that several values that belong to the same row in a table are related.

The primary key is an attribute (or a combination of attributes) that uniquely identifies a row. Though not strictly required by relational mathematics, primary keys make it reasonably easy to deal with relational data programmatically. They make mapping relational data to an object-oriented model feasible, and allow applications to uniquely identify and manipulate each entity (row) in the database.

Natural Keys

The concept of a unique identifier is familiar in the real world — you use account numbers to identify credit cards, addresses to identify buildings or houses, etc. These are examples of natural keys, real-world identifiers that are used to uniquely identify real-world objects.

In general, if the data you are modeling has a decent natural identifier you should use it as the primary key. That said, not all natural keys make good primary keys. The goal of the primary key is to uniquely identify an entity in your database. It does not have to describe the entity. The fact that a particular identifier can be used to describe an object in the real world doesn’t mean it’s a good primary key.

There are a number of desirable (not necessarily required) primary key characteristics that natural identifiers sometimes lack:

- Unique values: The primary key must uniquely identify each row in a table.
- Non-intelligent: The primary key should not have embedded semantic meaning. In other words, it should not describe characteristics of the entity. A customer ID of 398237 is typically preferred over Michael J. Malone.
- No change over time: The value of a primary key should never change. Changing a primary key value means you’re changing the identity of an entity, which doesn’t make sense. Non-intelligent keys are preferred because they are less likely to change.
- Single-attribute: A primary key should have the minimum number of attributes possible. Single-attribute primary keys are desirable, because they’re easier for applications to work with, and they simplify the implementation of foreign keys.
- Numeric: It’s easier to manage unique values when they are numeric. Most database systems have internal routines that facilitate auto-incrementing primary key attributes. While these facilities are useful, be careful not to use them as a crutch.

For each of these rules, there are exceptions. For example, composite primary keys are particularly useful as identifiers in join tables, modeling a many-to-many relationship. And an otherwise suitable single-value natural key should not be disregarded simply because it’s not numeric.
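Here is what the join table exception might look like in practice, sketched with Python's built-in sqlite3 module. The post, tag, and post_tag tables are hypothetical examples of mine. The pair of foreign keys is the primary key, so the database itself rejects a duplicate link.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (post_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE tag  (tag_id  INTEGER PRIMARY KEY, name  TEXT NOT NULL);
    -- Join table for the many-to-many relationship: the composite
    -- (post_id, tag_id) pair is the primary key.
    CREATE TABLE post_tag (
        post_id INTEGER NOT NULL REFERENCES post(post_id),
        tag_id  INTEGER NOT NULL REFERENCES tag(tag_id),
        PRIMARY KEY (post_id, tag_id)
    );
    INSERT INTO post VALUES (1, 'Scaling WordPress');
    INSERT INTO tag VALUES (1, 'php');
    INSERT INTO tag VALUES (2, 'performance');
""")


def link(post_id, tag_id):
    """Attach a tag to a post; return False if the pair already exists."""
    try:
        with conn:
            conn.execute("INSERT INTO post_tag VALUES (?, ?)", (post_id, tag_id))
        return True
    except sqlite3.IntegrityError:
        return False


results = [link(1, 1), link(1, 2), link(1, 1)]
print(results)
```

Adding a separate auto-generated ID to post_tag would not prevent the duplicate pair; you would still need a unique constraint on the two columns, which is why the composite key is the natural fit here.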

Surrogate Keys

When a natural key doesn’t exist, or when the natural key isn’t up to snuff, it’s time to consider using a surrogate key (also called a synthetic key) to uniquely identify entities. A surrogate primary key is typically a numeric, single-attribute key and is often auto-generated by the database system. While some DBAs continue to debate their use, surrogate primary keys are pretty much accepted practice these days.

A surrogate key is meaningless by itself. Thus, it has no embedded semantic meaning. The sole purpose of a surrogate key is to uniquely identify entities, and to facilitate relational operations like joins and filters. It’s a single, unique value that never has to change (because its only job is to identify the entity). Thus, it’s an ideal primary key.
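A small sqlite3 sketch in Python shows the pattern. The customer table and the choice of email as the natural identifier are assumptions for the sake of the example. Note that the natural key keeps its own UNIQUE constraint, so using a surrogate does not mean ignoring it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,   -- surrogate key, assigned by SQLite
        email       TEXT NOT NULL UNIQUE,  -- natural key, still enforced
        name        TEXT NOT NULL
    )
""")


def add_customer(email, name):
    """Insert a customer; return the surrogate key, or None on a duplicate email."""
    try:
        with conn:
            cur = conn.execute(
                "INSERT INTO customer (email, name) VALUES (?, ?)", (email, name)
            )
        return cur.lastrowid
    except sqlite3.IntegrityError:
        return None


customer_id = add_customer("mike@example.com", "Michael J. Malone")
duplicate = add_customer("mike@example.com", "Someone Else")

# The natural value can change without touching the row's identity.
with conn:
    conn.execute(
        "UPDATE customer SET email = ? WHERE customer_id = ?",
        ("mjm@example.com", customer_id),
    )
row = conn.execute("SELECT customer_id, email FROM customer").fetchone()
print(customer_id, duplicate, row)
```

Any table that references customer_id is unaffected by the email change, which is the practical payoff of a key that never has to change.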

Because surrogate keys always consist of a single attribute, they can simplify business logic. If the column name for the primary key of a table can be derived from the table name, for example, a code generator can be used to build a primitive database abstraction layer. These keys, in combination with the table name, act as a globally unique identifier at runtime. This identifier can be used to build sophisticated caching mechanisms, facilitate lazy-loading, and simplify serialization.
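To sketch the caching idea: the (table name, key) pair can serve directly as a cache key. The Python class below is a hypothetical illustration, with a stand-in loader function where a real application would query the database.

```python
class IdentityMap:
    """Cache loaded rows under a (table name, surrogate key) identifier."""

    def __init__(self, loader):
        self._loader = loader   # callable taking (table, key) and returning a row
        self._cache = {}

    def get(self, table, key):
        identity = (table, key)
        if identity not in self._cache:
            # Lazy load: only hit the data source the first time a row is needed.
            self._cache[identity] = self._loader(table, key)
        return self._cache[identity]


calls = []


def fake_loader(table, key):
    """Stand-in for a database query; records each call it receives."""
    calls.append((table, key))
    return {"table": table, "id": key}


cache = IdentityMap(fake_loader)
first = cache.get("customer", 42)
second = cache.get("customer", 42)   # served from the cache
other = cache.get("invoice", 42)     # same number, different table
print(len(calls))
```

The same integer in two different tables never collides, because the table name is part of the identifier.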

Conclusion

As the foundation of many information systems, database design should be carefully planned and properly implemented. Choosing the proper primary key is a critical task in modeling relational data. If possible, entities should have a unique identifier that has meaning rather than some obscure sequential integer. But the natural identifier need not be the primary key; it’s perfectly acceptable to use a synthetic, or surrogate, key as a table’s primary key. That said, don’t use auto-generated primary keys in order to avoid identifying and properly handling the natural keys present in your data.


Scaling WordPress

Posted: Tue, 14 Aug 2007 02:47:16 +0000

WordPress seems to have a bad reputation when it comes to scalability. Maybe it’s deserved, since a default WordPress installation doesn’t really scale well. But making WordPress scale isn’t hard. I recently hit the Digg home page and got roughly 70,000 pageviews in under 12 hours. Another post hit the home page later the same day, and another 10,000 clickthroughs followed. As a result, I’ve been asked by a few people how I managed to keep my site up under that sort of stress. Honestly, I haven’t done anything that fancy. But for future reference I figured I’d document my configuration, and let people in on one trick that saved my butt.

First, I run WP-Cache. I started using WP-Cache after my first Digg experience and, for me, the performance improvements far outweigh the compatibility issues that may arise. If you’re a real performance junkie you may be interested in a simple hack to enable gzip with WP-Cache.

If you have some content that must be fully dynamic (like the pageview counters that I recently added here under the post title) you can take advantage of the mfunc functionality built into WP-Cache. To include some dynamic code in a cached page, use the following syntax:

<!--mfunc function_name('parameter'); -->
<?php function_name('parameter'); ?>
<!--/mfunc-->

Caching a page, then selectively adding dynamic functionality where it’s necessary will drastically reduce the load on your server.

This site is hosted on a dedicated server, along with another project I’m working on. A second server handles the database back-end for both sites. This setup is capable of handling a tremendous amount of traffic. The bottleneck, surprisingly enough, is the front-end web server (not the database).

In retrospect, it probably would have been wise to upgrade the front-end server to at least 1GB of RAM (it currently has only 512MB). With Apache processes running upwards of 20MB each, RAM limits the number of child processes that can be spawned before the system starts thrashing. And since an Apache process can only handle one connection at a time, the number of Apache child processes places an upper bound on the number of concurrent requests the server can handle. Additional requests are backlogged, and when the backlog builds up, system performance suffers.
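
The arithmetic behind that limit can be sketched in a few lines of Python. The 512MB, 1GB, and 20MB figures come from the paragraph above; the 112MB reserved for the operating system and everything else is purely a guess for illustration.

```python
def max_children(total_ram_mb, reserved_mb, per_process_mb):
    """Largest number of worker processes that fit in RAM without swapping."""
    usable = total_ram_mb - reserved_mb
    if usable <= 0:
        return 0
    return usable // per_process_mb

# reserved_mb=112 is a hypothetical allowance for the OS and other daemons.
print(max_children(512, 112, 20))    # 20
print(max_children(1024, 112, 20))   # 45
```

Under these assumptions, doubling the RAM more than doubles the number of requests that can be served concurrently, because the fixed overhead is only paid once.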

I’ve been experimenting with using mod_backhand in combination with Amazon EC2 to offload some traffic from my primary web server to an elastic compute cloud during periods of heavy traffic. My success with Amazon S3 has fueled my interest in EC2, but I’m still skeptical of EC2’s ability to scale rapidly enough to handle traffic spikes (it takes a minute or two to provision an EC2 server; in that time, a front page story on Digg could send upwards of 1,000 page requests your way).

Finally, my little trick. If you’re not running WP-Cache, and one of your posts is about to go viral, try caching the post manually. It’s a simple process:

1. Create a static version of your post using a tool like wget, or by saving the page in your browser.
2. Duplicate the directory structure that you see in the post URL under the root web directory on your server (e.g., http://immike.net/blog/08/13/post-title would be something like /var/www/blog/08/13/post-title).
3. Name the static copy of the post index.html and store it in your newly created directory.

If you’re using the standard WordPress rewrite rules (the ones WordPress auto-generates), static HTML files will override dynamically generated content. Thus, as soon as the static copy of your post is in place, Apache will start serving it instead of passing requests off to WordPress. Even on modest hardware Apache can handle hundreds of requests for static content per second, so this trick should keep your server up through the storm. Once things calm down, simply remove the files/directories and WordPress will take over once again.
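
If you would rather script the trick than do it by hand, the sketch below shows one way it might look in Python. The function names are made up for this example, and it assumes the web root is writable and that the post URL maps directly onto the directory layout as described above.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen

def static_path(post_url, web_root):
    """Map a post URL to the index.html path that Apache would serve."""
    url_path = urlparse(post_url).path.strip("/")
    return Path(web_root) / url_path / "index.html"

def cache_post(post_url, web_root, html=None):
    """Write a static copy of the post; fetch it first if html is not given."""
    if html is None:
        with urlopen(post_url) as response:
            html = response.read()
    target = static_path(post_url, web_root)
    target.parent.mkdir(parents=True, exist_ok=True)  # recreate the URL's directories
    target.write_bytes(html)
    return target
```

For example, cache_post("http://immike.net/blog/08/13/post-title", "/var/www") would write /var/www/blog/08/13/post-title/index.html. Deleting that file and its directories hands the URL back to WordPress.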


SCO Doesn’t Own UNIX

Posted: Sat, 11 Aug 2007 01:53:06 +0000

Following nearly five years of FUD and barratry, Judge Dale Kimball has issued a 102-page ruling [via Groklaw] concluding that “Novell is the owner of the UNIX and UnixWare copyrights.” Furthermore, the court found that SCO owes Novell quite a bit of money — “[B]ecause a portion of SCO’s 2003 Sun and Microsoft Agreements indisputably licenses SVRX products… SCO is obligated… to account for and pass through to Novell the appropriate portion relating to the license of SVRX products.”

While this judgement essentially renders all of SCO’s claims against Novell moot, the battle continues in the ongoing SCO vs. IBM case. That said, SCO is suing IBM for copyright infringement. And since today’s ruling found Novell the rightful owner of UNIX IP, the case against IBM is pretty much over too (it’ll be hard for SCO to win a copyright infringement suit when they don’t actually own the copyright in question).

The amazing thing is, after all this time SCO hasn’t shown any convincing evidence to back up their claims. After raising millions of dollars from Sun and Microsoft, SCO had a large enough war chest to drag a baseless lawsuit out for years. I’m glad it’s (mostly) over, but it’s still frustrating that it happened at all!


DNS Rebinding Revisited

Posted: Wed, 08 Aug 2007 22:58:39 +0000

I wrote last week about how DNS rebinding can bypass browser same origin policies. Since then I found a paper titled Protecting Browsers from DNS Rebinding Attacks that describes rebinding attacks in greater detail. It turns out that there are several varieties of rebinding attacks, and a couple of proof-of-concept DNS rebinding demonstrations already exist.

To perform a DNS rebinding attack, an attacker initially answers DNS queries for their domain with the IP address of their server, and a very short time-to-live (TTL). Using javascript, or some other mechanism, the attacker initiates a second request to their domain from the victim’s machine. Since the TTL has expired, another DNS query is sent to the attacker’s DNS server. This time, the server responds with the IP address of a target server that the attacker wishes to connect to (e.g., an internal web server).
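
The sequence is easier to follow as a toy simulation. Nothing below touches the network: RebindingDNS is a made-up stand-in for the attacker’s DNS server, attacker.example is a placeholder host name, and the addresses are illustrative. The point it tries to show is that a same origin check that compares host names never notices the IP address changing underneath.

```python
class RebindingDNS:
    """Toy authoritative server: first answer is the attacker, later ones the target."""

    def __init__(self, attacker_ip, target_ip, ttl=1):
        self.answers = [attacker_ip, target_ip]
        self.ttl = ttl
        self.queries = 0

    def resolve(self, hostname):
        ip = self.answers[min(self.queries, 1)]
        self.queries += 1
        return ip, self.ttl

def same_origin(host_a, host_b):
    """The check compares host names only; it never sees IP addresses."""
    return host_a == host_b

def simulate():
    dns = RebindingDNS("203.0.113.5", "10.0.0.1")
    first_ip, _ = dns.resolve("attacker.example")    # initial page load
    second_ip, _ = dns.resolve("attacker.example")   # script request after the TTL expires
    allowed = same_origin("attacker.example", "attacker.example")
    return first_ip, second_ip, allowed
```

Here simulate() returns ("203.0.113.5", "10.0.0.1", True): the second request is allowed even though it now goes to an internal address.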

Modern browsers implement a security mechanism called DNS pinning as a partial defense against DNS rebinding attacks. Once a browser resolves a host name to an IP address, the browser caches the result for a fixed duration regardless of the TTL (the cache duration varies quite a bit: 1 second for Safari 2, 120 seconds for Firefox 2, and 30 minutes for IE7).
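
A pinning cache of this kind might look roughly like the sketch below. PinningResolver and its lookup callable are hypothetical names; real browsers implement pinning internally and differ in the details. The clock is passed in explicitly so the behavior is easy to follow.

```python
class PinningResolver:
    """Cache each host's first answer for a fixed time, ignoring the DNS TTL."""

    def __init__(self, lookup, pin_seconds):
        self._lookup = lookup            # hypothetical callable: host -> (ip, ttl)
        self._pin_seconds = pin_seconds
        self._pins = {}                  # host -> (ip, expires_at)

    def resolve(self, host, now):
        pinned = self._pins.get(host)
        if pinned and now < pinned[1]:
            return pinned[0]             # still pinned: no new DNS query
        ip, _ttl = self._lookup(host)    # the TTL is deliberately ignored
        self._pins[host] = (ip, now + self._pin_seconds)
        return ip
```

With pin_seconds set to 120, a rebound answer is not seen until two minutes after the first lookup, no matter how short the attacker’s TTL is.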

But browser plug-ins maintain their own pin databases, creating a new class of multi-pin vulnerabilities. After the browser resolves an attacker’s host name, subsequent network connections from Flash or Java LiveConnect will trigger additional DNS queries. Thus, an attacker can pin one IP address to the browser, while pinning a second address to Java or Flash. Browsers allow inter-technology communication between plug-ins and the browser, so transmitting data between components is trivial.

Moreover, the Stanford research paper describes a number of techniques for bypassing the browser’s internal pinning mechanism, and performing rebinding attacks without relying on third-party plug-ins. It turns out that an attacker “can cause the browser to release its pin after as little as one second by forcing a connection to the current IP to fail, for example by including the element <img src="http://attacker.com:81/">.”

If you’re still questioning whether this sort of attack could be carried out in the real-world, the researchers at Stanford “tested DNS rebinding experimentally by running a Flash 9 advertisement on a minor advertising network… The experiment used two machines in our laboratory, an attacker and a target. The attacker ran a custom authoritative DNS server for dnsrebinding.net, a custom Flash policy server, and an Apache web server hosting the advertisement. The target ran an Apache web server to log successful attacks.” The results show that the experiment was “successful on 30,636 (60.1%) impressions and 27,480 unique IP addresses.” After running the experiment for three nights, the researchers “obtained 100.3 machine-days of network access.”

That’s proof enough for me, but for you seeing-is-believing folks, make sure you check out the proof of concept attacks for your browser, Java LiveConnect, and Flash 9.


Site Redesign

Posted: Tue, 07 Aug 2007 19:21:05 +0000

I’ve had “redesign blog” on my to-do list for more than a month now. Over the weekend I finally got around to doing something about it [screenshot]. The stock WordPress theme I’ve been using since I launched my blog is great, but I’ve noticed it showing up on more and more sites and I wanted something unique. The changes I made are evolutionary, not revolutionary. And I’m the first to admit that I’m not the best designer around, but I think it’ll do for now.


I’ll probably be making more changes in the near future. I have some ideas for post stats I’d like to show, and other navigational features I’d like to implement. In the meantime I’ll be working out any issues that come up with the changes that have already been made. If you notice any bugs with the CSS/XHTML please let me know in a comment or via my contact page.

Finally, I’d like to thank Ryan Powell for creating the header image (and helping me out with design ideas for the Twitter bar), Paul Stamatiou for helpful feedback and comments as I worked through creating the new layout, and Andrew Mager (and everyone else who has seen the new design) for not being too critical when I asked for their opinions!


Single line of HTML crashes IE 6

Posted: Mon, 06 Aug 2007 22:05:23 +0000

A Japanese blogger who goes by the name Hamachiya2 has discovered a single line of HTML and CSS that crashes IE 6. The line is:

<style>*{position:relative}</style><table><input></table>

If you’re brave, you can click here to try it out. The code is rendered correctly in Firefox, Safari and Opera (didn’t get a chance to try any other browsers, but presumably they work too). But in IE 6 it raises a fatal error in mshtml.dll.


Password free remote login and other SSH tips

Posted: Mon, 06 Aug 2007 19:35:40 +0000

I typically have four or five terminal windows open, and I’m almost always logged in to at least three servers (my dev box, production box, and database server). It’s a huge pain to log back into all these sessions whenever my connection is dropped. To keep myself sane, I use a couple of tricks to keep timeouts from occurring, and to streamline the login process when they do.

Keep connections alive

My home network consists of a cheap NAT firewall/wireless access point connected to a cable modem. In order to route incoming traffic properly, NAT devices keep a table of active connections in memory. As a result, NAT firewalls have a nasty habit of timing out idle sessions to keep their state tables clean. Thankfully, SSH has a built in keepalive mechanism that solves the problem.

You can turn SSH keepalives on at the system level, or on a per user basis. Single user configuration options are stored in ~/.ssh/config. The system wide configuration options are typically found in /etc/ssh_config (on Debian systems, it’s /etc/ssh/ssh_config). To enable keepalives, open the config file in your favorite text editor and add the following lines:

Host *
  ServerAliveInterval 60

The numeric argument specifies the number of seconds between keepalive requests on idle connections. The Host line lets you restrict declarations to a particular host, or group of hosts. A single ‘*’ is a wildcard pattern that matches any host, so keepalive requests will be sent for all sessions.

Password free login

Even with keepalive requests turned on, your session will time out occasionally (e.g., when you lose your internet connection). You can save yourself a bit of time by adding your workstation’s SSH key to the authorized_keys file on each remote system you log in to.

First, generate an SSH public/private key pair on your local system (if you already have a key pair you can skip this step). When prompted for a passphrase, leave it blank.

local$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/Users/mmalone/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /Users/mmalone/.ssh/id_rsa.
Your public key has been saved in /Users/mmalone/.ssh/id_rsa.pub.
The key fingerprint is:
4d:d0:f4:f2:6c:3a:ac:b4:dc:c7:71:2b:b8:b7:5a:7c
mmalone@michael-malones-computer.local

Next, copy the public key from your local system to the system you’re logging into (if the ~/.ssh directory does not exist on the remote system, you may have to create it: mkdir ~/.ssh).

local$ cat ~/.ssh/id_rsa.pub | ssh mmalone@immike.net \
> 'cat >> .ssh/authorized_keys'
Password:

All done. Now you can log into your remote system without a password:

local$ ssh mmalone@immike.net
Linux www 2.6.18-3-686 #1 SMP Sun Dec 10 19:37:06 UTC 2006 i686

mmalone@www:~$

The same public key can be used on any number of remote servers, so you can repeat steps two and three on any other servers you regularly use.


Threads vs. Processes: They’re not the same thing!

Posted: Sat, 04 Aug 2007 22:10:57 +0000

I read a lot of tech-related blogs and other tech news, and I’ve caught a number of very talented programmers and intelligent technologists using the terms thread and process interchangeably. Forgive me for being pedantic, but they’re not the same thing! It’s true that threads and processes are very similar: they’re both methods of parallelizing an application. But the similarities pretty much stop there.

A process is usually defined as an instance of a program that is being executed, including all variables and other information describing the program’s state. Processes have a life cycle: they are spawned, optionally generate one or more child processes, and eventually die. Each process is an independent entity to which system resources (CPU time, memory, etc.) are allocated. And each process is executed in a separate address space. Thus, one process cannot access variables or other data structures that are defined in another process. If two processes want to communicate, they have to use inter-process communication mechanisms like files, pipes, or sockets.

The term thread is short for a thread of execution, and refers to a particular execution path through a process. Threads and processes work differently on different operating systems but, in general, multiple threads can share the state information of a single process. Since threads share memory and other system resources, they can communicate directly via variables and other memory structures. And because threads can share a single address space, context switching between threads is faster than switching between processes.
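
The difference in memory sharing can be seen in a small Python experiment. This is only a sketch: it uses the "fork" start method, so it assumes a Unix-like system, and the counter dictionary and function names are invented for the example.

```python
import multiprocessing
import threading

counter = {"value": 0}

def bump():
    """Increment the counter in whatever address space this runs in."""
    counter["value"] += 1

def bump_in_thread():
    worker = threading.Thread(target=bump)
    worker.start()
    worker.join()
    return counter["value"]   # the thread changed our copy

def bump_in_process():
    ctx = multiprocessing.get_context("fork")   # Unix-only start method
    worker = ctx.Process(target=bump)
    worker.start()
    worker.join()
    return counter["value"]   # the child changed its own copy, not ours
```

Starting from zero, bump_in_process() leaves the parent’s counter at 0, while bump_in_thread() raises it to 1. To get the child’s result back, the processes would need a pipe, a socket, or some other IPC mechanism.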

Many modern applications take advantage of multithreading. In particular, applications that perform a lot of I/O — like web servers and databases — can drastically improve performance by implementing a multithreaded execution model. On a multiprocessor system, multiple threads of execution can even run simultaneously within a single process. Unfortunately, the threading abstractions in modern operating systems can be hard to understand, and are unavailable in certain programming languages.

So, one more time. A process can be thought of as a thread, plus an address space, file descriptors, and a bunch of other data. A single process can consist of multiple threads, and when one thread modifies a process resource, the change is immediately visible to sibling threads. On the other hand, processes have their own address spaces, and are unable to communicate directly. Processes and threads are not the same thing.


DNS rebinding can bypass browser same origin policy

Posted: Thu, 02 Aug 2007 23:28:55 +0000

Artur Bergman posted an interesting story yesterday on O’Reilly Radar titled Your browser is a tcp/ip relay. In the post, Bergman explains a new technique that could allow malicious code to bypass the same origin browser security model. The article credits security researcher Dan Kaminsky with discovering the loophole, though it appears to have been around for a while.

The attack is fairly simple (to explain, at least). The attacker first configures their DNS server so that query results have a very short time to live (TTL) — say 10 seconds. The victim connects to attacker.com, and loads the site as usual. The DNS server is immediately reconfigured to resolve attacker.com to a different IP address (say, 10.0.0.1). After the TTL expires, JavaScript on the victim’s browser makes another request to attacker.com, in compliance with the same origin policy. But this time attacker.com resolves to an internal IP address (10.0.0.1), allowing the attacker to remotely access a private network.

I spoke briefly to OpenDNS founder David Ulevitch about the exploit. Though the concept is fairly simple, we agreed that it would be difficult to perform this sort of attack in practice. An attacker would need to have intimate knowledge of the victim’s internal network, or rely on Flash or other web technologies to perform a network scan. Moreover, an attack would end as soon as the victim closed their web browser.

Nevertheless, a vulnerability clearly exists, and it could be difficult to resolve. Many web sites rely on round robin DNS configurations for load distribution. Since round robin configurations may legitimately return different IP addresses for the same host name, distinguishing malicious DNS rebinding attacks from round robin configurations will be difficult, if not impossible.


What is the Completely Fair Scheduler?

Posted: Thu, 02 Aug 2007 00:54:53 +0000

If you’ve been following Linux kernel news then you’ve probably heard about the new Completely Fair Scheduler that has been merged into the upcoming 2.6.23 kernel release. It’s been a while since I’ve done much Linux kernel hacking, so the initial announcement was mostly over my head. After reading about the new scheduler in several places, I decided to do a bit of research into how the current Linux scheduler works, and what makes the new scheduling algorithm so interesting. Here’s what I learned.

What is a scheduler?

Like every multitasking operating system, Linux achieves the illusion of simultaneous execution of multiple processes (or programs) by rapidly cycling through processes that are ready to run, giving each one a short amount of time to execute on the CPU. Determining when to switch between processes, and which process should be allowed to run next is called scheduling. The kernel code that performs process scheduling is called the scheduler.

Implementing a scheduling algorithm is tricky for a couple of reasons. First, an acceptable algorithm has to ration CPU time such that higher priority tasks (e.g., interactive applications like a web browser) are given preference over low priority processes (e.g., non-interactive batch processes like program compilation). At the same time, the scheduler must protect against low priority process starvation. In other words, low priority processes must eventually be allowed to run, regardless of how many high priority processes are vying for CPU time.

Schedulers must also be carefully crafted so that processes appear to be running simultaneously, without having too large an impact on system throughput. For interactive processes like GUIs, the ideal scheduler would give each process a very small amount of time on the CPU and rapidly cycle between processes.

A process can only respond to user input when it has control over the CPU. Because users expect interactive processes to respond immediately to input, the delay between user input and process execution should ideally be imperceptible to a human (somewhere between 50 and 150ms is usually sufficient).

For non-interactive processes the situation is reversed. Switching between processes, or context switching, is a relatively expensive operation. Thus, larger slices of time on the processor, and fewer context switches can improve system performance and throughput. The scheduling algorithm must strike a balance between all of these competing needs.

The O(1) Scheduler

The current Linux scheduler uses a complicated set of heuristics to provide adequate performance to nearly every type of process. The algorithm tries to identify interactive processes by analyzing average sleep time (e.g., the amount of time the process spends waiting for input). Processes that sleep for long periods of time are probably waiting for user input, so the scheduler assumes they’re interactive.

Each process is given a time quantum based on a static priority (set via the nice system call). The time quantum determines how long the process can execute on the CPU before it is preempted and another context switch occurs.

The scheduler maintains a list of active and expired processes. When a new process is spawned, it starts in the active list. Non-interactive processes move from the active list to the expired list once their time quantum is exhausted. Expired processes are forbidden to run until all active processes expire.

Interactive processes generally remain active: after execution, the scheduler refills their time quantum and leaves them in the set of active processes. However, if the oldest expired process has waited for a long time, or if an expired process has a higher static priority than the interactive process, the scheduler moves the interactive process to the expired process list. Thus, the scheduler gives interactive processes priority while preventing process starvation by guaranteeing that all processes will eventually have a chance to run.

The defining characteristic of the current scheduler is the algorithm’s running time. If you’re not familiar with Big O notation, it’s a simple system for describing how the size of the input data set affects the length of time it takes an algorithm to execute. An O(n^2) algorithm would take 1 unit of time to run if the input data contained a single item, 4 units of time for 2 items, 9 units for 3 items, etc. The current Linux scheduler is an O(1) scheduler. In other words, the algorithm takes the same amount of time to run regardless of the input data size (in this case, the number of processes being run on the system).
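
If the notation is new to you, counting steps in a couple of toy functions may help. These functions are illustrative only and have nothing to do with the actual scheduler code.

```python
def quadratic_steps(items):
    """Pair every item with every item: n * n steps, so O(n^2)."""
    steps = 0
    for _ in items:
        for _ in items:
            steps += 1
    return steps

def constant_steps(items):
    """Look at the first item only: one step whatever the input size, so O(1)."""
    return 1 if items else 0

for n in (1, 2, 3):
    data = list(range(n))
    print(n, quadratic_steps(data), constant_steps(data))
# 1 1 1
# 2 4 1
# 3 9 1
```

The second column grows with the square of the input size, while the third stays flat. That flat line is the property the O(1) scheduler is named for.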

If you think all of that sounds way more complicated than it has to be, you’ll be happy to know that the new Completely Fair Scheduler is far simpler, and may prove to be a superior scheduler.

The Completely Fair Scheduler

The Completely Fair Scheduler (CFS) has replaced the current O(1) Scheduler in the upcoming 2.6.23 kernel release. The CFS uses a far simpler scheduling algorithm, completely ignoring sleep time, interactive process identification, time slices, etc. Author Ingo Molnar describes the CFS as “basically [modeling] an ‘ideal, precise, multi-tasking CPU’ on real hardware.” It took a bit of homework to figure out what the heck he was talking about, but it turns out that the concept is actually quite simple.

An “ideal, precise, multi-tasking CPU” is one that can run multiple processes at the same time (which is, of course, impossible), giving each process an equal share of processor power. If a single process is running, that process would be given 100% of the processor’s power. With two processes, each would have 50% of the physical power (in parallel).

Molnar notes that “on real hardware, we can run only a single task at once, so while that one task runs, the other tasks that are waiting for the CPU are at a disadvantage - the current task gets an unfair amount of CPU time.”

With CFS, this “fairness imbalance” is tracked on a per-process basis. As a process waits for the CPU, the scheduler tracks the amount of time it would have used on our ideal processor. This time is calculated by dividing the wait time (in nanoseconds) by the total number of processes waiting. The resulting value is the amount of CPU time the process is entitled to, and is used to rank processes for scheduling and to determine the amount of time the process is allowed to execute before being preempted.
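
Here is a minimal sketch of that bookkeeping as described in the paragraph above. It is not the kernel’s implementation, and the function names and process names are made up. Each waiting process is credited with its wait time divided by the number of waiting processes, and the process owed the most runs next.

```python
def entitlement_ns(wait_ns, waiting_count):
    """CPU time a waiting process would have received on the ideal CPU."""
    return wait_ns / waiting_count

def pick_next(wait_times_ns):
    """Return the name of the process that is owed the most CPU time."""
    count = len(wait_times_ns)
    owed = {name: entitlement_ns(wait, count) for name, wait in wait_times_ns.items()}
    return max(owed, key=owed.get)

waiting = {"editor": 9_000_000, "compiler": 3_000_000, "backup": 6_000_000}
print(pick_next(waiting))   # editor
```

In this example the editor has waited longest, so it is owed 3ms of ideal CPU time and is scheduled first.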

Controversy

Red Hat employee and Linux kernel hacker Ingo Molnar caused quite a stir when he sent the Completely Fair Scheduler patch out over the Linux kernel mailing list. Before Molnar announced CFS, Con Kolivas’ RSDL scheduler was positioned to be merged into the kernel. Another scheduler written by Nick Piggin called nicksched had also been considered.

Some onlookers felt that Molnar co-opted ideas that Con and Nick pioneered, re-implemented them, and called them his own. While NIH syndrome may have contributed to the creation of CFS, Molnar’s ideas are novel enough to discredit any claimed impropriety. In any case, CFS has been merged into the Linux kernel and will be included in the upcoming 2.6.23 release. That said, I doubt the debate is over.


Flash Sucks

Posted: Tue, 31 Jul 2007 21:21:30 +0000

The Adobe Flash Player is a multimedia application created by Macromedia (now a division of Adobe Systems). Flash Player features support for both vector and raster graphics, along with a scripting language and bidirectional streaming of video and audio content. The player is a virtual machine that runs Flash files, which are often embedded in websites to present animations, games, GUIs, or other visual interestingness. If you’re reading this website, you probably know all of this. What you might not know is that Flash sucks. It is the bane of the Internet, and it needs to go away.

Usability and Accessibility

The ironic thing about Flash is that its use is so frequently self-defeating. Flash is often used in an attempt to make sites more user friendly. But replacing familiar browser components with custom Flash garbage only hurts usability. Consistency is imperative for a UI — users learn how to do something once, and can apply that knowledge in tons of places. But with Flash, overzealous designers try to “fix” what they see as bad interface models by creating custom Flash crap. This sucks.

While we’re on usability, let’s talk about people who are disabled. Flash sucks at accessibility. Though Flash has some features that are supposed to improve accessibility, they’re weak and almost never used. The fact of the matter is that Flash is pretty much inherently inaccessible. If you want to use Flash, and remain accessible (and indexable — web spiders can’t understand Flash binaries either), your only real option is to create a second version of your site that uses standard technologies. That sucks.

Technically, the accessibility and usability issues apply only to poorly designed Flash sites. Someone could (and probably will) counter that it’s not Flash that sucks, but people who are using Flash the wrong way. I’d argue that a tool that encourages suckiness is itself inherently sucky, but I’ll spare you that spiel and move on to the one thing that makes Flash incontrovertibly sucky, regardless of how you use it.

Closed Specification

Call me idealistic, but I hate companies that use closed specifications to stifle competition. And that may be my biggest issue with Flash. Sure, Adobe provides the SWF and FLV Specifications to developers who want to create Flash content. But first you have to agree to the SWF File Format Specification License where you promise that you will “not use the Specification in any way to create or develop a runtime, client, player, executable or other program that reads or renders SWF files.” That sucks.

Don’t care about the closed specification issues? Well, you should. As more and more content is stored in Adobe’s proprietary format, the company is gaining a tremendous amount of power. They’ve already announced a version of Flash that includes DRM support, allowing “copyright holders” to prevent users from skipping advertisements and restrict copying. Heck, digital rights management (DRM), combined with the overly restrictive anti-circumvention legislation in the DMCA, could make it illegal to download and save your own damned YouTube videos! That would definitely suck.

Glad to see Adobe has its priorities straight. While they rushed to include DRM support, the company has been dragging its feet on Flash support for 64-bit operating systems (there is none). This problem is years old. And it’s not like the advent of 64-bit CPUs was a surprise. They should have been working on 64-bit Flash in the late 1990s, or they should have at least given it some thought! And, seriously, it’s taken a team of coders more than two years to port a plugin from 32-bit to 64-bit? Christ, Apple ported an entire operating system from a RISC to a CISC chipset in less time than that. Sounds like the Flash code-base sucks too.

So what’s the alternative?

Yeah, you got me. That’s what really sucks. Microsoft Silverlight might provide a viable alternative once it’s released. But chances are it will suck at least as much as Flash. Maybe if the W3C standards for SVG and SMIL are ever fully implemented a decent open solution will exist and the problem will go away (if you’re in Firefox, check out some of the SVG samples, they’re pretty cool). But until then, we’re stuck with Adobe’s crap. So I implore you: use it right, and only when absolutely, positively, unquestionably and undeniably necessary.


What exactly is a load average?

Posted: Fri, 27 Jul 2007 21:47:55 +0000

If you’ve spent some time on a Unix or Unix-like machine (e.g., Linux, OS X, Solaris, etc.) then you’re probably at least vaguely familiar with the concept of a load average. A system’s load average can be easily determined from the Unix shell by running the uptime command:

mmalone@www:~$ uptime
 15:37:38 up 133 days,  3:37,  3 users,
    load average: 0.37, 0.37, 0.41

The load average is also displayed by the w and top commands, and by pretty much every system monitoring package on the planet. But what the heck is a load average, exactly?

To most people, a load average is some mysterious number that is somehow related to the amount of work that their computer is currently handling. But what is a good load average, and how high is too high? The answer is actually quite simple. But first you have to understand what the load average is actually measuring.

Without getting into the vagaries of every Unix-like operating system in existence, the load average more or less represents the average number of processes that are in the running (using the CPU) or runnable (waiting for the CPU) states. One notable exception exists: Linux includes processes in uninterruptible sleep states, typically waiting for some I/O activity to complete. This can markedly increase the load average on Linux systems.

The load average is calculated as an exponential moving average of the load number (the number of processes that are running or runnable). The three numbers returned as the system’s load average represent the one, five, and fifteen minute moving load average of the system.
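
An exponential moving average of this kind can be sketched as follows. The 5-second sampling interval and the exp(-interval/window) decay factor are assumptions chosen for illustration; each operating system has its own constants and its own fixed-point arithmetic.

```python
import math

SAMPLE_SECONDS = 5   # assumed sampling interval

def update_load(previous, runnable, window_seconds):
    """Blend the newest sample into the running average."""
    decay = math.exp(-SAMPLE_SECONDS / window_seconds)
    return previous * decay + runnable * (1 - decay)

def simulate(runnable_samples, window_seconds=60):
    """Feed a series of load numbers through the moving average, starting at 0."""
    load = 0.0
    for runnable in runnable_samples:
        load = update_load(load, runnable, window_seconds)
    return load

# One runnable process for a full minute (12 samples) on an idle system:
print(round(simulate([1] * 12), 3))   # 0.632
```

Note that after one minute of constant load the one minute average has only reached about 63% of the true value. Older samples fade out gradually rather than dropping off a cliff, which is why the numbers lag behind sudden changes.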

So, for a single processor machine a load average of 1 means that, on average, there is always a process in the running or runnable state. Thus, the CPU is being utilized 100% of the time and is at capacity. If you tried to run another process, it would have to wait in the run queue before being executed. For multiprocessor systems, however, the system isn’t CPU bound until the load average equals the number of processors (or cores, for multi-core processors) in the machine. My database server, for example, has two dual core processors. Thus, the system isn’t fully utilized until the load average reaches 4.

In summary, the load average is a moving average of the number of processes in the running or runnable states. You shouldn’t be worried about your system’s load unless it is consistently higher than the number of processors (or cores) in your machine. In general, you can calculate a system’s CPU utilization by dividing the load average by the number of processors/cores in the system.
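
As a quick sketch of that rule of thumb in Python: os.getloadavg() and os.cpu_count() are standard library calls (the former is only available on Unix-like systems), while the helper names are invented here. Treat the result as an estimate, since the load average also counts processes that are waiting, and can therefore exceed 1.0.

```python
import os

def cpu_utilization(load_average, cpu_count):
    """Estimate utilization as load per processor (1.0 means fully busy)."""
    return load_average / cpu_count

def current_utilization():
    """Estimate utilization from the one minute load average of this machine."""
    one_minute, _five, _fifteen = os.getloadavg()   # Unix-only
    return cpu_utilization(one_minute, os.cpu_count() or 1)

# The two dual core processors from the example above, at a load average of 4:
print(cpu_utilization(4, 4))   # 1.0
```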


