Sniffing HTTP

July 10, 2008 by Andy

Recently I spent a bit of time studying the effects of HTTP headers on different browsers. There was this issue with IE6 caching things too aggressively… but I digress. I crafted this one-liner for tethereal, the command-line version of Ethereal (now Wireshark). It continuously dumps HTTP request headers, response headers, and text responses, with a 30-line limit on each. Here it is, mainly for my memory but maybe someone else will benefit:

tethereal -i en1 -f 'host 1.2.3.4' -R 'http' -S -V -l | \
awk '/^[HL]/ {p=30} /^[^ HL]/ {p=0} /^ / {--p} {if (p>0) print}'

Replace en1 with the network adapter you are using (ifconfig lists them). Replace 1.2.3.4 with the IP of the destination machine. I used the awk command as a state machine to filter out unwanted output from tethereal and to impose the 30-line limit. The output looks like this:

Hypertext Transfer Protocol
    GET /style.css HTTP/1.1\r\n
        Request Method: GET
        Request URI: /style.css
        Request Version: HTTP/1.1
    Host: example.wordpress.com\r\n
    User-Agent: Mozilla/5.0 [...] Firefox/3.0\r\n
    Accept: text/css,*/*;q=0.1\r\n
    Accept-Language: en-us,en;q=0.5\r\n
    Accept-Encoding: gzip,deflate\r\n
    Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\n
    Keep-Alive: 300\r\n
    Connection: keep-alive\r\n
    Referer: http://example.com/\r\n
    Cookie: wp_test=WP+Cookie+check\r\n
    \r\n

Hypertext Transfer Protocol
    HTTP/1.1 200 OK\r\n
        Request Version: HTTP/1.1
        Response Code: 200
    Date: Thu, 10 Jul 2008 20:37:45 GMT\r\n
    Server: LiteSpeed\r\n
    Accept-Ranges: bytes\r\n
    Connection: Keep-Alive\r\n
    Keep-Alive: timeout=5, max=100\r\n
    Cache-Control: max-age=604800\r\n
    Expires: Thu, 17 Jul 2008 20:37:45 GMT\r\n
    ETag: "461d-47e542a4-0"\r\n
    Last-Modified: Sat, 22 Mar 2008 17:32:20 GMT\r\n
    Content-Type: text/css\r\n
    Content-Length: 2400\r\n
    Content-Encoding: gzip\r\n
    Vary: Accept-Encoding\r\n
    \r\n
    Content-encoded entity body (gzip): 2400 bytes -> 17949 bytes
Line-based text data: text/css
    /*
    \tTheme Name: Example
    \tTheme URL: http://wordpress.com
    */
    [...]
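
To see how the awk state machine behaves without capturing any traffic, here is a small sketch that wraps the same rules in a function and feeds it canned input. The function name http_filter, the limit argument, and the sample text are made up for illustration; only the four awk rules come from the one-liner above.

```shell
# Same state machine as the one-liner, with the limit passed in as n.
http_filter() {
    awk -v n="${1:-30}" '
        /^[HL]/   { p = n }   # "Hypertext Transfer Protocol" or "Line-based text data": start printing
        /^[^ HL]/ { p = 0 }   # any other top-level line (Frame, Ethernet, ...): stop printing
        /^ /      { --p }     # indented detail line: use up one line of the budget
        p > 0                 # print while budget remains
    '
}

# Invented sample in the shape of tethereal -V output.
sample='Frame 1 (500 bytes on wire)
    Arrival Time: Jul 10, 2008
Hypertext Transfer Protocol
    GET /style.css HTTP/1.1\r\n
    Host: example.wordpress.com\r\n
    Accept: text/css\r\n
Line-based text data: text/css
    /*
    */'

# With a limit of 3, each section prints its heading plus two detail lines.
printf '%s\n' "$sample" | http_filter 3
```

The heading line counts against the limit, so a limit of 30 means one heading and up to 29 indented lines per section.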

Batcache for WordPress

June 22, 2008 by Andy

[I meant to publicize this after a period of quiet testing and feedback but the watchdogs at WLTC upended the kitten bag and forced my hand. Batcache comes with all the usual disclaimers. If you try it on a production server expect the moon to fall on your head.]

People say WordPress can’t perform under pressure. The way most people set it up, that’s true. For those who host their blog for $7.99 a month (do they also run Vista on an 8086?) the best bet is to serve static pages rather than dynamic pages. Donncha’s WP-Super-Cache does that brilliantly. I’ve seen it raise a server’s capacity for blog traffic by one hundred times or more. It’s a cheapskate’s dream.

WP-Super-Cache is good for anyone with a single web server with a writable wp-content/cache directory. To them, the majority, I say use WP-Super-Cache. What about enterprises with multiple servers that don’t share disk space? If you can’t or won’t use file-based caching, I have something for you. It’s based on what WordPress.com uses. It’s Batcache.

Batcache will protect you

Batcache implements a very simple caching model that shields your database and web servers from traffic spikes: after a document has been requested X times in Y seconds, the document is cached for Z seconds and all new users are served the cached copy.

New users are defined as anybody who hasn’t interacted with your domain—once they’ve left a comment or logged in, their cookies will ensure they get fresh pages. People arriving from Digg won’t notice that the comments are a minute or two behind but they’ll appreciate your site being up.

You don’t need PHP skills to install Batcache but you do have to get Memcached working first. That can be easy or hard. We use Memcached because it’s awesome. Once you know how to install it you can create the same kind of distributed, persistent cache that underpins web giants like WordPress.com and Facebook.

What Batcache does

The first thing Batcache does is decide whether the visitor is eligible to receive cached documents. If their cookies don’t show evidence of previous interaction on that domain they are eligible. Next it decides whether the request is eligible for caching. For example, Batcache won’t interfere when a comment is being posted.

If the visitor and the request are eligible, Batcache enters its traffic metering routine. By default it looks for URLs that receive more than two hits from unrecognized users in two minutes. When a URL’s traffic crosses that threshold, Batcache caches the document for five minutes. You can configure these numbers any way you like, or turn off traffic metering and send documents right to the cache.

Once a document has been cached, it is served to eligible visitors until it expires. This is one place where Batcache is different. Most other caches delete cached documents as soon as the underlying data changes. Batcache doesn’t care if it’s serving old data because “old” is relative (and configurable).
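
The metering rule is easier to follow with numbers. The sketch below is not Batcache's code (Batcache is PHP and keeps its counters in Memcached); it is a toy awk simulation of the rule as described here, using the defaults mentioned above: more than two hits in 120 seconds, then cache for 300 seconds. The function name meter, the input format, and the output labels are all invented for illustration, and the counting window is a simplified fixed window.

```shell
# Hypothetical simulation of the metering rule; not taken from Batcache.
# Input: one request per line, "<seconds> <url>", eligible visitors only.
meter() {
    awk -v times="${1:-2}" -v seconds="${2:-120}" -v max_age="${3:-300}" '
    {
        t = $1; url = $2
        # Serve from cache while a cached copy exists and has not expired.
        if (url in cached_at && t - cached_at[url] < max_age) {
            print t, url, "cached"
            next
        }
        # Start a new counting window if there is none or the old one has lapsed.
        if (!(url in window_start) || t - window_start[url] >= seconds) {
            window_start[url] = t
            hits[url] = 0
        }
        hits[url]++
        # More than <times> hits inside the window: generate the page and store it.
        if (hits[url] > times) {
            cached_at[url] = t
            hits[url] = 0
            print t, url, "generated+stored"
        } else {
            print t, url, "generated"
        }
    }'
}

printf '%s\n' '0 /post' '10 /post' '20 /post' '30 /post' '400 /post' | meter
```

In this run the third request crosses the threshold and stores the page, the request at 30 seconds is served from the cache, and the request at 400 seconds finds the copy expired and starts counting again. Nothing in the loop looks at whether the post changed in the meantime, which is the point of the paragraph above.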

What Batcache doesn’t do

It doesn’t guarantee a current document. I repeat this because reliable cache invalidation is a typical feature that was purposefully omitted from Batcache. There is a routine in the included plugin that tries to trigger regeneration of updated and commented posts but in some situations a document will still live in the cache until it expires. This routine will be improved over time but it is only an afterthought.

Batcache doesn’t automatically know the difference between document variants. Variants exist when two requests for the same URL can yield two different documents. Common examples are user agent-dependent variants formatted for mobile devices and referrer-dependent variants with Google search terms highlighted. In these cases you MUST take extra steps to inform Batcache about variants to avoid serving a variant to the wrong audience. The source code includes examples of how to turn off caching of uncommon variants (search term highlighting) or cache common variants separately (mobile versions).
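
The idea behind caching variants separately is just that the variant becomes part of the cache key. Batcache's real examples are PHP and live in its source; the shell sketch below only illustrates the principle, and the function name cache_key, the user-agent patterns, the key layout, and the sample URL are all assumptions made up for this example.

```shell
# Hypothetical helper: the device patterns and key layout are illustrative only.
cache_key() {
    url=$1
    user_agent=$2
    case "$user_agent" in
        *iPhone*|*BlackBerry*) variant=mobile ;;
        *)                     variant=default ;;
    esac
    # Same URL, different variant, different key: the copies never mix.
    printf '%s#%s\n' "$url" "$variant"
}

cache_key /2008/06/batcache/ 'Mozilla/5.0 (iPhone; U; CPU like Mac OS X)'
cache_key /2008/06/batcache/ 'Mozilla/5.0 (Windows; U) Firefox/3.0'
```

The two calls yield different keys for the same URL, so a mobile page stored under one key can never be served to a desktop browser looking up the other.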

Where Batcache is going

I want to make Batcache easier to configure by adding a configuration page and storing the main settings in memcached as well as the database. This way you won’t have to deploy a code change to update the configuration. However, conditional configurations (e.g. “never cache URLs matching some pattern”) and variant detection will probably always live in PHP.

I want to have Batcache serve correct headers more reliably. On some servers it can detect the headers that were sent with a newly generated page and serve them again from the cache. But when that doesn’t work you will have to take extra steps to serve certain headers. For example you must specify the Content-Encoding header in the Batcache configuration or add it to php.ini. I want this sort of thing to be done automatically for all server setups.

I know that Batcache is not ideal for most WordPress installations. It saves us a lot of headaches and expense at WordPress.com, so maybe it can help other large installations. If you try it, I want to hear from you whether it worked and how well. I am also keen to see what new configurations and modifications you use.

As always, this software is provided without claims or warranties. It’s so experimental that it doesn’t even have a version number! Until the project grows to need its own blog, keep an eye on the Trac browser for updates.

Austin WordPress Professional Office

May 27, 2008 by Andy

We’re thinking of opening an Automattic office in Austin. Being the only local employee, I imagine sharing the space with independent/satellite WordPress professionals. Are you interested?

Candidates should be earning all or part of their income working on WordPress (developing, designing, or servicing) and be able to defend their choice of editor. Benefits may include collaboration, networking, social opportunities, fortune, fame, romance, and French pressed Ruta Maya coffee.

We don’t have any locations in mind yet. Please include your home zip code for geographical tabulation. If you can recommend a cool location with flexible space, good bandwidth, and no long-term commitments, please do.

Prevent SSH timeouts: disable keepalive

May 16, 2008 by Andy

I never had the problem at home but on many other networks an idle connection would drop after a couple of minutes. I added this to ~/.ssh/config several hours ago and haven’t lost a connection since:

TCPKeepAlive no
ServerAliveInterval 20
ServerAliveCountMax 10

It’s counter-intuitive but it works for me. The usual disclaimers apply.
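
For what it's worth, here is my reading of why this works. TCPKeepAlive controls TCP-level probes, while ServerAliveInterval makes the client send a small message through the encrypted channel after that many seconds of silence, so the connection presumably never looks idle to whatever was dropping it. ServerAliveCountMax is how many of those messages may go unanswered before the client disconnects. The arithmetic, plus a one-off command-line equivalent (user@example.com is a placeholder), looks like this:

```shell
interval=20     # ServerAliveInterval: seconds of silence before a probe is sent
count_max=10    # ServerAliveCountMax: unanswered probes tolerated

# Roughly how long the client waits on an unresponsive server before disconnecting.
echo "gives up after about $((interval * count_max)) seconds"

# One-off equivalent without editing ~/.ssh/config (placeholder host):
# ssh -o TCPKeepAlive=no -o ServerAliveInterval=20 -o ServerAliveCountMax=10 user@example.com
```

So a dead server is noticed after roughly 200 seconds, while a live but idle session gets a probe every 20 seconds.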

Bash equivalent for PHP realpath()

May 9, 2008 by Andy

For all you PHP hackers writing a Bash script and looking for an equivalent of PHP’s realpath function, try readlink. With the -f option it expands symbolic links and resolves relative path components like “./” and “../”. In a shell script, try this:

MY_PATH=$(readlink -f "$0")
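
Here is a small self-contained sketch of what that does, using a throwaway directory so it can be run anywhere. The file and directory names are made up. It assumes the GNU coreutils readlink; the -f option may not exist on some BSD or older Mac OS X versions.

```shell
# Build a throwaway directory with a real file and a symlink to it.
tmp=$(mktemp -d)
mkdir "$tmp/real"
touch "$tmp/real/script.sh"
ln -s real/script.sh "$tmp/link.sh"

# Resolve the symlink and the "./" and "../" components in one go.
resolved=$(readlink -f "$tmp/real/../././link.sh")
echo "$resolved"

# The directory holding the real file, handy for locating sibling files.
script_dir=$(dirname "$resolved")
echo "$script_dir"

# Clean up the throwaway directory.
rm -rf "$tmp"
```

The first echo prints the path of real/script.sh rather than link.sh, which is the same answer PHP's realpath would give for the symlink.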

Thanks to Barry.


