Following Directions
Tuesday, December 5th, 2006
James Holderness just published a helpful post discussing some of the issues with supporting bidirectional text in RSS. In it he describes the functionality they implemented for Snarfer, which attempts to guess the base directionality of text using a simple, but error-prone, algorithm:
1. Initialise a counter to zero.
2. Look at the first n characters of the content with markup stripped.
3. If a character is from an RTL script, increment the counter by one.
4. If a character is from an LTR script, decrement the counter by one.
5. Once n characters have been processed, if the counter is positive, the content is considered to be RTL.
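The steps above can be sketched roughly as follows. This is only my reading of the description, not Snarfer's actual code: the function names, the default of n = 100, the regex-based tag stripping, and the use of Unicode bidi classes (`R`/`AL` for RTL, `L` for LTR) as a stand-in for "from an RTL/LTR script" are all assumptions.

```python
import re
import unicodedata

# Assumption: approximate "RTL script" / "LTR script" with Unicode bidi classes.
RTL_CLASSES = {"R", "AL"}  # Hebrew-style and Arabic-style strong RTL letters
LTR_CLASSES = {"L"}        # strong LTR letters


def strip_markup(content: str) -> str:
    """Crude tag stripper for illustration; a real reader would use an HTML parser."""
    return re.sub(r"<[^>]*>", "", content)


def guess_is_rtl(content: str, n: int = 100) -> bool:
    """Guess the base direction from the first n characters (n is hypothetical)."""
    counter = 0  # step 1
    for ch in strip_markup(content)[:n]:  # step 2
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in RTL_CLASSES:  # step 3
            counter += 1
        elif bidi_class in LTR_CLASSES:  # step 4
            counter -= 1
    return counter > 0  # step 5


if __name__ == "__main__":
    print(guess_is_rtl("<p>Hello world</p>"))
    print(guess_is_rtl("<p>שלום עולם</p>"))
    # An English sentence that quotes a longer Hebrew phrase.
    print(guess_is_rtl('<p>He said "שלום לכולם וברוכים הבאים"</p>'))
```

The last call is the interesting one: the sentence is English, but the quoted Hebrew outnumbers the Latin letters, so the counter ends up positive and the guess comes back RTL.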
The key challenge with this approach is its premise: that text whose base direction is left-to-right will contain more left-to-right characters than right-to-left characters, and vice versa. Obviously that’s not an assumption that’ll hold every time (which James duly acknowledges). A short English sentence that quotes a longer Hebrew phrase, for instance, would be counted as RTL.
Unfortunately, with RSS there is no way of detecting the directionality of text that works 100% of the time, short of explicitly sprinkling Unicode control characters throughout the document, and most editors aren’t set up to make those characters easy to use. The alternative is to guess the direction from the specified language, which appears to be what IE7 does. However, even that approach is flawed, as language is not always a clear indication of direction (some languages are written in more than one script).
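To make the control-character option a bit more concrete, here is a minimal sketch of what marking direction explicitly might look like. The code points are the standard Unicode embedding controls; the `embed` helper is a hypothetical name of mine, not something any feed tool provides.

```python
import unicodedata

LRE = "\u202A"  # LEFT-TO-RIGHT EMBEDDING
RLE = "\u202B"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING


def embed(text: str, rtl: bool) -> str:
    """Wrap text in an explicit directional embedding (hypothetical helper)."""
    return (RLE if rtl else LRE) + text + PDF


if __name__ == "__main__":
    title = embed("שלום עולם", rtl=True)
    # The controls are invisible, so print their names to show what was added.
    print(unicodedata.name(title[0]))
    print(unicodedata.name(title[-1]))
```

The characters are invisible in most editors, which is a big part of why doing this by hand is so awkward.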
Personally, I think guessing sucks if you have the option of making it explicit. With Atom we have that opportunity.