skip to main | skip to sidebar

Friday, October 31, 2008

Boy Scout Check-ins

I missed the Boy Scouts callout when I was kid. Perhaps that's why it took me 20 years to figure out a knot that keeps my shoes tied all day. My recent attempt at starting a campfire required a propane torch. Despite my shortcomings, I felt a connection to the Scouts while reading this:

boyscoutsIt's not enough to write the code well. The code has to be kept clean over time. We've all seen code rot and degrade as time passes. So we must take an active role in preventing this degradation.

The Boy Scouts of America have a simple rule that we can apply to our profession.

Leave the campground cleaner than you found it.

If we all checked-in our code a little cleaner than when we checked it out, the code simply could not rot. The cleanup doesn't have to be something big. Change one variable name for the better, break up one function that's a little too large, eliminate one small bit of duplication, clean up one composite 'if' statement.

Can you imagine working on a project where the code simply got better as time passed? Do you believe that any other option is professional? Indeed, isn't continuous improvement an intrinsic part of professionalism? [Clean Code, p14]

While reading this, I was reminded of another gem:

Well-designed code looks obvious, but it probably took an awful lot of thought (and a lot of refactoring) to make it that simple. [Code Craft, p247]

Both sound great, but they have a sort of "Don't forget to floss" ring to them.

We've heard for years that refactoring helps improve code, but actually doing it can sometimes feel too daunting. It's just a fact that a lot of the code we deal with on a daily basis doesn't have enough test coverage, if there are any tests at all, to have us feel fully confident with many changes that we make to our code.

But a Scout wouldn't let the sad state of some parts of the code get him down. A Scout would always have his nose out for stinky code to clean up so that he could get a teeny bit of satisfaction knowing that he checked-in code that was better than when he checked it out.

This isn't a new idea; it's just a matter of realizing how easy it is.

Examples

There's a lot of low-hanging fruit in a typical code base. Over months of development, the top of a C# file might look like this:

using System.Linq;
using System;
using System.Collections.Generic;
using Moserware.IO;
using System.Diagnostics;
using System.Text.RegularExpressions;
using System.Text;using System.Threading;

This is a messy way to introduce your code. With just a simple click in Visual Studio, you can have it automatically remove clutter and sort things to get something nicer:

using System.Text.RegularExpressions;
using System.Threading;
using Moserware.IO;

For a three second time investment, you'll leave the file slightly better than you found it. With a free add-in, you can do this for your entire project in about the same amount of time.

Different team members can often work in the same class and you'll end up with member variable declarations that identify each person's style:

public class CircularBuffer<T>
{
T[] m_array;
private object syncRoot = new object();
private int _HeadIndex;
private int m_TailIndex;
...

This can easily be made consistent in few seconds:

public class CircularBuffer<T>
{
private T[] _Array;
private int _HeadIndex;
private int _TailIndex;
private object _SyncRoot = new object();
...

Changing a line like this:

if((i >= minAmount) && (i <= maxAmount))

to use number-line order:

if((minAmount <= i) && (i <= maxAmount))

makes the code slightly more visual and easier to read.

Introducing an explaining variable can turn:

private static DateTime GetElectionDay(int year)
{
DateTime startDate = new DateTime(year, 11, 1);
// Get the first Tuesday following the first Monday
if (startDate.DayOfWeek <= DayOfWeek.Monday)
{
// Current day of the week is before Tuesday
return startDate.AddDays(DayOfWeek.Tuesday - startDate.DayOfWeek);
}
else
{
// Current day of the week is Tuesday or after
return startDate.AddDays(7 - (startDate.DayOfWeek - DayOfWeek.Tuesday));
}
}

into a slightly more maintainable version that doesn't need comments:

private static DateTime GetElectionDay(int year)
{
DateTime nov1 = new DateTime(year, 11, 1);

int daysUntilMonday;
if (nov1.DayOfWeek < DayOfWeek.Tuesday)
{
daysUntilMonday = DayOfWeek.Monday - nov1.DayOfWeek;
}
else
{
daysUntilMonday = 6 - (nov1.DayOfWeek - DayOfWeek.Tuesday);
}

DateTime firstMonday = nov1.AddDays(daysUntilMonday);
return firstMonday.AddDays(1);
}

Along these lines, I tend to agree with Steve Yegge's observation:

The [Refactoring] book next tells me: don’t comment my code. Insanity again! But once again, his explanation makes sense. I resolve to stop writing one-line comments, and to start making more descriptive function and parameter names.

By themselves, each of these refactorings almost seems too simple. But each leaves your code in a slightly better place than you found it thereby qualify as a "Boy Scout Check-In."

Be Careful

We've probably all heard horror stories of some poor programmer who changed just "one little character" of code and caused rockets to explode or billion dollar security bugs. My advice is to not let that be you. Write more tests and use code reviews if that helps. Just don't let it be an excuse to not do something.

Boy Scout Check-ins should be short and measured in minutes for how long they take. Get in and out with slightly better code that fixed one small thing. Try hard not to fix a bug or add a feature while doing these small check-ins or you might face the wrath of your coworkers as Raymond Chen points out:

Whatever your changes are, go nuts. All I ask is that you restrict them to "layout-only" check-ins. In other words, if you want to do some source code reformatting and change some code, please split it up into two check-ins, one that does the reformatting and the other that changes the code.

Otherwise, I end up staring at a diff of 1500 changed lines of source code, 1498 of which are just reformatting, and 2 of which actually changed something. Finding those two lines is not fun.

Avoid Cycles

One subtle thing to realize is that you don't want to get lost in an infinite loop with a coworker of mutually exclusive changes. This is easy since good people can disagree. For instance, compare:

Use 'final' or 'const' when possible By declaring a variable to be final in Java or const in C++, you can prevent the variable from being assigned a value after it's initialized. The final and const keywords are useful for defining class constants, input-only parameters, and any local variables whose values are intended to remain unchanged after initialization. [Code Complete 2, p243]

and

I also deleted all the 'final' keywords in arguments and variable declarations. As far as I could tell, they added no real value but did add to the clutter. Eliminating 'final' flies in the face of some conventional wisdom. For example, Robert Simmons strongly recommends us to "... spread final all over your code." Clearly I disagree. I think there are a few good uses for 'final', such as the occasional 'final' constant, but otherwise the keyword adds little value and creates a lot of clutter. [Clean Code, p276]

Don't get bogged down with someone else on adding and removing 'final' or 'readonly'. Just be consistent.

I'm embarrassed to admit that in my earlier days, I might have "cleaned" code like this:

string headerHtml = "<h1>" + headerText + "</h1>";

into

string headerHtml = String.Concat("<h1>", headerText, "</h1>");

or even worse:

StringBuilder sb = new StringBuilder();
sb.Append("<h1>");
sb.Append(headerText);
sb.Append("</h1>");
string headerHtml = sb.ToString();

In this first case, I thought I was brilliant because I knew about the Concat method and thought it'd give me faster code. This is not true because the C# compiler special cases string concatenation to automatically do this. The second example is painful because it's uglier and slower and it isn't building a string in a loop where StringBuilders make a lot of sense.

After many dumb mistakes like this, I've finally decided that if I ever have a doubt on which option to use, I'll pick the more readable one:

Write your code to be read. By humans. Easily. The compiler will be able to cope. [Code Craft, p59]

Coda

Smokey3Boy Scout check-ins are small steps that help fix the broken windows that we all have in our code. When you look at your code, try to find at least one thing you can do to leave it better. Eventually it becomes a game where everyone benefits. These check-ins can often be a small reward for checking-in a large feature or fixing a nasty bug.

If you're pressed for time and can't make the change now or you want to save it for when you can make the change across all your code, make a note to yourself. I create empty change-lists in my version control system for this purpose. This is also helpful if you find a bug and want to later complained that hand-washing took too long and "wasted" their precious time. We all have our own pressures that might cause us to think that we can neglect basic code hygiene. Over time, this snowballs into a mess that we've all had to deal with.

Only you can make a difference in your code campground.


kick it on DotNetKicks.com

Monday, September 29, 2008

How Do Locks Lock?

When I was 9 or so, I was convinced I could write my own operating system in GW-BASIC. After all, I had written programs that played Mary Had a Little Lamb on the PC speaker while displaying CGA graphics. How much harder could an operating system be?

One thing had me stumped. How would I get my OS to multitask?

After giving it some thought, the best idea I could come up with was to write the programs for my "OS" in such a way that they would occasionally perform a GOSUB or GOTO to parts of other programs. By using a lot of mental handwaving to myself, I was convinced I could somehow make it work.

But the details came back to bite me. I never pulled it off. I put it aside and worked on other things. I just assumed that something magical happened under the covers. My bedtime reading of magazines like PC/Computing convinced me that the magical solution to the problem included buzzwords like "preemptive multitasking" and "multithreading." Both of these ideas were in the upcoming Windows NT 3.1 design. I didn't really understand what these buzzwords meant, but at least they sounded impressive.

Even though I've learned a bit more about writing programs that can do more than one thing at a time since those early days, I still have points where I don't understand exactly what is going on. Recently, I decided that I wanted to stop my handwaving about locks and do whatever it took to find out how they, well... lock.

A View from the Top

Locks let you express your belief that something bad might happen if two threads go through a section of code at the same time. Locks are present in many languages. Here's an example of one in C#:

object syncRoot = new object();

private void MyFunction()
{
lock (syncRoot)
{
// only one thread can enter this locked gate at a time.
}
}

The above lock creates a "gate" that only allows one thread at a time regardless of what that thread is doing. This is often too pessimistic and inefficient. In a lot of cases, it doesn't matter if multiple threads are reading a piece of data at the same time. However, we want to make sure that there is only one writer at a time. Furthermore, we want to make sure that the readers don't see corrupted garbage.

A lock that is optimized for allowing multiple readers at a time while still being consistent in the face of writers is called a reader/writer lock. Here's an example of some typical code you might see in an application that has to cache objects that take a long time to create:

private readonly Dictionary<string, ReallyExpensiveObject> expensiveObjectCache
= new Dictionary<string, ReallyExpensiveObject>();

private readonly ReaderWriterLock rwl = new ReaderWriterLock();

public ReallyExpensiveObject GetExpensiveObject(string key)
{
ReallyExpensiveObject expensiveObject;

rwl.AcquireReaderLock(Timeout.Infinite);

try
{
if (this.expensiveObjectCache.TryGetValue(key, out expensiveObject))
{
// Cache hit
return expensiveObject;
}

// Cache miss
LockCookie cookie;

cookie = rwl.UpgradeToWriterLock(Timeout.Infinite);
try
{
expensiveObject = new ReallyExpensiveObject();
expensiveObjectCache[key] = expensiveObject;
return expensiveObject;
}

finally
{
rwl.DowngradeFromWriterLock(ref cookie);
}
}
finally
{
rwl.ReleaseReaderLock();
}
}

This code will allow more than one reader at a time and will handle the case of upgrading the read lock to a write lock if it needs to insert an item into the cache. The code is a bit long and tedious to write. [1] Even worse, it has a very subtle bug in it that we'll look at later.

We first need to understand what is going on inside a lock. Since reader/writer locks are usually more interesting than a normal lock, we'll focus on the details of a reader/writer lock implementation. [2]

Entering an Internal Lock

Locks themselves have internal data that they must protect from corruption by competing threads. This internal data includes the number of active readers and how many threads are waiting to write data. In order to provide this protection, locks must enter an internal lock.

For simplicity, locks usually have a single variable that indicates if a thread has acquired the internal lock. I'll illustrate this and other concepts by posting snippets from Vance Morrison's excellent reader-writer lock sample: [3]

 private void EnterMyLock()  {
if (Interlocked.CompareExchange(ref myLock, 1, 0) != 0)
EnterMyLockSpin();
}

In Vance's code, a member variable named "myLock" is set to 1 if the lock is held, otherwise it is 0. The Interlocked.CompareExchange function is interesting because it does two things. If "myLock" is 0, then it is set to 1. If "myLock" doesn't equal 0, then we know that someone else has the internal lock. The documentation says that "the compare and exchange operations are performed as an atomic operation." I was curious how this actually worked, so I looked at .NET's source code and eventually hit this wall:

[MethodImplAttribute(MethodImplOptions.InternalCall)]
[ReliabilityContract(Consistency.WillNotCorruptState, Cer.Success)]
public static extern int Exchange(ref int location1, int value);

In my earlier .NET days, I would have given up at this point. The "InternalCall" attribute indicates that the CLR handles the call internally. It seemed like I would never quite know what happened inside this mysterious internal call.

And then I met Rotor.

Inside an InternalCall

Rotor lets you look at the CLR's internals that are mostly written in C++ with a small amount of assembly code. [4] If you navigate through several layers of function calls, you'll see that Interlocked.CompareExchange ultimately reaches this function in clr/src/vm/i386/asmhelpers.asm that is written in x86 assembly: [5]

FASTCALL_FUNC CompareExchangeMP,12
_ASSERT_ALIGNED_4_X86 ecx
mov eax, [esp+4] ; Comparand
cmpxchg [ecx], edx
retn 4 ; result in EAX
FASTCALL_ENDFUNC CompareExchangeMP

[image]By far, the most interesting aspect of this is the "lock cmpxchg" line. Besides giving gamers power to play DOOM, the 486 chip introduced a "cmpxchg" instruction that will both compare a value and exchange it with a new value if the comparison succeeds. It does this in a single CPU instruction. This is how the documentation can claim that it is an "atomic" operation. We'll see why it's very important to do this in a single instruction a bit later. The "lock" prefix is used to tell the CPU to assert a physical signal so that no other core can access memory at the same time. [6] This hardware locking is how the "myLock" variable can remain safe from simultaneous access by other cores.

But now what happens if another thread already had the internal lock?

Spinlocks

When locks need to wait on the internal lock, they typically use a spinlock. This tells the processor to essentially spin around in circles in order to waste a little bit of time so that the thread that has the internal lock can possibly finish what it is doing and exit the lock.

Here's an example from Vance's code of how a spinlock is performed:

private void EnterMyLockSpin()
{
for (int i = 0; ;i++)
{
if (i < 3 && Environment.ProcessorCount > 1)
Thread.SpinWait(20); // Wait a few dozen instructions to let another processor release the lock.
else
Thread.Sleep(0); // Give up my quantum.

if (Interlocked.CompareExchange(ref myLock, 1, 0) == 0)
return;
}
}

It's a basic for loop that has a few interesting bits. The first thing to notice is that this loop will keep going until the CompareExchange succeeds. That is, until we acquire the internal lock.

On a multiprocessor system, it will try a Thread.SpinWait for 3 times. The documentation for SpinWait is a bit confusing about what it does with the "20" argument:

SpinWait essentially puts the processor into a very tight loop, with the loop count specified by the iterations parameter. The duration of the wait therefore depends on the speed of the processor.

What exactly is it doing? Looking at Rotor's clr/src/vm/comsynchronizable.cpp gives us the reality:

FCIMPL1(void, ThreadNative::SpinWait, int iterations)
{
WRAPPER_CONTRACT;
STATIC_CONTRACT_SO_TOLERANT;

for(int i = 0; i < iterations; i++)
YieldProcessor();
}
FCIMPLEND

Further diving shows that "YieldProcessor" is this macro:

#define YieldProcessor() __asm { rep nop }

This is a "repeat no-op" assembly instruction. It's also known in the Intel instruction set manual as "PAUSE - Spin Loop Hint." [7] This means that the CPU knows about the spin waiting that we're wanting to accomplish.

As the documentation hinted, the time spent waiting is dependant upon the clock rate of the machine. In my case, I have an EE6600 Core 2 Duo where each core runs at 2.4 GHz, and it's about 3.4 nanoseconds for each "YieldProcessor" call. [8] This means it takes approximately 70 nanoseconds for 20 iterations. Three unsuccessful passes through this loop will cost about 210 nanoseconds.

I'm sure that if reader/writer locks had cheerleaders inside of them, they would be shouting in hope that the spin wait used enough time so that the thread that had the internal lock that is protected by the "cmpxchg" will have released it.

Why does the lock try so hard to get by with only a spin wait? In a word: speed.

By not being able to satisfy the request with a spin wait, the thread gives up and goes to sleep. This requires intervention by the OS to potentially swap it out and run a different thread that that has been waiting. By letting another thread run, there is a chance that the CPU will have to toss out its onboard cache only to fetch it back when the thread gets swapped back in. All of this could potentially add up to hundreds of thousands of CPU cycles that aren't being used to get "real" work done. [9]

How Does a Thread Sleep? How Does it Know When to Wake Up?

Windows keeps track of the state of each thread in your system. [10] Here's a simplified thread state chart from the excellent Windows Internals book:



When your thread is "Running" and is interrupted because its time-slice ran out or it voluntarily gave up its time slice by using a Sleep(0) request, the OS will save the state of what it is doing by performing a context switch. The fact that this switch can occur at any time is why it was critical that the CompareExchange code execute in a single CPU instruction that cannot be interrupted. If it didn't, it would be possible to put the lock in an inconsistent state between the compare and the exchange.

When the Sleep(0) call is made, the thread is put in a "Ready" queue meaning that it's "ready" to run and not waiting on anything. [11]

While we're at this chart, it's important to note that one of the possible thread states is "waiting." Windows will keep track of the events that a thread is waiting on.

Events are useful for signaling between threads. For example, a reader/writer lock can "signal" other threads by informing the OS that an event has been "set." What happens next inside of the thread scheduler/dispatcher depends on the type of event that was set: [12]

AutoResetEvent: In this case, only a single thread that is waiting for the event will be put into the "ready" queue. Anyone else that is waiting for the event stays in the waiting queue. This is useful for releasing exactly one writer. ManualResetEvent: When signaled, Windows will update all of the threads that are waiting on the event to be moved into the ready queue. Effectively, it releases them all. This is useful for releasing all waiting readers.

What's important to realize is that we have the cooperation of the thread dispatcher in the OS to obey the thread states. By simply being in the waiting queue, the threads aren't even considered to be scheduled to run and thus consume no real CPU time.

Putting It All Together

It's been a long journey, but we now have all the background information we need in order to understand how a reader/writer lock actually locks:

Acquiring a Read Lock:
Enter the internal lock (spin wait or sleep as necessary) Look to see If there is anyone writing
If no one is writing and no writers are waiting, just increase the number of readers. This is the happy path. If a writer is writing, tell the OS that we want to wait until the "ready for readers" event is set.
Exit the internal lock.
Exiting a read lock:
Enter the internal lock Decrement the total number of readers. Check the total number of readers remaining
If there are still readers remaining, then make sure that all waiting readers have been released so they can run. If there are no readers left and there is at least one writer waiting, signal the appropriate event so that exactly one of the waiting writers wakes up and will get the lock.
Exit the internal lock.
Acquiring a write lock:
Enter the internal lock Check the number of readers or if there is an active writer in progress. In addition, check to see if there are any in line waiting to do either.
If there are no readers or writers, record that we're writing. This is the happy path. Otherwise wait for our turn in line on the write queue by waiting on the "ready for writing" event.
Exit the internal lock
Exiting a write lock:
Enter the internal lock Check how many readers or writers are waiting
If there are no waiting writers, immediately signal all readers at once. Alternatively, release one writer that has been waiting.
Exit the internal lock.

As it turned out, the locks themselves are fairly simple. They just had a few pieces that came together in interesting ways.

The Problem of the Upgrades

There's a subtle problem with reader/writer locks: Imagine that two threads are both in a read lock and then they both decide that they need to upgrade to a write lock. Because a writer must wait for all readers to finish before acquiring a write lock, there seems to be a deadlock scenario since neither one wants to give up its read lock.

The secret trick is that they indeed must give up their read lock if there are any other active readers. This forces a correct implementation to do something like this when performing an upgrade:

// Cache miss
LockCookie cookie;

int writerSeqNum = rwl.WriterSeqNum;

cookie = rwl.UpgradeToWriterLock(Timeout.Infinite);

try
{
if (rwl.AnyWritersSince(writerSeqNum))
{
// Since another writer snuck in, we might no longer need
// to create a new object. Let's check again to see if we
// need to do this.

if (this.expensiveObjectCache.TryGetValue(key, out expensiveObjectCache))
{
// The other writer(s) must have created it
return expensiveObjectCache;
}
}

// Other writers didn't sneak in, so we know we need to create
// the object.

expensiveObject = new ReallyExpensiveObject();

expensiveObjectCache[key] = expensiveObject;

return expensiveObject;

}
finally
{
rwl.DowngradeFromWriterLock(ref cookie);
}

Each time a writer lock is acquired, the WriterSeqNum is incremented. The "AnyWritersSince" function will check to see if any writers have occurred since the moment just before we upgraded. If this is true, then it means that another writer might have come in and thus we need to recheck our assumptions.

This is just one way to handle the upgrade problem. [13]

As Joe Duffy explained on his blog, the new ReaderWriterLockSlim gets around the upgrade problem by introducing an "UpgradeableRead." When a thread enters an upgradeable read, the lock will force all subsequent readers to wait until the thread decides if it wants to upgrade. In the mean time, existing readers are allowed to finish. If an upgrade is requested, the lock will wait until no more readers are present. This preserves the integrity of the data.

Here's a full example:

private ReaderWriterLockSlim slimLock = new ReaderWriterLockSlim();

public ReallyExpensiveObject GetExpensiveObjectSlim(string key)
{
ReallyExpensiveObject expensiveObject;

bool upgraded = true;

this.slimLock.EnterUpgradeableReadLock();

try
{
if (this.expensiveObjectCache.TryGetValue(key, out expensiveObject))
{
// Cache hit
upgraded = false;
return expensiveObject;
}

// Cache miss
this.slimLock.EnterWriteLock();

try
{
expensiveObject = new ReallyExpensiveObject();
expensiveObjectCache[key] = expensiveObject;
return expensiveObject;
}
finally
{
this.slimLock.ExitWriteLock();
}
}
finally
{
slimLock.ExitUpgradeableReadLock();
}
}

While showing the full syntax needed, it's a bad example of using a ReaderWriterLockSlim since it doesn't allow for multiple readers. The code might perform better by just entering a read lock if reads occur much more frequently than writes. If the a write is actually needed, then a subsequent write lock can be acquired and a double check can occur to see if a write is still needed. Alternatively, a classic Monitor style lock can be used.

Conclusion: Why Should I Care About the Details?

Our journey has taken us from the high level of using a reader/writer lock to the low level of a single signal on the processor bus. Knowing this low level "goo" is helpful for a few reasons:

Ideally, lock sections should be kept small so that no lock contention occurs in the first place so that we don't need to "spin wait" at all (to acquire the internal lock). Since readers are released all at once via an event, it shows how a reader/writer lock gets its performance benefit over a normal Monitor lock that would force waiting readers to wait in a line to get released one-by-one. We now can see that locks aren't magic at all. They just have a few moving parts.

With a better understanding of locks, I'll focus my handwaving on other concepts... like building a work-stealing thread pool.

UPDATE 30 Sep 2008: I fixed the footnote links and updated conclusion #1 based off comments.

UPDATE 3 Oct 2008: I updated the paragraph discussing why spin waits can be helpful based off some good discussion in the comments.


kick it on DotNetKicks.com



Notes

(This article had a few places where I made some claims. I'm sure I probably got some things wrong or might have misunderstood something in my explorations. I'm putting notes here to give background of what I was thinking when I wrote the post. Please feel free to correct me and keep me honest via comments for the benefit of others)

[1] If performance isn't insanely critical, you can create your own lock and have helper methods like "EnterScopedRead" that return an IDisposable object that will automatically release the lock when it's disposed. Then, you can use a C# "using" statement with it to reduce some of the code clutter.

[2] I still think that reader/writer locks are more interesting than normal Monitor type locks. However, if you have a relatively balanced number of readers and writers (e.g. no clear readers majority), then Monitors make more sense since they're slightly faster. However, the code that Monitors uses a lot more assembly code for performance reasons and thus is often harder to understand.

[3] Even a cursory look at the source code of .NET 3.5's new ReaderWriterLockSlim in Reflector shows that Vance's lock code was the basis for it. ReaderWriterLockSlim is slightly more complex to handle cases like a recursive lock (e.g. acquiring a read lock when you have a write lock), but the core ideas remain the same.

[4] To be technically correct, Rotor isn't exactly what the commercial CLR is. Some of the aspects like the garbage collector and the Just In Time (JIT) compiler are different due to trade secrets. However, I don't think the parts I'll be mentioning here are different between the reference implementation and the actual commercial implementation. For more details on Rotor, I recommend reading Joel Pobar and Ted Neward's new book on Rotor 2.0.

[5] Things aren't quite that simple. The first thing to realize is that the CLR has a Platform Adaptation Layer (PAL) that handles all architecture specific code. Thus, there can be equivalent assembly code for other chips like the PowerPC which is located in clr/src/vm/ppc. Furthermore, the actual x86 code can be one of these two functions:

FASTCALL_FUNC CompareExchangeUP,12
_ASSERT_ALIGNED_4_X86 ecx
mov eax, [esp+4] ; Comparand
cmpxchg [ecx], edx
retn 4 ; result in EAX
FASTCALL_ENDFUNC CompareExchangeUP

or

FASTCALL_FUNC CompareExchangeMP,12
_ASSERT_ALIGNED_4_X86 ecx
mov eax, [esp+4] ; Comparand
lock cmpxchg [ecx], edx
retn 4 ; result in EAX
FASTCALL_ENDFUNC CompareExchangeMP

If you look at cgenx86.cpp, you'll see that CompareExchangeUP is the default and the CompareExchangeMP is used if your system has more than one processor:

CmpXchgOps FastInterlockCompareExchange = (CmpXchgOps)CompareExchangeUP;

// Adjust the generic interlocked operations for any platform specific ones we
// might have.

void InitFastInterlockOps()
{
...

  if (g_SystemInfo.dwNumberOfProcessors != 1)
{
...

FastInterlockCompareExchange = (CmpXchgOps)CompareExchangeMP;

...
}

...
}

From this, I was able to determine that "CompareExchangeUP" is for uniprocessor machines and "CompareExchangeMP" is for multiprocessor machines. If you look at the code carefully, you'll notice that the only difference is that the uniprocessor version doesn't bother locking the processor bus. This is safe because there is no chance of simultaneous memory access.

[6] The truth is a little bit more complicated, but more interesting. The Intel 64 and IA-32 Architectures Software Developer's Manual - Volume 3A: System Programming Guide, Part 1 tells us:

7.1.4 Effects of a LOCK Operation on Internal Processor Caches.

For the Intel486 and Pentium processors, the LOCK# signal is always asserted on the bus during a LOCK operation, even if the area of memory being locked is cached in the processor.

For the P6 and more recent processor families, if the area of memory being locked during a LOCK operation is cached in the processor that is performing the LOCK operation as write-back memory and is completely contained in a cache line, the processor may not assert the LOCK# signal on the bus. Instead, it will modify the memory location internally and allow [its] cache coherency mechanism to insure that the operation is carried out atomically. This operation is called "cache locking." The cache coherency mechanism automatically prevents two or more processors that have the same area of memory from simultaneously modifying data in that area. (emphasis added)

Here we learn that the P6 and newer chips are smart enough to determine if they really have to block off the bus or can just rely on intelligent caching. I think this is a neat optimization.

[7] I think the "PAUSE" instruction is an interesting example of how Intel allows for backwards compatibility. Its opcode of "F3 90" makes it identical to "REP NOP" which will cause older processors to still do nothing, however newer chips use this odd opcode as a clear hint to do more efficient things.

[8] If you're curious about where I came up with the numbers, look at this discussion on the Intel forums. It's a bit hard to measure since context switches might skew the results. My lowest run yielded numbers that indicate a nop took 0.34 clocks per instruction and that "PAUSE" took 8.21 clocks per instruction. I did (2,400,000,000 clocks / second) * (1 PAUSE / 8.21 clocks) to get 292,326,431 PAUSEs / second or 3.42 ns / PAUSE.

[9] UPDATE: The original statement made a statement that this was 100,000 times slower. After some good discussion in the comments, I'm not going to claim specifics since it depends a lot on the machine's workload and a thread's working set.

It's interesting to note that Vance's original code gives up after only three tries of around 70ns each. The ReaderWriterLockSlim that was ultimately derived from it keeps up the fight a little longer. It does something like "Thread.SpinWait(20 * (i + 1))" which leads to a geometric back-off over time for a total of up to 10 iterations which yields the equivalent of 1100 "YieldProcessor" calls or approximately 3.76 microseconds. This means it stays in the fight 1100 / 210 or 5.2 times longer.

(See the comments for more discussion on this)

[10] It's important to realize that processes don't run on Windows, but threads do. Although processes provide isolation boundaries, they require a thread to execute code.

[11] The Thread.Sleep(0) call ultimately reaches the SleepEx API that has this documentation:

A[n argument value of] zero causes the thread to relinquish the remainder of its time slice to any other thread of equal priority that is ready to run. If there are no other threads of equal priority ready to run, the function returns immediately, and the thread continues execution.

It's interesting that ReaderWriterLockSlim does a Sleep(0) for 5 times before finally doing Sleep(1) until it reaches success. I'm assuming it does the Sleep(1) in order to protect users from themselves in the event that the thread that has the lock and then goes into a state where it's waiting on something (e.g. I/O) to occur. Lock sensitive code should not be doing this, but it doesn't hurt to provide this forced 1 ms minimum delay. If they didn't do this, the thread that had the lock would remain on the waiting queue while waiting for I/O and this would potentially have the thread that needs the lock to immediately be put back on the run queue without burning much CPU time.

Finally, I'm assuming that SleepEx(0) performs a software interrupt. This is like a GOTO that jumps to a well known address that that handles the scheduling/dispatching of threads in the kernel.

[12] I'm simplifying things a bit here. Threads can wait on multiple events at one time. Therefore, setting one event might still mean that a thread is waiting on other events.

As an aside, it was only while researching this post that I realized why the WaitForMultipleObjects API is so useful. By using it, you can eliminate a very costly process of putting your thread back on the ready queue and getting scheduled only to find out that it still needs to wait and thus go back on a the wait queue.

[13] While the ReaderWriterLockSlim handles things properly, the early version that Vance posted will throw an error on a simultaneous upgrade. This is technically another way to handle the upgrade problem :-).

Monday, August 25, 2008

Meta-FizzBuzz

By itself, solving the "FizzBuzz" problem won't get you a new job. If it does, you probably wouldn't want that job. It's a weed out type of question that doesn't say much about your skill if you get it right, but it hurts your credibility as a programmer if you can't do it. It asks:

Write a program that prints the numbers from 1 to 100. But for multiples of three print “Fizz” instead of the number and for the multiples of five print “Buzz”. For numbers which are multiples of both three and five print “FizzBuzz”.

By rearranging the words in the problem statement and adding a few programmer keywords, you can create this simple pseudocode solution:

for every number from 1 to 100                          
if the number is a multiple of 3 and it is a multiple of 5 then
print "FizzBuzz"
else if it is a multiple of 3 then
print "Fizz"
else if it is a multiple of 5 then
print "Buzz"
else
print the number

See? Simple and boring.

But what if the bar were raised in the future to be this "Meta-FizzBuzz" problem:

Create a language implementation and write a solution to the FizzBuzz problem using it.

It sounds crazy at first, but if you take it seriously, it is way more interesting than the original FizzBuzz problem. It forces you to have a deeper understanding of languages rather than just getting by with the basics. It also shows that languages are full of many design choices that often have several alternatives.

I decided to create a language implementation that would execute the above "pseudocode" solution. While working on the implementation, I discovered that needed to handle cases that I hadn't seen in "real" languages:

Variables can be prefixed by "the" to make the code more readable. Instead of having a modulo operator (e.g. %), I picked the phrase "is a multiple of". This language supports the special "it" variable that always refers to the last variable that was explicitly referenced.

These were just arbitrary decisions that I made to make the actual FizzBuzz program easy to read. Even though I'm still in the early stages of the development of OMeta#, it wasn't too difficult to write a complete "Meta-FizzBuzz" implementation in about 100 lines of code with generous use of comments and whitespace:

ometa MetaFizzBuzz : Parser {
// Basic literals come first:

// We override the Parser's version of "Number" which returns a string
// to instead return the integer value of the string.
Number ^= Spaces ('+' '-' Empty):prefix
Digit+:ds -> { int.Parse(((prefix.Count > 0) ? prefix.As<string>() : "") + ds.As<string>()) },

// Allow literal strings surrounded by quotes (e.g. "FizzBuzz")
QuotedString = Spaces '"' (~'"' :c)*:inner '"' -> { inner.Count == 0 ? "" : inner.As<string>() },

// For more natural sounding code, we allow "the" in front of a variable name.
// In addition, we keep track of the last variable name used so that we can refer to
// it as "it" later on.
VariableName = ("the" Empty) Spaces
FirstAndRest("Letter", "LetterOrDigit"):n
!{ Set("_it", n.As<string>()) } -> { n },

// Expressions are functions that evaluate to some object.
// We don't return the value directly since expressions can depend on
// global variables that could be change while executing.
Exp = AndExp
BinExp
NumExp
QuotedString:qs -> { (Func<object>) (() => qs.As<string>()) },

// An "and" expression is the highest level of a function that returns
// a boolean value. This left-recursive definition allows for an arbitrary
// number of boolean expressions to be and-ed together.
AndExp = AndExp:l "and" BoolExp:r -> { (Func<object>)(
() => (
(bool)l.As<Func<object>>()()
&&
(bool)r.As<Func<object>>()()
))
}
BoolExp,

// This rule looks at what the expression returns and then tries to
// convert non-boolean values by declaring that non-zero values and
// non-empty strings are true and everything else is false.
BoolExp = Exp:e -> { (Func<object>)(
() => {
object o = e.As<Func<object>>()();
return o is bool ? (bool)o
: o is int ? ((int)o) != 0
: !string.IsNullOrEmpty(o as string) && (o != OMetaList<HostExpression>.Nil);
})
},

// Binary expressions take two arguments (left and right).
// Here we just have the one that is relevant to FizzBuzz
BinExp = NumExp:left
"is" "a" "multiple" "of"
NumExp:right -> { (Func<object>)(
() => {
var l = (int)left.As<Func<object>>()();
var r = (int)right.As<Func<object>>()();
return (l%r) == 0;
})
},

// Number expressions are functions that just return an integer.
// Note that "it" resolves to the *value* of the variable that was last referenced by name.
NumExp = Number:n -> { (Func<object>) (() => n.As<int>()) }
"it" -> { (Func<object>) (() => Get<int>(Get<string>("_it"))) }
VariableName:vn -> { (Func<object>) (() => Get<int>(vn.As<string>())) },

// Statements are the things that actually do work in this language.
Stmt = "print" Exp:e -> { (Action) (() => { Console.WriteLine(e.As<Func<object>>()()); }) }
"if" AndExp:b "then" Stmt:t
("else" Stmt Empty):f -> { (Action) (
() => {
if((bool)b.As<Func<object>>()())
t.As<Action>()();
else if(f.Count > 0)
f.As<Action>()();
})
}
"for" "every" VariableName:n "from"
Number:low "to" Number:high
Stmt:s -> { (Action) (
() => {
int lowerBound = low.As<int>();
int upperBound = high.As<int>();
string iterationVar = n.As<string>();
Action iterationStmt = s.As<Action>();
for(int i = lowerBound; i <= upperBound; i++)
{
Set(iterationVar, i);
iterationStmt();
}
})
},

// A "block" is zero or more statements.
Block = Stmt*:s -> { (Action) (
() => {
foreach(var currentStatement in s)
{
currentStatement.As<Action>()();
}
})
},

// And finally, a "program" is just a series of statements, which is a "block"
Program = Block
}

Full disclosure: It took some debugging to get this implementation to work right.

If I had been beamed into the future and had to write this solution on a whiteboard, I would have been a bit nervous that I wouldn't have gotten it right and would have taken noticeably more time than I now take to write a FizzBuzz solution.

In order to keep the code size down, I directly execute the "pseudocode" rather than convert it to an intermediate tree. I also simplified things by making all variables in this language global. The latest version of the source code has a slightly more elaborate version of Meta-FizzBuzz that also allows an alternative pseudocode implementation.

The current implementation of OMeta# uses C# 3.0 as the host language. This makes everything inside of the { }'s two to three times uglier and larger than it might have been if I had used a more dynamic host language like JavaScript, Python, Ruby, or possibly C# 4.0. As I mentioned in my last post, my hope is that eventually I'll be able to use clever techniques to make the host code appear more dynamic even though it might need to be strongly typed.

In the end, the Meta-FizzBuzz problem is just a thought experiment to guess what the low bar might look like in the future. That is, a future where it's so easy to create a language that perfectly fits your problem that people actually do. As an industry, we're not there yet, but it's fun to dream of the STEPS needed to get there.

Feedback: What do you think the low bar will be in an upcoming era of Moore's Law Software? What do you think of the Meta-FizzBuzz problem? Feel free to post your own solution (or outline of a path that you'd take towards a solution) in the comments. Comments are also encouraged about OMeta#'s current syntax in general.


kick it on DotNetKicks.com

Thursday, July 31, 2008

Building an Object-Oriented Parasitic Metalanguage

It's all about pattern matching. The rest is just details.

My last post on OMeta# focused on what it is and the vision I have for its future. Briefly, OMeta is a new language that makes it easier to design programming languages in an object-oriented fashion using very few lines of quite readable code. OMeta# is my attempt to support OMeta on the .NET runtime. My main strategy has been to base its design off of Alessandro Warth's JavaScript implementation of OMeta/JS. This post gets into the specifics of how I've done this so far, but my hope is that you might see some places where the design could use improvement and that you might be encouraged to leave a comment so that it could get better.

High Level Design

At its core, OMeta has a very simple design. The highest level of abstraction is a grammar. Here is a calculator grammar in the current implementation of OMeta#:

ometa Calculator <: Parser {
Digit = Super((Digit)):d -> { d.ToDigit() },
Number = Number:n Digit:d -> { n.As<int>() * 10 + d.As<int>() }
Digit,
AddExpr = AddExpr:x '+' MulExpr:y -> { x.As<int>() + y.As<int>() }
AddExpr:x '-' MulExpr:y -> { x.As<int>() - y.As<int>() }
MulExpr,
MulExpr = MulExpr:x '*' PrimExpr:y -> { x.As<int>() * y.As<int>() }
MulExpr:x '/' PrimExpr:y -> { x.As<int>() / y.As<int>() }
PrimExpr,
PrimExpr = '(' Expr:x ')' -> { x }
Number,
Expr = AddExpr
}

A grammar is composed of rules (which are sometimes referred to as "productions"). Both the grammar itself and the rules have the same form:

[image]

Going into the rule is a list of things of the input type (here they're represented by red circles) and coming out on the right is a list of things of the destination type (represented by purple triangles). As you can see from the diagram, the output list can have lists inside of it. In the calculator grammar, the circles are individual characters and the output is a single integer.

My "Ad Hoc, Informally-Specified, Bug-Ridden, Slow Implementation of Half of Common Lisp"

As an implementer of OMeta, you have to make several design choices. One of the first is how to represent grammars. Since OMeta is object-oriented, I decided that it made sense to represent grammars as a .NET class (a.k.a. "Type"). Grammars contain rules; the obvious choice for implementing rules is to use a method. This especially makes sense because rules can refer to rules in their base grammar. For example, the above calculator grammar has a rule called "Digit" which captures a single numeric digit. The "Digit" rule applies the "Digit" rule in its base grammar (Parser) that simply captures a digit character. It does this by applying the special "Super" rule which will use the base grammar to apply the rule that is specified by an argument. Since rules are implemented as methods and rules can "override" their definition from their base grammar, it made sense to make rule methods "virtual."

Another important decision is how to represent data going into and out of a rule application. An easy choice is to represent the data in a List<T>. This works out well except for the small detail that lists can contain other lists. This nested list idea has been around for at least 50 years in LISP and languages derived from it (like Scheme). Since it has worked out fairly well in those languages, I stole the idea of a LISP-like list in my OMetaList<TInput> class which is a list of other lists with a special case that makes lists of length 1 equivalent to the element itself.

These decisions get us started, but a problem comes up if you try to implement the factorial function using recursion in an OMeta grammar:

ometa Factorial {
Fact 0 -> { 1 },
Fact :n ?(n.As<int>() > 0) Fact((n.As<int>() - 1)):m -> { n.As<int>() * m.As<int>() }
}

If you look carefully at this grammar, you'll see that it only says two things:

If the next element of input matches a 0, return 1 If the next element of input is something greater than zero, store its value to "n" and then apply the "Fact" rule supplying it with the value of "n - 1" and capture that result to "m". Finally, return "n * m".

What might not be obvious at first is that any time you store something to a variable using an expression like ":n", the value stored must be of the destination type (e.g. a triangle in the diagram). This makes us need to update our mental model for how OMeta works. We need a place to store rule "arguments" which are already in the destination type.

One way of doing this is to create an argument stack that is separate from the rest of the input. Additionally, we'll need a place to store grammar-local variables as well as a place to remember previous results of rule applications for performance reasons. This gives us a much more complete picture of what really occurs:

[image]

When a rule needs to consume input, it will first attempt to grab pop an item from the argument stack. If the stack is empty, then next element from the input list can be read.

I decided to store all of these details into a single class which I call an OMetaStream<TInput> which internally uses an OMetaList<TInput> for input data and an OMetaList<HostExpression> for storing the argument stack. The arguments are actually stored in an OMetaProxyStream<TInput> which inherits from OMetaStream<TInput>, but this is just an implementation detail.

With the major data structures out of the way, one important choice remaining is how to handle going down the wrong path. If you look at the calculator grammar, you'll see that the "AddExpr" rule can be of the form "x + y", or "x - y", or a "MulExpr".

How does OMeta know which one to take? It makes a prioritized choice by trying each of the alternatives in the order specified. If it discovers that it has gone down the wrong path (e.g. the next item from the input is a "-" instead of a "+"), it needs to backtrack and reset itself to the condition it was in before the bad choice and then try the next option.

There are two obvious ways of doing this. The first approach is to leverage runtime support by using exceptions. This is exactly what OMeta/JS does. Using exceptions has an advantage of making the generated code simpler because there is no explicit code to "unwind the stack" in case you go down a bad path.

Although it makes the code easier to read, I decided against using exceptions and instead chose to use methods that return a boolean value to indicate if they succeeded. I did this for two important reasons:

.NET does a bit of extra work when an exception is thrown (e.g. getting a stack trace). Therefore, throwing an exception has a non-trivial performance penalty. Rules have prioritized choice. This means that failing a single choice doesn't imply that the whole rule fails. Failing can be a very common occurrence (it's not exceptional) and this makes it very performance sensitive.

Here's a small glimpse of what the generated code looks like for attempting to apply the "AddExpr" rule:

OMetaList<HostExpression> x;
OMetaStream<char> addExprOut;

if(!MetaRules.Apply(AddExpr, inputStream2, out x, out addExprOut))
{
return MetaRules.Fail(out result2, out modifiedStream2);
}

In this case, AddExpr is the name of the rule which is implemented as a method. Data is read from the "inputStream2" which is an OMetaStream (where the red circles are characters). The resulting list is saved to the "x" variable and the modified stream is stored in "addExprOut". In order to make back-tracking easier, OMetaStreams are mostly immutable, which means that they can't change. Instead of updating the stream itself, you get a fresh copy of the stream that contains what the result would be. If you need to back track, you can simply ignore the rule application output. As a side benefit, it makes it slightly easier to debug since you don't have to worry about the mutations that could otherwise occur.

Parasitic?

[image]OMeta's creator likes to refer to OMeta as a "parasitic language" because it doesn't exist on its own. It always lives on top of a "host language." This makes an implementation usually defined in terms of its host language. That's why there is "OMeta/JS" which uses JavaScript as a host language. Although my ultimate goal for OMeta# is to have the host language be any .NET language, I decided to use C# as my first host language.

One of the first issues that came up was trying to recognize what exactly is C# code. As you can see from examples in my last post, my first attempt was to pick a really ugly pattern that wouldn't occur in normal use. I picked two consecutive @ signs as in:

@@/*This is C#*/@@

It worked out well, but it had the downside that it was, well... incredibly ugly. I have since written a rudimentary C# recognizer written in OMeta# itself that is aware of strings and comments. The upside is that the syntax looks a little nicer:

{ /*This is C#*/ }

Using C# as the host language had several implications. The most challenging was working with C#'s static type system. This was both a blessing and a curse. On the one hand, I had to constantly think about where the data came from and where it was going. This forced me to actually understand the overall design better. On the other hand, I had to spend so much time thinking about it. Late-bound systems with dynamic typing such as Smalltalk and JavaScript remove the need for people to spend so much time doing this.

I'm optimistic that over time a lot of the stronger annoyances will go away. For example, the current syntax forces type identification as in:

-> { n.As<int>() * 10 + d.As<int>() }

Where, OMeta/JS can just say:

-> { n * 10 + d }

I'm not exactly sure how to do this yet. My initial guess is that I might be able to write some sort of type inference algorithm (preferably written in OMeta#) that could do a reasonable job. My current ideas are tied too closely with the specific host language and would require a bit of work to port them to other host languages like Visual Basic. Another approach is to implement one or more helper classes that could enable duck typing (e.g. if it looks like a certain type, walks like that type, and quacks like the type, it's probably that type).

Being able to have strong static typing internally while having a very clean syntax would be ideal (e.g. sort of like the C# 3.0 "var" keyword). Static typing is usually harder to achieve, but if you can do it well, you get many benefits. C# designer Anders Hejlsberg said it well in a recent interview:

"I don't see a lot of downsides to static typing other than the fact that it may not be practical to put in place, and it is harder to put in place and therefore takes longer for us to get there with static typing, but once you do have static typing. I mean, gosh, you know, like hey -- the compiler is going to report the errors before the space shuttle flies instead of whilst it's flying, that's a good thing!"

Another issue that came up was properly handling inheritance. It makes sense to use "virtual" methods for rules, but this also requires you to emit the "override" directive to avoid warnings. Another warning cropped up from my use of the "base" keyword in the many delegates that makes it convenient from a code emission perspective, but frustrating from a verifiable security perspective. Both of these warning are being left for a latter phase.

Compiler issues were only one factor. The other significant piece was aesthetics. A notable number of rules yield specific lists as their result. C# doesn't quite have a good, clean way of expressing an OMetaList (that is, a list of lists). In order to get around this, I created the "Sugar" helper class which makes it slightly easier to write these lists. It's still far from perfect. Instead of having a nice way for expressing lists like that OMeta/JS:

-> ['add', x, y]

You have to do this instead:

-> { Sugar.Cons("add", x, y) }

This annoyance can probably be fixed by tricking the C# recognizer to act as if C# had a nice implicit way of expressing an OMetaList and then re-writing the expression to use the Sugar class.

Beyond C#, there is the "meta" "host language" which is the .NET common language runtime itself. For code to play nice on it, it should abide by the .NET framework design guidelines which dictate naming conventions and general practices. This had a parasitic effect in that it led me to name my rules after the naming guidelines for methods (e.g. they must be PascalCased). Thus, in OMeta# there is the "Digit" rule instead of the "digit" rule. OMeta/JS opts to prefix metarule method names with an underscore to make them not collide with other rules. I also wanted to hide metarules so that people would have to try harder to do the wrong thing; the best I could come up with to convey this idea was to implement them using an explicit interface exposed via a property (e.g. instead of "_apply", I have "MetaRules.Apply").

Being a Contextual Packrat

The last intriguing thing about OMeta is exactly how it implements parsing. It uses a technique called a Parsing Expression Grammar (PEG) that Bryan Ford introduced in 2004. It's just a fancy way to say you support the following features: (The table comes from the OMeta paper. I've also included how OMeta# implements them to prove its legitimacy):



expression meaning OMeta# Function
e1e2 sequencing Seq
e1e2 prioritized choice MetaRules.Or
e* zero or more repetitions MetaRules.Many
e+ one or more repetitions (not essential) MetaRules.Many1
~e negation MetaRules.Not
<p> production application MetaRules.Apply
'x' matches the character x Exactly

The most interesting aspect to me is the prioritized choice. This is in contrast to "Context-Free Grammars" (CFGs) which are the traditional way of defining parsers. To highlight the difference, consider parsing this line of C/C++ code:

if (condition1) if (condition2) Statement1(); else Statement2();

Is it the same as:

if (condition1)
{
if (condition2)
{
Statement1();
}
else
{
Statement2();
}
}

... or is it:

if (condition1)
{
if (condition2)
{
Statement1();
}
}
else
{
Statement2();
}

This is the classic "dangling else" problem. Context Free Grammars usually have this problem because they are free of any context when they are parsing (surprise!). In these situations, you often have to rely on something besides the grammar itself to resolve these ambiguities. Parsing Expression Grammars don't have this type of problem because you specify explicitly which one to try first. The prioritized choice therefore avoids ambiguity.

In order to make the performance relatively in line with the size of the input, a technique of "memoization" is used. This means the parser remembers what it has done previously; it does this as a space-time tradeoff. Keeping all those previous values have given them the reputation of being a "packrat" parser.

Let's say you have

  AddExpr  = AddExpr:x '+' MulExpr:y  -> { x.As<int>() + y.As<int>() }
AddExpr:x '-' MulExpr:y -> { x.As<int>() - y.As<int>() }
MulExpr
and you try the first choice (AddExpr:x '+' MulExpr:y), but you get to where you're expecting a "+" and it isn't there. You'll notice that the second choice (AddExpr:x '-' MulExpr:y) repeats the first part (namely AddExpr:x). Instead of re-evaluating it, you can just retrieve the stored value that you remembered, or rather memoized.

Well, it's almost that simple. Note that the definition of an add expression (AddExpr) is left-recursive. This means that the left side is defined in terms of itself. A naïve implementation of the parser would try to evaluate "AddExpr" by applying the first choice which started with a AddExpr which would then cause it to try to apply AddExpr again and eventually cause a stack overflow.

It's not a trivial problem to fix. In fact, the paper that introduced Parsing Expression Grammars said that left-recursion should be avoided since you can rewrite the rule to avoid left-recursion. This is unfortunate guidance because the rewritten rule sometimes loses the clarity that the left-recursive version of the rule had.

At the start of this year, a paper came out showing a very clever trick that makes left recursion possible.

Say that we're parsing the expression "1+2+3" using the "AddExpr." At the start of the expression we look at our memoization table to see if we have a precomputed value for "AddExpr". Since it is our first time, we won't find anything. The first part of the trick is to immediately store the result of "Fail" as the memoized result for AddExpr when starting at offset 0. We do this before we attempt to apply the rule.

When the parser uses recursion and finds itself again asking for the value of "AddExpr" at position 0, it will find a result of "fail" and thus fail that choice. It will then continue down to "MulExpr" where it will eventually match "1". The second part of the trick is to treat this value of "1" for "AddExpr" at position 0 as a "seed parse." The parser will then attempt to "grow the seed" by starting over at the beginning of the input stream, but this time it will remember its previous successful match. This will cause it to match "1+2" for AddExpr and then start over again by growing the seed to finally match "1+2+3."

In hindsight, it's a very simple idea. However, I say that only after some serious debugging time where I kept finding problems in my poor implementation. Alas, the entire algorithm is implemented in only a few lines of code in the "MetaRules.Apply" function in OMetaBase.cs.

So there you have it, you now know all of the interesting ideas in OMeta#.

Where is This Going?

OMeta# is a hobby project that's been a labor of love for me. You might have already guessed that if you noticed that most check-ins occur late at night or on the weekends. It's been slow, but it's coming together. Probably the most exciting highlight in the development came when I got enough of OMeta# working that I could rewrite parts of itself in OMeta#. This has the benefit of being cool from a meta self-reproducing level, but it's also a bonus because the code becomes much more readable and introspective. For example, compare the C# recognizer that I wrote by hand to the OMeta# grammar version that implements the same idea.

This approach of writing more of OMeta# in OMeta# will continue. It's my hope that as much as possible will be written in itself. A significant next step would be to rewrite the code generator as an OMeta# grammar. This isn't a new design idea. Each subsequent version of Smalltalk has been bootstrapped by its previous version.

I'll be the first to point out that there are a lot of rough edges to my current implementation. If you look around the source code, you'll see several comments labeled "// HACK" where I admit a hacked version of an idea. One of the most glaring problems is that debugging is annoyingly difficult. I haven't yet implemented an optimizer for the generated code which causes a lot of unnecessary metarule applications. The second annoyance is that just stepping into so many methods that fail often takes a long time. Over time, I hope to add better hints to get more traditional error messages that you get from good compilers so that you don't have to step through the generated code to debug a grammar.

As mentioned earlier, the language recognizer could use some work to make the syntax cleaner as well inferring types so that grammar writers don't have to explicitly declare them. I have confidence that the work that would go into a good C# host language implementation would directly port to other languages such as Visual Basic.

There's also a great need cross pollination with other projects such as the Dynamic Language Runtime (DLR) which has already tackled some of the issues that OMeta# will face. Additionally, languages such as F# have similar data structures already built-in that OMeta uses. Boo and Nemerle have interesting extensibility points which could provide some good ideas. The problem for me is that there is so many different places to look. It'd be great if you know these languages (or others) and could tell me via a comment or better yet, by sending code of how they could help OMeta#.

As of right now, I plan on keeping the core functionality in C# rather than rewriting everything in another language or by using the DLR. The main reason is that C# is wide-spread in availability, both on .NET as well as on Mono. A second reason is that I would like to keep the outside dependencies on OMeta# as small as possible.

OMeta is a beautiful in the same sense that mathematics can have a simple beauty. LISP has been referred to by some as "The Most Important Idea in Computer Science" largely because its core ideas could be expressed on a half page of the LISP 1.5 manual (page 13 to be exact). This elegance is similar to mathematics of Maxwell's Equations describing electromagnetism.

It's my hope that I won't mess up the implementation to the point where this elegant beauty is lost. Currently it's way too ugly. I see some of the rough edges, but I'm sure you can help me discover others along with how to fix them. If this post sounds too complicated, it's not you, it's me. Let me know if something doesn't make sense and I'll try to provide clarification.

If you're interested in OMeta#, I highly encourage you to download the latest version of the source code from CodePlex and step through the unit test runner to see things get built up.

If you can, keep notes of all the places where you see me doing something and let me know about them either through comments or via email. In addition, I also highly recommend experimenting Alessandro Warth's great OMeta/JS implementation via your web browser.

Happy OMeta-ing!


kick it on DotNetKicks.com

Tuesday, June 24, 2008

OMeta#: Who? What? When? Where? Why?

What if programming language implementations were object-oriented? What if you wanted to change one itty-bitty part of your favorite language? Say you wanted to add an exponentiation operator (^^). How hard would it be?

Wouldn't it be nice if you could "subclass" a language like C# and then add 5 lines of code to make it work and then use it? What if you could add Ruby-like method_missing support in 20 lines?

What if you could conceive of a new language and start experimenting with it in production code in an hour? What if it leveraged your knowledge of existing frameworks like .NET?

Let's imagine that it were possible. Let's say someone else created a basic calculator-like "language":

[image]

Even without knowing the syntax, you can probably figure out how it works since it's written very close to the standard way of describing a language. If you give this language "3*4", it will respond with 12. If you give it "2^(3*4)", it'd respond with a nice syntax error message like "I don't know how to deal with the '^' symbol. Sorry!"

But you could make it work by writing this object-oriented-like "subclass" of the language:

[image]

Now, giving this language "2^(3*4)" happily gives you 4096.

This worked using simple inheritance. What if you wanted to compose your language from several other languages? That is, something like "multiple inheritance" but without its problems? What if we took a simpler approach and allowed a language to attempt to match elements of any other language?

Let's say we want to build a scientific calculator that leverages some of the work we've already done. Instead of inheriting from it, we'll call out to it:

[image]

Now you can calculate expressions like "sqrt (1+2^3)!"

What if such a meta language (e.g. a language for describing other languages) existed? What would you do with it? How might it help you get things done while producing extremely small, but very readable code? What if creating a new language that perfectly fit a problem your software was trying to solve was as easy as writing a simple regular expression?

What if you could get all the whiz-bang, gotta-have tool support like syntax highlighting, interactive debugging, and a compiler for it came at almost no extra cost? What if the barrier to entry for a new language was so low, that almost anyone could do it quickly?

This is what I've been thinking about for the past month or so.

Who? & What?

When I was doing some research for my "Towards Moore's Law Software" series, I came across a very interesting language called OMeta which was created by Alessandro Warth and Ian Piumarta with some inspiration from Alan Kay. In their 2007 Dynamic Languages Symposium paper, the authors described OMeta as

".. a new object-oriented language for pattern matching. OMeta is based on a variant of Parsing Expression Grammars (PEGs) —a recognition based foundation for describing syntax—which we have extended to handle arbitrary kinds of data. We show that OMeta’s general-purpose pattern matching provides a natural and convenient way for programmers to implement tokenizers, parsers, visitors, and tree transformers, all of which can be extended in interesting ways using familiar object-oriented mechanisms. This makes OMeta particularly well-suited as a medium for experimenting with new designs for programming languages and extensions to existing languages."

To be honest, I didn't really understand the paper when I first read it. It sounded cool, but it used a few new terms I hadn't heard of before. However, I couldn't help but be fascinated by OMeta's readable syntax. OMeta doesn't try to solve every problem involved with writing a complete program. It makes no claims about being a good general purpose programming language. But it positions itself by doing just one thing really well:

Pattern matching.

If you look at modern programming languages that are good for developing other languages, you'll notice that they usually have good pattern matching capabilities. For example, in ML (which stands for meta-language), you can specify functions in terms of patterns. Here's how you can reverse a list:

[image]

This defines the function named "reverse." When it sees the "pattern" where the function is called with an empty list (nil), it just returns nil. However, when the function is passed in a list whose first element is "x" and the rest of the list is "xs", it will return the reverse of the rest of the list and then append the head of the list to the end.

If you think about it for a second, it's a really compact way of expressing how to reverse a list.

Streams of Anything

What I find interesting about OMeta is how it matches patterns. OMeta can operate on a stream of anything. This is useful because modern language implementations have several phases that work on different types of "things." Here's a simplified view of what a compiler does:

[image]

At the start of the process, you're working with individual characters from source code. The list of characters is converted to tokens by use of a scanner. This is how the characters 'i', 'n', and 't' become the "INT" token. The next phase combines tokens together to form a higher level view of the program, typically using a tree data structure. From that phase, several intermediate steps occur on the program (almost always in tree form), until it's finally pulled apart to make an executable program.

Often, you have to use different