Mike Schaeffer's Blog

Articles with tag: programming
March 4, 2005

I'm writing some content for a future post on the offshoring of jobs overseas, but I want to clear something up before it gets posted: Outsourcing and offshoring are two different and orthogonal concepts. This seems to be something that gets misunderstood a great deal, but simply put, outsourcing is the movement of jobs to a different company and offshoring is the movement of jobs to a different country. Either one can be done without the other.

The scenarios that people tend to get upset about (at least in the United States) are the ones involving offshoring, the movement of work overseas. Outsourcing, however, does not necessarily imply that the work gets moved to a different country: it's very common for work to be outsourced to another American business employing American workers. An example of this is hiring a Madison Avenue firm to put together an ad campaign. Sure, it'd be possible to develop the talent in-house to do this yourself, but there are many advantages to outsourcing the work to a more specialized vendor.

March 2, 2005

I'm in the middle of developing a Scheme compiler for a future release of vCalc. While I've been developing the code, I've peppered it full of debugging print statements that look something like this:

(format #t "compiling ~l, tail?=~l, value?=~l" form tail? value?)

With the output statements in place, the compiler takes about 250-300ms to compile relatively small functions. That's not great, particularly considering that no optimization is being done at all. Anyway, on a hunch I removed the format statements, and execution time improved by a couple orders of magnitude, to a millisecond or two per function. That's a lot closer to what I was hoping for at this stage of development.

On the other hand, I hadn't realized that my (ad hoc, slapped together in an hour) format function was running quite that slowly. I think it'll end up being an interesting optimization problem sooner or later.

March 1, 2005

Idempotence has benefits at a program's run-time, as well as at build time. To illustrate, consider the case of a reference counted string. For the sake of example, it might be declared like this (In case you're wondering, no, I don't think this is a production-ready counted string library...):

typedef struct CountedString
{
    int  _references;
    char *_data;
} CountedString;

CountedString *makeString(const char *data)
{
    CountedString *cs = (CountedString *)malloc(sizeof(CountedString));

    cs->_references = 1;
    cs->_data = strdup(data);

    return cs;
}

CountedString *referToString(CountedString *cs)
{
    cs->_references++;
    return cs;
}

void doneWithString(CountedString *cs)
{
    cs->_references--;

    if (cs->_references == 0)
    {
        free(cs->_data);
        free(cs);
    }
}

// ... useful library functions go here...

The reference counting mechanism buys you two things. It gives you the ability to delete strings when they're no longer accessible, and it gives you the ability to avoid string copies by deferring them to the last possible moment. This second benefit, known as copy-on-write, is where idempotence can play a role. Copy-on-write entails ensuring that whenever you write to a resource, you have a copy unique to yourself. If the copy you have isn't unique, copy-on-write requires that you duplicate the resource and modify the copy instead of the original. If you never modify the string, you never make the copy.

This means that the beginning of every string function that alters a string has to look something like this:

CountedString *alterString(CountedString *cs)
{
    if (cs->_references > 1)
    {
        CountedString *uniqueString = makeString(cs->_data);
        doneWithString(cs);
        cs = uniqueString;
    }

    // ... now, cs can be modified at will

     return cs;
}

Apply a little refactoring, and you get this...

CountedString *ensureUniqueInstance(CountedString *cs)
{
    if (cs->_references > 1)
    {
        CountedString *uniqueString = makeString(cs->_data);
        doneWithString(cs);
        cs = uniqueString;
    }

    return cs;
}

CountedString *alterString(CountedString *cs)
{
    cs = ensureUniqueInstance(cs);

    // ... now, cs can be modified at will

    return cs;
}

Of course, ensureUniqueInstance ends up being idempotent: it gets you from an unknown state into a known state, and it doesn't (semantically) matter if you call it too often. That's the key insight into why idempotence can be useful. Because idempotent processes don't rely on foreknowledge of your system's state to work reliably, they can be a predictable means of getting into a known state. Also, if you hide idempotent processes behind the appropriate abstractions, they allow you to write code that's more self-documenting. A function that begins with a line like cs = ensureUniqueInstance(cs); says to the reader that it needs a unique instance of cs more clearly than lines of code that check the reference count of cs and potentially duplicate it.

Next up are a few more examples of idempotence, as well as a look into some of the pitfalls.

February 22, 2005

Larry Osterman has been running a nice series of posts on thread synchronization and concurrency. It's been full of useful tips and tricks; I particularly like part 2, Avoiding the Issue. That's a technique that's worked well for me in the multithreaded systems I've worked on. Of course, if you're writing something like SQL Server, I'm sure you can't take nearly as simple an approach.

Older Articles...