<?xml version="1.0"?>
<!-- name="generator" content="blosxom/2.0" -->
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN" "http://my.netscape.com/publish/formats/rss-0.91.dtd">

<rss version="0.91">
  <channel>
    <title>Mike Schaeffer's Weblog   </title>
    <link>http://www.mschaef.com/blog</link>
    <description>Mike Schaeffer's Weblog</description>
    <language>en</language>

  <item>
    <title>The End of 16-bit Windows</title>
    <link>http://www.mschaef.com/blog/2008/11/06#windows-31-end-of-life</link>
    <description>
In an era in which customers are almost begging Microsoft not to
discontinue Windows XP, I was surprised to see &lt;a
href=&quot;http://arstechnica.com/news.ars/post/20081105-microsoft-puts-windows-3-11-for-workgroups-out-to-pasture.html&quot;&gt;a
recent news story&lt;/a&gt; on the end of life of Windows for Workgroups
3.11 (WfWG).  If you're not completely up on the early history of
Windows, WfWG 3.11 was released in August of 1993, and was the last of
the major US-market versions of Windows without native Win32 support
out of the box. It was also one of a series of Windows releases in the
early 90's that turned Windows from 'the library you need to run
Excel' into a legitimate platform for general purpose computing.

&lt;br&gt;&lt;br&gt;

From its introduction in 1985 until the release of Windows 3.0 in 1990, Windows was almost 
entirely composed of the same basic core: DOS for file access and system startup, and a 
collection of three DLLs (KERNEL, GDI, and USER) for memory management, device-independent 
graphics, and the GUI widget library and window manager.  Atop the core sat programs 
written to the Windows API. All of this ran in the single 20-bit segmented address space 
provided by x86 real mode, with 640K of usable memory.  If you were lucky, you might have had 
a LIM/EMS board that allowed a few MB of extra memory to be addressed through a 64KB window 
at the top of the address space. If you were &lt;b&gt;really&lt;/b&gt; lucky, you might have had an 80386 
computer with a special program that let it pretend its extra memory worked like a LIM/EMS 
board. Needless to say, memory was tight, difficult to use, and dangerous to share 
between multiple programs.

&lt;br&gt;&lt;br&gt;

The solution to this memory problem was initially to be OS/2. OS/2 was the operating system 
part of IBM's vast (and doomed) PS/2 program to recapture the PC space from clone 
vendors. Like DOS, it was done in partnership with Microsoft, but IBM took a much more 
active role in the design and development of OS/2 than they did with DOS.  OS/2's most 
noteworthy feature was the fact that it was designed to run in 80286 'protected mode' 
rather than the 'real mode' of DOS and Windows. Protected mode, as its name implies, 
added memory protection between processes that made multi-tasking more reliable. Protected 
mode also widened the physical address space of the CPU from 20 bits to 24 bits, making it 
possible to directly address 16MB of memory without resorting to tricks like LIM/EMS 
paging. This was all good, but it was tempered by the fact that OS/2 was expensive to run 
and didn't run DOS programs very well, thanks to its choice of 80286 protected mode over 
80386. The only programs that could actually use the benefits of protected mode under OS/2 
were OS/2-specific software that nobody had.

&lt;br&gt;&lt;br&gt;

By the time 1988 rolled around, PCs capable of addressing more than 1MB of 
memory had been around since 1984, and there still wasn't a viable mainstream operating 
system that took advantage of this capability. This is when Windows got its big break: 
David Weise at Microsoft &lt;a 
href=&quot;http://blogs.msdn.com/larryosterman/archive/2005/02/02/365635.aspx&quot;&gt;figured out how 
to run Windows itself in Protected Mode&lt;/a&gt;, &lt;i&gt;along with unmodified Windows programs&lt;/i&gt;. 
Running existing software in protected mode was something of a holy grail, and Dr. Weise's 
idea ultimately resulted in Windows 3.0, released in 1990 to heady acclaim. Windows 3.0 
also included the V86 multitasker from the older Windows/386 product. This meant Windows 
3.0 could do things OS/2 could not do, like run multiple DOS programs at the same time and 
run them in graphical windows on the desktop.

&lt;br&gt;&lt;br&gt;

Windows 3.0 ended up being a runaway sales success, and after its release, the rest of the 
dominoes fell fairly quickly. Microsoft's partnership with IBM effectively ended, with IBM 
getting a source license to Microsoft products through the early 1990s. IBM ultimately 
used this license to develop a special version of Windows they bundled with OS/2 2.0 to let 
Windows programs run under OS/2 (&quot;a better Windows than Windows&quot; went the ad). Microsoft's 
own 32-bit OS/2 2.0 got dropped, and the work done on OS/2 NT (3.0) ultimately formed the 
basis for 1993's Windows NT and the Win32 API. The next version of 16-bit Windows, Windows 
3.1, dropped support for real mode entirely, and as it evolved into Windows 95, more and 
more system services were moved into 32-bit code. This 16/32-bit hybrid version of Windows 
lasted until Windows Me.  It was definitely baroque, and ended up notoriously unreliable, 
but its evolution from 256K 8088s to 128MB Pentiums is to my eye one of the more 
impressive examples of evolutionary software engineering. I don't miss using these versions 
of Windows, but it's easy to miss the 'brave new world' spirit they embodied.</description>
  </item>
  <item>
    <title>Macintosh vs. PC Pricing, and missing the point.</title>
    <link>http://www.mschaef.com/blog/2008/08/18#mac_vs_pc_pricing_and_openness</link>
    <description>
Harry McCracken just wrote &lt;a 
href=&quot;http://technologizer.com/2008/08/14/are-macs-more-expensive-lets-do-the-math-once-and-for-all/&quot;&gt; 
a bit&lt;/a&gt; comparing the price of PCs to Macintoshes. Like most of these 
guys, he misses the point. Consider his methodology: &lt;i&gt;&quot;I chose a 
standard [Apple] MacBook configuration...Then I configured laptops as 
similarly as possible from the country's two largest PC 
manufacturers&quot;&lt;/i&gt;. The problem is that this methodology takes the set 
of Apple machines to be the set of valid configurations for comparison, 
excluding configurations that Apple does not offer. Just for the sake of 
a fuller comparison, what does a MacBook cost with these 
configurations?

&lt;ul&gt;
&lt;li&gt;A TrackPoint.
&lt;li&gt;A numeric keypad.
&lt;li&gt;Two internal batteries.
&lt;li&gt;Two internal mouse buttons.
&lt;li&gt;A swappable drive bay.
&lt;li&gt;A docking station.
&lt;li&gt;A calibrated, high-gamut display and a digitizer.
&lt;li&gt;A display smaller than 13&quot; or bigger than 17&quot;.
&lt;li&gt;No keyboard.
&lt;li&gt;A convertible tablet configuration.
&lt;li&gt;Embedded on a PXI card.
&lt;li&gt;The absolute highest performance.
&lt;li&gt;The absolute minimum cost.
&lt;/ul&gt;

Of course, none of these configurations are available from Apple. If you
need or want one of these options, you can't get it at any price from 
Apple. Similar comparisons can be made in the server and desktop PC 
spaces. 

&lt;br&gt;&lt;br&gt;

This is an unsurprising result. When you enlarge the playing field beyond 
Apple's relatively limited reach, it becomes even more apparent that 
these comparisons aren't 'Apple vs. PC'; what they really are is 'Apple 
vs. The Entire Computer Industry'. Apple doesn't have the capability, 
desire, or brand to fare well in such a comparison: There are just too 
many market segments they don't address. Addressing all of these 
segments would leave them with a confusing product line, a highly taxed 
engineering group, and a muddled brand image.

&lt;br&gt;&lt;br&gt;

Part of the value of the PC platform is that it is not subject to the 
limitations of being confined to one highly image-sensitive company. 
Part of the value of the PC is that it allows other vendors to enlarge 
the platform into new segments. Missing out on this is one of the costs 
of picking an Apple that is missing in most comparisons, including 
McCracken's.</description>
  </item>
  <item>
    <title>The Linux Hater's Blog</title>
    <link>http://www.mschaef.com/blog/2008/08/12#linux-haters-blog</link>
    <description>
I have a new favorite blog, the &lt;a
href=&quot;http://linuxhaters.blogspot.com/&quot;&gt;Linux Hater's Blog&lt;/a&gt;. Some
anonymous Linux user has taken it upon himself to open a blog
dedicated to all of the many reasons why desktop Linux sucks (which it
does). While it's more than a little mean-spirited, this blog is the
dissenting voice of Linux. It is the conscience that, if heeded, will
make the Linux desktop a better place to work.

&lt;br&gt;&lt;br&gt;

For all of the problems with Linux, it is also the one major platform
that allows the motivated individual or company to actually address
those problems. The single biggest difference between the Linux
Hater's Blog and the (would-be) Windows and MacOS X Hater's Blogs is
that on the Linux blog, it's actually possible to do something about
the problems. Consider this: 18 years ago, there was no Linux; 12
years ago, there was no Gnome. In 1990, the Linux Hater's blog would
have one post: &lt;i&gt;&quot;It doesn't exist, go buy Windows.&quot;&lt;/i&gt; The reason I
mention this is that while it's easy to dismiss the benefits of open
source as purely theoretical (i.e.: &lt;i&gt;&quot;Have &lt;b&gt;you&lt;/b&gt; ever needed to
recompile your kernel?&quot;&lt;/i&gt;), the benefits of open source are the
entire reason it exists at all.

&lt;br&gt;&lt;br&gt;

To look at this in a bit more depth, consider the gnome-panel as an
example. Based on the copyright claims in the source code, gnome-panel
is itself a collaboration of &lt;a
href=&quot;http://en.wikipedia.org/wiki/Eazel&quot;&gt;Eazel&lt;/a&gt;, &lt;a
href=&quot;http://primates.ximian.com/~miguel/helix-history.html&quot;&gt;Helix
Code&lt;/a&gt;/Ximian/Novell, &lt;a href=&quot;http://www.sun.com/&quot;&gt;Sun
Microsystems&lt;/a&gt;, &lt;a href=&quot;http://www.redhat.com&quot;&gt;Red Hat&lt;/a&gt;, &lt;a
href=&quot;http://www.fsf.org&quot;&gt;The Free Software Foundation&lt;/a&gt;, &lt;a
href=&quot;http://ian.mckellar.org/&quot;&gt;Ian McKellar&lt;/a&gt;, James Wilcox, Rob
Adams, &lt;a href=&quot;http://live.gnome.org/VincentUntz&quot;&gt;Vincent Untz&lt;/a&gt;,
and &lt;a href=&quot;http://carlosgc.linups.org/&quot;&gt;Carlos Garcia
Campos&lt;/a&gt;. All of these contributors found things to change or fix,
'itches to scratch', and all of them changed or fixed the
gnome-panel. This is something that basically cannot happen in the
model of closed source software. If you want to change something in
MacOS X, you basically have three options: try to convince Apple it is
a worthwhile change by trying to present (giving up the rights to) a
business case justifying the feature, try to go to work for Apple in
the right group and convince them to let you implement your feature,
or reimplement the entire thing yourself.

&lt;br&gt;&lt;br&gt;

As a result of these kinds of trade-offs, cross-organization collaboration 
in closed source is a lot harder to come by than in open source. Closed 
source essentially divides the stakeholders in a piece of software into 
two groups: those that can take responsibility for the software by making 
changes, and those that cannot and must either accept the changes as 
provided or work around them. In that sense, Free Software is the 
licensing model that brings to software the democratic ideals of personal 
responsibility and the sovereignty of the people. Like any democracy, in 
the short term it will have issues compared to more centralized forms of 
planning, but in the long term it will be a much more vibrant and 
productive place to be. This is also why the Linux Hater's Blog is so very 
important. To see why, continue the analogy with democracy a bit, and 
consider the process by which the United States adopted its Constitution.

&lt;br&gt;&lt;br&gt;

After the U.S. Constitutional Convention in Philadelphia, there came the 
long and highly political process of states ratifying the new form of 
government.  During this two-year-long debate, there was a series of 
papers, the Federalist Papers, written in support of the proposed 
Constitution. Less well known are the Anti-Federalist papers, a series of 
dissenting arguments &lt;i&gt;against&lt;/i&gt; ratification. This dissent primarily 
centered around the lack of a Bill of Rights, and ultimately led to the 
incorporation of a Bill of Rights as the first ten amendments to the 
Constitution. The dissent was not just criticism: open process and free 
debate allowed it to be a key part of the construction of the 
Constitution.

&lt;br&gt;&lt;br&gt;

This is a much grander version of what the Linux Hater's blog can do for 
Linux. By dissenting against the idea that Linux is already ready for the 
desktop (or the server), it also provides a list of weaknesses to fix.  
Unlike a Windows Hater's Blog, the freedoms of Linux allow this list of 
weaknesses to effectively become a to-do list for anyone or any company 
with the motivation and time to do the work. It is therefore not a 
liability to Linux, but an asset that derives its value from the freedom at 
the core of Free Software. Ironically enough, because of this, the 'Linux 
Hater' could easily turn out to be one of Linux's best friends.</description>
  </item>
  <item>
    <title> Does Openness Matter Anymore?</title>
    <link>http://www.mschaef.com/blog/2008//11#does_openness_really_matter</link>
    <description>
I was born in 1975. In the 'computer world', this means I grew up at
the tail end of the 8-bit era. By the time I was a teenager the market
was in the middle of deciding whether to go with PCs, the Apple
Macintosh, or &lt;a
href=&quot;http://en.wikipedia.org/wiki/DESQview#DESQview.2FX&quot;&gt;something
else&lt;/a&gt;. Microsoft basically clinched that deal in 1990 with the
release of &lt;a href=&quot;http://en.wikipedia.org/wiki/Windows_3.0&quot;&gt;Windows
3.0&lt;/a&gt;, the first relevant version. A PC running Windows 3.0 wasn't
as nice as a Macintosh, but it didn't matter. If you already had a PC,
you could buy Windows off the shelf for $89, retain all of your
existing hardware and software, and then experiment with the GUI when
you had the time. If typing &lt;tt&gt;win&lt;/tt&gt; at a DOS prompt took you down
the rabbit hole, clicking 'Exit Windows' took you right back to your
comfort zone.

&lt;br&gt;&lt;br&gt;

Windows 3.0 also had the benefit of a huge installed base of latent
and mostly unused hardware. A typical business PC in 1990 might have
been something like an &lt;a
href=&quot;http://en.wikipedia.org/wiki/Intel_80286&quot;&gt;80286&lt;/a&gt; with 2MB of
RAM, a 40MB disk, and an &lt;a
href=&quot;http://en.wikipedia.org/wiki/Enhanced_Graphics_Adapter&quot;&gt;EGA&lt;/a&gt;
(640x350x4bpp) bitmapped display. It would then be running DOS
software that basically couldn't address more than the first &lt;a
href=&quot;http://en.wikipedia.org/wiki/Conventional_memory&quot;&gt;640K&lt;/a&gt; of
memory, and if you ever saw the bitmap display in use, it was probably
for a static plot of a graph. Compared to a Macintosh from the same
year, a PC looked positively like something from a totally different
generation. Windows 3.0 changed all this. It allowed you to switch
your 80286 into &lt;a
href=&quot;http://en.wikipedia.org/wiki/Protected_mode&quot;&gt;'Protected
Mode'&lt;/a&gt; to get at that extra memory. It provided a &lt;a
href=&quot;http://en.wikipedia.org/wiki/Graphics_Device_Interface&quot;&gt;graphics
API&lt;/a&gt; (with drivers!) and forced programs to use the bitmapped
display. It provided standard printer drivers that worked for &lt;b&gt;all
Windows programs&lt;/b&gt;. Basically, for $89 it took the hardware you
already had and made it look almost like the Macintosh that would
otherwise have cost you thousands of dollars. It was utterly
transforming.

&lt;br&gt;&lt;br&gt;

Almost 20 years later, the most interesting thing about this is the
relative timing of the hardware and its software support. Most of the
hardware in my 'typical 1990 PC' was introduced by IBM in its 1984
announcement of the &lt;a
href=&quot;http://www.vintage-computer.com/ibmpcat.shtml&quot;&gt;IBM PC AT&lt;/a&gt;.
The first attempt by IBM and Microsoft to support the 80286 natively
came three years later in 1987's release of &lt;a
href=&quot;http://en.wikipedia.org/wiki/OS/2&quot;&gt;OS/2&lt;/a&gt;. The first
286-native platform to reach mainstream acceptance came in 1990. Think
about that: it took 6 years for the open PC market to develop software
capable of fully utilizing the 80286.  The &lt;a
href=&quot;http://en.wikipedia.org/wiki/Intel_80386&quot;&gt;80386&lt;/a&gt; fared even
worse: the &lt;a
href=&quot;http://en.wikipedia.org/wiki/Compaq#Deskpro_386&quot;&gt;first 386
machine&lt;/a&gt; was released in 1986, and it didn't have a major
mainstream OS until either 1993 or &lt;a
href=&quot;http://en.wikipedia.org/wiki/Windows_95&quot;&gt;1995&lt;/a&gt; (depending on
whether or not you count &lt;a
href=&quot;http://en.wikipedia.org/wiki/Windows_NT_3.1&quot;&gt;Windows NT 3.1&lt;/a&gt;
as 'mainstream'). Thus, there were scores of 286 and 386 boxes that
did nothing more than execute 8086 code really, really fast (for the
time :-)). In modern terms, this is analogous to a vendor introducing
a hardware device today and then delaying software support until 2018.

&lt;br&gt;&lt;br&gt;

This is emblematic of the hugely diminishing value of an open device
platform in today's computer industry. In 1989, using a computer was
largely an exercise in getting the damn thing to work.  When those are
the issues you're worried about as a PC user, an open platform is
helpful because it enables a broader selection of vendors for parts
and software. If you've run out of slots for both your video board and
your bus mouse interface, you can always switch to an ATI video board
with a built-in mouse port. If you need a memory manager that supports
&lt;a href=&quot;http://docs.ruudkoot.nl/vcpi.doc&quot;&gt;VCPI&lt;/a&gt; to enable your &lt;a
href=&quot;http://osdev.berlios.de/v86.html&quot;&gt;V86&lt;/a&gt; multitasker, you can
always switch to something like &lt;a
href=&quot;http://en.wikipedia.org/wiki/QEMM&quot;&gt;QEMM/386&lt;/a&gt;. If you need
more memory to run your spreadsheet, you can go to AST Technologies and
buy a &lt;a
href=&quot;http://www.borrett.id.au/computing/art-1989-01-02.htm&quot;&gt;LIM/EMS&lt;/a&gt;
board.  When you're worried about these kinds of issues, issues 'low
in the stack', the flexibility of choice provided by openness is
useful enough that you might be more willing to bear the costs of a
market slow to adopt new technologies.

&lt;br&gt;&lt;br&gt;

Of course, price is also a factor.  In 1988, Byte magazine ran a
review of Compaq's Deskpro 386s. This was their first &lt;a
href=&quot;http://www.borrett.id.au/computing/art-1989-01-02.htm&quot;&gt;80386SX&lt;/a&gt;
machine, a desktop computer designed to be a cheaper way to run
80386-specific software. The cost of the review machine was something
like $15,000.  In 2007 dollars, this would buy you a nice, reasonably
late-model BMW 3-series. A year later in 1989, my family bought a
similar machine from ALR, which cost around $3,000. This isn't
nearly as bad, but it's still around $5,200 in 2007, which basically
means that a mid-range 1989 PC is priced at the very top end of the
2007 PC market. With monetary costs that high, that other benefit of
openness, price competition, becomes a much bigger deal. Compaq ended
up suffering badly as competition drove the price of the market to
where it is today.

&lt;br&gt;&lt;br&gt;

In the intervening 20 years, both of these circumstances have changed
dramatically. PCs, both Windows and Macintosh, are well enough
integrated that nothing needs to be done to get them to run aside from
unpacking the box.  &lt;a
href=&quot;http://en.wikipedia.org/wiki/NeXTSTEP&quot;&gt;NeXTStep&lt;/a&gt;, which in
1994 required a fancy $5,000 PC bought from a custom vendor to run
well, will shortly be able to run (with long-range, high-speed
wireless!) on a &lt;a href=&quot;http://www.apple.com/iphone/&quot;&gt;$200&lt;/a&gt;
handheld bought at your local shopping mall. Our industry has moved up
&lt;a
href=&quot;http://en.wikipedia.org/wiki/Maslow%27s_hierarchy_of_needs&quot;&gt;Maslow's
hierarchy of needs&lt;/a&gt; from expensive, unreliable hardware, run by the
dedicated few to cheap, reliable hardware, run by the disinterested
many. We can now concentrate on more interesting things than just
getting the computer to work, and it is with this shift that some
of the unique value of openness has been lost. Unfortunately, &lt;i&gt;the
costs have been retained&lt;/i&gt;: there is no countervailing force in the
market that's forcing open platforms to move any faster.

&lt;br&gt;&lt;br&gt;

Personally, I believe this bodes very well for Apple's latest attempt
to own the smartphone space. There will only be one vendor and one
price for the iPhone, but the platform will be able to move faster to
adopt new technologies, and integrate them more tightly, because
there's only one kind of hardware to run on. The fewer hardware
configurations and stricter quality control guidelines will make it
easier (and more mandatory) for developers to produce high quality
software. The fact that entry into the software market is controlled
doesn't matter, because there are still more eligible developers than
the platform actually needs. The net result of all this is that Apple,
again, has a product that looks 'next generation', but the pricing and
openness factors that cost them that advantage in the early 90's are
no longer there. It's a good time to be involved in the iPhone,
methinks.</description>
  </item>
  <item>
    <title>defmacro and coupling.</title>
    <link>http://www.mschaef.com/blog/2008/04/10#defmacro-coupling</link>
    <description>
A few months ago, I ran into a problem with a macro that seriously
changed my opinions on how they should be used. It all comes down to
the fact that macros are incorporated into compiler output. Two pieces
of code that look nicely decoupled in the source text can end up very
entwined with each other, once they are compiled.

&lt;br&gt;&lt;br&gt;

To illustrate, I'll use the macro in question, something I once used
to accept a sort of simulated 'multiple return value' in a dialect of
Scheme. This is a low level example, something from my hobby work, but
it can apply equally well to other uses of macros.

&lt;pre class=&quot;syntax&quot;&gt;
(defmacro (values-bind form vars . body)
  (with-gensyms (form-rv-sym)
    `(let ((,form-rv-sym ,form))
       (list-let ,vars (if (%values-tuple? ,form-rv-sym)
                           (slot-ref ,form-rv-sym 'v)
                           (list ,form-rv-sym))
         ,@body))))
&lt;/pre&gt;

This macro expands code like this:

&lt;pre class=&quot;syntax&quot;&gt;
(values-bind (returns-2-args 'foo) (arg-1 arg-2)
   (+ arg-1 arg-2))
&lt;/pre&gt;

Into code that looks like this:
&lt;pre class=&quot;syntax&quot;&gt;
(let ((#:form-rv-sym-69@00beeec4 (returns-2-args 'foo)))
   (list-let (arg-1 arg-2) (if (%values-tuple? #:form-rv-sym-69@00beeec4)
                              (slot-ref #:form-rv-sym-69@00beeec4 'v)
                              (list #:form-rv-sym-69@00beeec4))
     (+ arg-1 arg-2)))
&lt;/pre&gt;

And then, the compiler compiles that form and drops the result into
the output file, which now contains several pretty deep assumptions
about the simulated multiple value protocol it needs to honor:

&lt;br&gt;&lt;br&gt;

&lt;ul&gt;
&lt;li&gt;Values are returned in a single value that satisfies
    &lt;tt&gt;%values-tuple?&lt;/tt&gt;.
&lt;li&gt;Values are extracted from a tuple with a call to &lt;tt&gt;slot-ref&lt;/tt&gt;
    for slot &lt;tt&gt;v&lt;/tt&gt;.
&lt;li&gt;Values are stored within the slot as a list.
&lt;/ul&gt;

&lt;br&gt;

While the source text that uses &lt;tt&gt;values-bind&lt;/tt&gt; doesn't need to
know any of these details, the compiler output does. This results in
compiler output that is very closely tied to the value protocol:
compiler output that is likely to be incompatible with any changes to
that protocol.

&lt;br&gt;&lt;br&gt;

In many development scenarios, this doesn't matter.  Within a single
project, if compiled file A comes to depend on assumptions embedded in
macros from file B, it's less of an issue: both files are usually
compiled at the same time. If both files can't be simultaneously
compiled, things start to go wrong. I ran into this issue myself when
trying to change the multiple value protocol I was using in my
compiler. My core library was built with the old protocol, my new
library was to be built with the new protocol, and the two could not
interoperate for the brief period of time necessary to produce a
compiled version of the new library. There are several possible
approaches to solving this, but the one I took was the two-step process of
building a new 'old' library that can handle &lt;b&gt;both&lt;/b&gt; protocols,
using it to compile a version that works only with the new protocol,
and then switching over completely. It was a mess, and a mess I
created myself with a macro that expanded into something that assumed
way too much. The better approach, the approach that I switched to,
is this:

&lt;pre class=&quot;syntax&quot;&gt;
(define (call-with-values proc vals)
  (apply proc (%values-&gt;list vals)))

(defmacro (values-bind form vars . body)
  `(call-with-values (lambda ,vars ,@body) ,form))
&lt;/pre&gt;

This expands the above code to something more palatable:

&lt;pre class=&quot;syntax&quot;&gt;
(call-with-values (lambda (arg-1 arg-2)
                     (+ arg-1 arg-2))
                  (returns-2-args 'foo))
&lt;/pre&gt;

The only assumption this makes in the compiled output is that there's
a function &lt;tt&gt;call-with-values&lt;/tt&gt; that calls its first argument
with values passed in as its second argument. All of the gory details,
which could easily be the same three from my list, are hidden behind
function calls and dynamic linkage. This is actually the
representation that made the two-step cutover approach
plausible. Switching to this version of the &lt;tt&gt;values-bind&lt;/tt&gt; macro
removed assumptions about the value protocol from every call site, and
made it easy to switch.


&lt;br&gt;&lt;br&gt;

The upshot of this is something that's, I'm sure, pretty common
knowledge in Lisp/Scheme circles: macros are best when limited to
syntax, with the underlying functionality implemented in a more
functional interface. The functional interface keeps things more
decoupled, even when compiled, and leaves your software more
manageable. It also provides a second way to 'get at' the
functionality provided by the underlying code. With the function/macro
split, the macro expansion can be avoided entirely, in the case
when you already have a closure that contains the code you need
to run.
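
&lt;br&gt;&lt;br&gt;

As a minimal sketch of that last point, assuming the definition of &lt;tt&gt;call-with-values&lt;/tt&gt; given earlier in this post (which takes the procedure first and the values second) and a hypothetical helper named &lt;tt&gt;add-pair&lt;/tt&gt; that is not part of the original library, an existing function can be handed over directly with no &lt;tt&gt;values-bind&lt;/tt&gt; expansion involved:

&lt;pre class=&quot;syntax&quot;&gt;
;; add-pair is a hypothetical function, used only for illustration.
(define (add-pair arg-1 arg-2)
  (+ arg-1 arg-2))

;; No macro expansion is needed: the closure already exists.
(call-with-values add-pair (returns-2-args 'foo))
&lt;/pre&gt;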
 
&lt;br&gt;&lt;br&gt;

One more brief example, a bit higher up the 'stack' in the
language environment is the transformation of this macro:

&lt;pre class=&quot;syntax&quot;&gt;
(defmacro (with-output-to-string . code)
  (with-gensyms (saved-output-port-sym output-string-sym)
    `(let ((,saved-output-port-sym (current-output-port))
           (,output-string-sym (open-output-string)))
       (unwind-protect (lambda ()
                         (set-current-output-port ,output-string-sym)
                         ,@code
                         (get-output-string ,output-string-sym))
                       (lambda ()
                         (set-current-output-port ,saved-output-port-sym))))))
&lt;/pre&gt;

Into this macro/function pair:

&lt;pre class=&quot;syntax&quot;&gt;
(define (call-with-output-to-string fn)
  (let ((saved-output-port (current-output-port))
        (output-string (open-output-string)))
    (unwind-protect (lambda ()
                      (set-current-output-port output-string)
                      (fn)
                      (get-output-string output-string))
                    (lambda ()
                      (set-current-output-port saved-output-port)))))

(defmacro (with-output-to-string . code)
  `(call-with-output-to-string (lambda () ,@code)))
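
;; Usage sketch (an assumption for illustration: write-greeting is a
;; hypothetical zero-argument function that prints to the current
;; output port). Given the definitions above, both calls should capture
;; the same text; the second skips the macro entirely.
;;
;;   (with-output-to-string (write-greeting))
;;   (call-with-output-to-string write-greeting)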
&lt;/pre&gt;</description>
  </item>
  <item>
    <title>Renaming SVN Users on Windows</title>
    <link>http://www.mschaef.com/blog/2008/02/22#renaming-svn-users-on-windows</link>
    <description>
The instructions I gave earlier on &lt;a 
href=&quot;http://www.mschaef.com/blog/tech/programming/renaming-svn-users.html&quot;&gt;Renaming 
SVN Users&lt;/a&gt; work only when the SVN repository is hosted on a machine that can run 
SVN hooks written in Unix style shell script. On a conventional Windows machine, one 
without Cygwin, MSYS, or similar, you have to switch to writing hooks in something 
like Windows batch language. 

&lt;br&gt;&lt;br&gt;

If all you want to do is temporarily rename users, then you can just create an empty 
file named &lt;tt&gt;pre-revprop-change.cmd&lt;/tt&gt; in your repository under &lt;tt&gt;hooks\&lt;/tt&gt;. 
The default return code from a batch file is success, which SVN interprets as permission to make 
&lt;b&gt;all&lt;/b&gt; revision 
property changes, all the time, by anybody. If you want to implement an actual policy, 
Philibert Pérusse has posted a &lt;a 
href=&quot;http://svn.haxx.se/users/archive-2006-03/0107.shtml&quot;&gt;template script&lt;/a&gt; online.</description>
  </item>
  <item>
    <title>CAR, CDR, and Lisp...</title>
    <link>http://www.mschaef.com/blog/2008/02/14#car-cdr</link>
    <description>
A couple weeks ago, I got into a brief &lt;a href=&quot;http://www.reddit.com&quot;&gt;reddit&lt;/a&gt; &lt;a
href=&quot;http://reddit.com/info/677u6/comments/c031w9f&quot;&gt;discussion&lt;/a&gt; on the relative merits
of Lisp's &lt;a
href=&quot;http://www.lisp.org/HyperSpec/Body/acc_carcm_cdr_darcm_cddddr.html&quot;&gt;&lt;tt&gt;car&lt;/tt&gt; and
&lt;tt&gt;cdr&lt;/tt&gt;&lt;/a&gt; functions. Given a Lisp list, applying &lt;tt&gt;car&lt;/tt&gt; to the list returns
the first element of the list and applying &lt;tt&gt;cdr&lt;/tt&gt; to the list returns a list of
every element excluding the first.  For someone new to Lisp (as we all were once), these
names can be a bit awkward. However, like many other aspects of the language, there is
more to &lt;tt&gt;car&lt;/tt&gt; and &lt;tt&gt;cdr&lt;/tt&gt; than meets the eye.
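&lt;br&gt;&lt;br&gt;

For readers without a Lisp handy, here is a minimal sketch of the idea in Python. It models a
cons cell as a two-element tuple and the empty list as &lt;tt&gt;None&lt;/tt&gt;; the helper names are
mine, chosen only to mirror the Lisp functions.

&lt;pre class=&quot;syntax&quot;&gt;
# A cons cell modeled as a two-element tuple; None stands in for the empty list.
def cons(a, d):
    return (a, d)

def car(cell):
    return cell[0]

def cdr(cell):
    return cell[1]

# The list (1 2 3) is a chain of three cells ending in the empty list.
numbers = cons(1, cons(2, cons(3, None)))

first_element = car(numbers)   # the first element of the list
remaining = cdr(numbers)       # the chain of cells for (2 3)
&lt;/pre&gt;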

&lt;br&gt;&lt;br&gt;

The first implementation of Lisp was done by &lt;a
href=&quot;http://en.wikipedia.org/wiki/Steve_Russell&quot;&gt;Steve Russell&lt;/a&gt; on an &lt;a
href=&quot;http://en.wikipedia.org/wiki/IBM_704&quot;&gt;IBM 704&lt;/a&gt;. The 704 was a 36-bit vacuum tube
machine that IBM started selling in 1954. By the time it was discontinued in 1960, they
had sold a total of 123 of the machines, each capable of a whopping 40,000 calculations
per second. Russell's original 1959 implementation of Lisp on this machine took advantage
of the fact that the 704's instruction set had special capabilities for accessing two
distinct 15-bit fields of a 36-bit value loaded into a machine register: the
&lt;b&gt;a&lt;/b&gt;ddress and &lt;b&gt;d&lt;/b&gt;ecrement fields.  In &lt;a
href=&quot;http://www.iwriteiam.nl/HaCAR_CDR.html&quot;&gt;Russell's own words&lt;/a&gt;: &lt;i&gt; &quot;Because of an
unfortunate temporary lapse of inspiration, we couldn't think of any other names for the 2
pointers in a list node than 'address' and 'decrement', so we called the functions CAR for
'Contents of Address of Register' and CDR for 'Contents of Decrement of Register'.&quot;&lt;/i&gt;
Interestingly enough, he continues with this: &lt;i&gt;&quot;After several months and giving a few
classes in LISP, we realized that 'first' and 'rest' were better names, and we (John
McCarthy, I and some of the rest of the AI Project) tried to get people to use them
instead. ... Alas, it was too late! We couldn't make it stick at all. So we have CAR and
CDR. &quot;&lt;/i&gt; So there you have it: &lt;tt&gt;car&lt;/tt&gt; and &lt;tt&gt;cdr&lt;/tt&gt;, two of the most famous and
widely used functions of Lisp and its descendants, owe their names to a bizarre quirk of a
computer architecture that's been obsolete for close to fifty years. For the record,
here's a source listing for the original 704 implementation of &lt;tt&gt;car&lt;/tt&gt; taken from &lt;a
href=&quot;ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-006.pdf&quot;&gt;MIT AI Lab Memo
6&lt;/a&gt;:

&lt;pre class=&quot;syntax&quot;&gt;
LXD JLOC, 4
CLA 0,4
PAX 0,4
PSX 0,4
&lt;/pre&gt;
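
The field layout itself is easy to sketch. The Python below assumes the usual description of a
704 word, with the address in the low 15 bits, a 3-bit tag above it, and the decrement in the
next 15 bits. The function names are hypothetical, and integer division and remainder by powers
of two stand in for shifts and masks; it illustrates the layout only, not the 704 code above.

&lt;pre class=&quot;syntax&quot;&gt;
FIELD = 2 ** 15             # each pointer field is 15 bits wide
DECREMENT_BASE = 2 ** 18    # skip the 15-bit address and the 3-bit tag

def pack_word(address, decrement):
    # Store two 15-bit pointers in one 36-bit word (prefix and tag left zero).
    return decrement * DECREMENT_BASE + address

def address_field(word):    # the field CAR reads
    return word % FIELD

def decrement_field(word):  # the field CDR reads
    return (word // DECREMENT_BASE) % FIELD

word = pack_word(address=0o1234, decrement=0o4321)
&lt;/pre&gt;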

But the story of &lt;tt&gt;car&lt;/tt&gt; and &lt;tt&gt;cdr&lt;/tt&gt; doesn't stop there.  In the 1960's and
early 70's, the &lt;a href=&quot;http://www.utexas.edu&quot;&gt;University of Texas at Austin&lt;/a&gt; ran a
computer called a &lt;a href=&quot;http://en.wikipedia.org/wiki/CDC_6600&quot;&gt;CDC 6600&lt;/a&gt;. The 6600
was one of &lt;a href=&quot;http://en.wikipedia.org/wiki/Seymour_Cray&quot;&gt;Seymour Cray&lt;/a&gt;'s first
big supercomputer designs, and was the fastest computer in the world for a time.  It had a
60-bit machine word and an 18-bit address space, so you can probably see where this is
going.  The designers of UT's Lisp for the CDC 6600 added a third field to &lt;tt&gt;cons&lt;/tt&gt;
cells, giving them each three pointers, the &lt;a
href=&quot;http://groups.google.com/group/alt.folklore.computers/browse_thread/thread/a1c2c48abe467f3c/b6217d9e521e043d?hl=en&amp;amp;lnk=st&amp;amp;q=car+cdr+csr+cdc+6600+lisp#b6217d9e521e043d&quot;&gt;&lt;tt&gt;car&lt;/tt&gt;,
&lt;tt&gt;cdr&lt;/tt&gt;, and &lt;tt&gt;csr&lt;/tt&gt;&lt;/a&gt;. I'm sure the third pointer was useful for implementing
things like trees of nodes and lists of key/value pairs, although apparently not useful
enough to stick around. Two-pointer cons cells are a better fit for modern hardware, and
two two-pointer cons cells can represent everything that a single three-pointer cons cell
can represent.
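&lt;br&gt;&lt;br&gt;

To make that last claim concrete, here is a small Python sketch. The names are hypothetical and
cells are modeled as tuples; it simply shows one three-slot record stored as two chained
two-slot cells.

&lt;pre class=&quot;syntax&quot;&gt;
# A three-pointer cell (car, cdr, csr) stored as two chained two-pointer cells:
# the outer cell holds car, and its second slot points at a cell holding cdr and csr.
def cons3(a, d, s):
    return (a, (d, s))

def car3(cell):
    return cell[0]

def cdr3(cell):
    return cell[1][0]

def csr3(cell):
    return cell[1][1]

node = cons3('key', 'value', 'next-node')
&lt;/pre&gt;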

&lt;br&gt;&lt;br&gt;

Back at MIT in the 70's, and before things like &lt;a
href=&quot;http://www.lisp.org/HyperSpec/Body/mac_defstruct.html&quot;&gt;&lt;tt&gt;defstruct&lt;/tt&gt;&lt;/a&gt;, &lt;a
href=&quot;http://www.multicians.org/lcp.html&quot;&gt;Maclisp&lt;/a&gt; took the idea of multi-pointer cons
cells to what must be its logical extreme: &lt;a
href=&quot;http://www.maclisp.info/pitmanual/hunks.html&quot;&gt;hunks&lt;/a&gt;. A Maclisp hunk was a
structure like a cons cell that could hold an arbitrary number of pointers, up to a total of
512. Each of these slots in a hunk was referred to as a numbered &lt;tt&gt;cxr&lt;/tt&gt;, with a
numbering scheme that went like this: &lt;tt&gt;( cxr-1 cxr-2 cxr-3 ... cxr-&lt;i&gt;n&lt;/i&gt; cxr-0
)&lt;/tt&gt;. No matter how many slots were in the hunk, &lt;tt&gt;car&lt;/tt&gt; was equivalent to &lt;tt&gt;(cxr
1 &lt;i&gt;hunk&lt;/i&gt;)&lt;/tt&gt; and &lt;tt&gt;cdr&lt;/tt&gt; was equivalent to &lt;tt&gt;(cxr 0 &lt;i&gt;hunk&lt;/i&gt;)&lt;/tt&gt;. This is
a nice generalization of the basic idea of a cons cell, but modern Lisps offer other ways
to structure data that are both possibly more useful and more readable: structures for a fixed 
collection of named slots, hash tables for a variable collection of named slots, and vectors
for a collection of numbered slots.
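&lt;br&gt;&lt;br&gt;

The numbering is easier to see in code. This Python sketch models a hunk as a list indexed by
slot number and reproduces only the ordering described above; it says nothing about how Maclisp
actually stored hunks.

&lt;pre class=&quot;syntax&quot;&gt;
def cxr(n, hunk):
    # Slot n of the hunk, using the hunk's own slot numbering.
    return hunk[n]

def car(hunk):
    return cxr(1, hunk)

def cdr(hunk):
    return cxr(0, hunk)

def printed_order(hunk):
    # Hunks print as ( cxr-1 cxr-2 ... cxr-n cxr-0 ): slot 0 comes last.
    return hunk[1:] + hunk[:1]

hunk = ['slot0', 'slot1', 'slot2', 'slot3']
&lt;/pre&gt;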

&lt;br&gt;&lt;br&gt;

After these historical blind alleys, it's interesting to think about why &lt;tt&gt;car&lt;/tt&gt; and
&lt;tt&gt;cdr&lt;/tt&gt; still persist fifty years after McCarthy and Russell roamed the halls of MIT
evangelising &lt;tt&gt;first&lt;/tt&gt; and &lt;tt&gt;rest&lt;/tt&gt;. Common Lisp does at least have
&lt;tt&gt;first&lt;/tt&gt; and &lt;tt&gt;rest&lt;/tt&gt; as part of the standard. However, when I was taught Lisp
in the mid-90's, I was encouraged to primarily favor the older &lt;tt&gt;car&lt;/tt&gt; and
&lt;tt&gt;cdr&lt;/tt&gt;. I remember two primary reasons for this. The first was that existing code
favored &lt;tt&gt;car&lt;/tt&gt; and &lt;tt&gt;cdr&lt;/tt&gt;, so it was important to be able to read code written
in that style. The second reason was that &lt;tt&gt;first&lt;/tt&gt; and &lt;tt&gt;rest&lt;/tt&gt; impose a
particular meaning on the fields of a cons cell that may or may not be appropriate. In the
common case of a linear list, &lt;tt&gt;first&lt;/tt&gt; and &lt;tt&gt;rest&lt;/tt&gt; work rather well. If you
call &lt;tt&gt;first&lt;/tt&gt; on the list, you get the first element; if you call &lt;tt&gt;rest&lt;/tt&gt;, you
get the rest. In the case of a cons cell as a node of an association list, they work less
well, unless, that is, you can figure out a reason why &lt;tt&gt;first&lt;/tt&gt; makes sense as
'key', and &lt;tt&gt;rest&lt;/tt&gt; makes sense as 'value'.
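&lt;br&gt;&lt;br&gt;

A short Python sketch shows the mismatch. Cells are two-element tuples, and the lookup helper is
a hypothetical stand-in for Lisp's &lt;tt&gt;assoc&lt;/tt&gt;; the point is only that the second slot of
an association list entry is a value, not 'the rest' of anything.

&lt;pre class=&quot;syntax&quot;&gt;
# An association list: a chain of cells whose elements are (key . value) cells.
def cons(a, d):
    return (a, d)

colors = cons(cons('apple', 'red'),
              cons(cons('lime', 'green'), None))

def assoc(key, alist):
    # Walk the outer chain; each element is itself a (key . value) cell.
    while alist is not None:
        entry, alist = alist
        if entry[0] == key:
            return entry
    return None

entry = assoc('lime', colors)
value = entry[1]    # the second slot of the entry is the value, not a list
&lt;/pre&gt;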

&lt;br&gt;&lt;br&gt;

Some of this confusion stems from the fact that most Lisps, despite the name &lt;b&gt;Lis&lt;/b&gt;t
&lt;b&gt;P&lt;/b&gt;rocessing, don't have an official list data type. What they have instead is a two
element cons cell and a set of conventions in the library, reader, and writer for using
them to make linear lists. In a sense, this is a lot like strings in C. C doesn't have a
string type; what it has instead is a pointer to character data (&lt;tt&gt;char *&lt;/tt&gt;) and a
set of library conventions for using blocks of memory as strings of characters. This
laxness on the part of both languages comes with the advantages and disadvantages you'd
expect from letting the details of an underlying implementation leak through. In the case
of C 'strings', the representation lends itself both to things like Rob Pike's beautiful
&lt;a href=&quot;http://cm.bell-labs.com/cm/cs/tpop/grep.c&quot;&gt;regex implementation&lt;/a&gt; in &lt;a
href=&quot;http://cm.bell-labs.com/cm/cs/tpop/&quot;&gt;The Practice of Programming&lt;/a&gt; and a seemingly
never ending series of buffer overrun attacks. In the case of Lisp cons cells, it provides
both an incredibly flexible data structure, and confusion over such basic notions as
the 'first', 'second', and 'rest' of a list.

&lt;br&gt;&lt;br&gt;

If something as baroque as 'car' actually makes more sense than 'first' because 'first'
doesn't match up well to the underlying abstraction, it might then make sense to
reconsider the underlying implementation. Some modern Lisps like &lt;a
href=&quot;http://clojure.sourceforge.net/&quot;&gt;Clojure&lt;/a&gt; do just that: a Clojure 'list' isn't a
string of cons cells, but rather an instance of a JVM object that implements the
interface &lt;tt&gt;ISeq&lt;/tt&gt;:

&lt;pre class=&quot;syntax&quot;&gt;
public interface ISeq extends IPersistentCollection{
   Object first();
   ISeq rest();
   ISeq cons(Object o);
}
&lt;/pre&gt;

Clojure's user-visible &lt;tt&gt;first&lt;/tt&gt; and &lt;tt&gt;rest&lt;/tt&gt; functions ultimately call into
their like-named methods in &lt;tt&gt;ISeq&lt;/tt&gt;:

&lt;pre class=&quot;syntax&quot;&gt;
static public Object first(Object x){
	ISeq seq = seq(x);
	if(seq == null)
		return null;
	return seq.first();
}

// ...

static public ISeq rest(Object x){
	ISeq seq = seq(x);
	if(seq == null)
		return null;
	return seq.rest();
}
&lt;/pre&gt;

A noteworthy difference between this and a 'conventional' Lisp is the return type of
&lt;tt&gt;rest&lt;/tt&gt;: it's another &lt;tt&gt;ISeq&lt;/tt&gt;, rather than an &lt;tt&gt;Object&lt;/tt&gt;. Because of this,
the 'CDR' of a Clojure cons cell has a new constraint: it must be another
sequence, greatly increasing the likelihood that 'rest' really is 'the rest'. While
this could be done even if &lt;tt&gt;rest&lt;/tt&gt; returned an &lt;tt&gt;Object&lt;/tt&gt;, constraining
rest to be a sequence eliminates a number of edge cases in the language that arise
when you allow the rest of a list to be something other than a list itself. This altered 
representation also fits in nicely with Clojure's host JVM: there's nothing that says
&lt;tt&gt;ISeq&lt;/tt&gt; has to be implemented by a two-pointer cell. Indeed, Clojure 
&lt;a href=&quot;http://reddit.com/r/programming/info/68sll/comments/c036rc6&quot;&gt;
&lt;i&gt;&quot;also implements first and rest for vectors, strings, arrays, maps, Java Iterables,
lazily calculated and infinite sequences etc.&quot;&lt;/i&gt;&lt;/a&gt;  All these implementations of
&lt;tt&gt;ISeq&lt;/tt&gt; make it easier for Clojure sequences to interoperate with Java,
and make it easier to build a sequence library that works on all kinds of sequences.
What's lost with this choice is the ability to use a cons cell as an informal two-element
structure.  Even then, this style of &lt;tt&gt;first&lt;/tt&gt; and &lt;tt&gt;rest&lt;/tt&gt; could co-exist
with implementations of &lt;tt&gt;car&lt;/tt&gt; and &lt;tt&gt;cdr&lt;/tt&gt; that work the 'old way'.
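&lt;br&gt;&lt;br&gt;

The same idea can be sketched outside the JVM. The Python below is an eager, simplified analogue
of the Java above, not Clojure's implementation: a hypothetical &lt;tt&gt;seq&lt;/tt&gt; helper turns any
iterable into a tuple (or &lt;tt&gt;None&lt;/tt&gt; when empty), so &lt;tt&gt;rest&lt;/tt&gt; can only ever produce
another sequence.

&lt;pre class=&quot;syntax&quot;&gt;
def seq(x):
    # Any iterable becomes a tuple; an empty or missing one becomes None.
    items = tuple(x) if x is not None else ()
    return items if items else None

def first(x):
    s = seq(x)
    return None if s is None else s[0]

def rest(x):
    s = seq(x)
    return None if s is None else s[1:]

# The same two functions work on lists, strings, and generators alike.
examples = [first([1, 2, 3]), rest('abc'), first(n * n for n in range(3, 6))]
&lt;/pre&gt;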

&lt;br&gt;&lt;br&gt;

So in the end, maybe &lt;tt&gt;car&lt;/tt&gt; and &lt;tt&gt;cdr&lt;/tt&gt; are all right. They aren't the best
names in the world, but they fit nicely the semantics of an unrestricted two-pointer
cons cell. For those cases where you really are finding the first and rest of a 
list of cons cells, it is easy to use the &lt;tt&gt;first&lt;/tt&gt; and &lt;tt&gt;rest&lt;/tt&gt; functions in
lieu of &lt;tt&gt;car&lt;/tt&gt; and &lt;tt&gt;cdr&lt;/tt&gt;. Then, for dialects like Clojure, your code is
automatically portable to other sequence types. If you're using cons cells as an
ad hoc structure, then you can either use &lt;tt&gt;car&lt;/tt&gt; and &lt;tt&gt;cdr&lt;/tt&gt; and accept those
names as historical baggage of the second oldest major programming language, or
investigate some of the other more modern ways of structuring data in Lisp programs.</description>
  </item>
  <item>
    <title>A correction and another blog.</title>
    <link>http://www.mschaef.com/blog/2008/02/14#correction_and_blog</link>
    <description>
&lt;list&gt;
&lt;li&gt;There was a copy/paste error in the version of &lt;a 
  href=&quot;http://www.mschaef.com/blog/tech/programming/ant-up.html&quot;&gt;ant-up&lt;/a&gt; 
  I posted a while ago. It has now been corrected.
&lt;li&gt;I ran across &lt;a href=&quot;http://www.adamhoughton.com&quot;&gt;Adam Houghton's
   blog&lt;/a&gt; the other day. It looks pretty interesting and there's 
   software to download (which is more than I can say right now). The blog 
   seems to currently focus on Apple/Java/AJAX related content. The iPhone
   based javadoc viewer looks particularly interesting, for those of us 
   not interested in carrying around a library.
&lt;li&gt;Also, on a totally different note is &lt;a 
    href=&quot;http://www.autoblog.com/&quot;&gt;Autoblog&lt;/a&gt;, a 'professional' blog
    covering automotive news. It's updated fairly often too.
&lt;/list&gt;</description>
  </item>
  <item>
    <title>Renaming historical users (svn:author's) in SVN repositories</title>
    <link>http://www.mschaef.com/blog/2008/02/11#renaming-svn-users</link>
    <description>
I've been keeping track of the vCalc source code in an SVN
repository since May of 2005. While I'm the only person who has
ever committed code into the repository, I've developed vCalc on
three or four machines, with different usernames on each
machine. Since SVN records usernames with each commit, these
historical usernames show up in each &lt;a
href=&quot;http://svnbook.red-bean.com/en/1.0/re15.html&quot;&gt;&lt;tt&gt;svn
log&lt;/tt&gt;&lt;/a&gt; or &lt;a
href=&quot;http://svnbook.red-bean.com/en/1.0/re02.html&quot;&gt;&lt;tt&gt;svn
blame&lt;/tt&gt;&lt;/a&gt;. &lt;tt&gt;svn blame&lt;/tt&gt; is particularly bad because it
displays a code listing with the username prepended to each line
in a fixed-width gutter. Usernames that are very long can exceed
the width of the gutter and push the code over to the right.
Fortunately, changing
historical usernames isn't that hard, if you have administrator
rights on your SVN repository.

&lt;br&gt;&lt;br&gt;

SVN stores the name of a revision's committer in a &lt;a
href=&quot;http://svnbook.red-bean.com/en/1.4/svn.advanced.props.html&quot;&gt;revision
property&lt;/a&gt; named &lt;tt&gt;svn:author&lt;/tt&gt;. If you're not familiar
with the term, a revision property is a blob of out-of-band data
that SVN attaches to the revision. In addition to the author of a
commit, they're also used to store the commit log message, and,
via SVN's &lt;a
href=&quot;http://svnbook.red-bean.com/en/1.0/re23.html&quot;&gt;&lt;tt&gt;propset&lt;/tt&gt;&lt;/a&gt;
and &lt;a
href=&quot;http://svnbook.red-bean.com/en/1.0/re21.html&quot;&gt;&lt;tt&gt;propget&lt;/tt&gt;&lt;/a&gt;
commands, user-provided custom metadata for the
revision. Changing the name of a user associated with a commit
basically amounts to using &lt;tt&gt;propset&lt;/tt&gt; to update the
&lt;tt&gt;svn:author&lt;/tt&gt; property for a revision. The command to do
this is structured like so:

&lt;pre class=&quot;syntax&quot;&gt;
svn propset svn:author --revprop -r&lt;i&gt;rev-number&lt;/i&gt; &lt;i&gt;new-username&lt;/i&gt; &lt;i&gt;repository-URL&lt;/i&gt;
&lt;/pre&gt;

If this works, you are all set, but what is more likely to happen
is the following error:

&lt;pre class=&quot;syntax&quot;&gt;
svn: Repository has not been enabled to accept revision propchanges;
ask the administrator to create a pre-revprop-change hook
&lt;/pre&gt;

By default, revision property changes are disabled. This makes
sense if you are at all interested in using your source code
control system to satisfy an auditing requirement. Changing the
author of a commit would be a great way for a developer to cover
their tracks, if they were interested in doing something
underhanded. Also, unlike most other aspects of a project managed
in SVN, &lt;i&gt;revision properties have no change tracking&lt;/i&gt;: They
are the change tracking mechanism for everything else.  Because
of the security risks, enabling changes to revision properties
requires establishment of a guard hook: an external procedure
that is consulted whenever someone requests that a revision
property be changed. Any policy decisions about who can change
what revision property when are implemented in the hook
procedure.
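&lt;br&gt;&lt;br&gt;

Purely as an illustration of what such a policy might look like, here is a Python sketch of the
decision a hook could make. As I understand it, SVN passes the hook the repository path,
revision, user, property name, and action; the administrator list and the function name below
are assumptions made up for the example.

&lt;pre class=&quot;syntax&quot;&gt;
# Hypothetical policy: anyone may modify a log message, but only listed
# administrators may modify svn:author. Everything else is refused.
ADMINS = {'admin'}    # assumed account name, purely for illustration

def allow_revprop_change(user, propname, action):
    if action != 'M':               # 'M' means modify an existing property
        return False
    if propname == 'svn:log':
        return True
    return propname == 'svn:author' and user in ADMINS

# A real hook script would call this with its command line arguments and
# exit with status 0 to allow the change or nonzero to refuse it.
&lt;/pre&gt;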

&lt;br&gt;&lt;br&gt;

Hooks in SVN are stored in the &lt;tt&gt;hooks/&lt;/tt&gt; directory under
the repository toplevel. Conveniently, SVN provides a sample
implementation of the hook we need in the shell script
&lt;tt&gt;pre-revprop-change.tmpl&lt;/tt&gt;, but the sample implementation
also has strict defaults about what can be changed, allowing only
the log message to be set:

&lt;pre class=&quot;syntax&quot;&gt;
if [ &quot;$ACTION&quot; = &quot;M&quot; -a &quot;$PROPNAME&quot; = &quot;svn:log&quot; ]; then exit 0; fi

echo &quot;Changing revision properties other than svn:log is prohibited&quot; &amp;gt;&amp;2
exit 1
&lt;/pre&gt;

The sample script can be enabled by renaming it to
&lt;tt&gt;pre-revprop-change&lt;/tt&gt;. It can be made considerably more lax
by adding an &lt;tt&gt;exit 0&lt;/tt&gt; before the lines I list above. At
this point, the property update command should work, although if
you're at all interested in the security of your repository, it
is best to restore whatever revision property policy was in place
as soon as possible.
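&lt;br&gt;&lt;br&gt;

With a permissive hook in place, renaming a user across a whole repository means repeating the
&lt;tt&gt;propset&lt;/tt&gt; command once per affected revision. The Python sketch below only builds those
command lines. The revision-to-author mapping is a hypothetical input you would collect yourself
(for example from &lt;tt&gt;svn log&lt;/tt&gt;), and the usernames and URL are placeholders.

&lt;pre class=&quot;syntax&quot;&gt;
def rename_commands(authors_by_rev, old_name, new_name, repo_url):
    # One propset command (as an argument list) per revision by old_name.
    commands = []
    for rev in sorted(authors_by_rev):
        if authors_by_rev[rev] == old_name:
            commands.append(['svn', 'propset', 'svn:author', '--revprop',
                             '-r' + str(rev), new_name, repo_url])
    return commands

# Placeholder data: revisions 1 and 3 were committed under the old name.
commands = rename_commands({1: 'olduser', 2: 'newuser', 3: 'olduser'},
                           'olduser', 'newuser', 'file:///path/to/repo')
&lt;/pre&gt;

Each argument list can be handed to &lt;tt&gt;subprocess.run&lt;/tt&gt; once you are happy with it.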
</description>
  </item>
  <item>
    <title>Excel, CR/LF, and CSV</title>
    <link>http://www.mschaef.com/blog/2008/02/01#cr-lf</link>
    <description>
I've spent a fair amount of time lately working with code that generates 
Comma Separated Value files for loading into Excel.  You'd think the 
format would be trivial, but &lt;a 
href=&quot;http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm#EmbedBRs&quot;&gt;not quite&lt;/a&gt;. One 
additional subtlety, one not covered in that 'specification', is Excel's inconsistent handling 
of end of line markers. As it turns out, if Excel loads a CSV file that contains a quoted, 
multi-line value, it expects a different line feed convention within the quoted value than the 
usual CR/LF. A CR embedded in a quoted field renders as a box, rather than as part of a newline. 
To suppress the box, CSV files for Excel need to be written with a LF-only convention within 
quoted values. Even then, Excel will not automatically expand rows containing a multi-line 
value. That has to be done manually.
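&lt;br&gt;&lt;br&gt;

In code, this amounts to using two newline conventions in one file. Here is a minimal Python
sketch using the standard &lt;tt&gt;csv&lt;/tt&gt; module. The function names are mine, and it reflects
only the behavior described above rather than anything Excel documents.

&lt;pre class=&quot;syntax&quot;&gt;
import csv
import io

def lf_only(value):
    # Inside a field, collapse CR/LF and bare CR down to a single LF.
    return str(value).replace('\r\n', '\n').replace('\r', '\n')

def to_excel_csv(rows):
    # Rows end in CR/LF; line breaks inside quoted fields are LF only.
    out = io.StringIO(newline='')
    writer = csv.writer(out, lineterminator='\r\n')
    for row in rows:
        writer.writerow([lf_only(field) for field in row])
    return out.getvalue()

text = to_excel_csv([['a', 'one\r\ntwo'], ['b', 'plain']])
&lt;/pre&gt;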

&lt;br&gt;&lt;br&gt;

Internally, Excel seems to follow the same LF-only convention that this issue with CSV files 
seems to imply.  Taking the &lt;tt&gt;CODE(...)&lt;/tt&gt; of each character in a manually entered 
multi-line cell value shows only one character, a LF, at each line break. My guess is that the 
quotes in a CSV file just act as a signal to turn off &lt;b&gt;all&lt;/b&gt; special character handling, not 
just handling that signals new rows and cells. Either way, it's more than a little irritating 
that Excel compatible CSV files with multi-line values have to have two separate end of line 
conventions.</description>
  </item>
  <item>
    <title>Still not tested... still not working... sort of...</title>
    <link>http://www.mschaef.com/blog/2008/01/21#still_not_tested</link>
    <description>
Another one along the lines of &lt;a 
href=&quot;http://www.mschaef.com/blog/tech/programming/not_tested_not_working.html&quot;&gt;My 
last post&lt;/a&gt;. I tried to compile this source file today, using the 
compiler in my little Lisp:

&lt;pre class=&quot;syntax&quot;&gt;
(define (values . args) (%panic &quot;roh roh&quot;))

(define (test x) (+ x 1))
&lt;/pre&gt;

I got the following result:

&lt;pre class=&quot;syntax&quot;&gt;
d:\test&gt;vcsh -c test.scm
;;;; VCSH, Debug Build (SCAN 0.99 - Dec 17 2007 16:47:30)

; Info: Loading Internal File: fasl-compiler
; Info: Package 'fasl-compiler' created
; Info: Loading Internal File: fasl-write
; Info: Package 'fasl-write' created
; Info: Loading Internal File: fasl-compiler-run
; Info: Package 'fasl-compiler-run' created
; Info: stack limit disabled!
Fatal Error: roh roh @ (error.cpp:168)
&lt;/pre&gt;

Needless to say, fatal errors still aren't any good. However, this one is 
a bit more interesting than a simple type checking problem. The function 
&lt;tt&gt;%panic&lt;/tt&gt; is the internal function used to signal fatal errors from 
Lisp code. The first definition above redefines &lt;tt&gt;values&lt;/tt&gt;, the 
function to return multiple return values, so that it always panics with a 
fatal error. This is the kind of thing that, if done in a running 
environment, would break things almost immediately.

&lt;br&gt;&lt;br&gt;

But the compiler is slightly different: it isolates the program being 
compiled from the compiler itself. This is done to keep redefinitions that 
might break the currently running compiler from doing just that. 
Redefinitions by the compiled program are only supposed to be visible to 
the compiled program. Since the above program never itself invokes 
&lt;tt&gt;values&lt;/tt&gt;, it should never hit the call to &lt;tt&gt;%panic&lt;/tt&gt;... except 
that it does.

&lt;br&gt;&lt;br&gt;

What's happening here lies in the processing of the second definition. The 
definition itself is transformed a couple times by macroexpansion, first 
to this:

&lt;pre class=&quot;syntax&quot;&gt;
(%define test (named-lambda test (x) (+ x 1)))
&lt;/pre&gt;

And then, basically, to this:

&lt;pre class=&quot;syntax&quot;&gt;
(%define test (%lambda ((name . test) (lambda-list x)) (x) (+ x 1)))
&lt;/pre&gt;

The second macroexpansion step is the step that looks for optional 
arguments, and the internal function that parses lambda lists for optional 
arguments returns three values using &lt;tt&gt;values&lt;/tt&gt;. This invocation of 
&lt;tt&gt;values&lt;/tt&gt; happens in the environment of the program being compiled, 
so it hits the new &lt;tt&gt;%panic&lt;/tt&gt;-invoking definition and the whole show 
grinds to a halt. The 'easy' fix, ensuring that macro expansion is 
isolated from potentially harmful redefinitions, won't work. Macro 
expansion has to happen in the user environment, so that macros can see 
function definitions that they might rely upon.
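&lt;br&gt;&lt;br&gt;

The failure is easy to reproduce in miniature. This Python toy is not the vCalc compiler: the
dictionaries, function names, and the three placeholder results are all invented to show how a
helper that resolves &lt;tt&gt;values&lt;/tt&gt; in the user environment picks up the redefinition.

&lt;pre class=&quot;syntax&quot;&gt;
# Toy model: two environments, each a dict from names to functions.
def real_values(*args):
    return args

compiler_env = {'values': real_values}
user_env = dict(compiler_env)      # the compiled program starts with a copy

def parse_lambda_list(env, params):
    # Stand-in for the helper that returns three results through 'values'.
    return env['values'](params, [], None)

def panic(*args):
    raise RuntimeError('roh roh')

user_env['values'] = panic         # the program being compiled redefines it

def expands_safely(env):
    try:
        parse_lambda_list(env, ['x'])
        return True
    except RuntimeError:
        return False
&lt;/pre&gt;

Expanding against &lt;tt&gt;compiler_env&lt;/tt&gt; succeeds, while expanding against &lt;tt&gt;user_env&lt;/tt&gt;
hits the panicking definition.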

&lt;br&gt;&lt;br&gt;

I don't have a unit test for the user/compiler separation logic, so I 
thought when I started this blog post I was going to say something like: 
'look, something else fundamentally broken, and without a test case'. 
That's interesting, but if you need convincing to write unit tests, you're 
probably already lost. What I actually learned while researching this post 
is a bit more subtle: it's a fundamental problem, but it's more about the 
design than the code itself.  While the design I have for user/compiler 
separation seems to work most of the time, it's not adequate to solve this 
kind of problem. I'm not yet exactly sure what the solution is, but it 
won't necessarily involve a missing unit test.</description>
  </item>
  <item>
    <title>Not tested? Then it doesn't work.</title>
    <link>http://www.mschaef.com/blog/2008/01/20#not_tested_not_working</link>
    <description>
The other day, I had the following (abbreviated) dialog with my little 
Scheme interpreter:

&lt;br&gt;&lt;br&gt;

&lt;pre class=&quot;syntax&quot;&gt;
scheme&gt; (intern! 'xyzzy2 (find-package &quot;keyword&quot;))
; Fatal Error: Assertation Failed: STRINGP(pname) @ (oblist.cpp:451)
c:\vcalc&gt;vcsh.exe

scheme&gt; (intern! 12)
; Fatal Error: Assertation Failed: STRINGP(sym_name) @ (oblist.cpp:269)
c:\vcalc&gt;
&lt;/pre&gt;

&lt;br&gt;

Needless to say, 'Fatal errors' aren't good things, and fatal errors 
in &lt;tt&gt;intern!&lt;/tt&gt;, a core function, are even worse. Without going 
into too many details, the first call should be returning 
successfully, and the second should be throwing a runtime type check 
error. However, the implementation of &lt;tt&gt;intern!&lt;/tt&gt; wasn't 
checking argument types and was passing invalid arguments into lower 
layers of the interpreter's oblist (symbol table) code, which died 
with an assertion failure.
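&lt;br&gt;&lt;br&gt;

The shape of the fix, and of the test that was missing, can be sketched in a few lines of Python.
This is a hypothetical symbol table keyed by strings, not the interpreter's oblist code; it only
shows the check happening at the public entry point.

&lt;pre class=&quot;syntax&quot;&gt;
# Check argument types at the entry point so bad input raises an ordinary
# error instead of tripping an assertion deeper in the symbol table.
def intern(name, table):
    if not isinstance(name, str):
        raise TypeError('intern: expected a string, got ' + type(name).__name__)
    return table.setdefault(name, ('symbol', name))

def raises_type_error(bad_name):
    try:
        intern(bad_name, {})
        return False
    except TypeError:
        return True

table = {}
same_symbol = intern('xyzzy2', table) is intern('xyzzy2', table)
&lt;/pre&gt;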

&lt;br&gt;&lt;br&gt;

To put this in perspective, my implementation of &lt;tt&gt;intern!&lt;/tt&gt; is 
about five years old, and something that I thought to be a fairly 
reliable piece of code. At the very least, I didn't think it was 
susceptible to something as simple as a type checking error that
would crash the entire interpreter. Of course, when I looked at my 
test suite, there wasn't a set of tests for &lt;tt&gt;intern!&lt;/tt&gt;. That
might have something to do with it, don't you think?  

&lt;br&gt;&lt;br&gt;

Here are the morals I'm taking from this little story:

&lt;br&gt;&lt;br&gt;

&lt;list&gt;

&lt;li&gt;Do not assume something works, unless you have a complete
    test suite for it. (Even then be wary, because your test suite
    is probably not complete.)

&lt;li&gt;Shoot for more than 60% code coverage on your test cases.

&lt;li&gt;Don't write your own interpreter, because there are probably
    hundreds of other bugs just like this one. :-)

&lt;/list&gt;
</description>
  </item>
  <item>
    <title>The programmer's 'food' pyramid.</title>
    <link>http://www.mschaef.com/blog/2008/01/17#food_pyramid</link>
    <description>
I don't usually write posts for the sole purpose of linking to other 
posts, but this is an exception. &lt;a 
href=&quot;http://osteele.com/archives/2008/01/programmers-pyramid&quot;&gt;This&lt;/a&gt; is 
brilliant. It is the &lt;a href=&quot;http://www.mypyramid.gov/&quot;&gt;USDA's 
Food Pyramid&lt;/a&gt;, adapted to how programmers should spend their time.  
My one complaint is that it's way too focused on coding.  My experience 
has been that it really pays to spend time on design work and learning 
how to better interact with others, be they clients or team-mates. If you 
can design your way out of a rewrite, or work with your client to recast 
requirements to save complexity, it can be far more cost effective than 
even the best raw code.</description>
  </item>
  <item>
    <title>Cingular 2125 Followup</title>
    <link>http://www.mschaef.com/blog/2008/01/12#c2125_followup</link>
    <description>
Last June, I wrote &lt;a href=&quot;http://www.mschaef.com/blog/tech/products/c2125.txt&quot;&gt;a bit&lt;/a&gt;
on my experiences with the Cingular 2125 Windows Smartphone. After more than a year, the 
phone has been a good choice, but there have been several surprises, for both the good and 
the bad.

&lt;br&gt;&lt;br&gt;

&lt;list&gt;
&lt;li&gt;This is the first phone I've used with a web browser that's usable for 
    general web surfing. Most sites render reasonably correctly, and the display
    is large enough to contain a useful amount of content. It's still not perfect:
    the browser crashes too often and it is difficult to log into
    &lt;a href=&quot;http://reddit.com/&quot;&gt;reddit&lt;/a&gt;, but this is a vast improvement
    over conventional phones.

&lt;li&gt;I installed a 1GB SD Card that is borderline useless. This might be different
    if I'd been more aggressively installing software or music, but as it is, the
    primary benefit of having a card like this is that I can now take 40,000
    pictures before I run out of space.

&lt;li&gt;Outlook integration is still incredibly useful, but it's been harder
    to keep the calendar in sync than I thought. This is probably due
    to the fact I get lots of meeting invites that change, but it's made
    it difficult to rely on the phone as the 'authoritative' source for
    my scheduling information I hoped it would be.

&lt;li&gt;J2ME is a non-starter on this phone. There is a JVM, but it's buried under
    a submenu and the applications running on it look more like 'steerage class'
    than 'first class' citizens of the phone. They aren't integrated with the 
    main application launcher, and their interfaces look like something out of
    1988. I really get the impression that the phone has J2ME solely for the 
    purpose of selling into corporate clients with a requirement to run custom
    J2ME code.

&lt;li&gt;Given the power of the underlying hardware and the quality of the display,
    I was hoping to find more games for the phone. My previous two phones both
    had small collections of J2ME games purchased through my service provider's
    web site. AT&amp;T has finally started adding games for this phone to their
    site, but the selection is limited, expensive, and not that great. I did
    at least find a few games elsewhere that are pretty fun, &lt;a 
    href=&quot;http://www.isotope244.com/atomic_cannon_pocket.html&quot;&gt;
    Atomic Cannon&lt;/a&gt; and &lt;a href=&quot;http://www.nethack.org/v343/ports/download-wince.html&quot;&gt;
    Nethack&lt;/a&gt;. These were both pretty easy to install. Atomic Cannon,
    in particular, demonstrates the graphics of the phone quite well.

&lt;li&gt;I don't use the 'Phone as Modem' capability at all. I don't have as many places
    where I need to use it as I thought. That said, it does work, and would be
    a nice way to check mail in a pinch.

&lt;/list&gt;</description>
  </item>
  <item>
    <title>Ant-up</title>
    <link>http://www.mschaef.com/blog/2008/01/09#ant-up</link>
    <description>
In my career, I've done a bit of switching back and forth between &lt;a
href=&quot;http://www.gnu.org/software/emacs/&quot;&gt;Emacs&lt;/a&gt; and various
IDEs. One of the IDE features I've come to depend on is quick access
to the compiler. Typically, IDEs make it possible to compile your
project with a keystroke, and then navigate from error to error at the
press of a key. It's easy to recreate this in Emacs. The following two
expressions make Emacs work a lot like Visual Studio in this regard.

&lt;pre class=&quot;syntax&quot;&gt;
(global-set-key [(shift f5)] 'compile)
(global-set-key [f12] 'next-error)
&lt;/pre&gt;

After these forms are evaluated, pressing Shift-F5 invokes the
&lt;tt&gt;compile&lt;/tt&gt; command, which asks for a command to be run in an
inferior shell, typically &lt;tt&gt;make&lt;/tt&gt;, &lt;tt&gt;ant&lt;/tt&gt;, or some other
build utility. The catch is that it runs the command in the directory
of the current buffer, which implies that the build script can be
found in the same directory as the current source file. For a Java
project with a per-package directory hierarchy, this is often not
true. There are probably a bunch of ways to fix this, but I've solved
it with a Windows NT batch file, &lt;tt&gt;ant-up.bat&lt;/tt&gt;, that repeatedly
searches up the directory hierarchy for &lt;tt&gt;build.xml&lt;/tt&gt;. I just
compile with &lt;tt&gt;ant-up&lt;/tt&gt;, rather than a direct invocation of
&lt;tt&gt;ant&lt;/tt&gt;. This is not the most elegant solution, I'm sure, but it
took about five minutes to implement and works well.

&lt;pre class=&quot;syntax&quot;&gt;
@echo off

setlocal


:retry

set last_path=%CD%

echo Searching %CD% ...

if exist build.xml goto compile-now

cd ..

if &quot;%last_path%&quot;==&quot;%CD%&quot; goto abort

goto retry

:compile-now

call ant -k %1 %2 %3 %4 %5

if errorlevel 1 goto failure

goto success

:abort

echo build.xml not found... compile failed

:failure

exit /b 1

:success

exit /b 0
&lt;/pre&gt;

</description>
  </item>
  <item>
    <title>Function Call Interfaces and Dynamic Typing</title>
    <link>http://www.mschaef.com/blog/2008/01/08#function-call-interfaces-and-dynamic-typing</link>
    <description>
Lately, I've been thinking a bit about the way language design
influences library design. My line of thought started out inspired by
some of the recent conversations about closures in Java, but it ended
up also touching on dynamic typing and a few other 'modern' language
features.  This will end up being more than one post, but I thought
I'd record some of it in my blog, with the hope that it might shed
some light for somebody, somewhere.

&lt;br&gt;&lt;br&gt;

To motivate this discussion, I'll use as an example a simple C
implementation of a string-interning function, &lt;tt&gt;intern_string&lt;/tt&gt;.
If you're not familiar with the concept of interning, the premise is
that interning two objects ensures that if they have the same value,
they also have the same identity. In the case of C strings, interning
ensures that if &lt;tt&gt;strcmp(intern_string(a), intern_string(b)) ==
0&lt;/tt&gt; holds true, then &lt;tt&gt;intern_string(a) == intern_string(b)&lt;/tt&gt;
also holds true.  Since it effectively means that each string value is
only stored one time, this technique can reduce memory
requirements. It also gives you a cheap string equality comparison:
checking two interned strings for equality reduces to a pointer
comparison, which is about as fast as it gets.

&lt;br&gt;&lt;br&gt;

Given a hash table that compares keys by value, implementing the
function &lt;tt&gt;intern_string&lt;/tt&gt; is fairly simple. In the following
code, &lt;tt&gt;intern_table&lt;/tt&gt; is a hash table that maps strings to
themselves. &lt;tt&gt;hash_ref&lt;/tt&gt;, &lt;tt&gt;hash_set&lt;/tt&gt;, and
&lt;tt&gt;hash_has&lt;/tt&gt; are functions that manipulate the hash table:

&lt;br&gt;&lt;br&gt;
	
&lt;list&gt;
&lt;li&gt;&lt;tt&gt;int hash_has(hash_table_t ht, char *key)&lt;/tt&gt; - Returns &lt;tt&gt;TRUE&lt;/tt&gt; or
    &lt;tt&gt;FALSE&lt;/tt&gt;, depending on whether or not the key &lt;tt&gt;key&lt;/tt&gt; is found
    in the hash table &lt;tt&gt;ht&lt;/tt&gt;.

&lt;li&gt;&lt;tt&gt;char *hash_ref(hash_table_t ht, char *key)&lt;/tt&gt; - Returns the
    value bound to the key &lt;tt&gt;key&lt;/tt&gt; by the hash table &lt;tt&gt;ht&lt;/tt&gt;. 
    If the key is not found, behavior is undefined.

&lt;li&gt;&lt;tt&gt;char *hash_set(hash_table_t ht, char *key, char *value)&lt;/tt&gt; - 
    Binds the value &lt;tt&gt;value&lt;/tt&gt; to the key &lt;tt&gt;key&lt;/tt&gt; in the hash
    table &lt;tt&gt;ht&lt;/tt&gt;. If the key is already present, the existing value
    is overwritten. This function returns &lt;tt&gt;value&lt;/tt&gt;.    
&lt;/list&gt;

&lt;br&gt;

Note the critical assumption that the &lt;tt&gt;hash_*&lt;/tt&gt; accessors
implement key comparison by value semantics, &lt;tt&gt;strcmp&lt;/tt&gt;, rather
than identity semantics, &lt;tt&gt;==&lt;/tt&gt;.

&lt;pre class=&quot;syntax&quot;&gt;
   hash_table_t intern_table; // assume this is initialized somewhere else.

   char *intern_string(char *str) 
   {
     if (hash_has(intern_table, str))
        return hash_ref(intern_table, str);
    
     char *interned_str = strdup(str);
  
     hash_set(intern_table, interned_str, interned_str);

     return interned_str;
   }
&lt;/pre&gt;


The first step of &lt;tt&gt;intern_string&lt;/tt&gt; is to check to see if the
intern table already contains a string with the value of the new
string. If the new string is already in the intern table, then the
function returns the copy that's in the table.  Otherwise, a new copy
of the incoming string is created and stored in the hash table. In
either case, the string returned is in the intern table.  This
logic ensures that every time &lt;tt&gt;intern_string&lt;/tt&gt; is called with a
&lt;tt&gt;str&lt;/tt&gt; of the same value, it returns the same exact string.
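
&lt;br&gt;&lt;br&gt;

To make the value-comparison assumption concrete, here is a minimal sketch of
how the three accessors might be implemented. Everything in it is an
assumption made for illustration rather than the hash table behind this post:
the chained-bucket layout of &lt;tt&gt;hash_table_t&lt;/tt&gt;, the helpers
&lt;tt&gt;hash_make&lt;/tt&gt;, &lt;tt&gt;hash_string&lt;/tt&gt;, and &lt;tt&gt;hash_find&lt;/tt&gt;,
and the fixed bucket count are all hypothetical, and allocation failures
simply trip an &lt;tt&gt;assert&lt;/tt&gt;.

```c
#include "assert.h"
#include "stdlib.h"
#include "string.h"

#define TRUE 1
#define FALSE 0
#define BUCKET_COUNT 64 /* arbitrary size picked for this sketch */

/* Hypothetical layout: the post never shows what hash_table_t contains. */
struct hash_entry {
    char *key;
    char *value;
    struct hash_entry *next;
};

typedef struct hash_table {
    struct hash_entry *buckets[BUCKET_COUNT];
} *hash_table_t;

/* Hypothetical constructor; calloc leaves every bucket chain empty. */
hash_table_t hash_make(void)
{
    hash_table_t ht = calloc(1, sizeof(struct hash_table));

    assert(ht != NULL); /* sketch-level allocation check */
    return ht;
}

/* Hash the characters of the key (djb2), never the pointer itself. */
static unsigned long hash_string(const char *key)
{
    unsigned long h = 5381;

    while (*key != '\0')
        h = h * 33 + (unsigned char)*key++;
    return h;
}

/* Walk one bucket chain, comparing keys by value with strcmp. */
static struct hash_entry *hash_find(hash_table_t ht, const char *key)
{
    struct hash_entry *e = ht->buckets[hash_string(key) % BUCKET_COUNT];

    while (e != NULL) {
        if (strcmp(e->key, key) == 0)
            return e;
        e = e->next;
    }
    return NULL;
}

int hash_has(hash_table_t ht, char *key)
{
    return hash_find(ht, key) != NULL ? TRUE : FALSE;
}

char *hash_ref(hash_table_t ht, char *key)
{
    /* A missing key is undefined behavior, as described above:
       this version dereferences NULL. */
    return hash_find(ht, key)->value;
}

char *hash_set(hash_table_t ht, char *key, char *value)
{
    struct hash_entry *e = hash_find(ht, key);

    if (e == NULL) {
        unsigned long i = hash_string(key) % BUCKET_COUNT;

        e = malloc(sizeof *e);
        assert(e != NULL); /* sketch-level allocation check */
        e->key = key;
        e->next = ht->buckets[i];
        ht->buckets[i] = e;
    }
    e->value = value; /* overwrites any existing binding */
    return value;
}
```

Because keys are hashed and compared by their characters, two separately
allocated copies of the same text land on the same entry, which is the
property &lt;tt&gt;intern_string&lt;/tt&gt; relies on.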

&lt;br&gt;&lt;br&gt;
 
If you haven't guessed already, the problem with this implementation
of &lt;tt&gt;intern_string&lt;/tt&gt; lies in the dual calls to &lt;tt&gt;hash_has&lt;/tt&gt;
and &lt;tt&gt;hash_ref&lt;/tt&gt;.  Both calls involve searching the hash table
for the key: &lt;tt&gt;hash_has&lt;/tt&gt; to determine if the key exists, and
&lt;tt&gt;hash_ref&lt;/tt&gt; to retrieve the key's value. This means that in the
common case, interning a string that's already been interned, this
implementation searches the hash table twice. Double work.

&lt;br&gt;&lt;br&gt;

Fixing this problem involves changing the calling conventions for
&lt;tt&gt;hash_ref&lt;/tt&gt;. One of the simplest ways to do this involves
defining a specific return value that &lt;tt&gt;hash_ref&lt;/tt&gt; can return in
the 'key not found' case. For strings in C, &lt;tt&gt;NULL&lt;/tt&gt; is a logical
choice. This change to &lt;tt&gt;hash_ref&lt;/tt&gt; makes it possible to remove
the double search by eliminating the explicit &lt;tt&gt;hash_has&lt;/tt&gt; check:

&lt;pre class=&quot;syntax&quot;&gt;
   hash_table_t intern_table;

   char *intern_string(char *str) 
   {
     char *interned_str = hash_ref(intern_table, str);

     if (interned_str == NULL) 
     {   
        interned_str = strdup(str);
  
        hash_set(intern_table, interned_str, interned_str);
     }

     return interned_str;
   }
&lt;/pre&gt;

For string interning, this change to the &lt;tt&gt;hash_ref&lt;/tt&gt; interface
works fairly well. We know that we'll never store a hash key with a
&lt;tt&gt;NULL&lt;/tt&gt; value, so we know that &lt;tt&gt;NULL&lt;/tt&gt; is safe to use for
signaling that a key was not found. Were this ever to change, this
version of &lt;tt&gt;hash_ref&lt;/tt&gt; doesn't return enough information to
distinguish between the 'key not found' case and the '&lt;tt&gt;NULL&lt;/tt&gt;
value' case. Both would return &lt;tt&gt;NULL&lt;/tt&gt;.  To fix this,
&lt;tt&gt;hash_ref&lt;/tt&gt; needs to be extended to also return a separate value
that indicates if the key was found. This can be done in C by having
&lt;tt&gt;hash_ref&lt;/tt&gt; return the 'key found' flag as a return value, and
also accept a pointer to a buffer that will contain the key's value,
if it's found:

&lt;pre class=&quot;syntax&quot;&gt;
   hash_table_t intern_table;

   char *intern_string(char *str) 
   {
     char *interned_str;  

     if (!hash_ref(intern_table, str, &amp;interned_str))
     {   
        interned_str = strdup(str);
  
        hash_set(intern_table, interned_str, interned_str);
     }

     return interned_str;
   }
&lt;/pre&gt;
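
The snippet above shows only the caller. As a rough sketch of the callee
side, the three-argument &lt;tt&gt;hash_ref&lt;/tt&gt; might look something like the
following. The table layout (a single chain of entries standing in for real
hash buckets) and the &lt;tt&gt;hash_make&lt;/tt&gt; constructor are hypothetical,
chosen only to keep the calling convention in focus, and allocation failures
simply trip an &lt;tt&gt;assert&lt;/tt&gt;.

```c
#include "assert.h"
#include "stdlib.h"
#include "string.h"

/* Hypothetical stand-in for the post's hash_table_t: one chain of
   entries instead of real buckets. */
struct hash_entry {
    char *key;
    char *value;
    struct hash_entry *next;
};

typedef struct hash_table {
    struct hash_entry *head;
} *hash_table_t;

/* Hypothetical constructor; calloc leaves the chain empty. */
hash_table_t hash_make(void)
{
    hash_table_t ht = calloc(1, sizeof(struct hash_table));

    assert(ht != NULL); /* sketch-level allocation check */
    return ht;
}

/* Returns 1 and stores the bound value in *value_out if the key is
   present; returns 0 and leaves *value_out untouched otherwise. */
int hash_ref(hash_table_t ht, char *key, char **value_out)
{
    struct hash_entry *e;

    for (e = ht->head; e != NULL; e = e->next) {
        if (strcmp(e->key, key) == 0) {
            *value_out = e->value; /* may legitimately be NULL */
            return 1;
        }
    }
    return 0;
}

char *hash_set(hash_table_t ht, char *key, char *value)
{
    struct hash_entry *e;

    for (e = ht->head; e != NULL; e = e->next) {
        if (strcmp(e->key, key) == 0) {
            e->value = value; /* overwrite the existing binding */
            return value;
        }
    }

    e = malloc(sizeof *e);
    assert(e != NULL); /* sketch-level allocation check */
    e->key = key;
    e->value = value;
    e->next = ht->head;
    ht->head = e;
    return value;
}
```

With this version, a key that was bound to &lt;tt&gt;NULL&lt;/tt&gt; makes
&lt;tt&gt;hash_ref&lt;/tt&gt; return 1 and store &lt;tt&gt;NULL&lt;/tt&gt; through the
pointer, while a key that was never set makes it return 0 and leaves the
caller's variable alone.

&lt;br&gt;&lt;br&gt;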

This is probably about as good as you can get in straight C.  It
easily distinguishes between the 'no-value' and 'no-key' cases, it's
relatively clear to read, and it uses the common idioms of the
language. However, C is a relatively sparse language. If you're
willing to switch to something a bit more expressive, you have other
choices.

&lt;br&gt;&lt;br&gt;

One example of this is a choice that's particularly well supported by
dynamically typed languages.  With a language that can identify types
at runtime, it becomes possible for &lt;tt&gt;hash_ref&lt;/tt&gt; to return values
of a different type if the key is not found. This value can be
distinguished from other return values by virtue of the run time type
identification supported by the language. In one such language,
Scheme, this lets us implement &lt;tt&gt;intern-string&lt;/tt&gt; like this:

&lt;pre class=&quot;syntax&quot;&gt; 
   (define *intern-table* (make-hash :equal))

   (define (intern-string str)
    (let ((interned-str (hash-ref *intern-table* str 42)))
     (cond ((eqv? interned-str 42)
             (hash-set! *intern-table* str str)
              str)
           (#t
             interned-str))))
&lt;/pre&gt;

If you prefer C/JavaScript-style syntax, it looks like this:

&lt;pre class=&quot;syntax&quot;&gt; 
   var intern_table = make_hash(EQUAL);

   function intern_string(str)
   {
      var interned_str = hash_ref(intern_table, str, 42);

      if (interned_str === 42)
      {
          hash_set(intern_table, str, str);
          return str;
      }

      return interned_str;
   }
&lt;/pre&gt;


In this case, &lt;tt&gt;hash_ref&lt;/tt&gt; has been extended with a third
argument: a default return value if the key is not found. The above
code uses this to have &lt;tt&gt;hash_ref&lt;/tt&gt; return a number in the 'no-key'
case, and it's the type of this return value that ensures its
distinctness. This is a common dynamic language idiom, but for a
moment, consider what it would look like in C:

&lt;pre class=&quot;syntax&quot;&gt;
   hash_table_t intern_table;

   char *intern_string(char *str) 
   {
     char *interned_str = hash_ref(intern_table, str, (char *)42);

     if (interned_str == (char *)42) 
     {   
        interned_str = strdup(str);
  
        hash_set(intern_table, interned_str, interned_str);
     }

     return interned_str;
   }
&lt;/pre&gt;

At first, this actually seems like it might be a plausible implementation
of &lt;tt&gt;intern_string&lt;/tt&gt;. My guess is that it might even work most of
the time. Where this implementation gets into trouble is the case when
an interned string might reasonably be located at address 42. Because
C is statically typed, when &lt;tt&gt;hash_ref&lt;/tt&gt; returns, all it returns
is a pointer. The caller cannot distinguish between the 'valid string
at address 42' return value and the 'no-key' return value.  This is
basically the same problem as the case where we overloaded &lt;tt&gt;NULL&lt;/tt&gt;
to signal 'no-key'.

&lt;br&gt;&lt;br&gt;

The way the dynamically typed language solved this problem is worth
considering. When a dynamically typed language passes a value, what
it's really doing is passing a pointer &lt;i&gt;along with a few extra
bits describing the type of the object&lt;/i&gt; being pointed to. (Runtime
implementations might vary, but that's the gist of many.)  Using
dynamic typing to distinguish between those two possible cases really
amounts to using those few extra type bits to contain 'another' return
value, one holding information on whether or not the key was found.
&lt;i&gt;This is exactly what our 'best' C implementation does explicitly with
a return value and a reference value.&lt;/i&gt; The dynamic typing isn't
necessarily adding any expressive power, but it is giving another,
concise means of expressing what we're trying to say.</description>
  </item>
  </channel>
</rss>