Mike Schaeffer's Weblog
Fri, 29 Jul 2005
Thu, 28 Jul 2005
Now that I've written a little about why you might want to replace Excel AutoFilter, here's how to actually do it. To frame the discussion, there are two problems to solve:
• Deciding which rows of the input set are part of the result set
• Displaying the result set in a contiguous sequence of spreadsheet rows.
The first problem is easy: add another column alongside the input set with a formula that evaluates to TRUE if the row belongs in the result. This can be any valid Excel formula: it can include complex logic, and it can depend on other cells containing control parameters. In my example spreadsheet, this formula is in column H, labeled In Query?.

The tricky bit of the formula-based filter is the second problem: displaying the result set in a contiguous range of rows with no gaps. Each cell that might display part of the result set has to figure out for itself which part of the result set to display, if any, and pull the data from the input set. A simple MATCH or LOOKUP can't handle this, since neither can be told to return the second, third, or nth match. They return only the first match, which isn't enough for what we're trying to do.

As it turns out, even though having the result set compute a mapping from the input set is quite hard, solving the reverse problem isn't too bad. Having the input set compute the mapping to the result set is easy. Here's how it works, by column:
• Ord. - The row ordinal number of the row in the input set, starting with 1.
• Result Ord. - This column starts at zero, in the row preceding the first row of the result set, and increments by 1 for each row where In Query? is TRUE. For each row with In Query? of TRUE, this column is the row ordinal number of this row in the result set. We are almost there.
• Result Rows. - The input row ordinal of each row in the output set. This is done by using MATCH to find the first row for each number in the Result Ord. column.
Once the Result Rows. column has been calculated, populating the actual result set is just a matter of using INDEX. ISERROR can be called on cells in Result Rows. to identify rows that don't contain values. After all this is said and done, we have a spreadsheet range that contains only a result set, updates like every other range in Excel, and can be used in formulas like every other range. I have a sample spreadsheet that implements a lot of this here.
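For readers who would rather see the column logic outside of Excel, here is a rough Python sketch of the same idea. It is not taken from the sample spreadsheet: the function name `formula_filter`, the sample data, and the use of `None` in place of an Excel error value are all assumptions made for illustration. Each list stands in for one helper column.

```python
def formula_filter(rows, in_query):
    """Filter rows using the helper-column approach described above."""
    # "In Query?" column: one TRUE/FALSE flag per input row.
    flags = [bool(in_query(row)) for row in rows]

    # "Ord." column: 1-based ordinal of each input row.
    ords = list(range(1, len(rows) + 1))

    # "Result Ord." column: a running count that starts at zero and
    # goes up by 1 on every row whose flag is TRUE.
    result_ord = []
    count = 0
    for flag in flags:
        if flag:
            count += 1
        result_ord.append(count)

    # "Result Rows." column: for output position n, the FIRST input
    # ordinal whose Result Ord. equals n (what an exact MATCH returns).
    # None stands in for the error value that ISERROR would detect.
    result_rows = []
    for n in range(1, len(rows) + 1):
        try:
            result_rows.append(ords[result_ord.index(n)])
        except ValueError:
            result_rows.append(None)

    # INDEX step: pull each result row out of the input set,
    # skipping output positions that have no matching input row.
    return [rows[r - 1] for r in result_rows if r is not None]


# Hypothetical sample data: (item, quantity) pairs.
data = [("apples", 3), ("pears", 12), ("plums", 7), ("figs", 15)]
filtered = formula_filter(data, lambda row: row[1] > 5)
print(filtered)
```

The trick that makes this work is that the running count only changes on rows that belong in the result, so the first row carrying a given count is always a matching row. That is why a plain first-match lookup is enough here.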

Wed, 27 Jul 2005
As rumored, Apple just refreshed the iBook. The other rumor, the one about a new chassis and a widescreen display, did not come true. Between that and Apple's desire not to encroach too much on the PowerBooks, there wasn't much headroom for major upgrades:
• Sudden motion sensing for the disk. (Is this done by the disk itself with a built-in motion sensor or by the motherboard/CPU?)
• Standard Bluetooth
• A minor speed bump: the peak CPU is now a 1.42GHz G4 with a 142MHz bus.
I was hoping for more, but given Apple's total lack of maneuvering room in the laptop space, this is an understandable bump. If they upgraded the iBook too much, there'd be little reason to pay extra for the PowerBook. Since they can't upgrade the PowerBook too much (thanks to the stagnant G4), they have a natural cap on the features in the iBook. Thus, Apple is restricted to selling up its five-year-old laptop with slogans like "a fast 133MHz or 142MHz system bus" (fast? Dell's $500 Inspiron 1200 runs its system bus at 400MHz) and "brilliant 1024 by 768 pixel resolution" (maybe it was brilliant five years ago).

Anyway, I've recently come to have a theory on the limited display resolution of Apple's notebooks. It seems obvious in retrospect, but Apple can't scale up the display resolution since they don't have the CPU or memory bandwidth to support higher resolutions as well as they want. With modern display stacks like Quartz and Quartz Extreme, pushing pixels around is one of the biggest user-visible performance burdens on a modern machine (hence, "the snappy"). While a GPU can help, there's no getting around the fact that if they doubled the resolution, they'd double the number of bytes their system has to process to render the same-sized desktop on the screen. Given that Apple's best G4s have less than half the main memory bandwidth of the lowest-end Centrinos, it's no wonder Apple's not champing at the bit to eat up more of their bus.

Since Apple's first wave of Centrino laptops should bring fixes for all of this, the computing community has some pretty amazing hardware to look forward to in a year or so.

Mon, 18 Jul 2005
I think that the weaknesses of the Excel AutoFilter turn out to be pretty typical of Excel in general.

To me, the brilliance of the spreadsheet was that it took a data model that business people were familiar with, the accountant's paper spreadsheet, and layered on automatic computation and reporting facilities in a natural way. There's something very intuitive about going to cell C1, entering =A1+B1, and then having C1 contain the sum of the other two cells, automatically updated as the source cells change. It just makes sense, and is at the very core of every software spreadsheet dating back to the first, VisiCalc.

For years, spreadsheets worked at making this model work better. Lotus 1-2-3 introduced something called natural recalculation order that made it easier to follow the logic of spreadsheet calculation. Somewhere along the way, spreadsheets started doing limited recalculation, where formulas whose inputs didn't change weren't recalculated (thus saving time). New intrinsic functions were added, and Excel made a huge stride when it added array formulas: individual formulas that can produce more than one result. The gateway to user-defined functions written in Visual Basic was another huge win.

The core strength of all of these ideas is that they rely on and extend the core concept of the software spreadsheet: the software tracks dependencies between cells and automatically recalculates the appropriate results as necessary. As powerful as that concept is, Microsoft lost the plot somewhere around Excel 4 or 5 and keeps sinking money and effort into features that don't fully participate:
• Excel has two data filter features: neither one can automatically update a table as a part of recalculation.
• PivotTables don't update when their source data updates either. (For SQL data sources, this is understandable, but not so much when the source data comes from Excel itself).
• PivotTables produce tables with missing values (to improve the formatting), which makes them very difficult to query with spreadsheet lookup functions.
• The histogram function (among others) of the Analysis ToolPak is a one-time thing: you use it, it generates a histogram, and that's it. It's not possible to incorporate histogram generation into the dataflow-driven recalculation of a spreadsheet.
• There's no way to use an Excel formula to determine if a row is excluded or included in an AutoFilter query. Actually, there's no way to have the result set of an AutoFilter query drive spreadsheet recalculation at all.
Maybe this is being picky, but spreadsheets have a real strength in that they made it a lot easier for non-techies to specify how a computer can automatically solve certain types of problems. It's just a shame that so many of Excel's features are excluded from the natural way Excel is programmed.

One of Excel's more interesting features for querying data sets is the AutoFilter. Applied to a table of data in a spreadsheet, the AutoFilter allows the table to be queried for subsets of data based on combo boxes in the table's header row. It's a simple way to filter out extraneous data and it can support quite elaborate query semantics (since it can filter based on values in computed cells).

However, AutoFilter is not without its problems:
• AutoFilter imposes its own user interface: if you want a look-and-feel other than stock, you're out of luck.
• For wide data tables with lots of columns, it can be hard to see the current AutoFilter query. Seeing the entire query requires scrolling horizontally across the header row.
• Cell formatting and AutoFilter are independent of each other. If you want position-dependent formatting (alternate row formatting, for example), it has to be recreated after each AutoFilter adjustment.
• An AutoFilter works by selectively 'hiding' rows in the worksheet it's a part of. This means that an AutoFiltered list can't share rows with anything else that you don't also want selectively 'filtered' from view.
• You can't have more than one AutoFilter on a worksheet tab.
• AutoFilter isn't part of the natural 'ebb and flow' of the life of a spreadsheet: it doesn't participate in the dependency-driven formula solver that drives Excel's computational capability. This has some profound (bad) implications:
• As data rows are added and removed from the list being AutoFiltered, the AutoFilter has to be removed and reapplied to the new data list to reflect changes to its source.
• You can't use AutoFilter to filter a list and then search that list with =LOOKUP() or =MATCH(): the lookup operation will search the entire list, not the filtered list.
• If you AutoFilter a list that contains calculated cells, and those cells change value, the set of filtered rows is not updated.
Anyway, I could go on, but I hope it's pretty clear by now that there are sometimes good reasons to look for other list filter mechanisms than AutoFilter. (FYI: 'Advanced Filter' has its own limitations, some of which are very similar to AutoFilter's.) I'll post a way to get AutoFilter-like behavior directly from Excel formulas. This technique has its own issues, but it does address lots of the issues I mentioned here.

Thu, 07 Jul 2005
This is pretty well documented online, but I can never seem to find it when I need it. So, I'm putting it here too.

Internet Explorer defaults to anonymous FTP, when sometimes you need to log in with an explicit username and password. One of the lesser-known features of URLs is that they allow login information to be specified as part of a web address.
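The general shape of such an address is ftp://username:password@host/path. As a small illustration, here is a Python sketch that builds and then picks apart a URL of that form. The helper name `ftp_url`, the host `ftp.example.com`, and the credentials are all made up for the example; the percent-encoding step is there on the assumption that a password might contain characters like "@" or ":" that would otherwise be read as URL delimiters.

```python
from urllib.parse import quote, unquote, urlsplit


def ftp_url(user, password, host, path="/"):
    """Build an FTP URL with login information embedded in it."""
    # Percent-encode the credentials so "@" and ":" inside them are
    # not mistaken for the delimiters of the URL itself.
    user_part = quote(user, safe="")
    password_part = quote(password, safe="")
    return f"ftp://{user_part}:{password_part}@{host}{path}"


# Hypothetical credentials and host, for illustration only.
url = ftp_url("mike", "p@ss:word", "ftp.example.com", "/pub/")
print(url)

# Splitting the URL recovers the individual pieces again.
parts = urlsplit(url)
print(parts.username, unquote(parts.password), parts.hostname, parts.path)
```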