Mike Schaeffer's Blog

Articles with tag: excel
July 28, 2005

Now that I've written a little about why you might want to replace Excel AutoFilter, here's how to actually do it. To frame the discussion, there are two problems to solve:

  • Deciding which rows of the input set are part of the result set
  • Displaying the result set in a contiguous sequence of spreadsheet rows.

The first problem is easy: add another column alongside the input set with a formula that evaluates to TRUE if the row belongs in the result. This can be any valid Excel formula: it can include complex logic, it can depend on other cells containing control parameters. In my example spreadsheet, this formula is in column H, labeled In Query?:

The tricky bit of the formula-based filter is the second problem: displaying the result set in a contiguous range of rows with no gaps. Each cell that might display part of the result set has to figure out itself what part of the result set to display, if any, and pull the data from the input set. A simple MATCH or LOOKUP can't handle this, since MATCH or LOOKUP can't be told to return the second, third, or nth match. They return the first match, which isn't quite enough for what we're trying to do.

As it turns out, even though having the result set compute a mapping from the input set is quite hard, solving the reverse problem isn't too bad. Having the input set compute the mapping to the result set is easy. Here's how it works, by column:

  • Ord. - The row ordinal number of the row in the input set, starting with 1.
  • Result Ord. - This column starts at zero, in the row preceeding the first row of the result set, and increments by 1 for each row where In Query? is TRUE. For each row with In Query? of TRUE, this column is the row ordinal number of this row in the result set.... We are almost there.
  • Result Rows. - The input row ordinal of each row in the output set. This is done by using MATCH to find the first row for each number in the Result Ord..

Once the Result Rows. column has been calculated, populating the actual result set is just a matter of using INDEX. ISERROR can be called on cells in Result Rows. to identify rows that don't contain values. After all this is said and done, we have a spreadsheet range that contains only a result set, updates like every other range in Excel, and can be used in formulas like every other range. I have a sample spreadsheet that implements a lot of this here.

July 18, 2005

I think that the weaknesses of the Excel AutoFilter turn out to be pretty typical of Excel in general.

To me, the brilliance of the spreadsheet was that it took a data model that business people were familar with, the accountant's paper spreadsheet, and layered on automatic computation and reporting facilities in a natural way. There's something very intuitive about going to cell c1, entering =A1+B1, and then having C1 contain the sum of the other two cells, automatically updated as the source cells change. It just makes sense, and is at the very core of every software spreadsheet dating back to the first, VisiCalc.

For years, spreadsheets worked at making this model work better. Lotus 1-2-3 introduced something called natural recalculation order that made it easier to follow the logic of spreadsheet calculation. Somewhere along the way, spreadsheets started doing limited recalculation, where formulas that didn't change weren't recalculated (thus saving time). New intrinsic functions were added, and Excel made a huge stride when it added array formulas: individual formulas that can produce more than one result. The gateway to user defined functions written in VisualBASIC was another huge win.

The core strength of all of these ideas is that they rely on and extend the core concept of the software spreadsheet: the software tracks dependancies between cells and automatically recalculates the appropriate results as necessary. As powerful as that concept is, Microsoft lost the plot somewhere around Excel 4 or 5 and keeps sinking money and effort into features that don't fully participate:

  • Excel has two data filter features: neither one can automatically update a table as a part of recalculation.
  • PivotTables don't update when their source data updates either. (For SQL data sources, this is understandable, but not so much when the source data comes from Excel itself).
  • PivotTables produce tables with missing values (to improve the formatting), which makes them very difficult to query with spreadsheet lookup functions.
  • The historgram function (among others) of the Analysis ToolPak is a one-time thing: you use it, it generates a histogram, and that's it. It's not possible to incorporate histogram generation into the dataflow driven recalculation of a spreadsheet.
  • There's no way to use an Excel formula to determine if a row is excluded or included in an AutoFilter query. Actually, there's no way to have the result set of an AutoFilter query drive spreadsheet recalculation at all.

Maybe this is being picky, but spreadsheets have a real strength in that they made it a lot easier for non-techies to specify how a computer can automatically solve certain types of problems. It's just a shame that so many of Excel's features are excluded from the natural way Excel is programmed.

July 18, 2005

One of Excel's more interesting features for querying data sets is the AutoFilter. Applied to a table of data in a spreadsheet, The AutoFilter allows the table to be queried for subsets of data based on combo boxes in the table's header row. It's a simple way to filter out extraneous data and it can support quite elaborate query semantics (since it can filter based on values in computed cells).

However, AutoFilter is not without its problems:

  • AutoFilter imposes its own user interface: if you want a look-and-feel other than stock, you're out of luck.
  • For wide data tables with lots of columns, it can be hard to see the current AutoFilter query. To see the entire query requires horizontal scrolling down the header row.
  • Cell formatting and AutoFilter are independant of each other. If you want position dependant formatting (alternate row formatting, for example), it has to be recreated after each AutoFilter adjustment.
  • An AutoFilter works by selectively 'hiding' rows in the worksheet it's a part of. This means that an AutoFiltered list can't share rows with anything else that you don't also want selectively 'filtered' from view.
  • You can't have more than one AutoFilter on a worksheet tab.
  • AutoFilter isn't part of the natural 'ebb and flow' of the life of a spreadsheet: it doesn't participate in the dependancy driven formula solver that drives Excel's computational capability. This has some profound (bad) implications:
    • As data rows are added and removed from the list being AutoFiltered, the AutoFilter has to be removed and reapplied to the new data list to reflect changes to its source.
    • You can't use AutoFilter to filter a list and then search that list with =LOOKUP() or =MATCH(): the lookup operation will search the entire list, not the filtered list.
    • If you AutoFilter a list that contains calculated cells, and those cells change value, the set of filtered rows is not updated.

Anyway, I could go on, but I hope it's pretty clear by now that there are sometimes good reasons to look for other list filter mechanisms than AutoFilter. (FYI: 'Advanced Filter' has its own limitations, some of which are very similar to AutoFilter's.) I'll post a way to get AutoFilter-like behavior directly from Excel formulas. This technique has its own issues, but it does address lots of the issues I mentioned here.

June 2, 2005

Lately, I've been finding myself spending lots of time toggling between two Excel spreadsheets to make edits. This little macro makes it easy in Excel 2000 to toggle between two spreadsheet windows. I reccomend you bind it to a keystroke.

 Option Explicit

Dim lastWindow As Variant

Sub HereAndThere()

If IsEmpty(lastWindow) Then
    Set lastWindow = ActiveWindow
    Dim currentWindow As Window
    Set currentWindow = ActiveWindow
    Set lastWindow = currentWindow
End If 
End Sub

Here's how you use it:

  • Run the macro once to save your current location.
  • Switch to your other spreadsheet
  • Now, running the macro will switch back and forth betweeen the two worksheets.

The "saving excursions" in the title is a reference to the save-excursion special form in Emacs Lisp. This macro isn't quite the same (and not nearly as powerful), but it reminded me of the Emacs feature. If it turns out to be useful, I might generalize my little macro to include some of the capabilities of Emacs' save-excursion.

June 2, 2005

This is a little add in for Excel that takes formulas and reformats them in a more readable style.