Mike Schaeffer's Weblog
Fri, 01 Feb 2008
I've spent a fair amount of time lately working with code that generates
Comma Seperated Value files for loading into Excel. You'd think the
format would be trivial, but not quite. One
additional subtlety, one not covered in that 'specification', is Excel's inconsistent handling
of end of line markers. As it turns out, if Excel loads a CSV file that contains a quoted,
multi-line value, it expects a different line feed convention within the quoted value than the
usual CR/LF. A CR embedded in a quoted field renders as a box, rather than as part of a newline.
To suppress the box, CSV files for Excel need to be written with a LF-only convention within
quoted values. Even then, Excel will not automatically expand rows containing a multi-line
value. That has to be done manually.
Internally, Excel seems to follow the same LF-only convention that this issue with CSV files seems to imply. Taking the CODE(...) of each character in a manually entered multi-line cell value, shows only one charater, a LF, at each line break. My guess is that the quotes in a CSV file just act as a signal to turn off all special character handling, not just handling that signals new rows and cells. Either way, it's more than a little irritating that Excel compatible CSV files with multi-line values have to have two seperate end of line conventions.
reddit this! Digg Me!
Internally, Excel seems to follow the same LF-only convention that this issue with CSV files seems to imply. Taking the CODE(...) of each character in a manually entered multi-line cell value, shows only one charater, a LF, at each line break. My guess is that the quotes in a CSV file just act as a signal to turn off all special character handling, not just handling that signals new rows and cells. Either way, it's more than a little irritating that Excel compatible CSV files with multi-line values have to have two seperate end of line conventions.
reddit this! Digg Me!
Fri, 23 Jun 2006
The clipboard is one of the oldest and most often used data exchange
mechanisms in Microsoft Windows. It's been around since Windows 1.0
and basically all applications use it to support operations like copy,
cut, and paste. One of the more interesting aspects of the way the
clipboard works is that it allows applications to copy data to the
clipboard in multiple formats. For example, if you copy text in
Microsoft Word to the clipboard, it's not copied in one format, it's
copied in seventeen. One way to see this is to open the Clipbook
viewer application (start>>run, clipbrd.exe), open the Clipboard
window, and look at the list of data formats in the view sub menu.
This is how different applications negotiate data formats when copying
and pasting data; this is why you can copy a spreadsheet from
super-smart Excel, paste it into super-dumb Notepad, and still get
reasonable results.
In the process of adding better clipboard support to vCalc, I wrote a small tool for dumping clipboard contents to a console window. This tool, cbdump, runs from the command line and shows a list of all the data formats currently on the clipboard. It can also dump out the data in hexadecimal format, so you can see the actual data, unaltered by applications. To show what it looks like, here's sample output after copying a small Excel table to the clipboard:
A couple paragraphs ago, I used the word 'negotiate' when referring to the way that two applications use the clipboard to exchange data. As currently presented, that negotiation looks a lot like the source application making everything available and the sink application cherry picking the one format it wants. This is not a very balanced or efficient negotiation. However, the clipboard is actually more intelligent than that. To see what I mean, go to Excel, copy a huge range of data to the clipboard and run cbdump. The output list of formats will basically be the same as before, the difference is in the way the list appears. You'll notice that the list scrolls past unevenly, jerking along with some formats taking longer to list than others.
The reason for the uneven scrolling is that Windows does not force an application to always provide data in every clipboard format it supports. When Excel copies data to the clipboard, what it's really doing is telling Windows that it has the ability to provide data in a format, it is not necessarily rendering the data in every format at the time of the copy. Then, when a sink application requests data from the clipboard, Windows can see if it has an actual copy of the requested format. If not, it then requests that Excel render the clipboard contents in the requested format and passes that result to the requesting application. The way cbdump works, to show the size of each data format on the clipboard, it requests a copy of each available data format. This forces Excel to render every supported format, some of which take longer than others. If you happened to have a Windows message viewer looking at the message stream to the Excel window, you'd see a series of window messages requesting rendered copies of unrendered formats.
reddit this! Digg Me!
In the process of adding better clipboard support to vCalc, I wrote a small tool for dumping clipboard contents to a console window. This tool, cbdump, runs from the command line and shows a list of all the data formats currently on the clipboard. It can also dump out the data in hexadecimal format, so you can see the actual data, unaltered by applications. To show what it looks like, here's sample output after copying a small Excel table to the clipboard:
c009: "DataObject" (4 bytes) 000e: CF_ENHMETAFILE (0 bytes) 0003: CF_METAFILEPICT (16 bytes) 0002: CF_BITMAP (0 bytes) c2e2: "Biff8" (5120 bytes) c2e6: "Biff5" (4608 bytes) c1f3: "BIFF4" (1904 bytes) c2e3: "Biff3" (1773 bytes) c2f7: "Biff" (913 bytes) 0004: CF_SYLK (1131 bytes) c2ec: "Wk1" (197 bytes) 0005: CF_DIF (137 bytes) c298: "XML Spreadsheet" (943 bytes) c0f2: "HTML Format" (2344 bytes) 000d: CF_UNICODETEXT (22 bytes) 0001: CF_TEXT (11 bytes) c295: "Csv" (11 bytes) c0a5: "Rich Text Format" (3071 bytes) c00b: "Embed Source" (6144 bytes) c004: "Native" (6144 bytes) c003: "OwnerLink" ERROR in GetClipboardData c00e: "Object Descriptor" (152 bytes) c00d: "Link Source" (135 bytes) c00f: "Link Source Descriptor" (152 bytes) c1f2: "Link" (31 bytes) 0081: CF_DSPTEXT (13 bytes) c002: "ObjectLink" (39 bytes) c013: "Ole Private Data" (792 bytes) 0010: CF_LOCALE (4 bytes) 0007: CF_OEMTEXT (11 bytes) 0008: CF_DIB (57368 bytes) 0011: CF_DIBV5 (57452 bytes) 150642 bytesThe leftmost column is the integer ID of the clipboard format. This is the ID used by Windows to identify the format used by a particular chunk of data. Following the ID is the clipboard format's name, of which there are two kinds. By default, Windows knows about a few predefined types of clipboard data: these are identified by constants in the header file and are things like CF_TEXT (text data) and CF_DIB (a device indepentant bitmap). However, to handle the case where one of the Windows default formats will not work, Windows also allows applicatons to register additional formats and give them useful names. In the list above, that includes formats like "XML Spreadsheet", "Csv", and "Biff8". This allows applications like Excel to communicate rich, specialized data to applications that support it (Excel itself being a good example).
A couple paragraphs ago, I used the word 'negotiate' when referring to the way that two applications use the clipboard to exchange data. As currently presented, that negotiation looks a lot like the source application making everything available and the sink application cherry picking the one format it wants. This is not a very balanced or efficient negotiation. However, the clipboard is actually more intelligent than that. To see what I mean, go to Excel, copy a huge range of data to the clipboard and run cbdump. The output list of formats will basically be the same as before, the difference is in the way the list appears. You'll notice that the list scrolls past unevenly, jerking along with some formats taking longer to list than others.
The reason for the uneven scrolling is that Windows does not force an application to always provide data in every clipboard format it supports. When Excel copies data to the clipboard, what it's really doing is telling Windows that it has the ability to provide data in a format, it is not necessarily rendering the data in every format at the time of the copy. Then, when a sink application requests data from the clipboard, Windows can see if it has an actual copy of the requested format. If not, it then requests that Excel render the clipboard contents in the requested format and passes that result to the requesting application. The way cbdump works, to show the size of each data format on the clipboard, it requests a copy of each available data format. This forces Excel to render every supported format, some of which take longer than others. If you happened to have a Windows message viewer looking at the message stream to the Excel window, you'd see a series of window messages requesting rendered copies of unrendered formats.
reddit this! Digg Me!
Tue, 13 Jun 2006
Recently, I was asked for advice on how to learn Excel macro programming. I'm not
the expert, but I read sites and books written by folks who are. Here are some
useful links.
Websites:
The last book is particularly interesting: it focuses on the C/C++ API to Microsoft Excel. Since Excel 5, when Microsoft introduced VBA and the Excel COM object model, this programming mechanism seems to have fallen out of vogue, but it is still being maintained, and represents a way to write really fast and secure Excel add-ins.
reddit this! Digg Me!
Websites:
- An Introduction to User Defined Functions
- Chip Pearson's Website
- John Walkenbach's Website
- The Daily Dose of Excel, an Excel community blog
- Microsoft's Excel 12 Blog
- Professional Excel Development : The Definitive Guide to Developing Applications Using Microsoft(R) Excel and VBA(R)
- Excel 2003 Power Programming with VBA.
- Excel Add-in Development in C/C++
The last book is particularly interesting: it focuses on the C/C++ API to Microsoft Excel. Since Excel 5, when Microsoft introduced VBA and the Excel COM object model, this programming mechanism seems to have fallen out of vogue, but it is still being maintained, and represents a way to write really fast and secure Excel add-ins.
reddit this! Digg Me!
Wed, 09 Nov 2005
"Thirty days hath September,
All the rest I can't remember.
The calendar hangs on the wall;
Why bother me with this at all?"
—http://leapyearday.com/30Days.htm
Here's an Excel one liner that computes the number of days in a particular month. Cell A2 contains the year of the month you're looking for, Cell B2 contains the months' ordinal (1=January, 2=February, etc.):
=DAY(DATE(A2,B2+1,1)-1)This is mainly useful to illustrate what can be done with Excel's internal representation of dates. Dates and times in Windows versions of Excel are normally stored as the number of days from January 1st, 1900. You can see this by entering a date in a cell, and then reformatting the cell to display as a number rather than a date. For example, this reveals April 1st, 2004 to be represented internally as the number 38078. This is because there are 38,078 days between January 1st, 1900 and April 1st, 2004.
The formula above relies on this in its computation of the number of days in a month. The sub-expression DATE(A2,B2+1,1) computes the date number for the first day of the month immediately following the month we're interested in. We then subtract one from that number, which gives us the date number for the last day of the month that we are interested in. The call to DAY then returns the number of the day within the month, which happens to be the number of days in the month.
reddit this! Digg Me!
Fri, 04 Nov 2005
At my job, we use Excel extensively to keep track of software
testing progress. One typical use is to maintain a list of features
to be tested, along with their current pass/fail statuses and
an attempt at a rough subdivision into functional areas. Excel's
AutoFilter then makes it easy to ask questions like "show me all
failed tests relating to function block scheduling."
This works really well as long as "function block scheduling" is one of the categories into which you've subdivided your features list. If it's not, you have to get a little creative to filter your list. One approach to this problem I've found useful is filtering based on columns populated with a formula similar to this:
reddit this! Digg Me!
This works really well as long as "function block scheduling" is one of the categories into which you've subdivided your features list. If it's not, you have to get a little creative to filter your list. One approach to this problem I've found useful is filtering based on columns populated with a formula similar to this:
=IF(ISERROR(SEARCH($K$5,K6)),"No","Yes")If column K contains feature descriptions, this formula returns "Yes" is the description matches the search string in K5 and "No", otherwise. Filtering based on this formula makes it possible to display every list item whose description matches a word. If there is more than one column to search, you can use string concatenation to aggregate the columns together:
=IF(ISERROR(SEARCH($K$5,K6&L6&M6)),"No","Yes")So, why the name apropos? Follow this link.
reddit this! Digg Me!
Tue, 11 Oct 2005
David Gainer has Summarized
a a number of new conditional formatting rules in Excel 12, over on the Excel 12 blog. These
rules were designed to "make a greater number of scenarios possible without needing to write
formulas." In other words, all these scenarios have simple solutions directly visible in
the Excel 12 UI.
Well, if you can't wait for Excel 12, Excel is pretty darned powerful as it is, and as Mr. Gainer states: most of these scenarios have formula-based approaches that work right now. Here are some of the approaches for current versions of Excel:
reddit this! Digg Me!
Well, if you can't wait for Excel 12, Excel is pretty darned powerful as it is, and as Mr. Gainer states: most of these scenarios have formula-based approaches that work right now. Here are some of the approaches for current versions of Excel:
- With data bars, color scales, or icons based on the numeric value in the cell, percentages, percentiles, or a formula. See the posts on data bars, color scales, and icon sets for more information on each of these. - This approach to 'databars' generalizes to formula-based scaling, although it's not as pretty, not a color scale, and not an icon set.
- Containing, not containing, beginning with, or ending with specific text. For example, highlighting parts containing certain characters in a parts catalog. - Use a formula: a lot of these conditions can be tested using FIND: =FIND(string, A1)=1, checks for parts that begin with string, for example.
- Containing dates that match dynamic conditions like yesterday, today, tomorrow, in the last 7 days, last week, this week, next week, last month, this month, next month. For example, highlight all items dated yesterday. The great part about these conditions is that Excel handles calculating the date based on the system clock, so the user doesn.t need to worry about updating the condition. - Use a formula: the system date is available via NOW(), and Excel offers plenty of date arithmetic functions to check for specific conditions.
- That are blank or that are non-blank. - Use a formula: =ISBLANK(A1) or =NOT(ISBLANK(A1))
- That have errors or that do not have errors. - Use a formula: =ISERROR(A1) or =NOT(ISERROR(A1))
- That are in the top n of a selected range (where n is whatever number you want) OR that are in the top n percent of a selected range (again, where n is adjustable). For example, highlighting the top 10 investment returns in a table of 1,000 investments. - Use a formula: =RANK(A1, range)>n.
- Cells that have the bottom n values OR cells that are the bottom n percent of a selected range. - Use a formula: =RANK(A1, range)<ROWS(range)-n.
- Cells that are above average, below average, equal to or above average, equal to or below average, 1 standard deviation above, 1 standard deviation below, 2 standard deviations above, 2 standard deviations below, 3 standard deviations above, 3 standard deviations below a selected range. - This type of thing can be solved using a particular form of formula: =A1<(AVERAGE(range)-n*STDEV(range)) or =A1>(AVERAGE(range)+n*STDEV(range)). For large ranges, it probably makes sense to move the computation of AVERAGE and STDEV into a cell, and have the conditional format reference (with an absolute reference) that cell.
- Cells that are duplicate values or, conversely, cells that are unique values. - Use a formula: =COUNTIF(range, A1)=1 or =COUNTIF(range, A1)>1. Ensure that the range you use in the formula has an absolute address. If your range is sorted on the 'key' field, you can use this style of formula: =A1<>A2. This can be much, much faster, particularly for large tables. (For the Comp. Sci. types it's O(N), rather than O(N^2), once you have sorted data.)
- Based on comparisons between two columns in tables. For example, highlight values where values in the .Actual Sales. column are less than in the .Sales Target. column. - Use a conditional format formula: =A1<B1. Apply it to the entire column you want shaded, and Excel will evaluate the seperately for each cell. The cell references in the format formula are relative to the current cell in the selected range. The current cell is the cell in the range that is not highlighted (but is surrounded by a selection border), and can be moved around the four corners of the range with Control+. (period).
- When working with tables, we have also made it easy to format the entire row based on the results of a condition. - Relative formulas can be made to do this: select an entire range, and define a conditional formula using absolute column addresses (ie: =$a1). Excel evaluates the format formula for each cell in the range, and since the column addresses are absolute, each cell in a row will pull from the came columns. Therefore, each cell in a row will share the same conditional format, which is what we want.
reddit this! Digg Me!
Fri, 07 Oct 2005
I suspected as much, but Excel has a way to duplicate my UDF
using Excel formulas.
=REPT("█",A1)&REPT("▌",ROUND(FLOOR(A1,1),0))
That formula evaluates to a bar of length A1 units, rounded to the nearest 0.5. Rescaling can be done in another cell. If you're interested in a bar that can be right-justified, you can use this:
=REPT("▐",ROUND(A1-FLOOR(A1,1),0))&REPT("█",A1)
The trickiest part about this is getting the block characters into the formula. For that, I reccomend using the Windows Character Map.
Qualitatively compared to VBA, this method requires more logic to be represented in the spreadsheet: that adds compelxity for readers and makes it tricker to set up than the VBA. On the other hand, it avoids the performance hit of calling UDF and the requirement that the spreadsheet contain a macro. I honestly don't know which is better style, but can say that this would be a perfect time to use a paramaterized range name (if Excel had such a thing).
reddit this! Digg Me!
=REPT("█",A1)&REPT("▌",ROUND(FLOOR(A1,1),0))
That formula evaluates to a bar of length A1 units, rounded to the nearest 0.5. Rescaling can be done in another cell. If you're interested in a bar that can be right-justified, you can use this:
=REPT("▐",ROUND(A1-FLOOR(A1,1),0))&REPT("█",A1)
The trickiest part about this is getting the block characters into the formula. For that, I reccomend using the Windows Character Map.
Qualitatively compared to VBA, this method requires more logic to be represented in the spreadsheet: that adds compelxity for readers and makes it tricker to set up than the VBA. On the other hand, it avoids the performance hit of calling UDF and the requirement that the spreadsheet contain a macro. I honestly don't know which is better style, but can say that this would be a perfect time to use a paramaterized range name (if Excel had such a thing).
reddit this! Digg Me!
Microsoft has just announced a cool new feature on the Excel 12 blog: the
databar. I think a picture (linked from Microsoft's
Excel 12 Blog) can explain it
better than I can:

This will be a nice way to look for trends/outliers, but I can also see it being useful for tracking parallel completion percentages in status reports, etc. Of the Excel 12 features announced so far, this is the one that I'm the most excited about. Of course, it's also the one that's easiest to approximate in Excel <12. Andrew has an approach using Autoshapes on his blog, and I'm going to present a slightly different approach.
IMO, his approach looks a lot better, this approach has the benefit of updating automatically. Pick your poison.
It all centers around this little UDF:
The formulat in C2 (and filled down) is =GraphBar(B2,MIN(B$2:B$8),MAX(B$2:B$8),5). The MIN and MAX set the scale, the 5 sets the maximum length of a bar. The maximum length, font size, column width can be tweaked to produce a reasonably attractive result, although I do reccomend using vertical centering.
If you want to get a little fancier, conditional formatting works on plot cells...
...whitespace can possibly improve the appearance...
...and this technique can scale.
1] (The original PC didn't have stanard graphics, it was an option. If you bought the monochrome, non-graphics, video board, characters like this were as close as you could get to a bar chart.)
reddit this! Digg Me!
This will be a nice way to look for trends/outliers, but I can also see it being useful for tracking parallel completion percentages in status reports, etc. Of the Excel 12 features announced so far, this is the one that I'm the most excited about. Of course, it's also the one that's easiest to approximate in Excel <12. Andrew has an approach using Autoshapes on his blog, and I'm going to present a slightly different approach.
IMO, his approach looks a lot better, this approach has the benefit of updating automatically. Pick your poison.
It all centers around this little UDF:
Option Explicit
Function GraphBar(x As Double, _
Low As Double, _
High As Double, _
ScaleTo As Double) As String
x = ((x - Low) / (High - Low)) * ScaleTo
Dim i As Integer
Dim blockFull As String
Dim blockHalf As String
blockFull = ChrW(9608)
blockHalf = ChrW(9612)
GraphBar = ""
For i = 1 To Fix(x)
GraphBar = GraphBar + blockFull
Next
If x - Fix(x) > 0.5 Then
GraphBar = GraphBar + blockHalf
End If
End Function
This isn't rocket science: all it does is rescale x from the range
[Low, High] to the range [0.0, ScaleTo]. Then, it
strings together that many Chrw(9608)'s, followed by a
Chrw(9612), if the scaled value's fractional part is >0.5. The
trick in this is that Chrw(9608) and Chrw(9612) are VBA
expressions that produce the the Unicode equivalent of the old line
drawing characters IBM put in the original PC [1]. 9608 is a full box
("█"), 9612 is a half box on the left ("▌"). The result of
this function ends up being a string that (when displayed as Arial) looks
like a horizontal bar. ("████▌"). Put a few
of those in adjacent cells, and you get this:
The formulat in C2 (and filled down) is =GraphBar(B2,MIN(B$2:B$8),MAX(B$2:B$8),5). The MIN and MAX set the scale, the 5 sets the maximum length of a bar. The maximum length, font size, column width can be tweaked to produce a reasonably attractive result, although I do reccomend using vertical centering.
If you want to get a little fancier, conditional formatting works on plot cells...
...whitespace can possibly improve the appearance...
...and this technique can scale.
1] (The original PC didn't have stanard graphics, it was an option. If you bought the monochrome, non-graphics, video board, characters like this were as close as you could get to a bar chart.)
reddit this! Digg Me!
Thu, 06 Oct 2005
This is a simple little two-bit Excel trick that I find myself
using all the time, particularly when formatting worksheets.
In Excel, Control+Y is the 'other half' of the Undo/Redo pair. If you undo an action and want to redo what you just undid, Control+Y undoes the undo, so to speak. However, if you haven't undone anything, and there's nothing on the redo queue, Control+Y repeats the last single action you took.
Repeatable actions can actually be quite complex. For example, opening the Format Cell dialog box and applying a format counts as one repeatable action, regardless of how many format attributes you change. Once you make that format change to one cell and before you do anything else Control+Y has become a key that applies that specific format change to as many other cells as you like.
In a sense, Control+Y is a command that's eternally bound to a simple macro that Excel keeps updating with your last action. If you plan your work to group actions together, this 'automatic' macro can save a lots of time.
reddit this! Digg Me!
In Excel, Control+Y is the 'other half' of the Undo/Redo pair. If you undo an action and want to redo what you just undid, Control+Y undoes the undo, so to speak. However, if you haven't undone anything, and there's nothing on the redo queue, Control+Y repeats the last single action you took.
Repeatable actions can actually be quite complex. For example, opening the Format Cell dialog box and applying a format counts as one repeatable action, regardless of how many format attributes you change. Once you make that format change to one cell and before you do anything else Control+Y has become a key that applies that specific format change to as many other cells as you like.
In a sense, Control+Y is a command that's eternally bound to a simple macro that Excel keeps updating with your last action. If you plan your work to group actions together, this 'automatic' macro can save a lots of time.
reddit this! Digg Me!
Mon, 03 Oct 2005
I had this written out and then discovered a better way. SUBTOTAL
is "sensitive to AutoFilter settings", right? Assuming A1 isn't
empty, this formula =subtotal(a1, 2)=1 returns TRUE if
row 1 is visible and FALSE otherwise. No VBA necessary.
Not too long ago, I made a post that describes how to replicate some of the behavior of Excel Autofilters using a purely formula based approach. One of the arguments I put forward in support of that technique is that it makes it possible to use filtered result sets to drive other calculations. However, the approach also has two disadvantages: it's slow to compute and can be a little tricky to setup and understand. As a sort of intermediate ground between using the AutoFilter and re-implementing it, this post describes how an Excel formula can determine if a row is a member of an AutoFilter result set. The magic bit is this little user defined function:
For a simple use case, this function can be used to generate alternating color bars that always alternate regardless of the AutoFilter settings. To set it up, Put TRUE in the topmost cell of a free column next to the AutoFilter to be colored. Below the TRUE, fill down with a formula like this: =IF(isvisible(D2),NOT(D1),D1). This formula inverts the value in the column, but only for cells that are visible. This guarantees that regardless of the AutoFilter settings, this column will always alternate TRUE/FALSE in the set of visible rows. This column can then be used to drive a conditional format that highlights alternating visible rows.
A couple sidenotes:
reddit this! Digg Me!
Not too long ago, I made a post that describes how to replicate some of the behavior of Excel Autofilters using a purely formula based approach. One of the arguments I put forward in support of that technique is that it makes it possible to use filtered result sets to drive other calculations. However, the approach also has two disadvantages: it's slow to compute and can be a little tricky to setup and understand. As a sort of intermediate ground between using the AutoFilter and re-implementing it, this post describes how an Excel formula can determine if a row is a member of an AutoFilter result set. The magic bit is this little user defined function:
Function IsVisible(rng As Range) As Boolean
IsVisible = True
Dim row As Range
Dim col As Range
For Each row In rng.Rows
If row.RowHeight = 0 Then
IsVisible = False
Exit Function
End If
Next
For Each col In rng.Columns
If col.ColumnWidth = 0 Then
IsVisible = False
Exit Function
End If
Next
End Function
Given a range, this function returns true if every cell in the range
is visible (non-zero row height and column width). The way Excel
works, the Row Height of a row hidden by the Autofilter is reported
as zero. Therefore, IsVisible returns false when given a reference to
a cell in a hidden AutoFiltered row. Of course, it also returns False
for cells in manually hidden rows and columns, but if you're careful,
you can avoid that.
For a simple use case, this function can be used to generate alternating color bars that always alternate regardless of the AutoFilter settings. To set it up, Put TRUE in the topmost cell of a free column next to the AutoFilter to be colored. Below the TRUE, fill down with a formula like this: =IF(isvisible(D2),NOT(D1),D1). This formula inverts the value in the column, but only for cells that are visible. This guarantees that regardless of the AutoFilter settings, this column will always alternate TRUE/FALSE in the set of visible rows. This column can then be used to drive a conditional format that highlights alternating visible rows.
A couple sidenotes:
- This function works because adjusting an AutoFilter triggers recalculation, and Excel notices that this function depends on row heights. For hiding columns, it's a lot less reliable. All the calls to IsVisible have to be forced to recompute after the column is hidden or displayed. To do this, IsVisible can be marked as volatile and recalculation forced by pressing F9. This is a lousy solution.
- To optimize performance, the function short-circuits its search. The Exit Function's bail out of the calculation as soon as the first hidden row or column is discovered.
- Excel's SUBTOTAL intrinsic function is also sensitive to AutoFilter settings.
reddit this! Digg Me!
Tue, 20 Sep 2005
This may be something of a suprise, but Excel has even gotten
the attention of Microsoft
Research. Simon Peyton Jones, Margaret Burnett, and Alan Blackwell
have written a
paper that describes "extensions to the Excel spreadsheet that integrate
user-defined functions into the spreadsheet grid". Of course, Excel
doesn't do this... but, I wonder if that should be "Excel doesn't do this
yet".
As a sidenote, this reminds me a little of how LabView handled subfunction definitions: subfunctions are defined using the same visual tools as top-level functions. It worked, but 'felt' a little heavy weight in actual use.
reddit this! Digg Me!
As a sidenote, this reminds me a little of how LabView handled subfunction definitions: subfunctions are defined using the same visual tools as top-level functions. It worked, but 'felt' a little heavy weight in actual use.
reddit this! Digg Me!
I really liked
This post by Dick Kusleika, over on Daily Dose
of Excel. I'm a big fan of controlling frequently used options with keyboard
shortcuts. To riff on Mr. Kusleika's post a little, here's a refinement I've found
useful in the past for macros like these. Basically this allows the same macro
to toggle a state as well as non-destructively display the current state.
The first time the macro MaybeToggleMAR is invoked, it displays the current state in the status bar, and sets a timer to expire in 3 seconds. If the macro is invoked a second time before the timer expires (easy to do if it's bound to a keystroke) the state is toggled. Technically speaking, the trickiest bit is that the function that sets the 3 second timer also has to handle cancelling any previous instance of the same timer. It works without the timer cancellation, but without it, the UI behaves oddly after multiple keypresses in rapid succession.
Chip Pearson's website has useful content discussing the Excel API's for Timers and the Status Bar.
Here's the code: to use it, stick it in a module and bind MaybeToggleMAR to the keyboard shortcut of your choice.
reddit this! Digg Me!
The first time the macro MaybeToggleMAR is invoked, it displays the current state in the status bar, and sets a timer to expire in 3 seconds. If the macro is invoked a second time before the timer expires (easy to do if it's bound to a keystroke) the state is toggled. Technically speaking, the trickiest bit is that the function that sets the 3 second timer also has to handle cancelling any previous instance of the same timer. It works without the timer cancellation, but without it, the UI behaves oddly after multiple keypresses in rapid succession.
Chip Pearson's website has useful content discussing the Excel API's for Timers and the Status Bar.
Here's the code: to use it, stick it in a module and bind MaybeToggleMAR to the keyboard shortcut of your choice.
Option Explicit
Private MARChangesEnabled As Boolean
Public NextDisableTime As Double
Sub DisableMARChanges()
Application.StatusBar = False
MARChangesEnabled = False
End Sub
Sub DisableMARChangesSoon()
On Error Resume Next
Application.OnTime NextDisableTime, "DisableMARChanges", , False
NextDisableTime = Now + TimeSerial(0, 0, 3)
Application.OnTime NextDisableTime, "DisableMARChanges", , True
End Sub
Sub MaybeToggleMAR()
Dim NewStatusText As String
NewStatusText = ""
If MARChangesEnabled Then
Application.MoveAfterReturn = Not Application.MoveAfterReturn
NewStatusText = "Status changed: "
Else
MARChangesEnabled = True
NewStatusText = "Second press will change status: "
End If
If Application.MoveAfterReturn Then
NewStatusText = NewStatusText & "MoveAfterReturn Enabled"
Else
NewStatusText = NewStatusText & "MoveAfterReturn Disabled"
End If
Application.StatusBar = NewStatusText
DisableMARChangesSoon
End Sub
reddit this! Digg Me!
Wed, 24 Aug 2005
I literally dreamed about this last night. It would be wonderful
if Excel supported formulas like this:
reddit this! Digg Me!
=LET(value=MATCH(item,range,0), IF(ISERROR(value), 0, value))If you're into Lisp-y languages, it'd look like this:
(let ((value (match item range 0))) (if (is-error? value) 0 value))The function call
=LET(name=binding, expression)
would create a local range name named name, bound (equal) to the value returned by
binding, to be used during the evaluation of expression. In the example
above, during the evaluation of IF(ISERROR(value), 0, value)), value
would be bound to the value returned by MATCH(item, range, 0).
It's worth pointing out that this is slightly different from how normal Excel range names work. Range names in Excel work through textual substitution. With textual substitution, the initial expression would be logically equivalent to this:
=IF(ISERROR(MATCH(item, range, 0)), 0, MATCH(item, range, 0)))In other words, Excel would treat every instance of value as if
MATCH(item, range, 0) was explictly spelled out. This
means there are two calls to MATCH and two potential searches
through the range. While it's possible that Excel
optimizes the second
search away, I'm not sure that anybody outside of Microsoft can
know for sure how this is handled.
Microsoft's current
reccomendation for handling the specific ISERROR scenario
in the first expression is this VBA function:
Function IfError(formula As Variant, show As String)
On Error GoTo ErrorHandler
If IsError(formula) Then
IfError = show
Else
IfError = formula
End If
Exit Function
ErrorHandler:
Resume Next
End Function
This isn't bad, but it requires that spreadsheet authors and
readers understand VBA. It also imposes significant performance
costs: calling into VBA from a worksheet takes time.reddit this! Digg Me!
Thu, 28 Jul 2005
Now that I've written a little about why you might want to
replace Excel AutoFilter, here's how to actually do it. To frame the discussion,
there are two problems to solve:
The tricky bit of the formula-based filter is the second problem: displaying the result set in a contiguous range of rows with no gaps. Each cell that might display part of the result set has to figure out itself what part of the result set to display, if any, and pull the data from the input set. A simple MATCH or LOOKUP can't handle this, since MATCH or LOOKUP can't be told to return the second, third, or nth match. They return the first match, which isn't quite enough for what we're trying to do.
As it turns out, even though having the result set compute a mapping from the input set is quite hard, solving the reverse problem isn't too bad. Having the input set compute the mapping to the result set is easy. Here's how it works, by column:
reddit this! Digg Me!
- Deciding which rows of the input set are part of the result set
- Displaying the result set in a contiguous sequence of spreadsheet rows.
The tricky bit of the formula-based filter is the second problem: displaying the result set in a contiguous range of rows with no gaps. Each cell that might display part of the result set has to figure out itself what part of the result set to display, if any, and pull the data from the input set. A simple MATCH or LOOKUP can't handle this, since MATCH or LOOKUP can't be told to return the second, third, or nth match. They return the first match, which isn't quite enough for what we're trying to do.
As it turns out, even though having the result set compute a mapping from the input set is quite hard, solving the reverse problem isn't too bad. Having the input set compute the mapping to the result set is easy. Here's how it works, by column:
- Ord. - The row ordinal number of the row in the input set, starting with 1.
- Result Ord. - This column starts at zero, in the row preceeding the first row of the result set, and increments by 1 for each row where In Query? is TRUE. For each row with In Query? of TRUE, this column is the row ordinal number of this row in the result set.... We are almost there.
- Result Rows. - The input row ordinal of each row in the output set. This is done by using MATCH to find the first row for each number in the Result Ord..
reddit this! Digg Me!
Mon, 18 Jul 2005
I think that the weaknesses of the Excel AutoFilter turn out
to be pretty typical of Excel in general.
To me, the brilliance of the spreadsheet was that it took a data model that business people were familar with, the accountant's paper spreadsheet, and layered on automatic computation and reporting facilities in a natural way. There's something very intuitive about going to cell c1, entering =A1+B1, and then having C1 contain the sum of the other two cells, automatically updated as the source cells change. It just makes sense, and is at the very core of every software spreadsheet dating back to the first, VisiCalc.
For years, spreadsheets worked at making this model work better. Lotus 1-2-3 introduced something called natural recalculation order that made it easier to follow the logic of spreadsheet calculation. Somewhere along the way, spreadsheets started doing limited recalculation, where formulas that didn't change weren't recalculated (thus saving time). New intrinsic functions were added, and Excel made a huge stride when it added array formulas: individual formulas that can produce more than one result. The gateway to user defined functions written in VisualBASIC was another huge win.
The core strength of all of these ideas is that they rely on and extend the core concept of the software spreadsheet: the software tracks dependancies between cells and automatically recalculates the appropriate results as necessary. As powerful as that concept is, Microsoft lost the plot somewhere around Excel 4 or 5 and keeps sinking money and effort into features that don't fully participate:
reddit this! Digg Me!
To me, the brilliance of the spreadsheet was that it took a data model that business people were familar with, the accountant's paper spreadsheet, and layered on automatic computation and reporting facilities in a natural way. There's something very intuitive about going to cell c1, entering =A1+B1, and then having C1 contain the sum of the other two cells, automatically updated as the source cells change. It just makes sense, and is at the very core of every software spreadsheet dating back to the first, VisiCalc.
For years, spreadsheets worked at making this model work better. Lotus 1-2-3 introduced something called natural recalculation order that made it easier to follow the logic of spreadsheet calculation. Somewhere along the way, spreadsheets started doing limited recalculation, where formulas that didn't change weren't recalculated (thus saving time). New intrinsic functions were added, and Excel made a huge stride when it added array formulas: individual formulas that can produce more than one result. The gateway to user defined functions written in VisualBASIC was another huge win.
The core strength of all of these ideas is that they rely on and extend the core concept of the software spreadsheet: the software tracks dependancies between cells and automatically recalculates the appropriate results as necessary. As powerful as that concept is, Microsoft lost the plot somewhere around Excel 4 or 5 and keeps sinking money and effort into features that don't fully participate:
- Excel has two data filter features: neither one can automatically update a table as a part of recalculation.
- PivotTables don't update when their source data updates either. (For SQL data sources, this is understandable, but not so much when the source data comes from Excel itself).
- PivotTables produce tables with missing values (to improve the formatting), which makes them very difficult to query with spreadsheet lookup functions.
- The historgram function (among others) of the Analysis ToolPak is a one-time thing: you use it, it generates a histogram, and that's it. It's not possible to incorporate histogram generation into the dataflow driven recalculation of a spreadsheet.
- There's no way to use an Excel formula to determine if a row is excluded or included in an AutoFilter query. Actually, there's no way to have the result set of an AutoFilter query drive spreadsheet recalculation at all.
reddit this! Digg Me!
One of Excel's more interesting features for querying data sets
is the AutoFilter. Applied to a table of data in a spreadsheet,
The AutoFilter allows the table to be queried for subsets of data
based on combo boxes in the table's header row. It's a simple way
to filter out extraneous data and it can support quite elaborate
query semantics (since it can filter based on values in computed
cells).
However, AutoFilter is not without its problems:
reddit this! Digg Me!
However, AutoFilter is not without its problems:
- AutoFilter imposes its own user interface: if you want a look-and-feel other than stock, you're out of luck.
- For wide data tables with lots of columns, it can be hard to see the current AutoFilter query. To see the entire query requires horizontal scrolling down the header row.
- Cell formatting and AutoFilter are independant of each other. If you want position dependant formatting (alternate row formatting, for example), it has to be recreated after each AutoFilter adjustment.
- An AutoFilter works by selectively 'hiding' rows in the worksheet it's a part of. This means that an AutoFiltered list can't share rows with anything else that you don't also want selectively 'filtered' from view.
- You can't have more than one AutoFilter on a worksheet tab.
- AutoFilter isn't part of the natural 'ebb and flow' of
the life of a spreadsheet: it doesn't participate in the
dependancy driven formula solver that drives Excel's
computational capability. This has some profound (bad)
implications:
- As data rows are added and removed from the list being AutoFiltered, the AutoFilter has to be removed and reapplied to the new data list to reflect changes to its source.
- You can't use AutoFilter to filter a list and then search that list with =LOOKUP() or =MATCH(): the lookup operation will search the entire list, not the filtered list.
- If you AutoFilter a list that contains calculated cells, and those cells change value, the set of filtered rows is not updated.
reddit this! Digg Me!