Mike Schaeffer's Blog

Articles with tag: excel
February 1, 2008

I've spent a fair amount of time lately working with code that generates Comma Seperated Value files for loading into Excel. You'd think the format would be trivial, but not quite. One additional subtlety, one not covered in that 'specification', is Excel's inconsistent handling of end of line markers. As it turns out, if Excel loads a CSV file that contains a quoted, multi-line value, it expects a different line feed convention within the quoted value than the usual CR/LF. A CR embedded in a quoted field renders as a box, rather than as part of a newline. To suppress the box, CSV files for Excel need to be written with a LF-only convention within quoted values. Even then, Excel will not automatically expand rows containing a multi-line value. That has to be done manually.

Internally, Excel seems to follow the same LF-only convention that this issue with CSV files seems to imply. Taking the CODE(...) of each character in a manually entered multi-line cell value, shows only one charater, a LF, at each line break. My guess is that the quotes in a CSV file just act as a signal to turn off all special character handling, not just handling that signals new rows and cells. Either way, it's more than a little irritating that Excel compatible CSV files with multi-line values have to have two seperate end of line conventions.

June 13, 2006

Recently, I was asked for advice on how to learn Excel macro programming. I'm not the expert, but I read sites and books written by folks who are. Here are some useful links.

Websites:

Books:

The last book is particularly interesting: it focuses on the C/C++ API to Microsoft Excel. Since Excel 5, when Microsoft introduced VBA and the Excel COM object model, this programming mechanism seems to have fallen out of vogue, but it is still being maintained, and represents a way to write really fast and secure Excel add-ins.

November 9, 2005

Thirty days hath September,
the rest I can't remember.
The calendar hangs on the wall;
Why bother me with this at all?

http://leapyearday.com/30Days.htm

Here's an Excel one liner that computes the number of days in a particular month. Cell A2 contains the year of the month you're looking for, Cell B2 contains the months' ordinal (1=January, 2=February, etc.):

=DAY(DATE(A2,B2+1,1)-1)

This is mainly useful to illustrate what can be done with Excel's internal representation of dates. Dates and times in Windows versions of Excel are normally stored as the number of days from January 1st, 1900. You can see this by entering a date in a cell, and then reformatting the cell to display as a number rather than a date. For example, this reveals April 1st, 2004 to be represented internally as the number 38078. This is because there are 38,078 days between January 1st, 1900 and April 1st, 2004.

The formula above relies on this in its computation of the number of days in a month. The sub-expression DATE(A2,B2+1,1) computes the date number for the first day of the month immediately following the month we're interested in. We then subtract one from that number, which gives us the date number for the last day of the month that we are interested in. The call to DAY then returns the number of the day within the month, which happens to be the number of days in the month.

November 4, 2005

At my job, we use Excel extensively to keep track of software testing progress. One typical use is to maintain a list of features to be tested, along with their current pass/fail statuses and an attempt at a rough subdivision into functional areas. Excel's AutoFilter then makes it easy to ask questions like "show me all failed tests relating to function block scheduling."

This works really well as long as "function block scheduling" is one of the categories into which you've subdivided your features list. If it's not, you have to get a little creative to filter your list. One approach to this problem I've found useful is filtering based on columns populated with a formula similar to this:

=IF(ISERROR(SEARCH($K$5,K6)),"No","Yes")

If column K contains feature descriptions, this formula returns "Yes" is the description matches the search string in K5 and "No", otherwise. Filtering based on this formula makes it possible to display every list item whose description matches a word. If there is more than one column to search, you can use string concatenation to aggregate the columns together:

=IF(ISERROR(SEARCH($K$5,K6&L6&M6)),"No","Yes")

So, why the name apropos? Follow this link.

October 11, 2005

David Gainer has Summarized a a number of new conditional formatting rules in Excel 12, over on the Excel 12 blog. These rules were designed to "make a greater number of scenarios possible without needing to write formulas." In other words, all these scenarios have simple solutions directly visible in the Excel 12 UI.

Well, if you can't wait for Excel 12, Excel is pretty darned powerful as it is, and as Mr. Gainer states: most of these scenarios have formula-based approaches that work right now. Here are some of the approaches for current versions of Excel:

  • With data bars, color scales, or icons based on the numeric value in the cell, percentages, percentiles, or a formula. See the posts on data bars, color scales, and icon sets for more information on each of these. - This approach to 'databars' generalizes to formula-based scaling, although it's not as pretty, not a color scale, and not an icon set.
  • Containing, not containing, beginning with, or ending with specific text. For example, highlighting parts containing certain characters in a parts catalog. - Use a formula: a lot of these conditions can be tested using FIND: =FIND(string, A1)=1, checks for parts that begin with string, for example.
  • Containing dates that match dynamic conditions like yesterday, today, tomorrow, in the last 7 days, last week, this week, next week, last month, this month, next month. For example, highlight all items dated yesterday. The great part about these conditions is that Excel handles calculating the date based on the system clock, so the user doesn.t need to worry about updating the condition. - Use a formula: the system date is available via NOW(), and Excel offers plenty of date arithmetic functions to check for specific conditions.
  • That are blank or that are non-blank. - Use a formula: =ISBLANK(A1) or =NOT(ISBLANK(A1))
  • That have errors or that do not have errors. - Use a formula: =ISERROR(A1) or =NOT(ISERROR(A1))
  • That are in the top n of a selected range (where n is whatever number you want) OR that are in the top n percent of a selected range (again, where n is adjustable). For example, highlighting the top 10 investment returns in a table of 1,000 investments. - Use a formula: =RANK(A1, range) > n.
  • Cells that have the bottom n values OR cells that are the bottom n percent of a selected range. - Use a formula: `=RANK(A1, range)n`.
  • Cells that are above average, below average, equal to or above average, equal to or below average, 1 standard deviation above, 1 standard deviation below, 2 standard deviations above, 2 standard deviations below, 3 standard deviations above, 3 standard deviations below a selected range. - This type of thing can be solved using a particular form of formula: =A1<(AVERAGE(ange)-n*STDEV(range)) or =A1>(AVERAGE(ange)+n*STDEV(range)). For large ranges, it probably makes sense to move the computation of AVERAGE and STDEV into a cell, and have the conditional format reference (with an absolute reference) that cell.
  • Cells that are duplicate values or, conversely, cells that are unique values. - Use a formula: =COUNTIF(range, A1)=1 or =COUNTIF(range, A1)>1. Ensure that the range you use in the formula has an absolute address. If your range is sorted on the 'key' field, you can use this style of formula: =A1<>A2. This can be much, much faster, particularly for large tables. (For the Comp. Sci. types it's O(N), rather than O(N2 ), once you have sorted data.)
  • Based on comparisons between two columns in tables. For example, highlight values where values in the .Actual Sales. column are less than in the .Sales Target. column. - Use a conditional format formula: =A1<B1. Apply it to the entire column you want shaded, and Excel will evaluate the seperately for each cell. The cell references in the format formula are relative to the current cell in the selected range. The current cell is the cell in the range that is not highlighted (but is surrounded by a selection border), and can be moved around the four corners of the range with Control+. (period).
  • When working with tables, we have also made it easy to format the entire row based on the results of a condition. - Relative formulas can be made to do this: select an entire range, and define a conditional formula using absolute column addresses (ie: =$a1). Excel evaluates the format formula for each cell in the range, and since the column addresses are absolute, each cell in a row will pull from the came columns. Therefore, each cell in a row will share the same conditional format, which is what we want.

Based on this, you don't have to wait for Excel 12 to get a lot of these features, you just have to wait for Excel 12 if you want Excel to do it for you automatically. My suggestion would be to learn how to use conditional formatting formulas, but I tend to be "here's how to fish" kind of guy more than a "here's a fish" kind of guy.

October 6, 2005

This is a simple little two-bit Excel trick that I find myself using all the time, particularly when formatting worksheets.

In Excel, Control+Y is the 'other half' of the Undo/Redo pair. If you undo an action and want to redo what you just undid, Control+Y undoes the undo, so to speak. However, if you haven't undone anything, and there's nothing on the redo queue, Control+Y repeats the last single action you took.

Repeatable actions can actually be quite complex. For example, opening the Format Cell dialog box and applying a format counts as one repeatable action, regardless of how many format attributes you change. Once you make that format change to one cell and before you do anything else Control+Y has become a key that applies that specific format change to as many other cells as you like.

In a sense, Control+Y is a command that's eternally bound to a simple macro that Excel keeps updating with your last action. If you plan your work to group actions together, this 'automatic' macro can save a lots of time.

October 3, 2005

I had this written out and then discovered a better way. SUBTOTAL is "sensitive to AutoFilter settings", right? Assuming A1 isn't empty, this formula =subtotal(a1, 2)=1 returns TRUE if row 1 is visible and FALSE otherwise. No VBA necessary.

Not too long ago, I made a post that describes how to replicate some of the behavior of Excel Autofilters using a purely formula based approach. One of the arguments I put forward in support of that technique is that it makes it possible to use filtered result sets to drive other calculations. However, the approach also has two disadvantages: it's slow to compute and can be a little tricky to setup and understand. As a sort of intermediate ground between using the AutoFilter and re-implementing it, this post describes how an Excel formula can determine if a row is a member of an AutoFilter result set. The magic bit is this little user defined function:

Function IsVisible(rng As Range) As Boolean
    IsVisible = True
    
    Dim row As Range
    Dim col As Range
           
    For Each row In rng.Rows
        If row.RowHeight = 0 Then
            IsVisible = False
            Exit Function
        End If
    Next
        
    For Each col In rng.Columns
        If col.ColumnWidth = 0 Then
            IsVisible = False
            Exit Function
        End If
    Next
End Function

Given a range, this function returns true if every cell in the range is visible (non-zero row height and column width). The way Excel works, the Row Height of a row hidden by the Autofilter is reported as zero. Therefore, IsVisible returns false when given a reference to a cell in a hidden AutoFiltered row. Of course, it also returns False for cells in manually hidden rows and columns, but if you're careful, you can avoid that.

For a simple use case, this function can be used to generate alternating color bars that always alternate regardless of the AutoFilter settings. To set it up, Put TRUE in the topmost cell of a free column next to the AutoFilter to be colored. Below the TRUE, fill down with a formula like this: =IF(isvisible(D2),NOT(D1),D1). This formula inverts the value in the column, but only for cells that are visible. This guarantees that regardless of the AutoFilter settings, this column will always alternate TRUE/FALSE in the set of visible rows. This column can then be used to drive a conditional format that highlights alternating visible rows.

A couple sidenotes:

  • This function works because adjusting an AutoFilter triggers recalculation, and Excel notices that this function depends on row heights. For hiding columns, it's a lot less reliable. All the calls to IsVisible have to be forced to recompute after the column is hidden or displayed. To do this, IsVisible can be marked as volatile and recalculation forced by pressing F9. This is a lousy solution.
  • To optimize performance, the function short-circuits its search. The Exit Function's bail out of the calculation as soon as the first hidden row or column is discovered.
  • Excel's SUBTOTAL intrinsic function is also sensitive to AutoFilter settings.
September 20, 2005

I really liked This post by Dick Kusleika, over on Daily Dose of Excel. I'm a big fan of controlling frequently used options with keyboard shortcuts. To riff on Mr. Kusleika's post a little, here's a refinement I've found useful in the past for macros like these. Basically this allows the same macro to toggle a state as well as non-destructively display the current state.

The first time the macro MaybeToggleMAR is invoked, it displays the current state in the status bar, and sets a timer to expire in 3 seconds. If the macro is invoked a second time before the timer expires (easy to do if it's bound to a keystroke) the state is toggled. Technically speaking, the trickiest bit is that the function that sets the 3 second timer also has to handle cancelling any previous instance of the same timer. It works without the timer cancellation, but without it, the UI behaves oddly after multiple keypresses in rapid succession.

Chip Pearson's website has useful content discussing the Excel API's for Timers and the Status Bar.

Here's the code: to use it, stick it in a module and bind MaybeToggleMAR to the keyboard shortcut of your choice.

Option Explicit
Private MARChangesEnabled As Boolean

Public NextDisableTime As Double

Sub DisableMARChanges()
    Application.StatusBar = False
    MARChangesEnabled = False
End Sub

Sub DisableMARChangesSoon()
    On Error Resume Next
    Application.OnTime NextDisableTime, "DisableMARChanges", , False
    
    NextDisableTime = Now + TimeSerial(0, 0, 3)
    Application.OnTime NextDisableTime, "DisableMARChanges", , True
End Sub


Sub MaybeToggleMAR()
    Dim NewStatusText As String
    
    NewStatusText = ""
    
    If MARChangesEnabled Then
        Application.MoveAfterReturn = Not Application.MoveAfterReturn
        NewStatusText = "Status changed: "
    Else
        MARChangesEnabled = True
        NewStatusText = "Second press will change status: "
    End If
            
    If Application.MoveAfterReturn Then
        NewStatusText = NewStatusText & "MoveAfterReturn Enabled"
    Else
        NewStatusText = NewStatusText & "MoveAfterReturn Disabled"
    End If
                
    Application.StatusBar = NewStatusText
    
    DisableMARChangesSoon
End Sub
September 20, 2005

This may be something of a suprise, but Excel has even gotten the attention of Microsoft Research. Simon Peyton Jones, Margaret Burnett, and Alan Blackwell have written a paper that describes "extensions to the Excel spreadsheet that integrate user-defined functions into the spreadsheet grid". Of course, Excel doesn't do this... but, I wonder if that should be "Excel doesn't do this yet".

As a sidenote, this reminds me a little of how LabView handled subfunction definitions: subfunctions are defined using the same visual tools as top-level functions. It worked, but 'felt' a little heavy weight in actual use.

August 24, 2005

I literally dreamed about this last night. It would be wonderful if Excel supported formulas like this:

=LET(value=MATCH(item,range,0), IF(ISERROR(value), 0, value))

If you're into Lisp-y languages, it'd look like this:

(let ((value (match item range 0)))
  (if (is-error? value) 0 value))

The function call =LET(name=binding, expression) would create a local range name named name, bound (equal) to the value returned by binding, to be used during the evaluation of expression. In the example above, during the evaluation of IF(ISERROR(value), 0, value))<, value would be bound to the value returned by MATCH(item, range, 0).

It's worth pointing out that this is slightly different from how normal Excel range names work. Range names in Excel work through textual substitution. With textual substitution, the initial expression would be logically equivalent to this:

=IF(ISERROR(MATCH(item, range, 0)), 0, MATCH(item, range, 0)))

In other words, Excel would treat every instance of value as if MATCH(item, range, 0) was explictly spelled out. This means there are two calls to MATCH and two potential searches through the range. While it's possible that Excel optimizes the second search away, I'm not sure that anybody outside of Microsoft can know for sure how this is handled.

Microsoft's current reccomendation for handling the specific ISERROR scenario in the first expression is this VBA function:

Function IfError(formula As Variant, show As String)

    On Error GoTo ErrorHandler

    If IsError(formula) Then
        IfError = show
    Else
        IfError = formula
    End If

    Exit Function

ErrorHandler:
    Resume Next

End Function

This isn't bad, but it requires that spreadsheet authors and readers understand VBA. It also imposes significant performance costs: calling into VBA from a worksheet takes time.

Older Articles...