Array Formulae

Array Formulae

Array formulae are one of Excel’s most powerful features, although not always the easiest to use. Look at Bob Umlas's excellent paper at http://www.emailoffice.com/excel/arrays-bobumlas.html for many examples of Array formulae, and also Chip Pearson's page at http://www.cpearson.com/excel/array.htm .

Single-cell array formulae are evaluated multiple times, depending on the number of cells referred to in the formula. This can take significant time, and may or may not be faster than the alternatives.
A range array formula covering multiple cells may be faster to calculate than individual formulae in each cell (although the speed advantage seems less in Excel97 and Excel2000).
Because array formulae and functions like SUM which reference ranges influence the sequence in which Excel calculates, you should try and avoid mixing row and column references or overlapping array references.

Minimise the Number of Array Operations

Minimise the number of cells and expressions evaluated.

Use helper columns or rows to take as many cell references as possible out of the array formulae

The main key to optimising the calculation speed of Array formulae is to make sure that the number of cells and expressions evaluated in the array formula is as small as possible.

Remember that an array formula is a bit like a volatile formula: if any single one of the cells it references has changed or is volatile or has been recalculated then the array formula will evaluate ALL the cells it references.

  • Take expressions and range references out of the array formulae into separate helper columns/rows. This will make much better use of Excel's smart recalculation process.
  • Don’t reference complete rows, or more rows and columns than you need.
  • Use Dynamic Range Names  where possible. Even though they are volatile its worthwhile because they minimise the size of the ranges.
  • Be careful with array formulae that reference both a row and a column: this forces the calculation of a rectangular range.
  • Use SUMPRODUCT if possible: it’s slightly faster than the equivalent array formula.

Array formulae SUM with multiple conditions

One frequent use of Array formulae is to do a sum with multiple conditions. This relatively easy to do, particularily if you use Excel's Conditional Sum Wizard, but often very slow. Usually there are alternative ways of getting the same result, but much faster.

Two good methods of speeding up multiple condition SUMs are:

  • Split out the multiple conditions into a column of helper formulae that return True or False for each row, then reference the helper column in a SUMIF or array formulae. Whilst it may seem that for a single array formula that this does not reduce the number of calculations, in fact most of the time this allows Excel's smart recalculation process to only recalculate the formulae in the helper column that NEED to be recalculated.
  • If the data can be sorted then a good technique is to count groups of rows and limit the array formulae to looking at the subset groups.

See SUMIF example for a worked example. The downloadable FastExcel Sample Problem also has an example which shows the method and the considerable time saving which is often achievable.

Using SUMPRODUCT for multiple condition array formulae.

There are some advantages to using SUMPRODUCT rather than SUM array formulae:

  • SUMPRODUCT does not have to be array-entered using Control-shift-enter.
  • SUMPRODUCT is usually slightly faster (5-10%).
  • SUMPRODUCT can be made to treat blanks and text as zero.

You can use SUMPRODUCT for multiple condition array formulae as follows:

SUMPRODUCT(--(Condition1),--(Condition2),RangetoSum)

where Condition1 & 2 are conditional expressions such as $A$1:$A$10000<=$Z4. Because conditional expresssions return True or False rather than numbers they need to be coerced to numbers inside the SUMPRODUCT function. You can do this using two minus signs (--) or by adding 0 (+0) or by multiplying by 1 (*1). Using -- is very slightly faster than +0 or *1.

Note that the size and shape of the ranges or arrays used in the conditional expressions and range to sum must be the same, and cannot contain entire columns.

You can also directly multiply the terms inside SUMPRODUCT rather than separate them by commas:

SUMPRODUCT((Condition1)*(Condition2)*RangetoSum)

But this is usually slightly slower than using the comma syntax, and gives an error if the range to sum contains a text value.
However it is slightly more flexible in that the range to sum may have for instance multiple columns when the conditions only have 1 column.

Using SUMPRODUCT to multiply and add ranges and arrays.

In cases like weighted average calculations where you need to multiply a range of numbers by another range of numbers and sum the results using the comma syntax for SUMPRODUCT can be 20-25% faster than an array entered SUM:

{=SUM($D$2:$D$10301*$E$2:$E$10301)}

=SUMPRODUCT($D$2:$D$10301*$E$2:$E$10301)

=SUMPRODUCT($D$2:$D$10301,$E$2:$E$10301)

These three formulae all produce the same result, but the third formula using the comma syntax for SUMPRODUCT only takes about 77% of the time to calculate that the other two formulae need.

Use DSUM instead of multiple condition array formulae.

If you only have a few array formulae sums with multiple conditions you may be able to use DSUM instead. DSUM is significantly faster than equivalent array formulae. The disadvantage of DSUM is that the criteria have to be in a separate range, which makes it impractical to use and maintain in many circumstances.

Use FastExcel AVLOOKUPS.

You can use the FastExcel Version 2 AVLOOKUPS function inside a SUM function to return multiple rows and SUM them. If you are doing multiple conditional sums, particularly on sorted data, AVLOOKUPS can be significantly faster than array formulae.

Array and Function Calculation Bottlenecks

Excel’s calculation engine is optimised to exploit array formulae and functions that reference ranges. However some unusual arrangements of these formulae and functions can sometimes, but not always, cause significantly increased calculation time.

If you find a calculation bottleneck involving array-formulae and range functions look for:

  • Partially overlapping references:
  • Array formulae/range functions which reference part of a block of cells that are calculated in another array formula/range function. This situation can frequently occur in time series analysis.
  • One set of formulae referencing by row, and a second set of formulae referencing the first set by column:
  • A large set of single-row array formulae covering a block of columns, with SUM functions at the foot of each column.

Advantages and Disadvantages of Array Formulae

Plus points of Array Formulae:

  • Array Formulae are concise: You can eliminate columns and rows by packing the calculations into an array formula.
  • Array Formulae are powerful: You can easily perform many complex calculations such as multiple conditional sums and counts using array formulae.
  • Array Formulae save disk space: Using array formulae covering a range of cells can reduce workbook size compared to equivalent individual formulae. Memory used does not usually seem to reduce significantly.
  • Array Formulae offer increased protection: Excel will only allow you to alter a complete array formula block so the user is prevented from accidentally changing a single formula.

Minus points of Array Formulae:

  • The Black-Box effect: Array Formulae can be complex and hard to understand. Most Excel users do not understand array formulae at all. This can reduce confidence and usability of your spreadsheet.
  • Calculation Overhead: Each time an array formula is calculated all of the virtual cells needed by the array formula are calculated, regardless of whether this is required or not. This may cause the array formula to be slower than a non-array equivalent set of formulae.
  • Requirement for all components of the array formulae to be the same size: This may require the array formula to perform a large number of unnecessary calculations. This problem frequently occurs in a SUM with multiple conditions problem. Sorting and then calculating the subset range that needs to be summed may often make significant calculation speed improvements. The SUMIF example shows how several hours of Array Calculations can be reduced to under a second.
© 2001-2006 Decision Models  Legal Disclaimer Privacy Statement E-Mail Webmaster