Array Formulae
Array formulae are one of Excels most powerful features, although
not always the easiest to use. Look at Bob Umlas's excellent paper
at http://www.emailoffice.com/excel/arrays-bobumlas.html
for many examples of Array formulae, and also Chip Pearson's page
at http://www.cpearson.com/excel/array.htm
.
Single-cell array formulae are evaluated multiple times, depending
on the number of cells referred to in the formula. This can take significant
time, and may or may not be faster than the alternatives.
A range array formula covering multiple cells may be faster to calculate
than individual formulae in each cell (although the speed advantage
seems less in Excel97 and Excel2000).
Because array formulae and functions like SUM which reference ranges
influence the sequence in which Excel calculates, you should try and
avoid mixing row and column references or overlapping array references.
Minimise the Number of Array Operations
| Minimise the number
of cells and expressions evaluated.
Use helper columns or rows to take as many
cell references as possible out of the array formulae |
The main key to optimising the calculation speed
of Array formulae is to make sure that the number of cells and
expressions evaluated in the array formula is as small as possible.
Remember that an array formula is a bit like a volatile formula:
if any single one of the cells it references has changed or
is volatile or has been recalculated then the array formula
will evaluate ALL the cells it references.
- Take expressions and range references out of the array
formulae into separate helper columns/rows. This will make
much better use of Excel's smart recalculation process.
- Dont reference complete rows, or more rows and columns
than you need.
- Use Dynamic Range Names where
possible. Even though they are volatile its worthwhile because
they minimise the size of the ranges.
- Be careful with array formulae that reference both a row
and a column: this forces the calculation of a rectangular
range.
- Use SUMPRODUCT if possible:
its slightly faster than the equivalent array formula.
|
Array formulae SUM with multiple conditions
One frequent use of Array formulae is to do a sum with multiple conditions.
This relatively easy to do, particularily if you use Excel's Conditional
Sum Wizard, but often very slow. Usually there are alternative ways
of getting the same result, but much faster.
Two good methods of speeding up multiple condition SUMs are:
- Split out the multiple conditions into a column of helper formulae
that return True or False for each row, then reference the helper
column in a SUMIF or array formulae. Whilst it may seem that for
a single array formula that this does not reduce the number of calculations,
in fact most of the time this allows Excel's smart recalculation
process to only recalculate the formulae in the helper column that
NEED to be recalculated.
- If the data can be sorted then a good technique is to count groups
of rows and limit the array formulae to looking at the subset groups.
See SUMIF example for a worked example.
The downloadable FastExcel Sample Problem
also has an example which shows the method and the considerable time
saving which is often achievable.
Using SUMPRODUCT for multiple condition array formulae.
There are some advantages to using SUMPRODUCT rather than SUM array
formulae:
- SUMPRODUCT does not have to be array-entered using Control-shift-enter.
- SUMPRODUCT is usually slightly faster (5-10%).
- SUMPRODUCT can be made to treat blanks and text as zero.
You can use SUMPRODUCT for multiple condition array formulae
as follows:
SUMPRODUCT(--(Condition1),--(Condition2),RangetoSum)
where Condition1 & 2 are conditional expressions such as $A$1:$A$10000<=$Z4.
Because conditional expresssions return True or False rather than
numbers they need to be coerced to numbers inside the SUMPRODUCT function.
You can do this using two minus signs (--) or by adding 0 (+0) or
by multiplying by 1 (*1). Using -- is very slightly faster than +0
or *1.
Note that the size and shape of the ranges or arrays used in the
conditional expressions and range to sum must be the same, and cannot
contain entire columns.
You can also directly multiply the terms inside SUMPRODUCT rather
than separate them by commas:
SUMPRODUCT((Condition1)*(Condition2)*RangetoSum)
But this is usually slightly slower than using the comma syntax,
and gives an error if the range to sum contains a text value.
However it is slightly more flexible in that the range to sum may
have for instance multiple columns when the conditions only have 1
column.
Using SUMPRODUCT to multiply and add ranges and arrays.
In cases like weighted average calculations where you need to multiply
a range of numbers by another range of numbers and sum the results
using the comma syntax for SUMPRODUCT can be 20-25% faster than an
array entered SUM:
{=SUM($D$2:$D$10301*$E$2:$E$10301)}
=SUMPRODUCT($D$2:$D$10301*$E$2:$E$10301)
=SUMPRODUCT($D$2:$D$10301,$E$2:$E$10301)
These three formulae all produce the same result, but the third formula
using the comma syntax for SUMPRODUCT only takes about 77% of the
time to calculate that the other two formulae need.
Use DSUM instead of multiple condition array formulae.
If you only have a few array formulae sums with multiple conditions
you may be able to use DSUM instead. DSUM is significantly faster
than equivalent array formulae. The disadvantage of DSUM is that the
criteria have to be in a separate range, which makes it impractical
to use and maintain in many circumstances.
Use FastExcel AVLOOKUPS.
You can use the FastExcel Version 2 AVLOOKUPS function inside a
SUM function to return multiple rows and SUM them. If you are doing
multiple conditional sums, particularly on sorted data, AVLOOKUPS
can be significantly faster than array formulae.
Array and Function Calculation Bottlenecks
Excels calculation engine is optimised to exploit array formulae
and functions that reference ranges. However some unusual arrangements
of these formulae and functions can sometimes, but not always, cause
significantly increased calculation time.
If you find a calculation bottleneck involving array-formulae and
range functions look for:
- Partially overlapping references:
- Array formulae/range functions which reference part of a block
of cells that are calculated in another array formula/range function.
This situation can frequently occur in time series analysis.
- One set of formulae referencing by row, and a second set of formulae
referencing the first set by column:
- A large set of single-row array formulae covering a block of
columns, with SUM functions at the
foot of each column.
Advantages and Disadvantages of Array Formulae
Plus points of Array Formulae:
- Array Formulae are concise: You can eliminate columns and rows
by packing the calculations into an array formula.
- Array Formulae are powerful: You can easily perform many complex
calculations such as multiple conditional sums and counts using
array formulae.
- Array Formulae save disk space: Using array formulae covering
a range of cells can reduce workbook size compared to equivalent
individual formulae. Memory used does not usually seem to reduce
significantly.
- Array Formulae offer increased protection: Excel will only allow
you to alter a complete array formula block so the user is prevented
from accidentally changing a single formula.
Minus points of Array Formulae:
- The Black-Box effect: Array Formulae can be complex and hard
to understand. Most Excel users do not understand array formulae
at all. This can reduce confidence and usability of your spreadsheet.
- Calculation Overhead: Each time an array formula is calculated
all of the virtual cells needed by the array formula are calculated,
regardless of whether this is required or not. This may cause the
array formula to be slower than a non-array equivalent set of formulae.
- Requirement for all components of the array formulae to be the
same size: This may require the array formula to perform a large
number of unnecessary calculations. This problem frequently occurs
in a SUM with multiple conditions problem. Sorting and then calculating
the subset range that needs to be summed may often make significant
calculation speed improvements. The SUMIF
example shows how several hours of Array Calculations can be
reduced to under a second.
|