Wednesday, December 10, 2008

The Importance of COMPUTE

In Monday's Windows-Style ORDER BY I answered the question "How do I use ORDER BY to sort alphanumeric strings the same way Windows Explorer sorts file names?" as follows:

"By using CREATE FUNCTION to return an alternative string value that will have the desired result when used in ORDER BY, and then (optionally) calling that function in a new COMPUTE column."
I really should have left out "(optionally)"... the COMPUTE clause isn't just some cute technique to push program code down into the database, it's a valuable performance improvement tool. And without it, in this case, queries run about as fast as continental drift.

Here is an example of a table without the COMPUTE column, followed by a query that refers to the user-defined function in the ORDER BY (see Windows-Style ORDER BY for the reformat code):
CREATE TABLE slow (
original_name LONG VARCHAR NOT NULL PRIMARY KEY );

SELECT TOP 100 START AT 200 *
FROM slow
ORDER BY reformat ( slow.original_name );
When the table contained 1000 rows and the cache was empty, SQL Anywhere 11 took over 7 seconds to run the query on a 3.4GHz Pentium 4 computer.

Here's what the plan looked like... a sequential table scan, no surprises there:



The table scan is just the beginning of the problem; it doesn't take 7 seconds to sort 1000 rows, but it does take that time to call the user defined function 1000 times.

Now consider a table with the COMPUTE column, plus a CLUSTERED INDEX on that column, followed by a query that refers to that column in the ORDER BY:
CREATE TABLE fast (
original_name LONG VARCHAR NOT NULL PRIMARY KEY,
alternate_name LONG VARCHAR NOT NULL
COMPUTE ( reformat ( original_name ) ) );

CREATE CLUSTERED INDEX xalternate_name
ON fast ( alternate_name );

SELECT TOP 100 START AT 200 *
FROM fast
ORDER BY fast.alternate_name;
On the same 1000 rows, same empty cache, same computer, the query now took less than 1 second. Now the plan looked better, an index scan:



But wait, there's more! If you code the ORDER BY to use the function call instead of the computed column, SQL Anywhere STILL finds the index and uses it, and it STILL takes less than one second:
SELECT TOP 100 START AT 200 *
FROM fast
ORDER BY reformat ( fast.original_name );
I kid you not; here's the plan... fast.alternate_name does not appear anywhere in the SELECT but the index xalternate_name is still being used:



There is a down side to this COMPUTE clause: the user defined function must be called for each row that is inserted or updated. In this test it took almost 7 seconds to run a LOAD TABLE with COMPUTES ON... but that's a one-time cost, whereas queries tend to be repeated over and over again:
LOAD TABLE fast ( original_name ) 
FROM 'file_names.txt'
COMPUTES ON;

No comments: