Archive for June, 2009

TeradataTip6 – Let the database recommend which statistics to collect

Tuesday, June 2nd, 2009

Teradata uses statistics to determine the most efficient query plan. Sometimes it is difficult to decide which statistics to collect. Collecting statistics requires a full table scan and you don’t want to do it unnecessarily, but not collecting the required statistics can lead to inefficient queries and increased resource consumption.

The following technique lets Teradata tell you which statistics it could use to improve the query plan for a specific query. To demonstrate, first, we set up some test tables.

CREATE TABLE A(k INTEGER, fk1 INTEGER, fk2 INTEGER)
PRIMARY INDEX(k);
 
CREATE TABLE B(kb1 INTEGER, kb2 INTEGER)
PRIMARY INDEX(kb1);

Now we turn on requesting statistics recommendations like this. The following can be entered as SQL using BTEQ or Teradata SQL Assistant.

diagnostic helpstats on for session;

Then when we ask for an EXPLAIN, we get statistics recommendations after the explain plan text.

EXPLAIN
SELECT * FROM a JOIN b
ON a.fk1 = b.kb1 AND a.fk2 = b.kb2;

Note the recommended COLLECT STATISTICS commands at the end.

Explanation
---------------------------------------------------------------------------
  1) First, we lock a distinct TEST_DB."pseudo table" for read on
     a RowHash to prevent global deadlock for TEST_DB.b.
  2) Next, we lock a distinct TEST_DB."pseudo table" for read on a
     RowHash to prevent global deadlock for TEST_DB.a.
  3) We lock TEST_DB.b for read, and we lock TEST_DB.a for
     read.
  4) We do an all-AMPs RETRIEVE step from TEST_DB.a by way of an
     all-rows scan with a condition of ("(NOT (TEST_DB.a.fk2 IS
     NULL )) AND (NOT (TEST_DB.a.fk1 IS NULL ))") into Spool 2
     (all_amps), which is redistributed by hash code to all AMPs.  Then
     we do a SORT to order Spool 2 by row hash.  The size of Spool 2 is
     estimated with no confidence to be 96 rows.  The estimated time
     for this step is 0.00 seconds.
  5) We do an all-AMPs JOIN step from TEST_DB.b by way of a
     RowHash match scan with a condition of ("NOT (TEST_DB.b.kb1
     IS NULL)"), which is joined to Spool 2 (Last Use) by way of a
     RowHash match scan.  TEST_DB.b and Spool 2 are joined using a
     merge join, with a join condition of ("(fk1 = TEST_DB.b.kb1)
     AND (fk2 = TEST_DB.b.kb2)").  The result goes into Spool 1
     (group_amps), which is built locally on the AMPs.  The size of
     Spool 1 is estimated with no confidence to be 9,216 rows.  The
     estimated time for this step is 0.03 seconds.
  6) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.  The total estimated time is 0.03 seconds.
     BEGIN RECOMMENDED STATS ->
  7) "COLLECT STATISTICS TEST_DB.b COLUMN KB1".  (HighConf)
  8) "COLLECT STATISTICS TEST_DB.b COLUMN (KB2 ,KB1)".  (HighConf)
  9) "COLLECT STATISTICS TEST_DB.b COLUMN KB2".  (HighConf)
 10) "COLLECT STATISTICS TEST_DB.a COLUMN K".  (HighConf)
 11) "COLLECT STATISTICS TEST_DB.a COLUMN (FK2 ,FK1)".  (HighConf)
 12) "COLLECT STATISTICS TEST_DB.a COLUMN FK2".  (HighConf)
 13) "COLLECT STATISTICS TEST_DB.a COLUMN FK1".  (HighConf)
     <- END RECOMMENDED STATS

You can run the COLLECT STATISTICS commands that appear in double-quotes in order to provide Teradata with the data demographic information that it needs for an efficient plan.

TeradataTip5 – How to find join indexes defined on your table

Monday, June 1st, 2009

If you try to drop or mload a table that has a join index defined on it, Teradata will give you an error:

5467:  Cannot drop or MLOAD a table with join or hash indexes. 

To find the join indexes that are defined on your table, run the following query:

SELECT  * FROM dbc.indices 
WHERE indextype='J'
     AND databasename='your_database_name'
     AND tablename='your_table_name'
ORDER BY indexname,columnposition;