Teradata uses statistics to determine the most efficient query plan. Sometimes it is difficult to decide which statistics to collect. Collecting statistics requires a full table scan and you don’t want to do it unnecessarily, but not collecting the required statistics can lead to inefficient queries and increased resource consumption.
The following technique lets Teradata tell you which statistics it could use to improve the query plan for a specific query. To demonstrate, first, we set up some test tables.
CREATE TABLE A(k INTEGER, fk1 INTEGER, fk2 INTEGER) PRIMARY INDEX(k); CREATE TABLE B(kb1 INTEGER, kb2 INTEGER) PRIMARY INDEX(kb1); |
Now we turn on requesting statistics recommendations like this. The following can be entered as SQL using BTEQ or Teradata SQL Assistant.
diagnostic helpstats on for session;
Then when we ask for an EXPLAIN, we get statistics recommendations after the explain plan text.
EXPLAIN SELECT * FROM a JOIN b ON a.fk1 = b.kb1 AND a.fk2 = b.kb2; |
Note the recommended COLLECT STATISTICS commands at the end.
Explanation --------------------------------------------------------------------------- 1) First, we lock a distinct TEST_DB."pseudo table" for read on a RowHash to prevent global deadlock for TEST_DB.b. 2) Next, we lock a distinct TEST_DB."pseudo table" for read on a RowHash to prevent global deadlock for TEST_DB.a. 3) We lock TEST_DB.b for read, and we lock TEST_DB.a for read. 4) We do an all-AMPs RETRIEVE step from TEST_DB.a by way of an all-rows scan with a condition of ("(NOT (TEST_DB.a.fk2 IS NULL )) AND (NOT (TEST_DB.a.fk1 IS NULL ))") into Spool 2 (all_amps), which is redistributed by hash code to all AMPs. Then we do a SORT to order Spool 2 by row hash. The size of Spool 2 is estimated with no confidence to be 96 rows. The estimated time for this step is 0.00 seconds. 5) We do an all-AMPs JOIN step from TEST_DB.b by way of a RowHash match scan with a condition of ("NOT (TEST_DB.b.kb1 IS NULL)"), which is joined to Spool 2 (Last Use) by way of a RowHash match scan. TEST_DB.b and Spool 2 are joined using a merge join, with a join condition of ("(fk1 = TEST_DB.b.kb1) AND (fk2 = TEST_DB.b.kb2)"). The result goes into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with no confidence to be 9,216 rows. The estimated time for this step is 0.03 seconds. 6) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request. -> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.03 seconds. BEGIN RECOMMENDED STATS -> 7) "COLLECT STATISTICS TEST_DB.b COLUMN KB1". (HighConf) 8) "COLLECT STATISTICS TEST_DB.b COLUMN (KB2 ,KB1)". (HighConf) 9) "COLLECT STATISTICS TEST_DB.b COLUMN KB2". (HighConf) 10) "COLLECT STATISTICS TEST_DB.a COLUMN K". (HighConf) 11) "COLLECT STATISTICS TEST_DB.a COLUMN (FK2 ,FK1)". (HighConf) 12) "COLLECT STATISTICS TEST_DB.a COLUMN FK2". (HighConf) 13) "COLLECT STATISTICS TEST_DB.a COLUMN FK1". (HighConf) <- END RECOMMENDED STATS
You can run the COLLECT STATISTICS commands that appear in double-quotes in order to provide Teradata with the data demographic information that it needs for an efficient plan.