Thursday, January 8, 2015

"SELECT * FROM TABLE" Runs Out Of TEMP Space

Now that I've shown in the previous post in general that sometimes Parallel Execution plans might end up with unnecessary BUFFER SORT operations, let's have a look what particular side effects are possible due to this.

What would you say if someone tells you that (s)he just did a simple, straightforward "SELECT * FROM TABLE" that took several minutes to execute without returning, only to then error out with "ORA-01652 unable to extend temp segment", and the TABLE in question is actually nothing but a simple, partitioned heap table, so no special tricks, no views, synonyms, VPD etc. involved, it's really just a plain simple table?

Some time ago I was confronted with such a case at a client. Of course, the first question is, why would someone run a plain SELECT * FROM TABLE, but nowadays with power users and developers using GUI based tools like TOAD or SQLDeveloper, this is probably the GUI approach of a table describe command. Since these tools by default show the results in a grid that only fetches the first n rows, this typically isn't really a threat even in case of large tables, besides the common problems with allocated PX servers in case the table is queried using Parallel Execution, and the users simply keep the grid/cursor open and hence don't allow re-using the PX servers for different executions.

But have a look at the following output, in this case taken from 12.1.0.2, but assuming the partitioned table T_PART in question is marked parallel, resides on Exadata, has many partitions that are compressed via HCC, that uncompressed represent several TB of data (11.2.0.4 on Exadata produces a similar plan):

SQL> explain plan for
  2  select * from t_part p;

Explained.

SQL>
SQL> select * from table(dbms_xplan.display(format => 'BASIC PARTITION PARALLEL'));
Plan hash value: 2545275170

------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                            | Name       | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                                     |            |       |       |        |      |            |
|   1 |  PX COORDINATOR                                      |            |       |       |        |      |            |
|   2 |   PX SEND QC (RANDOM)                                | :TQ10002   |       |       |  Q1,02 | P->S | QC (RAND)  |
|   3 |    BUFFER SORT                                       |            |       |       |  Q1,02 | PCWP |            |
|   4 |     VIEW                                             | VW_TE_2    |       |       |  Q1,02 | PCWP |            |
|   5 |      UNION-ALL                                       |            |       |       |  Q1,02 | PCWP |            |
|   6 |       CONCATENATION                                  |            |       |       |  Q1,02 | PCWP |            |
|   7 |        BUFFER SORT                                   |            |       |       |  Q1,02 | PCWC |            |
|   8 |         PX RECEIVE                                   |            |       |       |  Q1,02 | PCWP |            |
|   9 |          PX SEND ROUND-ROBIN                         | :TQ10000   |       |       |        | S->P | RND-ROBIN  |
|  10 |           BUFFER SORT                                |            |       |       |        |      |            |
|  11 |            PARTITION RANGE SINGLE                    |            |     2 |     2 |        |      |            |
|  12 |             TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T_PART     |     2 |     2 |        |      |            |
|* 13 |              INDEX RANGE SCAN                        | T_PART_IDX |     2 |     2 |        |      |            |
|  14 |        BUFFER SORT                                   |            |       |       |  Q1,02 | PCWC |            |
|  15 |         PX RECEIVE                                   |            |       |       |  Q1,02 | PCWP |            |
|  16 |          PX SEND ROUND-ROBIN                         | :TQ10001   |       |       |        | S->P | RND-ROBIN  |
|  17 |           BUFFER SORT                                |            |       |       |        |      |            |
|  18 |            PARTITION RANGE SINGLE                    |            |     4 |     4 |        |      |            |
|  19 |             TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T_PART     |     4 |     4 |        |      |            |
|* 20 |              INDEX RANGE SCAN                        | T_PART_IDX |     4 |     4 |        |      |            |
|  21 |        PX BLOCK ITERATOR                             |            |     6 |    20 |  Q1,02 | PCWC |            |
|* 22 |         TABLE ACCESS FULL                            | T_PART     |     6 |    20 |  Q1,02 | PCWP |            |
|  23 |       PX BLOCK ITERATOR                              |            |KEY(OR)|KEY(OR)|  Q1,02 | PCWC |            |
|* 24 |        TABLE ACCESS FULL                             | T_PART     |KEY(OR)|KEY(OR)|  Q1,02 | PCWP |            |
------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

  13 - access("P"."DT">=TO_DATE(' 2001-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "P"."DT"<TO_DATE(' 2002-01-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))
  20 - access("P"."DT">=TO_DATE(' 2003-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "P"."DT"<TO_DATE(' 2004-01-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))
       filter(LNNVL("P"."DT"<TO_DATE(' 2002-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) OR LNNVL("P"."DT">=TO_DATE(' 2001-01-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss')))
  22 - filter("P"."DT">=TO_DATE(' 2005-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "P"."DT"<TO_DATE(' 2020-01-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss') AND (LNNVL("P"."DT"<TO_DATE(' 2004-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) OR LNNVL("P"."DT">=TO_DATE(' 2003-01-01 00:00:00',
              'syyyy-mm-dd hh24:mi:ss'))) AND (LNNVL("P"."DT"<TO_DATE(' 2002-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) OR LNNVL("P"."DT">=TO_DATE(' 2001-01-01
              00:00:00', 'syyyy-mm-dd hh24:mi:ss'))))
  24 - filter("P"."DT"<TO_DATE(' 2005-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "P"."DT">=TO_DATE(' 2004-01-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss') OR "P"."DT"<TO_DATE(' 2001-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') OR "P"."DT"<TO_DATE(' 2003-01-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss') AND "P"."DT">=TO_DATE(' 2002-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

Can you spot the problem? It's again the "unnecessary BUFFER SORTS" problem introduced in the previous post. In particular the operation ID = 3 BUFFER SORT is "deadly" if the table T_PART is large, because it needs to buffer the whole table data before any row will be returned to the client. This explains why this simple SELECT * FROM T_PART will potentially run out of TEMP space, assuming the uncompressed table data is larger in size than the available TEMP space. Even if it doesn't run out of TEMP space it will be a totally inefficient operation, copying all table data to PGA (unlikely sufficient) respectively TEMP before returning any rows to the client.

But why does a simple SELECT * FROM TABLE come up with such an execution plan? A hint is the VW_TE_2 alias shown in the NAME column of the plan: It's the result of the "table expansion" transformation that was introduced in 11.2 allowing to set some partition's local indexes to unusable but still make use of the usable index partitions of other partitions. It takes a bit of effort to bring the table into a state where such a plan will be produced for a plain SELECT * FROM TABLE, but as you can see, it is possible. And as you can see from the CONCATENATION operation in the plan, the transformed query produced by the "table expansion" then triggered another transformation, the "concatenation" transformation mentioned in the previous post, that then results in the addition of unnecessary BUFFER SORT operations when combined with Parallel Execution.

Here is a manual rewrite that corresponds to the query that is the result of both, the "table expansion" and the "concatenation" transformation:
select * from (
select /*+ opt_param('_optimizer_table_expansion', 'false') */ * from t_part p where
("P"."DT">=TO_DATE(' 2001-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "P"."DT"<TO_DATE(' 2002-01-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))
union all
select * from t_part p where
("P"."DT">=TO_DATE(' 2003-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "P"."DT"<TO_DATE(' 2004-01-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))
and
(LNNVL("P"."DT"<TO_DATE(' 2002-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) OR LNNVL("P"."DT">=TO_DATE(' 2001-01-01 00:00:00',
              'syyyy-mm-dd hh24:mi:ss')))
union all
select * from t_part p where
("P"."DT">=TO_DATE(' 2005-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "P"."DT"<TO_DATE(' 2020-01-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss') AND (LNNVL("P"."DT"<TO_DATE(' 2004-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) OR LNNVL("P"."DT">=TO_DATE(' 2003-01-01 00:00:00',
              'syyyy-mm-dd hh24:mi:ss'))) AND (LNNVL("P"."DT"<TO_DATE(' 2002-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) OR LNNVL("P"."DT">=TO_DATE('
              2001-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))))
)
union all
select * from t_part p where
("P"."DT"<TO_DATE(' 2005-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "P"."DT">=TO_DATE(' 2004-01-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss') OR "P"."DT"<TO_DATE(' 2001-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') OR "P"."DT"<TO_DATE(' 2003-01-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss') AND "P"."DT">=TO_DATE(' 2002-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
;

But if you run an EXPLAIN PLAN on above manual rewrite, then 12.1.0.2 produces the following simple and elegant plan:
--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                        | Name       | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                                 |            |       |       |        |      |            |
|   1 |  PX COORDINATOR                                  |            |       |       |        |      |            |
|   2 |   PX SEND QC (RANDOM)                            | :TQ10000   |       |       |  Q1,00 | P->S | QC (RAND)  |
|   3 |    UNION-ALL                                     |            |       |       |  Q1,00 | PCWP |            |
|   4 |     VIEW                                         |            |       |       |  Q1,00 | PCWP |            |
|   5 |      UNION-ALL                                   |            |       |       |  Q1,00 | PCWP |            |
|   6 |       PX SELECTOR                                |            |       |       |  Q1,00 | PCWP |            |
|   7 |        PARTITION RANGE SINGLE                    |            |     2 |     2 |  Q1,00 | PCWP |            |
|   8 |         TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T_PART     |     2 |     2 |  Q1,00 | PCWP |            |
|*  9 |          INDEX RANGE SCAN                        | T_PART_IDX |     2 |     2 |  Q1,00 | PCWP |            |
|  10 |       PX SELECTOR                                |            |       |       |  Q1,00 | PCWP |            |
|  11 |        PARTITION RANGE SINGLE                    |            |     4 |     4 |  Q1,00 | PCWP |            |
|  12 |         TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T_PART     |     4 |     4 |  Q1,00 | PCWP |            |
|* 13 |          INDEX RANGE SCAN                        | T_PART_IDX |     4 |     4 |  Q1,00 | PCWP |            |
|  14 |       PX BLOCK ITERATOR                          |            |     6 |    20 |  Q1,00 | PCWC |            |
|* 15 |        TABLE ACCESS FULL                         | T_PART     |     6 |    20 |  Q1,00 | PCWP |            |
|  16 |     PX BLOCK ITERATOR                            |            |KEY(OR)|KEY(OR)|  Q1,00 | PCWC |            |
|* 17 |      TABLE ACCESS FULL                           | T_PART     |KEY(OR)|KEY(OR)|  Q1,00 | PCWP |            |
--------------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   9 - access("P"."DT">=TO_DATE(' 2001-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "P"."DT"<TO_DATE(' 2002-01-01 00:00:00', 'syyyy-mm-dd 
              hh24:mi:ss'))
  13 - access("P"."DT">=TO_DATE(' 2003-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "P"."DT"<TO_DATE(' 2004-01-01 00:00:00', 'syyyy-mm-dd 
              hh24:mi:ss'))
       filter(LNNVL("P"."DT"<TO_DATE(' 2002-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) OR LNNVL("P"."DT">=TO_DATE(' 2001-01-01 00:00:00', 
              'syyyy-mm-dd hh24:mi:ss')))
  15 - filter((LNNVL("P"."DT"<TO_DATE(' 2004-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) OR LNNVL("P"."DT">=TO_DATE(' 2003-01-01 00:00:00', 
              'syyyy-mm-dd hh24:mi:ss'))) AND (LNNVL("P"."DT"<TO_DATE(' 2002-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) OR LNNVL("P"."DT">=TO_DATE(' 2001-01-01 
              00:00:00', 'syyyy-mm-dd hh24:mi:ss'))))
  17 - filter("P"."DT"<TO_DATE(' 2005-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "P"."DT">=TO_DATE(' 2004-01-01 00:00:00', 'syyyy-mm-dd 
              hh24:mi:ss') OR "P"."DT"<TO_DATE(' 2001-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') OR "P"."DT"<TO_DATE(' 2003-01-01 00:00:00', 'syyyy-mm-dd 
              hh24:mi:ss') AND "P"."DT">=TO_DATE(' 2002-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

I've disabled the "table expansion" transformation in this case, because it kicks in again when optimizing this query and just adds some harmless (and useless) branches to the plan that confuse the issue. Without those additional, useless branches it is very similar to the above plan, but without any BUFFER SORT operations, hence it doesn't cause any overhead and should return the first rows rather quickly, no matter how large the table is.

The 11.2.0.4 optimizer unfortunately again adds unnecessary BUFFER SORT operations even to the manual rewrite above, so as mentioned in the previous post the problem of those spurious BUFFER SORTs isn't limited to the CONCATENATION transformation.

Of course, since all this is related to Parallel Execution, a simple workaround to the problem is to run the SELECT * FROM TABLE using a NO_PARALLEL hint, and all those strange side effects of BUFFER SORTS will be gone. And not having unusable local indexes will also prevent the problem, because then the "table expansion" transformation won't kick in.

Interestingly, if the optimizer is told about the true intention of initially fetching only the first n rows from the SELECT * FROM TABLE - for example simply by adding a corresponding FIRST_ROWS(n) hint - at least in my tests using 12.1.0.2 all the complex transformations were rejected and a plain (parallel) FULL TABLE SCAN was preferred instead, simply because it is now differently costed, which would allow working around the problem, too.

If you want to reproduce the issue, here's a sample table definition, along with some comments what I had to do to bring it into the state required to reproduce:
-- The following things have to come together to turn a simple SELECT * from partitioned table into a complex execution plan
-- including Table Expansion and Concatenation:
--
-- - Unusable index partitions to trigger Table Expansion
-- - Partitions with usable indexes that are surrounded by partitions with unusable indexes
-- - And such a partition needs to have an index access path that is cheaper than a corresponding FTS, typically by deleting the vast majority of rows without resetting the HWM
-- - All this also needs to be reflected properly in the statistics
--
-- If this scenario is combined with Parallel Execution the "Parallel Concatenation" bug that plasters the plan with superfluous BUFFER SORT will lead to the fact
-- that the whole table will have to be kept in memory / TEMP space when running SELECT * from the table, because the bug adds, among many other BUFFER SORTs, one deadly BUFFER SORT
-- on top level before returning data to the coordinator, typically operation ID = 3
--
create table t_part (dt not null, id not null, filler)
partition by range (dt)
(
partition p_1 values less than (date '2001-01-01'),
partition p_2 values less than (date '2002-01-01'),
partition p_3 values less than (date '2003-01-01'),
partition p_4 values less than (date '2004-01-01'),
partition p_5 values less than (date '2005-01-01'),
partition p_6 values less than (date '2006-01-01'),
partition p_7 values less than (date '2007-01-01'),
partition p_8 values less than (date '2008-01-01'),
partition p_9 values less than (date '2009-01-01'),
partition p_10 values less than (date '2010-01-01'),
partition p_11 values less than (date '2011-01-01'),
partition p_12 values less than (date '2012-01-01'),
partition p_13 values less than (date '2013-01-01'),
partition p_14 values less than (date '2014-01-01'),
partition p_15 values less than (date '2015-01-01'),
partition p_16 values less than (date '2016-01-01'),
partition p_17 values less than (date '2017-01-01'),
partition p_18 values less than (date '2018-01-01'),
partition p_19 values less than (date '2019-01-01'),
partition p_20 values less than (date '2020-01-01')
)
as
with generator as
(
  select /*+ cardinality(1000) */ rownum as id, rpad('x', 100) as filler from dual connect by level <= 1e3
)
select
        add_months(date '2000-01-01', trunc(
        case
        when id >= 300000 and id < 700000 then id + 100000
        when id >= 700000 then id + 200000
        else id
        end / 100000) * 12) as dt
      , id
      , filler
from    (
          select
                  (a.id + (b.id - 1) * 1e3) - 1 + 100000 as id
                , rpad('x', 100) as filler
          from
                  generator a,
                  generator b
        )
;

delete from t_part partition (p_2);

commit;

exec dbms_stats.gather_table_stats(null, 't_part')

create unique index t_part_idx on t_part (dt, id) local;

alter index t_part_idx modify partition p_1 unusable;

alter index t_part_idx modify partition p_3 unusable;

alter index t_part_idx modify partition p_5 unusable;

alter table t_part parallel;

alter index t_part_idx parallel;

set echo on pagesize 0 linesize 200

explain plan for
select * from t_part p;

select * from table(dbms_xplan.display(format => 'BASIC PARTITION PARALLEL'));

No comments:

Post a Comment