Sunday, May 29, 2011

Things worth to mention and remember (IV) - Data Loading

In this part of the series I'll cover some basics about data loading:

1. If you want to quickly load a large amount of data into a table or partition that gets truncated before the load, and indexes also need to be maintained, then it is probably faster to set the indexes to unusable before the load and rebuild them afterwards instead of letting the insert maintain the indexes. Note that even with a direct-path insert the maintenance of usable indexes will generate undo and redo, whereas a separate index rebuild doesn't generate undo and can also be run as a nologging operation if desired. However, as always, test for your particular situation/configuration, since the index maintenance as part of a direct-path insert is quite efficient and therefore might not be that much slower than separate index rebuild steps.
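As a rough sketch of this pattern (the names T, T_IDX1 and T_STAGE are just placeholders, and the NOLOGGING rebuild assumes you can live with the reduced recoverability):

-- Stop the index from being maintained during the load
ALTER INDEX t_idx1 UNUSABLE;

-- Direct-path load; with SKIP_UNUSABLE_INDEXES = TRUE (the default from 10g on)
-- the unusable non-unique index is simply ignored
INSERT /*+ APPEND */ INTO t
SELECT * FROM t_stage;

COMMIT;

-- Rebuild afterwards, optionally NOLOGGING (and/or PARALLEL)
ALTER INDEX t_idx1 REBUILD NOLOGGING;
ALTER INDEX t_idx1 LOGGING;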

There is, however, a simple yet important point to consider when attempting to load with unusable indexes:

A TRUNCATE automatically makes unusable indexes usable again

So the order in which the DDL statements are executed before the load matters:

- ALTER INDEX ... UNUSABLE
- TRUNCATE TABLE

With this order the data load will maintain the indexes, because the TRUNCATE issued after the ALTER INDEX has set the indexes to USABLE again!

The order that was probably intended is:

- TRUNCATE TABLE
- ALTER INDEX ... UNUSABLE

The same applies to a data load into a partition of a table, only the syntax of the two commands is slightly different.
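For illustration, a sketch of the correct sequence at table and at partition level (again with placeholder names; the partition variant assumes a locally partitioned index):

-- Table-level load
TRUNCATE TABLE t;
ALTER INDEX t_idx1 UNUSABLE;
-- ... load ...

-- Partition-level load
ALTER TABLE t TRUNCATE PARTITION p_2011_05;
ALTER INDEX t_idx1 MODIFY PARTITION p_2011_05 UNUSABLE;
-- or, to mark all local index partitions in one go:
-- ALTER TABLE t MODIFY PARTITION p_2011_05 UNUSABLE LOCAL INDEXES;
-- ... load ...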

2. Note that you cannot load into a segment that has a UNIQUE index defined on it which is in UNUSABLE state - you'll get the error "ORA-26026: unique index ... initially in unusable state" (when using a direct-path insert) or "ORA-01502: index ... or partition of such index is in unusable state" (when using a conventional insert), even with SKIP_UNUSABLE_INDEXES set to TRUE (the default from 10g on).

If you want a UNIQUE index that is not maintained during the data load, you need to drop it and re-create it after the load.
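A minimal sketch of the behaviour and the drop/re-create workaround (placeholder names, SKIP_UNUSABLE_INDEXES left at its default of TRUE):

TRUNCATE TABLE t;
ALTER INDEX t_uq UNUSABLE;                         -- t_uq is a UNIQUE index

INSERT /*+ APPEND */ INTO t SELECT * FROM t_stage;
-- => ORA-26026: unique index ... initially in unusable state

INSERT INTO t SELECT * FROM t_stage;
-- => ORA-01502: index ... or partition of such index is in unusable state

-- The unique index has to be dropped and re-created instead
DROP INDEX t_uq;
INSERT /*+ APPEND */ INTO t SELECT * FROM t_stage;
COMMIT;
CREATE UNIQUE INDEX t_uq ON t (id);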

There is, however, a way to circumvent this: you can support a UNIQUE or PRIMARY KEY constraint by means of a non-unique index. This way you can either make the constraint deferrable, or disable the constraint before the load while keeping the index in UNUSABLE state, and rebuild the index and re-enable the constraint afterwards.
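A sketch of that approach, again with placeholder names (T, T_PK_IDX, T_STAGE, column ID); note that a deferrable constraint is in any case policed via a non-unique index:

-- Support the primary key with a pre-created non-unique index
CREATE INDEX t_pk_idx ON t (id);

ALTER TABLE t ADD CONSTRAINT t_pk PRIMARY KEY (id)
USING INDEX t_pk_idx;

-- Load pattern
TRUNCATE TABLE t;
ALTER TABLE t DISABLE CONSTRAINT t_pk KEEP INDEX;
ALTER INDEX t_pk_idx UNUSABLE;

INSERT /*+ APPEND */ INTO t SELECT * FROM t_stage;
COMMIT;

ALTER INDEX t_pk_idx REBUILD;
ALTER TABLE t ENABLE CONSTRAINT t_pk;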

But you need to be aware of the following implications and side-effects of doing so. Note that depending on how you use the index and what your data manipulation patterns look like, they might not make any difference to your particular situation, but they can also have a very significant effect:

A non-unique index behaves differently from a unique index in several ways. Richard Foote has already covered these differences in great detail, and since I probably couldn't say it any better I'm only going to give short recaps here together with corresponding links to his posts.

- It requires slightly more space for the same amount of data (one additional length byte, see Richard Foote)
- Depending on the data manipulation patterns it might require potentially significantly more space as part of DML operations, because the index entries cannot be re-used within a transaction even when the same data gets re-inserted, see Richard Foote and again here (the ROWID is part of the key to make the index expression unique, and the ROWID cannot be re-used within a transaction because the row entries in the row directory of a block cannot be re-used within a transaction, see Jonathan Lewis)
- For non-unique indexes only: If you insert multiple rows with the same index key value, then depending on the ROWID there might also be differences in the efficiency of a potential index block split (50-50 vs. 90-10 (actually 99-1))
- At execution time an operation based on an INDEX UNIQUE SCAN is handled differently from an INDEX RANGE SCAN in terms of latching and optimizations of the consistent gets, so there might be measurable run-time differences (see Richard Foote). Note that this changes under certain circumstances from version 11g on, which I will cover in a separate blog series because it is a very interesting topic in its own right.

More on the topic by Richard Foote.

There are also a few oddities with this setup to be aware of:

- In 10.2 a unique/PK constraint supported by a non-unique index disables direct-path inserts, see here for more details

- In 11.1 and 11.2 a unique/PK constraint supported by a non-unique index allows direct-path inserts, but unfortunately also allows you to accidentally insert duplicates, see here for more details
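As a side note, a quick way to verify which of the two behaviours you actually get is to query the table right after the insert and before the commit (placeholder names again; this is a generic check, not specific to this setup):

INSERT /*+ APPEND */ INTO t SELECT * FROM t_stage;

SELECT COUNT(*) FROM t;
-- ORA-12838: cannot read/modify an object after modifying it in parallel
--   => the insert really was a direct-path insert
-- If the query just returns a count, the APPEND hint was silently ignored
-- and a conventional insert was used (the 10.2 behaviour described above)

ROLLBACK;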

3 comments:

  1. Randolf,
    thank you for the very instructive summary. I am a little bit surprised to see ORA-26026 - recently I made some tests with unusable unique indexes (in 11.1.0.7 and 11.2.0.1) and always got "ORA-01502: index 'DBADMIN.TEST_INTERVAL_IDX' or partition of such index is in unusable state" - do you know what the difference between the two error messages is?

    Apart from this I don't understand why it's not possible to defer the checking of duplicates in a unique index until I try to rebuild the index (as it is for a PK constraint). (I asked Richard Foote for his opinion, but he only acknowledged that the need to drop and recreate unique indexes was a bit of a pain: http://richardfoote.wordpress.com/2011/02/27/oracle11g-zero-sized-unusable-indexes-part-ii-nathan-adler/#comment-12687)

  2. Hi Martin,

    the difference regarding the error message is that I used direct-path insert for testing - and interestingly a conventional insert gives a different error message.

    I've updated the blog post accordingly, thanks for pointing this out.

    I agree that it is a bit unfortunate that an unusable unique index behaves like that.

    Randolf

  3. Thanks for sharing this.
