Doc ID: Note:176135.1
Subject: How CTX_DDL.OPTIMIZE_INDEX really works
Type: BULLETIN
Status: PUBLISHED
Content Type: TEXT/PLAIN
Creation Date: 12-FEB-2002
Last Revision Date: 31-DEC-2002
PURPOSE 

-------

 

Explain what ctx_ddl.optimize_index does and how it works.

Also answers the question what is the performance difference between optimizing

a Text index in small batches versus running it with maxtime unlimited.

 

SCOPE & APPLICATION

-------------------

 

Intended for anyone who works with large Oracle Text indexes.

 

 

HOW CTX_DDL.OPTIMIZE_INDEX WORKS

--------------------------------

 

See [NOTE:104262.1] for an excellent overview of DML processing in

Oracle Text.

 

In addition to this, you may have some remaining questions on the performance

implications of running ctx_ddl.optimize_index.

 

Some background information first:

 

If you haven't run an optimize job on a large index (or an index on a very active

table) for a while, you may have thousands or even millions of rows that need

to be dealt with.

Remember that any deletion or update will cause an entry to be inserted into

the index's $N table identifying that row for garbage collection (ie. deletion).

 

 

With versions 8.1.5 - 9.0.1 a FULL optimize can only delete up to 16,000

docid's per run as it optimizes the index, regardless of how much time you allow

it to run for. Once it has cleaned up 16,000 docid's you can submit the optimize

job again to process more documents. The best way to check the progress of your

index optimize is actually by executing the following sql statement from SQL*Plus,

logged on as CTXSYS:

 

col idx_opt_token format a20

select idx_name,idx_opt_token,idx_opt_type,idx_opt_count from dr$index;

 

IDX_NAME IDX_OPT_TOKEN IDX_OPT_TYPE IDX_OPT_COUNT

------------------------------ -------------------- ------------ -------------

TAR_TABLE_TOOLS_CTXIDX BLOCKAND 0 3660

TAR_TABLE_APPS_CTXIDX

TAR_TABLE_CTXIDX IBRECE 0 16000

ENAME_CTXIDX

IMT_TEST_IDX

MAILLIST_CTXIDX

 

Run this select statement while your optimize job is running. You will see the

token under IDX_OPT_TOKEN change. This column keeps track of how far Oracle

Text has progressed in the optimize job.

 

You can also check the fragmentation and invalid docid status of a given index

using the CTX_REPORT package, introduced in 9.2.

 

Keep in mind that the $N table will only get cleaned out when we have optimized

ALL the tokens found in the $I table (ie. the entire $I table must be scanned and

processed before ANY invalidated docids are removed from the $N table). Also

keep in mind that at most 16,000 docids will be removed from the $N (garbage

collection) table.

 

 

Starting with version 9.2.0, a FULL optimize (given enough time) will actually

process ALL the deleted documents even though the $N table may still have rows

left over. This will be a little confusing. This is because internally, we are

still only removing 16,000 docids from the $N table at a time, even if all

invalid docids are removed from the $I table. For example, let's say you have

an index that has 400,000 rows in $N because of invalidations. If you run a

time-unlimited FULL optimize on that index and no other DML activity takes place

against the base table during this full optimize, then when that full optimize is

done, it will have optimized all tokens *and* removed all invalid docids from

the $I table, even though the count in the $N table will only have gone down to

384,000 after the FULL optimize. This is normal behaviour and should not be a

cause for concern. Repeatedly running a FULL optimize will eventually cause the

count to go down to 0, if you run it enough times.

 

 

Many organizations need to use the 'maxtime' parameter with

ctx_ddl.optimize_index because they can't afford to have the (possibly) long job

run during normal operating hours when the additional load on the server may

cause response-related problems. This is especially important under version 9.2,

where a full optimize will optimize and do garbage collection on the entire

table in one run if you don't specify a maxtime parameter. On huge tables,

this could take many hours or perhaps even days -- this is really dependent on

the speed of the machine, and of course on the size of the index.

 

The solution is to run the optimize job a little bit each night using dbms_job.

However, because little information is available on the inner workings of the

ctx_ddl.optimize_index procedure, many people will wonder whether running the

optimize job in small batches will cause it to be less efficient than letting it

run with maxtime unlimited. The good news is that this is not the case. Oracle Text

remembers where it left off. So, if you run an optimize job for one hour each night,

you need not worry that Oracle Text may be wasting time trying to figure out where

it left off last, etc. As mentionned earlier, you can check dr$index to monitor

the progress of the optimize job. If you keep track of the IDX_OPT_TOKEN column

in dr$index, you'll know when the entire index has been optimized: When you run

your first optimize job, you'll probably first see tokens starting with '$' being

optimized. Eventually, tokens starting with numbers, and then letters will be

optimized, etc. Near the end, tokens starting with '_' will be processed, and

then tokens starting with characters other than the standard US7ASCII characterset

and when the entire $I table is done, this column will become NULL and you'll be

done. At that point, your entire index will have been optimized. If you now

launch the optimize job again, this cycle will repeat. Oracle Text will not

give you any other visual indication that the entire index has been optimized,

except for that IDX_OPT_TOKEN column going blank (null).

 

Also keep in mind that the optimize job actually performs 2 distinct functions:

1) it removes invalidated documents (ie. garbage collection), and

2) it defragments the index, compacting each token together as much as possible.

 

These operations are being performed simultaneously during an optimize job. We

don't work on purging all the delete docid's first and then defragmenting

the tokens or the other way around - it's all being done at the same time.

 

Finally, keep in mind that the optimize job (once it has gone through fully

optimizing an index), has essentially re-written the entire index, unless you're

using Oracle 9.2 or later (see below).

 

As a result of this implementation, this job can be fairly demanding as far as

I/O requirements go: It is essentially deleting, compacting, then re-inserting

each row (except for rows waiting for garbage collection) in your $I

table and related tables, so it will use a lot of I/O *and* create a lot of

archives - make sure you have enough room in your archive destination.

 

If you are using 9.2 or later, a full optimize will *not* rewrite already-optimized

rows. This will save a huge amount of time and archiving activity in optimizing

your indexes.

 

 

RELATED DOCUMENTS

-----------------

 

[NOTE:104262.1] Technical Overview: DML Processing in interMedia Text

 

.