PURPOSE -------
Explain what ctx_ddl.optimize_index does and how it works. Also answers the question what is the performance difference between optimizing a Text index in small batches versus running it with maxtime unlimited.
SCOPE & APPLICATION -------------------
Intended for anyone who works with large Oracle Text indexes.
HOW CTX_DDL.OPTIMIZE_INDEX WORKS --------------------------------
See [NOTE:104262.1] for an excellent overview of DML processing in Oracle Text.
In addition to this, you may have some remaining questions on the performance implications of running ctx_ddl.optimize_index.
Some background information first:
If you haven't run an optimize job on a large index (or an index on a very active table) for a while, you may have thousands or even millions of rows that need to be dealt with. Remember that any deletion or update will cause an entry to be inserted into the index's $N table identifying that row for garbage collection (ie. deletion).
With versions 8.1.5 - 9.0.1 a FULL optimize can only delete up to 16,000 docid's per run as it optimizes the index, regardless of how much time you allow it to run for. Once it has cleaned up 16,000 docid's you can submit the optimize job again to process more documents. The best way to check the progress of your index optimize is actually by executing the following sql statement from SQL*Plus, logged on as CTXSYS:
col idx_opt_token format a20 select idx_name,idx_opt_token,idx_opt_type,idx_opt_count from dr$index;
IDX_NAME IDX_OPT_TOKEN IDX_OPT_TYPE IDX_OPT_COUNT ------------------------------ -------------------- ------------ ------------- TAR_TABLE_TOOLS_CTXIDX BLOCKAND 0 3660 TAR_TABLE_APPS_CTXIDX TAR_TABLE_CTXIDX IBRECE 0 16000 ENAME_CTXIDX IMT_TEST_IDX MAILLIST_CTXIDX
Run this select statement while your optimize job is running. You will see the token under IDX_OPT_TOKEN change. This column keeps track of how far Oracle Text has progressed in the optimize job.
You can also check the fragmentation and invalid docid status of a given index using the CTX_REPORT package, introduced in 9.2.
Keep in mind that the $N table will only get cleaned out when we have optimized ALL the tokens found in the $I table (ie. the entire $I table must be scanned and processed before ANY invalidated docids are removed from the $N table). Also keep in mind that at most 16,000 docids will be removed from the $N (garbage collection) table.
Starting with version 9.2.0, a FULL optimize (given enough time) will actually process ALL the deleted documents even though the $N table may still have rows left over. This will be a little confusing. This is because internally, we are still only removing 16,000 docids from the $N table at a time, even if all invalid docids are removed from the $I table. For example, let's say you have an index that has 400,000 rows in $N because of invalidations. If you run a time-unlimited FULL optimize on that index and no other DML activity takes place against the base table during this full optimize, then when that full optimize is done, it will have optimized all tokens *and* removed all invalid docids from the $I table, even though the count in the $N table will only have gone down to 384,000 after the FULL optimize. This is normal behaviour and should not be a cause for concern. Repeatedly running a FULL optimize will eventually cause the count to go down to 0, if you run it enough times.
Many organizations need to use the 'maxtime' parameter with ctx_ddl.optimize_index because they can't afford to have the (possibly) long job run during normal operating hours when the additional load on the server may cause response-related problems. This is especially important under version 9.2, where a full optimize will optimize and do garbage collection on the entire table in one run if you don't specify a maxtime parameter. On huge tables, this could take many hours or perhaps even days -- this is really dependent on the speed of the machine, and of course on the size of the index.
The solution is to run the optimize job a little bit each night using dbms_job. However, because little information is available on the inner workings of the ctx_ddl.optimize_index procedure, many people will wonder whether running the optimize job in small batches will cause it to be less efficient than letting it run with maxtime unlimited. The good news is that this is not the case. Oracle Text remembers where it left off. So, if you run an optimize job for one hour each night, you need not worry that Oracle Text may be wasting time trying to figure out where it left off last, etc. As mentionned earlier, you can check dr$index to monitor the progress of the optimize job. If you keep track of the IDX_OPT_TOKEN column in dr$index, you'll know when the entire index has been optimized: When you run your first optimize job, you'll probably first see tokens starting with '$' being optimized. Eventually, tokens starting with numbers, and then letters will be optimized, etc. Near the end, tokens starting with '_' will be processed, and then tokens starting with characters other than the standard US7ASCII characterset and when the entire $I table is done, this column will become NULL and you'll be done. At that point, your entire index will have been optimized. If you now launch the optimize job again, this cycle will repeat. Oracle Text will not give you any other visual indication that the entire index has been optimized, except for that IDX_OPT_TOKEN column going blank (null).
Also keep in mind that the optimize job actually performs 2 distinct functions: 1) it removes invalidated documents (ie. garbage collection), and 2) it defragments the index, compacting each token together as much as possible.
These operations are being performed simultaneously during an optimize job. We don't work on purging all the delete docid's first and then defragmenting the tokens or the other way around - it's all being done at the same time.
Finally, keep in mind that the optimize job (once it has gone through fully optimizing an index), has essentially re-written the entire index, unless you're using Oracle 9.2 or later (see below).
As a result of this implementation, this job can be fairly demanding as far as I/O requirements go: It is essentially deleting, compacting, then re-inserting each row (except for rows waiting for garbage collection) in your $I table and related tables, so it will use a lot of I/O *and* create a lot of archives - make sure you have enough room in your archive destination.
If you are using 9.2 or later, a full optimize will *not* rewrite already-optimized rows. This will save a huge amount of time and archiving activity in optimizing your indexes.
RELATED DOCUMENTS -----------------
[NOTE:104262.1] Technical Overview: DML Processing in interMedia Text . |