Monday, September 15, 2008

SQL*Loader direct

We had to setup SQL Server Replication to an Oracle database. The 2 tables in questions are huge. The small table has 63mill rows and the big one has 1.5bill rows. The big table has 50 date ranged partitions.

I used SQL Loader direct path without any indexes in place from csv files to do the initial load.

The first thing to do was to prepare the database for Direct Loading by executing the script $ORACLE_HOME/rdbms/admin/catldr.sql.

I also had to set my session’s DATE format to be the same as the format in the CSV file.

$ export NLS_DATE_FORMAT='YYYY-MM-DD HH24:MI:SS'

To get the fastest possible load performance on the 50 partitions I followed these rules:
There were no indexes on the table
The table was set to NOLOGGING
The database was put in NOARCHIVELOG mode

And the following SQL Loader options were used:
DIRECT=TRUE to invoke Direct Loading. This will effectively bypass most of the RDBMS processing
UNRECOVERABLE to turn off database logging
PARELLEL=TRUE and APPEND in order to run multiple load jobs concurrently for the different partitions.

My Korn Shell Script looked like this:

#!/bin/ksh

export ORAENV_ASK=NO;
export ORACLE_SID=xxprd;
. oraenv;

export NLS_DATE_FORMAT='YYYY-MM-DD HH24:MI:SS".000"'

sqlldr xx/xx control=c1.ctl DIRECT=TRUE PARALLEL=TRUE log=c1.log &
sqlldr xx/xx control=c2.ctl DIRECT=TRUE PARALLEL=TRUE log=c2.log &
sqlldr xx/xx control=c3.ctl DIRECT=TRUE PARALLEL=TRUE log=c3.log &
… etc up to 10 jobs

And I created 10 control files they all looked identical except for the infile:

UNRECOVERABLE LOAD DATA
infile '/dump/exports/XXprd/x200512.csv'
append into table usage_profile
fields terminated by "," TRAILING NULLCOLS
(MONTH_ID, MSISDN, IMSI, IMEI, START_DT, END_DT, EVENT_CNT, STREAM, SUCCESS_CALLS, UNSUCCESS, DURATION, VOLUME_UPLINK, VOLUME_DOWNLINK, VOLUME, SUBSCRIBER_TYPE, SP_CODE)

The job completed in 12 minutes and 5 jobs later all 1.5 billion records were loaded!

No comments: