I used SQL Loader direct path without any indexes in place from csv files to do the initial load.
The first thing to do was to prepare the database for Direct Loading by executing the script $ORACLE_HOME/rdbms/admin/catldr.sql.
I also had to set my session’s DATE format to be the same as the format in the CSV file.
$ export NLS_DATE_FORMAT='YYYY-MM-DD HH24:MI:SS'
To get the fastest possible load performance on the 50 partitions I followed these rules:
There were no indexes on the table
The table was set to NOLOGGING
The database was put in NOARCHIVELOG mode
And the following SQL Loader options were used:
DIRECT=TRUE to invoke Direct Loading. This will effectively bypass most of the RDBMS processing
UNRECOVERABLE to turn off database logging
PARELLEL=TRUE and APPEND in order to run multiple load jobs concurrently for the different partitions.
My Korn Shell Script looked like this:
… etc up to 10 jobs
#!/bin/ksh
export ORAENV_ASK=NO;
export ORACLE_SID=xxprd;
. oraenv;
export NLS_DATE_FORMAT='YYYY-MM-DD HH24:MI:SS".000"'
sqlldr xx/xx control=c1.ctl DIRECT=TRUE PARALLEL=TRUE log=c1.log &
sqlldr xx/xx control=c2.ctl DIRECT=TRUE PARALLEL=TRUE log=c2.log &
sqlldr xx/xx control=c3.ctl DIRECT=TRUE PARALLEL=TRUE log=c3.log &
And I created 10 control files they all looked identical except for the infile:
UNRECOVERABLE LOAD DATA
infile '/dump/exports/XXprd/x200512.csv'
append into table usage_profile
fields terminated by "," TRAILING NULLCOLS
(MONTH_ID, MSISDN, IMSI, IMEI, START_DT, END_DT, EVENT_CNT, STREAM, SUCCESS_CALLS, UNSUCCESS, DURATION, VOLUME_UPLINK, VOLUME_DOWNLINK, VOLUME, SUBSCRIBER_TYPE, SP_CODE)
The job completed in 12 minutes and 5 jobs later all 1.5 billion records were loaded!
No comments:
Post a Comment