inicio mail me! sindicaci;ón

Importing large tab-delimited txt files into an Oracle database

Yesterday a co-worker asked me to help load a large tab-delimited txt file into an Oracle database.  The file had 6.5 million rows and eleven columns.  She typically uses TOAD to import data from personal sources like txt files and Excel spreadsheets into disparate databases.  When she tried to use TOAD to import the 6.5 million row file, it failed at around 4 million rows on multiple multi-TB databases.

I used sqlldr to import the file and ran into a few issues along the way so I thought I’d share my experience for a couple reasons: 1) to help me remember what I did, and 2) to help other poor souls who run into problems when faced with similar circumstances.

The first obstacle I faced was figuring out how to tell Oracle that the fields are tab-delimited.  (I always use the pipe as a delimiter because you are not likely to find pipes in your fields and there is no problem determining null columns.)  A quick search led me to an asktom.oracle.com thread that provided the answer: X’9′. Easy enough.  I imported the file.  It did not import any rows because the ordering of the columns was thrown out of whack whenever there was a null value in a field.  When the values shifted left, 20 character varchar2 values did not fit in 2 character columns.  Next I adjusted the sizes and types of the columns to see which data was being imported where.  All the rows imported, but the ordering of the columns was off whenever there was a null in a field.  This was a big problem because there were three columns with more null values than populated values.  I tried a number of different options in the ctl file that did not work, mostly centering around the NULLIF operator.  Oracle was never able to discern when a column was null as long as I had the line OPTIONALLY ENCLOSED BY ‘”‘ in the ctl file.  So I removed it and used the following:

LOAD DATA
INFILE “\SERVERNAMEFILEPATHFILENAME.txt”
REPLACE
INTO TABLE SCHEMA.TABLE
FIELDS terminated by X’9′
TRAILING NULLCOLS
(
ACSID,
EMAIL_ID,
FNAME,
MNAME,
LNAME,
FULL_NAME,
DIVISION_CD,
GROUP_NAME,
RG_SOURCE,
RG_DT,
EVENT_PARTICIPATION
)

All the data loaded in the correct fields.  Hooray!  One last problem: every field was enclosed with quotation marks so, in addition to being aesthetically displeasing, if I wanted to join this new table with any existing tables, I’d have a problem.  Removing the quotation marks was easy; I ran a simple update statement:

UPDATE TABLE
SET ACSID=REPLACE(ACSID,'”‘,”),
EMAIL_ID=REPLACE(EMAIL_ID,'”‘,”),
FNAME=REPLACE(FNAME,'”‘,”),
MNAME=REPLACE(MNAME,'”‘,”),
LNAME=REPLACE(LNAME,'”‘,”),
FULL_NAME=REPLACE(FULL_NAME,'”‘,”),
DIVISION_CD=REPLACE(DIVISION_CD,'”‘,”),
GROUP_NAME=REPLACE(GROUP_NAME,'”‘,”),
RG_SOURCE=REPLACE(RG_SOURCE,'”‘,”),
RG_DT=REPLACE(RG_DT,'”‘,”),
EVENT_PARTICIPATION=REPLACE(EVENT_PARTICIPATION,'”‘,”);

Mission accomplished.

donnaj923 said,

April 4, 2009 @ 6:00 pm

Worst blog post ever!

Pages tagged "imported" said,

April 5, 2009 @ 12:03 am

[…] bookmarks tagged imported ahelms.com » Importing large tab-delimited txt fi… saved by 10 others     sporty04 bookmarked on 04/04/09 | […]

RSS feed for comments on this post · TrackBack URI

Leave a Comment

This blog is kept spam free by WP-SpamFree.