ORA-02374, ORA-12899, ORA-02372 during Data Pump export/import from 10g WE8MSWIN1252 into 11g AL32UTF8

Environment: Oracle 10.2.0.5, Oracle 11.2.0.3, Oracle Linux 6.2

During a data pump export/import from 10g into 11g, I received several errors in the data pump import log file:

ORA-02374: conversion error loading table "COMPOUND"."COMPOUND"
ORA-12899: value too large for column COMPMP (actual: 21, maximum: 20)
ORA-02372: data for row: COMPMP : 0X'3E323030B043206465636F6D706F736974696F6E'

The cause of these errors is that I am migrating data from a database with a single-byte character set (WE8MSWIN1252) into one with a multi-byte character set (AL32UTF8). Some single-byte characters “expand” into multi-byte sequences, and if the column was already filled to its byte limit, ORA-12899 is raised, showing the maximum column size in bytes and the actual number of bytes needed to fit all the characters into the multi-byte column.
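The expansion is easy to demonstrate outside the database. Python’s cp1252 codec corresponds to Windows code page 1252 (Oracle’s WE8MSWIN1252), so a quick sketch shows how one character can grow from 1 byte to 3:

```python
# The euro sign is a single byte in Windows-1252 (WE8MSWIN1252)
# but three bytes in UTF-8 (AL32UTF8). A VARCHAR2 column using byte
# semantics that is already full therefore overflows after conversion.
ch = "€"
print(len(ch.encode("cp1252")))  # 1 byte in the source character set
print(len(ch.encode("utf-8")))   # 3 bytes in the target character set
```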

There are several solutions to this problem: increase the column sizes, truncate or trim the data before the import, stick to the same character set, pre-create the tables with larger column sizes in the target database before the import, … But I won’t elaborate too much on this.
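If you go the pre-create route, you need to know how many bytes each value will occupy after conversion, since the widest value determines the minimum column size. A minimal Python sketch of that calculation (the sample values are made up; in practice you would pull them from the source table):

```python
# Compute the byte length each value will need in AL32UTF8 (UTF-8);
# the maximum is the smallest safe VARCHAR2 byte size to pre-create.
# Sample data is hypothetical but representable in WE8MSWIN1252.
values = ["plain ascii", "20–25 µg/ml", "±0.5%"]

needed = max(len(v.encode("utf-8")) for v in values)
print(needed)  # minimum column byte size required after conversion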

What if you want to see the actual values or ROWIDs of the affected rows? If only a few rows have problems, you can query the original source table using the raw column value from the import log file, converted back to characters with UTL_RAW:

select rowid, compmp from compound.compound where compmp = utl_raw.cast_to_varchar2('3E323030B043206465636F6D706F736974696F6E');

Output:

ROWID,COMPMP
AAAQAYAAKAABjDuAAd,>200°C decomposition

Here we see that the degree character (°) is causing the problem during the import: it occupies 1 byte in WE8MSWIN1252 but 2 bytes in AL32UTF8, and the column value already fills all 20 bytes.
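The raw value printed by ORA-02372 can also be decoded outside the database. This Python sketch does roughly what utl_raw.cast_to_varchar2 did above (Python’s cp1252 codec stands in for WE8MSWIN1252):

```python
# Hex dump taken from the ORA-02372 message above.
raw = "3E323030B043206465636F6D706F736974696F6E"

value = bytes.fromhex(raw).decode("cp1252")
print(value)                       # the original column value
print(len(value))                  # 20 characters = 20 bytes in WE8MSWIN1252
print(len(value.encode("utf-8")))  # 21 bytes in AL32UTF8 -> ORA-12899
```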

However, if you have a lot of rows with conversion errors in your import log file, there’s another solution. This requires the installation of the Database Character Set Scanner utility (csscan) into your source database.

Csscan is installed by running the csminst.sql script under $ORACLE_HOME/rdbms/admin. This script will create a user “csmig”. It’s a good idea to first modify the following line so the user csmig doesn’t write into the SYSTEM tablespace:

alter user csmig default tablespace SYSTEM quota unlimited on SYSTEM

In my case, I replaced SYSTEM with SYSAUX.

Let’s install csscan into our source database:

$ cd $ORACLE_HOME/rdbms/admin
$ sqlplus /nolog

SQL*Plus: Release 10.2.0.5.0 – Production on Wed Apr 11 13:21:33 2012

Copyright (c) 1982, 2010, Oracle. All Rights Reserved.

SQL> conn / as sysdba
Connected.
SQL> set TERMOUT ON
SQL> set ECHO ON
SQL> spool csminst.log
SQL> @csminst.sql

After this you should check the csminst.log file for errors. In my case, there were some errors because the script tried to drop public synonyms that didn’t exist. Nothing to worry about.

Now that csscan is installed, you can use it to check specific schemas or tables for data conversion problems during the migration to another character set. I executed it as follows (you will need the password of the SYS user):

$ csscan \”sys as sysdba\” LOG=/tmp/csscan.log USER=compound CAPTURE=Y TOCHAR=AL32UTF8 ARRAY=1024000 PROCESS=3

This will analyze all character data stored in the tables of the “compound” schema. Csscan creates 3 output files; the one with the extension “.err” lists all affected rows with their corresponding ROWIDs. You can count the affected rows in that file using the UNIX “grep” command:

$ grep -i "exceed column size" /tmp/csscan.log.err | wc -l

568

So, in my case, 568 rows from the “compound” schema will have conversion problems during the migration to the AL32UTF8 character set.
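If you want to post-process the “.err” file rather than just count lines, the grep above is easy to replicate in Python. The helper below only assumes that affected lines contain the phrase “exceed column size”, as the grep command does; the sample input is a placeholder, not real csscan output:

```python
def count_size_errors(text, needle="exceed column size"):
    """Case-insensitive line count, equivalent to `grep -i ... | wc -l`."""
    return sum(1 for line in text.splitlines() if needle.lower() in line.lower())

# Placeholder lines just to exercise the function; in practice, read the
# contents of /tmp/csscan.log.err (the LOG parameter) and pass them in.
sample = "some other line\nthis value would Exceed column size\nanother line\n"
print(count_size_errors(sample))
```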

You can remove csscan afterwards by dropping the user “csmig”. More information about csscan can be found on My Oracle Support; see document ID 1297961.1.

Matthias