Problem with high ASCII characters in external tables with database character set AL32UTF8

Environment: Oracle 11.2.0.3.0 with AL32UTF8 character set, Windows 7, Oracle Linux 6.2 with UTF-8 locale.

After the migration of external tables from Oracle 10g with character set WE8MSWIN1252 to Oracle 11g with character set AL32UTF8, high ASCII characters like the degrees Celsius sign (°) showed up as upside down question marks in the web browser. The external tables are created on CSV text files which come from a Windows 7 PC and are uploaded on a Linux database server. Let’s try to simulate this problem and find a solution.

First, we need to create a directory to the /tmp folder and grant our user read and write access to that directory:

create directory tmp_dir as '/tmp';
grant read,write on directory tmp_dir to matthiash;

Next, create a text file called “high_ascii.csv” with these 2 lines in it:

This is the euro sign: €
This is the degrees Celsius sign: °

Put this file on the Linux server under /tmp.

We will now create an external table from the uploaded CSV file:

CREATE TABLE MATTHIASH.HIGH_ASCII_CSV_EXT
(
  HIGH_ASCII_TEXT   VARCHAR2(50)
)
ORGANIZATION EXTERNAL
  (  TYPE ORACLE_LOADER
     DEFAULT DIRECTORY TMP_DIR
     ACCESS PARAMETERS
       ( RECORDS DELIMITED BY newline
       NOLOGFILE
       SKIP 0
  FIELDS TERMINATED BY ","  OPTIONALLY ENCLOSED BY '"' LRTRIM
  missing field values are null
  reject rows with all null fields
  (
    high_ascii_text)
                   )
     LOCATION (TMP_DIR:'high_ascii.csv')
  )
REJECT LIMIT UNLIMITED
NOPARALLEL
NOMONITORING;

We will now query the external table. This is from SQL Developer on Windows 7:

select * from MATTHIASH.HIGH_ASCII_CSV_EXT;
This is the euro sign: �
This is the degrees Celsius sign: �

You will notice that the euro sign and the degrees Celsius sign do not display correctly! I’m not sure what causes this translation problem. After all, shouldn’t Oracle be able to determine the character set of the CSV file and do the correct translation? Apparently not.

However, there is a solution. You can define the external table with the “CHARACTERSET” option. This parameter should match the character set of the text file that the external table is pointing to. In our case, the CSV files are created on Windows 7 with code page “1252 8-bit West European”, so the character set of the external table should be WE8MSWIN1252 (for a complete character set translation table between Oracle and Windows, see here).

Let’s recreate the external table:

DROP TABLE MATTHIASH.HIGH_ASCII_CSV_EXT;
CREATE TABLE MATTHIASH.HIGH_ASCII_CSV_EXT
(
  HIGH_ASCII_TEXT   VARCHAR2(50)
)
ORGANIZATION EXTERNAL
  (  TYPE ORACLE_LOADER
     DEFAULT DIRECTORY TMP_DIR
     ACCESS PARAMETERS 
       ( RECORDS DELIMITED BY newline CHARACTERSET WE8MSWIN1252
       NOLOGFILE
       SKIP 0
  FIELDS TERMINATED BY ","  OPTIONALLY ENCLOSED BY '"' LRTRIM 
  missing field values are null
  reject rows with all null fields
  (
    high_ascii_text)
                   )
     LOCATION (TMP_DIR:'high_ascii.csv')
  )
REJECT LIMIT UNLIMITED
NOPARALLEL
NOMONITORING;

When you now query the external table again, you should see the correct values for the high ASCII characters:

select * from MATTHIASH.HIGH_ASCII_CSV_EXT;
This is the euro sign: €
This is the degrees Celsius sign: °

 

Matthias

Advertisements