Problem with high ASCII characters in external tables with database character set AL32UTF8

Environment: Oracle 11.2.0.3.0 with AL32UTF8 character set, Windows 7, Oracle Linux 6.2 with UTF-8 locale.

After the migration of external tables from Oracle 10g with character set WE8MSWIN1252 to Oracle 11g with character set AL32UTF8, high ASCII characters like the degrees Celsius sign (°) showed up as upside down question marks in the web browser. The external tables are created on CSV text files which come from a Windows 7 PC and are uploaded on a Linux database server. Let’s try to simulate this problem and find a solution.

First, we need to create a directory to the /tmp folder and grant our user read and write access to that directory:

create directory tmp_dir as '/tmp';
grant read,write on directory tmp_dir to matthiash;

Next, create a text file called “high_ascii.csv” with these 2 lines in it:

This is the euro sign: €
This is the degrees Celsius sign: °

Put this file on the Linux server under /tmp.

We will now create an external table from the uploaded CSV file:

CREATE TABLE MATTHIASH.HIGH_ASCII_CSV_EXT
(
  HIGH_ASCII_TEXT   VARCHAR2(50)
)
ORGANIZATION EXTERNAL
  (  TYPE ORACLE_LOADER
     DEFAULT DIRECTORY TMP_DIR
     ACCESS PARAMETERS
       ( RECORDS DELIMITED BY newline
       NOLOGFILE
       SKIP 0
  FIELDS TERMINATED BY ","  OPTIONALLY ENCLOSED BY '"' LRTRIM
  missing field values are null
  reject rows with all null fields
  (
    high_ascii_text)
                   )
     LOCATION (TMP_DIR:'high_ascii.csv')
  )
REJECT LIMIT UNLIMITED
NOPARALLEL
NOMONITORING;

We will now query the external table. This is from SQL Developer on Windows 7:

select * from MATTHIASH.HIGH_ASCII_CSV_EXT;
This is the euro sign: �
This is the degrees Celsius sign: �

You will notice that the euro sign and the degrees Celsius sign do not display correctly! I’m not sure what causes this translation problem. After all, shouldn’t Oracle be able to determine the character set of the CSV file and do the correct translation? Apparently not.

However, there is a solution. You can define the external table with the “CHARACTERSET” option. This parameter should match the character set of the text file that the external table is pointing to. In our case, the CSV files are created on Windows 7 with code page “1252 8-bit West European”, so the character set of the external table should be WE8MSWIN1252 (for a complete character set translation table between Oracle and Windows, see here).

Let’s recreate the external table:

DROP TABLE MATTHIASH.HIGH_ASCII_CSV_EXT;
CREATE TABLE MATTHIASH.HIGH_ASCII_CSV_EXT
(
  HIGH_ASCII_TEXT   VARCHAR2(50)
)
ORGANIZATION EXTERNAL
  (  TYPE ORACLE_LOADER
     DEFAULT DIRECTORY TMP_DIR
     ACCESS PARAMETERS 
       ( RECORDS DELIMITED BY newline CHARACTERSET WE8MSWIN1252
       NOLOGFILE
       SKIP 0
  FIELDS TERMINATED BY ","  OPTIONALLY ENCLOSED BY '"' LRTRIM 
  missing field values are null
  reject rows with all null fields
  (
    high_ascii_text)
                   )
     LOCATION (TMP_DIR:'high_ascii.csv')
  )
REJECT LIMIT UNLIMITED
NOPARALLEL
NOMONITORING;

When you now query the external table again, you should see the correct values for the high ASCII characters:

select * from MATTHIASH.HIGH_ASCII_CSV_EXT;
This is the euro sign: €
This is the degrees Celsius sign: °

 

Matthias

Advertisements

3 Responses to Problem with high ASCII characters in external tables with database character set AL32UTF8

  1. Kasra Rasaee says:

    Thanks for the clarification, finally!

    I would of assumed setting it to UTF8 would include the ISO 8859-1 code pages, I suppose I was wrong!? the file was exported via excel as tab separated, characters were mangled..

    Checked, Latin character expressed as a single byte in 8859-1 or WIN1252 are actually expressed as more than a single byte in UTF8… anything from 0 to 127 (ASCII) remains
    as a single byte in UTF8.

    • matthiashoys says:

      Yes, that’s correct. Some single-byte characters use more than one byte in a unicode character set, for example the degrees Celsius character “°”.

  2. Sandy says:

    WE8MSWIN1252 has worked for letters like Ô É, but strange thing is that they did not work for small cases of same letters ô é. Any thought?

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: