Programming Information


Handling internationalization and localization

This section discusses internationalization and localization issues relevant to jConnect.

Using jConnect to pass Unicode data

In Adaptive Server version 12.5 and later, database clients can take advantage of the unichar and univarchar datatypes. The two datatypes allow for the efficient storage and retrieval of Unicode data.

Quoting from the Unicode Standard, version 2.0:

"The Unicode Standard is a fixed-width, uniform encoding scheme for encoding characters and text. The repertoire of this international character code for information processing includes characters for the major scripts of the world, as well as technical symbols in common use. The Unicode character encoding treats alphabetic characters, ideographic characters, and symbols identically, which means they can be used in any mixture and with equal facility. The Unicode Standard is modeled on the ASCII character set, but uses a 16-bit encoding to support full multilingual text."

This means that the user can designate database table columns to store Unicode data, regardless of the default character set of the server.

Note   In Adaptive Server versions 12.5 through 12.5.0.3, the server required a default character set of utf8 to use the Unicode datatypes. In Adaptive Server 12.5.1 and later, database users can use unichar and univarchar regardless of the default character set of the server.

When the server accepts unichar and univarchar data, jConnect behaves as follows:

For example, if a client attempts to send a Unicode Japanese character to an Adaptive Server 12.5.1 that has iso_1 as the default character set, jConnect detects that the Japanese character cannot be converted to an iso_1 character. jConnect then sends the string as Unicode data.
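
The check described in this example can be approximated with the standard java.nio.charset API. The following is a minimal sketch, not jConnect's internal logic: the class and method names are hypothetical, and ISO-8859-1 is assumed to stand in for the Sybase iso_1 character set.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

/** Illustrative only: the names here are hypothetical, not part of jConnect. */
final class UnicharFallbackSketch {

    /** Returns true if every character of text can be encoded in charset. */
    static boolean fitsCharset(String text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        return encoder.canEncode(text);
    }

    /** Assumed decision rule: fall back to Unicode when the text does not fit. */
    static boolean mustSendAsUnicode(String text, Charset serverDefault) {
        return !fitsCharset(text, serverDefault);
    }

    public static void main(String[] args) {
        Charset iso1 = StandardCharsets.ISO_8859_1; // stands in for Sybase iso_1
        // Latin-1 text maps directly to the server character set.
        System.out.println(mustSendAsUnicode("caf\u00e9", iso1));
        // Japanese text does not, so it would have to travel as Unicode.
        System.out.println(mustSendAsUnicode("\u65e5\u672c", iso1));
    }
}
```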

There is a performance penalty when a client sends unichar/univarchar data to a server. This is because jConnect must perform character-to-byte conversion twice for all strings and characters that do not map directly to the default character set of the server.

If you are using a jConnect version that is earlier than 6.05 and you wish to use the unichar and univarchar datatypes, you must perform the following tasks:

  1. Set JCONNECT_VERSION to 6 or later. See "Setting the jConnect version" for more information.
  2. Set the DISABLE_UNICHAR_SENDING connection property to false. Starting with jConnect 6.05, this property is false by default. See "Setting connection properties" for more information.
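
The two settings above could be supplied as connection properties along the following lines. This is a sketch under assumptions: the class name, method name, and the user and password values are placeholders, and only the JCONNECT_VERSION and DISABLE_UNICHAR_SENDING property names come from this section.

```java
import java.util.Properties;

/** Hypothetical helper; only the two jConnect property names come from the text. */
final class UnicharConnectionProps {

    /** Builds connection properties that allow unichar/univarchar data to be sent. */
    static Properties build(String user, String password) {
        Properties props = new Properties();
        props.put("user", user);                       // standard JDBC property
        props.put("password", password);               // standard JDBC property
        props.put("JCONNECT_VERSION", "6");            // step 1
        props.put("DISABLE_UNICHAR_SENDING", "false"); // step 2
        return props;
    }
}
```

The resulting Properties object would then be passed to DriverManager.getConnection(url, props), where url is the JDBC URL of your server.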

Note   For more information on support for unichar and univarchar datatypes, see the manuals for Adaptive Server version 12.5 or later.

jConnect character-set converters

jConnect uses special classes for all character-set conversions. By selecting a character-set converter class, you specify how jConnect handles single-byte and multibyte character-set conversions, and what performance impact the conversions have on your applications.

There are two character-set conversion classes. The conversion class that jConnect uses is based on the JCONNECT_VERSION, CHARSET, and CHARSET_CONVERTER_CLASS connection properties.

Selecting a character-set converter

jConnect uses the JCONNECT_VERSION to determine the default character-set converter class to use. For JCONNECT_VERSION = 2, the default is TruncationConverter. For JCONNECT_VERSION = 4 or higher, the default is PureConverter.

You can also set the CHARSET_CONVERTER_CLASS connection property to specify which character-set converter you want jConnect to use. This is useful if you want to use a character-set converter other than the default for your jConnect version.

For example, if you set JCONNECT_VERSION = 4 or later but want to use the TruncationConverter class rather than the multibyte PureConverter class, you can set CHARSET_CONVERTER_CLASS:

...
 props.put("CHARSET_CONVERTER_CLASS",
   "com.sybase.jdbc3.utils.TruncationConverter");

Setting the CHARSET connection property

You can specify the character set to use in your application by setting the CHARSET driver property. If you do not set the CHARSET property:

You can also use the -J charset command line option for the IsqlApp application to specify a character set.

To determine which character sets are installed on your Adaptive Server, issue the following SQL query on your server:

select name from syscharsets
 go

For the PureConverter class, if the designated CHARSET does not work with the client Java Virtual Machine (VM), the connection fails with a SQLException, indicating that you must set CHARSET to a character set that is supported by both Adaptive Server and the client.
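
Before connecting, an application can ask the client VM whether it recognizes a given converter name. The sketch below uses the standard Charset.isSupported call; the class and method names are hypothetical, and the check only covers the client side, not whether the character set is installed on Adaptive Server.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

/** Hypothetical pre-flight check; not part of jConnect. */
final class CharsetPreflight {

    /** Returns true if the client VM has a charset registered under this name. */
    static boolean jvmSupports(String converterName) {
        try {
            return Charset.isSupported(converterName);
        } catch (IllegalCharsetNameException e) {
            return false; // the name is not even syntactically legal
        }
    }

    public static void main(String[] args) {
        for (String name : new String[] {"UTF-8", "ISO-8859-1", "no-such-charset"}) {
            System.out.println(name + " supported: " + jvmSupports(name));
        }
    }
}
```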

When the TruncationConverter class is used, character truncation is applied regardless of whether the designated CHARSET is 7-bit ASCII. Therefore, if your application needs to handle non-ASCII data (for instance, any Asian language), do not use TruncationConverter, because it corrupts the data.
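
To see why truncation corrupts non-ASCII data, consider the following sketch. It is not the TruncationConverter implementation, which this section does not describe; it assumes, for illustration only, that truncation keeps the low-order byte of each 16-bit Java character.

```java
import java.nio.charset.StandardCharsets;

/** Illustration of lossy truncation; an assumption, not jConnect source code. */
final class TruncationSketch {

    /** Keeps only the low-order 8 bits of each character. */
    static byte[] truncate(String text) {
        byte[] out = new byte[text.length()];
        for (int i = 0; i < text.length(); i++) {
            out[i] = (byte) text.charAt(i); // the high-order byte is discarded
        }
        return out;
    }

    /** Truncates the text and reads the bytes back as ASCII. */
    static String roundTrip(String text) {
        return new String(truncate(text), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("Sybase")); // 7-bit ASCII survives unchanged
        System.out.println(roundTrip("\u3042")); // hiragana A (U+3042) comes back as "B"
    }
}
```

In this sketch the hiragana character U+3042 silently becomes the unrelated ASCII letter B (0x42), which is the kind of corruption the warning above refers to.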

Improving character-set conversion performance

If you use multibyte character sets and need to improve driver performance, you can use the SunIoConverter class provided with the jConnect samples. See "SunIoConverter character-set conversion" for details.

In addition, you can use TruncationConverter to improve performance if your application deals with only 7-bit ASCII data.

Supported character sets

Table 2-4 lists the Sybase character sets that are supported by jConnect. The table also lists the corresponding JDK byte converter for each supported character set.

Although jConnect supports UCS-2, currently no Sybase databases or Open Servers support UCS-2.

Adaptive Server versions 12.5 and later support a version of Unicode known as the UTF-16 encoding.

Table 2-4 lists the character sets currently supported by Sybase.

Table 2-4: Supported Sybase character sets
SybCharset name       JDK byte converter
ascii_7               ASCII
big5                  Big5
big5hk (see note)     Big5_HKSCS
cp037                 Cp037
cp437                 Cp437
cp500                 Cp500
cp850                 Cp850
cp852                 Cp852
cp855                 Cp855
cp857                 Cp857
cp860                 Cp860
cp863                 Cp863
cp864                 Cp864
cp866                 Cp866
cp869                 Cp869
cp874                 Cp874
cp932                 MS932
cp936                 GBK
cp950                 Cp950
cp1250                Cp1250
cp1251                Cp1251
cp1252                Cp1252
cp1253                Cp1253
cp1254                Cp1254
cp1255                Cp1255
cp1256                Cp1256
cp1257                Cp1257
cp1258                Cp1258
deckanji              EUC_JP
eucgb                 EUC_CN
eucjis                EUC_JP
eucksc                EUC_KR
GB18030               GB18030
ibm420                Cp420
ibm918                Cp918
iso_1                 ISO8859_1
iso88592              ISO8859_2
iso88595              ISO8859_5
iso88596              ISO8859_6
iso88597              ISO8859_7
iso88598              ISO8859_8
iso88599              ISO8859_9
iso15                 ISO8859_15_FDIS
koi8                  KOI8_R
mac                   MacRoman
mac_cyr               MacCyrillic
mac_ee                MacCentralEurope
macgreek              MacGreek
macturk               MacTurkish
sjis                  MS932
tis620                MS874
utf8                  UTF8

Note   The big5hk character set is supported only if you are using JDK 1.3 or later.
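
An application that needs to translate between the two naming schemes could keep a small lookup of its own. The sketch below copies a handful of rows from Table 2-4 into a map; the class and method names are hypothetical, and jConnect itself performs this mapping internally.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical lookup built from a few rows of Table 2-4. */
final class SybCharsetLookup {

    // Sybase character set name -> JDK byte converter name (subset of Table 2-4).
    private static final Map<String, String> CONVERTERS = Map.of(
            "ascii_7", "ASCII",
            "cp1252", "Cp1252",
            "eucjis", "EUC_JP",
            "iso_1", "ISO8859_1",
            "sjis", "MS932",
            "utf8", "UTF8");

    /** Returns the JDK converter name, or empty if the name is not in this subset. */
    static Optional<String> jdkConverterFor(String sybCharset) {
        return Optional.ofNullable(CONVERTERS.get(sybCharset));
    }

    public static void main(String[] args) {
        System.out.println(jdkConverterFor("iso_1").orElse("(not listed)"));
        System.out.println(jdkConverterFor("roman8").orElse("(not listed)"));
    }
}
```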

European currency symbol support

jConnect versions 4.1 and later support the use of the European currency symbol, or "euro," and its conversion to and from UCS-2 Unicode.

The euro has been added to the following Sybase character sets: cp1250, cp1251, cp1252, cp1253, cp1254, cp1255, cp1256, cp1257, cp1258, cp874, iso885915, and utf8.
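
On the Java side, the euro is the single Unicode character U+20AC. The following sketch, with hypothetical class and method names, shows how that character encodes in UTF-8 (the JDK counterpart of the utf8 character set) and confirms that ISO-8859-1, which corresponds to iso_1 and is not in the list above, cannot represent it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Illustrative helper for the euro symbol; not part of jConnect. */
final class EuroSketch {

    static final String EURO = "\u20AC";

    /** Encodes the euro symbol as UTF-8 bytes. */
    static byte[] euroAsUtf8() {
        return EURO.getBytes(StandardCharsets.UTF_8);
    }

    /** Returns true if the given charset has a mapping for the euro symbol. */
    static boolean canHoldEuro(Charset charset) {
        return charset.newEncoder().canEncode(EURO);
    }

    public static void main(String[] args) {
        StringBuilder hex = new StringBuilder();
        for (byte b : euroAsUtf8()) {
            hex.append(String.format("%02X ", b)); // three bytes: E2 82 AC
        }
        System.out.println(hex.toString().trim());
        System.out.println(canHoldEuro(StandardCharsets.ISO_8859_1));
    }
}
```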

To use the euro symbol:

Unsupported character sets

The following Sybase character sets are not supported in jConnect because no JDK byte converters are analogous to the Sybase character sets:

You can use these character sets with the TruncationConverter class as long as the application uses only the 7-bit ASCII subsets of these character sets.

 


Copyright (C) 2005. Sybase Inc. All rights reserved.