Programming Information
This section discusses internationalization and localization issues relevant to jConnect.
In Adaptive Server version 12.5 and later, database clients can take advantage of the unichar and univarchar datatypes, which allow efficient storage and retrieval of Unicode data.
Quoting from the Unicode Standard, version 2.0:
"The Unicode Standard is a fixed-width, uniform encoding scheme for encoding characters and text. The repertoire of this international character code for information processing includes characters for the major scripts of the world, as well as technical symbols in common use. The Unicode character encoding treats alphabetic characters, ideographic characters, and symbols identically, which means they can be used in any mixture and with equal facility. The Unicode Standard is modeled on the ASCII character set, but uses a 16-bit encoding to support full multilingual text."
With these datatypes, you can designate database table columns to store Unicode data, regardless of the server's default character set.
In Adaptive Server versions 12.5 through 12.5.0.3, the server had to have a default character set of utf8 to use the Unicode datatypes. In Adaptive Server 12.5.1 and later, database users can use unichar and univarchar regardless of the server's default character set.
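When the server accepts unichar and univarchar data, jConnect sends each string in the server's default character set when it can. If a string contains characters that cannot be converted to that character set, jConnect sends the string as Unicode data.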
For example, if a client attempts to send a Unicode Japanese character to an Adaptive Server 12.5.1 that has iso_1 as the default character set, jConnect detects that the Japanese character cannot be converted to an iso_1 character. jConnect then sends the string as Unicode data.
There is a performance penalty when a client sends unichar/univarchar data to a server. This is because jConnect must perform character-to-byte conversion twice for all strings and characters that do not map directly to the default character set of the server.
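The following Java sketch models the fallback described above. It is not jConnect's internal code. The class and method names are hypothetical, ISO-8859-1 stands in for a server whose default character set is iso_1, and UTF-16BE is assumed for the Unicode form purely for illustration. A string that maps to the server character set is converted once. A string that does not map is converted a second time, and that second conversion is where the performance penalty comes from.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical model of the "default charset first, Unicode second" behavior.
class UnicodeFallbackSketch {

    static byte[] encodeForServer(String s, Charset serverCharset) {
        CharsetEncoder encoder = serverCharset.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            // First conversion: succeeds only if every char maps to the server charset.
            ByteBuffer buf = encoder.encode(CharBuffer.wrap(s));
            byte[] out = new byte[buf.remaining()];
            buf.get(out);
            return out;
        } catch (CharacterCodingException e) {
            // Second conversion: the whole string is re-encoded as Unicode
            // (UTF-16BE is an illustrative assumption, not the documented wire format).
            return s.getBytes(StandardCharsets.UTF_16BE);
        }
    }

    public static void main(String[] args) {
        Charset iso1 = StandardCharsets.ISO_8859_1; // stands in for a server default of iso_1
        System.out.println(encodeForServer("hello", iso1).length);        // 5
        System.out.println(encodeForServer("\u65E5\u672C", iso1).length); // 4
    }
}
```

In this model, a mixed string such as `"A\u65E5"` is sent entirely as Unicode, matching the description that jConnect sends the string, not just the unconvertible character, as Unicode data.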
If you are using a jConnect version that is earlier than 6.05 and you wish to use the unichar and univarchar datatypes, you must perform the following tasks:
For more information on support for unichar and univarchar datatypes, see the manuals for Adaptive Server version 12.5 or later.
jConnect uses special classes for all character-set conversions. By selecting a character-set converter class, you specify how jConnect handles single-byte and multibyte character-set conversions, and what performance impact the conversions have on your applications.
There are two character-set conversion classes. The conversion class that jConnect uses is based on the JCONNECT_VERSION, CHARSET, and CHARSET_CONVERTER_CLASS connection properties.
jConnect uses the JCONNECT_VERSION to determine the default character-set converter class. For JCONNECT_VERSION = 2 or 3, the default is TruncationConverter. For JCONNECT_VERSION = 4 or higher, the default is PureConverter.
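To make these defaults concrete, here is a minimal sketch of the selection rule written as a hypothetical helper; jConnect performs this resolution internally. It assumes that PureConverter lives in the same `com.sybase.jdbc3.utils` package as the TruncationConverter class named in the example that follows, and it leaves out the role of the CHARSET property.

```java
import java.util.Properties;

// Hypothetical helper mirroring the documented defaults; not part of jConnect.
class ConverterChoice {
    // TruncationConverter's package comes from this section's example;
    // PureConverter's package is an assumption.
    static final String TRUNCATION = "com.sybase.jdbc3.utils.TruncationConverter";
    static final String PURE = "com.sybase.jdbc3.utils.PureConverter";

    static String effectiveConverter(int jconnectVersion, Properties props) {
        String explicit = props.getProperty("CHARSET_CONVERTER_CLASS");
        if (explicit != null && !explicit.isEmpty()) {
            return explicit; // an explicit CHARSET_CONVERTER_CLASS overrides the default
        }
        // Versions 2 and 3 default to TruncationConverter; 4 and higher to PureConverter.
        return jconnectVersion >= 4 ? PURE : TRUNCATION;
    }

    public static void main(String[] args) {
        Properties none = new Properties();
        System.out.println(effectiveConverter(3, none));
        System.out.println(effectiveConverter(6, none));

        Properties override = new Properties();
        override.setProperty("CHARSET_CONVERTER_CLASS", TRUNCATION);
        System.out.println(effectiveConverter(6, override));
    }
}
```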
You can also set the CHARSET_CONVERTER_CLASS connection property to specify which character-set converter you want jConnect to use. This is useful if you want to use a character-set converter other than the default for your jConnect version.
For example, if you set JCONNECT_VERSION = 4 or later but want to use the TruncationConverter class rather than the multibyte PureConverter class, you can set CHARSET_CONVERTER_CLASS:
...
props.put("CHARSET_CONVERTER_CLASS",
    "com.sybase.jdbc3.utils.TruncationConverter");
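A fuller, self-contained version of the fragment above might look like the following. The user name, password, the JCONNECT_VERSION value of 6, and the example URL are placeholders. The driver class name `com.sybase.jdbc3.jdbc.SybDriver` is an assumption to check against your jConnect installation. The program connects only when you pass a URL on the command line.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

// Placeholder credentials and version; adjust for your environment.
class TruncationConnectSketch {

    static Properties buildProps(String user, String password) {
        Properties props = new Properties();
        props.put("user", user);
        props.put("password", password);
        props.put("JCONNECT_VERSION", "6");
        // Override the PureConverter default that applies when JCONNECT_VERSION >= 4.
        props.put("CHARSET_CONVERTER_CLASS",
                "com.sybase.jdbc3.utils.TruncationConverter");
        return props;
    }

    public static void main(String[] args) throws Exception {
        Properties props = buildProps("sa", "secret");
        if (args.length == 0) {
            // No URL given: just show the converter that would be requested.
            System.out.println(props.getProperty("CHARSET_CONVERTER_CLASS"));
            return;
        }
        // Assumed driver class; requires the jConnect JAR on the classpath.
        Class.forName("com.sybase.jdbc3.jdbc.SybDriver");
        // args[0] is a URL such as jdbc:sybase:Tds:myhost:5000 (hypothetical host and port).
        try (Connection conn = DriverManager.getConnection(args[0], props)) {
            System.out.println("Connected to " + conn.getMetaData().getDatabaseProductName());
        }
    }
}
```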
You can specify the character set to use in your application by setting the CHARSET driver property. If you do not set the CHARSET property:
You can also use the -J charset command line option for the IsqlApp application to specify a character set.
To determine which character sets are installed on your Adaptive Server, issue the following SQL query on your server:
select name from syscharsets
go
For the PureConverter class, if the designated CHARSET does not work with the client Java Virtual Machine (VM), the connection fails with a SQLException, indicating that you must set CHARSET to a character set that is supported by both Adaptive Server and the client.
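Because PureConverter needs a character set that both the server and the client JVM can handle, it can help to check candidates before connecting. The sketch below assumes you already have the server's character set names, for example from the syscharsets query above. It uses a small excerpt of the Table 2-4 mapping and relies only on `java.nio.charset.Charset.isSupported`. The class name, method name, and the sample name `some_other_charset` are hypothetical.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class CharsetCompatibility {
    // Small excerpt of Table 2-4: Sybase character set name -> JDK byte converter name.
    static final Map<String, String> SYBASE_TO_JDK = Map.of(
            "ascii_7", "ASCII",
            "iso_1", "ISO8859_1",
            "cp1252", "Cp1252",
            "eucjis", "EUC_JP",
            "utf8", "UTF8");

    // Keeps the server character sets that have a known JDK converter
    // and that the client JVM actually supports.
    static List<String> usableCharsets(List<String> serverCharsets) {
        List<String> usable = new ArrayList<>();
        for (String name : serverCharsets) {
            String jdkName = SYBASE_TO_JDK.get(name);
            if (jdkName != null && Charset.isSupported(jdkName)) {
                usable.add(name);
            }
        }
        return usable;
    }

    public static void main(String[] args) {
        // "some_other_charset" is a made-up name with no entry in the excerpt.
        System.out.println(usableCharsets(List.of("iso_1", "utf8", "some_other_charset")));
    }
}
```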
When the TruncationConverter class is used, character truncation is applied regardless of whether the designated CHARSET is 7-bit ASCII. Therefore, if your application needs to handle non-ASCII data (for instance, any Asian language), do not use TruncationConverter, because it corrupts the data.
If you use multibyte character sets and need to improve driver performance, you can use the SunIoConverter class provided with the jConnect samples. See "SunIoConverter character-set conversion" for details.
In addition, you can use TruncationConverter to improve performance if your application deals with only 7-bit ASCII data.
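The next sketch shows why TruncationConverter is safe only for 7-bit ASCII data. It models truncation as keeping the low-order byte of each Java char. That model is an assumption made for illustration, not a description of the converter's source code. The `isSevenBitAscii` helper is a hypothetical check an application could run before choosing TruncationConverter.

```java
import java.nio.charset.StandardCharsets;

// Simplified model: assumes truncation keeps only the low-order byte of each char.
class TruncationSketch {

    static boolean isSevenBitAscii(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0x7F) {
                return false;
            }
        }
        return true;
    }

    static byte[] truncate(String s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            out[i] = (byte) s.charAt(i); // the high-order byte is discarded
        }
        return out;
    }

    public static void main(String[] args) {
        String japanese = "\u65E5\u672C";
        System.out.println(isSevenBitAscii("Order 42")); // true
        System.out.println(isSevenBitAscii(japanese));   // false
        // Decoding the truncated bytes does not recover the original text.
        String roundTrip = new String(truncate(japanese), StandardCharsets.ISO_8859_1);
        System.out.println(roundTrip.equals(japanese));  // false
    }
}
```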
Table 2-4 lists the Sybase character sets that are supported by jConnect. The table also lists the corresponding JDK byte converter for each supported character set.
Although jConnect supports UCS-2, currently no Sybase databases or Open Servers support UCS-2.
Adaptive Server versions 12.5 and later support a version of Unicode known as the UTF-16 encoding.
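The difference between UCS-2 and UTF-16 matters only for characters outside the Basic Multilingual Plane (above U+FFFF). UCS-2 cannot encode these characters. UTF-16 represents each one as a surrogate pair of two 16-bit code units. The hypothetical helper below checks whether a Java string (which is stored as UTF-16) contains only characters that UCS-2 can represent.

```java
// Hypothetical check: true if the string has no supplementary (non-BMP) characters.
class Ucs2Check {

    static boolean representableInUcs2(String s) {
        return s.codePoints().allMatch(Character::isBmpCodePoint);
    }

    public static void main(String[] args) {
        String clef = "\uD834\uDD1E"; // U+1D11E, stored as a surrogate pair
        System.out.println(representableInUcs2("caf\u00E9")); // true
        System.out.println(representableInUcs2(clef));        // false
        System.out.println(clef.length());                    // 2 (UTF-16 code units)
        System.out.println(clef.codePointCount(0, clef.length())); // 1 character
    }
}
```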
Table 2-4: Supported Sybase character sets
| SybCharset name | JDK byte converter |
|---|---|
| ascii_7 | ASCII |
| big5 | Big5 |
| big5hk (see note) | Big5_HKSCS |
| cp037 | Cp037 |
| cp437 | Cp437 |
| cp500 | Cp500 |
| cp850 | Cp850 |
| cp852 | Cp852 |
| cp855 | Cp855 |
| cp857 | Cp857 |
| cp860 | Cp860 |
| cp863 | Cp863 |
| cp864 | Cp864 |
| cp866 | Cp866 |
| cp869 | Cp869 |
| cp874 | Cp874 |
| cp932 | MS932 |
| cp936 | GBK |
| cp950 | Cp950 |
| cp1250 | Cp1250 |
| cp1251 | Cp1251 |
| cp1252 | Cp1252 |
| cp1253 | Cp1253 |
| cp1254 | Cp1254 |
| cp1255 | Cp1255 |
| cp1256 | Cp1256 |
| cp1257 | Cp1257 |
| cp1258 | Cp1258 |
| deckanji | EUC_JP |
| eucgb | EUC_CN |
| eucjis | EUC_JP |
| eucksc | EUC_KR |
| GB18030 | GB18030 |
| ibm420 | Cp420 |
| ibm918 | Cp918 |
| iso_1 | ISO8859_1 |
| iso88592 | ISO8859_2 |
| iso88595 | ISO8859_5 |
| iso88596 | ISO8859_6 |
| iso88597 | ISO8859_7 |
| iso88598 | ISO8859_8 |
| iso88599 | ISO8859_9 |
| iso15 | ISO8859_15_FDIS |
| koi8 | KOI8_R |
| mac | MacRoman |
| mac_cyr | MacCyrillic |
| mac_ee | MacCentralEurope |
| macgreek | MacGreek |
| macturk | MacTurkish |
| sjis | MS932 |
| tis620 | MS874 |
| utf8 | UTF8 |
Note: The big5hk character set is supported only if you are using JDK 1.3 or later.
jConnect versions 4.1 and later support the use of the new European currency symbol, or "euro," and its conversion to and from UCS-2 Unicode.
The euro has been added to the following Sybase character sets: cp1250, cp1251, cp1252, cp1253, cp1254, cp1255, cp1256, cp1257, cp1258, cp874, iso885915, and utf8.
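To see how the euro is encoded in a few of these character sets, the following sketch asks the JVM for the euro sign's bytes. It uses canonical Java charset names (windows-1252, ISO-8859-15, UTF-8) rather than the converter names in Table 2-4. It also includes ISO-8859-1 (iso_1), which is not in the list above and cannot encode the euro. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

class EuroEncodingSketch {
    static final String EURO = "\u20AC";

    // Bytes of the euro sign in the given charset, or null if it cannot be encoded.
    static byte[] euroBytes(String charsetName) {
        Charset cs = Charset.forName(charsetName);
        if (!cs.newEncoder().canEncode(EURO)) {
            return null;
        }
        return EURO.getBytes(cs);
    }

    public static void main(String[] args) {
        String[] names = {"windows-1252", "ISO-8859-15", "UTF-8", "ISO-8859-1"};
        for (String name : names) {
            byte[] b = euroBytes(name);
            System.out.println(name + " -> " + (b == null ? "not encodable" : Arrays.toString(b)));
        }
    }
}
```

With these charsets, the euro is the single byte 0x80 in windows-1252 (cp1252), the single byte 0xA4 in ISO-8859-15, and three bytes (E2 82 AC) in UTF-8.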
To use the euro symbol:
The following Sybase character sets are not supported in jConnect because no JDK byte converters are analogous to the Sybase character sets:
You can use these character sets with the TruncationConverter class as long as the application uses only the 7-bit ASCII subsets of these character sets.
Copyright (C) 2005. Sybase Inc. All rights reserved.