Oracle Database - Bytes or Characters for VARCHAR2 and CHAR
Historically, database columns that hold alphanumeric data have been defined by the number of bytes they store. This approach was fine while the number of bytes equated to the number of characters, as it does with single-byte character sets.
With the increasing use of multi-byte character sets to support globalized databases comes the problem of bytes no longer equating to characters.
The VARCHAR2 and CHAR types support two methods of specifying lengths:
In bytes: VARCHAR2(10 BYTE). This will support up to 10 bytes of data, which could be as few as two characters in a multi-byte character set.
In characters: VARCHAR2(10 CHAR). This will support up to 10 characters of data, which could be as much as 40 bytes of information (see the sketch below).
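As a minimal sketch (the table and column names below are made up for illustration), both length semantics can be declared explicitly, side by side:

CREATE TABLE length_semantics_demo (
  name_byte VARCHAR2(10 BYTE),  -- holds at most 10 bytes of data
  name_char VARCHAR2(10 CHAR)   -- holds at most 10 characters, however many bytes they need
);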
1) Difference between varchar2(11 CHAR) and varchar2(11 BYTE)
If you define the field as VARCHAR2(11 BYTE), Oracle will allocate 11 bytes for storage, but you may not actually be able to store 11 characters in the field, because some of them take more than one byte to store, e.g. non-English characters.
By defining the field as VARCHAR2(11 CHAR), you tell Oracle to allocate enough space to store 11 characters, no matter how many bytes it takes to store each one. I believe that in Oracle 10g, up to 3 bytes per character were allowed for.
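As a hedged illustration (the table and column names are made up, and an AL32UTF8 database character set is assumed, where 'é' occupies 2 bytes), the same 11-character value fits a CHAR-sized column but not a BYTE-sized one:

CREATE TABLE name_demo (
  name_b VARCHAR2(11 BYTE),
  name_c VARCHAR2(11 CHAR)
);

-- 'Société Gén' is 11 characters but 14 bytes in AL32UTF8 (each 'é' takes 2 bytes)
INSERT INTO name_demo (name_b) VALUES ('Société Gén');  -- fails: ORA-12899 value too large for column
INSERT INTO name_demo (name_c) VALUES ('Société Gén');  -- succeeds: 11 characters fit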
Advice:
When using a multibyte character set such as UTF8, since your intention is most likely to define a column that can store a given number of characters of data, you may use:
> the CHAR modifier in the VARCHAR2/CHAR definition. That is, use VARCHAR2(80 CHAR), not VARCHAR2(80).
> the session or system parameter NLS_LENGTH_SEMANTICS to change the default behavior from BYTE to CHAR.
It's not recommended to change this setting at the SYSTEM scope, but rather at the SESSION scope.
ALTER system SET nls_length_semantics=CHAR scope=both;
ALTER system SET nls_length_semantics=byte scope=both;
ALTER session SET nls_length_semantics=CHAR;
ALTER session SET nls_length_semantics=byte;
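To check which default is currently in effect before or after running these commands, the standard NLS dictionary views can be queried, for example:

SELECT value FROM nls_session_parameters  WHERE parameter = 'NLS_LENGTH_SEMANTICS';
SELECT value FROM nls_database_parameters WHERE parameter = 'NLS_LENGTH_SEMANTICS';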
Depending on the system configuration, the size of a CHAR measured in BYTES can vary. In your examples:
1. VARCHAR2(11 BYTE) limits the field to 11 bytes.
2. VARCHAR2(11 CHAR) limits the field to 11 characters.
Conclusion:
1 CHAR is not equal to 1 BYTE.
BYTE would be the default; the other possibility is CHAR.
2) Difference between varchar2(10 CHAR) and varchar2(10)
BYTE would be the default; the other possibility is CHAR.
So, technically, varchar2(10) is the same as varchar2(10 BYTE), since BYTE is the default; the difference lies in the CHAR option: a character can take more than one byte (up to 3 bytes, if I recall correctly). Think multibyte character sets.
This option tells a multibyte database (!) to use character semantics when storing data: a varchar2(10 CHAR) would be able to store 10 characters, instead of 10 bytes' worth of characters.
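One way to see the byte cost of a multi-byte character is to compare the built-in LENGTH (characters) and LENGTHB (bytes) functions; in an AL32UTF8 database, for instance, 'é' counts as 1 character but 2 bytes:

SELECT LENGTH('é')  AS char_count,   -- 1 character
       LENGTHB('é') AS byte_count    -- 2 bytes in AL32UTF8
FROM   dual;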
For example, in ASCII each character is encoded in a single byte, so a varchar2(10) field would accept any 10-character word, whereas in UTF-16, where each character is encoded in 2 bytes, the same field would only accept 5-character words.
In comparison, a varchar2(10 CHAR) would always accept 10-character words, whether the character set is ASCII, UTF-16, or any other.
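If you are unsure which semantics an existing column was created with, the data dictionary records it; here is a sketch against the standard USER_TAB_COLUMNS view (the table name is the hypothetical one from the earlier example):

SELECT column_name, char_used, char_length, data_length
FROM   user_tab_columns
WHERE  table_name = 'NAME_DEMO';
-- CHAR_USED is 'B' for byte semantics and 'C' for character semantics;
-- DATA_LENGTH is always the maximum column size in bytes.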
For details, see Character Semantics and Globalization in Oracle (character-semantics-and-globalization).
Thanks