Most Java program text consists of ASCII characters, but any Unicode character can be used in identifiers, in comments, and in character and string literals. For example, π (the Greek small letter pi) is a valid Java identifier:
|Code section 3.100: Pi.
and in a string literal:
|Code section 3.101: Pi literal.
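As a minimal sketch of both uses, π can name a variable and can appear inside a string literal. The class name `PiDemo` and its method names are illustrative assumptions, and the sketch assumes the source file is compiled with a Unicode encoding such as UTF-8:

```java
class PiDemo {
    // π is an ordinary identifier here, exactly like any ASCII name.
    static double area(double radius) {
        double π = Math.PI;
        return π * radius * radius;
    }

    // The same character inside a string literal.
    static String symbol() {
        return "π";
    }

    public static void main(String[] args) {
        System.out.println(symbol() + " r^2 for r = 2 is " + area(2.0));
    }
}
```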
Unicode escape sequences
Unicode characters can also be expressed through Unicode escape sequences. A Unicode escape sequence may appear anywhere in a Java source file (including inside identifiers, comments, and string literals).
Unicode escape sequences consist of:
- a backslash '\' (ASCII character 92, hex 0x5c),
- a 'u' (ASCII 117, hex 0x75),
- optionally one or more additional 'u' characters, and
- four hexadecimal digits (the characters '0' through '9', 'a' through 'f', or 'A' through 'F').
Each such sequence represents one UTF-16 code unit. For example, 'a' is equivalent to '\u0061'. A single escape cannot express a character beyond U+FFFF; such a character must be written as a surrogate pair, that is, two consecutive escapes.
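The rules above can be checked with a short sketch. The class name `EscapeBasics` and its method names are illustrative assumptions; U+1F600 is used only as a sample character outside the 16-bit range:

```java
class EscapeBasics {
    // One escape stands for one UTF-16 code unit, so both literals are the same char.
    static boolean sameLetter() {
        return 'a' == '\u0061';
    }

    // Additional u characters after the backslash are permitted.
    static boolean extraU() {
        return 'a' == '\uuu0061';
    }

    // U+1F600 lies beyond U+FFFF, so it takes two escapes: the surrogate pair D83D, DE00.
    static String beyondBmp() {
        return "\uD83D\uDE00";
    }

    public static void main(String[] args) {
        String s = beyondBmp();
        System.out.println(s.length());                      // number of UTF-16 code units
        System.out.println(s.codePointCount(0, s.length())); // number of Unicode characters
    }
}
```

The string holds two `char` values but a single Unicode character, which is why `length()` and `codePointCount` can disagree.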
Any character in a program may be written as a Unicode escape sequence, but such programs are neither compact nor very readable, except to the Java compiler.
One can find a full list of the characters here.
π may also be represented in Java as the Unicode escape sequence \u03C0. Thus, the following is a valid, but not very readable, declaration and assignment:
|Code section 3.102: Unicode escape sequences for Pi.
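A declaration of this kind might look like the sketch below. The surrounding class `PiEscape` and the method `circumference` are illustrative assumptions added so that the fragment compiles on its own:

```java
class PiEscape {
    static double circumference(double radius) {
        // The compiler translates the escape first, so this declares a variable named by the letter pi.
        double \u03C0 = Math.PI;
        return 2 * \u03C0 * radius;
    }

    public static void main(String[] args) {
        System.out.println(circumference(1.0));
    }
}
```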
The following demonstrates the use of Unicode escape sequences in other Java syntax:
|Code section 3.103: Unicode escape sequences in a string literal.
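As a hedged illustration, the sketch below uses escapes in a string literal, in a character literal, and even in place of a keyword. The class name `EscapedSyntax` and its method names are assumptions made for this example:

```java
class EscapedSyntax {
    // A string literal built partly from escapes: pi, a space, the "almost equal to" sign, then digits.
    static String approximation() {
        return "\u03C0 \u2248 3.14159";
    }

    // A character literal written as an escape.
    static char piChar() {
        return '\u03C0';
    }

    // The six escapes below spell the keyword static; the compiler sees a normal static method.
    \u0073\u0074\u0061\u0074\u0069\u0063 int answer() {
        return 42;
    }

    public static void main(String[] args) {
        System.out.println(approximation().length());
        System.out.println(answer());
    }
}
```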
Note that a Unicode escape sequence functions just like the character it represents in the source code. For example, \u0022 (double quote, ") must be escaped inside a string literal just as a literal " must be.
|Code section 3.104: Double quote.
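The consequence can be seen in the following sketch, where the class name `QuoteEscape` and its method names are illustrative assumptions. Because escapes are translated before the compiler recognizes string literals, an escaped double quote written this way ends the literal early:

```java
class QuoteEscape {
    // The ordinary escape sequence keeps the quote inside the literal.
    static String quote() {
        return "\"";
    }

    // Both escapes below become real double quotes before the compiler reads the
    // line, so this is the expression "a".length() + "b", not one long string.
    static String surprise() {
        return "a\u0022.length() + \u0022b";
    }

    public static void main(String[] args) {
        System.out.println(quote());
        System.out.println(surprise());
    }
}
```

The second method returns a two-character string rather than the text that appears between the outer quotes.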
International language support
The language distinguishes between bytes and characters. Characters were originally stored internally as UCS-2; as of J2SE 5.0, the language also supports UTF-16 and its surrogate pairs, which cover characters beyond U+FFFF. Java program source may therefore contain any Unicode character.
|Code listing 3.50: 哈嘍世界.java
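A listing with that file name could look like the sketch below. The method `greeting` and the variable `問候` are illustrative assumptions, and the file is assumed to be saved and compiled as UTF-8. The sketch also shows the distinction between characters and bytes:

```java
import java.nio.charset.StandardCharsets;

class 哈嘍世界 {
    static String greeting() {
        return "哈嘍世界";
    }

    public static void main(String[] args) {
        String 問候 = greeting();
        // Characters (UTF-16 code units) and bytes are counted separately.
        System.out.println(問候.length());
        System.out.println(問候.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

The byte count depends on the encoding passed to `getBytes`, whereas `length()` always counts UTF-16 code units.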
- "3.1 Unicode", The Java™ Language Specification , Java SE 7 Edition, pp. 15-16.