LaTeX/Internationalization
From Wikibooks, the open-content textbooks collection
When you write documents in languages other than English, there are four areas where LaTeX has to be configured appropriately:
- All automatically generated text strings have to be adapted to the new language.
- Language-specific typographic rules have to be applied. In French, for example, there is a mandatory space before each colon character (:).
- LaTeX needs to know the hyphenation rules for the new language.
- You want to be able to insert all the language-specific special characters directly, without using any strange coding.
If you simply need to add a few words from another language, you may find LaTeX/Accents an easier way.
For the first and second points, and part of the third, the babel package by Johannes Braams will take care of everything, provided your system is configured appropriately (and it is, unless your LaTeX distribution has a bug). Load it in your preamble, passing the language you want to use as an option:
\usepackage[language]{babel}
Place it soon after the \documentclass command, so that all the other packages you load afterwards know the language you are using. A list of the languages built into your LaTeX system is displayed every time the compiler is started. Babel automatically activates the appropriate hyphenation rules for the language you choose. If your LaTeX format does not support hyphenation in the language of your choice, babel will still work but will disable hyphenation, which has quite a negative effect on the appearance of the typeset document. Babel also defines new commands for some languages, which simplify the input of special characters. See the sections about individual languages for more information.
If you call babel with multiple languages:
\usepackage[languageA,languageB]{babel}
then the last language in the option list will be active (i.e. languageB), and you can use the command
\selectlanguage{languageA}
to change the active language.
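As a minimal sketch of the above, the following document loads two languages and switches between them. The choice of ngerman and english is only an illustration. \foreignlanguage is a babel command not discussed above; it typesets a short passage in another language without switching the whole document.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
% english is listed last, so it is the active language at \begin{document}
\usepackage[ngerman,english]{babel}
\begin{document}
\today{} is printed using English conventions.

\selectlanguage{ngerman}
% From here on, automatic text and hyphenation follow German rules
\today{} folgt jetzt den deutschen Konventionen.

\selectlanguage{english}
% Only the braced word is treated as German
A long German word: \foreignlanguage{ngerman}{Donaudampfschifffahrt}.
\end{document}
```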
Most modern computer systems allow you to input letters of national alphabets directly from the keyboard. To handle the variety of input encodings used for different groups of languages and on different computer platforms, LaTeX employs the inputenc package:
\usepackage[encoding]{inputenc}
When using this package, you should consider that other people might not be able to display your input files on their computer, because they use a different encoding. For example, the German umlaut ä on OS/2 is encoded as 132, on Unix systems using ISO-LATIN 1 it is encoded as 228, while in Cyrillic encoding cp1251 for Windows this letter does not exist at all; therefore you should use this feature with care. The following encodings may come in handy, depending on the type of system you are working on:
| Operating system | Western Latin encoding | Cyrillic encoding |
|---|---|---|
| Mac | applemac | macukr |
| Unix | latin1 | koi8-ru |
| Windows | ansinew | cp1251 |
| DOS, OS/2 | cp850 | cp866nav |
If you have a multilingual document with conflicting input encodings, you might want to switch to Unicode, using the ucs package.
\usepackage{ucs}
\usepackage[utf8x]{inputenc}
will enable you to create LaTeX input files in UTF-8 (selected here through the utf8x option), a multi-byte encoding in which each character is encoded in as few as one byte and as many as four bytes.
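For comparison, here is a minimal sketch of a UTF-8 document that uses the plain utf8 option of inputenc (the same option that appears in the Polish section) instead of ucs with utf8x. It assumes the source file really is saved as UTF-8 and that T1 encoded fonts are installed. The sample words are arbitrary.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}     % 8-bit font encoding with accented glyphs
\usepackage[utf8]{inputenc}  % the file itself must be saved as UTF-8
\begin{document}
% Accented letters are typed directly, without accent macros
Größe, café, señor, naïve.
\end{document}
```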
Font encoding is a different matter. It defines at which position inside a TeX font each letter is stored. Multiple input encodings can be mapped onto one font encoding, which reduces the number of required font sets. Font encodings are handled through the fontenc package:
\usepackage[encoding]{fontenc}
where encoding is the font encoding. It is possible to load several encodings simultaneously.
The default LaTeX font encoding is OT1, the encoding of the original Computer Modern TeX font. It contains only the 128 characters of the 7-bit ASCII character set. When accented characters are required, TeX creates them by combining a normal character with an accent. While the resulting output looks perfect, this approach stops the automatic hyphenation from working inside words containing accented characters. Besides, some Latin letters cannot be created by combining a normal character with an accent, to say nothing of letters of non-Latin alphabets, such as Greek or Cyrillic.
To overcome these shortcomings, several 8-bit CM-like font sets were created. Extended Cork (EC) fonts in T1 encoding contain letters and punctuation characters for most of the European languages based on Latin script. The LH font set contains the letters necessary to typeset documents in languages using Cyrillic script. Because of the large number of Cyrillic glyphs, they are arranged into four font encodings: T2A, T2B, T2C, and X2. The CB bundle contains fonts in LGR encoding for the composition of Greek text. By using these fonts you can improve or enable hyphenation in non-English documents. Another advantage of using the new CM-like fonts is that they provide fonts of the CM families in all weights, shapes, and optically scaled font sizes.
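The following sketch shows one way to load two font encodings at once. It assumes that Cyrillic fonts in T2A encoding (for example the LH fonts mentioned above) are installed. When several encodings are passed to fontenc, the last one listed becomes the document default. The \cyr... commands are the ASCII names that the T2A encoding defines for Cyrillic letters; they are used here only so that the example does not depend on any input encoding.

```latex
\documentclass{article}
% T2A is loaded for Cyrillic; T1 is listed last and becomes the default
\usepackage[T2A,T1]{fontenc}
\begin{document}
Text in the default T1 encoding: na\"ive caf\'e.

% Switch the font encoding locally to typeset a Cyrillic word
{\fontencoding{T2A}\selectfont \cyrp\cyrr\cyri\cyrv\cyre\cyrt}
\end{document}
```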
Here is a collection of suggestions about writing a LaTeX document in a language other than English. If you have experience in a language not listed below, please add some notes about it.
Arabic script
For languages which use the Arabic script, including Arabic, Persian, Urdu, Pashto, Kurdish, Uyghur, etc., add the following code to your preamble:
\usepackage{arabtex}
You can input text in either romanized characters or native Arabic script encodings. Use any of the following commands or environments to enter text:
\< … > \RL{ … } \begin{arabtext} … \end{arabtext}.
See the ArabTeX Wikipedia article for further details.
Cyrillic script
For details, see the section "Writing in Cyrillic" in http://www.ctan.org/tex-archive/info/lshort/english/lshort.pdf .
See also the Bulgarian translation of the "Not so Short Introduction to LaTeX 2e" at http://www.ctan.org/tex-archive/info/lshort/bulgarian/lshort-bg.pdf
This enables you to type Cyrillic letters directly via your keyboard, but with a different layout than a standard Cyrillic keyboard! To get the standard layout, include only: \usepackage[OT1]{fontenc} \usepackage[russian]{babel}
Czech
Czech works fine using
\usepackage[czech]{babel}
\usepackage[T1]{fontenc}
\usepackage[utf8x]{inputenc}
You may use a different encoding, but UTF-8 is becoming the standard and it allows you to have „Czech quotation marks“ directly in your text. Otherwise, there are the macros \glqq and \grqq to produce the left and right quotation marks.
French
Some hints for those creating French documents with LaTeX: you can load French language support with the following command:
\usepackage[frenchb]{babel}
There are multiple options for typesetting French documents, depending on the flavor of French: french, frenchb, and francais for Parisian French, and acadian and canadien for new-world French. All enable French hyphenation, if you have configured your LaTeX system accordingly. All of these also change all automatic text into French: \chapter prints Chapitre, \today prints the current date in French and so on. A set of new commands also becomes available, which allows you to write French input files more easily. Check out the following table for inspiration:
| input code | rendered output |
|---|---|
| \og guillemets \fg{} | « guillemets » |
| M\up{me}, D\up{r} | Mme, Dr |
| 1\ier{}, 1\iere{}, 1\ieres{} | 1er, 1re, 1res |
| 2\ieme{} 4\iemes{} | 2e 4es |
| \No 1, \no 2 | N° 1, n° 2 |
| 20~\degres C, 45\degres | 20 °C, 45° |
| M. \bsc{Durand} | M. Durand |
| \nombre{1234,56789} | 1 234,567 89 |
You will also notice that the layout of lists changes when switching to the French language. For more information on what the frenchb option of babel does and how you can customize its behavior, run LaTeX on file frenchb.dtx and read the produced file frenchb.pdf or frenchb.dvi.
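As a short sketch, the document below uses a few of the commands from the table. It loads the french option, one of the equivalent options listed above, and uses only ASCII input so that it does not depend on an input encoding. The sentence itself is just a placeholder.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}      % T1 fonts contain real guillemets
\usepackage[french]{babel}
\begin{document}
% \og and \fg produce the guillemets with the correct spacing
\og Bonjour \fg{} dit M\up{me} \bsc{Durand}.

% Ordinal abbreviations and the numero sign
C'est le 1\ier{} essai et la 2\ieme{} version, voir le \no 3.
\end{document}
```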
German
You can load German language support using either one of the two following commands.
For old German orthography use
\usepackage[german]{babel}
or for new German orthography use
\usepackage[ngerman]{babel}
This enables German hyphenation, if you have configured your LaTeX system accordingly. It also changes all automatic text into German, e.g. “Chapter” becomes “Kapitel.” A set of new commands also becomes available, which allows you to write German input files more quickly even when you don’t use the inputenc package. Check out the table below for inspiration. With inputenc, all this becomes moot, but your text is also locked in a particular encoding world.
| input code | rendered output |
|---|---|
| "a | ä |
| "s | ß |
| "` or \glqq | „ |
| "' or \grqq | “ |
| "< or \flqq | « |
| "> or \frqq | » |
| \flq | ‹ |
| \frq | › |
| \dq | " |
In German books you often find French quotation marks («guillemets»). German typesetters, however, use them differently. A quote in a German book would look like »this«. In the German-speaking part of Switzerland, typesetters use «guillemets» the same way the French do. A major problem arises from the use of commands like \flq: if you use the OT1 font encoding (which is the default), the guillemets will look like the math symbol "≪", which turns a typesetter’s stomach. T1 encoded fonts, on the other hand, do contain the required symbols. So if you are using this type of quote, make sure you use the T1 encoding (\usepackage[T1]{fontenc}).
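The shorthands from the table can be tried out with a minimal sketch like the following. It assumes T1 encoded fonts are installed and uses only ASCII input, so no inputenc line is needed. The sample sentences are arbitrary.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}      % needed for proper guillemets
\usepackage[ngerman]{babel}
\begin{document}
% "` and "' give German opening and closing quotes; "o, "u, "s give umlauts and sharp s
Er sagte: "`Sch"one Gr"u"se!"'

% German-style guillemets point inwards
Mit Guillemets: ">so wie hier"<.
\end{document}
```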
Greek
This is the preamble you need to write in the Greek language.
\usepackage[english,greek]{babel}
\usepackage[iso-8859-7]{inputenc}
This preamble enables hyphenation and changes all automatic text to Greek. A set of new commands also becomes available, which allows you to write Greek input files more easily. In order to temporarily switch to English and vice versa, one can use the commands \textlatin{english text} and \textgreek{greek text} that both take one argument which is then typeset using the requested font encoding. Otherwise you can use the command \selectlanguage{...} described in a previous section. Use \euro for the Euro symbol.
Hungarian
Similar to Italian, but use the following lines:
\usepackage[magyar]{babel}
\usepackage[latin2]{inputenc}
\usepackage[T1]{fontenc}
- More information in Hungarian.
Italian
Italian is well supported by LaTeX. Just add \usepackage[italian]{babel} at the beginning of your document and the output of all the commands will be translated properly. You can add accented letters without any particular setting: just write \`a \`e \'e \`i \`o \`u and you will get à è é ì ò ù (note that the letter changes with the direction of the accent). However, typing accents this way is tedious and time-consuming. Moreover, if you are using a spell-checking program, "città" is correct, but "citt\`a" will be flagged as a mistake. If you add \usepackage[latin1]{inputenc} at the beginning of your document, LaTeX will correctly handle accented letters typed directly. To sum up, just add
\usepackage[italian]{babel}
\usepackage[latin1]{inputenc}
at the beginning of your document and you can write in Italian without worrying about translations and fonts. If your document compiles without errors, there is nothing else to do. If you start getting unexplained errors whenever you use an accented letter, the problem is most likely the encoding of your files. A LaTeX source is just plain text, so accented letters must be stored in the encoding that LaTeX expects. If you always write your document with the same program on the same computer, you should not have any problems. If you edit it with different programs, you may start getting strange errors from the compiler, because an editor saved the document in an encoding different from the one used when it was created, and LaTeX can no longer recognize the accented letters. Most operating systems use UTF-8 by default, but this can create problems if you use programs based on different libraries or work across different operating systems. The best way to solve this problem is to save the file as ISO-8859-1, which is the encoding selected by the latin1 option and includes all the letters you need. Some text editors let you change the encoding in their settings.
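If your editor saves files as UTF-8 and you would rather not convert them, one alternative is to declare that encoding to LaTeX instead. The sketch below is an assumption about your setup rather than part of the recipe above: it replaces the latin1 option with utf8 and adds T1 fonts so that accented words hyphenate.

```latex
\documentclass{article}
\usepackage[italian]{babel}
\usepackage[utf8]{inputenc}  % must match the encoding the editor saves in
\usepackage[T1]{fontenc}
\begin{document}
% Both spellings of the accented letter give the same output
La citt\`a e la città sono la stessa parola.
\end{document}
```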
Korean
To use LaTeX for typesetting Korean, we need to solve three problems:
- We must be able to edit Korean input files. Korean input files must be in plain text format, but because Korean uses its own character set outside the repertoire of US-ASCII, they will look rather strange in a normal ASCII editor. The two most widely used encodings for Korean text files are EUC-KR and its upward-compatible extension used in Korean MS-Windows, CP949/Windows-949/UHC. In these encodings each US-ASCII character represents its normal ASCII character, as in other ASCII-compatible encodings such as ISO-8859-x, EUC-JP, Big5, or Shift_JIS. On the other hand, Hangul syllables, Hanjas (Chinese characters as used in Korea), Hangul Jamos, Hiraganas, Katakanas, Greek and Cyrillic characters and other symbols and letters drawn from KS X 1001 are represented by two consecutive octets, the first of which has its MSB set. Until the mid-1990s, it took a considerable amount of time and effort to set up a Korean-capable environment under a non-localized (non-Korean) operating system. You can skim through the now much-outdated http://jshin.net/faq to get a glimpse of what it was like to use Korean under a non-Korean OS in the mid-1990s. These days all three major operating systems (Mac OS, Unix, Windows) come equipped with pretty decent multilingual support and internationalization features, so that editing Korean text files is not much of a problem anymore, even on non-Korean operating systems.
- TeX and LaTeX were originally written for scripts with no more than 256 characters in their alphabet. To make them work for languages with considerably more characters, such as Korean or Chinese, a subfont mechanism was developed. It divides a single CJK font with thousands or tens of thousands of glyphs into a set of subfonts with 256 glyphs each. For Korean, there are three widely used packages: HLaTeX by UN Koaunghi, hLaTeXp by CHA Jaechoon and the CJK package by Werner Lemberg. HLaTeX and hLaTeXp are specific to Korean and provide Korean localization on top of the font support. They both can process Korean input text files encoded in EUC-KR. HLaTeX can even process input files encoded in CP949/Windows-949/UHC and UTF-8 when used along with Lambda (Λ) and Omega (Ω). The CJK package is not specific to Korean. It can process input files in UTF-8 as well as in various CJK encodings including EUC-KR and CP949/Windows-949/UHC, and it can be used to typeset documents with multilingual content (especially Chinese, Japanese and Korean). The CJK package has no Korean localization such as the one offered by HLaTeX and it does not come with as many special Korean fonts as HLaTeX.
- The ultimate purpose of using typesetting programs like TeX and LaTeX is to get documents typeset in an ‘aesthetically’ satisfying way. Arguably the most important element in typesetting is a set of well-designed fonts. The HLaTeX distribution includes UHC PostScript fonts of 10 different families and Munhwabu fonts (TrueType) of 5 different families. The CJK package works with a set of fonts used by earlier versions of HLaTeX and it can use Bitstream’s Cyberbit TrueType font.
To use the HLaTeX package for typesetting your Korean text, put the following declaration into the preamble of your document:
\usepackage{hangul}
This command turns the Korean localization on. The headings of chapters, sections, subsections, the table of contents and the list of figures are all translated into Korean, and the formatting of the document is changed to follow Korean conventions. The package also provides automatic “particle selection.” In Korean, there are pairs of postfix particles that are grammatically equivalent but different in form. Which of any given pair is correct depends on whether the preceding syllable ends with a vowel or a consonant. (It is a bit more complex than this, but this should give you a good picture.) Native Korean speakers have no problem picking the right particle, but it cannot be determined in advance which particle to use for references and other automatic text that will change while you edit the document. It takes a painstaking effort to place appropriate particles manually every time you add or remove references or simply shuffle parts of your document around. HLaTeX relieves its users from this boring and error-prone process.
In case you don’t need Korean localization features but just want to typeset Korean text, you can put the following line in the preamble, instead.
\usepackage{hfont}
For more details on typesetting Korean with HLaTeX, refer to the HLaTeX Guide. Check out the web site of the Korean TeX User Group (KTUG) at http://www.ktug.or.kr/.
Polish
If you plan to use Polish in your UTF-8 encoded document, use the following code:
\usepackage[utf8]{inputenc}
\usepackage{polski}
\usepackage[polish]{babel}
Portuguese
Add the following code to your preamble:
\usepackage[portuguese]{babel}
\usepackage[latin1]{inputenc}
\usepackage[T1]{fontenc}
If you are in Brazil, you can select Brazilian Portuguese by using the brazilian option instead. The first line gets everything translated properly, the second lets you input text correctly, and the third gets hyphenation right. Note that we are using the latin1 input encoding here, so this will not work on a Mac or on DOS; use the appropriate encoding for your system. If you are using Linux, use
\usepackage[utf8]{inputenc}
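Putting the pieces together, a minimal sketch for Brazilian Portuguese on a UTF-8 system could look as follows. The section title and sentence are placeholders, and the file is assumed to be saved as UTF-8.

```latex
\documentclass{article}
\usepackage[brazilian]{babel} % automatic text and hyphenation
\usepackage[utf8]{inputenc}   % direct input of accented letters
\usepackage[T1]{fontenc}      % lets words with accents hyphenate
\begin{document}
\tableofcontents              % the heading is generated in Portuguese
\section{Introdução}
A acentuação é digitada diretamente no arquivo.
\end{document}
```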
Spanish
To enable Spanish writing, besides installing the appropriate hyphenation patterns, you type:
\usepackage[spanish]{babel}
The trick is that Spanish has several options and commands that change the layout. The options may be given either in the call to babel, after the spanish option, or beforehand, by defining the \spanishoptions macro. So the following commands are roughly equivalent:
\def\spanishoptions{mexico} \usepackage[spanish]{babel}
\usepackage[spanish,mexico]{babel}
In general, the former syntax should be preferred, as the latter is a deviation from standard babel behavior, and thus may break other programs (LyX, latex2rtf2e) interacting with LaTeX.
Two particularly useful options are es-noquoting and es-nolists: some packages and classes are known to collide with Spanish in the way they handle active characters, and these options disable the internal workings of Spanish to allow you to overcome these common pitfalls. Moreover, these options may simplify the way LyX customizes some features of the Spanish layout from inside the GUI.
The options mexico and mexico-com provide support for local customs in Mexico: the former uses the decimal dot, as customary, and the latter allows the decimal comma, as required by the Mexican Official Norm (NOM) of the Department of Economy for labels on foods and goods. More localizations are in the making.
Two particularly useful commands are \spanishoperators and \spanishdeactivate.
The macro \spanishoperators{list of operators} contains a list of Spanish mathematical operators, and may be redefined at will. For instance, the command \def\spanishoperators{sen} only defines sen, overriding all other definitions; the command \let\spanishoperators\relax disables them all. This command supports accented or spaced operators. For instance, the following operators are defined by default.
l\acute{i}m l\acute{i}m\,sup l\acute{i}m\,inf m\acute{a}x \acute{i}nf m\acute{i}n sen tg arc\,sen arc\,cos arc\,tg cotg cosec senh tgh
The \acute{<letter>} command puts an accent, and the \, command adds a small space.
Finally, the macro \spanishdeactivate{list of characters} disables some active characters, to keep you out of trouble if they are redefined by other packages. The candidates for deactivation are the set <>."'.
Please check the documentation for Babel or spanish.dtx for further details.
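A minimal sketch of a Spanish document with the Mexican localization is shown below. It passes mexico as a package option (the second of the two forms shown earlier) purely for brevity; the \spanishoptions form can be substituted. The example relies on \sen being one of the default operators listed above, and its text uses only ASCII so that no input encoding is involved.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
% mexico selects the Mexican conventions, including the decimal dot
\usepackage[spanish,mexico]{babel}
\begin{document}
% \sen is one of the Spanish operators defined by default
La identidad $\sen^2 x + \cos^2 x = 1$ se cumple para todo $x$.

Un valor aproximado: $\pi \approx 3.1416$.
\end{document}
```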