Hyphenation

Plain TeX’s built-in hyphenation algorithm uses patterns to decide where words can be hyphenated. The patterns are in the file hyphen.tex, which you can find in your TeX installation. The contents of that file might appear a bit cryptic; there is a question about this on Stack Exchange.

Because the hyphenation patterns cannot cover every possible word break in a language, there will be exceptions: cases where TeX fails to hyphenate a word properly. In a document I was working on recently ‘typography’ was broken as ‘typog-raphy’, which looked strange to me. You can instruct TeX to break a word the way you want by putting hyphenation exceptions at the beginning of the document, like this:

\hyphenation{ty-po-gra-phy}

But I discovered that this wasn’t actually a hyphenation failure: TeX was simply using its default US English hyphenation rules. When I switched to the British hyphenation patterns with \uselanguage{ukenglish}, I got ‘typo-graphy’ without needing any exceptions. British English tends to break words etymologically, whereas American English breaks words syllabically.
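A quick way to check this kind of thing is plain TeX’s \showhyphens macro, which reports the break points TeX would allow. The following is only a sketch: it assumes your format provides \uselanguage and has the ukenglish patterns loaded, as described above.

```tex
% Sketch: ask TeX where it would hyphenate a word.
% \showhyphens is defined in plain.tex; the result is reported
% in the terminal output and the log file, not on the page.
\showhyphens{typography}   % default (US English) patterns

% Assumes \uselanguage and the ukenglish patterns are available
% in your format, as in the text above.
\uselanguage{ukenglish}
\showhyphens{typography}   % British English patterns
\bye
```

Comparing the two entries in the log shows whether a strange break comes from the patterns in use or really needs an exception.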

Modern TeX engines can use hyphenation patterns for many languages. The file hyphenation.pdf lists all hyphenation patterns for TeX, which have been collected in a single package, hyph-utf8, that can be used with pdfTeX, XeTeX, or LuaTeX.

An example of their use is given in the following files: hyph.tex, hyph.pdf

These files use the commands \uselanguage{ukenglish}, \uselanguage{ngerman}, etc. to switch between hyphenation patterns. If you find those commands a bit long, you can create an alias like this:

\def\de{\uselanguage{ngerman}}

Now \de will apply German hyphenation.

You can maintain hyphenation exceptions for more than one language, because \hyphenation{} stores its exceptions for whichever language is current. Select each language before giving its \hyphenation{} list:

\uselanguage{ukenglish}
\hyphenation{man-u-script man-u-scripts ap-pen-dix also into upon}


\uselanguage{ancientgreek}
\hyphenation{δε-δογ-μέ-νον Λα-κε-δαι-μονί-ων δύο ἀπὸ}


Entering a word in \hyphenation{} without any hyphens means it will not be hyphenated at all.

There is another method which allows more control over the number of letters before and after a word break:

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% for hyphenation profiles and European languages
{\catcode`@=11
\gdef\eplainsetlanguage#1#2#3{%
% do not set the language if the name is undefined in the current TeX.
\expandafter\ifx\csname lang@#1\endcsname \relax
\message{no patterns for #1}%
\else
\global\language = \csname lang@#1\endcsname
\fi
%
% but there is no harm in adjusting the hyphenmin values regardless.
\global\lefthyphenmin = #2\relax
\global\righthyphenmin = #3\relax
}}%

\def\ukenglish{\eplainsetlanguage{ukenglish}{2}{3}}
\def\ngerman{\eplainsetlanguage{ngerman}{2}{3}}
\def\russian{\eplainsetlanguage{russian}{2}{2}}
\def\latin{\eplainsetlanguage{latin}{2}{2}}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

These macros, written by Karl Berry, allow you to change the values of \lefthyphenmin and \righthyphenmin. \lefthyphenmin sets the minimum number of characters before a break, and \righthyphenmin the minimum number of characters after it. \lefthyphenmin is normally set to 2 and \righthyphenmin to 3, so the word ‘advocate’ could be broken as ad-vocate or advoc-ate, but advoca-te would not be permissible. The 2, 3 settings seem to be used in English, French and German, but other languages may have different settings.
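To see what these two parameters do, you can set them by hand and ask plain TeX’s \showhyphens macro to report the permitted breaks in the log. This is a small sketch using the default patterns; the values in the last test are arbitrary and chosen only to make the restriction obvious.

```tex
% Sketch: the same word under different hyphenmin values.
% \showhyphens writes its report to the terminal and the log file.
\lefthyphenmin=2 \righthyphenmin=3   % the usual English values
\showhyphens{advocate}

\lefthyphenmin=2 \righthyphenmin=2   % permit two letters after a break
\showhyphens{advocate}

\lefthyphenmin=4 \righthyphenmin=4   % arbitrary, much stricter values
\showhyphens{advocate}
\bye
```

TeX reads the two values when a paragraph starts, so set them before the text they should affect.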

In \def\ukenglish{\eplainsetlanguage{ukenglish}{2}{3}}, the name \ukenglish can be changed to something different if you prefer, e.g.

\def\english{\eplainsetlanguage{ukenglish}{2}{3}}
\def\deutsch{\eplainsetlanguage{ngerman}{2}{3}}
\def\rus{\eplainsetlanguage{russian}{2}{2}}
\def\francais{\eplainsetlanguage{french}{2}{3}}
\def\gaeilge{\eplainsetlanguage{irish}{2}{2}}
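A document could then use the short names roughly as follows. This is a sketch only: it assumes the definitions above have already been read and that the ukenglish and ngerman patterns are loaded in your format.

```tex
% Assumes \eplainsetlanguage, \english and \deutsch are defined
% as shown above, and that both sets of patterns are available.
\english
This paragraph is hyphenated with the British English patterns.
\par
\deutsch
Dieser Absatz wird nach den deutschen Trennmustern getrennt.
\par
% The macros assign with \global, so a switch made inside a group
% is not undone when the group ends; switch back explicitly.
\english
Back to British English for the rest of the document.
\bye
```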


Here is the example file from above, this time using the Eplain hyphenation macros. I’ve also tested the hyphenation a bit more by setting the type in two columns using Eplain’s \doublecolumns macro: hyph2.tex, hyph2.pdf

Of the two methods, the first is easier to use. It also works better when used within macros.