patgen



PATGEN(1)                                                            PATGEN(1)




NAME

       patgen - generate patterns for TeX hyphenation


SYNOPSIS

       patgen dictionary_file pattern_file patout_file translate_file


DESCRIPTION

       This  manual page is not meant to be exhaustive.  The complete documen-
       tation for this version of TeX can be found in the info file or  manual
       Web2C: A TeX implementation.

       The  patgen  program  reads  the  dictionary_file  containing a list of
       hyphenated words and the pattern_file  containing  previously-generated
       patterns   (if  any)  for  a  particular  language,  and  produces  the
       patout_file with (previously- plus  newly-generated)  hyphenation  pat-
       terns  for  that language. The translate_file defines language specific
       values for the parameters left_hyphen_min and right_hyphen_min used  by
       TeX’s  hyphenation  algorithm  and  the  external representation of the
       lower and upper case version(s) of all `letters’ of that language. Fur-
       ther details of the pattern generation process such as hyphenation lev-
       els and pattern lengths are requested  interactively  from  the  user’s
       terminal.  Optionally  patgen  creates  a  new dictionary file pattmp.n
       showing the good and bad hyphens found by the generated patterns, where
       n is the highest hypenation level.

       The  patterns  generated  by  patgen  can  be read by initex for use in
       hyphenating words. For a (very) long example of  patgen’s  output,  see
       $TEXMFMAIN/tex/generic/hyphen/hyphen.tex,  which  contains the patterns
       TeX uses for English.  At some sites, patterns for several  other  lan-
       guages  may  be  available,  and  the  local tex programs may have them
       preloaded; consult your Local Guide or your  system  administrator  for
       details.

       All filenames must be complete; no adding of default extensions or path
       searching is done.


FILE FORMATS

       Letters
           When initex digests hyphenation patterns, TeX first expands  macros
           and  the  result  must entirely consist of digits (hyphenation lev-
           els), dots (`.’, edge of a word), and letters. In pattern files for
           non-English  languages  letters  are often represented by macros or
           other expandable constructs.  For the purpose of patgen  these  are
           just  character  sequences,  subject  to the condition that no such
           sequence is a prefix of another one.

       Dictionary file
           A dictionary file contains a weighted list of hyphenated words, one
           word per line starting in column 1. A digit in column 1 indicates a
           global word weight (initially =1) applicable to all following words
           up  to  the next global word weight. A digit at some intercharacter
           position indicates a weight for that position only.

           The hyphens in a word are indicated by `-’, `*’, or `.’  (or  their
           replacements  as  defined in the translate file) for hyphens yet to
           be found, `good’ hyphens (correctly found  by  the  patterns),  and
           `bad’  hyphens  (erroneously  found  by the patterns) respectively;
           when reading a dictionary file `*’ is treated like `-’ and  `.’  is
           ignored.

       Translate file
           A  translate  file  starts  with  a  line  containing the values of
           left_hypen_min in columns 1-2, right_hyphen_min in columns 3-4, and
           either  a  blank or the replacement for one of the "hyphen" charac-
           ters `-’, `*’, and `.’ in columns 5, 6, and  7.  (Input  lines  are
           padded with blanks as for many TeX related programs.)

           Each  following  line  defines one `letter’: an arbitrary delimiter
           character in column 1, followed by one or more external representa-
           tions  of  that character (first the `lower’ case one used for out-
           put), each one terminated by the delimiter and the  whole  sequence
           terminated by another delimiter.

           If  the  translate  file  is  empty,  the  values left_hypen_min=2,
           right_hyphen_min=3, and the 26 lower case letters a...z with  their
           upper case representations A...Z are assumed.

       Terminal input
           After  reading the translate_file and any previously-generated pat-
           terns from pattern_file, patgen requests input from the user’s ter-
           minal.

           First  the integer values of hyph_start and hyph_finish, the lowest
           and highest hyphenation level for which patterns are to  be  gener-
           ated. The value of hyph_start should be larger than any hyphenation
           level already present in pattern_file.

           Then, for each hyphenation level, the integer values  of  pat_start
           and  pat_finish, the smallest and largest pattern length to be ana-
           lyzed, as well as good  weight,  bad  weight,  and  threshold,  the
           weights  for good and bad hyphens and a weight threshold for useful
           patterns.

           Finally the decision (`y’ or `Y’ vs. anything else) whether or  not
           to produce a hypenated word list.


FILES

       $TEXMFMAIN/tex/generic/hyphen/hyphen.tex
           Patterns for English.


SEE ALSO

       Frank Liang, Word hy-phen-a-tion by com-puter, STAN-CS-83-977, Stanford
       University Ph.D. thesis, 1983.
       Donald E. Knuth, The TeX  for  nroffbook,  Addison-Wesley,  1986,  ISBN
       0-201-13447-0, Appendix H.


AUTHORS

       Frank  Liang  wrote  the first version of this program.  Peter Breiten-
       lohner made a substantial revision in 1991 for TeX 3.  The  first  ver-
       sion  was  published as the appendix to the TeX for nroffware technical
       report, available from the TeX Users Group. Howard  Trickey  originally
       ported it to Unix.



Web2C 7.5.4                     23 August 2004                       PATGEN(1)

Man(1) output converted with man2html