DICTD(8)                                                              DICTD(8)




NAME

       dictd - a dictionary database server


SYNOPSIS

       dictd [options]


DESCRIPTION

       dictd  is  a  server  for  the Dictionary Server Protocol (DICT), a TCP
       transaction based query/response  protocol  that  allows  a  client  to
       access dictionary definitions from a set of natural language dictionary
       databases.

       For security reasons, dictd drops root permissions after  startup.   If
       user  dictd  exists  on  the  system, the daemon will run as that user,
       group dictd, otherwise it will run as  user  nobody,  group  nobody  or
       nogroup (depending on the operating system distribution).
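       The user and group fallback described above can be sketched in C as
       follows.  This is a minimal illustration, not dictd's actual source:
       the function names pick_user(), pick_group(), and drop_privileges()
       are hypothetical.  The group is changed before the user because, once
       the process no longer runs as root, it can no longer switch to an
       arbitrary group.  A production daemon would typically also clear
       supplementary groups with setgroups(), which is not shown here.

```c
#define _POSIX_C_SOURCE 200809L
#include <grp.h>
#include <pwd.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: run as "dictd" if that account exists, else "nobody". */
const char *pick_user(void)
{
    return getpwnam("dictd") != NULL ? "dictd" : "nobody";
}

/* Hypothetical helper: group "dictd" together with user "dictd";
   otherwise "nobody" or "nogroup", whichever this system defines. */
const char *pick_group(void)
{
    if (getpwnam("dictd") != NULL && getgrnam("dictd") != NULL)
        return "dictd";
    if (getgrnam("nobody") != NULL)
        return "nobody";
    return "nogroup";
}

/* Drop root privileges; returns 0 on success, -1 on failure. */
int drop_privileges(void)
{
    struct group *gr = getgrnam(pick_group());
    if (gr == NULL)
        return -1;
    gid_t gid = gr->gr_gid;       /* copy out of the static lookup buffer */

    struct passwd *pw = getpwnam(pick_user());
    if (pw == NULL)
        return -1;
    uid_t uid = pw->pw_uid;

    /* Group first: after setuid() the process may not change its group. */
    if (setgid(gid) != 0)
        return -1;
    if (setuid(uid) != 0)
        return -1;
    return 0;
}
```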

       Since  startup  time is significant, the server is designed to run con-
       tinuously, and should not be run from inetd(8).

       Databases are distributed separately from the server.

       dictd assumes that the index files are sorted alphabetically.  By
       default, only alphanumeric characters are used for search.  This
       default may be overridden by a header in the data file.  The only such
       headers implemented at this time are "00-database-allchars", which
       tells dictd that non-alphanumeric characters may also be used for
       search, and "00-database-utf8", which indicates that the database uses
       UTF-8 encoding.
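       Since searches compare headwords after case folding and, by default,
       after discarding characters that are neither alphanumeric nor
       whitespace (see SEARCH ALGORITHMS below), a query has to be normalized
       the same way the index was.  The following C sketch assumes
       single-byte text in the "C" locale; fold_headword() and its allchars
       flag (standing in for the "00-database-allchars" header) are
       hypothetical, and dictd's own folding also takes the --locale setting
       into account.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: copy src into dst (capacity n > 0), lowercasing
   letters and, unless allchars is set, dropping every character that is
   neither alphanumeric nor whitespace. Assumes single-byte text. */
void fold_headword(const char *src, char *dst, size_t n, int allchars)
{
    size_t j = 0;

    for (; *src != '\0' && j + 1 < n; src++) {
        unsigned char c = (unsigned char)*src;
        if (!allchars && !isalnum(c) && !isspace(c))
            continue;                      /* skip punctuation by default */
        dst[j++] = (char)tolower(c);
    }
    dst[j] = '\0';                         /* always NUL-terminate */
}
```

       For example, fold_headword("Hello, World!", buf, sizeof buf, 0)
       stores "hello world" in buf.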

       A header "00-database-plugin" may also be present and is used for
       integrating plugins into dictd.  See "dictfmt_plugin --help" and
       "dictdplugin.h" for more information.

       A header "00-database-virtual" identifies "virtual dictionaries", which
       are lists of real dictionaries to be searched by dictd.


BACKGROUND

       For many years, the Internet community has relied on the "webster" pro-
       tocol for access to natural language definitions.  The webster protocol
       supports access to a single dictionary and  (optionally)  to  a  single
       thesaurus.   In  recent years, the number of publicly available webster
       servers on the Internet has dramatically decreased.

       Fortunately, several  freely-distributable  dictionaries  and  lexicons
       have recently become available on the Internet.  However, these freely-
       distributable databases are not accessible via a uniform interface, and
       are not accessible from a single site.  They are often small and incom-
       plete individually, but would collectively provide an  interesting  and
       useful  database  of  English words.  Examples include the Jargon file,
       the WordNet database, MICRA’s version of  the  1913  Webster’s  Revised
       Unabridged  Dictionary,  and  the  Free Online Dictionary of Computing.
       (See the DICT protocol specification (RFC) for references.)   Translat-
       ing and non-English dictionaries are also becoming available (for exam-
       ple, the FOLDOC dictionary is being translated into Spanish).

       The webster protocol is not suitable for providing access  to  a  large
       number  of separate dictionary databases, and extensions to the current
       webster protocol were not felt to be a clean solution to the dictionary
       database problem.

       The  DICT protocol is designed to provide access to multiple databases.
       Word definitions can be requested,  the  word  index  can  be  searched
       (using  an  easily  extended  set of algorithms), information about the
       server can be provided (e.g., which index search  strategies  are  sup-
       ported,  or  which  databases  are  available), and information about a
       database can be provided (e.g., copyright,  citation,  or  distribution
       information).  Further, the DICT protocol has hooks that can be used to
       restrict access to some or all of the databases.

       dictd(8) is a server that implements the DICT  protocol.   Bret  Martin
       implemented  another  server,  and  several  people (including Bret and
       myself) have implemented clients in a variety of languages.


OPTIONS

       -V or --version
              Display version information.

       --license
              Display copyright and license information.

       -h or --help
              Display help information.

       -v or --verbose or  -dverbose
              Be verbose.

       -c file or --config file
              Specify configuration file.  The default is /etc/dictd.conf, but
              may be changed in the dictd.h file at compile time
              (DICT_CONFIG_FILE).

       -p port or --port port
              Specifies the port (e.g., 2628).  The default is 2628, as speci-
              fied in the DICT Protocol RFC, but may be changed in the dictd.h
              file at compile time (DICT_DEFAULT_SERVICE).

       --depth length
              Specify the queue length for listen(2).  Specifies the number of
              pending  socket  connections  which  are queued by the operating
              system.  Some operating systems may silently limit this value to
              5 (older BSD systems) or 128 (Linux).  The default is 10 but may
              be   changed   in   the   dictd.h   file   at    compile    time
              (DICT_QUEUE_DEPTH).

       --delay seconds
              Specifies  the number of seconds a client may be idle before the
              server will close the connection.  Idle time is  defined  to  be
              the  time  the  server is waiting for input and does not include
              the time the server spends searching the database.   Connections
              are closed without warning since no provision for premature con-
              nection termination is specified in the DICT protocol RFC.   The
              default  is  600 seconds (10 minutes), but may be changed in the
              dictd.h file at compile time (DICT_DEFAULT_DELAY).
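              A minimal sketch of such an idle timeout for --delay, assuming
              the server waits for input with poll(2) before each read so
              that time spent searching is not counted.  wait_for_input() and
              read_with_timeout() are hypothetical names, not dictd functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/* Wait up to delay_seconds for fd to become readable. Returns 1 if input
   (or end of file) is pending, 0 if the client stayed idle, -1 on error. */
int wait_for_input(int fd, int delay_seconds)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };

    for (;;) {
        int rc = poll(&pfd, 1, delay_seconds * 1000);
        if (rc < 0 && errno == EINTR)
            continue;                      /* interrupted by a signal: wait again */
        return rc < 0 ? -1 : rc > 0;
    }
}

/* Read from the client, giving up if it stays idle for delay_seconds.
   Returns bytes read, 0 at end of file, -1 on error, -2 on idle timeout
   (the caller would then close the connection without a response). */
ssize_t read_with_timeout(int fd, char *buf, size_t len, int delay_seconds)
{
    int ready = wait_for_input(fd, delay_seconds);

    if (ready == 0)
        return -2;
    if (ready < 0)
        return -1;
    return read(fd, buf, len);
}
```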

       --facility facility
              Specifies the syslog facility to use.  The use  of  this  option
              implies  the  -s option to turn on logging via syslog.  When the
              operating system libraries support SYSLOG_NAMES, the names  used
              for  this option should be those listed in syslog.conf(5).  Oth-
              erwise, the following names are used  (assuming  the  particular
              facility  is defined in the header files): auth, authpriv, cron,
              daemon, ftp, kern, lpr, mail, news, syslog, user, uucp,  local0,
              local1, local2, local3, local4, local5, local6, and local7.
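              Without SYSLOG_NAMES, the --facility name lookup could look like
              the following C sketch.  The table and facility_from_name() are
              hypothetical; LOG_AUTHPRIV, LOG_FTP, and LOG_SYSLOG are guarded
              with #ifdef because POSIX does not require them, mirroring the
              "assuming the particular facility is defined" caveat above.  The
              resulting value would then be passed as the facility argument of
              openlog(3).

```c
#include <stddef.h>
#include <string.h>
#include <syslog.h>

/* Hypothetical fallback table used when SYSLOG_NAMES is unavailable.
   Facilities not defined by the system headers are simply omitted. */
static const struct { const char *name; int code; } facilities[] = {
    { "auth", LOG_AUTH },
#ifdef LOG_AUTHPRIV
    { "authpriv", LOG_AUTHPRIV },
#endif
    { "cron", LOG_CRON },     { "daemon", LOG_DAEMON },
#ifdef LOG_FTP
    { "ftp", LOG_FTP },
#endif
    { "kern", LOG_KERN },     { "lpr", LOG_LPR },
    { "mail", LOG_MAIL },     { "news", LOG_NEWS },
#ifdef LOG_SYSLOG
    { "syslog", LOG_SYSLOG },
#endif
    { "user", LOG_USER },     { "uucp", LOG_UUCP },
    { "local0", LOG_LOCAL0 }, { "local1", LOG_LOCAL1 },
    { "local2", LOG_LOCAL2 }, { "local3", LOG_LOCAL3 },
    { "local4", LOG_LOCAL4 }, { "local5", LOG_LOCAL5 },
    { "local6", LOG_LOCAL6 }, { "local7", LOG_LOCAL7 },
};

/* Return the facility code for name, or -1 if it is unknown. */
int facility_from_name(const char *name)
{
    for (size_t i = 0; i < sizeof facilities / sizeof facilities[0]; i++)
        if (strcmp(facilities[i].name, name) == 0)
            return facilities[i].code;
    return -1;
}
```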

       -f or --force
              Force the daemon to start even if an instance of the daemon is
              already running.  (This is of little value unless a non-default
              port is specified with -p: if one instance is already bound to a
              port, a second instance fails because it cannot bind to the same
              port.)

       --limit children
              Specifies the number of daemons that may be  running  simultane-
              ously.   Each daemon services a single connection.  If the limit
              is exceeded, a (serialized)  connection  will  be  made  by  the
              server  process,  and  a  response  code 420 (server temporarily
              unavailable) will be sent to the client.  This parameter  should
              be  adjusted to prevent the server machine from being overloaded
              by dict clients, but should not be set so low that many  clients
              are  denied  useful  connections. The default is 100, but may be
              changed in the dictd.h file at compile time (DICT_DAEMON_LIMIT).

       --locale locale
              Specifies the locale used for searching.  If no locale is speci-
              fied, the "C" locale is used.  The locale used  for  the  server
              should  be  the  same as that used for dictfmt when the database
              was built (specifically, the locale under which  the  index  was
              sorted).  The locale should be specified for both 8-bit and
              UTF-8 formats.  If the locale name contains the substring utf8
              or utf-8, UTF-8 format is expected.  Note that if your database
              is not in 7-bit ASCII or UTF-8 format, the dictd server will not
              be compliant with RFC 2229.
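              The substring test for --locale can be sketched as follows.
              Comparing case-insensitively is an assumption of this sketch
              (locale names are commonly written as, e.g., en_US.UTF-8), and
              locale_is_utf8() is a hypothetical name.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: nonzero if the locale name contains "utf8" or
   "utf-8", ignoring case (an assumption of this sketch). */
int locale_is_utf8(const char *locale)
{
    char lower[64];
    size_t i;

    if (locale == NULL)
        return 0;
    for (i = 0; locale[i] != '\0' && i + 1 < sizeof lower; i++)
        lower[i] = (char)tolower((unsigned char)locale[i]);
    lower[i] = '\0';                       /* longer names are truncated */
    return strstr(lower, "utf8") != NULL || strstr(lower, "utf-8") != NULL;
}
```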

       -s     Log using the syslog(3) facility.

       -L file or --logfile file
              Specify  the file for logging.  The filename specified is recom-
              puted on each use using the strftime(3) call.   For  example,  a
              filename  ending  in ".%Y%m%d" will write to log files ending in
              the year, month, and date that the log entry was written.  NOTE:
              If  dictd  does not have write permission for this file, it will
              silently fail.
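              The per-use recomputation of the --logfile name can be sketched
              with strftime(3) as follows; expand_logfile() and
              current_logfile() are hypothetical helpers, not part of dictd.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: expand a --logfile pattern for the given time.
   Returns 0 on success, -1 if the result is empty or does not fit. */
int expand_logfile(const char *pattern, const struct tm *when,
                   char *out, size_t outlen)
{
    return strftime(out, outlen, pattern, when) == 0 ? -1 : 0;
}

/* Recompute the file name for the current local time, as would be done
   before each log write. */
int current_logfile(const char *pattern, char *out, size_t outlen)
{
    time_t now = time(NULL);
    const struct tm *when = localtime(&now);

    return when == NULL ? -1 : expand_logfile(pattern, when, out, outlen);
}
```

              With the pattern "/var/log/dictd.log.%Y%m%d", an entry written
              on 31 January 2024 (local time) would go to
              /var/log/dictd.log.20240131.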

       -m minutes  or --mark minutes
              How often a timestamp should be logged.  (This is effective only
              if  logging has been enabled with the -s or -L option, or with a
              debugging option.)

       --default-strategy strategy
              Set the default strategy for MATCH search type. The  default  is
              ’lev’.

       --without-strategy strat1,strat2,...
              Disable  specified strategies.  By default all search strategies
              are enabled.

       --add-strategy strat:descr
              Adds strategy ’strat’ with the description ’descr’.  A new
              search strategy may be implemented with the help of plugins.

       --no-mmap
              Do not use the mmap() function; read entire files into memory
              instead.

       --test word  or -t word
              self test -- look up word

       --test-file file or --ftest file
              self test -- look up all words in file

       --test-strategy strategy
              self test -- set search strategy for --test  and  --ftest.   The
              default is ’exact’.

       --test-db database
              self  test -- set dictionary to be searched. The default is ’*’.

       --test-match
              self test -- set search type to MATCH. The default is DEFINE.


       -l option or --log option
              Specify a logging option.  This is effective only if logging has
              been enabled with the -s or -L option, or logging to the console
              has been activated with a debugging option (e.g., --debug
              nodetach).  Only one option may be set with each invocation of
              this option; however, multiple invocations of this option may be
              made in one dictd command line.  For instance:
              dictd -s --log stats --log found --log notfound
              is a valid command line, and sets three logging options.

              Some  of the more verbose logging options are used primarily for
              debugging the server code, and are not practical for normal use.

              server Log server diagnostics.  This is extremely verbose.

              connect
                     Log all connections.

              stats  Log all child process terminations.

              command
                     Log all commands.  This is extremely verbose.

              client Log results of CLIENT command.

              found  Log all words found in the databases.

              notfound
                     Log all words not found in the databases.

              timestamp
                     When  logging  to  a file, use a full timestamp like that
                     which syslog would produce.  Otherwise, no  timestamp  is
                     made, making the files shorter.

              host   Log name of foreign host.

              auth   Log authentication failures.

              min    Set a minimal number of options.  If logging is activated
                     (to a file, or via syslog), and no options are set,  then
                     the  minimal set of options will be used.  If options are
                     set, then only those options specified will be used.

              all    Set all of the options.

              none   Clear all of the options.

              To facilitate location of interesting  information  in  the  log
              file,  entries  are  marked  with initial letters indicating the
              class of the line being logged:

              I      Information about the server, connections, or termination
                     statistics.  These lines are generally not designed to be
                     parsed automatically.

              E      Error messages.

              C      CLIENT command information.

              D      Definitions found in the databases searched.

              M      Matches found in the database searched.

              N      Words which were not found in the databases searched.

              T      Trace of exact line sent by client.

              A      Authentication information.

              To preserve anonymity of the client, do not use the  connect  or
              host  options.   Clients  may  or  may not send host information
              using the CLIENT command, but this should be an option  that  is
              selectable on the client side.
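              Because each --log invocation adds one option, the options
              naturally form a bit mask that is updated once per argument.
              The sketch below assumes that representation; the LOG_OPT_*
              constants and apply_log_option() are hypothetical, and "min" is
              left out because its exact set of options is not documented
              here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical bit assignments; dictd's internal representation may differ. */
enum {
    LOG_OPT_SERVER    = 1 << 0,
    LOG_OPT_CONNECT   = 1 << 1,
    LOG_OPT_STATS     = 1 << 2,
    LOG_OPT_COMMAND   = 1 << 3,
    LOG_OPT_CLIENT    = 1 << 4,
    LOG_OPT_FOUND     = 1 << 5,
    LOG_OPT_NOTFOUND  = 1 << 6,
    LOG_OPT_TIMESTAMP = 1 << 7,
    LOG_OPT_HOST      = 1 << 8,
    LOG_OPT_AUTH      = 1 << 9,
    LOG_OPT_ALL       = (1 << 10) - 1
};

static const struct { const char *name; int bit; } log_names[] = {
    { "server", LOG_OPT_SERVER },     { "connect", LOG_OPT_CONNECT },
    { "stats", LOG_OPT_STATS },       { "command", LOG_OPT_COMMAND },
    { "client", LOG_OPT_CLIENT },     { "found", LOG_OPT_FOUND },
    { "notfound", LOG_OPT_NOTFOUND }, { "timestamp", LOG_OPT_TIMESTAMP },
    { "host", LOG_OPT_HOST },         { "auth", LOG_OPT_AUTH },
};

/* Apply one --log argument to mask and return the new mask,
   or -1 if the name is not recognized. */
int apply_log_option(int mask, const char *name)
{
    if (strcmp(name, "all") == 0)
        return LOG_OPT_ALL;
    if (strcmp(name, "none") == 0)
        return 0;
    for (size_t i = 0; i < sizeof log_names / sizeof log_names[0]; i++)
        if (strcmp(log_names[i].name, name) == 0)
            return mask | log_names[i].bit;
    return -1;
}
```

              For the command line shown above, the mask would be built by
              three calls starting from 0, one each for stats, found, and
              notfound.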

       -d option
              Activate  a  debugging  option.  There are several, all of which
              are only useful to developers.  They  are  documented  here  for
              completeness.   A list can be obtained interactively by using -d
              with an illegal option.

              verbose
                     The same as -v or --verbose.   Adds  verbosity  to  other
                     options.

              scan   Debug the scanner for the configuration file.

              parse  Debug the parser for the configuration file.

              search Debug the character folding and binary search routines.

              init   Report database initialization.

              port   Log client-side port number to the log file.

              lev    Debug Levenshtein search algorithm.

              auth   Debug the authorization routines.

              nodetach
                     Do  not  detach  as a background process.  Implies that a
                     copy of the log file will appear on the standard  output.

              nofork Do  not  fork  daemons to service requests.  Be a single-
                     threaded server.  This option implies  nodetach,  and  is
                     most  useful  for  using  a debugger to find the point at
                     which daemon processes are dumping core.

              alt    Debugs altcompare in index.c.


CONFIGURATION FILE

       Introduction
              The configuration file defaults to /etc/dictd.conf, but can be
              specified on the command line with the -c option (see above).
              The file is divided into four sections: the Site Section, the
              Access Section, the Database Section, and the User Section.  At
              this time, the sections that are present must appear in this
              order.  Only the Database Section is required.

       Syntax The following keywords are valid in a configuration file:
              access, allow, deny, group, database, data, index, filter,
              prefilter, postfilter, name, include, user, authonly, site.
              Keywords are case sensitive.  String arguments that contain
              spaces should be surrounded by double quotes.  Without quoting,
              strings may contain alphanumeric characters and _, -, ., and *,
              but not spaces.  Strings can be continued between lines.  The
              escape sequences \", \\, \n, and \<NL> (a backslash followed by
              a newline) are treated as a double quote, a backslash, a
              newline, and nothing (a line continuation), respectively.
              Comments start with # and extend to the end of the line.
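              The escape rules for quoted strings can be sketched as a small
              decoder for the text between the quotes.
              unescape_config_string() is hypothetical, and copying
              unrecognized escapes unchanged is an assumption, since the
              behavior for other escapes is not specified.

```c
#include <string.h>

/* Hypothetical helper: decode the body of a quoted configuration string
   (without its surrounding quotes) into out, which must hold at least
   strlen(in) + 1 bytes; the result is never longer than the input. */
void unescape_config_string(const char *in, char *out)
{
    while (*in != '\0') {
        if (in[0] != '\\' || in[1] == '\0') {
            *out++ = *in++;               /* ordinary character */
            continue;
        }
        switch (in[1]) {
        case '"':  *out++ = '"';  break;
        case '\\': *out++ = '\\'; break;
        case 'n':  *out++ = '\n'; break;
        case '\n': break;                 /* backslash-newline: continuation */
        default:   *out++ = in[0]; *out++ = in[1]; break;  /* assumption */
        }
        in += 2;
    }
    *out = '\0';
}
```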

       Site Section

              site string
                     Used  to  specify  the  filename for the site information
                     file, a  flat  text  file  which  will  be  displayed  in
                     response  to  the  SHOW SERVER command.  This section, if
                     present, must be first.

       Access Section

              access { access specification }
                     This section, the second if the Site Section is  present,
                     contains  access  restrictions  for the server and all of
                     the  databases  collectively.   Per-database  control  is
                     specified in the Database Section.

       Database Section

              database string { database specification }
                     The  string  specifies the name of the database (e.g., wn
                     or web1913).  (This is an arbitrary name selected by  the
                     administrator, and is not necessarily related to the file
                     name or any name listed in the data file.  A short,  easy
                     to  type  name  is  often selected for easy use with dict
                     -d.)

                     NOTE: If the files specified in the  database  specifica-
                     tion do not exist on the system, dictd may silently fail.

              database_virtual string { virtual database specification }
                     This section specifies the virtual database.  The  string
                     specifies the name of the database (e.g., en-ru or fren).

              database_plugin string { plugin specification }
                     This section specifies the plugin.  The string  specifies
                     the name of the database.

              database_exit
                     Excludes all databases that follow from the ’*’ database.
                     By default, ’*’ means all available databases.  See the
                     file ’example_virtual.conf’ for an example configuration.

                     NOTE: If you use ’virtual’ dictionaries, you should use
                     this directive; otherwise, the same dictionary will be
                     searched twice.

       User Section

              user string string
                     The  first  string specifies the username, and the second
                     string specifies the shared  secret  for  this  username.
                     When  the  AUTH  command is used, the client will provide
                     the username and a hashed version of the  shared  secret.
                     If  the  shared  secret matches, the user is said to have
                     authenticated, and will have access  to  databases  whose
                     access  specifications  allow  that  user (by name, or by
                     wildcard).  If present, this section must appear last  in
                     the  configuration file.  There may be many user entries.
                     The shared secret should be kept secret,  as  anyone  who
                     has  access to it can access the shared databases (assum-
                     ing access is not denied by domain name).

       Access Specification
              Access specifications may occur in the Access Section or in the
              Database Section.  The access specification is described here.

              For allow, deny, and authonly, a star (*) may be used as a
              wildcard that matches any number of characters.  A question mark
              (?) may be used as a wildcard that matches a single character.
              For example, 10.0.0.* and *.edu are valid strings.
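              The wildcard semantics can be sketched with a short recursive
              matcher.  wildcard_match() is a hypothetical name, and its
              case-sensitive comparison is an assumption.

```c
/* Hypothetical helper: nonzero if text matches pattern, where '*' matches
   any run of characters (including none) and '?' matches exactly one. */
int wildcard_match(const char *pattern, const char *text)
{
    if (*pattern == '\0')
        return *text == '\0';
    if (*pattern == '*')                  /* try matching zero chars, then more */
        return wildcard_match(pattern + 1, text) ||
               (*text != '\0' && wildcard_match(pattern, text + 1));
    if (*text != '\0' && (*pattern == '?' || *pattern == *text))
        return wildcard_match(pattern + 1, text + 1);
    return 0;
}
```

              POSIX fnmatch(3) provides similar matching, but it also
              interprets bracket expressions such as [a-z], which are not
              described above.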

              Further, a range of IP addresses and an IP address followed by a
              netmask may be  specified.   For  example,  10.0.0.0:10.0.0.255,
              10.0.0.0/24,  and 10.0.0.* all specify the same range of IP num-
              bers.  Notation cannot be combined on the  same  line.   If  the
              notation  does not make sense, access will be denied by default.
              Use the --debug auth option to debug related problems.
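              A sketch of the two range notations for IPv4, comparing
              addresses in host byte order after inet_pton(3).  ip_in_range()
              is hypothetical; it returns -1 for notation it cannot parse, so
              that a caller can deny access in that case as described above.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>

/* Parse a dotted-quad IPv4 address into host byte order. */
static int parse_ipv4(const char *s, uint32_t *out)
{
    struct in_addr a;

    if (inet_pton(AF_INET, s, &a) != 1)
        return -1;
    *out = ntohl(a.s_addr);
    return 0;
}

/* Hypothetical helper: 1 if ip lies in spec ("first:last" or
   "network/bits"), 0 if it does not, -1 if either cannot be parsed. */
int ip_in_range(const char *spec, const char *ip)
{
    char buf[64], *sep, *end;
    uint32_t addr, lo, hi, mask;
    long bits;

    if (strlen(spec) >= sizeof buf || parse_ipv4(ip, &addr) != 0)
        return -1;
    strcpy(buf, spec);

    if ((sep = strchr(buf, ':')) != NULL) {       /* 10.0.0.0:10.0.0.255 */
        *sep = '\0';
        if (parse_ipv4(buf, &lo) != 0 || parse_ipv4(sep + 1, &hi) != 0)
            return -1;
        return addr >= lo && addr <= hi;
    }
    if ((sep = strchr(buf, '/')) != NULL) {       /* 10.0.0.0/24 */
        *sep = '\0';
        bits = strtol(sep + 1, &end, 10);
        if (end == sep + 1 || *end != '\0' || bits < 0 || bits > 32 ||
            parse_ipv4(buf, &lo) != 0)
            return -1;
        mask = bits == 0 ? 0 : (uint32_t)(0xFFFFFFFFu << (32 - bits));
        return (addr & mask) == (lo & mask);
    }
    return -1;                                    /* not a range notation */
}
```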

              Note that these specifications take only one string per specifi-
              cation line.  However, you can have multiple lines of each type.

              The syntax is as follows:

              allow string
                     The string specifies a domain name or IP address which is
                     allowed  access  to the server (in the Access Section) or
                     to a database (in the Database Section).  Note that  more
                     than one string is not permitted for a single "allow"
                     line, but more than one "allow" line is permitted in the
                     configuration file.

              deny string
                     The string specifies a domain name or IP address which is
                     denied access to the server (in the Access Section) or to
                     a  database  (in  the  Database  Section).   Note that if
                     reverse DNS is not working, then only the IP number  will
                     be  checked.  Therefore, it is essential to deny networks
                     based on IP number, since a denial based on  domain  name
                     may not always be checked.

              authonly string
                     This  form  is  only  useful  in the Access Section.  The
                     string specifies a domain name or  IP  address  which  is
                     allowed  access  to  the  server  but  not  to any of the
                     databases.  All commands are valid except DEFINE,  MATCH,
                     and  SHOW DB.  More specifically AUTH is a valid command,
                     and commands which access the databases are not  allowed.

              user string
                     This  form  is  only useful in the Database Section.  The
                     string specifies a username that  is  allowed  to  access
                     this  database  after  a  successful AUTH command is exe-
                     cuted.

       Database Specification
              The database specification describes the database:

              data string
                     Specifies the filename for the flat text database.  If
                     the filename does not begin with ’.’ or ’/’, it is
                     prepended with $datadir/, where $datadir is set at
                     compile time.  You can change it by editing the Makefile
                     or by running ./configure --datadir=...

              index string
                     Specifies the filename for the index file.  The filename
                     is handled as described for the "data" option above.

              index_suffix string
                     An optional index file that makes the ’suffix’ search
                     strategy faster (by allowing a binary search).  It is
                     generated by ’dictfmt_index2suffix’; run
                     "dictfmt_index2suffix --help" for more information.  The
                     filename is handled as described for the "data" option
                     above.

              index_word string
                     An optional index file that makes the ’word’ search
                     strategy faster (by allowing a binary search).  It is
                     generated by ’dictfmt_index2word’; run
                     "dictfmt_index2word --help" for more information.  The
                     filename is handled as described for the "data" option
                     above.

              prefilter string
                     Specifies  the   prefilter command.  When  a chunk of the
                     compressed database is  read, it will be  filtered   with
                     this filter before being decompressed.  This may be  used
                     to provide  some additional compression  that knows about
                     the data and can provide better compression than the LZ77
                     algorithm used by zlib.

              postfilter string
                     Specifies the postfilter command.  When a  chunk  of  the
                     compressed  database  is  read,  it will be filtered with
                     this filter before the offset and length  for  the  entry
                     are  used  to access data.  This is provided for symmetry
                     with the prefilter command, and may also  be  useful  for
                     providing additional database compression.

              filter string
                     Specifies   the  filter  command.   After  the  entry  is
                     extracted from the database, it  will  be  filtered  with
                     this  filter.  This may be used to provide formatting for
                     the entry (e.g., for html).  Warning: This  is  not  cur-
                     rently implemented.

              name string
                     Specifies  the  short  name  of the database (e.g., "1913
                     Webster’s").  If the string begins with @, then it speci-
                     fies  the  headword  to look up in the dictionary to find
                     the  short  name  of  the  database.   The   default   is
                     "@00-database-short",  but  this  may  be  changed in the
                     dictd.h file at compile time (DICT_SHORT_ENTRY_NAME).

              info string
                     Specifies the information about database.  If the  string
                     begins  with @, then it specifies the headword to look up
                     in the dictionary to find information.   The  default  is
                     "@00-database-info",  but  this  may  be  changed  in the
                     dictd.h file at compile time (DICT_INFO_ENTRY_NAME).

              invisible
                     Makes the dictionary invisible to clients, i.e., this
                     dictionary will not be recognized or shown by the DEFINE,
                     MATCH, SHOW INFO, SHOW SERVER, and SHOW DB commands.  If
                     definitions or matches are found in an invisible
                     dictionary, the name of the enclosing visible virtual
                     dictionary, or ’*’, is returned.  NOTE: There is no point
                     in making a dictionary invisible unless it is included in
                     a virtual dictionary.

       Virtual Database Specification
              The   virtual   database  specification  describes  the  virtual
              database:

              database_list string
                     Specifies a list of databases which are included in the
                     virtual database.  Database names are given in the
                     string, separated by commas.

              name string
                     Specifies the short name of the database.  A string
                     beginning with ’@’ is not treated as a headword to look
                     up.

              info string
                     Specifies the information about the database.  A string
                     beginning with ’@’ is not treated as a headword to look
                     up.

              invisible
                     Makes the dictionary invisible to clients.  See Database
                     Specification.

              NOTE:  Another way to implement a virtual database is to create
                     database files with the dictfmt_virtual program.

       Plugin Specification

              plugin string
                     Specifies the filename of the plugin.

              data string
                     Specifies data for initializing the plugin.

              name string
                     Specifies the short name of the database.  See Database
                     Specification.

              info string
                     Specifies the information about the database.  See
                     Database Specification.

              invisible
                     Makes the dictionary invisible to clients.  See Database
                     Specification.

              NOTE:  Another way to configure a plugin is to create database
                     files with the dictfmt_plugin program.

       include string
              The text of the file "string" (usually a database specification)
              will  be read as if it appeared at this location in the configu-
              ration file.  Nested includes are not permitted.



DETERMINATION OF ACCESS LEVEL

       When a client connects, the global access specification is scanned, in
       order, until a specification matches.  If no access specification
       exists, all access is allowed (i.e., the action is the same as if
       "allow *" was the only item in the specification).  For each item, both
       the hostname and IP are checked.  For example, consider the following
       access specification:
              allow 10.42.*
              authonly *.edu
              deny *
       With  this  specification,  all  clients  in  the 10.42 network will be
       allowed access to unrestricted databases; all clients from *.edu  sites
       will  be  allowed  to  authenticate,  but  will be denied access to all
       databases, even those which are otherwise unrestricted; and  all  other
       clients  will  have their connection terminated immediately.  The 10.42
       network clients can send an AUTH command and gain access to  restricted
       databases.   The *.edu clients must send an AUTH command to gain access
       to any databases, restricted or unrestricted.

       When the AUTH command is sent, the access list  for  each  database  is
       scanned, in order, just as the global access list is scanned.  However,
       after authentication, the client has an associated username.  For exam-
       ple, consider the following access specification:
              user u1
              deny *.com
              user u2
              allow *
       If  the client authenticated as u1, then the client will have access to
       this database, even if the client comes from a  *.com  site.   In  con-
       trast,  if  the  client  authenticated as u2, the client will only have
       access if it does not come from a *.com site.  In this case, the  "user
       u2" is redundant, since that client would also match "allow *".
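       The first-match scan in both examples can be modelled with a short C
       sketch.  The entry types and check_access() are hypothetical.  Host
       patterns are matched with POSIX fnmatch(3) against both the hostname
       (when reverse DNS succeeded) and the IP address; user entries are
       compared by exact name only; and treating a non-empty list with no
       matching entry as a denial is an assumption, since the text above does
       not say what happens in that case.

```c
#define _POSIX_C_SOURCE 200809L
#include <fnmatch.h>
#include <stddef.h>
#include <string.h>

enum access_kind { ACC_ALLOW, ACC_DENY, ACC_AUTHONLY, ACC_USER };
enum access_result { RESULT_ALLOW, RESULT_DENY, RESULT_AUTHONLY };

struct access_entry {
    enum access_kind kind;
    const char *pattern;      /* host/IP wildcard, or user name for ACC_USER */
};

/* host may be NULL when reverse DNS fails; user is NULL before AUTH. */
enum access_result check_access(const struct access_entry *list, size_t n,
                                const char *host, const char *ip,
                                const char *user)
{
    if (n == 0)
        return RESULT_ALLOW;                  /* no list: same as "allow *" */
    for (size_t i = 0; i < n; i++) {
        const struct access_entry *e = &list[i];
        int hit;

        if (e->kind == ACC_USER)
            hit = user != NULL && strcmp(e->pattern, user) == 0;
        else
            hit = (host != NULL && fnmatch(e->pattern, host, 0) == 0) ||
                  fnmatch(e->pattern, ip, 0) == 0;
        if (!hit)
            continue;                         /* keep scanning in order */
        switch (e->kind) {
        case ACC_DENY:     return RESULT_DENY;
        case ACC_AUTHONLY: return RESULT_AUTHONLY;
        default:           return RESULT_ALLOW;   /* allow or user */
        }
    }
    return RESULT_DENY;                       /* assumption: no match denies */
}
```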

       Warning:  Checks  are  performed for domain names and for IP addresses.
       However, if reverse DNS for a specific site is not working, it is  pos-
       sible  that a domain name may not be available for checking.  Make sure
       that all denials use IP addresses.  (And consider a future enhancement:
       if  a  domain  name  is  not available, should denials that depend on a
       domain name match anything?  This is the more  conservative  viewpoint,
       but it is not currently implemented.)


SEARCH ALGORITHMS

       The DICT standard specifies a few search algorithms that must be imple-
       mented, and permits others to be supported on a server-dependent basis.
       The  following  search  strategies  are supported by this server.  Note
       that all strategies are case  insensitive.   Most  ignore  non-alphanu-
       meric, non-whitespace characters.

       exact  An  exact match.  This algorithm uses a binary search and is one
              of the fastest search algorithms available.

       lev    The Levenshtein algorithm (string edit distance of one).  This
              algorithm searches for all words which are within an edit
              distance of one from the target word.  An "edit" means
              inserting, deleting, or substituting a single character, or
              transposing two adjacent characters.  This is a rapid algorithm
              for correcting spelling errors, since many spelling errors are
              within a Levenshtein distance of one from the original word.
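              A minimal Python sketch of the "within one edit" test, counting
              insertions, deletions, substitutions, and adjacent transpositions
              as described above.  The names within_one_edit and lev_search are
              hypothetical, and this pairwise comparison over a word list is not
              necessarily how dictd implements the strategy internally.

```python
def within_one_edit(a, b):
    """True if b differs from a by at most one insertion, deletion,
    substitution, or transposition of two adjacent characters."""
    if a == b:
        return True
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        diffs = [i for i in range(len(a)) if a[i] != b[i]]
        if len(diffs) == 1:
            return True  # one substitution
        if len(diffs) == 2:
            i, j = diffs
            # adjacent transposition, e.g. "teh" -> "the"
            return j == i + 1 and a[i] == b[j] and a[j] == b[i]
        return False
    if len(a) > len(b):
        a, b = b, a  # make a the shorter string
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    # Skipping b[i] (a single insertion into a) must make the rest identical.
    return a[i:] == b[i + 1:]


def lev_search(headwords, target):
    # Case-insensitive, like the other strategies.
    t = target.lower()
    return [w for w in headwords if within_one_edit(t, w.lower())]


print(lev_search(["the", "ten", "tea", "bath"], "teh"))  # ['the', 'ten', 'tea']
```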

       prefix Prefix  match.   This algorithm also uses a binary search and is
              very fast.
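              Both exact and prefix matching rely on the index being sorted.
              The Python sketch below shows the idea on an in-memory list
              rather than dictd's on-disk index; the normalize step (case
              folding and dropping characters that are neither alphanumeric nor
              whitespace) follows the default described above, and all names
              are hypothetical.

```python
import bisect


def normalize(word):
    # Case-fold and keep only alphanumeric and whitespace characters.
    return "".join(c for c in word.lower() if c.isalnum() or c.isspace())


def build_index(headwords):
    # Sort by normalized key, keeping the original headword alongside.
    pairs = sorted((normalize(h), h) for h in headwords)
    return [k for k, _ in pairs], [h for _, h in pairs]


def exact(keys, words, target):
    t = normalize(target)
    # Binary search for the contiguous run of keys equal to t.
    lo = bisect.bisect_left(keys, t)
    hi = bisect.bisect_right(keys, t)
    return words[lo:hi]


def prefix(keys, words, p):
    p = normalize(p)
    # Keys starting with p form a contiguous run beginning at the first key >= p.
    lo = bisect.bisect_left(keys, p)
    hi = lo
    while hi < len(keys) and keys[hi].startswith(p):
        hi += 1
    return words[lo:hi]


keys, words = build_index(["Apple", "apply", "applet", "banana", "App-le"])
print(exact(keys, words, "APPLE"))   # ['App-le', 'Apple']
print(prefix(keys, words, "appl"))   # ['App-le', 'Apple', 'applet', 'apply']
```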

       re     POSIX 1003.2 (modern) regular expression search.  Modern regular
              expressions are the ones used by egrep(1).  These regular
              expressions support predefined character classes (e.g.,
              [[:alnum:]], [[:alpha:]], [[:digit:]], and [[:xdigit:]] are
              useful for this application); * to match zero or more
              occurrences of the previous atom; + to match one or more
              occurrences of the previous atom; ? to match zero or one
              occurrence of the previous atom; ^ to match the beginning of a
              word and $ to match the end of a word; and nested subexpressions
              and alternation with () and |.  For example, "(foo|bar)" matches
              all words that contain either "foo" or "bar".  To match these
              special characters literally, quote them with two backslashes
              (due to the quoting characteristics of the server).  Warning:
              Regular expression matches can take 10 to 300 times longer than
              substring matches.  On a busy server with many databases, this
              can require more than 5 minutes of waiting time, depending on
              the complexity of the regular expression.
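              Unlike exact and prefix matching, a regular expression cannot use
              the sorted index, so every headword must be tested.  This Python
              sketch uses the re module as a stand-in for POSIX extended
              regular expressions; the two dialects overlap but are not
              identical (Python's re has no [[:alpha:]]-style classes, for
              example).  The name re_search is hypothetical.

```python
import re


def re_search(headwords, pattern):
    # Compile once, then test every headword: a linear scan with no index help.
    compiled = re.compile(pattern, re.IGNORECASE)
    return [w for w in headwords if compiled.search(w)]


print(re_search(["football", "crowbar", "baz", "Foobar"], "(foo|bar)"))
# ['football', 'crowbar', 'Foobar']
```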

       regexp Old (basic)  regular  expressions.   These  regular  expressions
              don’t  support  |,  +,  or  ?.   Groups use escaped parentheses.
              While modern regular expressions are generally  easier  to  use,
              basic  regular  expressions have a back reference feature.  This
              can be used to match a second occurrence of something  that  was
              already  matched.   For  example, the following expression finds
              all words that begin and end with the same three letters:
                  ^\\(...\\).*\\1$

              Note the use of the double backslashes to escape the special
              characters.  This is required by the DICT protocol string
              specification: a single backslash quotes the next character, so
              two are needed to pass a single backslash through to the regular
              expression engine.  Warning: Back references are even slower
              than general regular expression matching.
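              The doubling can be done mechanically by a client.  The Python
              sketch below builds a MATCH command (RFC 2229 form "MATCH
              database strategy word") and escapes the word by doubling each
              backslash and escaping double quotes inside a double-quoted
              string.  The helper names dict_quote and match_command are
              hypothetical; "*" is used as the database name to search all
              databases.

```python
def dict_quote(text):
    # A backslash quotes the next character, so escape backslashes first,
    # then double quotes, and wrap the result in double quotes.
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def match_command(database, strategy, word):
    return "MATCH %s %s %s" % (database, strategy, dict_quote(word))


# The back-reference example above, written as the regex engine should see it:
print(match_command("*", "regexp", r"^\(...\).*\1$"))
# MATCH * regexp "^\\(...\\).*\\1$"
```

              The printed command contains exactly the doubled-backslash form
              shown in the example above.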

       soundex
              The Soundex algorithm, a classic algorithm for finding words
              that sound similar to each other.  The algorithm encodes each
              word using the first letter of the word and up to three digits.
              Since the first letter is known, this search is relatively fast,
              and it is sometimes good for correcting spelling errors when the
              Levenshtein algorithm doesn’t help.
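              A Python sketch of the classic American Soundex rules: keep the
              first letter, map consonants to digits, collapse adjacent equal
              codes (h and w do not separate them, vowels do), and pad to three
              digits with zeros.  dictd's own encoder may differ in details;
              soundex and soundex_search are hypothetical names.

```python
_GROUPS = ("bfpv", "cgjkqsxz", "dt", "l", "mn", "r")
CODES = {ch: str(d) for d, letters in enumerate(_GROUPS, start=1) for ch in letters}


def soundex(word):
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for c in letters[1:]:
        code = CODES.get(c, "")
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        # h and w do not reset the previous code; vowels do.
        if c not in "hw":
            prev = code
    return result.ljust(4, "0")


def soundex_search(headwords, target):
    code = soundex(target)
    return [w for w in headwords if soundex(w) == code]


print(soundex("Robert"), soundex("Ashcraft"))  # R163 A261
```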

       substring
              Match a substring anywhere in the headword.  This search  strat-
              egy  uses  a  modified Boyer-Moore-Horspool algorithm.  Since it
              must search the whole index file, it is not as fast as the exact
              and prefix matches.
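              For reference, a plain Boyer-Moore-Horspool search in Python.
              dictd uses a modified variant whose changes are not described
              here, so this is the textbook algorithm only; horspool_find and
              substring_search are hypothetical names.

```python
def horspool_find(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    # Shift table: distance from each character's last occurrence (excluding
    # the final position) to the end of the pattern.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    pos = 0
    while pos <= n - m:
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1  # compare right to left
        if j < 0:
            return pos
        # Shift by the text character aligned with the pattern's last position.
        pos += shift.get(text[pos + m - 1], m)
    return -1


def substring_search(headwords, pattern):
    # Every headword is examined, which is why this is slower than exact/prefix.
    p = pattern.lower()
    return [w for w in headwords if horspool_find(w.lower(), p) >= 0]


print(horspool_find("dictionary", "tion"))  # 3
```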

       suffix Suffix match.  This search strategy also uses a modified
              Boyer-Moore-Horspool algorithm, and is as fast as the substring
              search.  If an optional suffix index file is named with
              index_suffix in the configuration file, this search is much
              faster.
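              One way a suffix index can speed this up is to store every
              headword reversed and sorted, so that a suffix query becomes a
              binary-searched prefix query on the reversed strings.  The Python
              sketch below assumes that layout purely for illustration; it does
              not describe the actual format of the index_suffix file, and the
              function names are hypothetical.

```python
import bisect


def build_suffix_index(headwords):
    # Hypothetical layout: lower-cased headwords reversed, then sorted.
    pairs = sorted((w.lower()[::-1], w) for w in headwords)
    return [k for k, _ in pairs], [w for _, w in pairs]


def suffix_search(keys, words, suffix):
    rev = suffix.lower()[::-1]
    # Reversed keys ending in `suffix` start with `rev` and are contiguous.
    lo = bisect.bisect_left(keys, rev)
    hi = lo
    while hi < len(keys) and keys[hi].startswith(rev):
        hi += 1
    return words[lo:hi]


keys, words = build_suffix_index(["running", "sing", "singer", "bring", "song"])
print(sorted(suffix_search(keys, words, "ing")))  # ['bring', 'running', 'sing']
```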

       word   Match any single word, even if part of a multi-word entry.  If an
              optional word index file is named with index_word in the
              configuration file, this search is much faster.
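              A Python sketch of word matching with a precomputed map from each
              word to the headwords containing it.  The in-memory dictionary
              stands in for the index_word file, whose format is not described
              here, and splitting on whitespace only is an assumption of this
              sketch; the function names are hypothetical.

```python
from collections import defaultdict


def build_word_index(headwords):
    index = defaultdict(list)
    for h in headwords:
        # set() so a headword is listed once even if a word repeats in it.
        for w in set(h.lower().split()):
            index[w].append(h)
    return index


def word_search(index, word):
    return list(index.get(word.lower(), []))


idx = build_word_index(["hot dog", "dog", "dog days", "hotdog"])
print(word_search(idx, "Dog"))  # ['hot dog', 'dog', 'dog days']
```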


DATABASE FORMAT

       Databases for dictd are distributed separately.  A database consists of
       two files.  One is a flat text file; the other is the index.

       The  flat  text file contains dictionary entries (or any other suitable
       data), and the index contains tab-delimited tuples  consisting  of  the
       headword,  the  byte offset at which this entry begins in the flat text
       file, and the length of the entry in bytes.  The offset and length are
       encoded in base 64, using the 64-character subset of International
       Alphabet IA5 discussed in RFC 1421 (printable encoding) and RFC 1522
       (base64 MIME).  Encoding the offsets in base 64 saves considerable
       space when compared with the usual base 10 encoding, while still
       permitting tab characters (ASCII 9) to be used for delimiting fields in
       a record.  Each record ends with a newline (ASCII  10),  so  the  index
       file is human readable.
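       A Python sketch of reading one index record.  It assumes each offset or
       length field is an unpadded base-64 number written most-significant
       digit first, using the alphabet A-Z, a-z, 0-9, +, / (so "A" is 0 and
       "B8" is 1*64+60 = 124).  The helper names and the sample record are
       hypothetical.

```python
B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def b64_decode(text):
    value = 0
    for ch in text:
        value = value * 64 + B64.index(ch)  # ValueError on a bad digit
    return value


def b64_encode(value):
    if value == 0:
        return "A"
    digits = []
    while value:
        value, r = divmod(value, 64)
        digits.append(B64[r])
    return "".join(reversed(digits))


def parse_index_line(line):
    # headword <TAB> offset <TAB> length <NEWLINE>
    headword, offset, length = line.rstrip("\n").split("\t")
    return headword, b64_decode(offset), b64_decode(length)


def read_entry(data, line):
    # Slice the entry out of the (uncompressed) flat text file contents.
    _, offset, length = parse_index_line(line)
    return data[offset:offset + length]


print(parse_index_line("aback\tB8\tc\n"))  # ('aback', 124, 28)
```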

       The flat text file may be compressed using gzip(1) (not recommended) or
       dictzip(1) (highly recommended).  Optimal speed will be obtained  using
       an  uncompressed  file.   However, the gzip compression algorithm works
       very well on plain text, and can  result  in  space  savings  typically
       between 60 and 80%.  Using a file compressed with gzip(1) is not recom-
       mended, however, because random access on the file can only  be  accom-
       plished  by  serially  decompressing the whole file, a process which is
       prohibitively slow.  dictzip(1) uses the same compression algorithm and
       file  format  as does gzip(1), but provides a table that can be used to
       randomly access compressed blocks in the  file.   The  use  of  50-64kB
       blocks for compression typically degrades compression by less than 10%,
       while maintaining acceptable random access capabilities for all data in
       the file.  As an added benefit, files compressed with dictzip(1) can be
       decompressed with gzip(1) or zcat(1).  (Note: recompressing a dictzip’d
       file using, for example, znew(1) will destroy the random access charac-
       teristics of the file.  Always compress data files using dictzip(1).)
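       The Python sketch below illustrates why chunked compression permits
       random access: the data is deflated as one raw stream, but a full flush
       after each chunk resets the compressor, so each chunk can be inflated
       on its own.  This shows the idea only; it does not write the gzip
       header or the extra field that dictzip(1) uses to store the chunk table,
       and CHUNK, compress_chunks, and read_range are hypothetical.

```python
import zlib

CHUNK = 60 * 1024  # hypothetical chunk size within the 50-64kB range above


def compress_chunks(data, chunk=CHUNK):
    comp = zlib.compressobj(9, zlib.DEFLATED, -15)  # raw deflate, no header
    out = bytearray()
    sizes = []  # compressed size of each chunk: the random-access table
    for start in range(0, len(data), chunk):
        piece = comp.compress(data[start:start + chunk])
        piece += comp.flush(zlib.Z_FULL_FLUSH)  # byte-align and reset history
        sizes.append(len(piece))
        out += piece
    out += comp.flush(zlib.Z_FINISH)
    return bytes(out), sizes


def read_range(compressed, sizes, offset, length, chunk=CHUNK):
    first = offset // chunk
    last = (offset + length - 1) // chunk
    starts = [0]
    for s in sizes:
        starts.append(starts[-1] + s)
    out = bytearray()
    for k in range(first, last + 1):
        # Only the chunks covering the requested range are decompressed.
        out += zlib.decompressobj(-15).decompress(compressed[starts[k]:starts[k + 1]])
    skip = offset - first * chunk
    return bytes(out[skip:skip + length])
```

       Smaller chunks make each lookup cheaper but give the compressor less
       context, which is the trade-off behind the block sizes mentioned above.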


ACKNOWLEDGEMENTS

       Special thanks to Jean-loup Gailly and Mark Adler for writing the  zlib
       general-purpose data compression library.  The version included with
       dictd is not necessarily an original version and may have been modified
       (unnecessary  files  may  have  been  deleted  to make the distribution
       smaller; makefiles may have been  modified  to  ease  compilation;  see
       zlib/README.DICT for any significant changes).  For more information on
       zlib, please see the zlib home page at
              http://www.gzip.org/zlib/

       The key features of the  dictzip  random-access  compression  algorithm
       utilize  a  documented extension of the gzip format, and do not require
       any modifications to zlib.

       Special thanks to Henry Spencer for his  regex  package.   The  package
       included with dictd is not necessarily an original version and may
       have been modified (unnecessary files may have been deleted to make the
       distribution smaller; makefiles may have been modified to ease compila-
       tion; see regex/README.DICT for any  significant  changes).   For  more
       information on regex, please see
              ftp://zoo.toronto.edu/pub/regex.shar


COPYING

       The  main source files for the dictd server and the dictzip compression
       program were written by Rik  Faith  (faith@dict)  and  are  distributed
       under the terms of the GNU General Public License.  If you need to dis-
       tribute under other terms, write to the author.

       The main libraries used by these programs  (zlib,  regex,  libmaa)  are
       distributed  under  different  terms,  so  you  may  be able to use the
       libraries for applications which  are  incompatible  with  the  GPL  --
       please see the copyright notices and license information that come with
       the libraries for more information, and consult with your  attorney  to
       resolve these issues.


BUGS

       The  regular  expression  searches  do  not ignore non-whitespace, non-
       alphanumeric characters as do the other searches.   In  practice,  this
       isn’t much of a problem.

       The  databases are memory mapped and cannot be updated while the server
       is running.

       There is no way to get a running server to  re-read  the  configuration
       file, so databases cannot be added or deleted on the fly.


FILES

       /etc/dictd.conf
       /usr/sbin/dictd


SEE ALSO

       dictfmt(1),   dictfmt_virtual(1),   dict(1),   dictzip(1),   gunzip(1),
       zcat(1), webster(1), RFC 2229



                                 29 March 2002                        DICTD(8)
