Sendmail Operation Guide

R and S: rewriting rules

The core of address parsing is the set of rewriting rules. These form an ordered production system. sendmail(1M) scans through the set of rewriting rules looking for a match on the left hand side (LHS) of the rule. When a rule matches, the address is replaced by the right hand side (RHS) of the rule.

There are several sets of rewriting rules. Some of the rewriting sets are used internally and must have specific semantics. Other rewriting sets do not have specifically assigned semantics, and may be referenced by the mailer definitions or by other rewriting sets.

The syntax of these two commands is:

Sn

Sets the current ruleset being collected to n. If you begin a ruleset more than once, the new rules are appended to the old definition.

R lhs rhs comments

The fields must be separated by at least one tab character; there may be embedded spaces in the fields. The lhs is a pattern that is applied to the input. If it matches, the input is rewritten to the rhs. The comments are ignored.
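
As a minimal sketch of the two commands together, the fragment below collects a single rule into a ruleset. The ruleset number 97 and the rule itself are hypothetical, chosen only for illustration. The three fields of the R line are separated by tab characters, while the spaces inside each field are only there for readability.

```text
# Begin (or append to) ruleset 97; the number is an arbitrary example.
S97
# Fields: lhs <TAB> rhs <TAB> comments
R$- ! $+	$: $2 @ $1 . UUCP	rewrite host!user as user@host.UUCP
```

The $: prefix on the RHS, described under the right hand side, makes the rule apply only once.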

Macro expansions of the form $x are performed when the configuration file is read. Expansions of the form $&x are performed at run time using a somewhat less general algorithm. This form is intended only for referencing internally defined macros such as $h that are changed at runtime.
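
The difference between the two forms can be sketched as follows. The macro m is defined here only for the example, and both rules are illustrative rather than taken from a shipped configuration; read each rule on its own.

```text
# $m is expanded once, while the configuration file is being read,
# so this LHS is stored with the literal tokens example . com.
Dmexample.com
R$+ < @ $m >	$: $1 < @ LOCAL >	mark our own domain

# $&h is expanded each time the rule is evaluated, so it sees the
# value that $h holds for the address being processed at that moment.
R$+	$: $1 < @ $&h >	append the current host
```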

The left hand side

The left hand side of rewriting rules contains a pattern. Normal words are simply matched directly. Metasyntax is introduced using a dollar sign. The metasymbols are:


$*      match zero or more tokens
$+      match one or more tokens
$-      match exactly one token
$=x     match any phrase in class x
$~x     match any word not in class x

If any of these match, they are assigned to the symbol $n for replacement on the right hand side, where n is the index in the LHS. For example, if the LHS $-:$+ is applied to the input:

UCBARPA:eric

the rule will match, and the values passed to the RHS will be:


$1      UCBARPA
$2      eric

Additionally, the LHS can include $@ to match zero tokens. This is not bound to a $n on the RHS, and is normally only used when it stands alone in order to match the null input.
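
To make the binding of $n concrete, here is a hedged sketch of a few patterns, with sample inputs described in the comments. The rules are hypothetical and each is meant to be read on its own, not as a working ruleset; the RHS tokens are illustrative.

```text
# Input: UCBARPA:eric   (tokens: UCBARPA : eric)
# $- binds $1 = UCBARPA, $+ binds $2 = eric
R$- : $+	$: $2 < @ $1 >	becomes eric < @ UCBARPA >

# Input: eric@vangogh.CS.Berkeley.EDU
# The first $+ binds $1 = eric; the second binds the several tokens
# $2 = vangogh . CS . Berkeley . EDU
R$+ @ $+	$: $1 < @ $2 >	mark the host part

# Null input: $@ matches zero tokens and binds nothing
R$@	$@ < @ >	placeholder for an empty address
```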

The right hand side

When the left hand side of a rewriting rule matches, the input is deleted and replaced by the right hand side. Tokens are copied directly from the RHS unless they begin with a dollar sign. Metasymbols are:


$n                                    substitute indefinite token n from LHS
$[name$]                              canonicalize name
$(map key $@arguments $:default $)    generalized keyed mapping function
$>n                                   call ruleset n
$#mailer                              resolve to mailer
$@host                                specify host
$:user                                specify user

The $n syntax substitutes the corresponding value from a $+, $-, $*, $=, or $~ match on the LHS. It may be used anywhere.

A host name enclosed between $[ and $] is looked up in the host database(s) and replaced by the canonical name. This is actually completely equivalent to the following:

   $(host hostname$)

In particular, a $: default can be used.

For example, $[ftp$] might become ftp.CS.Berkeley.EDU and $[[128.32.130.2]$] would become vangogh.CS.Berkeley.EDU. sendmail(1M) recognizes its own numeric IP address without calling the name server and replaces it with its canonical name.
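
A rule using this operator might look like the sketch below. The angle-bracket convention for marking the host part and the UNKNOWN suffix are assumptions made for the example, not requirements of the syntax. The two rules are alternatives, not a sequence.

```text
# Replace the host part with its canonical name, if one is found.
R$* < @ $+ > $*	$: $1 < @ $[ $2 $] > $3	canonicalize host

# Same lookup, but substitute a default when the lookup fails.
R$* < @ $+ > $*	$: $1 < @ $[ $2 $: $2 . UNKNOWN $] > $3	with default
```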

The $( ... $) syntax is a more general form of lookup; it uses a named map instead of an implicit map. If no lookup is found, the indicated default is inserted; if no default is specified and no lookup matches, the value is left unchanged. The arguments are passed to the map for possible use.
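
As a sketch of a named map lookup, the fragment below declares a map and consults it from a rule. The map name domaintable, its class hash, and the file path are assumptions for the example; the K configuration line that declares the map is outside the scope of this section.

```text
# Declare a keyed map (name, class, and file are hypothetical).
Kdomaintable hash /etc/mail/domaintable

# Look up the host part as the key. The user part is passed after $@
# as an argument for the map's possible use, and the text after $:
# is inserted if the key is not found.
R$* < @ $+ > $*	$: $1 < @ $( domaintable $2 $@ $1 $: $2 $) > $3
```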

The $>n syntax causes the remainder of the line to be substituted as usual and then passed as the argument to ruleset n. The final value of ruleset n then becomes the substitution for this rule. In other words, $> expands everything after the ruleset name to the end of the replacement string and then passes that as the initial input to the ruleset. Recursive calls are allowed. For example, $>0 $>3 $1 expands $1, passes that to ruleset 3, and then passes the result of ruleset 3 to ruleset 0.

The $# syntax should only be used in ruleset zero or a subroutine of ruleset zero. It causes evaluation of the ruleset to terminate immediately, and signals to sendmail that the address has completely resolved. The complete syntax is:

$#mailer $@host $:user

This specifies the {mailer, host, user} triple necessary to direct the mailer. If the mailer is local, the host part may be omitted. You may, however, want to use the host part for special "per user" extensions. For example, in the address jgm+foo@CMU.EDU, the +foo part is not part of the user name, and is passed to the local mailer for local use.

The mailer must be a single word, but the host and user may be multi-part. If the mailer is the built-in IPC mailer, the host may be a colon-separated list of hosts that are searched in order for the first working address (exactly like MX records). The user is later rewritten by the mailer-specific envelope rewriting set and assigned to the $u macro. As a special case, if the mailer specified has the F=@ flag specified and the first character of the $: value is @, the @ is stripped off, and a flag is set in the address descriptor that causes sendmail to not do ruleset 5 processing.
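
A hedged sketch of resolving rules follows. The mailer names local and smtp are assumed to be defined by M lines elsewhere in the configuration, and class w is assumed to hold the names of the local host.

```text
# Address at one of our own names: deliver locally, host part omitted.
R$+ < @ $=w >	$#local $: $1	local delivery

# Any other host: hand the full address to the smtp mailer.
R$+ < @ $+ >	$#smtp $@ $2 $: $1 < @ $2 >	remote delivery
```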

Normally, a rule that matches is retried, that is, the rule loops until it fails. A RHS may also be preceded by a $@ or a $: to change this behavior. A $@ prefix causes the ruleset to return with the remainder of the RHS as the value. A $: prefix causes the rule to terminate immediately, but the ruleset to continue; this can be used to avoid continued application of a rule. The prefix is stripped before continuing.
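
The three behaviors can be contrasted in a short sketch. The ruleset number 96 and the rules themselves are hypothetical.

```text
S96
# No prefix: the rule is retried after every rewrite, so trailing
# dots are removed one at a time until the LHS stops matching.
R$+ .	$1	strip trailing dots

# $: prefix: rewrite once and move on to the next rule. Without it,
# $+ would match the rewritten value again and the rule would loop.
R$+	$: < $1 >	wrap in angle brackets once

# $@ prefix: rewrite and return from the ruleset immediately; any
# rules after this one are skipped when it matches.
R< $+ @ $+ >	$@ $1 < @ $2 >	return once a host is found
```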

The $@ and $: prefixes may precede a $> specification; for example:

   R$+	$: $>7 $1

matches anything, passes that to ruleset seven, and continues; the $: is necessary to avoid an infinite loop.

Substitution occurs in the order described, that is, parameters from the LHS are substituted, hostnames are canonicalized, subroutines are called, and finally $#, $@, and $: are processed.

Semantics of rewriting rule sets

There are six rewriting sets that have specific semantics.

Ruleset three should turn the address into "canonical form". This form should have the basic syntax:

local-part@host-domain-spec

Ruleset three is applied by sendmail(1M) before doing anything with any address.

If no @ is specified, then the host-domain-spec may be appended from the sender address (if the C flag is set in the mailer definition corresponding to the sending mailer).

Ruleset zero is applied after ruleset three to addresses that actually specify recipients. It must resolve to a {mailer, host, user} triple. The mailer must be defined in the mailer definitions from the configuration file. The host is defined into the $h macro for use in the argv expansion of the specified mailer.
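
To illustrate how the resolved host reaches the mailer, the sketch below pairs a mailer definition whose argv refers to $h and $u with a resolving rule in ruleset zero. The mailer name, program path, flags, and the ruleset numbers given for S= and R= are assumptions for the example and would need to match a real installation.

```text
# Mailer definition: $h and $u in A= are filled in from the host
# and user parts of the triple returned by ruleset zero.
Muucp, P=/usr/bin/uux, F=DFMhuU, S=12, R=22, A=uux - -r $h!rmail ($u)

S0
# For eric < @ ucbvax . UUCP > this resolves to mailer uucp,
# host ucbvax, user eric.
R$+ < @ $- . UUCP >	$#uucp $@ $2 $: $1	UUCP delivery
```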

Rulesets one and two are applied to all sender and recipient addresses respectively. They are applied before any specification in the mailer definition. They must never resolve.

Ruleset four is applied to all addresses in the message. It is typically used to translate internal to external form.

In addition, ruleset 5 is applied to all local addresses (specifically, those that resolve to a mailer with the F=5 flag set) that do not have aliases. This allows a last minute hook for local names.

Ruleset hooks

A few extra rulesets are defined as hooks that can be defined to get special features. They are all named rulesets. The check_* forms all give accept/reject status; falling off the end or returning normally is an accept, and resolving to $#error is a reject. Many of these can also resolve to the special mailer name $#discard; this accepts the message as though it were successful but then discards it without delivery. Note that this mailer cannot be chosen as a mailer in ruleset 0.
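
A minimal sketch of such a hook is shown below, assuming the check_mail ruleset, which is commonly used to vet the envelope sender. The addresses, the status code, and the reply text are made up for the example.

```text
Scheck_mail
# Reject: resolving to $#error refuses the sender.
R$* spammer @ example . com $*	$#error $@ 5.7.1 $: "550 Access denied"

# Accept but silently drop: resolve to the discard mailer.
R$* noisy @ example . net $*	$#discard $: discard

# Anything else falls off the end of the ruleset and is accepted.
```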

IPC mailers

Some special processing occurs if ruleset zero resolves to an IPC mailer (that is, a mailer that has [IPC] listed as the path in the M configuration line). The host name passed after $@ has MX expansion performed if not delivering via a named socket; this looks the name up in DNS to find alternate delivery sites.

The host name can also be provided as a dotted quad in square brackets; for example:

[128.32.149.78]

This causes direct conversion of the numeric value to an IP host address.

The host name passed in after the $@ may also be a colon-separated list of hosts. Each is separately MX expanded and the results are concatenated to make (essentially) one long MX list. The intent here is to create fake MX records that are not published in DNS for private internal networks.

As a final special case, the host name can be passed in as a text string in square brackets:

[ucbvax.berkeley.edu]

This form avoids the MX mapping.


NOTE: This is intended only for situations where you have a network firewall or other host that will do special processing for all your mail, so that your MX record points to a gateway machine; this machine could then do direct delivery to machines within your local domain. Use of this feature directly violates RFC 1123 section 5.3.5.
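
The host forms described in this section can be summarized in a hedged sketch. The mailer name smtp is assumed to be an IPC mailer defined elsewhere, and every host name, address, and pseudo-domain below is a placeholder.

```text
# Colon-separated list: each host is MX expanded separately and the
# results are concatenated into one long list.
R$+ < @ $+ . internal >	$#smtp $@ hub1.example.com:hub2.example.com $: $1 < @ $2 . internal >

# Bracketed name: delivered to this host with no MX mapping.
R$+ < @ $+ . dmz >	$#smtp $@ [gateway.example.com] $: $1 < @ $2 . dmz >

# Bracketed dotted quad: converted directly to an IP host address.
R$+ < @ $+ . lab >	$#smtp $@ [192.0.2.25] $: $1 < @ $2 . lab >

# Plain name: MX expansion is performed on the host.
R$+ < @ $+ >	$#smtp $@ $2 $: $1 < @ $2 >
```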


© 2000 The Santa Cruz Operation, Inc. All rights reserved.