The core of address parsing is the set of rewriting rules, which form an ordered production system. sendmail(1M) scans through the set of rewriting rules looking for a match on the left hand side (LHS) of the rule. When a rule matches, the address is replaced by the right hand side (RHS) of the rule.
There are several sets of rewriting rules. Some of the rewriting sets are used internally and must have specific semantics. Other rewriting sets do not have specifically assigned semantics, and may be referenced by the mailer definitions or by other rewriting sets.
The syntax of the S and R commands is:
Sn
Sets the current ruleset being collected to n. If you begin a ruleset more than once, the new rules are appended to the old definition.
R lhs rhs comments
The fields must be separated by at least one tab character; there may be embedded spaces in the fields. The lhs is a pattern that is applied to the input. If it matches, the input is rewritten to the rhs. The comments are ignored.
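As an illustration only, the following Python sketch shows one way the S and R lines described above could be read: the fields of an R line are split on runs of tabs, and reopening a ruleset appends to it. The function names parse_rule_line and load_rulesets are hypothetical and are not part of sendmail.

```python
import re


def parse_rule_line(line):
    """Split an R line into (lhs, rhs, comment).

    Fields are separated by one or more tabs; spaces inside a field are kept.
    """
    if not line.startswith("R"):
        raise ValueError("not a rewriting rule line")
    fields = re.split(r"\t+", line[1:].rstrip("\n"), maxsplit=2)
    lhs = fields[0]
    rhs = fields[1] if len(fields) > 1 else ""
    comment = fields[2] if len(fields) > 2 else ""
    return lhs, rhs, comment


def load_rulesets(lines):
    """Collect R lines under the most recent S line.

    Starting a ruleset a second time appends to the earlier definition.
    """
    rulesets = {}
    current = None
    for line in lines:
        if line.startswith("S"):
            current = line[1:].strip()
            rulesets.setdefault(current, [])
        elif line.startswith("R"):
            if current is None:
                raise ValueError("R line before any S line")
            rulesets[current].append(parse_rule_line(line))
    return rulesets
```

For instance, a file that contains S3 twice ends up with a single ruleset 3 holding the rules from both places, in file order.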
Macro expansions of the form $x are performed when the configuration file is read. Expansions of the form $&x are performed at run time using a somewhat less general algorithm. This form is intended only for referencing internally defined macros such as $h that are changed at runtime.
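The left hand side of rewriting rules contains a pattern. Normal words are simply matched directly. Metasyntax is introduced using a dollar sign. The metasymbols are:
$*	Match zero or more tokens
$+	Match one or more tokens
$-	Match exactly one token
$=x	Match any phrase in class x
$~x	Match any word not in class x
If any of these match, they are assigned to the symbol $n for replacement on the right hand side, where n is the index in the LHS. For example, if the LHS:
$-:$+
is applied to the input:
UCBARPA:eric
the rule will match, and the values passed to the RHS will be:
$1	UCBARPA
$2	eric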
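The matching step can be sketched in Python as a small backtracking matcher. This is a simplified model, not sendmail's implementation: it handles only $* (zero or more tokens), $+ (one or more tokens), $- (exactly one token) and literal words, it assumes the input has already been split into tokens, and it tries the shortest match first. The function name match is hypothetical.

```python
def match(pattern, tokens):
    """Match a tokenized LHS against a tokenized address.

    Returns a list of bindings (one token list per metasymbol, in LHS order,
    corresponding to $1, $2, ...) or None when the pattern does not match.
    """
    def walk(p, t, binds):
        if p == len(pattern):
            # The whole input must be consumed for the rule to match.
            return binds if t == len(tokens) else None
        sym = pattern[p]
        if sym in ("$*", "$+", "$-"):
            lo = 0 if sym == "$*" else 1
            hi = 1 if sym == "$-" else len(tokens) - t
            for n in range(lo, hi + 1):  # shortest candidate first
                if t + n > len(tokens):
                    break
                found = walk(p + 1, t + n, binds + [tokens[t:t + n]])
                if found is not None:
                    return found
            return None
        # Literal word: must equal the next input token.
        if t < len(tokens) and tokens[t] == sym:
            return walk(p + 1, t + 1, binds)
        return None

    return walk(0, 0, [])
```

Applied to the LHS $-:$+ and the input UCBARPA:eric, each split into three tokens, the sketch binds the first metasymbol to UCBARPA and the second to eric.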
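When the left hand side of a rewriting rule matches, the input is deleted and replaced by the right hand side. Tokens are copied directly from the RHS unless they begin with a dollar sign. Metasymbols are:
$n	Substitute indefinite token n from LHS
$[name$]	Canonicalize name
$(map key $@arguments $:default $)	Generalized keyed mapping function
$>n	``Call'' ruleset n
$#mailer	Resolve to mailer
$@host	Specify host
$:user	Specify user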
A host name enclosed between $[ and $] is looked up in the host database(s) and replaced by the canonical name. This is actually completely equivalent to the following:
$(host hostname$)
In particular, a $: default can be used.
For example, $[ftp$] might become ftp.CS.Berkeley.EDU and $[[128.32.130.2]$] would become vangogh.CS.Berkeley.EDU. sendmail(1M) recognizes its own numeric IP address without calling the name server and replaces it with its canonical name.
The $( ... $) syntax is a more general form of lookup; it uses a named map instead of an implicit map. If no lookup is found, the indicated default is inserted; if no default is specified and no lookup matches, the value is left unchanged. The arguments are passed to the map for possible use.
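The fallback behavior of a lookup can be modeled with a few lines of Python. This is only a sketch: a plain dictionary stands in for the named map, map arguments are ignored, and the function name map_lookup is hypothetical.

```python
def map_lookup(table, key, default=None):
    """Model the result of $(map key $: default $).

    A hit returns the mapped value. A miss returns the default when one
    was given, and otherwise leaves the key unchanged.
    """
    if key in table:
        return table[key]
    if default is not None:
        return default
    return key
```

With a table that maps ftp to ftp.CS.Berkeley.EDU, looking up ftp returns the canonical name, while looking up an unknown name with no default returns that name as it was.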
The $>n syntax causes the remainder of the line to be substituted as usual and then passed as the argument to ruleset n. The final value of ruleset n then becomes the substitution for this rule. The $> syntax expands everything after the ruleset name to the end of the replacement string and then passes that as the initial input to the ruleset. Recursive calls are allowed. For example, $>0 $>3 $1 expands $1, passes that to ruleset 3, and then passes the result of ruleset 3 to ruleset 0.
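The evaluation order of nested calls can be sketched in Python. In this simplified model each ruleset is a plain function from a token list to a token list, the $> marker and the ruleset name are separate tokens, and the name expand_calls is hypothetical.

```python
def expand_calls(rhs, rulesets):
    """Evaluate a leading $>name call on an already substituted RHS.

    Everything after the ruleset name is evaluated first (which handles
    nested calls), and the result is passed to the named ruleset.
    """
    if len(rhs) >= 2 and rhs[0] == "$>":
        argument = expand_calls(rhs[2:], rulesets)
        return rulesets[rhs[1]](argument)
    return rhs


# Stand-in rulesets for illustration only; they do not reproduce the
# behavior of the real rulesets 3 and 0.
demo_rulesets = {
    "3": lambda tokens: ["<"] + tokens + [">"],
    "0": lambda tokens: ["resolved"] + tokens,
}
```

Calling expand_calls on the tokens for $>0 $>3 user with demo_rulesets runs the stand-in for ruleset 3 first and then hands its result to the stand-in for ruleset 0.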
The $# syntax should only be used in ruleset zero
or a subroutine of ruleset zero. It causes evaluation of the
ruleset to terminate immediately, and signals to sendmail
that the address has completely resolved. The complete syntax is:
$#mailer $@host $:user
This specifies the {mailer, host, user} triple necessary to direct the mailer. If the mailer is local, the host part may be omitted, or it can be used for special ``per user'' extensions. For example, in the address jgm+foo@CMU.EDU, the +foo part is not part of the user name, and is passed to the local mailer for local use.
The mailer must be a single word, but the host and user may be multi-part. If the mailer is the built-in IPC mailer, the host may be a colon-separated list of hosts that are searched in order for the first working address (exactly like MX records). The user is later rewritten by the mailer-specific envelope rewriting set and assigned to the $u macro. As a special case, if the mailer specified has the F=@ flag specified and the first character of the $: value is @, the @ is stripped off, and a flag is set in the address descriptor that causes sendmail to not do ruleset 5 processing.
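To make the shape of the triple concrete, here is a minimal Python sketch that splits an already tokenized $# result into its three parts. It assumes the parts appear in the order shown above and does not model the F=@ special case. The function name parse_triple and the mailer name esmtp used in the usage note are illustrative assumptions.

```python
def parse_triple(tokens):
    """Split ["$#", mailer, "$@", host..., "$:", user...] into a triple.

    Returns (mailer, host_tokens, user_tokens), or None when the token
    list does not start with $#. The host part may be absent.
    """
    if len(tokens) < 2 or tokens[0] != "$#":
        return None
    mailer = tokens[1]  # the mailer is a single word
    rest = tokens[2:]
    host = []
    if rest and rest[0] == "$@":
        end = rest.index("$:") if "$:" in rest else len(rest)
        host = rest[1:end]  # host may be multi-part
        rest = rest[end:]
    user = rest[1:] if rest and rest[0] == "$:" else []
    return mailer, host, user
```

For example, the tokens of $#local $:jgm yield the mailer local with an empty host, while $#esmtp $@CMU.EDU $:jgm@CMU.EDU yields a multi-token host and user.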
Normally, a rule that matches is retried, that is, the rule loops until it fails. A RHS may also be preceded by a $@ or a $: to change this behavior. A $@ prefix causes the ruleset to return with the remainder of the RHS as the value. A $: prefix causes the rule to terminate immediately, but the ruleset to continue; this can be used to avoid continued application of a rule. The prefix is stripped before continuing.
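The control flow described above can be sketched in Python. In this simplified model each rule is a pair of a prefix ("", "$:" or "$@") and a function that returns the rewritten tokens, or None when the LHS does not match. The name run_ruleset and the limit parameter are hypothetical; the limit is only a guard for the sketch and does not reflect sendmail's own loop detection.

```python
def run_ruleset(rules, tokens, limit=100):
    """Apply rules in order, honoring the $@ and $: prefixes.

    No prefix: the rule is retried until it stops matching.
    "$:"     : the rule is applied once, then the next rule is tried.
    "$@"     : the rewritten value is returned from the ruleset at once.
    """
    for prefix, rewrite in rules:
        for _ in range(limit):
            result = rewrite(tokens)
            if result is None:
                break  # LHS failed, go on to the next rule
            tokens = result
            if prefix == "$@":
                return tokens
            if prefix == "$:":
                break
        else:
            raise RuntimeError("rule did not terminate")
    return tokens


def strip_trailing_dot(tokens):
    """Example rule body: drop one trailing "." token if present."""
    return tokens[:-1] if tokens and tokens[-1] == "." else None


def wrap(tokens):
    """Example rule body that always matches, so it needs a $: prefix."""
    return ["<"] + tokens + [">"]
```

Without the $: prefix, the wrap rule would match its own output forever, which is why the sketch raises an error once the guard limit is reached.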
The $@ and $: prefixes may precede a $> specification; for example:
R$+	$: $>7 $1
matches anything, passes that to ruleset seven, and continues; the $: is necessary to avoid an infinite loop.
Substitution occurs in the order described, that is, parameters from the LHS are substituted, hostnames are canonicalized, subroutines are called, and finally $#, $@, and $: are processed.
There are six rewriting sets that have specific semantics.
Ruleset three should turn the address into ``canonical form''. This
form should have the basic syntax:
local-part@host-domain-spec
Ruleset three is applied by sendmail(1M) before doing anything with any address.
If no @ is specified, then the host-domain-spec may be appended from the sender address (if the C flag is set in the mailer definition corresponding to the sending mailer).
Ruleset zero is applied after ruleset three to addresses that are going to actually specify recipients. It must resolve to a {mailer, host, user} triple. The mailer must be defined in the mailer definitions from the configuration file. The host is defined into the $h macro for use in the argv expansion of the specified mailer.
Rulesets one and two are applied to all sender and recipient addresses respectively. They are applied before any specification in the mailer definition. They must never resolve.
Ruleset four is applied to all addresses in the message. It is typically used to translate internal to external form.
In addition, ruleset 5 is applied to all local addresses (specifically, those that resolve to a mailer with the F=5 flag set) that do not have aliases. This allows a last minute hook for local names.
A few extra rulesets are defined as hooks that can be defined to get special features. They are all named rulesets. The check_* forms all give accept/reject status; falling off the end or returning normally is an accept, and resolving to $#error is a reject. Many of these can also resolve to the special mailer name $#discard; this accepts the message as though it were successful but then discards it without delivery. Note that this mailer cannot be chosen as a mailer in ruleset 0.
The check_relay ruleset is called after a connection is
accepted. It is passed as follows:
client.host.name $| client.host.address
$| is a metacharacter separating the two parts. This ruleset can reject connections from various locations.
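The decision such a ruleset makes can be modeled outside of sendmail. The Python sketch below splits the value at the $| separator and rejects clients from a blocked network. The blocked network (a documentation address range), the reply text and the function name check_relay_decision are all assumptions made for illustration; a real check_relay is written as rewriting rules.

```python
import ipaddress

# Hypothetical policy: reject clients in this documentation-only range.
BLOCKED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def check_relay_decision(workspace):
    """Return ("error", text) to reject or ("accept", None) to accept.

    The workspace has the form "client.host.name $| client.host.address".
    """
    name, _, address = workspace.partition(" $| ")
    client = ipaddress.ip_address(address.strip())
    if any(client in network for network in BLOCKED_NETWORKS):
        return ("error", "550 Access denied")  # models resolving to $#error
    return ("accept", None)  # models falling off the end of the ruleset
```

A client at 192.0.2.7 would be rejected under this sample policy, and any address outside the listed network would be accepted.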
The check_mail ruleset is passed the user name parameter of the SMTP MAIL command. It can accept or reject the address.
The check_rcpt ruleset is passed the user name parameter of the SMTP RCPT command. It can accept or reject the address.
The check_compat ruleset is passed as follows:
sender-address $| recipient-address
$| is a metacharacter separating the addresses. It can accept or reject mail transfer between these two addresses, much like the checkcompat function. For more information on this function, refer to ``Restricting use of email''.
The check_eoh ruleset is passed as follows:
number-of-headers $| size-of-headers
$| is a metacharacter separating the numbers. These numbers can be used for size comparisons with the arith map. The ruleset is triggered after all of the headers have been read. It can be used to correlate information gathered from those headers using the macro storage map. One possible use is to check for a missing header.
For example:
Kstorage macro
HMessage-Id: $>CheckMessageId

SCheckMessageId
# Record the presence of the header
R$*		$: $(storage {MessageIdCheck} $@ OK $) $1
R< $+ @ $+ >	$@ OK
R$*		$#error $: 553 Header Error

Scheck_eoh
# Check the macro
R$*		$: < $&{MessageIdCheck} >
# Clear the macro for the next message
R$*		$: $(storage {MessageIdCheck} $) $1
# Has a Message-Id: header
R< $+ >		$@ OK
# Allow missing Message-Id: from local mail
R$*		$: < $&{client_name} >
R< >		$@ OK
R< $=w >	$@ OK
# Otherwise, reject the mail
R$*		$#error $: 553 Header Error
Keep in mind that the Message-Id: header is not a required header and is not a guaranteed spam indicator. This ruleset is an example and should probably not be used in production.
The check_etrn ruleset is passed the parameter of the SMTP ETRN command. It can accept or reject the command.
The check_expn ruleset is passed the user name parameter of the SMTP EXPN command. It can accept or reject the address.
The check_vrfy ruleset is passed the user name parameter of the SMTP VRFY command. It can accept or reject the command.
The trust_auth ruleset is passed the AUTH= parameter of the SMTP MAIL command. It is used to determine whether this value should be trusted. In order to make this decision, the ruleset may make use of the various ${auth_*} macros. If the ruleset does resolve to the ``error'' mailer, the AUTH= parameter is not trusted and hence not passed on to the next relay.
Some special processing occurs if ruleset zero resolves to an IPC mailer (that is, a mailer that has [IPC] listed as the path in the M configuration line). The host name passed after $@ has MX expansion performed if not delivering via a named socket; this looks the name up in DNS to find alternate delivery sites.
The host name can also be provided as a dotted quad in square
brackets; for example:
[128.32.149.78]
This causes direct conversion of the numeric value to an IP host address.
The host name passed in after the $@ may also be a colon-separated list of hosts. Each is separately MX expanded and the results are concatenated to make (essentially) one long MX list. The intent here is to create fake MX records that are not published in DNS for private internal networks.
As a final special case, the host name can be passed in as a
text string in square brackets:
[ucbvax.berkeley.edu]
This form avoids the MX mapping.
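The host forms described above can be told apart with a short Python sketch. It only classifies each entry of a colon-separated list and performs no DNS lookups. The function name classify_hosts and the labels "mx", "address" and "no-mx" are hypothetical, and the sketch assumes IPv4 dotted quads only.

```python
import ipaddress


def classify_hosts(spec):
    """Classify each host in a colon-separated $@ host specification.

    "mx"      : plain name, subject to MX expansion
    "address" : bracketed dotted quad, used directly as an IP address
    "no-mx"   : bracketed name, looked up without MX mapping
    """
    result = []
    for host in spec.split(":"):
        if host.startswith("[") and host.endswith("]"):
            inner = host[1:-1]
            try:
                ipaddress.IPv4Address(inner)
                result.append(("address", inner))
            except ValueError:
                result.append(("no-mx", inner))
        else:
            result.append(("mx", host))
    return result
```

A list such as mx1.example.com:mx2.example.com (hypothetical names) produces two "mx" entries in order, which mirrors the way the separately expanded MX results are concatenated into one list.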