Keepalive Process Attributes

The information displayed about each process registered with keepalive includes the following attributes:
Monitored Process Fields
Field Description
state The keepalive state for the process/daemon, which is one of:
start
Process is being started.
ok
Process is running.
dead
Process is not running, but will go to respawn (if it has not reached its failure limit) or down (if it has reached its failure limit).
down
Process is down because it has reached its failure limit or is a member of a group that is down because one of its members went down.
respawn
Process is being restarted and will go to ok if successful or dead if not successful.
shutdown
Process has been shut down. If it is a member of a group that was shut down because a critical member of the group failed, it goes to respawn.
daemonize
Process is daemonizing itself and will go to ok if successful or dead if not successful.
pid The current process identification number (PID) of the process/daemon.
node number The number of the node on which the process/daemon is running.
full path to process The complete path to the process/daemon.
argument list The arguments (if any) used by keepalive to distinguish this process/daemon from others by the same name.
child of keepalive Set to TRUE if the process is a child of keepalive; otherwise (such as when daemonization recovery is underway), set to FALSE.
daemonization recovery Set to TRUE if the process has daemonized itself; otherwise, set to FALSE.
pinned Indicates whether the process/daemon has been designated to run on one or more specific nodes in the cluster. If set to TRUE, the process/daemon is pinned to one or more nodes. If set to FALSE, the process/daemon can float (migrate) among the nodes in the cluster.
lastexeced Indicates when the process/daemon was last started.
process first died Indicates the first time the process/daemon stopped before being restarted.
process last died Indicates the last time the process/daemon stopped before being restarted.
min. respawn Specifies the number of seconds the process/daemon must run before it is eligible for restarting.
num. errors The number of errors that have occurred during the current probation period, which starts when the process/daemon fails and the error count is set to one (1). The error count includes process/daemon failures, node rejections by the process/daemon (see the -c reject option), and node failures.
total errors Specifies the total number of errors since the process/daemon was first started. See num. errors for the events included in the error count.
max. errors during probation Specifies the maximum number of errors (process/daemon failures) allowed before keepalive will no longer respawn the process/daemon (leaving the process/daemon in the down state). The maximum number of errors must occur during the specified probation period in order for the process/daemon to be left in the down state.
probation period Specifies the time, in seconds, during which the number of errors specified by max. errors during probation must occur in order for the process/daemon to be taken to the down state.
registration policy One of the following methods by which the process/daemon is registered: Name, meaning keepalive looks for the process/daemon by name (the -a and -o options were not used to register the process/daemon); Argument List, meaning keepalive looks for the process/daemon by name and argument list (the -a option was used); PID, meaning keepalive looks for the process/daemon by process ID (the -o option was used).
node selection policy Identifies the node selection policy as specified by the -Z option.
favored node The node on which the process/daemon is executed, as specified by the -F option. If no node is specified, this value is None.
backup nodes Specifies the nodes on which the process/daemon is executed if the favored node is unavailable. If no nodes are specified, this value is None.
rejected nodes A list of nodes that the process/daemon has rejected or the keepalive node selection policy has rejected.
termwait The time interval (in seconds) that keepalive gives the process/daemon to shut down before sending it a SIGKILL signal.
euid The effective user identification number (EUID) of the process/daemon.
egid The effective group identification number (EGID) of the process/daemon.
startup script The name of the script that keepalive executes when it starts the process/daemon.
shutdown script The name of the script that keepalive executes when it shuts down the process/daemon.
process failure recovery script The name of the script that keepalive executes when it restarts the process/daemon after it fails.
node failure recovery script The name of the script that keepalive executes when it restarts a process/daemon whose node has failed.
down script The name of the script that keepalive executes when a process/daemon enters the down state.
group The name of the registration group to which the process/daemon belongs. If the process/daemon does not belong to a group, None is displayed.
critical group process Set to TRUE or FALSE to indicate whether or not the process/daemon is critical to its group. Set to N/A if the process/daemon is not a member of a group.
reject node exit code Exit code specified by the -c reject option. Set to None if the process/daemon was not registered with the -c reject option.
down exit code Exit code specified by the -c down option. Set to None if the process/daemon was not registered with the -c down option.
exit status returned If a process/daemon is not running, this is the exit code associated with the most recent failure/exit. If a process/daemon is running, None is reported.
last pid The PID of the last failed process/daemon in this slot.
slot The slot number of the process/daemon in this table.
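
The state transitions described under the state field can be summarized in a small lookup function. The following Python sketch is illustrative only and is not keepalive code: the function name next_state and its keyword arguments (succeeded, limit_reached, group_restart) are hypothetical, and only the transitions stated in the table are encoded. States for which the table gives no transition return None.

```python
from typing import Optional

# The seven state names listed for the "state" field.
STATES = ("start", "ok", "dead", "down", "respawn", "shutdown", "daemonize")


def next_state(state: str, *, succeeded: bool = False,
               limit_reached: bool = False,
               group_restart: bool = False) -> Optional[str]:
    """Return the state that follows `state`, or None if the table names none.

    All keyword arguments are hypothetical inputs chosen for this sketch:
      succeeded      - the restart or daemonization attempt worked
      limit_reached  - the process has reached its failure limit
      group_restart  - the process was shut down with its group because a
                       critical member of the group failed
    """
    if state not in STATES:
        raise ValueError(f"unknown keepalive state: {state!r}")
    if state == "dead":
        # dead goes to down at the failure limit, otherwise to respawn.
        return "down" if limit_reached else "respawn"
    if state in ("respawn", "daemonize"):
        # Both go to ok on success and to dead on failure.
        return "ok" if succeeded else "dead"
    if state == "shutdown" and group_restart:
        return "respawn"
    # start, ok, down, and an ordinary shutdown: no transition is stated.
    return None
```

For example, next_state("dead") yields "respawn", while next_state("dead", limit_reached=True) yields "down".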
 

Additional Keepalive Manager help: For detailed information about these attributes, see the Monitoring and Restarting Processes chapter of the NonStop Clusters for SCO UnixWare System Administrator's Guide.
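
The relationship among the num. errors, total errors, max. errors during probation, and probation period fields can be sketched as a simple counter. This Python example is a model under stated assumptions, not the keepalive implementation: the class ErrorTracker and its method record_error are hypothetical names, and it assumes that an error arriving after the probation period has expired starts a new period with the count reset to one, and that reaching (not exceeding) the maximum takes the process to the down state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ErrorTracker:
    """Hypothetical model of keepalive's per-process error accounting."""
    max_errors: int                 # "max. errors during probation"
    probation_period: float         # "probation period", in seconds
    num_errors: int = 0             # "num. errors" (current probation period)
    total_errors: int = 0           # "total errors" (since first start)
    probation_start: Optional[float] = None

    def record_error(self, now: float) -> str:
        """Count one error at time `now` and return the state that follows dead."""
        self.total_errors += 1
        expired = (self.probation_start is None
                   or now - self.probation_start > self.probation_period)
        if expired:
            # Assumption: a failure outside any active probation period
            # starts a new period with the error count set to one.
            self.probation_start = now
            self.num_errors = 1
        else:
            self.num_errors += 1
        # Assumption: reaching the maximum is enough to stop respawning.
        return "down" if self.num_errors >= self.max_errors else "respawn"


# Three failures within a 60-second probation period and a limit of three.
tracker = ErrorTracker(max_errors=3, probation_period=60.0)
outcomes = [tracker.record_error(t) for t in (0.0, 10.0, 20.0)]
```

In this model the first two failures lead to respawn and the third leads to down. If the same failures were spread further apart than the probation period, each would restart the count and the process would keep being respawned.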