Monitored Process Fields

Field |
Description |
state |
The keepalive state for the process/daemon, which is one
of:
- start: Process is being started.
- ok: Process is running.
- dead: Process is not running, but will go to respawn (if it has not
  reached its failure limit) or down (if it has reached its failure limit).
- down: Process is down because it has reached its failure limit or
  because it is a member of a group that is down because one of its
  members went down.
- respawn: Process is being restarted and will go to ok if successful or
  dead if not successful.
- shutdown: Process has been shut down. If it is a member of a group that
  was shut down because a critical member of the group failed, it goes to
  respawn.
- daemonize: Process is daemonizing itself and will go to ok if successful
  or dead if not successful.
|
pid |
The current process identification number (PID) of the process/daemon. |
node number |
The number of the node on which the process/daemon is running. |
full path to process |
The complete path to the process/daemon. |
argument list |
The arguments (if any) used by keepalive to distinguish this
process/daemon from others with the same name. |
child of keepalive |
Set to TRUE if the process/daemon is a child of keepalive; otherwise
(for example, while daemonization recovery is under way), set to FALSE. |
daemonization recovery |
Set to TRUE if the process/daemon has daemonized itself; otherwise, set
to FALSE. |
pinned |
Indicates whether the process/daemon has been designated to run on one
or more specific nodes in the cluster. If TRUE, the process/daemon is
pinned to one or more nodes. If FALSE, the process/daemon can float
(migrate) among the nodes in the cluster. |
lastexeced |
Indicates when the process/daemon was last started. |
process first died |
Indicates the first time the process/daemon stopped before being
restarted. |
process last died |
Indicates the last time the process/daemon stopped before being
restarted. |
min. respawn |
Specifies the number of seconds the process/daemon must run before it
is eligible for restarting. |
num. errors |
The number of errors (such as process/daemon failures) that have
occurred during the current probation period, which starts when the
process/daemon fails and the error count is set to one (1). The error
count includes process/daemon failures, node rejections by the
process/daemon (see the -c reject option), and node failures. |
total errors |
Specifies the total number of errors since the process/daemon was
first started. See num. errors for the events included in the error
count. |
max. errors during probation |
Specifies the maximum number of errors (process/daemon failures)
allowed before keepalive stops respawning the process/daemon and
leaves it in the down state. These errors must all occur within the
specified probation period for the process/daemon to be left in the
down state. |
probation period |
Specifies the time, in seconds, during which the number of errors
specified by max_errors_during_probation must occur in order for the
process/daemon to be taken to the down state. |
registration policy |
One of the following methods by which the process/daemon is
registered:
- Name: keepalive looks for the process/daemon by name (the -a and -o
  options were not used to register the process/daemon).
- Argument List: keepalive looks for the process/daemon by name and
  argument list (the -a option was used).
- PID: keepalive looks for the process/daemon by process ID (the -o
  option was used).
|
node selection policy |
Identifies the node selection policy as specified by the -Z
option. |
favored node |
The node on which the process/daemon is executed, as specified by the -F
option. If no node is specified, this value is None. |
backup nodes |
Specifies the nodes on which the process/daemon is executed if the
favored node is unavailable. If no nodes are specified, this value is
None. |
rejected nodes |
A list of nodes that the process/daemon has rejected or that the
keepalive node selection policy has rejected. |
termwait |
The time interval (in seconds) that keepalive gives the process/daemon
to shut down before sending it a SIGKILL signal. |
euid |
The effective user identification number of the process/daemon. |
egid |
The effective group identification number of the process/daemon. |
startup script |
The name of the script that keepalive executes when it starts
the process/daemon. |
shutdown script |
The name of the script that keepalive executes when it shuts
down the process/daemon. |
process failure recovery script |
The name of the script that keepalive executes when it restarts
the process/daemon after it fails. |
node failure recovery script |
The name of the script that keepalive executes when it restarts
a process/daemon whose node has failed. |
down script |
The name of the script that keepalive executes when a
process/daemon enters the down state. |
group |
The name of the registration group to which the process/daemon
belongs. If the process/daemon does not belong to a group, None is
displayed. |
critical group process |
Set to TRUE or FALSE to indicate whether or not the process/daemon is
critical to its group. Set to N/A if the process/daemon is not a member
of a group. |
reject node exit code |
Exit code specified by the -c reject option. Set to None if the
process/daemon was not registered with the -c reject option. |
down exit code |
Exit code specified by the -c down option. Set to None if the
process/daemon was not registered with the -c down option. |
exit status returned |
If the process/daemon is not running, this is the exit code associated
with its most recent failure or exit. If the process/daemon is running,
None is reported. |
last pid |
The PID of the last failed process/daemon in this slot. |
slot |
The slot number of the process/daemon in this table. |
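The transitions listed under the state field can be modeled as a small state machine. The following is a minimal Python sketch of only the transitions the table documents. It is not keepalive's implementation. The enum and the helper names (after_dead, after_attempt, after_shutdown) and their boolean inputs are hypothetical.

```python
from enum import Enum


class State(Enum):
    """Keepalive states from the state field (Python names are hypothetical)."""
    START = "start"
    OK = "ok"
    DEAD = "dead"
    DOWN = "down"
    RESPAWN = "respawn"
    SHUTDOWN = "shutdown"
    DAEMONIZE = "daemonize"


def after_dead(at_failure_limit: bool) -> State:
    # A dead process goes to down once its failure limit is reached, otherwise to respawn.
    return State.DOWN if at_failure_limit else State.RESPAWN


def after_attempt(state: State, succeeded: bool) -> State:
    # respawn and daemonize both resolve to ok on success and to dead on failure.
    if state not in (State.RESPAWN, State.DAEMONIZE):
        raise ValueError(f"no documented outcome for state {state.value!r}")
    return State.OK if succeeded else State.DEAD


def after_shutdown(critical_member_failed: bool) -> State:
    # A group member shut down because a critical member failed goes to respawn.
    # No other transition out of shutdown is described, so otherwise it stays put.
    return State.RESPAWN if critical_member_failed else State.SHUTDOWN
```

The table does not say where start leads, so the sketch leaves it out rather than guessing.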
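The num. errors, total errors, max. errors during probation, and probation period fields describe a windowed error count. The following is a minimal Python sketch of one reading of those fields. The ProbationTracker class is hypothetical. Two behaviors are assumptions not stated in the table: an error arriving after the probation period has elapsed starts a new period, and reaching the maximum (rather than exceeding it) sends the process/daemon down.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProbationTracker:
    """Hypothetical model of the num. errors / total errors / probation fields."""
    max_errors_during_probation: int
    probation_period: float  # seconds
    num_errors: int = 0
    total_errors: int = 0
    probation_start: Optional[float] = None  # when the current probation period began

    def record_error(self, now: float) -> bool:
        """Count one error at time `now` (seconds); return True if the process should be left down."""
        self.total_errors += 1
        expired = (self.probation_start is None
                   or now - self.probation_start > self.probation_period)
        if expired:
            # Assumption: an error outside an active probation period starts a new
            # period, and the per-period error count is set to one.
            self.probation_start = now
            self.num_errors = 1
        else:
            self.num_errors += 1
        # Assumption: reaching the maximum within the period is enough to go down.
        return self.num_errors >= self.max_errors_during_probation
```

With a maximum of 3 and a 60-second period, errors at 0, 10, and 20 seconds would leave the process/daemon down. Errors at 0, 10, and 100 seconds would not, because the third error opens a fresh period.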
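The three registration policies differ only in which attributes keepalive compares when looking for the process/daemon. The following is a minimal Python sketch of that comparison. The ProcInfo record, the matches function, and the example process names are hypothetical. The policy strings are the values the registration policy field displays.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ProcInfo:
    """Hypothetical snapshot of a process: PID, command name, and arguments."""
    pid: int
    name: str
    args: Tuple[str, ...]


def matches(policy: str, registered: ProcInfo, candidate: ProcInfo) -> bool:
    """Return True if `candidate` is the registered process under the given policy."""
    if policy == "Name":  # registered without -a or -o
        return candidate.name == registered.name
    if policy == "Argument List":  # registered with -a
        return candidate.name == registered.name and candidate.args == registered.args
    if policy == "PID":  # registered with -o
        return candidate.pid == registered.pid
    raise ValueError(f"unknown registration policy: {policy!r}")
```

Under Name, two processes with the same name but different arguments are indistinguishable. The argument list field exists to separate them.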
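The favored node, backup nodes, and rejected nodes fields together suggest a simple fallback order. The following is a minimal Python sketch of that order only. It does not model the -Z node selection policy or pinning, which the table does not describe in detail. The pick_node function and its available and rejected inputs are hypothetical. The sketch assumes backup nodes are tried in the order listed.

```python
from typing import Optional, Sequence, Set


def pick_node(favored: Optional[int], backups: Sequence[int],
              available: Set[int], rejected: Set[int]) -> Optional[int]:
    """Return the first usable node, trying the favored node before the backups."""
    candidates = ([favored] if favored is not None else []) + list(backups)
    for node in candidates:
        # A node is usable if it is up and has not been rejected.
        if node in available and node not in rejected:
            return node
    return None  # no favored or backup node can run the process/daemon
```

A favored node of None corresponds to the field's None value, in which case only the backup nodes are considered.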