[SunHELP] in.mpathd failure detection time
velociraptor
velociraptor at gmail.com
Wed Sep 8 10:26:35 CDT 2004
I'm seeing something unusual with in.mpathd on
one of my boxes here at $work.
Sep 1 04:01:18 host.foo.com in.mpathd[37]: [ID 398532 daemon.error] Cannot meet requested failure detection time of 10000 ms on (inet qfe4) new failure detection time is 21014 ms
Sep 1 04:02:18 host.foo.com in.mpathd[37]: [ID 122137 daemon.error] Improved failure detection time 10507 ms
Sep 1 04:02:19 host.foo.com in.mpathd[37]: [ID 122137 daemon.error] Improved failure detection time 10000 ms
I've googled around, looked at sunsolve, the sun
support forums, etc. with few results, other than
one page indicating the log messages I see are
"normal" and not indicative of a problem. However
I am seeing these messages on only one interface
pair on a single machine out of the dozen or so that
I have, each with 2 sets of these teamed interfaces.
(FWIW, the team is two ports on a qfe card.)
There is an inconsistent problem which I believe is
resulting from these "mini-flaps" of the interface. I
have this NIC team being monitored by my hosting
company; about 25% of the time when these "flaps"
happen, the NIC team will show as being unreachable.
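To line the flaps up against the monitoring alerts, something like the rough sketch below could pull the in.mpathd events out of syslog. It assumes the log lines are unwrapped and look like the three quoted earlier; the function name `parse_mpathd_events` and the `SAMPLE` list are made up for illustration, and the patterns have only been checked against those three messages.

```python
import re

# Patterns for the two in.mpathd messages quoted in this post.
# Assumption: each syslog entry sits on a single, unwrapped line.
PREFIX = r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]+)\s+\S+\s+in\.mpathd\[\d+\]:.*"
CANNOT_MEET = re.compile(
    PREFIX
    + r"Cannot meet requested failure detection time of (?P<requested>\d+) ms "
    + r"on \(inet (?P<ifname>\w+)\) new failure detection time is (?P<actual>\d+) ms"
)
IMPROVED = re.compile(PREFIX + r"Improved failure detection time (?P<actual>\d+) ms")

# The three log lines from this post, used as sample input.
SAMPLE = [
    "Sep 1 04:01:18 host.foo.com in.mpathd[37]: [ID 398532 daemon.error] "
    "Cannot meet requested failure detection time of 10000 ms on (inet qfe4) "
    "new failure detection time is 21014 ms",
    "Sep 1 04:02:18 host.foo.com in.mpathd[37]: [ID 122137 daemon.error] "
    "Improved failure detection time 10507 ms",
    "Sep 1 04:02:19 host.foo.com in.mpathd[37]: [ID 122137 daemon.error] "
    "Improved failure detection time 10000 ms",
]


def parse_mpathd_events(lines):
    """Return one dict per in.mpathd detection-time message found in lines."""
    events = []
    for line in lines:
        match = CANNOT_MEET.search(line)
        if match:
            events.append({
                "stamp": match.group("stamp"),
                "kind": "degraded",
                "ifname": match.group("ifname"),
                "ms": int(match.group("actual")),
            })
            continue
        match = IMPROVED.search(line)
        if match:
            # The "Improved" message does not name an interface.
            events.append({
                "stamp": match.group("stamp"),
                "kind": "improved",
                "ifname": None,
                "ms": int(match.group("ms") if False else match.group("actual")),
            })
    return events


if __name__ == "__main__":
    for event in parse_mpathd_events(SAMPLE):
        print(event["stamp"], event["kind"], event["ifname"], event["ms"])
```

The timestamps of the "degraded" events could then be compared by hand with the times the hosting company's monitor reported the team unreachable.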
One reference to a different problem mentioned patch
#111041-04, which is not installed, but the patch
description doesn't suggest to me that it would be
of any help here.
I can only think to go down to the data center and
change switch ports. All the NIC ports are in use on
the box, so changing them would be a good test to
see if it's a problem port on the NIC (if the problem
follows the NIC port), but this will be my last resort,
since it will require some planning to avoid downtime.
Anyone have any other suggestions?