[SunHELP] in.mpathd failure detection time

velociraptor velociraptor at gmail.com
Wed Sep 8 10:26:35 CDT 2004


I'm seeing something unusual with in.mpathd on
one of my boxes here at $work.

Sep  1 04:01:18 host.foo.com in.mpathd[37]: [ID 398532 daemon.error] Cannot meet requested failure detection time of 10000 ms on (inet  qfe4) new failure detection time is 21014 ms
Sep  1 04:02:18 host.foo.com in.mpathd[37]: [ID 122137 daemon.error] Improved failure detection time 10507 ms
Sep  1 04:02:19 host.foo.com in.mpathd[37]: [ID 122137 daemon.error] Improved failure detection time 10000 ms

I've googled around, looked at SunSolve, the Sun
support forums, etc. with few results, other than
one page indicating that the log messages I see are
"normal" and not indicative of a problem. However,
I am seeing these messages on only one interface
pair on a single machine out of the dozen or so that
I have, each with 2 sets of these teamed interfaces.
(FWIW, the team is two ports on a qfe card.)

There is an intermittent problem which I believe
results from these "mini-flaps" of the interface.  I
have this NIC team being monitored by my hosting
company; about 25% of the time when these "flaps"
happen, the NIC team will show as being unreachable.
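
To check that correlation more carefully, the timestamps of these
messages could be pulled out of syslog and lined up against the
monitoring alerts. Below is a rough sketch in Python, not a tested
tool. The regexes are based only on the three sample lines above, the
function name parse_mpathd_events is made up for illustration, and it
assumes each message sits on a single unwrapped line in the log file
(I'm assuming /var/adm/messages here).

```python
import re

# Patterns derived only from the sample in.mpathd lines quoted in this post;
# other message variants may not match.
PREFIX = r"^(?P<ts>\w{3}\s+\d+\s+[\d:]+)\s+\S+\s+in\.mpathd\[\d+\]:.*"
CANNOT_MEET = re.compile(
    PREFIX
    + r"Cannot meet requested failure detection time of (?P<req>\d+) ms "
    r"on \(inet\s+(?P<ifname>\w+)\) new failure detection time is (?P<new>\d+) ms"
)
IMPROVED = re.compile(PREFIX + r"Improved failure detection time (?P<new>\d+) ms")


def parse_mpathd_events(lines):
    """Return (timestamp, kind, interface, detection_ms) tuples.

    kind is "degraded" for "Cannot meet requested..." messages and
    "improved" for "Improved failure detection time" messages. The
    "improved" messages do not name an interface, so it is None there.
    """
    events = []
    for line in lines:
        m = CANNOT_MEET.search(line)
        if m:
            ts = " ".join(m.group("ts").split())  # collapse syslog date padding
            events.append((ts, "degraded", m.group("ifname"), int(m.group("new"))))
            continue
        m = IMPROVED.search(line)
        if m:
            ts = " ".join(m.group("ts").split())
            events.append((ts, "improved", None, int(m.group("new"))))
    return events


if __name__ == "__main__":
    # Assumed log location; adjust for the host in question.
    with open("/var/adm/messages", errors="replace") as fh:
        for event in parse_mpathd_events(fh):
            print(event)
```

The "degraded" timestamps are the ones to compare against the times
the hosting company reports the team as unreachable.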

One reference to a different problem mentioned patch
#111041-04, which is not installed, but the patch 
description doesn't suggest to me that it would be
of any help here.

I can only think to go down to the data center and 
change switch ports.  All the NIC ports are in use on 
the box, so changing them would be a good test to 
see if it's a problem port on the NIC (if the problem 
follows the NIC port), but this will be my last resort,
since it will require some planning to avoid downtime.

Anyone have any other suggestions?


