[Sunhelp] Thanks for everyone's responses on the rebooting issue!

Hal Flynn mrhal at mrhal.com
Wed Sep 6 15:14:36 CDT 2000


Here's an article I got in email.  This may or may not apply to you.  It's
interesting, though.  I don't know the source of this article.

********** Begin Article **********

Subject: Interesting Sun UltraSPARC II Cache issues... is it happening to you?
Date: Wed, 6 Sep 2000 08:47:16 -0400

More users slam Sun for memory issue

Hardware maker's handling of problem, frequency of crashes are major complaints

By Jaikumar Vijayan

(Sep. 04, 2000) More users affected by a long-standing cache memory problem
on Sun Microsystems Inc.'s Ultra Enterprise servers are slamming Sun's
handling of the issue and its attempt to keep users quiet about it.
Since Computerworld first reported the problem last week, additional users
have come forward to recount similar experiences. Topping their list of
complaints are the frequency of server crashes caused by the problem, fixes
that don't work and Sun's tendency to initially blame the problem on other
factors before acknowledging it - often only under a nondisclosure
agreement.

"They treated the whole thing like a cover-up," said one user at a large
utility in the Western U.S. who asked not to be named.

Even with hardware replacements, the utility has had so many crashes on its
Sun 6500 servers since they were purchased in May that the machines have
been pulled out of production, the user said. Now the company is considering
returning some of them as defective.

When the utility first informed Sun about the issues, "they told us this was
an unusual problem and that others did not have it. . . . They repeatedly
said this," the user said. "Poor handling of this case could cost Sun
millions of dollars in sales as well as a high-profile client."

Sun recently acknowledged a problem involving an external memory cache on
its UltraSPARC II microprocessor module. Under certain conditions, the
problem has been triggering system failures and frequent server reboots at
customer locations over the past 18 months.

In an e-mailed response to Computerworld questions, Shahin Kahn, Sun's
director of marketing for its enterprise servers, late last week reiterated
Sun's earlier explanation that customers early on were asked for NDAs only
because Sun engineers were still investigating the issues and looking at
different ways of resolving them. "Our interest was in sharing what we knew
with customers at the time," Kahn's message said.

"NDAs are a common practice in the tech industry when companies talk about
unannounced topics. That was so in this case -- we were looking at a variety
of approaches to helping customers," the note said. As of earlier this year,
the company stopped requiring NDAs on this subject, Kahn added.

Kahn's e-mail also insisted that "the number of module errors has been small
enough that we've been able to work with customers one on one to resolve
their particular issues. And we have seen that number grow smaller."

In a recent interview with Computerworld on the subject, Sun Executive Vice
President John Shoemaker said a fix - in the form of a mirrored-cache
technology - is on the way. "We are close to declaring complete victory over
this," he said.

Fix Can't Come Too Soon

That won't come a moment too soon for IQ4hire Inc., a Chicago-based start-up
that purchased a Sun Enterprise 420R and a Sun Enterprise 220R server in May.
Since then, the 420R has crashed seven times at the dot-com, while the 220R
crashed for the first time last week, said CIO Eric Durst.

"Sun came out at least four times on the 420. They talked about the heating,
the air conditioning, the static electricity. . . . They replaced hardware
and generally changed everything but the frame," Durst said. "They didn't
appear to know how to fix it."

"I've had cases open on this problem over and over again," said Norman
Morrison, an independent project consultant working at a service provider
that hosts Web sites for companies that sell retail goods. But, he added,
"I've had people at Sun tell me it is a very rare occurrence."

It was only recently that Sun finally told him about the problem and the
planned fix. "They said it was necessary to sign an NDA to find out what
fixes they had in the works for the cache problem. Neither I nor my company
has signed such an agreement," Morrison said.

Ken Dort, an attorney at Gordon & Glickson LLC in Chicago, last week said
that such nondisclosure agreements (NDAs), though highly unusual, are legally
enforceable as long as they aren't signed under duress.

"If there's bad news to be distributed, these NDAs can slow down the
propagation of the information and give the [vendor] more time to fix the
problem," Dort said.

In cases where users rely heavily on a vendor's product, they are more
willing to sign such agreements, he added.

"It's not illegal or even coercive," said Esther Roditti, an independent
computer and Internet law specialist in New York. On the other hand, she
said, "I've never heard of this happening before."

Many Users Unaffected

Despite the frequency with which the problem appears to hit some Sun users,
there are clearly many others who aren't affected by it.

"We have seen zero problems of this nature on our machines," said Scott
Medlock, chief operating officer at Commercial Open Systems Inc., an
Internet service provider in Kansas City, Mo.

The company runs a variety of Sun servers and has seen no evidence of a
memory glitch, despite running the servers "at 70% capacity 100% of the
time" during the past three years, Medlock said.

DiCarta Inc., an online contract management service, has also had no
problems related to the memory issue, said CEO Scott Martin. The Redwood
City, Calif.-based company is a member of a Sun program aimed at improving
overall service delivery of Internet service providers.

"There's never been an issue with any of the Sun equipment with regard to
any hardware failures," Martin said. "And that includes everything from the
smallest servers all the way to their biggest one."

********** End Article **********

Hope you find this at least useful in some context.

Hal

Almazan_Hector at emc.com wrote:

> You need to replace CPU 3. The problem is the ecache, not the CPU, but the
> ecache is part of the CPU module.
> There has been a problem with Sun's ecache made by IBM. They said that the
> new CPU will fix this problem.
