[SunRescue] OT: dual PPro mb's...
Greg A. Woods
rescue at sunhelp.org
Wed Apr 4 18:27:23 CDT 2001
[ On Wednesday, April 4, 2001 at 11:07:11 (-0700), James Lockwood wrote: ]
> Subject: Re: [SunRescue] OT: dual PPro mb's...
>
> On Fri, 30 Mar 2001, Greg A. Woods wrote:
> >
> > (I really don't know what the problem is -- the Bell Labs folks did a
> > dual-CPU implementation of V7 way back in the early 80's and there have
>
> This was a jumbo lock single kernel thread approach, if you're referring
> to the 11/74 experiment.
No, I'm thinking of the article "Multiprocessing UNIX Operating Systems"
in the AT&T Bell Labs Technical Journal, Vol. 63, No. 8, October 1984. It
describes work done to run UNIX on IBM/370s and on multi-processor AT&T
3B20A and 3B5 machines. The authors describe putting a separate semaphore
on every shared structure and using hash indexes on the larger kernel
tables to cut down on search times. They claimed 1.7 times the
performance (process throughput) on dual-CPU machines.
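To make the idea concrete, here is a minimal sketch of a shared table
split into hash buckets, with one binary semaphore per bucket. It is a
user-space Python illustration only, not code from the article; the
names HashedTable, fill, and nbuckets are hypothetical.

```python
import threading


class HashedTable:
    """Toy shared table: hash buckets, each guarded by its own semaphore.

    Hypothetical illustration; not taken from any real kernel.
    """

    def __init__(self, nbuckets=16):
        self.buckets = [{} for _ in range(nbuckets)]
        # One binary semaphore per bucket instead of one lock for the table.
        self.sems = [threading.Semaphore(1) for _ in range(nbuckets)]

    def _index(self, key):
        # The hash index narrows each search to a single bucket.
        return hash(key) % len(self.buckets)

    def insert(self, key, value):
        i = self._index(key)
        with self.sems[i]:  # only this bucket is held, not the whole table
            self.buckets[i][key] = value

    def lookup(self, key):
        i = self._index(key)
        with self.sems[i]:
            return self.buckets[i].get(key)


def fill(table, start, count):
    # Worker: insert keys start .. start + count - 1.
    for k in range(start, start + count):
        table.insert(k, k * 2)


table = HashedTable()
workers = [threading.Thread(target=fill, args=(table, n * 100, 100))
           for n in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

Two workers only wait on each other when their keys hash to the same
bucket, which is the point of choosing a finer lock granularity.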
> Not to disparage your comments, but I have a feeling that you've never
> worked on a large SMP kernel. Kernel lock contention analysis is _hard_
> and will always be suboptimal for different hardware and syscall usage
> patterns. Solaris (on both SPARC and x86) is actually a fairly good
> example of how to do it right and it's taken them a long time to get there
> (with high end customers pushing all the way). Some other commercial
> Unices do a reasonable job (Tru64 and AIX, IMHO) while HP-UX was lagging
> in kernel parallelization last I checked.
I haven't designed an SMP kernel myself, but I have studied the
implementations and results of others.
Unisys were actually the first UNIX System V development shop to come
out with SMP that seriously broke the 4-CPU barrier. Lots of other
developers licensed their kernel.
Indeed, in the AT&T article I mentioned, Bach and Buroff discuss the
many trade-offs in choosing the correct granularity for semaphore
protection.
My comments are mostly aimed at the rather silly choice of modern
designers to still use large, all-encompassing locks, when even an
initial attempt at dividing kernel structures into the groupings already
well identified in the literature will yield significant performance
improvements. The real problem, as the mythical 4-CPU barrier shows, is
how to keep getting good performance out of larger numbers of
processors.
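As a rough back-of-the-envelope check on why scaling gets harder, the
sketch below assumes Amdahl's law applies (an assumption of this sketch,
not something the article states) and asks what serialized fraction a
1.7x speedup on two CPUs would imply, then projects that fraction to
larger CPU counts. The function names are hypothetical.

```python
def serial_fraction(speedup, ncpus):
    """Serial fraction s implied by a measured speedup on ncpus CPUs.

    Assumes Amdahl's law: speedup = 1 / (s + (1 - s) / ncpus).
    """
    return (1.0 / speedup - 1.0 / ncpus) / (1.0 - 1.0 / ncpus)


def projected_speedup(s, ncpus):
    """Speedup Amdahl's law predicts for serial fraction s on ncpus CPUs."""
    return 1.0 / (s + (1.0 - s) / ncpus)


# 1.7x on a dual-CPU machine is the figure quoted from the article.
s = serial_fraction(1.7, 2)
for n in (2, 4, 8, 16):
    print(n, round(projected_speedup(s, n), 2))
```

Under that assumption the implied serial fraction is about 0.18, and the
projected speedup on 8 CPUs is only about 3.6, so the serialized
sections (lock contention included) dominate well before the CPU count
gets large.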
--
Greg A. Woods
+1 416 218-0098 VE3TCP <gwoods at acm.org> <robohack!woods>
Planix, Inc. <woods at planix.com>; Secrets of the Weird <woods at weird.com>