Patch Name: PHKL_10454

Patch Description: s700 10.01 SystemV semaphores, semop(2) cumulative patch

Creation Date: 97/03/25

Post Date: 97/03/27

Hardware Platforms - OS Releases:
    s700: 10.01

Products: N/A

Filesets:
    OS-Core.CORE-KRN
    ProgSupport.C-INC

Automatic Reboot?: Yes

Status: General Release

Critical: Yes
    PHKL_10454: HANG

Path Name: /hp-ux_patches/s700/10.X/PHKL_10454

Symptoms:
    PHKL_10454:
    Applications could hang when using semop() with non-binary
    semaphores.

    PHKL_9744:
    Applications using SystemV semaphores were experiencing poor
    performance. This problem was reported by several customers
    using Oracle or SAP.

    PHKL_9114:
    Two cases of application hangs on semaphores:
    1) using sema sets and
    2) wrong sema counts caused by a signal to a thread.

    PHKL_7053:
    semop() with counting semaphores could leave processes
    hanging. By "counting" semaphores we mean semops that use
    absolute values > 1 in the sem_op field.

    Here is an example situation which causes a hang:

        semval is 0
        Child A is asleep trying to get 2 from the sema.
        Child B is asleep trying to get 1 from the sema.
        Parent adds 1 to semaphore.
        Child A "wakes up" but finds there isn't enough and
        goes back to sleep.
        Child B, who could be satisfied with 1, never wakes up,
        thus leading to a hang.

    The customer-visible symptom is process hangs in code that
    uses System V semaphores.

    This fix is in addition to an earlier fix related to process
    hangs in semop code. It covers some bugs which were not fixed
    by the earlier fix.

    PHKL_5869:
    A process that uses semop(2) may hang indefinitely. The bug
    only shows up if some other process also uses semop(2),
    specifically requesting two or more semaphore operations in a
    single call.

Defect Description:
    PHKL_10454:
    The semop() system call would not try to wake up additional
    "ncnt" sleepers after the completion of a decrement operation
    resulting in a positive semaphore value.
    The following scenario describes the defect:
    The parent sets semval to 0 and forks 2 children: each child
    tries a decrement operation and goes to sleep waiting for
    semval > 0. Then the parent increments semval to 2, thus
    causing the wakeup of the 1st child, but after completing its
    decrement operation, the 1st child would not wake up the 2nd
    child!
    This defect does not impact applications using binary
    semaphores.

    PHKL_9744:
    The previous patch (PHKL_9115) fixed a hang situation caused
    by a lost wakeup, but it also introduced a performance problem
    by waking up too many processes.
    The semaphore sleep/wakeup strategy was redesigned to minimize
    the number of wakeups.

    PHKL_9114:
    Two residual 'holes' were found:

    Case 1:
        Thread A holds Sem0 and Sem1 (of a set)
        Thread B attempts Down(0) and sleeps
        Thread C attempts Down(0 and 1) and sleeps
        A does Up(0), causing wakeup_one()
        C awakens, gets Sem0 but sleeps on Sem1
        B remains asleep; C did not call wakeup()

    The solution is to pass the wakeup on to anyone else sleeping
    on the sema that triggered the original wakeup.

    Case 2:
        Thread A initializes Sem0 to '0'
        Thread B does a Down(0) and blocks
        Thread C does a Down(0) and blocks
        A does an Up(0) to awaken B or C
        ...However, a kill signal hits B
        B awakens with the signal and completes
        A does an Up(0) to wake C
        C does not get the wakeup

    The bug was that the sleeper count was decremented when thread
    A released the sema. When B awakened with a signal there was an
    additional decrement of the sleeper count. This causes the
    loss of a future wakeup on the next Up of the sema.

    PHKL_7053:
    The defect results in process hangs in semaphore code.
    Here is some sample code which reproduces a hang:

        {
            ....stuff deleted...
            if (!(pid1 = fork())) {
                P(2);
                exit(0);
            } else if (!(pid2 = fork())) {
                sleep(5);
                P(1);
                V(2);
                exit(0);
            } else {
                sleep(10);  /* wait for children to block */
                V(1);
            }
            /* wait for children */
        }

        P(val)
        int val;
        {
            p_op.sem_op = -val;
            semop(semid, &p_op, 1);
        }

        V(val)
        int val;
        {
            v_op.sem_op = val;
            semop(semid, &v_op, 1);
        }

    semid is the semaphore id which is obtained through semget().
    p_op and v_op are semop data structures initialized in the
    program.

    PHKL_5869:
    The symptom here is that a process may sleep forever, waiting
    on a semaphore that is available. The cause is that semop
    supports operations on multiple semaphores. If the first
    semaphore is available, but the second one is not, semop must
    back out by releasing the first one before going to sleep on
    the second one. The back-out code calls semundo, which does
    release the semaphore; however, it fails to do a wakeup, which
    is necessary if some other process is waiting for that same
    semaphore. The fix was to modify the back-out code so that it
    performs the necessary wakeup.

    This bug shows up in 10.01 because the semop code was
    redesigned to be more efficient by performing the absolute
    minimum number of wakeups necessary for correct functionality.
    Obviously, we were too aggressive and forgot one of them.

SR:
    1653195545
    4701344648
    5003276675
    5003306571
    5003339747

Patch Files:
    /usr/conf/h/sem.h
    /usr/conf/lib/libhp-ux.a(sysV_sem.o)
    /usr/include/sys/sem.h

what(1) Output:
    /usr/conf/h/sem.h:
        sem.h $Date: 96/11/15 15:25:14 $ $Revision: 1.20.71.4 $
        PATCH_10.01 (PHKL_9114)
    /usr/conf/lib/libhp-ux.a(sysV_sem.o):
        sysV_sem.c $Date: 97/03/21 11:44:14 $ $Revision: 1.27.71.20 $
        PATCH_10.01 (PHKL_10454)
    /usr/include/sys/sem.h:
        sem.h $Date: 96/11/15 15:25:14 $ $Revision: 1.20.71.4 $
        PATCH_10.01 (PHKL_9114)

cksum(1) Output:
    2023641962 5532 /usr/conf/h/sem.h
    1025384819 15552 /usr/conf/lib/libhp-ux.a(sysV_sem.o)
    2023641962 5532 /usr/include/sys/sem.h

Patch Conflicts: None

Patch Dependencies: None

Hardware Dependencies: None

Other Dependencies: None

Supersedes:
    PHKL_5869 PHKL_7053 PHKL_9114 PHKL_9744

Equivalent Patches:
    PHKL_10455: s800: 10.01
    PHKL_10456: s700: 10.10
    PHKL_10457: s800: 10.10
    PHKL_10458: s700: 10.20
    PHKL_10459: s800: 10.20

Patch Package Size: 80 Kbytes

Installation Instructions:
    Please review all instructions and the Hewlett-Packard
    SupportLine User Guide or your Hewlett-Packard support terms
    and conditions for precautions, scope of license,
    restrictions, and limitation of liability and warranties
    before installing this patch.
    ------------------------------------------------------------
    1. Back up your system before installing a patch.

    2. Login as root.

    3. Copy the patch to the /tmp directory.

    4. Move to the /tmp directory and unshar the patch:

        cd /tmp
        sh PHKL_10454

    5a. For a standalone system, run swinstall to install the
        patch:

        swinstall -x autoreboot=true -x match_target=true \
            -s /tmp/PHKL_10454.depot

    5b. For a homogeneous NFS Diskless cluster run swcluster on the
        server to install the patch on the server and the clients:

        swcluster -i -b

        This will invoke swcluster in the interactive mode and
        force all clients to be shut down.

        WARNING: All cluster clients must be shut down prior to the
                 patch installation. Installing the patch while the
                 clients are booted is unsupported and can lead to
                 serious problems.
        The swcluster command will invoke an swinstall session in
        which you must specify:

            alternate root path - default is /export/shared_root/OS_700
            source depot path   - /tmp/PHKL_10454.depot

        To complete the installation, select the patch by choosing
        "Actions -> Match What Target Has" and then
        "Actions -> Install" from the Menubar.

    5c. For a heterogeneous NFS Diskless cluster:
        - run swinstall on the server as in step 5a to install the
          patch on the cluster server.
        - run swcluster on the server as in step 5b to install the
          patch on the cluster clients.

    By default swinstall will archive the original software in
    /var/adm/sw/patch/PHKL_10454. If you do not wish to retain a
    copy of the original software, you can create an empty file
    named /var/adm/sw/patch/PATCH_NOSAVE.

    Warning: If this file exists when a patch is installed, the
             patch cannot be deinstalled. Please be careful when
             using this feature.

    It is recommended that you move the PHKL_10454.text file to
    /var/adm/sw/patch for future reference.

    To put this patch on a magnetic tape and install from the
    tape drive, use the command:

        dd if=/tmp/PHKL_10454.depot of=/dev/rmt/0m bs=2k

Special Installation Instructions: None
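
The PHKL_10454 scenario under Defect Description (semval set to 0, two children each decrementing by 1, parent incrementing by 2) can be exercised with a small test program. The following is a minimal sketch and is not HP-supplied code: the names run_scenario, sem_change and demo_semun, the 1-second settle delay, and the use of alarm() as a per-child timeout are all assumptions made for illustration. It is written in ANSI C rather than the K&R style of the sample above.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/wait.h>
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Illustrative stand-in for union semun, which some systems leave
 * to the caller to define. */
union demo_semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Apply one operation to semaphore 0 of the set; a negative delta
 * blocks until semval is large enough. */
static int sem_change(int semid, int delta)
{
    struct sembuf op;

    op.sem_num = 0;
    op.sem_op = (short)delta;
    op.sem_flg = 0;
    return semop(semid, &op, 1);
}

/* Parent sets semval to 0, forks two children that each decrement by 1,
 * then increments by 2. Returns the number of children whose decrement
 * completed before timeout_secs expired, or -1 if setup failed. */
int run_scenario(unsigned int timeout_secs)
{
    union demo_semun arg;
    pid_t kids[2];
    int semid, i, status, done = 0;

    assert(timeout_secs > 0);           /* alarm(0) would disable the timeout */

    semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (semid < 0)
        return -1;
    arg.val = 0;
    if (semctl(semid, 0, SETVAL, arg) < 0) {
        semctl(semid, 0, IPC_RMID);
        return -1;
    }

    for (i = 0; i < 2; i++) {
        kids[i] = fork();
        if (kids[i] == 0) {
            alarm(timeout_secs);        /* SIGALRM kills a child left asleep */
            _exit(sem_change(semid, -1) == 0 ? 0 : 1);
        }
    }

    sleep(1);                           /* let both children block in semop() */
    sem_change(semid, 2);               /* one increment should release both */

    for (i = 0; i < 2; i++) {
        if (kids[i] > 0 && waitpid(kids[i], &status, 0) == kids[i] &&
            WIFEXITED(status) && WEXITSTATUS(status) == 0)
            done++;
    }
    semctl(semid, 0, IPC_RMID);         /* remove the semaphore set */
    return done;
}

#ifdef SEM_DEMO_MAIN
int main(void)
{
    printf("children released: %d of 2\n", run_scenario(5));
    return 0;
}
#endif
```

Compiling with -DSEM_DEMO_MAIN yields a standalone program. A result of 2 means both children were released by the single increment; a result of 1 would match the hang described for PHKL_10454, where the first child completes its decrement but the second is never woken.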