Patch Name: PHKL_7793

Patch Description: s700 10.01 VxFS (JFS) buffer cache hangs

Creation Date: 96/06/27

Post Date: 96/07/15

Hardware Platforms - OS Releases:
    s700: 10.01

Products: N/A

Filesets:
    OS-Core.CORE-KRN

Automatic Reboot?: Yes

Status: General Superseded

Critical:
    Yes
    PHKL_7793: HANG
    PHKL_7408: HANG

Path Name: /hp-ux_patches/s700/10.X/PHKL_7793

Symptoms:
    PHKL_7793:
    A heavily used VxFS file system with a snapshot hangs.

    PHKL_7408:
    A system hang can occur during heavy buffer cache activity
    in combination with readahead (prevalent in VxFS).

Defect Description:
    PHKL_7793:
    When VxFS calls getnewbuf() to acquire a buffer, it passes
    VX_NONBLOCK in bxflags to avoid potential deadlock.  However,
    when getnewbuf() finds a B_DELWRI buffer, it calls bwrite()
    to flush the buffer without checking bxflags.  This causes
    deadlock in the following scenario:

    1. getnewbuf() tries to flush a buffer that belongs to VxFS.
       The VxFS strategy layer finds that the buffer is involved
       in an uncommitted transaction and decides to flush the
       log buffer first.
    2. The file system has a snapshot, and the region of the log
       about to be overwritten is changing for the first time,
       so the snapshot strategy layer must copy the old data
       from the primary disk to the snapshot before flushing
       the log.  To do so, it asks for another buffer.
    3. getnewbuf() is called again, and yet another dirty VxFS
       buffer needs to be flushed.
    4. The VxFS strategy layer decides that the current log
       buffer must be flushed before any dirty buffer.  Since
       the current log buffer is locked, it sleeps, waiting for
       a lock that it already owns.

    The fix: if a B_DELWRI buffer is chosen while VX_NONBLOCK is
    set, getnewbuf() returns NULL instead of flushing and
    returning the buffer.

    PHKL_7408:
    This defect occurs under the following conditions:
    - Readahead is being done on the disk.  JFS does this
      aggressively.  Essentially, the BX_NONBLOCK and/or
      BX_NOBUFWAIT flags will be set for the buffer read.
    - The buffer cache virtual space (check bufmap) is highly
      fragmented.  Another possibility (though it has not been
      seen) is that the current buffer overlaps another locked
      buffer.  Essentially, the trigger is anything that
      necessitates sleeping in brealloc1() or allocbuf1().

    brealloc1() and allocbuf1() refuse to sleep if one of the
    nonblock flags is set.  However, a bug in ogetblk() causes
    it to ignore this error return and simply loop if the call
    to brealloc1() fails.

SR:
    1653166496
    5003314906

Patch Files:
    /usr/conf/lib/libhp-ux.a(vfs_bio.o)

what(1) Output:
    /usr/conf/lib/libhp-ux.a(vfs_bio.o):
        vfs_bio.c $Date: 96/06/27 13:46:48 $ $Revision: 1.20 .72.72 $
            PATCH_10.01 (PHKL_7793)

cksum(1) Output:
    497103255 27336 /usr/conf/lib/libhp-ux.a(vfs_bio.o)

Patch Conflicts: None

Patch Dependencies: None

Hardware Dependencies: None

Other Dependencies: None

Supersedes:
    PHKL_7408

Equivalent Patches:
    PHKL_7794: s800: 10.01
    PHKL_7795: s700: 10.10
    PHKL_7796: s800: 10.10

Patch Package Size: 80 Kbytes

Installation Instructions:
    Please review all instructions and the Hewlett-Packard
    SupportLine User Guide or your Hewlett-Packard support terms
    and conditions for precautions, scope of license,
    restrictions, and limitation of liability and warranties,
    before installing this patch.
    ------------------------------------------------------------
    1. Back up your system before installing a patch.

    2. Login as root.

    3. Copy the patch to the /tmp directory.

    4. Move to the /tmp directory and unshar the patch:

        cd /tmp
        sh PHKL_7793

    5a. For a standalone system, run swinstall to install the
        patch:

        swinstall -x autoreboot=true -x match_target=true \
            -s /tmp/PHKL_7793.depot

    5b. For a homogeneous NFS Diskless cluster run swcluster on
        the server to install the patch on the server and the
        clients:

        swcluster -i -b

        This will invoke swcluster in the interactive mode and
        force all clients to be shut down.

        WARNING: All cluster clients must be shut down prior
                 to the patch installation.
                 Installing the patch while the clients are
                 booted is unsupported and can lead to serious
                 problems.

        The swcluster command will invoke an swinstall session
        in which you must specify:

            alternate root path - default is /export/shared_root/OS_700
            source depot path   - /tmp/PHKL_7793.depot

        To complete the installation, select the patch by
        choosing "Actions -> Match What Target Has" and then
        "Actions -> Install" from the Menubar.

    5c. For a heterogeneous NFS Diskless cluster:

        - run swinstall on the server as in step 5a to install
          the patch on the cluster server.
        - run swcluster on the server as in step 5b to install
          the patch on the cluster clients.

        The cluster clients must be shut down as described in
        step 5b.

    By default swinstall will archive the original software in
    /var/adm/sw/patch/PHKL_7793.  If you do not wish to retain a
    copy of the original software, you can create an empty file
    named /var/adm/sw/patch/PATCH_NOSAVE.

    Warning: If this file exists when a patch is installed, the
             patch cannot be deinstalled.  Please be careful
             when using this feature.

    It is recommended that you move the PHKL_7793.text file to
    /var/adm/sw/patch for future reference.

    To put this patch on a magnetic tape and install from the
    tape drive, use the command:

        dd if=/tmp/PHKL_7793.depot of=/dev/rmt/0m bs=2k

Special Installation Instructions: None
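
Verification sketch:
    The values listed under "cksum(1) Output" can be compared
    with the installed object after the patch is applied.  The
    following is a minimal sketch, not an HP-supplied procedure.
    The function name verify_cksum is hypothetical.  It assumes
    that vfs_bio.o has first been extracted from the archive
    (for example with ar x) and that the published checksum was
    computed on that extracted member; neither point is stated
    in this patch text.

```shell
# verify_cksum FILE EXPECTED_CRC EXPECTED_SIZE   (hypothetical helper)
# Returns 0 when cksum(1) reports the expected CRC and byte count,
# 1 on a mismatch, and 2 when FILE cannot be read.
verify_cksum() {
    file=$1
    want_crc=$2
    want_size=$3

    # Reading from stdin makes cksum print only "<crc> <size>".
    out=$(cksum < "$file") || return 2

    # Split the two fields into positional parameters.
    set -- $out
    got_crc=$1
    got_size=$2

    if [ "$got_crc" = "$want_crc" ] && [ "$got_size" = "$want_size" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (got $got_crc $got_size, expected $want_crc $want_size)"
        return 1
    fi
}

# Example use (assumed extraction step; values from "cksum(1) Output"):
#   cd /tmp && ar x /usr/conf/lib/libhp-ux.a vfs_bio.o
#   verify_cksum vfs_bio.o 497103255 27336
```

    A mismatch does not by itself show that the installation
    failed; a later patch that supersedes this one would also
    replace vfs_bio.o and change its checksum.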