Patch Name: PHKL_20963

Patch Description: s700 10.20 LVM Cumulative patch w/FC Hub + Optimus support

Creation Date: 00/02/17

Post Date: 00/02/21

Warning: 00/05/04 - This Non-Critical Warning has been issued by HP.

    - PHKL_20963 exhibits a problem with Logical Volume Manager (LVM) mirroring facilities (also known as MirrorDisk/UX). When a volume group is activated, LVM validates a data structure on each physical volume called the Mirror Consistency Record (MCR). If the MCR is not valid, LVM performs a full resync of the physical volume and should rewrite the MCR. With PHKL_20963 installed, LVM does not rewrite the MCR. Instead, if the MCR is invalid LVM performs a full resync every time the volume group is activated rather than just the first time the MCR is determined to be invalid.

    - While the resync is in progress, system performance may be degraded. In addition, since LVM does not have two valid copies of data during the resync, the system is vulnerable to a disk failure until the resync completes. (This does not affect the performance or availability of mirrored logical volumes after the resync has completed.)

    - Most systems will not experience this resync problem since it only occurs if the MCR has become corrupted. As long as the MCR is correct, the validation performed by LVM will succeed and no resync will be performed. This is why PHKL_20963 passed the rigorous testing performed before it was released. Recently, additional tests were introduced to simulate the corruption of the MCR and this problem was discovered.

    - To avoid this problem, HP recommends that PHKL_20963 be removed from all systems using MirrorDisk/UX. PHKL_20963 should also be removed from all software depots that may be used to install patches on systems with MirrorDisk/UX.

    - The problem is corrected in PHKL_21369, which is released. PHKL_21369 should be installed after PHKL_20963 is removed.
    - To prevent reverting back to PHKL_20963 if PHKL_21369 is removed in the future, HP recommends that PHKL_20963 be removed before PHKL_21369 is installed. If you choose not to remove PHKL_20963 before installing PHKL_21369, the system will still function properly after PHKL_21369 is installed.

Hardware Platforms - OS Releases:
    s700: 10.20

Products: N/A

Filesets:
    LVM.LVM-KRN
    OS-Core.CORE-KRN

Automatic Reboot?: Yes

Status: General Superseded With Warnings

Critical: Yes
    PHKL_20963: HANG
    PHKL_19696: PANIC
    PHKL_19209: OTHER
        This patch is essential to improve the recovery of FC devices in large configurations.
    PHKL_19166: HANG

Path Name: /hp-ux_patches/s700/10.X/PHKL_20963

Symptoms:
    PHKL_20963:
    (SR: 8606106637 CR: JAGab75913)
    An LVM deadlock (hang) can occur when LVM commands which operate on logical volumes are run at the same time as device query operations. The result is that the LVM commands and the query operations never complete (and cannot be terminated), and it is not possible to run any subsequent LVM commands. Furthermore, it is possible that subsequent device recovery will be delayed indefinitely. The only way to restore normal operation is to reboot the system.

    For example, running lvmerge(1M) or lvsplit(1M) together with glance(1) could cause the commands to deadlock, resulting in a situation where they make no forward progress and cannot be interrupted or killed. The same result could occur when running lvchange(1M) or lvextend(1M) together with lvdisplay(1M).

    The defect was introduced in PHKL_19209, which has since been recalled. This new patch supersedes PHKL_19696, PHKL_19209, PHKL_20040 and PHKL_20807, and contains all the fixes contained in these patches. Customers with any of these superseded patches installed should apply this new patch.

    PHKL_20807:
    ( SR:8606100412 DTS: JAGab31786 )
    This is an interim patch to support the Optimus disk array. Other customers are discouraged from installing it.
    Those who do install it will be required to update their systems later when a more general patch is available.

    This patch supersedes PHKL_19166, and all fixes from that patch are included here, too. This patch also supersedes several later patches that have been recalled (PHKL_19209, PHKL_19696, PHKL_20040), but does NOT include all the fixes from those patches. The fixes from the recalled patches will be made available later, in a more general patch that will supersede this interim patch. See below for more details.

    PHKL_20040:
    ( SR:8606100412 DTS: JAGab31786 )
    LVM incorrectly treats two volumes within an Optimus disk array as alternate paths to a single volume because they have the same LUN ID, even though they are distinct volumes and have different target IDs.

    PHKL_19696:
    ( SR:8606101971 DTS: JAGab66231 )
    If there is a problem with the physical volume to which the logical volume is mapped, LVM returns an EIO error for logical requests without retrying until the I/O timeout value set on that logical volume expires.

    PHKL_19209:
    ( SR:5003437970 DTS: JAGaa40887 )
    When multiple physical volumes or paths to physical volumes are lost, it can require minutes to recover them. While the PVs for a given volume group were being tested, locks were held which delayed other LVM operations and the opens and closes of logical volumes. Prior changes to the device recovery code provided some benefit, assuring that device recovery took 1-2 minutes regardless of the number of paths or devices to be recovered; however, this still was not enough. The new device recovery code in this patch reduces the recovery time to under 35 seconds, again independent of the number of paths or devices offline.

    PHKL_19166:
    ( SR: 8606100864 DTS: JAGab39559 )
    ( SR: 4701424846 DTS: JAGab14452 )
    Performance degradation when massively parallel subpage size (<8K) reads are performed (as with Informix).

    ( SR: 8606100864 DTS: JAGab39559 )
    ( SR: 1653289132 DTS: JAGaa67952 )
    The system hangs when lvmkd is waiting for the lock obtained earlier by an application that performs a vg_create operation. The hang does not happen unless there is a powerfailed disk.

    ( SR: 8606100864 DTS: JAGab39559 )
    ( SR: 4701424895 DTS: JAGab14455 )
    Optimus Disk Arrays (model number A5277A-HP) are not recognized as ACTIVE/PASSIVE devices and subsequently are not handled properly by the driver.

    PHKL_17546:
    ( SR: 1653289553 DTS: JAGaa46305 )
    LVM's autoresync after disk powerfail can leave extents stale.

Defect Description:
    PHKL_20963:
    (SR: 8606106637 CR: JAGab75913)
    The LVM deadlock (hang) occurs due to a defect introduced in PHKL_19209 (recalled). That patch introduced an easily encountered deadlock condition while attempting to correct another, relatively rare deadlock.

    The problem can be easily reproduced by running LVM commands which operate on existing logical volumes, such as lvextend(1M), lvsplit(1M) or lvmerge(1M), along with commands that query logical volumes, such as glance(1). The deadlock occurs roughly 10% of the time, but when it does happen there are severe consequences: it becomes impossible to complete the operations or to run any other LVM commands without rebooting the system.

    Resolution:
    The LVM kernel code was modified. The volume group lock and other LVM locks were reordered and a new volume group data lock was added to allow device recovery operations to occur simultaneously with command operations, thus correcting both the old and the newly introduced deadlock defects.

    This patch supersedes the interim PHKL_20807 patch. It reintroduces the device recovery changes from PHKL_19209 and the bug fixes from PHKL_19696 which were purposely excluded from PHKL_20807.

    PHKL_20807:
    ( SR:8606100412 DTS: JAGab31786 )
    PHKL_19209, PHKL_19696 and PHKL_20040 have been recalled.
    All fixes from those patches have been removed, except those needed to support the Optimus disk array. The other fixes will be reintroduced later in a superseding patch.

    Resolution:
    Removed changes from PHKL_19696. I/O start times are initialized as they used to be. As a consequence, if the system administrator has specified a timeout for a logical volume and the underlying physical volume is off line when LVM receives an I/O request, then when the physical volume becomes available again, the request will immediately fail with EIO (instead of being retried for the remainder of the timeout period, as it should be).

    Removed changes from PHKL_19209. Some of the LVM device recovery is once again a serial process. Because of this, LVM operations that access volume group data structures (LVM commands, opens and closes of logical volumes) could be held off for minutes while LVM tries to recover physical volumes that are off line.

    PHKL_20040:
    ( SR:8606100412 DTS: JAGab31786 )
    For some disk arrays, LVM treats all occurrences of the same LUN ID as alternate paths to a single volume. This assumption is not correct for the Optimus disk array: two distinct volumes may have the same LUN ID, but different target IDs.

    Resolution:
    To identify a unique volume in an Optimus array, LVM now uses both its LUN ID and target ID.

    PHKL_19696:
    ( SR:8606101971 DTS: JAGab66231 )
    If the physical volume to which the logical volume is mapped has problems, LVM returns EIO for logical requests before lv_iotimeout expires, instead of retrying until the lv_iotimeout value set on the logical volume is reached. This is because the start time on the request is initialized during scheduling of the request. If the PV to which the request is to be scheduled is down, the request is appended to the powerfail wait queue without being scheduled.
    When the PV comes back, LVM starts resending the buffers in the powerfail wait queue and at that time checks the elapsed time (current time - initial time set) of each request. Because the request was never scheduled, its start time was never initialized and is still zero, so the computed elapsed time is higher than lv_iotimeout. The request is therefore failed without being processed, although the time actually elapsed is much less than the lv_iotimeout value.

    Resolution:
    The logical buf start time is now initialized in lv_strategy(), at the time the request is processed, instead of in lv_schedule() during scheduling.

    PHKL_19209:
    ( SR:5003437970 DTS: JAGaa40887 )
    The problem was that some of the LVM device recovery was still a serial process.

    Resolution:
    The LVM device recovery code was modified to cause all tests of devices and paths to be conducted in parallel. Devices which are available are immediately brought online again, irrespective of other failed devices or paths. The changes in this patch assure that devices recover within the time it takes to test the device/path and to update its data structures. LVM operations that require the volume group data structures (LVM commands and opens and closes of logical volumes) should be held off for no more than 35 seconds.

    PHKL_19166:
    ( SR: 8606100864 DTS: JAGab39559 )
    ( SR: 4701424846 DTS: JAGab14452 )
    Informix issues massive amounts of 1K reads in parallel. With an 8K page size and I/Os serialized within the page, performance suffers.

    Resolution:
    Logic was added to allow reads from the same 8K page to proceed in parallel when bad block relocation is completely disabled (lvchange -r N).

    ( SR: 8606100864 DTS: JAGab39559 )
    ( SR: 1653289132 DTS: JAGaa67952 )
    If the holder of the vg_lock is waiting for I/O to finish, and the I/O can't finish until LVM switches to another link, a deadlock results.
    Resolution:
    To resolve the deadlock, the code now obtains the lock temporarily, in order to switch to the alternate link, then returns the lock to the original holder to finish the I/O.

    ( SR: 8606100864 DTS: JAGab39559 )
    ( SR: 4701424895 DTS: JAGab14455 )
    The Optimus Array needs to be recognized as an ACTIVE/PASSIVE device.

    Resolution:
    Added code to recognize the Optimus Array as an ACTIVE/PASSIVE device.

    PHKL_17546:
    ( SR: 1653289553 DTS: JAGaa46305 )
    lv_syncx() may return with stale extents without actually syncing all the extents.

    Resolution:
    Added an additional check to see if all the extents are synced; otherwise an error is returned. lv_syncx() will return SUCCESS only when the syncing is completed. Made changes in lv_resyncpv() to preserve the error value.

SR:
    1653289132 1653289553 4701424846 4701424895 5003437970 8606100412 8606100864 8606101971 8606106637

Patch Files:
    /usr/conf/lib/libhp-ux.a(rw_lock.o)
    /usr/conf/lib/liblvm.a(lv_block.o)
    /usr/conf/lib/liblvm.a(lv_cluster_lock.o)
    /usr/conf/lib/liblvm.a(lv_hp.o)
    /usr/conf/lib/liblvm.a(lv_ioctls.o)
    /usr/conf/lib/liblvm.a(lv_lvsubr.o)
    /usr/conf/lib/liblvm.a(lv_mircons.o)
    /usr/conf/lib/liblvm.a(lv_phys.o)
    /usr/conf/lib/liblvm.a(lv_schedule.o)
    /usr/conf/lib/liblvm.a(lv_spare.o)
    /usr/conf/lib/liblvm.a(lv_strategy.o)
    /usr/conf/lib/liblvm.a(lv_subr.o)
    /usr/conf/lib/liblvm.a(lv_syscalls.o)
    /usr/conf/lib/liblvm.a(lv_vgda.o)
    /usr/conf/lib/liblvm.a(sh_vgsa.o)

what(1) Output:
    /usr/conf/lib/libhp-ux.a(rw_lock.o):
        rw_lock.c $Date: 2000/02/17 09:45:49 $ $Revision: 1.8.98.6 $ PATCH_10.20 (PHKL_20963)
    /usr/conf/lib/liblvm.a(lv_block.o):
        lv_block.c $Date: 99/07/12 14:23:13 $ $Revision: 1.13.98.7 $ PATCH_10.20 (PHKL_19166)
    /usr/conf/lib/liblvm.a(lv_cluster_lock.o):
        lv_cluster_lock.c $Date: 2000/02/17 09:41:57 $ $Revision: 1.10.98.8 $ PATCH_10.20 (PHKL_20963)
    /usr/conf/lib/liblvm.a(lv_hp.o):
        lv_hp.c $Date: 2000/02/17 09:41:58 $ $Revision: 1.18.98.37 $ PATCH_10.20 (PHKL_20963)
    /usr/conf/lib/liblvm.a(lv_ioctls.o):
        lv_ioctls.c $Date: 2000/02/17 09:42:08 $ $Revision: 1.18.98.26 $ PATCH_10.20 (PHKL_20963)
    /usr/conf/lib/liblvm.a(lv_lvsubr.o):
        lv_lvsubr.c $Date: 2000/02/17 09:42:10 $ $Revision: 1.15.98.25 $ PATCH_10.20 (PHKL_20963)
    /usr/conf/lib/liblvm.a(lv_mircons.o):
        lv_mircons.c $Date: 2000/02/17 09:42:12 $ $Revision: 1.14.98.8 $ PATCH_10.20 (PHKL_20963)
    /usr/conf/lib/liblvm.a(lv_phys.o):
        lv_phys.c $Date: 2000/02/17 09:42:14 $ $Revision: 1.14.98.20 $ PATCH_10.20 (PHKL_20963)
    /usr/conf/lib/liblvm.a(lv_schedule.o):
        lv_schedule.c $Date: 2000/02/17 09:42:15 $ $Revision: 1.18.98.15 $ PATCH_10.20 (PHKL_20963)
    /usr/conf/lib/liblvm.a(lv_spare.o):
        lv_spare.c $Date: 2000/02/17 09:42:17 $ $Revision: 1.3.98.11 $ PATCH_10.20 (PHKL_20963)
    /usr/conf/lib/liblvm.a(lv_strategy.o):
        lv_strategy.c $Date: 2000/02/17 09:42:22 $ $Revision: 1.14.98.17 $ PATCH_10.20 (PHKL_20963)
    /usr/conf/lib/liblvm.a(lv_subr.o):
        lv_subr.c $Date: 2000/02/17 09:42:23 $ $Revision: 1.18.98.10 $ PATCH_10.20 (PHKL_20963)
    /usr/conf/lib/liblvm.a(lv_syscalls.o):
        lv_syscalls.c $Date: 2000/02/17 09:42:24 $ $Revision: 1.14.98.12 $ PATCH_10.20 (PHKL_20963)
    /usr/conf/lib/liblvm.a(lv_vgda.o):
        lv_vgda.c $Date: 2000/02/17 09:42:25 $ $Revision: 1.18.98.7 $ PATCH_10.20 (PHKL_20963)
    /usr/conf/lib/liblvm.a(sh_vgsa.o):
        sh_vgsa.c $Date: 2000/02/17 09:42:29 $ $Revision: 1.3.98.10 $ PATCH_10.20 (PHKL_20963)

cksum(1) Output:
    2303322019 6588 /usr/conf/lib/libhp-ux.a(rw_lock.o)
    1522248600 2660 /usr/conf/lib/liblvm.a(lv_block.o)
    702158539 10592 /usr/conf/lib/liblvm.a(lv_cluster_lock.o)
    3092669862 89316 /usr/conf/lib/liblvm.a(lv_hp.o)
    2764069542 34648 /usr/conf/lib/liblvm.a(lv_ioctls.o)
    2409828445 38412 /usr/conf/lib/liblvm.a(lv_lvsubr.o)
    3256701688 18000 /usr/conf/lib/liblvm.a(lv_mircons.o)
    2797588777 7740 /usr/conf/lib/liblvm.a(lv_phys.o)
    634719265 26332 /usr/conf/lib/liblvm.a(lv_schedule.o)
    2721944056 38920 /usr/conf/lib/liblvm.a(lv_spare.o)
    396664708 7668 /usr/conf/lib/liblvm.a(lv_strategy.o)
    823986861 10180 /usr/conf/lib/liblvm.a(lv_subr.o)
    2437921177 14080 /usr/conf/lib/liblvm.a(lv_syscalls.o)
    1739315003 9436 /usr/conf/lib/liblvm.a(lv_vgda.o)
    2897402979 42260 /usr/conf/lib/liblvm.a(sh_vgsa.o)

Patch Conflicts: None

Patch Dependencies:
    s700: 10.20: PHKL_16750

Hardware Dependencies: None

Other Dependencies: None

Supersedes:
    PHKL_17546 PHKL_19166 PHKL_19209 PHKL_19696 PHKL_20040 PHKL_20807

Equivalent Patches:
    PHKL_20964: s800: 10.20

Patch Package Size: 430 KBytes

Installation Instructions:
    Please review all instructions and the Hewlett-Packard SupportLine User Guide or your Hewlett-Packard support terms and conditions for precautions, scope of license, restrictions, and limitation of liability and warranties, before installing this patch.
    ------------------------------------------------------------
    1. Back up your system before installing a patch.

    2. Login as root.

    3. Copy the patch to the /tmp directory.

    4. Move to the /tmp directory and unshar the patch:

        cd /tmp
        sh PHKL_20963

    5a. For a standalone system, run swinstall to install the patch:

        swinstall -x autoreboot=true -x match_target=true \
            -s /tmp/PHKL_20963.depot

    By default swinstall will archive the original software in /var/adm/sw/patch/PHKL_20963.
    If you do not wish to retain a copy of the original software, you can create an empty file named /var/adm/sw/patch/PATCH_NOSAVE.

    WARNING: If this file exists when a patch is installed, the patch cannot be deinstalled. Please be careful when using this feature.

    It is recommended that you move the PHKL_20963.text file to /var/adm/sw/patch for future reference.

    To put this patch on a magnetic tape and install from the tape drive, use the command:

        dd if=/tmp/PHKL_20963.depot of=/dev/rmt/0m bs=2k

Special Installation Instructions:
    This patch depends on base patch PHKL_16750. For successful installation, please ensure that PHKL_16750 is in the same depot with this patch, or PHKL_16750 is already installed.

    Due to the number of objects in this patch, the customization phase of the update may take more than 10 minutes. During that time the system will not appear to make forward progress, but it will actually be installing the objects.
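
The dependency on PHKL_16750 noted above can be checked before running swinstall. The following is a minimal sketch, not part of the HP instructions. It assumes that "swlist -l product" lists installed patches by name, as SD-UX does on HP-UX 10.x; the function names patch_installed and check_dependency are hypothetical.

```shell
# Sketch only: report whether the base patch PHKL_16750 is already installed.
# Assumption: "swlist -l product" prints one line per installed product or
# patch, with the name (e.g. PHKL_16750) appearing on that line.

patch_installed() {
    # $1 = patch name. Succeeds if swlist output mentions that name.
    swlist -l product 2>/dev/null | grep "$1" >/dev/null
}

check_dependency() {
    if patch_installed PHKL_16750; then
        echo "PHKL_16750 is installed; PHKL_20963 can be installed."
    else
        echo "PHKL_16750 not found; place it in the same depot as PHKL_20963."
        return 1
    fi
}
```

Calling check_dependency as root before step 5a prints one of the two messages and returns non-zero when the base patch is missing, in which case PHKL_16750 should be added to the depot first.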