Patch Name: PHKL_19614

Patch Description: s700 10.20 WSIO SCSI cumulative patch

Creation Date: 99/08/27

Post Date: 99/09/20

Hardware Platforms - OS Releases:
    s700: 10.20

Products: N/A

Filesets:
    OS-Core.CORE-KRN ProgSupport.C-INC

Automatic Reboot?: Yes

Status: General Superseded

Critical:
    Yes
    PHKL_19614: OTHER
        ioscan output is affected.
        LVM primary paths become unavailable.
        ServiceGuard TOCs node.
    PHKL_19130: PANIC
    PHKL_19097: PANIC
    PHKL_18917: HANG
    PHKL_18390: HANG
    PHKL_17467: HANG
    PHKL_16861: HANG
    PHKL_16926: PANIC

Path Name: /hp-ux_patches/s700/10.X/PHKL_19614

Symptoms:
    PHKL_19614:
    DTS# JAGab70055 SR# 8606103392
    The command "ioscan -kfn" displays erroneous output with the
    patch PHKL_19131. A fast wide differential disk may be
    incorrectly reported as narrow single-ended in the description
    column of the ioscan output.

    DTS# JAGab71209 SR# 8606103998
    Suppose a disk array is connected via Fibre Channel with the
    primary paths through mux1 and the alternate paths through
    mux2. If mux1 loses power, LVM fails over to the alternate
    paths, but when mux1 is powered back up, the primary paths
    are still unavailable.

    DTS# JAGab41088 SR# 8606101027
    ServiceGuard package TOCs node because cluster lock ioctls
    take too long.

    PHKL_19130:
    DTS#: JAGaa42584 SR#: 1653281824
    System panic with "scsi unrecovered deferred error".

    DTS#: JAGaa44446 SR#: 8606101359
    Command-mode might stop working at an arbitrary time for the
    application and device trying to use it.

    DTS#: JAGaa08513 SR#: 8606101473
    While doing an Inquiry command to request the Unit Serial
    Number Page, one extra byte is transferred. There are no real
    symptoms associated with this problem.

    PHKL_19097:
    System panics (Data Page Fault) in scsi_start_bus_locked().

    PHKL_18917:
    LVM hangs due to I/O requests never being returned by the I/O
    subsystem. The message "Device violation of Contingent
    Allegiance" is issued to syslog.
    PHKL_18390:
    SR:1653300004 DTS: JAGaa47696 (dup of JAGab11155)
    Slow PVlink failover after installing PHKL_17467.
    diskinfo reports good status for an unavailable disk.

    SR:1653300970 DTS:JAGab11365
    SR:1653290395 DTS: JAGaa47016
    A faulty disk can prevent LVM mirroring from working.

    PHKL_17467:
    I/O failover hang on Fiber Channel PV_link.

    PHKL_16861:
    I/O failover hang on Fiber Channel PV_link.

    PHKL_17639:
    This patch enables new functionality that is part of the
    10.20 ACE (Additional Core Enhancements) Workstation bundle,
    which adds new I/O drivers to support the B1000, C3000, and
    J5000 systems.

    PHKL_16926:
    SR:5003434118 DTS:JAGaa23967
    System panics (Data Page Fault) in scsi_destroy_scb

    SR:5003429654 DTS:JAGaa40369
    System panics in c720_invalid_req_done

    SR:4701407890 DTS:JAGaa23080
    Unexpected Disconnect Messages when using pass through driver

Defect Description:
    PHKL_19614:
    DTS# JAGab70055 SR#: 8606103392
    PHKL_19131 affects disk representation in ioscan output.
    The problem occurs because A-class system firmware returns
    4 words of information in response to a GET_INITIATOR PDC
    call instead of 6, leaving the 5th and 6th words zeroed.
    The driver misinterpreted these zero values and assumed that
    the firmware wanted the SCSI card configured as a Narrow
    Single Ended card.
    Resolution:
    The type of the buffer used for the information returned by
    firmware for the GET_INITIATOR call is changed from ulong_t
    to int in the c720_init() routine. This fixed the problem.

    DTS# JAGab71209 SR# 8606103998
    The root cause of the problem is a bit collision between the
    definitions of the flags L_EPOWERF and L_DEFERRED_ERROR.
    Resolution:
    The flag L_DEFERRED_ERROR is redefined with a different
    value.

    DTS# JAGab41088 SR# 8606101027
    ServiceGuard TOCs node because cluster lock ioctls take too
    long. This occurs with HP-UX 10.20 and ServiceGuard 10.10.
    The root cause of the problem is that the sdisk driver
    responds with cached data when an SIOC_INQUIRY ioctl is
    issued for an open device. This defeats LVM's attempt to
    determine whether either the device or the path to it has
    failed before issuing I/O requests to acquire the cluster
    lock. When the I/O request is subsequently issued, the
    improved error handling recently reintroduced to the sdisk
    driver results in several retries being attempted before the
    error is reported back to LVM. This does not allow LVM
    sufficient time to switch to the mirror disk or alternate
    path and successfully complete the operation before
    ServiceGuard TOCs the machine.
    Resolution:
    All SCSI inquiries now go out to the device rather than
    reading the data cached in memory.

    PHKL_19130:
    DTS#: JAGaa42584 SR#: 1653281824
    If immediate reporting is enabled and a deferred error
    occurs, the system will panic with "scsi unrecovered
    deferred error".
    Resolution:
    The new deferred error check/handling method is to block all
    I/O requests for the disk, when a deferred error occurs,
    until the device is closed and reopened.

    DTS#: JAGaa44446 SR#: 8606101359
    For performance reasons, scsi_ctl replaces the cdevsw table
    entries for d_read and d_write when the lun is not in
    command-mode. The problem is that the cdevsw table is a
    global resource that is not owned by a lun, so command-mode
    might stop working at any arbitrary time.
    Resolution:
    Removed that code.

    DTS#: JAGaa08513 SR#: 8606101473
    The FC data length exceeds the maximum SCSI transfer length
    by 1 byte while performing an Inquiry for the Unit Serial
    Number Page.
    Resolution:
    Reduce the size of the scsi serial structure by 1 byte.

    PHKL_19097:
    When sd_open() fails in scsi_lun_open(), control jumps to
    recover_lck1, which falls through to recover_lp. recover_lp
    sets lp->ddsw to NULL but fails to set lp->scb_q_nonempty to
    NULL. This causes a data page fault panic in
    scsi_start_bus_locked(). This might occur when an open()
    fails on a busy device.
    Resolution:
    Set lp->scb_q_nonempty to NULL at label recover_lp in
    scsi_lun_open().

    PHKL_18917:
    When the message is issued (typically caused by a bus RESET
    during a contingent allegiance condition (CAC)), the
    corresponding I/O request is lost and never returned to the
    requestor, eventually causing a system hang.
    Resolution:
    When a bus RESET happens during a CAC, the c720 driver now
    ensures that all currently active I/O requests are posted as
    incomplete and scheduled to be retried.

    PHKL_18390:
    SR:1653300004 DTS: JAGaa47696 (dup of JAGaa11155)
    Slow PVlink failover, or diskinfo reporting good disk status
    on an unavailable disk, is due to the SCSI INQUIRY command
    returning cached data instead of sending the command down to
    the device.
    Resolution:
    We now ensure an INQUIRY command will be sent down to the
    device when the disk becomes nonresponsive.

    SR:1653300970 DTS:JAGab11365 and
    SR:1653290395 DTS:JAGaa47016
    If a faulty disk sends a NOT_READY sense key to SCSI, the
    current SCSI policy is to retry the request until the disk
    is ready. This results in hung I/O and prevents LVM
    mirroring from working.
    Resolution:
    LVM-related NOT_READY requests will be treated as
    nonresponse from the disk and will therefore be failed back
    for LVM to handle.

    PHKL_17467:
    In a hardware configuration, mirrored disks can be accessed
    through primary/alternate Fiber Channel (FC) links. If the
    primary link and the alternate link of one disk of a
    mirrored pair are down, the other disk should continue
    sending or receiving data. The problem is that it fails to
    do so and causes an I/O hang.
    Resolution:
    This patch provides a fix for this hang problem. The SCSI
    layer will retry the FC requests as long as the PFTIMEOUT
    period has not expired and the request is recoverable.

    PHKL_16861:
    In a hardware configuration, mirrored disks can be accessed
    through primary/alternate Fiber Channel (FC) links.
    If the primary link and the alternate link of one disk of a
    mirrored pair are down, the other disk should continue
    sending or receiving data. The problem is that it fails to
    do so and causes an I/O hang. This patch provides a
    temporary fix for this problem. In this fix, the SCSI layer
    will retry the FC request as long as FC sets a flag asking
    for the request to be retried.

    PHKL_17639:
    New functionality to support the B1000, C3000, and J5000
    systems on HP-UX 10.20. The new functionality adds new I/O
    drivers.
    Resolution:
    Add support for new SCSI hardware in the SCSI driver.

    PHKL_16926:
    SR:5003434118 DTS:JAGaa23967
    There is a race condition between scsi_lun_open and
    scsi_start_bus_locked. This is fixed by incrementing the
    in_use counter before releasing the lun lock, thereby
    ensuring that the lun stays open.

    SR:5003429654 DTS:JAGaa40369
    In c720_invalid_req_done, we directly dereference scb->busp
    without ensuring that this scb is a bus scb. The busp
    pointer is NULL if the scb is a lun scb. The fix adds a
    check for whether lsp->scb->busp is NULL; if so, busp is
    obtained from lsp->scb->lp->bus instead.

    SR:4701407890 DTS:JAGaa23080
    When the pass through driver is used with the "inhibit
    Inquiry on open" option (see scsi_ctl(7)) on a device that
    is alone on its SCSI bus, and the device is repeatedly
    opened and closed to send just a single SCSI command, the
    bus is sometimes in the wrong state when the target device
    begins to transfer data.

SR:
    1653281824 1653290395 1653300004 1653300970 1653306654
    4701398263 4701407668 4701407890 4701414136 5003429654
    5003434118 5003464297 8606103392 8606103998 8606101027

Patch Files:
    /usr/conf/lib/libhp-ux.a(scsi_c720.o)
    /usr/conf/lib/libhp-ux.a(scsi_ctl.o)
    /usr/conf/lib/libhp-ux.a(scsi_disk.o)
    /usr/include/sys/scsi_ctl.h

what(1) Output:
    /usr/conf/lib/libhp-ux.a(scsi_c720.o):
        scsi_c720.c $Date: 99/08/27 13:04:46 $ $Revision: 1.5.98.46 $ PATCH_10.20 (PHKL_19614)
        scsi_c720.c $Date: 99/08/27 13:04:46 $ $Revision: 1.5.98.46 $
    /usr/conf/lib/libhp-ux.a(scsi_ctl.o):
        scsi_ctl.c $Date: 99/08/23 12:48:50 $ $Revision: 1.9.98.40 $ PATCH_10.20 (PHKL_19614)
    /usr/conf/lib/libhp-ux.a(scsi_disk.o):
        scsi_disk.c $Date: 99/08/23 12:45:27 $ $Revision: 1.7.98.36 $ PATCH_10.20 (PHKL_19614)
    /usr/include/sys/scsi_ctl.h:
        scsi_ctl.h $Date: 99/08/26 13:08:29 $ $Revision: 1.8.98.13 $ PATCH_10.20 (PHKL_19614)

cksum(1) Output:
    2413026974 98104 /usr/conf/lib/libhp-ux.a(scsi_c720.o)
    3738181382 67856 /usr/conf/lib/libhp-ux.a(scsi_ctl.o)
    3481520026 20588 /usr/conf/lib/libhp-ux.a(scsi_disk.o)
    2027567749 52468 /usr/include/sys/scsi_ctl.h

Patch Conflicts: None

Patch Dependencies:
    s700: 10.20: PHKL_16750

Hardware Dependencies: None

Other Dependencies: None

Supersedes:
    PHKL_16861 PHKL_16926 PHKL_17467 PHKL_17639 PHKL_18390
    PHKL_18917 PHKL_19097 PHKL_19130

Equivalent Patches:
    PHKL_19615: s800: 10.20

Patch Package Size: 300 KBytes

Installation Instructions:
    Please review all instructions and the Hewlett-Packard
    SupportLine User Guide or your Hewlett-Packard support terms
    and conditions for precautions, scope of license,
    restrictions, and limitation of liability and warranties,
    before installing this patch.
    ------------------------------------------------------------
    1. Back up your system before installing a patch.

    2. Login as root.

    3. Copy the patch to the /tmp directory.

    4. Move to the /tmp directory and unshar the patch:

        cd /tmp
        sh PHKL_19614

    5a. For a standalone system, run swinstall to install the
        patch:

        swinstall -x autoreboot=true -x match_target=true \
            -s /tmp/PHKL_19614.depot

    By default swinstall will archive the original software in
    /var/adm/sw/patch/PHKL_19614. If you do not wish to retain a
    copy of the original software, you can create an empty file
    named /var/adm/sw/patch/PATCH_NOSAVE.

    WARNING: If this file exists when a patch is installed, the
             patch cannot be deinstalled. Please be careful
             when using this feature.
    It is recommended that you move the PHKL_19614.text file to
    /var/adm/sw/patch for future reference.

    To put this patch on a magnetic tape and install from the
    tape drive, use the command:

        dd if=/tmp/PHKL_19614.depot of=/dev/rmt/0m bs=2k

Special Installation Instructions:
    This patch depends on base patch PHKL_16750. For successful
    installation, please ensure that PHKL_16750 is in the same
    depot with this patch, or PHKL_16750 is already installed.
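
Note on the PHKL_19614 flag collision (DTS# JAGab71209):
    The defect description attributes the unavailable primary
    paths to a bit collision between L_EPOWERF and
    L_DEFERRED_ERROR. The sketch below is only an illustration of
    how such a collision can make a power-fail condition look like
    a deferred error; it is not the driver source. The flag names
    come from this text, but the bit values, the *_OLD/*_NEW
    suffixes, and the helper functions are assumptions made up for
    the example. One plausible reading, given that PHKL_19130
    blocks I/O after a deferred error until the device is closed
    and reopened, is that a path marked only with the power-fail
    flag was also treated as having a deferred error.

```c
#include <assert.h>

/* Hypothetical bit values, for illustration only. The real
 * definitions are in the kernel headers and are not given in
 * this patch text. */
#define L_EPOWERF             0x0100u
#define L_DEFERRED_ERROR_OLD  0x0100u  /* same bit as L_EPOWERF: collision */
#define L_DEFERRED_ERROR_NEW  0x0200u  /* redefined to a distinct bit */

/* Returns 1 if the given deferred-error mask appears set in flags. */
int deferred_error_seen(unsigned int flags, unsigned int deferred_mask)
{
    return (flags & deferred_mask) != 0;
}

/* Simulates a lun whose only recorded event is a power failure and
 * reports whether it would also appear to have a deferred error. */
int powerfail_looks_deferred(unsigned int deferred_mask)
{
    unsigned int flags = 0;

    flags |= L_EPOWERF;  /* record the power failure only */
    return deferred_error_seen(flags, deferred_mask);
}

/* Self-check of both definitions (hypothetical helper). */
void flag_collision_demo(void)
{
    /* Colliding definition: power-fail is mistaken for a deferred error. */
    assert(powerfail_looks_deferred(L_DEFERRED_ERROR_OLD) == 1);
    /* Distinct definition: the two conditions stay independent. */
    assert(powerfail_looks_deferred(L_DEFERRED_ERROR_NEW) == 0);
}
```

    Calling flag_collision_demo() passes both assertions: with the
    colliding value the power-fail flag is misread, and with a
    distinct value it is not, which matches the resolution of
    redefining L_DEFERRED_ERROR with a different value.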