Patch Name: PHKL_15456
Patch Description: s700 10.20 LVM/UFS/NDDB/flkmgr/VxFS/PCI cumulative patch
Creation Date: 98/05/28
Post Date: 98/05/29
Warning: 98/06/16 - This Critical Warning has been issued by HP.
    - This patch introduces a problem in which once a VxFS filesystem mount fails, for example due to a missing device or a bad superblock, all processes accessing VxFS filesystems will hang over time. This includes processes such as quota and sync, so logins and the syncer may hang.
    - The problem was introduced in patch PHKL_15244 and also exists in the superseding patches: PHKL_15456, PHKL_15321, PHKL_15504, PHKL_15495, PHKL_15492.
    - It is recommended that PHKL_15456, and any of the mentioned patches, be removed from any system utilizing VxFS filesystems.
    - Patch PHKL_15199 will be re-released until a replacement patch is available.
    - Patch PHKL_15456 is not included in any Extension Software Patch Bundle.
Hardware Platforms - OS Releases: s700: 10.20
Products: N/A
Filesets:
    AdvJournalFS.VXFS-ADV-KRN
    JournalFS.VXFS-BASE-KRN
    JournalFS.VXFS-PRG
    LVM.LVM-KRN
    OS-Core.CORE-KRN
    OS-Core.KERN-RUN
    ProgSupport.C-INC
Automatic Reboot?: Yes
Status: General Superseded With Warnings
Critical: Yes
    PHKL_15456: PANIC HANG
    PHKL_15244: PANIC HANG
    PHKL_15199: HANG
    PHKL_15085: PANIC
    PHKL_13868: OTHER Port I/O will not work correctly, machine may panic.
    PHKL_13514: PANIC
    PHKL_11351: PANIC
    PHKL_14856: PANIC This defect was observed only infrequently, on extremely heavily loaded systems.
    PHKL_13986: PANIC
    PHKL_13744: PANIC
    PHKL_13014: PANIC
    PHKL_12306: HANG
    PHKL_11545: PANIC
    PHKL_11332: PANIC
    PHKL_11002: OTHER T600 will not BOOT
    PHKL_10769: PANIC
    PHKL_8908: PANIC CORRUPTION Panic should occur only if root volume is LVM on built-in SCSI in 710, 720, 715/50, 725/50. Corruption should only occur on QUANTUM LPS525S on built-in SCSI in 710, 720, 715/50, 725/50.
PHKL_8187: HANG PHKL_7987: HANG PHKL_14953: HANG io's hang in lvm wait queue when the FC cord link is broken PHKL_14803: HANG PHKL_14568: PANIC Panic while trying to perform an lvmerge operation. PHKL_14613: HANG PHKL_11316: HANG The system hangs when a large number of users try to log in/out of the system simultaneously. PHKL_14490: PANIC HANG PHKL_14321: PANIC PHKL_14225: ABORT PHKL_14126: PANIC CORRUPTION PHKL_14049: PANIC PHKL_13911: PANIC PHKL_13768: OTHER Prevents MC/Service Guard TOC on PA8000 systems. PHKL_13452: PANIC PHKL_13305: HANG PHKL_13260: HANG This patch fixes a defect which could essentially hang the systems with VISUALIZE-FX graphics hardware. The patch in addition to this fixes a performance problem seen on systems that use applications with large number of shared memory segments. The systems spend too much time servicing protection id faults. PHKL_13247: ABORT PANIC PHKL_13206: CORRUPTION a clean file on VxFS can be marked bad. PHKL_13155: HANG PHKL_12963: PANIC PHKL_12901: PANIC PHKL_12662: HANG PHKL_12397: PANIC HANG PHKL_12110: PANIC HANG PHKL_12100: CORRUPTION Kernel portion of fix for unrepairable JFS file system where fsck produces the error message "no valid ILIST in fileset 999". PHKL_12088: OTHER Patch REQUIRED for the OmniStorage 2.20 product PHKL_11860: PANIC PHKL_11766: PANIC PHKL_11733: OTHER Unmountable VxFS (JFS) file system. PHKL_11730: PANIC PHKL_11696: PANIC PHKL_11607: PANIC PHKL_11561: CORRUPTION This patch when used with the appropriate SG/DLM version will avoid any potential windows for data corruption during reconfiguration in a HA cluster env. using SG or DLM. PHKL_11519: ABORT PHKL_11408: CORRUPTION PHKL_11406: CORRUPTION PHKL_11358: PANIC PHKL_11321: PANIC PHKL_11238: PANIC So far, the panic only appears on MP systems running the latest Informix release. PHKL_11164: PANIC PHKL_11085: CORRUPTION From a customer perspective, EMC Symmetrix disks can appear to lose or corrupt data when rare spurious errors are reported. 
The data is actually able to be recovered on the disk, and this patch allows LVM to ignore the fact that the block was once "bad" and obtain the good data from the repaired block. PHKL_11055: HANG PHKL_11039: CORRUPTION This patch fixes data corruption for applications that create large files which keep changing their size dynamically. PHKL_11013: OTHER This problem leaves the file system disabled and unusable, and requires a system reboot to regain access to it. PHKL_10953: HANG PHKL_10932: OTHER HPMC on Emerald-class systems at boot time. PHKL_10930: HANG PHKL_10757: PANIC PHKL_10675: PANIC CORRUPTION PHKL_10643: PANIC PHKL_10554: PANIC The HPMC/panic that is fixed by this patch has been observed only in rare instances on pre-release hardware. However, the potential exists for similar problems in the field only if PHKL_9151 has been applied. This fix should also provide increased performance for PA-8000 systems. PHKL_10452: PANIC PHKL_10288: PANIC PHKL_10257: PANIC PHKL_10234: PANIC PHKL_10199: CORRUPTION PHKL_9931: HANG PHKL_9909: HANG PHKL_9569: HANG PHKL_9517: CORRUPTION PHKL_9372: PANIC PHKL_9370: OTHER Unmountable VxFS (JFS) file system. PHKL_9365: CORRUPTION PHKL_9361: PANIC PHKL_9075: PANIC PHKL_8953: HANG PHKL_8683: PANIC PHKL_8532: CORRUPTION PHKL_8331: CORRUPTION PHKL_8294: HANG PHKL_8203: HANG PHKL_8084: ABORT PHKL_7899: PANIC HANG OTHER The KI instrumentation fix does not fall into any of specified symptoms; the root setuid fix does not either. PHKL_7870: PANIC PHKL_7776: OTHER Unmountable VxFS (JFS) file system. Path Name: /hp-ux_patches/s700/10.X/PHKL_15456 Symptoms: PHKL_15456: SR 5003418244, DTS JAGaa07539: System experiences intermittent hangs; in some cases the system is able to resume processing without intervention in about 20 minutes, but other times it will lock up and need to be TOC'ed. 
The event trace in the dump shows:
    crash event was a TOC
    timespecfix+0x0
    timespecadd+0x30
    ktimer_expire+0x24c
    invoke_callouts+0x160
    softclock+0x38
    sw_service+0x154
    mp_ext_interrupt+0x2a0
At times the system will panic with Spinlock deadlock.
SR 4701391730, DTS DSDe438419: processes may hang doing mmap(2) due to a lock-order violation; the offending process' stack trace will look something like this:
    _swtch+0x138
    _mp_b_sema_sleep+0xe0
    vnode_vas_lock+0x78
    freereg+0x194
    smmap_common+0x5d0
    smmap+0x38
    syscall+0x1a4
    $syscallrtn+0x0
SR 1653177089, DTS DSDe429996: 10.xx read/write via block devices is slower than 9.xx.
PHKL_15244:
    - DTS: DSDe442455 SR: 5003413278 System panics when multiple processes are trying to mount a snapshot file system using the same logical volume.
    - DTS: JAGaa02119 SR: 5003407601 Reboot process hangs on an MP system.
PHKL_15199: SR:1653259408 DTS: DSDe442757 other lab DTS: DSDe441605 and DSDe442781 LVM PVLink switches with failed Fibre-Channel hubs can be very slow. LVM PVsparing does not always start using a spare when a device fails. Hang occurs with heavy i/o stress and patch PHKL_14804 installed. This patch is essential for the use of Fibre-Channel hubs.
PHKL_15145: Flush data to disk before splitting mirrored disks.
PHKL_15085: The system will panic if a graphics process does graphics dma on VISUALIZE-FX hardware, and then execs another program (without forking first).
PHKL_15057: The current first-fit allocation policy for allocating virtual addresses for shared objects (shared memory, memory mapped files, etc.) does not meet certain customers' requirements. It is found to cause fragmentation of the shared virtual space, preventing certain applications from being run.
PHKL_14955: This patch allows the NDDB kernel debugger to debug 10.20 kernels running on PA-RISC 2.0 CPUs. Without this patch, NDDB (and landdb/rsddb) may report incorrect status such as "Breakpoint at 0x0".
The landdb/rsddb kernel debuggers will no longer function after the patch is applied; the NDDB kernel debugger must be used instead. Other functionality provided in this patch includes: o Support the Core I/O 10Base-T LAN card for NDDB communications. o Support configurations where the console is connected to the second RS-232 port. o Support platforms using RS-232 communications on the GSC-to-PCI bus bridge. o Detach the target kernel using the "free" command; reboot the target using the "kill" or "quit" commands. o NDDB "ps", "ktps", and "cpus" commands show info on processes, "kernel threads", and CPUs. o Various bug fixes. PHKL_13868: PCI port I/O does not work properly, and machine may panic. PHKL_13514: With certain hardware configurations, especially on the J2240, it is possible for the system to panic (at bootup) with the message: panic: Could not assign interrupt handler PHKL_11351: (DSDe438658) panic: Data Page Fault with FDDI. T600 will HPMC when HSC (GSC) interface cards using a GSCtoPCI interface chip are installed. (4701362111) Any MP system is likely to panic when multiple HSC (GSC) interface cards using GSCtoPCI interface chips are installed. (DSDe437012) Performance enhancement in PCI services. PHKL_10756: This patch contains enhancements to support the GSCtoPCI PCI Bus Bridge on HP workstations. PHKL_14917: Excessive SCSI timeouts on MO drives. PHKL_14856: machine under a heavy load panics with instruction page fault PHKL_13986: Allows attachment of external nio card cage to HP_HSC to HP_PB converter PHKL_13744: Errors in opening a tape device on the HSC bus may cause a data page fault panic. The panic occurs at st_head_pos+0x5c. PHKL_13014: - 9000/725/B panic on boot. - Tape driver rejects odd-length write requests. - "Unhandled pending interrupt" declaration and SCSI bus reset. PHKL_12660: EINVAL errors on FibreChannel raw disk reads and writes. PHKL_12306: Glance reports 0 MB/s. EINVAL on read/write to FC EMC Symmetrix. System hang. 
Device hang.
PHKL_11632: SCSI bus resets when probing bus with DLT library robotic arm via ioscan.
PHKL_11545:
    - Fixes an intermittent panic when opening a tape device on the HSC bus
    - Improves performance for wide SCSI tape devices connected via the HSC bus
    - Allows DLT4000 devices to perform odd-sized writes
PHKL_11332: (4701356766/ DSDe437007; 4701353888/ DSDe436271) T600 systems panic under heavy I/O load stress testing during lab tests.
PHKL_11002: (4701353888/ DSDe436271) T600 systems will not boot without this patch on 10.20 HP-UX.
PHKL_10769: On C200 and newer workstations with PCI there is a possible HPMC that can occur due to a hardware bug in certain revisions of the PCI bus ASIC. This patch prevents the HPMC from happening.
PHKL_10755: Performance enhancements for PCI SCSI (Ultra-SCSI).
PHKL_10421: When reading a tape on a 7980S tape drive, reading a partial record fails and returns I/O Error. If a tape device receives a bus reset the device will rewind. Following this, when the device is closed, the driver will write EOF marks at the beginning of the tape causing the remaining data to be unusable. When a tape device is opened for write access and the media is write protected the driver returns I/O Error, which can be ambiguous. EPERM (Permission denied) is a more descriptive error.
PHKL_9965: The performance of some drivers (mostly Networking, and for example ATM) was not optimal on Cache Coherent IO systems such as the K-series.
PHKL_8908: "SCSI: Unhandled interrupt" and resulting bus reset can cause panic during boot of 710, 715/50, 720 and 725/50 workstations if root disk is LVM and on built-in SCSI bus. It's theoretically possible for the bus reset to cause data corruption on QUANTUM LPS525S disks on the bus. Some M/O drives will not work on the above systems plus 705, 715/33, 730, 735 and 755.
PHKL_8755: Fixes a bug with Exabyte tape drives that caused append writes (those not at BOT) to be in non-compressed mode when using the BEST density setting.
PHKL_8506: This patch changes the behavior of the open() call with a write protected tape. The open() will now fail with EIO if the mode is not O_RDONLY.
PHKL_8187: Select timeouts are retried forever, i.e. I/O's never complete when a device is removed. SCSI bus hang and reset.
PHKL_8128: Device files other than those that use the BEST density do not work. Opening such a device file returns EINVAL. This happens only on DLT tape drives.
PHKL_8028: Without immediate reporting enabled a DLT tape drive will take several seconds for each filemark written. If a user application is writing many filemarks to a tape the performance will be poor. This patch enables immediate reporting of filemarks, which should improve performance for an application that writes many filemarks to a tape.
PHKL_7987: System hang. Select timeouts with EMC/Symmetrix disk array.
PHKL_14953: io's passing through the LVM layer hang when the pv-link is broken (fc-cord is pulled off). The io's remain hung even if the pv-link is restored.
PHKL_14803:
    SR: 5003407601 DTS: JAGaa02119 System hangs during reboot.
    SR: 5003407619 DTS: JAGaa01516 Under heavy I/O, after the buffer cache allocation has reached its system defined maximum, the system eventually hangs when all I/O processes sleep waiting for buffer cache.
PHKL_14740: When a Service Guard cluster experiences a failure on one node, the volume groups may fail to activate on the surviving node when the shared disks are using Fibre Channel.
PHKL_14685: When the customer installs the flock manager driver, the library libflkmgr.a does not exist. This is seen only in those environments running flock manager.
PHKL_14568: This patch fixes 3 defects:
    - SR:1653254987 DTS:JAGaa01797 MP systems using LVM mirroring could experience panics (data page fault) while doing an lvmerge() operation.
    - SR:4701379347 DTS:DSDe441470 Add flock manager driver functionality
    - SR:1653216952 DTS:DSDe437110 sleep(3C) behaves incorrectly for values larger than (2^32-1)/200 = 21474836 seconds. The man page documents legal values up to 2^32-1 seconds + 10^9-1 nanoseconds.
PHKL_14613: System hangs under extreme load on UFS file system while a logical volume creation is in progress.
PHKL_11316: A system "HANG" occurs whenever a large number of users log in/out at the same time. The password file has a large number of entries and the system can have around 1000 concurrent connections. The problem occurs when many processes try to access the same file concurrently: the inode locking routines deal with the contention very inefficiently.
PHKL_14490: This patch fixes three defects:
    - SR: 4701382564, DTS: DSDe441726 If the kernel free sysmap space falls to 0, the system panics without any prior warning of this condition.
    - SR: 1653251934 DTS: JAGaa01592 The system hangs under heavy I/O involving VxFS filesystems. When the buffer cache virtual space is heavily fragmented and a read-ahead is being done, the system hangs.
    - SR: 1653252544 DTS: DSDe441877 Some user applications will hang with high system load. These processes cannot be killed and the system needs to be rebooted to clear these processes.
PHKL_14323: Command lvlnboot may fail with the following error: "No Boot Logical Volume Configured".
PHKL_14321: This patch fixes two defects:
    - SR: 1653235176, DTS: JAGaa01482 This defect causes a panic in a multi processor system when two processes are doing mmap/munmap on portions of the same file using a sliding window. The user will see the system panic with "panic: rmfree: overlap" message. The stack trace will be as shown below:
        panic+0x10
        rmfree+0x268
        quaddealloc34+0x30
        hdl_detach+0x108
        detachreg+0x3c
        do_munmap+0x190
        do_munmap+0x84
        syscall+0x1a4
    A different panic could occur due to an unrelated race condition when mmap/munmap is called.
    This second panic is a result of a data page fault. The stack trace in this case will be as shown below:
        panic+0x3C
        report_trap_or_int_and_panic+0x8C
        trap+0xC18
        $RDB_trap_patch+0x20
        smmap+0x8F0
        syscall+0x1A4
    - SR: 4701383612, DTS: DSDe441827 The defect's symptoms are that writes to a VME device will succeed on 10.20 but reads will fail with EFAULT.
PHKL_14225: Application experiences random illegal instruction trap when executing cache flushing code.
PHKL_14183: This patch is part of the 10.20 ACE 2 bundle which adds networking enhancements to 10.20. New networking features supported in ACE 2 include NFS Version 3.0, AutoFS, and CacheFS.
PHKL_14126: This patch fixes two defects:
    - SR: 4701381608, DTS: DSDe441567 Patch PHKL_13684 introduced a defect that can break applications which use the vnode layer procedure vn_open(). This can result in a panic due to an invalid address or possibly in data corruption. Currently we are only aware of conflicts with Netware.
    - SR: 1653223404, DTS: DSDe438306 vgchange reports that it could not query one physical volume ("The specified path does not correspond to physical volume attached to this volume group"). After vgchange is issued with the -a y -q n options, the system takes a trap panic.
PHKL_14049: This patch fixes three problems:
    a) SR: 1653247486, DTS: JAGaa01357 For a mirrored LVM root disk containing 2**n extents, if the system is booted in maintenance mode (hpux -lm), it will panic with trap 15 data page fault on the next reboot.
    b) SR: 1653239137, DTS: JAGaa01378 For a root volume group with two disks which are mirror partners, if one disk becomes inaccessible, the system panics on bootup with trap 15 data page fault.
    c) SR: 1653248690, DTS: JAGaa01406 System panics in lv_end() with isr.ior=0.58 data page fault when a bad block is detected on disk.
The console message shows: lv_readvgdats: Could not read VGDA 1 header & trailer from disk H/W path x/x.x.0 (error = 5) lv_readvgdats: Could not read VGDA 2 header & trailer from disk H/W path x/x.x.0 (error = 5) PHKL_14012: This is an enhancement to add the flock manager driver hooks to the kernel. This patch is only needed if you are planning to use the flock manager driver. PHKL_14009: pstat_getlv() returns information about first VG only. PHKL_13911: In a mixed hfs/vxfs environment, the system panics with a dirty invalid buffer which was associated with an hfs fs that has since been unmounted. The typical stack trace of the panic looks like the following: panic+0x14 brelse+0x1ec getnewbuf+0x6b0 ogetblk+0x104 getblk1+0x258 vx_getblk+0x5c PHKL_13874: This patch fixes two problems: a) When running fsadm on a file which was created under JFS version 2, sometimes "Invalid Argument" would be reported b) When running fsadm on a file which was created under JFS version 2, sometimes "No such device or address" would be reported PHKL_13795: This patch is part of the 10.20 ACE 2 bundle which adds networking enhancements to 10.20. New networking features supported in ACE 2 include NFS Version 3.0, AutoFs, and CacheFS. PHKL_13768: Temporary system hang on PA8000 systems, possibly resulting in TOC to preserve system integrity (if running MC/Service Guard). When analysing the system crash dump, a processor would be executing in the kernel routine alloc_large_page(). PHKL_13761: mprotect() system call causes high system time resulting in poor system performance. PHKL_13713: When using chown for certain user IDs, the command would fail. PHKL_13684: In a multi-vendor client-server environment (HP client or HP server), a user-supplied umask is ignored during file creation, even if no default ACL is present. This appears to violate POSIX ACL draft 12. PHKL_13680: 10.xx read/write via block special device file is slower than 9.xx. 
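For the umask handling described under PHKL_13684 above, the behavior a caller would normally expect is that the permission bits of a newly created file equal the requested mode with the umask bits cleared. The following is a minimal sketch of that check, assuming a local filesystem with no default ACL on the scratch directory; the helper name `created_mode` is hypothetical and is not part of the patch or of HP-UX.

```python
import os
import stat
import tempfile


def created_mode(requested_mode, mask):
    """Create a file with requested_mode under umask `mask`.

    Returns the permission bits the new file actually received.
    Hypothetical helper for illustration only.
    """
    # The scratch directory is created before the umask is changed,
    # so the mask under test cannot make the directory unusable.
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "probe")
        old_mask = os.umask(mask)  # os.umask() returns the previous mask
        try:
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY,
                         requested_mode)
            os.close(fd)
        finally:
            os.umask(old_mask)  # always restore the caller's mask
        return stat.S_IMODE(os.stat(path).st_mode)


if __name__ == "__main__":
    # With no default ACL in effect, the result should be 0666 & ~027.
    print(oct(created_mode(0o666, 0o027)))
```

Running the same probe against a directory on the affected client/server combination would be one way to see whether the user-supplied umask is being honored; a result with bits set that the mask should have cleared would match the symptom described.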
PHKL_13508: On a heavily fragmented JFS filesystem bdf and df show only small amounts of available space. df furthermore shows a larger percentage number for minfree although this concept is unknown to JFS. All this is ok on a JFS Version 2 fs, but on a Version 3 fs extents smaller than 8k are available for files.
PHKL_13452: The user will see a data page fault PANIC with vx_attr_alloc(), vx_attr_indadd(), or vx_attr_iget() at the top of the stack. A sample stack trace might look like:
    panic+0x10
    report_trap_or_int_and_panic+0xe8
    trap+0x1054
    $RDB_trap_patch+0x20
    vx_attr_iget+0x90
    vx_iremove_attr+0x358
    vx_attr_uset+0x374
    vx_do_ioctl+0x73c
    vx_ioctl+0x38
    vno_ioctl+0x98
    ioctl+0x444
    syscall+0x1a4
PHKL_13305: This problem manifests as a hang during a reboot operation. The reboot can be initiated with a shutdown command. The problem will only appear on a multi-processor machine. I/O in progress during the reboot either from disk or from the network can cause the primary reboot processor to wait for an indefinite amount of time.
PHKL_13260: This patch fixes three problems:
    a) SR: 1653237842, DTS: JAGaa01160 Slow performance with high system time on PA2.0 and PA1.1 systems. The system slows down under user applications with lots of shared memory segments (like Informix database which uses lots of shared memory segments).
    b) SR: 1653237842, DTS: JAGaa01160 Illegal sharing of protection id's leads to silent data corruption (SHMEM_MAGIC applications).
    c) SR: 4701373969, DTS: DSDe440766 This defect causes VISUALIZE-FX graphics hardware to lock up. If the VISUALIZE-FX device is the console for the machine, this will render the machine unusable unless it can be reached over a network (in which case killing and restarting the X server should fix the problem). CTRL-SHIFT-RESET from the console keyboard will not terminate the X server in this case.
PHKL_13247: The shared PV LINKS will not switch when needed.
The activation of the volume group using vgchange -a s may fail on one of the nodes if the command is being run simultaneously on all the nodes. HPMC or bad pointer panic in FibreChannel driver on ServiceGuard cluster fail-over. Failure to activate volume group (any I/O connect type) on ServiceGuard cluster fail-over.
PHKL_13237: If the serialize() command is executed as 'serialize ls', the command fails with the error number set to EINVAL. (User id must be 0, or the user must belong to a group that has privilege to execute the serialize command.) The serialize command should have serialized the target process, in this case the command 'ls', and should have listed the content of the current directory.
PHKL_13206: When getdirentries() is incorrectly called with a regular file on a VxFS (JFS) file system, one of the following messages will be reported and the clean file can be marked bad.
    "vxfs: mesg 008: vx_direrr - file system inode X block Y error 6"
    or
    "vxfs: mesg 017: vx_dirbread - file system inode X marked bad"
    or
    "vxfs: mesg 016: vx_ilisterr - file system error reading inode X"
PHKL_13155: Processes hang intermittently due to process deactivation and reactivation.
PHKL_12997: When dump is configured past 2GB on a device connected to a HSC F/W SCSI interface card, the device fails to configure:
    WARNING: Dump space on device cannot be configured past 2097151 bytes. Device skipped.
PHKL_12963: In the past, a process running in a group could consume more than just its own group's entitlement if excess CPU cycles were available. This change allows this to optionally be disallowed / capped within PRM. Also, there was a problem when PRM, thread stealing and processor affinity were used/occurred on a system at the same time. In this case, it was possible for a processor to not find a process, which could cause a panic.
PHKL_12901: A kernel stack overflow panic occurs when the stack is valid and all the 3 pages allocated for kernel stack are consumed.
The defect is seen when a combination of NFS, LVM, and JFS related modules is called.
PHKL_12669: This patch addresses problems in three areas:
    1. There is no warning to the user when bad block alternates are allocated inside the user data area.
    2. There is no way to use the lv timeout feature when the async driver minor number is not 0.
    3. Add a new lvol flag in lv_lvsubr.c to support a command patch.
PHKL_12662: HFS file system may hang on system reboot.
PHKL_12633: SHMEM_MAGIC executables suffer from unacceptable performance, greatly impairing their usefulness.
PHKL_12601: VxFS systems do not allow a process to ftruncate() a file it has opened, write-locked and read from. When the program fails, errno is set to 11, which means the resource is temporarily unavailable.
PHKL_12409: Applications relying on alarm() returning a number greater than zero will fail if there was time remaining on the cancelled alarm and they were trying to re-schedule the original alarm. alarm() returns "zero" when cancelling a previous alarm() with an alarm pending.
PHKL_12397: This patch fixes two defects:
    - System panic (data page fault) when debugging processes over an interruptible NFS mount point.
    - After call to pstat_getmsg(), all accesses to the message queue hang.
PHKL_12378: Multiprocessor systems running applications that require many files with filenames longer than 14 characters to be accessed will experience severe CPU contention. Netscape Mail server v3.0 with Netscape mail server Benchmark puts this in evidence by reporting a very low message receive/deliver rate. Systems with large buffer cache will have spinlock contention problem.
PHKL_12217: The tar command can go into a closed loop, backing up the same file over and over on an nfs mounted filesystem. So far this problem has been seen only on nfs filesystems exported from SGI IRIX 6.2 systems.
PHKL_12110: System hangs on UP systems or spinlock deadlocks on MP systems, when using nanosleep system call.
Signals that were supposed to be ignored while in nanosleep() woke the process even though they should not have. This patch is especially critical once newer libc patches are installed on the system. The first releases of libc to be critical are PHCO_10384 for 10.10, PHCO_11004 for 10.20 and PHCO_10652 for 10.01.
PHKL_12100: Kernel part of the fix for the slow fsck problem: command patch PHCO_12922. The following problems are fixed:
    1) Under certain circumstances after a hard failure, VxFS fsck log replay would fail, requiring a full fsck.
    2) On large file systems containing a very large number of files, full fsck would run extremely slowly.
PHKL_12088: The DMAPI functionality delivered with HP-UX 10.20 OnlineJFS delivers a Kernel-space library, but does not include a User-level library. OmniStorage 2.20 requires a User-space DMAPI library to function successfully.
PHKL_12073: With sa_flags (in sigaction structure) set to SA_SIGINFO, after a child process abnormally terminates without a core file being generated, the si_code number of the siginfo_t structure is supposed to contain CLD_KILLED. This failed to happen when the child process abnormally terminated with a SIGPOLL signal.
PHKL_12042: Resource-intensive processes (such as an Informix oninit) either perform poorly over time (as timeshare) or monopolize the system (as realtime). Also, MP systems show processes frequently being moved from one CPU to another.
PHKL_11902: pid information is missing from a diagnostic message which tries to explain to a user that their process does not have the correct locking privileges required for using large text pages.
PHKL_11860: A system with PHKL_9371 installed panics during reboot if the reboot command is used. This patch fixes this problem.
PHKL_11766: 1.
vgextend command will complain "too many links" if the user wants to add an additional physical volume to a volume group, and the total number of physical volumes and their links adds up to the maximum number of physical volumes allowed in a volume group.
    2. Trap panic in lv_resyncpv from LVM. Lots of "pvnum is POWERFAILED" messages.
PHKL_11733: After a hard failure (panic/hang/TOC), a JFS file system may not mount and will return the following error message: vxfs mount: %s is not a vxfs file system. This could even happen with patches PHKL_9371 and PHCO_11223 installed. A full fsck does not clean the file system; a newfs/mkfs is the only solution.
PHKL_11730: Data page fault in bwrite.
PHKL_11696: panic: hdl_alloc_spaceid: spacemap exhausted
PHKL_11637: CDROM drive remains locked when system is rebooted.
PHKL_11614: Accounting does not work for the clients (diskless) in a cluster. The accounting file does not get updated for users other than ROOT when accounting is ON (when the pacct file is on an NFS file system).
PHKL_11607: System may panic in vx_itimes() when mounting a JFS file system after a boot from a hard failure.
PHKL_11561: A customer might find corrupt data on disk after a Service Guard or Distributed Lock Manager fail-over. This defect is specific to the HA cluster environments.
PHKL_11519: On a NFS server with a VxFS file system exported, the file /var/adm/nettl.LOG0x grows dramatically with the recording of "read failed with errno 22" (EINVAL). This problem is addressed by kernel patch PHKL_11322 among other problems. However, after the patch is installed, some NFS clients experience command coredump on the NFS mounted VxFS disk.
PHKL_11471: Quota command shows poor performance on a busy system under JFS.
PHKL_11408: Corruption of memory pages on systems with PA-RISC 2.0 cpu. Problem should happen rarely and only under extreme memory stress.
PHKL_11406: When using large environments greater than 20 kbytes, user applications sometimes dump core or get bad data.
PHKL_11358: A panic would result when trying to rename a vnode which was a UFS mount point under a JFS directory. The reverse did not cause a panic but was not being handled properly either. PHKL_11339: A process launched from shell sees (getrlimit) limits set in the shell via ulimit -t but ignores them. When a process forks, the child sees the limits set by the parent via setrlimit but ignores them PHKL_11321: This patch fixes two JFS performance problems and one defect: 1) Upgrading from 10.10 to 10.20, customer experienced approximately 25% performance degradation in read operation under JFS. 2) Loading a program for a second and subsequent time takes much longer if the program is using shared libraries on JFS. 3) Occasional panic occurs when running an executable which attempts to pagein through vnode level, producing a stack: trap+0xf84 0.0xe21c4 0x6954.0x7ffe75b8 ... $RDB_trap_patch+0x20 0.0x22608 0x6954.0x7ffe75b8 ... lbcopy_fp_method1+0x58 0.0x222b98 0x6954.0x7ffe7108 ... privlbcopy+0x1c 0.0x222fc4 0x6954.0x7ffe70c8 ... uiomove+0x4f0 0.0xfabe8 0x6954.0x7ffe7008 ... vx_read1+0x26c 0.0xb9744 0x6954.0x7ffe6ec8 ... vx_rdwr+0x16c 0.0xc5164 0x6954.0x7ffe6e08 ... vn_rdwr+0x8c 0.0x107b04 0x6954.0x7ffe6d48 ... mfs_hpux_pagein+0x448 0.0x1cbb5c 0x6954.0x7ffe6c88 ... virtual_fault+0x9a0 0.0xf1fc8 0x6954.0x7ffe6b48 ... vfault+0x104 0.0xf281c 0x6954.0x7ffe6ac8 ... trap+0x129c and the message: trap type 15, pcsq.pcoq = 0.222b98, isr.ior = 18e.4179000 savestate ptr = 0x7ffe7108, savestate return ptr = 0x222fc4 9245XB HP-UX (B.10.20) #1: Sun Jun 9 06:31:19 PDT 1996 panic: (display==0xbd00, flags==0x0) Data page fault This is mostly seen on PureAtria's Clearcase product which sets the UIOSEG_PAGEIN flag for vn_rdwr() calls. PHKL_11247: On a VxFS with quota turned on, users who do not have quota set receive "write: Disk quota exceeded" error message when creating files on this file system. 
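The PHKL_11339 entry above distinguishes between a process seeing a limit and the limit being enforced. The "seeing" half can be illustrated with a short sketch: a limit set with setrlimit() is inherited across fork(), so the child's getrlimit() reports the parent's value. This is only an illustration under stated assumptions (a POSIX system with Python's `resource` module; the function name `limit_seen_by_child` is hypothetical). It does not exercise enforcement, which is the kernel's job and is what the defect concerned.

```python
import os
import resource


def limit_seen_by_child(soft_seconds):
    """Return the soft RLIMIT_CPU that a forked child observes.

    A helper process (standing in for the shell) sets its soft CPU
    limit, forks, and the child reports what getrlimit() returns.
    The calling process's own limits are left untouched.
    """
    read_fd, write_fd = os.pipe()
    helper = os.fork()
    if helper == 0:
        status = 1
        try:
            os.close(read_fd)
            _, hard = resource.getrlimit(resource.RLIMIT_CPU)
            # Equivalent in spirit to "ulimit -t" in a shell.
            resource.setrlimit(resource.RLIMIT_CPU, (soft_seconds, hard))
            child = os.fork()
            if child == 0:
                soft, _ = resource.getrlimit(resource.RLIMIT_CPU)
                os.write(write_fd, str(soft).encode())
                os._exit(0)
            os.waitpid(child, 0)
            status = 0
        finally:
            os._exit(status)  # never fall back into the caller's code
    os.close(write_fd)
    with os.fdopen(read_fd) as pipe:
        reported = pipe.read()  # EOF once helper and child have exited
    os.waitpid(helper, 0)
    return int(reported)


if __name__ == "__main__":
    print(limit_seen_by_child(3600))
```

The requested value must not exceed the hard limit already in effect, otherwise setrlimit() fails in the helper and nothing is reported. On a system showing the PHKL_11339 symptom, the reported value would still look correct; the problem would only become visible when the process ran past the limit without being signalled.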
PHKL_11244: When accessing an address returned by mmap(), the application gets a SIGBUS error and core is dumped. PHKL_11238: Panic on S800 MP systems using 10.01+ with latest Informix PHKL_11164: The system message buffer would show many "sysmap: rmap ovflo, lost [...]". Eventually, the system would panic with "kalloc: out of kernel virtual space." This problem would only be seen on PA8000 systems. PHKL_11085: On very rare occasions EMC Symmetrix disk drives will report a disk error which causes LVM to mark the block as bad in its bad block directory. The Symmetrix drive can be "repaired" online by EMC support, but the entry in the LVM bad block directory will prevent any further I/O to the affected block. This patch enables a new relocation policy which will prevent entries from being added to the bad block directory. In order to make use of this new relocation policy, a commands patch, PHCO_10826 must also be installed. Also, algorithms within LVM that deal with PVLinks had built in the assumption that NIKE serial numbers were unique. This turned out to not be the case. The only time that the serial numbers need to be unique is in OPS clusters. This patch removes this restriction for all non-OPS cluster environments. PHKL_11055: Large memory systems could hang while trying to allocate kernel memory. A TOC crash dump would show a processor executing in alloc_large_page() while other processors would be spinning waiting for the pfdat_lock spinlock. This problem would only be seen on PA8000 systems. PHKL_11039: JFS file system corruption when running applications using file truncation on large files. The system message buffer will show: "vxfs: mesg 017: vx_trunc - /mnt file system inode 123 marked bad. vxfs: mesg 016: vx_ilisterr - /mnt file system error reading inode 123". It also fixes a malloc panic which was found while fixing this defect. 
PHKL_11013: The vxupgrade command, used to convert a JFS version 2 file system to a version 3, can give an I/O error on execution. This causes the file system to be marked as unavailable, and a full fsck to be run. PHKL_11006: timer_settime(2) does not set a 10ms timer interval properly. The smallest interval that can be set is 20ms. PHKL_10966: When running fsadm to reorganize extents (de-fragment) the command fails with error: exclusion zone for bno = 323584, len = 2048 failed This error message does not occur every time. It has been observed when running fsadm on file systems which contain directories with many small files. PHKL_10953: Severe system hang: the crash dump (TOC) would show many threads waiting for the filesystem alpha semaphore. The kernel stack trace of the thread owning filesys_sema would show bc_yield() calling swtch(). PHKL_10932: (4701353078/DSDe436182) Emerald-class (890, T5x0, T600) systems will experience an HPMC at boot when IODC for memory controllers is being accessed. Note: if you are not experiencing this problem now, then your memory controllers are not subject to this problem. (This problem is NOT intermittent.) PHKL_10930: The system hangs when trying to dump core, as the result of a system panic or TOC. (Normally, it would dump core and reboot.) PHKL_10821: Although users can now exec() programs with up to 2048000 bytes of argument and env strings, sysconf(_SC_ARG_MAX) continues to return 20480 bytes as the maximum length of all arguments and env strings. PHKL_10800: Audit records contain relative path names, and the user cannot tell where they are anchored. This fix prepends the cwd to the relative path name, yielding a complete absolute path. PHKL_10757: Workstation Additional Core Enhancements for HP-UX 10.20 (July 1997). This patch provides additional enhancements to support new HP workstations.
The primary change is the addition of a new signal (SIGGFAULT) and virtual memory subsystem changes to support virtual device locking for new VISUALIZE-FX graphics hardware. It also contains two bug fixes: one for the MP PDIR bug (could panic the system) and the other for the pstat_cmd() panic. PHKL_10689: HP-UX didn't log any error when a user process: 1. encounters a swap space shortage 2. exceeds a system resource limitation Processes were terminated but the errors were not recorded on any of the system log files. PHKL_10675: (1) System may panic with "panic: sync not stale" while running lvmerge (merging LVM mirrors). For the panic to occur, an i/o timeout must occur during the lvmerge operation which results in a resync getting scheduled. (2) Potential data corruption if user i/o's encounter errors to the same extents which are being reimaged by lvmerge. (3) Various panics during vg activation(vgchange -a). A bit is cleared in a kernel structure which LVM has already freed. If another kernel subsystem has been allocated the freed memory before the bit is cleared, panics or other strange behaviors may occur. This particular defect was introduced by PHKL_9000. PHKL_10643: System panic with Memory Mapped Files on UFS filesystem. A typical kernel stack trace would show a data page fault panic in hdl_unsetbits() called from async_pagein_comp(). PHKL_10554: PA-8000 performance; fix kernel-assisted branch prediction. Corrects corner case condition that causes an HPMC. The stack trace would point to module flip_comb(). This corner case has only been seen in lab-internal testing, using pre-release hardware. It has not been seen on any customer's system. PHKL_10452: Panic: kernel stack overflow; trace includes lv_end(). PHKL_10316: When ptrace is called from the DDE debugger while the DDE debugger has watchpoints set, the ptrace system call is called to single step the user process. 
If the ptrace call is handling a user signal and another signal event is processed before returning to the user process from ptrace, ptrace may set the user's save_state program counter to an incorrect value and return EIO to the parent debugger. PHKL_10288: Panic trap 15 in bwrite() under heavy disk I/O stress. PHKL_10257: Panic with "vn_rele" with EXEC_MAGIC executable run over NFS PHKL_10234: panic: kernel scheduler interrupt PHKL_10199: 10.20 JFS version 3 file system returns the following file allocation error on file systems larger than 64Gb before the file system is actually full: vxfs: msg001: vx_nospace - /dev/vgXX/lvolY file system full (1 block extent) The following console message may also be seen: vxfs: mesg 003: vx_mapbad - /dev/vgXX/lvolY file system free extent bitmap in au nnnn marked bad PHKL_10176: The total length (including terminators) of all argv and env strings passed to a newly-EXECed process was limited to 20480 bytes. If a greater length was detected, the exec() failed with E2BIG. PHKL_10064: Setting a negative timestamp with utime() fails PHKL_9931: VxFS hangs waiting for I/O to finish. PHKL_9919: Timing differences between CPUs are too large, causing the MI Daemon to die frequently (often in less than 15 minutes). PHKL_9909: A deadlock can occur on a system running LVM, JFS and HFS. The hang is caused by one process running lvmerge on HFS logical volumes and another process running umount on a JFS logical volume. This deadlock can only occur with the following scenario: (1). Process A is running a lvmerge or a lvsplit on a HFS logical volume (2). Process B is running a mount, umount or sync on a JFS logical volume. PHKL_9711: Each time edquota -t is invoked for a VxFS file system, it resets the previously defined file system time limit back to default (7 days).
PHKL_9569: This patch addresses 2 distinct VxFS (JFS) symptoms: - When trying to take a file-system snapshot, the mount command could fail with the following error message: # mount -F vxfs -o snapof=/dev/vg00/vxonline \ /dev/vg00/vxbackup /vxbackup vxfs mount: /dev/vg00/vxbackup is already mounted, /vxbackup is busy, or allowable number of mount points exceeded - The system could hang when manipulating directories. PHKL_9529: vgdisplay(1M)/vgextend(1M) show incorrect value for max number of PE per PV. PHKL_9517: VxFS file system corruption can occur when running the reorg option of the fsadm command on a busy file system. System diagnostic messages from the dmesg command will include the following: vxfs: mesg 008: vx_direrr - /??? file system inode X block Y error 6 vxfs: mesg 017: vx_iread - /??? file system inode X marked bad vxfs: mesg 016: vx_ilisterr - /??? file system error reading inode X vxfs: mesg 008: vx_direrr - /??? file system inode X block Y error 6 vxfs: mesg 017: vx_iread - /??? file system error reading inode X . . . PHKL_9372: Panic: "data page fault" when using fsadm to resize a mounted VxFS filesystem with disk quotas. PHKL_9370: If a customer upgrades from 10.01 or 10.10 to 10.20, and tries to increase their VxFS file systems via SAM, the file system will not mount after completion of extending the volume and file system. The typical approach for SAM is: 1) unmount the file system 2) lvextend the volume 3) extendfs the file system 4) mount the file system The mount will always fail in this case. PHKL_9365: The only symptom is random, spurious, rare instances of data corruption, usually in I/O to devices. This is rare enough (and masked by normal system configurations) that it has not been observed in customer systems, only within HP. 
PHKL_9361: MP machine used as a nfs client panics with: panic: (display==0xb800, flags==0x0) Data page fault panic stack: crash event was a panic panic+0x10 report_trap_or_int_and_panic+0x8c trap+0xbfc $call_trap+0x20 binvalfree_vfs+0x5c rinval+0x10 nfs_unmount+0x20 umount_vfs+0x5c umount_root+0x20 umount+0x98 syscall+0x1a4 PHKL_9273: On MP systems with several processors, applications which do file locking frequently can perform poorly. The symptoms are a high switch rate (switch rate > syscall rate) and heavy system activity (%sys > 90%). PHKL_9153: Add write-gathering support for NFS servers. PHKL_9151: This patch includes 3 separate performance enhancements. All are targetted for PA-RISC 8000 processors. 1. Kernel-assisted branch prediction. 2. bl->bll branch stub elimination. 3. Re-positioning perf-critical kernel assembly routines. PHKL_9075: Applications using Memory Mapped Files were performing poorly when mapping thousands of pregions to the same file. The problem would mainly be noticed with shared (MAP_SHARED) and exclusive (MAP_FIXED with address in the process private data space) mappings. This patch is required when using the Object Store database product from ODI. Additionally, this patch provides an enhancement to the mprotect(2) system call: mprotect(2) used to fail protecting non mmap(2)'ed addresses. This patch enables to mprotect(2) data, stack and shared memory segment addresses. Finally, this patch fixes a kernel panic with large buffer cache: kernel panic with a data page fault when attempting to copy data from the last page of the third quadrant. This will only occur on systems with a buffer cache of one gigabyte or larger. The panic message will display the following: isr.ior = 0.bffffffc running strings on a raw sar(1) output file can show some printable strings (sar ignores these). 
PHKL_8999: Without this patch customers are limited to supporting 2 nodes in a shared environment. With this patch customers can now use SLVM in a 4 node cluster. Alternate links for devices such as the Nike disk array are now supported in a shared environment. This change supports a new -t switch for lvchange allowing the administrator the option to limit the time lvm holds i/os to be retried on logical volumes when disks are powerfailed. Without using this option, LVM will hold the i/os as long as there is one disk where the data resides which may eventually return. Using this option would cause LVM to give up on the powerfailed disk and return i/o errors to the user application using the logical volume. This feature is obviously not to be used indiscriminately. For many High Availability applications, having i/os held in kernel indefinitely is not acceptable. Most customers should not need to use the new switch. PHKL_8953: The K400 4-way suddenly stopped working. The user heavily accessed a vxfs snapshot filesystem and ran syncs in parallel. We TOC'ed the system. PHKL_8716: After a call to pstat_getmsg(), all accesses to the message queue on which pstat_getmsg() was called hang. PHKL_8683: System panic with data page fault on ICS. PHKL_8532: System crash dumps are corrupt and unusable. PHKL_8481: JFS performance in 10.20 has improved disk i/o throughput at the cost of extra CPU utilization. This patch improves JFS performance by implementing a log buffer re-use scheme and also by making modifications to the read/write sleep lock primitives. PHKL_8346: Executables cannot access more than 1.75 GB shared memory PHKL_8331: Data loss with UFS files using fragments. PHKL_8294: When multiple nfsd's access the same file simultaneously, they hang in a deadlock. PHKL_8203: MP system hangs during panic. The LED shows system staying at INIT CB0B. Machine needs to be TOC'ed to save the core dump.
PHKL_8084: LVM may return I/O's with errors instead of sending them to an alternate link. This patch also facilitates using "vgreduce -f" for physical volumes which have alternate links; without this patch "vgreduce -f" is not allowed on LVM disks with alternate links. PHKL_7951: Ptrace not allowing writes on PCUX to some f.p regs PHKL_7899: KI queuedone, enqueue and queuestart traces on JFS may contain NULL values in the b_apid and b_upid fields. With a system running a heavy load using JFS on LVM, the following panic may occur: "lv_syncx: returning error: 5" Systems running JFS may hang due to a deadlock problem. The setuid bit will be removed on a JFS file when the file is edited or text has been appended to it when run as root. PHKL_7870: lvreduce(1M) may cause a system panic, if it is used to reduce an lvol which was left inconsistent by a prior LVM operation. lvreduce(1M) could not be used to remove lvols that were somehow corrupted; if it was, the command would cause a system panic. PHKL_7776: It's possible for a JFS file system to get into a state where it can't be mounted (except read-only), but fsck(1M) does not report any problem with it. At mount time, the kernel prints the following warning on the console: vxfs: mesg 012: vx_iget - file system invalid inode number XXX and mount(1M) fails with the following message: vxfs mount: /dev/XXX is not a vxfs file system. Once a file system has gotten into this state, it remains unmountable, even after running fsck(1M). Defect Description: PHKL_15456: SR 5003418244, DTS JAGaa07539: In 10.X, setitimer() does not check for the validity of user arguments properly. When an application uses setitimer(2) with a negative interrupt timer value, it causes the kernel timer expiration routine to spin in a loop while holding the timer callout_lock. If any other process tries to grab the callout_lock at this time, it will result in a spinlock deadlock panic.
The fix is to check the validity of the arguments and return an EINVAL for negative values. SR 4701391730, DTS DSDe438419: Under certain conditions, the system could violate internal lock-order rules while mmap(2)ing; the fix was to give up and re-acquire the region lock to avoid this. SR 1653177089, DTS DSDe429996: In 10.xx buffer caching was disabled for block devices. This produced degraded performance in reads/writes to block devices; this patch re-enables buffer caching for block devices that are not mounted as filesystems, and is a re-implementation of PHKL_13680. PHKL_15244: - DTS: DSDe442455 SR: 5003413278 When multiple processes compete in mounting a snapshot file system using the same logical volume, a race condition in JFS causes buffers of one of the snapshots to contain partially initialized data. Flushing these buffers results in a system panic. - DTS: JAGaa02119 SR: 5003407601 During reboot, the reboot processor is handling the reboot process, while all other processors are in idle spin. When the reboot process runs into a situation that it has to wait (e.g. biowait() for I/O completion) and gets switched out, the reboot processor at this time enters idle() and it should pick up the runnable threads from the other processors and complete the processing. The defect is that the reboot processor fails to schedule runnable threads belonging to the other processors. If the reboot's biowait() depends on the completion of other threads, the system hangs indefinitely. PHKL_15199: LVM recovered PVLinks serially, so in situations where 100's of links can fail simultaneously (e.g. Fibre-Channel hubs) the time to switch all the links can be hours. The PVSparing defect resulted because when a lock could not be obtained for a key data structure the sparing code assumed that no spares were available instead of retrying the operation at a later time. The UFS defect appeared to have been introduced with PHKL_14804.
Excessive rmprobe calls were being made even when the VM for buffers exceeded the buffer_low watermark. PHKL_15145: dirty buffers are not flushed to mirrored disk on lvsplit PHKL_15085: The graphics_exit hook was being called in the wrong place. It was called after all of the pregions were released in the exec path, so graphics dma buffers were not properly cleaned up, and then graphics_exit could not find them when it was called. The problem can be reproduced by having a process do graphics dma with VISUALIZE-FX hardware and then call exec without first forking. PHKL_15057: The current first fit policy for allocating virtual addresses for shared objects causes a form of fragmentation that is preventing some users' applications from running. Basically since the algorithm is first-fit, small allocation requests can cause a large block of free virtual address range to be broken up into small ranges, thereby preventing bigger allocation requests from succeeding. PHKL_14955: Any 10.20 kernel booted on PA-RISC 2.0 will not be debuggable, due to firmware changes in PCX-U. PHKL_13868: Incorrect values were being written to port I/O registers. PHKL_13514: This defect is due to an error in the Dino CDIO regarding the reservation and assignment of interrupt resources. PHKL_11351: The GSC "I/O Adapter" was not designed to support one of the GSC control lines used by the GSCtoPCI interface chip. The GSCtoPCI services now avoid using this control line. Shared IRQ panics are due to a subtle bug in the way interrupt service routine lists are managed. (DSDe437012) Shortened code paths in PCI services. All HP-UX PCI drivers will benefit from this. (DSDe438658) Added support for an additional return value from PCI services (to FDDI driver). PHKL_10756: Support for GSCtoPCI PCI Bus Bridge in "card mode". PHKL_14917: All SCSI timeouts were set to 30 seconds. With this patch, we identify the MO drives and set a longer (5 mins) timeout for those drives.
PHKL_14856: racing condition on shared interrupt bit caused the driver to mishandle setting up future IO job PHKL_13986: In K & D class computers the kernel will panic when an HP-PB external card cage is powered on and attached to the HP-HSC to HP-PB converter. This is caused by the unpatched code erroneously applying all type-B DMA devices to the first IOA instead of treating each IOA as a separate device. PHKL_13744: When an error occurs early in the open process the routine st_dev_sense can get called without any valid sense data. The sense pointer is NULL, and is passed to st_head_pos which tries to dereference the pointer. PHKL_13014: - scsi_wakeup missing; required by ISDN driver - scsi_iodone calls bp->b_iodone directly when B_CALL is set rather than calling biodone and letting it call bp->b_iodone - disk driver uses its own global variable, sd_ki, instead of the system defined global variable, km_disable, for disabling kernel metrics; and it examines sd_ki only on first open - incorrect initialization of NCR 53C700 SCSI chip for systems with sclk <= 37.5 MHz. 705/35, 715/33, 715/75, 725/75, 730/66, 735/99, 735/125/B, 750/66, 755/99, and 755/125 - SCSI bus reset while bus closed results in incorrect "Unandled pending interrupt" declaration and second bus reset. - scsi_tape driver rejects odd-length write requests on wide SCSI busses - scsi_c720 driver reports incorrect number of bytes transferred when IWR is used PHKL_12660: Disk driver, sdisk, proceeds with partial open rather than failing the open in response to SCTL_INCOMPLETE on TUR or Read Capacity during first open. PHKL_12306: bp->b_resid updated after km_io_done. Select timeout not being retried during sd_open. Unhandled interrupt on 53C895 in single-ended and/or high voltage differential mode. scsi_update_tag_state not always getting called to restart device after Queue Full. PHKL_11632: Driver was not enabling LASI SCSI burst-mode transfers. 
Driver was using uninitialized variables to initialize chip registers on first open. DLT library robotics do not do synchronous transfers which exposed the bug. PHKL_11545: - The tape driver will occasionally panic on the open of a tape device which is attached to the HSC bus. The panic is caused when a tape device responds with a "check condition" very early in the open process. The problem is intermittent and cannot be reliably reproduced. - Patches PHKL_10417 through PHKL_10422 changed the way the driver negotiated with the device for narrow (8 bit) or wide (16 bit) transfers. This fixed a problem with 7980S tape drives, but caused the driver to never correctly negotiate for wide transfers. Any wide device on a wide interface would suffer a performance degradation because the driver was throttling it down to 8 bit transfers instead of 16 bit transfers. - Because of problems with transferring an odd number of bytes over a wide bus, the driver prevents doing an odd sized write to a wide device. The DLT4000 is a narrow device, but it is differential so it attaches to a wide bus. The driver was looking at the bus size, rather than the device type, to determine the transfer size and so it was blocking odd-sized transfers to DLT4000 devices even though these should be allowed. Writing an odd-sized record to a DLT4000 would return an EIO error. PHKL_11332: (4701356766/ DSDe437007; 4701353888/ DSDe436271) I/O stress tests panic on some of the internal T600 test machines. PHKL_11002: (4701353888/ DSDe436271) This patch is needed for 10.20 HP-UX bring up on T600 systems. PHKL_10769: Certain revisions of the PCI bus ASIC can cause an HPMC due to an incorrectly decoded memory address. Only one 4K page of memory per 256 Megabytes of real memory is affected. The fix involves mapping the page(s) out so that they are unavailable to the system. PHKL_10755: The new B180 and C200 workstations include PCI Ultra SCSI devices.
The SCSI subsystem has been enhanced to provide optimal performance for these devices. PHKL_10421: 7980S problem was caused by a change in the SCSI interface driver which caused the interface to negotiate for synchronous transfers even when the driver had not enabled that negotiation. The 7980S drive is not SCSI-2 compliant, and has problems with synchronous negotiation in some places. Other changes were enhancements - no defect. PHKL_9965: Drivers using the IO_CONTIGUOUS flag were not delivering optimal performance on Cache Coherent IO platforms. The CCIO mapping routine was making unnecessary calls to other low level subroutines. PHKL_8908: The c720 driver does not lisc->sclk soon enough. Chip timing parameters are set up incorrectly. PHKL_8755: The Exabyte bug can be reproduced by writing a large (100 Mb) file to an Exabyte drive using the 'BEST' device file in 'no rewind' mode, then writing the same file again to the same device. The first write will be substantially faster because it is compressed while following writes are not compressed. PHKL_8506: Before this patch the open() call did not look at media write protection. A write() to a write protected tape would fail, but an open() with FWRITE mode would succeed. This change was made to make the GSC driver behave the same as the NIO driver. PHKL_8187: Select timeouts are retried forever. B_NDELAY should eliminate retries on select timeout. Zalon chip bug results in SCSI bus hang. PHKL_8128: A 'break' statement was missing from the end of a switch, causing the code to fall through to an error return. Device files which specify 'BEST' density work, but a DLT device with a density other than BEST will not open. PHKL_8028: A flag in the driver indicates, for each device type, whether or not immediate reporting should be enabled for filemarks. That flag was not being set for DLT drives. To reproduce, write a short C program that writes 20 blocks of 1K bytes, each separated by a filemark.
Performance will be substantially better with this patch applied. PHKL_7987: Bug in scsi_schedule_retry causes hang. 10 ms select timeout period is too short. PHKL_14953: io's passing through the LVM layer in the process of being sent to the disk hang when the pv-link is broken by pulling the fc-cord. The io's wait on the wait queue as the primary lvm link is broken and the disk is also not accessible. But the lvm code was incorrectly leaving the disk marked not accessible even after the link switch, so after the alternate link is restored, the disk is still not accessible to those io's. PHKL_14803: SR: 5003407601 DTS: JAGaa02119 The reboot process is stuck at biowait() while sync'ing disks because of a failure in acquiring an LVM physical buffer. The lv_reschedule() routine needs to ensure a successful retry in this case because there is no other process to trigger the emptying of the physical buffer wait queue during reboot. SR: 5003407619 DTS: JAGaa01516 When buffer cache allocation reaches the defined maximum, the allocation of new buffers from system memory stops. When buffers are freed, their physical memory spaces, instead of being returned to the system, are saved in a pool to be reused for new buffers. A bug in the getnewbuf() routine disallows the reclaiming of this available memory when bufpages reaches the ceiling. Furthermore, another bug in the routine that frees up space for the buffer cache resource map erroneously releases more buffers than necessary. The prohibition on reusing memory from the reserved pool and the loss of free buffers result in the freelist running out of buffers, causing all I/O processes to hang in sleep wait. PHKL_14740: Device resets on Fibre Channel devices can overlap and interfere with subsequent volume group activation. PHKL_14685: The master file had to be changed to point to libhp-ux.a instead of libflkmgr.a.
PHKL_14568: - SR:1653254987 DTS:JAGaa01797 The problem is an MP race condition caused by a defect in the routine lv_tbl_reimage_realloc(). It is freeing the lv_bitmap before pausing the logical volume. This results in I/Os possibly reaching scheduling routines (such as lv_parallel()) while lv_tbl_reimage_realloc() is in the middle of freeing the lv_bitmap. The fix is to call lv_pause() before calling KFREE() on lv_bitmap. - SR:4701379347 DTS:DSDe441470 Add flock manager driver functionality to the kernel. - SR:1653216952 DTS:DSDe437110 The data path in the kernel code that supports sleep(3C) is not wide enough to support parameter values larger than 21474836 seconds. If a larger value is passed, incorrect arithmetic is performed, with results varying from immediate return (with or without error) to sleeping for the wrong length of time. PHKL_14613: This system hang is hard to reproduce. Once in a while, during heavy load on UFS file systems if a logical volume is being created or extended, the UFS code deadlocks due to ordering problems in acquiring the inode lock and the device vnode lock. PHKL_11316: Classic "thundering herd" problem was made worse by the fact that the first thing each woken process did was to try to lock a "COMMON" spinlock. A process that gets the spinlock first, locks the INODE and releases the spinlock has to try hard to be able to get the spinlock a second time while it tries to unlock the same inode. Hence the inode was locked whilst no useful work was being done. PHKL_14490: -SR: 4701382564, DTS: DSDe441726 If the free kernel sysmap space, as a percentage of 1GB, falls below the threshold value indicated by the variable kern_vm_pct, a warning message is now printed to indicate this. The variable kern_vm_scan_rate, in seconds, is used to set the frequency at which this check is performed. The variables kern_vm_pct and kern_vm_scan_rate are tunables that can be set in the /stand/system file.
Reproduce: Run an application that uses up a lot of kernel sysmap space, or run a defective kernel module that uses up kernel sysmap space continuously without releasing the allocated space. -SR: 1653251934 DTS: JAGaa01592 A uniprocessor system hangs when heavy I/O is performed. When the buffer cache virtual space is heavily fragmented and we are doing a readahead, the system hangs. The allocbuf1() won't do a bcfeeding_frenzy() to de-frag because readahead sets BX_NONBLOCK|BX_NOBUFWAIT flags. Instead it returns an error to ogetblk(). A bug there keeps us looping instead of returning to the original caller. Reproduce: On a UP K-series system with 64Mb, create a 400 Mb VxFS file system mounted at /new_fs. Make sure that /usr is also of type VxFS. The following code will produce the hang. while true; do rm -rf /new_fs/*; cp -r /usr/* /new_fs; done & while true; do date; sleep 10; done & The system will hang in about 15 minutes. - SR: 1653252544 DTS: DSDe441877 In the routine csuperpage_lock(), if an attempt to acquire the pfdat lock on every pfd in the superpage containing the input page frame number fails, then in some cases the lock on the original input page frame number is not released. Any process that subsequently tries to acquire the same pfdat lock on the same page frame will hang. Reproduction: difficult to reproduce the failure case. Requires execution of a large number of user applications that subject the system to a high stress load. PHKL_14323: mkboot -p uses the block device. Because block devices use the buffer cache, the label may be overwritten with incorrect label information once the buffer cache is sync'ed. The solution involved removing use of buffer cache by block devices until a mkboot patch is available. PHKL_14321: -SR: 1653235176, DTS: JAGaa01482 Both the problems (panics) discussed earlier occur in a multiprocessor system when two processes are doing map/munmap on portions of the same file using a sliding window.
Panic-1 -------- The panic is caused by a race condition in hdl_mmf_attach() in hdl_policy.c in the machine directory. The race condition is in case-3 of "overlapping" pregions. In this case, the new process' pregion starts within and extends past the existing process pregion. In the original implementation,we were releasing the region lock to call mapvnode(). At that time, the new pregion is not yet placed on the region's pregion-list and hence opens a window for a race condition. With the fix, the pregion is placed on the region's pregion-list before releasing the region lock, thereby eliminating the race condition. Panic-2 -------- The second panic is caused by a race condition in the error path of smmap() in vm_map.c in the sys directory. In a code segment following the label "bad", in the case where vnode is associated with the region, we release the region lock to be able to call dectext(), we copy the file descriptor from the vas, acquire the file-table lock and then check that the file descriptor is not zero before proceeding further. In the meantime, the file descriptor could have been closed and hence dereferencing it would cause a data page fault. To avoid this race condition, we need to postpone calling dectext(), which is what the fix does. -SR #4701383612 maps to DTS #DSDe441827 The problem is due to the space register not being set before jumping to the code that handles ulbcopies to I/O space. The fix is to move one line of assembler up above the check to see if this is a ulbcopy to I/O space. PHKL_14225: When disallowing user write permission for copy-on-write, an execution permission is added to every translation. For pages which do not have execution right before, this misses the R/W to execute promotion through fault and the proper cache flushing. PHKL_14183: New functionality to support networking features in 10.20. 
PHKL_14126: This patch fixes two defects : - SR: 4701381608, DTS: DSDe441567 Patch PHKL_13684 introduced a problem with procedure vn_open(). By adding one extra argument to this procedure, compatibility with older users of vn_open (now believed to be limited to Netware) was broken. Because of this extra argument it is possible that the vnode returned by this procedure can be a random memory location. This patch backs out the addition of the extra argument and it restores compatibility. - SR: 1653223404, DTS: DSDe438306 Trap panic in lv_init_immediate_reporting at 10.20 because currentPhysicalLink field in the struct pvol associated with the PV which the vgcfgrestore had been performed on was NULL. The problem will only be seen on 10.20 with patch PHKL_8999 or superseding patches as this patch introduced this new pointer field to the struct pvol. The panic problem will only be seen on multi-PV VGs. PHKL_14049: This patch fixes three defects: 1) SR: 1653247486, DTS: JAGaa01357 After a maintenance mode reboot, if the root LV is mirrored, we make the copy of root on the boot device the only fresh copy. Before updating the VGSA with the new stale/fresh information, a structure used to pass physical extent info to the configuration is set up. In the case of a LV containing 2**n extents, memory allocated for the array of structures is exactly enough for the extents. The terminator of the array was written beyond the allocated memory. This corrupts the memory at the next address and causes system to panic when the next piece of memory is accessed. 2) SR: 1653239137, DTS: JAGaa01378 When the root VG has a mirror PV missing with a lower PV index number than the boot disc, the PV's current physical link field is zero. The code attempts to dereference this null pointer in the bootup path and traps. 3) SR: 1653248690, DTS: JAGaa01378 The problem is caused by a disc with bad blocks in the LVM structure area. 
As a result, the logical volume field in the physical request buffer is zero. Dereferencing this null pointer causes a data page fault.

PHKL_14012:
This is an enhancement to add the flock manager driver hooks to the kernel.

PHKL_14009:
The pstat_lvinfo() algorithm specifies that if the number of entries requested is non-zero, it will traverse all the Volume Groups (VG) to report the open logical volume information. The test (lvix >= vgp->num_lvols) is used to check whether the LV index is covered by the VG. This should be (lvix > cur_lvs), which compares lvix to the number of open logical volumes. Also, there can be some volumes that are configured but not mounted (the VG where they reside is still ACTIVE). The fix now shows all the logical volumes that are open in the system (within any Volume Group). The defect can be reproduced by writing a program based on pstat_getlvinfo() to display information about Logical Volumes configured on a system. The output of this program only shows logical volumes for the first volume group.

PHKL_13911:
Essentially, in getnewbuf, we might set a B_DELWRI buffer to B_BUSY, but later decide NOT to write it out (in fixing some deadlock problem). Rather, we simply brelse it. If an umount process for the device or filesystem comes in between the time of setting B_BUSY and the brelse(), it will skip the buffer, thinking that someone else is flushing it out. Later, when it calls binval() assuming all buffers have been written out, it may invalidate a dirty buffer.

PHKL_13874:
There were two defects fixed by this patch:

a) Report of "Invalid Argument"
The problem occurs when reorganizing a version 2 JFS with EXT4 type of extents from an original indirect extent to a reorg'ed indirect extent. In this case, vx_reorg_emap() does not allocate a fixed size extent for the reorg'ed inode's indirect data extent. The incorrect size causes vx_enter_ext4() to fail to enter the allocated extent into the extent map, resulting in an EINVAL error.
b) Report of "No such device or address"
This was also caused by incorrectly handling the indirect extent of an IORG_EXT4 type of file (also created under version 2). In this case vx_reorg_copy() would blindly attempt to copy pad bytes if the indirect extent of the original file was not completely filled. (Pad bytes are used in the last indirect extent if the data does not fit exactly. This is a consequence of having a fixed size for all indirect extents of the IORG_EXT4 type of file.)

PHKL_13795:
New functionality to support networking features in 10.20.

PHKL_13768:
On PA8000 systems, when free physical memory becomes heavily fragmented, the time needed to find free large pages (superpages) increases drastically. During this time (possibly several seconds) the kernel would preempt any user or system processes. This could result in MC/Service Guard TOC'ing the system. The fix was to yield the cpu when spending too much time in alloc_large_page().

PHKL_13761:
Use of the global mprot_list_lock caused spinlock contention, since it is a single lock per system, and hence poor system performance. It is still used to protect mprot_list. The fix was to use another pool of locks called prp_hash_locks for protection ids. The hashing function chooses different locks (from this pool of hash locks) for different ranges of addresses, thus removing the dependency on the various locks that must be acquired before the protection ids for a given page are changed.

PHKL_13713:
The defect was due to an uninitialized field (ex_elen) in the vx_extent structure when allocated by the vx_dqnewid procedure.

PHKL_13684:
The system supplies a "default" ACL even if none has been configured. This in turn overrides any umask, and produces unexpected file access behaviors.

PHKL_13680:
In 10.xx buffer caching was disabled for block devices. This produced degraded performance in reads/writes to block devices.

PHKL_13508:
vx_statvfs doesn't count extents smaller than 8k for f_bavail.
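The hashed lock pool described under PHKL_13761 above can be illustrated with a small sketch. The patch text does not give the hashing function, the pool size, or the address granularity, so everything here is an assumption: the function name prp_hash_index(), a pool of 64 locks, and 4 KB address ranges are hypothetical and only show the general idea of mapping an address range to one lock out of a pool.

```c
#include <stddef.h>

/* Hypothetical values: the real pool size and range size are not
 * stated in the patch text. */
#define PRP_HASH_LOCKS      64UL  /* number of locks in the pool (power of 2) */
#define PRP_RANGE_SHIFT     12UL  /* assumed 4 KB address ranges */

/*
 * Map a virtual address to an index into the lock pool.
 * Every address in the same range selects the same lock, while
 * neighbouring ranges select different locks, so unrelated pages
 * do not contend on one system-wide lock.
 */
size_t prp_hash_index(unsigned long addr)
{
    return (size_t)((addr >> PRP_RANGE_SHIFT) & (PRP_HASH_LOCKS - 1UL));
}
```

A caller would take the lock at the returned index before changing the protection ids for a page in that range; a single global lock would still be needed for any list that is shared across all ranges, as the text notes for mprot_list.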
PHKL_13452: When changing multiple attributes on a file, VxFS (JFS) code creates a "ghost" inode, or working copy to make changes. When the changes are complete the ghost inode is swapped into the real inode, thus making the changes visible in the file system. Some parts of the "ghost" inode may not be set by the time other parts of the VxFS code try to use them. These uninitialized parts of the "ghost" inode cause a data page fault and panic when they are referenced before they are initialized. When only one attribute is changed, no "ghost" inode is created. Changes are made directly to the inode involved. PHKL_13305: The defect is that the scheduling algorithm in idle() was preventing the reboot processor from picking up the thread that needed to complete processing because that thread was locked to another processor. SR: 1653096131 1653138164 1653166066 1653166496 1653166983 1653177089 1653179895 1653183699 1653189738 1653192294 1653194555 1653194977 1653199802 1653202754 1653207175 1653211607 1653213082 1653214338 1653215020 1653215467 1653216077 1653216606 1653216952 1653218065 1653220079 1653221820 1653221895 1653223404 1653227983 1653228106 1653230771 1653235176 1653237842 1653239137 1653247486 1653248690 1653253229 1653254987 1653258681 1653259408 1653260158 4701327338 4701327544 4701329292 4701329300 4701329417 4701329433 4701329441 4701330647 4701333419 4701334367 4701334698 4701334847 4701334995 4701335497 4701335935 4701336412 4701337394 4701339226 4701341362 4701341479 4701341644 4701341669 4701342089 4701342121 4701342147 4701344515 4701345843 4701346122 4701346791 4701347922 4701348359 4701349175 4701349431 4701350157 4701350975 4701351932 4701352278 4701352799 4701353078 4701353094 4701353102 4701353888 4701354274 4701354803 4701354837 4701354845 4701355123 4701355321 4701355560 4701355610 4701356766 4701356931 4701358143 4701358523 4701360693 4701360925 4701361188 4701361444 4701361758 4701362111 4701364182 4701365114 4701365791 4701371294 
4701371617 4701372276 4701373050 4701374520 4701375816 4701375956 4701376269 4701376863 4701377226 4701377580 4701378117 4701379347 4701379354 4701380519 4701380808 4701381608 4701382564 4701383315 4701383612 4701388975 5000698738 5000716225 5003281469 5003314633 5003317487 5003318667 5003323493 5003325506 5003328237 5003329078 5003330746 5003330910 5003334961 5003341925 5003344630 5003345496 5003348425 5003353797 5003356345 5003357616 5003359414 5003359489 5003360024 5003360446 5003361766 5003363523 5003363820 5003364224 5003365692 5003366500 5003366971 5003367888 5003367979 5003368290 5003379156 5003380113 5003384586 5003385203 5003385393 5003387019 5003387183 5003397174 5003398800 5003399188 5003407221 5003407601 5003407619 5003409185 5003413278 5003418244 4701391730 1653177089 Patch Files: what(1) Output: cksum(1) Output: Patch Conflicts: None Patch Dependencies: s700: 10.20: PHCO_12922 PHCO_8871 PHNE_13245 Hardware Dependencies: None Other Dependencies: None Supersedes: PHKL_7776 PHKL_7870 PHKL_7899 PHKL_7951 PHKL_7987 PHKL_8028 PHKL_8084 PHKL_8128 PHKL_8187 PHKL_8203 PHKL_8294 PHKL_8331 PHKL_8346 PHKL_8481 PHKL_8506 PHKL_8532 PHKL_8683 PHKL_8716 PHKL_8755 PHKL_8908 PHKL_8953 PHKL_8999 PHKL_9075 PHKL_9151 PHKL_9153 PHKL_9273 PHKL_9361 PHKL_9365 PHKL_9370 PHKL_9372 PHKL_9517 PHKL_9529 PHKL_9569 PHKL_9711 PHKL_9909 PHKL_9919 PHKL_9931 PHKL_9965 PHKL_10064 PHKL_10176 PHKL_10199 PHKL_10234 PHKL_10257 PHKL_10288 PHKL_10316 PHKL_10421 PHKL_10452 PHKL_10554 PHKL_10643 PHKL_10675 PHKL_10689 PHKL_10755 PHKL_10756 PHKL_10757 PHKL_10769 PHKL_10800 PHKL_10821 PHKL_10930 PHKL_10932 PHKL_10953 PHKL_10966 PHKL_11002 PHKL_11006 PHKL_11013 PHKL_11039 PHKL_11055 PHKL_11085 PHKL_11164 PHKL_11238 PHKL_11244 PHKL_11247 PHKL_11316 PHKL_11321 PHKL_11332 PHKL_11339 PHKL_11351 PHKL_11358 PHKL_11406 PHKL_11408 PHKL_11471 PHKL_11519 PHKL_11545 PHKL_11561 PHKL_11607 PHKL_11614 PHKL_11632 PHKL_11637 PHKL_11696 PHKL_11730 PHKL_11733 PHKL_11766 PHKL_11860 PHKL_11902 PHKL_12042 PHKL_12073 
PHKL_12088 PHKL_12100 PHKL_12110 PHKL_12217 PHKL_12306 PHKL_12378 PHKL_12397 PHKL_12409 PHKL_12601 PHKL_12633 PHKL_12660 PHKL_12662 PHKL_12669 PHKL_12901 PHKL_12963 PHKL_12997 PHKL_13014 PHKL_13155 PHKL_13206 PHKL_13237 PHKL_13247 PHKL_13260 PHKL_13305 PHKL_13452 PHKL_13508 PHKL_13514 PHKL_13680 PHKL_13684 PHKL_13713 PHKL_13744 PHKL_13761 PHKL_13768 PHKL_13795 PHKL_13868 PHKL_13874 PHKL_13911 PHKL_13986 PHKL_14009 PHKL_14012 PHKL_14049 PHKL_14126 PHKL_14183 PHKL_14225 PHKL_14321 PHKL_14323 PHKL_14490 PHKL_14568 PHKL_14613 PHKL_14685 PHKL_14740 PHKL_14803 PHKL_14856 PHKL_14917 PHKL_14953 PHKL_14955 PHKL_15057 PHKL_15085 PHKL_15145 PHKL_15199 PHKL_15244

Equivalent Patches: None

Patch Package Size: 3800 KBytes

Installation Instructions:
Please review all instructions and the Hewlett-Packard SupportLine User Guide or your Hewlett-Packard support terms and conditions for precautions, scope of license, restrictions, and limitation of liability and warranties, before installing this patch.
------------------------------------------------------------
1. Back up your system before installing a patch.

2. Login as root.

3. Copy the patch to the /tmp directory.

4. Move to the /tmp directory and unshar the patch:

    cd /tmp
    sh PHKL_15456

5a. For a standalone system, run swinstall to install the patch:

    swinstall -x autoreboot=true -x match_target=true \
        -s /tmp/PHKL_15456.depot

5b. For a homogeneous NFS Diskless cluster run swcluster on the server to install the patch on the server and the clients:

    swcluster -i -b

This will invoke swcluster in the interactive mode and force all clients to be shut down.

WARNING: All cluster clients must be shut down prior to the patch installation. Installing the patch while the clients are booted is unsupported and can lead to serious problems.
The swcluster command will invoke an swinstall session in which you must specify:

    alternate root path - default is /export/shared_root/OS_700
    source depot path - /tmp/PHKL_15456.depot

To complete the installation, select the patch by choosing "Actions -> Match What Target Has" and then "Actions -> Install" from the Menubar.

5c. For a heterogeneous NFS Diskless cluster:
    - run swinstall on the server as in step 5a to install the patch on the cluster server.
    - run swcluster on the server as in step 5b to install the patch on the cluster clients.

By default swinstall will archive the original software in /var/adm/sw/patch/PHKL_15456. If you do not wish to retain a copy of the original software, you can create an empty file named /var/adm/sw/patch/PATCH_NOSAVE.

Warning: If this file exists when a patch is installed, the patch cannot be deinstalled. Please be careful when using this feature.

It is recommended that you move the PHKL_15456.text file to /var/adm/sw/patch for future reference.

To put this patch on a magnetic tape and install from the tape drive, use the command:

    dd if=/tmp/PHKL_15456.depot of=/dev/rmt/0m bs=2k

Special Installation Instructions:
The global variable (flag) `use_bestfit' determines what allocation policy will be used. The default setting of this flag is 0, resulting in first-fit being used for all allocations for shared virtual addresses. To enable the system to use best-fit, this flag (use_bestfit) must be set to a non-zero value (say 1) using `adb'. The value can be set/reset at any time during system operation. The policy that will be used is based on the current value of this flag. The patch is applied to the kernel being debugged on the target machine, not the host machine where the debugger is run.

- SR: 4701382564, DTS: DSDe441726
Two tunable variables have been defined for this patch. They are:

kern_vm_pct - the threshold percentage.
When the free sysmap space as a percentage of 1 GB falls below the value of kern_vm_pct, a warning message is printed to notify the user of this condition.

kern_vm_scan_rate - the time interval in seconds between subsequent checks that statdaemon makes to determine whether the free kernel sysmap space as a percentage of 1 GB is below the threshold percentage (kern_vm_pct); if it is below the threshold, the warning message about the condition is printed.

By default both kern_vm_pct and kern_vm_scan_rate are set to 0, i.e. by default the monitoring of the free kernel sysmap space is turned off. To turn on the feature you need to set both the kern_vm_scan_rate and kern_vm_pct variables to non-zero values. For example, in the file /stand/system:

    * Tunable parameters
    kern_vm_pct 10
    kern_vm_scan_rate 10

Threshold percentage is 10%; scan rate is 10 seconds.

CAUTION: Failure to follow the instructions in this section could result in undesired system behavior up to and including data corruption or a system panic!

This kernel patch needs to work with the command patch PHCO_12922; please install PHCO_12922 with this patch. Installed alone, this kernel patch will not solve the fsck problem.

---

If you are planning to install the advanced VxFS product (AdvJournalFS.VXFS-ADV-KRN), it is imperative that this patch, and all listed superseded patches, be removed from the system via swremove(1M) before the actual product installation. After the installation of the advanced VxFS product has completed, this patch can be re-installed. (It is not necessary to re-install superseded patches.) All patches listed in the Supersedes field are subject to this behavior and need to be removed before installing the advanced VxFS product. After running swremove(1M), use the swlist(1M) command to ensure that none of the previous patches were restored during the removal process. If one was, remove it using swremove(1M).

---

When this patch is installed the default environment size is 20478 bytes.
To enable the system to use the larger environment size of 2048000 bytes, the following steps must be followed.

1. A new tunable called `large_ncargs_enabled' must be defined in the system file in the following manner:

    large_ncargs_enabled 1

2. A new kernel must be built (using this system file) and the system rebooted.

To return to the default environment size, the new tunable needs to be either removed from the system file, or its value set to zero. A new kernel should then be built (using the modified system file) and the machine rebooted.

---

Due to the number of objects in this patch, the customization phase of the update may take more than 10 minutes. During that time the system will not appear to make forward progress, but it will actually be installing the objects.
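
The threshold rule for the kern_vm_pct and kern_vm_scan_rate tunables, described under Special Installation Instructions, can be worked through with a short sketch. This is not the statdaemon code: the function name sysmap_below_threshold() and its free_bytes argument are hypothetical, and only the rule stated in the text is modelled (free space taken as a percentage of 1 GB, a warning when that percentage is below kern_vm_pct, and monitoring off when either tunable is 0).

```c
/* 1 GB, the base against which the text measures free sysmap space. */
#define SYSMAP_BASE_BYTES (1024ULL * 1024ULL * 1024ULL)

/*
 * Hypothetical helper: returns 1 when the warning message would be
 * printed, 0 otherwise.
 *   free_bytes        - free kernel sysmap space in bytes (assumed input)
 *   kern_vm_pct       - threshold percentage tunable
 *   kern_vm_scan_rate - scan interval tunable, in seconds
 */
int sysmap_below_threshold(unsigned long free_bytes,
                           unsigned int kern_vm_pct,
                           unsigned int kern_vm_scan_rate)
{
    unsigned long long free_pct;

    /* Monitoring is off unless both tunables are non-zero. */
    if (kern_vm_pct == 0 || kern_vm_scan_rate == 0)
        return 0;

    /* Whole-number percentage of 1 GB; 64-bit math avoids overflow. */
    free_pct = ((unsigned long long)free_bytes * 100ULL) / SYSMAP_BASE_BYTES;

    return free_pct < kern_vm_pct;
}
```

With the example settings from the text (kern_vm_pct 10, kern_vm_scan_rate 10), 50 MB of free sysmap space is about 4% of 1 GB, so the check made every 10 seconds would report the condition; 512 MB free (50%) would not.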