Patch Name: PHNE_17510

Patch Description: s700_800 10.20 R4.4 SNAplus Link cumulative patch

Creation Date: 00/03/07

Post Date: 00/09/28

Hardware Platforms - OS Releases:
    s700: 10.20
    s800: 10.20

Products:
    SNAP-LINK R4.4

Filesets:
    SNAP-LINK.SNAP-LINK

Automatic Reboot?: Yes

Status: General Release

Critical:
    Yes
    PHNE_17510: PANIC
    PHNE_17509: PANIC HANG
    PHNE_15608: PANIC HANG
    PHNE_13765: PANIC
    PHNE_9659: PANIC

Path Name: /hp-ux_patches/s700_800/10.X/PHNE_17510

Symptoms:
    PHNE_17510:
    (1) JAGab64407/1653293522
    The system panics when there are network problems on a token ring link. The stack shows s1pugeti called from s1pncrct and s1psntfy.

    PHNE_17509:
    (1) 5003439372
    The system hangs when the SNAplus daemon is started. When the system hangs, the stack of the sna process is:
        LEVEL FUNC
        lev 0) ccio_unmap+0xec
        lev 1) nio_initialize+0x3ec
        lev 2) psi0_ioctl+0x938
        lev 3) spec_ioctl+0xd4
        lev 4) vno_ioctl+0x98
        lev 5) ioctl+0x78
        lev 6) Enf_ioctl+0xf4
        lev 7) syscall+0x1a4
        lev 8) $syscallrtn+0x0
    (2) 5003437145
    System panic with s1pnsgsv+0x214 on the stack, most likely occurring after a link failure. The dump file shows that we are processing data on a mode control block that has been corrupted (several fields have been 'randomly' incremented).
    (3) 5003423095
    System panic when shutting down the system while SNA is running LU6.2 transactions (s1pxabnd on top of the stack).
    (4) 1653285890
    System panic while processing a received XID. The stack trace was:
        $call_trap+0x20
        s1pcrqbd+0x14
        s1pncrct+0x183c
        s1pncfgr+0x20c
        s1pgdisp+0x1dc
        sna_1_sbpsched+0x110
        sna_1_sbpikusv+0x50
    (5) 1653274928
    Panic from sna_q_sbpikuwp+0x6fd on a system running SNAPLUS R4.2 with a QLLC link.

    PHNE_15608:
    (1) 5003403691
    If an attach arrives and is queued for a TP that is currently processing another conversation, and that TP then exits without issuing a RCV_ALLOC verb to pick up the attach, it takes 30 seconds before the attach manager spawns a new copy of the TP to handle it.
    (2) 4701394676
    Incorrect management of the I/O unmapping mechanism within the PSI library can cause system hangs when starting the SNAplus daemon and initialising the PSI card.
    (3) 1653273201
    Unwanted trace entries in the syslog file from libpsi0.a.
    (4) 1653264416
    A system panic can occur whenever a QLLC link is closed (either when shutting down the system or when deactivating the link). This is a window condition and does not happen on every close-down.
    (5) 1653260166
    SNAplus sends large SDLC segments despite a configured BTU size of 265. This results in the host bringing down the link or losing data.

    PHNE_13765:
    (1) 5003387670
    Firmware:
    ---------
    1- The EISA card was never configured to use FULL-DUPLEX.
    2- The EISA card could not handle an IN frame and an OUT frame at the same time (i.e. on the same interrupt).
    Driver:
    -------
    1- The card cannot handle an IN frame during a shutdown: if the driver acknowledges the end of the IN frame with the same interrupt it uses to acknowledge the shutdown, the firmware cannot handle it.
    2- A global variable used in the open function can be corrupted by an interrupt, leading to a kernel panic.
    3- The driver can spend too long in the level 5 interrupt function and crash the kernel.
    (2) 5003382994
    When starting a node with initially active connection(s), it may report error SNA0039. The initially active connections come up correctly on a retry.
    (3) 1653237644
    Crash during start-up (link activation), caused by receiving a BIND with a bad URC field that does not match the URC from the INIT-SELF. The failure is in s1pcasrp or s1psineg.
    (4) 1653234948
    SNAplusLink R4.4 EISA PSI: RTS is not raised when Constant Carrier is configured.

    PHNE_12816:
    (1) 1653227611
    Parallel mode with initially active sessions. The host resets all sessions with UNBIND (CICS is taken down for the night), and the automatic CNOS is also rejected with UNBIND. When CICS restarts and the session limit is set, one session fails to activate. Each night another session is lost, and the lost sessions cannot be activated with snamanage.
    (2) 5003378166
    Multipoint SDLC could not be used with the link configured as full duplex, because this required constant carrier to be configured. However, constant carrier does not work with multipoint SDLC. RTS is no longer set on continually when SNAplus is configured for constant carrier.

    PHNE_9886:
    (1) 1653207761
    A queued, operator-started TP hangs for 30 seconds when RECEIVE_ALLOCATE is issued, even though the attach has already been received. In fact, the attach was received while the previous copy of the TP (which does not issue a second RCV_ALLOC) was still running. The DLOAD is retransmitted by the 2.1 DMOD on a 30-second timer, which then satisfies the RECEIVE_ALLOCATE.
    (2) 1653212332
    Certain TN3270 products will not connect to the TN Server because we do not support the correct TN3270 regime.
    (3) 1653215681
    A raw VTAM application (not CICS) treats our deactivation of the SNASVCMG session (when it is the only one) as a fatal error for conversations on that mode. Other SNA implementations do not deactivate SNASVCMG sessions.
    (4) 1653219204
    Link failures on a 64K SDLC link on a D-class system, while the link works fine at 19.2K, or at 64K on an E-class system. The SDLC link fails when 3270 sessions are activated, with the message 'LOST INTERUPT' in the SNA trace files.

    PHNE_9659:
    (1) 1653185140
    1/ SNAplus takes the X25 address from the X121 address field instead of the X121 packet address field in the X25 configuration file.
    2/ Incoming calls are rejected when an SNA subaddress is configured.
    (2) 1653201301
    Occasional crash on reboot when there is a small flurry of activity (the RJE workstation is killed and we issue TERM-SELF messages for the LUs). We temporarily enter a flow control condition, which is released after the SNA software has been terminated (the router terminate log is queued in the kernel). The token ring glue module then tries to process the flow control release event with terminated data structures, and the panic occurs.
    PHNE_9554:
    (1) 1653180323
    An APPC application fails to pick up an incoming attach after running for some time.
    (2) 1653185306
    An N_PVC_DETACH indication was treated like an N_PVC_DETACH confirmation, so the outage was not reported to the node.
    (3) 1653191924
    On R4.2, a customer configured an incoming peer SDLC connection. DTR was raised, but instead of frames being received, RX overrun and lost interrupt events occurred. Short frames could be received, and frames could be transmitted.
    (4) 1653183947
    The problem occurs because the customer has an unusual configuration and then hits a particular error sequence. Specifically, the customer has a dependent LU6.2 local LU with a single remote LU and two associated modes. One of these modes (APPC4K) has an initially active session configured (session limits 1,1,0,1) and the other mode (APPC1K) has an on-demand session (limits 1,0,0,0). The problem only occurs with the modes in this order (sorted alphabetically by mode name). First, an attempt to activate the SDLC connection times out (so the initially active mode is marked as needing a retry). An Allocate for the on-demand mode then arrives before the connection is retried. When the ACTLU for the LU arrives, we try to activate both modes (internally queuing an INIT-SELF for APPC1K) and enter an infinite loop (the kernel trace buffer fills with SNA0026 logs for APPC4K).
    (5) 1653179788
    A kernel panic with the following stack trace uniquely identifies this problem:
        panic+0x10
        report_trap_or_int_and_panic+0x8c
        trap+0xbf0
        $RDB_trap_patch+0x20
        s1pupbnd+0xd4
        s1pucsc+0x2f4
        s1pusvc+0x3e4
        s1pgdisp+0x20c
    (6) 4701335000
    Customers may get a variety of communication path errors on any of the services; this is most likely to affect those running a large number of TPs.
    (7) 5003320341
    SNAplus R4.1 Ethernet (802.3) connections fail to recover if the host link is inactive for more than 1.5 hours.
    The R4.1 system reports the following error in sna.aud every 10 seconds once the 1.5 hours have elapsed:
        LAN T10SNA9706: Exceeded maximum connections allowed for 802.3 link LAN on node NODE1
    After this error is recorded, the only way to recover the connection is to stop and start the link used by that connection.
    (8) 1653180125
    panic: (display==0xbf00, flags==0x0) Data segmentation fault
    Stack as follows starting sp=0x7ffe6fc0
        panic+0x1c ( arguments not stored )
            pc=0x2625c0, pfmp=0x7ffe6f60, psp=0x7ffe6f80
        trap+0xaac ( arguments not stored )
            pc=0x1c15f0, pfmp=0x7ffe6ea0, psp=0x7ffe6ec0
        trap marker save state 0x7ffe6c90 sp 0x7ffe6ec0 framesize 0x230
        s1pxabnd+0x1b1 ( 0x7ffe1712 ,0x003d0002 ,0x00000000 ,0x7ffe0149 )
            pc=0x147ffc, pfmp=0x7ffe6bf0, psp=0x7ffe6c10
        s1pxsnd+0x8c9 ( arguments not stored )
            pc=0x144f64, pfmp=0x7ffe6b70, psp=0x7ffe6b90
        s1pgdisp+0x385 ( 0x004d0028 ,0x7ffe6afc ,0x7ffe0001 ,0x660000fd )
            pc=0xe3e94, pfmp=0x7ffe6b30, psp=0x7ffe6b50
        sna_1_sbpsched+0x10d ( arguments not stored )
            pc=0x158ca8, pfmp=0x7ffe6ab0, psp=0x7ffe6ad0
        sna_1_sbpikusv+0x55 ( 0x01263074 ,0x00000000 ,0x00000098 ,0x00004321 )
            pc=0x152518, pfmp=0x7ffe6a70, psp=0x7ffe6a90
        sq_wrapper+0x5c ( arguments not stored )
            pc=0x93188, pfmp=0x7ffe6a30, psp=0x7ffe6a50
        csq_lateral+0x80 ( arguments not stored )
            pc=0x96174, pfmp=0x7ffe69b0, psp=0x7ffe69d0
        runq_run+0x58 ( arguments not stored )
            pc=0x930d4, pfmp=0x7ffe6970, psp=0x7ffe6990
        str_sched_daemon+0x1b0 ( arguments not stored )
            pc=0x934e0, pfmp=0x7ffe68b0, psp=0x7ffe68d0
        main+0xa04 ( arguments not stored )
            pc=0x24c814, pfmp=0x7ffe67f0, psp=0x7ffe6810
        $vstart+0x3d ( arguments not stored )
            pc=0x1b1fa4, pfmp=0x7ffe67c0, psp=0x7ffe67e0
        istackatbase+0x88 ( arguments not stored )
            pc=0x1c5f0, pfmp=0xffffffe0, psp=0x0
    (9) 5003287276
    3270 sessions are not completely logged off when the user exits 3270 using Ctrl-C or from the File pull-down menu.
    (10) 4701325399
    APPC TPs do not work when one of the machines is migrated from the R4 line of releases to the R5 line.
    (11) 1653153957
    Only one session binds successfully when multiple sessions try to use the APPC default LU pool.

Defect Description:
    PHNE_17510:
    (1) JAGab64407/1653293522
    System panic caused by invalid messages (XID and NOTIFY) received as a result of an (unresolved) token ring problem.
    Resolution: Protection code has been added so that invalid messages are parsed safely.

    PHNE_17509:
    (1) 5003439372
    The problem is caused by an IOVA not being correctly unmapped in the nio_initialize() routine.
    Correction: The previous code unmapped the step address, which had never been IOVA-mapped:

        if (step->physical_addr != 0) {
            l_io_vec.iov_base = (caddr_t) step;
            l_io_vec.iov_len = sizeof (struct psi0_step);
            sio_unmap(pda->mgr_port_num, &l_io_vec);
        }

    Because the step's IOVA is held in step->physical_addr, it is step->physical_addr that should be unmapped, not step.
    Resolution: The libpsi0.a code has been modified as follows:

        if (step->physical_addr != 0) {
            l_io_vec.iov_base = (caddr_t) step->physical_addr;
            l_io_vec.iov_len = sizeof (struct psi0_step);
            sio_unmap(pda->mgr_port_num, &l_io_vec);
        }

    (2) 5003437145
    A system panic occurs whose basic symptom is that fields in one of our mode control blocks have been incremented to invalid values.
    Resolution: We have seen in the past that the LAN link types send statistics messages to the 2.1 node whenever a link outage occurs. Unfortunately, these statistics messages expose a bug in the 2.1 node whereby it can increment areas of memory well beyond the end of the statistics control blocks (in fact, in the area of memory where the mode control blocks are stored). This code change corrects that bug.
    (3) 5003423095
    A system panic occurs when shutting down the system if an attempt is made to deallocate a session that is no longer active.
    Resolution: Before sending details of a deallocating process to the RM component of the 2.1 node, we now check that there is still an active session for the process. If there is not, we simply close down and free up resources.
    (4) 1653285890
    A system panic occurred because the PU2.1 received an XID with attached signalling information. However, the signalling information was incorrectly formatted and had an invalid length field at the beginning, which caused the PU2.1 to overwrite the stack and crash.
    Resolution: A code change ensures that the PU2.1 can handle badly formatted signalling information without crashing: it simply rejects the XID.
    (5) 1653274928
    A trap situation was encountered in the QLLC shutdown code, resulting in a system panic.
    Resolution: This code has already been rewritten for SR 1653264416, so this trap situation no longer occurs.

    PHNE_15608:
    (1) 5003403691
    Code changed so that when a TP exits, we look for outstanding attaches and, if required, start another copy of the TP immediately.
    (2) 4701394676
    Code changed to correct the management of the I/O unmapping mechanism.
    (3) 1653273201
    Build environment: the library was inadvertently built with instrumentation.
    (4) 1653264416
    Code change made to prevent the window condition by ensuring that the control blocks in the QLLC glue code used for each active VC are freed only when the X.25 layer has freed all its streams AND the 2.1 node has confirmed that the link is down by sending a CLOSE(LINK) message.
    (5) 1653260166
    The problem was caused by a clash between the enhancement code that supports configuring SDLC segment sizes and the enhancement code that supports a maximum SDLC segment size in the space.c file. The minimum of these two values must be used for SDLC links.

    PHNE_13765:
    (1) 5003387670
    Coding errors in the assembler-language code.
    (2) 5003382994
    The underlying problem is a race between the ADD_SVCE messages from the router and the node initialisation processing. We now suppress the logging of SNA0039 until at least one ADD_SVCE message has been received.
    (3) 1653237644
    Code change made to prevent a bad mode control block index from being used when trying to send a -ve ACT session RSP.
    (4) 1653234948
    The firmware has been changed so that it takes the Constant Carrier configuration into account, giving correct RTS behaviour.

    PHNE_12816:
    (1) 1653227611
    We failed to clean up a suspended pending session in this case. Code change made to reset the session following the CNOS UNBIND.
    (2) 5003378166
    A change has been made so that RTS is dropped at the end of frame transmission, independent of which carrier type is configured.

    PHNE_9886:
    (1) 1653207761
    When an ADD_SVCE is received at the 2.1 DMOD, we check whether it is for a QD_OP TP and whether there is a queued DLOAD for that TP. If so, a copy of the DLOAD is sent to the TP at once.
    (2) 1653212332
    Since the TN3270 regime is an optional part of the protocol, this code has been removed, forcing the negotiation exchange to use alternative supported methods.
    (3) 1653215681
    The code now tests for SNASVCMG before starting the deactivation timer (20 seconds).
    (4) 1653219204
    The 'LOST INTERUPT' message is reported in traces because the firmware detected that the frame size received from the network exceeds its configured maximum frame size, which is hard-coded to 269. And indeed, the host, configured with a MAX_DATA of 1500, is sending frames of more than 269 bytes.

    PHNE_9659:
    (1) 1653185140
    1/ Coding error: the wrong field was used to initialize the SNA X25 address value.
    2/ QLLC does not pass the SNA subaddress to the Glue, so the Glue registers with X25 only on the X121 address. As a result, either X25 rejects the incoming call (if the called address includes the subaddress), or X25 accepts the call (if the called address does not include the subaddress) but SNA then rejects it, because it checks the called address against its subaddress.
    (2) 1653201301
    The fix is simply to check for terminated data structures in the lower write service procedure (a check similar to those in the other STREAMS entry points).
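    Sketch for PHNE_17509 (1), SR 5003439372. The rule behind the libpsi0.a fix is that an unmap call must be given the I/O virtual address (IOVA) that the map call returned, not the host address of the buffer. The C below illustrates this with a toy mapping table; iova_map(), iova_unmap() and release_step() are hypothetical and are not the HP-UX sio_unmap() interface or the real nio_initialize() code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an I/O map: each slot records the IOVA that
 * was handed out for a host buffer. Not the HP-UX sio_* interfaces. */
#define MAX_MAPPINGS 8

struct mapping {
    const void *host_addr; /* kernel virtual address of the buffer */
    uintptr_t iova;        /* I/O virtual address returned by the map call */
    int in_use;
};

static struct mapping io_map[MAX_MAPPINGS];
static uintptr_t next_iova = 0x1000;

/* Map a host buffer and return its IOVA, or 0 if the table is full. */
uintptr_t iova_map(const void *host_addr)
{
    for (size_t i = 0; i < MAX_MAPPINGS; i++) {
        if (!io_map[i].in_use) {
            io_map[i].host_addr = host_addr;
            io_map[i].iova = next_iova;
            io_map[i].in_use = 1;
            next_iova += 0x1000;
            return io_map[i].iova;
        }
    }
    return 0;
}

/* Unmap by IOVA. Returns 0 on success, -1 if the value was never mapped,
 * which is what happens when a host address is passed by mistake. */
int iova_unmap(uintptr_t iova)
{
    for (size_t i = 0; i < MAX_MAPPINGS; i++) {
        if (io_map[i].in_use && io_map[i].iova == iova) {
            io_map[i].in_use = 0;
            return 0;
        }
    }
    return -1;
}

/* Same shape as the corrected code: the IOVA kept in physical_addr is
 * what gets unmapped, never the step pointer itself. */
struct step { uintptr_t physical_addr; };

int release_step(struct step *step)
{
    int rc = 0;
    assert(step != NULL);
    if (step->physical_addr != 0) {
        rc = iova_unmap(step->physical_addr);
        step->physical_addr = 0; /* avoid a second unmap of the same IOVA */
    }
    return rc;
}
```

Passing the step pointer to iova_unmap() fails to find a mapping, which in this toy model corresponds to the original bug; release_step() succeeds because it uses the stored IOVA.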
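    Sketch for PHNE_17509 (4), SR 1653285890. The resolution amounts to validating the length field of the signalling information before copying it, and rejecting the XID if the length is inconsistent. The layout assumed here (a leading length byte that counts itself), the 64-byte local buffer and the name parse_signalling() are all hypothetical; the real XID format is defined by the SNA formats documentation, not by this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SIG_BUF_MAX 64 /* hypothetical size of the local (stack) buffer */

enum xid_result { XID_ACCEPT = 0, XID_REJECT = 1 };

/* Assumed layout: sig[0] is the total length of the signalling
 * information including the length byte, followed by the data.
 * 'avail' is the number of bytes actually received. */
enum xid_result parse_signalling(const unsigned char *sig, size_t avail,
                                 unsigned char *out, size_t *out_len)
{
    if (avail < 1)
        return XID_REJECT;

    size_t declared = sig[0];

    /* Reject a length that is shorter than the header itself, runs past
     * the data received, or would overflow the local buffer. */
    if (declared < 1 || declared > avail || declared - 1 > SIG_BUF_MAX)
        return XID_REJECT;

    *out_len = declared - 1;
    memcpy(out, sig + 1, *out_len);
    return XID_ACCEPT;
}
```

Without the three checks, a bad length byte would drive memcpy() past the end of the buffer, which is the kind of stack overwrite the defect describes.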
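    Sketch for PHNE_15608 (4), SR 1653264416. The window is closed by freeing each per-VC control block only after two independent events have both occurred, in whichever order they arrive. The structure and function names below are hypothetical stand-ins for the QLLC glue code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-VC control block in the QLLC glue code. Each flag
 * records one of the two events the fix waits for. */
struct qllc_vc {
    int x25_streams_freed;   /* X.25 layer has released all its streams */
    int close_link_received; /* 2.1 node has sent CLOSE(LINK) */
};

/* Free the block only once both events have been seen. Returns 1 if it
 * was freed (the caller must drop its pointer), 0 if it is still needed. */
static int qllc_vc_maybe_free(struct qllc_vc *vc)
{
    if (vc->x25_streams_freed && vc->close_link_received) {
        free(vc);
        return 1;
    }
    return 0;
}

int qllc_on_x25_streams_freed(struct qllc_vc *vc)
{
    assert(vc != NULL);
    vc->x25_streams_freed = 1;
    return qllc_vc_maybe_free(vc);
}

int qllc_on_close_link(struct qllc_vc *vc)
{
    assert(vc != NULL);
    vc->close_link_received = 1;
    return qllc_vc_maybe_free(vc);
}
```

Because both handlers go through the same check, the order of arrival does not matter; the block is freed by whichever event comes second.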
    PHNE_9554:
    (1) 1653180323
    Reset the alarm timer only after the signal catcher has finished, to avoid recursively entering the signal catcher.
    (2) 1653185306
    The processing of a Disconnect Indication generates two calls to the Close Connection routine.
    (3) 1653191924
    The problem was that the HMOD was being primed with a frame size of 5: the frame size configured in the connection record was not reflected in the link record. The link record now points to the link data of the first connection in all cases (as was already done for host links). A fix was also sent to the HMOD.
    (4) 1653183947
    The fix prevents any looping in ACTLU processing, which can process only one mode. The customer could also avoid the problem by changing the configuration to make the APPC4K mode not initially active (changing the session limits to 1,1,0,0) or by removing one of the modes. This would also prevent some of the error logs produced by this configuration as the two modes compete for the LU.
    (5) 1653179788
    The problem occurred because we failed to associate the new control block for an incoming dependent LU BIND with the associated SSCP control block.
    (6) 4701335000
    When the customer has used up all of the Service Table entries, the node (PU2.1) is unable to handle any more requests from the services, causing various types of errors.
    (7) 5003320341
    No timer was implemented on the Ethernet connection.
    (8) 1653180125
    The problem is caused by an unusual LU 6.2 SNA sequence exchanged with the mainframe:
        RX BIND (dependent, CL)
        TX BIND +ve
        RX FMH5 RQD2 (S.N. 1), TP starts
        Kill TP, TX -ve RSP, TX FMH7 CEB RQD1
        RX FMH7 CEB RQD1 (S.N. 2), TX +ve (S.N. 8000)
        RX FMH5 BB RQD2 (S.N. 3), TP starts
        RX +ve (S.N. 8003)
    We detect a BETB condition because the send chain FSM is pending a RSP with CEB in it, and then decouple the SCB and RCB, leading to a later crash when we get lost locality from the TP being killed.
    The SNA sequence is unusual because there are two FMH7 CEBs (Deallocate Abend); the host's one appears superfluous but is allowed through by the APPC protocols. The host also delays responding to the CEB we send until part way through the next bracket, and uses the current bracket sequence number (so the response does not appear stray).
    (9) 5003287276
    Sessions are not completely logged off because 3270 exits with a TERMSLF instead of an UNBIND. Since IBM implementations currently send UNBINDs in this type of situation, it is reasonable to change our product to do the same. In fact, our SNAplus2 3270 product already behaves this way.
    (10) 4701325399
    The problem is caused by the lack of a fully qualified LU name on the R4 side. The R5 system behaves correctly and sends the fully qualified LU name, but this then does not match the table entry on the R4 side. This fix makes the R4 node smart enough to match the LU names.
    (11) 1653153957
    SNAplus always attempts to use the same LU, even if other LUs are available in the default LU pool.
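    Sketch for PHNE_9554 (10), SR 4701325399. One way to make a node "smart enough to match the LU names" is to compare only the LU name part whenever one side is not network-qualified. The "NETID.LUNAME" form is the usual SNA fully qualified name syntax; the matching rule and lu_names_match() are assumptions for illustration, not the actual R4 fix.

```c
#include <assert.h>
#include <string.h>

/* Return the part after the '.', or the whole name if it is unqualified. */
static const char *lu_base_name(const char *name)
{
    const char *dot = strchr(name, '.');
    return dot ? dot + 1 : name;
}

/* Tolerant LU name comparison, assuming "NETID.LUNAME" or "LUNAME".
 * Two fully qualified names must match exactly; if either side is
 * unqualified, only the LU name parts are compared. Returns 1 on match. */
int lu_names_match(const char *received, const char *configured)
{
    int received_qualified = strchr(received, '.') != NULL;
    int configured_qualified = strchr(configured, '.') != NULL;

    if (received_qualified && configured_qualified)
        return strcmp(received, configured) == 0;

    return strcmp(lu_base_name(received), lu_base_name(configured)) == 0;
}
```

Under this rule, a fully qualified name received from an R5 partner matches the unqualified entry in an R4 table, while two names qualified with different network IDs still do not match.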
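    Sketch for PHNE_9554 (11), SR 1653153957. The defect is that the same LU was always chosen from the default LU pool. A minimal alternative is to scan the pool and take the first LU that has no session bound. struct pool_lu and pool_pick_free() are hypothetical, and the product may select LUs differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical entry in the default LU pool. */
struct pool_lu {
    const char *name;
    int in_session; /* 1 while a session is bound to this LU */
};

/* Return the first LU in the pool without a session, or NULL if every
 * LU is busy. Always returning &pool[0] would reproduce the defect. */
struct pool_lu *pool_pick_free(struct pool_lu *pool, size_t n)
{
    assert(pool != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (!pool[i].in_session)
            return &pool[i];
    return NULL;
}
```

The caller marks the returned LU as in session once the BIND succeeds, so the next request moves on to the next free LU.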
SR:
    5003439372 5003437145 5003423095 5003403691 5003387670 5003382994
    5003378166 5003320341 5003287276 4701394676 4701335000 4701325399
    1653293522 1653285890 1653274928 1653273201 1653264416 1653260166
    1653237644 1653234948 1653227611 1653219204 1653215681 1653212332
    1653207761 1653201301 1653191924 1653185306 1653185140 1653183947
    1653180125 1653179788 1653153957

Patch Files:
    /usr/conf/lib/libpsi0.a
    /usr/conf/lib/libpsi1.a
    /usr/conf/lib/libsix1.a
    /usr/conf/lib/libsixet.a
    /usr/conf/lib/libsixfd.a
    /usr/conf/lib/libsixl.a
    /usr/conf/lib/libsixn.a
    /usr/conf/lib/libsixqs.a
    /usr/conf/lib/libsixtk.a
    /opt/sna/bin/snaptnstub
    /opt/sna/sdlc.dlf
    /opt/sna/sdlc.pbs

what(1) Output:
    /usr/conf/lib/libsixn.a:
        A.10.20.200 SNAplus R4.4 TN Server Core (PHNE_9554 : 97/01/21 16:39:44)
    /usr/conf/lib/libsixqs.a:
        A.10.20.204 SNAplus R4.4 Streams QLLC (PHNE_15608 : 98/08/27 17:34:06)
    /usr/conf/lib/libsix1.a:
        A.10.20.212 SNAplus R4.4 PU 2.1 (PHNE_17510 : 99/02/11 17:40:10)
    /usr/conf/lib/libsixl.a:
        A.10.20.201 SNAplus R4.4 SDLC in the kernel (PHNE_9886 : 97/06/25 14:27:22)
    /usr/conf/lib/libsixet.a:
        A.10.20.204 SNAplus R4.4 802.3 (PHNE_9659 : 97/02/06 19:32:52)
    /usr/conf/lib/libsixfd.a:
        A.10.20.204 SNAplus R4.4 FDDI (PHNE_9659 : 97/02/06 19:31:50)
    /usr/conf/lib/libsixtk.a:
        A.10.20.204 SNAplus R4.4 Token Ring (PHNE_9659 : 97/02/06 19:33:33)
    /usr/conf/lib/libpsi0.a:
        A.10.20.204 SNAplus R4.4 PSI Driver (PHNE_17509: 98/11/02 18:09:35)
    /usr/conf/lib/libpsi1.a:
        A.10.20.001 SNAplus R4.4 EISA PSI Driver (PHNE_13765: 98/01/23 14:09:26)
    /opt/sna/bin/snaptnstub:
        A.10.20.201 SNAplus R4.4 TN Server Stub (PHNE_9659 : 97/05/06 14:05:00)
    /opt/sna/sdlc.dlf:
        SNAplus EISA FW v2.3 (98/01/27 13:03:03)
    /opt/sna/sdlc.pbs:
        SNAplus NIO FW v2

cksum(1) Output:
    3751159636 73880 /usr/conf/lib/libsixn.a
    1190472722 181964 /usr/conf/lib/libsixqs.a
    3798597064 906072 /usr/conf/lib/libsix1.a
    4122901300 244732 /usr/conf/lib/libsixl.a
    1218112106 187324 /usr/conf/lib/libsixet.a
    1279901727 186624 /usr/conf/lib/libsixfd.a
    727444658 186308 /usr/conf/lib/libsixtk.a
    2905487678 46516 /usr/conf/lib/libpsi0.a
    2022576048 31896 /usr/conf/lib/libpsi1.a
    2004067365 69632 /opt/sna/bin/snaptnstub
    763652238 105276 /opt/sna/sdlc.dlf
    1667265622 172168 /opt/sna/sdlc.pbs

Patch Conflicts: None

Patch Dependencies:
    s700: 10.20: PHNE_21212
    s800: 10.20: PHNE_21212

Hardware Dependencies: None

Other Dependencies: None

Supersedes:
    PHNE_9554 PHNE_9659 PHNE_9886 PHNE_12816 PHNE_13765 PHNE_15608 PHNE_17509

Equivalent Patches: None

Patch Package Size: 2410 KBytes

Installation Instructions:
    Please review all instructions and the Hewlett-Packard SupportLine User Guide or your Hewlett-Packard support terms and conditions for precautions, scope of license, restrictions, and limitation of liability and warranties, before installing this patch.
    ------------------------------------------------------------
    1. Back up your system before installing a patch.

    2. Login as root.

    3. Copy the patch to the /tmp directory.

    4. Move to the /tmp directory and unshar the patch:

        cd /tmp
        sh PHNE_17510

    5a. For a standalone system, run swinstall to install the patch:

        swinstall -x autoreboot=true -x match_target=true \
            -s /tmp/PHNE_17510.depot

    By default swinstall will archive the original software in /var/adm/sw/patch/PHNE_17510. If you do not wish to retain a copy of the original software, you can create an empty file named /var/adm/sw/patch/PATCH_NOSAVE.

    WARNING: If this file exists when a patch is installed, the patch cannot be deinstalled. Please be careful when using this feature.

    It is recommended that you move the PHNE_17510.text file to /var/adm/sw/patch for future reference.

    To put this patch on a magnetic tape and install from the tape drive, use the command:

        dd if=/tmp/PHNE_17510.depot of=/dev/rmt/0m bs=2k

Special Installation Instructions:
    Stop the SNA daemon before installing the patch (snapstop daemon). After installing the patch, start the SNA daemon (snapstart daemon).
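The first number in each cksum(1) Output line is the POSIX cksum CRC of the file and the second is its size in bytes, so an installed file can be checked against that table by running cksum(1) on it. For readers who want to see what is being compared, the C below sketches the CRC that POSIX specifies for cksum: polynomial 0x04C11DB7 processed most-significant bit first from an initial value of 0, then the file length fed in least-significant byte first without leading zero bytes, then a final complement. The function names are our own and are not part of any HP tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Non-reflected CRC-32 with polynomial 0x04C11DB7, as used by cksum(1).
 * Start with crc = 0 and call once per chunk of file data. */
uint32_t cksum_update(uint32_t crc, const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint32_t)buf[i] << 24;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x80000000u) ? (crc << 1) ^ 0x04C11DB7u : crc << 1;
    }
    return crc;
}

/* Append the byte count (least significant byte first, only as many
 * bytes as needed) and complement, giving the value cksum(1) prints. */
uint32_t cksum_finish(uint32_t crc, uint64_t total_len)
{
    while (total_len != 0) {
        unsigned char b = (unsigned char)(total_len & 0xFFu);
        crc = cksum_update(crc, &b, 1);
        total_len >>= 8;
    }
    return ~crc;
}
```

To check a file, call cksum_update() on each chunk read (starting from 0), then cksum_finish() with the total byte count, and compare the result with the first column of the table.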