Patch Name: PHNE_17819

Patch Description: s700_800 10.20 R6.10.20 SNAplus2 Link cumulative patch

Creation Date: 99/06/02

Post Date: 99/07/19

Hardware Platforms - OS Releases:
    s700: 10.20
    s800: 10.20

Products:
    SNAplus2-Link R6.10.20

Filesets:
    SNAplus2-Link.SNAP2-LINK

Automatic Reboot?: Yes

Status: General Superseded

Critical:
    Yes
    PHNE_17819: PANIC HANG
    PHNE_17405: HANG PANIC
    PHNE_16758: HANG

Path Name: /hp-ux_patches/s700_800/10.X/PHNE_17819

Symptoms:
    PHNE_17819:

    (1) 5003446971
    Data page fault panic in nbm_free_buffer while running simple SNA tests over two LAN interfaces between the three machines running SNAplus2.

    (2) 4701418707
    When using CPI-C without side information, outgoing attaches sometimes fail because validation has been unintentionally turned on.

    (3) 1653305805
    One processor on a two-processor box running R6 over 10.20 hangs, which then causes cmcld to TOC the box to preserve system integrity. Top of stack for the hanging process is:

        FUNC                              PC
        v0_get_rw_lock+0xb8               0.0x3cc4a8
        vpr_route_ips_on_route+0x40       0.0x4094e0
        vds_rcv_buffers_available+0x1a0   0.0x3e1720
        vds_receive_proc+0x674            0.0x3e47fc
        nba_dispatch_input+0x298          0.0x5af050
        nba_dispatch_process+0xa4         0.0x5af184
        nba_schedule_process+0x134        0.0x5af5ec
        nba_send_ips+0x308                0.0x5afd3c

    (4) 1653293878
    An invokable TP fails to start, and the following error messages are logged:

        ------------- 10:52:14 GMT 10 Feb 1999 ----------------
        NODE Message 16384 - 0, Subcode: 10 - 10
        Log category: EXCEPTION
        Cause Type: Internal
        System: LR1875
        Internal system error.
        Errno = 7
        Action: Provide support services with the audit and error log files, and trace files if available.

        ------------- 10:52:14 GMT 10 Feb 1999 ----------------
        APPN Message 512 - 257, Subcode: 0 - 10
        Log category: PROBLEM
        Cause Type: Config
        System: LR1875
        Dynamic load of TP failed.
        Sense code = 0x07000000
        LU alias = DFKC
        TP name = lr229bci

    PHNE_17405:

    (1) 5003441717
    The snaperrlog process can be left lying around when the SNAplus2 daemon is not started.
    Attempting to restart the SNAplus2 software using 'snap start' will then fail, because the snaperrlog process is still there from a previous run.

    (2) 4701413054
    System panic: Data Page Fault at nsm_process_record_from_ss+130.

    (3) 4701399279
    The PSI firmware header is not recognized by the snapwhat command.

    (4) 1653289686
    If a TN3270 (not TN3270E) client hits the clear key while TN Server is presenting an SSCP screen, the client locks up. The host may respond with an error message.

    (5) 1653289603
    If a TN3270 (not TN3270E) client hits the clear key while TN Server is presenting an SSCP screen, TN Server forwards the clear key to the host (it sends an empty RU on the SSCP-LU session). The host may respond with an error message.

    PHNE_16758:

    (1) 4701405316
    Updated binaries required for patching the latest R6 release of SNAplus2.

    (2) 4701399527
    Assert errors are produced when the host sends a USSMSG10 screen to an LU configured for LU6.2. The ASSERTs are in fact benign and cause no problems with the integrity of the system. They occur only when the USSMSG10 screen is segmented and larger than about 500 bytes.

    (3) 1653279703
    Assert errors are logged when an RJE workstation is started, because an RTM request is received. This has no effect on operation other than bad entries in the error log.

    (4) 1653276543
    A 3270 session can hang, even if you stop and restart the snap3270 emulator. A trace of the problem shows that a NOTIFY is not sent when the 3270 emulator is stopped or started.

    (5) 1653273979
    Enhancement to allow multiple PUs to be used on a secondary leased link. This means that if an SDLC port is connected to a leased line, you can have multiple LSs active over the port at the same time.

    (6) 1653267179
    An application can fail to start a remote LU6.2 transaction because an invalid user ID is specified on the Attach when AP_SAME is specified on the ALLOCATE verb.
    PHNE_15937:

    (1) 5003398354
    If you issue 'snapadmin define_local_lu' for an LU that is already defined, only the attach routing data and the description field are updated, and all other parameters are ignored. However, the snapadmin command does not indicate this and returns a successful completion message.

    (2) 4701396374
    The node fails to start if a TN Client is configured with an unrecognised hostname.

    (3) 4701395459
    The following assert message is logged to the console, the syslog file and the SNA log files:

        Assert ips->cont_size >= MU_CONT_SIZE from vtc.c

    (4) 4701392670
    In a client/server configuration, various ASSERT messages from the SLIM component are recorded in the sna.err log file, often followed by a crash of the SLIM. This causes generally unpredictable behaviour of the system when the master server goes down and is restarted.

    (5) 1653261669
    The SDLC link shows DISABLED after a restart of the SNAplus2 daemon.

    PHNE_14392:

    (1) 4701386227
    Unable to start SDLC EISA cards.

Defect Description:
    PHNE_17819:

    (1) 5003446971
    Panic caused by attempting to dereference a null pointer while examining the posted_list LQE in the nbm_info structure to see whether it is empty.
    Resolution:
    Add a boolean flag to nbm_info to record whether the posted list is empty or not.

    (2) 4701418707
    The outgoing attach was being sent with the password from the previously rejected incoming attach, causing a validation error.
    Resolution:
    Add code to copy the password from the START_TP signal into the tcp_ptr in nrmsttp.c.

    (3) 1653305805
    From the stack we can see that this is a deadlock in the kernel during snap stop processing. We grab a write lock on vpr_entity_lock in vpr_stream_close(), which we hold across a number of calls, including the one to nba_term(). It is this lock we are trying to acquire in vpr_route_ips_on_route() near the top of the stack trace.
    Resolution:
    We don't actually need to hold vpr_entity_lock around the call to nba_term() in vpr_stream_close(), so the fix is simply to release it before that call and reacquire it afterwards.
    (4) 1653293878
    The TP fails to start because the user ID under which it runs has been misconfigured so that it cannot retrieve its own group name. This may be due to a problem with local access to the group file, or with running NIS (Network Information Service) to share user and group IDs across more than one machine. There are two reasons for the cryptic error logs recorded by SNAplus2:
    - Failure of the getpwuid() or getgrgid() calls was not logged as an error message.
    - The VSM_AS_TP_FAILURE internal error code was not put in the right part of the DLOAD_RSP_ERR message sent from the Service Manager to the APPC Stub, so the APPC Stub misinterpreted it as an APPC sense code.
    Resolution:
    The real fix is to configure correctly the Unix user/group under which the TP is to be run. However, SNAplus2 has been changed to improve the logging in this area as follows:
    - In vpm_build_user_info() in vr/vpmu.c, we add error logs for the cases where the getpwuid() or getgrgid() calls fail. Because failure of these calls leads to a path failure, we also modify vlm_user_write_log() in vdiag/vlmuser.c so that, even if we fail to open a path, we still attempt to send the datagram containing the log (in addition to attempting to write it locally). This makes sure the new error logs actually reach the sna.err log file.
    - In the error reply arm of vsm_rcv_dload_confirm() in vr/vsmdload.c, we put the error code in the dld_status field rather than the ld_sense_data field of the DLOAD_RSP_ERR message, because this is where the vas_datagrams() routine in the APPC Stub expects to find it.
    - We also replace the exception logged in vsm_rcv_dload_confirm(), which was a generic one with a misleading reference to errno, with a new specific error.
    The texts of the new logs are in the vdiag/*.txt files.
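The first logging change in defect (4) 1653293878 can be illustrated with a minimal POSIX sketch. It is not the SNAplus2 source: build_user_info(), log_lookup_failure() and struct user_info are hypothetical stand-ins for vpm_build_user_info() and the sna.err log, and messages go to stderr. The sketch shows each lookup failure being reported with the call that failed and the ID involved, instead of surfacing later as an unrelated sense code.

```c
/* Sketch only: hypothetical names, not the SNAplus2 vpm_build_user_info(). */
#include <errno.h>
#include <grp.h>
#include <pwd.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical container for the names a TP launcher needs. */
struct user_info {
    char user_name[64];
    char group_name[64];
};

/* Hypothetical stand-in for an sna.err entry: name the failing call and ID. */
static void log_lookup_failure(const char *call, unsigned long id, int err)
{
    if (err != 0)
        fprintf(stderr, "ERROR: %s(%lu) failed: %s\n", call, id, strerror(err));
    else
        fprintf(stderr, "ERROR: %s(%lu) found no matching entry\n", call, id);
}

/* Fill *out with the user name for uid and the name of its primary group.
 * Returns 0 on success, or -1 after logging which lookup failed. */
int build_user_info(uid_t uid, struct user_info *out)
{
    struct passwd *pw;
    struct group *gr;

    /* POSIX leaves errno unchanged when no entry is found, so clear it
     * first to tell "no such user" apart from a real lookup error. */
    errno = 0;
    pw = getpwuid(uid);
    if (pw == NULL) {
        log_lookup_failure("getpwuid", (unsigned long)uid, errno);
        return -1;
    }
    snprintf(out->user_name, sizeof out->user_name, "%s", pw->pw_name);

    /* Same check for the primary group, the lookup that failed here. */
    errno = 0;
    gr = getgrgid(pw->pw_gid);
    if (gr == NULL) {
        log_lookup_failure("getgrgid", (unsigned long)pw->pw_gid, errno);
        return -1;
    }
    snprintf(out->group_name, sizeof out->group_name, "%s", gr->gr_name);
    return 0;
}
```

A caller would invoke build_user_info(getuid(), &info) and refuse to start the TP when it returns -1. The log line then names getpwuid or getgrgid and the ID that could not be resolved. The sketch does not model the second part of the fix, where the log is also sent as a datagram when the path fails.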
    PHNE_17405:

    (1) 5003441717
    If the kernel initialisation fails, the snaperrlog process could hang, waiting for a signal from the kernel that never arrives.
    Resolution:
    A code change has been made to ensure that, if the kernel initialisation fails, a failure notification is sent to the snaperrlog process so that it can exit cleanly.

    (2) 4701413054
    A small timing window exists in which the list of LULU control blocks is empty when the SSCP_INIT_SIGNAL_NEG_RSP ISP is processed.
    Resolution:
    Code changed to check whether the LULU list is empty before trying to obtain its first element.

    (3) 4701399279
    The PSI firmware header string was not changed with the release of SNAplus2, because the firmware is common to both SNAplus and SNAplus2.
    Resolution:
    - a new what string for the NIO firmware
    - a new what string and a new compilation format for the EISA firmware
    The ']' character has been added at the beginning of each PSI firmware library header line so that the header can be recognized by the snapwhat command.

    (4) 1653289686
    The TN3270 client was locking up when the clear key was entered because TN Server was passing the clear command to the host instead of processing it locally (as the Motif 3270 emulator does, for example).
    Resolution:
    Code changed to add a check and special handling for the clear key at the beginning of the TN Server SSCP inbound MU processing.

    (5) 1653289603
    The TN3270 client was receiving SSCP data when the clear key was entered because TN Server was passing the clear command to the host instead of processing it locally (as the Motif 3270 emulator does, for example).
    Resolution:
    Code changed to add a check and special handling for the clear key at the beginning of the TN Server SSCP inbound MU processing.

    PHNE_16758:

    (1) 4701405316
    Updated binaries provided for combined patching of the latest R6 release, as documented in the SR text.

    (2) 4701399527
    A code change has been made to prevent Assert errors occurring when a large USSMSG10 is received for an LU6.2 session.
    The maximum amount of data permissible on the SSCP screen has been increased to 2048 bytes, so that segmented data on the SSCP screen is handled correctly.

    (3) 1653279703
    Code change made to fix a problem with Assert errors being logged when an RJE workstation is started. The ASSERT has been corrected: it should be produced only if an application has opened the SSCP session and is listening for RTM requests. RJE does not do this, so it should not be logged as an error.

    (4) 1653276543
    Code change to prevent a 3270 session hang caused by a NOTIFY not being sent. The fix ensures that any pending NOTIFY requests are flushed from the CH queue in the APPN node when a CLOSE_SSCP message is received (indicating that the emulator has been stopped).

    (5) 1653273979
    Enhancement to allow multiple PUs to be used on a secondary leased link. This means that if an SDLC port is connected to a leased line, you can have multiple LSs active over the port at the same time.

    (6) 1653267179
    The problem is that if you specify AP_SAME on the ALLOCATE verb but have not configured user validation, a user ID consisting of 10 NULLs is sent. A code change has been made, and the following behavior now applies when AP_SAME is used:
    - Case 1: a TP on Unix invokes a remote TP. The outgoing Allocate contains a user ID subfield, set to the Unix user ID the TP is running under.
    - Case 2: a TP on Unix invokes several remote TPs. The behavior is as for case 1.
    - Case 3: multiple conversations, where an invoked TP issues an ALLOCATE. The outgoing Allocate includes the same level of validation as the ATTACH that invoked that TP.

    PHNE_15937:

    (1) 5003398354
    Code changed so that, if the user specifies any other parameters, they must match those used on the initial define; otherwise an error code is returned.

    (2) 4701396374
    Code changed to allow the node to start if it finds a TN Client configured with an unrecognised hostname. An error log is generated to tell the user of the failure.
    (3) 4701395459
    This was an incorrect ASSERT, which has been removed. The problem is benign, but it produces annoying error logs and console messages.

    (4) 4701392670
    The LAN logger component (which handles central logging) incorrectly registered itself with the service manager as a server. As a result, a server could appear twice in the service table (for example, once as a backup and then again as a master server). This led to extremely unpredictable and unreliable client/server operation. A code change has been made to prevent this incorrect registration.

    (5) 1653261669
    Send a signal to the host when the firmware is ready (backplane and frontplane are initialized).
    Remove debug trace from msgbuf (opt1:)

    PHNE_14392:

    (1) 4701386227
    Fixed problems in the SDLC driver and firmware.

SR:
    5003446971 5003441717 5003398354 4701418707 4701413054 4701405316 4701399527 4701399279 4701396374 4701395459 4701392670 4701386227 1653305805 1653293878 1653289686 1653289603 1653279703 1653276543 1653273979 1653267179 1653261669

Patch Files:
    /opt/sna/conf/lib/libpsi0.a
    /opt/sna/conf/lib/libpsi1.a
    /opt/sna/conf/lib/libsixd.a
    /opt/sna/conf/lib/libsixl.a
    /opt/sna/conf/lib/libsixs.a
    /opt/sna/sdlc.dlf
    /opt/sna/sdlc.pbs
    /opt/sna/bin/snaptnsrvr

what(1) Output:
    /opt/sna/bin/snaptnsrvr:
        HP92453-02A.10.00 HP-UX SYMBOLIC DEBUGGER (END.O) $Revision: 74.03 $
        ]R6.10.20.101 SNAplus2 R6 TN Server
        ] (PHNE_17405 : 99/01/11 17:27:05)
        ]
    /opt/sna/conf/lib/libpsi0.a:
        ]R6.10.20.100 SNAplus2 R6 NIO PSI driver
        ] (10.20.R6 (DART 41): 98/07/17 13:16:55)
        ]
    /opt/sna/conf/lib/libpsi1.a:
        ]R6.10.20.100 SNAplus2 R6 EISA PSI driver
        ] (10.20.R6 (DART 41): 98/07/22 10:36:54)
        ]
    /opt/sna/conf/lib/libsixd.a:
        ]R6.10.20.100 SNAplus2 R6 NDLC to DLPI Mapping
        ] (10.20.R6: 98/08/17 14:16:43)
        ]
    /opt/sna/conf/lib/libsixl.a:
        ]R6.10.20.102 SNAplus2 R6 SDLC in the Kernel
        ] (PHNE_17819 : 99/02/09 10:27:50)
        ]
    /opt/sna/conf/lib/libsixs.a:
        ]R6.10.20.111 SNAplus2 R6 Router in the kernel
        ] (PHNE_17819 : 99/05/27 17:49:48)
        ]
        ]R6.10.20.107 SNAplus2 R6 APPN kernel library routines
        ] (PHNE_17819 : 99/03/16 17:43:19)
        ]
    /opt/sna/sdlc.dlf:
        ]SNAplus2 EISA FW v2.5
        ](99/01/07 15:26:09)
    /opt/sna/sdlc.pbs:
        ]SNAplus2 NIO FW v2.1
        ](98/11/13 11:58:22)

cksum(1) Output:
    2390782579 204416 /opt/sna/bin/snaptnsrvr
    2855579278 64504 /opt/sna/conf/lib/libpsi0.a
    3528364603 46800 /opt/sna/conf/lib/libpsi1.a
    2674665038 172144 /opt/sna/conf/lib/libsixd.a
    2006556386 360448 /opt/sna/conf/lib/libsixl.a
    2056815160 3023784 /opt/sna/conf/lib/libsixs.a
    3269251168 105244 /opt/sna/sdlc.dlf
    3918812582 172212 /opt/sna/sdlc.pbs

Patch Conflicts: None

Patch Dependencies: None

Hardware Dependencies: None

Other Dependencies: None

Supersedes:
    PHNE_14392 PHNE_15937 PHNE_16758 PHNE_17405

Equivalent Patches: None

Patch Package Size: 4120 KBytes

Installation Instructions:
    Please review all instructions and the Hewlett-Packard SupportLine User Guide or your Hewlett-Packard support terms and conditions for precautions, scope of license, restrictions, and limitation of liability and warranties, before installing this patch.
    ------------------------------------------------------------
    1. Back up your system before installing a patch.

    2. Log in as root.

    3. Copy the patch to the /tmp directory.

    4. Move to the /tmp directory and unshar the patch:

        cd /tmp
        sh PHNE_17819

    5a. For a standalone system, run swinstall to install the patch:

        swinstall -x autoreboot=true -x match_target=true \
            -s /tmp/PHNE_17819.depot

    By default swinstall will archive the original software in /var/adm/sw/patch/PHNE_17819. If you do not wish to retain a copy of the original software, you can create an empty file named /var/adm/sw/patch/PATCH_NOSAVE.

    WARNING: If this file exists when a patch is installed, the patch cannot be deinstalled. Please be careful when using this feature.

    It is recommended that you move the PHNE_17819.text file to /var/adm/sw/patch for future reference.
    To put this patch on a magnetic tape and install from the tape drive, use the command:

        dd if=/tmp/PHNE_17819.depot of=/dev/rmt/0m bs=2k

Special Installation Instructions:
    Stop the SNA daemon before installing the patch (snap stop). After installing the patch, start the SNA daemon again (snap start).