Patch Name: PHNE_22811

Patch Description: s700_800 10.20 R5.1 SNAplus2 Link cumulative patch

Creation Date: 02/03/05

Post Date: 02/03/21

Hardware Platforms - OS Releases:
    s700: 10.20
    s800: 10.20

Products: SNAplus2-Link R5.1

Filesets: SNAplus2-Link.SNAP2-LINK,B.10.20

Automatic Reboot?: Yes

Status: General Release

Critical: Yes
    PHNE_22811: PANIC
    PHNE_20437: PANIC
    PHNE_18487: HANG PANIC
    PHNE_17403: HANG PANIC
    PHNE_16810: PANIC
    PHNE_15052: PANIC HANG
    PHNE_14552: PANIC HANG
    PHNE_13642: PANIC ABORT
    PHNE_13427: PANIC
    PHNE_11961: PANIC
    PHNE_9663: PANIC
    PHNE_9651: PANIC

Path Name: /hp-ux_patches/s700_800/10.X/PHNE_22811

Symptoms:
    PHNE_22811:
    (1) JAGad39485/8606170221
        System Panic in nlm_send_notify+0x48.
        data page fault nlm_send_notify+0x48.
    (2) JAGab25510/5003467290
        Token Ring connection hangs when frames exceeding the largest frame size returned in RIF are dropped by the network.
    (3) JAGad50745/8606181529
        System panic in nrm_process_cnos_reply.
        Stack trace details
            panic+0x10
            report_trap_or_int_and_panic+0xe8
            trap+0xa48
            $RDB_trap_patch+0x20
            nrm_process_cnos_reply+0x188
            nrm_ps_to_rm_rec+0x648
            nrm_queue_handler+0x70
            nba_dispatch_process+0x114
            nba_scheduler+0x208
            vpr_stream_lr_svc+0x160
            sq_wrapper+0xb8
            str_sched_up_daemon+0x1f8
            str_sched_daemon+0xf4
            main+0x958
            $vstart+0x34
            $locore+0x74
    (4) JAGad77708/8606208520
        SNAPLUS2 R5.1 system panic.
        Data Page Faults: libsixs.a nms_nah_process_mu_rsp+0x100
        Crash in nms_nah_process_mu_rsp
        Stack is:-
            trap 0xa48
            thandler 0xb7c
            nms_nah_process_mu_rsp 0x100
            nms_msm_queue_handler 0x694
            nba_dispatch_process 0xd0
            nba_scheduler 0x208
            vpr_stream_lr_svc 0x160
            sq_wrapper 0xb8
            str_sched_up_daemon 0x2b0
            str_sched_daemon 0xf4
            main 0x94c
            $vstart 0x34

    PHNE_20437:
    (1) JAGab79003/8606108556
        K580 system panics on reboot with the following error messages:
            System Panic:
            9245XB HP-UX (B.10.20) #1: Sun Jun 9 06:31:19 PDT 1996
            panic: (display==0xb800, flags==0x0) nio_initialize : l_io_vec.iov_base == NULL
            PC-Offset Stack Trace (read across, most recent is 1st):
            0x00276f44 0x003fdab0 0x00404968 0x001297c4
            0x00240ae8 0x002a7ea8 0x002a6ae4 0x00215088
            0x002147e0 0x00295368 0x00295444 0x00295444
            0x00295444 0x002951c0 0x00295d20 0x002f316c
            0x000c7014 0x00183960
            End Of Stack
            It was not possible for the kernel to find a process that caused this crash.
            jCj1D Dumpsys() called
    (2) JAGab78436/8606108079
        Within one month, several panics occurred for which no explanation was found from dump analysis. Vmtrace was enabled to check for possible memory buffer corruption. After scheduling vmtrace at run level 1 and then changing to run level 3, there was a panic with the following stack trace:
            STANM : q4> trace event 0
            stack trace for event 0
            crash event was a panic
            panic+0x10
            report_trap_or_int_and_panic+0xe8
            trap+0xa48
            $call_trap+0x20
            vns_lr_receive_proc+0x888
            vns_send_verb_to_lib+0x8c
            vns_reg_sink_req+0x1f0
            vns_lr_create+0x2e4
            vns_create_stubs+0x124
            vns_verb_response_proc+0x1b8
            vns_receive_proc+0x124
            nba_dispatch_process+0xd0
            nba_scheduler+0x208
            vpr_stream_uw_svc+0x58c
            sq_wrapper+0xb8
            putmsg+0x2fc
            syscall+0x1a4
            $syscallrtn+0x0
    (3) JAGab74185/8606105839
        An LUA application receives an UNBIND indication from an established LU-LU session.
It responds with RSP.UNBIND and encounters the following error: primary return code: LUA_STATE_CHECK (0x0200) secondary return code: LUA_NO_RUI_SESSION(0x81000000) (4) JAGab74007/8606105750 Any snapadmin command issued for a QLLC link simply hangs never to return. The link remains in the STOPPING state. (5) JAGab72203/8606104853 An LUA application fails with a secondary return code of LUA_RUI_LOGIC_ERROR on an RUI_READ having received a RQ.SDT from the host. (6) JAGab71519/8606104163 This system panic occurred on a H20 running 10.20 with 128MB. panic+0x10 report_trap_or_int_and_panic+0xe8 trap+0xa48 $call_trap+0x20 nsm_delete_session_id+0x104 nsm_cleanup_lu_lu_session+0x7b8 nsm_fsm_status+0x1d4c nsm_process_deactivate_session+0x140 nsm_process_record_from_rm+0x1a8 nsm_queue_handler+0x68 nba_dispatch_process+0x114 nba_scheduler+0x208 vpr_stream_uw_svc+0x58c sq_wrapper+0xb8 str_sched_up_daemon+0x2b0 str_sched_daemon+0xf4 main+0x974 $vstart+0x34 PHNE_18487: (1) JAGab65539 Lan performance degraded when attempting to start an SDLC psi card on a T600 system. (2) JAGab65400 SNAPlus2 R5.1 SDLC link fails to start if DSR is not present when link station is started. Only recovery is to reboot the system and ensure that modem signals are present before starting the link. Loss of DSR causes the same problem. The following errors are logged: SDLC Message 768 - 17, Subcode: 1 - 11 Log category: EXCEPTION Cause Type: External System: dqserv1 DSR was not active when activating port. Return code = 0x0003 Cause: An error occurred on a port. The port is configured as Non-switched but DSR was not present. Action: Check whether the configuration setting of non-switched is correct. If the port is correctly configured, check the modem and link hardware. Check other messages for further diagnostics. SDLC Message 768 - 94, Subcode: 0 - 11 Log category: EXCEPTION Cause Type: External System: dqserv1 SDLC port device driver reported an error. 
            DLC name = SDLC0
            Port name = mapport
            Port number = 0x00000000
            Return code = 0x0003
            Detailed return code = 0x0020
            Cause: An error occurred on an SDLC port. The detailed return code provides more information on the error, as follows:
                0x0020 DSR failure
                0x0021 General hardware failure
                0x0022 Modem power off
                0x0023 CTS failure between frames (4 wire)
                0x0024 CD failure between frames (4 wire)
            Action: Check the detailed return code shown; check for previous exception messages providing more information about the failure.
        Additional Problem: During testing, it was discovered that when this DSR problem occurs, a sequence of 'snap stop' followed by 'snap start' does not reset the card causing the problem. The PSI driver complains about a download failure.
    (3) JAGab65383
        System hang caused by negative spinlock depth. The kernel instrumentation that panics the machine when the spinlock depth goes negative was installed:
            protection.s $Date: 96/11/22 11:00:38 $ $Revision: 1.10.98.4 $ PATCH_10.20 (PHKL_9152/WTEC A5303386)
        and the machine panicked with the following stack trace:
            LEVEL    FUNC                                ARG0       ARG1      ARG2
            lev  0)  panic+0x10                          n/a        n/a       n/a
            lev  1)  report_trap_or_int_and_panic+0xe8   0x2        0x9       0x69c6e0
            lev  2)  interrupt+0x458                     n/a        0x69c6e0  n/a
            lev  3)  $ihndlr_rtn+0x0                     n/a        n/a       n/a
            lev  4)  mpn_spinunlock_ul4_brn_target+0x8   0x5066380  n/a       n/a
            lev  5)  psi1_isr+0x13c                      0x5060e00  0         n/a
            lev  6)  eisa_int+0x10c                      0x5060a00  0         n/a
            lev  7)  lasi_interrupt+0x88                 0x8        0x69c030  n/a
            lev  8)  mp_ext_interrupt+0x2a0              0x69c030   n/a       n/a
            lev  9)  $RDB_int_patch+0x58                 n/a        n/a       n/a
            lev 10)  pdc_call+0x17c                      n/a        n/a       n/a
    (4) JAGab65307
        System hang caused by negative spinlock depth.
        The kernel instrumentation that panics the machine when the spinlock depth goes negative was installed:
            protection.s $Date: 96/11/22 11:00:38 $ $Revision: 1.10.98.4 $ PATCH_10.20 (PHKL_9152/WTEC A5303386)
        and the machine panicked with the following stack trace:
            LEVEL    FUNC                                ARG0       ARG1      ARG2
            lev  0)  panic+0x10                          n/a        n/a       n/a
            lev  1)  report_trap_or_int_and_panic+0xe8   0x2        0x9       0x69c6e0
            lev  2)  interrupt+0x458                     n/a        0x69c6e0  n/a
            lev  3)  $ihndlr_rtn+0x0                     n/a        n/a       n/a
            lev  4)  mpn_spinunlock_ul4_brn_target+0x8   0x5066380  n/a       n/a
            lev  5)  psi1_isr+0x13c                      0x5060e00  0         n/a
            lev  6)  eisa_int+0x10c                      0x5060a00  0         n/a
            lev  7)  lasi_interrupt+0x88                 0x8        0x69c030  n/a
            lev  8)  mp_ext_interrupt+0x2a0              0x69c030   n/a       n/a
            lev  9)  $RDB_int_patch+0x58                 n/a        n/a       n/a
            lev 10)  pdc_call+0x17c                      n/a        n/a       n/a
    (5) JAGab65300
        The problem occurs when 2 PSI cards are installed, and the link is started on the second PSI card. The error reported is:
            LOG Message 4096 - 13, Subcode: 0 - 11
            Log category: EXCEPTION Cause Type: Internal
            System: h0760101
            ASSERT: File name = ../../c/cnbase/nbasched.c
            Line number = 924
            Expression = !NB_IN_LIST(ips->lqe)
        The connection to the host is lost when the problem occurs.

    PHNE_17403:
    (1) 5003441717
        The snaperrlog process fails to terminate when a hung 'snap start' is aborted with Ctrl-C. Attempts to restart SNAplus2 software using 'snap start' will fail because the snaperrlog process still exists from the previously aborted start.
    (2) 4701413054
        System panic - Data Page Fault at nsm_process_record_from_ss+130
    (3) 4701399279
        The PSI firmware header is not recognized by the snapwhat command.
    (4) 1653289686
        If you use a TN3270 (not E) client and hit the clear key while TN Server is presenting an SSCP screen, the client will lock up.
    (5) 1653289603
        If you use a TN3270 (not E) client and hit the clear key while TN Server is presenting an SSCP screen, TN Server forwards the clear key to the host (sends an empty RU on the SSCP-LU session). The host may respond with an error message.
        For example, after using the 'Clear Screen' function while in an SSCP-LU session, the following message was returned to the emulator:
            'LU= ED8A5008 UNSUPPORTED FUNCTION '

    PHNE_16810:
    (1) 4701405597
        Panic in nrm_bld_and_send_deact_sess when called from nrm_ps_abend_proc with NULL pointer as deact_sess buffer pointer.
    (2) 4701399527
        Assert errors are produced when the host sends a USSMSG10 screen to an LU configured for LU6.2. The ASSERTs are in fact benign, and will cause no problems with the integrity of the system. The ASSERTs only occur when the USSMSG10 screen is segmented, and greater than around 500 bytes in size.

    PHNE_15052:
    (1) 5003398354
        If you issue 'snapadmin define_local_lu' for an LU which is already defined, only the attach routing data and the description field are updated and all other parameters are ignored. However, the snapadmin command does not indicate this and gives a successful return message.
    (2) 4701395459
        The following assert message is logged to the console, syslog file and sna log files.
            Assert ips->cont_size >= MU_CONT_SIZE from vtc.c
    (3) 4701393256
        Various system panics. Stack traces show:
            1) Cyclic loop (stack overflows) with nbm_free_buffer and nba_send_ips
            2) Panic at vds_route_ips+0x70
            3) Panic in nbm_free_buffer after calling various routines to n*_free_queue
    (4) 4701392290
        If you issue snapadmin aping and specify the lu_alias parameter, the system will hang.
    (5) 4701389601
        The cyclic trace added for SR 1653236075 produced corrupted trace output.
    (6) 4701388983
        When connecting to Brixton Software running as PU5, we cannot activate a connection. The Brixton stack sends an XID with an error vector, and we log an error message reporting this and giving a sense code of 088C0EF1.
    (7) 4701388124
        Various system panics.
Stack traces show: 1) Cyclic loop (stack overflows) with nbm_free_buffer and nba_send_ips 2) Panic at vds_route_ips+0x70 3) Panic in nbm_free_buffer after calling various routines to n*_free_queue (8) 1653261669 SDLC link shows DISABLED after restart of SNAPLUS2 daemon (9) 1653260489 Various system panics. Stack traces show: 1) Cyclic loop (stack overflows) with nbm_free_buffer and nba_send_ips 2) Panic at vds_route_ips+0x70 3) Panic in nbm_free_buffer after calling various routines to n*_free_queue (10) 1653256123 Message 4096 - 13 asserts in snaplus2 error log (11) 1653241521 Customer's original 'problem' was because LU is allocated from default pool on issuing TP_STARTED verb. This is working as designed and will not be fixed. However, often when attempting to allocate an LU from the default pool you get an error on TP_STARTED with a secondary rc of 0x50300000 - this is a bug in the code which determines if an LU is available. PHNE_14552: (1) 5003395475 SDLC link recovery requires system restart (2) 4701385500 Enhancement to support Streams linked HMOD for Stratus. (3) 4701382572 System hang when snapdaemon starts after patch install. (4) 4701385815 Connection dropped and tn server core dumps when client sends hex 'FF4C' in the midst of data. (5) 1653252965 If you configure format 0 XID, we only ever send NULL XIDs. (6) 1653236075 System panic with various stack traces. Each stack trace shows 'ch' as the component which is calling the nbase to free a buffer - panic finally occurs in nba_account_buffer_out. PHNE_13642: (1) 5003395376 SNAPlus2 does not send RTM statistics when session ends if Host has not sent a soliciting RTM. (2) 5003379065 SDLC link dies when kernel level tracing is activated. (3) 4701379560 No message displayed when monitoring is active (4) 1653249474 System panic with ASSERT at line 68 of ncsxidpr followed by crash in ncs_exchange_and_conn_counts. (5) 1653248732 snaptnsrvr process exits without core. 
    (6) 1653246231
        System panic, with stack trace:
            sdl_reset_port_rsp
            sdl_hms_ctl_proc
            sdl_receive_proc
            sna_sdlc_nba_dispatch_process
            sna_sdlc_nba_scheduler
            vsi_stream_uw_service
        following 3 asserts:
            WARNING: SNAP-IX ASSERT: 12:57:54 14 JAN 1998
            File: ../../p/vsdlc/sdlcsigi.c Line: 1482
            Condition: pcb->resetting == FALSE

            WARNING: SNAP-IX ASSERT: 12:57:54 14 JAN 1998
            File: ../../p/vsdlc/sdlcsigi.c Line: 648
            Condition: pcb->resetting

            WARNING: SNAP-IX ASSERT: 12:57:54 14 JAN 1998
            File: ../../p/vsdlc/sdlcsigi.c Line: 660
            Condition: NB_NEXT_IN_LIST(pcb->station_root) != NULL
    (7) 1653239624
        The TN Server process simply dies, meaning that clients can no longer connect and all active clients are thrown off.

    PHNE_13427:
    (1) 1653240846
        ASSERTS in vbaaccess and elsewhere then crash in vtr_write_dlc_msg_to_buf.
    (2) 4701342899
        HA / MC ServiceGuard could be unusable because the script described in the Administration Guide is based on the return code. A value of 0 indicates that the link station was active at a time when snapmon was running; a non-zero value means the link station was not seen active.
    (3) 4701375600
        If you issue a query_lu_0_to_3 NOF API verb with a buffer greater than 65535 bytes, then we return incomplete data.

    PHNE_12954:
    (1) 1653230284
        TN3270 client is unable to connect to TN Server when using a slow dial-up TCP/IP Connection if a BIND is received by TN Server before negotiation with the client completes.
    (2) 5003388793
        If you issue an ALLOCATE verb with AP_SAME as the security type, the system will always set the user_id on the outgoing attach as the unix user ID which the TP was running under.

    PHNE_11961:
    (1) 1653223529
        Allocate Immediate hangs after CNOS used at mainframe to reset session limits.
    (2) 5003378646
        System panic in ntc_segment_reassembly

    PHNE_9663:
    (1) 1653204875
        System panic processing init self response.
Stack trace was: nsm_fsm_status+0x6cc nsm_process_init_self_rsp+0x17c nsm_process_dependent_mu+0x388 nsm_process_record_from_hs+0x198 nsm_queue_handler+0x7c nba_dispatch_process+0xd0 nba_scheduler+0x208 (2) 1653212332 Certain TN3270 products will not connect to TN Server because we do not support the correct TN3270 regime. (3) 5003352336 Enhancement - allow a non queued tp to loop round and reissue a receive allocate verb. If an attach comes in, a new TP will be forked and exec'd ONLY if an existing TP isn't sitting with a rcv alloc outstanding. (4) 5003362665 Session lost when trying to switch applications on some mainframe systems with TN server only. Trace shows we transmit UNBIND after receiving SIGNAL when in receive state (just are receiving CD). 2 ASSERTS at line 42 on nchdutil and NACK-2 1002 sent internally by CH to TN. (5) 5003368696 If a host sends RMA in 3270 datastream it will be bounced by TNSERVER and the session unbound. If RMA ever sent by another TN Server to our TN3270 client it will be rejected. (6) 5003369736 If a BIND is received where the length byte for PLU is present, but is set to zero, the BIND will be rejected. PHNE_9651: (1) 1653206953 It is a trap in nms_nah_process_mu_rsp indirecting on the verb_sig pointer which is taken from the head of the sess_entry->process_queue (line 198 in nmsnah.c). The scenario was that everything was active, the host was shut down and when the host link was restarted, this trap occurred. (2) 1653206961 Using RUI interface, receive non-negotiable BIND with a certain maximum RU size. Sometimes RUI_WRITEs of data within that maximum RU size will fail. (3) 1653206979 Started leased connection to host. As soon as link came up (ie before ACTLUs all processed), issued 'random ' command, with bad tp name. Node crashed. (4) 1653207027 When SNAP-IX received an invalid LOCATE which did not have FQPCID control vector, a system panic occurred in nds_add_to_fqpcid_table. 
(5) 1653207035 After running a random test which allocates between pairs of LUs (mole), QUERY_DIRECTORY_ENTRY returns an LU name with NETID=APPN, but LU name = 8 null characters. (6) 1653207043 System can panic under low memory conditions when processing session activation. (7) 1653207084 Unable to connect Attachmate TN3270 to SNAPlus TN Server. (8) 1653207092 Symptoms of problem: panic in nrm_bld_and_send_deact_sess called from nrm_ps_abend_proc with NULL pointer as deact_sess buffer pointer. (9) 1653207100 Failure is seen with END node and LEN node, with others versions of SNAplus2 too. It seems OK for NN node on previous versions . picot:> snapadmin -a -d query_directory_stats gives a line of -, and hangs. (10) 1653200352 Crash when running SDLC on MP box. Problem seen when using 2 ACC cards and many links. Problem caused by upper HMOD stub code and call back HMOD stub code running at the same time in parallel on different CPUs. (11) 1653203505 RUI_WRITE verb sometimes rejected with return code RU_LENGTH_ERROR (when the length is OK). Problem occurs if application sends a long BIND RSP (>1 byte) to a non-negotiable BIND (normal case). We are then looking at uninitialised data to get the RU length. (12) 1653204636 BIND from Power to RJE rejected. In fact any BIND that ends with user data is rejected. (13) 1653199224 When using format 0 XIDs can fail to connect with the host (14) 1653199091 After stopping and restarting the sna daemon, the system panics with the following stack trace: vsm_rcv_appl_ready_req+0x390 vsm_rcv_datagram+0x348 vpm_rcv_msg+0x80 vpr_stream_input_msg+0x458 vpr_stream_uw_svc+0x490 (15) 1653195644 Symptoms of problem: Issue a register TP call,then accept incoming in non blocking mode. If a de-register TP call is made before an attach is received, then the system kernel panics upon receipt of an attach. (16) 4701341289 Every time CPI-C TP terminates normally we get an error log (exception) type 2-3. 
(17) 1653192419 If the LUWID on a received attach does not contain a fully qualified LU name it will be rejected. (18) 5003343921 If remote station not on local FDDI ring, we reissue a second TEST frame 0.25 seconds after the first TEST. This is bounced by DLPI, and an error 12288 - 18 is logged (error=1, DLPI primitive=45). (19) 1653187203 The host suddenly sends UNBIND with 'normal end of session' type just after exchanging STSN request/response. (20) 4701325498 The SNAplus2 product at MR does not support autoactivation of sessions. Sessions are activated upon demand instead of when the link is brought up. The impact of this is minor, except for customers using APPC or CPI-C TPs which use the AP_IMMEDIATE return control type on their allocate verbs. These allocate verbs fail if the sessions they are expecting to use are not active. A patch will be provided in the near future which allows preconfiguration of an autoactivation limit for APPC modes. This patch will allow SNAplus2 APPC and CPI-C TPs which use the AP_IMMEDIATE return control type to function just as they did on SNAplus. If you have TPs which use the AP_IMMEDIATE return control type, you should install this patch after installing the SNAplus2 software. Defect Description: PHNE_22811: (1) JAGad39485/8606170221 The code was attempting to send NOTIFY when inappropriate. Resolution: APPN code has been corrected to enhance checking to see if NOTIFY is required. (2) JAGab25510/5003467290 SNAplus2 does not negotiate down the maximum BTU size from the largest frame size received in the RIF on the TEST_RSP (route discovery frame) received from the token ring driver. Resolution: Corrected the code to process the RIF before sending the size back to SNAP APPN. (3) JAGad50745/8606181529 A TP has terminated at the same time as a CNOS race condition is occurring. The code tries to access a message that has already been released and this causes a system panic. 
Resolution: Test for the presence of the get_session on the mode control block before trying to process it. (4) JAGad77708/8606208520 Uncorrelated NMVT RSP received. Resolution: The code has been changed to ignore uncorrelated NMVT RSP. PHNE_20437: (1) JAGab79003/8606108556 During nio_initialize, the driver code checks for NULL IOVAs returned from sio_map() and will panic if the returned IOVA is NULL. However, a NULL IOVA is still valid and no panic should occur. Resolution: Fix is to remove the panic on NULL IOVAs after sio_map() calls. Also in the step data structure, invalid IOVAs are redefined to be -1 (void* 0xffffffff) instead of 0. Also, change all checks for 0 IOVAs to be -1 IOVAs. (2) JAGab78436/8606108079 The problem is caused by a bug in vns_lr_receive_proc whereby under specific circumstances a buffer is freed but the routine continues on to try and access the buffer. The memory buffer is checked after it is released. If scrambled data was written beyond the buffer limit then the machine will panic with isr.ior pointing to the buffer address. The cause of this is again attempting to access an IPS for trace purposes that has been sent on. The process of sending the IPS will cause the memory we are accessing to be freed. Resolution: After the buffer is freed, the routine should be exited - fix is to add the line 'goto EXIT;' after the buffer is freed off. This stops additional processing of the buffer. (3) JAGab74185/8606105839 The problem is caused by a STATUS_SESSION(NO_SESSION, LU_INACTIVE) followed by an CLOSE_PLU_SLU_SEC_RQ. The CLOSE causes a dummy UNBIND to be built and queued, but the CLOSE kills the session between the RUI_BID returning and the subsequent RUI_READ. Resolution: Check the reason for the CLOSE - if it is due to a LINK error or because the PU or LU is inactive, then simply kill the session - don't send the UNBIND down to the application. 
(4) JAGab74007/8606105750 The defect occurred as follows: -user clicks on stop to deactivate a QLLC connection -QLLC driver sends a QRD frame to the host -for some reason the QRD frame is ignored by the host -any attempt to stop the system will fail because there is no response to the QRD. Resolution: Implemented a fix whereby a (short) timeout is started when the QRD is sent. Also included a QDISC in the fix because it was using a full-length timeout. (5) JAGab72203/8606104853 The problem occurs because stat_type_exp is unexpectedly non-zero when the RUI_READ for the SDT is processed. The stat_type_exp records the existence of an outstanding STATUS_CTL message (i.e. a STATUS_CTL message received from CH and passed to the application, but for which no response has been received). The stat_type_exp should not be set when the SDT is received as it is cleared when OPEN_PLU_SLU_SEC_RQ is received and no STATUS_CTL messages have been passed from CH since. However, the OPEN_PLU_SLU_SEC_RQ is converted to a STATUS_CTL message internally by RUI which, critically, leaves status_control.qualifer with a floating value. If this qualifier has the value 1, then the BIND registers in stat_type_exp as an outstanding STATUS_CTL message which is never cleared. This fails the SDT. Resolution: The fix is to force the qualifier to 0. (6) JAGab71519/8606104163 In certain circumstances RM will be asked to delete an SCB that was created by another instance of RM. If this happens then we can assert or crash when trying to delete the session. Resolution: Fix is to store the process ID of the RM instance that creates the SCB on the SCB, then when RM receives a DEACTIVATE_CONV_GROUP (which is routed from the NOF using the LU name/alias field) it can check that the SCB was created by this instance of RM and reject the signal (with new secondary return code NAP_LUNAME_CGID_MISMATCH). 
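The ownership check described in item (6) of PHNE_20437 above can be sketched roughly as follows. This is only an illustration and not the patch's actual code: the scb_t structure, its owner_pid field, the scb_create() and rm_deactivate_conv_group() functions, and the numeric values of the return codes are all hypothetical. Only the NAP_LUNAME_CGID_MISMATCH name is taken from the patch text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical return codes; the numeric values are invented for this sketch. */
#define NAP_OK                    0
#define NAP_LUNAME_CGID_MISMATCH  1

/* Hypothetical, heavily simplified session control block (SCB). */
typedef struct scb {
    int owner_pid;   /* process ID of the RM instance that created the SCB */
    int active;      /* non-zero while the session still exists */
} scb_t;

/* Record the creating RM instance on the SCB at creation time. */
void scb_create(scb_t *scb, int rm_pid)
{
    assert(scb != NULL);
    scb->owner_pid = rm_pid;
    scb->active = 1;
}

/*
 * Handle a deactivate request in the RM instance identified by my_rm_pid.
 * An SCB created by a different RM instance is left untouched and the
 * request is rejected, instead of being deleted by the wrong owner.
 */
int rm_deactivate_conv_group(scb_t *scb, int my_rm_pid)
{
    assert(scb != NULL);
    if (scb->owner_pid != my_rm_pid)
        return NAP_LUNAME_CGID_MISMATCH;
    scb->active = 0;
    return NAP_OK;
}
```

In this sketch a request routed to the wrong RM instance returns the mismatch code and leaves the SCB active, which corresponds to rejecting the signal rather than asserting or crashing during the delete.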
    PHNE_18487:
    (1) JAGab65539
        The problem is due to a code defect in the psi driver when attempting to process multiple DMA transactions.
        Resolution: The fix consists of processing 1 DMA transaction at a time instead of processing queued DMA transactions. The DMA transactions are still queued but the DMA engine processes only one transaction at a time. It does not prefetch DMA transactions because we force it to stop and generate an interrupt after having processed every transaction. When the driver gets the interrupt related to the completion of a DMA transaction, it starts processing the next DMA transaction in the queue.
    (2) JAGab65400
        The protocol between driver and firmware requires a message exchange between the two for the link to start up. If the driver and firmware become out of sync a link failure occurs.
        Resolution: The fix is to make sure that the driver and firmware, once they become 'unsynchronized', have a way to be re-synchronized. This is done by:
        A) If the firmware hits an error condition that it does not know how to handle, it will set a system error and 'jump' to the first line of the firmware code (i.e., first line of main). This is done by timing out on inactivity in the OPEN_PEND and CLOSE_PEND states of the firmware, and also by making sure that unexpected errors result in a declaration of system error, which allows the firmware code to reset.
        B) To make sure that the driver will not be hung, the driver code will start out with a credit of 2 when it initiates data transfer with the card. This is to prevent situations in which both the firmware and the driver are waiting for each other to send a message. With the new credit assignment during initialization, the driver will always be able to initiate action on the card. Since the code is designed assuming only 1 outstanding message to the firmware at a time, the driver has a credit check to make sure the credit value is not greater than 2.
        It was also discovered in the trace of the PSI1 driver that the driver tries to send a shutdown when its buffer credit is less than 2. Furthermore, the send_shutdown() routine never checked the return code of the send buffer routine. The same problem occurred when the driver sent out the CONFIG message to the firmware. As a result, the driver waits for the response from the firmware and times out. This seems to occur when there is an unexpected shutdown, as in the loss of the DSR signal.
        The cause of the problem was the firmware looping forever inside the rx_frame() routine. This routine is called from within s1_isr() (the ISR for handling SCC interrupts). It looped forever looking for an unused buffer. If it found one, it would exit the loop. The solution is to have the rx_frame() routine examine the RX_BD value (a pointer to the last receive buffer serviced by the SCC). The rx_frame() routine will only visit the buffers from the one visited during the last rx_frame() call to the one pointed to by the current RX_BD value. Thus, the number of buffers visited is limited to 8, the number of receive buffers available.
        When this rx_frame() problem was fixed, the FRMR (frame reject) problem appeared. This was observed after turning on the sna trace. It was caused by the driver sending the SDLC layer a frame that it had sent before. The _EnabInt() and _s1_isr routines (written in 68k asm) were not saving all the registers they used. As a result, data corruption could occur. Correcting this also fixed the FRMR problem.
    (3) JAGab65383
        The panic in the psi1_isr() routine is due to two consecutive calls to the spinunlock() routine. The initial implementation of the spinlock management in the psi1 driver made the assumption that only one event at a time may be handled by the interrupt subroutine. This assumption was wrong.
        Resolution: The fix consists of changing the spinlock management in the psi1_isr() routine. We have attempted to hold the spinlock on the PD data structure for as short a time as possible.
(4) JAGab65307 The panic in the psi1_isr() routine is due to two consecutive calls to the spinunlock() routine. The initial implementation of the spinlock management in the psi1 driver made the assumption that only one event at a time may be handled by the interrupt subroutine. This assumption was wrong. Resolution: The fix consists in changing the spinlock management in the psi1_isr() routine. We have attempted to keep the spinlock on the PD data structure as short as possible. (5) JAGab65300 The problem with the use of two PSI cards occurs because two pieces of code are trying to access the same LQE at once. Resolution: Supplied a library to the customer that demonstrates this problem is resolved if the extra locking added for SR 1653200352 for MP systems is also activated for UP systems. That locking used the MP_SPINLOCK HP system macro, which is defined in /usr/include/sys/spinlock.h and macros to 0 on uniprocessor systems. The fix therefore is to replace MP_SPINLOCK with spinlock and MP_SPINUNLOCK with spinunlock in the source files listed above. Note that this is an R5-only fix. In R6 this area of the code has been substantially modified from R5 so the problem should not occur in the first place. PHNE_17403: (1) 5003441717 If the kernel initialisation fails, it is possible that the snaperrlog process could hang - waiting for a signal from the kernel which never arrives. Resolution: A code change has been made to ensure that ,if the kernel initialisation fails, a failure notification is sent to the snaperrlog process so it can exit cleanly. (2) 4701413054 Small timing window when there is an empty list of LULU control blocks when processing SSCP_INIT_SIGNAL_NEG_RSP ISP. Resolution: Code changed to check whether LULU list is empty before trying to obtain first element of it. (3) 4701399279 The PSI f/w header string was not changed with the release of SNAplus2 as the f/w is common to both SNAplus & SNAplus2. 
        Resolution:
            - a new what string for the NIO firmware
            - a new what string and a new compilation format for the EISA firmware
        The ']' character has been added at the beginning of each PSI firmware library header line so that the header can be recognized by the snapwhat command.
    (4) 1653289686
        TN3270 client was locking up when the clear key was entered because TN Server was passing the clear command to the Host instead of processing it locally (as is done in the Motif 3270 emulator for example).
        Resolution: Code changed to add check and special handling for the clear key at the beginning of the TN Server SSCP inbound MU processing.
    (5) 1653289603
        TN3270 client was receiving SSCP data when the clear key was entered because TN Server was passing the clear command to the Host instead of processing it locally (as is done in the Motif 3270 emulator for example).
        Resolution: Code changed to add check and special handling for the clear key at the beginning of the TN Server SSCP inbound MU processing.

    PHNE_16810:
    (1) 4701405597
        Ensure we only decouple SCB from RCB when completely finished with a conversation.
    (2) 4701399527
        A code change has been made to prevent Assert errors occurring when a large USSMSG10 is received for an LU6.2 session. The maximum amount of data permissible on the SSCP screen has been increased to 2048 bytes, to ensure segmented data on the SSCP screen is handled correctly.

    PHNE_15052:
    (1) 5003398354
        Code changed to ensure that if the user specifies any other parameters, they match those used on the initial define. Produce an error code otherwise.
    (2) 4701395459
        This was an incorrect ASSERT which has been removed. It is a benign problem, but produces annoying error logs and console messages.
    (3) 4701393256
        Code changed to ensure we pass LQE by pointer, and not by value, when freeing a queue. Otherwise the buffer at the head of the queue remains on the LQE value in the calling routine. This buffer can then be reused, causing corruption on the CH queues.
(4) 4701392290 Code change made to ensure that when allocating buffers, a check should be made for the correct offset within aping command for the data length parameter. (5) 4701389601 Correct the corruption in the trace written to the kernel buffer. Also add a user space formatter which can be run on the customer's system. (6) 4701388983 Code change to allow XID exchange to continue even if no PU Name vector included. (7) 4701388124 Code changed to ensure we pass LQE by pointer, and not value when freeing a queue - otherwise the buffer at the head of the queue remains on the LQE value in the calling routine. This buffer can then be reused causing corruption on the CH queues. (8) 1653261669 Send a signal to the host when firmware is ready (backplane and frontplane are initialized) Remove debug trace from msgbuf (opt1:) (9) 1653260489 Code changed to ensure we pass LQE by pointer, and not value when freeing a queue - otherwise the buffer at the head of the queue remains on the LQE value in the calling routine. This buffer can then be reused causing corruption on the CH queues. (10) 1653256123 The code has been changed to ensure we pass LQE by pointer, and not value when freeing a queue - otherwise the buffer at the head of the queue remains on the LQE value in the calling routine. This buffer can then be reused causing corruption on the CH queues. In some instances this corruption was the precursor to subsequent system panic but in this case it has produced a warning message only. (11) 1653241521 Code changed to ensure we correctly chain through all default LUs using the LU NAME field - otherwise we can miss LUs. PHNE_14552: (1) 5003395475 Code change made to fix DMA problems. (2) 4701385500 Add lower Streams routines for Stratus. Note that these are dummied out in HP HMOD. Fix only applied to R5.1 . R6 and onwards already includes this fix. (3) 4701382572 System hang was caused by semaphore being held . Code change made to both the libpsi0 and sdlc.pbs firmware. 
(4) 4701385815
Ignore invalid control codes rather than dropping connection.

(5) 1653252965
Code changed to ensure the length field is set up correctly
for format 0 XIDs when the node transmits the XID frame to the
DLC.

(6) 1653236075
Unable to determine exact cause of panic. Because problem has
not been reproducible the code has been changed to:
a) Add code to zero off key pointers when a buffer is freed so
   we can determine multiple frees much earlier.
b) Add in cyclic trace so we can see sequence of flows which
   cause the panic.

PHNE_13642:
(1) 5003395376
Code changed to initialise RTM control block.

(2) 5003379065
Trace shows that we are very busy (tracing for another
problem) and SNAP-LINK gets behind and is retransmitting
frames unnecessarily. Problem caused by sending 8 frames in
window.

(3) 4701379560
That message was not taken into account in R5.1

(4) 1653249474
Code change to handle non-activation XIDs on non-APPN links.

(5) 1653248732
Code changed to handle case of client exiting while we are
trying to send to him with SIGPIPE generated by sockets
library.

(6) 1653246231
It appears that this problem is due to a window in link
closedown where we have issued a close_port message down to
the HMOD, and are awaiting a response. While the HMOD is
closing, it detects a link failure and issues a callback -
this causes us to go into closedown processing again. While
not being able to ascertain exactly what the HMOD was doing, a
firewall fix to the SDLC driver has been written which will
correctly handle this situation safely, and prevent the panic.

(7) 1653239624
The problem is a scheduling problem. The sequence is:

    (schedule)
    close PLU ---------------------------->
    close PLU(OK)
    Status Session error ----------------------------->
    close SSCP <----------------------------
    (and close file descriptors)
    (schedule)

At this point the scheduling loop attempts to unblock the
socket, as there is still a message outstanding.
The socket file descriptor is now -1, as the file descriptor
has been closed. This breaks the computation of the fd mask
for the select, and things go downhill very rapidly.
The fix is to:
a) put an ASSERT in vtn_sched_chg_fd
b) bypass the message processing if we detect that the
   tni_state is TNI_STATE_RESET
Apparently, we don't need to discard the queued message, as it
is actually embedded in the control block.

PHNE_13427:
(1) 1653240846
Problem caused by receipt of BIND with send RU size of 0. Code
changed to handle this situation.

(2) 4701342899
In some cases the link loss is not detected.

(3) 4701375600
Code change to allow buffer size greater than 2 bytes.

PHNE_12954:
(1) 1653230284
Ensure we allow negotiation to complete within TN Server
before putting flow control on the session - regardless of
when the BIND is received.

(2) 5003388793
If the TP is running as root (and only if) then allow the user
to put a user_id on the allocate verb - this will be sent in
the attach. Otherwise, send the user ID the TP is running
under.

PHNE_11961:
(1) 1653223529
Correct the code that handles initially active sessions for
immediate allocates.

(2) 5003378646
Ensure that if an SSCP segment is received which is not BBIU,
the previous parts of the segment have been processed already.

PHNE_9663:
(1) 1653204875
Correct the APPN FSM to ensure the response is handled by the
correct routine within the APPN node.

(2) 1653212332
Since TN3270 regime is an optional part of the protocol, this
code has been removed, forcing the negotiation exchange to use
alternative supported methods.

(3) 5003352336
Code change made to provide Enhancement as per Problem
description.

(4) 5003362665
Corrected syntax of null RU built by TN server which was being
rejected by CH. Needed to set the dsf field and dcf correctly.

(5) 5003368696
Pass through RMA unmodified if received at TN Server or
TN3270.

(6) 5003369736
Code change made to allow the bind to be accepted.
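
The scheduling fix described under PHNE_13642 item (7) can be
illustrated with a small sketch. This is not SNAplus2 code: it
is a Python analogue, and the names Session, STATE_RESET and
build_read_set are hypothetical stand-ins for the control
block, TNI_STATE_RESET and the fd mask computation. It shows
only the two ideas in the fix text: skip sessions that are
already reset, and assert that no remaining session carries a
closed (-1) descriptor before the set is handed to select().

```python
import select
import socket
from dataclasses import dataclass

STATE_ACTIVE = "active"
STATE_RESET = "reset"  # hypothetical stand-in for TNI_STATE_RESET


@dataclass
class Session:
    fd: int      # socket descriptor, -1 once it has been closed
    state: str   # STATE_ACTIVE or STATE_RESET


def build_read_set(sessions):
    """Return the descriptors that are safe to pass to select()."""
    fds = []
    for s in sessions:
        if s.state == STATE_RESET:
            # (b) bypass processing for sessions already reset
            continue
        # (a) catch a closed descriptor before it reaches select()
        assert s.fd >= 0, "active session with closed descriptor"
        fds.append(s.fd)
    return fds


if __name__ == "__main__":
    a, b = socket.socketpair()
    sessions = [Session(a.fileno(), STATE_ACTIVE),
                Session(-1, STATE_RESET)]
    readable, _, _ = select.select(build_read_set(sessions), [], [], 0)
    print(readable)
    a.close()
    b.close()
```

The reset session is dropped before the descriptor set is
built, so its -1 descriptor never reaches select().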
PHNE_9651:
(1) 1653206953
Details of fix applied:
Set up RH in LM before sending NMVT to CH. (nlmdsscp.c)

(2) 1653206961
Details of fix applied:
Changed nru_write_lu() in nruwrtlu.c so that if the
application writes a BIND response but the BIND is
non-negotiable, we copy the data from the BIND request into
the BIND response.

(3) 1653206979
The code-arm has been changed to ensure that it is called only
once, the first time the session moves to the active state. An
assertion has been added that the temporary memory is still
present before it is freed.

(4) 1653207027
Details of fix applied:
After completing the parsing of the LOCATE in
nds_parse_locate, check if the FQP CID subfield was present.
If not, return sense codes 10140060 which will cause CP-CP
session deactivation.

(5) 1653207035
Details of fix applied:
Changed nss_bld_snd_locate_message (nsssinit.c) so that it
fills the fqc pname in the fqolu_name field of LOCATE_MESSAGE
when processing HPR_INIT_SIGNAL.

(6) 1653207043
Details of fix applied:
Changed nas_actpulu_processor (nasrecv.c) so that it sends a
-RSP if there is not enough memory to assign the LFSID when an
ACTPU or ACTLU is received from PS.
Changed nas_send_router (nassend.c) so that it returns a -RSP
to SM if there is not enough memory to assign the LFSID when
SM sends an ACTPU or ACTLU, or a BIND with SIDH = 1.
Changed nlm_send_actlu_rq (nlmpsfsm.c) so that it gets a big
enough buffer for the ACTLU RQ so that it can be converted
into a -RSP by AS if necessary.

(7) 1653207084
Details of fix applied:
Attachmate client is rejecting DO_3270_REGIME with
DONT_3270_REGIME message, when code is expecting
DONT_3270_REGIME message. Enhance TN server to support this
behaviour.

(8) 1653207092
Details of fix applied:
Ensure we do not dereference null data.

(9) 1653207100
Details of fix applied:
Correctly process query_directory_stats for LEN/EN in the APPN
node.
(10) 1653200352
Details of fix applied:
Add spinlock to protect shared data structures (the queue in
particular). Removed logs from call back context. Strategic
fix applied at V6.

(11) 1653203505
Details of fix applied:
Node fix to handle case of non-negotiable long response
correctly (use the parameters from the BIND request).

(12) 1653204636
The code fix applied to cappn/nasbind.c is to correct a length
test of user data from >= to >.

(13) 1653199224
Details of fix applied:
Ensure length field set up correctly.

(14) 1653199091
Details of fix applied:
Ensure control blocks used to process appl_ready messages are
initialised before processing occurs. If a TP has deregistered
its TP name it was possible for us to access a pointer which
had not been set up - hence the trap.

(15) 1653195644
Details of fix applied:
Ensure we clean up control blocks correctly at the point an
attach is being processed if the TP name has been
de-registered.

(16) 4701341289
Logging of this event is not now done to the error file. The
error case originally looked for is logged elsewhere.

(17) 1653192419
LUWID will not now be rejected if the LU NAME is not fully
qualified.

(18) 5003343921
Some routers do not require source routing information to
function and will continue to work correctly - however, even
with this patch some customers will still be unable to connect
to stations on remote rings. When the FDDI driver team add
source routing support to the HP-UX driver, this fix must be
removed.

(19) 1653187203
The RU data on a STSN (Set/Test Sequence Number) response
generated by SNAP-IX is set to an invalid value.

(20) 4701325498
The SNAplus2 product at MR does not support autoactivation of
sessions. Sessions are activated upon demand instead of when
the link is brought up. The impact of this is minor, except
for customers using APPC or CPI-C TPs which use the
AP_IMMEDIATE return control type on their allocate verbs.
These allocate verbs fail if the sessions they are expecting
to use are not active. A patch will be provided in the near
future which allows preconfiguration of an autoactivation
limit for APPC modes. This patch will allow SNAplus2 APPC and
CPI-C TPs which use the AP_IMMEDIATE return control type to
function just as they did on SNAplus. If you have TPs which
use the AP_IMMEDIATE return control type, you should install
this patch after installing the SNAplus2 software.

SR:
    8606108556 8606108079 8606105839 8606105750 8606104853
    8606104163 5003441717 5003398354 5003395475 5003395376
    5003388793 5003379065 5003378646 5003369736 5003368696
    5003362665 5003352336 5003343921 4701429779 4701429407
    4701416883 4701413054 4701405597 4701399527 4701399279
    4701395459 4701393256 4701392290 4701389601 4701388983
    4701388124 4701385815 4701385500 4701382572 4701379560
    4701375600 4701342899 4701341289 4701325498 1653308577
    1653302703 1653289686 1653289603 1653261669 1653260489
    1653256123 1653252965 1653249474 1653248732 1653246231
    1653241521 1653240846 1653239624 1653236075 1653230284
    1653223529 1653212332 1653207100 1653207092 1653207084
    1653207043 1653207035 1653207027 1653206979 1653206961
    1653206953 1653204875 1653204636 1653203505 1653200352
    1653199224 1653199091 1653195644 1653192419 1653187203
    8606208520 8606181529 8606170221 5003467290

Patch Files:
    /opt/sna/conf/lib/libpsi0.a
    /opt/sna/conf/lib/libpsi1.a
    /opt/sna/conf/lib/libsixd.a
    /opt/sna/conf/lib/libsixl.a
    /opt/sna/conf/lib/libsixp.a
    /opt/sna/conf/lib/libsixs.a
    /opt/sna/sdlc.dlf
    /opt/sna/sdlc.pbs
    /opt/sna/bin/snaprcf
    /opt/sna/bin/snapmon
    /opt/sna/bin/snaptnsrvr

what(1) Output:
    /opt/sna/bin/snapmon:
        ]B.10.20.002 SNAplus2 R5.1 Link Station Monitor ]
        (PHNE_13642: 98/01/30 15:36:45) ]
    /opt/sna/bin/snaprcf:
        ]B.10.20.002 SNAplus2 R5.1 Remote Command Facility daemon ]
        (PHNE_9651 : 96/10/30 10:41:53) ]
    /opt/sna/bin/snaptnsrvr:
        ]B.10.20.107 SNAplus2 R5.1 TN Server ]
        (PHNE_17403 : 99/01/11 18:37:09) ]
    /opt/sna/conf/lib/libpsi0.a:
        ]B.10.20.011 SNAplus2 R5.1 NIO PSI driver ]
        (PHNE_20437 : 99/11/09 11:02:32) ]
    /opt/sna/conf/lib/libpsi1.a:
        ]B.10.20.013 SNAplus2 R5.1 EISA PSI driver ]
        (PHNE_18487 : 99/08/06 04:45:51) ]
    /opt/sna/conf/lib/libsixd.a:
        ]B.10.20.002 SNAplus2 R5.1 NDLC to DLPI Mapping ]
        (PHNE_22811 : 99/11/30 17:20:20) ]
    /opt/sna/conf/lib/libsixl.a:
        ]B.10.20.008 SNAplus2 R5.1 SDLC in the Kernel ]
        (PHNE_18487 : 99/05/26 14:30:54) ]
    /opt/sna/conf/lib/libsixp.a:
        ]B.10.20.001 SNAplus2 R5.1 QLLC Module ]
        (PHNE_20437 : 98/04/22 15:34:58) ]
    /opt/sna/conf/lib/libsixs.a:
        ]B.10.20.052 SNAplus2 R5.1 Router in the kernel ]
        (PHNE_22811 : 01/08/07 13:53:14) ]
        ]B.10.20.038 SNAplus2 R5.1 APPN kernel library routines ]
        (PHNE_22811 : 01/08/07 13:52:47) ]
    /opt/sna/sdlc.dlf:
        SNAplus2 EISA FW v2.7 (99/07/30 12:58:42)
    /opt/sna/sdlc.pbs:
        ]SNAplus2 NIO FW v2.1 ](98/11/13 11:58:22)

cksum(1) Output:
    2958791104 36864 /opt/sna/bin/snapmon
    2163890184 61440 /opt/sna/bin/snaprcf
    1874665569 114688 /opt/sna/bin/snaptnsrvr
    1032167785 63504 /opt/sna/conf/lib/libpsi0.a
    2255153051 47504 /opt/sna/conf/lib/libpsi1.a
    1620319898 161600 /opt/sna/conf/lib/libsixd.a
    1666269896 406252 /opt/sna/conf/lib/libsixl.a
    4204635980 147908 /opt/sna/conf/lib/libsixp.a
    1162159993 2850184 /opt/sna/conf/lib/libsixs.a
    3715532193 105228 /opt/sna/sdlc.dlf
    3918812582 172212 /opt/sna/sdlc.pbs

Patch Conflicts: None

Patch Dependencies:
    s700: 10.20: PHNE_20438
    s800: 10.20: PHNE_20438

Hardware Dependencies: None

Other Dependencies: None

Supersedes:
    PHNE_9651 PHNE_9663 PHNE_11961 PHNE_12954 PHNE_13427
    PHNE_13642 PHNE_14552 PHNE_15052 PHNE_16810 PHNE_17403
    PHNE_18487 PHNE_20437

Equivalent Patches: None

Patch Package Size: 4140 KBytes

Installation Instructions:
    Please review all instructions and the Hewlett-Packard
    SupportLine User Guide or your Hewlett-Packard support
    terms and conditions for precautions, scope of license,
    restrictions, and, limitation of liability and warranties,
    before installing this patch.
    ------------------------------------------------------------
    1. Back up your system before installing a patch.

    2. Login as root.

    3. Copy the patch to the /tmp directory.

    4. Move to the /tmp directory and unshar the patch:

        cd /tmp
        sh PHNE_22811

    5a. For a standalone system, run swinstall to install the
        patch:

        swinstall -x autoreboot=true -x match_target=true \
            -s /tmp/PHNE_22811.depot

    By default swinstall will archive the original software in
    /var/adm/sw/patch/PHNE_22811. If you do not wish to retain
    a copy of the original software, you can create an empty
    file named /var/adm/sw/patch/PATCH_NOSAVE.

    WARNING: If this file exists when a patch is installed, the
             patch cannot be deinstalled. Please be careful
             when using this feature.

    It is recommended that you move the PHNE_22811.text file to
    /var/adm/sw/patch for future reference.

    To put this patch on a magnetic tape and install from the
    tape drive, use the command:

        dd if=/tmp/PHNE_22811.depot of=/dev/rmt/0m bs=2k

Special Installation Instructions:
    Stop SNA daemon before installing patch (snap stop).
    After installing the patch start the SNA daemon
    (snap start).
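
Once the patch is installed, the delivered files can be
compared against the cksum(1) Output section of this document.
The sketch below is one possible way to do that in Python,
assuming the listed values are standard POSIX cksum results
(CRC followed by byte count); the function names posix_cksum,
parse_cksum_line and matches are illustrative and are not part
of the patch.

```python
def _make_table():
    """Build the 256-entry table for the POSIX cksum CRC."""
    table = []
    for i in range(256):
        c = i << 24
        for _ in range(8):
            c = ((c << 1) ^ 0x04C11DB7) if c & 0x80000000 else (c << 1)
            c &= 0xFFFFFFFF
        table.append(c)
    return table


_TABLE = _make_table()


def posix_cksum(data: bytes) -> int:
    """CRC as defined for cksum: data bytes, then the length, then invert."""
    crc = 0
    for b in data:
        crc = ((crc << 8) & 0xFFFFFFFF) ^ _TABLE[(crc >> 24) ^ b]
    n = len(data)
    while n:  # append the length, least significant byte first
        crc = ((crc << 8) & 0xFFFFFFFF) ^ _TABLE[(crc >> 24) ^ (n & 0xFF)]
        n >>= 8
    return crc ^ 0xFFFFFFFF


def parse_cksum_line(line: str):
    """Split one 'crc size path' line into (int, int, str)."""
    crc, size, path = line.split(None, 2)
    return int(crc), int(size), path.strip()


def matches(path: str, expected_crc: int, expected_size: int) -> bool:
    """True if the file at path has the expected size and CRC."""
    with open(path, "rb") as f:
        data = f.read()
    return len(data) == expected_size and posix_cksum(data) == expected_crc


if __name__ == "__main__":
    crc, size, path = parse_cksum_line(
        "2958791104 36864 /opt/sna/bin/snapmon")
    print(path, matches(path, crc, size))
```

Reading each file fully into memory is acceptable here because
the largest listed file is under 3 MB.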