(Interim #2) Mysterious Sun Hangs

2007-12-25 8:38:00

Hi folks -- a while back I posted about a problem with our machines

hanging. I posted one interim report, and this is the second. We are

making progress. The problem is actually being worked by other folks

at work, but since I posted the original problem I am following up

for you. Sun is also working with us, in the person of Tim Smith. He

reads the list, and Tim - if you want to add more to this, please

feel free, as you are more familiar with the details than I am. I got

this second hand and I'm not in the office this week to check the

details, but here goes:

What our problem *looks* like is an infinite loop in the socket

buffer routines which appears to be caused by what looks like

a broadcast packet sent from a Solaris Compartmented Mode

Workstation (CMW). If it sounds like I'm hedging this

post with "looks like" and "appears", well I guess I am <:-)

Sun is doing some additional research into this. I will

keep you posted.

Jim Watson

Acting Technical Officer, System Programming Shop

Defense Intelligence Agency

Bolling Air Force Base

Washington, D.C. 20340

--- Standard disclaimers apply ---

------------- Original note follows ----------------

To: sun-managers@eecs.nwu.edu

Hi folks. Mixed environment:

Suns (4.1.x)

DECs (ULTRIX v?)

IBM 3090s

DOS/Novell

Macs (A/UX v?)

IBM Risc 6000 (v ?)

ethernet, with some routing and subnetting

Our problems started with a Sun 690MP running SunOS 4.1.3, no patches. Often

(sometimes twice or more a day) the machine just _locks up_ so hard that a

keystroke interrupt (L1-A) will not clear it. We normally force an interrupt

instead by unplugging and replugging the keyboard cable. We've taken several

dumps and portmapper is the last beast running.

We recently started transitioning our massive DOS/Novell installed base to

UNIX (Sun SPARC2s 4.1.3, mostly, some DECs, some Macs). Some of our new

SPARC2s are also hanging. (We've stopped taking dumps on them, but saw

portmapper in instances there as well.)

Interestingly, one group of machines on the same subnet as the 690MP, running

SunOS 4.1.2 did _not_ hang. At least until last week when we upgraded to

SunOS 4.1.3. Since then, three out of four upgraded machines have hung.

(No dumps taken.)

We see a message about "giant packet received from xx:yy:zz:...:"

and the STP cleared message. These are often the last messages on a hung

machine's console. We're seeing lots of these giant packets around the

net, but not always generating a crash.

Some folks think the following: Novell's sending out some IPX packets

which look like TCP/IP packets so the Suns are picking them up and trying

to do something with them and end up crashing.
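
One way to test that theory against captured traffic is to sort frames by their Ethernet framing. The sketch below is only an illustration, not anything Sun or our shop actually ran. It assumes you already have raw frames as bytes (header included, FCS stripped), and classify_frame is a made-up helper name. It relies on the standard conventions that a type/length field of 0x0600 or more is an EtherType, that 1500 or less is an 802.3 length, and that Novell's "raw 802.3" framing puts the IPX header (which begins with 0xFFFF) directly after the length field.

```python
import struct

# Well-known EtherType values.
ETHERTYPES = {
    0x0800: "ipv4",
    0x0806: "arp",
    0x8137: "ipx-ethernet-ii",
}

# 14-byte header plus 1500-byte payload; assumes the FCS is not in the capture.
MAX_FRAME_NO_FCS = 1514


def classify_frame(frame: bytes) -> str:
    """Return a rough label for one raw Ethernet frame (hypothetical helper)."""
    if len(frame) < 14:
        return "truncated"
    if len(frame) > MAX_FRAME_NO_FCS:
        return "giant"

    # Bytes 12-13 hold either an EtherType or an 802.3 length, big-endian.
    (type_or_len,) = struct.unpack("!H", frame[12:14])

    if type_or_len >= 0x0600:
        return ETHERTYPES.get(type_or_len, "other-ethertype")
    if type_or_len > 1500:
        return "undefined"

    # 802.3 length field. Novell raw 802.3 has no LLC header, so the
    # payload starts with the IPX checksum field, which is always 0xFFFF.
    if frame[14:16] == b"\xff\xff":
        return "ipx-raw-802.3"
    return "802.3-llc"
```

Counting the labels per source address would show whether the giant packets and the raw IPX frames come from the same stations, which is the question the theory turns on.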

Sun doesn't know _what_ the deal is. But, they are looking.

We've also got some developers on the net, maybe doing something with

RPCs. We've traced RPC connections and found nothing out of the ordinary.

Can anyone speak to this problem? Much obliged. Summary, as usual.

Thanks, Jim

---------------------------------------------------------------

Acting Chief, Network Operating Systems Shop

Defense Intelligence Agency

Bolling AFB

Washington, DC 20340

---------------------------------------------------------------

Standard disclaimers apply.

------------- End Original Note ----------------
