Panic cpu0 writeback data parity error

2007-12-25 10:09:00

     To All Sun Managers:

     

     I still have not been able to determine the cause of the panic[cpu0]
     error on our Ultra-2 with 200 MHz CPUs.

     

     

     Here is a summary of what I have tried so far.

     

     

     Original Question:

     

     To all Sun Managers:

     

     Not long ago, Kun Li sent out a summary about a panic[cpu0] problem
     that was tracked down to a CPU fault on Ultra 2 systems with 167 or
     200 MHz CPUs.

     

     According to that summary, the problem is fixed if the CPU number is
     501-4791-02 or later.

     

     I checked our system and found that our CPU number is
     501-4791-04-5596, yet the system still reboots itself at random,
     leaving the following error messages in the error log.

     

     I wonder whether the previous summary is correct, or whether the
     error message I got points to something else.

     

     

     Thanks for your feedback; I will summarize.

     

     

     Zion

     

     

     The system is a dual 200-MHz Enterprise Ultra-2 running Solaris 2.6.
     The machine is less than one month old and has rebooted at random
     twice.

     Apr 16 06:16:52 vizion unix: panic[cpu0]/thread=0x60fa1ba0: CPU0 Writeback Data Parity Error: AFSR 0x00000000 00800800 AFAR 0x000001ff 30000000
     Apr 16 06:16:52 vizion unix: syncing file systems... 56 49 49 49 49 49 49 49 49 49 49 49 49 49 49 49 49 49 49 49 49 49 49 49 49 49 49 49 49 49 49 49 49 49 49 49panic[cpu0]/thread=0x30043e80: panic sync timeout
     Apr 16 06:16:52 vizion unix: 7472 static and sysmap kernel pages
     Apr 16 06:16:52 vizion unix: 64 dynamic kernel data pages
     Apr 16 06:16:52 vizion unix: 199 kernel-pageable pages
     Apr 16 06:16:52 vizion unix: Copyright (c) 1983-1997, Sun Microsystems, In
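
     For anyone comparing several of these panics, here is a minimal
     Python sketch that pulls the CPU number, AFSR and AFAR out of a panic
     line like the first one in the log above. The function name
     parse_panic and the regular expression are illustrative only and are
     not part of any Sun tool. The sketch also assumes that the two hex
     words printed after AFSR and AFAR are the high and low 32-bit halves
     of one 64-bit register value.

```python
import re

# Matches a panic line of the shape seen in the log excerpt above.
# The pattern is an illustration, not an official message format.
PANIC_RE = re.compile(
    r"panic\[cpu(?P<cpu>\d+)\]/thread=(?P<thread>0x[0-9a-fA-F]+):\s+"
    r"CPU\d+\s+(?P<kind>.+?):\s+"
    r"AFSR\s+0x(?P<afsr_hi>[0-9a-fA-F]+)\s+(?P<afsr_lo>[0-9a-fA-F]+)\s+"
    r"AFAR\s+0x(?P<afar_hi>[0-9a-fA-F]+)\s+(?P<afar_lo>[0-9a-fA-F]+)"
)


def _join_halves(hi, lo):
    # Assumption: the two printed words are the high and low 32 bits.
    return (int(hi, 16) << 32) | int(lo, 16)


def parse_panic(line):
    """Return a dict describing the panic, or None if the line does not match."""
    m = PANIC_RE.search(line)
    if m is None:
        return None
    return {
        "cpu": int(m.group("cpu")),
        "thread": m.group("thread"),
        "kind": m.group("kind"),
        "afsr": _join_halves(m.group("afsr_hi"), m.group("afsr_lo")),
        "afar": _join_halves(m.group("afar_hi"), m.group("afar_lo")),
    }


if __name__ == "__main__":
    sample = (
        "Apr 16 06:16:52 vizion unix: panic[cpu0]/thread=0x60fa1ba0: "
        "CPU0 Writeback Data Parity Error: "
        "AFSR 0x00000000 00800800 AFAR 0x000001ff 30000000"
    )
    info = parse_panic(sample)
    print(info["cpu"], info["kind"], hex(info["afsr"]), hex(info["afar"]))
```

     Running this over every panic line in the error log would show
     whether the same CPU and the same AFAR address turn up each time.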

     

     Thanks to the following respondents:

     

     Colin Melville

     Sandeep

     Leif Ericksen

     

     Summaries:

     

     Leif suggested checking the fan in the CPU box; the fan is working
     fine.

     

     Sandeep said this error has been common to a batch of CPUs from Sun
     and suggested replacing them. Our CPU is pretty new and is at the
     -04 level, but the problem still appears.

     

     Colin suggested running vts to stress-test the processors. I ran it
     and the result was clean, with no errors at all.

     

     My last resort is to call Sun to come out and swap the CPUs. I will
     keep you all posted if I find anything new. It seems I am not the
     only one running into this problem, and I hope some Sun technical
     staff on this list can shed some light on the issue. I am new to Sun
     servers, and this is a brand-new system, yet it is too unreliable
     for production deployment. I am rather disappointed by my short
     experience with the Sun system so far. :(

     

     

     Zion
