SUMMARY: mutexes and heavy load

2007-12-24 21:33:00

Hi!

OK. I've got the following answer from Don (thanks a bunch, Don), and it
seems to be a good explanation of what was going on with my system.

> There is a problem in Solaris 7 and early Solaris 8 OS systems with regard to
> large numbers of mutexes. As the number of mutexes increases in the system,
> the amount of cpu time they consume increases linearly after a certain point
> until they are consuming virtually all of the available cpu time, even if the
> system is essentially idle!

> There is a patch for Solaris 7 (which I don't have the number of any more)
> and one of the early Solaris 8 systems which is 108827-05.

I've found that patch for Solaris 7 - it's 106980-17. I applied it and
rebooted the machine. Now, after 2 days, it still works like a charm. We'll
see what it will be like after a few weeks...

And here is my original mail:

> Hi people!
>
> I've got a strange problem:
> The machine is a 14-processor domain on an E10k with Solaris 7 11/99. It
> runs Oracle 8.1.7 and some processes that use this database. There are
> about 1000 Oracle sessions. The machine is _heavily_ loaded - loadavg is
> 95. And... well, just look at some statistics:
>
> "vmstat 3" shows:
>   r b w     swap    free re  mf pi po fr de sr s0 s1 s2 s8         in     sy    cs  us sy  id
>   3 0 0    16928    1384  0   0  0  0  0  0  0  0  0  0  0 4294967196      0     0 -46 -8 -95
>  81 1 0 17316992 6124176  0  66  0  0  0  0  0  0  0  0  0       7880 589133 12364  75 25   0
>  87 3 0 17315136 6122600  0 133  0  0  0  0  0  0  0  0  0       9435 552462 14111  74 26   0
>  95 2 0 17311744 6120176  0 242  0  0  0  0  0  4  3  0  0       9574 493991 14674  75 25   0
>  90 0 0 17308128 6117512  0 301  0  0  0  0  0  0  0  0  0       8812 594396 13331  75 25   0
> 102 1 0 17305520 6115024  0 245  0  0  0  0  0  0  0  0  0       6746 590357 11123  76 24   0
>
> And "mpstat 3" shows
>
> CPU minf mjf xcal intr ithr  csw icsw migr smtx srw syscl usr sys wt idl
>   0    2   0    0  226    1  563  232   29 6601   0 43940  74  26  0   0
>   1   18   0   13 1005  474 1975  783   30 5776   0 30448  71  29  0   0
>   2   11   0   93  911  533 1736  668   35 6544   0 33818  80  20  0   0
>   3   28   0  815  173    1  413  167   23 5810   0 28223  68  32  0   0
>  28    0   0  537  456    1 1140  469   20 8601   0 46569  78  22  0   0
>  30    0   0    0  385    1 1080  404   23 7813   0 48830  76  24  0   0
>  32    0   0   34  262    1  673  270   22 8893   0 54499  78  22  0   0
>  33    1   0   17  302    1  792  316   16 7328   0 47958  79  21  0   0
>  34   19   0 5169  172    1  439  174   19 7511   0 42699  78  22  0   0
>  35    0   0   86  366    1  939  383   20 9280   0 55249  73  27  0   0
>  48  117   0  489  244    1  573  257   32 4971   0 27181  84  16  0   0
>  49   12   0 1248  205    1  530  209   26 8644   0 53641  74  26  0   0
>  50   12   0   13  428   71  585  212   20 9953   0 65631  54  46  0   0
>  51    0   0   43 2699 2276  596  209   20 7892   0 43045  64  36  0   0
>
> Look at the smtx column (!) - it's incredibly high!
>
> Then I ran "lockstat sleep 5", and here is a bit of its output:
>
> Adaptive mutex spin: 467216 events
>
>  Count indv cuml rcnt     spin Lock                   Caller
> -------------------------------------------------------------------------------
> 439790  94%  94% 1.00       36 tod_lock               uniqtime+0x10
>   1841   0%  95% 1.00        3 0x30000c6c000          untimeout+0x18
>   1828   0%  95% 1.00        2 0x30000c72000          untimeout+0x18
>   1663   0%  95% 1.00        4 0x30000c72000          timeout_common+0x4
>   1547   0%  96% 1.00        3 0x30000c6c000          timeout_common+0x4
>   1337   0%  96% 1.00        3 0x30000c75000          timeout_common+0x4
>   1305   0%  96% 1.00        2 0x30000c75000          untimeout+0x18
>   1255   0%  96% 1.00      102 0x30005c703c8          qfestart+0x204
>
> I have totally no clue what to do with this :-(
> Any suggestions will be helpful.
> I will summarize of course.
>
> best regards,
> Jedrzej Nasiadek
>
> p.s.
> One more thing - I've looked at the ps output - there is no cpu-consuming
> "pig"; the busiest process occupies 1.8% cpu.
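
For anyone who wants to keep an eye on the smtx column without reading every
mpstat report by eye, here is a minimal Python sketch. It assumes the mpstat
text has already been captured into a string and that the header row starts
with "CPU", as in the output quoted above. The function name smtx_by_cpu and
the threshold of 500 are my own illustrative choices, not anything from
Solaris or from Don's answer, and the sample rows are simply copied from the
original mail.

```python
def smtx_by_cpu(mpstat_text):
    """Return {cpu_id: smtx} parsed from mpstat output.

    Columns are located by name using the header row, so the position of
    smtx does not need to be hard-coded. If the text holds several reports,
    the values from the last report win.
    """
    result = {}
    header = None
    for line in mpstat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "CPU":
            header = fields  # remember column names
            continue
        if header is None or len(fields) != len(header):
            continue  # skip anything that is not a complete data row
        row = dict(zip(header, fields))
        result[int(row["CPU"])] = int(row["smtx"])
    return result


# Two rows copied from the mpstat output in the original mail.
SAMPLE = """\
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 2 0 0 226 1 563 232 29 6601 0 43940 74 26 0 0
50 12 0 13 428 71 585 212 20 9953 0 65631 54 46 0 0
"""

THRESHOLD = 500  # hypothetical cut-off, pick whatever suits your system

for cpu, smtx in sorted(smtx_by_cpu(SAMPLE).items()):
    status = "HIGH" if smtx > THRESHOLD else "ok"
    print(cpu, smtx, status)
```

Looking columns up by header name rather than by position keeps the sketch
usable if an mpstat release prints a different set of columns, as long as
"CPU" and "smtx" are still among them.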
