Dear all,
in our Scientific Linux cluster (SL 4.5, 64-bit) we have been getting the
following two errors:
wn014: Mar 12 09:09:20 wn014 kernel: knot[24574] trap divide error rip:405894 rsp:7fbfffe520 error:0
wn057: Feb 27 04:31:44 wn057 kernel: fs[4851]: segfault at 00000000ed410c22 rip 00000033ed3707b0 rsp 0000007fbfffb2a8 error 4
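For anyone reading the second message: on x86_64 the number after "error" in a segfault line is the hardware page-fault error code, where bit 0 distinguishes a not-present page from a protection violation, bit 1 a read from a write, and bit 2 kernel from user mode. Below is a minimal Python sketch that decodes it. It assumes the log format shown above, and the helpers `parse_segfault` and `decode_pf_error` are hypothetical names written for illustration. For "error 4" it reports a user-mode read of a page that is not present, which most likely means `fs` read an unmapped address (0xed410c22). The "error:0" in the divide-error line is not a page-fault code and is not decoded here.

```python
import re

# Hypothetical helper: matches the x86_64 segfault line format seen in our logs.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"rip (?P<rip>[0-9a-f]+) rsp (?P<rsp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
)

# x86 page-fault error code bits: (meaning when clear, meaning when set).
PF_BITS = [
    ("page not present", "protection violation"),  # bit 0 (P)
    ("read access", "write access"),                # bit 1 (W/R)
    ("kernel mode", "user mode"),                   # bit 2 (U/S)
    (None, "reserved page-table bit set"),          # bit 3 (RSVD)
    (None, "instruction fetch"),                    # bit 4 (I/D)
]


def decode_pf_error(code):
    """Describe each bit of a page-fault error code, lowest bit first."""
    parts = []
    for bit, (clear, set_) in enumerate(PF_BITS):
        text = set_ if (code >> bit) & 1 else clear
        if text:
            parts.append(text)
    return parts


def parse_segfault(line):
    """Extract process name, PID, addresses and decoded error from a log line."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    return {
        "comm": m.group("comm"),
        "pid": int(m.group("pid")),
        "addr": int(m.group("addr"), 16),
        "rip": int(m.group("rip"), 16),
        "rsp": int(m.group("rsp"), 16),
        # The kernel prints the error code in hex.
        "error": decode_pf_error(int(m.group("err"), 16)),
    }


line = ("wn057: Feb 27 04:31:44 wn057 kernel: fs[4851]: segfault at "
        "00000000ed410c22 rip 00000033ed3707b0 rsp 0000007fbfffb2a8 error 4")
print(parse_segfault(line)["error"])
# prints ['page not present', 'read access', 'user mode']
```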
The errors repeat many times (every 4-5 seconds) and reappear every few
days on most of the nodes.
All nodes run the 2.6.9-67.0.4.ELsmp x86_64 kernel and a small set of
ordinary services (they are grid worker nodes), plus the GPFS client
software to mount a GPFS filesystem.
Has anyone experienced this before? Could you point us to how we can
trace the source of the problem, or raise the debugging level, so that
we can provide more useful information?
Thanks and regards,
IASA IT TEAM