SCIENTIFIC-LINUX-USERS Archives

December 2018

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV

Subject:
From:
Gilles Detillieux <[log in to unmask]>
Reply To:
Gilles Detillieux <[log in to unmask]>
Date:
Mon, 3 Dec 2018 10:21:10 -0600
Content-Type:
text/plain
Parts/Attachments:
text/plain (193 lines)
I've gotten the same errors on one of my SL 7 systems since the updates
on Monday & Tuesday last week. I thought it was a hardware failure of
the on-board GPU (an ASUS M5A78L-M/USB3 motherboard with on-board Radeon
3000), so I put in a PCIe graphics card (Radeon X1300/X1550) and didn't
have further problems. I have 6 other SL 7 systems with a similar m/b
and GPU, and didn't notice the same problem with any of the others, but
now that I scrutinize the logs I see that one other of these systems got
the same errors last Tuesday. It would seem that it doesn't affect all
Radeons equally, and it's inconsistent even on the 3000 chipset, but it
was happening quite frequently on one of my systems.

Looks like I need to do more testing on all these systems, and try
backing out kernel and/or Xorg updates to see which one brought in this
error.
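
For comparing several machines, a small script can tally the lockup messages per host and day. The following is only a sketch: it assumes the syslog layout seen in the quoted excerpt (one message per line in the real /var/log/messages, not wrapped as in this mail), and the names `LOCKUP_RE` and `count_lockups` are made up for illustration.

```python
import re
from collections import Counter

# Assumed layout, taken from the quoted excerpt:
#   Dec  3 13:39:30 lxcip04 kernel: [  282.721771] radeon 0000:01:05.0: GPU lockup (...)
LOCKUP_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+\d{2}:\d{2}:\d{2}\s+"
    r"(?P<host>\S+)\s+kernel:.*\bradeon\s+(?P<pci>[0-9a-f:.]+):\s+GPU lockup"
)


def count_lockups(lines):
    """Count 'GPU lockup' kernel messages per (host, 'Mon D') pair."""
    counts = Counter()
    for line in lines:
        match = LOCKUP_RE.match(line)
        if match:
            day = f"{match['month']} {int(match['day'])}"
            counts[(match["host"], day)] += 1
    return counts
```

It could be run over an open log file, e.g. `count_lockups(open("/var/log/messages", errors="replace"))`, once before and once after backing out an update, to see whether the count drops.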



On 2018-12-03 09:16, Andreas Nowack wrote:
> Hello,
>
> since the update of the Xorg packages last week I see the problems
> with the graphics card in /var/log/messages (see below). Every now and
> then, the graphics output is delayed. From time to time, the screen
> gets distorted during login or the computer does not accept any
> keyboard or mouse input.
>
> Is this a known problem?
>
>
> Dec  3 13:39:30 lxcip04 kernel: [  282.721762] radeon 0000:01:05.0: ring 0 stalled for more than 27178msec
> Dec  3 13:39:30 lxcip04 kernel: [  282.721771] radeon 0000:01:05.0: GPU lockup (current fence id 0x00000000000006d2 last fence id 0x00000000000006d5 on ring 0)
> Dec  3 13:39:30 lxcip04 kernel: [  283.222747] radeon 0000:01:05.0: ring 0 stalled for more than 27679msec
> Dec  3 13:39:30 lxcip04 kernel: [  283.222757] radeon 0000:01:05.0: GPU lockup (current fence id 0x00000000000006d2 last fence id 0x00000000000006d5 on ring 0)
> Dec  3 13:39:31 lxcip04 kernel: [  283.723741] radeon 0000:01:05.0: ring 0 stalled for more than 28180msec
> Dec  3 13:39:31 lxcip04 kernel: [  283.723751] radeon 0000:01:05.0: GPU lockup (current fence id 0x00000000000006d2 last fence id 0x00000000000006d5 on ring 0)
> Dec  3 13:39:31 lxcip04 kernel: [  284.224735] radeon 0000:01:05.0: ring 0 stalled for more than 28681msec
> Dec  3 13:39:31 lxcip04 kernel: [  284.224744] radeon 0000:01:05.0: GPU lockup (current fence id 0x00000000000006d2 last fence id 0x00000000000006d5 on ring 0)
> Dec  3 13:39:32 lxcip04 kernel: [  284.725725] radeon 0000:01:05.0: ring 0 stalled for more than 29182msec
> Dec  3 13:39:32 lxcip04 kernel: [  284.725735] radeon 0000:01:05.0: GPU lockup (current fence id 0x00000000000006d2 last fence id 0x00000000000006d5 on ring 0)
> Dec  3 13:39:32 lxcip04 kernel: [  285.226720] radeon 0000:01:05.0: ring 0 stalled for more than 29683msec
> Dec  3 13:39:32 lxcip04 kernel: [  285.226729] radeon 0000:01:05.0: GPU lockup (current fence id 0x00000000000006d2 last fence id 0x00000000000006d5 on ring 0)
> Dec  3 13:39:33 lxcip04 kernel: [  285.544838] radeon 0000:01:05.0: Saved 89 dwords of commands on ring 0.
> Dec  3 13:39:33 lxcip04 kernel: [  285.544851] radeon 0000:01:05.0: GPU softreset: 0x00000009
> Dec  3 13:39:33 lxcip04 kernel: [  285.544857] radeon 0000:01:05.0:   R_008010_GRBM_STATUS      = 0xE4703030
> Dec  3 13:39:33 lxcip04 kernel: [  285.544862] radeon 0000:01:05.0:   R_008014_GRBM_STATUS2     = 0x00110103
> Dec  3 13:39:33 lxcip04 kernel: [  285.544866] radeon 0000:01:05.0:   R_000E50_SRBM_STATUS      = 0x20000040
> Dec  3 13:39:33 lxcip04 kernel: [  285.544871] radeon 0000:01:05.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
> Dec  3 13:39:33 lxcip04 kernel: [  285.544875] radeon 0000:01:05.0:   R_008678_CP_STALLED_STAT2 = 0x00008002
> Dec  3 13:39:33 lxcip04 kernel: [  285.544880] radeon 0000:01:05.0:   R_00867C_CP_BUSY_STAT     = 0x00008084
> Dec  3 13:39:33 lxcip04 kernel: [  285.544884] radeon 0000:01:05.0:   R_008680_CP_STAT          = 0x80018645
> Dec  3 13:39:33 lxcip04 kernel: [  285.544889] radeon 0000:01:05.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
> Dec  3 13:39:33 lxcip04 kernel: [  285.599358] radeon 0000:01:05.0: R_008020_GRBM_SOFT_RESET=0x00007FEF
> Dec  3 13:39:33 lxcip04 kernel: [  285.599415] radeon 0000:01:05.0: SRBM_SOFT_RESET=0x00000100
> Dec  3 13:39:33 lxcip04 kernel: [  285.601524] radeon 0000:01:05.0:   R_008010_GRBM_STATUS      = 0xA0003030
> Dec  3 13:39:33 lxcip04 kernel: [  285.601530] radeon 0000:01:05.0:   R_008014_GRBM_STATUS2     = 0x00000003
> Dec  3 13:39:33 lxcip04 kernel: [  285.601534] radeon 0000:01:05.0:   R_000E50_SRBM_STATUS      = 0x20008040
> Dec  3 13:39:33 lxcip04 kernel: [  285.601539] radeon 0000:01:05.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
> Dec  3 13:39:33 lxcip04 kernel: [  285.601544] radeon 0000:01:05.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
> Dec  3 13:39:33 lxcip04 kernel: [  285.601548] radeon 0000:01:05.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
> Dec  3 13:39:33 lxcip04 kernel: [  285.601553] radeon 0000:01:05.0:   R_008680_CP_STAT          = 0x80100000
> Dec  3 13:39:33 lxcip04 kernel: [  285.601557] radeon 0000:01:05.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
> Dec  3 13:39:33 lxcip04 kernel: [  285.601565] radeon 0000:01:05.0: GPU reset succeeded, trying to resume
> Dec  3 13:39:33 lxcip04 kernel: [  285.621315] [drm] PCIE GART of 512M enabled (table at 0x00000000C0040000).
> Dec  3 13:39:33 lxcip04 kernel: [  285.621336] radeon 0000:01:05.0: WB enabled
> Dec  3 13:39:33 lxcip04 kernel: [  285.621344] radeon 0000:01:05.0: fence driver on ring 0 use gpu addr 0x00000000a0000c00 and cpu addr 0xffffa0f72f3adc00
> Dec  3 13:39:33 lxcip04 kernel: [  285.652545] [drm] ring test on 0 succeeded in 1 usecs
> Dec  3 13:39:43 lxcip04 kernel: [  295.747591] radeon 0000:01:05.0: ring 0 stalled for more than 10020msec
> Dec  3 13:39:43 lxcip04 kernel: [  295.747602] radeon 0000:01:05.0: GPU lockup (current fence id 0x00000000000006d4 last fence id 0x00000000000006d5 on ring 0)
> Dec  3 13:39:43 lxcip04 org.gnome.Shell.desktop[10501]: radeon: The kernel rejected CS, see dmesg for more information (-16).
> Dec  3 13:39:43 lxcip04 kernel: [  295.749700] [drm:r600_ib_test [radeon]] *ERROR* radeon: fence wait failed (-35).
> Dec  3 13:39:43 lxcip04 kernel: [  295.749748] [drm:radeon_ib_ring_tests [radeon]] *ERROR* radeon: failed testing IB on GFX ring (-35).
> Dec  3 13:39:43 lxcip04 org.gnome.Shell.desktop[10501]: radeon: The kernel rejected CS, see dmesg for more information (-16).
> Dec  3 13:39:43 lxcip04 org.gnome.Shell.desktop[10501]: radeon: The kernel rejected CS, see dmesg for more information (-16).
> Dec  3 13:39:43 lxcip04 org.gnome.Shell.desktop[10501]: radeon: The kernel rejected CS, see dmesg for more information (-16).
> Dec  3 13:39:43 lxcip04 org.gnome.Shell.desktop[10501]: radeon: The kernel rejected CS, see dmesg for more information (-16).
> Dec  3 13:39:43 lxcip04 org.gnome.Shell.desktop[10501]: radeon: The kernel rejected CS, see dmesg for more information (-16).
> Dec  3 13:39:43 lxcip04 org.gnome.Shell.desktop[10501]: radeon: The kernel rejected CS, see dmesg for more information (-16).
>
> Here is a list of installed relevant packages:
>
> kernel-3.10.0-957.1.3.el7.x86_64
> xorg-x11-apps-7.7-7.el7.x86_64
> xorg-x11-docs-1.6-7.el7.noarch
> xorg-x11-drivers-7.7-6.el7.x86_64
> xorg-x11-drv-ati-18.0.1-1.el7.x86_64
> xorg-x11-drv-dummy-0.3.7-1.el7.1.x86_64
> xorg-x11-drv-evdev-2.10.6-1.el7.x86_64
> xorg-x11-drv-fbdev-0.5.0-1.el7.x86_64
> xorg-x11-drv-intel-2.99.917-28.20180530.el7.x86_64
> xorg-x11-drv-keyboard-1.9.0-1.el7.x86_64
> xorg-x11-drv-libinput-0.27.1-2.el7.x86_64
> xorg-x11-drv-mouse-1.9.2-2.el7.x86_64
> xorg-x11-drv-nouveau-1.0.15-1.el7.x86_64
> xorg-x11-drv-openchrome-0.5.0-3.el7.1.x86_64
> xorg-x11-drv-qxl-0.1.5-4.el7.1.x86_64
> xorg-x11-drv-synaptics-1.9.0-2.el7.x86_64
> xorg-x11-drv-v4l-0.2.0-49.el7.x86_64
> xorg-x11-drv-vesa-2.4.0-1.el7.x86_64
> xorg-x11-drv-vmmouse-13.1.0-1.el7.1.x86_64
> xorg-x11-drv-vmware-13.2.1-1.el7.1.x86_64
> xorg-x11-drv-void-1.4.1-2.el7.1.x86_64
> xorg-x11-drv-wacom-0.36.1-1.el7.x86_64
> xorg-x11-fonts-ISO8859-1-100dpi-7.5-9.el7.noarch
> xorg-x11-fonts-ISO8859-1-75dpi-7.5-9.el7.noarch
> xorg-x11-fonts-misc-7.5-9.el7.noarch
> xorg-x11-fonts-Type1-7.5-9.el7.noarch
> xorg-x11-font-utils-7.5-21.el7.x86_64
> xorg-x11-proto-devel-2018.4-1.el7.noarch
> xorg-x11-server-common-1.20.1-5.1.sl7.x86_64
> xorg-x11-server-utils-7.7-20.el7.x86_64
> xorg-x11-server-Xorg-1.20.1-5.1.sl7.x86_64
> xorg-x11-utils-7.5-23.el7.x86_64
> xorg-x11-xauth-1.0.9-1.el7.x86_64
> xorg-x11-xbitmaps-1.1.1-6.el7.noarch
> xorg-x11-xinit-1.3.4-2.el7.x86_64
> xorg-x11-xkb-utils-7.7-14.el7.x86_64
>
>
> Best regards,
>   Andreas
>
> ------------------------------------------------------------------------
>   Dr. Andreas Nowack               email: [log in to unmask]
>   RWTH Aachen
>   III. Phys. Institut B
>   Sommerfeldstr. / Physikzentrum   phone: +49 241 80-27282
>   D-52056 Aachen                     fax: +49 241 80-22244
>   Germany
>
>

-- 
Gilles R. Detillieux              E-mail: <[log in to unmask]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. of Physiology and Pathophysiology, Faculty of Health Sciences,
Univ. of Manitoba  Winnipeg, MB  R3E 0J9  (Canada)



