SCIENTIFIC-LINUX-USERS Archives

May 2005

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV

Subject:
From:
Michael Mansour <[log in to unmask]>
Reply To:
Michael Mansour <[log in to unmask]>
Date:
Sun, 22 May 2005 14:17:15 +1000
Hi,

After another three hours of troubleshooting I finally have the Intel Pro 
1000 working as expected.

I'll detail what I did below for anyone else who may have similar issues.

Firstly, I moved the Intel Pro 1000 MT card from one PCI slot to another. 
After doing this, an offline test from ethtool passed:

# ethtool -t eth1 offline
The test result is PASS
The test extra info:
Register test  (offline)         0
Eeprom test    (offline)         0
Interrupt test (offline)         0
Loopback test  (offline)         0
Link test   (on/offline)         0

This solved the "Interrupt test 4" error I detailed below. I ran ping tests 
on the interfaces connected via a Cat 6 crossover cable (1000 Mbps FD on 
both sides) and all worked. I transferred an ISO across using scp and got 
rates between 6 Mbps and 7 Mbps for the copy (is this what I should expect 
here? I would have expected much more, as I get that scp speed on 100 Mbps 
FD).
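
As an aside on the scp numbers: scp throughput is often limited by ssh's 
cipher overhead rather than by the wire, so a raw TCP copy is a better test 
of the link itself. A rough way to do that (assuming netcat is installed on 
both ends; the IP and port below are just examples for a private subnet):

on the receiving host:

# nc -l -p 5001 > /dev/null

on the sending host:

# time dd if=/dev/zero bs=1M count=1000 | nc 192.168.0.2 5001

then divide the ~1000 MB sent by the elapsed time to get the real rate 
(depending on your netcat version you may need its -q 0 option so it exits 
when dd finishes).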

Anyway, after doing that I started to have issues with the eth2 interface 
(eth0 is a Realtek 100 Mbps PCI card, eth1 is the Intel Pro 1000 MT, eth2 is 
an integrated SiS 900 PCI Fast Ethernet), with the following showing up on 
boot:

May 22 12:44:23 server kernel: sis96x_smbus 0000:00:02.1: SiS96x SMBus base 
address: 0xe600
May 22 12:44:23 server kernel: irq 3: nobody cared! (screaming interrupt?)
May 22 12:44:23 server kernel: irq 3: Please try booting with acpi=off and 
report a bug
May 22 12:44:23 server kernel:  [<c010734a>] __report_bad_irq+0x3a/0x77
May 22 12:44:23 server kernel:  [<c01075c1>] note_interrupt+0xea/0x115
May 22 12:44:23 server kernel:  [<c01077ff>] do_IRQ+0xd5/0x130
May 22 12:44:23 server kernel:  [<c02c6ca4>] common_interrupt+0x18/0x20
May 22 12:44:23 server kernel:  [<c01072de>] handle_IRQ_event+0x1d/0x4f
May 22 12:44:23 server kernel:  [<c01077da>] do_IRQ+0xb0/0x130
May 22 12:44:23 server kernel:  [<c02c6ca4>] common_interrupt+0x18/0x20
May 22 12:44:24 server kernel:  [<c011b1d7>] kunmap_atomic+0x51/0x58
May 22 12:44:24 server kernel:  [<c014827e>] do_wp_page+0xd8/0x34f
May 22 12:44:24 server kernel:  [<c01490e8>] handle_mm_fault+0x11c/0x175
May 22 12:44:24 server kernel:  [<c0119811>] do_page_fault+0x1ae/0x5b6
May 22 12:44:24 server kernel:  [<c011bceb>] finish_task_switch+0x46/0x66
May 22 12:44:24 server kernel:  [<c02c459f>] schedule+0x833/0x869
May 22 12:44:24 server kernel:  [<c012ab4a>] sigprocmask+0xb0/0xca
May 22 12:44:24 server kernel:  [<c012abfc>] sys_rt_sigprocmask+0x98/0x145
May 22 12:44:24 server kernel:  [<c0119663>] do_page_fault+0x0/0x5b6
May 22 12:44:24 server kernel:  [<c02c6dc3>] error_code+0x2f/0x38

and the kernel therefore disabled that device. Note that I already have 
acpi=off on the GRUB kernel line for this server.
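
For reference, the relevant kernel line in /boot/grub/grub.conf looks 
something like this (the kernel version and root device here are 
placeholders, not necessarily what's on this box):

kernel /vmlinuz-2.6.9-5.0.5.EL ro root=/dev/hda2 acpi=off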

I then went into the BIOS and removed IRQ 3 from the available IRQs, and 
the same error came up with IRQ 5; I removed IRQ 5 and the same error came 
up with IRQ 7; I removed IRQ 7 and the machine booted and brought up eth2 
correctly (this time assigning it IRQ 9).

After running some scp tests on eth0 (eth0 and eth2 are on the same 
subnet), I got the same error as above, this time for IRQ 9:

May 22 13:55:10 server kernel: eth2: Media Link On 100mbps full-duplex
May 22 13:55:56 server kernel: irq 9: nobody cared! (screaming interrupt?)
May 22 13:55:56 server kernel: irq 9: Please try booting with acpi=off and 
report a bug
May 22 13:55:56 server kernel:  [<c010734a>] __report_bad_irq+0x3a/0x77
May 22 13:55:56 server kernel:  [<c01075c1>] note_interrupt+0xea/0x115
May 22 13:55:56 server kernel:  [<c01077ff>] do_IRQ+0xd5/0x130
May 22 13:55:56 server kernel:  [<c02c6ca4>] common_interrupt+0x18/0x20
May 22 13:55:56 server kernel:  [<c02c007b>] unix_getname+0x64/0x92
May 22 13:55:56 server kernel: handlers:
May 22 13:55:56 server kernel: [<f881340f>] (sis900_interrupt+0x0/0xaa 
[sis900])
May 22 13:55:56 server kernel: Disabling IRQ #9

When I went ahead and disabled IRQ 9 in the BIOS, it complained about IRQ 4 
on the next boot, so I disabled IRQ 4 as well and it booted cleanly. I ran 
more scp transfers and everything worked fine without the kernel disabling 
any IRQs.

Although now I have:

# cat /proc/interrupts
           CPU0
  0:     353219          XT-PIC  timer
  1:        397          XT-PIC  i8042
  2:          0          XT-PIC  cascade
  8:          1          XT-PIC  rtc
 10:     762305          XT-PIC  eth0, eth2, eth1
 12:       1070          XT-PIC  i8042
 14:      15579          XT-PIC  ide0
 15:       2948          XT-PIC  ide1
NMI:          0
LOC:     353069
ERR:          0
MIS:          0

so all three Ethernet interfaces are sharing the one interrupt. At least it 
works.
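
If anyone wants to check how the BIOS routed their own cards, lspci shows 
the IRQ each device ended up with:

# lspci -v | grep -i irq

(that listing doesn't include interface names, so match the entries up 
against the Ethernet controller lines in plain lspci output).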

Michael.

> Hi Michael,
> 
> I know it's been a while since we last chatted about this, but I 
> took your recommendation at that time and bought 4 of these cards 
> from the US (I'm in Australia).
> 
> When inserting the cards into 2 newly built SL4 servers, Linux picks 
> them up and configures the e1000 module without issues.
> 
> lspci shows:
> 
> 00:0e.0 Ethernet controller: Intel Corp. 82540EM Gigabit Ethernet 
> Controller (rev 02)
> 
> and
> 
> 00:09.0 Ethernet controller: Intel Corp. 82540EM Gigabit Ethernet 
> Controller (rev 02)
> 
> on both servers, and ethtool shows:
> 
> Settings for eth1:
>         Supported ports: [ TP ]
>         Supported link modes:   10baseT/Half 10baseT/Full
>                                 100baseT/Half 100baseT/Full
>                                 1000baseT/Full
>         Supports auto-negotiation: Yes
>         Advertised link modes:  10baseT/Half 10baseT/Full
>                                 100baseT/Half 100baseT/Full
>                                 1000baseT/Full
>         Advertised auto-negotiation: Yes
>         Speed: 1000Mb/s
>         Duplex: Full
>         Port: Twisted Pair
>         PHYAD: 0
>         Transceiver: internal
>         Auto-negotiation: on
>         Supports Wake-on: umbg
>         Wake-on: g
>         Current message level: 0x00000007 (7)
>         Link detected: yes
> 
> on both servers.
> 
> However, I've been testing "ping" on these cards under SL4 without 
> any success (a ping from the local host works but not from the 
> remote host).
> 
> I inserted one card in each of two servers, and hooked them up via a 
> crossover cable. They autonegotiate to 1000 Mbps FD, but they cannot 
> communicate with each other at all (they link fine, just cannot ping each 
> other). When I try to ping from the private subnet I've assigned each 
> card, the only responses I get from the cards are:
> 
> kernel: NETDEV WATCHDOG: eth1: transmit timed out
> kernel: e1000: eth1: e1000_watchdog: NIC Link is Down
> kernel: e1000: eth1: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex
> 
> every few seconds. If I stop the pings, they sit there idle and happy 
> without resetting the interfaces at all; only when I try to transmit data 
> (ping) do I get the "NETDEV WATCHDOG" errors.
> 
> Last night I was troubleshooting this problem for about 6 hours, trying 
> everything I could think of: different speed and duplex settings, even 
> downloading the latest Intel drivers for these cards, compiling them as 
> modules and trying them. Still no go (although the new driver - 
> e1000-6.0.54.tar.gz - got rid of the "NETDEV WATCHDOG" errors, I still 
> couldn't ping the interfaces).
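> 
> For the record, the speed/duplex forcing was done with ethtool, along 
> these lines (the values were whatever combination I was testing at the 
> time):
> 
> # ethtool -s eth1 autoneg off speed 100 duplex full
> 
> and, from memory, the Intel driver tarball built roughly like this (src/ 
> is where Intel's tarballs keep the driver source):
> 
> # tar zxf e1000-6.0.54.tar.gz
> # cd e1000-6.0.54/src
> # make install
> # rmmod e1000; modprobe e1000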
> 
> I used the Intel diag1000 tools off the CD which accompanied the cards to 
> test them from DOS. All tests passed fine, even the continuous network 
> tests with one card set up as a responder and the other running the 
> network test (and vice versa), but under Linux it's a total no go.
> 
> When running ethtool's offline test, on one server I get:
> 
> # ethtool -t eth1 offline
> The test result is FAIL
> The test extra info:
> Register test  (offline)         0
> Eeprom test    (offline)         0
> Interrupt test (offline)         4
> Loopback test  (offline)         0
> Link test   (on/offline)         0
> 
> (an SMP server)
> 
> on the other server I get:
> 
> # ethtool -t eth1 offline
> The test result is PASS
> The test extra info:
> Register test  (offline)         0
> Eeprom test    (offline)         0
> Interrupt test (offline)         0
> Loopback test  (offline)         0
> Link test   (on/offline)         0
> 
> (a UP server)
> 
> The "Interrupt test (offline)         4" on the first server 
> concerned me but I tried everything I could think of from BIOS 
> changes etc but could not get it corrected.
> 
> I booted both servers with "noapic" and "acpi=off" as some others 
> suggested from googling, but got the same results as above.
> 
> I tried configuring the "InterruptThrottleRate" in 
> /etc/modprobe.conf as suggested by Intel in:
> 
> ftp://aiedownload.intel.com/df-support/2897/ENG/README.txt
> 
> as they seemed to recognise this NETDEV problem, but again no go.
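> 
> The entry itself is a single options line for the module; the value here 
> is just an example (the README documents the valid range):
> 
> options e1000 InterruptThrottleRate=3000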
> 
> So basically, from what I've seen so far, I just can't use these 
> cards under SL4.
> 
> Today I'm just going to try connecting them to my 100Mbps FD switch 
> and see if they at least work like that, but I'm really not that hopeful.
> 
> Michael.
> 
> > If you want 32-bit PCI NICs, I'd suggest the Intel Pro/1000 MT.
> > 
> > It's what I run in this particular desktop system as well as several
> > others. In my servers (~50) I have all 64-bit PCI-X Broadcom NICs
> > integrated on the boards.
> > 
> > They both seem to work quite well.
> > 
> > Hope this helps.
> > 
> > Michael
> > 
> > On Tue, 2005-02-15 at 10:19 +1000, Michael Mansour wrote:
> > > Hi Michael,
> > > 
> > > Is there any particular model of Intel gigabit card you recommend?
> > > 
> > > I've spoken to my supplier and they say they'll take back the cards I 
> > > bought and swap them for Intel, but would like to know a model so I 
> > > don't have to make this same mistake again.
> > > 
> > > Thanks.
> > > 
> > > Michael.
> > > 
> > > > Well the first thing I'd say is that the Realtek chipsets are 
> > > > terrible if you really want performance. They don't do any packet 
> > > > header processing or other TCP hardware offloading, causing your CPU 
> > > > to take the brunt of the packet load (with gigabit this is bad).
> > > > 
> > > > I would recommend Broadcom or Intel based gigabit cards.
> > > > 
> > > > Regardless of this fact, mii-tool doesn't support reading out gigabit
> > > > link status. It'll give you a link-up status of 100FD for 1000FD 
> > > > cards linked at 1000FD with flow control enabled.
> > > > 
> > > > I know it at least reports a link-up status on the Broadcom and 
> > > > Intel based cards.
> > > > 
> > > > Otherwise, I think you might need to find another tool to get real 
> > > > link status out of a gigabit NIC.
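> > > > 
> > > > (ethtool is probably the tool to look at - a plain "ethtool eth0" 
> > > > reports speed, duplex and link status, and copes with gigabit cards.)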
> > > > 
> > > > On Tue, 2005-02-15 at 09:32 +1000, Michael Mansour wrote:
> > > > > Hi,
> > > > > 
> > > > > I've just purchased some Netgear gigabit Ethernet cards and to my 
> > > > > disappointment mii-tool couldn't be used to query them, even 
> > > > > though the Linux kernel has no issues with using them.
> > > > > 
> > > > > I run cluster software with SL303 which uses mii-tool to do 
> > > > > link-level checking etc., so my question is: which gigabit cards 
> > > > > work correctly with SL303 and mii-tool, so that I get output 
> > > > > similar to the following:
> > > > > 
> > > > > [root@anaconda root]# mii-tool
> > > > > eth0: negotiated 100baseTx-HD, link ok
> > > > > eth1: negotiated 100baseTx-HD, link ok
> > > > > SIOCGMIIPHY on 'eth2' failed: Operation not supported
> > > > > 
> > > > > note: eth2 is the Netgear card (which uses a Realtek chip); the 
> > > > > other two are just standard Realtek PCI cards.
> > > > > 
> > > > > Thanks.
> > > > > 
> > > > > Michael.
> > > ------- End of Original Message -------
> ------- End of Original Message -------
------- End of Original Message -------
