SCIENTIFIC-LINUX-USERS Archives

February 2012

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV

Subject:
From: "Benjamin Reiter, Aginion IT-Consulting" <[log in to unmask]>
Reply To: Benjamin Reiter, Aginion IT-Consulting
Date: Sat, 4 Feb 2012 04:47:11 +0100
Content-Type: text/plain
Parts/Attachments: text/plain (187 lines)
> Dne 4.2.2012 3:34, Benjamin Reiter, Aginion IT-Consulting napsal(a):
>> To verify whether this is a hardware or configuration problem on my
>> side, do SL 6.2 guests on SL 6.2 hosts work reliably without virtio-net
>> hiccups for other people?
>>
>> Even when they are stressed with a bit of network traffic? (~100 GB/hour)
>>
>> Any reports are highly appreciated.
>>
>>
>>
>> -------- Original Message --------
>> Subject: Bug report: Page allocation failure with virtio-net in kvm
>> guest on 2.6.32-220.4.1
>> Date: Thu, 02 Feb 2012 16:21:38 +0100
>> From: Benjamin Reiter, Aginion IT-Consulting
>> <[log in to unmask]>
>> To: [log in to unmask]
>>
>> Page allocation failure with virtio-net in kvm guest on 2.6.32-220.4.1
>>
>> Reproducibly, after a couple of minutes to hours and 100 MB - 30 GB of
>> network traffic (NFS), the network interface in the guest goes down. The
>> guest can be shut down from the host via an ACPI event.
>>
>> This only happens with the virtio net driver; with e1000 the guest
>> is stable for days.
>>
>> Host and guest run 2.6.32-220.4.1.el6.x86_64
>>
>> Host runs kvm version 0.12.1.2-2.209.el6_2.4.x86_64
>>
>>
>>
>>
>> Feb  2 13:04:02 host656 kernel: rpciod/0: page allocation failure.
>> order:0, mode:0x20
>> Feb  2 13:04:02 host656 kernel: Pid: 1081, comm: rpciod/0 Not tainted
>> 2.6.32-220.4.1.el6.x86_64 #1
>> Feb  2 13:04:02 host656 kernel: Call Trace:
>> Feb  2 13:04:02 host656 kernel:<IRQ>   [<ffffffff81123daf>] ?
>> __alloc_pages_nodemask+0x77f/0x940
>> Feb  2 13:04:02 host656 kernel: [<ffffffff81158a1a>] ?
>> alloc_pages_current+0xaa/0x110
>> Feb  2 13:04:02 host656 kernel: [<ffffffffa0108d22>] ?
>> try_fill_recv+0x262/0x280 [virtio_net]
>> Feb  2 13:04:02 host656 kernel: [<ffffffff8142df18>] ?
>> netif_receive_skb+0x58/0x60
>> Feb  2 13:04:02 host656 kernel: [<ffffffffa01091fd>] ?
>> virtnet_poll+0x42d/0x8d0 [virtio_net]
>> Feb  2 13:04:02 host656 kernel: [<ffffffff814307c3>] ?
>> net_rx_action+0x103/0x2f0
>> Feb  2 13:04:02 host656 kernel: [<ffffffff81072001>] ?
>> __do_softirq+0xc1/0x1d0
>> Feb  2 13:04:02 host656 kernel: [<ffffffff8100c24c>] ?
>> call_softirq+0x1c/0x30
>> Feb  2 13:04:02 host656 kernel:<EOI>   [<ffffffff8100de85>] ?
>> do_softirq+0x65/0xa0
>> Feb  2 13:04:02 host656 kernel: [<ffffffff81071f0a>] ?
>> local_bh_enable+0x9a/0xb0
>> Feb  2 13:04:02 host656 kernel: [<ffffffff8147a8e7>] ?
>> tcp_rcv_established+0x107/0x800
>> Feb  2 13:04:02 host656 kernel: [<ffffffff81482c13>] ?
>> tcp_v4_do_rcv+0x2e3/0x430
>> Feb  2 13:04:02 host656 kernel: [<ffffffff8147ead6>] ?
>> tcp_write_xmit+0x1f6/0x9e0
>> Feb  2 13:04:02 host656 kernel: [<ffffffff8141cc75>] ?
>> release_sock+0x65/0xe0
>> Feb  2 13:04:02 host656 kernel: [<ffffffff8146fb4c>] ?
>> tcp_sendmsg+0x73c/0xa10
>> Feb  2 13:04:02 host656 kernel: [<ffffffff81419a0a>] ?
>> sock_sendmsg+0x11a/0x150
>> Feb  2 13:04:02 host656 kernel: [<ffffffff81038488>] ?
>> pvclock_clocksource_read+0x58/0xd0
>> Feb  2 13:04:02 host656 kernel: [<ffffffff81090a90>] ?
>> autoremove_wake_function+0x0/0x40
>> Feb  2 13:04:02 host656 kernel: [<ffffffff81061c95>] ?
>> enqueue_entity+0x125/0x420
>> Feb  2 13:04:02 host656 kernel: [<ffffffff81419a81>] ?
>> kernel_sendmsg+0x41/0x60
>> Feb  2 13:04:02 host656 kernel: [<ffffffffa018ab6e>] ?
>> xs_send_kvec+0x8e/0xa0 [sunrpc]
>> Feb  2 13:04:02 host656 kernel: [<ffffffffa018acf3>] ?
>> xs_sendpages+0x173/0x220 [sunrpc]
>> Feb  2 13:04:02 host656 kernel: [<ffffffffa018aedd>] ?
>> xs_tcp_send_request+0x5d/0x160 [sunrpc]
>> Feb  2 13:04:02 host656 kernel: [<ffffffffa0188e63>] ?
>> xprt_transmit+0x83/0x2e0 [sunrpc]
>> Feb  2 13:04:02 host656 kernel: [<ffffffffa0185c48>] ?
>> call_transmit+0x1d8/0x2c0 [sunrpc]
>> Feb  2 13:04:02 host656 kernel: [<ffffffffa018e23e>] ?
>> __rpc_execute+0x5e/0x2a0 [sunrpc]
>> Feb  2 13:04:02 host656 kernel: [<ffffffffa018e4d0>] ?
>> rpc_async_schedule+0x0/0x20 [sunrpc]
>> Feb  2 13:04:02 host656 kernel: [<ffffffffa018e4e5>] ?
>> rpc_async_schedule+0x15/0x20 [sunrpc]
>> Feb  2 13:04:02 host656 kernel: [<ffffffff8108b150>] ?
>> worker_thread+0x170/0x2a0
>> Feb  2 13:04:02 host656 kernel: [<ffffffff81090a90>] ?
>> autoremove_wake_function+0x0/0x40
>> Feb  2 13:04:02 host656 kernel: [<ffffffff8108afe0>] ?
>> worker_thread+0x0/0x2a0
>> Feb  2 13:04:02 host656 kernel: [<ffffffff81090726>] ? kthread+0x96/0xa0
>> Feb  2 13:04:02 host656 kernel: [<ffffffff8100c14a>] ? child_rip+0xa/0x20
>> Feb  2 13:04:02 host656 kernel: [<ffffffff81090690>] ? kthread+0x0/0xa0
>> Feb  2 13:04:02 host656 kernel: [<ffffffff8100c140>] ? child_rip+0x0/0x20
>> ...
>>
>>
>> VM is started with:
>>
>> qemu      2347 61.7  3.7 537704 281556 ?       Sl   13:09  67:29
>> /usr/libexec/qemu-kvm -S -M rhel6.2.0 -enable-kvm -m 256 -smp
>> 1,sockets=1,cores=1,threads=1 -name kvm_host656.net31 -uuid
>> 97eae23f-bb13-58da-b4bc-258c6bf275a2 -nodefconfig -nodefaults -chardev
>> socket,id=charmonitor,path=/var/lib/libvirt/qemu/kvm_host656.net31.monitor,server,nowait
>>
>> -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc
>> -no-shutdown -drive
>> file=/dev/disk/by-path/ip-10.224.2.20:3260-iscsi-iqn.1986-03.com.sun:02:e9e63ad1-3f29-4d5c-9da9-b10e44a1520f.vmstore12.net31-lun-1,if=none,id=drive-virtio-disk0,format=raw,cache=none
>>
>> -device
>> virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
>>
>> -netdev tap,fd=21,id=hostnet0 -device
>> virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:6a:c7:d8,bus=pci.0,addr=0x3
>>
>> -chardev pty,id=charserial0 -device
>> isa-serial,chardev=charserial0,id=serial0 -usb -vnc 0.0.0.0:21 -k en-us
>> -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
>
> Hi, for me virtio-net works fine, but with a sufficient amount of memory.
>
> In your case, I think it's just too low for the kernel to work reliably
> under pressure.
>
> I have an amd64 host machine; one of the guests is configured as i386,
> 1 CPU, 1 GB memory, virtio-net, and it does single-threaded data analysis
> from an NFS share without any problems. I have had issues under low-memory
> conditions, but not just with virtio :-)
>
> HTH, Z.
>
>

Thanks for your input. I can try increasing the RAM size, but I don't think
this qualifies as a low-memory condition. This is normal operation after
about 30 hours of uptime:

        244680  total memory
        215732  used memory
         58152  active memory
         85692  inactive memory
         28948  free memory
          1456  buffer memory
        107944  swap cache
       3561464  total swap
          4256  used swap
       3557208  free swap
        172194 non-nice user cpu ticks
           673 nice user cpu ticks
        127903 system cpu ticks
      12064827 idle cpu ticks
        665005 IO-wait cpu ticks
        208048 IRQ cpu ticks
        561345 softirq cpu ticks
             0 stolen cpu ticks
        764023 pages paged in
        480384 pages paged out
            98 pages swapped in
          1081 pages swapped out
     103629433 interrupts
      54954195 CPU context switches
    1328188168 boot time
          2993 forks

Seems sufficient, doesn't it?
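
For reference, the mode:0x20 in the trace decodes to GFP_ATOMIC on 2.6.32
(include/linux/gfp.h defines __GFP_HIGH as 0x20 and GFP_ATOMIC as
__GFP_HIGH), so the allocation happened in softirq context where the kernel
can neither sleep nor reclaim. Whether it succeeds depends on the per-order
free-page reserves at that instant rather than on the totals above. A quick
way to watch those reserves (standard /proc paths assumed):

```shell
# Free pages per zone, split by allocation order; atomic order-0
# failures correlate with the leftmost columns hitting zero even
# while the "free memory" totals still look healthy.
cat /proc/buddyinfo

# The watermark below which atomic allocations start failing:
cat /proc/sys/vm/min_free_kbytes
```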

Why would virtio be affected by it while e1000 is not at all? Even if memory 
really were a problem, I don't think killing the network is the right 
response.
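
That said, if low reserves for atomic allocations do turn out to be the
trigger, raising the free-page watermark in the guest would be a less drastic
response than losing the interface. A sketch of what I could try (the value
below is illustrative, not a recommendation):

```shell
# Show the current reserve kept for atomic allocations; on a 256 MB
# guest the 2.6.32 default is only a few MB.
cat /proc/sys/vm/min_free_kbytes

# Raise it for the running system (illustrative value, needs root):
# sysctl -w vm.min_free_kbytes=16384
```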

BTW, this looks a lot like this bug from 2009: 
https://bugzilla.redhat.com/show_bug.cgi?id=520119

Benjamin
