> On 4.2.2012 3:34, Benjamin Reiter, Aginion IT-Consulting wrote:
>> To verify whether this is a hardware or configuration problem on my
>> side, do SL 6.2 guests on SL 6.2 hosts work reliably without virtio-net
>> hiccups for other people?
>>
>> Even when they are stressed with a bit of network traffic? (~100 GB/hour)
>>
>> Any reports are highly appreciated.
>>
>>
>>
>> -------- Original Message --------
>> Subject: Bug report: Page allocation failure with virtio-net in kvm
>> guest on 2.6.32-220.4.1
>> Date: Thu, 02 Feb 2012 16:21:38 +0100
>> From: Benjamin Reiter, Aginion IT-Consulting
>> <[log in to unmask]>
>> To: [log in to unmask]
>>
>> Page allocation failure with virtio-net in kvm guest on 2.6.32-220.4.1
>>
>> Reproducibly, after a couple of minutes or hours and 100 MB - 30 GB
>> of network traffic (NFS), the network interface in the guest goes
>> down. The guest can still be shut down from the host via an ACPI
>> event.
>>
>> This only happens with the virtio-net driver; with e1000 the guest
>> is stable for days.
>>
>> Host and guest run 2.6.32-220.4.1.el6.x86_64
>>
>> Host runs kvm version 0.12.1.2-2.209.el6_2.4.x86_64
>>
>>
>>
>>
>> Feb 2 13:04:02 host656 kernel: rpciod/0: page allocation failure. order:0, mode:0x20
>> Feb 2 13:04:02 host656 kernel: Pid: 1081, comm: rpciod/0 Not tainted 2.6.32-220.4.1.el6.x86_64 #1
>> Feb 2 13:04:02 host656 kernel: Call Trace:
>> Feb 2 13:04:02 host656 kernel: <IRQ> [<ffffffff81123daf>] ? __alloc_pages_nodemask+0x77f/0x940
>> Feb 2 13:04:02 host656 kernel: [<ffffffff81158a1a>] ? alloc_pages_current+0xaa/0x110
>> Feb 2 13:04:02 host656 kernel: [<ffffffffa0108d22>] ? try_fill_recv+0x262/0x280 [virtio_net]
>> Feb 2 13:04:02 host656 kernel: [<ffffffff8142df18>] ? netif_receive_skb+0x58/0x60
>> Feb 2 13:04:02 host656 kernel: [<ffffffffa01091fd>] ? virtnet_poll+0x42d/0x8d0 [virtio_net]
>> Feb 2 13:04:02 host656 kernel: [<ffffffff814307c3>] ? net_rx_action+0x103/0x2f0
>> Feb 2 13:04:02 host656 kernel: [<ffffffff81072001>] ? __do_softirq+0xc1/0x1d0
>> Feb 2 13:04:02 host656 kernel: [<ffffffff8100c24c>] ? call_softirq+0x1c/0x30
>> Feb 2 13:04:02 host656 kernel: <EOI> [<ffffffff8100de85>] ? do_softirq+0x65/0xa0
>> Feb 2 13:04:02 host656 kernel: [<ffffffff81071f0a>] ? local_bh_enable+0x9a/0xb0
>> Feb 2 13:04:02 host656 kernel: [<ffffffff8147a8e7>] ? tcp_rcv_established+0x107/0x800
>> Feb 2 13:04:02 host656 kernel: [<ffffffff81482c13>] ? tcp_v4_do_rcv+0x2e3/0x430
>> Feb 2 13:04:02 host656 kernel: [<ffffffff8147ead6>] ? tcp_write_xmit+0x1f6/0x9e0
>> Feb 2 13:04:02 host656 kernel: [<ffffffff8141cc75>] ? release_sock+0x65/0xe0
>> Feb 2 13:04:02 host656 kernel: [<ffffffff8146fb4c>] ? tcp_sendmsg+0x73c/0xa10
>> Feb 2 13:04:02 host656 kernel: [<ffffffff81419a0a>] ? sock_sendmsg+0x11a/0x150
>> Feb 2 13:04:02 host656 kernel: [<ffffffff81038488>] ? pvclock_clocksource_read+0x58/0xd0
>> Feb 2 13:04:02 host656 kernel: [<ffffffff81090a90>] ? autoremove_wake_function+0x0/0x40
>> Feb 2 13:04:02 host656 kernel: [<ffffffff81061c95>] ? enqueue_entity+0x125/0x420
>> Feb 2 13:04:02 host656 kernel: [<ffffffff81419a81>] ? kernel_sendmsg+0x41/0x60
>> Feb 2 13:04:02 host656 kernel: [<ffffffffa018ab6e>] ? xs_send_kvec+0x8e/0xa0 [sunrpc]
>> Feb 2 13:04:02 host656 kernel: [<ffffffffa018acf3>] ? xs_sendpages+0x173/0x220 [sunrpc]
>> Feb 2 13:04:02 host656 kernel: [<ffffffffa018aedd>] ? xs_tcp_send_request+0x5d/0x160 [sunrpc]
>> Feb 2 13:04:02 host656 kernel: [<ffffffffa0188e63>] ? xprt_transmit+0x83/0x2e0 [sunrpc]
>> Feb 2 13:04:02 host656 kernel: [<ffffffffa0185c48>] ? call_transmit+0x1d8/0x2c0 [sunrpc]
>> Feb 2 13:04:02 host656 kernel: [<ffffffffa018e23e>] ? __rpc_execute+0x5e/0x2a0 [sunrpc]
>> Feb 2 13:04:02 host656 kernel: [<ffffffffa018e4d0>] ? rpc_async_schedule+0x0/0x20 [sunrpc]
>> Feb 2 13:04:02 host656 kernel: [<ffffffffa018e4e5>] ? rpc_async_schedule+0x15/0x20 [sunrpc]
>> Feb 2 13:04:02 host656 kernel: [<ffffffff8108b150>] ? worker_thread+0x170/0x2a0
>> Feb 2 13:04:02 host656 kernel: [<ffffffff81090a90>] ? autoremove_wake_function+0x0/0x40
>> Feb 2 13:04:02 host656 kernel: [<ffffffff8108afe0>] ? worker_thread+0x0/0x2a0
>> Feb 2 13:04:02 host656 kernel: [<ffffffff81090726>] ? kthread+0x96/0xa0
>> Feb 2 13:04:02 host656 kernel: [<ffffffff8100c14a>] ? child_rip+0xa/0x20
>> Feb 2 13:04:02 host656 kernel: [<ffffffff81090690>] ? kthread+0x0/0xa0
>> Feb 2 13:04:02 host656 kernel: [<ffffffff8100c140>] ? child_rip+0x0/0x20
>> ...
>>
>>
>> VM is started with:
>>
>> qemu 2347 61.7 3.7 537704 281556 ? Sl 13:09 67:29 /usr/libexec/qemu-kvm -S -M rhel6.2.0 -enable-kvm -m 256
>> -smp 1,sockets=1,cores=1,threads=1 -name kvm_host656.net31 -uuid 97eae23f-bb13-58da-b4bc-258c6bf275a2
>> -nodefconfig -nodefaults
>> -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/kvm_host656.net31.monitor,server,nowait
>> -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown
>> -drive file=/dev/disk/by-path/ip-10.224.2.20:3260-iscsi-iqn.1986-03.com.sun:02:e9e63ad1-3f29-4d5c-9da9-b10e44a1520f.vmstore12.net31-lun-1,if=none,id=drive-virtio-disk0,format=raw,cache=none
>> -device virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
>> -netdev tap,fd=21,id=hostnet0
>> -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:6a:c7:d8,bus=pci.0,addr=0x3
>> -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0
>> -usb -vnc 0.0.0.0:21 -k en-us -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
>
> Hi, for me virtio-net works fine, but only with a sufficient amount
> of memory.
>
> In your case I think the memory is simply too low for the kernel to
> work reliably under pressure.
>
> I have an amd64 host machine; one of the guests is configured as
> i386, 1 CPU, 1 GB memory, virtio-net, and it does single-threaded
> data analysis from an NFS share without any problems. I had issues
> under low-memory conditions, but not just with virtio :-)
>
> HTH, Z.
>
>
Thanks for your input. I can try increasing the RAM size, but I don't
think this qualifies as a low-memory condition. This is normal
operation after about 30 hours of uptime (values in kB):
244680 total memory
215732 used memory
58152 active memory
85692 inactive memory
28948 free memory
1456 buffer memory
107944 swap cache
3561464 total swap
4256 used swap
3557208 free swap
172194 non-nice user cpu ticks
673 nice user cpu ticks
127903 system cpu ticks
12064827 idle cpu ticks
665005 IO-wait cpu ticks
208048 IRQ cpu ticks
561345 softirq cpu ticks
0 stolen cpu ticks
764023 pages paged in
480384 pages paged out
98 pages swapped in
1081 pages swapped out
103629433 interrupts
54954195 CPU context switches
1328188168 boot time
2993 forks
Seems sufficient, doesn't it?
Why would virtio be affected by this while e1000 is not at all? Even
if memory really were a problem, I don't think killing the network is
the right response.
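My guess at the mechanism, for what it's worth: "order:0, mode:0x20"
is a single-page GFP_ATOMIC allocation, and it fails inside
try_fill_recv, i.e. while virtio_net is refilling its receive ring
from the NAPI poll loop in softirq context (note the <IRQ> marker in
the trace), where the kernel can neither sleep nor reclaim memory. So
virtio-net would be sensitive to short spikes in memory pressure even
when the average free memory looks fine. To A/B the two drivers on an
otherwise identical guest, only the -device line from the command
above needs to change (the netdev id and MAC below are copied from
that command):

    # current configuration (virtio)
    -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:6a:c7:d8,bus=pci.0,addr=0x3

    # emulated e1000 for comparison
    -device e1000,netdev=hostnet0,id=net0,mac=52:54:00:6a:c7:d8,bus=pci.0,addr=0x3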
Btw: Looks a lot like this bug from 2009:
https://bugzilla.redhat.com/show_bug.cgi?id=520119
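If it is the same class of problem, the mitigation usually suggested
for order:0 atomic allocation failures is to raise the kernel's
emergency reserve in the guest, so atomic allocations are less likely
to run dry. A sketch -- the value here is an example, not a tested
recommendation:

    # check the current reserve (in kB)
    cat /proc/sys/vm/min_free_kbytes

    # raise it to e.g. 8 MB for this boot; add the same setting to
    # /etc/sysctl.conf to make it persistent
    sysctl -w vm.min_free_kbytes=8192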
Benjamin