On 10/14/20 2:06 PM, David Hildenbrand wrote:
> On 14.10.20 13:53, Michal Privoznik wrote:
>> On 10/14/20 10:26 AM, David Hildenbrand wrote:
>>> On 14.10.20 08:30, Michal Privoznik wrote:
>>
> <snip/>
>
> No, not at all. Thanks for reporting!
>
> And the "bad" thing is, that QEMU doesn't do anything too fancy. All it
> does is "fallocate(FALLOC_FL_PUNCH_HOLE)" on hugetlbfs when trying to
> zap reported pages. The same mechanism is also used for postcopy live
> migration and virtio-mem with hugetlbfs.
>
> Which kernel are you running?
>
> 1. Is it an upstream kernel, lkml + -mm lists are the right place
> (please cc me, or I can try to reproduce and report it).
>
> 2. Is it a distro kernel? Then create a BUG there.
>
> I was just recently testing virtio-mem with hugetlbfs and it worked on
> decent upstream Fedora. But maybe I was not able to trigger it.
Okay, I've upgraded to 5.9.0-gentoo, but the problem persists. Gentoo
puts only a few patches on top of the vanilla kernel, none of which
touch that area of the code:
https://dev.gentoo.org/~mpagano/genpatches/trunk/5.9/
So I think this is reproducible on vanilla too.
BTW: Have you tried placing QEMU inside v1 cgroups? Libvirt does
that, so maybe that's the problem. Anyway, here's the cmd line:
/home/zippy/work/qemu/qemu.git/build/qemu-system-x86_64 \
-name guest=fedora,debug-threads=on \
-S \
-object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-2-fedora/master-key.aes \
-machine pc-i440fx-4.0,accel=kvm,usb=off,dump-guest-core=off,memory-backend=pc.ram \
-cpu host,migratable=on \
-m 4096 \
-object memory-backend-memfd,id=pc.ram,hugetlb=yes,hugetlbsize=2097152,prealloc=yes,size=4294967296,host-nodes=0,policy=bind \
-overcommit mem-lock=off \
-smp 4,sockets=1,dies=1,cores=2,threads=2 \
-object iothread,id=iothread1 \
-object iothread,id=iothread2 \
-object iothread,id=iothread3 \
-object iothread,id=iothread4 \
-uuid 63840878-0deb-4095-97e6-fc444d9bc9fa \
-no-user-config \
-nodefaults \
-device sga \
-chardev socket,id=charmonitor,fd=33,server,nowait \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc \
-no-shutdown \
-global PIIX4_PM.disable_s3=0 \
-global PIIX4_PM.disable_s4=0 \
-boot menu=on,strict=on \
-device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
-device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 \
-device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 \
-blockdev '{"driver":"file","filename":"/var/lib/libvirt/images/fedora.qcow2","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":false,"discard":"unmap","driver":"qcow2","file":"libvirt-1-storage","backing":null}' \
-device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,device_id=drive-scsi0-0-0-0,drive=libvirt-1-format,id=scsi0-0-0-0,bootindex=1 \
-netdev tap,fd=35,id=hostnet0 \
-device virtio-net-pci,host_mtu=9000,netdev=hostnet0,id=net0,mac=52:54:00:a4:6f:91,bus=pci.0,addr=0x3 \
-chardev pty,id=charserial0 \
-device isa-serial,chardev=charserial0,id=serial0 \
-chardev socket,id=charchannel0,fd=36,server,nowait \
-device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 \
-spice port=5900,addr=127.0.0.1,disable-ticketing,seamless-migration=on \
-device virtio-vga,id=video0,virgl=on,max_outputs=1,bus=pci.0,addr=0x2 \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7,free-page-reporting=on \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on
Thanks!
Reproduced easily on Fedora 32 (5.7.16-200.fc32.x86_64).
[ 70.641802] CPU: 3 PID: 2178 Comm: qemu-system-x86 Not tainted 5.7.16-200.fc32.x86_64 #1
[ 70.641802] Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS PRO/X570 AORUS PRO, BIOS F21 07/31/2020
[ 70.641803] RIP: 0010:page_counter_uncharge+0x4b/0x50
[ 70.641804] Code: 0f c1 45 00 4c 29 e0 48 89 ef 48 89 c3 48 89 c6 e8 2a fe ff ff 48 85 db 78 10 48 8b 6d 20 48 85 ed 75 d8 5b 5d 41 5c 41 5d c3 <0f> 0b eb ec 90 0f 1f 44 00 00 48 8b 17 48 39 d6 72 41 41 54 49 89
[ 70.641804] RSP: 0018:ffffb4044139bb18 EFLAGS: 00010286
[ 70.641805] RAX: fffffffffff94600 RBX: fffffffffff94600 RCX: ffff8da63d007000
[ 70.641805] RDX: 000000000000046e RSI: fffffffffff94600 RDI: ffff8da678412e28
[ 70.641806] RBP: ffff8da678412e28 R08: ffff8da678412e28 R09: ffff8da63d0078c0
[ 70.641806] R10: ffff8da634173000 R11: 0000000000000007 R12: 000000000008dc00
[ 70.641806] R13: fffffffffff72400 R14: ffff8da63d007000 R15: 0000000000000391
[ 70.641807] FS: 00007fe7ab5fe700(0000) GS:ffff8da67eac0000(0000) knlGS:0000000000000000
[ 70.641808] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 70.641808] CR2: 000055568796a468 CR3: 0000000fe9860000 CR4: 0000000000340ee0
[ 70.641808] Call Trace:
[ 70.641813] hugetlb_cgroup_uncharge_file_region+0x4b/0x80
[ 70.641815] region_del+0x1d3/0x300
[ 70.641816] hugetlb_unreserve_pages+0x39/0xb0
[ 70.641818] remove_inode_hugepages+0x1a8/0x3d0
[ 70.641831] ? kvm_mmu_notifier_invalidate_range+0x38/0x60 [kvm]
[ 70.641832] ? tlb_finish_mmu+0x7a/0x1d0
[ 70.641833] hugetlbfs_fallocate+0x3ac/0x5e0
[ 70.641835] ? avc_has_perm+0x3b/0x160
[ 70.641836] ? file_has_perm+0xa2/0xb0
[ 70.641837] ? selinux_inode_follow_link+0x4c/0xb0
[ 70.641838] ? selinux_file_permission+0x4e/0x120
[ 70.641839] ? security_file_permission+0x2e/0x160
[ 70.641840] vfs_fallocate+0x146/0x280
[ 70.641841] __x64_sys_fallocate+0x3e/0x70
[ 70.641843] do_syscall_64+0x5b/0xf0
[ 70.641846] entry_SYSCALL_64_after_hwframe+0x44/0xa9
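
The trace above is entered through fallocate() hole punching on a hugetlb-backed file. A minimal userspace sketch of that operation follows. It is not QEMU's code and has not been verified to trigger the warning; the helper names punch_hole() and discard_demo() are made up for illustration, and the FALLOC_FL_KEEP_SIZE flag is added because fallocate(2) requires it together with FALLOC_FL_PUNCH_HOLE. Whether the process must also sit in a v1 hugetlb cgroup to hit the warning is an open assumption here.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>        /* fallocate() */
#include <linux/falloc.h> /* FALLOC_FL_PUNCH_HOLE, FALLOC_FL_KEEP_SIZE */
#include <string.h>
#include <sys/mman.h>     /* memfd_create(), MFD_HUGETLB, mmap() */
#include <sys/stat.h>
#include <unistd.h>

/* Free the backing pages of [offset, offset + len) while keeping the file size. */
static int punch_hole(int fd, off_t offset, off_t len)
{
    assert(offset >= 0 && len > 0);
    return fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, offset, len);
}

/*
 * Hypothetical helper (not QEMU code): back `size` bytes with a memfd, touch
 * every page, then punch out the second half.
 * With use_hugetlb != 0 the memfd is hugetlb-backed, so `size` should be a
 * multiple of twice the huge page size and free huge pages must be available.
 * Returns 0 if the hole was punched and the file size is unchanged, else -1.
 */
static int discard_demo(int use_hugetlb, size_t size)
{
    struct stat st;
    unsigned char *p;
    int ret = -1;
    int fd = memfd_create("discard-demo", use_hugetlb ? MFD_HUGETLB : 0);

    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)size) < 0)
        goto out_close;

    p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        goto out_close;

    memset(p, 0xaa, size); /* fault in every page, similar to prealloc=yes */

    /* After the punch, the first half keeps its data and the second half reads as zero. */
    if (punch_hole(fd, (off_t)(size / 2), (off_t)(size / 2)) == 0 &&
        fstat(fd, &st) == 0 && (size_t)st.st_size == size &&
        p[0] == 0xaa && p[size - 1] == 0)
        ret = 0;

    munmap(p, size);
out_close:
    close(fd);
    return ret;
}
```

Calling discard_demo(1, 4UL << 20) would exercise the hugetlbfs path with 2 MiB huge pages, while discard_demo(0, 4UL << 20) runs the same steps on an ordinary memfd for comparison.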
Note: prealloc=yes is a bad choice in this environment. It
contradicts memory overcommit, which is what we want to optimize with
free page reporting: you allocate all VM memory up front only to throw
it away once the guest is up. Is there an option to turn this off with
hugetlbfs? I hope so.
I'll try reproducing on an upstream kernel and send a bug report, ccing you. Thanks!