On 06/03/2015 07:05 PM, Kirill A. Shutemov wrote:
> Hello everybody,
>
> Here's a new revision of the refcounting patchset. Please review and consider
> applying.
>
> The goal of the patchset is to make refcounting on THP pages cheaper with
> simpler semantics and allow the same THP compound page to be mapped with
> PMD and PTEs. This is required to get a reasonable THP-pagecache
> implementation.
>
> With the new refcounting design it's much easier to protect against
> split_huge_page(): simple reference on a page will make you the deal.
> It makes the gup_fast() implementation simpler and doesn't require
> a special case in futex code to handle tail THP pages.
>
> It should improve THP utilization over the system since splitting THP in
> one process doesn't necessarily lead to splitting the page in all other
> processes that have the page mapped.
>
> The patchset drastically lowers the complexity of get_page()/put_page()
> codepaths. I encourage people to look at this code before-and-after to
> justify time budget on reviewing this patchset.
>
> = Changelog =
>
> v6:
>   - rebase to since-4.0;
>   - optimize mapcount handling: significantly reduce overhead for most
>     common cases.
>   - split pages on migrate_pages();
>   - remove infrastructure for handling splitting PMDs on all architectures;
>   - fix page_mapcount() for hugetlb pages;
>

Hi Kirill,

I ran some LTP mm tests, and the hugemmap tests trigger the following:

[ 438.749457] page:ffffea0000df8000 count:2 mapcount:0 mapping: (null) index:0x0 compound_mapcount: 0
[ 438.750089] flags: 0x3ffc0000004001(locked|head)
[ 438.750089] page dumped because: VM_BUG_ON_PAGE(page_mapped(page))
[ 438.750089] ------------[ cut here ]------------
[ 438.768046] kernel BUG at mm/filemap.c:205!
[ 438.768046] invalid opcode: 0000 [#1] SMP
[ 438.768046] Modules linked in: loop ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ppdev iosf_mbi crct10dif_pclmul crc32_pclmul crc32c_intel joydev ghash_clmulni_intel virtio_balloon pcspkr virtio_console nfsd parport_pc parport floppy pvpanic i2c_piix4 acpi_cpufreq auth_rpcgss nfs_acl lockd grace sunrpc virtio_net qxl virtio_blk drm_kms_helper ttm drm serio_raw ata_generic virtio_pci virtio_ring virtio pata_acpi
[ 438.768046] CPU: 1 PID: 12918 Comm: hugemmap01 Not tainted 4.0.0thprfc-kasv6+ #247
[ 438.768046] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 438.768046] task: ffff88007b09cc40 ti: ffff880077b88000 task.ti: ffff880077b88000
[ 438.768046] RIP: 0010:[] [] __delete_from_page_cache+0x4bc/0x5a0
[ 438.768046] RSP: 0018:ffff880077b8bc58 EFLAGS: 00010086
[ 438.768046] RAX: 0000000000000036 RBX: ffffea0000df8000 RCX: 0000000000000006
[ 438.768046] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88007d5ce9c0
[ 438.768046] RBP: ffff880077b8bcb8 R08: 0000000000000001 R09: 0000000000000001
[ 438.768046] R10: 0000000000000001 R11: ffff880034e44210 R12: ffffea0000df8000
[ 438.768046] R13: ffff88003562cac0 R14: 0000000000000000 R15: ffff88003562cac8
[ 438.768046] FS: 00007fda9ccbb700(0000) GS:ffff88007d400000(0000) knlGS:0000000000000000
[ 438.768046] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 438.768046] CR2: 00007fda9ccc7000 CR3: 00000000785e6000 CR4: 00000000001407e0
[ 438.768046] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 438.768046] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 438.768046] Stack:
[ 438.768046] 0000000000000246 ffff88003562cad8 ffff88003562caf0 0000000000000000
[ 438.768046] ffff88003562cad0 000000009bfc6d69 ffff880077b8bcb8 ffffea0000df8000
[ 438.768046] ffff88003562cad8 0000000000000000 ffffea0000df8000 0000000000000000
[ 438.768046] Call Trace:
[ 438.768046] [] delete_from_page_cache+0x55/0xd0
[ 438.768046] [] truncate_hugepages+0x135/0x290
[ 438.768046] [] ? local_clock+0x15/0x30
[ 438.768046] [] ? lock_release_holdtime.part.31+0xf/0x190
[ 438.768046] [] hugetlbfs_evict_inode+0x18/0x40
[ 438.768046] [] evict+0xab/0x180
[ 438.768046] [] iput+0x1ce/0x390
[ 438.768046] [] do_unlinkat+0x209/0x330
[ 438.768046] [] ? ret_from_sys_call+0x24/0x5f
[ 438.768046] [] ? trace_hardirqs_on_caller+0xfd/0x1c0
[ 438.768046] [] SyS_unlink+0x16/0x20
[ 438.768046] [] system_call_fastpath+0x12/0x17
[ 438.768046] Code: 49 8b 14 24 4c 89 e0 80 e6 80 74 08 4c 89 e7 e8 15 2e 69 00 8b 40 48 83 c0 01 74 25 48 c7 c6 28 fb c6 81 48 89 df e8 d4 43 03 00 <0f> 0b 48 89 df e8 f4 2d 69 00 48 f7 00 00 c0 00 00 49 89 c4 75
[ 438.768046] RIP [] __delete_from_page_cache+0x4bc/0x5a0
[ 438.768046] RSP
[ 438.768046] ---[ end trace 3903188dcb3f3d48 ]---
[ 438.768046] BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:41
[ 438.768046] in_atomic(): 1, irqs_disabled(): 1, pid: 12918, name: hugemmap01
[ 438.768046] INFO: lockdep is turned off.
[ 438.768046] irq event stamp: 6218
[ 438.768046] hardirqs last enabled at (6217): [] __mutex_unlock_slowpath+0xbf/0x190
[ 438.768046] hardirqs last disabled at (6218): [] _raw_spin_lock_irq+0x1f/0x80
[ 438.768046] softirqs last enabled at (6042): [] __do_softirq+0x377/0x670
[ 438.768046] softirqs last disabled at (6027): [] irq_exit+0x11d/0x130
[ 438.768046] CPU: 1 PID: 12918 Comm: hugemmap01 Tainted: G D 4.0.0thprfc-kasv6+ #247
[ 438.768046] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 438.768046] 0000000000000000 000000009bfc6d69 ffff880077b8b8a8 ffffffff81879afa
[ 438.768046] 0000000000000000 ffff88007b09cc40 ffff880077b8b8d8 ffffffff810da0cc
[ 438.768046] 0000000000000000 ffffffff81c68746 0000000000000029 0000000000000000
[ 438.768046] Call Trace:
[ 438.768046] [] dump_stack+0x4c/0x65
[ 438.768046] [] ___might_sleep+0x18c/0x250
[ 438.768046] [] __might_sleep+0x4d/0x90
[ 438.768046] [] down_read+0x2a/0xa0
[ 438.768046] [] exit_signals+0x33/0x150
[ 438.768046] [] do_exit+0xcf/0xd20
[ 438.768046] [] ? kmsg_dump+0x166/0x220
[ 438.768046] [] ? kmsg_dump+0x34/0x220
[ 438.768046] [] oops_end+0x9e/0xe0
[ 438.768046] [] die+0x4b/0x70
[ 438.768046] [] do_trap+0xb0/0x150
[ 438.768046] [] do_error_trap+0xa4/0x180
[ 438.768046] [] ? __delete_from_page_cache+0x4bc/0x5a0
[ 438.768046] [] ? vprintk_emit+0x285/0x620
[ 438.768046] [] ? trace_hardirqs_off_thunk+0x3a/0x3c
[ 438.768046] [] do_invalid_op+0x20/0x30
[ 438.768046] [] invalid_op+0x1e/0x30
[ 438.768046] [] ? __delete_from_page_cache+0x4bc/0x5a0
[ 438.768046] [] ? __delete_from_page_cache+0x4bc/0x5a0
[ 438.768046] [] delete_from_page_cache+0x55/0xd0
[ 438.768046] [] truncate_hugepages+0x135/0x290
[ 438.768046] [] ? local_clock+0x15/0x30
[ 438.768046] [] ? lock_release_holdtime.part.31+0xf/0x190
[ 438.768046] [] hugetlbfs_evict_inode+0x18/0x40
[ 438.768046] [] evict+0xab/0x180
[ 438.768046] [] iput+0x1ce/0x390
[ 438.768046] [] do_unlinkat+0x209/0x330
[ 438.768046] [] ? ret_from_sys_call+0x24/0x5f
[ 438.768046] [] ? trace_hardirqs_on_caller+0xfd/0x1c0
[ 438.768046] [] SyS_unlink+0x16/0x20
[ 438.768046] [] system_call_fastpath+0x12/0x17
[ 438.768046] note: hugemmap01[12918] exited with preempt_count 1

Jerome