[PATCH] Unpinned futexes v2: indexing changes
This changes the way futexes are indexed, so that they don't pin pages. It also fixes some bugs with private mappings and COW pages. Currently, all futexes look up the page at the userspace address and pin it, using the pair (page,offset) as an index into a table of waiting futexes. Any page with a futex waiting on it remains pinned in RAM, which is a problem when many futexes are used, especially with FUTEX_FD. Another problem is that the page is not always the correct one, if it can be changed later by a COW (copy on write) operation. This can happen when waiting on a futex without writing to it after fork(), exec() or mmap(), if the page is then written to before attempting to wake a futex at the same adress. There are two symptoms of the COW problem: - The wrong process can receive wakeups - A process can fail to receive required wakeups. This patch fixes both by changing the indexing so that VM_SHARED mappings use the triple (inode,offset,index), and private mappings use the pair (mm,virtual_address). The former correctly handles all shared mappings, including tmpfs and therefore all kinds of shared memory (IPC shm, /dev/shm and MAP_ANON|MAP_SHARED). This works because every mapping which is VM_SHARED has an associated non-zero vma->vm_file, and hence inode. (This is ensured in do_mmap_pgoff, where it calls shmem_zero_setup). The latter handles all private mappings, both files and anonymous. It isn't affected by COW, because it doesn't care about the actual pages, just the virtual address. The patch has a few bonuses: 1. It removes the vcache implementation, as only futexes were using it, and they don't any more. 2. Removing the vcache should make COW page faults a bit faster. 3. Futex operations no longer take the page table lock, walk the page table, fault in pages that aren't mapped in the page table, or do a vcache hash lookup - they are mostly a simple offset calculation with one hash for the futex table. So they should be noticably faster. Special thanks to Hugh Dickins, Andrew Morton and Rusty Russell for insightful feedback. All suggestions are included.
Showing
This diff is collapsed.
mm/vcache.c
deleted
100644 → 0
Please register or sign in to comment