- 15 Dec, 2016 40 commits
-
-
Matthew Wilcox authored
The random iteration test only inserts order-0 entries currently. Update it to insert entries of order between 7 and 0. Also make the maximum index configurable, make some variables static, make the test duration variable, remove some useless spinning, and add a fifth thread which calls tag_tagged_items(). Link: http://lkml.kernel.org/r/1480369871-5271-62-git-send-email-mawilcox@linuxonhyperv.comSigned-off-by: Matthew Wilcox <mawilcox@microsoft.com> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Matthew Wilcox authored
When replacing an entry with NULL, we need to delete any sibling entries. Also account deleting exceptional entries properly. Also fix a bug with radix_tree_iter_replace() where we would fail to remove entirely freed nodes. Also fix accounting bug when switching between normal and exceptional entries with replace_slot. Also add testcases for all these bugs. Link: http://lkml.kernel.org/r/1480369871-5271-61-git-send-email-mawilcox@linuxonhyperv.comSigned-off-by: Matthew Wilcox <mawilcox@microsoft.com> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Matthew Wilcox authored
Calculate how many nodes we need to allocate to split an old_order entry into multiple entries, each of size new_order. The test suite checks that we allocated exactly the right number of nodes; neither too many (checked by rtp->nr == 0), nor too few (checked by comparing nr_allocated before and after the call to radix_tree_split()). Link: http://lkml.kernel.org/r/1480369871-5271-60-git-send-email-mawilcox@linuxonhyperv.comSigned-off-by: Matthew Wilcox <willy@linux.intel.com> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Matthew Wilcox authored
This new function splits a larger multiorder entry into smaller entries (potentially multi-order entries). These entries are initialised to RADIX_TREE_RETRY to ensure that RCU walkers who see this state aren't confused. The caller should then call radix_tree_for_each_slot() and radix_tree_replace_slot() in order to turn these retry entries into the intended new entries. Tags are replicated from the original multiorder entry into each new entry. Link: http://lkml.kernel.org/r/1480369871-5271-59-git-send-email-mawilcox@linuxonhyperv.comSigned-off-by: Matthew Wilcox <willy@linux.intel.com> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Matthew Wilcox authored
This new function allows for the replacement of many smaller entries in the radix tree with one larger multiorder entry. From the point of view of an RCU walker, they may see a mixture of the smaller entries and the large entry during the same walk, but they will never see NULL for an index which was populated before the join. Link: http://lkml.kernel.org/r/1480369871-5271-58-git-send-email-mawilcox@linuxonhyperv.comSigned-off-by: Matthew Wilcox <willy@linux.intel.com> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Matthew Wilcox authored
This is an exceptionally complicated function with just one caller (tag_pages_for_writeback). We devote a large portion of the runtime of the test suite to testing this one function which has one caller. By introducing the new function radix_tree_iter_tag_set(), we can eliminate all of the complexity while keeping the performance. The caller can now use a fairly standard radix_tree_for_each() loop, and it doesn't need to worry about tricksy things like 'start' wrapping. The test suite continues to spend a large amount of time investigating this function, but now it's testing the underlying primitives such as radix_tree_iter_resume() and the radix_tree_for_each_tagged() iterator which are also used by other parts of the kernel. Link: http://lkml.kernel.org/r/1480369871-5271-57-git-send-email-mawilcox@linuxonhyperv.comSigned-off-by: Matthew Wilcox <willy@infradead.org> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Matthew Wilcox authored
This rather complicated function can be better implemented as an iterator. It has only one caller, so move the functionality to the only place that needs it. Update the test suite to follow the same pattern. Link: http://lkml.kernel.org/r/1480369871-5271-56-git-send-email-mawilcox@linuxonhyperv.comSigned-off-by: Matthew Wilcox <mawilcox@microsoft.com> Acked-by: Konstantin Khlebnikov <koct9i@gmail.com> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Matthew Wilcox authored
This fixes several interlinked problems with the iterators in the presence of multiorder entries. 1. radix_tree_iter_next() would only advance by one slot, which would result in the iterators returning the same entry more than once if there were sibling entries. 2. radix_tree_next_slot() could return an internal pointer instead of a user pointer if a tagged multiorder entry was immediately followed by an entry of lower order. 3. radix_tree_next_slot() expanded to a lot more code than it used to when multiorder support was compiled in. And I wasn't comfortable with entry_to_node() being in a header file. Fixing radix_tree_iter_next() for the presence of sibling entries necessarily involves examining the contents of the radix tree, so we now need to pass 'slot' to radix_tree_iter_next(), and we need to change the calling convention so it is called *before* dropping the lock which protects the tree. Also rename it to radix_tree_iter_resume(), as some people thought it was necessary to call radix_tree_iter_next() each time around the loop. radix_tree_next_slot() becomes closer to how it looked before multiorder support was introduced. It only checks to see if the next entry in the chunk is a sibling entry or a pointer to a node; this should be rare enough that handling this case out of line is not a performance impact (and such impact is amortised by the fact that the entry we just processed was a multiorder entry). Also, radix_tree_next_slot() used to force a new chunk lookup for untagged entries, which is more expensive than the out of line sibling entry skipping. Link: http://lkml.kernel.org/r/1480369871-5271-55-git-send-email-mawilcox@linuxonhyperv.comSigned-off-by: Matthew Wilcox <mawilcox@microsoft.com> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Matthew Wilcox authored
We drop the lock which protects the radix tree, so we must call radix_tree_iter_next() in order to avoid a modification to the tree invalidating the iterator state. Link: http://lkml.kernel.org/r/1480369871-5271-54-git-send-email-mawilcox@linuxonhyperv.comSigned-off-by: Matthew Wilcox <mawilcox@microsoft.com> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Matthew Wilcox authored
Print the indices of the entries as unsigned (instead of signed) integers and print the parent node of each entry to help navigate around larger trees where the layout is not quite so obvious. Print the indices covered by a node. Rearrange the order of fields printed so the indices and parents line up for each type of entry. Link: http://lkml.kernel.org/r/1480369871-5271-53-git-send-email-mawilcox@linuxonhyperv.comSigned-off-by: Matthew Wilcox <willy@infradead.org> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Matthew Wilcox authored
Since this function is specialised to the radix tree, pass in the node and tag to calculate the address of the bitmap in radix_tree_find_next_bit() instead of the caller. Likewise, there is no need to pass in the size of the bitmap. Link: http://lkml.kernel.org/r/1480369871-5271-52-git-send-email-mawilcox@linuxonhyperv.comSigned-off-by: Matthew Wilcox <willy@infradead.org> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Matthew Wilcox authored
Similar to node_tag_clear(), factor node_tag_set() out of radix_tree_range_tag_if_tagged(). Link: http://lkml.kernel.org/r/1480369871-5271-51-git-send-email-mawilcox@linuxonhyperv.comSigned-off-by: Matthew Wilcox <willy@linux.intel.com> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Matthew Wilcox authored
I want to be able to reference node->parent after freeing node. Currently node->parent is in a union with rcu_head, so it is overwritten when the node is put on the RCU list. We know that private_list is not referenced after the node is freed, so it is safe for these two members to share space. Link: http://lkml.kernel.org/r/1480369871-5271-50-git-send-email-mawilcox@linuxonhyperv.comSigned-off-by: Matthew Wilcox <willy@infradead.org> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Matthew Wilcox authored
Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Matthew Wilcox authored
Remove the old find_next_bit code in favour of linking in the find_bit code from tools/lib. Link: http://lkml.kernel.org/r/1480369871-5271-48-git-send-email-mawilcox@linuxonhyperv.comSigned-off-by: Matthew Wilcox <willy@linux.intel.com> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Matthew Wilcox authored
I need the following functions for the radix tree: bitmap_fill bitmap_empty bitmap_full Copy the implementations from include/linux/bitmap.h Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Matthew Wilcox authored
This probably doubles the size of each item allocated by the test suite but it lets us check a few more things, and may be needed for upcoming API changes that require the caller pass in the order of the entry. Link: http://lkml.kernel.org/r/1480369871-5271-46-git-send-email-mawilcox@linuxonhyperv.comSigned-off-by: Matthew Wilcox <mawilcox@microsoft.com> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Matthew Wilcox authored
item_kill_tree() assumes that everything in the tree is a pointer to a struct item, which is annoying when testing the behaviour of exceptional entries. Fix it to delete exceptional entries on the assumption they don't need to be freed. Link: http://lkml.kernel.org/r/1480369871-5271-45-git-send-email-mawilcox@linuxonhyperv.comSigned-off-by: Matthew Wilcox <mawilcox@microsoft.com> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Matthew Wilcox authored
Calling rcu_barrier() allows all of the rcu-freed memory to be actually returned to the pool, and allows nr_allocated to return to 0. As well as allowing diffs between runs to be more useful, it also lets us pinpoint leaks more effectively. Link: http://lkml.kernel.org/r/1480369871-5271-44-git-send-email-mawilcox@linuxonhyperv.comSigned-off-by: Matthew Wilcox <mawilcox@microsoft.com> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Konstantin Khlebnikov authored
This adds simple benchmark for iterator similar to one I've used for commit 78c1d784 ("radix-tree: introduce bit-optimized iterator") Building with make BENCHMARK=1 set radix tree order to 6, this allows to get performance comparable to in kernel performance. Link: http://lkml.kernel.org/r/1480369871-5271-43-git-send-email-mawilcox@linuxonhyperv.comSigned-off-by: Konstantin Khlebnikov <koct9i@gmail.com> Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Matthew Wilcox authored
Each thread needs to register itself with RCU, otherwise the reading thread's read lock has no effect and the freeing thread will free the memory in the tree without waiting for the read lock to be dropped. Link: http://lkml.kernel.org/r/1480369871-5271-42-git-send-email-mawilcox@linuxonhyperv.comSigned-off-by: Matthew Wilcox <mawilcox@microsoft.com> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Matthew Wilcox authored
Instead of reseeding the random number generator every time around the loop in big_gang_check(), seed it at the beginning of execution. Use rand_r() and an independent base seed for each thread in iteration_test() so they don't stomp all over each others state. Since this particular test depends on the kernel scheduler, the iteration test can't be reproduced based purely on the random seed, but at least it won't pollute the other tests. Print the seed, and allow the seed to be specified so that a run which hits a problem can be reproduced. Link: http://lkml.kernel.org/r/1480369871-5271-41-git-send-email-mawilcox@linuxonhyperv.comSigned-off-by: Matthew Wilcox <willy@infradead.org> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Matthew Wilcox authored
It can be a source of mild concern when the test suite shows that we're leaking nodes. While poring over the source code looking for leaks can lead to some fascinating bugs being discovered, sometimes the leak is simply that these nodes were preallocated and are sitting on the per-CPU list. Free them by calling the CPU dead callback. Link: http://lkml.kernel.org/r/1480369871-5271-40-git-send-email-mawilcox@linuxonhyperv.comSigned-off-by: Matthew Wilcox <willy@infradead.org> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Matthew Wilcox authored
Rather than simply NOP out preempt_enable() and preempt_disable(), keep track of preempt_count and display it regularly in case either the test suite or the code under test is forgetting to balance the enables & disables. Only found a test-case that was forgetting to re-enable preemption, but it's a possibility worth checking. Link: http://lkml.kernel.org/r/1480369871-5271-39-git-send-email-mawilcox@linuxonhyperv.comSigned-off-by: Matthew Wilcox <willy@infradead.org> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Matthew Wilcox authored
In order to test the preload code, it is necessary to fail GFP_ATOMIC allocations, which requires defining GFP_KERNEL and GFP_ATOMIC properly. Remove the obsolete __GFP_WAIT and copy the definitions of the __GFP flags which are used from the kernel include files. We also need the real definition of gfpflags_allow_blocking() to persuade the radix tree to actually use its preallocated nodes. Link: http://lkml.kernel.org/r/1480369871-5271-38-git-send-email-mawilcox@linuxonhyperv.comSigned-off-by: Matthew Wilcox <willy@linux.intel.com> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Matthew Wilcox authored
Patch series "Radix tree patches for 4.10", v3. Mostly these are improvements; the only bug fixes in here relate to multiorder entries (which are unused in the 4.9 tree). This patch (of 32): The radix tree uses its own buggy WARN_ON_ONCE. Replace it with the definition from asm-generic/bug.h Link: http://lkml.kernel.org/r/1480369871-5271-37-git-send-email-mawilcox@linuxonhyperv.comSigned-off-by: Matthew Wilcox <willy@linux.intel.com> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Jan Kara authored
Currently we never clear dirty tags in DAX mappings and thus address ranges to flush accumulate. Now that we have locking of radix tree entries, we have all the locking necessary to reliably clear the radix tree dirty tag when flushing caches for corresponding address range. Similarly to page_mkclean() we also have to write-protect pages to get a page fault when the page is next written to so that we can mark the entry dirty again. Link: http://lkml.kernel.org/r/1479460644-25076-21-git-send-email-jack@suse.czSigned-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Jan Kara authored
Currently PTE gets updated in wp_pfn_shared() after dax_pfn_mkwrite() has released corresponding radix tree entry lock. When we want to writeprotect PTE on cache flush, we need PTE modification to happen under radix tree entry lock to ensure consistent updates of PTE and radix tree (standard faults use page lock to ensure this consistency). So move update of PTE bit into dax_pfn_mkwrite(). Link: http://lkml.kernel.org/r/1479460644-25076-20-git-send-email-jack@suse.czSigned-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Jan Kara authored
Currently, flushing of caches for DAX mappings was ignoring entry lock. So far this was ok (modulo a bug that a difference in entry lock could cause cache flushing to be mistakenly skipped) but in the following patches we will write-protect PTEs on cache flushing and clear dirty tags. For that we will need more exclusion. So do cache flushing under an entry lock. This allows us to remove one lock-unlock pair of mapping->tree_lock as a bonus. Link: http://lkml.kernel.org/r/1479460644-25076-19-git-send-email-jack@suse.czSigned-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Jan Kara authored
DAX will need to implement its own version of page_check_address(). To avoid duplicating page table walking code, export follow_pte() which does what we need. Link: http://lkml.kernel.org/r/1479460644-25076-18-git-send-email-jack@suse.czSigned-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Jan Kara authored
Currently finish_mkwrite_fault() returns 0 when PTE got changed before we acquired PTE lock and VM_FAULT_WRITE when we succeeded in modifying the PTE. This is somewhat confusing since 0 generally means success, it is also inconsistent with finish_fault() which returns 0 on success. Change finish_mkwrite_fault() to return 0 on success and VM_FAULT_NOPAGE when PTE changed. Practically, there should be no behavioral difference since we bail out from the fault the same way regardless whether we return 0, VM_FAULT_NOPAGE, or VM_FAULT_WRITE. Also note that VM_FAULT_WRITE has no effect for shared mappings since the only two places that check it - KSM and GUP - care about private mappings only. Generally the meaning of VM_FAULT_WRITE for shared mappings is not well defined and we should probably clean that up. Link: http://lkml.kernel.org/r/1479460644-25076-17-git-send-email-jack@suse.czSigned-off-by: Jan Kara <jack@suse.cz> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Jan Kara authored
Provide a helper function for finishing write faults due to PTE being read-only. The helper will be used by DAX to avoid the need of complicating generic MM code with DAX locking specifics. Link: http://lkml.kernel.org/r/1479460644-25076-16-git-send-email-jack@suse.czSigned-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Jan Kara authored
wp_page_reuse() handles write shared faults which is needed only in wp_page_shared(). Move the handling only into that location to make wp_page_reuse() simpler and avoid a strange situation when we sometimes pass in locked page, sometimes unlocked etc. Link: http://lkml.kernel.org/r/1479460644-25076-15-git-send-email-jack@suse.czSigned-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Jan Kara authored
So far we set vmf->page during WP faults only when we needed to pass it to the ->page_mkwrite handler. Set it in all the cases now and use that instead of passing page pointer explicitly around. Link: http://lkml.kernel.org/r/1479460644-25076-14-git-send-email-jack@suse.czSigned-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Jan Kara authored
We will need more information in the ->page_mkwrite() helper for DAX to be able to fully finish faults there. Pass vm_fault structure to do_page_mkwrite() and use it there so that information propagates properly from upper layers. Link: http://lkml.kernel.org/r/1479460644-25076-13-git-send-email-jack@suse.czSigned-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Jan Kara authored
Currently we duplicate handling of shared write faults in wp_page_reuse() and do_shared_fault(). Factor them out into a common function. Link: http://lkml.kernel.org/r/1479460644-25076-12-git-send-email-jack@suse.czSigned-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Jan Kara authored
Move final handling of COW faults from generic code into DAX fault handler. That way generic code doesn't have to be aware of peculiarities of DAX locking so remove that knowledge and make locking functions private to fs/dax.c. Link: http://lkml.kernel.org/r/1479460644-25076-11-git-send-email-jack@suse.czSigned-off-by: Jan Kara <jack@suse.cz> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Jan Kara authored
Introduce finish_fault() as a helper function for finishing page faults. It is rather thin wrapper around alloc_set_pte() but since we'd want to call this from DAX code or filesystems, it is still useful to avoid some boilerplate code. Link: http://lkml.kernel.org/r/1479460644-25076-10-git-send-email-jack@suse.czSigned-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Jan Kara authored
Patch series "dax: Clear dirty bits after flushing caches", v5. Patchset to clear dirty bits from radix tree of DAX inodes when caches for corresponding pfns have been flushed. In principle, these patches enable handlers to easily update PTEs and do other work necessary to finish the fault without duplicating the functionality present in the generic code. I'd like to thank Kirill and Ross for reviews of the series! This patch (of 20): To allow full handling of COW faults add memcg field to struct vm_fault and a return value of ->fault() handler meaning that COW fault is fully handled and memcg charge must not be canceled. This will allow us to remove knowledge about special DAX locking from the generic fault code. Link: http://lkml.kernel.org/r/1479460644-25076-9-git-send-email-jack@suse.czSigned-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Jan Kara authored
Add orig_pte field to vm_fault structure to allow ->page_mkwrite handlers to fully handle the fault. This also allows us to save some passing of extra arguments around. Link: http://lkml.kernel.org/r/1479460644-25076-8-git-send-email-jack@suse.czSigned-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-