- 15 Mar, 2016 40 commits
-
Joonsoo Kim authored
The current cache type determination code is open-coded and hard to understand. A following patch will introduce one more cache type, which would make the code even more complex. So, before that happens, this patch abstracts the cache type determination into helper functions. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
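A minimal sketch of the abstraction described here, assuming one helper per cache type that returns true when the layout fits; the helper name and internals are illustrative, not the exact mm/slab.c code:

  /* try to lay the cache out with its freelist off-slab; returns true
   * if at least one object fits with this layout */
  static bool set_off_slab_cache(struct kmem_cache *cachep,
                                 size_t size, unsigned long flags)
  {
      size_t left_over;

      cachep->num = 0;
      left_over = calculate_slab_order(cachep, size, flags | CFLGS_OFF_SLAB);
      if (!cachep->num)
          return false;

      cachep->colour = left_over / cachep->colour_off;
      return true;
  }

__kmem_cache_create() can then simply try each cache type in order and stop at the first helper that succeeds, which is what makes adding one more type cheap.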
-
Joonsoo Kim authored
Finding a suitable OFF_SLAB candidate depends on the aligned cache size rather than the original size. The same reasoning applies to the debug pagealloc candidate. So, this patch moves the alignment fixup up to its proper position. From that point on, size is already aligned, so we can remove some later alignment fixups. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Joonsoo Kim authored
Currently, the freelist is at the front of the slab page. This requires extra space to meet the object alignment requirement. If we put the freelist at the end of the slab page, objects start at the page boundary and are already correctly aligned. This is possible because the freelist itself has no alignment constraint. This gives us two benefits: it removes the extra memory needed for freelist alignment, and it removes a complex calculation at cache initialization. I can't think of a notable drawback here. I mentioned that this would reduce extra memory space, but this benefit is rather theoretical because it applies to very few cases. The following example cache types can benefit from this change:

  size align num before after
    32     8 124   4100  4092
    64     8  63   4103  4095
    88     8  46   4102  4094
   272     8  15   4103  4095
   408     8  10   4098  4090
    32    16 124   4108  4092
    64    16  63   4111  4095
    32    32 124   4124  4092
    64    32  63   4127  4095
    96    32  42   4106  4074

"before" is the whole size of the objects plus the aligned freelist before applying the patch, and "after" shows the result of this patch. Since "before" exceeds 4096, the number of objects would have to decrease, wasting memory. In any case, this patch removes a complex calculation, so it looks beneficial to me. [akpm@linux-foundation.org: fix kerneldoc] Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
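A runnable toy model of the layout change, under simplifying assumptions (1-byte freelist indices, no colouring), so the totals will not match the kernel's exact accounting above:

  #include <stdio.h>

  /* round x up to the next multiple of a (a must be a power of two) */
  static unsigned int align_up(unsigned int x, unsigned int a)
  {
      return (x + a - 1) & ~(a - 1);
  }

  int main(void)
  {
      unsigned int size = 32, align = 8, num = 124;
      unsigned int freelist = num;    /* one 1-byte index per object */

      /* freelist first: pad so the first object starts aligned */
      unsigned int front = align_up(freelist, align) + num * size;
      /* freelist last: objects start at the page boundary, no padding */
      unsigned int back = num * size + freelist;

      printf("freelist at front: %u bytes, at back: %u bytes\n",
             front, back);
      return 0;
  }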
-
Joonsoo Kim authored
Now, we don't use the object status buffer in any setup. Remove it. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Joonsoo Kim authored
DEBUG_SLAB_LEAK is a debug option. Its current implementation requires a status buffer, so we need more memory to use it, and it makes the kmem_cache initialization step more complex. To remove this extra memory usage and to simplify initialization, this patch implements the feature in another way. When a user requests slab object owner information, we mark that information gathering has started. Then, all free objects in the caches are flushed to their corresponding slab pages. Now we can distinguish all freed objects, and therefore all allocated objects, too. After collecting owner information on the allocated objects, the mark is checked to confirm that no free happened during the processing. If so, we can be sure our information is correct, and it is returned to the user. Although this approach is rather complex, it has the two important benefits mentioned above, so I think it is worth the change. There is one drawback: it takes more time to get slab object owner information. But this is just a debug option, so that doesn't matter at all. To ease review, this patch implements only the new way; the following patch will remove the now-useless code. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
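A hedged outline of the mechanism described above; drain_cpu_caches() is a real slab helper, while the field and other helper names here are assumptions for illustration:

  /* any free while a scan is in flight invalidates the result */
  static void set_store_user_dirty(struct kmem_cache *cachep)
  {
      cachep->store_user_clean = 0;
  }

  /* called when the user reads the object-owner information */
  static int collect_object_owners(struct kmem_cache *cachep)
  {
      drain_cpu_caches(cachep);      /* flush free objects to slab pages */
      cachep->store_user_clean = 1;  /* open a "no frees yet" window */

      scan_slab_pages(cachep);       /* hypothetical: record owners of all
                                        objects not on a page's freelist */

      /* a concurrent free cleared the mark, so the data is stale */
      if (!cachep->store_user_clean)
          return -EAGAIN;            /* caller retries the collection */
      return 0;
  }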
-
Joonsoo Kim authored
Currently, open-coded checks for a DEBUG_PAGEALLOC cache are spread across several sites. This makes the code unreadable and hard to change. This patch cleans that up. The following patch will change the criteria for a DEBUG_PAGEALLOC cache, so this clean-up helps there, too. [akpm@linux-foundation.org: fix build with CONFIG_DEBUG_PAGEALLOC=n] Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
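A sketch of the kind of predicate such a clean-up introduces; the exact criteria used here are an assumption (the following patch changes them anyway):

  static inline bool is_debug_pagealloc_cache(struct kmem_cache *cachep)
  {
      /* assumed criteria: poisoning requested and an object size that
       * allows page-granularity mapping tricks */
      return IS_ENABLED(CONFIG_DEBUG_PAGEALLOC) &&
             (cachep->flags & SLAB_POISON) &&
             cachep->size % PAGE_SIZE == 0;
  }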
-
Joonsoo Kim authored
debug_pagealloc debugging is tied to the SLAB_POISON flag rather than the FORCED_DEBUG option, although FORCED_DEBUG will enable SLAB_POISON. Fix it. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Joonsoo Kim authored
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Joonsoo Kim authored
Some "#if DEBUG" blocks report slab implementation bugs rather than user usage bugs. They aren't really needed, because slab has been stable for quite a long time, and they make the code too cluttered. This patch removes them. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Joonsoo Kim authored
It is obsolete so remove it. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Joonsoo Kim authored
This patchset implements a new way of managing freed objects: OBJFREELIST_SLAB. Its purpose is to reduce memory overhead in SLAB. SLAB needs an array to manage the freed objects in a slab. If there is leftover space after objects are packed into a slab, we can use it as this management array, and in that case there is no memory waste. But in the other cases, we need to allocate extra memory for the management array or use dedicated internal memory in the slab for it. Both cases waste memory, so neither is good. With this patchset, a freed object itself can be used as the management array, so memory waste is reduced. The detailed idea and numbers are described in the last patch's commit description; please refer to it. In fact, I tested another idea that implements OBJFREELIST_SLAB with an extendable linked array threaded through other freed objects. It removes the memory waste completely, but it adds computational overhead in a critical lock path, and the overhead seems to outweigh the benefit, so this patchset doesn't include it. I will attach the prototype just for reference. This patch (of 16): We use the freelist_idx_t type for free object management, and its size can be smaller than that of unsigned int. Fix it. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
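A conceptual sketch of where the freelist lives under OBJFREELIST_SLAB; index_to_obj() is a real mm/slab.c helper, but this placement logic is simplified for illustration:

  /* host the freelist index array inside a currently-free object
   * instead of leftover space or an external allocation */
  static freelist_idx_t *objfreelist_location(struct kmem_cache *cachep,
                                              struct page *page)
  {
      /* use the last object: a completely full slab needs no freelist,
       * so nothing is lost when this object is finally handed out */
      return (freelist_idx_t *)index_to_obj(cachep, page, cachep->num - 1);
  }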
-
Jesper Dangaard Brouer authored
Fix up trivial spelling errors, noticed while reading the code. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Jesper Dangaard Brouer authored
This patch introduces a new API call, kfree_bulk(), for bulk freeing of memory objects not bound to a single kmem_cache. Christoph pointed out that it is possible to implement freeing of objects without knowing the kmem_cache pointer, as that information is available from the object's page->slab_cache, and proposed removing the kmem_cache argument from the bulk free API. Jesper demonstrated that these extra steps per object come at a performance cost. Only when CONFIG_MEMCG_KMEM is compiled in and activated at runtime are these steps done anyhow. The extra cost is most visible for the SLAB allocator, because the SLUB allocator does the page lookup (virt_to_head_page()) anyhow. Thus, the conclusion was to keep the kmem_cache bulk free API with a kmem_cache pointer, while still implementing a kfree_bulk() API fairly easily, simply by handling the case where kmem_cache_free_bulk() gets called with a NULL kmem_cache pointer. This does increase the code size a bit, but implementing a separate kfree_bulk() call would likely increase code size even more.

The benchmarks below show the cost of alloc+free (obj size 256 bytes) on a CPU i7-4790K @ 4.00GHz, no PREEMPT and CONFIG_MEMCG_KMEM=y.

Code size increase for SLAB:

  add/remove: 0/0 grow/shrink: 1/0 up/down: 74/0 (74)
  function               old   new   delta
  kmem_cache_free_bulk   660   734   +74

SLAB fastpath: 87 cycles(tsc) 21.814 ns

   sz - fallback              - kmem_cache_free_bulk  - kfree_bulk
    1 - 103 cycles 25.878 ns  -  41 cycles 10.498 ns  -  81 cycles 20.312 ns
    2 -  94 cycles 23.673 ns  -  26 cycles  6.682 ns  -  42 cycles 10.649 ns
    3 -  92 cycles 23.181 ns  -  21 cycles  5.325 ns  -  39 cycles  9.950 ns
    4 -  90 cycles 22.727 ns  -  18 cycles  4.673 ns  -  26 cycles  6.693 ns
    8 -  89 cycles 22.270 ns  -  14 cycles  3.664 ns  -  23 cycles  5.835 ns
   16 -  88 cycles 22.038 ns  -  14 cycles  3.503 ns  -  22 cycles  5.543 ns
   30 -  89 cycles 22.284 ns  -  13 cycles  3.310 ns  -  20 cycles  5.197 ns
   32 -  88 cycles 22.249 ns  -  13 cycles  3.420 ns  -  20 cycles  5.166 ns
   34 -  88 cycles 22.224 ns  -  14 cycles  3.643 ns  -  20 cycles  5.170 ns
   48 -  88 cycles 22.088 ns  -  14 cycles  3.507 ns  -  20 cycles  5.203 ns
   64 -  88 cycles 22.063 ns  -  13 cycles  3.428 ns  -  20 cycles  5.152 ns
  128 -  89 cycles 22.483 ns  -  15 cycles  3.891 ns  -  23 cycles  5.885 ns
  158 -  89 cycles 22.381 ns  -  15 cycles  3.779 ns  -  22 cycles  5.548 ns
  250 -  91 cycles 22.798 ns  -  16 cycles  4.152 ns  -  23 cycles  5.967 ns

SLAB when enabling MEMCG_KMEM at runtime (columns as above), kmemcg fastpath: 130 cycles(tsc) 32.684 ns (step:0)

    1 - 148 cycles 37.220 ns  -  66 cycles 16.622 ns  -  66 cycles 16.583 ns
    2 - 141 cycles 35.510 ns  -  51 cycles 12.820 ns  -  58 cycles 14.625 ns
    3 - 140 cycles 35.017 ns  -  37 cycles  9.326 ns  -  33 cycles  8.474 ns
    4 - 137 cycles 34.507 ns  -  31 cycles  7.888 ns  -  33 cycles  8.300 ns
    8 - 140 cycles 35.069 ns  -  25 cycles  6.461 ns  -  25 cycles  6.436 ns
   16 - 138 cycles 34.542 ns  -  23 cycles  5.945 ns  -  22 cycles  5.670 ns
   30 - 136 cycles 34.227 ns  -  22 cycles  5.502 ns  -  22 cycles  5.587 ns
   32 - 136 cycles 34.253 ns  -  21 cycles  5.475 ns  -  21 cycles  5.324 ns
   34 - 136 cycles 34.254 ns  -  21 cycles  5.448 ns  -  20 cycles  5.194 ns
   48 - 136 cycles 34.075 ns  -  21 cycles  5.458 ns  -  21 cycles  5.367 ns
   64 - 135 cycles 33.994 ns  -  21 cycles  5.350 ns  -  21 cycles  5.259 ns
  128 - 137 cycles 34.446 ns  -  23 cycles  5.816 ns  -  22 cycles  5.688 ns
  158 - 137 cycles 34.379 ns  -  22 cycles  5.727 ns  -  22 cycles  5.602 ns
  250 - 138 cycles 34.755 ns  -  24 cycles  6.093 ns  -  23 cycles  5.986 ns

Code size increase for SLUB:

  function               old   new   delta
  kmem_cache_free_bulk   717   799   +82

SLUB benchmark, SLUB fastpath: 46 cycles(tsc) 11.691 ns (step:0)

   sz - fallback              - kmem_cache_free_bulk  - kfree_bulk
    1 -  61 cycles 15.486 ns  -  53 cycles 13.364 ns  -  57 cycles 14.464 ns
    2 -  54 cycles 13.703 ns  -  32 cycles  8.110 ns  -  33 cycles  8.482 ns
    3 -  53 cycles 13.272 ns  -  25 cycles  6.362 ns  -  27 cycles  6.947 ns
    4 -  51 cycles 12.994 ns  -  24 cycles  6.087 ns  -  24 cycles  6.078 ns
    8 -  50 cycles 12.576 ns  -  21 cycles  5.354 ns  -  22 cycles  5.513 ns
   16 -  49 cycles 12.368 ns  -  20 cycles  5.054 ns  -  20 cycles  5.042 ns
   30 -  49 cycles 12.273 ns  -  18 cycles  4.748 ns  -  19 cycles  4.758 ns
   32 -  49 cycles 12.401 ns  -  19 cycles  4.821 ns  -  19 cycles  4.810 ns
   34 -  98 cycles 24.519 ns  -  24 cycles  6.154 ns  -  24 cycles  6.157 ns
   48 -  83 cycles 20.833 ns  -  21 cycles  5.446 ns  -  21 cycles  5.429 ns
   64 -  75 cycles 18.891 ns  -  20 cycles  5.247 ns  -  20 cycles  5.238 ns
  128 -  93 cycles 23.271 ns  -  27 cycles  6.856 ns  -  27 cycles  6.823 ns
  158 - 102 cycles 25.581 ns  -  30 cycles  7.714 ns  -  30 cycles  7.695 ns
  250 - 107 cycles 26.917 ns  -  38 cycles  9.514 ns  -  38 cycles  9.506 ns

SLUB when enabling MEMCG_KMEM at runtime (columns as above), kmemcg fastpath: 71 cycles(tsc) 17.897 ns (step:0)

    1 -  85 cycles 21.484 ns  -  78 cycles 19.569 ns  -  75 cycles 18.938 ns
    2 -  81 cycles 20.363 ns  -  45 cycles 11.258 ns  -  44 cycles 11.076 ns
    3 -  78 cycles 19.709 ns  -  33 cycles  8.354 ns  -  32 cycles  8.044 ns
    4 -  77 cycles 19.430 ns  -  28 cycles  7.216 ns  -  28 cycles  7.003 ns
    8 - 101 cycles 25.288 ns  -  23 cycles  5.849 ns  -  23 cycles  5.787 ns
   16 -  76 cycles 19.148 ns  -  20 cycles  5.162 ns  -  20 cycles  5.081 ns
   30 -  76 cycles 19.067 ns  -  19 cycles  4.868 ns  -  19 cycles  4.821 ns
   32 -  76 cycles 19.052 ns  -  19 cycles  4.857 ns  -  19 cycles  4.815 ns
   34 - 121 cycles 30.291 ns  -  25 cycles  6.333 ns  -  25 cycles  6.268 ns
   48 - 108 cycles 27.111 ns  -  21 cycles  5.498 ns  -  21 cycles  5.458 ns
   64 - 100 cycles 25.164 ns  -  20 cycles  5.242 ns  -  20 cycles  5.229 ns
  128 - 155 cycles 38.976 ns  -  27 cycles  6.886 ns  -  27 cycles  6.892 ns
  158 - 132 cycles 33.034 ns  -  30 cycles  7.711 ns  -  30 cycles  7.728 ns
  250 - 130 cycles 32.612 ns  -  38 cycles  9.560 ns  -  38 cycles  9.549 ns

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
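A short usage sketch of the new entry point; kfree_bulk() frees objects that may come from different caches by looking each cache up via the object's page->slab_cache:

  void *objs[16];
  size_t i;

  for (i = 0; i < ARRAY_SIZE(objs); i++) {
      objs[i] = kmalloc(256, GFP_KERNEL);
      if (!objs[i]) {
          kfree_bulk(i, objs);   /* free what was already allocated */
          return -ENOMEM;
      }
  }

  /* one call frees them all, no kmem_cache pointer needed */
  kfree_bulk(ARRAY_SIZE(objs), objs);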
-
Jesper Dangaard Brouer authored
This patch implements the free side of the bulk API for the SLAB allocator, kmem_cache_free_bulk(), and concludes the implementation of the optimized bulk API for SLAB. Benchmarked[1] cost of alloc+free (obj size 256 bytes) on a CPU i7-4790K @ 4.00GHz, with no debug options, no PREEMPT and CONFIG_MEMCG_KMEM=y, but no active user of kmemcg. SLAB single alloc+free cost: 87 cycles(tsc) 21.814 ns with this optimized config.

  bulk - current fallback          - optimized SLAB bulk
     1 - 102 cycles(tsc) 25.747 ns -  41 cycles(tsc) 10.490 ns - improved 59.8%
     2 -  94 cycles(tsc) 23.546 ns -  26 cycles(tsc)  6.567 ns - improved 72.3%
     3 -  92 cycles(tsc) 23.127 ns -  20 cycles(tsc)  5.244 ns - improved 78.3%
     4 -  90 cycles(tsc) 22.663 ns -  18 cycles(tsc)  4.588 ns - improved 80.0%
     8 -  88 cycles(tsc) 22.242 ns -  14 cycles(tsc)  3.656 ns - improved 84.1%
    16 -  88 cycles(tsc) 22.010 ns -  13 cycles(tsc)  3.480 ns - improved 85.2%
    30 -  89 cycles(tsc) 22.305 ns -  13 cycles(tsc)  3.303 ns - improved 85.4%
    32 -  89 cycles(tsc) 22.277 ns -  13 cycles(tsc)  3.309 ns - improved 85.4%
    34 -  88 cycles(tsc) 22.246 ns -  13 cycles(tsc)  3.294 ns - improved 85.2%
    48 -  88 cycles(tsc) 22.121 ns -  13 cycles(tsc)  3.492 ns - improved 85.2%
    64 -  88 cycles(tsc) 22.052 ns -  13 cycles(tsc)  3.411 ns - improved 85.2%
   128 -  89 cycles(tsc) 22.452 ns -  15 cycles(tsc)  3.841 ns - improved 83.1%
   158 -  89 cycles(tsc) 22.403 ns -  14 cycles(tsc)  3.746 ns - improved 84.3%
   250 -  91 cycles(tsc) 22.775 ns -  16 cycles(tsc)  4.111 ns - improved 82.4%

Note that very large bulk operations are not recommended with this API, because local IRQs are disabled for the whole period.

[1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/mm/slab_bulk_test01.c

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
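A hedged sketch of why very large bulk frees are discouraged: the implementation disables local IRQs once around the whole loop, so IRQ-off latency grows with the batch size (shape assumed from the description, not the exact mm/slab.c code):

  void kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p)
  {
      size_t i;

      local_irq_disable();                  /* one disable per batch */
      for (i = 0; i < size; i++)
          __cache_free(s, p[i], _RET_IP_);  /* per-object free, IRQs off */
      local_irq_enable();
  }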
-
Jesper Dangaard Brouer authored
Move the call to cache_alloc_debugcheck_after() outside the IRQ disabled section in kmem_cache_alloc_bulk(). When CONFIG_DEBUG_SLAB is disabled the compiler should remove this code. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Jesper Dangaard Brouer authored
This patch implements the alloc side of the bulk API for the SLAB allocator. Further optimization is still possible by changing the call to __do_cache_alloc() into something that can return multiple objects. That optimization is left for later, given that the end results already show a speedup in the area of 80%. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
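A usage sketch of the bulk alloc/free pair; the signatures are the kernel's, while the cache variable and error handling are illustrative. kmem_cache_alloc_bulk() is all-or-nothing: it returns the number of objects allocated, or 0 on failure:

  struct kmem_cache *cache;   /* assume created elsewhere */
  void *objs[32];

  if (!kmem_cache_alloc_bulk(cache, GFP_KERNEL, ARRAY_SIZE(objs), objs))
      return -ENOMEM;

  /* ... use the 32 objects ... */

  kmem_cache_free_bulk(cache, ARRAY_SIZE(objs), objs);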
-
Jesper Dangaard Brouer authored
Reviewers will notice that the order of kmemcheck_slab_alloc() and kmemleak_alloc_recursive() in slab_post_alloc_hook() is swapped compared to slab.c / the SLAB allocator. Also note that the memset now occurs before the calls to kmemcheck_slab_alloc() and kmemleak_alloc_recursive(). I assume this reordering of kmemcheck, kmemleak and memset is okay because it is the order the SLUB allocator uses. This patch completes the sharing of alloc hooks between SLUB and SLAB. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
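A sketch of the now-shared post-alloc hook implied here, close to (but not guaranteed to match) the mm/slab.h version of the time:

  static inline void slab_post_alloc_hook(struct kmem_cache *s, gfp_t flags,
                                          size_t size, void **p)
  {
      size_t i;

      for (i = 0; i < size; i++) {
          void *object = p[i];

          /* SLUB's ordering: kmemcheck first, then kmemleak */
          kmemcheck_slab_alloc(s, flags, object, slab_ksize(s));
          kmemleak_alloc_recursive(object, s->object_size, 1,
                                   s->flags, flags);
      }
  }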
-
Jesper Dangaard Brouer authored
In the SLAB allocator, kmemcheck_slab_alloc() is guarded against being called when the object is NULL. In the SLUB allocator this NULL-pointer invocation can happen, which seems like an oversight. Move the NULL-pointer check into the kmemcheck code (kmemcheck_slab_alloc) so the check moves out of the fastpath when not compiling with CONFIG_KMEMCHECK. This is a step towards sharing the post_alloc_hook between SLUB and SLAB, because slab_post_alloc_hook() does not perform this check before calling kmemcheck_slab_alloc(). Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
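The shape of the move, as a sketch; the real check lands in the kmemcheck code:

  void kmemcheck_slab_alloc(struct kmem_cache *s, gfp_t gfpflags,
                            void *object, size_t size)
  {
      /* the allocation failed, nothing to shadow-track */
      if (unlikely(!object))
          return;
      /* ... existing kmemcheck tracking ... */
  }

When CONFIG_KMEMCHECK is not set, kmemcheck_slab_alloc() compiles away entirely, so the branch leaves the allocation fastpath.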
-
Jesper Dangaard Brouer authored
Deduplicate code in SLAB allocator functions slab_alloc() and slab_alloc_node() by using the slab_pre_alloc_hook() call, which is now shared between SLUB and SLAB. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Jesper Dangaard Brouer authored
Remove the SLAB-specific function slab_should_failslab() by moving the fault-injection check for the bootstrap slab into the shared function should_failslab() (used by both SLAB and SLUB). This is a step towards sharing alloc hooks between SLUB and SLAB. The bootstrap slab "kmem_cache" is used for allocating struct kmem_cache objects to the allocator itself. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
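A sketch of the merged check as described, assuming the bootstrap-cache exemption simply moves to the top of the shared function (the fault-injection internals are elided):

  bool should_failslab(struct kmem_cache *s, gfp_t gfpflags)
  {
      /* never inject failures into the bootstrap cache: the allocator
       * uses it to allocate struct kmem_cache itself */
      if (unlikely(s == kmem_cache))
          return false;

      return should_fail(&failslab.attr, s->object_size);
  }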
-
Jesper Dangaard Brouer authored
First step towards sharing alloc hooks between the SLUB and SLAB allocators. Move the SLUB allocator's *_alloc_hook functions to the common mm/slab.h for internal slab definitions. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Jesper Dangaard Brouer authored
This change is primarily an attempt to make it easier to see the optimizations the compiler performs when CONFIG_MEMCG_KMEM is not enabled. Performance-wise, even when CONFIG_MEMCG_KMEM is compiled in, the overhead is zero, because as long as no process has enabled kmem cgroup accounting, the assignment is replaced by asm NOP operations. This is possible because memcg_kmem_enabled() uses a static_key_false() construct. It also helps readability, as it avoids accessing the p[] array like p[size - 1], which "exposes" that the array is processed backwards inside the helper function build_detached_freelist(). Lastly, this also makes the code more robust in error cases, like passing NULL pointers in the array, which were previously handled before commit 03374518 ("slub: add missing kmem cgroup support to kmem_cache_free_bulk"). Fixes: 03374518 ("slub: add missing kmem cgroup support to kmem_cache_free_bulk") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Arnd Bergmann authored
gcc-6.0 found an ancient bug in the paride driver, which has had a "module_param(verbose, bool, 0);" since before 2.6.12, but actually uses 'verbose' to accept '0', '1' or '2' as arguments:

  drivers/block/paride/pd.c: In function 'pd_init_dev_parms':
  drivers/block/paride/pd.c:298:29: warning: comparison of constant '1' with boolean expression is always false [-Wbool-compare]
   #define DBMSG(msg) ((verbose>1)?(msg):NULL)

In 2012, Rusty did a cleanup patch that also changed the type of the variable to 'bool', which introduced what is now a gcc warning. This patch changes the type back to 'int' and adapts the module_param() line instead, so it should work as documented, in case anyone ever cares about running the ancient driver with debugging. Fixes: 90ab5ee9 ("module_param: make bool parameters really bool (drivers & misc)") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Rusty Russell <rusty@rustcorp.com.au> Cc: Tim Waugh <tim@cyberelk.net> Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Cc: Jens Axboe <axboe@fb.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
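The shape of the fix as described; the permission bits here are an assumption:

  static int verbose;   /* the driver accepts 0, 1 or 2 */
  module_param(verbose, int, 0644);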
-
San Mehat authored
This patch has been carried in the Android tree for quite some time and is one of the few patches required to get a mainline kernel up and running with an existing Android userspace. So I wanted to submit it for review and consideration whether it should be merged. For partitions, add the new uevent parameters 'PARTN', which specifies the partition's index in the table, and 'PARTNAME', which specifies the partition name of a partition device. Android's userspace uses this for creating device node links from the partition name and number, e.g.: /dev/block/platform/soc/by-name/system or /dev/block/platform/soc/by-num/p1 One can see its usage here: https://android.googlesource.com/platform/system/core/+/master/init/devices.cpp#355 and https://android.googlesource.com/platform/system/core/+/master/init/devices.cpp#494 [john.stultz@linaro.org: dropped NPARTS and reworded commit message for context] Signed-off-by: Dima Zavin <dima@android.com> Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Rom Lemarchand <romlem@google.com> Cc: Android Kernel Team <kernel-team@android.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: <harald@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kay Sievers <kay@vrfy.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
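A sketch of the uevent additions described, modeled on block/genhd.c's part_uevent(); the exact field names are assumptions:

  static int part_uevent(struct device *dev, struct kobj_uevent_env *env)
  {
      struct hd_struct *part = dev_to_part(dev);

      /* index of the partition in the table */
      add_uevent_var(env, "PARTN=%u", part->partno);
      /* partition name, when the label provides one */
      if (part->info && part->info->volname[0])
          add_uevent_var(env, "PARTNAME=%s", part->info->volname);
      return 0;
  }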
-
Jun Piao authored
In dlm_send_join_cancels(), node is defined with type unsigned int but initialized to -1, which overflows on assignment. Although this won't cause any runtime problem, the code looks a little inconsistent. Signed-off-by: Jun Piao <piaojun@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Jiufei Xue authored
When o2hb detects a node down, it first sets the dead node in the recovery map and creates ocfs2rec, which will replay the journal for the dead node. The o2hb thread then calls dlm_do_local_recovery_cleanup() to delete the lock for the dead node. After the dead node's lock is gone, locks for other nodes can be granted and may modify the metadata without replaying the dead node's journal. In detail:

  N1: modifies the extent tree of an inode, commits the dirty metadata to its journal, then goes down.
  N3 (master): the o2hb thread detects that N1 is down, sets the recovery map and deletes the lock of N1; dlm_thread then flushes the AST for the lock of N2.
  N2: has not detected the death of N1, so its recovery map is empty; it reads the inode from disk without replaying the journal of N1 and modifies the extent tree of the inode that N1 had modified.
  Later, ocfs2rec recovers the journal of N1, and the modification of N2 is lost.

The modifications of N1 and N2 are not serialized, and this can lead to a read-only file system. We can set a recovery_waiting flag on the lock resource after deleting the lock for the dead node, to prevent other nodes from getting the lock before dlm recovery. After dlm recovery, the recovery map on N2 is not empty, and ocfs2_inode_lock_full_nested() will wait for ocfs2 recovery. Signed-off-by: Jiufei Xue <xuejiufei@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
xuejiufei authored
If the master migrates this lock resource to a node while that node happens to be purging it, a new lock resource will be created and inserted into the hash list. If the master then goes down, the lock resource being purged is recovered, so there exist two lock resources with different owners. So return an error to the master if the lock resource is in the DROPPING state; the master will then retry migrating this lock resource. Signed-off-by: xuejiufei <xuejiufei@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
xuejiufei authored
If the master goes down after returning in-progress for a deref message, the lock resource on the non-master node cannot be purged. Clear the DROPPING_REF flag and recover it. Signed-off-by: xuejiufei <xuejiufei@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
xuejiufei authored
The master returns in-progress to the non-master node when it cannot clear the refmap bit right away, and the non-master node will not purge the lock resource until it receives a deref done message. Signed-off-by: xuejiufei <xuejiufei@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
xuejiufei authored
This series of patches fixes the mis-ordering of setting/clearing the refmap bit described below:

  Node 1:           dlmlock
                      dlm_do_master_request
  Node 2 (master):  dlm_master_request_handler
                      -> dlm_lockres_set_refmap_bit
  Node 1:           dlmlock succeeds
                    dlmunlock succeeds
                    dlm_purge_lockres
  Node 2:           dlm_deref_handler
                      -> finds the lock resource in DLM_LOCK_RES_SETREF_INPROG
                         state, so dispatches a deref work
  Node 1:           dlm_purge_lockres succeeds
                    calls dlmlock again
                      dlm_do_master_request
  Node 2:           dlm_master_request_handler
                      -> dlm_lockres_set_refmap_bit
                    deref work triggers, calls dlm_lockres_clear_refmap_bit
                    to clear Node 1 from the refmap
                    dlm_purge_lockres succeeds
  Node 1:           dlm_send_remote_lock_request returns DLM_IVLOCKID
                    because the lockres does not exist
                    BUG if the lockres is $RECOVERY

This series adds a new message to keep the set and clear ordered: other nodes can purge the lock resource only after the refmap bit on the master is cleared. This patch adds the DEREF_DONE message and the corresponding handler; a node can purge the lock resource after receiving this message. As a new message is added, increase the minor number of the dlm protocol version. Signed-off-by: xuejiufei <xuejiufei@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Joseph Qi authored
Referring to cluster/tcp.h, NET_MAX_PAYLOAD_BYTES is a typo of O2NET_MAX_PAYLOAD_BYTES. Since DLM_MIG_LOCKRES_RESERVED is not actually used at present, this won't cause any problem, but we'd better correct it for future use. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
jiangyiwen authored
Commit a75e9cca ("ocfs2: use spinlock irqsave for downconvert lock") missed one unmodified spot in ocfs2_osb_dump(), so a deadlock scenario still exists:

  ocfs2_wake_downconvert_thread
  ocfs2_rw_unlock
  ocfs2_dio_end_io
  dio_complete
  .....
  bio_endio
  req_bio_endio
  ....
  scsi_io_completion
  blk_done_softirq
  __do_softirq
  do_softirq
  irq_exit
  do_IRQ
  ocfs2_osb_dump
  cat /sys/kernel/debug/ocfs2/${uuid}/fs_state

This patch replaces the remaining spin_lock() with spin_lock_irqsave() to resolve the situation. Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
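The pattern the fix applies in ocfs2_osb_dump(); the lock name is taken from the surrounding ocfs2 code:

  unsigned long flags;

  /* block interrupt delivery on this CPU while the lock is held, so
   * the downconvert completion path cannot interrupt us and try to
   * take the same lock */
  spin_lock_irqsave(&osb->dc_task_lock, flags);
  /* ... dump downconvert thread state ... */
  spin_unlock_irqrestore(&osb->dc_task_lock, flags);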
-
jiangyiwen authored
There are actually no hardware or software interrupts in the contexts that use o2hb_live_lock, so we don't need to worry about race conditions caused by irq/softirq while the spinlock is held. Turning off irqs is not good for system performance, after all. Just replace the calls with the non-interrupt-safe variants. Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Sudip Mukherjee authored
blackfin allmodconfig builds fail with the error:

  ../sound/core/pcm_native.c: In function 'snd_pcm_lib_default_mmap':
  ../sound/core/pcm_native.c:3386:24: error: implicit declaration of function 'pgprot_writecombine' [-Werror=implicit-function-declaration]
     area->vm_page_prot = pgprot_writecombine(area->vm_page_prot);
                          ^
  ../sound/core/pcm_native.c:3386:22: error: incompatible types when assigning to type 'pgprot_t {aka struct <anonymous>}' from type 'int'
     area->vm_page_prot = pgprot_writecombine(area->vm_page_prot);
                        ^

When !MMU, asm-generic will not define a default pgprot_writecombine, so blackfin needs to define it itself. The idea for the patch comes from commit 65b9ab88 ("arch/c6x/include/asm/pgtable.h: define dummy pgprot_writecombine for !MMU"). Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org> Cc: Steven Miao <realmz6@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
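A sketch of the kind of dummy definition the referenced c6x commit adds; whether the blackfin patch uses exactly this fallback is an assumption:

  /* !MMU: there are no page attributes to change, so fall back */
  #define pgprot_writecombine pgprot_noncached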
-
Sudip Mukherjee authored
While building we get the warnings:

  arch/m32r/mm/init.c:63:17: warning: unused variable 'low'
  arch/m32r/mm/init.c:62:17: warning: unused variable 'max_dma'

max_dma and low are only used if CONFIG_MMU is defined. Let's declare the variables inside the #ifdef. Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
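The shape of the fix, as a sketch:

  #ifdef CONFIG_MMU
      unsigned long low, max_dma;   /* only the MMU path uses these */
  #endif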
-
Peter Zijlstra authored
$ make tags
  GEN tags
  ctags: Warning: drivers/acpi/processor_idle.c:64: null expansion of name pattern "\1"
  ctags: Warning: drivers/xen/events/events_2l.c:41: null expansion of name pattern "\1"
  ctags: Warning: kernel/locking/lockdep.c:151: null expansion of name pattern "\1"
  ctags: Warning: kernel/rcu/rcutorture.c:133: null expansion of name pattern "\1"
  ctags: Warning: kernel/rcu/rcutorture.c:135: null expansion of name pattern "\1"
  ctags: Warning: kernel/workqueue.c:323: null expansion of name pattern "\1"
  ctags: Warning: net/ipv4/syncookies.c:53: null expansion of name pattern "\1"
  ctags: Warning: net/ipv6/syncookies.c:44: null expansion of name pattern "\1"
  ctags: Warning: net/rds/page.c:45: null expansion of name pattern "\1"

These are all the result of the DEFINE_PER_CPU pattern:

  scripts/tags.sh:200: '/\<DEFINE_PER_CPU([^,]*, *\([[:alnum:]_]*\)/\1/v/'
  scripts/tags.sh:201: '/\<DEFINE_PER_CPU_SHARED_ALIGNED([^,]*, *\([[:alnum:]_]*\)/\1/v/'

The below cures them. All except the workqueue one are within reasonable distance of the 80 char limit. TJ, do you have any preference on how to fix the wq one, or shall we just not care that it's too long? Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Tejun Heo <tj@kernel.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Geliang Tang authored
Use list_for_each_entry() instead of list_for_each() to simplify the code. Signed-off-by: Geliang Tang <geliangtang@163.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
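For illustration, the transformation this kind of cleanup applies; the struct, list head and member names here are hypothetical:

  struct foo {
      int val;
      struct list_head list;
  };

  /* before: manual list_entry() on every iteration */
  struct list_head *pos;
  list_for_each(pos, &foo_head) {
      struct foo *f = list_entry(pos, struct foo, list);
      use(f->val);
  }

  /* after: the _entry variant does the container lookup itself */
  struct foo *f;
  list_for_each_entry(f, &foo_head, list) {
      use(f->val);
  }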
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull NOHZ updates from Ingo Molnar: "NOHZ enhancements, by Frederic Weisbecker, which reorganizes/refactors the NOHZ 'can the tick be stopped?' infrastructure and related code to be data driven, and harmonizes the naming and handling of all the various properties" [ This makes the ugly "fetch_or()" macro that the scheduler used internally a new generic helper, and does a bad job at it. I'm pulling it, but I've asked Ingo and Frederic to get this fixed up ]

* 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched-clock: Migrate to use new tick dependency mask model
  posix-cpu-timers: Migrate to use new tick dependency mask model
  sched: Migrate sched to use new tick dependency mask model
  sched: Account rr tasks
  perf: Migrate perf to use new tick dependency mask model
  nohz: Use enum code for tick stop failure tracing message
  nohz: New tick dependency mask
  nohz: Implement wide kick on top of irq work
  atomic: Export fetch_or()
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull scheduler updates from Ingo Molnar: "The main changes in this cycle are:

 - Make schedstats a runtime tunable (disabled by default) and optimize it via static keys. As most distributions enable CONFIG_SCHEDSTATS=y due to its instrumentation value, this is a nice performance enhancement. (Mel Gorman)

 - Implement 'simple waitqueues' (swait): these are just pure waitqueues without any of the more complex features of full-blown waitqueues (callbacks, wake flags, wake keys, etc.). Simple waitqueues have less memory overhead and are faster. Use simple waitqueues in the RCU code (in 4 different places) and for handling KVM vCPU wakeups. (Peter Zijlstra, Daniel Wagner, Thomas Gleixner, Paul Gortmaker, Marcelo Tosatti)

 - sched/numa enhancements (Rik van Riel)

 - NOHZ performance enhancements (Rik van Riel)

 - Various sched/deadline enhancements (Steven Rostedt)

 - Various fixes (Peter Zijlstra)

 - ... and a number of other fixes, cleanups and smaller enhancements"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (29 commits)
  sched/cputime: Fix steal_account_process_tick() to always return jiffies
  sched/deadline: Remove dl_new from struct sched_dl_entity
  Revert "kbuild: Add option to turn incompatible pointer check into error"
  sched/deadline: Remove superfluous call to switched_to_dl()
  sched/debug: Fix preempt_disable_ip recording for preempt_disable()
  sched, time: Switch VIRT_CPU_ACCOUNTING_GEN to jiffy granularity
  time, acct: Drop irq save & restore from __acct_update_integrals()
  acct, time: Change indentation in __acct_update_integrals()
  sched, time: Remove non-power-of-two divides from __acct_update_integrals()
  sched/rt: Kick RT bandwidth timer immediately on start up
  sched/debug: Add deadline scheduler bandwidth ratio to /proc/sched_debug
  sched/debug: Move sched_domain_sysctl to debug.c
  sched/debug: Move the /sys/kernel/debug/sched_features file setup into debug.c
  sched/rt: Fix PI handling vs. sched_setscheduler()
  sched/core: Remove duplicated sched_group_set_shares() prototype
  sched/fair: Consolidate nohz CPU load update code
  sched/fair: Avoid using decay_load_missed() with a negative value
  sched/deadline: Always calculate end of period on sched_yield()
  sched/cgroup: Fix cgroup entity load tracking tear-down
  rcu: Use simple wait queues where possible in rcutree
  ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull RAS updates from Ingo Molnar: "Various RAS updates:

 - AMD MCE support updates for future CPUs, fixes and 'SMCA' (Scalable MCA) error decoding support (Aravind Gopalakrishnan)

 - x86 memcpy_mcsafe() support, to enable smart(er) hardware error recovery in NVDIMM drivers, based on an extension of the x86 exception handling code. (Tony Luck)"

* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  EDAC/sb_edac: Fix computation of channel address
  x86/mm, x86/mce: Add memcpy_mcsafe()
  x86/mce/AMD: Document some functionality
  x86/mce: Clarify comments regarding deferred error
  x86/mce/AMD: Fix logic to obtain block address
  x86/mce/AMD, EDAC: Enable error decoding of Scalable MCA errors
  x86/mce: Move MCx_CONFIG MSR definitions
  x86/mce: Check for faults tagged in EXTABLE_CLASS_FAULT exception table entries
  x86/mm: Expand the exception table logic to allow new handling options
  x86/mce/AMD: Set MCAX Enable bit
  x86/mce/AMD: Carve out threshold block preparation
  x86/mce/AMD: Fix LVT offset configuration for thresholding
  x86/mce/AMD: Reduce number of blocks scanned per bank
  x86/mce/AMD: Do not perform shared bank check for future processors
  x86/mce: Fix order of AMD MCE init function call
-