1. 25 Mar, 2015 16 commits
  2. 13 Mar, 2015 17 commits
    • Martin Schwidefsky's avatar
      s390/mm: limit STACK_RND_MASK for compat tasks · e143fa93
      Martin Schwidefsky authored
      For compat tasks the mmap randomization does not use the maximum
      randomization value from mmap_rnd_mask but the fixed value of 0x7ff.
      This needs to be respected in the definition of STACK_RND_MASK as
      well.
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      e143fa93
    • Heiko Carstens's avatar
    • Hendrik Brueckner's avatar
      s390/cpum_sf: add diagnostic sampling event only if it is authorized · 0a648150
      Hendrik Brueckner authored
      The SF_CYCLES_BASIC_DIAG is always registered even if it is turned of in the
      current hardware configuration.  Because diagnostic-sampling is typically not
      turned on in the hardware configuration, do not register this perf event by
      default.  Enable it only if the diagnostic-sampling function is authorized.
      Signed-off-by: default avatarHendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      0a648150
    • Linus Torvalds's avatar
      Merge branch 'akpm' (patches from Andrew) · c202baf0
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "13 fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        memcg: disable hierarchy support if bound to the legacy cgroup hierarchy
        mm: reorder can_do_mlock to fix audit denial
        kasan, module: move MODULE_ALIGN macro into <linux/moduleloader.h>
        kasan, module, vmalloc: rework shadow allocation for modules
        fanotify: fix event filtering with FAN_ONDIR set
        mm/nommu.c: export symbol max_mapnr
        arch/c6x/include/asm/pgtable.h: define dummy pgprot_writecombine for !MMU
        nilfs2: fix deadlock of segment constructor during recovery
        mm: cma: fix CMA aligned offset calculation
        mm, hugetlb: close race when setting PageTail for gigantic pages
        mm, oom: do not fail __GFP_NOFAIL allocation if oom killer is disabled
        drivers/rtc/rtc-s3c.c: add .needs_src_clk to s3c6410 RTC data
        ocfs2: make append_dio an incompat feature
      c202baf0
    • Vladimir Davydov's avatar
      memcg: disable hierarchy support if bound to the legacy cgroup hierarchy · 7feee590
      Vladimir Davydov authored
      If the memory cgroup controller is initially mounted in the scope of the
      default cgroup hierarchy and then remounted to a legacy hierarchy, it will
      still have hierarchy support enabled, which is incorrect.  We should
      disable hierarchy support if bound to the legacy cgroup hierarchy.
      Signed-off-by: default avatarVladimir Davydov <vdavydov@parallels.com>
      Signed-off-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: default avatarMichal Hocko <mhocko@suse.cz>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7feee590
    • Jeff Vander Stoep's avatar
      mm: reorder can_do_mlock to fix audit denial · a5a6579d
      Jeff Vander Stoep authored
      A userspace call to mmap(MAP_LOCKED) may result in the successful locking
      of memory while also producing a confusing audit log denial.  can_do_mlock
      checks capable and rlimit.  If either of these return positive
      can_do_mlock returns true.  The capable check leads to an LSM hook used by
      apparmour and selinux which produce the audit denial.  Reordering so
      rlimit is checked first eliminates the denial on success, only recording a
      denial when the lock is unsuccessful as a result of the denial.
      Signed-off-by: default avatarJeff Vander Stoep <jeffv@google.com>
      Acked-by: default avatarNick Kralevich <nnk@google.com>
      Cc: Jeff Vander Stoep <jeffv@google.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Paul Cassella <cassella@cray.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a5a6579d
    • Andrey Ryabinin's avatar
      kasan, module: move MODULE_ALIGN macro into <linux/moduleloader.h> · d3733e5c
      Andrey Ryabinin authored
      include/linux/moduleloader.h is more suitable place for this macro.
      Also change alignment to PAGE_SIZE for CONFIG_KASAN=n as such
      alignment already assumed in several places.
      Signed-off-by: default avatarAndrey Ryabinin <a.ryabinin@samsung.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Acked-by: default avatarRusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d3733e5c
    • Andrey Ryabinin's avatar
      kasan, module, vmalloc: rework shadow allocation for modules · a5af5aa8
      Andrey Ryabinin authored
      Current approach in handling shadow memory for modules is broken.
      
      Shadow memory could be freed only after memory shadow corresponds it is no
      longer used.  vfree() called from interrupt context could use memory its
      freeing to store 'struct llist_node' in it:
      
          void vfree(const void *addr)
          {
          ...
              if (unlikely(in_interrupt())) {
                  struct vfree_deferred *p = this_cpu_ptr(&vfree_deferred);
                  if (llist_add((struct llist_node *)addr, &p->list))
                          schedule_work(&p->wq);
      
      Later this list node used in free_work() which actually frees memory.
      Currently module_memfree() called in interrupt context will free shadow
      before freeing module's memory which could provoke kernel crash.
      
      So shadow memory should be freed after module's memory.  However, such
      deallocation order could race with kasan_module_alloc() in module_alloc().
      
      Free shadow right before releasing vm area.  At this point vfree()'d
      memory is not used anymore and yet not available for other allocations.
      New VM_KASAN flag used to indicate that vm area has dynamically allocated
      shadow memory so kasan frees shadow only if it was previously allocated.
      Signed-off-by: default avatarAndrey Ryabinin <a.ryabinin@samsung.com>
      Acked-by: default avatarRusty Russell <rusty@rustcorp.com.au>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a5af5aa8
    • Suzuki K. Poulose's avatar
      fanotify: fix event filtering with FAN_ONDIR set · b3c1030d
      Suzuki K. Poulose authored
      With FAN_ONDIR set, the user can end up getting events, which it hasn't
      marked.  This was revealed with fanotify04 testcase failure on
      Linux-4.0-rc1, and is a regression from 3.19, revealed with 66ba93c0
      ("fanotify: don't set FAN_ONDIR implicitly on a marks ignored mask").
      
         # /opt/ltp/testcases/bin/fanotify04
         [ ... ]
        fanotify04    7  TPASS  :  event generated properly for type 100000
        fanotify04    8  TFAIL  :  fanotify04.c:147: got unexpected event 30
        fanotify04    9  TPASS  :  No event as expected
      
      The testcase sets the adds the following marks : FAN_OPEN | FAN_ONDIR for
      a fanotify on a dir.  Then does an open(), followed by close() of the
      directory and expects to see an event FAN_OPEN(0x20).  However, the
      fanotify returns (FAN_OPEN|FAN_CLOSE_NOWRITE(0x10)).  This happens due to
      the flaw in the check for event_mask in fanotify_should_send_event() which
      does:
      
      	if (event_mask & marks_mask & ~marks_ignored_mask)
      		return true;
      
      where, event_mask == (FAN_ONDIR | FAN_CLOSE_NOWRITE),
             marks_mask == (FAN_ONDIR | FAN_OPEN),
             marks_ignored_mask == 0
      
      Fix this by masking the outgoing events to the user, as we already take
      care of FAN_ONDIR and FAN_EVENT_ON_CHILD.
      Signed-off-by: default avatarSuzuki K. Poulose <suzuki.poulose@arm.com>
      Tested-by: default avatarLino Sanfilippo <LinoSanfilippo@gmx.de>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b3c1030d
    • gchen gchen's avatar
      mm/nommu.c: export symbol max_mapnr · 5b8bf307
      gchen gchen authored
      Several modules may need max_mapnr, so export, the related error with
      allmodconfig under c6x:
      
        MODPOST 3327 modules
        ERROR: "max_mapnr" [fs/pstore/ramoops.ko] undefined!
        ERROR: "max_mapnr" [drivers/media/v4l2-core/videobuf2-dma-contig.ko] undefined!
      Signed-off-by: default avatarChen Gang <gang.chen.5i5j@gmail.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      5b8bf307
    • Chen Gang's avatar
      arch/c6x/include/asm/pgtable.h: define dummy pgprot_writecombine for !MMU · 65b9ab88
      Chen Gang authored
      When !MMU, asm-generic will not define default pgprot_writecombine, so c6x
      needs to define it by itself.  The related error:
      
          CC [M]  fs/pstore/ram_core.o
        fs/pstore/ram_core.c: In function 'persistent_ram_vmap':
        fs/pstore/ram_core.c:399:10: error: implicit declaration of function 'pgprot_writecombine' [-Werror=implicit-function-declaration]
           prot = pgprot_writecombine(PAGE_KERNEL);
                  ^
        fs/pstore/ram_core.c:399:8: error: incompatible types when assigning to type 'pgprot_t {aka struct <anonymous>}' from type 'int'
           prot = pgprot_writecombine(PAGE_KERNEL);
                ^
      Signed-off-by: default avatarChen Gang <gang.chen.5i5j@gmail.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      65b9ab88
    • Ryusuke Konishi's avatar
      nilfs2: fix deadlock of segment constructor during recovery · 283ee148
      Ryusuke Konishi authored
      According to a report from Yuxuan Shui, nilfs2 in kernel 3.19 got stuck
      during recovery at mount time.  The code path that caused the deadlock was
      as follows:
      
        nilfs_fill_super()
          load_nilfs()
            nilfs_salvage_orphan_logs()
              * Do roll-forwarding, attach segment constructor for recovery,
                and kick it.
      
              nilfs_segctor_thread()
                nilfs_segctor_thread_construct()
                 * A lock is held with nilfs_transaction_lock()
                   nilfs_segctor_do_construct()
                     nilfs_segctor_drop_written_files()
                       iput()
                         iput_final()
                           write_inode_now()
                             writeback_single_inode()
                               __writeback_single_inode()
                                 do_writepages()
                                   nilfs_writepage()
                                     nilfs_construct_dsync_segment()
                                       nilfs_transaction_lock() --> deadlock
      
      This can happen if commit 7ef3ff2f ("nilfs2: fix deadlock of segment
      constructor over I_SYNC flag") is applied and roll-forward recovery was
      performed at mount time.  The roll-forward recovery can happen if datasync
      write is done and the file system crashes immediately after that.  For
      instance, we can reproduce the issue with the following steps:
      
       < nilfs2 is mounted on /nilfs (device: /dev/sdb1) >
       # dd if=/dev/zero of=/nilfs/test bs=4k count=1 && sync
       # dd if=/dev/zero of=/nilfs/test conv=notrunc oflag=dsync bs=4k
       count=1 && reboot -nfh
       < the system will immediately reboot >
       # mount -t nilfs2 /dev/sdb1 /nilfs
      
      The deadlock occurs because iput() can run segment constructor through
      writeback_single_inode() if MS_ACTIVE flag is not set on sb->s_flags.  The
      above commit changed segment constructor so that it calls iput()
      asynchronously for inodes with i_nlink == 0, but that change was
      imperfect.
      
      This fixes the another deadlock by deferring iput() in segment constructor
      even for the case that mount is not finished, that is, for the case that
      MS_ACTIVE flag is not set.
      Signed-off-by: default avatarRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Reported-by: default avatarYuxuan Shui <yshuiv7@gmail.com>
      Tested-by: default avatarRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      283ee148
    • Danesh Petigara's avatar
      mm: cma: fix CMA aligned offset calculation · 850fc430
      Danesh Petigara authored
      The CMA aligned offset calculation is incorrect for non-zero order_per_bit
      values.
      
      For example, if cma->order_per_bit=1, cma->base_pfn= 0x2f800000 and
      align_order=12, the function returns a value of 0x17c00 instead of 0x400.
      
      This patch fixes the CMA aligned offset calculation.
      
      The previous calculation was wrong and would return too-large values for
      the offset, so that when cma_alloc looks for free pages in the bitmap with
      the requested alignment > order_per_bit, it starts too far into the bitmap
      and so CMA allocations will fail despite there actually being plenty of
      free pages remaining.  It will also probably have the wrong alignment.
      With this change, we will get the correct offset into the bitmap.
      
      One affected user is powerpc KVM, which has kvm_cma->order_per_bit set to
      KVM_CMA_CHUNK_ORDER - PAGE_SHIFT, or 18 - 12 = 6.
      
      [gregory.0xf0@gmail.com: changelog additions]
      Signed-off-by: default avatarDanesh Petigara <dpetigara@broadcom.com>
      Reviewed-by: default avatarGregory Fong <gregory.0xf0@gmail.com>
      Acked-by: default avatarMichal Nazarewicz <mina86@mina86.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      850fc430
    • David Rientjes's avatar
      mm, hugetlb: close race when setting PageTail for gigantic pages · 44fc8057
      David Rientjes authored
      Now that gigantic pages are dynamically allocatable, care must be taken to
      ensure that p->first_page is valid before setting PageTail.
      
      If this isn't done, then it is possible to race and have compound_head()
      return NULL.
      Signed-off-by: default avatarDavid Rientjes <rientjes@google.com>
      Acked-by: default avatarDavidlohr Bueso <dave@stgolabs.net>
      Cc: Luiz Capitulino <lcapitulino@redhat.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: default avatarHillf Danton <hillf.zj@alibaba-inc.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      44fc8057
    • Michal Hocko's avatar
      mm, oom: do not fail __GFP_NOFAIL allocation if oom killer is disabled · e009d5dc
      Michal Hocko authored
      Tetsuo Handa has pointed out that __GFP_NOFAIL allocations might fail
      after OOM killer is disabled if the allocation is performed by a kernel
      thread.  This behavior was introduced from the very beginning by
      7f33d49a ("mm, PM/Freezer: Disable OOM killer when tasks are frozen").
       This means that the basic contract for the allocation request is broken
      and the context requesting such an allocation might blow up unexpectedly.
      
      There are basically two ways forward.
      
      1) move oom_killer_disable after kernel threads are frozen.  This has a
         risk that the OOM victim wouldn't be able to finish because it would
         depend on an already frozen kernel thread.  This would be really tricky
         to debug.
      
      2) do not fail GFP_NOFAIL allocation no matter what and risk a
         potential Freezable kernel threads will loop and fail the suspend.
         Incidental allocations after kernel threads are frozen will at least
         dump a warning - if we are lucky and the serial console is still active
         of course...
      
      This patch implements the later option because it is safer.  We would see
      warning rather than allocation failures for the kernel threads which would
      blow up otherwise and have a higher chances to identify __GFP_NOFAIL users
      from deeper pm code.
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.cz>
      Acked-by: default avatarDavid Rientjes <rientjes@gooogle.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      e009d5dc
    • Javier Martinez Canillas's avatar
      drivers/rtc/rtc-s3c.c: add .needs_src_clk to s3c6410 RTC data · 8792f777
      Javier Martinez Canillas authored
      Commit df9e26d0 ("rtc: s3c: add support for RTC of Exynos3250 SoC")
      added an "rtc_src" DT property to specify the clock used as a source to
      the S3C real-time clock.
      
      Not all SoCs needs this so commit eaf3a659 ("drivers/rtc/rtc-s3c.c:
      fix initialization failure without rtc source clock") changed to check
      the struct s3c_rtc_data .needs_src_clk to conditionally grab the clock.
      
      But that commit didn't update the data for each IP version so the RTC
      broke on the boards that needs a source clock. This is the case of at
      least Exynos5250 and Exynos5440 which uses the s3c6410 RTC IP block.
      
      This commit fixes the S3C rtc on the Exynos5250 Snow and Exynos5420
      Peach Pit and Pi Chromebooks.
      Signed-off-by: default avatarJavier Martinez Canillas <javier.martinez@collabora.co.uk>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Chanwoo Choi <cw00.choi@samsung.com>
      Cc: Doug Anderson <dianders@chromium.org>
      Cc: Olof Johansson <olof@lixom.net>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Tyler Baker <tyler.baker@linaro.org>
      Cc: Alessandro Zummo <a.zummo@towertech.it>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      8792f777
    • Mark Fasheh's avatar
      ocfs2: make append_dio an incompat feature · 18d585f0
      Mark Fasheh authored
      It turns out that making this feature ro_compat isn't quite enough to
      prevent accidental corruption on mount from older kernels.  Ocfs2 (like
      other file systems) will process orphaned inodes even when the user mounts
      in 'ro' mode.  So for the case of a filesystem not knowing the append_dio
      feature, mounting the filesystem could result in orphaned-for-dio files
      being deleted, which we clearly don't want.
      
      So instead, turn this into an incompat flag.
      
      Btw, this is kind of my fault - initially I asked that we add a flag to
      cover the feature and even suggested that we use an ro flag.  It wasn't
      until I was looking through our commits for v4.0-rc1 that I realized we
      actually want this to be incompat.
      Signed-off-by: default avatarMark Fasheh <mfasheh@suse.de>
      Cc: Joseph Qi <joseph.qi@huawei.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      18d585f0
  3. 12 Mar, 2015 7 commits
    • Mel Gorman's avatar
      mm: thp: Return the correct value for change_huge_pmd · ba68bc01
      Mel Gorman authored
      The wrong value is being returned by change_huge_pmd since commit
      10c1045f ("mm: numa: avoid unnecessary TLB flushes when setting
      NUMA hinting entries") which allows a fallthrough that tries to adjust
      non-existent PTEs. This patch corrects it.
      Signed-off-by: default avatarMel Gorman <mgorman@suse.de>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ba68bc01
    • Linus Torvalds's avatar
      Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · 09d35919
      Linus Torvalds authored
      Pull i2c fix from Wolfram Sang:
       "An important bugfix for the I2C subsystem core"
      
      * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        Revert "i2c: core: Dispose OF IRQ mapping at client removal time"
      09d35919
    • Linus Torvalds's avatar
      Merge tag 'pci-v4.0-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · 91e9134e
      Linus Torvalds authored
      Pull PCI fixes from Bjorn Helgaas:
       "Here are a couple updates for v4.0.
      
        One fixes a config accessor problem on APM X-Gene that we introduced
        when switching to generic config accessors, and the other fixes an
        older read-past-end-of-buffer problem in sysfs.
      
        APM X-Gene host bridge driver
          - Add register offset to config space base address (Feng Kan)
      
        Miscellaneous
          - Don't read past the end of sysfs "driver_override" buffer (Sasha Levin)"
      
      * tag 'pci-v4.0-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
        PCI: xgene: Add register offset to config space base address
        PCI: Don't read past the end of sysfs "driver_override" buffer
      91e9134e
    • Linus Torvalds's avatar
      Merge tag 'microblaze-4.0-rc4' of git://git.monstr.eu/linux-2.6-microblaze · d3dd73fc
      Linus Torvalds authored
      Pull arch/microblaze fixes from Michal Simek:
       "Fix syscall error recovery.
      
        Two patches - one is just preparation patch for the second which is
        fixing the problem with syscalls"
      
      * tag 'microblaze-4.0-rc4' of git://git.monstr.eu/linux-2.6-microblaze:
        microblaze: Fix syscall error recovery for invalid syscall IDs
        microblaze: Coding style cleanup
      d3dd73fc
    • Linus Torvalds's avatar
      Merge tag 'nios2-fix-4.0-rc4' of git://git.rocketboards.org/linux-socfpga-next · 56275112
      Linus Torvalds authored
      Pull arch/nios2 fix from Ley Foon Tan:
       "Remove pt_regs from user header and use generic ucontext.h"
      
      * tag 'nios2-fix-4.0-rc4' of git://git.rocketboards.org/linux-socfpga-next:
        nios2: update pt_regs
      56275112
    • Linus Torvalds's avatar
      mm: fix up numa read-only thread grouping logic · 53da3bc2
      Linus Torvalds authored
      Dave Chinner reported that commit 4d942466 ("mm: convert
      p[te|md]_mknonnuma and remaining page table manipulations") slowed down
      his xfsrepair test enormously.  In particular, it was using more system
      time due to extra TLB flushing.
      
      The ultimate reason turns out to be how the change to use the regular
      page table accessor functions broke the NUMA grouping logic.  The old
      special mknuma/mknonnuma code accessed the page table present bit and
      the magic NUMA bit directly, while the new code just changes the page
      protections using PROT_NONE and the regular vma protections.
      
      That sounds equivalent, and from a fault standpoint it really is, but a
      subtle side effect is that the *other* protection bits of the page table
      entries also change.  And the code to decide how to group the NUMA
      entries together used the writable bit to decide whether a particular
      page was likely to be shared read-only or not.
      
      And with the change to make the NUMA handling use the regular permission
      setting functions, that writable bit was basically always cleared for
      private mappings due to COW.  So even if the page actually ends up being
      written to in the end, the NUMA balancing would act as if it was always
      shared RO.
      
      This code is a heuristic anyway, so the fix - at least for now - is to
      instead check whether the page is dirty rather than writable.  The bit
      doesn't change with protection changes.
      
      NOTE! This also adds a FIXME comment to revisit this issue,
      
      Not only should we probably re-visit the whole "is this a shared
      read-only page" heuristic (we might want to take the vma permissions
      into account and base this more on those than the per-page ones, and
      also look at whether the particular access that triggers it is a write
      or not), but the whole COW issue shows that we should think about the
      NUMA fault handling some more.
      
      For example, maybe we should do the early-COW thing that a regular fault
      does.  Or maybe we should accept that while using the same bits as
      PROTNONE was a good thing (and got rid of the specual NUMA bit), we
      might still want to just preseve the other protection bits across NUMA
      faulting.
      
      Those are bigger questions, left for later.  This just fixes up the
      heuristic so that it at least approximates working again.  More analysis
      and work needed.
      Reported-by: default avatarDave Chinner <david@fromorbit.com>
      Tested-by: default avatarMel Gorman <mgorman@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>,
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      53da3bc2
    • Jakub Kicinski's avatar
      Revert "i2c: core: Dispose OF IRQ mapping at client removal time" · a4944572
      Jakub Kicinski authored
      This reverts commit e4df3a0b
      ("i2c: core: Dispose OF IRQ mapping at client removal time")
      
      Calling irq_dispose_mapping() will destroy the mapping and disassociate
      the IRQ from the IRQ chip to which it belongs. Keeping it is OK, because
      existent mappings are reused properly.
      
      Also, this commit breaks drivers using devm* for IRQ management on
      OF-based systems because devm* cleanup happens in device code, after
      bus's remove() method returns.
      Signed-off-by: default avatarJakub Kicinski <kubakici@wp.pl>
      Reported-by: default avatarSébastien Szymanski <sebastien.szymanski@armadeus.com>
      Acked-by: default avatarLaurent Pinchart <laurent.pinchart@ideasonboard.com>
      Acked-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
      [wsa: updated the commit message with findings fromt the other bug report]
      Signed-off-by: default avatarWolfram Sang <wsa@the-dreams.de>
      Cc: stable@kernel.org
      Fixes: e4df3a0b
      a4944572