1. 09 Jun, 2015 28 commits
  2. 23 May, 2015 12 commits
    • Revert "dm crypt: fix deadlock when async crypto algorithm returns -EBUSY" · 90cdd181
      Rabin Vincent authored
      [ Upstream commit c0403ec0 ]
      
      This reverts Linux 4.1-rc1 commit 0618764c.
      
      The problem which that commit attempts to fix actually lies in the
      Freescale CAAM crypto driver, not dm-crypt.
      
      dm-crypt uses CRYPTO_TFM_REQ_MAY_BACKLOG.  This means the crypto
      driver should internally backlog requests which arrive when the queue is
      full and process them later.  Until the crypto hw's queue becomes full,
      the driver returns -EINPROGRESS.  When the crypto hw's queue is full,
      the driver returns -EBUSY, and if CRYPTO_TFM_REQ_MAY_BACKLOG is set, is
      expected to backlog the request and process it when the hardware has
      queue space.  At the point when the driver takes the request from the
      backlog and starts processing it, it calls the completion function with
      a status of -EINPROGRESS.  The completion function is called (for a
      second time, in the case of backlogged requests) with a status/err of 0
      when a request is done.
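
The return codes and completion calls described above can be modelled in user space. The following is a minimal sketch, not kernel code: every `toy_*` name, the queue depth and the array-based queues are assumptions made for illustration, and only the status values (-EINPROGRESS, -EBUSY, 0) follow the description in this message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_HW_DEPTH    2   /* hypothetical hardware queue depth */
#define TOY_BACKLOG_MAX 8   /* hypothetical backlog capacity */

struct toy_req {
    void (*complete)(struct toy_req *req, int err);
    int may_backlog;        /* stands in for CRYPTO_TFM_REQ_MAY_BACKLOG */
    int last_err;           /* last status passed to the completion */
    int calls;              /* how many times the completion ran */
};

struct toy_dev {
    struct toy_req *hw[TOY_HW_DEPTH];
    size_t hw_len;
    struct toy_req *backlog[TOY_BACKLOG_MAX];
    size_t bl_len;
};

/* Completion callback that only records what it was told. */
static void toy_complete(struct toy_req *req, int err)
{
    req->last_err = err;
    req->calls++;
}

/* -EINPROGRESS while the hardware queue has room; -EBUSY once it is full. */
static int toy_submit(struct toy_dev *d, struct toy_req *req)
{
    if (d->hw_len < TOY_HW_DEPTH) {
        d->hw[d->hw_len++] = req;
        return -EINPROGRESS;
    }
    if (!req->may_backlog)
        return -EBUSY;                  /* rejected, caller must retry */
    assert(d->bl_len < TOY_BACKLOG_MAX);
    d->backlog[d->bl_len++] = req;      /* kept for later processing */
    return -EBUSY;
}

/* The hardware finished its oldest request. */
static void toy_hw_done(struct toy_dev *d)
{
    struct toy_req *done;
    size_t i;

    if (d->hw_len == 0)
        return;
    done = d->hw[0];
    for (i = 1; i < d->hw_len; i++)
        d->hw[i - 1] = d->hw[i];
    d->hw_len--;

    if (d->bl_len > 0) {
        struct toy_req *next = d->backlog[0];

        for (i = 1; i < d->bl_len; i++)
            d->backlog[i - 1] = d->backlog[i];
        d->bl_len--;
        d->hw[d->hw_len++] = next;
        next->complete(next, -EINPROGRESS);  /* request left the backlog */
    }
    done->complete(done, 0);                 /* request is finished */
}
```

In this model a backlogged request sees its completion twice, first with -EINPROGRESS and then with 0, while a request that fit into the hardware queue sees it once. A caller that waits after -EBUSY therefore depends on the driver actually keeping the request in a backlog.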
      
      Crypto drivers for hardware without hardware queueing use the
      crypto_init_queue(), crypto_enqueue_request(), crypto_dequeue_request()
      and crypto_get_backlog() helpers to implement this behaviour correctly,
      while others implement this behaviour without these helpers (ccp, for
      example).
      
      dm-crypt (before the patch that needs reverting) uses this API
      correctly.  It queues up as many requests as the hw queues will allow
      (i.e. as long as it gets back -EINPROGRESS from the request function).
      Then, when it sees at least one backlogged request (gets -EBUSY), it
      waits till that backlogged request is handled (completion gets called
      with -EINPROGRESS), and then continues.  The references to
      af_alg_wait_for_completion() and af_alg_complete() in that commit's
      commit message are irrelevant because those functions only handle one
      request at a time, unlike dm-crypt.
      
      The problem is that the Freescale CAAM driver, which that commit
      describes as having been tested with, fails to implement the
      backlogging behaviour correctly.  In caam_jr_enqueue(), if the hardware
      queue is full, it simply returns -EBUSY without backlogging the request.
      The observed deadlock is not described in the commit message, but it is
      obviously the wait_for_completion() in crypt_convert(), where dm-crypt
      waits for the completion to be called with -EINPROGRESS in the case of
      backlogged requests.  This completion will never be completed due to
      the bug in the CAAM driver.
      
      Commit 0618764c incorrectly made dm-crypt wait for every request,
      even when the driver/hardware queues are not full, which means that
      dm-crypt will never see -EBUSY.  This means that that commit will cause
      a performance regression on all crypto drivers which implement the API
      correctly.
      
      Revert it.  Correct backlog handling should be implemented in the CAAM
      driver instead.
      
      Cc'ing stable purely because commit 0618764c did.  If for some reason
      a stable@ kernel did pick up commit 0618764c it should get reverted.
      Signed-off-by: Rabin Vincent <rabin.vincent@axis.com>
      Reviewed-by: Horia Geanta <horia.geanta@freescale.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • xen-pciback: Add name prefix to global 'permissive' variable · 8d102e1d
      Ben Hutchings authored
      [ Upstream commit 8014bcc8 ]
      
      The variable for the 'permissive' module parameter used to be static
      but was recently changed to be extern.  This puts it in the kernel
      global namespace if the driver is built-in, so its name should begin
      with a prefix identifying the driver.
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Fixes: af6fc858 ("xen-pciback: limit guest control of command register")
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • xen/events: Set irq_info->evtchn before binding the channel to CPU in __startup_pirq() · 3b2ec381
      Boris Ostrovsky authored
      [ Upstream commit 16e6bd59 ]
      
      .. because bind_evtchn_to_cpu(evtchn, cpu) will map evtchn to
      'info' and pass 'info' down to xen_evtchn_port_bind_to_cpu().
      Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Tested-by: Annie Li <annie.li@oracle.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • xen/console: Update console event channel on resume · 0120beac
      Boris Ostrovsky authored
      [ Upstream commit b9d934f2 ]
      
      After a resume the hypervisor/tools may change the console event
      channel number. We should re-query it.
      Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • xen/xenbus: Update xenbus event channel on resume · df39fed6
      Boris Ostrovsky authored
      [ Upstream commit 16f1cf3b ]
      
      After a resume the hypervisor/tools may change the xenbus event
      channel number. We should re-query it.
      Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • xen/events: Clear cpu_evtchn_mask before resuming · 5ea5c920
      Boris Ostrovsky authored
      [ Upstream commit 5cec9883 ]
      
      When a guest is resumed, the hypervisor may change event channel
      assignments. If this happens and the guest uses 2-level events, it
      is possible for the interrupt to be claimed by the wrong VCPU since
      cpu_evtchn_mask bits may be stale. This can happen even though
      evtchn_2l_bind_to_cpu() attempts to clear old bits: irq_info that
      is passed in is not necessarily the original one (from pre-migration
      times) but instead is freshly allocated during resume and so any
      information about which CPU the channel was bound to is lost.
      
      Thus we should clear the mask during resume.
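
The stale-bit scenario can be reproduced with a toy per-CPU bitmask. This is only a sketch under stated assumptions: the `toy_*` names, the two-CPU layout and the single `unsigned long` mask per CPU are invented for illustration and are not the 2-level event channel implementation.

```c
#include <assert.h>
#include <string.h>

#define TOY_NR_CPUS 2                   /* hypothetical CPU count */

static unsigned long toy_cpu_evtchn_mask[TOY_NR_CPUS];

struct toy_irq_info {
    unsigned int evtchn;
    unsigned int cpu;                   /* CPU the channel is believed bound to */
};

/* Clear the bit on the CPU recorded in info, then set it on the new CPU. */
static void toy_bind_to_cpu(struct toy_irq_info *info, unsigned int cpu)
{
    assert(info->cpu < TOY_NR_CPUS && cpu < TOY_NR_CPUS);
    toy_cpu_evtchn_mask[info->cpu] &= ~(1UL << info->evtchn);
    toy_cpu_evtchn_mask[cpu] |= 1UL << info->evtchn;
    info->cpu = cpu;
}

/* What the message proposes: wipe all masks when resuming. */
static void toy_resume_clear_masks(void)
{
    memset(toy_cpu_evtchn_mask, 0, sizeof(toy_cpu_evtchn_mask));
}

/* Number of CPUs that would claim an interrupt on this channel. */
static int toy_claimants(unsigned int evtchn)
{
    int n = 0;
    unsigned int cpu;

    for (cpu = 0; cpu < TOY_NR_CPUS; cpu++)
        if (toy_cpu_evtchn_mask[cpu] & (1UL << evtchn))
            n++;
    return n;
}
```

If a channel was bound to CPU 1 before the suspend and a freshly allocated info (with `cpu` reset to 0) is bound to CPU 0 afterwards, the clear step targets CPU 0 and the bit on CPU 1 survives, so two CPUs claim the channel. Clearing the masks first leaves exactly one claimant.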
      
      We also need to make sure that bits for xenstore and console channels
      are set when these two subsystems are resumed. While rebind_evtchn_irq()
      (which is invoked for both of them on a resume) calls irq_set_affinity(),
      the latter will in fact postpone setting affinity until handling the
      interrupt. But because cpu_evtchn_mask will have bits for these two
      cleared we won't be able to take the interrupt.
      
      With that in mind, we need to bind those two channels explicitly in
      rebind_evtchn_irq(). We will keep irq_set_affinity() so that we have a
      pass through generic irq affinity code later, in case something needs
      to be updated there as well.
      
      (Also replace cpumask_of(0) with cpumask_of(info->cpu) in
      rebind_evtchn_irq(): it should be set to zero in preceding
      xen_irq_info_evtchn_setup().)
      Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Reported-by: Annie Li <annie.li@oracle.com>
      Cc: <stable@vger.kernel.org> # 3.14+
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • mm: soft-offline: fix num_poisoned_pages counting on concurrent events · 881241e8
      Naoya Horiguchi authored
      [ Upstream commit 602498f9 ]
      
      If multiple soft offline events hit one free page/hugepage concurrently,
      soft_offline_page() can handle the free page/hugepage multiple times,
      which increments the num_poisoned_pages counter more than once.  This
      patch fixes the miscount by checking TestSetPageHWPoison for normal
      pages and by checking the return value of dequeue_hwpoisoned_huge_page()
      for hugepages.
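
The test-and-set idea can be sketched with C11 atomics. This is an illustration only: `toy_page`, `toy_num_poisoned` and `toy_soft_offline()` are hypothetical names, and `atomic_flag` merely stands in for the page's HWPoison bit.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_page {
    atomic_flag poisoned;               /* stands in for the HWPoison page flag */
};

static atomic_long toy_num_poisoned;    /* zero-initialised counter */

/*
 * Mark the page and bump the counter, but only for the caller that
 * actually flipped the flag. Returns true if this call did the work.
 */
static bool toy_soft_offline(struct toy_page *p)
{
    assert(p != NULL);
    if (atomic_flag_test_and_set(&p->poisoned))
        return false;                   /* flag was already set by someone else */
    atomic_fetch_add(&toy_num_poisoned, 1);
    return true;
}
```

Because the test and the set happen as one atomic step, two concurrent callers cannot both observe the flag as clear, so the counter moves once per page regardless of how many events hit it.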
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: Dean Nelson <dnelson@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: <stable@vger.kernel.org>	[3.14+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • writeback: use |1 instead of +1 to protect against div by zero · 4e258a15
      Tejun Heo authored
      [ Upstream commit 464d1387 ]
      
      mm/page-writeback.c has several places where 1 is added to the divisor
      to prevent division by zero exceptions; however, if the original
      divisor is equivalent to -1, adding 1 leads to division by zero.
      
      There are three places where +1 is used for this purpose - one in
      pos_ratio_polynom() and two in bdi_position_ratio().  The second one
      in bdi_position_ratio() actually triggered div-by-zero oops on a
      machine running a 3.10 kernel.  The divisor is
      
        x_intercept - bdi_setpoint + 1 == span + 1
      
      span is confirmed to be (u32)-1.  It isn't clear how it ended up that
      way, but it could be from write bandwidth calculation underflow fixed by
      c72efb65 ("writeback: fix possible underflow in write bandwidth
      calculation").
      
      At any rate, +1 isn't a proper protection against div-by-zero.  This
      patch converts all +1 protections to |1.  Note that
      bdi_update_dirty_ratelimit() was already using |1 before this patch.
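
The difference between the two guards is easy to check in user space. The helpers below are a sketch with hypothetical names (`guard_plus_one`, `guard_or_one`, `safe_div`); none of them appear in mm/page-writeback.c, and the only assumption taken from this message is a 32-bit unsigned divisor.

```c
#include <assert.h>
#include <stdint.h>

/* "+1" guard: wraps around to 0 when span is UINT32_MAX, i.e. (u32)-1. */
static uint32_t guard_plus_one(uint32_t span)
{
    return span + 1;
}

/* "|1" guard: the lowest bit is always set, so the result is never 0. */
static uint32_t guard_or_one(uint32_t span)
{
    return span | 1;
}

/* Division that is safe for every possible span value. */
static uint32_t safe_div(uint32_t num, uint32_t span)
{
    uint32_t divisor = guard_or_one(span);

    assert(divisor != 0);
    return num / divisor;
}
```

Like +1, the |1 guard changes the divisor by at most one, so the quotient stays a close approximation; unlike +1, it cannot produce zero for any input.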
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: stable@vger.kernel.org
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • path_openat(): fix double fput() · f42b4553
      Al Viro authored
      [ Upstream commit f15133df ]
      
      path_openat() jumps to the wrong place after do_tmpfile() - it has
      already done path_cleanup() (as part of path_lookupat() called by
      do_tmpfile()), so doing that again can lead to double fput().
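
The shape of this bug, a caller repeating cleanup that a callee has already done, can be shown with a toy reference count. Everything here is hypothetical (`toy_file`, `toy_fput()`, `toy_do_tmpfile()`, `toy_open()` and the label name); it sketches the control flow only and is not the code in fs/namei.c.

```c
#include <assert.h>

struct toy_file {
    int refs;
};

static void toy_fput(struct toy_file *f)
{
    assert(f->refs > 0);        /* a second put on the same reference trips this */
    f->refs--;
}

/* Callee that already drops the caller's reference before returning. */
static int toy_do_tmpfile(struct toy_file *base)
{
    toy_fput(base);
    return 0;
}

static int toy_open(struct toy_file *base, int want_tmpfile)
{
    int err;

    if (want_tmpfile) {
        err = toy_do_tmpfile(base);
        goto out_no_cleanup;    /* falling into the cleanup below would put twice */
    }
    err = 0;                    /* the regular lookup would happen here */
    toy_fput(base);             /* cleanup for the regular path only */
out_no_cleanup:
    return err;
}
```

Both paths end with the reference count at zero. Routing the tmpfile path through the shared cleanup instead would call `toy_fput()` a second time and hit the assertion.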
      
      Cc: stable@vger.kernel.org	# v3.11+
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • mm/memory-failure: call shake_page() when error hits thp tail page · d7259f44
      Naoya Horiguchi authored
      [ Upstream commit 09789e5d ]
      
      Currently memory_failure() calls shake_page() to sweep pages out from
      pcplists only when the victim page is a 4kB LRU page or a thp head page.
      But we should do this for a thp tail page too.
      
      Consider that a memory error hits a thp tail page whose head page is on
      a pcplist when memory_failure() runs.  Then, the current kernel skips
      the shake_page() part, so hwpoison_user_mappings() returns without calling
      split_huge_page() or try_to_unmap() because PageLRU of the thp head is
      still cleared due to the skip of shake_page().
      
      As a result, me_huge_page() runs for the thp, which is broken behavior.
      
      One effect is a leak of the thp.  Another is a failure to isolate the
      memory error, so a later access to the error address causes another MCE,
      which kills the processes which used the thp.
      
      This patch fixes this problem by calling shake_page() for the thp tail case.
      
      Fixes: 385de357 ("thp: allow a hwpoisoned head page to be put back to LRU")
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Reviewed-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Dean Nelson <dnelson@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Cc: Jin Dongming <jin.dongming@np.css.fujitsu.com>
      Cc: <stable@vger.kernel.org>	[3.4+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • mnt: Fix fs_fully_visible to verify the root directory is visible · 2c981af6
      Eric W. Biederman authored
      [ Upstream commit 7e96c1b0 ]
      
      This fixes a dumb bug in fs_fully_visible that allows proc or sys to
      be mounted if there is a bind mount of part of /proc/ or /sys/ visible.
      
      Cc: stable@vger.kernel.org
      Reported-by: Eric Windisch <ewindisch@docker.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • gpio: sysfs: fix memory leaks and device hotplug · d4b113a8
      Johan Hovold authored
      [ Upstream commit 483d8211 ]
      
      Unregister GPIOs requested through sysfs at chip remove to avoid leaking
      the associated memory and sysfs entries.
      
      The stale sysfs entries prevented the gpio numbers from being exported
      when the gpio range was later reused (e.g. at device reconnect).
      
      This also fixes the related module-reference leak.
      
      Note that kernfs makes sure that any on-going sysfs operations finish
      before the class devices are unregistered and that further accesses
      fail.
      
      The chip exported flag is used to prevent gpiod exports during removal.
      This also makes it harder to trigger, but does not fix, the related race
      between gpiochip_remove and export_store, which is really a race with
      gpiod_request that needs to be addressed separately.
      
      Also note that this would prevent the crashes (e.g. NULL-dereferences)
      at reconnect that affect pre-3.18 kernels, as well as use-after-free on
      operations on open attribute files on pre-3.14 kernels (prior to
      kernfs).
      
      Fixes: d8f388d8 ("gpio: sysfs interface")
      Cc: stable <stable@vger.kernel.org>	# v2.6.27: 01cca93a
      Signed-off-by: Johan Hovold <johan@kernel.org>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>