1. 22 Aug, 2016 40 commits
    • Sven Eckelmann's avatar
      batman-adv: Fix unexpected free of bcast_own on add_if error · 4a8b24ed
      Sven Eckelmann authored
      commit f7dcdf5f upstream.
      
      The function batadv_iv_ogm_orig_add_if allocates new buffers for bcast_own
      and bcast_own_sum. It is expected that these buffers are unchanged in case
      either bcast_own or bcast_own_sum couldn't be resized.
      
      But the error handling of this function frees the already resized buffer
      for bcast_own when the allocation of the new bcast_own_sum buffer failed.
      This will lead to an invalid memory access when some code will try to
      access bcast_own.
      
      Instead the resized new bcast_own buffer has to be kept. This will not lead
      to problems because the size of the buffer was only increased and therefore
      no user of the buffer will try to access bytes outside of the new buffer.
      
      Fixes: d0015fdd ("batman-adv: provide orig_node routing API")
      Signed-off-by: default avatarSven Eckelmann <sven@narfation.org>
      Signed-off-by: default avatarMarek Lindner <mareklindner@neomailbox.ch>
      Signed-off-by: default avatarAntonio Quartulli <a@unstable.cc>
      [bwh: Backported to 3.16: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      4a8b24ed
    • Florian Westphal's avatar
      batman-adv: fix skb deref after free · fd966aaf
      Florian Westphal authored
      commit 63d443ef upstream.
      
      batadv_send_skb_to_orig() calls dev_queue_xmit() so we can't use skb->len.
      
      Fixes: 95332477 ("batman-adv: network coding - buffer unicast packets before forward")
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Reviewed-by: default avatarSven Eckelmann <sven@narfation.org>
      Signed-off-by: default avatarMarek Lindner <mareklindner@neomailbox.ch>
      Signed-off-by: default avatarAntonio Quartulli <a@unstable.cc>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      fd966aaf
    • Daniel Lezcano's avatar
      cpuidle: Fix cpuidle_state_is_coupled() argument in cpuidle_enter() · 188e5f79
      Daniel Lezcano authored
      commit e7387da5 upstream.
      
      Commit 0b89e9aa (cpuidle: delay enabling interrupts until all
      coupled CPUs leave idle) rightfully fixed a regression by letting
      the coupled idle state framework to handle local interrupt enabling
      when the CPU is exiting an idle state.
      
      The current code checks if the idle state is coupled and, if so, it
      will let the coupled code to enable interrupts. This way, it can
      decrement the ready-count before handling the interrupt. This
      mechanism prevents the other CPUs from waiting for a CPU which is
      handling interrupts.
      
      But the check is done against the state index returned by the back
      end driver's ->enter functions which could be different from the
      initial index passed as parameter to the cpuidle_enter_state()
      function.
      
       entered_state = target_state->enter(dev, drv, index);
      
       [ ... ]
      
       if (!cpuidle_state_is_coupled(drv, entered_state))
      	local_irq_enable();
      
       [ ... ]
      
      If the 'index' is referring to a coupled idle state but the
      'entered_state' is *not* coupled, then the interrupts are enabled
      again. All CPUs blocked on the sync barrier may busy loop longer
      if the CPU has interrupts to handle before decrementing the
      ready-count. That's consuming more energy than saving.
      
      Fixes: 0b89e9aa (cpuidle: delay enabling interrupts until all coupled CPUs leave idle)
      Signed-off-by: default avatarDaniel Lezcano <daniel.lezcano@linaro.org>
      [ rjw: Subject & changelog ]
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      [bwh: Backported to 3.16: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      188e5f79
    • Chuck Lever's avatar
      sunrpc: Update RPCBIND_MAXNETIDLEN · b3f2cf10
      Chuck Lever authored
      commit 4b9c7f9d upstream.
      
      Commit 176e21ee ("SUNRPC: Support for RPC over AF_LOCAL
      transports") added a 5-character netid, but did not bump
      RPCBIND_MAXNETIDLEN from 4 to 5.
      
      Fixes: 176e21ee ("SUNRPC: Support for RPC over AF_LOCAL ...")
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: default avatarAnna Schumaker <Anna.Schumaker@Netapp.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      b3f2cf10
    • Steve French's avatar
      remove directory incorrectly tries to set delete on close on non-empty directories · 496bc4a1
      Steve French authored
      commit 897fba11 upstream.
      
      Wrong return code was being returned on SMB3 rmdir of
      non-empty directory.
      
      For SMB3 (unlike for cifs), we attempt to delete a directory by
      set of delete on close flag on the open. Windows clients set
      this flag via a set info (SET_FILE_DISPOSITION to set this flag)
      which properly checks if the directory is empty.
      
      With this patch on smb3 mounts we correctly return
       "DIRECTORY NOT EMPTY"
      on attempts to remove a non-empty directory.
      Signed-off-by: default avatarSteve French <steve.french@primarydata.com>
      Acked-by: default avatarSachin Prabhu <sprabhu@redhat.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      496bc4a1
    • Stefan Metzmacher's avatar
      fs/cifs: correctly to anonymous authentication for the NTLM(v2) authentication · d1c77424
      Stefan Metzmacher authored
      commit 1a967d6c upstream.
      
      Only server which map unknown users to guest will allow
      access using a non-null NTLMv2_Response.
      
      For Samba it's the "map to guest = bad user" option.
      
      BUG: https://bugzilla.samba.org/show_bug.cgi?id=11913Signed-off-by: default avatarStefan Metzmacher <metze@samba.org>
      Signed-off-by: default avatarSteve French <smfrench@gmail.com>
      [bwh: Backported to 3.16:adjust context, indentation]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      d1c77424
    • Stefan Metzmacher's avatar
      fs/cifs: correctly to anonymous authentication for the NTLM(v1) authentication · 2432f27f
      Stefan Metzmacher authored
      commit 777f69b8 upstream.
      
      Only server which map unknown users to guest will allow
      access using a non-null NTChallengeResponse.
      
      For Samba it's the "map to guest = bad user" option.
      
      BUG: https://bugzilla.samba.org/show_bug.cgi?id=11913Signed-off-by: default avatarStefan Metzmacher <metze@samba.org>
      Signed-off-by: default avatarSteve French <smfrench@gmail.com>
      [bwh: Backported to 3.16: adjust context, indentation]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      2432f27f
    • Stefan Metzmacher's avatar
      fs/cifs: correctly to anonymous authentication for the LANMAN authentication · 63e707b0
      Stefan Metzmacher authored
      commit fa8f3a35 upstream.
      
      Only server which map unknown users to guest will allow
      access using a non-null LMChallengeResponse.
      
      For Samba it's the "map to guest = bad user" option.
      
      BUG: https://bugzilla.samba.org/show_bug.cgi?id=11913Signed-off-by: default avatarStefan Metzmacher <metze@samba.org>
      Signed-off-by: default avatarSteve French <smfrench@gmail.com>
      [bwh: Backported to 3.16: adjust context, indentation]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      63e707b0
    • Stefan Metzmacher's avatar
      fs/cifs: correctly to anonymous authentication via NTLMSSP · c764ef67
      Stefan Metzmacher authored
      commit cfda35d9 upstream.
      
      See [MS-NLMP] 3.2.5.1.2 Server Receives an AUTHENTICATE_MESSAGE from the Client:
      
         ...
         Set NullSession to FALSE
         If (AUTHENTICATE_MESSAGE.UserNameLen == 0 AND
            AUTHENTICATE_MESSAGE.NtChallengeResponse.Length == 0 AND
            (AUTHENTICATE_MESSAGE.LmChallengeResponse == Z(1)
             OR
             AUTHENTICATE_MESSAGE.LmChallengeResponse.Length == 0))
             -- Special case: client requested anonymous authentication
             Set NullSession to TRUE
         ...
      
      Only server which map unknown users to guest will allow
      access using a non-null NTChallengeResponse.
      
      For Samba it's the "map to guest = bad user" option.
      
      BUG: https://bugzilla.samba.org/show_bug.cgi?id=11913Signed-off-by: default avatarStefan Metzmacher <metze@samba.org>
      Signed-off-by: default avatarSteve French <smfrench@gmail.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      c764ef67
    • Lyude's avatar
      drm/fb_helper: Fix references to dev->mode_config.num_connector · a6ff2d56
      Lyude authored
      commit 255f0e7c upstream.
      
      During boot, MST hotplugs are generally expected (even if no physical
      hotplugging occurs) and result in DRM's connector topology changing.
      This means that using num_connector from the current mode configuration
      can lead to the number of connectors changing under us. This can lead to
      some nasty scenarios in fbcon:
      
      - We allocate an array to the size of dev->mode_config.num_connectors.
      - MST hotplug occurs, dev->mode_config.num_connectors gets incremented.
      - We try to loop through each element in the array using the new value
        of dev->mode_config.num_connectors, and end up going out of bounds
        since dev->mode_config.num_connectors is now larger then the array we
        allocated.
      
      fb_helper->connector_count however, will always remain consistent while
      we do a modeset in fb_helper.
      
      Note: This is just polish for 4.7, Dave Airlie's drm_connector
      refcounting fixed these bugs for real. But it's good enough duct-tape
      for stable kernel backporting, since backporting the refcounting
      changes is way too invasive.
      Signed-off-by: default avatarLyude <cpaul@redhat.com>
      [danvet: Clarify why we need this. Also remove the now unused "dev"
      local variable to appease gcc.]
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Link: http://patchwork.freedesktop.org/patch/msgid/1463065021-18280-3-git-send-email-cpaul@redhat.comSigned-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      a6ff2d56
    • Lyude's avatar
      drm/i915/fbdev: Fix num_connector references in intel_fb_initial_config() · 08cc973b
      Lyude authored
      commit 14a3842a upstream.
      
      During boot time, MST devices usually send a ton of hotplug events
      irregardless of whether or not any physical hotplugs actually occurred.
      Hotplugs mean connectors being created/destroyed, and the number of DRM
      connectors changing under us. This isn't a problem if we use
      fb_helper->connector_count since we only set it once in the code,
      however if we use num_connector from struct drm_mode_config we risk it's
      value changing under us. On top of that, there's even a chance that
      dev->mode_config.num_connector != fb_helper->connector_count. If the
      number of connectors happens to increase under us, we'll end up using
      the wrong array size for memcpy and start writing beyond the actual
      length of the array, occasionally resulting in kernel panics.
      
      Note: This is just polish for 4.7, Dave Airlie's drm_connector
      refcounting fixed these bugs for real. But it's good enough duct-tape
      for stable kernel backporting, since backporting the refcounting
      changes is way too invasive.
      Signed-off-by: default avatarLyude <cpaul@redhat.com>
      [danvet: Clarify why we need this.]
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Link: http://patchwork.freedesktop.org/patch/msgid/1463065021-18280-2-git-send-email-cpaul@redhat.com
      [bwh: Backported to 3.16: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      08cc973b
    • Peter Zijlstra's avatar
      sched/preempt: Fix preempt_count manipulations · 9df37470
      Peter Zijlstra authored
      commit 2e636d5e upstream.
      
      Vikram reported that his ARM64 compiler managed to 'optimize' away the
      preempt_count manipulations in code like:
      
      	preempt_enable_no_resched();
      	put_user();
      	preempt_disable();
      
      Irrespective of that fact that that is horrible code that should be
      fixed for many reasons, it does highlight a deficiency in the generic
      preempt_count manipulators. As it is never right to combine/elide
      preempt_count manipulations like this.
      
      Therefore sprinkle some volatile in the two generic accessors to
      ensure the compiler is aware of the fact that the preempt_count is
      observed outside of the regular program-order view and thus cannot be
      optimized away like this.
      
      x86; the only arch not using the generic code is not affected as we
      do all this in asm in order to use the segment base per-cpu stuff.
      Reported-by: default avatarVikram Mulukutla <markivx@codeaurora.org>
      Tested-by: default avatarVikram Mulukutla <markivx@codeaurora.org>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: a7878709 ("sched, arch: Create asm/preempt.h")
      Link: http://lkml.kernel.org/r/20160516131751.GH3205@twins.programming.kicks-ass.netSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      [bwh: Backported to 3.16: use ACCESS_ONCE() instead of READ_ONCE()]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      9df37470
    • Herbert Xu's avatar
      netlink: Fix dump skb leak/double free · ed8ab6b2
      Herbert Xu authored
      commit 92964c79 upstream.
      
      When we free cb->skb after a dump, we do it after releasing the
      lock.  This means that a new dump could have started in the time
      being and we'll end up freeing their skb instead of ours.
      
      This patch saves the skb and module before we unlock so we free
      the right memory.
      
      Fixes: 16b304f3 ("netlink: Eliminate kmalloc in netlink dump operation.")
      Reported-by: default avatarBaozeng Ding <sploving1@gmail.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Acked-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      ed8ab6b2
    • Prarit Bhargava's avatar
      PCI: Disable all BAR sizing for devices with non-compliant BARs · 36265cad
      Prarit Bhargava authored
      commit ad67b437 upstream.
      
      b84106b4 ("PCI: Disable IO/MEM decoding for devices with non-compliant
      BARs") disabled BAR sizing for BARs 0-5 of devices that don't comply with
      the PCI spec.  But it didn't do anything for expansion ROM BARs, so we
      still try to size them, resulting in warnings like this on Broadwell-EP:
      
        pci 0000:ff:12.0: BAR 6: failed to assign [mem size 0x00000001 pref]
      
      Move the non-compliant BAR check from __pci_read_base() up to
      pci_read_bases() so it applies to the expansion ROM BAR as well as
      to BARs 0-5.
      
      Note that direct callers of __pci_read_base(), like sriov_init(), will now
      bypass this check.  We haven't had reports of devices with broken SR-IOV
      BARs yet.
      
      [bhelgaas: changelog]
      Fixes: b84106b4 ("PCI: Disable IO/MEM decoding for devices with non-compliant BARs")
      Signed-off-by: default avatarPrarit Bhargava <prarit@redhat.com>
      Signed-off-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      CC: Thomas Gleixner <tglx@linutronix.de>
      CC: Ingo Molnar <mingo@redhat.com>
      CC: "H. Peter Anvin" <hpa@zytor.com>
      CC: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      36265cad
    • Prarit Bhargava's avatar
      x86/PCI: Mark Broadwell-EP Home Agent 1 as having non-compliant BARs · 27d4396a
      Prarit Bhargava authored
      commit da77b671 upstream.
      
      Commit b8941571 ("x86/PCI: Mark Broadwell-EP Home Agent & PCU as having
      non-compliant BARs") marked Home Agent 0 & PCU has having non-compliant
      BARs.  Home Agent 1 also has non-compliant BARs.
      
      Mark Home Agent 1 as having non-compliant BARs so the PCI core doesn't
      touch them.
      
      The problem with these devices is documented in the Xeon v4 specification
      update:
      
        BDF2          PCI BARs in the Home Agent Will Return Non-Zero Values
                      During Enumeration
      
        Problem:      During system initialization the Operating System may access
                      the standard PCI BARs (Base Address Registers).  Due to
                      this erratum, accesses to the Home Agent BAR registers (Bus
                      1; Device 18; Function 0,4; Offsets (0x14-0x24) will return
                      non-zero values.
      
        Implication:  The operating system may issue a warning.  Intel has not
                      observed any functional failures due to this erratum.
      
      Link: http://www.intel.com/content/www/us/en/processors/xeon/xeon-e5-v4-spec-update.html
      Fixes: b8941571 ("x86/PCI: Mark Broadwell-EP Home Agent & PCU as having non-compliant BARs")
      Signed-off-by: default avatarPrarit Bhargava <prarit@redhat.com>
      Signed-off-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      CC: Thomas Gleixner <tglx@linutronix.de>
      CC: Ingo Molnar <mingo@redhat.com>
      CC: "H. Peter Anvin" <hpa@zytor.com>
      CC: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      27d4396a
    • Tariq Toukan's avatar
      net/mlx4_core: Fix access to uninitialized index · c4fe39b5
      Tariq Toukan authored
      commit 2bb07e15 upstream.
      
      Prevent using uninitialized or negative index when handling
      steering entries.
      
      Fixes: b12d93d6 ('mlx4: Add support for promiscuous mode in the new steering model.')
      Signed-off-by: default avatarTariq Toukan <tariqt@mellanox.com>
      Reported-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      c4fe39b5
    • Bartlomiej Zolnierkiewicz's avatar
      blk-mq: fix undefined behaviour in order_to_size() · 66003d80
      Bartlomiej Zolnierkiewicz authored
      commit b3a834b1 upstream.
      
      When this_order variable in blk_mq_init_rq_map() becomes zero
      the code incorrectly decrements the variable and passes the result
      to order_to_size() helper causing undefined behaviour:
      
       UBSAN: Undefined behaviour in block/blk-mq.c:1459:27
       shift exponent 4294967295 is too large for 32-bit type 'unsigned int'
       CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.6.0-rc6-00072-g33656a1f #22
      
      Fix the code by checking this_order variable for not having the zero
      value first.
      Reported-by: default avatarMeelis Roos <mroos@linux.ee>
      Fixes: 320ae51f ("blk-mq: new multi-queue block IO queueing mechanism")
      Signed-off-by: default avatarBartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      66003d80
    • Adrian Hunter's avatar
      mmc: mmc: Fix partition switch timeout for some eMMCs · c47ae034
      Adrian Hunter authored
      commit 1c447116 upstream.
      
      Some eMMCs set the partition switch timeout too low.
      
      Now typically eMMCs are considered a critical component (e.g. because
      they store the root file system) and consequently are expected to be
      reliable.  Thus we can neglect the use case where eMMCs can't switch
      reliably and we might want a lower timeout to facilitate speedy
      recovery.
      
      Although we could employ a quirk for the cards that are affected (if
      we could identify them all), as described above, there is little
      benefit to having a low timeout, so instead simply set a minimum
      timeout.
      
      The minimum is set to 300ms somewhat arbitrarily - the examples that
      have been seen had a timeout of 10ms but were sometimes taking 60-70ms.
      Signed-off-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Signed-off-by: default avatarUlf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      c47ae034
    • Dan Carpenter's avatar
      i40e: fix an uninitialized variable bug · 8e009bc2
      Dan Carpenter authored
      commit 1c306f7f upstream.
      
      We removed this initialization but it is required.  Let's put it back.
      
      Fixes: 895106a5 ('i40e: trivial fixes')
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Tested-by: default avatarAndrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: default avatarJeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      8e009bc2
    • Mark Bloch's avatar
      IB/core: Fix a potential array overrun in CMA and SA agent · da71e66f
      Mark Bloch authored
      commit 2fa2d4fb upstream.
      
      Fix array overrun when going over callback table.
      In declaration of callback table, the max size isn't provided and
      in registration phase, it is provided.
      
      There is potential scenario where a new operation is added
      and it is not supported by current client. The acceptance of
      such operation by ib_netlink will cause to array overrun.
      
      Fixes: 809d5fc9 ("infiniband: pass rdma_cm module to netlink_dump_start")
      Fixes: b493d91d ("iwcm: common code for port mapper")
      Fixes: 2ca546b9 ("IB/sa: Route SA pathrecord query through netlink")
      Signed-off-by: default avatarMark Bloch <markb@mellanox.com>
      Reviewed-by: default avatarLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: default avatarLeon Romanovsky <leon@kernel.org>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      [bwh: Backported to 3.16:
       - Only cma.c needs to be fixed
       - Adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      da71e66f
    • Mark Bloch's avatar
      IB/IWPM: Fix a potential skb leak · 6c1082d4
      Mark Bloch authored
      commit 5ed935e8 upstream.
      
      In case ibnl_put_msg fails in send_nlmsg_done,
      the function returns with -ENOMEM without freeing.
      
      This patch fixes this behavior.
      
      Fixes: 30dc5e63 ("RDMA/core: Add support for iWARP Port Mapper user space service")
      Signed-off-by: default avatarMark Bloch <markb@mellanox.com>
      Reviewed-by: default avatarLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: default avatarLeon Romanovsky <leon@kernel.org>
      Reviewed-by: default avatarSteve Wise <swise@opengridcomputing.com>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      6c1082d4
    • Hariprasad S's avatar
      RDMA/iw_cxgb4: Always wake up waiter in c4iw_peer_abort_intr() · 026d4750
      Hariprasad S authored
      commit 093108cb upstream.
      
      Currently c4iw_peer_abort_intr() does not wake up the waiter if the
      endpoint state indicates we're using MPAv2 and we're currently trying to
      connect. This was introduced with commit 7c0a33d6 ("RDMA/cxgb4:
      Don't wakeup threads for MPAv2")
      
      However, this original fix is flawed because it introduces a race that
      can cause a deadlock of the iwarp stack.  Here is the race:
      
      ->local side sets up an active offload connection.
      
      ->local side sends MPA_START request.
      
      ->peer sends MPA_START response.
      
      ->local side ingress cpl thread begins processing the MPA_START response,
      but before it changes the state from MPA_REQ_SENT to FPDU_MODE:
      
      ->peer sends a RST which results in a ABORT_REQ_RSS.  This triggers
      peer_abort_intr() which sees the state in MPA_REQ_SENT and since mpa_rev
      is 2, it will avoid waking up the endpoint with -ECONNRESET, assuming the
      stack will re-attempt the connection using MPAv1.
      
      ->Meanwhile, the cpl thread moves the state to FPDU_MODE and calls
      c4iw_modify_rc_qp() which calls rdma_init() which sends a RI_WR/INIT WR
      to firmware.  But since HW sent an abort, FW correctly drops the RI_WR/INIT
      WR.
      
      ->So the cpl thread is stuck waiting for a reply and cannot process the
      ABORT_REQ_RSS cpl sitting in its input queue. Thus everything comes to a
      halt because no more ingress cpls are processed by the stack...
      
      The correct fix for the issue is to always do the wake up in
      c4iw_abort_intr() but reinitialize the wait object in c4iw_reconnect().
      
      Fixes: 7c0a33d6 ("RDMA/cxgb4: Don't wakeup threads for MPAv2")
      Signed-off-by: default avatarSteve Wise <swise@opengridcomputing.com>
      Signed-off-by: default avatarHariprasad Shenai <hariprasad@chelsio.com>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      [bwh: Backported to 3.16: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      026d4750
    • Steven Rostedt (Red Hat)'s avatar
      ring-buffer: Prevent overflow of size in ring_buffer_resize() · e5e2cbc7
      Steven Rostedt (Red Hat) authored
      commit 59643d15 upstream.
      
      If the size passed to ring_buffer_resize() is greater than MAX_LONG - BUF_PAGE_SIZE
      then the DIV_ROUND_UP() will return zero.
      
      Here's the details:
      
        # echo 18014398509481980 > /sys/kernel/debug/tracing/buffer_size_kb
      
      tracing_entries_write() processes this and converts kb to bytes.
      
       18014398509481980 << 10 = 18446744073709547520
      
      and this is passed to ring_buffer_resize() as unsigned long size.
      
       size = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
      
      Where DIV_ROUND_UP(a, b) is (a + b - 1)/b
      
      BUF_PAGE_SIZE is 4080 and here
      
       18446744073709547520 + 4080 - 1 = 18446744073709551599
      
      where 18446744073709551599 is still smaller than 2^64
      
       2^64 - 18446744073709551599 = 17
      
      But now 18446744073709551599 / 4080 = 4521260802379792
      
      and size = size * 4080 = 18446744073709551360
      
      This is checked to make sure its still greater than 2 * 4080,
      which it is.
      
      Then we convert to the number of buffer pages needed.
      
       nr_page = DIV_ROUND_UP(size, BUF_PAGE_SIZE)
      
      but this time size is 18446744073709551360 and
      
       2^64 - (18446744073709551360 + 4080 - 1) = -3823
      
      Thus it overflows and the resulting number is less than 4080, which makes
      
        3823 / 4080 = 0
      
      an nr_pages is set to this. As we already checked against the minimum that
      nr_pages may be, this causes the logic to fail as well, and we crash the
      kernel.
      
      There's no reason to have the two DIV_ROUND_UP() (that's just result of
      historical code changes), clean up the code and fix this bug.
      
      Fixes: 83f40318 ("ring-buffer: Make removal of ring buffer pages atomic")
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      e5e2cbc7
    • Steven Rostedt (Red Hat)'s avatar
      ring-buffer: Use long for nr_pages to avoid overflow failures · ad7365e6
      Steven Rostedt (Red Hat) authored
      commit 9b94a8fb upstream.
      
      The size variable to change the ring buffer in ftrace is a long. The
      nr_pages used to update the ring buffer based on the size is int. On 64 bit
      machines this can cause an overflow problem.
      
      For example, the following will cause the ring buffer to crash:
      
       # cd /sys/kernel/debug/tracing
       # echo 10 > buffer_size_kb
       # echo 8556384240 > buffer_size_kb
      
      Then you get the warning of:
      
       WARNING: CPU: 1 PID: 318 at kernel/trace/ring_buffer.c:1527 rb_update_pages+0x22f/0x260
      
      Which is:
      
        RB_WARN_ON(cpu_buffer, nr_removed);
      
      Note each ring buffer page holds 4080 bytes.
      
      This is because:
      
       1) 10 causes the ring buffer to have 3 pages.
          (10kb requires 3 * 4080 pages to hold)
      
       2) (2^31 / 2^10  + 1) * 4080 = 8556384240
          The value written into buffer_size_kb is shifted by 10 and then passed
          to ring_buffer_resize(). 8556384240 * 2^10 = 8761737461760
      
       3) The size passed to ring_buffer_resize() is then divided by BUF_PAGE_SIZE
          which is 4080. 8761737461760 / 4080 = 2147484672
      
       4) nr_pages is subtracted from the current nr_pages (3) and we get:
          2147484669. This value is saved in a signed integer nr_pages_to_update
      
       5) 2147484669 is greater than 2^31 but smaller than 2^32, a signed int
          turns into the value of -2147482627
      
       6) As the value is a negative number, in update_pages_handler() it is
          negated and passed to rb_remove_pages() and 2147482627 pages will
          be removed, which is much larger than 3 and it causes the warning
          because not all the pages asked to be removed were removed.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=118001
      
      Fixes: 7a8e76a3 ("tracing: unified trace buffer")
      Reported-by: default avatarHao Qin <QEver.cn@gmail.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      [bwh: Backported to 3.16: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      ad7365e6
    • Paul Burton's avatar
      MIPS: math-emu: Fix jalr emulation when rd == $0 · 193f0270
      Paul Burton authored
      commit ab4a92e6 upstream.
      
      When emulating a jalr instruction with rd == $0, the code in
      isBranchInstr was incorrectly writing to GPR $0 which should actually
      always remain zeroed. This would lead to any further instructions
      emulated which use $0 operating on a bogus value until the task is next
      context switched, at which point the value of $0 in the task context
      would be restored to the correct zero by a store in SAVE_SOME. Fix this
      by not writing to rd if it is $0.
      
      Fixes: 102cedc3 ("MIPS: microMIPS: Floating point support.")
      Signed-off-by: default avatarPaul Burton <paul.burton@imgtec.com>
      Cc: Maciej W. Rozycki <macro@imgtec.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: linux-mips@linux-mips.org
      Cc: linux-kernel@vger.kernel.org
      Patchwork: https://patchwork.linux-mips.org/patch/13160/Signed-off-by: default avatarRalf Baechle <ralf@linux-mips.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      193f0270
    • Lars Persson's avatar
      MIPS: Fix race condition in lazy cache flushing. · f93675ec
      Lars Persson authored
      commit 4d46a67a upstream.
      
      The lazy cache flushing implemented in the MIPS kernel suffers from a
      race condition that is exposed by do_set_pte() in mm/memory.c.
      
      A pre-condition is a file-system that writes to the page from the CPU
      in its readpage method and then calls flush_dcache_page(). One example
      is ubifs. Another pre-condition is that the dcache flush is postponed
      in __flush_dcache_page().
      
      Upon a page fault for an executable mapping not existing in the
      page-cache, the following will happen:
      1. Write to the page
      2. flush_dcache_page
      3. flush_icache_page
      4. set_pte_at
      5. update_mmu_cache (commits the flush of a dcache-dirty page)
      
      Between steps 4 and 5 another thread can hit the same page and it will
      encounter a valid pte. Because the data still is in the L1 dcache the CPU
      will fetch stale data from L2 into the icache and execute garbage.
      
      This fix moves the commit of the cache flush to step 3 to close the
      race window. It also reduces the amount of flushes on non-executable
      mappings because we never enter __flush_dcache_page() for non-aliasing
      CPUs.
      
      Regressions can occur in drivers that mistakenly relies on the
      flush_dcache_page() in get_user_pages() for DMA operations.
      
      [ralf@linux-mips.org: Folded in patch 9346 to fix highmem issue.]
      Signed-off-by: default avatarLars Persson <larper@axis.com>
      Cc: linux-mips@linux-mips.org
      Cc: paul.burton@imgtec.com
      Cc: linux-kernel@vger.kernel.org
      Patchwork: https://patchwork.linux-mips.org/patch/9346/
      Patchwork: https://patchwork.linux-mips.org/patch/9738/Signed-off-by: default avatarRalf Baechle <ralf@linux-mips.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      f93675ec
    • Guilherme G. Piccoli's avatar
      powerpc/iommu: Remove the dependency on EEH struct in DDW mechanism · 3e0363f5
      Guilherme G. Piccoli authored
      commit 8445a87f upstream.
      
      Commit 39baadbf ("powerpc/eeh: Remove eeh information from pci_dn")
      changed the pci_dn struct by removing its EEH-related members.
      As part of this clean-up, DDW mechanism was modified to read the device
      configuration address from eeh_dev struct.
      
      As a consequence, now if we disable EEH mechanism on kernel command-line
      for example, the DDW mechanism will fail, generating a kernel oops by
      dereferencing a NULL pointer (which turns to be the eeh_dev pointer).
      
      This patch just changes the configuration address calculation on DDW
      functions to a manual calculation based on pci_dn members instead of
      using eeh_dev-based address.
      
      No functional changes were made. This was tested on pSeries, both
      in PHyp and qemu guest.
      
      Fixes: 39baadbf ("powerpc/eeh: Remove eeh information from pci_dn")
      Reviewed-by: default avatarGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: default avatarGuilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      3e0363f5
    • Vik Heyndrickx's avatar
      sched/loadavg: Fix loadavg artifacts on fully idle and on fully loaded systems · 36e9dca0
      Vik Heyndrickx authored
      commit 20878232 upstream.
      
      Systems show a minimal load average of 0.00, 0.01, 0.05 even when they
      have no load at all.
      
      Uptime and /proc/loadavg on all systems with kernels released during the
      last five years up until kernel version 4.6-rc5, show a 5- and 15-minute
      minimum loadavg of 0.01 and 0.05 respectively. This should be 0.00 on
      idle systems, but the way the kernel calculates this value prevents it
      from getting lower than the mentioned values.
      
      Likewise but not as obviously noticeable, a fully loaded system with no
      processes waiting, shows a maximum 1/5/15 loadavg of 1.00, 0.99, 0.95
      (multiplied by number of cores).
      
      Once the (old) load becomes 93 or higher, it mathematically can never
      get lower than 93, even when the active (load) remains 0 forever.
      This results in the strange 0.00, 0.01, 0.05 uptime values on idle
      systems.  Note: 93/2048 = 0.0454..., which rounds up to 0.05.
      
      It is not correct to add a 0.5 rounding (=1024/2048) here, since the
      result from this function is fed back into the next iteration again,
      so the result of that +0.5 rounding value then gets multiplied by
      (2048-2037), and then rounded again, so there is a virtual "ghost"
      load created, next to the old and active load terms.
      
      By changing the way the internally kept value is rounded, that internal
      value equivalent now can reach 0.00 on idle, and 1.00 on full load. Upon
      increasing load, the internally kept load value is rounded up, when the
      load is decreasing, the load value is rounded down.
      
      The modified code was tested on nohz=off and nohz kernels. It was tested
      on vanilla kernel 4.6-rc5 and on centos 7.1 kernel 3.10.0-327. It was
      tested on single, dual, and octal cores system. It was tested on virtual
      hosts and bare hardware. No unwanted effects have been observed, and the
      problems that the patch intended to fix were indeed gone.
      Tested-by: default avatarDamien Wyart <damien.wyart@free.fr>
      Signed-off-by: default avatarVik Heyndrickx <vik.heyndrickx@veribox.net>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Doug Smythies <dsmythies@telus.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 0f004f5a ("sched: Cure more NO_HZ load average woes")
      Link: http://lkml.kernel.org/r/e8d32bff-d544-7748-72b5-3c86cc71f09f@veribox.netSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      [bwh: Backported to 3.16: adjust filename]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      36e9dca0
    • wang yanqing's avatar
      rtlwifi: Fix logic error in enter/exit power-save mode · 1f38be9e
      wang yanqing authored
      commit 873ffe15 upstream.
      
      In commit a269913c ("rtlwifi: Rework rtl_lps_leave() and
      rtl_lps_enter() to use work queue"), the tests for enter/exit
      power-save mode were inverted. With this change applied, the
      wifi connection becomes much more stable.
      
      Fixes: a269913c ("rtlwifi: Rework rtl_lps_leave() and rtl_lps_enter() to use work queue")
      Signed-off-by: default avatarWang YanQing <udknight@gmail.com>
      Acked-by: default avatarLarry Finger <Larry.Finger@lwfinger.net>
      Signed-off-by: default avatarKalle Valo <kvalo@codeaurora.org>
      [bwh: Backported to 3.16:
       - We only set a flag here to be used later, but it was also set the wrong way
       - Adjust filename]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      1f38be9e
    • Naveen N. Rao's avatar
      perf tools: Fix perf regs mask generation · e9c95ab5
      Naveen N. Rao authored
      commit f4782207 upstream.
      
      On some architectures (powerpc in particular), the number of registers
      exceeds what can be represented in an integer bitmask. Ensure we
      generate the proper bitmask on such platforms.
      
      Fixes: 71ad0f5e ("perf tools: Support for DWARF CFI unwinding on post processing")
      Signed-off-by: default avatarNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Acked-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      e9c95ab5
    • Michael Ellerman's avatar
      powerpc/mm/hash64: Fix subpage protection with 4K HPTE config · 481f0e6c
      Michael Ellerman authored
      commit aac55d75 upstream.
      
      With Linux page size of 64K and hardware only supporting 4K HPTE, if we
      use subpage protection, we always fail for the subpage 0 as shown
      below (using the selftest subpage_prot test):
      
        520175565:  (4520111850): Failed at 0x3fffad4b0000 (p=13,sp=0,w=0), want=fault, got=pass !
        4520890210: (4520826495): Failed at 0x3fffad5b0000 (p=29,sp=0,w=0), want=fault, got=pass !
        4521574251: (4521510536): Failed at 0x3fffad6b0000 (p=45,sp=0,w=0), want=fault, got=pass !
        4522258324: (4522194609): Failed at 0x3fffad7b0000 (p=61,sp=0,w=0), want=fault, got=pass !
      
      This is because hash preload wrongly inserts the HPTE entry for subpage
      0 without looking at the subpage protection information.
      
      Fix it by teaching should_hash_preload() not to preload if we have
      subpage protection configured for that range.
      
      It appears this has been broken since it was introduced in 2008.
      
      Fixes: fa28237c ("[POWERPC] Provide a way to protect 4k subpages when using 64k pages")
      Signed-off-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      [mpe: Rework into should_hash_preload() to avoid build fails w/SLICES=n]
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      481f0e6c
    • Michael Ellerman's avatar
      powerpc/mm/hash64: Factor out hash preload psize check · 0ec2d36e
      Michael Ellerman authored
      commit 8bbc9b7b upstream.
      
      Currently we have a check in hash_preload() against the psize, which is
      only included when CONFIG_PPC_MM_SLICES is enabled. We want to expand
      this check in a subsequent patch, so factor it out to allow that. As a
      bonus it removes the #ifdef in the C code.
      
      Unfortunately we can't put this in the existing CONFIG_PPC_MM_SLICES
      block because it would require a forward declaration.
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      0ec2d36e
    • Arnd Bergmann's avatar
      kbuild: move -Wunused-const-variable to W=1 warning level · 282247a8
      Arnd Bergmann authored
      commit c9c6837d upstream.
      
      gcc-6 started warning by default about variables that are not
      used anywhere and that are marked 'const', generating many
      false positives in an allmodconfig build, e.g.:
      
      arch/arm/mach-davinci/board-da830-evm.c:282:20: warning: 'da830_evm_emif25_pins' defined but not used [-Wunused-const-variable=]
      arch/arm/plat-omap/dmtimer.c:958:34: warning: 'omap_timer_match' defined but not used [-Wunused-const-variable=]
      drivers/bluetooth/hci_bcm.c:625:39: warning: 'acpi_bcm_default_gpios' defined but not used [-Wunused-const-variable=]
      drivers/char/hw_random/omap-rng.c:92:18: warning: 'reg_map_omap4' defined but not used [-Wunused-const-variable=]
      drivers/devfreq/exynos/exynos5_bus.c:381:32: warning: 'exynos5_busfreq_int_pm' defined but not used [-Wunused-const-variable=]
      drivers/dma/mv_xor.c:1139:34: warning: 'mv_xor_dt_ids' defined but not used [-Wunused-const-variable=]
      
      This is similar to the existing -Wunused-but-set-variable warning
      that was added in an earlier release and that we disable by default
      now and only enable when W=1 is set, so it makes sense to do
      the same here. Once we have eliminated the majority of the
      warnings for both, we can put them back into the default list.
      
      We probably want this in backport kernels as well, to allow building
      them with gcc-6 without introducing extra warnings.
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Acked-by: default avatarOlof Johansson <olof@lixom.net>
      Acked-by: default avatarLee Jones <lee.jones@linaro.org>
      Signed-off-by: default avatarMichal Marek <mmarek@suse.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      282247a8
    • Julien Grall's avatar
      arm64: cpuinfo: Missing NULL terminator in compat_hwcap_str · 4ab708b7
      Julien Grall authored
      commit f228b494 upstream.
      
      The loop that browses the array compat_hwcap_str will stop when a NULL
      is encountered, however NULL is missing at the end of array. This will
      lead to overrun until a NULL is found somewhere in the following memory.
      In reality, this works out because the compat_hwcap2_str array tends to
      follow immediately in memory, and that *is* terminated correctly.
      Furthermore, the unsigned int compat_elf_hwcap is checked before
      printing each capability, so we end up doing the right thing because
      the size of the two arrays is less than 32. Still, this is an obvious
      mistake and should be fixed.
      
      Note for backporting: commit 12d11817 ("arm64: Move
      /proc/cpuinfo handling code") moved this code in v4.4. Prior to that
      commit, the same change should be made in arch/arm64/kernel/setup.c.
      
      Fixes: 44b82b77 "arm64: Fix up /proc/cpuinfo"
      Signed-off-by: default avatarJulien Grall <julien.grall@arm.com>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      [bwh: Backported to 3.16: adjust filename]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      4ab708b7
    • Will Deacon's avatar
      irqchip/gic: Ensure ordering between read of INTACK and shared data · e30364e0
      Will Deacon authored
      commit f86c4fbd upstream.
      
      When an IPI is generated by a CPU, the pattern looks roughly like:
      
        <write shared data>
        smp_wmb();
        <write to GIC to signal SGI>
      
      On the receiving CPU we rely on the fact that, once we've taken the
      interrupt, then the freshly written shared data must be visible to us.
      Put another way, the CPU isn't going to speculate taking an interrupt.
      
      Unfortunately, this assumption turns out to be broken.
      
      Consider that CPUx wants to send an IPI to CPUy, which will cause CPUy
      to read some shared_data. Before CPUx has done anything, a random
      peripheral raises an IRQ to the GIC and the IRQ line on CPUy is raised.
      CPUy then takes the IRQ and starts executing the entry code, heading
      towards gic_handle_irq. Furthermore, let's assume that a bunch of the
      previous interrupts handled by CPUy were SGIs, so the branch predictor
      kicks in and speculates that irqnr will be <16 and we're likely to
      head into handle_IPI. The prefetcher then grabs a speculative copy of
      shared_data which contains a stale value.
      
      Meanwhile, CPUx gets round to updating shared_data and asking the GIC
      to send an SGI to CPUy. Internally, the GIC decides that the SGI is
      more important than the peripheral interrupt (which hasn't yet been
      ACKed) but doesn't need to do anything to CPUy, because the IRQ line
      is already raised.
      
      CPUy then reads the ACK register on the GIC, sees the SGI value which
      confirms the branch prediction and we end up with a stale shared_data
      value.
      
      This patch fixes the problem by adding an smp_rmb() to the IPI entry
      code in gic_handle_irq. As it turns out, the combination of a control
      dependency and an ISB instruction from the EOI in the GICv3 driver is
      enough to provide the ordering we need, so we add a comment there
      justifying the absence of an explicit smp_rmb().
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      [bwh: Backported to 3.16: drop changes to irq-gic-v3]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      e30364e0
    • Heiko Carstens's avatar
      s390/vmem: fix identity mapping · 62582a66
      Heiko Carstens authored
      commit c34a6905 upstream.
      
      The identity mapping is suboptimal for the last 2GB frame. The mapping
      will be established with a mix of 4KB and 1MB mappings instead of a
      single 2GB mapping.
      
      This happens because of a off-by-one bug introduced with
      commit 50be6345 ("s390/mm: Convert bootmem to memblock").
      
      Currently the identity mapping looks like this:
      
      0x0000000080000000-0x0000000180000000        4G PUD RW
      0x0000000180000000-0x00000001fff00000     2047M PMD RW
      0x00000001fff00000-0x0000000200000000        1M PTE RW
      
      With the bug fixed it looks like this:
      
      0x0000000080000000-0x0000000200000000        6G PUD RW
      
      Fixes: 50be6345 ("s390/mm: Convert bootmem to memblock")
      Signed-off-by: default avatarHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      62582a66
    • Mans Rullgard's avatar
      ata: sata_dwc_460ex: remove incorrect locking · fe5f901a
      Mans Rullgard authored
      commit 55e610cd upstream.
      
      This lock is already taken in ata_scsi_queuecmd() a few levels up the
      call stack so attempting to take it here is an error.  Moreover, it is
      pointless in the first place since it only protects a single, atomic
      assignment.
      
      Enabling lock debugging gives the following output:
      
      =============================================
      [ INFO: possible recursive locking detected ]
      4.4.0-rc5+ #189 Not tainted
      ---------------------------------------------
      kworker/u2:3/37 is trying to acquire lock:
       (&(&host->lock)->rlock){-.-...}, at: [<90283294>] sata_dwc_exec_command_by_tag.constprop.14+0x44/0x8c
      
      but task is already holding lock:
       (&(&host->lock)->rlock){-.-...}, at: [<902761ac>] ata_scsi_queuecmd+0x2c/0x330
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(&(&host->lock)->rlock);
        lock(&(&host->lock)->rlock);
      
       *** DEADLOCK ***
       May be due to missing lock nesting notation
      
      4 locks held by kworker/u2:3/37:
       #0:  ("events_unbound"){.+.+.+}, at: [<9003a0a4>] process_one_work+0x12c/0x430
       #1:  ((&entry->work)){+.+.+.}, at: [<9003a0a4>] process_one_work+0x12c/0x430
       #2:  (&bdev->bd_mutex){+.+.+.}, at: [<9011fd54>] __blkdev_get+0x50/0x380
       #3:  (&(&host->lock)->rlock){-.-...}, at: [<902761ac>] ata_scsi_queuecmd+0x2c/0x330
      
      stack backtrace:
      CPU: 0 PID: 37 Comm: kworker/u2:3 Not tainted 4.4.0-rc5+ #189
      Workqueue: events_unbound async_run_entry_fn
      Stack : 90b38e30 00000021 00000003 9b2a6040 00000000 9005f3f0 904fc8dc 00000025
              906b96e4 00000000 90528648 9b3336c4 904fc8dc 9009bf18 00000002 00000004
              00000000 00000000 9b3336c4 9b3336e4 904fc8dc 9003d074 00000000 90500000
              9005e738 00000000 00000000 00000000 00000000 00000000 00000000 00000000
              6e657665 755f7374 756f626e 0000646e 00000000 00000000 9b00ca00 9b025000
                ...
      Call Trace:
      [<90009d6c>] show_stack+0x88/0xa4
      [<90057744>] __lock_acquire+0x1ce8/0x2154
      [<900583e4>] lock_acquire+0x64/0x8c
      [<9045ff10>] _raw_spin_lock_irqsave+0x54/0x78
      [<90283294>] sata_dwc_exec_command_by_tag.constprop.14+0x44/0x8c
      [<90283484>] sata_dwc_qc_issue+0x1a8/0x24c
      [<9026b39c>] ata_qc_issue+0x1f0/0x410
      [<90273c6c>] ata_scsi_translate+0xb4/0x200
      [<90276234>] ata_scsi_queuecmd+0xb4/0x330
      [<9025800c>] scsi_dispatch_cmd+0xd0/0x128
      [<90259934>] scsi_request_fn+0x58c/0x638
      [<901a3e50>] __blk_run_queue+0x40/0x5c
      [<901a83d4>] blk_queue_bio+0x27c/0x28c
      [<901a5914>] generic_make_request+0xf0/0x188
      [<901a5a54>] submit_bio+0xa8/0x194
      [<9011adcc>] submit_bh_wbc.isra.23+0x15c/0x17c
      [<9011c908>] block_read_full_page+0x3e4/0x428
      [<9009e2e0>] do_read_cache_page+0xac/0x210
      [<9009fd90>] read_cache_page+0x18/0x24
      [<901bbd18>] read_dev_sector+0x38/0xb0
      [<901bd174>] msdos_partition+0xb4/0x5c0
      [<901bcb8c>] check_partition+0x140/0x274
      [<901bba60>] rescan_partitions+0xa0/0x2b0
      [<9011ff68>] __blkdev_get+0x264/0x380
      [<901201ac>] blkdev_get+0x128/0x36c
      [<901b9378>] add_disk+0x3c0/0x4bc
      [<90268268>] sd_probe_async+0x100/0x224
      [<90043a44>] async_run_entry_fn+0x50/0x124
      [<9003a11c>] process_one_work+0x1a4/0x430
      [<9003a4f4>] worker_thread+0x14c/0x4fc
      [<900408f4>] kthread+0xd0/0xe8
      [<90004338>] ret_from_kernel_thread+0x14/0x1c
      
      Fixes: 62936009 ("[libata] Add 460EX on-chip SATA driver, sata_dwc_460ex")
      Tested-by: default avatarChristian Lamparter <chunkeey@googlemail.com>
      Signed-off-by: default avatarMans Rullgard <mans@mansr.com>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      fe5f901a
    • Arnd Bergmann's avatar
      gcov: disable tree-loop-im to reduce stack usage · 3c3e6b5e
      Arnd Bergmann authored
      commit c87bf431 upstream.
      
      Enabling CONFIG_GCOV_PROFILE_ALL produces us a lot of warnings like
      
      lib/lz4/lz4hc_compress.c: In function 'lz4_compresshcctx':
      lib/lz4/lz4hc_compress.c:514:1: warning: the frame size of 1504 bytes is larger than 1024 bytes [-Wframe-larger-than=]
      
      After some investigation, I found that this behavior started with gcc-4.9,
      and opened https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69702.
      A suggested workaround for it is to use the -fno-tree-loop-im
      flag that turns off one of the optimization stages in gcc, so the
      code runs a little slower but does not use excessive amounts
      of stack.
      
      We could make this conditional on the gcc version, but I could not
      find an easy way to do this in Kbuild and the benefit would be
      fairly small, given that most of the gcc version in production are
      affected now.
      
      I'm marking this for 'stable' backports because it addresses a bug
      with code generation in gcc that exists in all kernel versions
      with the affected gcc releases.
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Acked-by: default avatarPeter Oberparleiter <oberpar@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichal Marek <mmarek@suse.com>
      [bwh: Backported to 3.16: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      3c3e6b5e
    • James Hogan's avatar
      MIPS: KVM: Fix timer IRQ race when writing CP0_Compare · 9b1f40aa
      James Hogan authored
      commit b45bacd2 upstream.
      
      Writing CP0_Compare clears the timer interrupt pending bit
      (CP0_Cause.TI), but this wasn't being done atomically. If a timer
      interrupt raced with the write of the guest CP0_Compare, the timer
      interrupt could end up being pending even though the new CP0_Compare is
      nowhere near CP0_Count.
      
      We were already updating the hrtimer expiry with
      kvm_mips_update_hrtimer(), which used both kvm_mips_freeze_hrtimer() and
      kvm_mips_resume_hrtimer(). Close the race window by expanding out
      kvm_mips_update_hrtimer(), and clearing CP0_Cause.TI and setting
      CP0_Compare between the freeze and resume. Since the pending timer
      interrupt should not be cleared when CP0_Compare is written via the KVM
      user API, an ack argument is added to distinguish the source of the
      write.
      
      Fixes: e30492bb ("MIPS: KVM: Rewrite count/compare timer emulation")
      Signed-off-by: default avatarJames Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim KrčmáÅ" <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      [bwh: Backported to 3.16: adjust filenames]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      9b1f40aa
    • James Hogan's avatar
      MIPS: KVM: Fix timer IRQ race when freezing timer · 75b5b632
      James Hogan authored
      commit 4355c44f upstream.
      
      There's a particularly narrow and subtle race condition when the
      software emulated guest timer is frozen which can allow a guest timer
      interrupt to be missed.
      
      This happens due to the hrtimer expiry being inexact, so very
      occasionally the freeze time will be after the moment when the emulated
      CP0_Count transitions to the same value as CP0_Compare (so an IRQ should
      be generated), but before the moment when the hrtimer is due to expire
      (so no IRQ is generated). The IRQ won't be generated when the timer is
      resumed either, since the resume CP0_Count will already match CP0_Compare.
      
      With VZ guests in particular this is far more likely to happen, since
      the soft timer may be frozen frequently in order to restore the timer
      state to the hardware guest timer. This happens after 5-10 hours of
      guest soak testing, resulting in an overflow in guest kernel timekeeping
      calculations, hanging the guest. A more focussed test case to
      intentionally hit the race (with the help of a new hypcall to cause the
      timer state to migrated between hardware & software) hits the condition
      fairly reliably within around 30 seconds.
      
      Instead of relying purely on the inexact hrtimer expiry to determine
      whether an IRQ should be generated, read the guest CP0_Compare and
      directly check whether the freeze time is before or after it. Only if
      CP0_Count is on or after CP0_Compare do we check the hrtimer expiry to
      determine whether the last IRQ has already been generated (which will
      have pushed back the expiry by one timer period).
      
      Fixes: e30492bb ("MIPS: KVM: Rewrite count/compare timer emulation")
      Signed-off-by: default avatarJames Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim KrčmáÅ" <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      [bwh: Backported to 3.16: adjust filename]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      75b5b632