1. 01 May, 2014 40 commits
    • Mizuma, Masayoshi's avatar
      mm: hugetlb: fix softlockup when a large number of hugepages are freed. · 9c046926
      Mizuma, Masayoshi authored
      commit 55f67141 upstream.
      
      When I decrease the value of nr_hugepage in procfs a lot, softlockup
      happens.  It is because there is no chance of context switch during this
      process.
      
      On the other hand, when I allocate a large number of hugepages, there is
      some chance of context switch.  Hence softlockup doesn't happen during
      this process.  So it's necessary to add the context switch in the
      freeing process as same as allocating process to avoid softlockup.
      
      When I freed 12 TB hugapages with kernel-2.6.32-358.el6, the freeing
      process occupied a CPU over 150 seconds and following softlockup message
      appeared twice or more.
      
      $ echo 6000000 > /proc/sys/vm/nr_hugepages
      $ cat /proc/sys/vm/nr_hugepages
      6000000
      $ grep ^Huge /proc/meminfo
      HugePages_Total:   6000000
      HugePages_Free:    6000000
      HugePages_Rsvd:        0
      HugePages_Surp:        0
      Hugepagesize:       2048 kB
      $ echo 0 > /proc/sys/vm/nr_hugepages
      
      BUG: soft lockup - CPU#16 stuck for 67s! [sh:12883] ...
      Pid: 12883, comm: sh Not tainted 2.6.32-358.el6.x86_64 #1
      Call Trace:
        free_pool_huge_page+0xb8/0xd0
        set_max_huge_pages+0x128/0x190
        hugetlb_sysctl_handler_common+0x113/0x140
        hugetlb_sysctl_handler+0x1e/0x20
        proc_sys_call_handler+0x97/0xd0
        proc_sys_write+0x14/0x20
        vfs_write+0xb8/0x1a0
        sys_write+0x51/0x90
        __audit_syscall_exit+0x265/0x290
        system_call_fastpath+0x16/0x1b
      
      I have not confirmed this problem with upstream kernels because I am not
      able to prepare the machine equipped with 12TB memory now.  However I
      confirmed that the amount of decreasing hugepages was directly
      proportional to the amount of required time.
      
      I measured required times on a smaller machine.  It showed 130-145
      hugepages decreased in a millisecond.
      
        Amount of decreasing     Required time      Decreasing rate
        hugepages                     (msec)         (pages/msec)
        ------------------------------------------------------------
        10,000 pages == 20GB         70 -  74          135-142
        30,000 pages == 60GB        208 - 229          131-144
      
      It means decrement of 6TB hugepages will trigger softlockup with the
      default threshold 20sec, in this decreasing rate.
      Signed-off-by: default avatarMasayoshi Mizuma <m.mizuma@jp.fujitsu.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      9c046926
    • Vlastimil Babka's avatar
      mm: try_to_unmap_cluster() should lock_page() before mlocking · 94a3e116
      Vlastimil Babka authored
      commit 57e68e9c upstream.
      
      A BUG_ON(!PageLocked) was triggered in mlock_vma_page() by Sasha Levin
      fuzzing with trinity.  The call site try_to_unmap_cluster() does not lock
      the pages other than its check_page parameter (which is already locked).
      
      The BUG_ON in mlock_vma_page() is not documented and its purpose is
      somewhat unclear, but apparently it serializes against page migration,
      which could otherwise fail to transfer the PG_mlocked flag.  This would
      not be fatal, as the page would be eventually encountered again, but
      NR_MLOCK accounting would become distorted nevertheless.  This patch adds
      a comment to the BUG_ON in mlock_vma_page() and munlock_vma_page() to that
      effect.
      
      The call site try_to_unmap_cluster() is fixed so that for page !=
      check_page, trylock_page() is attempted (to avoid possible deadlocks as we
      already have check_page locked) and mlock_vma_page() is performed only
      upon success.  If the page lock cannot be obtained, the page is left
      without PG_mlocked, which is again not a problem in the whole unevictable
      memory design.
      Signed-off-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Signed-off-by: default avatarBob Liu <bob.liu@oracle.com>
      Reported-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Acked-by: default avatarRik van Riel <riel@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      94a3e116
    • Johannes Weiner's avatar
      mm: page_alloc: spill to remote nodes before waking kswapd · 7bd9fcd9
      Johannes Weiner authored
      commit 3a025760 upstream.
      
      On NUMA systems, a node may start thrashing cache or even swap anonymous
      pages while there are still free pages on remote nodes.
      
      This is a result of commits 81c0a2bb ("mm: page_alloc: fair zone
      allocator policy") and fff4068c ("mm: page_alloc: revert NUMA aspect
      of fair allocation policy").
      
      Before those changes, the allocator would first try all allowed zones,
      including those on remote nodes, before waking any kswapds.  But now,
      the allocator fastpath doubles as the fairness pass, which in turn can
      only consider the local node to prevent remote spilling based on
      exhausted fairness batches alone.  Remote nodes are only considered in
      the slowpath, after the kswapds are woken up.  But if remote nodes still
      have free memory, kswapd should not be woken to rebalance the local node
      or it may thrash cash or swap prematurely.
      
      Fix this by adding one more unfair pass over the zonelist that is
      allowed to spill to remote nodes after the local fairness pass fails but
      before entering the slowpath and waking the kswapds.
      
      This also gets rid of the GFP_THISNODE exemption from the fairness
      protocol because the unfair pass is no longer tied to kswapd, which
      GFP_THISNODE is not allowed to wake up.
      
      However, because remote spills can be more frequent now - we prefer them
      over local kswapd reclaim - the allocation batches on remote nodes could
      underflow more heavily.  When resetting the batches, use
      atomic_long_read() directly instead of zone_page_state() to calculate the
      delta as the latter filters negative counter values.
      Signed-off-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: default avatarRik van Riel <riel@redhat.com>
      Acked-by: default avatarMel Gorman <mgorman@suse.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      7bd9fcd9
    • Martin Svec's avatar
      Target/sbc: Initialize COMPARE_AND_WRITE write_sg scatterlist · 9f66dd6c
      Martin Svec authored
      commit a1e1774c upstream.
      
      When compiled with CONFIG_DEBUG_SG set, uninitialized SGL leads
      to BUG() in compare_and_write_callback().
      Signed-off-by: default avatarMartin Svec <martin.svec@zoner.cz>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      9f66dd6c
    • Nicholas Bellinger's avatar
      iscsi-target: Fix ERL=2 ASYNC_EVENT connection pointer bug · bf879dc8
      Nicholas Bellinger authored
      commit d444edc6 upstream.
      
      This patch fixes a long-standing bug in iscsit_build_conn_drop_async_message()
      where during ERL=2 connection recovery, a bogus conn_p pointer could
      end up being used to send the ISCSI_OP_ASYNC_EVENT + DROPPING_CONNECTION
      notifying the initiator that cmd->logout_cid has failed.
      
      The bug was manifesting itself as an OOPs in iscsit_allocate_cmd() with
      a bogus conn_p pointer in iscsit_build_conn_drop_async_message().
      Reported-by: default avatarArshad Hussain <arshad.hussain@calsoftinc.com>
      Reported-by: default avatarsantosh kulkarni <santosh.kulkarni@calsoftinc.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      bf879dc8
    • Nicholas Bellinger's avatar
      iser-target: Add missing se_cmd put for WRITE_PENDING in tx_comp_err · 84a55168
      Nicholas Bellinger authored
      commit 03e7848a upstream.
      
      This patch fixes a bug where outstanding RDMA_READs with WRITE_PENDING
      status require an extra target_put_sess_cmd() in isert_put_cmd() code
      when called from isert_cq_tx_comp_err() + isert_cq_drain_comp_llist()
      context during session shutdown.
      
      The extra kref PUT is required so that transport_generic_free_cmd()
      invokes the last target_put_sess_cmd() -> target_release_cmd_kref(),
      which will complete(&se_cmd->cmd_wait_comp) the outstanding se_cmd
      descriptor with WRITE_PENDING status, and awake the completion in
      target_wait_for_sess_cmds() to invoke TFO->release_cmd().
      
      The bug was manifesting itself in target_wait_for_sess_cmds() where
      a se_cmd descriptor with WRITE_PENDING status would end up sleeping
      indefinately.
      Acked-by: default avatarSagi Grimberg <sagig@mellanox.com>
      Cc: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      [ kamal: backport to 3.13 (context) ]
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      84a55168
    • Michael Neuling's avatar
      powerpc/tm: Disable IRQ in tm_recheckpoint · 42d50463
      Michael Neuling authored
      commit e6b8fd02 upstream.
      
      We can't take an IRQ when we're about to do a trechkpt as our GPR state is set
      to user GPR values.
      
      We've hit this when running some IBM Java stress tests in the lab resulting in
      the following dump:
      
        cpu 0x3f: Vector: 700 (Program Check) at [c000000007eb3d40]
            pc: c000000000050074: restore_gprs+0xc0/0x148
            lr: 00000000b52a8184
            sp: ac57d360
           msr: 8000000100201030
          current = 0xc00000002c500000
          paca    = 0xc000000007dbfc00     softe: 0     irq_happened: 0x00
            pid   = 34535, comm = Pooled Thread #
        R00 = 00000000b52a8184   R16 = 00000000b3e48fda
        R01 = 00000000ac57d360   R17 = 00000000ade79bd8
        R02 = 00000000ac586930   R18 = 000000000fac9bcc
        R03 = 00000000ade60000   R19 = 00000000ac57f930
        R04 = 00000000f6624918   R20 = 00000000ade79be8
        R05 = 00000000f663f238   R21 = 00000000ac218a54
        R06 = 0000000000000002   R22 = 000000000f956280
        R07 = 0000000000000008   R23 = 000000000000007e
        R08 = 000000000000000a   R24 = 000000000000000c
        R09 = 00000000b6e69160   R25 = 00000000b424cf00
        R10 = 0000000000000181   R26 = 00000000f66256d4
        R11 = 000000000f365ec0   R27 = 00000000b6fdcdd0
        R12 = 00000000f66400f0   R28 = 0000000000000001
        R13 = 00000000ada71900   R29 = 00000000ade5a300
        R14 = 00000000ac2185a8   R30 = 00000000f663f238
        R15 = 0000000000000004   R31 = 00000000f6624918
        pc  = c000000000050074 restore_gprs+0xc0/0x148
        cfar= c00000000004fe28 dont_restore_vec+0x1c/0x1a4
        lr  = 00000000b52a8184
        msr = 8000000100201030   cr  = 24804888
        ctr = 0000000000000000   xer = 0000000000000000   trap =  700
      
      This moves tm_recheckpoint to a C function and moves the tm_restore_sprs into
      that function.  It then adds IRQ disabling over the trechkpt critical section.
      It also sets the TEXASR FS in the signals code to ensure this is never set now
      that we explictly write the TM sprs in tm_recheckpoint.
      Signed-off-by: default avatarMichael Neuling <mikey@neuling.org>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      42d50463
    • Takashi Iwai's avatar
      thinkpad_acpi: Fix inconsistent mute LED after resume · ca98f1e0
      Takashi Iwai authored
      commit 119f4498 upstream.
      
      The mute LED states have to be restored after resume.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=70351Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarMatthew Garrett <matthew.garrett@nebula.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      ca98f1e0
    • Joe Thornber's avatar
      dm cache: fix a lock-inversion · cb3706ff
      Joe Thornber authored
      commit 0596661f upstream.
      
      When suspending a cache the policy is walked and the individual policy
      hints written to the metadata via sync_metadata().  This led to this
      lock order:
      
            policy->lock
              cache_metadata->root_lock
      
      When loading the cache target the policy is populated while the metadata
      lock is held:
      
            cache_metadata->root_lock
               policy->lock
      
      Fix this potential lock-inversion (ABBA) deadlock in sync_metadata() by
      ensuring the cache_metadata root_lock is held whilst all the hints are
      written, rather than being repeatedly locked while policy->lock is held
      (as was the case with each callout that policy_walk_mappings() made to
      the old save_hint() method).
      
      Found by turning on the CONFIG_PROVE_LOCKING ("Lock debugging: prove
      locking correctness") build option.  However, it is not clear how the
      LOCKDEP reported paths can lead to a deadlock since the two paths,
      suspending a target and loading a target, never occur at the same time.
      But that doesn't mean the same lock-inversion couldn't have occurred
      elsewhere.
      Reported-by: default avatarMarian Csontos <mcsontos@redhat.com>
      Signed-off-by: default avatarJoe Thornber <ejt@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      [ kamal: backport to 3.13: omitted cache_target version string change ]
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      cb3706ff
    • Giacomo Comes's avatar
      Skip intel_crt_init for Dell XPS 8700 · 8fff1d7a
      Giacomo Comes authored
      commit 10b6ee4a upstream.
      
      The Dell XPS 8700 has a onboard Display port and HDMI port and no VGA port.
      The call intel_crt_init freeze the machine, so skip such call.
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=73559
      Signed-off-by: Giacomo Comes <comes at naic.edu>
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      8fff1d7a
    • Imre Deak's avatar
      drm/i915: move power domain init earlier during system resume · 670c7451
      Imre Deak authored
      commit 76c4b250 upstream.
      
      During resume the intel hda audio driver depends on the i915 driver
      reinitializing the audio power domain. Since the order of calling the
      i915 resume handler wrt. that of the audio driver is not guaranteed,
      move the power domain reinitialization step to the resume_early
      handler. This is guaranteed to run before the resume handler of any
      other driver.
      
      The power domain initialization in turn requires us to enable the i915
      pci device first, so move that part earlier too.
      
      Accordingly disabling of the i915 pci device should happen after the
      audio suspend handler ran. So move the disabling later from the i915
      resume handler to the resume_late handler.
      
      v2:
      - move intel_uncore_sanitize/early_sanitize earlier too, so they don't
        get reordered wrt. intel_power_domains_init_hw()
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76152Signed-off-by: default avatarImre Deak <imre.deak@intel.com>
      Reviewed-by: default avatarTakashi Iwai <tiwai@suse.de>
      [danvet: Add cc: stable and loud comments that this is just a hack.]
      [danvet: Fix "Should it be static?" sparse warning reported by Wu
      Fengguang's kbuilder.]
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      [ kamal: backport to 3.13: intel_power_domains_init_hw(dev) not (dev_priv) ]
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      670c7451
    • Serge Hallyn's avatar
      xattr: guard against simultaneous glibc header inclusion · da732f45
      Serge Hallyn authored
      commit ea1a8217 upstream.
      
      If the glibc xattr.h header is included after the uapi header,
      compilation fails due to an enum re-using a #define from the uapi
      header.
      
      Protect against this by guarding the define and enum inclusions against
      each other.
      
      (See https://lists.debian.org/debian-glibc/2014/03/msg00029.html
      and https://sourceware.org/glibc/wiki/Synchronizing_Headers
      for more information.)
      Signed-off-by: default avatarSerge Hallyn <serge.hallyn@ubuntu.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Allan McRae <allan@archlinux.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      da732f45
    • Tetsuo Handa's avatar
      ocfs2: fix panic on kfree(xattr->name) · 625e99e4
      Tetsuo Handa authored
      commit f81c2015 upstream.
      
      Commit 9548906b ('xattr: Constify ->name member of "struct xattr"')
      missed that ocfs2 is calling kfree(xattr->name).  As a result, kernel
      panic occurs upon calling kfree(xattr->name) because xattr->name refers
      static constant names.  This patch removes kfree(xattr->name) from
      ocfs2_mknod() and ocfs2_symlink().
      Signed-off-by: default avatarTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Reported-by: default avatarTariq Saeed <tariq.x.saeed@oracle.com>
      Tested-by: default avatarTariq Saeed <tariq.x.saeed@oracle.com>
      Reviewed-by: default avatarSrinivas Eeda <srinivas.eeda@oracle.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      625e99e4
    • alex chen's avatar
      ocfs2: do not put bh when buffer_uptodate failed · aac86e1c
      alex chen authored
      commit f7cf4f5b upstream.
      
      Do not put bh when buffer_uptodate failed in ocfs2_write_block and
      ocfs2_write_super_or_backup, because it will put bh in b_end_io.
      Otherwise it will hit a warning "VFS: brelse: Trying to free free
      buffer".
      Signed-off-by: default avatarAlex Chen <alex.chen@huawei.com>
      Reviewed-by: default avatarJoseph Qi <joseph.qi@huawei.com>
      Reviewed-by: default avatarSrinivas Eeda <srinivas.eeda@oracle.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Acked-by: default avatarJoel Becker <jlbec@evilplan.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      aac86e1c
    • Junxiao Bi's avatar
      ocfs2: dlm: fix recovery hung · a0159efa
      Junxiao Bi authored
      commit ded2cf71 upstream.
      
      There is a race window in dlm_do_recovery() between dlm_remaster_locks()
      and dlm_reset_recovery() when the recovery master nearly finish the
      recovery process for a dead node.  After the master sends FINALIZE_RECO
      message in dlm_remaster_locks(), another node may become the recovery
      master for another dead node, and then send the BEGIN_RECO message to
      all the nodes included the old master, in the handler of this message
      dlm_begin_reco_handler() of old master, dlm->reco.dead_node and
      dlm->reco.new_master will be set to the second dead node and the new
      master, then in dlm_reset_recovery(), these two variables will be reset
      to default value.  This will cause new recovery master can not finish
      the recovery process and hung, at last the whole cluster will hung for
      recovery.
      
      old recovery master:                                 new recovery master:
      dlm_remaster_locks()
                                                        become recovery master for
                                                        another dead node.
                                                        dlm_send_begin_reco_message()
      dlm_begin_reco_handler()
      {
       if (dlm->reco.state & DLM_RECO_STATE_FINALIZE) {
        return -EAGAIN;
       }
       dlm_set_reco_master(dlm, br->node_idx);
       dlm_set_reco_dead_node(dlm, br->dead_node);
      }
      dlm_reset_recovery()
      {
       dlm_set_reco_dead_node(dlm, O2NM_INVALID_NODE_NUM);
       dlm_set_reco_master(dlm, O2NM_INVALID_NODE_NUM);
      }
                                                        will hang in dlm_remaster_locks() for
                                                        request dlm locks info
      
      Before send FINALIZE_RECO message, recovery master should set
      DLM_RECO_STATE_FINALIZE for itself and clear it after the recovery done,
      this can break the race windows as the BEGIN_RECO messages will not be
      handled before DLM_RECO_STATE_FINALIZE flag is cleared.
      
      A similar race may happen between new recovery master and normal node
      which is in dlm_finalize_reco_handler(), also fix it.
      Signed-off-by: default avatarJunxiao Bi <junxiao.bi@oracle.com>
      Reviewed-by: default avatarSrinivas Eeda <srinivas.eeda@oracle.com>
      Reviewed-by: default avatarWengang Wang <wen.gang.wang@oracle.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      a0159efa
    • Junxiao Bi's avatar
      ocfs2: dlm: fix lock migration crash · 765bf9f6
      Junxiao Bi authored
      commit 34aa8dac upstream.
      
      This issue was introduced by commit 800deef3 ("ocfs2: use
      list_for_each_entry where benefical") in 2007 where it replaced
      list_for_each with list_for_each_entry.  The variable "lock" will point
      to invalid data if "tmpq" list is empty and a panic will be triggered
      due to this.  Sunil advised reverting it back, but the old version was
      also not right.  At the end of the outer for loop, that
      list_for_each_entry will also set "lock" to an invalid data, then in the
      next loop, if the "tmpq" list is empty, "lock" will be an stale invalid
      data and cause the panic.  So reverting the list_for_each back and reset
      "lock" to NULL to fix this issue.
      
      Another concern is that this seemes can not happen because the "tmpq"
      list should not be empty.  Let me describe how.
      
      old lock resource owner(node 1):                                  migratation target(node 2):
      image there's lockres with a EX lock from node 2 in
      granted list, a NR lock from node x with convert_type
      EX in converting list.
      dlm_empty_lockres() {
       dlm_pick_migration_target() {
         pick node 2 as target as its lock is the first one
         in granted list.
       }
       dlm_migrate_lockres() {
         dlm_mark_lockres_migrating() {
           res->state |= DLM_LOCK_RES_BLOCK_DIRTY;
           wait_event(dlm->ast_wq, !dlm_lockres_is_dirty(dlm, res));
      	 //after the above code, we can not dirty lockres any more,
           // so dlm_thread shuffle list will not run
                                                                         downconvert lock from EX to NR
                                                                         upconvert lock from NR to EX
      <<< migration may schedule out here, then
      <<< node 2 send down convert request to convert type from EX to
      <<< NR, then send up convert request to convert type from NR to
      <<< EX, at this time, lockres granted list is empty, and two locks
      <<< in the converting list, node x up convert lock followed by
      <<< node 2 up convert lock.
      
      	 // will set lockres RES_MIGRATING flag, the following
      	 // lock/unlock can not run
           dlm_lockres_release_ast(dlm, res);
         }
      
         dlm_send_one_lockres()
                                                                       dlm_process_recovery_data()
                                                                         for (i=0; i<mres->num_locks; i++)
                                                                           if (ml->node == dlm->node_num)
                                                                             for (j = DLM_GRANTED_LIST; j <= DLM_BLOCKED_LIST; j++) {
                                                                              list_for_each_entry(lock, tmpq, list)
                                                                              if (lock) break; <<< lock is invalid as grant list is empty.
                                                                             }
                                                                             if (lock->ml.node != ml->node)
                                                                               BUG() >>> crash here
       }
      
      I see the above locks status from a vmcore of our internal bug.
      Signed-off-by: default avatarJunxiao Bi <junxiao.bi@oracle.com>
      Reviewed-by: default avatarWengang Wang <wen.gang.wang@oracle.com>
      Cc: Sunil Mushran <sunil.mushran@gmail.com>
      Reviewed-by: default avatarSrinivas Eeda <srinivas.eeda@oracle.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      765bf9f6
    • Matt Fleming's avatar
      sh: fix format string bug in stack tracer · 7ca07be2
      Matt Fleming authored
      commit a0c32761 upstream.
      
      Kees reported the following error:
      
         arch/sh/kernel/dumpstack.c: In function 'print_trace_address':
         arch/sh/kernel/dumpstack.c:118:2: error: format not a string literal and no format arguments [-Werror=format-security]
      
      Use the "%s" format so that it's impossible to interpret 'data' as a
      format string.
      Signed-off-by: default avatarMatt Fleming <matt.fleming@intel.com>
      Reported-by: default avatarKees Cook <keescook@chromium.org>
      Acked-by: default avatarKees Cook <keescook@chromium.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      7ca07be2
    • Alex Deucher's avatar
      a1516f0e
    • Alex Deucher's avatar
      drm/radeon: fix endian swap on hawaii clear state buffer setup · dc907e0b
      Alex Deucher authored
      commit a8947f57 upstream.
      
      Need to swap on BE.
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Reviewed-by: default avatarChristian König <christian.koenig@amd.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      dc907e0b
    • Alex Deucher's avatar
      drm/radeon: call drm_edid_to_eld when we update the edid · 107fcf4f
      Alex Deucher authored
      commit 16086279 upstream.
      
      This needs to be done to update some of the fields in
      the connector structure used by the audio code.
      
      Noticed by several users on irc.
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarChristian König <christian.koenig@amd.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      107fcf4f
    • Christian König's avatar
      drm/radeon: clear needs_reset flag if IB test fails · 59197128
      Christian König authored
      commit 06a139f7 upstream.
      
      If the IB test fails we don't want to reset the card over
      and over again, just accept that it isn't working.
      
      Bug: https://bugs.freedesktop.org/show_bug.cgi?id=76501Signed-off-by: default avatarChristian König <christian.koenig@amd.com>
      Reviewed-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      59197128
    • Takashi Iwai's avatar
      ALSA: hda - Fix silent speaker output due to mute LED fixup · a9d247b1
      Takashi Iwai authored
      commit 415d555e upstream.
      
      The recent fixups for HP laptops to support the mute LED made the
      speaker output silent on some machines.  It turned out that they use
      the NID 0x18 for the speaker while it's also used for controlling the
      LED via VREF bits although the current driver code blindly assumes
      that such a node is a mic pin (where 0x18 is usually so).
      
      This patch fixes the problem by only changing the VREF bits and
      keeping the other pin ctl bits.
      Reported-and-tested-by: default avatarHui Wang <hui.wang@canonical.com>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      a9d247b1
    • Christopher Friedt's avatar
      drm/vmwgfx: correct fb_fix_screeninfo.line_length · 8d487b18
      Christopher Friedt authored
      commit aa6de142 upstream.
      
      Previously, the vmwgfx_fb driver would allow users to call FBIOSET_VINFO, but it would not adjust
      the FINFO properly, resulting in distorted screen rendering. The patch corrects that behaviour.
      
      See https://bugs.gentoo.org/show_bug.cgi?id=494794 for examples.
      Signed-off-by: default avatarChristopher Friedt <chrisfriedt@gmail.com>
      Reviewed-by: default avatarThomas Hellstrom <thellstrom@vmware.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      8d487b18
    • Jeff Mahoney's avatar
      reiserfs: fix race in readdir · c7f1f8ab
      Jeff Mahoney authored
      commit 01d88857 upstream.
      
      jdm-20004 reiserfs_delete_xattrs: Couldn't delete all xattrs (-2)
      
      The -ENOENT is due to readdir calling dir_emit on the same entry twice.
      
      If the dir_emit callback sleeps and the tree is changed underneath us,
      we won't be able to trust deh_offset(deh) anymore. We need to save
      next_pos before we might sleep so we can find the next entry.
      Signed-off-by: default avatarJeff Mahoney <jeffm@suse.com>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      c7f1f8ab
    • Al Viro's avatar
      don't bother with {get,put}_write_access() on non-regular files · 1b821784
      Al Viro authored
      commit dd20908a upstream.
      
      it's pointless and actually leads to wrong behaviour in at least one
      moderately convoluted case (pipe(), close one end, try to get to
      another via /proc/*/fd and run into ETXTBUSY).
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      1b821784
    • Al Viro's avatar
      smarter propagate_mnt() · 1757ea5a
      Al Viro authored
      commit f2ebb3a9 upstream.
      
      The current mainline has copies propagated to *all* nodes, then
      tears down the copies we made for nodes that do not contain
      counterparts of the desired mountpoint.  That sets the right
      propagation graph for the copies (at teardown time we move
      the slaves of removed node to a surviving peer or directly
      to master), but we end up paying a fairly steep price in
      useless allocations.  It's fairly easy to create a situation
      where N calls of mount(2) create exactly N bindings, with
      O(N^2) vfsmounts allocated and freed in process.
      
      Fortunately, it is possible to avoid those allocations/freeings.
      The trick is to create copies in the right order and find which
      one would've eventually become a master with the current algorithm.
      It turns out to be possible in O(nodes getting propagation) time
      and with no extra allocations at all.
      
      One part is that we need to make sure that eventual master will be
      created before its slaves, so we need to walk the propagation
      tree in a different order - by peer groups.  And iterate through
      the peers before dealing with the next group.
      
      Another thing is finding the (earlier) copy that will be a master
      of one we are about to create; to do that we are (temporary) marking
      the masters of mountpoints we are attaching the copies to.
      
      Either we are in a peer of the last mountpoint we'd dealt with,
      or we have the following situation: we are attaching to mountpoint M,
      the last copy S_0 had been attached to M_0 and there are sequences
      S_0...S_n, M_0...M_n such that S_{i+1} is a master of S_{i},
      S_{i} mounted on M{i} and we need to create a slave of the first S_{k}
      such that M is getting propagation from M_{k}.  It means that the master
      of M_{k} will be among the sequence of masters of M.  On the
      other hand, the nearest marked node in that sequence will either
      be the master of M_{k} or the master of M_{k-1} (the latter -
      in the case if M_{k-1} is a slave of something M gets propagation
      from, but in a wrong peer group).
      
      So we go through the sequence of masters of M until we find
      a marked one (P).  Let N be the one before it.  Then we go through
      the sequence of masters of S_0 until we find one (say, S) mounted
      on a node D that has P as master and check if D is a peer of N.
      If it is, S will be the master of new copy, if not - the master of S
      will be.
      
      That's it for the hard part; the rest is fairly simple.  Iterator
      is in next_group(), handling of one prospective mountpoint is
      propagate_one().
      
      It seems to survive all tests and gives a noticably better performance
      than the current mainline for setups that are seriously using shared
      subtrees.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      1757ea5a
    • Maarten Lankhorst's avatar
      drm/qxl: unset a pointer in sync_obj_unref · 12e7a401
      Maarten Lankhorst authored
      commit 41ccec35 upstream.
      
      This fixes a BUG_ON(bo->sync_obj != NULL); in ttm_bo_release_list.
      Signed-off-by: default avatarMaarten Lankhorst <maarten.lankhorst@canonical.com>
      Signed-off-by: default avatarDave Airlie <airlied@redhat.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      12e7a401
    • Yann Droneaud's avatar
      IB/ehca: Returns an error on ib_copy_to_udata() failure · 40629dc9
      Yann Droneaud authored
      commit 5bdb0f02 upstream.
      
      In case of error when writing to userspace, function ehca_create_cq()
      does not set an error code before following its error path.
      
      This patch sets the error code to -EFAULT when ib_copy_to_udata()
      fails.
      
      This was caught when using spatch (aka. coccinelle)
      to rewrite call to ib_copy_{from,to}_udata().
      
      Link: https://www.gitorious.org/opteya/coccib/source/75ebf2c1033c64c1d81df13e4ae44ee99c989eba:ib_copy_udata.cocci
      Link: http://marc.info/?i=cover.1394485254.git.ydroneaud@opteya.comSigned-off-by: default avatarYann Droneaud <ydroneaud@opteya.com>
      Signed-off-by: default avatarRoland Dreier <roland@purestorage.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      40629dc9
    • Yann Droneaud's avatar
      IB/mthca: Return an error on ib_copy_to_udata() failure · 18ceb330
      Yann Droneaud authored
      commit 08e74c4b upstream.
      
      In case of error when writing to userspace, the function mthca_create_cq()
      does not set an error code before following its error path.
      
      This patch sets the error code to -EFAULT when ib_copy_to_udata() fails.
      
      This was caught when using spatch (aka. coccinelle)
      to rewrite call to ib_copy_{from,to}_udata().
      
      Link: https://www.gitorious.org/opteya/coccib/source/75ebf2c1033c64c1d81df13e4ae44ee99c989eba:ib_copy_udata.cocci
      Link: http://marc.info/?i=cover.1394485254.git.ydroneaud@opteya.comSigned-off-by: default avatarYann Droneaud <ydroneaud@opteya.com>
      Signed-off-by: default avatarRoland Dreier <roland@purestorage.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      18ceb330
    • Heiko Carstens's avatar
      s390/bitops,atomic: add missing memory barriers · 18e26989
      Heiko Carstens authored
      commit 0ccc8b7a upstream.
      
      When reworking the bitops and atomic ops I missed that those instructions
      that got atomic behaviour only perform a "specific-operand-serialization"
      instead of a full "serialization".
      The compare-and-swap instruction used before performs a full serialization
      before and after the instruction is executed, which means it has full
      memory barrier semantics.
      In order to give the new bitops and atomic ops functions also full memory
      barrier semantics add a "bcr 14,0" before and after each of those new
      instructions which performs full serialization as well.
      
      This restores memory barrier semantics for bitops and atomic ops functions
      which return values, like e.g. atomic_add_return(), but not for functions
      which do not return a value, like e.g. atomic_add().
      This is consistent to other architectures and what common code requires.
      Signed-off-by: default avatarHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      18e26989
    • Stanislav Kinsbursky's avatar
      nfsd: check passed socket's net matches NFSd superblock's one · 72dfbca8
      Stanislav Kinsbursky authored
      commit 30646394 upstream.
      
      There could be a case, when NFSd file system is mounted in network, different
      to socket's one, like below:
      
      "ip netns exec" creates new network and mount namespace, which duplicates NFSd
      mount point, created in init_net context. And thus NFS server stop in nested
      network context leads to RPCBIND client destruction in init_net.
      Then, on NFSd start in nested network context, rpc.nfsd process creates socket
      in nested net and passes it into "write_ports", which leads to RPCBIND sockets
      creation in init_net context because of the same reason (NFSd monut point was
      created in init_net context). An attempt to register passed socket in nested
      net leads to panic, because no RPCBIND client present in nexted network
      namespace.
      
      This patch add check that passed socket's net matches NFSd superblock's one.
      And returns -EINVAL error to user psace otherwise.
      
      v2: Put socket on exit.
      Reported-by: default avatarWeng Meiling <wengmeiling.weng@huawei.com>
      Signed-off-by: default avatarStanislav Kinsbursky <skinsbursky@parallels.com>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      72dfbca8
    • W. Trevor King's avatar
      ALSA: hda - Enable beep for ASUS 1015E · 63dd02a7
      W. Trevor King authored
      commit a4b7f21d upstream.
      
      The `lspci -nnvv` output contains (wrapped for line length):
      
        00:1b.0 Audio device [0403]:
          Intel Corporation 7 Series/C210 Series Chipset Family
          High Definition Audio Controller [8086:1e20] (rev 04)
              Subsystem: ASUSTeK Computer Inc. Device [1043:115d]
      Signed-off-by: default avatarW. Trevor King <wking@tremily.us>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      63dd02a7
    • Huacai Chen's avatar
      MIPS: Hibernate: Flush TLB entries in swsusp_arch_resume() · ee98dc5c
      Huacai Chen authored
      commit c14af233 upstream.
      
      The original MIPS hibernate code flushes cache and TLB entries in
      swsusp_arch_resume(). But they are removed in Commit 44eeab67
      (MIPS: Hibernation: Remove SMP TLB and cacheflushing code.). A cross-
      CPU flush is surely unnecessary because all but the local CPU have
      already been disabled. But a local flush (at least the TLB flush) is
      needed. When we do hibernation on Loongson-3 with an E1000E NIC, it is
      very easy to produce a kernel panic (kernel page fault, or unaligned
      access). The root cause is E1000E driver use vzalloc_node() to allocate
      pages, the stale TLB entries of the booting kernel will be misused by
      the resumed target kernel.
      Signed-off-by: default avatarHuacai Chen <chenhc@lemote.com>
      Cc: John Crispin <john@phrozen.org>
      Cc: Steven J. Hill <Steven.Hill@imgtec.com>
      Cc: Aurelien Jarno <aurelien@aurel32.net>
      Cc: linux-mips@linux-mips.org
      Cc: Fuxin Zhang <zhangfx@lemote.com>
      Cc: Zhangjin Wu <wuzhangjin@gmail.com>
      Patchwork: https://patchwork.linux-mips.org/patch/6643/Signed-off-by: default avatarRalf Baechle <ralf@linux-mips.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      ee98dc5c
    • J. Bruce Fields's avatar
      nfsd4: fix setclientid encode size · 7f6d446a
      J. Bruce Fields authored
      commit 480efaee upstream.
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      7f6d446a
    • Trond Myklebust's avatar
      NFSv4: Fix a use-after-free problem in open() · 82dfe776
      Trond Myklebust authored
      commit e911b815 upstream.
      
      If we interrupt the nfs4_wait_for_completion_rpc_task() call in
      nfs4_run_open_task(), then we don't prevent the RPC call from
      completing. So freeing up the opendata->f_attr.mdsthreshold
      in the error path in _nfs4_do_open() leads to a use-after-free
      when the XDR decoder tries to decode the mdsthreshold information
      from the server.
      
      Fixes: 82be417a (NFSv4.1 cache mdsthreshold values on OPEN)
      Tested-by: default avatarSteve Dickson <SteveD@redhat.com>
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      82dfe776
    • Antti Palosaari's avatar
      [media] em28xx: fix PCTV 290e LNA oops · 90327113
      Antti Palosaari authored
      commit 3ec40dcf upstream.
      
      Pointer to device state has been moved to different location during
      some change. PCTV 290e LNA function still uses old pointer, carried
      over FE priv, and it crash.
      Reported-by: default avatarJanne Kujanpää <jikuja@iki.fi>
      Signed-off-by: default avatarAntti Palosaari <crope@iki.fi>
      Signed-off-by: default avatarMauro Carvalho Chehab <m.chehab@samsung.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      90327113
    • Mike Snitzer's avatar
      dm thin: fix dangling bio in process_deferred_bios error path · bb2508f4
      Mike Snitzer authored
      commit fe76cd88 upstream.
      
      If unable to ensure_next_mapping() we must add the current bio, which
      was removed from the @bios list via bio_list_pop, back to the
      deferred_bios list before all the remaining @bios.
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Acked-by: default avatarJoe Thornber <ejt@redhat.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      bb2508f4
    • Jani Nikula's avatar
      drm/i915/tv: fix gen4 composite s-video tv-out · bc447918
      Jani Nikula authored
      commit e1f23f3d upstream.
      
      This is *not* bisected, but the likely regression is
      
      commit c3561438
      Author: Zhao Yakui <yakui.zhao@intel.com>
      Date:   Tue Nov 24 09:48:48 2009 +0800
      
          drm/i915: Don't set up the TV port if it isn't in the BIOS table.
      
      The commit does not check for all TV device types that might be present
      in the VBT, disabling TV out for the missing ones. Add composite
      S-video.
      Reported-and-tested-by: default avatarMatthew Khouzam <matthew.khouzam@gmail.com>
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=73362Signed-off-by: default avatarJani Nikula <jani.nikula@intel.com>
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      bc447918
    • Jeff Layton's avatar
      lockd: ensure we tear down any live sockets when socket creation fails during lockd_up · 84b017e1
      Jeff Layton authored
      commit 679b033d upstream.
      
      We had a Fedora ABRT report with a stack trace like this:
      
      kernel BUG at net/sunrpc/svc.c:550!
      invalid opcode: 0000 [#1] SMP
      [...]
      CPU: 2 PID: 913 Comm: rpc.nfsd Not tainted 3.13.6-200.fc20.x86_64 #1
      Hardware name: Hewlett-Packard HP ProBook 4740s/1846, BIOS 68IRR Ver. F.40 01/29/2013
      task: ffff880146b00000 ti: ffff88003f9b8000 task.ti: ffff88003f9b8000
      RIP: 0010:[<ffffffffa0305fa8>]  [<ffffffffa0305fa8>] svc_destroy+0x128/0x130 [sunrpc]
      RSP: 0018:ffff88003f9b9de0  EFLAGS: 00010206
      RAX: ffff88003f829628 RBX: ffff88003f829600 RCX: 00000000000041ee
      RDX: 0000000000000000 RSI: 0000000000000286 RDI: 0000000000000286
      RBP: ffff88003f9b9de8 R08: 0000000000017360 R09: ffff88014fa97360
      R10: ffffffff8114ce57 R11: ffffea00051c9c00 R12: ffff88003f829600
      R13: 00000000ffffff9e R14: ffffffff81cc7cc0 R15: 0000000000000000
      FS:  00007f4fde284840(0000) GS:ffff88014fa80000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f4fdf5192f8 CR3: 00000000a569a000 CR4: 00000000001407e0
      Stack:
       ffff88003f792300 ffff88003f9b9e18 ffffffffa02de02a 0000000000000000
       ffffffff81cc7cc0 ffff88003f9cb000 0000000000000008 ffff88003f9b9e60
       ffffffffa033bb35 ffffffff8131c86c ffff88003f9cb000 ffff8800a5715008
      Call Trace:
       [<ffffffffa02de02a>] lockd_up+0xaa/0x330 [lockd]
       [<ffffffffa033bb35>] nfsd_svc+0x1b5/0x2f0 [nfsd]
       [<ffffffff8131c86c>] ? simple_strtoull+0x2c/0x50
       [<ffffffffa033c630>] ? write_pool_threads+0x280/0x280 [nfsd]
       [<ffffffffa033c6bb>] write_threads+0x8b/0xf0 [nfsd]
       [<ffffffff8114efa4>] ? __get_free_pages+0x14/0x50
       [<ffffffff8114eff6>] ? get_zeroed_page+0x16/0x20
       [<ffffffff811dec51>] ? simple_transaction_get+0xb1/0xd0
       [<ffffffffa033c098>] nfsctl_transaction_write+0x48/0x80 [nfsd]
       [<ffffffff811b8b34>] vfs_write+0xb4/0x1f0
       [<ffffffff811c3f99>] ? putname+0x29/0x40
       [<ffffffff811b9569>] SyS_write+0x49/0xa0
       [<ffffffff810fc2a6>] ? __audit_syscall_exit+0x1f6/0x2a0
       [<ffffffff816962e9>] system_call_fastpath+0x16/0x1b
      Code: 31 c0 e8 82 db 37 e1 e9 2a ff ff ff 48 8b 07 8b 57 14 48 c7 c7 d5 c6 31 a0 48 8b 70 20 31 c0 e8 65 db 37 e1 e9 f4 fe ff ff 0f 0b <0f> 0b 66 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 41 56 41 55
      RIP  [<ffffffffa0305fa8>] svc_destroy+0x128/0x130 [sunrpc]
       RSP <ffff88003f9b9de0>
      
      Evidently, we created some lockd sockets and then failed to create
      others. make_socks then returned an error and we tried to tear down the
      svc, but svc->sv_permsocks was not empty so we ended up tripping over
      the BUG() in svc_destroy().
      
      Fix this by ensuring that we tear down any live sockets we created when
      socket creation is going to return an error.
      
      Fixes: 786185b5 (SUNRPC: move per-net operations from...)
      Reported-by: default avatarRaphos <raphoszap@laposte.net>
      Signed-off-by: default avatarJeff Layton <jlayton@redhat.com>
      Reviewed-by: default avatarStanislav Kinsbursky <skinsbursky@parallels.com>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      84b017e1
    • Kinglong Mee's avatar
      NFSD: Traverse unconfirmed client through hash-table · 59fd0ab1
      Kinglong Mee authored
      commit 2b905635 upstream.
      
      When stopping nfsd, I got BUG messages, and soft lockup messages,
      The problem is cuased by double rb_erase() in nfs4_state_destroy_net()
      and destroy_client().
      
      This patch just let nfsd traversing unconfirmed client through
      hash-table instead of rbtree.
      
      [ 2325.021995] BUG: unable to handle kernel NULL pointer dereference at
                (null)
      [ 2325.022809] IP: [<ffffffff8133c18c>] rb_erase+0x14c/0x390
      [ 2325.022982] PGD 7a91b067 PUD 7a33d067 PMD 0
      [ 2325.022982] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
      [ 2325.022982] Modules linked in: nfsd(OF) cfg80211 rfkill bridge stp
      llc snd_intel8x0 snd_ac97_codec ac97_bus auth_rpcgss nfs_acl serio_raw
      e1000 i2c_piix4 ppdev snd_pcm snd_timer lockd pcspkr joydev parport_pc
      snd parport i2c_core soundcore microcode sunrpc ata_generic pata_acpi
      [last unloaded: nfsd]
      [ 2325.022982] CPU: 1 PID: 2123 Comm: nfsd Tainted: GF          O
      3.14.0-rc8+ #2
      [ 2325.022982] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
      VirtualBox 12/01/2006
      [ 2325.022982] task: ffff88007b384800 ti: ffff8800797f6000 task.ti:
      ffff8800797f6000
      [ 2325.022982] RIP: 0010:[<ffffffff8133c18c>]  [<ffffffff8133c18c>]
      rb_erase+0x14c/0x390
      [ 2325.022982] RSP: 0018:ffff8800797f7d98  EFLAGS: 00010246
      [ 2325.022982] RAX: ffff880079c1f010 RBX: ffff880079f4c828 RCX:
      0000000000000000
      [ 2325.022982] RDX: 0000000000000000 RSI: ffff880079bcb070 RDI:
      ffff880079f4c810
      [ 2325.022982] RBP: ffff8800797f7d98 R08: 0000000000000000 R09:
      ffff88007964fc70
      [ 2325.022982] R10: 0000000000000000 R11: 0000000000000400 R12:
      ffff880079f4c800
      [ 2325.022982] R13: ffff880079bcb000 R14: ffff8800797f7da8 R15:
      ffff880079f4c860
      [ 2325.022982] FS:  0000000000000000(0000) GS:ffff88007f900000(0000)
      knlGS:0000000000000000
      [ 2325.022982] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [ 2325.022982] CR2: 0000000000000000 CR3: 000000007a3ef000 CR4:
      00000000000006e0
      [ 2325.022982] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
      0000000000000000
      [ 2325.022982] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
      0000000000000400
      [ 2325.022982] Stack:
      [ 2325.022982]  ffff8800797f7de0 ffffffffa0191c6e ffff8800797f7da8
      ffff8800797f7da8
      [ 2325.022982]  ffff880079f4c810 ffff880079bcb000 ffffffff81cc26c0
      ffff880079c1f010
      [ 2325.022982]  ffff880079bcb070 ffff8800797f7e28 ffffffffa01977f2
      ffff8800797f7df0
      [ 2325.022982] Call Trace:
      [ 2325.022982]  [<ffffffffa0191c6e>] destroy_client+0x32e/0x3b0 [nfsd]
      [ 2325.022982]  [<ffffffffa01977f2>] nfs4_state_shutdown_net+0x1a2/0x220
      [nfsd]
      [ 2325.022982]  [<ffffffffa01700b8>] nfsd_shutdown_net+0x38/0x70 [nfsd]
      [ 2325.022982]  [<ffffffffa017013e>] nfsd_last_thread+0x4e/0x80 [nfsd]
      [ 2325.022982]  [<ffffffffa001f1eb>] svc_shutdown_net+0x2b/0x30 [sunrpc]
      [ 2325.022982]  [<ffffffffa017064b>] nfsd_destroy+0x5b/0x80 [nfsd]
      [ 2325.022982]  [<ffffffffa0170773>] nfsd+0x103/0x130 [nfsd]
      [ 2325.022982]  [<ffffffffa0170670>] ? nfsd_destroy+0x80/0x80 [nfsd]
      [ 2325.022982]  [<ffffffff810a8232>] kthread+0xd2/0xf0
      [ 2325.022982]  [<ffffffff810a8160>] ? insert_kthread_work+0x40/0x40
      [ 2325.022982]  [<ffffffff816c493c>] ret_from_fork+0x7c/0xb0
      [ 2325.022982]  [<ffffffff810a8160>] ? insert_kthread_work+0x40/0x40
      [ 2325.022982] Code: 48 83 e1 fc 48 89 10 0f 84 02 01 00 00 48 3b 41 10
      0f 84 08 01 00 00 48 89 51 08 48 89 fa e9 74 ff ff ff 0f 1f 40 00 48 8b
      50 10 <f6> 02 01 0f 84 93 00 00 00 48 8b 7a 10 48 85 ff 74 05 f6 07 01
      [ 2325.022982] RIP  [<ffffffff8133c18c>] rb_erase+0x14c/0x390
      [ 2325.022982]  RSP <ffff8800797f7d98>
      [ 2325.022982] CR2: 0000000000000000
      [ 2325.022982] ---[ end trace 28c27ed011655e57 ]---
      
      [  228.064071] BUG: soft lockup - CPU#0 stuck for 22s! [nfsd:558]
      [  228.064428] Modules linked in: ip6t_rpfilter ip6t_REJECT cfg80211
      xt_conntrack rfkill ebtable_nat ebtable_broute bridge stp llc
      ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6
      nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw
      ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4
      nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security
      iptable_raw nfsd(OF) auth_rpcgss nfs_acl lockd snd_intel8x0
      snd_ac97_codec ac97_bus joydev snd_pcm snd_timer e1000 sunrpc snd ppdev
      parport_pc serio_raw pcspkr i2c_piix4 microcode parport soundcore
      i2c_core ata_generic pata_acpi
      [  228.064539] CPU: 0 PID: 558 Comm: nfsd Tainted: GF          O
      3.14.0-rc8+ #2
      [  228.064539] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
      VirtualBox 12/01/2006
      [  228.064539] task: ffff880076adec00 ti: ffff880074616000 task.ti:
      ffff880074616000
      [  228.064539] RIP: 0010:[<ffffffff8133ba17>]  [<ffffffff8133ba17>]
      rb_next+0x27/0x50
      [  228.064539] RSP: 0018:ffff880074617de0  EFLAGS: 00000282
      [  228.064539] RAX: ffff880074478010 RBX: ffff88007446f860 RCX:
      0000000000000014
      [  228.064539] RDX: ffff880074478010 RSI: 0000000000000000 RDI:
      ffff880074478010
      [  228.064539] RBP: ffff880074617de0 R08: 0000000000000000 R09:
      0000000000000012
      [  228.064539] R10: 0000000000000001 R11: ffffffffffffffec R12:
      ffffea0001d11a00
      [  228.064539] R13: ffff88007f401400 R14: ffff88007446f800 R15:
      ffff880074617d50
      [  228.064539] FS:  0000000000000000(0000) GS:ffff88007f800000(0000)
      knlGS:0000000000000000
      [  228.064539] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [  228.064539] CR2: 00007fe9ac6ec000 CR3: 000000007a5d6000 CR4:
      00000000000006f0
      [  228.064539] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
      0000000000000000
      [  228.064539] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
      0000000000000400
      [  228.064539] Stack:
      [  228.064539]  ffff880074617e28 ffffffffa01ab7db ffff880074617df0
      ffff880074617df0
      [  228.064539]  ffff880079273000 ffffffff81cc26c0 ffffffff81cc26c0
      0000000000000000
      [  228.064539]  0000000000000000 ffff880074617e48 ffffffffa01840b8
      ffffffff81cc26c0
      [  228.064539] Call Trace:
      [  228.064539]  [<ffffffffa01ab7db>] nfs4_state_shutdown_net+0x18b/0x220
      [nfsd]
      [  228.064539]  [<ffffffffa01840b8>] nfsd_shutdown_net+0x38/0x70 [nfsd]
      [  228.064539]  [<ffffffffa018413e>] nfsd_last_thread+0x4e/0x80 [nfsd]
      [  228.064539]  [<ffffffffa00aa1eb>] svc_shutdown_net+0x2b/0x30 [sunrpc]
      [  228.064539]  [<ffffffffa018464b>] nfsd_destroy+0x5b/0x80 [nfsd]
      [  228.064539]  [<ffffffffa0184773>] nfsd+0x103/0x130 [nfsd]
      [  228.064539]  [<ffffffffa0184670>] ? nfsd_destroy+0x80/0x80 [nfsd]
      [  228.064539]  [<ffffffff810a8232>] kthread+0xd2/0xf0
      [  228.064539]  [<ffffffff810a8160>] ? insert_kthread_work+0x40/0x40
      [  228.064539]  [<ffffffff816c493c>] ret_from_fork+0x7c/0xb0
      [  228.064539]  [<ffffffff810a8160>] ? insert_kthread_work+0x40/0x40
      [  228.064539] Code: 1f 44 00 00 55 48 8b 17 48 89 e5 48 39 d7 74 3b 48
      8b 47 08 48 85 c0 75 0e eb 25 66 0f 1f 84 00 00 00 00 00 48 89 d0 48 8b
      50 10 <48> 85 d2 75 f4 5d c3 66 90 48 3b 78 08 75 f6 48 8b 10 48 89 c7
      
      Fixes: ac55fdc4 (nfsd: move the confirmed and unconfirmed hlists...)
      Signed-off-by: default avatarKinglong Mee <kinglongmee@gmail.com>
      Reviewed-by: default avatarJeff Layton <jlayton@redhat.com>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      59fd0ab1