1. 31 Jan, 2019 25 commits
  2. 26 Jan, 2019 15 commits
    • Greg Kroah-Hartman's avatar
      Linux 4.14.96 · e6608e1f
      Greg Kroah-Hartman authored
      e6608e1f
    • Corey Minyard's avatar
      ipmi:ssif: Fix handling of multi-part return messages · 7c307d32
      Corey Minyard authored
      commit 7d6380cd upstream.
      
      The block number was not being compared right, it was off by one
      when checking the response.
      
      Some statistics wouldn't be incremented properly in some cases.
      
      Check to see if that middle-part messages always have 31 bytes of
      data.
      Signed-off-by: default avatarCorey Minyard <cminyard@mvista.com>
      Cc: stable@vger.kernel.org # 4.4
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7c307d32
    • Marc Zyngier's avatar
      PCI: dwc: Move interrupt acking into the proper callback · 413cb66b
      Marc Zyngier authored
      commit 3f7bb2ec upstream.
      
      The write to the status register is really an ACK for the HW,
      and should be treated as such by the driver. Let's move it to the
      irq_ack() callback, which will prevent people from moving it around
      in order to paper over other bugs.
      
      Fixes: 8c934095 ("PCI: dwc: Clear MSI interrupt status after it is handled,
      not before")
      Fixes: 7c5925af ("PCI: dwc: Move MSI IRQs allocation to IRQ domains
      hierarchical API")
      Link: https://lore.kernel.org/linux-pci/20181113225734.8026-1-marc.zyngier@arm.com/Reported-by: default avatarTrent Piepho <tpiepho@impinj.com>
      Tested-by: default avatarNiklas Cassel <niklas.cassel@linaro.org>
      Tested-by: default avatarGustavo Pimentel <gustavo.pimentel@synopsys.com>
      Tested-by: default avatarStanimir Varbanov <svarbanov@mm-sol.com>
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      [lorenzo.pieralisi@arm.com: updated commit log]
      Signed-off-by: default avatarLorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      413cb66b
    • Zhenyu Wang's avatar
      drm/i915/gvt: Fix mmap range check · e89ec9b9
      Zhenyu Wang authored
      commit 51b00d85 upstream.
      
      This is to fix missed mmap range check on vGPU bar2 region
      and only allow to map vGPU allocated GMADDR range, which means
      user space should support sparse mmap to get proper offset for
      mmap vGPU aperture. And this takes care of actual pgoff in mmap
      request as original code always does from beginning of vGPU
      aperture.
      
      Fixes: 659643f7 ("drm/i915/gvt/kvmgt: add vfio/mdev support to KVMGT")
      Cc: "Monroy, Rodrigo Axel" <rodrigo.axel.monroy@intel.com>
      Cc: "Orrala Contreras, Alfredo" <alfredo.orrala.contreras@intel.com>
      Cc: stable@vger.kernel.org # v4.10+
      Reviewed-by: default avatarHang Yuan <hang.yuan@intel.com>
      Signed-off-by: default avatarZhenyu Wang <zhenyuw@linux.intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e89ec9b9
    • Steve French's avatar
      cifs: allow disabling insecure dialects in the config · bfbd7e95
      Steve French authored
      commit 7420451f upstream.
      
      allow disabling cifs (SMB1 ie vers=1.0) and vers=2.0 in the
      config for the build of cifs.ko if want to always prevent mounting
      with these less secure dialects.
      Signed-off-by: default avatarSteve French <stfrench@microsoft.com>
      Reviewed-by: default avatarAurelien Aptel <aaptel@suse.com>
      Reviewed-by: default avatarJeremy Allison <jra@samba.org>
      Cc: Alakesh Haloi <alakeshh@amazon.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bfbd7e95
    • Corey Minyard's avatar
      ipmi:pci: Blacklist a Realtek "IPMI" device · de616eb2
      Corey Minyard authored
      commit bc48fa1b upstream.
      
      Realtek has some sort of "Virtual" IPMI device on the PCI bus as a
      KCS controller, but whatever it is, it's not one.  Ignore it if seen.
      
      [ Commit 13d0b35c (ipmi_si: Move PCI setup to another file) from Linux
        4.15-rc1 has not been back ported, so the PCI code is still in
        `drivers/char/ipmi/ipmi_si_intf.c`, requiring to apply the commit
        manually.
      
        This fixes a 100 s boot delay on the HP EliteDesk 705 G4 MT with Linux
        4.14.94. ]
      Reported-by: default avatarChris Chiu <chiu@endlessm.com>
      Signed-off-by: default avatarCorey Minyard <cminyard@mvista.com>
      Tested-by: default avatarDaniel Drake <drake@endlessm.com>
      Signed-off-by: default avatarPaul Menzel <pmenzel@molgen.mpg.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      de616eb2
    • Scott Mayhew's avatar
      nfs: fix a deadlock in nfs client initialization · 53818c76
      Scott Mayhew authored
      commit c156618e upstream.
      
      The following deadlock can occur between a process waiting for a client
      to initialize in while walking the client list during nfsv4 server trunking
      detection and another process waiting for the nfs_clid_init_mutex so it
      can initialize that client:
      
      Process 1                               Process 2
      ---------                               ---------
      spin_lock(&nn->nfs_client_lock);
      list_add_tail(&CLIENTA->cl_share_link,
              &nn->nfs_client_list);
      spin_unlock(&nn->nfs_client_lock);
                                              spin_lock(&nn->nfs_client_lock);
                                              list_add_tail(&CLIENTB->cl_share_link,
                                                      &nn->nfs_client_list);
                                              spin_unlock(&nn->nfs_client_lock);
                                              mutex_lock(&nfs_clid_init_mutex);
                                              nfs41_walk_client_list(clp, result, cred);
                                              nfs_wait_client_init_complete(CLIENTA);
      (waiting for nfs_clid_init_mutex)
      
      Make sure nfs_match_client() only evaluates clients that have completed
      initialization in order to prevent that deadlock.
      
      This patch also fixes v4.0 trunking behavior by not marking the client
      NFS_CS_READY until the clientid has been confirmed.
      Signed-off-by: default avatarScott Mayhew <smayhew@redhat.com>
      Signed-off-by: default avatarAnna Schumaker <Anna.Schumaker@Netapp.com>
      Signed-off-by: default avatarQian Lu <luqia@amazon.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      53818c76
    • Michal Hocko's avatar
      mm, proc: be more verbose about unstable VMA flags in /proc/<pid>/smaps · 696ce77b
      Michal Hocko authored
      [ Upstream commit 7550c607 ]
      
      Patch series "THP eligibility reporting via proc".
      
      This series of three patches aims at making THP eligibility reporting much
      more robust and long term sustainable.  The trigger for the change is a
      regression report [2] and the long follow up discussion.  In short the
      specific application didn't have good API to query whether a particular
      mapping can be backed by THP so it has used VMA flags to workaround that.
      These flags represent a deep internal state of VMAs and as such they
      should be used by userspace with a great deal of caution.
      
      A similar has happened for [3] when users complained that VM_MIXEDMAP is
      no longer set on DAX mappings.  Again a lack of a proper API led to an
      abuse.
      
      The first patch in the series tries to emphasise that that the semantic of
      flags might change and any application consuming those should be really
      careful.
      
      The remaining two patches provide a more suitable interface to address [2]
      and provide a consistent API to query the THP status both for each VMA and
      process wide as well.  [1]
      
      http://lkml.kernel.org/r/20181120103515.25280-1-mhocko@kernel.org [2]
      http://lkml.kernel.org/r/http://lkml.kernel.org/r/alpine.DEB.2.21.1809241054050.224429@chino.kir.corp.google.com
      [3] http://lkml.kernel.org/r/20181002100531.GC4135@quack2.suse.cz
      
      This patch (of 3):
      
      Even though vma flags exported via /proc/<pid>/smaps are explicitly
      documented to be not guaranteed for future compatibility the warning
      doesn't go far enough because it doesn't mention semantic changes to those
      flags.  And they are important as well because these flags are a deep
      implementation internal to the MM code and the semantic might change at
      any time.
      
      Let's consider two recent examples:
      http://lkml.kernel.org/r/20181002100531.GC4135@quack2.suse.cz
      : commit e1fb4a08 "dax: remove VM_MIXEDMAP for fsdax and device dax" has
      : removed VM_MIXEDMAP flag from DAX VMAs. Now our testing shows that in the
      : mean time certain customer of ours started poking into /proc/<pid>/smaps
      : and looks at VMA flags there and if VM_MIXEDMAP is missing among the VMA
      : flags, the application just fails to start complaining that DAX support is
      : missing in the kernel.
      
      http://lkml.kernel.org/r/alpine.DEB.2.21.1809241054050.224429@chino.kir.corp.google.com
      : Commit 18600332 ("mm: make PR_SET_THP_DISABLE immediately active")
      : introduced a regression in that userspace cannot always determine the set
      : of vmas where thp is ineligible.
      : Userspace relies on the "nh" flag being emitted as part of /proc/pid/smaps
      : to determine if a vma is eligible to be backed by hugepages.
      : Previous to this commit, prctl(PR_SET_THP_DISABLE, 1) would cause thp to
      : be disabled and emit "nh" as a flag for the corresponding vmas as part of
      : /proc/pid/smaps.  After the commit, thp is disabled by means of an mm
      : flag and "nh" is not emitted.
      : This causes smaps parsing libraries to assume a vma is eligible for thp
      : and ends up puzzling the user on why its memory is not backed by thp.
      
      In both cases userspace was relying on a semantic of a specific VMA flag.
      The primary reason why that happened is a lack of a proper interface.
      While this has been worked on and it will be fixed properly, it seems that
      our wording could see some refinement and be more vocal about semantic
      aspect of these flags as well.
      
      Link: http://lkml.kernel.org/r/20181211143641.3503-2-mhocko@kernel.orgSigned-off-by: default avatarMichal Hocko <mhocko@suse.com>
      Acked-by: default avatarJan Kara <jack@suse.cz>
      Acked-by: default avatarDan Williams <dan.j.williams@intel.com>
      Acked-by: default avatarDavid Rientjes <rientjes@google.com>
      Acked-by: default avatarMike Rapoport <rppt@linux.ibm.com>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Paul Oppenheimer <bepvte@gmail.com>
      Cc: William Kucharski <william.kucharski@oracle.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      696ce77b
    • Aaron Lu's avatar
      mm/swap: use nr_node_ids for avail_lists in swap_info_struct · 4fb12a08
      Aaron Lu authored
      [ Upstream commit 66f71da9 ]
      
      Since a2468cc9 ("swap: choose swap device according to numa node"),
      avail_lists field of swap_info_struct is changed to an array with
      MAX_NUMNODES elements.  This made swap_info_struct size increased to 40KiB
      and needs an order-4 page to hold it.
      
      This is not optimal in that:
      1 Most systems have way less than MAX_NUMNODES(1024) nodes so it
        is a waste of memory;
      2 It could cause swapon failure if the swap device is swapped on
        after system has been running for a while, due to no order-4
        page is available as pointed out by Vasily Averin.
      
      Solve the above two issues by using nr_node_ids(which is the actual
      possible node number the running system has) for avail_lists instead of
      MAX_NUMNODES.
      
      nr_node_ids is unknown at compile time so can't be directly used when
      declaring this array.  What I did here is to declare avail_lists as zero
      element array and allocate space for it when allocating space for
      swap_info_struct.  The reason why keep using array but not pointer is
      plist_for_each_entry needs the field to be part of the struct, so pointer
      will not work.
      
      This patch is on top of Vasily Averin's fix commit.  I think the use of
      kvzalloc for swap_info_struct is still needed in case nr_node_ids is
      really big on some systems.
      
      Link: http://lkml.kernel.org/r/20181115083847.GA11129@intel.comSigned-off-by: default avatarAaron Lu <aaron.lu@intel.com>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Vasily Averin <vvs@virtuozzo.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      4fb12a08
    • Brian Foster's avatar
      mm/page-writeback.c: don't break integrity writeback on ->writepage() error · 694c20fe
      Brian Foster authored
      [ Upstream commit 3fa750dc ]
      
      write_cache_pages() is used in both background and integrity writeback
      scenarios by various filesystems.  Background writeback is mostly
      concerned with cleaning a certain number of dirty pages based on various
      mm heuristics.  It may not write the full set of dirty pages or wait for
      I/O to complete.  Integrity writeback is responsible for persisting a set
      of dirty pages before the writeback job completes.  For example, an
      fsync() call must perform integrity writeback to ensure data is on disk
      before the call returns.
      
      write_cache_pages() unconditionally breaks out of its processing loop in
      the event of a ->writepage() error.  This is fine for background
      writeback, which had no strict requirements and will eventually come
      around again.  This can cause problems for integrity writeback on
      filesystems that might need to clean up state associated with failed page
      writeouts.  For example, XFS performs internal delayed allocation
      accounting before returning a ->writepage() error, where applicable.  If
      the current writeback happens to be associated with an unmount and
      write_cache_pages() completes the writeback prematurely due to error, the
      filesystem is unmounted in an inconsistent state if dirty+delalloc pages
      still exist.
      
      To handle this problem, update write_cache_pages() to always process the
      full set of pages for integrity writeback regardless of ->writepage()
      errors.  Save the first encountered error and return it to the caller once
      complete.  This facilitates XFS (or any other fs that expects integrity
      writeback to process the entire set of dirty pages) to clean up its
      internal state completely in the event of persistent mapping errors.
      Background writeback continues to exit on the first error encountered.
      
      [akpm@linux-foundation.org: fix typo in comment]
      Link: http://lkml.kernel.org/r/20181116134304.32440-1-bfoster@redhat.comSigned-off-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      694c20fe
    • Junxiao Bi's avatar
      ocfs2: fix panic due to unrecovered local alloc · f976e59e
      Junxiao Bi authored
      [ Upstream commit 532e1e54 ]
      
      mount.ocfs2 ignore the inconsistent error that journal is clean but
      local alloc is unrecovered.  After mount, local alloc not empty, then
      reserver cluster didn't alloc a new local alloc window, reserveration
      map is empty(ocfs2_reservation_map.m_bitmap_len = 0), that triggered the
      following panic.
      
      This issue was reported at
      
        https://oss.oracle.com/pipermail/ocfs2-devel/2015-May/010854.html
      
      and was advised to fixed during mount.  But this is a very unusual
      inconsistent state, usually journal dirty flag should be cleared at the
      last stage of umount until every other things go right.  We may need do
      further debug to check that.  Any way to avoid possible futher
      corruption, mount should be abort and fsck should be run.
      
        (mount.ocfs2,1765,1):ocfs2_load_local_alloc:353 ERROR: Local alloc hasn't been recovered!
        found = 6518, set = 6518, taken = 8192, off = 15912372
        ocfs2: Mounting device (202,64) on (node 0, slot 3) with ordered data mode.
        o2dlm: Joining domain 89CEAC63CC4F4D03AC185B44E0EE0F3F ( 0 1 2 3 4 5 6 8 ) 8 nodes
        ocfs2: Mounting device (202,80) on (node 0, slot 3) with ordered data mode.
        o2hb: Region 89CEAC63CC4F4D03AC185B44E0EE0F3F (xvdf) is now a quorum device
        o2net: Accepted connection from node yvwsoa17p (num 7) at 172.22.77.88:7777
        o2dlm: Node 7 joins domain 64FE421C8C984E6D96ED12C55FEE2435 ( 0 1 2 3 4 5 6 7 8 ) 9 nodes
        o2dlm: Node 7 joins domain 89CEAC63CC4F4D03AC185B44E0EE0F3F ( 0 1 2 3 4 5 6 7 8 ) 9 nodes
        ------------[ cut here ]------------
        kernel BUG at fs/ocfs2/reservations.c:507!
        invalid opcode: 0000 [#1] SMP
        Modules linked in: ocfs2 rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs fscache lockd grace ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs sunrpc ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 ovmapi ppdev parport_pc parport xen_netfront fb_sys_fops sysimgblt sysfillrect syscopyarea acpi_cpufreq pcspkr i2c_piix4 i2c_core sg ext4 jbd2 mbcache2 sr_mod cdrom xen_blkfront pata_acpi ata_generic ata_piix floppy dm_mirror dm_region_hash dm_log dm_mod
        CPU: 0 PID: 4349 Comm: startWebLogic.s Not tainted 4.1.12-124.19.2.el6uek.x86_64 #2
        Hardware name: Xen HVM domU, BIOS 4.4.4OVM 09/06/2018
        task: ffff8803fb04e200 ti: ffff8800ea4d8000 task.ti: ffff8800ea4d8000
        RIP: 0010:[<ffffffffa05e96a8>]  [<ffffffffa05e96a8>] __ocfs2_resv_find_window+0x498/0x760 [ocfs2]
        Call Trace:
          ocfs2_resmap_resv_bits+0x10d/0x400 [ocfs2]
          ocfs2_claim_local_alloc_bits+0xd0/0x640 [ocfs2]
          __ocfs2_claim_clusters+0x178/0x360 [ocfs2]
          ocfs2_claim_clusters+0x1f/0x30 [ocfs2]
          ocfs2_convert_inline_data_to_extents+0x634/0xa60 [ocfs2]
          ocfs2_write_begin_nolock+0x1c6/0x1da0 [ocfs2]
          ocfs2_write_begin+0x13e/0x230 [ocfs2]
          generic_perform_write+0xbf/0x1c0
          __generic_file_write_iter+0x19c/0x1d0
          ocfs2_file_write_iter+0x589/0x1360 [ocfs2]
          __vfs_write+0xb8/0x110
          vfs_write+0xa9/0x1b0
          SyS_write+0x46/0xb0
          system_call_fastpath+0x18/0xd7
        Code: ff ff 8b 75 b8 39 75 b0 8b 45 c8 89 45 98 0f 84 e5 fe ff ff 45 8b 74 24 18 41 8b 54 24 1c e9 56 fc ff ff 85 c0 0f 85 48 ff ff ff <0f> 0b 48 8b 05 cf c3 de ff 48 ba 00 00 00 00 00 00 00 10 48 85
        RIP   __ocfs2_resv_find_window+0x498/0x760 [ocfs2]
         RSP <ffff8800ea4db668>
        ---[ end trace 566f07529f2edf3c ]---
        Kernel panic - not syncing: Fatal exception
        Kernel Offset: disabled
      
      Link: http://lkml.kernel.org/r/20181121020023.3034-2-junxiao.bi@oracle.comSigned-off-by: default avatarJunxiao Bi <junxiao.bi@oracle.com>
      Reviewed-by: default avatarYiwen Jiang <jiangyiwen@huawei.com>
      Acked-by: default avatarJoseph Qi <jiangqi903@gmail.com>
      Cc: Jun Piao <piaojun@huawei.com>
      Cc: Mark Fasheh <mfasheh@versity.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Changwei Ge <ge.changwei@h3c.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      f976e59e
    • Qian Cai's avatar
      scsi: megaraid: fix out-of-bound array accesses · c010d494
      Qian Cai authored
      [ Upstream commit c7a082e4 ]
      
      UBSAN reported those with MegaRAID SAS-3 3108,
      
      [   77.467308] UBSAN: Undefined behaviour in drivers/scsi/megaraid/megaraid_sas_fp.c:117:32
      [   77.475402] index 255 is out of range for type 'MR_LD_SPAN_MAP [1]'
      [   77.481677] CPU: 16 PID: 333 Comm: kworker/16:1 Not tainted 4.20.0-rc5+ #1
      [   77.488556] Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.50 06/01/2018
      [   77.495791] Workqueue: events work_for_cpu_fn
      [   77.500154] Call trace:
      [   77.502610]  dump_backtrace+0x0/0x2c8
      [   77.506279]  show_stack+0x24/0x30
      [   77.509604]  dump_stack+0x118/0x19c
      [   77.513098]  ubsan_epilogue+0x14/0x60
      [   77.516765]  __ubsan_handle_out_of_bounds+0xfc/0x13c
      [   77.521767]  mr_update_load_balance_params+0x150/0x158 [megaraid_sas]
      [   77.528230]  MR_ValidateMapInfo+0x2cc/0x10d0 [megaraid_sas]
      [   77.533825]  megasas_get_map_info+0x244/0x2f0 [megaraid_sas]
      [   77.539505]  megasas_init_adapter_fusion+0x9b0/0xf48 [megaraid_sas]
      [   77.545794]  megasas_init_fw+0x1ab4/0x3518 [megaraid_sas]
      [   77.551212]  megasas_probe_one+0x2c4/0xbe0 [megaraid_sas]
      [   77.556614]  local_pci_probe+0x7c/0xf0
      [   77.560365]  work_for_cpu_fn+0x34/0x50
      [   77.564118]  process_one_work+0x61c/0xf08
      [   77.568129]  worker_thread+0x534/0xa70
      [   77.571882]  kthread+0x1c8/0x1d0
      [   77.575114]  ret_from_fork+0x10/0x1c
      
      [   89.240332] UBSAN: Undefined behaviour in drivers/scsi/megaraid/megaraid_sas_fp.c:117:32
      [   89.248426] index 255 is out of range for type 'MR_LD_SPAN_MAP [1]'
      [   89.254700] CPU: 16 PID: 95 Comm: kworker/u130:0 Not tainted 4.20.0-rc5+ #1
      [   89.261665] Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.50 06/01/2018
      [   89.268903] Workqueue: events_unbound async_run_entry_fn
      [   89.274222] Call trace:
      [   89.276680]  dump_backtrace+0x0/0x2c8
      [   89.280348]  show_stack+0x24/0x30
      [   89.283671]  dump_stack+0x118/0x19c
      [   89.287167]  ubsan_epilogue+0x14/0x60
      [   89.290835]  __ubsan_handle_out_of_bounds+0xfc/0x13c
      [   89.295828]  MR_LdRaidGet+0x50/0x58 [megaraid_sas]
      [   89.300638]  megasas_build_io_fusion+0xbb8/0xd90 [megaraid_sas]
      [   89.306576]  megasas_build_and_issue_cmd_fusion+0x138/0x460 [megaraid_sas]
      [   89.313468]  megasas_queue_command+0x398/0x3d0 [megaraid_sas]
      [   89.319222]  scsi_dispatch_cmd+0x1dc/0x8a8
      [   89.323321]  scsi_request_fn+0x8e8/0xdd0
      [   89.327249]  __blk_run_queue+0xc4/0x158
      [   89.331090]  blk_execute_rq_nowait+0xf4/0x158
      [   89.335449]  blk_execute_rq+0xdc/0x158
      [   89.339202]  __scsi_execute+0x130/0x258
      [   89.343041]  scsi_probe_and_add_lun+0x2fc/0x1488
      [   89.347661]  __scsi_scan_target+0x1cc/0x8c8
      [   89.351848]  scsi_scan_channel.part.3+0x8c/0xc0
      [   89.356382]  scsi_scan_host_selected+0x130/0x1f0
      [   89.361002]  do_scsi_scan_host+0xd8/0xf0
      [   89.364927]  do_scan_async+0x9c/0x320
      [   89.368594]  async_run_entry_fn+0x138/0x420
      [   89.372780]  process_one_work+0x61c/0xf08
      [   89.376793]  worker_thread+0x13c/0xa70
      [   89.380546]  kthread+0x1c8/0x1d0
      [   89.383778]  ret_from_fork+0x10/0x1c
      
      This is because when populating Driver Map using firmware raid map, all
      non-existing VDs set their ldTgtIdToLd to 0xff, so it can be skipped later.
      
      From drivers/scsi/megaraid/megaraid_sas_base.c ,
      memset(instance->ld_ids, 0xff, MEGASAS_MAX_LD_IDS);
      
      From drivers/scsi/megaraid/megaraid_sas_fp.c ,
      /* For non existing VDs, iterate to next VD*/
      if (ld >= (MAX_LOGICAL_DRIVES_EXT - 1))
      	continue;
      
      However, there are a few places that failed to skip those non-existing VDs
      due to off-by-one errors. Then, those 0xff leaked into MR_LdRaidGet(0xff,
      map) and triggered the out-of-bound accesses.
      
      Fixes: 51087a86 ("megaraid_sas : Extended VD support")
      Signed-off-by: default avatarQian Cai <cai@lca.pw>
      Acked-by: default avatarSumit Saxena <sumit.saxena@broadcom.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      c010d494
    • Yanjiang Jin's avatar
      scsi: smartpqi: call pqi_free_interrupts() in pqi_shutdown() · 008b4a70
      Yanjiang Jin authored
      [ Upstream commit e57b2945 ]
      
      We must free all irqs during shutdown, else kexec's 2nd kernel would hang
      in pqi_wait_for_completion_io() as below:
      
      Call trace:
      
       pqi_wait_for_completion_io
       pqi_submit_raid_request_synchronous.constprop.78+0x23c/0x310 [smartpqi]
       pqi_configure_events+0xec/0x1f8 [smartpqi]
       pqi_ctrl_init+0x814/0xca0 [smartpqi]
       pqi_pci_probe+0x400/0x46c [smartpqi]
       local_pci_probe+0x48/0xb0
       pci_device_probe+0x14c/0x1b0
       really_probe+0x218/0x3fc
       driver_probe_device+0x70/0x140
       __driver_attach+0x11c/0x134
       bus_for_each_dev+0x70/0xc8
       driver_attach+0x30/0x38
       bus_add_driver+0x1f0/0x294
       driver_register+0x74/0x12c
       __pci_register_driver+0x64/0x70
       pqi_init+0xd0/0x10000 [smartpqi]
       do_one_initcall+0x60/0x1d8
       do_init_module+0x64/0x1f8
       load_module+0x10ec/0x1350
       __se_sys_finit_module+0xd4/0x100
       __arm64_sys_finit_module+0x28/0x34
       el0_svc_handler+0x104/0x160
       el0_svc+0x8/0xc
      
      This happens only in the following combinations:
      
      1. smartpqi is built as module, not built-in;
      2. We have a disk connected to smartpqi card;
      3. Both kexec's 1st and 2nd kernels use this disk as Rootfs' mount point.
      Signed-off-by: default avatarYanjiang Jin <yanjiang.jin@hxt-semitech.com>
      Acked-by: default avatarDon Brace <don.brace@microsemi.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      008b4a70
    • Kevin Barnett's avatar
      scsi: smartpqi: correct lun reset issues · ceecd0c6
      Kevin Barnett authored
      [ Upstream commit 2ba55c98 ]
      
      Problem:
      The Linux kernel takes a logical volume offline after a LUN reset.  This is
      generally accompanied by this message in the dmesg output:
      
      Device offlined - not ready after error recovery
      
      Root Cause:
      The root cause is a "quirk" in the timeout handling in the Linux SCSI
      layer. The Linux kernel places a 30-second timeout on most media access
      commands (reads and writes) that it send to device drivers.  When a media
      access command times out, the Linux kernel goes into error recovery mode
      for the LUN that was the target of the command that timed out. Every
      command that timed out is kept on a list inside of the Linux kernel to be
      retried later. The kernel attempts to recover the command(s) that timed out
      by issuing a LUN reset followed by a TEST UNIT READY. If the LUN reset and
      TEST UNIT READY commands are successful, the kernel retries the command(s)
      that timed out.
      
      Each SCSI command issued by the kernel has a result field associated with
      it. This field indicates the final result of the command (success or
      error). When a command times out, the kernel places a value in this result
      field indicating that the command timed out.
      
      The "quirk" is that after the LUN reset and TEST UNIT READY commands are
      completed, the kernel checks each command on the timed-out command list
      before retrying it. If the result field is still "timed out", the kernel
      treats that command as not having been successfully recovered for a
      retry. If the number of commands that are in this state are greater than
      two, the kernel takes the LUN offline.
      
      Fix:
      When our RAIDStack receives a LUN reset, it simply waits until all
      outstanding commands complete. Generally, all of these outstanding commands
      complete successfully. Therefore, the fix in the smartpqi driver is to
      always set the command result field to indicate success when a request
      completes successfully. This normally isn’t necessary because the result
      field is always initialized to success when the command is submitted to the
      driver. So when the command completes successfully, the result field is
      left untouched. But in this case, the kernel changes the result field
      behind the driver’s back and then expects the field to be changed by the
      driver as the commands that timed-out complete.
      Reviewed-by: default avatarDave Carroll <david.carroll@microsemi.com>
      Reviewed-by: default avatarScott Teel <scott.teel@microsemi.com>
      Signed-off-by: default avatarKevin Barnett <kevin.barnett@microsemi.com>
      Signed-off-by: default avatarDon Brace <don.brace@microsemi.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      ceecd0c6
    • Parvi Kaustubhi's avatar
      IB/usnic: Fix potential deadlock · 9d3e1c09
      Parvi Kaustubhi authored
      [ Upstream commit 8036e90f ]
      
      Acquiring the rtnl lock while holding usdev_lock could result in a
      deadlock.
      
      For example:
      
      usnic_ib_query_port()
      | mutex_lock(&us_ibdev->usdev_lock)
       | ib_get_eth_speed()
        | rtnl_lock()
      
      rtnl_lock()
      | usnic_ib_netdevice_event()
       | mutex_lock(&us_ibdev->usdev_lock)
      
      This commit moves the usdev_lock acquisition after the rtnl lock has been
      released.
      
      This is safe to do because usdev_lock is not protecting anything being
      accessed in ib_get_eth_speed(). Hence, the correct order of holding locks
      (rtnl -> usdev_lock) is not violated.
      Signed-off-by: default avatarParvi Kaustubhi <pkaustub@cisco.com>
      Signed-off-by: default avatarJason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      9d3e1c09