  1. 19 Aug, 2016 3 commits
  2. 16 Aug, 2016 2 commits
  3. 12 Aug, 2016 2 commits
    • ses: Fix racy cleanup of /sys in remove_dev() · e120dcb6
      Calvin Owens authored
      Currently we free the resources backing the enclosure device before we
      call device_unregister(). This is racy: during rmmod of low-level SCSI
      drivers that hook into enclosure, we end up with a small window of time
      during which writing to /sys can OOPS. Example trace with mpt3sas:
      
        general protection fault: 0000 [#1] SMP KASAN
        Modules linked in: mpt3sas(-) <...>
        RIP: [<ffffffffa0388a98>] ses_get_page2_descriptor.isra.6+0x38/0x220 [ses]
        Call Trace:
         [<ffffffffa0389d14>] ses_set_fault+0xf4/0x400 [ses]
         [<ffffffffa0361069>] set_component_fault+0xa9/0xf0 [enclosure]
         [<ffffffff8205bffc>] dev_attr_store+0x3c/0x70
         [<ffffffff81677df5>] sysfs_kf_write+0x115/0x180
         [<ffffffff81675725>] kernfs_fop_write+0x275/0x3a0
         [<ffffffff8151f810>] __vfs_write+0xe0/0x3e0
         [<ffffffff8152281f>] vfs_write+0x13f/0x4a0
         [<ffffffff81526731>] SyS_write+0x111/0x230
         [<ffffffff828b401b>] entry_SYSCALL_64_fastpath+0x13/0x94
      
      Fortunately the solution is extremely simple: call device_unregister()
      before we free the resources, and the race no longer exists. The driver
      core holds a reference over ->remove_dev(), so AFAICT this is safe.
      Signed-off-by: Calvin Owens <calvinowens@fb.com>
      Reviewed-by: James Bottomley <jejb@linux.vnet.ibm.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
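The ordering argument in the ses fix can be sketched in userspace C. This is only a model under stated assumptions: `fake_dev`, `fake_add()`, `fake_store()` and `fake_remove()` are hypothetical names, the `registered` flag stands in for the device being visible in /sys, and `page2` stands in for the resources that the racy code freed too early. It is not the driver's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for an enclosure device. */
struct fake_dev {
	bool registered;	/* models "still reachable through /sys" */
	int *page2;		/* models the backing resources */
};

/* Allocate a device and publish it. Returns NULL on allocation failure. */
static struct fake_dev *fake_add(void)
{
	struct fake_dev *d = malloc(sizeof(*d));

	if (!d)
		return NULL;
	d->page2 = calloc(1, sizeof(*d->page2));
	if (!d->page2) {
		free(d);
		return NULL;
	}
	d->registered = true;
	return d;
}

/* Models a sysfs store: 0 on success, -1 once the device is unpublished. */
static int fake_store(struct fake_dev *d, int val)
{
	if (!d->registered)
		return -1;
	assert(d->page2 != NULL);	/* the racy ordering would trip here */
	*d->page2 = val;
	return 0;
}

/* Fixed ordering: unpublish first, then free what the attribute used. */
static void fake_remove(struct fake_dev *d)
{
	d->registered = false;	/* models device_unregister() */
	free(d->page2);
	d->page2 = NULL;
}
```

With the two statements in `fake_remove()` swapped, a store arriving between them would still pass the `registered` check and touch freed memory, which is the window the commit closes.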
    • mpt3sas: Fix resume on WarpDrive flash cards · ce7c6c9e
      Greg Edwards authored
      mpt3sas crashes on resume after suspend with WarpDrive flash cards. The
      reply_post_host_index array is not set back up after the resume, and we
      dereference a stale pointer in _base_interrupt().
      
      [   47.309711] BUG: unable to handle kernel paging request at ffffc90001f8006c
      [   47.318289] IP: [<ffffffffc00863ef>] _base_interrupt+0x49f/0xa30 [mpt3sas]
      [   47.326749] PGD 41ccaa067 PUD 41ccab067 PMD 3466c067 PTE 0
      [   47.333848] Oops: 0002 [#1] SMP
      ...
      [   47.452708] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.7.0 #6
      [   47.460506] Hardware name: Dell Inc. OptiPlex 990/06D7TR, BIOS A18 09/24/2013
      [   47.469629] task: ffffffff81c0d500 ti: ffffffff81c00000 task.ti: ffffffff81c00000
      [   47.479112] RIP: 0010:[<ffffffffc00863ef>]  [<ffffffffc00863ef>] _base_interrupt+0x49f/0xa30 [mpt3sas]
      [   47.490466] RSP: 0018:ffff88041d203e30  EFLAGS: 00010002
      [   47.497801] RAX: 0000000000000001 RBX: ffff880033f4c000 RCX: 0000000000000001
      [   47.506973] RDX: ffffc90001f8006c RSI: 0000000000000082 RDI: 0000000000000082
      [   47.516141] RBP: ffff88041d203eb0 R08: ffff8804118e2820 R09: 0000000000000001
      [   47.525300] R10: 0000000000000001 R11: 00000000100c0000 R12: 0000000000000000
      [   47.534457] R13: ffff880412c487e0 R14: ffff88041a8987d8 R15: 0000000000000001
      [   47.543632] FS:  0000000000000000(0000) GS:ffff88041d200000(0000) knlGS:0000000000000000
      [   47.553796] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   47.561632] CR2: ffffc90001f8006c CR3: 0000000001c06000 CR4: 00000000000406f0
      [   47.570883] Stack:
      [   47.575015]  000000001d211228 ffff88041d2100c0 ffff8800c47d8130 0000000000000100
      [   47.584625]  ffff8804100c0000 100c000000000000 ffff88041a8992a0 ffff88041a8987f8
      [   47.594230]  ffff88041d203e00 ffffffff81111e55 000000000000038c ffff880414ad4280
      [   47.603862] Call Trace:
      [   47.608474]  <IRQ>
      [   47.610413]  [<ffffffff81111e55>] ? call_timer_fn+0x35/0x120
      [   47.620539]  [<ffffffff81100a1f>] handle_irq_event_percpu+0x7f/0x1c0
      [   47.629061]  [<ffffffff81100b8c>] handle_irq_event+0x2c/0x50
      [   47.636859]  [<ffffffff81103fff>] handle_edge_irq+0x6f/0x130
      [   47.644654]  [<ffffffff8102fbf3>] handle_irq+0x73/0x120
      [   47.652011]  [<ffffffff810c6ada>] ? atomic_notifier_call_chain+0x1a/0x20
      [   47.660854]  [<ffffffff817e374b>] do_IRQ+0x4b/0xd0
      [   47.667777]  [<ffffffff817e160c>] common_interrupt+0x8c/0x8c
      [   47.675635]  <EOI>
      
      Move the reply_post_host_index array setup into
      mpt3sas_base_map_resources(), which is also in the resume path.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Greg Edwards <gedwards@fireweed.org>
      Acked-by: Chaitra P B <chaitra.basappa@broadcom.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
  4. 11 Aug, 2016 2 commits
    • ipr: Fix sync scsi scan · 0d7826dd
      Brian King authored
      Commit b195d5e2 ("ipr: Wait to do async scan until scsi host is
      initialized") fixed async scan for ipr, but broke sync scan. This
      restores sync scan.
      Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
      Tested-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • megaraid_sas: Fix probing cards without io port · e7f85168
      Yinghai Lu authored
      Probing one megaraid_sas HBA fails:
      
      [  187.235190] scsi host2: Avago SAS based MegaRAID driver
      [  191.112365] megaraid_sas 0000:89:00.0: BAR 0: can't reserve [io  0x0000-0x00ff]
      [  191.120548] megaraid_sas 0000:89:00.0: IO memory region busy!
      
      The card has the following resources:
      [  125.097714] pci 0000:89:00.0: [1000:005d] type 00 class 0x010400
      [  125.104446] pci 0000:89:00.0: reg 0x10: [io  0x0000-0x00ff]
      [  125.110686] pci 0000:89:00.0: reg 0x14: [mem 0xce400000-0xce40ffff 64bit]
      [  125.118286] pci 0000:89:00.0: reg 0x1c: [mem 0xce300000-0xce3fffff 64bit]
      [  125.125891] pci 0000:89:00.0: reg 0x30: [mem 0xce200000-0xce2fffff pref]
      
      The BIOS did not allocate an I/O port resource for the card, and the
      kernel cannot assign one because of an I/O port shortage.

      The driver only needs the MEM BARs, so the probe should not fail.

      It turns out that megasas_init_fw() and related functions use the BAR
      index as a mask. Index 1 is passed as mask 1, so
      pci_request_selected_regions() requests BAR0 instead of BAR1.

      Fix all related references.
      
      Fixes: b6d5d880 ("megaraid_sas: Use lowest memory bar for SR-IOV VF support")
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Acked-by: Kashyap Desai <kashyap.desai@broadcom.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
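The index-versus-mask confusion in the megaraid_sas fix is easy to reproduce with plain integers. The sketch below is a userspace illustration, not driver code: `bar_index_to_mask()` and `lowest_bar_in_mask()` are hypothetical helpers, and `FAKE_NUM_BARS` assumes the six standard PCI BARs. The only kernel fact relied on is that pci_request_selected_regions() takes a bitmask of BARs rather than a single index.

```c
#include <assert.h>

#define FAKE_NUM_BARS 6		/* standard PCI BARs 0..5 */

/* Convert a BAR index into the bitmask form a "selected regions" API expects. */
static unsigned int bar_index_to_mask(unsigned int bar)
{
	assert(bar < FAKE_NUM_BARS);
	return 1u << bar;
}

/* Return the lowest BAR index selected by mask, or -1 if mask is empty. */
static int lowest_bar_in_mask(unsigned int mask)
{
	for (int i = 0; i < FAKE_NUM_BARS; i++)
		if (mask & (1u << i))
			return i;
	return -1;
}
```

Interpreting the raw index 1 as a mask gives `lowest_bar_in_mask(1) == 0`, which is BAR0, the I/O port BAR that could not be reserved in the log above. Converting first gives `lowest_bar_in_mask(bar_index_to_mask(1)) == 1`, which is the memory BAR the driver wants.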
  5. 09 Aug, 2016 1 commit
  6. 05 Aug, 2016 1 commit
  7. 02 Aug, 2016 3 commits
  8. 27 Jul, 2016 5 commits
    • fcoe: Use default VLAN for FIP VLAN discovery · d242e668
      Hannes Reinecke authored
      FC-BB-6 states: FIP protocols shall be performed on a per-VLAN basis. It
      is recommended to use the FIP VLAN discovery protocol on the default
      VLAN.
      Signed-off-by: Hannes Reinecke <hare@suse.com>
      Acked-by: Johannes Thumshirn <jth@kernel.org>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • ipr: Wait to do async scan until scsi host is initialized · b195d5e2
      Brian King authored
      When performing an async scan, make sure the kthread doing scanning
      doesn't start before the scsi host is fully initialized.
      Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
      Reviewed-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • MAINTAINERS: Update cxlflash maintainers · b2c0627c
      Uma Krishnan authored
      Adding myself as a cxlflash maintainer.
      Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
      Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • cxlflash: Verify problem state area is mapped before notifying shutdown · 1bd2b282
      Uma Krishnan authored
      If an EEH or some other hard error occurs while the adapter instance is
      being initialized, the system can crash on the subsequent shutdown of
      the device with:
      
      [c000000f1da03b60] c0000000005eccfc pci_device_shutdown+0x6c/0x100
      [c000000f1da03ba0] c0000000006d67d4 device_shutdown+0x1b4/0x2c0
      [c000000f1da03c40] c0000000000ea30c kernel_restart_prepare+0x5c/0x80
      [c000000f1da03c70] c0000000000ea48c kernel_restart+0x2c/0xc0
      [c000000f1da03ce0] c0000000000ea970 SyS_reboot+0x1c0/0x2d0
      [c000000f1da03e30] c000000000009204 system_call+0x38/0xb4
      
      This crash is due to the AFU not being mapped when the shutdown
      notification routine is called. It is a regression introduced
      recently by commit 704c4b0d ("cxlflash: Shutdown notify support
      for CXL Flash cards").
      
      As a fix, shutdown notification should only occur when the AFU is
      mapped.
      
      Fixes: 704c4b0d ("cxlflash: Shutdown notify support for CXL Flash cards")
      Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
      Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
      Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
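A guard of the kind the cxlflash message describes could look roughly like the sketch below. Everything in it is an assumption made for illustration: `fake_afu`, `fake_notify_shutdown()` and `FAKE_SHUTDOWN_BIT` are hypothetical, and the real driver writes to memory-mapped registers rather than to an ordinary variable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FAKE_SHUTDOWN_BIT 0x1ULL	/* illustrative value only */

/* Hypothetical stand-in for the adapter state; not the driver's struct. */
struct fake_afu {
	volatile uint64_t *afu_map;	/* NULL until the area is mapped */
};

/* Returns 0 if the notification was written, -1 if it was skipped. */
static int fake_notify_shutdown(struct fake_afu *afu)
{
	/* Initialization may have failed before the mapping was set up. */
	if (afu == NULL || afu->afu_map == NULL)
		return -1;

	*afu->afu_map |= FAKE_SHUTDOWN_BIT;
	return 0;
}
```

The early return keeps the shutdown path safe to call at any point of a partially completed initialization, which matches the scenario in the trace above.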
    • lpfc: fix oops in lpfc_sli4_scmd_to_wqidx_distr() from lpfc_send_taskmgmt() · 05a05872
      Mauricio Faria de Oliveira authored
      The lpfc_sli4_scmd_to_wqidx_distr() function expects the scsi_cmnd
      'lpfc_cmd->pCmd' not to be null, and point to the midlayer command.
      
      That's not true in the .eh_(device|target|bus)_reset_handler path,
      because lpfc_send_taskmgmt() sends commands not from the midlayer, so
      does not set 'lpfc_cmd->pCmd'.
      
      That is true in the .queuecommand path because lpfc_queuecommand()
      stores the scsi_cmnd from midlayer in lpfc_cmd->pCmd; and lpfc_cmd is
      stored by lpfc_scsi_prep_cmnd() in piocbq->context1 -- which is passed
      to lpfc_sli4_scmd_to_wqidx_distr() as lpfc_cmd parameter.
      
      This problem can be hit on SCSI EH, and immediately with sg_reset.
      These 2 test-cases demonstrate the problem/fix with next-20160601.
      
      Test-case 1) sg_reset
      
          # strace sg_reset --device /dev/sdm
          <...>
          open("/dev/sdm", O_RDWR|O_NONBLOCK)     = 3
          ioctl(3, SG_SCSI_RESET, 0x3fffde6d0994 <unfinished ...>
          +++ killed by SIGSEGV +++
          Segmentation fault
      
          # dmesg
          Unable to handle kernel paging request for data at address 0x00000000
          Faulting instruction address: 0xd00000001c88442c
          Oops: Kernel access of bad area, sig: 11 [#1]
          <...>
          CPU: 104 PID: 16333 Comm: sg_reset Tainted: G        W       4.7.0-rc1-next-20160601-00004-g95b89dc #6
          <...>
          NIP [d00000001c88442c] lpfc_sli4_scmd_to_wqidx_distr+0xc/0xd0 [lpfc]
          LR [d00000001c826fe8] lpfc_sli_calc_ring.part.27+0x98/0xd0 [lpfc]
          Call Trace:
          [c000003c9ec876f0] [c000003c9ec87770] 0xc000003c9ec87770 (unreliable)
          [c000003c9ec87720] [d00000001c82e004] lpfc_sli_issue_iocb+0xd4/0x260 [lpfc]
          [c000003c9ec87780] [d00000001c831a3c] lpfc_sli_issue_iocb_wait+0x15c/0x5b0 [lpfc]
          [c000003c9ec87880] [d00000001c87f27c] lpfc_send_taskmgmt+0x24c/0x650 [lpfc]
          [c000003c9ec87950] [d00000001c87fd7c] lpfc_device_reset_handler+0x10c/0x200 [lpfc]
          [c000003c9ec87a10] [c000000000610694] scsi_try_bus_device_reset+0x44/0xc0
          [c000003c9ec87a40] [c0000000006113e8] scsi_ioctl_reset+0x198/0x2c0
          [c000003c9ec87bf0] [c00000000060fe5c] scsi_ioctl+0x13c/0x4b0
          [c000003c9ec87c80] [c0000000006629b0] sd_ioctl+0xf0/0x120
          [c000003c9ec87cd0] [c00000000046e4f8] blkdev_ioctl+0x248/0xb70
          [c000003c9ec87d30] [c0000000002a1f60] block_ioctl+0x70/0x90
          [c000003c9ec87d50] [c00000000026d334] do_vfs_ioctl+0xc4/0x890
          [c000003c9ec87de0] [c00000000026db60] SyS_ioctl+0x60/0xc0
          [c000003c9ec87e30] [c000000000009120] system_call+0x38/0x108
          Instruction dump:
          <...>
      
          With fix:
      
          # strace sg_reset --device /dev/sdm
          <...>
          open("/dev/sdm", O_RDWR|O_NONBLOCK)     = 3
          ioctl(3, SG_SCSI_RESET, 0x3fffe103c554) = 0
          close(3)                                = 0
          exit_group(0)                           = ?
          +++ exited with 0 +++
      
          # dmesg
          [  424.658649] lpfc 0006:01:00.4: 4:(0):0713 SCSI layer issued Device Reset (1, 0) return x2002
      
      Test-case 2) SCSI EH
      
          Using this debug patch to wire up a SCSI EH trigger in lpfc_scsi_cmd_iocb_cmpl():
          -       cmd->scsi_done(cmd);
          +       if ((phba->pport ? phba->pport->cfg_log_verbose : phba->cfg_log_verbose) == 0x32100000)
          +               printk(KERN_ALERT "lpfc: skip scsi_done()\n");
          +       else
          +               cmd->scsi_done(cmd);
      
          # echo 0x32100000 > /sys/class/scsi_host/host11/lpfc_log_verbose
      
          # dd if=/dev/sdm of=/dev/null iflag=direct &
          <...>
      
          After a while:
      
          # dmesg
          lpfc 0006:01:00.4: 4:(0):3053 lpfc_log_verbose changed from 0 (x0) to 839909376 (x32100000)
          lpfc: skip scsi_done()
          <...>
          Unable to handle kernel paging request for data at address 0x00000000
          Faulting instruction address: 0xd0000000199e448c
          Oops: Kernel access of bad area, sig: 11 [#1]
          <...>
          CPU: 96 PID: 28556 Comm: scsi_eh_11 Tainted: G        W       4.7.0-rc1-next-20160601-00004-g95b89dc #6
          <...>
          NIP [d0000000199e448c] lpfc_sli4_scmd_to_wqidx_distr+0xc/0xd0 [lpfc]
          LR [d000000019986fe8] lpfc_sli_calc_ring.part.27+0x98/0xd0 [lpfc]
          Call Trace:
          [c000000ff0d0b890] [c000000ff0d0b900] 0xc000000ff0d0b900 (unreliable)
          [c000000ff0d0b8c0] [d00000001998e004] lpfc_sli_issue_iocb+0xd4/0x260 [lpfc]
          [c000000ff0d0b920] [d000000019991a3c] lpfc_sli_issue_iocb_wait+0x15c/0x5b0 [lpfc]
          [c000000ff0d0ba20] [d0000000199df27c] lpfc_send_taskmgmt+0x24c/0x650 [lpfc]
          [c000000ff0d0baf0] [d0000000199dfd7c] lpfc_device_reset_handler+0x10c/0x200 [lpfc]
          [c000000ff0d0bbb0] [c000000000610694] scsi_try_bus_device_reset+0x44/0xc0
          [c000000ff0d0bbe0] [c0000000006126cc] scsi_eh_ready_devs+0x49c/0x9c0
          [c000000ff0d0bcb0] [c000000000614160] scsi_error_handler+0x580/0x680
          [c000000ff0d0bd80] [c0000000000ae848] kthread+0x108/0x130
          [c000000ff0d0be30] [c0000000000094a8] ret_from_kernel_thread+0x5c/0xb4
          Instruction dump:
          <...>
      
          With fix:
      
          # dmesg
          lpfc 0006:01:00.4: 4:(0):3053 lpfc_log_verbose changed from 0 (x0) to 839909376 (x32100000)
          lpfc: skip scsi_done()
          <...>
          lpfc 0006:01:00.4: 4:(0):0713 SCSI layer issued Device Reset (0, 0) return x2002
          <...>
          lpfc 0006:01:00.4: 4:(0):0723 SCSI layer issued Target Reset (1, 0) return x2002
          <...>
          lpfc 0006:01:00.4: 4:(0):0714 SCSI layer issued Bus Reset Data: x2002
          <...>
          lpfc 0006:01:00.4: 4:(0):3172 SCSI layer issued Host Reset Data:
          <...>
      
      Fixes: 8b0dff14 ("lpfc: Add support for using block multi-queue")
      Cc: <stable@vger.kernel.org> # v4.2+
      Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Acked-by: James Smart <james.smart@broadcom.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
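The lpfc failure mode, a queue-selection helper that assumes a midlayer command is always attached, can be modelled as follows. This is a sketch under assumptions and not the driver's logic: the `fake_` types, the tag-based selection and the `fallback` parameter are all hypothetical. It only shows that a command-less request (such as a task management request) needs its own path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; only the pCmd field name mirrors the message. */
struct fake_scsi_cmnd {
	unsigned int tag;
};

struct fake_lpfc_cmd {
	struct fake_scsi_cmnd *pCmd;	/* NULL for task management requests */
};

/*
 * Pick a work queue index. When no midlayer command is attached,
 * fall back to a caller-supplied index instead of dereferencing NULL.
 */
static unsigned int fake_pick_wq(const struct fake_lpfc_cmd *lpfc_cmd,
				 unsigned int nr_queues,
				 unsigned int fallback)
{
	assert(nr_queues > 0);

	if (lpfc_cmd->pCmd != NULL)
		return lpfc_cmd->pCmd->tag % nr_queues;

	return fallback % nr_queues;
}
```

In this model a request built the way lpfc_send_taskmgmt() builds it (no `pCmd`) lands on the fallback queue, while requests from the .queuecommand path keep their per-command distribution.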
  9. 21 Jul, 2016 2 commits
    • scsi:libsas: fix oops caused by assigning a freed task to ->lldd_task · 354a086d
      Wei Fang authored
      When lldd_execute_task() fails in sas_ata_qc_issue(), ->lldd_task is
      left pointing at a task that has already been freed, and a later access
      of ->lldd_task causes an oops:
      
      Call trace:
      [<ffffffc000641f64>] sas_ata_post_internal+0x6c/0x150
      [<ffffffc0006c0d64>] ata_exec_internal_sg+0x32c/0x588
      [<ffffffc0006c1048>] ata_exec_internal+0x88/0xe8
      [<ffffffc0006c13b4>] ata_dev_read_id+0x204/0x5e0
      [<ffffffc0006c17f0>] ata_dev_reread_id+0x60/0xc8
      [<ffffffc0006c3098>] ata_dev_revalidate+0x88/0x1e0
      [<ffffffc0006cf828>] ata_eh_recover+0xcf8/0x13a8
      [<ffffffc0006d075c>] ata_do_eh+0x5c/0xe0
      [<ffffffc0006d0828>] ata_std_error_handler+0x48/0x98
      [<ffffffc0006d042c>] ata_scsi_port_error_handler+0x474/0x658
      [<ffffffc000641b78>] async_sas_ata_eh+0x50/0x80
      [<ffffffc0000ca664>] async_run_entry_fn+0x64/0x180
      [<ffffffc0000c085c>] process_one_work+0x164/0x438
      [<ffffffc0000c0c74>] worker_thread+0x144/0x4b0
      [<ffffffc0000c70fc>] kthread+0xfc/0x110
      
      Fix this by resetting ->lldd_task to NULL in the error path.
      Signed-off-by: Wei Fang <fangwei1@huawei.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
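The dangling-pointer pattern behind the libsas oops, and the shape of the fix, can be shown with a small userspace model. The names here (`fake_task`, `fake_qc`, `fake_qc_issue()`, and the two executor callbacks) are hypothetical, and the callback is an assumption standing in for lldd_execute_task().

```c
#include <assert.h>
#include <stdlib.h>

struct fake_task {
	int id;
};

struct fake_qc {
	struct fake_task *lldd_task;	/* later code may look at this */
};

/* Hypothetical executors standing in for a low-level driver. */
static int fake_exec_ok(struct fake_task *t)
{
	(void)t;
	return 0;
}

static int fake_exec_fail(struct fake_task *t)
{
	(void)t;
	return -5;
}

/* Issue a command; on failure, never leave qc pointing at freed memory. */
static int fake_qc_issue(struct fake_qc *qc, int (*execute)(struct fake_task *))
{
	struct fake_task *task = malloc(sizeof(*task));

	if (!task)
		return -1;
	task->id = 1;
	qc->lldd_task = task;

	if (execute(task) != 0) {
		free(task);
		qc->lldd_task = NULL;	/* the step the fix adds */
		return -1;
	}
	return 0;
}
```

Without the reset, a later consumer that tests `qc->lldd_task` for non-NULL would go on to use freed memory, which corresponds to the sas_ata_post_internal() frame at the top of the trace.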
    • fnic: pci_dma_mapping_error() doesn't return an error code · dd7328e4
      Dan Carpenter authored
      pci_dma_mapping_error() returns true on error and false on success.
      
      Fixes: fd6ddfa4 ('fnic: check pci_map_single() return value')
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
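The point of the fnic message is that a true/false check result must be translated into an errno value rather than returned directly. A minimal sketch, with assumptions stated: `fake_dma_mapping_error()` is a hypothetical stand-in that treats address 0 as a failed mapping, `fake_post_buf()` is a hypothetical caller, and the choice of -ENOMEM is this sketch's, not something the message specifies.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical check: true on error, false on success, as in the message. */
static bool fake_dma_mapping_error(unsigned long dma_addr)
{
	return dma_addr == 0;
}

/* Returns 0 on success or a negative errno; never the raw boolean. */
static int fake_post_buf(unsigned long dma_addr)
{
	if (fake_dma_mapping_error(dma_addr))
		return -ENOMEM;	/* returning "true" would yield 1, not an errno */

	return 0;
}
```

Callers that test for `ret < 0` would treat a returned 1 as success, so propagating the boolean hides the failure.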
  10. 20 Jul, 2016 4 commits
  11. 15 Jul, 2016 15 commits