1. 27 Feb, 2017 3 commits
    • target: Add counters for ABORT_TASK success + failure · c87ba9c4
      Nicholas Bellinger authored
      This patch introduces two counters for ABORT_TASK success +
      failure under:
      
         /sys/kernel/config/target/core/$HBA/$DEV/statistics/scsi_tgt_dev/
      
      that are useful for diagnosing various backend device latency
      and front fabric issues.
      
      Normally, when folks see a lot of aborts_complete, it means
      the backend device I/O completion latency is high, and
      completions are not being returned before host-side
      timeouts trigger.
      
      And normally, when folks see a lot of aborts_no_task, it means
      completions are being posted by target-core into fabric driver
      code, but the responses aren't making it back to the host.
      
      Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
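
The diagnostic rule of thumb in this commit message can be sketched as a small userspace helper. This is a minimal sketch, not part of the patch: it assumes the two counters are exposed as plain decimal attribute files named aborts_complete and aborts_no_task under the statistics directory given above, and `read_counter()`, `classify_aborts()` and the `abort_hint` enum are hypothetical names introduced only for illustration.

```c
#include <stdio.h>

/*
 * Read one unsigned decimal counter from a statistics attribute file.
 * Returns 0 on success, -1 if the file cannot be opened or parsed.
 * The caller supplies the full path; $HBA and $DEV are site specific.
 */
int read_counter(const char *path, unsigned long long *out)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (!f)
        return -1;
    ok = (fscanf(f, "%llu", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Hypothetical hint values, following the heuristic in the commit message. */
enum abort_hint {
    HINT_NONE,             /* neither counter moved between samples */
    HINT_BACKEND_LATENCY,  /* aborts_complete dominates */
    HINT_FABRIC_RESPONSE   /* aborts_no_task dominates */
};

/*
 * Classify the growth of both counters between two samples.
 * d_complete and d_no_task are deltas, not absolute values.
 */
enum abort_hint classify_aborts(unsigned long long d_complete,
                                unsigned long long d_no_task)
{
    if (d_complete == 0 && d_no_task == 0)
        return HINT_NONE;
    return d_complete >= d_no_task ? HINT_BACKEND_LATENCY
                                   : HINT_FABRIC_RESPONSE;
}
```

Sampling both files twice, a few seconds apart, and passing the differences to `classify_aborts()` gives a rough first hint; it does not replace looking at the backend and fabric logs.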
    • iscsi-target: Fix early login failure statistics misses · 17c61ad6
      Nicholas Bellinger authored
      Due to the long-standing checks in iscsit_snmp_get_tiqn()
      that dereference conn->sess->tpg to reach tpg->tpg_tiqn
      for iscsit_collect_login_stats() usage, some of the early
      login failure cases like ISCSI_LOGIN_STATUS_TGT_FORBIDDEN
      were not getting incremented, because the sess->tpg assignment
      happens later in iscsi_login_zero_tsih_s2().
      
      Instead, use the earlier conn->tpg assignment done by
      iscsi_target_locate_portal() -> iscsit_get_tpg_from_np()
      so the existing counters are incremented correctly for
      the various early login failure cases.
      
      Also, go ahead and drop the old rate limiting check in
      iscsit_collect_login_stats(), so we get the true number
      of failed login attempts in the existing statistics.
      
      Reported-by: Ryan Stiles <ras@datera.io>
      Cc: Ryan Stiles <ras@datera.io>
      Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
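
The lookup change described in this commit can be illustrated with a small standalone model. This is a sketch under stated assumptions, not the kernel code: the structs below are simplified stand-ins that keep only the pointers named in the commit message, and `login_forbidden`, `tiqn_via_sess()`, `tiqn_via_conn()` and `count_forbidden()` are hypothetical names.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (illustration only). */
struct tiqn { unsigned int login_forbidden; };  /* hypothetical counter */
struct tpg  { struct tiqn *tpg_tiqn; };
struct sess { struct tpg *tpg; };   /* assigned late in the login sequence */
struct conn {
    struct sess *sess;
    struct tpg  *tpg;               /* assigned early, at portal lookup */
};

/* Old-style lookup: depends on conn->sess->tpg, which is unset early on. */
struct tiqn *tiqn_via_sess(const struct conn *c)
{
    if (!c->sess || !c->sess->tpg)
        return NULL;
    return c->sess->tpg->tpg_tiqn;
}

/* New-style lookup: uses the earlier conn->tpg assignment instead. */
struct tiqn *tiqn_via_conn(const struct conn *c)
{
    if (!c->tpg)
        return NULL;
    return c->tpg->tpg_tiqn;
}

/* Count one forbidden-login failure; returns 1 if counted, 0 if missed. */
int count_forbidden(struct tiqn *t)
{
    if (!t)
        return 0;
    t->login_forbidden++;
    return 1;
}
```

With a connection whose session exists but has no tpg yet, `tiqn_via_sess()` returns NULL and the failure goes uncounted, while `tiqn_via_conn()` finds the tiqn and the counter is incremented.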
    • target: Fix NULL dereference during LUN lookup + active I/O shutdown · bd4e2d29
      Nicholas Bellinger authored
      When transport_clear_lun_ref() is shutting down a se_lun via
      configfs with new I/O in-flight, it's possible to trigger a
      NULL pointer dereference in transport_lookup_cmd_lun() due
      to the fact percpu_ref_get() doesn't do any __PERCPU_REF_DEAD
      checking before incrementing lun->lun_ref.count after
      lun->lun_ref has switched to atomic_t mode.
      
      This results in a NULL pointer dereference as LUN shutdown
      code in core_tpg_remove_lun() continues running after the
      existing ->release() -> core_tpg_lun_ref_release() callback
      completes, and clears the RCU protected se_lun->lun_se_dev
      pointer.
      
      During the oops, the state of lun->lun_ref in the process
      which triggered the NULL pointer dereference looks like
      the following on v4.1.y stable code:
      
      struct se_lun {
        lun_link_magic = 4294932337,
        lun_status = TRANSPORT_LUN_STATUS_FREE,
      
        .....
      
        lun_se_dev = 0x0,
        lun_sep = 0x0,
      
        .....
      
        lun_ref = {
          count = {
            counter = 1
          },
          percpu_count_ptr = 3,
          release = 0xffffffffa02fa1e0 <core_tpg_lun_ref_release>,
          confirm_switch = 0x0,
          force_atomic = false,
          rcu = {
            next = 0xffff88154fa1a5d0,
            func = 0xffffffff8137c4c0 <percpu_ref_switch_to_atomic_rcu>
          }
        }
      }
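
The percpu_count_ptr = 3 value in the dump above can be decoded with a short helper. This is a hedged sketch: it assumes the low two bits of percpu_count_ptr carry the atomic and dead flags with the values 1 and 2, as defined in include/linux/percpu-refcount.h, and the macro and function names below are local stand-ins rather than the kernel's own.

```c
#include <stdbool.h>

/* Assumed flag layout in the low bits of percpu_count_ptr. */
#define MODEL_REF_ATOMIC (1UL << 0)  /* ref is operating in atomic mode */
#define MODEL_REF_DEAD   (1UL << 1)  /* ref has been killed */

/* True if the ref has switched from percpu to atomic counting. */
bool model_ref_is_atomic(unsigned long percpu_count_ptr)
{
    return (percpu_count_ptr & MODEL_REF_ATOMIC) != 0;
}

/* True if the ref has been marked dead. */
bool model_ref_is_dead(unsigned long percpu_count_ptr)
{
    return (percpu_count_ptr & MODEL_REF_DEAD) != 0;
}
```

Under that assumption, a value of 3 reads as both atomic and dead, which matches the description of a lun_ref that has been killed and has switched to atomic_t mode.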
      
      To address this bug, use percpu_ref_tryget_live() to ensure
      once __PERCPU_REF_DEAD is visible on all CPUs and ->lun_ref
      has switched to atomic_t, all new I/Os will fail to obtain
      a new lun->lun_ref reference.
      
      Also use an explicit percpu_ref_kill_and_confirm() callback
      to block on ->lun_ref_comp to allow the first stage and
      associated RCU grace period to complete, and then block on
      ->lun_ref_shutdown waiting for the final percpu_ref_put()
      to drop the last reference via transport_lun_remove_cmd()
      before continuing with core_tpg_remove_lun() shutdown.
      
      Reported-by: Rob Millner <rlm@daterainc.com>
      Tested-by: Rob Millner <rlm@daterainc.com>
      Cc: Rob Millner <rlm@daterainc.com>
      Tested-by: Vaibhav Tandon <vst@datera.io>
      Cc: Vaibhav Tandon <vst@datera.io>
      Tested-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
      Cc: <stable@vger.kernel.org> # v3.14+
      Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
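
The difference between an unconditional get and a tryget_live, which is the core of this fix, can be modelled in userspace with C11 atomics. This is a simplified sketch and not the kernel's percpu_ref implementation: it has no percpu mode, no RCU grace period and no completions, and every `model_ref_*` name is hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Minimal atomic-mode model of a killable reference count. */
struct model_ref {
    atomic_long count;  /* starts at 1 for the initial reference */
    atomic_bool dead;   /* set once shutdown has begun */
};

void model_ref_init(struct model_ref *r)
{
    atomic_init(&r->count, 1);
    atomic_init(&r->dead, false);
}

/* Unconditional get: increments even after the ref has been killed. */
void model_ref_get(struct model_ref *r)
{
    atomic_fetch_add(&r->count, 1);
}

/* Live-only get: fails once dead, and never revives a zero count. */
bool model_ref_tryget_live(struct model_ref *r)
{
    long old;

    if (atomic_load(&r->dead))
        return false;
    old = atomic_load(&r->count);
    while (old != 0) {
        /* On failure, old is reloaded with the current count. */
        if (atomic_compare_exchange_weak(&r->count, &old, old + 1))
            return true;
    }
    return false;
}

/* Drop one reference; returns true if it was the last one. */
bool model_ref_put(struct model_ref *r)
{
    return atomic_fetch_sub(&r->count, 1) == 1;
}

/* Begin shutdown: mark dead, then drop the initial reference. */
bool model_ref_kill(struct model_ref *r)
{
    atomic_store(&r->dead, true);
    return model_ref_put(r);
}
```

In this model, a caller that uses `model_ref_get()` after `model_ref_kill()` still raises the count, which mirrors the lookup path that kept taking references during LUN shutdown; `model_ref_tryget_live()` refuses instead, so new I/O can be failed before the device pointer is cleared.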
  2. 20 Feb, 2017 2 commits
  3. 19 Feb, 2017 9 commits
  4. 09 Feb, 2017 26 commits