  1. 23 Jul, 2023 2 commits
  2. 31 May, 2023 4 commits
  3. 08 May, 2023 1 commit
  4. 10 Mar, 2023 2 commits
  5. 22 Feb, 2023 1 commit
  6. 12 Jan, 2023 2 commits
  7. 17 Nov, 2022 1 commit
  8. 11 Oct, 2022 1 commit
    • treewide: use get_random_{u8,u16}() when possible, part 1 · 7e3cf084
      Jason A. Donenfeld authored
      Rather than truncate a 32-bit value to a 16-bit value or an 8-bit value,
      simply use the get_random_{u8,u16}() functions, which are faster than
      wasting the additional bytes from a 32-bit value. This was done
      mechanically with this coccinelle script:
      
      @@
      expression E;
      identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
      typedef u16;
      typedef __be16;
      typedef __le16;
      typedef u8;
      @@
      (
      - (get_random_u32() & 0xffff)
      + get_random_u16()
      |
      - (get_random_u32() & 0xff)
      + get_random_u8()
      |
      - (get_random_u32() % 65536)
      + get_random_u16()
      |
      - (get_random_u32() % 256)
      + get_random_u8()
      |
      - (get_random_u32() >> 16)
      + get_random_u16()
      |
      - (get_random_u32() >> 24)
      + get_random_u8()
      |
      - (u16)get_random_u32()
      + get_random_u16()
      |
      - (u8)get_random_u32()
      + get_random_u8()
      |
      - (__be16)get_random_u32()
      + (__be16)get_random_u16()
      |
      - (__le16)get_random_u32()
      + (__le16)get_random_u16()
      |
      - prandom_u32_max(65536)
      + get_random_u16()
      |
      - prandom_u32_max(256)
      + get_random_u8()
      |
      - E->inet_id = get_random_u32()
      + E->inet_id = get_random_u16()
      )
      
      @@
      identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
      typedef u16;
      identifier v;
      @@
      - u16 v = get_random_u32();
      + u16 v = get_random_u16();
      
      @@
      identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
      typedef u8;
      identifier v;
      @@
      - u8 v = get_random_u32();
      + u8 v = get_random_u8();
      
      @@
      identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
      typedef u16;
      u16 v;
      @@
      -  v = get_random_u32();
      +  v = get_random_u16();
      
      @@
      identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
      typedef u8;
      u8 v;
      @@
      -  v = get_random_u32();
      +  v = get_random_u8();
      
      // Find a potential literal
      @literal_mask@
      expression LITERAL;
      type T;
      identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
      position p;
      @@
      
              ((T)get_random_u32()@p & (LITERAL))
      
      // Examine limits
      @script:python add_one@
      literal << literal_mask.LITERAL;
      RESULT;
      @@
      
      value = None
      if literal.startswith('0x'):
              value = int(literal, 16)
      elif literal[0] in '123456789':
              value = int(literal, 10)
      if value is None:
              print("I don't know how to handle %s" % (literal))
              cocci.include_match(False)
      elif value < 256:
              coccinelle.RESULT = cocci.make_ident("get_random_u8")
      elif value < 65536:
              coccinelle.RESULT = cocci.make_ident("get_random_u16")
      else:
              print("Skipping large mask of %s" % (literal))
              cocci.include_match(False)
      
      // Replace the literal mask with the calculated result.
      @plus_one@
      expression literal_mask.LITERAL;
      position literal_mask.p;
      identifier add_one.RESULT;
      identifier FUNC;
      @@
      
      -       (FUNC()@p & (LITERAL))
      +       (RESULT() & LITERAL)
      Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Yury Norov <yury.norov@gmail.com>
      Acked-by: Jakub Kicinski <kuba@kernel.org>
      Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> # for sch_cake
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
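The "Examine limits" rule in the script above decides which narrower helper can replace a masked 32-bit call: a mask below 256 only keeps bits that an 8-bit value already provides, and a mask below 65536 only keeps bits that a 16-bit value provides. The same decision can be sketched as a standalone Python function. This is an illustration only, not part of the commit; `pick_helper` is a hypothetical name, and it assumes the literal is a non-empty string as Coccinelle would pass it.

```python
def pick_helper(literal):
    """Return the narrower helper name for a mask literal, or None to skip.

    Mirrors the branch structure of the script's add_one rule; the function
    name and the use of None for "skip" are assumptions of this sketch.
    """
    value = None
    if literal.startswith("0x"):
        value = int(literal, 16)
    elif literal[0] in "123456789":
        value = int(literal, 10)

    if value is None:
        return None  # unrecognised literal form: leave the call site alone
    if value < 256:
        return "get_random_u8"   # mask fits in the low 8 bits
    if value < 65536:
        return "get_random_u16"  # mask fits in the low 16 bits
    return None  # mask needs more than 16 bits: keep the 32-bit call


if __name__ == "__main__":
    for mask in ("0x7f", "255", "0xfff", "65535", "0x10000", "SOME_MACRO"):
        print(mask, "->", pick_helper(mask))
```

Returning `None` here stands in for the script's `cocci.include_match(False)`, which drops the match so the original call is left untouched.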
  9. 16 Sep, 2022 3 commits
  10. 01 Sep, 2022 2 commits
  11. 07 Jul, 2022 2 commits
  12. 11 May, 2022 4 commits
  13. 19 Apr, 2022 4 commits
  14. 30 Mar, 2022 2 commits
    • scsi: lpfc: Fix unload hang after back to back PCI EEH faults · a4691038
      James Smart authored
      When injecting EEH errors the port is getting hung up waiting on the node
      list to empty, message number 0233. The driver is stuck at this point and
      also can't unload. The driver makes transport remoteport delete calls which
      try to abort I/O's, but the EEH daemon has already called the driver to
      detach and the detachment has set the global FC_UNLOADING flag.  There are
      several code paths that will avoid I/O cleanup if the FC_UNLOADING flag is
      set, resulting in transports waiting for I/O while the driver is waiting on
      transports to clean up.
      
      Additionally, during study of the list, a locking issue was found in
      lpfc_sli_abort_iocb_ring that could corrupt the list.
      
      A special case was added to the lpfc_cleanup() routine to call
      lpfc_sli_flush_rings() if the driver is FC_UNLOADING and if the pci-slot
      is offline (e.g. EEH).
      
      The SLI4 part of lpfc_sli_abort_iocb_ring() is changed to use the
      ring_lock.  Also added code to cancel the I/Os if the pci-slot is offline
      and added checks and returns for the FC_UNLOADING and HBA_IOQ_FLUSH flags
      to prevent trying to send an I/O that we cannot handle.
      
      Link: https://lore.kernel.org/r/20220317032737.45308-3-jsmart2021@gmail.com
      Co-developed-by: Justin Tee <justin.tee@broadcom.com>
      Signed-off-by: Justin Tee <justin.tee@broadcom.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: lpfc: Improve PCI EEH Error and Recovery Handling · 35ed9613
      James Smart authored
      Following EEH errors, the driver can crash or hang when deleting the
      localport or when attempting to unload.
      
      The EEH handlers in the driver did not notify the NVMe-FC transport before
      tearing the driver down. This was delayed until the resume steps. This
      worked for SCSI because lpfc_block_scsi() would notify the
      scsi_fc_transport that the target was not available but it would not clean
      up all the references to the ndlp.
      
      The SLI3 prep for dev reset handler did the lpfc_offline_prep() and
      lpfc_offline() calls to get the port stopped before restarting. The SLI4
      version of the prep for dev reset just destroyed the queues and did not
      stop NVMe from continuing.  Also because the port was not really stopped
      the localport destroy would hang because the transport was still waiting
      for I/O. Additionally, a devloss tmo can fire and post events to a stopped
      worker thread creating another hang condition.
      
      lpfc_sli4_prep_dev_for_reset() is modified to call lpfc_offline_prep() and
      lpfc_offline() rather than just lpfc_scsi_dev_block() to ensure both SCSI
      and NVMe transports are notified to block I/O to the driver.
      
      Logic is added to devloss handler and worker thread to clean up ndlp
      references and quiesce appropriately.
      
      Link: https://lore.kernel.org/r/20220317032737.45308-2-jsmart2021@gmail.com
      Co-developed-by: Justin Tee <justin.tee@broadcom.com>
      Signed-off-by: Justin Tee <justin.tee@broadcom.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
  15. 15 Mar, 2022 2 commits
  16. 07 Dec, 2021 2 commits
  17. 21 Oct, 2021 2 commits
  18. 15 Sep, 2021 3 commits
    • scsi: lpfc: Fix EEH support for NVMe I/O · 25ac2c97
      James Smart authored
      Injecting errors on the PCI slot while the driver is handling NVMe I/O will
      cause crashes and hangs.
      
      There are several rather difficult scenarios occurring. The main issue is
      that the adapter can report a PCI error before or simultaneously with the PCI
      subsystem reporting the error. Both paths have different entry points and
      currently there is no interlock between them. Thus multiple teardown paths
      are competing and all heck breaks loose.
      
      Complicating things is the NVMe path. To a large degree, I/O was able to be
      shut down for a full FC port on the SCSI stack. But on NVMe, there isn't a
      similar call. At best, it works on a per-controller basis, but even at the
      controller level, it's a controller "reset" call. All of which means I/O is
      still flowing on different CPUs with reset paths expecting hw access
      (mailbox commands) to execute properly.
      
      The following modifications are made:
      
       - A new flag is set in PCI error entrypoints so the driver can track being
         called by that path.
      
       - An interlock is added in the SLI hw error path and the PCI error path
         such that only one of the paths proceeds with the teardown logic.
      
       - RPI cleanup is patched such that RPIs are marked unregistered w/o mbx
         cmds in cases of hw error.
      
       - When entering the SLI port re-init calls (the case where SLI error
         teardown was quick and beat the PCI calls that are now reporting the
         error), check whether the SLI port is still live on the PCI bus.
      
       - In the PCI reset code to bring the adapter back, recheck the IRQ
         settings. Different checks for SLI3 vs SLI4.
      
       - In I/O completions, that may be called as part of the cleanup or
         underway just before the hw error, check the state of the adapter.  If
         in error, shortcut handling that would expect further adapter
         completions as the hw error won't be sending them.
      
       - In routines waiting on I/O completions, which may have been in progress
         prior to the hw error, detect the device is being torn down and abort
         from their waits and just give up. This points to a larger issue in the
         driver on ref-counting for data structures, as it doesn't have
         ref-counting on q and port structures. We'll do this fix for now as it
         would be a major rework to be done differently.
      
       - Fix the NVMe cleanup to simulate NVMe I/O completions if I/O is being
         failed back due to hw error.
      
       - In I/O buf allocation, done at the start of new I/Os, check hw state and
         fail if hw error.
      
      Link: https://lore.kernel.org/r/20210910233159.115896-10-jsmart2021@gmail.com
      Co-developed-by: Justin Tee <justin.tee@broadcom.com>
      Signed-off-by: Justin Tee <justin.tee@broadcom.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
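The interlock described in the commit above (two error paths racing, only one allowed to run teardown) can be illustrated with a small test-and-set sketch in Python. This is a generic model under stated assumptions, not the lpfc implementation: `TeardownInterlock`, `try_claim`, `run_race`, and the path names are all hypothetical, and the driver's actual flag and locking are not shown in the commit message.

```python
import threading


class TeardownInterlock:
    """Lets exactly one caller claim the right to run teardown (toy model)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._claimed_by = None

    def try_claim(self, path_name):
        # Test-and-set under a lock: the first path to arrive wins,
        # every later path sees the claim and backs off.
        with self._lock:
            if self._claimed_by is None:
                self._claimed_by = path_name
                return True
            return False

    @property
    def owner(self):
        return self._claimed_by


def run_race():
    """Start both hypothetical error paths and report which one tore down."""
    interlock = TeardownInterlock()
    winners = []

    def error_path(name):
        if interlock.try_claim(name):
            winners.append(name)  # only the winning path runs teardown

    threads = [
        threading.Thread(target=error_path, args=(name,))
        for name in ("sli_hw_error", "pci_error")
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return interlock, winners


if __name__ == "__main__":
    interlock, winners = run_race()
    print("teardown ran on:", winners, "owner:", interlock.owner)
```

Which path wins depends on scheduling; the point of the sketch is only that the teardown body runs once regardless of arrival order.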
    • scsi: lpfc: Fix rediscovery of tape device after LIP · 3a874488
      James Smart authored
      On link up and node discovery, a remote port is registered with the SCSI
      transport and the driver sets fc4_xpt_flags to track transport
      registration.
      
      A link down event causes the driver to deregister with the SCSI transport,
      starting the devloss timer, and calls a local unreg routine to clear the
      login state. Part of the login state is the fc4_xpt_flags.  However, with
      tape devices that support sequence level error recovery, which wants to
      preserve the login, the local unreg routine is skipped, thus the flags
      aren't cleared.
      
      On a subsequent link up, ADISC is performed and the lpfc_nlp_reg_node()
      routine is called. As fc4_xpt_flags is not clear, the driver believes the
      node is already registered with the transport. Unfortunately, the
      registration was already terminated. Eventually the devloss tmo timer
      expires and tears down the device.
      
      Fix by ensuring the tape device, known by the ADISC flag, is always
      unregistered if the link drops.
      
      Link: https://lore.kernel.org/r/20210910233159.115896-6-jsmart2021@gmail.com
      Co-developed-by: Justin Tee <justin.tee@broadcom.com>
      Signed-off-by: Justin Tee <justin.tee@broadcom.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
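The stale-flag sequence described in the commit above can be traced with a toy state model in Python. Everything here is an assumption made for illustration: `RemotePort`, `xpt_flag` (standing in for fc4_xpt_flags), and `rediscovered` are hypothetical names, and the model ignores the devloss timer and all real driver state.

```python
class RemotePort:
    """Toy model of a remote port's transport registration state."""

    def __init__(self, preserves_login):
        # True for a device that wants its login preserved across link down,
        # such as the tape devices described in the commit message.
        self.preserves_login = preserves_login
        self.xpt_flag = False               # stands in for fc4_xpt_flags
        self.transport_registered = False

    def link_up(self):
        # Registration is skipped when the flag claims it already happened.
        if not self.xpt_flag:
            self.transport_registered = True
            self.xpt_flag = True

    def link_down(self, always_unregister):
        # The transport registration is always torn down on link down.
        self.transport_registered = False
        # Before the fix, the flag was only cleared when the local unreg
        # routine ran, which was skipped for login-preserving devices.
        if always_unregister or not self.preserves_login:
            self.xpt_flag = False


def rediscovered(always_unregister):
    """Return True if the port is registered again after a link bounce."""
    port = RemotePort(preserves_login=True)
    port.link_up()
    port.link_down(always_unregister)
    port.link_up()
    return port.transport_registered


if __name__ == "__main__":
    print("before fix:", rediscovered(always_unregister=False))
    print("after fix: ", rediscovered(always_unregister=True))
```

With `always_unregister=False` the flag survives the link bounce, so the second `link_up()` skips registration and the port stays unregistered; clearing it unconditionally restores rediscovery in this model.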
    • scsi: lpfc: Fix hang on unload due to stuck fport node · 88f77029
      James Smart authored
      A test scenario encountered an unload hang while an FLOGI ELS was in flight
      when a link down condition occurred.  The driver fails unload as it never
      releases the fport node.
      
      For most nodes, when the link drops, devloss tmo is started and the timeout
      will cause the final node release. For the Fport, as it has not yet
      registered with the SCSI transport, there is no devloss timer to be
      started, so there is no final release.  Additionally, the link down
      sequence causes ABORTS to be issued for pending ELS's. The completions from
      the ABORTS perform the release of node references.  However, as the adapter
      is being reset to be unloaded, those completions will never occur.
      
      Fix by the following:
      
       - In the ELS cleanup, recognize when unloading and place the ELS's on a
         different list that immediately cleans up/completes the ELS's.  It's
         recognized that this condition primarily affects only the fport, with
         other ports having normal clean up logic that handles things.
      
       - Resolve the devloss issue by, when cleaning up nodes after link down,
         recognizing when the fabric node does not have a completed state (its
         state is UNUSED) and removing a reference so the node can delete after
         the ELS reference is released.
      
      Link: https://lore.kernel.org/r/20210910233159.115896-5-jsmart2021@gmail.com
      Co-developed-by: Justin Tee <justin.tee@broadcom.com>
      Signed-off-by: Justin Tee <justin.tee@broadcom.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>