1. 09 May, 2018 8 commits
    • nvmet,rxe: defer ip datagram sending to tasklet · 1661d3b0
      Alexandru Moise authored
      This addresses 3 separate problems:
      
      1. When using NVMe over Fabrics, we may end up sending IP
      packets in interrupt context. This work should be deferred
      to a tasklet.
      
      [   50.939957] WARNING: CPU: 3 PID: 0 at kernel/softirq.c:161 __local_bh_enable_ip+0x1f/0xa0
      [   50.942602] CPU: 3 PID: 0 Comm: swapper/3 Kdump: loaded Tainted: G        W         4.17.0-rc3-ARCH+ #104
      [   50.945466] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
      [   50.948163] RIP: 0010:__local_bh_enable_ip+0x1f/0xa0
      [   50.949631] RSP: 0018:ffff88009c183900 EFLAGS: 00010006
      [   50.951029] RAX: 0000000080010403 RBX: 0000000000000200 RCX: 0000000000000001
      [   50.952636] RDX: 0000000000000000 RSI: 0000000000000200 RDI: ffffffff817e04ec
      [   50.954278] RBP: ffff88009c183910 R08: 0000000000000001 R09: 0000000000000614
      [   50.956000] R10: ffffea00021d5500 R11: 0000000000000001 R12: ffffffff817e04ec
      [   50.957779] R13: 0000000000000000 R14: ffff88009566f400 R15: ffff8800956c7000
      [   50.959402] FS:  0000000000000000(0000) GS:ffff88009c180000(0000) knlGS:0000000000000000
      [   50.961552] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   50.963798] CR2: 000055c4ec0ccac0 CR3: 0000000002209001 CR4: 00000000000606e0
      [   50.966121] Call Trace:
      [   50.966845]  <IRQ>
      [   50.967497]  __dev_queue_xmit+0x62d/0x690
      [   50.968722]  dev_queue_xmit+0x10/0x20
      [   50.969894]  neigh_resolve_output+0x173/0x190
      [   50.971244]  ip_finish_output2+0x2b8/0x370
      [   50.972527]  ip_finish_output+0x1d2/0x220
      [   50.973785]  ? ip_finish_output+0x1d2/0x220
      [   50.975010]  ip_output+0xd4/0x100
      [   50.975903]  ip_local_out+0x3b/0x50
      [   50.976823]  rxe_send+0x74/0x120
      [   50.977702]  rxe_requester+0xe3b/0x10b0
      [   50.978881]  ? ip_local_deliver_finish+0xd1/0xe0
      [   50.980260]  rxe_do_task+0x85/0x100
      [   50.981386]  rxe_run_task+0x2f/0x40
      [   50.982470]  rxe_post_send+0x51a/0x550
      [   50.983591]  nvmet_rdma_queue_response+0x10a/0x170
      [   50.985024]  __nvmet_req_complete+0x95/0xa0
      [   50.986287]  nvmet_req_complete+0x15/0x60
      [   50.987469]  nvmet_bio_done+0x2d/0x40
      [   50.988564]  bio_endio+0x12c/0x140
      [   50.989654]  blk_update_request+0x185/0x2a0
      [   50.990947]  blk_mq_end_request+0x1e/0x80
      [   50.991997]  nvme_complete_rq+0x1cc/0x1e0
      [   50.993171]  nvme_pci_complete_rq+0x117/0x120
      [   50.994355]  __blk_mq_complete_request+0x15e/0x180
      [   50.995988]  blk_mq_complete_request+0x6f/0xa0
      [   50.997304]  nvme_process_cq+0xe0/0x1b0
      [   50.998494]  nvme_irq+0x28/0x50
      [   50.999572]  __handle_irq_event_percpu+0xa2/0x1c0
      [   51.000986]  handle_irq_event_percpu+0x32/0x80
      [   51.002356]  handle_irq_event+0x3c/0x60
      [   51.003463]  handle_edge_irq+0x1c9/0x200
      [   51.004473]  handle_irq+0x23/0x30
      [   51.005363]  do_IRQ+0x46/0xd0
      [   51.006182]  common_interrupt+0xf/0xf
      [   51.007129]  </IRQ>
      
      2. rxe_post_send_kernel() must always offload its work to a tasklet
      when using NVMe-oF, in order to fix the lock ordering between the
      neigh->ha_lock seqlock and the nvme queue lock:
      
      [   77.833783]  Possible interrupt unsafe locking scenario:
      [   77.833783]
      [   77.835831]        CPU0                    CPU1
      [   77.837129]        ----                    ----
      [   77.838313]   lock(&(&n->ha_lock)->seqcount);
      [   77.839550]                                local_irq_disable();
      [   77.841377]                                lock(&(&nvmeq->q_lock)->rlock);
      [   77.843222]                                lock(&(&n->ha_lock)->seqcount);
      [   77.845178]   <Interrupt>
      [   77.846298]     lock(&(&nvmeq->q_lock)->rlock);
      [   77.847986]
      [   77.847986]  *** DEADLOCK ***
      
      3. The same goes for the lock ordering between sch->q.lock and the nvme queue lock:
      
      [   47.634271]  Possible interrupt unsafe locking scenario:
      [   47.634271]
      [   47.636452]        CPU0                    CPU1
      [   47.637861]        ----                    ----
      [   47.639285]   lock(&(&sch->q.lock)->rlock);
      [   47.640654]                                local_irq_disable();
      [   47.642451]                                lock(&(&nvmeq->q_lock)->rlock);
      [   47.644521]                                lock(&(&sch->q.lock)->rlock);
      [   47.646480]   <Interrupt>
      [   47.647263]     lock(&(&nvmeq->q_lock)->rlock);
      [   47.648492]
      [   47.648492]  *** DEADLOCK ***
      
      With this patch, NVMe-oF finally seems to be stable. Without it,
      rxe eventually deadlocks the whole system and causes RCU stalls.

      Signed-off-by: Alexandru Moise <00moses.alexander00@gmail.com>
      Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
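The deferral described in this commit can be pictured with a small userspace model. This is only a sketch under stated assumptions: `in_irq_context`, `post_send`, `run_deferred`, and `transmit` are hypothetical names, not the rxe or kernel tasklet API, and a plain array stands in for the work handed to a tasklet. It shows the caller only queuing the packet, with the transmit step running later from a context where sending is safe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PENDING 16

/* Hypothetical stand-in for "we are inside a hard interrupt handler". */
bool in_irq_context;

/* Hypothetical queue standing in for work handed to a tasklet. */
int pending[MAX_PENDING];
size_t pending_count;

/* Counters so the behaviour can be observed. */
int sent_in_irq;   /* transmissions that wrongly ran in IRQ context */
int sent_deferred; /* transmissions that ran outside IRQ context */

/* Models the transmit step, the part that must not run in IRQ context. */
void transmit(int pkt)
{
    (void)pkt;
    if (in_irq_context)
        sent_in_irq++;
    else
        sent_deferred++;
}

/*
 * Always queue; never transmit from the caller's context.
 * Returns 0 on success, -1 if the queue is full.
 */
int post_send(int pkt)
{
    if (pending_count == MAX_PENDING)
        return -1;
    pending[pending_count++] = pkt;
    return 0;
}

/* Models the tasklet body: drains the queue and transmits each packet. */
void run_deferred(void)
{
    for (size_t i = 0; i < pending_count; i++)
        transmit(pending[i]);
    pending_count = 0;
}
```

In this model, calling `post_send()` while `in_irq_context` is set transmits nothing; the packets go out only when `run_deferred()` is later called with the flag cleared.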
    • i40iw: Use correct address in dst_neigh_lookup for IPv6 · eeb1af4f
      Mustafa Ismail authored
      Using an incorrect structure address for the IPv6 neighbor lookup
      causes connections to IPv6 addresses to fail. Fix this by
      passing the correct address to dst_neigh_lookup.
      
      Fixes: f27b4746 ("i40iw: add connection management code")
      Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
      Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • i40iw: Fix memory leak in error path of create QP · 5a7189d5
      Mustafa Ismail authored
      If i40iw_allocate_dma_mem fails when creating a QP, the
      memory allocated for the QP structure using kzalloc is not
      freed because iwqp->allocated_buffer is used to free the
      memory and it is not set up until later. Fix this by setting
      iwqp->allocated_buffer before allocating the DMA memory.
      
      Fixes: d3749841 ("i40iw: add files for iwarp interface")
      Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
      Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
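The ordering problem in this commit can be reproduced in a userspace sketch. Everything here is an assumption made for illustration: `struct qp_ctx`, `create_qp`, `destroy_qp`, and the tracked allocator are hypothetical and are not the i40iw code. The sketch only models the idea that the cleanup path frees what the ownership pointer records, so the pointer has to be set before the step that can fail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_SLOTS 8

/* Table of live allocations, so leaks can be counted and reclaimed. */
void *slots[MAX_SLOTS];

void *tracked_alloc(size_t size)
{
    for (int i = 0; i < MAX_SLOTS; i++) {
        if (!slots[i]) {
            slots[i] = calloc(1, size);
            return slots[i];
        }
    }
    return NULL;
}

void tracked_free(void *p)
{
    for (int i = 0; p && i < MAX_SLOTS; i++) {
        if (slots[i] == p) {
            slots[i] = NULL;
            free(p);
            return;
        }
    }
}

/* Frees whatever was never released and reports how many blocks leaked. */
int reclaim_leaks(void)
{
    int leaked = 0;
    for (int i = 0; i < MAX_SLOTS; i++) {
        if (slots[i]) {
            free(slots[i]);
            slots[i] = NULL;
            leaked++;
        }
    }
    return leaked;
}

/* Hypothetical, simplified stand-in for the QP bookkeeping. */
struct qp_ctx {
    void *allocated_buffer; /* what the cleanup path frees */
    void *dma_mem;
};

/* Cleanup only knows about memory reachable through the context. */
void destroy_qp(struct qp_ctx *ctx)
{
    tracked_free(ctx->dma_mem);
    tracked_free(ctx->allocated_buffer);
    ctx->dma_mem = NULL;
    ctx->allocated_buffer = NULL;
}

/*
 * dma_fails simulates a failing DMA allocation.
 * record_early selects the fixed ordering (nonzero) or the old one (zero).
 */
int create_qp(struct qp_ctx *ctx, int dma_fails, int record_early)
{
    void *mem = tracked_alloc(64);
    if (!mem)
        return -1;

    if (record_early)
        ctx->allocated_buffer = mem; /* fixed ordering */

    ctx->dma_mem = dma_fails ? NULL : tracked_alloc(128);
    if (!ctx->dma_mem) {
        destroy_qp(ctx); /* frees mem only if it was already recorded */
        return -1;
    }

    ctx->allocated_buffer = mem; /* old ordering: recorded only on success */
    return 0;
}
```

With `record_early` set to zero and a simulated DMA failure, `reclaim_leaks()` finds one leaked block; with `record_early` set, the error path frees it and nothing is left to reclaim.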
    • RDMA/mlx5: Use proper spec flow label type · 37da2a03
      Daria Velikovsky authored
      Flow label is defined as u32 in the IPv6 flow spec, but
      used internally in the flow spec parsing as u8. That
      caused part of the flow_label value to be lost.
      
      Fixes: 2d1e697e ('IB/mlx5: Add support to match inner packet fields')
      Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
      Signed-off-by: Daria Velikovsky <daria@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
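The truncation can be reproduced in isolation. In this sketch, `parse_flow_label_u8` and `parse_flow_label_u32` are hypothetical helpers, not mlx5 functions; they only contrast an 8-bit parameter with a 32-bit one for the 20-bit IPv6 flow label.

```c
#include <assert.h>
#include <stdint.h>

/* The IPv6 flow label is a 20-bit field. */
#define FLOW_LABEL_MASK 0xFFFFFu

/* Narrow parameter: converting the label to uint8_t keeps only its low 8 bits. */
uint32_t parse_flow_label_u8(uint8_t flow_label)
{
    return flow_label & FLOW_LABEL_MASK;
}

/* Wide parameter: all 20 bits of the label survive. */
uint32_t parse_flow_label_u32(uint32_t flow_label)
{
    return flow_label & FLOW_LABEL_MASK;
}
```

For a label of `0xABCDE`, the 8-bit variant returns `0xDE`, while the 32-bit variant returns the full `0xABCDE`.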
    • RDMA/mlx5: Don't assume that medium blueFlame register exists · 18b0362e
      Yishai Hadas authored
      A user can leave the system without any medium BlueFlame registers;
      however, the code assumed that at least one such register exists.

      This patch removes that assumption.
      
      Fixes: c1be5232 ("IB/mlx5: Fix micro UAR allocator")
      Reported-by: Rohit Zambre <rzambre@uci.edu>
      Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/hfi1: Use after free race condition in send context error path · f9e76ca3
      Michael J. Ruhl authored
      A pio send egress error can occur when the PSM library attempts
      to send a bad packet.  That issue is still being investigated.
      
      The pio error interrupt handler then attempts to progress the recovery
      of the errored pio send context.
      
      Code inspection reveals that the handling lacks the necessary locking
      if that recovery interleaves with a PSM close of the "context" object
      that contains the pio send context.
      
      The lack of the locking can cause the recovery to access the already
      freed pio send context object and incorrectly deduce that the pio
      send context is actually a kernel pio send context as shown by the
      NULL deref stack below:
      
      [<ffffffff8143d78c>] _dev_info+0x6c/0x90
      [<ffffffffc0613230>] sc_restart+0x70/0x1f0 [hfi1]
      [<ffffffff816ab124>] ? __schedule+0x424/0x9b0
      [<ffffffffc06133c5>] sc_halted+0x15/0x20 [hfi1]
      [<ffffffff810aa3ba>] process_one_work+0x17a/0x440
      [<ffffffff810ab086>] worker_thread+0x126/0x3c0
      [<ffffffff810aaf60>] ? manage_workers.isra.24+0x2a0/0x2a0
      [<ffffffff810b252f>] kthread+0xcf/0xe0
      [<ffffffff810b2460>] ? insert_kthread_work+0x40/0x40
      [<ffffffff816b8798>] ret_from_fork+0x58/0x90
      [<ffffffff810b2460>] ? insert_kthread_work+0x40/0x40
      
      This is the best case scenario and other scenarios can corrupt the
      already freed memory.
      
      Fix by adding the necessary locking in the pio send context error
      handler.
      
      Cc: <stable@vger.kernel.org> # 4.9.x
      Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
      Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • MAINTAINERS: Remove bouncing @mellanox.com addresses · 27f70620
      Leon Romanovsky authored
      Delete non-existent @mellanox.com addresses from the MAINTAINERS file.

      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB: remove redundant INFINIBAND kconfig dependencies · 9533b292
      Greg Thelen authored
      INFINIBAND_ADDR_TRANS depends on INFINIBAND.  So there's no need for
      options which depend on INFINIBAND_ADDR_TRANS to also depend on INFINIBAND.
      Remove the unnecessary INFINIBAND dependencies.

      Signed-off-by: Greg Thelen <gthelen@google.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
2. 03 May, 2018 8 commits
3. 30 Apr, 2018 1 commit
4. 27 Apr, 2018 23 commits