1. 30 Apr, 2016 1 commit
    • Linus Torvalds's avatar
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma · 925d96a0
      Linus Torvalds authored
      Pull rdma fixes from Doug Ledford:
       "Final set of -rc fixes for 4.6.
      
        I've collected up a number of patches that are all pretty small with
        the exception of only a couple.  The hfi1 driver has a number of
        important patches, and it is what really drives the line count of this
        pull request up.  These are all small and I've got this kernel built
        and running in the test lab (I have most of the hardware, I think nes
        is the only thing in this patch set that I can't say I've personally
        tested and have up and running).
      
        Summary:
      
         - A number of collected fixes for oopses, memory corruptions,
           deadlocks, etc.  All of these fixes are small (many only 5-10
           lines), obvious, and tested.
      
         - Fix for the security issue related to the use of write for
           bi-directional communications"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
        RDMA/nes: don't leak skb if carrier down
        IB/security: Restrict use of the write() interface
        IB/hfi1: Use kernel default llseek for ui device
        IB/hfi1: Don't attempt to free resources if initialization failed
        IB/hfi1: Fix missing lock/unlock in verbs drain callback
        IB/rdmavt: Fix send scheduling
        IB/hfi1: Prevent unpinning of wrong pages
        IB/hfi1: Fix deadlock caused by locking with wrong scope
        IB/hfi1: Prevent NULL pointer deferences in caching code
        MAINTAINERS: Update iser/isert maintainer contact info
        IB/mlx5: Expose correct max_sge_rd limit
        RDMA/iw_cxgb4: Fix bar2 virt addr calculation for T4 chips
        iw_cxgb4: handle draining an idle qp
        iw_cxgb3: initialize ibdev.iwcm->ifname for port mapping
        iw_cxgb4: initialize ibdev.iwcm->ifname for port mapping
        IB/core: Don't drain non-existent rq queue-pair
        IB/core: Fix oops in ib_cache_gid_set_default_gid
      925d96a0
  2. 29 Apr, 2016 30 commits
  3. 28 Apr, 2016 9 commits
    • Jason Gunthorpe's avatar
      IB/security: Restrict use of the write() interface · e6bd18f5
      Jason Gunthorpe authored
      The drivers/infiniband stack uses write() as a replacement for
      bi-directional ioctl().  This is not safe. There are ways to
      trigger write calls that result in the return structure that
      is normally written to user space being shunted off to user
      specified kernel memory instead.
      
      For the immediate repair, detect and deny suspicious accesses to
      the write API.
      
      For long term, update the user space libraries and the kernel API
      to something that doesn't present the same security vulnerabilities
      (likely a structured ioctl() interface).
      
      The impacted uAPI interfaces are generally only available if
      hardware from drivers/infiniband is installed in the system.
      Reported-by: default avatarJann Horn <jann@thejh.net>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJason Gunthorpe <jgunthorpe@obsidianresearch.com>
      [ Expanded check to all known write() entry points ]
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      e6bd18f5
    • Dean Luick's avatar
      IB/hfi1: Use kernel default llseek for ui device · 7723d8c2
      Dean Luick authored
      The ui device llseek had a mistake with SEEK_END and did
      not fully follow seek semantics.  Correct all this by
      using a kernel supplied function for fixed size devices.
      
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Reviewed-by: default avatarDennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: default avatarDean Luick <dean.luick@intel.com>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      7723d8c2
    • Mitko Haralanov's avatar
      IB/hfi1: Don't attempt to free resources if initialization failed · 94158442
      Mitko Haralanov authored
      Attempting to free resources which have not been allocated and
      initialized properly led to the following kernel backtrace:
      
          BUG: unable to handle kernel NULL pointer dereference at           (null)
          IP: [<ffffffffa09658fe>] unlock_exp_tids.isra.8+0x2e/0x120 [hfi1]
          PGD 852a43067 PUD 85d4a6067 PMD 0
          Oops: 0000 [#1] SMP
          CPU: 0 PID: 2831 Comm: osu_bw Tainted: G          IO 3.12.18-wfr+ #1
          task: ffff88085b15b540 ti: ffff8808588fe000 task.ti: ffff8808588fe000
          RIP: 0010:[<ffffffffa09658fe>]  [<ffffffffa09658fe>] unlock_exp_tids.isra.8+0x2e/0x120 [hfi1]
          RSP: 0018:ffff8808588ffde0  EFLAGS: 00010282
          RAX: 0000000000000000 RBX: ffff880858a31800 RCX: 0000000000000000
          RDX: ffff88085d971bc0 RSI: ffff880858a318f8 RDI: ffff880858a318c0
          RBP: ffff8808588ffe20 R08: 0000000000000000 R09: 0000000000000000
          R10: ffff88087ffd6f40 R11: 0000000001100348 R12: ffff880852900000
          R13: ffff880858a318c0 R14: 0000000000000000 R15: ffff88085d971be8
          FS:  00007f4674e83740(0000) GS:ffff88087f400000(0000) knlGS:0000000000000000
          CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
          CR2: 0000000000000000 CR3: 000000085c377000 CR4: 00000000001407f0
          Stack:
           ffffffffa0941a71 ffff880858a318f8 ffff88085d971bc0 ffff880858a31800
           ffff880852900000 ffff880858a31800 00000000003ffff7 ffff88085d971bc0
           ffff8808588ffe60 ffffffffa09663fc ffff8808588ffe60 ffff880858a31800
          Call Trace:
           [<ffffffffa0941a71>] ? find_mmu_handler+0x51/0x70 [hfi1]
           [<ffffffffa09663fc>] hfi1_user_exp_rcv_free+0x6c/0x120 [hfi1]
           [<ffffffffa0932809>] hfi1_file_close+0x1a9/0x340 [hfi1]
           [<ffffffff8116c189>] __fput+0xe9/0x270
           [<ffffffff8116c35e>] ____fput+0xe/0x10
           [<ffffffff81065707>] task_work_run+0xa7/0xe0
           [<ffffffff81002969>] do_notify_resume+0x59/0x80
           [<ffffffff814ffc1a>] int_signal+0x12/0x17
      
      This commit re-arranges the context initialization code in a way that
      would allow for context event flags to be used to determine whether
      the context has been successfully initialized.
      
      In turn, this can be used to skip the resource de-allocation if they
      were never allocated in the first place.
      
      Fixes: 3abb33ac ("staging/hfi1: Add TID cache receive init and free funcs")
      Reviewed-by: default avatarDennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: default avatarMitko Haralanov <mitko.haralanov@intel.com>
      Reviewed-by: Leon Romanovsky <leonro@mellanox.com.
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      94158442
    • Mike Marciniszyn's avatar
      IB/hfi1: Fix missing lock/unlock in verbs drain callback · b9b06cb6
      Mike Marciniszyn authored
      The iowait_sdma_drained() callback lacked locking to
      protect the qp s_flags field.
      
      This causes the s_flags to be out of sync
      on multiple CPUs, potentially corrupting the s_flags.
      
      Fixes: a545f530 ("staging/rdma/hfi: fix CQ completion order issue")
      Reviewed-by: default avatarSebastian Sanchez <sebastian.sanchez@intel.com>
      Signed-off-by: default avatarMike Marciniszyn <mike.marciniszyn@intel.com>
      Reviewed-by: default avatarLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      b9b06cb6
    • Jubin John's avatar
      IB/rdmavt: Fix send scheduling · e6d2e017
      Jubin John authored
      call_send is used to determine whether to send immediately or schedule
      a send for later. The current logic in rdmavt is inverted and has a
      negative impact on the latency of the hfi1 and qib drivers. Fix this
      regression by correctly calling send immediately when call_send is set.
      Reviewed-by: default avatarDennis Dalessandro <dennis.dalessandro@intel.com>
      Reviewed-by: default avatarMike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: default avatarJubin John <jubin.john@intel.com>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      e6d2e017
    • Mitko Haralanov's avatar
      IB/hfi1: Prevent unpinning of wrong pages · 849e3e93
      Mitko Haralanov authored
      The routine used by the SDMA cache to handle already
      cached nodes can extend an already existing node.
      
      In its error handling code, the routine will unpin pages
      when not all pages of the buffer extension were pinned.
      
      There was a bug in that part of the routine, which would
      mistakenly unpin pages from the original set rather than
      the newly pinned pages.
      
      This commit fixes that bug by offsetting the page array
      to the proper place pointing at the beginning of the newly
      pinned pages.
      Reviewed-by: default avatarDean Luick <dean.luick@intel.com>
      Signed-off-by: default avatarMitko Haralanov <mitko.haralanov@intel.com>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      849e3e93
    • Mitko Haralanov's avatar
      IB/hfi1: Fix deadlock caused by locking with wrong scope · de82bdff
      Mitko Haralanov authored
      The locking around the interval RB tree is designed to prevent
      access to the tree while it's being modified. The locking in its
      current form is too overzealous, which is causing a deadlock in
      certain cases with the following backtrace:
      
          Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
          CPU: 0 PID: 5836 Comm: IMB-MPI1 Tainted: G           O 3.12.18-wfr+ #1
           0000000000000000 ffff88087f206c50 ffffffff814f1caa ffffffff817b53f0
           ffff88087f206cc8 ffffffff814ecd56 0000000000000010 ffff88087f206cd8
           ffff88087f206c78 0000000000000000 0000000000000000 0000000000001662
          Call Trace:
           <NMI>  [<ffffffff814f1caa>] dump_stack+0x45/0x56
           [<ffffffff814ecd56>] panic+0xc2/0x1cb
           [<ffffffff810d4370>] ? restart_watchdog_hrtimer+0x50/0x50
           [<ffffffff810d4432>] watchdog_overflow_callback+0xc2/0xd0
           [<ffffffff81109b4e>] __perf_event_overflow+0x8e/0x2b0
           [<ffffffff8110a714>] perf_event_overflow+0x14/0x20
           [<ffffffff8101c906>] intel_pmu_handle_irq+0x1b6/0x390
           [<ffffffff814f927b>] perf_event_nmi_handler+0x2b/0x50
           [<ffffffff814f8ad8>] nmi_handle.isra.3+0x88/0x180
           [<ffffffff814f8d39>] do_nmi+0x169/0x310
           [<ffffffff814f8177>] end_repeat_nmi+0x1e/0x2e
           [<ffffffff81272600>] ? unmap_single+0x30/0x30
           [<ffffffff814f780d>] ? _raw_spin_lock_irqsave+0x2d/0x40
           [<ffffffff814f780d>] ? _raw_spin_lock_irqsave+0x2d/0x40
           [<ffffffff814f780d>] ? _raw_spin_lock_irqsave+0x2d/0x40
           <<EOE>>  <IRQ>  [<ffffffffa056c4a8>] hfi1_mmu_rb_search+0x38/0x70 [hfi1]
           [<ffffffffa05919cb>] user_sdma_free_request+0xcb/0x120 [hfi1]
           [<ffffffffa0593393>] user_sdma_txreq_cb+0x263/0x350 [hfi1]
           [<ffffffffa057fad7>] ? sdma_txclean+0x27/0x1c0 [hfi1]
           [<ffffffffa0593130>] ? user_sdma_send_pkts+0x1710/0x1710 [hfi1]
           [<ffffffffa057fdd6>] sdma_make_progress+0x166/0x480 [hfi1]
           [<ffffffff810762c9>] ? ttwu_do_wakeup+0x19/0xd0
           [<ffffffffa0581c7e>] sdma_engine_interrupt+0x8e/0x100 [hfi1]
           [<ffffffffa0546bdd>] sdma_interrupt+0x5d/0xa0 [hfi1]
           [<ffffffff81097e57>] handle_irq_event_percpu+0x47/0x1d0
           [<ffffffff81098017>] handle_irq_event+0x37/0x60
           [<ffffffff8109aa5f>] handle_edge_irq+0x6f/0x120
           [<ffffffff810044af>] handle_irq+0xbf/0x150
           [<ffffffff8104c9b7>] ? irq_enter+0x17/0x80
           [<ffffffff8150168d>] do_IRQ+0x4d/0xc0
           [<ffffffff814f7c6a>] common_interrupt+0x6a/0x6a
           <EOI>  [<ffffffff81073524>] ? finish_task_switch+0x54/0xe0
           [<ffffffff814f56c6>] __schedule+0x3b6/0x7e0
           [<ffffffff810763a6>] __cond_resched+0x26/0x30
           [<ffffffff814f5eda>] _cond_resched+0x3a/0x50
           [<ffffffff814f4f82>] down_write+0x12/0x30
           [<ffffffffa0591619>] hfi1_release_user_pages+0x69/0x90 [hfi1]
           [<ffffffffa059173a>] sdma_rb_remove+0x9a/0xc0 [hfi1]
           [<ffffffffa056c00d>] __mmu_rb_remove.isra.5+0x5d/0x70 [hfi1]
           [<ffffffffa056c536>] hfi1_mmu_rb_remove+0x56/0x70 [hfi1]
           [<ffffffffa059427b>] hfi1_user_sdma_process_request+0x74b/0x1160 [hfi1]
           [<ffffffffa055c763>] hfi1_aio_write+0xc3/0x100 [hfi1]
           [<ffffffff8116a14c>] do_sync_readv_writev+0x4c/0x80
           [<ffffffff8116b58b>] do_readv_writev+0xbb/0x230
           [<ffffffff811a9da1>] ? fsnotify+0x241/0x320
           [<ffffffff81073524>] ? finish_task_switch+0x54/0xe0
           [<ffffffff8116b795>] vfs_writev+0x35/0x60
           [<ffffffff8116b8c9>] SyS_writev+0x49/0xc0
           [<ffffffff810cd876>] ? __audit_syscall_exit+0x1f6/0x2a0
           [<ffffffff814ff992>] system_call_fastpath+0x16/0x1b
      
      As evident from the backtrace above, the process was being put to sleep
      while holding the lock.
      
      Limiting the scope of the lock only to the RB tree operation fixes the
      above error allowing for proper locking and the process being put to
      sleep when needed.
      Reviewed-by: default avatarDennis Dalessandro <dennis.dalessandro@intel.com>
      Reviewed-by: default avatarDean Luick <dean.luick@intel.com>
      Signed-off-by: default avatarMitko Haralanov <mitko.haralanov@intel.com>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      de82bdff
    • Mitko Haralanov's avatar
      IB/hfi1: Prevent NULL pointer deferences in caching code · f19bd643
      Mitko Haralanov authored
      There is a potential kernel crash when the MMU notifier calls the
      invalidation routines in the hfi1 pinned page caching code for sdma.
      
      The invalidation routine could call the remove callback
      for the node, which in turn ends up dereferencing the
      current task_struct to get a pointer to the mm_struct.
      However, the mm_struct pointer could be NULL resulting in
      the following backtrace:
      
          BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
          IP: [<ffffffffa041f75a>] sdma_rb_remove+0xaa/0x100 [hfi1]
          15
          task: ffff88085e66e080 ti: ffff88085c244000 task.ti: ffff88085c244000
          RIP: 0010:[<ffffffffa041f75a>]  [<ffffffffa041f75a>] sdma_rb_remove+0xaa/0x100 [hfi1]
          RSP: 0000:ffff88085c245878  EFLAGS: 00010002
          RAX: 0000000000000000 RBX: ffff88105b9bbd40 RCX: ffffea003931a830
          RDX: 0000000000000004 RSI: ffff88105754a9c0 RDI: ffff88105754a9c0
          RBP: ffff88085c245890 R08: ffff88105b9bbd70 R09: 00000000fffffffb
          R10: ffff88105b9bbd58 R11: 0000000000000013 R12: ffff88105754a9c0
          R13: 0000000000000001 R14: 0000000000000001 R15: ffff88105b9bbd40
          FS:  0000000000000000(0000) GS:ffff88107ef40000(0000) knlGS:0000000000000000
          CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
          CR2: 00000000000000a8 CR3: 0000000001a0b000 CR4: 00000000001407e0
          Stack:
           ffff88105b9bbd40 ffff88080ec481a8 ffff88080ec481b8 ffff88085c2458c0
           ffffffffa03fa00e ffff88080ec48190 ffff88080ed9cd00 0000000001024000
           0000000000000000 ffff88085c245920 ffffffffa03fa0e7 0000000000000282
          Call Trace:
           [<ffffffffa03fa00e>] __mmu_rb_remove.isra.5+0x5e/0x70 [hfi1]
           [<ffffffffa03fa0e7>] mmu_notifier_mem_invalidate+0xc7/0xf0 [hfi1]
           [<ffffffffa03fa143>] mmu_notifier_page+0x13/0x20 [hfi1]
           [<ffffffff81156dd0>] __mmu_notifier_invalidate_page+0x50/0x70
           [<ffffffff81140bbb>] try_to_unmap_one+0x20b/0x470
           [<ffffffff81141ee7>] try_to_unmap_anon+0xa7/0x120
           [<ffffffff81141fad>] try_to_unmap+0x4d/0x60
           [<ffffffff8111fd7b>] shrink_page_list+0x2eb/0x9d0
           [<ffffffff81120ab3>] shrink_inactive_list+0x243/0x490
           [<ffffffff81121491>] shrink_lruvec+0x4c1/0x640
           [<ffffffff81121641>] shrink_zone+0x31/0x100
           [<ffffffff81121b0f>] kswapd_shrink_zone.constprop.62+0xef/0x1c0
           [<ffffffff811229e3>] kswapd+0x403/0x7e0
           [<ffffffff811225e0>] ? shrink_all_memory+0xf0/0xf0
           [<ffffffff81068ac0>] kthread+0xc0/0xd0
           [<ffffffff81068a00>] ? insert_kthread_work+0x40/0x40
           [<ffffffff814ff8ec>] ret_from_fork+0x7c/0xb0
           [<ffffffff81068a00>] ? insert_kthread_work+0x40/0x40
      
      To correct this, the mm_struct passed to us by the MMU notifier is
      used (which is what should have been done to begin with). This avoids
      the broken derefences and ensures that the correct mm_struct is used.
      Reviewed-by: default avatarDennis Dalessandro <dennis.dalessandro@intel.com>
      Reviewed-by: default avatarDean Luick <dean.luick@intel.com>
      Signed-off-by: default avatarMitko Haralanov <mitko.haralanov@intel.com>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      f19bd643
    • Sagi Grimberg's avatar