1. 07 Mar, 2018 6 commits
  2. 23 Feb, 2018 15 commits
  3. 15 Feb, 2018 7 commits
    • IB/mlx5: Implement fragmented completion queue (CQ) · 388ca8be
      Yonatan Cohen authored
      The current implementation of create CQ requires contiguous
      memory. This requirement is problematic when memory is
      fragmented or the system is low on memory, because it causes
      failures in dma_zalloc_coherent().
      
      This patch implements a new fragmented CQ scheme to overcome
      this issue by introducing a new type, 'struct mlx5_frag_buf_ctrl',
      to allocate fragmented buffers rather than contiguous ones.
      
      Base the Completion Queues (CQs) on this new fragmented buffer.
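
As an illustration of the fragmented-buffer idea above, the Python sketch below models a queue whose entries live in page-sized fragments instead of one contiguous allocation, and shows how an entry index maps to a (fragment, offset) pair. It is not the driver's code: the class `FragBuf`, its fields, and the 4096-byte fragment size are assumptions chosen for the example.

```python
PAGE_SHIFT = 12               # assumed fragment size: 4096 bytes
PAGE_SIZE = 1 << PAGE_SHIFT


class FragBuf:
    """Hypothetical ring of fixed-size entries stored in page-sized fragments."""

    def __init__(self, log_entries, log_stride):
        # log_stride is log2 of the entry size and must not exceed PAGE_SHIFT.
        self.log_stride = log_stride
        # log2 of the number of entries that fit in one fragment.
        self.log_frag_strides = PAGE_SHIFT - log_stride
        self.frag_mask = (1 << self.log_frag_strides) - 1
        self.index_mask = (1 << log_entries) - 1
        total_bytes = (1 << log_entries) << log_stride
        n_frags = max(1, (total_bytes + PAGE_SIZE - 1) // PAGE_SIZE)
        # Each fragment is allocated on its own; none needs to be adjacent
        # to any other in memory.
        self.frags = [bytearray(PAGE_SIZE) for _ in range(n_frags)]

    def locate(self, ix):
        """Map an entry index to (fragment number, byte offset in fragment)."""
        ix &= self.index_mask                     # wrap around the ring
        frag = ix >> self.log_frag_strides
        offset = (ix & self.frag_mask) << self.log_stride
        return frag, offset

    def entry(self, ix):
        """Return a writable view of one entry."""
        frag, offset = self.locate(ix)
        size = 1 << self.log_stride
        return memoryview(self.frags[frag])[offset:offset + size]
```

With 256 entries of 64 bytes each, the 16 KiB queue in this sketch is backed by four independent 4 KiB fragments, so no multi-page contiguous allocation is needed.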
      
      It fixes the following crashes:
      kworker/29:0: page allocation failure: order:6, mode:0x80d0
      CPU: 29 PID: 8374 Comm: kworker/29:0 Tainted: G OE 3.10.0
      Workqueue: ib_cm cm_work_handler [ib_cm]
      Call Trace:
      [<>] dump_stack+0x19/0x1b
      [<>] warn_alloc_failed+0x110/0x180
      [<>] __alloc_pages_slowpath+0x6b7/0x725
      [<>] __alloc_pages_nodemask+0x405/0x420
      [<>] dma_generic_alloc_coherent+0x8f/0x140
      [<>] x86_swiotlb_alloc_coherent+0x21/0x50
      [<>] mlx5_dma_zalloc_coherent_node+0xad/0x110 [mlx5_core]
      [<>] ? mlx5_db_alloc_node+0x69/0x1b0 [mlx5_core]
      [<>] mlx5_buf_alloc_node+0x3e/0xa0 [mlx5_core]
      [<>] mlx5_buf_alloc+0x14/0x20 [mlx5_core]
      [<>] create_cq_kernel+0x90/0x1f0 [mlx5_ib]
      [<>] mlx5_ib_create_cq+0x3b0/0x4e0 [mlx5_ib]
      Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: Remove redundant EQ API exports · 3ec5693b
      Saeed Mahameed authored
      The EQ structure and API are private to the mlx5_core driver; external
      drivers should not have access to, or the means to manipulate, EQ objects.
      
      Remove redundant exports and move API functions out of the linux/mlx5
      include directory into the driver's mlx5_core.h private include file.
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Reviewed-by: Gal Pressman <galp@mellanox.com>
    • net/mlx5: Move CQ completion and event forwarding logic to eq.c · 3ac7afdb
      Saeed Mahameed authored
      Since the CQ tree is now per EQ, CQ completion and event forwarding have
      become a specific implementation of EQ logic. This patch moves that logic
      to eq.c and makes those functions static.
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Reviewed-by: Gal Pressman <galp@mellanox.com>
    • net/mlx5: CQ hold/put API · f105b45b
      Saeed Mahameed authored
      Now that the CQ table is per EQ, add an API to hold/put a CQ, to be used
      from eq.c in a downstream patch.
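
The hold/put pairing described above is a reference-counting pattern. The commit text does not show the implementation, so the Python sketch below is only a generic illustration: the class `CompletionQueue`, its `freed` event, and the initial count of one are assumptions made for the example.

```python
import threading


class CompletionQueue:
    """Hypothetical CQ object with hold/put reference counting."""

    def __init__(self):
        self._refs = 1                  # assumed: the creator owns one reference
        self._lock = threading.Lock()
        self.freed = threading.Event()  # set once the last reference is dropped

    def hold(self):
        """Take an extra reference, e.g. while an event handler uses the CQ."""
        with self._lock:
            self._refs += 1

    def put(self):
        """Drop a reference; the last put signals that teardown may proceed."""
        with self._lock:
            self._refs -= 1
            last = self._refs == 0
        if last:
            self.freed.set()
```

A destroy path in this sketch would drop the creator's reference with `put()` and then wait on `freed`, so the object is never released while a handler still holds it.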
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Reviewed-by: Gal Pressman <galp@mellanox.com>
    • net/mlx5: EQ add/del CQ API · d5c07157
      Saeed Mahameed authored
      Add an API to add/del a CQ to/from an EQ's CQ table, to be used in cq.c
      upon CQ creation/destruction, as the CQ table is now private to eq.c.
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Reviewed-by: Gal Pressman <galp@mellanox.com>
    • net/mlx5: Add missing likely/unlikely hints to cq events · d2ff4fa5
      Saeed Mahameed authored
      If a hardware event is targeting a CQ, that CQ should exist.
      Add unlikely to error handling flows.
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Reviewed-by: Gal Pressman <galp@mellanox.com>
    • net/mlx5: CQ Database per EQ · 02d92f79
      Saeed Mahameed authored
      Before this patch the driver had one CQ database protected by one
      spinlock. This spinlock is meant to synchronize CQ adding/removing
      with CQ IRQ interrupt handling.
      
      On a system with a large number of CPUs and a workload that requires
      lots of interrupts, this global spinlock becomes a very nasty hotspot
      and introduces contention between the active cores, which
      significantly hurts performance and becomes a bottleneck that prevents
      seamless CPU scaling.
      
      To solve this we simply move the CQ database and its spinlock to be per
      EQ (IRQ), thus per core.
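
The change described above can be pictured with a small Python model in which every EQ owns its own CQ table and its own lock, so completions arriving on different EQs never contend on a shared lock. This is a sketch under stated assumptions, not the driver's code: `EventQueue`, `add_cq`, `del_cq` and `handle_completion` are hypothetical names, and a dict guarded by `threading.Lock` stands in for the driver's table and spinlock.

```python
import threading


class EventQueue:
    """Hypothetical EQ (one per IRQ) that owns a private CQ table and lock."""

    def __init__(self, eqn):
        self.eqn = eqn
        self.lock = threading.Lock()   # per-EQ lock instead of one global lock
        self.cqs = {}                  # cqn -> completion callback

    def add_cq(self, cqn, on_completion):
        """Register a CQ with this EQ (CQ creation path)."""
        with self.lock:
            if cqn in self.cqs:
                raise KeyError(f"CQ {cqn} already registered on EQ {self.eqn}")
            self.cqs[cqn] = on_completion

    def del_cq(self, cqn):
        """Unregister a CQ from this EQ (CQ destruction path)."""
        with self.lock:
            return self.cqs.pop(cqn)

    def handle_completion(self, cqn):
        """Interrupt path: look up the CQ in this EQ's table only."""
        with self.lock:
            on_completion = self.cqs.get(cqn)
        if on_completion is None:
            return False               # unexpected: event for an unknown CQ
        on_completion()
        return True
```

In this model a lookup on one EQ takes only that EQ's lock, which is the property the commit relies on to remove the shared hotspot.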
      
      Tested with:
      system: 2 sockets, 14 cores per socket, hyperthreading, 2x14x2=56 cores
      netperf command: ./super_netperf 200 -P 0 -t TCP_RR  -H <server> -l 30 -- -r 300,300 -o -s 1M,1M -S 1M,1M
      
      WITHOUT THIS PATCH:
      Average:     CPU    %usr   %nice    %sys %iowait    %irq   %soft %steal  %guest  %gnice   %idle
      Average:     all    4.32    0.00   36.15    0.09    0.00   34.02   0.00    0.00    0.00   25.41
      
      Samples: 2M of event 'cycles:pp', Event count (approx.): 1554616897271
      Overhead  Command          Shared Object                 Symbol
      +   14.28%  swapper          [kernel.vmlinux]              [k] intel_idle
      +   12.25%  swapper          [kernel.vmlinux]              [k] queued_spin_lock_slowpath
      +   10.29%  netserver        [kernel.vmlinux]              [k] queued_spin_lock_slowpath
      +    1.32%  netserver        [kernel.vmlinux]              [k] mlx5e_xmit
      
      WITH THIS PATCH:
      Average:     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
      Average:     all    4.27    0.00   34.31    0.01    0.00   18.71    0.00    0.00    0.00   42.69
      
      Samples: 2M of event 'cycles:pp', Event count (approx.): 1498132937483
      Overhead  Command          Shared Object             Symbol
      +   23.33%  swapper          [kernel.vmlinux]          [k] intel_idle
      +    1.69%  netserver        [kernel.vmlinux]          [k] mlx5e_xmit
      Tested-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Reviewed-by: Gal Pressman <galp@mellanox.com>
  4. 11 Feb, 2018 9 commits
    • Linux 4.16-rc1 · 7928b2cb
      Linus Torvalds authored
    • unify {de,}mangle_poll(), get rid of kernel-side POLL... · 7a163b21
      Al Viro authored
      except, again, POLLFREE and POLL_BUSY_LOOP.
      
      With this, we finally get to the promised end result:
      
       - POLL{IN,OUT,...} are plain integers and *not* in __poll_t, so any
         stray instances of ->poll() still using those will be caught by
         sparse.
      
       - eventpoll.c and select.c warning-free wrt __poll_t
      
       - no more kernel-side definitions of POLL... - userland ones are
         visible through the entire kernel (and used pretty much only for
         mangle/demangle)
      
       - same behavior as after the first series (i.e. sparc et al. epoll(2)
         working correctly).
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • vfs: do bulk POLL* -> EPOLL* replacement · a9a08845
      Linus Torvalds authored
      This is the mindless scripted replacement of kernel use of POLL*
      variables as described by Al, done by this script:
      
          for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
              L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
              for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
          done
      
      with de-mangling cleanups yet to come.
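
To make the effect of the sed expression above concrete, here is a hedged Python equivalent applied to single lines of text. `rewrite_line` and `POLL_NAMES` are names invented for this sketch, and Python's `\b` stands in for sed's `\<` and `\>` word anchors; the file selection done by `git grep` in the script is not modelled.

```python
import re

# The constant suffixes listed in the script's outer loop.
POLL_NAMES = ["IN", "OUT", "PRI", "ERR", "RDNORM", "RDBAND",
              "WRNORM", "WRBAND", "HUP", "RDHUP", "NVAL", "MSG"]


def rewrite_line(line):
    """Prefix whole-word POLL<name> with 'E', one pass per name.

    Mirrors s/^\\([^"]*\\)\\(\\<POLL$V\\>\\)/\\1E\\2/ : the match is anchored
    at the start of the line and the prefix may not contain a double quote.
    """
    for name in POLL_NAMES:
        pattern = r'^([^"]*)\b(POLL' + name + r')\b'
        line = re.sub(pattern, r'\1E\2', line)
    return line
```

Because the pattern is anchored at the start of the line and `[^"]*` cannot cross a double quote, names that appear after a quote character on the line are left alone, as are names that already carry the E prefix and names outside the list such as POLLFREE.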
      
      NOTE! On almost all architectures, the EPOLL* constants have the same
      values as the POLL* constants do.  But the keyword here is "almost".
      For various bad reasons they aren't the same, and epoll() doesn't
      actually work quite correctly in some cases due to this on Sparc et al.
      
      The next patch from Al will sort out the final differences, and we
      should be all done.
      Scripted-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge branch 'work.poll2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · ee5daa13
      Linus Torvalds authored
      Pull more poll annotation updates from Al Viro:
       "This is preparation for solving the problems you've mentioned in the
        original poll series.
      
        After this series, the kernel is ready for running
      
            for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
                  L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
                  for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
            done
      
        as a bulk search-and-replace.
      
        After that, the kernel is ready to apply the patch to unify
        {de,}mangle_poll(), and then get rid of kernel-side POLL... uses
        entirely, and we should be all done with that stuff.
      
        Basically, that's what you suggested wrt KPOLL..., except that we can
        use EPOLL... instead - they already are arch-independent (and equal to
        what is currently kernel-side POLL...).
      
        After the preparations (in this series), the switch to returning
        EPOLL... from ->poll() instances is completely mechanical and
        kernel-side POLL... can go away. The last step (killing kernel-side
        POLL... and unifying {de,}mangle_poll()) has to be done after the
        search-and-replace job, since we need userland-side POLL... for
        unified {de,}mangle_poll(), thus the cherry-pick at the last step.
      
        After that we will have:
      
         - POLL{IN,OUT,...} *not* in __poll_t, so any stray instances of
           ->poll() still using those will be caught by sparse.
      
         - eventpoll.c and select.c warning-free wrt __poll_t
      
         - no more kernel-side definitions of POLL... - userland ones are
           visible through the entire kernel (and used pretty much only for
           mangle/demangle)
      
         - same behavior as after the first series (i.e. sparc et al. epoll(2)
           working correctly)"
      
      * 'work.poll2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        annotate ep_scan_ready_list()
        ep_send_events_proc(): return result via esed->res
        preparation to switching ->poll() to returning EPOLL...
        add EPOLLNVAL, annotate EPOLL... and event_poll->event
        use linux/poll.h instead of asm/poll.h
        xen: fix poll misannotation
        smc: missing poll annotations
    • Merge tag 'xtensa-20180211' of git://github.com/jcmvbkbc/linux-xtensa · 3fc928dc
      Linus Torvalds authored
      Pull xtensa fix from Max Filippov:
       "Build fix for xtensa architecture with KASAN enabled"
      
      * tag 'xtensa-20180211' of git://github.com/jcmvbkbc/linux-xtensa:
        xtensa: fix build with KASAN
    • Merge tag 'nios2-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2 · 60d7a21a
      Linus Torvalds authored
      Pull nios2 update from Ley Foon Tan:
      
       - clean up old Kconfig options from defconfig
      
       - remove leading 0x and 0s from bindings notation in dts files
      
      * tag 'nios2-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2:
        nios2: defconfig: Cleanup from old Kconfig options
        nios2: dts: Remove leading 0x and 0s from bindings notation
    • xtensa: fix build with KASAN · f8d0cbf2
      Max Filippov authored
      Commit 917538e2 ("kasan: clean up KASAN_SHADOW_SCALE_SHIFT
      usage") removed the KASAN_SHADOW_SCALE_SHIFT definition from
      include/linux/kasan.h and added it to architecture-specific headers,
      except for xtensa. This broke the xtensa build with KASAN enabled.
      Define KASAN_SHADOW_SCALE_SHIFT in arch/xtensa/include/asm/kasan.h.
      
      Reported-by: kbuild test robot <fengguang.wu@intel.com>
      Fixes: 917538e2 ("kasan: clean up KASAN_SHADOW_SCALE_SHIFT usage")
      Acked-by: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
    • nios2: defconfig: Cleanup from old Kconfig options · e0691ebb
      Krzysztof Kozlowski authored
      Remove the old, dead Kconfig option INET_LRO. It has been gone since
      commit 7bbf3cae ("ipv4: Remove inet_lro library").
      Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
      Acked-by: Ley Foon Tan <ley.foon.tan@intel.com>
    • nios2: dts: Remove leading 0x and 0s from bindings notation · 5d13c731
      Mathieu Malaterre authored
      Improve the DTS files by removing all the leading "0x" and zeros to fix the
      following dtc warnings:
      
      Warning (unit_address_format): Node /XXX unit name should not have leading "0x"
      
      and
      
      Warning (unit_address_format): Node /XXX unit name should not have leading 0s
      
      Converted using the following command:
      
      find . -type f \( -iname *.dts -o -iname *.dtsi \) -exec sed -E -i -e "s/@0x([0-9a-fA-F\.]+)\s?\{/@\L\1 \{/g" -e "s/@0+([0-9a-fA-F\.]+)\s?\{/@\L\1 \{/g" {} +
      
      For simplicity, two sed expressions were used to solve each warning separately.
      
      To make the regex more robust, a few other issues were resolved,
      namely setting the unit-address to lower case and adding whitespace before
      the opening curly brace:
      
      https://elinux.org/Device_Tree_Linux#Linux_conventions
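
The two sed expressions above can be mirrored in Python to see what they do to individual lines. This is a sketch rather than the tool used for the commit: `fix_unit_address` is a hypothetical helper, and a `lower()` call in the replacement function plays the role of sed's `\L`.

```python
import re

HEX = r'([0-9a-fA-F.]+)'   # same character class as in the sed expressions


def _rewrite(match):
    # Emit '@', the lower-cased address, one space, and the opening brace.
    return '@' + match.group(1).lower() + ' {'


def fix_unit_address(line):
    """Apply both substitutions in the same order as the sed command."""
    # First expression: strip a leading "0x" from the unit address.
    line = re.sub(r'@0x' + HEX + r'\s?\{', _rewrite, line)
    # Second expression: strip leading zeros from the unit address.
    line = re.sub(r'@0+' + HEX + r'\s?\{', _rewrite, line)
    return line
```

A bare unit address of `0` is left as it is, because the second pattern needs at least one further character after the zeros it strips.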
      
      This is a follow up to commit 4c9847b7 ("dt-bindings: Remove leading 0x from bindings notation")
      Reported-by: David Daney <ddaney@caviumnetworks.com>
      Suggested-by: Rob Herring <robh@kernel.org>
      Signed-off-by: Mathieu Malaterre <malat@debian.org>
      Acked-by: Ley Foon Tan <ley.foon.tan@intel.com>
  5. 10 Feb, 2018 3 commits
    • Merge tag 'pci-v4.16-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · d48fcbd8
      Linus Torvalds authored
      Pull PCI fix from Bjorn Helgaas:
       "Fix a POWER9/powernv INTx regression from the merge window (Alexey
        Kardashevskiy)"
      
      * tag 'pci-v4.16-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
        powerpc/pci: Fix broken INTx configuration via OF
    • Merge tag 'for-linus-20180210' of git://git.kernel.dk/linux-block · 9454473c
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
       "A few fixes to round off the merge window on the block side:
      
         - a set of bcache fixes by way of Michael Lyle, from the usual bcache
           suspects.
      
         - add a simple-to-hook-into function for bpf EIO error injection.
      
         - fix blk-wbt that mischaracterized flushes as reads. Improve the logic
           so that flushes and writes are accounted as writes, and only reads
           as reads. From me.
      
         - fix requeue crash in BFQ, from Paolo"
      
      * tag 'for-linus-20180210' of git://git.kernel.dk/linux-block:
        block, bfq: add requeue-request hook
        bcache: fix for data collapse after re-attaching an attached device
        bcache: return attach error when no cache set exist
        bcache: set writeback_rate_update_seconds in range [1, 60] seconds
        bcache: fix for allocator and register thread race
        bcache: set error_limit correctly
        bcache: properly set task state in bch_writeback_thread()
        bcache: fix high CPU occupancy during journal
        bcache: add journal statistic
        block: Add should_fail_bio() for bpf error injection
        blk-wbt: account flush requests correctly
    • Merge tag 'platform-drivers-x86-v4.16-3' of git://github.com/dvhart/linux-pdx86 · cc5cb5af
      Linus Torvalds authored
      Pull x86 platform driver updates from Darren Hart:
       "Mellanox fixes and new system type support.
      
        Mostly data for new system types with a correction and an
        uninitialized variable fix"
      
      [ Pulling from github because git.infradead.org currently seems to be
        down for some reason, but Darren had a backup location    - Linus ]
      
      * tag 'platform-drivers-x86-v4.16-3' of git://github.com/dvhart/linux-pdx86:
        platform/x86: mlx-platform: Add support for new 200G IB and Ethernet systems
        platform/x86: mlx-platform: Add support for new msn201x system type
        platform/x86: mlx-platform: Add support for new msn274x system type
        platform/x86: mlx-platform: Fix power cable setting for msn21xx family
        platform/x86: mlx-platform: Add define for the negative bus
        platform/x86: mlx-platform: Use defines for bus assignment
        platform/mellanox: mlxreg-hotplug: Fix uninitialized variable