1. 22 Jul, 2018 1 commit
    • David Ahern's avatar
      net/ipv6: Fix linklocal to global address with VRF · 24b711ed
      David Ahern authored
      Example setup:
          host: ip -6 addr add dev eth1 2001:db8:104::4
                 where eth1 is enslaved to a VRF
      
          switch: ip -6 ro add 2001:db8:104::4/128 dev br1
                  where br1 only has an LLA
      
                 ping6 2001:db8:104::4
                 ssh   2001:db8:104::4
      
      (NOTE: UDP works fine if the PKTINFO has the address set to the global
      address and ifindex is set to the index of eth1 with a destination an
      LLA).
      
      For ICMP, icmp6_iif needs to be updated to check if skb->dev is an
      L3 master. If it is then return the ifindex from rt6i_idev similar
      to what is done for loopback.
      
      For TCP, restore the original tcp_v6_iif definition which is needed in
      most places and add a new tcp_v6_iif_l3_slave that considers the
      l3_slave variability. This latter check is only needed for socket
      lookups.
      
      Fixes: 9ff74384 ("net: vrf: Handle ipv6 multicast and link-local addresses")
      Signed-off-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      24b711ed
  2. 21 Jul, 2018 10 commits
  3. 20 Jul, 2018 10 commits
    • Doron Roberts-Kedes's avatar
      tls: check RCV_SHUTDOWN in tls_wait_data · fcf4793e
      Doron Roberts-Kedes authored
      The current code does not check sk->sk_shutdown & RCV_SHUTDOWN.
      tls_sw_recvmsg may return a positive value in the case where bytes have
      already been copied when the socket is shutdown. sk->sk_err has been
      cleared, causing the tls_wait_data to hang forever on a subsequent
      invocation. Checking sk->sk_shutdown & RCV_SHUTDOWN, as in tcp_recvmsg,
      fixes this problem.
      
      Fixes: c46234eb ("tls: RX path for ktls")
      Acked-by: default avatarDave Watson <davejwatson@fb.com>
      Signed-off-by: default avatarDoron Roberts-Kedes <doronrk@fb.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      fcf4793e
    • David S. Miller's avatar
      Merge branch 'tcp-fix-DCTCP-ECE-Ack-series' · f7a6eb1e
      David S. Miller authored
      Yuchung Cheng says:
      
      ====================
      fix DCTCP ECE Ack series
      
      This patch set address that the existing DCTCP implementation does not
      fully implement the ACK policy specified in the RFC. This improves
      the responsiveness of CE status change particularly on flows with
      small inflight.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f7a6eb1e
    • Yuchung Cheng's avatar
      tcp: do not delay ACK in DCTCP upon CE status change · a0496ef2
      Yuchung Cheng authored
      Per DCTCP RFC8257 (Section 3.2) the ACK reflecting the CE status change
      has to be sent immediately so the sender can respond quickly:
      
      """ When receiving packets, the CE codepoint MUST be processed as follows:
      
         1.  If the CE codepoint is set and DCTCP.CE is false, set DCTCP.CE to
             true and send an immediate ACK.
      
         2.  If the CE codepoint is not set and DCTCP.CE is true, set DCTCP.CE
             to false and send an immediate ACK.
      """
      
      Previously DCTCP implementation may continue to delay the ACK. This
      patch fixes that to implement the RFC by forcing an immediate ACK.
      
      Tested with this packetdrill script provided by Larry Brakmo
      
      0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
      0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
      0.000 setsockopt(3, SOL_TCP, TCP_CONGESTION, "dctcp", 5) = 0
      0.000 bind(3, ..., ...) = 0
      0.000 listen(3, 1) = 0
      
      0.100 < [ect0] SEW 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
      0.100 > SE. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8>
      0.110 < [ect0] . 1:1(0) ack 1 win 257
      0.200 accept(3, ..., ...) = 4
         +0 setsockopt(4, SOL_SOCKET, SO_DEBUG, [1], 4) = 0
      
      0.200 < [ect0] . 1:1001(1000) ack 1 win 257
      0.200 > [ect01] . 1:1(0) ack 1001
      
      0.200 write(4, ..., 1) = 1
      0.200 > [ect01] P. 1:2(1) ack 1001
      
      0.200 < [ect0] . 1001:2001(1000) ack 2 win 257
      +0.005 < [ce] . 2001:3001(1000) ack 2 win 257
      
      +0.000 > [ect01] . 2:2(0) ack 2001
      // Previously the ACK below would be delayed by 40ms
      +0.000 > [ect01] E. 2:2(0) ack 3001
      
      +0.500 < F. 9501:9501(0) ack 4 win 257
      Signed-off-by: default avatarYuchung Cheng <ycheng@google.com>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a0496ef2
    • Yuchung Cheng's avatar
      tcp: do not cancel delay-AcK on DCTCP special ACK · 27cde44a
      Yuchung Cheng authored
      Currently when a DCTCP receiver delays an ACK and receive a
      data packet with a different CE mark from the previous one's, it
      sends two immediate ACKs acking previous and latest sequences
      respectly (for ECN accounting).
      
      Previously sending the first ACK may mark off the delayed ACK timer
      (tcp_event_ack_sent). This may subsequently prevent sending the
      second ACK to acknowledge the latest sequence (tcp_ack_snd_check).
      The culprit is that tcp_send_ack() assumes it always acknowleges
      the latest sequence, which is not true for the first special ACK.
      
      The fix is to not make the assumption in tcp_send_ack and check the
      actual ack sequence before cancelling the delayed ACK. Further it's
      safer to pass the ack sequence number as a local variable into
      tcp_send_ack routine, instead of intercepting tp->rcv_nxt to avoid
      future bugs like this.
      Reported-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarYuchung Cheng <ycheng@google.com>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      27cde44a
    • Yuchung Cheng's avatar
      tcp: helpers to send special DCTCP ack · 2987babb
      Yuchung Cheng authored
      Refactor and create helpers to send the special ACK in DCTCP.
      Signed-off-by: default avatarYuchung Cheng <ycheng@google.com>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2987babb
    • Martin KaFai Lau's avatar
      bpf: Use option "help" in the llvm-objcopy test · 7c3e8b64
      Martin KaFai Lau authored
      I noticed the "--version" option of the llvm-objcopy command has recently
      disappeared from the master llvm branch.  It is currently used as a BTF
      support test in tools/testing/selftests/bpf/Makefile.
      
      This patch replaces it with "--help" which should be
      less error prone in the future.
      
      Fixes: c0fa1b6c ("bpf: btf: Add BTF tests")
      Signed-off-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      7c3e8b64
    • Martin KaFai Lau's avatar
      bpf: btf: Clean up BTF_INT_BITS() in uapi btf.h · 36fc3c8c
      Martin KaFai Lau authored
      This patch shrinks the BTF_INT_BITS() mask.  The current
      btf_int_check_meta() ensures the nr_bits of an integer
      cannot exceed 64.  Hence, it is mostly an uapi cleanup.
      
      The actual btf usage (i.e. seq_show()) is also modified
      to use u8 instead of u16.  The verification (e.g. btf_int_check_meta())
      path stays as is to deal with invalid BTF situation.
      
      Fixes: 69b693f0 ("bpf: btf: Introduce BPF Type Format (BTF)")
      Signed-off-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      36fc3c8c
    • Taeung Song's avatar
      tools/bpftool: Fix segfault case regarding 'pin' arguments · 759b94a0
      Taeung Song authored
      Arguments of 'pin' subcommand should be checked
      at the very beginning of do_pin_any().
      Otherwise segfault errors can occur when using
      'map pin' or 'prog pin' commands, so fix it.
      
        # bpftool prog pin id
        Segmentation fault
      
      Fixes: 71bb428f ("tools: bpf: add bpftool")
      Reviewed-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Reported-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarTaeung Song <treeze.taeung@gmail.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      759b94a0
    • Zhao Chen's avatar
      net-next/hinic: fix a problem in hinic_xmit_frame() · f7482683
      Zhao Chen authored
      The calculation of "wqe_size" is not correct when the tx queue is busy in
      hinic_xmit_frame().
      
      When there are no free WQEs, the tx flow will unmap the skb buffer, then
      ring the doobell for the pending packets. But the "wqe_size" which used
      to calculate the doorbell address is not correct. The wqe size should be
      cleared to 0, otherwise, it will cause a doorbell error.
      
      This patch fixes the problem.
      Reported-by: default avatarZhou Wang <wangzhou1@hisilicon.com>
      Signed-off-by: default avatarZhao Chen <zhaochen6@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f7482683
    • Tariq Toukan's avatar
      net/page_pool: Fix inconsistent lock state warning · 4905bd9a
      Tariq Toukan authored
      Fix the warning below by calling the ptr_ring_consume_bh,
      which uses spin_[un]lock_bh.
      
      [  179.064300] ================================
      [  179.069073] WARNING: inconsistent lock state
      [  179.073846] 4.18.0-rc2+ #18 Not tainted
      [  179.078133] --------------------------------
      [  179.082907] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
      [  179.089637] swapper/21/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
      [  179.095478] 00000000963d1995 (&(&r->consumer_lock)->rlock){+.?.}, at:
      __page_pool_empty_ring+0x61/0x100
      [  179.105988] {SOFTIRQ-ON-W} state was registered at:
      [  179.111443]   _raw_spin_lock+0x35/0x50
      [  179.115634]   __page_pool_empty_ring+0x61/0x100
      [  179.120699]   page_pool_destroy+0x32/0x50
      [  179.125204]   mlx5e_free_rq+0x38/0xc0 [mlx5_core]
      [  179.130471]   mlx5e_close_channel+0x20/0x120 [mlx5_core]
      [  179.136418]   mlx5e_close_channels+0x26/0x40 [mlx5_core]
      [  179.142364]   mlx5e_close_locked+0x44/0x50 [mlx5_core]
      [  179.148509]   mlx5e_close+0x42/0x60 [mlx5_core]
      [  179.153936]   __dev_close_many+0xb1/0x120
      [  179.158749]   dev_close_many+0xa2/0x170
      [  179.163364]   rollback_registered_many+0x148/0x460
      [  179.169047]   rollback_registered+0x56/0x90
      [  179.174043]   unregister_netdevice_queue+0x7e/0x100
      [  179.179816]   unregister_netdev+0x18/0x20
      [  179.184623]   mlx5e_remove+0x2a/0x50 [mlx5_core]
      [  179.190107]   mlx5_remove_device+0xe5/0x110 [mlx5_core]
      [  179.196274]   mlx5_unregister_interface+0x39/0x90 [mlx5_core]
      [  179.203028]   cleanup+0x5/0xbfc [mlx5_core]
      [  179.208031]   __x64_sys_delete_module+0x16b/0x240
      [  179.213640]   do_syscall_64+0x5a/0x210
      [  179.218151]   entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  179.224218] irq event stamp: 334398
      [  179.228438] hardirqs last  enabled at (334398): [<ffffffffa511d8b7>]
      rcu_process_callbacks+0x1c7/0x790
      [  179.239178] hardirqs last disabled at (334397): [<ffffffffa511d872>]
      rcu_process_callbacks+0x182/0x790
      [  179.249931] softirqs last  enabled at (334386): [<ffffffffa509732e>] irq_enter+0x5e/0x70
      [  179.259306] softirqs last disabled at (334387): [<ffffffffa509741c>] irq_exit+0xdc/0xf0
      [  179.268584]
      [  179.268584] other info that might help us debug this:
      [  179.276572]  Possible unsafe locking scenario:
      [  179.276572]
      [  179.283877]        CPU0
      [  179.286954]        ----
      [  179.290033]   lock(&(&r->consumer_lock)->rlock);
      [  179.295546]   <Interrupt>
      [  179.298830]     lock(&(&r->consumer_lock)->rlock);
      [  179.304550]
      [  179.304550]  *** DEADLOCK ***
      
      Fixes: ff7d6b27 ("page_pool: refurbish version of page_pool code")
      Signed-off-by: default avatarTariq Toukan <tariqt@mellanox.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4905bd9a
  4. 19 Jul, 2018 14 commits
    • Alexei Starovoitov's avatar
      Merge branch 'ppc-fix' · bb392867
      Alexei Starovoitov authored
      Daniel Borkmann says:
      
      ====================
      This set adds a ppc64 JIT fix for xadd as well as a missing test
      case for verifying whether xadd messes with src/dst reg. Thanks!
      ====================
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      bb392867
    • Daniel Borkmann's avatar
      bpf: test case to check whether src/dst regs got mangled by xadd · fa47a16b
      Daniel Borkmann authored
      We currently do not have such a test case in test_verifier selftests
      but it's important to test under bpf_jit_enable=1 to make sure JIT
      implementations do not mistakenly mess with src/dst reg for xadd/{w,dw}.
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      fa47a16b
    • Daniel Borkmann's avatar
      bpf, ppc64: fix unexpected r0=0 exit path inside bpf_xadd · b9c1e60e
      Daniel Borkmann authored
      None of the JITs is allowed to implement exit paths from the BPF
      insn mappings other than BPF_JMP | BPF_EXIT. In the BPF core code
      we have a couple of rewrites in eBPF (e.g. LD_ABS / LD_IND) and
      in eBPF to cBPF translation to retain old existing behavior where
      exceptions may occur; they are also tightly controlled by the
      verifier where it disallows some of the features such as BPF to
      BPF calls when legacy LD_ABS / LD_IND ops are present in the BPF
      program. During recent review of all BPF_XADD JIT implementations
      I noticed that the ppc64 one is buggy in that it contains two
      jumps to exit paths. This is problematic as this can bypass verifier
      expectations e.g. pointed out in commit f6b1b3bf ("bpf: fix
      subprog verifier bypass by div/mod by 0 exception"). The first
      exit path is obsoleted by the fix in ca369602 ("bpf: allow xadd
      only on aligned memory") anyway, and for the second one we need to
      do a fetch, add and store loop if the reservation from lwarx/ldarx
      was lost in the meantime.
      
      Fixes: 156d0e29 ("powerpc/ebpf/jit: Implement JIT compiler for extended BPF")
      Reviewed-by: default avatarNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Reviewed-by: default avatarSandipan Das <sandipan@linux.vnet.ibm.com>
      Tested-by: default avatarSandipan Das <sandipan@linux.vnet.ibm.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      b9c1e60e
    • Linus Torvalds's avatar
      Merge tag 'sound-4.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · f39f28ff
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "A rawmidi race fix and three trivial HD-audio quirks"
      
      * tag 'sound-4.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ALSA: hda/realtek - Yet another Clevo P950 quirk entry
        ALSA: rawmidi: Change resized buffers atomically
        ALSA: hda/realtek - Add Panasonic CF-SZ6 headset jack quirk
        ALSA: hda: add mute led support for HP ProBook 455 G5
      f39f28ff
    • Linus Torvalds's avatar
      Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · b4394c34
      Linus Torvalds authored
      Pull crypto fix from Herbert Xu:
       "This fixes an allocation error-path bug in af_alg discovered by
        syzkaller"
      
      * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
        crypto: af_alg - Initialize sg_num_bytes in error code path
      b4394c34
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 024ddc0c
      Linus Torvalds authored
      Pull networking fixes from David Miller:
       "Lots of fixes, here goes:
      
         1) NULL deref in qtnfmac, from Gustavo A. R. Silva.
      
         2) Kernel oops when fw download fails in rtlwifi, from Ping-Ke Shih.
      
         3) Lost completion messages in AF_XDP, from Magnus Karlsson.
      
         4) Correct bogus self-assignment in rhashtable, from Rishabh
            Bhatnagar.
      
         5) Fix regression in ipv6 route append handling, from David Ahern.
      
         6) Fix masking in __set_phy_supported(), from Heiner Kallweit.
      
         7) Missing module owner set in x_tables icmp, from Florian Westphal.
      
         8) liquidio's timeouts are HZ dependent, fix from Nicholas Mc Guire.
      
         9) Link setting fixes for sh_eth and ravb, from Vladimir Zapolskiy.
      
        10) Fix NULL deref when using chains in act_csum, from Davide Caratti.
      
        11) XDP_REDIRECT needs to check if the interface is up and whether the
            MTU is sufficient. From Toshiaki Makita.
      
        12) Net diag can do a double free when killing TCP_NEW_SYN_RECV
            connections, from Lorenzo Colitti.
      
        13) nf_defrag in ipv6 can unnecessarily hold onto dst entries for a
            full minute, delaying device unregister. From Eric Dumazet.
      
        14) Update MAC entries in the correct order in ixgbe, from Alexander
            Duyck.
      
        15) Don't leave partial mangles bpf program in jit_subprogs, from
            Daniel Borkmann.
      
        16) Fix pfmemalloc SKB state propagation, from Stefano Brivio.
      
        17) Fix ACK handling in DCTCP congestion control, from Yuchung Cheng.
      
        18) Use after free in tun XDP_TX, from Toshiaki Makita.
      
        19) Stale ipv6 header pointer in ipv6 gre code, from Prashant Bhole.
      
        20) Don't reuse remainder of RX page when XDP is set in mlx4, from
            Saeed Mahameed.
      
        21) Fix window probe handling of TCP rapair sockets, from Stefan
            Baranoff.
      
        22) Missing socket locking in smc_ioctl(), from Ursula Braun.
      
        23) IPV6_ILA needs DST_CACHE, from Arnd Bergmann.
      
        24) Spectre v1 fix in cxgb3, from Gustavo A. R. Silva.
      
        25) Two spots in ipv6 do a rol32() on a hash value but ignore the
            result. Fixes from Colin Ian King"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (176 commits)
        tcp: identify cryptic messages as TCP seq # bugs
        ptp: fix missing break in switch
        hv_netvsc: Fix napi reschedule while receive completion is busy
        MAINTAINERS: Drop inactive Vitaly Bordug's email
        net: cavium: Add fine-granular dependencies on PCI
        net: qca_spi: Fix log level if probe fails
        net: qca_spi: Make sure the QCA7000 reset is triggered
        net: qca_spi: Avoid packet drop during initial sync
        ipv6: fix useless rol32 call on hash
        ipv6: sr: fix useless rol32 call on hash
        net: sched: Using NULL instead of plain integer
        net: usb: asix: replace mii_nway_restart in resume path
        net: cxgb3_main: fix potential Spectre v1
        lib/rhashtable: consider param->min_size when setting initial table size
        net/smc: reset recv timeout after clc handshake
        net/smc: add error handling for get_user()
        net/smc: optimize consumer cursor updates
        net/nfc: Avoid stalls when nfc_alloc_send_skb() returned NULL.
        ipv6: ila: select CONFIG_DST_CACHE
        net: usb: rtl8150: demote allmulti message to dev_dbg()
        ...
      024ddc0c
    • Roi Dayan's avatar
      net/mlx5e: Only allow offloading decap egress (egdev) flows · 7e29392e
      Roi Dayan authored
      We get egress rules through the egdev mechanism when the ingress device
      is not supporting offload, with the expected use-case of tunnel decap
      ingress rule set on shared tunnel device.
      
      Make sure to offload egress/egdev rules only if decap action (tunnel key
      unset) exists there and err otherwise.
      
      Fixes: 717503b9 ("net: sched: convert cls_flower->egress_dev users to tc_setup_cb_egdev infra")
      Signed-off-by: default avatarRoi Dayan <roid@mellanox.com>
      Signed-off-by: default avatarPaul Blakey <paulb@mellanox.com>
      Reviewed-by: default avatarOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      7e29392e
    • Tariq Toukan's avatar
      net/mlx5: Fix QP fragmented buffer allocation · d7037ad7
      Tariq Toukan authored
      Fix bad alignment of SQ buffer in fragmented QP allocation.
      It should start directly after RQ buffer ends.
      
      Take special care of the end case where the RQ buffer does not occupy
      a whole page. RQ size is a power of two, so would be the case only for
      small RQ sizes (RQ size < PAGE_SIZE).
      
      Fix wrong assignments for sqb->size (mistakenly assigned RQ size),
      and for npages value of RQ and SQ.
      
      Fixes: 3a2f7033 ("net/mlx5: Use order-0 allocations for all WQ types")
      Signed-off-by: default avatarTariq Toukan <tariqt@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      d7037ad7
    • Raed Salem's avatar
      net/mlx5: Fix 'DON'T_TRAP' functionality · 8c49f54a
      Raed Salem authored
      The flow counters binding support commit introduced a code change where
      none NULL 'rule_dest' is always passed to mlx5_add_flow_rules, this breaks
      'DON'T_TRAP' rules insertion.
      
      The fix uses the equivalent 'dest_num' value instead of dest pointer
      at the failed check.
      
      fixes: 3b3233fb ('IB/mlx5: Add flow counters binding support')
      Signed-off-by: default avatarRaed Salem <raeds@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      8c49f54a
    • Saeed Mahameed's avatar
      net/mlx5: E-Switch, UBSAN fix undefined behavior in mlx5_eswitch_mode · 443a8581
      Saeed Mahameed authored
      With debug kernel UBSAN detects the following issue, which might happen
      when eswitch instance is not created, fix this by testing the eswitch
      pointer before returning the eswitch mode, if not set return mode =
      SRIOV_NONE.
      
      [   32.528951] UBSAN: Undefined behaviour in drivers/net/ethernet/mellanox/mlx5/core/eswitch.c:2219:12
      [   32.528951] member access within null pointer of type 'struct mlx5_eswitch'
      [   32.528951] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.18.0-rc3-dirty #181
      [   32.528951] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014
      [   32.528951] Call Trace:
      [   32.528951]  dump_stack+0xc7/0x13b
      [   32.528951]  ? show_regs_print_info+0x5/0x5
      [   32.528951]  ? __pm_runtime_use_autosuspend+0x140/0x140
      [   32.528951]  ubsan_epilogue+0x9/0x49
      [   32.528951]  ubsan_type_mismatch_common+0x1f9/0x2c0
      [   32.528951]  ? ucs2_as_utf8+0x310/0x310
      [   32.528951]  ? device_initialize+0x229/0x2e0
      [   32.528951]  __ubsan_handle_type_mismatch+0x9f/0xc9
      [   32.528951]  ? __ubsan_handle_divrem_overflow+0x19b/0x19b
      [   32.578008]  ? ib_device_get_by_index+0xf0/0xf0
      [   32.578008]  mlx5_eswitch_mode+0x30/0x40
      [   32.578008]  mlx5_ib_add+0x1e0/0x4a0
      
      Fixes: 57cbd893 ("net/mlx5: E-Switch, Move representors definition to a global scope")
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Reviewed-by: default avatarLeon Romanovsky <leonro@mellanox.com>
      443a8581
    • Eran Ben Elisha's avatar
      net/mlx5e: Don't allow aRFS for encapsulated packets · d2e1c57b
      Eran Ben Elisha authored
      Driver is yet to support aRFS for encapsulated packets, return early
      error in such case.
      
      Fixes: 18c908e4 ("net/mlx5e: Add accelerated RFS support")
      Signed-off-by: default avatarEran Ben Elisha <eranbe@mellanox.com>
      Reviewed-by: default avatarTariq Toukan <tariqt@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      d2e1c57b
    • Eran Ben Elisha's avatar
      net/mlx5e: Fix quota counting in aRFS expire flow · 2630bae8
      Eran Ben Elisha authored
      Quota should follow the amount of rules which do expire, and not the
      number of rules that were examined, fixed that.
      
      Fixes: 18c908e4 ("net/mlx5e: Add accelerated RFS support")
      Signed-off-by: default avatarEran Ben Elisha <eranbe@mellanox.com>
      Reviewed-by: default avatarMaor Gottlieb <maorg@mellanox.com>
      Reviewed-by: default avatarTariq Toukan <tariqt@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      2630bae8
    • Ariel Levkovich's avatar
      net/mlx5: Adjust clock overflow work period · 33180bee
      Ariel Levkovich authored
      When driver converts HW timestamp to wall clock time it subtracts
      the last saved cycle counter from the HW timestamp and converts the
      difference to nanoseconds.
      The conversion is done by multiplying the cycles difference with the
      clock multiplier value as a first step and therefore the cycles
      difference should be small enough so that the multiplication product
      doesn't exceed 64bit.
      
      The overflow handling routine is in charge of updating the last saved
      cycle counter in driver and it is called periodically using kernel
      delayed workqueue.
      
      The delay period for this work is calculated using the max HW cycle
      counter value (a 41 bit mask) as a base which doesn't take the 64bit
      limit into account so the delay period may be incorrect and too
      long to prevent a large difference between the HW counter and the last
      saved counter in SW.
      
      This change adjusts the work period for the HW clock overflow work by
      taking the minimum between the previous value and the quotient of max
      u64 value and the clock multiplier value.
      
      Fixes: ef9814de ("net/mlx5e: Add HW timestamping (TS) support")
      Signed-off-by: default avatarAriel Levkovich <lariel@mellanox.com>
      Reviewed-by: default avatarEran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      33180bee
    • Shay Agroskin's avatar
      net/mlx5e: Refine ets validation function · e279d634
      Shay Agroskin authored
      Removed an error message received when configuring ETS total
      bandwidth to be zero.
      Our hardware doesn't support such configuration, so we shall
      reject it in the driver. Nevertheless, we removed the error message
      in order to eliminate error messages caused by old userspace tools
      who try to pass such configuration.
      
      Fixes: ff089191 ("net/mlx5e: Fix ETS BW check")
      Signed-off-by: default avatarShay Agroskin <shayag@mellanox.com>
      Reviewed-by: default avatarHuy Nguyen <huyn@mellanox.com>
      Reviewed-by: default avatarEran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      e279d634
  5. 18 Jul, 2018 5 commits