1. 20 Jul, 2018 6 commits
    • David S. Miller's avatar
      Merge branch 'tcp-fix-DCTCP-ECE-Ack-series' · f7a6eb1e
      David S. Miller authored
      Yuchung Cheng says:
      
      ====================
      fix DCTCP ECE Ack series
      
      This patch set address that the existing DCTCP implementation does not
      fully implement the ACK policy specified in the RFC. This improves
      the responsiveness of CE status change particularly on flows with
      small inflight.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f7a6eb1e
    • Yuchung Cheng's avatar
      tcp: do not delay ACK in DCTCP upon CE status change · a0496ef2
      Yuchung Cheng authored
      Per DCTCP RFC8257 (Section 3.2) the ACK reflecting the CE status change
      has to be sent immediately so the sender can respond quickly:
      
      """ When receiving packets, the CE codepoint MUST be processed as follows:
      
         1.  If the CE codepoint is set and DCTCP.CE is false, set DCTCP.CE to
             true and send an immediate ACK.
      
         2.  If the CE codepoint is not set and DCTCP.CE is true, set DCTCP.CE
             to false and send an immediate ACK.
      """
      
      Previously DCTCP implementation may continue to delay the ACK. This
      patch fixes that to implement the RFC by forcing an immediate ACK.
      
      Tested with this packetdrill script provided by Larry Brakmo
      
      0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
      0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
      0.000 setsockopt(3, SOL_TCP, TCP_CONGESTION, "dctcp", 5) = 0
      0.000 bind(3, ..., ...) = 0
      0.000 listen(3, 1) = 0
      
      0.100 < [ect0] SEW 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
      0.100 > SE. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8>
      0.110 < [ect0] . 1:1(0) ack 1 win 257
      0.200 accept(3, ..., ...) = 4
         +0 setsockopt(4, SOL_SOCKET, SO_DEBUG, [1], 4) = 0
      
      0.200 < [ect0] . 1:1001(1000) ack 1 win 257
      0.200 > [ect01] . 1:1(0) ack 1001
      
      0.200 write(4, ..., 1) = 1
      0.200 > [ect01] P. 1:2(1) ack 1001
      
      0.200 < [ect0] . 1001:2001(1000) ack 2 win 257
      +0.005 < [ce] . 2001:3001(1000) ack 2 win 257
      
      +0.000 > [ect01] . 2:2(0) ack 2001
      // Previously the ACK below would be delayed by 40ms
      +0.000 > [ect01] E. 2:2(0) ack 3001
      
      +0.500 < F. 9501:9501(0) ack 4 win 257
      Signed-off-by: default avatarYuchung Cheng <ycheng@google.com>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a0496ef2
    • Yuchung Cheng's avatar
      tcp: do not cancel delay-AcK on DCTCP special ACK · 27cde44a
      Yuchung Cheng authored
      Currently when a DCTCP receiver delays an ACK and receive a
      data packet with a different CE mark from the previous one's, it
      sends two immediate ACKs acking previous and latest sequences
      respectly (for ECN accounting).
      
      Previously sending the first ACK may mark off the delayed ACK timer
      (tcp_event_ack_sent). This may subsequently prevent sending the
      second ACK to acknowledge the latest sequence (tcp_ack_snd_check).
      The culprit is that tcp_send_ack() assumes it always acknowleges
      the latest sequence, which is not true for the first special ACK.
      
      The fix is to not make the assumption in tcp_send_ack and check the
      actual ack sequence before cancelling the delayed ACK. Further it's
      safer to pass the ack sequence number as a local variable into
      tcp_send_ack routine, instead of intercepting tp->rcv_nxt to avoid
      future bugs like this.
      Reported-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarYuchung Cheng <ycheng@google.com>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      27cde44a
    • Yuchung Cheng's avatar
      tcp: helpers to send special DCTCP ack · 2987babb
      Yuchung Cheng authored
      Refactor and create helpers to send the special ACK in DCTCP.
      Signed-off-by: default avatarYuchung Cheng <ycheng@google.com>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2987babb
    • Zhao Chen's avatar
      net-next/hinic: fix a problem in hinic_xmit_frame() · f7482683
      Zhao Chen authored
      The calculation of "wqe_size" is not correct when the tx queue is busy in
      hinic_xmit_frame().
      
      When there are no free WQEs, the tx flow will unmap the skb buffer, then
      ring the doobell for the pending packets. But the "wqe_size" which used
      to calculate the doorbell address is not correct. The wqe size should be
      cleared to 0, otherwise, it will cause a doorbell error.
      
      This patch fixes the problem.
      Reported-by: default avatarZhou Wang <wangzhou1@hisilicon.com>
      Signed-off-by: default avatarZhao Chen <zhaochen6@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f7482683
    • Tariq Toukan's avatar
      net/page_pool: Fix inconsistent lock state warning · 4905bd9a
      Tariq Toukan authored
      Fix the warning below by calling the ptr_ring_consume_bh,
      which uses spin_[un]lock_bh.
      
      [  179.064300] ================================
      [  179.069073] WARNING: inconsistent lock state
      [  179.073846] 4.18.0-rc2+ #18 Not tainted
      [  179.078133] --------------------------------
      [  179.082907] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
      [  179.089637] swapper/21/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
      [  179.095478] 00000000963d1995 (&(&r->consumer_lock)->rlock){+.?.}, at:
      __page_pool_empty_ring+0x61/0x100
      [  179.105988] {SOFTIRQ-ON-W} state was registered at:
      [  179.111443]   _raw_spin_lock+0x35/0x50
      [  179.115634]   __page_pool_empty_ring+0x61/0x100
      [  179.120699]   page_pool_destroy+0x32/0x50
      [  179.125204]   mlx5e_free_rq+0x38/0xc0 [mlx5_core]
      [  179.130471]   mlx5e_close_channel+0x20/0x120 [mlx5_core]
      [  179.136418]   mlx5e_close_channels+0x26/0x40 [mlx5_core]
      [  179.142364]   mlx5e_close_locked+0x44/0x50 [mlx5_core]
      [  179.148509]   mlx5e_close+0x42/0x60 [mlx5_core]
      [  179.153936]   __dev_close_many+0xb1/0x120
      [  179.158749]   dev_close_many+0xa2/0x170
      [  179.163364]   rollback_registered_many+0x148/0x460
      [  179.169047]   rollback_registered+0x56/0x90
      [  179.174043]   unregister_netdevice_queue+0x7e/0x100
      [  179.179816]   unregister_netdev+0x18/0x20
      [  179.184623]   mlx5e_remove+0x2a/0x50 [mlx5_core]
      [  179.190107]   mlx5_remove_device+0xe5/0x110 [mlx5_core]
      [  179.196274]   mlx5_unregister_interface+0x39/0x90 [mlx5_core]
      [  179.203028]   cleanup+0x5/0xbfc [mlx5_core]
      [  179.208031]   __x64_sys_delete_module+0x16b/0x240
      [  179.213640]   do_syscall_64+0x5a/0x210
      [  179.218151]   entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  179.224218] irq event stamp: 334398
      [  179.228438] hardirqs last  enabled at (334398): [<ffffffffa511d8b7>]
      rcu_process_callbacks+0x1c7/0x790
      [  179.239178] hardirqs last disabled at (334397): [<ffffffffa511d872>]
      rcu_process_callbacks+0x182/0x790
      [  179.249931] softirqs last  enabled at (334386): [<ffffffffa509732e>] irq_enter+0x5e/0x70
      [  179.259306] softirqs last disabled at (334387): [<ffffffffa509741c>] irq_exit+0xdc/0xf0
      [  179.268584]
      [  179.268584] other info that might help us debug this:
      [  179.276572]  Possible unsafe locking scenario:
      [  179.276572]
      [  179.283877]        CPU0
      [  179.286954]        ----
      [  179.290033]   lock(&(&r->consumer_lock)->rlock);
      [  179.295546]   <Interrupt>
      [  179.298830]     lock(&(&r->consumer_lock)->rlock);
      [  179.304550]
      [  179.304550]  *** DEADLOCK ***
      
      Fixes: ff7d6b27 ("page_pool: refurbish version of page_pool code")
      Signed-off-by: default avatarTariq Toukan <tariqt@mellanox.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4905bd9a
  2. 19 Jul, 2018 3 commits
    • Linus Torvalds's avatar
      Merge tag 'sound-4.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · f39f28ff
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "A rawmidi race fix and three trivial HD-audio quirks"
      
      * tag 'sound-4.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ALSA: hda/realtek - Yet another Clevo P950 quirk entry
        ALSA: rawmidi: Change resized buffers atomically
        ALSA: hda/realtek - Add Panasonic CF-SZ6 headset jack quirk
        ALSA: hda: add mute led support for HP ProBook 455 G5
      f39f28ff
    • Linus Torvalds's avatar
      Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · b4394c34
      Linus Torvalds authored
      Pull crypto fix from Herbert Xu:
       "This fixes an allocation error-path bug in af_alg discovered by
        syzkaller"
      
      * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
        crypto: af_alg - Initialize sg_num_bytes in error code path
      b4394c34
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 024ddc0c
      Linus Torvalds authored
      Pull networking fixes from David Miller:
       "Lots of fixes, here goes:
      
         1) NULL deref in qtnfmac, from Gustavo A. R. Silva.
      
         2) Kernel oops when fw download fails in rtlwifi, from Ping-Ke Shih.
      
         3) Lost completion messages in AF_XDP, from Magnus Karlsson.
      
         4) Correct bogus self-assignment in rhashtable, from Rishabh
            Bhatnagar.
      
         5) Fix regression in ipv6 route append handling, from David Ahern.
      
         6) Fix masking in __set_phy_supported(), from Heiner Kallweit.
      
         7) Missing module owner set in x_tables icmp, from Florian Westphal.
      
         8) liquidio's timeouts are HZ dependent, fix from Nicholas Mc Guire.
      
         9) Link setting fixes for sh_eth and ravb, from Vladimir Zapolskiy.
      
        10) Fix NULL deref when using chains in act_csum, from Davide Caratti.
      
        11) XDP_REDIRECT needs to check if the interface is up and whether the
            MTU is sufficient. From Toshiaki Makita.
      
        12) Net diag can do a double free when killing TCP_NEW_SYN_RECV
            connections, from Lorenzo Colitti.
      
        13) nf_defrag in ipv6 can unnecessarily hold onto dst entries for a
            full minute, delaying device unregister. From Eric Dumazet.
      
        14) Update MAC entries in the correct order in ixgbe, from Alexander
            Duyck.
      
        15) Don't leave partial mangles bpf program in jit_subprogs, from
            Daniel Borkmann.
      
        16) Fix pfmemalloc SKB state propagation, from Stefano Brivio.
      
        17) Fix ACK handling in DCTCP congestion control, from Yuchung Cheng.
      
        18) Use after free in tun XDP_TX, from Toshiaki Makita.
      
        19) Stale ipv6 header pointer in ipv6 gre code, from Prashant Bhole.
      
        20) Don't reuse remainder of RX page when XDP is set in mlx4, from
            Saeed Mahameed.
      
        21) Fix window probe handling of TCP rapair sockets, from Stefan
            Baranoff.
      
        22) Missing socket locking in smc_ioctl(), from Ursula Braun.
      
        23) IPV6_ILA needs DST_CACHE, from Arnd Bergmann.
      
        24) Spectre v1 fix in cxgb3, from Gustavo A. R. Silva.
      
        25) Two spots in ipv6 do a rol32() on a hash value but ignore the
            result. Fixes from Colin Ian King"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (176 commits)
        tcp: identify cryptic messages as TCP seq # bugs
        ptp: fix missing break in switch
        hv_netvsc: Fix napi reschedule while receive completion is busy
        MAINTAINERS: Drop inactive Vitaly Bordug's email
        net: cavium: Add fine-granular dependencies on PCI
        net: qca_spi: Fix log level if probe fails
        net: qca_spi: Make sure the QCA7000 reset is triggered
        net: qca_spi: Avoid packet drop during initial sync
        ipv6: fix useless rol32 call on hash
        ipv6: sr: fix useless rol32 call on hash
        net: sched: Using NULL instead of plain integer
        net: usb: asix: replace mii_nway_restart in resume path
        net: cxgb3_main: fix potential Spectre v1
        lib/rhashtable: consider param->min_size when setting initial table size
        net/smc: reset recv timeout after clc handshake
        net/smc: add error handling for get_user()
        net/smc: optimize consumer cursor updates
        net/nfc: Avoid stalls when nfc_alloc_send_skb() returned NULL.
        ipv6: ila: select CONFIG_DST_CACHE
        net: usb: rtl8150: demote allmulti message to dev_dbg()
        ...
      024ddc0c
  3. 18 Jul, 2018 31 commits