1. 09 Nov, 2019 6 commits
    • Magnus Karlsson's avatar
      ixgbe: need_wakeup flag might not be set for Tx · 0843aa8f
      Magnus Karlsson authored
      The need_wakeup flag for Tx might not be set for AF_XDP sockets that
      are only used to send packets. This happens if there is at least one
      outstanding packet that has not been completed by the hardware and we
      get that corresponding completion (which will not generate an
      interrupt since interrupts are disabled in the napi poll loop) between
      the time we stopped processing the Tx completions and interrupts are
      enabled again. In this case, the need_wakeup flag will have been
      cleared at the end of the Tx completion processing as we believe we
      will get an interrupt from the outstanding completion at a later point
      in time. But if this completion interrupt occurs before interrupts
      are enable, we lose it and should at that point really have set the
      need_wakeup flag since there are no more outstanding completions that
      can generate an interrupt to continue the processing. When this
      happens, user space will see a Tx queue need_wakeup of 0 and skip
      issuing a syscall, which means will never get into the Tx processing
      again and we have a deadlock.
      
      This patch introduces a quick fix for this issue by just setting the
      need_wakeup flag for Tx to 1 all the time. I am working on a proper
      fix for this that will toggle the flag appropriately, but it is more
      challenging than I anticipated and I am afraid that this patch will
      not be completed before the merge window closes, therefore this easier
      fix for now. This fix has a negative performance impact in the range
      of 0% to 4%. Towards the higher end of the scale if you have driver
      and application on the same core and issue a lot of packets, and
      towards no negative impact if you use two cores, lower transmission
      speeds and/or a workload that also receives packets.
      Signed-off-by: default avatarMagnus Karlsson <magnus.karlsson@intel.com>
      Tested-by: default avatarAndrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: default avatarJeff Kirsher <jeffrey.t.kirsher@intel.com>
      0843aa8f
    • Magnus Karlsson's avatar
      i40e: need_wakeup flag might not be set for Tx · 70563957
      Magnus Karlsson authored
      The need_wakeup flag for Tx might not be set for AF_XDP sockets that
      are only used to send packets. This happens if there is at least one
      outstanding packet that has not been completed by the hardware and we
      get that corresponding completion (which will not generate an
      interrupt since interrupts are disabled in the napi poll loop) between
      the time we stopped processing the Tx completions and interrupts are
      enabled again. In this case, the need_wakeup flag will have been
      cleared at the end of the Tx completion processing as we believe we
      will get an interrupt from the outstanding completion at a later point
      in time. But if this completion interrupt occurs before interrupts
      are enable, we lose it and should at that point really have set the
      need_wakeup flag since there are no more outstanding completions that
      can generate an interrupt to continue the processing. When this
      happens, user space will see a Tx queue need_wakeup of 0 and skip
      issuing a syscall, which means will never get into the Tx processing
      again and we have a deadlock.
      
      This patch introduces a quick fix for this issue by just setting the
      need_wakeup flag for Tx to 1 all the time. I am working on a proper
      fix for this that will toggle the flag appropriately, but it is more
      challenging than I anticipated and I am afraid that this patch will
      not be completed before the merge window closes, therefore this easier
      fix for now. This fix has a negative performance impact in the range
      of 0% to 4%. Towards the higher end of the scale if you have driver
      and application on the same core and issue a lot of packets, and
      towards no negative impact if you use two cores, lower transmission
      speeds and/or a workload that also receives packets.
      Signed-off-by: default avatarMagnus Karlsson <magnus.karlsson@intel.com>
      Tested-by: default avatarAndrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: default avatarJeff Kirsher <jeffrey.t.kirsher@intel.com>
      70563957
    • Jacob Keller's avatar
      igb/igc: use ktime accessors for skb->tstamp · 6acab13b
      Jacob Keller authored
      When implementing launch time support in the igb and igc drivers, the
      skb->tstamp value is assumed to be a s64, but it's declared as a ktime_t
      value.
      
      Although ktime_t is typedef'd to s64 it wasn't always, and the kernel
      provides accessors for ktime_t values.
      
      Use the ktime_to_timespec64 and ktime_set accessors instead of directly
      assuming that the variable is always an s64.
      
      This improves portability if the code is ever moved to another kernel
      version, or if the definition of ktime_t ever changes again in the
      future.
      Signed-off-by: default avatarJacob Keller <jacob.e.keller@intel.com>
      Acked-by: default avatarVinicius Costa Gomes <vinicius.gomes@intel.com>
      Signed-off-by: default avatarJeff Kirsher <jeffrey.t.kirsher@intel.com>
      6acab13b
    • Arkadiusz Kubalewski's avatar
      i40e: Fix for ethtool -m issue on X722 NIC · 4c9da6f2
      Arkadiusz Kubalewski authored
      This patch contains fix for a problem with command:
      'ethtool -m <dev>'
      which breaks functionality of:
      'ethtool <dev>'
      when called on X722 NIC
      
      Disallowed update of link phy_types on X722 NIC
      Currently correct value cannot be obtained from FW
      Previously wrong value returned by FW was used and was
      a root cause for incorrect output of 'ethtool <dev>' command
      Signed-off-by: default avatarArkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
      Tested-by: default avatarAndrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: default avatarJeff Kirsher <jeffrey.t.kirsher@intel.com>
      4c9da6f2
    • Nicholas Nunley's avatar
      iavf: initialize ITRN registers with correct values · 4eda4e00
      Nicholas Nunley authored
      Since commit 92418fb1 ("i40e/i40evf: Use usec value instead of reg
      value for ITR defines") the driver tracks the interrupt throttling
      intervals in single usec units, although the actual ITRN registers are
      programmed in 2 usec units. Most register programming flows in the driver
      correctly handle the conversion, although it is currently not applied when
      the registers are initialized to their default values. Most of the time
      this doesn't present a problem since the default values are usually
      immediately overwritten through the standard adaptive throttling mechanism,
      or updated manually by the user, but if adaptive throttling is disabled and
      the interval values are left alone then the incorrect value will persist.
      
      Since the intended default interval of 50 usecs (vs. 100 usecs as
      programmed) performs better for most traffic workloads, this can lead to
      performance regressions.
      
      This patch adds the correct conversion when writing the initial values to
      the ITRN registers.
      Signed-off-by: default avatarNicholas Nunley <nicholas.d.nunley@intel.com>
      Tested-by: default avatarAndrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: default avatarJeff Kirsher <jeffrey.t.kirsher@intel.com>
      4eda4e00
    • Colin Ian King's avatar
      ice: fix potential infinite loop because loop counter being too small · 615457a2
      Colin Ian King authored
      Currently the for-loop counter i is a u8 however it is being checked
      against a maximum value hw->num_tx_sched_layers which is a u16. Hence
      there is a potential wrap-around of counter i back to zero if
      hw->num_tx_sched_layers is greater than 255.  Fix this by making i
      a u16.
      
      Addresses-Coverity: ("Infinite loop")
      Fixes: b36c598c ("ice: Updates to Tx scheduler code")
      Signed-off-by: default avatarColin Ian King <colin.king@canonical.com>
      Tested-by: default avatarAndrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: default avatarJeff Kirsher <jeffrey.t.kirsher@intel.com>
      615457a2
  2. 08 Nov, 2019 10 commits
    • Eric Dumazet's avatar
      net: fix data-race in neigh_event_send() · 1b53d644
      Eric Dumazet authored
      KCSAN reported the following data-race [1]
      
      The fix will also prevent the compiler from optimizing out
      the condition.
      
      [1]
      
      BUG: KCSAN: data-race in neigh_resolve_output / neigh_resolve_output
      
      write to 0xffff8880a41dba78 of 8 bytes by interrupt on cpu 1:
       neigh_event_send include/net/neighbour.h:443 [inline]
       neigh_resolve_output+0x78/0x480 net/core/neighbour.c:1474
       neigh_output include/net/neighbour.h:511 [inline]
       ip_finish_output2+0x4af/0xe40 net/ipv4/ip_output.c:228
       __ip_finish_output net/ipv4/ip_output.c:308 [inline]
       __ip_finish_output+0x23a/0x490 net/ipv4/ip_output.c:290
       ip_finish_output+0x41/0x160 net/ipv4/ip_output.c:318
       NF_HOOK_COND include/linux/netfilter.h:294 [inline]
       ip_output+0xdf/0x210 net/ipv4/ip_output.c:432
       dst_output include/net/dst.h:436 [inline]
       ip_local_out+0x74/0x90 net/ipv4/ip_output.c:125
       __ip_queue_xmit+0x3a8/0xa40 net/ipv4/ip_output.c:532
       ip_queue_xmit+0x45/0x60 include/net/ip.h:237
       __tcp_transmit_skb+0xe81/0x1d60 net/ipv4/tcp_output.c:1169
       tcp_transmit_skb net/ipv4/tcp_output.c:1185 [inline]
       __tcp_retransmit_skb+0x4bd/0x15f0 net/ipv4/tcp_output.c:2976
       tcp_retransmit_skb+0x36/0x1a0 net/ipv4/tcp_output.c:2999
       tcp_retransmit_timer+0x719/0x16d0 net/ipv4/tcp_timer.c:515
       tcp_write_timer_handler+0x42d/0x510 net/ipv4/tcp_timer.c:598
       tcp_write_timer+0xd1/0xf0 net/ipv4/tcp_timer.c:618
      
      read to 0xffff8880a41dba78 of 8 bytes by interrupt on cpu 0:
       neigh_event_send include/net/neighbour.h:442 [inline]
       neigh_resolve_output+0x57/0x480 net/core/neighbour.c:1474
       neigh_output include/net/neighbour.h:511 [inline]
       ip_finish_output2+0x4af/0xe40 net/ipv4/ip_output.c:228
       __ip_finish_output net/ipv4/ip_output.c:308 [inline]
       __ip_finish_output+0x23a/0x490 net/ipv4/ip_output.c:290
       ip_finish_output+0x41/0x160 net/ipv4/ip_output.c:318
       NF_HOOK_COND include/linux/netfilter.h:294 [inline]
       ip_output+0xdf/0x210 net/ipv4/ip_output.c:432
       dst_output include/net/dst.h:436 [inline]
       ip_local_out+0x74/0x90 net/ipv4/ip_output.c:125
       __ip_queue_xmit+0x3a8/0xa40 net/ipv4/ip_output.c:532
       ip_queue_xmit+0x45/0x60 include/net/ip.h:237
       __tcp_transmit_skb+0xe81/0x1d60 net/ipv4/tcp_output.c:1169
       tcp_transmit_skb net/ipv4/tcp_output.c:1185 [inline]
       __tcp_retransmit_skb+0x4bd/0x15f0 net/ipv4/tcp_output.c:2976
       tcp_retransmit_skb+0x36/0x1a0 net/ipv4/tcp_output.c:2999
       tcp_retransmit_timer+0x719/0x16d0 net/ipv4/tcp_timer.c:515
       tcp_write_timer_handler+0x42d/0x510 net/ipv4/tcp_timer.c:598
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.4.0-rc3+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1b53d644
    • Stefano Garzarella's avatar
      vsock/virtio: fix sock refcnt holding during the shutdown · ad8a7220
      Stefano Garzarella authored
      The "42f5cda5" commit rightly set SOCK_DONE on peer shutdown,
      but there is an issue if we receive the SHUTDOWN(RDWR) while the
      virtio_transport_close_timeout() is scheduled.
      In this case, when the timeout fires, the SOCK_DONE is already
      set and the virtio_transport_close_timeout() will not call
      virtio_transport_reset() and virtio_transport_do_close().
      This causes that both sockets remain open and will never be released,
      preventing the unloading of [virtio|vhost]_transport modules.
      
      This patch fixes this issue, calling virtio_transport_reset() and
      virtio_transport_do_close() when we receive the SHUTDOWN(RDWR)
      and there is nothing left to read.
      
      Fixes: 42f5cda5 ("vsock/virtio: set SOCK_DONE on peer shutdown")
      Cc: Stephen Barber <smbarber@chromium.org>
      Signed-off-by: default avatarStefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ad8a7220
    • David S. Miller's avatar
      Merge tag 'mac80211-for-net-2019-11-08' of... · b05f5b4a
      David S. Miller authored
      Merge tag 'mac80211-for-net-2019-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
      
      Johannes Berg says:
      
      ====================
      Three small fixes:
       * we hit a failure path bug related to
         ieee80211_txq_setup_flows()
       * also use kvmalloc() to make that less likely
       * fix a timing value shortly after boot (during
         INITIAL_JIFFIES)
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b05f5b4a
    • Alexander Sverdlin's avatar
      net: ethernet: octeon_mgmt: Account for second possible VLAN header · e4dd5608
      Alexander Sverdlin authored
      Octeon's input ring-buffer entry has 14 bits-wide size field, so to account
      for second possible VLAN header max_mtu must be further reduced.
      
      Fixes: 109cc165 ("ethernet/cavium: use core min/max MTU checking")
      Signed-off-by: default avatarAlexander Sverdlin <alexander.sverdlin@nokia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e4dd5608
    • Ahmed Zaki's avatar
      mac80211: fix station inactive_time shortly after boot · 285531f9
      Ahmed Zaki authored
      In the first 5 minutes after boot (time of INITIAL_JIFFIES),
      ieee80211_sta_last_active() returns zero if last_ack is zero. This
      leads to "inactive time" showing jiffies_to_msecs(jiffies).
      
       # iw wlan0 station get fc:ec:da:64:a6:dd
       Station fc:ec:da:64:a6:dd (on wlan0)
      	inactive time:	4294894049 ms
      	.
      	.
      	connected time:	70 seconds
      
      Fix by returning last_rx if last_ack == 0.
      Signed-off-by: default avatarAhmed Zaki <anzaki@gmail.com>
      Link: https://lore.kernel.org/r/20191031121243.27694-1-anzaki@gmail.comSigned-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      285531f9
    • Toke Høiland-Jørgensen's avatar
      net/fq_impl: Switch to kvmalloc() for memory allocation · 71e67c3b
      Toke Høiland-Jørgensen authored
      The FQ implementation used by mac80211 allocates memory using kmalloc(),
      which can fail; and Johannes reported that this actually happens in
      practice.
      
      To avoid this, switch the allocation to kvmalloc() instead; this also
      brings fq_impl in line with all the FQ qdiscs.
      
      Fixes: 557fc4a0 ("fq: add fair queuing framework")
      Reported-by: default avatarJohannes Berg <johannes@sipsolutions.net>
      Signed-off-by: default avatarToke Høiland-Jørgensen <toke@redhat.com>
      Link: https://lore.kernel.org/r/20191105155750.547379-1-toke@redhat.comSigned-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      71e67c3b
    • Johannes Berg's avatar
      mac80211: fix ieee80211_txq_setup_flows() failure path · 6dd47d97
      Johannes Berg authored
      If ieee80211_txq_setup_flows() fails, we don't clean up LED
      state properly, leading to crashes later on, fix that.
      
      Fixes: dc8b274f ("mac80211: Move up init of TXQs")
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Acked-by: default avatarToke Høiland-Jørgensen <toke@toke.dk>
      Link: https://lore.kernel.org/r/20191105154110.1ccf7112ba5d.I0ba865792446d051867b33153be65ce6b063d98c@changeidSigned-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      6dd47d97
    • David Ahern's avatar
      ipv4: Fix table id reference in fib_sync_down_addr · e0a31262
      David Ahern authored
      Hendrik reported routes in the main table using source address are not
      removed when the address is removed. The problem is that fib_sync_down_addr
      does not account for devices in the default VRF which are associated
      with the main table. Fix by updating the table id reference.
      
      Fixes: 5a56a0b3 ("net: Don't delete routes in different VRFs")
      Reported-by: default avatarHendrik Donner <hd@os-cillation.de>
      Signed-off-by: default avatarDavid Ahern <dsahern@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e0a31262
    • Eric Dumazet's avatar
      ipv6: fixes rt6_probe() and fib6_nh->last_probe init · 1bef4c22
      Eric Dumazet authored
      While looking at a syzbot KCSAN report [1], I found multiple
      issues in this code :
      
      1) fib6_nh->last_probe has an initial value of 0.
      
         While probably okay on 64bit kernels, this causes an issue
         on 32bit kernels since the time_after(jiffies, 0 + interval)
         might be false ~24 days after boot (for HZ=1000)
      
      2) The data-race found by KCSAN
         I could use READ_ONCE() and WRITE_ONCE(), but we also can
         take the opportunity of not piling-up too many rt6_probe_deferred()
         works by using instead cmpxchg() so that only one cpu wins the race.
      
      [1]
      BUG: KCSAN: data-race in find_match / find_match
      
      write to 0xffff8880bb7aabe8 of 8 bytes by interrupt on cpu 1:
       rt6_probe net/ipv6/route.c:663 [inline]
       find_match net/ipv6/route.c:757 [inline]
       find_match+0x5bd/0x790 net/ipv6/route.c:733
       __find_rr_leaf+0xe3/0x780 net/ipv6/route.c:831
       find_rr_leaf net/ipv6/route.c:852 [inline]
       rt6_select net/ipv6/route.c:896 [inline]
       fib6_table_lookup+0x383/0x650 net/ipv6/route.c:2164
       ip6_pol_route+0xee/0x5c0 net/ipv6/route.c:2200
       ip6_pol_route_output+0x48/0x60 net/ipv6/route.c:2452
       fib6_rule_lookup+0x3d6/0x470 net/ipv6/fib6_rules.c:117
       ip6_route_output_flags_noref+0x16b/0x230 net/ipv6/route.c:2484
       ip6_route_output_flags+0x50/0x1a0 net/ipv6/route.c:2497
       ip6_dst_lookup_tail+0x25d/0xc30 net/ipv6/ip6_output.c:1049
       ip6_dst_lookup_flow+0x68/0x120 net/ipv6/ip6_output.c:1150
       inet6_csk_route_socket+0x2f7/0x420 net/ipv6/inet6_connection_sock.c:106
       inet6_csk_xmit+0x91/0x1f0 net/ipv6/inet6_connection_sock.c:121
       __tcp_transmit_skb+0xe81/0x1d60 net/ipv4/tcp_output.c:1169
       tcp_transmit_skb net/ipv4/tcp_output.c:1185 [inline]
       tcp_xmit_probe_skb+0x19b/0x1d0 net/ipv4/tcp_output.c:3735
      
      read to 0xffff8880bb7aabe8 of 8 bytes by interrupt on cpu 0:
       rt6_probe net/ipv6/route.c:657 [inline]
       find_match net/ipv6/route.c:757 [inline]
       find_match+0x521/0x790 net/ipv6/route.c:733
       __find_rr_leaf+0xe3/0x780 net/ipv6/route.c:831
       find_rr_leaf net/ipv6/route.c:852 [inline]
       rt6_select net/ipv6/route.c:896 [inline]
       fib6_table_lookup+0x383/0x650 net/ipv6/route.c:2164
       ip6_pol_route+0xee/0x5c0 net/ipv6/route.c:2200
       ip6_pol_route_output+0x48/0x60 net/ipv6/route.c:2452
       fib6_rule_lookup+0x3d6/0x470 net/ipv6/fib6_rules.c:117
       ip6_route_output_flags_noref+0x16b/0x230 net/ipv6/route.c:2484
       ip6_route_output_flags+0x50/0x1a0 net/ipv6/route.c:2497
       ip6_dst_lookup_tail+0x25d/0xc30 net/ipv6/ip6_output.c:1049
       ip6_dst_lookup_flow+0x68/0x120 net/ipv6/ip6_output.c:1150
       inet6_csk_route_socket+0x2f7/0x420 net/ipv6/inet6_connection_sock.c:106
       inet6_csk_xmit+0x91/0x1f0 net/ipv6/inet6_connection_sock.c:121
       __tcp_transmit_skb+0xe81/0x1d60 net/ipv4/tcp_output.c:1169
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 18894 Comm: udevd Not tainted 5.4.0-rc3+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: cc3a86c8 ("ipv6: Change rt6_probe to take a fib6_nh")
      Fixes: f547fac6 ("ipv6: rate-limit probes for neighbourless routes")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Reviewed-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1bef4c22
    • Salil Mehta's avatar
      net: hns: Fix the stray netpoll locks causing deadlock in NAPI path · bf5a6b4c
      Salil Mehta authored
      This patch fixes the problem of the spin locks, originally
      meant for the netpoll path of hns driver, causing deadlock in
      the normal NAPI poll path. The issue happened due to the presence
      of the stray leftover spin lock code related to the netpoll,
      whose support was earlier removed from the HNS[1], got activated
      due to enabling of NET_POLL_CONTROLLER switch.
      
      Earlier background:
      The netpoll handling code originally had this bug(as identified
      by Marc Zyngier[2]) of wrong spin lock API being used which did
      not disable the interrupts and hence could cause locking issues.
      i.e. if the lock were first acquired in context to thread like
      'ip' util and this lock if ever got later acquired again in
      context to the interrupt context like TX/RX (Interrupts could
      always pre-empt the lock holding task and acquire the lock again)
      and hence could cause deadlock.
      
      Proposed Solution:
      1. If the netpoll was enabled in the HNS driver, which is not
         right now, we could have simply used spin_[un]lock_irqsave()
      2. But as netpoll is disabled, therefore, it is best to get rid
         of the existing locks and stray code for now. This should
         solve the problem reported by Marc.
      
      [1] https://git.kernel.org/torvalds/c/4bd2c03be7
      [2] https://patchwork.ozlabs.org/patch/1189139/
      
      Fixes: 4bd2c03b ("net: hns: remove ndo_poll_controller")
      Cc: lipeng <lipeng321@huawei.com>
      Cc: Yisen Zhuang <yisen.zhuang@huawei.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: David S. Miller <davem@davemloft.net>
      Reported-by: default avatarMarc Zyngier <maz@kernel.org>
      Acked-by: default avatarMarc Zyngier <maz@kernel.org>
      Tested-by: default avatarMarc Zyngier <maz@kernel.org>
      Signed-off-by: default avatarSalil Mehta <salil.mehta@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bf5a6b4c
  3. 07 Nov, 2019 24 commits