1. 26 Jun, 2020 3 commits
    • enetc: Fix tx rings bitmap iteration range, irq handling · 0574e200
      Claudiu Manoil authored
      The rings bitmap of an interrupt vector encodes
      which of the device's rings were assigned to that
      interrupt vector.
      Hence the iteration range of the tx rings bitmap
      (for_each_set_bit()) should be the total number of
      Tx rings of that netdevice instead of the number of
      rings assigned to the interrupt vector.
      Since there are 2 cores, and one interrupt vector for
      each core, the number of rings assigned to an interrupt
      vector is half the number of available rings.
      The impact of this error is that the upper half of the
      tx rings could still generate interrupts during napi
      polling.
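
The effect of the iteration range can be illustrated with a small user-space sketch. It is not the driver code: count_rings_visited() is a hypothetical stand-in for a for_each_set_bit() loop, and the bitmap 0x55 (rings 0, 2, 4, 6 of 8 assigned to one vector) is an assumed example layout.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for a for_each_set_bit() loop: walk bit positions
 * 0 .. nbits-1 of `bitmap` and count how many set bits (rings) are visited. */
static int count_rings_visited(unsigned long bitmap, unsigned int nbits)
{
	int visited = 0;

	assert(nbits <= sizeof(bitmap) * CHAR_BIT);
	for (unsigned int bit = 0; bit < nbits; bit++)
		if (bitmap & (1UL << bit))
			visited++;
	return visited;
}
```

With the assumed bitmap, count_rings_visited(0x55UL, 4) visits only rings 0 and 2, while count_rings_visited(0x55UL, 8) visits all four assigned rings. Bounding the loop by the per-vector ring count is what leaves the upper rings unhandled.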
      
      Fixes: d4fd0404 ("enetc: Introduce basic PF and VF ENETC ethernet drivers")
      Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ionic: update the queue count on open · fa48494c
      Shannon Nelson authored
      Let the network stack know the real number of queues that
      we are using.
      
      v2: added error checking
      
      Fixes: 49d3b493 ("ionic: disable the queues on link down")
      Signed-off-by: Shannon Nelson <snelson@pensando.io>
      Reviewed-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 4a21185c
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Don't insert ESP trailer twice in IPSEC code, from Huy Nguyen.
      
       2) The default crypto algorithm selection in Kconfig for IPSEC is out
          of touch with modern reality, fix this up. From Eric Biggers.
      
       3) bpftool is missing an entry for BPF_MAP_TYPE_RINGBUF, from Andrii
          Nakryiko.
      
       4) Missing init of ->frame_sz in xdp_convert_zc_to_xdp_frame(), from
          Hangbin Liu.
      
       5) Adjust packet alignment handling in ax88179_178a driver to match
          what the hardware actually does. From Jeremy Kerr.
      
       6) register_netdevice can leak in the case one of the notifiers fail,
          from Yang Yingliang.
      
       7) Use after free in ip_tunnel_lookup(), from Taehee Yoo.
      
       8) VLAN checks in sja1105 DSA driver need adjustments, from Vladimir
          Oltean.
      
       9) tg3 driver can sleep forever when we get enough EEH errors, fix from
          David Christensen.
      
      10) Missing {READ,WRITE}_ONCE() annotations in various Intel ethernet
          drivers, from Ciara Loftus.
      
      11) Fix scanning loop break condition in of_mdiobus_register(), from
          Florian Fainelli.
      
      12) MTU limit is incorrect in ibmveth driver, from Thomas Falcon.
      
      13) Endianness fix in mlxsw, from Ido Schimmel.
      
      14) Use after free in smsc95xx usbnet driver, from Tuomas Tynkkynen.
      
      15) Missing bridge mrp configuration validation, from Horatiu Vultur.
      
      16) Fix circular netns references in wireguard, from Jason A. Donenfeld.
      
      17) PTP initialization on recovery is not done properly in qed driver,
          from Alexander Lobakin.
      
      18) Endian conversion of L4 ports in filters of cxgb4 driver is wrong,
          from Rahul Lakkireddy.
      
      19) Don't clear bound device TX queue of socket prematurely otherwise we
          get problems with ktls hw offloading, from Tariq Toukan.
      
      20) ipset can do atomics on unaligned memory, fix from Russell King.
      
      21) Align ethernet addresses properly in bridging code, from Thomas
          Martitz.
      
      22) Don't advertise ipv4 addresses on SCTP sockets having ipv6only set,
          from Marcelo Ricardo Leitner.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (149 commits)
        rds: transport module should be auto loaded when transport is set
        sch_cake: fix a few style nits
        sch_cake: don't call diffserv parsing code when it is not needed
        sch_cake: don't try to reallocate or unshare skb unconditionally
        ethtool: fix error handling in linkstate_prepare_data()
        wil6210: account for napi_gro_receive never returning GRO_DROP
        hns: do not cast return value of napi_gro_receive to null
        socionext: account for napi_gro_receive never returning GRO_DROP
        wireguard: receive: account for napi_gro_receive never returning GRO_DROP
        vxlan: fix last fdb index during dump of fdb with nhid
        sctp: Don't advertise IPv4 addresses if ipv6only is set on the socket
        tc-testing: avoid action cookies with odd length.
        bpf: tcp: bpf_cubic: fix spurious HYSTART_DELAY exit upon drop in min RTT
        tcp_cubic: fix spurious HYSTART_DELAY exit upon drop in min RTT
        net: dsa: sja1105: fix tc-gate schedule with single element
        net: dsa: sja1105: recalculate gating subschedule after deleting tc-gate rules
        net: dsa: sja1105: unconditionally free old gating config
        net: dsa: sja1105: move sja1105_compose_gating_subschedule at the top
        net: macb: free resources on failure path of at91ether_open()
        net: macb: call pm_runtime_put_sync on failure path
        ...
  2. 25 Jun, 2020 37 commits
    • rds: transport module should be auto loaded when transport is set · 4c342f77
      Rao Shoaib authored
      This enhancement auto loads transport module when the transport
      is set via SO_RDS_TRANSPORT socket option.
      Reviewed-by: Ka-Cheong Poon <ka-cheong.poon@oracle.com>
      Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
      Signed-off-by: Rao Shoaib <rao.shoaib@oracle.com>
      Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'sched-A-couple-of-fixes-for-sch_cake' · 6aeaf262
      David S. Miller authored
      Toke Høiland-Jørgensen says:
      
      ====================
      sched: A couple of fixes for sch_cake
      
      This series contains a couple of fixes for diffserv handling in sch_cake that
      provide a nice speedup (with a somewhat pedantic nit fix tacked on to the end).
      
      Not quite sure about whether this should go to stable; it does provide a nice
      speedup, but it's not strictly a fix in the "correctness" sense. I lean towards
      including this in stable as well, since our most important consumer of that
      (OpenWrt) is likely to backport the series anyway.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sch_cake: fix a few style nits · 3f608f0c
      Toke Høiland-Jørgensen authored
      I spotted a few nits when comparing the in-tree version of sch_cake with
      the out-of-tree one: A redundant error variable declaration shadowing an
      outer declaration, and an indentation alignment issue. Fix both of these.
      
      Fixes: 046f6fd5 ("sched: Add Common Applications Kept Enhanced (cake) qdisc")
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sch_cake: don't call diffserv parsing code when it is not needed · 8c95eca0
      Toke Høiland-Jørgensen authored
      As a further optimisation of the diffserv parsing codepath, we can skip it
      entirely if CAKE is configured to neither use diffserv-based
      classification, nor to zero out the diffserv bits.
      
      Fixes: c87b4ecd ("sch_cake: Make sure we can write the IP header before changing DSCP bits")
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sch_cake: don't try to reallocate or unshare skb unconditionally · 9208d286
      Ilya Ponetayev authored
      cake_handle_diffserv() tries to linearize mac and network header parts of
      skb and to make it writable unconditionally. In some cases it leads to full
      skb reallocation, which reduces throughput and increases CPU load. Some
      measurements of IPv4 forwarding + NAPT on a MIPS router with a 580 MHz
      single-core CPU were conducted. It appears that on kernel 4.9
      skb_try_make_writable() reallocates the skb if it was allocated in the
      ethernet driver via the so-called 'build skb' method from the page cache
      (this was first noticed through a strange increase of the kmalloc-2048 slab).
      
      Obtain DSCP value via read-only skb_header_pointer() call, and leave
      linearization only for DSCP bleaching or ECN CE setting. And, as an
      additional optimisation, skip diffserv parsing entirely if it is not needed
      by the current configuration.
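
The read-first, write-only-when-needed idea can be sketched in user space as follows. This is a simplified model, not the sch_cake code: struct pkt, its writable_copies counter and handle_diffserv() are hypothetical names, and the counter merely stands in for the cost of making the header writable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet model: `writable_copies` counts how often the
 * expensive "make the header writable" path had to be taken. */
struct pkt {
	uint8_t tos;          /* IPv4 TOS byte: DSCP in the upper 6 bits, ECN in the lower 2 */
	int writable_copies;
};

/* Read the DSCP without modifying the packet; only pay for a writable
 * header when `wash` asks for non-zero DSCP bits to be cleared. */
static uint8_t handle_diffserv(struct pkt *p, bool wash)
{
	uint8_t dscp;

	assert(p != NULL);
	dscp = p->tos >> 2;
	if (wash && dscp != 0) {
		p->writable_copies++;   /* stands in for the make-writable step */
		p->tos &= 0x03;         /* keep the ECN bits, zero the DSCP */
	}
	return dscp;
}
```

In this model a packet that is only classified never touches the write path, so the reallocation cost described above is paid only when bleaching is actually requested.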
      
      Fixes: c87b4ecd ("sch_cake: Make sure we can write the IP header before changing DSCP bits")
      Signed-off-by: Ilya Ponetayev <i.ponetaev@ndmsystems.com>
      [ fix a few style issues, reflow commit message ]
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ethtool: fix error handling in linkstate_prepare_data() · 1ae71d99
      Michal Kubecek authored
      When getting SQI or maximum SQI value fails in linkstate_prepare_data(), we
      must not return without calling ethnl_ops_complete(dev) as that could
      result in imbalance between ethtool_ops ->begin() and ->complete() calls.
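
The balance requirement can be illustrated with a generic error-path sketch. It is not the ethtool code: ops_begin(), ops_complete(), get_sqi() and prepare_data() are hypothetical helpers, and ops_depth is an assumed counter used only to make the imbalance observable.

```c
#include <assert.h>
#include <stddef.h>

static int ops_depth;   /* +1 on begin, -1 on complete; should return to 0 */

static int ops_begin(void)     { ops_depth++; return 0; }
static void ops_complete(void) { ops_depth--; }

/* Hypothetical query step; a negative value models an errno-style failure. */
static int get_sqi(int simulated_result) { return simulated_result; }

static int prepare_data(int simulated_sqi, int *sqi_out)
{
	int ret;

	assert(sqi_out != NULL);
	ret = ops_begin();
	if (ret < 0)
		return ret;     /* nothing to undo yet */

	ret = get_sqi(simulated_sqi);
	if (ret < 0)
		goto out;       /* the failure still passes through ops_complete() */
	*sqi_out = ret;
	ret = 0;
out:
	ops_complete();
	return ret;
}
```

Routing every exit after a successful begin through a single label keeps the two calls paired, whichever query fails.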
      
      Fixes: 80660219 ("ethtool: provide UAPI for PHY Signal Quality Index (SQI)")
      Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'trace-v5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · 42e9c85f
      Linus Torvalds authored
      Pull tracing fixes from Steven Rostedt:
       "Four small fixes:
      
         - Fix a ringbuffer bug for nested events having time go backwards
      
         - Fix a config dependency for boot time tracing to depend on
           synthetic events instead of histograms.
      
         - Fix trigger format parsing to handle multiple spaces
      
         - Fix bootconfig to handle failures in multiple events"
      
      * tag 'trace-v5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        tracing/boottime: Fix kprobe multiple events
        tracing: Fix event trigger to accept redundant spaces
        tracing/boot: Fix config dependency for synthedic event
        ring-buffer: Zero out time extend if it is nested and not absolute
    • Merge branch 'napi_gro_receive-caller-return-value-cleanups' · 0e00c05f
      David S. Miller authored
      Jason A. Donenfeld says:
      
      ====================
      napi_gro_receive caller return value cleanups
      
      In 6570bc79 ("net: core: use listified Rx for GRO_NORMAL in
      napi_gro_receive()"), the GRO_NORMAL case stopped calling
      netif_receive_skb_internal, checking its return value, and returning
      GRO_DROP in case it failed. Instead, it calls into
      netif_receive_skb_list_internal (after a bit of indirection), which
      doesn't return any error. Therefore, napi_gro_receive will never return
      GRO_DROP, making handling GRO_DROP dead code.
      
      I emailed the author of 6570bc79 on netdev [1] to see if this change
      was intentional, but the dlink.ru email address has been disconnected,
      and looking a bit further myself, it seems somewhat infeasible to start
      propagating return values backwards from the internal machinations of
      netif_receive_skb_list_internal.
      
      Taking a look at all the callers of napi_gro_receive, it appears that
      three are checking the return value for the purpose of comparing it to
      the now never-happening GRO_DROP, and one just casts it to (void), a
      likely historical leftover. Every other of the 120 callers does not
      bother checking the return value.
      
      And it seems like these remaining 116 callers are doing the right thing:
      after calling napi_gro_receive, the packet is now in the hands of the
      upper layers of the networking stack, and the device driver itself has no
      business now making decisions based on what the upper layers choose to
      do. Incrementing stats counters on GRO_DROP seems like a mistake, made
      by these three drivers, but not by the remaining 117.
      
      It would seem, therefore, that after rectifying these four callers of
      napi_gro_receive, I should go ahead and just remove returning the
      value from napi_gro_receive altogether. However, napi_gro_receive has
      a function event tracer, and being able to introspect into the
      networking stack to see how often napi_gro_receive is returning whatever
      interesting GRO status (aside from _DROP) remains an interesting
      data point worth keeping for debugging.
      
      So, this series simply gets rid of the return value checking for the
      four useless places where that check never evaluates to anything
      meaningful.
      
      [1] https://lore.kernel.org/netdev/20200624210606.GA1362687@zx2c4.com/
      ====================
      Acked-by: Edward Cree <ecree@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • wil6210: account for napi_gro_receive never returning GRO_DROP · 045790b7
      Jason A. Donenfeld authored
      The napi_gro_receive function no longer returns GRO_DROP ever, making
      handling GRO_DROP dead code. This commit removes that dead code.
      Further, it's not even clear that device drivers have any business in
      taking action after passing off received packets; that's arguably out of
      their hands. In this case, too, the non-gro path didn't bother checking
      the return value. Plus, this had some clunky debugging functions that
      duplicated code from elsewhere and was generally pretty messy. So, this
      commit cleans that all up too.
      
      Fixes: 6570bc79 ("net: core: use listified Rx for GRO_NORMAL in napi_gro_receive()")
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • hns: do not cast return value of napi_gro_receive to null · 93ab48a9
      Jason A. Donenfeld authored
      Basically no drivers care about the return value here, and there's no
      __must_check that would make casting to void sensible, so remove it.
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • socionext: account for napi_gro_receive never returning GRO_DROP · e5e7d805
      Jason A. Donenfeld authored
      The napi_gro_receive function no longer returns GRO_DROP ever, making
      handling GRO_DROP dead code. This commit removes that dead code.
      Further, it's not even clear that device drivers have any business in
      taking action after passing off received packets; that's arguably out of
      their hands.
      
      Fixes: 6570bc79 ("net: core: use listified Rx for GRO_NORMAL in napi_gro_receive()")
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • wireguard: receive: account for napi_gro_receive never returning GRO_DROP · df08126e
      Jason A. Donenfeld authored
      The napi_gro_receive function no longer returns GRO_DROP ever, making
      handling GRO_DROP dead code. This commit removes that dead code.
      Further, it's not even clear that device drivers have any business in
      taking action after passing off received packets; that's arguably out of
      their hands.
      
      Fixes: e7096c13 ("net: WireGuard secure network tunnel")
      Fixes: 6570bc79 ("net: core: use listified Rx for GRO_NORMAL in napi_gro_receive()")
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vxlan: fix last fdb index during dump of fdb with nhid · b18e9834
      Roopa Prabhu authored
      This patch fixes the last saved fdb index in the fdb dump handler when
      handling fdb entries with nhid.
      
      Fixes: 1274e1cc ("vxlan: ecmp support for mac fdb entries")
      Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: Don't advertise IPv4 addresses if ipv6only is set on the socket · 471e39df
      Marcelo Ricardo Leitner authored
      If a socket is set ipv6only, it will still send IPv4 addresses in the
      INIT and INIT_ACK packets. This potentially misleads the peer into using
      them, which then would cause association termination.
      
      The fix is to not add IPv4 addresses to ipv6only sockets.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reported-by: Corey Minyard <cminyard@mvista.com>
      Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Tested-by: Corey Minyard <cminyard@mvista.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tc-testing: avoid action cookies with odd length. · b6186d41
      Briana Oursler authored
      Update odd length cookie hexstrings in csum.json, tunnel_key.json and
      bpf.json to be even length to comply with check enforced in commit
      0149dabf2a1b ("tc: m_actions: check cookie hexstring len") in iproute2.
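
An even-length requirement on a hexstring can be sketched as below. This is an assumption about the general shape of such a check, not the iproute2 implementation: cookie_hex_ok() is a hypothetical helper, and rejecting the empty string is a choice made for this sketch.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical validator: accept only non-empty strings of hex digits with
 * an even length, so that every byte is described by exactly two digits. */
static bool cookie_hex_ok(const char *hex)
{
	size_t len;

	assert(hex != NULL);
	len = strlen(hex);
	if (len == 0 || len % 2 != 0)
		return false;
	for (size_t i = 0; i < len; i++)
		if (!isxdigit((unsigned char)hex[i]))
			return false;
	return true;
}
```

Under this sketch "a1b2" is accepted, while "a1b" is rejected because its last digit does not form a complete byte.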
      Signed-off-by: Briana Oursler <briana.oursler@gmail.com>
      Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
      Reviewed-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'tcp_cubic-fix-spurious-HYSTART_DELAY-on-RTT-decrease' · 3b0e7dc0
      David S. Miller authored
      Neal Cardwell says:
      
      ====================
      tcp_cubic: fix spurious HYSTART_DELAY on RTT decrease
      
      This series fixes a long-standing bug in the TCP CUBIC
      HYSTART_DELAY mechanism recently reported by Mirja Kuehlewind. The
      code can cause a spurious exit of slow start in some particular
      cases: upon an RTT decrease that happens on the 9th or later ACK
      in a round trip. This series fixes the original Hystart code and
      also the recent BPF implementation.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: tcp: bpf_cubic: fix spurious HYSTART_DELAY exit upon drop in min RTT · 7d21d54d
      Neal Cardwell authored
      Apply the fix from:
       "tcp_cubic: fix spurious HYSTART_DELAY exit upon drop in min RTT"
      to the BPF implementation of TCP CUBIC congestion control.
      
      Repeating the commit description here for completeness:
      
      Mirja Kuehlewind reported a bug in Linux TCP CUBIC Hystart, where
      Hystart HYSTART_DELAY mechanism can exit Slow Start spuriously on an
      ACK when the minimum rtt of a connection goes down. From inspection it
      is clear from the existing code that this could happen in an example
      like the following:
      
      o The first 8 RTT samples in a round trip are 150ms, resulting in a
        curr_rtt of 150ms and a delay_min of 150ms.
      
      o The 9th RTT sample is 100ms. The curr_rtt does not change after the
        first 8 samples, so curr_rtt remains 150ms. But delay_min can be
        lowered at any time, so delay_min falls to 100ms. The code executes
        the HYSTART_DELAY comparison between curr_rtt of 150ms and delay_min
        of 100ms, and the curr_rtt is declared far enough above delay_min to
        force a (spurious) exit of Slow start.
      
      The fix here is simple: allow every RTT sample in a round trip to
      lower the curr_rtt.
      
      Fixes: 6de4a9c4 ("bpf: tcp: Add bpf_cubic example")
      Reported-by: Mirja Kuehlewind <mirja.kuehlewind@ericsson.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp_cubic: fix spurious HYSTART_DELAY exit upon drop in min RTT · b344579c
      Neal Cardwell authored
      Mirja Kuehlewind reported a bug in Linux TCP CUBIC Hystart, where
      Hystart HYSTART_DELAY mechanism can exit Slow Start spuriously on an
      ACK when the minimum rtt of a connection goes down. From inspection it
      is clear from the existing code that this could happen in an example
      like the following:
      
      o The first 8 RTT samples in a round trip are 150ms, resulting in a
        curr_rtt of 150ms and a delay_min of 150ms.
      
      o The 9th RTT sample is 100ms. The curr_rtt does not change after the
        first 8 samples, so curr_rtt remains 150ms. But delay_min can be
        lowered at any time, so delay_min falls to 100ms. The code executes
        the HYSTART_DELAY comparison between curr_rtt of 150ms and delay_min
        of 100ms, and the curr_rtt is declared far enough above delay_min to
        force a (spurious) exit of Slow start.
      
      The fix here is simple: allow every RTT sample in a round trip to
      lower the curr_rtt.
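
The example can be replayed with a small user-space model. It is a sketch, not the kernel code: struct hystart, hystart_sample() and replay_example() are hypothetical names, times are in whole milliseconds, and the threshold of delay_min / 8 clamped to 4..16 ms is an assumption made for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MIN_SAMPLES 8   /* mirrors the "first 8 RTT samples" in the text */

struct hystart {
	unsigned int curr_rtt;    /* lowest RTT seen in this round, ms */
	unsigned int delay_min;   /* lowest RTT seen on the connection, ms */
	unsigned int sample_cnt;
};

/* Assumed threshold: delay_min / 8, clamped to the range 4..16 ms. */
static unsigned int delay_thresh(unsigned int delay_min)
{
	unsigned int t = delay_min / 8;

	return t < 4 ? 4 : (t > 16 ? 16 : t);
}

/* Feed one RTT sample; returns true if slow start would be exited.
 * With `fixed` false, only the first MIN_SAMPLES samples may lower curr_rtt. */
static bool hystart_sample(struct hystart *h, unsigned int rtt, bool fixed)
{
	assert(h != NULL);
	if (rtt < h->delay_min)
		h->delay_min = rtt;
	if ((fixed || h->sample_cnt < MIN_SAMPLES) && rtt < h->curr_rtt)
		h->curr_rtt = rtt;
	if (h->sample_cnt < MIN_SAMPLES) {
		h->sample_cnt++;
		return false;
	}
	return h->curr_rtt > h->delay_min + delay_thresh(h->delay_min);
}

/* Replay the example: eight 150 ms samples, then one 100 ms sample. */
static bool replay_example(bool fixed)
{
	struct hystart h = { .curr_rtt = ~0U, .delay_min = ~0U, .sample_cnt = 0 };
	bool exited = false;

	for (int i = 0; i < MIN_SAMPLES; i++)
		exited |= hystart_sample(&h, 150, fixed);
	exited |= hystart_sample(&h, 100, fixed);
	return exited;
}
```

In this model replay_example(false) reports an exit, because curr_rtt stays at 150 ms while delay_min drops to 100 ms. replay_example(true) does not, because the 9th sample lowers curr_rtt to 100 ms as well.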
      
      Fixes: ae27e98a ("[TCP] CUBIC v2.3")
      Reported-by: Mirja Kuehlewind <mirja.kuehlewind@ericsson.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'Fixes-for-SJA1105-DSA-tc-gate-action' · 29a30bac
      David S. Miller authored
      Vladimir Oltean says:
      
      ====================
      Fixes for SJA1105 DSA tc-gate action
      
      This small series fixes 2 bugs in the tc-gate implementation:
      1. The TAS state machine keeps getting rescheduled even after removing
         tc-gate actions on all ports.
      2. tc-gate actions with only one gate control list entry are installed
         to hardware with an incorrect interval of zero, which makes the
         switch erroneously drop those packets (since the configuration is
         invalid).
      
      To keep the code palatable, a forward-declaration was avoided by moving
      some code around in patch 1/4. I hope that isn't too much of an issue.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: sja1105: fix tc-gate schedule with single element · 43ce887c
      Vladimir Oltean authored
      The sja1105_gating_cfg_time_to_interval function does this, as per the
      comments:
      
      /* The gate entries contain absolute times in their e->interval field. Convert
       * that to proper intervals (i.e. "0, 5, 10, 15" to "5, 5, 5, 5").
       */
      
      To perform that task, it iterates over gating_cfg->entries, at each step
      updating the interval of the _previous_ entry. So one interval remains
      to be updated at the end of the loop: the last one (since it isn't
      "prev" for anyone else).
      
      But there was an erroneous check, that the last element's interval
      should not be updated if it's also the only element. I'm not quite sure
      why that check was there, but it's clearly incorrect, as a tc-gate
      schedule with a single element would get an e->interval of zero,
      regardless of the duration requested by the user. The switch wouldn't
      even consider this configuration as valid: it will just drop all traffic
      that matches the rule.
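
The conversion described above can be sketched on a plain array. This is not the driver function, which walks a linked list of gate entries: time_to_interval() is a hypothetical helper, and the cycle time of 20 used in the note below is an assumed value chosen to match the "0, 5, 10, 15" example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert absolute start times (ascending, within one cycle) into durations
 * in place: each entry lasts until the next one starts, and the last entry
 * lasts until the end of the cycle. */
static void time_to_interval(uint64_t *t, size_t n, uint64_t cycle_time)
{
	assert(t != NULL || n == 0);
	if (n == 0)
		return;
	for (size_t i = 1; i < n; i++)
		t[i - 1] = t[i] - t[i - 1];
	/* No special case for n == 1: a lone entry spans the whole cycle. */
	t[n - 1] = cycle_time - t[n - 1];
}
```

With the assumed cycle time of 20, the times 0, 5, 10, 15 become 5, 5, 5, 5. A single entry starting at 0 gets the full cycle as its interval rather than zero, which is the case the erroneous check got wrong.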
      
      Fixes: 834f8933 ("net: dsa: sja1105: implement tc-gate using time-triggered virtual links")
      Reported-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: sja1105: recalculate gating subschedule after deleting tc-gate rules · 82f6896a
      Vladimir Oltean authored
      Currently, tas_data->enabled would remain true even after deleting all
      tc-gate rules from the switch ports, which would cause the
      sja1105_tas_state_machine to get unnecessarily scheduled.
      
      Also, if there were any errors which would prevent the hardware from
      enabling the gating schedule, the sja1105_tas_state_machine would
      continuously detect and print that, spamming the kernel log, even if the
      rules were subsequently deleted.
      
      The rules themselves are _not_ active, because sja1105_init_scheduling
      does enough of a job to not install the gating schedule in the static
      config. But the virtual link rules themselves are still present.
      
      So call the functions that remove the tc-gate configuration from
      priv->tas_data.gating_cfg, so that tas_data->enabled can be set to
      false, and sja1105_tas_state_machine will stop being scheduled.
      
      Fixes: 834f8933 ("net: dsa: sja1105: implement tc-gate using time-triggered virtual links")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: sja1105: unconditionally free old gating config · 026bdb2b
      Vladimir Oltean authored
      Currently sja1105_compose_gating_subschedule is not prepared to be
      called for the case where we want to recompute the global tc-gate
      configuration after we've deleted those actions on a port.
      
      After deleting the tc-gate actions on the last port, max_cycle_time
      would become zero, and that would incorrectly prevent
      sja1105_free_gating_config from getting called.
      
      So move the freeing function above the check for the need to apply a new
      configuration.
      
      Fixes: 834f8933 ("net: dsa: sja1105: implement tc-gate using time-triggered virtual links")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: sja1105: move sja1105_compose_gating_subschedule at the top · e39109f5
      Vladimir Oltean authored
      It turns out that sja1105_compose_gating_subschedule must also be called
      from sja1105_vl_delete, to recalculate the overall tc-gate
      configuration. Currently this is not possible without introducing a
      forward declaration. So move the function to the top of the file, along
      with its dependencies.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: macb: free resources on failure path of at91ether_open() · 33fdef24
      Claudiu Beznea authored
      DMA buffers were not freed on the failure path of at91ether_open().
      Along with the changes for freeing the DMA buffers, the enable/disable
      interrupt instructions were moved to the at91ether_start()/at91ether_stop()
      functions, and the operations in at91ether_stop() are now done in the
      reverse order of those in at91ether_start(). Before this patch, the
      operation order on the interface open path was as follows:
      1/ alloc DMA buffers
      2/ enable tx, rx
      3/ enable interrupts
      and the order on the interface close path was as follows:
      1/ disable tx, rx
      2/ disable interrupts
      3/ free dma buffers.
      
      Fixes: 7897b071 ("net: macb: convert to phylink")
      Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: macb: call pm_runtime_put_sync on failure path · 0eaf228d
      Claudiu Beznea authored
      Call pm_runtime_put_sync() on failure path of at91ether_open.
      
      Fixes: e6a41c23 ("net: macb: ensure interface is not suspended on at91rm9200")
      Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'fsnotify_for_v5.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs · 52366a10
      Linus Torvalds authored
      Pull fsnotify fixlet from Jan Kara:
       "A performance improvement to reduce impact of fsnotify for inodes
        where it isn't used"
      
      * tag 'fsnotify_for_v5.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
        fs: Do not check if there is a fsnotify watcher on pseudo inodes
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · f4926d51
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for net, they are:
      
      1) Unaligned atomic access in ipset, from Russell King.
      
      2) Missing module description, from Rob Gill.
      
      3) Patches to fix a module unload causing NULL pointer dereference in
         xtables, from David Wilder. For the record, I am posting here his cover
         letter explaining the problem:
      
          A crash happened on ppc64le when running ltp network tests triggered by
          "rmmod iptable_mangle".
      
          See previous discussion in this thread:
          https://lists.openwall.net/netdev/2020/06/03/161 .
      
          In the crash I found in iptable_mangle_hook() that
          state->net->ipv4.iptable_mangle=NULL causing a NULL pointer dereference.
          net->ipv4.iptable_mangle is set to NULL in iptable_mangle_net_exit(),
          which is called when the ip_mangle module is unloaded. A rmmod task was found running
          in the crash dump.  A 2nd crash showed the same problem when running
          "rmmod iptable_filter" (net->ipv4.iptable_filter=NULL).
      
          To fix this I added a .pre_exit hook in every iptable_foo.c. The pre_exit
          hook un-registers the underlying hook and exit frees the table. The
          netns core does an unconditional synchronize_rcu() after the pre_exit hooks,
          ensuring no packets are in flight that have picked up the pointer before
          completing the un-register.
      
          These patches include changes for both iptables and ip6tables.
      
          We tested this fix with ltp running iptables01.sh and iptables01.sh -6 in a
          loop for 72 hours.
      
      4) Add a selftest for conntrack helper assignment, from Florian Westphal.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f4926d51
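The two-phase teardown described in the xtables cover letter above can be sketched in user-space C. This is a single-threaded model under stated assumptions, not the kernel code: `struct demo_net` and every `demo_*` function are hypothetical names, and the RCU grace period that the netns core inserts between the two phases is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for per-netns state such as net->ipv4.iptable_mangle. */
struct demo_net {
    bool hook_registered; /* packets only reach the hook while this is true */
    int *table;           /* stand-in for the table pointer */
};

/* Allocate the table, then expose the hook. Returns 0 on success. */
int demo_net_init(struct demo_net *net)
{
    net->hook_registered = false;
    net->table = malloc(sizeof(*net->table));
    if (!net->table)
        return -1;
    *net->table = 42;
    net->hook_registered = true;
    return 0;
}

/* Packet path: returns the table value, or -1 when no hook is registered.
 * If the hook were still registered while table == NULL, this would be
 * the NULL pointer dereference from the crash report. */
int demo_hook(const struct demo_net *net)
{
    if (!net->hook_registered)
        return -1;
    return *net->table;
}

/* Phase 1 (modeled on .pre_exit): stop new packets from reaching the hook. */
void demo_pre_exit(struct demo_net *net)
{
    net->hook_registered = false;
}

/* Phase 2 (modeled on .exit): free the table once nothing can reach it. */
void demo_exit(struct demo_net *net)
{
    free(net->table);
    net->table = NULL;
}
```

Calling `demo_pre_exit()` before `demo_exit()` means `demo_hook()` never sees a registered hook with a NULL table, which is the ordering the patches aim for.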
    • Thomas Martitz's avatar
      net: bridge: enforce alignment for ethernet address · 206e7323
      Thomas Martitz authored
      The eth_addr member is passed to ether_addr functions that require
      2-byte alignment, therefore the member must be properly aligned
      to avoid unaligned accesses.
      
      The problem has been present since the initial merge of multicast to unicast:
      commit 6db6f0ea ("bridge: multicast to unicast")
      
      Fixes: 6db6f0ea ("bridge: multicast to unicast")
      Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
      Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Felix Fietkau <nbd@nbd.name>
      Signed-off-by: Thomas Martitz <t.martitz@avm.de>
      Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      206e7323
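To illustrate the alignment requirement in the bridge commit above, here is a small C11 sketch. The struct names and the `flags` field are hypothetical and do not reproduce the bridge code; the sketch only shows how a preceding one-byte member can leave a 6-byte address at an odd offset, and how an explicit alignment request moves it to a 2-byte boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_ETH_ALEN 6 /* length of an Ethernet address in bytes */

/* Hypothetical layout: the one-byte field puts the address at offset 1. */
struct demo_group_unaligned {
    unsigned char flags;
    unsigned char eth_addr[DEMO_ETH_ALEN];
};

/* Same fields, but the address is forced onto a 2-byte boundary (C11). */
struct demo_group_aligned {
    unsigned char flags;
    _Alignas(2) unsigned char eth_addr[DEMO_ETH_ALEN];
};

/* Returns 1 if p sits on a 2-byte boundary, 0 otherwise. */
int demo_is_u16_aligned(const void *p)
{
    return ((uintptr_t)p % 2) == 0;
}
```

With this layout, `offsetof(struct demo_group_unaligned, eth_addr)` is 1 while the aligned variant places the member at offset 2, so helpers that read the address as 16-bit words can do so without unaligned accesses.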
    • Linus Torvalds's avatar
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma · 87d93e9a
      Linus Torvalds authored
      Pull rdma fixes from Jason Gunthorpe:
       "Several regression fixes from work that landed in the merge window,
        particularly in the mlx5 driver:
      
         - Various static checker and warning fixes
      
         - General bug fixes in rvt, qedr, hns, mlx5 and hfi1
      
         - Several regression fixes related to the ECE and QP changes in last
           cycle
      
         - Fixes for a few long standing crashers in CMA, uverbs ioctl, and
           xrc"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (25 commits)
        IB/hfi1: Add atomic triggered sleep/wakeup
        IB/hfi1: Correct -EBUSY handling in tx code
        IB/hfi1: Fix module use count flaw due to leftover module put calls
        IB/hfi1: Restore kfree in dummy_netdev cleanup
        IB/mad: Fix use after free when destroying MAD agent
        RDMA/mlx5: Protect from kernel crash if XRC_TGT doesn't have udata
        RDMA/counter: Query a counter before release
        RDMA/mad: Fix possible memory leak in ib_mad_post_receive_mads()
        RDMA/mlx5: Fix integrity enabled QP creation
        RDMA/mlx5: Remove ECE limitation from the RAW_PACKET QPs
        RDMA/mlx5: Fix remote gid value in query QP
        RDMA/mlx5: Don't access ib_qp fields in internal destroy QP path
        RDMA/core: Check that type_attrs is not NULL prior access
        RDMA/hns: Fix an cmd queue issue when resetting
        RDMA/hns: Fix a calltrace when registering MR from userspace
        RDMA/mlx5: Add missed RST2INIT and INIT2INIT steps during ECE handshake
        RDMA/cma: Protect bind_list and listen_list while finding matching cm id
        RDMA/qedr: Fix KASAN: use-after-free in ucma_event_handler+0x532
        RDMA/efa: Set maximum pkeys device attribute
        RDMA/rvt: Fix potential memory leak caused by rvt_alloc_rq
        ...
      87d93e9a
    • Denis Kirjanov's avatar
      tcp: don't ignore ECN CWR on pure ACK · 25702840
      Denis Kirjanov authored
      The CWR flag set in an incoming pure ACK segment is currently ignored,
      which leads to a situation where the ECE flag is latched forever.
      
      The following packetdrill script shows what happens:
      
      // Stack receives incoming segments with CE set
      +0.1 <[ect0]  . 11001:12001(1000) ack 1001 win 65535
      +0.0 <[ce]    . 12001:13001(1000) ack 1001 win 65535
      +0.0 <[ect0] P. 13001:14001(1000) ack 1001 win 65535
      
      // Stack responds with ECN ECHO
      +0.0 >[noecn]  . 1001:1001(0) ack 12001
      +0.0 >[noecn] E. 1001:1001(0) ack 13001
      +0.0 >[noecn] E. 1001:1001(0) ack 14001
      
      // Write a packet
      +0.1 write(3, ..., 1000) = 1000
      +0.0 >[ect0] PE. 1001:2001(1000) ack 14001
      
      // Pure ACK received
      +0.01 <[noecn] W. 14001:14001(0) ack 2001 win 65535
      
      // Since CWR was sent, this packet should NOT have ECE set
      
      +0.1 write(3, ..., 1000) = 1000
      +0.0 >[ect0]  P. 2001:3001(1000) ack 14001
      // but Linux will still keep ECE latched here, with packetdrill
      // flagging a missing ECE flag, expecting
      // >[ect0] PE. 2001:3001(1000) ack 14001
      // in the script
      
      In the situation above we will continue to send ECN ECHO packets
      and trigger the peer to reduce the congestion window. To avoid that
      we can check CWR on pure ACKs received.
      
      v3:
      - Add a sequence check to avoid sending an ACK to an ACK
      
      v2:
      - Adjusted the comment
      - move CWR check before checking for unacknowledged packets
      Signed-off-by: Denis Kirjanov <denis.kirjanov@suse.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      25702840
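The latching behaviour shown in the packetdrill script above can be modeled with a few lines of C. This is a simplified sketch, not the kernel's TCP input path: `struct demo_ecn_rx`, `struct demo_segment`, and the `check_cwr_on_pure_ack` switch are hypothetical names introduced only to contrast the two behaviours, and the sequence check added in v3 is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical receiver-side ECN state; not the kernel's struct tcp_sock. */
struct demo_ecn_rx {
    bool demand_cwr; /* true: keep setting ECE on outgoing segments */
};

/* Hypothetical summary of one incoming segment. */
struct demo_segment {
    bool ce;          /* IP header carried Congestion Experienced */
    bool cwr;         /* TCP header carried CWR ("W" in packetdrill) */
    unsigned int len; /* payload bytes; 0 means a pure ACK */
};

/* Update the ECN state for one incoming segment. When
 * check_cwr_on_pure_ack is false, CWR is honored only on segments
 * that carry data, which models the behaviour being fixed. */
void demo_ecn_receive(struct demo_ecn_rx *rx, const struct demo_segment *seg,
                      bool check_cwr_on_pure_ack)
{
    bool has_data = seg->len > 0;

    if (seg->cwr && (has_data || check_cwr_on_pure_ack))
        rx->demand_cwr = false; /* peer reduced its window: stop echoing */
    if (seg->ce)
        rx->demand_cwr = true;  /* new congestion mark: start echoing */
}

/* Should the next outgoing segment carry ECE? */
bool demo_ecn_send_ece(const struct demo_ecn_rx *rx)
{
    return rx->demand_cwr;
}
```

Replaying the script's segments (one CE-marked data segment, then a pure ACK with CWR) leaves `demo_ecn_send_ece()` true when pure ACKs are ignored and false when they are checked.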
    • Ard Biesheuvel's avatar
      net: phy: mscc: avoid skcipher API for single block AES encryption · 5a3235e5
      Ard Biesheuvel authored
      The skcipher API dynamically instantiates the transformation object
      on request that implements the requested algorithm optimally on the
      given platform. This notion of optimality only matters for cases like
      bulk network or disk encryption, where performance can be a bottleneck,
      or in cases where the algorithm itself is not known at compile time.
      
      In the mscc case, we are dealing with AES encryption of a single
      block, and so neither concern applies, and we are better off using
      the AES library interface, which is lightweight and safe for this
      kind of use.
      
      Note that the scatterlist API does not permit references to buffers
      that are located on the stack, so the existing code is incorrect in
      any case, but avoiding the skcipher and scatterlist APIs entirely is
      the most straight-forward approach to fixing this.
      
      Cc: Antoine Tenart <antoine.tenart@bootlin.com>
      Cc: Andrew Lunn <andrew@lunn.ch>
      Cc: Florian Fainelli <f.fainelli@gmail.com>
      Cc: Heiner Kallweit <hkallweit1@gmail.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Fixes: 28c5107a ("net: phy: mscc: macsec support")
      Reviewed-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Tested-by: Antoine Tenart <antoine.tenart@bootlin.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5a3235e5
    • Linus Torvalds's avatar
      Merge tag 's390-5.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · 908f7d12
      Linus Torvalds authored
      Pull s390 fixes from Heiko Carstens:
      
       - Fix kernel crash on system call single stepping.
      
       - Make sure early program check handler is executed with DAT on to
         avoid an endless program check loop.
      
       - Add __GFP_NOWARN flag to debug feature to avoid user triggerable
         allocation failure messages.
      
      * tag 's390-5.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
        s390/debug: avoid kernel warning on too large number of pages
        s390/kasan: fix early pgm check handler execution
        s390: fix system call single stepping
      908f7d12
    • Linus Torvalds's avatar
      Merge tag 'sound-5.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · a4d3712b
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "A collection of small fixes gathered in the last two weeks.
      
        The major changes here are fixes for the recent DPCM regressions found
        on i.MX and Qualcomm platforms and fixes for resource leaks in ASoC
        DAI registrations.
      
        Other than those are mostly device-specific fixes including the usual
        USB- and HD-audio quirks, and a fix for syzkaller case and ID updates
        for new Intel platforms"
      
      * tag 'sound-5.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (32 commits)
        ALSA: usb-audio: Fix OOB access of mixer element list
        ALSA: usb-audio: add quirk for Samsung USBC Headset (AKG)
        ALSA: usb-audio: Add registration quirk for Kingston HyperX Cloud Flight S
        ASoC: rockchip: Fix a reference count leak.
        ASoC: amd: closing specific instance.
        ALSA: hda: Intel: add missing PCI IDs for ICL-H, TGL-H and EKL
        ASoC: hdac_hda: fix memleak with regmap not freed on remove
        ASoC: SOF: Intel: add PCI IDs for ICL-H and TGL-H
        ASoC: SOF: Intel: add PCI ID for CometLake-S
        ASoC: Intel: SOF: merge COMETLAKE_LP and COMETLAKE_H
        ALSA: hda/realtek: Add mute LED and micmute LED support for HP systems
        ALSA: usb-audio: Fix potential use-after-free of streams
        ALSA: hda/realtek - Add quirk for MSI GE63 laptop
        ASoC: fsl_ssi: Fix bclk calculation for mono channel
        ASoC: SOF: Intel: hda: Clear RIRB status before reading WP
        ASoC: rt1015: Update rt1015 default register value according to spec modification.
        ASoC: qcom: common: set correct directions for dailinks
        ASoc: q6afe: add support to get port direction
        ASoC: soc-pcm: fix checks for multi-cpu FE dailinks
        ASoC: rt5682: Let dai clks be registered whether mclk exists or not
        ...
      a4d3712b
    • David S. Miller's avatar
      Merge branch 'net-bcmgenet-use-hardware-padding-of-runt-frames' · eb2932b0
      David S. Miller authored
      Doug Berger says:
      
      ====================
      net: bcmgenet: use hardware padding of runt frames
      
      Enabling scatter-gather and tx-checksumming by default revealed a
      packet corruption issue that can occur for very short fragmented
      packets.
      
      When padding these frames to the minimum length it is possible for
      the non-linear (fragment) data to be added to the end of the linear
      header in an SKB. Since the number of fragments is read before the
      padding and used afterward without reloading, the fragment that
      should have been consumed can be tacked on in place of part of the
      padding.
      
      The third commit in this set corrects this by removing the software
      padding and allowing the hardware to add the pad bytes if necessary.
      
      The first two commits resolve warnings observed by the kbuild test
      robot and are included here for simplicity of application.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      eb2932b0
    • Doug Berger's avatar
      net: bcmgenet: use hardware padding of runt frames · 20d1f2d1
      Doug Berger authored
      When commit 474ea9ca ("net: bcmgenet: correctly pad short
      packets") added the call to skb_padto() it should have been
      located before the nr_frags parameter was read since that value
      could be changed when padding packets with lengths between 55
      and 59 bytes (inclusive).
      
      The use of a stale nr_frags value can cause corruption of the
      pad data when tx-scatter-gather is enabled. This corruption of
      the pad can cause invalid checksum computation when hardware
      offload of tx-checksum is also enabled.
      
      Since the original reason for the padding was corrected by
      commit 7dd39913 ("net: bcmgenet: fix skb_len in
      bcmgenet_xmit_single()") we can remove the software padding
      altogether and make use of hardware padding of short frames as
      long as the hardware also always appends the FCS value to the
      frame.
      
      Fixes: 474ea9ca ("net: bcmgenet: correctly pad short packets")
      Signed-off-by: Doug Berger <opendmb@gmail.com>
      Acked-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      20d1f2d1
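The stale nr_frags problem described in the commit above can be shown with a toy packet model. Everything here is an assumption made for illustration: `struct demo_pkt`, the `demo_*` helpers, and the rule that padding a fragmented packet pulls all fragment data into the linear area. The actual commit removes the software padding rather than reordering the read; the sketch only shows why reading the count before padding is unsafe.

```c
#include <assert.h>

#define DEMO_MIN_FRAME 60 /* assumed minimum frame length without FCS */

/* Hypothetical packet model; not the kernel's struct sk_buff. */
struct demo_pkt {
    unsigned int linear_len; /* bytes in the linear (header) area */
    unsigned int frag_len;   /* bytes held in fragments */
    unsigned int nr_frags;   /* number of fragments */
};

/* Pad a short packet. In this model, padding a fragmented packet first
 * moves the fragment data into the linear area, so nr_frags drops to 0. */
void demo_pad(struct demo_pkt *p)
{
    unsigned int total = p->linear_len + p->frag_len;

    if (total >= DEMO_MIN_FRAME)
        return;
    p->linear_len = DEMO_MIN_FRAME; /* data plus zero padding, all linear */
    p->frag_len = 0;
    p->nr_frags = 0;
}

/* Unsafe order: nr_frags is sampled before padding and may be stale. */
unsigned int demo_frags_read_before_pad(struct demo_pkt *p)
{
    unsigned int nr_frags = p->nr_frags;

    demo_pad(p);
    return nr_frags;
}

/* Safe order: pad first, then read nr_frags. */
unsigned int demo_frags_read_after_pad(struct demo_pkt *p)
{
    demo_pad(p);
    return p->nr_frags;
}
```

For a 56-byte packet with one fragment, the first helper still reports one fragment after the packet has become fully linear, while the second reports zero.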
    • Doug Berger's avatar
      net: bcmgenet: use __be16 for htons(ETH_P_IP) · d966d2ef
      Doug Berger authored
      The 16-bit value that holds a short in network byte order should
      be declared as a restricted big endian type to allow type checks
      to succeed during assignment.
      
      Fixes: 3e370952 ("net: bcmgenet: add support for ethtool rxnfc flows")
      Reported-by: kbuild test robot <lkp@intel.com>
      Signed-off-by: Doug Berger <opendmb@gmail.com>
      Acked-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d966d2ef
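In the kernel, the `__be16` annotation from the commit above is checked by the sparse static analyzer; an ordinary C compiler treats it as a plain 16-bit integer. As a rough user-space analogy, and explicitly a swapped-in technique rather than what the driver does, a one-member struct makes the compiler itself reject accidental mixing of host-order and big-endian values. The `demo_be16` type and both helpers are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_ETH_P_IP 0x0800 /* IPv4 EtherType */

/* Hypothetical wrapper: a distinct type for 16-bit big-endian values, so
 * assigning a plain uint16_t to it is a compile-time error. */
typedef struct {
    uint16_t raw; /* stored in network byte order */
} demo_be16;

/* Convert a host-order value to big-endian storage. */
demo_be16 demo_cpu_to_be16(uint16_t host)
{
    unsigned char bytes[2];
    demo_be16 out;

    bytes[0] = (unsigned char)(host >> 8);   /* most significant byte first */
    bytes[1] = (unsigned char)(host & 0xff);
    memcpy(&out.raw, bytes, sizeof(out.raw));
    return out;
}

/* Convert big-endian storage back to a host-order value. */
uint16_t demo_be16_to_cpu(demo_be16 be)
{
    unsigned char bytes[2];

    memcpy(bytes, &be.raw, sizeof(bytes));
    return (uint16_t)((bytes[0] << 8) | bytes[1]);
}
```

A field declared as `demo_be16` can then only be filled through `demo_cpu_to_be16(DEMO_ETH_P_IP)`, which mirrors the intent of declaring the kernel variable as `__be16` before assigning `htons(ETH_P_IP)` to it.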
    • Doug Berger's avatar
      net: bcmgenet: re-remove bcmgenet_hfb_add_filter · 673bafd5
      Doug Berger authored
      This function was originally removed by Baoyou Xie in
      commit e2072600 ("net: bcmgenet: remove unused function in
      bcmgenet.c") to prevent a build warning.
      
      Some of the functions removed by Baoyou Xie are now used for
      WAKE_FILTER support so his commit was reverted, but this function
      is still unused and the kbuild test robot dutifully reported the
      warning.
      
      This commit once again removes the remaining unused hfb functions.
      
      Fixes: 14da1510 ("Revert "net: bcmgenet: remove unused function in bcmgenet.c"")
      Reported-by: kbuild test robot <lkp@intel.com>
      Signed-off-by: Doug Berger <opendmb@gmail.com>
      Acked-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      673bafd5