1. 25 Sep, 2020 9 commits
    • net/ethernet/broadcom: fix spelling typo · 0eb11dfe
      Wang Qing authored
      Modify the comment typo: "compliment" -> "complement".
      Signed-off-by: Wang Qing <wangqing@vivo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mscc: ocelot: fix fields offset in SG_CONFIG_REG_3 · 4ab810a4
      Xiaoliang Yang authored
      INIT_IPS and GATE_ENABLE fields have a wrong offset in SG_CONFIG_REG_3.
      This register is used by stream gate control of PSFP, and it has not
      been used before, because PSFP is not implemented in ocelot driver.
      Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: felix: convert TAS link speed based on phylink speed · dba1e466
      Xiaoliang Yang authored
      state->speed holds a value of 10, 100, 1000 or 2500, but
      QSYS_TAG_CONFIG_LINK_SPEED expects a value of 0, 1, 2, 3. So convert the
      speed to a proper value.
      
      Fixes: de143c0e ("net: dsa: felix: Configure Time-Aware Scheduler via taprio offload")
      Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
      Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • hinic: fix wrong return value of mac-set cmd · f68910a8
      Luo bin authored
      A status of 4 returned by the hw for the PF's set-mac cmd should also
      be regarded as an error. Only when the PF returns status=4 to a VF
      should this cmd be given special treatment.
      
      Fixes: 7dd29ee1 ("hinic: add sriov feature support")
      Signed-off-by: Luo bin <luobin9@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • drivers/net/wan/x25_asy: Correct the ndo_open and ndo_stop functions · ed46cd1d
      Xie He authored
      1.
      Move the lapb_register/lapb_unregister calls into the ndo_open/ndo_stop
      functions.
      This makes the LAPB protocol start/stop when the network interface
      starts/stops. When the network interface is down, the LAPB protocol
      shouldn't be running and the LAPB module shouldn't be generating control
      frames.
      
      2.
      Move netif_start_queue/netif_stop_queue into the ndo_open/ndo_stop
      functions.
      This makes the TX queue start/stop when the network interface
      starts/stops.
      (netif_stop_queue was originally in the ndo_stop function. But to make
      the code look better, I created a new function to use as ndo_stop, and
      made it call the original ndo_stop function. I moved netif_stop_queue
      from the original ndo_stop function to the new ndo_stop function.)
      
      Cc: Martin Schiller <ms@dev.tdt.de>
      Signed-off-by: Xie He <xie.he.0141@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ipv4: always honour route mtu during forwarding · 02a1b175
      Maciej Żenczykowski authored
      Documentation/networking/ip-sysctl.txt:46 says:
        ip_forward_use_pmtu - BOOLEAN
          By default we don't trust protocol path MTUs while forwarding
          because they could be easily forged and can lead to unwanted
          fragmentation by the router.
          You only need to enable this if you have user-space software
          which tries to discover path mtus by itself and depends on the
          kernel honoring this information. This is normally not the case.
          Default: 0 (disabled)
          Possible values:
          0 - disabled
          1 - enabled
      
      Which makes it pretty clear that setting it to 1 is a potential
      security/safety/DoS issue, and yet it is entirely reasonable to want
      forwarded traffic to honour explicitly administrator configured
      route mtus (instead of defaulting to device mtu).
      
      Indeed, I can't think of a single reason why you wouldn't want to.
      Since you configured a route mtu you probably know better...
      
      It is pretty common to have a higher device mtu to allow receiving
      large (jumbo) frames, while having some routes via that interface
      (potentially including the default route to the internet) specify
      a lower mtu.
      
      Note that ipv6 forwarding uses device mtu unless the route is locked
      (in which case it will use the route mtu).
      
      This approach is not usable for IPv4 where an 'mtu lock' on a route
      also has the side effect of disabling TCP path mtu discovery via
      disabling the IPv4 DF (don't frag) bit on all outgoing frames.
      
      I'm not aware of a way to lock a route from an IPv6 RA, so that also
      potentially seems wrong.
      Signed-off-by: Maciej Żenczykowski <maze@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Lorenzo Colitti <lorenzo@google.com>
      Cc: Sunmeet Gill (Sunny) <sgill@quicinc.com>
      Cc: Vinay Paradkar <vparadka@qti.qualcomm.com>
      Cc: Tyler Wear <twear@quicinc.com>
      Cc: David Ahern <dsahern@kernel.org>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
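The rule argued for in the route-mtu commit above can be summarised as: when forwarding, prefer an MTU the administrator configured on the route, and fall back to the device MTU otherwise. The Python sketch below models only that choice. The function name and the convention that 0 means "no route mtu configured" are assumptions, and the sketch deliberately ignores ip_forward_use_pmtu and locked routes.

```python
IP_MAX_MTU = 0xFFFF  # upper bound on an IPv4 packet size


def forwarding_mtu(route_mtu: int, device_mtu: int) -> int:
    """Pick the MTU to apply to forwarded IPv4 traffic.

    route_mtu  -- MTU configured on the route, 0 if none (assumed convention)
    device_mtu -- MTU of the outgoing device
    """
    if route_mtu:
        # An explicitly configured route mtu wins over the device mtu.
        return route_mtu
    return min(device_mtu, IP_MAX_MTU)
```

For the jumbo-frame case described in the message, a device mtu of 9000 with a route mtu of 1500 yields 1500 for forwarded packets, while routes without a configured mtu keep using the device value.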
    • Merge branch 'net_sched-fix-a-UAF-in-tcf_action_init' · 6d889996
      David S. Miller authored
      Cong Wang says:
      
      ====================
      net_sched: fix a UAF in tcf_action_init()
      
      This patchset fixes a use-after-free triggered by syzbot. Please
      find more details in each patch description.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net_sched: commit action insertions together · 0fedc63f
      Cong Wang authored
      syzbot is able to trigger a failure case inside the loop in
      tcf_action_init(), and when this happens we clean up with
      tcf_action_destroy(). But, as these actions are already inserted
      into the global IDR, other parallel processes could free them
      before tcf_action_destroy(), then we will trigger a use-after-free.
      
      Fix this by deferring the insertions even later, after the loop,
      and committing all the insertions in a separate loop, so we will
      never fail in the middle of the insertions any more.
      
      One side effect is that the window between allocation and final
      insertion becomes larger, now it is more likely that the loop in
      tcf_del_walker() sees the placeholder -EBUSY pointer. So we have
      to check for error pointer in tcf_del_walker().
      
      Reported-and-tested-by: syzbot+2287853d392e4b42374a@syzkaller.appspotmail.com
      Fixes: 0190c1d4 ("net: sched: atomically check-allocate action")
      Cc: Vlad Buslov <vladbu@mellanox.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
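The pattern described in the net_sched commit above (create everything first, publish only after nothing can fail, and make walkers tolerate placeholders) can be modelled in a few lines of Python. This is a sketch of the general technique, not the kernel code: `ActionTable`, `init_actions`, and the `PLACEHOLDER` sentinel (standing in for the -EBUSY error pointer) are all hypothetical names.

```python
PLACEHOLDER = object()  # stands in for the -EBUSY error pointer


class ActionTable:
    """Toy model of a shared index -> action table."""

    def __init__(self):
        self.slots = {}

    def reserve(self, index):
        if index in self.slots:
            raise KeyError(f"index {index} already in use")
        self.slots[index] = PLACEHOLDER

    def release(self, index):
        # Only drop our own reservation, never a published action.
        if self.slots.get(index) is PLACEHOLDER:
            del self.slots[index]

    def commit(self, index, action):
        self.slots[index] = action  # plain store, cannot fail

    def walk_delete(self):
        """Delete published actions, skipping slots that are only reserved."""
        for index in [i for i, a in self.slots.items() if a is not PLACEHOLDER]:
            del self.slots[index]


def init_actions(table, specs):
    """Create every action first, then publish them all in a second loop."""
    reserved, created = [], []
    try:
        for index, name in specs:
            table.reserve(index)
            reserved.append(index)
            if not name:  # stand-in for any per-action init failure
                raise ValueError(f"action {index} has no name")
            created.append((index, {"index": index, "name": name}))
    except Exception:
        # Nothing was published yet, so nobody else can hold these objects.
        for index in reserved:
            table.release(index)
        raise
    for index, action in created:
        table.commit(index, action)
    return [action for _, action in created]
```

Because publication happens in its own loop of plain stores, a failure can only occur while the new actions are still private, which is what removes the need to destroy objects that another process may already have freed.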
    • net_sched: defer tcf_idr_insert() in tcf_action_init_1() · e49d8c22
      Cong Wang authored
      All TC actions call tcf_idr_insert() for new action at the end
      of their ->init(), so we can actually move it to a central place
      in tcf_action_init_1().
      
      Once the action is inserted into the global IDR, another parallel
      process could free it immediately, as its refcnt is still 1, so we
      cannot fail after this point. We need to move the insertion after the
      goto action validation to avoid handling the failure case after insertion.
      
      This was found during code review and is not directly triggered by
      syzbot. It also prepares for the next patch.
      
      Cc: Vlad Buslov <vladbu@mellanox.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 24 Sep, 2020 5 commits
    • net: stmmac: removed enabling eee in EEE set callback · 7241c5a6
      Voon Weifeng authored
      EEE should only be enabled during stmmac_mac_link_up(), when the
      link is up and has been set up properly. set_eee should only apply
      the settings configuration and disable EEE.
      
      Without this fix, turning on EEE using ethtool will return
      "Operation not supported". This is because the driver is stuck in a
      dead loop: it waits for EEE to be advertised before EEE can be
      activated, but it only configures the EEE advertisement after EEE
      is activated.
      
      Ethtool should only return "Operation not supported" if there is no EEE
      capability in the MAC controller.
      
      Fixes: 8a7493e5 ("net: stmmac: Fix a race in EEE enable callback")
      Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>
      Acked-by: Mark Gross <mgross@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: lantiq: Add locking for TX DMA channel · f9317ae5
      Hauke Mehrtens authored
      The TX DMA channel data is accessed by the xrx200_start_xmit() and the
      xrx200_tx_housekeeping() function from different threads. Make sure the
      accesses are synchronized by acquiring the netif_tx_lock() in the
      xrx200_tx_housekeeping() function too. This lock is acquired by the
      kernel before calling xrx200_start_xmit().
      Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: switchdev: Fixed kerneldoc warning · ea6754ae
      Tian Tao authored
      Update kernel-doc line comments to fix warnings reported by make W=1.
      net/switchdev/switchdev.c:413: warning: Function parameter or
      member 'extack' not described in 'call_switchdev_notifiers'
      Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
      Acked-by: Ivan Vecera <ivecera@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Revert "ravb: Fixed to be able to unload modules" · 77972b55
      Geert Uytterhoeven authored
      This reverts commit 1838d6c6.
      
      This commit moved the ravb_mdio_init() call (and thus the
      of_mdiobus_register() call) from the ravb_probe() to the ravb_open()
      call.  This causes a regression during system resume (s2idle/s2ram), as
      new PHY devices cannot be bound while suspended.
      
      During boot, the Micrel PHY is detected like this:
      
          Micrel KSZ9031 Gigabit PHY e6800000.ethernet-ffffffff:00: attached PHY driver [Micrel KSZ9031 Gigabit PHY] (mii_bus:phy_addr=e6800000.ethernet-ffffffff:00, irq=228)
          ravb e6800000.ethernet eth0: Link is Up - 1Gbps/Full - flow control off
      
      During system suspend, (A) defer_all_probes is set to true, and (B)
      usermodehelper_disabled is set to UMH_DISABLED, to avoid drivers being
      probed while suspended.
      
        A. If CONFIG_MODULES=n, phy_device_register() calling device_add()
           merely adds the device, but does not probe it yet, as
           really_probe() returns early due to defer_all_probes being set:
      
             dpm_resume+0x128/0x4f8
               device_resume+0xcc/0x1b0
                 dpm_run_callback+0x74/0x340
                   ravb_resume+0x190/0x1b8
                     ravb_open+0x84/0x770
                       of_mdiobus_register+0x1e0/0x468
                         of_mdiobus_register_phy+0x1b8/0x250
                           of_mdiobus_phy_device_register+0x178/0x1e8
                             phy_device_register+0x114/0x1b8
                               device_add+0x3d4/0x798
                                 bus_probe_device+0x98/0xa0
                                   device_initial_probe+0x10/0x18
                                     __device_attach+0xe4/0x140
                                       bus_for_each_drv+0x64/0xc8
                                         __device_attach_driver+0xb8/0xe0
                                           driver_probe_device.part.11+0xc4/0xd8
                                             really_probe+0x32c/0x3b8
      
           Later, phy_attach_direct() notices no PHY driver has been bound,
           and falls back to the Generic PHY, leading to degraded operation:
      
             Generic PHY e6800000.ethernet-ffffffff:00: attached PHY driver [Generic PHY] (mii_bus:phy_addr=e6800000.ethernet-ffffffff:00, irq=POLL)
             ravb e6800000.ethernet eth0: Link is Up - 1Gbps/Full - flow control off
      
        B. If CONFIG_MODULES=y, request_module() returns early with -EBUSY due
           to UMH_DISABLED, and MDIO initialization fails completely:
      
             mdio_bus e6800000.ethernet-ffffffff:00: error -16 loading PHY driver module for ID 0x00221622
             ravb e6800000.ethernet eth0: failed to initialize MDIO
             PM: dpm_run_callback(): ravb_resume+0x0/0x1b8 returns -16
             PM: Device e6800000.ethernet failed to resume: error -16
      
           Ignoring -EBUSY in phy_request_driver_module(), like was done for
           -ENOENT in commit 21e19442 ("net: phy: fix issue with loading
           PHY driver w/o initramfs"), would make it fall back to the Generic
           PHY, like in the CONFIG_MODULES=n case.
      Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Cc: stable@vger.kernel.org
      Reviewed-by: Sergei Shtylyov <sergei.shtylyov@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: Wake up MPTCP worker when DATA_FIN found on a TCP FIN packet · ef59b195
      Mat Martineau authored
      When receiving a DATA_FIN MPTCP option on a TCP FIN packet, the DATA_FIN
      information would be stored but the MPTCP worker did not get
      scheduled. In turn, the MPTCP socket state would remain in
      TCP_ESTABLISHED and no blocked operations would be awakened.
      
      TCP FIN packets are seen by the MPTCP socket when moving skbs out of the
      subflow receive queues, so schedule the MPTCP worker when a skb with
      DATA_FIN but no data payload is moved from a subflow queue. Other cases
      (DATA_FIN on a bare TCP ACK or on a packet with data payload) are
      already handled.
      
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/84
      Fixes: 43b54c6e ("mptcp: Use full MPTCP-level disconnect state machine")
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 22 Sep, 2020 26 commits
    • Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 805c6d3c
      Linus Torvalds authored
      Pull vfs fixes from Al Viro:
       "No common topic, just assorted fixes"
      
      * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        fuse: fix the ->direct_IO() treatment of iov_iter
        fs: fix cast in fsparam_u32hex() macro
        vboxsf: Fix the check for the old binary mount-arguments struct
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · d3017135
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
      
       - fix failure to add bond interfaces to a bridge, the offload-handling
         code was too defensive there and recent refactoring unearthed that.
         Users complained (Ido)
      
       - fix unnecessarily reflecting ECN bits within TOS values / QoS marking
         in TCP ACK and reset packets (Wei)
      
       - fix a deadlock with bpf iterator. Hopefully we're in the clear on
         this front now... (Yonghong)
      
       - BPF fix for clobbering r2 in bpf_gen_ld_abs (Daniel)
      
       - fix AQL on mt76 devices with FW rate control and address a couple of
         AQL issues in mac80211 code (Felix)
      
       - fix authentication issue with mwifiex (Maximilian)
      
       - WiFi connectivity fix: revert IGTK support in ti/wlcore (Mauro)
      
       - fix exception handling for multipath routes via same device (David
         Ahern)
      
       - revert back to a BH spin lock flavor for nsid_lock: there are paths
         which do require the BH context protection (Taehee)
      
       - fix interrupt / queue / NAPI handling in the lantiq driver (Hauke)
      
       - fix ife module load deadlock (Cong)
      
       - make an adjustment to netlink reply message type for code added in
         this release (the sole change touching uAPI here) (Michal)
      
       - a number of fixes for small NXP and Microchip switches (Vladimir)
      
      [ Pull request acked by David: "you can expect more of this in the
        future as I try to delegate more things to Jakub" ]
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (167 commits)
        net: mscc: ocelot: fix some key offsets for IP4_TCP_UDP VCAP IS2 entries
        net: dsa: seville: fix some key offsets for IP4_TCP_UDP VCAP IS2 entries
        net: dsa: felix: fix some key offsets for IP4_TCP_UDP VCAP IS2 entries
        inet_diag: validate INET_DIAG_REQ_PROTOCOL attribute
        net: bridge: br_vlan_get_pvid_rcu() should dereference the VLAN group under RCU
        net: Update MAINTAINERS for MediaTek switch driver
        net/mlx5e: mlx5e_fec_in_caps() returns a boolean
        net/mlx5e: kTLS, Avoid kzalloc(GFP_KERNEL) under spinlock
        net/mlx5e: kTLS, Fix leak on resync error flow
        net/mlx5e: kTLS, Add missing dma_unmap in RX resync
        net/mlx5e: kTLS, Fix napi sync and possible use-after-free
        net/mlx5e: TLS, Do not expose FPGA TLS counter if not supported
        net/mlx5e: Fix using wrong stats_grps in mlx5e_update_ndo_stats()
        net/mlx5e: Fix multicast counter not up-to-date in "ip -s"
        net/mlx5e: Fix endianness when calculating pedit mask first bit
        net/mlx5e: Enable adding peer miss rules only if merged eswitch is supported
        net/mlx5e: CT: Fix freeing ct_label mapping
        net/mlx5e: Fix memory leak of tunnel info when rule under multipath not ready
        net/mlx5e: Use synchronize_rcu to sync with NAPI
        net/mlx5e: Use RCU to protect rq->xdp_prog
        ...
    • Merge tag 'io_uring-5.9-2020-09-22' of git://git.kernel.dk/linux-block · 0baca070
      Linus Torvalds authored
      Pull io_uring fixes from Jens Axboe:
       "A few fixes - most of them regression fixes from this cycle, but also
        a few stable heading fixes, and a build fix for the included demo tool
        since some systems now actually have gettid() available"
      
      * tag 'io_uring-5.9-2020-09-22' of git://git.kernel.dk/linux-block:
        io_uring: fix openat/openat2 unified prep handling
        io_uring: mark statx/files_update/epoll_ctl as non-SQPOLL
        tools/io_uring: fix compile breakage
        io_uring: don't use retry based buffered reads for non-async bdev
        io_uring: don't re-setup vecs/iter in io_resumit_prep() is already there
        io_uring: don't run task work on an exiting task
        io_uring: drop 'ctx' ref on task work cancelation
        io_uring: grab any needed state during defer prep
    • Merge tag 'block-5.9-2020-09-22' of git://git.kernel.dk/linux-block · c37b7189
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
       "A few NVMe fixes, and a dasd write zero fix"
      
      * tag 'block-5.9-2020-09-22' of git://git.kernel.dk/linux-block:
        nvmet: get transport reference for passthru ctrl
        nvme-core: get/put ctrl and transport module in nvme_dev_open/release()
        nvme-tcp: fix kconfig dependency warning when !CRYPTO
        nvme-pci: disable the write zeros command for Intel 600P/P3100
        s390/dasd: Fix zero write for FBA devices
    • Merge tag 'trace-v5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · eff48dde
      Linus Torvalds authored
      Pull tracing fixes from Steven Rostedt:
      
       - Check kprobe is enabled before unregistering from ftrace as it isn't
         registered when disabled.
      
       - Remove kprobes enabled via command-line that is on init text when
         freed.
      
       - Add missing RCU synchronization for ftrace trampoline symbols removed
         from kallsyms.
      
       - Free trampoline on error path if ftrace_startup() fails.
      
       - Give more space for the longer PID numbers in trace output.
      
       - Fix a possible double free in the histogram code.
      
       - A couple of fixes that were discovered by sparse.
      
      * tag 'trace-v5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        bootconfig: init: make xbc_namebuf static
        kprobes: tracing/kprobes: Fix to kill kprobes on initmem after boot
        tracing: fix double free
        ftrace: Let ftrace_enable_sysctl take a kernel pointer buffer
        tracing: Make the space reserved for the pid wider
        ftrace: Fix missing synchronize_rcu() removing trampoline from kallsyms
        ftrace: Free the trampoline when ftrace_startup() fails
        kprobes: Fix to check probe enabled before disarm_kprobe_ftrace()
    • Merge branch 'Fix-broken-tc-flower-rules-for-mscc_ocelot-switches' · b334ec66
      David S. Miller authored
      Vladimir Oltean says:
      
      ====================
      Fix broken tc-flower rules for mscc_ocelot switches
      
      All 3 switch drivers from the Ocelot family have the same bug in the
      VCAP IS2 key offsets, which is that some keys are in the incorrect
      order.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mscc: ocelot: fix some key offsets for IP4_TCP_UDP VCAP IS2 entries · 8194d8fa
      Vladimir Oltean authored
      The IS2 IP4_TCP_UDP key offsets do not correspond to the VSC7514
      datasheet. Whether they work or not is unknown to me. On VSC9959 and
      VSC9953, with the same mistake and same discrepancy from the
      documentation, tc-flower src_port and dst_port rules did not work, so I
      am assuming the same is true here.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: seville: fix some key offsets for IP4_TCP_UDP VCAP IS2 entries · 7a023075
      Vladimir Oltean authored
      Since these were copied from the Felix VCAP IS2 code, and only the
      offsets were adjusted, the order of the bit fields is still wrong.
      Fix it.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: felix: fix some key offsets for IP4_TCP_UDP VCAP IS2 entries · 8b9e03cd
      Xiaoliang Yang authored
      Some of the IS2 IP4_TCP_UDP keys are not correct, like L4_DPORT,
      L4_SPORT and other L4 keys. This prevents offloaded tc-flower rules from
      matching on src_port and dst_port for TCP and UDP packets.
      Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • inet_diag: validate INET_DIAG_REQ_PROTOCOL attribute · d5e4d0a5
      Eric Dumazet authored
      User space could send an invalid INET_DIAG_REQ_PROTOCOL attribute
      as caught by syzbot.
      
      BUG: KMSAN: uninit-value in inet_diag_lock_handler net/ipv4/inet_diag.c:55 [inline]
      BUG: KMSAN: uninit-value in __inet_diag_dump+0x58c/0x720 net/ipv4/inet_diag.c:1147
      CPU: 0 PID: 8505 Comm: syz-executor174 Not tainted 5.9.0-rc4-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x21c/0x280 lib/dump_stack.c:118
       kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:122
       __msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:219
       inet_diag_lock_handler net/ipv4/inet_diag.c:55 [inline]
       __inet_diag_dump+0x58c/0x720 net/ipv4/inet_diag.c:1147
       inet_diag_dump_compat+0x2a5/0x380 net/ipv4/inet_diag.c:1254
       netlink_dump+0xb73/0x1cb0 net/netlink/af_netlink.c:2246
       __netlink_dump_start+0xcf2/0xea0 net/netlink/af_netlink.c:2354
       netlink_dump_start include/linux/netlink.h:246 [inline]
       inet_diag_rcv_msg_compat+0x5da/0x6c0 net/ipv4/inet_diag.c:1288
       sock_diag_rcv_msg+0x24f/0x620 net/core/sock_diag.c:256
       netlink_rcv_skb+0x6d7/0x7e0 net/netlink/af_netlink.c:2470
       sock_diag_rcv+0x63/0x80 net/core/sock_diag.c:275
       netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline]
       netlink_unicast+0x11c8/0x1490 net/netlink/af_netlink.c:1330
       netlink_sendmsg+0x173a/0x1840 net/netlink/af_netlink.c:1919
       sock_sendmsg_nosec net/socket.c:651 [inline]
       sock_sendmsg net/socket.c:671 [inline]
       ____sys_sendmsg+0xc82/0x1240 net/socket.c:2353
       ___sys_sendmsg net/socket.c:2407 [inline]
       __sys_sendmsg+0x6d1/0x820 net/socket.c:2440
       __do_sys_sendmsg net/socket.c:2449 [inline]
       __se_sys_sendmsg+0x97/0xb0 net/socket.c:2447
       __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2447
       do_syscall_64+0x9f/0x140 arch/x86/entry/common.c:48
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x441389
      Code: e8 fc ab 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 1b 09 fc ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007fff3b02ce98 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000441389
      RDX: 0000000000000000 RSI: 0000000020001500 RDI: 0000000000000003
      RBP: 00000000006cb018 R08: 00000000004002c8 R09: 00000000004002c8
      R10: 0000000000000004 R11: 0000000000000246 R12: 0000000000402130
      R13: 00000000004021c0 R14: 0000000000000000 R15: 0000000000000000
      
      Uninit was created at:
       kmsan_save_stack_with_flags mm/kmsan/kmsan.c:143 [inline]
       kmsan_internal_poison_shadow+0x66/0xd0 mm/kmsan/kmsan.c:126
       kmsan_slab_alloc+0x8a/0xe0 mm/kmsan/kmsan_hooks.c:80
       slab_alloc_node mm/slub.c:2907 [inline]
       __kmalloc_node_track_caller+0x9aa/0x12f0 mm/slub.c:4511
       __kmalloc_reserve net/core/skbuff.c:142 [inline]
       __alloc_skb+0x35f/0xb30 net/core/skbuff.c:210
       alloc_skb include/linux/skbuff.h:1094 [inline]
       netlink_alloc_large_skb net/netlink/af_netlink.c:1176 [inline]
       netlink_sendmsg+0xdb9/0x1840 net/netlink/af_netlink.c:1894
       sock_sendmsg_nosec net/socket.c:651 [inline]
       sock_sendmsg net/socket.c:671 [inline]
       ____sys_sendmsg+0xc82/0x1240 net/socket.c:2353
       ___sys_sendmsg net/socket.c:2407 [inline]
       __sys_sendmsg+0x6d1/0x820 net/socket.c:2440
       __do_sys_sendmsg net/socket.c:2449 [inline]
       __se_sys_sendmsg+0x97/0xb0 net/socket.c:2447
       __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2447
       do_syscall_64+0x9f/0x140 arch/x86/entry/common.c:48
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: 3f935c75 ("inet_diag: support for wider protocol numbers")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: Christoph Paasch <cpaasch@apple.com>
      Cc: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
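The inet_diag report above is an instance of reading an attribute payload without first checking that user space supplied enough bytes. The Python sketch below shows the general shape of such a check, assuming the attribute carries a 32-bit value in host byte order. The function name is hypothetical and the kernel's actual validation may be expressed differently.

```python
import struct


def parse_protocol_attr(payload: bytes) -> int:
    """Decode a u32 attribute payload, rejecting any other length."""
    if len(payload) != 4:
        # A shorter payload would make the decoder read bytes that the
        # sender never initialised; a longer one is simply malformed.
        raise ValueError(f"expected 4 bytes, got {len(payload)}")
    return struct.unpack("=I", payload)[0]
```

The length test comes before any decoding, so a truncated attribute is rejected instead of being interpreted.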
    • net: bridge: br_vlan_get_pvid_rcu() should dereference the VLAN group under RCU · 99f62a74
      Vladimir Oltean authored
      When calling the RCU brother of br_vlan_get_pvid(), lockdep warns:
      
      =============================
      WARNING: suspicious RCU usage
      5.9.0-rc3-01631-g13c17acb8e38-dirty #814 Not tainted
      -----------------------------
      net/bridge/br_private.h:1054 suspicious rcu_dereference_protected() usage!
      
      Call trace:
       lockdep_rcu_suspicious+0xd4/0xf8
       __br_vlan_get_pvid+0xc0/0x100
       br_vlan_get_pvid_rcu+0x78/0x108
      
      The warning is because br_vlan_get_pvid_rcu() calls nbp_vlan_group()
      which calls rtnl_dereference() instead of rcu_dereference(). In turn,
      rtnl_dereference() calls rcu_dereference_protected() which assumes
      operation under an RCU write-side critical section, which obviously is
      not the case here. So, when the incorrect primitive is used to access
      the RCU-protected VLAN group pointer, READ_ONCE() is not used, which may
      cause various unexpected problems.
      
      I'm sad to say that br_vlan_get_pvid() and br_vlan_get_pvid_rcu() cannot
      share the same implementation. So fix the bug by splitting the 2
      functions, and making br_vlan_get_pvid_rcu() retrieve the VLAN groups
      under proper locking annotations.
      
      Fixes: 7582f5b7 ("bridge: add br_vlan_get_pvid_rcu()")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'mlx5-fixes-2020-09-18' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 47cec3f6
      David S. Miller authored
      Saeed Mahameed says:
      
      ====================
      mlx5 fixes-2020-09-18
      
      This series introduces some fixes to mlx5 driver.
      
      Please pull and let me know if there is any problem.
      
      v1->v2:
       Remove missing patch from -stable list.
      
      For -stable v5.1
       ('net/mlx5: Fix FTE cleanup')
      
      For -stable v5.3
       ('net/mlx5e: TLS, Do not expose FPGA TLS counter if not supported')
       ('net/mlx5e: Enable adding peer miss rules only if merged eswitch is supported')
      
      For -stable v5.7
       ('net/mlx5e: Fix memory leak of tunnel info when rule under multipath not ready')
      
      For -stable v5.8
       ('net/mlx5e: Use RCU to protect rq->xdp_prog')
       ('net/mlx5e: Fix endianness when calculating pedit mask first bit')
       ('net/mlx5e: Use synchronize_rcu to sync with NAPI')
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Update MAINTAINERS for MediaTek switch driver · 2b617c11
      Sean Wang authored
      Add Landen Chao as a maintainer of the MediaTek switch driver. Landen is
      familiar with MediaTek MT753x switch devices and will help with
      maintenance from the vendor side.
      
      Cc: Steven Liu <steven.liu@mediatek.com>
      Signed-off-by: Sean Wang <sean.wang@mediatek.com>
      Signed-off-by: Landen Chao <Landen.Chao@mediatek.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2b617c11
    • net/mlx5e: mlx5e_fec_in_caps() returns a boolean · cb39ccc5
      Saeed Mahameed authored
      Returning an errno from it is a bug; fix that.
      
      Also fixes smatch warnings:
      drivers/net/ethernet/mellanox/mlx5/core/en/port.c:453
      mlx5e_fec_in_caps() warn: signedness bug returning '(-95)'
      
      Fixes: 2132b71f ("net/mlx5e: Advertise globaly supported FEC modes")
      Reported-by: kernel test robot <lkp@intel.com>
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
      Reviewed-by: Aya Levin <ayal@nvidia.com>
      cb39ccc5
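
The signedness problem named in the commit above can be reproduced in a few lines of plain C. This is a minimal user-space sketch, not the driver code: fec_in_caps_buggy() and fec_in_caps_fixed() are hypothetical stand-ins for mlx5e_fec_in_caps(), and the cap_reg_supported parameter is an assumption made up for illustration.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Buggy shape: a negative errno returned from a bool function is
 * converted to true, so "not supported" reads as "supported".
 */
static bool fec_in_caps_buggy(bool cap_reg_supported)
{
	if (!cap_reg_supported)
		return -EOPNOTSUPP; /* nonzero, so the caller sees true */
	return true;
}

/* Fixed shape: a boolean function only ever reports true or false. */
static bool fec_in_caps_fixed(bool cap_reg_supported)
{
	if (!cap_reg_supported)
		return false;
	return true;
}
```

With cap_reg_supported set to false, the buggy variant reports true while the fixed variant reports false, which is the kind of mistake the smatch warning points at.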
    • net/mlx5e: kTLS, Avoid kzalloc(GFP_KERNEL) under spinlock · 94c4fed7
      Saeed Mahameed authored
      The spinlock is only needed when accessing the channel's icosq. Grab the
      lock after the buffer allocation in resync_post_get_progress_params() to
      avoid calling kzalloc(GFP_KERNEL) in atomic context.
      
      Fixes: 0419d8c9 ("net/mlx5e: kTLS, Add kTLS RX resync support")
      Reported-by: YueHaibing <yuehaibing@huawei.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      94c4fed7
    • net/mlx5e: kTLS, Fix leak on resync error flow · 581642f3
      Saeed Mahameed authored
      The resync progress params buffer and its DMA mapping weren't released
      on error. Add the missing error unwinding for
      resync_post_get_progress_params().
      
      Fixes: 0419d8c9 ("net/mlx5e: kTLS, Add kTLS RX resync support")
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      581642f3
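
The "missing error unwinding" in the commit above refers to the common kernel pattern of releasing resources in reverse order of acquisition through goto labels. The following is a user-space sketch under stated assumptions, not the driver code: post_params(), complete_params(), map_buffer(), unmap_buffer() and the fail_* switches are all hypothetical, malloc()/free() stand in for the buffer allocation, and a counter stands in for the DMA mapping.

```c
#include <errno.h>
#include <stdlib.h>

static int live_bufs;     /* buffers currently allocated (for checking) */
static int live_mappings; /* mappings currently active (for checking) */

struct params {
	void *buf;
};

/* Hypothetical stand-in for mapping the buffer for DMA. */
static int map_buffer(int fail)
{
	if (fail)
		return -ENOMEM;
	live_mappings++;
	return 0;
}

static void unmap_buffer(void)
{
	live_mappings--;
}

/* Acquire buffer, then mapping; on failure undo only what was done. */
static int post_params(struct params *p, int fail_map, int fail_post)
{
	int err;

	p->buf = malloc(64);
	if (!p->buf)
		return -ENOMEM;
	live_bufs++;

	err = map_buffer(fail_map);
	if (err)
		goto err_free;

	if (fail_post) {
		err = -EIO;
		goto err_unmap;
	}

	return 0; /* resources now belong to the completion handler */

err_unmap:
	unmap_buffer();
err_free:
	free(p->buf);
	p->buf = NULL;
	live_bufs--;
	return err;
}

/* Completion path: releases the same resources in the same order. */
static void complete_params(struct params *p)
{
	unmap_buffer();
	free(p->buf);
	p->buf = NULL;
	live_bufs--;
}
```

Each label undoes exactly one acquisition, so a failure at any step leaves both counters at zero, and a successful post leaves the cleanup to complete_params().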
    • net/mlx5e: kTLS, Add missing dma_unmap in RX resync · 66ce5fc0
      Saeed Mahameed authored
      The progress params DMA address is never unmapped. Unmap it when
      completion handling is over.
      
      Fixes: 0419d8c9 ("net/mlx5e: kTLS, Add kTLS RX resync support")
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      66ce5fc0
    • net/mlx5e: kTLS, Fix napi sync and possible use-after-free · 6e8de0b6
      Tariq Toukan authored
      Using synchronize_rcu() is sufficient to wait until running NAPI quits.
      
      See similar upstream fix with detailed explanation:
      ("net/mlx5e: Use synchronize_rcu to sync with NAPI")
      
      This change also fixes a possible use-after-free as the NAPI
      might be already released at this stage.
      
      Fixes: 0419d8c9 ("net/mlx5e: kTLS, Add kTLS RX resync support")
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      6e8de0b6
    • net/mlx5e: TLS, Do not expose FPGA TLS counter if not supported · 8f0bcd19
      Tariq Toukan authored
      The set of TLS TX global SW counters in mlx5e_tls_sw_stats_desc
      is updated from all rings by using atomic ops.
      This set of stats is used only in the FPGA TLS use case, not in
      the Connect-X TLS one, where regular per-ring counters are used.
      
      Do not expose them in the Connect-X use case, as this would cause
      counter duplication. For example, tx_tls_drop_no_sync_data would
      appear twice in the ethtool stats.
      
      Fixes: d2ead1f3 ("net/mlx5e: Add kTLS TX HW offload support")
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      8f0bcd19
    • net/mlx5e: Fix using wrong stats_grps in mlx5e_update_ndo_stats() · b521105b
      Alaa Hleihel authored
      The cited commit started to reuse function mlx5e_update_ndo_stats() for
      the representors as well.
      However, the function is hard-coded to work on mlx5e_nic_stats_grps only.
      Due to this issue, the representors' statistics were not updated in the
      output of "ip -s".
      
      Fix it to work with the correct group by extracting it from the caller's
      profile.
      
      Also, while at it and since this function became generic, move it to
      en_stats.c and rename it accordingly.
      
      Fixes: 8a236b15 ("net/mlx5e: Convert rep stats to mlx5e_stats_grp-based infra")
      Signed-off-by: Alaa Hleihel <alaa@nvidia.com>
      Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      b521105b
    • net/mlx5e: Fix multicast counter not up-to-date in "ip -s" · 47c97e6b
      Ron Diskin authored
      Currently the FW does not generate events for counters other than error
      counters. Unlike ".get_ethtool_stats", ".ndo_get_stats64" (which "ip -s"
      uses) might run in atomic context, while the FW interface is non-atomic.
      Thus, "ip" is not allowed to issue FW commands, so it will only display
      the counters cached in the driver.
      
      Add a SW counter (mcast_packets) in the driver to count rx multicast
      packets. The counter also counts broadcast packets, as we consider it a
      special case of multicast.
      Use the counter value when calling "ip -s"/"ifconfig".
      
      Fixes: f62b8bb8 ("net/mlx5: Extend mlx5_core to support ConnectX-4 Ethernet functionality")
      Signed-off-by: Ron Diskin <rondi@mellanox.com>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      47c97e6b
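
The commit above counts broadcast packets as a special case of multicast. One way to see why that is natural: the Ethernet broadcast address has the group bit set, just like any multicast address. The sketch below is an illustration only. The commit message does not say how the driver classifies packets, so testing the group bit of the destination MAC is an assumption, and struct rx_stats, is_group_addr() and count_rx() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Software counter kept by the driver, readable without FW commands. */
struct rx_stats {
	uint64_t mcast_packets;
};

/*
 * The group bit is the least significant bit of the first octet of
 * the destination MAC. It is set for multicast and for broadcast
 * (ff:ff:ff:ff:ff:ff), and clear for unicast.
 */
static bool is_group_addr(const uint8_t *dst_mac)
{
	return (dst_mac[0] & 0x01) != 0;
}

/* Called once per received packet in this sketch. */
static void count_rx(struct rx_stats *stats, const uint8_t *dst_mac)
{
	if (is_group_addr(dst_mac))
		stats->mcast_packets++;
}
```

Because the counter lives in driver memory, reading it needs no firmware command and is safe from atomic context, which is the constraint the commit message describes for ".ndo_get_stats64".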
    • net/mlx5e: Fix endianness when calculating pedit mask first bit · 82198d8b
      Maor Dickman authored
      The field mask value is provided in network byte order and has to
      be converted to host byte order before calculating pedit mask
      first bit.
      
      Fixes: 88f30bbc ("net/mlx5e: Bit sized fields rewrite support")
      Signed-off-by: Maor Dickman <maord@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      82198d8b
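
The byte-order issue in the commit above can be illustrated in user space. This is a sketch assuming a 32-bit field, not the driver code: mask_first_bit() is a hypothetical helper, and ntohl() and ffs() stand in for whatever conversion and bit-search helpers the driver uses.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <strings.h>

/*
 * Return the index of the lowest set bit of a 32-bit mask that is
 * stored in network byte order, or -1 if the mask is empty.
 * The conversion to host order must happen before the bit search.
 */
static int mask_first_bit(uint32_t mask_be)
{
	uint32_t mask = ntohl(mask_be);

	return ffs((int)mask) - 1;
}
```

For a mask whose host-order value is 0x0000ff00, the first set bit is bit 8. On a little-endian host, searching the unconverted value would instead see 0x00ff0000 and report bit 16, which is the kind of error the commit fixes.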
    • net/mlx5e: Enable adding peer miss rules only if merged eswitch is supported · 6cec0229
      Maor Dickman authored
      The cited commit creates a peer miss group during switchdev mode
      initialization in order to handle miss packets correctly while in VF
      LAG mode. This is done regardless of FW support for such groups, which
      could cause rule setup failures later on.
      
      Fix by adding a FW capability check before creating the peer groups/rules.
      
      Fixes: ac004b83 ("net/mlx5e: E-Switch, Add peer miss rules")
      Signed-off-by: Maor Dickman <maord@mellanox.com>
      Reviewed-by: Roi Dayan <roid@mellanox.com>
      Reviewed-by: Raed Salem <raeds@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      6cec0229
    • net/mlx5e: CT: Fix freeing ct_label mapping · 4c8594ad
      Roi Dayan authored
      Add the missing mapping remove call when removing a ct rule, as the
      mapping was allocated when the ct rule was added with ct_label.
      A mapping remove call is also missing in the error flow.
      
      Fixes: 54b154ec ("net/mlx5e: CT: Map 128 bits labels to 32 bit map ID")
      Signed-off-by: Roi Dayan <roid@mellanox.com>
      Reviewed-by: Eli Britstein <elibr@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      4c8594ad
    • net/mlx5e: Fix memory leak of tunnel info when rule under multipath not ready · 12a240a4
      Jianbo Liu authored
      When deleting a vxlan flow rule under multipath, tun_info in parse_attr
      is not freed if the rule is not ready.
      
      Fixes: ef06c9ee ("net/mlx5e: Allow one failure when offloading tc encap rules under multipath")
      Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
      Reviewed-by: Roi Dayan <roid@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      12a240a4
    • net/mlx5e: Use synchronize_rcu to sync with NAPI · 9c25a22d
      Maxim Mikityanskiy authored
      As described in the previous commit, napi_synchronize doesn't quite fit
      the purpose when we just need to wait until the currently running NAPI
      quits. Its implementation waits until NAPI is not running by polling and
      waiting for 1ms in between. In cases where we need to deactivate one
      queue (e.g., recovery flows) or where we deactivate them one-by-one
      (deactivate channel flow), we may get stuck in napi_synchronize forever
      if other queues keep NAPI active, causing a soft lockup. Depending on
      kernel configuration (CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC), it may result
      in a kernel panic.
      
      To fix the issue, use synchronize_rcu to wait for NAPI to quit, and wrap
      the whole NAPI in rcu_read_lock.
      
      Fixes: acc6c595 ("net/mlx5e: Split open/close channels to stages")
      Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      9c25a22d