1. 19 Dec, 2018 31 commits
    • Bryan Whitehead's avatar
      lan743x: Remove MAC Reset from initialization · e0e58787
      Bryan Whitehead authored
      The MAC Reset was noticed to erase important EEPROM settings.
      It is also unnecessary since a chip wide reset was done earlier
      in initialization, and that reset preserves EEPROM settings.
      
      There for this patch removes the unnecessary MAC specific reset.
      Signed-off-by: default avatarBryan Whitehead <Bryan.Whitehead@microchip.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e0e58787
    • David S. Miller's avatar
      Merge tag 'mlx5-fixes-2018-12-19' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · d9842f38
      David S. Miller authored
      Saeed Mahameed says:
      
      ====================
      mlx5-fixes-2018-12-19
      
      Some fixes for the mlx5 driver
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d9842f38
    • Alaa Hleihel's avatar
      net/mlx5e: Remove the false indication of software timestamping support · 47654204
      Alaa Hleihel authored
      mlx5 driver falsely advertises support of software timestamping.
      Fix it by removing the false indication.
      
      Fixes: ef9814de ("net/mlx5e: Add HW timestamping (TS) support")
      Signed-off-by: default avatarAlaa Hleihel <alaa@mellanox.com>
      Reviewed-by: default avatarTariq Toukan <tariqt@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      47654204
    • Yuval Avnery's avatar
      net/mlx5: Typo fix in del_sw_hw_rule · f0337889
      Yuval Avnery authored
      Expression terminated with "," instead of ";", resulted in
      set_fte getting bad value for modify_enable_mask field.
      
      Fixes: bd5251db ("net/mlx5_core: Introduce flow steering destination of type counter")
      Signed-off-by: default avatarYuval Avnery <yuvalav@mellanox.com>
      Reviewed-by: default avatarDaniel Jurgens <danielj@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      f0337889
    • Tariq Toukan's avatar
      net/mlx5e: RX, Fix wrong early return in receive queue poll · bfc69825
      Tariq Toukan authored
      When the completion queue of the RQ is empty, do not immediately return.
      If left-over decompressed CQEs (from the previous cycle) were processed,
      need to go to the finalization part of the poll function.
      
      Bug exists only when CQE compression is turned ON.
      
      This solves the following issue:
      mlx5_core 0000:82:00.1: mlx5_eq_int:544:(pid 0): CQ error on CQN 0xc08, syndrome 0x1
      mlx5_core 0000:82:00.1 p4p2: mlx5e_cq_error_event: cqn=0x000c08 event=0x04
      
      Fixes: 4b7dfc99 ("net/mlx5e: Early-return on empty completion queues")
      Signed-off-by: default avatarTariq Toukan <tariqt@mellanox.com>
      Reviewed-by: default avatarEran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      bfc69825
    • Cong Wang's avatar
      ipv6: explicitly initialize udp6_addr in udp_sock_create6() · fb242745
      Cong Wang authored
      syzbot reported the use of uninitialized udp6_addr::sin6_scope_id.
      We can just set ::sin6_scope_id to zero, as tunnels are unlikely
      to use an IPv6 address that needs a scope id and there is no
      interface to bind in this context.
      
      For net-next, it looks different as we have cfg->bind_ifindex there
      so we can probably call ipv6_iface_scope_id().
      
      Same for ::sin6_flowinfo, tunnels don't use it.
      
      Fixes: 8024e028 ("udp: Add udp_sock_create for UDP tunnels to open listener socket")
      Reported-by: syzbot+c56449ed3652e6720f30@syzkaller.appspotmail.com
      Cc: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      fb242745
    • Michael Chan's avatar
      bnxt_en: Fix ethtool self-test loopback. · 84404d5f
      Michael Chan authored
      The current code has 2 problems.  It assumes that the RX ring for
      the loopback packet is combined with the TX ring.  This is not
      true if the ethtool channels are set to non-combined mode.  The
      second problem is that it won't work on 57500 chips without
      adjusting the logic to get the proper completion ring (cpr) pointer.
      Fix both issues by locating the proper cpr pointer through the RX
      ring.
      
      Fixes: e44758b7 ("bnxt_en: Use bnxt_cp_ring_info struct pointer as parameter for RX path.")
      Signed-off-by: default avatarMichael Chan <michael.chan@broadcom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      84404d5f
    • David S. Miller's avatar
      Merge branch 'rds-fixes' · 912cb1d5
      David S. Miller authored
      Shamir Rabinovitch says:
      
      ====================
      WARNING in rds_message_alloc_sgs
      
      This patch set fix google syzbot rds bug found in linux-next.
      The first patch solve the syzbot issue.
      The second patch fix issue mentioned by Leon Romanovsky that
      drivers should not call WARN_ON as result from user input.
      
      syzbot bug report can be foud here: https://lkml.org/lkml/2018/10/31/28
      
      v1->v2:
      - patch 1: make rds_iov_vector fields name more descriptive (Hakon)
      - patch 1: fix potential mem leak in rds_rm_size if krealloc fail
        (Hakon)
      v2->v3:
      - patch 2: harden rds_sendmsg for invalid number of sgs (Gerd)
      v3->v4
      - Santosh a.b. on both patches + repost to net-dev
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      912cb1d5
    • shamir rabinovitch's avatar
      net/rds: remove user triggered WARN_ON in rds_sendmsg · c75ab8a5
      shamir rabinovitch authored
      per comment from Leon in rdma mailing list
      https://lkml.org/lkml/2018/10/31/312 :
      
      Please don't forget to remove user triggered WARN_ON.
      https://lwn.net/Articles/769365/
      "Greg Kroah-Hartman raised the problem of core kernel API code that will
      use WARN_ON_ONCE() to complain about bad usage; that will not generate
      the desired result if WARN_ON_ONCE() is configured to crash the machine.
      He was told that the code should just call pr_warn() instead, and that
      the called function should return an error in such situations. It was
      generally agreed that any WARN_ON() or WARN_ON_ONCE() calls that can be
      triggered from user space need to be fixed."
      
      in addition harden rds_sendmsg to detect and overcome issues with
      invalid sg count and fail the sendmsg.
      Suggested-by: default avatarLeon Romanovsky <leon@kernel.org>
      Acked-by: default avatarSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: default avatarshamir rabinovitch <shamir.rabinovitch@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c75ab8a5
    • shamir rabinovitch's avatar
      net/rds: fix warn in rds_message_alloc_sgs · ea010070
      shamir rabinovitch authored
      redundant copy_from_user in rds_sendmsg system call expose rds
      to issue where rds_rdma_extra_size walk the rds iovec and and
      calculate the number pf pages (sgs) it need to add to the tail of
      rds message and later rds_cmsg_rdma_args copy the rds iovec again
      and re calculate the same number and get different result causing
      WARN_ON in rds_message_alloc_sgs.
      
      fix this by doing the copy_from_user only once per rds_sendmsg
      system call.
      
      When issue occur the below dump is seen:
      
      WARNING: CPU: 0 PID: 19789 at net/rds/message.c:316 rds_message_alloc_sgs+0x10c/0x160 net/rds/message.c:316
      Kernel panic - not syncing: panic_on_warn set ...
      CPU: 0 PID: 19789 Comm: syz-executor827 Not tainted 4.19.0-next-20181030+ #101
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x244/0x39d lib/dump_stack.c:113
       panic+0x2ad/0x55c kernel/panic.c:188
       __warn.cold.8+0x20/0x45 kernel/panic.c:540
       report_bug+0x254/0x2d0 lib/bug.c:186
       fixup_bug arch/x86/kernel/traps.c:178 [inline]
       do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
       do_invalid_op+0x36/0x40 arch/x86/kernel/traps.c:290
       invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:969
      RIP: 0010:rds_message_alloc_sgs+0x10c/0x160 net/rds/message.c:316
      Code: c0 74 04 3c 03 7e 6c 44 01 ab 78 01 00 00 e8 2b 9e 35 fa 4c 89 e0 48 83 c4 08 5b 41 5c 41 5d 41 5e 41 5f 5d c3 e8 14 9e 35 fa <0f> 0b 31 ff 44 89 ee e8 18 9f 35 fa 45 85 ed 75 1b e8 fe 9d 35 fa
      RSP: 0018:ffff8801c51b7460 EFLAGS: 00010293
      RAX: ffff8801bc412080 RBX: ffff8801d7bf4040 RCX: ffffffff8749c9e6
      RDX: 0000000000000000 RSI: ffffffff8749ca5c RDI: 0000000000000004
      RBP: ffff8801c51b7490 R08: ffff8801bc412080 R09: ffffed003b5c5b67
      R10: ffffed003b5c5b67 R11: ffff8801dae2db3b R12: 0000000000000000
      R13: 000000000007165c R14: 000000000007165c R15: 0000000000000005
       rds_cmsg_rdma_args+0x82d/0x1510 net/rds/rdma.c:623
       rds_cmsg_send net/rds/send.c:971 [inline]
       rds_sendmsg+0x19a2/0x3180 net/rds/send.c:1273
       sock_sendmsg_nosec net/socket.c:622 [inline]
       sock_sendmsg+0xd5/0x120 net/socket.c:632
       ___sys_sendmsg+0x7fd/0x930 net/socket.c:2117
       __sys_sendmsg+0x11d/0x280 net/socket.c:2155
       __do_sys_sendmsg net/socket.c:2164 [inline]
       __se_sys_sendmsg net/socket.c:2162 [inline]
       __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2162
       do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x44a859
      Code: e8 dc e6 ff ff 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 6b cb fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007f1d4710ada8 EFLAGS: 00000297 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00000000006dcc28 RCX: 000000000044a859
      RDX: 0000000000000000 RSI: 0000000020001600 RDI: 0000000000000003
      RBP: 00000000006dcc20 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000297 R12: 00000000006dcc2c
      R13: 646e732f7665642f R14: 00007f1d4710b9c0 R15: 00000000006dcd2c
      Kernel Offset: disabled
      Rebooting in 86400 seconds..
      
      Reported-by: syzbot+26de17458aeda9d305d8@syzkaller.appspotmail.com
      Acked-by: default avatarSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: default avatarshamir rabinovitch <shamir.rabinovitch@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ea010070
    • David S. Miller's avatar
      Merge tag 'wireless-drivers-for-davem-2018-12-19' of... · c6f4075e
      David S. Miller authored
      Merge tag 'wireless-drivers-for-davem-2018-12-19' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers
      
      Kalle Valo says:
      
      ====================
      wireless-drivers fixes for 4.20
      
      Last set of fixes for 4.20. All (except the mt76 fix) of these are
      important fixes to user reported problems and pretty small in size.
      
      rtlwifi
      
      * fix skb leak
      
      mwifiex
      
      * revert a commit from v4.19 due to problems with locking
      
      mt76
      
      * fix a potential NULL derenfence
      
      * add entry to MAINTAINERS
      
      iwlwifi
      
      * fix a firmware crash which was a regression introduced in v4.20-rc4
      
      ath10k
      
      * fix a firmware crash with wcn3990 firmware
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c6f4075e
    • David S. Miller's avatar
      Merge tag 'mac80211-for-davem-2018-12-19' of... · 49ce708b
      David S. Miller authored
      Merge tag 'mac80211-for-davem-2018-12-19' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
      
      Johannes Berg says:
      
      ====================
      Just three fixes:
       * fix a memory leak in an error path
       * fix TXQs in interface teardown
       * free fraglist if we used it internally
         before returning SKB
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      49ce708b
    • Rakesh Pillai's avatar
      ath10k: skip sending quiet mode cmd for WCN3990 · 53884577
      Rakesh Pillai authored
      HL2.0 firmware does not support setting quiet mode.  If the host driver sends
      the quiet mode setting command to the HL2.0 firmware, it crashes with the below
      signature.
      
      fatal error received: err_qdi.c:456:EX:wlan_process:1:WLAN RT:207a:PC=b001b4f0
      
      The quiet mode command support is exposed by the firmware via thermal throttle
      wmi service. Enable ath10k thermal support if thermal throttle wmi service bit
      is set.  10.x firmware versions support this feature by default, but
      unfortunately do not advertise the support via service flags, hence have to
      manually set the service flag in ath10k_core_compat_services().
      
      Tested on QCA988X with 10.2.4.70.9-2. Also tested on WCN3990.
      Co-developed-by: default avatarGovind Singh <govinds@codeaurora.org>
      Co-developed-by: default avatarKalle Valo <kvalo@codeaurora.org>
      Signed-off-by: default avatarRakesh Pillai <pillair@codeaurora.org>
      Signed-off-by: default avatarGovind Singh <govinds@codeaurora.org>
      Signed-off-by: default avatarKalle Valo <kvalo@codeaurora.org>
      53884577
    • Sara Sharon's avatar
      mac80211: free skb fraglist before freeing the skb · 34b1e0e9
      Sara Sharon authored
      mac80211 uses the frag list to build AMSDU. When freeing
      the skb, it may not be really freed, since someone is still
      holding a reference to it.
      In that case, when TCP skb is being retransmitted, the
      pointer to the frag list is being reused, while the data
      in there is no longer valid.
      Since we will never get frag list from the network stack,
      as mac80211 doesn't advertise the capability, we can safely
      free and nullify it before releasing the SKB.
      Signed-off-by: default avatarSara Sharon <sara.sharon@intel.com>
      Signed-off-by: default avatarLuca Coelho <luciano.coelho@intel.com>
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      34b1e0e9
    • Johannes Berg's avatar
      nl80211: fix memory leak if validate_pae_over_nl80211() fails · d350a0f4
      Johannes Berg authored
      If validate_pae_over_nl80211() were to fail in nl80211_crypto_settings(),
      we might leak the 'connkeys' allocation. Fix this.
      
      Fixes: 64bf3d4b ("nl80211: Add CONTROL_PORT_OVER_NL80211 attribute")
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      d350a0f4
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 3061169a
      David S. Miller authored
      Alexei Starovoitov says:
      
      ====================
      pull-request: bpf 2018-12-18
      
      The following pull-request contains BPF updates for your *net* tree.
      
      The main changes are:
      
      1) promote bpf_perf_event.h to mandatory UAPI header, from Masahiro.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3061169a
    • Myungho Jung's avatar
      net/smc: fix TCP fallback socket release · 78abe3d0
      Myungho Jung authored
      clcsock can be released while kernel_accept() references it in TCP
      listen worker. Also, clcsock needs to wake up before released if TCP
      fallback is used and the clcsock is blocked by accept. Add a lock to
      safely release clcsock and call kernel_sock_shutdown() to wake up
      clcsock from accept in smc_release().
      
      Reported-by: syzbot+0bf2e01269f1274b4b03@syzkaller.appspotmail.com
      Reported-by: syzbot+e3132895630f957306bc@syzkaller.appspotmail.com
      Signed-off-by: default avatarMyungho Jung <mhjungk@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      78abe3d0
    • Colin Ian King's avatar
      vxge: ensure data0 is initialized in when fetching firmware version information · f7db2beb
      Colin Ian King authored
      Currently variable data0 is not being initialized so a garbage value is
      being passed to vxge_hw_vpath_fw_api and this value is being written to
      the rts_access_steer_data0 register.  There are other occurrances where
      data0 is being initialized to zero (e.g. in function
      vxge_hw_upgrade_read_version) so I think it makes sense to ensure data0
      is initialized likewise to 0.
      
      Detected by CoverityScan, CID#140696 ("Uninitialized scalar variable")
      
      Fixes: 8424e00d ("vxge: serialize access to steering control register")
      Signed-off-by: default avatarColin Ian King <colin.king@canonical.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f7db2beb
    • Juergen Gross's avatar
      xen/netfront: tolerate frags with no data · d81c5054
      Juergen Gross authored
      At least old Xen net backends seem to send frags with no real data
      sometimes. In case such a fragment happens to occur with the frag limit
      already reached the frontend will BUG currently even if this situation
      is easily recoverable.
      
      Modify the BUG_ON() condition accordingly.
      Tested-by: default avatarDietmar Hahn <dietmar.hahn@ts.fujitsu.com>
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d81c5054
    • Kunihiko Hayashi's avatar
      net: phy: Fix the issue that netif always links up after resuming · 8742beb5
      Kunihiko Hayashi authored
      Even though the link is down before entering hibernation,
      there is an issue that the network interface always links up after resuming
      from hibernation.
      
      If the link is still down before enabling the network interface,
      and after resuming from hibernation, the phydev->state is forcibly set
      to PHY_UP in mdio_bus_phy_restore(), and the link becomes up.
      
      In suspend sequence, only if the PHY is attached, mdio_bus_phy_suspend()
      calls phy_stop_machine(), and mdio_bus_phy_resume() calls
      phy_start_machine().
      In resume sequence, it's enough to do the same as mdio_bus_phy_resume()
      because the state has been preserved.
      
      This patch fixes the issue by calling phy_start_machine() in
      mdio_bus_phy_restore() in the same way as mdio_bus_phy_resume().
      
      Fixes: bc87922f ("phy: Move PHY PM operations into phy_device")
      Suggested-by: default avatarHeiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: default avatarKunihiko Hayashi <hayashi.kunihiko@socionext.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8742beb5
    • Jason Martinsen's avatar
      lan78xx: Resolve issue with changing MAC address · 15515aaa
      Jason Martinsen authored
      Current state for the lan78xx driver does not allow for changing the
      MAC address of the interface, without either removing the module (if
      you compiled it that way) or rebooting the machine.  If you attempt to
      change the MAC address, ifconfig will show the new address, however,
      the system/interface will not respond to any traffic using that
      configuration.  A few short-term options to work around this are to
      unload the module and reload it with the new MAC address, change the
      interface to "promisc", or reboot with the correct configuration to
      change the MAC.
      
      This patch enables the ability to change the MAC address via fairly normal means...
      ifdown <interface>
      modify entry in /etc/network/interfaces OR a similar method
      ifup <interface>
      Then test via any network communication, such as ICMP requests to gateway.
      
      My only test platform for this patch has been a raspberry pi model 3b+.
      Signed-off-by: default avatarJason Martinsen <jasonmartinsen@msn.com>
      
      -----
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      15515aaa
    • Bryan Whitehead's avatar
      lan743x: Expand phy search for LAN7431 · 0db7d253
      Bryan Whitehead authored
      The LAN7431 uses an external phy, and it can be found anywhere in
      the phy address space. This patch uses phy address 1 for LAN7430
      only. And searches all addresses otherwise.
      Signed-off-by: default avatarBryan Whitehead <Bryan.Whitehead@microchip.com>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0db7d253
    • David S. Miller's avatar
      Merge branch 'vxlan-Various-fixes' · 59fc137e
      David S. Miller authored
      Petr Machata says:
      
      ====================
      vxlan: Various fixes
      
      This patch set contains three fixes for the vxlan driver.
      
      Patch #1 fixes handling of offload mark on replaced VXLAN FDB entries. A
      way to trigger this is to replace the FDB entry with one that can not be
      offloaded. A future patch set should make it possible to veto such FDB
      changes. However the FDB might still fail to be offloaded due to another
      issue, and the offload mark should reflect that.
      
      Patch #2 fixes problems in __vxlan_dev_create() when a call to
      rtnl_configure_link() fails. These failures would be tricky to hit on a
      real system, the most likely vector is through an error in vxlan_open().
      However, with the abovementioned vetoing patchset, vetoing the created
      entry would trigger the same problems (and be easier to reproduce).
      
      Patch #3 fixes a problem in vxlan_changelink(). In situations where the
      default remote configured in the FDB table (if any) does not exactly
      match the remote address configured at the VXLAN device, changing the
      remote address breaks the default FDB entry. Patch #4 is then a self
      test for this issue.
      
      v3:
      - Patch #2:
          - Reuse the same errout block for both cleanup paths. Use a bool to
            decide whether the unregister_netdevice() call should be made.
      
      v2:
      - Drop former patch #3
      - Patch #2:
          - Delete the default entry before calling unregister_netdevice(). That
            takes care of former patch #3, hence tweak the commit message to
            mention that problem as well.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      59fc137e
    • Petr Machata's avatar
      selftests: net: Add test_vxlan_fdb_changelink.sh · 55cbe079
      Petr Machata authored
      Add a test to exercise the fix from the previous patch.
      Signed-off-by: default avatarPetr Machata <petrm@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      55cbe079
    • Petr Machata's avatar
      vxlan: changelink: Fix handling of default remotes · ce5e098f
      Petr Machata authored
      Default remotes are stored as FDB entries with an Ethernet address of
      00:00:00:00:00:00. When a request is made to change a remote address of
      a VXLAN device, vxlan_changelink() first deletes the existing default
      remote, and then creates a new FDB entry.
      
      This works well as long as the list of default remotes matches exactly
      the configuration of a VXLAN remote address. Thus when the VXLAN device
      has a remote of X, there should be exactly one default remote FDB entry
      X. If the VXLAN device has no remote address, there should be no such
      entry.
      
      Besides using "ip link set", it is possible to manipulate the list of
      default remotes by using the "bridge fdb". It is therefore easy to break
      the above condition. Under such circumstances, the __vxlan_fdb_delete()
      call doesn't delete the FDB entry itself, but just one remote. The
      following vxlan_fdb_create() then creates a new FDB entry, leading to a
      situation where two entries exist for the address 00:00:00:00:00:00,
      each with a different subset of default remotes.
      
      An even more obvious breakage rooted in the same cause can be observed
      when a remote address is configured for a VXLAN device that did not have
      one before. In that case vxlan_changelink() doesn't remove any remote,
      and just creates a new FDB entry for the new address:
      
      $ ip link add name vx up type vxlan id 2000 dstport 4789
      $ bridge fdb ap dev vx 00:00:00:00:00:00 dst 192.0.2.20 self permanent
      $ bridge fdb ap dev vx 00:00:00:00:00:00 dst 192.0.2.30 self permanent
      $ ip link set dev vx type vxlan remote 192.0.2.30
      $ bridge fdb sh dev vx | grep 00:00:00:00:00:00
      00:00:00:00:00:00 dst 192.0.2.30 self permanent <- new entry, 1 rdst
      00:00:00:00:00:00 dst 192.0.2.20 self permanent <- orig. entry, 2 rdsts
      00:00:00:00:00:00 dst 192.0.2.30 self permanent
      
      To fix this, instead of calling vxlan_fdb_create() directly, defer to
      vxlan_fdb_update(). That has logic to handle the duplicates properly.
      Additionally, it also handles notifications, so drop that call from
      changelink as well.
      
      Fixes: 0241b836 ("vxlan: fix default fdb entry netlink notify ordering during netdev create")
      Signed-off-by: default avatarPetr Machata <petrm@mellanox.com>
      Acked-by: default avatarRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ce5e098f
    • Petr Machata's avatar
      vxlan: Fix error path in __vxlan_dev_create() · 6db92468
      Petr Machata authored
      When a failure occurs in rtnl_configure_link(), the current code
      calls unregister_netdevice() to roll back the earlier call to
      register_netdevice(), and jumps to errout, which calls
      vxlan_fdb_destroy().
      
      However unregister_netdevice() calls transitively ndo_uninit, which is
      vxlan_uninit(), and that already takes care of deleting the default FDB
      entry by calling vxlan_fdb_delete_default(). Since the entry added
      earlier in __vxlan_dev_create() is exactly the default entry, the
      cleanup code in the errout block always leads to double free and thus a
      panic.
      
      Besides, since vxlan_fdb_delete_default() always destroys the FDB entry
      with notification enabled, the deletion of the default entry is notified
      even before the addition was notified.
      
      Instead, move the unregister_netdevice() call after the manual destroy,
      which solves both problems.
      
      Fixes: 0241b836 ("vxlan: fix default fdb entry netlink notify ordering during netdev create")
      Signed-off-by: default avatarPetr Machata <petrm@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6db92468
    • Petr Machata's avatar
      vxlan: Unmark offloaded bit on replaced FDB entries · 6ad0b5a4
      Petr Machata authored
      When rdst of an offloaded FDB entry is replaced, it certainly isn't
      offloaded anymore. Drivers are notified about such replacements, and can
      re-mark the entry as offloaded again if they so wish. However until a
      driver does so explicitly, assume a replaced FDB entry is not offloaded.
      
      Note that replaces coming via vxlan_fdb_external_learn_add() are always
      immediately followed by an explicit offload marking.
      
      Fixes: 0efe1173 ("vxlan: Support marking RDSTs as offloaded")
      Signed-off-by: default avatarPetr Machata <petrm@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6ad0b5a4
    • David S. Miller's avatar
      Merge branch 'macb-DMA-race-fixes' · a9d6d897
      David S. Miller authored
      Anssi Hannula says:
      
      ====================
      net: macb: DMA race condition fixes
      
      Here are a couple of race condition fixes for the macb driver. The first
      two are for issues observed at runtime on real HW.
      
      v2:
      - added received Tested-bys and Acked-bys to the first two patches
      - in patch 3/3, moved the timestamp protection barrier closer to the
        timestamp reads
      - in patch 3/3, removed unnecessary move of the addr assignment in
        gem_rx() to keep the patch minimal for maximum clarity
      - in patch 3/3, clarified commit message and comments
      
      The 3/3 is the same one I improperly sent last week as a standalone
      patch.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a9d6d897
    • Anssi Hannula's avatar
      net: macb: add missing barriers when reading descriptors · 6e0af298
      Anssi Hannula authored
      When reading buffer descriptors on RX or on TX completion, an
      RX_USED/TX_USED bit is checked first to ensure that the descriptors have
      been populated, i.e. the ownership has been transferred. However, there
      are no memory barriers to ensure that the data protected by the
      RX_USED/TX_USED bit is up-to-date with respect to that bit.
      
      Specifically:
      
      - TX timestamp descriptors may be loaded before ctrl is loaded for the
        TX_USED check, which is racy as the descriptors may be updated between
        the loads, causing old timestamp descriptor data to be used.
      
      - RX ctrl may be loaded before addr is loaded for the RX_USED check,
        which is racy as a new frame may be written between the loads, causing
        old ctrl descriptor data to be used.
        This issue exists for both macb_rx() and gem_rx() variants.
      
      Fix the races by adding DMA read memory barriers on those paths and
      reordering the reads in macb_rx().
      
      I have not observed any actual problems in practice caused by these
      being missing, though.
      
      Tested on a ZynqMP based system.
      
      Fixes: 89e5785f ("[PATCH] Atmel MACB ethernet driver")
      Signed-off-by: default avatarAnssi Hannula <anssi.hannula@bitwise.fi>
      Cc: Nicolas Ferre <nicolas.ferre@microchip.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6e0af298
    • Anssi Hannula's avatar
      net: macb: fix dropped RX frames due to a race · 8159ecab
      Anssi Hannula authored
      Bit RX_USED set to 0 in the address field allows the controller to write
      data to the receive buffer descriptor.
      
      The driver does not ensure the ctrl field is ready (cleared) when the
      controller sees the RX_USED=0 written by the driver. The ctrl field might
      only be cleared after the controller has already updated it according to
      a newly received frame, causing the frame to be discarded in gem_rx() due
      to unexpected ctrl field contents.
      
      A message is logged when the above scenario occurs:
      
        macb ff0b0000.ethernet eth0: not whole frame pointed by descriptor
      
      Fix the issue by ensuring that when the controller sees RX_USED=0 the
      ctrl field is already cleared.
      
      This issue was observed on a ZynqMP based system.
      
      Fixes: 4df95131 ("net/macb: change RX path for GEM")
      Signed-off-by: default avatarAnssi Hannula <anssi.hannula@bitwise.fi>
      Tested-by: default avatarClaudiu Beznea <claudiu.beznea@microchip.com>
      Cc: Nicolas Ferre <nicolas.ferre@microchip.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8159ecab
    • Anssi Hannula's avatar
      net: macb: fix random memory corruption on RX with 64-bit DMA · e100a897
      Anssi Hannula authored
      64-bit DMA addresses are split in upper and lower halves that are
      written in separate fields on GEM. For RX, bit 0 of the address is used
      as the ownership bit (RX_USED). When the RX_USED bit is unset the
      controller is allowed to write data to the buffer.
      
      The driver does not guarantee that the controller already sees the upper
      half when the RX_USED bit is cleared, possibly resulting in the
      controller writing an incoming frame to an address with an incorrect
      upper half and therefore possibly corrupting unrelated system memory.
      
      Fix that by adding the necessary DMA memory barrier between the writes.
      
      This corruption was observed on a ZynqMP based system.
      
      Fixes: fff8019a ("net: macb: Add 64 bit addressing support for GEM")
      Signed-off-by: default avatarAnssi Hannula <anssi.hannula@bitwise.fi>
      Acked-by: default avatarHarini Katakam <harini.katakam@xilinx.com>
      Tested-by: default avatarClaudiu Beznea <claudiu.beznea@microchip.com>
      Cc: Nicolas Ferre <nicolas.ferre@microchip.com>
      Cc: Michal Simek <michal.simek@xilinx.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e100a897
  2. 18 Dec, 2018 9 commits