1. 05 Oct, 2019 13 commits
    • Jakub Kicinski's avatar
      selftests/net: add nettest to .gitignore · ef129d34
      Jakub Kicinski authored
      nettest is missing from gitignore.
      
      Fixes: acda655f ("selftests: Add nettest")
      Signed-off-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ef129d34
    • Navid Emamdoost's avatar
      net: qlogic: Fix memory leak in ql_alloc_large_buffers · 1acb8f2a
      Navid Emamdoost authored
      In ql_alloc_large_buffers, a new skb is allocated via netdev_alloc_skb.
      This skb should be released if pci_dma_mapping_error fails.
      
      Fixes: 0f8ab89e ("qla3xxx: Check return code from pci_map_single() in ql_release_to_lrg_buf_free_list(), ql_populate_free_queue(), ql_alloc_large_buffers(), and ql3xxx_send()")
      Signed-off-by: default avatarNavid Emamdoost <navid.emamdoost@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1acb8f2a
    • Eric Dumazet's avatar
      nfc: fix memory leak in llcp_sock_bind() · a0c2dc1f
      Eric Dumazet authored
      sysbot reported a memory leak after a bind() has failed.
      
      While we are at it, abort the operation if kmemdup() has failed.
      
      BUG: memory leak
      unreferenced object 0xffff888105d83ec0 (size 32):
        comm "syz-executor067", pid 7207, jiffies 4294956228 (age 19.430s)
        hex dump (first 32 bytes):
          00 69 6c 65 20 72 65 61 64 00 6e 65 74 3a 5b 34  .ile read.net:[4
          30 32 36 35 33 33 30 39 37 5d 00 00 00 00 00 00  026533097]......
        backtrace:
          [<0000000036bac473>] kmemleak_alloc_recursive /./include/linux/kmemleak.h:43 [inline]
          [<0000000036bac473>] slab_post_alloc_hook /mm/slab.h:522 [inline]
          [<0000000036bac473>] slab_alloc /mm/slab.c:3319 [inline]
          [<0000000036bac473>] __do_kmalloc /mm/slab.c:3653 [inline]
          [<0000000036bac473>] __kmalloc_track_caller+0x169/0x2d0 /mm/slab.c:3670
          [<000000000cd39d07>] kmemdup+0x27/0x60 /mm/util.c:120
          [<000000008e57e5fc>] kmemdup /./include/linux/string.h:432 [inline]
          [<000000008e57e5fc>] llcp_sock_bind+0x1b3/0x230 /net/nfc/llcp_sock.c:107
          [<000000009cb0b5d3>] __sys_bind+0x11c/0x140 /net/socket.c:1647
          [<00000000492c3bbc>] __do_sys_bind /net/socket.c:1658 [inline]
          [<00000000492c3bbc>] __se_sys_bind /net/socket.c:1656 [inline]
          [<00000000492c3bbc>] __x64_sys_bind+0x1e/0x30 /net/socket.c:1656
          [<0000000008704b2a>] do_syscall_64+0x76/0x1a0 /arch/x86/entry/common.c:296
          [<000000009f4c57a4>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: 30cc4587 ("NFC: Move LLCP code to the NFC top level diirectory")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a0c2dc1f
    • Eric Dumazet's avatar
      sch_dsmark: fix potential NULL deref in dsmark_init() · 474f0813
      Eric Dumazet authored
      Make sure TCA_DSMARK_INDICES was provided by the user.
      
      syzbot reported :
      
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] PREEMPT SMP KASAN
      CPU: 1 PID: 8799 Comm: syz-executor235 Not tainted 5.3.0+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:nla_get_u16 include/net/netlink.h:1501 [inline]
      RIP: 0010:dsmark_init net/sched/sch_dsmark.c:364 [inline]
      RIP: 0010:dsmark_init+0x193/0x640 net/sched/sch_dsmark.c:339
      Code: 85 db 58 0f 88 7d 03 00 00 e8 e9 1a ac fb 48 8b 9d 70 ff ff ff 48 b8 00 00 00 00 00 fc ff df 48 8d 7b 04 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 ca
      RSP: 0018:ffff88809426f3b8 EFLAGS: 00010247
      RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff85c6eb09
      RDX: 0000000000000000 RSI: ffffffff85c6eb17 RDI: 0000000000000004
      RBP: ffff88809426f4b0 R08: ffff88808c4085c0 R09: ffffed1015d26159
      R10: ffffed1015d26158 R11: ffff8880ae930ac7 R12: ffff8880a7e96940
      R13: dffffc0000000000 R14: ffff88809426f8c0 R15: 0000000000000000
      FS:  0000000001292880(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020000080 CR3: 000000008ca1b000 CR4: 00000000001406e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       qdisc_create+0x4ee/0x1210 net/sched/sch_api.c:1237
       tc_modify_qdisc+0x524/0x1c50 net/sched/sch_api.c:1653
       rtnetlink_rcv_msg+0x463/0xb00 net/core/rtnetlink.c:5223
       netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
       rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5241
       netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
       netlink_unicast+0x531/0x710 net/netlink/af_netlink.c:1328
       netlink_sendmsg+0x8a5/0xd60 net/netlink/af_netlink.c:1917
       sock_sendmsg_nosec net/socket.c:637 [inline]
       sock_sendmsg+0xd7/0x130 net/socket.c:657
       ___sys_sendmsg+0x803/0x920 net/socket.c:2311
       __sys_sendmsg+0x105/0x1d0 net/socket.c:2356
       __do_sys_sendmsg net/socket.c:2365 [inline]
       __se_sys_sendmsg net/socket.c:2363 [inline]
       __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2363
       do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x440369
      
      Fixes: 758cc43c ("[PKT_SCHED]: Fix dsmark to apply changes consistent")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      474f0813
    • David S. Miller's avatar
      Merge branch 'Fix-regression-with-AR8035-speed-downgrade' · e3ba9bf6
      David S. Miller authored
      Russell King says:
      
      ====================
      Fix regression with AR8035 speed downgrade
      
      The following series attempts to address an issue spotted by tinywrkb
      with the AR8035 on the Cubox-i2 in a situation where the PHY downgrades
      the negotiated link.
      
      This is version 2, not much has changed other than rebasing on the
      current net tree.  Changes have happend to patch 2 due to conflicts,
      so I dropped Andrew's reviewed-by.  Minor context changes to patch 4
      which I don't consider important enough to warrant dropping the
      reviewed-by.
      
      Before commit 5502b218 ("net: phy: use phy_resolve_aneg_linkmode in
      genphy_read_status"), we would read not only the link partner's
      advertisement, but also our own advertisement from the PHY registers,
      and use both to derive the PHYs current link mode.  This works when the
      AR8035 downgrades the speed, because it appears that the AR8035 clears
      link mode bits in the advertisement registers as part of the downgrade.
      
      Commentary: what is not yet known is whether the AR8035 restores the
                  advertisement register when the link goes down to the
      	    previous state.
      
      However, since the above referenced commit, we no longer use the PHYs
      advertisement registers, instead converting the link partner's
      advertisement to the ethtool link mode array, and combine that with
      phylib's cached version of our advertisement - which is not updated on
      speed downgrade.
      
      This results in phylib disagreeing with the actual operating mode of
      the PHY.
      
      Commentary: I wonder how many more PHY drivers are broken by this
      	    commit, but have yet to be discovered.
      
      The obvious way to address this would be to disable the downgrade
      feature, and indeed this does fix the problem in tinywrkb's case - his
      link partner instead downgrades the speed by reducing its
      advertisement, resulting in phylib correctly evaluating a slower speed.
      
      However, it has a serious drawback - the gigabit control register (MII
      register 9) appears to become read only.  It seems the only way to
      update the register is to re-enable the downgrade feature, reset the
      PHY, changing register 9, disable the downgrade feature, and reset the
      PHY again.
      
      This series attempts to address the problem using a different approach,
      similar to the approach taken with Marvell PHYs.  The AR8031, AR8033
      and AR8035 have a PHY-Specific Status register which reports the
      actual operating mode of the PHY - both speed and duplex.  This
      register correctly reports the operating mode irrespective of whether
      autoneg is enabled or not.  We use this register to fill in phylib's
      speed and duplex parameters.
      
      In detail:
      
      Patch 1 fixes a bug where writing to register 9 does not update
      phylib's advertisement mask in the same way that writing register 4
      does; this looks like an omission from when gigabit PHY support came
      into being.
      
      Patch 2 seperates the generic phylib code which reads the link partners
      advertisement from the PHY, so that we can re-use this in the Atheros
      PHY driver.
      
      Patch 3 seperates the generic phylib pause mode; phylib provides no
      help for MAC drivers to ascertain the negotiated pause mode, it merely
      copies the link partner's pause mode bits into its own variables.
      
      Commentary: Both the aforementioned Atheros PHYs and Marvell PHYs
                  provide the resolved pause modes in terms of whether
      	    we should transmit pause frames, or whether we should
      	    allow reception of pause frames.  Surely the resolution
      	    of this should be in phylib?
      
      Patch 4 provides the Atheros PHY driver with a private "read_status"
      implementation that fills in phylib's speed and duplex settings
      depending on the PHY-Specific status register.  This ensures that
      phylib and the MAC driver match the operating mode that the PHY has
      decided to use.  Since the register also gives us MDIX status, we
      can trivially fill that status in as well.
      
      Note that, although the bits mentioned in this patch for this register
      match those in th Marvell PHY driver, and it is located at the same
      address, the meaning of other register bits varies between the PHYs.
      Therefore, I do not feel that it would be appropriate to make this some
      kind of generic function.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e3ba9bf6
    • Russell King's avatar
      net: phy: at803x: use operating parameters from PHY-specific status · 06d5f344
      Russell King authored
      Read the PHY-specific status register for the current operating mode
      (speed and duplex) of the PHY.  This register reflects the actual
      mode that the PHY has resolved depending on either the advertisements
      of autoneg is enabled, or the forced mode if autoneg is disabled.
      
      This ensures that phylib's software state always tracks the hardware
      state.
      
      It seems both AR8033 (which uses the AR8031 ID) and AR8035 support
      this status register.  AR8030 is not known at the present time.
      
      This patch depends on "net: phy: extract pause mode" and "net: phy:
      extract link partner advertisement reading".
      Reported-by: default avatartinywrkb <tinywrkb@gmail.com>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Tested-by: default avatartinywrkb <tinywrkb@gmail.com>
      Fixes: 5502b218 ("net: phy: use phy_resolve_aneg_linkmode in genphy_read_status")
      Signed-off-by: default avatarRussell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      06d5f344
    • Russell King's avatar
      net: phy: extract pause mode · 2d880b87
      Russell King authored
      Extract the update of phylib's software pause mode state from
      genphy_read_status(), so that we can re-use this functionality with
      PHYs that have alternative ways to read the negotiation results.
      Tested-by: default avatartinywrkb <tinywrkb@gmail.com>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarRussell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2d880b87
    • Russell King's avatar
      net: phy: extract link partner advertisement reading · 8d3dc3ac
      Russell King authored
      Move reading the link partner advertisement out of genphy_read_status()
      into its own separate function.  This will allow re-use of this code by
      PHY drivers that are able to read the resolved status from the PHY.
      Tested-by: default avatartinywrkb <tinywrkb@gmail.com>
      Signed-off-by: default avatarRussell King <rmk+kernel@armlinux.org.uk>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8d3dc3ac
    • Russell King's avatar
      net: phy: fix write to mii-ctrl1000 register · 4cf6c57e
      Russell King authored
      When userspace writes to the MII_ADVERTISE register, we update phylib's
      advertising mask and trigger a renegotiation.  However, writing to the
      MII_CTRL1000 register, which contains the gigabit advertisement, does
      neither.  This can lead to phylib's copy of the advertisement becoming
      de-synced with the values in the PHY register set, which can result in
      incorrect negotiation resolution.
      
      Fixes: 5502b218 ("net: phy: use phy_resolve_aneg_linkmode in genphy_read_status")
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarRussell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4cf6c57e
    • David Ahern's avatar
      ipv6: Handle missing host route in __ipv6_ifa_notify · 2d819d25
      David Ahern authored
      Rajendra reported a kernel panic when a link was taken down:
      
          [ 6870.263084] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
          [ 6870.271856] IP: [<ffffffff8efc5764>] __ipv6_ifa_notify+0x154/0x290
      
          <snip>
      
          [ 6870.570501] Call Trace:
          [ 6870.573238] [<ffffffff8efc58c6>] ? ipv6_ifa_notify+0x26/0x40
          [ 6870.579665] [<ffffffff8efc98ec>] ? addrconf_dad_completed+0x4c/0x2c0
          [ 6870.586869] [<ffffffff8efe70c6>] ? ipv6_dev_mc_inc+0x196/0x260
          [ 6870.593491] [<ffffffff8efc9c6a>] ? addrconf_dad_work+0x10a/0x430
          [ 6870.600305] [<ffffffff8f01ade4>] ? __switch_to_asm+0x34/0x70
          [ 6870.606732] [<ffffffff8ea93a7a>] ? process_one_work+0x18a/0x430
          [ 6870.613449] [<ffffffff8ea93d6d>] ? worker_thread+0x4d/0x490
          [ 6870.619778] [<ffffffff8ea93d20>] ? process_one_work+0x430/0x430
          [ 6870.626495] [<ffffffff8ea99dd9>] ? kthread+0xd9/0xf0
          [ 6870.632145] [<ffffffff8f01ade4>] ? __switch_to_asm+0x34/0x70
          [ 6870.638573] [<ffffffff8ea99d00>] ? kthread_park+0x60/0x60
          [ 6870.644707] [<ffffffff8f01ae77>] ? ret_from_fork+0x57/0x70
          [ 6870.650936] Code: 31 c0 31 d2 41 b9 20 00 08 02 b9 09 00 00 0
      
      addrconf_dad_work is kicked to be scheduled when a device is brought
      up. There is a race between addrcond_dad_work getting scheduled and
      taking the rtnl lock and a process taking the link down (under rtnl).
      The latter removes the host route from the inet6_addr as part of
      addrconf_ifdown which is run for NETDEV_DOWN. The former attempts
      to use the host route in __ipv6_ifa_notify. If the down event removes
      the host route due to the race to the rtnl, then the BUG listed above
      occurs.
      
      Since the DAD sequence can not be aborted, add a check for the missing
      host route in __ipv6_ifa_notify. The only way this should happen is due
      to the previously mentioned race. The host route is created when the
      address is added to an interface; it is only removed on a down event
      where the address is kept. Add a warning if the host route is missing
      AND the device is up; this is a situation that should never happen.
      
      Fixes: f1705ec1 ("net: ipv6: Make address flushing on ifdown optional")
      Reported-by: default avatarRajendra Dendukuri <rajendra.dendukuri@broadcom.com>
      Signed-off-by: default avatarDavid Ahern <dsahern@gmail.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2d819d25
    • Andrea Merello's avatar
      net: phy: allow for reset line to be tied to a sleepy GPIO controller · ea977d19
      Andrea Merello authored
      mdio_device_reset() makes use of the atomic-pretending API flavor for
      handling the PHY reset GPIO line.
      
      I found no hint that mdio_device_reset() is called from atomic context
      and indeed it uses usleep_range() since long time, so I would assume that
      it is OK to sleep there.
      
      This patch switch to gpiod_set_value_cansleep() in mdio_device_reset().
      This is relevant if e.g. the PHY reset line is tied to a I2C GPIO
      controller.
      
      This has been tested on a ZynqMP board running an upstream 4.19 kernel and
      then hand-ported on current kernel tree.
      Signed-off-by: default avatarAndrea Merello <andrea.merello@gmail.com>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ea977d19
    • Paolo Abeni's avatar
      net: ipv4: avoid mixed n_redirects and rate_tokens usage · b406472b
      Paolo Abeni authored
      Since commit c09551c6 ("net: ipv4: use a dedicated counter
      for icmp_v4 redirect packets") we use 'n_redirects' to account
      for redirect packets, but we still use 'rate_tokens' to compute
      the redirect packets exponential backoff.
      
      If the device sent to the relevant peer any ICMP error packet
      after sending a redirect, it will also update 'rate_token' according
      to the leaking bucket schema; typically 'rate_token' will raise
      above BITS_PER_LONG and the redirect packets backoff algorithm
      will produce undefined behavior.
      
      Fix the issue using 'n_redirects' to compute the exponential backoff
      in ip_rt_send_redirect().
      
      Note that we still clear rate_tokens after a redirect silence period,
      to avoid changing an established behaviour.
      
      The root cause predates git history; before the mentioned commit in
      the critical scenario, the kernel stopped sending redirects, after
      the mentioned commit the behavior more randomic.
      Reported-by: default avatarXiumei Mu <xmu@redhat.com>
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Fixes: c09551c6 ("net: ipv4: use a dedicated counter for icmp_v4 redirect packets")
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Acked-by: default avatarLorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b406472b
    • Kai-Heng Feng's avatar
      r8152: Set macpassthru in reset_resume callback · a54cdeeb
      Kai-Heng Feng authored
      r8152 may fail to establish network connection after resume from system
      suspend.
      
      If the USB port connects to r8152 lost its power during system suspend,
      the MAC address was written before is lost. The reason is that The MAC
      address doesn't get written again in its reset_resume callback.
      
      So let's set MAC address again in reset_resume callback. Also remove
      unnecessary lock as no other locking attempt will happen during
      reset_resume.
      Signed-off-by: default avatarKai-Heng Feng <kai.heng.feng@canonical.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a54cdeeb
  2. 04 Oct, 2019 5 commits
  3. 03 Oct, 2019 6 commits
    • Eric Dumazet's avatar
      tcp: fix slab-out-of-bounds in tcp_zerocopy_receive() · 3afb0961
      Eric Dumazet authored
      Apparently a refactoring patch brought a bug, that was caught
      by syzbot [1]
      
      Original code was correct, do not try to be smarter than the
      compiler :/
      
      [1]
      BUG: KASAN: slab-out-of-bounds in tcp_zerocopy_receive net/ipv4/tcp.c:1807 [inline]
      BUG: KASAN: slab-out-of-bounds in do_tcp_getsockopt.isra.0+0x2c6c/0x3120 net/ipv4/tcp.c:3654
      Read of size 4 at addr ffff8880943cf188 by task syz-executor.2/17508
      
      CPU: 0 PID: 17508 Comm: syz-executor.2 Not tainted 5.3.0-rc7+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x172/0x1f0 lib/dump_stack.c:113
       print_address_description.cold+0xd4/0x306 mm/kasan/report.c:351
       __kasan_report.cold+0x1b/0x36 mm/kasan/report.c:482
       kasan_report+0x12/0x17 mm/kasan/common.c:618
       __asan_report_load4_noabort+0x14/0x20 mm/kasan/generic_report.c:131
       tcp_zerocopy_receive net/ipv4/tcp.c:1807 [inline]
       do_tcp_getsockopt.isra.0+0x2c6c/0x3120 net/ipv4/tcp.c:3654
       tcp_getsockopt+0xbf/0xe0 net/ipv4/tcp.c:3680
       sock_common_getsockopt+0x94/0xd0 net/core/sock.c:3098
       __sys_getsockopt+0x16d/0x310 net/socket.c:2129
       __do_sys_getsockopt net/socket.c:2144 [inline]
       __se_sys_getsockopt net/socket.c:2141 [inline]
       __x64_sys_getsockopt+0xbe/0x150 net/socket.c:2141
       do_syscall_64+0xfd/0x6a0 arch/x86/entry/common.c:296
      
      Fixes: d8e18a51 ("net: Use skb accessors in network core")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarSoheil Hassas Yeganeh <soheil@google.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3afb0961
    • Randy Dunlap's avatar
      lib: textsearch: fix escapes in example code · 2105b52e
      Randy Dunlap authored
      This textsearch code example does not need the '\' escapes and they can
      be misleading to someone reading the example. Also, gcc and sparse warn
      that the "\%d" is an unknown escape sequence.
      
      Fixes: 5968a70d ("textsearch: fix kernel-doc warnings and add kernel-api section")
      Signed-off-by: default avatarRandy Dunlap <rdunlap@infradead.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: netdev@vger.kernel.org
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2105b52e
    • Josh Hunt's avatar
      udp: only do GSO if # of segs > 1 · 4094871d
      Josh Hunt authored
      Prior to this change an application sending <= 1MSS worth of data and
      enabling UDP GSO would fail if the system had SW GSO enabled, but the
      same send would succeed if HW GSO offload is enabled. In addition to this
      inconsistency the error in the SW GSO case does not get back to the
      application if sending out of a real device so the user is unaware of this
      failure.
      
      With this change we only perform GSO if the # of segments is > 1 even
      if the application has enabled segmentation. I've also updated the
      relevant udpgso selftests.
      
      Fixes: bec1f6f6 ("udp: generate gso with UDP_SEGMENT")
      Signed-off-by: default avatarJosh Hunt <johunt@akamai.com>
      Reviewed-by: default avatarWillem de Bruijn <willemb@google.com>
      Reviewed-by: default avatarAlexander Duyck <alexander.h.duyck@linux.intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4094871d
    • Josh Hunt's avatar
      udp: fix gso_segs calculations · 44b321e5
      Josh Hunt authored
      Commit dfec0ee2 ("udp: Record gso_segs when supporting UDP segmentation offload")
      added gso_segs calculation, but incorrectly got sizeof() the pointer and
      not the underlying data type. In addition let's fix the v6 case.
      
      Fixes: bec1f6f6 ("udp: generate gso with UDP_SEGMENT")
      Fixes: dfec0ee2 ("udp: Record gso_segs when supporting UDP segmentation offload")
      Signed-off-by: default avatarJosh Hunt <johunt@akamai.com>
      Reviewed-by: default avatarAlexander Duyck <alexander.h.duyck@linux.intel.com>
      Acked-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      44b321e5
    • Eric Dumazet's avatar
      ipv6: drop incoming packets having a v4mapped source address · 6af1799a
      Eric Dumazet authored
      This began with a syzbot report. syzkaller was injecting
      IPv6 TCP SYN packets having a v4mapped source address.
      
      After an unsuccessful 4-tuple lookup, TCP creates a request
      socket (SYN_RECV) and calls reqsk_queue_hash_req()
      
      reqsk_queue_hash_req() calls sk_ehashfn(sk)
      
      At this point we have AF_INET6 sockets, and the heuristic
      used by sk_ehashfn() to either hash the IPv4 or IPv6 addresses
      is to use ipv6_addr_v4mapped(&sk->sk_v6_daddr)
      
      For the particular spoofed packet, we end up hashing V4 addresses
      which were not initialized by the TCP IPv6 stack, so KMSAN fired
      a warning.
      
      I first fixed sk_ehashfn() to test both source and destination addresses,
      but then faced various problems, including user-space programs
      like packetdrill that had similar assumptions.
      
      Instead of trying to fix the whole ecosystem, it is better
      to admit that we have a dual stack behavior, and that we
      can not build linux kernels without V4 stack anyway.
      
      The dual stack API automatically forces the traffic to be IPv4
      if v4mapped addresses are used at bind() or connect(), so it makes
      no sense to allow IPv6 traffic to use the same v4mapped class.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Florian Westphal <fw@strlen.de>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6af1799a
    • Thierry Reding's avatar
      net: stmmac: Avoid deadlock on suspend/resume · 134cc4ce
      Thierry Reding authored
      The stmmac driver will try to acquire its private mutex during suspend
      via phylink_resolve() -> stmmac_mac_link_down() -> stmmac_eee_init().
      However, the phylink configuration is updated with the private mutex
      held already, which causes a deadlock during suspend.
      
      Fix this by moving the phylink configuration updates out of the region
      of code protected by the private mutex.
      
      Fixes: 19e13cb2 ("net: stmmac: Hold rtnl lock in suspend/resume callbacks")
      Suggested-by: default avatarBitan Biswas <bbiswas@nvidia.com>
      Signed-off-by: default avatarThierry Reding <treding@nvidia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      134cc4ce
  4. 02 Oct, 2019 15 commits
    • Yizhuo's avatar
      net: hisilicon: Fix usage of uninitialized variable in function mdio_sc_cfg_reg_write() · 53de429f
      Yizhuo authored
      In function mdio_sc_cfg_reg_write(), variable "reg_value" could be
      uninitialized if regmap_read() fails. However, "reg_value" is used
      to decide the control flow later in the if statement, which is
      potentially unsafe.
      Signed-off-by: default avatarYizhuo <yzhai003@ucr.edu>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      53de429f
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · 4fbb97ba
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for net:
      
      1) Remove the skb_ext_del from nf_reset, and renames it to a more
         fitting nf_reset_ct(). Patch from Florian Westphal.
      
      2) Fix deadlock in nft_connlimit between packet path updates and
         the garbage collector.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4fbb97ba
    • Vladimir Oltean's avatar
      ptp_qoriq: Initialize the registers' spinlock before calling ptp_qoriq_settime · db34a471
      Vladimir Oltean authored
      Because ptp_qoriq_settime is being called prior to spin_lock_init, the
      following stack trace can be seen at driver probe time:
      
      [    2.269117] the code is fine but needs lockdep annotation.
      [    2.274569] turning off the locking correctness validator.
      [    2.280027] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.3.0-rc7-01478-g01eaa67a4797 #263
      [    2.288073] Hardware name: Freescale LS1021A
      [    2.292337] [<c0313cb4>] (unwind_backtrace) from [<c030e11c>] (show_stack+0x10/0x14)
      [    2.300045] [<c030e11c>] (show_stack) from [<c1219440>] (dump_stack+0xcc/0xf8)
      [    2.307235] [<c1219440>] (dump_stack) from [<c03b9b44>] (register_lock_class+0x730/0x73c)
      [    2.315372] [<c03b9b44>] (register_lock_class) from [<c03b6190>] (__lock_acquire+0x78/0x270c)
      [    2.323856] [<c03b6190>] (__lock_acquire) from [<c03b90cc>] (lock_acquire+0xe0/0x22c)
      [    2.331649] [<c03b90cc>] (lock_acquire) from [<c123c310>] (_raw_spin_lock_irqsave+0x54/0x68)
      [    2.340048] [<c123c310>] (_raw_spin_lock_irqsave) from [<c0e73fe4>] (ptp_qoriq_settime+0x38/0x80)
      [    2.348878] [<c0e73fe4>] (ptp_qoriq_settime) from [<c0e746d4>] (ptp_qoriq_init+0x1f8/0x484)
      [    2.357189] [<c0e746d4>] (ptp_qoriq_init) from [<c0e74aac>] (ptp_qoriq_probe+0xd0/0x184)
      [    2.365243] [<c0e74aac>] (ptp_qoriq_probe) from [<c0b0a07c>] (platform_drv_probe+0x48/0x9c)
      [    2.373555] [<c0b0a07c>] (platform_drv_probe) from [<c0b07a14>] (really_probe+0x1c4/0x400)
      [    2.381779] [<c0b07a14>] (really_probe) from [<c0b07e28>] (driver_probe_device+0x78/0x1b8)
      [    2.390003] [<c0b07e28>] (driver_probe_device) from [<c0b081d0>] (device_driver_attach+0x58/0x60)
      [    2.398832] [<c0b081d0>] (device_driver_attach) from [<c0b082d4>] (__driver_attach+0xfc/0x160)
      [    2.407402] [<c0b082d4>] (__driver_attach) from [<c0b05a84>] (bus_for_each_dev+0x68/0xb4)
      [    2.415539] [<c0b05a84>] (bus_for_each_dev) from [<c0b06b68>] (bus_add_driver+0x104/0x20c)
      [    2.423763] [<c0b06b68>] (bus_add_driver) from [<c0b0909c>] (driver_register+0x78/0x10c)
      [    2.431815] [<c0b0909c>] (driver_register) from [<c030313c>] (do_one_initcall+0x8c/0x3ac)
      [    2.439954] [<c030313c>] (do_one_initcall) from [<c1f013f4>] (kernel_init_freeable+0x468/0x548)
      [    2.448610] [<c1f013f4>] (kernel_init_freeable) from [<c12344d8>] (kernel_init+0x8/0x10c)
      [    2.456745] [<c12344d8>] (kernel_init) from [<c03010b4>] (ret_from_fork+0x14/0x20)
      [    2.464273] Exception stack(0xea89ffb0 to 0xea89fff8)
      [    2.469297] ffa0:                                     00000000 00000000 00000000 00000000
      [    2.477432] ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
      [    2.485566] ffe0: 00000000 00000000 00000000 00000000 00000013 00000000
      
      Fixes: ff54571a ("ptp_qoriq: convert to use ptp_qoriq_init/free")
      Signed-off-by: default avatarVladimir Oltean <olteanv@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      db34a471
    • David S. Miller's avatar
      Merge branch 'SJA1105-DSA-locking-fixes-for-PTP' · 76d67494
      David S. Miller authored
      Vladimir Oltean says:
      
      ====================
      SJA1105 DSA locking fixes for PTP
      
      This series fixes the locking API usage problems spotted when compiling
      the kernel with CONFIG_DEBUG_ATOMIC_SLEEP=y and CONFIG_DEBUG_SPINLOCK=y.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      76d67494
    • Vladimir Oltean's avatar
      net: dsa: sja1105: Fix sleeping while atomic in .port_hwtstamp_set · 3e8db7e5
      Vladimir Oltean authored
      Currently this stack trace can be seen with CONFIG_DEBUG_ATOMIC_SLEEP=y:
      
      [   41.568348] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:909
      [   41.576757] in_atomic(): 1, irqs_disabled(): 0, pid: 208, name: ptp4l
      [   41.583212] INFO: lockdep is turned off.
      [   41.587123] CPU: 1 PID: 208 Comm: ptp4l Not tainted 5.3.0-rc6-01445-ge950f2d4bc7f-dirty #1827
      [   41.599873] [<c0313d7c>] (unwind_backtrace) from [<c030e13c>] (show_stack+0x10/0x14)
      [   41.607584] [<c030e13c>] (show_stack) from [<c1212d50>] (dump_stack+0xd4/0x100)
      [   41.614863] [<c1212d50>] (dump_stack) from [<c037dfc8>] (___might_sleep+0x1c8/0x2b4)
      [   41.622574] [<c037dfc8>] (___might_sleep) from [<c122ea90>] (__mutex_lock+0x48/0xab8)
      [   41.630368] [<c122ea90>] (__mutex_lock) from [<c122f51c>] (mutex_lock_nested+0x1c/0x24)
      [   41.638340] [<c122f51c>] (mutex_lock_nested) from [<c0c6fe08>] (sja1105_static_config_reload+0x30/0x27c)
      [   41.647779] [<c0c6fe08>] (sja1105_static_config_reload) from [<c0c7015c>] (sja1105_hwtstamp_set+0x108/0x1cc)
      [   41.657562] [<c0c7015c>] (sja1105_hwtstamp_set) from [<c0feb650>] (dev_ifsioc+0x18c/0x330)
      [   41.665788] [<c0feb650>] (dev_ifsioc) from [<c0febbd8>] (dev_ioctl+0x320/0x6e8)
      [   41.673064] [<c0febbd8>] (dev_ioctl) from [<c0f8b1f4>] (sock_ioctl+0x334/0x5e8)
      [   41.680340] [<c0f8b1f4>] (sock_ioctl) from [<c05404a8>] (do_vfs_ioctl+0xb0/0xa10)
      [   41.687789] [<c05404a8>] (do_vfs_ioctl) from [<c0540e3c>] (ksys_ioctl+0x34/0x58)
      [   41.695151] [<c0540e3c>] (ksys_ioctl) from [<c0301000>] (ret_fast_syscall+0x0/0x28)
      [   41.702768] Exception stack(0xe8495fa8 to 0xe8495ff0)
      [   41.707796] 5fa0:                   beff4a8c 00000001 00000011 000089b0 beff4a8c beff4a80
      [   41.715933] 5fc0: beff4a8c 00000001 0000000c 00000036 b6fa98c8 004e19c1 00000001 00000000
      [   41.724069] 5fe0: 004dcedc beff4a6c 004c0738 b6e7af4c
      [   41.729860] BUG: scheduling while atomic: ptp4l/208/0x00000002
      [   41.735682] INFO: lockdep is turned off.
      
      Enabling RX timestamping will logically disturb the fastpath (processing
      of meta frames). Replace bool hwts_rx_en with a bit that is checked
      atomically from the fastpath and temporarily unset from the sleepable
      context during a change of the RX timestamping process (a destructive
      operation anyways, requires switch reset).
      If found unset, the fastpath (net/dsa/tag_sja1105.c) will just drop any
      received meta frame and not take the meta_lock at all.
      
      Fixes: a602afd2 ("net: dsa: sja1105: Expose PTP timestamping ioctls to userspace")
      Signed-off-by: default avatarVladimir Oltean <olteanv@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3e8db7e5
    • Vladimir Oltean's avatar
      net: dsa: sja1105: Initialize the meta_lock · d6530e5a
      Vladimir Oltean authored
      Otherwise, with CONFIG_DEBUG_SPINLOCK=y, this stack trace gets printed
      when enabling RX timestamping and receiving a PTP frame:
      
      [  318.537078] INFO: trying to register non-static key.
      [  318.542040] the code is fine but needs lockdep annotation.
      [  318.547500] turning off the locking correctness validator.
      [  318.552972] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-13257-g0825b0669811-dirty #1962
      [  318.561283] Hardware name: Freescale LS1021A
      [  318.565566] [<c03144bc>] (unwind_backtrace) from [<c030e164>] (show_stack+0x10/0x14)
      [  318.573289] [<c030e164>] (show_stack) from [<c11b9f50>] (dump_stack+0xd4/0x100)
      [  318.580579] [<c11b9f50>] (dump_stack) from [<c03b9b40>] (register_lock_class+0x728/0x734)
      [  318.588731] [<c03b9b40>] (register_lock_class) from [<c03b60c4>] (__lock_acquire+0x78/0x25cc)
      [  318.597227] [<c03b60c4>] (__lock_acquire) from [<c03b8ef8>] (lock_acquire+0xd8/0x234)
      [  318.605033] [<c03b8ef8>] (lock_acquire) from [<c11db934>] (_raw_spin_lock+0x44/0x54)
      [  318.612755] [<c11db934>] (_raw_spin_lock) from [<c1164370>] (sja1105_rcv+0x1f8/0x4e8)
      [  318.620561] [<c1164370>] (sja1105_rcv) from [<c115d7cc>] (dsa_switch_rcv+0x80/0x204)
      [  318.628283] [<c115d7cc>] (dsa_switch_rcv) from [<c0f58c80>] (__netif_receive_skb_one_core+0x50/0x6c)
      [  318.637386] [<c0f58c80>] (__netif_receive_skb_one_core) from [<c0f58f04>] (netif_receive_skb_internal+0xac/0x264)
      [  318.647611] [<c0f58f04>] (netif_receive_skb_internal) from [<c0f59e98>] (napi_gro_receive+0x1d8/0x338)
      [  318.656887] [<c0f59e98>] (napi_gro_receive) from [<c0c298a4>] (gfar_clean_rx_ring+0x328/0x724)
      [  318.665472] [<c0c298a4>] (gfar_clean_rx_ring) from [<c0c29e60>] (gfar_poll_rx_sq+0x34/0x94)
      [  318.673795] [<c0c29e60>] (gfar_poll_rx_sq) from [<c0f5b40c>] (net_rx_action+0x128/0x4f8)
      [  318.681860] [<c0f5b40c>] (net_rx_action) from [<c03022f0>] (__do_softirq+0x148/0x5ac)
      [  318.689666] [<c03022f0>] (__do_softirq) from [<c0355af4>] (irq_exit+0x160/0x170)
      [  318.697040] [<c0355af4>] (irq_exit) from [<c03c6818>] (__handle_domain_irq+0x60/0xb4)
      [  318.704847] [<c03c6818>] (__handle_domain_irq) from [<c07e9440>] (gic_handle_irq+0x58/0x9c)
      [  318.713172] [<c07e9440>] (gic_handle_irq) from [<c0301a70>] (__irq_svc+0x70/0x98)
      [  318.720622] Exception stack(0xc2001f18 to 0xc2001f60)
      [  318.725656] 1f00:                                                       00000001 00000006
      [  318.733805] 1f20: 00000000 c20165c0 ffffe000 c2010cac c2010cf4 00000001 00000000 c2010c88
      [  318.741955] 1f40: c1f7a5a8 00000000 00000000 c2001f68 c03ba140 c030a288 200e0013 ffffffff
      [  318.750110] [<c0301a70>] (__irq_svc) from [<c030a288>] (arch_cpu_idle+0x24/0x3c)
      [  318.757486] [<c030a288>] (arch_cpu_idle) from [<c038a480>] (do_idle+0x1b8/0x2a4)
      [  318.764859] [<c038a480>] (do_idle) from [<c038a94c>] (cpu_startup_entry+0x18/0x1c)
      [  318.772407] [<c038a94c>] (cpu_startup_entry) from [<c1e00f10>] (start_kernel+0x4cc/0x4fc)
      
      Fixes: 844d7edc ("net: dsa: sja1105: Add a global sja1105_tagger_data structure")
      Signed-off-by: default avatarVladimir Oltean <olteanv@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d6530e5a
    • Dotan Barak's avatar
      net/rds: Fix error handling in rds_ib_add_one() · d64bf89a
      Dotan Barak authored
      rds_ibdev:ipaddr_list and rds_ibdev:conn_list are initialized
      after allocation some resources such as protection domain.
      If allocation of such resources fail, then these uninitialized
      variables are accessed in rds_ib_dev_free() in failure path. This
      can potentially crash the system. The code has been updated to
      initialize these variables very early in the function.
      Signed-off-by: default avatarDotan Barak <dotanb@dev.mellanox.co.il>
      Signed-off-by: default avatarSudhakar Dindukurti <sudhakar.dindukurti@oracle.com>
      Acked-by: default avatarSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d64bf89a
    • Linus Walleij's avatar
      net: dsa: rtl8366: Check VLAN ID and not ports · e8521e53
      Linus Walleij authored
      There has been some confusion between the port number and
      the VLAN ID in this driver. What we need to check for
      validity is the VLAN ID, nothing else.
      
      The current confusion came from assigning a few default
      VLANs for default routing and we need to rewrite that
      properly.
      
      Instead of checking if the port number is a valid VLAN
      ID, check the actual VLAN IDs passed in to the callback
      one by one as expected.
      
      Fixes: d8652956 ("net: dsa: realtek-smi: Add Realtek SMI driver")
      Signed-off-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e8521e53
    • Michal Kubecek's avatar
      mlx5: avoid 64-bit division in dr_icm_pool_mr_create() · 8b6b82ad
      Michal Kubecek authored
      Recently added code introduces 64-bit division in dr_icm_pool_mr_create()
      so that build on 32-bit architectures fails with
      
        ERROR: "__umoddi3" [drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.ko] undefined!
      
      As the divisor is always a power of 2, we can use bitwise operation
      instead.
      
      Fixes: 29cf8feb ("net/mlx5: DR, ICM pool memory allocator")
      Reported-by: default avatarBorislav Petkov <bp@alien8.de>
      Signed-off-by: default avatarMichal Kubecek <mkubecek@suse.cz>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8b6b82ad
    • Tuong Lien's avatar
      tipc: fix unlimited bundling of small messages · e95584a8
      Tuong Lien authored
      We have identified a problem with the "oversubscription" policy in the
      link transmission code.
      
      When small messages are transmitted, and the sending link has reached
      the transmit window limit, those messages will be bundled and put into
      the link backlog queue. However, bundles of data messages are counted
      at the 'CRITICAL' level, so that the counter for that level, instead of
      the counter for the real, bundled message's level is the one being
      increased.
      Subsequent, to-be-bundled data messages at non-CRITICAL levels continue
      to be tested against the unchanged counter for their own level, while
      contributing to an unrestrained increase at the CRITICAL backlog level.
      
      This leaves a gap in congestion control algorithm for small messages
      that can result in starvation for other users or a "real" CRITICAL
      user. Even that eventually can lead to buffer exhaustion & link reset.
      
      We fix this by keeping a 'target_bskb' buffer pointer at each levels,
      then when bundling, we only bundle messages at the same importance
      level only. This way, we know exactly how many slots a certain level
      have occupied in the queue, so can manage level congestion accurately.
      
      By bundling messages at the same level, we even have more benefits. Let
      consider this:
      - One socket sends 64-byte messages at the 'CRITICAL' level;
      - Another sends 4096-byte messages at the 'LOW' level;
      
      When a 64-byte message comes and is bundled the first time, we put the
      overhead of message bundle to it (+ 40-byte header, data copy, etc.)
      for later use, but the next message can be a 4096-byte one that cannot
      be bundled to the previous one. This means the last bundle carries only
      one payload message which is totally inefficient, as for the receiver
      also! Later on, another 64-byte message comes, now we make a new bundle
      and the same story repeats...
      
      With the new bundling algorithm, this will not happen, the 64-byte
      messages will be bundled together even when the 4096-byte message(s)
      comes in between. However, if the 4096-byte messages are sent at the
      same level i.e. 'CRITICAL', the bundling algorithm will again cause the
      same overhead.
      
      Also, the same will happen even with only one socket sending small
      messages at a rate close to the link transmit's one, so that, when one
      message is bundled, it's transmitted shortly. Then, another message
      comes, a new bundle is created and so on...
      
      We will solve this issue radically by another patch.
      
      Fixes: 365ad353 ("tipc: reduce risk of user starvation during link congestion")
      Reported-by: default avatarHoang Le <hoang.h.le@dektech.com.au>
      Acked-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarTuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e95584a8
    • Dongli Zhang's avatar
      xen-netfront: do not use ~0U as error return value for xennet_fill_frags() · a761129e
      Dongli Zhang authored
      xennet_fill_frags() uses ~0U as return value when the sk_buff is not able
      to cache extra fragments. This is incorrect because the return type of
      xennet_fill_frags() is RING_IDX and 0xffffffff is an expected value for
      ring buffer index.
      
      In the situation when the rsp_cons is approaching 0xffffffff, the return
      value of xennet_fill_frags() may become 0xffffffff which xennet_poll() (the
      caller) would regard as error. As a result, queue->rx.rsp_cons is set
      incorrectly because it is updated only when there is error. If there is no
      error, xennet_poll() would be responsible to update queue->rx.rsp_cons.
      Finally, queue->rx.rsp_cons would point to the rx ring buffer entries whose
      queue->rx_skbs[i] and queue->grant_rx_ref[i] are already cleared to NULL.
      This leads to NULL pointer access in the next iteration to process rx ring
      buffer entries.
      
      The symptom is similar to the one fixed in
      commit 00b36850 ("xen-netfront: do not assume sk_buff_head list is
      empty in error handling").
      
      This patch changes the return type of xennet_fill_frags() to indicate
      whether it is successful or failed. The queue->rx.rsp_cons will be
      always updated inside this function.
      
      Fixes: ad4f15dc ("xen/netfront: don't bug in case of too many frags")
      Signed-off-by: default avatarDongli Zhang <dongli.zhang@oracle.com>
      Reviewed-by: default avatarJuergen Gross <jgross@suse.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a761129e
    • David Ahern's avatar
      ipv6: Handle race in addrconf_dad_work · a3ce2a21
      David Ahern authored
      Rajendra reported a kernel panic when a link was taken down:
      
      [ 6870.263084] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
      [ 6870.271856] IP: [<ffffffff8efc5764>] __ipv6_ifa_notify+0x154/0x290
      
      <snip>
      
      [ 6870.570501] Call Trace:
      [ 6870.573238] [<ffffffff8efc58c6>] ? ipv6_ifa_notify+0x26/0x40
      [ 6870.579665] [<ffffffff8efc98ec>] ? addrconf_dad_completed+0x4c/0x2c0
      [ 6870.586869] [<ffffffff8efe70c6>] ? ipv6_dev_mc_inc+0x196/0x260
      [ 6870.593491] [<ffffffff8efc9c6a>] ? addrconf_dad_work+0x10a/0x430
      [ 6870.600305] [<ffffffff8f01ade4>] ? __switch_to_asm+0x34/0x70
      [ 6870.606732] [<ffffffff8ea93a7a>] ? process_one_work+0x18a/0x430
      [ 6870.613449] [<ffffffff8ea93d6d>] ? worker_thread+0x4d/0x490
      [ 6870.619778] [<ffffffff8ea93d20>] ? process_one_work+0x430/0x430
      [ 6870.626495] [<ffffffff8ea99dd9>] ? kthread+0xd9/0xf0
      [ 6870.632145] [<ffffffff8f01ade4>] ? __switch_to_asm+0x34/0x70
      [ 6870.638573] [<ffffffff8ea99d00>] ? kthread_park+0x60/0x60
      [ 6870.644707] [<ffffffff8f01ae77>] ? ret_from_fork+0x57/0x70
      [ 6870.650936] Code: 31 c0 31 d2 41 b9 20 00 08 02 b9 09 00 00 0
      
      addrconf_dad_work is kicked to be scheduled when a device is brought
      up. There is a race between addrcond_dad_work getting scheduled and
      taking the rtnl lock and a process taking the link down (under rtnl).
      The latter removes the host route from the inet6_addr as part of
      addrconf_ifdown which is run for NETDEV_DOWN. The former attempts
      to use the host route in ipv6_ifa_notify. If the down event removes
      the host route due to the race to the rtnl, then the BUG listed above
      occurs.
      
      This scenario does not occur when the ipv6 address is not kept
      (net.ipv6.conf.all.keep_addr_on_down = 0) as addrconf_ifdown sets the
      state of the ifp to DEAD. Handle when the addresses are kept by checking
      IF_READY which is reset by addrconf_ifdown.
      
      The 'dead' flag for an inet6_addr is set only under rtnl, in
      addrconf_ifdown and it means the device is getting removed (or IPv6 is
      disabled). The interesting cases for changing the idev flag are
      addrconf_notify (NETDEV_UP and NETDEV_CHANGE) and addrconf_ifdown
      (reset the flag). The former does not have the idev lock - only rtnl;
      the latter has both. Based on that the existing dead + IF_READY check
      can be moved to right after the rtnl_lock in addrconf_dad_work.
      
      Fixes: f1705ec1 ("net: ipv6: Make address flushing on ifdown optional")
      Reported-by: default avatarRajendra Dendukuri <rajendra.dendukuri@broadcom.com>
      Signed-off-by: default avatarDavid Ahern <dsahern@gmail.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a3ce2a21
    • Eric Dumazet's avatar
      tcp: adjust rto_base in retransmits_timed_out() · 3256a2d6
      Eric Dumazet authored
      The cited commit exposed an old retransmits_timed_out() bug
      which assumed it could call tcp_model_timeout() with
      TCP_RTO_MIN as rto_base for all states.
      
      But flows in SYN_SENT or SYN_RECV state uses a different
      RTO base (1 sec instead of 200 ms, unless BPF choses
      another value)
      
      This caused a reduction of SYN retransmits from 6 to 4 with
      the default /proc/sys/net/ipv4/tcp_syn_retries value.
      
      Fixes: a41e8a88 ("tcp: better handle TCP_USER_TIMEOUT in SYN_SENT state")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Marek Majkowski <marek@cloudflare.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3256a2d6
    • Dexuan Cui's avatar
      vsock: Fix a lockdep warning in __vsock_release() · 0d9138ff
      Dexuan Cui authored
      Lockdep is unhappy if two locks from the same class are held.
      
      Fix the below warning for hyperv and virtio sockets (vmci socket code
      doesn't have the issue) by using lock_sock_nested() when __vsock_release()
      is called recursively:
      
      ============================================
      WARNING: possible recursive locking detected
      5.3.0+ #1 Not tainted
      --------------------------------------------
      server/1795 is trying to acquire lock:
      ffff8880c5158990 (sk_lock-AF_VSOCK){+.+.}, at: hvs_release+0x10/0x120 [hv_sock]
      
      but task is already holding lock:
      ffff8880c5158150 (sk_lock-AF_VSOCK){+.+.}, at: __vsock_release+0x2e/0xf0 [vsock]
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(sk_lock-AF_VSOCK);
        lock(sk_lock-AF_VSOCK);
      
       *** DEADLOCK ***
      
       May be due to missing lock nesting notation
      
      2 locks held by server/1795:
       #0: ffff8880c5d05ff8 (&sb->s_type->i_mutex_key#10){+.+.}, at: __sock_release+0x2d/0xa0
       #1: ffff8880c5158150 (sk_lock-AF_VSOCK){+.+.}, at: __vsock_release+0x2e/0xf0 [vsock]
      
      stack backtrace:
      CPU: 5 PID: 1795 Comm: server Not tainted 5.3.0+ #1
      Call Trace:
       dump_stack+0x67/0x90
       __lock_acquire.cold.67+0xd2/0x20b
       lock_acquire+0xb5/0x1c0
       lock_sock_nested+0x6d/0x90
       hvs_release+0x10/0x120 [hv_sock]
       __vsock_release+0x24/0xf0 [vsock]
       __vsock_release+0xa0/0xf0 [vsock]
       vsock_release+0x12/0x30 [vsock]
       __sock_release+0x37/0xa0
       sock_close+0x14/0x20
       __fput+0xc1/0x250
       task_work_run+0x98/0xc0
       do_exit+0x344/0xc60
       do_group_exit+0x47/0xb0
       get_signal+0x15c/0xc50
       do_signal+0x30/0x720
       exit_to_usermode_loop+0x50/0xa0
       do_syscall_64+0x24e/0x270
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x7f4184e85f31
      Tested-by: default avatarStefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: default avatarDexuan Cui <decui@microsoft.com>
      Reviewed-by: default avatarStefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0d9138ff
    • Johan Hovold's avatar
      hso: fix NULL-deref on tty open · 8353da9f
      Johan Hovold authored
      Fix NULL-pointer dereference on tty open due to a failure to handle a
      missing interrupt-in endpoint when probing modem ports:
      
      	BUG: kernel NULL pointer dereference, address: 0000000000000006
      	...
      	RIP: 0010:tiocmget_submit_urb+0x1c/0xe0 [hso]
      	...
      	Call Trace:
      	hso_start_serial_device+0xdc/0x140 [hso]
      	hso_serial_open+0x118/0x1b0 [hso]
      	tty_open+0xf1/0x490
      
      Fixes: 542f5482 ("tty: Modem functions for the HSO driver")
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8353da9f
  5. 01 Oct, 2019 1 commit
    • Oleksij Rempel's avatar
      net: ag71xx: fix mdio subnode support · 569aad4f
      Oleksij Rempel authored
      This patch is syncing driver with actual devicetree documentation:
      Documentation/devicetree/bindings/net/qca,ar71xx.txt
      |Optional subnodes:
      |- mdio : specifies the mdio bus, used as a container for phy nodes
      |  according to phy.txt in the same directory
      
      The driver was working with fixed phy without any noticeable issues. This bug
      was uncovered by introducing dsa ar9331-switch driver.
      Since no one reported this bug until now, I assume no body is using it
      and this patch should not brake existing system.
      
      Fixes: d51b6ce4 ("net: ethernet: add ag71xx driver")
      Signed-off-by: default avatarOleksij Rempel <o.rempel@pengutronix.de>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      569aad4f