1. 16 Oct, 2019 9 commits
    • net: avoid potential infinite loop in tc_ctl_action() · 39f13ea2
      Eric Dumazet authored
      tc_ctl_action() has the ability to loop forever if tcf_action_add()
      returns -EAGAIN.
      
      This special case has been done in case a module needed to be loaded,
      but it turns out that tcf_add_notify() could also return -EAGAIN
      if the socket sk_rcvbuf limit is hit.
      
      We need to separate the two cases, and only loop for the module
      loading case.
      
      While we are at it, add a limit of 10 attempts since unbounded
      loops are always scary.
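A minimal userspace sketch of the retry policy described above, not the kernel code itself. The `module_loaded` flag, the `add_fn` callback type, and `MAX_REPLAY_ATTEMPTS` are assumptions introduced for illustration; the only facts taken from the text are that -EAGAIN should trigger a retry solely in the module-loading case and that the loop is capped at 10 attempts.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_REPLAY_ATTEMPTS 10	/* hypothetical name for the cap of 10 */

/* Hypothetical stand-in for the action-add step. It sets *module_loaded
 * when a module had to be loaded and returns -EAGAIN in that case. It may
 * also return -EAGAIN for an unrelated reason (e.g. a full receive buffer).
 */
typedef int (*add_fn)(void *ctx, bool *module_loaded);

/* Retry only when the callee reports that a module was loaded, and never
 * more than MAX_REPLAY_ATTEMPTS times in total.
 */
static int add_with_bounded_replay(add_fn add, void *ctx, int *calls)
{
	int ret = -EAGAIN;
	int attempts;

	*calls = 0;
	for (attempts = 0; attempts < MAX_REPLAY_ATTEMPTS; attempts++) {
		bool module_loaded = false;

		ret = add(ctx, &module_loaded);
		(*calls)++;
		if (ret != -EAGAIN || !module_loaded)
			break;	/* success, hard error, or non-module -EAGAIN */
	}
	return ret;
}

/* Two toy callees used to exercise the loop. */
static int always_rcvbuf_full(void *ctx, bool *module_loaded)
{
	(void)ctx;
	*module_loaded = false;
	return -EAGAIN;		/* notification failed, no module involved */
}

static int always_needs_module(void *ctx, bool *module_loaded)
{
	(void)ctx;
	*module_loaded = true;
	return -EAGAIN;		/* pathological: module load never settles */
}
```

With `always_rcvbuf_full` the helper gives up after a single call, and with `always_needs_module` it stops after ten calls instead of spinning forever.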
      
      syzbot repro was something like :
      
      socket(PF_NETLINK, SOCK_RAW|SOCK_NONBLOCK, NETLINK_ROUTE) = 3
      write(3, ..., 38) = 38
      setsockopt(3, SOL_SOCKET, SO_RCVBUF, [0], 4) = 0
      sendmsg(3, {msg_name(0)=NULL, msg_iov(1)=[{..., 388}], msg_controllen=0, msg_flags=0x10}, ...)
      
      NMI backtrace for cpu 0
      CPU: 0 PID: 1054 Comm: khungtaskd Not tainted 5.4.0-rc1+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x172/0x1f0 lib/dump_stack.c:113
       nmi_cpu_backtrace.cold+0x70/0xb2 lib/nmi_backtrace.c:101
       nmi_trigger_cpumask_backtrace+0x23b/0x28b lib/nmi_backtrace.c:62
       arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
       trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline]
       check_hung_uninterruptible_tasks kernel/hung_task.c:205 [inline]
       watchdog+0x9d0/0xef0 kernel/hung_task.c:289
       kthread+0x361/0x430 kernel/kthread.c:255
       ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
      Sending NMI from CPU 0 to CPUs 1:
      NMI backtrace for cpu 1
      CPU: 1 PID: 8859 Comm: syz-executor910 Not tainted 5.4.0-rc1+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:arch_local_save_flags arch/x86/include/asm/paravirt.h:751 [inline]
      RIP: 0010:lockdep_hardirqs_off+0x1df/0x2e0 kernel/locking/lockdep.c:3453
      Code: 5c 08 00 00 5b 41 5c 41 5d 5d c3 48 c7 c0 58 1d f3 88 48 ba 00 00 00 00 00 fc ff df 48 c1 e8 03 80 3c 10 00 0f 85 d3 00 00 00 <48> 83 3d 21 9e 99 07 00 0f 84 b9 00 00 00 9c 58 0f 1f 44 00 00 f6
      RSP: 0018:ffff8880a6f3f1b8 EFLAGS: 00000046
      RAX: 1ffffffff11e63ab RBX: ffff88808c9c6080 RCX: 0000000000000000
      RDX: dffffc0000000000 RSI: 0000000000000000 RDI: ffff88808c9c6914
      RBP: ffff8880a6f3f1d0 R08: ffff88808c9c6080 R09: fffffbfff16be5d1
      R10: fffffbfff16be5d0 R11: 0000000000000003 R12: ffffffff8746591f
      R13: ffff88808c9c6080 R14: ffffffff8746591f R15: 0000000000000003
      FS:  00000000011e4880(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: ffffffffff600400 CR3: 00000000a8920000 CR4: 00000000001406e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       trace_hardirqs_off+0x62/0x240 kernel/trace/trace_preemptirq.c:45
       __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:108 [inline]
       _raw_spin_lock_irqsave+0x6f/0xcd kernel/locking/spinlock.c:159
       __wake_up_common_lock+0xc8/0x150 kernel/sched/wait.c:122
       __wake_up+0xe/0x10 kernel/sched/wait.c:142
       netlink_unlock_table net/netlink/af_netlink.c:466 [inline]
       netlink_unlock_table net/netlink/af_netlink.c:463 [inline]
       netlink_broadcast_filtered+0x705/0xb80 net/netlink/af_netlink.c:1514
       netlink_broadcast+0x3a/0x50 net/netlink/af_netlink.c:1534
       rtnetlink_send+0xdd/0x110 net/core/rtnetlink.c:714
       tcf_add_notify net/sched/act_api.c:1343 [inline]
       tcf_action_add+0x243/0x370 net/sched/act_api.c:1362
       tc_ctl_action+0x3b5/0x4bc net/sched/act_api.c:1410
       rtnetlink_rcv_msg+0x463/0xb00 net/core/rtnetlink.c:5386
       netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
       rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5404
       netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
       netlink_unicast+0x531/0x710 net/netlink/af_netlink.c:1328
       netlink_sendmsg+0x8a5/0xd60 net/netlink/af_netlink.c:1917
       sock_sendmsg_nosec net/socket.c:637 [inline]
       sock_sendmsg+0xd7/0x130 net/socket.c:657
       ___sys_sendmsg+0x803/0x920 net/socket.c:2311
       __sys_sendmsg+0x105/0x1d0 net/socket.c:2356
       __do_sys_sendmsg net/socket.c:2365 [inline]
       __se_sys_sendmsg net/socket.c:2363 [inline]
       __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2363
       do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x440939
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot+cf0adbb9c28c8866c788@syzkaller.appspotmail.com
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: sja1105: Use the correct style for SPDX License Identifier · b790b554
      Nishad Kamdar authored
      This patch corrects the SPDX License Identifier style
      in header files related to Distributed Switch Architecture
      drivers for NXP SJA1105 series Ethernet switch support.
      It uses an explicit block comment for the SPDX License
      Identifier.
      
      Changes made by using a script provided by Joe Perches here:
      https://lkml.org/lkml/2019/2/7/46.
      Suggested-by: Joe Perches <joe@perches.com>
      Signed-off-by: Nishad Kamdar <nishadkamdar@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: fix a possible lockdep splat in tcp_done() · cab209e5
      Eric Dumazet authored
      syzbot found that if __inet_inherit_port() returns an error,
      we call tcp_done() after inet_csk_prepare_forced_close(),
      meaning the socket lock is no longer held.
      
      We might fix this in a different way in net-next, but
      for 5.4 it seems safer to relax the lockdep check.
      
      Fixes: d983ea6f ("tcp: add rcu protection around tp->fastopen_rsk")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'Update-MT7629-to-support-PHYLINK-API' · c9b96eb6
      David S. Miller authored
      MarkLee says:
      
      ====================
      Update MT7629 to support PHYLINK API
      
      This patch set has two goals:
      	1. Fix the mt7629 GMII mode issue that appears after applying
      	   the mediatek PHYLINK support patch.
      	2. Update the mt7629 dts to reflect the latest dt-binding
      	   with PHYLINK support.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • arm: dts: mediatek: Update mt7629 dts to reflect the latest dt-binding · 2618500d
      MarkLee authored
      * Remove the mediatek,physpeed property from the dtsi; it has no effect with PHYLINK.
      * Use the fixed-link property speed = <2500> to set the phy to 2.5 Gbit.
      * Set gmac1 to gmii mode, since it connects to an internal gphy.
      Signed-off-by: MarkLee <Mark-MC.Lee@mediatek.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ethernet: mediatek: Fix MT7629 missing GMII mode support · 4e3eff5b
      MarkLee authored
      In the original design, the mtk_phy_connect function set ge_mode=1
      if phy-mode was GMII (PHY_INTERFACE_MODE_GMII) and then wrote the correct
      ge_mode to the ETHSYS_SYSCFG0 register. This logic was broken by the
      mediatek PHYLINK patch (see the Fixes tag): the new mtk_mac_config function
      does not set ge_mode=1 for GMII mode, so the final ETHSYS_SYSCFG0 setting
      is incorrect for mt7629 GMII mode. This patch adds the missing logic
      back to fix it.
      
      Fixes: b8fc9f30 ("net: ethernet: mediatek: Add basic PHYLINK support")
      Signed-off-by: MarkLee <Mark-MC.Lee@mediatek.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mpls-push-pop-fix' · 8d045995
      David S. Miller authored
      Davide Caratti says:
      
      ====================
      net/sched: fix wrong behavior of MPLS push/pop action
      
      this series contains two fixes for TC 'act_mpls', that try to address
      two problems that can be observed configuring simple 'push' / 'pop'
      operations:
      - patch 1/2 avoids dropping non-MPLS packets that pass through the MPLS
        'pop' action.
      - patch 2/2 fixes corruption of the L2 header that occurs when 'push'
        or 'pop' actions are configured in TC egress path.
      
      v2: - change commit message in patch 1/2 to better describe that the
            patch impacts only TC, thanks to Simon Horman
          - fix missing documentation of 'mac_len' in patch 2/2
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/sched: fix corrupted L2 header with MPLS 'push' and 'pop' actions · fa4e0f88
      Davide Caratti authored
      the following script:
      
       # tc qdisc add dev eth0 clsact
       # tc filter add dev eth0 egress protocol ip matchall \
       > action mpls push protocol mpls_uc label 0x355aa bos 1
      
      causes corruption of all IP packets transmitted by eth0. On TC egress, we
      can't rely on the value of skb->mac_len, because it's 0 and an MPLS 'push'
      operation will result in an overwrite of the first 4 octets in the packet
      L2 header (e.g. the Destination Address if eth0 is an Ethernet); the same
      error pattern is present also in the MPLS 'pop' operation. Fix this error
      in act_mpls data plane, computing 'mac_len' as the difference between the
      network header and the mac header (when not at TC ingress), and use it in
      MPLS 'push'/'pop' core functions.
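The mac_len computation described above can be sketched in userspace as follows. This is an illustration under stated assumptions, not the act_mpls code: `struct pkt`, `pkt_mac_len()`, `mpls_lse()` and `mpls_push()` are hypothetical names, the frame is assumed to be Ethernet with the EtherType in the last two bytes of the L2 header, and the result is written to a separate output buffer rather than modified in place.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_P_MPLS_UC 0x8847	/* MPLS unicast EtherType */
#define MPLS_HLEN     4		/* size of one label stack entry */

/* Hypothetical, simplified view of a packet: offsets into one buffer. */
struct pkt {
	const uint8_t *data;
	size_t len;
	size_t mac_off;		/* start of the L2 header */
	size_t net_off;		/* start of the L3 header */
};

/* L2 header length derived from header offsets, not from a cached field. */
static size_t pkt_mac_len(const struct pkt *p)
{
	return p->net_off - p->mac_off;
}

/* Encode label, traffic class, bottom-of-stack bit and TTL into one entry. */
static uint32_t mpls_lse(uint32_t label, uint8_t tc, int bos, uint8_t ttl)
{
	return (label << 12) | ((uint32_t)(tc & 7) << 9) |
	       ((uint32_t)(bos ? 1 : 0) << 8) | ttl;
}

/* Copy the frame to 'out' with one label stack entry inserted between the
 * L2 and L3 headers. 'out' must hold p->len + MPLS_HLEN bytes. Returns the
 * new frame length.
 */
static size_t mpls_push(const struct pkt *p, uint32_t lse, uint8_t *out)
{
	size_t mac_len = pkt_mac_len(p);
	const uint8_t *frame = p->data + p->mac_off;
	size_t frame_len = p->len - p->mac_off;

	memcpy(out, frame, mac_len);			/* L2 header kept intact */
	out[mac_len - 2] = ETH_P_MPLS_UC >> 8;		/* rewrite EtherType */
	out[mac_len - 1] = ETH_P_MPLS_UC & 0xff;
	out[mac_len + 0] = (uint8_t)(lse >> 24);	/* label stack entry */
	out[mac_len + 1] = (uint8_t)(lse >> 16);
	out[mac_len + 2] = (uint8_t)(lse >> 8);
	out[mac_len + 3] = (uint8_t)lse;
	memcpy(out + mac_len + MPLS_HLEN, frame + mac_len, frame_len - mac_len);
	return frame_len + MPLS_HLEN;
}
```

Because the insertion point is derived from the header offsets, the destination and source addresses are copied unchanged; with a length of 0 the label would instead land on the first four octets of the frame, which is the corruption reported here.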
      
      v2: unbreak 'make htmldocs' because of missing documentation of 'mac_len'
          in skb_mpls_pop(), reported by kbuild test robot
      
      CC: Lorenzo Bianconi <lorenzo@kernel.org>
      Fixes: 2a2ea508 ("net: sched: add mpls manipulation actions to TC")
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Acked-by: John Hurley <john.hurley@netronome.com>
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: avoid errors when trying to pop MPLS header on non-MPLS packets · dedc5a08
      Davide Caratti authored
      the following script:
      
       # tc qdisc add dev eth0 clsact
       # tc filter add dev eth0 egress matchall action mpls pop
      
      implicitly makes the kernel drop all packets transmitted by eth0, if they
      don't have an MPLS header. This behavior is uncommon: other encapsulations
      (like VLAN) just let the packet pass unmodified. Since the result of MPLS
      'pop' operation would be the same regardless of the presence / absence of
      MPLS header(s) in the original packet, we can let skb_mpls_pop() return 0
      when dealing with non-MPLS packets.
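A small sketch of the pass-through behaviour described above, under simplifying assumptions: the packet is reduced to an EtherType and a label count, and `eth_p_mpls_sketch()` and `mpls_pop_sketch()` are hypothetical names, not the kernel helpers. A packet carrying an MPLS EtherType is assumed to hold at least one label.

```c
#include <stdbool.h>
#include <stdint.h>

#define ETH_P_IP      0x0800	/* IPv4 */
#define ETH_P_MPLS_UC 0x8847	/* MPLS unicast */
#define ETH_P_MPLS_MC 0x8848	/* MPLS multicast */

static bool eth_p_mpls_sketch(uint16_t ethertype)
{
	return ethertype == ETH_P_MPLS_UC || ethertype == ETH_P_MPLS_MC;
}

/* Pop one label and switch the EtherType to next_proto. A non-MPLS packet
 * is left untouched and still reported as success (0), so the caller does
 * not drop it.
 */
static int mpls_pop_sketch(uint16_t *ethertype, unsigned int *labels,
			   uint16_t next_proto)
{
	if (!eth_p_mpls_sketch(*ethertype))
		return 0;	/* nothing to pop: pass through unmodified */

	(*labels)--;
	*ethertype = next_proto;
	return 0;
}
```

An IPv4 packet goes through unchanged with return value 0, while an MPLS packet loses one label and takes the EtherType supplied by the caller.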
      
      For the OVS use-case, this is acceptable because __ovs_nla_copy_actions()
      already ensures that MPLS 'pop' operation only occurs with packets having
      an MPLS Ethernet type (and there are no other callers in current code, so
      the semantic change should be ok).
      
      v2: better documentation of use-cases for skb_mpls_pop(), thanks to Simon
          Horman
      
      Fixes: 2a2ea508 ("net: sched: add mpls manipulation actions to TC")
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Acked-by: John Hurley <john.hurley@netronome.com>
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 15 Oct, 2019 12 commits
  3. 14 Oct, 2019 1 commit
    • ath10k: fix latency issue for QCA988x · d79749f7
      Miaoqing Pan authored
      (kvalo: cherry picked from commit 1340cc63 in
      wireless-drivers-next to wireless-drivers as this is a frequently reported
      regression)
      
      Bad latency was found on QCA988x. The issue was introduced by
      commit 4504f0e5 ("ath10k: sdio: workaround firmware UART
      pin configuration bug"): if uart_pin_workaround is false, that
      change sets the uart pin even if uart_print is false.
      
      Tested HW: QCA9880
      Tested FW: 10.2.4-1.0-00037
      
      Fixes: 4504f0e5 ("ath10k: sdio: workaround firmware UART pin configuration bug")
      Signed-off-by: Miaoqing Pan <miaoqing@codeaurora.org>
      Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
  4. 13 Oct, 2019 13 commits
    • netdevsim: Fix error handling in nsim_fib_init and nsim_fib_exit · 33902b4a
      YueHaibing authored
      In nsim_fib_init(), if register_fib_notifier() fails, nsim_fib_net_ops
      should be unregistered before returning.
      
      In nsim_fib_exit(), unregister_fib_notifier() should be called before
      nsim_fib_net_ops is unregistered; otherwise a use-after-free may occur:
      
      BUG: KASAN: use-after-free in nsim_fib_event_nb+0x342/0x570 [netdevsim]
      Read of size 8 at addr ffff8881daaf4388 by task kworker/0:3/3499
      
      CPU: 0 PID: 3499 Comm: kworker/0:3 Not tainted 5.3.0-rc7+ #30
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      Workqueue: ipv6_addrconf addrconf_dad_work [ipv6]
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0xa9/0x10e lib/dump_stack.c:113
       print_address_description+0x65/0x380 mm/kasan/report.c:351
       __kasan_report+0x149/0x18d mm/kasan/report.c:482
       kasan_report+0xe/0x20 mm/kasan/common.c:618
       nsim_fib_event_nb+0x342/0x570 [netdevsim]
       notifier_call_chain+0x52/0xf0 kernel/notifier.c:95
       __atomic_notifier_call_chain+0x78/0x140 kernel/notifier.c:185
       call_fib_notifiers+0x30/0x60 net/core/fib_notifier.c:30
       call_fib6_entry_notifiers+0xc1/0x100 [ipv6]
       fib6_add+0x92e/0x1b10 [ipv6]
       __ip6_ins_rt+0x40/0x60 [ipv6]
       ip6_ins_rt+0x84/0xb0 [ipv6]
       __ipv6_ifa_notify+0x4b6/0x550 [ipv6]
       ipv6_ifa_notify+0xa5/0x180 [ipv6]
       addrconf_dad_completed+0xca/0x640 [ipv6]
       addrconf_dad_work+0x296/0x960 [ipv6]
       process_one_work+0x5c0/0xc00 kernel/workqueue.c:2269
       worker_thread+0x5c/0x670 kernel/workqueue.c:2415
       kthread+0x1d7/0x200 kernel/kthread.c:255
       ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
      
      Allocated by task 3388:
       save_stack+0x19/0x80 mm/kasan/common.c:69
       set_track mm/kasan/common.c:77 [inline]
       __kasan_kmalloc.constprop.3+0xa0/0xd0 mm/kasan/common.c:493
       kmalloc include/linux/slab.h:557 [inline]
       kzalloc include/linux/slab.h:748 [inline]
       ops_init+0xa9/0x220 net/core/net_namespace.c:127
       __register_pernet_operations net/core/net_namespace.c:1135 [inline]
       register_pernet_operations+0x1d4/0x420 net/core/net_namespace.c:1212
       register_pernet_subsys+0x24/0x40 net/core/net_namespace.c:1253
       nsim_fib_init+0x12/0x70 [netdevsim]
       veth_get_link_ksettings+0x2b/0x50 [veth]
       do_one_initcall+0xd4/0x454 init/main.c:939
       do_init_module+0xe0/0x330 kernel/module.c:3490
       load_module+0x3c2f/0x4620 kernel/module.c:3841
       __do_sys_finit_module+0x163/0x190 kernel/module.c:3931
       do_syscall_64+0x72/0x2e0 arch/x86/entry/common.c:296
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Freed by task 3534:
       save_stack+0x19/0x80 mm/kasan/common.c:69
       set_track mm/kasan/common.c:77 [inline]
       __kasan_slab_free+0x130/0x180 mm/kasan/common.c:455
       slab_free_hook mm/slub.c:1423 [inline]
       slab_free_freelist_hook mm/slub.c:1474 [inline]
       slab_free mm/slub.c:3016 [inline]
       kfree+0xe9/0x2d0 mm/slub.c:3957
       ops_free net/core/net_namespace.c:151 [inline]
       ops_free_list.part.7+0x156/0x220 net/core/net_namespace.c:184
       ops_free_list net/core/net_namespace.c:182 [inline]
       __unregister_pernet_operations net/core/net_namespace.c:1165 [inline]
       unregister_pernet_operations+0x221/0x2a0 net/core/net_namespace.c:1224
       unregister_pernet_subsys+0x1d/0x30 net/core/net_namespace.c:1271
       nsim_fib_exit+0x11/0x20 [netdevsim]
       nsim_module_exit+0x16/0x21 [netdevsim]
       __do_sys_delete_module kernel/module.c:1015 [inline]
       __se_sys_delete_module kernel/module.c:958 [inline]
       __x64_sys_delete_module+0x244/0x330 kernel/module.c:958
       do_syscall_64+0x72/0x2e0 arch/x86/entry/common.c:296
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
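The two ordering rules stated at the top of this commit message (unwind on a failed init, tear down in reverse order on exit) can be sketched as below. All functions and flags here are hypothetical stand-ins that only record state; they are not the netdevsim or pernet APIs.

```c
#include <stdbool.h>

/* Hypothetical stand-ins that only record registration state. */
static bool ops_registered, notifier_registered;
static bool fail_notifier;	/* test knob: make notifier registration fail */

static int register_ops(void) { ops_registered = true; return 0; }
static void unregister_ops(void) { ops_registered = false; }

static int register_notifier(void)
{
	if (fail_notifier)
		return -1;
	notifier_registered = true;
	return 0;
}

static void unregister_notifier(void) { notifier_registered = false; }

static int sketch_init(void)
{
	int err;

	err = register_ops();
	if (err)
		return err;

	err = register_notifier();
	if (err)
		goto err_unregister_ops;	/* undo the step that succeeded */

	return 0;

err_unregister_ops:
	unregister_ops();
	return err;
}

static void sketch_exit(void)
{
	/* Reverse order of init: stop notifications first, so that no
	 * callback can touch state that unregister_ops() is about to free.
	 */
	unregister_notifier();
	unregister_ops();
}
```

The point of the reverse order is that the notifier callback reads per-namespace data owned by the ops registration, so the callback has to be gone before that data is released.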
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Fixes: 59c84b9f ("netdevsim: Restore per-network namespace accounting for fib entries")
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ibmvnic: Fix EOI when running in XIVE mode. · 11d49ce9
      Cédric Le Goater authored
      pSeries machines on POWER9 processors can run with the XICS (legacy)
      interrupt mode or with the XIVE exploitation interrupt mode. These
      interrupt controllers have different interfaces for interrupt
      management: XICS uses hcalls, while XIVE uses loads and stores on a page.
      Because H_EOI is a XICS interface, the enable_scrq_irq() routine can fail
      when the machine runs in XIVE mode.
      
      Fix that by calling the EOI handler of the interrupt chip.
      
      Fixes: f23e0643 ("ibmvnic: Clear pending interrupt after device reset")
      Signed-off-by: Cédric Le Goater <clg@kaod.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: lpc_eth: avoid resetting twice · c23936fa
      Alexandre Belloni authored
      __lpc_eth_shutdown is called after __lpc_eth_reset but it is already
      calling __lpc_eth_reset. Avoid resetting the IP twice.
      Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'tcp-address-KCSAN-reports-in-tcp_poll-part-I' · 3f233809
      David S. Miller authored
      Eric Dumazet says:
      
      ====================
      tcp: address KCSAN reports in tcp_poll() (part I)
      
      This all started with a KCSAN report (included
      in "tcp: annotate tp->rcv_nxt lockless reads" changelog)
      
      tcp_poll() runs in a lockless way. This means that nearly
      all accesses of tcp socket fields done in tcp_poll() context
      need annotations, otherwise KCSAN will complain about data-races.
      
      While doing this detective work, I found a more serious bug,
      addressed by the first patch ("tcp: add rcu protection around
      tp->fastopen_rsk").
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: annotate sk->sk_wmem_queued lockless reads · ab4e846a
      Eric Dumazet authored
      For the sake of tcp_poll(), there are a few places where we fetch
      sk->sk_wmem_queued while this field can change from IRQ or other cpu.
      
      We need to add READ_ONCE() annotations, and also make sure write
      sides use corresponding WRITE_ONCE() to avoid store-tearing.
      
      sk_wmem_queued_add() helper is added so that we can in
      the future convert to ADD_ONCE() or equivalent if/when
      available.
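A userspace approximation of the annotation pattern described above, assuming GCC or Clang for `__typeof__`. The macros are simplified stand-ins for the kernel's READ_ONCE()/WRITE_ONCE(), and `struct sock_sketch`, `wmem_queued_add()` and `stream_wspace()` are hypothetical names chosen for this sketch.

```c
/* Simplified userspace versions: force exactly one access through a
 * volatile-qualified lvalue so the compiler cannot tear, fuse or repeat it.
 */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

struct sock_sketch {
	int wmem_queued;	/* bytes queued for transmission */
	int sndbuf;		/* send buffer limit */
};

/* Single place where the field is updated. Keeping the write behind one
 * helper means it could later be switched to an atomic add in one spot.
 */
static inline void wmem_queued_add(struct sock_sketch *sk, int val)
{
	WRITE_ONCE(sk->wmem_queued, sk->wmem_queued + val);
}

/* Lockless reader, in the style of a poll() path: each field is fetched
 * once with READ_ONCE() because a writer may change it concurrently.
 */
static inline int stream_wspace(struct sock_sketch *sk)
{
	return READ_ONCE(sk->sndbuf) - READ_ONCE(sk->wmem_queued);
}
```

The annotations do not add ordering or atomicity of the read-modify-write; they only document the intentional race and prevent load and store tearing.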
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: annotate sk->sk_sndbuf lockless reads · e292f05e
      Eric Dumazet authored
      For the sake of tcp_poll(), there are a few places where we fetch
      sk->sk_sndbuf while this field can change from IRQ or other cpu.
      
      We need to add READ_ONCE() annotations, and also make sure write
      sides use corresponding WRITE_ONCE() to avoid store-tearing.
      
      Note that other transports probably need similar fixes.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: annotate sk->sk_rcvbuf lockless reads · ebb3b78d
      Eric Dumazet authored
      For the sake of tcp_poll(), there are a few places where we fetch
      sk->sk_rcvbuf while this field can change from IRQ or other cpu.
      
      We need to add READ_ONCE() annotations, and also make sure write
      sides use corresponding WRITE_ONCE() to avoid store-tearing.
      
      Note that other transports probably need similar fixes.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: annotate tp->urg_seq lockless reads · d9b55bf7
      Eric Dumazet authored
      There are two places where we fetch tp->urg_seq while
      this field can change from IRQ or other cpu.
      
      We need to add READ_ONCE() annotations, and also make
      sure the write side uses the corresponding WRITE_ONCE() to avoid
      store-tearing.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: annotate tp->snd_nxt lockless reads · e0d694d6
      Eric Dumazet authored
      There are a few places where we fetch tp->snd_nxt while
      this field can change from IRQ or other cpu.
      
      We need to add READ_ONCE() annotations, and also make
      sure write sides use corresponding WRITE_ONCE() to avoid
      store-tearing.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: annotate tp->write_seq lockless reads · 0f317464
      Eric Dumazet authored
      There are a few places where we fetch tp->write_seq while
      this field can change from IRQ or other cpu.
      
      We need to add READ_ONCE() annotations, and also make
      sure write sides use corresponding WRITE_ONCE() to avoid
      store-tearing.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: annotate tp->copied_seq lockless reads · 7db48e98
      Eric Dumazet authored
      There are a few places where we fetch tp->copied_seq while
      this field can change from IRQ or other cpu.
      
      We need to add READ_ONCE() annotations, and also make
      sure write sides use corresponding WRITE_ONCE() to avoid
      store-tearing.
      
      Note that tcp_inq_hint() was already using READ_ONCE(tp->copied_seq)
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: annotate tp->rcv_nxt lockless reads · dba7d9b8
      Eric Dumazet authored
      There are a few places where we fetch tp->rcv_nxt while
      this field can change from IRQ or other cpu.
      
      We need to add READ_ONCE() annotations, and also make
      sure write sides use corresponding WRITE_ONCE() to avoid
      store-tearing.
      
      Note that tcp_inq_hint() was already using READ_ONCE(tp->rcv_nxt)
      
      syzbot reported :
      
      BUG: KCSAN: data-race in tcp_poll / tcp_queue_rcv
      
      write to 0xffff888120425770 of 4 bytes by interrupt on cpu 0:
       tcp_rcv_nxt_update net/ipv4/tcp_input.c:3365 [inline]
       tcp_queue_rcv+0x180/0x380 net/ipv4/tcp_input.c:4638
       tcp_rcv_established+0xbf1/0xf50 net/ipv4/tcp_input.c:5616
       tcp_v4_do_rcv+0x381/0x4e0 net/ipv4/tcp_ipv4.c:1542
       tcp_v4_rcv+0x1a03/0x1bf0 net/ipv4/tcp_ipv4.c:1923
       ip_protocol_deliver_rcu+0x51/0x470 net/ipv4/ip_input.c:204
       ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
       dst_input include/net/dst.h:442 [inline]
       ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
       __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5004
       __netif_receive_skb+0x37/0xf0 net/core/dev.c:5118
       netif_receive_skb_internal+0x59/0x190 net/core/dev.c:5208
       napi_skb_finish net/core/dev.c:5671 [inline]
       napi_gro_receive+0x28f/0x330 net/core/dev.c:5704
       receive_buf+0x284/0x30b0 drivers/net/virtio_net.c:1061
      
      read to 0xffff888120425770 of 4 bytes by task 7254 on cpu 1:
       tcp_stream_is_readable net/ipv4/tcp.c:480 [inline]
       tcp_poll+0x204/0x6b0 net/ipv4/tcp.c:554
       sock_poll+0xed/0x250 net/socket.c:1256
       vfs_poll include/linux/poll.h:90 [inline]
       ep_item_poll.isra.0+0x90/0x190 fs/eventpoll.c:892
       ep_send_events_proc+0x113/0x5c0 fs/eventpoll.c:1749
       ep_scan_ready_list.constprop.0+0x189/0x500 fs/eventpoll.c:704
       ep_send_events fs/eventpoll.c:1793 [inline]
       ep_poll+0xe3/0x900 fs/eventpoll.c:1930
       do_epoll_wait+0x162/0x180 fs/eventpoll.c:2294
       __do_sys_epoll_pwait fs/eventpoll.c:2325 [inline]
       __se_sys_epoll_pwait fs/eventpoll.c:2311 [inline]
       __x64_sys_epoll_pwait+0xcd/0x170 fs/eventpoll.c:2311
       do_syscall_64+0xcf/0x2f0 arch/x86/entry/common.c:296
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 7254 Comm: syz-fuzzer Not tainted 5.3.0+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: add rcu protection around tp->fastopen_rsk · d983ea6f
      Eric Dumazet authored
      Both tcp_v4_err() and tcp_v6_err() do the following operations
      while they do not own the socket lock :
      
      	fastopen = tp->fastopen_rsk;
       	snd_una = fastopen ? tcp_rsk(fastopen)->snt_isn : tp->snd_una;
      
      The problem is that without appropriate barrier, the compiler
      might reload tp->fastopen_rsk and trigger a NULL deref.
      
      Request sockets are protected by RCU, so we can simply add
      the missing annotations and barriers to solve the issue.
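The reload hazard described above can be illustrated in userspace with C11 atomics. This sketch only shows the "load the pointer once, then test and dereference that same copy" part; it does not model RCU grace periods or object lifetime, and `struct tp_sketch`, `struct req_sketch` and `err_snd_una()` are hypothetical names.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

struct req_sketch {
	uint32_t snt_isn;
};

struct tp_sketch {
	_Atomic(struct req_sketch *) fastopen_rsk;	/* may be cleared concurrently */
	uint32_t snd_una;
};

static uint32_t err_snd_una(struct tp_sketch *tp)
{
	/* Load the pointer exactly once. The NULL test and the dereference
	 * below both use this local copy, so a concurrent store of NULL
	 * between them cannot lead to a NULL dereference.
	 */
	struct req_sketch *fastopen =
		atomic_load_explicit(&tp->fastopen_rsk, memory_order_acquire);

	return fastopen ? fastopen->snt_isn : tp->snd_una;
}
```

With a plain, unannotated pointer the compiler is allowed to read the field a second time for the dereference, which is the window this commit closes.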
      
      Fixes: 168a8f58 ("tcp: TCP Fast Open Server - main code path")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 12 Oct, 2019 2 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 8caf8a91
      David S. Miller authored
      Alexei Starovoitov says:
      
      ====================
      pull-request: bpf 2019-10-12
      
      The following pull-request contains BPF updates for your *net* tree.
      
      The main changes are:
      
      1) a bunch of small fixes. Nothing critical.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rxrpc: Fix possible NULL pointer access in ICMP handling · f0308fb0
      David Howells authored
      If an ICMP packet comes in on the UDP socket backing an AF_RXRPC socket as
      the UDP socket is being shut down, rxrpc_error_report() may get called to
      deal with it after sk_user_data on the UDP socket has been cleared, leading
      to a NULL pointer access when this local endpoint record gets accessed.
      
      Fix this by just returning immediately if sk_user_data was NULL.
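The guard described above amounts to an early return when the back-pointer has already been cleared. A minimal sketch with hypothetical types (`struct udp_sock_sketch`, `struct local_sketch`) and a hypothetical handler name follows; it is not the rxrpc code.

```c
#include <stddef.h>

struct local_sketch {
	int errors_seen;
};

struct udp_sock_sketch {
	void *sk_user_data;	/* cleared when the socket is shut down */
};

/* Returns 1 if the error was handled, 0 if it was ignored because the
 * endpoint record is already gone.
 */
static int error_report_sketch(struct udp_sock_sketch *sk)
{
	struct local_sketch *local = sk->sk_user_data;

	if (!local)
		return 0;	/* socket is being shut down: nothing to do */

	local->errors_seen++;
	return 1;
}
```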
      
      The oops looks like the following:
      
      #PF: supervisor read access in kernel mode
      #PF: error_code(0x0000) - not-present page
      ...
      RIP: 0010:rxrpc_error_report+0x1bd/0x6a9
      ...
      Call Trace:
       ? sock_queue_err_skb+0xbd/0xde
       ? __udp4_lib_err+0x313/0x34d
       __udp4_lib_err+0x313/0x34d
       icmp_unreach+0x1ee/0x207
       icmp_rcv+0x25b/0x28f
       ip_protocol_deliver_rcu+0x95/0x10e
       ip_local_deliver+0xe9/0x148
       __netif_receive_skb_one_core+0x52/0x6e
       process_backlog+0xdc/0x177
       net_rx_action+0xf9/0x270
       __do_softirq+0x1b6/0x39a
       ? smpboot_register_percpu_thread+0xce/0xce
       run_ksoftirqd+0x1d/0x42
       smpboot_thread_fn+0x19e/0x1b3
       kthread+0xf1/0xf6
       ? kthread_delayed_work_timer_fn+0x83/0x83
       ret_from_fork+0x24/0x30
      
      Fixes: 17926a79 ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
      Reported-by: syzbot+611164843bd48cc2190c@syzkaller.appspotmail.com
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 11 Oct, 2019 3 commits