1. 10 Mar, 2020 11 commits
    • net: memcg: late association of sock to memcg · d752a498
      Shakeel Butt authored
      If a TCP socket is allocated in IRQ context, or cloned in IRQ context
      from an unassociated socket (i.e. one not associated with a memcg), it
      will remain unassociated for its whole life. Almost half of the TCP
      sockets created on the system are created in IRQ context, so the memory
      used by such sockets will not be accounted by the memcg.
      
      This issue is more widespread in cgroup v1 where network memory
      accounting is opt-in but it can happen in cgroup v2 if the source socket
      for the cloning was created in root memcg.
      
      To fix the issue, associate the socket at accept() time, in process
      context, and then force-charge the memory buffer already used and
      reserved by the socket.
      Signed-off-by: Shakeel Butt <shakeelb@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cgroup: memcg: net: do not associate sock with unrelated cgroup · e876ecc6
      Shakeel Butt authored
      We are testing network memory accounting in our setup and noticed
      inconsistent network memory usage: the network usage of unrelated
      cgroups often correlates with the test workload. On further inspection,
      it seems that mem_cgroup_sk_alloc() and cgroup_sk_alloc() are broken in
      IRQ context, especially for cgroup v1.
      
      mem_cgroup_sk_alloc() and cgroup_sk_alloc() can be called in IRQ context,
      and they assume that this can only happen from sk_clone_lock() and that
      the source sock object already has an associated cgroup. However, in
      cgroup v1, where network memory accounting is opt-in, the source sock
      may not be associated with any cgroup, and the newly cloned sock can get
      associated with the unrelated cgroup that was interrupted.
      
      Cgroup v2 can also suffer if the source sock object was created by a
      process in the root cgroup or if sk_alloc() is called in IRQ context.
      The fix is to do nothing in interrupt context.
      
      WARNING: Please note that about half of the TCP sockets are allocated
      from IRQ context, so the memory used by such sockets will not be
      accounted by the memcg.
      
      The stack trace of mem_cgroup_sk_alloc() from IRQ-context:
      
      CPU: 70 PID: 12720 Comm: ssh Tainted:  5.6.0-smp-DEV #1
      Hardware name: ...
      Call Trace:
       <IRQ>
       dump_stack+0x57/0x75
       mem_cgroup_sk_alloc+0xe9/0xf0
       sk_clone_lock+0x2a7/0x420
       inet_csk_clone_lock+0x1b/0x110
       tcp_create_openreq_child+0x23/0x3b0
       tcp_v6_syn_recv_sock+0x88/0x730
       tcp_check_req+0x429/0x560
       tcp_v6_rcv+0x72d/0xa40
       ip6_protocol_deliver_rcu+0xc9/0x400
       ip6_input+0x44/0xd0
       ? ip6_protocol_deliver_rcu+0x400/0x400
       ip6_rcv_finish+0x71/0x80
       ipv6_rcv+0x5b/0xe0
       ? ip6_sublist_rcv+0x2e0/0x2e0
       process_backlog+0x108/0x1e0
       net_rx_action+0x26b/0x460
       __do_softirq+0x104/0x2a6
       do_softirq_own_stack+0x2a/0x40
       </IRQ>
       do_softirq.part.19+0x40/0x50
       __local_bh_enable_ip+0x51/0x60
       ip6_finish_output2+0x23d/0x520
       ? ip6table_mangle_hook+0x55/0x160
       __ip6_finish_output+0xa1/0x100
       ip6_finish_output+0x30/0xd0
       ip6_output+0x73/0x120
       ? __ip6_finish_output+0x100/0x100
       ip6_xmit+0x2e3/0x600
       ? ipv6_anycast_cleanup+0x50/0x50
       ? inet6_csk_route_socket+0x136/0x1e0
       ? skb_free_head+0x1e/0x30
       inet6_csk_xmit+0x95/0xf0
       __tcp_transmit_skb+0x5b4/0xb20
       __tcp_send_ack.part.60+0xa3/0x110
       tcp_send_ack+0x1d/0x20
       tcp_rcv_state_process+0xe64/0xe80
       ? tcp_v6_connect+0x5d1/0x5f0
       tcp_v6_do_rcv+0x1b1/0x3f0
       ? tcp_v6_do_rcv+0x1b1/0x3f0
       __release_sock+0x7f/0xd0
       release_sock+0x30/0xa0
       __inet_stream_connect+0x1c3/0x3b0
       ? prepare_to_wait+0xb0/0xb0
       inet_stream_connect+0x3b/0x60
       __sys_connect+0x101/0x120
       ? __sys_getsockopt+0x11b/0x140
       __x64_sys_connect+0x1a/0x20
       do_syscall_64+0x51/0x200
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: 2d758073 ("mm: memcontrol: consolidate cgroup socket tracking")
      Fixes: d979a39d ("cgroup: duplicate cgroup reference when cloning sockets")
      Signed-off-by: Shakeel Butt <shakeelb@google.com>
      Reviewed-by: Roman Gushchin <guro@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • MAINTAINERS: update cxgb4vf maintainer to Vishal · 65dfcf08
      Jakub Kicinski authored
      Casey Leedom <leedom@chelsio.com> is bouncing,
      Vishal indicated he's happy to take the role.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'batadv-net-for-davem-20200306' of git://git.open-mesh.org/linux-merge · 23620594
      David S. Miller authored
      Simon Wunderlich says:
      
      ====================
      Here is a batman-adv bugfix:
      
       - Don't schedule OGM for disabled interface, by Sven Eckelmann
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mscc: ocelot: properly account for VLAN header length when setting MRU · a8015ded
      Vladimir Oltean authored
      What the driver writes into MAC_MAXLEN_CFG does not actually represent
      VLAN_ETH_FRAME_LEN but instead ETH_FRAME_LEN + ETH_FCS_LEN. Yes they are
      numerically equal, but the difference is important, as the switch treats
      VLAN-tagged traffic specially and knows to increase the maximum accepted
      frame size automatically. So it is always wrong to account for VLAN in
      the MAC_MAXLEN_CFG register.
      
      Unconditionally increase the maximum allowed frame size for
      double-tagged traffic. Accounting for the additional length does not
      mean that the other VLAN membership checks aren't performed, so there's
      no harm done.
      
      Also, stop abusing the MTU name for configuring the MRU. There is no
      support for configuring the MRU on an interface at the moment.
      
      Fixes: a556c76a ("net: mscc: Add initial Ocelot switch support")
      Fixes: fa914e9c ("net: mscc: ocelot: create a helper for changing the port MTU")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
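
The frame-size argument in the ocelot commit above can be checked with a little arithmetic. The sketch below uses the values the Linux headers assign to these constants; `max_accepted_len()` is a hypothetical, simplified model of a switch that adds room for each VLAN tag on top of the configured untagged limit, not the driver's actual code.

```c
#include <assert.h>
#include <stdio.h>

/* Frame-size constants, with the values the Linux headers give them. */
enum {
    ETH_FRAME_LEN      = 1514, /* max frame, without FCS */
    ETH_FCS_LEN        = 4,    /* frame check sequence */
    VLAN_HLEN          = 4,    /* one 802.1Q tag */
    VLAN_ETH_FRAME_LEN = 1518, /* max single-tagged frame, without FCS */
};

/*
 * Hypothetical model: the register holds the untagged limit (FCS included)
 * and the hardware itself allows VLAN_HLEN extra bytes per tag it sees.
 */
static unsigned int max_accepted_len(unsigned int maxlen_cfg,
                                     unsigned int vlan_tags)
{
    assert(vlan_tags <= 2); /* the commit discusses up to double tagging */
    return maxlen_cfg + vlan_tags * VLAN_HLEN;
}

/* Print the accepted frame size for 0, 1 and 2 VLAN tags. */
static void print_limits(void)
{
    unsigned int cfg = ETH_FRAME_LEN + ETH_FCS_LEN;

    for (unsigned int tags = 0; tags <= 2; tags++)
        printf("%u tag(s): %u bytes\n", tags, max_accepted_len(cfg, tags));
}
```

Both `VLAN_ETH_FRAME_LEN` and `ETH_FRAME_LEN + ETH_FCS_LEN` equal 1518, which is why the old code happened to program the right number while describing it with the wrong name.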
    • ipvlan: do not use cond_resched_rcu() in ipvlan_process_multicast() · afe207d8
      Eric Dumazet authored
      Commit e18b353f ("ipvlan: add cond_resched_rcu() while
      processing muticast backlog") added a cond_resched_rcu() in a loop
      using rcu protection to iterate over slaves.
      
      This breaks RCU rules, so let's instead use cond_resched()
      at a point where we can reschedule.
      
      Fixes: e18b353f ("ipvlan: add cond_resched_rcu() while processing muticast backlog")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Mahesh Bandewar <maheshb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cgroup, netclassid: periodically release file_lock on classid updating · 018d26fc
      Dmitry Yakunin authored
      In our production environment we found that updating the classid of a
      cgroup containing heavy tasks causes a long freeze of the file tables of
      those tasks. By heavy tasks we mean tasks with many threads and open
      sockets (e.g. balancers). This freeze leads to an increased number of
      client timeouts.
      
      This patch implements the following logic to fix the issue: after
      iterating over 1000 file descriptors, the file table lock is released,
      providing a time gap for socket creation/deletion.
      
      The update is now non-atomic, and a socket may be skipped by calls such as:
      
      dup2(oldfd, newfd);
      close(oldfd);
      
      But this case is not typical. Moreover, a skip was already possible
      before this patch by hiding the socket fd in a unix socket buffer.
      
      New sockets will be allocated with the updated classid because the cgroup
      state is updated before the file descriptor iteration starts.
      
      So in common cases this patch has no side effects.
      Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru>
      Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Signed-off-by: David S. Miller <davem@davemloft.net>
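
The batching idea in the netclassid commit above can be sketched in user space. Everything here is an assumption made for illustration: `toy_lock`, `toy_fdtable` and `update_classid()` are hypothetical names, and the lock only counts acquisitions instead of providing real mutual exclusion. The kernel patch works on the real file table and is structured differently.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH 1000 /* descriptors handled per lock hold, as in the commit text */

/* Toy stand-in for the file table lock; it only counts acquisitions. */
struct toy_lock {
    int held;
    unsigned int acquisitions;
};

static void toy_lock_acquire(struct toy_lock *l)
{
    assert(!l->held);
    l->held = 1;
    l->acquisitions++;
}

static void toy_lock_release(struct toy_lock *l)
{
    assert(l->held);
    l->held = 0;
}

/* Hypothetical table: one classid slot per open descriptor. */
struct toy_fdtable {
    struct toy_lock lock;
    size_t nfds;
    unsigned int *classid;
};

/* Update every slot, dropping the lock after each batch of BATCH slots. */
static void update_classid(struct toy_fdtable *t, unsigned int classid)
{
    size_t i = 0;

    while (i < t->nfds) {
        size_t end = (t->nfds - i > BATCH) ? i + BATCH : t->nfds;

        toy_lock_acquire(&t->lock);
        for (; i < end; i++)
            t->classid[i] = classid;
        toy_lock_release(&t->lock);
        /* Gap: other threads could create or close sockets here. */
    }
}
```

With 2500 descriptors the lock is taken three times instead of being held once for the whole walk, which is the trade the commit makes: shorter lock hold times in exchange for a non-atomic update.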
    • macvlan: add cond_resched() during multicast processing · ce9a4186
      Mahesh Bandewar authored
      Rx-bound multicast packets are deferred to a workqueue, and
      macvlan can also suffer from the same attack that syzbot
      discovered for IPvlan. This solution is not as effective as in
      IPvlan: IPvlan defers all (Tx and Rx) multicast packet processing
      to a workqueue, while macvlan does so only for Rx. This fix
      should address the Rx condition to a certain extent.
      
      Tx is still susceptible. Tx multicast processing happens when
      .ndo_start_xmit is called, hence we cannot add cond_resched().
      However, it's not that severe, since the user generating the
      flood will be affected the most.
      
      Fixes: 412ca155 ("macvlan: Move broadcasts into a work queue")
      Signed-off-by: Mahesh Bandewar <maheshb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipvlan: add cond_resched_rcu() while processing muticast backlog · e18b353f
      Mahesh Bandewar authored
      If a substantial number of slaves are created, as simulated by
      syzbot, the backlog processing can take much longer and result
      in the issue found in the syzbot report.
      
      INFO: rcu_sched detected stalls on CPUs/tasks:
              (detected by 1, t=10502 jiffies, g=5049, c=5048, q=752)
      All QSes seen, last rcu_sched kthread activity 10502 (4294965563-4294955061), jiffies_till_next_fqs=1, root ->qsmask 0x0
      syz-executor.1  R  running task on cpu   1  10984 11210   3866 0x30020008 179034491270
      Call Trace:
       <IRQ>
       [<ffffffff81497163>] _sched_show_task kernel/sched/core.c:8063 [inline]
       [<ffffffff81497163>] _sched_show_task.cold+0x2fd/0x392 kernel/sched/core.c:8030
       [<ffffffff8146a91b>] sched_show_task+0xb/0x10 kernel/sched/core.c:8073
       [<ffffffff815c931b>] print_other_cpu_stall kernel/rcu/tree.c:1577 [inline]
       [<ffffffff815c931b>] check_cpu_stall kernel/rcu/tree.c:1695 [inline]
       [<ffffffff815c931b>] __rcu_pending kernel/rcu/tree.c:3478 [inline]
       [<ffffffff815c931b>] rcu_pending kernel/rcu/tree.c:3540 [inline]
       [<ffffffff815c931b>] rcu_check_callbacks.cold+0xbb4/0xc29 kernel/rcu/tree.c:2876
       [<ffffffff815e3962>] update_process_times+0x32/0x80 kernel/time/timer.c:1635
       [<ffffffff816164f0>] tick_sched_handle+0xa0/0x180 kernel/time/tick-sched.c:161
       [<ffffffff81616ae4>] tick_sched_timer+0x44/0x130 kernel/time/tick-sched.c:1193
       [<ffffffff815e75f7>] __run_hrtimer kernel/time/hrtimer.c:1393 [inline]
       [<ffffffff815e75f7>] __hrtimer_run_queues+0x307/0xd90 kernel/time/hrtimer.c:1455
       [<ffffffff815e90ea>] hrtimer_interrupt+0x2ea/0x730 kernel/time/hrtimer.c:1513
       [<ffffffff844050f4>] local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1031 [inline]
       [<ffffffff844050f4>] smp_apic_timer_interrupt+0x144/0x5e0 arch/x86/kernel/apic/apic.c:1056
       [<ffffffff84401cbe>] apic_timer_interrupt+0x8e/0xa0 arch/x86/entry/entry_64.S:778
      RIP: 0010:do_raw_read_lock+0x22/0x80 kernel/locking/spinlock_debug.c:153
      RSP: 0018:ffff8801dad07ab8 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff12
      RAX: 0000000000000000 RBX: ffff8801c4135680 RCX: 0000000000000000
      RDX: 1ffff10038826afe RSI: ffff88019d816bb8 RDI: ffff8801c41357f0
      RBP: ffff8801dad07ac0 R08: 0000000000004b15 R09: 0000000000310273
      R10: ffff88019d816bb8 R11: 0000000000000001 R12: ffff8801c41357e8
      R13: 0000000000000000 R14: ffff8801cfb19850 R15: ffff8801cfb198b0
       [<ffffffff8101460e>] __raw_read_lock_bh include/linux/rwlock_api_smp.h:177 [inline]
       [<ffffffff8101460e>] _raw_read_lock_bh+0x3e/0x50 kernel/locking/spinlock.c:240
       [<ffffffff840d78ca>] ipv6_chk_mcast_addr+0x11a/0x6f0 net/ipv6/mcast.c:1006
       [<ffffffff84023439>] ip6_mc_input+0x319/0x8e0 net/ipv6/ip6_input.c:482
       [<ffffffff840211c8>] dst_input include/net/dst.h:449 [inline]
       [<ffffffff840211c8>] ip6_rcv_finish+0x408/0x610 net/ipv6/ip6_input.c:78
       [<ffffffff840214de>] NF_HOOK include/linux/netfilter.h:292 [inline]
       [<ffffffff840214de>] NF_HOOK include/linux/netfilter.h:286 [inline]
       [<ffffffff840214de>] ipv6_rcv+0x10e/0x420 net/ipv6/ip6_input.c:278
       [<ffffffff83a29efa>] __netif_receive_skb_one_core+0x12a/0x1f0 net/core/dev.c:5303
       [<ffffffff83a2a15c>] __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:5417
       [<ffffffff83a2f536>] process_backlog+0x216/0x6c0 net/core/dev.c:6243
       [<ffffffff83a30d1b>] napi_poll net/core/dev.c:6680 [inline]
       [<ffffffff83a30d1b>] net_rx_action+0x47b/0xfb0 net/core/dev.c:6748
       [<ffffffff846002c8>] __do_softirq+0x2c8/0x99a kernel/softirq.c:317
       [<ffffffff813e656a>] invoke_softirq kernel/softirq.c:399 [inline]
       [<ffffffff813e656a>] irq_exit+0x16a/0x1a0 kernel/softirq.c:439
       [<ffffffff84405115>] exiting_irq arch/x86/include/asm/apic.h:561 [inline]
       [<ffffffff84405115>] smp_apic_timer_interrupt+0x165/0x5e0 arch/x86/kernel/apic/apic.c:1058
       [<ffffffff84401cbe>] apic_timer_interrupt+0x8e/0xa0 arch/x86/entry/entry_64.S:778
       </IRQ>
      RIP: 0010:__sanitizer_cov_trace_pc+0x26/0x50 kernel/kcov.c:102
      RSP: 0018:ffff880196033bd8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff12
      RAX: ffff88019d8161c0 RBX: 00000000ffffffff RCX: ffffc90003501000
      RDX: 0000000000000002 RSI: ffffffff816236d1 RDI: 0000000000000005
      RBP: ffff880196033bd8 R08: ffff88019d8161c0 R09: 0000000000000000
      R10: 1ffff10032c067f0 R11: 0000000000000000 R12: 0000000000000000
      R13: 0000000000000080 R14: 0000000000000000 R15: 0000000000000000
       [<ffffffff816236d1>] do_futex+0x151/0x1d50 kernel/futex.c:3548
       [<ffffffff816260f0>] C_SYSC_futex kernel/futex_compat.c:201 [inline]
       [<ffffffff816260f0>] compat_SyS_futex+0x270/0x3b0 kernel/futex_compat.c:175
       [<ffffffff8101da17>] do_syscall_32_irqs_on arch/x86/entry/common.c:353 [inline]
       [<ffffffff8101da17>] do_fast_syscall_32+0x357/0xe1c arch/x86/entry/common.c:415
       [<ffffffff84401a9b>] entry_SYSENTER_compat+0x8b/0x9d arch/x86/entry/entry_64_compat.S:139
      RIP: 0023:0xf7f23c69
      RSP: 002b:00000000f5d1f12c EFLAGS: 00000282 ORIG_RAX: 00000000000000f0
      RAX: ffffffffffffffda RBX: 000000000816af88 RCX: 0000000000000080
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000816af8c
      RBP: 00000000f5d1f228 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
      R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
      rcu_sched kthread starved for 10502 jiffies! g5049 c5048 f0x2 RCU_GP_WAIT_FQS(3) ->state=0x0 ->cpu=1
      rcu_sched       R  running task on cpu   1  13048     8      2 0x90000000 179099587640
      Call Trace:
       [<ffffffff8147321f>] context_switch+0x60f/0xa60 kernel/sched/core.c:3209
       [<ffffffff8100095a>] __schedule+0x5aa/0x1da0 kernel/sched/core.c:3934
       [<ffffffff810021df>] schedule+0x8f/0x1b0 kernel/sched/core.c:4011
       [<ffffffff8101116d>] schedule_timeout+0x50d/0xee0 kernel/time/timer.c:1803
       [<ffffffff815c13f1>] rcu_gp_kthread+0xda1/0x3b50 kernel/rcu/tree.c:2327
       [<ffffffff8144b318>] kthread+0x348/0x420 kernel/kthread.c:246
       [<ffffffff84400266>] ret_from_fork+0x56/0x70 arch/x86/entry/entry_64.S:393
      
      Fixes: ba35f858 ("ipvlan: Defer multicast / broadcast processing to a work-queue")
      Signed-off-by: Mahesh Bandewar <maheshb@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipvlan: don't deref eth hdr before checking it's set · ad819276
      Mahesh Bandewar authored
      IPvlan in L3 mode discards outbound multicast packets but performs
      the check before ensuring that the Ethernet header is set. This is
      an error that Eric found by reading the code.
      
      Fixes: 2ad7bf36 ("ipvlan: Initial check-in of the IPVLAN driver.")
      Signed-off-by: Mahesh Bandewar <maheshb@google.com>
      Reported-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sfc: detach from cb_page in efx_copy_channel() · 4b1bd9db
      Edward Cree authored
      It's a resource, not a parameter, so we can't copy it into the new
       channel's TX queues, otherwise aliasing will lead to resource-
       management bugs if the channel is subsequently torn down without
       being initialised.
      
      Before the Fixes:-tagged commit there was a similar bug with
       tsoh_page, but I'm not sure it's worth doing another fix for such
       old kernels.
      
      Fixes: e9117e50 ("sfc: Firmware-Assisted TSO version 2")
      Suggested-by: Derek Shute <Derek.Shute@stratus.com>
      Signed-off-by: Edward Cree <ecree@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 09 Mar, 2020 5 commits
    • linux-next: DOC: RDS: Fix a typo in rds.txt · 661388f9
      Masanari Iida authored
      This patch fixes a spelling typo in rds.txt
      Signed-off-by: Masanari Iida <standby24x7@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • inet_diag: return classid for all socket types · 83f73c5b
      Dmitry Yakunin authored
      In commit 1ec17dbd ("inet_diag: fix reporting cgroup classid and
      fallback to priority") cgroup classid reporting was fixed. But this works
      only for TCP sockets, because for other socket types the icsk parameter
      can be NULL and the classid code path is skipped. This change moves
      classid handling to the inet_diag_msg_attrs_fill() function.
      
      Also, an inet_diag_msg_attrs_size() helper was added, and the addends in
      nlmsg_new() were reordered to match the order used in inet_sk_diag_fill().
      
      Fixes: 1ec17dbd ("inet_diag: fix reporting cgroup classid and fallback to priority")
      Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru>
      Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: dwmac1000: Disable ACS if enhanced descs are not used · b723bd93
      Remi Pommarel authored
      ACS (auto PAD/FCS stripping) removes the FCS from 802.3 packets (LLC), so
      there is no need to strip it manually for such packets. The enhanced DMA
      descriptors allow LLC packets to be flagged, so that the receive callback
      can use the flag to decide whether to strip the FCS manually. Normal
      descriptors, on the other hand, do not support that.
      
      Thus, to avoid truncating LLC packets, ACS should be disabled when
      using normal DMA descriptors.
      
      Fixes: 47dd7a54 ("net: add support for STMicroelectronics Ethernet controllers.")
      Signed-off-by: Remi Pommarel <repk@triplefau.lt>
      Signed-off-by: David S. Miller <davem@davemloft.net>
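
The truncation described in the stmmac commit above comes down to the FCS being removed twice. The following is a minimal model of that reasoning, not the driver's code: `delivered_len()` and its parameters are hypothetical, and it assumes the driver strips the FCS itself whenever it cannot tell that the hardware already did.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ETH_FCS_LEN 4 /* frame check sequence length in bytes */

/*
 * Hypothetical model of the receive path: returns the length handed to the
 * stack for a frame that was wire_len bytes long (FCS included).
 */
static size_t delivered_len(size_t wire_len, bool is_llc,
                            bool enhanced_desc, bool acs_enabled)
{
    size_t len = wire_len;
    /* With ACS on, the hardware removes the FCS from LLC frames itself. */
    bool hw_stripped = acs_enabled && is_llc;

    assert(wire_len >= 2 * ETH_FCS_LEN);

    if (hw_stripped)
        len -= ETH_FCS_LEN;

    /* Only enhanced descriptors tell the driver the frame was LLC. */
    if (!(enhanced_desc && hw_stripped))
        len -= ETH_FCS_LEN; /* manual strip by the driver */

    return len;
}
```

For a 64-byte LLC frame, the model yields 60 bytes with enhanced descriptors, 56 bytes with normal descriptors while ACS is on (the truncation), and 60 bytes again once ACS is disabled for normal descriptors.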
    • gre: fix uninit-value in __iptunnel_pull_header · 17c25caf
      Eric Dumazet authored
      syzbot found an interesting case of the kernel reading
      an uninit-value [1]
      
      Problem is in the handling of ETH_P_WCCP in gre_parse_header()
      
      We look at the byte following GRE options to eventually decide
      if the options are four bytes longer.
      
      Use skb_header_pointer() to not pull bytes if we found
      that no more bytes were needed.
      
      All callers of gre_parse_header() are properly using pskb_may_pull()
      anyway before proceeding to next header.
      
      [1]
      BUG: KMSAN: uninit-value in pskb_may_pull include/linux/skbuff.h:2303 [inline]
      BUG: KMSAN: uninit-value in __iptunnel_pull_header+0x30c/0xbd0 net/ipv4/ip_tunnel_core.c:94
      CPU: 1 PID: 11784 Comm: syz-executor940 Not tainted 5.6.0-rc2-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1c9/0x220 lib/dump_stack.c:118
       kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:118
       __msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:215
       pskb_may_pull include/linux/skbuff.h:2303 [inline]
       __iptunnel_pull_header+0x30c/0xbd0 net/ipv4/ip_tunnel_core.c:94
       iptunnel_pull_header include/net/ip_tunnels.h:411 [inline]
       gre_rcv+0x15e/0x19c0 net/ipv6/ip6_gre.c:606
       ip6_protocol_deliver_rcu+0x181b/0x22c0 net/ipv6/ip6_input.c:432
       ip6_input_finish net/ipv6/ip6_input.c:473 [inline]
       NF_HOOK include/linux/netfilter.h:307 [inline]
       ip6_input net/ipv6/ip6_input.c:482 [inline]
       ip6_mc_input+0xdf2/0x1460 net/ipv6/ip6_input.c:576
       dst_input include/net/dst.h:442 [inline]
       ip6_rcv_finish net/ipv6/ip6_input.c:76 [inline]
       NF_HOOK include/linux/netfilter.h:307 [inline]
       ipv6_rcv+0x683/0x710 net/ipv6/ip6_input.c:306
       __netif_receive_skb_one_core net/core/dev.c:5198 [inline]
       __netif_receive_skb net/core/dev.c:5312 [inline]
       netif_receive_skb_internal net/core/dev.c:5402 [inline]
       netif_receive_skb+0x66b/0xf20 net/core/dev.c:5461
       tun_rx_batched include/linux/skbuff.h:4321 [inline]
       tun_get_user+0x6aef/0x6f60 drivers/net/tun.c:1997
       tun_chr_write_iter+0x1f2/0x360 drivers/net/tun.c:2026
       call_write_iter include/linux/fs.h:1901 [inline]
       new_sync_write fs/read_write.c:483 [inline]
       __vfs_write+0xa5a/0xca0 fs/read_write.c:496
       vfs_write+0x44a/0x8f0 fs/read_write.c:558
       ksys_write+0x267/0x450 fs/read_write.c:611
       __do_sys_write fs/read_write.c:623 [inline]
       __se_sys_write fs/read_write.c:620 [inline]
       __ia32_sys_write+0xdb/0x120 fs/read_write.c:620
       do_syscall_32_irqs_on arch/x86/entry/common.c:339 [inline]
       do_fast_syscall_32+0x3c7/0x6e0 arch/x86/entry/common.c:410
       entry_SYSENTER_compat+0x68/0x77 arch/x86/entry/entry_64_compat.S:139
      RIP: 0023:0xf7f62d99
      Code: 90 e8 0b 00 00 00 f3 90 0f ae e8 eb f9 8d 74 26 00 89 3c 24 c3 90 90 90 90 90 90 90 90 90 90 90 90 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 eb 0d 90 90 90 90 90 90 90 90 90 90 90 90
      RSP: 002b:00000000fffedb2c EFLAGS: 00000217 ORIG_RAX: 0000000000000004
      RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000020002580
      RDX: 0000000000000fca RSI: 0000000000000036 RDI: 0000000000000004
      RBP: 0000000000008914 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
      R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
      
      Uninit was created at:
       kmsan_save_stack_with_flags mm/kmsan/kmsan.c:144 [inline]
       kmsan_internal_poison_shadow+0x66/0xd0 mm/kmsan/kmsan.c:127
       kmsan_slab_alloc+0x8a/0xe0 mm/kmsan/kmsan_hooks.c:82
       slab_alloc_node mm/slub.c:2793 [inline]
       __kmalloc_node_track_caller+0xb40/0x1200 mm/slub.c:4401
       __kmalloc_reserve net/core/skbuff.c:142 [inline]
       __alloc_skb+0x2fd/0xac0 net/core/skbuff.c:210
       alloc_skb include/linux/skbuff.h:1051 [inline]
       alloc_skb_with_frags+0x18c/0xa70 net/core/skbuff.c:5766
       sock_alloc_send_pskb+0xada/0xc60 net/core/sock.c:2242
       tun_alloc_skb drivers/net/tun.c:1529 [inline]
       tun_get_user+0x10ae/0x6f60 drivers/net/tun.c:1843
       tun_chr_write_iter+0x1f2/0x360 drivers/net/tun.c:2026
       call_write_iter include/linux/fs.h:1901 [inline]
       new_sync_write fs/read_write.c:483 [inline]
       __vfs_write+0xa5a/0xca0 fs/read_write.c:496
       vfs_write+0x44a/0x8f0 fs/read_write.c:558
       ksys_write+0x267/0x450 fs/read_write.c:611
       __do_sys_write fs/read_write.c:623 [inline]
       __se_sys_write fs/read_write.c:620 [inline]
       __ia32_sys_write+0xdb/0x120 fs/read_write.c:620
       do_syscall_32_irqs_on arch/x86/entry/common.c:339 [inline]
       do_fast_syscall_32+0x3c7/0x6e0 arch/x86/entry/common.c:410
       entry_SYSENTER_compat+0x68/0x77 arch/x86/entry/entry_64_compat.S:139
      
      Fixes: 95f5c64c ("gre: Move utility functions to common headers")
      Fixes: c5441932 ("GRE: Refactor GRE tunneling code.")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
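
The gre commit above relies on looking at one byte past the GRE options without consuming it and without reading beyond the received data. Below is a user-space sketch of that pattern. `struct pkt`, `peek_byte()` and `wccp_adjust_hdr_len()` are hypothetical names, and the rule that a first nibble of 4 means an IPv4 header follows directly is an assumption made for this illustration. In the kernel, `skb_header_pointer()` plays the role of the bounds-checked peek.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a received packet buffer. */
struct pkt {
    const unsigned char *data;
    size_t len; /* number of valid bytes in data */
};

/* Return a pointer to the byte at off, or NULL if it lies past the data. */
static const unsigned char *peek_byte(const struct pkt *p, size_t off)
{
    assert(p && p->data);
    return off < p->len ? &p->data[off] : NULL;
}

/*
 * Decide whether the header is four bytes longer by inspecting the byte
 * right after it. Returns the adjusted header length, or -1 when that
 * byte was never received. The packet itself is left untouched.
 */
static long wccp_adjust_hdr_len(const struct pkt *p, size_t hdr_len)
{
    const unsigned char *val = peek_byte(p, hdr_len);

    if (!val)
        return -1;
    if ((*val & 0xF0) != 0x40) /* assumed: not an IPv4 header yet */
        hdr_len += 4;
    return (long)hdr_len;
}
```

The point of the peek is that a short packet yields an error instead of a read of uninitialized memory, and nothing is pulled when no extra bytes turn out to be needed.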
    • ipvlan: do not add hardware address of master to its unicast filter list · 63aae7b1
      Jiri Wiesner authored
      There is a problem when ipvlan slaves are created on a master device that
      is a vmxnet3 device (ipvlan in VMware guests). The vmxnet3 driver does not
      support unicast address filtering. When an ipvlan device is brought up in
      ipvlan_open(), the ipvlan driver calls dev_uc_add() to add the hardware
      address of the vmxnet3 master device to the unicast address list of the
      master device, phy_dev->uc. This inevitably leads to the vmxnet3 master
      device being forced into promiscuous mode by __dev_set_rx_mode().
      
      Promiscuous mode is switched on the master despite the fact that there is
      still only one hardware address that the master device should use for
      filtering in order for the ipvlan device to be able to receive packets.
      The comment above struct net_device describes the uc_promisc member as a
      "counter, that indicates, that promiscuous mode has been enabled due to
      the need to listen to additional unicast addresses in a device that does
      not implement ndo_set_rx_mode()". Moreover, the design of ipvlan
      guarantees that only the hardware address of a master device,
      phy_dev->dev_addr, will be used to transmit and receive all packets from
      its ipvlan slaves. Thus, the unicast address list of the master device
      should not be modified by ipvlan_open() and ipvlan_stop() in order to make
      ipvlan a workable option on masters that do not support unicast address
      filtering.
      
      Fixes: 2ad7bf36 ("ipvlan: Initial check-in of the IPVLAN driver")
      Reported-by: Per Sundstrom <per.sundstrom@redqube.se>
      Signed-off-by: Jiri Wiesner <jwiesner@suse.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Mahesh Bandewar <maheshb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 07 Mar, 2020 10 commits
    • rhashtable: Document the right function parameters · aeaa925b
      Jonathan Neuschäfer authored
      rhashtable_lookup_get_insert_key doesn't have a parameter `data`. It
      does have a parameter `key`, however.
      Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • MAINTAINERS: remove bouncing pkaustub@cisco.com from enic · 03138e2b
      Jakub Kicinski authored
      pkaustub@cisco.com is bouncing, remove it.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Acked-by: Christian Benvenuti <benve@cisco.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ionic: fix vf op lock usage · e396ce5f
      Shannon Nelson authored
      These are a couple of read locks that should be write locks.
      
      Fixes: fbb39807 ("ionic: support sr-iov operations")
      Signed-off-by: Shannon Nelson <snelson@pensando.io>
      Reviewed-by: Parav Pandit <parav@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding/alb: make sure arp header is pulled before accessing it · b7469e83
      Eric Dumazet authored
      Similar to commit 38f88c45 ("bonding/alb: properly access headers
      in bond_alb_xmit()"), we need to make sure arp header was pulled
      in skb->head before blindly accessing it in rlb_arp_xmit().
      
      Remove arp_pkt() private helper, since it is more readable/obvious
      to have the following construct back to back :
      
      	if (!pskb_network_may_pull(skb, sizeof(*arp)))
      		return NULL;
      	arp = (struct arp_pkt *)skb_network_header(skb);
      
      syzbot reported :
      
      BUG: KMSAN: uninit-value in bond_slave_has_mac_rx include/net/bonding.h:704 [inline]
      BUG: KMSAN: uninit-value in rlb_arp_xmit drivers/net/bonding/bond_alb.c:662 [inline]
      BUG: KMSAN: uninit-value in bond_alb_xmit+0x575/0x25e0 drivers/net/bonding/bond_alb.c:1477
      CPU: 0 PID: 12743 Comm: syz-executor.4 Not tainted 5.6.0-rc2-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1c9/0x220 lib/dump_stack.c:118
       kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:118
       __msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:215
       bond_slave_has_mac_rx include/net/bonding.h:704 [inline]
       rlb_arp_xmit drivers/net/bonding/bond_alb.c:662 [inline]
       bond_alb_xmit+0x575/0x25e0 drivers/net/bonding/bond_alb.c:1477
       __bond_start_xmit drivers/net/bonding/bond_main.c:4257 [inline]
       bond_start_xmit+0x85d/0x2f70 drivers/net/bonding/bond_main.c:4282
       __netdev_start_xmit include/linux/netdevice.h:4524 [inline]
       netdev_start_xmit include/linux/netdevice.h:4538 [inline]
       xmit_one net/core/dev.c:3470 [inline]
       dev_hard_start_xmit+0x531/0xab0 net/core/dev.c:3486
       __dev_queue_xmit+0x37de/0x4220 net/core/dev.c:4063
       dev_queue_xmit+0x4b/0x60 net/core/dev.c:4096
       packet_snd net/packet/af_packet.c:2967 [inline]
       packet_sendmsg+0x8347/0x93b0 net/packet/af_packet.c:2992
       sock_sendmsg_nosec net/socket.c:652 [inline]
       sock_sendmsg net/socket.c:672 [inline]
       __sys_sendto+0xc1b/0xc50 net/socket.c:1998
       __do_sys_sendto net/socket.c:2010 [inline]
       __se_sys_sendto+0x107/0x130 net/socket.c:2006
       __x64_sys_sendto+0x6e/0x90 net/socket.c:2006
       do_syscall_64+0xb8/0x160 arch/x86/entry/common.c:296
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x45c479
      Code: ad b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007fc77ffbbc78 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
      RAX: ffffffffffffffda RBX: 00007fc77ffbc6d4 RCX: 000000000045c479
      RDX: 000000000000000e RSI: 00000000200004c0 RDI: 0000000000000003
      RBP: 000000000076bf20 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
      R13: 0000000000000a04 R14: 00000000004cc7b0 R15: 000000000076bf2c
      
      Uninit was created at:
       kmsan_save_stack_with_flags mm/kmsan/kmsan.c:144 [inline]
       kmsan_internal_poison_shadow+0x66/0xd0 mm/kmsan/kmsan.c:127
       kmsan_slab_alloc+0x8a/0xe0 mm/kmsan/kmsan_hooks.c:82
       slab_alloc_node mm/slub.c:2793 [inline]
       __kmalloc_node_track_caller+0xb40/0x1200 mm/slub.c:4401
       __kmalloc_reserve net/core/skbuff.c:142 [inline]
       __alloc_skb+0x2fd/0xac0 net/core/skbuff.c:210
       alloc_skb include/linux/skbuff.h:1051 [inline]
       alloc_skb_with_frags+0x18c/0xa70 net/core/skbuff.c:5766
       sock_alloc_send_pskb+0xada/0xc60 net/core/sock.c:2242
       packet_alloc_skb net/packet/af_packet.c:2815 [inline]
       packet_snd net/packet/af_packet.c:2910 [inline]
       packet_sendmsg+0x66a0/0x93b0 net/packet/af_packet.c:2992
       sock_sendmsg_nosec net/socket.c:652 [inline]
       sock_sendmsg net/socket.c:672 [inline]
       __sys_sendto+0xc1b/0xc50 net/socket.c:1998
       __do_sys_sendto net/socket.c:2010 [inline]
       __se_sys_sendto+0x107/0x130 net/socket.c:2006
       __x64_sys_sendto+0x6e/0x90 net/socket.c:2006
       do_syscall_64+0xb8/0x160 arch/x86/entry/common.c:296
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Cc: Jay Vosburgh <j.vosburgh@gmail.com>
      Cc: Veaceslav Falico <vfalico@gmail.com>
      Cc: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b7469e83
    • Merge branch 'QorIQ-DPAA-FMan-erratum-A050385-workaround' · 172fd3eb
      David S. Miller authored
      Madalin Bucur says:
      
      ====================
      QorIQ DPAA FMan erratum A050385 workaround
      
      Changes in v2:
       - added CONFIG_DPAA_ERRATUM_A050385
       - removed unnecessary parentheses
       - changed alignment defines to use only decimal values
      
      The patch set implements the workaround for FMan erratum A050385:
      
      FMAN DMA read or writes under heavy traffic load may cause FMAN
      internal resource leak; thus stopping further packet processing.
      To reproduce this issue when the workaround is not applied, one
      needs to ensure the FMan DMA transaction queue is already full
      when a transaction split occurs so the system must be under high
      traffic load (i.e. multiple ports at line rate). After the erratum
      occurs, the traffic stops. The only SoC impacted by this is the
      LS1043A; the other ARM DPAA 1 SoC and the PPC DPAA 1 SoCs do not
      have this erratum.
      
      The FMAN internal queue can overflow when FMAN splits single
      read or write transactions into multiple smaller transactions
      such that more than 17 AXI transactions are in flight from FMAN
      to interconnect. When the FMAN internal queue overflows, it can
      stall further packet processing. The issue can occur with any one
      of the following three conditions:
      
        1. FMAN AXI transaction crosses 4K address boundary (Errata
               A010022)
        2. FMAN DMA address for an AXI transaction is not 16 byte
               aligned, i.e. the last 4 bits of an address are non-zero
        3. Scatter Gather (SG) frames have more than one SG buffer in
               the SG list and any one of the buffers, except the last
               buffer in the SG list has data size that is not a multiple
               of 16 bytes, i.e., other than 16, 32, 48, 64, etc.
      
      With any one of the above three conditions present, there is
      likelihood of stalled FMAN packet processing, especially under
      stress with multiple ports injecting line-rate traffic.
      
      To avoid situations that stall FMAN packet processing, all of the
      above three conditions must be avoided; therefore, configure the
      system with the following rules:
      
        1. Frame buffers must not span a 4KB address boundary, unless
               the frame start address is 256 byte aligned
        2. All FMAN DMA start addresses (for example, BMAN buffer
               address, FD[address] + FD[offset]) are 16B aligned
        3. SG table and buffer addresses are 16B aligned and the size
               of SG buffers are multiple of 16 bytes, except for the last
               SG buffer that can be of any size.
      
      Additional workaround notes:
      - Address alignment of 64 bytes is recommended for maximally
      efficient system bus transactions (although 16 byte alignment is
      sufficient to avoid the stall condition)
      - To support frame sizes that are larger than 4K bytes, there are
      two options:
        1. Large single buffer frames that span a 4KB page boundary can
               be converted into SG frames to avoid transaction splits at
               the 4KB boundary,
        2. Align the large single buffer to 256B address boundaries,
               ensure that the frame address plus offset is 256B aligned.
      - If software generated SG frames have buffers that are unaligned
      and with random non-multiple of 16 byte lengths, before
      transmitting such frames via FMAN, frames will need to be copied
      into a new single buffer or multiple buffer SG frame that is
      compliant with the three rules listed above.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      172fd3eb
    • dpaa_eth: FMan erratum A050385 workaround · 3c68b8ff
      Madalin Bucur authored
      Align buffers, data start, and SG fragment length to avoid DMA splits.
      These changes prevent the A050385 erratum from manifesting:
      
      FMAN DMA read or writes under heavy traffic load may cause FMAN
      internal resource leak; thus stopping further packet processing.
      
      The FMAN internal queue can overflow when FMAN splits single
      read or write transactions into multiple smaller transactions
      such that more than 17 AXI transactions are in flight from FMAN
      to interconnect. When the FMAN internal queue overflows, it can
      stall further packet processing. The issue can occur with any one
      of the following three conditions:
      
        1. FMAN AXI transaction crosses 4K address boundary (Errata
      	 A010022)
        2. FMAN DMA address for an AXI transaction is not 16 byte
      	 aligned, i.e. the last 4 bits of an address are non-zero
        3. Scatter Gather (SG) frames have more than one SG buffer in
      	 the SG list and any one of the buffers, except the last
      	 buffer in the SG list has data size that is not a multiple
      	 of 16 bytes, i.e., other than 16, 32, 48, 64, etc.
      
      With any one of the above three conditions present, there is
      likelihood of stalled FMAN packet processing, especially under
      stress with multiple ports injecting line-rate traffic.
      
      To avoid situations that stall FMAN packet processing, all of the
      above three conditions must be avoided; therefore, configure the
      system with the following rules:
      
        1. Frame buffers must not span a 4KB address boundary, unless
      	 the frame start address is 256 byte aligned
        2. All FMAN DMA start addresses (for example, BMAN buffer
      	 address, FD[address] + FD[offset]) are 16B aligned
        3. SG table and buffer addresses are 16B aligned and the size
      	 of SG buffers are multiple of 16 bytes, except for the last
      	 SG buffer that can be of any size.
      
      Additional workaround notes:
      - Address alignment of 64 bytes is recommended for maximally
      efficient system bus transactions (although 16 byte alignment is
      sufficient to avoid the stall condition)
      - To support frame sizes that are larger than 4K bytes, there are
      two options:
        1. Large single buffer frames that span a 4KB page boundary can
      	 be converted into SG frames to avoid transaction splits at
      	 the 4KB boundary,
        2. Align the large single buffer to 256B address boundaries,
      	 ensure that the frame address plus offset is 256B aligned.
      - If software generated SG frames have buffers that are unaligned
      and with random non-multiple of 16 byte lengths, before
      transmitting such frames via FMAN, frames will need to be copied
      into a new single buffer or multiple buffer SG frame that is
      compliant with the three rules listed above.
      Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3c68b8ff
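The three configuration rules quoted in the dpaa_eth commit above can be read as simple address and length predicates. The following is a minimal userspace sketch of those checks, not the driver's actual code; the names `crosses_4k`, `single_buffer_ok`, `sg_frame_ok` and `struct sg_buf` are hypothetical and chosen only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FMAN_DMA_ALIGN    16u   /* rules 2 and 3: 16-byte alignment */
#define FMAN_BOUNDARY     4096u /* rule 1: 4 KB address boundary */
#define FMAN_LARGE_ALIGN  256u  /* rule 1 exception: 256-byte aligned start */

/* True when the byte range [addr, addr + len) spans a 4 KB boundary. */
static bool crosses_4k(uint64_t addr, size_t len)
{
	if (len == 0)
		return false;
	return (addr / FMAN_BOUNDARY) != ((addr + len - 1) / FMAN_BOUNDARY);
}

/* Rules 1 and 2 applied to a single-buffer frame starting at addr. */
static bool single_buffer_ok(uint64_t addr, size_t len)
{
	if (addr % FMAN_DMA_ALIGN)
		return false;	/* rule 2: start is not 16-byte aligned */
	if (crosses_4k(addr, len) && (addr % FMAN_LARGE_ALIGN))
		return false;	/* rule 1: spans 4 KB without 256-byte alignment */
	return true;
}

/* Hypothetical scatter-gather entry: buffer address and data size. */
struct sg_buf {
	uint64_t addr;
	size_t len;
};

/* Rule 3 only: aligned addresses, and every buffer except the last
 * carries a multiple of 16 bytes. */
static bool sg_frame_ok(const struct sg_buf *sg, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (sg[i].addr % FMAN_DMA_ALIGN)
			return false;
		if (i + 1 < n && (sg[i].len % FMAN_DMA_ALIGN))
			return false;
	}
	return true;
}
```

A frame that fails one of these predicates would, per the workaround notes, have to be copied into a compliant single buffer or SG frame before being handed to FMan.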
    • fsl/fman: detect FMan erratum A050385 · b281f7b9
      Madalin Bucur authored
      Detect the presence of the A050385 erratum.
      Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b281f7b9
    • arm64: dts: ls1043a: FMan erratum A050385 · b54d3900
      Madalin Bucur authored
      The LS1043A SoC is affected by the A050385 erratum stating that
      FMAN DMA read or writes under heavy traffic load may cause FMAN
      internal resource leak thus stopping further packet processing.
      Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b54d3900
    • dt-bindings: net: FMan erratum A050385 · 26d5bb9e
      Madalin Bucur authored
      FMAN DMA read or writes under heavy traffic load may cause FMAN
      internal resource leak; thus stopping further packet processing.
      
      The FMAN internal queue can overflow when FMAN splits single
      read or write transactions into multiple smaller transactions
      such that more than 17 AXI transactions are in flight from FMAN
      to interconnect. When the FMAN internal queue overflows, it can
      stall further packet processing. The issue can occur with any one
      of the following three conditions:
      
        1. FMAN AXI transaction crosses 4K address boundary (Errata
           A010022)
        2. FMAN DMA address for an AXI transaction is not 16 byte
           aligned, i.e. the last 4 bits of an address are non-zero
        3. Scatter Gather (SG) frames have more than one SG buffer in
           the SG list and any one of the buffers, except the last
           buffer in the SG list has data size that is not a multiple
           of 16 bytes, i.e., other than 16, 32, 48, 64, etc.
      
      With any one of the above three conditions present, there is
      likelihood of stalled FMAN packet processing, especially under
      stress with multiple ports injecting line-rate traffic.
      
      To avoid situations that stall FMAN packet processing, all of the
      above three conditions must be avoided; therefore, configure the
      system with the following rules:
      
        1. Frame buffers must not span a 4KB address boundary, unless
           the frame start address is 256 byte aligned
        2. All FMAN DMA start addresses (for example, BMAN buffer
           address, FD[address] + FD[offset]) are 16B aligned
        3. SG table and buffer addresses are 16B aligned and the size
           of SG buffers are multiple of 16 bytes, except for the last
           SG buffer that can be of any size.
      
      Additional workaround notes:
      - Address alignment of 64 bytes is recommended for maximally
      efficient system bus transactions (although 16 byte alignment is
      sufficient to avoid the stall condition)
      - To support frame sizes that are larger than 4K bytes, there are
      two options:
        1. Large single buffer frames that span a 4KB page boundary can
           be converted into SG frames to avoid transaction splits at
           the 4KB boundary,
        2. Align the large single buffer to 256B address boundaries,
           ensure that the frame address plus offset is 256B aligned.
      - If software generated SG frames have buffers that are unaligned
      and with random non-multiple of 16 byte lengths, before
      transmitting such frames via FMAN, frames will need to be copied
      into a new single buffer or multiple buffer SG frame that is
      compliant with the three rules listed above.
      Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      26d5bb9e
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · 357ddbb9
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for net:
      
      1) Patches to bump position index from sysctl seq_next,
         from Vasily Averin.
      
      2) Release flowtable hook from error path, from Florian Westphal.
      
      3) Patches to add missing netlink attribute validation,
         from Jakub Kicinski.
      
      4) Missing NFTA_CHAIN_FLAGS in nf_tables_fill_chain_info().
      
      5) Infinite loop in module autoload if extension is not available,
         from Florian Westphal.
      
      6) Missing module ownership in inet/nat chain type definition.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      357ddbb9
  4. 06 Mar, 2020 4 commits
    • netfilter: nft_chain_nat: inet family is missing module ownership · 6a42cefb
      Pablo Neira Ayuso authored
      Set owner to THIS_MODULE, otherwise the nft_chain_nat module might be
      removed while there are still inet/nat chains in place.
      
      [  117.942096] BUG: unable to handle page fault for address: ffffffffa0d5e040
      [  117.942101] #PF: supervisor read access in kernel mode
      [  117.942103] #PF: error_code(0x0000) - not-present page
      [  117.942106] PGD 200c067 P4D 200c067 PUD 200d063 PMD 3dc909067 PTE 0
      [  117.942113] Oops: 0000 [#1] PREEMPT SMP PTI
      [  117.942118] CPU: 3 PID: 27 Comm: kworker/3:0 Not tainted 5.6.0-rc3+ #348
      [  117.942133] Workqueue: events nf_tables_trans_destroy_work [nf_tables]
      [  117.942145] RIP: 0010:nf_tables_chain_destroy.isra.0+0x94/0x15a [nf_tables]
      [  117.942149] Code: f6 45 54 01 0f 84 d1 00 00 00 80 3b 05 74 44 48 8b 75 e8 48 c7 c7 72 be de a0 e8 56 e6 2d e0 48 8b 45 e8 48 c7 c7 7f be de a0 <48> 8b 30 e8 43 e6 2d e0 48 8b 45 e8 48 8b 40 10 48 85 c0 74 5b 8b
      [  117.942152] RSP: 0018:ffffc9000015be10 EFLAGS: 00010292
      [  117.942155] RAX: ffffffffa0d5e040 RBX: ffff88840be87fc2 RCX: 0000000000000007
      [  117.942158] RDX: 0000000000000007 RSI: 0000000000000086 RDI: ffffffffa0debe7f
      [  117.942160] RBP: ffff888403b54b50 R08: 0000000000001482 R09: 0000000000000004
      [  117.942162] R10: 0000000000000000 R11: 0000000000000001 R12: ffff8883eda7e540
      [  117.942164] R13: dead000000000122 R14: dead000000000100 R15: ffff888403b3db80
      [  117.942167] FS:  0000000000000000(0000) GS:ffff88840e4c0000(0000) knlGS:0000000000000000
      [  117.942169] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  117.942172] CR2: ffffffffa0d5e040 CR3: 00000003e4c52002 CR4: 00000000001606e0
      [  117.942174] Call Trace:
      [  117.942188]  nf_tables_trans_destroy_work.cold+0xd/0x12 [nf_tables]
      [  117.942196]  process_one_work+0x1d6/0x3b0
      [  117.942200]  worker_thread+0x45/0x3c0
      [  117.942203]  ? process_one_work+0x3b0/0x3b0
      [  117.942210]  kthread+0x112/0x130
      [  117.942214]  ? kthread_create_worker_on_cpu+0x40/0x40
      [  117.942221]  ret_from_fork+0x35/0x40
      
      nf_tables_chain_destroy() crashes on module_put() because the module is
      gone.
      
      Fixes: d164385e ("netfilter: nat: add inet family nat support")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      6a42cefb
    • mptcp: always include dack if possible. · 2398e399
      Paolo Abeni authored
      Currently a passive MPTCP socket can skip including the DACK
      option if the peer sends data before accept() completes.
      
      The above happens because the msk 'can_ack' flag is set
      only after the accept() call.
      
      Such missing DACK option may cause - as per RFC spec -
      unwanted fallback to TCP.
      
      This change addresses the issue using the key material
      available in the current subflow, if any, to create a suitable
      dack option when msk ack seq is not yet available.
      
      v1 -> v2:
       - advance the generated ack after the initial MPC packet
      
      Fixes: d22f4988 ("mptcp: process MP_CAPABLE data option")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2398e399
    • net: nfc: fix bounds checking bugs on "pipe" · a3aefbfe
      Dan Carpenter authored
      This is similar to commit 674d9de0 ("NFC: Fix possible memory
      corruption when handling SHDLC I-Frame commands") and commit d7ee81ad
      ("NFC: nci: Add some bounds checking in nci_hci_cmd_received()") which
      added range checks on "pipe".
      
      The "pipe" variable comes from skb->data[0] in nfc_hci_msg_rx_work(),
      so it is in the 0-255 range.  We use it as the index into the
      hdev->pipes[] array, which has only NFC_HCI_MAX_PIPES (128) members.
      
      Fixes: 118278f2 ("NFC: hci: Add pipes table to reference them with a tuple {gate, host}")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a3aefbfe
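The range check described in the NFC commit above follows a common pattern: a one-byte index taken from received data can reach 255, so it must be compared against the table size before use. Below is a minimal userspace sketch of that pattern, not the kernel patch itself; `MAX_PIPES` stands in for NFC_HCI_MAX_PIPES, and `struct pipe_entry`, `struct hci_dev_sketch` and `lookup_pipe` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_PIPES 128	/* stands in for NFC_HCI_MAX_PIPES */

/* Hypothetical pipe table entry. */
struct pipe_entry {
	uint8_t gate;
	uint8_t dest_host;
};

/* Hypothetical device holding the pipe table. */
struct hci_dev_sketch {
	struct pipe_entry pipes[MAX_PIPES];
};

/* Returns the table entry named by the first byte of data, or NULL
 * when the message is empty or the index is outside the table. */
static const struct pipe_entry *lookup_pipe(const struct hci_dev_sketch *hdev,
					    const uint8_t *data, size_t len)
{
	uint8_t pipe;

	if (len < 1)
		return NULL;
	pipe = data[0];			/* untrusted, 0..255 */
	if (pipe >= MAX_PIPES)
		return NULL;		/* would index past pipes[] */
	return &hdev->pipes[pipe];
}
```

Returning NULL here is only one way to reject the message; what matters is that the check runs before the array access.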
    • veth: ignore peer tx_dropped when counting local rx_dropped · e25d5dbc
      Jiang Lidong authored
      When the local NET_RX backlog is full due to traffic overrun,
      the peer veth tx_dropped counter increases. Listing the local
      veth stats at that time shows rx_dropped at double the peer's
      tx_dropped, even larger than the number of packets the peer sent.
      
      In NET_RX softirq process, if any packet drop case happens,
      it increases dev's rx_dropped counter and returns NET_RX_DROP.
      
      At veth tx side, it records any error returned from peer netif_rx
      into local dev tx_dropped counter.
      
      In the veth get-stats path, local dev rx_dropped and peer dev
      tx_dropped are added together as the local rx_dropped value,
      so it shows double the real number of dropped packets in
      this case.
      
      This patch ignores peer tx_dropped when counting local rx_dropped,
      since peer tx_dropped is duplicated in local rx_dropped in most cases.
      Signed-off-by: Jiang Lidong <jianglidong3@jd.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e25d5dbc
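The double counting described in the veth commit above can be reproduced with two counters. This is a simplified model of the accounting, not the veth driver's statistics code; `struct dev_counters`, `drop_one`, `rx_dropped_old` and `rx_dropped_new` are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical per-device drop counters. */
struct dev_counters {
	uint64_t rx_dropped;
	uint64_t tx_dropped;
};

/* One packet sent by peer is dropped at the backlog of local. */
static void drop_one(struct dev_counters *local, struct dev_counters *peer)
{
	local->rx_dropped++;	/* receive path counts the drop */
	peer->tx_dropped++;	/* sender records the returned drop too */
}

/* Reporting before the patch: both counters are summed. */
static uint64_t rx_dropped_old(const struct dev_counters *local,
			       const struct dev_counters *peer)
{
	return local->rx_dropped + peer->tx_dropped;
}

/* Reporting after the patch: peer tx_dropped is ignored. */
static uint64_t rx_dropped_new(const struct dev_counters *local)
{
	return local->rx_dropped;
}
```

After three calls to `drop_one`, the old report yields 6 while the new one yields 3, which matches the "double value" symptom in the commit message.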
  5. 05 Mar, 2020 6 commits
    • Merge tag 'wireless-drivers-2020-03-05' of... · 2f63f2d5
      David S. Miller authored
      Merge tag 'wireless-drivers-2020-03-05' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers
      
      Kalle Valo says:
      
      ====================
      wireless-drivers fixes for v5.6
      
      Second set of fixes for v5.6. Only two small fixes this time.
      
      iwlwifi
      
      * fix another initialisation regression with 3168 devices
      
      mt76
      
      * fix memory corruption with too many rx fragments
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2f63f2d5
    • sfc: complete the next packet when we receive a timestamp · 3b4f06c7
      Tom Zhao authored
      We now ignore the "completion" event when using tx queue timestamping,
      and only pay attention to the two (high and low) timestamp events. The
      NIC will send a pair of timestamp events for every packet transmitted.
      The current firmware may merge the completion events, and it is possible
      that future versions may reorder the completion and timestamp events.
      As such the completion event is not useful.
      
      Without this patch in place a merged completion event on a queue with
      timestamping will cause a "spurious TX completion" error. This affects
      SFN8000-series adapters.
      Signed-off-by: Tom Zhao <tzhao@solarflare.com>
      Acked-by: Martin Habets <mhabets@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3b4f06c7
    • net: hns3: fix a not link up issue when fibre port supports autoneg · 68e1006f
      Jian Shen authored
      When a fibre port supports auto-negotiation, the IMP (Intelligent
      Management Process) handles the auto-negotiated speed and the
      user-configured speed separately.
      In the following case, the port fails to link up:
      step 1: disable auto-negotiation and set speed to A; the
      driver's MAC speed is updated to A.
      step 2: enable auto-negotiation; the MAC negotiates speed B and
      the driver's MAC speed is updated to B by the periodic query
      task.
      step 3: the MAC negotiates a new speed, A.
      step 4: disable auto-negotiation and set speed to B before the
      periodic task queries the new MAC speed A; the driver ignores
      the speed configuration because its cached MAC speed is still B.
      
      This patch fixes it by skipping the speed and duplex check when
      the fibre port supports auto-negotiation.
      
      Fixes: 22f48e24 ("net: hns3: add autoneg and change speed support for fibre port")
      Signed-off-by: Jian Shen <shenjian15@huawei.com>
      Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      68e1006f
    • slip: make slhc_compress() more robust against malicious packets · 110a40df
      Eric Dumazet authored
      Before accessing various fields in the IPv4 network header
      and the TCP header, make sure the packet:
      
      - Has IP version 4 (ip->version == 4)
      - Does not have a silly network header length (ip->ihl >= 5)
      - Is big enough to hold network and transport headers
      - Does not have a silly TCP header size (th->doff >= sizeof(struct tcphdr) / 4)
      
      syzbot reported :
      
      BUG: KMSAN: uninit-value in slhc_compress+0x5b9/0x2e60 drivers/net/slip/slhc.c:270
      CPU: 0 PID: 11728 Comm: syz-executor231 Not tainted 5.6.0-rc2-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1c9/0x220 lib/dump_stack.c:118
       kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:118
       __msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:215
       slhc_compress+0x5b9/0x2e60 drivers/net/slip/slhc.c:270
       ppp_send_frame drivers/net/ppp/ppp_generic.c:1637 [inline]
       __ppp_xmit_process+0x1902/0x2970 drivers/net/ppp/ppp_generic.c:1495
       ppp_xmit_process+0x147/0x2f0 drivers/net/ppp/ppp_generic.c:1516
       ppp_write+0x6bb/0x790 drivers/net/ppp/ppp_generic.c:512
       do_loop_readv_writev fs/read_write.c:717 [inline]
       do_iter_write+0x812/0xdc0 fs/read_write.c:1000
       compat_writev+0x2df/0x5a0 fs/read_write.c:1351
       do_compat_pwritev64 fs/read_write.c:1400 [inline]
       __do_compat_sys_pwritev fs/read_write.c:1420 [inline]
       __se_compat_sys_pwritev fs/read_write.c:1414 [inline]
       __ia32_compat_sys_pwritev+0x349/0x3f0 fs/read_write.c:1414
       do_syscall_32_irqs_on arch/x86/entry/common.c:339 [inline]
       do_fast_syscall_32+0x3c7/0x6e0 arch/x86/entry/common.c:410
       entry_SYSENTER_compat+0x68/0x77 arch/x86/entry/entry_64_compat.S:139
      RIP: 0023:0xf7f7cd99
      Code: 90 e8 0b 00 00 00 f3 90 0f ae e8 eb f9 8d 74 26 00 89 3c 24 c3 90 90 90 90 90 90 90 90 90 90 90 90 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 eb 0d 90 90 90 90 90 90 90 90 90 90 90 90
      RSP: 002b:00000000ffdb84ac EFLAGS: 00000217 ORIG_RAX: 000000000000014e
      RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000200001c0
      RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000003
      RBP: 0000000040047459 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
      R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
      
      Uninit was created at:
       kmsan_save_stack_with_flags mm/kmsan/kmsan.c:144 [inline]
       kmsan_internal_poison_shadow+0x66/0xd0 mm/kmsan/kmsan.c:127
       kmsan_slab_alloc+0x8a/0xe0 mm/kmsan/kmsan_hooks.c:82
       slab_alloc_node mm/slub.c:2793 [inline]
       __kmalloc_node_track_caller+0xb40/0x1200 mm/slub.c:4401
       __kmalloc_reserve net/core/skbuff.c:142 [inline]
       __alloc_skb+0x2fd/0xac0 net/core/skbuff.c:210
       alloc_skb include/linux/skbuff.h:1051 [inline]
       ppp_write+0x115/0x790 drivers/net/ppp/ppp_generic.c:500
       do_loop_readv_writev fs/read_write.c:717 [inline]
       do_iter_write+0x812/0xdc0 fs/read_write.c:1000
       compat_writev+0x2df/0x5a0 fs/read_write.c:1351
       do_compat_pwritev64 fs/read_write.c:1400 [inline]
       __do_compat_sys_pwritev fs/read_write.c:1420 [inline]
       __se_compat_sys_pwritev fs/read_write.c:1414 [inline]
       __ia32_compat_sys_pwritev+0x349/0x3f0 fs/read_write.c:1414
       do_syscall_32_irqs_on arch/x86/entry/common.c:339 [inline]
       do_fast_syscall_32+0x3c7/0x6e0 arch/x86/entry/common.c:410
       entry_SYSENTER_compat+0x68/0x77 arch/x86/entry/entry_64_compat.S:139
      
      Fixes: b5451d78 ("slip: Move the SLIP drivers")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      110a40df
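The four sanity checks listed in the slhc_compress() commit above can be sketched on a raw byte buffer. This is a userspace illustration under the assumption of a plain IPv4 packet with a TCP payload, not the code added to drivers/net/slip/slhc.c; `ipv4_tcp_headers_ok` and the two constants are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IPV4_MIN_IHL	5	/* 5 words of 4 bytes: 20-byte IPv4 header */
#define TCP_MIN_DOFF	5	/* 5 words of 4 bytes: 20-byte TCP header */

/* Returns true only when every header field read below lies inside
 * the buffer and the header lengths are plausible. */
static bool ipv4_tcp_headers_ok(const uint8_t *pkt, size_t len)
{
	size_t ihl, ip_len, doff;

	if (len < IPV4_MIN_IHL * 4)
		return false;		/* no room for a minimal IPv4 header */
	if ((pkt[0] >> 4) != 4)
		return false;		/* IP version must be 4 */
	ihl = pkt[0] & 0x0f;
	if (ihl < IPV4_MIN_IHL)
		return false;		/* silly network header length */
	ip_len = ihl * 4;
	if (len < ip_len + TCP_MIN_DOFF * 4)
		return false;		/* no room for a minimal TCP header */
	doff = pkt[ip_len + 12] >> 4;	/* TCP data offset, in 32-bit words */
	if (doff < TCP_MIN_DOFF)
		return false;		/* silly TCP header size */
	if (len < ip_len + doff * 4)
		return false;		/* TCP options run past the buffer */
	return true;
}
```

The order matters: each length check comes before the read it protects, which is what keeps uninitialized or out-of-range bytes from being interpreted as header fields.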
    • netfilter: nf_tables: fix infinite loop when expr is not available · 1d305ba4
      Florian Westphal authored
      nft will loop forever if the kernel doesn't support an expression:
      
      1. nft_expr_type_get() appends the family specific name to the module list.
      2. -EAGAIN is returned to nfnetlink, nfnetlink calls abort path.
      3. abort path sets ->done to true and calls request_module for the
         expression.
      4. nfnetlink replays the batch, we end up in nft_expr_type_get() again.
      5. nft_expr_type_get attempts to append family-specific name. This
         one already exists on the list, so we continue
      6. nft_expr_type_get adds the generic expression name to the module
         list. -EAGAIN is returned, nfnetlink calls abort path.
      7. abort path encounters the family-specific expression which
         has 'done' set, so it gets removed.
      8. abort path requests the generic expression name, sets done to true.
      9. batch is replayed.
      
      If the expression could not be loaded, then we will end up back at 1),
      because the family-specific name got removed and the cycle starts again.
      
      Note that userspace can SIGKILL the nft process to stop the cycle, but
      the desired behaviour is to return an error after the generic expr name
      fails to load the expression.
      
      Fixes: eb014de4 ("netfilter: nf_tables: autoload modules from the abort path")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      1d305ba4
    • netfilter: nf_tables: dump NFTA_CHAIN_FLAGS attribute · d78008de
      Pablo Neira Ayuso authored
      Missing NFTA_CHAIN_FLAGS netlink attribute when dumping basechain
      definitions.
      
      Fixes: c9626a2c ("netfilter: nf_tables: add hardware offload support")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      d78008de
  6. 04 Mar, 2020 4 commits
    • drivers/of/of_mdio.c:fix of_mdiobus_register() · 209c65b6
      Dajun Jin authored
      When a phy_device is registered successfully, the loop should
      terminate; otherwise the same phy_device would also be registered at
      other addresses. With multiple PHYs without reg properties, this goes wrong.
      Signed-off-by: Dajun Jin <adajunjin@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      209c65b6
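The effect of terminating the address scan on success, as described in the of_mdio commit above, can be shown with a small simulation. This is a sketch of the loop structure only, not the code in drivers/of/of_mdio.c; `struct bus_sketch`, `bus_init`, `register_phy`, `scan_children` and `count_owned` are hypothetical names, and the `stop_on_success` flag exists only to compare the two behaviours.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_ADDR 32	/* an MDIO bus has 32 addresses */

/* Hypothetical bus model. */
struct bus_sketch {
	bool responds[MAX_ADDR];	/* a PHY answers at this address */
	int owner[MAX_ADDR];		/* child registered here, or -1 */
};

static void bus_init(struct bus_sketch *bus)
{
	for (int addr = 0; addr < MAX_ADDR; addr++) {
		bus->responds[addr] = false;
		bus->owner[addr] = -1;
	}
}

/* Stand-in for PHY registration: 0 on success, -1 if no PHY answers. */
static int register_phy(struct bus_sketch *bus, int child, int addr)
{
	if (!bus->responds[addr])
		return -1;
	bus->owner[addr] = child;
	return 0;
}

/* Scan free addresses for each child that has no reg property. */
static void scan_children(struct bus_sketch *bus, int nchildren,
			  bool stop_on_success)
{
	for (int child = 0; child < nchildren; child++) {
		for (int addr = 0; addr < MAX_ADDR; addr++) {
			if (bus->owner[addr] >= 0)
				continue;	/* address already taken */
			if (register_phy(bus, child, addr) == 0 &&
			    stop_on_success)
				break;		/* this child is done */
		}
	}
}

/* Number of addresses at which a given child ended up registered. */
static int count_owned(const struct bus_sketch *bus, int child)
{
	int n = 0;

	for (int addr = 0; addr < MAX_ADDR; addr++)
		if (bus->owner[addr] == child)
			n++;
	return n;
}
```

With PHYs answering at two addresses and two children, stopping on success gives each child one address; without the break, the first child claims both and the second gets none.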
    • cxgb4: fix checks for max queues to allocate · 116ca924
      Vishal Kulkarni authored
      Hardware can support more than the 8 queues currently imposed as a
      limit by netif_get_num_default_rss_queues(). So, rework and fix the
      checks for the max number of queues to allocate. The checks should be
      based on how many are actually supported by hardware, or the number
      of online cpus, whichever is lower.
      
      Fixes: 5952dde7 ("cxgb4: set maximal number of default RSS queues")
      Signed-off-by: Vishal Kulkarni <vishal@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      116ca924
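The sizing rule stated in the cxgb4 commit above is a minimum of two values. The sketch below contrasts it with a fixed cap of 8, which is how the commit message describes the previous limit; it is an illustration, not the driver's code, and `min_uint`, `queues_old`, `queues_new` and `DEFAULT_RSS_CAP` are hypothetical names.

```c
/* Cap of 8 taken from the commit message's description of the old limit. */
#define DEFAULT_RSS_CAP 8u

static unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Before: the queue count never exceeds the default cap. */
static unsigned int queues_old(unsigned int hw_supported,
			       unsigned int online_cpus)
{
	return min_uint(hw_supported, min_uint(online_cpus, DEFAULT_RSS_CAP));
}

/* After: hardware support or online CPUs, whichever is lower. */
static unsigned int queues_new(unsigned int hw_supported,
			       unsigned int online_cpus)
{
	return min_uint(hw_supported, online_cpus);
}
```

On a machine with 16 online CPUs and hardware that supports 32 queues, the old rule yields 8 and the new rule yields 16.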
    • phylink: Improve error message when validate failed · 20d8bb0d
      Hauke Mehrtens authored
      This improves the error message when PHY validation in the MAC
      driver fails. Several times I put wrong interface values into the
      device tree and then had to search for why it was failing with
      -22 (-EINVAL). This should make the problem easier to spot.
      Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
      Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      20d8bb0d
    • net: phy: bcm63xx: fix OOPS due to missing driver name · 43de81b0
      Jonas Gorski authored
      719655a1 ("net: phy: Replace phy driver features u32 with link_mode
      bitmap") was a bit over-eager and also removed the second phy driver's
      name, resulting in a nasty OOPS on registration:
      
      [    1.319854] CPU 0 Unable to handle kernel paging request at virtual address 00000000, epc == 804dd50c, ra == 804dd4f0
      [    1.330859] Oops[#1]:
      [    1.333138] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.4.22 #0
      [    1.339217] $ 0   : 00000000 00000001 87ca7f00 805c1874
      [    1.344590] $ 4   : 00000000 00000047 00585000 8701f800
      [    1.349965] $ 8   : 8701f800 804f4a5c 00000003 64726976
      [    1.355341] $12   : 00000001 00000000 00000000 00000114
      [    1.360718] $16   : 87ca7f80 00000000 00000000 80639fe4
      [    1.366093] $20   : 00000002 00000000 806441d0 80b90000
      [    1.371470] $24   : 00000000 00000000
      [    1.376847] $28   : 87c1e000 87c1fda0 80b90000 804dd4f0
      [    1.382224] Hi    : d1c8f8da
      [    1.385180] Lo    : 5518a480
      [    1.388182] epc   : 804dd50c kset_find_obj+0x3c/0x114
      [    1.393345] ra    : 804dd4f0 kset_find_obj+0x20/0x114
      [    1.398530] Status: 10008703 KERNEL EXL IE
      [    1.402833] Cause : 00800008 (ExcCode 02)
      [    1.406952] BadVA : 00000000
      [    1.409913] PrId  : 0002a075 (Broadcom BMIPS4350)
      [    1.414745] Modules linked in:
      [    1.417895] Process swapper/0 (pid: 1, threadinfo=(ptrval), task=(ptrval), tls=00000000)
      [    1.426214] Stack : 87cec000 80630000 80639370 80640658 80640000 80049af4 80639fe4 8063a0d8
      [    1.434816]         8063a0d8 802ef078 00000002 00000000 806441d0 80b90000 8063a0d8 802ef114
      [    1.443417]         87cea0de 87c1fde0 00000000 804de488 87cea000 8063a0d8 8063a0d8 80334e48
      [    1.452018]         80640000 8063984c 80639bf4 00000000 8065de48 00000001 8063a0d8 80334ed0
      [    1.460620]         806441d0 80b90000 80b90000 802ef164 8065dd70 80620000 80b90000 8065de58
      [    1.469222]         ...
      [    1.471734] Call Trace:
      [    1.474255] [<804dd50c>] kset_find_obj+0x3c/0x114
      [    1.479141] [<802ef078>] driver_find+0x1c/0x44
      [    1.483665] [<802ef114>] driver_register+0x74/0x148
      [    1.488719] [<80334e48>] phy_driver_register+0x9c/0xd0
      [    1.493968] [<80334ed0>] phy_drivers_register+0x54/0xe8
      [    1.499345] [<8001061c>] do_one_initcall+0x7c/0x1f4
      [    1.504374] [<80644ed8>] kernel_init_freeable+0x1d4/0x2b4
      [    1.509940] [<804f4e24>] kernel_init+0x10/0xf8
      [    1.514502] [<80018e68>] ret_from_kernel_thread+0x14/0x1c
      [    1.520040] Code: 1060000c  02202025  90650000 <90810000> 24630001  14250004  24840001  14a0fffb  90650000
      [    1.530061]
      [    1.531698] ---[ end trace d52f1717cd29bdc8 ]---
      
      Fix it by re-adding the name.
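
      The failure mode can be sketched in plain C: driver lookup works on
      names, so an entry whose name was left unset carries a NULL pointer
      that is later dereferenced. The struct, the array contents, and the
      helper toy_first_unnamed() below are hypothetical illustrations, not
      the bcm63xx driver or the kernel's driver core.

```c
#include <stddef.h>

/* Hypothetical stand-in for a PHY driver entry; not the kernel's struct. */
struct toy_phy_driver {
    const char *name;      /* lookup compares names, so this must be set */
    unsigned int phy_id;
};

/* Two entries, as in a driver registering two PHY variants (made-up values). */
static struct toy_phy_driver toy_drivers[] = {
    { .name = "toy PHY (1)", .phy_id = 0x1000 },
    { .name = "toy PHY (2)", .phy_id = 0x2000 }, /* omitting .name leaves NULL */
};

/* Return the index of the first entry without a name, or -1 if all are named. */
static int toy_first_unnamed(const struct toy_phy_driver *drv, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (drv[i].name == NULL || drv[i].name[0] == '\0')
            return (int)i;
    return -1;
}
```

      A check of this kind before registration would flag the second entry
      if its name were dropped, which is the situation the trace above shows.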
      
      Fixes: 719655a1 ("net: phy: Replace phy driver features u32 with link_mode bitmap")
      Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
      Acked-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      43de81b0