1. 15 Oct, 2020 16 commits
    • Revert "bpfilter: Fix build error with CONFIG_BPFILTER_UMH" · 2ecbc1f6
      Jakub Kicinski authored
      This reverts commit 1d273fcc.
      
      Alexei points out there's nothing implying headers will be built
      and therefore exist under usr/include, so this fix doesn't make
      much sense.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: fix pos incrementment in ipv6_route_seq_next · 6617dfd4
      Yonghong Song authored
      Commit 4fc427e0 ("ipv6_route_seq_next should increase position index")
      tried to fix the issue where seq_file pos is not increased
      if a NULL element is returned with seq_ops->next(). See bug
        https://bugzilla.kernel.org/show_bug.cgi?id=206283
      The commit effectively does:
        - increase pos for all seq_ops->start()
        - increase pos for all seq_ops->next()
      
      For ipv6_route, increasing pos for all seq_ops->next() is correct.
      But increasing pos for seq_ops->start() is not correct
      since pos is used to determine how many items to skip during
      seq_ops->start():
        iter->skip = *pos;
      seq_ops->start() just fetches the *current* pos item.
      The item can be skipped only after seq_ops->show() which essentially
      is the beginning of seq_ops->next().
      
      For example, I have 7 ipv6 route entries,
        root@arch-fb-vm1:~/net-next dd if=/proc/net/ipv6_route bs=4096
        00000000000000000000000000000000 40 00000000000000000000000000000000 00 00000000000000000000000000000000 00000400 00000001 00000000 00000001     eth0
        fe800000000000000000000000000000 40 00000000000000000000000000000000 00 00000000000000000000000000000000 00000100 00000001 00000000 00000001     eth0
        00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200       lo
        00000000000000000000000000000001 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000003 00000000 80200001       lo
        fe800000000000002050e3fffebd3be8 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000002 00000000 80200001     eth0
        ff000000000000000000000000000000 08 00000000000000000000000000000000 00 00000000000000000000000000000000 00000100 00000004 00000000 00000001     eth0
        00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200       lo
        0+1 records in
        0+1 records out
        1050 bytes (1.0 kB, 1.0 KiB) copied, 0.00707908 s, 148 kB/s
        root@arch-fb-vm1:~/net-next
      
      In the above, I specify buffer size 4096, so all records can be returned
      to user space with a single trip to the kernel.
      
      If I use a buffer size of 128, then since each record is 149 bytes, the
      kernel's seq_read() reads 149 bytes into its internal buffer and returns
      the data to user space in two read() syscalls. The next user read() syscall
      then triggers the next seq_ops->start(). Since the current implementation
      increases pos even in seq_ops->start(), it skips records #2, #4 and #6,
      counting the first record as #1.
      
        root@arch-fb-vm1:~/net-next dd if=/proc/net/ipv6_route bs=128
        00000000000000000000000000000000 40 00000000000000000000000000000000 00 00000000000000000000000000000000 00000400 00000001 00000000 00000001     eth0
        00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200       lo
        fe800000000000002050e3fffebd3be8 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000002 00000000 80200001     eth0
        00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200       lo
        4+1 records in
        4+1 records out
        600 bytes copied, 0.00127758 s, 470 kB/s
      
      To fix the problem, create a fake pos pointer so seq_ops->start()
      won't actually increase seq_file pos. With this fix, the
      above `dd` command with `bs=128` will show correct result.
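
The skipping can be reproduced outside the kernel with a toy model. The sketch below is an illustration only, not the kernel code: `toy_dump()`, `NROUTES` and the assumption that each read() cycle holds exactly one record are invented for this example.

```c
#include <stddef.h>

#define NROUTES 7 /* number of records, as in the example above */

/*
 * Toy model of reading the table when every read() cycle holds a single
 * record: start() fetches the record at index *pos, show() emits it and
 * next() advances pos. If start_bumps_pos is non-zero, start() also
 * advances pos, which is the behaviour being fixed.
 * Stores the 1-based numbers of the records seen and returns their count.
 */
static size_t toy_dump(int start_bumps_pos, int *seen, size_t max)
{
	size_t pos = 0, n = 0;

	while (n < max) {
		size_t skip = pos;		/* start(): iter->skip = *pos */

		if (start_bumps_pos)
			pos++;			/* bug: start() must not move pos */
		if (skip >= NROUTES)
			break;			/* nothing left to fetch */
		seen[n++] = (int)skip + 1;	/* show() */
		pos++;				/* next() */
	}
	return n;
}
```

With `start_bumps_pos` set, the model sees records 1, 3, 5 and 7, which matches the `bs=128` output above; with it cleared, all seven records are seen.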
      
      Fixes: 4fc427e0 ("ipv6_route_seq_next should increase position index")
      Cc: Alexei Starovoitov <ast@kernel.org>
      Suggested-by: Vasily Averin <vvs@virtuozzo.com>
      Reviewed-by: Vasily Averin <vvs@virtuozzo.com>
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Acked-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'net-smc-fixes-2020-10-14' · 0c124aa5
      Jakub Kicinski authored
      Karsten Graul says:
      
      ====================
      net/smc: fixes 2020-10-14
      
      The first patch fixes a possible use-after-free of delayed llc events.
      Patch 2 corrects the number of DMB buffer sizes. And patch 3 ensures
      a correctly formatted return code when smc_ism_register_dmb() fails to
      create a new DMB.
      ====================
      
      Link: https://lore.kernel.org/r/20201014174329.35791-1-kgraul@linux.ibm.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/smc: fix invalid return code in smcd_new_buf_create() · 6b1bbf94
      Karsten Graul authored
      smc_ism_register_dmb() returns error codes set by the ISM driver which
      are not guaranteed to be negative or in the errno range. Such values
      would not be handled by ERR_PTR() and finally the return code will be
      used as a memory address.
      Fix that by using a valid negative errno value with ERR_PTR().
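
Why an out-of-range code is dangerous can be shown with a user-space imitation of the error-pointer convention. This is a sketch under assumptions: the `toy_*` helpers are stand-ins written for this example, and the fallback to `-EIO` is a hypothetical choice, not necessarily what the patch returns.

```c
#include <errno.h>
#include <stdint.h>

#define TOY_MAX_ERRNO 4095 /* only the top 4095 addresses count as errors */

/* Encode an error code as a pointer value. */
static void *toy_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* A pointer is an error only if it lies in the last TOY_MAX_ERRNO addresses. */
static int toy_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-TOY_MAX_ERRNO;
}

/* Map any non-zero driver return code to a valid negative errno. */
static long toy_sanitize_rc(long rc)
{
	if (rc < 0 && rc >= -TOY_MAX_ERRNO)
		return rc;	/* already a proper errno */
	return -EIO;		/* hypothetical fallback for everything else */
}
```

A positive driver code such as 0x1234 passed straight to `toy_err_ptr()` is not recognised by `toy_is_err()`, so a caller would treat it as a valid address; after `toy_sanitize_rc()` it is recognised as an error.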
      
      Fixes: 72b7f6c4 ("net/smc: unique reason code for exceeded max dmb count")
      Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/smc: fix valid DMBE buffer sizes · ef12ad45
      Karsten Graul authored
      The SMCD_DMBE_SIZES should include all valid DMBE buffer sizes, so the
      correct value is 6 which means 1MB. With 7 the registration of an ISM
      buffer would always fail because of the invalid size requested.
      Fix that and set the value to 6.
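
The relation between the size index and the buffer size can be sketched as follows. The 16KB base that doubles with each index is an assumption chosen to be consistent with "6 means 1MB" above; the `toy_*` names are invented for this example.

```c
#include <stddef.h>

#define TOY_DMBE_SIZES 6 /* largest valid size index, per the text above */

/* Assumed encoding: index 0 is 16KB and each further index doubles it. */
static size_t toy_dmbe_bytes(unsigned int idx)
{
	return (size_t)16384 << idx;
}

/* An index above the limit would request an invalid buffer size. */
static int toy_dmbe_idx_valid(unsigned int idx)
{
	return idx <= TOY_DMBE_SIZES;
}
```

Under this assumption index 6 yields 1048576 bytes (1MB), and index 7 is rejected before any registration is attempted.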
      
      Fixes: c6ba7c9b ("net/smc: add base infrastructure for SMC-D and ISM")
      Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/smc: fix use-after-free of delayed events · d535ca13
      Karsten Graul authored
      When a delayed event is enqueued then the event worker will send this
      event the next time it is running and no other flow is currently
      active. The event handler is called for the delayed event, and the
      pointer to the event remains set in lgr->delayed_event. This pointer is
      cleared later in the processing by smc_llc_flow_start().
      This can lead to a use-after-free condition when the processing does not
      reach smc_llc_flow_start() but frees the event because of an error.
      The delayed_event pointer is then still set although the event has been
      freed.
      Fix this by always clearing the delayed event pointer when the event is
      provided to the event handler for processing, and remove the code to
      clear it in smc_llc_flow_start().
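
The pattern of clearing the pointer before handing the event over can be sketched in user space. The types and functions below are hypothetical stand-ins, not the SMC code; only the field name `delayed_event` is taken from the text above.

```c
#include <stdlib.h>

struct toy_event { int type; };
struct toy_lgr { struct toy_event *delayed_event; };

/*
 * Hypothetical handler: it owns the event and frees it on every path,
 * including the early error return.
 */
static void toy_event_handler(struct toy_event *ev, int fail)
{
	if (fail) {
		free(ev);	/* error path frees the event early */
		return;
	}
	/* ... normal processing would go here ... */
	free(ev);
}

/* Hand a pending delayed event to the handler; returns 1 if one was handled. */
static int toy_run_delayed(struct toy_lgr *lgr, int fail)
{
	struct toy_event *ev = lgr->delayed_event;

	if (!ev)
		return 0;
	lgr->delayed_event = NULL;	/* clear before the handler can free it */
	toy_event_handler(ev, fail);
	return 1;
}
```

Because the pointer is cleared before the handler runs, a later pass never sees a stale pointer, whichever path the handler takes.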
      
      Fixes: 555da9af ("net/smc: add event-based llc_flow framework")
      Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • bpfilter: Fix build error with CONFIG_BPFILTER_UMH · 1d273fcc
      YueHaibing authored
      If CONFIG_BPFILTER_UMH is set, building fails:
      
      In file included from /usr/include/sys/socket.h:33:0,
                       from net/bpfilter/main.c:6:
      /usr/include/bits/socket.h:390:10: fatal error: asm/socket.h: No such file or directory
       #include <asm/socket.h>
                ^~~~~~~~~~~~~~
      compilation terminated.
      scripts/Makefile.userprogs:43: recipe for target 'net/bpfilter/main.o' failed
      make[2]: *** [net/bpfilter/main.o] Error 1
      
      Add missing include path to fix this.
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: sched: Fix suspicious RCU usage while accessing tcf_tunnel_info · d086a1c6
      Leon Romanovsky authored
      The access of tcf_tunnel_info() produces the following splat, so fix it
      by dereferencing the tcf_tunnel_key_params pointer with a marker that
      the internal tcfa_lock is held.
      
       =============================
       WARNING: suspicious RCU usage
       5.9.0+ #1 Not tainted
       -----------------------------
       include/net/tc_act/tc_tunnel_key.h:59 suspicious rcu_dereference_protected() usage!
       other info that might help us debug this:
      
       rcu_scheduler_active = 2, debug_locks = 1
       1 lock held by tc/34839:
        #0: ffff88828572c2a0 (&p->tcfa_lock){+...}-{2:2}, at: tc_setup_flow_action+0xb3/0x48b5
       stack backtrace:
       CPU: 1 PID: 34839 Comm: tc Not tainted 5.9.0+ #1
       Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
       Call Trace:
        dump_stack+0x9a/0xd0
        tc_setup_flow_action+0x14cb/0x48b5
        fl_hw_replace_filter+0x347/0x690 [cls_flower]
        fl_change+0x2bad/0x4875 [cls_flower]
        tc_new_tfilter+0xf6f/0x1ba0
        rtnetlink_rcv_msg+0x5f2/0x870
        netlink_rcv_skb+0x124/0x350
        netlink_unicast+0x433/0x700
        netlink_sendmsg+0x6f1/0xbd0
        sock_sendmsg+0xb0/0xe0
        ____sys_sendmsg+0x4fa/0x6d0
        ___sys_sendmsg+0x12e/0x1b0
        __sys_sendmsg+0xa4/0x120
        do_syscall_64+0x2d/0x40
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
       RIP: 0033:0x7f1f8cd4fe57
       Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
       RSP: 002b:00007ffdc1e193b8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
       RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1f8cd4fe57
       RDX: 0000000000000000 RSI: 00007ffdc1e19420 RDI: 0000000000000003
       RBP: 000000005f85aafa R08: 0000000000000001 R09: 00007ffdc1e1936c
       R10: 000000000040522d R11: 0000000000000246 R12: 0000000000000001
       R13: 0000000000000000 R14: 00007ffdc1e1d6f0 R15: 0000000000482420
      
      Fixes: 3ebaf6da ("net: sched: Do not assume RTNL is held in tunnel key action helpers")
      Fixes: 7a472814 ("net: sched: lock action when translating it to flow_action infra")
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
      Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'ibmveth-gso-fix' · 15f0d292
      Jakub Kicinski authored
      David Wilder says:
      
      ====================
      ibmveth gso fix
      
      The ibmveth driver is a virtual Ethernet driver used on IBM pSeries systems.
      GSO packets can be sent between LPARs (virtual hosts) without segmentation,
      by flagging GSO packets using one of two methods depending on the firmware
      version. Some GSO packets were not correctly identified by the receiver.
      This patch set corrects this issue.
      
      V2:
      - Added fix tags.
      - Byteswap the constant at compilation time.
      - Updated the commit message to clarify what frame validation is performed
        by the hypervisor.
      ====================
      
      Link: https://lore.kernel.org/r/20201013232014.26044-1-dwilder@us.ibm.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ibmveth: Identify ingress large send packets. · 413f142c
      David Wilder authored
      Ingress large send packets are identified by either
      the IBMVETH_RXQ_LRG_PKT flag in the receive buffer
      or a -1 placed in the IP header checksum.
      The method used depends on the firmware version. Frame
      geometry and sufficient header validation are performed by the
      hypervisor, eliminating the need for further header checks here.
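
The two identification methods can be combined in a single check, sketched below. The bit value of `TOY_RXQ_LRG_PKT` is a placeholder and `toy_is_large_send()` is a hypothetical helper; neither is taken from the driver.

```c
#include <stdint.h>

#define TOY_RXQ_LRG_PKT 0x04000000u /* placeholder bit, not the driver's value */

/*
 * Return non-zero if the frame should be treated as a large send packet:
 * either the receive-buffer flag is set, or the IP header checksum field
 * holds -1.
 */
static int toy_is_large_send(uint32_t rxq_flags, uint16_t ip_check)
{
	if (rxq_flags & TOY_RXQ_LRG_PKT)
		return 1;
	/* -1 as a 16-bit value is 0xffff in either byte order */
	return ip_check == 0xffff;
}
```

Since 0xffff reads the same in both byte orders, the checksum comparison in this sketch needs no byte swapping.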
      
      Fixes: 7b596738 ("ibmveth: set correct gso_size and gso_type")
      Signed-off-by: David Wilder <dwilder@us.ibm.com>
      Reviewed-by: Thomas Falcon <tlfalcon@linux.ibm.com>
      Reviewed-by: Cristobal Forno <cris.forno@ibm.com>
      Reviewed-by: Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ibmveth: Switch order of ibmveth_helper calls. · 5ce9ad81
      David Wilder authored
      ibmveth_rx_csum_helper() must be called after ibmveth_rx_mss_helper()
      as ibmveth_rx_csum_helper() may alter ip and tcp checksum values.
      
      Fixes: 66aa0678 ("ibmveth: Support to enable LSO/CSO for Trunk VEA.")
      Signed-off-by: David Wilder <dwilder@us.ibm.com>
      Reviewed-by: Thomas Falcon <tlfalcon@linux.ibm.com>
      Reviewed-by: Cristobal Forno <cris.forno@ibm.com>
      Reviewed-by: Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • cxgb4: handle 4-tuple PEDIT to NAT mode translation · 2ef813b8
      Herat Ramani authored
      The 4-tuple NAT offload via PEDIT always overwrites all the 4-tuple
      fields even if they had not been explicitly enabled. If any fields in
      the 4-tuple are not enabled, then the hardware overwrites the
      disabled fields with zeros, instead of ignoring them.
      
      So, add a parser that can translate the enabled 4-tuple PEDIT fields
      to one of the NAT mode combinations supported by the hardware and
      hence avoid overwriting disabled fields with 0. Any rule with an
      unsupported NAT mode combination is rejected.
      Signed-off-by: Herat Ramani <herat@chelsio.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'l3mdev-icmp-error-route-lookup-fixes' · f8ea4a19
      Jakub Kicinski authored
      Mathieu Desnoyers says:
      
      ====================
      l3mdev icmp error route lookup fixes
      
      Here is a series of fixes for ipv4 and ipv6 which ensure the route
      lookup is performed on the right routing table in VRF configurations
      when sending TTL expired icmp errors (useful for traceroute).
      
      It includes tests for both ipv4 and ipv6.
      
      These fixes specifically address the code paths involved in
      sending TTL expired icmp errors. As detailed in the individual commit
      messages, those fixes do not address similar icmp errors related to
      network namespaces and unreachable / fragmentation needed messages,
      which appear to use different code paths.
      ====================
      
      Link: https://lore.kernel.org/r/20201012145016.2023-1-mathieu.desnoyers@efficios.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests: Add VRF route leaking tests · 1a017276
      Michael Jeanson authored
      The objective of the tests is to check that ICMP errors generated while
      crossing between VRFs are properly routed back to the source host.
      
      The first ttl test sends a ping with a ttl of 1 from h1 to h2 and parses the
      output of the command to check that a ttl expired error is received.
      
      The second ttl test runs traceroute from h1 to h2 and parses the output to
      check for a hop on r1.
      
      The mtu test sends a ping with a payload of 1450 from h1 to h2, through
      r1 which has an interface with a mtu of 1400 and parses the output of the
      command to check that a fragmentation needed error is received.
      
      [ The IPv6 MTU test still fails with the symmetric routing setup. It
        appears to be caused by source address selection picking ::1.  Fixing
        this is beyond the scope of this series. ]
      Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ipv6/icmp: l3mdev: Perform icmp error route lookup on source device routing table (v2) · 272928d1
      Mathieu Desnoyers authored
      As per RFC4443, the destination address field for ICMPv6 error messages
      is copied from the source address field of the invoking packet.
      
      In configurations with Virtual Routing and Forwarding tables, looking up
      which routing table to use for sending ICMPv6 error messages is
      currently done by using the destination net_device.
      
      If the source and destination interfaces are within separate VRFs, or
      one in the global routing table and the other in a VRF, looking up the
      source address of the invoking packet in the destination interface's
      routing table will fail if the destination interface's routing table
      contains no route to the invoking packet's source address.
      
      One observable effect of this issue is that traceroute6 does not work in
      the following cases:
      
      - Route leaking between global routing table and VRF
      - Route leaking between VRFs
      
      Use the source device routing table when sending ICMPv6 error
      messages.
      
      [ In the context of ipv4, it has been pointed out that a similar issue
        may exist with ICMP errors triggered when forwarding between network
        namespaces. It would be worthwhile to investigate whether ipv6 has
        similar issues, but is outside of the scope of this investigation. ]
      
      [ Testing shows that similar issues exist with ipv6 unreachable /
        fragmentation needed messages.  However, investigation of this
        additional failure mode is beyond this investigation's scope. ]
      
      Link: https://tools.ietf.org/html/rfc4443
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ipv4/icmp: l3mdev: Perform icmp error route lookup on source device routing table (v2) · e1e84eb5
      Mathieu Desnoyers authored
      As per RFC792, ICMP errors should be sent to the source host.
      
      However, in configurations with Virtual Routing and Forwarding tables,
      looking up which routing table to use is currently done by using the
      destination net_device.
      
      commit 9d1a6c4e ("net: icmp_route_lookup should use rt dev to
      determine L3 domain") changes the interface passed to
      l3mdev_master_ifindex() and inet_addr_type_dev_table() from skb_in->dev
      to skb_dst(skb_in)->dev. This effectively uses the destination device
      rather than the source device for choosing which routing table should be
      used to lookup where to send the ICMP error.
      
      Therefore, if the source and destination interfaces are within separate
      VRFs, or one in the global routing table and the other in a VRF, looking
      up the source host in the destination interface's routing table will
      fail if the destination interface's routing table contains no route to
      the source host.
      
      One observable effect of this issue is that traceroute does not work in
      the following cases:
      
      - Route leaking between global routing table and VRF
      - Route leaking between VRFs
      
      Preferably use the source device routing table when sending ICMP error
      messages. If no source device is set, fall back on the destination
      device routing table. Otherwise, use the main routing table (index 0).
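
The selection order described above can be written as a small fallback chain. This is a sketch with hypothetical types: `struct toy_dev` and `toy_icmp_lookup_ifindex()` are invented for illustration and only mirror the order source device, destination device, main table.

```c
#include <stddef.h>

/* Hypothetical device record: the index of the L3 domain it belongs to. */
struct toy_dev {
	int l3_ifindex;
};

/* Pick the routing table index used for the ICMP error route lookup. */
static int toy_icmp_lookup_ifindex(const struct toy_dev *src,
				   const struct toy_dev *dst)
{
	if (src)
		return src->l3_ifindex;	/* preferred: source device */
	if (dst)
		return dst->l3_ifindex;	/* fallback: destination device */
	return 0;			/* main routing table */
}
```

With both devices present the source device always wins, which is the case that matters for route leaking between VRFs.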
      
      [ It has been pointed out that a similar issue may exist with ICMP
        errors triggered when forwarding between network namespaces. It would
        be worthwhile to investigate, but is outside of the scope of this
        investigation. ]
      
      [ It has also been pointed out that a similar issue exists with
        unreachable / fragmentation needed messages, which can be triggered by
        changing the MTU of eth1 in r1 to 1400 and running:
      
        ip netns exec h1 ping -s 1450 -Mdo -c1 172.16.2.2
      
        Some investigation points to raw_icmp_error() and raw_err() as being
        involved in this last scenario. The focus of this patch is TTL expired
        ICMP messages, which go through icmp_route_lookup.
        Investigation of failure modes related to raw_icmp_error() is beyond
        this investigation's scope. ]
      
      Fixes: 9d1a6c4e ("net: icmp_route_lookup should use rt dev to determine L3 domain")
      Link: https://tools.ietf.org/html/rfc792
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. 14 Oct, 2020 4 commits
  3. 13 Oct, 2020 2 commits
    • netfilter: nf_log: missing vlan offload tag and proto · 0d9826bc
      Pablo Neira Ayuso authored
      Dump the vlan tag and proto for the usual vlan offload case if the
      NF_LOG_MACDECODE flag is set. Without this information the logging is
      misleading as there is no reference to the VLAN header.
      
      [12716.993704] test: IN=veth0 OUT= MACSRC=86:6c:92:ea:d6:73 MACDST=0e:3b:eb:86:73:76 VPROTO=8100 VID=10 MACPROTO=0800 SRC=192.168.10.2 DST=172.217.168.163 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=2548 DF PROTO=TCP SPT=55848 DPT=80 WINDOW=501 RES=0x00 ACK FIN URGP=0
      [12721.157643] test: IN=veth0 OUT= MACSRC=86:6c:92:ea:d6:73 MACDST=0e:3b:eb:86:73:76 VPROTO=8100 VID=10 MACPROTO=0806 ARP HTYPE=1 PTYPE=0x0800 OPCODE=2 MACSRC=86:6c:92:ea:d6:73 IPSRC=192.168.10.2 MACDST=0e:3b:eb:86:73:76 IPDST=192.168.10.1
      
      Fixes: 83e96d44 ("netfilter: log: split family specific code to nf_log_{ip,ip6,common}.c files")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • docs: networking: update XPS to account for netif_set_xps_queue · 254941f3
      Willem de Bruijn authored
      With the introduction of netif_set_xps_queue, XPS can be enabled
      by the driver at initialization.
      
      Update the documentation to reflect this, as otherwise users
      may incorrectly believe that the feature is off by default.
      
      Fixes: 537c00de ("net: Add functions netif_reset_xps_queue and netif_set_xps_queue")
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  4. 12 Oct, 2020 5 commits
    • net: fec: Fix phy_device lookup for phy_reset_after_clk_enable() · 64a632da
      Marek Vasut authored
      The phy_reset_after_clk_enable() is always called with ndev->phydev,
      however that pointer may be NULL even though the PHY device instance
      already exists and is sufficient to perform the PHY reset.
      
      This condition happens in fec_open(), where the clock must be enabled
      first, then the PHY must be reset, and then the PHY IDs can be read
      out of the PHY.
      
      If the PHY is still not bound to the MAC, but there is an OF PHY node
      and a matching PHY device instance already, use the OF PHY node to
      obtain the PHY device instance, and then use that PHY device instance
      when triggering the PHY reset.
      
      Fixes: 1b0a83ac ("net: fec: add phy_reset_after_clk_enable() support")
      Signed-off-by: Marek Vasut <marex@denx.de>
      Cc: Christoph Niedermaier <cniedermaier@dh-electronics.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: NXP Linux Team <linux-imx@nxp.com>
      Cc: Richard Leitner <richard.leitner@skidata.com>
      Cc: Shawn Guo <shawnguo@kernel.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mlx4: handle non-napi callers to napi_poll · b2b8a927
      Jonathan Lemon authored
      netcons calls napi_poll with a budget of 0 to transmit packets.
      Handle this by:
       - skipping RX processing
       - not trying to recycle TX packets to the RX cache
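
The two rules above can be sketched as a poll routine that treats a budget of 0 as "transmit completions only". `struct toy_ring` and `toy_poll()` are hypothetical and merely count what a real driver would do.

```c
/* Hypothetical per-ring counters used only for this illustration. */
struct toy_ring {
	int rx_pending;		/* RX packets waiting to be processed */
	int rx_processed;	/* RX packets handled so far */
	int tx_completed;	/* TX completion passes */
	int recycled;		/* TX buffers recycled into the RX cache */
};

/* Poll once; returns the amount of RX work done. */
static int toy_poll(struct toy_ring *r, int budget)
{
	int done;

	r->tx_completed++;		/* TX completions always run */
	if (budget <= 0)
		return 0;		/* non-NAPI caller: no recycling, no RX */

	r->recycled++;			/* recycling is only safe in NAPI context */
	done = r->rx_pending < budget ? r->rx_pending : budget;
	r->rx_pending -= done;
	r->rx_processed += done;
	return done;
}
```

A caller that passes a budget of 0 still gets its transmit completions processed, while the receive side and the recycling cache stay untouched.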
      Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: korina: fix kfree of rx/tx descriptor array · 3af5f0f5
      Valentin Vidic authored
      kmalloc returns KSEG0 addresses, so convert back from KSEG1
      in kfree. Also make sure the array is freed when the driver is
      unloaded from the kernel.
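
On 32-bit MIPS, KSEG0 (cached) and KSEG1 (uncached) are two windows onto the same low physical memory, so converting between them only changes the top address bits. The sketch below models that arithmetic in user space; the `toy_*` helpers are written for this example and are not the kernel macros.

```c
#include <stdint.h>

#define TOY_KSEG0	0x80000000u	/* cached, unmapped segment base */
#define TOY_KSEG1	0xa0000000u	/* uncached, unmapped segment base */
#define TOY_PHYS_MASK	0x1fffffffu	/* low 512MB of physical address */

/* Return the KSEG0 alias of an address in either segment. */
static uint32_t toy_kseg0addr(uint32_t addr)
{
	return (addr & TOY_PHYS_MASK) | TOY_KSEG0;
}

/* Return the KSEG1 alias of an address in either segment. */
static uint32_t toy_kseg1addr(uint32_t addr)
{
	return (addr & TOY_PHYS_MASK) | TOY_KSEG1;
}
```

An allocation that was converted to its KSEG1 alias for use has to be converted back to the KSEG0 alias before being freed, because the allocator only knows the address it originally returned.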
      
      Fixes: ef11291b ("Add support the Korina (IDT RC32434) Ethernet MAC")
      Signed-off-by: Valentin Vidic <vvidic@valentin-vidic.from.hr>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dsa: microchip: fix race condition · 8098bd69
      Christian Eggers authored
      Between queuing the delayed work and finishing the setup of the dsa
      ports, the process may sleep in request_module() (via
      phy_device_create()) and the queued work may be executed prior to the
      switch net devices being registered. In ksz_mib_read_work(), a NULL
      dereference will happen within netif_carrier_ok(dp->slave).
      
      Not queuing the delayed work in ksz_init_mib_timer() makes things even
      worse because the work will now be queued for immediate execution
      (instead of 2000 ms) in ksz_mac_link_down() via
      dsa_port_link_register_of().
      
      Call tree:
      ksz9477_i2c_probe()
      \--ksz9477_switch_register()
         \--ksz_switch_register()
            +--dsa_register_switch()
            |  \--dsa_switch_probe()
            |     \--dsa_tree_setup()
            |        \--dsa_tree_setup_switches()
            |           +--dsa_switch_setup()
            |           |  +--ksz9477_setup()
            |           |  |  \--ksz_init_mib_timer()
            |           |  |     |--/* Start the timer 2 seconds later. */
            |           |  |     \--schedule_delayed_work(&dev->mib_read, msecs_to_jiffies(2000));
            |           |  \--__mdiobus_register()
            |           |     \--mdiobus_scan()
            |           |        \--get_phy_device()
            |           |           +--get_phy_id()
            |           |           \--phy_device_create()
            |           |              |--/* sleeping, ksz_mib_read_work() can be called meanwhile */
            |           |              \--request_module()
            |           |
            |           \--dsa_port_setup()
            |              +--/* Called for non-CPU ports */
            |              +--dsa_slave_create()
            |              |  +--/* Too late, ksz_mib_read_work() may be called beforehand */
            |              |  \--port->slave = ...
            |             ...
      |              +--/* Called for CPU port */
            |              \--dsa_port_link_register_of()
            |                 \--ksz_mac_link_down()
            |                    +--/* mib_read must be initialized here */
            |                    +--/* work is already scheduled, so it will be executed after 2000 ms */
            |                    \--schedule_delayed_work(&dev->mib_read, 0);
            \-- /* here port->slave is setup properly, scheduling the delayed work should be safe */
      
      Solution:
      1. Do not queue (only initialize) delayed work in ksz_init_mib_timer().
      2. Only queue delayed work in ksz_mac_link_down() if init is completed.
      3. Queue work once in ksz_switch_register(), after dsa_register_switch()
      has completed.
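
The three steps can be sketched with plain flags in place of real delayed work. Everything below is hypothetical: `struct toy_switch` and the `toy_*` functions only model the ordering, and the `registered` flag stands in for whatever condition the driver uses to tell that init is completed.

```c
/* Hypothetical switch state used only to model the ordering. */
struct toy_switch {
	int work_initialized;	/* delayed work has been set up */
	int registered;		/* switch registration has completed */
	int queued;		/* number of times the work was queued */
};

/* Step 1: only initialize the delayed work, do not queue it. */
static void toy_init_mib_timer(struct toy_switch *sw)
{
	sw->work_initialized = 1;
}

/* Step 2: queue from link-down only once init is completed. */
static void toy_mac_link_down(struct toy_switch *sw)
{
	if (sw->registered)
		sw->queued++;
}

/* Step 3: queue the work once, after registration has completed. */
static void toy_switch_register(struct toy_switch *sw)
{
	toy_init_mib_timer(sw);		/* runs during switch setup */
	toy_mac_link_down(sw);		/* CPU port link-down during setup */
	sw->registered = 1;		/* registration finished */
	sw->queued++;			/* first and only queueing from here */
}
```

During registration the link-down call queues nothing, so the work cannot run before the ports exist; link-down events after registration queue it as usual.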
      
      Fixes: 7c6ff470 ("net: dsa: microchip: add MIB counter reading support")
      Signed-off-by: Christian Eggers <ceggers@arri.de>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • netfilter: nftables: extend error reporting for chain updates · 98a381a7
      Pablo Neira Ayuso authored
      The initial support for netlink extended ACK is missing the chain update
      path, which results in misleading error reporting in case of EEXIST.
      
      Fixes: 36dd1bcc ("netfilter: nf_tables: initial support for extended ACK reporting")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  5. 11 Oct, 2020 2 commits
  6. 10 Oct, 2020 9 commits
    • ipv4: Restore flowi4_oif update before call to xfrm_lookup_route · 874fb9e2
      David Ahern authored
      Tobias reported regressions in IPsec tests following the patch
      referenced by the Fixes tag below. The root cause is dropping the
      reset of the flowi4_oif after the fib_lookup. Apparently it is
      needed for xfrm cases, so restore the oif update to ip_route_output_flow
      right before the call to xfrm_lookup_route.
      
      Fixes: 2fbc6e89 ("ipv4: Update exception handling for multipath routes via same device")
      Reported-by: Tobias Brunner <tobias@strongswan.org>
      Signed-off-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'mptcp-some-fallback-fixes' · 49fb2f33
      Jakub Kicinski authored
      Paolo Abeni says:
      
      ====================
      mptcp: some fallback fixes
      
      pktdrill pointed out that we currently don't properly handle some
      fallback scenarios for MP_JOIN subflows.

      The first patch addresses this issue.

      Patch 2/2 fixes a related pre-existing issue that is more
      evident after 1/2: we could keep using closed subflows
      for MPTCP signaling.
      ====================
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mptcp: subflows garbage collection · 0e4f35d7
      Paolo Abeni authored
      The msk can close MP_JOIN subflows if the initial handshake
      fails. Currently such subflows are kept alive in the
      conn_list until the msk itself is closed.
      
      Beyond the wasted memory, we could end up sending the
      DATA_FIN and the DATA_FIN ack on such a socket, even after a
      reset.
      
      Fixes: 43b54c6e ("mptcp: Use full MPTCP-level disconnect state machine")
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      0e4f35d7
    • mptcp: fix fallback for MP_JOIN subflows · d5824847
      Paolo Abeni authored
      Additional/MP_JOIN subflows that do not pass some initial handshake
      tests currently cause fallback to TCP. That is an RFC violation:
      we should instead reset the subflow and leave the msk untouched.
      
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/91
      Fixes: f296234c ("mptcp: Add handling of incoming MP_JOIN requests")
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      d5824847
    • net: smc: fix missing brace warning for old compilers · 16cb3653
      Pujin Shi authored
      With older versions of gcc, the struct initializer = {0}; causes warnings:
      
      net/smc/smc_llc.c: In function 'smc_llc_add_link_local':
      net/smc/smc_llc.c:1212:9: warning: missing braces around initializer [-Wmissing-braces]
        struct smc_llc_msg_add_link add_llc = {0};
               ^
      net/smc/smc_llc.c:1212:9: warning: (near initialization for 'add_llc.hd') [-Wmissing-braces]
      net/smc/smc_llc.c: In function 'smc_llc_srv_delete_link_local':
      net/smc/smc_llc.c:1245:9: warning: missing braces around initializer [-Wmissing-braces]
        struct smc_llc_msg_del_link del_llc = {0};
               ^
      net/smc/smc_llc.c:1245:9: warning: (near initialization for 'del_llc.hd') [-Wmissing-braces]
      
      2 warnings generated
      
      Fixes: 4dadd151 ("net/smc: enqueue local LLC messages")
      Signed-off-by: Pujin Shi <shipujin.t@gmail.com>
      Acked-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      16cb3653
    • net: smc: fix missing brace warning for old compilers · 7e94e46c
      Pujin Shi authored
      With older versions of gcc, the struct initializer = {0}; causes warnings:
      
      net/smc/smc_llc.c: In function 'smc_llc_send_link_delete_all':
      net/smc/smc_llc.c:1317:9: warning: missing braces around initializer [-Wmissing-braces]
        struct smc_llc_msg_del_link delllc = {0};
               ^
      net/smc/smc_llc.c:1317:9: warning: (near initialization for 'delllc.hd') [-Wmissing-braces]
      
      1 warnings generated
      
      Fixes: f3811fd7 ("net/smc: send DELETE_LINK, ALL message and wait for send to complete")
      Signed-off-by: Pujin Shi <shipujin.t@gmail.com>
      Acked-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      7e94e46c
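The warnings in the two smc commits above point at a struct whose first member ('hd') is itself an aggregate being initialized with a single unbraced 0. The commit text does not show which spelling the patches switched to, so the following is only a minimal userspace sketch of two generic ways to zero such an object without tripping -Wmissing-braces. struct demo_hdr, struct demo_msg and the demo_* helpers are hypothetical stand-ins, not the real smc_llc definitions.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for an LLC-style message: the first member
 * is itself a struct, which is what makes "= {0}" warn on old gcc. */
struct demo_hdr {
	unsigned char type;
	unsigned char length;
};

struct demo_msg {
	struct demo_hdr hd;	/* nested aggregate as first member */
	unsigned char reason;
	unsigned char pad[3];
};

/* Returns 1 if every byte of the message is zero, 0 otherwise. */
static int demo_msg_is_zero(const struct demo_msg *m)
{
	static const struct demo_msg zero;	/* static storage is zeroed */

	assert(m != NULL);
	return memcmp(m, &zero, sizeof(zero)) == 0;
}

/* Option A: brace the nested member explicitly (standard C).
 * Members without an initializer are zeroed as well. */
static struct demo_msg demo_init_braced(void)
{
	struct demo_msg m = { { 0 } };

	return m;
}

/* Option B: clear the whole object at run time. */
static struct demo_msg demo_init_memset(void)
{
	struct demo_msg m;

	memset(&m, 0, sizeof(m));
	return m;
}
```

Both forms leave every member zero. The braced form keeps the initialization at the declaration, while memset() also clears any padding bytes a real message layout might contain.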
    • Merge tag 'linux-can-fixes-for-5.9-20201008' of... · b54fa649
      Jakub Kicinski authored
      Merge tag 'linux-can-fixes-for-5.9-20201008' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
      
      ====================
      linux-can-fixes-for-5.9-20201008
      
      The first patch is by Lucas Stach and fixes the m_can driver by removing an
      erroneous call to m_can_class_suspend() in runtime suspend. That call causes
      the pinctrl state to get stuck in the "sleep" state, which breaks all CAN
      functionality on SoCs where this state is defined.
      
      The last two patches target the j1939 protocol: Cong Wang fixes a syzbot
      finding of an uninitialized variable in the j1939 transport protocol. I
      contribute a patch that fixes the initialization of the same uninitialized
      variable in a different function.
      ====================
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      b54fa649
    • tipc: fix NULL pointer dereference in tipc_named_rcv · 7b50ee3d
      Hoang Huu Le authored
      In the function node_lost_contact(), we call __skb_queue_purge() without
      grabbing the list->lock. This can cause a race condition while processing
      the list 'namedq' in the calling path tipc_named_rcv()->tipc_named_dequeue().
      
          [] BUG: kernel NULL pointer dereference, address: 0000000000000000
          [] #PF: supervisor read access in kernel mode
          [] #PF: error_code(0x0000) - not-present page
          [] PGD 7ca63067 P4D 7ca63067 PUD 6c553067 PMD 0
          [] Oops: 0000 [#1] SMP NOPTI
          [] CPU: 1 PID: 15 Comm: ksoftirqd/1 Tainted: G  O  5.9.0-rc6+ #2
          [] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS [...]
          [] RIP: 0010:tipc_named_rcv+0x103/0x320 [tipc]
          [] Code: 41 89 44 24 10 49 8b 16 49 8b 46 08 49 c7 06 00 00 00 [...]
          [] RSP: 0018:ffffc900000a7c58 EFLAGS: 00000282
          [] RAX: 00000000000012ec RBX: 0000000000000000 RCX: ffff88807bde1270
          [] RDX: 0000000000002c7c RSI: 0000000000002c7c RDI: ffff88807b38f1a8
          [] RBP: ffff88807b006288 R08: ffff88806a367800 R09: ffff88806a367900
          [] R10: ffff88806a367a00 R11: ffff88806a367b00 R12: ffff88807b006258
          [] R13: ffff88807b00628a R14: ffff888069334d00 R15: ffff88806a434600
          [] FS:  0000000000000000(0000) GS:ffff888079480000(0000) knlGS:0[...]
          [] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
          [] CR2: 0000000000000000 CR3: 0000000077320000 CR4: 00000000000006e0
          [] Call Trace:
          []  ? tipc_bcast_rcv+0x9a/0x1a0 [tipc]
          []  tipc_rcv+0x40d/0x670 [tipc]
          []  ? _raw_spin_unlock+0xa/0x20
          []  tipc_l2_rcv_msg+0x55/0x80 [tipc]
          []  __netif_receive_skb_one_core+0x8c/0xa0
          []  process_backlog+0x98/0x140
          []  net_rx_action+0x13a/0x420
          []  __do_softirq+0xdb/0x316
          []  ? smpboot_thread_fn+0x2f/0x1e0
          []  ? smpboot_thread_fn+0x74/0x1e0
          []  ? smpboot_thread_fn+0x14e/0x1e0
          []  run_ksoftirqd+0x1a/0x40
          []  smpboot_thread_fn+0x149/0x1e0
          []  ? sort_range+0x20/0x20
          []  kthread+0x131/0x150
          []  ? kthread_unuse_mm+0xa0/0xa0
          []  ret_from_fork+0x22/0x30
          [] Modules linked in: veth tipc(O) ip6_udp_tunnel udp_tunnel [...]
          [] CR2: 0000000000000000
          [] ---[ end trace 65c276a8e2e2f310 ]---
      
      To fix this, we need to grab the lock of the 'namedq' list in both
      calling paths.
      
      Fixes: cad2929d ("tipc: update a binding service via broadcast")
      Acked-by: Jon Maloy <jmaloy@redhat.com>
      Signed-off-by: Hoang Huu Le <hoang.h.le@dektech.com.au>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      7b50ee3d
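The race described in the tipc_named_rcv commit above is the generic one where one path purges a list while another path dequeues from it, with only one of them holding the list lock. The following is a userspace analogy only, not the kernel code: a pthread mutex stands in for the list lock, and struct demo_queue and the demo_* functions are hypothetical names. It shows both the dequeue path and the purge path taking the same lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct demo_node {
	struct demo_node *next;
	int value;
};

struct demo_queue {
	pthread_mutex_t lock;	/* protects head, tail and len */
	struct demo_node *head;
	struct demo_node *tail;
	unsigned int len;
};

static void demo_queue_init(struct demo_queue *q)
{
	pthread_mutex_init(&q->lock, NULL);
	q->head = q->tail = NULL;
	q->len = 0;
}

/* Appends a value; returns 0 on success, -1 if allocation fails. */
static int demo_enqueue(struct demo_queue *q, int value)
{
	struct demo_node *n = malloc(sizeof(*n));

	if (!n)
		return -1;
	n->value = value;
	n->next = NULL;
	pthread_mutex_lock(&q->lock);
	if (q->tail)
		q->tail->next = n;
	else
		q->head = n;
	q->tail = n;
	q->len++;
	pthread_mutex_unlock(&q->lock);
	return 0;
}

/* Receive path: pops one node under the lock.
 * Returns 1 and stores the value, or 0 if the queue is empty. */
static int demo_dequeue(struct demo_queue *q, int *value)
{
	struct demo_node *n;

	assert(value != NULL);
	pthread_mutex_lock(&q->lock);
	n = q->head;
	if (n) {
		q->head = n->next;
		if (!q->head)
			q->tail = NULL;
		q->len--;
	}
	pthread_mutex_unlock(&q->lock);
	if (!n)
		return 0;
	*value = n->value;
	free(n);
	return 1;
}

/* Teardown path: detaches the whole list under the same lock, then
 * frees the detached nodes. Returns the number of nodes freed. */
static unsigned int demo_purge(struct demo_queue *q)
{
	struct demo_node *n, *next;
	unsigned int freed = 0;

	pthread_mutex_lock(&q->lock);
	n = q->head;
	q->head = q->tail = NULL;
	q->len = 0;
	pthread_mutex_unlock(&q->lock);
	for (; n; n = next) {
		next = n->next;
		free(n);
		freed++;
	}
	return freed;
}
```

Because the purge detaches the list while holding the lock, a concurrent dequeue either sees the old head in full or an empty queue, never a half-unlinked node.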
    • tipc: fix the skb_unshare() in tipc_buf_append() · ed42989e
      Cong Wang authored
      skb_unshare() drops a reference count on the old skb unconditionally,
      so in the failure case, we end up freeing the skb twice here.
      And because the skb is allocated in fclone and cloned by the caller
      tipc_msg_reassemble(), the consequence is actually freeing the
      original skb too, which triggered the UAF reported by syzbot.
      
      Fix this by replacing this skb_unshare() with skb_cloned()+skb_copy().
      
      Fixes: ff48b622 ("tipc: use skb_unshare() instead in tipc_buf_append()")
      Reported-and-tested-by: syzbot+e96a7ba46281824cc46a@syzkaller.appspotmail.com
      Cc: Jon Maloy <jmaloy@redhat.com>
      Cc: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Reviewed-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      ed42989e
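The ownership problem in the tipc_buf_append commit above can be illustrated outside the kernel. The sketch below is a simplified userspace model, not the actual patch: struct demo_buf and the demo_* helpers are hypothetical, "shared" is approximated by a reference count above one (real skb cloning shares data, not just a counter), and the simulate_oom flag exists only to exercise the failure path. It shows the check-then-copy pattern in which a failed copy never consumes the caller's reference, so the caller's error path can release the buffer exactly once.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct demo_buf {
	int users;		/* reference count */
	size_t len;
	unsigned char *data;
};

/* Allocates a buffer holding a copy of src; returns NULL on failure. */
static struct demo_buf *demo_buf_new(const void *src, size_t len)
{
	struct demo_buf *b = malloc(sizeof(*b));

	if (!b)
		return NULL;
	b->data = malloc(len ? len : 1);
	if (!b->data) {
		free(b);
		return NULL;
	}
	if (len)
		memcpy(b->data, src, len);
	b->users = 1;
	b->len = len;
	return b;
}

/* Drops one reference and frees the buffer when none remain. */
static void demo_buf_put(struct demo_buf *b)
{
	assert(b->users > 0);
	if (--b->users == 0) {
		free(b->data);
		free(b);
	}
}

/* Returns a buffer the caller may modify. If buf is not shared it is
 * returned as is. If it is shared, a private copy is returned and the
 * caller's reference on buf is traded for it. On failure, NULL is
 * returned and the caller still owns its reference on buf. */
static struct demo_buf *demo_buf_make_private(struct demo_buf *buf,
					      int simulate_oom)
{
	struct demo_buf *copy;

	if (buf->users == 1)
		return buf;
	copy = simulate_oom ? NULL : demo_buf_new(buf->data, buf->len);
	if (!copy)
		return NULL;	/* nothing was dropped */
	demo_buf_put(buf);
	return copy;
}
```

The design point is that the helper has a single, predictable ownership rule on failure, which is what an unconditional drop of the old reference lacks.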
  7. 09 Oct, 2020 2 commits
    • net/tls: remove a duplicate function prototype · 923527dc
      Randy Dunlap authored
      Remove one of the two instances of the function prototype for
      tls_validate_xmit_skb().
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: Boris Pismenny <borisp@nvidia.com>
      Cc: Aviad Yehezkel <aviadye@nvidia.com>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      923527dc
    • net/tls: sendfile fails with ktls offload · ea1dd3e9
      Rohit Maheshwari authored
      When sendpage is first called and there is more data, 'more' in
      tls_push_data() is set, which later sets pending_open_record_frags. But
      when no data is left in the file and tls_push_data() is called for the
      last time, pending_open_record_frags is not reset. Later, when the
      2 bytes of encrypted alert arrive via sendmsg, the code first checks
      pending_open_record_frags and, since it is still set, creates a record with
      0 data bytes to encrypt, meaning the record length is prepend_size + tag_size
      only, which causes problems.
      We should set/reset pending_open_record_frags based on the 'more' bit.
      
      Fixes: e8f69799 ("net/tls: Add generic NIC offload infrastructure")
      Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      ea1dd3e9
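The sendfile commit above describes a flag that is set while more data is expected but never cleared when the final chunk arrives. The following is a minimal model of that behavior, assuming a much simpler state than the real TLS offload context: struct demo_tls_ctx and both demo_push_data_* functions are hypothetical and only mirror the names used in the commit message. The "sticky" variant reproduces the stale flag, and the "tracked" variant assigns the flag from the 'more' bit on every call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for the offload context. */
struct demo_tls_ctx {
	bool pending_open_record_frags;	/* an open record awaits more data */
	size_t open_record_len;		/* payload bytes in the open record */
	unsigned int records_sent;
};

/* Stale-flag variant: the flag is only ever set, so it remains true
 * after the final chunk has closed the record. */
static void demo_push_data_sticky(struct demo_tls_ctx *ctx, size_t len,
				  bool more)
{
	assert(ctx != NULL);
	ctx->open_record_len += len;
	if (more) {
		ctx->pending_open_record_frags = true;
		return;
	}
	ctx->records_sent++;		/* close and send the record */
	ctx->open_record_len = 0;
}

/* Tracked variant: the flag follows the 'more' bit on every call, so
 * it is cleared once the last chunk has been pushed. */
static void demo_push_data_tracked(struct demo_tls_ctx *ctx, size_t len,
				   bool more)
{
	assert(ctx != NULL);
	ctx->open_record_len += len;
	ctx->pending_open_record_frags = more;
	if (more)
		return;
	ctx->records_sent++;		/* close and send the record */
	ctx->open_record_len = 0;
}
```

With the sticky variant, a later sender that checks the flag would believe a record is still open even though open_record_len is 0, which matches the zero-payload record described in the commit message.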