1. 14 May, 2021 4 commits
    • net: sched: fix packet stuck problem for lockless qdisc · a90c57f2
      Yunsheng Lin authored
      Lockless qdisc has the following concurrency problem:
          cpu0                 cpu1
           .                     .
      q->enqueue                 .
           .                     .
      qdisc_run_begin()          .
           .                     .
      dequeue_skb()              .
           .                     .
      sch_direct_xmit()          .
           .                     .
           .                q->enqueue
           .             qdisc_run_begin()
           .            return and do nothing
           .                     .
      qdisc_run_end()            .
      
      cpu1 enqueues an skb without calling __qdisc_run() because cpu0
      has not released the lock yet and spin_trylock() returns false
      for cpu1 in qdisc_run_begin(). cpu0 does not see the skb
      enqueued by cpu1 when calling dequeue_skb(), because cpu1 may
      enqueue the skb after cpu0 calls dequeue_skb() and before
      cpu0 calls qdisc_run_end().
      
      Lockless qdisc has another concurrency problem when
      tx_action is involved:
      
      cpu0(serving tx_action)     cpu1             cpu2
                .                   .                .
                .              q->enqueue            .
                .            qdisc_run_begin()       .
                .              dequeue_skb()         .
                .                   .            q->enqueue
                .                   .                .
                .             sch_direct_xmit()      .
                .                   .         qdisc_run_begin()
                .                   .       return and do nothing
                .                   .                .
       clear __QDISC_STATE_SCHED    .                .
       qdisc_run_begin()            .                .
       return and do nothing        .                .
                .                   .                .
                .            qdisc_run_end()         .
      
      This patch fixes the above data races as follows:
      1. If the first spin_trylock() returns false and STATE_MISSED is
         not set, set STATE_MISSED and retry spin_trylock() once, in
         case the other CPU did not see STATE_MISSED before it released
         the lock.
      2. Reschedule if STATE_MISSED is set after the lock is released
         at the end of qdisc_run_end().
      
      For the tx_action case, STATE_MISSED is also set when cpu1 is at
      the end of qdisc_run_end(), so tx_action will be rescheduled
      to dequeue the skb enqueued by cpu2.
      
      Clear STATE_MISSED before retrying a dequeue when dequeuing
      returns NULL, in order to reduce the overhead of the second
      spin_trylock() and of calling __netif_schedule().
      
      Also clear STATE_MISSED before calling __netif_schedule()
      at the end of qdisc_run_end() to avoid doing another round of
      dequeuing in pfifo_fast_dequeue().
      
      The performance impact of this patch, tested using pktgen and
      dummy netdev with pfifo_fast qdisc attached:
      
       threads  without this patch  with this patch  delta
          1         2.61Mpps           2.60Mpps      -0.3%
          2         3.97Mpps           3.82Mpps      -3.7%
          4         5.62Mpps           5.59Mpps      -0.5%
          8         2.78Mpps           2.77Mpps      -0.3%
         16         2.22Mpps           2.22Mpps      -0.0%
      
      Fixes: 6b3ba914 ("net: sched: allow qdiscs to handle locking")
      Acked-by: Jakub Kicinski <kuba@kernel.org>
      Tested-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
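
Illustration for the lockless qdisc fix above: a minimal user-space sketch of the two numbered steps (set a "missed" flag and retry the trylock once; reschedule at unlock if the flag is set). It is not the kernel code. The struct, the sketch_* function names, and the resched counter (standing in for __netif_schedule()) are assumptions made for this example, and the additional flag clearing on a NULL dequeue is not modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for a lockless qdisc; not kernel code. */
struct sketch_qdisc {
    atomic_bool busy;    /* stands in for the lock taken with spin_trylock() */
    atomic_bool missed;  /* stands in for STATE_MISSED */
    atomic_int resched;  /* counts stand-in __netif_schedule() calls */
};

bool sketch_trylock(struct sketch_qdisc *q)
{
    /* Succeeds only if the previous value was false (lock was free). */
    return !atomic_exchange(&q->busy, true);
}

bool sketch_run_begin(struct sketch_qdisc *q)
{
    if (sketch_trylock(q))
        return true;
    /* Flag already set: the lock holder will reschedule on our behalf. */
    if (atomic_load(&q->missed))
        return false;
    atomic_store(&q->missed, true);
    /* Retry once: the holder may have unlocked before seeing the flag. */
    return sketch_trylock(q);
}

void sketch_run_end(struct sketch_qdisc *q)
{
    assert(atomic_load(&q->busy));      /* caller must hold the lock */
    atomic_store(&q->busy, false);      /* release the lock first */
    /* Clear the flag, then reschedule if somebody missed the lock. */
    if (atomic_exchange(&q->missed, false))
        atomic_fetch_add(&q->resched, 1);
}
```

In this sketch, a caller that loses the trylock while another holder is active leaves the flag set, and the holder's sketch_run_end() turns that into exactly one reschedule, so the late enqueue is not left stuck.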
    • tls splice: check SPLICE_F_NONBLOCK instead of MSG_DONTWAIT · 974271e5
      Jim Ma authored
      In tls_sw_splice_read, checking MSG_* flags is inappropriate;
      SPLICE_* flags should be used instead. Update tls_wait_data to accept
      a nonblock argument instead of flags, for both recvmsg and splice.
      
      Fixes: c46234eb ("tls: RX path for ktls")
      Signed-off-by: Jim Ma <majinjing3@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
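
Illustration for the tls splice fix above: a small user-space sketch of why a splice flags word must be tested with SPLICE_F_NONBLOCK rather than MSG_DONTWAIT, and of a waiter that takes a plain boolean. The sketch_* helpers are hypothetical and are not the kernel's tls_wait_data(). The fallback definition of SPLICE_F_NONBLOCK is only used if the C library headers did not expose it.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE              /* exposes SPLICE_F_NONBLOCK in <fcntl.h> on glibc */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/socket.h>

#ifndef SPLICE_F_NONBLOCK
#define SPLICE_F_NONBLOCK 0x02   /* Linux UAPI value, fallback only */
#endif

/* The two flag namespaces use different bits, so neither mask can
 * be used to test a flags word from the other namespace. */
static_assert((SPLICE_F_NONBLOCK & MSG_DONTWAIT) == 0,
              "splice and recvmsg non-blocking bits differ");

/* recvmsg-style callers translate MSG_* flags. */
bool sketch_recvmsg_nonblock(int msg_flags)
{
    return (msg_flags & MSG_DONTWAIT) != 0;
}

/* splice-style callers translate SPLICE_F_* flags. */
bool sketch_splice_nonblock(unsigned int splice_flags)
{
    return (splice_flags & SPLICE_F_NONBLOCK) != 0;
}

/* Hypothetical waiter taking a bool instead of a raw flags word.
 * Returns 1 if data is ready, -EAGAIN if the caller asked not to
 * block, and 0 where a real implementation would sleep. */
int sketch_wait_data(bool nonblock, bool data_ready)
{
    if (data_ready)
        return 1;
    return nonblock ? -EAGAIN : 0;
}
```

Passing a boolean keeps the waiter independent of which flag namespace the caller uses; each entry point does its own translation.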
    • Revert "net:tipc: Fix a double free in tipc_sk_mcast_rcv" · 75016891
      Hoang Le authored
      This reverts commit 6bf24dc0.
      The above fix is not correct and caused a memory leak.
      
      Fixes: 6bf24dc0 ("net:tipc: Fix a double free in tipc_sk_mcast_rcv")
      Acked-by: Jon Maloy <jmaloy@redhat.com>
      Acked-by: Tung Nguyen <tung.q.nguyen@dektech.com.au>
      Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · 414ed7fe
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for net:
      
      1) Remove the flowtable hardware refresh state, fall back to the
         existing hardware pending state instead, from Roi Dayan.
      
      2) Fix crash in pipapo avx2 lookup when the FPU is in use from user
         context, from Stefano Brivio.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 13 May, 2021 7 commits
    • netfilter: nft_set_pipapo_avx2: Add irq_fpu_usable() check, fallback to non-AVX2 version · f0b3d338
      Stefano Brivio authored
      Arturo reported this backtrace:
      
      [709732.358791] WARNING: CPU: 3 PID: 456 at arch/x86/kernel/fpu/core.c:128 kernel_fpu_begin_mask+0xae/0xe0
      [709732.358793] Modules linked in: binfmt_misc nft_nat nft_chain_nat nf_nat nft_counter nft_ct nf_tables nf_conntrack_netlink nfnetlink 8021q garp stp mrp llc vrf intel_rapl_msr intel_rapl_common skx_edac nfit libnvdimm ipmi_ssif x86_pkg_temp_thermal intel_powerclamp coretemp crc32_pclmul mgag200 ghash_clmulni_intel drm_kms_helper cec aesni_intel drm libaes crypto_simd cryptd glue_helper mei_me dell_smbios iTCO_wdt evdev intel_pmc_bxt iTCO_vendor_support dcdbas pcspkr rapl dell_wmi_descriptor wmi_bmof sg i2c_algo_bit watchdog mei acpi_ipmi ipmi_si button nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipmi_devintf ipmi_msghandler ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 dm_mod raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor sd_mod t10_pi crc_t10dif crct10dif_generic raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod ahci libahci tg3 libata xhci_pci libphy xhci_hcd ptp usbcore crct10dif_pclmul crct10dif_common bnxt_en crc32c_intel scsi_mod
      [709732.358941]  pps_core i2c_i801 lpc_ich i2c_smbus wmi usb_common
      [709732.358957] CPU: 3 PID: 456 Comm: jbd2/dm-0-8 Not tainted 5.10.0-0.bpo.5-amd64 #1 Debian 5.10.24-1~bpo10+1
      [709732.358959] Hardware name: Dell Inc. PowerEdge R440/04JN2K, BIOS 2.9.3 09/23/2020
      [709732.358964] RIP: 0010:kernel_fpu_begin_mask+0xae/0xe0
      [709732.358969] Code: ae 54 24 04 83 e3 01 75 38 48 8b 44 24 08 65 48 33 04 25 28 00 00 00 75 33 48 83 c4 10 5b c3 65 8a 05 5e 21 5e 76 84 c0 74 92 <0f> 0b eb 8e f0 80 4f 01 40 48 81 c7 00 14 00 00 e8 dd fb ff ff eb
      [709732.358972] RSP: 0018:ffffbb9700304740 EFLAGS: 00010202
      [709732.358976] RAX: 0000000000000001 RBX: 0000000000000003 RCX: 0000000000000001
      [709732.358979] RDX: ffffbb9700304970 RSI: ffff922fe1952e00 RDI: 0000000000000003
      [709732.358981] RBP: ffffbb9700304970 R08: ffff922fc868a600 R09: ffff922fc711e462
      [709732.358984] R10: 000000000000005f R11: ffff922ff0b27180 R12: ffffbb9700304960
      [709732.358987] R13: ffffbb9700304b08 R14: ffff922fc664b6c8 R15: ffff922fc664b660
      [709732.358990] FS:  0000000000000000(0000) GS:ffff92371fec0000(0000) knlGS:0000000000000000
      [709732.358993] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [709732.358996] CR2: 0000557a6655bdd0 CR3: 000000026020a001 CR4: 00000000007706e0
      [709732.358999] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [709732.359001] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [709732.359003] PKRU: 55555554
      [709732.359005] Call Trace:
      [709732.359009]  <IRQ>
      [709732.359035]  nft_pipapo_avx2_lookup+0x4c/0x1cba [nf_tables]
      [709732.359046]  ? sched_clock+0x5/0x10
      [709732.359054]  ? sched_clock_cpu+0xc/0xb0
      [709732.359061]  ? record_times+0x16/0x80
      [709732.359068]  ? plist_add+0xc1/0x100
      [709732.359073]  ? psi_group_change+0x47/0x230
      [709732.359079]  ? skb_clone+0x4d/0xb0
      [709732.359085]  ? enqueue_task_rt+0x22b/0x310
      [709732.359098]  ? bnxt_start_xmit+0x1e8/0xaf0 [bnxt_en]
      [709732.359102]  ? packet_rcv+0x40/0x4a0
      [709732.359121]  nft_lookup_eval+0x59/0x160 [nf_tables]
      [709732.359133]  nft_do_chain+0x350/0x500 [nf_tables]
      [709732.359152]  ? nft_lookup_eval+0x59/0x160 [nf_tables]
      [709732.359163]  ? nft_do_chain+0x364/0x500 [nf_tables]
      [709732.359172]  ? fib4_rule_action+0x6d/0x80
      [709732.359178]  ? fib_rules_lookup+0x107/0x250
      [709732.359184]  nft_nat_do_chain+0x8a/0xf2 [nft_chain_nat]
      [709732.359193]  nf_nat_inet_fn+0xea/0x210 [nf_nat]
      [709732.359202]  nf_nat_ipv4_out+0x14/0xa0 [nf_nat]
      [709732.359207]  nf_hook_slow+0x44/0xc0
      [709732.359214]  ip_output+0xd2/0x100
      [709732.359221]  ? __ip_finish_output+0x210/0x210
      [709732.359226]  ip_forward+0x37d/0x4a0
      [709732.359232]  ? ip4_key_hashfn+0xb0/0xb0
      [709732.359238]  ip_sublist_rcv_finish+0x4f/0x60
      [709732.359243]  ip_sublist_rcv+0x196/0x220
      [709732.359250]  ? ip_rcv_finish_core.isra.22+0x400/0x400
      [709732.359255]  ip_list_rcv+0x137/0x160
      [709732.359264]  __netif_receive_skb_list_core+0x29b/0x2c0
      [709732.359272]  netif_receive_skb_list_internal+0x1a6/0x2d0
      [709732.359280]  gro_normal_list.part.156+0x19/0x40
      [709732.359286]  napi_complete_done+0x67/0x170
      [709732.359298]  bnxt_poll+0x105/0x190 [bnxt_en]
      [709732.359304]  ? irqentry_exit+0x29/0x30
      [709732.359309]  ? asm_common_interrupt+0x1e/0x40
      [709732.359315]  net_rx_action+0x144/0x3c0
      [709732.359322]  __do_softirq+0xd5/0x29c
      [709732.359329]  asm_call_irq_on_stack+0xf/0x20
      [709732.359332]  </IRQ>
      [709732.359339]  do_softirq_own_stack+0x37/0x40
      [709732.359346]  irq_exit_rcu+0x9d/0xa0
      [709732.359353]  common_interrupt+0x78/0x130
      [709732.359358]  asm_common_interrupt+0x1e/0x40
      [709732.359366] RIP: 0010:crc_41+0x0/0x1e [crc32c_intel]
      [709732.359370] Code: ff ff f2 4d 0f 38 f1 93 a8 fe ff ff f2 4c 0f 38 f1 81 b0 fe ff ff f2 4c 0f 38 f1 8a b0 fe ff ff f2 4d 0f 38 f1 93 b0 fe ff ff <f2> 4c 0f 38 f1 81 b8 fe ff ff f2 4c 0f 38 f1 8a b8 fe ff ff f2 4d
      [709732.359373] RSP: 0018:ffffbb97008dfcd0 EFLAGS: 00000246
      [709732.359377] RAX: 000000000000002a RBX: 0000000000000400 RCX: ffff922fc591dd50
      [709732.359379] RDX: ffff922fc591dea0 RSI: 0000000000000a14 RDI: ffffffffc00dddc0
      [709732.359382] RBP: 0000000000001000 R08: 000000000342d8c3 R09: 0000000000000000
      [709732.359384] R10: 0000000000000000 R11: ffff922fc591dff0 R12: ffffbb97008dfe58
      [709732.359386] R13: 000000000000000a R14: ffff922fd2b91e80 R15: ffff922fef83fe38
      [709732.359395]  ? crc_43+0x1e/0x1e [crc32c_intel]
      [709732.359403]  ? crc32c_pcl_intel_update+0x97/0xb0 [crc32c_intel]
      [709732.359419]  ? jbd2_journal_commit_transaction+0xaec/0x1a30 [jbd2]
      [709732.359425]  ? irq_exit_rcu+0x3e/0xa0
      [709732.359447]  ? kjournald2+0xbd/0x270 [jbd2]
      [709732.359454]  ? finish_wait+0x80/0x80
      [709732.359470]  ? commit_timeout+0x10/0x10 [jbd2]
      [709732.359476]  ? kthread+0x116/0x130
      [709732.359481]  ? kthread_park+0x80/0x80
      [709732.359488]  ? ret_from_fork+0x1f/0x30
      [709732.359494] ---[ end trace 081a19978e5f09f5 ]---
      
      that is, nft_pipapo_avx2_lookup() uses the FPU running from a softirq
      that interrupted a kthread, also using the FPU.
      
      That's exactly the reason why irq_fpu_usable() is there: use it, and
      if we can't use the FPU, fall back to the non-AVX2 version of the
      lookup operation, i.e. nft_pipapo_lookup().
      Reported-by: Arturo Borrero Gonzalez <arturo@netfilter.org>
      Cc: <stable@vger.kernel.org> # 5.6.x
      Fixes: 7400b063 ("nft_set_pipapo: Introduce AVX2-based lookup implementation")
      Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
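
Illustration for the pipapo AVX2 fix above: a user-space sketch of the "check, then fall back" pattern. The sketch_fpu_usable variable is a hypothetical stand-in for irq_fpu_usable(), and both lookup bodies are placeholders (an even/odd test) rather than the real set lookup. Only the dispatch logic is the point.

```c
#include <stdbool.h>

/* Hypothetical stand-in for irq_fpu_usable(); a test can flip it. */
bool sketch_fpu_usable = true;

enum sketch_path { SKETCH_NONE, SKETCH_AVX2, SKETCH_GENERIC };

/* Records which implementation served the most recent lookup. */
enum sketch_path sketch_last_path = SKETCH_NONE;

/* Placeholder for the non-AVX2 lookup: "matches" even keys. */
bool sketch_lookup_generic(unsigned int key)
{
    sketch_last_path = SKETCH_GENERIC;
    return (key & 1) == 0;
}

/* Placeholder for the AVX2 lookup with the fallback check added. */
bool sketch_lookup_avx2(unsigned int key)
{
    /* If the FPU cannot be used in this context, do not touch it:
     * hand the request to the generic implementation instead. */
    if (!sketch_fpu_usable)
        return sketch_lookup_generic(key);

    sketch_last_path = SKETCH_AVX2;
    return (key & 1) == 0;
}
```

Both paths must return the same answer for the same key; the fallback only changes how the answer is computed.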
    • netfilter: flowtable: Remove redundant hw refresh bit · c07531c0
      Roi Dayan authored
      Offloading connections can fail for multiple reasons, and a hw refresh
      bit is set so that the offload is retried on the next sw packet.
      However, in some cases, now or in the future, the hw refresh bit may
      not be set even though a refresh could succeed.
      Remove the hw refresh bit and do an offload refresh whenever requested.
      No new work entry is created if work is already pending anyway,
      because of the hw pending bit.
      
      Fixes: 8b3646d6 ("net/sched: act_ct: Support refreshing the flow table entries")
      Signed-off-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
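
Illustration for the flowtable commit above: a sketch of how a single "pending" bit can prevent duplicate work entries, which is the property the message relies on when dropping the separate refresh bit. The struct and the sketch_* names are hypothetical, and the atomic exchange stands in for a test-and-set on the hw pending bit. This is an assumption about the general pattern, not a copy of the flowtable code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flow entry with only the state needed for the sketch. */
struct sketch_flow {
    atomic_bool hw_pending;  /* stands in for the hw pending bit */
    int queued;              /* number of work entries actually queued */
};

/* Request an offload refresh. Returns true if a new work entry was
 * queued, false if one is already pending for this flow. */
bool sketch_offload_refresh(struct sketch_flow *flow)
{
    /* Test-and-set: only the caller that flips false -> true queues. */
    if (atomic_exchange(&flow->hw_pending, true))
        return false;
    flow->queued++;
    return true;
}

/* Called when the queued work has finished. */
void sketch_offload_done(struct sketch_flow *flow)
{
    atomic_store(&flow->hw_pending, false);
}
```

With this guard in place, callers can request a refresh unconditionally; repeated requests while work is outstanding are simply dropped.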
    • openvswitch: meter: fix race when getting now_ms. · e4df1b0c
      Tao Liu authored
      We have observed meters behaving unexpectedly when traffic is 3+ Gbit/s
      with multiple connections.
      
      now_ms is not protected by meter->lock, so we may get a negative
      long_delta_ms when another CPU has updated meter->used. Then:
          delta_ms = (u32)long_delta_ms;
      becomes a very large value, and:
      
          band->bucket += delta_ms * band->rate;
      produces a wrong band->bucket.
      
      The Open vSwitch userspace datapath fixed the same issue[1] some
      time ago, so we port that implementation to the kernel datapath.
      
      [1] https://patchwork.ozlabs.org/project/openvswitch/patch/20191025114436.9746-1-i.maximets@ovn.org/
      
      Fixes: 96fbc13d ("openvswitch: Add meter infrastructure")
      Signed-off-by: Tao Liu <thomas.liu@ucloud.cn>
      Suggested-by: Ilya Maximets <i.maximets@ovn.org>
      Reviewed-by: Ilya Maximets <i.maximets@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
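
Illustration for the openvswitch meter fix above: a worked example of the arithmetic. A negative 64-bit delta cast to an unsigned 32-bit value wraps to a huge number. Clamping negative deltas to zero before the cast is one way to avoid that. The function names are hypothetical, and the clamp is shown as an assumption about the shape of the fix, not as the exact kernel change.

```c
#include <stdint.h>

/* Reproduces the problem: a negative delta wraps when cast to u32. */
uint32_t sketch_naive_delta_ms(int64_t now_ms, int64_t used_ms)
{
    int64_t long_delta_ms = now_ms - used_ms;
    return (uint32_t)long_delta_ms;
}

/* Treats "another CPU already advanced 'used' past our timestamp"
 * as zero elapsed time instead of letting the value wrap. */
uint32_t sketch_clamped_delta_ms(int64_t now_ms, int64_t used_ms)
{
    int64_t long_delta_ms = now_ms - used_ms;

    if (long_delta_ms < 0)
        long_delta_ms = 0;
    return (uint32_t)long_delta_ms;
}
```

For now_ms = 1000 and used_ms = 1005, the naive version yields 4294967291 (2^32 - 5), which would inflate band->bucket enormously, while the clamped version yields 0.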
    • net: korina: Fix return value check in korina_probe() · c7d83024
      Wei Yongjun authored
      In case of error, the function devm_platform_ioremap_resource_byname()
      returns ERR_PTR() and never returns NULL. The NULL test in the return
      value check should be replaced with IS_ERR().
      
      Fixes: b4cd249a ("net: korina: Use devres functions")
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
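
Illustration for the korina fix above: a user-space sketch of the error-pointer convention, showing why a NULL test never fires for a function that reports failure through ERR_PTR(). The sketch_* helpers imitate the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() but are re-implemented here as assumptions for the example, and sketch_ioremap_byname() is a hypothetical stand-in for devm_platform_ioremap_resource_byname().

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
void *sketch_err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

/* True if the pointer lies in the top SKETCH_MAX_ERRNO addresses,
 * which is where encoded errno values end up. */
bool sketch_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* Decode the errno value from an error pointer. */
long sketch_ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* Hypothetical mapping helper: returns an error pointer on failure
 * and a valid address on success, but never NULL. */
void *sketch_ioremap_byname(bool fail)
{
    static int fake_regs;

    return fail ? sketch_err_ptr(-ENOMEM) : (void *)&fake_regs;
}
```

A failed call returns a non-NULL pointer, so an "if (!ptr)" check would treat the failure as success; sketch_is_err() catches it and sketch_ptr_err() recovers -ENOMEM.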
    • cxgb4/ch_ktls: Clear resources when pf4 device is removed · 65e302a9
      Ayush Sawal authored
      This patch maintains a list of active tids and clears all active
      connection resources when a DETACH notification arrives.
      
      Fixes: a8c16e8e ("crypto/chcr: move nic TLS functionality to drivers/net")
      Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mdio: octeon: Fix some double free issues · e1d027dd
      Christophe JAILLET authored
      'bus->mii_bus' has been allocated with 'devm_mdiobus_alloc_size()' in the
      probe function. So it must not be freed explicitly or there will be a
      double free.
      
      Remove the incorrect 'mdiobus_free' in the error handling path of the
      probe function and in the remove function.
      Suggested-By: Andrew Lunn <andrew@lunn.ch>
      Fixes: 35d2aeac ("phy: mdio-octeon: Use devm_mdiobus_alloc_size()")
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Reviewed-by: Russell King <rmk+kernel@armlinux.org.uk>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
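
Illustration for the mdio-octeon fix above: a toy model of device-managed ("devm") allocation, showing why memory obtained this way must not also be freed by hand. The sketch_* types and functions are hypothetical and far simpler than the kernel's devres machinery. They only show that the device owns the allocation and frees it on release.

```c
#include <stdlib.h>

/* One managed allocation, linked into the owning device's list. */
struct sketch_res {
    struct sketch_res *next;
    void *data;
};

/* Hypothetical device with a list of managed resources. */
struct sketch_dev {
    struct sketch_res *res;
    int released;            /* how many resources release() freed */
};

/* Allocate memory whose lifetime is tied to the device. */
void *sketch_devm_alloc(struct sketch_dev *dev, size_t size)
{
    struct sketch_res *r = malloc(sizeof(*r));

    if (!r)
        return NULL;
    r->data = calloc(1, size);
    if (!r->data) {
        free(r);
        return NULL;
    }
    r->next = dev->res;
    dev->res = r;
    return r->data;
}

/* Runs when the device goes away: frees every managed allocation.
 * If a driver had already freed r->data itself, the free() below
 * would be the second free of the same pointer. */
void sketch_dev_release(struct sketch_dev *dev)
{
    while (dev->res) {
        struct sketch_res *r = dev->res;

        dev->res = r->next;
        free(r->data);
        free(r);
        dev->released++;
    }
}
```

In this model the driver's error path and remove function simply return; the single free happens in sketch_dev_release().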
    • net: mdio: thunder: Fix a double free issue in the .remove function · a93a0a15
      Christophe JAILLET authored
      'bus->mii_bus' has been allocated with 'devm_mdiobus_alloc_size()' in the
      probe function. So it must not be freed explicitly or there will be a
      double free.
      
      Remove the incorrect 'mdiobus_free' in the remove function.
      
      Fixes: 379d7ac7 ("phy: mdio-thunder: Add driver for Cavium Thunder SoC MDIO buses.")
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Reviewed-by: Russell King <rmk+kernel@armlinux.org.uk>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 12 May, 2021 14 commits
  4. 11 May, 2021 15 commits