1. 06 Nov, 2018 7 commits
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · a422757e
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains the first batch of Netfilter fixes for
      your net tree:
      
      1) Fix splat with IPv6 defragmenting locally generated fragments,
         from Florian Westphal.
      
      2) Fix Incorrect check for missing attribute in nft_osf.
      
      3) Missing INT_MIN & INT_MAX definition for netfilter bridge uapi
         header, from Jiri Slaby.
      
      4) Revert map lookup in nft_numgen, this is already possible with
         the existing infrastructure without this extension.
      
      5) Fix wrong listing of set reference counter, make counter
         synchronous again, from Stefano Brivio.
      
      6) Fix CIDR 0 in hash:net,port,net, from Eric Westbrook.
      
      7) Fix allocation failure with large set, use kvcalloc().
         From Andrey Ryabinin.
      
      8) No need to disable BH when fetch ip set comment, patch from
         Jozsef Kadlecsik.
      
      9) Sanity check for valid sysfs entry in xt_IDLETIMER, from
         Taehee Yoo.
      
      10) Fix suspicious rcu usage via ip_set() macro at netlink dump,
          from Jozsef Kadlecsik.
      
      11) Fix setting default timeout via nfnetlink_cttimeout, this
          comes with preparation patch to add nf_{tcp,udp,...}_pernet()
          helper.
      
      12) Allow ebtables table nat to be of filter type via nft_compat.
          From Florian Westphal.
      
      13) Incorrect calculation of next bucket in early_drop, do no bump
          hash value, update bucket counter instead. From Vasily Khoruzhick.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a422757e
    • Rasmus Villemoes's avatar
      net: alx: make alx_drv_name static · 71311931
      Rasmus Villemoes authored
      alx_drv_name is not used outside main.c, so there's no reason for it to
      have external linkage.
      Signed-off-by: default avatarRasmus Villemoes <linux@rasmusvillemoes.dk>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      71311931
    • Taehee Yoo's avatar
      net: bpfilter: fix iptables failure if bpfilter_umh is disabled · 97adadda
      Taehee Yoo authored
      When iptables command is executed, ip_{set/get}sockopt() try to upload
      bpfilter.ko if bpfilter is enabled. if it couldn't find bpfilter.ko,
      command is failed.
      bpfilter.ko is generated if CONFIG_BPFILTER_UMH is enabled.
      ip_{set/get}sockopt() only checks CONFIG_BPFILTER.
      So that if CONFIG_BPFILTER is enabled and CONFIG_BPFILTER_UMH is disabled,
      iptables command is always failed.
      
      test config:
         CONFIG_BPFILTER=y
         # CONFIG_BPFILTER_UMH is not set
      
      test command:
         %iptables -L
         iptables: No chain/target/match by that name.
      
      Fixes: d2ba09c1 ("net: add skeleton of bpfilter kernel module")
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      97adadda
    • Andrei Vagin's avatar
      sock_diag: fix autoloading of the raw_diag module · c34c1287
      Andrei Vagin authored
      IPPROTO_RAW isn't registred as an inet protocol, so
      inet_protos[protocol] is always NULL for it.
      
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Cc: Xin Long <lucien.xin@gmail.com>
      Fixes: bf2ae2e4 ("sock_diag: request _diag module only when the family or proto has been registered")
      Signed-off-by: default avatarAndrei Vagin <avagin@gmail.com>
      Reviewed-by: default avatarCyrill Gorcunov <gorcunov@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c34c1287
    • Matwey V. Kornilov's avatar
      net: core: netpoll: Enable netconsole IPv6 link local address · d016b4a3
      Matwey V. Kornilov authored
      There is no reason to discard using source link local address when
      remote netconsole IPv6 address is set to be link local one.
      
      The patch allows administrators to use IPv6 netconsole without
      explicitly configuring source address:
      
          netconsole=@/,@fe80::5054:ff:fe2f:6012/
      Signed-off-by: default avatarMatwey V. Kornilov <matwey@sai.msu.ru>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d016b4a3
    • Alexey Kodanev's avatar
      ipv6: properly check return value in inet6_dump_all() · e22d0bfa
      Alexey Kodanev authored
      Make sure we call fib6_dump_end() if it happens that skb->len
      is zero. rtnl_dump_all() can reset cb->args on the next loop
      iteration there.
      
      Fixes: 08e814c9 ("net/ipv6: Bail early if user only wants cloned entries")
      Fixes: ae677bbb ("net: Don't return invalid table id error when dumping all families")
      Signed-off-by: default avatarAlexey Kodanev <alexey.kodanev@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e22d0bfa
    • Alexey Kodanev's avatar
      rtnetlink: restore handling of dumpit return value in rtnl_dump_all() · 5e1acb4a
      Alexey Kodanev authored
      For non-zero return from dumpit() we should break the loop
      in rtnl_dump_all() and return the result. Otherwise, e.g.,
      we could get the memory leak in inet6_dump_fib() [1]. The
      pointer to the allocated struct fib6_walker there (saved
      in cb->args) can be lost, reset on the next iteration.
      
      Fix it by partially restoring the previous behavior before
      commit c63586dc ("net: rtnl_dump_all needs to propagate
      error from dumpit function"). The returned error from
      dumpit() is still passed further.
      
      [1]:
      unreferenced object 0xffff88001322a200 (size 96):
        comm "sshd", pid 1484, jiffies 4296032768 (age 1432.542s)
        hex dump (first 32 bytes):
          00 01 00 00 00 00 ad de 00 02 00 00 00 00 ad de  ................
          18 09 41 36 00 88 ff ff 18 09 41 36 00 88 ff ff  ..A6......A6....
        backtrace:
          [<0000000095846b39>] kmem_cache_alloc_trace+0x151/0x220
          [<000000007d12709f>] inet6_dump_fib+0x68d/0x940
          [<000000002775a316>] rtnl_dump_all+0x1d9/0x2d0
          [<00000000d7cd302b>] netlink_dump+0x945/0x11a0
          [<000000002f43485f>] __netlink_dump_start+0x55d/0x800
          [<00000000f76bbeec>] rtnetlink_rcv_msg+0x4fa/0xa00
          [<000000009b5761f3>] netlink_rcv_skb+0x29c/0x420
          [<0000000087a1dae1>] rtnetlink_rcv+0x15/0x20
          [<00000000691b703b>] netlink_unicast+0x4e3/0x6c0
          [<00000000b5be0204>] netlink_sendmsg+0x7f2/0xba0
          [<0000000096d2aa60>] sock_sendmsg+0xba/0xf0
          [<000000008c1b786f>] __sys_sendto+0x1e4/0x330
          [<0000000019587b3f>] __x64_sys_sendto+0xe1/0x1a0
          [<00000000071f4d56>] do_syscall_64+0x9f/0x300
          [<000000002737577f>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
          [<0000000057587684>] 0xffffffffffffffff
      
      Fixes: c63586dc ("net: rtnl_dump_all needs to propagate error from dumpit function")
      Signed-off-by: default avatarAlexey Kodanev <alexey.kodanev@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5e1acb4a
  2. 05 Nov, 2018 3 commits
    • Jeff Barnhill's avatar
      net/ipv6: Move anycast init/cleanup functions out of CONFIG_PROC_FS · 6915ed86
      Jeff Barnhill authored
      Move the anycast.c init and cleanup functions which were inadvertently
      added inside the CONFIG_PROC_FS definition.
      
      Fixes: 2384d025 ("net/ipv6: Add anycast addresses to a global hashtable")
      Signed-off-by: default avatarJeff Barnhill <0xeffeff@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6915ed86
    • Jarod Wilson's avatar
      bonding/802.3ad: fix link_failure_count tracking · ea53abfa
      Jarod Wilson authored
      Commit 4d2c0cda set slave->link to
      BOND_LINK_DOWN for 802.3ad bonds whenever invalid speed/duplex values
      were read, to fix a problem with slaves getting into weird states, but
      in the process, broke tracking of link failures, as going straight to
      BOND_LINK_DOWN when a link is indeed down (cable pulled, switch rebooted)
      means we broke out of bond_miimon_inspect()'s BOND_LINK_DOWN case because
      !link_state was already true, we never incremented commit, and never got
      a chance to call bond_miimon_commit(), where slave->link_failure_count
      would be incremented. I believe the simple fix here is to mark the slave
      as BOND_LINK_FAIL, and let bond_miimon_inspect() transition the link from
      _FAIL to either _UP or _DOWN, and in the latter case, we now get proper
      incrementing of link_failure_count again.
      
      Fixes: 4d2c0cda ("bonding: speed/duplex update at NETDEV_UP event")
      CC: Mahesh Bandewar <maheshb@google.com>
      CC: David S. Miller <davem@davemloft.net>
      CC: netdev@vger.kernel.org
      CC: stable@vger.kernel.org
      Signed-off-by: default avatarJarod Wilson <jarod@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ea53abfa
    • Holger Hoffstätte's avatar
      net: phy: realtek: fix RTL8201F sysfs name · 0432e833
      Holger Hoffstätte authored
      Since 4.19 the following error in sysfs has appeared when using the
      r8169 NIC driver:
      
      $cd /sys/module/realtek/drivers
      $ls -l
      ls: cannot access 'mdio_bus:RTL8201F 10/100Mbps Ethernet': No such file or directory
      [..garbled dir entries follow..]
      
      Apparently the forward slash in "10/100Mbps Ethernet" is interpreted
      as directory separator that leads nowhere, and was introduced in commit
      513588dd ("net: phy: realtek: add RTL8201F phy-id and functions").
      
      Fix this by removing the offending slash in the driver name.
      
      Other drivers in net/phy seem to have the same problem, but I cannot
      test/verify them.
      
      Fixes: 513588dd ("net: phy: realtek: add RTL8201F phy-id and functions")
      Signed-off-by: default avatarHolger Hoffstätte <holger@applied-asynchrony.com>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0432e833
  3. 04 Nov, 2018 6 commits
  4. 03 Nov, 2018 24 commits
    • Yunsheng Lin's avatar
      net: hns3: Fix for out-of-bounds access when setting pfc back pressure · e8ccbb7d
      Yunsheng Lin authored
      The vport should be initialized to hdev->vport for each bp group,
      otherwise it will cause out-of-bounds access and bp setting not
      correct problem.
      
      [   35.254124] BUG: KASAN: slab-out-of-bounds in hclge_pause_setup_hw+0x2a0/0x3f8 [hclge]
      [   35.254126] Read of size 2 at addr ffff803b6651581a by task kworker/0:1/14
      
      [   35.254132] CPU: 0 PID: 14 Comm: kworker/0:1 Not tainted 4.19.0-rc7-hulk+ #85
      [   35.254133] Hardware name: Huawei D06/D06, BIOS Hisilicon D06 UEFI RC0 - B052 (V0.52) 09/14/2018
      [   35.254141] Workqueue: events work_for_cpu_fn
      [   35.254144] Call trace:
      [   35.254147]  dump_backtrace+0x0/0x2f0
      [   35.254149]  show_stack+0x24/0x30
      [   35.254154]  dump_stack+0x110/0x184
      [   35.254157]  print_address_description+0x168/0x2b0
      [   35.254160]  kasan_report+0x184/0x310
      [   35.254162]  __asan_load2+0x7c/0xa0
      [   35.254170]  hclge_pause_setup_hw+0x2a0/0x3f8 [hclge]
      [   35.254177]  hclge_tm_init_hw+0x794/0x9f0 [hclge]
      [   35.254184]  hclge_tm_schd_init+0x48/0x58 [hclge]
      [   35.254191]  hclge_init_ae_dev+0x778/0x1168 [hclge]
      [   35.254196]  hnae3_register_ae_dev+0x14c/0x298 [hnae3]
      [   35.254206]  hns3_probe+0x88/0xa8 [hns3]
      [   35.254210]  local_pci_probe+0x7c/0xf0
      [   35.254212]  work_for_cpu_fn+0x34/0x50
      [   35.254214]  process_one_work+0x4d4/0xa38
      [   35.254216]  worker_thread+0x55c/0x8d8
      [   35.254219]  kthread+0x1b0/0x1b8
      [   35.254222]  ret_from_fork+0x10/0x1c
      
      [   35.254224] The buggy address belongs to the page:
      [   35.254228] page:ffff7e00ed994400 count:1 mapcount:0 mapping:0000000000000000 index:0x0 compound_mapcount: 0
      [   35.273835] flags: 0xfffff8000008000(head)
      [   35.282007] raw: 0fffff8000008000 dead000000000100 dead000000000200 0000000000000000
      [   35.282010] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
      [   35.282012] page dumped because: kasan: bad access detected
      
      [   35.282014] Memory state around the buggy address:
      [   35.282017]  ffff803b66515700: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
      [   35.282019]  ffff803b66515780: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
      [   35.282021] >ffff803b66515800: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
      [   35.282022]                             ^
      [   35.282024]  ffff803b66515880: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
      [   35.282026]  ffff803b66515900: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
      [   35.282028] ==================================================================
      [   35.282029] Disabling lock debugging due to kernel taint
      [   35.282747] hclge driver initialization finished.
      
      Fixes: 67bf2541 ("net: hns3: Fixes the back pressure setting when sriov is enabled")
      Signed-off-by: default avatarYunsheng Lin <linyunsheng@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e8ccbb7d
    • David S. Miller's avatar
      Merge branch 'net-bql-better-deal-with-GSO' · cb53fd54
      David S. Miller authored
      Eric Dumazet says:
      
      ====================
      net: bql: better deal with GSO
      
      While BQL bulk dequeue works well for TSO packets, it is
      not very efficient as soon as GSO is involved.
      
      On a GSO only workload (UDP or TCP), this patch series
      can save about 8 % of cpu cycles on a 40Gbit mlx4 NIC,
      by keeping optimal batching, and avoiding expensive
      doorbells, qdisc requeues and reschedules.
      
      This patch series :
      
      - Add __netdev_tx_sent_queue() so that drivers
        can implement efficient BQL and xmit_more support.
      
      - Implement a work around in dev_hard_start_xmit()
        for drivers not using __netdev_tx_sent_queue()
      
      - changes mlx4 to use __netdev_tx_sent_queue()
      
      v2: Tariq and Willem feedback addressed.
          added __netdev_tx_sent_queue() (Willem suggestion)
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      cb53fd54
    • Eric Dumazet's avatar
      net/mlx4_en: use __netdev_tx_sent_queue() · c2973444
      Eric Dumazet authored
      doorbell only depends on xmit_more and netif_tx_queue_stopped()
      
      Using __netdev_tx_sent_queue() avoids messing with BQL stop flag,
      and is more generic.
      
      This patch increases performance on GSO workload by keeping
      doorbells to the minimum required.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Tariq Toukan <tariqt@mellanox.com>
      Reviewed-by: default avatarTariq Toukan <tariqt@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c2973444
    • Eric Dumazet's avatar
      net: do not abort bulk send on BQL status · fe60faa5
      Eric Dumazet authored
      Before calling dev_hard_start_xmit(), upper layers tried
      to cook optimal skb list based on BQL budget.
      
      Problem is that GSO packets can end up comsuming more than
      the BQL budget.
      
      Breaking the loop is not useful, since requeued packets
      are ahead of any packets still in the qdisc.
      
      It is also more expensive, since next TX completion will
      push these packets later, while skbs are not in cpu caches.
      
      It is also a behavior difference with TSO packets, that can
      break the BQL limit by a large amount.
      
      Note that drivers should use __netdev_tx_sent_queue()
      in order to have optimal xmit_more support, and avoid
      useless atomic operations as shown in the following patch.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      fe60faa5
    • Eric Dumazet's avatar
      net: bql: add __netdev_tx_sent_queue() · 3e59020a
      Eric Dumazet authored
      When qdisc_run() tries to use BQL budget to bulk-dequeue a batch
      of packets, GSO can later transform this list in another list
      of skbs, and each skb is sent to device ndo_start_xmit(),
      one at a time, with skb->xmit_more being set to one but
      for last skb.
      
      Problem is that very often, BQL limit is hit in the middle of
      the packet train, forcing dev_hard_start_xmit() to stop the
      bulk send and requeue the end of the list.
      
      BQL role is to avoid head of line blocking, making sure
      a qdisc can deliver high priority packets before low priority ones.
      
      But there is no way requeued packets can be bypassed by fresh
      packets in the qdisc.
      
      Aborting the bulk send increases TX softirqs, and hot cache
      lines (after skb_segment()) are wasted.
      
      Note that for TSO packets, we never split a packet in the middle
      because of BQL limit being hit.
      
      Drivers should be able to update BQL counters without
      flipping/caring about BQL status, if the current skb
      has xmit_more set.
      
      Upper layers are ultimately responsible to stop sending another
      packet train when BQL limit is hit.
      
      Code template in a driver might look like the following :
      
      	send_doorbell = __netdev_tx_sent_queue(tx_queue, nr_bytes, skb->xmit_more);
      
      Note that __netdev_tx_sent_queue() use is not mandatory,
      since following patch will change dev_hard_start_xmit()
      to not care about BQL status.
      
      But it is highly recommended so that xmit_more full benefits
      can be reached (less doorbells sent, and less atomic operations as well)
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3e59020a
    • David S. Miller's avatar
      Merge branch 's390-qeth-fixes' · b8a5d06a
      David S. Miller authored
      Julian Wiedmann says:
      
      ====================
      s390/qeth: fixes 2018-11-02
      
      please apply one round of qeth fixes for -net.
      
      Patch 1 is rather large and removes a use-after-free hazard from many of our
      debug trace entries.
      Patch 2 is yet another fix-up for the L3 subdriver's new IP address management
      code.
      Patch 3 and 4 resolve some fallout from the recent changes wrt how/when qeth
      allocates its net_device.
      Patch 5 makes sure we don't set reserved bits when building HW commands from
      user-provided data.
      And finally, patch 6 allows ethtool to play nice with new HW.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b8a5d06a
    • Julian Wiedmann's avatar
      s390/qeth: report 25Gbit link speed · 54e049c2
      Julian Wiedmann authored
      This adds the various identifiers for 25Gbit cards, and wires them up
      into sysfs and ethtool.
      Signed-off-by: default avatarJulian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      54e049c2
    • Julian Wiedmann's avatar
      s390/qeth: sanitize ARP requests · 125d7d30
      Julian Wiedmann authored
      The ARP_{ADD,REMOVE}_ENTRY cmd structs contain reserved fields.
      Introduce a common helper that doesn't raw-copy the user-provided data
      into the cmd, but only sets those fields that are strictly needed for
      the command.
      
      This also sets the correct command length for ARP_REMOVE_ENTRY.
      Signed-off-by: default avatarJulian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      125d7d30
    • Julian Wiedmann's avatar
      s390/qeth: fix initial operstate · 9fae5c3b
      Julian Wiedmann authored
      Setting the carrier 'on' for an unregistered netdevice doesn't update
      its operstate. Fix this by delaying the update until the netdevice has
      been registered.
      
      Fixes: 91cc98f5 ("s390/qeth: remove duplicated carrier state tracking")
      Signed-off-by: default avatarJulian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9fae5c3b
    • Julian Wiedmann's avatar
      s390/qeth: unregister netdevice only when registered · 30356d08
      Julian Wiedmann authored
      qeth only registers its netdevice when the qeth device is first set
      online. Thus a device that has never been set online will trigger
      a WARN ("network todo 'hsi%d' but state 0") in unregister_netdev() when
      removed.
      
      Fix this by protecting the unregister step, just like we already protect
      against repeated registering of the netdevice.
      
      Fixes: d3d1b205 ("s390/qeth: allocate netdevice early")
      Reported-by: default avatarKarsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: default avatarJulian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      30356d08
    • Julian Wiedmann's avatar
      s390/qeth: fix HiperSockets sniffer · bd74a7f9
      Julian Wiedmann authored
      Sniffing mode for L3 HiperSockets requires that no IP addresses are
      registered with the HW. The preferred way to achieve this is for
      userspace to delete all the IPs on the interface. But qeth is expected
      to also tolerate a configuration where that is not the case, by skipping
      the IP registration when in sniffer mode.
      Since commit 5f78e29c ("qeth: optimize IP handling in rx_mode callback")
      reworked the IP registration logic in the L3 subdriver, this no longer
      works. When the qeth device is set online, qeth_l3_recover_ip() now
      unconditionally registers all unicast addresses from our internal
      IP table.
      
      While we could fix this particular problem by skipping
      qeth_l3_recover_ip() on a sniffer device, the more future-proof change
      is to skip the IP address registration at the lowest level. This way we
      a) catch any future code path that attempts to register an IP address
         without considering the sniffer scenario, and
      b) continue to build up our internal IP table, so that if sniffer mode
         is switched off later we can operate just like normal.
      
      Fixes: 5f78e29c ("qeth: optimize IP handling in rx_mode callback")
      Signed-off-by: default avatarJulian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bd74a7f9
    • Julian Wiedmann's avatar
      s390/qeth: sanitize strings in debug messages · e19e5be8
      Julian Wiedmann authored
      As Documentation/s390/s390dbf.txt states quite clearly, using any
      pointer in sprinf-formatted s390dbf debug entries is dangerous.
      The pointers are dereferenced whenever the trace file is read from.
      So if the referenced data has a shorter life-time than the trace file,
      any read operation can result in a use-after-free.
      
      So rip out all hazardous use of indirect data, and replace any usage of
      dev_name() and such by the Bus ID number.
      Signed-off-by: default avatarJulian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e19e5be8
    • Vasily Khoruzhick's avatar
      netfilter: conntrack: fix calculation of next bucket number in early_drop · f393808d
      Vasily Khoruzhick authored
      If there's no entry to drop in bucket that corresponds to the hash,
      early_drop() should look for it in other buckets. But since it increments
      hash instead of bucket number, it actually looks in the same bucket 8
      times: hsize is 16k by default (14 bits) and hash is 32-bit value, so
      reciprocal_scale(hash, hsize) returns the same value for hash..hash+7 in
      most cases.
      
      Fix it by increasing bucket number instead of hash and rename _hash
      to bucket to avoid future confusion.
      
      Fixes: 3e86638e ("netfilter: conntrack: consider ct netns in early_drop logic")
      Cc: <stable@vger.kernel.org> # v4.7+
      Signed-off-by: default avatarVasily Khoruzhick <vasilykh@arista.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      f393808d
    • Florian Westphal's avatar
      netfilter: nft_compat: ebtables 'nat' table is normal chain type · e4844c9c
      Florian Westphal authored
      Unlike ip(6)tables, the ebtables nat table has no special properties.
      This bug causes 'ebtables -A' to fail when using a target such as
      'snat' (ebt_snat target sets ".table = "nat"').  Targets that have
      no table restrictions work fine.
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      e4844c9c
    • Pablo Neira Ayuso's avatar
      netfilter: nfnetlink_cttimeout: pass default timeout policy to obj_to_nlattr · 8866df92
      Pablo Neira Ayuso authored
      Otherwise, we hit a NULL pointer deference since handlers always assume
      default timeout policy is passed.
      
        netlink: 24 bytes leftover after parsing attributes in process `syz-executor2'.
        kasan: CONFIG_KASAN_INLINE enabled
        kasan: GPF could be caused by NULL-ptr deref or user memory access
        general protection fault: 0000 [#1] PREEMPT SMP KASAN
        CPU: 0 PID: 9575 Comm: syz-executor1 Not tainted 4.19.0+ #312
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
        RIP: 0010:icmp_timeout_obj_to_nlattr+0x77/0x170 net/netfilter/nf_conntrack_proto_icmp.c:297
      
      Fixes: c779e849 ("netfilter: conntrack: remove get_timeout() indirection")
      Reported-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      8866df92
    • Pablo Neira Ayuso's avatar
      netfilter: conntrack: add nf_{tcp,udp,sctp,icmp,dccp,icmpv6,generic}_pernet() · a95a7774
      Pablo Neira Ayuso authored
      Expose these functions to access conntrack protocol tracker netns area,
      nfnetlink_cttimeout needs this.
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      a95a7774
    • Jozsef Kadlecsik's avatar
      netfilter: ipset: Fix calling ip_set() macro at dumping · 8a02bdd5
      Jozsef Kadlecsik authored
      The ip_set() macro is called when either ip_set_ref_lock held only
      or no lock/nfnl mutex is held at dumping. Take this into account
      properly. Also, use Pablo's suggestion to use rcu_dereference_raw(),
      the ref_netlink protects the set.
      Signed-off-by: default avatarJozsef Kadlecsik <kadlec@blackhole.kfki.hu>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      8a02bdd5
    • Taehee Yoo's avatar
      netfilter: xt_IDLETIMER: add sysfs filename checking routine · 54451f60
      Taehee Yoo authored
      When IDLETIMER rule is added, sysfs file is created under
      /sys/class/xt_idletimer/timers/
      But some label name shouldn't be used.
      ".", "..", "power", "uevent", "subsystem", etc...
      So that sysfs filename checking routine is needed.
      
      test commands:
         %iptables -I INPUT -j IDLETIMER --timeout 1 --label "power"
      
      splat looks like:
      [95765.423132] sysfs: cannot create duplicate filename '/devices/virtual/xt_idletimer/timers/power'
      [95765.433418] CPU: 0 PID: 8446 Comm: iptables Not tainted 4.19.0-rc6+ #20
      [95765.449755] Call Trace:
      [95765.449755]  dump_stack+0xc9/0x16b
      [95765.449755]  ? show_regs_print_info+0x5/0x5
      [95765.449755]  sysfs_warn_dup+0x74/0x90
      [95765.449755]  sysfs_add_file_mode_ns+0x352/0x500
      [95765.449755]  sysfs_create_file_ns+0x179/0x270
      [95765.449755]  ? sysfs_add_file_mode_ns+0x500/0x500
      [95765.449755]  ? idletimer_tg_checkentry+0x3e5/0xb1b [xt_IDLETIMER]
      [95765.449755]  ? rcu_read_lock_sched_held+0x114/0x130
      [95765.449755]  ? __kmalloc_track_caller+0x211/0x2b0
      [95765.449755]  ? memcpy+0x34/0x50
      [95765.449755]  idletimer_tg_checkentry+0x4e2/0xb1b [xt_IDLETIMER]
      [ ... ]
      
      Fixes: 0902b469 ("netfilter: xtables: idletimer target implementation")
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      54451f60
    • Jozsef Kadlecsik's avatar
      netfilter: ipset: Correct rcu_dereference() call in ip_set_put_comment() · 17b8b74c
      Jozsef Kadlecsik authored
      The function is called when rcu_read_lock() is held and not
      when rcu_read_lock_bh() is held.
      Signed-off-by: default avatarJozsef Kadlecsik <kadlec@blackhole.kfki.hu>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      17b8b74c
    • David S. Miller's avatar
      Merge branch 'net-timeout-fixes-for-GENET-and-SYSTEMPORT' · 265ad063
      David S. Miller authored
      Florian Fainelli says:
      
      ====================
      net: timeout fixes for GENET and SYSTEMPORT
      
      This patch series fixes occasional transmit timeout around the time
      the system goes into suspend. GENET and SYSTEMPORT have nearly the same
      logic in that regard and were both affected in the same way.
      
      Please queue up for stable, thanks!
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      265ad063
    • Florian Fainelli's avatar
      net: systemport: Protect stop from timeout · 7cb6a2a2
      Florian Fainelli authored
      A timing hazard exists when the network interface is stopped that
      allows a watchdog timeout to be processed by a separate core in
      parallel. This creates the potential for the timeout handler to
      wake the queues while the driver is shutting down, or access
      registers after their clocks have been removed.
      
      The more common case is that the watchdog timeout will produce a
      warning message which doesn't lead to a crash. The chances of this
      are greatly increased by the fact that bcm_sysport_netif_stop stops
      the transmit queues which can easily precipitate a watchdog time-
      out because of stale trans_start data in the queues.
      
      This commit corrects the behavior by ensuring that the watchdog
      timeout is disabled before enterring bcm_sysport_netif_stop. There
      are currently only two users of the bcm_sysport_netif_stop function:
      close and suspend.
      
      The close case already handles the issue by exiting the RUNNING
      state before invoking the driver close service.
      
      The suspend case now performs the netif_device_detach to exit the
      PRESENT state before the call to bcm_sysport_netif_stop rather than
      after it.
      
      These behaviors prevent any future scheduling of the driver timeout
      service during the window. The netif_tx_stop_all_queues function
      in bcm_sysport_netif_stop is replaced with netif_tx_disable to ensure
      synchronization with any transmit or timeout threads that may
      already be executing on other cores.
      
      For symmetry, the netif_device_attach call upon resume is moved to
      after the call to bcm_sysport_netif_start. Since it wakes the transmit
      queues it is not necessary to invoke netif_tx_start_all_queues from
      bcm_sysport_netif_start so it is moved into the driver open service.
      
      Fixes: 40755a0f ("net: systemport: add suspend and resume support")
      Fixes: 80105bef ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC driver")
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7cb6a2a2
    • Doug Berger's avatar
      net: bcmgenet: protect stop from timeout · 09e805d2
      Doug Berger authored
      A timing hazard exists when the network interface is stopped that
      allows a watchdog timeout to be processed by a separate core in
      parallel. This creates the potential for the timeout handler to
      wake the queues while the driver is shutting down, or access
      registers after their clocks have been removed.
      
      The more common case is that the watchdog timeout will produce a
      warning message which doesn't lead to a crash. The chances of this
      are greatly increased by the fact that bcmgenet_netif_stop stops
      the transmit queues which can easily precipitate a watchdog time-
      out because of stale trans_start data in the queues.
      
      This commit corrects the behavior by ensuring that the watchdog
      timeout is disabled before enterring bcmgenet_netif_stop. There
      are currently only two users of the bcmgenet_netif_stop function:
      close and suspend.
      
      The close case already handles the issue by exiting the RUNNING
      state before invoking the driver close service.
      
      The suspend case now performs the netif_device_detach to exit the
      PRESENT state before the call to bcmgenet_netif_stop rather than
      after it.
      
      These behaviors prevent any future scheduling of the driver timeout
      service during the window. The netif_tx_stop_all_queues function
      in bcmgenet_netif_stop is replaced with netif_tx_disable to ensure
      synchronization with any transmit or timeout threads that may
      already be executing on other cores.
      
      For symmetry, the netif_device_attach call upon resume is moved to
      after the call to bcmgenet_netif_start. Since it wakes the transmit
      queues it is not necessary to invoke netif_tx_start_all_queues from
      bcmgenet_netif_start so it is moved into the driver open service.
      
      Fixes: 1c1008c7 ("net: bcmgenet: add main driver file")
      Signed-off-by: default avatarDoug Berger <opendmb@gmail.com>
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      09e805d2
    • David Howells's avatar
      rxrpc: Fix lockup due to no error backoff after ack transmit error · c7e86acf
      David Howells authored
      If the network becomes (partially) unavailable, say by disabling IPv6, the
      background ACK transmission routine can get itself into a tizzy by
      proposing immediate ACK retransmission.  Since we're in the call event
      processor, that happens immediately without returning to the workqueue
      manager.
      
      The condition should clear after a while when either the network comes back
      or the call times out.
      
      Fix this by:
      
       (1) When re-proposing an ACK on failed Tx, don't schedule it immediately.
           This will allow a certain amount of time to elapse before we try
           again.
      
       (2) Enforce a return to the workqueue manager after a certain number of
           iterations of the call processing loop.
      
       (3) Add a backoff delay that increases the delay on deferred ACKs by a
           jiffy per failed transmission to a limit of HZ.  The backoff delay is
           cleared on a successful return from kernel_sendmsg().
      
       (4) Cancel calls immediately if the opening sendmsg fails.  The layer
           above can arrange retransmission or rotate to another server.
      
      Fixes: 248f219c ("rxrpc: Rewrite the data and ack handling code")
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c7e86acf
    • Tristram Ha's avatar
      net: dsa: microchip: initialize mutex before use · 284fb78e
      Tristram Ha authored
      Initialize mutex before use.  Avoid kernel complaint when
      CONFIG_DEBUG_LOCK_ALLOC is enabled.
      
      Fixes: b987e98e ("dsa: add DSA switch driver for Microchip KSZ9477")
      Signed-off-by: default avatarTristram Ha <Tristram.Ha@microchip.com>
      Reviewed-by: default avatarPavel Machek <pavel@ucw.cz>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      284fb78e