1. 07 Dec, 2019 11 commits
    • tcp: tighten acceptance of ACKs not matching a child socket · cb44a08f
      Guillaume Nault authored
      When no synflood occurs, the synflood timestamp isn't updated.
      Therefore it can be so old that time_after32() can consider it to be
      in the future.
      
      That's a problem for tcp_synq_no_recent_overflow() as it may report
      that a recent overflow occurred while, in fact, it's just that jiffies
      has grown past 'last_overflow' + TCP_SYNCOOKIE_VALID + 2^31.
      
      Spurious detection of recent overflows leads to extra syncookie
      verification in cookie_v[46]_check(). At that point, the verification
      should fail and the packet should be dropped. But we should have
      dropped the packet earlier, as we didn't even send a syncookie.
      
      Let's refine tcp_synq_no_recent_overflow() to report a recent overflow
      only if jiffies is within the
      [last_overflow, last_overflow + TCP_SYNCOOKIE_VALID] interval. This
      way, no spurious recent overflow is reported when jiffies wraps and
      'last_overflow' ends up in the future from the point of view of
      time_after32().
      
      However, if jiffies wraps and enters the
      [last_overflow, last_overflow + TCP_SYNCOOKIE_VALID] interval (with
      'last_overflow' being a stale synflood timestamp), then
      tcp_synq_no_recent_overflow() still erroneously reports an
      overflow. In such cases, we have to rely on syncookie verification
      to drop the packet. We unfortunately have no way to differentiate
      between a fresh and a stale syncookie timestamp.
      
      In practice, using last_overflow as lower bound is problematic.
      If the synflood timestamp is concurrently updated between the time
      we read jiffies and the moment we store the timestamp in
      'last_overflow', then 'now' becomes smaller than 'last_overflow' and
      tcp_synq_no_recent_overflow() returns true, potentially dropping a
      valid syncookie.
      
      Reading jiffies after loading the timestamp could fix the problem,
      but that'd require a memory barrier. Let's just accommodate
      potential timestamp growth instead and extend the interval using
      'last_overflow - HZ' as lower bound.
      Signed-off-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
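The interval check described in this message can be sketched in userspace C. This is a minimal sketch, not the kernel code: it assumes HZ=1000 and a 120-second TCP_SYNCOOKIE_VALID window, the two comparison helpers are modelled on the kernel's wrapping 32-bit time helpers, and `old_no_recent_overflow()` / `new_no_recent_overflow()` are hypothetical names for the behaviour before and after the change.

```c
#include <stdbool.h>
#include <stdint.h>

#define HZ 1000u                        /* assumed tick rate */
#define TCP_SYNCOOKIE_VALID (120u * HZ) /* assumed validity window */

/* Wrapping 32-bit comparison: true if a is later than b. */
static inline bool time_after32(uint32_t a, uint32_t b)
{
    return (int32_t)(uint32_t)(b - a) < 0;
}

/* True if t lies in [l, h], computed modulo 2^32. */
static inline bool time_between32(uint32_t t, uint32_t l, uint32_t h)
{
    return (uint32_t)(h - l) >= (uint32_t)(t - l);
}

/* Hypothetical model of the old check: only the upper bound is tested. */
static inline bool old_no_recent_overflow(uint32_t now, uint32_t last_overflow)
{
    return time_after32(now, last_overflow + TCP_SYNCOOKIE_VALID);
}

/* Hypothetical model of the new check: 'now' must fall inside
 * [last_overflow - HZ, last_overflow + TCP_SYNCOOKIE_VALID]
 * for an overflow to count as recent.
 */
static inline bool new_no_recent_overflow(uint32_t now, uint32_t last_overflow)
{
    return !time_between32(now, last_overflow - HZ,
                           last_overflow + TCP_SYNCOOKIE_VALID);
}
```

With a stale timestamp of 0 and jiffies at 3000000000, the old model still reports a recent overflow, while the bounded model does not. A 'now' that is up to HZ ticks behind 'last_overflow' is still treated as recent, which covers the concurrent-update case.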
    • tcp: fix rejected syncookies due to stale timestamps · 04d26e7b
      Guillaume Nault authored
      If no synflood happens for a long enough period of time, then the
      synflood timestamp isn't refreshed and jiffies can advance so much
      that time_after32() can't accurately compare them any more.
      
      Therefore, we can end up in a situation where time_after32(now,
      last_overflow + HZ) returns false, just because these two values are
      too far apart. In that case, the synflood timestamp isn't updated as
      it should be, which can trick tcp_synq_no_recent_overflow() into
      rejecting valid syncookies.
      
      For example, let's consider the following scenario on a system
      with HZ=1000:
      
        * The synflood timestamp is 0, either because that's the timestamp
          of the last synflood or, more commonly, because we're working with
          a freshly created socket.
      
        * We receive a new SYN, which triggers synflood protection. Let's say
          that this happens when jiffies == 2147484649 (that is,
          'synflood timestamp' + HZ + 2^31 + 1).
      
        * Then tcp_synq_overflow() doesn't update the synflood timestamp,
          because time_after32(2147484649, 1000) returns false.
          With:
            - 2147484649: the value of jiffies, aka. 'now'.
            - 1000: the value of 'last_overflow' + HZ.
      
        * A bit later, we receive the ACK completing the 3WHS. But
          cookie_v[46]_check() rejects it because tcp_synq_no_recent_overflow()
          says that we're not under synflood. That's because
          time_after32(2147484649, 120000) returns true.
          With:
            - 2147484649: the value of jiffies, aka. 'now'.
            - 120000: the value of 'last_overflow' + TCP_SYNCOOKIE_VALID.
      
          Of course, in reality jiffies would have increased a bit, but this
          condition will last for the next 119 seconds, which is long enough
          to accommodate the growth of jiffies.
      
      Fix this by updating the overflow timestamp whenever jiffies isn't
      within the [last_overflow, last_overflow + HZ] range. That shouldn't
      have any performance impact since the update still happens at most once
      per second.
      
      Now we're guaranteed to have fresh timestamps while under synflood, so
      tcp_synq_no_recent_overflow() can safely use it with time_after32() in
      such situations.
      
      Stale timestamps can still make tcp_synq_no_recent_overflow() return
      the wrong verdict when not under synflood. This will be handled in the
      next patch.
      
      For 64-bit architectures, the problem was introduced with the
      conversion of ->tw_ts_recent_stamp to a 32-bit integer by commit
      cca9bab1 ("tcp: use monotonic timestamps for PAWS").
      The problem has always been there on 32-bit architectures.
      
      Fixes: cca9bab1 ("tcp: use monotonic timestamps for PAWS")
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
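The first step of the HZ=1000 scenario, and the effect of the range-based update rule, can be reproduced with a short userspace sketch. The helpers are modelled on the kernel's wrapping 32-bit comparisons, and `old_should_refresh()` / `new_should_refresh()` are hypothetical names chosen for illustration, not kernel functions.

```c
#include <stdbool.h>
#include <stdint.h>

#define HZ 1000u /* as in the scenario in this message */

/* Wrapping 32-bit comparison: true if a is later than b. */
static inline bool time_after32(uint32_t a, uint32_t b)
{
    return (int32_t)(uint32_t)(b - a) < 0;
}

/* True if t lies in [l, h], computed modulo 2^32. */
static inline bool time_between32(uint32_t t, uint32_t l, uint32_t h)
{
    return (uint32_t)(h - l) >= (uint32_t)(t - l);
}

/* Hypothetical model of the old rule: refresh the timestamp only when
 * 'now' compares later than last_overflow + HZ.
 */
static inline bool old_should_refresh(uint32_t now, uint32_t last_overflow)
{
    return time_after32(now, last_overflow + HZ);
}

/* Hypothetical model of the new rule: refresh whenever 'now' is outside
 * [last_overflow, last_overflow + HZ].
 */
static inline bool new_should_refresh(uint32_t now, uint32_t last_overflow)
{
    return !time_between32(now, last_overflow, last_overflow + HZ);
}
```

For last_overflow == 0 and now == 2147484649, the old rule skips the refresh because the two values are more than 2^31 apart, while the new rule refreshes. Inside the one-second window (for example now == 500) neither rule refreshes, so the update rate stays bounded.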
    • Merge tag 'mlx5-fixes-2019-12-05' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 537d0779
      David S. Miller authored
      Saeed Mahameed says:
      
      ====================
      Mellanox, mlx5 fixes 2019-12-05
      
      This series introduces some fixes to mlx5 driver.
      
      Please pull and let me know if there is any problem.
      
      For -stable v4.19:
       ('net/mlx5e: Query global pause state before setting prio2buffer')
      
      For -stable v5.3
       ('net/mlx5e: Fix SFF 8472 eeprom length')
       ('net/mlx5e: Fix translation of link mode into speed')
       ('net/mlx5e: Fix freeing flow with kfree() and not kvfree()')
       ('net/mlx5e: ethtool, Fix analysis of speed setting')
       ('net/mlx5e: Fix TXQ indices to be sequential')
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • lpc_eth: kernel BUG on remove · 04aa1bc4
      Bruno Carneiro da Cunha authored
      We may have found a bug in the nxp/lpc_eth.c driver. The function
      platform_set_drvdata() is called twice. The second call, in
      lpc_mii_init(), overwrites the struct net_device pointer that should
      be at pdev->dev->driver_data with pldat->mii_bus. When trying to remove
      the driver, in lpc_eth_drv_remove(), platform_get_drvdata() will
      return the pldat->mii_bus pointer, which is then used as a struct
      net_device pointer. This causes unregister_netdev() to segfault and
      generate a kernel BUG. Is this reproducible?
      Signed-off-by: Daniel Martinez <linux@danielsmartinez.com>
      Signed-off-by: Bruno Carneiro da Cunha <brunocarneirodacunha@usp.br>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: md5: fix potential overestimation of TCP option space · 9424e2e7
      Eric Dumazet authored
      Back in 2008, Adam Langley fixed the corner case of packets for flows
      having all of the following options: MD5, TS, SACK.
      
      Since MD5 needs 20 bytes, and TS needs 12 bytes, no sack block
      can be cooked from the remaining 8 bytes.
      
      tcp_established_options() correctly sets opts->num_sack_blocks
      to zero, but returns 36 instead of 32.
      
      This means TCP cooks packets with 4 extra bytes at the end
      of options, containing uninitialized bytes.
      
      Fixes: 33ad798c ("tcp: options clean up")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
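The 36-versus-32 arithmetic above can be checked with a small model. This is a sketch under assumptions: the constants are taken to match the kernel's TCP option length definitions (40 bytes of option space, 4 bytes of SACK base, 8 bytes per SACK block), and `established_options_size()` with its `fixed` switch is a hypothetical helper, not tcp_established_options() itself.

```c
#include <stdbool.h>

#define MAX_TCP_OPTION_SPACE      40
#define TCPOLEN_MD5SIG_ALIGNED    20
#define TCPOLEN_TSTAMP_ALIGNED    12
#define TCPOLEN_SACK_BASE_ALIGNED  4
#define TCPOLEN_SACK_PERBLOCK      8

/* Hypothetical model of the option-size computation.
 * fixed == false: always account for the SACK base, even with 0 blocks.
 * fixed == true:  account for SACK only if at least one block fits.
 */
static inline unsigned int established_options_size(bool md5, bool ts,
                                                    unsigned int eff_sacks,
                                                    bool fixed)
{
    unsigned int size = 0;

    if (md5)
        size += TCPOLEN_MD5SIG_ALIGNED;
    if (ts)
        size += TCPOLEN_TSTAMP_ALIGNED;

    if (eff_sacks) {
        unsigned int remaining = MAX_TCP_OPTION_SPACE - size;
        unsigned int blocks = (remaining - TCPOLEN_SACK_BASE_ALIGNED) /
                              TCPOLEN_SACK_PERBLOCK;

        if (blocks > eff_sacks)
            blocks = eff_sacks;
        if (!fixed || blocks)
            size += TCPOLEN_SACK_BASE_ALIGNED +
                    blocks * TCPOLEN_SACK_PERBLOCK;
    }
    return size;
}
```

With MD5 and TS both present, 8 bytes remain, no SACK block fits, and the unfixed model still adds the 4-byte SACK base to reach 36. Without MD5, three blocks fit and both variants agree on 40.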
    • Merge branch 'net-tc-indirect-block-relay' · 9a74542e
      David S. Miller authored
      John Hurley says:
      
      ====================
      Ensure egress un/bind are relayed with indirect blocks
      
      On register and unregister for indirect blocks, a command is called that
      sends a bind/unbind event to the registering driver. This command assumes
      that the bind to indirect block will be on ingress. However, drivers such
      as NFP have allowed binding to clsact qdiscs as well as ingress qdiscs
      from mainline Linux 5.2. A clsact qdisc binds to an ingress and an egress
      block.
      
      Rather than assuming that an indirect bind is always ingress, modify the
      function names to remove the ingress tag (patch 1). In cls_api, which is
      used by NFP to offload TC flower, generate bind/unbind message for both
      ingress and egress blocks on the event of indirectly
      registering/unregistering from that block. Doing so mimics the behaviour
      of both ingress and clsact qdiscs on initialise and destroy.
      
      This now ensures that drivers such as NFP receive the correct binder type
      for the indirect block registration.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: allow indirect blocks to bind to clsact in TC · 25a443f7
      John Hurley authored
      When a device is bound to a clsact qdisc, bind events are triggered to
      registered drivers for both ingress and egress. However, if a driver
      registers to such a device using the indirect block routines then it is
      assumed that it is only interested in ingress offload and so only replays
      ingress bind/unbind messages.
      
      The NFP driver supports the offload of some egress filters when
      registering to a block with qdisc of type clsact. However, on unregister,
      if the block is still active, it will not receive an unbind egress
      notification which can prevent proper cleanup of other registered
      callbacks.
      
      Modify the indirect block callback command in TC to send messages of
      ingress and/or egress bind depending on the qdisc in use. NFP currently
      supports egress offload for TC flower offload so the changes are only
      added to TC.
      
      Fixes: 4d12ba42 ("nfp: flower: allow offloading of matches on 'internal' ports")
      Signed-off-by: John Hurley <john.hurley@netronome.com>
      Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: core: rename indirect block ingress cb function · dbad3408
      John Hurley authored
      With indirect blocks, a driver can register for callbacks from a device
      that it does not 'own', for example, a tunnel device. When registering to
      or unregistering from a new device, a callback is triggered to generate
      a bind/unbind event. This, in turn, allows the driver to receive any
      existing rules or to properly clean up installed rules.
      
      When first added, it was assumed that all indirect block registrations
      would be for ingress offloads. However, the NFP driver can, in some
      instances, support clsact qdisc binds for egress offload.
      
      Change the name of the indirect block callback command in flow_offload to
      remove the 'ingress' identifier from it. While this does not change
      functionality, a follow-up patch will implement a more generic
      callback than the current one, which only supports ingress offload.
      
      Fixes: 4d12ba42 ("nfp: flower: allow offloading of matches on 'internal' ports")
      Signed-off-by: John Hurley <john.hurley@netronome.com>
      Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net-sysfs: Call dev_hold always in netdev_queue_add_kobject · e0b60903
      Jouni Hogander authored
      dev_hold() must always be called in netdev_queue_add_kobject().
      Otherwise the usage count drops below 0 if kobject_init_and_add()
      fails.
      
      Fixes: b8eb7183 ("net-sysfs: Fix reference count leak in rx|netdev_queue_add_kobject")
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: David Miller <davem@davemloft.net>
      Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
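The reference-count imbalance can be illustrated with a toy model. Everything here is hypothetical (`struct fake_dev`, `toy_dev_hold()`, `toy_queue_release()`, `queue_add_kobject()` and its `hold_first` switch); the sketch only assumes that the error path releases the kobject and that the release callback drops one device reference.

```c
#include <stdbool.h>

/* Toy device with a bare reference counter (hypothetical). */
struct fake_dev {
    int refcnt;
};

static inline void toy_dev_hold(struct fake_dev *d) { d->refcnt++; }
static inline void toy_dev_put(struct fake_dev *d)  { d->refcnt--; }

/* Stand-in for the kobject release callback: drops the device reference. */
static inline void toy_queue_release(struct fake_dev *d)
{
    toy_dev_put(d);
}

/* hold_first == false: take the reference only after a successful init.
 * hold_first == true:  take the reference before init, so the release
 *                      on the error path has something to drop.
 */
static inline int queue_add_kobject(struct fake_dev *d, bool init_fails,
                                    bool hold_first)
{
    if (hold_first)
        toy_dev_hold(d);

    if (init_fails) {
        toy_queue_release(d); /* error path releases the kobject */
        return -1;
    }

    if (!hold_first)
        toy_dev_hold(d);
    return 0;
}
```

Starting from a count of 0, a failed init without the early hold leaves the counter at -1, while taking the hold first leaves it balanced at 0.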
    • net: dsa: fix flow dissection on Tx path · 8bef0af0
      Alexander Lobakin authored
      Commit 43e66528 ("net-next: dsa: fix flow dissection") added an
      ability to override protocol and network offset during flow dissection
      for DSA-enabled devices (i.e. controllers shipped as switch CPU ports)
      in order to fix skb hashing for RPS on Rx path.
      
      However, skb_hash() and added part of code can be invoked not only on
      Rx, but also on Tx path if we have a multi-queued device and:
       - kernel is running on UP system or
       - XPS is not configured.
      
      The call stack in these two cases will be like: dev_queue_xmit() ->
      __dev_queue_xmit() -> netdev_core_pick_tx() -> netdev_pick_tx() ->
      skb_tx_hash() -> skb_get_hash().
      
      The problem is that skbs queued for Tx have both network offset and
      correct protocol already set up even after inserting a CPU tag by DSA
      tagger, so calling tag_ops->flow_dissect() on this path actually only
      breaks flow dissection and hashing.
      
      This can be observed by adding debug prints just before and right after
      tag_ops->flow_dissect() call to the related block of code:
      
      Before the patch:
      
      Rx path (RPS):
      
      [   19.240001] Rx: proto: 0x00f8, nhoff: 0	/* ETH_P_XDSA */
      [   19.244271] tag_ops->flow_dissect()
      [   19.247811] Rx: proto: 0x0800, nhoff: 8	/* ETH_P_IP */
      
      [   19.215435] Rx: proto: 0x00f8, nhoff: 0	/* ETH_P_XDSA */
      [   19.219746] tag_ops->flow_dissect()
      [   19.223241] Rx: proto: 0x0806, nhoff: 8	/* ETH_P_ARP */
      
      [   18.654057] Rx: proto: 0x00f8, nhoff: 0	/* ETH_P_XDSA */
      [   18.658332] tag_ops->flow_dissect()
      [   18.661826] Rx: proto: 0x8100, nhoff: 8	/* ETH_P_8021Q */
      
      Tx path (UP system):
      
      [   18.759560] Tx: proto: 0x0800, nhoff: 26	/* ETH_P_IP */
      [   18.763933] tag_ops->flow_dissect()
      [   18.767485] Tx: proto: 0x920b, nhoff: 34	/* junk */
      
      [   22.800020] Tx: proto: 0x0806, nhoff: 26	/* ETH_P_ARP */
      [   22.804392] tag_ops->flow_dissect()
      [   22.807921] Tx: proto: 0x920b, nhoff: 34	/* junk */
      
      [   16.898342] Tx: proto: 0x86dd, nhoff: 26	/* ETH_P_IPV6 */
      [   16.902705] tag_ops->flow_dissect()
      [   16.906227] Tx: proto: 0x920b, nhoff: 34	/* junk */
      
      After:
      
      Rx path (RPS):
      
      [   16.520993] Rx: proto: 0x00f8, nhoff: 0	/* ETH_P_XDSA */
      [   16.525260] tag_ops->flow_dissect()
      [   16.528808] Rx: proto: 0x0800, nhoff: 8	/* ETH_P_IP */
      
      [   15.484807] Rx: proto: 0x00f8, nhoff: 0	/* ETH_P_XDSA */
      [   15.490417] tag_ops->flow_dissect()
      [   15.495223] Rx: proto: 0x0806, nhoff: 8	/* ETH_P_ARP */
      
      [   17.134621] Rx: proto: 0x00f8, nhoff: 0	/* ETH_P_XDSA */
      [   17.138895] tag_ops->flow_dissect()
      [   17.142388] Rx: proto: 0x8100, nhoff: 8	/* ETH_P_8021Q */
      
      Tx path (UP system):
      
      [   15.499558] Tx: proto: 0x0800, nhoff: 26	/* ETH_P_IP */
      
      [   20.664689] Tx: proto: 0x0806, nhoff: 26	/* ETH_P_ARP */
      
      [   18.565782] Tx: proto: 0x86dd, nhoff: 26	/* ETH_P_IPV6 */
      
      In order to fix that we can add the check 'proto == htons(ETH_P_XDSA)'
      to prevent code from calling tag_ops->flow_dissect() on Tx.
      I also decided to initialize the 'offset' variable so tagger callbacks
      can now safely leave it untouched without causing chaos.
      
      Fixes: 43e66528 ("net-next: dsa: fix flow dissection")
      Signed-off-by: Alexander Lobakin <alobakin@dlink.ru>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
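The 'proto == htons(ETH_P_XDSA)' guard can be sketched as a standalone predicate. The function name `should_call_tagger_dissect()` is hypothetical, and the EtherType values are assumed to match the kernel's definitions (0x00F8 for ETH_P_XDSA, as seen in the debug output above).

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>

#define ETH_P_XDSA 0x00F8 /* assumed to match the kernel definition */
#define ETH_P_IP   0x0800

/* Hypothetical guard: run the tagger's flow_dissect() only while the
 * protocol (in network byte order) is still the DSA placeholder, which
 * is the case on Rx but not on Tx.
 */
static inline bool should_call_tagger_dissect(uint16_t proto_be)
{
    return proto_be == htons(ETH_P_XDSA);
}
```

On Rx the skb protocol is ETH_P_XDSA, so the tagger callback runs. On Tx the protocol is already ETH_P_IP, ETH_P_ARP, and so on, so the callback is skipped and the values are left intact.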
    • net/tls: Fix return values to avoid ENOTSUPP · 4a5cdc60
      Valentin Vidic authored
      ENOTSUPP is not available in userspace, for example:
      
        setsockopt failed, 524, Unknown error 524
      Signed-off-by: Valentin Vidic <vvidic@valentin-vidic.from.hr>
      Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
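The userspace symptom can be reproduced with strerror(). This sketch assumes that the replacement error code is EOPNOTSUPP, which the message above does not state explicitly. `KERNEL_ENOTSUPP` and `has_errno_text()` are hypothetical names, and the exact text returned for an unknown code depends on the C library.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Kernel-internal value seen in the message above; userspace errno.h
 * has no name for it.
 */
#define KERNEL_ENOTSUPP 524

/* True if the C library has a real description for this error code.
 * Relies on the "Unknown error N" wording shown above for unknown codes.
 */
static inline bool has_errno_text(int err)
{
    return strstr(strerror(err), "Unknown error") == NULL;
}
```

A program can call `has_errno_text(EOPNOTSUPP)` and `has_errno_text(KERNEL_ENOTSUPP)` to compare how the two codes are reported.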
  2. 06 Dec, 2019 7 commits
    • net: avoid an indirect call in ____sys_recvmsg() · 1af66221
      Eric Dumazet authored
      CONFIG_RETPOLINE=y made indirect calls expensive.
      
      gcc seems to add an indirect call in ____sys_recvmsg().
      
      Rewriting the code slightly makes sure to avoid this indirection.
      
      Alternative would be to not call sock_recvmsg() and instead
      use security_socket_recvmsg() and sock_recvmsg_nosec(),
      but this is less readable IMO.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: David Laight <David.Laight@aculab.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • phy: mdio-thunder: add missed pci_release_regions in remove · 462f8554
      Chuhong Yuan authored
      The driver forgets to call pci_release_regions() in remove, as is
      done on probe failure.
      Add the missing call to fix it.
      Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: fix ordering of tipc module init and exit routine · 9cf1cd8e
      Taehee Yoo authored
      In order to set/get/dump, the tipc uses the generic netlink
      infrastructure. So, when tipc module is inserted, init function
      calls genl_register_family().
      After genl_register_family(), set/get/dump commands are immediately
      allowed and these callbacks internally use the net_generic.
      net_generic is allocated by register_pernet_device() but this
      is called after genl_register_family() in the __init function.
      So, these callbacks would use un-initialized net_generic.
      
      Test commands:
          #SHELL1
          while :
          do
              modprobe tipc
              modprobe -rv tipc
          done
      
          #SHELL2
          while :
          do
              tipc link list
          done
      
      Splat looks like:
      [   59.616322][ T2788] kasan: CONFIG_KASAN_INLINE enabled
      [   59.617234][ T2788] kasan: GPF could be caused by NULL-ptr deref or user memory access
      [   59.618398][ T2788] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
      [   59.619389][ T2788] CPU: 3 PID: 2788 Comm: tipc Not tainted 5.4.0+ #194
      [   59.620231][ T2788] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [   59.621428][ T2788] RIP: 0010:tipc_bcast_get_broadcast_mode+0x131/0x310 [tipc]
      [   59.622379][ T2788] Code: c7 c6 ef 8b 38 c0 65 ff 0d 84 83 c9 3f e8 d7 a5 f2 e3 48 8d bb 38 11 00 00 48 b8 00 00 00 00
      [   59.622550][ T2780] NET: Registered protocol family 30
      [   59.624627][ T2788] RSP: 0018:ffff88804b09f578 EFLAGS: 00010202
      [   59.624630][ T2788] RAX: dffffc0000000000 RBX: 0000000000000011 RCX: 000000008bc66907
      [   59.624631][ T2788] RDX: 0000000000000229 RSI: 000000004b3cf4cc RDI: 0000000000001149
      [   59.624633][ T2788] RBP: ffff88804b09f588 R08: 0000000000000003 R09: fffffbfff4fb3df1
      [   59.624635][ T2788] R10: fffffbfff50318f8 R11: ffff888066cadc18 R12: ffffffffa6cc2f40
      [   59.624637][ T2788] R13: 1ffff11009613eba R14: ffff8880662e9328 R15: ffff8880662e9328
      [   59.624639][ T2788] FS:  00007f57d8f7b740(0000) GS:ffff88806cc00000(0000) knlGS:0000000000000000
      [   59.624645][ T2788] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   59.625875][ T2780] tipc: Started in single node mode
      [   59.626128][ T2788] CR2: 00007f57d887a8c0 CR3: 000000004b140002 CR4: 00000000000606e0
      [   59.633991][ T2788] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   59.635195][ T2788] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [   59.636478][ T2788] Call Trace:
      [   59.637025][ T2788]  tipc_nl_add_bc_link+0x179/0x1470 [tipc]
      [   59.638219][ T2788]  ? lock_downgrade+0x6e0/0x6e0
      [   59.638923][ T2788]  ? __tipc_nl_add_link+0xf90/0xf90 [tipc]
      [   59.639533][ T2788]  ? tipc_nl_node_dump_link+0x318/0xa50 [tipc]
      [   59.640160][ T2788]  ? mutex_lock_io_nested+0x1380/0x1380
      [   59.640746][ T2788]  tipc_nl_node_dump_link+0x4fd/0xa50 [tipc]
      [   59.641356][ T2788]  ? tipc_nl_node_reset_link_stats+0x340/0x340 [tipc]
      [   59.642088][ T2788]  ? __skb_ext_del+0x270/0x270
      [   59.642594][ T2788]  genl_lock_dumpit+0x85/0xb0
      [   59.643050][ T2788]  netlink_dump+0x49c/0xed0
      [   59.643529][ T2788]  ? __netlink_sendskb+0xc0/0xc0
      [   59.644044][ T2788]  ? __netlink_dump_start+0x190/0x800
      [   59.644617][ T2788]  ? __mutex_unlock_slowpath+0xd0/0x670
      [   59.645177][ T2788]  __netlink_dump_start+0x5a0/0x800
      [   59.645692][ T2788]  genl_rcv_msg+0xa75/0xe90
      [   59.646144][ T2788]  ? __lock_acquire+0xdfe/0x3de0
      [   59.646692][ T2788]  ? genl_family_rcv_msg_attrs_parse+0x320/0x320
      [   59.647340][ T2788]  ? genl_lock_dumpit+0xb0/0xb0
      [   59.647821][ T2788]  ? genl_unlock+0x20/0x20
      [   59.648290][ T2788]  ? genl_parallel_done+0xe0/0xe0
      [   59.648787][ T2788]  ? find_held_lock+0x39/0x1d0
      [   59.649276][ T2788]  ? genl_rcv+0x15/0x40
      [   59.649722][ T2788]  ? lock_contended+0xcd0/0xcd0
      [   59.650296][ T2788]  netlink_rcv_skb+0x121/0x350
      [   59.650828][ T2788]  ? genl_family_rcv_msg_attrs_parse+0x320/0x320
      [   59.651491][ T2788]  ? netlink_ack+0x940/0x940
      [   59.651953][ T2788]  ? lock_acquire+0x164/0x3b0
      [   59.652449][ T2788]  genl_rcv+0x24/0x40
      [   59.652841][ T2788]  netlink_unicast+0x421/0x600
      [ ... ]
      
      Fixes: 7e436905 ("tipc: fix a slab object leak")
      Fixes: a62fbcce ("tipc: make subscriber server support net namespace")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mqprio: Fix out-of-bounds access in mqprio_dump · 9f104c77
      Vladyslav Tarasiuk authored
      When a user runs a command like
      tc qdisc add dev eth1 root mqprio
      a KASAN stack-out-of-bounds warning is emitted.
      Currently, the NLA_ALIGN macro used in mqprio_dump passes too large a
      buffer size as an argument to nla_put and to memcpy down the call stack.
      The flow looks like this:
      1. nla_put expects exact object size as an argument;
      2. Later it provides this size to memcpy;
      3. To calculate correct padding for SKB, nla_put applies NLA_ALIGN
         macro itself.
      
      Therefore, NLA_ALIGN should not be applied to the nla_put parameter.
      Otherwise it will lead to out-of-bounds memory access in memcpy.
      
      Fixes: 4e8b86c0 ("mqprio: Introduce new hardware offload mode and shaper in mqprio")
      Signed-off-by: Vladyslav Tarasiuk <vladyslavt@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
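The three-step flow above can be modelled in a few lines. This is a sketch, not the netlink implementation: `NLA_ALIGN` is re-defined locally with the usual 4-byte alignment, and `put_payload()` is a hypothetical stand-in for the copy-then-pad behaviour attributed to nla_put.

```c
#include <stddef.h>
#include <string.h>

#define NLA_ALIGNTO 4
#define NLA_ALIGN(len) (((len) + NLA_ALIGNTO - 1) & ~(NLA_ALIGNTO - 1))

/* Hypothetical stand-in for nla_put(): copies exactly 'len' bytes from
 * 'src', then zero-pads the destination up to the aligned size.
 * Returns the number of bytes written to 'dst'.
 */
static inline size_t put_payload(unsigned char *dst, const void *src,
                                 size_t len)
{
    size_t padded = NLA_ALIGN(len);

    memcpy(dst, src, len);              /* reads 'len' bytes from src */
    memset(dst + len, 0, padded - len); /* padding is added here, once */
    return padded;
}
```

For a 6-byte object the caller should pass 6. Passing NLA_ALIGN(6), which is 8, would make the memcpy read 2 bytes past the end of the source object, which is the out-of-bounds access described above.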
    • net: stmmac: reset Tx desc base address before restarting Tx · f421031e
      Jongsung Kim authored
      Refer to the databook of DesignWare Cores Ethernet MAC Universal:
      
      6.2.1.5 Register 4 (Transmit Descriptor List Address Register)
      
      If this register is not changed when the ST bit is set to 0, then
      the DMA takes the descriptor address where it was stopped earlier.
      
      stmmac_tx_err() zeroes the indices to the Tx descriptors, but does
      not reset the HW current Tx descriptor address. To fix the
      inconsistency, the base address of the Tx descriptors should be
      rewritten before restarting Tx.
      Signed-off-by: Jongsung Kim <neidhard.kim@lge.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • enetc: disable EEE autoneg by default · a6a10d45
      Yangbo Lu authored
      The EEE support has not been enabled on ENETC, but it may connect
      to a PHY which supports EEE and advertises EEE by default, while
      its link partner also advertises EEE. If this happens, the PHY enters
      low power mode when the traffic rate is low and causes packet loss.
      This patch disables EEE advertisement by default for any PHY that
      ENETC connects to, to prevent the above unwanted outcome.
      Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
      Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · ae72555b
      David S. Miller authored
      Alexei Starovoitov says:
      
      ====================
      pull-request: bpf 2019-12-05
      
      The following pull-request contains BPF updates for your *net* tree.
      
      We've added 6 non-merge commits during the last 1 day(s) which contain
      a total of 14 files changed, 116 insertions(+), 37 deletions(-).
      
      The main changes are:
      
      1) three selftests fixes, from Stanislav.
      
      2) one samples fix, from Jesper.
      
      3) one verifier fix, from Yonghong.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 05 Dec, 2019 22 commits
    • ppp: fix out-of-bounds access in bpf_prog_create() · 0033b34a
      Eric Biggers authored
      sock_fprog_kern::len is in units of struct sock_filter, not bytes.
      
      Fixes: 3e859adf ("compat_ioctl: unify copy-in of ppp filters")
      Reported-by: syzbot+eb853b51b10f1befa0b7@syzkaller.appspotmail.com
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Reviewed-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
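The difference between an instruction count and a byte count can be made concrete with a small sketch. `struct sock_filter_model` is a local copy assumed to have the same layout as the classic BPF instruction, and `filter_bytes()` is a hypothetical helper. The driver code itself is not shown.

```c
#include <stddef.h>
#include <stdint.h>

/* Local model of a classic BPF instruction (assumed layout). */
struct sock_filter_model {
    uint16_t code; /* opcode */
    uint8_t  jt;   /* jump offset if true */
    uint8_t  jf;   /* jump offset if false */
    uint32_t k;    /* generic operand */
};

/* Bytes occupied by a program of 'n_insns' instructions. The 'len'
 * field of a filter program holds n_insns, not this byte count.
 */
static inline size_t filter_bytes(unsigned short n_insns)
{
    return (size_t)n_insns * sizeof(struct sock_filter_model);
}
```

A 4-instruction filter occupies 32 bytes. Storing 32 in a length field that counts instructions would describe a program eight times larger than the buffer actually is.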
    • Merge branch 'hns3-fixes' · a116f4e2
      David S. Miller authored
      Huazhong Tan says:
      
      ====================
      net: hns3: fixes for -net
      
      This patchset includes misc fixes for the HNS3 ethernet driver.
      
      [patch 1/3] fixes a TX queue not restarted problem.
      
      [patch 2/3] fixes a use-after-free issue.
      
      [patch 3/3] fixes a VF ID issue for setting VF VLAN.
      
      change log:
      V1->V2: keeps 'ring' as parameter in hns3_nic_maybe_stop_tx()
      	in [patch 1/3], suggested by David.
      	rewrites [patch 2/3]'s commit log to make it easier
      	to understand, suggested by David.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hns3: fix VF ID issue for setting VF VLAN · 1c985508
      Jian Shen authored
      Previously, when setting a VF VLAN with the command "ip link set <pf name>
      vf <vf id> vlan <vlan id>", VF ID 0 was incorrectly handled as the PF,
      when it should be the first VF. This patch fixes it.
      
      Fixes: 21e043cd ("net: hns3: fix set port based VLAN for PF")
      Signed-off-by: Jian Shen <shenjian15@huawei.com>
      Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hns3: fix a use after free problem in hns3_nic_maybe_stop_tx() · d1a37ded
      Yunsheng Lin authored
      Currently, hns3_nic_maybe_stop_tx() uses skb_copy() to linearize an
      SKB if the BD num required by the SKB does not meet the hardware
      limitation. It linearizes the SKB by allocating a new linearized SKB
      and freeing the old SKB. If hns3_nic_maybe_stop_tx() returns -EBUSY
      because there is not enough space in the ring to send the linearized
      SKB to hardware, sch_direct_xmit() still holds a reference to the old
      SKB and tries to retransmit it when dev_hard_start_xmit() returns
      TX_BUSY, which may cause a use-after-free problem.
      
      This patch fixes it by using __skb_linearize() to linearize the
      SKB in hns3_nic_maybe_stop_tx().
      
      Fixes: 51e8439f ("net: hns3: add 8 BD limit for tx flow")
      Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
      Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hns3: fix for TX queue not restarted problem · 2a597eff
      Yunsheng Lin authored
      There is a timing window between the ring_space check and
      netif_stop_subqueue when transmitting an SKB, and the TX BD
      cleaning may be executed during that window, which may
      leave the TX queue stopped and never restarted.
      
      This patch fixes it by rechecking the ring_space after
      netif_stop_subqueue to make sure TX queue is restarted.
      
      Also, ring->next_to_clean is updated even when pkts is
      zero, because all the TX BDs cleaned may be non-SKB, so it
      needs to check whether the TX queue needs to be restarted.
      
      Fixes: 76ad4f0e ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC")
      Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
      Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
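The stop-then-recheck pattern described in this message can be modelled single-threaded by running the cleaner inside the race window. All names here (`struct toy_ring`, `clean_in_window()`, `maybe_stop_tx()` and its `recheck` switch) are hypothetical. The real driver also needs memory barriers, which this sketch leaves out.

```c
#include <stdbool.h>

/* Toy TX ring (hypothetical). */
struct toy_ring {
    int  space;   /* free descriptors */
    bool stopped; /* queue stopped flag */
};

/* Simulates the completion path: frees descriptors and wakes the queue
 * only if it is already stopped at that moment.
 */
static inline void clean_in_window(struct toy_ring *r, int freed)
{
    r->space += freed;
    if (r->stopped)
        r->stopped = false;
}

/* Returns true if the packet can be queued. 'freed_in_window' is the
 * number of descriptors the cleaner frees between the space check and
 * the queue stop. 'recheck' enables the second space check after the stop.
 */
static inline bool maybe_stop_tx(struct toy_ring *r, int needed,
                                 int freed_in_window, bool recheck)
{
    if (r->space >= needed)
        return true;

    clean_in_window(r, freed_in_window); /* cleaner wins the race */
    r->stopped = true;                   /* stop after the wake check */

    if (recheck && r->space >= needed) {
        r->stopped = false;              /* space appeared: restart */
        return true;
    }
    return false;
}
```

Without the recheck, the queue ends up stopped even though the ring has free space and no further completion will wake it. With the recheck, the queue is restarted immediately.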
    • net: ethernet: ti: cpsw_switchdev: fix unmet direct dependencies detected for NET_SWITCHDEV · aacf6578
      Grygorii Strashko authored
      Replace "select NET_SWITCHDEV" with "depends on NET_SWITCHDEV" to fix a
      Kconfig warning with CONFIG_COMPILE_TEST=y
      
      WARNING: unmet direct dependencies detected for NET_SWITCHDEV
        Depends on [n]: NET [=y] && INET [=n]
        Selected by [y]:
        - TI_CPSW_SWITCHDEV [=y] && NETDEVICES [=y] && ETHERNET [=y] && NET_VENDOR_TI [=y] && (ARCH_DAVINCI || ARCH_OMAP2PLUS || COMPILE_TEST [=y])
      
      because TI_CPSW_SWITCHDEV blindly selects NET_SWITCHDEV even though
      INET is not set/enabled, while NET_SWITCHDEV depends on INET.
      Reported-by: Randy Dunlap <rdunlap@infradead.org>
      Fixes: ed3525ed ("net: ethernet: ti: introduce cpsw switchdev based driver part 1 - dual-emac")
      Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
      Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
      Signed-off-by: David S. Miller <davem@davemloft.net>
      aacf6578
    • Parav Pandit's avatar
      net/mlx5e: E-switch, Fix Ingress ACL groups in switchdev mode for prio tag · b7826076
      Parav Pandit authored
      In cited commit, when prio tag mode is enabled, FTE creation fails
      due to missing group with valid match criteria.
      
      Hence,
      (a) create prio tag group metadata_prio_tag_grp when prio tag is
      enabled with match criteria for vlan push FTE.
      (b) Rename metadata_grp to metadata_allmatch_grp to reflect its purpose.
      
      Also when priority tag is enabled, delete metadata settings after
      deleting ingress rules, which are using it.
      
      Tidy up the rest of the ingress config code by removing unnecessary labels.
      
      Fixes: 10652f39 ("net/mlx5: Refactor ingress acl configuration")
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Reviewed-by: Eli Britstein <elibr@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      b7826076
    • Aya Levin's avatar
      net/mlx5e: ethtool, Fix analysis of speed setting · 3d7cadae
      Aya Levin authored
      When setting speed to 100G via ethtool (AN is set to off), only 25G*4 is
      configured while the user, who has an advanced HW which supports
      extended PTYS, expects also 50G*2 to be configured.
      With this patch, when extended PTYS mode is available, configure
      PTYS via extended fields.
      
      Fixes: 4b95840a ("net/mlx5e: Fix matching of speed to PRM link modes")
      Signed-off-by: Aya Levin <ayal@mellanox.com>
      Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      3d7cadae
    • Aya Levin's avatar
      net/mlx5e: Fix translation of link mode into speed · 6d485e5e
      Aya Levin authored
      Add a missing value in translation of PTYS ext_eth_proto_oper to its
      corresponding speed. When ext_eth_proto_oper bit 10 is set, ethtool
      shows unknown speed. With this fix, ethtool shows speed is 100G as
      expected.
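A minimal sketch of the kind of bit-to-speed lookup this describes, assuming a table indexed by bit position. Only the "bit 10 means 100G" entry comes from the text above; the table names, `ext_proto_to_speed()` and the use of 0 for "unknown" are hypothetical, and the real driver table has many more entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PROTO_BITS 32
#define SPEED_UNKNOWN_MBPS 0u /* placeholder for "unknown speed" */

/* Before the fix (in this model): no entry for bit 10. */
static const uint32_t speeds_before[PROTO_BITS] = { 0 };

/* After the fix: bit 10 maps to 100G, expressed in Mb/s. */
static const uint32_t speeds_after[PROTO_BITS] = { [10] = 100000 };

/* Return the speed of the first set bit that has a table entry. */
static uint32_t ext_proto_to_speed(uint32_t ext_eth_proto_oper,
				   const uint32_t *table)
{
	assert(table != NULL);
	for (size_t bit = 0; bit < PROTO_BITS; bit++) {
		if ((ext_eth_proto_oper & (UINT32_C(1) << bit)) &&
		    table[bit] != 0)
			return table[bit];
	}
	return SPEED_UNKNOWN_MBPS;
}
```

With the entry missing, a link reporting only bit 10 falls through to the unknown value, which matches the symptom described for ethtool.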
      
      Fixes: a08b4ed1 ("net/mlx5: Add support to ext_* fields introduced in Port Type and Speed register")
      Signed-off-by: Aya Levin <ayal@mellanox.com>
      Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      6d485e5e
    • Roi Dayan's avatar
      net/mlx5e: Fix free peer_flow when refcount is 0 · eb252c3a
      Roi Dayan authored
      The neigh update flow may have taken a refcount on the peer flow, so
      the peer flow cannot always be released even when the parent flow is
      being freed.
      
      Fixes: 5a7e5bcb ("net/mlx5e: Extend tc flow struct with reference counter")
      Signed-off-by: Roi Dayan <roid@mellanox.com>
      Reviewed-by: Eli Britstein <elibr@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      eb252c3a
    • Roi Dayan's avatar
      net/mlx5e: Fix freeing flow with kfree() and not kvfree() · a23dae79
      Roi Dayan authored
      Flows are allocated with kzalloc() so free with kfree().
      
      Fixes: 04de7dda ("net/mlx5e: Infrastructure for duplicated offloading of TC flows")
      Signed-off-by: Roi Dayan <roid@mellanox.com>
      Reviewed-by: Eli Britstein <elibr@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      a23dae79
    • Eran Ben Elisha's avatar
      net/mlx5e: Fix SFF 8472 eeprom length · c431f859
      Eran Ben Elisha authored
      SFF 8472 eeprom length is 512 bytes. Fix module info return value to
      support 512 bytes read.
      
      Fixes: ace329f4 ("net/mlx5e: ethtool, Remove unsupported SFP EEPROM high pages query")
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Reviewed-by: Aya Levin <ayal@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      c431f859
    • Huy Nguyen's avatar
      net/mlx5e: Query global pause state before setting prio2buffer · 73e65516
      Huy Nguyen authored
      When the user changes prio2buffer mapping while global pause is
      enabled, mlx5 driver incorrectly sets all active buffers
      (buffer that has at least one priority mapped) to lossy.
      
      Solution:
      If global pause is enabled, set all the active buffers to lossless
      in prio2buffer command.
      Also, add error message when buffer size is not enough to meet
      xoff threshold.
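The lossy/lossless decision can be sketched as below. This is a simplified model and not the mlx5 implementation: `set_buffer_lossy()`, the fixed sizes of 8 priorities and 8 buffers, and the per-priority `pfc_en` bitmask are assumptions made for illustration; the text above only states the global pause rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_PRIO 8
#define NUM_BUFFERS 8

/*
 * Decide which buffers are lossy. A buffer is "active" when at least one
 * priority maps to it. Active buffers become lossless when global pause is
 * enabled, or (assumption) when a priority mapped to them has PFC enabled.
 * Buffers with no priority mapped stay lossy.
 */
static void set_buffer_lossy(const uint8_t prio2buffer[NUM_PRIO],
			     bool global_pause, uint8_t pfc_en,
			     bool lossy[NUM_BUFFERS])
{
	for (int b = 0; b < NUM_BUFFERS; b++)
		lossy[b] = true;

	for (int prio = 0; prio < NUM_PRIO; prio++) {
		uint8_t buf = prio2buffer[prio];

		assert(buf < NUM_BUFFERS);
		if (global_pause || (pfc_en & (1u << prio)))
			lossy[buf] = false;
	}
}
```

The bug described above corresponds to ignoring `global_pause` in this decision, which marks every active buffer lossy whenever no per-priority flow control is configured.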
      
      Fixes: 0696d608 ("net/mlx5e: Receive buffer configuration")
      Signed-off-by: Huy Nguyen <huyn@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      73e65516
    • Eran Ben Elisha's avatar
      net/mlx5e: Fix TXQ indices to be sequential · c55d8b10
      Eran Ben Elisha authored
      Cited patch changed (channel index, tc) => (TXQ index) mapping to be a
      static one, in order to keep indices consistent when changing number of
      channels or TCs.
      
      For 32 channels (OOB) and 8 TCs, real num of TXQs is 256.
      When reducing the amount of channels to 8, the real num of TXQs will be
      changed to 64.
      This indexing method is buggy:
      - Channel #0, TC 3, the TXQ index is 96.
      - Index 8 is not valid, as there is no such TXQ from driver perspective
        (As it represents channel #8, TC 0, which is not valid with the above
        configuration).
      
      As part of the driver's select queue, it calls netdev_pick_tx, which
      returns an index in the range of the real number of TXQs. Depending on
      the return value, with the examples above, the driver could have returned
      an index larger than the real number of TX queues, or crashed the kernel
      as it tried to read an invalid address of an SQ which was not allocated.
      
      Fix that by allocating sequential TXQ indices, and hold a new mapping
      between (channel index, tc) => (real TXQ index). This mapping will be
      updated as part of priv channels activation, and is used in
      mlx5e_select_queue to find the selected queue index.
      
      The existing indices mapping (channel_tc2txq) is no longer needed, as it
      is used only for statistics structures and can be calculated on run time.
      Delete its definition and updates.
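The index arithmetic in the example above can be reproduced with a small sketch. The static formula `ch + tc * max_channels` is inferred from the numbers quoted (channel #0, TC 3 gives 96 with 32 channels); the sequential formula `ch + tc * num_channels` is an assumption about one way to lay out contiguous indices, and all function names are hypothetical.

```c
#include <assert.h>

/* Static mapping: depends on the maximum channel count, not the active one. */
static int txq_static_index(int ch, int tc, int max_channels)
{
	assert(ch >= 0 && tc >= 0 && max_channels > 0);
	return ch + tc * max_channels;
}

/* Sequential mapping: indices always fall in [0, num_channels * num_tc). */
static int txq_sequential_index(int ch, int tc, int num_channels)
{
	assert(ch >= 0 && ch < num_channels && tc >= 0);
	return ch + tc * num_channels;
}

/* Real number of TX queues for the active configuration. */
static int real_num_txqs(int num_channels, int num_tc)
{
	return num_channels * num_tc;
}
```

With 8 active channels and 8 TCs there are 64 real TXQs, yet the static mapping still yields 96 for channel #0, TC 3, which is out of range. The sequential mapping yields 24 for the same pair, and its largest index (channel #7, TC 7) is 63.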
      
      Fixes: 8bfaf07f ("net/mlx5e: Present SW stats when state is not opened")
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      c55d8b10
    • David S. Miller's avatar
      Merge branch 's390-fixes' · b8744052
      David S. Miller authored
      Julian Wiedmann says:
      
      ====================
      s390/qeth: fixes 2019-12-05
      
      please apply the following fixes to your net tree.
      
      The first two patches target the RX data path, the third fixes a memory
      leak when shutting down a qeth device.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b8744052
    • Julian Wiedmann's avatar
      s390/qeth: fix dangling IO buffers after halt/clear · f9e50b02
      Julian Wiedmann authored
      The cio layer's intparm logic does not align itself well with how qeth
      manages cmd IOs. When an active IO gets terminated via halt/clear, the
      corresponding IRQ's intparm does not reflect the cmd buffer but rather
      the intparm that was passed to ccw_device_halt() / ccw_device_clear().
      This behaviour was recently clarified in
      commit b91d9e67 ("s390/cio: fix intparm documentation").
      
      As a result, qeth_irq() currently doesn't cancel a cmd that was
      terminated via halt/clear. This primarily causes us to leak
      card->read_cmd after the qeth device is removed, since our IO path still
      holds a refcount for this cmd.
      
      For qeth this means that we need to keep track of which IO is pending on
      a device ('active_cmd'), and use this as the intparm when calling
      halt/clear. Otherwise qeth_irq() can't match the subsequent IRQ to its
      cmd buffer.
      Since we now keep track of the _expected_ intparm, we can also detect
      any mismatch; this would constitute a bug somewhere in the lower layers.
      In this case cancel the active cmd - we effectively "lost" the IRQ and
      should not expect any further notification for this IO.
      
      Fixes: 40554895 ("s390/qeth: add support for dynamically allocated cmds")
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f9e50b02
    • Julian Wiedmann's avatar
      s390/qeth: ensure linear access to packet headers · f677fcb9
      Julian Wiedmann authored
      When the RX path builds non-linear skbs, the packet headers can
      currently spill over into page fragments. Depending on the packet type
      and what fields we need to access in the headers, this could cause us
      to go past the end of skb->data.
      
      So for non-linear packets, copy precisely the length of the necessary
      headers ('linear_len') into skb->data.
      And don't copy more, upper-level protocols will peel whatever additional
      packet headers they need.
      
      Fixes: 4a71df50 ("qeth: new qeth device driver")
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f677fcb9
    • Julian Wiedmann's avatar
      s390/qeth: guard against runt packets · 5b55633f
      Julian Wiedmann authored
      Depending on a packet's type, the RX path needs to access fields in the
      packet headers and thus requires a minimum packet length.
      Enforce this length when building the skb.
      
      On the other hand a single runt packet is no reason to drop the whole
      RX buffer. So just skip it, and continue processing on the next packet.
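The skip-and-continue behaviour might look roughly like the sketch below. It is not the qeth RX path: `struct rx_pkt`, `min_len_for()` and `process_rx_buffer()` are hypothetical, and the minimum lengths (14 bytes for an Ethernet header, 20 bytes as a stand-in for a layer-3 header) are illustrative values only.

```c
#include <assert.h>
#include <stddef.h>

enum pkt_type { PKT_LAYER2, PKT_LAYER3 };

/* Simplified descriptor of one packet inside an RX buffer. */
struct rx_pkt {
	enum pkt_type type;
	size_t len;
};

/* Hypothetical per-type minimum length needed to read the headers. */
static size_t min_len_for(enum pkt_type type)
{
	return type == PKT_LAYER2 ? 14 : 20;
}

/*
 * Walk all packets in the buffer. A runt packet is counted and skipped;
 * processing continues with the next packet rather than dropping the
 * whole buffer. Returns the number of packets that would be delivered.
 */
static size_t process_rx_buffer(const struct rx_pkt *pkts, size_t n,
				size_t *skipped)
{
	size_t delivered = 0;

	assert(skipped != NULL);
	*skipped = 0;
	for (size_t i = 0; i < n; i++) {
		if (pkts[i].len < min_len_for(pkts[i].type)) {
			(*skipped)++;
			continue;
		}
		delivered++;
	}
	return delivered;
}
```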
      
      Fixes: 4a71df50 ("qeth: new qeth device driver")
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5b55633f
    • Mian Yousaf Kaukab's avatar
      net: thunderx: start phy before starting autonegotiation · a350d2e7
      Mian Yousaf Kaukab authored
      Since commit 2b3e88ea ("net: phy: improve phy state checking")
      phy_start_aneg() expects phy state to be >= PHY_UP. Call phy_start()
      before calling phy_start_aneg() during probe so that autonegotiation
      is initiated.
      
      As phy_start() takes care of calling phy_start_aneg(), drop the explicit
      call to phy_start_aneg().
      
      Network fails without this patch on Octeon TX.
      
      Fixes: 2b3e88ea ("net: phy: improve phy state checking")
      Signed-off-by: Mian Yousaf Kaukab <ykaukab@suse.de>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a350d2e7
    • Taehee Yoo's avatar
      hsr: fix a NULL pointer dereference in hsr_dev_xmit() · df95467b
      Taehee Yoo authored
      hsr_dev_xmit() calls hsr_port_get_hsr() to find the master node, which
      returns NULL if the master node does not exist in the list.
      But hsr_dev_xmit() doesn't check the returned pointer, so a NULL
      dereference could occur.
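A minimal sketch of the missing check, using simplified stand-in types rather than the hsr structures. `port_get()`, `dev_xmit()` and the counters are hypothetical; the point is only that the lookup result is tested before use and the packet is dropped when no master port is found.

```c
#include <assert.h>
#include <stddef.h>

enum port_type { PORT_SLAVE_A, PORT_SLAVE_B, PORT_MASTER };

struct port {
	enum port_type type;
};

struct tx_stats {
	unsigned long forwarded;
	unsigned long dropped;
};

/* Return the first port of the given type, or NULL if there is none. */
static const struct port *port_get(const struct port *ports, size_t n,
				   enum port_type type)
{
	for (size_t i = 0; i < n; i++) {
		if (ports[i].type == type)
			return &ports[i];
	}
	return NULL;
}

/* Transmit path: forward via the master port, or drop if it is gone. */
static void dev_xmit(const struct port *ports, size_t n,
		     struct tx_stats *stats)
{
	const struct port *master = port_get(ports, n, PORT_MASTER);

	assert(stats != NULL);
	if (master)
		stats->forwarded++;  /* real code would hand the skb on here */
	else
		stats->dropped++;    /* no master port: count and drop */
}
```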
      
      Test commands:
          ip netns add nst
          ip link add veth0 type veth peer name veth1
          ip link add veth2 type veth peer name veth3
          ip link set veth1 netns nst
          ip link set veth3 netns nst
          ip link set veth0 up
          ip link set veth2 up
          ip link add hsr0 type hsr slave1 veth0 slave2 veth2
          ip a a 192.168.100.1/24 dev hsr0
          ip link set hsr0 up
          ip netns exec nst ip link set veth1 up
          ip netns exec nst ip link set veth3 up
          ip netns exec nst ip link add hsr1 type hsr slave1 veth1 slave2 veth3
          ip netns exec nst ip a a 192.168.100.2/24 dev hsr1
          ip netns exec nst ip link set hsr1 up
          hping3 192.168.100.2 -2 --flood &
          modprobe -rv hsr
      
      Splat looks like:
      [  217.351122][ T1635] kasan: CONFIG_KASAN_INLINE enabled
      [  217.352969][ T1635] kasan: GPF could be caused by NULL-ptr deref or user memory access
      [  217.354297][ T1635] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
      [  217.355507][ T1635] CPU: 1 PID: 1635 Comm: hping3 Not tainted 5.4.0+ #192
      [  217.356472][ T1635] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [  217.357804][ T1635] RIP: 0010:hsr_dev_xmit+0x34/0x90 [hsr]
      [  217.373010][ T1635] Code: 48 8d be 00 0c 00 00 be 04 00 00 00 48 83 ec 08 e8 21 be ff ff 48 8d 78 10 48 ba 00 b
      [  217.376919][ T1635] RSP: 0018:ffff8880cd8af058 EFLAGS: 00010202
      [  217.377571][ T1635] RAX: 0000000000000000 RBX: ffff8880acde6840 RCX: 0000000000000002
      [  217.379465][ T1635] RDX: dffffc0000000000 RSI: 0000000000000004 RDI: 0000000000000010
      [  217.380274][ T1635] RBP: ffff8880acde6840 R08: ffffed101b440d5d R09: 0000000000000001
      [  217.381078][ T1635] R10: 0000000000000001 R11: ffffed101b440d5c R12: ffff8880bffcc000
      [  217.382023][ T1635] R13: ffff8880bffcc088 R14: 0000000000000000 R15: ffff8880ca675c00
      [  217.383094][ T1635] FS:  00007f060d9d1740(0000) GS:ffff8880da000000(0000) knlGS:0000000000000000
      [  217.384289][ T1635] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  217.385009][ T1635] CR2: 00007faf15381dd0 CR3: 00000000d523c001 CR4: 00000000000606e0
      [  217.385940][ T1635] Call Trace:
      [  217.386544][ T1635]  dev_hard_start_xmit+0x160/0x740
      [  217.387114][ T1635]  __dev_queue_xmit+0x1961/0x2e10
      [  217.388118][ T1635]  ? check_object+0xaf/0x260
      [  217.391466][ T1635]  ? __alloc_skb+0xb9/0x500
      [  217.392017][ T1635]  ? init_object+0x6b/0x80
      [  217.392629][ T1635]  ? netdev_core_pick_tx+0x2e0/0x2e0
      [  217.393175][ T1635]  ? __alloc_skb+0xb9/0x500
      [  217.393727][ T1635]  ? rcu_read_lock_sched_held+0x90/0xc0
      [  217.394331][ T1635]  ? rcu_read_lock_bh_held+0xa0/0xa0
      [  217.395013][ T1635]  ? kasan_unpoison_shadow+0x30/0x40
      [  217.395668][ T1635]  ? __kasan_kmalloc.constprop.4+0xa0/0xd0
      [  217.396280][ T1635]  ? __kmalloc_node_track_caller+0x3a8/0x3f0
      [  217.399007][ T1635]  ? __kasan_kmalloc.constprop.4+0xa0/0xd0
      [  217.400093][ T1635]  ? __kmalloc_reserve.isra.46+0x2e/0xb0
      [  217.401118][ T1635]  ? memset+0x1f/0x40
      [  217.402529][ T1635]  ? __alloc_skb+0x317/0x500
      [  217.404915][ T1635]  ? arp_xmit+0xca/0x2c0
      [ ... ]
      
      Fixes: 311633b6 ("hsr: switch ->dellink() to ->ndo_uninit()")
      Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      df95467b
    • Yonghong Song's avatar
      selftests/bpf: Add a fexit/bpf2bpf test with target bpf prog no callees · 8f9081c9
      Yonghong Song authored
      The existing fexit_bpf2bpf test covers the target program with callees.
      This patch adds a test for the target program without callees.
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20191205010607.177904-1-yhs@fb.com
      8f9081c9
    • Yonghong Song's avatar
      bpf: Fix a bug when getting subprog 0 jited image in check_attach_btf_id · e9eeec58
      Yonghong Song authored
      For a jited bpf program, if the subprogram count is 1, i.e.,
      there are no callees in the program, prog->aux->func will be NULL
      and prog->bpf_func points to the image address of the program.
      
      If there is more than one subprogram, prog->aux->func is populated,
      and subprogram 0 can be accessed through either prog->bpf_func or
      prog->aux->func[0]. Other subprograms should be accessed through
      prog->aux->func[subprog_id].
      
      This patch fixed a bug in check_attach_btf_id(), where
      prog->aux->func[subprog_id] is used to access any subprogram which
      caused a segfault like below:
        [79162.619208] BUG: kernel NULL pointer dereference, address:
        0000000000000000
        ......
        [79162.634255] Call Trace:
        [79162.634974]  ? _cond_resched+0x15/0x30
        [79162.635686]  ? kmem_cache_alloc_trace+0x162/0x220
        [79162.636398]  ? selinux_bpf_prog_alloc+0x1f/0x60
        [79162.637111]  bpf_prog_load+0x3de/0x690
        [79162.637809]  __do_sys_bpf+0x105/0x1740
        [79162.638488]  do_syscall_64+0x5b/0x180
        [79162.639147]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
        ......
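The access rule described above can be sketched with simplified stand-in structures. These are not the kernel's struct bpf_prog definitions: the fields are reduced to what the rule needs, image addresses are modelled as plain integers, and `subprog_image()` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

struct prog;

/* Reduced model of the auxiliary data: per-subprogram pointers. */
struct prog_aux {
	struct prog **func;     /* NULL when the program has no callees */
	unsigned int func_cnt;  /* number of entries in func */
};

struct prog {
	unsigned long bpf_func; /* image address, modelled as an integer */
	struct prog_aux *aux;
};

/*
 * Return the image address of a subprogram, or 0 if it does not exist.
 * Subprogram 0 is always reachable through p->bpf_func, so aux->func is
 * only dereferenced for subprog > 0, and only when it is populated.
 */
static unsigned long subprog_image(const struct prog *p, unsigned int subprog)
{
	assert(p != NULL && p->aux != NULL);
	if (subprog == 0)
		return p->bpf_func;
	if (p->aux->func == NULL || subprog >= p->aux->func_cnt)
		return 0;
	return p->aux->func[subprog]->bpf_func;
}
```

Indexing `aux->func` unconditionally, including for subprogram 0 of a program with no callees, is the NULL dereference shown in the trace.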
      
      Fixes: 5b92a28a ("bpf: Support attaching tracing BPF program to other BPF programs")
      Reported-by: Eelco Chaudron <echaudro@redhat.com>
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20191205010606.177774-1-yhs@fb.com
      e9eeec58