    net/sched: flower: Fix chain template offload · 32f2a0af
    Ido Schimmel authored
    When a qdisc is deleted from a net device the stack instructs the
    underlying driver to remove its flow offload callback from the
    associated filter block using the 'FLOW_BLOCK_UNBIND' command. The stack
    then continues to replay the removal of the filters in the block for
    this driver by iterating over the chains in the block and invoking the
    'reoffload' operation of the classifier being used. In turn, the
    classifier in its 'reoffload' operation prepares and emits a
    'FLOW_CLS_DESTROY' command for each filter.
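
The filter replay described above can be pictured with a small user-space model. This is a sketch only, not kernel code: every `model_*` name is a hypothetical stand-in invented for illustration, and only the idea of one destroy command per filter, delivered by walking the chains of a block, is taken from the text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-filter destroy command. */
enum model_cmd { MODEL_CLS_DESTROY };

/* Hypothetical driver callback: receives one command per filter. */
typedef int (*model_cb_t)(enum model_cmd cmd, int filter_id, void *cb_priv);

struct model_chain {
	const int *filter_ids;	/* filters installed in this chain */
	size_t n_filters;
};

struct model_block {
	const struct model_chain *chains;
	size_t n_chains;
};

/*
 * Replay filter removal to a single driver callback that is being unbound:
 * iterate over the chains in the block and emit one destroy command per
 * filter. Chain templates are deliberately not visited here, which mirrors
 * the gap this commit describes.
 */
static void model_replay_filter_removal(const struct model_block *block,
					model_cb_t cb, void *cb_priv)
{
	assert(block != NULL && cb != NULL);

	for (size_t i = 0; i < block->n_chains; i++) {
		const struct model_chain *chain = &block->chains[i];

		for (size_t j = 0; j < chain->n_filters; j++)
			cb(MODEL_CLS_DESTROY, chain->filter_ids[j], cb_priv);
	}
}

/* Example callback: counts the destroy commands it receives. */
static int model_count_destroy(enum model_cmd cmd, int filter_id, void *cb_priv)
{
	(void)filter_id;
	if (cmd == MODEL_CLS_DESTROY)
		++*(int *)cb_priv;
	return 0;
}
```

Calling `model_replay_filter_removal()` with `model_count_destroy` and a pointer to an `int` counter yields one increment per filter across all chains in the block.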
    
    However, the stack does not do the same for chain templates and the
    underlying driver never receives a 'FLOW_CLS_TMPLT_DESTROY' command when
    a qdisc is deleted. This results in a memory leak [1] which can be
    reproduced using [2].
    
    Fix by introducing a 'tmplt_reoffload' operation and having the stack
    invoke it with the appropriate arguments as part of the replay.
    Implement the operation in the sole classifier that supports chain
    templates (flower) by emitting the 'FLOW_CLS_TMPLT_{CREATE,DESTROY}'
    command based on whether a flow offload callback is being bound to a
    filter block or being unbound from one.
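
The choice between the create and destroy commands can be sketched in user space as follows. This is a simplified model under stated assumptions, not the code added by this commit: all `model_*` names, the `add` flag, and the toy driver that allocates on create and frees on destroy are hypothetical and exist only to show why the driver needs the destroy command on unbind.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the template create/destroy commands. */
enum model_tmplt_cmd { MODEL_TMPLT_CREATE, MODEL_TMPLT_DESTROY };

typedef int (*model_tmplt_cb_t)(enum model_tmplt_cmd cmd, void *cb_priv);

/* Hypothetical chain: only records whether a template is configured. */
struct model_tmplt_chain {
	bool has_tmplt;
};

/*
 * Model of a template replay operation: emit CREATE when a callback is
 * being bound to the block (add == true) and DESTROY when it is being
 * unbound (add == false).
 */
static void model_tmplt_reoffload(const struct model_tmplt_chain *chain,
				  bool add, model_tmplt_cb_t cb, void *cb_priv)
{
	assert(chain != NULL && cb != NULL);

	if (!chain->has_tmplt)
		return;

	cb(add ? MODEL_TMPLT_CREATE : MODEL_TMPLT_DESTROY, cb_priv);
}

/* Toy driver state: memory is held for as long as the template exists. */
struct model_driver {
	void *ruleset;
};

static int model_driver_cb(enum model_tmplt_cmd cmd, void *cb_priv)
{
	struct model_driver *drv = cb_priv;

	if (cmd == MODEL_TMPLT_CREATE) {
		if (!drv->ruleset)
			drv->ruleset = malloc(64);
		return drv->ruleset ? 0 : -1;
	}

	/* Without a DESTROY command this memory would never be released. */
	free(drv->ruleset);
	drv->ruleset = NULL;
	return 0;
}
```

In this model, a bind replay followed by an unbind replay leaves `ruleset` set to NULL again. Skipping the unbind replay leaves the allocation in place, which is the shape of the leak reported in [1].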
    
    As far as I can tell, the issue has been present since the cited
    commit, which reordered tcf_block_offload_unbind() before
    tcf_block_flush_all_chains() in __tcf_block_put(). The order cannot be
    reversed as the filter block is expected to be freed after flushing
    all the chains.
    
    [1]
    unreferenced object 0xffff888107e28800 (size 2048):
      comm "tc", pid 1079, jiffies 4294958525 (age 3074.287s)
      hex dump (first 32 bytes):
        b1 a6 7c 11 81 88 ff ff e0 5b b3 10 81 88 ff ff  ..|......[......
        01 00 00 00 00 00 00 00 e0 aa b0 84 ff ff ff ff  ................
      backtrace:
        [<ffffffff81c06a68>] __kmem_cache_alloc_node+0x1e8/0x320
        [<ffffffff81ab374e>] __kmalloc+0x4e/0x90
        [<ffffffff832aec6d>] mlxsw_sp_acl_ruleset_get+0x34d/0x7a0
        [<ffffffff832bc195>] mlxsw_sp_flower_tmplt_create+0x145/0x180
        [<ffffffff832b2e1a>] mlxsw_sp_flow_block_cb+0x1ea/0x280
        [<ffffffff83a10613>] tc_setup_cb_call+0x183/0x340
        [<ffffffff83a9f85a>] fl_tmplt_create+0x3da/0x4c0
        [<ffffffff83a22435>] tc_ctl_chain+0xa15/0x1170
        [<ffffffff838a863c>] rtnetlink_rcv_msg+0x3cc/0xed0
        [<ffffffff83ac87f0>] netlink_rcv_skb+0x170/0x440
        [<ffffffff83ac6270>] netlink_unicast+0x540/0x820
        [<ffffffff83ac6e28>] netlink_sendmsg+0x8d8/0xda0
        [<ffffffff83793def>] ____sys_sendmsg+0x30f/0xa80
        [<ffffffff8379d29a>] ___sys_sendmsg+0x13a/0x1e0
        [<ffffffff8379d50c>] __sys_sendmsg+0x11c/0x1f0
        [<ffffffff843b9ce0>] do_syscall_64+0x40/0xe0
    unreferenced object 0xffff88816d2c0400 (size 1024):
      comm "tc", pid 1079, jiffies 4294958525 (age 3074.287s)
      hex dump (first 32 bytes):
        40 00 00 00 00 00 00 00 57 f6 38 be 00 00 00 00  @.......W.8.....
        10 04 2c 6d 81 88 ff ff 10 04 2c 6d 81 88 ff ff  ..,m......,m....
      backtrace:
        [<ffffffff81c06a68>] __kmem_cache_alloc_node+0x1e8/0x320
        [<ffffffff81ab36c1>] __kmalloc_node+0x51/0x90
        [<ffffffff81a8ed96>] kvmalloc_node+0xa6/0x1f0
        [<ffffffff82827d03>] bucket_table_alloc.isra.0+0x83/0x460
        [<ffffffff82828d2b>] rhashtable_init+0x43b/0x7c0
        [<ffffffff832aed48>] mlxsw_sp_acl_ruleset_get+0x428/0x7a0
        [<ffffffff832bc195>] mlxsw_sp_flower_tmplt_create+0x145/0x180
        [<ffffffff832b2e1a>] mlxsw_sp_flow_block_cb+0x1ea/0x280
        [<ffffffff83a10613>] tc_setup_cb_call+0x183/0x340
        [<ffffffff83a9f85a>] fl_tmplt_create+0x3da/0x4c0
        [<ffffffff83a22435>] tc_ctl_chain+0xa15/0x1170
        [<ffffffff838a863c>] rtnetlink_rcv_msg+0x3cc/0xed0
        [<ffffffff83ac87f0>] netlink_rcv_skb+0x170/0x440
        [<ffffffff83ac6270>] netlink_unicast+0x540/0x820
        [<ffffffff83ac6e28>] netlink_sendmsg+0x8d8/0xda0
        [<ffffffff83793def>] ____sys_sendmsg+0x30f/0xa80
    
    [2]
     # tc qdisc add dev swp1 clsact
     # tc chain add dev swp1 ingress proto ip chain 1 flower dst_ip 0.0.0.0/32
     # tc qdisc del dev swp1 clsact
     # devlink dev reload pci/0000:06:00.0
    
    Fixes: bbf73830 ("net: sched: traverse chains in block with tcf_get_next_chain()")
    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>