1. 19 Jan, 2020 2 commits
    • Rahul Lakkireddy's avatar
      cxgb4: fix Tx multi channel port rate limit · c856e2b6
      Rahul Lakkireddy authored
      T6 can support 2 egress traffic management channels per port to
      double the total number of traffic classes that can be configured.
      In this configuration, if the class belongs to the other channel,
      then all the queues must be bound again explicitly to the new class,
      for the rate limit parameters on the other channel to take effect.
      
      So, always explicitly bind all queues to the port rate limit traffic
      class, regardless of the traffic management channel that it belongs
      to. Also, only bind queues to port rate limit traffic class, if all
      the queues don't already belong to an existing different traffic
      class.
      
      Fixes: 4ec4762d ("cxgb4: add TC-MATCHALL classifier egress offload")
      Signed-off-by: default avatarRahul Lakkireddy <rahul.lakkireddy@chelsio.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c856e2b6
    • Eric Dumazet's avatar
      net: sched: act_ctinfo: fix memory leak · 09d4f10a
      Eric Dumazet authored
      Implement a cleanup method to properly free ci->params
      
      BUG: memory leak
      unreferenced object 0xffff88811746e2c0 (size 64):
        comm "syz-executor617", pid 7106, jiffies 4294943055 (age 14.250s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          c0 34 60 84 ff ff ff ff 00 00 00 00 00 00 00 00  .4`.............
        backtrace:
          [<0000000015aa236f>] kmemleak_alloc_recursive include/linux/kmemleak.h:43 [inline]
          [<0000000015aa236f>] slab_post_alloc_hook mm/slab.h:586 [inline]
          [<0000000015aa236f>] slab_alloc mm/slab.c:3320 [inline]
          [<0000000015aa236f>] kmem_cache_alloc_trace+0x145/0x2c0 mm/slab.c:3549
          [<000000002c946bd1>] kmalloc include/linux/slab.h:556 [inline]
          [<000000002c946bd1>] kzalloc include/linux/slab.h:670 [inline]
          [<000000002c946bd1>] tcf_ctinfo_init+0x21a/0x530 net/sched/act_ctinfo.c:236
          [<0000000086952cca>] tcf_action_init_1+0x400/0x5b0 net/sched/act_api.c:944
          [<000000005ab29bf8>] tcf_action_init+0x135/0x1c0 net/sched/act_api.c:1000
          [<00000000392f56f9>] tcf_action_add+0x9a/0x200 net/sched/act_api.c:1410
          [<0000000088f3c5dd>] tc_ctl_action+0x14d/0x1bb net/sched/act_api.c:1465
          [<000000006b39d986>] rtnetlink_rcv_msg+0x178/0x4b0 net/core/rtnetlink.c:5424
          [<00000000fd6ecace>] netlink_rcv_skb+0x61/0x170 net/netlink/af_netlink.c:2477
          [<0000000047493d02>] rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
          [<00000000bdcf8286>] netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
          [<00000000bdcf8286>] netlink_unicast+0x223/0x310 net/netlink/af_netlink.c:1328
          [<00000000fc5b92d9>] netlink_sendmsg+0x2c0/0x570 net/netlink/af_netlink.c:1917
          [<00000000da84d076>] sock_sendmsg_nosec net/socket.c:639 [inline]
          [<00000000da84d076>] sock_sendmsg+0x54/0x70 net/socket.c:659
          [<0000000042fb2eee>] ____sys_sendmsg+0x2d0/0x300 net/socket.c:2330
          [<000000008f23f67e>] ___sys_sendmsg+0x8a/0xd0 net/socket.c:2384
          [<00000000d838e4f6>] __sys_sendmsg+0x80/0xf0 net/socket.c:2417
          [<00000000289a9cb1>] __do_sys_sendmsg net/socket.c:2426 [inline]
          [<00000000289a9cb1>] __se_sys_sendmsg net/socket.c:2424 [inline]
          [<00000000289a9cb1>] __x64_sys_sendmsg+0x23/0x30 net/socket.c:2424
      
      Fixes: 24ec483c ("net: sched: Introduce act_ctinfo action")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Cc: Kevin 'ldir' Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Toke Høiland-Jørgensen <toke@redhat.com>
      Acked-by: default avatarKevin 'ldir' Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      09d4f10a
  2. 18 Jan, 2020 4 commits
  3. 17 Jan, 2020 8 commits
    • Florian Fainelli's avatar
      net: systemport: Fixed queue mapping in internal ring map · 5a9ef194
      Florian Fainelli authored
      We would not be transmitting using the correct SYSTEMPORT transmit queue
      during ndo_select_queue() which looks up the internal TX ring map
      because while establishing the mapping we would be off by 4, so for
      instance, when we populate switch port mappings we would be doing:
      
      switch port 0, queue 0 -> ring index #0
      switch port 0, queue 1 -> ring index #1
      ...
      switch port 0, queue 3 -> ring index #3
      switch port 1, queue 0 -> ring index #8 (4 + 4 * 1)
      ...
      
      instead of using ring index #4. This would cause our ndo_select_queue()
      to use the fallback queue mechanism which would pick up an incorrect
      ring for that switch port. Fix this by using the correct switch queue
      number instead of SYSTEMPORT queue number.
      
      Fixes: 25c44070 ("net: systemport: Simplify queue mapping logic")
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5a9ef194
    • Florian Fainelli's avatar
      net: dsa: bcm_sf2: Configure IMP port for 2Gb/sec · 8f1880cb
      Florian Fainelli authored
      With the implementation of the system reset controller we lost a setting
      that is currently applied by the bootloader and which configures the IMP
      port for 2Gb/sec, the default is 1Gb/sec. This is needed given the
      number of ports and applications we expect to run so bring back that
      setting.
      
      Fixes: 01b0ac07589e ("net: dsa: bcm_sf2: Add support for optional reset controller line")
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8f1880cb
    • Vladimir Oltean's avatar
      net: dsa: sja1105: Don't error out on disabled ports with no phy-mode · 27afe0d3
      Vladimir Oltean authored
      The sja1105_parse_ports_node function was tested only on device trees
      where all ports were enabled. Fix this check so that the driver
      continues to probe only with the ports where status is not "disabled",
      as expected.
      
      Fixes: 8aa9ebcc ("net: dsa: Introduce driver for NXP SJA1105 5-port L2 switch")
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      27afe0d3
    • Michael Grzeschik's avatar
      net: phy: dp83867: Set FORCE_LINK_GOOD to default after reset · 86ffe920
      Michael Grzeschik authored
      According to the Datasheet this bit should be 0 (Normal operation) in
      default. With the FORCE_LINK_GOOD bit set, it is not possible to get a
      link. This patch sets FORCE_LINK_GOOD to the default value after
      resetting the phy.
      Signed-off-by: default avatarMichael Grzeschik <m.grzeschik@pengutronix.de>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      86ffe920
    • Yonglong Liu's avatar
      net: hns: fix soft lockup when there is not enough memory · 49edd6a2
      Yonglong Liu authored
      When there is not enough memory and napi_alloc_skb() return NULL,
      the HNS driver will print error message, and than try again, if
      the memory is not enough for a while, huge error message and the
      retry operation will cause soft lockup.
      
      When napi_alloc_skb() return NULL because of no memory, we can
      get a warn_alloc() call trace, so this patch deletes the error
      message. We already use polling mode to handle irq, but the
      retry operation will render the polling weight inactive, this
      patch just return budget when the rx is not completed to avoid
      dead loop.
      
      Fixes: 36eedfde ("net: hns: Optimize hns_nic_common_poll for better performance")
      Fixes: b5996f11 ("net: add Hisilicon Network Subsystem basic ethernet support")
      Signed-off-by: default avatarYonglong Liu <liuyonglong@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      49edd6a2
    • Cong Wang's avatar
      net: avoid updating qdisc_xmit_lock_key in netdev_update_lockdep_key() · 53d37497
      Cong Wang authored
      syzbot reported some bogus lockdep warnings, for example bad unlock
      balance in sch_direct_xmit(). They are due to a race condition between
      slow path and fast path, that is qdisc_xmit_lock_key gets re-registered
      in netdev_update_lockdep_key() on slow path, while we could still
      acquire the queue->_xmit_lock on fast path in this small window:
      
      CPU A						CPU B
      						__netif_tx_lock();
      lockdep_unregister_key(qdisc_xmit_lock_key);
      						__netif_tx_unlock();
      lockdep_register_key(qdisc_xmit_lock_key);
      
      In fact, unlike the addr_list_lock which has to be reordered when
      the master/slave device relationship changes, queue->_xmit_lock is
      only acquired on fast path and only when NETIF_F_LLTX is not set,
      so there is likely no nested locking for it.
      
      Therefore, we can just get rid of re-registration of
      qdisc_xmit_lock_key.
      
      Reported-by: syzbot+4ec99438ed7450da6272@syzkaller.appspotmail.com
      Fixes: ab92d68f ("net: core: add generic lockdep keys")
      Cc: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Acked-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      53d37497
    • Eric Dumazet's avatar
      net/sched: act_ife: initalize ife->metalist earlier · 44c23d71
      Eric Dumazet authored
      It seems better to init ife->metalist earlier in tcf_ife_init()
      to avoid the following crash :
      
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] PREEMPT SMP KASAN
      CPU: 0 PID: 10483 Comm: syz-executor216 Not tainted 5.5.0-rc5-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:_tcf_ife_cleanup net/sched/act_ife.c:412 [inline]
      RIP: 0010:tcf_ife_cleanup+0x6e/0x400 net/sched/act_ife.c:431
      Code: 48 c1 ea 03 80 3c 02 00 0f 85 94 03 00 00 49 8b bd f8 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 67 e8 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 5c 03 00 00 48 bb 00 00 00 00 00 fc ff df 48 8b
      RSP: 0018:ffffc90001dc6d00 EFLAGS: 00010246
      RAX: dffffc0000000000 RBX: ffffffff864619c0 RCX: ffffffff815bfa09
      RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000
      RBP: ffffc90001dc6d50 R08: 0000000000000004 R09: fffff520003b8d8e
      R10: fffff520003b8d8d R11: 0000000000000003 R12: ffffffffffffffe8
      R13: ffff8880a79fc000 R14: ffff88809aba0e00 R15: 0000000000000000
      FS:  0000000001b51880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000563f52cce140 CR3: 0000000093541000 CR4: 00000000001406f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       tcf_action_cleanup+0x62/0x1b0 net/sched/act_api.c:119
       __tcf_action_put+0xfa/0x130 net/sched/act_api.c:135
       __tcf_idr_release net/sched/act_api.c:165 [inline]
       __tcf_idr_release+0x59/0xf0 net/sched/act_api.c:145
       tcf_idr_release include/net/act_api.h:171 [inline]
       tcf_ife_init+0x97c/0x1870 net/sched/act_ife.c:616
       tcf_action_init_1+0x6b6/0xa40 net/sched/act_api.c:944
       tcf_action_init+0x21a/0x330 net/sched/act_api.c:1000
       tcf_action_add+0xf5/0x3b0 net/sched/act_api.c:1410
       tc_ctl_action+0x390/0x488 net/sched/act_api.c:1465
       rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5424
       netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
       rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
       netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
       netlink_unicast+0x58c/0x7d0 net/netlink/af_netlink.c:1328
       netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917
       sock_sendmsg_nosec net/socket.c:639 [inline]
       sock_sendmsg+0xd7/0x130 net/socket.c:659
       ____sys_sendmsg+0x753/0x880 net/socket.c:2330
       ___sys_sendmsg+0x100/0x170 net/socket.c:2384
       __sys_sendmsg+0x105/0x1d0 net/socket.c:2417
       __do_sys_sendmsg net/socket.c:2426 [inline]
       __se_sys_sendmsg net/socket.c:2424 [inline]
       __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2424
       do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Fixes: 11a94d7f ("net/sched: act_ife: validate the control action inside init()")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Cc: Davide Caratti <dcaratti@redhat.com>
      Reviewed-by: default avatarDavide Caratti <dcaratti@redhat.com>
      Acked-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      44c23d71
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · a72b6a1e
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter updates for net
      
      The following patchset contains Netfilter fixes for net:
      
      1) Fix use-after-free in ipset bitmap destroy path, from Cong Wang.
      
      2) Missing init netns in entry cleanup path of arp_tables,
         from Florian Westphal.
      
      3) Fix WARN_ON in set destroy path due to missing cleanup on
         transaction error.
      
      4) Incorrect netlink sanity check in tunnel, from Florian Westphal.
      
      5) Missing sanity check for erspan version netlink attribute, also
         from Florian.
      
      6) Remove WARN in nft_request_module() that can be triggered from
         userspace, from Florian Westphal.
      
      7) Memleak in NFTA_HOOK_DEVS netlink parser, from Dan Carpenter.
      
      8) List poison from commit path for flowtables that are added and
         deleted in the same batch, from Florian Westphal.
      
      9) Fix NAT ICMP packet corruption, from Eyal Birger.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a72b6a1e
  4. 16 Jan, 2020 14 commits
  5. 15 Jan, 2020 12 commits
    • Daniel Borkmann's avatar
      Merge branch 'bpf-sockmap-tls-fixes' · 85ddd9c3
      Daniel Borkmann authored
      John Fastabend says:
      
      ====================
      To date our usage of sockmap/tls has been fairly simple, the BPF programs
      did only well-defined pop, push, pull and apply/cork operations.
      
      Now that we started to push more complex programs into sockmap we uncovered
      a series of issues addressed here. Further OpenSSL3.0 version should be
      released soon with kTLS support so its important to get any remaining
      issues on BPF and kTLS support resolved.
      
      Additionally, I have a patch under development to allow sockmap to be
      enabled/disabled at runtime for Cilium endpoints. This allows us to stress
      the map insert/delete with kTLS more than previously where Cilium only
      added the socket to the map when it entered ESTABLISHED state and never
      touched it from the control path side again relying on the sockets own
      close() hook to remove it.
      
      To test I have a set of test cases in test_sockmap.c that expose these
      issues. Once we get fixes here merged and in bpf-next I'll submit the
      tests to bpf-next tree to ensure we don't regress again. Also I've run
      these patches in the Cilium CI with OpenSSL (master branch) this will
      run tools such as netperf, ab, wrk2, curl, etc. to get a broad set of
      testing.
      
      I'm aware of two more issues that we are working to resolve in another
      couple (probably two) patches. First we see an auth tag corruption in
      kTLS when sending small 1byte chunks under stress. I've not pinned this
      down yet. But, guessing because its under 1B stress tests it must be
      some error path being triggered. And second we need to ensure BPF RX
      programs are not skipped when kTLS ULP is loaded. This breaks some of the
      sockmap selftests when running with kTLS. I'll send a follow up for this.
      
      v2: I dropped a patch that added !0 size check in tls_push_record
          this originated from a panic I caught awhile ago with a trace
          in the crypto stack. But I can not reproduce it anymore so will
          dig into that and send another patch later if needed. Anyways
          after a bit of thought it would be nicer if tls/crypto/bpf didn't
          require special case handling for the !0 size.
      ====================
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      85ddd9c3
    • John Fastabend's avatar
      bpf: Sockmap/tls, fix pop data with SK_DROP return code · 7361d448
      John Fastabend authored
      When user returns SK_DROP we need to reset the number of copied bytes
      to indicate to the user the bytes were dropped and not sent. If we
      don't reset the copied arg sendmsg will return as if those bytes were
      copied giving the user a positive return value.
      
      This works as expected today except in the case where the user also
      pops bytes. In the pop case the sg.size is reduced but we don't correctly
      account for this when copied bytes is reset. The popped bytes are not
      accounted for and we return a small positive value potentially confusing
      the user.
      
      The reason this happens is due to a typo where we do the wrong comparison
      when accounting for pop bytes. In this fix notice the if/else is not
      needed and that we have a similar problem if we push data except its not
      visible to the user because if delta is larger the sg.size we return a
      negative value so it appears as an error regardless.
      
      Fixes: 7246d8ed ("bpf: helper to pop data from messages")
      Signed-off-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarJonathan Lemon <jonathan.lemon@gmail.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/bpf/20200111061206.8028-9-john.fastabend@gmail.com
      7361d448
    • John Fastabend's avatar
      bpf: Sockmap/tls, skmsg can have wrapped skmsg that needs extra chaining · 9aaaa568
      John Fastabend authored
      Its possible through a set of push, pop, apply helper calls to construct
      a skmsg, which is just a ring of scatterlist elements, with the start
      value larger than the end value. For example,
      
            end       start
        |_0_|_1_| ... |_n_|_n+1_|
      
      Where end points at 1 and start points and n so that valid elements is
      the set {n, n+1, 0, 1}.
      
      Currently, because we don't build the correct chain only {n, n+1} will
      be sent. This adds a check and sg_chain call to correctly submit the
      above to the crypto and tls send path.
      
      Fixes: d3b18ad3 ("tls: add bpf support to sk_msg handling")
      Signed-off-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarJonathan Lemon <jonathan.lemon@gmail.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/bpf/20200111061206.8028-8-john.fastabend@gmail.com
      9aaaa568
    • John Fastabend's avatar
      bpf: Sockmap/tls, tls_sw can create a plaintext buf > encrypt buf · d468e477
      John Fastabend authored
      It is possible to build a plaintext buffer using push helper that is larger
      than the allocated encrypt buffer. When this record is pushed to crypto
      layers this can result in a NULL pointer dereference because the crypto
      API expects the encrypt buffer is large enough to fit the plaintext
      buffer. Kernel splat below.
      
      To resolve catch the cases this can happen and split the buffer into two
      records to send individually. Unfortunately, there is still one case to
      handle where the split creates a zero sized buffer. In this case we merge
      the buffers and unmark the split. This happens when apply is zero and user
      pushed data beyond encrypt buffer. This fixes the original case as well
      because the split allocated an encrypt buffer larger than the plaintext
      buffer and the merge simply moves the pointers around so we now have
      a reference to the new (larger) encrypt buffer.
      
      Perhaps its not ideal but it seems the best solution for a fixes branch
      and avoids handling these two cases, (a) apply that needs split and (b)
      non apply case. The are edge cases anyways so optimizing them seems not
      necessary unless someone wants later in next branches.
      
      [  306.719107] BUG: kernel NULL pointer dereference, address: 0000000000000008
      [...]
      [  306.747260] RIP: 0010:scatterwalk_copychunks+0x12f/0x1b0
      [...]
      [  306.770350] Call Trace:
      [  306.770956]  scatterwalk_map_and_copy+0x6c/0x80
      [  306.772026]  gcm_enc_copy_hash+0x4b/0x50
      [  306.772925]  gcm_hash_crypt_remain_continue+0xef/0x110
      [  306.774138]  gcm_hash_crypt_continue+0xa1/0xb0
      [  306.775103]  ? gcm_hash_crypt_continue+0xa1/0xb0
      [  306.776103]  gcm_hash_assoc_remain_continue+0x94/0xa0
      [  306.777170]  gcm_hash_assoc_continue+0x9d/0xb0
      [  306.778239]  gcm_hash_init_continue+0x8f/0xa0
      [  306.779121]  gcm_hash+0x73/0x80
      [  306.779762]  gcm_encrypt_continue+0x6d/0x80
      [  306.780582]  crypto_gcm_encrypt+0xcb/0xe0
      [  306.781474]  crypto_aead_encrypt+0x1f/0x30
      [  306.782353]  tls_push_record+0x3b9/0xb20 [tls]
      [  306.783314]  ? sk_psock_msg_verdict+0x199/0x300
      [  306.784287]  bpf_exec_tx_verdict+0x3f2/0x680 [tls]
      [  306.785357]  tls_sw_sendmsg+0x4a3/0x6a0 [tls]
      
      test_sockmap test signature to trigger bug,
      
      [TEST]: (1, 1, 1, sendmsg, pass,redir,start 1,end 2,pop (1,2),ktls,):
      
      Fixes: d3b18ad3 ("tls: add bpf support to sk_msg handling")
      Signed-off-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarJonathan Lemon <jonathan.lemon@gmail.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/bpf/20200111061206.8028-7-john.fastabend@gmail.com
      d468e477
    • John Fastabend's avatar
      bpf: Sockmap/tls, msg_push_data may leave end mark in place · cf21e9ba
      John Fastabend authored
      Leaving an incorrect end mark in place when passing to crypto
      layer will cause crypto layer to stop processing data before
      all data is encrypted. To fix clear the end mark on push
      data instead of expecting users of the helper to clear the
      mark value after the fact.
      
      This happens when we push data into the middle of a skmsg and
      have room for it so we don't do a set of copies that already
      clear the end flag.
      
      Fixes: 6fff607e ("bpf: sk_msg program helper bpf_msg_push_data")
      Signed-off-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarSong Liu <songliubraving@fb.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/bpf/20200111061206.8028-6-john.fastabend@gmail.com
      cf21e9ba
    • John Fastabend's avatar
      bpf: Sockmap, skmsg helper overestimates push, pull, and pop bounds · 6562e29c
      John Fastabend authored
      In the push, pull, and pop helpers operating on skmsg objects to make
      data writable or insert/remove data we use this bounds check to ensure
      specified data is valid,
      
       /* Bounds checks: start and pop must be inside message */
       if (start >= offset + l || last >= msg->sg.size)
           return -EINVAL;
      
      The problem here is offset has already included the length of the
      current element the 'l' above. So start could be past the end of
      the scatterlist element in the case where start also points into an
      offset on the last skmsg element.
      
      To fix do the accounting slightly different by adding the length of
      the previous entry to offset at the start of the iteration. And
      ensure its initialized to zero so that the first iteration does
      nothing.
      
      Fixes: 604326b4 ("bpf, sockmap: convert to generic sk_msg interface")
      Fixes: 6fff607e ("bpf: sk_msg program helper bpf_msg_push_data")
      Fixes: 7246d8ed ("bpf: helper to pop data from messages")
      Signed-off-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarSong Liu <songliubraving@fb.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/bpf/20200111061206.8028-5-john.fastabend@gmail.com
      6562e29c
    • John Fastabend's avatar
      bpf: Sockmap/tls, push write_space updates through ulp updates · 33bfe20d
      John Fastabend authored
      When sockmap sock with TLS enabled is removed we cleanup bpf/psock state
      and call tcp_update_ulp() to push updates to TLS ULP on top. However, we
      don't push the write_space callback up and instead simply overwrite the
      op with the psock stored previous op. This may or may not be correct so
      to ensure we don't overwrite the TLS write space hook pass this field to
      the ULP and have it fixup the ctx.
      
      This completes a previous fix that pushed the ops through to the ULP
      but at the time missed doing this for write_space, presumably because
      write_space TLS hook was added around the same time.
      
      Fixes: 95fa1454 ("bpf: sockmap/tls, close can race with map free")
      Signed-off-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: default avatarJakub Sitnicki <jakub@cloudflare.com>
      Acked-by: default avatarJonathan Lemon <jonathan.lemon@gmail.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/bpf/20200111061206.8028-4-john.fastabend@gmail.com
      33bfe20d
    • John Fastabend's avatar
      bpf: Sockmap, ensure sock lock held during tear down · 7e81a353
      John Fastabend authored
      The sock_map_free() and sock_hash_free() paths used to delete sockmap
      and sockhash maps walk the maps and destroy psock and bpf state associated
      with the socks in the map. When done the socks no longer have BPF programs
      attached and will function normally. This can happen while the socks in
      the map are still "live" meaning data may be sent/received during the walk.
      
      Currently, though we don't take the sock_lock when the psock and bpf state
      is removed through this path. Specifically, this means we can be writing
      into the ops structure pointers such as sendmsg, sendpage, recvmsg, etc.
      while they are also being called from the networking side. This is not
      safe, we never used proper READ_ONCE/WRITE_ONCE semantics here if we
      believed it was safe. Further its not clear to me its even a good idea
      to try and do this on "live" sockets while networking side might also
      be using the socket. Instead of trying to reason about using the socks
      from both sides lets realize that every use case I'm aware of rarely
      deletes maps, in fact kubernetes/Cilium case builds map at init and
      never tears it down except on errors. So lets do the simple fix and
      grab sock lock.
      
      This patch wraps sock deletes from maps in sock lock and adds some
      annotations so we catch any other cases easier.
      
      Fixes: 604326b4 ("bpf, sockmap: convert to generic sk_msg interface")
      Signed-off-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarSong Liu <songliubraving@fb.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/bpf/20200111061206.8028-3-john.fastabend@gmail.com
      7e81a353
    • John Fastabend's avatar
      bpf: Sockmap/tls, during free we may call tcp_bpf_unhash() in loop · 4da6a196
      John Fastabend authored
      When a sockmap is free'd and a socket in the map is enabled with tls
      we tear down the bpf context on the socket, the psock struct and state,
      and then call tcp_update_ulp(). The tcp_update_ulp() call is to inform
      the tls stack it needs to update its saved sock ops so that when the tls
      socket is later destroyed it doesn't try to call the now destroyed psock
      hooks.
      
      This is about keeping stacked ULPs in good shape so they always have
      the right set of stacked ops.
      
      However, recently unhash() hook was removed from TLS side. But, the
      sockmap/bpf side is not doing any extra work to update the unhash op
      when is torn down instead expecting TLS side to manage it. So both
      TLS and sockmap believe the other side is managing the op and instead
      no one updates the hook so it continues to point at tcp_bpf_unhash().
      When unhash hook is called we call tcp_bpf_unhash() which detects the
      psock has already been destroyed and calls sk->sk_prot_unhash() which
      calls tcp_bpf_unhash() yet again and so on looping and hanging the core.
      
      To fix have sockmap tear down logic fixup the stale pointer.
      
      Fixes: 5d92e631 ("net/tls: partially revert fix transition through disconnect with close")
      Reported-by: syzbot+83979935eb6304f8cd46@syzkaller.appspotmail.com
      Signed-off-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: default avatarJakub Sitnicki <jakub@cloudflare.com>
      Acked-by: default avatarSong Liu <songliubraving@fb.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/bpf/20200111061206.8028-2-john.fastabend@gmail.com
      4da6a196
    • David S. Miller's avatar
      Merge branch 'stmmac-Fix-selftests-in-Synopsys-AXS101-board' · 567110f1
      David S. Miller authored
      Jose Abreu says:
      
      ====================
      net: stmmac: Fix selftests in Synopsys AXS101 board
      
      Set of fixes for sefltests so that they work in Synopsys AXS101 board.
      
      Final output:
      
      $ ethtool -t eth0
      The test result is PASS
      The test extra info:
       1. MAC Loopback                 0
       2. PHY Loopback                 -95
       3. MMC Counters                 0
       4. EEE                          -95
       5. Hash Filter MC               0
       6. Perfect Filter UC            0
       7. MC Filter                    0
       8. UC Filter                    0
       9. Flow Control                 -95
      10. RSS                          -95
      11. VLAN Filtering               -95
      12. VLAN Filtering (perf)        -95
      13. Double VLAN Filter           -95
      14. Double VLAN Filter (perf)    -95
      15. Flexible RX Parser           -95
      16. SA Insertion (desc)          -95
      17. SA Replacement (desc)        -95
      18. SA Insertion (reg)           -95
      19. SA Replacement (reg)         -95
      20. VLAN TX Insertion            -95
      21. SVLAN TX Insertion           -95
      22. L3 DA Filtering              -95
      23. L3 SA Filtering              -95
      24. L4 DA TCP Filtering          -95
      25. L4 SA TCP Filtering          -95
      26. L4 DA UDP Filtering          -95
      27. L4 SA UDP Filtering          -95
      28. ARP Offload                  -95
      29. Jumbo Frame                  0
      30. Multichannel Jumbo           -95
      31. Split Header                 -95
      
      Description:
      
      1) Fixes the unaligned accesses that caused CPU halt in Synopsys AXS101
      boards.
      
      2) Fixes the VLAN tests when filtering failed to work.
      
      3) Fixes the VLAN Perfect tests when filtering is not available in HW.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      567110f1
    • Jose Abreu's avatar
      net: stmmac: selftests: Guard VLAN Perfect test against non supported HW · 4eee13f1
      Jose Abreu authored
      When HW does not support perfect filtering the feature will not be
      enabled in the net_device. Add a check for this to prevent failures.
      
      Fixes: 1b2250a0 ("net: stmmac: selftests: Add tests for VLAN Perfect Filtering")
      Signed-off-by: default avatarJose Abreu <Jose.Abreu@synopsys.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4eee13f1
    • Jose Abreu's avatar
      net: stmmac: selftests: Mark as fail when received VLAN ID != expected · d39b68e5
      Jose Abreu authored
      When the VLAN ID does not match the expected one it means filter failed
      in HW. Fix it.
      
      Fixes: 94e18382 ("net: stmmac: selftests: Add selftest for VLAN TX Offload")
      Signed-off-by: default avatarJose Abreu <Jose.Abreu@synopsys.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d39b68e5