1. 24 May, 2018 1 commit
    • Daniel Borkmann's avatar
      bpf: properly enforce index mask to prevent out-of-bounds speculation · c93552c4
      Daniel Borkmann authored
      While reviewing the verifier code, I recently noticed that the
      following two program variants in relation to tail calls can be
      loaded.
      
      Variant 1:
      
        # bpftool p d x i 15
          0: (15) if r1 == 0x0 goto pc+3
          1: (18) r2 = map[id:5]
          3: (05) goto pc+2
          4: (18) r2 = map[id:6]
          6: (b7) r3 = 7
          7: (35) if r3 >= 0xa0 goto pc+2
          8: (54) (u32) r3 &= (u32) 255
          9: (85) call bpf_tail_call#12
         10: (b7) r0 = 1
         11: (95) exit
      
        # bpftool m s i 5
          5: prog_array  flags 0x0
              key 4B  value 4B  max_entries 4  memlock 4096B
        # bpftool m s i 6
          6: prog_array  flags 0x0
              key 4B  value 4B  max_entries 160  memlock 4096B
      
      Variant 2:
      
        # bpftool p d x i 20
          0: (15) if r1 == 0x0 goto pc+3
          1: (18) r2 = map[id:8]
          3: (05) goto pc+2
          4: (18) r2 = map[id:7]
          6: (b7) r3 = 7
          7: (35) if r3 >= 0x4 goto pc+2
          8: (54) (u32) r3 &= (u32) 3
          9: (85) call bpf_tail_call#12
         10: (b7) r0 = 1
         11: (95) exit
      
        # bpftool m s i 8
          8: prog_array  flags 0x0
              key 4B  value 4B  max_entries 160  memlock 4096B
        # bpftool m s i 7
          7: prog_array  flags 0x0
              key 4B  value 4B  max_entries 4  memlock 4096B
      
      In both cases the index masking inserted by the verifier in order
      to control out of bounds speculation from a CPU via b2157399
      ("bpf: prevent out-of-bounds speculation") seems to be incorrect
      in what it is enforcing. In the 1st variant, the mask is applied
      from the map with the significantly larger number of entries where
      we would allow to a certain degree out of bounds speculation for
      the smaller map, and in the 2nd variant where the mask is applied
      from the map with the smaller number of entries, we get buggy
      behavior since we truncate the index of the larger map.
      
      The original intent from commit b2157399 is to reject such
      occasions where two or more different tail call maps are used
      in the same tail call helper invocation. However, the check on
      the BPF_MAP_PTR_POISON is never hit since we never poisoned the
      saved pointer in the first place! We do this explicitly for map
      lookups but in case of tail calls we basically used the tail
      call map in insn_aux_data that was processed in the most recent
      path which the verifier walked. Thus any prior path that stored
      a pointer in insn_aux_data at the helper location was always
      overridden.
      
      Fix it by moving the map pointer poison logic into a small helper
      that covers both BPF helpers with the same logic. After that in
      fixup_bpf_calls() the poison check is then hit for tail calls
      and the program rejected. Latter only happens in unprivileged
      case since this is the *only* occasion where a rewrite needs to
      happen, and where such rewrite is specific to the map (max_entries,
      index_mask). In the privileged case the rewrite is generic for
      the insn->imm / insn->code update so multiple maps from different
      paths can be handled just fine since all the remaining logic
      happens in the instruction processing itself. This is similar
      to the case of map lookups: in case there is a collision of
      maps in fixup_bpf_calls() we must skip the inlined rewrite since
      this will turn the generic instruction sequence into a non-
      generic one. Thus the patch_call_imm will simply update the
      insn->imm location where the bpf_map_lookup_elem() will later
      take care of the dispatch. Given we need this 'poison' state
      as a check, the information of whether a map is an unpriv_array
      gets lost, so enforcing it prior to that needs an additional
      state. In general this check is needed since there are some
      complex and tail call intensive BPF programs out there where
      LLVM tends to generate such code occasionally. We therefore
      convert the map_ptr rather into map_state to store all this
      w/o extra memory overhead, and the bit whether one of the maps
      involved in the collision was from an unpriv_array thus needs
      to be retained as well there.
      
      Fixes: b2157399 ("bpf: prevent out-of-bounds speculation")
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      c93552c4
  2. 23 May, 2018 1 commit
  3. 18 May, 2018 8 commits
    • Anders Roxell's avatar
      selftests: bpf: config: enable NET_SCH_INGRESS for xdp_meta.sh · a6837d26
      Anders Roxell authored
      When running bpf's selftest test_xdp_meta.sh it fails:
      ./test_xdp_meta.sh
      Error: Specified qdisc not found.
      selftests: test_xdp_meta [FAILED]
      
      Need to enable CONFIG_NET_SCH_INGRESS and CONFIG_NET_CLS_ACT to get the
      test to pass.
      
      Fixes: 22c88526 ("bpf: improve selftests and add tests for meta pointer")
      Signed-off-by: default avatarAnders Roxell <anders.roxell@linaro.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      a6837d26
    • Rahul Lakkireddy's avatar
      cxgb4: fix offset in collecting TX rate limit info · d775f26b
      Rahul Lakkireddy authored
      Correct the indirect register offsets in collecting TX rate limit info
      in UP CIM logs.
      
      Also, T5 doesn't support these indirect register offsets, so remove
      them from collection logic.
      
      Fixes: be6e36d9 ("cxgb4: collect TX rate limit info in UP CIM logs")
      Signed-off-by: default avatarRahul Lakkireddy <rahul.lakkireddy@chelsio.com>
      Signed-off-by: default avatarGanesh Goudar <ganeshgr@chelsio.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d775f26b
    • Paolo Abeni's avatar
      net: sched: red: avoid hashing NULL child · 44a63b13
      Paolo Abeni authored
      Hangbin reported an Oops triggered by the syzkaller qdisc rules:
      
       kasan: GPF could be caused by NULL-ptr deref or user memory access
       general protection fault: 0000 [#1] SMP KASAN PTI
       Modules linked in: sch_red
       CPU: 0 PID: 28699 Comm: syz-executor5 Not tainted 4.17.0-rc4.kcov #1
       Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
       RIP: 0010:qdisc_hash_add+0x26/0xa0
       RSP: 0018:ffff8800589cf470 EFLAGS: 00010203
       RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff824ad971
       RDX: 0000000000000007 RSI: ffffc9000ce9f000 RDI: 000000000000003c
       RBP: 0000000000000001 R08: ffffed000b139ea2 R09: ffff8800589cf4f0
       R10: ffff8800589cf50f R11: ffffed000b139ea2 R12: ffff880054019fc0
       R13: ffff880054019fb4 R14: ffff88005c0af600 R15: ffff880054019fb0
       FS:  00007fa6edcb1700(0000) GS:ffff88005ce00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000020000740 CR3: 000000000fc16000 CR4: 00000000000006f0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       Call Trace:
        red_change+0x2d2/0xed0 [sch_red]
        qdisc_create+0x57e/0xef0
        tc_modify_qdisc+0x47f/0x14e0
        rtnetlink_rcv_msg+0x6a8/0x920
        netlink_rcv_skb+0x2a2/0x3c0
        netlink_unicast+0x511/0x740
        netlink_sendmsg+0x825/0xc30
        sock_sendmsg+0xc5/0x100
        ___sys_sendmsg+0x778/0x8e0
        __sys_sendmsg+0xf5/0x1b0
        do_syscall_64+0xbd/0x3b0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
       RIP: 0033:0x450869
       RSP: 002b:00007fa6edcb0c48 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
       RAX: ffffffffffffffda RBX: 00007fa6edcb16b4 RCX: 0000000000450869
       RDX: 0000000000000000 RSI: 00000000200000c0 RDI: 0000000000000013
       RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000
       R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
       R13: 0000000000008778 R14: 0000000000702838 R15: 00007fa6edcb1700
       Code: e9 0b fe ff ff 0f 1f 44 00 00 55 53 48 89 fb 89 f5 e8 3f 07 f3 fe 48 8d 7b 3c 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 04 84 d2 75 51
       RIP: qdisc_hash_add+0x26/0xa0 RSP: ffff8800589cf470
      
      When a red qdisc is updated with a 0 limit, the child qdisc is left
      unmodified, no additional scheduler is created in red_change(),
      the 'child' local variable is rightfully NULL and must not add it
      to the hash table.
      
      This change addresses the above issue moving qdisc_hash_add() right
      after the child qdisc creation. It additionally removes unneeded checks
      for noop_qdisc.
      Reported-by: default avatarHangbin Liu <liuhangbin@gmail.com>
      Fixes: 49b49971 ("net: sched: make default fifo qdiscs appear in the dump")
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Acked-by: default avatarJiri Kosina <jkosina@suse.cz>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      44a63b13
    • Eric Dumazet's avatar
      sock_diag: fix use-after-free read in __sk_free · 9709020c
      Eric Dumazet authored
      We must not call sock_diag_has_destroy_listeners(sk) on a socket
      that has no reference on net structure.
      
      BUG: KASAN: use-after-free in sock_diag_has_destroy_listeners include/linux/sock_diag.h:75 [inline]
      BUG: KASAN: use-after-free in __sk_free+0x329/0x340 net/core/sock.c:1609
      Read of size 8 at addr ffff88018a02e3a0 by task swapper/1/0
      
      CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.17.0-rc5+ #54
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       <IRQ>
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1b9/0x294 lib/dump_stack.c:113
       print_address_description+0x6c/0x20b mm/kasan/report.c:256
       kasan_report_error mm/kasan/report.c:354 [inline]
       kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
       __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
       sock_diag_has_destroy_listeners include/linux/sock_diag.h:75 [inline]
       __sk_free+0x329/0x340 net/core/sock.c:1609
       sk_free+0x42/0x50 net/core/sock.c:1623
       sock_put include/net/sock.h:1664 [inline]
       reqsk_free include/net/request_sock.h:116 [inline]
       reqsk_put include/net/request_sock.h:124 [inline]
       inet_csk_reqsk_queue_drop_and_put net/ipv4/inet_connection_sock.c:672 [inline]
       reqsk_timer_handler+0xe27/0x10e0 net/ipv4/inet_connection_sock.c:739
       call_timer_fn+0x230/0x940 kernel/time/timer.c:1326
       expire_timers kernel/time/timer.c:1363 [inline]
       __run_timers+0x79e/0xc50 kernel/time/timer.c:1666
       run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692
       __do_softirq+0x2e0/0xaf5 kernel/softirq.c:285
       invoke_softirq kernel/softirq.c:365 [inline]
       irq_exit+0x1d1/0x200 kernel/softirq.c:405
       exiting_irq arch/x86/include/asm/apic.h:525 [inline]
       smp_apic_timer_interrupt+0x17e/0x710 arch/x86/kernel/apic/apic.c:1052
       apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863
       </IRQ>
      RIP: 0010:native_safe_halt+0x6/0x10 arch/x86/include/asm/irqflags.h:54
      RSP: 0018:ffff8801d9ae7c38 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13
      RAX: dffffc0000000000 RBX: 1ffff1003b35cf8a RCX: 0000000000000000
      RDX: 1ffffffff11a30d0 RSI: 0000000000000001 RDI: ffffffff88d18680
      RBP: ffff8801d9ae7c38 R08: ffffed003b5e46c3 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
      R13: ffff8801d9ae7cf0 R14: ffffffff897bef20 R15: 0000000000000000
       arch_safe_halt arch/x86/include/asm/paravirt.h:94 [inline]
       default_idle+0xc2/0x440 arch/x86/kernel/process.c:354
       arch_cpu_idle+0x10/0x20 arch/x86/kernel/process.c:345
       default_idle_call+0x6d/0x90 kernel/sched/idle.c:93
       cpuidle_idle_call kernel/sched/idle.c:153 [inline]
       do_idle+0x395/0x560 kernel/sched/idle.c:262
       cpu_startup_entry+0x104/0x120 kernel/sched/idle.c:368
       start_secondary+0x426/0x5b0 arch/x86/kernel/smpboot.c:269
       secondary_startup_64+0xa5/0xb0 arch/x86/kernel/head_64.S:242
      
      Allocated by task 4557:
       save_stack+0x43/0xd0 mm/kasan/kasan.c:448
       set_track mm/kasan/kasan.c:460 [inline]
       kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
       kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490
       kmem_cache_alloc+0x12e/0x760 mm/slab.c:3554
       kmem_cache_zalloc include/linux/slab.h:691 [inline]
       net_alloc net/core/net_namespace.c:383 [inline]
       copy_net_ns+0x159/0x4c0 net/core/net_namespace.c:423
       create_new_namespaces+0x69d/0x8f0 kernel/nsproxy.c:107
       unshare_nsproxy_namespaces+0xc3/0x1f0 kernel/nsproxy.c:206
       ksys_unshare+0x708/0xf90 kernel/fork.c:2408
       __do_sys_unshare kernel/fork.c:2476 [inline]
       __se_sys_unshare kernel/fork.c:2474 [inline]
       __x64_sys_unshare+0x31/0x40 kernel/fork.c:2474
       do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Freed by task 69:
       save_stack+0x43/0xd0 mm/kasan/kasan.c:448
       set_track mm/kasan/kasan.c:460 [inline]
       __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
       kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
       __cache_free mm/slab.c:3498 [inline]
       kmem_cache_free+0x86/0x2d0 mm/slab.c:3756
       net_free net/core/net_namespace.c:399 [inline]
       net_drop_ns.part.14+0x11a/0x130 net/core/net_namespace.c:406
       net_drop_ns net/core/net_namespace.c:405 [inline]
       cleanup_net+0x6a1/0xb20 net/core/net_namespace.c:541
       process_one_work+0xc1e/0x1b50 kernel/workqueue.c:2145
       worker_thread+0x1cc/0x1440 kernel/workqueue.c:2279
       kthread+0x345/0x410 kernel/kthread.c:240
       ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412
      
      The buggy address belongs to the object at ffff88018a02c140
       which belongs to the cache net_namespace of size 8832
      The buggy address is located 8800 bytes inside of
       8832-byte region [ffff88018a02c140, ffff88018a02e3c0)
      The buggy address belongs to the page:
      page:ffffea0006280b00 count:1 mapcount:0 mapping:ffff88018a02c140 index:0x0 compound_mapcount: 0
      flags: 0x2fffc0000008100(slab|head)
      raw: 02fffc0000008100 ffff88018a02c140 0000000000000000 0000000100000001
      raw: ffffea00062a1320 ffffea0006268020 ffff8801d9bdde40 0000000000000000
      page dumped because: kasan: bad access detected
      
      Fixes: b922622e ("sock_diag: don't broadcast kernel sockets")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Craig Gallek <kraig@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9709020c
    • Geert Uytterhoeven's avatar
      sh_eth: Change platform check to CONFIG_ARCH_RENESAS · b16a960d
      Geert Uytterhoeven authored
      Since commit 9b5ba0df ("ARM: shmobile: Introduce ARCH_RENESAS")
      is CONFIG_ARCH_RENESAS a more appropriate platform check than the legacy
      CONFIG_ARCH_SHMOBILE, hence use the former.
      
      Renesas SuperH SH-Mobile SoCs are still covered by the CONFIG_CPU_SH4
      check.
      
      This will allow to drop ARCH_SHMOBILE on ARM and ARM64 in the near
      future.
      Signed-off-by: default avatarGeert Uytterhoeven <geert+renesas@glider.be>
      Acked-by: default avatarArnd Bergmann <arnd@arndb.de>
      Acked-by: default avatarSergei Shtylyov <sergei.shtylyov@cogentembedded.com>
      Reviewed-by: default avatarSimon Horman <horms+renesas@verge.net.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b16a960d
    • Florian Fainelli's avatar
      net: dsa: Do not register devlink for unused ports · 5447d786
      Florian Fainelli authored
      Even if commit 1d27732f ("net: dsa: setup and teardown ports") indicated
      that registering a devlink instance for unused ports is not a problem, and this
      is true, this can be confusing nonetheless, so let's not do it.
      
      Fixes: 1d27732f ("net: dsa: setup and teardown ports")
      Reported-by: default avatarJiri Pirko <jiri@resnulli.us>
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5447d786
    • Amritha Nambiar's avatar
      net: Fix a bug in removing queues from XPS map · 6358d49a
      Amritha Nambiar authored
      While removing queues from the XPS map, the individual CPU ID
      alone was used to index the CPUs map, this should be changed to also
      factor in the traffic class mapping for the CPU-to-queue lookup.
      
      Fixes: 184c449f ("net: Add support for XPS with QoS via traffic classes")
      Signed-off-by: default avatarAmritha Nambiar <amritha.nambiar@intel.com>
      Acked-by: default avatarAlexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6358d49a
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 6caf9fb3
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2018-05-18
      
      The following pull-request contains BPF updates for your *net* tree.
      
      The main changes are:
      
      1) Fix two bugs in sockmap, a use after free in sockmap's error path
         from sock_map_ctx_update_elem() where we mistakenly drop a reference
         we didn't take prior to that, and in the same function fix a race
         in bpf_prog_inc_not_zero() where we didn't use the progs from prior
         READ_ONCE(), from John.
      
      2) Reject program expansions once we figure out that their jump target
         which crosses patchlet boundaries could otherwise get truncated in
         insn->off space, from Daniel.
      
      3) Check the return value of fopen() in BPF selftest's test_verifier
         where we determine whether unpriv BPF is disabled, and iff we do
         fail there then just assume it is disabled. This fixes a segfault
         when used with older kernels, from Jesper.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6caf9fb3
  4. 17 May, 2018 20 commits
    • Daniel Borkmann's avatar
      bpf: fix truncated jump targets on heavy expansions · 050fad7c
      Daniel Borkmann authored
      Recently during testing, I ran into the following panic:
      
        [  207.892422] Internal error: Accessing user space memory outside uaccess.h routines: 96000004 [#1] SMP
        [  207.901637] Modules linked in: binfmt_misc [...]
        [  207.966530] CPU: 45 PID: 2256 Comm: test_verifier Tainted: G        W         4.17.0-rc3+ #7
        [  207.974956] Hardware name: FOXCONN R2-1221R-A4/C2U4N_MB, BIOS G31FB18A 03/31/2017
        [  207.982428] pstate: 60400005 (nZCv daif +PAN -UAO)
        [  207.987214] pc : bpf_skb_load_helper_8_no_cache+0x34/0xc0
        [  207.992603] lr : 0xffff000000bdb754
        [  207.996080] sp : ffff000013703ca0
        [  207.999384] x29: ffff000013703ca0 x28: 0000000000000001
        [  208.004688] x27: 0000000000000001 x26: 0000000000000000
        [  208.009992] x25: ffff000013703ce0 x24: ffff800fb4afcb00
        [  208.015295] x23: ffff00007d2f5038 x22: ffff00007d2f5000
        [  208.020599] x21: fffffffffeff2a6f x20: 000000000000000a
        [  208.025903] x19: ffff000009578000 x18: 0000000000000a03
        [  208.031206] x17: 0000000000000000 x16: 0000000000000000
        [  208.036510] x15: 0000ffff9de83000 x14: 0000000000000000
        [  208.041813] x13: 0000000000000000 x12: 0000000000000000
        [  208.047116] x11: 0000000000000001 x10: ffff0000089e7f18
        [  208.052419] x9 : fffffffffeff2a6f x8 : 0000000000000000
        [  208.057723] x7 : 000000000000000a x6 : 00280c6160000000
        [  208.063026] x5 : 0000000000000018 x4 : 0000000000007db6
        [  208.068329] x3 : 000000000008647a x2 : 19868179b1484500
        [  208.073632] x1 : 0000000000000000 x0 : ffff000009578c08
        [  208.078938] Process test_verifier (pid: 2256, stack limit = 0x0000000049ca7974)
        [  208.086235] Call trace:
        [  208.088672]  bpf_skb_load_helper_8_no_cache+0x34/0xc0
        [  208.093713]  0xffff000000bdb754
        [  208.096845]  bpf_test_run+0x78/0xf8
        [  208.100324]  bpf_prog_test_run_skb+0x148/0x230
        [  208.104758]  sys_bpf+0x314/0x1198
        [  208.108064]  el0_svc_naked+0x30/0x34
        [  208.111632] Code: 91302260 f9400001 f9001fa1 d2800001 (29500680)
        [  208.117717] ---[ end trace 263cb8a59b5bf29f ]---
      
      The program itself which caused this had a long jump over the whole
      instruction sequence where all of the inner instructions required
      heavy expansions into multiple BPF instructions. Additionally, I also
      had BPF hardening enabled which requires once more rewrites of all
      constant values in order to blind them. Each time we rewrite insns,
      bpf_adj_branches() would need to potentially adjust branch targets
      which cross the patchlet boundary to accommodate for the additional
      delta. Eventually that lead to the case where the target offset could
      not fit into insn->off's upper 0x7fff limit anymore where then offset
      wraps around becoming negative (in s16 universe), or vice versa
      depending on the jump direction.
      
      Therefore it becomes necessary to detect and reject any such occasions
      in a generic way for native eBPF and cBPF to eBPF migrations. For
      the latter we can simply check bounds in the bpf_convert_filter()'s
      BPF_EMIT_JMP helper macro and bail out once we surpass limits. The
      bpf_patch_insn_single() for native eBPF (and cBPF to eBPF in case
      of subsequent hardening) is a bit more complex in that we need to
      detect such truncations before hitting the bpf_prog_realloc(). Thus
      the latter is split into an extra pass to probe problematic offsets
      on the original program in order to fail early. With that in place
      and carefully tested I no longer hit the panic and the rewrites are
      rejected properly. The above example panic I've seen on bpf-next,
      though the issue itself is generic in that a guard against this issue
      in bpf seems more appropriate in this case.
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      050fad7c
    • John Fastabend's avatar
      bpf: parse and verdict prog attach may race with bpf map update · 96174560
      John Fastabend authored
      In the sockmap design BPF programs (SK_SKB_STREAM_PARSER,
      SK_SKB_STREAM_VERDICT and SK_MSG_VERDICT) are attached to the sockmap
      map type and when a sock is added to the map the programs are used by
      the socket. However, sockmap updates from both userspace and BPF
      programs can happen concurrently with the attach and detach of these
      programs.
      
      To resolve this we use the bpf_prog_inc_not_zero and a READ_ONCE()
      primitive to ensure the program pointer is not refeched and
      possibly NULL'd before the refcnt increment. This happens inside
      a RCU critical section so although the pointer reference in the map
      object may be NULL (by a concurrent detach operation) the reference
      from READ_ONCE will not be free'd until after grace period. This
      ensures the object returned by READ_ONCE() is valid through the
      RCU criticl section and safe to use as long as we "know" it may
      be free'd shortly.
      
      Daniel spotted a case in the sock update API where instead of using
      the READ_ONCE() program reference we used the pointer from the
      original map, stab->bpf_{verdict|parse|txmsg}. The problem with this
      is the logic checks the object returned from the READ_ONCE() is not
      NULL and then tries to reference the object again but using the
      above map pointer, which may have already been NULL'd by a parallel
      detach operation. If this happened bpf_porg_inc_not_zero could
      dereference a NULL pointer.
      
      Fix this by using variable returned by READ_ONCE() that is checked
      for NULL.
      
      Fixes: 2f857d04 ("bpf: sockmap, remove STRPARSER map_flags and add multi-map support")
      Reported-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Acked-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      96174560
    • John Fastabend's avatar
      bpf: sockmap update rollback on error can incorrectly dec prog refcnt · a593f708
      John Fastabend authored
      If the user were to only attach one of the parse or verdict programs
      then it is possible a subsequent sockmap update could incorrectly
      decrement the refcnt on the program. This happens because in the
      rollback logic, after an error, we have to decrement the program
      reference count when its been incremented. However, we only increment
      the program reference count if the user has both a verdict and a
      parse program. The reason for this is because, at least at the
      moment, both are required for any one to be meaningful. The problem
      fixed here is in the rollback path we decrement the program refcnt
      even if only one existing. But we never incremented the refcnt in
      the first place creating an imbalance.
      
      This patch fixes the error path to handle this case.
      
      Fixes: 2f857d04 ("bpf: sockmap, remove STRPARSER map_flags and add multi-map support")
      Reported-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Acked-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      a593f708
    • Willem de Bruijn's avatar
      net: test tailroom before appending to linear skb · 113f99c3
      Willem de Bruijn authored
      Device features may change during transmission. In particular with
      corking, a device may toggle scatter-gather in between allocating
      and writing to an skb.
      
      Do not unconditionally assume that !NETIF_F_SG at write time implies
      that the same held at alloc time and thus the skb has sufficient
      tailroom.
      
      This issue predates git history.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reported-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      113f99c3
    • David S. Miller's avatar
      Merge branch 'ip6_gre-Fixes-in-headroom-handling' · 374edea4
      David S. Miller authored
      Petr Machata says:
      
      ====================
      net: ip6_gre: Fixes in headroom handling
      
      This series mends some problems in headroom management in ip6_gre
      module. The current code base has the following three closely-related
      problems:
      
      - ip6gretap tunnels neglect to ensure there's enough writable headroom
        before pushing GRE headers.
      
      - ip6erspan does this, but assumes that dev->needed_headroom is primed.
        But that doesn't happen until ip6_tnl_xmit() is called later. Thus for
        the first packet, ip6erspan actually behaves like ip6gretap above.
      
      - ip6erspan shares some of the code with ip6gretap, including
        calculations of needed header length. While there is custom
        ERSPAN-specific code for calculating the headroom, the computed
        values are overwritten by the ip6gretap code.
      
      The first two issues lead to a kernel panic in situations where a packet
      is mirrored from a veth device to the device in question. They are
      fixed, respectively, in patches #1 and #2, which include the full panic
      trace and a reproducer.
      
      The rest of the patchset deals with the last issue. In patches #3 to #6,
      several functions are split up into reusable parts. Finally in patch #7
      these blocks are used to compose ERSPAN-specific callbacks where
      necessary to fix the hlen calculation.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      374edea4
    • Petr Machata's avatar
      net: ip6_gre: Fix ip6erspan hlen calculation · 2d665034
      Petr Machata authored
      Even though ip6erspan_tap_init() sets up hlen and tun_hlen according to
      what ERSPAN needs, it goes ahead to call ip6gre_tnl_link_config() which
      overwrites these settings with GRE-specific ones.
      
      Similarly for changelink callbacks, which are handled by
      ip6gre_changelink() calls ip6gre_tnl_change() calls
      ip6gre_tnl_link_config() as well.
      
      The difference ends up being 12 vs. 20 bytes, and this is generally not
      a problem, because a 12-byte request likely ends up allocating more and
      the extra 8 bytes are thus available. However correct it is not.
      
      So replace the newlink and changelink callbacks with an ERSPAN-specific
      ones, reusing the newly-introduced _common() functions.
      
      Fixes: 5a963eb6 ("ip6_gre: Add ERSPAN native tunnel support")
      Signed-off-by: default avatarPetr Machata <petrm@mellanox.com>
      Acked-by: default avatarWilliam Tu <u9012063@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2d665034
    • Petr Machata's avatar
      net: ip6_gre: Split up ip6gre_changelink() · c8632fc3
      Petr Machata authored
      Extract from ip6gre_changelink() a reusable function
      ip6gre_changelink_common(). This will allow introduction of
      ERSPAN-specific _changelink() function with not a lot of code
      duplication.
      
      Fixes: 5a963eb6 ("ip6_gre: Add ERSPAN native tunnel support")
      Signed-off-by: default avatarPetr Machata <petrm@mellanox.com>
      Acked-by: default avatarWilliam Tu <u9012063@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c8632fc3
    • Petr Machata's avatar
      net: ip6_gre: Split up ip6gre_newlink() · 7fa38a7c
      Petr Machata authored
      Extract from ip6gre_newlink() a reusable function
      ip6gre_newlink_common(). The ip6gre_tnl_link_config() call needs to be
      made customizable for ERSPAN, thus reorder it with calls to
      ip6_tnl_change_mtu() and dev_hold(), and extract the whole tail to the
      caller, ip6gre_newlink(). Thus enable an ERSPAN-specific _newlink()
      function without a lot of duplicity.
      
      Fixes: 5a963eb6 ("ip6_gre: Add ERSPAN native tunnel support")
      Signed-off-by: default avatarPetr Machata <petrm@mellanox.com>
      Acked-by: default avatarWilliam Tu <u9012063@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7fa38a7c
    • Petr Machata's avatar
      net: ip6_gre: Split up ip6gre_tnl_change() · a6465350
      Petr Machata authored
      Split a reusable function ip6gre_tnl_copy_tnl_parm() from
      ip6gre_tnl_change(). This will allow ERSPAN-specific code to
      reuse the common parts while customizing the behavior for ERSPAN.
      
      Fixes: 5a963eb6 ("ip6_gre: Add ERSPAN native tunnel support")
      Signed-off-by: default avatarPetr Machata <petrm@mellanox.com>
      Acked-by: default avatarWilliam Tu <u9012063@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a6465350
    • Petr Machata's avatar
      net: ip6_gre: Split up ip6gre_tnl_link_config() · a483373e
      Petr Machata authored
      The function ip6gre_tnl_link_config() is used for setting up
      configuration of both ip6gretap and ip6erspan tunnels. Split the
      function into the common part and the route-lookup part. The latter then
      takes the calculated header length as an argument. This split will allow
      the patches down the line to sneak in a custom header length computation
      for the ERSPAN tunnel.
      
      Fixes: 5a963eb6 ("ip6_gre: Add ERSPAN native tunnel support")
      Signed-off-by: default avatarPetr Machata <petrm@mellanox.com>
      Acked-by: default avatarWilliam Tu <u9012063@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a483373e
    • Petr Machata's avatar
      net: ip6_gre: Fix headroom request in ip6erspan_tunnel_xmit() · 5691484d
      Petr Machata authored
      dev->needed_headroom is not primed until ip6_tnl_xmit(), so it starts
      out zero. Thus the call to skb_cow_head() fails to actually make sure
      there's enough headroom to push the ERSPAN headers to. That can lead to
      the panic cited below. (Reproducer below that).
      
      Fix by requesting either needed_headroom if already primed, or just the
      bare minimum needed for the header otherwise.
      
      [  190.703567] kernel BUG at net/core/skbuff.c:104!
      [  190.708384] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
      [  190.714007] Modules linked in: act_mirred cls_matchall ip6_gre ip6_tunnel tunnel6 gre sch_ingress vrf veth x86_pkg_temp_thermal mlx_platform nfsd e1000e leds_mlxcpld
      [  190.728975] CPU: 1 PID: 959 Comm: kworker/1:2 Not tainted 4.17.0-rc4-net_master-custom-139 #10
      [  190.737647] Hardware name: Mellanox Technologies Ltd. "MSN2410-CB2F"/"SA000874", BIOS 4.6.5 03/08/2016
      [  190.747006] Workqueue: ipv6_addrconf addrconf_dad_work
      [  190.752222] RIP: 0010:skb_panic+0xc3/0x100
      [  190.756358] RSP: 0018:ffff8801d54072f0 EFLAGS: 00010282
      [  190.761629] RAX: 0000000000000085 RBX: ffff8801c1a8ecc0 RCX: 0000000000000000
      [  190.768830] RDX: 0000000000000085 RSI: dffffc0000000000 RDI: ffffed003aa80e54
      [  190.776025] RBP: ffff8801bd1ec5a0 R08: ffffed003aabce19 R09: ffffed003aabce19
      [  190.783226] R10: 0000000000000001 R11: ffffed003aabce18 R12: ffff8801bf695dbe
      [  190.790418] R13: 0000000000000084 R14: 00000000000006c0 R15: ffff8801bf695dc8
      [  190.797621] FS:  0000000000000000(0000) GS:ffff8801d5400000(0000) knlGS:0000000000000000
      [  190.805786] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  190.811582] CR2: 000055fa929aced0 CR3: 0000000003228004 CR4: 00000000001606e0
      [  190.818790] Call Trace:
      [  190.821264]  <IRQ>
      [  190.823314]  ? ip6erspan_tunnel_xmit+0x5e4/0x1982 [ip6_gre]
      [  190.828940]  ? ip6erspan_tunnel_xmit+0x5e4/0x1982 [ip6_gre]
      [  190.834562]  skb_push+0x78/0x90
      [  190.837749]  ip6erspan_tunnel_xmit+0x5e4/0x1982 [ip6_gre]
      [  190.843219]  ? ip6gre_tunnel_ioctl+0xd90/0xd90 [ip6_gre]
      [  190.848577]  ? debug_check_no_locks_freed+0x210/0x210
      [  190.853679]  ? debug_check_no_locks_freed+0x210/0x210
      [  190.858783]  ? print_irqtrace_events+0x120/0x120
      [  190.863451]  ? sched_clock_cpu+0x18/0x210
      [  190.867496]  ? cyc2ns_read_end+0x10/0x10
      [  190.871474]  ? skb_network_protocol+0x76/0x200
      [  190.875977]  dev_hard_start_xmit+0x137/0x770
      [  190.880317]  ? do_raw_spin_trylock+0x6d/0xa0
      [  190.884624]  sch_direct_xmit+0x2ef/0x5d0
      [  190.888589]  ? pfifo_fast_dequeue+0x3fa/0x670
      [  190.892994]  ? pfifo_fast_change_tx_queue_len+0x810/0x810
      [  190.898455]  ? __lock_is_held+0xa0/0x160
      [  190.902422]  __qdisc_run+0x39e/0xfc0
      [  190.906041]  ? _raw_spin_unlock+0x29/0x40
      [  190.910090]  ? pfifo_fast_enqueue+0x24b/0x3e0
      [  190.914501]  ? sch_direct_xmit+0x5d0/0x5d0
      [  190.918658]  ? pfifo_fast_dequeue+0x670/0x670
      [  190.923047]  ? __dev_queue_xmit+0x172/0x1770
      [  190.927365]  ? preempt_count_sub+0xf/0xd0
      [  190.931421]  __dev_queue_xmit+0x410/0x1770
      [  190.935553]  ? ___slab_alloc+0x605/0x930
      [  190.939524]  ? print_irqtrace_events+0x120/0x120
      [  190.944186]  ? memcpy+0x34/0x50
      [  190.947364]  ? netdev_pick_tx+0x1c0/0x1c0
      [  190.951428]  ? __skb_clone+0x2fd/0x3d0
      [  190.955218]  ? __copy_skb_header+0x270/0x270
      [  190.959537]  ? rcu_read_lock_sched_held+0x93/0xa0
      [  190.964282]  ? kmem_cache_alloc+0x344/0x4d0
      [  190.968520]  ? cyc2ns_read_end+0x10/0x10
      [  190.972495]  ? skb_clone+0x123/0x230
      [  190.976112]  ? skb_split+0x820/0x820
      [  190.979747]  ? tcf_mirred+0x554/0x930 [act_mirred]
      [  190.984582]  tcf_mirred+0x554/0x930 [act_mirred]
      [  190.989252]  ? tcf_mirred_act_wants_ingress.part.2+0x10/0x10 [act_mirred]
      [  190.996109]  ? __lock_acquire+0x706/0x26e0
      [  191.000239]  ? sched_clock_cpu+0x18/0x210
      [  191.004294]  tcf_action_exec+0xcf/0x2a0
      [  191.008179]  tcf_classify+0xfa/0x340
      [  191.011794]  __netif_receive_skb_core+0x8e1/0x1c60
      [  191.016630]  ? debug_check_no_locks_freed+0x210/0x210
      [  191.021732]  ? nf_ingress+0x500/0x500
      [  191.025458]  ? process_backlog+0x347/0x4b0
      [  191.029619]  ? print_irqtrace_events+0x120/0x120
      [  191.034302]  ? lock_acquire+0xd8/0x320
      [  191.038089]  ? process_backlog+0x1b6/0x4b0
      [  191.042246]  ? process_backlog+0xc2/0x4b0
      [  191.046303]  process_backlog+0xc2/0x4b0
      [  191.050189]  net_rx_action+0x5cc/0x980
      [  191.053991]  ? napi_complete_done+0x2c0/0x2c0
      [  191.058386]  ? mark_lock+0x13d/0xb40
      [  191.062001]  ? clockevents_program_event+0x6b/0x1d0
      [  191.066922]  ? print_irqtrace_events+0x120/0x120
      [  191.071593]  ? __lock_is_held+0xa0/0x160
      [  191.075566]  __do_softirq+0x1d4/0x9d2
      [  191.079282]  ? ip6_finish_output2+0x524/0x1460
      [  191.083771]  do_softirq_own_stack+0x2a/0x40
      [  191.087994]  </IRQ>
      [  191.090130]  do_softirq.part.13+0x38/0x40
      [  191.094178]  __local_bh_enable_ip+0x135/0x190
      [  191.098591]  ip6_finish_output2+0x54d/0x1460
      [  191.102916]  ? ip6_forward_finish+0x2f0/0x2f0
      [  191.107314]  ? ip6_mtu+0x3c/0x2c0
      [  191.110674]  ? ip6_finish_output+0x2f8/0x650
      [  191.114992]  ? ip6_output+0x12a/0x500
      [  191.118696]  ip6_output+0x12a/0x500
      [  191.122223]  ? ip6_route_dev_notify+0x5b0/0x5b0
      [  191.126807]  ? ip6_finish_output+0x650/0x650
      [  191.131120]  ? ip6_fragment+0x1a60/0x1a60
      [  191.135182]  ? icmp6_dst_alloc+0x26e/0x470
      [  191.139317]  mld_sendpack+0x672/0x830
      [  191.143021]  ? igmp6_mcf_seq_next+0x2f0/0x2f0
      [  191.147429]  ? __local_bh_enable_ip+0x77/0x190
      [  191.151913]  ipv6_mc_dad_complete+0x47/0x90
      [  191.156144]  addrconf_dad_completed+0x561/0x720
      [  191.160731]  ? addrconf_rs_timer+0x3a0/0x3a0
      [  191.165036]  ? mark_held_locks+0xc9/0x140
      [  191.169095]  ? __local_bh_enable_ip+0x77/0x190
      [  191.173570]  ? addrconf_dad_work+0x50d/0xa20
      [  191.177886]  ? addrconf_dad_work+0x529/0xa20
      [  191.182194]  addrconf_dad_work+0x529/0xa20
      [  191.186342]  ? addrconf_dad_completed+0x720/0x720
      [  191.191088]  ? __lock_is_held+0xa0/0x160
      [  191.195059]  ? process_one_work+0x45d/0xe20
      [  191.199302]  ? process_one_work+0x51e/0xe20
      [  191.203531]  ? rcu_read_lock_sched_held+0x93/0xa0
      [  191.208279]  process_one_work+0x51e/0xe20
      [  191.212340]  ? pwq_dec_nr_in_flight+0x200/0x200
      [  191.216912]  ? get_lock_stats+0x4b/0xf0
      [  191.220788]  ? preempt_count_sub+0xf/0xd0
      [  191.224844]  ? worker_thread+0x219/0x860
      [  191.228823]  ? do_raw_spin_trylock+0x6d/0xa0
      [  191.233142]  worker_thread+0xeb/0x860
      [  191.236848]  ? process_one_work+0xe20/0xe20
      [  191.241095]  kthread+0x206/0x300
      [  191.244352]  ? process_one_work+0xe20/0xe20
      [  191.248587]  ? kthread_stop+0x570/0x570
      [  191.252459]  ret_from_fork+0x3a/0x50
      [  191.256082] Code: 14 3e ff 8b 4b 78 55 4d 89 f9 41 56 41 55 48 c7 c7 a0 cf db 82 41 54 44 8b 44 24 2c 48 8b 54 24 30 48 8b 74 24 20 e8 16 94 13 ff <0f> 0b 48 c7 c7 60 8e 1f 85 48 83 c4 20 e8 55 ef a6 ff 89 74 24
      [  191.275327] RIP: skb_panic+0xc3/0x100 RSP: ffff8801d54072f0
      [  191.281024] ---[ end trace 7ea51094e099e006 ]---
      [  191.285724] Kernel panic - not syncing: Fatal exception in interrupt
      [  191.292168] Kernel Offset: disabled
      [  191.295697] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
      
      Reproducer:
      
      	ip link add h1 type veth peer name swp1
      	ip link add h3 type veth peer name swp3
      
      	ip link set dev h1 up
      	ip address add 192.0.2.1/28 dev h1
      
      	ip link add dev vh3 type vrf table 20
      	ip link set dev h3 master vh3
      	ip link set dev vh3 up
      	ip link set dev h3 up
      
      	ip link set dev swp3 up
      	ip address add dev swp3 2001:db8:2::1/64
      
      	ip link set dev swp1 up
      	tc qdisc add dev swp1 clsact
      
      	ip link add name gt6 type ip6erspan \
      		local 2001:db8:2::1 remote 2001:db8:2::2 oseq okey 123
      	ip link set dev gt6 up
      
      	sleep 1
      
      	tc filter add dev swp1 ingress pref 1000 matchall skip_hw \
      		action mirred egress mirror dev gt6
      	ping -I h1 192.0.2.2
      
      Fixes: e41c7c68 ("ip6erspan: make sure enough headroom at xmit.")
      Signed-off-by: default avatarPetr Machata <petrm@mellanox.com>
      Acked-by: default avatarWilliam Tu <u9012063@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5691484d
    • Petr Machata's avatar
      net: ip6_gre: Request headroom in __gre6_xmit() · 01b8d064
      Petr Machata authored
      __gre6_xmit() pushes GRE headers before handing over to ip6_tnl_xmit()
      for generic IP-in-IP processing. However it doesn't make sure that there
      is enough headroom to push the header to. That can lead to the panic
      cited below. (Reproducer below that).
      
      Fix by requesting either needed_headroom if already primed, or just the
      bare minimum needed for the header otherwise.
      
      [  158.576725] kernel BUG at net/core/skbuff.c:104!
      [  158.581510] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
      [  158.587174] Modules linked in: act_mirred cls_matchall ip6_gre ip6_tunnel tunnel6 gre sch_ingress vrf veth x86_pkg_temp_thermal mlx_platform nfsd e1000e leds_mlxcpld
      [  158.602268] CPU: 1 PID: 16 Comm: ksoftirqd/1 Not tainted 4.17.0-rc4-net_master-custom-139 #10
      [  158.610938] Hardware name: Mellanox Technologies Ltd. "MSN2410-CB2F"/"SA000874", BIOS 4.6.5 03/08/2016
      [  158.620426] RIP: 0010:skb_panic+0xc3/0x100
      [  158.624586] RSP: 0018:ffff8801d3f27110 EFLAGS: 00010286
      [  158.629882] RAX: 0000000000000082 RBX: ffff8801c02cc040 RCX: 0000000000000000
      [  158.637127] RDX: 0000000000000082 RSI: dffffc0000000000 RDI: ffffed003a7e4e18
      [  158.644366] RBP: ffff8801bfec8020 R08: ffffed003aabce19 R09: ffffed003aabce19
      [  158.651574] R10: 000000000000000b R11: ffffed003aabce18 R12: ffff8801c364de66
      [  158.658786] R13: 000000000000002c R14: 00000000000000c0 R15: ffff8801c364de68
      [  158.666007] FS:  0000000000000000(0000) GS:ffff8801d5400000(0000) knlGS:0000000000000000
      [  158.674212] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  158.680036] CR2: 00007f4b3702dcd0 CR3: 0000000003228002 CR4: 00000000001606e0
      [  158.687228] Call Trace:
      [  158.689752]  ? __gre6_xmit+0x246/0xd80 [ip6_gre]
      [  158.694475]  ? __gre6_xmit+0x246/0xd80 [ip6_gre]
      [  158.699141]  skb_push+0x78/0x90
      [  158.702344]  __gre6_xmit+0x246/0xd80 [ip6_gre]
      [  158.706872]  ip6gre_tunnel_xmit+0x3bc/0x610 [ip6_gre]
      [  158.711992]  ? __gre6_xmit+0xd80/0xd80 [ip6_gre]
      [  158.716668]  ? debug_check_no_locks_freed+0x210/0x210
      [  158.721761]  ? print_irqtrace_events+0x120/0x120
      [  158.726461]  ? sched_clock_cpu+0x18/0x210
      [  158.730572]  ? sched_clock_cpu+0x18/0x210
      [  158.734692]  ? cyc2ns_read_end+0x10/0x10
      [  158.738705]  ? skb_network_protocol+0x76/0x200
      [  158.743216]  ? netif_skb_features+0x1b2/0x550
      [  158.747648]  dev_hard_start_xmit+0x137/0x770
      [  158.752010]  sch_direct_xmit+0x2ef/0x5d0
      [  158.755992]  ? pfifo_fast_dequeue+0x3fa/0x670
      [  158.760460]  ? pfifo_fast_change_tx_queue_len+0x810/0x810
      [  158.765975]  ? __lock_is_held+0xa0/0x160
      [  158.770002]  __qdisc_run+0x39e/0xfc0
      [  158.773673]  ? _raw_spin_unlock+0x29/0x40
      [  158.777781]  ? pfifo_fast_enqueue+0x24b/0x3e0
      [  158.782191]  ? sch_direct_xmit+0x5d0/0x5d0
      [  158.786372]  ? pfifo_fast_dequeue+0x670/0x670
      [  158.790818]  ? __dev_queue_xmit+0x172/0x1770
      [  158.795195]  ? preempt_count_sub+0xf/0xd0
      [  158.799313]  __dev_queue_xmit+0x410/0x1770
      [  158.803512]  ? ___slab_alloc+0x605/0x930
      [  158.807525]  ? ___slab_alloc+0x605/0x930
      [  158.811540]  ? memcpy+0x34/0x50
      [  158.814768]  ? netdev_pick_tx+0x1c0/0x1c0
      [  158.818895]  ? __skb_clone+0x2fd/0x3d0
      [  158.822712]  ? __copy_skb_header+0x270/0x270
      [  158.827079]  ? rcu_read_lock_sched_held+0x93/0xa0
      [  158.831903]  ? kmem_cache_alloc+0x344/0x4d0
      [  158.836199]  ? skb_clone+0x123/0x230
      [  158.839869]  ? skb_split+0x820/0x820
      [  158.843521]  ? tcf_mirred+0x554/0x930 [act_mirred]
      [  158.848407]  tcf_mirred+0x554/0x930 [act_mirred]
      [  158.853104]  ? tcf_mirred_act_wants_ingress.part.2+0x10/0x10 [act_mirred]
      [  158.860005]  ? __lock_acquire+0x706/0x26e0
      [  158.864162]  ? mark_lock+0x13d/0xb40
      [  158.867832]  tcf_action_exec+0xcf/0x2a0
      [  158.871736]  tcf_classify+0xfa/0x340
      [  158.875402]  __netif_receive_skb_core+0x8e1/0x1c60
      [  158.880334]  ? nf_ingress+0x500/0x500
      [  158.884059]  ? process_backlog+0x347/0x4b0
      [  158.888241]  ? lock_acquire+0xd8/0x320
      [  158.892050]  ? process_backlog+0x1b6/0x4b0
      [  158.896228]  ? process_backlog+0xc2/0x4b0
      [  158.900291]  process_backlog+0xc2/0x4b0
      [  158.904210]  net_rx_action+0x5cc/0x980
      [  158.908047]  ? napi_complete_done+0x2c0/0x2c0
      [  158.912525]  ? rcu_read_unlock+0x80/0x80
      [  158.916534]  ? __lock_is_held+0x34/0x160
      [  158.920541]  __do_softirq+0x1d4/0x9d2
      [  158.924308]  ? trace_event_raw_event_irq_handler_exit+0x140/0x140
      [  158.930515]  run_ksoftirqd+0x1d/0x40
      [  158.934152]  smpboot_thread_fn+0x32b/0x690
      [  158.938299]  ? sort_range+0x20/0x20
      [  158.941842]  ? preempt_count_sub+0xf/0xd0
      [  158.945940]  ? schedule+0x5b/0x140
      [  158.949412]  kthread+0x206/0x300
      [  158.952689]  ? sort_range+0x20/0x20
      [  158.956249]  ? kthread_stop+0x570/0x570
      [  158.960164]  ret_from_fork+0x3a/0x50
      [  158.963823] Code: 14 3e ff 8b 4b 78 55 4d 89 f9 41 56 41 55 48 c7 c7 a0 cf db 82 41 54 44 8b 44 24 2c 48 8b 54 24 30 48 8b 74 24 20 e8 16 94 13 ff <0f> 0b 48 c7 c7 60 8e 1f 85 48 83 c4 20 e8 55 ef a6 ff 89 74 24
      [  158.983235] RIP: skb_panic+0xc3/0x100 RSP: ffff8801d3f27110
      [  158.988935] ---[ end trace 5af56ee845aa6cc8 ]---
      [  158.993641] Kernel panic - not syncing: Fatal exception in interrupt
      [  159.000176] Kernel Offset: disabled
      [  159.003767] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
      
      Reproducer:
      
      	ip link add h1 type veth peer name swp1
      	ip link add h3 type veth peer name swp3
      
      	ip link set dev h1 up
      	ip address add 192.0.2.1/28 dev h1
      
      	ip link add dev vh3 type vrf table 20
      	ip link set dev h3 master vh3
      	ip link set dev vh3 up
      	ip link set dev h3 up
      
      	ip link set dev swp3 up
      	ip address add dev swp3 2001:db8:2::1/64
      
      	ip link set dev swp1 up
      	tc qdisc add dev swp1 clsact
      
      	ip link add name gt6 type ip6gretap \
      		local 2001:db8:2::1 remote 2001:db8:2::2
      	ip link set dev gt6 up
      
      	sleep 1
      
      	tc filter add dev swp1 ingress pref 1000 matchall skip_hw \
      		action mirred egress mirror dev gt6
      	ping -I h1 192.0.2.2
      
      Fixes: c12b395a ("gre: Support GRE over IPv6")
      Signed-off-by: default avatarPetr Machata <petrm@mellanox.com>
      Acked-by: default avatarWilliam Tu <u9012063@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      01b8d064
    • Jesper Dangaard Brouer's avatar
      selftests/bpf: check return value of fopen in test_verifier.c · deea8122
      Jesper Dangaard Brouer authored
      Commit 0a674874 ("selftests/bpf: Only run tests if !bpf_disabled")
      forgot to check return value of fopen.
      
      This caused some confusion, when running test_verifier (from
      tools/testing/selftests/bpf/) on an older kernel (< v4.4) as it will
      simply seqfault.
      
      This fix avoids the segfault and prints an error, but allow program to
      continue.  Given the sysctl was introduced in 1be7f75d ("bpf:
      enable non-root eBPF programs"), we know that the running kernel
      cannot support unpriv, thus continue with unpriv_disabled = true.
      
      Fixes: 0a674874 ("selftests/bpf: Only run tests if !bpf_disabled")
      Signed-off-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      deea8122
    • William Tu's avatar
      erspan: fix invalid erspan version. · 02f99df1
      William Tu authored
      ERSPAN only support version 1 and 2.  When packets send to an
      erspan device which does not have proper version number set,
      drop the packet.  In real case, we observe multicast packets
      sent to the erspan pernet device, erspan0, which does not have
      erspan version configured.
      Reported-by: default avatarGreg Rose <gvrose8192@gmail.com>
      Signed-off-by: default avatarWilliam Tu <u9012063@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      02f99df1
    • David S. Miller's avatar
      Merge branch 'ibmvnic-Fix-bugs-and-memory-leaks' · d13d170c
      David S. Miller authored
      Thomas Falcon says:
      
      ====================
      ibmvnic: Fix bugs and memory leaks
      
      This is a small patch series fixing up some bugs and memory leaks
      in the ibmvnic driver. The first fix frees up previously allocated
      memory that should be freed in case of an error. The second fixes
      a reset case that was failing due to TX/RX queue IRQ's being
      erroneously disabled without being enabled again. The final patch
      fixes incorrect reallocated of statistics buffers during a device
      reset, resulting in loss of statistics information and a memory leak.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d13d170c
    • Thomas Falcon's avatar
      ibmvnic: Fix statistics buffers memory leak · 07184213
      Thomas Falcon authored
      Move initialization of statistics buffers from ibmvnic_init function
      into ibmvnic_probe. In the current state, ibmvnic_init will be called
      again during a device reset, resulting in the allocation of new
      buffers without freeing the old ones.
      Signed-off-by: default avatarThomas Falcon <tlfalcon@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      07184213
    • Thomas Falcon's avatar
      ibmvnic: Fix non-fatal firmware error reset · 134bbe7f
      Thomas Falcon authored
      It is not necessary to disable interrupt lines here during a reset
      to handle a non-fatal firmware error. Move that call within the code
      block that handles the other cases that do require interrupts to be
      disabled and re-enabled.
      Signed-off-by: default avatarThomas Falcon <tlfalcon@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      134bbe7f
    • Thomas Falcon's avatar
      ibmvnic: Free coherent DMA memory if FW map failed · 4cf2ddf3
      Thomas Falcon authored
      If the firmware map fails for whatever reason, remember to free
      up the memory after.
      Signed-off-by: default avatarThomas Falcon <tlfalcon@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4cf2ddf3
    • David Ahern's avatar
      net/ipv4: Initialize proto and ports in flow struct · 5a847a6e
      David Ahern authored
      Updating the FIB tracepoint for the recent change to allow rules using
      the protocol and ports exposed a few places where the entries in the flow
      struct are not initialized.
      
      For __fib_validate_source add the call to fib4_rules_early_flow_dissect
      since it is invoked for the input path. For netfilter, add the memset on
      the flow struct to avoid future problems like this. In ip_route_input_slow
      need to set the fields if the skb dissection does not happen.
      
      Fixes: bfff4862 ("net: fib_rules: support for match on ip_proto, sport and dport")
      Signed-off-by: default avatarDavid Ahern <dsahern@gmail.com>
      Acked-by: default avatarRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5a847a6e
    • Matt Mullins's avatar
      tls: don't use stack memory in a scatterlist · 8ab6ffba
      Matt Mullins authored
      scatterlist code expects virt_to_page() to work, which fails with
      CONFIG_VMAP_STACK=y.
      
      Fixes: c46234eb ("tls: RX path for ktls")
      Signed-off-by: default avatarMatt Mullins <mmullins@fb.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8ab6ffba
  5. 16 May, 2018 10 commits