1. 14 May, 2019 4 commits
  2. 13 May, 2019 17 commits
    • bpf: sockmap fix msg->sg.size account on ingress skb · cabede8b
      John Fastabend authored
      When converting an skb to msg->sg, we forget to set the size after
      the latest ktls/tls code conversion. This path can be reached by
      doing a redirect into the ingress path from the BPF skb sock recv
      hook; trying to read the size then fails.
      
      Fix this by setting the size.
      
      Fixes: 604326b4 ("bpf, sockmap: convert to generic sk_msg interface")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: sockmap remove duplicate queue free · c42253cc
      John Fastabend authored
      In the tcp bpf remove path we free the cork list and purge the
      ingress msg list. However, we do this before the refcount reaches
      zero, so some other access could still be in progress. In this case
      (tcp close and/or tcp_unhash) we happen to also hold the sock lock,
      so no such path exists, but let's fix it anyway; otherwise it is
      extremely fragile and breaks the reference counting rules. We also
      already check the cork list and ingress msg queue and free them once
      the refcount reaches zero, so it is wasteful to check twice.
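
      A minimal userspace model of the rule this patch restores, assuming a
      psock reduced to a plain integer refcount and two heap buffers standing
      in for the cork list and ingress msg queue: the remove path only drops
      its reference, and the queues are freed exactly once, when the last
      reference goes away. All names are hypothetical (this is not the
      kernel's sk_psock API), and the model is single-threaded, so it uses no
      atomics or locking.

```c
#include <stdlib.h>

/* Hypothetical, simplified stand-in for a psock; not struct sk_psock. */
struct psock_model {
	int refcnt;        /* plain int: this single-threaded model needs no atomics */
	char *cork;        /* stands in for the cork list */
	char *ingress_msg; /* stands in for the ingress msg queue */
	int frees;         /* how many times the queues were released */
};

static struct psock_model *psock_model_new(void)
{
	struct psock_model *p = calloc(1, sizeof(*p));

	if (!p)
		return NULL;
	p->refcnt = 1;
	p->cork = malloc(16);
	p->ingress_msg = malloc(16);
	return p;
}

static void psock_model_get(struct psock_model *p)
{
	p->refcnt++;
}

/* The only place the queues are freed: when the last reference is dropped.
 * The model keeps the struct itself alive so a caller can inspect it;
 * a real implementation would release the psock object here as well. */
static int psock_model_put(struct psock_model *p)
{
	if (--p->refcnt > 0)
		return p->refcnt;
	free(p->cork);
	free(p->ingress_msg);
	p->cork = NULL;
	p->ingress_msg = NULL;
	p->frees++;
	return 0;
}

/* After the fix, the remove path only drops its own reference. */
static int psock_model_remove(struct psock_model *p)
{
	return psock_model_put(p);
}
```

      While another user still holds a reference, remove leaves the queues
      intact; they are released only by the final put.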
      
      Fixes: 604326b4 ("bpf, sockmap: convert to generic sk_msg interface")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: sockmap, only stop/flush strp if it was enabled at some point · 01489436
      John Fastabend authored
      If we call strp_done() on a parser that has never been initialized
      (for example, because the sockmap user is only using the TX side),
      we get the following error:
      
        [  883.422081] WARNING: CPU: 1 PID: 208 at kernel/workqueue.c:3030 __flush_work+0x1ca/0x1e0
        ...
        [  883.422095] Workqueue: events sk_psock_destroy_deferred
        [  883.422097] RIP: 0010:__flush_work+0x1ca/0x1e0
      
      This had been wrapped in 'if (psock->parser.enabled)' logic, which
      was broken: strp_done() was never actually called, because the
      strp_stop() done earlier in the teardown logic sets parser.enabled
      to false. This could result in a use-after-free if work was still
      in the queue, and was resolved by commit 1d79895a ("sk_msg: Always
      cancel strp work before freeing the psock"). However, calling
      strp_stop(), as done by the patch marked in the Fixes tag, is only
      useful if we never initialized a strp parser program and never
      initialized the strp to begin with, because if we had initialized
      a stream parser, strp_stop() would already have been called by
      sk_psock_drop() earlier in the teardown process. Forcing the strp
      to stop gets us past the WARNING in strp_done() that checks the
      stopped flag, but calling cancel_work_sync() on work that has never
      been initialized is also wrong and generates the warning above.
      
      To fix this, check whether the parser program exists. If it does,
      then the strp work has been initialized and must be synced and
      cancelled before freeing any structures. If no program exists, we
      never initialized the stream parser in the first place, so skip the
      sync/cancel logic implemented by strp_done().
      
      Finally, remove the strp_stop(); it is not needed, and in the case
      where we are using the stream parser it has already been called.
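
      The teardown decision above can be sketched as a small model, assuming
      that a non-NULL parser program implies the strp work was initialized
      and that sk_psock_drop() has already stopped the parser in that case.
      The structs and functions are hypothetical stand-ins, not the kernel's
      struct strparser or skmsg code; where the kernel would WARN, the model
      returns a negative value instead.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's struct strparser / sk_psock. */
struct strp_model {
	bool work_initialized; /* true only if a parser program was attached */
	bool stopped;
};

struct psock_strp_model {
	const void *skb_parser; /* non-NULL when an RX parser program exists */
	struct strp_model strp;
};

/* Models strp_done(): both conditions below trigger a WARN in the kernel. */
static int strp_done_model(struct strp_model *strp)
{
	if (!strp->stopped)
		return -2; /* the stopped-flag check mentioned above */
	if (!strp->work_initialized)
		return -1; /* cancel_work_sync() on never-initialized work */
	return 0;
}

/* Old teardown: unconditionally stop, then "done" the parser. */
static int teardown_old(struct psock_strp_model *psock)
{
	psock->strp.stopped = true;
	return strp_done_model(&psock->strp);
}

/* Fixed teardown: only sync/cancel parser work if a parser program exists. */
static int teardown_fixed(struct psock_strp_model *psock)
{
	if (psock->skb_parser)
		return strp_done_model(&psock->strp);
	return 0;
}
```

      A TX-only psock makes the old path hit the uninitialized-work case,
      while the fixed path skips it entirely.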
      
      Fixes: e8e34377 ("bpf: Stop the psock parser before canceling its work")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: mark bpf_event_notify and bpf_event_init as static · 390e99cf
      Stanislav Fomichev authored
      Neither of them is declared in a header or used outside of
      bpf_trace.c.
      
      Fixes: a38d1107 ("bpf: support raw tracepoints in modules")
      Signed-off-by: Stanislav Fomichev <sdf@google.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: devmap: fix use-after-free Read in __dev_map_entry_free · 2baae354
      Eric Dumazet authored
      synchronize_rcu() is fine when the RCU callbacks only need to free
      memory (kfree_rcu() or RCU callbacks that call kfree() directly).

      __dev_map_entry_free() is a bit more complex, so we need to make
      sure that all queued __dev_map_entry_free() callbacks have completed.
      
      syzbot report:
      
      BUG: KASAN: use-after-free in dev_map_flush_old kernel/bpf/devmap.c:365
      [inline]
      BUG: KASAN: use-after-free in __dev_map_entry_free+0x2a8/0x300
      kernel/bpf/devmap.c:379
      Read of size 8 at addr ffff8801b8da38c8 by task ksoftirqd/1/18
      
      CPU: 1 PID: 18 Comm: ksoftirqd/1 Not tainted 4.17.0+ #39
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      Call Trace:
        __dump_stack lib/dump_stack.c:77 [inline]
        dump_stack+0x1b9/0x294 lib/dump_stack.c:113
        print_address_description+0x6c/0x20b mm/kasan/report.c:256
        kasan_report_error mm/kasan/report.c:354 [inline]
        kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
        __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
        dev_map_flush_old kernel/bpf/devmap.c:365 [inline]
        __dev_map_entry_free+0x2a8/0x300 kernel/bpf/devmap.c:379
        __rcu_reclaim kernel/rcu/rcu.h:178 [inline]
        rcu_do_batch kernel/rcu/tree.c:2558 [inline]
        invoke_rcu_callbacks kernel/rcu/tree.c:2818 [inline]
        __rcu_process_callbacks kernel/rcu/tree.c:2785 [inline]
        rcu_process_callbacks+0xe9d/0x1760 kernel/rcu/tree.c:2802
        __do_softirq+0x2e0/0xaf5 kernel/softirq.c:284
        run_ksoftirqd+0x86/0x100 kernel/softirq.c:645
        smpboot_thread_fn+0x417/0x870 kernel/smpboot.c:164
        kthread+0x345/0x410 kernel/kthread.c:240
        ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412
      
      Allocated by task 6675:
        save_stack+0x43/0xd0 mm/kasan/kasan.c:448
        set_track mm/kasan/kasan.c:460 [inline]
        kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
        kmem_cache_alloc_trace+0x152/0x780 mm/slab.c:3620
        kmalloc include/linux/slab.h:513 [inline]
        kzalloc include/linux/slab.h:706 [inline]
        dev_map_alloc+0x208/0x7f0 kernel/bpf/devmap.c:102
        find_and_alloc_map kernel/bpf/syscall.c:129 [inline]
        map_create+0x393/0x1010 kernel/bpf/syscall.c:453
        __do_sys_bpf kernel/bpf/syscall.c:2351 [inline]
        __se_sys_bpf kernel/bpf/syscall.c:2328 [inline]
        __x64_sys_bpf+0x303/0x510 kernel/bpf/syscall.c:2328
        do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:290
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Freed by task 26:
        save_stack+0x43/0xd0 mm/kasan/kasan.c:448
        set_track mm/kasan/kasan.c:460 [inline]
        __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
        kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
        __cache_free mm/slab.c:3498 [inline]
        kfree+0xd9/0x260 mm/slab.c:3813
        dev_map_free+0x4fa/0x670 kernel/bpf/devmap.c:191
        bpf_map_free_deferred+0xba/0xf0 kernel/bpf/syscall.c:262
        process_one_work+0xc64/0x1b70 kernel/workqueue.c:2153
        worker_thread+0x181/0x13a0 kernel/workqueue.c:2296
        kthread+0x345/0x410 kernel/kthread.c:240
        ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412
      
      The buggy address belongs to the object at ffff8801b8da37c0
        which belongs to the cache kmalloc-512 of size 512
      The buggy address is located 264 bytes inside of
        512-byte region [ffff8801b8da37c0, ffff8801b8da39c0)
      The buggy address belongs to the page:
      page:ffffea0006e368c0 count:1 mapcount:0 mapping:ffff8801da800940
      index:0xffff8801b8da3540
      flags: 0x2fffc0000000100(slab)
      raw: 02fffc0000000100 ffffea0007217b88 ffffea0006e30cc8 ffff8801da800940
      raw: ffff8801b8da3540 ffff8801b8da3040 0000000100000004 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
        ffff8801b8da3780: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
        ffff8801b8da3800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      > ffff8801b8da3880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                     ^
        ffff8801b8da3900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
        ffff8801b8da3980: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
      
      Fixes: 546ac1ff ("bpf: add devmap, a map for storing net device references")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot+457d3e2ffbcf31aee5c0@syzkaller.appspotmail.com
      Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • net: ethernet: stmmac: dwmac-sun8i: enable support of unicast filtering · d4c26eb6
      Corentin Labbe authored
      When adding more MAC addresses to a dwmac-sun8i interface, the device
      goes directly into promiscuous mode. This is due to the missing
      IFF_UNICAST_FLT flag.

      Since the hardware supports unicast filtering, let's add
      IFF_UNICAST_FLT.
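
      A simplified model of the decision involved, assuming the networking
      core behaves as described above: a device without IFF_UNICAST_FLT is
      pushed into promiscuous mode as soon as its list of extra unicast
      addresses is non-empty. The flag value and helper name are hypothetical,
      and real drivers may still fall back to promiscuous mode on their own,
      for example when a hardware filter table is full.

```c
#include <stdbool.h>

/* Hypothetical bit value for illustration; not the kernel's IFF_UNICAST_FLT. */
#define MODEL_IFF_UNICAST_FLT (1u << 0)

/* Returns true when extra unicast addresses would force promiscuous mode. */
static bool needs_unicast_promisc(unsigned int priv_flags, unsigned int uc_count)
{
	if (priv_flags & MODEL_IFF_UNICAST_FLT)
		return false;      /* driver filters the addresses itself */
	return uc_count > 0;   /* no filter support: fall back to promiscuous */
}
```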
      
      Fixes: 9f93ac8d ("net-next: stmmac: Add dwmac-sun8i")
      Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ethernet: ti: netcp_ethss: fix build · a8577e13
      Grygorii Strashko authored
      Fix the reported build failure:
      ERROR: "cpsw_ale_flush_multicast" [drivers/net/ethernet/ti/keystone_netcp_ethss.ko] undefined!
      ERROR: "cpsw_ale_create" [drivers/net/ethernet/ti/keystone_netcp_ethss.ko] undefined!
      ERROR: "cpsw_ale_add_vlan" [drivers/net/ethernet/ti/keystone_netcp_ethss.ko] undefined!
      
      Fixes: 16f54164 ("net: ethernet: ti: cpsw: drop CONFIG_TI_CPSW_ALE config option")
      Reported-by: kbuild test robot <lkp@intel.com>
      Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • flow_dissector: disable preemption around BPF calls · b1c17a9a
      Eric Dumazet authored
      Various things in eBPF really require us to disable preemption
      before running an eBPF program.
      
      syzbot reported:
      
      BUG: assuming atomic context at net/core/flow_dissector.c:737
      in_atomic(): 0, irqs_disabled(): 0, pid: 24710, name: syz-executor.3
      2 locks held by syz-executor.3/24710:
       #0: 00000000e81a4bf1 (&tfile->napi_mutex){+.+.}, at: tun_get_user+0x168e/0x3ff0 drivers/net/tun.c:1850
       #1: 00000000254afebd (rcu_read_lock){....}, at: __skb_flow_dissect+0x1e1/0x4bb0 net/core/flow_dissector.c:822
      CPU: 1 PID: 24710 Comm: syz-executor.3 Not tainted 5.1.0+ #6
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x172/0x1f0 lib/dump_stack.c:113
       __cant_sleep kernel/sched/core.c:6165 [inline]
       __cant_sleep.cold+0xa3/0xbb kernel/sched/core.c:6142
       bpf_flow_dissect+0xfe/0x390 net/core/flow_dissector.c:737
       __skb_flow_dissect+0x362/0x4bb0 net/core/flow_dissector.c:853
       skb_flow_dissect_flow_keys_basic include/linux/skbuff.h:1322 [inline]
       skb_probe_transport_header include/linux/skbuff.h:2500 [inline]
       skb_probe_transport_header include/linux/skbuff.h:2493 [inline]
       tun_get_user+0x2cfe/0x3ff0 drivers/net/tun.c:1940
       tun_chr_write_iter+0xbd/0x156 drivers/net/tun.c:2037
       call_write_iter include/linux/fs.h:1872 [inline]
       do_iter_readv_writev+0x5fd/0x900 fs/read_write.c:693
       do_iter_write fs/read_write.c:970 [inline]
       do_iter_write+0x184/0x610 fs/read_write.c:951
       vfs_writev+0x1b3/0x2f0 fs/read_write.c:1015
       do_writev+0x15b/0x330 fs/read_write.c:1058
       __do_sys_writev fs/read_write.c:1131 [inline]
       __se_sys_writev fs/read_write.c:1128 [inline]
       __x64_sys_writev+0x75/0xb0 fs/read_write.c:1128
       do_syscall_64+0x103/0x670 arch/x86/entry/common.c:298
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Fixes: d58e468b ("flow_dissector: implements flow dissector BPF hook")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Cc: Petar Penkov <ppenkov@google.com>
      Cc: Stanislav Fomichev <sdf@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: fix arp_validate toggling in active-backup mode · a9b8a2b3
      Jarod Wilson authored
      There's currently a problem with toggling arp_validate on and off with an
      active-backup bond. At the moment, you can start up a bond, like so:
      
      modprobe bonding mode=1 arp_interval=100 arp_validate=0 arp_ip_target=192.168.1.1
      ip link set bond0 down
      echo "ens4f0" > /sys/class/net/bond0/bonding/slaves
      echo "ens4f1" > /sys/class/net/bond0/bonding/slaves
      ip link set bond0 up
      ip addr add 192.168.1.2/24 dev bond0
      
      Pings to 192.168.1.1 work just fine. Now turn on arp_validate:
      
      echo 1 > /sys/class/net/bond0/bonding/arp_validate
      
      Pings to 192.168.1.1 continue to work just fine. Now when you go to turn
      arp_validate off again, the link falls flat on its face:
      
      echo 0 > /sys/class/net/bond0/bonding/arp_validate
      dmesg
      ...
      [133191.911987] bond0: Setting arp_validate to none (0)
      [133194.257793] bond0: bond_should_notify_peers: slave ens4f0
      [133194.258031] bond0: link status definitely down for interface ens4f0, disabling it
      [133194.259000] bond0: making interface ens4f1 the new active one
      [133197.330130] bond0: link status definitely down for interface ens4f1, disabling it
      [133197.331191] bond0: now running without any active interface!
      
      The problem lies in bond_options.c, where passing in arp_validate=0
      results in bond->recv_probe getting set to NULL. This flies directly in
      the face of commit 3fe68df9, which says we need to set recv_probe =
      bond_arp_rcv, even if we're not using arp_validate. That commit fixed
      this in bond_option_arp_interval_set, but missed that we can reach the
      same state via bond_option_arp_validate_set as well.

      One solution would be to universally set recv_probe = bond_arp_rcv here
      as well, but I don't think bond_option_arp_validate_set has any business
      touching recv_probe at all; that should be left to the arp_interval
      code, so we can just make things much tidier here.
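
      A simplified model of the two option setters, assuming the interface is
      up and keeping only the recv_probe handling described above. The struct,
      the stub standing in for bond_arp_rcv(), and the function names are
      hypothetical; the point is that after the fix only the arp_interval
      path decides recv_probe.

```c
#include <stddef.h>

/* Hypothetical, simplified bond: only the fields relevant to this fix. */
struct bond_model {
	int arp_interval;
	int arp_validate;
	int (*recv_probe)(void);
};

/* Stub standing in for bond_arp_rcv(). */
static int bond_arp_rcv_model(void)
{
	return 0;
}

/* ARP monitor enabled: always install recv_probe, as 3fe68df9 requires. */
static void arp_interval_set(struct bond_model *bond, int val)
{
	bond->arp_interval = val;
	if (val)
		bond->recv_probe = bond_arp_rcv_model;
}

/* Old behaviour: arp_validate=0 clears recv_probe behind the monitor's back. */
static void arp_validate_set_old(struct bond_model *bond, int val)
{
	bond->arp_validate = val;
	if (!val)
		bond->recv_probe = NULL;
}

/* Fixed behaviour: arp_validate no longer touches recv_probe. */
static void arp_validate_set_fixed(struct bond_model *bond, int val)
{
	bond->arp_validate = val;
}
```

      Toggling arp_validate on and off leaves recv_probe NULL with the old
      setter, which is the state that takes the link down in the log above.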
      
      Fixes: 3fe68df9 ("bonding: always set recv_probe to bond_arp_rcv in arp monitor")
      CC: Jay Vosburgh <j.vosburgh@gmail.com>
      CC: Veaceslav Falico <vfalico@gmail.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      CC: "David S. Miller" <davem@davemloft.net>
      CC: netdev@vger.kernel.org
      Signed-off-by: Jarod Wilson <jarod@redhat.com>
      Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: meson: fixup g12a glue ephy id · 0ecfc7e1
      Jerome Brunet authored
      The phy id chosen by Amlogic is incorrectly set in the mdio mux and
      does not match the phy driver.
      
      It was not detected before because DT forces the use of the correct
      driver for the internal PHY.
      
      Fixes: 70904251 ("net: phy: add amlogic g12a mdio mux support")
      Reported-by: Qi Duan <qi.duan@amlogic.com>
      Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: realtek: Replace phy functions with non-locked version in rtl8211e_config_init() · dffe7d2e
      Kunihiko Hayashi authored
      After calling phy_select_page() and until calling phy_restore_page(),
      the mutex 'mdio_lock' is already held, so the driver must use the
      non-locked versions of the phy functions; otherwise it deadlocks on
      'mdio_lock'.

      This replaces the phy functions called from rtl8211e_config_init()
      with their non-locked versions to avoid the deadlock.
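
      A minimal model of the locking rule, assuming a non-recursive bus lock
      that phy_select_page() takes and phy_restore_page() releases. The
      locked accessor takes the lock itself, so calling it in between would
      deadlock; the model reports that as -1 instead of hanging. All names,
      the page number and the register value are hypothetical and do not
      reflect the actual rtl8211e register layout.

```c
#include <stdbool.h>

/* Hypothetical MDIO bus with a non-recursive lock modelled as a flag. */
struct mdio_model {
	bool locked; /* models mdio_lock being held */
	int page;
	int regs[32];
};

/* Locked accessor (like phy_write()): takes the bus lock itself.
 * Returns -1 where a real non-recursive mutex would deadlock. */
static int model_phy_write(struct mdio_model *bus, int reg, int val)
{
	if (bus->locked)
		return -1;
	bus->locked = true;
	bus->regs[reg] = val;
	bus->locked = false;
	return 0;
}

/* Unlocked accessor (like __phy_write()): caller must already hold the lock. */
static int model_phy_write_unlocked(struct mdio_model *bus, int reg, int val)
{
	bus->regs[reg] = val;
	return 0;
}

/* Like phy_select_page(): takes the lock and keeps it until restore. */
static int model_select_page(struct mdio_model *bus, int page)
{
	int old = bus->page;

	bus->locked = true;
	bus->page = page;
	return old;
}

/* Like phy_restore_page(): restores the old page and releases the lock. */
static void model_restore_page(struct mdio_model *bus, int old_page)
{
	bus->page = old_page;
	bus->locked = false;
}

static int config_init_buggy(struct mdio_model *bus)
{
	int old = model_select_page(bus, 0xa4);
	int ret = model_phy_write(bus, 0x1c, 0x1234); /* lock already held */

	model_restore_page(bus, old);
	return ret;
}

static int config_init_fixed(struct mdio_model *bus)
{
	int old = model_select_page(bus, 0xa4);
	int ret = model_phy_write_unlocked(bus, 0x1c, 0x1234);

	model_restore_page(bus, old);
	return ret;
}
```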
      
      Fixes: f81dadbc ("net: phy: realtek: Add rtl8211e rx/tx delays config")
      Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: seeq: fix crash caused by not set dev.parent · 5afcd14c
      Thomas Bogendoerfer authored
      The old MIPS implementation of dma_cache_sync() didn't use the dev argument,
      but commit c9eb6172 ("dma-mapping: turn dma_cache_sync into a
      dma_map_ops method") changed that, so we now need to set dev.parent.

      Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · 3ebb41bf
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for net:
      
      1) Postpone chain policy update to drop after transaction is complete,
         from Florian Westphal.
      
      2) Add entry to flowtable after confirmation to fix UDP flows with
         packets going in one single direction.
      
      3) Reference count leak in dst object, from Taehee Yoo.
      
      4) Check for TTL field in flowtable datapath, from Taehee Yoo.
      
      5) Fix h323 conntrack helper due to incorrect boundary check,
         from Jakub Jankowski.
      
      6) Fix incorrect rcu dereference when fetching basechain stats,
         from Florian Westphal.
      
      7) Missing error check when adding new entries to flowtable,
         from Taehee Yoo.
      
      8) Use version field in nfnetlink message to honor the nfgen_family
         field, from Kristian Evensen.
      
      9) Remove incorrect configuration check for CONFIG_NF_CONNTRACK_IPV6,
         from Subash Abhinov Kasiviswanathan.
      
      10) Prevent dying entries from being added to the flowtable,
          from Taehee Yoo.
      
      11) Don't hit WARN_ON() with malformed blob in ebtables with
          trailing data after last rule, reported by syzbot, patch
          from Florian Westphal.
      
      12) Remove NFT_CT_TIMEOUT enumeration, never used in the kernel
          code.
      
      13) Fix incorrect definition for NFT_LOGLEVEL_MAX, from Florian
          Westphal.
      
      This batch comes with a conflict that can be fixed with this patch:
      
      diff --cc include/uapi/linux/netfilter/nf_tables.h
      index 7bdb234f3d8c,f0cf7b0f4f35..505393c6e959
      --- a/include/uapi/linux/netfilter/nf_tables.h
      +++ b/include/uapi/linux/netfilter/nf_tables.h
      @@@ -966,6 -966,8 +966,7 @@@ enum nft_socket_keys
         * @NFT_CT_DST_IP: conntrack layer 3 protocol destination (IPv4 address)
         * @NFT_CT_SRC_IP6: conntrack layer 3 protocol source (IPv6 address)
         * @NFT_CT_DST_IP6: conntrack layer 3 protocol destination (IPv6 address)
       - * @NFT_CT_TIMEOUT: connection tracking timeout policy assigned to conntrack
      +  * @NFT_CT_ID: conntrack id
         */
        enum nft_ct_keys {
        	NFT_CT_STATE,
      @@@ -991,6 -993,8 +992,7 @@@
        	NFT_CT_DST_IP,
        	NFT_CT_SRC_IP6,
        	NFT_CT_DST_IP6,
       -	NFT_CT_TIMEOUT,
      + 	NFT_CT_ID,
        	__NFT_CT_MAX
        };
        #define NFT_CT_MAX		(__NFT_CT_MAX - 1)
      
      That replaces the unused NFT_CT_TIMEOUT definition with NFT_CT_ID. If you
      prefer, I can also solve this conflict here; just let me know.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • of_net: Fix missing of_find_device_by_node ref count drop · 3ee9ae74
      Petr Štetiar authored
      of_find_device_by_node() takes a reference to the embedded struct
      device, which needs to be dropped after use.
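
      The get/put pairing can be sketched with a plain counter, assuming a
      lookup that returns the device with a reference already taken (as
      of_find_device_by_node() does) and a release call standing in for
      put_device(). The names are hypothetical; the sketch only shows that
      the reference is dropped once the caller is done with the device.

```c
#include <stddef.h>

/* Hypothetical device with a plain reference count. */
struct device_model {
	int refcount;
	unsigned char mac[6];
};

/* Like of_find_device_by_node(): returns the device with a reference taken. */
static struct device_model *find_device_model(struct device_model *node)
{
	if (node)
		node->refcount++;
	return node;
}

/* Like put_device(): drops the reference taken by the lookup. */
static void put_device_model(struct device_model *dev)
{
	dev->refcount--;
}

/* Copies the MAC out, then drops the lookup reference before returning. */
static int get_mac_model(struct device_model *node, unsigned char out[6])
{
	struct device_model *dev = find_device_model(node);
	int i;

	if (!dev)
		return -1; /* nothing found, so no reference to drop */
	for (i = 0; i < 6; i++)
		out[i] = dev->mac[i];
	put_device_model(dev);
	return 0;
}
```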
      
      Fixes: d01f449c ("of_net: add NVMEM support to of_get_mac_address")
      Reported-by: kbuild test robot <lkp@intel.com>
      Reported-by: Julia Lawall <julia.lawall@lip6.fr>
      Signed-off-by: Petr Štetiar <ynezz@true.cz>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mvpp2: cls: Add missing NETIF_F_NTUPLE flag · da86f59f
      Maxime Chevallier authored
      Now that the mvpp2 driver supports classification offloading, we must
      add NETIF_F_NTUPLE to the features list.
      
      Since the current code doesn't allow disabling the feature, we don't set
      the flag in dev->hw_features.
      
      Fixes: 90b509b3 ("net: mvpp2: cls: Add Classification offload support")
      Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 69dda13f
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2019-05-13
      
      The following pull-request contains BPF updates for your *net* tree.
      
      The main changes are:
      
      1) Fix out of bounds backwards jumps due to a bug in dead code
         removal, from Daniel.
      
      2) Fix libbpf users by detecting unsupported BTF kernel features
         and sanitizing them before load, from Andrii.
      
      3) Fix undefined behavior in narrow load handling of context
         fields, from Krzesimir.
      
      4) Various BPF uapi header doc/man page fixes, from Quentin.
      
      5) Misc .gitignore fixups to exclude built files, from Kelsey.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: fix undefined behavior in narrow load handling · e2f7fc0a
      Krzesimir Nowak authored
      Commit 31fd8581 ("bpf: permits narrower load from bpf program
      context fields") made the verifier add AND instructions to clear the
      unwanted bits with a mask when doing a narrow load. The mask is
      computed with
      
        (1 << size * 8) - 1
      
      where "size" is the size of the narrow load. When doing a 4-byte load
      of an 8-byte field, the verifier shifts the literal 1 by 32 places to
      the left. This overflows a 32-bit signed integer, which is undefined
      behavior. Typically, the computed mask was zero, so the result of the
      narrow load ended up being zero too.
      
      Cast the literal to long long to avoid overflows. Note that a narrow
      load of a 4-byte field does not have this undefined behavior, because
      the load size can only be 1 or 2 bytes, so shifting 1 by 8 or 16
      places does not overflow. Reading 4 bytes would not be a narrow load
      of a 4-byte field.
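
      A standalone illustration of the mask arithmetic, assuming a narrow
      load of 1, 2 or 4 bytes that keeps the lowest bytes of the field. Doing
      the shift on a 64-bit unsigned literal (1ULL) is one way to get the
      wide computation the text asks for; the function names are
      hypothetical and this is not the verifier's code.

```c
#include <stdint.h>

/* Mask for a narrow load of `size` bytes (1, 2 or 4). A plain int literal
 * would make (1 << 4 * 8) shift a 32-bit int by 32 (undefined behaviour);
 * 1ULL keeps the shift in range. size 8 is a full load, not a narrow one. */
static uint64_t narrow_load_mask(unsigned int size)
{
	return (1ULL << size * 8) - 1;
}

/* Keeps the lowest `size` bytes of an 8-byte field. */
static uint64_t narrow_load(uint64_t field, unsigned int size)
{
	return field & narrow_load_mask(size);
}
```

      With the wide literal, a 4-byte narrow load of 0x1122334455667788
      yields 0x55667788 instead of the zero produced by the broken mask.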
      
      Fixes: 31fd8581 ("bpf: permits narrower load from bpf program context fields")
      Reviewed-by: Alban Crequy <alban@kinvolk.io>
      Reviewed-by: Iago López Galeiras <iago@kinvolk.io>
      Signed-off-by: Krzesimir Nowak <krzesimir@kinvolk.io>
      Cc: Yonghong Song <yhs@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  3. 12 May, 2019 14 commits
  4. 11 May, 2019 3 commits
    • net: phy: realtek: fix double page ops in generic Realtek driver · 8f779443
      Heiner Kallweit authored
      When adding missing callbacks I missed that one had them set already.
      Interesting that the compiler didn't complain.
      
      Fixes: daf3ddbe ("net: phy: realtek: add missing page operations")
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: qrtr: use protocol endiannes variable · 8f5e2451
      Nicholas Mc Guire authored
      sparse was unable to verify endianness correctness because the result
      of le32_to_cpu() was assigned back to the same variable. Fix this
      warning by providing a proper __le32 type and initializing it. This
      does not fix any actual bug; it only addresses the sparse warning.

      Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: fix out of bounds backwards jmps due to dead code removal · af959b18
      Daniel Borkmann authored
      systemtap folks reported the following splat recently:
      
        [ 7790.862212] WARNING: CPU: 3 PID: 26759 at arch/x86/kernel/kprobes/core.c:1022 kprobe_fault_handler+0xec/0xf0
        [...]
        [ 7790.864113] CPU: 3 PID: 26759 Comm: sshd Not tainted 5.1.0-0.rc7.git1.1.fc31.x86_64 #1
        [ 7790.864198] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS[...]
        [ 7790.864314] RIP: 0010:kprobe_fault_handler+0xec/0xf0
        [ 7790.864375] Code: 48 8b 50 [...]
        [ 7790.864714] RSP: 0018:ffffc06800bdbb48 EFLAGS: 00010082
        [ 7790.864812] RAX: ffff9e2b75a16320 RBX: 0000000000000000 RCX: 0000000000000000
        [ 7790.865306] RDX: ffffffffffffffff RSI: 000000000000000e RDI: ffffc06800bdbbf8
        [ 7790.865514] RBP: ffffc06800bdbbf8 R08: 0000000000000000 R09: 0000000000000000
        [ 7790.865960] R10: 0000000000000000 R11: 0000000000000000 R12: ffffc06800bdbbf8
        [ 7790.866037] R13: ffff9e2ab56a0418 R14: ffff9e2b6d0bb400 R15: ffff9e2b6d268000
        [ 7790.866114] FS:  00007fde49937d80(0000) GS:ffff9e2b75a00000(0000) knlGS:0000000000000000
        [ 7790.866193] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [ 7790.866318] CR2: 0000000000000000 CR3: 000000012f312000 CR4: 00000000000006e0
        [ 7790.866419] Call Trace:
        [ 7790.866677]  do_user_addr_fault+0x64/0x480
        [ 7790.867513]  do_page_fault+0x33/0x210
        [ 7790.868002]  async_page_fault+0x1e/0x30
        [ 7790.868071] RIP: 0010:          (null)
        [ 7790.868144] Code: Bad RIP value.
        [ 7790.868229] RSP: 0018:ffffc06800bdbca8 EFLAGS: 00010282
        [ 7790.868362] RAX: ffff9e2b598b60f8 RBX: ffffc06800bdbe48 RCX: 0000000000000004
        [ 7790.868629] RDX: 0000000000000004 RSI: ffffc06800bdbc6c RDI: ffff9e2b598b60f0
        [ 7790.868834] RBP: ffffc06800bdbcf8 R08: 0000000000000000 R09: 0000000000000004
        [ 7790.870432] R10: 00000000ff6f7a03 R11: 0000000000000000 R12: 0000000000000001
        [ 7790.871859] R13: ffffc06800bdbcb8 R14: 0000000000000000 R15: ffff9e2acd0a5310
        [ 7790.873455]  ? vfs_read+0x5/0x170
        [ 7790.874639]  ? vfs_read+0x1/0x170
        [ 7790.875834]  ? trace_call_bpf+0xf6/0x260
        [ 7790.877044]  ? vfs_read+0x1/0x170
        [ 7790.878208]  ? vfs_read+0x5/0x170
        [ 7790.879345]  ? kprobe_perf_func+0x233/0x260
        [ 7790.880503]  ? vfs_read+0x1/0x170
        [ 7790.881632]  ? vfs_read+0x5/0x170
        [ 7790.882751]  ? kprobe_ftrace_handler+0x92/0xf0
        [ 7790.883926]  ? __vfs_read+0x30/0x30
        [ 7790.885050]  ? ftrace_ops_assist_func+0x94/0x100
        [ 7790.886183]  ? vfs_read+0x1/0x170
        [ 7790.887283]  ? vfs_read+0x5/0x170
        [ 7790.888348]  ? ksys_read+0x5a/0xe0
        [ 7790.889389]  ? do_syscall_64+0x5c/0xa0
        [ 7790.890401]  ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      After some debugging, it turns out that the logic in 2cbd95a5
      ("bpf: change parameters of call/branch offset adjustment") has
      a bug that is exposed after 52875a04 ("bpf: verifier: remove
      dead code"): we miss some of the jump offset adjustments after
      code patching when we remove dead code, more concretely, for a
      backward jump spanning the area that is being removed.
      
      BPF insns of a case that was hit pre 52875a04:
      
        [...]
        676: (85) call bpf_perf_event_output#-47616
        677: (05) goto pc-636
        678: (62) *(u32 *)(r10 -64) = 0
        679: (bf) r7 = r10
        680: (07) r7 += -64
        681: (05) goto pc-44
        682: (05) goto pc-1
        683: (05) goto pc-1
      
      BPF insns afterwards:
      
        [...]
        618: (85) call bpf_perf_event_output#-47616
        619: (05) goto pc-638
        620: (62) *(u32 *)(r10 -64) = 0
        621: (bf) r7 = r10
        622: (07) r7 += -64
        623: (05) goto pc-44
      
      To illustrate the bug, the situation looks as follows:
           ____
        0 |    | <-- foo: [...]
        1 |____|
        2 |____| <-- pos / end_new  ^
        3 |    |                    |
        4 |    |                    |  len
        5 |____|                    |  (remove region)
        6 |    | <-- end_old        v
        7 |    |
        8 |    | <-- curr  (jmp foo)
        9 |____|
      
      The condition curr >= end_new && curr + off + 1 < end_new in the
      branch delta adjustments is never hit, because curr + off + 1 <
      end_new is evaluated as an unsigned comparison. Since the insns are
      memmove()'d before the offset adjustments, curr + off + 1 becomes
      negative, which in the unsigned realm wraps to a value greater than
      end_new.
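
      A simplified userspace model of the adjustment condition quoted above,
      assuming the remove-region layout from the diagram (pos = end_new = 2,
      end_old = 6, and the jmp at old index 8 that sits at index 4 after the
      memmove()). The function names are hypothetical and range checks on the
      resulting offset are omitted; the model only contrasts evaluating the
      same condition with unsigned versus signed positions.

```c
#include <stdint.h>

/* Buggy variant: positions are unsigned, so curr + off + 1 is computed in
 * unsigned arithmetic and a negative result wraps to a huge value. */
static int32_t adj_off_unsigned(int32_t off, uint32_t pos, uint32_t end_old,
				uint32_t end_new, uint32_t curr)
{
	int32_t delta = (int32_t)end_new - (int32_t)end_old;

	if (curr < pos && curr + off + 1 >= end_old)
		off += delta;  /* forward jump across the patched region */
	else if (curr >= end_new && curr + off + 1 < end_new)
		off -= delta;  /* backward jump across the patched region */
	return off;
}

/* Same condition with signed positions: a negative target compares correctly. */
static int32_t adj_off_signed(int32_t off, int32_t pos, int32_t end_old,
			      int32_t end_new, int32_t curr)
{
	int32_t delta = end_new - end_old;

	if (curr < pos && curr + off + 1 >= end_old)
		off += delta;
	else if (curr >= end_new && curr + off + 1 < end_new)
		off -= delta;
	return off;
}
```

      For the backward jmp (off = -9 at curr = 4), the unsigned variant leaves
      off at -9, so it would target index 4 - 9 + 1 = -4; the signed variant
      yields -5 and lands on index 0 (foo). Forward jumps over the region are
      adjusted identically by both.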
      
      Correct BPF insns after this fix:
      
        [...]
        618: (85) call bpf_perf_event_output#-47216
        619: (05) goto pc-578
        620: (62) *(u32 *)(r10 -64) = 0
        621: (bf) r7 = r10
        622: (07) r7 += -64
        623: (05) goto pc-44
      
      Note that the unprivileged case is not affected by this.
      
      Fixes: 52875a04 ("bpf: verifier: remove dead code")
      Fixes: 2cbd95a5 ("bpf: change parameters of call/branch offset adjustment")
      Reported-by: Frank Ch. Eigler <fche@redhat.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  5. 10 May, 2019 2 commits