1. 13 Aug, 2018 34 commits
    • net: sched: act_ife: disable bh when taking ife_mod_lock · 42c625a4
      Vlad Buslov authored
      Lockdep reports a deadlock for the following locking scenario in the ife action:
      
      Task one:
      1) Executes ife action update.
      2) Takes tcfa_lock.
      3) Waits on ife_mod_lock, which is already taken by task two.
      
      Task two:
      1) Executes any path that obtains ife_mod_lock without disabling bh, such as
      loading a meta module or creating a new action (any path that takes
      ife_mod_lock while holding tcfa_lock has bh disabled).
      2) Takes ife_mod_lock.
      3) Is preempted by the rate estimator timer.
      4) The timer callback waits on tcfa_lock, which is taken by task one.
      
      In the described case the tasks deadlock because they take the same two locks in
      different order. To prevent the potential deadlock reported by lockdep, always
      disable bh when obtaining ife_mod_lock.
      
      Lockdep warning:
      
      [  508.101192] =====================================================
      [  508.107708] WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
      [  508.114728] 4.18.0-rc8+ #646 Not tainted
      [  508.119050] -----------------------------------------------------
      [  508.125559] tc/5460 [HC0[0]:SC0[2]:HE1:SE0] is trying to acquire:
      [  508.132025] 000000005a938c68 (ife_mod_lock){++++}, at: find_ife_oplist+0x1e/0xc0 [act_ife]
      [  508.140996]
                     and this task is already holding:
      [  508.147548] 00000000d46f6c56 (&(&p->tcfa_lock)->rlock){+.-.}, at: tcf_ife_init+0x6ae/0xf40 [act_ife]
      [  508.157371] which would create a new lock dependency:
      [  508.162828]  (&(&p->tcfa_lock)->rlock){+.-.} -> (ife_mod_lock){++++}
      [  508.169572]
                     but this new dependency connects a SOFTIRQ-irq-safe lock:
      [  508.178197]  (&(&p->tcfa_lock)->rlock){+.-.}
      [  508.178201]
                     ... which became SOFTIRQ-irq-safe at:
      [  508.189771]   _raw_spin_lock+0x2c/0x40
      [  508.193906]   est_fetch_counters+0x41/0xb0
      [  508.198391]   est_timer+0x83/0x3c0
      [  508.202180]   call_timer_fn+0x16a/0x5d0
      [  508.206400]   run_timer_softirq+0x399/0x920
      [  508.210967]   __do_softirq+0x157/0x97d
      [  508.215102]   irq_exit+0x152/0x1c0
      [  508.218888]   smp_apic_timer_interrupt+0xc0/0x4e0
      [  508.223976]   apic_timer_interrupt+0xf/0x20
      [  508.228540]   cpuidle_enter_state+0xf8/0x5d0
      [  508.233198]   do_idle+0x28a/0x350
      [  508.236881]   cpu_startup_entry+0xc7/0xe0
      [  508.241296]   start_secondary+0x2e8/0x3f0
      [  508.245678]   secondary_startup_64+0xa5/0xb0
      [  508.250347]
                     to a SOFTIRQ-irq-unsafe lock:  (ife_mod_lock){++++}
      [  508.256531]
                     ... which became SOFTIRQ-irq-unsafe at:
      [  508.267279] ...
      [  508.267283]   _raw_write_lock+0x2c/0x40
      [  508.273653]   register_ife_op+0x118/0x2c0 [act_ife]
      [  508.278926]   do_one_initcall+0xf7/0x4d9
      [  508.283214]   do_init_module+0x18b/0x44e
      [  508.287521]   load_module+0x4167/0x5730
      [  508.291739]   __do_sys_finit_module+0x16d/0x1a0
      [  508.296654]   do_syscall_64+0x7a/0x3f0
      [  508.300788]   entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  508.306302]
                     other info that might help us debug this:
      
      [  508.315286]  Possible interrupt unsafe locking scenario:
      
      [  508.322771]        CPU0                    CPU1
      [  508.327681]        ----                    ----
      [  508.332604]   lock(ife_mod_lock);
      [  508.336300]                                local_irq_disable();
      [  508.342608]                                lock(&(&p->tcfa_lock)->rlock);
      [  508.349793]                                lock(ife_mod_lock);
      [  508.355990]   <Interrupt>
      [  508.358974]     lock(&(&p->tcfa_lock)->rlock);
      [  508.363803]
                      *** DEADLOCK ***
      
      [  508.370715] 2 locks held by tc/5460:
      [  508.374680]  #0: 00000000e27e4fa4 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x583/0x7b0
      [  508.383366]  #1: 00000000d46f6c56 (&(&p->tcfa_lock)->rlock){+.-.}, at: tcf_ife_init+0x6ae/0xf40 [act_ife]
      [  508.393648]
                     the dependencies between SOFTIRQ-irq-safe lock and the holding lock:
      [  508.403505] -> (&(&p->tcfa_lock)->rlock){+.-.} ops: 1001553 {
      [  508.409646]    HARDIRQ-ON-W at:
      [  508.413136]                     _raw_spin_lock_bh+0x34/0x40
      [  508.419059]                     gnet_stats_start_copy_compat+0xa2/0x230
      [  508.426021]                     gnet_stats_start_copy+0x16/0x20
      [  508.432333]                     tcf_action_copy_stats+0x95/0x1d0
      [  508.438735]                     tcf_action_dump_1+0xb0/0x4e0
      [  508.444795]                     tcf_action_dump+0xca/0x200
      [  508.450673]                     tcf_exts_dump+0xd9/0x320
      [  508.456392]                     fl_dump+0x1b7/0x4a0 [cls_flower]
      [  508.462798]                     tcf_fill_node+0x380/0x530
      [  508.468601]                     tfilter_notify+0xdf/0x1c0
      [  508.474404]                     tc_new_tfilter+0x84a/0xc90
      [  508.480270]                     rtnetlink_rcv_msg+0x5bd/0x7b0
      [  508.486419]                     netlink_rcv_skb+0x184/0x220
      [  508.492394]                     netlink_unicast+0x31b/0x460
      [  508.507411]                     netlink_sendmsg+0x3fb/0x840
      [  508.513390]                     sock_sendmsg+0x7b/0xd0
      [  508.518907]                     ___sys_sendmsg+0x4c6/0x610
      [  508.524797]                     __sys_sendmsg+0xd7/0x150
      [  508.530510]                     do_syscall_64+0x7a/0x3f0
      [  508.536201]                     entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  508.543301]    IN-SOFTIRQ-W at:
      [  508.546834]                     _raw_spin_lock+0x2c/0x40
      [  508.552522]                     est_fetch_counters+0x41/0xb0
      [  508.558571]                     est_timer+0x83/0x3c0
      [  508.563912]                     call_timer_fn+0x16a/0x5d0
      [  508.569699]                     run_timer_softirq+0x399/0x920
      [  508.575840]                     __do_softirq+0x157/0x97d
      [  508.581538]                     irq_exit+0x152/0x1c0
      [  508.586882]                     smp_apic_timer_interrupt+0xc0/0x4e0
      [  508.593533]                     apic_timer_interrupt+0xf/0x20
      [  508.599686]                     cpuidle_enter_state+0xf8/0x5d0
      [  508.605895]                     do_idle+0x28a/0x350
      [  508.611147]                     cpu_startup_entry+0xc7/0xe0
      [  508.617097]                     start_secondary+0x2e8/0x3f0
      [  508.623029]                     secondary_startup_64+0xa5/0xb0
      [  508.629245]    INITIAL USE at:
      [  508.632686]                    _raw_spin_lock_bh+0x34/0x40
      [  508.638557]                    gnet_stats_start_copy_compat+0xa2/0x230
      [  508.645491]                    gnet_stats_start_copy+0x16/0x20
      [  508.651719]                    tcf_action_copy_stats+0x95/0x1d0
      [  508.657992]                    tcf_action_dump_1+0xb0/0x4e0
      [  508.663937]                    tcf_action_dump+0xca/0x200
      [  508.669716]                    tcf_exts_dump+0xd9/0x320
      [  508.675337]                    fl_dump+0x1b7/0x4a0 [cls_flower]
      [  508.681650]                    tcf_fill_node+0x380/0x530
      [  508.687366]                    tfilter_notify+0xdf/0x1c0
      [  508.693031]                    tc_new_tfilter+0x84a/0xc90
      [  508.698820]                    rtnetlink_rcv_msg+0x5bd/0x7b0
      [  508.704869]                    netlink_rcv_skb+0x184/0x220
      [  508.710758]                    netlink_unicast+0x31b/0x460
      [  508.716627]                    netlink_sendmsg+0x3fb/0x840
      [  508.722510]                    sock_sendmsg+0x7b/0xd0
      [  508.727931]                    ___sys_sendmsg+0x4c6/0x610
      [  508.733729]                    __sys_sendmsg+0xd7/0x150
      [  508.739346]                    do_syscall_64+0x7a/0x3f0
      [  508.744943]                    entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  508.751930]  }
      [  508.753964]  ... key      at: [<ffffffff916b3e20>] __key.61145+0x0/0x40
      [  508.760946]  ... acquired at:
      [  508.764294]    _raw_read_lock+0x2f/0x40
      [  508.768513]    find_ife_oplist+0x1e/0xc0 [act_ife]
      [  508.773692]    tcf_ife_init+0x82f/0xf40 [act_ife]
      [  508.778785]    tcf_action_init_1+0x510/0x750
      [  508.783468]    tcf_action_init+0x1e8/0x340
      [  508.787938]    tcf_action_add+0xc5/0x240
      [  508.792241]    tc_ctl_action+0x203/0x2a0
      [  508.796550]    rtnetlink_rcv_msg+0x5bd/0x7b0
      [  508.801200]    netlink_rcv_skb+0x184/0x220
      [  508.805674]    netlink_unicast+0x31b/0x460
      [  508.810129]    netlink_sendmsg+0x3fb/0x840
      [  508.814611]    sock_sendmsg+0x7b/0xd0
      [  508.818665]    ___sys_sendmsg+0x4c6/0x610
      [  508.823029]    __sys_sendmsg+0xd7/0x150
      [  508.827246]    do_syscall_64+0x7a/0x3f0
      [  508.831483]    entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
                     the dependencies between the lock to be acquired
      [  508.838945]  and SOFTIRQ-irq-unsafe lock:
      [  508.851177] -> (ife_mod_lock){++++} ops: 95 {
      [  508.855920]    HARDIRQ-ON-W at:
      [  508.859478]                     _raw_write_lock+0x2c/0x40
      [  508.865264]                     register_ife_op+0x118/0x2c0 [act_ife]
      [  508.872071]                     do_one_initcall+0xf7/0x4d9
      [  508.877947]                     do_init_module+0x18b/0x44e
      [  508.883819]                     load_module+0x4167/0x5730
      [  508.889595]                     __do_sys_finit_module+0x16d/0x1a0
      [  508.896043]                     do_syscall_64+0x7a/0x3f0
      [  508.901734]                     entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  508.908827]    HARDIRQ-ON-R at:
      [  508.912359]                     _raw_read_lock+0x2f/0x40
      [  508.918043]                     find_ife_oplist+0x1e/0xc0 [act_ife]
      [  508.924692]                     tcf_ife_init+0x82f/0xf40 [act_ife]
      [  508.931252]                     tcf_action_init_1+0x510/0x750
      [  508.937393]                     tcf_action_init+0x1e8/0x340
      [  508.943366]                     tcf_action_add+0xc5/0x240
      [  508.949130]                     tc_ctl_action+0x203/0x2a0
      [  508.954922]                     rtnetlink_rcv_msg+0x5bd/0x7b0
      [  508.961024]                     netlink_rcv_skb+0x184/0x220
      [  508.966970]                     netlink_unicast+0x31b/0x460
      [  508.972915]                     netlink_sendmsg+0x3fb/0x840
      [  508.978859]                     sock_sendmsg+0x7b/0xd0
      [  508.984400]                     ___sys_sendmsg+0x4c6/0x610
      [  508.990264]                     __sys_sendmsg+0xd7/0x150
      [  508.995952]                     do_syscall_64+0x7a/0x3f0
      [  509.001643]                     entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  509.008722]    SOFTIRQ-ON-W at:
      [  509.012242]                     _raw_write_lock+0x2c/0x40
      [  509.018013]                     register_ife_op+0x118/0x2c0 [act_ife]
      [  509.024841]                     do_one_initcall+0xf7/0x4d9
      [  509.030720]                     do_init_module+0x18b/0x44e
      [  509.036604]                     load_module+0x4167/0x5730
      [  509.042397]                     __do_sys_finit_module+0x16d/0x1a0
      [  509.048865]                     do_syscall_64+0x7a/0x3f0
      [  509.054551]                     entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  509.061636]    SOFTIRQ-ON-R at:
      [  509.065145]                     _raw_read_lock+0x2f/0x40
      [  509.070854]                     find_ife_oplist+0x1e/0xc0 [act_ife]
      [  509.077515]                     tcf_ife_init+0x82f/0xf40 [act_ife]
      [  509.084051]                     tcf_action_init_1+0x510/0x750
      [  509.090172]                     tcf_action_init+0x1e8/0x340
      [  509.096124]                     tcf_action_add+0xc5/0x240
      [  509.101891]                     tc_ctl_action+0x203/0x2a0
      [  509.107671]                     rtnetlink_rcv_msg+0x5bd/0x7b0
      [  509.113811]                     netlink_rcv_skb+0x184/0x220
      [  509.119768]                     netlink_unicast+0x31b/0x460
      [  509.125716]                     netlink_sendmsg+0x3fb/0x840
      [  509.131668]                     sock_sendmsg+0x7b/0xd0
      [  509.137167]                     ___sys_sendmsg+0x4c6/0x610
      [  509.143010]                     __sys_sendmsg+0xd7/0x150
      [  509.148718]                     do_syscall_64+0x7a/0x3f0
      [  509.154443]                     entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  509.161533]    INITIAL USE at:
      [  509.164956]                    _raw_read_lock+0x2f/0x40
      [  509.170574]                    find_ife_oplist+0x1e/0xc0 [act_ife]
      [  509.177134]                    tcf_ife_init+0x82f/0xf40 [act_ife]
      [  509.183619]                    tcf_action_init_1+0x510/0x750
      [  509.189674]                    tcf_action_init+0x1e8/0x340
      [  509.195534]                    tcf_action_add+0xc5/0x240
      [  509.201229]                    tc_ctl_action+0x203/0x2a0
      [  509.206920]                    rtnetlink_rcv_msg+0x5bd/0x7b0
      [  509.212936]                    netlink_rcv_skb+0x184/0x220
      [  509.218818]                    netlink_unicast+0x31b/0x460
      [  509.224699]                    netlink_sendmsg+0x3fb/0x840
      [  509.230581]                    sock_sendmsg+0x7b/0xd0
      [  509.235984]                    ___sys_sendmsg+0x4c6/0x610
      [  509.241791]                    __sys_sendmsg+0xd7/0x150
      [  509.247425]                    do_syscall_64+0x7a/0x3f0
      [  509.253007]                    entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  509.259975]  }
      [  509.261998]  ... key      at: [<ffffffffc1554258>] ife_mod_lock+0x18/0xffffffffffff8dc0 [act_ife]
      [  509.271569]  ... acquired at:
      [  509.274912]    _raw_read_lock+0x2f/0x40
      [  509.279134]    find_ife_oplist+0x1e/0xc0 [act_ife]
      [  509.284324]    tcf_ife_init+0x82f/0xf40 [act_ife]
      [  509.289425]    tcf_action_init_1+0x510/0x750
      [  509.294068]    tcf_action_init+0x1e8/0x340
      [  509.298553]    tcf_action_add+0xc5/0x240
      [  509.302854]    tc_ctl_action+0x203/0x2a0
      [  509.307153]    rtnetlink_rcv_msg+0x5bd/0x7b0
      [  509.311805]    netlink_rcv_skb+0x184/0x220
      [  509.316282]    netlink_unicast+0x31b/0x460
      [  509.320769]    netlink_sendmsg+0x3fb/0x840
      [  509.325248]    sock_sendmsg+0x7b/0xd0
      [  509.329290]    ___sys_sendmsg+0x4c6/0x610
      [  509.333687]    __sys_sendmsg+0xd7/0x150
      [  509.337902]    do_syscall_64+0x7a/0x3f0
      [  509.342116]    entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  509.349601]
                     stack backtrace:
      [  509.354663] CPU: 6 PID: 5460 Comm: tc Not tainted 4.18.0-rc8+ #646
      [  509.361216] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
      
      Fixes: ef6980b6 ("introduce IFE action")
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
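The rule behind the lockdep warning above can be sketched in user space. The following is a toy model, not kernel code: `struct lock_class`, `note_acquire()`, `unsafe_dependency()` and `replay_scenario()` are hypothetical names invented for this illustration. It assumes the simplified rule that a lock used from softirq context ("softirq-safe") must never be held while acquiring a lock that is elsewhere taken with softirqs enabled ("softirq-unsafe").

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one lockdep rule; all names here are hypothetical. */
struct lock_class {
	bool used_in_softirq;       /* acquired from softirq context at least once */
	bool taken_with_bh_enabled; /* acquired in process context with softirqs enabled */
};

/* Record one acquisition of a lock class. */
static void note_acquire(struct lock_class *lc, bool in_softirq, bool bh_disabled)
{
	assert(lc);
	if (in_softirq)
		lc->used_in_softirq = true;
	else if (!bh_disabled)
		lc->taken_with_bh_enabled = true;
}

/*
 * True when holding 'outer' while taking 'inner' matches the flagged
 * pattern: a softirq-safe outer lock and a softirq-unsafe inner lock.
 */
static bool unsafe_dependency(const struct lock_class *outer,
			      const struct lock_class *inner)
{
	return outer->used_in_softirq && inner->taken_with_bh_enabled;
}

/* Replay the scenario from the commit message; returns true if flagged. */
static bool replay_scenario(bool module_path_disables_bh)
{
	struct lock_class tcfa_lock = { false, false };
	struct lock_class ife_mod_lock = { false, false };

	/* The rate estimator timer takes tcfa_lock from softirq context. */
	note_acquire(&tcfa_lock, true, true);
	/* Meta module registration takes ife_mod_lock in process context. */
	note_acquire(&ife_mod_lock, false, module_path_disables_bh);
	/* Action update takes tcfa_lock with bh disabled, then ife_mod_lock. */
	note_acquire(&tcfa_lock, false, true);
	note_acquire(&ife_mod_lock, false, true);

	return unsafe_dependency(&tcfa_lock, &ife_mod_lock);
}
```

In this model, `replay_scenario(false)` corresponds to the code before the fix and `replay_scenario(true)` to every ife_mod_lock acquisition disabling bh, which is the approach the commit message describes.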
    • Merge branch 'for-upstream' of... · bb2a0812
      David S. Miller authored
      Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
      
      Johan Hedberg says:
      
      ====================
      pull request: bluetooth-next 2018-08-13
      
      There was one pretty bad bug that slipped into the MediaTek HCI driver
      in the last bluetooth-next pull request. Would it be possible to get
      this one-liner fix pulled to net-next before you make your first 4.19
      pull request for Linus? Thanks.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · c1617fb4
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf-next 2018-08-13
      
      The following pull-request contains BPF updates for your *net-next* tree.
      
      The main changes are:
      
      1) Add driver XDP support for veth. This can be used in conjunction with
         redirect of another XDP program e.g. sitting on NIC so the xdp_frame
         can be forwarded to the peer veth directly without modification,
         from Toshiaki.
      
      2) Add a new BPF map type REUSEPORT_SOCKARRAY and prog type SK_REUSEPORT
         in order to provide more control and visibility on where a SO_REUSEPORT
         sk should be located, and the latter enables to directly select a sk
         from the bpf map. This also enables map-in-map for application migration
         use cases, from Martin.
      
      3) Add a new BPF helper bpf_skb_ancestor_cgroup_id() that returns the id
         of cgroup v2 that is the ancestor of the cgroup associated with the
         skb at the ancestor_level, from Andrey.
      
      4) Implement BPF fs map pretty-print support based on BTF data for regular
         hash table and LRU map, from Yonghong.
      
      5) Decouple the ability to attach BTF for a map from the key and value
         pretty-printer in BPF fs, and enable further support of BTF for maps for
         percpu and LPM trie, from Daniel.
      
      6) Implement a better BPF sample of using XDP's CPU redirect feature for
         load balancing SKB processing to remote CPU. The sample implements the
         same XDP load balancing as Suricata does which is symmetric hash based
         on IP and L4 protocol, from Jesper.
      
      7) Revert adding NULL pointer check with WARN_ON_ONCE() in __xdp_return()'s
         critical path as it is ensured that the allocator is present, from Björn.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'wireless-drivers-next-for-davem-2018-08-12' of... · 961d9735
      David S. Miller authored
      Merge tag 'wireless-drivers-next-for-davem-2018-08-12' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
      
      Kalle Valo says:
      
      ====================
      pull-request: wireless-drivers-next 2018-08-12
      
      wireless-drivers-next patches for 4.19
      
      Last set of new features for 4.19. Most notable is simplifying SSB
      debugging code with two Kconfig option removals and fixing mt76 USB
      build problems.
      
      Major changes:
      
      ath10k
      
      * add debugfs file warm_hw_reset
      
      wil6210
      
      * add debugfs files tx_latency, link_stats and link_stats_global
      
      * add 3-MSI support
      
      * allow scan on AP interface
      
      * support max aggregation window size 64
      
      ssb
      
      * remove CONFIG_SSB_SILENT and CONFIG_SSB_DEBUG Kconfig options
      
      mt76
      
      * fix build problems with recently added USB support
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • liquidio: remove set but not used variable 'is25G' · 45c91fb2
      YueHaibing authored
      Fixes gcc '-Wunused-but-set-variable' warning:
      
      drivers/net/ethernet/cavium/liquidio/lio_ethtool.c: In function 'lio_set_link_ksettings':
      drivers/net/ethernet/cavium/liquidio/lio_ethtool.c:392:6: warning:
       variable 'is25G' set but not used [-Wunused-but-set-variable]
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cxgb4: remove set but not used variable 'spd' · 0ec45680
      YueHaibing authored
      Fixes gcc '-Wunused-but-set-variable' warning:
      drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c: In function 'print_port_info':
      drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c:5147:14: warning:
       variable 'spd' set but not used [-Wunused-but-set-variable]
      
      variable 'spd' is set but not used since
      commit 547fd272 ("cxgb4: Warn if device doesn't have enough PCI bandwidth")
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Acked-by: Ganesh Goudar <ganeshgr@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • lan743x: lan743x: Remove duplicated include from lan743x_ptp.c · 3b20818b
      Yue Haibing authored
      Remove duplicated include.
      Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio_net: remove duplicated include from virtio_net.c · 1150827b
      YueHaibing authored
      Remove duplicated include linux/netdevice.h
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • packet: switch kvzalloc to allocate memory · 71e41286
      Li RongQing authored
      This patch includes the following changes:
      
      * Use modern kvzalloc()/kvfree() instead of custom allocations.
      
      * Remove the order argument from alloc_pg_vec; it can be derived from req.
      
      * Remove the order argument from free_pg_vec; free_pg_vec now uses
        kvfree, which does not need an order argument.
      
      * Remove pg_vec_order from struct packet_ring_buffer; there is no longer
        any need to save/restore 'order'.
      
      * Remove the variable 'order' from packet_set_ring; it is now unused.
      Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
      Signed-off-by: Li RongQing <lirongqing@baidu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Change the layout of structure trace_event_raw_fib_table_lookup · 0192e7d4
      Zong Li authored
      There is an unaligned access in the structure
      'trace_event_raw_fib_table_lookup'.
      
      In include/trace/events/fib.h, there is a memory operation that casts
      the 'src' data member to a pointer and then stores a value through
      that pointer:
      
      p32 = (__be32 *) __entry->src;
      *p32 = flp->saddr;
      
      The offset of 'src' in structure trace_event_raw_fib_table_lookup is not
      four-byte aligned. Some architectures do not permit unaligned
      accesses and must pay the price of handling them in an
      exception handler.
      
      Adjust the layout of the structure to avoid this case.
      
      Fixes: 9f323973 ("net/ipv4: Udate fib_table_lookup tracepoint")
      Signed-off-by: Zong Li <zong@andestech.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
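The alignment problem described in this commit can be reproduced with a pair of small structures. This is a sketch under stated assumptions: `struct entry_before` and `struct entry_after` are hypothetical layouts made up for illustration and are not the real trace event structure, and `is_4byte_aligned()` and `store_addr()` are helper names invented here. The point is only that a byte array placed after three single-byte fields lands on an odd offset, and that reordering a one-byte member restores four-byte alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: three byte-sized fields push 'src' to offset 15. */
struct entry_before {
	uint32_t tb_id;
	int32_t  oif;
	int32_t  iif;
	uint8_t  tos;
	uint8_t  scope;
	uint8_t  flags;
	uint8_t  src[4];
	uint8_t  dst[4];
	uint8_t  proto;
};

/* Hypothetical fix: move one byte-sized field up so 'src' starts at 16. */
struct entry_after {
	uint32_t tb_id;
	int32_t  oif;
	int32_t  iif;
	uint8_t  proto;
	uint8_t  tos;
	uint8_t  scope;
	uint8_t  flags;
	uint8_t  src[4];
	uint8_t  dst[4];
};

/* Returns nonzero when an offset is a multiple of four. */
static int is_4byte_aligned(size_t offset)
{
	return offset % 4 == 0;
}

/*
 * Alignment-agnostic store: memcpy has no alignment requirement, unlike
 * dereferencing a cast 32-bit pointer into a byte array.
 */
static void store_addr(uint8_t dst[4], uint32_t addr)
{
	assert(dst);
	memcpy(dst, &addr, sizeof(addr));
}
```

Comparing `offsetof(struct entry_before, src)` with `offsetof(struct entry_after, src)` shows the effect of the reordering. The `memcpy` helper is a different technique from the layout change the commit makes; it is included only as a contrast.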
    • Merge branch 'net-sched-actions-rename-for-grep-ability-and-consistency' · a72ce9ad
      David S. Miller authored
      Jamal Hadi Salim says:
      
      ====================
      net: sched: actions rename for grep-ability and consistency
      
      Having a structure (for example tcf_mirred) and a function with the same name is
      not good for readability or grepability.
      
      This long overdue patchset improves that and makes sure there is consistency
      across all actions.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cpumask: make cpumask_next_wrap available without smp · 9af18e56
      Willem de Bruijn authored
      The kbuild robot shows build failure on machines without CONFIG_SMP:
      
        drivers/net/virtio_net.c:1916:10: error:
          implicit declaration of function 'cpumask_next_wrap'
      
      cpumask_next_wrap is exported from lib/cpumask.o, which has
      
          lib-$(CONFIG_SMP) += cpumask.o
      
      As with other functions, also define it as a static inline in the
      NR_CPUS==1 branch in include/linux/cpumask.h.
      
      If wrap is true and next == start, return nr_cpumask_bits, or 1.
      Else wrap across the range of valid cpus, here [0].
      
      Fixes: 2ca653d6 ("virtio_net: Stripe queue affinities across cores.")
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Tested-by: Krzysztof Kozlowski <krzk@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
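The single-CPU rule stated in this commit message ("if wrap is true and next == start, return nr_cpumask_bits, or 1; else wrap across the range of valid cpus, here [0]") can be sketched as follows. This is an illustration only: `next_wrap_up()`, `count_visited()` and `NR_CPUMASK_BITS_UP` are hypothetical names, and the mask and start arguments of the real helper are omitted on the assumption that only CPU 0 exists.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for nr_cpumask_bits when only one CPU exists. */
#define NR_CPUMASK_BITS_UP 1

/*
 * Single-CPU sketch of a wrapping "next cpu" helper.
 * n is the previously visited cpu, or -1 before the first step.
 */
static inline unsigned int next_wrap_up(int n, bool wrap)
{
	assert(n >= -1 && n < NR_CPUMASK_BITS_UP);
	/* After wrapping back onto cpu 0, signal the end of the iteration. */
	if (wrap && n == 0)
		return NR_CPUMASK_BITS_UP;
	/* Otherwise the only valid cpu is 0. */
	return 0;
}

/* Count how many cpus a wrap-style loop visits with this helper. */
static int count_visited(void)
{
	int visited = 0;
	unsigned int cpu;

	for (cpu = next_wrap_up(-1, false);
	     cpu < NR_CPUMASK_BITS_UP;
	     cpu = next_wrap_up((int)cpu, true))
		visited++;
	return visited;
}
```

The loop in `count_visited()` mirrors the usual pattern of calling the helper once without wrap to find the first CPU and then with wrap until the end marker is returned.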
    • r8169: don't use MSI-X on RTL8168g · 7c53a722
      Heiner Kallweit authored
      There have been two reports that network doesn't come back on resume
      from suspend when using MSI-X. Both cases affect the same chip version
      (RTL8168g - version 40), on different systems. Falling back to MSI
      fixes the issue.
      Even though we don't really have proof yet that the network chip
      version is to blame, let's disable MSI-X for this version.
      Reported-by: Steve Dodd <steved424@gmail.com>
      Reported-by: Lou Reed <gogen@disroot.org>
      Tested-by: Steve Dodd <steved424@gmail.com>
      Tested-by: Lou Reed <gogen@disroot.org>
      Fixes: 6c6aa15f ("r8169: improve interrupt handling")
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'nixge-Minor-cleanups' · 9ebcc22c
      David S. Miller authored
      Moritz Fischer says:
      
      ====================
      net: nixge: Minor cleanups
      
      In preparation for my 64-bit support series, here's some
      minor cleanup that gets rid of unnecessary
      accesses to the descriptor application fields.
      
      I've confirmed that the hardware does not access the fields
      in all our configurations.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: nixge: Don't store skb in app4 field of descriptor · fd5cf434
      Moritz Fischer authored
      Don't store skb in app4 field of descriptor since it is
      not being used anywhere (including hardware).
      Signed-off-by: Moritz Fischer <mdf@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: nixge: Do not zero application specific fields in desc · e158770e
      Moritz Fischer authored
      Do not zero application specific fields in DMA descriptors.
      The hardware does ignore them, so should software.
      Signed-off-by: Moritz Fischer <mdf@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • l2tp: use sk_dst_check() to avoid race on sk->sk_dst_cache · 6d37fa49
      Wei Wang authored
      In l2tp code, if it is an L2TP_UDP_ENCAP tunnel, tunnel->sk points to a
      UDP socket. A user could call sendmsg() on both this tunnel and the UDP
      socket itself concurrently. As l2tp_xmit_skb() holds the socket lock and calls
      __sk_dst_check() to refresh sk->sk_dst_cache, while udpv6_sendmsg() is
      lockless and calls sk_dst_check() to refresh sk->sk_dst_cache, there
      could be a race that causes the dst cache to be freed multiple times.
      So we fix the l2tp side code to always call sk_dst_check() to guarantee
      that xchg() is called when refreshing sk->sk_dst_cache, avoiding race
      conditions.
      
      Syzkaller reported stack trace:
      BUG: KASAN: use-after-free in atomic_read include/asm-generic/atomic-instrumented.h:21 [inline]
      BUG: KASAN: use-after-free in atomic_fetch_add_unless include/linux/atomic.h:575 [inline]
      BUG: KASAN: use-after-free in atomic_add_unless include/linux/atomic.h:597 [inline]
      BUG: KASAN: use-after-free in dst_hold_safe include/net/dst.h:308 [inline]
      BUG: KASAN: use-after-free in ip6_hold_safe+0xe6/0x670 net/ipv6/route.c:1029
      Read of size 4 at addr ffff8801aea9a880 by task syz-executor129/4829
      
      CPU: 0 PID: 4829 Comm: syz-executor129 Not tainted 4.18.0-rc7-next-20180802+ #30
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
       print_address_description+0x6c/0x20b mm/kasan/report.c:256
       kasan_report_error mm/kasan/report.c:354 [inline]
       kasan_report.cold.7+0x242/0x30d mm/kasan/report.c:412
       check_memory_region_inline mm/kasan/kasan.c:260 [inline]
       check_memory_region+0x13e/0x1b0 mm/kasan/kasan.c:267
       kasan_check_read+0x11/0x20 mm/kasan/kasan.c:272
       atomic_read include/asm-generic/atomic-instrumented.h:21 [inline]
       atomic_fetch_add_unless include/linux/atomic.h:575 [inline]
       atomic_add_unless include/linux/atomic.h:597 [inline]
       dst_hold_safe include/net/dst.h:308 [inline]
       ip6_hold_safe+0xe6/0x670 net/ipv6/route.c:1029
       rt6_get_pcpu_route net/ipv6/route.c:1249 [inline]
       ip6_pol_route+0x354/0xd20 net/ipv6/route.c:1922
       ip6_pol_route_output+0x54/0x70 net/ipv6/route.c:2098
       fib6_rule_lookup+0x283/0x890 net/ipv6/fib6_rules.c:122
       ip6_route_output_flags+0x2c5/0x350 net/ipv6/route.c:2126
       ip6_dst_lookup_tail+0x1278/0x1da0 net/ipv6/ip6_output.c:978
       ip6_dst_lookup_flow+0xc8/0x270 net/ipv6/ip6_output.c:1079
       ip6_sk_dst_lookup_flow+0x5ed/0xc50 net/ipv6/ip6_output.c:1117
       udpv6_sendmsg+0x2163/0x36b0 net/ipv6/udp.c:1354
       inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798
       sock_sendmsg_nosec net/socket.c:622 [inline]
       sock_sendmsg+0xd5/0x120 net/socket.c:632
       ___sys_sendmsg+0x51d/0x930 net/socket.c:2115
       __sys_sendmmsg+0x240/0x6f0 net/socket.c:2210
       __do_sys_sendmmsg net/socket.c:2239 [inline]
       __se_sys_sendmmsg net/socket.c:2236 [inline]
       __x64_sys_sendmmsg+0x9d/0x100 net/socket.c:2236
       do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x446a29
      Code: e8 ac b8 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 eb 08 fc ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007f4de5532db8 EFLAGS: 00000246 ORIG_RAX: 0000000000000133
      RAX: ffffffffffffffda RBX: 00000000006dcc38 RCX: 0000000000446a29
      RDX: 00000000000000b8 RSI: 0000000020001b00 RDI: 0000000000000003
      RBP: 00000000006dcc30 R08: 00007f4de5533700 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00000000006dcc3c
      R13: 00007ffe2b830fdf R14: 00007f4de55339c0 R15: 0000000000000001
      
      Fixes: 71b1391a ("l2tp: ensure sk->dst is still valid")
      Reported-by: syzbot+05f840f3b04f211bad55@syzkaller.appspotmail.com
      Signed-off-by: Wei Wang <weiwan@google.com>
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Cc: Guillaume Nault <g.nault@alphalink.fr>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6d37fa49
    • ipv6: Add icmp_echo_ignore_all support for ICMPv6 · e6f86b0f
      Virgile Jarry authored
      Preventing the kernel from responding to ICMP Echo Request messages
      can be useful in several ways. The sysctl parameter
      'icmp_echo_ignore_all' can be used to prevent the kernel from
      responding to IPv4 ICMP echo requests. For IPv6 pings, such
      a sysctl kernel parameter did not exist.
      
      Add the ability to prevent the kernel from responding to IPv6
      ICMP echo requests through the use of the following sysctl
      parameter: /proc/sys/net/ipv6/icmp/echo_ignore_all.
      Update the documentation to reflect this change.
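
      A minimal user-space sketch of toggling this parameter is shown below. It
      assumes the file accepts "0" or "1" like other boolean sysctls;
      write_sysctl_flag() is a hypothetical helper, not part of the patch, and
      writing to the real path normally requires root.

```c
#include <assert.h>
#include <stdio.h>

/* Path named in the commit message above. */
#define ICMPV6_ECHO_IGNORE_ALL "/proc/sys/net/ipv6/icmp/echo_ignore_all"

/*
 * Hypothetical helper: writes "0\n" or "1\n" to the given sysctl file.
 * Returns 0 on success and -1 on failure (for example, when the file
 * cannot be opened because the caller lacks privileges).
 */
int write_sysctl_flag(const char *path, int enable)
{
	FILE *f;
	int ok;

	assert(path != NULL);

	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fprintf(f, "%d\n", enable ? 1 : 0) > 0;
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

      Calling write_sysctl_flag(ICMPV6_ECHO_IGNORE_ALL, 1) would ask the kernel
      to stop answering IPv6 echo requests; passing 0 would restore replies.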
      Signed-off-by: Virgile Jarry <virgile@acceis.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e6f86b0f
    • Merge branch 'net-tls-Combined-memory-allocation-for-decryption-request' · 8f780044
      David S. Miller authored
      Vakul Garg says:
      
      ====================
      net/tls: Combined memory allocation for decryption request
      
      This patch does a combined memory allocation from heap for scatterlists,
      aead_request, aad and iv for the tls record decryption path. In present
      code, aead_request is allocated from heap, scatterlists on a conditional
      basis are allocated on heap or on stack. This is inefficient as it may
      require multiple kmalloc/kfree.
      
      The initialization vector passed in the crypto request is allocated on
      stack. This is a problem since the stack memory is not dma-able from
      crypto accelerators.
      
      Doing one combined memory allocation for each decryption request fixes
      both the above issues. It also paves the way to be able to submit multiple
      async decryption requests while the previous one is pending i.e. being
      processed or queued.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8f780044
    • net/tls: Combined memory allocation for decryption request · 0b243d00
      Vakul Garg authored
      To prepare a decryption request, several memory chunks are required
      (aead_req, sgin, sgout, iv, aad). To submit the decrypt request to
      an accelerator, the buffers read by the accelerator must be dma-able
      and must not come from the stack. The buffers for aad and iv could
      each be kmalloced separately, but that is inefficient.
      This patch does a combined allocation for preparing the decryption
      request and then segments it into aead_req || sgin || sgout || iv || aad.
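
      The idea of one allocation carved into consecutive regions can be
      sketched in user-space C as follows. This is an illustration only:
      struct dec_req_mem, dec_req_alloc() and dec_req_free() are hypothetical
      names, calloc() stands in for the kernel allocator, and the sketch
      ignores the alignment and size-overflow handling a real implementation
      would need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical descriptor for one heap block split into five regions in
 * the order given above: aead_req || sgin || sgout || iv || aad.
 */
struct dec_req_mem {
	void *base;              /* the single allocation; free only this */
	unsigned char *aead_req;
	unsigned char *sgin;
	unsigned char *sgout;
	unsigned char *iv;
	unsigned char *aad;
};

/* Allocate once, then hand out consecutive slices. Returns 0 or -1. */
int dec_req_alloc(struct dec_req_mem *m, size_t req_sz, size_t sgin_sz,
		  size_t sgout_sz, size_t iv_sz, size_t aad_sz)
{
	size_t total = req_sz + sgin_sz + sgout_sz + iv_sz + aad_sz;
	unsigned char *p;

	assert(m != NULL);

	p = calloc(1, total ? total : 1);
	if (!p)
		return -1;

	m->base = p;
	m->aead_req = p;
	p += req_sz;
	m->sgin = p;
	p += sgin_sz;
	m->sgout = p;
	p += sgout_sz;
	m->iv = p;
	p += iv_sz;
	m->aad = p;
	return 0;
}

/* One free releases every region, since they share one allocation. */
void dec_req_free(struct dec_req_mem *m)
{
	free(m->base);
	m->base = NULL;
}
```

      Because iv and aad live in the same heap block as the request, none of
      the buffers handed to an accelerator would come from the stack.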
      Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0b243d00
    • Bluetooth: mediatek: pass correct size to h4_recv_buf() · 330ad75f
      Dan Carpenter authored
      We're supposed to pass the number of elements in the mtk_recv_pkts, not
      the number of bytes.
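
      The difference between the two sizes can be illustrated with a small
      stand-alone sketch. struct pkt_desc, demo_pkts and the functions below
      are hypothetical stand-ins, not the driver's real types, and
      DEMO_ARRAY_SIZE() only mimics the kernel's ARRAY_SIZE() without its
      type checking.

```c
#include <assert.h>
#include <stddef.h>

/* Element count of a true array (not a pointer). */
#define DEMO_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical packet descriptor used only for this illustration. */
struct pkt_desc {
	int type;
	int hlen;
	int maxlen;
};

static const struct pkt_desc demo_pkts[] = {
	{ 1, 2, 16 },
	{ 2, 4, 32 },
	{ 3, 8, 64 },
};

/* What a callee that iterates over entries expects: the element count. */
size_t demo_pkts_count(void)
{
	return DEMO_ARRAY_SIZE(demo_pkts);
}

/* The byte size; passing this as a count would overrun the table. */
size_t demo_pkts_bytes(void)
{
	return sizeof(demo_pkts);
}

/* Walks exactly `count` entries, as a table-driven receiver would. */
int sum_types(const struct pkt_desc *pkts, size_t count)
{
	int sum = 0;
	size_t i;

	assert(pkts != NULL);
	for (i = 0; i < count; i++)
		sum += pkts[i].type;
	return sum;
}
```

      Here sum_types(demo_pkts, demo_pkts_count()) visits the three entries,
      whereas passing demo_pkts_bytes() would make the loop read past the end
      of the array.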
      
      Fixes: 7237c4c9 ("Bluetooth: mediatek: Add protocol support for MediaTek serial devices")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
      330ad75f
  2. 12 Aug, 2018 6 commits
    • Merge branch 'bpf-ancestor-cgroup-id' · 2ce3206b
      Daniel Borkmann authored
      Andrey Ignatov says:
      
      ====================
      This patch set adds a new BPF helper, bpf_skb_ancestor_cgroup_id, that
      returns the id of the cgroup v2 that is the ancestor, at ancestor_level,
      of the cgroup associated with the skb.

      The helper is useful to implement policies in TC based on cgroups that are
      higher in the hierarchy than the immediate cgroup associated with the skb.
      
      v1->v2:
      - more reliable check for testing IPv6 to become ready in selftest.
      ====================
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      2ce3206b
    • selftests/bpf: Selftest for bpf_skb_ancestor_cgroup_id · 5ecd8c22
      Andrey Ignatov authored
      Add selftests for bpf_skb_ancestor_cgroup_id helper.
      
      test_skb_cgroup_id.sh prepares testing interface and adds tc qdisc and
      filter for it using BPF object compiled from test_skb_cgroup_id_kern.c
      program.
      
      BPF program in test_skb_cgroup_id_kern.c gets ancestor cgroup id using
      the new helper at different levels of cgroup hierarchy that skb belongs
      to, including root level and non-existing level, and saves it to the map
      where the key is the level of corresponding cgroup and the value is its
      id.
      
      To trigger BPF program, user space program test_skb_cgroup_id_user is
      run. It adds itself into testing cgroup and sends UDP datagram to
      link-local multicast address of testing interface. Then it reads cgroup
      ids saved in kernel for different levels from the BPF map and compares
      them with those in user space. They must be equal for every level of
      ancestry.
      
      Example of run:
        # ./test_skb_cgroup_id.sh
        Wait for testing link-local IP to become available ... OK
        Note: 8 bytes struct bpf_elf_map fixup performed due to size mismatch!
        [PASS]
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      5ecd8c22
    • selftests/bpf: Add cgroup id helpers to bpf_helpers.h · 02f6ac74
      Andrey Ignatov authored
      Add bpf_skb_cgroup_id and bpf_skb_ancestor_cgroup_id helpers to
      bpf_helpers.h to use them in tests and samples.
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      02f6ac74
    • bpf: Sync bpf.h to tools/ · 539764d0
      Andrey Ignatov authored
      Sync skb_ancestor_cgroup_id() related bpf UAPI changes to tools/.
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      539764d0
    • bpf: Introduce bpf_skb_ancestor_cgroup_id helper · 77236281
      Andrey Ignatov authored
      == Problem description ==
      
      It's useful to be able to identify cgroup associated with skb in TC so
      that a policy can be applied to this skb, and existing bpf_skb_cgroup_id
      helper can help with this.
      
      In real life, though, the cgroup hierarchy and the hierarchy a policy
      applies to don't map 1:1.
      
      It's often the case that there is a container and corresponding cgroup,
      but there are many more sub-cgroups inside container, e.g. because it's
      delegated to containerized application to control resources for its
      subsystems, or to separate application inside container from infra that
      belongs to containerization system (e.g. sshd).
      
      At the same time it may be useful to apply a policy to container as a
      whole.
      
      If multiple containers like this are run on a host (which is often the
      case) and many of them have sub-cgroups, it may not be possible to apply
      per-container policy in TC with existing helpers such as
      bpf_skb_under_cgroup or bpf_skb_cgroup_id:
      
      * bpf_skb_cgroup_id will return id of immediate cgroup associated with
        skb, i.e. if it's a sub-cgroup inside container, it can't be used to
        identify container's cgroup;
      
      * bpf_skb_under_cgroup can work only with one cgroup and doesn't scale,
        i.e. if there are N containers on a host and a policy has to be
        applied to M of them (0 <= M <= N), it'd require M calls to
        bpf_skb_under_cgroup, and, if M changes, it'd require rebuilding and
        loading a new BPF program.
      
      == Solution ==
      
      The patch introduces new helper bpf_skb_ancestor_cgroup_id that can be
      used to get id of cgroup v2 that is an ancestor of cgroup associated
      with skb at specified level of cgroup hierarchy.
      
      That way admin can place all containers on one level of cgroup hierarchy
      (which is good practice in general and already used in many
      configurations) and identify specific cgroup on this level no matter
      what sub-cgroup skb is associated with.
      
      E.g. if there is a cgroup hierarchy:
        root/
        root/container1/
        root/container1/app11/
        root/container1/app11/sub-app-a/
        root/container1/app12/
        root/container2/
        root/container2/app21/
        root/container2/app22/
        root/container2/app22/sub-app-b/
      
      then, given an skb associated with root/container1/app11/sub-app-a/, it's
      possible to get the ancestor at level 1, which is container1, and apply
      the policy for this container, or apply another policy if it's container2.
      
      Policies can be kept e.g. in a hash map where key is a container cgroup
      id and value is an action.
      
      Levels where container cgroups are created are usually known in advance,
      whereas the cgroup hierarchy inside a container may be hard to predict,
      especially when its creation is delegated to the containerized
      application.
      
      == Implementation details ==
      
      The helper gets ancestor by walking parents up to specified level.
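
      The parent walk can be sketched in stand-alone C as follows. This is
      not the kernel code: struct demo_cgroup is a hypothetical, simplified
      node, and returning 0 for a level that does not exist is an assumption
      of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified node; the kernel's struct cgroup is far richer. */
struct demo_cgroup {
	struct demo_cgroup *parent; /* NULL for the root */
	int level;                  /* the root is level 0 */
	unsigned long long id;
};

/*
 * Walk parent pointers until the node at ancestor_level is reached.
 * Returns NULL if ancestor_level is negative or deeper than cg itself.
 */
const struct demo_cgroup *demo_ancestor(const struct demo_cgroup *cg,
					int ancestor_level)
{
	assert(cg != NULL);

	if (ancestor_level < 0 || ancestor_level > cg->level)
		return NULL;

	while (cg && cg->level > ancestor_level)
		cg = cg->parent;

	return cg;
}

/* Id of the ancestor at the given level, or 0 if there is none. */
unsigned long long demo_ancestor_id(const struct demo_cgroup *cg,
				    int ancestor_level)
{
	const struct demo_cgroup *a = demo_ancestor(cg, ancestor_level);

	return a ? a->id : 0;
}
```

      With the example hierarchy above, asking for level 1 from the node for
      root/container1/app11/sub-app-a/ would yield the id of container1. The
      cost of the walk is bounded by the depth of the starting cgroup.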
      
      Another option would be to get a different kind of "id" from
      cgroup->ancestor_ids[level] and use it with idr_find() to get the struct
      cgroup for the ancestor. But that would require a radix lookup, which
      doesn't seem to be better (at least it's not obviously better).
      
      Format of return value of the new helper is same as that of
      bpf_skb_cgroup_id.
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      77236281
    • bpf: decouple btf from seq bpf fs dump and enable more maps · e8d2bec0
      Daniel Borkmann authored
      Commit a26ca7c9 ("bpf: btf: Add pretty print support to
      the basic arraymap") and 699c86d6 ("bpf: btf: add pretty
      print for hash/lru_hash maps") enabled support for BTF and
      dumping via BPF fs for array and hash/lru map. However, both
      can be decoupled from each other such that regular BPF maps
      can be supported for attaching BTF key/value information,
      while not all maps necessarily need to dump via map_seq_show_elem()
      callback.
      
      The basic sanity check which is a prerequisite for all maps
      is that key/value size has to match in any case, and some maps
      can have extra checks via map_check_btf() callback, e.g.
      probing certain types or indicating no support in general. With
      that we can also enable retrieving BTF info for per-cpu map
      types and lpm.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Yonghong Song <yhs@fb.com>
      e8d2bec0