1. 05 Apr, 2019 4 commits
    • ipv6: sit: reset ip header pointer in ipip6_rcv · bb9bd814
      Lorenzo Bianconi authored
      ipip6 tunnels run iptunnel_pull_header on received skbs. This can
      cause the following use-after-free when accessing the iph pointer, since
      the packet will be 'uncloned' by running pskb_expand_head if it is a
      cloned gso skb (e.g. if the packet has been sent through a veth device):
      
      [  706.369655] BUG: KASAN: use-after-free in ipip6_rcv+0x1678/0x16e0 [sit]
      [  706.449056] Read of size 1 at addr ffffe01b6bd855f5 by task ksoftirqd/1/=
      [  706.669494] Hardware name: HPE ProLiant m400 Server/ProLiant m400 Server, BIOS U02 08/19/2016
      [  706.771839] Call trace:
      [  706.801159]  dump_backtrace+0x0/0x2f8
      [  706.845079]  show_stack+0x24/0x30
      [  706.884833]  dump_stack+0xe0/0x11c
      [  706.925629]  print_address_description+0x68/0x260
      [  706.982070]  kasan_report+0x178/0x340
      [  707.025995]  __asan_report_load1_noabort+0x30/0x40
      [  707.083481]  ipip6_rcv+0x1678/0x16e0 [sit]
      [  707.132623]  tunnel64_rcv+0xd4/0x200 [tunnel4]
      [  707.185940]  ip_local_deliver_finish+0x3b8/0x988
      [  707.241338]  ip_local_deliver+0x144/0x470
      [  707.289436]  ip_rcv_finish+0x43c/0x14b0
      [  707.335447]  ip_rcv+0x628/0x1138
      [  707.374151]  __netif_receive_skb_core+0x1670/0x2600
      [  707.432680]  __netif_receive_skb+0x28/0x190
      [  707.482859]  process_backlog+0x1d0/0x610
      [  707.529913]  net_rx_action+0x37c/0xf68
      [  707.574882]  __do_softirq+0x288/0x1018
      [  707.619852]  run_ksoftirqd+0x70/0xa8
      [  707.662734]  smpboot_thread_fn+0x3a4/0x9e8
      [  707.711875]  kthread+0x2c8/0x350
      [  707.750583]  ret_from_fork+0x10/0x18
      
      [  707.811302] Allocated by task 16982:
      [  707.854182]  kasan_kmalloc.part.1+0x40/0x108
      [  707.905405]  kasan_kmalloc+0xb4/0xc8
      [  707.948291]  kasan_slab_alloc+0x14/0x20
      [  707.994309]  __kmalloc_node_track_caller+0x158/0x5e0
      [  708.053902]  __kmalloc_reserve.isra.8+0x54/0xe0
      [  708.108280]  __alloc_skb+0xd8/0x400
      [  708.150139]  sk_stream_alloc_skb+0xa4/0x638
      [  708.200346]  tcp_sendmsg_locked+0x818/0x2b90
      [  708.251581]  tcp_sendmsg+0x40/0x60
      [  708.292376]  inet_sendmsg+0xf0/0x520
      [  708.335259]  sock_sendmsg+0xac/0xf8
      [  708.377096]  sock_write_iter+0x1c0/0x2c0
      [  708.424154]  new_sync_write+0x358/0x4a8
      [  708.470162]  __vfs_write+0xc4/0xf8
      [  708.510950]  vfs_write+0x12c/0x3d0
      [  708.551739]  ksys_write+0xcc/0x178
      [  708.592533]  __arm64_sys_write+0x70/0xa0
      [  708.639593]  el0_svc_handler+0x13c/0x298
      [  708.686646]  el0_svc+0x8/0xc
      
      [  708.739019] Freed by task 17:
      [  708.774597]  __kasan_slab_free+0x114/0x228
      [  708.823736]  kasan_slab_free+0x10/0x18
      [  708.868703]  kfree+0x100/0x3d8
      [  708.905320]  skb_free_head+0x7c/0x98
      [  708.948204]  skb_release_data+0x320/0x490
      [  708.996301]  pskb_expand_head+0x60c/0x970
      [  709.044399]  __iptunnel_pull_header+0x3b8/0x5d0
      [  709.098770]  ipip6_rcv+0x41c/0x16e0 [sit]
      [  709.146873]  tunnel64_rcv+0xd4/0x200 [tunnel4]
      [  709.200195]  ip_local_deliver_finish+0x3b8/0x988
      [  709.255596]  ip_local_deliver+0x144/0x470
      [  709.303692]  ip_rcv_finish+0x43c/0x14b0
      [  709.349705]  ip_rcv+0x628/0x1138
      [  709.388413]  __netif_receive_skb_core+0x1670/0x2600
      [  709.446943]  __netif_receive_skb+0x28/0x190
      [  709.497120]  process_backlog+0x1d0/0x610
      [  709.544169]  net_rx_action+0x37c/0xf68
      [  709.589131]  __do_softirq+0x288/0x1018
      
      [  709.651938] The buggy address belongs to the object at ffffe01b6bd85580
                      which belongs to the cache kmalloc-1024 of size 1024
      [  709.804356] The buggy address is located 117 bytes inside of
                      1024-byte region [ffffe01b6bd85580, ffffe01b6bd85980)
      [  709.946340] The buggy address belongs to the page:
      [  710.003824] page:ffff7ff806daf600 count:1 mapcount:0 mapping:ffffe01c4001f600 index:0x0
      [  710.099914] flags: 0xfffff8000000100(slab)
      [  710.149059] raw: 0fffff8000000100 dead000000000100 dead000000000200 ffffe01c4001f600
      [  710.242011] raw: 0000000000000000 0000000000380038 00000001ffffffff 0000000000000000
      [  710.334966] page dumped because: kasan: bad access detected
      
      Fix it by resetting the iph pointer after iptunnel_pull_header.
      
      Fixes: a09a4c8d ("tunnels: Remove encapsulation offloads on decap")
      Tested-by: Jianlin Shi <jishi@redhat.com>
      Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
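      The pattern behind this fix can be sketched in userspace C. This is only a model, not the sit driver code: struct pkt, pkt_expand_head() and read_ttl_after_pull() are hypothetical names, and pkt_expand_head() merely imitates the way pskb_expand_head can move packet data to a new buffer and free the old one.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an skb: one heap buffer holding the headers. */
struct pkt {
    unsigned char *head;
    size_t len;
};

/* Models a head expansion: copy the data to a new buffer, free the old one. */
static int pkt_expand_head(struct pkt *p, size_t extra)
{
    unsigned char *n = malloc(p->len + extra);

    if (!n)
        return -1;
    memcpy(n, p->head, p->len);
    free(p->head);
    p->head = n;
    return 0;
}

/* Reads the TTL byte (offset 8 of an IPv4 header) after a possible realloc. */
static int read_ttl_after_pull(struct pkt *p)
{
    const unsigned char *iph = p->head;   /* header pointer cached early */

    if (pkt_expand_head(p, 64))
        return -1;

    /* The cached pointer now refers to freed memory, so derive it again. */
    iph = p->head;
    return iph[8];
}
```

      Without the second assignment to iph, the final read would touch the freed buffer, which is the class of access KASAN reports above.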
    • net: bridge: always clear mcast matching struct on reports and leaves · 1515a63f
      Nikolay Aleksandrov authored
      We need to be careful and always zero the whole br_ip struct when it is
      used for matching since the rhashtable change. This patch fixes all the
      places which didn't properly clear it which in turn might've caused
      mismatches.
      
      Thanks for the great bug report with reproducing steps and bisection.
      
      Steps to reproduce (from the bug report):
      ip link add br0 type bridge mcast_querier 1
      ip link set br0 up
      
      ip link add v2 type veth peer name v3
      ip link set v2 master br0
      ip link set v2 up
      ip link set v3 up
      ip addr add 3.0.0.2/24 dev v3
      
      ip netns add test
      ip link add v1 type veth peer name v1 netns test
      ip link set v1 master br0
      ip link set v1 up
      ip -n test link set v1 up
      ip -n test addr add 3.0.0.1/24 dev v1
      
      # Multicast receiver
      ip netns exec test socat \
        UDP4-RECVFROM:5588,ip-add-membership=224.224.224.224:3.0.0.1,fork -
      
      # Multicast sender
      echo hello | nc -u -s 3.0.0.2 224.224.224.224 5588
      
      Reported-by: liam.mcbirnie@boeing.com
      Fixes: 19e3a9c9 ("net: bridge: convert multicast to generic rhashtable")
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
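      Why the whole struct has to be cleared can be illustrated with a small userspace sketch. It assumes the key is compared byte-for-byte, as a memcmp-based hash table lookup would do; struct grp_key and both helpers are hypothetical and only loosely modelled on a multicast group key, not copied from the bridge code.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical lookup key; the union leaves bytes unused for IPv4. */
struct grp_key {
    union {
        uint32_t ip4;
        uint8_t  ip6[16];
    } u;
    uint16_t proto;
    uint16_t vid;
};

/* Build an IPv4 key, clearing unused union bytes and any padding first. */
static void key_init_v4(struct grp_key *k, uint32_t ip4, uint16_t vid)
{
    memset(k, 0, sizeof(*k));
    k->u.ip4 = ip4;
    k->proto = 0x0800;
    k->vid = vid;
}

/* Byte-wise comparison: stale bytes anywhere in the key cause a mismatch. */
static int key_equal(const struct grp_key *a, const struct grp_key *b)
{
    return memcmp(a, b, sizeof(*a)) == 0;
}
```

      If key_init_v4() skipped the memset, two keys for the same group built on different stack frames could differ in the 12 union bytes that IPv4 does not use, and the lookup would miss.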
    • libcxgb: fix incorrect ppmax calculation · cc5a726c
      Varun Prakash authored
      BITS_TO_LONGS() uses DIV_ROUND_UP(), so the computed ppmax
      value can be greater than the number of available per-cpu
      page pods.
      
      This patch removes BITS_TO_LONGS() to fix this
      issue.
      Signed-off-by: Varun Prakash <varun@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
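      The rounding effect can be shown in isolation. The macros below are userspace re-definitions written for this sketch, and bits_after_round_trip() is a hypothetical helper; the exact libcxgb expression is not reproduced here, so treat this only as an illustration of why rounding up to whole longs can overshoot a bit count.

```c
#include <limits.h>

/* Userspace re-definitions for illustration only. */
#define DIV_ROUND_UP(n, d)  (((n) + (d) - 1) / (d))
#define BITS_PER_LONG       (sizeof(long) * CHAR_BIT)
#define BITS_TO_LONGS(nr)   DIV_ROUND_UP(nr, BITS_PER_LONG)

/* Bits covered by the longs needed to hold nr bits; never less than nr. */
static unsigned long bits_after_round_trip(unsigned long nr)
{
    return BITS_TO_LONGS(nr) * BITS_PER_LONG;
}
```

      For a count of 100 the round trip yields 128 with either 32-bit or 64-bit longs, so a limit derived this way would cover 28 entries that do not exist.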
    • vlan: conditional inclusion of FCoE hooks to match netdevice.h and bnx2x · 0a89eb92
      Chris Leech authored
      Way back in 3c9c36bc the
      ndo_fcoe_get_wwn pointer was switched from depending on CONFIG_FCOE to
      CONFIG_LIBFCOE in order to allow building FCoE support into the bnx2x
      driver and used by bnx2fc without including the generic software fcoe
      module.
      
      But, FCoE is generally used over an 802.1q VLAN, and the implementation
      of ndo_fcoe_get_wwn in the 8021q module was not similarly changed.  The
      result is that if CONFIG_FCOE is disabled, then bnx2fc cannot make a
      call to ndo_fcoe_get_wwn through the 8021q interface to the underlying
      bnx2x interface.  The bnx2fc driver then falls back to a potentially
      different mapping of Ethernet MAC to Fibre Channel WWN, creating an
      incompatibility with the fabric and target configurations when compared
      to the WWNs used by pre-boot firmware and differently-configured
      kernels.
      
      So make the conditional inclusion of FCoE code in 8021q match the
      conditional inclusion in netdevice.h
      Signed-off-by: Chris Leech <cleech@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 04 Apr, 2019 17 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 5ba57801
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2019-04-04
      
      The following pull-request contains BPF updates for your *net* tree.
      
      The main changes are:
      
      1) Batch of fixes to the existing BPF flow dissector API to support
         calling BPF programs from the eth_get_headlen context (support for
         latter is planned to be added in bpf-next), from Stanislav.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'sch_cake-fixes' · 3baf5c2d
      David S. Miller authored
      Toke Høiland-Jørgensen says:
      
      ====================
      sched: A few small fixes for sch_cake
      
      Kevin noticed a few issues with the way CAKE reads the skb protocol and the IP
      diffserv fields. This series fixes those two issues, and should probably go
      into 4.19 as well. However, the previous refactoring patch means they don't apply
      as-is; I can send a follow-up directly to stable if that's OK with you?
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sch_cake: Make sure we can write the IP header before changing DSCP bits · c87b4ecd
      Toke Høiland-Jørgensen authored
      There is not actually any guarantee that the IP headers are valid before we
      access the DSCP bits of the packets. Fix this using the same approach taken
      in sch_dsmark.
      Reported-by: Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk>
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sch_cake: Use tc_skb_protocol() helper for getting packet protocol · b2100cc5
      Toke Høiland-Jørgensen authored
      We shouldn't be using skb->protocol directly as that will miss cases with
      hardware-accelerated VLAN tags. Use the helper instead to get the right
      protocol number.
      Reported-by: Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk>
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: Ensure DCTCP reacts to losses · aecfde23
      Koen De Schepper authored
      RFC8257 §3.5 explicitly states that "A DCTCP sender MUST react to
      loss episodes in the same way as conventional TCP".
      
      Currently, Linux DCTCP performs no cwnd reduction when losses
      are encountered. Optionally, the dctcp_clamp_alpha_on_loss parameter
      resets alpha to its maximal value if an RTO happens. This behavior
      is sub-optimal for at least two reasons: i) it ignores losses
      triggering fast retransmissions; and ii) it causes an unnecessarily large
      cwnd reduction in the future if the loss was isolated, as it resets
      the historical term of DCTCP's alpha EWMA to its maximal value (i.e.,
      denoting total congestion). The second reason has an especially
      noticeable effect when using DCTCP in high BDP environments, where
      alpha normally stays at low values.
      
      This patch replaces the clamping of alpha by setting ssthresh to
      half of cwnd for both fast retransmissions and RTOs, at most once
      per RTT. Consequently, the dctcp_clamp_alpha_on_loss module parameter
      has been removed.
      
      The table below shows experimental results where we measured the
      drop probability of a PIE AQM (not applying ECN marks) at a
      bottleneck in the presence of a single TCP flow with either the
      alpha-clamping option enabled or the cwnd halving proposed by this
      patch. Results using reno or cubic are given for comparison.
      
                                |  Link   |   RTT    |    Drop
                       TCP CC   |  speed  | base+AQM | probability
              ==================|=========|==========|============
                          CUBIC |  40Mbps |  7+20ms  |    0.21%
                           RENO |         |          |    0.19%
              DCTCP-CLAMP-ALPHA |         |          |   25.80%
               DCTCP-HALVE-CWND |         |          |    0.22%
              ------------------|---------|----------|------------
                          CUBIC | 100Mbps |  7+20ms  |    0.03%
                           RENO |         |          |    0.02%
              DCTCP-CLAMP-ALPHA |         |          |   23.30%
               DCTCP-HALVE-CWND |         |          |    0.04%
              ------------------|---------|----------|------------
                          CUBIC | 800Mbps |   1+1ms  |    0.04%
                           RENO |         |          |    0.05%
              DCTCP-CLAMP-ALPHA |         |          |   18.70%
               DCTCP-HALVE-CWND |         |          |    0.06%
      
      We see that, without halving its cwnd for all sources of loss,
      DCTCP drives the AQM to large drop probabilities in order to keep
      the queue length under control (i.e., it repeatedly faces RTOs).
      Instead, if DCTCP reacts to all sources of loss, it can then be
      controlled by the AQM using drop levels similar to those of cubic or reno.
      Signed-off-by: Koen De Schepper <koen.de_schepper@nokia-bell-labs.com>
      Signed-off-by: Olivier Tilmans <olivier.tilmans@nokia-bell-labs.com>
      Cc: Bob Briscoe <research@bobbriscoe.net>
      Cc: Lawrence Brakmo <brakmo@fb.com>
      Cc: Florian Westphal <fw@strlen.de>
      Cc: Daniel Borkmann <borkmann@iogearbox.net>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Andrew Shewmaker <agshew@gmail.com>
      Cc: Glenn Judd <glenn.judd@morganstanley.com>
      Acked-by: Florian Westphal <fw@strlen.de>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
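      The loss reaction described in this commit (set ssthresh to half of cwnd, at most once per RTT) can be modelled roughly as follows. This is a sketch under assumptions, not the tcp_dctcp.c implementation: struct cc_state, the reacted_this_rtt flag, the floor of 2 segments and both function names are hypothetical choices made for the example.

```c
#include <stdint.h>

/* Hypothetical per-connection congestion state, in segments. */
struct cc_state {
    uint32_t cwnd;
    uint32_t ssthresh;
    int      reacted_this_rtt;   /* assumed once-per-RTT guard */
};

/* On a loss (fast retransmit or RTO): halve once, ignore repeats in the RTT. */
static void react_to_loss(struct cc_state *s)
{
    uint32_t half;

    if (s->reacted_this_rtt)
        return;
    half = s->cwnd / 2;
    s->ssthresh = half < 2 ? 2 : half;   /* floor of 2 is an assumption */
    s->reacted_this_rtt = 1;
}

/* Called when a new RTT starts, re-arming the loss reaction. */
static void on_rtt_end(struct cc_state *s)
{
    s->reacted_this_rtt = 0;
}
```

      With cwnd at 100 segments, the first loss in an RTT sets ssthresh to 50 and any further loss in the same RTT leaves it unchanged.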
    • net/sched: act_sample: fix divide by zero in the traffic path · fae27081
      Davide Caratti authored
      The control path of the 'sample' action does not validate the value of
      'rate' provided by the user, but then uses it as a divisor in the traffic
      path. Validate it in tcf_sample_init(), and return -EINVAL with a proper
      extack message if that value is zero, to fix a splat with the script below:
      
       # tc f a dev test0 egress matchall action sample rate 0 group 1 index 2
       # tc -s a s action sample
       total acts 1
      
               action order 0: sample rate 1/0 group 1 pipe
                index 2 ref 1 bind 1 installed 19 sec used 19 sec
               Action statistics:
               Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
               backlog 0b 0p requeues 0
       # ping 192.0.2.1 -I test0 -c1 -q
      
       divide error: 0000 [#1] SMP PTI
       CPU: 1 PID: 6192 Comm: ping Not tainted 5.1.0-rc2.diag2+ #591
       Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
       RIP: 0010:tcf_sample_act+0x9e/0x1e0 [act_sample]
       Code: 6a f1 85 c0 74 0d 80 3d 83 1a 00 00 00 0f 84 9c 00 00 00 4d 85 e4 0f 84 85 00 00 00 e8 9b d7 9c f1 44 8b 8b e0 00 00 00 31 d2 <41> f7 f1 85 d2 75 70 f6 85 83 00 00 00 10 48 8b 45 10 8b 88 08 01
       RSP: 0018:ffffae320190ba30 EFLAGS: 00010246
       RAX: 00000000b0677d21 RBX: ffff8af1ed9ec000 RCX: 0000000059a9fe49
       RDX: 0000000000000000 RSI: 000000000c7e33b7 RDI: ffff8af23daa0af0
       RBP: ffff8af1ee11b200 R08: 0000000074fcaf7e R09: 0000000000000000
       R10: 0000000000000050 R11: ffffffffb3088680 R12: ffff8af232307f80
       R13: 0000000000000003 R14: ffff8af1ed9ec000 R15: 0000000000000000
       FS:  00007fe9c6d2f740(0000) GS:ffff8af23da80000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 00007fff6772f000 CR3: 00000000746a2004 CR4: 00000000001606e0
       Call Trace:
        tcf_action_exec+0x7c/0x1c0
        tcf_classify+0x57/0x160
        __dev_queue_xmit+0x3dc/0xd10
        ip_finish_output2+0x257/0x6d0
        ip_output+0x75/0x280
        ip_send_skb+0x15/0x40
        raw_sendmsg+0xae3/0x1410
        sock_sendmsg+0x36/0x40
        __sys_sendto+0x10e/0x140
        __x64_sys_sendto+0x24/0x30
        do_syscall_64+0x60/0x210
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
        [...]
        Kernel panic - not syncing: Fatal exception in interrupt
      
      Add a TDC selftest to document that 'rate' is now being validated.
      Reported-by: Matteo Croce <mcroce@redhat.com>
      Fixes: 5c5670fa ("net/sched: Introduce sample tc action")
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Acked-by: Yotam Gigi <yotam.gi@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
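      The idea of rejecting a zero rate in the control path, so that the traffic path can divide safely, can be sketched as below. All names are hypothetical: sample_init() stands in for the validation added to tcf_sample_init(), the errmsg out-parameter stands in for the extack message, and the modulo test is an assumed form of the sampling decision.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical action configuration. */
struct sample_cfg {
    uint32_t rate;
};

/* Control path: refuse a zero rate before it can reach the traffic path. */
static int sample_init(struct sample_cfg *cfg, uint32_t rate,
                       const char **errmsg)
{
    if (rate == 0) {
        *errmsg = "invalid sample rate";
        return -EINVAL;
    }
    cfg->rate = rate;
    *errmsg = NULL;
    return 0;
}

/* Traffic path: safe only because sample_init() guarantees rate != 0. */
static int should_sample(const struct sample_cfg *cfg, uint32_t rnd)
{
    return rnd % cfg->rate == 0;
}
```

      Validating once at configuration time keeps the per-packet path free of extra checks.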
    • net: thunderx: fix NULL pointer dereference in nicvf_open/nicvf_stop · 2ec1ed2a
      Lorenzo Bianconi authored
      When a bpf program is uploaded, the driver computes the number of
      xdp tx queues resulting in the allocation of additional qsets.
      Starting from commit '2ecbe4f4 ("net: thunderx: replace global
      nicvf_rx_mode_wq work queue for all VFs to private for each of them")'
      the driver runs link state polling for each VF resulting in the
      following NULL pointer dereference:
      
      [   56.169256] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020
      [   56.178032] Mem abort info:
      [   56.180834]   ESR = 0x96000005
      [   56.183877]   Exception class = DABT (current EL), IL = 32 bits
      [   56.189792]   SET = 0, FnV = 0
      [   56.192834]   EA = 0, S1PTW = 0
      [   56.195963] Data abort info:
      [   56.198831]   ISV = 0, ISS = 0x00000005
      [   56.202662]   CM = 0, WnR = 0
      [   56.205619] user pgtable: 64k pages, 48-bit VAs, pgdp = 0000000021f0c7a0
      [   56.212315] [0000000000000020] pgd=0000000000000000, pud=0000000000000000
      [   56.219094] Internal error: Oops: 96000005 [#1] SMP
      [   56.260459] CPU: 39 PID: 2034 Comm: ip Not tainted 5.1.0-rc3+ #3
      [   56.266452] Hardware name: GIGABYTE R120-T33/MT30-GS1, BIOS T49 02/02/2018
      [   56.273315] pstate: 80000005 (Nzcv daif -PAN -UAO)
      [   56.278098] pc : __ll_sc___cmpxchg_case_acq_64+0x4/0x20
      [   56.283312] lr : mutex_lock+0x2c/0x50
      [   56.286962] sp : ffff0000219af1b0
      [   56.290264] x29: ffff0000219af1b0 x28: ffff800f64de49a0
      [   56.295565] x27: 0000000000000000 x26: 0000000000000015
      [   56.300865] x25: 0000000000000000 x24: 0000000000000000
      [   56.306165] x23: 0000000000000000 x22: ffff000011117000
      [   56.311465] x21: ffff800f64dfc080 x20: 0000000000000020
      [   56.316766] x19: 0000000000000020 x18: 0000000000000001
      [   56.322066] x17: 0000000000000000 x16: ffff800f2e077080
      [   56.327367] x15: 0000000000000004 x14: 0000000000000000
      [   56.332667] x13: ffff000010964438 x12: 0000000000000002
      [   56.337967] x11: 0000000000000000 x10: 0000000000000c70
      [   56.343268] x9 : ffff0000219af120 x8 : ffff800f2e077d50
      [   56.348568] x7 : 0000000000000027 x6 : 000000062a9d6a84
      [   56.353869] x5 : 0000000000000000 x4 : ffff800f2e077480
      [   56.359169] x3 : 0000000000000008 x2 : ffff800f2e077080
      [   56.364469] x1 : 0000000000000000 x0 : 0000000000000020
      [   56.369770] Process ip (pid: 2034, stack limit = 0x00000000c862da3a)
      [   56.376110] Call trace:
      [   56.378546]  __ll_sc___cmpxchg_case_acq_64+0x4/0x20
      [   56.383414]  drain_workqueue+0x34/0x198
      [   56.387247]  nicvf_open+0x48/0x9e8 [nicvf]
      [   56.391334]  nicvf_open+0x898/0x9e8 [nicvf]
      [   56.395507]  nicvf_xdp+0x1bc/0x238 [nicvf]
      [   56.399595]  dev_xdp_install+0x68/0x90
      [   56.403333]  dev_change_xdp_fd+0xc8/0x240
      [   56.407333]  do_setlink+0x8e0/0xbe8
      [   56.410810]  __rtnl_newlink+0x5b8/0x6d8
      [   56.414634]  rtnl_newlink+0x54/0x80
      [   56.418112]  rtnetlink_rcv_msg+0x22c/0x2f8
      [   56.422199]  netlink_rcv_skb+0x60/0x120
      [   56.426023]  rtnetlink_rcv+0x28/0x38
      [   56.429587]  netlink_unicast+0x1c8/0x258
      [   56.433498]  netlink_sendmsg+0x1b4/0x350
      [   56.437410]  sock_sendmsg+0x4c/0x68
      [   56.440887]  ___sys_sendmsg+0x240/0x280
      [   56.444711]  __sys_sendmsg+0x68/0xb0
      [   56.448275]  __arm64_sys_sendmsg+0x2c/0x38
      [   56.452361]  el0_svc_handler+0x9c/0x128
      [   56.456186]  el0_svc+0x8/0xc
      [   56.459056] Code: 35ffff91 2a1003e0 d65f03c0 f9800011 (c85ffc10)
      [   56.465166] ---[ end trace 4a57fdc27b0a572c ]---
      [   56.469772] Kernel panic - not syncing: Fatal exception
      
      Fix it by checking the nicvf_rx_mode_wq pointer in nicvf_open and nicvf_stop.
      
      Fixes: 2ecbe4f4 ("net: thunderx: replace global nicvf_rx_mode_wq work queue for all VFs to private for each of them")
      Fixes: 2c632ad8 ("net: thunderx: move link state polling function to VF")
      Reported-by: Matteo Croce <mcroce@redhat.com>
      Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Tested-by: Matteo Croce <mcroce@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
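      The guard described in this commit can be modelled in a few lines. The sketch assumes that additional queue sets carry no workqueue of their own; struct workq, struct vf and vf_drain_rx_mode() are hypothetical stand-ins, not the nicvf data structures.

```c
#include <stddef.h>

/* Hypothetical workqueue and VF representations. */
struct workq {
    int pending;
};

struct vf {
    struct workq *rx_mode_wq;   /* may be NULL, e.g. for extra queue sets */
};

/* Drain the workqueue only if this VF has one; returns 1 if drained. */
static int vf_drain_rx_mode(struct vf *vf)
{
    if (!vf->rx_mode_wq)
        return 0;
    vf->rx_mode_wq->pending = 0;
    return 1;
}
```

      Without the NULL test, the access to pending would dereference a null pointer, much like the mutex_lock on address 0x20 in the trace above.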
    • Merge branch 'net-hns-bugfixes-for-HNS-Driver' · 47b62cd8
      David S. Miller authored
      Yonglong Liu says:
      
      ====================
      net: hns: bugfixes for HNS Driver
      
      This patchset fixes some bugs that were found while testing
      various scenarios, or identified by KASAN/sparse.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hns: Fix sparse: some warnings in HNS drivers · 15400663
      Yonglong Liu authored
      There are some sparse warnings in the HNS drivers:
      
      warning: incorrect type in assignment (different address spaces)
          expected void [noderef] <asn:2> *io_base
          got void *vaddr
      warning: cast removes address space '<asn:2>' of expression
      [...]
      
      Add __iomem and change all the u8 __iomem to void __iomem to
      fix these kinds of warnings.
      
      warning: incorrect type in argument 1 (different address spaces)
          expected void [noderef] <asn:2> *base
          got unsigned char [usertype] *base_addr
      warning: cast to restricted __le16
      warning: incorrect type in assignment (different base types)
          expected unsigned int [usertype] tbl_tcam_data_high
          got restricted __le32 [usertype]
      warning: cast to restricted __le32
      [...]
      
      These variables use u32/u16 as their type and are finally passed as a
      parameter to writel(). writel() does the cpu_to_le32 conversion itself,
      so remove the little-endian conversion code to fix these kinds of warnings.
      Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hns: Fix WARNING when remove HNS driver with SMMU enabled · 8601a99d
      Yonglong Liu authored
      When SMMU is enabled, removing the HNS driver causes a WARNING:
      
      [  141.924177] WARNING: CPU: 36 PID: 2708 at drivers/iommu/dma-iommu.c:443 __iommu_dma_unmap+0xc0/0xc8
      [  141.954673] Modules linked in: hns_enet_drv(-)
      [  141.963615] CPU: 36 PID: 2708 Comm: rmmod Tainted: G        W         5.0.0-rc1-28723-gb729c57de95c-dirty #32
      [  141.983593] Hardware name: Huawei D05/D05, BIOS Hisilicon D05 UEFI Nemo 1.8 RC0 08/31/2017
      [  142.000244] pstate: 60000005 (nZCv daif -PAN -UAO)
      [  142.009886] pc : __iommu_dma_unmap+0xc0/0xc8
      [  142.018476] lr : __iommu_dma_unmap+0xc0/0xc8
      [  142.027066] sp : ffff000013533b90
      [  142.033728] x29: ffff000013533b90 x28: ffff8013e6983600
      [  142.044420] x27: 0000000000000000 x26: 0000000000000000
      [  142.055113] x25: 0000000056000000 x24: 0000000000000015
      [  142.065806] x23: 0000000000000028 x22: ffff8013e66eee68
      [  142.076499] x21: ffff8013db919800 x20: 0000ffffefbff000
      [  142.087192] x19: 0000000000001000 x18: 0000000000000007
      [  142.097885] x17: 000000000000000e x16: 0000000000000001
      [  142.108578] x15: 0000000000000019 x14: 363139343a70616d
      [  142.119270] x13: 6e75656761705f67 x12: 0000000000000000
      [  142.129963] x11: 00000000ffffffff x10: 0000000000000006
      [  142.140656] x9 : 1346c1aa88093500 x8 : ffff0000114de4e0
      [  142.151349] x7 : 6662666578303d72 x6 : ffff0000105ffec8
      [  142.162042] x5 : 0000000000000000 x4 : 0000000000000000
      [  142.172734] x3 : 00000000ffffffff x2 : ffff0000114de500
      [  142.183427] x1 : 0000000000000000 x0 : 0000000000000035
      [  142.194120] Call trace:
      [  142.199030]  __iommu_dma_unmap+0xc0/0xc8
      [  142.206920]  iommu_dma_unmap_page+0x20/0x28
      [  142.215335]  __iommu_unmap_page+0x40/0x60
      [  142.223399]  hnae_unmap_buffer+0x110/0x134
      [  142.231639]  hnae_free_desc+0x6c/0x10c
      [  142.239177]  hnae_fini_ring+0x14/0x34
      [  142.246540]  hnae_fini_queue+0x2c/0x40
      [  142.254080]  hnae_put_handle+0x38/0xcc
      [  142.261619]  hns_nic_dev_remove+0x54/0xfc [hns_enet_drv]
      [  142.272312]  platform_drv_remove+0x24/0x64
      [  142.280552]  device_release_driver_internal+0x17c/0x20c
      [  142.291070]  driver_detach+0x4c/0x90
      [  142.298259]  bus_remove_driver+0x5c/0xd8
      [  142.306148]  driver_unregister+0x2c/0x54
      [  142.314037]  platform_driver_unregister+0x10/0x18
      [  142.323505]  hns_nic_dev_driver_exit+0x14/0xf0c [hns_enet_drv]
      [  142.335248]  __arm64_sys_delete_module+0x214/0x25c
      [  142.344891]  el0_svc_common+0xb0/0x10c
      [  142.352430]  el0_svc_handler+0x24/0x80
      [  142.359968]  el0_svc+0x8/0x7c0
      [  142.366104] ---[ end trace 60ad1cd58e63c407 ]---
      
      The tx ring buffer is mapped on xmit and unmapped when xmit is done. So
      hnae_init_ring() does not map the tx ring buffer, but hnae_fini_ring()
      has an unmap operation for the tx ring buffer, which was already unmapped
      when xmit completed, and this causes the WARNING.
      
      hnae_alloc_buffers() is called in hnae_init_ring(), so
      hnae_free_buffers() should be called in hnae_fini_ring(), not in
      hnae_free_desc().
      
      In hnae_fini_ring(), add an is_rx_ring() check as in hnae_init_ring().
      When the ring is a tx ring, add a piece of code to ensure that
      the tx ring is unmapped.
      Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
      Signed-off-by: Peng Li <lipeng321@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hns: fix ICMP6 neighbor solicitation messages discard problem · f058e468
      Yonglong Liu authored
      ICMP6 neighbor solicitation messages are discarded by the Hip06
      chips because the forwarding pool is not set. Enabling promisc mode
      has the same problem.
      
      This patch fixes the wrong forwarding table configs for the multicast
      vague matching when promisc mode is enabled, and adds a forwarding pool
      for the forwarding table.
      Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hns: Fix probabilistic memory overwrite when HNS driver initialized · c0b09844
      Yonglong Liu authored
      Rebooting the system again and again may cause a memory
      overwrite:
      
      [   15.638922] systemd[1]: Reached target Swap.
      [   15.667561] tun: Universal TUN/TAP device driver, 1.6
      [   15.676756] Bridge firewalling registered
      [   17.344135] Unable to handle kernel paging request at virtual address 0000000200000040
      [   17.352179] Mem abort info:
      [   17.355007]   ESR = 0x96000004
      [   17.358105]   Exception class = DABT (current EL), IL = 32 bits
      [   17.364112]   SET = 0, FnV = 0
      [   17.367209]   EA = 0, S1PTW = 0
      [   17.370393] Data abort info:
      [   17.373315]   ISV = 0, ISS = 0x00000004
      [   17.377206]   CM = 0, WnR = 0
      [   17.380214] user pgtable: 4k pages, 48-bit VAs, pgdp = (____ptrval____)
      [   17.386926] [0000000200000040] pgd=0000000000000000
      [   17.391878] Internal error: Oops: 96000004 [#1] SMP
      [   17.396824] CPU: 23 PID: 95 Comm: kworker/u130:0 Tainted: G            E     4.19.25-1.2.78.aarch64 #1
      [   17.414175] Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.54 08/16/2018
      [   17.425615] Workqueue: events_unbound async_run_entry_fn
      [   17.435151] pstate: 00000005 (nzcv daif -PAN -UAO)
      [   17.444139] pc : __mutex_lock.isra.1+0x74/0x540
      [   17.453002] lr : __mutex_lock.isra.1+0x3c/0x540
      [   17.461701] sp : ffff000100d9bb60
      [   17.469146] x29: ffff000100d9bb60 x28: 0000000000000000
      [   17.478547] x27: 0000000000000000 x26: ffff802fb8945000
      [   17.488063] x25: 0000000000000000 x24: ffff802fa32081a8
      [   17.497381] x23: 0000000000000002 x22: ffff801fa2b15220
      [   17.506701] x21: ffff000009809000 x20: ffff802fa23a0888
      [   17.515980] x19: ffff801fa2b15220 x18: 0000000000000000
      [   17.525272] x17: 0000000200000000 x16: 0000000200000000
      [   17.534511] x15: 0000000000000000 x14: 0000000000000000
      [   17.543652] x13: ffff000008d95db8 x12: 000000000000000d
      [   17.552780] x11: ffff000008d95d90 x10: 0000000000000b00
      [   17.561819] x9 : ffff000100d9bb90 x8 : ffff802fb89d6560
      [   17.570829] x7 : 0000000000000004 x6 : 00000004a1801d05
      [   17.579839] x5 : 0000000000000000 x4 : 0000000000000000
      [   17.588852] x3 : ffff802fb89d5a00 x2 : 0000000000000000
      [   17.597734] x1 : 0000000200000000 x0 : 0000000200000000
      [   17.606631] Process kworker/u130:0 (pid: 95, stack limit = 0x(____ptrval____))
      [   17.617438] Call trace:
      [   17.623349]  __mutex_lock.isra.1+0x74/0x540
      [   17.630927]  __mutex_lock_slowpath+0x24/0x30
      [   17.638602]  mutex_lock+0x50/0x60
      [   17.645295]  drain_workqueue+0x34/0x198
      [   17.652623]  __sas_drain_work+0x7c/0x168
      [   17.659903]  sas_drain_work+0x60/0x68
      [   17.666947]  hisi_sas_scan_finished+0x30/0x40 [hisi_sas_main]
      [   17.676129]  do_scsi_scan_host+0x70/0xb0
      [   17.683534]  do_scan_async+0x20/0x228
      [   17.690586]  async_run_entry_fn+0x4c/0x1d0
      [   17.697997]  process_one_work+0x1b4/0x3f8
      [   17.705296]  worker_thread+0x54/0x470
      
      The call trace is different every time, but the overwritten address
      is always the same:
      Unable to handle kernel paging request at virtual address 0000000200000040
      
      The root cause is that the write to the XGMAC_MAC_TX_LF_RF_CONTROL_REG
      register did not use the io_base offset.
      Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hns: Use NAPI_POLL_WEIGHT for hns driver · acb1ce15
      Yonglong Liu authored
      When the HNS driver is loaded, it always prints an error:
      "netif_napi_add() called with weight 256"
      
      This is because the kernel checks the NAPI polling weights
      requested by drivers and it prints an error message if a driver
      requests a weight bigger than 64.
      
      So use NAPI_POLL_WEIGHT to fix it.
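
      The check described above can be sketched in plain C. This is an
      illustration only, not the kernel's code: weight_triggers_error() is a
      hypothetical name, and the constant is redefined locally with the
      limit of 64 quoted in the message.

```c
#include <assert.h>

/* The kernel's NAPI_POLL_WEIGHT is 64; redefined here so the sketch stands alone. */
#define NAPI_POLL_WEIGHT 64

/*
 * Hypothetical model of the check: returns 1 when a driver-requested
 * weight is above the limit and would cause the error print, else 0.
 */
static int weight_triggers_error(int weight)
{
	assert(weight >= 0);
	return weight > NAPI_POLL_WEIGHT;
}
```

      With this model, the old weight of 256 trips the check and
      NAPI_POLL_WEIGHT itself does not.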
      Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
      Signed-off-by: Peng Li <lipeng321@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      acb1ce15
    • net: hns: fix KASAN: use-after-free in hns_nic_net_xmit_hw() · 3a39a12a
      Liubin Shu authored
      This patch fixes the following issue:
      [27237.844750] BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x708/0xa18[hns_enet_drv]
      
      After hnae_queue_xmit() is called in hns_nic_net_xmit_hw(), the
      function can be interrupted, and the interrupt path can then call
      hns_nic_tx_poll_one() to handle the new packets and free the skb.
      So, on return to hns_nic_net_xmit_hw(), reading skb->len is a
      use-after-free.

      This patch updates the tx ring statistics in hns_nic_tx_poll_one()
      to fix the bug.
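
      The ownership rule behind this fix can be sketched in user-space C.
      Everything here is hypothetical (struct pkt, struct ring_stats,
      tx_complete(), xmit()); it only shows the idea of doing the byte
      accounting in the completion path, which still owns the buffer,
      instead of after the hand-off.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for an skb and for per-ring TX counters. */
struct pkt {
	size_t len;
};

struct ring_stats {
	unsigned long tx_pkts;
	unsigned long tx_bytes;
};

/* Completion path: account for the packet while it is still valid, then free it. */
static void tx_complete(struct ring_stats *st, struct pkt *p)
{
	assert(st != NULL && p != NULL);
	st->tx_pkts++;
	st->tx_bytes += p->len;	/* last legitimate access to p */
	free(p);
}

/*
 * Submit path: once the packet is handed over, completion may run at any
 * time (here it is simply called directly), so p is never read afterwards.
 */
static void xmit(struct ring_stats *st, struct pkt *p)
{
	tx_complete(st, p);
	/* p is dangling from here on and must not be dereferenced. */
}
```

      The submit path never reads the packet after the hand-off, so a
      completion that runs early cannot expose freed memory.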
      Signed-off-by: Liubin Shu <shuliubin@huawei.com>
      Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
      Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
      Signed-off-by: Peng Li <lipeng321@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3a39a12a
    • flow_dissector: rst'ify documentation · 5eed7898
      Stanislav Fomichev authored
      Rename bpf_flow_dissector.txt to bpf_flow_dissector.rst and fix
      formatting. Also link it from Documentation/networking/index.rst.
      
      Tested with 'make htmldocs' to make sure it looks reasonable.
      
      Fixes: ae82899b ("flow_dissector: document BPF flow dissector environment")
      Signed-off-by: Stanislav Fomichev <sdf@google.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      5eed7898
    • ipv6: Fix dangling pointer when ipv6 fragment · ef0efcd3
      Junwei Hu authored
      At the beginning of ip6_fragment(), the prevhdr pointer is
      obtained from ip6_find_1stfragopt().
      However, all pointers into the skb header may change when
      skb_checksum_help() is called with
      skb->ip_summed set to CHECKSUM_PARTIAL.
      The prevhdr pointer will dangle if it is not reloaded after
      skb_checksum_help() calls __skb_linearize().

      Here, I add a variable, nexthdr_offset, to hold the offset,
      which does not change even after __skb_linearize() is called.
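
      The offset technique can be sketched in user-space C. The names
      below (struct buf, buf_grow(), grow_and_reload()) are hypothetical
      and realloc() stands in for any operation that may move the header
      data; this is not the kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical buffer whose storage may move, standing in for skb header data. */
struct buf {
	unsigned char *data;
	size_t len;
};

/* Grow the buffer; the storage may move, which invalidates old pointers into it. */
static int buf_grow(struct buf *b, size_t new_len)
{
	unsigned char *p;

	assert(new_len >= b->len);
	p = realloc(b->data, new_len);
	if (p == NULL)
		return -1;
	memset(p + b->len, 0, new_len - b->len);
	b->data = p;
	b->len = new_len;
	return 0;
}

/*
 * Save the offset of ptr before the buffer can move, then rebuild the
 * pointer from the new base. Returns NULL if the grow fails.
 */
static unsigned char *grow_and_reload(struct buf *b, const unsigned char *ptr,
				      size_t new_len)
{
	size_t off = (size_t)(ptr - b->data);	/* offset survives reallocation */

	assert(off < b->len);
	if (buf_grow(b, new_len) != 0)
		return NULL;
	return b->data + off;	/* derived from the new base, never the stale pointer */
}
```

      The offset plays the role of nexthdr_offset: it is computed once
      while the pointer is still valid and reapplied after the data may
      have moved.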
      
      Fixes: 405c92f7 ("ipv6: add defensive check for CHECKSUM_PARTIAL skbs in ip_fragment")
      Signed-off-by: Junwei Hu <hujunwei4@huawei.com>
      Reported-by: Wenhao Zhang <zhangwenhao8@huawei.com>
      Reported-by: syzbot+e8ce541d095e486074fc@syzkaller.appspotmail.com
      Reviewed-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ef0efcd3
    • net-gro: Fix GRO flush when receiving a GSO packet. · 0ab03f35
      Steffen Klassert authored
      Currently we may incorrectly merge a received GSO packet
      or a packet with frag_list into a packet sitting in the
      gro_hash list. skb_segment() may crash in this case because
      the assumptions on the skb layout are not met.
      The correct behaviour would be to flush the packet in the
      gro_hash list and send the received GSO packet directly
      afterwards. Commit d61d072e ("net-gro: avoid reorders")
      sets NAPI_GRO_CB(skb)->flush in this case, but this is not
      checked before merging. This patch makes sure to check this
      flag and to not merge in that case.
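
      The rule "check the flush flag before merging" can be modelled in a
      few lines of C. The types and try_merge() below are hypothetical
      simplifications, not the GRO data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified view of a packet held for merging. */
struct held_pkt {
	int flow;	/* flow identifier */
	int len;	/* accumulated payload length */
};

/* Hypothetical, simplified view of a newly received packet. */
struct rx_pkt {
	int flow;
	int len;
	int flush;	/* set when the packet must not be merged */
};

/*
 * Returns 1 if rx was merged into held. Returns 0 if the caller should
 * flush the held packet and deliver rx by itself.
 */
static int try_merge(struct held_pkt *held, const struct rx_pkt *rx)
{
	assert(held != NULL && rx != NULL);
	if (rx->flush || rx->flow != held->flow)
		return 0;	/* flush flag is checked before any merging */
	held->len += rx->len;
	return 1;
}
```

      A packet flagged for flush leaves the held packet untouched, so the
      held packet can be flushed first and the flagged one sent after it.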
      
      Fixes: d61d072e ("net-gro: avoid reorders")
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0ab03f35
  3. 03 Apr, 2019 6 commits
  4. 02 Apr, 2019 7 commits
  5. 01 Apr, 2019 6 commits
    • kcm: switch order of device registration to fix a crash · 3c446e6f
      Jiri Slaby authored
      When kcm is loaded while many processes try to create a KCM socket, a
      crash occurs:
       BUG: unable to handle kernel NULL pointer dereference at 000000000000000e
       IP: mutex_lock+0x27/0x40 kernel/locking/mutex.c:240
       PGD 8000000016ef2067 P4D 8000000016ef2067 PUD 3d6e9067 PMD 0
       Oops: 0002 [#1] SMP KASAN PTI
       CPU: 0 PID: 7005 Comm: syz-executor.5 Not tainted 4.12.14-396-default #1 SLE15-SP1 (unreleased)
       RIP: 0010:mutex_lock+0x27/0x40 kernel/locking/mutex.c:240
       RSP: 0018:ffff88000d487a00 EFLAGS: 00010246
       RAX: 0000000000000000 RBX: 000000000000000e RCX: 1ffff100082b0719
       ...
       CR2: 000000000000000e CR3: 000000004b1bc003 CR4: 0000000000060ef0
       Call Trace:
        kcm_create+0x600/0xbf0 [kcm]
        __sock_create+0x324/0x750 net/socket.c:1272
       ...
      
      This is due to a race between sock_create and an unfinished
      register_pernet_device. kcm_create tries to do "net_generic(net,
      kcm_net_id)", but kcm_net_id is not initialized yet.
      
      So switch the order of the two to close the race.
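
      The ordering rule can be modelled in user-space C. This sketch
      assumes "the two" are the per-net registration and the step that
      makes the socket family visible to user space, as the race
      description suggests; all names are hypothetical.

```c
#include <assert.h>

static int net_id = -1;		/* stands in for kcm_net_id; -1 means not assigned yet */
static int family_visible;	/* 1 once sockets of the family can be created */

static int register_pernet(void)
{
	net_id = 0;
	return 0;
}

static void unregister_pernet(void)
{
	net_id = -1;
}

static int register_family(void)
{
	family_visible = 1;
	return 0;
}

/* Returns the per-net id on success, or -1 if the family is not registered. */
static int create_socket(void)
{
	if (!family_visible)
		return -1;
	assert(net_id >= 0);	/* would fire if the family were exposed first */
	return net_id;
}

/* Fixed order: set up per-net state first, expose the family last. */
static int init_fixed_order(void)
{
	int err = register_pernet();

	if (err)
		return err;
	err = register_family();
	if (err)
		unregister_pernet();
	return err;
}
```

      Because the family becomes visible only after the per-net state
      exists, a socket creation that succeeds always sees a valid id.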
      
      This can be reproduced with multiple processes doing socket(PF_KCM, ...)
      and one process doing module removal.
      
      Fixes: ab7ac4eb ("kcm: Kernel Connection Multiplexor module")
      Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3c446e6f
    • Merge branch 'net-sched-fix-stats-accounting-for-child-NOLOCK-qdiscs' · c4df1bdd
      David S. Miller authored
      Paolo Abeni says:
      
      ====================
      net: sched: fix stats accounting for child NOLOCK qdiscs
      
      Currently, stats accounting for NOLOCK qdiscs enslaved to classful (lock)
      qdiscs is buggy. Per-CPU values are ignored in most places; as a result,
      stats dumps in the above scenario always report a 0-length backlog, and the
      parent backlog len is not updated correctly on NOLOCK qdisc removal.

      The first patch addresses stats dumping, and the second one child qdisc removal.
      I'm targeting the net tree as this is a bugfix, but it could be moved to
      net-next due to the relatively large diffstat.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c4df1bdd
    • net: sched: introduce and use qdisc tree flush/purge helpers · e5f0e8f8
      Paolo Abeni authored
      The same code to flush the qdisc tree and purge the qdisc queue
      is duplicated in many places, and in most cases it does not
      respect NOLOCK qdiscs: the global backlog len is used and the
      per-CPU values are ignored.
      
      This change addresses the above, factoring-out the relevant
      code and using the helpers introduced by the previous patch
      to fetch the correct backlog len.
      
      Fixes: c5ad119f ("net: sched: pfifo_fast use skb_array")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e5f0e8f8
    • net: sched: introduce and use qstats read helpers · 5dd431b6
      Paolo Abeni authored
      Classful qdiscs can't directly access the child qdisc's backlog
      length: if the child qdisc is NOLOCK, the per-CPU values should be
      accounted instead.

      Most qdiscs do not respect the above. As a result, qstats fetching
      for most classful qdiscs is currently incorrect: if the child qdisc is
      NOLOCK, it always reports a 0 len backlog.
      
      This change introduces a pair of helpers to safely fetch
      both backlog and qlen and uses them in the stats class dumping
      functions, fixing the above issue and cleaning up the code a bit.
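
      The idea behind such a read helper can be sketched in user-space C.
      struct qd, NCPU and read_backlog() are hypothetical and much simpler
      than the kernel structures; the sketch only shows why reading the
      global counter of a NOLOCK child reports 0.

```c
#include <assert.h>
#include <stddef.h>

#define NCPU 4	/* hypothetical fixed CPU count for the sketch */

/* Hypothetical, simplified qdisc: one global counter plus per-CPU counters. */
struct qd {
	int nolock;			/* 1 if stats are kept per CPU */
	unsigned int backlog;		/* global counter, unused when nolock is set */
	unsigned int cpu_backlog[NCPU];	/* per-CPU counters, used when nolock is set */
};

/* Return the backlog from wherever this qdisc actually keeps it. */
static unsigned int read_backlog(const struct qd *q)
{
	unsigned int sum = 0;
	size_t i;

	assert(q != NULL);
	if (!q->nolock)
		return q->backlog;
	for (i = 0; i < NCPU; i++)
		sum += q->cpu_backlog[i];
	return sum;
}
```

      A caller that reads q->backlog directly on a NOLOCK qdisc sees 0,
      while the helper returns the per-CPU sum.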
      
      DRR also needs to access the child qdisc queue length, so it
      needs custom handling.
      
      Fixes: c5ad119f ("net: sched: pfifo_fast use skb_array")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5dd431b6
    • net/sched: fix ->get helper of the matchall cls · 0db6f8be
      Nicolas Dichtel authored
      It always returned NULL, so it was never possible to get the filter.
      
      Example:
      $ ip link add foo type dummy
      $ ip link add bar type dummy
      $ tc qdisc add dev foo clsact
      $ tc filter add dev foo protocol all pref 1 ingress handle 1234 \
      	matchall action mirred ingress mirror dev bar
      
      Before the patch:
      $ tc filter get dev foo protocol all pref 1 ingress handle 1234 matchall
      Error: Specified filter handle not found.
      We have an error talking to the kernel
      
      After:
      $ tc filter get dev foo protocol all pref 1 ingress handle 1234 matchall
      filter ingress protocol all pref 1 matchall chain 0 handle 0x4d2
        not_in_hw
              action order 1: mirred (Ingress Mirror to device bar) pipe
              index 1 ref 1 bind 1
      
      CC: Yotam Gigi <yotamg@mellanox.com>
      CC: Jiri Pirko <jiri@mellanox.com>
      Fixes: fd62d9f5 ("net/sched: matchall: Fix configuration race")
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0db6f8be
    • i40e: add tracking of AF_XDP ZC state for each queue pair · 44ddd4f1
      Björn Töpel authored
      In commit f3fef2b6 ("i40e: Remove umem from VSI") a regression was
      introduced: when the VSI was reset, the setup code would try to enable
      AF_XDP ZC unconditionally (as long as there was a umem placed in the
      netdev._rx struct). Here, we add a bitmap to the VSI that tracks if a
      certain queue pair has been "zero-copy enabled" via the ndo_bpf. The
      bitmap is used in i40e_xsk_umem, and enables zero-copy if and only if
      XDP is enabled, the corresponding qid in the bitmap is set and the
      umem is non-NULL.
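
      The three-part condition can be sketched in user-space C. struct vsi,
      MAX_QP and xsk_umem() below are hypothetical simplifications; only
      the "XDP enabled, bit set, umem non-NULL" rule is taken from the
      message above.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_QP 8	/* hypothetical number of queue pairs */

/* Hypothetical, simplified VSI state. */
struct vsi {
	int xdp_enabled;	/* 1 if an XDP program is attached */
	unsigned long zc_qps;	/* bit n set: queue pair n was zero-copy enabled */
	void *umems[MAX_QP];	/* per-queue umem, may be NULL */
};

/*
 * Return the umem to use for zero-copy on queue pair qid, or NULL if
 * zero-copy must stay disabled for that queue pair.
 */
static void *xsk_umem(const struct vsi *v, unsigned int qid)
{
	assert(v != NULL && qid < MAX_QP);
	if (!v->xdp_enabled)
		return NULL;
	if (!(v->zc_qps & (1UL << qid)))
		return NULL;
	return v->umems[qid];	/* NULL here also means "not zero-copy" */
}
```

      A umem that is merely present no longer enables zero-copy: the bit
      for that queue pair has to be set as well.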
      
      Fixes: f3fef2b6 ("i40e: Remove umem from VSI")
      Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      44ddd4f1