1. 13 Sep, 2018 6 commits
    • hv_netvsc: fix schedule in RCU context · 018349d7
      Stephen Hemminger authored
      When netvsc device is removed it can call reschedule in RCU context.
      This happens because canceling the subchannel setup work could (in theory)
      cause a reschedule when manipulating the timer.
      
      To reproduce, run with lockdep enabled kernel and unbind
      a network device from hv_netvsc (via sysfs).
      
      [  160.682011] WARNING: suspicious RCU usage
      [  160.707466] 4.19.0-rc3-uio+ #2 Not tainted
      [  160.709937] -----------------------------
      [  160.712352] ./include/linux/rcupdate.h:302 Illegal context switch in RCU read-side critical section!
      [  160.723691]
      [  160.723691] other info that might help us debug this:
      [  160.723691]
      [  160.730955]
      [  160.730955] rcu_scheduler_active = 2, debug_locks = 1
      [  160.762813] 5 locks held by rebind-eth.sh/1812:
      [  160.766851]  #0: 000000008befa37a (sb_writers#6){.+.+}, at: vfs_write+0x184/0x1b0
      [  160.773416]  #1: 00000000b097f236 (&of->mutex){+.+.}, at: kernfs_fop_write+0xe2/0x1a0
      [  160.783766]  #2: 0000000041ee6889 (kn->count#3){++++}, at: kernfs_fop_write+0xeb/0x1a0
      [  160.787465]  #3: 0000000056d92a74 (&dev->mutex){....}, at: device_release_driver_internal+0x39/0x250
      [  160.816987]  #4: 0000000030f6031e (rcu_read_lock){....}, at: netvsc_remove+0x1e/0x250 [hv_netvsc]
      [  160.828629]
      [  160.828629] stack backtrace:
      [  160.831966] CPU: 1 PID: 1812 Comm: rebind-eth.sh Not tainted 4.19.0-rc3-uio+ #2
      [  160.832952] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v1.0 11/26/2012
      [  160.832952] Call Trace:
      [  160.832952]  dump_stack+0x85/0xcb
      [  160.832952]  ___might_sleep+0x1a3/0x240
      [  160.832952]  __flush_work+0x57/0x2e0
      [  160.832952]  ? __mutex_lock+0x83/0x990
      [  160.832952]  ? __kernfs_remove+0x24f/0x2e0
      [  160.832952]  ? __kernfs_remove+0x1b2/0x2e0
      [  160.832952]  ? mark_held_locks+0x50/0x80
      [  160.832952]  ? get_work_pool+0x90/0x90
      [  160.832952]  __cancel_work_timer+0x13c/0x1e0
      [  160.832952]  ? netvsc_remove+0x1e/0x250 [hv_netvsc]
      [  160.832952]  ? __lock_is_held+0x55/0x90
      [  160.832952]  netvsc_remove+0x9a/0x250 [hv_netvsc]
      [  160.832952]  vmbus_remove+0x26/0x30
      [  160.832952]  device_release_driver_internal+0x18a/0x250
      [  160.832952]  unbind_store+0xb4/0x180
      [  160.832952]  kernfs_fop_write+0x113/0x1a0
      [  160.832952]  __vfs_write+0x36/0x1a0
      [  160.832952]  ? rcu_read_lock_sched_held+0x6b/0x80
      [  160.832952]  ? rcu_sync_lockdep_assert+0x2e/0x60
      [  160.832952]  ? __sb_start_write+0x141/0x1a0
      [  160.832952]  ? vfs_write+0x184/0x1b0
      [  160.832952]  vfs_write+0xbe/0x1b0
      [  160.832952]  ksys_write+0x55/0xc0
      [  160.832952]  do_syscall_64+0x60/0x1b0
      [  160.832952]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  160.832952] RIP: 0033:0x7fe48f4c8154
      
      Resolve this by getting RTNL earlier. This is safe because the subchannel
      work queue does trylock on RTNL and will detect the race.
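
      The trylock-and-back-off behaviour described above can be sketched in user space. This is an illustration under stated assumptions, not the driver's code: an atomic flag stands in for RTNL, and every name here (rtnl_model_*, subchannel_work_model, WORK_*) is hypothetical. It shows why holding the lock early in the remove path is safe: a worker that only ever tries the lock backs off instead of blocking.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the RTNL mutex (illustration only). */
static atomic_flag rtnl_model = ATOMIC_FLAG_INIT;

/* Returns true if the lock was free and is now held by the caller. */
static bool rtnl_model_trylock(void)
{
    return !atomic_flag_test_and_set(&rtnl_model);
}

static void rtnl_model_unlock(void)
{
    atomic_flag_clear(&rtnl_model);
}

enum work_result { WORK_DONE, WORK_RETRY_LATER };

/*
 * Models the subchannel worker: it never blocks on the lock. If the
 * remove path already holds it, the worker reports that it must be
 * retried later instead of touching the device.
 */
static enum work_result subchannel_work_model(int *subchannels_opened)
{
    assert(subchannels_opened);

    if (!rtnl_model_trylock())
        return WORK_RETRY_LATER;

    (*subchannels_opened)++;
    rtnl_model_unlock();
    return WORK_DONE;
}
```

      With the lock held by the (modelled) remove path the worker returns WORK_RETRY_LATER and changes nothing; once the lock is released it completes normally.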
      
      Fixes: 7b2ee50c ("hv_netvsc: common detach logic")
      Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
      Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net_sched: notify filter deletion when deleting a chain · f5b9bac7
      Cong Wang authored
      When we delete a chain of filters, we need to notify
      user-space that we are deleting each filter in this chain
      too.
      
      Fixes: 32a4f5ec ("net: sched: introduce chain object to uapi")
      Cc: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • xen/netfront: don't bug in case of too many frags · ad4f15dc
      Juergen Gross authored
      Commit 57f230ab ("xen/netfront: raise max number of slots in
      xennet_get_responses()") raised the max number of allowed slots by one.
      This seems to be problematic in some configurations where netback uses
      a larger MAX_SKB_FRAGS value (e.g. an old Linux kernel with MAX_SKB_FRAGS
      defined as 18 instead of today's 17).
      
      Instead of hitting BUG_ON() in this case, just fall back to retransmission.
      
      Fixes: 57f230ab ("xen/netfront: raise max number of slots in xennet_get_responses()")
      Cc: stable@vger.kernel.org
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: use rt6_info members when dst is set in rt6_fill_node · 22d0bd82
      Xin Long authored
      In inet6_rtm_getroute, since commit 93531c67 ("net/ipv6: separate
      handling of FIB entries from dst based routes"), rt->from has been used
      to dump route info instead of rt.
      
      However, for some routes, such as cached ones, some information like flags
      or gateway is not the same as that of the 'from' route. This caused 'ip
      route get' to dump the wrong route information.
      
      In Jianlin's testing, the output even lost the expiration
      time for a pmtu route cache due to the wrong fib6_flags.
      
      So use rt6_info members for dst addr, src addr, flags and
      gateway when dumping a route entry without fibmatch set.
      
      v1->v2:
        - not use rt6i_prefsrc.
        - also fix the gw dump issue.
      
      Fixes: 93531c67 ("net/ipv6: separate handling of FIB entries from dst based routes")
      Reported-by: Jianlin Shi <jishi@redhat.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'drm-fixes-2018-09-12' of git://anongit.freedesktop.org/drm/drm · 7428b2e5
      Linus Torvalds authored
      Pull drm nouveau fixes from Dave Airlie:
       "I'm sending this separately as it's a bit larger than I generally like
        for one driver, but it does contain a bunch of make my nvidia laptop
        not die (runpm) and a bunch to make my docking station and monitor
        display stuff (mst) fixes.
      
        Lyude has spent a lot of time on these, and we are putting the fixes
        into distro kernels as well asap, as it helps a bunch of standard
        Lenovo laptops, so I'm fairly happy things are better than they were
        before these patches, but I decided to split them out just for
        clarification"
      
      * tag 'drm-fixes-2018-09-12' of git://anongit.freedesktop.org/drm/drm:
        drm/nouveau/disp/gm200-: enforce identity-mapped SOR assignment for LVDS/eDP panels
        drm/nouveau/disp: fix DP disable race
        drm/nouveau/disp: move eDP panel power handling
        drm/nouveau/disp: remove unused struct member
        drm/nouveau/TBDdevinit: don't fail when PMU/PRE_OS is missing from VBIOS
        drm/nouveau/mmu: don't attempt to dereference vmm without valid instance pointer
        drm/nouveau: fix oops in client init failure path
        drm/nouveau: Fix nouveau_connector_ddc_detect()
        drm/nouveau/drm/nouveau: Don't forget to cancel hpd_work on suspend/unload
        drm/nouveau/drm/nouveau: Prevent handling ACPI HPD events too early
        drm/nouveau: Reset MST branching unit before enabling
        drm/nouveau: Only write DP_MSTM_CTRL when needed
        drm/nouveau: Remove useless poll_enable() call in drm_load()
        drm/nouveau: Remove useless poll_disable() call in switcheroo_set_state()
        drm/nouveau: Remove useless poll_enable() call in switcheroo_set_state()
        drm/nouveau: Fix deadlocks in nouveau_connector_detect()
        drm/nouveau/drm/nouveau: Use pm_runtime_get_noresume() in connector_detect()
        drm/nouveau/drm/nouveau: Fix deadlock with fb_helper with async RPM requests
        drm/nouveau: Remove duplicate poll_enable() in pmops_runtime_suspend()
        drm/nouveau/drm/nouveau: Fix bogus drm_kms_helper_poll_enable() placement
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 67b07609
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Fix up several Kconfig dependencies in netfilter, from Martin Willi
          and Florian Westphal.
      
       2) Memory leak in be2net driver, from Petr Oros.
      
       3) Memory leak in E-Switch handling of mlx5 driver, from Raed Salem.
      
       4) mlx5_attach_interface needs to check for errors, from Huy Nguyen.
      
       5) tipc_release() needs to orphan the sock, from Cong Wang.
      
       6) Need to program TxConfig register after TX/RX is enabled in r8169
          driver, not beforehand, from Maciej S. Szmigiero.
      
       7) Handle 64K PAGE_SIZE properly in ena driver, from Netanel Belgazal.
      
       8) Fix crash regression in ip_do_fragment(), from Taehee Yoo.
      
       9) syzbot can create conditions where kernel log is flooded with
          synflood warnings due to creation of many listening sockets, fix
          that. From Willem de Bruijn.
      
      10) Fix RCU issues in rds socket layer, from Cong Wang.
      
      11) Fix vlan matching in nfp driver, from Pieter Jansen van Vuuren.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (59 commits)
        nfp: flower: reject tunnel encap with ipv6 outer headers for offloading
        nfp: flower: fix vlan match by checking both vlan id and vlan pcp
        tipc: check return value of __tipc_dump_start()
        s390/qeth: don't dump past end of unknown HW header
        s390/qeth: use vzalloc for QUERY OAT buffer
        s390/qeth: switch on SG by default for IQD devices
        s390/qeth: indicate error when netdev allocation fails
        rds: fix two RCU related problems
        r8169: Clear RTL_FLAG_TASK_*_PENDING when clearing RTL_FLAG_TASK_ENABLED
        erspan: fix error handling for erspan tunnel
        erspan: return PACKET_REJECT when the appropriate tunnel is not found
        tcp: rate limit synflood warnings further
        MIPS: lantiq: dma: add dev pointer
        netfilter: xt_hashlimit: use s->file instead of s->private
        netfilter: nfnetlink_queue: Solve the NFQUEUE/conntrack clash for NF_REPEAT
        netfilter: cttimeout: ctnl_timeout_find_get() returns incorrect pointer to type
        netfilter: conntrack: timeout interface depend on CONFIG_NF_CONNTRACK_TIMEOUT
        netfilter: conntrack: reset tcp maxwin on re-register
        qmi_wwan: Support dynamic config on Quectel EP06
        ethernet: renesas: convert to SPDX identifiers
        ...
  2. 12 Sep, 2018 19 commits
  3. 11 Sep, 2018 2 commits
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma · 28a0ea77
      Linus Torvalds authored
      Pull rdma fixes from Jason Gunthorpe:
       "This fixes one major regression with NFS and mlx4 due to the max_sg
        rework in this merge window, tidies a few minor error_path
        regressions, and various small fixes.
      
        The HFI1 driver is broken this cycle due to a regression caused by a
        PCI change, it is looking like Bjorn will merge a fix for this. Also,
        the lingering ipoib issue I mentioned earlier still remains unfixed.
      
        Summary:
      
         - Fix possible FD type confusion crash
      
         - Fix a user trigger-able crash in cxgb4
      
         - Fix bad handling of IOMMU resources causing user controlled leaking
           in bnxt
      
         - Add missing locking in ipoib to fix a rare 'stuck tx' situation
      
         - Add missing locking in cma
      
         - Add two missing uverbs cleanups on failure paths,
           regressions from this merge window
      
         - Fix a regression from this merge window that caused RDMA NFS to not
           work with the mlx4 driver due to the max_sg changes"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
        RDMA/mlx4: Ensure that maximal send/receive SGE less than supported by HW
        RDMA/cma: Protect cma dev list with lock
        RDMA/uverbs: Fix error cleanup path of ib_uverbs_add_one()
        bnxt_re: Fix couple of memory leaks that could lead to IOMMU call traces
        IB/ipoib: Avoid a race condition between start_xmit and cm_rep_handler
        iw_cxgb4: only allow 1 flush on user qps
        IB/core: Release object lock if destroy failed
        RDMA/ucma: check fd type in ucma_migrate_id()
    • Merge branch 'linux-4.19' of git://github.com/skeggsb/linux into drm-fixes · 2887e5ce
      Dave Airlie authored
      A bunch of fixes for MST/runpm problems and races, as well as fixes
      for issues that prevent more recent laptops from booting.
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      From: Ben Skeggs <bskeggs@redhat.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/CABDvA==GF63dy8a9j611=-0x8G6FRu7uC-ZQypsLO_hqV4OAcA@mail.gmail.com
  4. 10 Sep, 2018 8 commits
  5. 09 Sep, 2018 5 commits
    • ip: frags: fix crash in ip_do_fragment() · 5d407b07
      Taehee Yoo authored
      A kernel crash occurs when a defragmented packet is fragmented
      again in ip_do_fragment().
      The defragment routine calls skb_orphan() and then sets
      skb->ip_defrag_offset. But skb->sk and skb->ip_defrag_offset
      are members of the same union, so frag->sk is no longer NULL.
      Hence the skb->sk check in ip_do_fragment() crashes when the
      defragmented packet is fragmented.
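
      The union aliasing described above can be reproduced in a few lines of user-space C. This is a minimal sketch, not kernel code: struct skb_model is a hypothetical, heavily reduced stand-in for struct sk_buff, and all helper names are invented for illustration. It shows that storing a non-zero offset makes the aliased pointer read as non-NULL, and that clearing sk after reassembly (the approach noted under v2) restores the expected state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sock_model { int unused; };

/* Hypothetical, heavily reduced stand-in for struct sk_buff. */
struct skb_model {
    union {
        struct sock_model *sk;  /* owning socket, NULL when orphaned */
        int ip_defrag_offset;   /* reuses the same storage while queued */
    };
};

/* Models skb_orphan(): drop the socket reference. */
static void skb_model_orphan(struct skb_model *skb)
{
    assert(skb);
    skb->sk = NULL;
}

/* Models queuing a fragment: the offset overwrites the storage of sk. */
static void skb_model_queue_fragment(struct skb_model *skb, int offset)
{
    assert(skb);
    skb->ip_defrag_offset = offset;
}

/* Models the fix: clear sk again once reassembly is finished. */
static void skb_model_reassembled(struct skb_model *skb)
{
    assert(skb);
    skb->sk = NULL;
}

/* The kind of check that later code performs on skb->sk. */
static bool skb_model_has_owner(const struct skb_model *skb)
{
    assert(skb);
    return skb->sk != NULL;
}
```

      After orphaning and queuing with a non-zero offset, skb_model_has_owner() reports an owner even though none exists; calling skb_model_reassembled() makes it report none again.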
      
      test commands:
         %iptables -t nat -I POSTROUTING -j MASQUERADE
         %hping3 192.168.4.2 -s 1000 -p 2000 -d 60000
      
      splat looks like:
      [  261.069429] kernel BUG at net/ipv4/ip_output.c:636!
      [  261.075753] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
      [  261.083854] CPU: 1 PID: 1349 Comm: hping3 Not tainted 4.19.0-rc2+ #3
      [  261.100977] RIP: 0010:ip_do_fragment+0x1613/0x2600
      [  261.106945] Code: e8 e2 38 e3 fe 4c 8b 44 24 18 48 8b 74 24 08 e9 92 f6 ff ff 80 3c 02 00 0f 85 da 07 00 00 48 8b b5 d0 00 00 00 e9 25 f6 ff ff <0f> 0b 0f 0b 44 8b 54 24 58 4c 8b 4c 24 18 4c 8b 5c 24 60 4c 8b 6c
      [  261.127015] RSP: 0018:ffff8801031cf2c0 EFLAGS: 00010202
      [  261.134156] RAX: 1ffff1002297537b RBX: ffffed0020639e6e RCX: 0000000000000004
      [  261.142156] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880114ba9bd8
      [  261.150157] RBP: ffff880114ba8a40 R08: ffffed0022975395 R09: ffffed0022975395
      [  261.158157] R10: 0000000000000001 R11: ffffed0022975394 R12: ffff880114ba9ca4
      [  261.166159] R13: 0000000000000010 R14: ffff880114ba9bc0 R15: dffffc0000000000
      [  261.174169] FS:  00007fbae2199700(0000) GS:ffff88011b400000(0000) knlGS:0000000000000000
      [  261.183012] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  261.189013] CR2: 00005579244fe000 CR3: 0000000119bf4000 CR4: 00000000001006e0
      [  261.198158] Call Trace:
      [  261.199018]  ? dst_output+0x180/0x180
      [  261.205011]  ? save_trace+0x300/0x300
      [  261.209018]  ? ip_copy_metadata+0xb00/0xb00
      [  261.213034]  ? sched_clock_local+0xd4/0x140
      [  261.218158]  ? kill_l4proto+0x120/0x120 [nf_conntrack]
      [  261.223014]  ? rt_cpu_seq_stop+0x10/0x10
      [  261.227014]  ? find_held_lock+0x39/0x1c0
      [  261.233008]  ip_finish_output+0x51d/0xb50
      [  261.237006]  ? ip_fragment.constprop.56+0x220/0x220
      [  261.243011]  ? nf_ct_l4proto_register_one+0x5b0/0x5b0 [nf_conntrack]
      [  261.250152]  ? rcu_is_watching+0x77/0x120
      [  261.255010]  ? nf_nat_ipv4_out+0x1e/0x2b0 [nf_nat_ipv4]
      [  261.261033]  ? nf_hook_slow+0xb1/0x160
      [  261.265007]  ip_output+0x1c7/0x710
      [  261.269005]  ? ip_mc_output+0x13f0/0x13f0
      [  261.273002]  ? __local_bh_enable_ip+0xe9/0x1b0
      [  261.278152]  ? ip_fragment.constprop.56+0x220/0x220
      [  261.282996]  ? nf_hook_slow+0xb1/0x160
      [  261.287007]  raw_sendmsg+0x21f9/0x4420
      [  261.291008]  ? dst_output+0x180/0x180
      [  261.297003]  ? sched_clock_cpu+0x126/0x170
      [  261.301003]  ? find_held_lock+0x39/0x1c0
      [  261.306155]  ? stop_critical_timings+0x420/0x420
      [  261.311004]  ? check_flags.part.36+0x450/0x450
      [  261.315005]  ? _raw_spin_unlock_irq+0x29/0x40
      [  261.320995]  ? _raw_spin_unlock_irq+0x29/0x40
      [  261.326142]  ? cyc2ns_read_end+0x10/0x10
      [  261.330139]  ? raw_bind+0x280/0x280
      [  261.334138]  ? sched_clock_cpu+0x126/0x170
      [  261.338995]  ? check_flags.part.36+0x450/0x450
      [  261.342991]  ? __lock_acquire+0x4500/0x4500
      [  261.348994]  ? inet_sendmsg+0x11c/0x500
      [  261.352989]  ? dst_output+0x180/0x180
      [  261.357012]  inet_sendmsg+0x11c/0x500
      [ ... ]
      
      v2:
       - clear skb->sk in the reassembly routine (Eric Dumazet)
      
      Fixes: fa0f5273 ("ip: use rb trees for IP frag queue.")
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/tls: Set count of SG entries if sk_alloc_sg returns -ENOSPC · 52ea992c
      Vakul Garg authored
      tls_sw_sendmsg() allocates plaintext and encrypted SG entries using
      sk_alloc_sg(). If the number of SG entries hits
      MAX_SKB_FRAGS, sk_alloc_sg() returns -ENOSPC and sets the variable for
      the current SG index to '0'. This leads to tls_push_record() being
      called with 'sg_encrypted_num_elem = 0', which later causes a
      kernel crash. To fix this, set the number of SG elements to the number
      of elements in the plaintext/encrypted SG arrays when sk_alloc_sg()
      returns -ENOSPC.
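
      The index wrap and the correction described above can be sketched as follows. This is a simplified model, not the kernel implementation: SG_SLOTS is a hypothetical stand-in for MAX_SKB_FRAGS, and alloc_sg_model() only mimics the behaviour the message attributes to sk_alloc_sg() (index reset to 0 together with -ENOSPC).

```c
#include <assert.h>
#include <errno.h>

#define SG_SLOTS 4 /* hypothetical stand-in for MAX_SKB_FRAGS */

/*
 * Models the allocator: consumes one slot per requested entry. When all
 * slots are in use it wraps the index to 0 and returns -ENOSPC.
 */
static int alloc_sg_model(int want, int *sg_index)
{
    assert(sg_index);

    int idx = *sg_index;
    int rc = 0;

    while (want-- > 0) {
        if (idx == SG_SLOTS) {
            idx = 0;
            rc = -ENOSPC;
            break;
        }
        idx++;
    }
    *sg_index = idx;
    return rc;
}

/*
 * Models the corrected caller: on -ENOSPC every slot holds data, so the
 * element count is set to the array size instead of the wrapped index.
 */
static int alloc_sg_fixed(int want, int *num_elem)
{
    int rc = alloc_sg_model(want, num_elem);

    if (rc == -ENOSPC)
        *num_elem = SG_SLOTS;
    return rc;
}
```

      Without the correction a caller that asks for more entries than fit sees an element count of 0; with it the count is SG_SLOTS, so the filled entries are still pushed.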
      
      Fixes: 3c4d7559 ("tls: kernel TLS support")
      Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'ena-fixes' · 0e1f4c76
      David S. Miller authored
      Netanel Belgazal says:
      
      ====================
      bug fixes for ENA Ethernet driver
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ena: fix incorrect usage of memory barriers · 37dff155
      Netanel Belgazal authored
      Added memory barriers where they were missing to support multiple
      architectures, and removed redundant ones.
      
      As part of removing the redundant memory barriers and improving
      performance, we moved to more relaxed versions of memory barriers,
      as well as to the more relaxed version of writel - writel_relaxed,
      while maintaining correctness.
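
      As a rough user-space analogy for pairing one explicit barrier with a relaxed write, the sketch below uses C11 atomics. It is not the driver's code and does not use the kernel barrier or writel primitives: struct ring_model, post_desc_model() and consume_model() are hypothetical names, and the release fence merely plays the role of the write barrier that orders descriptor contents before the doorbell update.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RING_SLOTS 8u

/* Hypothetical descriptor ring with a doorbell-like tail index. */
struct ring_model {
    uint32_t desc[RING_SLOTS];
    _Atomic uint32_t doorbell; /* 0 means nothing posted yet */
};

/* Producer: fill the descriptor, fence once, then do a relaxed store. */
static void post_desc_model(struct ring_model *r, uint32_t slot, uint32_t val)
{
    assert(r);
    r->desc[slot % RING_SLOTS] = val;
    /* One barrier orders the descriptor write before the doorbell. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&r->doorbell, slot + 1, memory_order_relaxed);
}

/* Consumer: relaxed load of the doorbell, fence, then read the slot. */
static int consume_model(struct ring_model *r, uint32_t *out)
{
    assert(r && out);
    uint32_t tail = atomic_load_explicit(&r->doorbell, memory_order_relaxed);

    if (tail == 0)
        return 0;
    atomic_thread_fence(memory_order_acquire);
    *out = r->desc[(tail - 1) % RING_SLOTS];
    return 1;
}
```

      The point of the shape is that the ordering cost is paid once by the fence, so the store itself does not need to carry an implied barrier of its own.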
      Signed-off-by: Netanel Belgazal <netanel@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ena: fix missing calls to READ_ONCE · 28abf4e9
      Netanel Belgazal authored
      Add READ_ONCE calls where necessary (for example when iterating
      over a memory field that gets updated by the hardware).
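
      A simplified user-space analogue of this pattern is sketched below. READ_ONCE_MODEL is a hypothetical macro for scalar fields that relies on the GCC/Clang __typeof__ extension; it is not the kernel's READ_ONCE, and struct cdesc_model is an invented completion descriptor. The volatile access forces a fresh load on every poll, so the compiler cannot hoist the read out of the loop.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical analogue of READ_ONCE for scalar lvalues. */
#define READ_ONCE_MODEL(x) (*(volatile __typeof__(x) *)&(x))

/* Invented completion descriptor that a device would update. */
struct cdesc_model {
    uint8_t phase;   /* flipped by the device when the entry is valid */
    uint16_t req_id; /* payload written by the device */
};

/*
 * Polls the phase field up to max_polls times. Returns the request id
 * once the expected phase is seen, or -1 if it never appears. A real
 * driver would also need a read barrier between the two loads.
 */
static int wait_for_completion_model(struct cdesc_model *d,
                                     uint8_t expected_phase, int max_polls)
{
    assert(d);
    for (int i = 0; i < max_polls; i++) {
        if (READ_ONCE_MODEL(d->phase) == expected_phase)
            return (int)READ_ONCE_MODEL(d->req_id);
    }
    return -1;
}
```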
      Signed-off-by: Netanel Belgazal <netanel@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>