1. 28 Jul, 2018 21 commits
    • Roopa Prabhu's avatar
      vxlan: add new fdb alloc and create helpers · 1c345a52
      Roopa Prabhu authored
      [ Upstream commit 7431016b ]
      
      - Add new vxlan_fdb_alloc helper
      - rename existing vxlan_fdb_create into vxlan_fdb_update:
              because it really creates or updates an existing
              fdb entry
      - move new fdb creation into a separate vxlan_fdb_create
      
      Main motivation for this change is to introduce the ability
      to decouple vxlan fdb creation and notify, used in a later patch.
      Signed-off-by: default avatarRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1c345a52
    • Roopa Prabhu's avatar
      rtnetlink: add rtnl_link_state check in rtnl_configure_link · 23557c5d
      Roopa Prabhu authored
      [ Upstream commit 5025f7f7 ]
      
      rtnl_configure_link sets dev->rtnl_link_state to
      RTNL_LINK_INITIALIZED and unconditionally calls
      __dev_notify_flags to notify user-space of dev flags.
      
      current call sequence for rtnl_configure_link
      rtnetlink_newlink
          rtnl_link_ops->newlink
          rtnl_configure_link (unconditionally notifies userspace of
                               default and new dev flags)
      
      If a newlink handler wants to call rtnl_configure_link
      early, we will end up with duplicate notifications to
      user-space.
      
      This patch fixes rtnl_configure_link to check rtnl_link_state
      and call __dev_notify_flags with gchanges = 0 if already
      RTNL_LINK_INITIALIZED.
      
      Later in the series, this patch will help the following sequence
      where a driver implementing newlink can call rtnl_configure_link
      to initialize the link early.
      
      makes the following call sequence work:
      rtnetlink_newlink
          rtnl_link_ops->newlink (vxlan) -> rtnl_configure_link (initializes
                                                      link and notifies
                                                      user-space of default
                                                      dev flags)
          rtnl_configure_link (updates dev flags if requested by user ifm
                               and notifies user-space of new dev flags)
      Signed-off-by: default avatarRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      23557c5d
    • Daniel Borkmann's avatar
      sock: fix sg page frag coalescing in sk_alloc_sg · 464e2326
      Daniel Borkmann authored
      [ Upstream commit 144fe2bf ]
      
      Current sg coalescing logic in sk_alloc_sg() (latter is used by tls and
      sockmap) is not quite correct in that we do fetch the previous sg entry,
      however the subsequent check whether the refilled page frag from the
      socket is still the same as from the last entry with prior offset and
      length matching the start of the current buffer is comparing always the
      first sg list entry instead of the prior one.
      
      Fixes: 3c4d7559 ("tls: kernel TLS support")
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarDave Watson <davejwatson@fb.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      464e2326
    • Heiner Kallweit's avatar
      net: phy: consider PHY_IGNORE_INTERRUPT in phy_start_aneg_priv · 50b464d3
      Heiner Kallweit authored
      [ Upstream commit 215d08a8 ]
      
      The situation described in the comment can occur also with
      PHY_IGNORE_INTERRUPT, therefore change the condition to include it.
      
      Fixes: f555f34f ("net: phy: fix auto-negotiation stall due to unavailable interrupt")
      Signed-off-by: default avatarHeiner Kallweit <hkallweit1@gmail.com>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      50b464d3
    • Hangbin Liu's avatar
      multicast: do not restore deleted record source filter mode to new one · 46f9e1d0
      Hangbin Liu authored
      There are two scenarios that we will restore deleted records. The first is
      when device down and up(or unmap/remap). In this scenario the new filter
      mode is same with previous one. Because we get it from in_dev->mc_list and
      we do not touch it during device down and up.
      
      The other scenario is when a new socket join a group which was just delete
      and not finish sending status reports. In this scenario, we should use the
      current filter mode instead of restore old one. Here are 4 cases in total.
      
      old_socket        new_socket       before_fix       after_fix
        IN(A)             IN(A)           ALLOW(A)         ALLOW(A)
        IN(A)             EX( )           TO_IN( )         TO_EX( )
        EX( )             IN(A)           TO_EX( )         ALLOW(A)
        EX( )             EX( )           TO_EX( )         TO_EX( )
      
      Fixes: 24803f38 (igmp: do not remove igmp souce list info when set link down)
      Fixes: 1666d49e (mld: do not remove mld souce list info when set link down)
      Signed-off-by: default avatarHangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      46f9e1d0
    • David Ahern's avatar
      net/ipv6: Fix linklocal to global address with VRF · 6d5b7d68
      David Ahern authored
      [ Upstream commit 24b711ed ]
      
      Example setup:
          host: ip -6 addr add dev eth1 2001:db8:104::4
                 where eth1 is enslaved to a VRF
      
          switch: ip -6 ro add 2001:db8:104::4/128 dev br1
                  where br1 only has an LLA
      
                 ping6 2001:db8:104::4
                 ssh   2001:db8:104::4
      
      (NOTE: UDP works fine if the PKTINFO has the address set to the global
      address and ifindex is set to the index of eth1 with a destination an
      LLA).
      
      For ICMP, icmp6_iif needs to be updated to check if skb->dev is an
      L3 master. If it is then return the ifindex from rt6i_idev similar
      to what is done for loopback.
      
      For TCP, restore the original tcp_v6_iif definition which is needed in
      most places and add a new tcp_v6_iif_l3_slave that considers the
      l3_slave variability. This latter check is only needed for socket
      lookups.
      
      Fixes: 9ff74384 ("net: vrf: Handle ipv6 multicast and link-local addresses")
      Signed-off-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6d5b7d68
    • Eran Ben Elisha's avatar
      net/mlx5e: Fix quota counting in aRFS expire flow · 047af2d8
      Eran Ben Elisha authored
      [ Upstream commit 2630bae8 ]
      
      Quota should follow the amount of rules which do expire, and not the
      number of rules that were examined, fixed that.
      
      Fixes: 18c908e4 ("net/mlx5e: Add accelerated RFS support")
      Signed-off-by: default avatarEran Ben Elisha <eranbe@mellanox.com>
      Reviewed-by: default avatarMaor Gottlieb <maorg@mellanox.com>
      Reviewed-by: default avatarTariq Toukan <tariqt@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      047af2d8
    • Eran Ben Elisha's avatar
      net/mlx5e: Don't allow aRFS for encapsulated packets · c83cd442
      Eran Ben Elisha authored
      [ Upstream commit d2e1c57b ]
      
      Driver is yet to support aRFS for encapsulated packets, return early
      error in such case.
      
      Fixes: 18c908e4 ("net/mlx5e: Add accelerated RFS support")
      Signed-off-by: default avatarEran Ben Elisha <eranbe@mellanox.com>
      Reviewed-by: default avatarTariq Toukan <tariqt@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c83cd442
    • Ariel Levkovich's avatar
      net/mlx5: Adjust clock overflow work period · 291d99ac
      Ariel Levkovich authored
      [ Upstream commit 33180bee ]
      
      When driver converts HW timestamp to wall clock time it subtracts
      the last saved cycle counter from the HW timestamp and converts the
      difference to nanoseconds.
      The conversion is done by multiplying the cycles difference with the
      clock multiplier value as a first step and therefore the cycles
      difference should be small enough so that the multiplication product
      doesn't exceed 64bit.
      
      The overflow handling routine is in charge of updating the last saved
      cycle counter in driver and it is called periodically using kernel
      delayed workqueue.
      
      The delay period for this work is calculated using the max HW cycle
      counter value (a 41 bit mask) as a base which doesn't take the 64bit
      limit into account so the delay period may be incorrect and too
      long to prevent a large difference between the HW counter and the last
      saved counter in SW.
      
      This change adjusts the work period for the HW clock overflow work by
      taking the minimum between the previous value and the quotient of max
      u64 value and the clock multiplier value.
      
      Fixes: ef9814de ("net/mlx5e: Add HW timestamping (TS) support")
      Signed-off-by: default avatarAriel Levkovich <lariel@mellanox.com>
      Reviewed-by: default avatarEran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      291d99ac
    • Eric Dumazet's avatar
      net: skb_segment() should not return NULL · f208fbad
      Eric Dumazet authored
      [ Upstream commit ff907a11 ]
      
      syzbot caught a NULL deref [1], caused by skb_segment()
      
      skb_segment() has many "goto err;" that assume the @err variable
      contains -ENOMEM.
      
      A successful call to __skb_linearize() should not clear @err,
      otherwise a subsequent memory allocation error could return NULL.
      
      While we are at it, we might use -EINVAL instead of -ENOMEM when
      MAX_SKB_FRAGS limit is reached.
      
      [1]
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] SMP KASAN
      CPU: 0 PID: 13285 Comm: syz-executor3 Not tainted 4.18.0-rc4+ #146
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:tcp_gso_segment+0x3dc/0x1780 net/ipv4/tcp_offload.c:106
      Code: f0 ff ff 0f 87 1c fd ff ff e8 00 88 0b fb 48 8b 75 d0 48 b9 00 00 00 00 00 fc ff df 48 8d be 90 00 00 00 48 89 f8 48 c1 e8 03 <0f> b6 14 08 48 8d 86 94 00 00 00 48 89 c6 83 e0 07 48 c1 ee 03 0f
      RSP: 0018:ffff88019b7fd060 EFLAGS: 00010206
      RAX: 0000000000000012 RBX: 0000000000000020 RCX: dffffc0000000000
      RDX: 0000000000040000 RSI: 0000000000000000 RDI: 0000000000000090
      RBP: ffff88019b7fd0f0 R08: ffff88019510e0c0 R09: ffffed003b5c46d6
      R10: ffffed003b5c46d6 R11: ffff8801dae236b3 R12: 0000000000000001
      R13: ffff8801d6c581f4 R14: 0000000000000000 R15: ffff8801d6c58128
      FS:  00007fcae64d6700(0000) GS:ffff8801dae00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00000000004e8664 CR3: 00000001b669b000 CR4: 00000000001406f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       tcp4_gso_segment+0x1c3/0x440 net/ipv4/tcp_offload.c:54
       inet_gso_segment+0x64e/0x12d0 net/ipv4/af_inet.c:1342
       inet_gso_segment+0x64e/0x12d0 net/ipv4/af_inet.c:1342
       skb_mac_gso_segment+0x3b5/0x740 net/core/dev.c:2792
       __skb_gso_segment+0x3c3/0x880 net/core/dev.c:2865
       skb_gso_segment include/linux/netdevice.h:4099 [inline]
       validate_xmit_skb+0x640/0xf30 net/core/dev.c:3104
       __dev_queue_xmit+0xc14/0x3910 net/core/dev.c:3561
       dev_queue_xmit+0x17/0x20 net/core/dev.c:3602
       neigh_hh_output include/net/neighbour.h:473 [inline]
       neigh_output include/net/neighbour.h:481 [inline]
       ip_finish_output2+0x1063/0x1860 net/ipv4/ip_output.c:229
       ip_finish_output+0x841/0xfa0 net/ipv4/ip_output.c:317
       NF_HOOK_COND include/linux/netfilter.h:276 [inline]
       ip_output+0x223/0x880 net/ipv4/ip_output.c:405
       dst_output include/net/dst.h:444 [inline]
       ip_local_out+0xc5/0x1b0 net/ipv4/ip_output.c:124
       iptunnel_xmit+0x567/0x850 net/ipv4/ip_tunnel_core.c:91
       ip_tunnel_xmit+0x1598/0x3af1 net/ipv4/ip_tunnel.c:778
       ipip_tunnel_xmit+0x264/0x2c0 net/ipv4/ipip.c:308
       __netdev_start_xmit include/linux/netdevice.h:4148 [inline]
       netdev_start_xmit include/linux/netdevice.h:4157 [inline]
       xmit_one net/core/dev.c:3034 [inline]
       dev_hard_start_xmit+0x26c/0xc30 net/core/dev.c:3050
       __dev_queue_xmit+0x29ef/0x3910 net/core/dev.c:3569
       dev_queue_xmit+0x17/0x20 net/core/dev.c:3602
       neigh_direct_output+0x15/0x20 net/core/neighbour.c:1403
       neigh_output include/net/neighbour.h:483 [inline]
       ip_finish_output2+0xa67/0x1860 net/ipv4/ip_output.c:229
       ip_finish_output+0x841/0xfa0 net/ipv4/ip_output.c:317
       NF_HOOK_COND include/linux/netfilter.h:276 [inline]
       ip_output+0x223/0x880 net/ipv4/ip_output.c:405
       dst_output include/net/dst.h:444 [inline]
       ip_local_out+0xc5/0x1b0 net/ipv4/ip_output.c:124
       ip_queue_xmit+0x9df/0x1f80 net/ipv4/ip_output.c:504
       tcp_transmit_skb+0x1bf9/0x3f10 net/ipv4/tcp_output.c:1168
       tcp_write_xmit+0x1641/0x5c20 net/ipv4/tcp_output.c:2363
       __tcp_push_pending_frames+0xb2/0x290 net/ipv4/tcp_output.c:2536
       tcp_push+0x638/0x8c0 net/ipv4/tcp.c:735
       tcp_sendmsg_locked+0x2ec5/0x3f00 net/ipv4/tcp.c:1410
       tcp_sendmsg+0x2f/0x50 net/ipv4/tcp.c:1447
       inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798
       sock_sendmsg_nosec net/socket.c:641 [inline]
       sock_sendmsg+0xd5/0x120 net/socket.c:651
       __sys_sendto+0x3d7/0x670 net/socket.c:1797
       __do_sys_sendto net/socket.c:1809 [inline]
       __se_sys_sendto net/socket.c:1805 [inline]
       __x64_sys_sendto+0xe1/0x1a0 net/socket.c:1805
       do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x455ab9
      Code: 1d ba fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 eb b9 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007fcae64d5c68 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
      RAX: ffffffffffffffda RBX: 00007fcae64d66d4 RCX: 0000000000455ab9
      RDX: 0000000000000001 RSI: 0000000020000200 RDI: 0000000000000013
      RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000014
      R13: 00000000004c1145 R14: 00000000004d1818 R15: 0000000000000006
      Modules linked in:
      Dumping ftrace buffer:
         (ftrace buffer empty)
      
      Fixes: ddff00d4 ("net: Move skb_has_shared_frag check out of GRE code and into segmentation")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Alexander Duyck <alexander.h.duyck@intel.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Acked-by: default avatarAlexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f208fbad
    • Jack Morgenstein's avatar
      net/mlx4_core: Save the qpn from the input modifier in RST2INIT wrapper · 6e92f04a
      Jack Morgenstein authored
      [ Upstream commit 958c696f ]
      
      Function mlx4_RST2INIT_QP_wrapper saved the qp number passed in the qp
      context, rather than the one passed in the input modifier.
      
      However, the qp number in the qp context is not defined as a
      required parameter by the FW. Therefore, drivers may choose to not
      specify the qp number in the qp context for the reset-to-init transition.
      
      Thus, we must save the qp number passed in the command input modifier --
      which is always present. (This saved qp number is used as the input
      modifier for command 2RST_QP when a slave's qp's are destroyed).
      
      Fixes: c82e9aa0 ("mlx4_core: resource tracking for HCA resources used by guests")
      Signed-off-by: default avatarJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: default avatarTariq Toukan <tariqt@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6e92f04a
    • Willem de Bruijn's avatar
      ip: in cmsg IP(V6)_ORIGDSTADDR call pskb_may_pull · df20f746
      Willem de Bruijn authored
      [ Upstream commit 2efd4fca ]
      
      Syzbot reported a read beyond the end of the skb head when returning
      IPV6_ORIGDSTADDR:
      
        BUG: KMSAN: kernel-infoleak in put_cmsg+0x5ef/0x860 net/core/scm.c:242
        CPU: 0 PID: 4501 Comm: syz-executor128 Not tainted 4.17.0+ #9
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
        Google 01/01/2011
        Call Trace:
          __dump_stack lib/dump_stack.c:77 [inline]
          dump_stack+0x185/0x1d0 lib/dump_stack.c:113
          kmsan_report+0x188/0x2a0 mm/kmsan/kmsan.c:1125
          kmsan_internal_check_memory+0x138/0x1f0 mm/kmsan/kmsan.c:1219
          kmsan_copy_to_user+0x7a/0x160 mm/kmsan/kmsan.c:1261
          copy_to_user include/linux/uaccess.h:184 [inline]
          put_cmsg+0x5ef/0x860 net/core/scm.c:242
          ip6_datagram_recv_specific_ctl+0x1cf3/0x1eb0 net/ipv6/datagram.c:719
          ip6_datagram_recv_ctl+0x41c/0x450 net/ipv6/datagram.c:733
          rawv6_recvmsg+0x10fb/0x1460 net/ipv6/raw.c:521
          [..]
      
      This logic and its ipv4 counterpart read the destination port from
      the packet at skb_transport_offset(skb) + 4.
      
      With MSG_MORE and a local SOCK_RAW sender, syzbot was able to cook a
      packet that stores headers exactly up to skb_transport_offset(skb) in
      the head and the remainder in a frag.
      
      Call pskb_may_pull before accessing the pointer to ensure that it lies
      in skb head.
      
      Link: http://lkml.kernel.org/r/CAF=yD-LEJwZj5a1-bAAj2Oy_hKmGygV6rsJ_WOrAYnv-fnayiQ@mail.gmail.com
      Reported-by: syzbot+9adb4b567003cac781f0@syzkaller.appspotmail.com
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      df20f746
    • Paolo Abeni's avatar
      ip: hash fragments consistently · c2ce657f
      Paolo Abeni authored
      [ Upstream commit 3dd1c9a1 ]
      
      The skb hash for locally generated ip[v6] fragments belonging
      to the same datagram can vary in several circumstances:
      * for connected UDP[v6] sockets, the first fragment get its hash
        via set_owner_w()/skb_set_hash_from_sk()
      * for unconnected IPv6 UDPv6 sockets, the first fragment can get
        its hash via ip6_make_flowlabel()/skb_get_hash_flowi6(), if
        auto_flowlabel is enabled
      
      For the following frags the hash is usually computed via
      skb_get_hash().
      The above can cause OoO for unconnected IPv6 UDPv6 socket: in that
      scenario the egress tx queue can be selected on a per packet basis
      via the skb hash.
      It may also fool flow-oriented schedulers to place fragments belonging
      to the same datagram in different flows.
      
      Fix the issue by copying the skb hash from the head frag into
      the others at fragmentation time.
      
      Before this commit:
      perf probe -a "dev_queue_xmit skb skb->hash skb->l4_hash:b1@0/8 skb->sw_hash:b1@1/8"
      netperf -H $IPV4 -t UDP_STREAM -l 5 -- -m 2000 -n &
      perf record -e probe:dev_queue_xmit -e probe:skb_set_owner_w -a sleep 0.1
      perf script
      probe:dev_queue_xmit: (ffffffff8c6b1b20) hash=3713014309 l4_hash=1 sw_hash=0
      probe:dev_queue_xmit: (ffffffff8c6b1b20) hash=0 l4_hash=0 sw_hash=0
      
      After this commit:
      probe:dev_queue_xmit: (ffffffff8c6b1b20) hash=2171763177 l4_hash=1 sw_hash=0
      probe:dev_queue_xmit: (ffffffff8c6b1b20) hash=2171763177 l4_hash=1 sw_hash=0
      
      Fixes: b73c3d0e ("net: Save TX flow hash in sock and set in skbuf on xmit")
      Fixes: 67800f9b ("ipv6: Call skb_get_hash_flowi6 to get skb->hash in ip6_make_flowlabel")
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c2ce657f
    • Jarod Wilson's avatar
      bonding: set default miimon value for non-arp modes if not set · f1fb27fc
      Jarod Wilson authored
      [ Upstream commit c1f897ce ]
      
      For some time now, if you load the bonding driver and configure bond
      parameters via sysfs using minimal config options, such as specifying
      nothing but the mode, relying on defaults for everything else, modes
      that cannot use arp monitoring (802.3ad, balance-tlb, balance-alb) all
      wind up with both arp_interval=0 (as it should be) and miimon=0, which
      means the miimon monitor thread never actually runs. This is particularly
      problematic for 802.3ad.
      
      For example, from an LNST recipe I've set up:
      
      $ modprobe bonding max_bonds=0"
      $ echo "+t_bond0" > /sys/class/net/bonding_masters"
      $ ip link set t_bond0 down"
      $ echo "802.3ad" > /sys/class/net/t_bond0/bonding/mode"
      $ ip link set ens1f1 down"
      $ echo "+ens1f1" > /sys/class/net/t_bond0/bonding/slaves"
      $ ip link set ens1f0 down"
      $ echo "+ens1f0" > /sys/class/net/t_bond0/bonding/slaves"
      $ ethtool -i t_bond0"
      $ ip link set ens1f1 up"
      $ ip link set ens1f0 up"
      $ ip link set t_bond0 up"
      $ ip addr add 192.168.9.1/24 dev t_bond0"
      $ ip addr add 2002::1/64 dev t_bond0"
      
      This bond comes up okay, but things look slightly suspect in
      /proc/net/bonding/t_bond0 output:
      
      $ grep -i mii /proc/net/bonding/t_bond0
      MII Status: up
      MII Polling Interval (ms): 0
      MII Status: up
      MII Status: up
      
      Now, pull a cable on one of the ports in the bond, then reconnect it, and
      you'll see:
      
      Slave Interface: ens1f0
      MII Status: down
      Speed: 1000 Mbps
      Duplex: full
      
      I believe this became a major issue as of commit 4d2c0cda, which for
      802.3ad bonds, sets slave->link = BOND_LINK_DOWN, with a comment about
      relying on link monitoring via miimon to set it correctly, but since the
      miimon work queue never runs, the link just stays marked down.
      
      If we simply tweak bond_option_mode_set() slightly, we can check for the
      non-arp modes having no miimon value set, and insert BOND_DEFAULT_MIIMON,
      which gets things back in full working order. This problem exists as far
      back as 4.14, and might be worth fixing in all stable trees since, though
      the work-around is to simply specify an miimon value yourself.
      Reported-by: default avatarBob Ball <ball@umich.edu>
      Signed-off-by: default avatarJarod Wilson <jarod@redhat.com>
      Acked-by: default avatarMahesh Bandewar <maheshb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f1fb27fc
    • Lyude Paul's avatar
      drm/nouveau: Set DRIVER_ATOMIC cap earlier to fix debugfs · 7e454c18
      Lyude Paul authored
      commit eb493fbc upstream.
      
      Currently nouveau doesn't actually expose the state debugfs file that's
      usually provided for any modesetting driver that supports atomic, even
      if nouveau is loaded with atomic=1. This is due to the fact that the
      standard debugfs files that DRM creates for atomic drivers is called
      when drm_get_pci_dev() is called from nouveau_drm.c. This happens well
      before we've initialized the display core, which is currently
      responsible for setting the DRIVER_ATOMIC cap.
      
      So, move the atomic option into nouveau_drm.c and just add the
      DRIVER_ATOMIC cap whenever it's enabled on the kernel commandline. This
      shouldn't cause any actual issues, as the atomic ioctl will still fail
      as expected even if the display core doesn't disable it until later in
      the init sequence. This also provides the added benefit of being able to
      use the state debugfs file to check the current display state even if
      clients aren't allowed to modify it through anything other than the
      legacy ioctls.
      
      Additionally, disable the DRIVER_ATOMIC cap in nv04's display core, as
      this was already disabled there previously.
      Signed-off-by: default avatarLyude Paul <lyude@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarBen Skeggs <bskeggs@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7e454c18
    • Lyude Paul's avatar
      drm/nouveau/drm/nouveau: Fix runtime PM leak in nv50_disp_atomic_commit() · d0bd2c70
      Lyude Paul authored
      commit e5d54f19 upstream.
      
      A CRTC being enabled doesn't mean it's on! It doesn't even necessarily
      mean it's being used. This fixes runtime PM leaks on the P50 I've got
      next to me.
      Signed-off-by: default avatarLyude Paul <lyude@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarBen Skeggs <bskeggs@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      d0bd2c70
    • Alexey Kardashevskiy's avatar
      KVM: PPC: Check if IOMMU page is contained in the pinned physical page · 58113603
      Alexey Kardashevskiy authored
      commit 76fa4975 upstream.
      
      A VM which has:
       - a DMA capable device passed through to it (eg. network card);
       - running a malicious kernel that ignores H_PUT_TCE failure;
       - capability of using IOMMU pages bigger that physical pages
      can create an IOMMU mapping that exposes (for example) 16MB of
      the host physical memory to the device when only 64K was allocated to the VM.
      
      The remaining 16MB - 64K will be some other content of host memory, possibly
      including pages of the VM, but also pages of host kernel memory, host
      programs or other VMs.
      
      The attacking VM does not control the location of the page it can map,
      and is only allowed to map as many pages as it has pages of RAM.
      
      We already have a check in drivers/vfio/vfio_iommu_spapr_tce.c that
      an IOMMU page is contained in the physical page so the PCI hardware won't
      get access to unassigned host memory; however this check is missing in
      the KVM fastpath (H_PUT_TCE accelerated code). We were lucky so far and
      did not hit this yet as the very first time when the mapping happens
      we do not have tbl::it_userspace allocated yet and fall back to
      the userspace which in turn calls VFIO IOMMU driver, this fails and
      the guest does not retry,
      
      This stores the smallest preregistered page size in the preregistered
      region descriptor and changes the mm_iommu_xxx API to check this against
      the IOMMU page size.
      
      This calculates maximum page size as a minimum of the natural region
      alignment and compound page size. For the page shift this uses the shift
      returned by find_linux_pte() which indicates how the page is mapped to
      the current userspace - if the page is huge and this is not a zero, then
      it is a leaf pte and the page is mapped within the range.
      
      Fixes: 121f80ba ("KVM: PPC: VFIO: Add in-kernel acceleration for VFIO")
      Cc: stable@vger.kernel.org # v4.12+
      Signed-off-by: default avatarAlexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: default avatarDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarAlexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      58113603
    • Boris Ostrovsky's avatar
      xen/PVH: Set up GS segment for stack canary · 14500f14
      Boris Ostrovsky authored
      commit 98014068 upstream.
      
      We are making calls to C code (e.g. xen_prepare_pvh()) which may use
      stack canary (stored in GS segment).
      Signed-off-by: default avatarBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Reviewed-by: default avatarJuergen Gross <jgross@suse.com>
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Cc: Jason Andryuk <jandryuk@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      14500f14
    • Paul Burton's avatar
      MIPS: Fix off-by-one in pci_resource_to_user() · de019e78
      Paul Burton authored
      commit 38c0a74f upstream.
      
      The MIPS implementation of pci_resource_to_user() introduced in v3.12 by
      commit 4c2924b7 ("MIPS: PCI: Use pci_resource_to_user to map pci
      memory space properly") incorrectly sets *end to the address of the
      byte after the resource, rather than the last byte of the resource.
      
      This results in userland seeing resources as a byte larger than they
      actually are, for example a 32 byte BAR will be reported by a tool such
      as lspci as being 33 bytes in size:
      
          Region 2: I/O ports at 1000 [disabled] [size=33]
      
      Correct this by subtracting one from the calculated end address,
      reporting the correct address to userland.
      Signed-off-by: default avatarPaul Burton <paul.burton@mips.com>
      Reported-by: default avatarRui Wang <rui.wang@windriver.com>
      Fixes: 4c2924b7 ("MIPS: PCI: Use pci_resource_to_user to map pci memory space properly")
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Wolfgang Grandegger <wg@grandegger.com>
      Cc: linux-mips@linux-mips.org
      Cc: stable@vger.kernel.org # v3.12+
      Patchwork: https://patchwork.linux-mips.org/patch/19829/Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      de019e78
    • Felix Fietkau's avatar
      MIPS: ath79: fix register address in ath79_ddr_wb_flush() · 4c686d73
      Felix Fietkau authored
      commit bc88ad2e upstream.
      
      ath79_ddr_wb_flush_base has the type void __iomem *, so register offsets
      need to be a multiple of 4 in order to access the intended register.
      Signed-off-by: default avatarFelix Fietkau <nbd@nbd.name>
      Signed-off-by: default avatarJohn Crispin <john@phrozen.org>
      Signed-off-by: default avatarPaul Burton <paul.burton@mips.com>
      Fixes: 24b0e3e8 ("MIPS: ath79: Improve the DDR controller interface")
      Patchwork: https://patchwork.linux-mips.org/patch/19912/
      Cc: Alban Bedel <albeu@free.fr>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: stable@vger.kernel.org # 4.2+
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4c686d73
    • Greg Kroah-Hartman's avatar
      Revert "cifs: Fix slab-out-of-bounds in send_set_info() on SMB2 ACE setting" · 4168a842
      Greg Kroah-Hartman authored
      This reverts commit 748144f3 which is
      commit f46ecbd9 upstream.
      
      Philip reports:
      	seems adding "cifs: Fix slab-out-of-bounds in send_set_info() on SMB2
      	ACE setting" (commit 748144f3) [1] created a regression within linux
      	v4.14 kernel series. Writing to a mounted cifs either freezes on writing
      	or crashes the PC. A more detailed explanation you may find in our
      	forums [2]. Reverting the patch, seems to "fix" it. Thoughts?
      
      	[2] https://forum.manjaro.org/t/53250Reported-by: default avatarPhilip Müller <philm@manjaro.org>
      Cc: Jianhong Yin <jiyin@redhat.com>
      Cc: Stefano Brivio <sbrivio@redhat.com>
      Cc: Aurelien Aptel <aaptel@suse.com>
      Cc: Steve French <stfrench@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4168a842
  2. 25 Jul, 2018 19 commits