1. 15 Aug, 2017 12 commits
    • Eric Dumazet's avatar
      tcp: fix possible deadlock in TCP stack vs BPF filter · d624d276
      Eric Dumazet authored
      Filtering the ACK packet was not put at the right place.
      
      At this place, we already allocated a child and put it
      into accept queue.
      
      We absolutely need to call tcp_child_process() to release
      its spinlock, or we will deadlock at accept() or close() time.
      
      Found by syzkaller team (Thanks a lot !)
      
      Fixes: 8fac365f ("tcp: Add a tcp_filter hook before handle ack packet")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Cc: Chenbo Feng <fengc@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d624d276
    • Eric Dumazet's avatar
      dccp: purge write queue in dccp_destroy_sock() · 7749d4ff
      Eric Dumazet authored
      syzkaller reported that DCCP could have a non empty
      write queue at dismantle time.
      
      WARNING: CPU: 1 PID: 2953 at net/core/stream.c:199 sk_stream_kill_queues+0x3ce/0x520 net/core/stream.c:199
      Kernel panic - not syncing: panic_on_warn set ...
      
      CPU: 1 PID: 2953 Comm: syz-executor0 Not tainted 4.13.0-rc4+ #2
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:16 [inline]
       dump_stack+0x194/0x257 lib/dump_stack.c:52
       panic+0x1e4/0x417 kernel/panic.c:180
       __warn+0x1c4/0x1d9 kernel/panic.c:541
       report_bug+0x211/0x2d0 lib/bug.c:183
       fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:190
       do_trap_no_signal arch/x86/kernel/traps.c:224 [inline]
       do_trap+0x260/0x390 arch/x86/kernel/traps.c:273
       do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:310
       do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:323
       invalid_op+0x1e/0x30 arch/x86/entry/entry_64.S:846
      RIP: 0010:sk_stream_kill_queues+0x3ce/0x520 net/core/stream.c:199
      RSP: 0018:ffff8801d182f108 EFLAGS: 00010297
      RAX: ffff8801d1144140 RBX: ffff8801d13cb280 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: ffffffff85137b00 RDI: ffff8801d13cb280
      RBP: ffff8801d182f148 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801d13cb4d0
      R13: ffff8801d13cb3b8 R14: ffff8801d13cb300 R15: ffff8801d13cb3b8
       inet_csk_destroy_sock+0x175/0x3f0 net/ipv4/inet_connection_sock.c:835
       dccp_close+0x84d/0xc10 net/dccp/proto.c:1067
       inet_release+0xed/0x1c0 net/ipv4/af_inet.c:425
       sock_release+0x8d/0x1e0 net/socket.c:597
       sock_close+0x16/0x20 net/socket.c:1126
       __fput+0x327/0x7e0 fs/file_table.c:210
       ____fput+0x15/0x20 fs/file_table.c:246
       task_work_run+0x18a/0x260 kernel/task_work.c:116
       exit_task_work include/linux/task_work.h:21 [inline]
       do_exit+0xa32/0x1b10 kernel/exit.c:865
       do_group_exit+0x149/0x400 kernel/exit.c:969
       get_signal+0x7e8/0x17e0 kernel/signal.c:2330
       do_signal+0x94/0x1ee0 arch/x86/kernel/signal.c:808
       exit_to_usermode_loop+0x21c/0x2d0 arch/x86/entry/common.c:157
       prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
       syscall_return_slowpath+0x3a7/0x450 arch/x86/entry/common.c:263
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7749d4ff
    • Al Viro's avatar
      udp: fix linear skb reception with PEEK_OFF · 42b73059
      Al Viro authored
      copy_linear_skb() is broken; both of its callers actually
      expect 'len' to be the amount we are trying to copy,
      not the offset of the end.
      Fix it keeping the meanings of arguments in sync with what the
      callers (both of them) expect.
      Also restore a saner behavior on EFAULT (i.e. preserving
      the iov_iter position in case of failure):
      
      The commit fd851ba9 ("udp: harden copy_linear_skb()")
      avoids the more destructive effect of the buggy
      copy_linear_skb(), e.g. no more invalid memory access, but
      said function still behaves incorrectly: when peeking with
      offset it can fail with EINVAL instead of copying the
      appropriate amount of memory.
      Reported-by: default avatarSasha Levin <alexander.levin@verizon.com>
      Fixes: b65ac446 ("udp: try to avoid 2 cache miss on dequeue")
      Fixes: fd851ba9 ("udp: harden copy_linear_skb()")
      Signed-off-by: default avatarAl Viro <viro@ZenIV.linux.org.uk>
      Acked-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Tested-by: default avatarSasha Levin <alexander.levin@verizon.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      42b73059
    • Wei Wang's avatar
      ipv6: release rt6->rt6i_idev properly during ifdown · e5645f51
      Wei Wang authored
      When a dst is created by addrconf_dst_alloc() for a host route or an
      anycast route, dst->dev points to loopback dev while rt6->rt6i_idev
      points to a real device.
      When the real device goes down, the current cleanup code only checks for
      dst->dev and assumes rt6->rt6i_idev->dev is the same. This causes the
      refcount leak on the real device in the above situation.
      This patch makes sure to always release the refcount taken on
      rt6->rt6i_idev during dst_dev_put().
      
      Fixes: 587fea74 ("ipv6: mark DST_NOGC and remove the operation of
      dst_free()")
      Reported-by: default avatarJohn Stultz <john.stultz@linaro.org>
      Tested-by: default avatarJohn Stultz <john.stultz@linaro.org>
      Tested-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: default avatarWei Wang <weiwan@google.com>
      Signed-off-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Acked-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e5645f51
    • Eric Dumazet's avatar
      af_key: do not use GFP_KERNEL in atomic contexts · 36f41f8f
      Eric Dumazet authored
      pfkey_broadcast() might be called from non process contexts,
      we can not use GFP_KERNEL in these cases [1].
      
      This patch partially reverts commit ba51b6be ("net: Fix RCU splat in
      af_key"), only keeping the GFP_ATOMIC forcing under rcu_read_lock()
      section.
      
      [1] : syzkaller reported :
      
      in_atomic(): 1, irqs_disabled(): 0, pid: 2932, name: syzkaller183439
      3 locks held by syzkaller183439/2932:
       #0:  (&net->xfrm.xfrm_cfg_mutex){+.+.+.}, at: [<ffffffff83b43888>] pfkey_sendmsg+0x4c8/0x9f0 net/key/af_key.c:3649
       #1:  (&pfk->dump_lock){+.+.+.}, at: [<ffffffff83b467f6>] pfkey_do_dump+0x76/0x3f0 net/key/af_key.c:293
       #2:  (&(&net->xfrm.xfrm_policy_lock)->rlock){+...+.}, at: [<ffffffff83957632>] spin_lock_bh include/linux/spinlock.h:304 [inline]
       #2:  (&(&net->xfrm.xfrm_policy_lock)->rlock){+...+.}, at: [<ffffffff83957632>] xfrm_policy_walk+0x192/0xa30 net/xfrm/xfrm_policy.c:1028
      CPU: 0 PID: 2932 Comm: syzkaller183439 Not tainted 4.13.0-rc4+ #24
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:16 [inline]
       dump_stack+0x194/0x257 lib/dump_stack.c:52
       ___might_sleep+0x2b2/0x470 kernel/sched/core.c:5994
       __might_sleep+0x95/0x190 kernel/sched/core.c:5947
       slab_pre_alloc_hook mm/slab.h:416 [inline]
       slab_alloc mm/slab.c:3383 [inline]
       kmem_cache_alloc+0x24b/0x6e0 mm/slab.c:3559
       skb_clone+0x1a0/0x400 net/core/skbuff.c:1037
       pfkey_broadcast_one+0x4b2/0x6f0 net/key/af_key.c:207
       pfkey_broadcast+0x4ba/0x770 net/key/af_key.c:281
       dump_sp+0x3d6/0x500 net/key/af_key.c:2685
       xfrm_policy_walk+0x2f1/0xa30 net/xfrm/xfrm_policy.c:1042
       pfkey_dump_sp+0x42/0x50 net/key/af_key.c:2695
       pfkey_do_dump+0xaa/0x3f0 net/key/af_key.c:299
       pfkey_spddump+0x1a0/0x210 net/key/af_key.c:2722
       pfkey_process+0x606/0x710 net/key/af_key.c:2814
       pfkey_sendmsg+0x4d6/0x9f0 net/key/af_key.c:3650
      sock_sendmsg_nosec net/socket.c:633 [inline]
       sock_sendmsg+0xca/0x110 net/socket.c:643
       ___sys_sendmsg+0x755/0x890 net/socket.c:2035
       __sys_sendmsg+0xe5/0x210 net/socket.c:2069
       SYSC_sendmsg net/socket.c:2080 [inline]
       SyS_sendmsg+0x2d/0x50 net/socket.c:2076
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      RIP: 0033:0x445d79
      RSP: 002b:00007f32447c1dc8 EFLAGS: 00000202 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000445d79
      RDX: 0000000000000000 RSI: 000000002023dfc8 RDI: 0000000000000008
      RBP: 0000000000000086 R08: 00007f32447c2700 R09: 00007f32447c2700
      R10: 00007f32447c2700 R11: 0000000000000202 R12: 0000000000000000
      R13: 00007ffe33edec4f R14: 00007f32447c29c0 R15: 0000000000000000
      
      Fixes: ba51b6be ("net: Fix RCU splat in af_key")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Cc: David Ahern <dsa@cumulusnetworks.com>
      Acked-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      36f41f8f
    • Sabrina Dubroca's avatar
      tcp: ulp: avoid module refcnt leak in tcp_set_ulp · 539a06ba
      Sabrina Dubroca authored
      __tcp_ulp_find_autoload returns tcp_ulp_ops after taking a reference on
      the module. Then, if ->init fails, tcp_set_ulp propagates the error but
      nothing releases that reference.
      
      Fixes: 734942cc ("tcp: ULP infrastructure")
      Signed-off-by: default avatarSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      539a06ba
    • David S. Miller's avatar
      Merge branch 'Add-new-PCI_DEV_FLAGS_NO_RELAXED_ORDERING-flag' · bae514a6
      David S. Miller authored
      Ding Tianhong says:
      
      ====================
      Add new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag
      
      Some devices have problems with Transaction Layer Packets with the Relaxed
      Ordering Attribute set.  This patch set adds a new PCIe Device Flag,
      PCI_DEV_FLAGS_NO_RELAXED_ORDERING, a set of PCI Quirks to catch some known
      devices with Relaxed Ordering issues, and a use of this new flag by the
      cxgb4 driver to avoid using Relaxed Ordering with problematic Root Complex
      Ports.
      
      It's been years since I've submitted kernel.org patches, I appolgise for the
      almost certain submission errors.
      
      v2: Alexander point out that the v1 was only a part of the whole solution,
          some platform which has some issues could use the new flag to indicate
          that it is not safe to enable relaxed ordering attribute, then we need
          to clear the relaxed ordering enable bits in the PCI configuration when
          initializing the device. So add a new second patch to modify the PCI
          initialization code to clear the relaxed ordering enable bit in the
          event that the root complex doesn't want relaxed ordering enabled.
      
          The third patch was base on the v1's second patch and only be changed
          to query the relaxed ordering enable bit in the PCI configuration space
          to allow the Chelsio NIC to send TLPs with the relaxed ordering attributes
          set.
      
          This version didn't plan to drop the defines for Intel Drivers to use the
          new checking way to enable relaxed ordering because it is not the hardest
          part of the moment, we could fix it in next patchset when this patches
          reach the goal.
      
      v3: Redesigned the logic for pci_configure_relaxed_ordering when configuration,
          If a PCIe device didn't enable the relaxed ordering attribute default,
          we should not do anything in the PCIe configuration, otherwise we
          should check if any of the devices above us do not support relaxed
          ordering by the PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag, then base on
          the result if we get a return that indicate that the relaxed ordering
          is not supported we should update our device to disable relaxed ordering
          in configuration space. If the device above us doesn't exist or isn't
          the PCIe device, we shouldn't do anything and skip updating relaxed ordering
          because we are probably running in a guest.
      
      v4: Rename the functions pcie_get_relaxed_ordering and pcie_disable_relaxed_ordering
          according John's suggestion, and modify the description, use the true/false
          as the return value.
      
          We shouldn't enable relaxed ordering attribute by the setting in the root
          complex configuration space for PCIe device, so fix it for cxgb4.
      
          Fix some format issues.
      
      v5: Removed the unnecessary code for some function which only return the bool
          value, and add the check for VF device.
      
          Make this patch set base on 4.12-rc5.
      
      v6: Fix the logic error in the need to enable the relaxed ordering attribute for cxgb4.
      
      v7: The cxgb4 drivers will enable the PCIe Capability Device Control[Relaxed
          Ordering Enable] in PCI Probe() routine, this will break our current
          solution for some platform which has problematic when enable the relaxed
          ordering attribute. According to the latest recommendations, remove the
          enable_pcie_relaxed_ordering(), although it could not cover the Peer-to-Peer
          scene, but we agree to leave this problem until we really trigger it.
      
          Make this patch set base on 4.12 release version.
      
      v8: Change the second patch title and description to make it more reasonable,
          add the acked-by from Alex and Ashok.
      
          Add a new patch to enable the Relaxed Ordering Attribute for cxgb4vf driver.
      
          Make this patch set base on 4.13-rc2.
      
      v9: The document (https://software.intel.com/sites/default/files/managed/9e/
          bc/64-ia-32-architectures-optimization-manual.pdf) indicate that the Xeon
          processors based on Broadwell/Haswell microarchitecture has the problem
          with Relaxed Ordering Attribute enabled, so add the whole list Device ID
          from Intel to the patch.
      
      v10: Significant rework based on Bjorn's feedback, reorganize the first 2 patches,
           now the Intel and AMD erratum soc has been divided to the different patches,
           rename the pcie_relaxed_ordering_supported() to pcie_relaxed_ordering_enabled(),
           and no need to check every intervening switch except the root ports, update
           some commits.
      
      v11: We shouldn't let the Intel engineer to acked the AMD's erratum patch, fix the
           funny mistake.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bae514a6
    • Casey Leedom's avatar
      net/cxgb4vf: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag · b629276d
      Casey Leedom authored
      cxgb4vf Ethernet driver now queries PCIe configuration space to
      determine if it can send TLPs to it with the Relaxed Ordering
      Attribute set, just like the pf did.
      Signed-off-by: default avatarCasey Leedom <leedom@chelsio.com>
      Signed-off-by: default avatarDing Tianhong <dingtianhong@huawei.com>
      Reviewed-by: default avatarCasey Leedom <leedom@chelsio.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b629276d
    • Casey Leedom's avatar
      net/cxgb4: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag · b0ba9d5f
      Casey Leedom authored
      cxgb4 Ethernet driver now queries PCIe configuration space to determine
      if it can send TLPs to it with the Relaxed Ordering Attribute set.
      
      Remove the enable_pcie_relaxed_ordering() to avoid enable PCIe Capability
      Device Control[Relaxed Ordering Enable] at probe routine, to make sure
      the driver will not send the Relaxed Ordering TLPs to the Root Complex which
      could not deal the Relaxed Ordering TLPs.
      Signed-off-by: default avatarCasey Leedom <leedom@chelsio.com>
      Signed-off-by: default avatarDing Tianhong <dingtianhong@huawei.com>
      Reviewed-by: default avatarCasey Leedom <leedom@chelsio.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b0ba9d5f
    • dingtianhong's avatar
      PCI: Disable Relaxed Ordering Attributes for AMD A1100 · 077fa19c
      dingtianhong authored
      Casey reported that the AMD ARM A1100 SoC has a bug in its PCIe
      Root Port where Upstream Transaction Layer Packets with the Relaxed
      Ordering Attribute clear are allowed to bypass earlier TLPs with
      Relaxed Ordering set, it would cause Data Corruption, so we need
      to disable Relaxed Ordering Attribute when Upstream TLPs to the
      Root Port.
      Reported-and-suggested-by: default avatarCasey Leedom <leedom@chelsio.com>
      Signed-off-by: default avatarCasey Leedom <leedom@chelsio.com>
      Signed-off-by: default avatarDing Tianhong <dingtianhong@huawei.com>
      Acked-by: default avatarCasey Leedom <leedom@chelsio.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      077fa19c
    • dingtianhong's avatar
      PCI: Disable Relaxed Ordering for some Intel processors · 87e09cde
      dingtianhong authored
      According to the Intel spec section 3.9.1 said:
      
          3.9.1 Optimizing PCIe Performance for Accesses Toward Coherent Memory
                and Toward MMIO Regions (P2P)
      
          In order to maximize performance for PCIe devices in the processors
          listed in Table 3-6 below, the soft- ware should determine whether the
          accesses are toward coherent memory (system memory) or toward MMIO
          regions (P2P access to other devices). If the access is toward MMIO
          region, then software can command HW to set the RO bit in the TLP
          header, as this would allow hardware to achieve maximum throughput for
          these types of accesses. For accesses toward coherent memory, software
          can command HW to clear the RO bit in the TLP header (no RO), as this
          would allow hardware to achieve maximum throughput for these types of
          accesses.
      
          Table 3-6. Intel Processor CPU RP Device IDs for Processors Optimizing
                     PCIe Performance
      
          Processor                            CPU RP Device IDs
      
          Intel Xeon processors based on       6F01H-6F0EH
          Broadwell microarchitecture
      
          Intel Xeon processors based on       2F01H-2F0EH
          Haswell microarchitecture
      
      It means some Intel processors has performance issue when use the Relaxed
      Ordering Attribute, so disable Relaxed Ordering for these root port.
      Signed-off-by: default avatarCasey Leedom <leedom@chelsio.com>
      Signed-off-by: default avatarDing Tianhong <dingtianhong@huawei.com>
      Acked-by: default avatarAlexander Duyck <alexander.h.duyck@intel.com>
      Acked-by: default avatarAshok Raj <ashok.raj@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      87e09cde
    • dingtianhong's avatar
      PCI: Disable PCIe Relaxed Ordering if unsupported · a99b646a
      dingtianhong authored
      When bit4 is set in the PCIe Device Control register, it indicates
      whether the device is permitted to use relaxed ordering.
      On some platforms using relaxed ordering can have performance issues or
      due to erratum can cause data-corruption. In such cases devices must avoid
      using relaxed ordering.
      
      The patch adds a new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING to indicate that
      Relaxed Ordering (RO) attribute should not be used for Transaction Layer
      Packets (TLP) targeted towards these affected root complexes.
      
      This patch checks if there is any node in the hierarchy that indicates that
      using relaxed ordering is not safe. In such cases the patch turns off the
      relaxed ordering by clearing the capability for this device.
      Signed-off-by: default avatarCasey Leedom <leedom@chelsio.com>
      Signed-off-by: default avatarDing Tianhong <dingtianhong@huawei.com>
      Acked-by: default avatarAshok Raj <ashok.raj@intel.com>
      Acked-by: default avatarAlexander Duyck <alexander.h.duyck@intel.com>
      Acked-by: default avatarCasey Leedom <leedom@chelsio.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a99b646a
  2. 14 Aug, 2017 4 commits
    • Jon Paul Maloy's avatar
      tipc: avoid inheriting msg_non_seq flag when message is returned · 59a361bc
      Jon Paul Maloy authored
      In the function msg_reverse(), we reverse the header while trying to
      reuse the original buffer whenever possible. Those rejected/returned
      messages are always transmitted as unicast, but the msg_non_seq field
      is not explicitly set to zero as it should be.
      
      We have seen cases where multicast senders set the message type to
      "NOT dest_droppable", meaning that a multicast message shorter than
      one MTU will be returned, e.g., during receive buffer overflow, by
      reusing the original buffer. This has the effect that even the
      'msg_non_seq' field is inadvertently inherited by the rejected message,
      although it is now sent as a unicast message. This again leads the
      receiving unicast link endpoint to steer the packet toward the broadcast
      link receive function, where it is dropped. The affected unicast link is
      thereafter (after 100 failed retransmissions) declared 'stale' and
      reset.
      
      We fix this by unconditionally setting the 'msg_non_seq' flag to zero
      for all rejected/returned messages.
      Reported-by: default avatarCanh Duc Luu <canh.d.luu@dektech.com.au>
      Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      59a361bc
    • Jon Paul Maloy's avatar
      tipc: accept PACKET_MULTICAST packets · fed5f571
      Jon Paul Maloy authored
      On L2 bearers, the TIPC broadcast function is sending out packets using
      the corresponding L2 broadcast address. At reception, we filter such
      packets under the assumption that they will also be delivered as
      broadcast packets.
      
      This assumption doesn't always hold true. Under high load, we have seen
      that a switch may convert the destination address and deliver the packet
      as a PACKET_MULTICAST, something leading to inadvertently dropped
      packets and a stale and reset broadcast link.
      
      We fix this by extending the reception filtering to accept packets of
      type PACKET_MULTICAST.
      Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      fed5f571
    • Florian Westphal's avatar
      ipv4: route: fix inet_rtm_getroute induced crash · 2c87d63a
      Florian Westphal authored
      "ip route get $daddr iif eth0 from $saddr" causes:
       BUG: KASAN: use-after-free in ip_route_input_rcu+0x1535/0x1b50
       Call Trace:
        ip_route_input_rcu+0x1535/0x1b50
        ip_route_input_noref+0xf9/0x190
        tcp_v4_early_demux+0x1a4/0x2b0
        ip_rcv+0xbcb/0xc05
        __netif_receive_skb+0x9c/0xd0
        netif_receive_skb_internal+0x5a8/0x890
      
      Problem is that inet_rtm_getroute calls either ip_route_input_rcu (if an
      iif was provided) or ip_route_output_key_hash_rcu.
      
      But ip_route_input_rcu, unlike ip_route_output_key_hash_rcu, already
      associates the dst_entry with the skb.  This clears the SKB_DST_NOREF
      bit (i.e. skb_dst_drop will release/free the entry while it should not).
      
      Thus only set the dst if we called ip_route_output_key_hash_rcu().
      
      I tested this patch by running:
       while true;do ip r get 10.0.1.2;done > /dev/null &
       while true;do ip r get 10.0.1.2 iif eth0  from 10.0.1.1;done > /dev/null &
      ... and saw no crash or memory leak.
      
      Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
      Cc: David Ahern <dsahern@gmail.com>
      Fixes: ba52d61e ("ipv4: route: restore skb_dst_set in inet_rtm_getroute")
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2c87d63a
    • Andreas Born's avatar
      bonding: ratelimit failed speed/duplex update warning · 11e9d782
      Andreas Born authored
      bond_miimon_commit() handles the UP transition for each slave of a bond
      in the case of MII. It is triggered 10 times per second for the default
      MII Polling interval of 100ms. For device drivers that do not implement
      __ethtool_get_link_ksettings() the call to bond_update_speed_duplex()
      fails persistently while the MII status could remain UP. That is, in
      this and other cases where the speed/duplex update keeps failing over a
      longer period of time while the MII state is UP, a warning is printed
      every MII polling interval.
      
      To address these excessive warnings net_ratelimit() should be used.
      Printing a warning once would not be sufficient since the call to
      bond_update_speed_duplex() could recover to succeed and fail again
      later. In that case there would be no new indication what went wrong.
      
      Fixes: b5bf0f5b (bonding: correctly update link status during mii-commit phase)
      Signed-off-by: default avatarAndreas Born <futur.andy@googlemail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      11e9d782
  3. 11 Aug, 2017 10 commits
    • Eric Dumazet's avatar
      udp: harden copy_linear_skb() · fd851ba9
      Eric Dumazet authored
      syzkaller got crashes with CONFIG_HARDENED_USERCOPY=y configs.
      
      Issue here is that recvfrom() can be used with user buffer of Z bytes,
      and SO_PEEK_OFF of X bytes, from a skb with Y bytes, and following
      condition :
      
      Z < X < Y
      
      kernel BUG at mm/usercopy.c:72!
      invalid opcode: 0000 [#1] SMP KASAN
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Modules linked in:
      CPU: 0 PID: 2917 Comm: syzkaller842281 Not tainted 4.13.0-rc3+ #16
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      task: ffff8801d2fa40c0 task.stack: ffff8801d1fe8000
      RIP: 0010:report_usercopy mm/usercopy.c:64 [inline]
      RIP: 0010:__check_object_size+0x3ad/0x500 mm/usercopy.c:264
      RSP: 0018:ffff8801d1fef8a8 EFLAGS: 00010286
      RAX: 0000000000000078 RBX: ffffffff847102c0 RCX: 0000000000000000
      RDX: 0000000000000078 RSI: 1ffff1003a3fded5 RDI: ffffed003a3fdf09
      RBP: ffff8801d1fef998 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801d1ea480e
      R13: fffffffffffffffa R14: ffffffff84710280 R15: dffffc0000000000
      FS:  0000000001360880(0000) GS:ffff8801dc000000(0000)
      knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00000000202ecfe4 CR3: 00000001d1ff8000 CR4: 00000000001406f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       check_object_size include/linux/thread_info.h:108 [inline]
       check_copy_size include/linux/thread_info.h:139 [inline]
       copy_to_iter include/linux/uio.h:105 [inline]
       copy_linear_skb include/net/udp.h:371 [inline]
       udpv6_recvmsg+0x1040/0x1af0 net/ipv6/udp.c:395
       inet_recvmsg+0x14c/0x5f0 net/ipv4/af_inet.c:793
       sock_recvmsg_nosec net/socket.c:792 [inline]
       sock_recvmsg+0xc9/0x110 net/socket.c:799
       SYSC_recvfrom+0x2d6/0x570 net/socket.c:1788
       SyS_recvfrom+0x40/0x50 net/socket.c:1760
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      
      Fixes: b65ac446 ("udp: try to avoid 2 cache miss on dequeue")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      fd851ba9
    • David S. Miller's avatar
      Merge branch 'bpf-Minor-fix-in-bpf_convert_ctx_access' · 6401f37c
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      bpf: Minor fix in bpf_convert_ctx_access
      
      First one was found while trying to compile the kernel
      with !CONFIG_NET_RX_BUSY_POLL.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6401f37c
    • Daniel Borkmann's avatar
      bpf: fix two missing target_size settings in bpf_convert_ctx_access · 2ed46ce4
      Daniel Borkmann authored
      When CONFIG_NET_SCHED or CONFIG_NET_RX_BUSY_POLL is /not/ set and
      we try a narrow __sk_buff load of tc_index or napi_id, respectively,
      then verifier rightfully complains that it's misconfigured, because
      we need to set target_size in each of the two cases. The rewrite
      for the ctx access is just a dummy op, but needs to pass, so fix
      this up.
      
      Fixes: f96da094 ("bpf: simplify narrower ctx access")
      Reported-by: default avatarShubham Bansal <illusionist.neo@gmail.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2ed46ce4
    • Daniel Borkmann's avatar
      net: fix compilation when busy poll is not enabled · e4dde412
      Daniel Borkmann authored
      MIN_NAPI_ID is used in various places outside of
      CONFIG_NET_RX_BUSY_POLL wrapping, so when it's not set
      we run into build errors such as:
      
        net/core/dev.c: In function 'dev_get_by_napi_id':
        net/core/dev.c:886:16: error: ‘MIN_NAPI_ID’ undeclared (first use in this function)
          if (napi_id < MIN_NAPI_ID)
                        ^~~~~~~~~~~
      
      Thus, have MIN_NAPI_ID always defined to fix these errors.
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e4dde412
    • Anton Vasilyev's avatar
      mISDN: Fix null pointer dereference at mISDN_FsmNew · 54a6a043
      Anton Vasilyev authored
      If mISDN_FsmNew() fails to allocate memory for jumpmatrix
      then null pointer dereference will occur on any write to
      jumpmatrix.
      
      The patch adds check on successful allocation and
      corresponding error handling.
      
      Found by Linux Driver Verification project (linuxtesting.org).
      Signed-off-by: default avatarAnton Vasilyev <vasilyev@ispras.ru>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      54a6a043
    • Simon Horman's avatar
      nfp: do not update MTU from BH in flower app · bb3afda4
      Simon Horman authored
      The Flower app may receive a request to update the MTU of a representor
      netdev upon receipt of a control message from the firmware. This requires
      the RTNL lock which needs to be taken outside of the packet processing
      path.
      
      As a handling of this correctly seems a little to invasive for a fix simply
      skip setting the MTU for now.
      
      Relevant backtrace:
       [ 1496.288489] BUG: scheduling while atomic: kworker/0:3/373/0x00000100
       [ 1496.294911]  dca syscopyarea sysfillrect sysimgblt fb_sys_fops ptp drm mxm_wmi ahci pps_core libahci i2c_algo_bit wmi [last unloaded: nfp]
       [ 1496.294918] CPU: 0 PID: 373 Comm: kworker/0:3 Tainted: G           OE   4.13.0-rc3+ #3
       [ 1496.294919] Hardware name: Supermicro X10DRi/X10DRi, BIOS 2.0 12/28/2015
       [ 1496.294923] Workqueue: events work_for_cpu_fn
       [ 1496.294924] Call Trace:
       [ 1496.294927]  <IRQ>
       [ 1496.294931]  dump_stack+0x63/0x82
       [ 1496.294935]  __schedule_bug+0x54/0x70
       [ 1496.294937]  __schedule+0x62f/0x890
       [ 1496.294941]  ? intel_unmap_sg+0x90/0x90
       [ 1496.294942]  schedule+0x36/0x80
       [ 1496.294943]  schedule_preempt_disabled+0xe/0x10
       [ 1496.294945]  __mutex_lock.isra.2+0x445/0x4a0
       [ 1496.294947]  ? device_is_rmrr_locked+0x12/0x50
       [ 1496.294950]  ? kfree+0x162/0x170
       [ 1496.294952]  ? device_is_rmrr_locked+0x12/0x50
       [ 1496.294953]  ? iommu_should_identity_map+0x50/0xe0
       [ 1496.294954]  __mutex_lock_slowpath+0x13/0x20
       [ 1496.294955]  ? iommu_no_mapping+0x48/0xd0
       [ 1496.294956]  ? __mutex_lock_slowpath+0x13/0x20
       [ 1496.294957]  mutex_lock+0x2f/0x40
       [ 1496.294960]  rtnl_lock+0x15/0x20
       [ 1496.294979]  nfp_flower_cmsg_rx+0xc8/0x150 [nfp]
       [ 1496.294986]  nfp_ctrl_poll+0x286/0x350 [nfp]
       [ 1496.294989]  tasklet_action+0xf6/0x110
       [ 1496.294992]  __do_softirq+0xed/0x278
       [ 1496.294993]  irq_exit+0xb6/0xc0
       [ 1496.294994]  do_IRQ+0x4f/0xd0
       [ 1496.294996]  common_interrupt+0x89/0x89
      
      Fixes: 948faa46 ("nfp: add support for control messages for flower app")
      Signed-off-by: default avatarSimon Horman <simon.horman@netronome.com>
      Reviewed-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bb3afda4
    • Romain Perier's avatar
      net: stmmac: Use the right logging function in stmmac_mdio_register · fbca1647
      Romain Perier authored
      Currently, the function stmmac_mdio_register() is only used by
      stmmac_dvr_probe() from stmmac_main.c, in order to register the MDIO bus
      and probe information about the PHY. As this function is called before
      calling register_netdev(), all messages logged from stmmac_mdio_register
      are prefixed by "(unnamed net_device)". The goal of netdev_info or
      netdev_err is to dump useful infos about a net_device, when this data
      structure is partially initialized, there is no point for using these
      functions.
      
      This commit fixes the issue by replacing all netdev_*() by the
      corresponding dev_*() function for logging. The last netdev_info is
      replaced by phy_attached_info(), as a valid phydev can be used at this
      point.
      Signed-off-by: default avatarRomain Perier <romain.perier@collabora.com>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      fbca1647
    • Konstantin Khlebnikov's avatar
      net/sched/hfsc: allocate tcf block for hfsc root class · 8d553738
      Konstantin Khlebnikov authored
      Without this filters cannot be attached.
      Signed-off-by: default avatarKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Fixes: 6529eaba ("net: sched: introduce tcf block infractructure")
      Acked-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8d553738
    • Andreas Born's avatar
      bonding: require speed/duplex only for 802.3ad, alb and tlb · ad729bc9
      Andreas Born authored
      The patch c4adfc82 ("bonding: make speed, duplex setting consistent
      with link state") puts the link state to down if
      bond_update_speed_duplex() cannot retrieve speed and duplex settings.
      Assumably the patch was written with 802.3ad mode in mind which relies
      on link speed/duplex settings. For other modes like active-backup these
      settings are not required. Thus, only for these other modes, this patch
      reintroduces support for slaves that do not support reporting speed or
      duplex such as wireless devices. This fixes the regression reported in
      bug 196547 (https://bugzilla.kernel.org/show_bug.cgi?id=196547).
      
      Fixes: c4adfc82 ("bonding: make speed, duplex setting consistent
      with link state")
      Signed-off-by: default avatarAndreas Born <futur.andy@googlemail.com>
      Acked-by: default avatarMahesh Bandewar <maheshb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ad729bc9
    • Vivien Didelot's avatar
      net: dsa: ksz: fix skb freeing · e71cb9e0
      Vivien Didelot authored
      The DSA layer frees the original skb when an xmit function returns NULL,
      meaning an error occurred. But if the tagging code copied the original
      skb, it is responsible of freeing the copy if an error occurs.
      
      The ksz tagging code currently has two issues: if skb_put_padto fails,
      the skb copy is not freed, and the original skb will be freed twice.
      
      To fix that, move skb_put_padto inside both branches of the skb_tailroom
      condition, before freeing the original skb, and free the copy on error.
      Signed-off-by: default avatarVivien Didelot <vivien.didelot@savoirfairelinux.com>
      Reviewed-by: default avatarWoojung Huh <woojung.huh@microchip.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e71cb9e0
  4. 10 Aug, 2017 10 commits
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 26273939
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Fix handling of initial STATE message in TIPC, from Jon Paul Maloy.
      
       2) Fix stats handling in bcm_sysport_get_stats(), from Florian
          Fainelli.
      
       3) Reject 16777215 VNI value in geneve_validate(), from Girish
          Moodalbail.
      
       4) Fix initial IGMP sysctl setting regression, from Nikolay Borisov.
      
       5) Once a UFO fragmented frame is treated as UFO, we should continue
          doing so. Likewise once a frame has been segmented, we should
          continue doing that and not try to convert it to a UFO frame. From
          Willem de Bruijn.
      
       6) Test the AF_PACKET RX/TX ring pg_vec state under the socket lock to
          prevent races. From Willem de Bruijn.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
        packet: fix tp_reserve race in packet_set_ring
        udp: consistently apply ufo or fragmentation
        net: sched: set xt_tgchk_param par.nft_compat as 0 in ipt_init_target
        igmp: Fix regression caused by igmp sysctl namespace code.
        geneve: maximum value of VNI cannot be used
        net: systemport: Fix software statistics for SYSTEMPORT Lite
        tipc: remove premature ESTABLISH FSM event at link synchronization
      26273939
    • Willem de Bruijn's avatar
      packet: fix tp_reserve race in packet_set_ring · c27927e3
      Willem de Bruijn authored
      Updates to tp_reserve can race with reads of the field in
      packet_set_ring. Avoid this by holding the socket lock during
      updates in setsockopt PACKET_RESERVE.
      
      This bug was discovered by syzkaller.
      
      Fixes: 8913336a ("packet: add PACKET_RESERVE sockopt")
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c27927e3
    • Willem de Bruijn's avatar
      udp: consistently apply ufo or fragmentation · 85f1bd9a
      Willem de Bruijn authored
      When iteratively building a UDP datagram with MSG_MORE and that
      datagram exceeds MTU, consistently choose UFO or fragmentation.
      
      Once skb_is_gso, always apply ufo. Conversely, once a datagram is
      split across multiple skbs, do not consider ufo.
      
      Sendpage already maintains the first invariant, only add the second.
      IPv6 does not have a sendpage implementation to modify.
      
      A gso skb must have a partial checksum, do not follow sk_no_check_tx
      in udp_send_skb.
      
      Found by syzkaller.
      
      Fixes: e89e9cf5 ("[IPv4/IPv6]: UFO Scatter-gather approach")
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      85f1bd9a
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc · f213ad38
      Linus Torvalds authored
      Pull sparc updates from David Miller:
      
       1) Recognize M8 cpus, just basic chip ID matching, from Allen Pais.
      
       2) Prevent crashes when bringing up sunvdc virtual block devices in
          some environments. From Jim Quigley.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
        sunvdc: prevent sunvdc panic when mpgroup disk added to guest domain
        sparc64: Increase max_phys_bits to 51 and VA bits to 53 for M8.
        sparc64: recognize and support sparc M8 cpu type
        sparc64: properly name the cpu constants
      f213ad38
    • Xin Long's avatar
      net: sched: set xt_tgchk_param par.nft_compat as 0 in ipt_init_target · 96d97030
      Xin Long authored
      Commit 55917a21 ("netfilter: x_tables: add context to know if
      extension runs from nft_compat") introduced a member nft_compat to
      xt_tgchk_param structure.
      
      But it didn't set it's value for ipt_init_target. With unexpected
      value in par.nft_compat, it may return unexpected result in some
      target's checkentry.
      
      This patch is to set all it's fields as 0 and only initialize the
      non-zero fields in ipt_init_target.
      
      v1->v2:
        As Wang Cong's suggestion, fix it by setting all it's fields as
        0 and only initializing the non-zero fields.
      
      Fixes: 55917a21 ("netfilter: x_tables: add context to know if extension runs from nft_compat")
      Suggested-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      96d97030
    • Nikolay Borisov's avatar
      igmp: Fix regression caused by igmp sysctl namespace code. · 1714020e
      Nikolay Borisov authored
      Commit dcd87999 ("igmp: net: Move igmp namespace init to correct file")
      moved the igmp sysctls initialization from tcp_sk_init to igmp_net_init. This
      function is only called as part of per-namespace initialization, only if
      CONFIG_IP_MULTICAST is defined, otherwise igmp_mc_init() call in ip_init is
      compiled out, casuing the igmp pernet ops to not be registerd and those sysctl
      being left initialized with 0. However, there are certain functions, such as
      ip_mc_join_group which are always compiled and make use of some of those
      sysctls. Let's do a partial revert of the aforementioned commit and move the
      sysctl initialization into inet_init_net, that way they will always have
      sane values.
      
      Fixes: dcd87999 ("igmp: net: Move igmp namespace init to correct file")
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=196595Reported-by: default avatarGerardo Exequiel Pozzi <vmlinuz386@gmail.com>
      Signed-off-by: default avatarNikolay Borisov <nborisov@suse.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1714020e
    • Girish Moodalbail's avatar
      geneve: maximum value of VNI cannot be used · 04db70d9
      Girish Moodalbail authored
      Geneve's Virtual Network Identifier (VNI) is 24 bit long, so the range
      of values for it would be from 0 to 16777215 (2^24 -1).  However, one
      cannot create a geneve device with VNI set to 16777215. This patch fixes
      this issue.
      Signed-off-by: default avatarGirish Moodalbail <girish.moodalbail@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      04db70d9
    • Florian Fainelli's avatar
      net: systemport: Fix software statistics for SYSTEMPORT Lite · 50ddfbaf
      Florian Fainelli authored
      With SYSTEMPORT Lite we have holes in our statistics layout that make us
      skip over the hardware MIB counters, bcm_sysport_get_stats() was not
      taking that into account resulting in reporting 0 for all SW-maintained
      statistics, fix this by skipping accordingly.
      
      Fixes: 44a4524c ("net: systemport: Add support for SYSTEMPORT Lite")
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      50ddfbaf
    • Jon Paul Maloy's avatar
      tipc: remove premature ESTABLISH FSM event at link synchronization · ed43594a
      Jon Paul Maloy authored
      When a link between two nodes come up, both endpoints will initially
      send out a STATE message to the peer, to increase the probability that
      the peer endpoint also is up when the first traffic message arrives.
      Thereafter, if the establishing link is the second link between two
      nodes, this first "traffic" message is a TUNNEL_PROTOCOL/SYNCH message,
      helping the peer to perform initial synchronization between the two
      links.
      
      However, the initial STATE message may be lost, in which case the SYNCH
      message will be the first one arriving at the peer. This should also
      work, as the SYNCH message itself will be used to take up the link
      endpoint before  initializing synchronization.
      
      Unfortunately the code for this case is broken. Currently, the link is
      brought up through a tipc_link_fsm_evt(ESTABLISHED) when a SYNCH
      arrives, whereupon __tipc_node_link_up() is called to distribute the
      link slots and take the link into traffic. But, __tipc_node_link_up() is
      itself starting with a test for whether the link is up, and if true,
      returns without action. Clearly, the tipc_link_fsm_evt(ESTABLISHED) call
      is unnecessary, since tipc_node_link_up() is itself issuing such an
      event, but also harmful, since it inhibits tipc_node_link_up() to
      perform the test of its tasks, and the link endpoint in question hence
      is never taken into traffic.
      
      This problem has been exposed when we set up dual links between pre-
      and post-4.4 kernels, because the former ones don't send out the
      initial STATE message described above.
      
      We fix this by removing the unnecessary event call.
      Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ed43594a
    • Jim Quigley's avatar
      sunvdc: prevent sunvdc panic when mpgroup disk added to guest domain · 3ee70591
      Jim Quigley authored
      Using mpgroup to define multiple paths for a virtual disk causes multiple
      virtual-device-port ports to be created for that virtual device.
      Each virtual-device-port port then gets a vdisk created for it by the Linux
      sunvdc driver. As mpgroup is not supported by the Linux sunvdc driver it
      cannot handle multiple ports for a single vdisk, leading to a kernel panic
      at startup.
      
      This fix prevents more than one vdisk per virtual-device-port being created
      until full virtual disk multipathing (mpgroup) support is implemented.
      Signed-off-by: default avatarJim Quigley <Jim.Quigley@oracle.com>
      Reviewed-by: default avatarLiam Merwick <liam.merwick@oracle.com>
      Reviewed-by: default avatarShannon Nelson <shannon.nelson@oracle.com>
      Reviewed-by: default avatarAlexandre Chartre <alexandre.chartre@oracle.com>
      Reviewed-by: default avatarAaron Young <aaron.young@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3ee70591
  5. 09 Aug, 2017 4 commits
    • Linus Torvalds's avatar
      Merge tag 'pinctrl-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl · 8d31f80e
      Linus Torvalds authored
      Pull pin control fixes from Linus Walleij:
       "These are the pin control fixes I have gathered since the return from
        my vacation. They boiled in -next a while so let's get them in.
      
        Apart from the documentation build it is purely driver fixes. Which is
        nice. The Intel fixes seem kind of important.
      
         - Fix the documentation build as the docs were moved
      
         - Correct the UART pin list on the Intel Merrifield
      
         - Fix pin assignment and number of pins on the Marvell Armada 37xx
           pin controller
      
         - Cover the Setzer models in the Chromebook DMI quirk in the Intel
           cheryview driver so they start working
      
         - Add the missing "sim" function to the sunxi driver
      
         - Fix USB pin definitions on Uniphier Pro4
      
         - Smatch fix for invalid reference in the zx pin control driver"
      
      * tag 'pinctrl-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
        pinctrl: generic: update references to Documentation/pinctrl.txt
        pinctrl: intel: merrifield: Correct UART pin lists
        pinctrl: armada-37xx: Fix number of pin in south bridge
        pinctrl: armada-37xx: Fix the pin 23 on south bridge
        pinctrl: cherryview: Add Setzer models to the Chromebook DMI quirk
        pinctrl: sunxi: add a missing function of A10/A20 pinctrl driver
        pinctrl: uniphier: fix USB3 pin assignment for Pro4
        pinctrl: zte: fix dereference of 'data' in zx_set_mux()
      8d31f80e
    • Mel Gorman's avatar
      futex: Remove unnecessary warning from get_futex_key · 48fb6f4d
      Mel Gorman authored
      Commit 65d8fc77 ("futex: Remove requirement for lock_page() in
      get_futex_key()") removed an unnecessary lock_page() with the
      side-effect that page->mapping needed to be treated very carefully.
      
      Two defensive warnings were added in case any assumption was missed and
      the first warning assumed a correct application would not alter a
      mapping backing a futex key.  Since merging, it has not triggered for
      any unexpected case but Mark Rutland reported the following bug
      triggering due to the first warning.
      
        kernel BUG at kernel/futex.c:679!
        Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
        Modules linked in:
        CPU: 0 PID: 3695 Comm: syz-executor1 Not tainted 4.13.0-rc3-00020-g307fec773ba3 #3
        Hardware name: linux,dummy-virt (DT)
        task: ffff80001e271780 task.stack: ffff000010908000
        PC is at get_futex_key+0x6a4/0xcf0 kernel/futex.c:679
        LR is at get_futex_key+0x6a4/0xcf0 kernel/futex.c:679
        pc : [<ffff00000821ac14>] lr : [<ffff00000821ac14>] pstate: 80000145
      
      The fact that it's a bug instead of a warning was due to an unrelated
      arm64 problem, but the warning itself triggered because the underlying
      mapping changed.
      
      This is an application issue but from a kernel perspective it's a
      recoverable situation and the warning is unnecessary so this patch
      removes the warning.  The warning may potentially be triggered with the
      following test program from Mark although it may be necessary to adjust
      NR_FUTEX_THREADS to be a value smaller than the number of CPUs in the
      system.
      
          #include <linux/futex.h>
          #include <pthread.h>
          #include <stdio.h>
          #include <stdlib.h>
          #include <sys/mman.h>
          #include <sys/syscall.h>
          #include <sys/time.h>
          #include <unistd.h>
      
          #define NR_FUTEX_THREADS 16
          pthread_t threads[NR_FUTEX_THREADS];
      
          void *mem;
      
          #define MEM_PROT  (PROT_READ | PROT_WRITE)
          #define MEM_SIZE  65536
      
          static int futex_wrapper(int *uaddr, int op, int val,
                                   const struct timespec *timeout,
                                   int *uaddr2, int val3)
          {
              syscall(SYS_futex, uaddr, op, val, timeout, uaddr2, val3);
          }
      
          void *poll_futex(void *unused)
          {
              for (;;) {
                  futex_wrapper(mem, FUTEX_CMP_REQUEUE_PI, 1, NULL, mem + 4, 1);
              }
          }
      
          int main(int argc, char *argv[])
          {
              int i;
      
              mem = mmap(NULL, MEM_SIZE, MEM_PROT,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
      
              printf("Mapping @ %p\n", mem);
      
              printf("Creating futex threads...\n");
      
              for (i = 0; i < NR_FUTEX_THREADS; i++)
                  pthread_create(&threads[i], NULL, poll_futex, NULL);
      
              printf("Flipping mapping...\n");
              for (;;) {
                  mmap(mem, MEM_SIZE, MEM_PROT,
                       MAP_FIXED | MAP_SHARED | MAP_ANONYMOUS, -1, 0);
              }
      
              return 0;
          }
      Reported-and-tested-by: default avatarMark Rutland <mark.rutland@arm.com>
      Signed-off-by: default avatarMel Gorman <mgorman@suse.de>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: stable@vger.kernel.org # 4.7+
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      48fb6f4d
    • Linus Torvalds's avatar
      Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · 358f8c26
      Linus Torvalds authored
      Pull i2c fixes from Wolfram Sang:
       "The main thing is to allow empty id_tables for ACPI to make some
        drivers get probed again. It looks a bit bigger than usual because it
        needs some internal renaming, too.
      
        Other than that, there is a fix for broken DSTDs, a super simple
        enablement for ARM MPS, and two documentation fixes which I'd like to
        see in v4.13 already"
      
      * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        i2c: rephrase explanation of I2C_CLASS_DEPRECATED
        i2c: allow i2c-versatile for ARM MPS platforms
        i2c: designware: Some broken DSTDs use 1MiHz instead of 1MHz
        i2c: designware: Print clock freq on invalid clock freq error
        i2c: core: Allow empty id_table in ACPI case as well
        i2c: mux: pinctrl: mention correct module name in Kconfig help text
      358f8c26
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.dk/linux-block · 31cf92f3
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
       "Three patches that should go into this release.
      
        Two of them are from Paolo and fix up some corner cases with BFQ, and
        the last patch is from Ming and fixes up a potential usage count
        imbalance regression due to the recent NOWAIT work"
      
      * 'for-linus' of git://git.kernel.dk/linux-block:
        blk-mq: don't leak preempt counter/q_usage_counter when allocating rq failed
        block, bfq: consider also in_service_entity to state whether an entity is active
        block, bfq: reset in_service_entity if it becomes idle
      31cf92f3