1. 21 Apr, 2016 25 commits
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · c5edde3a
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Fix memory leak in iwlwifi, from Matti Gottlieb.
      
       2) Add missing registration of netfilter arp_tables into initial
          namespace, from Florian Westphal.
      
       3) Fix potential NULL deref in DecNET routing code.
      
       4) Restrict NETLINK_URELEASE to truly bound sockets only, from Dmitry
          Ivanov.
      
       5) Fix dst ref counting in VRF, from David Ahern.
      
       6) Fix TSO segmenting limits in i40e driver, from Alexander Duyck.
      
       7) Fix heap leak in PACKET_DIAG_MCLIST, from Mathias Krause.
      
       8) Ravalidate IPV6 datagram socket cached routes properly, particularly
          with UDP, from Martin KaFai Lau.
      
       9) Fix endian bug in RDS dp_ack_seq handling, from Qing Huang.
      
      10) Fix stats typing in bcmgenet driver, from Eric Dumazet.
      
      11) Openvswitch needs to orphan SKBs before ipv6 fragmentation handing,
          from Joe Stringer.
      
      12) SPI device reference leak in spi_ks8895 PHY driver, from Mark Brown.
      
      13) atl2 doesn't actually support scatter-gather, so don't advertise the
          feature.  From Ben Hucthings.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (72 commits)
        openvswitch: use flow protocol when recalculating ipv6 checksums
        Driver: Vmxnet3: set CHECKSUM_UNNECESSARY for IPv6 packets
        atl2: Disable unimplemented scatter/gather feature
        net/mlx4_en: Split SW RX dropped counter per RX ring
        net/mlx4_core: Don't allow to VF change global pause settings
        net/mlx4_core: Avoid repeated calls to pci enable/disable
        net/mlx4_core: Implement pci_resume callback
        net: phy: spi_ks8895: Don't leak references to SPI devices
        net: ethernet: davinci_emac: Fix platform_data overwrite
        net: ethernet: davinci_emac: Fix Unbalanced pm_runtime_enable
        qede: Fix single MTU sized packet from firmware GRO flow
        qede: Fix setting Skb network header
        qede: Fix various memory allocation error flows for fastpath
        tcp: Merge tx_flags and tskey in tcp_shifted_skb
        tcp: Merge tx_flags and tskey in tcp_collapse_retrans
        drivers: net: cpsw: fix wrong regs access in cpsw_ndo_open
        tcp: Fix SOF_TIMESTAMPING_TX_ACK when handling dup acks
        openvswitch: Orphan skbs before IPv6 defrag
        Revert "Prevent NUll pointer dereference with two PHYs on cpsw"
        VSOCK: Only check error on skb_recv_datagram when skb is NULL
        ...
      c5edde3a
    • Simon Horman's avatar
      openvswitch: use flow protocol when recalculating ipv6 checksums · b4f70527
      Simon Horman authored
      When using masked actions the ipv6_proto field of an action
      to set IPv6 fields may be zero rather than the prevailing protocol
      which will result in skipping checksum recalculation.
      
      This patch resolves the problem by relying on the protocol
      in the flow key rather than that in the set field action.
      
      Fixes: 83d2b9ba ("net: openvswitch: Support masked set actions.")
      Cc: Jarno Rajahalme <jrajahalme@nicira.com>
      Signed-off-by: default avatarSimon Horman <simon.horman@netronome.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b4f70527
    • Shrikrishna Khare's avatar
      Driver: Vmxnet3: set CHECKSUM_UNNECESSARY for IPv6 packets · f0d43780
      Shrikrishna Khare authored
      For IPv6, if the device indicates that the checksum is correct, set
      CHECKSUM_UNNECESSARY.
      Reported-by: default avatarSubbarao Narahari <snarahari@vmware.com>
      Signed-off-by: default avatarShrikrishna Khare <skhare@vmware.com>
      Signed-off-by: default avatarJin Heo <heoj@vmware.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f0d43780
    • Ben Hutchings's avatar
      atl2: Disable unimplemented scatter/gather feature · f43bfaed
      Ben Hutchings authored
      atl2 includes NETIF_F_SG in hw_features even though it has no support
      for non-linear skbs.  This bug was originally harmless since the
      driver does not claim to implement checksum offload and that used to
      be a requirement for SG.
      
      Now that SG and checksum offload are independent features, if you
      explicitly enable SG *and* use one of the rare protocols that can use
      SG without checkusm offload, this potentially leaks sensitive
      information (before you notice that it just isn't working).  Therefore
      this obscure bug has been designated CVE-2016-2117.
      Reported-by: default avatarJustin Yackoski <jyackoski@crypto-nite.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Fixes: ec5f0615 ("net: Kill link between CSUM and SG features.")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f43bfaed
    • David S. Miller's avatar
      Merge branch 'mlx4-fixes' · 669c00c0
      David S. Miller authored
      Or Gerlitz says:
      
      ====================
      Mellaox 40G driver fixes for 4.6-rc
      
      With the fix for ARM bug being under the works, these are
      few other fixes for mlx4 we have ready to go.
      
      Eran addressed the problematic/wrong reporting of dropped packets, Daniel
      fixed some matters related to PPC EEH's and Jenny's patch makes sure
      VFs can't change the port's pause settings.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      669c00c0
    • Eran Ben Elisha's avatar
      net/mlx4_en: Split SW RX dropped counter per RX ring · d21ed3a3
      Eran Ben Elisha authored
      Count SW packet drops per RX ring instead of a global counter. This
      will allow monitoring the number of rx drops per ring.
      
      In addition, SW rx_dropped counter was overwritten by HW rx_dropped
      counter, sum both of them instead to show the accurate value.
      
      Fixes: a3333b35 ('net/mlx4_en: Moderate ethtool callback to [...] ')
      Signed-off-by: default avatarEran Ben Elisha <eranbe@mellanox.com>
      Reported-by: default avatarBrenden Blanco <bblanco@plumgrid.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarOr Gerlitz <ogerlitz@mellanox.com>
      Reported-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d21ed3a3
    • Eugenia Emantayev's avatar
      net/mlx4_core: Don't allow to VF change global pause settings · 2a500090
      Eugenia Emantayev authored
      Currently changing global pause settings is done via SET_PORT
      command with input modifier GENERAL. This command is allowed
      for each VF since MTU setting is done via the same command.
      
      Change the above to the following scheme: before passing the
      request to the FW, the PF will check whether it was issued
      by a slave. If yes, don't change global pause and warn,
      otherwise change to the requested value and store for
      further reference.
      Signed-off-by: default avatarEugenia Emantayev <eugenia@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2a500090
    • Daniel Jurgens's avatar
      net/mlx4_core: Avoid repeated calls to pci enable/disable · 4bfd2e6e
      Daniel Jurgens authored
      Maintain the PCI status and provide wrappers for enabling and disabling
      the PCI device.  Performing the actions more than once without doing
      its opposite results in warning logs.
      
      This occurred when EEH hotplugged the device causing a warning for
      disabling an already disabled device.
      
      Fixes: 2ba5fbd6 ('net/mlx4_core: Handle AER flow properly')
      Signed-off-by: default avatarDaniel Jurgens <danielj@mellanox.com>
      Signed-off-by: default avatarYishai Hadas <yishaih@mellanox.com>
      Signed-off-by: default avatarOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4bfd2e6e
    • Daniel Jurgens's avatar
      net/mlx4_core: Implement pci_resume callback · c12833ac
      Daniel Jurgens authored
      Move resume related activities to a new pci_resume function instead of
      performing them in mlx4_pci_slot_reset.  This change is needed to avoid
      a hotplug during EEH recovery due to commit f2da4ccf ("powerpc/eeh:
      More relaxed hotplug criterion").
      
      Fixes: 2ba5fbd6 ('net/mlx4_core: Handle AER flow properly')
      Signed-off-by: default avatarDaniel Jurgens <danielj@mellanox.com>
      Signed-off-by: default avatarYishai Hadas <yishaih@mellanox.com>
      Signed-off-by: default avatarOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c12833ac
    • Mark Brown's avatar
      net: phy: spi_ks8895: Don't leak references to SPI devices · a1459c1c
      Mark Brown authored
      The ks8895 driver is using spi_dev_get() apparently just to take a copy
      of the SPI device used to instantiate it but never calls spi_dev_put()
      to free it.  Since the device is guaranteed to exist between probe() and
      remove() there should be no need for the driver to take an extra
      reference to it so fix the leak by just using a straight assignment.
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a1459c1c
    • Neil Armstrong's avatar
      net: ethernet: davinci_emac: Fix platform_data overwrite · 210990b0
      Neil Armstrong authored
      When the DaVinci emac driver is removed and re-probed, the actual
      pdev->dev.platform_data is populated with an unwanted valid pointer saved by
      the previous davinci_emac_of_get_pdata() call, causing a kernel crash when
      calling priv->int_disable() in emac_int_disable().
      
      Unable to handle kernel paging request at virtual address c8622a80
      ...
      [<c0426fb4>] (emac_int_disable) from [<c0427700>] (emac_dev_open+0x290/0x5f8)
      [<c0427700>] (emac_dev_open) from [<c04c00ec>] (__dev_open+0xb8/0x120)
      [<c04c00ec>] (__dev_open) from [<c04c0370>] (__dev_change_flags+0x88/0x14c)
      [<c04c0370>] (__dev_change_flags) from [<c04c044c>] (dev_change_flags+0x18/0x48)
      [<c04c044c>] (dev_change_flags) from [<c052bafc>] (devinet_ioctl+0x6b4/0x7ac)
      [<c052bafc>] (devinet_ioctl) from [<c04a1428>] (sock_ioctl+0x1d8/0x2c0)
      [<c04a1428>] (sock_ioctl) from [<c014f054>] (do_vfs_ioctl+0x41c/0x600)
      [<c014f054>] (do_vfs_ioctl) from [<c014f2a4>] (SyS_ioctl+0x6c/0x7c)
      [<c014f2a4>] (SyS_ioctl) from [<c000ff60>] (ret_fast_syscall+0x0/0x1c)
      
      Fixes: 42f59967 ("net: ethernet: davinci_emac: add OF support")
      Cc: Brian Hutchinson <b.hutchman@gmail.com>
      Signed-off-by: default avatarNeil Armstrong <narmstrong@baylibre.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      210990b0
    • Neil Armstrong's avatar
      net: ethernet: davinci_emac: Fix Unbalanced pm_runtime_enable · 99164f9e
      Neil Armstrong authored
      In order to avoid an Unbalanced pm_runtime_enable in the DaVinci
      emac driver when the device is removed and re-probed, and a
      pm_runtime_disable() call in davinci_emac_remove().
      
      Actually, using unbind/bind on a TI DM8168 SoC gives :
      $ echo 4a120000.ethernet > /sys/bus/platform/drivers/davinci_emac/unbind
      net eth1: DaVinci EMAC: davinci_emac_remove()
      $ echo 4a120000.ethernet > /sys/bus/platform/drivers/davinci_emac/bind
      davinci_emac 4a120000.ethernet: Unbalanced pm_runtime_enable
      
      Cc: Brian Hutchinson <b.hutchman@gmail.com>
      Fixes: 3ba97381 ("net: ethernet: davinci_emac: add pm_runtime support")
      Signed-off-by: default avatarNeil Armstrong <narmstrong@baylibre.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      99164f9e
    • David S. Miller's avatar
      Merge branch 'qed-fixes' · 3ad97799
      David S. Miller authored
      Manish Chopra says:
      
      ====================
      qede: Bug fixes
      
      This series fixes -
      
      * various memory allocation failure flows for fastpath
      * issues with respect to driver GRO packets handling
      
      V1->V2
      
      * Send series against net instead of net-next.
      
      Please consider applying this series to "net"
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3ad97799
    • Manish Chopra's avatar
      qede: Fix single MTU sized packet from firmware GRO flow · ee2fa8e6
      Manish Chopra authored
      In firmware assisted GRO flow there could be a single MTU sized
      segment arriving due to firmware aggregation timeout/last segment
      in an aggregation flow, which is not expected to be an actual gro
      packet. So If a skb has zero frags from the GRO flow then simply
      push it in the stack as non gso skb.
      Signed-off-by: default avatarManish Chopra <manish.chopra@qlogic.com>
      Signed-off-by: default avatarYuval Mintz <yuval.mintz@qlogic.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ee2fa8e6
    • Manish Chopra's avatar
      qede: Fix setting Skb network header · aad94c04
      Manish Chopra authored
      Skb's network header needs to be set before extracting IPv4/IPv6
      headers from it.
      Signed-off-by: default avatarManish Chopra <manish.chopra@qlogic.com>
      Signed-off-by: default avatarYuval Mintz <yuval.mintz@qlogic.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      aad94c04
    • Manish Chopra's avatar
      qede: Fix various memory allocation error flows for fastpath · f86af2df
      Manish Chopra authored
      This patch handles memory allocation failures for fastpath
      gracefully in the driver.
      Signed-off-by: default avatarManish Chopra <manish.chopra@qlogic.com>
      Signed-off-by: default avatarYuval Mintz <yuval.mintz@qlogic.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f86af2df
    • David S. Miller's avatar
      Merge branch 'tcp-coalesce-merge-timestamps' · 5bec11cf
      David S. Miller authored
      Martin KaFai Lau says:
      
      ====================
      tcp: Merge timestamp info when coalescing skbs
      
      This series is separated from the RFC series related to
      tcp_sendmsg(MSG_EOR) and it is targeting for the net branch.
      This patchset is focusing on fixing cases where TCP
      timestamp could be lost after coalescing skbs.
      
      A BPF prog is used to kprobe to sock_queue_err_skb()
      and print out the value of serr->ee.ee_data.  The BPF
      prog (run-able from bcc) is attached here:
      
      BPF prog used for testing:
      ~~~~~
      
      from __future__ import print_function
      from bcc import BPF
      
      bpf_text = """
      
      int trace_err_skb(struct pt_regs *ctx)
      {
      	struct sk_buff *skb = (struct sk_buff *)ctx->si;
      	struct sock *sk = (struct sock *)ctx->di;
      	struct sock_exterr_skb *serr;
      	u32 ee_data = 0;
      
      	if (!sk || !skb)
      		return 0;
      
      	serr = SKB_EXT_ERR(skb);
      	bpf_probe_read(&ee_data, sizeof(ee_data), &serr->ee.ee_data);
      	bpf_trace_printk("ee_data:%u\\n", ee_data);
      
      	return 0;
      };
      """
      
      b = BPF(text=bpf_text)
      b.attach_kprobe(event="sock_queue_err_skb", fn_name="trace_err_skb")
      print("Attached to kprobe")
      b.trace_print()
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5bec11cf
    • Martin KaFai Lau's avatar
      tcp: Merge tx_flags and tskey in tcp_shifted_skb · cfea5a68
      Martin KaFai Lau authored
      After receiving sacks, tcp_shifted_skb() will collapse
      skbs if possible.  tx_flags and tskey also have to be
      merged.
      
      This patch reuses the tcp_skb_collapse_tstamp() to handle
      them.
      
      BPF Output Before:
      ~~~~~
      <no-output-due-to-missing-tstamp-event>
      
      BPF Output After:
      ~~~~~
      <...>-2024  [007] d.s.    88.644374: : ee_data:14599
      
      Packetdrill Script:
      ~~~~~
      +0 `sysctl -q -w net.ipv4.tcp_min_tso_segs=10`
      +0 `sysctl -q -w net.ipv4.tcp_no_metrics_save=1`
      +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
      +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
      +0 bind(3, ..., ...) = 0
      +0 listen(3, 1) = 0
      
      0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7>
      0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>
      0.200 < . 1:1(0) ack 1 win 257
      0.200 accept(3, ..., ...) = 4
      +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0
      
      0.200 write(4, ..., 1460) = 1460
      +0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0
      0.200 write(4, ..., 13140) = 13140
      
      0.200 > P. 1:1461(1460) ack 1
      0.200 > . 1461:8761(7300) ack 1
      0.200 > P. 8761:14601(5840) ack 1
      
      0.300 < . 1:1(0) ack 1 win 257 <sack 1461:14601,nop,nop>
      0.300 > P. 1:1461(1460) ack 1
      0.400 < . 1:1(0) ack 14601 win 257
      
      0.400 close(4) = 0
      0.400 > F. 14601:14601(0) ack 1
      0.500 < F. 1:1(0) ack 14602 win 257
      0.500 > . 14602:14602(0) ack 2
      Signed-off-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Soheil Hassas Yeganeh <soheil@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-by: default avatarSoheil Hassas Yeganeh <soheil@google.com>
      Tested-by: default avatarSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      cfea5a68
    • Martin KaFai Lau's avatar
      tcp: Merge tx_flags and tskey in tcp_collapse_retrans · 082ac2d5
      Martin KaFai Lau authored
      If two skbs are merged/collapsed during retransmission, the current
      logic does not merge the tx_flags and tskey.  The end result is
      the SCM_TSTAMP_ACK timestamp could be missing for a packet.
      
      The patch:
      1. Merge the tx_flags
      2. Overwrite the prev_skb's tskey with the next_skb's tskey
      
      BPF Output Before:
      ~~~~~~
      <no-output-due-to-missing-tstamp-event>
      
      BPF Output After:
      ~~~~~~
      packetdrill-2092  [001] d.s.   453.998486: : ee_data:1459
      
      Packetdrill Script:
      ~~~~~~
      +0 `sysctl -q -w net.ipv4.tcp_min_tso_segs=10`
      +0 `sysctl -q -w net.ipv4.tcp_no_metrics_save=1`
      +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
      +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
      +0 bind(3, ..., ...) = 0
      +0 listen(3, 1) = 0
      
      0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7>
      0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>
      0.200 < . 1:1(0) ack 1 win 257
      0.200 accept(3, ..., ...) = 4
      +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0
      
      0.200 write(4, ..., 730) = 730
      +0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0
      0.200 write(4, ..., 730) = 730
      +0 setsockopt(4, SOL_SOCKET, 37, [2176], 4) = 0
      0.200 write(4, ..., 11680) = 11680
      +0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0
      
      0.200 > P. 1:731(730) ack 1
      0.200 > P. 731:1461(730) ack 1
      0.200 > . 1461:8761(7300) ack 1
      0.200 > P. 8761:13141(4380) ack 1
      
      0.300 < . 1:1(0) ack 1 win 257 <sack 1461:2921,nop,nop>
      0.300 < . 1:1(0) ack 1 win 257 <sack 1461:4381,nop,nop>
      0.300 < . 1:1(0) ack 1 win 257 <sack 1461:5841,nop,nop>
      0.300 > P. 1:1461(1460) ack 1
      0.400 < . 1:1(0) ack 13141 win 257
      
      0.400 close(4) = 0
      0.400 > F. 13141:13141(0) ack 1
      0.500 < F. 1:1(0) ack 13142 win 257
      0.500 > . 13142:13142(0) ack 2
      Signed-off-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Soheil Hassas Yeganeh <soheil@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-by: default avatarSoheil Hassas Yeganeh <soheil@google.com>
      Tested-by: default avatarSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      082ac2d5
    • Grygorii Strashko's avatar
      drivers: net: cpsw: fix wrong regs access in cpsw_ndo_open · 3fa88c51
      Grygorii Strashko authored
      The cpsw_ndo_open() could try to access CPSW registers before
      calling pm_runtime_get_sync(). This will trigger L3 error:
      
       WARNING: CPU: 0 PID: 21 at drivers/bus/omap_l3_noc.c:147 l3_interrupt_handler+0x220/0x34c()
       44000000.ocp:L3 Custom Error: MASTER M2 (64-bit) TARGET L4_FAST (Idle): Data Access in Supervisor mode during Functional access
      
      and CPSW will stop functioning.
      
      Hence, fix it by moving pm_runtime_get_sync() before the first access
      to CPSW registers in cpsw_ndo_open().
      Signed-off-by: default avatarGrygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3fa88c51
    • Martin KaFai Lau's avatar
      tcp: Fix SOF_TIMESTAMPING_TX_ACK when handling dup acks · 479f85c3
      Martin KaFai Lau authored
      Assuming SOF_TIMESTAMPING_TX_ACK is on. When dup acks are received,
      it could incorrectly think that a skb has already
      been acked and queue a SCM_TSTAMP_ACK cmsg to the
      sk->sk_error_queue.
      
      In tcp_ack_tstamp(), it checks
      'between(shinfo->tskey, prior_snd_una, tcp_sk(sk)->snd_una - 1)'.
      If prior_snd_una == tcp_sk(sk)->snd_una like the following packetdrill
      script, between() returns true but the tskey is actually not acked.
      e.g. try between(3, 2, 1).
      
      The fix is to replace between() with one before() and one !before().
      By doing this, the -1 offset on the tcp_sk(sk)->snd_una can also be
      removed.
      
      A packetdrill script is used to reproduce the dup ack scenario.
      Due to the lacking cmsg support in packetdrill (may be I
      cannot find it),  a BPF prog is used to kprobe to
      sock_queue_err_skb() and print out the value of
      serr->ee.ee_data.
      
      Both the packetdrill and the bcc BPF script is attached at the end of
      this commit message.
      
      BPF Output Before Fix:
      ~~~~~~
            <...>-2056  [001] d.s.   433.927987: : ee_data:1459  #incorrect
      packetdrill-2056  [001] d.s.   433.929563: : ee_data:1459  #incorrect
      packetdrill-2056  [001] d.s.   433.930765: : ee_data:1459  #incorrect
      packetdrill-2056  [001] d.s.   434.028177: : ee_data:1459
      packetdrill-2056  [001] d.s.   434.029686: : ee_data:14599
      
      BPF Output After Fix:
      ~~~~~~
            <...>-2049  [000] d.s.   113.517039: : ee_data:1459
            <...>-2049  [000] d.s.   113.517253: : ee_data:14599
      
      BCC BPF Script:
      ~~~~~~
      #!/usr/bin/env python
      
      from __future__ import print_function
      from bcc import BPF
      
      bpf_text = """
      #include <uapi/linux/ptrace.h>
      #include <net/sock.h>
      #include <bcc/proto.h>
      #include <linux/errqueue.h>
      
      #ifdef memset
      #undef memset
      #endif
      
      int trace_err_skb(struct pt_regs *ctx)
      {
      	struct sk_buff *skb = (struct sk_buff *)ctx->si;
      	struct sock *sk = (struct sock *)ctx->di;
      	struct sock_exterr_skb *serr;
      	u32 ee_data = 0;
      
      	if (!sk || !skb)
      		return 0;
      
      	serr = SKB_EXT_ERR(skb);
      	bpf_probe_read(&ee_data, sizeof(ee_data), &serr->ee.ee_data);
      	bpf_trace_printk("ee_data:%u\\n", ee_data);
      
      	return 0;
      };
      """
      
      b = BPF(text=bpf_text)
      b.attach_kprobe(event="sock_queue_err_skb", fn_name="trace_err_skb")
      print("Attached to kprobe")
      b.trace_print()
      
      Packetdrill Script:
      ~~~~~~
      +0 `sysctl -q -w net.ipv4.tcp_min_tso_segs=10`
      +0 `sysctl -q -w net.ipv4.tcp_no_metrics_save=1`
      +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
      +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
      +0 bind(3, ..., ...) = 0
      +0 listen(3, 1) = 0
      
      0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7>
      0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>
      0.200 < . 1:1(0) ack 1 win 257
      0.200 accept(3, ..., ...) = 4
      +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0
      
      +0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0
      0.200 write(4, ..., 1460) = 1460
      0.200 write(4, ..., 13140) = 13140
      
      0.200 > P. 1:1461(1460) ack 1
      0.200 > . 1461:8761(7300) ack 1
      0.200 > P. 8761:14601(5840) ack 1
      
      0.300 < . 1:1(0) ack 1 win 257 <sack 1461:2921,nop,nop>
      0.300 < . 1:1(0) ack 1 win 257 <sack 1461:4381,nop,nop>
      0.300 < . 1:1(0) ack 1 win 257 <sack 1461:5841,nop,nop>
      0.300 > P. 1:1461(1460) ack 1
      0.400 < . 1:1(0) ack 14601 win 257
      
      0.400 close(4) = 0
      0.400 > F. 14601:14601(0) ack 1
      0.500 < F. 1:1(0) ack 14602 win 257
      0.500 > . 14602:14602(0) ack 2
      Signed-off-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Soheil Hassas Yeganeh <soheil.kdev@gmail.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-by: default avatarSoheil Hassas Yeganeh <soheil@google.com>
      Tested-by: default avatarSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      479f85c3
    • Joe Stringer's avatar
      openvswitch: Orphan skbs before IPv6 defrag · 49e261a8
      Joe Stringer authored
      This is the IPv6 counterpart to commit 8282f274 ("inet: frag: Always
      orphan skbs inside ip_defrag()").
      
      Prior to commit 029f7f3b ("netfilter: ipv6: nf_defrag: avoid/free
      clone operations"), ipv6 fragments sent to nf_ct_frag6_gather() would be
      cloned (implicitly orphaning) prior to queueing for reassembly. As such,
      when the IPv6 message is eventually reassembled, the skb->sk for all
      fragments would be NULL. After that commit was introduced, rather than
      cloning, the original skbs were queued directly without orphaning. The
      end result is that all frags except for the first and last may have a
      socket attached.
      
      This commit explicitly orphans such skbs during nf_ct_frag6_gather() to
      prevent BUG_ON(skb->sk) during a later call to ip6_fragment().
      
      kernel BUG at net/ipv6/ip6_output.c:631!
      [...]
      Call Trace:
       <IRQ>
       [<ffffffff810be8f7>] ? __lock_acquire+0x927/0x20a0
       [<ffffffffa042c7c0>] ? do_output.isra.28+0x1b0/0x1b0 [openvswitch]
       [<ffffffff810bb8a2>] ? __lock_is_held+0x52/0x70
       [<ffffffffa042c587>] ovs_fragment+0x1f7/0x280 [openvswitch]
       [<ffffffff810bdab5>] ? mark_held_locks+0x75/0xa0
       [<ffffffff817be416>] ? _raw_spin_unlock_irqrestore+0x36/0x50
       [<ffffffff81697ea0>] ? dst_discard_out+0x20/0x20
       [<ffffffff81697e80>] ? dst_ifdown+0x80/0x80
       [<ffffffffa042c703>] do_output.isra.28+0xf3/0x1b0 [openvswitch]
       [<ffffffffa042d279>] do_execute_actions+0x709/0x12c0 [openvswitch]
       [<ffffffffa04340a4>] ? ovs_flow_stats_update+0x74/0x1e0 [openvswitch]
       [<ffffffffa04340d1>] ? ovs_flow_stats_update+0xa1/0x1e0 [openvswitch]
       [<ffffffff817be387>] ? _raw_spin_unlock+0x27/0x40
       [<ffffffffa042de75>] ovs_execute_actions+0x45/0x120 [openvswitch]
       [<ffffffffa0432d65>] ovs_dp_process_packet+0x85/0x150 [openvswitch]
       [<ffffffff817be387>] ? _raw_spin_unlock+0x27/0x40
       [<ffffffffa042def4>] ovs_execute_actions+0xc4/0x120 [openvswitch]
       [<ffffffffa0432d65>] ovs_dp_process_packet+0x85/0x150 [openvswitch]
       [<ffffffffa04337f2>] ? key_extract+0x442/0xc10 [openvswitch]
       [<ffffffffa043b26d>] ovs_vport_receive+0x5d/0xb0 [openvswitch]
       [<ffffffff810be8f7>] ? __lock_acquire+0x927/0x20a0
       [<ffffffff810be8f7>] ? __lock_acquire+0x927/0x20a0
       [<ffffffff810be8f7>] ? __lock_acquire+0x927/0x20a0
       [<ffffffff817be416>] ? _raw_spin_unlock_irqrestore+0x36/0x50
       [<ffffffffa043c11d>] internal_dev_xmit+0x6d/0x150 [openvswitch]
       [<ffffffffa043c0b5>] ? internal_dev_xmit+0x5/0x150 [openvswitch]
       [<ffffffff8168fb5f>] dev_hard_start_xmit+0x2df/0x660
       [<ffffffff8168f5ea>] ? validate_xmit_skb.isra.105.part.106+0x1a/0x2b0
       [<ffffffff81690925>] __dev_queue_xmit+0x8f5/0x950
       [<ffffffff81690080>] ? __dev_queue_xmit+0x50/0x950
       [<ffffffff810bdab5>] ? mark_held_locks+0x75/0xa0
       [<ffffffff81690990>] dev_queue_xmit+0x10/0x20
       [<ffffffff8169a418>] neigh_resolve_output+0x178/0x220
       [<ffffffff81752759>] ? ip6_finish_output2+0x219/0x7b0
       [<ffffffff81752759>] ip6_finish_output2+0x219/0x7b0
       [<ffffffff817525a5>] ? ip6_finish_output2+0x65/0x7b0
       [<ffffffff816cde2b>] ? ip_idents_reserve+0x6b/0x80
       [<ffffffff8175488f>] ? ip6_fragment+0x93f/0xc50
       [<ffffffff81754af1>] ip6_fragment+0xba1/0xc50
       [<ffffffff81752540>] ? ip6_flush_pending_frames+0x40/0x40
       [<ffffffff81754c6b>] ip6_finish_output+0xcb/0x1d0
       [<ffffffff81754dcf>] ip6_output+0x5f/0x1a0
       [<ffffffff81754ba0>] ? ip6_fragment+0xc50/0xc50
       [<ffffffff81797fbd>] ip6_local_out+0x3d/0x80
       [<ffffffff817554df>] ip6_send_skb+0x2f/0xc0
       [<ffffffff817555bd>] ip6_push_pending_frames+0x4d/0x50
       [<ffffffff817796cc>] icmpv6_push_pending_frames+0xac/0xe0
       [<ffffffff8177a4be>] icmpv6_echo_reply+0x42e/0x500
       [<ffffffff8177acbf>] icmpv6_rcv+0x4cf/0x580
       [<ffffffff81755ac7>] ip6_input_finish+0x1a7/0x690
       [<ffffffff81755925>] ? ip6_input_finish+0x5/0x690
       [<ffffffff817567a0>] ip6_input+0x30/0xa0
       [<ffffffff81755920>] ? ip6_rcv_finish+0x1a0/0x1a0
       [<ffffffff817557ce>] ip6_rcv_finish+0x4e/0x1a0
       [<ffffffff8175640f>] ipv6_rcv+0x45f/0x7c0
       [<ffffffff81755fe6>] ? ipv6_rcv+0x36/0x7c0
       [<ffffffff81755780>] ? ip6_make_skb+0x1c0/0x1c0
       [<ffffffff8168b649>] __netif_receive_skb_core+0x229/0xb80
       [<ffffffff810bdab5>] ? mark_held_locks+0x75/0xa0
       [<ffffffff8168c07f>] ? process_backlog+0x6f/0x230
       [<ffffffff8168bfb6>] __netif_receive_skb+0x16/0x70
       [<ffffffff8168c088>] process_backlog+0x78/0x230
       [<ffffffff8168c0ed>] ? process_backlog+0xdd/0x230
       [<ffffffff8168db43>] net_rx_action+0x203/0x480
       [<ffffffff810bdab5>] ? mark_held_locks+0x75/0xa0
       [<ffffffff817c156e>] __do_softirq+0xde/0x49f
       [<ffffffff81752768>] ? ip6_finish_output2+0x228/0x7b0
       [<ffffffff817c070c>] do_softirq_own_stack+0x1c/0x30
       <EOI>
       [<ffffffff8106f88b>] do_softirq.part.18+0x3b/0x40
       [<ffffffff8106f946>] __local_bh_enable_ip+0xb6/0xc0
       [<ffffffff81752791>] ip6_finish_output2+0x251/0x7b0
       [<ffffffff81754af1>] ? ip6_fragment+0xba1/0xc50
       [<ffffffff816cde2b>] ? ip_idents_reserve+0x6b/0x80
       [<ffffffff8175488f>] ? ip6_fragment+0x93f/0xc50
       [<ffffffff81754af1>] ip6_fragment+0xba1/0xc50
       [<ffffffff81752540>] ? ip6_flush_pending_frames+0x40/0x40
       [<ffffffff81754c6b>] ip6_finish_output+0xcb/0x1d0
       [<ffffffff81754dcf>] ip6_output+0x5f/0x1a0
       [<ffffffff81754ba0>] ? ip6_fragment+0xc50/0xc50
       [<ffffffff81797fbd>] ip6_local_out+0x3d/0x80
       [<ffffffff817554df>] ip6_send_skb+0x2f/0xc0
       [<ffffffff817555bd>] ip6_push_pending_frames+0x4d/0x50
       [<ffffffff81778558>] rawv6_sendmsg+0xa28/0xe30
       [<ffffffff81719097>] ? inet_sendmsg+0xc7/0x1d0
       [<ffffffff817190d6>] inet_sendmsg+0x106/0x1d0
       [<ffffffff81718fd5>] ? inet_sendmsg+0x5/0x1d0
       [<ffffffff8166d078>] sock_sendmsg+0x38/0x50
       [<ffffffff8166d4d6>] SYSC_sendto+0xf6/0x170
       [<ffffffff8100201b>] ? trace_hardirqs_on_thunk+0x1b/0x1d
       [<ffffffff8166e38e>] SyS_sendto+0xe/0x10
       [<ffffffff817bebe5>] entry_SYSCALL_64_fastpath+0x18/0xa8
      Code: 06 48 83 3f 00 75 26 48 8b 87 d8 00 00 00 2b 87 d0 00 00 00 48 39 d0 72 14 8b 87 e4 00 00 00 83 f8 01 75 09 48 83 7f 18 00 74 9a <0f> 0b 41 8b 86 cc 00 00 00 49 8#
      RIP  [<ffffffff8175468a>] ip6_fragment+0x73a/0xc50
       RSP <ffff880072803120>
      
      Fixes: 029f7f3b ("netfilter: ipv6: nf_defrag: avoid/free clone
      operations")
      Reported-by: default avatarDaniele Di Proietto <diproiettod@vmware.com>
      Signed-off-by: default avatarJoe Stringer <joe@ovn.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      49e261a8
    • Linus Torvalds's avatar
      Merge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm · f862d66a
      Linus Torvalds authored
      Pull ARM fixes from Russell King:
       "Three further fixes for ARM.
      
        Alexandre Courbot was having problems with DMA allocations with the
        GFP flags affecting where the tracking data was being allocated from.
        Vladimir Murzin noticed that the CPU feature code was not entirely
        correct, which can cause some features to be misreported"
      
      * 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
        ARM: 8564/1: fix cpu feature extracting helper
        ARM: 8563/1: fix demoting HWCAP_SWP
        ARM: 8551/2: DMA: Fix kzalloc flags in __dma_alloc
      f862d66a
    • Linus Torvalds's avatar
      Merge tag 'fbdev-fixes-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux · 90e6a689
      Linus Torvalds authored
      Pull fbdev fixes from Tomi Valkeinen:
      
       - ARM CLCD: fix regression on multiplatform kernels
      
       - panel-sharp-ls037v7dw01: fix possible NULL deref
      
      * tag 'fbdev-fixes-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux:
        omapfb: panel-sharp-ls037v7dw01: fix check of gpio_to_desc() return value
        video: ARM CLCD: runtime check for Versatile
      90e6a689
    • Linus Torvalds's avatar
      Merge tag 'platform-drivers-x86-v4.6-2' of... · b9358b24
      Linus Torvalds authored
      Merge tag 'platform-drivers-x86-v4.6-2' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86
      
      Pull x86 platform driver fixes from Darren Hart:
       "An S4 fix for intel-hid, new platform 'quirk' for hp_accel, a fix for
        broader support of ACPI resources for the Intel P-unit, and a few
        uninitialized variable fixes.
      
        intel p-unit:
         - decouple telemetry driver from the optional IPC resources
      
        thinkpad_acpi:
         - Silence an uninitialized variable warning
      
        intel_telemetry_pltdrv:
         - Silence an uninitialized variable warning
      
        hp_accel:
         - Silence an uninitialized variable warning
         - Add support for HP ProBook 440 G3
      
        intel-hid:
         - add a workaround to ignore an event after waking up from S4"
      
      * tag 'platform-drivers-x86-v4.6-2' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86:
        platform:x86 decouple telemetry driver from the optional IPC resources
        thinkpad_acpi: Silence an uninitialized variable warning
        intel_telemetry_pltdrv: Silence an uninitialized variable warning
        hp_accel: Silence an uninitialized variable warning
        hp_accel: Add support for HP ProBook 440 G3
        intel-hid: add a workaround to ignore an event after waking up from S4.
      b9358b24
  2. 20 Apr, 2016 6 commits
  3. 19 Apr, 2016 5 commits
  4. 18 Apr, 2016 4 commits
    • Linus Torvalds's avatar
      devpts: clean up interface to pty drivers · 67245ff3
      Linus Torvalds authored
      This gets rid of the horrible notion of having that
      
          struct inode *ptmx_inode
      
      be the linchpin of the interface between the pty code and devpts.
      
      By de-emphasizing the ptmx inode, a lot of things actually get cleaner,
      and we will have a much saner way forward.  In particular, this will
      allow us to associate with any particular devpts instance at open-time,
      and not be artificially tied to one particular ptmx inode.
      
      The patch itself is actually fairly straightforward, and apart from some
      locking and return path cleanups it's pretty mechanical:
      
       - the interfaces that devpts exposes all take "struct pts_fs_info *"
         instead of "struct inode *ptmx_inode" now.
      
         NOTE! The "struct pts_fs_info" thing is a completely opaque structure
         as far as the pty driver is concerned: it's still declared entirely
         internally to devpts. So the pty code can't actually access it in any
         way, just pass it as a "cookie" to the devpts code.
      
       - the "look up the pts fs info" is now a single clear operation, that
         also does the reference count increment on the pts superblock.
      
         So "devpts_add/del_ref()" is gone, and replaced by a "lookup and get
         ref" operation (devpts_get_ref(inode)), along with a "put ref" op
         (devpts_put_ref()).
      
       - the pty master "tty->driver_data" field now contains the pts_fs_info,
         not the ptmx inode.
      
       - because we don't care about the ptmx inode any more as some kind of
         base index, the ref counting can now drop the inode games - it just
         gets the ref on the superblock.
      
       - the pts_fs_info now has a back-pointer to the super_block. That's so
         that we can easily look up the information we actually need. Although
         quite often, the pts fs info was actually all we wanted, and not having
         to look it up based on some magical inode makes things more
         straightforward.
      
      In particular, now that "devpts_get_ref(inode)" operation should really
      be the *only* place we need to look up what devpts instance we're
      associated with, and we do it exactly once, at ptmx_open() time.
      
      The other side of this is that one ptmx node could now be associated
      with multiple different devpts instances - you could have a single
      /dev/ptmx node, and then have multiple mount namespaces with their own
      instances of devpts mounted on /dev/pts/.  And that's all perfectly sane
      in a model where we just look up the pts instance at open time.
      
      This will eventually allow us to get rid of our odd single-vs-multiple
      pts instance model, but this patch in itself changes no semantics, only
      an internal binding model.
      
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Peter Anvin <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: Serge Hallyn <serge.hallyn@ubuntu.com>
      Cc: Willy Tarreau <w@1wt.eu>
      Cc: Aurelien Jarno <aurelien@aurel32.net>
      Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
      Cc: Jann Horn <jann@thejh.net>
      Cc: Greg KH <greg@kroah.com>
      Cc: Jiri Slaby <jslaby@suse.com>
      Cc: Florian Weimer <fw@deneb.enyo.de>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      67245ff3
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · 95d0c427
      Linus Torvalds authored
      Pull s390 fixes from Martin Schwidefsky:
       "A couple of bug fixes"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
        s390: add CPU_BIG_ENDIAN config option
        s390/spinlock: avoid yield to non existent cpu
        s390/dcssblk: fix possible deadlock in remove vs. per-device attributes
        s390/seccomp: include generic seccomp header file
        s390/pci: add extra padding to function measurement block
        s390/scm_blk: fix deadlock for requests != REQ_TYPE_FS
      95d0c427
    • Vladimir Zapolskiy's avatar
      omapfb: panel-sharp-ls037v7dw01: fix check of gpio_to_desc() return value · 4dacad61
      Vladimir Zapolskiy authored
      The change fixes a check of gpio_to_desc() return value, the function
      returns either a valid pointer to struct gpio_desc or NULL, this makes
      IS_ERR() check invalid and may lead to a NULL pointer dereference in
      runtime.
      Signed-off-by: default avatarVladimir Zapolskiy <vz@mleia.com>
      Signed-off-by: default avatarTomi Valkeinen <tomi.valkeinen@ti.com>
      4dacad61
    • Linus Walleij's avatar
      video: ARM CLCD: runtime check for Versatile · f36fdacc
      Linus Walleij authored
      The current compile-time check for inversed IENB/CNTL does not
      work in multiplatform boots: as soon as versatile is included
      in the build, the IENB/CNTL is switched and breaks graphics.
      Convert this to a runtime switch.
      
      Cc: stable@vger.kernel.org
      Cc: Rob Herring <robh@kernel.org>
      Cc: Russell King <linux@arm.linux.org.uk>
      Fixes: a29da136 ("ARM: versatile: convert to multi-platform")
      Signed-off-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: default avatarTomi Valkeinen <tomi.valkeinen@ti.com>
      f36fdacc