1. 13 Nov, 2016 18 commits
    • ibmvnic: Fix size of debugfs name buffer · e1fac0ad
      Thomas Falcon authored
      This mistake was causing debugfs directory creation
      failures when multiple ibmvnic devices were probed.
      Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ibmvnic: Unmap ibmvnic_statistics structure · b7f193da
      Thomas Falcon authored
      This structure was mapped but never subsequently unmapped.
      Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sfc: clear napi_hash state when copying channels · 46d054f8
      Bert Kenward authored
      efx_copy_channel() doesn't correctly clear the napi_hash related state.
      This means that when napi_hash_add is called for that channel nothing is
      done, and we are left with a copy of the napi_hash_node from the old
      channel. When we later call napi_hash_del() on this channel we have a
      stale napi_hash_node.
      
      Corruption is only seen when there are multiple entries in one of the
      napi_hash lists. This is made more likely by having a very large number
      of channels. Testing was carried out with 512 channels - 32 channels on
      each of 16 ports.
      
      This failure typically appears as protection faults within napi_by_id()
      or napi_hash_add(). efx_copy_channel() is only used when tx or rx ring
      sizes are changed (ethtool -G).
      
      Fixes: 36763266 ("sfc: Add support for busy polling")
      Signed-off-by: Bert Kenward <bkenward@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
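Illustrative sketch of the sfc fix above, not the driver code: when a structure that embeds hash-list linkage is copied, the copy must not inherit the original's linkage or "already hashed" state. The types and the function name `copy_channel` are hypothetical stand-ins for the kernel structures.

```c
#include <stddef.h>

/* Hypothetical stand-ins; these are not the real sfc or NAPI structures. */
struct hash_node {
    struct hash_node *next;
    struct hash_node **pprev;
};

struct napi_like {
    struct hash_node hash_node; /* linkage into a shared hash list */
    unsigned int id;            /* nonzero once an id has been assigned */
    int hashed;                 /* set once the instance was added to the hash */
};

struct channel {
    int ring_size;
    struct napi_like napi;
};

/* Copy a channel, then reset the state that is only valid for the original. */
static void copy_channel(struct channel *dst, const struct channel *src)
{
    *dst = *src;
    dst->napi.hash_node.next = NULL;
    dst->napi.hash_node.pprev = NULL;
    dst->napi.id = 0;
    dst->napi.hashed = 0;
}
```

Without the reset, a later "add to hash" on the copy would be skipped because the copy already looks hashed, and a later "delete from hash" would unlink through the stale pointers of the old channel.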
    • Merge branch 'mlxsw-fixes' · 9e37aaa3
      David S. Miller authored
      Jiri Pirko says:
      
      ====================
      mlxsw: Couple of fixes
      
      Please, queue-up both for stable. Thanks!
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_router: Correctly dump neighbour activity · 42cdb338
      Arkadi Sharshevsky authored
      The device's neighbour table is periodically dumped in order to update
      the kernel about active neighbours. A single dump session may span
      multiple queries, until the response carries fewer records than requested
      or a record (which can contain up to four neighbour entries) is not full.
      The current code stops the session when the number of returned records is
      zero, which can result in an infinite loop at high packet rates.
      
      Fix this by stopping the session according to the above logic.
      
      Fixes: c723c735 ("mlxsw: spectrum_router: Periodically update the kernel's neigh table")
      Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
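A minimal sketch of the termination rule described in the commit above. The helper name and parameters are hypothetical, and it assumes the "record that is not full" is the last record of the response; the real driver reads these values from a register payload.

```c
#include <stdbool.h>

/* Per the commit text, a record can contain up to four neighbour entries. */
#define REC_MAX_ENTRIES 4

/*
 * Hypothetical helper: return true if the dump session should issue
 * another query, false if the response shows the table was exhausted.
 */
static bool dump_needs_more(unsigned int requested_recs,
                            unsigned int returned_recs,
                            unsigned int last_rec_entries)
{
    /* Fewer records than requested: nothing more to read. */
    if (returned_recs < requested_recs)
        return false;
    /* Last record only partially filled: nothing more to read. */
    if (last_rec_entries < REC_MAX_ENTRIES)
        return false;
    return true;
}
```

Stopping only on a zero-record response would keep the session alive for as long as new activity keeps arriving, which is the loop the commit describes.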
    • mlxsw: spectrum: Fix refcount bug on span entries · 2d644d4c
      Yotam Gigi authored
      When binding a port to a newly created span entry, its refcount is
      initialized to zero even though it has a bound port. That leads
      to unexpected behaviour when the user tries to delete that port
      from the span entry.
      
      Fix this by initializing the reference count to 1.
      
      Also add a warning to the put function.
      
      Fixes: 763b4b70 ("mlxsw: spectrum: Add support in matchall mirror TC offloading")
      Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
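A small sketch of the reference-counting rule from the commit above, using a hypothetical `struct span_entry` and helper names. A newly created entry already has one user (the port being bound), so its count starts at 1; the put path warns instead of underflowing.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a span entry; not the mlxsw structure. */
struct span_entry {
    bool used;
    int ref_count;
};

/* Creation happens on behalf of the first bound port, so count it. */
static void span_entry_create(struct span_entry *e)
{
    e->used = true;
    e->ref_count = 1;
}

static void span_entry_get(struct span_entry *e)
{
    e->ref_count++;
}

/* Returns true when the last reference was dropped and the entry was freed. */
static bool span_entry_put(struct span_entry *e)
{
    if (e->ref_count <= 0) {
        /* Stands in for the warning the commit adds to the put function. */
        fprintf(stderr, "warning: put on span entry without references\n");
        return false;
    }
    if (--e->ref_count == 0) {
        e->used = false;
        return true;
    }
    return false;
}
```

With an initial count of zero, the first put from the bound port would already find nothing to drop, which matches the unexpected behaviour the commit reports.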
    • Merge branch 'bnxt_en-fixes' · a055450a
      David S. Miller authored
      Michael Chan says:
      
      ====================
      bnxt_en: 2 bug fixes.
      
      Bug fixes in bnxt_setup_tc() and VF virtual link state.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Fix VF virtual link state. · 73b9bad6
      Michael Chan authored
      If the physical link is down and the VF virtual link is set to "enable",
      the current code does not always work.  If the link is down but the
      cable is attached, the firmware returns LINK_SIGNAL instead of
      NO_LINK.  The current code is treating LINK_SIGNAL as link up.
      The fix is to treat link as down when the link_status != LINK.
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
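A sketch of the comparison change described in the commit above. The enum and its values are hypothetical; only the three state names come from the commit text.

```c
#include <stdbool.h>

/* Hypothetical encoding of the firmware link states named in the commit. */
enum link_status {
    NO_LINK,     /* no cable or no signal */
    LINK_SIGNAL, /* cable attached, but link is down */
    LINK         /* link is up */
};

/* Old behaviour as described: anything other than NO_LINK counted as up. */
static bool link_is_up_old(enum link_status status)
{
    return status != NO_LINK;
}

/* Fixed behaviour: only LINK counts as up. */
static bool link_is_up(enum link_status status)
{
    return status == LINK;
}
```

The two helpers differ only for LINK_SIGNAL, which is the case the commit singles out.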
    • bnxt_en: Fix ring arithmetic in bnxt_setup_tc(). · 3ffb6a39
      Michael Chan authored
      The logic is missing the check on whether the tx and rx rings are sharing
      completion rings or not.
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Revert "include/uapi/linux/atm_zatm.h: include linux/time.h" · 7b5b74ef
      Mike Frysinger authored
      This reverts commit cf00713a ("include/uapi/linux/atm_zatm.h: include
      linux/time.h").
      
      This attempted to fix userspace breakage that no longer existed when
      the patch was merged.  Almost one year earlier, commit 70ba07b6
      ("atm: remove 'struct zatm_t_hist'") deleted the struct in question.
      
      After this patch was merged, we now have to deal with people being
      unable to include this header in conjunction with standard C library
      headers like stdlib.h (which linux-atm does).  Example breakage:
      x86_64-pc-linux-gnu-gcc -DHAVE_CONFIG_H -I. -I../.. -I./../q2931 -I./../saal \
      	-I.  -DCPPFLAGS_TEST  -I../../src/include -O2 -march=native -pipe -g \
      	-frecord-gcc-switches -freport-bug -Wimplicit-function-declaration \
      	-Wnonnull -Wstrict-aliasing -Wparentheses -Warray-bounds \
      	-Wfree-nonheap-object -Wreturn-local-addr -fno-strict-aliasing -Wall \
      	-Wshadow -Wpointer-arith -Wwrite-strings -Wstrict-prototypes -c zntune.c
      In file included from /usr/include/linux/atm_zatm.h:17:0,
                       from zntune.c:17:
      /usr/include/linux/time.h:9:8: error: redefinition of ‘struct timespec’
       struct timespec {
              ^
      In file included from /usr/include/sys/select.h:43:0,
                       from /usr/include/sys/types.h:219,
                       from /usr/include/stdlib.h:314,
                       from zntune.c:9:
      /usr/include/time.h:120:8: note: originally defined here
       struct timespec
              ^
      Signed-off-by: Mike Frysinger <vapier@gentoo.org>
      Acked-by: Mikko Rapeli <mikko.rapeli@iki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: take care of truncations done by sk_filter() · ac6e7800
      Eric Dumazet authored
      With syzkaller's help, Marco Grassi found a bug in the TCP stack
      that causes a crash in tcp_collapse().
      
      Root cause is that sk_filter() can truncate the incoming skb,
      but TCP stack was not really expecting this to happen.
      It probably was expecting a simple DROP or ACCEPT behavior.
      
      We first need to make sure no part of the TCP header can be removed.
      Then we need to adjust TCP_SKB_CB(skb)->end_seq.
      
      Many thanks to syzkaller team and Marco for giving us a reproducer.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Marco Grassi <marco.gra@gmail.com>
      Reported-by: Vladis Dronov <vdronov@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
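A simplified sketch of the two steps named in the commit above: never trim below the TCP header, and shrink the end sequence number by however many bytes were dropped. `struct seg` and `apply_filter_trim` are hypothetical; real sequence accounting also involves SYN/FIN flags, which this sketch ignores.

```c
#include <stdint.h>

/* Hypothetical stand-in for a received TCP segment. */
struct seg {
    unsigned int len;     /* header plus payload bytes currently held */
    unsigned int hdr_len; /* TCP header length in bytes */
    uint32_t seq;         /* first sequence number of the payload */
    uint32_t end_seq;     /* sequence number just past the payload */
};

/* Apply a filter verdict that asks to keep only filter_len bytes. */
static void apply_filter_trim(struct seg *s, unsigned int filter_len)
{
    /* Step 1: cap the trim so the TCP header always survives. */
    unsigned int new_len = filter_len < s->hdr_len ? s->hdr_len : filter_len;

    if (new_len >= s->len)
        return; /* nothing was truncated */

    /* Step 2: account for the payload bytes that were dropped. */
    s->end_seq -= s->len - new_len;
    s->len = new_len;
}
```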
    • ipv4: use new_gw for redirect neigh lookup · 969447f2
      Stephen Suryaputra Lin authored
      In v2.6, ip_rt_redirect() calls arp_bind_neighbour() which returns 0
      and then the state of the neigh for the new_gw is checked. If the state
      isn't valid then the redirected route is deleted. This behavior is
      maintained up to v3.5.7 by check_peer_redirect() because rt->rt_gateway
      is assigned to peer->redirect_learned.a4 before calling
      ipv4_neigh_lookup().
      
      After commit 5943634f ("ipv4: Maintain redirect and PMTU info in
      struct rtable again."), ipv4_neigh_lookup() is performed without the
      rt_gateway assigned to the new_gw. In the case when rt_gateway (old_gw)
      isn't zero, the function uses it as the key. The neigh is most likely
      valid since the old_gw is the one that sends the ICMP redirect message.
      Then the new_gw is assigned to fib_nh_exception. The problem is that the
      new_gw ARP may never get resolved and the traffic is blackholed.
      
      So, use the new_gw for neigh lookup.
      
      Changes from v1:
       - use __ipv4_neigh_lookup instead (per Eric Dumazet).
      
      Fixes: 5943634f ("ipv4: Maintain redirect and PMTU info in struct rtable again.")
      Signed-off-by: Stephen Suryaputra Lin <ssurya@ieee.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • r8152: Fix error path in open function · ca0a7531
      Guenter Roeck authored
      If usb_submit_urb() called from the open function fails, the following
      crash may be observed.
      
      r8152 8-1:1.0 eth0: intr_urb submit failed: -19
      ...
      r8152 8-1:1.0 eth0: v1.08.3
      Unable to handle kernel paging request at virtual address 6b6b6b6b6b6b6b7b
      pgd = ffffffc0e7305000
      [6b6b6b6b6b6b6b7b] *pgd=0000000000000000, *pud=0000000000000000
      Internal error: Oops: 96000004 [#1] PREEMPT SMP
      ...
      PC is at notifier_chain_register+0x2c/0x58
      LR is at blocking_notifier_chain_register+0x54/0x70
      ...
      Call trace:
      [<ffffffc0002407f8>] notifier_chain_register+0x2c/0x58
      [<ffffffc000240bdc>] blocking_notifier_chain_register+0x54/0x70
      [<ffffffc00026991c>] register_pm_notifier+0x24/0x2c
      [<ffffffbffc183200>] rtl8152_open+0x3dc/0x3f8 [r8152]
      [<ffffffc000808000>] __dev_open+0xac/0x104
      [<ffffffc0008082f8>] __dev_change_flags+0xb0/0x148
      [<ffffffc0008083c4>] dev_change_flags+0x34/0x70
      [<ffffffc000818344>] do_setlink+0x2c8/0x888
      [<ffffffc0008199d4>] rtnl_newlink+0x328/0x644
      [<ffffffc000819e98>] rtnetlink_rcv_msg+0x1a8/0x1d4
      [<ffffffc0008373c8>] netlink_rcv_skb+0x68/0xd0
      [<ffffffc000817990>] rtnetlink_rcv+0x2c/0x3c
      [<ffffffc000836d1c>] netlink_unicast+0x16c/0x234
      [<ffffffc00083720c>] netlink_sendmsg+0x340/0x364
      [<ffffffc0007e85d0>] sock_sendmsg+0x48/0x60
      [<ffffffc0007e9c30>] SyS_sendto+0xe0/0x120
      [<ffffffc0007e9cb0>] SyS_send+0x40/0x4c
      [<ffffffc000203e34>] el0_svc_naked+0x24/0x28
      
      Clean up error handling to avoid registering the notifier if the open
      function is going to fail.
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bpqether.h: remove if_ether.h guard · 10b21768
      Baruch Siach authored
      __LINUX_IF_ETHER_H is not defined anywhere, and if_ether.h can keep itself from
      double inclusion, though it uses a single underscore prefix.
      Signed-off-by: Baruch Siach <baruch@tkos.co.il>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: __skb_flow_dissect() must cap its return value · 34fad54c
      Eric Dumazet authored
      After Tom's patch, the thoff field could point past the end of the buffer,
      which could fool some callers.
      
      If an skb was provided, skb->len should be the upper limit.
      If not, hlen is supposed to be the upper limit.
      
      Fixes: a6e544b0 ("flow_dissector: Jump to exit code in __skb_flow_dissect")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Yibin Yang <yibyang@cisco.com>
      Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
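A sketch of the capping rule stated in the commit above: the reported transport-header offset is limited by the packet length when a packet is available, otherwise by the caller-supplied header length. `struct pkt` and `cap_thoff` are hypothetical names used only for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in for an skb; only the length matters here. */
struct pkt {
    unsigned int len;
};

/*
 * Cap a dissected transport-header offset. If a packet was provided,
 * its length is the upper limit; if not, hlen is the upper limit.
 */
static unsigned int cap_thoff(unsigned int thoff, const struct pkt *pkt,
                              unsigned int hlen)
{
    unsigned int limit = pkt ? pkt->len : hlen;

    return thoff < limit ? thoff : limit;
}
```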
    • Merge branch 'fix-bpf_redirect' · 79774d6b
      David S. Miller authored
      Martin KaFai Lau says:
      
      ====================
      bpf: Fix bpf_redirect to an ipip/ip6tnl dev
      
      This patch set fixes a bug in bpf_redirect(dev, flags) when dev is an
      ipip/ip6tnl.  The current problem is IP-EthHdr-IP is sent out instead of
      IP-IP.
      
      Patch 1 adds a dev->type test similar to dev_is_mac_header_xmit()
      in act_mirred.c which is only available in net-next.  We can consider
      refactoring it once this patch is pulled into net-next from net.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: Add test for bpf_redirect to ipip/ip6tnl · 90e02896
      Martin KaFai Lau authored
      The test creates two netns, ns1 and ns2.  The host (the default netns)
      has an ipip or ip6tnl dev configured for tunneling traffic to the ns2.
      
          ping VIPS from ns1 <----> host <--tunnel--> ns2 (VIPs at loopback)
      
      The test is to have ns1 pinging VIPs configured at the loopback
      interface in ns2.
      
      The VIPs are 10.10.1.102 and 2401:face::66 (which are configured
      at lo@ns2). [Note: 0x66 => 102].
      
      At ns1, the VIPs are routed _via_ the host.
      
      At the host, bpf programs are installed at the veth to redirect packets
      from a veth to the ipip/ip6tnl.  The test is configured in a way so
      that both ingress and egress can be tested.
      
      At ns2, the ipip/ip6tnl dev is configured with the local and remote address
      specified.  The return path is routed to the dev ipip/ip6tnl.
      
      During egress test, the host also locally tests pinging the VIPs to ensure
      that bpf_redirect at egress also works for the direct egress (i.e. not
      forwarding from dev ve1 to ve2).
      Acked-by: Alexei Starovoitov <ast@fb.com>
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: Fix bpf_redirect to an ipip/ip6tnl dev · 4e3264d2
      Martin KaFai Lau authored
      If the bpf program calls bpf_redirect(dev, 0) and dev is
      an ipip/ip6tnl, it currently includes the mac header.
      e.g. If dev is ipip, the end result is IP-EthHdr-IP instead
      of IP-IP.
      
      The fix is to pull the mac header.  At ingress, skb_postpull_rcsum()
      is not needed because the ethhdr should have been pulled once already
      and then got pushed back just before calling the bpf_prog.
      At egress, this patch calls skb_postpull_rcsum().
      
      If bpf_redirect(dev, BPF_F_INGRESS) is called,
      it also fails now because it calls dev_forward_skb() which
      eventually calls eth_type_trans(skb, dev).  eth_type_trans()
      will set skb->pkt_type = PACKET_OTHERHOST because the mac address
      does not match the redirecting dev->dev_addr.  PACKET_OTHERHOST
      will eventually cause ip_rcv() to error out.  To fix this,
      ____dev_forward_skb() is added.
      
      Joint work with Daniel Borkmann.
      
      Fixes: cfc7381b ("ip_tunnel: add collect_md mode to IPIP tunnel")
      Fixes: 8d79266b ("ip6_tunnel: add collect_md mode to IPv6 tunnels")
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@fb.com>
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 10 Nov, 2016 11 commits
  3. 09 Nov, 2016 11 commits
    • vxlan: hide unused local variable · 4053ab1b
      Arnd Bergmann authored
      A bugfix introduced a harmless warning in v4.9-rc4:
      
      drivers/net/vxlan.c: In function 'vxlan_group_used':
      drivers/net/vxlan.c:947:21: error: unused variable 'sock6' [-Werror=unused-variable]
      
      This hides the variable inside of the same #ifdef that is
      around its user. The extraneous initialization is removed
      at the same time, it was accidentally introduced in the
      same commit.
      
      Fixes: c6fcc4fc ("vxlan: avoid using stale vxlan socket.")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Jiri Benc <jbenc@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ibmvnic: Start completion queue negotiation at server-provided optimum values · 6dbcd8fb
      John Allen authored
      Use the opt_* fields to determine the starting point for negotiating the
      number of tx/rx completion queues with the vnic server. These contain the
      number of queues that the vnic server estimates that it will be able to
      allocate. While renegotiation may still occur, using the opt_* fields will
      reduce the number of times this needs to happen and will prevent driver
      probe timeout on systems using large numbers of ibmvnic client devices per
      vnic port.
      Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: icmp_route_lookup should use rt dev to determine L3 domain · 9d1a6c4e
      David Ahern authored
      icmp_send is called in response to some event. The skb may not have
      the device set (skb->dev is NULL), but it is expected to have an rt.
      Update icmp_route_lookup to use the rt on the skb to determine L3
      domain.
      
      Fixes: 613d09b3 ("net: Use VRF device index for lookups on TX")
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'qcom-emac-pause' · fd6f24d7
      David S. Miller authored
      Timur Tabi says:
      
      ====================
      net: qcom/emac: ensure that pause frames are enabled
      
      The qcom emac driver experiences significant packet loss (through frame
      check sequence errors) if flow control is not enabled and the phy is
      not configured to allow pause frames to pass through it.  Therefore, we
      need to enable flow control and force the phy to pass pause frames.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: qcom/emac: enable flow control if requested · df63022e
      Timur Tabi authored
      If the PHY has been configured to allow pause frames, then the MAC
      should be configured to generate and/or accept those frames.
      Signed-off-by: Timur Tabi <timur@codeaurora.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: qcom/emac: configure the external phy to allow pause frames · 3e884493
      Timur Tabi authored
      Pause frames are used to enable flow control.  A MAC can send and
      receive pause frames in order to throttle traffic.  However, the PHY
      must be configured to allow those frames to pass through.
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Timur Tabi <timur@codeaurora.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bgmac: fix reversed checks for clock control flag · cdb26d33
      Rafał Miłecki authored
      This fixes a regression introduced by the patch adding feature flags. It
      was already reported and a fix followed (and was accepted), but that fix
      appears to have been incorrect: instead of fixing the reversed condition
      it broke a good one.
      
      This patch was verified to actually fix SoC hangs caused by bgmac on
      BCM47186B0.
      
      Fixes: db791eb2 ("net: ethernet: bgmac: convert to feature flags")
      Fixes: 4af1474e ("net: bgmac: Fix errant feature flag check")
      Cc: Jon Mason <jon.mason@broadcom.com>
      Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bna: Add synchronization for tx ring. · d667f785
      Benjamin Poirier authored
      We received two reports of BUG_ON in bnad_txcmpl_process() where
      hw_consumer_index appeared to be ahead of producer_index. Out of order
      write/read of these variables could explain these reports.
      
      bnad_start_xmit(), as a producer of tx descriptors, has a few memory
      barriers sprinkled around writes to producer_index and the device's
      doorbell but they're not paired with anything in bnad_txcmpl_process(), a
      consumer.
      
      Since we are synchronizing with a device, we must use mandatory barriers,
      not smp_*. Also, I didn't see the purpose of the last smp_mb() in
      bnad_start_xmit().
      Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Revert "net/mlx4_en: Fix panic during reboot" · f91d7181
      Tariq Toukan authored
      This reverts commit 9d2afba0.
      
      The original issue would possibly exist if an external module
      tried calling our "ethtool_ops" without checking if it still
      exists.
      
      The right way of solving it is by simply doing the check in
      the caller side.
      Currently, no action is required as there's no such use case.
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net-ipv6: on device mtu change do not add mtu to mtu-less routes · fb56be83
      Maciej Żenczykowski authored
      Routes can specify an mtu explicitly or inherit the mtu from
      the underlying device - this inheritance is implemented in
      dst->ops->mtu handlers ip6_mtu() and ip6_blackhole_mtu().
      
      Currently changing the mtu of a device adds mtu explicitly
      to routes using that device.
      
      i.e.
        # ip link set dev lo mtu 65536
        # ip -6 route add local 2000::1 dev lo
        # ip -6 route get 2000::1
        local 2000::1 dev lo  table local  src ...  metric 1024  pref medium
      
        # ip link set dev lo mtu 65535
        # ip -6 route get 2000::1
        local 2000::1 dev lo  table local  src ...  metric 1024  mtu 65535 pref medium
      
        # ip link set dev lo mtu 65536
        # ip -6 route get 2000::1
        local 2000::1 dev lo  table local  src ...  metric 1024  mtu 65536 pref medium
      
        # ip -6 route del local 2000::1
      
      After this patch the route entry no longer changes unless it already has an mtu.
      There is no need: this inheritance is already done in ip6_mtu()
      
        # ip link set dev lo mtu 65536
        # ip -6 route add local 2000::1 dev lo
        # ip -6 route add local 2000::2 dev lo mtu 2000
        # ip -6 route get 2000::1; ip -6 route get 2000::2
        local 2000::1 dev lo  table local  src ...  metric 1024  pref medium
        local 2000::2 dev lo  table local  src ...  metric 1024  mtu 2000 pref medium
      
        # ip link set dev lo mtu 65535
        # ip -6 route get 2000::1; ip -6 route get 2000::2
        local 2000::1 dev lo  table local  src ...  metric 1024  pref medium
        local 2000::2 dev lo  table local  src ...  metric 1024  mtu 2000 pref medium
      
        # ip link set dev lo mtu 1501
        # ip -6 route get 2000::1; ip -6 route get 2000::2
        local 2000::1 dev lo  table local  src ...  metric 1024  pref medium
        local 2000::2 dev lo  table local  src ...  metric 1024  mtu 1501 pref medium
      
        # ip link set dev lo mtu 65536
        # ip -6 route get 2000::1; ip -6 route get 2000::2
        local 2000::1 dev lo  table local  src ...  metric 1024  pref medium
        local 2000::2 dev lo  table local  src ...  metric 1024  mtu 65536 pref medium
      
        # ip -6 route del local 2000::1
        # ip -6 route del local 2000::2
      
      This is desirable because changing device mtu and then resetting it
      to the previous value shouldn't change the user visible routing table.
      Signed-off-by: Maciej Żenczykowski <maze@google.com>
      CC: Eric Dumazet <edumazet@google.com>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
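A sketch of the route update rule as it can be inferred from the "after this patch" transcript above. The rule is an inference from those outputs, not a copy of the kernel code, and `struct route` and `route_on_dev_mtu_change` are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a route's MTU metric. */
struct route {
    bool has_mtu;     /* false: route inherits the device mtu on lookup */
    unsigned int mtu; /* only meaningful when has_mtu is true */
};

/*
 * React to a device mtu change. Routes without an explicit mtu are left
 * alone. Routes with one are lowered when the device shrinks below them,
 * and follow the device upward only if they were tracking the old value.
 */
static void route_on_dev_mtu_change(struct route *rt,
                                    unsigned int old_dev_mtu,
                                    unsigned int new_dev_mtu)
{
    if (!rt->has_mtu)
        return;
    if (rt->mtu >= new_dev_mtu || rt->mtu == old_dev_mtu)
        rt->mtu = new_dev_mtu;
}
```

Replaying the transcript with this rule gives the same sequence for the 2000::2 route (2000, 2000, 1501, 65536), while the 2000::1 route never gains an mtu.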
    • sock: fix sendmmsg for partial sendmsg · 3023898b
      Soheil Hassas Yeganeh authored
      Do not send the next message in sendmmsg for partial sendmsg
      invocations.
      
      sendmmsg assumes that it can continue sending the next message
      when the return value of the individual sendmsg invocations
      is positive. It results in corrupting the data for TCP,
      SCTP, and UNIX streams.
      
      For example, sendmmsg([["abcd"], ["efgh"]]) can result in a stream
      of "aefgh" if the first sendmsg invocation sends only the first
      byte while the second sendmsg goes through.
      
      Datagram sockets either send the entire datagram or fail, so
      this patch affects only sockets of type SOCK_STREAM and
      SOCK_SEQPACKET.
      
      Fixes: 228e548e ("net: Add sendmmsg socket system call")
      Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Maciej Żenczykowski <maze@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
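A user-space sketch of the loop behaviour described in the commit above: stop after the first message that was only partially sent, so the next message cannot be spliced into the middle of the stream. `send_many`, `fake_send` and `struct fake_stream` are hypothetical helpers; the fake stream truncates its first call to model a partial send.

```c
#include <stddef.h>
#include <string.h>

/* Callback that sends one message and returns bytes sent, or < 0 on error. */
typedef long (*send_fn)(const char *buf, size_t len, void *ctx);

/* Send messages in order; returns how many messages were attempted with success. */
static int send_many(const char *const *msgs, const size_t *lens, int count,
                     send_fn send_one, void *ctx)
{
    int sent = 0;

    for (int i = 0; i < count; i++) {
        long n = send_one(msgs[i], lens[i], ctx);

        if (n < 0)
            break;
        sent++;
        /* Partial send: do not start the next message. */
        if ((size_t)n < lens[i])
            break;
    }
    return sent;
}

/* Fake byte stream whose first send accepts at most first_call_limit bytes. */
struct fake_stream {
    char data[64];
    size_t used;
    size_t first_call_limit;
    int calls;
};

static long fake_send(const char *buf, size_t len, void *ctx)
{
    struct fake_stream *s = ctx;
    size_t n = len;

    if (s->calls++ == 0 && n > s->first_call_limit)
        n = s->first_call_limit;
    if (n > sizeof(s->data) - s->used)
        n = sizeof(s->data) - s->used;
    memcpy(s->data + s->used, buf, n);
    s->used += n;
    return (long)n;
}
```

With the messages "abcd" and "efgh" and a first send limited to one byte, the loop stops with only "a" in the stream, rather than continuing and producing "aefgh".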