1. 06 Mar, 2018 1 commit
  2. 03 Mar, 2018 2 commits
    • bpf: fix bpf_skb_adjust_net/bpf_skb_proto_xlat to deal with gso sctp skbs · d02f51cb
      Daniel Axtens authored
      SCTP GSO skbs have a gso_size of GSO_BY_FRAGS, so any sort of
      unconditional mangling of that will result in a nonsense value
      and would corrupt the skb later on.
      
      Therefore, i) add two helpers skb_increase_gso_size() and
      skb_decrease_gso_size() that would throw a one time warning and
      bail out for such skbs and ii) refuse and return early with an
      error in those BPF helpers that are affected. We do need to bail
      out as early as possible from there before any changes on the
      skb have been performed.
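
      As an illustration only, the guard described above can be modelled
      in a few lines of Python. This is a sketch, not the kernel code: the
      Skb class, adjust_room() and the error value returned are assumptions
      made for the example; only GSO_BY_FRAGS and the purpose of the helper
      come from the text above.

```python
GSO_BY_FRAGS = 0xFFFF  # sentinel gso_size carried by SCTP GSO skbs
ENOTSUPP = 524         # error value assumed for this sketch


class Skb:
    """Hypothetical stand-in for the two skb fields the sketch needs."""

    def __init__(self, gso_size, length):
        self.gso_size = gso_size
        self.length = length


def decrease_gso_size(skb, decrement):
    """Refuse to touch the sentinel (the real helper also warns once)."""
    if skb.gso_size == GSO_BY_FRAGS:
        return False
    skb.gso_size -= decrement
    return True


def adjust_room(skb, len_diff):
    """Model of an affected BPF helper: check first, modify afterwards."""
    if skb.gso_size == GSO_BY_FRAGS:
        return -ENOTSUPP  # bail out before any change to the skb
    skb.length += len_diff
    decrease_gso_size(skb, len_diff)
    return 0
```

      The point of the ordering is that a rejected skb is returned exactly
      as it came in, with neither its length nor its gso_size modified.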
      
      Fixes: 6578171a ("bpf: add bpf_skb_change_proto helper")
      Co-authored-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Daniel Axtens <dja@axtens.net>
      Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · 4a0c7191
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter/IPVS fixes for net
      
      The following patchset contains Netfilter fixes for your net tree,
      they are:
      
      1) Put back reference on CLUSTERIP configuration structure from the
         error path, patch from Florian Westphal.
      
      2) Put reference on CLUSTERIP configuration instead of freeing it,
         another cpu may still be walking over it, also from Florian.
      
      3) Refetch pointer to IPv6 header from nf_nat_ipv6_manip_pkt() given
         packet manipulation may reallocate the skbuff header, from Florian.
      
      4) Missing match size sanity checks in ebt_among, from Florian.
      
      5) Convert BUG_ON to WARN_ON in ebtables, from Florian.
      
      6) Sanity check userspace offsets from ebtables kernel, from Florian.
      
      7) Missing checksum replace call in flowtable IPv4 DNAT, from Felix
         Fietkau.
      
      8) Bump the right stats on checksum error from bridge netfilter,
         from Taehee Yoo.
      
      9) Unset interface flag in IPv6 fib lookups otherwise we get
         misleading routing lookup results, from Florian.
      
      10) Missing sk_to_full_sk() in ip6_route_me_harder() from Eric Dumazet.
      
      11) Don't allow devices to be part of multiple flowtables at the same
          time, this may break setups.
      
      12) Missing netlink attribute validation in flowtable deletion.
      
      13) Wrong array index in nf_unregister_net_hook() call from error path
          in flowtable addition path.
      
      14) Fix FTP IPVS helper when NAT mangling is in place, patch from
          Julian Anastasov.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 02 Mar, 2018 6 commits
    • Merge tag 'mac80211-for-davem-2018-03-02' of... · d69242bf
      David S. Miller authored
      Merge tag 'mac80211-for-davem-2018-03-02' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
      
      Johannes Berg says:
      
      ====================
      Three more patches:
       * fix for a regression in 4-addr mode with fast-RX
       * fix for a Kconfig problem with the new regdb
       * fix for the long-standing TCP performance issue in
         wifi using the new sk_pacing_shift_update()
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rds: Incorrect reference counting in TCP socket creation · 84eef2b2
      Ka-Cheong Poon authored
      Commit 0933a578 ("rds: tcp: use sock_create_lite() to create the
      accept socket") has a reference counting issue in TCP socket creation
      when accepting a new connection.  The code uses sock_create_lite() to
      create a kernel socket.  But it does not do __module_get() on the
      socket owner.  When the connection is shut down and sock_release() is
      called to free the socket, the owner's reference count is decremented
      and becomes incorrect.  Note that this bug only shows up when the socket
      owner is configured as a kernel module.
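
      The imbalance can be pictured with a toy reference counter. This is
      a hedged Python model, not the rds code: Module, Socket and the
      *_model function names are invented for the example, and the
      take_module_ref flag stands in for the missing __module_get() call.

```python
class Module:
    """Hypothetical owner module with a plain reference counter."""

    def __init__(self):
        self.refcount = 0


class Socket:
    def __init__(self, owner):
        self.owner = owner


def sock_create_lite_model(owner, take_module_ref):
    """Create a socket; pin the owner only when asked to."""
    if take_module_ref:
        owner.refcount += 1  # models the missing __module_get()
    return Socket(owner)


def sock_release_model(sock):
    """Release always drops one reference on the owner."""
    sock.owner.refcount -= 1
```

      Without the extra reference, one create/release cycle leaves the
      counter at -1; with it, the counter returns to 0.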
      
      v2: Update comments
      
      Fixes: 0933a578 ("rds: tcp: use sock_create_lite() to create the accept socket")
      Signed-off-by: Ka-Cheong Poon <ka-cheong.poon@oracle.com>
      Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · a5f7b0ee
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2018-02-28
      
      The following pull-request contains BPF updates for your *net* tree.
      
      The main changes are:
      
      1) Add schedule points and reduce the number of loop iterations
         the test_bpf kernel module is performing in order to not hog
         the CPU for too long, from Eric.
      
      2) Fix an out of bounds access in tail calls in the ppc64 BPF
         JIT compiler, from Daniel.
      
      3) Fix a crash on arm64 on unaligned BPF xadd operations that
         could be triggered via interpreter and JIT, from Daniel.
      
      Please note that once you merge net into net-next at some point, there
      is a minor merge conflict in test_verifier.c since test cases had
      been added at the end in both trees. Resolution is trivial: keep all
      the test cases from both trees.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ethtool: don't ignore return from driver get_fecparam method · a6d50512
      Edward Cree authored
      If ethtool_ops->get_fecparam returns an error, pass that error on to the
      user, rather than ignoring it.
      
      Fixes: 1a5f3da2 ("net: ethtool: add support for forward error correction modes")
      Signed-off-by: Edward Cree <ecree@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vrf: check forwarding on the original netdevice when generating ICMP dest unreachable · e2c0dc1f
      Stephen Suryaputra authored
      When ip_error() is called the device is the l3mdev master instead of the
      original device. So the forwarding check should be on the original one.
      
      Changes from v2:
      - Handle the original device disappearing (per David Ahern)
      - Minimize the change in code order
      
      Changes from v1:
      - Only need to reset the device on which __in_dev_get_rcu() is done (per
        David Ahern).
      Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: allow interface to be set into VRF if VLAN interface in same VRF · 50d629e7
      Mike Manning authored
      Setting an interface into a VRF fails with 'RTNETLINK answers: File
      exists' if one of its VLAN interfaces is already in the same VRF.
      As the VRF is an upper device of the VLAN interface, it is also showing
      up as an upper device of the interface itself. The solution is to
      restrict this check to devices other than master. As only one master
      device can be linked to a device, the check in this case is that the
      upper device (VRF) being linked to is not the same as the master device
      instead of it not being any one of the upper devices.
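
      The difference between the two checks can be sketched with a small
      Python model of the upper-device graph. The Dev class, all_uppers()
      and link_upper() are hypothetical names, and the error values are
      assumptions for the example; the model only mirrors the rule
      described in the paragraph above.

```python
EEXIST, EBUSY = 17, 16  # errno values assumed for this sketch


class Dev:
    def __init__(self, name):
        self.name = name
        self.uppers = []    # directly linked upper devices
        self.master = None  # at most one master upper device


def all_uppers(dev):
    """Every device reachable by following upper links from dev."""
    seen, stack = [], list(dev.uppers)
    while stack:
        d = stack.pop()
        if d not in seen:
            seen.append(d)
            stack.extend(d.uppers)
    return seen


def link_upper(dev, upper, master, fixed=True):
    """Link upper above dev; fixed=False models the old, stricter check."""
    if fixed and master:
        # New rule: for a master link, only the current master matters.
        if dev.master is not None:
            return -EEXIST if dev.master is upper else -EBUSY
    else:
        # Old rule: reject if upper is anywhere among dev's uppers.
        if upper in all_uppers(dev):
            return -EEXIST
        if master and dev.master is not None:
            return -EBUSY
    dev.uppers.append(upper)
    if master:
        dev.master = upper
    return 0
```

      With ens12.10 already enslaved to the VRF, the old check sees the
      VRF through the VLAN device and refuses, while the new check only
      asks whether ens12 already has a master.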
      
      The following example shows an interface ens12 (with a VLAN interface
      ens12.10) being set into VRF green, which behaves as expected:
      
        # ip link add link ens12 ens12.10 type vlan id 10
        # ip link set dev ens12 master vrfgreen
        # ip link show dev ens12
          3: ens12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel
             master vrfgreen state UP mode DEFAULT group default qlen 1000
             link/ether 52:54:00:4c:a0:45 brd ff:ff:ff:ff:ff:ff
      
      But if the VLAN interface has previously been set into the same VRF,
      then setting the interface into the VRF fails:
      
        # ip link set dev ens12 nomaster
        # ip link set dev ens12.10 master vrfgreen
        # ip link show dev ens12.10
          39: ens12.10@ens12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
          qdisc noqueue master vrfgreen state UP mode DEFAULT group default
          qlen 1000 link/ether 52:54:00:4c:a0:45 brd ff:ff:ff:ff:ff:ff
        # ip link set dev ens12 master vrfgreen
          RTNETLINK answers: File exists
      
      The workaround is to move the VLAN interface back into the default VRF
      beforehand, but it has to be shut first so as to avoid the risk of
      traffic leaking from the VRF. This fix avoids needing this workaround.
      Signed-off-by: Mike Manning <mmanning@att.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 01 Mar, 2018 1 commit
  5. 28 Feb, 2018 25 commits
    • ipvs: remove IPS_NAT_MASK check to fix passive FTP · 8a949fff
      Julian Anastasov authored
      The IPS_NAT_MASK check in 4.12 replaced previous check for nfct_nat()
      which was needed to fix a crash in 2.6.36-rc, see
      commit 7bcbf81a ("ipvs: avoid oops for passive FTP").
      But as IPVS does not set the IPS_SRC_NAT and IPS_DST_NAT bits,
      checking for IPS_NAT_MASK prevents the PASV response from being
      properly mangled and blocks the transfer. Remove the check as it is
      not needed after 3.12 commit 41d73ec0 ("netfilter: nf_conntrack:
      make sequence number adjustments usuable without NAT") which
      replaces nfct_nat() with nfct_seqadj() and especially after 3.13
      commit b25adce1 ("ipvs: correct usage/allocation of seqadj
      ext in ipvs").
      
      Thanks to Li Shuang and Florian Westphal for reporting the problem!
      Reported-by: Li Shuang <shuali@redhat.com>
      Fixes: be7be6e1 ("netfilter: ipvs: fix incorrect conflict resolution")
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Acked-by: Simon Horman <horms@verge.net.au>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • Merge branch 'mlxsw-fixes' · b739012b
      David S. Miller authored
      Jiri Pirko says:
      
      ====================
      mlxsw: couple of fixes
      
      Couple of unrelated fixes for mlxsw.
      
      ---
      v1->v2:
      -patch 2:
       - rebase on top of current -net tree
       - removed forgotten empty line
      -patch 3:
       - new patch
      -patch 4:
       - new patch
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • spectrum: Reference count VLAN entries · b3529af6
      Ido Schimmel authored
      One of the basic constructs in the device is a port-VLAN pair, which can
      be bound to a FID or a RIF in order to direct packets to the bridge or
      the router, respectively.
      
      Since not all the netdevs are configured with a VLAN (e.g., sw1p1 vs.
      sw1p1.10), VID 1 is used to represent these and thus this VID can be
      used by both upper devices of mlxsw ports and by the driver itself.
      
      However, this VID is not reference counted and therefore might be freed
      prematurely, which can result in various WARNINGs. For example:
      
      $ ip link add name br0 type bridge vlan_filtering 1
      $ teamd -t team0 -d -c '{"runner": {"name": "lacp"}}'
      $ ip link set dev team0 master br0
      $ ip link set dev enp1s0np1 master team0
      $ ip address add 192.0.2.1/24 dev enp1s0np1
      
      The enslavement to team0 will fail because team0 already has an upper,
      and thus vlan_vids_del_by_dev() will be executed as part of team's error
      path, which will delete VID 1 from enp1s0np1 (added by br0 as PVID). The
      WARNING is generated when the driver realizes it can't find VID
      1 on the port and bind it to a RIF.
      
      Fix this by adding a reference count to the VLAN entries on the port, in
      a similar fashion to the reference counting used by the corresponding
      'vlan_vid_info' structure in the 8021q driver.
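
      A minimal Python sketch of the idea follows. It is not the mlxsw
      implementation; the PortVlans class and its method names are
      hypothetical and only show how a per-VID counter keeps a shared
      entry alive until its last user is gone.

```python
class PortVlans:
    """Hypothetical per-port table of reference-counted VLAN entries."""

    def __init__(self):
        self.refs = {}  # vid -> number of current users

    def get(self, vid):
        """Take a reference, creating the entry on first use."""
        self.refs[vid] = self.refs.get(vid, 0) + 1

    def put(self, vid):
        """Drop a reference; the entry is freed only when it hits zero."""
        self.refs[vid] -= 1
        if self.refs[vid] == 0:
            del self.refs[vid]

    def has(self, vid):
        return vid in self.refs
```

      If both the driver and an upper device hold VID 1, a single put()
      from an error path no longer removes the entry the other user still
      depends on.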
      
      Fixes: c57529e1 ("mlxsw: spectrum: Replace vPorts with Port-VLAN")
      Reported-by: Tal Bar <talb@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Tested-by: Tal Bar <talb@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Treat IPv6 unregistered multicast as broadcast · 9d45deb0
      Ido Schimmel authored
      When multicast snooping is enabled, the Linux bridge resorts to flooding
      unregistered multicast packets to all ports only in case it did not
      detect a querier in the network.
      
      The above condition is not reflected to underlying drivers, which is
      especially problematic in IPv6 environments, as multicast snooping is
      enabled by default and since neighbour solicitation packets might be
      treated as unregistered multicast packets in case there is no
      corresponding MDB entry.
      
      Until the Linux bridge reflects its querier state to underlying drivers,
      simply treat unregistered multicast packets as broadcast and allow them
      to reach their destination.
      
      Fixes: 9df552ef ("mlxsw: spectrum: Improve IPv6 unregistered multicast flooding")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reported-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Fix handling of resource_size_param · 77d27096
      Jiri Pirko authored
      Current code uses global variables, adjusts them and passes pointer down
      to devlink. With every other mlxsw_core instance, the previously passed
      pointer values are rewritten. Fix this by de-globalizing the variables and
      also memcpy size_params during devlink resource registration.
      Also, introduce a convenient size_param_init helper.
      
      Fixes: ef3116e5 ("mlxsw: spectrum: Register KVD resources with devlink")
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: core: Fix flex keys scratchpad offset conflict · 2ddc94c7
      Jiri Pirko authored
      IP_TTL, IP_ECN and IP_DSCP are using the same offset within the
      scratchpad as L4 ports. Fix this by shifting all up.
      
      Fixes: 5f57e090 ("mlxsw: acl: Add ip ttl acl element")
      Fixes: i80d0fe47 ("mlxsw: acl: Add ip tos acl element")
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'net-smc-fixes' · 7358799c
      David S. Miller authored
      Ursula Braun says:
      
      ====================
      net/smc: fixes 2018-02-28
      
      Here are 3 smc bug fixes for the net-tree. Karsten's first patch is
      the reworked version of last week's
         "[PATCH net-next 2/5] net/smc: fix structure size"
      patch, now solved without using __packed, and now targeted for net
      instead of net-next.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: fix NULL pointer dereference on sock_create_kern() error path · a5dcb73b
      Davide Caratti authored
      When sock_create_kern(..., a) returns an error, 'a' might not be a valid
      pointer, so it shouldn't be dereferenced to read a->sk->sk_sndbuf and
      a->sk->sk_rcvbuf; dereferencing it anyway caused the following crash:
      
      general protection fault: 0000 [#1] SMP KASAN
      Dumping ftrace buffer:
          (ftrace buffer empty)
      Modules linked in:
      CPU: 0 PID: 4254 Comm: syzkaller919713 Not tainted 4.16.0-rc1+ #18
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:smc_create+0x14e/0x300 net/smc/af_smc.c:1410
      RSP: 0018:ffff8801b06afbc8 EFLAGS: 00010202
      RAX: dffffc0000000000 RBX: ffff8801b63457c0 RCX: ffffffff85a3e746
      RDX: 0000000000000004 RSI: 00000000ffffffff RDI: 0000000000000020
      RBP: ffff8801b06afbf0 R08: 00000000000007c0 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
      R13: ffff8801b6345c08 R14: 00000000ffffffe9 R15: ffffffff8695ced0
      FS:  0000000001afb880(0000) GS:ffff8801db200000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020000040 CR3: 00000001b0721004 CR4: 00000000001606f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
        __sock_create+0x4d4/0x850 net/socket.c:1285
        sock_create net/socket.c:1325 [inline]
        SYSC_socketpair net/socket.c:1409 [inline]
        SyS_socketpair+0x1c0/0x6f0 net/socket.c:1366
        do_syscall_64+0x282/0x940 arch/x86/entry/common.c:287
        entry_SYSCALL_64_after_hwframe+0x26/0x9b
      RIP: 0033:0x4404b9
      RSP: 002b:00007fff44ab6908 EFLAGS: 00000246 ORIG_RAX: 0000000000000035
      RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004404b9
      RDX: 0000000000000000 RSI: 0000000000000001 RDI: 000000000000002b
      RBP: 00007fff44ab6910 R08: 0000000000000002 R09: 00007fff44003031
      R10: 0000000020000040 R11: 0000000000000246 R12: ffffffffffffffff
      R13: 0000000000000006 R14: 0000000000000000 R15: 0000000000000000
      Code: 48 c1 ea 03 80 3c 02 00 0f 85 b3 01 00 00 4c 8b a3 48 04 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 20 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 82 01 00 00 4d 8b 7c 24 20 48 b8 00 00 00 00
      RIP: smc_create+0x14e/0x300 net/smc/af_smc.c:1410 RSP: ffff8801b06afbc8
      
      Fixes: cd6851f3 ("smc: remote memory buffers (RMBs)")
      Reported-and-tested-by: syzbot+aa0227369be2dcc26ebe@syzkaller.appspotmail.com
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: use link_id of server in confirm link reply · 2be922f3
      Karsten Graul authored
      The CONFIRM LINK reply message must contain the link_id sent
      by the server. Also set the link_id explicitly when
      initializing the link.
      Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com>
      Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: use a constant for control message length · cbba07a7
      Karsten Graul authored
      The sizeof(struct smc_cdc_msg) evaluates to 48 bytes instead of the
      required 44 bytes. We need to use the constant value of
      SMC_WR_TX_SIZE to set and check the control message length.
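
      One plausible way for 44 bytes of fields to report a sizeof of 48 is
      tail padding up to the structure's alignment. The Python sketch
      below only does that rounding arithmetic; the 8-byte alignment is an
      assumption for the example, not something stated in the commit, and
      padded_sizeof() is a hypothetical helper.

```python
WIRE_SIZE = 44  # required control message length, from the text above


def padded_sizeof(raw_size, alignment):
    """Round raw_size up to the next multiple of alignment."""
    return (raw_size + alignment - 1) // alignment * alignment
```

      Under that assumption, padded_sizeof(WIRE_SIZE, 8) gives 48, which
      is why a fixed constant is safer than sizeof() for an on-the-wire
      length.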
      Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com>
      Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio-net: disable NAPI only when enabled during XDP set · 4e09ff53
      Jason Wang authored
      We try to disable NAPI to prevent a single XDP TX queue being used by
      multiple cpus. But we don't check if the device is up (NAPI is enabled);
      this could result in a stall because of an infinite wait in
      napi_disable(). Fix this by checking the device state through
      netif_running() first.
      
      Fixes: 4941d472 ("virtio-net: do not reset during XDP set")
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/tcp/illinois: replace broken algorithm reference link · ecc83275
      Joey Pabalinas authored
      The link to the pdf containing the algorithm description is now a
      dead link; it seems http://www.ifp.illinois.edu/~srikant/ has been
      moved to https://sites.google.com/a/illinois.edu/srikant/ and none of
      the original papers can be found there...
      
      I have replaced it with the only working copy I was able to find.
      
      n.b. there is also a copy available at:
      
      http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.296.6350&rep=rep1&type=pdf
      
      However, this seems to only be a *cached* version, so I am unsure
      exactly how reliable that link can be expected to remain over time
      and have decided against using that one.
      Signed-off-by: Joey Pabalinas <joeypabalinas@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • test_bpf: reduce MAX_TESTRUNS · 9960d766
      Eric Dumazet authored
      For tests that are using the maximal number of BPF instructions, each
      run takes 20 usec. Looping 10,000 times on them totals 200 ms, which
      is bad when the loop is not preemptible.
      
      test_bpf: #264 BPF_MAXINSNS: Call heavy transformations jited:1 19248 18548 PASS
      test_bpf: #269 BPF_MAXINSNS: ld_abs+get_processor_id jited:1 20896 PASS
      
      Let's divide the number of iterations by ten, so that max latency is
      20ms. We could use need_resched() to break the loop earlier if we
      believe 20 ms is too much.
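
      The arithmetic behind these numbers is simple enough to check in a
      couple of lines of Python. The function name is hypothetical; the
      20 usec per run and the iteration counts come from the text above.

```python
USEC_PER_RUN = 20  # per-run cost quoted above for the largest tests


def loop_latency_ms(runs, usec_per_run=USEC_PER_RUN):
    """Total time spent in the non-preemptible loop, in milliseconds."""
    return runs * usec_per_run / 1000
```

      For example, loop_latency_ms(10000) is the 200 ms case and
      loop_latency_ms(1000) is the 20 ms case after the reduction.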
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • tcp: purge write queue upon RST · a27fd7a8
      Soheil Hassas Yeganeh authored
      When the connection is reset, there is no point in
      keeping the packets on the write queue until the connection
      is closed.
      
      RFC 793 (page 70) and RFC 793-bis (page 64) both suggest
      purging the write queue upon RST:
      https://tools.ietf.org/html/draft-ietf-tcpm-rfc793bis-07
      
      Moreover, this is essential for a correct MSG_ZEROCOPY
      implementation, because userspace cannot call close(fd)
      before receiving zerocopy signals even when the connection
      is reset.
      
      Fixes: f214f915 ("tcp: enable MSG_ZEROCOPY")
      Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'tcp-revert-a-F-RTO-extension-due-to-broken-middle-boxes' · 55e84dd7
      David S. Miller authored
      Yuchung Cheng says:
      
      ====================
      tcp: revert a F-RTO extension due to broken middle-boxes
      
      This patch series reverts a (non-standard) TCP F-RTO extension that aimed
      to detect more spurious timeouts. Unfortunately it could result in poor
      performance due to broken middle-boxes that modify TCP packets. E.g.
      https://www.spinics.net/lists/netdev/msg484154.html
      We believe the best and simplest solution is to just revert the change.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: revert F-RTO extension to detect more spurious timeouts · fc68e171
      Yuchung Cheng authored
      This reverts commit 89fe18e4.
      
      While the patch could detect more spurious timeouts, it could cause
      poor TCP performance on broken middle-boxes that modify TCP packets
      (e.g. receive window, SACK options). Since the performance gain is
      much smaller than the potential loss, the best solution is
      to fully revert the change.
      
      Fixes: 89fe18e4 ("tcp: extend F-RTO to catch more spurious timeouts")
      Reported-by: Teodor Milkov <tm@del.bg>
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: revert F-RTO middle-box workaround · d4131f09
      Yuchung Cheng authored
      This reverts commit cc663f4d. While it works around
      some broken middle-boxes that modify receive window fields, it does not
      address middle-boxes that strip off SACK options. The best solution is
      to fully revert this patch and the root F-RTO enhancement.
      
      Fixes: cc663f4d ("tcp: restrict F-RTO to work-around broken middle-boxes")
      Reported-by: Teodor Milkov <tm@del.bg>
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 's390-qeth-fixes' · c8431622
      David S. Miller authored
      Julian Wiedmann says:
      
      ====================
      s390/qeth: fixes 2018-02-27
      
      please apply some more qeth patches for -net and stable.
      
      One patch fixes a performance bug in the TSO path. Then there are several
      more fixes for IP management on L3 devices - including a revert, so that
      the subsequent fix cleanly applies to earlier kernels.
      The final patch takes care of a race in the control IO code that causes
      qeth to miss the cmd response, and subsequently trigger device recovery.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • s390/qeth: fix IPA command submission race · d22ffb5a
      Julian Wiedmann authored
      If multiple IPA commands are built & sent out concurrently,
      fill_ipacmd_header() may assign a seqno value to a command that's
      different from what send_control_data() later assigns to this command's
      reply.
      This is due to other commands passing through send_control_data(),
      and incrementing card->seqno.ipa along the way.
      
      So one IPA command has no reply that's waiting for its seqno, while some
      other IPA command has multiple reply objects waiting for it.
      Only one of those waiting replies wins, and the other(s) time out and
      trigger a recovery via send_ipa_cmd().
      
      Fix this by making sure that the same seqno value is assigned to
      a command and its reply object.
      Do so immediately before submitting the command & while holding the
      irq_pending "lock", to produce nicely ascending seqnos.
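
      The race and its fix can be modelled with a shared counter in
      Python. This is a simplified sketch under stated assumptions: Card,
      build_racy(), send_racy() and send_fixed() are hypothetical names,
      commands and replies are plain dicts, and a threading.Lock stands in
      for the irq_pending "lock".

```python
import threading


class Card:
    def __init__(self):
        self.seqno = 0
        self.lock = threading.Lock()  # stand-in for the irq_pending "lock"


def build_racy(card):
    """Old scheme: the command header is numbered at build time."""
    return {"seqno": card.seqno}


def send_racy(card, cmd):
    """Old scheme: the reply object is numbered later, at send time."""
    reply = {"seqno": card.seqno}
    card.seqno += 1
    return reply


def send_fixed(card, cmd):
    """Fixed scheme: number command and reply together, under the lock."""
    with card.lock:
        cmd["seqno"] = card.seqno
        reply = {"seqno": card.seqno}
        card.seqno += 1
    return reply
```

      Building two commands before sending either one is enough to make
      the second command and its reply disagree in the old scheme, while
      the fixed scheme hands out matching, ascending numbers.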
      
      As a side effect, *all* IPA commands now use a reply object that's
      waiting for its actual seqno. Previously, early IPA commands that were
      submitted while the card was still DOWN used the "catch-all" IDX seqno.
      Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • s390/qeth: fix IP address lookup for L3 devices · c5c48c58
      Julian Wiedmann authored
      Current code ("qeth_l3_ip_from_hash()") matches a queried address object
      against objects in the IP table by IP address, Mask/Prefix Length and
      MAC address ("qeth_l3_ipaddrs_is_equal()"). But what callers actually
      require is either
      a) "is this IP address registered" (ie. match by IP address only),
      before adding a new address.
      b) or "is this address object registered" (ie. match all relevant
         attributes), before deleting an address.
      
      Right now
      1. the ADD path is too strict in its lookup, and eg. doesn't detect
      conflicts between an existing NORMAL address and a new VIPA address
      (because the NORMAL address will have mask != 0, while VIPA has
      a mask == 0),
      2. the DELETE path is not strict enough, and eg. allows del_rxip() to
      delete a VIPA address as long as the IP address matches.
      
      Fix all this by adding helpers (_addr_match_ip() and _addr_match_all())
      that do the appropriate checking.
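
      The two kinds of lookup can be sketched in Python as follows. The
      Addr tuple and its fields are assumptions chosen for the example
      (the driver's address object carries more attributes), and the
      function names only echo the helpers named above.

```python
from collections import namedtuple

# Hypothetical address object: IP, mask/prefix length, address type.
Addr = namedtuple("Addr", "ip mask kind")


def addr_match_ip(a, b):
    """ADD path: is this IP address registered at all?"""
    return a.ip == b.ip


def addr_match_all(a, b):
    """DELETE path: is this exact address object registered?"""
    return a.ip == b.ip and a.mask == b.mask and a.kind == b.kind
```

      A NORMAL address with a non-zero mask and a VIPA address with mask 0
      on the same IP match by IP, so the ADD path sees the conflict, but
      they do not match in full, so a delete of one cannot remove the
      other.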
      
      Note that the ADD path for NORMAL addresses is special, as qeth keeps
      track of how many times such an address is in use (and there is no
      immediate way of returning errors to the caller). So when a requested
      NORMAL address _fully_ matches an existing one, it's not considered a
      conflict and we merely increment the refcount.
      
      Fixes: 5f78e29c ("qeth: optimize IP handling in rx_mode callback")
      Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Revert "s390/qeth: fix using of ref counter for rxip addresses" · 4964c66f
      Julian Wiedmann authored
      This reverts commit cb816192.
      
      The issue this attempted to fix never actually occurs.
      l3_add_rxip() checks (via l3_ip_from_hash()) if the requested address
      was previously added to the card. If so, it returns -EEXIST and doesn't
      call l3_add_ip().
      As a result, the "address exists" path in l3_add_ip() is never taken
      for rxip addresses, and this patch had no effect.
      
      Fixes: cb816192 ("s390/qeth: fix using of ref counter for rxip addresses")
      Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • s390/qeth: fix double-free on IP add/remove race · 14d066c3
      Julian Wiedmann authored
      Registering an IPv4 address with the HW takes quite a while, so we
      temporarily drop the ip_htable lock. Any concurrent add/remove of the
      same IP adjusts the IP's use count, and (on remove) is then blocked by
      addr->in_progress.
      After the register call has completed, we check the use count for
      concurrently attempted add/remove calls - and possibly straight-away
      deregister the IP again. This happens via l3_delete_ip(), which
      1) looks up the queried IP in the htable (getting a reference to the
         *same* queried object),
      2) deregisters the IP from the HW, and
      3) frees the IP object.
      
      The caller in l3_add_ip() then does a second free on the same object.
      
      For this case, skip all the extra checks and lookups in l3_delete_ip()
      and just deregister & free the IP object ourselves.
      
      Fixes: 5f78e29c ("qeth: optimize IP handling in rx_mode callback")
      Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • s390/qeth: fix IP removal on offline cards · 98d823ab
      Julian Wiedmann authored
      If the HW is not reachable, then none of the IPs in qeth's internal
      table has been registered with the HW yet. So when deleting such an IP,
      there's no need to stage it for deregistration - just drop it from
      the table.
      
      This fixes the "add-delete-add" scenario on an offline card, where the
      second "add" merely increments the IP's use count. But as the IP is
      still set to DISP_ADDR_DELETE from the previous "delete" step,
      l3_recover_ip() won't register it with the HW when the card goes online.
      
      Fixes: 5f78e29c ("qeth: optimize IP handling in rx_mode callback")
      Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • s390/qeth: fix overestimated count of buffer elements · 12472af8
      Julian Wiedmann authored
      qeth_get_elements_for_range() doesn't know how to handle a 0-length
      range (ie. start == end), and returns 1 when it should return 0.
      Such ranges occur on TSO skbs, where the L2/L3/L4 headers (and thus all
      of the skb's linear data) are skipped when mapping the skb into regular
      buffer elements.
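
      To see how an empty range can be miscounted, consider a generic
      "pages touched by a byte range" formula. The Python sketch below is
      an illustration only: the formula is a common textbook one and is
      not claimed to be the driver's, and both function names and the
      page size are assumptions for the example.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


def pfn_down(addr):
    """Index of the page that contains addr."""
    return addr // PAGE_SIZE


def elements_for_range(start, end):
    """Pages touched by [start, end); only valid when end > start."""
    return pfn_down(end - 1) - pfn_down(start) + 1


def elements_for_caller(start, end):
    """Caller-side guard: an empty range needs no buffer element."""
    if start == end:
        return 0
    return elements_for_range(start, end)
```

      For an empty range that starts in the middle of a page, the
      unguarded formula still reports one element, which is the kind of
      overestimate the caller-side check avoids.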
      
      This overestimation may cause several performance-related issues:
      1. sub-optimal IO buffer selection, where the next buffer gets selected
         even though the skb would actually still fit into the current buffer.
      2. forced linearization, if the element count for a non-linear skb
         exceeds QETH_MAX_BUFFER_ELEMENTS.
      
      Rather than modifying qeth_get_elements_for_range() and adding overhead
      to every caller, fix up those callers that are at risk of passing a
      0-length range.
      
      Fixes: 2863c613 ("qeth: refactor calculation of SBALE count")
      Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gianfar: Fix Rx byte accounting for ndev stats · 590399dd
      Claudiu Manoil authored
      Don't include in the Rx bytecount of the packet sent up the stack:
      the FCB (frame control block), and the padding bytes inserted by
      the controller into the frame payload, nor the FCS. All these are
      being pulled out of the skb by gfar_process_frame().
      This issue is old, likely from the driver's beginnings, however
      it was amplified by recent:
      commit d903ec77 ("gianfar: simplify FCS handling and fix memory leak")
      which basically added the FCS to the Rx bytecount, and so brought
      this to my attention.
      Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 27 Feb, 2018 5 commits