1. 08 Jun, 2016 7 commits
    • Jakub Sitnicki's avatar
      ipv6: Skip XFRM lookup if dst_entry in socket cache is valid · 00bc0ef5
      Jakub Sitnicki authored
      At present we perform an xfrm_lookup() for each UDPv6 message we
      send. The lookup involves querying the flow cache (flow_cache_lookup)
      and, in case of a cache miss, creating an XFRM bundle.
      
      If we miss the flow cache, we can end up creating a new bundle and
      deriving the path MTU (xfrm_init_pmtu) from on an already transformed
      dst_entry, which we pass from the socket cache (sk->sk_dst_cache) down
      to xfrm_lookup(). This can happen only if we're caching the dst_entry
      in the socket, that is when we're using a connected UDP socket.
      
      To put it another way, the path MTU shrinks each time we miss the flow
      cache, which later on leads to incorrectly fragmented payload. It can
      be observed with ESPv6 in transport mode:
      
        1) Set up a transformation and lower the MTU to trigger fragmentation
          # ip xfrm policy add dir out src ::1 dst ::1 \
            tmpl src ::1 dst ::1 proto esp spi 1
          # ip xfrm state add src ::1 dst ::1 \
            proto esp spi 1 enc 'aes' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b
          # ip link set dev lo mtu 1500
      
        2) Monitor the packet flow and set up an UDP sink
          # tcpdump -ni lo -ttt &
          # socat udp6-listen:12345,fork /dev/null &
      
        3) Send a datagram that needs fragmentation with a connected socket
          # perl -e 'print "@" x 1470 | socat - udp6:[::1]:12345
          2016/06/07 18:52:52 socat[724] E read(3, 0x555bb3d5ba00, 8192): Protocol error
          00:00:00.000000 IP6 ::1 > ::1: frag (0|1448) ESP(spi=0x00000001,seq=0x2), length 1448
          00:00:00.000014 IP6 ::1 > ::1: frag (1448|32)
          00:00:00.000050 IP6 ::1 > ::1: ESP(spi=0x00000001,seq=0x3), length 1272
          (^ ICMPv6 Parameter Problem)
          00:00:00.000022 IP6 ::1 > ::1: ESP(spi=0x00000001,seq=0x5), length 136
      
        4) Compare it to a non-connected socket
          # perl -e 'print "@" x 1500' | socat - udp6-sendto:[::1]:12345
          00:00:40.535488 IP6 ::1 > ::1: frag (0|1448) ESP(spi=0x00000001,seq=0x6), length 1448
          00:00:00.000010 IP6 ::1 > ::1: frag (1448|64)
      
      What happens in step (3) is:
      
        1) when connecting the socket in __ip6_datagram_connect(), we
           perform an XFRM lookup, miss the flow cache, create an XFRM
           bundle, and cache the destination,
      
        2) afterwards, when sending the datagram, we perform an XFRM lookup,
           again, miss the flow cache (due to mismatch of flowi6_iif and
           flowi6_oif, which is an issue of its own), and recreate an XFRM
           bundle based on the cached (and already transformed) destination.
      
      To prevent the recreation of an XFRM bundle, avoid an XFRM lookup
      altogether whenever we already have a destination entry cached in the
      socket. This prevents the path MTU shrinkage and brings us on par with
      UDPv4.
      
      The fix also benefits connected PINGv6 sockets, another user of
      ip6_sk_dst_lookup_flow(), who also suffer messages being transformed
      twice.
      
      Joint work with Hannes Frederic Sowa.
      Reported-by: default avatarJan Tluka <jtluka@redhat.com>
      Signed-off-by: default avatarJakub Sitnicki <jkbs@redhat.com>
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      00bc0ef5
    • Guillaume Nault's avatar
      l2tp: fix configuration passed to setup_udp_tunnel_sock() · a5c5e2da
      Guillaume Nault authored
      Unused fields of udp_cfg must be all zeros. Otherwise
      setup_udp_tunnel_sock() fills ->gro_receive and ->gro_complete
      callbacks with garbage, eventually resulting in panic when used by
      udp_gro_receive().
      
      [   72.694123] BUG: unable to handle kernel paging request at ffff880033f87d78
      [   72.695518] IP: [<ffff880033f87d78>] 0xffff880033f87d78
      [   72.696530] PGD 26e2067 PUD 26e3067 PMD 342ed063 PTE 8000000033f87163
      [   72.696530] Oops: 0011 [#1] SMP KASAN
      [   72.696530] Modules linked in: l2tp_ppp l2tp_netlink l2tp_core ip6_udp_tunnel udp_tunnel pptp gre pppox ppp_generic slhc crc32c_intel ghash_clmulni_intel jitterentropy_rng sha256_generic hmac drbg ansi_cprng aesni_intel evdev aes_x86_64 ablk_helper cryptd lrw gf128mul glue_helper serio_raw acpi_cpufreq button proc\
      essor ext4 crc16 jbd2 mbcache virtio_blk virtio_net virtio_pci virtio_ring virtio
      [   72.696530] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.7.0-rc1 #1
      [   72.696530] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014
      [   72.696530] task: ffff880035b59700 ti: ffff880035b70000 task.ti: ffff880035b70000
      [   72.696530] RIP: 0010:[<ffff880033f87d78>]  [<ffff880033f87d78>] 0xffff880033f87d78
      [   72.696530] RSP: 0018:ffff880035f87bc0  EFLAGS: 00010246
      [   72.696530] RAX: ffffed000698f996 RBX: ffff88003326b840 RCX: ffffffff814cc823
      [   72.696530] RDX: ffff88003326b840 RSI: ffff880033e48038 RDI: ffff880034c7c780
      [   72.696530] RBP: ffff880035f87c18 R08: 000000000000a506 R09: 0000000000000000
      [   72.696530] R10: ffff880035f87b38 R11: ffff880034b9344d R12: 00000000ebfea715
      [   72.696530] R13: 0000000000000000 R14: ffff880034c7c780 R15: 0000000000000000
      [   72.696530] FS:  0000000000000000(0000) GS:ffff880035f80000(0000) knlGS:0000000000000000
      [   72.696530] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   72.696530] CR2: ffff880033f87d78 CR3: 0000000033c98000 CR4: 00000000000406a0
      [   72.696530] Stack:
      [   72.696530]  ffffffff814cc834 ffff880034b93468 0000001481416818 ffff88003326b874
      [   72.696530]  ffff880034c7ccb0 ffff880033e48038 ffff88003326b840 ffff880034b93462
      [   72.696530]  ffff88003326b88a ffff88003326b88c ffff880034b93468 ffff880035f87c70
      [   72.696530] Call Trace:
      [   72.696530]  <IRQ>
      [   72.696530]  [<ffffffff814cc834>] ? udp_gro_receive+0x1c6/0x1f9
      [   72.696530]  [<ffffffff814ccb1c>] udp4_gro_receive+0x2b5/0x310
      [   72.696530]  [<ffffffff814d989b>] inet_gro_receive+0x4a3/0x4cd
      [   72.696530]  [<ffffffff81431b32>] dev_gro_receive+0x584/0x7a3
      [   72.696530]  [<ffffffff810adf7a>] ? __lock_is_held+0x29/0x64
      [   72.696530]  [<ffffffff814321f7>] napi_gro_receive+0x124/0x21d
      [   72.696530]  [<ffffffffa000b145>] virtnet_receive+0x8df/0x8f6 [virtio_net]
      [   72.696530]  [<ffffffffa000b27e>] virtnet_poll+0x1d/0x8d [virtio_net]
      [   72.696530]  [<ffffffff81431350>] net_rx_action+0x15b/0x3b9
      [   72.696530]  [<ffffffff815893d6>] __do_softirq+0x216/0x546
      [   72.696530]  [<ffffffff81062392>] irq_exit+0x49/0xb6
      [   72.696530]  [<ffffffff81588e9a>] do_IRQ+0xe2/0xfa
      [   72.696530]  [<ffffffff81587a49>] common_interrupt+0x89/0x89
      [   72.696530]  <EOI>
      [   72.696530]  [<ffffffff810b05df>] ? trace_hardirqs_on_caller+0x229/0x270
      [   72.696530]  [<ffffffff8102b3c7>] ? default_idle+0x1c/0x2d
      [   72.696530]  [<ffffffff8102b3c5>] ? default_idle+0x1a/0x2d
      [   72.696530]  [<ffffffff8102bb8c>] arch_cpu_idle+0xa/0xc
      [   72.696530]  [<ffffffff810a6c39>] default_idle_call+0x1a/0x1c
      [   72.696530]  [<ffffffff810a6d96>] cpu_startup_entry+0x15b/0x20f
      [   72.696530]  [<ffffffff81039a81>] start_secondary+0x12c/0x133
      [   72.696530] Code: ff ff ff ff ff ff ff ff ff ff 7f ff ff ff ff ff ff ff 7f 00 7e f8 33 00 88 ff ff 6d 61 58 81 ff ff ff ff 5e de 0a 81 ff ff ff ff <00> 5c e2 34 00 88 ff ff 00 00 00 00 00 00 00 00 00 00 00 00 00
      [   72.696530] RIP  [<ffff880033f87d78>] 0xffff880033f87d78
      [   72.696530]  RSP <ffff880035f87bc0>
      [   72.696530] CR2: ffff880033f87d78
      [   72.696530] ---[ end trace ad7758b9a1dccf99 ]---
      [   72.696530] Kernel panic - not syncing: Fatal exception in interrupt
      [   72.696530] Kernel Offset: disabled
      [   72.696530] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
      
      v2: use empty initialiser instead of "{ NULL }" to avoid relying on
          first field's type.
      
      Fixes: 38fd2af2 ("udp: Add socket based GRO and config")
      Signed-off-by: default avatarGuillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a5c5e2da
    • Hariprasad Shenai's avatar
    • Ben Dooks's avatar
      net-sysfs: fix missing <linux/of_net.h> · 88832a22
      Ben Dooks authored
      The of_find_net_device_by_node() function is defined in
      <linux/of_net.h> but not included in the .c file that
      implements it. Fix the following warning by including the
      header:
      
      net/core/net-sysfs.c:1494:19: warning: symbol 'of_find_net_device_by_node' was not declared. Should it be static?
      Signed-off-by: default avatarBen Dooks <ben.dooks@codethink.co.uk>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      88832a22
    • Toshiaki Makita's avatar
      bridge: Don't insert unnecessary local fdb entry on changing mac address · 0b148def
      Toshiaki Makita authored
      The missing br_vlan_should_use() test caused creation of an unneeded
      local fdb entry on changing mac address of a bridge device when there is
      a vlan which is configured on a bridge port but not on the bridge
      device.
      
      Fixes: 2594e906 ("bridge: vlan: add per-vlan struct and move to rhashtables")
      Signed-off-by: default avatarToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Acked-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0b148def
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · 32565644
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter/IPVS fixes for net
      
      The following patchset contains two Netfilter/IPVS fixes for your net
      tree, they are:
      
      1) Fix missing alignment in next offset calculation for standard
         targets, introduced in the previous merge window, patch from
         Florian Westphal.
      
      2) Fix to correct the handling of outgoing connections which use the
         SIP-pe such that the binding of a real-server is updated when needed.
         This was an omission from changes introduced by Marco Angaroni in
         the previous merge window too, to allow handling of outgoing
         connections by the SIP-pe. Patch and report came via Simon Horman.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      32565644
    • Yuchung Cheng's avatar
      tcp: record TLP and ER timer stats in v6 stats · ce3cf4ec
      Yuchung Cheng authored
      The v6 tcp stats scan do not provide TLP and ER timer information
      correctly like the v4 version . This patch fixes that.
      
      Fixes: 6ba8a3b1 ("tcp: Tail loss probe (TLP)")
      Fixes: eed530b6 ("tcp: early retransmit")
      Signed-off-by: default avatarYuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ce3cf4ec
  2. 07 Jun, 2016 20 commits
  3. 06 Jun, 2016 4 commits
    • Helge Deller's avatar
      soreuseport: add compat case for setsockopt SO_ATTACH_REUSEPORT_CBPF · 19575988
      Helge Deller authored
      Commit 538950a1 ("soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF")
      missed to add the compat case for the SO_ATTACH_REUSEPORT_CBPF option.
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      19575988
    • Helge Deller's avatar
      soreuseport: Fix reuseport_bpf testcase on 32bit architectures · fc100a7f
      Helge Deller authored
      This fixes the following compiler warnings when compiling the
      reuseport_bpf testcase on a 32 bit platform:
      
      reuseport_bpf.c: In function ‘attach_ebpf’:
      reuseport_bpf.c:114:15: warning: cast from pointer to integer of ifferent size [-Wpointer-to-int-cast]
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      fc100a7f
    • Michal Schmidt's avatar
      bnx2x: allow adding VLANs while interface is down · a02cc9d3
      Michal Schmidt authored
      Since implementing VLAN filtering in commit 05cc5a39
      ("bnx2x: add vlan filtering offload") bnx2x refuses to add a VLAN while
      the interface is down:
      
        # ip link add link enp3s0f0 enp3s0f0_10 type vlan id 10
        RTNETLINK answers: Bad address
      
      and in dmesg (with bnx2x.debug=0x20):
        bnx2x: [bnx2x_vlan_rx_add_vid:12941(enp3s0f0)]Ignoring VLAN
        configuration the interface is down
      
      Other drivers have no problem with this.
      Fix this peculiar behavior in the following way:
       - Accept requests to add/kill VID regardless of the device state.
         Maintain the requested list of VIDs in the bp->vlan_reg list.
       - If the device is up, try to configure the VID list into the hardware.
         If we run out of VLAN credits or encounter a failure configuring an
         entry, fall back to accepting all VLANs.
         If we successfully configure all entries from the list, turn the
         fallback off.
       - Use the same code for reconfiguring VLANs during NIC load.
      Signed-off-by: default avatarMichal Schmidt <mschmidt@redhat.com>
      Acked-by: default avatarYuval Mintz <Yuval.Mintz@qlogic.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a02cc9d3
    • Marco Angaroni's avatar
      ipvs: update real-server binding of outgoing connections in SIP-pe · 3ec10d3a
      Marco Angaroni authored
      Previous patch that introduced handling of outgoing packets in SIP
      persistent-engine did not call ip_vs_check_template() in case packet was
      matching a connection template. Assumption was that real-server was
      healthy, since it was sending a packet just in that moment.
      
      There are however real-server fault conditions requiring that association
      between call-id and real-server (represented by connection template)
      gets updated. Here is an example of the sequence of events:
        1) RS1 is a back2back user agent that handled call-id1 and call-id2
        2) RS1 is down and was marked as unavailable
        3) new message from outside comes to IPVS with call-id1
        4) IPVS reschedules the message to RS2, which becomes new call handler
        5) RS2 forwards the message outside, translating call-id1 to call-id2
        6) inside pe->conn_out() IPVS matches call-id2 with existing template
        7) IPVS does not change association call-id2 <-> RS1
        8) new message comes from client with call-id2
        9) IPVS reschedules the message to a real-server potentially different
           from RS2, which is now the correct destination
      
      This patch introduces ip_vs_check_template() call in the handling of
      outgoing packets for SIP-pe. And also introduces a second optional
      argument for ip_vs_check_template() that allows to check if dest
      associated to a connection template is the same dest that was identified
      as the source of the packet. This is to change the real-server bound to a
      particular call-id independently from its availability status: the idea
      is that it's more reliable, for in->out direction (where internal
      network can be considered trusted), to always associate a call-id with
      the last real-server that used it in one of its messages. Think about
      above sequence of events where, just after step 5, RS1 returns instead
      to be available.
      
      Comparison of dests is done by simply comparing pointers to struct
      ip_vs_dest; there should be no cases where struct ip_vs_dest keeps its
      memory address, but represent a different real-server in terms of
      ip-address / port.
      
      Fixes: 39b97223 ("ipvs: handle connections started by real-servers")
      Signed-off-by: default avatarMarco Angaroni <marcoangaroni@gmail.com>
      Acked-by: default avatarJulian Anastasov <ja@ssi.bg>
      Signed-off-by: default avatarSimon Horman <horms@verge.net.au>
      3ec10d3a
  4. 05 Jun, 2016 1 commit
  5. 04 Jun, 2016 1 commit
  6. 03 Jun, 2016 7 commits