1. 15 Nov, 2015 13 commits
    • Merge branch 'packet-fixes' · 52b46202
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      packet fixes
      
      Fixes a couple of issues in packet sockets, i.e. on TX ring side. See
      individual patches for details.
      
      v2 -> v3:
       - First two patches unchanged, kept Jason's Ack
       - Reworked 3rd patch and split into 3:
        - check for dev type as discussed with Willem
        - infer skb->protocol
        - fix max len for dgram
      v1 -> v2:
       - Added patch 2 as suggested by Dave
       - Rest is unchanged from previous submission
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • packet: fix tpacket_snd max frame len · 5cfb4c8d
      Daniel Borkmann authored
      Since its introduction in commit 69e3c75f ("net: TX_RING and
      packet mmap"), TX_RING could be used from SOCK_DGRAM and SOCK_RAW
      side. When used with SOCK_DGRAM only, the size_max > dev->mtu +
      reserve check should have reserve as 0, but currently, this is
      unconditionally set (in its original form as dev->hard_header_len).
      
      I think this is not correct since tpacket_fill_skb() would then
      take dev->mtu and dev->hard_header_len into account for SOCK_DGRAM;
      the extra VLAN_HLEN could be possible in both cases. Presumably, the
      reserve code was copied from packet_snd(), but later on missed the
      check. Make it similar to what we have in packet_snd().
      
      Fixes: 69e3c75f ("net: TX_RING and packet mmap")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
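
The length check described in this commit can be modelled in userspace roughly as follows. This is a simplified sketch, not the kernel code: the function name `tx_ring_size_max` and the `slot_room` parameter (the payload room of one ring slot) are hypothetical, `VLAN_HLEN` is defined locally as 4, and the VLAN allowance is assumed to apply in both socket modes, as the message suggests.

```c
#include <stddef.h>
#include <sys/socket.h> /* SOCK_RAW, SOCK_DGRAM */

#define VLAN_HLEN 4 /* extra bytes allowed for a VLAN tag */

/*
 * Upper bound on the frame length accepted from a TX_RING slot.
 * For SOCK_RAW the caller supplies the link-layer header, so it is
 * reserved on top of the MTU; for SOCK_DGRAM the reserve is 0.
 */
size_t tx_ring_size_max(int sock_type, size_t mtu,
                        size_t hard_header_len, size_t slot_room)
{
    size_t reserve = (sock_type == SOCK_RAW) ? hard_header_len : 0;
    size_t limit = mtu + reserve + VLAN_HLEN;

    return slot_room < limit ? slot_room : limit;
}
```

With an MTU of 1500 and a 14-byte header, the sketch caps SOCK_RAW frames at 1518 bytes and SOCK_DGRAM frames at 1504 bytes, provided the slot itself is large enough.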
    • packet: infer protocol from ethernet header if unset · c72219b7
      Daniel Borkmann authored
      In case no struct sockaddr_ll has been passed to packet
      socket's sendmsg() when doing a TX_RING flush run, then
      skb->protocol is set to po->num instead, which is the protocol
      passed via socket(2)/bind(2).
      
      Applications only xmitting can go the path of allocating the
      socket as socket(PF_PACKET, <mode>, 0) and do a bind(2) on the
      TX_RING with sll_protocol of 0. That way, register_prot_hook()
      is neither called on creation nor on bind time, which saves
      cycles when there's no interest in capturing anyway.
      
      That leaves us however with po->num 0 instead and therefore
      the TX_RING flush run sets skb->protocol to 0 as well. Eric
      reported that this leads to problems when using tools like
      trafgen over bonding device. I.e. the bonding's hash function
      could invoke the kernel's flow dissector, which depends on
      skb->protocol being properly set. In the current situation, all
      the traffic is then directed to a single slave.
      
      Fix it up by inferring skb->protocol from the Ethernet header
      when not set and we have ARPHRD_ETHER device type. This is only
      done in case of SOCK_RAW and where we have a dev->hard_header_len
      length. In case of ARPHRD_ETHER devices, this is guaranteed to
      cover ETH_HLEN, and can therefore be accessed on the skb after
      the skb_store_bits().
      Reported-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
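
To illustrate the inference step, here is a minimal userspace sketch, assuming a plain byte buffer in place of an skb. The function name `infer_protocol` and the `raw_ether` flag (standing in for "SOCK_RAW on an ARPHRD_ETHER device") are hypothetical, and unlike skb->protocol, which is kept in network byte order, the sketch returns the EtherType in host order to keep it portable.

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_HLEN 14 /* destination MAC (6) + source MAC (6) + EtherType (2) */

/*
 * Return the protocol to use for a frame. A protocol that is already
 * set is kept; otherwise, for raw Ethernet frames that are long enough,
 * the EtherType at byte offset 12 is read from the header.
 */
uint16_t infer_protocol(uint16_t protocol, int raw_ether,
                        const uint8_t *frame, size_t len)
{
    if (protocol != 0 || !raw_ether || len < ETH_HLEN)
        return protocol;

    return (uint16_t)((frame[12] << 8) | frame[13]);
}
```

A frame whose bytes 12 and 13 are 0x08 0x00 therefore yields 0x0800 (IPv4) when no protocol was set.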
    • packet: only allow extra vlan len on ethernet devices · 3c70c132
      Daniel Borkmann authored
      Packet sockets can be used by various net devices and are not
      really restricted to ARPHRD_ETHER device types. However, when
      currently checking for the extra 4 bytes that can be transmitted
      in VLAN case, our assumption is that we generally probe on
      ARPHRD_ETHER devices. Therefore, before looking into Ethernet
      header, check the device type first.
      
      This also fixes the issue where non-ARPHRD_ETHER devices could
      have no dev->hard_header_len in TX_RING SOCK_RAW case, and thus
      the check would test unfilled linear part of the skb (instead
      of non-linear).
      
      Fixes: 57f89bfa ("network: Allow af_packet to transmit +4 bytes for VLAN packets.")
      Fixes: 52f1454f ("packet: allow to transmit +4 byte in TX_RING slot for VLAN case")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
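
The ordering of the two checks (device type first, Ethernet header second) might be sketched like this in userspace. The function and macro names are hypothetical and the constants are defined locally; the values 1 and 0x8100 are those of ARPHRD_ETHER and ETH_P_8021Q in the Linux UAPI headers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_ARPHRD_ETHER 1      /* ARPHRD_ETHER device type */
#define MODEL_ETH_P_8021Q  0x8100 /* EtherType of an 802.1Q VLAN tag */
#define MODEL_ETH_HLEN     14     /* length of an Ethernet header */

/*
 * Decide whether a frame may exceed the usual limit by the 4 VLAN
 * bytes. The Ethernet header is only inspected once the device is
 * known to be an Ethernet device and the header is fully present.
 */
bool extra_vlan_len_allowed(unsigned dev_type,
                            const uint8_t *frame, size_t len)
{
    if (dev_type != MODEL_ARPHRD_ETHER)
        return false;
    if (len < MODEL_ETH_HLEN)
        return false;

    return ((frame[12] << 8) | frame[13]) == MODEL_ETH_P_8021Q;
}
```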
    • packet: always probe for transport header · 8fd6c80d
      Daniel Borkmann authored
      We concluded that the skb_probe_transport_header() should better be
      called unconditionally. Avoiding the call into the flow dissector has
      also not really much to do with the direct xmit mode.
      
      While it seems that only virtio_net code makes use of GSO from non
      RX/TX ring packet socket paths, we should probe for a transport header
      nevertheless before they hit devices.
      
      Reference: http://thread.gmane.org/gmane.linux.network/386173/
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • packet: do skb_probe_transport_header when we actually have data · efdfa2f7
      Daniel Borkmann authored
      In tpacket_fill_skb(), commit c1aad275 ("packet: set transport
      header before doing xmit") and later on 40893fd0 ("net: switch
      to use skb_probe_transport_header()") were probing for a transport
      header on the skb from a ring buffer slot, but at a time when
      the skb has _not even_ been filled with data yet. So that call into
      the flow dissector is pretty useless. Let's do it after we've set
      up the skb frags.
      
      Fixes: c1aad275 ("packet: set transport header before doing xmit")
      Reported-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tools/net: Use include/uapi with __EXPORTED_HEADERS__ · d7475de5
      Kamal Mostafa authored
      Use the local uapi headers to keep in sync with "recently" added #define's
      (e.g. SKF_AD_VLAN_TPID).  Refactored CFLAGS, and bpf_asm doesn't need -I.
      
      Fixes: 3f356385 ("filter: bpf_asm: add minimal bpf asm tool")
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'ipv6-route-fixes' · e63e904c
      David S. Miller authored
      Martin KaFai Lau says:
      
      ====================
      ipv6: Fixes for pmtu update and DST_NOCACHE route
      
      This patchset fixes:
      1. An oops during IPv6 pmtu update on an IPv4 GRE running
         in an IPSec setup
      2. Misc fixes on DST_NOCACHE route
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: Check rt->dst.from for the DST_NOCACHE route · 02bcf4e0
      Martin KaFai Lau authored
      All DST_NOCACHE rt6_info used to have rt->dst.from set to
      its parent.
      
      After commit 8e3d5be7 ("ipv6: Avoid double dst_free"),
      DST_NOCACHE is also set to rt6_info which does not have
      a parent (i.e. rt->dst.from is NULL).
      
      This patch catches the rt->dst.from == NULL case.
      
      Fixes: 8e3d5be7 ("ipv6: Avoid double dst_free")
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: Check expire on DST_NOCACHE route · 5973fb1e
      Martin KaFai Lau authored
      Since the expires of the DST_NOCACHE rt can be set during
      the ip6_rt_update_pmtu(), we also need to consider the expires
      value when doing ip6_dst_check().
      
      This patch creates __rt6_check_expired() to only
      check the expire value (if one exists) of the current rt.
      
      In rt6_dst_from_check(), it adds __rt6_check_expired() as
      one of the condition checks.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
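
An expiry check of the kind this commit describes ("only if one exists") could look roughly like the sketch below. It is a userspace model under stated assumptions: `struct route_model`, `ROUTE_HAS_EXPIRY` and `route_expired` are hypothetical names, and the wraparound-safe comparison mirrors the signed-difference idiom used for jiffies-style tick counters.

```c
#include <stdbool.h>

#define ROUTE_HAS_EXPIRY 0x1u /* hypothetical flag: an expiry time is set */

struct route_model {
    unsigned int flags;
    unsigned long expires; /* tick count at which the route expires */
};

/*
 * A route without an expiry never expires. Otherwise it is expired
 * once 'now' is strictly after 'expires'. The signed difference keeps
 * the comparison correct when the tick counter wraps around.
 */
bool route_expired(const struct route_model *rt, unsigned long now)
{
    if (!(rt->flags & ROUTE_HAS_EXPIRY))
        return false;

    return (long)(rt->expires - now) < 0;
}
```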
    • ipv6: Avoid creating RTF_CACHE from a rt that is not managed by fib6 tree · 0d3f6d29
      Martin KaFai Lau authored
      The original bug report:
      https://bugzilla.redhat.com/show_bug.cgi?id=1272571
      
      The setup has an IPv4 GRE tunnel running in IPSec.  The bug
      happens when ndisc starts sending router solicitation at the gre
      interface.  The simplified oops stack is like:
      
      __lock_acquire+0x1b2/0x1c30
      lock_acquire+0xb9/0x140
      _raw_write_lock_bh+0x3f/0x50
      __ip6_ins_rt+0x2e/0x60
      ip6_ins_rt+0x49/0x50
      ~~~~~~~~
      __ip6_rt_update_pmtu.part.54+0x145/0x250
      ip6_rt_update_pmtu+0x2e/0x40
      ~~~~~~~~
      ip_tunnel_xmit+0x1f1/0xf40
      __gre_xmit+0x7a/0x90
      ipgre_xmit+0x15a/0x220
      dev_hard_start_xmit+0x2bd/0x480
      __dev_queue_xmit+0x696/0x730
      dev_queue_xmit+0x10/0x20
      neigh_direct_output+0x11/0x20
      ip6_finish_output2+0x21f/0x770
      ip6_finish_output+0xa7/0x1d0
      ip6_output+0x56/0x190
      ~~~~~~~~
      ndisc_send_skb+0x1d9/0x400
      ndisc_send_rs+0x88/0xc0
      ~~~~~~~~
      
      The rt passed to ip6_rt_update_pmtu() is created by
      icmp6_dst_alloc() and it is not managed by the fib6 tree,
      so its rt6i_table == NULL.  When __ip6_rt_update_pmtu() creates
      a RTF_CACHE clone, the newly created clone also has rt6i_table == NULL
      and it causes the ip6_ins_rt() oops.
      
      During pmtu update, we only want to create a RTF_CACHE clone
      from a rt which is currently managed (or owned) by the
      fib6 tree.  It means either rt->rt6i_node != NULL or
      rt is a RTF_PCPU clone.
      
      It is worth noting that rt6i_table may be non-NULL even if the rt is
      not (yet) managed by the fib6 tree (e.g. addrconf_dst_alloc()).
      Hence, rt6i_node is a better check than rt6i_table.
      
      Fixes: 45e4fd26 ("ipv6: Only create RTF_CACHE routes after encountering pmtu")
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Reported-by: Chris Siebenmann <cks-rhbugzilla@cs.toronto.edu>
      Cc: Chris Siebenmann <cks-rhbugzilla@cs.toronto.edu>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
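
The ownership test described in the message (rt6i_node set, or an RTF_PCPU clone) can be pictured with a small userspace model. Everything here is illustrative: the struct layouts, `MODEL_RTF_PCPU` and `managed_by_fib_tree` are hypothetical stand-ins, not the kernel's definitions. The `table` field is deliberately ignored to reflect the point that a non-NULL table does not prove ownership.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_RTF_PCPU 0x1u /* hypothetical flag value for a per-cpu clone */

struct fib_node_model { int unused; };

struct rt_model {
    unsigned int flags;
    const struct fib_node_model *node; /* models rt6i_node */
    const void *table;                 /* models rt6i_table */
};

/*
 * A route counts as managed by the tree when it is linked to a tree
 * node or is a per-cpu clone of such a route. The table pointer is
 * not consulted, because it can be set before the route is inserted.
 */
bool managed_by_fib_tree(const struct rt_model *rt)
{
    return rt->node != NULL || (rt->flags & MODEL_RTF_PCPU) != 0;
}
```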
    • fjes: fix inconsistent indenting · 9001d94d
      Colin Ian King authored
      minor change, indenting is one tab out.
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • af-unix: fix use-after-free with concurrent readers while splicing · 73ed5d25
      Hannes Frederic Sowa authored
      When splicing an af-unix socket to a pipe we have to drop all
      af-unix socket locks. While doing so we allow another reader to enter
      unix_stream_read_generic which can read, copy and finally free another
      skb. If exactly this skb is just in the process of being spliced we get a
      use-after-free report by kasan.
      
      First, we must make sure to not have a free while the skb is used during
      the splice operation. We simply increment its use counter before unlocking
      the reader lock.
      
      Stream sockets have the nice characteristic that we don't care about
      zero length writes and they never reach the peer socket's queue. That
      said, we can take the UNIXCB.consumed field as the indicator if the
      skb was already freed from the socket's receive queue. If the skb was
      fully consumed after we locked the reader side again we know it has been
      dropped by a second reader. We indicate a short read to user space and
      abort the current splice operation.
      
      This bug has been found with syzkaller
      (http://github.com/google/syzkaller) by Dmitry Vyukov.
      
      Fixes: 2b514574 ("net: af_unix: implement splice for stream af_unix sockets")
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 12 Nov, 2015 3 commits
    • stmmac: avoid ipq806x constant overflow warning · 49e4a229
      Arnd Bergmann authored
      Building dwmac-ipq806x on a 64-bit architecture produces a harmless
      warning from gcc:
      
      stmmac/dwmac-ipq806x.c: In function 'ipq806x_gmac_probe':
      include/linux/bitops.h:6:19: warning: overflow in implicit constant conversion [-Woverflow]
        val = QSGMII_PHY_CDR_EN |
      stmmac/dwmac-ipq806x.c:333:8: note: in expansion of macro 'QSGMII_PHY_CDR_EN'
       #define QSGMII_PHY_CDR_EN   BIT(0)
       #define BIT(nr)   (1UL << (nr))
      
      This is a result of the type conversion rules in C, when we take the
      bitwise OR of multiple different types. In particular, we have
      an unsigned long
      
      	QSGMII_PHY_CDR_EN == BIT(0) == (1ul << 0) == 0x0000000000000001ul
      
      and a signed int
      
      	0xC << QSGMII_PHY_TX_DRV_AMP_OFFSET == 0xc0000000
      
      which together gives a signed long value
      
      	0xffffffffc0000001l
      
      and when this is passed into a function that takes an unsigned int type,
      gcc warns about the signed overflow and the loss of the upper 32 bits that
      are all ones.
      
      This patch adds 'ul' type modifiers to the literal numbers passed in
      here, so now the expression remains an 'unsigned long' with the upper
      bits all zero, and that avoids the signed overflow and the warning.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Fixes: b1c17215 ("stmmac: add ipq806x glue layer")
      Signed-off-by: David S. Miller <davem@davemloft.net>
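
The conversion described in this message can be reproduced in a small standalone program. This is a sketch under stated assumptions: `uint64_t` stands in for the 64-bit `unsigned long`, the shift amount 28 is inferred from `0xC << OFFSET == 0xc0000000`, and the function names are illustrative. The negative `int32_t` is produced with a cast rather than a signed shift to keep the example free of undefined behaviour.

```c
#include <stdint.h>

#define AMP_SHIFT 28 /* assumed shift: 0xC << 28 == 0xc0000000 */

/*
 * Before the fix: a 64-bit unsigned value ORed with a signed 32-bit
 * value whose top bit is set. The signed operand is sign-extended to
 * 64 bits before the OR, so the upper 32 bits become all ones.
 */
uint64_t or_with_signed_int(void)
{
    int32_t amp = (int32_t)UINT32_C(0xc0000000); /* models (int)(0xC << 28) */

    return UINT64_C(1) | amp;
}

/*
 * After the fix: both operands are unsigned and 64 bits wide, so the
 * upper 32 bits stay zero.
 */
uint64_t or_with_unsigned(void)
{
    return UINT64_C(1) | (UINT64_C(0xC) << AMP_SHIFT);
}
```

Truncating either result to 32 bits gives 0xc0000001, which is why the warning is harmless in practice even though the intermediate values differ.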
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · 382a483e
      David S. Miller authored
      Pablo Neira Ayuso:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for your net tree. This
      is a large batch that includes fixes for ipset, netfilter ingress, nf_tables
      dynamic set instantiation and a longstanding Kconfig dependency problem.
      More specifically, they are:
      
      1) Add missing check for empty hook list at the ingress hook, from
         Florian Westphal.
      
      2) Input and output interface are swapped at the ingress hook,
         reported by Patrick McHardy.
      
      3) Resolve ipset extension alignment issues on ARM, patch from Jozsef
         Kadlecsik.
      
      4) Fix bit check on bitmap in ipset hash type, also from Jozsef.
      
      5) Release buckets when all entries have expired in ipset hash type,
         again from Jozsef.
      
      6) Oneliner to initialize conntrack tuple object in the PPTP helper,
         otherwise the conntrack lookup may fail due to random bits in the
         structure holes, patch from Anthony Lineham.
      
      7) Silence a bogus gcc warning in nfnetlink_log, from Arnd Bergmann.
      
      8) Fix Kconfig dependency problems with TPROXY, socket and dup, also
         from Arnd.
      
      9) Add __netdev_alloc_pcpu_stats() to allow creating percpu counters
         from atomic context, this is required by the follow up fix for
         nf_tables.
      
      10) Fix crash from the dynamic set expression, we have to add a new clone
          operation that should be defined when a simple memcpy is not enough.
          This resolves a crash when using per-cpu counters with Patrick
          McHardy's new flow table nft support.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • r8169: fix kasan reported skb use-after-free. · 39174291
      françois romieu authored
      Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
      Reported-by: Dave Jones <davej@codemonkey.org.uk>
      Fixes: d7d2d89d ("r8169: Add software counter for multicast packages")
      Acked-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Corinna Vinschen <vinschen@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 11 Nov, 2015 24 commits