1. 24 Jan, 2017 2 commits
    • netfilter: nf_tables: fix set->nelems counting with no NLM_F_EXCL · 35d0ac90
      Pablo Neira Ayuso authored
      If the element exists and no NLM_F_EXCL is specified, do not bump
      set->nelems, otherwise we leak one set element slot. This problem
      amplifies if the set is full since the abort path always decrements the
      counter for the -ENFILE case too, giving one spare extra slot.
      
      Fix this by moving set->nelems update to nft_add_set_elem() after
      successful element insertion. Moreover, remove the element if the set is
      full so there is no need to rely on the abort path to undo things
      anymore.
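A minimal sketch of the counting rule described above, using a toy fixed-size set rather than the kernel's data structures. All `toy_` names and the capacity are hypothetical, and the sketch checks capacity before inserting, whereas the commit describes removing the element again when the set turns out to be full.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_SET_SIZE 4 /* hypothetical capacity, not a kernel value */

struct toy_set {
    int elems[TOY_SET_SIZE];
    size_t nelems; /* plays the role of set->nelems */
};

static bool toy_set_contains(const struct toy_set *s, int key)
{
    for (size_t i = 0; i < s->nelems; i++)
        if (s->elems[i] == key)
            return true;
    return false;
}

/* 'excl' mimics NLM_F_EXCL: fail if the element already exists. */
static int toy_set_add(struct toy_set *s, int key, bool excl)
{
    assert(s != NULL);

    if (toy_set_contains(s, key))
        return excl ? -EEXIST : 0; /* existing element: counter untouched */

    if (s->nelems == TOY_SET_SIZE)
        return -ENFILE; /* set is full, nothing to undo */

    s->elems[s->nelems] = key;
    s->nelems++; /* bumped only after a successful insertion */
    return 0;
}
```

Re-adding an existing element without `excl` succeeds and leaves `nelems` unchanged, so no slot is leaked.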
      
      Fixes: c016c7e4 ("netfilter: nf_tables: honor NLM_F_EXCL flag in set element insertion")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nft_log: restrict the log prefix length to 127 · 5ce6b04c
      Liping Zhang authored
      First, log prefix will be truncated to NF_LOG_PREFIXLEN-1, i.e. 127,
      at nf_log_packet(), so the extra part is useless.
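The truncation mentioned above can be illustrated with a fixed-size buffer. This is only a userspace sketch: `TOY_PREFIXLEN` stands in for NF_LOG_PREFIXLEN (128, per the "i.e. 127" above), and `toy_store_prefix()` is a hypothetical helper, not the kernel function.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TOY_PREFIXLEN 128 /* stand-in for NF_LOG_PREFIXLEN */

/*
 * Copy 'prefix' into a fixed-size buffer and return the length that was
 * actually kept. Anything beyond TOY_PREFIXLEN - 1 characters is dropped.
 */
static size_t toy_store_prefix(char out[TOY_PREFIXLEN], const char *prefix)
{
    assert(out != NULL && prefix != NULL);
    snprintf(out, TOY_PREFIXLEN, "%s", prefix);
    return strlen(out);
}
```

A 300-character prefix ends up as 127 characters plus the terminating NUL, which is why storing a longer prefix in the rule gains nothing.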
      
      Second, after adding a log rule with a very long prefix, we fail
      to dump the nft rules that follow this _special_ one, although
      they do exist. For example:
        # name_65000=$(printf "%0.sQ" {1..65000})
        # nft add rule filter output log prefix "$name_65000"
        # nft add rule filter output counter
        # nft add rule filter output counter
        # nft list chain filter output
        table ip filter {
            chain output {
                type filter hook output priority 0; policy accept;
            }
        }
      
      So now, restrict the log prefix length to NF_LOG_PREFIXLEN-1.
      
      Fixes: 96518518 ("netfilter: add nftables")
      Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  2. 23 Jan, 2017 1 commit
    • netfilter: nf_tables: validate the name size when possible · b2fbd044
      Liping Zhang authored
      Currently, if the user adds a stateful object whose name is longer than
      NFT_OBJ_MAXNAMELEN - 1 (i.e. 31) characters, we silently truncate it to 31.
      This is unfriendly and, furthermore, creates duplicate stateful
      objects when the first 31 characters of two names are the same. So limit
      the stateful object's name size to NFT_OBJ_MAXNAMELEN - 1.
      
      After applying this patch, an error message is printed like this:
        # name_32=$(printf "%0.sQ" {1..32})
        # nft add counter filter $name_32
        <cmdline>:1:1-52: Error: Could not process rule: Numerical result out
        of range
        add counter filter QQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQ
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
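A sketch of the kind of length check the message above describes, rejecting an over-long name instead of truncating it. `TOY_OBJ_MAXNAMELEN` mirrors the 32 implied by "NFT_OBJ_MAXNAMELEN - 1 (i.e. 31)", the helper name is hypothetical, and `-ERANGE` is chosen because it matches the "Numerical result out of range" text in the example output. How the kernel performs the check internally is not shown here.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define TOY_OBJ_MAXNAMELEN 32 /* stand-in for NFT_OBJ_MAXNAMELEN */

/* Return 0 if the name fits, -ERANGE if it would have been truncated. */
static int toy_check_obj_name(const char *name)
{
    assert(name != NULL);
    if (strlen(name) > TOY_OBJ_MAXNAMELEN - 1)
        return -ERANGE;
    return 0;
}
```

With this rule a 31-character name is accepted and a 32-character name is refused, so two names sharing their first 31 characters can no longer collapse into the same object.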
      
      This patch also cleans up the code in nftables that was missing the
      name size limit validation.
      
      Fixes: e5009240 ("netfilter: nf_tables: add stateful objects")
      Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  3. 19 Jan, 2017 2 commits
    • netfilter: conntrack: refine gc worker heuristics, redux · e5072053
      Florian Westphal authored
      This further refines the changes made to conntrack gc_worker in
      commit e0df8cae ("netfilter: conntrack: refine gc worker heuristics").
      
      The main idea of that change was to reduce the scan interval when evictions
      take place.
      
      However, on the reporters' setup, there are 1-2 million conntrack entries
      in total and roughly 8k new (and closing) connections per second.
      
      In this case we'll always evict at least one entry per gc cycle and scan
      interval is always at 1 jiffy because of this test:
      
       } else if (expired_count) {
           gc_work->next_gc_run /= 2U;
           next_run = msecs_to_jiffies(1);
      
      being true almost all the time.
      
      Given we scan ~10k entries per run, it's clearly wrong to reduce the interval
      based on a nonzero eviction count; it will only waste cpu cycles since the vast
      majority of conntracks are not timed out.
      
      Thus only look at the ratio (scanned entries vs. evicted entries) to make
      a decision on whether to reduce or not.
      
      Because the evictor is supposed to only kick in when the system turns idle
      after a busy period, pick a high ratio -- this makes it 50%.  We thus keep
      the idea of increasing the scan rate when it's likely that the table
      contains many expired entries.
      
      In order to not let timed-out entries hang around for too long
      (important when using event logging, in which case we want destroy
      events to be delivered in a timely manner), we now scan the full table
      within at most GC_MAX_SCAN_JIFFIES (16 seconds), even in the worst-case
      scenario where all timed-out entries sit in the same slot.
      
      I tested this with a vm under synflood (with
      sysctl net.netfilter.nf_conntrack_tcp_timeout_syn_recv=3).
      
      While flood is ongoing, interval now stays at its max rate
      (GC_MAX_SCAN_JIFFIES / GC_MAX_BUCKETS_DIV -> 125ms).
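The ratio-based decision described above could look roughly like the following sketch, written in milliseconds instead of jiffies. The 50% threshold, the 16 second bound and the 125 ms cap come from the message; the divisor of 128 is inferred from 16 s / 125 ms, and the 1 ms minimum step, the strict `>` comparison and all `toy_` names are assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_SCAN_MS     16000u /* 16 seconds, from the message */
#define TOY_MAX_BUCKETS_DIV   128u /* inferred: 16000 ms / 128 = 125 ms */
#define TOY_EVICT_RATIO        50u /* percent, from the message */
#define TOY_MIN_INTERVAL_MS     1u /* assumed minimum interval and step */

/* Pick the next scan interval from the share of scanned entries evicted. */
static unsigned int toy_next_gc_interval(unsigned int cur_ms,
                                         unsigned int scanned,
                                         unsigned int expired)
{
    const unsigned int max_ms = TOY_MAX_SCAN_MS / TOY_MAX_BUCKETS_DIV;
    unsigned int ratio;

    assert(expired <= scanned); /* cannot evict more than was scanned */
    ratio = scanned ? (unsigned int)((uint64_t)expired * 100u / scanned) : 0u;

    if (ratio > TOY_EVICT_RATIO)
        return TOY_MIN_INTERVAL_MS; /* many stale entries: scan again soon */

    /* Few evictions: back off gradually, but never beyond the cap. */
    cur_ms += TOY_MIN_INTERVAL_MS;
    return cur_ms > max_ms ? max_ms : cur_ms;
}
```

A single eviction out of 10000 scanned entries gives a ratio of 0%, so the interval keeps growing toward the cap instead of collapsing to the minimum as in the old `expired_count` test.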
      
      With feedback from Nicolas Dichtel.
      Reported-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com>
      Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Fixes: b87a2f91 ("netfilter: conntrack: add gc worker to remove timed-out entries")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Tested-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Tested-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: conntrack: remove GC_MAX_EVICTS break · 524b698d
      Florian Westphal authored
      Instead of breaking out of the loop and rescheduling immediately, don't
      bother checking this in the first place (the loop calls cond_resched for
      every bucket anyway).
      Suggested-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  4. 18 Jan, 2017 1 commit
  5. 16 Jan, 2017 4 commits
    • netfilter: Fix typo in NF_CONNTRACK Kconfig option description · e4670b05
      William Breathitt Gray authored
      The NF_CONNTRACK Kconfig option description makes an incorrect reference
      to the "meta" expression where the "ct" expression would be correct. This
      patch fixes the respective typographical error.
      
      Fixes: d497c635 ("netfilter: add help information to new nf_tables Kconfig options")
      Signed-off-by: William Breathitt Gray <vilhelm.gray@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: fix possible oops when dumping stateful objects · d21e540b
      Liping Zhang authored
      When dumping nft stateful objects, if neither the NFTA_OBJ_TABLE nor the
      NFTA_OBJ_TYPE attribute is specified, filter will be NULL and an oops
      will happen (the nft utility always sets the NFTA_OBJ_TABLE attribute,
      so I wrote a test program to trigger this):
      
        BUG: unable to handle kernel NULL pointer dereference at (null)
        IP: nf_tables_dump_obj+0x17c/0x330 [nf_tables]
        [...]
        Call Trace:
        ? nf_tables_dump_obj+0x5/0x330 [nf_tables]
        ? __kmalloc_reserve.isra.35+0x31/0x90
        ? __alloc_skb+0x5b/0x1e0
        netlink_dump+0x124/0x2a0
        __netlink_dump_start+0x161/0x190
        nf_tables_getobj+0xe8/0x280 [nf_tables]
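The crash above comes from dereferencing a filter that was never allocated. A NULL-tolerant match helper might look like the sketch below. The struct layout and all `toy_` names are hypothetical and only illustrate the guard, not the kernel's actual dump code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical dump filter: both fields are optional. */
struct toy_obj_filter {
    const char *table; /* NULL means "any table" */
    unsigned int type; /* 0 means "any type" */
};

/* Decide whether an object should be dumped; 'filter' may be NULL. */
static bool toy_obj_matches(const struct toy_obj_filter *filter,
                            const char *table, unsigned int type)
{
    assert(table != NULL);

    if (filter == NULL)
        return true; /* no attributes given: dump everything */
    if (filter->table != NULL && strcmp(filter->table, table) != 0)
        return false;
    if (filter->type != 0 && filter->type != type)
        return false;
    return true;
}
```

Checking `filter` before touching its fields is the whole point: with neither attribute supplied, every object matches instead of the dump path faulting.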
      
      Fixes: a9fea2a3 ("netfilter: nf_tables: allow to filter stateful object dumps by type")
      Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: rpfilter: fix incorrect loopback packet judgment · 6443ebc3
      Liping Zhang authored
      Currently, we check the existing rtable in the PREROUTING hook; if
      RTCF_LOCAL is set, we assume that the packet is loopback.
      
      But this assumption is incorrect. For example, if a packet encapsulated
      in ipsec transport mode is received and routed to local, then after
      decapsulation it is delivered to local again and the rtable is not
      dropped, so the RTCF_LOCAL check triggers. But actually, the
      packet is not loopback.
      
      So for normal loopback packets, we can check whether the in device
      is IFF_LOOPBACK or not. For locally generated broadcast/multicast packets,
      we can check whether skb->pkt_type is PACKET_LOOPBACK or not.
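The two checks proposed above can be sketched as a single predicate. This is a userspace illustration only: the `toy_` struct, the constant names and their numeric values are stand-ins for the kernel's IFF_LOOPBACK and PACKET_LOOPBACK, not the real definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for IFF_LOOPBACK and PACKET_LOOPBACK; values are illustrative. */
enum { TOY_IFF_LOOPBACK = 0x8 };
enum { TOY_PACKET_HOST = 0, TOY_PACKET_LOOPBACK = 5 };

struct toy_pkt {
    unsigned int in_dev_flags; /* flags of the receiving device */
    int pkt_type;              /* plays the role of skb->pkt_type */
};

/*
 * A packet counts as loopback if it arrived on a loopback device, or if it
 * is a locally generated broadcast/multicast copy. Note that no route flag
 * (such as RTCF_LOCAL) is consulted.
 */
static bool toy_is_loopback(const struct toy_pkt *p)
{
    assert(p != NULL);
    return (p->in_dev_flags & TOY_IFF_LOOPBACK) != 0 ||
           p->pkt_type == TOY_PACKET_LOOPBACK;
}
```

A decapsulated ipsec packet received on a regular device has neither property, so it is no longer misjudged as loopback.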
      
      Finally, there's a subtle difference between the nft fib expression and
      the xtables rpfilter extension: the user can add the following nft rule
      to do a strict rpfilter check:
        # nft add rule x y meta iif eth0 fib saddr . iif oif != eth0 drop
      
      So when the packet is loopback, it's better to store the in device
      instead of LOOPBACK_IFINDEX; otherwise, after adding the above
      nft rule, locally generated broadcast/multicast packets will be dropped
      incorrectly.
      
      Fixes: f83a7ea2 ("netfilter: xt_rpfilter: skip locally generated broadcast/multicast, too")
      Fixes: f6d0cbcf ("netfilter: nf_tables: add fib expression")
      Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: fix spelling mistakes · d7f5762c
      Alexander Alemayhu authored
      o s/numerice/numeric
      o s/opertaor/operator
      Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  6. 09 Jan, 2017 5 commits
  7. 08 Jan, 2017 4 commits
  8. 07 Jan, 2017 2 commits
  9. 06 Jan, 2017 6 commits
  10. 05 Jan, 2017 4 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · d896b312
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains accumulated Netfilter fixes for your
      net tree:
      
      1) Ensure quota dump and reset happens iff we can deliver numbers to
         userspace.
      
      2) Silence splat on incorrect use of smp_processor_id() from nft_queue.
      
      3) Fix an out-of-bound access reported by KASAN in
         nf_tables_rule_destroy(), patch from Florian Westphal.
      
      4) Fix layer 4 checksum mangling in the nf_tables payload expression
         with IPv6.
      
      5) Fix a race in the CLUSTERIP target from control plane path when two
         threads run to add a new configuration object. Serialize invocations
         of clusterip_config_init() using spin_lock. From Xin Long.
      
      6) Call br_nf_pre_routing_finish_bridge_finish() once we are done with
         the br_nf_pre_routing_finish() hook. From Artur Molchanov.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • r8169: fix the typo in the comment · 9b60047a
      Zhu Yanjun authored
      According to the Realtek data sheet, the PID0 should be bit 0.
      Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nl80211: fix sched scan netlink socket owner destruction · 753aacfd
      Johannes Berg authored
      A single netlink socket might own multiple interfaces *and* a
      scheduled scan request (which might belong to another interface),
      so when it goes away both may need to be destroyed.
      
      Remove the schedule_scan_stop indirection to fix this - it's only
      needed for interface destruction because of the way this works
      right now, with a single work taking care of all interfaces.
      
      Cc: stable@vger.kernel.org
      Fixes: 93a1e86c ("nl80211: Stop scheduled scan if netlink client disappears")
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    • Merge tag 'xfs-for-linus-4.10-rc3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · e02003b5
      Linus Torvalds authored
      Pull xfs fixes from Darrick Wong:
      
       - fixes for crashes and double-cleanup errors
      
       - XFS maintainership handover
      
       - fix to prevent absurdly large block reservations
      
       - fix broken sysfs getter/setters
      
      * tag 'xfs-for-linus-4.10-rc3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        xfs: fix max_retries _show and _store functions
        xfs: update MAINTAINERS
        xfs: fix crash and data corruption due to removal of busy COW extents
        xfs: use the actual AG length when reserving blocks
        xfs: fix double-cleanup when CUI recovery fails
  11. 04 Jan, 2017 9 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 4cf18463
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) stmmac_drv_probe() can race with stmmac_open() because we register
          the netdevice too early. Fix from Florian Fainelli.
      
       2) UFO handling in __ip6_append_data() and ip6_finish_output() use
          different tests for deciding whether a frame will be fragmented or
          not, put them in sync. Fix from Zheng Li.
      
       3) The rtnetlink getstats handlers need to validate that the netlink
          request is large enough, fix from Mathias Krause.
      
       4) Use after free in mlx4 driver, from Jack Morgenstein.
      
       5) Fix setting of garbage UID value in sockets during setattr() calls,
          from Eric Biggers.
      
       6) Packet drop_monitor doesn't format the netlink messages properly
          such that nlmsg_next fails to work, fix from Reiter Wolfgang.
      
       7) Fix handling of wildcard addresses in l2tp lookups, from Guillaume
          Nault.
      
       8) __skb_flow_dissect() can crash on pptp packets, from Ian Kumlien.
      
       9) IGMP code doesn't reset group query timers properly, from Michal
          Tesar.
      
      10) Fix overzealous MAIN/LOCAL route table combining in ipv4, from
          Alexander Duyck.
      
      11) vxlan offload check needs to be more strict in be2net driver, from
          Sabrina Dubroca.
      
      12) Moving l3mdev to packet hooks lost RX stat counters unintentionally,
          fix from David Ahern.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (52 commits)
        sh_eth: enable RX descriptor word 0 shift on SH7734
        sfc: don't report RX hash keys to ethtool when RSS wasn't enabled
        dpaa_eth: Initialize CGR structure before init
        dpaa_eth: cleanup after init_phy() failure
        net: systemport: Pad packet before inserting TSB
        net: systemport: Utilize skb_put_padto()
        LiquidIO VF: s/select/imply/ for PTP_1588_CLOCK
        libcxgb: fix error check for ip6_route_output()
        net: usb: asix_devices: add .reset_resume for USB PM
        net: vrf: Add missing Rx counters
        drop_monitor: consider inserted data in genlmsg_end
        benet: stricter vxlan offloading check in be_features_check
        ipv4: Do not allow MAIN to be alias for new LOCAL w/ custom rules
        net: macb: Updated resource allocation function calls to new version of API.
        net: stmmac: dwmac-oxnas: use generic pm implementation
        net: stmmac: dwmac-oxnas: fix fixed-link-phydev leaks
        net: stmmac: dwmac-oxnas: fix of-node leak
        Documentation/networking: fix typo in mpls-sysctl
        igmp: Make igmp group member RFC 3376 compliant
        flow_dissector: Update pptp handling to avoid null pointer deref.
        ...
    • sh_eth: enable RX descriptor word 0 shift on SH7734 · 71eae1ca
      Sergei Shtylyov authored
      The RX descriptor word 0 on SH7734 has the RFS[9:0] field in bits 16-25
      (bits 0-15 usually used for that are occupied by the packet checksum).
      Thus we need to set the 'shift_rd0' field in the SH7734 SoC data...
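To make the bit layout above concrete, here is a small sketch of extracting the 10-bit RFS field from descriptor word 0, with and without the shift. The helper name and the boolean parameter are hypothetical; only the bit positions (16-25 on SH7734, low bits otherwise) come from the message.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Extract the 10-bit RFS field from RX descriptor word 0.
 * With 'shift_rd0' set, the field is taken from bits 16-25;
 * otherwise it is taken from the low bits of the word.
 */
static uint32_t toy_rfs(uint32_t rd0, bool shift_rd0)
{
    if (shift_rd0)
        rd0 >>= 16;
    return rd0 & 0x3ffu; /* RFS[9:0] */
}
```

Without the shift, the low bits of the word (the packet checksum on SH7734) would be misread as receive status.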
      
      Fixes: f0e81fec ("net: sh_eth: Add support SH7734")
      Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sfc: don't report RX hash keys to ethtool when RSS wasn't enabled · 4fdda958
      Edward Cree authored
      If we failed to set up RSS on EF10 (e.g. because firmware declared
      RX_RSS_LIMITED), ethtool --show-nfc $dev rx-flow-hash ... should report
      no fields, rather than confusingly reporting what fields we _would_ be
      hashing on if RSS was working.
      
      Fixes: dcb4123c ("sfc: disable RSS when unsupported")
      Signed-off-by: Edward Cree <ecree@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'dpaa_eth-fixes' · aa9773be
      David S. Miller authored
      Madalin Bucur says:
      
      ====================
      dpaa_eth: a couple of fixes
      
      Add cleanup on PHY initialization failure path, avoid using
      uninitialized memory at CGR init.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dpaa_eth: Initialize CGR structure before init · 0fbb0f24
      Roy Pledge authored
      The QBMan CGR options need to be zeroed before calling the init
      function.
      Signed-off-by: Roy Pledge <roy.pledge@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Madalin Bucur's avatar
      3fe61f09
    • Merge branch 'systemport-padding-and-TSB-insertion' · c030af87
      David S. Miller authored
      Florian Fainelli says:
      
      ====================
      net: systemport: Fix padding vs. TSB insertion
      
      This patch series fixes how we pad the packets submitted to the SYSTEMPORT
      adapter, and how the transmit status block (prepended 8 bytes) fits in the
      picture. The first patch is not technically a bug fix, but is required for the
      second patch to be applied and to greatly simplify the skb length calculation.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: systemport: Pad packet before inserting TSB · 38e5a855
      Florian Fainelli authored
      Inserting the TSB means adding an extra 8 bytes in front of the packet
      that are used as metadata by the TDMA engine and then stripped off,
      so they do not really help with the packet padding.
      
      For some odd packet sizes that fall below the 60 bytes payload (e.g: ARP)
      we can end up padding them after the TSB insertion, thus making them 64
      bytes, but with the TDMA stripping off the first 8 bytes, they could
      still be smaller than 64 bytes which is required to ingress the switch.
      
      Fix this by swapping the padding and TSB insertion, guaranteeing that
      the packets have the right sizes.
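The ordering problem above is easy to check with arithmetic. The sketch below compares the length that remains after the TDMA engine strips the TSB for both orderings. The 8-byte TSB and the 64-byte minimum are taken from the message; the `toy_` helpers and the 42-byte example frame are illustrative assumptions, and the real driver's padding target may be expressed differently.

```c
#include <assert.h>
#include <limits.h>

#define TOY_TSB_LEN  8u /* transmit status block, from the message */
#define TOY_MIN_LEN 64u /* minimum size named in the message */

/* Pad a length up to the minimum, leaving longer packets alone. */
static unsigned int toy_pad(unsigned int len)
{
    return len < TOY_MIN_LEN ? TOY_MIN_LEN : len;
}

/* Old order: insert TSB, then pad. The hardware strips the TSB afterwards. */
static unsigned int toy_len_tsb_then_pad(unsigned int len)
{
    assert(len <= UINT_MAX - TOY_TSB_LEN);
    return toy_pad(len + TOY_TSB_LEN) - TOY_TSB_LEN;
}

/* New order: pad first, then insert the TSB, which is stripped again. */
static unsigned int toy_len_pad_then_tsb(unsigned int len)
{
    assert(len <= UINT_MAX - TOY_TSB_LEN);
    return (toy_pad(len) + TOY_TSB_LEN) - TOY_TSB_LEN;
}
```

For a 42-byte frame the old order leaves 56 bytes after stripping, below the minimum, while the new order leaves exactly 64. Packets that are already long enough are unaffected by either order.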
      
      Fixes: 80105bef ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC driver")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: systemport: Utilize skb_put_padto() · bb7da333
      Florian Fainelli authored
      Since we need to pad our packets, utilize skb_put_padto() which
      increases skb->len by how much we need to pad, allowing us to eliminate
      the test on skb->len right below.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>