1. 23 Sep, 2014 7 commits
    • tcp: add coalescing attempt in tcp_ofo_queue() · bd1e75ab
      Eric Dumazet authored
      In order to make TCP more resilient in presence of reorders, we need
      to allow coalescing to happen when skbs from out of order queue are
      transferred into receive queue. LRO/GRO can be completely canceled
      in some pathological cases, like per packet load balancing on aggregated
      links.
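
As a rough illustration of the idea (a toy Python model, not the kernel code): segments that become in-order are moved from the out-of-order queue to the receive queue, and each one is merged into the queue tail when it starts exactly where the tail ends. The names `Segment` and `drain_ofo_queue` are hypothetical, and the sketch ignores sequence-number wraparound, skb memory accounting, and whatever other conditions the real tcp_try_coalesce() checks.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    seq: int      # sequence number of the first payload byte
    data: bytes   # payload

    @property
    def end_seq(self) -> int:
        return self.seq + len(self.data)


def drain_ofo_queue(rcv_nxt, ofo_queue, receive_queue):
    """Move in-order segments from ofo_queue to receive_queue.

    A segment is appended to the tail segment's payload (coalesced) when it
    is byte-contiguous with it; otherwise it is queued as a new entry.
    Returns the updated rcv_nxt.
    """
    ofo_queue.sort(key=lambda s: s.seq)
    while ofo_queue and ofo_queue[0].seq <= rcv_nxt:
        seg = ofo_queue.pop(0)
        if seg.end_seq <= rcv_nxt:
            continue  # nothing new in this segment, drop it
        # Trim bytes that were already received.
        seg = Segment(rcv_nxt, seg.data[rcv_nxt - seg.seq:])
        tail = receive_queue[-1] if receive_queue else None
        if tail is not None and tail.end_seq == seg.seq:
            tail.data += seg.data          # coalesce into the tail
        else:
            receive_queue.append(seg)      # no contiguous tail, queue as-is
        rcv_nxt = seg.end_seq
    return rcv_nxt
```

With coalescing, a burst of reordered small segments ends up as a few large entries in the receive queue rather than one entry per packet.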
      
      I had to move tcp_try_coalesce() up in the file above tcp_ofo_queue()
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • icmp: add a global rate limitation · 4cdf507d
      Eric Dumazet authored
      Current ICMP rate limiting uses inetpeer cache, which is an AVL tree
      protected by a lock, meaning that hosts can be stuck hard if all cpus
      want to check ICMP limits.
      
      When, say, a DNS or NTP server process is restarted, the inetpeer tree
      grows quickly and the machine is brought to its knees.
      
      iptables cannot help because the bottleneck happens before ICMP
      messages are even cooked and sent.
      
      This patch adds a new global limitation, using a token bucket filter,
      controlled by two new sysctls:
      
      icmp_msgs_per_sec - INTEGER
          Limit maximal number of ICMP packets sent per second from this host.
          Only messages whose type matches icmp_ratemask are
          controlled by this limit.
          Default: 1000
      
      icmp_msgs_burst - INTEGER
          icmp_msgs_per_sec controls number of ICMP packets sent per second,
          while icmp_msgs_burst controls the burst size of these packets.
          Default: 50
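
The interaction of the two sysctls can be sketched with a small token bucket model. This is an illustrative Python sketch, not the kernel implementation: the class name `TokenBucket`, its `allow()` method, and the use of integer milliseconds for time are assumptions made for the example. Only the default values (1000 and 50) come from the text above.

```python
class TokenBucket:
    """Toy global rate limiter: `rate_per_sec` tokens are added per second,
    and at most `burst` tokens can be held at any time."""

    def __init__(self, rate_per_sec=1000, burst=50):
        self.rate = rate_per_sec   # plays the role of icmp_msgs_per_sec
        self.burst = burst         # plays the role of icmp_msgs_burst
        self.credit = burst        # start with a full bucket
        self.stamp_ms = 0          # time of the last refill, in ms

    def allow(self, now_ms):
        """Return True if one message may be sent at time now_ms."""
        refill = (now_ms - self.stamp_ms) * self.rate // 1000
        if refill > 0:
            # Add the tokens earned since the last refill, capped at burst.
            self.credit = min(self.burst, self.credit + refill)
            self.stamp_ms = now_ms
        if self.credit > 0:
            self.credit -= 1
            return True
        return False
```

In this model a host can emit up to 50 messages back to back, after which it is held to the sustained rate; an idle period never earns more than one burst of credit.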
      
      Note that if we really want to send millions of ICMP messages per
      second, we might extend idea and infra added in commit 04ca6973
      ("ip: make IP identifiers less predictable") :
      add a token bucket in the ip_idents hash and no longer rely on inetpeer.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bpf: arm: make hole-faulting more robust · e8b56d55
      Daniel Borkmann authored
      Will Deacon pointed out that the opcode currently used for filling holes,
      0xe7ffffff, does not seem robust enough ...
      
        $ echo 0xffffffe7 | xxd -r > test.bin
        $ arm-linux-gnueabihf-objdump -m arm -D -b binary test.bin
        ...
        0: e7ffffff     udf    #65535  ; 0xffff
      
      ... while for Thumb, it ends up as ...
      
        0: ffff e7ff    vqshl.u64  q15, <illegal reg q15.5>, #63
      
      ... which is a bit fragile. The ARM specification defines some *permanently*
      guaranteed undefined instruction (UDF) space, for example for ARM in ARMv7-AR,
      section A5.4 and for Thumb in ARMv7-M, section A5.2.6.
      
      Similarly, ptrace, kprobes, kgdb, bug and uprobes also use such instructions
      to trap. Given the sections of the specification mentioned above, that
      space is (where 'x' denotes 'don't care'):
      
        ARM:    xxxx 0111 1111 xxxx xxxx xxxx 1111 xxxx
        Thumb:  1101 1110 xxxx xxxx
      
      We should therefore use a more robust opcode that fits both. Russell King
      suggested that we can even reuse a single 32-bit word, 0xe7fddef1, which
      will fault if executed in ARM *or* Thumb mode, as done in f928d4f2
      ("ARM: poison the vectors page"). That still meets our requirements:
      
        $ echo 0xf1defde7 | xxd -r > test.bin
        $ arm-unknown-linux-gnueabi-objdump -m arm -D -b binary test.bin
        ...
        0: e7fddef1     udf    #56801 ; 0xdde1
        $ echo 0xf1defde7f1defde7f1defde7 | xxd -r > test.bin
        $ arm-unknown-linux-gnueabi-objdump -marm -Mforce-thumb -D -b binary test.bin
        ...
        0: def1         udf    #241 ; 0xf1
        2: e7fd         b.n    0x0
        4: def1         udf    #241 ; 0xf1
        6: e7fd         b.n    0x4
        8: def1         udf    #241 ; 0xf1
        a: e7fd         b.n    0x8
      
      So on ARM 0xe7fddef1 conforms to the above UDF pattern, and the low 16 bits
      likewise correspond to UDF in the Thumb case. The 0xe7fd part is an unconditional
      branch back to the UDF instruction.
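
The bit patterns above can be checked mechanically. The following Python sketch is only an illustration: the mask constants are derived from the two 'don't care' patterns quoted in this message, the helper names are hypothetical, and the branch decoding assumes the 16-bit unconditional Thumb branch form (top five bits 11100, signed 11-bit halfword offset, PC read as the instruction address plus 4).

```python
# Fixed bits of the patterns quoted above:
#   ARM:    xxxx 0111 1111 xxxx xxxx xxxx 1111 xxxx
#   Thumb:  1101 1110 xxxx xxxx
ARM_UDF_MASK, ARM_UDF_BITS = 0x0FF000F0, 0x07F000F0
THUMB_UDF_MASK, THUMB_UDF_BITS = 0xFF00, 0xDE00


def is_arm_udf(word):
    return (word & ARM_UDF_MASK) == ARM_UDF_BITS


def is_thumb_udf(halfword):
    return (halfword & THUMB_UDF_MASK) == THUMB_UDF_BITS


def thumb_halfwords(word):
    """Split a little-endian 32-bit word into (first, second) halfwords
    in the order a Thumb decoder would fetch them."""
    return word & 0xFFFF, word >> 16


def thumb_branch_target(addr, halfword):
    """Target of a 16-bit unconditional Thumb branch located at addr."""
    assert halfword >> 11 == 0b11100, "not a 16-bit unconditional branch"
    imm11 = halfword & 0x7FF
    if imm11 & 0x400:          # sign-extend the 11-bit immediate
        imm11 -= 0x800
    return addr + 4 + (imm11 << 1)


first, second = thumb_halfwords(0xE7FDDEF1)
print(is_arm_udf(0xE7FDDEF1), is_thumb_udf(first),
      hex(thumb_branch_target(2, second)))
```

Both 0xe7ffffff and 0xe7fddef1 match the ARM pattern; only the latter also has a first halfword in the Thumb UDF space, which is the property the new fill opcode relies on.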
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Mircea Gherzan <mgherzan@gmail.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 1f6d8035
      David S. Miller authored
      Conflicts:
      	arch/mips/net/bpf_jit.c
      	drivers/net/can/flexcan.c
      
      Both the flexcan and MIPS bpf_jit conflicts were cases of simple
      overlapping changes.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 98f75b82
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) If the user gives us a msg_namelen of 0, don't try to interpret
          anything pointed to by msg_name.  From Ani Sinha.
      
       2) Fix some bnx2i/bnx2fc randconfig compilation errors.
      
          The gist of the issue is that we firstly have drivers that span both
          SCSI and networking.  And at the top of that chain of dependencies
          we have things like SCSI_FC_ATTRS and SCSI_NETLINK which are
          selected.
      
          But since select is a sledgehammer and ignores dependencies,
          everything that selects SCSI_FC_ATTRS and/or SCSI_NETLINK has to also
          explicitly select their dependencies and so on and so forth.
      
          Generally speaking 'select' is supposed to only be used for child
          nodes, those which have no dependencies of their own.  And this
          whole chain of dependencies in the scsi layer violates that rather
          strongly.
      
          So just make SCSI_NETLINK depend upon its dependencies, and so on
          and so forth for the things selecting it (either directly or
          indirectly).
      
          From Anish Bhatt and Randy Dunlap.
      
       3) Fix generation of blackhole routes in IPSEC, from Steffen Klassert.
      
       4) Actually notice netdev feature changes in rtl_open() code, from
          Hayes Wang.
      
       5) Fix divide by zero in bond enslaving, from Nikolay Aleksandrov.
      
       6) Missing memory barrier in sunvnet driver, from David Stevens.
      
       7) Don't leave anycast addresses around when ipv6 interface is
          destroyed, from Sabrina Dubroca.
      
       8) Don't call efx_{arch}_filter_sync_rx_mode before addr_list_lock is
          initialized in SFC driver, from Edward Cree.
      
        9) Fix missing DMA error checking in 3c59x, from Neil Horman.
      
       10) Openvswitch accidentally doesn't emit OVS_FLOW_CMD_NEW notifications,
           fix from Samuel Gauthier.
      
      11) pch_gbe needs to select NET_PTP_CLASSIFY otherwise we can get a
          build error.
      
      12) Fix macvlan regression wherein we stopped emitting
          broadcast/multicast frames over software devices.  From Nicolas
          Dichtel.
      
      13) Fix infiniband bug due to unintended overflow of skb->cb[], from
          Eric Dumazet.  And add an assertion so this doesn't happen again.
      
      14) dm9000_parse_dt() should return error pointers, not NULL.  From
          Tobias Klauser.
      
      15) IP tunneling code uses this_cpu_ptr() in preemptible contexts, fix
          from Eric Dumazet.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (87 commits)
        net: bcmgenet: call bcmgenet_dma_teardown in bcmgenet_fini_dma
        net: bcmgenet: fix TX reclaim accounting for fragments
        ipv4: do not use this_cpu_ptr() in preemptible context
        dm9000: Return an ERR_PTR() in all error conditions of dm9000_parse_dt()
        r8169: fix an if condition
        r8152: disable ALDPS
        ipoib: validate struct ipoib_cb size
        net: sched: shrink struct qdisc_skb_cb to 28 bytes
        tg3: Work around HW/FW limitations with vlan encapsulated frames
        macvlan: allow to enqueue broadcast pkt on virtual device
        pch_gbe: 'select' NET_PTP_CLASSIFY.
        scsi: Use 'depends' with LIBFC instead of 'select'.
        openvswitch: restore OVS_FLOW_CMD_NEW notifications
        genetlink: add function genl_has_listeners()
        lib: rhashtable: remove second linux/log2.h inclusion
        net: allow macvlans to move to net namespace
        3c59x: Fix bad offset spec in skb_frag_dma_map
        3c59x: Add dma error checking and recovery
        sparc: bpf_jit: fix support for ldx/stx mem and SKF_AD_VLAN_TAG
        can: at91_can: add missing prepare and unprepare of the clock
        ...
    • Merge tag 'clk-fixes-for-linus' of git://git.linaro.org/people/mike.turquette/linux · 94783036
      Linus Torvalds authored
      Pull clock layer fixes from Mike Turquette:
       "The fixes for the clock tree are mostly run-time bugs in clock
        drivers.
      
        The fixes for TI DRA7 remove divide-by-zero errors.  The recently
        merged AT91 clock driver fixes some bad error checking and the QCOM
        driver fix restores audio for that platform, a clear regression.  A
        list iteration bug in the framework core was hit recently and is fixed
        up here.  Finally a compilation warning is fixed for efm32gg, which is
        also a regression fix"
      
      * tag 'clk-fixes-for-linus' of git://git.linaro.org/people/mike.turquette/linux:
        clk/efm32gg: fix dt init prototype
        clk: prevent erronous parsing of children during rate change
        clk: rockchip: Fix the clocks for i2c1 and i2c2
        clk: qcom: Fix sdc 144kHz frequency entry
        clk: at91: fix num_parents test in at91sam9260 slow clk implementation
        clk: ti: dra7-atl: Provide error check for incoming parameters in set_rate
        clk: ti: divider: Provide error check for incoming parameters in set_rate
    • Merge tag 'fscache-fixes-20140917' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs · e2519c2c
      Linus Torvalds authored
      
      Pull fs-cache fixes from David Howells:
      
       - Put a timeout in releasepage() to deal with a recursive hang between
         the memory allocator, writeback, ext4 and fscache under memory
         pressure.
      
       - Fix a pair of refcount bugs in the fscache error handling.
      
       - Remove a couple of unused pagevecs.
      
       - The cachefiles requirement that the base directory support rename
         should permit rename2 as an alternative - otherwise certain
         filesystems cannot now be used as backing stores (such as ext4).
      
      * tag 'fscache-fixes-20140917' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
        CacheFiles: Handle rename2
        cachefiles: remove two unused pagevecs.
        FS-Cache: refcount becomes corrupt under vma pressure.
        FS-Cache: Reduce cookie ref count if submit fails.
        FS-Cache: Timeout for releasepage()
  2. 22 Sep, 2014 33 commits