1. 26 Mar, 2008 15 commits
  2. 24 Mar, 2008 5 commits
  3. 23 Mar, 2008 5 commits
  4. 22 Mar, 2008 1 commit
    • [TCP]: Let skbs grow over a page on fast peers · 69d15067
      Herbert Xu authored
      While testing the virtio-net driver on KVM with TSO I noticed
      that TSO performance with a 1500 MTU is significantly worse
      compared to the performance of non-TSO with a 16436 MTU.  The
      packet dump shows that most of the packets sent are smaller
      than a page.
      
      Looking at the code this actually is quite obvious as it always
      stops extending the packet if it's the first packet yet to be
      sent and if it's larger than the MSS.  Since each extension is
      bound by the page size, this means that (given a 1500 MTU) we're
      very unlikely to construct packets greater than a page, provided
      that the receiver and the path are fast enough so that packets can
      always be sent immediately.
      
      The fix is also quite obvious.  The push calls inside the loop
      are just an optimisation so that we don't end up doing all the
      sending at the end of the loop.  Therefore there is no specific
      reason why they have to happen at MSS boundaries.  For TSO, the
      most natural extension of this optimisation is to do the pushing
      once the skb exceeds the TSO size goal.
      
      This is what the patch does and testing with KVM shows that the
      TSO performance with a 1500 MTU easily surpasses that of a 16436
      MTU and indeed the packet sizes sent are generally larger than
      16436.
      
      I don't see any obvious downsides for slower peers or connections,
      but it would be prudent to test this extensively to ensure that
      those cases don't regress.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
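The change in push condition described in the commit above can be sketched as a pair of predicates. This is a minimal illustration, not the kernel code: the helper names `keep_filling_old` and `keep_filling_new` are hypothetical, the real check sits inside the send loop and considers more state than the skb length, and the sizes used in the usage note are illustrative values only.

```c
#include <stdbool.h>
#include <stddef.h>

/* Old rule (hypothetical helper): stop filling the head skb as soon as
 * it holds at least one MSS worth of data, then push it. */
static bool keep_filling_old(size_t skb_len, size_t mss_now)
{
    return skb_len < mss_now;
}

/* New rule (hypothetical helper): keep filling until the skb reaches
 * the TSO size goal, and only then push it. */
static bool keep_filling_new(size_t skb_len, size_t size_goal)
{
    return skb_len < size_goal;
}
```

With an assumed MSS of 1448 bytes and an assumed size goal of 65160 bytes, a 4096-byte skb is pushed under the old rule but keeps growing under the new one, which matches the observation that packets rarely exceeded a page before the patch.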
  5. 21 Mar, 2008 4 commits
  6. 20 Mar, 2008 10 commits
    • [IPV6] KCONFIG: Fix description about IPV6_TUNNEL. · 38fe999e
      YOSHIFUJI Hideaki authored
      Based on notice from "Colin" <colins@sjtu.edu.cn>.
      Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [TCP]: Fix shrinking windows with window scaling · 607bfbf2
      Patrick McHardy authored
      When selecting a new window, tcp_select_window() tries not to shrink
      the offered window by using the maximum of the remaining offered window
      size and the newly calculated window size. The newly calculated window
      size is always a multiple of the window scaling factor; the remaining
      window size, however, might not be, since it depends on rcv_wup/rcv_nxt.
      This means we're effectively shrinking the window when scaling it down.
      
      
      The dump below shows the problem (scaling factor 2^7):
      
      - Window size of 557 (71296) is advertised, up to 3111907257:
      
      IP 172.2.2.3.33000 > 172.2.2.2.33000: . ack 3111835961 win 557 <...>
      
      - New window size of 514 (65792) is advertised, up to 3111907217, 40 bytes
        below the last end:
      
      IP 172.2.2.3.33000 > 172.2.2.2.33000: . 3113575668:3113577116(1448) ack 3111841425 win 514 <...>
      
      The number 40 results from downscaling the remaining window:
      
      3111907257 - 3111841425 = 65832
      65832 / 2^7 = 514
      65832 % 2^7 = 40
      
      If the sender uses up the entire window before it is shrunk, this can have
      chaotic effects on the connection. When sending ACKs, tcp_acceptable_seq()
      will notice that the window has been shrunk since tcp_wnd_end() is before
      tp->snd_nxt, which makes it choose tcp_wnd_end() as sequence number.
      This will fail the receiver's checks in tcp_sequence(), however, since it
      is before its tp->rcv_wup, making it respond with a dupack.
      
      If both sides are in this condition, this leads to a constant flood of
      ACKs until the connection times out.
      
      Make sure the window is never shrunk by aligning the remaining window to
      the window scaling factor.
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
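The arithmetic in the dump above can be reproduced with a small sketch. The helper names `align_window` and `advertised_bytes` are hypothetical, and rounding the remaining window up to the next multiple of the scaling factor is an assumption about how the alignment is done; the commit message only says that the remaining window is aligned.

```c
#include <stdint.h>

/* Hypothetical helper: round win up to the next multiple of 2^wscale. */
static uint32_t align_window(uint32_t win, unsigned int wscale)
{
    uint32_t unit = (uint32_t)1 << wscale;
    return (win + unit - 1) & ~(unit - 1);
}

/* Hypothetical helper: the byte count the peer reconstructs after the
 * window has been shifted down for the header and shifted back up. */
static uint32_t advertised_bytes(uint32_t win, unsigned int wscale)
{
    return (win >> wscale) << wscale;
}
```

For the remaining window of 65832 bytes and a scaling factor of 2^7, `advertised_bytes` yields 65792, which is the 40-byte shrink seen in the dump. Aligning first gives 65920, so the advertised right edge no longer moves backwards.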
    • netpoll: zap_completion_queue: adjust skb->users counter · 8a455b08
      Jarek Poplawski authored
      zap_completion_queue() retrieves skbs from completion_queue, where their
      skb->users counter is zero.  dev_kfree_skb_any() expects a non-zero
      counter, so it is now incremented before the call.
      Reported-and-tested-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bridge: use time_before() in br_fdb_cleanup() · 2bec008c
      Fabio Checconi authored
      In br_fdb_cleanup() next_timer and this_timer are in jiffies, so they
      should be compared using the time_before() macro.
      Signed-off-by: Fabio Checconi <fabio@gandalf.sssup.it>
      Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
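The reason jiffies values need a dedicated comparison is counter wraparound. The sketch below is a userspace stand-in, not the kernel macro itself: `jiffies_before` is a hypothetical name, and it assumes the usual two's complement conversion from `unsigned long` to `long`.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-in for a wraparound-safe "a is earlier
 * than b" test: the unsigned difference is reinterpreted as signed, so
 * a value taken just before the counter wraps still compares as earlier
 * than one taken just after. */
static bool jiffies_before(unsigned long a, unsigned long b)
{
    return (long)(a - b) < 0;
}
```

A plain `a < b` gets this wrong when `a` was sampled shortly before the wrap and `b` shortly after, because `a` is then numerically huge while `b` is small.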
    • [TG3]: Fix build warning on sparc32. · 7582a335
      David S. Miller authored
      Sparc MAC address support should be protected consistently
      with CONFIG_SPARC, but there was a stray CONFIG_SPARC64
      case.
      
      Bump driver version and release date.
      
      Reported by Andrew Morton.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Pavel Machek's avatar
      781c2844
    • audit: netlink socket can be auto-bound to pid other than current->pid (v2) · 75c0371a
      Pavel Emelyanov authored
      From:	Pavel Emelyanov <xemul@openvz.org>
      
      This patch is based on the one from Thomas.
      
      The kauditd_thread() calls netlink_unicast() and passes
      the audit_pid to it. The audit_pid, in turn, is received from
      user space, and the tool (I've checked audit v1.6.9)
      uses getpid() to pass one to the kernel. Besides, this tool
      doesn't bind the netlink socket to this id, but simply creates
      it, allowing the kernel to auto-bind one.
      
      That's the preamble.
      
      The problem is that netlink_autobind() _does_not_ guarantee
      that the socket will be auto-bound to the current pid. Instead
      it uses the current pid as a hint to start looking for a free
      id. So, in case of conflict, the audit messages can be sent
      to the wrong socket. This can happen (it's unlikely, but possible)
      when some task opens more than one netlink socket and then
      the audit one starts - in this case the audit's pid can be busy
      and its socket will be bound to another id.
      
      The proposal is to introduce an audit_nlk_pid in the audit subsystem
      that will point to the netlink socket to send packets to. It
      will most often be equal to audit_pid. The socket id can be
      obtained from the skb's netlink CB right in audit_receive_msg().
      Resetting audit_nlk_pid to 0 is not required, since all the
      decisions are taken based on the audit_pid value only.

      Later, if the audit tools bind the socket themselves, the
      kernel will have to provide a way to set up the audit_nlk_pid
      as well.
      
      A good side effect of this patch is that audit_pid can later
      be converted to struct pid, as it is no longer safe to use
      pid_t-s in the presence of pid namespaces. But audit code still
      uses the tgid from task_struct in audit_signal_info and in
      audit_filter_syscall.
      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Acked-by: Eric Paris <eparis@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
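The split between the reported pid and the socket id that was actually bound can be pictured with a toy model. Everything here is illustrative: `struct audit_state`, `audit_register` and `audit_unicast_dest` are hypothetical names, and `sender_portid` merely stands in for the socket id that the commit message says is read from the skb's netlink CB.

```c
#include <stdint.h>

/* Toy model of the two ids described in the commit message. */
struct audit_state {
    int32_t  audit_pid;      /* pid the daemon reported via getpid() */
    uint32_t audit_nlk_pid;  /* id its netlink socket was bound to */
};

/* Hypothetical registration step: remember both the reported pid and
 * the id of the socket the request arrived from. */
static void audit_register(struct audit_state *st, int32_t reported_pid,
                           uint32_t sender_portid)
{
    st->audit_pid = reported_pid;
    st->audit_nlk_pid = sender_portid;
}

/* Destination for unicast audit messages: the socket id, not the pid. */
static uint32_t audit_unicast_dest(const struct audit_state *st)
{
    return st->audit_nlk_pid;
}
```

When auto-binding falls back to a different id because the pid was already taken, the two fields differ, and messages addressed by `audit_pid` alone would reach the wrong socket.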
    • [NET]: Fix permissions of /proc/net · 4f42c288
      Andre Noll authored
      commit e9720acd ([NET]: Make /proc/net a symlink on /proc/self/net (v3))
      broke ganglia and probably other applications that read /proc/net/dev.
      
      This is due to the change of permissions of /proc/net that was
      introduced in that commit.
      
      Before: dr-xr-xr-x 5 root root 0 Mar 19 11:30 /proc/net
      After: dr-xr--r-- 5 root root 0 Mar 19 11:29 /proc/self/net
      
      This patch restores the permissions to the old value which makes
      ganglia happy again.
      
      Pavel Emelyanov says:
      
      	This also broke the postfix, as it was reported in bug #10286
      	and described in detail by Benjamin.
      Signed-off-by: Andre Noll <maan@systemlinux.org>
      Acked-by: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
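The two permission strings above correspond to modes 0555 and 0544. The sketch below uses the standard `<sys/stat.h>` bit macros to show what the second mode takes away; the helper names are hypothetical and the code does not touch /proc itself.

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: may users outside the owner and group traverse
 * a directory with this mode? */
static bool others_can_search(mode_t mode)
{
    return (mode & S_IXOTH) != 0;
}

/* Hypothetical helper: may members of the owning group traverse it? */
static bool group_can_search(mode_t mode)
{
    return (mode & S_IXGRP) != 0;
}
```

With 0544 (dr-xr--r--) both helpers return false, so an unprivileged process can list the directory but cannot open entries such as /proc/net/dev through it, which is consistent with the breakage reported for ganglia.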
    • [SCTP]: Fix a race between module load and protosw access · 270637ab
      Vlad Yasevich authored
      There is a race in SCTP between the loading of the module
      and the access by the socket layer to the protocol functions.
      In particular, a list of addresses that SCTP maintains is
      not initialized prior to the registration with the protosw.
      Thus it is possible for a user application to gain access
      to SCTP functions before everything has been initialized.
      The problem shows up as odd crashes during connection
      initialization when we try to access the SCTP address list.
      
      The solution is to refactor how we do registration and
      initialize the lists prior to registering with the protosw.
      Care must be taken since the address list initialization
      depends on some other pieces of SCTP initialization.  Also
      the clean-up in case of failure now also needs to be refactored.
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
      Acked-by: Sridhar Samudrala <sri@us.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NETFILTER]: ipt_recent: sanity check hit count · d0ebf133
      Daniel Hokka Zakrisson authored
      If a rule using ipt_recent is created with a hit count greater than
      ip_pkt_list_tot, the rule will never match as it cannot keep track
      of enough timestamps. This patch makes ipt_recent refuse to create such
      rules.
      
      With ip_pkt_list_tot's default value of 20, the following can be used
      to reproduce the problem.
      
      nc -u -l 0.0.0.0 1234 &
      for i in `seq 1 100`; do echo $i | nc -w 1 -u 127.0.0.1 1234; done
      
      This limits it to 20 packets:
      iptables -A OUTPUT -p udp --dport 1234 -m recent --set --name test \
               --rsource
      iptables -A OUTPUT -p udp --dport 1234 -m recent --update --seconds \
               60 --hitcount 20 --name test --rsource -j DROP
      
      While this is unlimited:
      iptables -A OUTPUT -p udp --dport 1234 -m recent --set --name test \
               --rsource
      iptables -A OUTPUT -p udp --dport 1234 -m recent --update --seconds \
               60 --hitcount 21 --name test --rsource -j DROP
      
      With the patch the second rule-set will throw an EINVAL.
      Reported-by: Sean Kennedy <skennedy@vcn.com>
      Signed-off-by: Daniel Hokka Zakrisson <daniel@hozac.com>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
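The sanity check described in the commit above amounts to a single comparison at rule creation time. This is a minimal sketch under stated assumptions: `recent_check_hitcount` is a hypothetical name, and returning `-EINVAL` directly is a simplification of however the module reports the rejection that userspace sees as EINVAL.

```c
#include <errno.h>

/* Hypothetical rule check: a hit count larger than the number of
 * timestamps kept per entry can never be reached, so reject it. */
static int recent_check_hitcount(unsigned int hit_count,
                                 unsigned int ip_pkt_list_tot)
{
    if (hit_count > ip_pkt_list_tot)
        return -EINVAL;
    return 0;
}
```

With the default ip_pkt_list_tot of 20, `--hitcount 20` is accepted and `--hitcount 21` is refused, matching the two rule-sets in the reproduction steps.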