1. 19 May, 2014 11 commits
    • drivers/cdrom/cdrom.c: use kzalloc() for failing hardware · 6c8fbb40
      Jonathan Salwan authored
      commit 542db015 upstream
      
      In drivers/cdrom/cdrom.c, mmc_ioctl_cdrom_read_data() allocates a memory
      area with kmalloc() at line 2885.
      
        2885         cgc->buffer = kmalloc(blocksize, GFP_KERNEL);
        2886         if (cgc->buffer == NULL)
        2887                 return -ENOMEM;
      
      Line 2908 contains the copy_to_user() call:
      
        2908         if (!ret && copy_to_user(arg, cgc->buffer, blocksize))
      
      cgc->buffer is never cleared or initialized before this call. If ret is
      0 after the previous basic block, userspace can read some bytes of
      kernel memory.
      
      When we read a block from the disk it normally fills ->buffer, but if
      the drive is malfunctioning there is a chance that it is only
      partially filled.  The result is an information leak to userspace.
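
      A userspace analogy of the fix, as a minimal sketch: read_block() and
      read_data() are hypothetical names, and calloc() stands in for the
      zeroing that kzalloc() does in the kernel. Bytes the "drive" never
      writes read back as 0 instead of as leftover heap contents:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a drive read: a malfunctioning drive writes
 * only the first `filled` bytes of `buf` yet still reports success. */
static int read_block(unsigned char *buf, size_t size, size_t filled)
{
    if (filled > size)
        filled = size;
    memset(buf, 0x5A, filled);  /* pretend these bytes came from disk */
    return 0;
}

/* calloc() zeroes the buffer the way kzalloc() does, so bytes the drive
 * never wrote are 0 rather than whatever the allocator handed out. */
unsigned char *read_data(size_t blocksize, size_t filled)
{
    unsigned char *buf = calloc(1, blocksize);

    if (buf == NULL)
        return NULL;
    if (read_block(buf, blocksize, filled) != 0) {
        free(buf);
        return NULL;
    }
    return buf;  /* all `blocksize` bytes are now safe to hand out */
}
```

      For example, read_data(2048, 100) returns a buffer whose last 1948
      bytes are all zero.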
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • cpqarray: fix info leak in ida_locked_ioctl() · 81fd7a29
      Dan Carpenter authored
      commit 627aad1c upstream
      
      The pciinfo struct has a two-byte hole after ->dev_fn, so stack
      information could be leaked to the user.
      
      This was assigned CVE-2013-2147.
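
      A minimal sketch of the usual fix for this class of leak: zero the
      whole struct, padding included, before filling it. The struct below is
      a hypothetical layout matching the description (two one-byte fields
      followed by a 32-bit field), not the driver's actual definition:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: on common ABIs the compiler inserts a two-byte
 * hole after dev_fn so that board_id is 4-byte aligned. */
struct pci_info {
    uint8_t  bus;
    uint8_t  dev_fn;
    uint32_t board_id;
};

/* Fill the struct for copying to userspace. memset() clears the padding
 * bytes too, so no stale stack data travels with the real fields. */
void fill_pci_info(struct pci_info *info, uint8_t bus, uint8_t dev_fn,
                   uint32_t board_id)
{
    memset(info, 0, sizeof(*info));
    info->bus = bus;
    info->dev_fn = dev_fn;
    info->board_id = board_id;
}
```

      On common ABIs board_id sits at offset 4, so bytes 2 and 3 are the hole
      that the memset() clears.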
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Acked-by: Mike Miller <mike.miller@hp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • cciss: fix info leak in cciss_ioctl32_passthru() · b141d47e
      Dan Carpenter authored
      commit 58f09e00 upstream.
      
      The arg64 struct has a hole after ->buf_size which isn't cleared.  Also,
      if any of the copy_from_user() calls fail, that causes an information
      leak as well.
      
      This was assigned CVE-2013-2147.
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Acked-by: Mike Miller <mike.miller@hp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • kernel/kmod.c: check for NULL in call_usermodehelper_exec() · 023ae3b0
      Tetsuo Handa authored
      If /proc/sys/kernel/core_pattern contains only "|", a NULL pointer
      dereference happens upon core dump because argv_split("") returns
      argv[0] == NULL.
      
      This bug was once fixed by commit 264b83c0 ("usermodehelper: check
      subprocess_info->path != NULL") but was mistakenly reintroduced by commit
      7f57cfa4 ("usermodehelper: kill the sub_info->path[0] check").
      
      This bug seems to have existed since 2.6.19 (the version in which core
      dump to pipe was added).  Depending on kernel version and config, side
      effects may occur immediately after this oops (e.g. a kernel panic with
      2.6.32-358.18.1.el6).
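
      A minimal userspace sketch of the guard, assuming a hypothetical
      split_args() in place of argv_split() and a hypothetical run_helper()
      in place of call_usermodehelper_exec(). An empty command string yields
      argv[0] == NULL, which is rejected before it could be used as the
      program path (returning -EINVAL is this sketch's choice):

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MAX_ARGS 8

/* Hypothetical stand-in for argv_split(): splits `cmd` in place on
 * spaces. An empty string produces argc == 0 and argv[0] == NULL. */
static int split_args(char *cmd, char *argv[MAX_ARGS + 1])
{
    int argc = 0;

    for (char *tok = strtok(cmd, " "); tok != NULL && argc < MAX_ARGS;
         tok = strtok(NULL, " "))
        argv[argc++] = tok;
    argv[argc] = NULL;
    return argc;
}

/* Hypothetical stand-in for call_usermodehelper_exec(): refuse to run
 * when there is no program path instead of dereferencing argv[0]. */
int run_helper(char *cmd)
{
    char *argv[MAX_ARGS + 1];

    split_args(cmd, argv);
    if (argv[0] == NULL)
        return -EINVAL;
    /* ... would exec argv[0] with argv here ... */
    return 0;
}
```

      With core_pattern set to just "|", the part after the pipe is the empty
      string, which is exactly the case run_helper("") rejects.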
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      (cherry picked from commit 4c1c7be9)
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • staging: comedi: ni_65xx: (bug fix) confine insn_bits to one subdevice · 2ccae1f0
      Ian Abbott authored
      Commit 677a3156 upstream.
      
      The `insn_bits` handler `ni_65xx_dio_insn_bits()` has a `for` loop that
      currently writes (optionally) and reads back up to 5 "ports" consisting
      of 8 channels each.  It reads up to 32 1-bit channels but can only read
      and write a whole port at once - it needs to handle up to 5 ports as the
      first channel it reads might not be aligned on a port boundary.  It
      breaks out of the loop early if the next port it handles is beyond the
      final port on the card.  It also breaks out early on the 5th port in the
      loop if the first channel was aligned.  Unfortunately, it doesn't check
      that the current port it is dealing with belongs to the comedi subdevice
      the `insn_bits` handler is acting on.  That's a bug.
      
      Redo the `for` loop to terminate after the final port belonging to the
      subdevice, changing the loop variable in the process to simplify things
      a bit.  The `for` loop could now try to handle more than 5 ports if the
      subdevice has more than 40 channels, but the test `if (bitshift >= 32)`
      ensures it will break out early after 4 or 5 ports (depending on whether
      the first channel is aligned on a port boundary).  (`bitshift` will be
      between -7 and 7 inclusive on the first iteration, increasing by 8 for
      each subsequent iteration.)
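
      The reworked loop bounds can be sketched in userspace as follows. The
      8-channels-per-port layout and the two exit conditions follow the
      description above; ports_touched() and port_by_channel() are
      hypothetical names, and the count they return exists only to make the
      termination behaviour observable:

```c
#include <assert.h>

#define CHANNELS_PER_PORT 8

static int port_by_channel(unsigned int chan)
{
    return (int)(chan / CHANNELS_PER_PORT);
}

/* Returns how many ports an insn_bits request starting at base_chan
 * would touch on a subdevice with n_chan channels. */
int ports_touched(unsigned int base_chan, unsigned int n_chan)
{
    int last_port_offset, port_offset, count = 0;

    assert(n_chan > 0 && base_chan < n_chan);
    /* Never go past the final port that belongs to this subdevice. */
    last_port_offset = port_by_channel(n_chan - 1);
    for (port_offset = port_by_channel(base_chan);
         port_offset <= last_port_offset; port_offset++) {
        /* First channel of this port relative to base_chan. */
        int bitshift = port_offset * CHANNELS_PER_PORT - (int)base_chan;

        if (bitshift >= 32)
            break;   /* the 32-bit data word is exhausted */
        count++;     /* ... write and read back this port here ... */
    }
    return count;
}
```

      With 96 channels, ports_touched(0, 96) visits 4 ports and
      ports_touched(3, 96) visits 5; with 16 channels, ports_touched(0, 16)
      stops at 2 because the subdevice has no third port.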
      Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • intel-iommu: Flush unmaps at domain_exit · 92e44088
      Jitendra Bhivare authored
      Backported Alex Williamson's commit to 2.6.32.y
      http://git.kernel.org/linus/7b668357810ecb5fdda4418689d50f5d95aea6a8
      
      It resolves the following BUG when the module is immediately reloaded:
      
      kernel BUG at drivers/pci/iova.c:155!
      <snip>
      Call Trace:
      [<ffffffff812645c5>] intel_alloc_iova+0xb5/0xe0
      [<ffffffff8126725e>] __intel_map_single+0xbe/0x210
      [<ffffffff812674ae>] intel_alloc_coherent+0xae/0x120
      [<ffffffffa035f909>] be_queue_alloc+0xb9/0x140 [be2net]
      [<ffffffffa035fa5a>] be_rx_qs_create+0xca/0x370 [be2net]
      <snip>
      
      The issue is reproducible in 2.6.32.60 and can also be avoided by
      booting with intel_iommu=strict.
      Signed-off-by: Jitendra Bhivare <jitendra.bhivare@gmail.com>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • ipvs: fix CHECKSUM_PARTIAL for TCP, UDP · 030edd62
      Julian Anastasov authored
      Fix CHECKSUM_PARTIAL handling. Tested for IPv4 TCP; UDP was not
      tested because it needs a network card with HW CSUM support.
      This may fix the problem where IPVS cannot be used in virtual machines.
      The problem appears with DNAT to a local address when the local stack
      sends its reply in CHECKSUM_PARTIAL mode.
      
      Fix tcp_dnat_handler and udp_dnat_handler to provide
      vaddr and daddr in the right order (old and new IP) when calling
      tcp_partial_csum_update/udp_partial_csum_update (CHECKSUM_PARTIAL).
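
      Why the old/new order matters can be seen with the generic incremental
      checksum update from RFC 1624, HC' = ~(~HC + ~m + m'). This is a
      plain-C sketch of that technique, not the IPVS helpers themselves;
      csum_full() and csum_update_addr() are hypothetical names. Swapping
      the old and new address subtracts the new value and adds the old one,
      so the result no longer matches a full recomputation:

```c
#include <stddef.h>
#include <stdint.h>

/* Ones'-complement sum of big-endian 16-bit words. */
static uint32_t ocsum_add(uint32_t sum, const uint8_t *p, size_t len)
{
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)p[i] << 8 | p[i + 1];
    if (len & 1)
        sum += (uint32_t)p[len - 1] << 8;
    return sum;
}

/* Fold carries back into the low 16 bits. */
static uint16_t ocsum_fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Full checksum over a buffer: complement of the folded sum. */
uint16_t csum_full(const uint8_t *p, size_t len)
{
    return (uint16_t)~ocsum_fold(ocsum_add(0, p, len));
}

/* RFC 1624 eqn. 3 for a 32-bit address, applied to both 16-bit halves.
 * old_addr is subtracted (added as ~m) and new_addr is added (m'). */
uint16_t csum_update_addr(uint16_t check, uint32_t old_addr, uint32_t new_addr)
{
    uint32_t sum = (uint16_t)~check;

    sum += (uint16_t)~(old_addr >> 16);
    sum += (uint16_t)~(old_addr & 0xffff);
    sum += new_addr >> 16;
    sum += new_addr & 0xffff;
    return (uint16_t)~ocsum_fold(sum);
}
```

      With a buffer holding 192.168.0.1 followed by the words 0x1234 and
      0x5678, csum_update_addr(check, 0xC0A80001, 0x0A000002) equals the full
      recomputation after the address is rewritten to 10.0.0.2; swapping the
      last two arguments does not.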
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>
      (cherry picked from commit 5bc9068e)
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • x86, ptrace: fix build breakage with gcc 4.7 (second try) · 40c74e0d
      Willy Tarreau authored
      syscall_trace_enter() and syscall_trace_leave() are only called from
      within asm code and do not need to be declared in the .c at all.
      Removing these declarations fixes the build issue that occurred
      with gcc 4.7.
      
      Both Sven-Haegar Koch and Christoph Biedl confirmed this patch
      addresses their respective build issues.
      
      Cc: Sven-Haegar Koch <haegar@sdinet.de>
      Cc: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • Revert "x86, ptrace: fix build breakage with gcc 4.7" · 4d6afb4f
      Willy Tarreau authored
      This reverts commit 4ed3bb08.
      
      As reported by Sven-Haegar Koch, this patch breaks make headers_check:
      
         CHECK   include (0 files)
         CHECK   include/asm (54 files)
         /home/haegar/src/2.6.32/linux/usr/include/asm/ptrace.h:5: included file 'linux/linkage.h' is not exported
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • Fix lockup related to stop_machine being stuck in __do_softirq. · beca8411
      Ben Greear authored
      The stop machine logic can lock up if all but one of the migration
      threads make it through the disable-irq step and the one remaining
      thread gets stuck in __do_softirq.  The reason __do_softirq can hang is
      that it has a bail-out based on jiffies timeout, but in the lockup case,
      jiffies itself is not incremented.
      
      To work around this, re-add the max_restart counter in __do_softirq and
      stop processing irqs after 10 restarts.
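
      A userspace model of the bail-out logic, assuming a work source that
      re-raises itself on every pass; softirq_passes() and its parameters
      are hypothetical. With the clock frozen, the time check alone would
      never end the loop, while the restart counter caps it at 10 passes:

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_SOFTIRQ_RESTART 10

/* Returns how many processing passes run before the loop gives up.
 * When clock_advances is false, jiffies stays frozen, as in the
 * stop_machine lockup described above. */
unsigned int softirq_passes(unsigned long jiffies, unsigned long budget,
                            unsigned int max_restart, bool clock_advances)
{
    unsigned long end = jiffies + budget;  /* wraparound ignored here */
    unsigned int passes = 0;

    assert(max_restart > 0);
    for (;;) {
        passes++;               /* handle the pending softirqs */
        if (clock_advances)
            jiffies++;
        /* Restart only while inside the time budget AND under the cap. */
        if (jiffies < end && --max_restart)
            continue;
        break;                  /* the kernel would defer to ksoftirqd */
    }
    return passes;
}
```

      softirq_passes(0, 2, MAX_SOFTIRQ_RESTART, false) returns 10 even though
      the clock never moves, while softirq_passes(0, 3, MAX_SOFTIRQ_RESTART,
      true) stops after 3 passes on the time check.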
      
      Thanks to Tejun Heo and Rusty Russell and others for helping me track
      this down.
      
      This was introduced in 3.9 by commit c10d7367 ("softirq: reduce
      latencies").
      
      It may be worth looking into ath9k to see if it has issues with its irq
      handler at a later date.
      
      The hang stack traces look something like this:
      
          ------------[ cut here ]------------
          WARNING: at kernel/watchdog.c:245 watchdog_overflow_callback+0x9c/0xa7()
          Watchdog detected hard LOCKUP on cpu 2
          Modules linked in: ath9k ath9k_common ath9k_hw ath mac80211 cfg80211 nfsv4 auth_rpcgss nfs fscache nf_nat_ipv4 nf_nat veth 8021q garp stp mrp llc pktgen lockd sunrpc]
          Pid: 23, comm: migration/2 Tainted: G         C   3.9.4+ #11
          Call Trace:
           <NMI>   warn_slowpath_common+0x85/0x9f
            warn_slowpath_fmt+0x46/0x48
            watchdog_overflow_callback+0x9c/0xa7
            __perf_event_overflow+0x137/0x1cb
            perf_event_overflow+0x14/0x16
            intel_pmu_handle_irq+0x2dc/0x359
            perf_event_nmi_handler+0x19/0x1b
            nmi_handle+0x7f/0xc2
            do_nmi+0xbc/0x304
            end_repeat_nmi+0x1e/0x2e
           <<EOE>>
            cpu_stopper_thread+0xae/0x162
            smpboot_thread_fn+0x258/0x260
            kthread+0xc7/0xcf
            ret_from_fork+0x7c/0xb0
          ---[ end trace 4947dfa9b0a4cec3 ]---
          BUG: soft lockup - CPU#1 stuck for 22s! [migration/1:17]
          Modules linked in: ath9k ath9k_common ath9k_hw ath mac80211 cfg80211 nfsv4 auth_rpcgss nfs fscache nf_nat_ipv4 nf_nat veth 8021q garp stp mrp llc pktgen lockd sunrpc]
          irq event stamp: 835637905
          hardirqs last  enabled at (835637904): __do_softirq+0x9f/0x257
          hardirqs last disabled at (835637905): apic_timer_interrupt+0x6d/0x80
          softirqs last  enabled at (5654720): __do_softirq+0x1ff/0x257
          softirqs last disabled at (5654725): irq_exit+0x5f/0xbb
          CPU 1
          Pid: 17, comm: migration/1 Tainted: G        WC   3.9.4+ #11 To be filled by O.E.M. To be filled by O.E.M./To be filled by O.E.M.
          RIP: tasklet_hi_action+0xf0/0xf0
          Process migration/1
          Call Trace:
           <IRQ>
            __do_softirq+0x117/0x257
            irq_exit+0x5f/0xbb
            smp_apic_timer_interrupt+0x8a/0x98
            apic_timer_interrupt+0x72/0x80
           <EOI>
            printk+0x4d/0x4f
            stop_machine_cpu_stop+0x22c/0x274
            cpu_stopper_thread+0xae/0x162
            smpboot_thread_fn+0x258/0x260
            kthread+0xc7/0xcf
            ret_from_fork+0x7c/0xb0
      Signed-off-by: Ben Greear <greearb@candelatech.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Acked-by: Pekka Riikonen <priikone@iki.fi>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: stable@kernel.org
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      (cherry picked from commit 34376a50)
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • scsi: fix missing include linux/types.h in scsi_netlink.h · 9bf823ec
      Thomas Bork authored
      Thomas Bork reported that commit c6203cd4 ("scsi: use __uX
      types for headers exported to user space") caused a regression:
      he now gets this warning:
      
      > /usr/src/linux-2.6.32-eisfair-1/usr/include/scsi/scsi_netlink.h:108:
      > found __[us]{8,16,32,64} type without #include <linux/types.h>
      
      This issue was addressed later by commit 10db4e1e ("headers:
      include linux/types.h where appropriate"), so let's just pick the
      relevant part from it.
      
      Cc: Thomas Bork <tom@eisfair.net>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
  2. 10 Jun, 2013 29 commits
    • Linux 2.6.32.61 · feb908dd
      Willy Tarreau authored
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • x86, ptrace: fix build breakage with gcc 4.7 · 4ed3bb08
      Willy Tarreau authored
      Christoph Biedl reported that 2.6.32 does not build with gcc 4.7 on
      i386:
      
        CC      arch/x86/kernel/ptrace.o
      arch/x86/kernel/ptrace.c:1472:17: error: conflicting types for 'syscall_trace_enter'
      In file included from /PKGBUILDDIR/arch/x86/include/asm/vm86.h:130:0,
                       from /PKGBUILDDIR/arch/x86/include/asm/processor.h:10,
                       from /PKGBUILDDIR/arch/x86/include/asm/thread_info.h:22,
                       from include/linux/thread_info.h:56,
                       from include/linux/preempt.h:9,
                       from include/linux/spinlock.h:50,
                       from include/linux/seqlock.h:29,
                       from include/linux/time.h:8,
                       from include/linux/timex.h:56,
                       from include/linux/sched.h:56,
                       from arch/x86/kernel/ptrace.c:11:
      /PKGBUILDDIR/arch/x86/include/asm/ptrace.h:145:13: note: previous declaration of 'syscall_trace_enter' was here
      arch/x86/kernel/ptrace.c:1517:17: error: conflicting types for 'syscall_trace_leave'
      In file included from /PKGBUILDDIR/arch/x86/include/asm/vm86.h:130:0,
                       from /PKGBUILDDIR/arch/x86/include/asm/processor.h:10,
                       from /PKGBUILDDIR/arch/x86/include/asm/thread_info.h:22,
                       from include/linux/thread_info.h:56,
                       from include/linux/preempt.h:9,
                       from include/linux/spinlock.h:50,
                       from include/linux/seqlock.h:29,
                       from include/linux/time.h:8,
                       from include/linux/timex.h:56,
                       from include/linux/sched.h:56,
                       from arch/x86/kernel/ptrace.c:11:
      /PKGBUILDDIR/arch/x86/include/asm/ptrace.h:146:13: note: previous declaration of 'syscall_trace_leave' was here
      make[4]: *** [arch/x86/kernel/ptrace.o] Error 1
      make[3]: *** [arch/x86/kernel] Error 2
      make[3]: *** Waiting for unfinished jobs....
      
      He also found that this issue does not appear in more recent kernels
      because asmregparm disappeared in 3.0-rc1 with commit 1b4ac2a9, which was
      applied after some UM changes that we don't necessarily want in 2.6.32.
      
      Thus, the cleanest fix for older kernels is to make the declaration in
      ptrace.h match the one in ptrace.c by specifying asmregparm on these
      functions. They're only called from asm, which explains why it used to
      work despite the inconsistent declarations.
      Reported-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
      Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • mpt2sas: Send default descriptor for RAID pass through in mpt2ctl · ad6bb568
      Kashyap, Desai authored
      commit ebda4d38 upstream.
      
      RAID_SCSI_IO_PASSTHROUGH: the driver needs to send the default
      descriptor for RAID passthrough; currently it sends the SCSI_IO descriptor.
      Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • tipc: fix info leaks via msg_name in recv_msg/recv_stream · 0736a717
      Mathias Krause authored
      commit 60085c3d upstream.
      
      The code in set_orig_addr() does not initialize all of the members of
      struct sockaddr_tipc when filling the sockaddr info -- namely the union
      is only partly filled. This will make recv_msg() and recv_stream() --
      the only users of this function -- leak kernel stack memory as the
      msg_name member is a local variable in net/socket.c.
      
      Additionally, both recv_msg() and recv_stream() fail to set the
      msg_namelen member to 0 while otherwise returning 0, i.e.
      "success". This is the case for, e.g., non-blocking sockets. This
      leads to a 128-byte kernel stack leak in net/socket.c.
      
      Fix the first issue by initializing the memory of the union with
      memset(0). Fix the second one by setting msg_namelen to 0 early as it
      will be updated later if we're going to fill the msg_name member.
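
      The "set msg_namelen to 0 early" pattern can be sketched in userspace
      as follows. struct addr_sim, struct msg_sim and proto_recvmsg() are
      hypothetical, simplified stand-ins for a sockaddr, a msghdr and a
      protocol's recvmsg handler:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for a sockaddr and a msghdr. */
struct addr_sim {
    unsigned short family;
    unsigned char data[14];
};

struct msg_sim {
    struct addr_sim *name;  /* address buffer, may be NULL */
    size_t namelen;         /* bytes of *name that are valid on return */
};

int proto_recvmsg(struct msg_sim *msg, bool have_data, unsigned short family)
{
    /* Set early: every return path below reports "no address"
     * unless the address is actually filled in. */
    msg->namelen = 0;

    if (!have_data)
        return 0;  /* the early "success" path described above */

    if (msg->name != NULL) {
        memset(msg->name, 0, sizeof(*msg->name));  /* no stale bytes */
        msg->name->family = family;
        msg->namelen = sizeof(*msg->name);
    }
    /* ... copy the payload here ... */
    return 0;
}
```

      The generic code then copies back only namelen bytes, so the early
      return path copies nothing instead of 128 bytes of stack.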
      
      Cc: Jon Maloy <jon.maloy@ericsson.com>
      Cc: Allan Stephens <allan.stephens@windriver.com>
      Signed-off-by: Mathias Krause <minipli@googlemail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      [dannf: backported to Debian's 2.6.32]
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • irda: Fix missing msg_namelen update in irda_recvmsg_dgram() · 21f908c9
      Mathias Krause authored
      commit 5ae94c0d upstream.
      
      The current code does not fill the msg_name member in case it is set.
      It also does not set the msg_namelen member to 0 and therefore makes
      net/socket.c leak the local, uninitialized sockaddr_storage variable
      to userland -- 128 bytes of kernel stack memory.
      
      Fix that by simply setting msg_namelen to 0 as obviously nobody cared
      about irda_recvmsg_dgram() not filling the msg_name in case it was
      set.
      
      Cc: Samuel Ortiz <samuel@sortiz.org>
      Signed-off-by: Mathias Krause <minipli@googlemail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      [dannf: adjusted to apply to Debian's 2.6.32]
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • rose: fix info leak via msg_name in rose_recvmsg() · 3cab351d
      Mathias Krause authored
      [ Upstream commit 4a184233 ]
      
      The code in rose_recvmsg() does not initialize all of the members of
      struct sockaddr_rose/full_sockaddr_rose when filling the sockaddr info.
      Nor does it initialize the padding bytes of the structure inserted by
      the compiler for alignment. This will lead to leaking uninitialized
      kernel stack bytes in net/socket.c.
      
      Fix the issue by initializing the memory used for sockaddr info with
      memset(0).
      Signed-off-by: Mathias Krause <minipli@googlemail.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • rds: set correct msg_namelen · a86c9b39
      Weiping Pan authored
      commit 06b6a1cf upstream
      
      Jay Fenlason (fenlason@redhat.com) found a bug: recvfrom() on an RDS
      socket can return the contents of random kernel memory to userspace if
      it is called with an address length larger than
      sizeof(struct sockaddr_in).
      rds_recvmsg() also fails to set the addr_len parameter properly before
      returning, but that's just a bug.
      There are also a number of cases where recvfrom() can return an entirely bogus
      address. Anything in rds_recvmsg() that returns a non-negative value but does
      not go through the "sin = (struct sockaddr_in *)msg->msg_name;" code path
      at the end of the while(1) loop will return up to 128 bytes of kernel memory
      to userspace.
      
      I wrote two test programs to reproduce this bug; in rds_server you will
      see that fromAddr is overwritten and the sock_fd that follows it is
      destroyed.
      Yes, it is the programmer's fault to set msg_namelen incorrectly, but it is
      better for the kernel to copy the real address length to userspace in
      such a case.
      
      How to run the test programs?
      I tested them on a 32-bit x86 system running 3.5.0-rc7.
      
      1 compile
      gcc -o rds_client rds_client.c
      gcc -o rds_server rds_server.c
      
      2 run ./rds_server on one console
      
      3 run ./rds_client on another console
      
      4 you will see something like:
      server is waiting to receive data...
      old socket fd=3
      server received data from client:data from client
      msg.msg_namelen=32
      new socket fd=-1067277685
      sendmsg()
      : Bad file descriptor
      
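      /***************** rds_client.c ********************/
      
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      #include <unistd.h>
      #include <sys/socket.h>
      #include <netinet/in.h>
      #include <arpa/inet.h>
      
      int main(void)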
      {
      	int sock_fd;
      	struct sockaddr_in serverAddr;
      	struct sockaddr_in toAddr;
      	char recvBuffer[128] = "data from client";
      	struct msghdr msg;
      	struct iovec iov;
      
      	sock_fd = socket(AF_RDS, SOCK_SEQPACKET, 0);
      	if (sock_fd < 0) {
      		perror("create socket error\n");
      		exit(1);
      	}
      
      	memset(&serverAddr, 0, sizeof(serverAddr));
      	serverAddr.sin_family = AF_INET;
      	serverAddr.sin_addr.s_addr = inet_addr("127.0.0.1");
      	serverAddr.sin_port = htons(4001);
      
      	if (bind(sock_fd, (struct sockaddr*)&serverAddr, sizeof(serverAddr)) < 0) {
      		perror("bind() error\n");
      		close(sock_fd);
      		exit(1);
      	}
      
      	memset(&toAddr, 0, sizeof(toAddr));
      	toAddr.sin_family = AF_INET;
      	toAddr.sin_addr.s_addr = inet_addr("127.0.0.1");
      	toAddr.sin_port = htons(4000);
      	msg.msg_name = &toAddr;
      	msg.msg_namelen = sizeof(toAddr);
      	msg.msg_iov = &iov;
      	msg.msg_iovlen = 1;
      	msg.msg_iov->iov_base = recvBuffer;
      	msg.msg_iov->iov_len = strlen(recvBuffer) + 1;
      	msg.msg_control = 0;
      	msg.msg_controllen = 0;
      	msg.msg_flags = 0;
      
      	if (sendmsg(sock_fd, &msg, 0) == -1) {
      		perror("sendto() error\n");
      		close(sock_fd);
      		exit(1);
      	}
      
      	printf("client send data:%s\n", recvBuffer);
      
      	memset(recvBuffer, '\0', 128);
      
      	msg.msg_name = &toAddr;
      	msg.msg_namelen = sizeof(toAddr);
      	msg.msg_iov = &iov;
      	msg.msg_iovlen = 1;
      	msg.msg_iov->iov_base = recvBuffer;
      	msg.msg_iov->iov_len = 128;
      	msg.msg_control = 0;
      	msg.msg_controllen = 0;
      	msg.msg_flags = 0;
      	if (recvmsg(sock_fd, &msg, 0) == -1) {
      		perror("recvmsg() error\n");
      		close(sock_fd);
      		exit(1);
      	}
      
      	printf("receive data from server:%s\n", recvBuffer);
      
      	close(sock_fd);
      
      	return 0;
      }
      
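      /***************** rds_server.c ********************/
      
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      #include <unistd.h>
      #include <sys/socket.h>
      #include <netinet/in.h>
      #include <arpa/inet.h>
      
      int main(void)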
      {
      	struct sockaddr_in fromAddr;
      	int sock_fd;
      	struct sockaddr_in serverAddr;
      	unsigned int addrLen;
      	char recvBuffer[128];
      	struct msghdr msg;
      	struct iovec iov;
      
      	sock_fd = socket(AF_RDS, SOCK_SEQPACKET, 0);
      	if(sock_fd < 0) {
      		perror("create socket error\n");
      		exit(1);
      	}
      
      	memset(&serverAddr, 0, sizeof(serverAddr));
      	serverAddr.sin_family = AF_INET;
      	serverAddr.sin_addr.s_addr = inet_addr("127.0.0.1");
      	serverAddr.sin_port = htons(4000);
      	if (bind(sock_fd, (struct sockaddr*)&serverAddr, sizeof(serverAddr)) < 0) {
      		perror("bind error\n");
      		close(sock_fd);
      		exit(1);
      	}
      
      	printf("server is waiting to receive data...\n");
      	msg.msg_name = &fromAddr;
      
      	/*
      	 * I add 16 to sizeof(fromAddr), ie 32,
      	 * and pay attention to the definition of fromAddr,
      	 * recvmsg() will overwrite sock_fd,
      	 * since kernel will copy 32 bytes to userspace.
      	 *
      	 * If you just use sizeof(fromAddr), it works fine.
      	 * */
      	msg.msg_namelen = sizeof(fromAddr) + 16;
      	/* msg.msg_namelen = sizeof(fromAddr); */
      	msg.msg_iov = &iov;
      	msg.msg_iovlen = 1;
      	msg.msg_iov->iov_base = recvBuffer;
      	msg.msg_iov->iov_len = 128;
      	msg.msg_control = 0;
      	msg.msg_controllen = 0;
      	msg.msg_flags = 0;
      
      	while (1) {
      		printf("old socket fd=%d\n", sock_fd);
      		if (recvmsg(sock_fd, &msg, 0) == -1) {
      			perror("recvmsg() error\n");
      			close(sock_fd);
      			exit(1);
      		}
      		printf("server received data from client:%s\n", recvBuffer);
      		printf("msg.msg_namelen=%d\n", msg.msg_namelen);
      		printf("new socket fd=%d\n", sock_fd);
      		strcat(recvBuffer, "--data from server");
      		if (sendmsg(sock_fd, &msg, 0) == -1) {
      			perror("sendmsg()\n");
      			close(sock_fd);
      			exit(1);
      		}
      	}
      
      	close(sock_fd);
      	return 0;
      }
      Signed-off-by: Weiping Pan <wpan@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      [dannf: Adjusted to apply to Debian's 2.6.32]
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • llc: Fix missing msg_namelen update in llc_ui_recvmsg() · 5f527802
      Mathias Krause authored
      [ Upstream commit c77a4b9c ]
      
      For stream sockets the code fails to set the msg_namelen member
      to 0 and therefore makes net/socket.c leak the local, uninitialized
      sockaddr_storage variable to userland -- 128 bytes of kernel stack
      memory. The msg_namelen update is also missing for datagram sockets
      in case the socket is shutting down during receive.
      
      Fix both issues by setting msg_namelen to 0 early. It will be
      updated later if we're going to fill the msg_name member.
      Signed-off-by: Mathias Krause <minipli@googlemail.com>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • llc: fix info leak via getsockname() · 3c445af8
      Mathias Krause authored
      [ Upstream commit 3592aaeb ]
      
      The LLC code wrongly returns 0, i.e. "success", when the socket is
      zapped. Together with the uninitialized uaddrlen pointer argument from
      sys_getsockname, this leads to an arbitrary leak of up to 128 bytes of
      kernel stack memory via the getsockname() syscall.
      
      Return an error instead when the socket is zapped to prevent the info
      leak. Also remove the unnecessary memset(0). We don't write directly to
      the memory pointed to by uaddr but memcpy() a local structure into it at
      the end of the function that is properly initialized.
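
      A minimal sketch of the fixed control flow, with hypothetical names
      (struct llc_addr_sim, sim_getname()) and a simplified address layout.
      The -EBADF error value is this sketch's assumption; the point is that a
      zapped socket returns an error so the caller copies nothing back, and
      the success path memcpy()s a fully initialized local structure:

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified LLC-style address. */
struct llc_addr_sim {
    unsigned short family;
    unsigned char mac[6];
};

/* *uaddr and *len are only written on success. */
int sim_getname(bool zapped, const unsigned char mac[6],
                struct llc_addr_sim *uaddr, int *len)
{
    struct llc_addr_sim local;

    if (zapped)
        return -EBADF;  /* assumed error value; caller copies nothing */

    memset(&local, 0, sizeof(local));  /* clears padding as well */
    local.family = 26;                 /* AF_LLC on Linux */
    memcpy(local.mac, mac, sizeof(local.mac));
    memcpy(uaddr, &local, sizeof(local));
    *len = (int)sizeof(local);
    return 0;
}
```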
      Signed-off-by: Mathias Krause <minipli@googlemail.com>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • iucv: Fix missing msg_namelen update in iucv_sock_recvmsg() · 775ad2a8
      Mathias Krause authored
      [ Upstream commit a5598bd9 ]
      
      The current code does not fill the msg_name member in case it is set.
      It also does not set the msg_namelen member to 0 and therefore makes
      net/socket.c leak the local, uninitialized sockaddr_storage variable
      to userland -- 128 bytes of kernel stack memory.
      
      Fix that by simply setting msg_namelen to 0 as obviously nobody cared
      about iucv_sock_recvmsg() not filling the msg_name in case it was set.
      Signed-off-by: Mathias Krause <minipli@googlemail.com>
      Cc: Ursula Braun <ursula.braun@de.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • isdnloop: fix and simplify isdnloop_init() · 3822b3c5
      Wu Fengguang authored
      [ Upstream commit 77f00f63 ]
      
      Fix a buffer overflow bug by removing the revision and printk.
      
      [   22.016214] isdnloop-ISDN-driver Rev 1.11.6.7
      [   22.097508] isdnloop: (loop0) virtual card added
      [   22.174400] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffff83244972
      [   22.174400]
      [   22.436157] Pid: 1, comm: swapper Not tainted 3.5.0-bisect-00018-gfa8bbb13-dirty #129
      [   22.624071] Call Trace:
      [   22.720558]  [<ffffffff832448c3>] ? CallcNew+0x56/0x56
      [   22.815248]  [<ffffffff8222b623>] panic+0x110/0x329
      [   22.914330]  [<ffffffff83244972>] ? isdnloop_init+0xaf/0xb1
      [   23.014800]  [<ffffffff832448c3>] ? CallcNew+0x56/0x56
      [   23.090763]  [<ffffffff8108e24b>] __stack_chk_fail+0x2b/0x30
      [   23.185748]  [<ffffffff83244972>] isdnloop_init+0xaf/0xb1
      Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • ax25: fix info leak via msg_name in ax25_recvmsg() · 92df2f60
      Mathias Krause authored
      [ Upstream commit ef3313e8 ]
      
      When msg_namelen is non-zero the sockaddr info gets filled out, as
      requested, but the code fails to initialize the padding bytes of struct
      sockaddr_ax25 inserted by the compiler for alignment. Additionally the
      msg_namelen value is updated to sizeof(struct full_sockaddr_ax25) but is
      not always filled up to this size.
      
      Both issues cause the code to leak uninitialized kernel stack bytes
      in net/socket.c.
      
      Fix both issues by initializing the memory with memset(0).
      Signed-off-by: Mathias Krause <minipli@googlemail.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • Mathias Krause's avatar
      atm: fix info leak in getsockopt(SO_ATMPVC) · 55dde8cf
      Mathias Krause authored
      commit e862f1a9 upstream.
      
      The ATM code fails to initialize the two padding bytes of struct
      sockaddr_atmpvc inserted for alignment. Add an explicit memset(0)
      before filling the structure to avoid the info leak.
      Signed-off-by: Mathias Krause <minipli@googlemail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      [bwh: Backported to 2.6.32: adjust context, indentation]
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      55dde8cf
    • Mathias Krause's avatar
      atm: fix info leak via getsockname() · dde45d39
      Mathias Krause authored
      commit 3c0c5cfd upstream.
      
      The ATM code fails to initialize the two padding bytes of struct
      sockaddr_atmpvc inserted for alignment. Add an explicit memset(0)
      before filling the structure to avoid the info leak.
      Signed-off-by: Mathias Krause <minipli@googlemail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      [bwh: Backported to 2.6.32: adjust context]
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      dde45d39
    • Mathias Krause's avatar
      atm: update msg_namelen in vcc_recvmsg() · 531539ab
      Mathias Krause authored
      [ Upstream commit 9b3e617f ]
      
      The current code does not fill the msg_name member in case it is set.
      It also does not set the msg_namelen member to 0 and therefore makes
      net/socket.c leak the local, uninitialized sockaddr_storage variable
      to userland -- 128 bytes of kernel stack memory.
      
      Fix that by simply setting msg_namelen to 0 as obviously nobody cared
      about vcc_recvmsg() not filling the msg_name in case it was set.
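      The failure mode can be modelled in a few lines. This is a simplified sketch, not net/socket.c: `demo_sys_recvmsg()` and `struct demo_msg` are hypothetical, and the generic layer here copies back `msg.namelen` bytes of its local buffer. A handler that does not fill the name must report a length of 0, otherwise the untouched buffer would be copied out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified model of a recvmsg() path: the
 * generic layer hands the protocol a name buffer and a length, then
 * copies `namelen` bytes of that buffer back to the user. */
struct demo_msg {
	void *name;
	size_t namelen;
};

/* A protocol handler that, like vcc_recvmsg(), never fills msg->name.
 * Setting namelen to 0 tells the generic layer there is nothing to copy. */
static void demo_proto_recvmsg(struct demo_msg *msg)
{
	assert(msg != NULL);
	msg->namelen = 0;
}

/* Generic layer: returns how many bytes of `kbuf` would reach the user. */
static size_t demo_sys_recvmsg(unsigned char *kbuf, size_t kbuf_len,
			       unsigned char *ubuf)
{
	/* length preset to the full (uninitialized) buffer size */
	struct demo_msg msg = { kbuf, kbuf_len };

	demo_proto_recvmsg(&msg);
	if (msg.namelen > kbuf_len)
		msg.namelen = kbuf_len;
	memcpy(ubuf, kbuf, msg.namelen);	/* stands in for the copy to userland */
	return msg.namelen;
}
```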
      Signed-off-by: Mathias Krause <minipli@googlemail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      531539ab
    • Mathias Krause's avatar
      ipvs: fix info leak in getsockopt(IP_VS_SO_GET_TIMEOUT) · 75ca2088
      Mathias Krause authored
      commit 2d8a041b upstream.
      
      If at least one of CONFIG_IP_VS_PROTO_TCP or CONFIG_IP_VS_PROTO_UDP is
      not set, __ip_vs_get_timeouts() does not fully initialize the structure
      that gets copied to userland, thereby leaking up to 12 bytes of kernel
      stack. Add an explicit memset(0) before passing the structure to
      __ip_vs_get_timeouts() to avoid the info leak.
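      A sketch of the fix, assuming a hypothetical `struct demo_timeouts` with three int fields (12 bytes on common ABIs) and illustrative DEMO_PROTO_TCP / DEMO_PROTO_UDP macros in place of the real Kconfig options. With neither macro defined, the getter writes nothing, so only the preceding memset() keeps the copied-out structure clean.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the structure returned by
 * IP_VS_SO_GET_TIMEOUT: three int fields. */
struct demo_timeouts {
	int tcp_timeout;
	int tcp_fin_timeout;
	int udp_timeout;
};

/* Mimics a getter whose fields are only filled when the matching
 * protocol support is compiled in. The macros and values are
 * illustrative, not real Kconfig symbols or IPVS defaults. */
static void demo_get_timeouts(struct demo_timeouts *u)
{
	assert(u != NULL);
#ifdef DEMO_PROTO_TCP
	u->tcp_timeout = 900;
	u->tcp_fin_timeout = 120;
#endif
#ifdef DEMO_PROTO_UDP
	u->udp_timeout = 300;
#endif
}

/* The fix: clear the whole structure before handing it to the getter,
 * so fields for compiled-out protocols read as 0 instead of stack data. */
static void demo_ioctl_get_timeout(struct demo_timeouts *out)
{
	struct demo_timeouts t;

	memset(&t, 0, sizeof(t));
	demo_get_timeouts(&t);
	*out = t;	/* stands in for copy_to_user() */
}
```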
      Signed-off-by: Mathias Krause <minipli@googlemail.com>
      Cc: Wensong Zhang <wensong@linux-vs.org>
      Cc: Simon Horman <horms@verge.net.au>
      Cc: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      [bwh: Backported to 2.6.32: adjust context]
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      75ca2088
    • Jesper Dangaard Brouer's avatar
      ipvs: IPv6 MTU checking cleanup and bugfix · 9df2c9ad
      Jesper Dangaard Brouer authored
      Clean up the IPv6 MTU checking in the IPVS xmit code by using a
      common helper function, __mtu_check_toobig_v6().
      
      The MTU check for tunnel mode can also use this helper, as
      ntohs(old_iph->payload_len) + sizeof(struct ipv6hdr) is equal to
      skb->len, and the 'mtu' variable has been adjusted before calling
      the helper.
      
      Note that this also fixes a bug, as the MTU check in
      ip_vs_dr_xmit_v6() was missing a check for skb_is_gso().
      
      This bug caused issues for e.g. KVM IPVS setups, where different
      segmentation offloading techniques are used between guests via the
      virtio driver. This resulted in very bad performance, because the
      ICMPv6 "too big" messages did not affect the sender.
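      The core of the shared check can be sketched as below. `struct demo_skb` is a hypothetical two-field stand-in for struct sk_buff, and the sketch omits whatever else the real __mtu_check_toobig_v6() examines; it only shows why the skb_is_gso() test matters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, minimal stand-in for the two sk_buff properties the
 * check needs; the real helper operates on struct sk_buff. */
struct demo_skb {
	unsigned int len;	/* total packet length in bytes */
	bool is_gso;		/* will be segmented by GSO before transmit */
};

/* A packet is "too big" only if it exceeds the MTU *and* is not a GSO
 * packet. GSO packets are split into MTU-sized segments later, so an
 * ICMPv6 "too big" for them is wrong; leaving out the !is_gso test is
 * the ip_vs_dr_xmit_v6() bug described above. */
static bool demo_mtu_check_toobig_v6(const struct demo_skb *skb,
				     unsigned int mtu)
{
	assert(skb != NULL);
	return skb->len > mtu && !skb->is_gso;
}
```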
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      (cherry picked from commit 590e3f79)
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      9df2c9ad
    • Simon Horman's avatar
      ipvs: allow transmit of GRO aggregated skbs · ec3dc8cd
      Simon Horman authored
      An attempt at allowing LVS to transmit skbs longer than the MTU that
      have been aggregated by GRO and can thus be segmented again by GSO.
      
      Cc: Julian Anastasov <ja@ssi.bg>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Simon Horman <horms@verge.net.au>
      (cherry picked from commit 8f1b03a4)
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      ec3dc8cd
    • Jozsef Kadlecsik's avatar
      netfilter: nf_ct_ipv4: packets with wrong ihl are invalid · df7753cf
      Jozsef Kadlecsik authored
      commit 07153c6e upstream.
      
      It was reported that the Linux kernel sometimes logs:
      
      klogd: [2629147.402413] kernel BUG at net / netfilter /
      nf_conntrack_proto_tcp.c: 447!
      klogd: [1072212.887368] kernel BUG at net / netfilter /
      nf_conntrack_proto_tcp.c: 392
      
      ipv4_get_l4proto() in nf_conntrack_l3proto_ipv4.c and tcp_error() in
      nf_conntrack_proto_tcp.c should catch malformed packets, so the errors
      at the indicated lines - TCP options parsing - should not happen.
      However, tcp_error() relies on the "dataoff" offset to the TCP header,
      calculated by ipv4_get_l4proto().  But ipv4_get_l4proto() does not check
      bogus ihl values in IPv4 packets, which then can slip through tcp_error()
      and get caught at the TCP options parsing routines.
      
      The patch fixes ipv4_get_l4proto() by invalidating packets with bogus
      ihl value.
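      A userspace sketch of rejecting bogus ihl values before the derived offset is trusted. The function name is hypothetical and the exact checks in the upstream patch may differ; this version rejects an ihl below the 20-byte minimum and a header length that runs past the end of the packet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Compute the offset of the layer-4 header in a raw IPv4 packet, in the
 * spirit of ipv4_get_l4proto(). Returns false for packets whose ihl
 * field is bogus, so callers never parse TCP options at a wrong offset. */
static bool demo_ipv4_l4_offset(const uint8_t *pkt, size_t len,
				size_t *dataoff)
{
	unsigned int ihl;

	assert(pkt != NULL && dataoff != NULL);
	if (len < 20)			/* shorter than a minimal IPv4 header */
		return false;
	ihl = pkt[0] & 0x0f;		/* header length in 32-bit words */
	if (ihl < 5)			/* below the 20-byte minimum */
		return false;
	*dataoff = (size_t)ihl * 4;
	if (*dataoff > len)		/* header claims to extend past the packet */
		return false;
	return true;
}
```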
      
      The patch closes netfilter bugzilla id 771.
      Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Acked-by: David Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      df7753cf
    • Eric Dumazet's avatar
      ipv6: make fragment identifications less predictable · c023a0b4
      Eric Dumazet authored
      [ Backport of upstream commit 87c48fa3 ]
      
      Fernando Gont reported that the current IPv6 fragment identification
      generation was not secure, because it used a very predictable
      system-wide generator, allowing various attacks.
      
      IPv4 uses the inetpeer cache to address this problem and to get good
      performance. We'll use this mechanism when IPv6 inetpeer is stable
      enough, in linux-3.1.
      
      For the time being, we use jhash on the destination address to provide
      less predictable identifications. Also remove a spinlock and use
      cmpxchg() to get better SMP performance.
      Reported-by: Fernando Gont <fernando@gont.com.ar>
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
      [bwh: Backport further to 2.6.32]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      c023a0b4
    • Nicolas Dichtel's avatar
      ipv6: discard overlapping fragment · b1a1c38d
      Nicolas Dichtel authored
      commit 70789d70 upstream
      
      RFC5722 prohibits reassembling fragments when some data overlaps.
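      The RFC 5722 rule boils down to an interval-overlap test against the fragments already queued. A minimal sketch, assuming each fragment is described by a hypothetical half-open byte range [offset, end); on a hit, the caller would drop the whole reassembly queue rather than reassemble.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A received fragment covers payload bytes [offset, end). */
struct demo_frag {
	unsigned int offset;
	unsigned int end;
};

/* Two half-open ranges overlap iff each starts before the other ends.
 * Any overlap means the datagram must be discarded, per RFC 5722. */
static bool demo_frag_overlaps(const struct demo_frag *queued, size_t n,
			       struct demo_frag nf)
{
	assert(queued != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (nf.offset < queued[i].end && queued[i].offset < nf.end)
			return true;
	return false;
}
```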
      
      Bug spotted by Zhang Zuotao <zuotao.zhang@6wind.com>.
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      [dannf: backported to Debian's 2.6.32]
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      b1a1c38d
    • Daniel Borkmann's avatar
      net: sctp: sctp_auth_key_put: use kzfree instead of kfree · 9c51a966
      Daniel Borkmann authored
      [ Upstream commit 586c31f3 ]
      
      For sensitive data like keying material, it is common practice to zero
      out keys before returning the memory back to the allocator. Thus, use
      kzfree instead of kfree.
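      A userspace analogue of the zero-before-free idea, not the kernel's kzfree(): `demo_zfree()` is hypothetical and takes the size explicitly, and the wipe goes through a volatile pointer because a plain memset() immediately before free() may be removed by the compiler as a dead store.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Zero a buffer through a volatile pointer so the compiler cannot drop
 * the stores as dead writes. */
static void demo_secure_zero(void *p, size_t len)
{
	volatile unsigned char *v = p;

	assert(p != NULL || len == 0);
	while (len--)
		*v++ = 0;
}

/* Wipe, then release. Unlike kzfree(), which looks up the allocation
 * size itself, this sketch needs the caller to pass the size. */
static void demo_zfree(void *p, size_t len)
{
	if (p == NULL)
		return;
	demo_secure_zero(p, len);
	free(p);
}
```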
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Acked-by: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      9c51a966
    • Daniel Borkmann's avatar
      net: sctp: sctp_endpoint_free: zero out secret key data · 57201215
      Daniel Borkmann authored
      [ Upstream commit b5c37fe6 ]
      
      On sctp_endpoint_destroy, previously used sensitive keying material
      should be zeroed out before the memory is returned, as we already do
      with e.g. auth keys when released.
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Acked-by: Vlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      57201215
    • Daniel Borkmann's avatar
      net: sctp: sctp_setsockopt_auth_key: use kzfree instead of kfree · bae3ff4a
      Daniel Borkmann authored
      [ Upstream commit 6ba542a2 ]
      
      In sctp_setsockopt_auth_key, we create a temporary copy of the
      user-passed shared auth key for the endpoint or association and,
      after internal setup, we free it right away. Since it's sensitive
      data, we should zero out the key before returning the memory back to
      the allocator. Thus, use kzfree instead of kfree, just as we do in
      sctp_auth_key_put().
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      bae3ff4a
    • Tommi Rantala's avatar
      sctp: fix memory leak in sctp_datamsg_from_user() when copy from user space fails · 5e4b9c85
      Tommi Rantala authored
      [ Upstream commit be364c8c ]
      
      Trinity (the syscall fuzzer) discovered a memory leak in SCTP,
      reproducible e.g. with the sendto() syscall by passing an invalid
      user space pointer in the second argument:
      
       #include <string.h>
       #include <arpa/inet.h>
       #include <sys/socket.h>
      
       int main(void)
       {
               int fd;
               struct sockaddr_in sa;
      
               fd = socket(AF_INET, SOCK_STREAM, 132 /*IPPROTO_SCTP*/);
               if (fd < 0)
                       return 1;
      
               memset(&sa, 0, sizeof(sa));
               sa.sin_family = AF_INET;
               sa.sin_addr.s_addr = inet_addr("127.0.0.1");
               sa.sin_port = htons(11111);
      
               sendto(fd, NULL, 1, 0, (struct sockaddr *)&sa, sizeof(sa));
      
               return 0;
       }
      
      As far as I can tell, the leak has been around since ~2003.
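      The general shape of such a fix is a goto-based error path that also frees the object being filled when the copy fails, since it is not yet on the list that the common cleanup walks. The sketch below is hypothetical (demo_* names, a NULL source standing in for an invalid user pointer) and counts live allocations so that a leak would be observable.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical chunk list, standing in for the SCTP datamsg. */
struct demo_chunk {
	struct demo_chunk *next;
	char data[16];
};

static int demo_live_chunks;	/* allocation counter for the sketch */

static struct demo_chunk *demo_chunk_new(void)
{
	struct demo_chunk *c = calloc(1, sizeof(*c));

	if (c)
		demo_live_chunks++;
	return c;
}

static void demo_chunk_free(struct demo_chunk *c)
{
	free(c);
	demo_live_chunks--;
}

/* Stand-in for copying user data; fails for a NULL source, as
 * copy_from_user() does for an invalid pointer. */
static int demo_copy_from_user(char *dst, const char *src, size_t n)
{
	if (src == NULL)
		return -1;
	memcpy(dst, src, n);
	return 0;
}

/* Build `n` chunks from `src` (which must hold at least 16 bytes).
 * The chunk being filled is not on the list yet, so the error path
 * must free it explicitly before freeing the list. */
static struct demo_chunk *demo_msg_from_user(const char *src, int n)
{
	struct demo_chunk *head = NULL, *c;

	assert(n >= 0);
	for (int i = 0; i < n; i++) {
		c = demo_chunk_new();
		if (c == NULL)
			goto err_free_list;
		if (demo_copy_from_user(c->data, src, sizeof(c->data)))
			goto err_free_current;
		c->next = head;
		head = c;
	}
	return head;

err_free_current:
	demo_chunk_free(c);	/* the chunk the leaky version forgot */
err_free_list:
	while (head) {
		c = head->next;
		demo_chunk_free(head);
		head = c;
	}
	return NULL;
}
```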
      Signed-off-by: Tommi Rantala <tt.rantala@gmail.com>
      Acked-by: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      5e4b9c85
    • Mathias Krause's avatar
      dcbnl: fix various netlink info leaks · 0e03cad4
      Mathias Krause authored
      commit 29cd8ae0 upstream.
      
      The dcb netlink interface leaks stack memory in various places:
      * the perm_addr[] buffer is filled with at most 12 of its 32 bytes but
        copied completely,
      * no in-kernel driver fills all fields of an IEEE 802.1Qaz subcommand,
        so we're leaking up to 58 bytes for ieee_ets structs, up to 136 bytes
        for ieee_pfc structs, etc.,
      * the same is true for CEE -- no in-kernel driver fills the whole
        struct.
      
      Prevent all of the above stack info leaks by properly initializing the
      buffers/structures involved.
      Signed-off-by: Mathias Krause <minipli@googlemail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      [bwh: Backported to 2.6.32: no support for IEEE or CEE commands, so only
       deal with perm_addr]
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      0e03cad4
    • Paul Moore's avatar
      unix: fix a race condition in unix_release() · 480cabcc
      Paul Moore authored
      [ Upstream commit ded34e0f ]
      
      As reported by Jan, and others over the past few years, there is a
      race condition caused by unix_release setting the sock->sk pointer
      to NULL before properly marking the socket as dead/orphaned.  This
      can cause a problem with the LSM hook security_unix_may_send() if
      there is another socket attempting to write to this partially
      released socket in between when sock->sk is set to NULL and it is
      marked as dead/orphaned.  This patch fixes this by only setting
      sock->sk to NULL after the socket has been marked as dead; I also
      take the opportunity to make unix_release_sock() a void function
      as it only ever returned 0/success.
      
      Dave, I think this one should go on the -stable pile.
      
      Special thanks to Jan for coming up with a reproducer for this
      problem.
      Reported-by: Jan Stancek <jan.stancek@gmail.com>
      Signed-off-by: Paul Moore <pmoore@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      480cabcc
    • Eric Dumazet's avatar
      tcp: preserve ACK clocking in TSO · 0d99f344
      Eric Dumazet authored
      [ Upstream commit f4541d60 ]
      
      A long-standing problem with TSO is the fact that tcp_tso_should_defer()
      rearms the deferred timer, while it should not.
      
      The current code leads to the following bad bursty behavior:
      
      20:11:24.484333 IP A > B: . 297161:316921(19760) ack 1 win 119
      20:11:24.484337 IP B > A: . ack 263721 win 1117
      20:11:24.485086 IP B > A: . ack 265241 win 1117
      20:11:24.485925 IP B > A: . ack 266761 win 1117
      20:11:24.486759 IP B > A: . ack 268281 win 1117
      20:11:24.487594 IP B > A: . ack 269801 win 1117
      20:11:24.488430 IP B > A: . ack 271321 win 1117
      20:11:24.489267 IP B > A: . ack 272841 win 1117
      20:11:24.490104 IP B > A: . ack 274361 win 1117
      20:11:24.490939 IP B > A: . ack 275881 win 1117
      20:11:24.491775 IP B > A: . ack 277401 win 1117
      20:11:24.491784 IP A > B: . 316921:332881(15960) ack 1 win 119
      20:11:24.492620 IP B > A: . ack 278921 win 1117
      20:11:24.493448 IP B > A: . ack 280441 win 1117
      20:11:24.494286 IP B > A: . ack 281961 win 1117
      20:11:24.495122 IP B > A: . ack 283481 win 1117
      20:11:24.495958 IP B > A: . ack 285001 win 1117
      20:11:24.496791 IP B > A: . ack 286521 win 1117
      20:11:24.497628 IP B > A: . ack 288041 win 1117
      20:11:24.498459 IP B > A: . ack 289561 win 1117
      20:11:24.499296 IP B > A: . ack 291081 win 1117
      20:11:24.500133 IP B > A: . ack 292601 win 1117
      20:11:24.500970 IP B > A: . ack 294121 win 1117
      20:11:24.501388 IP B > A: . ack 295641 win 1117
      20:11:24.501398 IP A > B: . 332881:351881(19000) ack 1 win 119
      
      The expected behavior is more like:
      
      20:19:49.259620 IP A > B: . 197601:202161(4560) ack 1 win 119
      20:19:49.260446 IP B > A: . ack 154281 win 1212
      20:19:49.261282 IP B > A: . ack 155801 win 1212
      20:19:49.262125 IP B > A: . ack 157321 win 1212
      20:19:49.262136 IP A > B: . 202161:206721(4560) ack 1 win 119
      20:19:49.262958 IP B > A: . ack 158841 win 1212
      20:19:49.263795 IP B > A: . ack 160361 win 1212
      20:19:49.264628 IP B > A: . ack 161881 win 1212
      20:19:49.264637 IP A > B: . 206721:211281(4560) ack 1 win 119
      20:19:49.265465 IP B > A: . ack 163401 win 1212
      20:19:49.265886 IP B > A: . ack 164921 win 1212
      20:19:49.266722 IP B > A: . ack 166441 win 1212
      20:19:49.266732 IP A > B: . 211281:215841(4560) ack 1 win 119
      20:19:49.267559 IP B > A: . ack 167961 win 1212
      20:19:49.268394 IP B > A: . ack 169481 win 1212
      20:19:49.269232 IP B > A: . ack 171001 win 1212
      20:19:49.269241 IP A > B: . 215841:221161(5320) ack 1 win 119
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Van Jacobson <vanj@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Nandita Dukkipati <nanditad@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      0d99f344
    • Eric Dumazet's avatar
      tcp: fix MSG_SENDPAGE_NOTLAST logic · ef32163f
      Eric Dumazet authored
      [ Upstream commit ae62ca7b ]
      
      commit 35f9c09f (tcp: tcp_sendpages() should call tcp_push() once)
      added an internal flag, MSG_SENDPAGE_NOTLAST, meant to be set on all
      frags but the last one of a splice() call.
      
      The condition used to set the flag in pipe_to_sendpage() relied on the
      splice() caller passing the exact number of bytes present in the pipe,
      or a smaller one.
      
      But some programs pass an arbitrarily high value, and the test fails.
      
      The effect of this bug is a missing tcp_push() at the end of a
      splice(pipe -> socket) call, and possibly very slow or erratic TCP
      sessions.
      
      We should test both sd->total_len and whether another fragment is in
      the pipe (pipe->nrbufs > 1).
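      The corrected condition can be modelled as a pure function. The struct is a hypothetical simplification whose fields mirror sd->len, sd->total_len and pipe->nrbufs; the sketch contrasts the old test with one that also requires another buffer to be present in the pipe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical model of the state pipe_to_sendpage()
 * consults when deciding whether to set MSG_SENDPAGE_NOTLAST. */
struct demo_splice_state {
	size_t len;		/* bytes in the buffer being sent now */
	size_t total_len;	/* bytes the caller asked splice() to move */
	unsigned int nrbufs;	/* buffers currently queued in the pipe */
};

/* Old rule: "more follows" whenever the caller asked for more than this
 * buffer. With an arbitrarily large splice() length this is true even
 * for the last buffer, so the final tcp_push() never happens. */
static bool demo_notlast_old(const struct demo_splice_state *s)
{
	assert(s != NULL);
	return s->len < s->total_len;
}

/* Fixed rule: also require another buffer to actually be in the pipe. */
static bool demo_notlast_new(const struct demo_splice_state *s)
{
	assert(s != NULL);
	return s->len < s->total_len && s->nrbufs > 1;
}
```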
      
      Many thanks to Willy for providing a very clear bug report, bisection
      and test programs.
      Reported-by: Willy Tarreau <w@1wt.eu>
      Bisected-by: Willy Tarreau <w@1wt.eu>
      Tested-by: Willy Tarreau <w@1wt.eu>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      ef32163f