1. 06 Aug, 2014 40 commits
    • iommu/vt-d: Disable translation if already enabled · 73f98b09
      Takao Indoh authored
      commit 3a93c841 upstream.
      
      This patch disables translation (DMA remapping) before its initialization
      if it is already enabled.
      
      This is needed for kexec/kdump boot. If dma-remapping is enabled in the
      first kernel, it needs to be disabled before initializing its page table
      during second kernel boot. Wei Hu also reported that this is needed
      when second kernel boots with intel_iommu=off.
      
      Basically iommu->gcmd is used to know whether translation is enabled or
      disabled, but it is always zero at boot time even when translation is
      enabled since iommu->gcmd is initialized without considering such a
      case. Therefore, this patch synchronizes the iommu->gcmd value with the
      global command register when the iommu structure is allocated.
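
      A minimal user-space model of the synchronization step, for illustration
      only: struct model_iommu, its gsts/gcmd fields and MODEL_TE_BIT are
      hypothetical stand-ins for the VT-d global status and command registers,
      not the driver's code. It shows the cached command word being seeded from
      the live status at allocation time, so a translation left on by a
      previous kernel is noticed and can be turned off.

      ```c
      #include <stdbool.h>
      #include <stdint.h>

      /* Hypothetical "translation enabled" bit, used in both words below. */
      #define MODEL_TE_BIT (UINT32_C(1) << 31)

      /* Hypothetical model of one IOMMU unit. */
      struct model_iommu {
          uint32_t gsts; /* stands in for the hardware global status register */
          uint32_t gcmd; /* the driver's cached copy of the global command word */
      };

      /* Allocation: seed gcmd from the live status instead of assuming zero,
       * so a translation left enabled by the first kernel is visible. */
      void model_iommu_alloc(struct model_iommu *iommu, uint32_t hw_status)
      {
          iommu->gsts = hw_status;
          iommu->gcmd = 0;
          if (iommu->gsts & MODEL_TE_BIT)
              iommu->gcmd |= MODEL_TE_BIT;
      }

      bool model_translation_enabled(const struct model_iommu *iommu)
      {
          return (iommu->gcmd & MODEL_TE_BIT) != 0;
      }

      /* Before initializing new page tables: disable translation if it is on.
       * Real hardware acknowledges through the status register; the model
       * simply clears the bit in both words. */
      void model_disable_translation_if_enabled(struct model_iommu *iommu)
      {
          if (model_translation_enabled(iommu)) {
              iommu->gcmd &= ~MODEL_TE_BIT;
              iommu->gsts &= ~MODEL_TE_BIT;
          }
      }
      ```

      With gcmd always starting at zero, model_translation_enabled() would
      report "off" even when the status says "on", which is the situation the
      patch describes.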

      Signed-off-by: Takao Indoh <indou.takao@jp.fujitsu.com>
      Signed-off-by: Joerg Roedel <joro@8bytes.org>
      [wyj: Backported to 3.4: adjust context]
      Signed-off-by: Yijing Wang <wangyijing@huawei.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • x86_32, entry: Store badsys error code in %eax · dc792792
      Sven Wegener authored
      commit 8142b215 upstream.
      
      Commit 554086d8 ("x86_32, entry: Do syscall exit work on badsys
      (CVE-2014-4508)") introduced a regression in the x86_32 syscall entry
      code, resulting in syscall() not returning proper errors for undefined
      syscalls on CPUs supporting the sysenter feature.
      
      The following code:
      
      > int result = syscall(666);
      > printf("result=%d errno=%d error=%s\n", result, errno, strerror(errno));
      
      results in:
      
      > result=666 errno=0 error=Success
      
      Obviously, the syscall return value is the called syscall number, but it
      should have been an ENOSYS error. When run under ptrace it behaves
      correctly, which makes it hard to debug in the wild:
      
      > result=-1 errno=38 error=Function not implemented
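
      A self-contained version of the test above, as a sketch: the helper
      probe_syscall() is hypothetical, and syscall number 666 is assumed to be
      unimplemented on the running kernel. On a correct kernel the call should
      return -1 with errno set (normally ENOSYS; a seccomp sandbox may report a
      different error instead).

      ```c
      #define _GNU_SOURCE
      #include <errno.h>
      #include <stdio.h>
      #include <string.h>
      #include <unistd.h>

      /* Invoke a syscall number assumed to be unimplemented and report what
       * came back. An affected x86_32 kernel, when not under ptrace, returned
       * the syscall number itself with errno left at 0. */
      long probe_syscall(long nr, int *err)
      {
          errno = 0;
          long result = syscall(nr);
          *err = errno;
          printf("result=%ld errno=%d error=%s\n", result, *err, strerror(*err));
          return result;
      }
      ```

      Calling probe_syscall(666, &err) from main() and checking for a result
      of -1 gives a check that does not depend on running under ptrace.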
      
      The %eax register is the return value register. For debugging via ptrace
      the syscall entry code stores the complete register context on the
      stack. The badsys handlers only store the ENOSYS error code in the
      ptrace register set and do not set %eax like a regular syscall handler
      would. The old resume_userspace call chain contains code that clobbers
      %eax and it restores %eax from the ptrace registers afterwards. The same
      goes for the ptrace-enabled call chain. When ptrace is not used, the
      syscall return value is the passed-in syscall number from the untouched
      %eax register.
      
      Use %eax as the return value register in syscall_badsys and
      sysenter_badsys, like a real syscall handler does, and have the caller
      push the value onto the stack for ptrace access.

      Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
      Link: http://lkml.kernel.org/r/alpine.LNX.2.11.1407221022380.31021@titan.int.lan.stealer.net
      Reviewed-and-tested-by: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • libata: introduce ata_host->n_tags to avoid oops on SAS controllers · e0747c72
      Tejun Heo authored
      commit 1a112d10 upstream.
      
      Commit 1871ee13 ("libata: support the ata host which implements a queue
      depth less than 32") directly used ata_port->scsi_host->can_queue from
      ata_qc_new() to determine the number of tags supported by the host;
      unfortunately, SAS controllers doing SATA don't initialize ->scsi_host,
      leading to the following oops.
      
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
       IP: [<ffffffff814e0618>] ata_qc_new_init+0x188/0x1b0
       PGD 0
       Oops: 0002 [#1] SMP
       Modules linked in: isci libsas scsi_transport_sas mgag200 drm_kms_helper ttm
       CPU: 1 PID: 518 Comm: udevd Not tainted 3.16.0-rc6+ #62
       Hardware name: Intel Corporation S2600CO/S2600CO, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
       task: ffff880c1a00b280 ti: ffff88061a000000 task.ti: ffff88061a000000
       RIP: 0010:[<ffffffff814e0618>]  [<ffffffff814e0618>] ata_qc_new_init+0x188/0x1b0
       RSP: 0018:ffff88061a003ae8  EFLAGS: 00010012
       RAX: 0000000000000001 RBX: ffff88000241ca80 RCX: 00000000000000fa
       RDX: 0000000000000020 RSI: 0000000000000020 RDI: ffff8806194aa298
       RBP: ffff88061a003ae8 R08: ffff8806194a8000 R09: 0000000000000000
       R10: 0000000000000000 R11: ffff88000241ca80 R12: ffff88061ad58200
       R13: ffff8806194aa298 R14: ffffffff814e67a0 R15: ffff8806194a8000
       FS:  00007f3ad7fe3840(0000) GS:ffff880627620000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000000058 CR3: 000000061a118000 CR4: 00000000001407e0
       Stack:
        ffff88061a003b20 ffffffff814e96e1 ffff88000241ca80 ffff88061ad58200
        ffff8800b6bf6000 ffff880c1c988000 ffff880619903850 ffff88061a003b68
        ffffffffa0056ce1 ffff88061a003b48 0000000013d6e6f8 ffff88000241ca80
       Call Trace:
        [<ffffffff814e96e1>] ata_sas_queuecmd+0xa1/0x430
        [<ffffffffa0056ce1>] sas_queuecommand+0x191/0x220 [libsas]
        [<ffffffff8149afee>] scsi_dispatch_cmd+0x10e/0x300
        [<ffffffff814a3bc5>] scsi_request_fn+0x2f5/0x550
        [<ffffffff81317613>] __blk_run_queue+0x33/0x40
        [<ffffffff8131781a>] queue_unplugged+0x2a/0x90
        [<ffffffff8131ceb4>] blk_flush_plug_list+0x1b4/0x210
        [<ffffffff8131d274>] blk_finish_plug+0x14/0x50
        [<ffffffff8117eaa8>] __do_page_cache_readahead+0x198/0x1f0
        [<ffffffff8117ee21>] force_page_cache_readahead+0x31/0x50
        [<ffffffff8117ee7e>] page_cache_sync_readahead+0x3e/0x50
        [<ffffffff81172ac6>] generic_file_read_iter+0x496/0x5a0
        [<ffffffff81219897>] blkdev_read_iter+0x37/0x40
        [<ffffffff811e307e>] new_sync_read+0x7e/0xb0
        [<ffffffff811e3734>] vfs_read+0x94/0x170
        [<ffffffff811e43c6>] SyS_read+0x46/0xb0
        [<ffffffff811e33d1>] ? SyS_lseek+0x91/0xb0
        [<ffffffff8171ee29>] system_call_fastpath+0x16/0x1b
       Code: 00 00 00 88 50 29 83 7f 08 01 19 d2 83 e2 f0 83 ea 50 88 50 34 c6 81 1d 02 00 00 40 c6 81 17 02 00 00 00 5d c3 66 0f 1f 44 00 00 <89> 14 25 58 00 00 00
      
      Fix it by introducing ata_host->n_tags which is initialized to
      ATA_MAX_QUEUE - 1 in ata_host_init() for SAS controllers and set to
      scsi_host_template->can_queue in ata_host_register() for !SAS ones.
      As SAS hosts are never registered, this will give them the same
      ATA_MAX_QUEUE - 1 as before.  Note that we can't use
      scsi_host->can_queue directly for SAS hosts anyway as they can go
      higher than the libata maximum.
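
      A simplified model of where n_tags gets its value, assuming ATA_MAX_QUEUE
      is 32 as in libata of this period. The struct and function names are
      hypothetical analogues of ata_host, ata_host_init() and
      ata_host_register(), not their real signatures.

      ```c
      #include <stdbool.h>

      #define MODEL_ATA_MAX_QUEUE 32 /* assumed value of ATA_MAX_QUEUE */

      /* Hypothetical, reduced stand-in for struct ata_host. */
      struct model_ata_host {
          int n_tags;
      };

      /* ata_host_init() analogue: runs for every host, SAS ones included. */
      void model_host_init(struct model_ata_host *host)
      {
          host->n_tags = MODEL_ATA_MAX_QUEUE - 1;
      }

      /* ata_host_register() analogue: only non-SAS hosts get here, and only
       * they have a meaningful queue depth in their SCSI host template. */
      void model_host_register(struct model_ata_host *host, int template_can_queue)
      {
          host->n_tags = template_can_queue;
      }

      /* Tag checks read host->n_tags and never touch a possibly
       * uninitialized ->scsi_host pointer. */
      bool model_tag_in_range(const struct model_ata_host *host, int tag)
      {
          return tag >= 0 && tag < host->n_tags;
      }
      ```

      A SAS host that is never registered keeps n_tags == 31, the same limit it
      had before 1871ee13.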

      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Mike Qiu <qiudayu@linux.vnet.ibm.com>
      Reported-by: Jesse Brandeburg <jesse.brandeburg@gmail.com>
      Reported-by: Peter Hurley <peter@hurleysoftware.com>
      Reported-by: Peter Zijlstra <peterz@infradead.org>
      Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Fixes: 1871ee13 ("libata: support the ata host which implements a queue depth less than 32")
      Cc: Kevin Hao <haokexin@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • libata: support the ata host which implements a queue depth less than 32 · 078890ca
      Kevin Hao authored
      commit 1871ee13 upstream.
      
      SATA on the fsl mpc8315e is broken after commit 8a4aeec8
      ("libata/ahci: accommodate tag ordered controllers"). The reason is
      that the ATA controller on this SoC only implements a queue depth of
      16. When commands are issued in tag order, all commands with tags
      16 ~ 31 are unconditionally mapped to tag 0, which causes the SATA
      controller to malfunction. It makes no sense to use a queue depth of
      32 in software when the hardware implements less, so take the queue
      depth implemented by the hardware into account when requesting a
      command tag.
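
      A sketch of a tag-ordered allocator that respects the hardware depth.
      struct model_port and model_alloc_tag() are hypothetical and only
      illustrate the idea described above, not the code in ata_qc_new(): tags
      are handed out round-robin after the last one issued, but wrap within
      [0, depth) instead of running up to 31.

      ```c
      #include <stdbool.h>

      #define MODEL_MAX_TAGS 32

      /* Hypothetical per-port allocator state. */
      struct model_port {
          bool busy[MODEL_MAX_TAGS];
          int last_tag; /* -1 before the first allocation */
          int depth;    /* queue depth the hardware implements, e.g. 16 */
      };

      /* Return a free tag in [0, depth), searching in tag order starting
       * after the last tag handed out, or -1 if every tag is busy. */
      int model_alloc_tag(struct model_port *ap)
      {
          for (int i = 0; i < ap->depth; i++) {
              int tag = (ap->last_tag + 1 + i) % ap->depth;
              if (!ap->busy[tag]) {
                  ap->busy[tag] = true;
                  ap->last_tag = tag;
                  return tag;
              }
          }
          return -1;
      }
      ```

      With depth = 16, the seventeenth allocation fails instead of producing
      a tag the controller would alias to tag 0.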
      
      Fixes: 8a4aeec8 ("libata/ahci: accommodate tag ordered controllers")
      Signed-off-by: Kevin Hao <haokexin@gmail.com>
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • mm: kmemleak: avoid false negatives on vmalloc'ed objects · dc47dfd2
      Catalin Marinas authored
      commit 7f88f88f upstream.
      
      Commit 248ac0e1 ("mm/vmalloc: remove guard page from between vmap
      blocks") had the side effect of making the vmap_area.va_end member point
      to the next vmap_area.va_start.  This was creating an artificial reference
      to vmalloc'ed objects, so kmemleak was rarely reporting vmalloc() leaks.
      
      This patch marks the vmap_area containing pointers explicitly and
      reduces the min ref_count to 2 as vm_struct still contains a reference
      to the vmalloc'ed object.  The kmemleak add_scan_area() function has
      been improved to allow a SIZE_MAX argument covering the rest of the
      object (for simpler calling sites).
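
      A sketch of the SIZE_MAX convention for a scan-area length.
      struct model_object and model_scan_area_len() are hypothetical and do not
      reproduce kmemleak's internals; they only show the rule that SIZE_MAX
      means "from this pointer to the end of the object".

      ```c
      #include <stddef.h>
      #include <stdint.h>

      /* Hypothetical object tracked by a leak scanner. */
      struct model_object {
          uintptr_t start;
          size_t size;
      };

      /* Length of the scan area starting at ptr inside obj. SIZE_MAX asks for
       * the rest of the object, so callers need not compute it. Returns 0 if
       * ptr is outside the object or the requested area would overrun it. */
      size_t model_scan_area_len(const struct model_object *obj, uintptr_t ptr,
                                 size_t size)
      {
          if (ptr < obj->start || ptr >= obj->start + obj->size)
              return 0;
          size_t remaining = obj->size - (size_t)(ptr - obj->start);
          if (size == SIZE_MAX)
              return remaining;
          return size <= remaining ? size : 0;
      }
      ```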

      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • introduce SIZE_MAX · b11597b7
      Xi Wang authored
      commit a3860c1c upstream.
      
      ULONG_MAX is often used to check for integer overflow when calculating
      allocation size.  While ULONG_MAX happens to work on most systems, there
      is no guarantee that `size_t' must be the same size as `long'.
      
      This patch introduces SIZE_MAX, the maximum value of `size_t', to improve
      portability and readability for allocation size validation.
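
      A user-space sketch of the kind of allocation-size check this enables,
      using the C99 SIZE_MAX from <stdint.h>; the helper names are
      hypothetical.

      ```c
      #include <stdbool.h>
      #include <stddef.h>
      #include <stdint.h>
      #include <stdlib.h>

      /* True if n * size cannot be represented in size_t. Comparing against
       * SIZE_MAX ties the check to size_t itself rather than to long. */
      bool mul_would_overflow(size_t n, size_t size)
      {
          return size != 0 && n > SIZE_MAX / size;
      }

      /* Checked array allocation: refuse requests whose byte count wraps. */
      void *alloc_array_checked(size_t n, size_t size)
      {
          if (mul_would_overflow(n, size))
              return NULL;
          return malloc(n * size);
      }
      ```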

      Signed-off-by: Xi Wang <xi.wang@gmail.com>
      Acked-by: Alex Elder <elder@dreamhost.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Pekka Enberg <penberg@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • ceph: fix overflow check in build_snap_context() · ce4ded58
      Xi Wang authored
      commit 80834312 upstream.
      
      The overflow check for a + n * b should be (n > (ULONG_MAX - a) / b),
      rather than (n > ULONG_MAX / b - a).
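
      A small demonstration of why the old form is wrong, using uint64_t
      instead of unsigned long so the numbers are the same on every platform.
      Both helper names are hypothetical, and b is assumed to be nonzero.

      ```c
      #include <stdbool.h>
      #include <stdint.h>

      /* Correct: does a + n * b exceed UINT64_MAX? */
      bool overflow_correct(uint64_t a, uint64_t n, uint64_t b)
      {
          return n > (UINT64_MAX - a) / b;
      }

      /* Old form: if a > UINT64_MAX / b, the subtraction wraps around to a
       * huge value and the check lets almost any n through. */
      bool overflow_old(uint64_t a, uint64_t n, uint64_t b)
      {
          return n > UINT64_MAX / b - a;
      }
      ```

      For example, with a = 2^33, b = 2^32 and n = 2^32, n * b alone is 2^64,
      yet the old check accepts it because UINT64_MAX / b - a wraps around;
      the corrected check rejects it.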

      Signed-off-by: Xi Wang <xi.wang@gmail.com>
      Signed-off-by: Sage Weil <sage@newdream.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • ARM: 7670/1: fix the memset fix · 2c58922a
      Nicolas Pitre authored
      commit 418df63a upstream.
      
      Commit 455bd4c4 ("ARM: 7668/1: fix memset-related crashes caused by
      recent GCC (4.7.2) optimizations") attempted to fix a compliance issue
      with the memset return value.  However the memset itself became broken
      by that patch for misaligned pointers.
      
      This fixes the above by branching over the entry code from the
      misaligned fixup code to avoid reloading the original pointer.
      
      Also, because the function entry alignment is wrong in the Thumb mode
      compilation, that fixup code is moved to the end.
      
      While at it, the entry instructions are slightly reworked to help dual
      issue pipelines.

      Signed-off-by: Nicolas Pitre <nico@linaro.org>
      Tested-by: Alexander Holler <holler@ahsoftware.de>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • ARM: 7668/1: fix memset-related crashes caused by recent GCC (4.7.2) optimizations · fe7b4c33
      Ivan Djelic authored
      commit 455bd4c4 upstream.
      
      Recent GCC versions (e.g. GCC-4.7.2) perform optimizations based on
      assumptions about the implementation of memset and similar functions.
      The current ARM optimized memset code does not return the value of
      its first argument, as is usually expected from standard implementations.
      
      For instance in the following function:
      
      void debug_mutex_lock_common(struct mutex *lock, struct mutex_waiter *waiter)
      {
      	memset(waiter, MUTEX_DEBUG_INIT, sizeof(*waiter));
      	waiter->magic = waiter;
      	INIT_LIST_HEAD(&waiter->list);
      }
      
      compiled as:
      
      800554d0 <debug_mutex_lock_common>:
      800554d0:       e92d4008        push    {r3, lr}
      800554d4:       e1a00001        mov     r0, r1
      800554d8:       e3a02010        mov     r2, #16 ; 0x10
      800554dc:       e3a01011        mov     r1, #17 ; 0x11
      800554e0:       eb04426e        bl      80165ea0 <memset>
      800554e4:       e1a03000        mov     r3, r0
      800554e8:       e583000c        str     r0, [r3, #12]
      800554ec:       e5830000        str     r0, [r3]
      800554f0:       e5830004        str     r0, [r3, #4]
      800554f4:       e8bd8008        pop     {r3, pc}
      
      GCC assumes that memset returns the value of pointer 'waiter' in register
      r0, causing register/memory corruption.
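
      The contract GCC relies on can be shown in C. model_memset() below is a
      hypothetical byte-wise version, not the ARM assembly: like the fix, it
      leaves the original destination pointer untouched while a separate
      cursor does the stores, and returns that original pointer.

      ```c
      #include <stddef.h>
      #include <string.h>

      /* Byte-wise memset honouring the C contract: return the original dst. */
      void *model_memset(void *dst, int c, size_t n)
      {
          unsigned char *cursor = dst; /* working pointer, like 'ip' in the fix */
          while (n--)
              *cursor++ = (unsigned char)c;
          return dst;                  /* what r0 must still hold on return */
      }
      ```

      Code compiled as in debug_mutex_lock_common() above is only correct if
      the returned pointer equals the argument, exactly as with the standard
      memset().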
      
      This patch fixes the return value of the assembly version of memset.
      It adds a 'mov' instruction and merges an additional load+store into
      existing load/store instructions.
      For ease of review, here is a breakdown of the patch into 4 simple steps:
      
      Step 1
      ======
      Perform the following substitutions:
      ip -> r8, then
      r0 -> ip,
      and insert 'mov ip, r0' as the first statement of the function.
      At this point, we have a memset() implementation returning the proper result,
      but corrupting r8 on some paths (the ones that were using ip).
      
      Step 2
      ======
      Make sure r8 is saved and restored when (! CALGN(1)+0) == 1:
      
      save r8:
      -       str     lr, [sp, #-4]!
      +       stmfd   sp!, {r8, lr}
      
      and restore r8 on both exit paths:
      -       ldmeqfd sp!, {pc}               @ Now <64 bytes to go.
      +       ldmeqfd sp!, {r8, pc}           @ Now <64 bytes to go.
      (...)
              tst     r2, #16
              stmneia ip!, {r1, r3, r8, lr}
      -       ldr     lr, [sp], #4
      +       ldmfd   sp!, {r8, lr}
      
      Step 3
      ======
      Make sure r8 is saved and restored when (! CALGN(1)+0) == 0:
      
      save r8:
      -       stmfd   sp!, {r4-r7, lr}
      +       stmfd   sp!, {r4-r8, lr}
      
      and restore r8 on both exit paths:
              bgt     3b
      -       ldmeqfd sp!, {r4-r7, pc}
      +       ldmeqfd sp!, {r4-r8, pc}
      (...)
              tst     r2, #16
              stmneia ip!, {r4-r7}
      -       ldmfd   sp!, {r4-r7, lr}
      +       ldmfd   sp!, {r4-r8, lr}
      
      Step 4
      ======
      Rewrite register list "r4-r7, r8" as "r4-r8".

      Signed-off-by: Ivan Djelic <ivan.djelic@parrot.com>
      Reviewed-by: Nicolas Pitre <nico@linaro.org>
      Signed-off-by: Dirk Behme <dirk.behme@gmail.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • mm: hugetlb: fix copy_hugetlb_page_range() · e6a5e9f6
      Naoya Horiguchi authored
      commit 0253d634 upstream.
      
      Commit 4a705fef ("hugetlb: fix copy_hugetlb_page_range() to handle
      migration/hwpoisoned entry") changed the order of
      huge_ptep_set_wrprotect() and huge_ptep_get(), which leads to breakage
      in some workloads like hugepage-backed heap allocation via libhugetlbfs.
      This patch fixes it.
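
      A toy model of the ordering, assuming a page-table entry reduced to a pfn
      and a writable flag. struct model_pte and model_copy_entry() are
      hypothetical; the comments only mark which kernel helper each step
      stands in for. Write-protecting the parent's entry before reading it
      means the child receives the read-only copy that copy-on-write depends
      on.

      ```c
      #include <stdbool.h>

      /* Hypothetical, heavily simplified page-table entry. */
      struct model_pte {
          unsigned long pfn;
          bool writable;
      };

      /* Copy the parent's entry into the child at fork time. For a private
       * (copy-on-write) mapping, write-protect first, then read. Reading
       * first would hand the child a stale, still-writable entry. */
      void model_copy_entry(struct model_pte *src, struct model_pte *dst, bool cow)
      {
          if (cow)
              src->writable = false; /* huge_ptep_set_wrprotect() analogue */
          *dst = *src;               /* huge_ptep_get() analogue, done afterwards */
      }
      ```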
      
      The test program for the problem is shown below:
      
        $ cat heap.c
        #include <unistd.h>
        #include <stdlib.h>
        #include <string.h>
      
        #define HPS 0x200000
      
        int main() {
        	int i;
        	char *p = malloc(HPS);
        	memset(p, '1', HPS);
        	for (i = 0; i < 5; i++) {
        		if (!fork()) {
        			memset(p, '2', HPS);
        			p = malloc(HPS);
        			memset(p, '3', HPS);
        			free(p);
        			return 0;
        		}
        	}
        	sleep(1);
        	free(p);
        	return 0;
        }
      
        $ export HUGETLB_MORECORE=yes ; export HUGETLB_NO_PREFAULT= ; hugectl --heap ./heap
      
      This fixes 4a705fef ("hugetlb: fix copy_hugetlb_page_range() to handle
      migration/hwpoisoned entry"), so it is applicable to -stable kernels that
      include it.

      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Reported-by: Guillaume Morin <guillaume@morinfr.org>
      Suggested-by: Guillaume Morin <guillaume@morinfr.org>
      Acked-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • crypto: testmgr - update LZO compression test vectors · 6156e7c0
      Markus F.X.J. Oberhumer authored
      commit 0ec73820 upstream.
      
      Update the LZO compression test vectors according to the latest compressor
      version.

      Signed-off-by: Markus F.X.J. Oberhumer <markus@oberhumer.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • ipvs: stop tot_stats estimator only under CONFIG_SYSCTL · 25cc3a3e
      Julian Anastasov authored
      [ Upstream commit 9802d21e ]
      
      The tot_stats estimator is started only when CONFIG_SYSCTL
      is defined. But it is stopped without checking CONFIG_SYSCTL.
      Fix the crash by moving ip_vs_stop_estimator into
      ip_vs_control_net_cleanup_sysctl.
      
      The change is needed after commit 14e40546
      ("IPVS: Add __ip_vs_control_{init,cleanup}_sysctl()") from 2.6.39.

      Reported-by: Jet Chen <jet.chen@intel.com>
      Tested-by: Jet Chen <jet.chen@intel.com>
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • x86, ioremap: Speed up check for RAM pages · b3ff2867
      Roland Dreier authored
      commit c81c8a1e upstream.
      
      In __ioremap_caller() (the guts of ioremap), we loop over the range of
      pfns being remapped and check each one individually with page_is_ram().
      For large ioremaps, this can be very slow.  For example, we have a
      device with a 256 GiB PCI BAR, and ioremapping this BAR can take 20+
      seconds -- sometimes long enough to trigger the soft lockup detector!
      
      Internally, page_is_ram() calls walk_system_ram_range() on a single
      page.  Instead, we can make a single call to walk_system_ram_range()
      from __ioremap_caller(), and do our further checks only for any RAM
      pages that we find.  For the common case of MMIO, this saves an enormous
      amount of work, since the range being ioremapped doesn't intersect
      system RAM at all.
      
      With this change, ioremap on our 256 GiB BAR takes less than 1 second.
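
      A sketch of the single-walk idea, with a hypothetical in-memory table of
      RAM ranges standing in for the kernel's resource tree. model_walk_ram()
      imitates the callback style of walk_system_ram_range() but is not its
      real signature. The callback runs once per intersecting RAM range
      rather than once per page; a 256 GiB BAR is 2^26 pages of 4 KiB, and in
      the pure MMIO case the callback never runs at all.

      ```c
      #include <stddef.h>
      #include <stdint.h>

      /* Hypothetical RAM range in page frame numbers, [start, end). */
      struct model_range { uint64_t start, end; };

      /* Call fn once for each RAM range intersecting [pfn, pfn + nr), clipped
       * to the request; stop early if fn returns nonzero. */
      int model_walk_ram(const struct model_range *ram, size_t nram,
                         uint64_t pfn, uint64_t nr,
                         int (*fn)(uint64_t start, uint64_t nr, void *arg),
                         void *arg)
      {
          uint64_t end = pfn + nr;
          for (size_t i = 0; i < nram; i++) {
              uint64_t s = ram[i].start > pfn ? ram[i].start : pfn;
              uint64_t e = ram[i].end < end ? ram[i].end : end;
              if (s < e) {
                  int ret = fn(s, e - s, arg);
                  if (ret)
                      return ret;
              }
          }
          return 0;
      }

      /* Example callback: count the RAM pages that would need further checks. */
      int count_ram_pages(uint64_t start, uint64_t nr, void *arg)
      {
          (void)start;
          *(uint64_t *)arg += nr;
          return 0;
      }
      ```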

      Signed-off-by: Roland Dreier <roland@purestorage.com>
      Link: http://lkml.kernel.org/r/1399054721-1331-1-git-send-email-roland@kernel.org
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • sym53c8xx_2: Set DID_REQUEUE return code when aborting squeue · 30b605d6
      Mikulas Patocka authored
      commit fd1232b2 upstream.
      
      This patch fixes I/O errors with the sym53c8xx_2 driver when the disk
      returns QUEUE FULL status.
      
      When the controller encounters an error (including QUEUE FULL or BUSY
      status), it aborts all not-yet-submitted requests in the function
      sym_dequeue_from_squeue.
      
      This function aborts them with DID_SOFT_ERROR.
      
      If the disk's tag queue is full, the request that caused the overflow is
      aborted with QUEUE FULL status (and the scsi midlayer properly retries
      it until it is accepted by the disk), but the sym53c8xx_2 driver aborts
      the following requests with DID_SOFT_ERROR --- for them, the midlayer
      does just a few retries and then signals the error up to sd.
      
      The result is that a disk returning QUEUE FULL causes request failures.
      
      The error was reproduced on 53c895 with COMPAQ BD03685A24 disk
      (rebranded ST336607LC) with a command queue of 48 or 64 tags.  The disk
      has 64 tags, but under some access patterns it returns QUEUE FULL when
      there are fewer than 64 pending tags.  The SCSI specification allows returning
      QUEUE FULL anytime and it is up to the host to retry.

      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Cc: Matthew Wilcox <matthew@wil.cx>
      Cc: James Bottomley <JBottomley@Parallels.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • applicom: dereferencing NULL on error path · 12489e9c
      Dan Carpenter authored
      commit 8bab797c upstream.
      
      This is a static checker fix.  The "dev" variable is always NULL after
      the while statement so we would be dereferencing a NULL pointer here.
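
      A generic illustration of the pattern the static checker flags, not the
      applicom code itself: struct model_dev and model_last_irq() are
      hypothetical. When a loop runs until its cursor becomes NULL, anything
      needed after the loop must be saved inside it.

      ```c
      #include <stddef.h>

      /* Hypothetical device list node. */
      struct model_dev {
          int irq;
          struct model_dev *next;
      };

      /* Walk the list; after the loop, dev is always NULL, so the value
       * wanted afterwards is captured while dev is still valid. */
      int model_last_irq(struct model_dev *head)
      {
          struct model_dev *dev = head;
          int last_irq = -1;
          while (dev) {
              last_irq = dev->irq;
              dev = dev->next;
          }
          /* Dereferencing dev here (e.g. dev->irq) would be a NULL access. */
          return last_irq;
      }
      ```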
      
      Fixes: 819a3eba ('[PATCH] applicom: fix error handling')
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • x86-32, espfix: Remove filter for espfix32 due to race · 6806fa8b
      H. Peter Anvin authored
      commit 246f2d2e upstream.
      
      It is not safe to use LAR to filter when to go down the espfix path,
      because the LDT is per-process (rather than per-thread) and another
      thread might change the descriptors behind our back.  Fortunately it
      is always *safe* (if a bit slow) to go down the espfix path, and a
      32-bit LDT stack segment is extremely rare.

      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • score: normalize global variables exported by vmlinux.lds · 1280d204
      Jiang Liu authored
      commit ae49b83d upstream.
      
      Generate the mandatory global variable _sdata in vmlinux.lds.

      Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
      Cc: Chen Liqin <liqin.chen@sunplusct.com>
      Cc: Lennox Wu <lennox.wu@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • alpha: add io{read,write}{16,32}be functions · 1df72e0d
      Michael Cree authored
      commit 25534eb7 upstream.
      
      These functions are used in some PCI drivers with big-endian
      MMIO space.
      
      Admittedly it is almost certain that no one this side of the
      Moon would use such a card in an Alpha but it does get us
      closer to being able to build allyesconfig or allmodconfig,
      and it enables the Debian default generic config to build.
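
      A user-space sketch of what big-endian accessors do conceptually on a
      little-endian CPU such as Alpha. Plain memory stands in for MMIO and the
      model_* names are hypothetical; a kernel implementation would typically
      byte-swap the result of the platform's ordinary little-endian accessor
      instead of assembling bytes by hand.

      ```c
      #include <stdint.h>

      /* The device stores the most significant byte first. */
      uint32_t model_read32be(const volatile uint8_t *p)
      {
          return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                 ((uint32_t)p[2] << 8) | (uint32_t)p[3];
      }

      void model_write32be(volatile uint8_t *p, uint32_t v)
      {
          p[0] = (uint8_t)(v >> 24);
          p[1] = (uint8_t)(v >> 16);
          p[2] = (uint8_t)(v >> 8);
          p[3] = (uint8_t)v;
      }

      uint16_t model_read16be(const volatile uint8_t *p)
      {
          return (uint16_t)(((uint16_t)p[0] << 8) | p[1]);
      }
      ```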

      Tested-by: Raúl Porcel <armin76@gentoo.org>
      Signed-off-by: Michael Cree <mcree@orcon.net.nz>
      Signed-off-by: Matt Turner <mattst88@gmail.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • score: Add missing #include <linux/export.h> · 951c03ac
      Ben Hutchings authored
      There is no upstream commit for this, as arch/score/kernel/init_task.c
      has been replaced by generic code and <linux/export.h> is included
      indirectly by arch/score/mm/init.c.

      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • Score: The commit is for compiling successfully. The modifications include: 1.... · ef2706a9
      Lennox Wu authored
      Score: This commit makes Score compile successfully. The modifications include:
      1. Kconfig of Score: we don't support ioremap
      2. Missing header file includes
      3. Errors in other people's commits that we had not checked, fixed now:
         3.1 arch/score/kernel/entry.S: wrong instructions
         3.2 arch/score/kernel/process.c: just some typos
      
      commit 5fbbf8a1 upstream.

      Signed-off-by: Lennox Wu <lennox.wu@gmail.com>
      [bwh: Backported to 3.2:
       - Drop addition of 'select HAVE_GENERIC_HARDIRQS' which was not removed here
       - Drop inapplicable change to copy_thread()]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • unicore32: select generic atomic64_t support · 3e1ad261
      Fengguang Wu authored
      commit 82e54a6a upstream.
      
      It's required for the core fs/namespace.c and many other basic features.

      Signed-off-by: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • unicore32: add ioremap_nocache definition · 86e93708
      Guan Xuetao authored
      commit a50e4213 upstream.
      
      Bug fix for the following error messages:
      lib/iomap.c: In function 'pci_iomap':
      lib/iomap.c:274: error: implicit declaration of function 'ioremap_nocache'
      lib/iomap.c:274: warning: return makes pointer from integer without a cast
      
      Also see commit f1ecc698, which will hide the ioremap_nocache function
      for systems with an MMU.

      Signed-off-by: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Jonas Bonn <jonas@southpole.se>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • shmem: fix splicing from a hole while it's punched · a9883115
      Hugh Dickins authored
      commit b1a36650 upstream.
      
      shmem_fault() is the actual culprit in trinity's hole-punch starvation,
      and the most significant cause of such problems: since a faulted page is
      one that then appears page_mapped(), needing unmap_mapping_range() and
      i_mmap_mutex to be unmapped again.
      
      But it is not the only way in which a page can be brought into a hole in
      the radix_tree while that hole is being punched; and Vlastimil's testing
      implies that if enough other processors are busy filling in the hole,
      then shmem_undo_range() can be kept from completing indefinitely.
      
      shmem_file_splice_read() is the main other user of SGP_CACHE, which can
      instantiate shmem pagecache pages in the read-only case (without holding
      i_mutex, so perhaps concurrently with a hole-punch).  Probably it's
      silly not to use SGP_READ already (using the ZERO_PAGE for holes): which
      ought to be safe, but might bring surprises - not a change to be rushed.
      
      shmem_read_mapping_page_gfp() is an internal interface used by
      drivers/gpu/drm GEM (and next by uprobes): it should be okay.  And
      shmem_file_read_iter() uses the SGP_DIRTY variant of SGP_CACHE, when
      called internally by the kernel (perhaps for a stacking filesystem,
      which might rely on holes to be reserved): it's unclear whether it could
      be provoked to keep hole-punch busy or not.
      
      We could apply the same umbrella as now used in shmem_fault() to
      shmem_file_splice_read() and the others; but it looks ugly, and use over
      a range raises questions - should it actually be per page? can these get
      starved themselves?
      
      The origin of this part of the problem is my v3.1 commit d0823576
      ("mm: pincer in truncate_inode_pages_range"), once it was duplicated
      into shmem.c.  It seemed like a nice idea at the time, to ensure
      (barring RCU lookup fuzziness) that there's an instant when the entire
      hole is empty; but the indefinitely repeated scans to ensure that make
      it vulnerable.
      
      Revert that "enhancement" to hole-punch from shmem_undo_range(), but
      retain the unproblematic rescanning when it's truncating; add a couple
      of comments there.
      
      Remove the "indices[0] >= end" test: that is now handled satisfactorily
      by the inner loop, and mem_cgroup_uncharge_start()/end() are too light
      to be worth avoiding here.
      
      But if we do not always loop indefinitely, we do need to handle the case
      of swap swizzled back to page before shmem_free_swap() gets it: add a
      retry for that case, as suggested by Konstantin Khlebnikov; and for the
      case of page swizzled back to swap, as suggested by Johannes Weiner.

      Signed-off-by: Hugh Dickins <hughd@google.com>
      Reported-by: Sasha Levin <sasha.levin@oracle.com>
      Suggested-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Lukas Czerner <lczerner@redhat.com>
      Cc: Dave Jones <davej@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • shmem: fix faulting into a hole, not taking i_mutex · de21fd42
      Hugh Dickins authored
      commit 8e205f77 upstream.
      
      Commit f00cdc6d ("shmem: fix faulting into a hole while it's
      punched") was buggy: Sasha sent a lockdep report to remind us that
      grabbing i_mutex in the fault path is a no-no (write syscall may already
      hold i_mutex while faulting user buffer).
      
      We tried a completely different approach (see following patch) but that
      proved inadequate: good enough for a rational workload, but not good
      enough against trinity - which forks off so many mappings of the object
      that contention on i_mmap_mutex while hole-puncher holds i_mutex builds
      into serious starvation when concurrent faults force the puncher to fall
      back to single-page unmap_mapping_range() searches of the i_mmap tree.
      
      So return to the original umbrella approach, but keep away from i_mutex
      this time.  We really don't want to bloat every shmem inode with a new
      mutex or completion, just to protect this unlikely case from trinity.
      So extend the original with wait_queue_head on stack at the hole-punch
      end, and wait_queue item on the stack at the fault end.
      
      This involves further use of i_lock to guard against the races: lockdep
      has been happy so far, and I see fs/inode.c:unlock_new_inode() holds
      i_lock around wake_up_bit(), which is comparable to what we do here.
      i_lock is more convenient, but we could switch to shmem's info->lock.
      
      This issue has been tagged with CVE-2014-4171, which will require commit
      f00cdc6d and this and the following patch to be backported: we
      suggest 3.1+, though in fact the trinity forkbomb effect might go
      back as far as 2.6.16, when madvise(,,MADV_REMOVE) came in - or might
      not, since much has changed, with i_mmap_mutex a spinlock before 3.0.
      Anyone running trinity on 3.0 and earlier? I don't think we need care.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Reported-by: Sasha Levin <sasha.levin@oracle.com>
      Tested-by: Sasha Levin <sasha.levin@oracle.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Lukas Czerner <lczerner@redhat.com>
      Cc: Dave Jones <davej@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      de21fd42
    • Hugh Dickins
      shmem: fix faulting into a hole while it's punched · f159cc25
      Hugh Dickins authored
      commit f00cdc6d upstream.
      
      Trinity finds that mmap access to a hole while it's punched from shmem
      can prevent the madvise(MADV_REMOVE) or fallocate(FALLOC_FL_PUNCH_HOLE)
      from completing, until the reader chooses to stop; with the puncher's
      hold on i_mutex locking out all other writers until it can complete.
      
      It appears that the tmpfs fault path is too light in comparison with its
      hole-punching path, lacking an i_data_sem to obstruct it; but we don't
      want to slow down the common case.
      
      Extend shmem_fallocate()'s existing range notification mechanism, so
      shmem_fault() can refrain from faulting pages into the hole while it's
      punched, waiting instead on i_mutex (when safe to sleep; or repeatedly
      faulting when not).
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Reported-by: Sasha Levin <sasha.levin@oracle.com>
      Tested-by: Sasha Levin <sasha.levin@oracle.com>
      Cc: Dave Jones <davej@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      f159cc25
    • Dave Chinner
      xfs: really fix the cursor leak in xfs_alloc_ag_vextent_near · d9892580
      Dave Chinner authored
      commit e3a746f5 upstream.
      
      The current cursor is reallocated when retrying the allocation, so
      the existing cursor needs to be destroyed in both the restart and
      the failure cases.
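A minimal userspace sketch of the rule stated above. Whenever the search restarts or gives up, the cursor allocated for the current attempt is destroyed first. `struct cursor_sketch`, `cursor_alloc()`, `cursor_del()` and `search_sketch()` are hypothetical stand-ins, not the XFS btree API. A live-cursor counter makes any leak visible.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a btree cursor; not the XFS type. */
struct cursor_sketch {
	int attempt;
};

static int live_cursors; /* outstanding cursors; nonzero after return means a leak */

static struct cursor_sketch *cursor_alloc(int attempt)
{
	struct cursor_sketch *cur = malloc(sizeof(*cur));

	if (cur) {
		cur->attempt = attempt;
		live_cursors++;
	}
	return cur;
}

static void cursor_del(struct cursor_sketch *cur)
{
	if (cur) {
		live_cursors--;
		free(cur);
	}
}

/* Hypothetical search: fails on the first attempt, succeeds on any retry. */
static int search_sketch(const struct cursor_sketch *cur)
{
	return cur->attempt > 0;
}

/* Returns 0 on success, -1 on failure; never leaks a cursor. */
static int alloc_near_sketch(int max_attempts)
{
	int attempt = 0;
	struct cursor_sketch *cur;

restart:
	cur = cursor_alloc(attempt);
	if (!cur)
		return -1;
	if (!search_sketch(cur)) {
		/* Destroy the current cursor on both the restart and failure paths. */
		cursor_del(cur);
		if (++attempt < max_attempts)
			goto restart;
		return -1;
	}
	cursor_del(cur); /* success: the cursor is no longer needed either */
	return 0;
}
```

With `alloc_near_sketch(3)` the first attempt fails and the retry succeeds. With `alloc_near_sketch(1)` the call fails outright. In both cases `live_cursors` is back to 0 afterwards.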
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Tested-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      d9892580
    • Dave Chinner
      xfs: fix allocbt cursor leak in xfs_alloc_ag_vextent_near · 381687bd
      Dave Chinner authored
      commit 76d09538 upstream.
      
      When we fail to find a matching extent near the requested extent
      specification during a left-right distance search in
      xfs_alloc_ag_vextent_near, we fail to free the original cursor that
      we used to look up the XFS_BTNUM_CNT tree and hence leak it.
      Reported-by: Chris J Arges <chris.j.arges@canonical.com>
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ben Myers <bpm@sgi.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      381687bd
    • Mathias Krause
      netfilter: ipt_ULOG: fix info leaks · 0368fea2
      Mathias Krause authored
      commit 278f2b3e upstream.
      
      The ulog messages leak heap bytes by the means of padding bytes and
      incompletely filled string arrays. Fix those by memset(0)'ing the
      whole struct before filling it.
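A minimal sketch of that fix pattern, using a hypothetical message struct rather than the real ULOG message layout. On common ABIs, `mark` is preceded by padding after the one-byte `hook`. The `prefix` array may be only partly written, or not written at all. Zeroing the whole struct first covers both cases.

```c
#include <stdint.h>
#include <string.h>

#define PREFIX_LEN 32 /* hypothetical size, not ULOG's real layout */

/* Hypothetical message layout with a padding hole after 'hook'. */
struct log_msg_sketch {
	uint8_t  hook;
	uint32_t mark;             /* on common ABIs, padding precedes this */
	char     prefix[PREFIX_LEN];
};

static void fill_msg_sketch(struct log_msg_sketch *m, uint8_t hook,
			    uint32_t mark, const char *prefix)
{
	/* Zero everything first: padding and unused array bytes included. */
	memset(m, 0, sizeof(*m));
	m->hook = hook;
	m->mark = mark;
	if (prefix)
		strncpy(m->prefix, prefix, PREFIX_LEN - 1); /* last byte stays '\0' */
	/* With no prefix nothing else is written; without the memset the
	 * whole prefix[] would still hold whatever the buffer held before. */
}
```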
      Signed-off-by: Mathias Krause <minipli@googlemail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      0368fea2
    • Martin Schwidefsky
      s390/ptrace: fix PSW mask check · 438127dd
      Martin Schwidefsky authored
      commit dab6cf55 upstream.
      
      The PSW mask check of the PTRACE_POKEUSR_AREA command is incorrect.
      For the default user_mode=home address space layout the psw_user_bits
      variable has the home space address-space-control bits set. But because
      PSW_MASK_USER contains PSW_MASK_ASC, the ptrace validity check for the
      PSW mask will always fail.
      
      Fixes CVE-2014-3534
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      438127dd
    • Thomas Gleixner
      nohz: Fix another inconsistency between CONFIG_NO_HZ=n and nohz=off · c4b4c3c5
      Thomas Gleixner authored
      commit 0e576acb upstream.
      
      If CONFIG_NO_HZ=n, tick_nohz_get_sleep_length() returns NSEC_PER_SEC/HZ.
      
      If CONFIG_NO_HZ=y and the nohz functionality is disabled via the
      command line option "nohz=off" or not enabled due to missing hardware
      support, then tick_nohz_get_sleep_length() returns 0. That happens
      because ts->sleep_length is never set in that case.
      
      Set it to NSEC_PER_SEC/HZ when the NOHZ mode is inactive.
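A minimal sketch of the intended behaviour, assuming a hypothetical per-CPU state struct and idle-entry hook; these are not the kernel's `struct tick_sched` or its functions. HZ=250 is an assumed tick rate. When NOHZ is inactive, the sleep length is set to one tick instead of being left at 0.

```c
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL
#define HZ 250 /* assumed tick rate for this sketch */

/* Hypothetical per-CPU tick state; not the kernel's struct tick_sched. */
struct tick_sched_sketch {
	bool nohz_active;
	int64_t sleep_length; /* nanoseconds */
};

/* Hypothetical idle-entry hook. */
static void idle_enter_sketch(struct tick_sched_sketch *ts, int64_t next_event_ns)
{
	if (!ts->nohz_active) {
		/* NOHZ inactive: report one tick, as CONFIG_NO_HZ=n would. */
		ts->sleep_length = NSEC_PER_SEC / HZ;
		return;
	}
	ts->sleep_length = next_event_ns;
}

static int64_t get_sleep_length_sketch(const struct tick_sched_sketch *ts)
{
	return ts->sleep_length;
}
```

With HZ=250 the inactive case reports 4,000,000 ns (4 ms) rather than 0.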
      Reported-by: Michal Hocko <mhocko@suse.cz>
      Reported-by: Borislav Petkov <bp@alien8.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      c4b4c3c5
    • Michal Schmidt
      rtnetlink: fix userspace API breakage for iproute2 < v3.9.0 · ea43a736
      Michal Schmidt authored
      commit e5eca6d4 upstream.
      
      When running RHEL6 userspace on a current upstream kernel, "ip link"
      fails to show VF information.
      
      The reason is a kernel<->userspace API change introduced by commit
      88c5b5ce ("rtnetlink: Call nlmsg_parse() with correct header length"),
      after which the kernel does not see iproute2's IFLA_EXT_MASK attribute
      in the netlink request.
      
      iproute2 adjusted for the API change in its commit 63338dca4513
      ("libnetlink: Use ifinfomsg instead of rtgenmsg in rtnl_wilddump_req_filter").
      
      The problem has been noticed before:
      http://marc.info/?l=linux-netdev&m=136692296022182&w=2
      (Subject: Re: getting VF link info seems to be broken in 3.9-rc8)
      
      We can do better than tell those with old userspace to upgrade. We can
      recognize the old iproute2 in the kernel by checking the netlink message
      length. Even when including the IFLA_EXT_MASK attribute, its netlink
      message is shorter than struct ifinfomsg.
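A sketch of that length check. It uses local mirrors of the uapi `struct rtgenmsg` and `struct ifinfomsg` layouts so it builds without kernel headers. The helper `pick_hdrlen_sketch()` and the 8-byte `IFLA_EXT_MASK` attribute size (a 4-byte attribute header plus a u32) are assumptions for illustration.

```c
#include <stddef.h>

/* Local mirrors of the uapi layouts from <linux/rtnetlink.h>. */
struct rtgenmsg_mirror {
	unsigned char rtgen_family;
};

struct ifinfomsg_mirror {
	unsigned char  ifi_family;
	unsigned char  __ifi_pad;
	unsigned short ifi_type;
	int            ifi_index;
	unsigned int   ifi_flags;
	unsigned int   ifi_change;
};

#define ALIGN4_SKETCH(len) (((len) + 3) & ~(size_t)3) /* netlink 4-byte alignment */
#define EXT_MASK_ATTR_LEN ((size_t)8)                 /* attr header + u32 payload */

/* Old iproute2 sends rtgenmsg + attribute, which is still shorter than
 * struct ifinfomsg, so a short payload identifies the old layout. */
static size_t pick_hdrlen_sketch(size_t payload_len)
{
	return payload_len < sizeof(struct ifinfomsg_mirror) ?
	       sizeof(struct rtgenmsg_mirror) : sizeof(struct ifinfomsg_mirror);
}
```

Under these assumptions, an old request carries 4 + 8 = 12 payload bytes, below the 16-byte `ifinfomsg`. A new request carries 16 + 8 = 24.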
      
      With this patch "ip link" shows VF information in both old and new
      iproute2 versions.
      Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      ea43a736
    • Eric Dumazet
      ipv4: fix buffer overflow in ip_options_compile() · 22310565
      Eric Dumazet authored
      [ Upstream commit 10ec9472 ]
      
      There is a benign buffer overflow in ip_options_compile spotted by
      AddressSanitizer[1]:
      
      It's benign because we can always access one extra byte in skb->head
      (because the header is followed by struct skb_shared_info), and in this
      case this byte is not even used.
      
      [28504.910798] ==================================================================
      [28504.912046] AddressSanitizer: heap-buffer-overflow in ip_options_compile
      [28504.913170] Read of size 1 by thread T15843:
      [28504.914026]  [<ffffffff81802f91>] ip_options_compile+0x121/0x9c0
      [28504.915394]  [<ffffffff81804a0d>] ip_options_get_from_user+0xad/0x120
      [28504.916843]  [<ffffffff8180dedf>] do_ip_setsockopt.isra.15+0x8df/0x1630
      [28504.918175]  [<ffffffff8180ec60>] ip_setsockopt+0x30/0xa0
      [28504.919490]  [<ffffffff8181e59b>] tcp_setsockopt+0x5b/0x90
      [28504.920835]  [<ffffffff8177462f>] sock_common_setsockopt+0x5f/0x70
      [28504.922208]  [<ffffffff817729c2>] SyS_setsockopt+0xa2/0x140
      [28504.923459]  [<ffffffff818cfb69>] system_call_fastpath+0x16/0x1b
      [28504.924722]
      [28504.925106] Allocated by thread T15843:
      [28504.925815]  [<ffffffff81804995>] ip_options_get_from_user+0x35/0x120
      [28504.926884]  [<ffffffff8180dedf>] do_ip_setsockopt.isra.15+0x8df/0x1630
      [28504.927975]  [<ffffffff8180ec60>] ip_setsockopt+0x30/0xa0
      [28504.929175]  [<ffffffff8181e59b>] tcp_setsockopt+0x5b/0x90
      [28504.930400]  [<ffffffff8177462f>] sock_common_setsockopt+0x5f/0x70
      [28504.931677]  [<ffffffff817729c2>] SyS_setsockopt+0xa2/0x140
      [28504.932851]  [<ffffffff818cfb69>] system_call_fastpath+0x16/0x1b
      [28504.934018]
      [28504.934377] The buggy address ffff880026382828 is located 0 bytes to the right
      [28504.934377]  of 40-byte region [ffff880026382800, ffff880026382828)
      [28504.937144]
      [28504.937474] Memory state around the buggy address:
      [28504.938430]  ffff880026382300: ........ rrrrrrrr rrrrrrrr rrrrrrrr
      [28504.939884]  ffff880026382400: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
      [28504.941294]  ffff880026382500: .....rrr rrrrrrrr rrrrrrrr rrrrrrrr
      [28504.942504]  ffff880026382600: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
      [28504.943483]  ffff880026382700: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
      [28504.944511] >ffff880026382800: .....rrr rrrrrrrr rrrrrrrr rrrrrrrr
      [28504.945573]                         ^
      [28504.946277]  ffff880026382900: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
      [28505.094949]  ffff880026382a00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
      [28505.096114]  ffff880026382b00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
      [28505.097116]  ffff880026382c00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
      [28505.098472]  ffff880026382d00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
      [28505.099804] Legend:
      [28505.100269]  f - 8 freed bytes
      [28505.100884]  r - 8 redzone bytes
      [28505.101649]  . - 8 allocated bytes
      [28505.102406]  x=1..7 - x allocated bytes + (8-x) redzone bytes
      [28505.103637] ==================================================================
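A minimal sketch of bounds-checked option parsing. It assumes a simple type/length/value layout plus one-byte END (0) and NOOP (1) options. Before reading the length byte `p[1]`, the parser checks that at least two bytes remain; that read is the kind of one-byte over-read reported above. `parse_options_sketch()` is hypothetical: it is not the body of `ip_options_compile()`, and the upstream fix is not reproduced here.

```c
#include <stddef.h>

#define OPT_END_SKETCH  0 /* end of option list */
#define OPT_NOOP_SKETCH 1 /* single-byte padding option */

/* Returns 0 for well-formed options, -1 otherwise; never reads past len. */
static int parse_options_sketch(const unsigned char *p, size_t len)
{
	size_t l = len;

	while (l > 0) {
		size_t optlen;

		if (*p == OPT_END_SKETCH)
			return 0;
		if (*p == OPT_NOOP_SKETCH) {
			p++;
			l--;
			continue;
		}
		if (l < 2)
			return -1; /* no room for a length byte: don't read p[1] */
		optlen = p[1];
		if (optlen < 2 || optlen > l)
			return -1;
		p += optlen;
		l -= optlen;
	}
	return 0;
}
```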
      
      [1] https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      22310565
    • Ben Hutchings
      dns_resolver: Null-terminate the right string · 0d604e94
      Ben Hutchings authored
      [ Upstream commit 640d7efe ]
      
      *_result[len] is parsed as *(_result[len]) which is not at all what we
      want to touch here.
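A minimal sketch of the precedence issue, using a hypothetical helper rather than `dns_query()` itself. The helper copies a payload that is not null-terminated into a newly allocated string returned through a `char **` out-parameter. The terminator must be written as `(*_result)[len]`.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy a payload that is not null-terminated into
 * a freshly allocated C string returned through an out-parameter. */
static int copy_result_sketch(const char *data, size_t len, char **_result)
{
	*_result = malloc(len + 1);       /* one extra byte for the terminator */
	if (!*_result)
		return -1;
	memcpy(*_result, data, len);
	(*_result)[len] = '\0';           /* terminate the string itself */
	/* Writing *_result[len] instead would parse as *(_result[len]) and
	 * dereference the len-th entry of a pointer array that isn't there. */
	return 0;
}
```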
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Fixes: 84a7c0b1 ("dns_resolver: assure that dns_query() result is null-terminated")
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0d604e94
    • Manuel Schölling
      dns_resolver: assure that dns_query() result is null-terminated · bba87648
      Manuel Schölling authored
      [ Upstream commit 84a7c0b1 ]
      
      dns_query() credulously assumes that keys are null-terminated and
      returns a copy of a memory block that is off by one.
      Signed-off-by: Manuel Schölling <manuel.schoelling@gmx.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      bba87648
    • Sowmini Varadhan
      sunvnet: clean up objects created in vnet_new() on vnet_exit() · 8b9b0927
      Sowmini Varadhan authored
      [ Upstream commit a4b70a07 ]
      
      Nothing cleans up the objects created by
      vnet_new(), they are completely leaked.
      
      vnet_exit(), after doing the vio_unregister_driver() to clean
      up ports, should call a helper function that iterates over vnet_list
      and cleans up those objects. This includes unregister_netdevice()
      as well as free_netdev().
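A userspace sketch of such an exit-time helper. It assumes a hypothetical singly linked list of objects, not the real `struct vnet` or `vnet_list`. Counters stand in for `unregister_netdevice()` and `free_netdev()`. The one subtlety is reading `next` before freeing the current node.

```c
#include <stdlib.h>

/* Hypothetical device object; not the sunvnet struct vnet. */
struct vnet_sketch {
	struct vnet_sketch *next;
};

static struct vnet_sketch *vnet_list_sketch; /* global list of created objects */
static int live_vnets;    /* allocated and not yet freed */
static int unregistered;  /* stand-in count for unregister_netdevice() calls */

static struct vnet_sketch *vnet_new_sketch(void)
{
	struct vnet_sketch *vp = malloc(sizeof(*vp));

	if (!vp)
		return NULL;
	vp->next = vnet_list_sketch;
	vnet_list_sketch = vp;
	live_vnets++;
	return vp;
}

/* Called on exit after the ports are gone. */
static void vnet_cleanup_sketch(void)
{
	struct vnet_sketch *vp = vnet_list_sketch;

	while (vp) {
		struct vnet_sketch *next = vp->next; /* read before freeing */

		unregistered++; /* stands in for unregister_netdevice() */
		free(vp);       /* stands in for free_netdev() */
		live_vnets--;
		vp = next;
	}
	vnet_list_sketch = NULL;
}
```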
      Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com>
      Reviewed-by: Karl Volz <karl.volz@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      8b9b0927
    • Daniel Borkmann
      net: sctp: fix information leaks in ulpevent layer · 2a3fda71
      Daniel Borkmann authored
      [ Upstream commit 8f2e5ae4 ]
      
      While working on some other SCTP code, I noticed that some
      structures shared with user space are leaking uninitialized
      stack or heap buffer contents. In particular, struct sctp_sndrcvinfo
      has a 2-byte hole between .sinfo_flags and .sinfo_ppid that
      remains unfilled by us in sctp_ulpevent_read_sndrcvinfo() when
      putting this into cmsg. Likewise, struct sctp_remote_error
      contains a 2-byte hole that we don't fill but place into a skb
      through skb_copy_expand() via sctp_ulpevent_make_remote_error().
      
      Both structures are defined by the IETF in RFC6458:
      
      * Section 5.3.2. SCTP Header Information Structure:
      
        The sctp_sndrcvinfo structure is defined below:
      
        struct sctp_sndrcvinfo {
          uint16_t sinfo_stream;
          uint16_t sinfo_ssn;
          uint16_t sinfo_flags;
          <-- 2 bytes hole  -->
          uint32_t sinfo_ppid;
          uint32_t sinfo_context;
          uint32_t sinfo_timetolive;
          uint32_t sinfo_tsn;
          uint32_t sinfo_cumtsn;
          sctp_assoc_t sinfo_assoc_id;
        };
      
      * 6.1.3. SCTP_REMOTE_ERROR:
      
        A remote peer may send an Operation Error message to its peer.
        This message indicates a variety of error conditions on an
        association. The entire ERROR chunk as it appears on the wire
        is included in an SCTP_REMOTE_ERROR event. Please refer to the
        SCTP specification [RFC4960] and any extensions for a list of
        possible error formats. An SCTP error notification has the
        following format:
      
        struct sctp_remote_error {
          uint16_t sre_type;
          uint16_t sre_flags;
          uint32_t sre_length;
          uint16_t sre_error;
          <-- 2 bytes hole  -->
          sctp_assoc_t sre_assoc_id;
          uint8_t  sre_data[];
        };
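The holes marked in the two layouts above can be located with `offsetof()`. The sketch below uses local mirrors of the quoted structs and assumes a 32-bit `sctp_assoc_t` and an ABI where `uint32_t` is 4-byte aligned. Under those assumptions each hole is 2 bytes (offsets 6-7 and 10-11). The hypothetical `fill_sndrcvinfo_sketch()` zeroes the struct before filling it so the hole carries no stale bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef int32_t sctp_assoc_sketch_t; /* assumption: 32-bit association id */

/* Local mirrors of the RFC 6458 layouts quoted above. */
struct sndrcvinfo_mirror {
	uint16_t sinfo_stream;
	uint16_t sinfo_ssn;
	uint16_t sinfo_flags;
	uint32_t sinfo_ppid;
	uint32_t sinfo_context;
	uint32_t sinfo_timetolive;
	uint32_t sinfo_tsn;
	uint32_t sinfo_cumtsn;
	sctp_assoc_sketch_t sinfo_assoc_id;
};

struct remote_error_mirror {
	uint16_t sre_type;
	uint16_t sre_flags;
	uint32_t sre_length;
	uint16_t sre_error;
	sctp_assoc_sketch_t sre_assoc_id;
	uint8_t  sre_data[];
};

/* Bytes between the end of 'field' and the start of 'next'. */
#define HOLE_AFTER(type, field, next) \
	(offsetof(type, next) - offsetof(type, field) - sizeof(((type *)0)->field))

/* Hypothetical filler: zero the whole struct before setting fields. */
static void fill_sndrcvinfo_sketch(struct sndrcvinfo_mirror *s,
				   uint16_t flags, uint32_t ppid)
{
	memset(s, 0, sizeof(*s));
	s->sinfo_flags = flags;
	s->sinfo_ppid = ppid;
}
```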
      
      Fix this by setting both to 0 before filling them out. We also
      have other structures shared between user and kernel space in
      SCTP that contain holes (e.g. struct sctp_paddrthlds), but we
      copy those buffers over from user space first and thus don't need
      to care about them in those cases.
      
      While at it, we can also remove lengthy comments copied from
      the draft, instead, we update the comment with the correct RFC
      number where one can look it up.
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      2a3fda71
    • Andrey Utkin
      appletalk: Fix socket referencing in skb · 7961c1a1
      Andrey Utkin authored
      [ Upstream commit 36beddc2 ]
      
      Setting just skb->sk without taking its reference and setting a
      destructor is invalid. However, in the places where this was done, skb
      is used in a way that does not require skb->sk to be set. So drop the
      setting of skb->sk.
      Thanks to Eric Dumazet <eric.dumazet@gmail.com> for correct solution.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=79441
      Reported-by: Ed Martin <edman007@edman007.com>
      Signed-off-by: Andrey Utkin <andrey.krieger.utkin@gmail.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      7961c1a1
    • dingtianhong
      igmp: fix the problem when mc leave group · 00fcf0cf
      dingtianhong authored
      [ Upstream commit 52ad353a ]
      
      The problem was triggered by these steps:
      
      1) create socket, bind and then setsockopt for add mc group.
         mreq.imr_multiaddr.s_addr = inet_addr("255.0.0.37");
         mreq.imr_interface.s_addr = inet_addr("192.168.1.2");
         setsockopt(sockfd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq));
      
      2) drop the mc group for this socket.
         mreq.imr_multiaddr.s_addr = inet_addr("255.0.0.37");
         mreq.imr_interface.s_addr = inet_addr("0.0.0.0");
         setsockopt(sockfd, IPPROTO_IP, IP_DROP_MEMBERSHIP, &mreq, sizeof(mreq));
      
      3) then drop the socket; I found the mc group was still used by the dev:
      
         netstat -g
      
         Interface       RefCnt Group
         --------------- ------ ---------------------
         eth2		   1	  255.0.0.37
      
      Normally, even if IP_DROP_MEMBERSHIP returns an error, the mc group still
      needs to be released for the netdev when the socket is dropped, but this
      process was broken when the default route is NULL. The reason is:
      
      ip_mc_leave_group() chooses the in_dev by imr_interface.s_addr. If the
      input addr is NULL, the default route dev is chosen, the ifindex is taken
      from that dev, and then inet->mc_list is polled and -ENODEV returned. But
      if the default route dev is NULL, the in_dev and ifindex are both NULL, so
      when inet->mc_list is polled the mc group is released from the mc_list
      while the dev does not decrement its refcnt for this group. As a result,
      when the socket is dropped, the mc_list is NULL and the dev still keeps
      this group.
      
      v1->v2: Following Hideaki's suggestion, we should align with IPv6 (RFC3493)
      	and BSDs, so I add a check for the in_dev before polling the mc_list, to
      	make sure that when we remove the mc group, we decrement the refcnt on the
      	real dev that was using the mc address. The problem should not happen again.
      Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      00fcf0cf
    • Li RongQing
      8021q: fix a potential memory leak · 96f641a7
      Li RongQing authored
      [ Upstream commit 916c1689 ]
      
      skb_cow, called in vlan_reorder_header, does not free the skb when it fails,
      and vlan_reorder_header returns NULL to reset the original skb when it is
      called in vlan_untag, which leads to a memory leak.
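A minimal sketch of the ownership rule behind the leak, using hypothetical buffer helpers rather than the sk_buff API. `cow_sketch()` reports failure without freeing, as described for skb_cow above. The caller overwrites its pointer with the return value, so on failure `reorder_header_sketch()` must free the buffer itself or nobody will.

```c
#include <stdlib.h>

/* Hypothetical buffer standing in for an sk_buff. */
struct buf_sketch {
	int shared;
};

static int live_bufs; /* allocated and not yet freed */

static struct buf_sketch *buf_alloc_sketch(int shared)
{
	struct buf_sketch *b = malloc(sizeof(*b));

	if (b) {
		b->shared = shared;
		live_bufs++;
	}
	return b;
}

static void buf_free_sketch(struct buf_sketch *b)
{
	if (b) {
		live_bufs--;
		free(b);
	}
}

/* Reports failure but does NOT free, like skb_cow as described above. */
static int cow_sketch(struct buf_sketch *b)
{
	return b->shared ? -1 : 0; /* pretend copying a shared buffer fails */
}

/* Returns the buffer, or NULL after freeing it on failure. */
static struct buf_sketch *reorder_header_sketch(struct buf_sketch *b)
{
	if (cow_sketch(b) < 0) {
		buf_free_sketch(b); /* the caller's pointer is about to become NULL */
		return NULL;
	}
	/* ... header rewrite would happen here ... */
	return b;
}
```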
      Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      96f641a7
    • Neal Cardwell
      tcp: fix tcp_match_skb_to_sack() for unaligned SACK at end of an skb · 94fb7252
      Neal Cardwell authored
      [ Upstream commit 2cd0d743 ]
      
      If there is an MSS change (or misbehaving receiver) that causes a SACK
      to arrive that covers the end of an skb but is less than one MSS, then
      tcp_match_skb_to_sack() was rounding up pkt_len to the full length of
      the skb ("Round if necessary..."), then chopping all bytes off the skb
      and creating a zero-byte skb in the write queue.
      
      This was visible now because the recently simplified TLP logic in
      bef1909e ("tcp: fixing TLP's FIN recovery") could find that 0-byte
      skb at the end of the write queue, and now that we do not check that
      skb's length we could send it as a TLP probe.
      
      Consider the following example scenario:
      
       mss: 1000
       skb: seq: 0 end_seq: 4000  len: 4000
       SACK: start_seq: 3999 end_seq: 4000
      
      The tcp_match_skb_to_sack() code will compute:
      
       in_sack = false
       pkt_len = start_seq - TCP_SKB_CB(skb)->seq = 3999 - 0 = 3999
       new_len = (pkt_len / mss) * mss = (3999/1000)*1000 = 3000
       new_len += mss = 4000
      
      Previously we would find the new_len > skb->len check failing, so we
      would fall through and set pkt_len = new_len = 4000 and chop off
      pkt_len of 4000 from the 4000-byte skb, leaving a 0-byte segment
      afterward in the write queue.
      
      With this new commit, we notice that the new new_len >= skb->len check
      succeeds, so that we return without trying to fragment.
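A sketch of the worked example's arithmetic for the !in_sack case, with a flag selecting the old `>` check or the new `>=` check. The function name is hypothetical. The two guards (`pkt_len > mss`, and rounding up only when `new_len < pkt_len`) are my reading of the "Round if necessary" comment, not a copy of tcp_match_skb_to_sack().

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns the number of bytes to split off the front of the skb, or 0 to
 * leave it unfragmented. 'fixed' selects the new >= check over the old >. */
static uint32_t sack_split_len_sketch(uint32_t skb_seq, uint32_t skb_len,
				      uint32_t sack_start, uint32_t mss,
				      bool fixed)
{
	uint32_t pkt_len = sack_start - skb_seq; /* bytes before the SACKed part */

	if (pkt_len > mss) {
		uint32_t new_len = (pkt_len / mss) * mss;

		if (new_len < pkt_len) {         /* not on an MSS boundary: round up */
			new_len += mss;
			if (fixed ? new_len >= skb_len : new_len > skb_len)
				return 0;
		}
		pkt_len = new_len;
	}
	return pkt_len;
}
```

For the scenario above (seq 0, len 4000, SACK start 3999, mss 1000), the old check returns 4000. That would chop the whole skb and leave a 0-byte remainder. The fixed check returns 0, so no fragmentation happens. A SACK starting at 2500 still yields a 3000-byte split with either check.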
      
      Fixes: adb92db8 ("tcp: Make SACK code to split only at mss boundaries")
      Reported-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Ilpo Jarvinen <ilpo.jarvinen@helsinki.fi>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      94fb7252