1. 15 Feb, 2014 39 commits
    • drm/radeon: Fix sideport problems on certain RS690 boards · e5fdcafb
      Alex Deucher authored
      commit 8333f0fe upstream.
      
      Some RS690 boards with 64MB of sideport memory show up as
      having 128MB sideport + 256MB of UMA.  In this case,
      just skip the sideport memory and use UMA.  This fixes
      rendering corruption and should improve performance.
      
      bug:
      https://bugs.freedesktop.org/show_bug.cgi?id=35457
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • iscsi-target: Fix-up all zero data-length CDBs with R/W_BIT set · d92055f1
      Nicholas Bellinger authored
      commit 4454b66c upstream.
      
      This patch changes special case handling for ISCSI_OP_SCSI_CMD
      where an initiator sends a zero length Expected Data Transfer
      Length (EDTL), but still sets the WRITE and/or READ flag bits
      when no payload transfer is requested.
      
      Many, many moons ago two special cases were added for an ancient
      version of ESX that has long since been fixed, so instead of adding
      a new special case for the reported bug with a Broadcom 57800 NIC,
      go ahead and always strip off the incorrect WRITE + READ flag bits.
      
      Also, avoid sending a reject here, as RFC-3720 does mandate this
      case be handled without protocol error.
      Reported-by: Witold Bazakbal <865perl@wp.pl>
      Tested-by: Witold Bazakbal <865perl@wp.pl>
      Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • xhci: Limit the spurious wakeup fix only to HP machines · fb738589
      Takashi Iwai authored
      commit 6962d914 upstream.
      
      We've got regression reports that my previous fix for spurious wakeups
      after S5 on HP Haswell machines leads to an automatic reboot at
      shutdown on some machines.  It turned out that the fix for one set of
      machines triggers another BIOS bug on others, so the two are mutually
      exclusive.
      
      Since the original S5 wakeups have been confirmed only on HP machines,
      it'd be safer to apply it only to limited machines.  As a wild guess,
      limiting to machines with HP PCI SSID should suffice.
      
      This patch should be backported to kernels as old as 3.12, that
      contain the commit 638298dc "xhci: Fix
      spurious wakeups after S5 on Haswell".
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=66171
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
      Tested-by: <dashing.meng@gmail.com>
      Reported-by: Niklas Schnelle <niklas@komani.de>
      Reported-by: Giorgos <ganastasiouGR@gmail.com>
      Reported-by: <art1@vhex.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • ext4: fix del_timer() misuse for ->s_err_report · f09946da
      Al Viro authored
      commit 9105bb14 upstream.
      
      That thing should be del_timer_sync(); consider what happens
      if ext4_put_super() call of del_timer() happens to come just as it's
      getting run on another CPU.  Since that timer reschedules itself
      to run next day, you are pretty much guaranteed that you'll end up
      with kfree'd scheduled timer, with usual fun consequences.  AFAICS,
      that's -stable fodder all way back to 2010... [the second del_timer_sync()
      is almost certainly not needed, but it doesn't hurt either]
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • ext2: Fix oops in ext2_get_block() called from ext2_quota_write() · f5b4f2e8
      Jan Kara authored
      commit df4e7ac0 upstream.
      
      ext2_quota_write() doesn't properly set up the bh it passes to
      ext2_get_block() and thus we hit assertion BUG_ON(maxblocks == 0) in
      ext2_get_blocks() (or we could actually ask for mapping arbitrary number
      of blocks depending on whatever value was on stack).
      
      Fix ext2_quota_write() to properly fill in number of blocks to map.
      Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reported-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • ext4: check for overlapping extents in ext4_valid_extent_entries() · 4645e4ee
      Eryu Guan authored
      commit 5946d089 upstream.
      
      A corrupted ext4 may have out of order leaf extents, i.e.
      
      extent: lblk 0--1023, len 1024, pblk 9217, flags: LEAF UNINIT
      extent: lblk 1000--2047, len 1024, pblk 10241, flags: LEAF UNINIT
                   ^^^^ overlap with previous extent
      
      Reading such extent could hit BUG_ON() in ext4_es_cache_extent().
      
      	BUG_ON(end < lblk);
      
      The problem is that __read_extent_tree_block() tries to cache holes as
      well but assumes 'lblk' is greater than 'prev' and passes underflowed
      length to ext4_es_cache_extent(). Fix it by checking for overlapping
      extents in ext4_valid_extent_entries().
      
      I hit this when fuzz testing ext4, and am able to reproduce it by
      modifying the on-disk extent by hand.
      
      Also add the check for (ee_block + len - 1) in ext4_valid_extent() to
      make sure the value does not overflow.
      
      Ran xfstests on patched ext4 and no regression.
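
      The overlap and overflow checks described above can be sketched in
      userspace C.  This is only an illustration under stated assumptions:
      struct extent and extents_valid() are hypothetical names, not the
      kernel's on-disk structures or the actual ext4_valid_extent_entries()
      code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified extent: first logical block and length in blocks. */
struct extent {
    uint32_t lblk;
    uint32_t len;
};

/*
 * Return 1 if the extents are in strictly increasing logical order with no
 * overlap and no 32-bit wraparound of the last block, 0 otherwise.
 */
int extents_valid(const struct extent *ext, size_t count)
{
    uint32_t prev_end = 0;  /* last logical block of the previous extent */
    size_t i;

    for (i = 0; i < count; i++) {
        uint32_t last;

        if (ext[i].len == 0)
            return 0;
        last = (uint32_t)(ext[i].lblk + ext[i].len - 1);
        if (last < ext[i].lblk)                 /* lblk + len - 1 wrapped */
            return 0;
        if (i > 0 && ext[i].lblk <= prev_end)   /* overlap or out of order */
            return 0;
        prev_end = last;
    }
    return 1;
}
```

      With two extents starting at logical blocks 0 and 1000, each 1024
      blocks long, the second starts before the first ends and the sketch
      rejects the pair.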
      
      Cc: Lukáš Czerner <lczerner@redhat.com>
      Signed-off-by: Eryu Guan <guaneryu@gmail.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • ext4: fix use-after-free in ext4_mb_new_blocks · ec94b7ab
      Junho Ryu authored
      commit 4e8d2139 upstream.
      
      ext4_mb_put_pa should hold pa->pa_lock before accessing pa->pa_count.
      While ext4_mb_use_preallocated checks pa->pa_deleted first and then
      increments pa->pa_count later, ext4_mb_put_pa decrements pa->pa_count
      before holding pa->pa_lock and then sets pa->pa_deleted.
      
      * Free sequence
      ext4_mb_put_pa (1):		atomic_dec_and_test pa->pa_count
      ext4_mb_put_pa (2):		lock pa->pa_lock
      ext4_mb_put_pa (3):			check pa->pa_deleted
      ext4_mb_put_pa (4):			set pa->pa_deleted=1
      ext4_mb_put_pa (5):		unlock pa->pa_lock
      ext4_mb_put_pa (6):		remove pa from a list
      ext4_mb_pa_callback:		free pa
      
      * Use sequence
      ext4_mb_use_preallocated (1):	iterate over preallocation
      ext4_mb_use_preallocated (2):	lock pa->pa_lock
      ext4_mb_use_preallocated (3):		check pa->pa_deleted
      ext4_mb_use_preallocated (4):		increase pa->pa_count
      ext4_mb_use_preallocated (5):	unlock pa->pa_lock
      ext4_mb_release_context:	access pa
      
      * Use-after-free sequence
      [initial status]		<pa->pa_deleted = 0, pa_count = 1>
      ext4_mb_use_preallocated (1):	iterate over preallocation
      ext4_mb_use_preallocated (2):	lock pa->pa_lock
      ext4_mb_use_preallocated (3):		check pa->pa_deleted
      ext4_mb_put_pa (1):		atomic_dec_and_test pa->pa_count
      [pa_count decremented]		<pa->pa_deleted = 0, pa_count = 0>
      ext4_mb_use_preallocated (4):		increase pa->pa_count
      [pa_count incremented]		<pa->pa_deleted = 0, pa_count = 1>
      ext4_mb_use_preallocated (5):	unlock pa->pa_lock
      ext4_mb_put_pa (2):		lock pa->pa_lock
      ext4_mb_put_pa (3):			check pa->pa_deleted
      ext4_mb_put_pa (4):			set pa->pa_deleted=1
      [race condition!]		<pa->pa_deleted = 1, pa_count = 1>
      ext4_mb_put_pa (5):		unlock pa->pa_lock
      ext4_mb_put_pa (6):		remove pa from a list
      ext4_mb_pa_callback:		free pa
      ext4_mb_release_context:	access pa
      
      AddressSanitizer has detected use-after-free in ext4_mb_new_blocks
      Bug report: http://goo.gl/rG1On3
      Signed-off-by: Junho Ryu <jayr@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • ext4: call ext4_error_inode() if jbd2_journal_dirty_metadata() fails · 9eb492b8
      Theodore Ts'o authored
      commit ae1495b1 upstream.
      
      While it's true that errors can only happen if there is a bug in
      jbd2_journal_dirty_metadata(), if a bug does happen, we need to halt
      the kernel or remount the file system read-only in order to avoid
      further data loss.  The ext4_journal_abort_handle() function doesn't
      do any of this, and while this call (since it doesn't adjust
      refcounts) will likely result in the file system eventually
      deadlocking since the current transaction will never be able to close,
      it's much cleaner to let ext4's error handling system deal with
      this situation.
      
      There's a separate bug here which is that if certain jbd2 errors
      occur and the file system is mounted errors=continue, the file
      system will probably eventually grind to a halt as described
      above.  But things have been this way for a long time, and usually when
      we have these sorts of errors it's pretty much a disaster --- and
      that's why the jbd2 layer aggressively retries memory allocations,
      which is the most likely cause of these jbd2 errors.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: Jan Kara <jack@suse.cz>
      [bwh: Backported to 3.2: drop logging of missing transaction debug data]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • libata: add ATA_HORKAGE_BROKEN_FPDMA_AA quirk for Seagate Momentus SpinPoint M8 · cbeb052c
      Michele Baldessari authored
      commit 87809942 upstream.
      
      We've received multiple reports in Fedora via (BZ 907193)
      that the Seagate Momentus SpinPoint M8 errors out when enabling AA:
      [    2.555905] ata2.00: failed to enable AA (error_mask=0x1)
      [    2.568482] ata2.00: failed to enable AA (error_mask=0x1)
      
      Add the ATA_HORKAGE_BROKEN_FPDMA_AA for this specific harddisk.
      Reported-by: Nicholas <arealityfarbetween@googlemail.com>
      Signed-off-by: Michele Baldessari <michele@acksyn.org>
      Tested-by: Nicholas <arealityfarbetween@googlemail.com>
      Acked-by: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • sh: always link in helper functions extracted from libgcc · bed3dd59
      Geert Uytterhoeven authored
      commit 84ed8a99 upstream.
      
      E.g. landisk_defconfig, which has CONFIG_NTFS_FS=m:
      
        ERROR: "__ashrdi3" [fs/ntfs/ntfs.ko] undefined!
      
      For "lib-y", if no symbols in a compilation unit are referenced by other
      units, the compilation unit will not be included in vmlinux.  This
      breaks modules that do reference those symbols.
      
      Use "obj-y" instead to fix this.
      
      http://kisskb.ellerman.id.au/kisskb/buildresult/8838077/
      
      This doesn't fix all cases. There are others, e.g. udivsi3.
      This is also not limited to sh, many architectures handle this in the
      same way.
      
      A simple solution is to unconditionally include all helper functions.
      A more complex solution is to make the choice of "lib-y" or "obj-y" depend
      on CONFIG_MODULES:
      
        obj-$(CONFIG_MODULES) += ...
        lib-y($CONFIG_MODULES) += ...
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Tested-by: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.yj@renesas.com>
      Reviewed-by: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.yj@renesas.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • ceph: wake up 'safe' waiters when unregistering request · f4ca736c
      Yan, Zheng authored
      commit fc55d2c9 upstream.
      
      We also need to wake up 'safe' waiters if an error occurs or the request
      is aborted. Otherwise sync(2)/fsync(2) may hang forever.
      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
      Signed-off-by: Sage Weil <sage@inktank.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • ceph: cleanup aborted requests when re-sending requests. · b072f9ca
      Yan, Zheng authored
      commit eb1b8af3 upstream.
      
      Aborted requests usually get cleared when the reply is received.
      If the MDS crashes, no reply will be received, so we need to clean up
      aborted requests when re-sending requests.
      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
      Reviewed-by: Greg Farnum <greg@inktank.com>
      Signed-off-by: Sage Weil <sage@inktank.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • mm: ensure get_unmapped_area() returns higher address than mmap_min_addr · 7f4d2460
      Akira Takeuchi authored
      commit 2afc745f upstream.
      
      This patch fixes the problem that get_unmapped_area() can return an
      illegal address, causing mmap(2) and similar calls to fail.
      
      When /proc/sys/vm/mmap_min_addr is set to an address higher than
      PAGE_SIZE, get_unmapped_area() can return an address lower than
      mmap_min_addr, even if you do not pass any virtual address hint
      (i.e.  the second argument).
      
      This is because the current get_unmapped_area() code does not take
      mmap_min_addr into account.
      
      This leads to two actual problems as follows:
      
      1. mmap(2) can fail with EPERM in a process without CAP_SYS_RAWIO,
         although no illegal parameter is passed.
      
      2. The bottom-up search path after the top-down search might not work in
         arch_get_unmapped_area_topdown().
      
      Note: The first and third chunk of my patch, which changes "len" check,
      are for more precise check using mmap_min_addr, and not for solving the
      above problem.
      
      [How to reproduce]
      
      	--- test.c -------------------------------------------------
      	#include <stdio.h>
      	#include <unistd.h>
      	#include <sys/mman.h>
      	#include <sys/errno.h>
      
      	int main(int argc, char *argv[])
      	{
      		void *ret = NULL, *last_map;
      		size_t pagesize = sysconf(_SC_PAGESIZE);
      
      		do {
      			last_map = ret;
      			ret = mmap(0, pagesize, PROT_NONE,
      				MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
      	//		printf("ret=%p\n", ret);
      		} while (ret != MAP_FAILED);
      
      		if (errno != ENOMEM) {
      			printf("ERR: unexpected errno: %d (last map=%p)\n",
      			errno, last_map);
      		}
      
      		return 0;
      	}
      	---------------------------------------------------------------
      
      	$ gcc -m32 -o test test.c
      	$ sudo sysctl -w vm.mmap_min_addr=65536
      	vm.mmap_min_addr = 65536
      	$ ./test  (run as a non-privileged user)
      	ERR: unexpected errno: 1 (last map=0x10000)
      Signed-off-by: Akira Takeuchi <takeuchi.akr@jp.panasonic.com>
      Signed-off-by: Kiyoshi Owada <owada.kiyoshi@jp.panasonic.com>
      Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      [bwh: Backported to 3.2:
       As we do not have vm_unmapped_area(), make arch_get_unmapped_area_topdown()
       calculate the lower limit for the new area's end address and then compare
       addresses with this instead of with len.  In the process, fix an off-by-one
       error which could result in returning 0 if mm->mmap_base == len.]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • x86, fpu, amd: Clear exceptions in AMD FXSAVE workaround · bbc220ab
      Linus Torvalds authored
      commit 26bef131 upstream.
      
      Before we do an EMMS in the AMD FXSAVE information leak workaround we
      need to clear any pending exceptions, otherwise we trap with a
      floating-point exception inside this code.
      Reported-by: halfdog <me@halfdog.net>
      Tested-by: Borislav Petkov <bp@suse.de>
      Link: http://lkml.kernel.org/r/CA%2B55aFxQnY_PCG_n4=0w-VG=YLXL-yr7oMxyy0WU2gCBAf3ydg@mail.gmail.com
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      [bwh: Backported to 3.2: adjust filename, context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • KVM: x86: Convert vapic synchronization to _cached functions (CVE-2013-6368) · 6aa82e03
      Andy Honig authored
      commit fda4e2e8 upstream.
      
      In kvm_lapic_sync_from_vapic and kvm_lapic_sync_to_vapic there is the
      potential to corrupt kernel memory if userspace provides an address that
      is at the end of a page.  This patch converts those functions to use
      kvm_write_guest_cached and kvm_read_guest_cached.  It also checks the
      vapic_address specified by userspace during ioctl processing and returns
      an error to userspace if the address is not a valid GPA.
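
      To illustrate why an address at the end of a page is dangerous, the
      following sketch tests whether an access of a given length stays inside
      one page.  It is not the kernel code: fits_in_page() and
      EXAMPLE_PAGE_SIZE are hypothetical names, and a 4096-byte page is
      assumed.

```c
#include <stdint.h>

#define EXAMPLE_PAGE_SIZE 4096u  /* assumed page size, for illustration only */

/* Return 1 if the byte range [addr, addr + len) lies within a single page. */
int fits_in_page(uint64_t addr, uint32_t len)
{
    uint64_t offset = addr & (EXAMPLE_PAGE_SIZE - 1);

    return len != 0 && len <= EXAMPLE_PAGE_SIZE - offset;
}
```

      A 4-byte access at offset 0xffe of a page spills into the next page,
      which is the kind of access that a write through a single-page mapping
      cannot perform safely.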
      
      This is generally not guest triggerable, because the required write is
      done by firmware that runs before the guest.  Also, it only affects AMD
      processors and oldish Intel that do not have the FlexPriority feature
      (unless you disable FlexPriority, of course; then newer processors are
      also affected).
      
      Fixes: b93463aa ('KVM: Accelerated apic support')
      Reported-by: Andrew Honig <ahonig@google.com>
      Signed-off-by: Andrew Honig <ahonig@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      [dannf: backported to Debian's 3.2]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • ath9k_htc: properly set MAC address and BSSID mask · f7a9877c
      Mathy Vanhoef authored
      commit 657eb17d upstream.
      
      Pick the MAC address of the first virtual interface as the new hardware MAC
      address. Set BSSID mask according to this MAC address. This fixes CVE-2013-4579.
      Signed-off-by: Mathy Vanhoef <vanhoefm@gmail.com>
      Signed-off-by: John W. Linville <linville@tuxdriver.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • hpfs: fix warnings when the filesystem fills up · bfefd2a8
      Mikulas Patocka authored
      commit bbd465df upstream.
      
      This patch fixes warnings due to a missing lock on the write error path.
      
        WARNING: at fs/hpfs/hpfs_fn.h:353 hpfs_truncate+0x75/0x80 [hpfs]()
        Hardware name: empty
        Pid: 26563, comm: dd Tainted: P           O 3.9.4 #12
        Call Trace:
          hpfs_truncate+0x75/0x80 [hpfs]
          hpfs_write_begin+0x84/0x90 [hpfs]
          _hpfs_bmap+0x10/0x10 [hpfs]
          generic_file_buffered_write+0x121/0x2c0
          __generic_file_aio_write+0x1c7/0x3f0
          generic_file_aio_write+0x7c/0x100
          do_sync_write+0x98/0xd0
          hpfs_file_write+0xd/0x50 [hpfs]
          vfs_write+0xa2/0x160
          sys_write+0x51/0xa0
          page_fault+0x22/0x30
          system_call_fastpath+0x1a/0x1f
      Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      [Mikulas Patocka: This is a backport of upstream commit bbd465df,
       modified for stable kernels 2.6.39 - 3.7.]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • Fix warning from machine_kexec.c · 6512274f
      Paul Gortmaker authored
      commit c19ce0ab upstream.
      
      Use proper cpp defined(...) constructs to avoid this:
      
      arch/ia64/kernel/machine_kexec.c: In function 'arch_crash_save_vmcoreinfo':
      arch/ia64/kernel/machine_kexec.c:160:8: warning: "CONFIG_PGTABLE_4" is not defined
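
      As a small illustration of the defined(...) construct, the sketch below
      uses hypothetical option names (CONFIG_EXAMPLE_3, CONFIG_EXAMPLE_4) and
      a hypothetical function; it is not the code from machine_kexec.c.

```c
/* CONFIG_EXAMPLE_3 is set; CONFIG_EXAMPLE_4 is deliberately left undefined. */
#define CONFIG_EXAMPLE_3 1

/*
 * Writing "#if CONFIG_EXAMPLE_4" would evaluate the undefined macro as 0 and
 * draw an "is not defined" warning when gcc is run with -Wundef.  defined()
 * tests for existence instead, so no such warning is produced.
 */
int example_pgtable_levels(void)
{
#if defined(CONFIG_EXAMPLE_4)
    return 4;
#elif defined(CONFIG_EXAMPLE_3)
    return 3;
#else
    return 2;
#endif
}
```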
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • staging: comedi: cb_pcidio: fix for newer PCI-DIO48H · 11e106ef
      Ian Abbott authored
      commit 0283f7a1 upstream.
      
      At some point, Measurement Computing / ComputerBoards redesigned the
      PCI-DIO48H to use a PLX PCI interface chip instead of an AMCC chip.
      This meant they had to put their hardware registers in the PCI BAR 2
      region instead of PCI BAR 1.  Unfortunately, they kept the same PCI
      device ID for the new design.  This means the driver recognizes the
      newer cards, but doesn't work (and is likely to screw up the local
      configuration registers of the PLX chip) because it's using the wrong
      region.
      
      Since all the supported boards have the DIO registers in the PCI BAR 2
      region except for older PCI-DIO48H boards which have an empty PCI BAR 2
      region and the DIO registers in PCI BAR 1, determine which PCI BAR
      region to use based on whether the PCI BAR 2 region is empty or not.
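
      The selection rule above can be sketched as follows.  struct bar_sizes
      and dio_bar_index() are hypothetical stand-ins for illustration; the
      driver itself queries the PCI core for the region sizes.

```c
#include <stdint.h>

/* Hypothetical stand-in for the lengths of a PCI device's six BAR regions. */
struct bar_sizes {
    uint64_t len[6];
};

/* Use BAR 2 when it is populated; otherwise fall back to BAR 1. */
int dio_bar_index(const struct bar_sizes *bars)
{
    return bars->len[2] != 0 ? 2 : 1;
}
```

      An older PCI-DIO48H with an empty BAR 2 yields index 1, while every
      board with a populated BAR 2 yields index 2.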
      
      This change makes the `dioregs_badrindex` member of `struct
      pcidio_board` redundant.  The `pcicontroler_badrindex` member is also
      unused, so remove both members.
      Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
      Cc: kernel-team@lists.ubuntu.com
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • mm/memory-failure.c: recheck PageHuge() after hugetlb page migrate successfully · 0ebf55cf
      Jianguo Wu authored
      commit a49ecbcd upstream.
      
      After a successful hugetlb page migration by soft offline, the source
      page will either be freed into hugepage_freelists or buddy (over-commit
      page).  If the page is in buddy, page_hstate(page) will be NULL, which
      leads to a NULL pointer dereference in dequeue_hwpoisoned_huge_page().
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
        IP: [<ffffffff81163761>] dequeue_hwpoisoned_huge_page+0x131/0x1d0
        PGD c23762067 PUD c24be2067 PMD 0
        Oops: 0000 [#1] SMP
      
      So check PageHuge(page) after migrate_pages() returns successfully.
      Signed-off-by: Jianguo Wu <wujianguo@huawei.com>
      Tested-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      [wujg: backport to 3.4:
       - adjust context
       - s/num_poisoned_pages/mce_bad_pages/]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • PCI: Enable ARI if dev and upstream bridge support it; disable otherwise · 3a6ac4b9
      Yijing Wang authored
      commit b0cc6020 upstream.
      
      Currently, we enable ARI in a device's upstream bridge if the bridge and
      the device support it.  But we never disable ARI, even if the device is
      removed and replaced with a device that doesn't support ARI.
      
      This means that if we hot-remove an ARI device and replace it with a
      non-ARI multi-function device, we find only function 0 of the new device
      because the upstream bridge still has ARI enabled, and next_ari_fn()
      only returns function 0 for the new non-ARI device.
      
      This patch disables ARI in the upstream bridge if the device doesn't
      support ARI.  See the PCIe spec, r3.0, sec 6.13.
      
      [bhelgaas: changelog, function comment]
      [yijing: replace PCIe Cap accessor with legacy PCI accessor]
      Signed-off-by: Yijing Wang <wangyijing@huawei.com>
      Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • xfs: Account log unmount transaction correctly · 1c7a9417
      Dave Chinner authored
      commit 3948659e upstream.
      
      There have been a few reports of this warning appearing recently:
      
      XFS (dm-4): xlog_space_left: head behind tail
       tail_cycle = 129, tail_bytes = 20163072
       GH   cycle = 129, GH   bytes = 20162880
      
      The common cause appears to be lots of freeze and unfreeze cycles,
      and the output from the warnings indicates that we are leaking
      around 8 bytes of log space per freeze/unfreeze cycle.
      
      When we freeze the filesystem, we write an unmount record and that
      uses xlog_write directly - a special type of transaction,
      effectively. What it doesn't do, however, is correctly account for
      the log space it uses. The unmount record writes an 8 byte structure
      with a special magic number into the log, and the space this
      consumes is not accounted for in the log ticket tracking the
      operation. Hence we leak 8 bytes every unmount record that is
      written.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ben Myers <bpm@sgi.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • net: avoid reference counter overflows on fib_rules in multicast forwarding · 609365b9
      Hannes Frederic Sowa authored
      [ Upstream commit 95f4a45d ]
      
      Bob Falken reported that after 4G packets, multicast forwarding stopped
      working. This was because of a rule reference counter overflow which
      freed the rule as soon as the overflow happened.
      
      This patch solves this by adding the FIB_LOOKUP_NOREF flag to
      fib_rules_lookup calls. This is safe even from non-rcu locked sections
      as in this case the flag only implies not taking a reference to the rule,
      which we don't need at all.
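
      The "4G packets" figure matches the wraparound point of a 32-bit
      counter.  The sketch below is a simplified arithmetic model, assuming
      one increment per packet with no matching decrement; refcnt_after() is
      a hypothetical helper, not the kernel's reference counting code.

```c
#include <stdint.h>

/*
 * Value of a 32-bit counter that starts at `start` and is incremented once
 * per packet without ever being decremented (arithmetic modulo 2^32).
 */
uint32_t refcnt_after(uint32_t start, uint64_t packets)
{
    return (uint32_t)(start + packets);
}
```

      Starting from 1, the counter reads 0 after 2^32 - 1 increments, at
      which point the object looks unreferenced even though it is still in
      use.  Skipping the reference entirely, as FIB_LOOKUP_NOREF does, removes
      the counter from this path.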
      
      Rules only hold references to the namespace, which are guaranteed to be
      available during the call of the non-rcu protected function reg_vif_xmit
      because of the interface reference which itself holds a reference to
      the net namespace.
      
      Fixes: f0ad0860 ("ipv4: ipmr: support multiple tables")
      Fixes: d1db275d ("ipv6: ip6mr: support multiple tables")
      Reported-by: Bob Falken <NetFestivalHaveFun@gmx.com>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Thomas Graf <tgraf@suug.ch>
      Cc: Julian Anastasov <ja@ssi.bg>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • inet_diag: fix inet_diag_dump_icsk() timewait socket state logic · 96a042c2
      Neal Cardwell authored
      [ Based upon upstream commit 70315d22 ]
      
      Fix inet_diag_dump_icsk() to reflect the fact that both TIME_WAIT and
      FIN_WAIT2 connections are represented by inet_timewait_sock (not just
      TIME_WAIT). Thus:
      
      (a) We need to iterate through the time_wait buckets if the user wants
      either TIME_WAIT or FIN_WAIT2. (Before fixing this, "ss -nemoi state
      fin-wait-2" would not return any sockets, even if there were some in
      FIN_WAIT2.)
      
      (b) We need to check tw_substate to see if the user wants to dump
      sockets in the particular substate (TIME_WAIT or FIN_WAIT2) that a
      given connection is in. (Before fixing this, "ss -nemoi state
      time-wait" would actually return sockets in state FIN_WAIT2.)
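
      Points (a) and (b) can be sketched as two small predicates.  The state
      numbers, the EX_STATE_BIT() macro and both function names are
      hypothetical and chosen only for illustration; the real code works on
      the request's state bitmask and the socket's tw_substate.

```c
/* Hypothetical state numbers and bit macro, for illustration only. */
enum { EX_FIN_WAIT2 = 5, EX_TIME_WAIT = 6 };
#define EX_STATE_BIT(s) (1u << (s))

/* (a) Walk the timewait buckets if either state was requested. */
int want_timewait_walk(unsigned int wanted_states)
{
    unsigned int tw_states = EX_STATE_BIT(EX_TIME_WAIT) |
                             EX_STATE_BIT(EX_FIN_WAIT2);

    return (wanted_states & tw_states) != 0;
}

/* (b) Report a timewait socket only if its substate was requested. */
int timewait_sock_matches(unsigned int wanted_states, int substate)
{
    return (wanted_states & EX_STATE_BIT(substate)) != 0;
}
```

      A request for FIN_WAIT2 alone therefore still walks the timewait
      buckets, and a request for TIME_WAIT alone no longer matches a socket
      whose substate is FIN_WAIT2.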
      
      An analogous fix is in v3.13: 70315d22
      ("inet_diag: fix inet_diag_dump_icsk() to use correct state for
      timewait sockets") but that patch is quite different because 3.13 code
      is very different in this area due to the unification of TCP hash
      tables in 05dbc7b5 ("tcp/dccp: remove twchain") in v3.13-rc1.
      
      I tested that this applies cleanly between v3.3 and v3.12, and tested
      that it works in both 3.3 and 3.12. It does not apply cleanly to 3.2
      and earlier (though it makes semantic sense), and semantically is not
      the right fix for 3.13 and beyond (as mentioned above).
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • bnx2x: fix DMA unmapping of TSO split BDs · 46bdd0fd
      Michal Schmidt authored
      [ Upstream commit 95e92fd4 ]
      
      bnx2x triggers warnings with CONFIG_DMA_API_DEBUG=y:
      
        WARNING: CPU: 0 PID: 2253 at lib/dma-debug.c:887 check_unmap+0xf8/0x920()
        bnx2x 0000:28:00.0: DMA-API: device driver frees DMA memory with
        different size [device address=0x00000000da2b389e] [map size=1490 bytes]
        [unmap size=66 bytes]
      
      The reason is that bnx2x splits a TSO BD into two BDs (headers + data)
      using one DMA mapping for both, but it uses only the length of the first
      BD when unmapping.
      
      This patch fixes the bug by unmapping the whole length of the two BDs.
      Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Dmitry Kravkov <dmitry@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      46bdd0fd
    • bridge: use spin_lock_bh() in br_multicast_set_hash_max · f5d992e9
      Curt Brune authored
      [ Upstream commit fe0d692b ]
      
      br_multicast_set_hash_max() is called from process context in
      net/bridge/br_sysfs_br.c by the sysfs store_hash_max() function.
      
      br_multicast_set_hash_max() calls spin_lock(&br->multicast_lock),
      which can deadlock the CPU if a softirq that also tries to take the
      same lock interrupts br_multicast_set_hash_max() while the lock is
      held.  This can happen quite easily when any of the bridge multicast
      timers expires, since the timer handlers try to take the same lock.
      
      The fix here is to use spin_lock_bh(), preventing other softirqs from
      executing on this CPU.
      
      Steps to reproduce:
      
      1. Create a bridge with several interfaces (I used 4).
      2. Set the "multicast query interval" to a low number, like 2.
      3. Enable the bridge as a multicast querier.
      4. Repeatedly set the bridge hash_max parameter via sysfs.
      
        # brctl addbr br0
        # brctl addif br0 eth1 eth2 eth3 eth4
        # brctl setmcqi br0 2
        # brctl setmcquerier br0 1
      
        # while true ; do echo 4096 > /sys/class/net/br0/bridge/hash_max; done
      Signed-off-by: Curt Brune <curt@cumulusnetworks.com>
      Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      f5d992e9
    • net: llc: fix use after free in llc_ui_recvmsg · 10cc9996
      Daniel Borkmann authored
      [ Upstream commit 4d231b76 ]
      
      While commit 30a584d9 fixes the datagram interface in LLC, a use
      after free bug has been introduced for SOCK_STREAM sockets that do
      not make use of MSG_PEEK.
      
      The flow is as follows ...
      
        if (!(flags & MSG_PEEK)) {
          ...
          sk_eat_skb(sk, skb, false);
          ...
        }
        ...
        if (used + offset < skb->len)
          continue;
      
      ... where sk_eat_skb() calls __kfree_skb(). Therefore, cache the
      original length in skb_len and use that to check for partial reads.
      
      Fixes: 30a584d9 ("[LLX]: SOCK_DGRAM interface fixes")
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      10cc9996
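The use-after-free pattern in the llc_ui_recvmsg entry above can be illustrated in plain userspace C. This is only a sketch: `struct buf_sketch` and `consume_is_partial()` are hypothetical stand-ins for the skb and the receive loop, and `free()` stands in for sk_eat_skb(). The point it shows is that the length is read into a local before the buffer can be freed, and only the cached value is used afterwards.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an skb; only the length matters here. */
struct buf_sketch {
    size_t len;
};

/*
 * Returns true when the read was partial (more data remains in the buffer).
 * When peek is false the buffer is released, so b must not be touched
 * after that point; the comparison uses the cached length instead.
 */
static bool consume_is_partial(struct buf_sketch *b, size_t used,
                               size_t offset, bool peek)
{
    size_t cached_len = b->len; /* read before the buffer may be freed */

    if (!peek)
        free(b);                /* stands in for sk_eat_skb() */

    return used + offset < cached_len;
}
```

With this shape, the caller owns the buffer only when `peek` is true; in the non-peek case the function has already released it.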
    • vlan: Fix header ops passthru when doing TX VLAN offload. · 31da3597
      David S. Miller authored
      [ Upstream commit 2205369a ]
      
      When the vlan code detects that the real device can do TX VLAN offloads
      in hardware, it tries to arrange for the real device's header_ops to
      be invoked directly.
      
      But it does so illegally, by simply hooking the real device's
      header_ops up to the VLAN device.
      
      This doesn't work because we will end up invoking a set of header_ops
      routines which expect a device type which matches the real device, but
      will see a VLAN device instead.
      
      Fix this by providing a pass-thru set of header_ops which will arrange
      to pass the proper real device instead.
      
      To facilitate this, add a dev_rebuild_header() helper.  There are
      implementations which provide a ->cache and ->create but not a
      ->rebuild (e.g. PLIP).  So we need a helper function just like
      dev_hard_header() to avoid crashes.
      
      Use this helper in the one existing place where the
      header_ops->rebuild was being invoked, the neighbour code.
      
      With lots of help from Florian Westphal.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      31da3597
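The two ideas in the vlan entry above (a pass-thru set of header ops that hands the real device to the real device's routines, and a helper that tolerates a missing ->rebuild) can be sketched in userspace C. Every name here (`dev_sketch`, `header_ops_sketch`, `guarded_rebuild`, `real_create`, `vlan_passthru_create`) is hypothetical and the callback signatures are simplified; this is not the kernel's header_ops API.

```c
#include <stddef.h>

struct dev_sketch;

/* Simplified, hypothetical header ops table. */
struct header_ops_sketch {
    int (*create)(struct dev_sketch *dev);
    int (*rebuild)(struct dev_sketch *dev);
};

struct dev_sketch {
    int is_vlan;                               /* 1 for the VLAN device */
    const struct header_ops_sketch *header_ops;
    struct dev_sketch *real_dev;               /* set on VLAN devices only */
};

/* Helper in the spirit of dev_rebuild_header(): a missing op is not a crash. */
static int guarded_rebuild(struct dev_sketch *dev)
{
    if (!dev->header_ops || !dev->header_ops->rebuild)
        return 0;
    return dev->header_ops->rebuild(dev);
}

/* Stand-in for a real device's routine: it only works on its own device type. */
static int real_create(struct dev_sketch *dev)
{
    return dev->is_vlan ? -1 : 14;
}

/* Pass-thru: look up the real device and invoke its op with that device. */
static int vlan_passthru_create(struct dev_sketch *dev)
{
    struct dev_sketch *real = dev->real_dev;

    if (!real || !real->header_ops || !real->header_ops->create)
        return 0;
    return real->header_ops->create(real);
}

/* No ->rebuild here, mirroring the PLIP-like case from the commit message. */
static const struct header_ops_sketch real_ops = { .create = real_create };
static const struct header_ops_sketch vlan_passthru_ops = {
    .create = vlan_passthru_create,
};
```

Hooking `real_ops` directly onto the VLAN device would call `real_create()` with the VLAN device and fail, which is the mismatch the commit message describes; the pass-thru table avoids it.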
    • net: rose: restore old recvmsg behavior · 52994125
      Florian Westphal authored
      [ Upstream commit f81152e3 ]
      
      The recvmsg handler in net/rose/af_rose.c performs a size check on
      ->msg_namelen.
      
      After commit f3d33426
      (net: rework recvmsg handler msg_name and msg_namelen logic), we now
      always take the else branch due to namelen being initialized to 0.
      
      Digging in the netdev-vger-cvs git repo shows that msg_namelen has been
      initialized with a fixed size since at least 1995, so the else branch
      was never taken.
      
      Compile tested only.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      52994125
    • rds: prevent dereference of a NULL device · 95ae3677
      Sasha Levin authored
      [ Upstream commit c2349758 ]
      
      Binding might result in a NULL device, which is dereferenced
      causing this BUG:
      
      [ 1317.260548] BUG: unable to handle kernel NULL pointer dereference at 0000000000000974
      [ 1317.261847] IP: [<ffffffff84225f52>] rds_ib_laddr_check+0x82/0x110
      [ 1317.263315] PGD 418bcb067 PUD 3ceb21067 PMD 0
      [ 1317.263502] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
      [ 1317.264179] Dumping ftrace buffer:
      [ 1317.264774]    (ftrace buffer empty)
      [ 1317.265220] Modules linked in:
      [ 1317.265824] CPU: 4 PID: 836 Comm: trinity-child46 Tainted: G        W    3.13.0-rc4-next-20131218-sasha-00013-g2cebb9b-dirty #4159
      [ 1317.267415] task: ffff8803ddf33000 ti: ffff8803cd31a000 task.ti: ffff8803cd31a000
      [ 1317.268399] RIP: 0010:[<ffffffff84225f52>]  [<ffffffff84225f52>] rds_ib_laddr_check+0x82/0x110
      [ 1317.269670] RSP: 0000:ffff8803cd31bdf8  EFLAGS: 00010246
      [ 1317.270230] RAX: 0000000000000000 RBX: ffff88020b0dd388 RCX: 0000000000000000
      [ 1317.270230] RDX: ffffffff8439822e RSI: 00000000000c000a RDI: 0000000000000286
      [ 1317.270230] RBP: ffff8803cd31be38 R08: 0000000000000000 R09: 0000000000000000
      [ 1317.270230] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
      [ 1317.270230] R13: 0000000054086700 R14: 0000000000a25de0 R15: 0000000000000031
      [ 1317.270230] FS:  00007ff40251d700(0000) GS:ffff88022e200000(0000) knlGS:0000000000000000
      [ 1317.270230] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [ 1317.270230] CR2: 0000000000000974 CR3: 00000003cd478000 CR4: 00000000000006e0
      [ 1317.270230] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 1317.270230] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000090602
      [ 1317.270230] Stack:
      [ 1317.270230]  0000000054086700 5408670000a25de0 5408670000000002 0000000000000000
      [ 1317.270230]  ffffffff84223542 00000000ea54c767 0000000000000000 ffffffff86d26160
      [ 1317.270230]  ffff8803cd31be68 ffffffff84223556 ffff8803cd31beb8 ffff8800c6765280
      [ 1317.270230] Call Trace:
      [ 1317.270230]  [<ffffffff84223542>] ? rds_trans_get_preferred+0x42/0xa0
      [ 1317.270230]  [<ffffffff84223556>] rds_trans_get_preferred+0x56/0xa0
      [ 1317.270230]  [<ffffffff8421c9c3>] rds_bind+0x73/0xf0
      [ 1317.270230]  [<ffffffff83e4ce62>] SYSC_bind+0x92/0xf0
      [ 1317.270230]  [<ffffffff812493f8>] ? context_tracking_user_exit+0xb8/0x1d0
      [ 1317.270230]  [<ffffffff8119313d>] ? trace_hardirqs_on+0xd/0x10
      [ 1317.270230]  [<ffffffff8107a852>] ? syscall_trace_enter+0x32/0x290
      [ 1317.270230]  [<ffffffff83e4cece>] SyS_bind+0xe/0x10
      [ 1317.270230]  [<ffffffff843a6ad0>] tracesys+0xdd/0xe2
      [ 1317.270230] Code: 00 8b 45 cc 48 8d 75 d0 48 c7 45 d8 00 00 00 00 66 c7 45 d0 02 00 89 45 d4 48 89 df e8 78 49 76 ff 41 89 c4 85 c0 75 0c 48 8b 03 <80> b8 74 09 00 00 01 74 06 41 bc 9d ff ff ff f6 05 2a b6 c2 02
      [ 1317.270230] RIP  [<ffffffff84225f52>] rds_ib_laddr_check+0x82/0x110
      [ 1317.270230]  RSP <ffff8803cd31bdf8>
      [ 1317.270230] CR2: 0000000000000974
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      95ae3677
    • hamradio/yam: fix info leak in ioctl · 794ce89c
      Salva Peiró authored
      [ Upstream commit 8e3fbf87 ]
      
      The yam_ioctl() code fails to initialise the cmd field
      of the struct yamdrv_ioctl_cfg. Add an explicit memset(0)
      before filling the structure to avoid the 4-byte info leak.
      Signed-off-by: Salva Peiró <speiro@ai2.upv.es>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      794ce89c
    • drivers/net/hamradio: Integer overflow in hdlcdrv_ioctl() · 6ea9c09b
      Wenliang Fan authored
      [ Upstream commit e9db5c21 ]
      
      The local variable 'bi' comes from userspace. If userspace passed a
      large number to 'bi.data.calibrate', there would be an integer overflow
      in the following line:
      	s->hdlctx.calibrate = bi.data.calibrate * s->par.bitrate / 16;
      Signed-off-by: Wenliang Fan <fanwlexca@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      6ea9c09b
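A minimal sketch of the kind of bounds check that avoids the overflow described in the hdlcdrv_ioctl() entry above. The function name `calibrate_ticks` and the use of plain `int` for both operands are assumptions made for illustration; this is not presented as the upstream patch. The idea is to reject the user-supplied value before the multiplication, by comparing it against INT_MAX divided by the bitrate.

```c
#include <errno.h>
#include <limits.h>

/*
 * Computes calibrate * bitrate / 16 without overflowing int.
 * Returns 0 and stores the result in *out, or -EINVAL when the
 * inputs are out of range (hypothetical helper, not a kernel API).
 */
static int calibrate_ticks(int calibrate, int bitrate, int *out)
{
    if (calibrate < 0 || bitrate <= 0)
        return -EINVAL;

    /* calibrate * bitrate would exceed INT_MAX: refuse the request. */
    if (calibrate > INT_MAX / bitrate)
        return -EINVAL;

    *out = calibrate * bitrate / 16;
    return 0;
}
```

Checking with a division keeps the test itself free of overflow, which a check written as `calibrate * bitrate > INT_MAX` would not be.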
    • net: inet_diag: zero out uninitialized idiag_{src,dst} fields · 9229facb
      Daniel Borkmann authored
      [ Upstream commit b1aac815 ]
      
      Jakub reported while working with nlmon netlink sniffer that parts of
      the inet_diag_sockid are not initialized when r->idiag_family != AF_INET6.
      That is, fields of r->id.idiag_src[1 ... 3], r->id.idiag_dst[1 ... 3].
      
      In fact, it seems that we can leak 6 * sizeof(u32) bytes of kernel [slab]
      memory through this. At least, in udp_dump_one(), we allocate a skb in ...
      
        rep = nlmsg_new(sizeof(struct inet_diag_msg) + ..., GFP_KERNEL);
      
      ... and then pass that to inet_sk_diag_fill() that puts the whole struct
      inet_diag_msg into the skb, where we only fill out r->id.idiag_src[0],
      r->id.idiag_dst[0] and leave the rest untouched:
      
        r->id.idiag_src[0] = inet->inet_rcv_saddr;
        r->id.idiag_dst[0] = inet->inet_daddr;
      
      struct inet_diag_msg embeds struct inet_diag_sockid, which is correctly /
      fully filled out in the IPv6 case, but not for IPv4.
      
      So just zero them out by using plain memset (for this little amount of
      bytes it's probably not worth the extra check for idiag_family == AF_INET).
      
      Similarly, also fix the other places where we fill that out.
      Reported-by: Jakub Zawadzki <darkjames-ws@darkjames.pl>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      9229facb
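The inet_diag entry above comes down to zeroing the full address arrays before writing the single IPv4 word. A userspace sketch of that pattern follows; `struct sockid_sketch` and `fill_ipv4_sockid()` are hypothetical names that only mimic the four-word src/dst layout mentioned in the commit message, not the real struct inet_diag_sockid.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: four 32-bit words per address, as for IPv6. */
struct sockid_sketch {
    uint32_t src[4];
    uint32_t dst[4];
};

/*
 * Fills an IPv4 address pair. The memset calls clear words 1..3,
 * which would otherwise keep whatever the allocator left behind
 * and be copied out to userspace (6 * sizeof(uint32_t) bytes).
 */
static void fill_ipv4_sockid(struct sockid_sketch *id,
                             uint32_t saddr, uint32_t daddr)
{
    memset(id->src, 0, sizeof(id->src));
    memset(id->dst, 0, sizeof(id->dst));

    id->src[0] = saddr;
    id->dst[0] = daddr;
}
```

Zeroing unconditionally, rather than only for the IPv4 family, matches the reasoning in the message that the extra branch is not worth it for so few bytes.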
    • net: unix: allow bind to fail on mutex lock · 2e737a8a
      Sasha Levin authored
      [ Upstream commit 37ab4fa7 ]
      
      This is similar to the set_peek_off patch where calling bind while the
      socket is stuck in unix_dgram_recvmsg() will block and cause a hung task
      spew after a while.
      
      This is also the last place that did a straightforward mutex_lock(), so
      there shouldn't be any more of these patches.
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      2e737a8a
    • tg3: Initialize REG_BASE_ADDR at PCI config offset 120 to 0 · 6e890b0a
      Nat Gurumoorthy authored
      [ Upstream commit 388d3335 ]
      
      The new tg3 driver leaves REG_BASE_ADDR (PCI config offset 120)
      uninitialized. After a power-on reset this register may contain garbage. The
      Register Base Address register defines the device local address of a
      register. The data pointed to by this location is read or written using
      the Register Data register (PCI config offset 128). When REG_BASE_ADDR
      contains garbage, any read or write of the Register Data register (PCI
      config offset 128) will cause the PCI bus to lock up. The TCO watchdog
      will then fire and bring down the system.
      Signed-off-by: Nat Gurumoorthy <natg@google.com>
      Acked-by: Michael Chan <mchan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      6e890b0a
    • net: drop_monitor: fix the value of maxattr · 9898c396
      Changli Gao authored
      [ Upstream commit d323e92c ]
      
      maxattr in genl_family should hold the maximum attribute type, not
      the maximum command type. Drop monitor doesn't support any
      attributes, so we should leave it as zero.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      9898c396
    • ipv6: don't count addrconf generated routes against gc limit · 66d66a81
      Hannes Frederic Sowa authored
      [ Upstream commit a3300ef4 ]
      
      Brett Ciphery reported that new ipv6 addresses failed to get installed
      because the addrconf generated dsts were counted against the dst gc
      limit. We don't need to count those routes, just as we currently don't
      count administratively added routes.
      
      Because the max_addresses check enforces a limit on unbounded address
      generation first in case someone plays with router advertisements, we
      are still safe here.
      Reported-by: Brett Ciphery <brett.ciphery@windriver.com>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      66d66a81
    • rds: prevent BUG_ON triggered on congestion update to loopback · 2c317886
      Venkat Venkatsubra authored
      [ Upstream commit 18fc25c9 ]
      
      After a congestion update on a local connection, when rds_ib_xmit returns
      fewer bytes than the message contains, rds_send_xmit calls
      rds_ib_xmit again with an offset that causes BUG_ON(off & RDS_FRAG_SIZE)
      to trigger.
      
      For a 4KB PAGE_SIZE, rds_ib_xmit returns min(8240,4096)=4096 when the
      message actually contains 8240 bytes. rds_send_xmit thinks there is more to send
      and calls rds_ib_xmit again with a data offset "off" of 4096-48(rds header)
      =4048 bytes, thus hitting the BUG_ON(off & RDS_FRAG_SIZE) [RDS_FRAG_SIZE=4k].
      
      Commit 6094628b
      "rds: prevent BUG_ON triggering on congestion map updates" introduced
      this regression. That change was addressing the triggering of a different
      BUG_ON in rds_send_xmit() on the PowerPC architecture with a 64KB PAGE_SIZE:
       	BUG_ON(ret != 0 &&
          		 conn->c_xmit_sg == rm->data.op_nents);
      This was the sequence it was going through:
      (rds_ib_xmit)
      /* Do not send cong updates to IB loopback */
      if (conn->c_loopback
         && rm->m_inc.i_hdr.h_flags & RDS_FLAG_CONG_BITMAP) {
        	rds_cong_map_updated(conn->c_fcong, ~(u64) 0);
          	return sizeof(struct rds_header) + RDS_CONG_MAP_BYTES;
      }
      rds_ib_xmit returns 8240
      rds_send_xmit:
        c_xmit_data_off = 0 + 8240 - 48 (rds header accounted only the first time)
         		 = 8192
        c_xmit_data_off < 65536 (sg->length), so calls rds_ib_xmit again
      rds_ib_xmit returns 8240
      rds_send_xmit:
        c_xmit_data_off = 8192 + 8240 = 16432, calls rds_ib_xmit again
        and so on (c_xmit_data_off 24672,32912,41152,49392,57632)
      rds_ib_xmit returns 8240
      On this iteration this sequence causes the BUG_ON in rds_send_xmit:
          while (ret) {
          	tmp = min_t(int, ret, sg->length - conn->c_xmit_data_off);
          	[tmp = 65536 - 57632 = 7904]
          	conn->c_xmit_data_off += tmp;
          	[c_xmit_data_off = 57632 + 7904 = 65536]
          	ret -= tmp;
          	[ret = 8240 - 7904 = 336]
          	if (conn->c_xmit_data_off == sg->length) {
          		conn->c_xmit_data_off = 0;
          		sg++;
          		conn->c_xmit_sg++;
          		BUG_ON(ret != 0 &&
          			conn->c_xmit_sg == rm->data.op_nents);
          		[c_xmit_sg = 1, rm->data.op_nents = 1]
      
      What the current fix does:
      Since the congestion update over loopback is not actually transmitted
      as a message, all that rds_ib_xmit needs to do is let the caller think
      the full message has been transmitted and not return partial bytes.
      It will return 8240 (RDS_CONG_MAP_BYTES+48) when PAGE_SIZE is 4KB,
      and 64KB+48 when PAGE_SIZE is 64KB.
      Reported-by: Josh Hunt <joshhunt00@gmail.com>
      Tested-by: Honggang Li <honli@redhat.com>
      Acked-by: Bang Nguyen <bang.nguyen@oracle.com>
      Signed-off-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      2c317886
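The 64KB walk-through in the rds entry above can be replayed with a small userspace simulation of the offset bookkeeping. This is a sketch under stated assumptions: `simulate_send_xmit()` and `struct xmit_result` are hypothetical names, the 48-byte header is subtracted only on the first call as the message describes, and the transmit routine is modelled as returning the same byte count on every call.

```c
#include <stdbool.h>

#define HDR_BYTES 48u /* RDS header size, per the commit message */

struct xmit_result {
    bool hit_bug;   /* true if the BUG_ON condition would be met */
    unsigned calls; /* number of simulated transmit calls */
};

/*
 * Replays the caller's loop: each simulated transmit call reports
 * ret_per_call bytes, and the caller advances its offset within
 * scatterlist entries of sg_length bytes, nents entries in total.
 */
static struct xmit_result simulate_send_xmit(unsigned sg_length,
                                             unsigned nents,
                                             unsigned ret_per_call)
{
    struct xmit_result res = { false, 0 };
    unsigned data_off = 0, sg_index = 0;

    if (sg_length == 0 || nents == 0 || ret_per_call <= HDR_BYTES)
        return res;

    while (sg_index < nents) {
        unsigned ret = ret_per_call;

        if (res.calls++ == 0)
            ret -= HDR_BYTES; /* header accounted only the first time */

        while (ret) {
            unsigned room = sg_length - data_off;
            unsigned tmp = ret < room ? ret : room;

            data_off += tmp;
            ret -= tmp;
            if (data_off == sg_length) {
                data_off = 0;
                sg_index++;
                /* Bytes left over but no entries left: the BUG_ON case. */
                if (ret != 0 && sg_index == nents) {
                    res.hit_bug = true;
                    return res;
                }
            }
        }
    }
    return res;
}
```

Calling `simulate_send_xmit(65536, 1, 8240)` reproduces the sequence of offsets listed in the message and reaches the BUG_ON condition, while `simulate_send_xmit(65536, 1, 65536 + 48)` models the fixed return value and completes in a single call.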
    • net: do not pretend FRAGLIST support · b0809483
      Eric Dumazet authored
      [ Upstream commit 28e24c62 ]
      
      Few network drivers really support frag_list: only virtual drivers do.
      
      Some drivers wrongly advertise the NETIF_F_FRAGLIST feature.
      
      If an skb with a frag_list is given to them, the packet on the wire
      will be corrupt.
      
      Remove this flag, as the core networking stack will make sure to
      provide packets that can be sent without corruption.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
      Cc: Anirudha Sarangi <anirudh@xilinx.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      b0809483
  2. 03 Jan, 2014 1 commit