1. 15 Feb, 2014 33 commits
    • ext4: fix use-after-free in ext4_mb_new_blocks · ec94b7ab
      Junho Ryu authored
      commit 4e8d2139 upstream.
      
      ext4_mb_put_pa should hold pa->pa_lock before accessing pa->pa_count.
      While ext4_mb_use_preallocated checks pa->pa_deleted first and then
      increments pa->pa_count later, ext4_mb_put_pa decrements pa->pa_count
      before holding pa->pa_lock and then sets pa->pa_deleted.
      
      * Free sequence
      ext4_mb_put_pa (1):		atomic_dec_and_test pa->pa_count
      ext4_mb_put_pa (2):		lock pa->pa_lock
      ext4_mb_put_pa (3):			check pa->pa_deleted
      ext4_mb_put_pa (4):			set pa->pa_deleted=1
      ext4_mb_put_pa (5):		unlock pa->pa_lock
      ext4_mb_put_pa (6):		remove pa from a list
      ext4_mb_pa_callback:		free pa
      
      * Use sequence
      ext4_mb_use_preallocated (1):	iterate over preallocation
      ext4_mb_use_preallocated (2):	lock pa->pa_lock
      ext4_mb_use_preallocated (3):		check pa->pa_deleted
      ext4_mb_use_preallocated (4):		increase pa->pa_count
      ext4_mb_use_preallocated (5):	unlock pa->pa_lock
      ext4_mb_release_context:	access pa
      
      * Use-after-free sequence
      [initial status]		<pa->pa_deleted = 0, pa_count = 1>
      ext4_mb_use_preallocated (1):	iterate over preallocation
      ext4_mb_use_preallocated (2):	lock pa->pa_lock
      ext4_mb_use_preallocated (3):		check pa->pa_deleted
      ext4_mb_put_pa (1):		atomic_dec_and_test pa->pa_count
      [pa_count decremented]		<pa->pa_deleted = 0, pa_count = 0>
      ext4_mb_use_preallocated (4):		increase pa->pa_count
      [pa_count incremented]		<pa->pa_deleted = 0, pa_count = 1>
      ext4_mb_use_preallocated (5):	unlock pa->pa_lock
      ext4_mb_put_pa (2):		lock pa->pa_lock
      ext4_mb_put_pa (3):			check pa->pa_deleted
      ext4_mb_put_pa (4):			set pa->pa_deleted=1
      [race condition!]		<pa->pa_deleted = 1, pa_count = 1>
      ext4_mb_put_pa (5):		unlock pa->pa_lock
      ext4_mb_put_pa (6):		remove pa from a list
      ext4_mb_pa_callback:		free pa
      ext4_mb_release_context:	access pa
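
The sequences above suggest the shape of the fix: the put side has to take the lock before it touches the count, so that the "check pa_deleted, then increment" on the use side cannot interleave with "decrement, then set pa_deleted". The following is a minimal user-space sketch of that pattern, not the ext4 code. `struct pa_sketch`, `pa_try_get()` and `pa_put()` are hypothetical names, and a C11 `atomic_flag` spinlock stands in for pa->pa_lock.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the preallocation descriptor. */
struct pa_sketch {
	atomic_flag lock;	/* plays the role of pa->pa_lock */
	int count;		/* plays the role of pa->pa_count */
	bool deleted;		/* plays the role of pa->pa_deleted */
};

static void pa_init(struct pa_sketch *pa)
{
	atomic_flag_clear(&pa->lock);
	pa->count = 1;		/* one holder, as in [initial status] */
	pa->deleted = false;
}

static void pa_lock(struct pa_sketch *pa)
{
	while (atomic_flag_test_and_set_explicit(&pa->lock, memory_order_acquire))
		;	/* spin until the lock is free */
}

static void pa_unlock(struct pa_sketch *pa)
{
	atomic_flag_clear_explicit(&pa->lock, memory_order_release);
}

/* Use side: check 'deleted' and take a reference under the lock. */
static bool pa_try_get(struct pa_sketch *pa)
{
	bool got = false;

	pa_lock(pa);
	if (!pa->deleted) {
		pa->count++;
		got = true;
	}
	pa_unlock(pa);
	return got;
}

/*
 * Put side: the decrement happens under the same lock, so it cannot
 * slip in between the check and the increment in pa_try_get().
 * Returns true when the caller dropped the last reference and may free.
 */
static bool pa_put(struct pa_sketch *pa)
{
	bool last = false;

	pa_lock(pa);
	if (--pa->count == 0 && !pa->deleted) {
		pa->deleted = true;
		last = true;
	}
	pa_unlock(pa);
	return last;
}
```

With both paths serialized on the lock, a user either sees pa_deleted set and backs off, or takes its reference before the count can reach zero.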
      
      AddressSanitizer has detected use-after-free in ext4_mb_new_blocks
      Bug report: http://goo.gl/rG1On3

      Signed-off-by: Junho Ryu <jayr@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • ext4: call ext4_error_inode() if jbd2_journal_dirty_metadata() fails · 9eb492b8
      Theodore Ts'o authored
      commit ae1495b1 upstream.
      
      While it's true that errors can only happen if there is a bug in
      jbd2_journal_dirty_metadata(), if a bug does happen, we need to halt
      the kernel or remount the file system read-only in order to avoid
      further data loss.  The ext4_journal_abort_handle() function doesn't
      do any of this.  Since that call doesn't adjust refcounts, it will
      likely result in the file system eventually deadlocking, because the
      current transaction will never be able to close.  It's much cleaner
      to let ext4's error handling system deal with this situation.

      There's a separate bug here, which is that if certain jbd2 errors
      occur and the file system is mounted errors=continue, the file
      system will probably eventually grind to a halt as described
      above.  But things have been this way for a long time, and usually when
      we have these sorts of errors it's pretty much a disaster, which is
      why the jbd2 layer aggressively retries memory allocations (allocation
      failure being the most likely cause of these jbd2 errors).

      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: Jan Kara <jack@suse.cz>
      [bwh: Backported to 3.2: drop logging of missing transaction debug data]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • libata: add ATA_HORKAGE_BROKEN_FPDMA_AA quirk for Seagate Momentus SpinPoint M8 · cbeb052c
      Michele Baldessari authored
      commit 87809942 upstream.
      
      We've received multiple reports in Fedora via (BZ 907193)
      that the Seagate Momentus SpinPoint M8 errors out when enabling AA:
      [    2.555905] ata2.00: failed to enable AA (error_mask=0x1)
      [    2.568482] ata2.00: failed to enable AA (error_mask=0x1)
      
      Add the ATA_HORKAGE_BROKEN_FPDMA_AA quirk for this specific hard disk.

      Reported-by: Nicholas <arealityfarbetween@googlemail.com>
      Signed-off-by: Michele Baldessari <michele@acksyn.org>
      Tested-by: Nicholas <arealityfarbetween@googlemail.com>
      Acked-by: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • sh: always link in helper functions extracted from libgcc · bed3dd59
      Geert Uytterhoeven authored
      commit 84ed8a99 upstream.
      
      E.g. landisk_defconfig, which has CONFIG_NTFS_FS=m:
      
        ERROR: "__ashrdi3" [fs/ntfs/ntfs.ko] undefined!
      
      For "lib-y", if no symbols in a compilation unit are referenced by other
      units, the compilation unit will not be included in vmlinux.  This
      breaks modules that do reference those symbols.
      
      Use "obj-y" instead to fix this.
      
      http://kisskb.ellerman.id.au/kisskb/buildresult/8838077/
      
      This doesn't fix all cases. There are others, e.g. udivsi3.
      This is also not limited to sh; many architectures handle this in the
      same way.
      
      A simple solution is to unconditionally include all helper functions.
      A more complex solution is to make the choice of "lib-y" or "obj-y" depend
      on CONFIG_MODULES:
      
        obj-$(CONFIG_MODULES) += ...
        lib-y($CONFIG_MODULES) += ...

      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Tested-by: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.yj@renesas.com>
      Reviewed-by: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.yj@renesas.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • ceph: wake up 'safe' waiters when unregistering request · f4ca736c
      Yan, Zheng authored
      commit fc55d2c9 upstream.
      
      We also need to wake up 'safe' waiters if an error occurs or the
      request is aborted. Otherwise sync(2)/fsync(2) may hang forever.

      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
      Signed-off-by: Sage Weil <sage@inktank.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • ceph: cleanup aborted requests when re-sending requests. · b072f9ca
      Yan, Zheng authored
      commit eb1b8af3 upstream.
      
      Aborted requests usually get cleared when the reply is received.
      If the MDS crashes, no reply will be received, so we need to clean up
      aborted requests when re-sending requests.

      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
      Reviewed-by: Greg Farnum <greg@inktank.com>
      Signed-off-by: Sage Weil <sage@inktank.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • mm: ensure get_unmapped_area() returns higher address than mmap_min_addr · 7f4d2460
      Akira Takeuchi authored
      commit 2afc745f upstream.
      
      This patch fixes the problem that get_unmapped_area() can return an
      illegal address, causing mmap(2) and similar calls to fail.

      If /proc/sys/vm/mmap_min_addr is set to an address higher than
      PAGE_SIZE, get_unmapped_area() can return an address lower than
      mmap_min_addr, even if you do not pass any virtual address hint
      (i.e.  the second argument).
      
      This is because the current get_unmapped_area() code does not take into
      account mmap_min_addr.
      
      This leads to two actual problems as follows:
      
      1. mmap(2) can fail with EPERM in a process without CAP_SYS_RAWIO,
         even though no illegal parameter is passed.
      
      2. The bottom-up search path after the top-down search might not work in
         arch_get_unmapped_area_topdown().
      
      Note: The first and third chunks of my patch, which change the "len"
      check, are for a more precise check using mmap_min_addr, and not for
      solving the above problems.
      
      [How to reproduce]
      
      	--- test.c -------------------------------------------------
      	#include <stdio.h>
      	#include <unistd.h>
      	#include <sys/mman.h>
      	#include <sys/errno.h>
      
      	int main(int argc, char *argv[])
      	{
      		void *ret = NULL, *last_map;
      		size_t pagesize = sysconf(_SC_PAGESIZE);
      
      		do {
      			last_map = ret;
      			ret = mmap(0, pagesize, PROT_NONE,
      				MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
      	//		printf("ret=%p\n", ret);
      		} while (ret != MAP_FAILED);
      
      		if (errno != ENOMEM) {
      			printf("ERR: unexpected errno: %d (last map=%p)\n",
      			errno, last_map);
      		}
      
      		return 0;
      	}
      	---------------------------------------------------------------
      
      	$ gcc -m32 -o test test.c
      	$ sudo sysctl -w vm.mmap_min_addr=65536
      	vm.mmap_min_addr = 65536
      	$ ./test  (run as non-privileged user)
      	ERR: unexpected errno: 1 (last map=0x10000)

      Signed-off-by: Akira Takeuchi <takeuchi.akr@jp.panasonic.com>
      Signed-off-by: Kiyoshi Owada <owada.kiyoshi@jp.panasonic.com>
      Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      [bwh: Backported to 3.2:
       As we do not have vm_unmapped_area(), make arch_get_unmapped_area_topdown()
       calculate the lower limit for the new area's end address and then compare
       addresses with this instead of with len.  In the process, fix an off-by-one
       error which could result in returning 0 if mm->mmap_base == len.]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • x86, fpu, amd: Clear exceptions in AMD FXSAVE workaround · bbc220ab
      Linus Torvalds authored
      commit 26bef131 upstream.
      
      Before we do an EMMS in the AMD FXSAVE information leak workaround we
      need to clear any pending exceptions, otherwise we trap with a
      floating-point exception inside this code.

      Reported-by: halfdog <me@halfdog.net>
      Tested-by: Borislav Petkov <bp@suse.de>
      Link: http://lkml.kernel.org/r/CA%2B55aFxQnY_PCG_n4=0w-VG=YLXL-yr7oMxyy0WU2gCBAf3ydg@mail.gmail.com
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      [bwh: Backported to 3.2: adjust filename, context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • KVM: x86: Convert vapic synchronization to _cached functions (CVE-2013-6368) · 6aa82e03
      Andy Honig authored
      commit fda4e2e8 upstream.
      
      In kvm_lapic_sync_from_vapic and kvm_lapic_sync_to_vapic there is the
      potential to corrupt kernel memory if userspace provides an address that
      is at the end of a page.  This patch converts those functions to use
      kvm_write_guest_cached and kvm_read_guest_cached.  It also checks the
      vapic_address specified by userspace during ioctl processing and returns
      an error to userspace if the address is not a valid GPA.
      
      This is generally not guest triggerable, because the required write is
      done by firmware that runs before the guest.  Also, it only affects AMD
      processors and oldish Intel that do not have the FlexPriority feature
      (unless you disable FlexPriority, of course; then newer processors are
      also affected).
      
      Fixes: b93463aa ('KVM: Accelerated apic support')
      Reported-by: Andrew Honig <ahonig@google.com>
      Signed-off-by: Andrew Honig <ahonig@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      [dannf: backported to Debian's 3.2]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • ath9k_htc: properly set MAC address and BSSID mask · f7a9877c
      Mathy Vanhoef authored
      commit 657eb17d upstream.
      
      Pick the MAC address of the first virtual interface as the new hardware MAC
      address. Set BSSID mask according to this MAC address. This fixes CVE-2013-4579.

      Signed-off-by: Mathy Vanhoef <vanhoefm@gmail.com>
      Signed-off-by: John W. Linville <linville@tuxdriver.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • hpfs: fix warnings when the filesystem fills up · bfefd2a8
      Mikulas Patocka authored
      commit bbd465df upstream.
      
      This patch fixes warnings due to a missing lock on the write error path.
      
        WARNING: at fs/hpfs/hpfs_fn.h:353 hpfs_truncate+0x75/0x80 [hpfs]()
        Hardware name: empty
        Pid: 26563, comm: dd Tainted: P           O 3.9.4 #12
        Call Trace:
          hpfs_truncate+0x75/0x80 [hpfs]
          hpfs_write_begin+0x84/0x90 [hpfs]
          _hpfs_bmap+0x10/0x10 [hpfs]
          generic_file_buffered_write+0x121/0x2c0
          __generic_file_aio_write+0x1c7/0x3f0
          generic_file_aio_write+0x7c/0x100
          do_sync_write+0x98/0xd0
          hpfs_file_write+0xd/0x50 [hpfs]
          vfs_write+0xa2/0x160
          sys_write+0x51/0xa0
          page_fault+0x22/0x30
          system_call_fastpath+0x1a/0x1f

      Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      [Mikulas Patocka: This is a backport of upstream commit bbd465df,
       modified for stable kernels 2.6.39 - 3.7.]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • Fix warning from machine_kexec.c · 6512274f
      Paul Gortmaker authored
      commit c19ce0ab upstream.
      
      Use proper cpp defined(...) constructs to avoid this:
      
      arch/ia64/kernel/machine_kexec.c: In function 'arch_crash_save_vmcoreinfo':
      arch/ia64/kernel/machine_kexec.c:160:8: warning: "CONFIG_PGTABLE_4" is not defined
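
For illustration, the general pattern behind this kind of warning can be sketched as below. A bare `#if NAME` on a macro that is not defined evaluates it as 0 and draws the "is not defined" warning under -Wundef, while `defined(NAME)` tests for existence and stays quiet. The CONFIG_DEMO_* names and `demo_pgtable_levels()` are hypothetical and are not the ia64 options or code.

```c
/*
 * Hypothetical options for this sketch: only the 4-level one is defined,
 * CONFIG_DEMO_3LEVEL is deliberately left undefined.
 */
#define CONFIG_DEMO_4LEVEL 1

/*
 * Writing "#if CONFIG_DEMO_3LEVEL" here would treat the undefined macro
 * as 0 and warn under -Wundef.  defined() asks whether the macro exists.
 */
static int demo_pgtable_levels(void)
{
#if defined(CONFIG_DEMO_4LEVEL)
	return 4;
#elif defined(CONFIG_DEMO_3LEVEL)
	return 3;
#else
	return 2;
#endif
}
```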

      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • staging: comedi: cb_pcidio: fix for newer PCI-DIO48H · 11e106ef
      Ian Abbott authored
      commit 0283f7a1 upstream.
      
      At some point, Measurement Computing / ComputerBoards redesigned the
      PCI-DIO48H to use a PLX PCI interface chip instead of an AMCC chip.
      This meant they had to put their hardware registers in the PCI BAR 2
      region instead of PCI BAR 1.  Unfortunately, they kept the same PCI
      device ID for the new design.  This means the driver recognizes the
      newer cards, but doesn't work (and is likely to screw up the local
      configuration registers of the PLX chip) because it's using the wrong
      region.
      
      Since all the supported boards have the DIO registers in the PCI BAR 2
      region except for older PCI-DIO48H boards which have an empty PCI BAR 2
      region and the DIO registers in PCI BAR 1, determine which PCI BAR
      region to use based on whether the PCI BAR 2 region is empty or not.
      
      This change makes the `dioregs_badrindex` member of `struct
      pcidio_board` redundant.  The `pcicontroler_badrindex` member is also
      unused, so remove both members.

      Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
      Cc: kernel-team@lists.ubuntu.com
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • mm/memory-failure.c: recheck PageHuge() after hugetlb page migrate successfully · 0ebf55cf
      Jianguo Wu authored
      commit a49ecbcd upstream.
      
      After a successful hugetlb page migration by soft offline, the source
      page will either be freed into hugepage_freelists or buddy (over-commit
      page).  If the page is in buddy, page_hstate(page) will be NULL, and we
      will hit a NULL pointer dereference in dequeue_hwpoisoned_huge_page().
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
        IP: [<ffffffff81163761>] dequeue_hwpoisoned_huge_page+0x131/0x1d0
        PGD c23762067 PUD c24be2067 PMD 0
        Oops: 0000 [#1] SMP
      
      So check PageHuge(page) after migrate_pages() returns successfully.

      Signed-off-by: Jianguo Wu <wujianguo@huawei.com>
      Tested-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      [wujg: backport to 3.4:
       - adjust context
       - s/num_poisoned_pages/mce_bad_pages/]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • PCI: Enable ARI if dev and upstream bridge support it; disable otherwise · 3a6ac4b9
      Yijing Wang authored
      commit b0cc6020 upstream.
      
      Currently, we enable ARI in a device's upstream bridge if the bridge and
      the device support it.  But we never disable ARI, even if the device is
      removed and replaced with a device that doesn't support ARI.
      
      This means that if we hot-remove an ARI device and replace it with a
      non-ARI multi-function device, we find only function 0 of the new device
      because the upstream bridge still has ARI enabled, and next_ari_fn()
      only returns function 0 for the new non-ARI device.
      
      This patch disables ARI in the upstream bridge if the device doesn't
      support ARI.  See the PCIe spec, r3.0, sec 6.13.
      
      [bhelgaas: changelog, function comment]
      [yijing: replace PCIe Cap accessor with legacy PCI accessor]
      Signed-off-by: Yijing Wang <wangyijing@huawei.com>
      Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • xfs: Account log unmount transaction correctly · 1c7a9417
      Dave Chinner authored
      commit 3948659e upstream.
      
      There have been a few reports of this warning appearing recently:
      
      XFS (dm-4): xlog_space_left: head behind tail
       tail_cycle = 129, tail_bytes = 20163072
       GH   cycle = 129, GH   bytes = 20162880
      
      The common cause appears to be lots of freeze and unfreeze cycles,
      and the output from the warnings indicates that we are leaking
      around 8 bytes of log space per freeze/unfreeze cycle.
      
      When we freeze the filesystem, we write an unmount record and that
      uses xlog_write directly - a special type of transaction,
      effectively. What it doesn't do, however, is correctly account for
      the log space it uses. The unmount record writes an 8 byte structure
      with a special magic number into the log, and the space this
      consumes is not accounted for in the log ticket tracking the
      operation. Hence we leak 8 bytes every unmount record that is
      written.

      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ben Myers <bpm@sgi.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • net: avoid reference counter overflows on fib_rules in multicast forwarding · 609365b9
      Hannes Frederic Sowa authored
      [ Upstream commit 95f4a45d ]
      
      Bob Falken reported that after 4G packets, multicast forwarding stopped
      working. This was because of a rule reference counter overflow which
      freed the rule as soon as the overflow happened.
      
      This patch solves this by adding the FIB_LOOKUP_NOREF flag to
      fib_rules_lookup calls. This is safe even from non-rcu locked sections
      as in this case the flag only implies not taking a reference to the rule,
      which we don't need at all.
      
      Rules only hold references to the namespace, which are guaranteed to be
      available during the call of the non-rcu protected function reg_vif_xmit
      because of the interface reference which itself holds a reference to
      the net namespace.
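
As a rough illustration of why "4G packets" matters, the arithmetic of a 32-bit counter wrapping can be sketched as follows. This assumes one reference taken per forwarded packet and never dropped, which is a reading of the report above rather than something the message spells out. `refcnt_after()` is a hypothetical helper, not kernel code.

```c
#include <stdint.h>

/*
 * Hypothetical helper: value of a 32-bit counter that starts at 'start'
 * and is incremented 'n' times, with wraparound modulo 2^32.
 */
static uint32_t refcnt_after(uint32_t start, uint64_t n)
{
	/* The sum is computed in 64 bits, then truncated to 32 bits. */
	return (uint32_t)(start + n);
}
```

Under that assumption a counter starting at 1 reads 0 again after 2^32 - 1 further increments, at which point it no longer reflects the real number of holders. Not taking the reference at all, as FIB_LOOKUP_NOREF does, avoids the problem.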
      
      Fixes: f0ad0860 ("ipv4: ipmr: support multiple tables")
      Fixes: d1db275d ("ipv6: ip6mr: support multiple tables")
      Reported-by: Bob Falken <NetFestivalHaveFun@gmx.com>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Thomas Graf <tgraf@suug.ch>
      Cc: Julian Anastasov <ja@ssi.bg>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • inet_diag: fix inet_diag_dump_icsk() timewait socket state logic · 96a042c2
      Neal Cardwell authored
      [ Based upon upstream commit 70315d22 ]
      
      Fix inet_diag_dump_icsk() to reflect the fact that both TIME_WAIT and
      FIN_WAIT2 connections are represented by inet_timewait_sock (not just
      TIME_WAIT). Thus:
      
      (a) We need to iterate through the time_wait buckets if the user wants
      either TIME_WAIT or FIN_WAIT2. (Before fixing this, "ss -nemoi state
      fin-wait-2" would not return any sockets, even if there were some in
      FIN_WAIT2.)
      
      (b) We need to check tw_substate to see if the user wants to dump
      sockets in the particular substate (TIME_WAIT or FIN_WAIT2) that a
      given connection is in. (Before fixing this, "ss -nemoi state
      time-wait" would actually return sockets in state FIN_WAIT2.)
      
      An analogous fix is in v3.13: 70315d22
      ("inet_diag: fix inet_diag_dump_icsk() to use correct state for
      timewait sockets") but that patch is quite different because 3.13 code
      is very different in this area due to the unification of TCP hash
      tables in 05dbc7b5 ("tcp/dccp: remove twchain") in v3.13-rc1.
      
      I tested that this applies cleanly between v3.3 and v3.12, and tested
      that it works in both 3.3 and 3.12. It does not apply cleanly to 3.2
      and earlier (though it makes semantic sense), and semantically is not
      the right fix for 3.13 and beyond (as mentioned above).

      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • bnx2x: fix DMA unmapping of TSO split BDs · 46bdd0fd
      Michal Schmidt authored
      [ Upstream commit 95e92fd4 ]
      
      bnx2x triggers warnings with CONFIG_DMA_API_DEBUG=y:
      
        WARNING: CPU: 0 PID: 2253 at lib/dma-debug.c:887 check_unmap+0xf8/0x920()
        bnx2x 0000:28:00.0: DMA-API: device driver frees DMA memory with
        different size [device address=0x00000000da2b389e] [map size=1490 bytes]
        [unmap size=66 bytes]
      
      The reason is that bnx2x splits a TSO BD into two BDs (headers + data)
      using one DMA mapping for both, but it uses only the length of the first
      BD when unmapping.
      
      This patch fixes the bug by unmapping the whole length of the two BDs.
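
The length calculation this describes can be sketched as below. `struct bd_sketch` and `unmap_len_sketch()` are hypothetical names and this is not the driver code. The 66 and 1490 byte figures come from the warning above, and the 1424 byte data BD is simply their difference.

```c
#include <stdint.h>

/* Hypothetical, simplified buffer descriptor: only the byte count matters here. */
struct bd_sketch {
	uint16_t nbytes;
};

/*
 * Length to pass to the unmap call for a mapping that starts at 'hdr'.
 * When the BD was split into headers + data, both halves share one DMA
 * mapping, so the unmap length must cover both.
 */
static uint32_t unmap_len_sketch(const struct bd_sketch *hdr,
				 const struct bd_sketch *data,
				 int is_split)
{
	uint32_t len = hdr->nbytes;

	if (is_split)
		len += data->nbytes;
	return len;
}
```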

      Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Dmitry Kravkov <dmitry@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • bridge: use spin_lock_bh() in br_multicast_set_hash_max · f5d992e9
      Curt Brune authored
      [ Upstream commit fe0d692b ]
      
      br_multicast_set_hash_max() is called from process context in
      net/bridge/br_sysfs_br.c by the sysfs store_hash_max() function.
      
      br_multicast_set_hash_max() calls spin_lock(&br->multicast_lock),
      which can deadlock the CPU if a softirq that also tries to take the
      same lock interrupts br_multicast_set_hash_max() while the lock is
      held.  This can happen quite easily when any of the bridge multicast
      timers expire, which try to take the same lock.
      
      The fix here is to use spin_lock_bh(), preventing other softirqs from
      executing on this CPU.
      
      Steps to reproduce:
      
      1. Create a bridge with several interfaces (I used 4).
      2. Set the "multicast query interval" to a low number, like 2.
      3. Enable the bridge as a multicast querier.
      4. Repeatedly set the bridge hash_max parameter via sysfs.
      
        # brctl addbr br0
        # brctl addif br0 eth1 eth2 eth3 eth4
        # brctl setmcqi br0 2
        # brctl setmcquerier br0 1
      
        # while true ; do echo 4096 > /sys/class/net/br0/bridge/hash_max; done

      Signed-off-by: Curt Brune <curt@cumulusnetworks.com>
      Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • net: llc: fix use after free in llc_ui_recvmsg · 10cc9996
      Daniel Borkmann authored
      [ Upstream commit 4d231b76 ]
      
      While commit 30a584d9 fixes datagram interface in LLC, a use
      after free bug has been introduced for SOCK_STREAM sockets that do
      not make use of MSG_PEEK.
      
      The flow is as follows ...
      
        if (!(flags & MSG_PEEK)) {
          ...
          sk_eat_skb(sk, skb, false);
          ...
        }
        ...
        if (used + offset < skb->len)
          continue;
      
      ... where sk_eat_skb() calls __kfree_skb(). Therefore, cache
      original length and work on skb_len to check partial reads.
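
The "cache the length first" idea can be sketched in user space as follows. `struct buf_sketch` and `consume_and_check()` are hypothetical, and `free()` stands in for sk_eat_skb() ending in __kfree_skb(). This is an illustration of the pattern, not the LLC code.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an skb: only the length field is modelled. */
struct buf_sketch {
	size_t len;
};

/*
 * Returns 1 if the read was partial (more data remained in the buffer),
 * 0 otherwise.  When 'peek' is zero the buffer is freed, so its length
 * is copied to a local before that happens and 'b' is never touched
 * afterwards.
 */
static int consume_and_check(struct buf_sketch *b, size_t used,
			     size_t offset, int peek)
{
	size_t len = b->len;	/* cached while b is still valid */

	if (!peek)
		free(b);	/* stands in for sk_eat_skb() */

	return used + offset < len;
}
```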
      
      Fixes: 30a584d9 ("[LLX]: SOCK_DGRAM interface fixes")
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • vlan: Fix header ops passthru when doing TX VLAN offload. · 31da3597
      David S. Miller authored
      [ Upstream commit 2205369a ]
      
      When the vlan code detects that the real device can do TX VLAN offloads
      in hardware, it tries to arrange for the real device's header_ops to
      be invoked directly.
      
      But it does so illegally, by simply hooking the real device's
      header_ops up to the VLAN device.
      
      This doesn't work because we will end up invoking a set of header_ops
      routines which expect a device type which matches the real device, but
      will see a VLAN device instead.
      
      Fix this by providing a pass-thru set of header_ops which will arrange
      to pass the proper real device instead.
      
      To facilitate this add a dev_rebuild_header().  There are
      implementations which provide a ->cache and ->create but not a
      ->rebuild (e.g. PLIP).  So we need a helper function just like
      dev_hard_header() to avoid crashes.
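
A helper of the kind described might look roughly like the sketch below, using simplified user-space stand-ins rather than the kernel's struct net_device, struct sk_buff and header_ops. All `*_sketch` names and `demo_rebuild()` are hypothetical, and returning 0 when no ->rebuild exists is an assumption of this sketch.

```c
#include <stddef.h>

struct pkt_sketch;

/* Hypothetical, simplified header operations table. */
struct header_ops_sketch {
	int (*create)(struct pkt_sketch *pkt);
	int (*rebuild)(struct pkt_sketch *pkt);	/* optional, may be NULL */
};

struct dev_sketch {
	const struct header_ops_sketch *header_ops;	/* may be NULL */
};

struct pkt_sketch {
	const struct dev_sketch *dev;
};

/* Example ->rebuild implementation used only to exercise the helper. */
static int demo_rebuild(struct pkt_sketch *pkt)
{
	(void)pkt;
	return 1;
}

/*
 * Call ->rebuild only when the device actually provides one, instead of
 * dereferencing a NULL function pointer.
 */
static int rebuild_header_sketch(struct pkt_sketch *pkt)
{
	const struct dev_sketch *dev = pkt->dev;

	if (!dev->header_ops || !dev->header_ops->rebuild)
		return 0;
	return dev->header_ops->rebuild(pkt);
}
```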
      
      Use this helper in the one existing place where the
      header_ops->rebuild was being invoked, the neighbour code.
      
      With lots of help from Florian Westphal.

      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • net: rose: restore old recvmsg behavior · 52994125
      Florian Westphal authored
      [ Upstream commit f81152e3 ]
      
      The recvmsg handler in net/rose/af_rose.c performs a size check on ->msg_namelen.
      
      After commit f3d33426
      (net: rework recvmsg handler msg_name and msg_namelen logic), we now
      always take the else branch due to namelen being initialized to 0.
      
      Digging in netdev-vger-cvs git repo shows that msg_namelen was
      initialized with a fixed-size since at least 1995, so the else branch
      was never taken.
      
      Compile tested only.

      Signed-off-by: Florian Westphal <fw@strlen.de>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • rds: prevent dereference of a NULL device · 95ae3677
      Sasha Levin authored
      [ Upstream commit c2349758 ]
      
      Binding might result in a NULL device, which is dereferenced
      causing this BUG:
      
      [ 1317.260548] BUG: unable to handle kernel NULL pointer dereference at 0000000000000974
      [ 1317.261847] IP: [<ffffffff84225f52>] rds_ib_laddr_check+0x82/0x110
      [ 1317.263315] PGD 418bcb067 PUD 3ceb21067 PMD 0
      [ 1317.263502] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
      [ 1317.264179] Dumping ftrace buffer:
      [ 1317.264774]    (ftrace buffer empty)
      [ 1317.265220] Modules linked in:
      [ 1317.265824] CPU: 4 PID: 836 Comm: trinity-child46 Tainted: G        W    3.13.0-rc4-next-20131218-sasha-00013-g2cebb9b-dirty #4159
      [ 1317.267415] task: ffff8803ddf33000 ti: ffff8803cd31a000 task.ti: ffff8803cd31a000
      [ 1317.268399] RIP: 0010:[<ffffffff84225f52>]  [<ffffffff84225f52>] rds_ib_laddr_check+0x82/0x110
      [ 1317.269670] RSP: 0000:ffff8803cd31bdf8  EFLAGS: 00010246
      [ 1317.270230] RAX: 0000000000000000 RBX: ffff88020b0dd388 RCX: 0000000000000000
      [ 1317.270230] RDX: ffffffff8439822e RSI: 00000000000c000a RDI: 0000000000000286
      [ 1317.270230] RBP: ffff8803cd31be38 R08: 0000000000000000 R09: 0000000000000000
      [ 1317.270230] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
      [ 1317.270230] R13: 0000000054086700 R14: 0000000000a25de0 R15: 0000000000000031
      [ 1317.270230] FS:  00007ff40251d700(0000) GS:ffff88022e200000(0000) knlGS:0000000000000000
      [ 1317.270230] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [ 1317.270230] CR2: 0000000000000974 CR3: 00000003cd478000 CR4: 00000000000006e0
      [ 1317.270230] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 1317.270230] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000090602
      [ 1317.270230] Stack:
      [ 1317.270230]  0000000054086700 5408670000a25de0 5408670000000002 0000000000000000
      [ 1317.270230]  ffffffff84223542 00000000ea54c767 0000000000000000 ffffffff86d26160
      [ 1317.270230]  ffff8803cd31be68 ffffffff84223556 ffff8803cd31beb8 ffff8800c6765280
      [ 1317.270230] Call Trace:
      [ 1317.270230]  [<ffffffff84223542>] ? rds_trans_get_preferred+0x42/0xa0
      [ 1317.270230]  [<ffffffff84223556>] rds_trans_get_preferred+0x56/0xa0
      [ 1317.270230]  [<ffffffff8421c9c3>] rds_bind+0x73/0xf0
      [ 1317.270230]  [<ffffffff83e4ce62>] SYSC_bind+0x92/0xf0
      [ 1317.270230]  [<ffffffff812493f8>] ? context_tracking_user_exit+0xb8/0x1d0
      [ 1317.270230]  [<ffffffff8119313d>] ? trace_hardirqs_on+0xd/0x10
      [ 1317.270230]  [<ffffffff8107a852>] ? syscall_trace_enter+0x32/0x290
      [ 1317.270230]  [<ffffffff83e4cece>] SyS_bind+0xe/0x10
      [ 1317.270230]  [<ffffffff843a6ad0>] tracesys+0xdd/0xe2
      [ 1317.270230] Code: 00 8b 45 cc 48 8d 75 d0 48 c7 45 d8 00 00 00 00 66 c7 45 d0 02 00 89 45 d4 48 89 df e8 78 49 76 ff 41 89 c4 85 c0 75 0c 48 8b 03 <80> b8 74 09 00 00 01 74 06 41 bc 9d ff ff ff f6 05 2a b6 c2 02
      [ 1317.270230] RIP  [<ffffffff84225f52>] rds_ib_laddr_check+0x82/0x110
      [ 1317.270230]  RSP <ffff8803cd31bdf8>
      [ 1317.270230] CR2: 0000000000000974
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      95ae3677
    • Salva Peiró's avatar
      hamradio/yam: fix info leak in ioctl · 794ce89c
      Salva Peiró authored
      [ Upstream commit 8e3fbf87 ]
      
      The yam_ioctl() code fails to initialise the cmd field
      of the struct yamdrv_ioctl_cfg. Add an explicit memset(0)
      before filling the structure to avoid the 4-byte info leak.
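
      A minimal user-space sketch of the zero-before-fill pattern described above. The struct `demo_ioctl_cfg` and the function `demo_fill_cfg` are hypothetical stand-ins chosen for illustration; they do not reproduce the real yamdrv_ioctl_cfg layout or the actual patch:

```c
#include <string.h>

/* Hypothetical stand-in for a driver config struct (not the real layout). */
struct demo_ioctl_cfg {
	int cmd;       /* the kind of field a fill routine can forget to set */
	int bitrate;
	int baudrate;
};

/* Fill a reply structure that will later be copied out to the caller. */
void demo_fill_cfg(struct demo_ioctl_cfg *cfg, int bitrate, int baudrate)
{
	/* Zero every byte first, so unset fields and padding carry no stale data. */
	memset(cfg, 0, sizeof(*cfg));
	cfg->bitrate = bitrate;
	cfg->baudrate = baudrate;
	/* cfg->cmd is deliberately not assigned; it stays 0 because of the memset. */
}
```

      With the memset in place, a field that the fill routine never assigns reads back as zero instead of whatever was previously on the stack.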
      Signed-off-by: Salva Peiró <speiro@ai2.upv.es>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      794ce89c
    • Wenliang Fan's avatar
      drivers/net/hamradio: Integer overflow in hdlcdrv_ioctl() · 6ea9c09b
      Wenliang Fan authored
      [ Upstream commit e9db5c21 ]
      
      The local variable 'bi' comes from userspace. If userspace passed a
      large number to 'bi.data.calibrate', there would be an integer overflow
      in the following line:
      	s->hdlctx.calibrate = bi.data.calibrate * s->par.bitrate / 16;
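
      One common way to guard a multiplication like this is to reject operands whose product would exceed INT_MAX before multiplying. The sketch below is a user-space illustration only; `demo_set_calibrate` and its error convention are assumptions, not necessarily the check the upstream commit applies:

```c
#include <errno.h>
#include <limits.h>

/*
 * Hypothetical helper: computes calibrate * bitrate / 16 into *out.
 * Returns 0 on success, or -EINVAL if the inputs are out of range or
 * the multiplication would overflow a signed int.
 */
int demo_set_calibrate(int calibrate, int bitrate, int *out)
{
	if (calibrate < 0 || bitrate <= 0)
		return -EINVAL;
	/* Division-based test: safe because bitrate > 0 here. */
	if (calibrate > INT_MAX / bitrate)
		return -EINVAL;
	*out = calibrate * bitrate / 16;
	return 0;
}
```

      The comparison against INT_MAX / bitrate is done before the product is formed, so the overflow itself never happens.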
      Signed-off-by: Wenliang Fan <fanwlexca@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      6ea9c09b
    • Daniel Borkmann's avatar
      net: inet_diag: zero out uninitialized idiag_{src,dst} fields · 9229facb
      Daniel Borkmann authored
      [ Upstream commit b1aac815 ]
      
      Jakub reported while working with nlmon netlink sniffer that parts of
      the inet_diag_sockid are not initialized when r->idiag_family != AF_INET6.
      That is, fields of r->id.idiag_src[1 ... 3], r->id.idiag_dst[1 ... 3].
      
      In fact, it seems that we can leak 6 * sizeof(u32) bytes of kernel [slab]
      memory through this. At least, in udp_dump_one(), we allocate a skb in ...
      
        rep = nlmsg_new(sizeof(struct inet_diag_msg) + ..., GFP_KERNEL);
      
      ... and then pass that to inet_sk_diag_fill() that puts the whole struct
      inet_diag_msg into the skb, where we only fill out r->id.idiag_src[0],
      r->id.idiag_dst[0] and leave the rest untouched:
      
        r->id.idiag_src[0] = inet->inet_rcv_saddr;
        r->id.idiag_dst[0] = inet->inet_daddr;
      
      struct inet_diag_msg embeds struct inet_diag_sockid, which is correctly and
      fully filled out in the IPv6 case, but not for IPv4.
      
      So just zero them out by using plain memset (for this little amount of
      bytes it's probably not worth the extra check for idiag_family == AF_INET).
      
      Similarly, also fix the other places where we fill that out.
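
      The IPv4 case can be sketched in user space as follows. `demo_sockid` and `demo_fill_ipv4` are hypothetical names that only mirror the shape of the four-word source and destination arrays mentioned above; this is not the kernel code itself:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in: four 32-bit words per address, as for IPv6. */
struct demo_sockid {
	uint32_t src[4];
	uint32_t dst[4];
};

/* IPv4 only uses word 0 of each array; words 1..3 must still be defined. */
void demo_fill_ipv4(struct demo_sockid *id, uint32_t saddr, uint32_t daddr)
{
	/* Unconditionally clear both arrays, then set the one word IPv4 needs. */
	memset(id->src, 0, sizeof(id->src));
	memset(id->dst, 0, sizeof(id->dst));
	id->src[0] = saddr;
	id->dst[0] = daddr;
}
```

      Clearing both arrays regardless of address family costs 32 bytes of writes here and avoids a branch on the family.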
      Reported-by: Jakub Zawadzki <darkjames-ws@darkjames.pl>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      9229facb
    • Sasha Levin's avatar
      net: unix: allow bind to fail on mutex lock · 2e737a8a
      Sasha Levin authored
      [ Upstream commit 37ab4fa7 ]
      
      This is similar to the set_peek_off patch where calling bind while the
      socket is stuck in unix_dgram_recvmsg() will block and cause a hung task
      spew after a while.
      
      This is also the last place that did a straightforward mutex_lock(), so
      there shouldn't be any more of these patches.
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      2e737a8a
    • Nat Gurumoorthy's avatar
      tg3: Initialize REG_BASE_ADDR at PCI config offset 120 to 0 · 6e890b0a
      Nat Gurumoorthy authored
      [ Upstream commit 388d3335 ]
      
      The new tg3 driver leaves REG_BASE_ADDR (PCI config offset 120)
      uninitialized. After a power-on reset this register may contain garbage. The
      Register Base Address register defines the device local address of a
      register. The data pointed to by this location is read or written using
      the Register Data register (PCI config offset 128). When REG_BASE_ADDR contains
      garbage, any read or write of the Register Data register (PCI 128) will cause the
      PCI bus to lock up. The TCO watchdog will then fire and bring down the system.
      Signed-off-by: Nat Gurumoorthy <natg@google.com>
      Acked-by: Michael Chan <mchan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      6e890b0a
    • Changli Gao's avatar
      net: drop_monitor: fix the value of maxattr · 9898c396
      Changli Gao authored
      [ Upstream commit d323e92c ]
      
      maxattr in genl_family should hold the max attribute type, not the
      max command type. Drop monitor doesn't support any attributes, so we
      should leave it as zero.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      9898c396
    • Hannes Frederic Sowa's avatar
      ipv6: don't count addrconf generated routes against gc limit · 66d66a81
      Hannes Frederic Sowa authored
      [ Upstream commit a3300ef4 ]
      
      Brett Ciphery reported that new ipv6 addresses failed to get installed
      because the addrconf generated dsts were counted against the dst gc
      limit. We don't need to count those routes, just as we currently don't
      count administratively added routes.
      
      Because the max_addresses check enforces a limit on unbounded address
      generation first in case someone plays with router advertisements, we
      are still safe here.
      Reported-by: Brett Ciphery <brett.ciphery@windriver.com>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      66d66a81
    • Venkat Venkatsubra's avatar
      rds: prevent BUG_ON triggered on congestion update to loopback · 2c317886
      Venkat Venkatsubra authored
      [ Upstream commit 18fc25c9 ]
      
      After a congestion update on a local connection, when rds_ib_xmit returns
      fewer bytes than there are in the message, rds_send_xmit calls
      rds_ib_xmit again with an offset that causes BUG_ON(off & RDS_FRAG_SIZE)
      to trigger.
      
      For a 4Kb PAGE_SIZE rds_ib_xmit returns min(8240,4096)=4096 when actually
      the message contains 8240 bytes. rds_send_xmit thinks there is more to send
      and calls rds_ib_xmit again with a data offset "off" of 4096-48(rds header)
      =4048 bytes thus hitting the BUG_ON(off & RDS_FRAG_SIZE) [RDS_FRAG_SIZE=4k].
      
      The commit 6094628b
      "rds: prevent BUG_ON triggering on congestion map updates" introduced
      this regression. That change was addressing the triggering of a different
      BUG_ON in rds_send_xmit() on PowerPC architecture with 64Kbytes PAGE_SIZE:
       	BUG_ON(ret != 0 &&
          		 conn->c_xmit_sg == rm->data.op_nents);
      This was the sequence it was going through:
      (rds_ib_xmit)
      /* Do not send cong updates to IB loopback */
      if (conn->c_loopback
         && rm->m_inc.i_hdr.h_flags & RDS_FLAG_CONG_BITMAP) {
        	rds_cong_map_updated(conn->c_fcong, ~(u64) 0);
          	return sizeof(struct rds_header) + RDS_CONG_MAP_BYTES;
      }
      rds_ib_xmit returns 8240
      rds_send_xmit:
        c_xmit_data_off = 0 + 8240 - 48 (rds header accounted only the first time)
         		 = 8192
        c_xmit_data_off < 65536 (sg->length), so calls rds_ib_xmit again
      rds_ib_xmit returns 8240
      rds_send_xmit:
        c_xmit_data_off = 8192 + 8240 = 16432, calls rds_ib_xmit again
        and so on (c_xmit_data_off 24672,32912,41152,49392,57632)
      rds_ib_xmit returns 8240
      On this iteration this sequence causes the BUG_ON in rds_send_xmit:
          while (ret) {
          	tmp = min_t(int, ret, sg->length - conn->c_xmit_data_off);
          	[tmp = 65536 - 57632 = 7904]
          	conn->c_xmit_data_off += tmp;
          	[c_xmit_data_off = 57632 + 7904 = 65536]
          	ret -= tmp;
          	[ret = 8240 - 7904 = 336]
          	if (conn->c_xmit_data_off == sg->length) {
          		conn->c_xmit_data_off = 0;
          		sg++;
          		conn->c_xmit_sg++;
          		BUG_ON(ret != 0 &&
          			conn->c_xmit_sg == rm->data.op_nents);
          		[c_xmit_sg = 1, rm->data.op_nents = 1]
      
      What the current fix does:
      Since the congestion update over loopback is not actually transmitted
      as a message, all that rds_ib_xmit needs to do is let the caller think
      the full message has been transmitted and not return partial bytes.
      It will return 8240 (RDS_CONG_MAP_BYTES+48) when PAGE_SIZE is 4Kb.
      And 64Kb+48 when page size is 64Kb.
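
      The 64 KB PAGE_SIZE walk-through earlier in this message can be replayed with a small user-space model. It assumes a single scatterlist entry and uses the numbers quoted above (48-byte header, 8240 bytes reported per call); `demo_walk_hits_bug` and the DEMO_ constants are hypothetical names, and the real rds_send_xmit() does considerably more:

```c
#include <stdbool.h>

#define DEMO_HDR_BYTES  48    /* rds header size quoted in the message */
#define DEMO_XMIT_RET   8240  /* bytes the loopback path reported per call */

/*
 * Models the offset accounting for one scatterlist entry of sg_len bytes.
 * Returns true if the entry is exhausted while reported bytes are still
 * left over (the condition the quoted BUG_ON checks), and stores the
 * leftover byte count in *leftover.
 */
bool demo_walk_hits_bug(int sg_len, int *leftover)
{
	int off = 0;          /* models conn->c_xmit_data_off */
	bool first = true;    /* header is only accounted on the first call */

	*leftover = 0;
	if (sg_len <= 0)
		return false;

	for (;;) {
		int ret = DEMO_XMIT_RET;

		if (first) {
			ret -= DEMO_HDR_BYTES;
			first = false;
		}
		while (ret) {
			int room = sg_len - off;
			int tmp = ret < room ? ret : room;

			off += tmp;
			ret -= tmp;
			if (off == sg_len) {
				/* Last (only) sg entry consumed. */
				*leftover = ret;
				return ret != 0;
			}
		}
	}
}
```

      For sg_len = 65536 the offsets follow the sequence listed above (8192, 16432, ... 57632) and the walk ends with 336 bytes left over, which is the BUG_ON case. For a map that fits exactly (sg_len = 8192 in this model) nothing is left over.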
      Reported-by: Josh Hunt <joshhunt00@gmail.com>
      Tested-by: Honggang Li <honli@redhat.com>
      Acked-by: Bang Nguyen <bang.nguyen@oracle.com>
      Signed-off-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      2c317886
    • Eric Dumazet's avatar
      net: do not pretend FRAGLIST support · b0809483
      Eric Dumazet authored
      [ Upstream commit 28e24c62 ]
      
      Few network drivers really support frag_list: only virtual drivers do.
      
      Some drivers wrongly advertise the NETIF_F_FRAGLIST feature.
      
      If an skb with a frag_list is given to them, the packet on the wire will
      be corrupt.
      
      Remove this flag, as core networking stack will make sure to
      provide packets that can be sent without corruption.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
      Cc: Anirudha Sarangi <anirudh@xilinx.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      b0809483
  2. 03 Jan, 2014 7 commits
    • Ben Hutchings's avatar
      Linux 3.2.54 · 260716c8
      Ben Hutchings authored
      260716c8
    • KOBAYASHI Yoshitake's avatar
      mmc: block: fix a bug of error handling in MMC driver · 295efae4
      KOBAYASHI Yoshitake authored
      commit c8760069 upstream.
      
      The current MMC driver doesn't handle a generic error (bit 19 of the
      device status) in the write sequence. As a result, write data gets lost
      when a generic error occurs. For example, a generic error while updating
      filesystem management information causes a loss of write data and
      corrupts the filesystem. In the worst case, the system will never
      boot.
      
      This patch includes the following functionality:
        1. To enable error checking for the response of CMD12 and CMD13
           in write command sequence
        2. To retry write sequence when a generic error occurs
      
      Messages are added for v2 to show what occurs.
      
      [Backported to 3.4-stable]
      Signed-off-by: KOBAYASHI Yoshitake <yoshitake.kobayashi@toshiba.co.jp>
      Signed-off-by: Chris Ball <cjb@laptop.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      295efae4
    • Steven Rostedt (Red Hat)'s avatar
      ftrace: Fix function graph with loading of modules · 969a08e9
      Steven Rostedt (Red Hat) authored
      commit 8a56d776 upstream.
      
      Commit 8c4f3c3f "ftrace: Check module functions being traced on reload"
      fixed module loading and unloading with respect to function tracing, but
      it missed the function graph tracer. If you perform the following
      
       # cd /sys/kernel/debug/tracing
       # echo function_graph > current_tracer
       # modprobe nfsd
       # echo nop > current_tracer
      
      You'll get the following oops message:
      
       ------------[ cut here ]------------
       WARNING: CPU: 2 PID: 2910 at /linux.git/kernel/trace/ftrace.c:1640 __ftrace_hash_rec_update.part.35+0x168/0x1b9()
       Modules linked in: nfsd exportfs nfs_acl lockd ipt_MASQUERADE sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables uinput snd_hda_codec_idt
       CPU: 2 PID: 2910 Comm: bash Not tainted 3.13.0-rc1-test #7
       Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
        0000000000000668 ffff8800787efcf8 ffffffff814fe193 ffff88007d500000
        0000000000000000 ffff8800787efd38 ffffffff8103b80a 0000000000000668
        ffffffff810b2b9a ffffffff81a48370 0000000000000001 ffff880037aea000
       Call Trace:
        [<ffffffff814fe193>] dump_stack+0x4f/0x7c
        [<ffffffff8103b80a>] warn_slowpath_common+0x81/0x9b
        [<ffffffff810b2b9a>] ? __ftrace_hash_rec_update.part.35+0x168/0x1b9
        [<ffffffff8103b83e>] warn_slowpath_null+0x1a/0x1c
        [<ffffffff810b2b9a>] __ftrace_hash_rec_update.part.35+0x168/0x1b9
        [<ffffffff81502f89>] ? __mutex_lock_slowpath+0x364/0x364
        [<ffffffff810b2cc2>] ftrace_shutdown+0xd7/0x12b
        [<ffffffff810b47f0>] unregister_ftrace_graph+0x49/0x78
        [<ffffffff810c4b30>] graph_trace_reset+0xe/0x10
        [<ffffffff810bf393>] tracing_set_tracer+0xa7/0x26a
        [<ffffffff810bf5e1>] tracing_set_trace_write+0x8b/0xbd
        [<ffffffff810c501c>] ? ftrace_return_to_handler+0xb2/0xde
        [<ffffffff811240a8>] ? __sb_end_write+0x5e/0x5e
        [<ffffffff81122aed>] vfs_write+0xab/0xf6
        [<ffffffff8150a185>] ftrace_graph_caller+0x85/0x85
        [<ffffffff81122dbd>] SyS_write+0x59/0x82
        [<ffffffff8150a185>] ftrace_graph_caller+0x85/0x85
        [<ffffffff8150a2d2>] system_call_fastpath+0x16/0x1b
       ---[ end trace 940358030751eafb ]---
      
      The above-mentioned commit didn't go far enough. It covered the
      function tracer by adding checks in __register_ftrace_function(). The
      problem is that the function graph tracer circumvents that (for a slight
      efficiency gain when function graph trace is running with a function
      tracer; the gain was not worth this).
      
      The problem came with ftrace_startup() which should always be called after
      __register_ftrace_function(), if you want this bug to be completely fixed.
      
      Anyway, this solution moves __register_ftrace_function() inside of
      ftrace_startup() and removes the need to call them both.
      Reported-by: Dave Wysochanski <dwysocha@redhat.com>
      Fixes: ed926f9b ("ftrace: Use counters to enable functions to trace")
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      969a08e9
    • Steven Rostedt (Red Hat)'s avatar
      ftrace: Check module functions being traced on reload · 874d3954
      Steven Rostedt (Red Hat) authored
      commit 8c4f3c3f upstream.
      
      There's been a nasty bug that would show up and not give much info.
      The bug displayed the following warning:
      
       WARNING: at kernel/trace/ftrace.c:1529 __ftrace_hash_rec_update+0x1e3/0x230()
       Pid: 20903, comm: bash Tainted: G           O 3.6.11+ #38405.trunk
       Call Trace:
        [<ffffffff8103e5ff>] warn_slowpath_common+0x7f/0xc0
        [<ffffffff8103e65a>] warn_slowpath_null+0x1a/0x20
        [<ffffffff810c2ee3>] __ftrace_hash_rec_update+0x1e3/0x230
        [<ffffffff810c4f28>] ftrace_hash_move+0x28/0x1d0
        [<ffffffff811401cc>] ? kfree+0x2c/0x110
        [<ffffffff810c68ee>] ftrace_regex_release+0x8e/0x150
        [<ffffffff81149f1e>] __fput+0xae/0x220
        [<ffffffff8114a09e>] ____fput+0xe/0x10
        [<ffffffff8105fa22>] task_work_run+0x72/0x90
        [<ffffffff810028ec>] do_notify_resume+0x6c/0xc0
        [<ffffffff8126596e>] ? trace_hardirqs_on_thunk+0x3a/0x3c
        [<ffffffff815c0f88>] int_signal+0x12/0x17
       ---[ end trace 793179526ee09b2c ]---
      
      It was finally narrowed down to unloading a module that was being traced.
      
      It was actually more than that. When functions are being traced, there's
      a table of all functions that have a ref count of the number of active
      tracers attached to that function. When a function trace callback is
      registered to a function, the function's record ref count is incremented.
      When it is unregistered, the function's record ref count is decremented.
      If an inconsistency is detected (ref count goes below zero) the above
      warning is shown and the function tracing is permanently disabled until
      reboot.
      
      The ftrace callback ops holds a hash of functions that it filters on
      (and/or filters off). If the hash is empty, the default means to filter
      all functions (for the filter_hash) or to disable no functions (for the
      notrace_hash).
      
      When a module is unloaded, it frees the function records that represent
      the module functions. These records exist on their own pages, that is
      function records for one module will not exist on the same page as
      function records for other modules or even the core kernel.
      
      Now when a module unloads, the records that represent its functions are
      freed. When the module is loaded again, the records are recreated with
      a default ref count of zero (unless there's a callback that traces all
      functions, then they will also be traced, and the ref count will be
      incremented).
      
      The problem is that if an ftrace callback hash includes functions of the
      module being unloaded, those hash entries will not be removed. If the
      module is reloaded in the same location, the hash entries still point
      to the functions of the module but the module's ref counts do not reflect
      that.
      
      With the help of Steve and Joern, we found a reproducer:
      
       Using uinput module and uinput_release function.
      
       cd /sys/kernel/debug/tracing
       modprobe uinput
       echo uinput_release > set_ftrace_filter
       echo function > current_tracer
       rmmod uinput
       modprobe uinput
       # check /proc/modules to see if loaded in same addr, otherwise try again
       echo nop > current_tracer
      
       [BOOM]
      
      The above loads the uinput module, which creates a table of functions that
      can be traced within the module.
      
      We add uinput_release to the filter_hash to trace just that function.
      
      Enable function tracing, which increments the ref count of the record
      associated with uinput_release.
      
      Remove uinput, which frees the records including the one that represents
      uinput_release.
      
      Load the uinput module again (and make sure it's at the same address).
      This recreates the function records all with a ref count of zero,
      including uinput_release.
      
      Disable function tracing, which will decrement the ref count for uinput_release
      which is now zero because of the module removal and reload, and we have
      a mismatch (below zero ref count).
      
      The solution is to check all currently tracing ftrace callbacks to see if any
      are tracing any of the module's functions when a module is loaded (it already does
      that with callbacks that trace all functions). If a callback happens to have
      a module function being traced, it increments that record's ref count and starts
      tracing that function.
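
      The ref count mismatch and the fix can be modelled with a toy user-space sketch. All names here (`demo_rec`, `demo_ops`, `demo_module_reload`, and so on) are hypothetical and reduce the real ftrace records and hashes to one counter and two flags:

```c
#include <stdbool.h>

struct demo_rec { int ref; };                    /* one function record */
struct demo_ops { bool wants_fn; bool active; }; /* callback + its filter */

/* Enabling a tracer bumps the ref count of every function it filters on. */
void demo_start_tracing(struct demo_ops *ops, struct demo_rec *rec)
{
	ops->active = true;
	if (ops->wants_fn)
		rec->ref++;
}

/*
 * Disabling drops the ref count again. Returns false when that would take
 * the count below zero, which is the inconsistency behind the warning.
 */
bool demo_stop_tracing(struct demo_ops *ops, struct demo_rec *rec)
{
	ops->active = false;
	if (!ops->wants_fn)
		return true;
	if (rec->ref == 0)
		return false;
	rec->ref--;
	return true;
}

/*
 * Unload + reload recreates the record with a zero ref count. With
 * check_hashes set, active callbacks that still filter on the function
 * are accounted for again, as the solution above describes.
 */
void demo_module_reload(struct demo_rec *rec, const struct demo_ops *ops,
			bool check_hashes)
{
	rec->ref = 0;
	if (check_hashes && ops->active && ops->wants_fn)
		rec->ref++;
}
```

      Calling demo_module_reload() with check_hashes false and then demo_stop_tracing() reproduces the below-zero case; with check_hashes true the count returns cleanly to zero.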
      
      There may be a strange side effect with this, where tracing module functions
      on unload and then reloading a new module may have that new module's functions
      being traced. This may be something that confuses the user, but it's not
      a big deal. Another approach is to disable all callback hashes on module unload,
      but this leaves some ftrace callbacks that may not be registered, but can
      still have hashes tracing the module's function where ftrace doesn't know about
      it. That situation can cause the same bug. This solution solves that case too.
      Another benefit of this solution is that it is possible to trace a module's
      function on unload and load.
      
      Link: http://lkml.kernel.org/r/20130705142629.GA325@redhat.com
      Reported-by: Jörn Engel <joern@logfs.org>
      Reported-by: Dave Jones <davej@redhat.com>
      Reported-by: Steve Hodgson <steve@purestorage.com>
      Tested-by: Steve Hodgson <steve@purestorage.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      [bwh: Backported to 3.2: adjust context, indentation]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      874d3954
    • Steven Rostedt's avatar
      ftrace: Create ftrace_hash_empty() helper routine · 195c821e
      Steven Rostedt authored
      commit 06a51d93 upstream.
      
      There are two types of hashes in the ftrace_ops; one type
      is the filter_hash and the other is the notrace_hash. Either
      one may be null, meaning it has no elements. But when elements
      are added, the hash is allocated.
      
      Throughout the code, a check needs to be made to see if a hash
      exists or the hash has elements, but the check if the hash exists
      is usually missing causing the possible "NULL pointer dereference bug".
      
      Add a helper routine called "ftrace_hash_empty()" that returns
      true if the hash doesn't exist or its count is zero, as they mean
      the same thing.
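
      The behaviour described for the helper can be sketched in user space like this. `demo_hash` and `demo_hash_empty` are hypothetical stand-ins that keep only the count field; the real struct ftrace_hash has more members:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in: only the element count matters for this check. */
struct demo_hash {
	unsigned long count;
};

/* True when the hash was never allocated or holds no elements. */
bool demo_hash_empty(const struct demo_hash *hash)
{
	/* The NULL test comes first, so hash->count is never read through NULL. */
	return !hash || !hash->count;
}
```

      Folding both tests into one helper means callers cannot forget the NULL check that was "usually missing".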
      Last-bug-reported-by: Jiri Olsa <jolsa@redhat.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      195c821e
    • Steven Rostedt's avatar
      ftrace: Fix ftrace hash record update with notrace · 4f02a393
      Steven Rostedt authored
      commit c842e975 upstream.
      
      When disabling the "notrace" records, that means we want to trace them.
      If the notrace_hash is zero, it means that we want to trace all
      records. But to disable a zero notrace_hash means nothing.
      
      The check for the notrace_hash count was incorrect with:
      
      	if (hash && !hash->count)
      		return;
      
      With the correct comment above it that states that we do nothing
      if the notrace_hash has zero count. But !hash also means that
      the notrace hash has zero count. I think this was done to
      protect against dereferencing NULL. But if !hash is true, then
      we go through the following loop without doing a single thing.
      
      Fix it to:
      
      	if (!hash || !hash->count)
      		return;
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      4f02a393
    • Jason Wang's avatar
      net: flow_dissector: fail on evil iph->ihl · f7d537dc
      Jason Wang authored
      commit 6f092343 upstream.
      
      We don't validate iph->ihl, which may lead to a dead loop if we meet an IPIP
      skb whose iph->ihl is zero. Fix this by failing immediately when iph->ihl
      is evil (less than 5).
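
      The IPv4 header length in bytes is IHL * 4, so a parser that advances by that amount makes no progress when IHL is zero. A minimal sketch of the validation, assuming a hypothetical helper `demo_ipv4_hdr_len` that takes the first byte of the header (version in the high nibble, IHL in the low nibble):

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns the IPv4 header length in bytes, or 0 if
 * the IHL field is below the minimum of 5 words (20 bytes).
 */
size_t demo_ipv4_hdr_len(uint8_t first_byte)
{
	unsigned int ihl = first_byte & 0x0f;  /* low nibble is the IHL */

	if (ihl < 5)
		return 0;          /* caller should fail instead of looping */
	return (size_t)ihl * 4;
}
```

      A caller that treats a return value of 0 as a hard failure can never be asked to step forward by zero bytes into the same encapsulated header again.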
      
      This issue was introduced by commit ec5efe79
      (rps: support IPIP encapsulation).
      
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Petr Matousek <pmatouse@redhat.com>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.2: the affected code is in __skb_get_rxhash()]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      f7d537dc