1. 06 Feb, 2014 5 commits
    • mm: hugetlbfs: fix hugetlbfs optimization · 17b6ada0
      Andrea Arcangeli authored
      commit 27c73ae7 upstream.
      
      Commit 7cb2ef56 ("mm: fix aio performance regression for database
      caused by THP") can cause dereference of a dangling pointer if
      split_huge_page runs during PageHuge() and the tail_page->private
      field is updated.
      
      It also calls compound_head twice for hugetlbfs and runs
      compound_head+compound_trans_head for THP, when a single call is
      needed in both cases.
      
      The new code within the PageSlab() check doesn't need to verify that the
      THP page size is never bigger than the smallest hugetlbfs page size, to
      avoid memory corruption.
      
      A longstanding theoretical race condition was found while fixing the
      above (see the change right after the skip_unlock label, which is
      relevant for the compound_lock path too).
      
      By re-establishing the _mapcount tail refcounting for all compound
      pages, this also fixes the problem below:
      
        echo 0 >/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
      
        BUG: Bad page state in process bash  pfn:59a01
        page:ffffea000139b038 count:0 mapcount:10 mapping:          (null) index:0x0
        page flags: 0x1c00000000008000(tail)
        Modules linked in:
        CPU: 6 PID: 2018 Comm: bash Not tainted 3.12.0+ #25
        Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
        Call Trace:
          dump_stack+0x55/0x76
          bad_page+0xd5/0x130
          free_pages_prepare+0x213/0x280
          __free_pages+0x36/0x80
          update_and_free_page+0xc1/0xd0
          free_pool_huge_page+0xc2/0xe0
          set_max_huge_pages.part.58+0x14c/0x220
          nr_hugepages_store_common.isra.60+0xd0/0xf0
          nr_hugepages_store+0x13/0x20
          kobj_attr_store+0xf/0x20
          sysfs_write_file+0x189/0x1e0
          vfs_write+0xc5/0x1f0
          SyS_write+0x55/0xb0
          system_call_fastpath+0x16/0x1b
      Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Tested-by: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: Pravin Shelar <pshelar@nicira.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Ben Hutchings <bhutchings@solarflare.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Guillaume Morin <guillaume@morinfr.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • lib/decompressors: fix "no limit" output buffer length · c18e49ad
      Alexandre Courbot authored
      commit 1431574a upstream.
      
      When decompressing into memory, the output buffer length is set to some
      arbitrarily high value (0x7fffffff) to indicate the output is, virtually,
      unlimited in size.
      
      The problem with this is that some platforms have their physical memory at
      high physical addresses (0x80000000 or more), and that the output buffer
      address and its "unlimited" length cannot be added without overflowing.
      An example of this can be found in inflate_fast():
      
      /* next_out is the output buffer address */
      out = strm->next_out - OFF;
      /* avail_out is the output buffer size. end will overflow if the output
       * address is >= 0x80000104 */
      end = out + (strm->avail_out - 257);
      
      This has a huge impact on the performance of kernel decompression,
      since the following loop condition of inflate_fast() will always be
      false, so the loop exits after its first iteration:
      
      } while (in < last && out < end);
      
      Indeed, "end" has overflowed and is now always lower than "out".  As a
      result, inflate_fast() will return after processing a single byte of
      input data, and will thus need to be called an unreasonably high number of
      times.  This probably went unnoticed because kernel decompression is fast
      enough even with this issue.
      
      Nonetheless, adjusting the output buffer length in such a way that the
      above pointer arithmetic never overflows results in a kernel decompression
      that is about 3 times faster on affected machines.
      Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
      Tested-by: Jon Medhurst <tixy@linaro.org>
      Cc: Stephen Warren <swarren@wwwdotorg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Brown <broonie@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
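A minimal sketch of the overflow described in the lib/decompressors commit, assuming a 32-bit address space modelled with uint32_t (unsigned arithmetic wraps modulo 2^32, like 32-bit pointer sums). loop_can_continue() mirrors the `end = out + (avail_out - 257)` computation and the `out < end` test; safe_no_limit_len() is a hypothetical helper showing one way to pick a "no limit" length that cannot overflow (the distance to the top of the address space). Neither function comes from the kernel, and the helper is not necessarily what the upstream patch does.

```c
#include <assert.h>
#include <stdint.h>

/* Model 32-bit addresses with uint32_t: unsigned arithmetic wraps modulo
 * 2^32, just as the pointer sum does on the affected 32-bit platforms. */
#define NO_LIMIT_LEN 0x7fffffffu /* the old "virtually unlimited" length */

/* Mirrors inflate_fast(): end = out + (avail_out - 257), and the loop
 * keeps running only while out < end. Returns 1 if that can hold. */
int loop_can_continue(uint32_t out, uint32_t avail_out)
{
    uint32_t end = out + (avail_out - 257u); /* may wrap around */
    return out < end;
}

/* Hypothetical helper: a "no limit" length that reaches exactly the top
 * of the 32-bit address space, so out + length never wraps. */
uint32_t safe_no_limit_len(uint32_t out_buf)
{
    /* Leave room for the 257-byte margin inflate_fast() subtracts. */
    assert(out_buf <= UINT32_MAX - 257u);
    return UINT32_MAX - out_buf;
}
```

For example, with the output buffer at 0x80000200, the old length of 0x7fffffff makes `end` wrap to 0xfe, so loop_can_continue() returns 0; with safe_no_limit_len(0x80000200) as the length it returns 1.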
    • drm/nouveau/bios: fix offset calculation for BMPv1 bioses · 366d6b20
      Ilia Mirkin authored
      commit 5d2f4767 upstream.
      
      The only BIOS on record that needs the 14 offset has a BIOS major
      version of 2 but BMP version 1.01. Another group of BIOSes that needs the
      18 offset has BMP version 2.01 or 5.01 or higher. So instead of looking at
      the BIOS major version, look at the BMP version. BIOSes with BMP version 0
      do not contain a detectable script, so always return 0 for them.
      
      See https://bugs.freedesktop.org/show_bug.cgi?id=68835

      Reported-by: Mauro Molinari <mauromol@tiscali.it>
      Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
      Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
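A minimal sketch of the offset selection described in the drm/nouveau commit. It assumes a hypothetical 0xMMmm encoding of the BMP version (so 1.01 is 0x0101) and a hypothetical function name, bmp_script_offset(). The commit only names BMP 1.01 (offset 14), BMP 2.01 and 5.01 or higher (offset 18), and BMP version 0 (no script); treating every other version below 2.00 as 14 and every version from 2.00 up as 18 is an assumption of this sketch, not something the message states.

```c
#include <stdint.h>

/* bmp_version uses a hypothetical 0xMMmm encoding, e.g. 1.01 -> 0x0101.
 * Note that the BIOS major version is deliberately not an input. */
unsigned bmp_script_offset(uint16_t bmp_version)
{
    if (bmp_version < 0x0100)
        return 0;  /* BMP version 0: no detectable script */
    if (bmp_version < 0x0200)
        return 14; /* e.g. BMP 1.01, even on a BIOS with major version 2 */
    return 18;     /* e.g. BMP 2.01, 5.01 and later (assumed for 2.x-5.x too) */
}
```

The design point the commit makes is visible in the signature: the decision is keyed only on the BMP version.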
    • md/raid5: fix long-standing problem with bitmap handling on write failure. · 6ba854e9
      NeilBrown authored
      commit 9f97e4b1 upstream.
      
      Before a write starts we set a bit in the write-intent bitmap.
      When the write completes we clear that bit if the write was successful
      to all devices.  However if the write wasn't fully successful we
      should not clear the bit.  If the faulty drive is subsequently
      re-added, the fact that the bit is still set ensures that we will
      re-write the data that is missing.
      
      This logic is mediated by the STRIPE_DEGRADED flag - we only clear the
      bitmap bit when this flag is not set.
      Currently we correctly set the flag if a write starts when some
      devices are failed or missing.  But we do *not* set the flag if some
      device failed during the write attempt.
      This is wrong and can result in clearing the bit inappropriately.
      
      So: set the flag when a write fails.
      
      This bug has been present since bitmaps were introduced, so the fix is
      suitable for any -stable kernel.
      Reported-by: Ethan Wilson <ethan.wilson@shiftmail.org>
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
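A heavily simplified, hypothetical model of the md/raid5 write-intent bitmap logic described in the commit. struct stripe, start_write(), device_write_done() and finish_write() are illustrative names only and do not correspond to functions in drivers/md/raid5.c; the degraded field stands in for STRIPE_DEGRADED. The line marked as the fix models the behaviour the commit adds: a failed device write marks the stripe degraded, so the bit is kept.

```c
#include <stdbool.h>

/* Hypothetical model of one stripe write; not the kernel's structures. */
struct stripe {
    bool degraded;   /* stands in for STRIPE_DEGRADED */
    bool bitmap_bit; /* the write-intent bitmap bit for this stripe */
};

/* Before the write: set the bit, and mark the stripe degraded if any
 * device is already failed or missing (the pre-existing behaviour). */
void start_write(struct stripe *sh, int missing_devices)
{
    sh->bitmap_bit = true;
    sh->degraded = missing_devices > 0;
}

/* Called once per device when its write completes. */
void device_write_done(struct stripe *sh, bool success)
{
    if (!success)
        sh->degraded = true; /* the fix: a failed write degrades the stripe */
}

/* After all devices complete: clear the bit only if nothing was degraded. */
void finish_write(struct stripe *sh)
{
    if (!sh->degraded)
        sh->bitmap_bit = false;
}
```

Without the marked line, a stripe that starts with all devices present but sees one device write fail would clear its bit in finish_write(), which is the inappropriate clear the commit describes.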
    • kvm: x86: fix apic_base enable check · d934d91a
      Andrew Jones authored
      commit 0dce7cd6 upstream.
      
      Commit e66d2ae7 moved the assignment
      vcpu->arch.apic_base = value above a condition with
      (vcpu->arch.apic_base ^ value), causing that check
      to always fail. Use old_value, vcpu->arch.apic_base's
      old value, in the condition instead.
      Signed-off-by: Andrew Jones <drjones@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
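A minimal sketch of the ordering bug in the kvm commit, reduced to functions over a stored value. APIC_BASE_ENABLE is defined locally here as bit 11, the global-enable bit of the IA32_APIC_BASE MSR; the two functions are hypothetical and only model the XOR condition, not the kernel's actual code.

```c
#include <stdint.h>

/* Defined locally for this sketch: IA32_APIC_BASE global-enable bit. */
#define APIC_BASE_ENABLE (1ULL << 11)

/* Buggy ordering: the stored value is overwritten before the check, so
 * (*apic_base ^ value) is always 0 and an enable change is never seen. */
int enable_toggled_buggy(uint64_t *apic_base, uint64_t value)
{
    *apic_base = value;
    return ((*apic_base ^ value) & APIC_BASE_ENABLE) != 0;
}

/* Fixed ordering: keep the old value and compare the new one against it. */
int enable_toggled_fixed(uint64_t *apic_base, uint64_t value)
{
    uint64_t old_value = *apic_base;
    *apic_base = value;
    return ((old_value ^ value) & APIC_BASE_ENABLE) != 0;
}
```

Starting from a stored value of 0 and writing a value with the enable bit set, the buggy version reports no change, while the fixed version reports the toggle.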
  2. 25 Jan, 2014 24 commits
  3. 15 Jan, 2014 11 commits