1. 26 Jun, 2013 2 commits
    • Jonathan Brassow's avatar
      MD: Remember the last sync operation that was performed · c4a39551
      Jonathan Brassow authored
      MD:  Remember the last sync operation that was performed
      
      This patch adds a field to the mddev structure to track the last
      sync operation that was performed.  This is especially useful when
      it comes to what is recorded in mismatch_cnt in sysfs.  If the
      last operation was "data-check", then it reports the number of
      descrepancies found by the user-initiated check.  If it was a
      "repair" operation, then it is reporting the number of
      descrepancies repaired.  etc.
      Signed-off-by: default avatarJonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      c4a39551
    • NeilBrown's avatar
      md: fix buglet in RAID5 -> RAID0 conversion. · eea136d6
      NeilBrown authored
      RAID5 uses a 'per-array' value for the 'size' of each device.
      RAID0 uses a 'per-device' value - it can be different for each device.
      
      When converting a RAID5 to a RAID0 we must ensure that the per-device
      size of each device matches the per-array size for the RAID5, else
      the array will change size.
      
      If the metadata cannot record a changed per-device size (as is the
      case with v0.90 metadata) the array could get bigger on restart.  This
      does not cause data corruption, so it not a big issue and is mainly
      yet another a reason to not use 0.90.
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      eea136d6
  2. 13 Jun, 2013 22 commits
    • NeilBrown's avatar
      md/raid10: check In_sync flag in 'enough()'. · 725d6e57
      NeilBrown authored
      It isn't really enough to check that the rdev is present, we need to
      also be sure that the device is still In_sync.
      
      Doing this requires using rcu_dereference to access the rdev, and
      holding the rcu_read_lock() to ensure the rdev doesn't disappear while
      we look at it.
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      725d6e57
    • NeilBrown's avatar
      md/raid10: locking changes for 'enough()'. · 635f6416
      NeilBrown authored
      As 'enough' accesses conf->prev and conf->geo, which can change
      spontanously, it should guard against changes.
      This can be done with device_lock as start_reshape holds device_lock
      while updating 'geo' and end_reshape holds it while updating 'prev'.
      
      So 'error' needs to hold 'device_lock'.
      
      On the other hand, raid10_end_read_request knows which of the two it
      really wants to access, and as it is an active request on that one,
      the value cannot change underneath it.
      
      So change _enough to take flag rather than a pointer, pass the
      appropriate flag from raid10_end_read_request(), and remove the locking.
      
      All other calls to 'enough' are made with reconfig_mutex held, so
      neither 'prev' nor 'geo' can change.
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      635f6416
    • Jingoo Han's avatar
      md: replace strict_strto*() with kstrto*() · b29bebd6
      Jingoo Han authored
      The usage of strict_strtoul() is not preferred, because
      strict_strtoul() is obsolete. Thus, kstrtoul() should be
      used.
      Signed-off-by: default avatarJingoo Han <jg1.han@samsung.com>
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      b29bebd6
    • Hannes Reinecke's avatar
      md: Wait for md_check_recovery before attempting device removal. · 90f5f7ad
      Hannes Reinecke authored
      When a device has failed, it needs to be removed from the personality
      module before it can be removed from the array as a whole.
      The first step is performed by md_check_recovery() which is called
      from the raid management thread.
      
      So when a HOT_REMOVE ioctl arrives, wait briefly for md_check_recovery
      to have run.  This increases the chance that the ioctl will succeed.
      Signed-off-by: default avatarHannes Reinecke <hare@suse.de>
      Signed-off-by: default avatarNeil Brown <nfbrown@suse.de>
      90f5f7ad
    • NeilBrown's avatar
      dm-raid: silence compiler warning on rebuilds_per_group. · 3f6bbd3f
      NeilBrown authored
      This doesn't really need to be initialised, but it doesn't hurt,
      silences the compiler, and as it is a counter it makes sense for it to
      start at zero.
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      3f6bbd3f
    • Jonathan Brassow's avatar
      DM RAID: Fix raid_resume not reviving failed devices in all cases · a4dc163a
      Jonathan Brassow authored
      DM RAID:  Fix raid_resume not reviving failed devices in all cases
      
      When a device fails in a RAID array, it is marked as Faulty.  Later,
      md_check_recovery is called which (through the call chain) calls
      'hot_remove_disk' in order to have the personalities remove the device
      from use in the array.
      
      Sometimes, it is possible for the array to be suspended before the
      personalities get their chance to perform 'hot_remove_disk'.  This is
      normally not an issue.  If the array is deactivated, then the failed
      device will be noticed when the array is reinstantiated.  If the
      array is resumed and the disk is still missing, md_check_recovery will
      be called upon resume and 'hot_remove_disk' will be called at that
      time.  However, (for dm-raid) if the device has been restored,
      a resume on the array would cause it to attempt to revive the device
      by calling 'hot_add_disk'.  If 'hot_remove_disk' had not been called,
      a situation is then created where the device is thought to concurrently
      be the replacement and the device to be replaced.  Thus, the device
      is first sync'ed with the rest of the array (because it is the replacement
      device) and then marked Faulty and removed from the array (because
      it is also the device being replaced).
      
      The solution is to check and see if the device had properly been removed
      before the array was suspended.  This is done by seeing whether the
      device's 'raid_disk' field is -1 - a condition that implies that
      'md_check_recovery -> remove_and_add_spares (where raid_disk is set to -1)
      -> hot_remove_disk' has been called.  If 'raid_disk' is not -1, then
      'hot_remove_disk' must be called to complete the removal of the previously
      faulty device before it can be revived via 'hot_add_disk'.
      Signed-off-by: default avatarJonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      a4dc163a
    • Jonathan Brassow's avatar
      DM RAID: Break-up untidy function · f381e71b
      Jonathan Brassow authored
      DM RAID:  Break-up untidy function
      
      Clean-up excessive indentation by moving some code in raid_resume()
      into its own function.
      Signed-off-by: default avatarJonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      f381e71b
    • Jonathan Brassow's avatar
      DM RAID: Add ability to restore transiently failed devices on resume · 9092c02d
      Jonathan Brassow authored
      DM RAID: Add ability to restore transiently failed devices on resume
      
      This patch adds code to the resume function to check over the devices
      in the RAID array.  If any are found to be marked as failed and their
      superblocks can be read, an attempt is made to reintegrate them into
      the array.  This allows the user to refresh the array with a simple
      suspend and resume of the array - rather than having to load a
      completely new table, allocate and initialize all the structures and
      throw away the old instantiation.
      Signed-off-by: default avatarJonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      9092c02d
    • Linus Torvalds's avatar
      Merge tag 'acpi-3.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 25e33ed9
      Linus Torvalds authored
      Pull ACPI fix from Rafael Wysocki:
       "This is an alternative fix for the regression introduced in 3.9 whose
        previous fix had to be reverted right before 3.10-rc5, because it
        broke one of the Tony's machines.
      
        In this one the check is confined to the ACPI video driver (which is
        the only one causing the problem to happen in the first place) and the
        Tony's box shouldn't even notice it.
      
         - ACPI fix for an issue causing ACPI video driver to attempt to bind
           to devices it shouldn't touch from Rafael J Wysocki."
      
      * tag 'acpi-3.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        ACPI / video: Do not bind to device objects with a scan handler
      25e33ed9
    • Linus Torvalds's avatar
      Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · cb03dc09
      Linus Torvalds authored
      Pull x86 fixes from Peter Anvin:
       "Another set of fixes, the biggest bit of this is yet another tweak to
        the UEFI anti-bricking code; apparently we finally got some feedback
        from Samsung as to what makes at least their systems fail.  This set
        should actually fix the boot regressions that some other systems (e.g.
        SGI) have exhibited.
      
        Other than that, there is a patch to avoid a panic with particularly
        unhappy memory layouts and two minor protocol fixes which may or may
        not be manifest bugs"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86: Fix typo in kexec register clearing
        x86, relocs: Move __vvar_page from S_ABS to S_REL
        Modify UEFI anti-bricking code
        x86: Fix adjust_range_size_mask calling position
      cb03dc09
    • Linus Torvalds's avatar
      Merge branch 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu · cb7e9704
      Linus Torvalds authored
      Pull RCU fixes from Paul McKenney:
       "I must confess that this past merge window was not RCU's best showing.
        This series contains three more fixes for RCU regressions:
      
         1.   A fix to __DECLARE_TRACE_RCU() that causes it to act as an
              interrupt from idle rather than as a task switch from idle.
              This change is needed due to the recent use of _rcuidle()
              tracepoints that can be invoked from interrupt handlers as well
              as from idle.  Without this fix, invoking _rcuidle() tracepoints
              from interrupt handlers results in splats and (more seriously)
              confusion on RCU's part as to whether a given CPU is idle or not.
              This confusion can in turn result in too-short grace periods and
              therefore random memory corruption.
      
         2.   A fix to a subtle deadlock that could result due to RCU doing
              a wakeup while holding one of its rcu_node structure's locks.
              Although the probability of occurrence is low, it really
              does happen.  The fix, courtesy of Steven Rostedt, uses
              irq_work_queue() to avoid the deadlock.
      
         3.   A fix to a silent deadlock (invisible to lockdep) due to the
              interaction of timeouts posted by RCU debug code enabled by
              CONFIG_PROVE_RCU_DELAY=y, grace-period initialization, and CPU
              hotplug operations.  This will not occur in production kernels,
              but really does occur in randconfig testing.  Diagnosis courtesy
              of Steven Rostedt"
      
      * 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
        rcu: Fix deadlock with CPU hotplug, RCU GP init, and timer migration
        rcu: Don't call wakeup() with rcu_node structure ->lock held
        trace: Allow idle-safe tracepoints to be called from irq
      cb7e9704
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · dcae7f2d
      Linus Torvalds authored
      Pull s390 fixes from Martin Schwidefsky:
       "Three kvm related memory management fixes, a fix for show_trace, a fix
        for early console output and a patch from Ben to help prevent compile
        errors in regard to irq functions (or our lack thereof)"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
        s390/pci: Implement IRQ functions if !PCI
        s390/sclp: fix new line detection
        s390/pgtable: make pgste lock an explicit barrier
        s390/pgtable: Save pgste during modify_prot_start/commit
        s390/dumpstack: fix address ranges for asynchronous and panic stack
        s390/pgtable: Fix guest overindication for change bit
      dcae7f2d
    • Linus Torvalds's avatar
      Merge tag 'asoc-v3.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound · 509768f7
      Linus Torvalds authored
      Pull ASoC sound updates from Mark Brown:
       "Takashi is travelling at the minute and it'd be good to get the
        MAINTAINERS update in here merged so sending directly.
      
        As well as the usual driver specifics we've got a couple of core fixes
        here, one fixing capabilities for unidirectional streams and the other
        fixing suspend while audio streams are active.
      
        The suspend fix is a little involved but mostly as a result of
        removing some special casing that was doing the wrong thing."
      
      * tag 'asoc-v3.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound:
        ASoC: tlv320aic3x: Remove deadlock from snd_soc_dapm_put_volsw_aic3x()
        ASoC: dapm: Treat DAI widgets like AIF widgets for power
        ASoC: arizona: Correct AEC loopback enable
        ASoC: pcm: Require both CODEC and CPU support when declaring stream caps
        MAINTAINERS: Remove myself from Wolfson maintainers
        ASoC: wm8994: Ensure microphone detection state is reset on removal
        ASoC: wm8994: Avoid leaking pm_runtime reference on removed jack race
        ASoC: cs42l52: fix hp_gain_enum shift value.
        ASoC: cs42l52: use correct PCM mixer TLV dB scale to match datasheet.
      509768f7
    • Linus Torvalds's avatar
      Merge tag 'md-3.10-fixes' of git://neil.brown.name/md · 82ea4be6
      Linus Torvalds authored
      Pull md bugfixes from Neil Brown:
       "A few bugfixes for md
      
        Some tagged for -stable"
      
      * tag 'md-3.10-fixes' of git://neil.brown.name/md:
        md/raid1,5,10: Disable WRITE SAME until a recovery strategy is in place
        md/raid1,raid10: use freeze_array in place of raise_barrier in various places.
        md/raid1: consider WRITE as successful only if at least one non-Faulty and non-rebuilding drive completed it.
        md: md_stop_writes() should always freeze recovery.
      82ea4be6
    • Josh Triplett's avatar
      turbostat: Increase output buffer size to accommodate C8-C10 · b844db31
      Josh Triplett authored
      On platforms with C8-C10 support, the additional C-states cause
      turbostat to overrun its output buffer of 128 bytes per CPU.  Increase
      this to 256 bytes per CPU.
      
      [ As a bugfix, this should go into 3.10; however, since the C8-C10
        support didn't go in until after 3.9, this need not go into any stable
        kernel. ]
      Signed-off-by: default avatarJosh Triplett <josh@joshtriplett.org>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b844db31
    • H. Peter Anvin's avatar
      Merge tag 'efi-urgent' into x86/urgent · 45df901c
      H. Peter Anvin authored
       * More tweaking to the EFI variable anti-bricking algorithm. Quite a
         few users were reporting boot regressions in v3.9. This has now been
         fixed with a more accurate "minimum storage requirement to avoid
         bricking" value from Samsung (5K instead of 50%) and code to trigger
         garbage collection when we near our limit - Matthew Garrett.
      Signed-off-by: default avatarH. Peter Anvin <hpa@linux.intel.com>
      45df901c
    • H. Peter Anvin's avatar
      md/raid1,5,10: Disable WRITE SAME until a recovery strategy is in place · 5026d7a9
      H. Peter Anvin authored
      There are cases where the kernel will believe that the WRITE SAME
      command is supported by a block device which does not, in fact,
      support WRITE SAME.  This currently happens for SATA drivers behind a
      SAS controller, but there are probably a hundred other ways that can
      happen, including drive firmware bugs.
      
      After receiving an error for WRITE SAME the block layer will retry the
      request as a plain write of zeroes, but mdraid will consider the
      failure as fatal and consider the drive failed.  This has the effect
      that all the mirrors containing a specific set of data are each
      offlined in very rapid succession resulting in data loss.
      
      However, just bouncing the request back up to the block layer isn't
      ideal either, because the whole initial request-retry sequence should
      be inside the write bitmap fence, which probably means that md needs
      to do its own conversion of WRITE SAME to write zero.
      
      Until the failure scenario has been sorted out, disable WRITE SAME for
      raid1, raid5, and raid10.
      
      [neilb: added raid5]
      
      This patch is appropriate for any -stable since 3.7 when write_same
      support was added.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarH. Peter Anvin <hpa@linux.intel.com>
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      5026d7a9
    • NeilBrown's avatar
      md/raid1,raid10: use freeze_array in place of raise_barrier in various places. · e2d59925
      NeilBrown authored
      Various places in raid1 and raid10 are calling raise_barrier when they
      really should call freeze_array.
      The former is only intended to be called from "make_request".
      The later has extra checks for 'nr_queued' and makes a call to
      flush_pending_writes(), so it is safe to call it from within the
      management thread.
      
      Using raise_barrier will sometimes deadlock.  Using freeze_array
      should not.
      
      As 'freeze_array' currently expects one request to be pending (in
      handle_read_error - the only previous caller), we need to pass
      it the number of pending requests (extra) to ignore.
      
      The deadlock was made particularly noticeable by commits
      050b6615 (raid10) and 6b740b8d (raid1) which
      appeared in 3.4, so the fix is appropriate for any -stable
      kernel since then.
      
      This patch probably won't apply directly to some early kernels and
      will need to be applied by hand.
      
      Cc: stable@vger.kernel.org
      Reported-by: default avatarAlexander Lyakas <alex.bolshoy@gmail.com>
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      e2d59925
    • Alex Lyakas's avatar
      md/raid1: consider WRITE as successful only if at least one non-Faulty and... · 3056e3ae
      Alex Lyakas authored
      md/raid1: consider WRITE as successful only if at least one non-Faulty and non-rebuilding drive completed it.
      
      Without that fix, the following scenario could happen:
      
      - RAID1 with drives A and B; drive B was freshly-added and is rebuilding
      - Drive A fails
      - WRITE request arrives to the array. It is failed by drive A, so
      r1_bio is marked as R1BIO_WriteError, but the rebuilding drive B
      succeeds in writing it, so the same r1_bio is marked as
      R1BIO_Uptodate.
      - r1_bio arrives to handle_write_finished, badblocks are disabled,
      md_error()->error() does nothing because we don't fail the last drive
      of raid1
      - raid_end_bio_io()  calls call_bio_endio()
      - As a result, in call_bio_endio():
              if (!test_bit(R1BIO_Uptodate, &r1_bio->state))
                      clear_bit(BIO_UPTODATE, &bio->bi_flags);
      this code doesn't clear the BIO_UPTODATE flag, and the whole master
      WRITE succeeds, back to the upper layer.
      
      So we returned success to the upper layer, even though we had written
      the data onto the rebuilding drive only. But when we want to read the
      data back, we would not read from the rebuilding drive, so this data
      is lost.
      
      [neilb - applied identical change to raid10 as well]
      
      This bug can result in lost data, so it is suitable for any
      -stable kernel.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarAlex Lyakas <alex@zadarastorage.com>
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      3056e3ae
    • NeilBrown's avatar
      md: md_stop_writes() should always freeze recovery. · 6b6204ee
      NeilBrown authored
      __md_stop_writes() will currently sometimes freeze recovery.
      So any caller must be ready for that to happen, and indeed they are.
      
      However if __md_stop_writes() doesn't freeze_recovery, then
      a recovery could start before mddev_suspend() is called, which
      could be awkward.  This can particularly cause problems or dm-raid.
      
      So change __md_stop_writes() to always freeze recovery.  This is safe
      and more predicatable.
      Reported-by: default avatarBrassow Jonathan <jbrassow@redhat.com>
      Tested-by: default avatarBrassow Jonathan <jbrassow@redhat.com>
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      6b6204ee
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 26e04462
      Linus Torvalds authored
      Pull networking update from David Miller:
      
       1) Fix dump iterator in nfnl_acct_dump() and ctnl_timeout_dump() to
          dump all objects properly, from Pablo Neira Ayuso.
      
       2) xt_TCPMSS must use the default MSS of 536 when no MSS TCP option is
          present.  Fix from Phil Oester.
      
       3) qdisc_get_rtab() looks for an existing matching rate table and uses
          that instead of creating a new one.  However, it's key matching is
          incomplete, it fails to check to make sure the ->data[] array is
          identical too.  Fix from Eric Dumazet.
      
       4) ip_vs_dest_entry isn't fully initialized before copying back to
          userspace, fix from Dan Carpenter.
      
       5) Fix ubuf reference counting regression in vhost_net, from Jason
          Wang.
      
       6) When sock_diag dumps a socket filter back to userspace, we have to
          translate it out of the kernel's internal representation first.
          From Nicolas Dichtel.
      
       7) davinci_mdio holds a spinlock while calling pm_runtime, which
          sleeps.  Fix from Sebastian Siewior.
      
       8) Timeout check in sh_eth_check_reset is off by one, from Sergei
          Shtylyov.
      
       9) If sctp socket init fails, we can NULL deref during cleanup.  Fix
          from Daniel Borkmann.
      
      10) netlink_mmap() does not propagate errors properly, from Patrick
          McHardy.
      
      11) Disable powersave and use minstrel by default in ath9k.  From Sujith
          Manoharan.
      
      12) Fix a regression in that SOCK_ZEROCOPY is not set on tuntap sockets
          which prevents vhost from being able to use zerocopy.  From Jason
          Wang.
      
      13) Fix race between port lookup and TX path in team driver, from Jiri
          Pirko.
      
      14) Missing length checks in bluetooth L2CAP packet parsing, from Johan
          Hedberg.
      
      15) rtlwifi fails to connect to networking using any encryption method
          other than WPA2.  Fix from Larry Finger.
      
      16) Fix iwlegacy build due to incorrect CONFIG_* ifdeffing for power
          management stuff.  From Yijing Wang.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (35 commits)
        b43: stop format string leaking into error msgs
        ath9k: Use minstrel rate control by default
        Revert "ath9k_hw: Update rx gain initval to improve rx sensitivity"
        ath9k: Disable PowerSave by default
        net: wireless: iwlegacy: fix build error for il_pm_ops
        rtlwifi: Fix a false leak indication for PCI devices
        wl12xx/wl18xx: scan all 5ghz channels
        wl12xx: increase minimum singlerole firmware version required
        wl12xx: fix minimum required firmware version for wl127x multirole
        rtlwifi: rtl8192cu: Fix problem in connecting to WEP or WPA(1) networks
        mwifiex: debugfs: Fix out of bounds array access
        Bluetooth: Fix mgmt handling of power on failures
        Bluetooth: Fix missing length checks for L2CAP signalling PDUs
        Bluetooth: btmrvl: support Marvell Bluetooth device SD8897
        Bluetooth: Fix checks for LE support on LE-only controllers
        team: fix checks in team_get_first_port_txable_rcu()
        team: move add to port list before port enablement
        team: check return value of team_get_port_by_index_rcu() for NULL
        tuntap: set SOCK_ZEROCOPY flag during open
        netlink: fix error propagation in netlink_mmap()
        ...
      26e04462
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid · 645a9929
      Linus Torvalds authored
      Pull input layer bugfix from Jiri Kosina:
       "Memory leak regression fix from Benjamin Tissoires"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
        HID: multitouch: prevent memleak with the allocated name
      645a9929
  3. 12 Jun, 2013 16 commits
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.dk/linux-block · b2cc9c19
      Linus Torvalds authored
      Pull block layer fixes from Jens Axboe:
       "Outside of bcache (which really isn't super big), these are all
        few-liners.  There are a few important fixes in here:
      
         - Fix blk pm sleeping when holding the queue lock
      
         - A small collection of bcache fixes that have been done and tested
           since bcache was included in this merge window.
      
         - A fix for a raid5 regression introduced with the bio changes.
      
         - Two important fixes for mtip32xx, fixing an oops and potential data
           corruption (or hang) due to wrong bio iteration on stacked devices."
      
      * 'for-linus' of git://git.kernel.dk/linux-block:
        scatterlist: sg_set_buf() argument must be in linear mapping
        raid5: Initialize bi_vcnt
        pktcdvd: silence static checker warning
        block: remove refs to XD disks from documentation
        blkpm: avoid sleep when holding queue lock
        mtip32xx: Correctly handle bio->bi_idx != 0 conditions
        mtip32xx: Fix NULL pointer dereference during module unload
        bcache: Fix error handling in init code
        bcache: clarify free/available/unused space
        bcache: drop "select CLOSURES"
        bcache: Fix incompatible pointer type warning
      b2cc9c19
    • Linus Torvalds's avatar
      Merge branch 'akpm' (updates from Andrew Morton) · a568fa1c
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "Bunch of fixes and one little addition to math64.h"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (27 commits)
        include/linux/math64.h: add div64_ul()
        mm: memcontrol: fix lockless reclaim hierarchy iterator
        frontswap: fix incorrect zeroing and allocation size for frontswap_map
        kernel/audit_tree.c:audit_add_tree_rule(): protect `rule' from kill_rules()
        mm: migration: add migrate_entry_wait_huge()
        ocfs2: add missing lockres put in dlm_mig_lockres_handler
        mm/page_alloc.c: fix watermark check in __zone_watermark_ok()
        drivers/misc/sgi-gru/grufile.c: fix info leak in gru_get_config_info()
        aio: fix io_destroy() regression by using call_rcu()
        rtc-at91rm9200: use shadow IMR on at91sam9x5
        rtc-at91rm9200: add shadow interrupt mask
        rtc-at91rm9200: refactor interrupt-register handling
        rtc-at91rm9200: add configuration support
        rtc-at91rm9200: add match-table compile guard
        fs/ocfs2/namei.c: remove unecessary ERROR when removing non-empty directory
        swap: avoid read_swap_cache_async() race to deadlock while waiting on discard I/O completion
        drivers/rtc/rtc-twl.c: fix missing device_init_wakeup() when booted with device tree
        cciss: fix broken mutex usage in ioctl
        audit: wait_for_auditd() should use TASK_UNINTERRUPTIBLE
        drivers/rtc/rtc-cmos.c: fix accidentally enabling rtc channel
        ...
      a568fa1c
    • Alex Shi's avatar
      include/linux/math64.h: add div64_ul() · c2853c8d
      Alex Shi authored
      There is div64_long() to handle the s64/long division, but no mocro do
      u64/ul division.  It is necessary in some scenarios, so add this
      function.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: default avatarAlex Shi <alex.shi@intel.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      c2853c8d
    • Johannes Weiner's avatar
      mm: memcontrol: fix lockless reclaim hierarchy iterator · 89dc991f
      Johannes Weiner authored
      The lockless reclaim hierarchy iterator currently has a misplaced
      barrier that can lead to use-after-free crashes.
      
      The reclaim hierarchy iterator consist of a sequence count and a
      position pointer that are read and written locklessly, with memory
      barriers enforcing ordering.
      
      The write side sets the position pointer first, then updates the
      sequence count to "publish" the new position.  Likewise, the read side
      must read the sequence count first, then the position.  If the sequence
      count is up to date, it's guaranteed that the position is up to date as
      well:
      
        writer:                         reader:
        iter->position = position       if iter->sequence == expected:
        smp_wmb()                           smp_rmb()
        iter->sequence = sequence           position = iter->position
      
      However, the read side barrier is currently misplaced, which can lead to
      dereferencing stale position pointers that no longer point to valid
      memory.  Fix this.
      Signed-off-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Reported-by: default avatarTejun Heo <tj@kernel.org>
      Reviewed-by: default avatarTejun Heo <tj@kernel.org>
      Acked-by: default avatarMichal Hocko <mhocko@suse.cz>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Glauber Costa <glommer@parallels.com>
      Cc: <stable@kernel.org>		[3.10+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      89dc991f
    • Akinobu Mita's avatar
      frontswap: fix incorrect zeroing and allocation size for frontswap_map · 7b57976d
      Akinobu Mita authored
      The bitmap accessed by bitops must have enough size to hold the required
      numbers of bits rounded up to a multiple of BITS_PER_LONG.  And the
      bitmap must not be zeroed by memset() if the number of bits cleared is
      not a multiple of BITS_PER_LONG.
      
      This fixes incorrect zeroing and allocation size for frontswap_map.  The
      incorrect zeroing part doesn't cause any problem because frontswap_map
      is freed just after zeroing.  But the wrongly calculated allocation size
      may cause the problem.
      
      For 32bit systems, the allocation size of frontswap_map is about twice
      as large as required size.  For 64bit systems, the allocation size is
      smaller than requeired if the number of bits is not a multiple of
      BITS_PER_LONG.
      Signed-off-by: default avatarAkinobu Mita <akinobu.mita@gmail.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7b57976d
    • Chen Gang's avatar
      kernel/audit_tree.c:audit_add_tree_rule(): protect `rule' from kill_rules() · 736f3203
      Chen Gang authored
      audit_add_tree_rule() must set 'rule->tree = NULL;' firstly, to protect
      the rule itself freed in kill_rules().
      
      The reason is when it is killed, the 'rule' itself may have already
      released, we should not access it.  one example: we add a rule to an
      inode, just at the same time the other task is deleting this inode.
      
      The work flow for adding a rule:
      
          audit_receive() -> (need audit_cmd_mutex lock)
            audit_receive_skb() ->
              audit_receive_msg() ->
                audit_receive_filter() ->
                  audit_add_rule() ->
                    audit_add_tree_rule() -> (need audit_filter_mutex lock)
                      ...
                      unlock audit_filter_mutex
                      get_tree()
                      ...
                      iterate_mounts() -> (iterate all related inodes)
                        tag_mount() ->
                          tag_trunk() ->
                            create_trunk() -> (assume it is 1st rule)
                              fsnotify_add_mark() ->
                                fsnotify_add_inode_mark() ->  (add mark to inode->i_fsnotify_marks)
                              ...
                              get_tree(); (each inode will get one)
                      ...
                      lock audit_filter_mutex
      
      The work flow for deleting an inode:
      
          __destroy_inode() ->
           fsnotify_inode_delete() ->
             __fsnotify_inode_delete() ->
              fsnotify_clear_marks_by_inode() ->  (get mark from inode->i_fsnotify_marks)
                fsnotify_destroy_mark() ->
                 fsnotify_destroy_mark_locked() ->
                   audit_tree_freeing_mark() ->
                     evict_chunk() ->
                       ...
                       tree->goner = 1
                       ...
                       kill_rules() ->   (assume current->audit_context == NULL)
                         call_rcu() ->   (rule->tree != NULL)
                           audit_free_rule_rcu() ->
                             audit_free_rule()
                       ...
                       audit_schedule_prune() ->  (assume current->audit_context == NULL)
                         kthread_run() ->    (need audit_cmd_mutex and audit_filter_mutex lock)
                           prune_one() ->    (delete it from prue_list)
                             put_tree(); (match the original get_tree above)
      Signed-off-by: default avatarChen Gang <gang.chen@asianux.com>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      736f3203
    • Naoya Horiguchi's avatar
      mm: migration: add migrate_entry_wait_huge() · 30dad309
      Naoya Horiguchi authored
      When we have a page fault for the address which is backed by a hugepage
      under migration, the kernel can't wait correctly and do busy looping on
      hugepage fault until the migration finishes.  As a result, users who try
      to kick hugepage migration (via soft offlining, for example) occasionally
      experience long delay or soft lockup.
      
      This is because pte_offset_map_lock() can't get a correct migration entry
      or a correct page table lock for hugepage.  This patch introduces
      migration_entry_wait_huge() to solve this.
      Signed-off-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Reviewed-by: default avatarRik van Riel <riel@redhat.com>
      Reviewed-by: default avatarWanpeng Li <liwanp@linux.vnet.ibm.com>
      Reviewed-by: default avatarMichal Hocko <mhocko@suse.cz>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: <stable@vger.kernel.org>	[2.6.35+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      30dad309
    • Xue jiufei's avatar
      ocfs2: add missing lockres put in dlm_mig_lockres_handler · 27749f2f
      Xue jiufei authored
      dlm_mig_lockres_handler() is missing a dlm_lockres_put() on an error path.
      Signed-off-by: default avatarjoyce <xuejiufei@huawei.com>
      Reviewed-by: default avatarshencanquan <shencanquan@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      27749f2f
    • Tomasz Stanislawski's avatar
      mm/page_alloc.c: fix watermark check in __zone_watermark_ok() · 026b0814
      Tomasz Stanislawski authored
      The watermark check consists of two sub-checks.  The first one is:
      
      	if (free_pages <= min + lowmem_reserve)
      		return false;
      
      The check assures that there is minimal amount of RAM in the zone.  If
      CMA is used then the free_pages is reduced by the number of free pages
      in CMA prior to the over-mentioned check.
      
      	if (!(alloc_flags & ALLOC_CMA))
      		free_pages -= zone_page_state(z, NR_FREE_CMA_PAGES);
      
      This prevents the zone from being drained from pages available for
      non-movable allocations.
      
      The second check prevents the zone from getting too fragmented.
      
      	for (o = 0; o < order; o++) {
      		free_pages -= z->free_area[o].nr_free << o;
      		min >>= 1;
      		if (free_pages <= min)
      			return false;
      	}
      
      The field z->free_area[o].nr_free is equal to the number of free pages
      including free CMA pages.  Therefore the CMA pages are subtracted twice.
      This may cause a false positive fail of __zone_watermark_ok() if the CMA
      area gets strongly fragmented.  In such a case there are many 0-order
      free pages located in CMA.  Those pages are subtracted twice therefore
      they will quickly drain free_pages during the check against
      fragmentation.  The test fails even though there are many free non-cma
      pages in the zone.
      
      This patch fixes this issue by subtracting CMA pages only for a purpose of
      (free_pages <= min + lowmem_reserve) check.
      
      Laura said:
      
        We were observing allocation failures of higher order pages (order 5 =
        128K typically) under tight memory conditions resulting in driver
        failure.  The output from the page allocation failure showed plenty of
        free pages of the appropriate order/type/zone and mostly CMA pages in
        the lower orders.
      
        For full disclosure, we still observed some page allocation failures
        even after applying the patch but the number was drastically reduced and
        those failures were attributed to fragmentation/other system issues.
      Signed-off-by: default avatarTomasz Stanislawski <t.stanislaws@samsung.com>
      Signed-off-by: default avatarKyungmin Park <kyungmin.park@samsung.com>
      Tested-by: default avatarLaura Abbott <lauraa@codeaurora.org>
      Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Acked-by: default avatarMinchan Kim <minchan@kernel.org>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Tested-by: default avatarMarek Szyprowski <m.szyprowski@samsung.com>
      Cc: <stable@vger.kernel.org>	[3.7+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      026b0814
    • Dan Carpenter's avatar
      drivers/misc/sgi-gru/grufile.c: fix info leak in gru_get_config_info() · 282c4c0e
      Dan Carpenter authored
      The "info.fill" array isn't initialized so it can leak uninitialized stack
      information to user space.
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Acked-by: default avatarRobin Holt <holt@sgi.com>
      Acked-by: default avatarDimitri Sivanich <sivanich@sgi.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      282c4c0e
    • Kent Overstreet's avatar
      aio: fix io_destroy() regression by using call_rcu() · 4fcc712f
      Kent Overstreet authored
      There was a regression introduced by 36f55889 ("aio: refcounting
      cleanup"), reported by Jens Axboe - the refcounting cleanup switched to
      using RCU in the shutdown path, but the synchronize_rcu() was done in
      the context of the io_destroy() syscall greatly increasing the time it
      could block.
      
      This patch switches it to call_rcu() and makes shutdown asynchronous
      (more asynchronous than it was originally; before the refcount changes
      io_destroy() would still wait on pending kiocbs).
      
      Note that there's a global quota on the max outstanding kiocbs, and that
      quota must be manipulated synchronously; otherwise io_setup() could
      return -EAGAIN when there isn't quota available, and userspace won't
      have any way of waiting until shutdown of the old kioctxs has finished
      (besides busy looping).
      
      So we release our quota before kioctx shutdown has finished, which
      should be fine since the quota never corresponded to anything real
      anyways.
      Signed-off-by: default avatarKent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Reported-by: default avatarJens Axboe <axboe@kernel.dk>
      Tested-by: default avatarJens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarBenjamin LaHaise <bcrl@kvack.org>
      Tested-by: default avatarBenjamin LaHaise <bcrl@kvack.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      4fcc712f
    • Johan Hovold's avatar
      rtc-at91rm9200: use shadow IMR on at91sam9x5 · bba00e59
      Johan Hovold authored
      Add support for the at91sam9x5-family which must use the shadow
      interrupt mask due to a hardware issue (causing RTC_IMR to always be
      zero).
      Signed-off-by: default avatarJohan Hovold <jhovold@gmail.com>
      Acked-by: default avatarNicolas Ferre <nicolas.ferre@atmel.com>
      Cc: Douglas Gilbert <dgilbert@interlog.com>
      Cc: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
      Cc: Ludovic Desroches <ludovic.desroches@atmel.com>
      Cc: Robert Nelson <Robert.Nelson@digikey.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      bba00e59
    • Johan Hovold's avatar
      rtc-at91rm9200: add shadow interrupt mask · e9f08bbe
      Johan Hovold authored
      Add shadow interrupt-mask register which can be used on SoCs where the
      actual hardware register is broken.
      
      Note that some care needs to be taken to make sure the shadow mask
      corresponds to the actual hardware state.  The added overhead is not an
      issue for the non-broken SoCs due to the relatively infrequent
      interrupt-mask updates.  We do, however, only use the shadow mask value
      as a fall-back when it actually needed as there is still a theoretical
      possibility that the mask is incorrect (see the code for details).
      Signed-off-by: default avatarJohan Hovold <jhovold@gmail.com>
      Acked-by: default avatarNicolas Ferre <nicolas.ferre@atmel.com>
      Cc: Douglas Gilbert <dgilbert@interlog.com>
      Cc: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
      Cc: Ludovic Desroches <ludovic.desroches@atmel.com>
      Cc: Robert Nelson <Robert.Nelson@digikey.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      e9f08bbe
    • Johan Hovold's avatar
      rtc-at91rm9200: refactor interrupt-register handling · e304fcd0
      Johan Hovold authored
      Add accessors for the interrupt register.
      
      This will allow us to easily add a shadow interrupt-mask register to use
      on SoCs where the interrupt-mask register cannot be used.
      Signed-off-by: default avatarJohan Hovold <jhovold@gmail.com>
      Acked-by: default avatarNicolas Ferre <nicolas.ferre@atmel.com>
      Cc: Douglas Gilbert <dgilbert@interlog.com>
      Cc: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
      Cc: Ludovic Desroches <ludovic.desroches@atmel.com>
      Cc: Robert Nelson <Robert.Nelson@digikey.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      e304fcd0
    • Johan Hovold's avatar
      rtc-at91rm9200: add configuration support · de645475
      Johan Hovold authored
      Add configuration support which can be used to implement SoC-specific
      workarounds for broken hardware.
      Signed-off-by: default avatarJohan Hovold <jhovold@gmail.com>
      Acked-by: default avatarNicolas Ferre <nicolas.ferre@atmel.com>
      Cc: Douglas Gilbert <dgilbert@interlog.com>
      Cc: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
      Cc: Ludovic Desroches <ludovic.desroches@atmel.com>
      Cc: Robert Nelson <Robert.Nelson@digikey.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      de645475
    • Johan Hovold's avatar
      rtc-at91rm9200: add match-table compile guard · 558c61e5
      Johan Hovold authored
      The members of Atmel's at91sam9x5 family (9x5) have a broken RTC
      interrupt mask register (AT91_RTC_IMR).  It does not reflect enabled
      interrupts but instead always returns zero.
      
      The kernel's rtc-at91rm9200 driver handles the RTC for the 9x5 family.
      Currently when the date/time is set, an interrupt is generated and this
      driver neglects to handle the interrupt.  The kernel complains about the
      un-handled interrupt and disables it henceforth.  This not only breaks
      the RTC function, but since that interrupt is shared (Atmel's SYS
      interrupt) then other things break as well (e.g.  the debug port no
      longer accepts characters).
      
      Tested on the at91sam9g25.  Bug confirmed by Atmel.
      
      This patch (of 5):
      
      Add missing match-table compile guard.
      Signed-off-by: default avatarJohan Hovold <jhovold@gmail.com>
      Acked-by: default avatarNicolas Ferre <nicolas.ferre@atmel.com>
      Cc: Douglas Gilbert <dgilbert@interlog.com>
      Cc: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
      Cc: Ludovic Desroches <ludovic.desroches@atmel.com>
      Cc: Robert Nelson <Robert.Nelson@digikey.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      558c61e5