1. 12 Aug, 2010 14 commits
    • mmc: add erase, secure erase, trim and secure trim operations · dfe86cba
      Adrian Hunter authored
      SD/MMC cards tend to support an erase operation.  In addition, eMMC v4.4
      cards can support secure erase, trim and secure trim operations that are
      all variants of the basic erase command.
      
      SD/MMC device attributes "erase_size" and "preferred_erase_size" have been
      added.
      
      "erase_size" is the minimum size, in bytes, of an erase operation.  For
      MMC, "erase_size" is the erase group size reported by the card.  Note that
      "erase_size" does not apply to trim or secure trim operations where the
      minimum size is always one 512 byte sector.  For SD, "erase_size" is 512
      if the card is block-addressed, 0 otherwise.
      
      SD/MMC cards can erase an arbitrarily large area up to and
      including the whole card.  When erasing a large area it may
      be desirable to do it in smaller chunks for three reasons:
      
          1. A single erase command will make all other I/O on the card
             wait.  This is not a problem if the whole card is being erased, but
             erasing one partition will make I/O for another partition on the
             same card wait for the duration of the erase, which could be
             several minutes.
      
          2. To be able to inform the user of erase progress.
      
          3. The erase timeout becomes too large to be very useful.
             Because the erase timeout contains a margin which is multiplied by
             the size of the erase area, the value can end up being several
             minutes for large areas.
      
      "erase_size" is not the most efficient unit to erase (especially for SD
      where it is just one sector), hence "preferred_erase_size" provides a good
      chunk size for erasing large areas.
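
The chunking idea above can be sketched in userspace C. This is only an illustration, not code from the patch: erase_in_chunks(), erase_fn and count_bytes() are hypothetical names, the chunk argument stands for a value read from "preferred_erase_size", and alignment of the range to "erase_size" is ignored for brevity.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical callback: erases [start, start + len) bytes, returns 0 on success. */
typedef int (*erase_fn)(uint64_t start, uint64_t len, void *ctx);

/*
 * Erase [start, start + len) in pieces of at most `chunk` bytes, where
 * `chunk` would be the value read from "preferred_erase_size".
 * Returns the number of erase commands issued, or -1 on the first failure.
 */
static int erase_in_chunks(uint64_t start, uint64_t len, uint64_t chunk,
                           erase_fn do_erase, void *ctx)
{
    int issued = 0;

    assert(chunk > 0);
    while (len > 0) {
        uint64_t n = len < chunk ? len : chunk;

        if (do_erase(start, n, ctx) != 0)
            return -1;
        start += n;
        len -= n;
        issued++;   /* a caller could report progress here */
    }
    return issued;
}

/* Example callback that only totals the bytes it was asked to erase. */
static int count_bytes(uint64_t start, uint64_t len, void *ctx)
{
    (void)start;
    *(uint64_t *)ctx += len;
    return 0;
}
```

Between two chunks other I/O gets a chance to run, and each command needs only a timeout sized for one chunk.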
      
      For MMC, "preferred_erase_size" is the high-capacity erase size if a card
      specifies one, otherwise it is based on the capacity of the card.
      
      For SD, "preferred_erase_size" is the allocation unit size specified by
      the card.
      
      "preferred_erase_size" is in bytes.
      Signed-off-by: Adrian Hunter <adrian.hunter@nokia.com>
      Acked-by: Jens Axboe <axboe@kernel.dk>
      Cc: Kyungmin Park <kmpark@infradead.org>
      Cc: Madhusudhan Chikkature <madhu.cr@ti.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Ben Gardiner <bengardiner@nanometrics.ca>
      Cc: <linux-mmc@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: fix writeback_in_progress() · 81d73a32
      Jan Kara authored
      Commit 83ba7b07 ("writeback: simplify the write back thread queue")
      broke writeback_in_progress() as in that commit we started to remove work
      items from the list at the moment we start working on them and not at the
      moment they are finished.  Thus if the flusher thread was doing some work
      but there was no other work queued, writeback_in_progress() returned
      false.  This could in particular cause unnecessary queueing of background
      writeback from balance_dirty_pages() or writeout work from
      writeback_inodes_sb_if_idle().
      
      This patch fixes the problem by introducing a bit in the bdi state which
      indicates that the flusher thread is processing some work, and by using
      this bit for the writeback_in_progress() test.
      
      NOTE: Both callsites of writeback_in_progress() (namely,
      writeback_inodes_sb_if_idle() and balance_dirty_pages()) actually need
      different information from what writeback_in_progress() provides.
      They would need to know whether *the kind of writeback they are going to
      submit* is already queued.  But this information isn't that simple to
      provide so let's fix writeback_in_progress() for the time being.
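
A minimal userspace sketch of the state-bit idea, assuming simplified stand-ins for the real structures: struct sketch_bdi, SKETCH_BDI_WRITEBACK_RUNNING and the sketch_* helpers are hypothetical names, not the kernel's. It shows why testing a "running" bit still reports progress after the work item has already been removed from the list.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the "flusher is busy" bit in the bdi state. */
enum { SKETCH_BDI_WRITEBACK_RUNNING = 1 << 0 };

struct sketch_bdi {
    unsigned int state;     /* bit flags */
    int queued_work;        /* items still waiting on the work list */
};

/* Test the state bit rather than whether the work list is non-empty. */
static bool sketch_writeback_in_progress(const struct sketch_bdi *bdi)
{
    return (bdi->state & SKETCH_BDI_WRITEBACK_RUNNING) != 0;
}

/* The flusher dequeues an item when it *starts* working on it. */
static void sketch_flusher_begin(struct sketch_bdi *bdi)
{
    assert(bdi->queued_work > 0);
    bdi->queued_work--;
    bdi->state |= SKETCH_BDI_WRITEBACK_RUNNING;
}

/* The bit is cleared only once the work is finished. */
static void sketch_flusher_end(struct sketch_bdi *bdi)
{
    bdi->state &= ~(unsigned int)SKETCH_BDI_WRITEBACK_RUNNING;
}
```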
      Signed-off-by: Jan Kara <jack@suse.cz>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Acked-by: Jens Axboe <jaxboe@fusionio.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • writeback: merge for_kupdate and !for_kupdate cases · a50aeb40
      Wu Fengguang authored
      Unify the logic for kupdate and non-kupdate cases.  There won't be
      starvation because the inodes requeued into b_more_io will later be
      spliced _after_ the remaining inodes in b_io, hence won't stand in the way
      of other inodes in the next run.
      
      It avoids unnecessary redirty_tail() calls, and hence unnecessary updates
      of i_dirtied_when.  The timestamp update is undesirable because it could
      later delay the inode's periodic writeback, or may exclude the inode from
      the data integrity sync operation (which checks timestamp to avoid extra
      work and livelock).
      
      ===
      How the redirty_tail() came about:
      
      It is a long story.  This redirty_tail() was introduced with
      wbc.more_io.  The initial patch for more_io actually did not have the
      redirty_tail(), and when it was merged, several 100% iowait bug reports
      arose:
      
      reiserfs:
              http://lkml.org/lkml/2007/10/23/93
      
      jfs:
              commit 29a424f2
              JFS: clear PAGECACHE_TAG_DIRTY for no-write pages
      
      ext2:
              http://www.spinics.net/linux/lists/linux-ext4/msg04762.html
      
      They are all old bugs hidden in various filesystems that became "visible"
      with the more_io patch.  At the time, the ext2 bug was thought to be
      "trivial", so it was not fixed.  Instead, the following updated more_io
      patch with redirty_tail() was merged:
      
      	http://www.spinics.net/linux/lists/linux-ext4/msg04507.html
      
      This will in general prevent 100% iowait on ext2 and possibly other unknown FS bugs.
      Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Martin Bligh <mbligh@google.com>
      Cc: Michael Rubin <mrubin@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • writeback: fix queue_io() ordering · 4ea879b9
      Wu Fengguang authored
      This was not a bug, since b_io is empty for kupdate writeback.  The next
      patch will do requeue_io() for non-kupdate writeback, so let's fix it.
      Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Martin Bligh <mbligh@google.com>
      Cc: Michael Rubin <mrubin@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • writeback: don't redirty tail an inode with dirty pages · 23539afc
      Wu Fengguang authored
      Avoid delaying writeback for an expired inode with lots of dirty pages but
      no active dirtier at the moment.  Previously we only did that for the
      kupdate case.
      
      Any filesystem that does delayed allocation or unwritten extent conversion
      after IO completion will cause this - for example, XFS.
      Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
      Acked-by: Jan Kara <jack@suse.cz>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • writeback: add comment to the dirty limit functions · 1babe183
      Wu Fengguang authored
      Document global_dirty_limits() and bdi_dirty_limit().
      Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • writeback: avoid unnecessary calculation of bdi dirty thresholds · 16c4042f
      Wu Fengguang authored
      Split get_dirty_limits() into global_dirty_limits()+bdi_dirty_limit(), so
      that the latter can be avoided when under global dirty background
      threshold (which is the normal state for most systems).
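
The benefit of the split can be sketched as follows. This is a simplified userspace illustration, not the kernel code: the sketch_* names are hypothetical, the per-bdi limit is modelled as a plain percentage of the global threshold, and the fast-path condition follows the wording of this commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Counts how often the (comparatively expensive) per-bdi limit is computed. */
static int sketch_bdi_limit_calls;

/* Hypothetical per-device share: simply a percentage of the global threshold. */
static unsigned long sketch_bdi_dirty_limit(unsigned long dirty_thresh,
                                            unsigned int share_pct)
{
    sketch_bdi_limit_calls++;
    return dirty_thresh * share_pct / 100;
}

/* Returns true when the caller should be throttled. */
static bool sketch_should_throttle(unsigned long nr_dirty,
                                   unsigned long bdi_dirty,
                                   unsigned long background_thresh,
                                   unsigned long dirty_thresh,
                                   unsigned int share_pct)
{
    assert(background_thresh <= dirty_thresh);

    /* Fast path: under the global background threshold, skip the bdi math. */
    if (nr_dirty <= background_thresh)
        return false;

    return bdi_dirty > sketch_bdi_dirty_limit(dirty_thresh, share_pct);
}
```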
      Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • writeback: balance_dirty_pages(): reduce calls to global_page_state · e50e3720
      Wu Fengguang authored
      Reducing the number of times balance_dirty_pages calls global_page_state
      reduces the cache references and so improves write performance on a
      variety of workloads.
      
      'perf stat' of simple fio write tests shows the reduction in cache
      access.  The test is fio 'write,mmap,600Mb,pre_read' on an AMD AthlonX2
      with 3 GB of memory (dirty_threshold approx. 600 MB), running each test 10
      times, dropping the fastest & slowest values, then taking the average &
      standard deviation:
      
      		average (s.d.) in millions (10^6)
      2.6.31-rc8	648.6 (14.6)
      +patch		620.1 (16.5)
      
      This reduction is achieved by dropping clip_bdi_dirty_limit(), as it rereads
      the counters to apply the dirty_threshold, and moving this check up into
      balance_dirty_pages(), where the counters have already been read.
      
      Also, rearranging the for loop to contain only one copy of the limit tests
      allows the pdflush test after the loop to use the local copies of the
      counters rather than rereading them.
      
      In the common case with no throttling it now calls global_page_state 5
      fewer times and bdi_stat 2 fewer.
      
      Fengguang:
      
      This patch slightly changes behavior by replacing clip_bdi_dirty_limit()
      with the explicit check (nr_reclaimable + nr_writeback >= dirty_thresh) to
      avoid exceeding the dirty limit.  Since the bdi dirty limit is mostly
      accurate, we don't need to clip routinely.  A simple dirty limit check
      is enough.
      
      The check is necessary because, in principle, we should throttle everything
      calling balance_dirty_pages() when we're over the total limit, as Peter
      said.
      
      We now set and clear dirty_exceeded not only based on bdi dirty limits,
      but also on the global dirty limit.  The global limit check is added in
      place of clip_bdi_dirty_limit() for safety and not intended as a behavior
      change.  The bdi limits should be tight enough to keep all dirty pages
      under the global limit most of the time; occasionally exceeding it by a
      small amount should be OK though.  The change makes the logic more obvious:
      the global limit is the ultimate goal and shall always be imposed.
      
      We may now start background writeback work based on outdated conditions.
      That's safe because the bdi flush thread will (and has to) double check
      the state.  It reduces overall overhead because a test based on old
      state still has a good chance of being right.
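
The combined condition described above might look roughly like this. It is a sketch under assumptions: only the global test (nr_reclaimable + nr_writeback >= dirty_thresh) is quoted in this message; the per-bdi comparison and the name sketch_dirty_exceeded() are illustrative guesses at the shape, not the patch itself.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * dirty_exceeded is set when either the per-bdi limit or the global limit
 * is reached.  All counters are values the caller has already read, so no
 * counter needs to be reread here.
 */
static bool sketch_dirty_exceeded(unsigned long bdi_nr_reclaimable,
                                  unsigned long bdi_nr_writeback,
                                  unsigned long bdi_thresh,
                                  unsigned long nr_reclaimable,
                                  unsigned long nr_writeback,
                                  unsigned long dirty_thresh)
{
    assert(bdi_thresh <= dirty_thresh);

    return (bdi_nr_reclaimable + bdi_nr_writeback >= bdi_thresh) ||
           (nr_reclaimable + nr_writeback >= dirty_thresh);
}
```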
      
      [akpm@linux-foundation.org] fix uninitialized dirty_exceeded
      Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
      Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Jan Kara <jack@suse.cz>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • parisc: fix wrong page aligned size calculation in ioremapping code · a292dfa0
      Florian Zumbiehl authored
      parisc __ioremap(): fix off-by-one error in page alignment of allocation
      size for sizes where size%PAGE_SIZE==1.
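
One way such an off-by-one can arise is sketched below. This is an assumption about the shape of the bug, not a copy of the parisc code: the sketch_* names and the 4096-byte page size are hypothetical, and phys_addr is taken to be page aligned. Rounding up the address of the last byte, rather than the address one past it, loses a page exactly when size%PAGE_SIZE==1.

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_PAGE_ALIGN(x) \
    (((x) + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1))

/* Buggy variant: page-aligns the address of the last byte. */
static unsigned long sketch_map_size_buggy(unsigned long phys_addr,
                                           unsigned long size)
{
    unsigned long last_addr;

    assert(size > 0);
    last_addr = phys_addr + size - 1;
    return SKETCH_PAGE_ALIGN(last_addr) - phys_addr;
}

/* Fixed variant: page-aligns the end address (one past the last byte). */
static unsigned long sketch_map_size_fixed(unsigned long phys_addr,
                                           unsigned long size)
{
    unsigned long last_addr;

    assert(size > 0);
    last_addr = phys_addr + size - 1;
    return SKETCH_PAGE_ALIGN(last_addr + 1) - phys_addr;
}
```

For a size of 4097 bytes the last byte sits at a page-aligned offset, so the buggy variant yields one page where two are needed; for sizes with any other remainder the two variants agree.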
      Signed-off-by: Florian Zumbiehl <florz@florz.de>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Acked-by: Helge Deller <deller@gmx.de>
      Tested-by: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • score: fix dereference of NULL pointer in local_flush_tlb_page() · 17e46503
      Roel Kluin authored
      Don't dereference vma if it's NULL.
      Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
      Cc: Chen Liqin <liqin.chen@sunplusct.com>
      Cc: Lennox Wu <lennox.wu@gmail.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • pc8736x_gpio: depends on X86_32 · 7b958090
      Randy Dunlap authored
      Fix kconfig dependency warning for PC8736x_GPIO by restricting it to
      X86_32.
      
        warning: (SCx200_GPIO && SCx200 || PC8736x_GPIO && X86) selects NSC_GPIO which has unmet direct dependencies (X86_32)
      
      NSC_GPIO is X86_32 only.  The other driver (SCx200_GPIO) that selects
      NSC_GPIO is X86_32 only (indirectly, since SCx200 depends on X86_32), so
      limit this driver also.
      Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Jordan Crouse <jordan.crouse@amd.com>
      Cc: Jim Cromie <jim.cromie@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: fix fatal kernel-doc error · 3c111a07
      Randy Dunlap authored
      Fix a fatal kernel-doc error due to a #define coming between a function's
      kernel-doc notation and the function signature.  (kernel-doc cannot handle
      this.)
      Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • acpi: fix bogus preemption logic · 0a7992c9
      Thomas Gleixner authored
      The ACPI_PREEMPTION_POINT() logic was introduced in commit 8bd108d1
      (ACPICA: add preemption point after each opcode parse).  The follow up
      commits abe1dfab, 138d1569, c084ca70 tried to fix the preemption logic
      back and forth, but nobody noticed that the usage of
      in_atomic_preempt_off() in that context is wrong.
      
      The check which guards the call of cond_resched() is:
      
          if (!in_atomic_preempt_off() && !irqs_disabled())
      
      in_atomic_preempt_off() is not intended for general use as the comment
      above the macro definition clearly says:
      
       * Check whether we were atomic before we did preempt_disable():
       * (used by the scheduler, *after* releasing the kernel lock)
      
      On a CONFIG_PREEMPT=n kernel the usage of in_atomic_preempt_off() works by
      accident, but with CONFIG_PREEMPT=y it's just broken.
      
      The whole purpose of the ACPI_PREEMPTION_POINT() is to reduce the latency
      on a CONFIG_PREEMPT=n kernel, so make ACPI_PREEMPTION_POINT() depend on
      CONFIG_PREEMPT=n and remove the in_atomic_preempt_off() check.
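
The resulting shape of the macro can be sketched in userspace C. This is an illustration under assumptions, not the patch itself: irqs_disabled() and cond_resched() are replaced by local stubs that only record what happened, and sketch_parse_opcodes() is a hypothetical caller.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-ins for the kernel primitives (stubs for illustration). */
static bool sketch_irqs_off;
static int sketch_resched_calls;

static bool irqs_disabled(void) { return sketch_irqs_off; }
static void cond_resched(void) { sketch_resched_calls++; }

#ifndef CONFIG_PREEMPT
/* Non-preemptible kernel: offer a voluntary scheduling point. */
#define ACPI_PREEMPTION_POINT() \
    do { \
        if (!irqs_disabled()) \
            cond_resched(); \
    } while (0)
#else
/* Preemptible kernel: the scheduler can preempt anyway, so do nothing. */
#define ACPI_PREEMPTION_POINT() do { } while (0)
#endif

/* Hypothetical caller: one preemption point after each parsed opcode. */
static void sketch_parse_opcodes(int count)
{
    assert(count >= 0);
    for (int i = 0; i < count; i++) {
        /* ... parse one opcode ... */
        ACPI_PREEMPTION_POINT();
    }
}
```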
      
      Addresses https://bugzilla.kernel.org/show_bug.cgi?id=16210
      
      [akpm@linux-foundation.org: fix build]
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Francois Valenduc <francois.valenduc@tvcablenet.be>
      Cc: Lin Ming <ming.m.lin@intel.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kernel/kfifo.c: add handling of chained scatterlists · d78a3eda
      Stefani Seibold authored
      The current kfifo scatterlist implementation will not work with chained
      scatterlists.  It assumes that struct scatterlist arrays are allocated
      contiguously, which is not the case when chained scatterlists (struct
      sg_table) are in use.
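
Why pointer arithmetic fails on a chained list can be shown with a simplified model. Everything here is a hypothetical stand-in: struct sketch_sg is not the kernel's struct scatterlist, and sketch_sg_next() only mirrors the idea behind the kernel's sg_next(), which follows chain links instead of assuming that the next entry is adjacent in memory.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical stand-in for struct scatterlist. */
struct sketch_sg {
    unsigned int length;        /* bytes in this entry (0 for a link slot) */
    struct sketch_sg *chain;    /* non-NULL: this slot links to the next array */
    int is_last;                /* marks the final data entry */
};

/* Step to the next data entry, following a chain link if one is found. */
static struct sketch_sg *sketch_sg_next(struct sketch_sg *sg)
{
    assert(sg != NULL);
    if (sg->is_last)
        return NULL;
    sg++;                       /* adjacent slot within the current array */
    if (sg->chain)
        sg = sg->chain;         /* link slot: jump to the next array */
    return sg;
}

/* Walk the whole list with the iterator rather than with sg++. */
static unsigned int sketch_total_length(struct sketch_sg *sg)
{
    unsigned int total = 0;

    for (; sg; sg = sketch_sg_next(sg))
        total += sg->length;
    return total;
}
```

A walker that used plain sg++ would treat the link slot as data and then run off the end of the first array.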
      Signed-off-by: Stefani Seibold <stefani@seibold.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 11 Aug, 2010 26 commits