1. 07 Aug, 2014 40 commits
    • lib: bitmap: make nbits parameter of bitmap_complement unsigned · 3d6684f4
      Rasmus Villemoes authored
      The compiler can generate slightly smaller and simpler code when it
      knows that "nbits" is non-negative.  Since no-one passes a negative
      bit-count, this shouldn't affect the semantics.
      Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3d6684f4
    • lib: bitmap: make nbits parameter of bitmap_equal unsigned · 5e068069
      Rasmus Villemoes authored
      The compiler can generate slightly smaller and simpler code when it
      knows that "nbits" is non-negative.  Since no-one passes a negative
      bit-count, this shouldn't affect the semantics.
      Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5e068069
    • lib: bitmap: make nbits parameter of bitmap_full unsigned · 8397927c
      Rasmus Villemoes authored
      The compiler can generate slightly smaller and simpler code when it
      knows that "nbits" is non-negative.  Since no-one passes a negative
      bit-count, this shouldn't affect the semantics.
      Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8397927c
    • lib: bitmap: make nbits parameter of bitmap_empty unsigned · 0679cc48
      Rasmus Villemoes authored
      Many functions in lib/bitmap.c start with an expression such as lim =
      bits/BITS_PER_LONG.  Since bits has type (signed) int, and since gcc
      cannot know that it is in fact non-negative, it generates worse code
      than it could.  These patches, mostly consisting of changing various
      parameters to unsigned, give a slight overall code reduction:
      
        add/remove: 1/1 grow/shrink: 8/16 up/down: 251/-414 (-163)
        function                                     old     new   delta
        tick_device_uses_broadcast                   335     425     +90
        __irq_alloc_descs                            498     554     +56
        __bitmap_andnot                               73     115     +42
        __bitmap_and                                  70     101     +31
        bitmap_weight                                  -      11     +11
        copy_hugetlb_page_range                      752     762     +10
        follow_hugetlb_page                          846     854      +8
        hugetlb_init                                1415    1417      +2
        hugetlb_nrpages_setup                        130     131      +1
        hugetlb_add_hstate                           377     376      -1
        bitmap_allocate_region                        82      80      -2
        select_task_rq_fair                         2202    2191     -11
        hweight_long                                  66      55     -11
        __reg_op                                     230     219     -11
        dm_stats_message                            2849    2833     -16
        bitmap_parselist                              92      74     -18
        __bitmap_weight                              115      97     -18
        __bitmap_subset                              153     129     -24
        __bitmap_full                                128     104     -24
        __bitmap_empty                               120      96     -24
        bitmap_set                                   179     149     -30
        bitmap_clear                                 185     155     -30
        __bitmap_equal                               136     105     -31
        __bitmap_intersects                          148     108     -40
        __bitmap_complement                          109      67     -42
        tick_device_setup_broadcast_func.isra         81       -     -81
      
      [The increases in __bitmap_and{,not} are due to bug fixes 17/18,18/18.
      No idea why bitmap_weight suddenly appears.] While 163 bytes treewide is
      insignificant, I believe the bitmap functions are often called with
      locks held, so saving even a few cycles might be worth it.
      
      While making these changes, I found a few other things that might be
      worth including.  16,17,18 are actual bug fixes.  The rest shouldn't
      change the behaviour of any of the functions, provided no-one passed
      negative nbits values.  If something should come up, it should be fairly
      bisectable.
      
      A few issues I thought about, but didn't know what to do with:
      
      * Many of the functions misbehave if nbits is compile-time 0; the
        out-of-line functions generally handle 0 correctly.  bitmap_fill() is
        particularly bad, whether the 0 is known at compile time or not.  It
        would probably be nice to add detection of at least compile-time 0 and
        handle that appropriately.
      
      * I didn't change __bitmap_shift_{left,right} to use unsigned because I
        want to fully understand why the algorithm works before making that
        change.  However, AFAICT, they behave correctly for all (positive) shift
        amounts.  This is not the case for the small_const_nbits versions.  If
        for example nbits = n = BITS_PER_LONG, the shift operators turn into
        no-ops (at least on x86), so one gets *dst = *src, whereas one would
        expect to get *dst=0.  That difference in behaviour is somewhat
        annoying.
      
      This patch (of 18):
      
      The compiler can generate slightly smaller and simpler code when it
      knows that "nbits" is non-negative.  Since no-one passes a negative
      bit-count, this shouldn't affect the semantics.
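
      A minimal userspace sketch of the effect (illustration only, not kernel
      code): with a signed bit count the compiler must emit extra instructions
      so that the division rounds toward zero for negative values, while an
      unsigned count compiles down to a plain shift.  BITS_PER_LONG is assumed
      to be 64 here.

        #define BITS_PER_LONG 64        /* assumption: 64-bit machine */

        /* signed nbits: needs a sign fix-up, since -1/64 must be 0, not -1 */
        unsigned long words_signed(int nbits)
        {
                return nbits / BITS_PER_LONG;
        }

        /* unsigned nbits: compiles to a single logical shift right */
        unsigned long words_unsigned(unsigned int nbits)
        {
                return nbits / BITS_PER_LONG;
        }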
      Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0679cc48
    • lib/list_sort.c: convert to pr_foo · d0da23b0
      Andrew Morton authored
      Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Don Mullis <don.mullis@gmail.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d0da23b0
    • lib: list_sort.c: Limit number of unused cmp callbacks · 61b3d6c4
      Rasmus Villemoes authored
      The helper merge_and_restore_back_links() makes sure to call the
      caller's cmp function during the final ->prev pointer fixup, so that the
      cmp function may call cond_resched().  However, if the cmp function does
      not call cond_resched() at all, this is entirely redundant.  If it does,
      doing at least two function calls for every two pointer assignments is a
      bit excessive.  This patch limits the calls to once for every 256
      iterations.
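
      A sketch of the idea, assuming the shape of the kernel's
      merge_and_restore_back_links() ->prev fixup loop; the u8 counter wraps
      every 256 iterations, so a cmp() that calls cond_resched() still gets
      invoked periodically without paying two calls per node:

        u8 count = 0;

        do {
                /* let the client's cmp() run cond_resched(), but only
                 * once every 256 nodes instead of on every iteration */
                if (unlikely(!++count))
                        (*cmp)(priv, tail->next, tail->next);
                tail->next->prev = tail;
                tail = tail->next;
        } while (tail->next);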
      Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Don Mullis <don.mullis@gmail.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      61b3d6c4
    • lib: list_sort_test(): simplify and harden cleanup · 69412303
      Rasmus Villemoes authored
      There is no reason to maintain the list structure while freeing the
      debug elements.  Aside from the redundant pointer manipulations, it is
      also inefficient from a locality-of-reference viewpoint, since they are
      visited in a random order (with respect to the order they were allocated).
      Furthermore, it is actually dangerous if we jumped to the exit: label
      after detecting list corruption.
      
      So just free the elements in the order they were allocated, using the
      backing array elts.  Allocate that using kcalloc(), so that if
      allocation of one of the debug elements fails, we just end up calling
      kfree(NULL) for the trailing elements.
      
      Minor details: Use sizeof(*elts) instead of sizeof(void *), and return
      err immediately when allocation of elts fails, to avoid introducing
      another label just before the final return statement.
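
      A sketch of the resulting setup and cleanup, assuming the test's elts
      array and TEST_LIST_LEN; kcalloc() zero-fills, so if allocating one debug
      element fails the trailing slots stay NULL and kfree(NULL) is a no-op:

        elts = kcalloc(TEST_LIST_LEN, sizeof(*elts), GFP_KERNEL);
        if (!elts)
                return err;

        /* ... build the list, sort it, verify it ... */

        for (i = 0; i < TEST_LIST_LEN; i++)
                kfree(elts[i]);         /* NULL for slots never allocated */
        kfree(elts);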
      Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Don Mullis <don.mullis@gmail.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      69412303
    • lib: list_sort_test(): add extra corruption check · 9d418dcc
      Rasmus Villemoes authored
      Add a check to make sure that the prev pointer of the list head points
      to the last element on the list.
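
      A sketch of the added check, assuming a list head named head and a
      cursor cur left pointing at the last element after the walk:

        if (head.prev != cur) {
                pr_err("error: list is corrupted\n");
                goto exit;
        }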
      Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Don Mullis <don.mullis@gmail.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9d418dcc
    • lib: list_sort_test(): return -ENOMEM when allocation fails · 27d555d1
      Rasmus Villemoes authored
      Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Don Mullis <don.mullis@gmail.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      27d555d1
    • kernel.h: remove deprecated pack_hex_byte · 087face5
      Joe Perches authored
      It's been nearly 3 years now since commit 55036ba7 ("lib: rename
      pack_hex_byte() to hex_byte_pack()") so it's time to remove this
      deprecated and unused static inline.
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      087face5
    • lib/test-kstrtox.c: use ARRAY_SIZE instead of sizeof/sizeof[0] · 129965a9
      Fabian Frederick authored
      Use kernel.h definition.
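
      The change is mechanical; a sketch with a hypothetical test table
      (test_cases and check() are illustrative names):

        /* before: for (i = 0; i < sizeof(test_cases)/sizeof(test_cases[0]); i++) */
        for (i = 0; i < ARRAY_SIZE(test_cases); i++)
                check(&test_cases[i]);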
      Signed-off-by: Fabian Frederick <fabf@skynet.be>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      129965a9
    • lib/string_helpers.c: constify static arrays · 142cda5d
      Mathias Krause authored
      Complement commit 68aecfb9 ("lib/string_helpers.c: make arrays
      static") by making the arrays const -- not only pointing to const
      strings.  This moves them out of the data section to the r/o data
      section:
      
         text    data     bss     dec     hex filename
         1150     176       0    1326     52e lib/string_helpers.old.o
         1326       0       0    1326     52e lib/string_helpers.new.o
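
      The difference, sketched on one of the unit-name tables (the exact
      contents in lib/string_helpers.c may differ): making the array itself
      const moves the pointer array into .rodata; previously only the strings
      were const.

        /* before: the array of pointers itself lived in .data */
        static const char *units_10[] = { "B", "kB", "MB", "GB", "TB" };

        /* after: both the pointers and the strings are read-only */
        static const char *const units_10[] = { "B", "kB", "MB", "GB", "TB" };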
      Signed-off-by: Mathias Krause <minipli@googlemail.com>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      142cda5d
    • lib/cmdline.c: add size unit t/p/e to memparse · e004f3c7
      Gui Hecheng authored
      For modern filesystems such as btrfs, t/p/e size level operations are
      common.  Add size unit t/p/e parsing to memparse.
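
      A sketch of the extended suffix handling in memparse(), with the new
      t/p/e cases added above the existing g/m/k ones as deliberate
      fall-throughs, each level multiplying by another 1024:

        switch (*endptr) {
        case 'E':
        case 'e':
                ret <<= 10;
        case 'P':
        case 'p':
                ret <<= 10;
        case 'T':
        case 't':
                ret <<= 10;
        case 'G':
        case 'g':
                ret <<= 10;
        case 'M':
        case 'm':
                ret <<= 10;
        case 'K':
        case 'k':
                ret <<= 10;
                endptr++;
        default:
                break;
        }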
      Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Reviewed-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e004f3c7
    • libata: Use glob_match from lib/glob.c · 428ac5fc
      George Spelvin authored
      The function may be useful for other drivers, so export it.  (Suggested
      by Tejun Heo.)
      
      Note that I inverted the return value of glob_match; returning true on
      match seemed to make more sense.
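
      Callers now read naturally; a hedged example of the kind of use made in
      libata's blacklist matching (mark_blacklisted() is a hypothetical helper,
      not the real call site):

        #include <linux/glob.h>

        /* true on match after this change (the old libata helper returned 0) */
        if (glob_match("WDC WD??00BEVS-*", model_num))
                mark_blacklisted(dev);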
      Signed-off-by: George Spelvin <linux@horizon.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      428ac5fc
    • lib/glob.c: add CONFIG_GLOB_SELFTEST · 5f9be824
      George Spelvin authored
      This was useful during development, and is retained for future
      regression testing.
      
      GCC appears to have no way to place string literals in a particular
      section; adding __initconst to a char pointer leaves the string itself
      in the default string section, where it will not be thrown away after
      module load.
      
      Thus all string constants are kept in explicitly declared and named
      arrays.  Sorry this makes printk a bit harder to read.  At least the
      tests are more compact.
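
      The pattern used, sketched (names illustrative): declaring a test string
      as a char array lets __initconst place the characters themselves in the
      init section, whereas annotating a pointer would only move the pointer
      and leave the literal behind:

        /* the string data itself is discarded after init */
        static char const glob_pat[] __initconst = "a*[bc]";

        /* only the pointer is placed; "a*[bc]" stays in the string section */
        static char const *glob_pat_ptr __initconst = "a*[bc]";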
      Signed-off-by: George Spelvin <linux@horizon.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5f9be824
    • lib: add lib/glob.c · b0125085
      George Spelvin authored
      This is a helper function from drivers/ata/libata_core.c, where it is
      used to blacklist particular device models.  It's being moved to lib/ so
      other drivers may use it for the same purpose.
      
      This implementation is non-recursive, so it is safe for the kernel stack.
      
      [akpm@linux-foundation.org: fix sparse warning]
      Signed-off-by: George Spelvin <linux@horizon.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b0125085
    • zlib: clean up some dead code · 62e7ca52
      Sergey Senozhatsky authored
      Clean up unused `#if 0'-ed functions, which have been dead since 2006
      (commits 87c2ce3b ("lib/zlib*: cleanups") by Adrian Bunk and
      4f3865fb ("zlib_inflate: Upgrade library code to a recent version")
      by Richard Purdie):
      
       - zlib_deflateSetDictionary
       - zlib_deflateParams
       - zlib_deflateCopy
       - zlib_inflateSync
       - zlib_syncsearch
       - zlib_inflateSetDictionary
       - zlib_inflatePrime
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      62e7ca52
    • klist: use same naming scheme as hlist for klist_add_after() · 0f9859ca
      Ken Helias authored
      The name was modified from hlist_add_after() to hlist_add_behind() when
      adjusting the order of arguments to match the one with
      klist_add_after().  This is necessary to break old code that would
      otherwise use it the wrong way.
      
      Make klist follow this naming scheme for consistency.
      Signed-off-by: Ken Helias <kenhelias@firemail.de>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0f9859ca
    • list: fix order of arguments for hlist_add_after(_rcu) · 1d023284
      Ken Helias authored
      All other add functions for lists have the new item as first argument
      and the position where it is added as second argument.  This was changed
      for no good reason in this function and makes using it unnecessarily
      confusing.
      
      The name was changed to hlist_add_behind() to cause unconverted code to
      generate a compile error instead of using the wrong parameter order.
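
      A sketch of the before/after prototypes (see include/linux/list.h for
      the real definitions): the item being inserted now comes first, like the
      other list add helpers.

        /* before: existing node first, new node second */
        void hlist_add_after(struct hlist_node *n, struct hlist_node *next);

        /* after: new node first, the node it goes behind second */
        void hlist_add_behind(struct hlist_node *n, struct hlist_node *prev);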
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Ken Helias <kenhelias@firemail.de>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	[intel driver bits]
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1d023284
    • list: make hlist_add_after() argument names match hlist_add_after_rcu() · bc18dd33
      Ken Helias authored
      The argument names for hlist_add_after() are poorly chosen because they
      look the same as the ones for hlist_add_before() but have to be used
      differently.
      
      hlist_add_after_rcu() has made a better choice.
      Signed-off-by: Ken Helias <kenhelias@firemail.de>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      bc18dd33
    • kernel/printk/printk.c: fix bool assignments · d25d9fec
      Neil Zhang authored
      Fix coccinelle warnings.
      Signed-off-by: Neil Zhang <zhangwm@marvell.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d25d9fec
    • printk: enable interrupts before calling console_trylock_for_printk() · 5874af20
      Jan Kara authored
      We need interrupts disabled when calling console_trylock_for_printk()
      only so that the cpu id we pass to can_use_console() remains valid (for
      other things console_sem provides all the exclusion we need and
      deadlocks on console_sem due to interrupts are impossible because we use
      down_trylock()).  However if we are rescheduled, we are guaranteed to
      run on an online cpu so we can easily just get the cpu id in
      can_use_console().
      
      We can lose a bit of performance when we enable interrupts in
      vprintk_emit() and then disable them again in console_unlock() but OTOH
      it can somewhat reduce interrupt latency caused by console_unlock().
      
      We differ from (reverted) commit 939f04be in that we avoid calling
      console_unlock() from vprintk_emit() with lockdep enabled as that has
      unveiled quite some bugs leading to system freezes during boot (e.g.
        https://lkml.org/lkml/2014/5/30/242,
        https://lkml.org/lkml/2014/6/28/521).
      Signed-off-by: Jan Kara <jack@suse.cz>
      Tested-by: Andreas Bombe <aeb@debian.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5874af20
    • printk: miscellaneous cleanups · 249771b8
      Alex Elder authored
      Some small cleanups to kernel/printk/printk.c.  None of them should
      cause any change in behavior.
      
        - When CONFIG_PRINTK is defined, parenthesize the value of LOG_LINE_MAX.
        - When CONFIG_PRINTK is *not* defined, there is an extra LOG_LINE_MAX
          definition; delete it.
        - Pull an assignment out of a conditional expression in console_setup().
        - Use isdigit() in console_setup() rather than open coding it.
        - In update_console_cmdline(), drop a NUL-termination assignment;
          the strlcpy() call that precedes it guarantees it's not needed.
        - Simplify some logic in printk_timed_ratelimit().
      Signed-off-by: Alex Elder <elder@linaro.org>
      Reviewed-by: Petr Mladek <pmladek@suse.cz>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Jan Kara <jack@suse.cz>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      249771b8
    • printk: use a clever macro · e99aa461
      Alex Elder authored
      Use the IS_ENABLED() macro rather than #ifdef blocks to set certain
      global values.
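
      An illustration of the pattern (CONFIG_FOO stands in for whichever
      option the real code tests):

        /* instead of:
         *   #ifdef CONFIG_FOO
         *   static bool foo_enabled = true;
         *   #else
         *   static bool foo_enabled = false;
         *   #endif
         */
        static bool foo_enabled = IS_ENABLED(CONFIG_FOO);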
      Signed-off-by: Alex Elder <elder@linaro.org>
      Acked-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Petr Mladek <pmladek@suse.cz>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e99aa461
    • printk: fix some comments · 0b90fec3
      Alex Elder authored
      Fix a few comments that don't accurately describe their corresponding
      code, and fix some minor typographical errors.
      Signed-off-by: Alex Elder <elder@linaro.org>
      Reviewed-by: Petr Mladek <pmladek@suse.cz>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Jan Kara <jack@suse.cz>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0b90fec3
    • printk: rename DEFAULT_MESSAGE_LOGLEVEL · 42a9dc0b
      Alex Elder authored
      Commit a8fe19eb ("kernel/printk: use symbolic defines for console
      loglevels") makes consistent use of symbolic values for printk() log
      levels.
      
      The naming scheme used is different from the one used for
      DEFAULT_MESSAGE_LOGLEVEL though.  Change that symbol name to be
      MESSAGE_LOGLEVEL_DEFAULT for consistency.  And because the value of that
      symbol comes from a similarly-named config option, rename
      CONFIG_DEFAULT_MESSAGE_LOGLEVEL as well.
      Signed-off-by: Alex Elder <elder@linaro.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Jan Kara <jack@suse.cz>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Petr Mladek <pmladek@suse.cz>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      42a9dc0b
    • printk: tweak do_syslog() to match comments · e97e1267
      Alex Elder authored
      In do_syslog() there's a path used by kmsg_poll() and kmsg_read() that
      only needs to know whether there's any data available to read (and not
      its size).  These callers only check for non-zero return.  As a
      shortcut, do_syslog() returns the difference between what has been
      logged and what has been "seen."
      
      The comments say that the "count of records" should be returned but it's
      not.  Instead it returns (log_next_idx - syslog_idx), which is a
      difference between buffer offsets--and the result could be negative.
      
      The behavior is the same (it'll be zero or not in the same cases), but
      the count of records is more meaningful and it matches what the comments
      say.  So change the code to return that.
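
      A sketch of the change for that path, using the variable names from
      kernel/printk/printk.c (lost-message handling omitted):

        /* before: error = log_next_idx - syslog_idx;   (byte offsets)  */
        error = log_next_seq - syslog_seq;              /* record count */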
      Signed-off-by: Alex Elder <elder@linaro.org>
      Cc: Petr Mladek <pmladek@suse.cz>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Joe Perches <joe@perches.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e97e1267
    • printk: allow increasing the ring buffer depending on the number of CPUs · 23b2899f
      Luis R. Rodriguez authored
      The default size of the ring buffer is too small for machines with a
      large number of CPUs under heavy load.  What ends up happening when
      debugging is that the ring buffer wraps around and chews up old messages, making
      debugging impossible unless the size is passed as a kernel parameter.
      An idle system upon boot up will on average spew out only about one or
      two extra lines but where this really matters is on heavy load and that
      will vary widely depending on the system and environment.
      
      There are mechanisms to help increase the kernel ring buffer for tracing
      through debugfs, and those interfaces even allow growing the kernel ring
      buffer per CPU.  We also have a static value which can be passed upon
      boot.  Relying on debugfs however is not ideal for production, and
      relying on the value passed upon bootup can only be used *after* an
      issue has crept up.  Instead of being reactive this adds a proactive
      measure which lets you scale the amount of contributions you'd expect to
      the kernel ring buffer under load by each CPU in the worst case
      scenario.
      
      We use num_possible_cpus() to avoid complexities which could be
      introduced by dynamically changing the ring buffer size at run time;
      num_possible_cpus() lets us use the upper limit on the possible number of
      CPUs, therefore avoiding having to deal with hotplugging CPUs on and off.
      This introduces the kernel configuration option LOG_CPU_MAX_BUF_SHIFT
      which is used to specify the maximum amount of contributions to the
      kernel ring buffer in the worst case before the kernel ring buffer flips
      over, the size is specified as a power of 2.  The total amount of
      contributions made by each CPU must be greater than half of the default
      kernel ring buffer size (1 << LOG_BUF_SHIFT bytes) in order to trigger
      an increase upon bootup.  The kernel ring buffer is increased to the
      next power of two that would fit the required minimum kernel ring buffer
      size plus the additional CPU contribution.  For example if LOG_BUF_SHIFT
      is 18 (256 KB) you'd require at least 128 KB contributions by other CPUs
      in order to trigger an increase of the kernel ring buffer.  With a
      LOG_CPU_MAX_BUF_SHIFT of 12 (4 KB) you'd require anything over 64
      possible CPUs to trigger an increase.  If you had 128 possible CPUs the
      amount of minimum required kernel ring buffer bumps to:
      
         ((1 << 18) + ((128 - 1) * (1 << 12))) / 1024 = 764 KB
      
      Since we require the ring buffer to be a power of two the new required
      size would be 1024 KB.
      
      These CPU contributions are ignored when the "log_buf_len" kernel
      parameter is used as it forces the exact size of the ring buffer to an
      expected power of two value.
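
      The worked example above, written out with the values from the text
      (LOG_BUF_SHIFT = 18, LOG_CPU_MAX_BUF_SHIFT = 12, 128 possible CPUs):

        unsigned long base  = 1UL << 18;                /* 256 KB default buffer    */
        unsigned long extra = (128 - 1) * (1UL << 12);  /* 508 KB CPU contributions */
        unsigned long need  = base + extra;             /* 764 KB                   */
        /* rounded up to a power of two => a 1024 KB ring buffer */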
      
      [pmladek@suse.cz: fix build]
      Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
      Signed-off-by: Petr Mladek <pmladek@suse.cz>
      Tested-by: Davidlohr Bueso <davidlohr@hp.com>
      Tested-by: Petr Mladek <pmladek@suse.cz>
      Reviewed-by: Davidlohr Bueso <davidlohr@hp.com>
      Cc: Andrew Lunn <andrew@lunn.ch>
      Cc: Stephen Warren <swarren@wwwdotorg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Petr Mladek <pmladek@suse.cz>
      Cc: Joe Perches <joe@perches.com>
      Cc: Arun KS <arunks.linux@gmail.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Davidlohr Bueso <davidlohr@hp.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      23b2899f
    • printk: make dynamic units clear for the kernel ring buffer · f5405172
      Luis R. Rodriguez authored
      Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
      Suggested-by: Davidlohr Bueso <davidlohr@hp.com>
      Cc: Andrew Lunn <andrew@lunn.ch>
      Cc: Stephen Warren <swarren@wwwdotorg.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Petr Mladek <pmladek@suse.cz>
      Cc: Joe Perches <joe@perches.com>
      Cc: Arun KS <arunks.linux@gmail.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Davidlohr Bueso <davidlohr@hp.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f5405172
    • printk: move power of 2 practice of ring buffer size to a helper · c0a318a3
      Luis R. Rodriguez authored
      In practice the power-of-2 sizing of the kernel ring buffer is purely
      historical, not a requirement, especially now that we have LOG_ALIGN and
      use it for both static and dynamic allocations.  It could have helped
      with implicit alignment back in the day: even the dynamically sized ring
      buffer was guaranteed to be aligned, as long as CONFIG_LOG_BUF_SHIFT
      produced an architecture-aligned __LOG_BUF_LEN, because log_buf_len=n was
      honoured only if it was > __LOG_BUF_LEN and was always rounded up to the
      next power of 2 with roundup_pow_of_two(), so any such size would also be
      architecture aligned.  These assumptions of course relied heavily on
      CONFIG_LOG_BUF_SHIFT producing an aligned value, but users can always
      change this.
      
      We now have precise alignment requirements set for the log buffer size
      for both static and dynamic allocations, but let's keep the old practice
      of using powers of 2 for its size, which gives easily predictable,
      scalable values and helps the allocators for dynamic allocations.  We'll
      reuse this later, so move it into a helper.
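
      A sketch of the helper being factored out (the name and details in the
      actual patch may differ): keep any user-supplied length a power of two
      and remember it for the later allocation.

        static void __init log_buf_len_update(unsigned int size)
        {
                if (size)
                        size = roundup_pow_of_two(size);
                if (size > log_buf_len)
                        new_log_buf_len = size;
        }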
      Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Andrew Lunn <andrew@lunn.ch>
      Cc: Stephen Warren <swarren@wwwdotorg.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Petr Mladek <pmladek@suse.cz>
      Cc: Joe Perches <joe@perches.com>
      Cc: Arun KS <arunks.linux@gmail.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Davidlohr Bueso <davidlohr@hp.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c0a318a3
    • printk: make dynamic kernel ring buffer alignment explicit · 70300177
      Luis R. Rodriguez authored
      We have to consider alignment for the ring buffer both for the default
      static size, and then also for when a dynamic allocation is made when
      the log_buf_len=n kernel parameter is passed to set the size
      specifically to a size larger than the default size set by the
      architecture through CONFIG_LOG_BUF_SHIFT.
      
      The default static kernel ring buffer can be aligned properly if
      architectures set CONFIG_LOG_BUF_SHIFT properly; we provide ranges for
      the size though, so even if CONFIG_LOG_BUF_SHIFT has a sensible aligned
      value it can be reduced to a non-aligned value.  Commit 6ebb017d
      ("printk: Fix alignment of buf causing crash on ARM EABI") by Andrew
      Lunn ensures the static buffer is always aligned and the decision of
      alignment is done by the compiler by using __alignof__(struct log).
      
      When log_buf_len=n is used we allocate the ring buffer dynamically.
      Dynamic allocation varies, for the early allocation called before
      setup_arch() memblock_virt_alloc() requests a page alignment and for the
      default kernel allocation memblock_virt_alloc_nopanic() requests no
      special alignment, which in turn ends up aligning the allocation to
      SMP_CACHE_BYTES, which is L1 cache aligned.
      
      Since we already have the required alignment (LOG_ALIGN) for the kernel
      ring buffer, we can do better and request that alignment explicitly.  Do
      that, to be safe and to make the dynamic allocation alignment explicit.
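
      A sketch of the resulting allocation calls, passing LOG_ALIGN explicitly
      rather than relying on the allocator's default SMP_CACHE_BYTES alignment:

        if (early)
                new_log_buf = memblock_virt_alloc(new_log_buf_len, LOG_ALIGN);
        else
                new_log_buf = memblock_virt_alloc_nopanic(new_log_buf_len,
                                                          LOG_ALIGN);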
      Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
      Tested-by: Petr Mladek <pmladek@suse.cz>
      Acked-by: Petr Mladek <pmladek@suse.cz>
      Cc: Andrew Lunn <andrew@lunn.ch>
      Cc: Stephen Warren <swarren@wwwdotorg.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Petr Mladek <pmladek@suse.cz>
      Cc: Joe Perches <joe@perches.com>
      Cc: Arun KS <arunks.linux@gmail.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Davidlohr Bueso <davidlohr@hp.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      70300177
    • fs.h, drivers/hwmon/asus_atk0110.c: fix DEFINE_SIMPLE_ATTRIBUTE semicolon definition and use · 68be3029
      Joe Perches authored
      The DEFINE_SIMPLE_ATTRIBUTE macro should not end in a semicolon.  Fix the one use
      in the kernel tree that did not have a semicolon.
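
      With the stray semicolon dropped from the macro definition, every use now
      supplies its own terminator, matching normal C statement style (foo_get
      and foo_set are hypothetical helpers):

        DEFINE_SIMPLE_ATTRIBUTE(foo_fops, foo_get, foo_set, "%llu\n");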
      Signed-off-by: Joe Perches <joe@perches.com>
      Acked-by: Guenter Roeck <linux@roeck-us.net>
      Acked-by: Luca Tettamanti <kronos.it@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      68be3029
    • ./Makefile: tell gcc optimizer to never introduce new data races · 69102311
      Jiri Kosina authored
      We have been chasing a memory corruption bug, which turned out to be
      caused by a very old gcc (4.3.4), which happily turned a conditional load
      into an unconditional one, and that broke correctness (the condition
      was met only if the lock was held) and corrupted memory.
      
      This particular problem with that particular code did not happen when
      newer gccs were used.  I've brought this up with our gcc folks, as I
      wanted to make sure that this can't really happen again, and it turns
      out it actually can.
      
      Quoting Martin Jambor <mjambor@suse.cz>:
       "More current GCCs are more careful when it comes to replacing a
        conditional load with a non-conditional one, most notably they check
        that a store happens in each iteration of _a_ loop but they assume
        loops are executed.  They also perform a simple check whether the
        store cannot trap which currently passes only for non-const
        variables.  A simple testcase demonstrating it on an x86_64 is for
        example the following:
      
        $ cat cond_store.c
      
        #include <errno.h>
        #include <error.h>
        #include <sys/mman.h>

        int g_1 = 1;
      
        int g_2[1024] __attribute__((section ("safe_section"), aligned (4096)));
      
        int c = 4;
      
        int __attribute__ ((noinline))
        foo (void)
        {
          int l;
          for (l = 0; (l != 4); l++) {
            if (g_1)
              return l;
            for (g_2[0] = 0; (g_2[0] >= 26); ++g_2[0])
              ;
          }
          return 2;
        }
      
        int main (int argc, char* argv[])
        {
          if (mprotect (g_2, sizeof(g_2), PROT_READ) == -1)
            {
              int e = errno;
              error (e, e, "mprotect error %i", e);
            }
          foo ();
          __builtin_printf("OK\n");
          return 0;
        }
        /* EOF */
        $ ~/gcc/trunk/inst/bin/gcc cond_store.c -O2 --param allow-store-data-races=0
        $ ./a.out
        OK
        $ ~/gcc/trunk/inst/bin/gcc cond_store.c -O2 --param allow-store-data-races=1
        $ ./a.out
        Segmentation fault
      
        The testcase fails the same at least with 4.9, 4.8 and 4.7.  Therefore
        I would suggest building kernels with this parameter set to zero. I
        also agree with Jikos that the default should be changed for -O2.  I
        have run most of the SPEC 2k6 CPU benchmarks (gamess and dealII
        failed, at -O2, not sure why) compiled with and without this option
        and did not see any real difference between respective run-times"
      
      Hopefully the default will be changed in newer gccs, but let's force it
      for kernel builds so that we are on a safe side even when older gcc are
      used.
      
      The code in question was an out-of-tree printk-in-NMI (yeah, surprise
      surprise, once again) patch written by Petr Mladek, let me quote his
      comment from our internal bugzilla:
      
       "I have spent few days investigating inconsistent state of kernel ring buffer.
        It went out that it was caused by speculative store generated by
        gcc-4.3.4.
      
        The problem is in the assembly generated for make_free_space(). The function is
        called the following way:
      
        + vprintk_emit();
            + log = MAIN_LOG; // with logbuf_lock
               or
               log = NMI_LOG; // with nmi_logbuf_lock
               cont_add(log, ...);
                + cont_flush(log, ...);
                    + log_store(log, ...);
                          + log_make_free_space(log, ...);
      
        If called with log = NMI_LOG then only nmi_log_* global variables are safe to
        modify but the generated code does store also into (main_)log_* global
        variables:
      
        <log_make_free_space>:
               55                      push   %rbp
               89 f6                   mov    %esi,%esi
      
               48 8b 05 03 99 51 01    mov    0x1519903(%rip),%rax       # ffffffff82620868 <nmi_log_next_id>
               44 8b 1d ec 98 51 01    mov    0x15198ec(%rip),%r11d      # ffffffff82620858 <log_next_idx>
               8b 35 36 60 14 01       mov    0x1146036(%rip),%esi       # ffffffff8224cfa8 <log_buf_len>
               44 8b 35 33 60 14 01    mov    0x1146033(%rip),%r14d      # ffffffff8224cfac <nmi_log_buf_len>
               4c 8b 2d d0 98 51 01    mov    0x15198d0(%rip),%r13       # ffffffff82620850 <log_next_seq>
               4c 8b 25 11 61 14 01    mov    0x1146111(%rip),%r12       # ffffffff8224d098 <log_buf>
               49 89 c2                mov    %rax,%r10
               48 21 c2                and    %rax,%rdx
               48 8b 1d 0c 99 55 01    mov    0x155990c(%rip),%rbx       # ffffffff826608a0 <nmi_log_buf>
               49 c1 ea 20             shr    $0x20,%r10
               48 89 55 d0             mov    %rdx,-0x30(%rbp)
               44 29 de                sub    %r11d,%esi
               45 29 d6                sub    %r10d,%r14d
               4c 8b 0d 97 98 51 01    mov    0x1519897(%rip),%r9	# ffffffff82620840 <log_first_seq>
               eb 7e                   jmp    ffffffff81107029	<log_make_free_space+0xe9>
        [...]
               85 ff                   test   %edi,%edi                  # edi = 1 for NMI_LOG
               4c 89 e8                mov    %r13,%rax
               4c 89 ca                mov    %r9,%rdx
               74 0a                   je     ffffffff8110703d	<log_make_free_space+0xfd>
               8b 15 27 98 51 01       mov    0x1519827(%rip),%edx       # ffffffff82620860 <nmi_log_first_id>
               48 8b 45 d0             mov    -0x30(%rbp),%rax
               48 39 c2                cmp    %rax,%rdx                  # end of loop
               0f 84 da 00 00 00       je     ffffffff81107120 <log_make_free_space+0x1e0>
        [...]
               85 ff                   test   %edi,%edi                  # edi = 1 for NMI_LOG
               4c 89 0d 17 97 51 01    mov    %r9,0x1519717(%rip)        # ffffffff82620840 <log_first_seq>
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^
                                       KABOOOM
               74 35                   je     ffffffff81107160		 <log_make_free_space+0x220>
      
        It stores log_first_seq when edi == NMI_LOG.  These instructions are also
        used when edi == MAIN_LOG but the store is done speculatively before the
        condition is decided.  It is unsafe because we do not have "logbuf_lock" in
        NMI context and some other process might modify "log_first_seq" in parallel"
      
      I believe that the best course of action is both
      
       - building kernel (and anything multi-threaded, I guess) with that
         optimization turned off
       - persuade gcc folks to change the default for future releases
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Cc: Martin Jambor <mjambor@suse.cz>
      Cc: Petr Mladek <pmladek@suse.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Marek Polacek <polacek@redhat.com>
      Cc: Jakub Jelinek <jakub@redhat.com>
      Cc: Steven Noonan <steven@uplinklabs.net>
      Cc: Richard Biener <richard.guenther@gmail.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      69102311
    • mm/zpool: update zswap to use zpool · 12d79d64
      Dan Streetman authored
      Change zswap to use the zpool api instead of directly using zbud.  Add a
      boot-time param to allow selecting which zpool implementation to use,
      with zbud as the default.
      Signed-off-by: Dan Streetman <ddstreet@ieee.org>
      Tested-by: Seth Jennings <sjennings@variantweb.net>
      Cc: Weijie Yang <weijie.yang@samsung.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      12d79d64
    • mm/zpool: zbud/zsmalloc implement zpool · c795779d
      Dan Streetman authored
      Update zbud and zsmalloc to implement the zpool api.
      
      [fengguang.wu@intel.com: make functions static]
      Signed-off-by: Dan Streetman <ddstreet@ieee.org>
      Tested-by: Seth Jennings <sjennings@variantweb.net>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Weijie Yang <weijie.yang@samsung.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c795779d
    • mm/zpool: implement common zpool api to zbud/zsmalloc · af8d417a
      Dan Streetman authored
      Add zpool api.
      
      zpool provides an interface for memory storage, typically of compressed
      memory.  Users can select what backend to use; currently the only
      implementations are zbud, a low density implementation with up to two
      compressed pages per storage page, and zsmalloc, a higher density
      implementation with multiple compressed pages per storage page.
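
      A hedged sketch of how a zswap-like user selects and drives a backend
      through zpool; the calls follow the API described above, but names such
      as my_zpool_ops and compressed_buf are illustrative, and
      include/linux/zpool.h is the authoritative source for the signatures:

        struct zpool *pool;
        unsigned long handle;

        pool = zpool_create_pool("zbud", GFP_KERNEL, &my_zpool_ops);
        if (!pool)
                return -ENOMEM;

        if (zpool_malloc(pool, compressed_len, GFP_KERNEL, &handle) == 0) {
                void *dst = zpool_map_handle(pool, handle, ZPOOL_MM_WO);

                memcpy(dst, compressed_buf, compressed_len);
                zpool_unmap_handle(pool, handle);
        }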
      Signed-off-by: Dan Streetman <ddstreet@ieee.org>
      Tested-by: Seth Jennings <sjennings@variantweb.net>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Weijie Yang <weijie.yang@samsung.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      af8d417a
    • mm/zbud: change zbud_alloc size type to size_t · 99eef8e9
      Dan Streetman authored
      Change the type of the zbud_alloc() size param from unsigned int to
      size_t.
      
      Technically, this should not make any difference, as the zbud
      implementation already restricts the size to well within either type's
      limits; but as zsmalloc (and kmalloc) use size_t, and zpool will use
      size_t, this brings the size parameter type in line with zsmalloc/zpool.
      Signed-off-by: Dan Streetman <ddstreet@ieee.org>
      Acked-by: Seth Jennings <sjennings@variantweb.net>
      Tested-by: Seth Jennings <sjennings@variantweb.net>
      Cc: Weijie Yang <weijie.yang@samsung.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      99eef8e9
    • zram: replace global tb_lock with fine grain lock · d2d5e762
      Weijie Yang authored
      Currently, we use a rwlock tb_lock to protect concurrent access to the
      whole zram meta table.  However, according to the actual access model,
      there is only a small chance for upper user to access the same
      table[index], so the current lock granularity is too big.
      
      The idea of optimization is to change the lock granularity from whole
      meta table to per table entry (table -> table[index]), so that we can
      protect concurrent access to the same table[index], meanwhile allow the
      maximum concurrency.
      
      With this in mind, several kinds of locks which could be used as a
      per-entry lock were tested and compared:
      
      Test environment:
      x86-64 Intel Core2 Q8400, system memory 4GB, Ubuntu 12.04,
      kernel v3.15.0-rc3 as base, zram with 4 max_comp_streams LZO.
      
      iozone test:
      iozone -t 4 -R -r 16K -s 200M -I +Z
      (1GB zram with ext4 filesystem, take the average of 10 tests, KB/s)
      
            Test       base      CAS    spinlock    rwlock   bit_spinlock
      -------------------------------------------------------------------
       Initial write  1381094   1425435   1422860   1423075   1421521
             Rewrite  1529479   1641199   1668762   1672855   1654910
                Read  8468009  11324979  11305569  11117273  10997202
             Re-read  8467476  11260914  11248059  11145336  10906486
        Reverse Read  6821393   8106334   8282174   8279195   8109186
         Stride read  7191093   8994306   9153982   8961224   9004434
         Random read  7156353   8957932   9167098   8980465   8940476
      Mixed workload  4172747   5680814   5927825   5489578   5972253
        Random write  1483044   1605588   1594329   1600453   1596010
              Pwrite  1276644   1303108   1311612   1314228   1300960
               Pread  4324337   4632869   4618386   4457870   4500166
      
      To enhance the possibility of accessing the same table[index] concurrently,
      set zram to a small disksize (10MB) and let threads run with a large loop
      count.
      
      fio test:
      fio --bs=32k --randrepeat=1 --randseed=100 --refill_buffers
      --scramble_buffers=1 --direct=1 --loops=3000 --numjobs=4
      --filename=/dev/zram0 --name=seq-write --rw=write --stonewall
      --name=seq-read --rw=read --stonewall --name=seq-readwrite
      --rw=rw --stonewall --name=rand-readwrite --rw=randrw --stonewall
      (10MB zram raw block device, take the average of 10 tests, KB/s)
      
          Test     base     CAS    spinlock    rwlock  bit_spinlock
      -------------------------------------------------------------
      seq-write   933789   999357   1003298    995961   1001958
       seq-read  5634130  6577930   6380861   6243912   6230006
         seq-rw  1405687  1638117   1640256   1633903   1634459
        rand-rw  1386119  1614664   1617211   1609267   1612471
      
      All the optimization methods show a higher performance than the base,
      however, it is hard to say which method is the most appropriate.
      
      On the other hand, zram is mostly used on small embedded systems, so we
      don't want to increase any memory footprint.
      
      This patch picks the bit_spinlock method, packing the object size and
      page_flag into an unsigned long table.value, so as to not increase any
      memory overhead on either 32-bit or 64-bit systems.
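
      A sketch of the packing (the constants and helpers here are illustrative;
      see the patch for the real ones): the low bits of table[index].value hold
      the compressed object size, the bits above ZRAM_FLAG_SHIFT hold the page
      flags, and one flag bit doubles as the bit_spin_lock() lock bit.

        #define ZRAM_FLAG_SHIFT 24
        #define ZRAM_ACCESS     (ZRAM_FLAG_SHIFT + 1)   /* per-entry lock bit */

        static unsigned long entry_obj_size(unsigned long value)
        {
                return value & ((1UL << ZRAM_FLAG_SHIFT) - 1);
        }

        static void entry_lock(unsigned long *value)
        {
                bit_spin_lock(ZRAM_ACCESS, value);
        }

        static void entry_unlock(unsigned long *value)
        {
                bit_spin_unlock(ZRAM_ACCESS, value);
        }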
      
      On the third hand, even though different kinds of locks have different
      performances, we can ignore this difference, because: if zram is used as
      zram swapfile, the swap subsystem can prevent concurrent access to the
      same swapslot; if zram is used as zram-blk with a filesystem set up on it,
      the upper filesystem and the page cache also prevent concurrent access
      of the same block mostly.  So we can ignore the different performances
      among locks.
      Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Reviewed-by: Davidlohr Bueso <davidlohr@hp.com>
      Signed-off-by: Weijie Yang <weijie.yang@samsung.com>
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d2d5e762
    • zram: use size_t instead of u16 · 023b409f
      Minchan Kim authored
      Some architectures (eg, hexagon and PowerPC) could use PAGE_SHIFT of 16
      or more.  In these cases u16 is not sufficiently large to represent a
      compressed page's size so use size_t.
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Reported-by: Weijie Yang <weijie.yang@samsung.com>
      Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      023b409f