1. 28 May, 2016 10 commits
    • George Spelvin's avatar
      h8300: Add <asm/hash.h> · 4684fe95
      George Spelvin authored
      This will improve the performance of hash_32() and hash_64(), but due
      to complete lack of multi-bit shift instructions on H8, performance will
      still be bad in surrounding code.
      
      Designing H8-specific hash algorithms to work around that is a separate
      project.  (But if the maintainers would like to get in touch...)
      Signed-off-by: default avatarGeorge Spelvin <linux@sciencehorizons.net>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: uclinux-h8-devel@lists.sourceforge.jp
      4684fe95
    • George Spelvin's avatar
      microblaze: Add <asm/hash.h> · 7b13277b
      George Spelvin authored
      Microblaze is an FPGA soft core that can be configured various ways.
      
      If it is configured without a multiplier, the standard __hash_32()
      will require a call to __mulsi3, which is a slow software loop.
      
      Instead, use a shift-and-add sequence for the constant multiply.
      GCC knows how to do this, but it's not as clever as some.
      Signed-off-by: default avatarGeorge Spelvin <linux@sciencehorizons.net>
      Cc: Alistair Francis <alistair.francis@xilinx.com>
      Cc: Michal Simek <michal.simek@xilinx.com>
      7b13277b
    • George Spelvin's avatar
      m68k: Add <asm/hash.h> · 14c44b95
      George Spelvin authored
      This provides a multiply by constant GOLDEN_RATIO_32 = 0x61C88647
      for the original mc68000, which lacks a 32x32-bit multiply instruction.
      
      Yes, the amount of optimization effort put in is excessive. :-)
      
      Shift-add chain found by Yevgen Voronenko's Hcub algorithm at
      http://spiral.ece.cmu.edu/mcm/gen.htmlSigned-off-by: default avatarGeorge Spelvin <linux@sciencehorizons.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Andreas Schwab <schwab@linux-m68k.org>
      Cc: Philippe De Muyter <phdm@macq.eu>
      Cc: linux-m68k@lists.linux-m68k.org
      14c44b95
    • George Spelvin's avatar
      <linux/hash.h>: Add support for architecture-specific functions · 468a9428
      George Spelvin authored
      This is just the infrastructure; there are no users yet.
      
      This is modelled on CONFIG_ARCH_RANDOM; a CONFIG_ symbol declares
      the existence of <asm/hash.h>.
      
      That file may define its own versions of various functions, and define
      HAVE_* symbols (no CONFIG_ prefix!) to suppress the generic ones.
      
      Included is a self-test (in lib/test_hash.c) that verifies the basics.
      It is NOT in general required that the arch-specific functions compute
      the same thing as the generic, but if a HAVE_* symbol is defined with
      the value 1, then equality is tested.
      Signed-off-by: default avatarGeorge Spelvin <linux@sciencehorizons.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Andreas Schwab <schwab@linux-m68k.org>
      Cc: Philippe De Muyter <phdm@macq.eu>
      Cc: linux-m68k@lists.linux-m68k.org
      Cc: Alistair Francis <alistai@xilinx.com>
      Cc: Michal Simek <michal.simek@xilinx.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: uclinux-h8-devel@lists.sourceforge.jp
      468a9428
    • George Spelvin's avatar
      fs/namei.c: Improve dcache hash function · 2a18da7a
      George Spelvin authored
      Patch 0fed3ac8 improved the hash mixing, but the function is slower
      than necessary; there's a 7-instruction dependency chain (10 on x86)
      each loop iteration.
      
      Word-at-a-time access is a very tight loop (which is good, because
      link_path_walk() is one of the hottest code paths in the entire kernel),
      and the hash mixing function must not have a longer latency to avoid
      slowing it down.
      
      There do not appear to be any published fast hash functions that:
      1) Operate on the input a word at a time, and
      2) Don't need to know the length of the input beforehand, and
      3) Have a single iterated mixing function, not needing conditional
         branches or unrolling to distinguish different loop iterations.
      
      One of the algorithms which comes closest is Yann Collet's xxHash, but
      that's two dependent multiplies per word, which is too much.
      
      The key insights in this design are:
      
      1) Barring expensive ops like multiplies, to diffuse one input bit
         across 64 bits of hash state takes at least log2(64) = 6 sequentially
         dependent instructions.  That is more cycles than we'd like.
      2) An operation like "hash ^= hash << 13" requires a second temporary
         register anyway, and on a 2-operand machine like x86, it's three
         instructions.
      3) A better use of a second register is to hold a two-word hash state.
         With careful design, no temporaries are needed at all, so it doesn't
         increase register pressure.  And this gets rid of register copying
         on 2-operand machines, so the code is smaller and faster.
      4) Using two words of state weakens the requirement for one-round mixing;
         we now have two rounds of mixing before cancellation is possible.
      5) A two-word hash state also allows operations on both halves to be
         done in parallel, so on a superscalar processor we get more mixing
         in fewer cycles.
      
      I ended up using a mixing function inspired by the ChaCha and Speck
      round functions.  It is 6 simple instructions and 3 cycles per iteration
      (assuming multiply by 9 can be done by an "lea" instruction):
      
      		x ^= *input++;
      	y ^= x;	x = ROL(x, K1);
      	x += y;	y = ROL(y, K2);
      	y *= 9;
      
      Not only is this reversible, two consecutive rounds are reversible:
      if you are given the initial and final states, but not the intermediate
      state, it is possible to compute both input words.  This means that at
      least 3 words of input are required to create a collision.
      
      (It also has the property, used by hash_name() to avoid a branch, that
      it hashes all-zero to all-zero.)
      
      The rotate constants K1 and K2 were found by experiment.  The search took
      a sample of random initial states (I used 1023) and considered the effect
      of flipping each of the 64 input bits on each of the 128 output bits two
      rounds later.  Each of the 8192 pairs can be considered a biased coin, and
      adding up the Shannon entropy of all of them produces a score.
      
      The best-scoring shifts also did well in other tests (flipping bits in y,
      trying 3 or 4 rounds of mixing, flipping all 64*63/2 pairs of input bits),
      so the choice was made with the additional constraint that the sum of the
      shifts is odd and not too close to the word size.
      
      The final state is then folded into a 32-bit hash value by a less carefully
      optimized multiply-based scheme.  This also has to be fast, as pathname
      components tend to be short (the most common case is one iteration!), but
      there's some room for latency, as there is a fair bit of intervening logic
      before the hash value is used for anything.
      
      (Performance verified with "bonnie++ -s 0 -n 1536:-2" on tmpfs.  I need
      a better benchmark; the numbers seem to show a slight dip in performance
      between 4.6.0 and this patch, but they're too noisy to quote.)
      
      Special thanks to Bruce fields for diligent testing which uncovered a
      nasty fencepost error in an earlier version of this patch.
      
      [checkpatch.pl formatting complaints noted and respectfully disagreed with.]
      Signed-off-by: default avatarGeorge Spelvin <linux@sciencehorizons.net>
      Tested-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      2a18da7a
    • George Spelvin's avatar
      Eliminate bad hash multipliers from hash_32() and hash_64() · ef703f49
      George Spelvin authored
      The "simplified" prime multipliers made very bad hash functions, so get rid
      of them.  This completes the work of 689de1d6.
      
      To avoid the inefficiency which was the motivation for the "simplified"
      multipliers, hash_64() on 32-bit systems is changed to use a different
      algorithm.  It makes two calls to hash_32() instead.
      
      drivers/media/usb/dvb-usb-v2/af9015.c uses the old GOLDEN_RATIO_PRIME_32
      for some horrible reason, so it inherits a copy of the old definition.
      Signed-off-by: default avatarGeorge Spelvin <linux@sciencehorizons.net>
      Cc: Antti Palosaari <crope@iki.fi>
      Cc: Mauro Carvalho Chehab <m.chehab@samsung.com>
      ef703f49
    • George Spelvin's avatar
      Change hash_64() return value to 32 bits · 92d56774
      George Spelvin authored
      That's all that's ever asked for, and it makes the return
      type of hash_long() consistent.
      
      It also allows (upcoming patch) an optimized implementation
      of hash_64 on 32-bit machines.
      
      I tried adding a BUILD_BUG_ON to ensure the number of bits requested
      was never more than 32 (most callers use a compile-time constant), but
      adding <linux/bug.h> to <linux/hash.h> breaks the tools/perf compiler
      unless tools/perf/MANIFEST is updated, and understanding that code base
      well enough to update it is too much trouble.  I did the rest of an
      allyesconfig build with such a check, and nothing tripped.
      Signed-off-by: default avatarGeorge Spelvin <linux@sciencehorizons.net>
      92d56774
    • George Spelvin's avatar
      <linux/sunrpc/svcauth.h>: Define hash_str() in terms of hashlen_string() · 917ea166
      George Spelvin authored
      Finally, the first use of previous two patches: eliminate the
      separate ad-hoc string hash functions in the sunrpc code.
      
      Now hash_str() is a wrapper around hash_string(), and hash_mem() is
      likewise a wrapper around full_name_hash().
      
      Note that sunrpc code *does* call hash_mem() with a zero length, which
      is why the previous patch needed to handle that in full_name_hash().
      (Thanks, Bruce, for finding that!)
      
      This also eliminates the only caller of hash_long which asks for
      more than 32 bits of output.
      
      The comment about the quality of hashlen_string() and full_name_hash()
      is jumping the gun by a few patches; they aren't very impressive now,
      but will be improved greatly later in the series.
      Signed-off-by: default avatarGeorge Spelvin <linux@sciencehorizons.net>
      Tested-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      Acked-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      Cc: Jeff Layton <jlayton@poochiereds.net>
      Cc: linux-nfs@vger.kernel.org
      917ea166
    • George Spelvin's avatar
      fs/namei.c: Add hashlen_string() function · fcfd2fbf
      George Spelvin authored
      We'd like to make more use of the highly-optimized dcache hash functions
      throughout the kernel, rather than have every subsystem create its own,
      and a function that hashes basic null-terminated strings is required
      for that.
      
      (The name is to emphasize that it returns both hash and length.)
      
      It's actually useful in the dcache itself, specifically d_alloc_name().
      Other uses in the next patch.
      
      full_name_hash() is also tweaked to make it more generally useful:
      1) Take a "char *" rather than "unsigned char *" argument, to
         be consistent with hash_name().
      2) Handle zero-length inputs.  If we want more callers, we don't want
         to make them worry about corner cases.
      Signed-off-by: default avatarGeorge Spelvin <linux@sciencehorizons.net>
      fcfd2fbf
    • George Spelvin's avatar
      Pull out string hash to <linux/stringhash.h> · f4bcbe79
      George Spelvin authored
      ... so they can be used without the rest of <linux/dcache.h>
      
      The hashlen_* macros will make sense next patch.
      Signed-off-by: default avatarGeorge Spelvin <linux@sciencehorizons.net>
      f4bcbe79
  2. 16 May, 2016 1 commit
  3. 15 May, 2016 2 commits
  4. 14 May, 2016 11 commits
  5. 13 May, 2016 16 commits