1. 22 Mar, 2012 2 commits
  2. 14 Mar, 2012 9 commits
    • Steffen Klassert's avatar
      padata: Fix race on sequence number wrap · 2dc9b5db
      Steffen Klassert authored
      When padata_do_parallel() is called from multiple cpus for the same
      padata instance, we can get object reordering on sequence number wrap
      because testing for sequence number wrap and reseting the sequence
      number must happen atomically but is implemented with two atomic
      operations. This patch fixes this by converting the sequence number
      from atomic_t to an unsigned int and protect the access with a
      spin_lock. As a side effect, we get rid of the sequence number wrap
      handling because the seqence number wraps back to null now without
      the need to do anything.
      Signed-off-by: default avatarSteffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      2dc9b5db
    • Steffen Klassert's avatar
      padata: Fix race in the serialization path · 3047817b
      Steffen Klassert authored
      When a padata object is queued to the serialization queue, another
      cpu might process and free the padata object. So don't dereference
      it after queueing to the serialization queue.
      Signed-off-by: default avatarSteffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      3047817b
    • Jussi Kivilinna's avatar
      crypto: camellia - add assembler implementation for x86_64 · 0b95ec56
      Jussi Kivilinna authored
      Patch adds x86_64 assembler implementation of Camellia block cipher. Two set of
      functions are provided. First set is regular 'one-block at time' encrypt/decrypt
      functions. Second is 'two-block at time' functions that gain performance increase
      on out-of-order CPUs. Performance of 2-way functions should be equal to 1-way
      functions with in-order CPUs.
      
      Patch has been tested with tcrypt and automated filesystem tests.
      
      Tcrypt benchmark results:
      
      AMD Phenom II 1055T (fam:16, model:10):
      
      camellia-asm vs camellia_generic:
      128bit key:                                             (lrw:256bit)    (xts:256bit)
      size    ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
      16B     1.27x   1.22x   1.30x   1.42x   1.30x   1.34x   1.19x   1.05x   1.23x   1.24x
      64B     1.74x   1.79x   1.43x   1.87x   1.81x   1.87x   1.48x   1.38x   1.55x   1.62x
      256B    1.90x   1.87x   1.43x   1.94x   1.94x   1.95x   1.63x   1.62x   1.67x   1.70x
      1024B   1.96x   1.93x   1.43x   1.95x   1.98x   2.01x   1.67x   1.69x   1.74x   1.80x
      8192B   1.96x   1.96x   1.39x   1.93x   2.01x   2.03x   1.72x   1.64x   1.71x   1.76x
      
      256bit key:                                             (lrw:384bit)    (xts:512bit)
      size    ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
      16B     1.23x   1.23x   1.33x   1.39x   1.34x   1.38x   1.04x   1.18x   1.21x   1.29x
      64B     1.72x   1.69x   1.42x   1.78x   1.81x   1.89x   1.57x   1.52x   1.56x   1.65x
      256B    1.85x   1.88x   1.42x   1.86x   1.93x   1.96x   1.69x   1.65x   1.70x   1.75x
      1024B   1.88x   1.86x   1.45x   1.95x   1.96x   1.95x   1.77x   1.71x   1.77x   1.78x
      8192B   1.91x   1.86x   1.42x   1.91x   2.03x   1.98x   1.73x   1.71x   1.78x   1.76x
      
      camellia-asm vs aes-asm (8kB block):
               128bit  256bit
      ecb-enc  1.15x   1.22x
      ecb-dec  1.16x   1.16x
      cbc-enc  0.85x   0.90x
      cbc-dec  1.20x   1.23x
      ctr-enc  1.28x   1.30x
      ctr-dec  1.27x   1.28x
      lrw-enc  1.12x   1.16x
      lrw-dec  1.08x   1.10x
      xts-enc  1.11x   1.15x
      xts-dec  1.14x   1.15x
      
      Intel Core2 T8100 (fam:6, model:23, step:6):
      
      camellia-asm vs camellia_generic:
      128bit key:                                             (lrw:256bit)    (xts:256bit)
      size    ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
      16B     1.10x   1.12x   1.14x   1.16x   1.16x   1.15x   1.02x   1.02x   1.08x   1.08x
      64B     1.61x   1.60x   1.17x   1.68x   1.67x   1.66x   1.43x   1.42x   1.44x   1.42x
      256B    1.65x   1.73x   1.17x   1.77x   1.81x   1.80x   1.54x   1.53x   1.58x   1.54x
      1024B   1.76x   1.74x   1.18x   1.80x   1.85x   1.85x   1.60x   1.59x   1.65x   1.60x
      8192B   1.77x   1.75x   1.19x   1.81x   1.85x   1.86x   1.63x   1.61x   1.66x   1.62x
      
      256bit key:                                             (lrw:384bit)    (xts:512bit)
      size    ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
      16B     1.10x   1.07x   1.13x   1.16x   1.11x   1.16x   1.03x   1.02x   1.08x   1.07x
      64B     1.61x   1.62x   1.15x   1.66x   1.63x   1.68x   1.47x   1.46x   1.47x   1.44x
      256B    1.71x   1.70x   1.16x   1.75x   1.69x   1.79x   1.58x   1.57x   1.59x   1.55x
      1024B   1.78x   1.72x   1.17x   1.75x   1.80x   1.80x   1.63x   1.62x   1.65x   1.62x
      8192B   1.76x   1.73x   1.17x   1.78x   1.80x   1.81x   1.64x   1.62x   1.68x   1.64x
      
      camellia-asm vs aes-asm (8kB block):
               128bit  256bit
      ecb-enc  1.17x   1.21x
      ecb-dec  1.17x   1.20x
      cbc-enc  0.80x   0.82x
      cbc-dec  1.22x   1.24x
      ctr-enc  1.25x   1.26x
      ctr-dec  1.25x   1.26x
      lrw-enc  1.14x   1.18x
      lrw-dec  1.13x   1.17x
      xts-enc  1.14x   1.18x
      xts-dec  1.14x   1.17x
      Signed-off-by: default avatarJussi Kivilinna <jussi.kivilinna@mbnet.fi>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      0b95ec56
    • Jussi Kivilinna's avatar
    • Jussi Kivilinna's avatar
      crypto: camellia - fix checkpatch warnings · e2861a71
      Jussi Kivilinna authored
      Fix checkpatch warnings before renaming file.
      Signed-off-by: default avatarJussi Kivilinna <jussi.kivilinna@mbnet.fi>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      e2861a71
    • Jussi Kivilinna's avatar
      crypto: camellia - rename camellia module to camellia_generic · 075e39df
      Jussi Kivilinna authored
      Rename camellia module to camellia_generic to allow optimized assembler
      implementations to autoload with module-alias.
      Signed-off-by: default avatarJussi Kivilinna <jussi.kivilinna@mbnet.fi>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      075e39df
    • Jussi Kivilinna's avatar
      crypto: tcrypt - add more camellia tests · 4de59337
      Jussi Kivilinna authored
      Add tests for CTR, LRW and XTS modes.
      Signed-off-by: default avatarJussi Kivilinna <jussi.kivilinna@mbnet.fi>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      4de59337
    • Jussi Kivilinna's avatar
      crypto: testmgr - add more camellia test vectors · 0840605e
      Jussi Kivilinna authored
      New ECB, CBC, CTR, LRW and XTS test vectors for camellia. Larger ECB/CBC test
      vectors needed for parallel 2-way camellia implementation.
      Signed-off-by: default avatarJussi Kivilinna <jussi.kivilinna@mbnet.fi>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      0840605e
    • Jussi Kivilinna's avatar
      crypto: camellia - simplify key setup and CAMELLIA_ROUNDSM macro · c9b56d33
      Jussi Kivilinna authored
      camellia_setup_tail() applies 'inverse of the last half of P-function' to
      subkeys, which is unneeded if keys are applied directly to yl/yr in
      CAMELLIA_ROUNDSM.
      
      Patch speeds up key setup and should speed up CAMELLIA_ROUNDSM as applying
      key to yl/yr early has less register dependencies.
      
      Quick tcrypt camellia results:
       x86_64, AMD Phenom II, ~5% faster
       x86_64, Intel Core 2, ~0.5% faster
       i386, Intel Atom N270, ~1% faster
      Signed-off-by: default avatarJussi Kivilinna <jussi.kivilinna@mbnet.fi>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      c9b56d33
  3. 25 Feb, 2012 6 commits
  4. 16 Feb, 2012 2 commits
  5. 14 Feb, 2012 2 commits
  6. 05 Feb, 2012 2 commits
  7. 26 Jan, 2012 3 commits
  8. 15 Jan, 2012 3 commits
    • Alexey Dobriyan's avatar
      crypto: sha512 - use standard ror64() · b85a088f
      Alexey Dobriyan authored
      Use standard ror64() instead of hand-written.
      There is no standard ror64, so create it.
      
      The difference is shift value being "unsigned int" instead of uint64_t
      (for which there is no reason). gcc starts to emit native ROR instructions
      which it doesn't do for some reason currently. This should make the code
      faster.
      
      Patch survives in-tree crypto test and ping flood with hmac(sha512) on.
      Signed-off-by: default avatarAlexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      b85a088f
    • Alexey Dobriyan's avatar
      crypto: sha512 - reduce stack usage to safe number · 51fc6dc8
      Alexey Dobriyan authored
      For rounds 16--79, W[i] only depends on W[i - 2], W[i - 7], W[i - 15] and W[i - 16].
      Consequently, keeping all W[80] array on stack is unnecessary,
      only 16 values are really needed.
      
      Using W[16] instead of W[80] greatly reduces stack usage
      (~750 bytes to ~340 bytes on x86_64).
      
      Line by line explanation:
      * BLEND_OP
        array is "circular" now, all indexes have to be modulo 16.
        Round number is positive, so remainder operation should be
        without surprises.
      
      * initial full message scheduling is trimmed to first 16 values which
        come from data block, the rest is calculated before it's needed.
      
      * original loop body is unrolled version of new SHA512_0_15 and
        SHA512_16_79 macros, unrolling was done to not do explicit variable
        renaming. Otherwise it's the very same code after preprocessing.
        See sha1_transform() code which does the same trick.
      
      Patch survives in-tree crypto test and original bugreport test
      (ping flood with hmac(sha512).
      
      See FIPS 180-2 for SHA-512 definition
      http://csrc.nist.gov/publications/fips/fips180-2/fips180-2withchangenotice.pdfSigned-off-by: default avatarAlexey Dobriyan <adobriyan@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      51fc6dc8
    • Alexey Dobriyan's avatar
      crypto: sha512 - make it work, undo percpu message schedule · 84e31fdb
      Alexey Dobriyan authored
      commit f9e2bca6
      aka "crypto: sha512 - Move message schedule W[80] to static percpu area"
      created global message schedule area.
      
      If sha512_update will ever be entered twice, hash will be silently
      calculated incorrectly.
      
      Probably the easiest way to notice incorrect hashes being calculated is
      to run 2 ping floods over AH with hmac(sha512):
      
      	#!/usr/sbin/setkey -f
      	flush;
      	spdflush;
      	add IP1 IP2 ah 25 -A hmac-sha512 0x00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000025;
      	add IP2 IP1 ah 52 -A hmac-sha512 0x00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000052;
      	spdadd IP1 IP2 any -P out ipsec ah/transport//require;
      	spdadd IP2 IP1 any -P in  ipsec ah/transport//require;
      
      XfrmInStateProtoError will start ticking with -EBADMSG being returned
      from ah_input(). This never happens with, say, hmac(sha1).
      
      With patch applied (on BOTH sides), XfrmInStateProtoError does not tick
      with multiple bidirectional ping flood streams like it doesn't tick
      with SHA-1.
      
      After this patch sha512_transform() will start using ~750 bytes of stack on x86_64.
      This is OK for simple loads, for something more heavy, stack reduction will be done
      separatedly.
      Signed-off-by: default avatarAlexey Dobriyan <adobriyan@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      84e31fdb
  9. 13 Jan, 2012 10 commits
  10. 11 Jan, 2012 1 commit
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · 4f58cb90
      Linus Torvalds authored
      * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (54 commits)
        crypto: gf128mul - remove leftover "(EXPERIMENTAL)" in Kconfig
        crypto: serpent-sse2 - remove unneeded LRW/XTS #ifdefs
        crypto: serpent-sse2 - select LRW and XTS
        crypto: twofish-x86_64-3way - remove unneeded LRW/XTS #ifdefs
        crypto: twofish-x86_64-3way - select LRW and XTS
        crypto: xts - remove dependency on EXPERIMENTAL
        crypto: lrw - remove dependency on EXPERIMENTAL
        crypto: picoxcell - fix boolean and / or confusion
        crypto: caam - remove DECO access initialization code
        crypto: caam - fix polarity of "propagate error" logic
        crypto: caam - more desc.h cleanups
        crypto: caam - desc.h - convert spaces to tabs
        crypto: talitos - convert talitos_error to struct device
        crypto: talitos - remove NO_IRQ references
        crypto: talitos - fix bad kfree
        crypto: convert drivers/crypto/* to use module_platform_driver()
        char: hw_random: convert drivers/char/hw_random/* to use module_platform_driver()
        crypto: serpent-sse2 - should select CRYPTO_CRYPTD
        crypto: serpent - rename serpent.c to serpent_generic.c
        crypto: serpent - cleanup checkpatch errors and warnings
        ...
      4f58cb90