Commit 614b4f5d authored by Heiko Carstens, committed by Vasily Gorbik

s390/checksum: make ip_fast_csum() faster

Convert ip_fast_csum() so it doesn't call csum_partial(), but instead
open code the checksum calculation. The problem with csum_partial() is
that it makes use of the cksm instruction, which has high startup
costs and therefore is only very fast if used on larger memory
regions.

IPv4 headers, however, are small (5-15 32-bit words, since the IHL
field is four bits wide). The open coded variant calculates the
checksum in ~30% of the time the old variant needs (z14, march=z196).

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
parent bb4644b1
@@ -66,7 +66,18 @@ static inline __sum16 csum_fold(__wsum sum)
  */
 static inline __sum16 ip_fast_csum(const void *iph, unsigned int ihl)
 {
-	return csum_fold(csum_partial(iph, ihl*4, 0));
+	__u64 csum = 0;
+	__u32 *ptr = (u32 *)iph;
+
+	csum += *ptr++;
+	csum += *ptr++;
+	csum += *ptr++;
+	csum += *ptr++;
+	ihl -= 4;
+	while (ihl--)
+		csum += *ptr++;
+	csum += (csum >> 32) | (csum << 32);
+	return csum_fold((__force __wsum)(csum >> 32));
 }
 
 /*
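
For readers who want to try the folding step outside the kernel, the following is a minimal userspace sketch of the same idea, not the kernel code itself. The names ip_csum_open_coded(), ip_csum_reference() and fold32() are hypothetical; fold32() is only a stand-in for csum_fold(), whose s390 implementation is not part of this diff. Unlike the patch, the sketch uses a plain loop instead of four unrolled loads and reads the header through memcpy(), so it assumes nothing about alignment.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for csum_fold(): fold 32 bits to 16 and complement. */
static uint16_t fold32(uint32_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Sum the header as 32-bit words in a 64-bit accumulator, as the patch does. */
static uint16_t ip_csum_open_coded(const void *iph, unsigned int ihl)
{
	const unsigned char *p = iph;
	uint64_t csum = 0;
	uint32_t word;

	while (ihl--) {
		memcpy(&word, p, sizeof(word));	/* alignment-safe native load */
		csum += word;
		p += sizeof(word);
	}
	/*
	 * Adding the value rotated by 32 bits puts high + low into the
	 * upper half; a carry out of the lower half supplies the
	 * end-around carry of the ones' complement sum.
	 */
	csum += (csum >> 32) | (csum << 32);
	return fold32((uint32_t)(csum >> 32));
}

/* Straightforward 16-bit reference sum, used only for cross-checking. */
static uint16_t ip_csum_reference(const void *iph, unsigned int ihl)
{
	const unsigned char *p = iph;
	uint32_t sum = 0;
	uint16_t half;
	unsigned int i;

	for (i = 0; i < ihl * 2; i++) {
		memcpy(&half, p + 2 * i, sizeof(half));
		sum += half;
	}
	return fold32(sum);
}
```

Both helpers return the checksum in host byte order, so copying the result into bytes 10 and 11 of the header with memcpy() should yield a header whose checksum verifies to zero on either endianness.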