Roger He authored
This can improve performance for some cases.

v2 (chk): handle all sizes, simplify the patch quite a bit
v3 (chk): adjust dw estimation as well
v4 (chk): use single loop, make end mask 64bit

Signed-off-by: Roger He <Hongbo.He@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Tested-by: Roger He <Hongbo.He@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6849d47c