    s390: let the compiler do page clearing · fb3d1c08
    Christian Borntraeger authored
    The hardware folks told me that for page clearing "when you exactly
    know what to do, hand written xc+pfd is usually faster than mvcl for
    page clearing, as it saves millicode overhead and parameter parsing
    and checking" as long as you don't need the cache bypassing.
    Turns out that gcc already does a proper xc,pfd loop.
    
    A small test on z196 that does
    
    buff = mmap(NULL, bufsize, PROT_EXEC|PROT_WRITE|PROT_READ, MAP_PRIVATE|MAP_ANONYMOUS, 0, 0);
    for (i = 0; i < bufsize; i += 256)
        buff[i] = 0x5;
    
    gets 20% faster (touches every cache line of a page)
    
    and
    
    buff = mmap(NULL, bufsize, PROT_EXEC|PROT_WRITE|PROT_READ, MAP_PRIVATE|MAP_ANONYMOUS, 0, 0);
    for (i = 0; i < bufsize; i += 4096)
        buff[i] = 0x5;
    
    is within the noise (touches one cache line of a page).
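
    For reference, the two snippets above can be folded into one helper
    that takes the stride as a parameter. This is only a sketch, not the
    test program that was actually run: the name touch_buffer() is made
    up, the fd is passed as -1 (the conventional value for anonymous
    mappings) and timing is left to the caller, e.g. via time(1).

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map bufsize bytes of anonymous memory and write
 * one byte every `stride` bytes. The first write to each page faults it
 * in, which is where the kernel clears the page.
 * Returns the number of writes, or -1 on a bad stride or mmap() failure.
 */
static long touch_buffer(size_t bufsize, size_t stride)
{
	volatile char *buff;
	long writes = 0;
	size_t i;

	if (stride == 0)
		return -1;

	buff = mmap(NULL, bufsize, PROT_EXEC | PROT_WRITE | PROT_READ,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buff == MAP_FAILED)
		return -1;

	/* stride 256 touches every z196 cache line, 4096 one per page */
	for (i = 0; i < bufsize; i += stride) {
		buff[i] = 0x5;
		writes++;
	}

	munmap((void *)buff, bufsize);
	return writes;
}
```

    Calling touch_buffer(bufsize, 256) corresponds to the first case and
    touch_buffer(bufsize, 4096) to the second.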
    
    As clear_page is usually called for first memory accesses,
    we can assume that at least one cache line is used afterwards,
    so this change should always be better.
    Another benchmark, a make -j 40 of my testsuite in tmpfs with
    hot caches on a 32-CPU system:
    
     -- unpatched --       --  patched  --
    real     0m1.017s     real     0m0.994s   (~2% faster, but in noise)
    user     0m5.339s     user     0m5.016s   (~6% faster)
    sys      0m0.691s     sys      0m0.632s   (~8% faster)
    
    Let's use the same define to memset as the asm-generic variant.

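    As a userspace illustration of that define (a sketch only: PAGE_SIZE
    is assumed to be 4 KiB here and page_is_clear() is a made-up helper
    for checking the result, neither is taken from the patch):

```c
#include <stddef.h>
#include <string.h>

#ifndef PAGE_SIZE
#define PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */
#endif

/* Plain memset; the compiler is free to emit its own clearing loop. */
#define clear_page(page)	memset((page), 0, PAGE_SIZE)

/* Hypothetical helper: return 1 if every byte of the page is zero. */
static int page_is_clear(const unsigned char *page)
{
	size_t i;

	for (i = 0; i < PAGE_SIZE; i++)
		if (page[i] != 0)
			return 0;
	return 1;
}
```
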
    Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
    Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
page.h 4.31 KB