    x86/mm/pat: Fix off-by-one bugs in interval tree search · 91298f1a
    Ingo Molnar authored
    There's a bug in the new PAT code: the conversion of
    memtype_check_conflict() in the following commit is off by one:
    
       8d04a5f9: ("x86/mm/pat: Convert the PAT tree to a generic interval tree")
    
            dprintk("Overlap at 0x%Lx-0x%Lx\n", match->start, match->end);
            found_type = match->type;
    
    -       node = rb_next(&match->rb);
    -       while (node) {
    -               match = rb_entry(node, struct memtype, rb);
    -
    -               if (match->start >= end) /* Checked all possible matches */
    -                       goto success;
    -
    -               if (is_node_overlap(match, start, end) &&
    -                   match->type != found_type) {
    +       match = memtype_interval_iter_next(match, start, end);
    +       while (match) {
    +               if (match->type != found_type)
                            goto failure;
    -               }
    
    -               node = rb_next(&match->rb);
    +               match = memtype_interval_iter_next(match, start, end);
            }
    
    Note how the '>= end' condition that terminated the interval check got
    converted into:
    
    +       match = memtype_interval_iter_next(match, start, end);
    
    This is subtly off by one, because the interval tree interfaces require
    closed interval parameters:
    
      include/linux/interval_tree_generic.h
    
     /*                                                                            \
      * Iterate over intervals intersecting [start;last]                           \
      *                                                                            \
      * Note that a node's interval intersects [start;last] iff:                   \
      *   Cond1: ITSTART(node) <= last                                             \
      * and                                                                        \
      *   Cond2: start <= ITLAST(node)                                             \
      */                                                                           \
    
      ...
    
                    if (ITSTART(node) <= last) {            /* Cond1 */           \
                            if (start <= ITLAST(node))      /* Cond2 */           \
                                    return node;    /* node is leftmost match */  \
    
    [start;last] is a closed interval (note the '<= last' check) - while the
    PAT 'end' parameter is 1 byte beyond the end of the range, because
    ioremap() and the other mapping APIs usually use the [start,end)
    half-open interval, derived from 'size'.
    
    This is what ioremap() does for example:
    
            /*
             * Mappings have to be page-aligned
             */
            offset = phys_addr & ~PAGE_MASK;
            phys_addr &= PHYSICAL_PAGE_MASK;
            size = PAGE_ALIGN(last_addr+1) - phys_addr;
    
            retval = reserve_memtype(phys_addr, (u64)phys_addr + size,
                                                    pcm, &new_pcm);
    
    phys_addr+size will be on a page boundary, after the last byte of the
    mapped interval.
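
    The arithmetic can be checked in isolation with a small userspace
    sketch. It assumes 4 KiB pages and uses hypothetical DEMO_* names and a
    made-up address; it is not the kernel's ioremap() code, and it uses a
    plain page mask where the kernel uses PHYSICAL_PAGE_MASK:

```c
#include <stdint.h>

/* Assumption: 4 KiB pages. DEMO_* names are illustrative, not kernel macros. */
#define DEMO_PAGE_SIZE     4096ULL
#define DEMO_PAGE_MASK     (~(DEMO_PAGE_SIZE - 1))
#define DEMO_PAGE_ALIGN(x) (((x) + DEMO_PAGE_SIZE - 1) & DEMO_PAGE_MASK)

/* Half-open range [start, end): 'end' is one byte past the last mapped byte. */
struct demo_range {
        uint64_t start;
        uint64_t end;
};

/* Page-align a request of 'len' bytes at 'phys_addr', as in the excerpt above. */
static inline struct demo_range demo_page_aligned_range(uint64_t phys_addr,
                                                        uint64_t len)
{
        uint64_t last_addr = phys_addr + len - 1;
        struct demo_range r;
        uint64_t size;

        r.start = phys_addr & DEMO_PAGE_MASK;
        size = DEMO_PAGE_ALIGN(last_addr + 1) - r.start;
        r.end = r.start + size;         /* lands on a page boundary */
        return r;
}
```

    For a 16-byte request at 0xfed00004 this yields start 0xfed00000 and
    end 0xfed01000, so the last mapped byte is end-1 = 0xfed00fff.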
    
    So the correct parameter to use in the interval tree searches is not
    'end' but 'end-1'.
    
    This could have relevance if conflicting PAT ranges are exactly adjacent,
    for example a future WC region is followed immediately by an already
    mapped UC- region - in this case memtype_check_conflict() would
    incorrectly deny the WC memtype region and downgrade the memtype to UC-.
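
    The adjacent-range case can be modelled with a rough userspace sketch.
    It replaces the interval tree with a linear scan, and every demo_* name
    is hypothetical; only the Cond1/Cond2 test is taken from the header
    quoted above:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-ins for the cache modes mentioned in the text. */
enum demo_type { DEMO_UC_MINUS, DEMO_WC };

struct demo_memtype {
        uint64_t start;         /* first byte of the range */
        uint64_t end;           /* one byte past the last byte */
        enum demo_type type;
};

/*
 * Returns true if any entry intersecting the CLOSED interval [start;last]
 * has a type different from 'type'. Linear scan, not an interval tree.
 */
static inline bool demo_conflict(const struct demo_memtype *tab, size_t n,
                                 uint64_t start, uint64_t last,
                                 enum demo_type type)
{
        for (size_t i = 0; i < n; i++) {
                uint64_t node_last = tab[i].end - 1;

                if (tab[i].start <= last &&     /* Cond1 */
                    start <= node_last &&       /* Cond2 */
                    tab[i].type != type)
                        return true;
        }
        return false;
}
```

    With one UC- entry at [0x2000,0x3000) and a new WC request for
    [0x1000,0x2000), passing 'end' (0x2000) as 'last' reports a conflict,
    while passing 'end-1' (0x1fff) does not.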
    
    BTW., rather annoyingly this downgrading is done silently in
    memtype_check_insert():
    
    int memtype_check_insert(struct memtype *new,
                             enum page_cache_mode *ret_type)
    {
            int err = 0;
    
            err = memtype_check_conflict(new->start, new->end, new->type, ret_type);
            if (err)
                    return err;
    
            if (ret_type)
                    new->type = *ret_type;
    
            memtype_interval_insert(new, &memtype_rbroot);
            return 0;
    }
    
    So on such a conflict we'd just silently get UC- in *ret_type, and write
    it into the new region, never the wiser ...
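
    As a purely hypothetical illustration of what such a diagnostic could
    look like, here is a userspace sketch that reports when the granted
    type differs from the requested one. None of the demo_* names exist in
    the kernel, and a real version would presumably go through the kernel's
    own logging helpers rather than stdio:

```c
#include <stdbool.h>
#include <stdio.h>

/* Illustrative stand-ins for two page cache modes; not the kernel enum. */
enum demo_pcm { DEMO_PCM_WC, DEMO_PCM_UC_MINUS };

static inline const char *demo_pcm_name(enum demo_pcm t)
{
        return t == DEMO_PCM_WC ? "WC" : "UC-";
}

/*
 * Returns true and writes one line to 'out' when the granted type is not
 * the requested one; stays silent and returns false otherwise.
 */
static inline bool demo_report_downgrade(FILE *out,
                                         unsigned long long start,
                                         unsigned long long end,
                                         enum demo_pcm requested,
                                         enum demo_pcm granted)
{
        if (requested == granted)
                return false;

        fprintf(out, "memtype [0x%llx-0x%llx): requested %s, got %s\n",
                start, end, demo_pcm_name(requested), demo_pcm_name(granted));
        return true;
}
```

    A caller would compare the requested type with *ret_type after the
    conflict check and invoke the helper before overwriting new->type.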
    
    So, assuming that the patch below fixes the primary bug, the diagnostics
    side of ioremap() cache attribute downgrades would be another thing to
    fix.
    
    Anyway, I checked all the interval-tree iterations, and most of them are
    off by one - but I think the one related to memtype_check_conflict() is
    the one causing this particular performance regression.
    
    The only correct interval-tree searches were these two:
    
      arch/x86/mm/pat_interval.c:     match = memtype_interval_iter_first(&memtype_rbroot, 0, ULONG_MAX);
      arch/x86/mm/pat_interval.c:             match = memtype_interval_iter_next(match, 0, ULONG_MAX);
    
    The ULONG_MAX was hiding the off-by-one in plain sight. :-)
    
    Note that the bug was probably benign in the sense of implementing a too
    strict cache attribute conflict policy and downgrading cache attributes,
    so AFAICS the worst outcome of this bug would be a performance regression,
    not any instabilities.

    Reported-by: kernel test robot <rong.a.chen@intel.com>
    Reported-by: Kenneth R. Crudup <kenny@panix.com>
    Reported-by: Mariusz Ceier <mceier+kernel@gmail.com>
    Tested-by: Mariusz Ceier <mceier@gmail.com>
    Tested-by: Kenneth R. Crudup <kenny@panix.com>
    Cc: Andy Lutomirski <luto@kernel.org>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: Davidlohr Bueso <dave@stgolabs.net>
    Cc: H. Peter Anvin <hpa@zytor.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Rik van Riel <riel@surriel.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Link: https://lkml.kernel.org/r/20191201144947.GA4167@gmail.com
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
pat_interval.c 4.53 KB