    cpumask: fix incorrect cpumask scanning result checks · 8ca09d5f
    Linus Torvalds authored
    It turns out that commit 596ff4a0 ("cpumask: re-introduce
    constant-sized cpumask optimizations") exposed a number of cases of
    drivers not checking the result of "cpumask_next()" and friends
    correctly.
    
    The documented correct check for "no more cpus in the cpumask" is to
    check for the result being equal or larger than the number of possible
    CPU ids, exactly _because_ we've always done those constant-sized
    cpumask scans using a widened type before.  So the return value of a
    cpumask scan should be checked with
    
    	if (cpu >= nr_cpu_ids)
    		...
    
    because the cpumask scan did not necessarily stop exactly *at* that
    maximum CPU id.
    
    But a few cases ended up instead using checks like
    
    	if (cpu == nr_cpumask_bits)
    		...
    
    which used that internal "widened" number of bits.  And that used to
    work pretty much by accident (ok, in this case "by accident" is simply
    because it matched the historical internal implementation of the cpumask
    scanning, so it was more of an "intentionally using implementation
    details rather than an accident").
    
    But the extended constant-sized optimizations then did that internal
    implementation differently, and now that code that did things wrong but
    matched the old implementation no longer worked at all.
    
    Which then causes subsequent odd problems due to using what ends up
    being an invalid CPU ID.
    
    Most of these cases require either unusual hardware or special uses to
    hit, but the random.c one triggers quite easily.
    
    All you really need is to have a sufficiently small CONFIG_NR_CPUS value
    for the bit scanning optimization to be triggered, but not enough CPUs
    to then actually fill that widened cpumask.  At that point, the cpumask
    scanning will return the NR_CPUS constant, which is _not_ the same as
    nr_cpumask_bits.
    
    This just does the mindless fix with
    
       sed -i 's/== nr_cpumask_bits/>= nr_cpu_ids/'
    
    to fix the incorrect uses.
    
    The ones in the SCSI lpfc driver in particular could probably be fixed
    more cleanly by just removing that repeated pattern entirely, but I am
    not emotionally invested enough in that driver to care.
    
    Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net>
    Link: https://lore.kernel.org/lkml/481b19b5-83a0-4793-b4fd-194ad7b978c3@roeck-us.net/
    Reported-and-tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
    Link: https://lore.kernel.org/lkml/CAMuHMdUKo_Sf7TjKzcNDa8Ve+6QrK+P8nSQrSQ=6LTRmcBKNww@mail.gmail.com/
    Reported-by: Vernon Yang <vernon2gm@gmail.com>
    Link: https://lore.kernel.org/lkml/20230306160651.2016767-1-vernon2gm@gmail.com/
    Cc: Yury Norov <yury.norov@gmail.com>
    Cc: Jason A. Donenfeld <Jason@zx2c4.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>