    mm/page_alloc: use write_seqlock_irqsave() instead of write_seqlock() + local_irq_save(). · a2ebb515
    Sebastian Andrzej Siewior authored
    __build_all_zonelists() acquires zonelist_update_seq by first disabling
    interrupts via local_irq_save() and then acquiring the seqlock with
    write_seqlock().  This is troublesome and leads to problems on PREEMPT_RT.
    The problem is that the inner spinlock_t becomes a sleeping lock on
    PREEMPT_RT and must not be acquired with disabled interrupts.
    
    The API provides write_seqlock_irqsave() which does the right thing in one
    step.  printk_deferred_enter() has to be invoked in non-migrate-able
    context to ensure that deferred printing is enabled and disabled on the
    same CPU.  This is the case after zonelist_update_seq has been acquired.
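
    As a rough illustration of the reordering described here, the following
    user-space sketch records the order of the calls before and after the
    change.  The stub_*() helpers, old_sequence(), new_sequence() and the
    "rebuild" step are hypothetical stand-ins that only log their names;
    they do not model locking, interrupts or printk.  The position of
    printk_deferred_enter() in the old sequence is assumed from the ordering
    discussed in this message.

```c
/*
 * User-space sketch only: every stub_*() below is a hypothetical stand-in
 * that records its name. None of them implements the kernel primitive it
 * is named after.
 */
#include <assert.h>
#include <string.h>

static char trace[256];

/* Append one step name to the trace, separated by '>'. */
static void record(const char *step)
{
	assert(strlen(trace) + strlen(step) + 2 < sizeof(trace));
	if (trace[0])
		strcat(trace, ">");
	strcat(trace, step);
}

/* Old primitives: interrupts and the seqlock are handled separately. */
static void stub_local_irq_save(unsigned long *flags) { *flags = 1; record("local_irq_save"); }
static void stub_local_irq_restore(unsigned long flags) { (void)flags; record("local_irq_restore"); }
static void stub_write_seqlock(void) { record("write_seqlock"); }
static void stub_write_sequnlock(void) { record("write_sequnlock"); }

/* New primitives: one call takes the lock and saves the IRQ state. */
static void stub_write_seqlock_irqsave(unsigned long *flags) { *flags = 1; record("write_seqlock_irqsave"); }
static void stub_write_sequnlock_irqrestore(unsigned long flags) { (void)flags; record("write_sequnlock_irqrestore"); }

static void stub_printk_deferred_enter(void) { record("printk_deferred_enter"); }
static void stub_printk_deferred_exit(void) { record("printk_deferred_exit"); }

/* Assumed shape of the old code: IRQs off first, seqlock taken afterwards. */
const char *old_sequence(void)
{
	unsigned long flags;

	trace[0] = '\0';
	stub_local_irq_save(&flags);
	stub_printk_deferred_enter();
	stub_write_seqlock();
	record("rebuild");		/* placeholder for the zonelist rebuild */
	stub_write_sequnlock();
	stub_printk_deferred_exit();
	stub_local_irq_restore(flags);
	return trace;
}

/* Shape after this change: lock + irqsave in one step, then defer printk. */
const char *new_sequence(void)
{
	unsigned long flags;

	trace[0] = '\0';
	stub_write_seqlock_irqsave(&flags);
	stub_printk_deferred_enter();	/* runs with zonelist_update_seq held */
	record("rebuild");		/* placeholder for the zonelist rebuild */
	stub_printk_deferred_exit();
	stub_write_sequnlock_irqrestore(flags);
	return trace;
}
```

    Calling new_sequence() returns the recorded call order as a single
    string, which makes it easy to see that deferred printing is entered
    only after the lock is held and left before the lock is released.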
    
    There was discussion on the first submission that the order should be:
    	local_irq_disable();
    	printk_deferred_enter();
    	write_seqlock();
    
    to avoid pitfalls like having an unaccounted printk() coming from
    write_seqlock_irqsave() before printk_deferred_enter() is invoked.  The
    only origin of such a printk() can be a lockdep splat because the lockdep
    annotation happens after the sequence count is incremented.  This is
    exceptional and subject to change.
    
    It was also pointed out that PREEMPT_RT can be affected by the printk
    problem since its write_seqlock_irqsave() does not really disable interrupts.
    This isn't the case because PREEMPT_RT's printk implementation differs
    from the mainline implementation in two important aspects:
    
    - Printing happens in dedicated threads and not during the
      invocation of printk().
    - In emergency cases where synchronous printing is used, a different
      driver is used which does not use tty_port::lock.
    
    Acquire zonelist_update_seq with write_seqlock_irqsave() and then defer
    printk output.
    
    Link: https://lkml.kernel.org/r/20230623201517.yw286Knb@linutronix.de
    Fixes: 1007843a ("mm/page_alloc: fix potential deadlock on zonelist_update_seq seqlock")
    Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Acked-by: Mel Gorman <mgorman@techsingularity.net>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: John Ogness <john.ogness@linutronix.de>
    Cc: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
    Cc: Mel Gorman <mgorman@techsingularity.net>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Petr Mladek <pmladek@suse.com>
    Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Waiman Long <longman@redhat.com>
    Cc: Will Deacon <will@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>