1. 08 Feb, 2017 10 commits
    • Sergey Senozhatsky's avatar
      printk: drop call_console_drivers() unused param · d9c23523
      Sergey Senozhatsky authored
      We do suppress_message_printing() check before we call
      call_console_drivers() now, so `level' param is not needed
      anymore.
      
      Link: http://lkml.kernel.org/r/20161224140902.1962-2-sergey.senozhatsky@gmail.com
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: default avatarPetr Mladek <pmladek@suse.com>
      d9c23523
    • Sergey Senozhatsky's avatar
      printk: convert the rest to printk-safe · de6fcbdb
      Sergey Senozhatsky authored
      This patch converts the rest of logbuf users (which are
      out of printk recursion case, but can deadlock in printk).
      To make printk-safe usage easier the patch introduces 4
      helper macros:
      - logbuf_lock_irq()/logbuf_unlock_irq()
        lock/unlock the logbuf lock and disable/enable local IRQ
      
      - logbuf_lock_irqsave(flags)/logbuf_unlock_irqrestore(flags)
        lock/unlock the logbuf lock and saves/restores local IRQ state
      
      Link: http://lkml.kernel.org/r/20161227141611.940-9-sergey.senozhatsky@gmail.com
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Calvin Owens <calvinowens@fb.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: default avatarPetr Mladek <pmladek@suse.com>
      de6fcbdb
    • Sergey Senozhatsky's avatar
      printk: remove zap_locks() function · 8b1742c9
      Sergey Senozhatsky authored
      We use printk-safe now which makes printk-recursion detection code
      in vprintk_emit() unreachable. The tricky thing here is that, apart
      from detecting and reporting printk recursions, that code also used
      to zap_locks() in case of panic() from the same CPU. However,
      zap_locks() does not look to be needed anymore:
      
      1) Since commit 08d78658 ("panic: release stale console lock to
         always get the logbuf printed out") panic flushing of `logbuf' to
         console ignores the state of `console_sem' by doing
         	panic()
      		console_trylock();
      		console_unlock();
      
      2) Since commit cf9b1106 ("printk/nmi: flush NMI messages on the
         system panic") panic attempts to zap the `logbuf_lock' spin_lock to
         successfully flush nmi messages to `logbuf'.
      
      Basically, it seems that we either already do what zap_locks() used to
      do but in other places or we ignore the state of the lock. The only
      reaming difference is that we don't re-init the console semaphore in
      printk_safe_flush_on_panic(), but this is not necessary because we
      don't call console drivers from printk_safe_flush_on_panic() due to
      the fact that we are using a deferred printk() version (as was
      suggested by Petr Mladek).
      
      Link: http://lkml.kernel.org/r/20161227141611.940-8-sergey.senozhatsky@gmail.com
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Calvin Owens <calvinowens@fb.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: default avatarPetr Mladek <pmladek@suse.com>
      8b1742c9
    • Sergey Senozhatsky's avatar
      printk: use printk_safe buffers in printk · f975237b
      Sergey Senozhatsky authored
      Use printk_safe per-CPU buffers in printk recursion-prone blocks:
      -- around logbuf_lock protected sections in vprintk_emit() and
         console_unlock()
      -- around down_trylock_console_sem() and up_console_sem()
      
      Note that this solution addresses deadlocks caused by printk()
      recursive calls only. That is vprintk_emit() and console_unlock().
      The rest will be converted in a followup patch.
      
      Another thing to note is that we now keep lockdep enabled in printk,
      because we are protected against the printk recursion caused by
      lockdep in vprintk_emit() by the printk-safe mechanism - we first
      switch to per-CPU buffers and only then access the deadlock-prone
      locks.
      
      Examples:
      
      1) printk() from logbuf_lock spin_lock section
      
      Assume the following code:
        printk()
          raw_spin_lock(&logbuf_lock);
          WARN_ON(1);
          raw_spin_unlock(&logbuf_lock);
      
      which now produces:
      
       ------------[ cut here ]------------
       WARNING: CPU: 0 PID: 366 at kernel/printk/printk.c:1811 vprintk_emit
       CPU: 0 PID: 366 Comm: bash
       Call Trace:
         warn_slowpath_null+0x1d/0x1f
         vprintk_emit+0x1cd/0x438
         vprintk_default+0x1d/0x1f
         printk+0x48/0x50
        [..]
      
      2) printk() from semaphore sem->lock spin_lock section
      
      Assume the following code
      
        printk()
          console_trylock()
            down_trylock()
              raw_spin_lock_irqsave(&sem->lock, flags);
              WARN_ON(1);
              raw_spin_unlock_irqrestore(&sem->lock, flags);
      
      which now produces:
      
       ------------[ cut here ]------------
       WARNING: CPU: 1 PID: 363 at kernel/locking/semaphore.c:141 down_trylock
       CPU: 1 PID: 363 Comm: bash
       Call Trace:
         warn_slowpath_null+0x1d/0x1f
         down_trylock+0x3d/0x62
         ? vprintk_emit+0x3f9/0x414
         console_trylock+0x31/0xeb
         vprintk_emit+0x3f9/0x414
         vprintk_default+0x1d/0x1f
         printk+0x48/0x50
        [..]
      
      3) printk() from console_unlock()
      
      Assume the following code:
      
        printk()
          console_unlock()
            raw_spin_lock(&logbuf_lock);
            WARN_ON(1);
            raw_spin_unlock(&logbuf_lock);
      
      which now produces:
      
       ------------[ cut here ]------------
       WARNING: CPU: 1 PID: 329 at kernel/printk/printk.c:2384 console_unlock
       CPU: 1 PID: 329 Comm: bash
       Call Trace:
         warn_slowpath_null+0x18/0x1a
         console_unlock+0x12d/0x559
         ? trace_hardirqs_on_caller+0x16d/0x189
         ? trace_hardirqs_on+0xd/0xf
         vprintk_emit+0x363/0x374
         vprintk_default+0x18/0x1a
         printk+0x43/0x4b
        [..]
      
      4) printk() from try_to_wake_up()
      
      Assume the following code:
      
        printk()
          console_unlock()
            up()
              try_to_wake_up()
                raw_spin_lock_irqsave(&p->pi_lock, flags);
                WARN_ON(1);
                raw_spin_unlock_irqrestore(&p->pi_lock, flags);
      
      which now produces:
      
       ------------[ cut here ]------------
       WARNING: CPU: 3 PID: 363 at kernel/sched/core.c:2028 try_to_wake_up
       CPU: 3 PID: 363 Comm: bash
       Call Trace:
         warn_slowpath_null+0x1d/0x1f
         try_to_wake_up+0x7f/0x4f7
         wake_up_process+0x15/0x17
         __up.isra.0+0x56/0x63
         up+0x32/0x42
         __up_console_sem+0x37/0x55
         console_unlock+0x21e/0x4c2
         vprintk_emit+0x41c/0x462
         vprintk_default+0x1d/0x1f
         printk+0x48/0x50
        [..]
      
      5) printk() from call_console_drivers()
      
      Assume the following code:
        printk()
          console_unlock()
            call_console_drivers()
            ...
                WARN_ON(1);
      
      which now produces:
      
       ------------[ cut here ]------------
       WARNING: CPU: 2 PID: 305 at kernel/printk/printk.c:1604 call_console_drivers
       CPU: 2 PID: 305 Comm: bash
       Call Trace:
         warn_slowpath_null+0x18/0x1a
         call_console_drivers.isra.6.constprop.16+0x3a/0xb0
         console_unlock+0x471/0x48e
         vprintk_emit+0x1f4/0x206
         vprintk_default+0x18/0x1a
         vprintk_func+0x6e/0x70
         printk+0x3e/0x46
        [..]
      
      6) unsupported placeholder in printk() format now prints an actual
         warning from vscnprintf(), instead of
         	'BUG: recent printk recursion!'.
      
       ------------[ cut here ]------------
       WARNING: CPU: 5 PID: 337 at lib/vsprintf.c:1900 format_decode
       Please remove unsupported %
        in format string
       CPU: 5 PID: 337 Comm: bash
       Call Trace:
         dump_stack+0x4f/0x65
         __warn+0xc2/0xdd
         warn_slowpath_fmt+0x4b/0x53
         format_decode+0x22c/0x308
         vsnprintf+0x89/0x3b7
         vscnprintf+0xd/0x26
         vprintk_emit+0xb4/0x238
         vprintk_default+0x1d/0x1f
         vprintk_func+0x6c/0x73
         printk+0x43/0x4b
        [..]
      
      Link: http://lkml.kernel.org/r/20161227141611.940-7-sergey.senozhatsky@gmail.com
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Calvin Owens <calvinowens@fb.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: default avatarPetr Mladek <pmladek@suse.com>
      Reviewed-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      f975237b
    • Sergey Senozhatsky's avatar
      printk: report lost messages in printk safe/nmi contexts · ddb9baa8
      Sergey Senozhatsky authored
      Account lost messages in pritk-safe and printk-safe-nmi
      contexts and report those numbers during printk_safe_flush().
      
      The patch also moves lost message counter to struct
      `printk_safe_seq_buf' instead of having dedicated static
      counters - this simplifies the code.
      
      Link: http://lkml.kernel.org/r/20161227141611.940-6-sergey.senozhatsky@gmail.com
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Calvin Owens <calvinowens@fb.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: default avatarPetr Mladek <pmladek@suse.com>
      ddb9baa8
    • Sergey Senozhatsky's avatar
      printk: always use deferred printk when flush printk_safe lines · 7acac344
      Sergey Senozhatsky authored
      Always use printk_deferred() in printk_safe_flush_line().
      Flushing can be done from NMI or printk_safe contexts (when
      we are in panic), so we can't call console drivers, yet still
      want to store the messages in the logbuf buffer. Therefore we
      use a deferred printk version.
      
      Link: http://lkml.kernel.org/r/20170206164253.GA463@tigerII.localdomain
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Calvin Owens <calvinowens@fb.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Suggested-by: default avatarPetr Mladek <pmladek@suse.com>
      Signed-off-by: default avatarPetr Mladek <pmladek@suse.com>
      Reviewed-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      7acac344
    • Sergey Senozhatsky's avatar
      printk: introduce per-cpu safe_print seq buffer · 099f1c84
      Sergey Senozhatsky authored
      This patch extends the idea of NMI per-cpu buffers to regions
      that may cause recursive printk() calls and possible deadlocks.
      Namely, printk() can't handle printk calls from schedule code
      or printk() calls from lock debugging code (spin_dump() for instance);
      because those may be called with `sem->lock' already taken or any
      other `critical' locks (p->pi_lock, etc.). An example of deadlock
      can be
      
       vprintk_emit()
        console_unlock()
         up()                        << raw_spin_lock_irqsave(&sem->lock, flags);
          wake_up_process()
           try_to_wake_up()
            ttwu_queue()
             ttwu_activate()
              activate_task()
               enqueue_task()
                enqueue_task_fair()
                 cfs_rq_of()
                  task_of()
                   WARN_ON_ONCE(!entity_is_task(se))
                    vprintk_emit()
                     console_trylock()
                      down_trylock()
                       raw_spin_lock_irqsave(&sem->lock, flags)
                       ^^^^ deadlock
      
      and some other cases.
      
      Just like in NMI implementation, the solution uses a per-cpu
      `printk_func' pointer to 'redirect' printk() calls to a 'safe'
      callback, that store messages in a per-cpu buffer and flushes
      them back to logbuf buffer later.
      
      Usage example:
      
       printk()
        printk_safe_enter_irqsave(flags)
        //
        //  any printk() call from here will endup in vprintk_safe(),
        //  that stores messages in a special per-CPU buffer.
        //
        printk_safe_exit_irqrestore(flags)
      
      The 'redirection' mechanism, though, has been reworked, as suggested
      by Petr Mladek. Instead of using a per-cpu @print_func callback we now
      keep a per-cpu printk-context variable and call either default or nmi
      vprintk function depending on its value. printk_nmi_entrer/exit and
      printk_safe_enter/exit, thus, just set/celar corresponding bits in
      printk-context functions.
      
      The patch only adds printk_safe support, we don't use it yet.
      
      Link: http://lkml.kernel.org/r/20161227141611.940-4-sergey.senozhatsky@gmail.com
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Calvin Owens <calvinowens@fb.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: default avatarPetr Mladek <pmladek@suse.com>
      Reviewed-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      099f1c84
    • Sergey Senozhatsky's avatar
      printk: rename nmi.c and exported api · f92bac3b
      Sergey Senozhatsky authored
      A preparation patch for printk_safe work. No functional change.
      - rename nmi.c to print_safe.c
      - add `printk_safe' prefix to some (which used both by printk-safe
        and printk-nmi) of the exported functions.
      
      Link: http://lkml.kernel.org/r/20161227141611.940-3-sergey.senozhatsky@gmail.com
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Calvin Owens <calvinowens@fb.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: default avatarPetr Mladek <pmladek@suse.com>
      f92bac3b
    • Sergey Senozhatsky's avatar
      printk: use vprintk_func in vprintk() · bd66a892
      Sergey Senozhatsky authored
      vprintk(), just like printk(), better be using per-cpu printk_func
      instead of direct vprintk_emit() call. Just in case if vprintk()
      will ever be called from NMI, or from any other context that can
      deadlock in printk().
      
      Link: http://lkml.kernel.org/r/20161227141611.940-2-sergey.senozhatsky@gmail.com
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Calvin Owens <calvinowens@fb.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Reviewed-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarPetr Mladek <pmladek@suse.com>
      bd66a892
    • Petr Mladek's avatar
      MAINTAINERS: Add printk maintainers · 548cf34b
      Petr Mladek authored
      I and Sergey would like to volunteer as printk code maintainers.
      It is a code that everyone is using, various people fix bugs or
      even add features but there is nobody really interested into
      maintaining it.
      
      I and Sergey have put a lot of effort into understanding the code
      and related problems. We are working on solutions for some long
      term problems. There is a nice summary from the Kernel Summit
      presentation, see https://lwn.net/Articles/705938/
      
      We have already started to use the gained knowledge and comment
      on other printk-related patches. The official role should help
      us to do it more effectively.
      
      Our priorities are:
      
          + prevent deadlocks (printk_safe patchset, console locks)
          + prevent softlocks (async printk, console_sem and flushing)
          + handle other bugs/fixes/features as they come
      
      with this in mind:
      
          + printk is used in different context
          + need special care in some modes, e.g. oops, panic, suspend
          + do best effort to store/show messages
          + the code is already pretty complicated and twisted;
            support clean ups; always think hard if a feature/fix
            is worth any complication
      
      Of course, it still will be much appreciated if other people review
      printk patches.
      
      Regarding the workflow. It will be highly appreciated if the patches
      might still go via Andrew's -mm tree at least for 4.10. In the long
      term, we would like to make Andrew's life easier and handle printk
      patches in an own git tree. But we first need to set it up and get
      familiar with the processes.
      
      Link: http://lkml.kernel.org/r/1481798878-31898-1-git-send-email-pmladek@suse.com
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarPetr Mladek <pmladek@suse.com>
      Signed-off-by: default avatarSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Acked-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      548cf34b
  2. 01 Jan, 2017 2 commits
    • Linus Torvalds's avatar
      Linux 4.10-rc2 · 0c744ea4
      Linus Torvalds authored
      0c744ea4
    • Linus Torvalds's avatar
      Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm · 4759d386
      Linus Torvalds authored
      Pull DAX updates from Dan Williams:
       "The completion of Jan's DAX work for 4.10.
      
        As I mentioned in the libnvdimm-for-4.10 pull request, these are some
        final fixes for the DAX dirty-cacheline-tracking invalidation work
        that was merged through the -mm, ext4, and xfs trees in -rc1. These
        patches were prepared prior to the merge window, but we waited for
        4.10-rc1 to have a stable merge base after all the prerequisites were
        merged.
      
        Quoting Jan on the overall changes in these patches:
      
           "So I'd like all these 6 patches to go for rc2. The first three
            patches fix invalidation of exceptional DAX entries (a bug which
            is there for a long time) - without these patches data loss can
            occur on power failure even though user called fsync(2). The other
            three patches change locking of DAX faults so that ->iomap_begin()
            is called in a more relaxed locking context and we are safe to
            start a transaction there for ext4"
      
        These have received a build success notification from the kbuild
        robot, and pass the latest libnvdimm unit tests. There have not been
        any -next releases since -rc1, so they have not appeared there"
      
      * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
        ext4: Simplify DAX fault path
        dax: Call ->iomap_begin without entry lock during dax fault
        dax: Finish fault completely when loading holes
        dax: Avoid page invalidation races and unnecessary radix tree traversals
        mm: Invalidate DAX radix tree entries only if appropriate
        ext2: Return BH_New buffers for zeroed blocks
      4759d386
  3. 30 Dec, 2016 2 commits
  4. 29 Dec, 2016 2 commits
    • Olof Johansson's avatar
      mm/filemap: fix parameters to test_bit() · 98473f9f
      Olof Johansson authored
       mm/filemap.c: In function 'clear_bit_unlock_is_negative_byte':
        mm/filemap.c:933:9: error: too few arguments to function 'test_bit'
          return test_bit(PG_waiters);
               ^~~~~~~~
      
      Fixes: b91e1302 ('mm: optimize PageWaiters bit use for unlock_page()')
      Signed-off-by: default avatarOlof Johansson <olof@lixom.net>
      Brown-paper-bag-by: default avatarLinus Torvalds <dummy@duh.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      98473f9f
    • Linus Torvalds's avatar
      mm: optimize PageWaiters bit use for unlock_page() · b91e1302
      Linus Torvalds authored
      In commit 62906027 ("mm: add PageWaiters indicating tasks are
      waiting for a page bit") Nick Piggin made our page locking no longer
      unconditionally touch the hashed page waitqueue, which not only helps
      performance in general, but is particularly helpful on NUMA machines
      where the hashed wait queues can bounce around a lot.
      
      However, the "clear lock bit atomically and then test the waiters bit"
      sequence turns out to be much more expensive than it needs to be,
      because you get a nasty stall when trying to access the same word that
      just got updated atomically.
      
      On architectures where locking is done with LL/SC, this would be trivial
      to fix with a new primitive that clears one bit and tests another
      atomically, but that ends up not working on x86, where the only atomic
      operations that return the result end up being cmpxchg and xadd.  The
      atomic bit operations return the old value of the same bit we changed,
      not the value of an unrelated bit.
      
      On x86, we could put the lock bit in the high bit of the byte, and use
      "xadd" with that bit (where the overflow ends up not touching other
      bits), and look at the other bits of the result.  However, an even
      simpler model is to just use a regular atomic "and" to clear the lock
      bit, and then the sign bit in eflags will indicate the resulting state
      of the unrelated bit #7.
      
      So by moving the PageWaiters bit up to bit #7, we can atomically clear
      the lock bit and test the waiters bit on x86 too.  And architectures
      with LL/SC (which is all the usual RISC suspects), the particular bit
      doesn't matter, so they are fine with this approach too.
      
      This avoids the extra access to the same atomic word, and thus avoids
      the costly stall at page unlock time.
      
      The only downside is that the interface ends up being a bit odd and
      specialized: clear a bit in a byte, and test the sign bit.  Nick doesn't
      love the resulting name of the new primitive, but I'd rather make the
      name be descriptive and very clear about the limitation imposed by
      trying to work across all relevant architectures than make it be some
      generic thing that doesn't make the odd semantics explicit.
      
      So this introduces the new architecture primitive
      
          clear_bit_unlock_is_negative_byte();
      
      and adds the trivial implementation for x86.  We have a generic
      non-optimized fallback (that just does a "clear_bit()"+"test_bit(7)"
      combination) which can be overridden by any architecture that can do
      better.  According to Nick, Power has the same hickup x86 has, for
      example, but some other architectures may not even care.
      
      All these optimizations mean that my page locking stress-test (which is
      just executing a lot of small short-lived shell scripts: "make test" in
      the git source tree) no longer makes our page locking look horribly bad.
      Before all these optimizations, just the unlock_page() costs were just
      over 3% of all CPU overhead on "make test".  After this, it's down to
      0.66%, so just a quarter of the cost it used to be.
      
      (The difference on NUMA is bigger, but there this micro-optimization is
      likely less noticeable, since the big issue on NUMA was not the accesses
      to 'struct page', but the waitqueue accesses that were already removed
      by Nick's earlier commit).
      Acked-by: default avatarNick Piggin <npiggin@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Bob Peterson <rpeterso@redhat.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Andrew Lutomirski <luto@kernel.org>
      Cc: Andreas Gruenbacher <agruenba@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b91e1302
  5. 28 Dec, 2016 2 commits
    • Linus Torvalds's avatar
      Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · 2d706e79
      Linus Torvalds authored
      Pull crypto fix from Herbert Xu:
       "This fixes a hash corruption bug in the marvell driver"
      
      * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
        crypto: marvell - Copy IVDIG before launching partial DMA ahash requests
      2d706e79
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 8f18e4d0
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Various ipvlan fixes from Eric Dumazet and Mahesh Bandewar.
      
          The most important is to not assume the packet is RX just because
          the destination address matches that of the device. Such an
          assumption causes problems when an interface is put into loopback
          mode.
      
       2) If we retry when creating a new tc entry (because we dropped the
          RTNL mutex in order to load a module, for example) we end up with
          -EAGAIN and then loop trying to replay the request. But we didn't
          reset some state when looping back to the top like this, and if
          another thread meanwhile inserted the same tc entry we were trying
          to, we re-link it creating an enless loop in the tc chain. Fix from
          Daniel Borkmann.
      
       3) There are two different WRITE bits in the MDIO address register for
          the stmmac chip, depending upon the chip variant. Due to a bug we
          could set them both, fix from Hock Leong Kweh.
      
       4) Fix mlx4 bug in XDP_TX handling, from Tariq Toukan.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
        net: stmmac: fix incorrect bit set in gmac4 mdio addr register
        r8169: add support for RTL8168 series add-on card.
        net: xdp: remove unused bfp_warn_invalid_xdp_buffer()
        openvswitch: upcall: Fix vlan handling.
        ipv4: Namespaceify tcp_tw_reuse knob
        net: korina: Fix NAPI versus resources freeing
        net, sched: fix soft lockup in tc_classify
        net/mlx4_en: Fix user prio field in XDP forward
        tipc: don't send FIN message from connectionless socket
        ipvlan: fix multicast processing
        ipvlan: fix various issues in ipvlan_process_multicast()
      8f18e4d0
  6. 27 Dec, 2016 17 commits
  7. 26 Dec, 2016 5 commits
    • Al Viro's avatar
      arm64: don't pull uaccess.h into *.S · b4b8664d
      Al Viro authored
      Split asm-only parts of arm64 uaccess.h into a new header and use that
      from *.S.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      b4b8664d
    • Florian Fainelli's avatar
      net: korina: Fix NAPI versus resources freeing · e6afb1ad
      Florian Fainelli authored
      Commit beb0babf ("korina: disable napi on close and restart")
      introduced calls to napi_disable() that were missing before,
      unfortunately this leaves a small window during which NAPI has a chance
      to run, yet we just freed resources since korina_free_ring() has been
      called:
      
      Fix this by disabling NAPI first then freeing resource, and make sure
      that we also cancel the restart task before doing the resource freeing.
      
      Fixes: beb0babf ("korina: disable napi on close and restart")
      Reported-by: default avatarAlexandros C. Couloumbis <alex@ozo.com>
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e6afb1ad
    • Daniel Borkmann's avatar
      net, sched: fix soft lockup in tc_classify · 628185cf
      Daniel Borkmann authored
      Shahar reported a soft lockup in tc_classify(), where we run into an
      endless loop when walking the classifier chain due to tp->next == tp
      which is a state we should never run into. The issue only seems to
      trigger under load in the tc control path.
      
      What happens is that in tc_ctl_tfilter(), thread A allocates a new
      tp, initializes it, sets tp_created to 1, and calls into tp->ops->change()
      with it. In that classifier callback we had to unlock/lock the rtnl
      mutex and returned with -EAGAIN. One reason why we need to drop there
      is, for example, that we need to request an action module to be loaded.
      
      This happens via tcf_exts_validate() -> tcf_action_init/_1() meaning
      after we loaded and found the requested action, we need to redo the
      whole request so we don't race against others. While we had to unlock
      rtnl in that time, thread B's request was processed next on that CPU.
      Thread B added a new tp instance successfully to the classifier chain.
      When thread A returned grabbing the rtnl mutex again, propagating -EAGAIN
      and destroying its tp instance which never got linked, we goto replay
      and redo A's request.
      
      This time when walking the classifier chain in tc_ctl_tfilter() for
      checking for existing tp instances we had a priority match and found
      the tp instance that was created and linked by thread B. Now calling
      again into tp->ops->change() with that tp was successful and returned
      without error.
      
      tp_created was never cleared in the second round, thus kernel thinks
      that we need to link it into the classifier chain (once again). tp and
      *back point to the same object due to the match we had earlier on. Thus
      for thread B's already public tp, we reset tp->next to tp itself and
      link it into the chain, which eventually causes the mentioned endless
      loop in tc_classify() once a packet hits the data path.
      
      Fix is to clear tp_created at the beginning of each request, also when
      we replay it. On the paths that can cause -EAGAIN we already destroy
      the original tp instance we had and on replay we really need to start
      from scratch. It seems that this issue was first introduced in commit
      12186be7 ("net_cls: fix unconfigured struct tcf_proto keeps chaining
      and avoid kernel panic when we use cls_cgroup").
      
      Fixes: 12186be7 ("net_cls: fix unconfigured struct tcf_proto keeps chaining and avoid kernel panic when we use cls_cgroup")
      Reported-by: default avatarShahar Klein <shahark@mellanox.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Tested-by: default avatarShahar Klein <shahark@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      628185cf
    • Linus Torvalds's avatar
      Linux 4.10-rc1 · 7ce7d89f
      Linus Torvalds authored
      7ce7d89f
    • Larry Finger's avatar
      powerpc: Fix build warning on 32-bit PPC · 8ae679c4
      Larry Finger authored
      I am getting the following warning when I build kernel 4.9-git on my
      PowerBook G4 with a 32-bit PPC processor:
      
          AS      arch/powerpc/kernel/misc_32.o
        arch/powerpc/kernel/misc_32.S:299:7: warning: "CONFIG_FSL_BOOKE" is not defined [-Wundef]
      
      This problem is evident after commit 989cea5c ("kbuild: prevent
      lib-ksyms.o rebuilds"); however, this change in kbuild only exposes an
      error that has been in the code since 2005 when this source file was
      created.  That was with commit 9994a338 ("powerpc: Introduce
      entry_{32,64}.S, misc_{32,64}.S, systbl.S").
      
      The offending line does not make a lot of sense.  This error does not
      seem to cause any errors in the executable, thus I am not recommending
      that it be applied to any stable versions.
      
      Thanks to Nicholas Piggin for suggesting this solution.
      
      Fixes: 9994a338 ("powerpc: Introduce entry_{32,64}.S, misc_{32,64}.S, systbl.S")
      Signed-off-by: default avatarLarry Finger <Larry.Finger@lwfinger.net>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: linuxppc-dev@lists.ozlabs.org
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      8ae679c4