• Jack Morgenstein's avatar
    IB/mlx4: Reduce SRIOV multicast cleanup warning message to debug level · fb7a9174
    Jack Morgenstein authored
    A warning message during SRIOV multicast cleanup should have actually been
    a debug level message. The condition generating the warning does no harm
    and can fill the message log.
    
    In some cases, during testing, some tests were so intense as to swamp the
    message log with these warning messages, causing a stall in the console
    message log output task. This stall caused an NMI to be sent to all CPUs
    (so that they all dumped their stacks into the message log).
    Aside from the message flood causing an NMI, the tests all passed.
    
    Once the message flood which caused the NMI is removed (by reducing the
    warning message to debug level), the NMI no longer occurs.
    
    Sample message log (console log) output illustrating the flood and
    resultant NMI (snippets with comments and modified with ... instead
    of hex digits, to satisfy checkpatch.pl):
    
     <mlx4_ib> _mlx4_ib_mcg_port_cleanup: ... WARNING: group refcount 1!!!...
     *** About 4000 almost identical lines in less than one second ***
     <mlx4_ib> _mlx4_ib_mcg_port_cleanup: ... WARNING: group refcount 1!!!...
     INFO: rcu_sched detected stalls on CPUs/tasks: { 17} (...)
     *** { 17} above indicates that CPU 17 was the one that stalled ***
     sending NMI to all CPUs:
     ...
     NMI backtrace for cpu 17
     CPU: 17 PID: 45909 Comm: kworker/17:2
     Hardware name: HP ProLiant DL360p Gen8, BIOS P71 09/08/2013
     Workqueue: events fb_flashcursor
     task: ffff880478...... ti: ffff88064e...... task.ti: ffff88064e......
     RIP: 0010:[ffffffff81......]  [ffffffff81......] io_serial_in+0x15/0x20
     RSP: 0018:ffff88064e257cb0  EFLAGS: 00000002
     RAX: 0000000000...... RBX: ffffffff81...... RCX: 0000000000......
     RDX: 0000000000...... RSI: 0000000000...... RDI: ffffffff81......
     RBP: ffff88064e...... R08: ffffffff81...... R09: 0000000000......
     R10: 0000000000...... R11: ffff88064e...... R12: 0000000000......
     R13: 0000000000...... R14: ffffffff81...... R15: 0000000000......
     FS:  0000000000......(0000) GS:ffff8804af......(0000) knlGS:000000000000
     CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080......
     CR2: 00007f2a2f...... CR3: 0000000001...... CR4: 0000000000......
     DR0: 0000000000...... DR1: 0000000000...... DR2: 0000000000......
     DR3: 0000000000...... DR6: 00000000ff...... DR7: 0000000000......
     Stack:
     ffff88064e...... ffffffff81...... ffffffff81...... 0000000000......
     ffffffff81...... ffff88064e...... ffffffff81...... ffffffff81......
     ffffffff81...... ffff88064e...... ffffffff81...... 0000000000......
     Call Trace:
    [<ffffffff813d099b>] wait_for_xmitr+0x3b/0xa0
    [<ffffffff813d0b5c>] serial8250_console_putchar+0x1c/0x30
    [<ffffffff813d0b40>] ? serial8250_console_write+0x140/0x140
    [<ffffffff813cb5fa>] uart_console_write+0x3a/0x80
    [<ffffffff813d0aae>] serial8250_console_write+0xae/0x140
    [<ffffffff8107c4d1>] call_console_drivers.constprop.15+0x91/0xf0
    [<ffffffff8107d6cf>] console_unlock+0x3bf/0x400
    [<ffffffff813503cd>] fb_flashcursor+0x5d/0x140
    [<ffffffff81355c30>] ? bit_clear+0x120/0x120
    [<ffffffff8109d5fb>] process_one_work+0x17b/0x470
    [<ffffffff8109e3cb>] worker_thread+0x11b/0x400
    [<ffffffff8109e2b0>] ? rescuer_thread+0x400/0x400
    [<ffffffff810a5aef>] kthread+0xcf/0xe0
    [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
    [<ffffffff81645858>] ret_from_fork+0x58/0x90
    [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
    Code: 48 89 e5 d3 e6 48 63 f6 48 03 77 10 8b 06 5d c3 66 0f 1f 44 00 00 66 66 66 6
    
    As indicated in the stack trace above, the console output task got swamped.
    
    Fixes: b9c5d6a6 ("IB/mlx4: Add multicast group (MCG) paravirtualization for SR-IOV")
    Cc: <stable@vger.kernel.org> # v3.6+
    Signed-off-by: default avatarJack Morgenstein <jackm@dev.mellanox.co.il>
    Signed-off-by: default avatarLeon Romanovsky <leon@kernel.org>
    Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
    fb7a9174
mcg.c 35.1 KB