Commits · 245311637fddeca96c1f0758a649eb1fb437978e · Kirill Smelkov / linux

03 Feb, 2020 15 commits

netdevsim: remove unused sdev code · 24531163

Taehee Yoo authored Feb 01, 2020

sdev.c code is merged into dev.c and is not used anymore.
it would be removed.
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

24531163

netdevsim: use __GFP_NOWARN to avoid memalloc warning · 83cf4213

Taehee Yoo authored Feb 01, 2020

vfnum buffer size and binary_len buffer size is received by user-space.
So, this buffer size could be too large. If so, kmalloc will internally
print a warning message.
This warning message is actually not necessary for the netdevsim module.
So, this patch adds __GFP_NOWARN.

Test commands:
    modprobe netdevsim
    echo 1 > /sys/bus/netdevsim/new_device
    echo 1000000000 > /sys/devices/netdevsim1/sriov_numvfs

Splat looks like:
[  357.847266][ T1000] WARNING: CPU: 0 PID: 1000 at mm/page_alloc.c:4738 __alloc_pages_nodemask+0x2f3/0x740
[  357.850273][ T1000] Modules linked in: netdevsim veth openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrx
[  357.852989][ T1000] CPU: 0 PID: 1000 Comm: bash Tainted: G    B             5.5.0-rc5+ #270
[  357.854334][ T1000] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[  357.855703][ T1000] RIP: 0010:__alloc_pages_nodemask+0x2f3/0x740
[  357.856669][ T1000] Code: 64 fe ff ff 65 48 8b 04 25 c0 0f 02 00 48 05 f0 12 00 00 41 be 01 00 00 00 49 89 47 0
[  357.860272][ T1000] RSP: 0018:ffff8880b7f47bd8 EFLAGS: 00010246
[  357.861009][ T1000] RAX: ffffed1016fe8f80 RBX: 1ffff11016fe8fae RCX: 0000000000000000
[  357.861843][ T1000] RDX: 0000000000000000 RSI: 0000000000000017 RDI: 0000000000000000
[  357.862661][ T1000] RBP: 0000000000040dc0 R08: 1ffff11016fe8f67 R09: dffffc0000000000
[  357.863509][ T1000] R10: ffff8880b7f47d68 R11: fffffbfff2798180 R12: 1ffff11016fe8f80
[  357.864355][ T1000] R13: 0000000000000017 R14: 0000000000000017 R15: ffff8880c2038d68
[  357.865178][ T1000] FS:  00007fd9a5b8c740(0000) GS:ffff8880d9c00000(0000) knlGS:0000000000000000
[  357.866248][ T1000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  357.867531][ T1000] CR2: 000055ce01ba8100 CR3: 00000000b7dbe005 CR4: 00000000000606f0
[  357.868972][ T1000] Call Trace:
[  357.869423][ T1000]  ? lock_contended+0xcd0/0xcd0
[  357.870001][ T1000]  ? __alloc_pages_slowpath+0x21d0/0x21d0
[  357.870673][ T1000]  ? _kstrtoull+0x76/0x160
[  357.871148][ T1000]  ? alloc_pages_current+0xc1/0x1a0
[  357.871704][ T1000]  kmalloc_order+0x22/0x80
[  357.872184][ T1000]  kmalloc_order_trace+0x1d/0x140
[  357.872733][ T1000]  __kmalloc+0x302/0x3a0
[  357.873204][ T1000]  nsim_bus_dev_numvfs_store+0x1ab/0x260 [netdevsim]
[  357.873919][ T1000]  ? kernfs_get_active+0x12c/0x180
[  357.874459][ T1000]  ? new_device_store+0x450/0x450 [netdevsim]
[  357.875111][ T1000]  ? kernfs_get_parent+0x70/0x70
[  357.875632][ T1000]  ? sysfs_file_ops+0x160/0x160
[  357.876152][ T1000]  kernfs_fop_write+0x276/0x410
[  357.876680][ T1000]  ? __sb_start_write+0x1ba/0x2e0
[  357.877225][ T1000]  vfs_write+0x197/0x4a0
[  357.877671][ T1000]  ksys_write+0x141/0x1d0
[ ... ]
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Fixes: 79579220 ("netdevsim: add SR-IOV functionality")
Fixes: 82c93a87 ("netdevsim: implement couple of testing devlink health reporters")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

83cf4213

netdevsim: use IS_ERR instead of IS_ERR_OR_NULL for debugfs · 6556ff32

Taehee Yoo authored Feb 01, 2020

Debugfs APIs return valid pointer or error pointer. it doesn't return NULL.
So, using IS_ERR is enough, not using IS_ERR_OR_NULL.
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

6556ff32

netdevsim: fix stack-out-of-bounds in nsim_dev_debugfs_init() · 6fb8852b

Taehee Yoo authored Feb 01, 2020

When netdevsim dev is being created, a debugfs directory is created.
The variable "dev_ddir_name" is 16bytes device name pointer and device
name is "netdevsim<dev id>".
The maximum dev id length is 10.
So, 16bytes for device name isn't enough.

Test commands:
    modprobe netdevsim
    echo "1000000000 0" > /sys/bus/netdevsim/new_device

Splat looks like:
[  249.622710][  T900] BUG: KASAN: stack-out-of-bounds in number+0x824/0x880
[  249.623658][  T900] Write of size 1 at addr ffff88804c527988 by task bash/900
[  249.624521][  T900]
[  249.624830][  T900] CPU: 1 PID: 900 Comm: bash Not tainted 5.5.0+ #322
[  249.625691][  T900] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[  249.626712][  T900] Call Trace:
[  249.627103][  T900]  dump_stack+0x96/0xdb
[  249.627639][  T900]  ? number+0x824/0x880
[  249.628173][  T900]  print_address_description.constprop.5+0x1be/0x360
[  249.629022][  T900]  ? number+0x824/0x880
[  249.629569][  T900]  ? number+0x824/0x880
[  249.630105][  T900]  __kasan_report+0x12a/0x170
[  249.630717][  T900]  ? number+0x824/0x880
[  249.631201][  T900]  kasan_report+0xe/0x20
[  249.631723][  T900]  number+0x824/0x880
[  249.632235][  T900]  ? put_dec+0xa0/0xa0
[  249.632716][  T900]  ? rcu_read_lock_sched_held+0x90/0xc0
[  249.633392][  T900]  vsnprintf+0x63c/0x10b0
[  249.633983][  T900]  ? pointer+0x5b0/0x5b0
[  249.634543][  T900]  ? mark_lock+0x11d/0xc40
[  249.635200][  T900]  sprintf+0x9b/0xd0
[  249.635750][  T900]  ? scnprintf+0xe0/0xe0
[  249.636370][  T900]  nsim_dev_probe+0x63c/0xbf0 [netdevsim]
[ ... ]
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Fixes: ab1d0cc0 ("netdevsim: change debugfs tree topology")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

6fb8852b

netdevsim: fix panic in nsim_dev_take_snapshot_write() · 8526ad96

Taehee Yoo authored Feb 01, 2020

nsim_dev_take_snapshot_write() uses nsim_dev and nsim_dev->dummy_region.
So, during this function, these data shouldn't be removed.
But there is no protecting stuff in this function.

There are two similar cases.
1. reload case
reload could be called during nsim_dev_take_snapshot_write().
When reload is being executed, nsim_dev_reload_down() is called and it
calls nsim_dev_reload_destroy(). nsim_dev_reload_destroy() calls
devlink_region_destroy() to destroy nsim_dev->dummy_region.
So, during nsim_dev_take_snapshot_write(), nsim_dev->dummy_region()
would be removed.
At this point, snapshot_write() would access freed pointer.
In order to fix this case, take_snapshot file will be removed before
devlink_region_destroy().
The take_snapshot file will be re-created by ->reload_up().

2. del_device_store case
del_device_store() also could call nsim_dev_reload_destroy()
during nsim_dev_take_snapshot_write(). If so, panic would occur.
This problem is actually the same problem with the first case.
So, this problem will be fixed by the first case's solution.

Test commands:
    modprobe netdevsim
    while :
    do
        echo 1 > /sys/bus/netdevsim/new_device &
        echo 1 > /sys/bus/netdevsim/del_device &
	devlink dev reload netdevsim/netdevsim1 &
	echo 1 > /sys/kernel/debug/netdevsim/netdevsim1/take_snapshot &
    done

Splat looks like:
[   45.564513][  T975] general protection fault, probably for non-canonical address 0xdffffc000000003a: 0000 [#1] SMP DEI
[   45.566131][  T975] KASAN: null-ptr-deref in range [0x00000000000001d0-0x00000000000001d7]
[   45.566135][  T975] CPU: 1 PID: 975 Comm: bash Not tainted 5.5.0+ #322
[   45.569020][  T975] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[   45.569026][  T975] RIP: 0010:__mutex_lock+0x10a/0x14b0
[   45.570518][  T975] Code: 08 84 d2 0f 85 7f 12 00 00 44 8b 0d 10 23 65 02 45 85 c9 75 29 49 8d 7f 68 48 b8 00 00 00 0f
[   45.570522][  T975] RSP: 0018:ffff888046ccfbf0 EFLAGS: 00010206
[   45.572305][  T975] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
[   45.572308][  T975] RDX: 000000000000003a RSI: ffffffffac926440 RDI: 00000000000001d0
[   45.576843][  T975] RBP: ffff888046ccfd70 R08: ffffffffab610645 R09: 0000000000000000
[   45.576847][  T975] R10: ffff888046ccfd90 R11: ffffed100d6360ad R12: 0000000000000000
[   45.578471][  T975] R13: dffffc0000000000 R14: ffffffffae1976c0 R15: 0000000000000168
[   45.578475][  T975] FS:  00007f614d6e7740(0000) GS:ffff88806c400000(0000) knlGS:0000000000000000
[   45.581492][  T975] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   45.582942][  T975] CR2: 00005618677d1cf0 CR3: 000000005fb9c002 CR4: 00000000000606e0
[   45.584543][  T975] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   45.586633][  T975] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   45.589889][  T975] Call Trace:
[   45.591445][  T975]  ? devlink_region_snapshot_create+0x55/0x4a0
[   45.601250][  T975]  ? mutex_lock_io_nested+0x1380/0x1380
[   45.602817][  T975]  ? mutex_lock_io_nested+0x1380/0x1380
[   45.603875][  T975]  ? mark_held_locks+0xa5/0xe0
[   45.604769][  T975]  ? _raw_spin_unlock_irqrestore+0x2d/0x50
[   45.606147][  T975]  ? __mutex_unlock_slowpath+0xd0/0x670
[   45.607723][  T975]  ? crng_backtrack_protect+0x80/0x80
[   45.613530][  T975]  ? wait_for_completion+0x390/0x390
[   45.615152][  T975]  ? devlink_region_snapshot_create+0x55/0x4a0
[   45.616834][  T975]  devlink_region_snapshot_create+0x55/0x4a0
[ ... ]

Fixes: 4418f862 ("netdevsim: implement support for devlink region and snapshots")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

8526ad96

netdevsim: disable devlink reload when resources are being used · 6ab63366

Taehee Yoo authored Feb 01, 2020

devlink reload destroys resources and allocates resources again.
So, when devices and ports resources are being used, devlink reload
function should not be executed. In order to avoid this race, a new
lock is added and new_port() and del_port() call devlink_reload_disable()
and devlink_reload_enable().

Thread0                      Thread1
{new/del}_port()             {new/del}_port()
devlink_reload_disable()
                             devlink_reload_disable()
devlink_reload_enable()
                             //here
                             devlink_reload_enable()

Before Thread1's devlink_reload_enable(), the devlink is already allowed
to execute reload because Thread0 allows it. devlink reload disable/enable
variable type is bool. So the above case would exist.
So, disable/enable should be executed atomically.
In order to do that, a new lock is used.

Test commands:
    modprobe netdevsim
    echo 1 > /sys/bus/netdevsim/new_device
    while :
    do
        echo 1 > /sys/devices/netdevsim1/new_port &
        echo 1 > /sys/devices/netdevsim1/del_port &
        devlink dev reload netdevsim/netdevsim1 &
    done

Splat looks like:
[   23.342145][  T932] DEBUG_LOCKS_WARN_ON(mutex_is_locked(lock))
[   23.342159][  T932] WARNING: CPU: 0 PID: 932 at kernel/locking/mutex-debug.c:103 mutex_destroy+0xc7/0xf0
[   23.344182][  T932] Modules linked in: netdevsim openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_dx
[   23.346485][  T932] CPU: 0 PID: 932 Comm: devlink Not tainted 5.5.0+ #322
[   23.347696][  T932] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[   23.348893][  T932] RIP: 0010:mutex_destroy+0xc7/0xf0
[   23.349505][  T932] Code: e0 07 83 c0 03 38 d0 7c 04 84 d2 75 2e 8b 05 00 ac b0 02 85 c0 75 8b 48 c7 c6 00 5e 07 96 40
[   23.351887][  T932] RSP: 0018:ffff88806208f810 EFLAGS: 00010286
[   23.353963][  T932] RAX: dffffc0000000008 RBX: ffff888067f6f2c0 RCX: ffffffff942c4bd4
[   23.355222][  T932] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff96dac5b4
[   23.356169][  T932] RBP: ffff888067f6f000 R08: fffffbfff2d235a5 R09: fffffbfff2d235a5
[   23.357160][  T932] R10: 0000000000000001 R11: fffffbfff2d235a4 R12: ffff888067f6f208
[   23.358288][  T932] R13: ffff88806208fa70 R14: ffff888067f6f000 R15: ffff888069ce3800
[   23.359307][  T932] FS:  00007fe2a3876740(0000) GS:ffff88806c000000(0000) knlGS:0000000000000000
[   23.360473][  T932] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   23.361319][  T932] CR2: 00005561357aa000 CR3: 000000005227a006 CR4: 00000000000606f0
[   23.362323][  T932] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   23.363417][  T932] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   23.364414][  T932] Call Trace:
[   23.364828][  T932]  nsim_dev_reload_destroy+0x77/0xb0 [netdevsim]
[   23.365655][  T932]  nsim_dev_reload_down+0x84/0xb0 [netdevsim]
[   23.366433][  T932]  devlink_reload+0xb1/0x350
[   23.367010][  T932]  genl_rcv_msg+0x580/0xe90

[ ...]

[   23.531729][ T1305] kernel BUG at lib/list_debug.c:53!
[   23.532523][ T1305] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[   23.533467][ T1305] CPU: 2 PID: 1305 Comm: bash Tainted: G        W         5.5.0+ #322
[   23.534962][ T1305] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[   23.536503][ T1305] RIP: 0010:__list_del_entry_valid+0xe6/0x150
[   23.538346][ T1305] Code: 89 ea 48 c7 c7 00 73 1e 96 e8 df f7 4c ff 0f 0b 48 c7 c7 60 73 1e 96 e8 d1 f7 4c ff 0f 0b 44
[   23.541068][ T1305] RSP: 0018:ffff888047c27b58 EFLAGS: 00010282
[   23.542001][ T1305] RAX: 0000000000000054 RBX: ffff888067f6f318 RCX: 0000000000000000
[   23.543051][ T1305] RDX: 0000000000000054 RSI: 0000000000000008 RDI: ffffed1008f84f61
[   23.544072][ T1305] RBP: ffff88804aa0fca0 R08: ffffed100d940539 R09: ffffed100d940539
[   23.545085][ T1305] R10: 0000000000000001 R11: ffffed100d940538 R12: ffff888047c27cb0
[   23.546422][ T1305] R13: ffff88806208b840 R14: ffffffff981976c0 R15: ffff888067f6f2c0
[   23.547406][ T1305] FS:  00007f76c0431740(0000) GS:ffff88806c800000(0000) knlGS:0000000000000000
[   23.548527][ T1305] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   23.549389][ T1305] CR2: 00007f5048f1a2f8 CR3: 000000004b310006 CR4: 00000000000606e0
[   23.550636][ T1305] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   23.551578][ T1305] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   23.552597][ T1305] Call Trace:
[   23.553004][ T1305]  mutex_remove_waiter+0x101/0x520
[   23.553646][ T1305]  __mutex_lock+0xac7/0x14b0
[   23.554218][ T1305]  ? nsim_dev_port_del+0x4e/0x140 [netdevsim]
[   23.554908][ T1305]  ? mutex_lock_io_nested+0x1380/0x1380
[   23.555570][ T1305]  ? _parse_integer+0xf0/0xf0
[   23.556043][ T1305]  ? kstrtouint+0x86/0x110
[   23.556504][ T1305]  ? nsim_dev_port_del+0x4e/0x140 [netdevsim]
[   23.557133][ T1305]  nsim_dev_port_del+0x4e/0x140 [netdevsim]
[   23.558024][ T1305]  del_port_store+0xcc/0xf0 [netdevsim]
[ ... ]

Fixes: 75ba029f ("netdevsim: implement proper devlink reload")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

6ab63366

netdevsim: fix using uninitialized resources · f5cd2160

Taehee Yoo authored Feb 01, 2020

When module is being initialized, __init() calls bus_register() and
driver_register().
These functions internally create various resources and sysfs files.
The sysfs files are used for basic operations(add/del device).
/sys/bus/netdevsim/new_device
/sys/bus/netdevsim/del_device

These sysfs files use netdevsim resources, they are mostly allocated
and initialized in ->probe() function, which is nsim_dev_probe().
But, sysfs files could be executed before ->probe() is finished.
So, accessing uninitialized data would occur.

Another problem is very similar.
/sys/bus/netdevsim/new_device internally creates sysfs files.
/sys/devices/netdevsim<id>/new_port
/sys/devices/netdevsim<id>/del_port

These sysfs files also use netdevsim resources, they are mostly allocated
and initialized in creating device routine, which is nsim_bus_dev_new().
But they also could be executed before nsim_bus_dev_new() is finished.
So, accessing uninitialized data would occur.

To fix these problems, this patch adds flags, which means whether the
operation is finished or not.
The flag variable 'nsim_bus_enable' means whether netdevsim bus was
initialized or not.
This is protected by nsim_bus_dev_list_lock.
The flag variable 'nsim_bus_dev->init' means whether nsim_bus_dev was
initialized or not.
This could be used in {new/del}_port_store() with no lock.

Test commands:
    #SHELL1
    modprobe netdevsim
    while :
    do
        echo "1 1" > /sys/bus/netdevsim/new_device
        echo "1 1" > /sys/bus/netdevsim/del_device
    done

    #SHELL2
    while :
    do
        echo 1 > /sys/devices/netdevsim1/new_port
        echo 1 > /sys/devices/netdevsim1/del_port
    done

Splat looks like:
[   47.508954][ T1008] general protection fault, probably for non-canonical address 0xdffffc0000000021: 0000 I
[   47.510793][ T1008] KASAN: null-ptr-deref in range [0x0000000000000108-0x000000000000010f]
[   47.511963][ T1008] CPU: 2 PID: 1008 Comm: bash Not tainted 5.5.0+ #322
[   47.512823][ T1008] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[   47.514041][ T1008] RIP: 0010:__mutex_lock+0x10a/0x14b0
[   47.514699][ T1008] Code: 08 84 d2 0f 85 7f 12 00 00 44 8b 0d 10 23 65 02 45 85 c9 75 29 49 8d 7f 68 48 b8 00 00 00 0f
[   47.517163][ T1008] RSP: 0018:ffff888059b4fbb0 EFLAGS: 00010206
[   47.517802][ T1008] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
[   47.518941][ T1008] RDX: 0000000000000021 RSI: ffffffff85926440 RDI: 0000000000000108
[   47.519732][ T1008] RBP: ffff888059b4fd30 R08: ffffffffc073fad0 R09: 0000000000000000
[   47.520729][ T1008] R10: ffff888059b4fd50 R11: ffff88804bb38040 R12: 0000000000000000
[   47.521702][ T1008] R13: dffffc0000000000 R14: ffffffff871976c0 R15: 00000000000000a0
[   47.522760][ T1008] FS:  00007fd4be05a740(0000) GS:ffff88806c800000(0000) knlGS:0000000000000000
[   47.523877][ T1008] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   47.524627][ T1008] CR2: 0000561c82b69cf0 CR3: 0000000065dd6004 CR4: 00000000000606e0
[   47.527662][ T1008] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   47.528604][ T1008] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   47.529531][ T1008] Call Trace:
[   47.529874][ T1008]  ? nsim_dev_port_add+0x50/0x150 [netdevsim]
[   47.530470][ T1008]  ? mutex_lock_io_nested+0x1380/0x1380
[   47.531018][ T1008]  ? _kstrtoull+0x76/0x160
[   47.531449][ T1008]  ? _parse_integer+0xf0/0xf0
[   47.531874][ T1008]  ? kernfs_fop_write+0x1cf/0x410
[   47.532330][ T1008]  ? sysfs_file_ops+0x160/0x160
[   47.532773][ T1008]  ? kstrtouint+0x86/0x110
[   47.533168][ T1008]  ? nsim_dev_port_add+0x50/0x150 [netdevsim]
[   47.533721][ T1008]  nsim_dev_port_add+0x50/0x150 [netdevsim]
[   47.534336][ T1008]  ? sysfs_file_ops+0x160/0x160
[   47.534858][ T1008]  new_port_store+0x99/0xb0 [netdevsim]
[   47.535439][ T1008]  ? del_port_store+0xb0/0xb0 [netdevsim]
[   47.536035][ T1008]  ? sysfs_file_ops+0x112/0x160
[   47.536544][ T1008]  ? sysfs_kf_write+0x3b/0x180
[   47.537029][ T1008]  kernfs_fop_write+0x276/0x410
[   47.537548][ T1008]  ? __sb_start_write+0x215/0x2e0
[   47.538110][ T1008]  vfs_write+0x197/0x4a0
[ ... ]

Fixes: f9d9db47 ("netdevsim: add bus attributes to add new and delete devices")
Fixes: 794b2c05 ("netdevsim: extend device attrs to support port addition and deletion")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

f5cd2160

Merge branch 'bnxt_en-Bug-fixes' · 2b5ea294

Jakub Kicinski authored Feb 03, 2020

Michael Chan says:

=====================
bnxt_en: Bug fixes

3 patches that fix some issues in the firmware reset logic, starting
with a small patch to refactor the code that re-enables SRIOV.  The
last patch fixes a TC queue mapping issue.
====================
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2b5ea294

bnxt_en: Fix TC queue mapping. · 18e4960c

Michael Chan authored Feb 02, 2020

The driver currently only calls netdev_set_tc_queue when the number of
TCs is greater than 1. Instead, the comparison should be greater than
or equal to 1. Even with 1 TC, we need to set the queue mapping.

This bug can cause warnings when the number of TCs is changed back to 1.

Fixes: 7809592d ("bnxt_en: Enable MSIX early in bnxt_init_one().")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

18e4960c

bnxt_en: Fix logic that disables Bus Master during firmware reset. · d4073028

Vasundhara Volam authored Feb 02, 2020

The current logic that calls pci_disable_device() in __bnxt_close_nic()
during firmware reset is flawed.  If firmware is still alive, we're
disabling the device too early, causing some firmware commands to
not reach the firmware.

Fix it by moving the logic to bnxt_reset_close().  If firmware is
in fatal condition, we call pci_disable_device() before we free
any of the rings to prevent DMA corruption of the freed rings.  If
firmware is still alive, we call pci_disable_device() after the
last firmware message has been sent.

Fixes: 3bc7d4a3 ("bnxt_en: Add BNXT_STATE_IN_FW_RESET state.")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

d4073028

bnxt_en: Fix RDMA driver failure with SRIOV after firmware reset. · 12de2ead

Michael Chan authored Feb 02, 2020

bnxt_ulp_start() needs to be called before SRIOV is re-enabled after
firmware reset. Re-enabling SRIOV may consume all the resources and
may cause the RDMA driver to fail to get MSIX and other resources.
Fix it by calling bnxt_ulp_start() first before calling
bnxt_reenable_sriov().

We re-arrange the logic so that we call bnxt_ulp_start() and
bnxt_reenable_sriov() in proper sequence in bnxt_fw_reset_task() and
bnxt_open(). The former is the normal coordinated firmware reset sequence
and the latter is firmware reset while the function is down. This new
logic is now more straight forward and will now fix both scenarios.

Fixes: f3a6d206 ("bnxt_en: Call bnxt_ulp_stop()/bnxt_ulp_start() during error recovery.")
Reported-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

12de2ead

bnxt_en: Refactor logic to re-enable SRIOV after firmware reset detected. · c16d4ee0

Michael Chan authored Feb 02, 2020

Put the current logic in bnxt_open() to re-enable SRIOV after detecting
firmware reset into a new function bnxt_reenable_sriov(). This call
needs to be invoked in the firmware reset path also in the next patch.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

c16d4ee0

net: stmmac: Delete txtimer in suspend() · 14b41a29

Nicolin Chen authored Jan 31, 2020

When running v5.5 with a rootfs on NFS, memory abort may happen in
the system resume stage:
 Unable to handle kernel paging request at virtual address dead00000000012a
 [dead00000000012a] address between user and kernel address ranges
 pc : run_timer_softirq+0x334/0x3d8
 lr : run_timer_softirq+0x244/0x3d8
 x1 : ffff800011cafe80 x0 : dead000000000122
 Call trace:
  run_timer_softirq+0x334/0x3d8
  efi_header_end+0x114/0x234
  irq_exit+0xd0/0xd8
  __handle_domain_irq+0x60/0xb0
  gic_handle_irq+0x58/0xa8
  el1_irq+0xb8/0x180
  arch_cpu_idle+0x10/0x18
  do_idle+0x1d8/0x2b0
  cpu_startup_entry+0x24/0x40
  secondary_start_kernel+0x1b4/0x208
 Code: f9000693 a9400660 f9000020 b4000040 (f9000401)
 ---[ end trace bb83ceeb4c482071 ]---
 Kernel panic - not syncing: Fatal exception in interrupt
 SMP: stopping secondary CPUs
 SMP: failed to stop secondary CPUs 2-3
 Kernel Offset: disabled
 CPU features: 0x00002,2300aa30
 Memory Limit: none
 ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

It's found that stmmac_xmit() and stmmac_resume() sometimes might
run concurrently, possibly resulting in a race condition between
mod_timer() and setup_timer(), being called by stmmac_xmit() and
stmmac_resume() respectively.

Since the resume() runs setup_timer() every time, it'd be safer to
have del_timer_sync() in the suspend() as the counterpart.
Signed-off-by: Nicolin Chen <nicoleotsuka@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

14b41a29

Merge tag 'rxrpc-fixes-20200203' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs · 3d80c653

Jakub Kicinski authored Feb 03, 2020

David Howells says:

====================
RxRPC fixes

Here are a number of fixes for AF_RXRPC:

 (1) Fix a potential use after free in rxrpc_put_local() where it was
     accessing the object just put to get tracing information.

 (2) Fix insufficient notifications being generated by the function that
     queues data packets on a call.  This occasionally causes recvmsg() to
     stall indefinitely.

 (3) Fix a number of packet-transmitting work functions to hold an active
     count on the local endpoint so that the UDP socket doesn't get
     destroyed whilst they're calling kernel_sendmsg() on it.

 (4) Fix a NULL pointer deref that stemmed from a call's connection pointer
     being cleared when the call was disconnected.

Changes:

 v2: Removed a couple of BUG() statements that got added.
====================
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

3d80c653

rxrpc: Fix NULL pointer deref due to call->conn being cleared on disconnect · 5273a191

David Howells authored Jan 30, 2020

When a call is disconnected, the connection pointer from the call is
cleared to make sure it isn't used again and to prevent further attempted
transmission for the call. Unfortunately, there might be a daemon trying
to use it at the same time to transmit a packet.

Fix this by keeping call->conn set, but setting a flag on the call to
indicate disconnection instead.

Remove also the bits in the transmission functions where the conn pointer is
checked and a ref taken under spinlock as this is now redundant.

Fixes: 8d94aa38 ("rxrpc: Calls shouldn't hold socket refs")
Signed-off-by: David Howells <dhowells@redhat.com>

5273a191

02 Feb, 2020 4 commits

Merge branch 'Fix-reconnection-latency-caused-by-FIN-ACK-handling-race' · 83d0585f

Jakub Kicinski authored Feb 02, 2020

SeongJae Park says:

====================
Fix reconnection latency caused by FIN/ACK handling race

The first patch fixes the problem by adjusting the first resend delay of
the SYN in the case.  The second one adds a user space test to reproduce
this problem.

From v2
(https://lore.kernel.org/linux-kselftest/20200201071859.4231-1-sj38.park@gmail.com/)
 - Use TCP_TIMEOUT_MIN as reduced delay (Neal Cardwall)
 - Add Reviewed-by and Signed-off-by from Eric Dumazet

From v1
(https://lore.kernel.org/linux-kselftest/20200131122421.23286-1-sjpark@amazon.com/)
 - Drop the trivial comment fix patch (Eric Dumazet)
 - Limit the delay adjustment to only the first SYN resend (Eric Dumazet)
 - selftest: Avoid use of hard-coded port number (Eric Dumazet)
 - Explain RST/ACK and FIN/ACK has no big difference (Neal Cardwell)
====================
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

83d0585f

selftests: net: Add FIN_ACK processing order related latency spike test · af8c8a45

SeongJae Park authored Feb 02, 2020

This commit adds a test for FIN_ACK process races related reconnection
latency spike issues.  The issue has described and solved by the
previous commit ("tcp: Reduce SYN resend delay if a suspicous ACK is
received").

The test program is configured with a server and a client process.  The
server creates and binds a socket to a port that dynamically allocated,
listen on it, and start a infinite loop.  Inside the loop, it accepts
connection, reads 4 bytes from the socket, and closes the connection.
The client is constructed as an infinite loop.  Inside the loop, it
creates a socket with LINGER and NODELAY option, connect to the server,
send 4 bytes data, try read some data from server.  After the read()
returns, it measure the latency from the beginning of this loop to this
point and if the latency is larger than 1 second (spike), print a
message.
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

af8c8a45

tcp: Reduce SYN resend delay if a suspicous ACK is received · 9603d47b

SeongJae Park authored Feb 02, 2020

When closing a connection, the two acks that required to change closing
socket's status to FIN_WAIT_2 and then TIME_WAIT could be processed in
reverse order.  This is possible in RSS disabled environments such as a
connection inside a host.

For example, expected state transitions and required packets for the
disconnection will be similar to below flow.

	 00 (Process A)				(Process B)
	 01 ESTABLISHED				ESTABLISHED
	 02 close()
	 03 FIN_WAIT_1
	 04 		---FIN-->
	 05 					CLOSE_WAIT
	 06 		<--ACK---
	 07 FIN_WAIT_2
	 08 		<--FIN/ACK---
	 09 TIME_WAIT
	 10 		---ACK-->
	 11 					LAST_ACK
	 12 CLOSED				CLOSED

In some cases such as LINGER option applied socket, the FIN and FIN/ACK
will be substituted to RST and RST/ACK, but there is no difference in
the main logic.

The acks in lines 6 and 8 are the acks.  If the line 8 packet is
processed before the line 6 packet, it will be just ignored as it is not
a expected packet, and the later process of the line 6 packet will
change the status of Process A to FIN_WAIT_2, but as it has already
handled line 8 packet, it will not go to TIME_WAIT and thus will not
send the line 10 packet to Process B.  Thus, Process B will left in
CLOSE_WAIT status, as below.

	 00 (Process A)				(Process B)
	 01 ESTABLISHED				ESTABLISHED
	 02 close()
	 03 FIN_WAIT_1
	 04 		---FIN-->
	 05 					CLOSE_WAIT
	 06 				(<--ACK---)
	 07	  			(<--FIN/ACK---)
	 08 				(fired in right order)
	 09 		<--FIN/ACK---
	 10 		<--ACK---
	 11 		(processed in reverse order)
	 12 FIN_WAIT_2

Later, if the Process B sends SYN to Process A for reconnection using
the same port, Process A will responds with an ACK for the last flow,
which has no increased sequence number.  Thus, Process A will send RST,
wait for TIMEOUT_INIT (one second in default), and then try
reconnection.  If reconnections are frequent, the one second latency
spikes can be a big problem.  Below is a tcpdump results of the problem:

    14.436259 IP 127.0.0.1.45150 > 127.0.0.1.4242: Flags [S], seq 2560603644
    14.436266 IP 127.0.0.1.4242 > 127.0.0.1.45150: Flags [.], ack 5, win 512
    14.436271 IP 127.0.0.1.45150 > 127.0.0.1.4242: Flags [R], seq 2541101298
    /* ONE SECOND DELAY */
    15.464613 IP 127.0.0.1.45150 > 127.0.0.1.4242: Flags [S], seq 2560603644

This commit mitigates the problem by reducing the delay for the next SYN
if the suspicous ACK is received while in SYN_SENT state.

Following commit will add a selftest, which can be also helpful for
understanding of this issue.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

9603d47b

MAINTAINERS: correct entries for ISDN/mISDN section · dff6bc1b

Lukas Bulwahn authored Feb 01, 2020

Commit 6d979850 ("isdn: move capi drivers to staging") cleaned up the
isdn drivers and split the MAINTAINERS section for ISDN, but missed to add
the terminal slash for the two directories mISDN and hardware. Hence, all
files in those directories were not part of the new ISDN/mISDN SUBSYSTEM,
but were considered to be part of "THE REST".

Rectify the situation, and while at it, also complete the section with two
further build files that belong to that subsystem.

This was identified with a small script that finds all files belonging to
"THE REST" according to the current MAINTAINERS file, and I investigated
upon its output.

Fixes: 6d979850 ("isdn: move capi drivers to staging")
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dff6bc1b

01 Feb, 2020 9 commits

Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · b7c3a17c

Jakub Kicinski authored Feb 01, 2020

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) Fix suspicious RCU usage in ipset, from Jozsef Kadlecsik.

2) Use kvcalloc, from Joe Perches.

3) Flush flowtable hardware workqueue after garbage collection run,
   from Paul Blakey.

4) Missing flowtable hardware workqueue flush from nf_flow_table_free(),
   also from Paul.

5) Restore NF_FLOW_HW_DEAD in flow_offload_work_del(), from Paul.

6) Flowtable documentation fixes, from Matteo Croce.
====================
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

b7c3a17c

cls_rsvp: fix rsvp_policy · cb3c0e6b

Eric Dumazet authored Jan 31, 2020

NLA_BINARY can be confusing, since .len value represents
the max size of the blob.

cls_rsvp really wants user space to provide long enough data
for TCA_RSVP_DST and TCA_RSVP_SRC attributes.

BUG: KMSAN: uninit-value in rsvp_get net/sched/cls_rsvp.h:258 [inline]
BUG: KMSAN: uninit-value in gen_handle net/sched/cls_rsvp.h:402 [inline]
BUG: KMSAN: uninit-value in rsvp_change+0x1ae9/0x4220 net/sched/cls_rsvp.h:572
CPU: 1 PID: 13228 Comm: syz-executor.1 Not tainted 5.5.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1c9/0x220 lib/dump_stack.c:118
 kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:118
 __msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:215
 rsvp_get net/sched/cls_rsvp.h:258 [inline]
 gen_handle net/sched/cls_rsvp.h:402 [inline]
 rsvp_change+0x1ae9/0x4220 net/sched/cls_rsvp.h:572
 tc_new_tfilter+0x31fe/0x5010 net/sched/cls_api.c:2104
 rtnetlink_rcv_msg+0xcb7/0x1570 net/core/rtnetlink.c:5415
 netlink_rcv_skb+0x451/0x650 net/netlink/af_netlink.c:2477
 rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:5442
 netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
 netlink_unicast+0xf9e/0x1100 net/netlink/af_netlink.c:1328
 netlink_sendmsg+0x1248/0x14d0 net/netlink/af_netlink.c:1917
 sock_sendmsg_nosec net/socket.c:639 [inline]
 sock_sendmsg net/socket.c:659 [inline]
 ____sys_sendmsg+0x12b6/0x1350 net/socket.c:2330
 ___sys_sendmsg net/socket.c:2384 [inline]
 __sys_sendmsg+0x451/0x5f0 net/socket.c:2417
 __do_sys_sendmsg net/socket.c:2426 [inline]
 __se_sys_sendmsg+0x97/0xb0 net/socket.c:2424
 __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2424
 do_syscall_64+0xb8/0x160 arch/x86/entry/common.c:296
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x45b349
Code: ad b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f269d43dc78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f269d43e6d4 RCX: 000000000045b349
RDX: 0000000000000000 RSI: 00000000200001c0 RDI: 0000000000000003
RBP: 000000000075bfc8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 00000000000009c2 R14: 00000000004cb338 R15: 000000000075bfd4

Uninit was created at:
 kmsan_save_stack_with_flags mm/kmsan/kmsan.c:144 [inline]
 kmsan_internal_poison_shadow+0x66/0xd0 mm/kmsan/kmsan.c:127
 kmsan_slab_alloc+0x8a/0xe0 mm/kmsan/kmsan_hooks.c:82
 slab_alloc_node mm/slub.c:2774 [inline]
 __kmalloc_node_track_caller+0xb40/0x1200 mm/slub.c:4382
 __kmalloc_reserve net/core/skbuff.c:141 [inline]
 __alloc_skb+0x2fd/0xac0 net/core/skbuff.c:209
 alloc_skb include/linux/skbuff.h:1049 [inline]
 netlink_alloc_large_skb net/netlink/af_netlink.c:1174 [inline]
 netlink_sendmsg+0x7d3/0x14d0 net/netlink/af_netlink.c:1892
 sock_sendmsg_nosec net/socket.c:639 [inline]
 sock_sendmsg net/socket.c:659 [inline]
 ____sys_sendmsg+0x12b6/0x1350 net/socket.c:2330
 ___sys_sendmsg net/socket.c:2384 [inline]
 __sys_sendmsg+0x451/0x5f0 net/socket.c:2417
 __do_sys_sendmsg net/socket.c:2426 [inline]
 __se_sys_sendmsg+0x97/0xb0 net/socket.c:2424
 __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2424
 do_syscall_64+0xb8/0x160 arch/x86/entry/common.c:296
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: 6fa8c014 ("[NET_SCHED]: Use nla_policy for attribute validation in classifiers")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

cb3c0e6b

MAINTAINERS: Orphan HSR network protocol · e8d5bb4d

Sven Eckelmann authored Jan 31, 2020

The current maintainer Arvid Brodin <arvid.brodin@alten.se> hasn't
contributed to the kernel since 2015-02-27. His company mail address is
also bouncing and the company confirmed (2020-01-31) that no Arvid Brodin
is working for them:

> Vi har dessvärre ingen  Arvid Brodin som arbetar på ALTEN.

A MIA person cannot be the maintainer. It is better to mark is as orphaned
until some other person can jump in and take over the responsibility for
HSR.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

e8d5bb4d

qed: Fix a error code in qed_hw_init() · d32a06f5

Dan Carpenter authored Jan 31, 2020

If the qed_fw_overlay_mem_alloc() then we should return -ENOMEM instead
of success.

Fixes: 30d5f858 ("qed: FW 8.42.2.0 Add fw overlay feature")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

d32a06f5

octeontx2-pf: Fix an IS_ERR() vs NULL bug · 08ff7818

Dan Carpenter authored Jan 31, 2020

The otx2_mbox_get_rsp() function never returns NULL, it returns error
pointers on error.

Fixes: 34bfe0eb ("octeontx2-pf: MTU, MAC and RX mode config support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

08ff7818

tcp: clear tp->segs_{in|out} in tcp_disconnect() · 784f8344

Eric Dumazet authored Jan 31, 2020

tp->segs_in and tp->segs_out need to be cleared in tcp_disconnect().

tcp_disconnect() is rarely used, but it is worth fixing it.

Fixes: 2efd055c ("tcp: add tcpi_segs_in and tcpi_segs_out to tcp_info")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Marcelo Ricardo Leitner <mleitner@redhat.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

784f8344

tcp: clear tp->data_segs{in|out} in tcp_disconnect() · db7ffee6

Eric Dumazet authored Jan 31, 2020

tp->data_segs_in and tp->data_segs_out need to be cleared
in tcp_disconnect().

tcp_disconnect() is rarely used, but it is worth fixing it.

Fixes: a44d6eac ("tcp: Add RFC4898 tcpEStatsPerfDataSegsOut/In")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

db7ffee6

tcp: clear tp->delivered in tcp_disconnect() · 2fbdd562

Eric Dumazet authored Jan 31, 2020

tp->delivered needs to be cleared in tcp_disconnect().

tcp_disconnect() is rarely used, but it is worth fixing it.

Fixes: ddf1af6f ("tcp: new delivery accounting")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2fbdd562

tcp: clear tp->total_retrans in tcp_disconnect() · c13c48c0

Eric Dumazet authored Jan 31, 2020

total_retrans needs to be cleared in tcp_disconnect().

tcp_disconnect() is rarely used, but it is worth fixing it.

Fixes: 1da177e4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: SeongJae Park <sjpark@amazon.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

c13c48c0

31 Jan, 2020 10 commits

netfilter: nf_flowtable: fix documentation · 78e06cf4

Matteo Croce authored Jan 30, 2020

In the flowtable documentation there is a missing semicolon, the command
as is would give this error:

    nftables.conf:5:27-33: Error: syntax error, unexpected devices, expecting newline or semicolon
                    hook ingress priority 0 devices = { br0, pppoe-data };
                                            ^^^^^^^
    nftables.conf:4:12-13: Error: invalid hook (null)
            flowtable ft {
                      ^^

Fixes: 19b351f1 ("netfilter: add flowtable documentation")
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

78e06cf4

netfilter: flowtable: Fix setting forgotten NF_FLOW_HW_DEAD flag · c22208b7

Paul Blakey authored Jan 30, 2020

During the refactor this was accidently removed.

Fixes: ae290450 ("netfilter: flowtable: add nf_flow_offload_tuple() helper")
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

c22208b7

netfilter: flowtable: Fix missing flush hardware on table free · 0f34f30a

Paul Blakey authored Jan 30, 2020

If entries exist when freeing a hardware offload enabled table,
we queue work for hardware while running the gc iteration.

Execute it (flush) after queueing.

Fixes: c29f74e0 ("netfilter: nf_flow_table: hardware offload support")
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

0f34f30a

netfilter: flowtable: Fix hardware flush order on nf_flow_table_cleanup · 91bfaa15

Paul Blakey authored Jan 30, 2020

On netdev down event, nf_flow_table_cleanup() is called for the relevant
device and it cleans all the tables that are on that device.
If one of those tables has hardware offload flag,
nf_flow_table_iterate_cleanup flushes hardware and then runs the gc.
But the gc can queue more hardware work, which will take time to execute.

Instead first add the work, then flush it, to execute it now.

Fixes: c29f74e0 ("netfilter: nf_flow_table: hardware offload support")
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

91bfaa15

netfilter: Use kvcalloc · b9e0102a

Joe Perches authored Jan 28, 2020

Convert the uses of kvmalloc_array with __GFP_ZERO to
the equivalent kvcalloc.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

b9e0102a

mlxsw: spectrum_qdisc: Fix 64-bit division error in mlxsw_sp_qdisc_tbf_rate_kbps · 91a7d4bf

Nathan Chancellor authored Jan 30, 2020

When building arm32 allmodconfig:

ERROR: "__aeabi_uldivmod"
[drivers/net/ethernet/mellanox/mlxsw/mlxsw_spectrum.ko] undefined!

rate_bytes_ps has type u64, we need to use a 64-bit division helper to
avoid a build error.

Fixes: a44f58c4 ("mlxsw: spectrum_qdisc: Support offloading of TBF Qdisc")
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

91a7d4bf

ionic: fix rxq comp packet type mask · b5ce31b5

Shannon Nelson authored Jan 30, 2020

Be sure to include all the packet type bits in the mask.

Fixes: fbfb8031 ("ionic: Add hardware init and device commands")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

b5ce31b5

net: phy: at803x: disable vddio regulator · 2318ca8a

Michael Walle authored Jan 30, 2020

The probe() might enable a VDDIO regulator, which needs to be disabled
again before calling regulator_put(). Add a remove() function.

Fixes: 2f664823 ("net: phy: at803x: add device tree binding")
Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2318ca8a

net: mii_timestamper: fix static allocation by PHY driver · 2e1bf3a7

Michael Walle authored Jan 30, 2020

If phydev->mii_ts is set by the PHY driver, it will always be
overwritten in of_mdiobus_register_phy(). Fix it. Also make sure, that
the unregister() doesn't do anything if the mii_timestamper was provided by
the PHY driver.

Fixes: 1dca22b1 ("net: mdio: of: Register discovered MII time stampers.")
Signed-off-by: Michael Walle <michael@walle.cc>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2e1bf3a7

net: mdio: of: fix potential NULL pointer derefernce · 0e0daf6a

Michael Walle authored Jan 30, 2020

of_find_mii_timestamper() returns NULL if no timestamper is found.
Therefore, guard the unregister_mii_timestamper() calls.

Fixes: 1dca22b1 ("net: mdio: of: Register discovered MII time stampers.")
Signed-off-by: Michael Walle <michael@walle.cc>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

0e0daf6a

30 Jan, 2020 2 commits

rxrpc: Fix missing active use pinning of rxrpc_local object · 04d36d74

David Howells authored Jan 30, 2020

The introduction of a split between the reference count on rxrpc_local
objects and the usage count didn't quite go far enough.  A number of kernel
work items need to make use of the socket to perform transmission.  These
also need to get an active count on the local object to prevent the socket
from being closed.

Fix this by getting the active count in those places.

Also split out the raw active count get/put functions as these places tend
to hold refs on the rxrpc_local object already, so getting and putting an
extra object ref is just a waste of time.

The problem can lead to symptoms like:

    BUG: kernel NULL pointer dereference, address: 0000000000000018
    ..
    CPU: 2 PID: 818 Comm: kworker/u9:0 Not tainted 5.5.0-fscache+ #51
    ...
    RIP: 0010:selinux_socket_sendmsg+0x5/0x13
    ...
    Call Trace:
     security_socket_sendmsg+0x2c/0x3e
     sock_sendmsg+0x1a/0x46
     rxrpc_send_keepalive+0x131/0x1ae
     rxrpc_peer_keepalive_worker+0x219/0x34b
     process_one_work+0x18e/0x271
     worker_thread+0x1a3/0x247
     kthread+0xe6/0xeb
     ret_from_fork+0x1f/0x30

Fixes: 730c5fd4 ("rxrpc: Fix local endpoint refcounting")
Signed-off-by: David Howells <dhowells@redhat.com>

04d36d74

rxrpc: Fix insufficient receive notification generation · f71dbf2f

David Howells authored Jan 30, 2020

In rxrpc_input_data(), rxrpc_notify_socket() is called if the base sequence
number of the packet is immediately following the hard-ack point at the end
of the function.  However, this isn't sufficient, since the recvmsg side
may have been advancing the window and then overrun the position in which
we're adding - at which point rx_hard_ack >= seq0 and no notification is
generated.

Fix this by always generating a notification at the end of the input
function.

Without this, a long call may stall, possibly indefinitely.

Fixes: 248f219c ("rxrpc: Rewrite the data and ack handling code")
Signed-off-by: David Howells <dhowells@redhat.com>

f71dbf2f