Commits · feb3c766a3ab32d233aaff7db13afd9ba5bc142d · nexedi / linux

21 Nov, 2017 9 commits

apparmor: fix possible recursive lock warning in __aa_create_ns · feb3c766

John Johansen authored Nov 20, 2017

Use mutex_lock_nested to provide lockdep the parent child lock ordering of
the tree.

This fixes the lockdep Warning
[  305.275177] ============================================
[  305.275178] WARNING: possible recursive locking detected
[  305.275179] 4.14.0-rc7+ #320 Not tainted
[  305.275180] --------------------------------------------
[  305.275181] apparmor_parser/1339 is trying to acquire lock:
[  305.275182]  (&ns->lock){+.+.}, at: [<ffffffff970544dd>] __aa_create_ns+0x6d/0x1e0
[  305.275187]
               but task is already holding lock:
[  305.275187]  (&ns->lock){+.+.}, at: [<ffffffff97054b5d>] aa_prepare_ns+0x3d/0xd0
[  305.275190]
               other info that might help us debug this:
[  305.275191]  Possible unsafe locking scenario:

[  305.275192]        CPU0
[  305.275193]        ----
[  305.275193]   lock(&ns->lock);
[  305.275194]   lock(&ns->lock);
[  305.275195]
                *** DEADLOCK ***

[  305.275196]  May be due to missing lock nesting notation

[  305.275198] 2 locks held by apparmor_parser/1339:
[  305.275198]  #0:  (sb_writers#10){.+.+}, at: [<ffffffff96e9c6b7>] vfs_write+0x1a7/0x1d0
[  305.275202]  #1:  (&ns->lock){+.+.}, at: [<ffffffff97054b5d>] aa_prepare_ns+0x3d/0xd0
[  305.275205]
               stack backtrace:
[  305.275207] CPU: 1 PID: 1339 Comm: apparmor_parser Not tainted 4.14.0-rc7+ #320
[  305.275208] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1ubuntu1 04/01/2014
[  305.275209] Call Trace:
[  305.275212]  dump_stack+0x85/0xcb
[  305.275214]  __lock_acquire+0x141c/0x1460
[  305.275216]  ? __aa_create_ns+0x6d/0x1e0
[  305.275218]  ? ___slab_alloc+0x183/0x540
[  305.275219]  ? ___slab_alloc+0x183/0x540
[  305.275221]  lock_acquire+0xed/0x1e0
[  305.275223]  ? lock_acquire+0xed/0x1e0
[  305.275224]  ? __aa_create_ns+0x6d/0x1e0
[  305.275227]  __mutex_lock+0x89/0x920
[  305.275228]  ? __aa_create_ns+0x6d/0x1e0
[  305.275230]  ? trace_hardirqs_on_caller+0x11f/0x190
[  305.275231]  ? __aa_create_ns+0x6d/0x1e0
[  305.275233]  ? __lockdep_init_map+0x57/0x1d0
[  305.275234]  ? lockdep_init_map+0x9/0x10
[  305.275236]  ? __rwlock_init+0x32/0x60
[  305.275238]  mutex_lock_nested+0x1b/0x20
[  305.275240]  ? mutex_lock_nested+0x1b/0x20
[  305.275241]  __aa_create_ns+0x6d/0x1e0
[  305.275243]  aa_prepare_ns+0xc2/0xd0
[  305.275245]  aa_replace_profiles+0x168/0xf30
[  305.275247]  ? __might_fault+0x85/0x90
[  305.275250]  policy_update+0xb9/0x380
[  305.275252]  profile_load+0x7e/0x90
[  305.275254]  __vfs_write+0x28/0x150
[  305.275256]  ? rcu_read_lock_sched_held+0x72/0x80
[  305.275257]  ? rcu_sync_lockdep_assert+0x2f/0x60
[  305.275259]  ? __sb_start_write+0xdc/0x1c0
[  305.275261]  ? vfs_write+0x1a7/0x1d0
[  305.275262]  vfs_write+0xca/0x1d0
[  305.275264]  ? trace_hardirqs_on_caller+0x11f/0x190
[  305.275266]  SyS_write+0x49/0xa0
[  305.275268]  entry_SYSCALL_64_fastpath+0x23/0xc2
[  305.275271] RIP: 0033:0x7fa6b22e8c74
[  305.275272] RSP: 002b:00007ffeaaee6288 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  305.275273] RAX: ffffffffffffffda RBX: 00007ffeaaee62a4 RCX: 00007fa6b22e8c74
[  305.275274] RDX: 0000000000000a51 RSI: 00005566a8198c10 RDI: 0000000000000004
[  305.275275] RBP: 0000000000000a39 R08: 0000000000000a51 R09: 0000000000000000
[  305.275276] R10: 0000000000000000 R11: 0000000000000246 R12: 00005566a8198c10
[  305.275277] R13: 0000000000000004 R14: 00005566a72ecb88 R15: 00005566a72ec3a8

Fixes: 73688d1e ("apparmor: refactor prepare_ns() and make usable from different views")
Signed-off-by: John Johansen <john.johansen@canonical.com>

feb3c766

apparmor: fix locking when creating a new complain profile. · 5d7c44ef

John Johansen authored Nov 20, 2017

Break the per cpu buffer atomic section when creating a new null
complain profile. In learning mode this won't matter and we can
safely re-aquire the buffer.

This fixes the following lockdep BUG trace
nov. 14 14:09:09 cyclope audit[7152]: AVC apparmor="ALLOWED" operation="exec" profile="/usr/sbin/sssd" name="/usr/sbin/adcli" pid=7152 comm="sssd_be" requested_mask="x" denied_mask="x" fsuid=0 ouid=0 target="/usr/sbin/sssd//null-/usr/sbin/adcli"
nov. 14 14:09:09 cyclope kernel: BUG: sleeping function called from invalid context at kernel/locking/mutex.c:747
nov. 14 14:09:09 cyclope kernel: in_atomic(): 1, irqs_disabled(): 0, pid: 7152, name: sssd_be
nov. 14 14:09:09 cyclope kernel: 1 lock held by sssd_be/7152:
nov. 14 14:09:09 cyclope kernel: #0: (&sig->cred_guard_mutex){....}, at: [<ffffffff8182d53e>] prepare_bprm_creds+0x4e/0x100
nov. 14 14:09:09 cyclope kernel: CPU: 3 PID: 7152 Comm: sssd_be Not tainted 4.14.0prahal+intel #150
nov. 14 14:09:09 cyclope kernel: Hardware name: LENOVO 20CDCTO1WW/20CDCTO1WW, BIOS GQET53WW (1.33 ) 09/15/2017
nov. 14 14:09:09 cyclope kernel: Call Trace:
nov. 14 14:09:09 cyclope kernel: dump_stack+0xb0/0x135
nov. 14 14:09:09 cyclope kernel: ? _atomic_dec_and_lock+0x15b/0x15b
nov. 14 14:09:09 cyclope kernel: ? lockdep_print_held_locks+0xc4/0x130
nov. 14 14:09:09 cyclope kernel: ___might_sleep+0x29c/0x320
nov. 14 14:09:09 cyclope kernel: ? rq_clock+0xf0/0xf0
nov. 14 14:09:09 cyclope kernel: ? __kernel_text_address+0xd/0x40
nov. 14 14:09:09 cyclope kernel: __might_sleep+0x95/0x190
nov. 14 14:09:09 cyclope kernel: ? aa_new_null_profile+0x50a/0x960
nov. 14 14:09:09 cyclope kernel: __mutex_lock+0x13e/0x1a20
nov. 14 14:09:09 cyclope kernel: ? aa_new_null_profile+0x50a/0x960
nov. 14 14:09:09 cyclope kernel: ? save_stack+0x43/0xd0
nov. 14 14:09:09 cyclope kernel: ? kmem_cache_alloc_trace+0x13f/0x290
nov. 14 14:09:09 cyclope kernel: ? mutex_lock_io_nested+0x1880/0x1880
nov. 14 14:09:09 cyclope kernel: ? profile_transition+0x932/0x2d40
nov. 14 14:09:09 cyclope kernel: ? apparmor_bprm_set_creds+0x1479/0x1f70
nov. 14 14:09:09 cyclope kernel: ? security_bprm_set_creds+0x5a/0x80
nov. 14 14:09:09 cyclope kernel: ? prepare_binprm+0x366/0x980
nov. 14 14:09:09 cyclope kernel: ? do_execveat_common.isra.30+0x12a9/0x2350
nov. 14 14:09:09 cyclope kernel: ? SyS_execve+0x2c/0x40
nov. 14 14:09:09 cyclope kernel: ? do_syscall_64+0x228/0x650
nov. 14 14:09:09 cyclope kernel: ? entry_SYSCALL64_slow_path+0x25/0x25
nov. 14 14:09:09 cyclope kernel: ? deactivate_slab.isra.62+0x49d/0x5e0
nov. 14 14:09:09 cyclope kernel: ? save_stack_trace+0x16/0x20
nov. 14 14:09:09 cyclope kernel: ? init_object+0x88/0x90
nov. 14 14:09:09 cyclope kernel: ? ___slab_alloc+0x520/0x590
nov. 14 14:09:09 cyclope kernel: ? ___slab_alloc+0x520/0x590
nov. 14 14:09:09 cyclope kernel: ? aa_alloc_proxy+0xab/0x200
nov. 14 14:09:09 cyclope kernel: ? lock_downgrade+0x7e0/0x7e0
nov. 14 14:09:09 cyclope kernel: ? memcg_kmem_get_cache+0x970/0x970
nov. 14 14:09:09 cyclope kernel: ? kasan_unpoison_shadow+0x35/0x50
nov. 14 14:09:09 cyclope kernel: ? kasan_unpoison_shadow+0x35/0x50
nov. 14 14:09:09 cyclope kernel: ? kasan_kmalloc+0xad/0xe0
nov. 14 14:09:09 cyclope kernel: ? aa_alloc_proxy+0xab/0x200
nov. 14 14:09:09 cyclope kernel: ? kmem_cache_alloc_trace+0x13f/0x290
nov. 14 14:09:09 cyclope kernel: ? aa_alloc_proxy+0xab/0x200
nov. 14 14:09:09 cyclope kernel: ? aa_alloc_proxy+0xab/0x200
nov. 14 14:09:09 cyclope kernel: ? _raw_spin_unlock+0x22/0x30
nov. 14 14:09:09 cyclope kernel: ? vec_find+0xa0/0xa0
nov. 14 14:09:09 cyclope kernel: ? aa_label_init+0x6f/0x230
nov. 14 14:09:09 cyclope kernel: ? __label_insert+0x3e0/0x3e0
nov. 14 14:09:09 cyclope kernel: ? kmem_cache_alloc_trace+0x13f/0x290
nov. 14 14:09:09 cyclope kernel: ? aa_alloc_profile+0x58/0x200
nov. 14 14:09:09 cyclope kernel: mutex_lock_nested+0x16/0x20
nov. 14 14:09:09 cyclope kernel: ? mutex_lock_nested+0x16/0x20
nov. 14 14:09:09 cyclope kernel: aa_new_null_profile+0x50a/0x960
nov. 14 14:09:09 cyclope kernel: ? aa_fqlookupn_profile+0xdc0/0xdc0
nov. 14 14:09:09 cyclope kernel: ? aa_compute_fperms+0x4b5/0x640
nov. 14 14:09:09 cyclope kernel: ? disconnect.isra.2+0x1b0/0x1b0
nov. 14 14:09:09 cyclope kernel: ? aa_str_perms+0x8d/0xe0
nov. 14 14:09:09 cyclope kernel: profile_transition+0x932/0x2d40
nov. 14 14:09:09 cyclope kernel: ? up_read+0x1a/0x40
nov. 14 14:09:09 cyclope kernel: ? ext4_xattr_get+0x15c/0xaf0 [ext4]
nov. 14 14:09:09 cyclope kernel: ? x_table_lookup+0x190/0x190
nov. 14 14:09:09 cyclope kernel: ? ext4_xattr_ibody_get+0x590/0x590 [ext4]
nov. 14 14:09:09 cyclope kernel: ? sched_clock+0x9/0x10
nov. 14 14:09:09 cyclope kernel: ? sched_clock+0x9/0x10
nov. 14 14:09:09 cyclope kernel: ? ext4_xattr_security_get+0x1a/0x20 [ext4]
nov. 14 14:09:09 cyclope kernel: ? __vfs_getxattr+0x6d/0xa0
nov. 14 14:09:09 cyclope kernel: ? get_vfs_caps_from_disk+0x114/0x720
nov. 14 14:09:09 cyclope kernel: ? sched_clock+0x9/0x10
nov. 14 14:09:09 cyclope kernel: ? sched_clock+0x9/0x10
nov. 14 14:09:09 cyclope kernel: ? tsc_resume+0x10/0x10
nov. 14 14:09:09 cyclope kernel: ? get_vfs_caps_from_disk+0x720/0x720
nov. 14 14:09:09 cyclope kernel: ? native_sched_clock_from_tsc+0x201/0x2b0
nov. 14 14:09:09 cyclope kernel: ? sched_clock+0x9/0x10
nov. 14 14:09:09 cyclope kernel: ? sched_clock_cpu+0x1b/0x170
nov. 14 14:09:09 cyclope kernel: ? find_held_lock+0x3c/0x1e0
nov. 14 14:09:09 cyclope kernel: ? rb_insert_color_cached+0x1660/0x1660
nov. 14 14:09:09 cyclope kernel: apparmor_bprm_set_creds+0x1479/0x1f70
nov. 14 14:09:09 cyclope kernel: ? sched_clock+0x9/0x10
nov. 14 14:09:09 cyclope kernel: ? handle_onexec+0x31d0/0x31d0
nov. 14 14:09:09 cyclope kernel: ? tsc_resume+0x10/0x10
nov. 14 14:09:09 cyclope kernel: ? graph_lock+0xd0/0xd0
nov. 14 14:09:09 cyclope kernel: ? tsc_resume+0x10/0x10
nov. 14 14:09:09 cyclope kernel: ? sched_clock_cpu+0x1b/0x170
nov. 14 14:09:09 cyclope kernel: ? sched_clock+0x9/0x10
nov. 14 14:09:09 cyclope kernel: ? sched_clock+0x9/0x10
nov. 14 14:09:09 cyclope kernel: ? sched_clock_cpu+0x1b/0x170
nov. 14 14:09:09 cyclope kernel: ? find_held_lock+0x3c/0x1e0
nov. 14 14:09:09 cyclope kernel: security_bprm_set_creds+0x5a/0x80
nov. 14 14:09:09 cyclope kernel: prepare_binprm+0x366/0x980
nov. 14 14:09:09 cyclope kernel: ? install_exec_creds+0x150/0x150
nov. 14 14:09:09 cyclope kernel: ? __might_fault+0x89/0xb0
nov. 14 14:09:09 cyclope kernel: ? up_read+0x40/0x40
nov. 14 14:09:09 cyclope kernel: ? get_user_arg_ptr.isra.18+0x2c/0x70
nov. 14 14:09:09 cyclope kernel: ? count.isra.20.constprop.32+0x7c/0xf0
nov. 14 14:09:09 cyclope kernel: do_execveat_common.isra.30+0x12a9/0x2350
nov. 14 14:09:09 cyclope kernel: ? prepare_bprm_creds+0x100/0x100
nov. 14 14:09:09 cyclope kernel: ? _raw_spin_unlock+0x22/0x30
nov. 14 14:09:09 cyclope kernel: ? deactivate_slab.isra.62+0x49d/0x5e0
nov. 14 14:09:09 cyclope kernel: ? save_stack_trace+0x16/0x20
nov. 14 14:09:09 cyclope kernel: ? init_object+0x88/0x90
nov. 14 14:09:09 cyclope kernel: ? ___slab_alloc+0x520/0x590
nov. 14 14:09:09 cyclope kernel: ? ___slab_alloc+0x520/0x590
nov. 14 14:09:09 cyclope kernel: ? kasan_check_write+0x14/0x20
nov. 14 14:09:09 cyclope kernel: ? memcg_kmem_get_cache+0x970/0x970
nov. 14 14:09:09 cyclope kernel: ? kasan_unpoison_shadow+0x35/0x50
nov. 14 14:09:09 cyclope kernel: ? glob_match+0x730/0x730
nov. 14 14:09:09 cyclope kernel: ? kmem_cache_alloc+0x225/0x280
nov. 14 14:09:09 cyclope kernel: ? getname_flags+0xb8/0x510
nov. 14 14:09:09 cyclope kernel: ? mm_fault_error+0x2e0/0x2e0
nov. 14 14:09:09 cyclope kernel: ? getname_flags+0xf6/0x510
nov. 14 14:09:09 cyclope kernel: ? ptregs_sys_vfork+0x10/0x10
nov. 14 14:09:09 cyclope kernel: SyS_execve+0x2c/0x40
nov. 14 14:09:09 cyclope kernel: do_syscall_64+0x228/0x650
nov. 14 14:09:09 cyclope kernel: ? syscall_return_slowpath+0x2f0/0x2f0
nov. 14 14:09:09 cyclope kernel: ? syscall_return_slowpath+0x167/0x2f0
nov. 14 14:09:09 cyclope kernel: ? prepare_exit_to_usermode+0x220/0x220
nov. 14 14:09:09 cyclope kernel: ? prepare_exit_to_usermode+0xda/0x220
nov. 14 14:09:09 cyclope kernel: ? perf_trace_sys_enter+0x1060/0x1060
nov. 14 14:09:09 cyclope kernel: ? __put_user_4+0x1c/0x30
nov. 14 14:09:09 cyclope kernel: entry_SYSCALL64_slow_path+0x25/0x25
nov. 14 14:09:09 cyclope kernel: RIP: 0033:0x7f9320f23637
nov. 14 14:09:09 cyclope kernel: RSP: 002b:00007fff783be338 EFLAGS: 00000202 ORIG_RAX: 000000000000003b
nov. 14 14:09:09 cyclope kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f9320f23637
nov. 14 14:09:09 cyclope kernel: RDX: 0000558c35002a70 RSI: 0000558c3505bd10 RDI: 0000558c35018b90
nov. 14 14:09:09 cyclope kernel: RBP: 0000558c34b63ae8 R08: 0000558c3505bd10 R09: 0000000000000080
nov. 14 14:09:09 cyclope kernel: R10: 0000000000000095 R11: 0000000000000202 R12: 0000000000000001
nov. 14 14:09:09 cyclope kernel: R13: 0000558c35018b90 R14: 0000558c3505bd18 R15: 0000558c3505bd10

Fixes: 4227c333 ("apparmor: Move path lookup to using preallocated buffers")
BugLink: http://bugs.launchpad.net/bugs/173228Reported-by: Alban Browaeys <prahal@yahoo.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>

5d7c44ef

apparmor: fix profile attachment for special unconfined profiles · 06d426d1

John Johansen authored Nov 17, 2017

It used to be that unconfined would never attach. However that is not
the case anymore as some special profiles can be marked as unconfined,
that are not the namespaces unconfined profile, and may have an
attachment.

Fixes: f1bd9041 ("apparmor: add the base fns() for domain labels")
Signed-off-by: John Johansen <john.johansen@canonical.com>

06d426d1

apparmor: ensure that undecidable profile attachments fail · 844b8292

John Johansen authored Nov 17, 2017

Profiles that have an undecidable overlap in their attachments are
being incorrectly handled. Instead of failing to attach the first one
encountered is being used.

eg.
  profile A /** { .. }
  profile B /*foo { .. }

have an unresolvable longest left attachment, they both have an exact
match on / and then have an overlapping expression that has no clear
winner.

Currently the winner will be the profile that is loaded first which
can result in non-deterministic behavior. Instead in this situation
the exec should fail.

Fixes: 898127c3 ("AppArmor: functions for domain transitions")
Signed-off-by: John Johansen <john.johansen@canonical.com>

844b8292

apparmor: fix leak of null profile name if profile allocation fails · 4633307e

John Johansen authored Nov 15, 2017

Fixes: d07881d2 ("apparmor: move new_null_profile to after profile lookup fns()")
Reported-by: Seth Arnold <seth.arnold@canonical.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>

4633307e

apparmor: remove unused redundant variable stop · e3bcfc14

Colin Ian King authored Oct 14, 2017

The boolean variable 'stop' is being set but never read. This
is a redundant variable and can be removed.

Cleans up clang warning: Value stored to 'stop' is never read
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>

e3bcfc14

apparmor: Fix bool initialization/comparison · 954317fe

Thomas Meyer authored Oct 07, 2017

Bool initializations should use true and false. Bool tests don't need
comparisons.
Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: John Johansen <john.johansen@canonical.com>

954317fe

apparmor: initialized returned struct aa_perms · 7bba39ae

Arnd Bergmann authored Sep 15, 2017

gcc-4.4 points out suspicious code in compute_mnt_perms, where
the aa_perms structure is only partially initialized before getting
returned:

security/apparmor/mount.c: In function 'compute_mnt_perms':
security/apparmor/mount.c:227: error: 'perms.prompt' is used uninitialized in this function
security/apparmor/mount.c:227: error: 'perms.hide' is used uninitialized in this function
security/apparmor/mount.c:227: error: 'perms.cond' is used uninitialized in this function
security/apparmor/mount.c:227: error: 'perms.complain' is used uninitialized in this function
security/apparmor/mount.c:227: error: 'perms.stop' is used uninitialized in this function
security/apparmor/mount.c:227: error: 'perms.deny' is used uninitialized in this function

Returning or assigning partially initialized structures is a bit tricky,
in particular it is explicitly allowed in c99 to assign a partially
initialized structure to another, as long as only members are read that
have been initialized earlier. Looking at what various compilers do here,
the version that produced the warning copied uninitialized stack data,
while newer versions (and also clang) either set the other members to
zero or don't update the parts of the return buffer that are not modified
in the temporary structure, but they never warn about this.

In case of apparmor, it seems better to be a little safer and always
initialize the aa_perms structure. Most users already do that, this
changes the remaining ones, including the one instance that I got the
warning for.

Fixes: fa488437d0f9 ("apparmor: add mount mediation")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Seth Arnold <seth.arnold@canonical.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: John Johansen <john.johansen@canonical.com>

7bba39ae

apparmor: fix spelling mistake: "resoure" -> "resource" · 5933a627

Colin Ian King authored Aug 24, 2017

Trivial fix to spelling mistake in comment and also with text in
audit_resource call.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>

5933a627

29 Oct, 2017 29 commits

Linux 4.14-rc7 · 0b07194b
Linus Torvalds authored Oct 29, 2017

0b07194b

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 19e12196

Linus Torvalds authored Oct 29, 2017

Pull networking fixes from David Miller:

 1) Fix route leak in xfrm_bundle_create().

 2) In mac80211, validate user rate mask before configuring it. From
    Johannes Berg.

 3) Properly enforce memory limits in fair queueing code, from Toke
    Hoiland-Jorgensen.

 4) Fix lockdep splat in inet_csk_route_req(), from Eric Dumazet.

 5) Fix TSO header allocation and management in mvpp2 driver, from Yan
    Markman.

 6) Don't take socket lock in BH handler in strparser code, from Tom
    Herbert.

 7) Don't show sockets from other namespaces in AF_UNIX code, from
    Andrei Vagin.

 8) Fix double free in error path of tap_open(), from Girish Moodalbail.

 9) Fix TX map failure path in igb and ixgbe, from Jean-Philippe Brucker
    and Alexander Duyck.

10) Fix DCB mode programming in stmmac driver, from Jose Abreu.

11) Fix err_count handling in various tunnels (ipip, ip6_gre). From Xin
    Long.

12) Properly align SKB head before building SKB in tuntap, from Jason
    Wang.

13) Avoid matching qdiscs with a zero handle during lookups, from Cong
    Wang.

14) Fix various endianness bugs in sctp, from Xin Long.

15) Fix tc filter callback races and add selftests which trigger the
    problem, from Cong Wang.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (73 commits)
  selftests: Introduce a new test case to tc testsuite
  selftests: Introduce a new script to generate tc batch file
  net_sched: fix call_rcu() race on act_sample module removal
  net_sched: add rtnl assertion to tcf_exts_destroy()
  net_sched: use tcf_queue_work() in tcindex filter
  net_sched: use tcf_queue_work() in rsvp filter
  net_sched: use tcf_queue_work() in route filter
  net_sched: use tcf_queue_work() in u32 filter
  net_sched: use tcf_queue_work() in matchall filter
  net_sched: use tcf_queue_work() in fw filter
  net_sched: use tcf_queue_work() in flower filter
  net_sched: use tcf_queue_work() in flow filter
  net_sched: use tcf_queue_work() in cgroup filter
  net_sched: use tcf_queue_work() in bpf filter
  net_sched: use tcf_queue_work() in basic filter
  net_sched: introduce a workqueue for RCU callbacks of tc filter
  sctp: fix some type cast warnings introduced since very beginning
  sctp: fix a type cast warnings that causes a_rwnd gets the wrong value
  sctp: fix some type cast warnings introduced by transport rhashtable
  sctp: fix some type cast warnings introduced by stream reconf
  ...

19e12196

Merge branch 'net_sched-fix-races-with-RCU-callbacks' · 6c325f4e

David S. Miller authored Oct 29, 2017

Cong Wang says:

====================
net_sched: fix races with RCU callbacks

Recently, the RCU callbacks used in TC filters and TC actions keep
drawing my attention, they introduce at least 4 race condition bugs:

1. A simple one fixed by Daniel:

commit c78e1746
Author: Daniel Borkmann <daniel@iogearbox.net>
Date:   Wed May 20 17:13:33 2015 +0200

    net: sched: fix call_rcu() race on classifier module unloads

2. A very nasty one fixed by me:

commit 1697c4bb
Author: Cong Wang <xiyou.wangcong@gmail.com>
Date:   Mon Sep 11 16:33:32 2017 -0700

    net_sched: carefully handle tcf_block_put()

3. Two more bugs found by Chris:
https://patchwork.ozlabs.org/patch/826696/
https://patchwork.ozlabs.org/patch/826695/

Usually RCU callbacks are simple, however for TC filters and actions,
they are complex because at least TC actions could be destroyed
together with the TC filter in one callback. And RCU callbacks are
invoked in BH context, without locking they are parallel too. All of
these contribute to the cause of these nasty bugs.

Alternatively, we could also:

a) Introduce a spinlock to serialize these RCU callbacks. But as I
said in commit 1697c4bb ("net_sched: carefully handle
tcf_block_put()"), it is very hard to do because of tcf_chain_dump().
Potentially we need to do a lot of work to make it possible (if not
impossible).

b) Just get rid of these RCU callbacks, because they are not
necessary at all, callers of these call_rcu() are all on slow paths
and holding RTNL lock, so blocking is allowed in their contexts.
However, David and Eric dislike adding synchronize_rcu() here.

As suggested by Paul, we could defer the work to a workqueue and
gain the permission of holding RTNL again without any performance
impact, however, in tcf_block_put() we could have a deadlock when
flushing workqueue while hodling RTNL lock, the trick here is to
defer the work itself in workqueue and make it queued after all
other works so that we keep the same ordering to avoid any
use-after-free. Please see the first patch for details.

Patch 1 introduces the infrastructure, patch 2~12 move each
tc filter to the new tc filter workqueue, patch 13 adds
an assertion to catch potential bugs like this, patch 14
closes another rcu callback race, patch 15 and patch 16 add
new test cases.
====================
Reported-by: Chris Mi <chrism@mellanox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6c325f4e

selftests: Introduce a new test case to tc testsuite · 31c2611b

Chris Mi authored Oct 26, 2017

In this patchset, we fixed a tc bug. This patch adds the test case
that reproduces the bug. To run this test case, user should specify
an existing NIC device:
  # sudo ./tdc.py -d enp4s0f0

This test case belongs to category "flower". If user doesn't specify
a NIC device, the test cases belong to "flower" will not be run.

In this test case, we create 1M filters and all filters share the same
action. When destroying all filters, kernel should not panic. It takes
about 18s to run it.
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Lucas Bates <lucasb@mojatatu.com>
Signed-off-by: Chris Mi <chrism@mellanox.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

31c2611b

selftests: Introduce a new script to generate tc batch file · 7f071998

Chris Mi authored Oct 26, 2017

  # ./tdc_batch.py -h
  usage: tdc_batch.py [-h] [-n NUMBER] [-o] [-s] [-p] device file

  TC batch file generator

  positional arguments:
    device                device name
    file                  batch file name

  optional arguments:
    -h, --help            show this help message and exit
    -n NUMBER, --number NUMBER
                          how many lines in batch file
    -o, --skip_sw         skip_sw (offload), by default skip_hw
    -s, --share_action    all filters share the same action
    -p, --prio            all filters have different prio
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Lucas Bates <lucasb@mojatatu.com>
Signed-off-by: Chris Mi <chrism@mellanox.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7f071998