- 06 Feb, 2017 2 commits
-
-
Kevin Cernekee authored
The libnetfilter_conntrack userland library always sets IPS_CONFIRMED when building a CTA_STATUS attribute. If this toggles the bit from 0->1, the parser will return an error. On Linux 4.4+ this will cause any NFQA_EXP attribute in the packet to be ignored. This breaks conntrackd's userland helpers because they operate on unconfirmed connections. Instead of returning -EBUSY if the user program asks to modify an unchangeable bit, simply ignore the change. Also, fix the logic so that user programs are allowed to clear the bits that they are allowed to change. Signed-off-by: Kevin Cernekee <cernekee@chromium.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Jiri Kosina authored
Commit 3bb398d9 ("netfilter: nf_ct_helper: disable automatic helper assignment") is causing behavior regressions in firewalls, as traffic handled by conntrack helpers is now by default not passed through even though it was before due to missing CT targets (which were not necessary before this commit). The default had to be switched off due to security reasons [1] [2] and therefore should stay the way it is, but let's be friendly to firewall admins and issue a warning the first time we're in situation where packet would be likely passed through with the old default but we're likely going to drop it on the floor now. Rewrite the code a little bit as suggested by Linus, so that we avoid spaghettiing the code even more -- namely the whole decision making process regarding helper selection (either automatic or not) is being separated, so that the whole logic can be simplified and code (condition) duplication reduced. [1] https://cansecwest.com/csw12/conntrack-attack.pdf [2] https://home.regit.org/netfilter-en/secure-use-of-helpers/Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
- 03 Feb, 2017 4 commits
-
-
Mao Wenan authored
There is currently no reference count being held on the PHY driver, which makes it possible to remove the PHY driver module while the PHY state machine is running and polling the PHY. This could cause crashes similar to this one to show up: [ 43.361162] BUG: unable to handle kernel NULL pointer dereference at 0000000000000140 [ 43.361162] IP: phy_state_machine+0x32/0x490 [ 43.361162] PGD 59dc067 [ 43.361162] PUD 0 [ 43.361162] [ 43.361162] Oops: 0000 [#1] SMP [ 43.361162] Modules linked in: dsa_loop [last unloaded: broadcom] [ 43.361162] CPU: 0 PID: 1299 Comm: kworker/0:3 Not tainted 4.10.0-rc5+ #415 [ 43.361162] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu2 04/01/2014 [ 43.361162] Workqueue: events_power_efficient phy_state_machine [ 43.361162] task: ffff880006782b80 task.stack: ffffc90000184000 [ 43.361162] RIP: 0010:phy_state_machine+0x32/0x490 [ 43.361162] RSP: 0018:ffffc90000187e18 EFLAGS: 00000246 [ 43.361162] RAX: 0000000000000000 RBX: ffff8800059e53c0 RCX: ffff880006a15c60 [ 43.361162] RDX: ffff880006782b80 RSI: 0000000000000000 RDI: ffff8800059e5428 [ 43.361162] RBP: ffffc90000187e48 R08: ffff880006a15c40 R09: 0000000000000000 [ 43.361162] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800059e5428 [ 43.361162] R13: ffff8800059e5000 R14: 0000000000000000 R15: ffff880006a15c40 [ 43.361162] FS: 0000000000000000(0000) GS:ffff880006a00000(0000) knlGS:0000000000000000 [ 43.361162] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 43.361162] CR2: 0000000000000140 CR3: 0000000005979000 CR4: 00000000000006f0 [ 43.361162] Call Trace: [ 43.361162] process_one_work+0x1b4/0x3e0 [ 43.361162] worker_thread+0x43/0x4d0 [ 43.361162] ? __schedule+0x17f/0x4e0 [ 43.361162] kthread+0xf7/0x130 [ 43.361162] ? process_one_work+0x3e0/0x3e0 [ 43.361162] ? kthread_create_on_node+0x40/0x40 [ 43.361162] ret_from_fork+0x29/0x40 [ 43.361162] Code: 56 41 55 41 54 4c 8d 67 68 53 4c 8d af 40 fc ff ff 48 89 fb 4c 89 e7 48 83 ec 08 e8 c9 9d 27 00 48 8b 83 60 ff ff ff 44 8b 73 98 <48> 8b 90 40 01 00 00 44 89 f0 48 85 d2 74 08 4c 89 ef ff d2 8b Keep references on the PHY driver module right before we are going to utilize it in phy_attach_direct(), and conversely when we don't use it anymore in phy_detach(). Signed-off-by: Mao Wenan <maowenan@huawei.com> [florian: rebase, rework commit message] Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Martin KaFai Lau says: ==================== mlx4: Misc bug fixes after reinitializing queues This patchset fixes misc bugs after reinitializing queues (e.g. by ethtool -L). v2: * Add another fix to mem leak in tx_ring[t] and tx_cq[t] * In mlx4_en_try_alloc_resources(), move all xdp_prog logic after calling mlx4_en_alloc_resources() ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Martin KaFai Lau authored
After calling mlx4_en_try_alloc_resources (e.g. by changing the number of rx-queues with ethtool -L), the existing xdp_prog becomes inactive. The bug is that the xdp_prog ptr has not been carried over from the old rx-queues to the new rx-queues Fixes: 47a38e15 ("net/mlx4_en: add support for fast rx drop bpf program") Cc: Brenden Blanco <bblanco@plumgrid.com> Cc: Saeed Mahameed <saeedm@mellanox.com> Cc: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Martin KaFai Lau authored
In mlx4_en_update_priv(), dst->tx_ring[t] and dst->tx_cq[t] are over-written by src->tx_ring[t] and src->tx_cq[t] without first calling kfree. One of the reproducible code paths is by doing 'ethtool -L'. The fix is to do the kfree in mlx4_en_free_resources(). Here is the kmemleak report: unreferenced object 0xffff880841211800 (size 2048): comm "ethtool", pid 3096, jiffies 4294716940 (age 528.353s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff81930718>] kmemleak_alloc+0x28/0x50 [<ffffffff8120b213>] kmem_cache_alloc_trace+0x103/0x260 [<ffffffff8170e0a8>] mlx4_en_try_alloc_resources+0x118/0x1a0 [<ffffffff817065a9>] mlx4_en_set_ringparam+0x169/0x210 [<ffffffff818040c5>] dev_ethtool+0xae5/0x2190 [<ffffffff8181b898>] dev_ioctl+0x168/0x6f0 [<ffffffff817d7a72>] sock_do_ioctl+0x42/0x50 [<ffffffff817d819b>] sock_ioctl+0x21b/0x2d0 [<ffffffff81247a73>] do_vfs_ioctl+0x93/0x6a0 [<ffffffff812480f9>] SyS_ioctl+0x79/0x90 [<ffffffff8193d7ea>] entry_SYSCALL_64_fastpath+0x18/0xad [<ffffffffffffffff>] 0xffffffffffffffff unreferenced object 0xffff880841213000 (size 2048): comm "ethtool", pid 3096, jiffies 4294716940 (age 528.353s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff81930718>] kmemleak_alloc+0x28/0x50 [<ffffffff8120b213>] kmem_cache_alloc_trace+0x103/0x260 [<ffffffff8170e0cb>] mlx4_en_try_alloc_resources+0x13b/0x1a0 [<ffffffff817065a9>] mlx4_en_set_ringparam+0x169/0x210 [<ffffffff818040c5>] dev_ethtool+0xae5/0x2190 [<ffffffff8181b898>] dev_ioctl+0x168/0x6f0 [<ffffffff817d7a72>] sock_do_ioctl+0x42/0x50 [<ffffffff817d819b>] sock_ioctl+0x21b/0x2d0 [<ffffffff81247a73>] do_vfs_ioctl+0x93/0x6a0 [<ffffffff812480f9>] SyS_ioctl+0x79/0x90 [<ffffffff8193d7ea>] entry_SYSCALL_64_fastpath+0x18/0xad [<ffffffffffffffff>] 0xffffffffffffffff (gdb) list *mlx4_en_try_alloc_resources+0x118 0xffffffff8170e0a8 is in mlx4_en_try_alloc_resources (drivers/net/ethernet/mellanox/mlx4/en_netdev.c:2145). 2140 if (!dst->tx_ring_num[t]) 2141 continue; 2142 2143 dst->tx_ring[t] = kzalloc(sizeof(struct mlx4_en_tx_ring *) * 2144 MAX_TX_RINGS, GFP_KERNEL); 2145 if (!dst->tx_ring[t]) 2146 goto err_free_tx; 2147 2148 dst->tx_cq[t] = kzalloc(sizeof(struct mlx4_en_cq *) * 2149 MAX_TX_RINGS, GFP_KERNEL); (gdb) list *mlx4_en_try_alloc_resources+0x13b 0xffffffff8170e0cb is in mlx4_en_try_alloc_resources (drivers/net/ethernet/mellanox/mlx4/en_netdev.c:2150). 2145 if (!dst->tx_ring[t]) 2146 goto err_free_tx; 2147 2148 dst->tx_cq[t] = kzalloc(sizeof(struct mlx4_en_cq *) * 2149 MAX_TX_RINGS, GFP_KERNEL); 2150 if (!dst->tx_cq[t]) { 2151 kfree(dst->tx_ring[t]); 2152 goto err_free_tx; 2153 } 2154 } Fixes: ec25bc04 ("net/mlx4_en: Add resilience in low memory systems") Cc: Eugenia Emantayev <eugenia@mellanox.com> Cc: Saeed Mahameed <saeedm@mellanox.com> Cc: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 01 Feb, 2017 11 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds authored
Pull networking fixes from David Miller: 1) Fix handling of interrupt status in stmmac driver. Just because we have masked the event from generating interrupts, doesn't mean the bit won't still be set in the interrupt status register. From Alexey Brodkin. 2) Fix DMA API debugging splats in gianfar driver, from Arseny Solokha. 3) Fix off-by-one error in __ip6_append_data(), from Vlad Yasevich. 4) cls_flow does not match on icmpv6 codes properly, from Simon Horman. 5) Initial MAC address can be set incorrectly in some scenerios, from Ivan Vecera. 6) Packet header pointer arithmetic fix in ip6_tnl_parse_tlv_end_lim(), from Dan Carpenter. 7) Fix divide by zero in __tcp_select_window(), from Eric Dumazet. 8) Fix crash in iwlwifi when unregistering thermal zone, from Jens Axboe. 9) Check for DMA mapping errors in starfire driver, from Alexey Khoroshilov. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (31 commits) tcp: fix 0 divide in __tcp_select_window() ipv6: pointer math error in ip6_tnl_parse_tlv_enc_lim() net: fix ndo_features_check/ndo_fix_features comment ordering net/sched: matchall: Fix configuration race be2net: fix initial MAC setting ipv6: fix flow labels when the traffic class is non-0 net: thunderx: avoid dereferencing xcv when NULL net/sched: cls_flower: Correct matching on ICMPv6 code ipv6: Paritially checksum full MTU frames net/mlx4_core: Avoid command timeouts during VF driver device shutdown gianfar: synchronize DMA API usage by free_skb_rx_queue w/ gfar_new_page net: ethtool: add support for 2500BaseT and 5000BaseT link modes can: bcm: fix hrtimer/tasklet termination in bcm op removal net: adaptec: starfire: add checks for dma mapping errors net: phy: micrel: KSZ8795 do not set SUPPORTED_[Asym_]Pause can: Fix kernel panic at security_sock_rcv_skb net: macb: Fix 64 bit addressing support for GEM stmmac: Discard masked flags in interrupt status register net/mlx5e: Check ets capability before ets query FW command net/mlx5e: Fix update of hash function/key via ethtool ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds authored
Pull fscache fixes from Al Viro. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fscache: Fix dead object requeue fscache: Clear outstanding writes when disabling a cookie FS-Cache: Initialise stores_lock in netfs cookie
-
Eric Dumazet authored
syszkaller fuzzer was able to trigger a divide by zero, when TCP window scaling is not enabled. SO_RCVBUF can be used not only to increase sk_rcvbuf, also to decrease it below current receive buffers utilization. If mss is negative or 0, just return a zero TCP window. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Dan Carpenter authored
Casting is a high precedence operation but "off" and "i" are in terms of bytes so we need to have some parenthesis here. Fixes: fbfa743a ("ipv6: fix ip6_tnl_parse_tlv_enc_lim()") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Linus Torvalds authored
Pull crypto fixes from Herbert Xu: "This fixes a bug in CBC/CTR on ARM64 that breaks chaining as well as a bug in the core API that causes registration failures when a driver unloads and then reloads an algorithm" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: arm64/aes-blk - honour iv_out requirement in CBC and CTR modes crypto: api - Clear CRYPTO_ALG_DEAD bit before registering an alg
-
git://git.infradead.org/users/vkoul/slave-dmaLinus Torvalds authored
Pull dmaengine fixes from Vinod Koul: "A couple of fixes showed up late in the cycle so sending them up and sending early in the week and not on Friday :). They fix a double lock in pl330 driver and runtime pm fixes for cppi driver" * tag 'dmaengine-fix-4.10-rc7' of git://git.infradead.org/users/vkoul/slave-dma: dmaengine: pl330: fix double lock dmaengine: cppi41: Clean up pointless warnings dmaengine: cppi41: Fix oops in cppi41_runtime_resume dmaengine: cppi41: Fix runtime PM timeouts with USB mass storage
-
Dimitris Michailidis authored
Commit cdba756f ("net: move ndo_features_check() close to ndo_start_xmit()") inadvertently moved the doc comment for .ndo_fix_features instead of .ndo_features_check. Fix the comment ordering. Fixes: cdba756f ("net: move ndo_features_check() close to ndo_start_xmit()") Signed-off-by: Dimitris Michailidis <dmichail@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yotam Gigi authored
In the current version, the matchall internal state is split into two structs: cls_matchall_head and cls_matchall_filter. This makes little sense, as matchall instance supports only one filter, and there is no situation where one exists and the other does not. In addition, that led to some races when filter was deleted while packet was processed. Unify that two structs into one, thus simplifying the process of matchall creation and deletion. As a result, the new, delete and get callbacks have a dummy implementation where all the work is done in destroy and change callbacks, as was done in cls_cgroup. Fixes: bf3994d2 ("net/sched: introduce Match-all classifier") Reported-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrlLinus Torvalds authored
Pull pin control fixes from Linus Walleij: "Another week, another set of pin control fixes. The subsystem has seen high patch-spot activity recently. The majority of the patches are for Intel, I vaguely think it mostly concern phones, tablets and maybe chromebooks and even laptops with this Intel Atom family chips. Driver fixes only: - one fix to the Berlin driver making the SD card work fully again. - one fix to the Allwinner/sunxi bias function: one premature change needs to be partially reverted. - the remaining four patches are to Intel embedded SoCs: baytrail (three patches) and merrifield (one patch): register access debounce fixes and a missing spinlock" * tag 'pinctrl-v4.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: pinctrl: baytrail: Add missing spinlock usage in byt_gpio_irq_handler pinctrl: baytrail: Debounce register is one per community pinctrl: baytrail: Rectify debounce support (part 2) pinctrl: intel: merrifield: Add missed check in mrfld_config_set() pinctrl: sunxi: Don't enforce bias disable (for now) pinctrl: berlin-bg4ct: fix the value for "sd1a" of pin SCRD0_CRD_PRES
-
Ivan Vecera authored
Recent commit 34393529 ("be2net: fix MAC addr setting on privileged BE3 VFs") allows privileged BE3 VFs to set its MAC address during initialization. Although the initial MAC for such VFs is already programmed by parent PF the subsequent setting performed by VF is OK, but in certain cases (after fresh boot) this command in VF can fail. The MAC should be initialized only when: 1) no MAC is programmed (always except BE3 VFs during first init) 2) programmed MAC is different from requested (e.g. MAC is set when interface is down). In this case the initial MAC programmed by PF needs to be deleted. The adapter->dev_mac contains MAC address currently programmed in HW so it should be zeroed when the MAC is deleted from HW and should not be filled when MAC is set when interface is down in be_mac_addr_set() as no programming is performed in this case. Example of failure without the fix (immediately after fresh boot): # ip link set eth0 up <- eth0 is BE3 PF be2net 0000:01:00.0 eth0: Link is Up # echo 1 > /sys/class/net/eth0/device/sriov_numvfs <- Create 1 VF ... be2net 0000:01:04.0: Emulex OneConnect(be3): VF port 0 # ip link set eth8 up <- eth8 is created privileged VF be2net 0000:01:04.0: opcode 59-1 failed:status 1-76 RTNETLINK answers: Input/output error # echo 0 > /sys/class/net/eth0/device/sriov_numvfs <- Delete VF iommu: Removing device 0000:01:04.0 from group 33 ... # echo 1 > /sys/class/net/eth0/device/sriov_numvfs <- Create it again iommu: Removing device 0000:01:04.0 from group 33 ... # ip link set eth8 up be2net 0000:01:04.0 eth8: Link is Up Initialization is now OK. v2 - Corrected the comment and condition check suggested by Suresh & Harsha Fixes: 34393529 ("be2net: fix MAC addr setting on privileged BE3 VFs") Cc: Sathya Perla <sathya.perla@broadcom.com> Cc: Ajit Khaparde <ajit.khaparde@broadcom.com> Cc: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Cc: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Ivan Vecera <cera@cera.cz> Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-traceLinus Torvalds authored
Pull tracing fix from Steven Rostedt: "It was reported to me that the thread created by the hwlat tracer does not migrate after the first instance. I found that there was as small bug in the logic, and fixed it. It's minor, but should be fixed regardless. There's not much impact outside the hwlat tracer" * tag 'trace-4.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Fix hwlat kthread migration
-
- 31 Jan, 2017 15 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/inputLinus Torvalds authored
Pull input subsystem fixes from Dmitry Torokhov: "A fix for a crash in the wm97xx driver and synaptics-rmi4 will stop throwing erroneous warnings." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: synaptics-rmi4 - fix reversed conditions in enable/disable_irq_wake Input: wm97xx - make missing platform data non-fatal
-
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroupLinus Torvalds authored
Pull cgroup fix from Tejun Heo: "The cgroup creation path was getting the order of operations wrong and exposing cgroups which don't have their names set yet to controllers which can lead to NULL derefs. This contains the fix for the bug" * 'for-4.10-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: don't online subsystems before cgroup_name/path() are operational
-
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpuLinus Torvalds authored
Pull percpu fix from Tejun Heo: "Douglas found and fixed a ref leak bug in percpu_ref_tryget[_live](). The bug is caused by storing the return value of atomic_long_inc_not_zero() into an int temp variable before returning it as a bool. The interim cast to int loses the upper bits and can lead to false negatives. As percpu_ref uses a high bit to mark a draining counter, this can happen relatively easily. Fixed by using bool for the temp variable" * 'for-4.10-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: percpu-refcount: fix reference leak during percpu-atomic transition
-
git://git.kernel.org/pub/scm/linux/kernel/git/tj/libataLinus Torvalds authored
Pull libata fixes from Tejun Heo: "Three libata fixes: an error handling fix, blacklist addition for another fallout from upping the default max sectors, and fix for a sense data reporting bug which affects new harddrives which can report sense data" * 'for-4.10-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata: ata: sata_mv:- Handle return value of devm_ioremap. libata: Fix ATA request sense libata: apply MAX_SEC_1024 to all CX1-JB*-HP devices
-
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hidLinus Torvalds authored
Pull HID fixes from Jiri Kosina: - regression fix (sleeping while atomic) for cp2112, from Johan Hovold - regression fix for proximity handling under certain circumstances in Wacom driver, from Jason Gerecke - functional fix for Logitech Rumblepad 2, from Ardinartsev Nikita * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: HID: cp2112: fix gpio-callback error handling HID: cp2112: fix sleep-while-atomic HID: hid-lg: Fix immediate disconnection of Logitech Rumblepad 2 HID: usbhid: Quirk a AMI virtual mouse and keyboard with ALWAYS_POLL HID: wacom: Fix poor prox handling in 'wacom_pl_irq'
-
git://git.samba.org/sfrench/cifs-2.6Linus Torvalds authored
Pull cifs fix from Steve French: "A small cifs fix for stable" * 'for-next' of git://git.samba.org/sfrench/cifs-2.6: cifs: initialize file_info_lock
-
David Howells authored
Under some circumstances, an fscache object can become queued such that it fscache_object_work_func() can be called once the object is in the OBJECT_DEAD state. This results in the kernel oopsing when it tries to invoke the handler for the state (which is hard coded to 0x2). The way this comes about is something like the following: (1) The object dispatcher is processing a work state for an object. This is done in workqueue context. (2) An out-of-band event comes in that isn't masked, causing the object to be queued, say EV_KILL. (3) The object dispatcher finishes processing the current work state on that object and then sees there's another event to process, so, without returning to the workqueue core, it processes that event too. It then follows the chain of events that initiates until we reach OBJECT_DEAD without going through a wait state (such as WAIT_FOR_CLEARANCE). At this point, object->events may be 0, object->event_mask will be 0 and oob_event_mask will be 0. (4) The object dispatcher returns to the workqueue processor, and in due course, this sees that the object's work item is still queued and invokes it again. (5) The current state is a work state (OBJECT_DEAD), so the dispatcher jumps to it - resulting in an OOPS. When I'm seeing this, the work state in (1) appears to have been either LOOK_UP_OBJECT or CREATE_OBJECT (object->oob_table is fscache_osm_lookup_oob). The window for (2) is very small: (A) object->event_mask is cleared whilst the event dispatch process is underway - though there's no memory barrier to force this to the top of the function. The window, therefore is from the time the object was selected by the workqueue processor and made requeueable to the time the mask was cleared. (B) fscache_raise_event() will only queue the object if it manages to set the event bit and the corresponding event_mask bit was set. The enqueuement is then deferred slightly whilst we get a ref on the object and get the per-CPU variable for workqueue congestion. This slight deferral slightly increases the probability by allowing extra time for the workqueue to make the item requeueable. Handle this by giving the dead state a processor function and checking the for the dead state address rather than seeing if the processor function is address 0x2. The dead state processor function can then set a flag to indicate that it's occurred and give a warning if it occurs more than once per object. If this race occurs, an oops similar to the following is seen (note the RIP value): BUG: unable to handle kernel NULL pointer dereference at 0000000000000002 IP: [<0000000000000002>] 0x1 PGD 0 Oops: 0010 [#1] SMP Modules linked in: ... CPU: 17 PID: 16077 Comm: kworker/u48:9 Not tainted 3.10.0-327.18.2.el7.x86_64 #1 Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 12/27/2015 Workqueue: fscache_object fscache_object_work_func [fscache] task: ffff880302b63980 ti: ffff880717544000 task.ti: ffff880717544000 RIP: 0010:[<0000000000000002>] [<0000000000000002>] 0x1 RSP: 0018:ffff880717547df8 EFLAGS: 00010202 RAX: ffffffffa0368640 RBX: ffff880edf7a4480 RCX: dead000000200200 RDX: 0000000000000002 RSI: 00000000ffffffff RDI: ffff880edf7a4480 RBP: ffff880717547e18 R08: 0000000000000000 R09: dfc40a25cb3a4510 R10: dfc40a25cb3a4510 R11: 0000000000000400 R12: 0000000000000000 R13: ffff880edf7a4510 R14: ffff8817f6153400 R15: 0000000000000600 FS: 0000000000000000(0000) GS:ffff88181f420000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000002 CR3: 000000000194a000 CR4: 00000000001407e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Stack: ffffffffa0363695 ffff880edf7a4510 ffff88093f16f900 ffff8817faa4ec00 ffff880717547e60 ffffffff8109d5db 00000000faa4ec18 0000000000000000 ffff8817faa4ec18 ffff88093f16f930 ffff880302b63980 ffff88093f16f900 Call Trace: [<ffffffffa0363695>] ? fscache_object_work_func+0xa5/0x200 [fscache] [<ffffffff8109d5db>] process_one_work+0x17b/0x470 [<ffffffff8109e4ac>] worker_thread+0x21c/0x400 [<ffffffff8109e290>] ? rescuer_thread+0x400/0x400 [<ffffffff810a5acf>] kthread+0xcf/0xe0 [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140 [<ffffffff816460d8>] ret_from_fork+0x58/0x90 [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140 Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Jeremy McNicoll <jeremymc@redhat.com> Tested-by: Frank Sorenson <sorenson@redhat.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
David Howells authored
fscache_disable_cookie() needs to clear the outstanding writes on the cookie it's disabling because they cannot be completed after. Without this, fscache_nfs_open_file() gets stuck because it disables the cookie when the file is opened for writing but can't uncache the pages till afterwards - otherwise there's a race between the open routine and anyone who already has it open R/O and is still reading from it. Looking in /proc/pid/stack of the offending process shows: [<ffffffffa0142883>] __fscache_wait_on_page_write+0x82/0x9b [fscache] [<ffffffffa014336e>] __fscache_uncache_all_inode_pages+0x91/0xe1 [fscache] [<ffffffffa01740fa>] nfs_fscache_open_file+0x59/0x9e [nfs] [<ffffffffa01ccf41>] nfs4_file_open+0x17f/0x1b8 [nfsv4] [<ffffffff8117350e>] do_dentry_open+0x16d/0x2b7 [<ffffffff811743ac>] vfs_open+0x5c/0x65 [<ffffffff81184185>] path_openat+0x785/0x8fb [<ffffffff81184343>] do_filp_open+0x48/0x9e [<ffffffff81174710>] do_sys_open+0x13b/0x1cb [<ffffffff811747b9>] SyS_open+0x19/0x1b [<ffffffff81001c44>] do_syscall_64+0x80/0x17a [<ffffffff8165c2da>] return_from_SYSCALL_64+0x0/0x7a [<ffffffffffffffff>] 0xffffffffffffffff Reported-by: Jianhong Yin <jiyin@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Jeff Layton <jlayton@redhat.com> Acked-by: Steve Dickson <steved@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
David Howells authored
Initialise the stores_lock in fscache netfs cookies. Technically, it shouldn't be necessary, since the netfs cookie is an index and stores no data, but initialising it anyway adds insignificant overhead. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Acked-by: Steve Dickson <steved@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Dimitris Michailidis authored
ip6_make_flowlabel() determines the flow label for IPv6 packets. It's supposed to be passed a flow label, which it returns as is if non-0 and in some other cases, otherwise it calculates a new value. The problem is callers often pass a flowi6.flowlabel, which may also contain traffic class bits. If the traffic class is non-0 ip6_make_flowlabel() mistakes the non-0 it gets as a flow label and returns the whole thing. Thus it can return a 'flow label' longer than 20b and the low 20b of that is typically 0 resulting in packets with 0 label. Moreover, different packets of a flow may be labeled differently. For a TCP flow with ECN non-payload and payload packets get different labels as exemplified by this pair of consecutive packets: (pure ACK) Internet Protocol Version 6, Src: 2002:af5:11a3::, Dst: 2002:af5:11a2:: 0110 .... = Version: 6 .... 0000 0000 .... .... .... .... .... = Traffic Class: 0x00 (DSCP: CS0, ECN: Not-ECT) .... 0000 00.. .... .... .... .... .... = Differentiated Services Codepoint: Default (0) .... .... ..00 .... .... .... .... .... = Explicit Congestion Notification: Not ECN-Capable Transport (0) .... .... .... 0001 1100 1110 0100 1001 = Flow Label: 0x1ce49 Payload Length: 32 Next Header: TCP (6) (payload) Internet Protocol Version 6, Src: 2002:af5:11a3::, Dst: 2002:af5:11a2:: 0110 .... = Version: 6 .... 0000 0010 .... .... .... .... .... = Traffic Class: 0x02 (DSCP: CS0, ECN: ECT(0)) .... 0000 00.. .... .... .... .... .... = Differentiated Services Codepoint: Default (0) .... .... ..10 .... .... .... .... .... = Explicit Congestion Notification: ECN-Capable Transport codepoint '10' (2) .... .... .... 0000 0000 0000 0000 0000 = Flow Label: 0x00000 Payload Length: 688 Next Header: TCP (6) This patch allows ip6_make_flowlabel() to be passed more than just a flow label and has it extract the part it really wants. This was simpler than modifying the callers. With this patch packets like the above become Internet Protocol Version 6, Src: 2002:af5:11a3::, Dst: 2002:af5:11a2:: 0110 .... = Version: 6 .... 0000 0000 .... .... .... .... .... = Traffic Class: 0x00 (DSCP: CS0, ECN: Not-ECT) .... 0000 00.. .... .... .... .... .... = Differentiated Services Codepoint: Default (0) .... .... ..00 .... .... .... .... .... = Explicit Congestion Notification: Not ECN-Capable Transport (0) .... .... .... 1010 1111 1010 0101 1110 = Flow Label: 0xafa5e Payload Length: 32 Next Header: TCP (6) Internet Protocol Version 6, Src: 2002:af5:11a3::, Dst: 2002:af5:11a2:: 0110 .... = Version: 6 .... 0000 0010 .... .... .... .... .... = Traffic Class: 0x02 (DSCP: CS0, ECN: ECT(0)) .... 0000 00.. .... .... .... .... .... = Differentiated Services Codepoint: Default (0) .... .... ..10 .... .... .... .... .... = Explicit Congestion Notification: ECN-Capable Transport codepoint '10' (2) .... .... .... 1010 1111 1010 0101 1110 = Flow Label: 0xafa5e Payload Length: 688 Next Header: TCP (6) Signed-off-by: Dimitris Michailidis <dmichail@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vincent authored
This fixes the following smatch and coccinelle warnings: drivers/net/ethernet/cavium/thunder/thunder_xcv.c:119 xcv_setup_link() error: we previously assumed 'xcv' could be null (see line 118) [smatch] drivers/net/ethernet/cavium/thunder/thunder_xcv.c:119:16-20: ERROR: xcv is NULL but dereferenced. [coccinelle] Fixes: 6465859a ("net: thunderx: Add RGMII interface type support") Signed-off-by: Vincent Stehlé <vincent.stehle@laposte.net> Cc: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Steven Rostedt (VMware) authored
The hwlat tracer creates a kernel thread at start of the tracer. It is pinned to a single CPU and will move to the next CPU after each period of running. If the user modifies the migration thread's affinity, it will not change after that happens. The original code created the thread at the first instance it was called, but later was changed to destroy the thread after the tracer was finished, and would not be created until the next instance of the tracer was established. The code that initialized the affinity was only called on the initial instantiation of the tracer. After that, it was not initialized, and the previous affinity did not match the current newly created one, making it appear that the user modified the thread's affinity when it did not, and the thread failed to migrate again. Cc: stable@vger.kernel.org Fixes: 0330f7aa ("tracing: Have hwlat trace migrate across tracing_cpumask CPUs") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
-
Johan Hovold authored
In case of a zero-length report, the gpio direction_input callback would currently return success instead of an errno. Fixes: 1ffb3c40 ("HID: cp2112: make transfer buffers DMA capable") Cc: stable <stable@vger.kernel.org> # 4.9 Signed-off-by: Johan Hovold <johan@kernel.org> Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
-
Johan Hovold authored
A recent commit fixing DMA-buffers on stack added a shared transfer buffer protected by a spinlock. This is broken as the USB HID request callbacks can sleep. Fix this up by replacing the spinlock with a mutex. Fixes: 1ffb3c40 ("HID: cp2112: make transfer buffers DMA capable") Cc: stable <stable@vger.kernel.org> # 4.9 Signed-off-by: Johan Hovold <johan@kernel.org> Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
-
Christophe JAILLET authored
These tests are reversed. A warning should be displayed if an error is returned, not on success. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
-
- 30 Jan, 2017 8 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparcLinus Torvalds authored
Pull sparc fixes from David Miller: "Several small bug fixes and tidies, along with a fix for non-resumable memory errors triggered by userspace" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc: sparc64: Handle PIO & MEM non-resumable errors. sparc64: Zero pages on allocation for mondo and error queues. sparc: Fixed typo in sstate.c. Replaced panicing with panicking sparc: use symbolic names for tsb indexing
-
David S. Miller authored
Liam R. Howlett says: ==================== sparc64: Recover from userspace non-resumable PIO & MEM errors A non-resumable error from userspace is able to cause a kernel panic or trap loop due to the setup and handling of the queued traps once in the kernel. This patch series addresses both of these issues. The queues are fixed by simply zeroing the memory before use. PIO errors from userspace will result in a SIGBUS being sent to the user process. The MEM errors form userspace will result in a SIGKILL and also cause the offending pages to be claimed so they are no longer used in future tasks. SIGKILL is used to ensure that the process does not try to coredump and result in an attempt to read the memory again from within kernel space. Although there is a HV call to scrub the memory (mem_scrub), there is no easy way to guarantee that the real memory address(es) are not used by other tasks. Clearing the error with mem_scrub would zero the memory and cause the other processes to proceed with bad data. The handling of other non-resumable errors remain unchanged and will cause a panic. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Liam R. Howlett authored
User processes trying to access an invalid memory address via PIO will receive a SIGBUS signal instead of causing a panic. Memory errors will receive a SIGKILL since a SIGBUS may result in a coredump which may attempt to repeat the faulting access. Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Liam R. Howlett authored
Error queues use a non-zero first word to detect if the queues are full. Using pages that have not been zeroed may result in false positive overflow events. These queues are set up once during boot so zeroing all mondo and error queue pages is safe. Note that the false positive overflow does not always occur because the page allocation for these queues is so early in the boot cycle that higher number CPUs get fresh pages. It is only when traps are serviced with lower number CPUs who were given already used pages that this issue is exposed. Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Simon Horman authored
When matching on the ICMPv6 code ICMPV6_CODE rather than ICMPV4_CODE attributes should be used. This corrects what appears to be a typo. Sample usage: tc qdisc add dev eth0 ingress tc filter add dev eth0 protocol ipv6 parent ffff: flower \ indev eth0 ip_proto icmpv6 type 128 code 0 action drop Without this change the code parameter above is effectively ignored. Fixes: 7b684884 ("net/sched: cls_flower: Support matching on ICMP type and code") Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Merge tag 'linux-can-fixes-for-4.10-20170130' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2017-01-30 this is a pull request of one patch. The patch is by Oliver Hartkopp and fixes the hrtimer/tasklet termination in bcm op removal. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linuxLinus Torvalds authored
Pull RTC fix from Alexandre Belloni: "A single fix for this cycle. It is worth taking it for 4.10 so that distributions will not have CONFIG_RTC_DRV_JZ4740 switching from m to y in their config. Summary: - Allow jz4740 to build as a module again by using kernel_halt()" * tag 'rtc-4.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: rtc: jz4740: make the driver buildable as a module again
-
Vlad Yasevich authored
IPv6 will mark data that is smaller that mtu - headersize as CHECKSUM_PARTIAL, but if the data will completely fill the mtu, the packet checksum will be computed in software instead. Extend the conditional to include the data that fills the mtu as well. Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-