- 20 Aug, 2019 1 commit
-
-
Thomas Gleixner authored
Posix timer delete retry loops are affected by the same priority inversion and live lock issues as the other timers. Provide an RT specific synchronization function which keeps a reference to the timer by holding the RCU read lock to prevent the timer from being freed, dropping the timer lock and invoking the timer specific wait function via a new callback. This does not yet cover posix CPU timers because they need more special treatment on PREEMPT_RT. [ This is folded into the original attempt which did not use a callback. ] Originally-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lkml.kernel.org/r/20190819143801.656864506@linutronix.de
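A minimal sketch of the described synchronization helper, close to what the series adds (treat names and details as illustrative):

	static struct k_itimer *timer_wait_running(struct k_itimer *timer,
						   unsigned long *flags)
	{
		const struct k_clock *kc = READ_ONCE(timer->kclock);
		timer_t timer_id = READ_ONCE(timer->it_id);

		/* Prevent kfree(timer) after dropping the lock */
		rcu_read_lock();
		unlock_timer(timer, *flags);

		/* Timer specific wait for the running callback to finish */
		if (!WARN_ON_ONCE(!kc->timer_wait_running))
			kc->timer_wait_running(timer);

		rcu_read_unlock();

		/* Relock the timer. It might no longer be hashed. */
		return lock_timer(timer_id, flags);
	}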
-
- 01 Aug, 2019 21 commits
-
-
Sebastian Andrzej Siewior authored
Timer deletion on PREEMPT_RT is prone to priority inversion and live locks. The hrtimer code has a synchronization mechanism for this. Posix CPU timers will grow one. But that mechanism cannot be invoked while holding the k_itimer lock because that can deadlock against the running timer callback. So the lock must be dropped which allows the timer to be freed. The timer free can be prevented by taking the RCU read lock before dropping the lock, but because the rcu_head is part of the 'it' union a concurrent free will overwrite the hrtimer on which the task is trying to synchronize. Move the rcu_head out of the union to prevent this. [ tglx: Fixed up kernel-doc. Rewrote changelog ] Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190730223828.965541887@linutronix.de
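A sketch of the structural change (surrounding members abbreviated, union contents illustrative):

	struct k_itimer {
		...
		union {
			struct {
				struct hrtimer	timer;
				ktime_t		interval;
			} real;
			/* other per-clock representations */
		} it;
		/*
		 * Out of the union: a concurrent call_rcu() can no longer
		 * overwrite it.real.timer while a task synchronizes on it.
		 */
		struct rcu_head		rcu;
	};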
-
Thomas Gleixner authored
As a preparatory step for adding the PREEMPT RT specific synchronization mechanism to wait for a running timer callback, rework the timer cancel retry loops so they call a common function. This allows trivial substitution in one place. Originally-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190730223828.874901027@linutronix.de
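The reworked retry loop plausibly becomes (a sketch; the helper names are assumptions based on the description):

	/* Drop the lock, give the callback a chance to finish, relock */
	while (timer_delete_hook(timer) == TIMER_RETRY)
		timer = timer_wait_running(timer, &flags);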
-
Thomas Gleixner authored
do_timer_settime() has a 'flags' argument and uses 'flag' for the interrupt flags, which is confusing at best. Rename the argument so 'flags' can be used for interrupt flags as usual. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190730223828.782664411@linutronix.de
-
Anna-Maria Gleixner authored
Use the hrtimer_cancel_wait_running() synchronization mechanism to prevent priority inversion and live locks on PREEMPT_RT. As a benefit the retry loop gains the missing cpu_relax() on !RT. [ tglx: Split out of combo patch ] Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190730223828.690771827@linutronix.de
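On PREEMPT_RT, hrtimer_cancel_wait_running() blocks on the expiry lock until the softirq thread finishes the callback; on !RT it reduces to cpu_relax(), which is how the retry loop gains the missing relax. The resulting loop is roughly (sketch):

	while (hrtimer_try_to_cancel(timer) < 0)
		hrtimer_cancel_wait_running(timer);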
-
Anna-Maria Gleixner authored
Use the hrtimer_cancel_wait_running() synchronization mechanism to prevent priority inversion and live locks on PREEMPT_RT. [ tglx: Split out of combo patch ] Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190730223828.600085866@linutronix.de
-
Anna-Maria Gleixner authored
Use the hrtimer_cancel_wait_running() synchronization mechanism to prevent priority inversion and live locks on PREEMPT_RT. [ tglx: Split out of combo patch ] Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190730223828.508744705@linutronix.de
-
Juri Lelli authored
The SCHED_DEADLINE inactive timer needs to run in hardirq context (as dl_task_timer already does) on PREEMPT_RT. Change the mode to HRTIMER_MODE_REL_HARD. [ tglx: Fixed up the start site, so mode debugging works ] Signed-off-by: Juri Lelli <juri.lelli@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190731103715.4047-1-juri.lelli@redhat.com
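The change boils down to initializing and arming the timer in hard expiry mode, roughly (sketch; the expiry variable is illustrative):

	/* Init and start must agree on the hard expiry mode */
	hrtimer_init(timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_HARD);
	hrtimer_start(timer, ns_to_ktime(zerolag_time), HRTIMER_MODE_REL_HARD);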
-
Anna-Maria Gleixner authored
When PREEMPT_RT is enabled, the soft interrupt thread can be preempted. If the soft interrupt thread is preempted in the middle of a timer callback, then calling del_timer_sync() can lead to two issues: - If the caller is on a remote CPU then it has to spin wait for the timer handler to complete. This can result in unbounded priority inversion. - If the caller originates from the task which preempted the timer handler on the same CPU, then spin waiting for the timer handler to complete is never going to end. To avoid these issues, add a new lock to the timer base which is held around the execution of the timer callbacks. If del_timer_sync() detects that the timer callback is currently running, it blocks on the expiry lock. When the callback is finished, the expiry lock is dropped by the softirq thread which wakes up the waiter and the system makes progress. This addresses both the priority inversion and the live lock issues. This mechanism is not used for timers which are marked IRQSAFE as for those preemption is disabled across the callback and therefore this situation cannot happen. The callbacks for such timers need to be individually audited for RT compliance. The same issue can happen in virtual machines when the vCPU which runs a timer callback is scheduled out. If a second vCPU of the same guest calls del_timer_sync() it will spin wait for the other vCPU to be scheduled back in. The expiry lock mechanism would avoid that. It'd be trivial to enable this when paravirt spinlocks are enabled in a guest, but it's not clear whether this is an actual problem in the wild, so for now it's an RT only mechanism. As the softirq thread can be preempted with PREEMPT_RT=y, the SMP variant of del_timer_sync() needs to be used on UP as well. [ tglx: Refactored it for mainline ] Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190726185753.832418500@linutronix.de
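The waiter side then amounts to a lock/unlock handshake on the new expiry lock, along these lines (sketch; helper and field names are assumptions):

	static void del_timer_wait_running(struct timer_list *timer)
	{
		u32 tf = READ_ONCE(timer->flags);

		if (!(tf & TIMER_MIGRATING)) {
			struct timer_base *base = get_timer_base(tf);

			/*
			 * Blocks until the softirq thread has finished the
			 * running callback and dropped the expiry lock.
			 */
			spin_lock(&base->expiry_lock);
			spin_unlock(&base->expiry_lock);
		}
	}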
-
Anna-Maria Gleixner authored
When PREEMPT_RT is enabled, the soft interrupt thread can be preempted. If the soft interrupt thread is preempted in the middle of a timer callback, then calling hrtimer_cancel() can lead to two issues: - If the caller is on a remote CPU then it has to spin wait for the timer handler to complete. This can result in unbounded priority inversion. - If the caller originates from the task which preempted the timer handler on the same CPU, then spin waiting for the timer handler to complete is never going to end. To avoid these issues, add a new lock to the timer base which is held around the execution of the timer callbacks. If hrtimer_cancel() detects that the timer callback is currently running, it blocks on the expiry lock. When the callback is finished, the expiry lock is dropped by the softirq thread which wakes up the waiter and the system makes progress. This addresses both the priority inversion and the live lock issues. The same issue can happen in virtual machines when the vCPU which runs a timer callback is scheduled out. If a second vCPU of the same guest calls hrtimer_cancel() it will spin wait for the other vCPU to be scheduled back in. The expiry lock mechanism would avoid that. It'd be trivial to enable this when paravirt spinlocks are enabled in a guest, but it's not clear whether this is an actual problem in the wild, so for now it's an RT only mechanism. [ tglx: Refactored it for mainline ] Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190726185753.737767218@linutronix.de
-
Sebastian Andrzej Siewior authored
On PREEMPT_RT enabled kernels hrtimers which are not explicitly marked for hard interrupt expiry mode are moved into soft interrupt context either for latency reasons or because the hrtimer callback takes regular spinlocks or invokes other functions which are not suitable for hard interrupt context on PREEMPT_RT. The hrtimer_sleeper callback is RT compatible in hard interrupt context, but there is a latency concern: Untrusted userspace can spawn many threads which arm timers for the same expiry time on the same CPU. On expiry that causes a latency spike due to the wakeup of a gazillion threads. OTOH, privileged real-time user space applications rely on the low latency of hard interrupt wakeups. These syscall related wakeups are all based on hrtimer sleepers. If the current task is in a real-time scheduling class, mark the mode for hard interrupt expiry. [ tglx: Split out of a larger combo patch. Added changelog ] Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190726185753.645792403@linutronix.de
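The scheduling class check during sleeper initialization is plausibly just (sketch):

	/*
	 * RT tasks rely on low latency wakeups, so expire their sleepers
	 * in hard interrupt context even on PREEMPT_RT.
	 */
	if (IS_ENABLED(CONFIG_PREEMPT_RT) && task_is_realtime(current))
		mode |= HRTIMER_MODE_HARD;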
-
Sebastian Andrzej Siewior authored
On PREEMPT_RT not all hrtimers can be expired in hard interrupt context even if that is perfectly fine on a PREEMPT_RT=n kernel, e.g. because they take regular spinlocks. Also for latency reasons PREEMPT_RT tries to defer most hrtimers' expiry into softirq context. hrtimers marked with HRTIMER_MODE_HARD must be kept in hard interrupt context expiry mode. Add the required logic. No functional change for PREEMPT_RT=n kernels. [ tglx: Split out of a larger combo patch. Added changelog ] Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190726185753.551967692@linutronix.de
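The core of the added logic is the clock base selection at init time, roughly (sketch):

	bool softtimer = !!(mode & HRTIMER_MODE_SOFT);

	/* RT: defer everything not explicitly marked HARD to softirq */
	if (IS_ENABLED(CONFIG_PREEMPT_RT) && !(mode & HRTIMER_MODE_HARD))
		softtimer = true;

	base = softtimer ? HRTIMER_MAX_CLOCK_BASES / 2 : 0;
	timer->is_soft = softtimer;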
-
Sebastian Andrzej Siewior authored
The tick related hrtimers, which drive the scheduler tick and hrtimer based broadcasting are required to expire in hard interrupt context for obvious reasons. Mark them so PREEMPT_RT kernels won't move them to soft interrupt expiry. Make the horribly formatted RCU_NONIDLE bracket maze readable while at it. No functional change. [ tglx: Split out from larger combo patch. Add changelog ] Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190726185753.459144407@linutronix.de
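Marking is done by using the _HARD mode variants at init time, e.g. for the tick sched timer (illustrative):

	hrtimer_init(&ts->sched_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_HARD);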
-
Sebastian Andrzej Siewior authored
On PREEMPT_RT enabled kernels unmarked hrtimers are moved into soft interrupt expiry mode by default. While that's not a functional requirement for the KVM local APIC timer emulation, it's a latency issue which can be avoided by marking the timer so hard interrupt context expiry is enforced. No functional change. [ tglx: Split out from larger combo patch. Add changelog. ] Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190726185753.363363474@linutronix.de
-
Sebastian Andrzej Siewior authored
The watchdog hrtimer must expire in hard interrupt context even on PREEMPT_RT=y kernels as otherwise the hard/softlockup detection logic would not work. No functional change. [ tglx: Split out from larger combo patch. Added changelog ] Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190726185753.262895510@linutronix.de
-
Sebastian Andrzej Siewior authored
To guarantee that the multiplexing mechanism and the hrtimer driven events work on PREEMPT_RT enabled kernels it's required that the related hrtimers expire in hard interrupt context. Mark them so PREEMPT_RT kernels won't defer them to soft interrupt context. No functional change. [ tglx: Split out of larger combo patch. Added changelog ] Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190726185753.169509224@linutronix.de
-
Sebastian Andrzej Siewior authored
The scheduler related hrtimers need to expire in hard interrupt context even on PREEMPT_RT enabled kernels. Mark them as such. No functional change. [ tglx: Split out from larger combo patch. Add changelog. ] Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190726185753.077004842@linutronix.de
-
Thomas Gleixner authored
hrtimer_start_range_ns() has a WARN_ONCE() which verifies that a timer which is marked for softirq expiry is not queued in the hard interrupt base and vice versa. When PREEMPT_RT is enabled, timers which are not explicitly marked to expire in hard interrupt context are deferred to the soft interrupt. So the regular check would trigger. Change the check so that when PREEMPT_RT is enabled it verifies that timers marked for hard interrupt expiry are not queued for soft interrupt expiry, and that unmarked or softirq marked timers are not queued for hard interrupt expiry. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
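A sketch of the adjusted check, assuming an is_hard flag mirroring the existing is_soft (details illustrative):

	/*
	 * On !RT the HRTIMER_MODE_SOFT bit must match timer->is_soft.
	 * On RT check the HARD bit instead, because unmarked timers
	 * are deferred to soft interrupt expiry.
	 */
	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
		WARN_ON_ONCE(!(mode & HRTIMER_MODE_SOFT) ^ !timer->is_soft);
	else
		WARN_ON_ONCE(!(mode & HRTIMER_MODE_HARD) ^ !timer->is_hard);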
-
Sebastian Andrzej Siewior authored
On PREEMPT_RT not all hrtimers can be expired in hard interrupt context even if that is perfectly fine on a PREEMPT_RT=n kernel, e.g. because they take regular spinlocks. Also for latency reasons PREEMPT_RT tries to defer most hrtimers' expiry into soft interrupt context. But there are hrtimers which must be expired in hard interrupt context even when PREEMPT_RT is enabled: - hrtimers which must expire in hard interrupt context, e.g. scheduler, perf, watchdog related hrtimers - latency critical hrtimers, e.g. nanosleep, ..., kvm lapic timer Add a new mode flag HRTIMER_MODE_HARD which allows marking these timers so PREEMPT_RT will not move them into softirq expiry mode. [ tglx: Split out of a larger combo patch. Added changelog ] Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190726185752.981398465@linutronix.de
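The flag slots in next to the existing mode bits, with combined convenience modes for callers (abbreviated sketch of the header layout):

	enum hrtimer_mode {
		HRTIMER_MODE_ABS	= 0x00,
		HRTIMER_MODE_REL	= 0x01,
		HRTIMER_MODE_PINNED	= 0x02,
		HRTIMER_MODE_SOFT	= 0x04,
		HRTIMER_MODE_HARD	= 0x08,

		HRTIMER_MODE_ABS_HARD	= HRTIMER_MODE_ABS | HRTIMER_MODE_HARD,
		HRTIMER_MODE_REL_HARD	= HRTIMER_MODE_REL | HRTIMER_MODE_HARD,
		...
	};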
-
Thomas Gleixner authored
hrtimer_sleepers will gain a scheduling class dependent treatment on PREEMPT_RT. Use the new hrtimer_sleeper_start_expires() function to make that possible. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Thomas Gleixner authored
hrtimer_sleepers will gain a scheduling class dependent treatment on PREEMPT_RT. Create a wrapper around hrtimer_start_expires() to make that possible. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
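The wrapper is presumably a thin indirection for now (sketch):

	void hrtimer_sleeper_start_expires(struct hrtimer_sleeper *sl,
					   enum hrtimer_mode mode)
	{
		/* PREEMPT_RT will add scheduling class dependent handling here */
		hrtimer_start_expires(&sl->timer, mode);
	}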
-
Sebastian Andrzej Siewior authored
hrtimer_init_sleeper() calls require prior initialisation of the hrtimer object which is embedded into the hrtimer_sleeper. Combine the initialization and spare a function call. Fix up all call sites. This is also a preparatory change for PREEMPT_RT to do hrtimer sleeper specific initializations of the embedded hrtimer without modifying any of the call sites. No functional change. [ anna-maria: Minor cleanups ] [ tglx: Adapted to the removal of the task argument of hrtimer_init_sleeper() and trivial polishing. Folded a fix from Stephen Rothwell for the vsoc code ] Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190726185752.887468908@linutronix.de
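After the change one call does both initializations, roughly (sketch; the wakeup function name is an assumption):

	void hrtimer_init_sleeper(struct hrtimer_sleeper *sl, clockid_t clock_id,
				  enum hrtimer_mode mode)
	{
		/* Initialize the embedded hrtimer and the sleeper in one go */
		__hrtimer_init(&sl->timer, clock_id, mode);
		sl->timer.function = hrtimer_wakeup;
		sl->task = current;
	}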
-
- 30 Jul, 2019 1 commit
-
-
Thomas Gleixner authored
All callers hand in 'current' and that's the only task pointer which actually makes sense. Remove the task argument and set current in the function. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190726185752.791885290@linutronix.de
-
- 24 Jul, 2019 1 commit
-
-
Davidlohr Bueso authored
Simplify the timerqueue code by using cached rbtrees and rely on the tree leftmost node semantics to get the timer with earliest expiration time. This is a drop-in conversion, and therefore semantics remain untouched. The runtime overhead of cached rbtrees is pretty much the same as the current head->next method, noting that when removing the leftmost node, a common operation for the timerqueue, the rb_next(leftmost) is O(1) as well, so the next timer will either be the right node or its parent. Therefore no extra pointer chasing. Finally, the size of the struct timerqueue_head remains the same. Passes several hours of rcutorture. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190724152323.bojciei3muvfxalm@linux-r8p5
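The conversion swaps the explicit head->next pointer for a cached rbtree root; getnext then just reads the cached leftmost node (sketch, close to the upstream result):

	struct timerqueue_head {
		struct rb_root_cached rb_root;
	};

	static inline struct timerqueue_node *
	timerqueue_getnext(struct timerqueue_head *head)
	{
		struct rb_node *leftmost = rb_first_cached(&head->rb_root);

		/* NULL when the queue is empty */
		return rb_entry_safe(leftmost, struct timerqueue_node, node);
	}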
-
- 23 Jul, 2019 1 commit
-
-
Linus Torvalds authored
Pull parisc fixes from Helge Deller:

 - Fix build issues when kprobes are enabled

 - Speed up ITLB/DTLB cache flushes when running on machines with combined TLBs

* 'parisc-5.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: Flush ITLB in flush_tlb_all_local() only on split TLB machines
  parisc: add kprobe_fault_handler()
-
- 22 Jul, 2019 9 commits
-
-
Linus Torvalds authored
Pull preemption Kconfig fix from Thomas Gleixner:
 "The PREEMPT_RT stub config renamed PREEMPT to PREEMPT_LL and defined PREEMPT outside of the menu and made it selectable by both PREEMPT_LL and PREEMPT_RT.

  Stupid me missed that 114 defconfigs select CONFIG_PREEMPT which obviously can't work anymore. oldconfig builds are affected as well, but it's more obvious as the user gets asked. [old]defconfig silently fixes it up and selects PREEMPT_NONE.

  Unbreak it by undoing the rename and adding an intermediate config symbol which is selected by both PREEMPT and PREEMPT_RT. That requires chasing down a few #ifdefs, but it's better than tweaking 114 defconfigs and annoying users"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/rt, Kconfig: Unbreak def/oldconfig with CONFIG_PREEMPT=y
-
Linus Torvalds authored
Pull pidfd polling fix from Christian Brauner:
 "A fix for pidfd polling. It ensures that the task's exit state is visible to all waiters"

* tag 'for-linus-20190722' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  pidfd: fix a poll race when setting exit_state
-
Linus Torvalds authored
Pull btrfs fixes from David Sterba:

 - fixes for leaks caused by recently merged patches

 - one build fix

 - a fix to prevent mixing of incompatible features

* tag 'for-5.3-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: don't leak extent_map in btrfs_get_io_geometry()
  btrfs: free checksum hash on in close_ctree
  btrfs: Fix build error while LIBCRC32C is module
  btrfs: inode: Don't compress if NODATASUM or NODATACOW set
-
Thomas Gleixner authored
The merge of the CONFIG_PREEMPT_RT stub renamed CONFIG_PREEMPT to CONFIG_PREEMPT_LL which causes all defconfigs which have CONFIG_PREEMPT=y set to fall back to CONFIG_PREEMPT_NONE because CONFIG_PREEMPT depends on the preemption mode choice which defaults to NONE. This also affects oldconfig builds. So rather than changing 114 defconfig files and being an annoyance to users, revert the rename and select a new config symbol PREEMPTION. That keeps everything working smoothly and the relevant #ifdefs are going to be fixed up step by step. Reported-by: Mark Rutland <mark.rutland@arm.com> Fixes: a50a3f4b ("sched/rt, Kconfig: Introduce CONFIG_PREEMPT_RT") Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Linus Torvalds authored
Pull media fixes from Mauro Carvalho Chehab:
 "For two regressions in media core:

  - v4l2-subdev: fix regression in check_pad()

  - videodev2.h: change V4L2_PIX_FMT_BGRA444 define: fourcc was already in use"

* tag 'media/v5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  media: videodev2.h: change V4L2_PIX_FMT_BGRA444 define: fourcc was already in use
  media: v4l2-subdev: fix regression in check_pad()
-
Linus Torvalds authored
Pull networking fixes from David Miller:

 1) Several netfilter fixes including a nfnetlink deadlock fix from Florian Westphal and fix for dropping VRF packets from Miaohe Lin.

 2) Flow offload fixes from Pablo Neira Ayuso including a fix to restore proper block sharing.

 3) Fix r8169 PHY init from Thomas Voegtle.

 4) Fix memory leak in mac80211, from Lorenzo Bianconi.

 5) Missing NULL check on object allocation in cxgb4, from Navid Emamdoost.

 6) Fix scaling of RX power in sfp phy driver, from Andrew Lunn.

 7) Check that there is actually an ip header to access in skb->data in VRF, from Peter Kosyh.

 8) Remove spurious rcu unlock in hv_netvsc, from Haiyang Zhang.

 9) One more tweak to the TCP fragmentation memory limit changes, to be less harmful to applications setting small SO_SNDBUF values. From Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (40 commits)
  tcp: be more careful in tcp_fragment()
  hv_netvsc: Fix extra rcu_read_unlock in netvsc_recv_callback()
  vrf: make sure skb->data contains ip header to make routing
  connector: remove redundant input callback from cn_dev
  qed: Prefer pcie_capability_read_word()
  igc: Prefer pcie_capability_read_word()
  cxgb4: Prefer pcie_capability_read_word()
  be2net: Synchronize be_update_queues with dev_watchdog
  bnx2x: Prevent load reordering in tx completion processing
  net: phy: sfp: hwmon: Fix scaling of RX power
  net: sched: verify that q!=NULL before setting q->flags
  chelsio: Fix a typo in a function name
  allocate_flower_entry: should check for null deref
  net: hns3: typo in the name of a constant
  kbuild: add net/netfilter/nf_tables_offload.h to header-test blacklist.
  tipc: Fix a typo
  mac80211: don't warn about CW params when not using them
  mac80211: fix possible memory leak in ieee80211_assign_beacon
  nl80211: fix NL80211_HE_MAX_CAPABILITY_LEN
  nl80211: fix VENDOR_CMD_RAW_DATA
  ...
-
Suren Baghdasaryan authored
There is a race between reading task->exit_state in pidfd_poll and writing it after do_notify_parent calls do_notify_pidfd. The expected sequence of events is:

CPU 0                            CPU 1
------------------------------------------------
exit_notify
  do_notify_parent
    do_notify_pidfd
tsk->exit_state = EXIT_DEAD
                                 pidfd_poll
                                    if (tsk->exit_state)

However nothing prevents the following sequence:

CPU 0                            CPU 1
------------------------------------------------
exit_notify
  do_notify_parent
    do_notify_pidfd
                                 pidfd_poll
                                    if (tsk->exit_state)
tsk->exit_state = EXIT_DEAD

This causes a polling task to wait forever, since poll blocks because exit_state is 0 and the waiting task is not notified again. A stress test continuously doing pidfd poll and process exits uncovered this bug. To fix it, we make sure that the task's exit_state is always set before calling do_notify_pidfd. Fixes: b53b0b9d ("pidfd: add polling support") Cc: kernel-team@android.com Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Link: https://lore.kernel.org/r/20190717172100.261204-1-joel@joelfernandes.org [christian@brauner.io: adapt commit message and drop unneeded changes from wait_task_zombie] Signed-off-by: Christian Brauner <christian@brauner.io>
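A minimal sketch of the fix in exit_notify(), assuming the autoreap decision upgrades the state afterwards (illustrative; the real patch may differ in detail):

	/*
	 * Publish a non-zero exit state before do_notify_parent() can
	 * reach do_notify_pidfd() and wake pidfd_poll() waiters.
	 */
	tsk->exit_state = EXIT_ZOMBIE;
	...
	if (autoreap) {
		tsk->exit_state = EXIT_DEAD;
		list_add(&tsk->ptrace_entry, &dead);
	}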
-
Eric Dumazet authored
Some applications set tiny SO_SNDBUF values and expect TCP to just work. Recent patches to address CVE-2019-11478 broke them in case of losses, since retransmits might be prevented. We should allow these flows to make progress. This patch allows the first and last skb in the retransmit queue to be split even if memory limits are hit. It also adds some room due to the fact that tcp_sendmsg() and tcp_sendpage() might overshoot sk_wmem_queued by about one full TSO skb (64KB size). Note this allowance was already present in stable backports for kernels < 4.15. Note for < 4.15 backports: tcp_rtx_queue_tail() will probably look like:

static inline struct sk_buff *tcp_rtx_queue_tail(const struct sock *sk)
{
	struct sk_buff *skb = tcp_send_head(sk);

	return skb ? tcp_write_queue_prev(sk, skb) : tcp_write_queue_tail(sk);
}

Fixes: f070ef2a ("tcp: tcp_fragment() should apply sane memory limits") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Andrew Prout <aprout@ll.mit.edu> Tested-by: Andrew Prout <aprout@ll.mit.edu> Tested-by: Jonathan Lemon <jonathan.lemon@gmail.com> Tested-by: Michal Kubecek <mkubecek@suse.cz> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Acked-by: Christoph Paasch <cpaasch@apple.com> Cc: Jonathan Looney <jtl@netflix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
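The relaxed limit check in tcp_fragment() plausibly looks like this sketch (the extra headroom term and the queue edge exemptions are the point; exact constants are assumptions):

	/* Overshoot allowance for tcp_sendmsg()/tcp_sendpage() */
	limit = sk->sk_sndbuf + 2 * SKB_TRUESIZE(GSO_MAX_SIZE);
	if (unlikely((sk->sk_wmem_queued >> 1) > limit &&
		     skb != tcp_rtx_queue_head(sk) &&
		     skb != tcp_rtx_queue_tail(sk))) {
		NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPWQUEUETOOBIG);
		return -ENOMEM;
	}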
-
Haiyang Zhang authored
There is an extra rcu_read_unlock left in netvsc_recv_callback(), after a previous patch that removes RCU from this function. This patch removes the extra RCU unlock. Fixes: 345ac089 ("hv_netvsc: pass netvsc_device to receive callback") Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 21 Jul, 2019 6 commits
-
-
Linus Torvalds authored
-
Peter Kosyh authored
vrf_process_v4_outbound() and vrf_process_v6_outbound() do routing using ip/ipv6 addresses, but don't make sure the header is available in skb->data[] (skb_headlen() is less than the header size). Case:

1) igb driver from intel.
2) Packet size is greater than 255.
3) MPLS forwards to VRF device.

So, patch adds pskb_may_pull() calls in vrf_process_v4/v6_outbound() functions. Signed-off-by: Peter Kosyh <p.kosyh@gmail.com> Reviewed-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
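The fix is presumably a pskb_may_pull() before the header is dereferenced, along these lines (sketch):

	struct iphdr *ip4h;

	/*
	 * Make sure the IPv4 header is in the linear area (skb->data)
	 * before the routing code dereferences it.
	 */
	if (!pskb_may_pull(skb, sizeof(struct iphdr)))
		goto err;

	ip4h = ip_hdr(skb);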
-
Vasily Averin authored
A small cleanup: this callback is never used. Originally fixed by Stanislav Kinsburskiy <skinsbursky@virtuozzo.com> for OpenVZ7 bug OVZ-6877 cc: stanislav.kinsburskiy@gmail.com Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Frederick Lawler authored
Commit 8c0d3a02 ("PCI: Add accessors for PCI Express Capability") added accessors for the PCI Express Capability so that drivers didn't need to be aware of differences between v1 and v2 of the PCI Express Capability. Replace pci_read_config_word() and pci_write_config_word() calls with pcie_capability_read_word() and pcie_capability_write_word(). Signed-off-by: Frederick Lawler <fred@fredlawl.com> Acked-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
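The before/after pattern is roughly (sketch; the register and variable are chosen for illustration):

	u16 lnksta;

	/* Before: the driver had to locate the capability itself */
	pos = pci_find_capability(pdev, PCI_CAP_ID_EXP);
	pci_read_config_word(pdev, pos + PCI_EXP_LNKSTA, &lnksta);

	/* After: the accessor handles the offset and v1/v2 differences */
	pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnksta);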
-
Frederick Lawler authored
Commit 8c0d3a02 ("PCI: Add accessors for PCI Express Capability") added accessors for the PCI Express Capability so that drivers didn't need to be aware of differences between v1 and v2 of the PCI Express Capability. Replace pci_read_config_word() and pci_write_config_word() calls with pcie_capability_read_word() and pcie_capability_write_word(). Signed-off-by: Frederick Lawler <fred@fredlawl.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Frederick Lawler authored
Commit 8c0d3a02 ("PCI: Add accessors for PCI Express Capability") added accessors for the PCI Express Capability so that drivers didn't need to be aware of differences between v1 and v2 of the PCI Express Capability. Replace pci_read_config_word() and pci_write_config_word() calls with pcie_capability_read_word() and pcie_capability_write_word(). Signed-off-by: Frederick Lawler <fred@fredlawl.com> Reviewed-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-