Commits · f8935983f110505daa38e8d36ee406807f83a069 · nexedi / linux

13 Mar, 2015 6 commits

clocksource: Mostly kill clocksource_register() · f8935983

John Stultz authored Mar 11, 2015

A long running project has been to clean up remaining uses
of clocksource_register(), replacing it with the simpler
clocksource_register_khz/hz() functions.

However, there are a few cases where we need to self-define
our mult/shift values, so switch the function to a more
obviously internal __clocksource_register() name, and
consolidate much of the internal logic so we don't have
duplication.
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: David S. Miller <davem@davemloft.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1426133800-29329-10-git-send-email-john.stultz@linaro.org
[ Minor cleanups. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>

f8935983

clocksource: Improve clocksource watchdog reporting · 0b046b21

John Stultz authored Mar 11, 2015

The clocksource watchdog reporting has been less helpful
then desired, as it just printed the delta between
the two clocksources. This prevents any useful analysis
of why the skew occurred.

Thus this patch tries to improve the output when we
mark a clocksource as unstable, printing out the cycle
last and now values for both the current clocksource
and the watchdog clocksource. This will allow us to see
if the result was due to a false positive caused by
a problematic watchdog.
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1426133800-29329-9-git-send-email-john.stultz@linaro.org
[ Minor cleanups of kernel messages. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>

0b046b21

timekeeping: Add warnings when overflows or underflows are observed · 4ca22c26

John Stultz authored Mar 11, 2015

It was suggested that the underflow/overflow protection
should probably throw some sort of warning out, rather
than just silently fixing the issue.

So this patch adds some warnings here. The flag variables
used are not protected by locks, but since we can't print
from the reading functions, just being able to say we
saw an issue in the update interval is useful enough,
and can be slightly racy without real consequence.

The big complication is that we're only under a read
seqlock, so the data could shift under us during
our calculation to see if there was a problem. This
patch avoids this issue by nesting another seqlock
which allows us to snapshot the just required values
atomically. So we shouldn't see false positives.

I also added some basic rate-limiting here, since
on one build machine w/ skewed TSCs it was fairly
noisy at bootup.
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1426133800-29329-8-git-send-email-john.stultz@linaro.orgSigned-off-by: Ingo Molnar <mingo@kernel.org>

4ca22c26

timekeeping: Try to catch clocksource delta underflows · 057b87e3

John Stultz authored Mar 11, 2015

In the case where there is a broken clocksource
where there are multiple actual clocks that
aren't perfectly aligned, we may see small "negative"
deltas when we subtract 'now' from 'cycle_last'.

The values are actually negative with respect to the
clocksource mask value, not necessarily negative
if cast to a s64, but we can check by checking the
delta to see if it is a small (relative to the mask)
negative value (again negative relative to the mask).

If so, we assume we jumped backwards somehow and
instead use zero for our delta.
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1426133800-29329-7-git-send-email-john.stultz@linaro.orgSigned-off-by: Ingo Molnar <mingo@kernel.org>

057b87e3

timekeeping: Add checks to cap clocksource reads to the 'max_cycles' value · a558cd02

John Stultz authored Mar 11, 2015

When calculating the current delta since the last tick, we
currently have no hard protections to prevent a multiplication
overflow from occuring.

This patch introduces infrastructure to allow a cap that
limits the clocksource read delta value to the 'max_cycles' value,
which is where an overflow would occur.

Since this is in the hotpath, it adds the extra checking under
CONFIG_DEBUG_TIMEKEEPING=y.

There was some concern that capping time like this could cause
problems as we may stop expiring timers, which could go circular
if the timer that triggers time accumulation were mis-scheduled
too far in the future, which would cause time to stop.

However, since the mult overflow would result in a smaller time
value, we would effectively have the same problem there.
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1426133800-29329-6-git-send-email-john.stultz@linaro.orgSigned-off-by: Ingo Molnar <mingo@kernel.org>

a558cd02

timekeeping: Add debugging checks to warn if we see delays · 3c17ad19

John Stultz authored Mar 11, 2015

Recently there's been requests for better sanity
checking in the time code, so that it's more clear
when something is going wrong, since timekeeping issues
could manifest in a large number of strange ways in
various subsystems.

Thus, this patch adds some extra infrastructure to
add a check to update_wall_time() to print two new
warnings:

 1) if we see the call delayed beyond the 'max_cycles'
    overflow point,

 2) or if we see the call delayed beyond the clocksource's
    'max_idle_ns' value, which is currently 50% of the
    overflow point.

This extra infrastructure is conditional on
a new CONFIG_DEBUG_TIMEKEEPING option, also
added in this patch - default off.

Tested this a bit by halting qemu for specified
lengths of time to trigger the warnings.
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1426133800-29329-5-git-send-email-john.stultz@linaro.org
[ Improved the changelog and the messages a bit. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>

3c17ad19

12 Mar, 2015 3 commits

clocksource: Add 'max_cycles' to 'struct clocksource' · fb82fe2f

John Stultz authored Mar 11, 2015

In order to facilitate clocksource validation, add a
'max_cycles' field to the clocksource structure which
will hold the maximum cycle value that can safely be
multiplied without potentially causing an overflow.
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1426133800-29329-4-git-send-email-john.stultz@linaro.orgSigned-off-by: Ingo Molnar <mingo@kernel.org>

fb82fe2f

clocksource: Simplify the logic around clocksource wrapping safety margins · 362fde04

John Stultz authored Mar 11, 2015

The clocksource logic has a number of places where we try to
include a safety margin. Most of these are 12% safety margins,
but they are inconsistently applied and sometimes are applied
on top of each other.

Additionally, in the previous patch, we corrected an issue
where we unintentionally in effect created a 50% safety margin,
which these 12.5% margins where then added to.

So to simplify the logic here, this patch removes the various
12.5% margins, and consolidates adding the margin in one place:
clocks_calc_max_nsecs().

Additionally, Linus prefers a 50% safety margin, as it allows
bad clock values to be more easily caught. This should really
have no net effect, due to the corrected issue earlier which
caused greater then 50% margins to be used w/o issue.
Signed-off-by: John Stultz <john.stultz@linaro.org>
Acked-by: Stephen Boyd <sboyd@codeaurora.org> (for the sched_clock.c bit)
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1426133800-29329-3-git-send-email-john.stultz@linaro.orgSigned-off-by: Ingo Molnar <mingo@kernel.org>

362fde04

clocksource: Simplify the clocks_calc_max_nsecs() logic · 6086e346

John Stultz authored Mar 11, 2015

The previous clocks_calc_max_nsecs() code had some unecessarily
complex bit logic to find the max interval that could cause
multiplication overflows. Since this is not in the hot
path, just do the divide to make it easier to read.

The previous implementation also had a subtle issue
that it avoided overflows with signed 64-bit values, where
as the intervals are always unsigned. This resulted in
overly conservative intervals, which other safety margins
were then added to, reducing the intended interval length.
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1426133800-29329-2-git-send-email-john.stultz@linaro.orgSigned-off-by: Ingo Molnar <mingo@kernel.org>

6086e346

04 Mar, 2015 1 commit
- Merge tag 'v4.0-rc2' into timers/core, to refresh the tree before pulling more changes · 0bbdb425
  Ingo Molnar authored Mar 04, 2015
  
  0bbdb425
03 Mar, 2015 2 commits

Linux 4.0-rc2 · 13a7a6ac
Linus Torvalds authored Mar 03, 2015

13a7a6ac

drm/i915: Fix modeset state confusion in the load detect code · 9128b040

Daniel Vetter authored Mar 03, 2015

This is a tricky story of the new atomic state handling and the legacy
code fighting over each another. The bug at hand is an underrun of the
framebuffer reference with subsequent hilarity caused by the load
detect code. Which is peculiar since the the exact same code works
fine as the implementation of the legacy setcrtc ioctl.

Let's look at the ingredients:

- Currently our code is a crazy mix of legacy modeset interfaces to
  set the parameters and half-baked atomic state tracking underneath.
  While this transition is going we're using the transitional plane
  helpers to update the atomic side (drm_plane_helper_disable/update
  and friends), i.e. plane->state->fb. Since the state structure owns
  the fb those functions take care of that themselves.

  The legacy state (specifically crtc->primary->fb) is still managed
  by the old code (and mostly by the drm core), with the fb reference
  counting done by callers (core drm for the ioctl or the i915 load
  detect code). The relevant commit is

  commit ea2c67bb
  Author: Matt Roper <matthew.d.roper@intel.com>
  Date:   Tue Dec 23 10:41:52 2014 -0800

      drm/i915: Move to atomic plane helpers (v9)

- drm_plane_helper_disable has special code to handle multiple calls
  in a row - it checks plane->crtc == NULL and bails out. This is to
  match the proper atomic implementation which needs the crtc to get
  at the implied locking context atomic updates always need. See

  commit acf24a39
  Author: Daniel Vetter <daniel.vetter@ffwll.ch>
  Date:   Tue Jul 29 15:33:05 2014 +0200

      drm/plane-helper: transitional atomic plane helpers

- The universal plane code split out the implicit primary plane from
  the CRTC into it's own full-blown drm_plane object. As part of that
  the setcrtc ioctl (which updated both the crtc mode and primary
  plane) learned to set crtc->primary->crtc on modeset to make sure
  the plane->crtc assignments statate up to date in

  commit e13161af
  Author: Matt Roper <matthew.d.roper@intel.com>
  Date:   Tue Apr 1 15:22:38 2014 -0700

      drm: Add drm_crtc_init_with_planes() (v2)

  Unfortunately we've forgotten to update the load detect code. Which
  wasn't a problem since the load detect modeset is temporary and
  always undone before we drop the locks.

- Finally there is a organically grown history (i.e. don't ask) around
  who sets the legacy plane->fb for the various driver entry points.
  Originally updating that was the drivers duty, but for almost all
  places we've moved that (plus updating the refcounts) into the core.
  Again the exception is the load detect code.

Taking all together the following happens:
- The load detect code doesn't set crtc->primary->crtc. This is only
  really an issue on crtcs never before used or when userspace
  explicitly disabled the primary plane.

- The plane helper glue code short-circuits because of that and leaves
  a non-NULL fb behind in plane->state->fb and plane->fb. The state
  fb isn't a real problem (it's properly refcounted on its own), it's
  just the canary.

- Load detect code drops the reference for that fb, but doesn't set
  plane->fb = NULL. This is ok since it's still living in that old
  world where drivers had to clear the pointer but the core/callers
  handled the refcounting.

- On the next modeset the drm core notices plane->fb and takes care of
  refcounting it properly by doing another unref. This drops the
  refcount to zero, leaving state->plane now pointing at freed memory.

- intel_plane_duplicate_state still assume it owns a reference to that
  very state->fb and bad things start to happen.

Fix this all by applying the same duct-tape as for the legacy setcrtc
ioctl code and set crtc->primary->crtc properly.

Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Paul Bolle <pebolle@tiscali.nl>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Paulo Zanoni <przanoni@gmail.com>
Cc: Sean Paul <seanpaul@chromium.org>
Cc: Matt Roper <matthew.d.roper@intel.com>
Reported-and-tested-by: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9128b040

02 Mar, 2015 4 commits

Merge tag 'gpio-v4.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio · 023a6007

Linus Torvalds authored Mar 02, 2015

Pull GPIO fixes from Linus Walleij:
 "Two GPIO fixes:

   - Fix a translation problem in of_get_named_gpiod_flags()

   - Fix a long standing container_of() mistake in the TPS65912 driver"

* tag 'gpio-v4.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
  gpio: tps65912: fix wrong container_of arguments
  gpiolib: of: allow of_gpiochip_find_and_xlate to find more than one chip per node

023a6007

Merge branch 'fixes-for-4.0-rc2' of... · 10d6dfc1

Linus Torvalds authored Mar 02, 2015

Merge branch 'fixes-for-4.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal

Pull thermal management fixes from Eduardo Valentin:
 "Specifics:

   - Several fixes in tmon tool.

   - Fixes in intel int340x for _ART and _TRT tables.

   - Add id for Avoton SoC into powerclamp driver.

   - Fixes in RCAR thermal driver to remove race conditions and fix fail
     path

   - Fixes in TI thermal driver: removal of unnecessary code and build
     fix if !CONFIG_PM_SLEEP

   - Cleanups in exynos thermal driver

   - Add stubs for include/linux/thermal.h.  Now drivers using thermal
     calls but that also work without CONFIG_THERMAL will be able to
     compile for systems that don't care about thermal.

  Note: I am sending this pull on Rui's behalf while he fixes issues in
  his Linux box"

* 'fixes-for-4.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal:
  thermal: int340x_thermal: Ignore missing _ART, _TRT tables
  thermal/intel_powerclamp: add id for Avoton SoC
  tools/thermal: tmon: silence 'set but not used' warnings
  tools/thermal: tmon: use pkg-config to determine library dependencies
  tools/thermal: tmon: support cross-compiling
  tools/thermal: tmon: add .gitignore
  tools/thermal: tmon: fixup tui windowing calculations
  tools/thermal: tmon: tui: don't hard-code dialog window size assumptions
  tools/thermal: tmon: add min/max macros
  tools/thermal: tmon: add --target-temp parameter
  thermal: exynos: Clean-up code to use oneline entry for exynos compatible table
  thermal: rcar: Make error and remove paths symmetrical with init
  thermal: rcar: Fix race condition between init and interrupt
  thermal: Introduce dummy functions when thermal is not defined
  ti-soc-thermal: Delete an unnecessary check before the function call "cpufreq_cooling_unregister"
  thermal: ti-soc-thermal: bandgap: Fix build warning if !CONFIG_PM_SLEEP

10d6dfc1

Merge tag 'md/4.0-fixes' of git://neil.brown.name/md · 1a6f77ab

Linus Torvalds authored Mar 02, 2015

Pull md fixes from Neil Brown:
 "Three md fixes:

   - fix a read-balance problem that was reported 2 years ago, but that
     I never noticed the report :-(

   - fix for rare RAID6 problem causing incorrect bitmap updates when
     two devices fail.

   - add __ATTR_PREALLOC annotation now that it is possible"

* tag 'md/4.0-fixes' of git://neil.brown.name/md:
  md: mark some attributes as pre-alloc
  raid5: check faulty flag for array status during recovery.
  md/raid1: fix read balance when a drive is write-mostly.

1a6f77ab

Merge tag 'metag-fixes-v4.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag · 49db1f0e

Linus Torvalds authored Mar 02, 2015

Pull arch/metag fix from James Hogan:
 "This is just a single patch to fix the KSTK_EIP() and KSTK_ESP()
  macros for metag which have always been erronously returning the PC
  and stack pointer of the task's kernel context rather than from its
  user context saved at entry from userland into the kernel, which
  affects the contents of /proc/<pid>/maps and /proc/<pid>/stat"

* tag 'metag-fixes-v4.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag:
  metag: Fix KSTK_EIP() and KSTK_ESP() macros

49db1f0e

01 Mar, 2015 6 commits

Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · a38ecbbd

Linus Torvalds authored Mar 01, 2015

Pull x86 fixes from Ingo Molnar:
 "A CR4-shadow 32-bit init fix, plus two typo fixes"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Init per-cpu shadow copy of CR4 on 32-bit CPUs too
  x86/platform/intel-mid: Fix trivial printk message typo in intel_mid_arch_setup()
  x86/cpu/intel: Fix trivial typo in intel_tlb_table[]

a38ecbbd

Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 640c0f5c

Linus Torvalds authored Mar 01, 2015

Pull timer fixes from Ingo Molnar:
 "Three clockevents/clocksource driver fixes"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  clocksource: pxa: Fix section mismatch
  clocksource: mtk: Fix race conditions in probe code
  clockevents: asm9260: Fix compilation error with sparc/sparc64 allyesconfig

640c0f5c

Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · d7b48fec

Linus Torvalds authored Mar 01, 2015

Pull perf fixes from Ingo Molnar:
 "Two kprobes fixes and a handful of tooling fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf tools: Make sparc64 arch point to sparc
  perf symbols: Define EM_AARCH64 for older OSes
  perf top: Fix SIGBUS on sparc64
  perf tools: Fix probing for PERF_FLAG_FD_CLOEXEC flag
  perf tools: Fix pthread_attr_setaffinity_np build error
  perf tools: Define _GNU_SOURCE on pthread_attr_setaffinity_np feature check
  perf bench: Fix order of arguments to memcpy_alloc_mem
  kprobes/x86: Check for invalid ftrace location in __recover_probed_insn()
  kprobes/x86: Use 5-byte NOP when the code might be modified by ftrace

d7b48fec

Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 2ea51b88

Linus Torvalds authored Mar 01, 2015

Pull locking fix from Ingo Molnar:
 "An rtmutex deadlock path fixlet"

* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/rtmutex: Set state back to running on error

2ea51b88

Merge tag 'perf-urgent-for-mingo' of... · 021f5f12

Ingo Molnar authored Mar 01, 2015

Merge tag 'perf-urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent

Pull perf/urgent fixes from Arnaldo Carvalho de Melo:

  - pthread_attr_setaffinity_np() feature detection build fixes (Adrian Hunter, Josh Boyer)

  - Fix probing for PERF_FLAG_FD_CLOEXEC flag (Adrian Hunter)

  - Fix order of arguments to memcpy_alloc_mem in 'perf bench' (Bruce Merry)

  - Sparc64 and Aarch64 build and segfault fixes (David Ahern)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>

021f5f12

locking/rtmutex: Set state back to running on error · 9d3e2d02

Sebastian Andrzej Siewior authored Feb 27, 2015

The "usual" path is:

 - rt_mutex_slowlock()
 - set_current_state()
 - task_blocks_on_rt_mutex() (ret 0)
 - __rt_mutex_slowlock()
   - sleep or not but do return with __set_current_state(TASK_RUNNING)
 - back to caller.

In the early error case where task_blocks_on_rt_mutex() return
-EDEADLK we never change the task's state back to RUNNING. I
assume this is intended. Without this change after ww_mutex
using rt_mutex the selftest passes but later I get plenty of:

  | bad: scheduling from the idle thread!

backtraces.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: afffc6c1 ("locking/rtmutex: Optimize setting task running after being blocked")
Link: http://lkml.kernel.org/r/1425056229-22326-4-git-send-email-bigeasy@linutronix.deSigned-off-by: Ingo Molnar <mingo@kernel.org>

9d3e2d02

28 Feb, 2015 18 commits

Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux · ae1aa797

Linus Torvalds authored Feb 28, 2015

Pull drm fixes from Dave Airlie:
 "Just general fixes: radeon, i915, atmel, tegra, amdkfd and one core
  fix"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (28 commits)
  drm: atmel-hlcdc: remove clock polarity from crtc driver
  drm/radeon: only enable DP audio if the monitor supports it
  drm/radeon: fix atom aux payload size check for writes (v2)
  drm/radeon: fix 1 RB harvest config setup for TN/RL
  drm/radeon: enable SRBM timeout interrupt on EG/NI
  drm/radeon: enable SRBM timeout interrupt on SI
  drm/radeon: enable SRBM timeout interrupt on CIK v2
  drm/radeon: dump full IB if we hit a packet error
  drm/radeon: disable mclk switching with 120hz+ monitors
  drm/radeon: use drm_mode_vrefresh() rather than mode->vrefresh
  drm/radeon: enable native backlight control on old macs
  drm/i915: Fix frontbuffer false positve.
  drm/i915: Align initial plane backing objects correctly
  drm/i915: avoid processing spurious/shared interrupts in low-power states
  drm/i915: Check obj->vma_list under the struct_mutex
  drm/i915: Fix a use after free, and unbalanced refcounting
  drm: atmel-hlcdc: remove useless pm_runtime_put_sync in probe
  drm: atmel-hlcdc: reset layer A2Q and UPDATE bits when disabling it
  drm: Fix deadlock due to getconnector locking changes
  drm/i915: Dell Chromebook 11 has PWM backlight
  ...

ae1aa797

Merge branch 'for-linus' of git://git.kernel.dk/linux-block · a015d33c

Linus Torvalds authored Feb 28, 2015

Pull block layer fixes from Jens Axboe:
 "Two smaller fixes for this cycle:

   - A fixup from Keith so that NVMe compiles without BLK_INTEGRITY,
     basically just moving the code around appropriately.

   - A fixup for shm, fixing an oops in shmem_mapping() for mapping with
     no inode.  From Sasha"

[ The shmem fix doesn't look block-layer-related, but fixes a bug that
  happened due to the backing_dev_info removal..  - Linus ]

* 'for-linus' of git://git.kernel.dk/linux-block:
  mm: shmem: check for mapping owner before dereferencing
  NVMe: Fix for BLK_DEV_INTEGRITY not set

a015d33c

Merge tag 'xfs-for-linus-4.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs · 2aaeb784

Linus Torvalds authored Feb 28, 2015

Pull xfs fixes from Dave Chinner:
 "These are fixes for regressions/bugs introduced in the 4.0 merge cycle
  and problems discovered during the merge window that need to be pushed
  back to stable kernels ASAP.

  This contains:
   - ensure quota type is reset in on-disk dquots
   - fix missing partial EOF block data flush on truncate extension
   - fix transaction leak in error handling for new pnfs block layout
     support
   - add missing target_ip check to RENAME_EXCHANGE"

* tag 'xfs-for-linus-4.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs:
  xfs: cancel failed transaction in xfs_fs_commit_blocks()
  xfs: Ensure we have target_ip for RENAME_EXCHANGE
  xfs: ensure truncate forces zeroed blocks to disk
  xfs: Fix quota type in quota structures when reusing quota file

2aaeb784

Merge branch 'akpm' (patches from Andrew) · e9738946

Linus Torvalds authored Feb 28, 2015

Merge misc fixes from Andrew Morton:
 "13 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm: add missing __PAGETABLE_{PUD,PMD}_FOLDED defines
  mm: page_alloc: revert inadvertent !__GFP_FS retry behavior change
  kernel/sys.c: fix UNAME26 for 4.0
  mm: memcontrol: use "max" instead of "infinity" in control knobs
  zram: use proper type to update max_used_pages
  drivers/rtc/rtc-ds1685.c: fix conditional in ds1685_rtc_sysfs_time_regs_{show,store}
  nilfs2: fix potential memory overrun on inode
  scripts/gdb: add empty package initialization script
  rtc: ds1685: remove superfluous checks for out-of-range u8 values
  rtc: ds1685: fix ds1685_rtc_alarm_irq_enable build error
  memcg: fix low limit calculation
  mm/nommu: fix memory leak
  ocfs2: update web page + git tree in documentation

e9738946

mm: add missing __PAGETABLE_{PUD,PMD}_FOLDED defines · c07af4f1

Kirill A. Shutemov authored Feb 27, 2015

Core mm expects __PAGETABLE_{PUD,PMD}_FOLDED to be defined if these page
table levels folded.  Usually, these defines are provided by
<asm-generic/pgtable-nopmd.h> and <asm-generic/pgtable-nopud.h>.

But some architectures fold page table levels in a custom way.  They
need to define these macros themself.  This patch adds missing defines.

The patch fixes mm->nr_pmds underflow and eliminates dead __pmd_alloc()
and __pud_alloc() on architectures without these page table levels.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Aaro Koskinen <aaro.koskinen@iki.fi>
Cc: David Howells <dhowells@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

c07af4f1

mm: page_alloc: revert inadvertent !__GFP_FS retry behavior change · cc873177

Johannes Weiner authored Feb 27, 2015

Historically, !__GFP_FS allocations were not allowed to invoke the OOM
killer once reclaim had failed, but nevertheless kept looping in the
allocator.

Commit 9879de73 ("mm: page_alloc: embed OOM killing naturally into
allocation slowpath"), which should have been a simple cleanup patch,
accidentally changed the behavior to aborting the allocation at that
point.  This creates problems with filesystem callers (?) that currently
rely on the allocator waiting for other tasks to intervene.

Revert the behavior as it shouldn't have been changed as part of a
cleanup patch.

Fixes: 9879de73 ("mm: page_alloc: embed OOM killing naturally into allocation slowpath")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Dave Chinner <david@fromorbit.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: <stable@vger.kernel.org>	[3.19.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

cc873177

kernel/sys.c: fix UNAME26 for 4.0 · 39afb5ee

Jon DeVree authored Feb 27, 2015

There's a uname workaround for broken userspace which can't handle kernel
versions of 3.x.  Update it for 4.x.
Signed-off-by: Jon DeVree <nuxi@vault24.org>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

39afb5ee

mm: memcontrol: use "max" instead of "infinity" in control knobs · d2973697

Johannes Weiner authored Feb 27, 2015

The memcg control knobs indicate the highest possible value using the
symbolic name "infinity", which is long and awkward to type.

Switch to the string "max", which is just as descriptive but shorter and
sweeter.

This changes a user interface, so do it before the release and before
the development flag is dropped from the default hierarchy.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

d2973697

zram: use proper type to update max_used_pages · 2ea55a2c

Joonsoo Kim authored Feb 27, 2015

max_used_pages is defined as atomic_long_t so we need to use unsigned
long to keep temporary value for it rather than int which is smaller
than unsigned long in a 64 bit system.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2ea55a2c

drivers/rtc/rtc-ds1685.c: fix conditional in ds1685_rtc_sysfs_time_regs_{show,store} · b00eeaed

Joshua Kinard authored Feb 27, 2015

Fix a conditional statement checking for NULL in both
ds1685_rtc_sysfs_time_regs_show and ds1685_rtc_sysfs_time_regs_store
that was using a logical AND when it should be using a logical OR so
that we fail out of the function properly if the condition ever
evaluates to true.

Fixes: aaaf5fbf ("rtc: add driver for DS1685 family of real time clocks")
Signed-off-by: Joshua Kinard <kumba@gentoo.org>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

b00eeaed

nilfs2: fix potential memory overrun on inode · 957ed60b

Ryusuke Konishi authored Feb 27, 2015

Each inode of nilfs2 stores a root node of a b-tree, and it turned out to
have a memory overrun issue:

Each b-tree node of nilfs2 stores a set of key-value pairs and the number
of them (in "bn_nchildren" member of nilfs_btree_node struct), as well as
a few other "bn_*" members.

Since the value of "bn_nchildren" is used for operations on the key-values
within the b-tree node, it can cause memory access overrun if a large
number is incorrectly set to "bn_nchildren".

For instance, nilfs_btree_node_lookup() function determines the range of
binary search with it, and too large "bn_nchildren" leads
nilfs_btree_node_get_key() in that function to overrun.

As for intermediate b-tree nodes, this is prevented by a sanity check
performed when each node is read from a drive, however, no sanity check
has been done for root nodes stored in inodes.

This patch fixes the issue by adding missing sanity check against b-tree
root nodes so that it's called when on-memory inodes are read from ifile,
inode metadata file.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

957ed60b

scripts/gdb: add empty package initialization script · 586a1a12

Jan Kiszka authored Feb 27, 2015

This got lost during the initial merge process: Python requires an
__init__.py script, even if empty, in order to accept a directory as
package.  Add it, this time as a non-empty file.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

586a1a12

rtc: ds1685: remove superfluous checks for out-of-range u8 values · 39ea34cc

Geert Uytterhoeven authored Feb 27, 2015

drivers/rtc/rtc-ds1685.c: In function `ds1685_rtc_read_alarm':
drivers/rtc/rtc-ds1685.c:402: warning: comparison is always true due to limited range of data type
drivers/rtc/rtc-ds1685.c:409: warning: comparison is always true due to limited range of data type
drivers/rtc/rtc-ds1685.c:416: warning: comparison is always true due to limited range of data type
drivers/rtc/rtc-ds1685.c: In function `ds1685_rtc_set_alarm':
drivers/rtc/rtc-ds1685.c:475: warning: comparison is always true due to limited range of data type
drivers/rtc/rtc-ds1685.c:478: warning: comparison is always true due to limited range of data type
drivers/rtc/rtc-ds1685.c:481: warning: comparison is always true due to limited range of data type

u8 cannot contain a value larger than 0xff, hence drop the checks.
Wrapping the checks in unlikely() indicated some sense of humor, though ;-)
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Joshua Kinard <kumba@gentoo.org>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

39ea34cc

rtc: ds1685: fix ds1685_rtc_alarm_irq_enable build error · 682354d4

Arnd Bergmann authored Feb 27, 2015

The newly added ds1685 driver causes a build error when enabled without
CONFIG_RTC_INTF_DEV:

  drivers/rtc/rtc-ds1685.c:919:22: error: 'ds1685_rtc_alarm_irq_enable' undeclared here (not in a function)
    .alarm_irq_enable = ds1685_rtc_alarm_irq_enable,

Apparently the driver was incorrectly changed to reflect the interface
change from 16380c15 ("RTC: Convert rtc drivers to use the
alarm_irq_enable method"), which removed the respective #ifdef from all
other rtc drivers.

This does the same change that was merged for the other drivers before and
removes the #ifdef, allowing the interrupts to be enabled through the
in-kernel rtc interface independent of the existence of /dev/rtc.

Fixes: aaaf5fbf ("rtc: add driver for DS1685 family of real time clocks")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Joshua Kinard <kumba@gentoo.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

682354d4

memcg: fix low limit calculation · 4e54dede

Michal Hocko authored Feb 27, 2015

A memcg is considered low limited even when the current usage is equal to
the low limit. This leads to interesting side effects e.g.
groups/hierarchies with no memory accounted are considered protected and
so the reclaim will emit MEMCG_LOW event when encountering them.

Another and much bigger issue was reported by Joonsoo Kim. He has hit a
NULL ptr dereference with the legacy cgroup API which even doesn't have
low limit exposed. The limit is 0 by default but the initial check fails
for memcg with 0 consumption and parent_mem_cgroup() would return NULL if
use_hierarchy is 0 and so page_counter_read would try to dereference NULL.

I suppose that the current implementation is just an overlook because the
documentation in Documentation/cgroups/unified-hierarchy.txt says:

"The memory.low boundary on the other hand is a top-down allocated
reserve. A cgroup enjoys reclaim protection when it and all its
ancestors are below their low boundaries"

Fix the usage and the low limit comparision in mem_cgroup_low accordingly.

Fixes: 241994ed (mm: memcontrol: default hierarchy interface for memory)
Reported-by: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

4e54dede

mm/nommu: fix memory leak · da616534

Joonsoo Kim authored Feb 27, 2015

Maxime reported the following memory leak regression due to commit
dbc8358c ("mm/nommu: use alloc_pages_exact() rather than its own
implementation").

On v3.19, I am facing a memory leak.  Each time I run a command one page
is lost.  Here an example with busybox's free command:

  / # free
               total       used       free     shared    buffers     cached
  Mem:          7928       1972       5956          0          0        492
  -/+ buffers/cache:       1480       6448
  / # free
               total       used       free     shared    buffers     cached
  Mem:          7928       1976       5952          0          0        492
  -/+ buffers/cache:       1484       6444
  / # free
               total       used       free     shared    buffers     cached
  Mem:          7928       1980       5948          0          0        492
  -/+ buffers/cache:       1488       6440
  / # free
               total       used       free     shared    buffers     cached
  Mem:          7928       1984       5944          0          0        492
  -/+ buffers/cache:       1492       6436
  / # free
               total       used       free     shared    buffers     cached
  Mem:          7928       1988       5940          0          0        492
  -/+ buffers/cache:       1496       6432

At some point, the system fails to sastisfy 256KB allocations:

  free: page allocation failure: order:6, mode:0xd0
  CPU: 0 PID: 67 Comm: free Not tainted 3.19.0-05389-gacf2cf1-dirty #64
  Hardware name: STM32 (Device Tree Support)
    show_stack+0xb/0xc
    warn_alloc_failed+0x97/0xbc
    __alloc_pages_nodemask+0x295/0x35c
    __get_free_pages+0xb/0x24
    alloc_pages_exact+0x19/0x24
    do_mmap_pgoff+0x423/0x658
    vm_mmap_pgoff+0x3f/0x4e
    load_flat_file+0x20d/0x4f8
    load_flat_binary+0x3f/0x26c
    search_binary_handler+0x51/0xe4
    do_execveat_common+0x271/0x35c
    do_execve+0x19/0x1c
    ret_fast_syscall+0x1/0x4a
  Mem-info:
  Normal per-cpu:
  CPU    0: hi:    0, btch:   1 usd:   0
  active_anon:0 inactive_anon:0 isolated_anon:0
   active_file:0 inactive_file:0 isolated_file:0
   unevictable:123 dirty:0 writeback:0 unstable:0
   free:1515 slab_reclaimable:17 slab_unreclaimable:139
   mapped:0 shmem:0 pagetables:0 bounce:0
   free_cma:0
  Normal free:6060kB min:352kB low:440kB high:528kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:492kB isolated(anon):0ks
  lowmem_reserve[]: 0 0
  Normal: 23*4kB (U) 22*8kB (U) 24*16kB (U) 23*32kB (U) 23*64kB (U) 23*128kB (U) 1*256kB (U) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 6060kB
  123 total pagecache pages
  2048 pages of RAM
  1538 free pages
  66 reserved pages
  109 slab pages
  -46 pages shared
  0 pages swap cached
  nommu: Allocation of length 221184 from process 67 (free) failed
  Normal per-cpu:
  CPU    0: hi:    0, btch:   1 usd:   0
  active_anon:0 inactive_anon:0 isolated_anon:0
   active_file:0 inactive_file:0 isolated_file:0
   unevictable:123 dirty:0 writeback:0 unstable:0
   free:1515 slab_reclaimable:17 slab_unreclaimable:139
   mapped:0 shmem:0 pagetables:0 bounce:0
   free_cma:0
  Normal free:6060kB min:352kB low:440kB high:528kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:492kB isolated(anon):0ks
  lowmem_reserve[]: 0 0
  Normal: 23*4kB (U) 22*8kB (U) 24*16kB (U) 23*32kB (U) 23*64kB (U) 23*128kB (U) 1*256kB (U) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 6060kB
  123 total pagecache pages
  Unable to allocate RAM for process text/data, errno 12 SEGV

This problem happens because we allocate ordered page through
__get_free_pages() in do_mmap_private() in some cases and we try to free
individual pages rather than ordered page in free_page_series().  In
this case, freeing pages whose refcount is not 0 won't be freed to the
page allocator so memory leak happens.

To fix the problem, this patch changes __get_free_pages() to
alloc_pages_exact() since alloc_pages_exact() returns
physically-contiguous pages but each pages are refcounted.

Fixes: dbc8358c ("mm/nommu: use alloc_pages_exact() rather than its own implementation").
Reported-by: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Tested-by: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: <stable@vger.kernel.org>	[3.19]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

da616534

ocfs2: update web page + git tree in documentation · 01945fa2

Mark Fasheh authored Feb 27, 2015

We (the Ocfs2 project) recently moved the location of our ocfs2-tools
git tree and project web page.  The pertinent discussion can be seen
here:

  https://oss.oracle.com/pipermail/ocfs2-devel/2015-February/010579.html

The following patch updates the Ocfs2 documentation in MAINTAINERS,
ocfs2.txt, and dlmfs.txt.  I added our new official web page, changed
the location of our tools git tree and removed the link to Joel's
ancient kernel git tree - Andrew has handled our patches for a while
now.
Signed-off-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

01945fa2

x86: Init per-cpu shadow copy of CR4 on 32-bit CPUs too · 5b2bdbc8

Steven Rostedt authored Feb 27, 2015

Commit:

   1e02ce4c ("x86: Store a per-cpu shadow copy of CR4")

added a shadow CR4 such that reads and writes that do not
modify the CR4 execute much faster than always reading the
register itself.

The change modified cpu_init() in common.c, so that the
shadow CR4 gets initialized before anything uses it.

Unfortunately, there's two cpu_init()s in common.c. There's
one for 64-bit and one for 32-bit. The commit only added
the shadow init to the 64-bit path, but the 32-bit path
needs the init too.

Link: http://lkml.kernel.org/r/20150227125208.71c36402@gandalf.local.home Fixes: 1e02ce4c "x86: Store a per-cpu shadow copy of CR4"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20150227145019.2bdd4354@gandalf.local.homeSigned-off-by: Ingo Molnar <mingo@kernel.org>

5b2bdbc8