1. 20 Jun, 2014 14 commits
    • Johannes Berg's avatar
      Documentation: fix DOCBOOKS=... building · 4552eb24
      Johannes Berg authored
      commit e60cbeed upstream.
      
      Prior to commit 42661299 ("[media] DocBook: Move all media docbook
      stuff into its own directory") it was possible to build only a single
      (or more) book(s) by calling, for example
      
          make htmldocs DOCBOOKS=80211.xml
      
      This now fails:
      
          cp: target `.../Documentation/DocBook//media_api' is not a directory
      
      Ignore errors from that copy to make this possible again.
      
      Fixes: 42661299 ("[media] DocBook: Move all media docbook stuff into its own directory")
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Acked-by: default avatarRandy Dunlap <rdunlap@xenotime.net>
      Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      4552eb24
    • Naoya Horiguchi's avatar
      mm/memory-failure.c: fix memory leak by race between poison and unpoison · 191df2f2
      Naoya Horiguchi authored
      commit 3e030ecc upstream.
      
      When a memory error happens on an in-use page or (free and in-use)
      hugepage, the victim page is isolated with its refcount set to one.
      
      When you try to unpoison it later, unpoison_memory() calls put_page()
      for it twice in order to bring the page back to free page pool (buddy or
      free hugepage list).  However, if another memory error occurs on the
      page which we are unpoisoning, memory_failure() returns without
      releasing the refcount which was incremented in the same call at first,
      which results in memory leak and unconsistent num_poisoned_pages
      statistics.  This patch fixes it.
      Signed-off-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      191df2f2
    • Peter Zijlstra's avatar
      perf: Fix race in removing an event · c8db9ba0
      Peter Zijlstra authored
      commit 46ce0fe9 upstream.
      
      When removing a (sibling) event we do:
      
      	raw_spin_lock_irq(&ctx->lock);
      	perf_group_detach(event);
      	raw_spin_unlock_irq(&ctx->lock);
      
      	<hole>
      
      	perf_remove_from_context(event);
      		raw_spin_lock_irq(&ctx->lock);
      		...
      		raw_spin_unlock_irq(&ctx->lock);
      
      Now, assuming the event is a sibling, it will be 'unreachable' for
      things like ctx_sched_out() because that iterates the
      groups->siblings, and we just unhooked the sibling.
      
      So, if during <hole> we get ctx_sched_out(), it will miss the event
      and not call event_sched_out() on it, leaving it programmed on the
      PMU.
      
      The subsequent perf_remove_from_context() call will find the ctx is
      inactive and only call list_del_event() to remove the event from all
      other lists.
      
      Hereafter we can proceed to free the event; while still programmed!
      
      Close this hole by moving perf_group_detach() inside the same
      ctx->lock region(s) perf_remove_from_context() has.
      
      The condition on inherited events only in __perf_event_exit_task() is
      likely complete crap because non-inherited events are part of groups
      too and we're tearing down just the same. But leave that for another
      patch.
      
      Most-likely-Fixes: e03a9a55 ("perf: Change close() semantics for group events")
      Reported-by: default avatarVince Weaver <vincent.weaver@maine.edu>
      Tested-by: default avatarVince Weaver <vincent.weaver@maine.edu>
      Much-staring-at-traces-by: default avatarVince Weaver <vincent.weaver@maine.edu>
      Much-staring-at-traces-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20140505093124.GN17778@laptop.programming.kicks-ass.netSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      c8db9ba0
    • Peter Zijlstra's avatar
      perf: Limit perf_event_attr::sample_period to 63 bits · ed8acba3
      Peter Zijlstra authored
      commit 0819b2e3 upstream.
      
      Vince reported that using a large sample_period (one with bit 63 set)
      results in wreckage since while the sample_period is fundamentally
      unsigned (negative periods don't make sense) the way we implement
      things very much rely on signed logic.
      
      So limit sample_period to 63 bits to avoid tripping over this.
      Reported-by: default avatarVince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/n/tip-p25fhunibl4y3qi0zuqmyf4b@git.kernel.orgSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      ed8acba3
    • Jiri Olsa's avatar
      perf: Prevent false warning in perf_swevent_add · e36e8c8f
      Jiri Olsa authored
      commit 39af6b16 upstream.
      
      The perf cpu offline callback takes down all cpu context
      events and releases swhash->swevent_hlist.
      
      This could race with task context software event being just
      scheduled on this cpu via perf_swevent_add while cpu hotplug
      code already cleaned up event's data.
      
      The race happens in the gap between the cpu notifier code
      and the cpu being actually taken down. Note that only cpu
      ctx events are terminated in the perf cpu hotplug code.
      
      It's easily reproduced with:
        $ perf record -e faults perf bench sched pipe
      
      while putting one of the cpus offline:
        # echo 0 > /sys/devices/system/cpu/cpu1/online
      
      Console emits following warning:
        WARNING: CPU: 1 PID: 2845 at kernel/events/core.c:5672 perf_swevent_add+0x18d/0x1a0()
        Modules linked in:
        CPU: 1 PID: 2845 Comm: sched-pipe Tainted: G        W    3.14.0+ #256
        Hardware name: Intel Corporation Montevina platform/To be filled by O.E.M., BIOS AMVACRB1.86C.0066.B00.0805070703 05/07/2008
         0000000000000009 ffff880077233ab8 ffffffff81665a23 0000000000200005
         0000000000000000 ffff880077233af8 ffffffff8104732c 0000000000000046
         ffff88007467c800 0000000000000002 ffff88007a9cf2a0 0000000000000001
        Call Trace:
         [<ffffffff81665a23>] dump_stack+0x4f/0x7c
         [<ffffffff8104732c>] warn_slowpath_common+0x8c/0xc0
         [<ffffffff8104737a>] warn_slowpath_null+0x1a/0x20
         [<ffffffff8110fb3d>] perf_swevent_add+0x18d/0x1a0
         [<ffffffff811162ae>] event_sched_in.isra.75+0x9e/0x1f0
         [<ffffffff8111646a>] group_sched_in+0x6a/0x1f0
         [<ffffffff81083dd5>] ? sched_clock_local+0x25/0xa0
         [<ffffffff811167e6>] ctx_sched_in+0x1f6/0x450
         [<ffffffff8111757b>] perf_event_sched_in+0x6b/0xa0
         [<ffffffff81117a4b>] perf_event_context_sched_in+0x7b/0xc0
         [<ffffffff81117ece>] __perf_event_task_sched_in+0x43e/0x460
         [<ffffffff81096f1e>] ? put_lock_stats.isra.18+0xe/0x30
         [<ffffffff8107b3c8>] finish_task_switch+0xb8/0x100
         [<ffffffff8166a7de>] __schedule+0x30e/0xad0
         [<ffffffff81172dd2>] ? pipe_read+0x3e2/0x560
         [<ffffffff8166b45e>] ? preempt_schedule_irq+0x3e/0x70
         [<ffffffff8166b45e>] ? preempt_schedule_irq+0x3e/0x70
         [<ffffffff8166b464>] preempt_schedule_irq+0x44/0x70
         [<ffffffff816707f0>] retint_kernel+0x20/0x30
         [<ffffffff8109e60a>] ? lockdep_sys_exit+0x1a/0x90
         [<ffffffff812a4234>] lockdep_sys_exit_thunk+0x35/0x67
         [<ffffffff81679321>] ? sysret_check+0x5/0x56
      
      Fixing this by tracking the cpu hotplug state and displaying
      the WARN only if current cpu is initialized properly.
      
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Reported-by: default avatarFengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: default avatarJiri Olsa <jolsa@redhat.com>
      Signed-off-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1396861448-10097-1-git-send-email-jolsa@redhat.comSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      e36e8c8f
    • Thomas Gleixner's avatar
      sched: Sanitize irq accounting madness · 3cca5deb
      Thomas Gleixner authored
      commit 2d513868 upstream.
      
      Russell reported, that irqtime_account_idle_ticks() takes ages due to:
      
             for (i = 0; i < ticks; i++)
                     irqtime_account_process_tick(current, 0, rq);
      
      It's sad, that this code was written way _AFTER_ the NOHZ idle
      functionality was available. I charge myself guitly for not paying
      attention when that crap got merged with commit abb74cef ("sched:
      Export ns irqtimes through /proc/stat")
      
      So instead of looping nr_ticks times just apply the whole thing at
      once.
      
      As a side note: The whole cputime_t vs. u64 business in that context
      wants to be cleaned up as well. There is no point in having all these
      back and forth conversions. Lets standardise on u64 nsec for all
      kernel internal accounting and be done with it. Everything else does
      not make sense at all for fine grained accounting. Frederic, can you
      please take care of that?
      Reported-by: default avatarRussell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Cc: Shaun Ruffell <sruffell@digium.com>
      Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1405022307000.6261@ionos.tec.linutronix.deSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      3cca5deb
    • Steven Rostedt (Red Hat)'s avatar
      sched: Use CPUPRI_NR_PRIORITIES instead of MAX_RT_PRIO in cpupri check · 21af1041
      Steven Rostedt (Red Hat) authored
      commit 6227cb00 upstream.
      
      The check at the beginning of cpupri_find() makes sure that the task_pri
      variable does not exceed the cp->pri_to_cpu array length. But that length
      is CPUPRI_NR_PRIORITIES not MAX_RT_PRIO, where it will miss the last two
      priorities in that array.
      
      As task_pri is computed from convert_prio() which should never be bigger
      than CPUPRI_NR_PRIORITIES, if the check should cause a panic if it is
      hit.
      Reported-by: default avatarMike Galbraith <umgwanakikbuti@gmail.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1397015410.5212.13.camel@marge.simpson.netSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      21af1041
    • David Woodhouse's avatar
      iommu/vt-d: Fix missing IOTLB flush in intel_iommu_unmap() · 4801cb6b
      David Woodhouse authored
      This is a small excerpt of the upstream commit
      ea8ea460 (iommu/vt-d: Clean up and fix
      page table clear/free behaviour).
      
      This missing IOTLB flush was added as a minor, inconsequential bug-fix
      in commit ea8ea460 ("iommu/vt-d: Clean up and fix page table clear/free
      behaviour") in 3.15. It wasn't originally intended for -stable but a
      couple of users have reported issues which turn out to be fixed by
      adding the missing flush.
      Signed-off-by: default avatarDavid Woodhouse <David.Woodhouse@intel.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      4801cb6b
    • Will Deacon's avatar
      ARM: perf: hook up perf_sample_event_took around pmu irq handling · 22259d1f
      Will Deacon authored
      commit 5f5092e7 upstream.
      
      Since we indirect all of our PMU IRQ handling through a dispatcher, it's
      trivial to hook up perf_sample_event_took to prevent applications such
      as oprofile from generating interrupt storms due to an unrealisticly
      low sample period.
      Reported-by: default avatarRobert Richter <rric@kernel.org>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      22259d1f
    • Vlastimil Babka's avatar
      mm/compaction: make isolate_freepages start at pageblock boundary · 4ce13868
      Vlastimil Babka authored
      commit 49e068f0 upstream.
      
      The compaction freepage scanner implementation in isolate_freepages()
      starts by taking the current cc->free_pfn value as the first pfn.  In a
      for loop, it scans from this first pfn to the end of the pageblock, and
      then subtracts pageblock_nr_pages from the first pfn to obtain the first
      pfn for the next for loop iteration.
      
      This means that when cc->free_pfn starts at offset X rather than being
      aligned on pageblock boundary, the scanner will start at offset X in all
      scanned pageblock, ignoring potentially many free pages.  Currently this
      can happen when
      
       a) zone's end pfn is not pageblock aligned, or
      
       b) through zone->compact_cached_free_pfn with CONFIG_HOLES_IN_ZONE
          enabled and a hole spanning the beginning of a pageblock
      
      This patch fixes the problem by aligning the initial pfn in
      isolate_freepages() to pageblock boundary.  This also permits replacing
      the end-of-pageblock alignment within the for loop with a simple
      pageblock_nr_pages increment.
      Signed-off-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Reported-by: default avatarHeesub Shin <heesub.shin@samsung.com>
      Acked-by: default avatarMinchan Kim <minchan@kernel.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Acked-by: default avatarJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Christoph Lameter <cl@linux.com>
      Acked-by: default avatarRik van Riel <riel@redhat.com>
      Cc: Dongjun Shin <d.j.shin@samsung.com>
      Cc: Sunghwan Yun <sunghwan.yun@samsung.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      4ce13868
    • Vlastimil Babka's avatar
      mm: compaction: detect when scanners meet in isolate_freepages · 31662557
      Vlastimil Babka authored
      commit 7ed695e0 upstream.
      
      Compaction of a zone is finished when the migrate scanner (which begins
      at the zone's lowest pfn) meets the free page scanner (which begins at
      the zone's highest pfn).  This is detected in compact_zone() and in the
      case of direct compaction, the compact_blockskip_flush flag is set so
      that kswapd later resets the cached scanner pfn's, and a new compaction
      may again start at the zone's borders.
      
      The meeting of the scanners can happen during either scanner's activity.
      However, it may currently fail to be detected when it occurs in the free
      page scanner, due to two problems.  First, isolate_freepages() keeps
      free_pfn at the highest block where it isolated pages from, for the
      purposes of not missing the pages that are returned back to allocator
      when migration fails.  Second, failing to isolate enough free pages due
      to scanners meeting results in -ENOMEM being returned by
      migrate_pages(), which makes compact_zone() bail out immediately without
      calling compact_finished() that would detect scanners meeting.
      
      This failure to detect scanners meeting might result in repeated
      attempts at compaction of a zone that keep starting from the cached
      pfn's close to the meeting point, and quickly failing through the
      -ENOMEM path, without the cached pfns being reset, over and over.  This
      has been observed (through additional tracepoints) in the third phase of
      the mmtests stress-highalloc benchmark, where the allocator runs on an
      otherwise idle system.  The problem was observed in the DMA32 zone,
      which was used as a fallback to the preferred Normal zone, but on the
      4GB system it was actually the largest zone.  The problem is even
      amplified for such fallback zone - the deferred compaction logic, which
      could (after being fixed by a previous patch) reset the cached scanner
      pfn's, is only applied to the preferred zone and not for the fallbacks.
      
      The problem in the third phase of the benchmark was further amplified by
      commit 81c0a2bb ("mm: page_alloc: fair zone allocator policy") which
      resulted in a non-deterministic regression of the allocation success
      rate from ~85% to ~65%.  This occurs in about half of benchmark runs,
      making bisection problematic.  It is unlikely that the commit itself is
      buggy, but it should put more pressure on the DMA32 zone during phases 1
      and 2, which may leave it more fragmented in phase 3 and expose the bugs
      that this patch fixes.
      
      The fix is to make scanners meeting in isolate_freepage() stay that way,
      and to check in compact_zone() for scanners meeting when migrate_pages()
      returns -ENOMEM.  The result is that compact_finished() also detects
      scanners meeting and sets the compact_blockskip_flush flag to make
      kswapd reset the scanner pfn's.
      
      The results in stress-highalloc benchmark show that the "regression" by
      commit 81c0a2bb in phase 3 no longer occurs, and phase 1 and 2
      allocation success rates are also significantly improved.
      Signed-off-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      31662557
    • Vlastimil Babka's avatar
      mm: compaction: reset cached scanner pfn's before reading them · 948ec1db
      Vlastimil Babka authored
      commit d3132e4b upstream.
      
      Compaction caches pfn's for its migrate and free scanners to avoid
      scanning the whole zone each time.  In compact_zone(), the cached values
      are read to set up initial values for the scanners.  There are several
      situations when these cached pfn's are reset to the first and last pfn
      of the zone, respectively.  One of these situations is when a compaction
      has been deferred for a zone and is now being restarted during a direct
      compaction, which is also done in compact_zone().
      
      However, compact_zone() currently reads the cached pfn's *before*
      resetting them.  This means the reset doesn't affect the compaction that
      performs it, and with good chance also subsequent compactions, as
      update_pageblock_skip() is likely to be called and update the cached
      pfn's to those being processed.  Another chance for a successful reset
      is when a direct compaction detects that migration and free scanners
      meet (which has its own problems addressed by another patch) and sets
      update_pageblock_skip flag which kswapd uses to do the reset because it
      goes to sleep.
      
      This is clearly a bug that results in non-deterministic behavior, so
      this patch moves the cached pfn reset to be performed *before* the
      values are read.
      Signed-off-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Acked-by: default avatarMel Gorman <mgorman@suse.de>
      Acked-by: default avatarRik van Riel <riel@redhat.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      948ec1db
    • Nicholas Bellinger's avatar
      target: Fix NULL pointer dereference for XCOPY in target_put_sess_cmd · 9c7e0735
      Nicholas Bellinger authored
      commit 0ed6e189 upstream.
      
      This patch fixes a NULL pointer dereference regression bug that was
      introduced with:
      
      commit 1e1110c4
      Author: Mikulas Patocka <mpatocka@redhat.com>
      Date:   Sat May 17 06:49:22 2014 -0400
      
          target: fix memory leak on XCOPY
      
      Now that target_put_sess_cmd() -> kref_put_spinlock_irqsave() is
      called with a valid se_cmd->cmd_kref, a NULL pointer dereference
      is triggered because the XCOPY passthrough commands don't have
      an associated se_session pointer.
      
      To address this bug, go ahead and checking for a NULL se_sess pointer
      within target_put_sess_cmd(), and call se_cmd->se_tfo->release_cmd()
      to release the XCOPY's xcopy_pt_cmd memory.
      Reported-by: default avatarThomas Glanzmann <thomas@glanzmann.de>
      Cc: Thomas Glanzmann <thomas@glanzmann.de>
      Cc: Mikulas Patocka <mpatocka@redhat.com>
      Cc: stable@vger.kernel.org # 3.12+
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      9c7e0735
    • Justin Maggard's avatar
      btrfs: fix defrag 32-bit integer overflow · 6e8451cb
      Justin Maggard authored
      commit c41570c9 upstream.
      
      When defragging a very large file, the cluster variable can wrap its 32-bit
      signed int type and become negative, which eventually gets passed to
      btrfs_force_ra() as a very large unsigned long value.  On 32-bit platforms,
      this eventually results in an Oops from the SLAB allocator.
      
      Change the cluster and max_cluster signed int variables to unsigned long to
      match the readahead functions.  This also allows the min() comparison in
      btrfs_defrag_file() to work as intended.
      Signed-off-by: default avatarJosef Bacik <jbacik@fb.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      6e8451cb
  2. 18 Jun, 2014 7 commits
  3. 11 Jun, 2014 2 commits
  4. 09 Jun, 2014 17 commits
    • Thomas Gleixner's avatar
      futex: Make lookup_pi_state more robust · 888f1a0f
      Thomas Gleixner authored
      commit 54a21788 upstream.
      
      The current implementation of lookup_pi_state has ambigous handling of
      the TID value 0 in the user space futex.  We can get into the kernel
      even if the TID value is 0, because either there is a stale waiters bit
      or the owner died bit is set or we are called from the requeue_pi path
      or from user space just for fun.
      
      The current code avoids an explicit sanity check for pid = 0 in case
      that kernel internal state (waiters) are found for the user space
      address.  This can lead to state leakage and worse under some
      circumstances.
      
      Handle the cases explicit:
      
             Waiter | pi_state | pi->owner | uTID      | uODIED | ?
      
        [1]  NULL   | ---      | ---       | 0         | 0/1    | Valid
        [2]  NULL   | ---      | ---       | >0        | 0/1    | Valid
      
        [3]  Found  | NULL     | --        | Any       | 0/1    | Invalid
      
        [4]  Found  | Found    | NULL      | 0         | 1      | Valid
        [5]  Found  | Found    | NULL      | >0        | 1      | Invalid
      
        [6]  Found  | Found    | task      | 0         | 1      | Valid
      
        [7]  Found  | Found    | NULL      | Any       | 0      | Invalid
      
        [8]  Found  | Found    | task      | ==taskTID | 0/1    | Valid
        [9]  Found  | Found    | task      | 0         | 0      | Invalid
        [10] Found  | Found    | task      | !=taskTID | 0/1    | Invalid
      
       [1] Indicates that the kernel can acquire the futex atomically. We
           came came here due to a stale FUTEX_WAITERS/FUTEX_OWNER_DIED bit.
      
       [2] Valid, if TID does not belong to a kernel thread. If no matching
           thread is found then it indicates that the owner TID has died.
      
       [3] Invalid. The waiter is queued on a non PI futex
      
       [4] Valid state after exit_robust_list(), which sets the user space
           value to FUTEX_WAITERS | FUTEX_OWNER_DIED.
      
       [5] The user space value got manipulated between exit_robust_list()
           and exit_pi_state_list()
      
       [6] Valid state after exit_pi_state_list() which sets the new owner in
           the pi_state but cannot access the user space value.
      
       [7] pi_state->owner can only be NULL when the OWNER_DIED bit is set.
      
       [8] Owner and user space value match
      
       [9] There is no transient state which sets the user space TID to 0
           except exit_robust_list(), but this is indicated by the
           FUTEX_OWNER_DIED bit. See [4]
      
      [10] There is no transient state which leaves owner and user space
           TID out of sync.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Will Drewry <wad@chromium.org>
      Cc: Darren Hart <dvhart@linux.intel.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      888f1a0f
    • Thomas Gleixner's avatar
      futex: Always cleanup owner tid in unlock_pi · ab3c68af
      Thomas Gleixner authored
      commit 13fbca4c upstream.
      
      If the owner died bit is set at futex_unlock_pi, we currently do not
      cleanup the user space futex.  So the owner TID of the current owner
      (the unlocker) persists.  That's observable inconsistant state,
      especially when the ownership of the pi state got transferred.
      
      Clean it up unconditionally.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Will Drewry <wad@chromium.org>
      Cc: Darren Hart <dvhart@linux.intel.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      ab3c68af
    • Thomas Gleixner's avatar
      futex: Validate atomic acquisition in futex_lock_pi_atomic() · 8c7e0043
      Thomas Gleixner authored
      commit b3eaa9fc upstream.
      
      We need to protect the atomic acquisition in the kernel against rogue
      user space which sets the user space futex to 0, so the kernel side
      acquisition succeeds while there is existing state in the kernel
      associated to the real owner.
      
      Verify whether the futex has waiters associated with kernel state.  If
      it has, return -EINVAL.  The state is corrupted already, so no point in
      cleaning it up.  Subsequent calls will fail as well.  Not our problem.
      
      [ tglx: Use futex_top_waiter() and explain why we do not need to try
        	restoring the already corrupted user space state. ]
      Signed-off-by: default avatarDarren Hart <dvhart@linux.intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Will Drewry <wad@chromium.org>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      8c7e0043
    • Thomas Gleixner's avatar
      futex-prevent-requeue-pi-on-same-futex.patch futex: Forbid uaddr == uaddr2 in... · b9103e5f
      Thomas Gleixner authored
      futex-prevent-requeue-pi-on-same-futex.patch futex: Forbid uaddr == uaddr2 in futex_requeue(..., requeue_pi=1)
      
      commit e9c243a5 upstream.
      
      If uaddr == uaddr2, then we have broken the rule of only requeueing from
      a non-pi futex to a pi futex with this call.  If we attempt this, then
      dangling pointers may be left for rt_waiter resulting in an exploitable
      condition.
      
      This change brings futex_requeue() in line with futex_wait_requeue_pi()
      which performs the same check as per commit 6f7b0a2a ("futex: Forbid
      uaddr == uaddr2 in futex_wait_requeue_pi()")
      
      [ tglx: Compare the resulting keys as well, as uaddrs might be
        	different depending on the mapping ]
      
      Fixes CVE-2014-3153.
      
      Reported-by: Pinkie Pie
      Signed-off-by: default avatarWill Drewry <wad@chromium.org>
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarDarren Hart <dvhart@linux.intel.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      b9103e5f
    • Guennadi Liakhovetski's avatar
      media: V4L2: fix VIDIOC_CREATE_BUFS in 64- / 32-bit compatibility mode · 75ffd3e3
      Guennadi Liakhovetski authored
      commit 97d9d23d upstream.
      
      If a struct contains 64-bit fields, it is aligned on 64-bit boundaries
      within containing structs in 64-bit compilations. This is the case with
      struct v4l2_window, which contains pointers and is embedded into struct
      v4l2_format, and that one is embedded into struct v4l2_create_buffers.
      Unlike some other structs, used as a part of the kernel ABI as ioctl()
      arguments, that are packed, these structs aren't packed. This isn't a
      problem per se, but the ioctl-compat code for VIDIOC_CREATE_BUFS contains
      a bug, that triggers in such 64-bit builds. That code wrongly assumes,
      that in struct v4l2_create_buffers, struct v4l2_format immediately follows
      the __u32 memory field, which in fact isn't the case. This bug wasn't
      visible until now, because until recently hardly any applications used
      this ioctl() and mostly embedded 32-bit only drivers implemented it. This
      is changing now with addition of this ioctl() to some USB drivers, e.g.
      UVC. This patch fixes the bug by copying parts of struct
      v4l2_create_buffers separately.
      Signed-off-by: default avatarGuennadi Liakhovetski <g.liakhovetski@gmx.de>
      Acked-by: default avatarLaurent Pinchart <laurent.pinchart@ideasonboard.com>
      Signed-off-by: default avatarMauro Carvalho Chehab <m.chehab@samsung.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      75ffd3e3
    • Guennadi Liakhovetski's avatar
      media: V4L2: ov7670: fix a wrong index, potentially Oopsing the kernel from user-space · b331e3ac
      Guennadi Liakhovetski authored
      commit cfece585 upstream.
      
      Commit 75e2bdad "ov7670: allow
      configuration of image size, clock speed, and I/O method" uses a wrong
      index to iterate an array. Apart from being wrong, it also uses an
      unchecked value from user-space, which can cause access to unmapped
      memory in the kernel, triggered by a normal desktop user with rights to
      use V4L2 devices.
      Signed-off-by: default avatarGuennadi Liakhovetski <g.liakhovetski@gmx.de>
      Acked-by: default avatarJonathan Corbet <corbet@lwn.net>
      Signed-off-by: default avatarMauro Carvalho Chehab <m.chehab@samsung.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      b331e3ac
    • Antti Palosaari's avatar
      media: fc2580: fix tuning failure on 32-bit arch · 383b17ac
      Antti Palosaari authored
      commit 8845cc64 upstream.
      
      There was some frequency calculation overflows which caused tuning
      failure on 32-bit architecture. Use 64-bit numbers where needed in
      order to avoid calculation overflows.
      
      Thanks for the Finnish person, who asked remain anonymous, reporting,
      testing and suggesting the fix.
      Signed-off-by: default avatarAntti Palosaari <crope@iki.fi>
      Signed-off-by: default avatarMauro Carvalho Chehab <m.chehab@samsung.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      383b17ac
    • Alex Williamson's avatar
      iommu/amd: Fix interrupt remapping for aliased devices · 2109c87b
      Alex Williamson authored
      commit e028a9e6 upstream.
      
      An apparent cut and paste error prevents the correct flags from being
      set on the alias device resulting in MSI on conventional PCI devices
      failing to work.  This also produces error events from the IOMMU like:
      
      AMD-Vi: Event logged [INVALID_DEVICE_REQUEST device=00:14.4 address=0x000000fdf8000000 flags=0x0a00]
      
      Where 14.4 is a PCIe-to-PCI bridge with a device behind it trying to
      use MSI interrupts.
      Signed-off-by: default avatarAlex Williamson <alex.williamson@redhat.com>
      Signed-off-by: default avatarJoerg Roedel <joro@8bytes.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      2109c87b
    • Chunwei Chen's avatar
      libceph: fix corruption when using page_count 0 page in rbd · 92014a62
      Chunwei Chen authored
      commit 178eda29 upstream.
      
      It has been reported that using ZFSonLinux on rbd will result in memory
      corruption. The bug report can be found here:
      
      https://github.com/zfsonlinux/spl/issues/241
      http://tracker.ceph.com/issues/7790
      
      The reason is that ZFS will send pages with page_count 0 into rbd, which in
      turns send them to tcp_sendpage. However, tcp_sendpage cannot deal with
      page_count 0, as it will do get_page and put_page, and erroneously free the
      page.
      
      This type of issue has been noted before, and handled in iscsi, drbd,
      etc. So, rbd should also handle this. This fix address this issue by fall back
      to slower sendmsg when page_count 0 detected.
      
      Cc: Sage Weil <sage@inktank.com>
      Cc: Yehuda Sadeh <yehuda@inktank.com>
      Signed-off-by: default avatarChunwei Chen <tuxoko@gmail.com>
      Reviewed-by: default avatarIlya Dryomov <ilya.dryomov@inktank.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      92014a62
    • Geert Uytterhoeven's avatar
      spi: core: Ignore unsupported Dual/Quad Transfer Mode bits · 4ee11080
      Geert Uytterhoeven authored
      commit 83596fbe upstream.
      
      The availability of SPI Dual or Quad Transfer Mode as indicated by the
      "spi-tx-bus-width" and "spi-rx-bus-width" properties in the device tree is
      a hardware property of the SPI master, SPI slave, and board wiring.  Hence
      the SPI core should not reject an SPI slave because an SPI master driver
      doesn't (yet) support Dual or Quad Transfer Mode.
      
      Change the lack of Dual or Quad Transfer Mode support in the SPI master
      driver from an error condition to a warning condition, and ignore the
      unsupported mode bits, falling back to Single Transfer Mode, to avoid
      breakages when running old kernels with new device trees.
      
      Fixes: f477b7fb (spi: DUAL and QUAD support)
      Signed-off-by: default avatarGeert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: default avatarMark Brown <broonie@linaro.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      4ee11080
    • Srivatsa S. Bhat's avatar
      powerpc, kexec: Fix "Processor X is stuck" issue during kexec from ST mode · 20b96d74
      Srivatsa S. Bhat authored
      commit 011e4b02 upstream.
      
      If we try to perform a kexec when the machine is in ST (Single-Threaded) mode
      (ppc64_cpu --smt=off), the kexec operation doesn't succeed properly, and we
      get the following messages during boot:
      
      [    0.089866] POWER8 performance monitor hardware support registered
      [    0.089985] power8-pmu: PMAO restore workaround active.
      [    5.095419] Processor 1 is stuck.
      [   10.097933] Processor 2 is stuck.
      [   15.100480] Processor 3 is stuck.
      [   20.102982] Processor 4 is stuck.
      [   25.105489] Processor 5 is stuck.
      [   30.108005] Processor 6 is stuck.
      [   35.110518] Processor 7 is stuck.
      [   40.113369] Processor 9 is stuck.
      [   45.115879] Processor 10 is stuck.
      [   50.118389] Processor 11 is stuck.
      [   55.120904] Processor 12 is stuck.
      [   60.123425] Processor 13 is stuck.
      [   65.125970] Processor 14 is stuck.
      [   70.128495] Processor 15 is stuck.
      [   75.131316] Processor 17 is stuck.
      
      Note that only the sibling threads are stuck, while the primary threads (0, 8,
      16 etc) boot just fine. Looking closer at the previous step of kexec, we observe
      that kexec tries to wakeup (bring online) the sibling threads of all the cores,
      before performing kexec:
      
      [ 9464.131231] Starting new kernel
      [ 9464.148507] kexec: Waking offline cpu 1.
      [ 9464.148552] kexec: Waking offline cpu 2.
      [ 9464.148600] kexec: Waking offline cpu 3.
      [ 9464.148636] kexec: Waking offline cpu 4.
      [ 9464.148671] kexec: Waking offline cpu 5.
      [ 9464.148708] kexec: Waking offline cpu 6.
      [ 9464.148743] kexec: Waking offline cpu 7.
      [ 9464.148779] kexec: Waking offline cpu 9.
      [ 9464.148815] kexec: Waking offline cpu 10.
      [ 9464.148851] kexec: Waking offline cpu 11.
      [ 9464.148887] kexec: Waking offline cpu 12.
      [ 9464.148922] kexec: Waking offline cpu 13.
      [ 9464.148958] kexec: Waking offline cpu 14.
      [ 9464.148994] kexec: Waking offline cpu 15.
      [ 9464.149030] kexec: Waking offline cpu 17.
      
      Instrumenting this piece of code revealed that the cpu_up() operation actually
      fails with -EBUSY. Thus, only the primary threads of all the cores are online
      during kexec, and hence this is a sure-shot receipe for disaster, as explained
      in commit e8e5c215 (powerpc/kexec: Fix orphaned offline CPUs across kexec),
      as well as in the comment above wake_offline_cpus().
      
      It turns out that cpu_up() was returning -EBUSY because the variable
      'cpu_hotplug_disabled' was set to 1; and this disabling of CPU hotplug was done
      by migrate_to_reboot_cpu() inside kernel_kexec().
      
      Now, migrate_to_reboot_cpu() was originally written with the assumption that
      any further code will not need to perform CPU hotplug, since we are anyway in
      the reboot path. However, kexec is clearly not such a case, since we depend on
      onlining CPUs, atleast on powerpc.
      
      So re-enable cpu-hotplug after returning from migrate_to_reboot_cpu() in the
      kexec path, to fix this regression in kexec on powerpc.
      
      Also, wrap the cpu_up() in powerpc kexec code within a WARN_ON(), so that we
      can catch such issues more easily in the future.
      
      Fixes: c97102ba (kexec: migrate to reboot cpu)
      Signed-off-by: default avatarSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      20b96d74
    • Guenter Roeck's avatar
      powerpc: Fix 64 bit builds with binutils 2.24 · cc29f606
      Guenter Roeck authored
      commit 7998eb3d upstream.
      
      With binutils 2.24, various 64 bit builds fail with relocation errors
      such as
      
      arch/powerpc/kernel/built-in.o: In function `exc_debug_crit_book3e':
      	(.text+0x165ee): relocation truncated to fit: R_PPC64_ADDR16_HI
      	against symbol `interrupt_base_book3e' defined in .text section
      	in arch/powerpc/kernel/built-in.o
      arch/powerpc/kernel/built-in.o: In function `exc_debug_crit_book3e':
      	(.text+0x16602): relocation truncated to fit: R_PPC64_ADDR16_HI
      	against symbol `interrupt_end_book3e' defined in .text section
      	in arch/powerpc/kernel/built-in.o
      
      The assembler maintainer says:
      
       I changed the ABI, something that had to be done but unfortunately
       happens to break the booke kernel code.  When building up a 64-bit
       value with lis, ori, shl, oris, ori or similar sequences, you now
       should use @high and @higha in place of @h and @ha.  @h and @ha
       (and their associated relocs R_PPC64_ADDR16_HI and R_PPC64_ADDR16_HA)
       now report overflow if the value is out of 32-bit signed range.
       ie. @h and @ha assume you're building a 32-bit value. This is needed
       to report out-of-range -mcmodel=medium toc pointer offsets in @toc@h
       and @toc@ha expressions, and for consistency I did the same for all
       other @h and @ha relocs.
      
      Replacing @h with @high in one strategic location fixes the relocation
      errors. This has to be done conditionally since the assembler either
      supports @h or @high but not both.
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      cc29f606
    • Gavin Shan's avatar
      powerpc/powernv: Reset root port in firmware · f8efc159
      Gavin Shan authored
      commit 372cf124 upstream.
      
      Resetting root port has more stuff to do than that for PCIe switch
      ports and we should have resetting root port done in firmware instead
      of the kernel itself. The problem was introduced by commit 5b2e198e
      ("powerpc/powernv: Rework EEH reset").
      Signed-off-by: default avatarGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      f8efc159
    • Horia Geanta's avatar
      crypto: caam - add allocation failure handling in SPRINTFCAT macro · 2f152373
      Horia Geanta authored
      commit 27c5fb7a upstream.
      
      GFP_ATOMIC memory allocation could fail.
      In this case, avoid NULL pointer dereference and notify user.
      
      Cc: Kim Phillips <kim.phillips@freescale.com>
      Signed-off-by: default avatarHoria Geanta <horia.geanta@freescale.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      2f152373
    • Olof Johansson's avatar
      i2c: s3c2410: resume race fix · 7fd5ba24
      Olof Johansson authored
      commit ce78cc07 upstream.
      
      Don't unmark the device as suspended until after it's been re-setup.
      
      The main race would be w.r.t. an i2c driver that gets resumed at the same
      time (asyncronously), that is allowed to do a transfer since suspended
      is set to 0 before reinit, but really should have seen the -EIO return
      instead.
      Signed-off-by: default avatarOlof Johansson <olof@lixom.net>
      Signed-off-by: default avatarDoug Anderson <dianders@chromium.org>
      Acked-by: default avatarKukjin Kim <kgene.kim@samsung.com>
      Signed-off-by: default avatarWolfram Sang <wsa@the-dreams.de>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      7fd5ba24
    • Du, Wenkai's avatar
      i2c: designware: Mask all interrupts during i2c controller enable · 3c90e6ad
      Du, Wenkai authored
      commit 47bb27e7 upstream.
      
      There have been "i2c_designware 80860F41:00: controller timed out" errors
      on a number of Baytrail platforms. The issue is caused by incorrect value in
      Interrupt Mask Register (DW_IC_INTR_MASK)  when i2c core is being enabled.
      This causes call to __i2c_dw_enable() to immediately start the transfer which
      leads to timeout. There are 3 failure modes observed:
      
      1. Failure in S0 to S3 resume path
      
      The default value after reset for DW_IC_INTR_MASK is 0x8ff. When we start
      the first transaction after resuming from system sleep, TX_EMPTY interrupt
      is already unmasked because of the hardware default.
      
      2. Failure in normal operational path
      
      This failure happens rarely and is hard to reproduce. Debug trace showed that
      DW_IC_INTR_MASK had value of 0x254 when failure occurred, which meant
      TX_EMPTY was unmasked.
      
      3. Failure in S3 to S0 suspend path
      
      This failure also happens rarely and is hard to reproduce. Adding debug trace
      that read DW_IC_INTR_MASK made this failure not reproducible. But from ISR
      call trace we could conclude TX_EMPTY was unmasked when problem occurred.
      
      The patch masks all interrupts before the controller is enabled to resolve the
      faulty DW_IC_INTR_MASK conditions.
      Signed-off-by: default avatarWenkai Du <wenkai.du@intel.com>
      Acked-by: default avatarMika Westerberg <mika.westerberg@linux.intel.com>
      [wsa: improved the comment and removed typo in commit msg]
      Signed-off-by: default avatarWolfram Sang <wsa@the-dreams.de>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      3c90e6ad
    • Wolfram Sang's avatar
      i2c: rcar: bail out on zero length transfers · f6ec9bd4
      Wolfram Sang authored
      commit d7653964 upstream.
      
      This hardware does not support zero length transfers. Instead, the
      driver does one (random) byte transfers currently with undefined results
      for the slaves. We now bail out.
      Signed-off-by: default avatarWolfram Sang <wsa+renesas@sang-engineering.com>
      Signed-off-by: default avatarWolfram Sang <wsa@the-dreams.de>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      f6ec9bd4