1. 21 Aug, 2016 40 commits
    • James Hogan's avatar
      MIPS: KVM: Fix modular KVM under QEMU · 9338ba33
      James Hogan authored
      commit 797179bc upstream.
      
      Copy __kvm_mips_vcpu_run() into unmapped memory, so that we can never
      get a TLB refill exception in it when KVM is built as a module.
      
      This was observed to happen with the host MIPS kernel running under
      QEMU, due to a not entirely transparent optimisation in the QEMU TLB
      handling where TLB entries replaced with TLBWR are copied to a separate
      part of the TLB array. Code in those pages continue to be executable,
      but those mappings persist only until the next ASID switch, even if they
      are marked global.
      
      An ASID switch happens in __kvm_mips_vcpu_run() at exception level after
      switching to the guest exception base. Subsequent TLB mapped kernel
      instructions just prior to switching to the guest trigger a TLB refill
      exception, which enters the guest exception handlers without updating
      EPC. This appears as a guest triggered TLB refill on a host kernel
      mapped (host KSeg2) address, which is not handled correctly as user
      (guest) mode accesses to kernel (host) segments always generate address
      error exceptions.
      Signed-off-by: default avatarJames Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: kvm@vger.kernel.org
      Cc: linux-mips@linux-mips.org
      Cc: <stable@vger.kernel.org> # 3.10.x-
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      [james.hogan@imgtec.com: backported for stable 3.14]
      Signed-off-by: default avatarJames Hogan <james.hogan@imgtec.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      9338ba33
    • Ralf Baechle's avatar
      MIPS: Fix 64k page support for 32 bit kernels. · ce7222fc
      Ralf Baechle authored
      commit d7de4134 upstream.
      
      TASK_SIZE was defined as 0x7fff8000UL which for 64k pages is not a
      multiple of the page size.  Somewhere further down the math fails
      such that executing an ELF binary fails.
      Signed-off-by: default avatarRalf Baechle <ralf@linux-mips.org>
      Tested-by: default avatarJoshua Henderson <joshua.henderson@microchip.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      ce7222fc
    • Matthias Schiffer's avatar
      MIPS: ath79: make bootconsole wait for both THRE and TEMT · 13f004c0
      Matthias Schiffer authored
      commit f5b556c9 upstream.
      
      This makes the ath79 bootconsole behave the same way as the generic 8250
      bootconsole.
      
      Also waiting for TEMT (transmit buffer is empty) instead of just THRE
      (transmit buffer is not full) ensures that all characters have been
      transmitted before the real serial driver starts reconfiguring the serial
      controller (which would sometimes result in garbage being transmitted.)
      This change does not cause a visible performance loss.
      
      In addition, this seems to fix a hang observed in certain configurations on
      many AR7xxx/AR9xxx SoCs during autoconfig of the real serial driver.
      
      A more complete follow-up patch will disable 8250 autoconfig for ath79
      altogether (the serial controller is detected as a 16550A, which is not
      fully compatible with the ath79 serial, and the autoconfig may lead to
      undefined behavior on ath79.)
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarMatthias Schiffer <mschiffer@universe-factory.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      13f004c0
    • James Hogan's avatar
      MIPS: Fix siginfo.h to use strict posix types · f8141132
      James Hogan authored
      commit 5daebc47 upstream.
      
      Commit 85efde6f ("make exported headers use strict posix types")
      changed the asm-generic siginfo.h to use the __kernel_* types, and
      commit 3a471cbc ("remove __KERNEL_STRICT_NAMES") make the internal
      types accessible only to the kernel, but the MIPS implementation hasn't
      been updated to match.
      
      Switch to proper types now so that the exported asm/siginfo.h won't
      produce quite so many compiler errors when included alone by a user
      program.
      Signed-off-by: default avatarJames Hogan <james.hogan@imgtec.com>
      Cc: Christopher Ferris <cferris@google.com>
      Cc: linux-mips@linux-mips.org
      Cc: <stable@vger.kernel.org> # 2.6.30-
      Cc: linux-kernel@vger.kernel.org
      Patchwork: https://patchwork.linux-mips.org/patch/12477/Signed-off-by: default avatarRalf Baechle <ralf@linux-mips.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      f8141132
    • Paul Burton's avatar
      MIPS: math-emu: Fix jalr emulation when rd == $0 · 2c767dae
      Paul Burton authored
      commit ab4a92e6 upstream.
      
      When emulating a jalr instruction with rd == $0, the code in
      isBranchInstr was incorrectly writing to GPR $0 which should actually
      always remain zeroed. This would lead to any further instructions
      emulated which use $0 operating on a bogus value until the task is next
      context switched, at which point the value of $0 in the task context
      would be restored to the correct zero by a store in SAVE_SOME. Fix this
      by not writing to rd if it is $0.
      
      Fixes: 102cedc3 ("MIPS: microMIPS: Floating point support.")
      Signed-off-by: default avatarPaul Burton <paul.burton@imgtec.com>
      Cc: Maciej W. Rozycki <macro@imgtec.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: linux-mips@linux-mips.org
      Cc: linux-kernel@vger.kernel.org
      Patchwork: https://patchwork.linux-mips.org/patch/13160/Signed-off-by: default avatarRalf Baechle <ralf@linux-mips.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      2c767dae
    • James Hogan's avatar
      MIPS: KVM: Propagate kseg0/mapped tlb fault errors · 59281253
      James Hogan authored
      commit 9b731bcf upstream.
      
      Propagate errors from kvm_mips_handle_kseg0_tlb_fault() and
      kvm_mips_handle_mapped_seg_tlb_fault(), usually triggering an internal
      error since they normally indicate the guest accessed bad physical
      memory or the commpage in an unexpected way.
      
      Fixes: 858dd5d4 ("KVM/MIPS32: MMU/TLB operations for the Guest.")
      Fixes: e685c689 ("KVM/MIPS32: Privileged instruction/target branch emulation.")
      Signed-off-by: default avatarJames Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      [james.hogan@imgtec.com: Backport to v3.10.y - v3.15.y]
      Signed-off-by: default avatarJames Hogan <james.hogan@imgtec.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      59281253
    • James Hogan's avatar
      MIPS: KVM: Fix gfn range check in kseg0 tlb faults · 8cee00e9
      James Hogan authored
      commit 0741f52d upstream.
      
      Two consecutive gfns are loaded into host TLB, so ensure the range check
      isn't off by one if guest_pmap_npages is odd.
      
      Fixes: 858dd5d4 ("KVM/MIPS32: MMU/TLB operations for the Guest.")
      Signed-off-by: default avatarJames Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      [james.hogan@imgtec.com: Backport to v3.10.y - v3.15.y]
      Signed-off-by: default avatarJames Hogan <james.hogan@imgtec.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      8cee00e9
    • James Hogan's avatar
      MIPS: KVM: Add missing gfn range check · 2336de8d
      James Hogan authored
      commit 8985d503 upstream.
      
      kvm_mips_handle_mapped_seg_tlb_fault() calculates the guest frame number
      based on the guest TLB EntryLo values, however it is not range checked
      to ensure it lies within the guest_pmap. If the physical memory the
      guest refers to is out of range then dump the guest TLB and emit an
      internal error.
      
      Fixes: 858dd5d4 ("KVM/MIPS32: MMU/TLB operations for the Guest.")
      Signed-off-by: default avatarJames Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      [james.hogan@imgtec.com: Backport to v3.10.y - v3.15.y]
      Signed-off-by: default avatarJames Hogan <james.hogan@imgtec.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      2336de8d
    • James Hogan's avatar
      MIPS: KVM: Fix mapped fault broken commpage handling · 828e4e16
      James Hogan authored
      commit c604cffa upstream.
      
      kvm_mips_handle_mapped_seg_tlb_fault() appears to map the guest page at
      virtual address 0 to PFN 0 if the guest has created its own mapping
      there. The intention is unclear, but it may have been an attempt to
      protect the zero page from being mapped to anything but the comm page in
      code paths you wouldn't expect from genuine commpage accesses (guest
      kernel mode cache instructions on that address, hitting trapping
      instructions when executing from that address with a coincidental TLB
      eviction during the KVM handling, and guest user mode accesses to that
      address).
      
      Fix this to check for mappings exactly at KVM_GUEST_COMMPAGE_ADDR (it
      may not be at address 0 since commit 42aa12e7 ("MIPS: KVM: Move
      commpage so 0x0 is unmapped")), and set the corresponding EntryLo to be
      interpreted as 0 (invalid).
      
      Fixes: 858dd5d4 ("KVM/MIPS32: MMU/TLB operations for the Guest.")
      Signed-off-by: default avatarJames Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      [james.hogan@imgtec.com: Backport to v3.10.y - v3.15.y]
      Signed-off-by: default avatarJames Hogan <james.hogan@imgtec.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      828e4e16
    • Soheil Hassas Yeganeh's avatar
      tcp: consider recv buf for the initial window scale · aadf1c47
      Soheil Hassas Yeganeh authored
      commit f626300a upstream.
      
      tcp_select_initial_window() intends to advertise a window
      scaling for the maximum possible window size. To do so,
      it considers the maximum of net.ipv4.tcp_rmem[2] and
      net.core.rmem_max as the only possible upper-bounds.
      However, users with CAP_NET_ADMIN can use SO_RCVBUFFORCE
      to set the socket's receive buffer size to values
      larger than net.ipv4.tcp_rmem[2] and net.core.rmem_max.
      Thus, SO_RCVBUFFORCE is effectively ignored by
      tcp_select_initial_window().
      
      To fix this, consider the maximum of net.ipv4.tcp_rmem[2],
      net.core.rmem_max and socket's initial buffer space.
      
      Fixes: b0573dea ("[NET]: Introduce SO_{SND,RCV}BUFFORCE socket options")
      Signed-off-by: default avatarSoheil Hassas Yeganeh <soheil@google.com>
      Suggested-by: default avatarNeal Cardwell <ncardwell@google.com>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      aadf1c47
    • Yuchung Cheng's avatar
      tcp: record TLP and ER timer stats in v6 stats · 4f6b1692
      Yuchung Cheng authored
      commit ce3cf4ec upstream.
      
      The v6 tcp stats scan do not provide TLP and ER timer information
      correctly like the v4 version . This patch fixes that.
      
      Fixes: 6ba8a3b1 ("tcp: Tail loss probe (TLP)")
      Fixes: eed530b6 ("tcp: early retransmit")
      Signed-off-by: default avatarYuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      4f6b1692
    • Charles (Chas) Williams's avatar
      tcp: make challenge acks less predictable · b1f32b94
      Charles (Chas) Williams authored
      commit 75ff39cc upstream.
      
      From: Eric Dumazet <edumazet@google.com>
      
      Yue Cao claims that current host rate limiting of challenge ACKS
      (RFC 5961) could leak enough information to allow a patient attacker
      to hijack TCP sessions. He will soon provide details in an academic
      paper.
      
      This patch increases the default limit from 100 to 1000, and adds
      some randomization so that the attacker can no longer hijack
      sessions without spending a considerable amount of probes.
      
      Based on initial analysis and patch from Linus.
      
      Note that we also have per socket rate limiting, so it is tempting
      to remove the host limit in the future.
      
      v2: randomize the count of challenge acks per second, not the period.
      
      Fixes: 282f23c6 ("tcp: implement RFC 5961 3.2")
      Reported-by: default avatarYue Cao <ycao009@ucr.edu>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Suggested-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Acked-by: default avatarYuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      [ ciwillia: backport to 3.10-stable ]
      Signed-off-by: default avatarChas Williams <ciwillia@brocade.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      b1f32b94
    • Hugh Dickins's avatar
      tmpfs: fix regression hang in fallocate undo · 24fc11a9
      Hugh Dickins authored
      commit 7f556567 upstream.
      
      The well-spotted fallocate undo fix is good in most cases, but not when
      fallocate failed on the very first page.  index 0 then passes lend -1
      to shmem_undo_range(), and that has two bad effects: (a) that it will
      undo every fallocation throughout the file, unrestricted by the current
      range; but more importantly (b) it can cause the undo to hang, because
      lend -1 is treated as truncation, which makes it keep on retrying until
      every page has gone, but those already fully instantiated will never go
      away.  Big thank you to xfstests generic/269 which demonstrates this.
      
      Fixes: b9b4bb26 ("tmpfs: don't undo fallocate past its last page")
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      24fc11a9
    • Anthony Romano's avatar
      tmpfs: don't undo fallocate past its last page · 6d2eb0f1
      Anthony Romano authored
      commit b9b4bb26 upstream.
      
      When fallocate is interrupted it will undo a range that extends one byte
      past its range of allocated pages.  This can corrupt an in-use page by
      zeroing out its first byte.  Instead, undo using the inclusive byte
      range.
      
      Fixes: 1635f6a7 ("tmpfs: undo fallocation on failure")
      Link: http://lkml.kernel.org/r/1462713387-16724-1-git-send-email-anthony.romano@coreos.comSigned-off-by: default avatarAnthony Romano <anthony.romano@coreos.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Brandon Philips <brandon@ifup.co>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      6d2eb0f1
    • Ilya Dryomov's avatar
      libceph: apply new_state before new_up_client on incrementals · 1196c36f
      Ilya Dryomov authored
      commit 930c5328 upstream.
      
      Currently, osd_weight and osd_state fields are updated in the encoding
      order.  This is wrong, because an incremental map may look like e.g.
      
          new_up_client: { osd=6, addr=... } # set osd_state and addr
          new_state: { osd=6, xorstate=EXISTS } # clear osd_state
      
      Suppose osd6's current osd_state is EXISTS (i.e. osd6 is down).  After
      applying new_up_client, osd_state is changed to EXISTS | UP.  Carrying
      on with the new_state update, we flip EXISTS and leave osd6 in a weird
      "!EXISTS but UP" state.  A non-existent OSD is considered down by the
      mapping code
      
      2087    for (i = 0; i < pg->pg_temp.len; i++) {
      2088            if (ceph_osd_is_down(osdmap, pg->pg_temp.osds[i])) {
      2089                    if (ceph_can_shift_osds(pi))
      2090                            continue;
      2091
      2092                    temp->osds[temp->size++] = CRUSH_ITEM_NONE;
      
      and so requests get directed to the second OSD in the set instead of
      the first, resulting in OSD-side errors like:
      
      [WRN] : client.4239 192.168.122.21:0/2444980242 misdirected client.4239.1:2827 pg 2.5df899f2 to osd.4 not [1,4,6] in e680/680
      
      and hung rbds on the client:
      
      [  493.566367] rbd: rbd0: write 400000 at 11cc00000 (0)
      [  493.566805] rbd: rbd0:   result -6 xferred 400000
      [  493.567011] blk_update_request: I/O error, dev rbd0, sector 9330688
      
      The fix is to decouple application from the decoding and:
      - apply new_weight first
      - apply new_state before new_up_client
      - twiddle osd_state flags if marking in
      - clear out some of the state if osd is destroyed
      
      Fixes: http://tracker.ceph.com/issues/14901Signed-off-by: default avatarIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: default avatarJosh Durgin <jdurgin@redhat.com>
      [idryomov@gmail.com: backport to 3.10-3.14: strip primary-affinity]
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      1196c36f
    • Scott Bauer's avatar
      HID: hiddev: validate num_values for HIDIOCGUSAGES, HIDIOCSUSAGES commands · ba293572
      Scott Bauer authored
      commit 93a2001b upstream.
      
      This patch validates the num_values parameter from userland during the
      HIDIOCGUSAGES and HIDIOCSUSAGES commands. Previously, if the report id was set
      to HID_REPORT_ID_UNKNOWN, we would fail to validate the num_values parameter
      leading to a heap overflow.
      
      CVE-2016-5829
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarScott Bauer <sbauer@plzdonthack.me>
      Signed-off-by: default avatarJiri Kosina <jkosina@suse.cz>
      Signed-off-by: default avatarChas Williams <ciwillia@brocade.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      ba293572
    • Tejun Heo's avatar
      printk: do cond_resched() between lines while outputting to consoles · 0ba4f4bb
      Tejun Heo authored
      commit 8d91f8b1 upstream.
      
      @console_may_schedule tracks whether console_sem was acquired through
      lock or trylock.  If the former, we're inside a sleepable context and
      console_conditional_schedule() performs cond_resched().  This allows
      console drivers which use console_lock for synchronization to yield
      while performing time-consuming operations such as scrolling.
      
      However, the actual console outputting is performed while holding
      irq-safe logbuf_lock, so console_unlock() clears @console_may_schedule
      before starting outputting lines.  Also, only a few drivers call
      console_conditional_schedule() to begin with.  This means that when a
      lot of lines need to be output by console_unlock(), for example on a
      console registration, the task doing console_unlock() may not yield for
      a long time on a non-preemptible kernel.
      
      If this happens with a slow console devices, for example a serial
      console, the outputting task may occupy the cpu for a very long time.
      Long enough to trigger softlockup and/or RCU stall warnings, which in
      turn pile more messages, sometimes enough to trigger the next cycle of
      warnings incapacitating the system.
      
      Fix it by making console_unlock() insert cond_resched() between lines if
      @console_may_schedule.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reported-by: default avatarCalvin Owens <calvinowens@fb.com>
      Acked-by: default avatarJan Kara <jack@suse.com>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Cc: Kyle McMartin <kyle@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      [ciwillia@brocade.com: adjust context for 3.10.y]
      Signed-off-by: default avatarChas Williams <ciwillia@brocade.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      0ba4f4bb
    • Hugh Dickins's avatar
      mm: migrate dirty page without clear_page_dirty_for_io etc · af110cc4
      Hugh Dickins authored
      commit 42cb14b1 upstream.
      
      clear_page_dirty_for_io() has accumulated writeback and memcg subtleties
      since v2.6.16 first introduced page migration; and the set_page_dirty()
      which completed its migration of PageDirty, later had to be moderated to
      __set_page_dirty_nobuffers(); then PageSwapBacked had to skip that too.
      
      No actual problems seen with this procedure recently, but if you look into
      what the clear_page_dirty_for_io(page)+set_page_dirty(newpage) is actually
      achieving, it turns out to be nothing more than moving the PageDirty flag,
      and its NR_FILE_DIRTY stat from one zone to another.
      
      It would be good to avoid a pile of irrelevant decrementations and
      incrementations, and improper event counting, and unnecessary descent of
      the radix_tree under tree_lock (to set the PAGECACHE_TAG_DIRTY which
      radix_tree_replace_slot() left in place anyway).
      
      Do the NR_FILE_DIRTY movement, like the other stats movements, while
      interrupts still disabled in migrate_page_move_mapping(); and don't even
      bother if the zone is the same.  Do the PageDirty movement there under
      tree_lock too, where old page is frozen and newpage not yet visible:
      bearing in mind that as soon as newpage becomes visible in radix_tree, an
      un-page-locked set_page_dirty() might interfere (or perhaps that's just
      not possible: anything doing so should already hold an additional
      reference to the old page, preventing its migration; but play safe).
      
      But we do still need to transfer PageDirty in migrate_page_copy(), for
      those who don't go the mapping route through migrate_page_move_mapping().
      
      CVE-2016-3070
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      [ciwillia@brocade.com: backported to 3.10: adjusted context]
      Signed-off-by: default avatarCharles (Chas) Williams <ciwillia@brocade.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      af110cc4
    • Dan Carpenter's avatar
      KEYS: potential uninitialized variable · 273e07f9
      Dan Carpenter authored
      commit 38327424 upstream.
      
      If __key_link_begin() failed then "edit" would be uninitialized.  I've
      added a check to fix that.
      
      This allows a random user to crash the kernel, though it's quite
      difficult to achieve.  There are three ways it can be done as the user
      would have to cause an error to occur in __key_link():
      
       (1) Cause the kernel to run out of memory.  In practice, this is difficult
           to achieve without ENOMEM cropping up elsewhere and aborting the
           attempt.
      
       (2) Revoke the destination keyring between the keyring ID being looked up
           and it being tested for revocation.  In practice, this is difficult to
           time correctly because the KEYCTL_REJECT function can only be used
           from the request-key upcall process.  Further, users can only make use
           of what's in /sbin/request-key.conf, though this does including a
           rejection debugging test - which means that the destination keyring
           has to be the caller's session keyring in practice.
      
       (3) Have just enough key quota available to create a key, a new session
           keyring for the upcall and a link in the session keyring, but not then
           sufficient quota to create a link in the nominated destination keyring
           so that it fails with EDQUOT.
      
      The bug can be triggered using option (3) above using something like the
      following:
      
      	echo 80 >/proc/sys/kernel/keys/root_maxbytes
      	keyctl request2 user debug:fred negate @t
      
      The above sets the quota to something much lower (80) to make the bug
      easier to trigger, but this is dependent on the system.  Note also that
      the name of the keyring created contains a random number that may be
      between 1 and 10 characters in size, so may throw the test off by
      changing the amount of quota used.
      
      Assuming the failure occurs, something like the following will be seen:
      
      	kfree_debugcheck: out of range ptr 6b6b6b6b6b6b6b68h
      	------------[ cut here ]------------
      	kernel BUG at ../mm/slab.c:2821!
      	...
      	RIP: 0010:[<ffffffff811600f9>] kfree_debugcheck+0x20/0x25
      	RSP: 0018:ffff8804014a7de8  EFLAGS: 00010092
      	RAX: 0000000000000034 RBX: 6b6b6b6b6b6b6b68 RCX: 0000000000000000
      	RDX: 0000000000040001 RSI: 00000000000000f6 RDI: 0000000000000300
      	RBP: ffff8804014a7df0 R08: 0000000000000001 R09: 0000000000000000
      	R10: ffff8804014a7e68 R11: 0000000000000054 R12: 0000000000000202
      	R13: ffffffff81318a66 R14: 0000000000000000 R15: 0000000000000001
      	...
      	Call Trace:
      	  kfree+0xde/0x1bc
      	  assoc_array_cancel_edit+0x1f/0x36
      	  __key_link_end+0x55/0x63
      	  key_reject_and_link+0x124/0x155
      	  keyctl_reject_key+0xb6/0xe0
      	  keyctl_negate_key+0x10/0x12
      	  SyS_keyctl+0x9f/0xe7
      	  do_syscall_64+0x63/0x13a
      	  entry_SYSCALL64_slow_path+0x25/0x25
      
      CVE-2016-4470
      
      Fixes: f70e2e06 ('KEYS: Do preallocation for __key_link()')
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      cc: stable@vger.kernel.org
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      [ciwillia@brocade.com: backported to 3.10: adjusted context]
      Signed-off-by: default avatarCharles (Chas) Williams <ciwillia@brocade.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      273e07f9
    • Bjørn Mork's avatar
      cdc_ncm: do not call usbnet_link_change from cdc_ncm_bind · 82365cf6
      Bjørn Mork authored
      commit 4d06dd53 upstream.
      
      usbnet_link_change will call schedule_work and should be
      avoided if bind is failing. Otherwise we will end up with
      scheduled work referring to a netdev which has gone away.
      
      Instead of making the call conditional, we can just defer
      it to usbnet_probe, using the driver_info flag made for
      this purpose.
      
      CVE-2016-3951
      
      Fixes: 8a34b0ae ("usbnet: cdc_ncm: apply usbnet_link_change")
      Reported-by: default avatarAndrey Konovalov <andreyknvl@gmail.com>
      Suggested-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarBjørn Mork <bjorn@mork.no>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      [ciwillia@brocade.com: backported to 3.10: adjusted context]
      Signed-off-by: default avatarCharles (Chas) Williams <ciwillia@brocade.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      82365cf6
    • Willy Tarreau's avatar
      pipe: limit the per-user amount of pages allocated in pipes · f474c525
      Willy Tarreau authored
      commit 759c0114 upstream.
      
      On no-so-small systems, it is possible for a single process to cause an
      OOM condition by filling large pipes with data that are never read. A
      typical process filling 4000 pipes with 1 MB of data will use 4 GB of
      memory. On small systems it may be tricky to set the pipe max size to
      prevent this from happening.
      
      This patch makes it possible to enforce a per-user soft limit above
      which new pipes will be limited to a single page, effectively limiting
      them to 4 kB each, as well as a hard limit above which no new pipes may
      be created for this user. This has the effect of protecting the system
      against memory abuse without hurting other users, and still allowing
      pipes to work correctly though with less data at once.
      
      The limit are controlled by two new sysctls : pipe-user-pages-soft, and
      pipe-user-pages-hard. Both may be disabled by setting them to zero. The
      default soft limit allows the default number of FDs per process (1024)
      to create pipes of the default size (64kB), thus reaching a limit of 64MB
      before starting to create only smaller pipes. With 256 processes limited
      to 1024 FDs each, this results in 1024*64kB + (256*1024 - 1024) * 4kB =
      1084 MB of memory allocated for a user. The hard limit is disabled by
      default to avoid breaking existing applications that make intensive use
      of pipes (eg: for splicing).
      
      CVE-2016-2847
      
      Reported-by: socketpair@gmail.com
      Reported-by: default avatarTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Mitigates: CVE-2013-4312 (Linux 2.0+)
      Suggested-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      Signed-off-by: default avatarChas Williams <3chas3@gmail.com>
      f474c525
    • Andy Lutomirski's avatar
      x86/mm: Add barriers and document switch_mm()-vs-flush synchronization · b1b4becf
      Andy Lutomirski authored
      commit 71b3c126 upstream.
      
      When switch_mm() activates a new PGD, it also sets a bit that
      tells other CPUs that the PGD is in use so that TLB flush IPIs
      will be sent.  In order for that to work correctly, the bit
      needs to be visible prior to loading the PGD and therefore
      starting to fill the local TLB.
      
      Document all the barriers that make this work correctly and add
      a couple that were missing.
      
      CVE-2016-2069
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      [ luis: backported to 3.16:
        - dropped N/A comment in flush_tlb_mm_range()
        - adjusted context ]
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      [ciwillia@brocade.com: backported to 3.10: adjusted context]
      Signed-off-by: default avatarCharles (Chas) Williams <ciwillia@brocade.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      b1b4becf
    • Yoshihiro Shimoda's avatar
      usb: renesas_usbhs: protect the CFIFOSEL setting in usbhsg_ep_enable() · a2fe0850
      Yoshihiro Shimoda authored
      commit 15e4292a upstream.
      
      This patch fixes an issue that the CFIFOSEL register value is possible
      to be changed by usbhsg_ep_enable() wrongly. And then, a data transfer
      using CFIFO may not work correctly.
      
      For example:
       # modprobe g_multi file=usb-storage.bin
       # ifconfig usb0 192.168.1.1 up
       (During the USB host is sending file to the mass storage)
       # ifconfig usb0 down
      
      In this case, since the u_ether.c may call usb_ep_enable() in
      eth_stop(), if the renesas_usbhs driver is also using CFIFO for
      mass storage, the mass storage may not work correctly.
      
      So, this patch adds usbhs_lock() and usbhs_unlock() calling in
      usbhsg_ep_enable() to protect CFIFOSEL register. This is because:
       - CFIFOSEL.CURPIPE = 0 is also needed for the pipe configuration
       - The CFIFOSEL (fifo->sel) is already protected by usbhs_lock()
      
      Fixes: 97664a20 ("usb: renesas_usbhs: shrink spin lock area")
      Cc: <stable@vger.kernel.org> # v3.1+
      Signed-off-by: default avatarYoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
      Signed-off-by: default avatarFelipe Balbi <felipe.balbi@linux.intel.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      a2fe0850
    • Andrew Goodbody's avatar
      usb: musb: Ensure rx reinit occurs for shared_fifo endpoints · 664133ce
      Andrew Goodbody authored
      commit f3eec0cf upstream.
      
      shared_fifo endpoints would only get a previous tx state cleared
      out, the rx state was only cleared for non shared_fifo endpoints
      Change this so that the rx state is cleared for all endpoints.
      This addresses an issue that resulted in rx packets being dropped
      silently.
      Signed-off-by: default avatarAndrew Goodbody <andrew.goodbody@cambrionix.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarBin Liu <b-liu@ti.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      664133ce
    • Andrew Goodbody's avatar
      usb: musb: Stop bulk endpoint while queue is rotated · 788d2a8d
      Andrew Goodbody authored
      commit 7b2c17f8 upstream.
      
      Ensure that the endpoint is stopped by clearing REQPKT before
      clearing DATAERR_NAKTIMEOUT before rotating the queue on the
      dedicated bulk endpoint.
      This addresses an issue where a race could result in the endpoint
      receiving data before it was reprogrammed resulting in a warning
      about such data from musb_rx_reinit before it was thrown away.
      The data thrown away was a valid packet that had been correctly
      ACKed which meant the host and device got out of sync.
      Signed-off-by: default avatarAndrew Goodbody <andrew.goodbody@cambrionix.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarBin Liu <b-liu@ti.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      788d2a8d
    • Daniele Palmas's avatar
      USB: serial: option: add support for Telit LE910 PID 0x1206 · 1883edde
      Daniele Palmas authored
      commit 3c0415fa upstream.
      
      This patch adds support for 0x1206 PID of Telit LE910.
      
      Since the interfaces positions are the same than the ones for
      0x1043 PID of Telit LE922, telit_le922_blacklist_usbcfg3 is used.
      Signed-off-by: default avatarDaniele Palmas <dnlplm@gmail.com>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      1883edde
    • Alan Stern's avatar
      USB: EHCI: declare hostpc register as zero-length array · 122ac0bd
      Alan Stern authored
      commit 7e8b3dfe upstream.
      
      The HOSTPC extension registers found in some EHCI implementations form
      a variable-length array, with one element for each port.  Therefore
      the hostpc field in struct ehci_regs should be declared as a
      zero-length array, not a single-element array.
      
      This fixes a problem reported by UBSAN.
      Signed-off-by: default avatarAlan Stern <stern@rowland.harvard.edu>
      Reported-by: default avatarWilfried Klaebe <linux-kernel@lebenslange-mailadresse.de>
      Tested-by: default avatarWilfried Klaebe <linux-kernel@lebenslange-mailadresse.de>
      CC: <stable@vger.kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      122ac0bd
    • Willy Tarreau's avatar
      USB: fix up faulty backports · 903c5a46
      Willy Tarreau authored
      Ben Hutchings reported that two patches were incorrectly backported
      to 3.10 :
      
      - ddbe1fca ("USB: Add device quirk for ASUS T100 Base Station keyboard")
      - ad87e032 ("USB: add quirk for devices with broken LPM")
      
      These two patches introduce quirks which must be in usb_quirk_list and
      not in usb_interface_quirk_list. These last one must only contain the
      Logitech UVC camera.
      Reported-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      903c5a46
    • Kangjie Lu's avatar
      USB: usbfs: fix potential infoleak in devio · ba3904ee
      Kangjie Lu authored
      commit 681fef83 upstream.
      
      The stack object "ci" has a total size of 8 bytes. Its last 3 bytes
      are padding bytes which are not initialized and leaked to userland
      via "copy_to_user".
      
      CVE-2016-4482
      Signed-off-by: default avatarKangjie Lu <kjlu@gatech.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      [ciwillia@brocade.com: backported to 3.10: adjusted context]
      Signed-off-by: default avatarCharles (Chas) Williams <ciwillia@brocade.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      ba3904ee
    • Alan Stern's avatar
      USB: fix invalid memory access in hub_activate() · bbb09420
      Alan Stern authored
      commit e50293ef upstream.
      
      Commit 8520f380 ("USB: change hub initialization sleeps to
      delayed_work") changed the hub_activate() routine to make part of it
      run in a workqueue.  However, the commit failed to take a reference to
      the usb_hub structure or to lock the hub interface while doing so.  As
      a result, if a hub is plugged in and quickly unplugged before the work
      routine can run, the routine will try to access memory that has been
      deallocated.  Or, if the hub is unplugged while the routine is
      running, the memory may be deallocated while it is in active use.
      
      This patch fixes the problem by taking a reference to the usb_hub at
      the start of hub_activate() and releasing it at the end (when the work
      is finished), and by locking the hub interface while the work routine
      is running.  It also adds a check at the start of the routine to see
      if the hub has already been disconnected, in which nothing should be
      done.
      
      CVE-2015-8816
      Signed-off-by: default avatarAlan Stern <stern@rowland.harvard.edu>
      Reported-by: default avatarAlexandru Cornea <alexandru.cornea@intel.com>
      Tested-by: default avatarAlexandru Cornea <alexandru.cornea@intel.com>
      Fixes: 8520f380 ("USB: change hub initialization sleeps to delayed_work")
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      [ luis: backported to 3.16:
        - Added forward declaration of hub_release() which mainline had with commit
          32a69589 ("usb: hub: convert khubd into workqueue") ]
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      Signed-off-by: default avatarCharles (Chas) Williams <ciwillia@brocade.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      bbb09420
    • Eric Dumazet's avatar
      udp: properly support MSG_PEEK with truncated buffers · 98f57e42
      Eric Dumazet authored
      commit 197c949e upstream.
      
      Backport of this upstream commit into stable kernels :
      89c22d8c ("net: Fix skb csum races when peeking")
      exposed a bug in udp stack vs MSG_PEEK support, when user provides
      a buffer smaller than skb payload.
      
      In this case,
      skb_copy_and_csum_datagram_iovec(skb, sizeof(struct udphdr),
                                       msg->msg_iov);
      returns -EFAULT.
      
      This bug does not happen in upstream kernels since Al Viro did a great
      job to replace this into :
      skb_copy_and_csum_datagram_msg(skb, sizeof(struct udphdr), msg);
      This variant is safe vs short buffers.
      
      For the time being, instead reverting Herbert Xu patch and add back
      skb->ip_summed invalid changes, simply store the result of
      udp_lib_checksum_complete() so that we avoid computing the checksum a
      second time, and avoid the problematic
      skb_copy_and_csum_datagram_iovec() call.
      
      This patch can be applied on recent kernels as it avoids a double
      checksumming, then backported to stable kernels as a bug fix.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      [ luis: backported to 3.16: adjusted context ]
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      Signed-off-by: default avatarCharles (Chas) Williams <ciwillia@brocade.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      98f57e42
    • Neil Horman's avatar
      PCI/ACPI: Fix _OSC ordering to allow PCIe hotplug use when available · 6aaf5d4c
      Neil Horman authored
      commit 3dc48af3 upstream.
      
      This fixes the problem of acpiphp claiming slots that should be managed
      by pciehp, which may keep ExpressCard slots from working.
      
      The acpiphp driver claims PCIe slots unless the BIOS has granted us
      control of PCIe native hotplug via _OSC.  Prior to v3.10, the acpiphp
      .add method (add_bridge()) was always called *after* we had requested
      native hotplug control with _OSC.
      
      But after 3b63aaa7 ("PCI: acpiphp: Do not use ACPI PCI subdriver
      mechanism"), which appeared in v3.10, acpiphp initialization is done
      during the bus scan via the pcibios_add_bus() hook, and this happens
      *before* we request native hotplug control.
      
      Therefore, acpiphp doesn't know yet whether the BIOS will grant control,
      and it claims slots that we should be handling with native hotplug.
      
      This patch requests native hotplug control earlier, so we know whether
      the BIOS granted it to us before we initialize acpiphp.
      
      To avoid reintroducing the ASPM issue fixed by b8178f13 ('Revert
      "PCI/ACPI: Request _OSC control before scanning PCI root bus"'), we run
      _OSC earlier but defer the actual ASPM calls until after the bus scan is
      complete.
      
      Tested successfully by myself.
      
      [bhelgaas: changelog, mark for stable]
      Reference: https://bugzilla.kernel.org/show_bug.cgi?id=60736Signed-off-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      Acked-by: default avatarYinghai Lu <yinghai@kernel.org>
      CC: stable@vger.kernel.org	# v3.10+
      CC: Len Brown <lenb@kernel.org>
      CC: "Rafael J. Wysocki" <rjw@sisk.pl>
      [ciwillia@brocade.com: backported to 3.10: adjusted context]
      Signed-off-by: default avatarCharles (Chas) Williams <ciwillia@brocade.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      6aaf5d4c
    • Vladimir Davydov's avatar
      signal: remove warning about using SI_TKILL in rt_[tg]sigqueueinfo · 15db6bf1
      Vladimir Davydov authored
      commit 69828dce upstream.
      
      Sending SI_TKILL from rt_[tg]sigqueueinfo was deprecated, so now we issue
      a warning on the first attempt of doing it.  We use WARN_ON_ONCE, which is
      not informative and, what is worse, taints the kernel, making the trinity
      syscall fuzzer complain false-positively from time to time.
      
      It does not look like we need this warning at all, because the behaviour
      changed quite a long time ago (2.6.39), and if an application relies on
      the old API, it gets EPERM anyway and can issue a warning by itself.
      
      So let us zap the warning in kernel.
      Signed-off-by: default avatarVladimir Davydov <vdavydov@parallels.com>
      Acked-by: default avatarOleg Nesterov <oleg@redhat.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Vinson Lee <vlee@freedesktop.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      15db6bf1
    • Andrey Ryabinin's avatar
      perf/x86: Fix undefined shift on 32-bit kernels · 55e1f395
      Andrey Ryabinin authored
      commit 6d6f2833 upstream.
      
      Jim reported:
      
      	UBSAN: Undefined behaviour in arch/x86/events/intel/core.c:3708:12
      	shift exponent 35 is too large for 32-bit type 'long unsigned int'
      
      The use of 'unsigned long' type obviously is not correct here, make it
      'unsigned long long' instead.
      Reported-by: default avatarJim Cromie <jim.cromie@gmail.com>
      Signed-off-by: default avatarAndrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <stable@vger.kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Imre Palik <imrep@amazon.de>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Fixes: 2c33645d ("perf/x86: Honor the architectural performance monitoring version")
      Link: http://lkml.kernel.org/r/1462974711-10037-1-git-send-email-aryabinin@virtuozzo.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Cc: Kevin Christopher <kevinc@vmware.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      55e1f395
    • Palik, Imre's avatar
      perf/x86: Honor the architectural performance monitoring version · ad0fd1a5
      Palik, Imre authored
      commit 2c33645d upstream.
      
      Architectural performance monitoring, version 1, doesn't support fixed counters.
      
      Currently, even if a hypervisor advertises support for architectural
      performance monitoring version 1, perf may still try to use the fixed
      counters, as the constraints are set up based on the CPU model.
      
      This patch ensures that perf honors the architectural performance monitoring
      version returned by CPUID, and it only uses the fixed counters for version 2
      and above.
      
      (Some of the ideas in this patch came from Peter Zijlstra.)
      Signed-off-by: default avatarImre Palik <imrep@amazon.de>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Anthony Liguori <aliguori@amazon.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1433767609-1039-1-git-send-email-imrep.amz@gmail.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      [wt: FIXED_EVENT_FLAGS was X86_RAW_EVENT_MASK in 3.10]
      Cc: Kevin Christopher <kevinc@vmware.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      ad0fd1a5
    • Florian Westphal's avatar
      netfilter: x_tables: introduce and use xt_copy_counters_from_user · 0a052338
      Florian Westphal authored
      commit 63ecb81a upstream.
      
      commit d7591f0c upstream
      
      The three variants use same copy&pasted code, condense this into a
      helper and use that.
      
      Make sure info.name is 0-terminated.
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      0a052338
    • Bernhard Thaler's avatar
      Revert "netfilter: ensure number of counters is >0 in do_replace()" · eb4a7d67
      Bernhard Thaler authored
      commit d26e2c9f upstream.
      
      This partially reverts commit 1086bbe9 ("netfilter: ensure number of
      counters is >0 in do_replace()") in net/bridge/netfilter/ebtables.c.
      
      Setting rules with ebtables does not work any more with 1086bbe9 place.
      
      There is an error message and no rules set in the end.
      
      e.g.
      
      ~# ebtables -t nat -A POSTROUTING --src 12:34:56:78:9a:bc -j DROP
      Unable to update the kernel. Two possible causes:
      1. Multiple ebtables programs were executing simultaneously. The ebtables
         userspace tool doesn't by default support multiple ebtables programs
      running
      
      Reverting the ebtables part of 1086bbe9 makes this work again.
      Signed-off-by: default avatarBernhard Thaler <bernhard.thaler@wvnet.at>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      eb4a7d67
    • Florian Westphal's avatar
      netfilter: x_tables: do compat validation via translate_table · 151cc2f5
      Florian Westphal authored
      commit 09d96860 upstream.
      
      This looks like refactoring, but its also a bug fix.
      
      Problem is that the compat path (32bit iptables, 64bit kernel) lacks a few
      sanity tests that are done in the normal path.
      
      For example, we do not check for underflows and the base chain policies.
      
      While its possible to also add such checks to the compat path, its more
      copy&pastry, for instance we cannot reuse check_underflow() helper as
      e->target_offset differs in the compat case.
      
      Other problem is that it makes auditing for validation errors harder; two
      places need to be checked and kept in sync.
      
      At a high level 32 bit compat works like this:
      1- initial pass over blob:
         validate match/entry offsets, bounds checking
         lookup all matches and targets
         do bookkeeping wrt. size delta of 32/64bit structures
         assign match/target.u.kernel pointer (points at kernel
         implementation, needed to access ->compatsize etc.)
      
      2- allocate memory according to the total bookkeeping size to
         contain the translated ruleset
      
      3- second pass over original blob:
         for each entry, copy the 32bit representation to the newly allocated
         memory.  This also does any special match translations (e.g.
         adjust 32bit to 64bit longs, etc).
      
      4- check if ruleset is free of loops (chase all jumps)
      
      5-first pass over translated blob:
         call the checkentry function of all matches and targets.
      
      The alternative implemented by this patch is to drop steps 3&4 from the
      compat process, the translation is changed into an intermediate step
      rather than a full 1:1 translate_table replacement.
      
      In the 2nd pass (step #3), change the 64bit ruleset back to a kernel
      representation, i.e. put() the kernel pointer and restore ->u.user.name .
      
      This gets us a 64bit ruleset that is in the format generated by a 64bit
      iptables userspace -- we can then use translate_table() to get the
      'native' sanity checks.
      
      This has two drawbacks:
      
      1. we re-validate all the match and target entry structure sizes even
      though compat translation is supposed to never generate bogus offsets.
      2. we put and then re-lookup each match and target.
      
      THe upside is that we get all sanity tests and ruleset validations
      provided by the normal path and can remove some duplicated compat code.
      
      iptables-restore time of autogenerated ruleset with 300k chains of form
      -A CHAIN0001 -m limit --limit 1/s -j CHAIN0002
      -A CHAIN0002 -m limit --limit 1/s -j CHAIN0003
      
      shows no noticeable differences in restore times:
      old:   0m30.796s
      new:   0m31.521s
      64bit: 0m25.674s
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      151cc2f5
    • Dave Jones's avatar
      netfilter: ensure number of counters is >0 in do_replace() · bb560593
      Dave Jones authored
      commit 1086bbe9 upstream.
      
      After improving setsockopt() coverage in trinity, I started triggering
      vmalloc failures pretty reliably from this code path:
      
      warn_alloc_failed+0xe9/0x140
      __vmalloc_node_range+0x1be/0x270
      vzalloc+0x4b/0x50
      __do_replace+0x52/0x260 [ip_tables]
      do_ipt_set_ctl+0x15d/0x1d0 [ip_tables]
      nf_setsockopt+0x65/0x90
      ip_setsockopt+0x61/0xa0
      raw_setsockopt+0x16/0x60
      sock_common_setsockopt+0x14/0x20
      SyS_setsockopt+0x71/0xd0
      
      It turns out we don't validate that the num_counters field in the
      struct we pass in from userspace is initialized.
      
      The same problem also exists in ebtables, arptables, ipv6, and the
      compat variants.
      Signed-off-by: default avatarDave Jones <davej@codemonkey.org.uk>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      bb560593
    • Florian Westphal's avatar
      netfilter: x_tables: xt_compat_match_from_user doesn't need a retval · fbe426f8
      Florian Westphal authored
      commit 0188346f upstream.
      
      Always returned 0.
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      fbe426f8