1. 18 Jan, 2018 1 commit
    • irq/matrix: Spread interrupts on allocation · a0c9259d
      Thomas Gleixner authored
      Keith reported an issue with vector space exhaustion on a server machine
      which is caused by the i40e driver allocating 168 MSI interrupts when the
      driver is initialized, even when most of these interrupts are not used at
      all.
      
      The x86 vector allocation code tries to avoid the immediate allocation with
      the reservation mode, but the card uses MSI and does not support MSI entry
      masking, which prevents reservation mode and requires immediate vector
      allocation.
      
      The matrix allocator is a bit naive and prefers the first CPU in the
      cpumask which describes the possible target CPUs for an allocation. That
      results in allocating all 168 vectors on CPU0 which later causes vector
      space exhaustion when the NVMe driver tries to allocate managed interrupts
      on each CPU for the per CPU queues.
      
      Avoid this by finding the CPU which has the lowest vector allocation count
      to spread out the non-managed interrupts across the possible target CPUs.
      
      Fixes: 2f75d9e1 ("genirq: Implement bitmap matrix allocator")
      Reported-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Keith Busch <keith.busch@intel.com>
      Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801171557330.1777@nanos
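      The selection policy described in the commit above can be illustrated with a
      minimal sketch. It is not the kernel's matrix allocator: the struct, its
      fields and the function name are hypothetical, and only the idea of picking
      the candidate CPU with the lowest allocation count is taken from the commit
      message.

```c
#include <stddef.h>

/* Hypothetical per-CPU bookkeeping; the real allocator tracks more state. */
struct cpu_vectors {
	unsigned int allocated;	/* vectors currently allocated on this CPU */
	unsigned int available;	/* vectors still free on this CPU */
	int in_mask;		/* nonzero if the CPU is a possible target */
};

/*
 * Return the index of the candidate CPU with the fewest allocated vectors,
 * or -1 if no candidate CPU has a free vector left.
 */
int pick_least_loaded_cpu(const struct cpu_vectors *cpus, size_t ncpus)
{
	int best = -1;

	for (size_t i = 0; i < ncpus; i++) {
		if (!cpus[i].in_mask || cpus[i].available == 0)
			continue;
		if (best < 0 || cpus[i].allocated < cpus[best].allocated)
			best = (int)i;
	}
	return best;
}
```

      With a policy like this, a driver that requests many vectors up front no
      longer fills the first CPU in the mask before any other CPU is considered.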
  2. 17 Jan, 2018 11 commits
    • Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 1d966eb4
      Linus Torvalds authored
      Pull x86 fixes from Ingo Molnar:
       "Misc fixes:
      
         - A rather involved set of memory hardware encryption fixes to
           support the early loading of microcode files via the initrd. These
           are larger than what we normally take at such a late -rc stage, but
           there are two mitigating factors: 1) much of the changes are
           limited to the SME code itself 2) being able to early load
           microcode has increased importance in the post-Meltdown/Spectre
           era.
      
         - An IRQ vector allocator fix
      
         - An Intel RDT driver use-after-free fix
      
         - An APIC driver bug fix/revert to make certain older systems boot
           again
      
         - A pkeys ABI fix
      
         - TSC calibration fixes
      
         - A kdump fix"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/apic/vector: Fix off by one in error path
        x86/intel_rdt/cqm: Prevent use after free
        x86/mm: Encrypt the initrd earlier for BSP microcode update
        x86/mm: Prepare sme_encrypt_kernel() for PAGE aligned encryption
        x86/mm: Centralize PMD flags in sme_encrypt_kernel()
        x86/mm: Use a struct to reduce parameters for SME PGD mapping
        x86/mm: Clean up register saving in the __enc_copy() assembly code
        x86/idt: Mark IDT tables __initconst
        Revert "x86/apic: Remove init_bsp_APIC()"
        x86/mm/pkeys: Fix fill_sig_info_pkey
        x86/tsc: Print tsc_khz, when it differs from cpu_khz
        x86/tsc: Fix erroneous TSC rate on Skylake Xeon
        x86/tsc: Future-proof native_calibrate_tsc()
        kdump: Write the correct address of mem_section into vmcoreinfo
    • Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 9a4ba2ab
      Linus Torvalds authored
      Pull scheduler fix from Ingo Molnar:
       "A delayacct statistics correctness fix"
      
      * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        delayacct: Account blkio completion on the correct task
    • Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 7dfda84d
      Linus Torvalds authored
      Pull x86 perf fix from Ingo Molnar:
       "An Intel RAPL events fix"
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf/x86/rapl: Fix Haswell and Broadwell server RAPL event
    • Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · b8c22594
      Linus Torvalds authored
      Pull locking fixes from Ingo Molnar:
       "Two futex fixes: an input parameter robustness fix, and futex race
        fixes"
      
      * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        futex: Prevent overflow by strengthen input validation
        futex: Avoid violating the 10th rule of futex
    • Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 88dc7fca
      Linus Torvalds authored
      Pull x86 pti bits and fixes from Thomas Gleixner:
       "This last update contains:
      
         - An objtool fix to prevent a segfault with the gold linker by
           changing the invocation order. That's not just for gold, it's a
           general robustness improvement.
      
         - An improved error message for objtool which spares tearing hairs.
      
         - Make KASAN fail loudly if there is not enough memory instead of
           oopsing at some random place later
      
         - RSB fill on context switch to prevent RSB underflow and speculation
           through other units.
      
         - Make the retpoline/RSB functionality work reliably for both Intel
           and AMD
      
         - Add retpoline to the module version magic so mismatch can be
           detected
      
         - A small (non-fix) update for cpufeatures which prevents cpu feature
           clashing for the upcoming extra mitigation bits to ease
           backporting"
      
      * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        module: Add retpoline tag to VERMAGIC
        x86/cpufeature: Move processor tracing out of scattered features
        objtool: Improve error message for bad file argument
        objtool: Fix seg fault with gold linker
        x86/retpoline: Add LFENCE to the retpoline/RSB filling RSB macros
        x86/retpoline: Fill RSB on context switch for affected CPUs
        x86/kasan: Panic if there is not enough memory to boot
    • Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · dd43f346
      Linus Torvalds authored
      Pull timer fix from Thomas Gleixner:
       "A one-liner fix which prevents deferrable timers becoming stale when
        the system does not switch into NOHZ mode"
      
      * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        timers: Unconditionally check deferrable base
    • x86/apic/vector: Fix off by one in error path · 45d55e7b
      Thomas Gleixner authored
      Keith reported the following warning:
      
      WARNING: CPU: 28 PID: 1420 at kernel/irq/matrix.c:222 irq_matrix_remove_managed+0x10f/0x120
        x86_vector_free_irqs+0xa1/0x180
        x86_vector_alloc_irqs+0x1e4/0x3a0
        msi_domain_alloc+0x62/0x130
      
      The reason for this is that if the vector allocation fails the error
      handling code tries to free the failed vector as well, which causes the
      above imbalance warning to trigger.
      
      Adjust the error path to handle this correctly.
      
      Fixes: b5dc8e6c ("x86/irq: Use hierarchical irqdomain to manage CPU interrupt vectors")
      Reported-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Keith Busch <keith.busch@intel.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801161217300.1823@nanos
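      The error-path rule from the commit above (never free the entry whose
      allocation failed) can be sketched as follows. This is an illustration
      only, not the x86 vector code: the slot array and all function names are
      hypothetical, and the bad_frees counter merely stands in for the imbalance
      warning quoted in the commit message.

```c
#include <stddef.h>

enum { SLOT_FREE = 0, SLOT_USED = 1 };

/* Hypothetical allocator: claiming any slot at index >= limit fails. */
static int slot_alloc(int *slots, size_t idx, size_t limit)
{
	if (idx >= limit)
		return -1;
	slots[idx] = SLOT_USED;
	return 0;
}

/* Returns -1 when asked to free a slot that was never allocated. */
static int slot_free(int *slots, size_t idx)
{
	if (slots[idx] != SLOT_USED)
		return -1;
	slots[idx] = SLOT_FREE;
	return 0;
}

/*
 * Allocate slots 0..n-1. On failure at index i, unwind only indices
 * 0..i-1; *bad_frees counts attempts to free an unallocated slot.
 */
int alloc_range(int *slots, size_t n, size_t limit, int *bad_frees)
{
	for (size_t i = 0; i < n; i++) {
		if (slot_alloc(slots, i, limit) < 0) {
			for (size_t j = 0; j < i; j++)
				if (slot_free(slots, j) < 0)
					(*bad_frees)++;
			return -1;
		}
	}
	return 0;
}
```

      Writing the unwind loop as `j <= i` instead of `j < i` would also free the
      failed slot, which is the kind of off-by-one the commit title refers to.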
    • x86/intel_rdt/cqm: Prevent use after free · d4792441
      Thomas Gleixner authored
      intel_rdt_offline_cpu() -> domain_remove_cpu() frees memory first and then
      proceeds to access it.
      
       BUG: KASAN: use-after-free in find_first_bit+0x1f/0x80
       Read of size 8 at addr ffff883ff7c1e780 by task cpuhp/31/195
       find_first_bit+0x1f/0x80
       has_busy_rmid+0x47/0x70
       intel_rdt_offline_cpu+0x4b4/0x510
      
       Freed by task 195:
       kfree+0x94/0x1a0
       intel_rdt_offline_cpu+0x17d/0x510
      
      Do the teardown first and then free memory.
      
      Fixes: 24247aee ("x86/intel_rdt/cqm: Improve limbo list processing")
      Reported-by: Joseph Salisbury <joseph.salisbury@canonical.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ravi Shankar <ravi.v.shankar@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: "Roderick W. Smith" <rod.smith@canonical.com>
      Cc: 1733662@bugs.launchpad.net
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801161957510.2366@nanos
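      The ordering fix in the commit above ("do the teardown first and then free
      memory") can be sketched in isolation. The struct and functions below are
      hypothetical stand-ins, not the intel_rdt code; they only show a teardown
      step that reads a buffer before that buffer is released.

```c
#include <stdlib.h>

/* Hypothetical domain object that owns a heap-allocated bitmap. */
struct demo_domain {
	unsigned long *rmid_busy;
	int limbo_work_pending;
};

struct demo_domain *demo_domain_create(void)
{
	struct demo_domain *d = calloc(1, sizeof(*d));

	if (!d)
		return NULL;
	d->rmid_busy = calloc(1, sizeof(*d->rmid_busy));
	if (!d->rmid_busy) {
		free(d);
		return NULL;
	}
	return d;
}

/* Returns the bitmap word observed during teardown. */
unsigned long demo_domain_remove(struct demo_domain *d)
{
	/* Teardown first: this step still needs to read the bitmap. */
	unsigned long busy = d->rmid_busy[0];

	d->limbo_work_pending = 0;

	/* Release memory only after nothing reads it any more. */
	free(d->rmid_busy);
	free(d);
	return busy;
}
```

      Swapping the two halves of demo_domain_remove() would reproduce the
      pattern that KASAN flagged: a read from memory that was already freed.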
    • module: Add retpoline tag to VERMAGIC · 6cfb521a
      Andi Kleen authored
      Add a marker for retpoline to the module VERMAGIC. This catches the case
      when a non RETPOLINE compiled module gets loaded into a retpoline kernel,
      making it insecure.
      
      It doesn't handle the case when retpoline has been runtime disabled. Even
      in this case the match of the retpoline compile status will be enforced.
      This implies that even with retpoline runtime disabled, all loaded modules
      need to be recompiled.
      
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Acked-by: David Woodhouse <dwmw@amazon.co.uk>
      Cc: rusty@rustcorp.com.au
      Cc: arjan.van.de.ven@intel.com
      Cc: jeyu@kernel.org
      Cc: torvalds@linux-foundation.org
      Link: https://lkml.kernel.org/r/20180116205228.4890-1-andi@firstfloor.org
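      How a version-magic tag catches a mismatched module can be sketched with
      plain string comparison. The base string, macro names and functions below
      are hypothetical and only illustrate the idea from the commit above; the
      kernel's actual VERMAGIC handling is not reproduced here.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical base string; a real vermagic carries more fields. */
#define DEMO_BASE_VERMAGIC "4.15.0 SMP mod_unload "
#define DEMO_RETPOLINE_TAG "retpoline "

/* Compose a vermagic string, appending the tag for retpoline builds. */
void demo_build_vermagic(char *buf, size_t len, int retpoline)
{
	snprintf(buf, len, "%s%s", DEMO_BASE_VERMAGIC,
		 retpoline ? DEMO_RETPOLINE_TAG : "");
}

/* A module is accepted only if its string equals the kernel's exactly. */
int demo_vermagic_matches(const char *kernel, const char *module)
{
	return strcmp(kernel, module) == 0;
}
```

      Because the comparison is made on strings fixed at build time, it cannot
      see whether retpoline was later disabled at runtime, which matches the
      limitation the commit message points out.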
    • x86/cpufeature: Move processor tracing out of scattered features · 4fdec203
      Paolo Bonzini authored
      Processor tracing is already enumerated in word 9 (CPUID[7,0].EBX),
      so do not duplicate it in the scattered features word.
      
      Besides being more tidy, this will be useful for KVM when it presents
      processor tracing to the guests.  KVM selects host features that are
      supported by both the host kernel (depending on command line options,
      CPU errata, or whatever) and KVM.  Whenever a full feature word exists,
      KVM's code is written in the expectation that the CPUID bit number
      matches the X86_FEATURE_* bit number, but this is not the case for
      X86_FEATURE_INTEL_PT.
      
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luwei Kang <luwei.kang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: kvm@vger.kernel.org
      Link: http://lkml.kernel.org/r/1516117345-34561-1-git-send-email-pbonzini@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
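      The expectation mentioned in the commit above, that a feature's bit number
      within a full feature word equals its CPUID bit number, can be sketched
      with a word*32+bit encoding. The macro and function names are hypothetical;
      word 9 comes from the commit message, and bit 25 as the processor tracing
      bit in CPUID[7,0].EBX is an assumption not stated in it.

```c
#include <stdint.h>

/* Encode a feature as (word index * 32 + bit within that word). */
#define DEMO_FEATURE(word, bit)	((word) * 32u + (bit))

/* Assumption: processor tracing is bit 25 of CPUID[7,0].EBX (word 9). */
#define DEMO_FEATURE_INTEL_PT	DEMO_FEATURE(9u, 25u)

unsigned int demo_feature_word(unsigned int feature)
{
	return feature / 32u;
}

unsigned int demo_feature_bit(unsigned int feature)
{
	return feature % 32u;
}

/*
 * Test a raw CPUID register value directly with the feature's bit number.
 * This only works when the feature lives in a full word that mirrors the
 * register, which is not true for a scattered feature word.
 */
int demo_cpuid_reg_has(uint32_t reg, unsigned int feature)
{
	return (reg >> demo_feature_bit(feature)) & 1u;
}
```

      A feature kept in a scattered word gets an arbitrary bit position, so a
      direct test like demo_cpuid_reg_has() would check the wrong CPUID bit.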
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma · 8cbab92d
      Linus Torvalds authored
      Pull rdma fixes from Doug Ledford:
       "We had a few more items creep up over the last week. Given we are in
        -rc8, these are obviously limited to bugs that have a big downside and
        for which we are certain of the fix.
      
        The first is a straight up oops bug that all you have to do is read
        the code to see it's a guaranteed 100% oops bug.
      
        The second is a use-after-free issue. We get away lucky if the queue
        we are shutting down is empty, but if it isn't, we can end up oopsing.
        We really need to drain the queue before destroying it.
      
        The final one is an issue with bad user input causing us to access our
        port array out of bounds. While fixing the array out of bounds issue,
        it was noticed that the original code did the same thing twice (the
        call to rdma_ah_set_port_num()), so its removal is not balanced by a
        readd elsewhere, it was already where it needed to be in addition to
        where it didn't need to be.
      
        Summary:
      
         - Oops fix in hfi1 driver
      
         - use-after-free issue in iser-target
      
         - use of user supplied array index without proper checking"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
        RDMA/mlx5: Fix out-of-bound access while querying AH
        IB/hfi1: Prevent a NULL dereference
        iser-target: Fix possible use-after-free in connection establishment error
  3. 16 Jan, 2018 20 commits
  4. 15 Jan, 2018 8 commits
    • RDMA/mlx5: Fix out-of-bound access while querying AH · ae59c3f0
      Leon Romanovsky authored
      rdma_ah_find_type() accesses the port array based on an index
      controlled by userspace. The existing bounds check comes after the first
      use of the index, so userspace can generate an out-of-bounds access, as
      shown by the KASAN report below.
      
      ==================================================================
      BUG: KASAN: slab-out-of-bounds in to_rdma_ah_attr+0xa8/0x3b0
      Read of size 4 at addr ffff880019ae2268 by task ibv_rc_pingpong/409
      
      CPU: 0 PID: 409 Comm: ibv_rc_pingpong Not tainted 4.15.0-rc2-00031-gb60a3faf5b83-dirty #3
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
      Call Trace:
       dump_stack+0xe9/0x18f
       print_address_description+0xa2/0x350
       kasan_report+0x3a5/0x400
       to_rdma_ah_attr+0xa8/0x3b0
       mlx5_ib_query_qp+0xd35/0x1330
       ib_query_qp+0x8a/0xb0
       ib_uverbs_query_qp+0x237/0x7f0
       ib_uverbs_write+0x617/0xd80
       __vfs_write+0xf7/0x500
       vfs_write+0x149/0x310
       SyS_write+0xca/0x190
       entry_SYSCALL_64_fastpath+0x18/0x85
      RIP: 0033:0x7fe9c7a275a0
      RSP: 002b:00007ffee5498738 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 00007fe9c7ce4b00 RCX: 00007fe9c7a275a0
      RDX: 0000000000000018 RSI: 00007ffee5498800 RDI: 0000000000000003
      RBP: 000055d0c8d3f010 R08: 00007ffee5498800 R09: 0000000000000018
      R10: 00000000000000ba R11: 0000000000000246 R12: 0000000000008000
      R13: 0000000000004fb0 R14: 000055d0c8d3f050 R15: 00007ffee5498560
      
      Allocated by task 1:
       __kmalloc+0x3f9/0x430
       alloc_mad_private+0x25/0x50
       ib_mad_post_receive_mads+0x204/0xa60
       ib_mad_init_device+0xa59/0x1020
       ib_register_device+0x83a/0xbc0
       mlx5_ib_add+0x50e/0x5c0
       mlx5_add_device+0x142/0x410
       mlx5_register_interface+0x18f/0x210
       mlx5_ib_init+0x56/0x63
       do_one_initcall+0x15b/0x270
       kernel_init_freeable+0x2d8/0x3d0
       kernel_init+0x14/0x190
       ret_from_fork+0x24/0x30
      
      Freed by task 0:
      (stack is not available)
      
      The buggy address belongs to the object at ffff880019ae2000
       which belongs to the cache kmalloc-512 of size 512
      The buggy address is located 104 bytes to the right of
       512-byte region [ffff880019ae2000, ffff880019ae2200)
      The buggy address belongs to the page:
      page:000000005d674e18 count:1 mapcount:0 mapping:          (null) index:0x0 compound_mapcount: 0
      flags: 0x4000000000008100(slab|head)
      raw: 4000000000008100 0000000000000000 0000000000000000 00000001000c000c
      raw: dead000000000100 dead000000000200 ffff88001a402000 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff880019ae2100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
       ffff880019ae2180: 00 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc
      >ffff880019ae2200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
                                                                ^
       ffff880019ae2280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
       ffff880019ae2300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      ==================================================================
      Disabling lock debugging due to kernel taint
      
      Cc: <stable@vger.kernel.org>
      Fixes: 44c58487 ("IB/core: Define 'ib' and 'roce' rdma_ah_attr types")
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
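      The ordering problem described in the commit above (the bounds check ran
      after the first use of the index) is easiest to see in a check-then-use
      sketch. The array, its size, its contents and the function name are
      hypothetical and are not taken from the mlx5 driver.

```c
#define DEMO_NUM_PORTS 2u

/* Hypothetical per-port attribute table, indexed by (port number - 1). */
static const int demo_port_type[DEMO_NUM_PORTS] = { 1, 2 };

/*
 * Look up the type for a 1-based, user-supplied port number.
 * The range check happens before the array is touched, so an invalid
 * value yields -1 instead of an out-of-bounds read.
 */
int demo_port_type_checked(unsigned int port_num)
{
	if (port_num == 0 || port_num > DEMO_NUM_PORTS)
		return -1;
	return demo_port_type[port_num - 1];
}
```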
    • netlink: extack: avoid parenthesized string constant warning · 6311b7ce
      Johannes Berg authored
      NL_SET_ERR_MSG() and NL_SET_ERR_MSG_ATTR() lead to the following warning
      in newer versions of gcc:
        warning: array initialized from parenthesized string constant
      
      Just remove the parentheses; they're not needed in this context anyway,
      since there can be no operator precedence issues or similar.
      
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
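      The warning in the commit above comes from initializing a char array with a
      string literal wrapped in parentheses. A minimal sketch of a macro written
      without the parentheses follows; the macro and function names are
      hypothetical and this is not the netlink header itself.

```c
#include <stddef.h>

/*
 * Store a message in a static array and point dst at it.
 * Writing "= (msg)" here instead of "= msg" is what triggers
 * "array initialized from parenthesized string constant".
 */
#define DEMO_SET_ERR_MSG(dst, msg) do {		\
	static const char msg_[] = msg;		\
	(dst) = msg_;				\
} while (0)

/* Example caller that reports a fixed error string. */
const char *demo_report_bad_attr(void)
{
	const char *err = NULL;

	DEMO_SET_ERR_MSG(err, "bad attribute");
	return err;
}
```

      The argument is only ever used as an array initializer, so there is no
      expression context in which missing parentheses could change its meaning.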
    • Merge branch 'ipv4-Make-neigh-lookup-keys-for-loopback-point-to-point-devices-be-INADDR_ANY' · db9ca5ca
      David S. Miller authored
      Jim Westfall says:
      
      ====================
      ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY
      
      This used to be the previous behavior in older kernels but became broken in
      a263b309 (ipv4: Make neigh lookups directly in output packet path)
      and then later removed because it was broken in 0bb4087c (ipv4: Fix neigh
      lookup keying over loopback/point-to-point devices)
      
      Not having this results in an arp entry for every remote ip address that
      the device talks to.  On a fairly active device this can make the arp
      table huge and/or force large numbers of entries to be added and purged
      to keep within the table size thresholds.
      
      $ ip -4 neigh show nud noarp | grep tun | wc -l
      55850
      
      $ lnstat -k arp_cache:entries,arp_cache:allocs,arp_cache:destroys -c 10
      arp_cach|arp_cach|arp_cach|
       entries|  allocs|destroys|
         81493|620166816|620126069|
        101867|   10186|       0|
        113854|    5993|       0|
        118773|    2459|       0|
         27937|   18579|   63998|
         39256|    5659|       0|
         56231|    8487|       0|
         65602|    4685|       0|
         79697|    7047|       0|
         90733|    5517|       0|
      
      v2:
       - fixes coding style issues
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY · cd9ff4de
      Jim Westfall authored
      Map all lookup neigh keys to INADDR_ANY for loopback/point-to-point devices
      to avoid making an entry for every remote ip the device needs to talk to.
      
      This used to be the old behavior but became broken in a263b309
      (ipv4: Make neigh lookups directly in output packet path) and was later
      removed in 0bb4087c (ipv4: Fix neigh lookup keying over
      loopback/point-to-point devices) because it was broken.
      
      Signed-off-by: Jim Westfall <jwestfall@surrealistic.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
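      The key mapping described in the commit above can be sketched as a small
      pure function. The flag constants, their values and the function name are
      hypothetical placeholders, not the kernel's definitions; only the rule
      "loopback and point-to-point devices share one key" comes from the commit
      message.

```c
#include <stdint.h>

/* Hypothetical device flags and wildcard address for this sketch. */
#define DEMO_IFF_LOOPBACK	0x08u
#define DEMO_IFF_POINTOPOINT	0x10u
#define DEMO_INADDR_ANY		0u

/*
 * Choose the neighbour lookup key for a next hop on a given device.
 * Loopback and point-to-point devices map every next hop to the same
 * wildcard key, so they share a single neighbour entry.
 */
uint32_t demo_neigh_key(unsigned int dev_flags, uint32_t next_hop)
{
	if (dev_flags & (DEMO_IFF_LOOPBACK | DEMO_IFF_POINTOPOINT))
		return DEMO_INADDR_ANY;
	return next_hop;
}
```

      With one shared key per such device, the table no longer grows with the
      number of remote addresses the device talks to.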
    • net: Allow neigh contructor functions ability to modify the primary_key · 096b9854
      Jim Westfall authored
      Use n->primary_key instead of pkey to account for the possibility that a neigh
      constructor function may have modified the primary_key value.
      
      Signed-off-by: Jim Westfall <jwestfall@surrealistic.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sh_eth: fix dumping ARSTR · 17d0fb0c
      Sergei Shtylyov authored
      ARSTR is always located at the start of the TSU register region, thus
      using add_reg() instead of add_tsu_reg() in __sh_eth_get_regs() to dump it
      causes EDMR or EDSR (depending on the register layout) to be dumped instead
      of ARSTR.  Use the correct condition/macro there...
      
      Fixes: 6b4b4fea ("sh_eth: Implement ethtool register dump operations")
      Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Revert "openvswitch: Add erspan tunnel support." · 95a33208
      William Tu authored
      This reverts commit ceaa001a.
      
      The OVS_TUNNEL_KEY_ATTR_ERSPAN_OPTS attr should be designed
      as a nested attribute to support all of the ERSPAN v1 and v2 fields.
      The current attr is a be32 supporting only one field.  Thus, this
      patch reverts it and a later patch will redo it using a nested attr.
      
      Signed-off-by: William Tu <u9012063@gmail.com>
      Cc: Jiri Benc <jbenc@redhat.com>
      Cc: Pravin Shelar <pshelar@ovn.org>
      Acked-by: Jiri Benc <jbenc@redhat.com>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/tls: Fix inverted error codes to avoid endless loop · 30be8f8d
      r.hering@avm.de authored
      sendfile() calls can hang endlessly when using kernel TLS if a socket
      error occurs. Socket error codes must be negated by kernel TLS before
      being returned because they are stored with a positive sign. If returned
      without negation they are interpreted as the number of bytes sent, causing
      the splice mechanism behind sendfile() to loop endlessly.
      
      Signed-off-by: Robert Hering <r.hering@avm.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
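      Why the sign of the returned error matters in the commit above can be
      shown with a small send loop. The socket struct and both functions are
      hypothetical and do not reproduce the kernel TLS code; the only detail
      taken from the commit message is that the stored error is positive and
      must be negated before it is returned.

```c
/* Hypothetical socket: a pending error is stored as a positive number. */
struct demo_sock {
	int sk_err;
};

/*
 * Send up to len bytes. On a pending socket error return it negated,
 * so the caller cannot mistake it for a positive byte count.
 */
long demo_send_chunk(const struct demo_sock *sk, long len)
{
	if (sk->sk_err)
		return -(long)sk->sk_err;
	return len;
}

/*
 * Caller loop in the style of a splice/sendfile helper: keep sending
 * until total bytes are out, and stop on the first negative return.
 * chunk must be positive.
 */
long demo_send_all(const struct demo_sock *sk, long total, long chunk)
{
	long sent = 0;

	while (sent < total) {
		long want = total - sent < chunk ? total - sent : chunk;
		long n = demo_send_chunk(sk, want);

		if (n < 0)
			return n;
		sent += n;
	}
	return sent;
}
```

      If demo_send_chunk() returned sk_err unnegated, the loop would count the
      error value as bytes sent and carry on instead of stopping at the error.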