1. 09 Jan, 2019 40 commits
    • Maciej W. Rozycki's avatar
      rtc: m41t80: Correct alarm month range with RTC reads · 3f23e65d
      Maciej W. Rozycki authored
      commit 3cc9ffbb upstream.
      
      Add the missing adjustment of the month range on alarm reads from the
      RTC, correcting an issue coming from commit 9c6dfed9 ("rtc: m41t80:
      add alarm functionality").  The range is 1-12 for hardware and 0-11 for
      `struct rtc_time', and is already correctly handled on alarm writes to
      the RTC.
      
      It was correct up until commit 48e97667 ("drivers/rtc/rtc-m41t80.c:
      remove disabled alarm functionality") too, which removed the previous
      implementation of alarm support.
      Signed-off-by: default avatarMaciej W. Rozycki <macro@linux-mips.org>
      Fixes: 9c6dfed9 ("rtc: m41t80: add alarm functionality")
      References: 48e97667 ("drivers/rtc/rtc-m41t80.c: remove disabled alarm functionality")
      Cc: stable@vger.kernel.org # 4.7+
      Signed-off-by: default avatarAlexandre Belloni <alexandre.belloni@bootlin.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3f23e65d
    • Marc Zyngier's avatar
      arm/arm64: KVM: vgic: Force VM halt when changing the active state of GICv3 PPIs/SGIs · 83c2752a
      Marc Zyngier authored
      commit 107352a2 upstream.
      
      We currently only halt the guest when a vCPU messes with the active
      state of an SPI. This is perfectly fine for GICv2, but isn't enough
      for GICv3, where all vCPUs can access the state of any other vCPU.
      
      Let's broaden the condition to include any GICv3 interrupt that
      has an active state (i.e. all but LPIs).
      
      Cc: stable@vger.kernel.org
      Reviewed-by: default avatarChristoffer Dall <christoffer.dall@arm.com>
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      83c2752a
    • Will Deacon's avatar
      arm64: KVM: Avoid setting the upper 32 bits of VTCR_EL2 to 1 · f6be406e
      Will Deacon authored
      commit df655b75 upstream.
      
      Although bit 31 of VTCR_EL2 is RES1, we inadvertently end up setting all
      of the upper 32 bits to 1 as well because we define VTCR_EL2_RES1 as
      signed, which is sign-extended when assigning to kvm->arch.vtcr.
      
      Lucky for us, the architecture currently treats these upper bits as RES0
      so, whilst we've been naughty, we haven't set fire to anything yet.
      
      Cc: <stable@vger.kernel.org>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Christoffer Dall <christoffer.dall@arm.com>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f6be406e
    • Georgy A Bystrenin's avatar
      CIFS: Fix error mapping for SMB2_LOCK command which caused OFD lock problem · 8697c15f
      Georgy A Bystrenin authored
      commit 9a596f5b upstream.
      
      While resolving a bug with locks on samba shares found a strange behavior.
      When a file locked by one node and we trying to lock it from another node
      it fail with errno 5 (EIO) but in that case errno must be set to
      (EACCES | EAGAIN).
      This isn't happening when we try to lock file second time on same node.
      In this case it returns EACCES as expected.
      Also this issue not reproduces when we use SMB1 protocol (vers=1.0 in
      mount options).
      
      Further investigation showed that the mapping from status_to_posix_error
      is different for SMB1 and SMB2+ implementations.
      For SMB1 mapping is [NT_STATUS_LOCK_NOT_GRANTED to ERRlock]
      (See fs/cifs/netmisc.c line 66)
      but for SMB2+ mapping is [STATUS_LOCK_NOT_GRANTED to -EIO]
      (see fs/cifs/smb2maperror.c line 383)
      
      Quick changes in SMB2+ mapping from EIO to EACCES has fixed issue.
      
      BUG: https://bugzilla.kernel.org/show_bug.cgi?id=201971Signed-off-by: default avatarGeorgy A Bystrenin <gkot@altlinux.org>
      Reviewed-by: default avatarPavel Shilovsky <pshilov@microsoft.com>
      CC: Stable <stable@vger.kernel.org>
      Signed-off-by: default avatarSteve French <stfrench@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8697c15f
    • Aaro Koskinen's avatar
      MIPS: OCTEON: mark RGMII interface disabled on OCTEON III · ac40fe53
      Aaro Koskinen authored
      commit edefae94 upstream.
      
      Commit 885872b7 ("MIPS: Octeon: Add Octeon III CN7xxx
      interface detection") added RGMII interface detection for OCTEON III,
      but it results in the following logs:
      
      [    7.165984] ERROR: Unsupported Octeon model in __cvmx_helper_rgmii_probe
      [    7.173017] ERROR: Unsupported Octeon model in __cvmx_helper_rgmii_probe
      
      The current RGMII routines are valid only for older OCTEONS that
      use GMX/ASX hardware blocks. On later chips AGL should be used,
      but support for that is missing in the mainline. Until that is added,
      mark the interface as disabled.
      
      Fixes: 885872b7 ("MIPS: Octeon: Add Octeon III CN7xxx interface detection")
      Signed-off-by: default avatarAaro Koskinen <aaro.koskinen@iki.fi>
      Signed-off-by: default avatarPaul Burton <paul.burton@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: linux-mips@vger.kernel.org
      Cc: stable@vger.kernel.org # 4.7+
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ac40fe53
    • Paul Burton's avatar
      MIPS: Expand MIPS32 ASIDs to 64 bits · af8a41b9
      Paul Burton authored
      commit ff4dd232 upstream.
      
      ASIDs have always been stored as unsigned longs, ie. 32 bits on MIPS32
      kernels. This is problematic because it is feasible for the ASID version
      to overflow & wrap around to zero.
      
      We currently attempt to handle this overflow by simply setting the ASID
      version to 1, using asid_first_version(), but we make no attempt to
      account for the fact that there may be mm_structs with stale ASIDs that
      have versions which we now reuse due to the overflow & wrap around.
      
      Encountering this requires that:
      
        1) A struct mm_struct X is active on CPU A using ASID (V,n).
      
        2) That mm is not used on CPU A for the length of time that it takes
           for CPU A's asid_cache to overflow & wrap around to the same
           version V that the mm had in step 1. During this time tasks using
           the mm could either be sleeping or only scheduled on other CPUs.
      
        3) Some other mm Y becomes active on CPU A and is allocated the same
           ASID (V,n).
      
        4) mm X now becomes active on CPU A again, and now incorrectly has the
           same ASID as mm Y.
      
      Where struct mm_struct ASIDs are represented above in the format
      (version, EntryHi.ASID), and on a typical MIPS32 system version will be
      24 bits wide & EntryHi.ASID will be 8 bits wide.
      
      The length of time required in step 2 is highly dependent upon the CPU &
      workload, but for a hypothetical 2GHz CPU running a workload which
      generates a new ASID every 10000 cycles this period is around 248 days.
      Due to this long period of time & the fact that tasks need to be
      scheduled in just the right (or wrong, depending upon your inclination)
      way, this is obviously a difficult bug to encounter but it's entirely
      possible as evidenced by reports.
      
      In order to fix this, simply extend ASIDs to 64 bits even on MIPS32
      builds. This will extend the period of time required for the
      hypothetical system above to encounter the problem from 28 days to
      around 3 trillion years, which feels safely outside of the realms of
      possibility.
      
      The cost of this is slightly more generated code in some commonly
      executed paths, but this is pretty minimal:
      
                               | Code Size Gain | Percentage
        -----------------------|----------------|-------------
          decstation_defconfig |           +270 | +0.00%
              32r2el_defconfig |           +652 | +0.01%
              32r6el_defconfig |          +1000 | +0.01%
      
      I have been unable to measure any change in performance of the LMbench
      lat_ctx or lat_proc tests resulting from the 64b ASIDs on either
      32r2el_defconfig+interAptiv or 32r6el_defconfig+I6500 systems.
      Signed-off-by: default avatarPaul Burton <paul.burton@mips.com>
      Suggested-by: default avatarJames Hogan <jhogan@kernel.org>
      References: https://lore.kernel.org/linux-mips/80B78A8B8FEE6145A87579E8435D78C30205D5F3@fzex.ruijie.com.cn/
      References: https://lore.kernel.org/linux-mips/1488684260-18867-1-git-send-email-jiwei.sun@windriver.com/
      Cc: Jiwei Sun <jiwei.sun@windriver.com>
      Cc: Yu Huabing <yhb@ruijie.com.cn>
      Cc: stable@vger.kernel.org # 2.6.12+
      Cc: linux-mips@vger.kernel.org
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      af8a41b9
    • Huacai Chen's avatar
      MIPS: Align kernel load address to 64KB · 5647c1d5
      Huacai Chen authored
      commit bec0de4c upstream.
      
      KEXEC needs the new kernel's load address to be aligned on a page
      boundary (see sanity_check_segment_list()), but on MIPS the default
      vmlinuz load address is only explicitly aligned to 16 bytes.
      
      Since the largest PAGE_SIZE supported by MIPS kernels is 64KB, increase
      the alignment calculated by calc_vmlinuz_load_addr to 64KB.
      Signed-off-by: default avatarHuacai Chen <chenhc@lemote.com>
      Signed-off-by: default avatarPaul Burton <paul.burton@mips.com>
      Patchwork: https://patchwork.linux-mips.org/patch/21131/
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: James Hogan <james.hogan@mips.com>
      Cc: Steven J . Hill <Steven.Hill@cavium.com>
      Cc: linux-mips@linux-mips.org
      Cc: Fuxin Zhang <zhangfx@lemote.com>
      Cc: Zhangjin Wu <wuzhangjin@gmail.com>
      Cc: <stable@vger.kernel.org> # 2.6.36+
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5647c1d5
    • Huacai Chen's avatar
      MIPS: Ensure pmd_present() returns false after pmd_mknotpresent() · 5744be55
      Huacai Chen authored
      commit 92aa0718 upstream.
      
      This patch is borrowed from ARM64 to ensure pmd_present() returns false
      after pmd_mknotpresent(). This is needed for THP.
      
      References: 5bb1cc0f ("arm64: Ensure pmd_present() returns false after pmd_mknotpresent()")
      Reviewed-by: default avatarJames Hogan <jhogan@kernel.org>
      Signed-off-by: default avatarHuacai Chen <chenhc@lemote.com>
      Signed-off-by: default avatarPaul Burton <paul.burton@mips.com>
      Patchwork: https://patchwork.linux-mips.org/patch/21135/
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: James Hogan <james.hogan@mips.com>
      Cc: Steven J . Hill <Steven.Hill@cavium.com>
      Cc: linux-mips@linux-mips.org
      Cc: Fuxin Zhang <zhangfx@lemote.com>
      Cc: Zhangjin Wu <wuzhangjin@gmail.com>
      Cc: <stable@vger.kernel.org> # 3.8+
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5744be55
    • Huacai Chen's avatar
      MIPS: c-r4k: Add r4k_blast_scache_node for Loongson-3 · 95168354
      Huacai Chen authored
      commit bb53fdf3 upstream.
      
      For multi-node Loongson-3 (NUMA configuration), r4k_blast_scache() can
      only flush Node-0's scache. So we add r4k_blast_scache_node() by using
      (CAC_BASE | (node_id << NODE_ADDRSPACE_SHIFT)) instead of CKSEG0 as the
      start address.
      Signed-off-by: default avatarHuacai Chen <chenhc@lemote.com>
      [paul.burton@mips.com: Include asm/mmzone.h from asm/r4kcache.h for
      		       nid_to_addrbase(). Add asm/mach-generic/mmzone.h
      		       to allow inclusion for all platforms.]
      Signed-off-by: default avatarPaul Burton <paul.burton@mips.com>
      Patchwork: https://patchwork.linux-mips.org/patch/21129/
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: James Hogan <james.hogan@mips.com>
      Cc: Steven J . Hill <Steven.Hill@cavium.com>
      Cc: linux-mips@linux-mips.org
      Cc: Fuxin Zhang <zhangfx@lemote.com>
      Cc: Zhangjin Wu <wuzhangjin@gmail.com>
      Cc: <stable@vger.kernel.org> # 3.15+
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      95168354
    • Paul Burton's avatar
      MIPS: math-emu: Write-protect delay slot emulation pages · 2713b8fd
      Paul Burton authored
      commit adcc81f1 upstream.
      
      Mapping the delay slot emulation page as both writeable & executable
      presents a security risk, in that if an exploit can write to & jump into
      the page then it can be used as an easy way to execute arbitrary code.
      
      Prevent this by mapping the page read-only for userland, and using
      access_process_vm() with the FOLL_FORCE flag to write to it from
      mips_dsemul().
      
      This will likely be less efficient due to copy_to_user_page() performing
      cache maintenance on a whole page, rather than a single line as in the
      previous use of flush_cache_sigtramp(). However this delay slot
      emulation code ought not to be running in any performance critical paths
      anyway so this isn't really a problem, and we can probably do better in
      copy_to_user_page() anyway in future.
      
      A major advantage of this approach is that the fix is small & simple to
      backport to stable kernels.
      Reported-by: default avatarAndy Lutomirski <luto@kernel.org>
      Signed-off-by: default avatarPaul Burton <paul.burton@mips.com>
      Fixes: 432c6bac ("MIPS: Use per-mm page to execute branch delay slot instructions")
      Cc: stable@vger.kernel.org # v4.8+
      Cc: linux-mips@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: Rich Felker <dalias@libc.org>
      Cc: David Daney <david.daney@cavium.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2713b8fd
    • Hans Verkuil's avatar
      media: v4l2-tpg: array index could become negative · 35323cb2
      Hans Verkuil authored
      commit e5f71a27 upstream.
      
      text[s] is a signed char, so using that as index into the font8x16 array
      can result in negative indices. Cast it to u8 to be safe.
      Signed-off-by: default avatarHans Verkuil <hverkuil-cisco@xs4all.nl>
      Reported-by: syzbot+ccf0a61ed12f2a7313ee@syzkaller.appspotmail.com
      Cc: <stable@vger.kernel.org>      # for v4.7 and up
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab+samsung@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      35323cb2
    • Hans Verkuil's avatar
      media: vivid: free bitmap_cap when updating std/timings/etc. · cd1f0770
      Hans Verkuil authored
      commit 560ccb75 upstream.
      
      When vivid_update_format_cap() is called it should free any overlay
      bitmap since the compose size will change.
      Signed-off-by: default avatarHans Verkuil <hverkuil-cisco@xs4all.nl>
      Reported-by: syzbot+0cc8e3cc63ca373722c6@syzkaller.appspotmail.com
      Cc: <stable@vger.kernel.org>      # for v3.18 and up
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab+samsung@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cd1f0770
    • Nava kishore Manne's avatar
      serial: uartps: Fix interrupt mask issue to handle the RX interrupts properly · 5110d0b4
      Nava kishore Manne authored
      commit 26068313 upstream.
      
      This patch Correct the RX interrupt mask value to handle the
      RX interrupts properly.
      
      Fixes: c8dbdc84 ("serial: xuartps: Rewrite the interrupt handling logic")
      Signed-off-by: default avatarNava kishore Manne <nava.manne@xilinx.com>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: default avatarMichal Simek <michal.simek@xilinx.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5110d0b4
    • Martin Blumenstingl's avatar
      f2fs: fix validation of the block count in sanity_check_raw_super · c5e4022f
      Martin Blumenstingl authored
      commit 88960068 upstream.
      
      Treat "block_count" from struct f2fs_super_block as 64-bit little endian
      value in sanity_check_raw_super() because struct f2fs_super_block
      declares "block_count" as "__le64".
      
      This fixes a bug where the superblock validation fails on big endian
      devices with the following error:
        F2FS-fs (sda1): Wrong segment_count / block_count (61439 > 0)
        F2FS-fs (sda1): Can't find valid F2FS filesystem in 1th superblock
        F2FS-fs (sda1): Wrong segment_count / block_count (61439 > 0)
        F2FS-fs (sda1): Can't find valid F2FS filesystem in 2th superblock
      As result of this the partition cannot be mounted.
      
      With this patch applied the superblock validation works fine and the
      partition can be mounted again:
        F2FS-fs (sda1): Mounted with checkpoint version = 7c84
      
      My little endian x86-64 hardware was able to mount the partition without
      this fix.
      To confirm that mounting f2fs filesystems works on big endian machines
      again I tested this on a 32-bit MIPS big endian (lantiq) device.
      
      Fixes: 0cfe75c5 ("f2fs: enhance sanity_check_raw_super() to avoid potential overflows")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarMartin Blumenstingl <martin.blumenstingl@googlemail.com>
      Reviewed-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c5e4022f
    • Florian Westphal's avatar
      netfilter: nf_conncount: don't skip eviction when age is negative · 052ccb86
      Florian Westphal authored
      commit 4cd273bb upstream.
      
      (not in Linus's tree now, but in nf.git + linux-next.git already.)
      
      age is signed integer, so result can be negative when the timestamps
      have a large delta.  In this case we want to discard the entry.
      
      Instead of using age >= 2 || age < 0, just make it unsigned.
      
      Fixes: b36e4523 ("netfilter: nf_conncount: fix garbage collection confirm race")
      Reviewed-by: default avatarShawn Bohrer <sbohrer@cloudflare.com>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      
      [mfo: backport: use older file name, nf_conncount.c -> xt_connlimit.c]
      Signed-off-by: default avatarMauricio Faria de Oliveira <mfo@canonical.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      052ccb86
    • Florian Westphal's avatar
      netfilter: nf_conncount: fix garbage collection confirm race · 75af3d78
      Florian Westphal authored
      commit b36e4523 upstream.
      
      Yi-Hung Wei and Justin Pettit found a race in the garbage collection scheme
      used by nf_conncount.
      
      When doing list walk, we lookup the tuple in the conntrack table.
      If the lookup fails we remove this tuple from our list because
      the conntrack entry is gone.
      
      This is the common cause, but turns out its not the only one.
      The list entry could have been created just before by another cpu, i.e. the
      conntrack entry might not yet have been inserted into the global hash.
      
      The avoid this, we introduce a timestamp and the owning cpu.
      If the entry appears to be stale, evict only if:
       1. The current cpu is the one that added the entry, or,
       2. The timestamp is older than two jiffies
      
      The second constraint allows GC to be taken over by other
      cpu too (e.g. because a cpu was offlined or napi got moved to another
      cpu).
      
      We can't pretend the 'doubtful' entry wasn't in our list.
      Instead, when we don't find an entry indicate via IS_ERR
      that entry was removed ('did not exist' or withheld
      ('might-be-unconfirmed').
      
      This most likely also fixes a xt_connlimit imbalance earlier reported by
      Dmitry Andrianov.
      
      Cc: Dmitry Andrianov <dmitry.andrianov@alertme.com>
      Reported-by: default avatarJustin Pettit <jpettit@vmware.com>
      Reported-by: default avatarYi-Hung Wei <yihung.wei@gmail.com>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Acked-by: default avatarYi-Hung Wei <yihung.wei@gmail.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      
      [mfo: backport: refresh context lines and use older symbol/file names:
       - nf_conncount.c -> xt_connlimit.c.
         - nf_conncount_rb -> xt_connlimit_rb
         - nf_conncount_tuple -> xt_connlimit_conn
         - conncount_conn_cachep -> connlimit_conn_cachep]
      Signed-off-by: default avatarMauricio Faria de Oliveira <mfo@canonical.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      75af3d78
    • Yi-Hung Wei's avatar
      netfilter: nf_conncount: Fix garbage collection with zones · 525e1dff
      Yi-Hung Wei authored
      commit 21ba8847 upstream.
      
      Currently, we use check_hlist() for garbage colleciton. However, we
      use the ‘zone’ from the counted entry to query the existence of
      existing entries in the hlist. This could be wrong when they are in
      different zones, and this patch fixes this issue.
      
      Fixes: e59ea3df ("netfilter: xt_connlimit: honor conntrack zone if available")
      Signed-off-by: default avatarYi-Hung Wei <yihung.wei@gmail.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      
      [mfo: backport: refresh context lines and use older symbol/file names, note hunk 5:
       - nf_conncount.c -> xt_connlimit.c
         - nf_conncount_rb -> xt_connlimit_rb
         - nf_conncount_tuple -> xt_connlimit_conn
         - hunk 5: remove check for non-NULL 'tuple', that isn't required as it's introduced
           by upstream commit 35d8deb8 ("netfilter: conncount: Support count only use case")
           which addresses nf_conncount_count() that does not exist yet -- it's introduced by
           upstream commit 625c5561 ("netfilter: connlimit: split xt_connlimit into front
           and backend"), a refactor change.
       - nft_connlimit.c -> removed, not used/doesn't exist yet.]
      Signed-off-by: default avatarMauricio Faria de Oliveira <mfo@canonical.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      525e1dff
    • Pablo Neira Ayuso's avatar
      netfilter: nf_conncount: expose connection list interface · 15ee3595
      Pablo Neira Ayuso authored
      commit 5e5cbc7b upstream.
      
      This patch provides an interface to maintain the list of connections and
      the lookup function to obtain the number of connections in the list.
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      
      [mfo: backport: refresh context lines and use older symbol/file names:
       - nf_conntrack_count.h: new file, add include guards.
       - nf_conncount.c -> xt_connlimit.c.
         - nf_conncount_rb -> xt_connlimit_rb
         - nf_conncount_tuple -> xt_connlimit_conn
         - conncount_rb_cachep -> connlimit_rb_cachep
         - conncount_conn_cachep -> connlimit_conn_cachep]
      Signed-off-by: default avatarMauricio Faria de Oliveira <mfo@canonical.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      15ee3595
    • Florian Westphal's avatar
      netfilter: xt_connlimit: don't store address in the conn nodes · 5e614e21
      Florian Westphal authored
      commit ce49480d upstream.
      
      Only stored, never read.  This is a leftover from commit 7d084877
      ("netfilter: connlimit: use rbtree for per-host conntrack obj storage"),
      which added the rbtree node struct that stores the address instead.
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      
      [mfo: backport: refresh context lines and use older symbol/file names:
       - nf_conncount.c -> xt_connlimit.c.
         - nf_conncount_rb -> xt_connlimit_rb
         - nf_conncount_tuple -> xt_connlimit_conn
        - additionally, remove the add_hlist() 'addr' parameter that isn't used and removed
          later upstream with commit 625c5561 ("netfilter: connlimit: split xt_connlimit
          into front and backend") in the rename from 'xt_connlimit.c' to 'nf_conncount.c',
          a big refactor, so do it here, while still here in this related patch.]
      Signed-off-by: default avatarMauricio Faria de Oliveira <mfo@canonical.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      5e614e21
    • Filipe Manana's avatar
      Btrfs: fix fsync of files with multiple hard links in new directories · 20373d98
      Filipe Manana authored
      commit 41bd6067 upstream.
      
      The log tree has a long standing problem that when a file is fsync'ed we
      only check for new ancestors, created in the current transaction, by
      following only the hard link for which the fsync was issued. We follow the
      ancestors using the VFS' dget_parent() API. This means that if we create a
      new link for a file in a directory that is new (or in an any other new
      ancestor directory) and then fsync the file using an old hard link, we end
      up not logging the new ancestor, and on log replay that new hard link and
      ancestor do not exist. In some cases, involving renames, the file will not
      exist at all.
      
      Example:
      
        mkfs.btrfs -f /dev/sdb
        mount /dev/sdb /mnt
      
        mkdir /mnt/A
        touch /mnt/foo
        ln /mnt/foo /mnt/A/bar
        xfs_io -c fsync /mnt/foo
      
        <power failure>
      
      In this example after log replay only the hard link named 'foo' exists
      and directory A does not exist, which is unexpected. In other major linux
      filesystems, such as ext4, xfs and f2fs for example, both hard links exist
      and so does directory A after mounting again the filesystem.
      
      Checking if any new ancestors are new and need to be logged was added in
      2009 by commit 12fcfd22 ("Btrfs: tree logging unlink/rename fixes"),
      however only for the ancestors of the hard link (dentry) for which the
      fsync was issued, instead of checking for all ancestors for all of the
      inode's hard links.
      
      So fix this by tracking the id of the last transaction where a hard link
      was created for an inode and then on fsync fallback to a full transaction
      commit when an inode has more than one hard link and at least one new hard
      link was created in the current transaction. This is the simplest solution
      since this is not a common use case (adding frequently hard links for
      which there's an ancestor created in the current transaction and then
      fsync the file). In case it ever becomes a common use case, a solution
      that consists of iterating the fs/subvol btree for each hard link and
      check if any ancestor is new, could be implemented.
      
      This solves many unexpected scenarios reported by Jayashree Mohan and
      Vijay Chidambaram, and for which there is a new test case for fstests
      under review.
      
      Fixes: 12fcfd22 ("Btrfs: tree logging unlink/rename fixes")
      CC: stable@vger.kernel.org # 4.4+
      Reported-by: default avatarVijay Chidambaram <vvijay03@gmail.com>
      Reported-by: default avatarJayashree Mohan <jayashree2912@gmail.com>
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      20373d98
    • Macpaul Lin's avatar
      cdc-acm: fix abnormal DATA RX issue for Mediatek Preloader. · 4aac41de
      Macpaul Lin authored
      commit eafb27fa upstream.
      
      Mediatek Preloader is a proprietary embedded boot loader for loading
      Little Kernel and Linux into device DRAM.
      
      This boot loader also handle firmware update. Mediatek Preloader will be
      enumerated as a virtual COM port when the device is connected to Windows
      or Linux OS via CDC-ACM class driver. When the USB enumeration has been
      done, Mediatek Preloader will send out handshake command "READY" to PC
      actively instead of waiting command from the download tool.
      
      Since Linux 4.12, the commit "tty: reset termios state on device
      registration" (93857edd) causes Mediatek
      Preloader receiving some abnoraml command like "READYXX" as it sent.
      This will be recognized as an incorrect response. The behavior change
      also causes the download handshake fail. This change only affects
      subsequent connects if the reconnected device happens to get the same minor
      number.
      
      By disabling the ECHO termios flag could avoid this problem. However, it
      cannot be done by user space configuration when download tool open
      /dev/ttyACM0. This is because the device running Mediatek Preloader will
      send handshake command "READY" immediately once the CDC-ACM driver is
      ready.
      
      This patch wants to fix above problem by introducing "DISABLE_ECHO"
      property in driver_info. When Mediatek Preloader is connected, the
      CDC-ACM driver could disable ECHO flag in termios to avoid the problem.
      Signed-off-by: default avatarMacpaul Lin <macpaul.lin@mediatek.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: default avatarJohan Hovold <johan@kernel.org>
      Acked-by: default avatarOliver Neukum <oneukum@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4aac41de
    • Tejun Heo's avatar
      cgroup: fix CSS_TASK_ITER_PROCS · 8769b27e
      Tejun Heo authored
      commit e9d81a1b upstream.
      
      CSS_TASK_ITER_PROCS implements process-only iteration by making
      css_task_iter_advance() skip tasks which aren't threadgroup leaders;
      however, when an iteration is started css_task_iter_start() calls the
      inner helper function css_task_iter_advance_css_set() instead of
      css_task_iter_advance().  As the helper doesn't have the skip logic,
      when the first task to visit is a non-leader thread, it doesn't get
      skipped correctly as shown in the following example.
      
        # ps -L 2030
          PID   LWP TTY      STAT   TIME COMMAND
         2030  2030 pts/0    Sl+    0:00 ./test-thread
         2030  2031 pts/0    Sl+    0:00 ./test-thread
        # mkdir -p /sys/fs/cgroup/x/a/b
        # echo threaded > /sys/fs/cgroup/x/a/cgroup.type
        # echo threaded > /sys/fs/cgroup/x/a/b/cgroup.type
        # echo 2030 > /sys/fs/cgroup/x/a/cgroup.procs
        # cat /sys/fs/cgroup/x/a/cgroup.threads
        2030
        2031
        # cat /sys/fs/cgroup/x/cgroup.procs
        2030
        # echo 2030 > /sys/fs/cgroup/x/a/b/cgroup.threads
        # cat /sys/fs/cgroup/x/cgroup.procs
        2031
        2030
      
      The last read of cgroup.procs is incorrectly showing non-leader 2031
      in cgroup.procs output.
      
      This can be fixed by updating css_task_iter_advance() to handle the
      first advance and css_task_iters_tart() to call
      css_task_iter_advance() instead of the inner helper.  After the fix,
      the same commands result in the following (correct) result:
      
        # ps -L 2062
          PID   LWP TTY      STAT   TIME COMMAND
         2062  2062 pts/0    Sl+    0:00 ./test-thread
         2062  2063 pts/0    Sl+    0:00 ./test-thread
        # mkdir -p /sys/fs/cgroup/x/a/b
        # echo threaded > /sys/fs/cgroup/x/a/cgroup.type
        # echo threaded > /sys/fs/cgroup/x/a/b/cgroup.type
        # echo 2062 > /sys/fs/cgroup/x/a/cgroup.procs
        # cat /sys/fs/cgroup/x/a/cgroup.threads
        2062
        2063
        # cat /sys/fs/cgroup/x/cgroup.procs
        2062
        # echo 2062 > /sys/fs/cgroup/x/a/b/cgroup.threads
        # cat /sys/fs/cgroup/x/cgroup.procs
        2062
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reported-by: default avatar"Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
      Fixes: 8cfd8147 ("cgroup: implement cgroup v2 thread support")
      Cc: stable@vger.kernel.org # v4.14+
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8769b27e
    • Wenwen Wang's avatar
      crypto: cavium/nitrox - fix a DMA pool free failure · cf3168c5
      Wenwen Wang authored
      commit 7172122b upstream.
      
      In crypto_alloc_context(), a DMA pool is allocated through dma_pool_alloc()
      to hold the crypto context. The meta data of the DMA pool, including the
      pool used for the allocation 'ndev->ctx_pool' and the base address of the
      DMA pool used by the device 'dma', are then stored to the beginning of the
      pool. These meta data are eventually used in crypto_free_context() to free
      the DMA pool through dma_pool_free(). However, given that the DMA pool can
      also be accessed by the device, a malicious device can modify these meta
      data, especially when the device is controlled to deploy an attack. This
      can cause an unexpected DMA pool free failure.
      
      To avoid the above issue, this patch introduces a new structure
      crypto_ctx_hdr and a new field chdr in the structure nitrox_crypto_ctx hold
      the meta data information of the DMA pool after the allocation. Note that
      the original structure ctx_hdr is not changed to ensure the compatibility.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarWenwen Wang <wang6495@umn.edu>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cf3168c5
    • Johan Jonker's avatar
      clk: rockchip: fix typo in rk3188 spdif_frac parent · 3f844ac9
      Johan Jonker authored
      commit 8b19faf6 upstream.
      
      Fix typo in common_clk_branches.
      Make spdif_pre parent of spdif_frac.
      
      Fixes: 66746420 ("clk: rockchip: include downstream muxes into fractional dividers")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarJohan Jonker <jbx9999@hotmail.com>
      Acked-by: default avatarElaine Zhang <zhangqing@rock-chips.com>
      Signed-off-by: default avatarHeiko Stuebner <heiko@sntech.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3f844ac9
    • Lukas Wunner's avatar
      spi: bcm2835: Avoid finishing transfer prematurely in IRQ mode · 7c6ac785
      Lukas Wunner authored
      commit 56c17234 upstream.
      
      The IRQ handler bcm2835_spi_interrupt() first reads as much as possible
      from the RX FIFO, then writes as much as possible to the TX FIFO.
      Afterwards it decides whether the transfer is finished by checking if
      the TX FIFO is empty.
      
      If very few bytes were written to the TX FIFO, they may already have
      been transmitted by the time the FIFO's emptiness is checked.  As a
      result, the transfer will be declared finished and the chip will be
      reset without reading the corresponding received bytes from the RX FIFO.
      
      The odds of this happening increase with a high clock frequency (such
      that the TX FIFO drains quickly) and either passing "threadirqs" on the
      command line or enabling CONFIG_PREEMPT_RT_BASE (such that the IRQ
      handler may be preempted between filling the TX FIFO and checking its
      emptiness).
      
      Fix by instead checking whether rx_len has reached zero, which means
      that the transfer has been received in full.  This is also more
      efficient as it avoids one bus read access per interrupt.  Note that
      bcm2835_spi_transfer_one_poll() likewise uses rx_len to determine
      whether the transfer has finished.
      Signed-off-by: default avatarLukas Wunner <lukas@wunner.de>
      Fixes: e34ff011 ("spi: bcm2835: move to the transfer_one driver model")
      Cc: stable@vger.kernel.org # v4.1+
      Cc: Mathias Duckeck <m.duckeck@kunbus.de>
      Cc: Frank Pavlic <f.pavlic@kunbus.de>
      Cc: Martin Sperl <kernel@martin.sperl.org>
      Cc: Noralf Trønnes <noralf@tronnes.org>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7c6ac785
    • Lukas Wunner's avatar
      spi: bcm2835: Fix book-keeping of DMA termination · fef1fb1f
      Lukas Wunner authored
      commit dbc94411 upstream.
      
      If submission of a DMA TX transfer succeeds but submission of the
      corresponding RX transfer does not, the BCM2835 SPI driver terminates
      the TX transfer but neglects to reset the dma_pending flag to false.
      
      Thus, if the next transfer uses interrupt mode (because it is shorter
      than BCM2835_SPI_DMA_MIN_LENGTH) and runs into a timeout,
      dmaengine_terminate_all() will be called both for TX (once more) and
      for RX (which was never started in the first place).  Fix it.
      Signed-off-by: default avatarLukas Wunner <lukas@wunner.de>
      Fixes: 3ecd37ed ("spi: bcm2835: enable dma modes for transfers meeting certain conditions")
      Cc: stable@vger.kernel.org # v4.2+
      Cc: Mathias Duckeck <m.duckeck@kunbus.de>
      Cc: Frank Pavlic <f.pavlic@kunbus.de>
      Cc: Martin Sperl <kernel@martin.sperl.org>
      Cc: Noralf Trønnes <noralf@tronnes.org>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fef1fb1f
    • Lukas Wunner's avatar
      spi: bcm2835: Fix race on DMA termination · 24fc3cc2
      Lukas Wunner authored
      commit e82b0b38 upstream.
      
      If a DMA transfer finishes orderly right when spi_transfer_one_message()
      determines that it has timed out, the callbacks bcm2835_spi_dma_done()
      and bcm2835_spi_handle_err() race to call dmaengine_terminate_all(),
      potentially leading to double termination.
      
      Prevent by atomically changing the dma_pending flag before calling
      dmaengine_terminate_all().
      Signed-off-by: default avatarLukas Wunner <lukas@wunner.de>
      Fixes: 3ecd37ed ("spi: bcm2835: enable dma modes for transfers meeting certain conditions")
      Cc: stable@vger.kernel.org # v4.2+
      Cc: Mathias Duckeck <m.duckeck@kunbus.de>
      Cc: Frank Pavlic <f.pavlic@kunbus.de>
      Cc: Martin Sperl <kernel@martin.sperl.org>
      Cc: Noralf Trønnes <noralf@tronnes.org>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      24fc3cc2
    • Theodore Ts'o's avatar
      ext4: check for shutdown and r/o file system in ext4_write_inode() · 7f3901d8
      Theodore Ts'o authored
      commit 18f2c4fc upstream.
      
      If the file system has been shut down or is read-only, then
      ext4_write_inode() needs to bail out early.
      
      Also use jbd2_complete_transaction() instead of ext4_force_commit() so
      we only force a commit if it is needed.
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7f3901d8
    • Theodore Ts'o's avatar
      ext4: force inode writes when nfsd calls commit_metadata() · dbffc914
      Theodore Ts'o authored
      commit fde87268 upstream.
      
      Some time back, nfsd switched from calling vfs_fsync() to using a new
      commit_metadata() hook in export_operations().  If the file system did
      not provide a commit_metadata() hook, it fell back to using
      sync_inode_metadata().  Unfortunately doesn't work on all file
      systems.  In particular, it doesn't work on ext4 due to how the inode
      gets journalled --- the VFS writeback code will not always call
      ext4_write_inode().
      
      So we need to provide our own ext4_nfs_commit_metdata() method which
      calls ext4_write_inode() directly.
      
      Google-Bug-Id: 121195940
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      dbffc914
    • Theodore Ts'o's avatar
      ext4: include terminating u32 in size of xattr entries when expanding inodes · 6ef63893
      Theodore Ts'o authored
      commit a805622a upstream.
      
      In ext4_expand_extra_isize_ea(), we calculate the total size of the
      xattr header, plus the xattr entries so we know how much of the
      beginning part of the xattrs to move when expanding the inode extra
      size.  We need to include the terminating u32 at the end of the xattr
      entries, or else if there is uninitialized, non-zero bytes after the
      xattr entries and before the xattr values, the list of xattr entries
      won't be properly terminated.
      Reported-by: default avatarSteve Graham <stgraham2000@gmail.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6ef63893
    • ruippan (潘睿)'s avatar
      ext4: fix EXT4_IOC_GROUP_ADD ioctl · 0bf8b3fd
      ruippan (潘睿) authored
      commit e647e291 upstream.
      
      Commit e2b911c5 ("ext4: clean up feature test macros with
      predicate functions") broke the EXT4_IOC_GROUP_ADD ioctl.  This was
      not noticed since only very old versions of resize2fs (before
      e2fsprogs 1.42) use this ioctl.  However, using a new kernel with an
      enterprise Linux userspace will cause attempts to use online resize to
      fail with "No reserved GDT blocks".
      
      Fixes: e2b911c5 ("ext4: clean up feature test macros with predicate...")
      Cc: stable@kernel.org # v4.4
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarruippan (潘睿) <ruippan@tencent.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0bf8b3fd
    • Maurizio Lombardi's avatar
      ext4: missing unlock/put_page() in ext4_try_to_write_inline_data() · 92bb9b06
      Maurizio Lombardi authored
      commit 132d00be upstream.
      
      In case of error, ext4_try_to_write_inline_data() should unlock
      and release the page it holds.
      
      Fixes: f19d5870 ("ext4: add normal write support for inline data")
      Cc: stable@kernel.org # 3.8
      Signed-off-by: default avatarMaurizio Lombardi <mlombard@redhat.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      92bb9b06
    • Pan Bian's avatar
      ext4: fix possible use after free in ext4_quota_enable · 34bba27d
      Pan Bian authored
      commit 61157b24 upstream.
      
      The function frees qf_inode via iput but then pass qf_inode to
      lockdep_set_quota_inode on the failure path. This may result in a
      use-after-free bug. The patch frees df_inode only when it is never used.
      
      Fixes: daf647d2 ("ext4: add lockdep annotations for i_data_sem")
      Cc: stable@kernel.org # 4.6
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarPan Bian <bianpan2016@163.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      34bba27d
    • Theodore Ts'o's avatar
      ext4: add ext4_sb_bread() to disambiguate ENOMEM cases · 9da1f6d0
      Theodore Ts'o authored
      commit fb265c9c upstream.
      
      Today, when sb_bread() returns NULL, this can either be because of an
      I/O error or because the system failed to allocate the buffer.  Since
      it's an old interface, changing would require changing many call
      sites.
      
      So instead we create our own ext4_sb_bread(), which also allows us to
      set the REQ_META flag.
      
      Also fixed a problem in the xattr code where a NULL return in a
      function could also mean that the xattr was not found, which could
      lead to the wrong error getting returned to userspace.
      
      Fixes: ac27a0ec ("ext4: initial copy of files from ext3")
      Cc: stable@kernel.org # 2.6.19
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9da1f6d0
    • Ben Hutchings's avatar
      perf pmu: Suppress potential format-truncation warning · 9f0fc584
      Ben Hutchings authored
      commit 11a64a05 upstream.
      
      Depending on which functions are inlined in util/pmu.c, the snprintf()
      calls in perf_pmu__parse_{scale,unit,per_pkg,snapshot}() might trigger a
      warning:
      
        util/pmu.c: In function 'pmu_aliases':
        util/pmu.c:178:31: error: '%s' directive output may be truncated writing up to 255 bytes into a region of size between 0 and 4095 [-Werror=format-truncation=]
          snprintf(path, PATH_MAX, "%s/%s.unit", dir, name);
                                     ^~
      
      I found this when trying to build perf from Linux 3.16 with gcc 8.
      However I can reproduce the problem in mainline if I force
      __perf_pmu__new_alias() to be inlined.
      
      Suppress this by using scnprintf() as has been done elsewhere in perf.
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/20181111184524.fux4taownc6ndbx6@decadent.org.ukSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9f0fc584
    • Miquel Raynal's avatar
      platform-msi: Free descriptors in platform_msi_domain_free() · 69beeb1c
      Miquel Raynal authored
      commit 81b1e6e6 upstream.
      
      Since the addition of platform MSI support, there were two helpers
      supposed to allocate/free IRQs for a device:
      
          platform_msi_domain_alloc_irqs()
          platform_msi_domain_free_irqs()
      
      In these helpers, IRQ descriptors are allocated in the "alloc" routine
      while they are freed in the "free" one.
      
      Later, two other helpers have been added to handle IRQ domains on top
      of MSI domains:
      
          platform_msi_domain_alloc()
          platform_msi_domain_free()
      
      Seen from the outside, the logic is pretty close with the former
      helpers and people used it with the same logic as before: a
      platform_msi_domain_alloc() call should be balanced with a
      platform_msi_domain_free() call. While this is probably what was
      intended to do, the platform_msi_domain_free() does not remove/free
      the IRQ descriptor(s) created/inserted in
      platform_msi_domain_alloc().
      
      One effect of such situation is that removing a module that requested
      an IRQ will let one orphaned IRQ descriptor (with an allocated MSI
      entry) in the device descriptors list. Next time the module will be
      inserted back, one will observe that the allocation will happen twice
      in the MSI domain, one time for the remaining descriptor, one time for
      the new one. It also has the side effect to quickly overshoot the
      maximum number of allocated MSI and then prevent any module requesting
      an interrupt in the same domain to be inserted anymore.
      
      This situation has been met with loops of insertion/removal of the
      mvpp2.ko module (requesting 15 MSIs each time).
      
      Fixes: 552c494a ("platform-msi: Allow creation of a MSI-based stacked irq domain")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarMiquel Raynal <miquel.raynal@bootlin.com>
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      69beeb1c
    • Sean Christopherson's avatar
      KVM: nVMX: Free the VMREAD/VMWRITE bitmaps if alloc_kvm_area() fails · f7ed75e1
      Sean Christopherson authored
      commit 1b3ab5ad upstream.
      
      Fixes: 34a1cd60 ("kvm: x86: vmx: move some vmx setting from vmx_init() to hardware_setup()")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f7ed75e1
    • Sean Christopherson's avatar
      KVM: x86: Use jmp to invoke kvm_spurious_fault() from .fixup · a90e0979
      Sean Christopherson authored
      commit e8143499 upstream.
      
      ____kvm_handle_fault_on_reboot() provides a generic exception fixup
      handler that is used to cleanly handle faults on VMX/SVM instructions
      during reboot (or at least try to).  If there isn't a reboot in
      progress, ____kvm_handle_fault_on_reboot() treats any exception as
      fatal to KVM and invokes kvm_spurious_fault(), which in turn generates
      a BUG() to get a stack trace and die.
      
      When it was originally added by commit 4ecac3fd ("KVM: Handle
      virtualization instruction #UD faults during reboot"), the "call" to
      kvm_spurious_fault() was handcoded as PUSH+JMP, where the PUSH'd value
      is the RIP of the faulting instructing.
      
      The PUSH+JMP trickery is necessary because the exception fixup handler
      code lies outside of its associated function, e.g. right after the
      function.  An actual CALL from the .fixup code would show a slightly
      bogus stack trace, e.g. an extra "random" function would be inserted
      into the trace, as the return RIP on the stack would point to no known
      function (and the unwinder will likely try to guess who owns the RIP).
      
      Unfortunately, the JMP was replaced with a CALL when the macro was
      reworked to not spin indefinitely during reboot (commit b7c4145b
      "KVM: Don't spin on virt instruction faults during reboot").  This
      causes the aforementioned behavior where a bogus function is inserted
      into the stack trace, e.g. my builds like to blame free_kvm_area().
      
      Revert the CALL back to a JMP.  The changelog for commit b7c4145b
      ("KVM: Don't spin on virt instruction faults during reboot") contains
      nothing that indicates the switch to CALL was deliberate.  This is
      backed up by the fact that the PUSH <insn RIP> was left intact.
      
      Note that an alternative to the PUSH+JMP magic would be to JMP back
      to the "real" code and CALL from there, but that would require adding
      a JMP in the non-faulting path to avoid calling kvm_spurious_fault()
      and would add no value, i.e. the stack trace would be the same.
      
      Using CALL:
      
      ------------[ cut here ]------------
      kernel BUG at /home/sean/go/src/kernel.org/linux/arch/x86/kvm/x86.c:356!
      invalid opcode: 0000 [#1] SMP
      CPU: 4 PID: 1057 Comm: qemu-system-x86 Not tainted 4.20.0-rc6+ #75
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
      RIP: 0010:kvm_spurious_fault+0x5/0x10 [kvm]
      Code: <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 55 49 89 fd 41
      RSP: 0018:ffffc900004bbcc8 EFLAGS: 00010046
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffffffffff
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
      RBP: ffff888273fd8000 R08: 00000000000003e8 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000784 R12: ffffc90000371fb0
      R13: 0000000000000000 R14: 000000026d763cf4 R15: ffff888273fd8000
      FS:  00007f3d69691700(0000) GS:ffff888277800000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000055f89bc56fe0 CR3: 0000000271a5a001 CR4: 0000000000362ee0
      Call Trace:
       free_kvm_area+0x1044/0x43ea [kvm_intel]
       ? vmx_vcpu_run+0x156/0x630 [kvm_intel]
       ? kvm_arch_vcpu_ioctl_run+0x447/0x1a40 [kvm]
       ? kvm_vcpu_ioctl+0x368/0x5c0 [kvm]
       ? kvm_vcpu_ioctl+0x368/0x5c0 [kvm]
       ? __set_task_blocked+0x38/0x90
       ? __set_current_blocked+0x50/0x60
       ? __fpu__restore_sig+0x97/0x490
       ? do_vfs_ioctl+0xa1/0x620
       ? __x64_sys_futex+0x89/0x180
       ? ksys_ioctl+0x66/0x70
       ? __x64_sys_ioctl+0x16/0x20
       ? do_syscall_64+0x4f/0x100
       ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
      Modules linked in: vhost_net vhost tap kvm_intel kvm irqbypass bridge stp llc
      ---[ end trace 9775b14b123b1713 ]---
      
      Using JMP:
      
      ------------[ cut here ]------------
      kernel BUG at /home/sean/go/src/kernel.org/linux/arch/x86/kvm/x86.c:356!
      invalid opcode: 0000 [#1] SMP
      CPU: 6 PID: 1067 Comm: qemu-system-x86 Not tainted 4.20.0-rc6+ #75
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
      RIP: 0010:kvm_spurious_fault+0x5/0x10 [kvm]
      Code: <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 55 49 89 fd 41
      RSP: 0018:ffffc90000497cd0 EFLAGS: 00010046
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffffffffff
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
      RBP: ffff88827058bd40 R08: 00000000000003e8 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000784 R12: ffffc90000369fb0
      R13: 0000000000000000 R14: 00000003c8fc6642 R15: ffff88827058bd40
      FS:  00007f3d7219e700(0000) GS:ffff888277900000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f3d64001000 CR3: 0000000271c6b004 CR4: 0000000000362ee0
      Call Trace:
       vmx_vcpu_run+0x156/0x630 [kvm_intel]
       ? kvm_arch_vcpu_ioctl_run+0x447/0x1a40 [kvm]
       ? kvm_vcpu_ioctl+0x368/0x5c0 [kvm]
       ? kvm_vcpu_ioctl+0x368/0x5c0 [kvm]
       ? __set_task_blocked+0x38/0x90
       ? __set_current_blocked+0x50/0x60
       ? __fpu__restore_sig+0x97/0x490
       ? do_vfs_ioctl+0xa1/0x620
       ? __x64_sys_futex+0x89/0x180
       ? ksys_ioctl+0x66/0x70
       ? __x64_sys_ioctl+0x16/0x20
       ? do_syscall_64+0x4f/0x100
       ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
      Modules linked in: vhost_net vhost tap kvm_intel kvm irqbypass bridge stp llc
      ---[ end trace f9daedb85ab3ddba ]---
      
      Fixes: b7c4145b ("KVM: Don't spin on virt instruction faults during reboot")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a90e0979
    • Dan Williams's avatar
      x86/mm: Drop usage of __flush_tlb_all() in kernel_physical_mapping_init() · 9d956533
      Dan Williams authored
      commit ba6f508d upstream.
      
      Commit:
      
        f77084d9 "x86/mm/pat: Disable preemption around __flush_tlb_all()"
      
      addressed a case where __flush_tlb_all() is called without preemption
      being disabled. It also left a warning to catch other cases where
      preemption is not disabled.
      
      That warning triggers for the memory hotplug path which is also used for
      persistent memory enabling:
      
       WARNING: CPU: 35 PID: 911 at ./arch/x86/include/asm/tlbflush.h:460
       RIP: 0010:__flush_tlb_all+0x1b/0x3a
       [..]
       Call Trace:
        phys_pud_init+0x29c/0x2bb
        kernel_physical_mapping_init+0xfc/0x219
        init_memory_mapping+0x1a5/0x3b0
        arch_add_memory+0x2c/0x50
        devm_memremap_pages+0x3aa/0x610
        pmem_attach_disk+0x585/0x700 [nd_pmem]
      
      Andy wondered why a path that can sleep was using __flush_tlb_all() [1]
      and Dave confirmed the expectation for TLB flush is for modifying /
      invalidating existing PTE entries, but not initial population [2]. Drop
      the usage of __flush_tlb_all() in phys_{p4d,pud,pmd}_init() on the
      expectation that this path is only ever populating empty entries for the
      linear map. Note, at linear map teardown time there is a call to the
      all-cpu flush_tlb_all() to invalidate the removed mappings.
      
      [1]: https://lkml.kernel.org/r/9DFD717D-857D-493D-A606-B635D72BAC21@amacapital.net
      [2]: https://lkml.kernel.org/r/749919a4-cdb1-48a3-adb4-adb81a5fa0b5@intel.com
      
      [ mingo: Minor readability edits. ]
      Suggested-by: default avatarDave Hansen <dave.hansen@linux.intel.com>
      Reported-by: default avatarAndy Lutomirski <luto@kernel.org>
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: <stable@vger.kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dave.hansen@intel.com
      Fixes: f77084d9 ("x86/mm/pat: Disable preemption around __flush_tlb_all()")
      Link: http://lkml.kernel.org/r/154395944713.32119.15611079023837132638.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9d956533
    • Michal Hocko's avatar
      x86/speculation/l1tf: Drop the swap storage limit restriction when l1tf=off · 8c34b071
      Michal Hocko authored
      commit 5b5e4d62 upstream.
      
      Swap storage is restricted to max_swapfile_size (~16TB on x86_64) whenever
      the system is deemed affected by L1TF vulnerability. Even though the limit
      is quite high for most deployments it seems to be too restrictive for
      deployments which are willing to live with the mitigation disabled.
      
      We have a customer to deploy 8x 6,4TB PCIe/NVMe SSD swap devices which is
      clearly out of the limit.
      
      Drop the swap restriction when l1tf=off is specified. It also doesn't make
      much sense to warn about too much memory for the l1tf mitigation when it is
      forcefully disabled by the administrator.
      
      [ tglx: Folded the documentation delta change ]
      
      Fixes: 377eeaa8 ("x86/speculation/l1tf: Limit swap file size to MAX_PA/2")
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarPavel Tatashin <pasha.tatashin@soleen.com>
      Reviewed-by: default avatarAndi Kleen <ak@linux.intel.com>
      Acked-by: default avatarJiri Kosina <jkosina@suse.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: <linux-mm@kvack.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181113184910.26697-1-mhocko@kernel.orgSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8c34b071