1. 25 May, 2019 40 commits
    • bpf, lru: avoid messing with eviction heuristics upon syscall lookup · 9503419a
      Daniel Borkmann authored
      commit 50b045a8 upstream.
      
      One of the biggest issues we face right now with picking LRU map over
      regular hash table is that a map walk out of user space, for example,
      to just dump the existing entries or to remove certain ones, will
      completely mess up LRU eviction heuristics and wrong entries such
      as just created ones will get evicted instead. The reason for this
      is that we mark an entry as "in use" via bpf_lru_node_set_ref() from
      system call lookup side as well. Thus upon walk, all entries are
      being marked, so information about the actual least recently used ones
      is "lost".
      
      In case of Cilium where it can be used (besides others) as a BPF
      based connection tracker, this current behavior causes disruption
      upon control plane changes that need to walk the map from user space
      to evict certain entries. Discussion result from bpfconf [0] was that
      we should simply just remove marking from system call side as no
      good use case could be found where it's actually needed there.
      Therefore this patch removes marking for regular LRU and per-CPU
      flavor. If there ever should be a need in future, the behavior could
      be selected via map creation flag, but due to mentioned reason we
      avoid this here.
      
        [0] http://vger.kernel.org/bpfconf.html
      
      Fixes: 29ba732a ("bpf: Add BPF_MAP_TYPE_LRU_HASH")
      Fixes: 8f844938 ("bpf: Add BPF_MAP_TYPE_LRU_PERCPU_HASH")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
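The effect described in this commit can be illustrated with a toy model. This is a minimal sketch, assuming a simple second-chance eviction scheme; it is not the kernel's bpf_lru_list code, and `ToyLruMap`, `lookup_datapath`, `lookup_syscall` and `evicted_after_dump` are hypothetical names used only for illustration.

```python
from collections import OrderedDict


class ToyLruMap:
    """Second-chance LRU model; not the kernel's LRU implementation."""

    def __init__(self, mark_on_syscall):
        self.mark_on_syscall = mark_on_syscall
        self.entries = OrderedDict()  # key -> [value, ref_bit]

    def update(self, key, value):
        self.entries[key] = [value, False]

    def lookup_datapath(self, key):
        # Models a lookup from a BPF program: a hit marks the node as in use.
        entry = self.entries[key]
        entry[1] = True
        return entry[0]

    def lookup_syscall(self, key):
        # Models a lookup from user space: marking is switchable in this toy.
        entry = self.entries[key]
        if self.mark_on_syscall:
            entry[1] = True
        return entry[0]

    def evict_one(self):
        # Referenced entries get a second chance; the first cold one is evicted.
        while True:
            key, entry = next(iter(self.entries.items()))
            if entry[1]:
                entry[1] = False
                self.entries.move_to_end(key)
            else:
                del self.entries[key]
                return key


def evicted_after_dump(mark_on_syscall):
    lru = ToyLruMap(mark_on_syscall)
    for key in ("a", "b", "c"):
        lru.update(key, key.upper())
    lru.lookup_datapath("a")
    lru.lookup_datapath("c")  # "b" is now the only cold entry
    for key in list(lru.entries):
        lru.lookup_syscall(key)  # user-space walk over the whole map
    return lru.evict_one()
```

In this model, a walk that marks entries leaves every node referenced, so the eviction falls on the hot entry "a"; without marking, the cold entry "b" is evicted as intended.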
    • bpf: add map_lookup_elem_sys_only for lookups from syscall side · 45b56138
      Daniel Borkmann authored
      commit c6110222 upstream.
      
      Add a callback map_lookup_elem_sys_only() that map implementations
      could use over map_lookup_elem() from system call side in case the
      map implementation needs to handle the latter differently than from
      the BPF data path. If map_lookup_elem_sys_only() is set, this will
      be preferred pick for map lookups out of user space. This hook is
      used in a follow-up fix for LRU map, but once development window
      opens, we can convert other map types from map_lookup_elem() (here,
      the one called upon BPF_MAP_LOOKUP_ELEM cmd is meant) over to use
      the callback to simplify and clean up the latter.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
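The dispatch rule described in this commit (prefer the syscall-only callback when it is set) can be sketched as follows. This is an illustrative user-space model; `ToyMapOps` and `syscall_lookup` are hypothetical names, and only the two callback names come from the commit text.

```python
class ToyMapOps:
    """Stand-in for a map's callback table; field names mirror the commit text."""

    def __init__(self, map_lookup_elem, map_lookup_elem_sys_only=None):
        self.map_lookup_elem = map_lookup_elem
        self.map_lookup_elem_sys_only = map_lookup_elem_sys_only


def syscall_lookup(ops, key):
    # Prefer the syscall-only hook when the map type provides one.
    if ops.map_lookup_elem_sys_only is not None:
        return ops.map_lookup_elem_sys_only(key)
    return ops.map_lookup_elem(key)


# A map type without the hook, and one that sets it (as the LRU fix does).
plain = ToyMapOps(lambda key: ("datapath", key))
lru = ToyMapOps(lambda key: ("datapath", key), lambda key: ("sys_only", key))
```

Map types that do not set the hook keep their existing behavior, which is why the callback can be introduced without touching them.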
    • bpf: relax inode permission check for retrieving bpf program · 832f7c33
      Chenbo Feng authored
      commit e547ff3f upstream.
      
      For the iptables module to load a bpf program from a pinned location, it
      only retrieves a loaded program and cannot change the program content, so
      requiring write permission for it might not be necessary.
      Also, when adding or removing an unrelated iptables rule, it might need to
      flush and reload the xt_bpf related rules as well, which triggers the inode
      permission check. It might be better to remove the write permission
      check for the inode so we won't need to grant write access to all the
      processes that flush and restore iptables rules.
      Signed-off-by: Chenbo Feng <fengc@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • driver core: Postpone DMA tear-down until after devres release for probe failure · 04bdef83
      John Garry authored
      commit 0b777eee upstream.
      
      In commit 376991db ("driver core: Postpone DMA tear-down until after
      devres release"), we changed the ordering of tearing down the device DMA
      ops and releasing all the device's resources; this was because the DMA ops
      should be maintained until we release the device's managed DMA memories.
      
      However, we have seen another crash on an arm64 system when a
      device driver probe fails:
      
        hisi_sas_v3_hw 0000:74:02.0: Adding to iommu group 2
        scsi host1: hisi_sas_v3_hw
        BUG: Bad page state in process swapper/0  pfn:313f5
        page:ffff7e0000c4fd40 count:1 mapcount:0
        mapping:0000000000000000 index:0x0
        flags: 0xfffe00000001000(reserved)
        raw: 0fffe00000001000 ffff7e0000c4fd48 ffff7e0000c4fd48 0000000000000000
        raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
        page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
        bad because of flags: 0x1000(reserved)
        Modules linked in:
        CPU: 49 PID: 1 Comm: swapper/0 Not tainted 5.1.0-rc1-43081-g22d97fd-dirty #1433
        Hardware name: Huawei D06/D06, BIOS Hisilicon D06 UEFI RC0 - V1.12.01 01/29/2019
        Call trace:
        dump_backtrace+0x0/0x118
        show_stack+0x14/0x1c
        dump_stack+0xa4/0xc8
        bad_page+0xe4/0x13c
        free_pages_check_bad+0x4c/0xc0
        __free_pages_ok+0x30c/0x340
        __free_pages+0x30/0x44
        __dma_direct_free_pages+0x30/0x38
        dma_direct_free+0x24/0x38
        dma_free_attrs+0x9c/0xd8
        dmam_release+0x20/0x28
        release_nodes+0x17c/0x220
        devres_release_all+0x34/0x54
        really_probe+0xc4/0x2c8
        driver_probe_device+0x58/0xfc
        device_driver_attach+0x68/0x70
        __driver_attach+0x94/0xdc
        bus_for_each_dev+0x5c/0xb4
        driver_attach+0x20/0x28
        bus_add_driver+0x14c/0x200
        driver_register+0x6c/0x124
        __pci_register_driver+0x48/0x50
        sas_v3_pci_driver_init+0x20/0x28
        do_one_initcall+0x40/0x25c
        kernel_init_freeable+0x2b8/0x3c0
        kernel_init+0x10/0x100
        ret_from_fork+0x10/0x18
        Disabling lock debugging due to kernel taint
        BUG: Bad page state in process swapper/0  pfn:313f6
        page:ffff7e0000c4fd80 count:1 mapcount:0
        mapping:0000000000000000 index:0x0
        flags: 0xfffe00000001000(reserved)
        raw: 0fffe00000001000 ffff7e0000c4fd88 ffff7e0000c4fd88 0000000000000000
        raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
      
      The crash occurs for the same reason.
      
      In this case, on the really_probe() failure path, we are still clearing
      the DMA ops prior to releasing the device's managed memories.
      
      This patch fixes this issue by reordering the DMA ops teardown and the
      call to devres_release_all() on the failure path.
      Reported-by: Xiang Chen <chenxiang66@hisilicon.com>
      Tested-by: Xiang Chen <chenxiang66@hisilicon.com>
      Signed-off-by: John Garry <john.garry@huawei.com>
      Reviewed-by: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
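The ordering problem on the probe failure path can be modeled in a few lines. This is a hedged sketch: `ToyDevice`, `teardown_dma_ops` and `probe_failure_cleanup` are hypothetical names, and the model only records which DMA ops were in place when each managed resource was released.

```python
class ToyDevice:
    """Toy device with one managed DMA allocation; not a driver-core model."""

    def __init__(self):
        self.dma_ops = "direct"
        self.managed = ["dma-buffer"]
        self.release_log = []

    def devres_release_all(self):
        # Each managed resource is freed through whatever DMA ops are set now.
        for resource in self.managed:
            self.release_log.append((resource, self.dma_ops))
        self.managed.clear()

    def teardown_dma_ops(self):
        self.dma_ops = None


def probe_failure_cleanup(dev, fixed):
    if fixed:
        dev.devres_release_all()  # release while the DMA ops are still valid
        dev.teardown_dma_ops()
    else:
        dev.teardown_dma_ops()    # old order: ops are gone before the release
        dev.devres_release_all()
    return dev.release_log
```

With the old order the managed buffer is released after the ops were cleared, which in the kernel means it is freed through the wrong path; the reordered cleanup releases it while the original ops are still set.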
    • dmaengine: imx-sdma: Only check ratio on parts that support 1:1 · 40aab199
      Angus Ainslie (Purism) authored
      commit 941acd56 upstream.
      
      On the imx8mq B0 chip, an AHB/SDMA clock ratio of 2:1 can't be supported:
      the SDMA clock has to be increased to 250MHz, and AHB can't reach
      500MHz, so use 1:1 instead.
      
      To limit this change to the imx8mq for now, this patch also adds an
      imx8mq-sdma compatible string.
      Signed-off-by: Angus Ainslie (Purism) <angus@akkea.ca>
      Acked-by: Robin Gong <yibin.gong@nxp.com>
      Signed-off-by: Vinod Koul <vkoul@kernel.org>
      Cc: Richard Leitner <richard.leitner@skidata.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • md/raid: raid5 preserve the writeback action after the parity check · 859c4c28
      Nigel Croxon authored
      commit b2176a1d upstream.
      
      The problem is that any 'uptodate' vs 'disks' check is not precise
      in this path. Put a "WARN_ON(!test_bit(R5_UPTODATE, &dev->flags))" on the
      device that might try to kick off writes and then skip the action.
      Better to prevent the raid driver from taking unexpected action *and* keep
      the system alive vs killing the machine with BUG_ON.
      
      Note: fixed warning reported by kbuild test robot <lkp@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Nigel Croxon <ncroxon@redhat.com>
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Revert "Don't jump to compute_result state from check_result state" · c42adfe6
      Song Liu authored
      commit a25d8c32 upstream.
      
      This reverts commit 4f4fd7c5.
      
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Nigel Croxon <ncroxon@redhat.com>
      Cc: Xiao Ni <xni@redhat.com>
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • dm: make sure to obey max_io_len_target_boundary · 871e122d
      Michael Lass authored
      commit 51b86f9a upstream.
      
      Commit 61697a6a ("dm: eliminate 'split_discard_bios' flag from DM
      target interface") incorrectly removed code from
      __send_changing_extent_only() that is required to impose a per-target IO
      boundary on IO that exceeds max_io_len_target_boundary().  Otherwise
      "special" IO (e.g. DISCARD, WRITE SAME, WRITE ZEROES) can write beyond
      where allowed.
      
      Fix this by restoring the max_io_len_target_boundary() limit in
      __send_changing_extent_only().
      
      Fixes: 61697a6a ("dm: eliminate 'split_discard_bios' flag from DM target interface")
      Cc: stable@vger.kernel.org # 5.1+
      Signed-off-by: Michael Lass <bevan@bi-co.net>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
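The restored limit amounts to clamping the length of a "special" IO to the sectors that remain in the target. The sketch below assumes that the boundary is simply the distance from the start sector to the end of the target; the parameter names and `clamped_extent_len` are hypothetical, and sectors are plain integers.

```python
def max_io_len_target_boundary(sector, target_begin, target_len):
    # Sectors left between `sector` and the end of the target (assumed model).
    return target_len - (sector - target_begin)


def clamped_extent_len(sector, sector_count, target_begin, target_len):
    # A DISCARD or WRITE ZEROES must not extend past the target it was sent to.
    boundary = max_io_len_target_boundary(sector, target_begin, target_len)
    return min(sector_count, boundary)
```

For a 100-sector target starting at sector 0, a 50-sector discard at sector 90 is cut down to the 10 sectors that remain, while a request that fits is left unchanged.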
    • fuse: Add FOPEN_STREAM to use stream_open() · 5bb2a758
      Kirill Smelkov authored
      commit bbd84f33 upstream.
      
      Starting from commit 9c225f26 ("vfs: atomic f_pos accesses as per
      POSIX"), even files opened via nonseekable_open gate read and write via a
      lock and do not allow them to run simultaneously. This can create a read vs
      write deadlock if a filesystem is trying to implement a socket-like file
      which is intended to be simultaneously used for both read and write from
      filesystem client.  See commit 10dce8af ("fs: stream_open - opener for
      stream-like files so that read and write can run simultaneously without
      deadlock") for details and e.g. commit 581d21a2 ("xenbus: fix deadlock
      on writes to /proc/xen/xenbus") for a similar deadlock example on
      /proc/xen/xenbus.
      
      To avoid such deadlock it was tempting to adjust fuse_finish_open to use
      stream_open instead of nonseekable_open on just FOPEN_NONSEEKABLE flags,
      but grepping through Debian codesearch shows users of FOPEN_NONSEEKABLE,
      and in particular GVFS which actually uses offset in its read and write
      handlers
      
      	https://codesearch.debian.net/search?q=-%3Enonseekable+%3D
      	https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1080
      	https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1247-1346
      	https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1399-1481
      
      so such a change would break a real user.
      
      Add another flag (FOPEN_STREAM) for filesystem servers to indicate that the
      opened handle has stream-like semantics: it does not use the file position,
      and thus the kernel is free to issue simultaneous read and write requests on
      the opened file handle.
      
      This patch together with stream_open() should be added to stable kernels
      starting from v3.14+. This will allow patching OSSPD and other FUSE
      filesystems that provide stream-like files to return FOPEN_STREAM |
      FOPEN_NONSEEKABLE in open handler and this way avoid the deadlock on all
      kernel versions. This should work because fuse_finish_open ignores unknown
      open flags returned from a filesystem and so passing FOPEN_STREAM to a
      kernel that is not aware of this flag cannot hurt. In turn the kernel that
      is not aware of FOPEN_STREAM will be < v3.14 where just FOPEN_NONSEEKABLE
      is sufficient to implement streams without read vs write deadlock.
      
      Cc: stable@vger.kernel.org # v3.14+
      Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
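The compatibility argument in this commit (unknown open flags are ignored, so returning both flags is safe everywhere) can be sketched as a small decision function. The bit values below are assumptions believed to match the FUSE uapi header, and `pick_opener` with its `kernel_knows_stream` parameter is a hypothetical model, not the kernel's fuse_finish_open().

```python
FOPEN_NONSEEKABLE = 1 << 2  # assumed to match the FUSE uapi header
FOPEN_STREAM = 1 << 4       # assumed to match the FUSE uapi header


def pick_opener(open_flags, kernel_knows_stream):
    # A kernel only acts on the flag bits it knows about; others are ignored.
    known = FOPEN_NONSEEKABLE
    if kernel_knows_stream:
        known |= FOPEN_STREAM
    flags = open_flags & known
    if flags & FOPEN_STREAM:
        return "stream_open"
    if flags & FOPEN_NONSEEKABLE:
        return "nonseekable_open"
    return "default"
```

A server that returns FOPEN_STREAM | FOPEN_NONSEEKABLE gets stream_open on a patched kernel and falls back to nonseekable_open on one that does not know the new flag, while existing FOPEN_NONSEEKABLE users such as GVFS keep their current behavior.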
    • dm mpath: always free attached_handler_name in parse_path() · 2f961166
      Martin Wilck authored
      commit 940bc471 upstream.
      
      Commit b592211c ("dm mpath: fix attached_handler_name leak and
      dangling hw_handler_name pointer") fixed a memory leak for the case
      where setup_scsi_dh() returns failure. But setup_scsi_dh may return
      success and not "use" attached_handler_name if the
      retain_attached_hwhandler flag is not set on the map. As setup_scsi_dh
      properly "steals" the pointer by nullifying it, freeing it
      unconditionally in parse_path() is safe.
      
      Fixes: b592211c ("dm mpath: fix attached_handler_name leak and dangling hw_handler_name pointer")
      Cc: stable@vger.kernel.org
      Reported-by: Yufen Yu <yuyufen@huawei.com>
      Signed-off-by: Martin Wilck <mwilck@suse.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • dm ioctl: fix hang in early create error condition · 2b62694b
      Helen Koike authored
      commit 0f41fcf7 upstream.
      
      The dm_early_create() function (which deals with "dm-mod.create=" kernel
      command line option) calls dm_hash_insert(), which takes an extra reference
      to the md object.
      
      In case of failure, this reference wasn't being released, causing
      dm_destroy() to hang, thus hanging the whole boot process.
      
      Fix this by calling __hash_remove() in the error path.
      
      Fixes: 6bbc923d ("dm: add support to directly boot to a mapped device")
      Cc: stable@vger.kernel.org
      Signed-off-by: Helen Koike <helen.koike@collabora.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • dm integrity: correctly calculate the size of metadata area · c318890d
      Mikulas Patocka authored
      commit 30bba430 upstream.
      
      When we use separate devices for data and metadata, dm-integrity would
      incorrectly calculate the size of the metadata device as if it had
      512-byte block size - and it would refuse activation with larger block
      size and smaller metadata device.
      
      Fix this so that it takes actual block size into account, which fixes
      the following reported issue:
      https://gitlab.com/cryptsetup/cryptsetup/issues/450
      
      Fixes: 356d9d52 ("dm integrity: allow separate metadata device")
      Cc: stable@vger.kernel.org # v4.19+
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • dm crypt: move detailed message into debug level · 743bcc1a
      Milan Broz authored
      commit 7a1cd723 upstream.
      
      The information about tag size should not be printed without debug info
      set. Also print device major:minor in the error message to identify the
      device instance.
      
      Also use rate limiting and debug level for info about used crypto API
      implementation.  This is important because during online reencryption
      the existing message saturates syslog (because we are moving hotzone
      across the whole device).
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Milan Broz <gmazyland@gmail.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • dm delay: fix a crash when invalid device is specified · 9611f37c
      Mikulas Patocka authored
      commit 81bc6d15 upstream.
      
      When the target line contains an invalid device, delay_ctr() will call
      delay_dtr() with NULL workqueue.  Attempting to destroy the NULL
      workqueue causes a crash.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • dm init: fix max devices/targets checks · 7ea065d6
      Helen Koike authored
      commit 8e890c1a upstream.
      
      dm-init should allow up to DM_MAX_{DEVICES,TARGETS} for devices/targets,
      and not DM_MAX_{DEVICES,TARGETS} - 1.
      
      Fix the checks and also fix the error message when the number of devices
      is surpassed.
      
      Fixes: 6bbc923d ("dm: add support to directly boot to a mapped device")
      Cc: stable@vger.kernel.org
      Signed-off-by: Helen Koike <helen.koike@collabora.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
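The off-by-one described in this commit can be reproduced with a counter compared against a limit. This is a sketch under the assumption that the check increments a counter per parsed item; `MAX_DEVICES` is a hypothetical small stand-in for DM_MAX_DEVICES, and `accepts` is an illustrative name.

```python
MAX_DEVICES = 4  # hypothetical small limit standing in for DM_MAX_DEVICES


def accepts(count, fixed):
    """Return True if `count` devices pass the limit check."""
    seen = 0
    for _ in range(count):
        seen += 1
        # Old check rejects at the limit itself; fixed check only beyond it.
        too_many = seen > MAX_DEVICES if fixed else seen >= MAX_DEVICES
        if too_many:
            return False
    return True
```

With the old comparison exactly MAX_DEVICES entries are rejected; with the corrected one they are accepted and only MAX_DEVICES + 1 fails.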
    • dm zoned: Fix zone report handling · 29374659
      Damien Le Moal authored
      commit 7aedf75f upstream.
      
      The function blkdev_report_zones() returns success even if no zone
      information is reported (empty report). Empty zone reports can only
      happen if the report start sector passed exceeds the device capacity.
      The conditions for this to happen are either a bug in the caller code,
      or, a change in the device that forced the low level driver to change
      the device capacity to a value that is lower than the report start
      sector. This situation includes a failed disk revalidation resulting in
      the disk capacity being changed to 0.
      
      If this change happens while dm-zoned is in its initialization phase
      executing dmz_init_zones(), this function may enter an infinite loop
      and hang the system. To avoid this, add a check to disallow empty zone
      reports and bail out early. Also fix the function dmz_update_zone() to
      make sure that the report for the requested zone was correctly obtained.
      
      Fixes: 3b1a94c8 ("dm zoned: drive-managed zoned block device target")
      Cc: stable@vger.kernel.org
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Reviewed-by: Shaun Tancheff <shaun@tancheff.com>
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
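The hang comes from a loop that advances by the number of zones reported, so an empty report makes no progress. Below is a minimal sketch of such a loop with the early bail-out; `init_zones`, `healthy_device` and `shrunk_device` are hypothetical names, and zones are represented by plain integers.

```python
def init_zones(report_zones, capacity, zone_size):
    """Collect zones by repeatedly asking for a report at the next sector."""
    zones = []
    sector = 0
    while sector < capacity:
        batch = report_zones(sector)
        if not batch:
            # Without this check the loop would spin forever on empty reports.
            raise OSError("empty zone report")
        zones.extend(batch)
        sector += len(batch) * zone_size
    return zones


def healthy_device(sector):
    # Reports two 8-sector zones per call, identified by zone number.
    first = sector // 8
    return [first, first + 1]


def shrunk_device(sector):
    # Models a device whose capacity dropped to 0: every report is empty.
    return []
```

Calling `init_zones(healthy_device, 32, 8)` walks the device in two reports, while `init_zones(shrunk_device, 32, 8)` fails fast instead of looping.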
    • dm cache metadata: Fix loading discard bitset · 9a467ab5
      Nikos Tsironis authored
      commit e28adc3b upstream.
      
      Add missing dm_bitset_cursor_next() to properly advance the bitset
      cursor.
      
      Otherwise, the discarded state of all blocks is set according to the
      discarded state of the first block.
      
      Fixes: ae4a46a1 ("dm cache metadata: use bitset cursor api to load discard bitset")
      Cc: stable@vger.kernel.org
      Signed-off-by: Nikos Tsironis <ntsironis@arrikto.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
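The symptom (every block inheriting the first block's discard state) follows directly from a cursor that is never advanced. The sketch below uses a hypothetical `ToyBitsetCursor` rather than the real dm bitset cursor API; only the idea of an explicit "next" step is taken from the commit.

```python
class ToyBitsetCursor:
    """Minimal cursor over a list of bits; not the dm_bitset_cursor API."""

    def __init__(self, bits):
        self.bits = bits
        self.index = 0

    def get_value(self):
        return self.bits[self.index]

    def next(self):
        self.index += 1


def load_discards(bits, advance_cursor):
    cursor = ToyBitsetCursor(bits)
    loaded = []
    for block in range(len(bits)):
        loaded.append(cursor.get_value())
        # The fix corresponds to this step: move on before the next block.
        if advance_cursor and block + 1 < len(bits):
            cursor.next()
    return loaded
```

Loading `[True, False, True]` without advancing yields `[True, True, True]`; with the cursor advanced the original pattern is preserved.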
    • PCI: Work around Pericom PCIe-to-PCI bridge Retrain Link erratum · a04b9369
      Stefan Mätje authored
      commit 4ec73791 upstream.
      
      Due to an erratum in some Pericom PCIe-to-PCI bridges in reverse mode
      (conventional PCI on primary side, PCIe on downstream side), the Retrain
      Link bit needs to be cleared manually to allow the link training to
      complete successfully.
      
      If it is not cleared manually, the link training is continuously restarted
      and no devices below the PCI-to-PCIe bridge can be accessed.  That means
      drivers for devices below the bridge will be loaded but won't work and may
      even crash because the driver is only reading 0xffff.
      
      See the Pericom Errata Sheet PI7C9X111SLB_errata_rev1.2_102711.pdf for
      details.  Devices known as affected so far are: PI7C9X110, PI7C9X111SL,
      PI7C9X130.
      
      Add a new flag, clear_retrain_link, in struct pci_dev.  Quirks for affected
      devices set this bit.
      
      Note that pcie_retrain_link() lives in aspm.c because that's currently the
      only place we use it, but this erratum is not specific to ASPM, and we may
      retrain links for other reasons in the future.
      Signed-off-by: Stefan Mätje <stefan.maetje@esd.eu>
      [bhelgaas: apply regardless of CONFIG_PCIEASPM]
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      CC: stable@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
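The workaround boils down to one extra register write on affected bridges. The sketch below models the sequence of Link Control values written during a retrain; `retrain_link_writes` is a hypothetical helper, the `clear_retrain_link` name comes from the commit text, and the Retrain Link bit position (bit 5) is taken from the PCIe Link Control register layout.

```python
PCI_EXP_LNKCTL_RL = 0x0020  # Retrain Link bit in the Link Control register


def retrain_link_writes(lnkctl, clear_retrain_link):
    """Return the Link Control values written while retraining the link."""
    writes = []
    lnkctl |= PCI_EXP_LNKCTL_RL
    writes.append(lnkctl)  # start link retraining
    if clear_retrain_link:
        # Affected Pericom bridges need the bit cleared by software.
        lnkctl &= ~PCI_EXP_LNKCTL_RL
        writes.append(lnkctl)
    return writes
```

On a device without the quirk there is a single write that sets the bit; with the quirk a second write restores the previous value so that training can complete.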
    • PCI: Factor out pcie_retrain_link() function · 989d2c41
      Stefan Mätje authored
      commit 86fa6a34 upstream.
      
      Factor out pcie_retrain_link() to use for Pericom Retrain Link quirk.  No
      functional change intended.
      Signed-off-by: Stefan Mätje <stefan.maetje@esd.eu>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      CC: stable@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • PCI: rcar: Add the initialization of PCIe link in resume_noirq() · 0fcad44e
      Kazufumi Ikeda authored
      commit be20bbcb upstream.
      
      Reestablish the PCIe link very early in the resume process in case it
      went down to prevent PCI accesses from hanging the bus. Such accesses
      can happen early in the PCI resume process, as early as the
      SUSPEND_RESUME_NOIRQ step, thus the link must be reestablished in the
      driver resume_noirq() callback.
      
      Fixes: e015f88c ("PCI: rcar: Add support for R-Car H3 to pcie-rcar")
      Signed-off-by: Kazufumi Ikeda <kaz-ikeda@xc.jp.nec.com>
      Signed-off-by: Gaku Inami <gaku.inami.xw@bp.renesas.com>
      Signed-off-by: Marek Vasut <marek.vasut+renesas@gmail.com>
      [lorenzo.pieralisi@arm.com: reformatted commit log]
      Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
      Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Acked-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
      Cc: stable@vger.kernel.org
      Cc: Geert Uytterhoeven <geert+renesas@glider.be>
      Cc: Phil Edworthy <phil.edworthy@renesas.com>
      Cc: Simon Horman <horms+renesas@verge.net.au>
      Cc: Wolfram Sang <wsa@the-dreams.de>
      Cc: linux-renesas-soc@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • PCI/AER: Change pci_aer_init() stub to return void · 8bdfb57a
      Jisheng Zhang authored
      commit 31f996ef upstream.
      
      Commit 60ed982a ("PCI/AER: Move internal declarations to
      drivers/pci/pci.h") changed pci_aer_init() to return "void", but didn't
      change the stub for when CONFIG_PCIEAER isn't enabled.  Change the stub to
      match.
      
      Fixes: 60ed982a ("PCI/AER: Move internal declarations to drivers/pci/pci.h")
      Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      CC: stable@vger.kernel.org	# v4.19+
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • PCI: Init PCIe feature bits for managed host bridge alloc · c1f52d4b
      Jean-Philippe Brucker authored
      commit 6302bf3e upstream.
      
      Two functions allocate a host bridge: devm_pci_alloc_host_bridge() and
      pci_alloc_host_bridge().  At the moment, only the unmanaged one initializes
      the PCIe feature bits, which prevents the use of features such as hotplug
      or AER on some systems, when booting with device tree.  Make the
      initialization code common.
      
      Fixes: 02bfeb48 ("PCI/portdrv: Simplify PCIe feature permission checking")
      Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      CC: stable@vger.kernel.org	# v4.17+
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • PCI: Reset Lenovo ThinkPad P50 nvgpu at boot if necessary · 3266b109
      Lyude Paul authored
      commit e0547c81 upstream.
      
      On ThinkPad P50 SKUs with an Nvidia Quadro M1000M instead of the M2000M
      variant, the BIOS does not always reset the secondary Nvidia GPU during
      reboot if the laptop is configured in Hybrid Graphics mode.  The reason is
      unknown, but the following steps and possibly a good bit of patience will
      reproduce the issue:
      
        1. Boot up the laptop normally in Hybrid Graphics mode
        2. Make sure nouveau is loaded and that the GPU is awake
        3. Allow the Nvidia GPU to runtime suspend itself after being idle
        4. Reboot the machine, the more sudden the better (e.g. sysrq-b may help)
        5. If nouveau loads up properly, reboot the machine again and go back to
           step 2 until you reproduce the issue
      
      This results in some very strange behavior: the GPU will be left in exactly
      the same state it was in when the previously booted kernel started the
      reboot.  This has all sorts of bad side effects: for starters, this
      completely breaks nouveau starting with a mysterious EVO channel failure
      that happens well before we've actually used the EVO channel for anything:
      
        nouveau 0000:01:00.0: disp: chid 0 mthd 0000 data 00000400 00001000 00000002
      
      This causes a timeout trying to bring up the GR ctx:
      
        nouveau 0000:01:00.0: timeout
        WARNING: CPU: 0 PID: 12 at drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgf100.c:1547 gf100_grctx_generate+0x7b2/0x850 [nouveau]
        Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET82W (1.55 ) 12/18/2018
        Workqueue: events_long drm_dp_mst_link_probe_work [drm_kms_helper]
        ...
        nouveau 0000:01:00.0: gr: wait for idle timeout (en: 1, ctxsw: 0, busy: 1)
        nouveau 0000:01:00.0: gr: wait for idle timeout (en: 1, ctxsw: 0, busy: 1)
        nouveau 0000:01:00.0: fifo: fault 01 [WRITE] at 0000000000008000 engine 00 [GR] client 15 [HUB/SCC_NB] reason c4 [] on channel -1 [0000000000 unknown]
      
      The GPU never manages to recover.  Booting without loading nouveau causes
      issues as well, since the GPU starts sending spurious interrupts that cause
      other devices' IRQs to get disabled by the kernel:
      
        irq 16: nobody cared (try booting with the "irqpoll" option)
        ...
        handlers:
        [<000000007faa9e99>] i801_isr [i2c_i801]
        Disabling IRQ #16
        ...
        serio: RMI4 PS/2 pass-through port at rmi4-00.fn03
        i801_smbus 0000:00:1f.4: Timeout waiting for interrupt!
        i801_smbus 0000:00:1f.4: Transaction timeout
        rmi4_f03 rmi4-00.fn03: rmi_f03_pt_write: Failed to write to F03 TX register (-110).
        i801_smbus 0000:00:1f.4: Timeout waiting for interrupt!
        i801_smbus 0000:00:1f.4: Transaction timeout
        rmi4_physical rmi4-00: rmi_driver_set_irq_bits: Failed to change enabled interrupts!
      
      This causes the touchpad and sometimes other things to get disabled.
      
      Since this happens without nouveau, we can't fix this problem from nouveau
      itself.
      
      Add a PCI quirk for the specific P50 variant of this GPU.  Make sure the
      GPU is advertising NoReset- so we don't reset the GPU when the machine is
      in Dedicated graphics mode (where the GPU being initialized by the BIOS is
      normal and expected).  Map the GPU MMIO space and read the magic 0x2240c
      register, which will have bit 1 set if the device was POSTed during a
      previous boot.  Once we've confirmed all of this, reset the GPU and
      re-disable it - bringing it back to a healthy state.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=203003
      Link: https://lore.kernel.org/lkml/20190212220230.1568-1-lyude@redhat.com
      Signed-off-by: Lyude Paul <lyude@redhat.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Cc: nouveau@lists.freedesktop.org
      Cc: dri-devel@lists.freedesktop.org
      Cc: Karol Herbst <kherbst@redhat.com>
      Cc: Ben Skeggs <skeggsb@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
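The decision at the heart of this quirk can be summarized as a predicate. This is a simplified sketch: `needs_reset` and `reset_supported` are hypothetical names, the 0x2240c offset and "bit 1" come from the commit text, and the real quirk additionally matches the specific P50 variant and maps the GPU's MMIO space before reading the register.

```python
POSTED_BIT = 1 << 1  # "bit 1" of the 0x2240c register, per the commit text


def needs_reset(reset_supported, reg_2240c):
    """Decide whether the GPU carries stale state from a previous boot."""
    if not reset_supported:
        # Never reset when the device does not offer a usable reset.
        return False
    # Bit 1 set means the device was POSTed during a previous boot.
    return bool(reg_2240c & POSTED_BIT)
```

Only a GPU that can be reset and that reports having been POSTed is reset; a freshly powered device with the bit clear is left alone.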
    • PCI: Mark Atheros AR9462 to avoid bus reset · 07c4e6b5
      James Prestwood authored
      commit 6afb7e26 upstream.
      
      When using PCI passthrough with this device, the host machine locks up
      completely when starting the VM, requiring a hard reboot.  Add a quirk to
      avoid bus resets on this device.
      
      Fixes: c3e59ee4 ("PCI: Mark Atheros AR93xx to avoid bus reset")
      Link: https://lore.kernel.org/linux-pci/20190107213248.3034-1-james.prestwood@linux.intel.com
      Signed-off-by: James Prestwood <james.prestwood@linux.intel.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      CC: stable@vger.kernel.org	# v3.14+
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Nikolai Kostrigin's avatar
      PCI: Mark AMD Stoney Radeon R7 GPU ATS as broken · 978bb187
      Nikolai Kostrigin authored
      commit d28ca864 upstream.
      
      ATS is broken on the Radeon R7 GPU (at least for Stoney Ridge based laptop)
      and causes IOMMU stalls and system failure.  Disable ATS on these devices
      to make them usable again with IOMMU enabled.
      
      Thanks to Joerg Roedel <jroedel@suse.de> for help.
      
      [bhelgaas: In the email thread mentioned below, Alex suspects the real
      problem is in sbios or iommu, so it may affect only certain systems, and it
      may affect other devices in those systems as well.  However, per Joerg we
      lack the ability to debug further, so this quirk is the best we can do for
      now.]
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=194521
      Link: https://lore.kernel.org/lkml/20190408103725.30426-1-nickel@altlinux.org
      Fixes: 9b44b0b0 ("PCI: Mark AMD Stoney GPU ATS as broken")
      Signed-off-by: Nikolai Kostrigin <nickel@altlinux.org>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Acked-by: Joerg Roedel <jroedel@suse.de>
      CC: stable@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      978bb187
    • Yifeng Li's avatar
      fbdev: sm712fb: fix crashes and garbled display during DPMS modesetting · 24f7e248
      Yifeng Li authored
      commit f627caf5 upstream.
      
      On a Thinkpad s30 (Pentium III / i440MX, Lynx3DM), blanking the display
      or starting the X server will crash and freeze the system, or garble the
      display.
      
      Experiments showed this problem can mostly be solved by adjusting the
      order of register writes. Also, sm712fb failed to consider the difference
      in clock frequency when unblanking the display, and programmed the SM712
      clock on SM720 as well.
      
      Fix them by adjusting the order of register writes, and adding an
      additional check for SM720 for programming the clock frequency.

      Signed-off-by: Yifeng Li <tomli@tomli.me>
      Tested-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
      Cc: Teddy Wang <teddy.wang@siliconmotion.com>
      Cc: <stable@vger.kernel.org>  # v4.4+
      Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      24f7e248
    • Yifeng Li's avatar
      fbdev: sm712fb: use 1024x768 by default on non-MIPS, fix garbled display · c0615dcf
      Yifeng Li authored
      commit 4ed7d2cc upstream.
      
      Loongson MIPS netbooks use 1024x600 LCD panels, which is the original
      target platform of this driver, but nearly all old x86 laptops have
      1024x768. Lighting 768 panels using 600's timings would partially
      garble the display. Since it's not possible to distinguish them reliably,
      we change the default to 768, but keep 600 as-is on MIPS.
      
      Further, earlier laptops, such as the IBM Thinkpad 240X, have an 800x600
      LCD panel, and this driver would probably garble their display. As we
      don't have one for testing, the original behavior of the driver is kept
      as-is, but the problem has been documented in the comments.

      Signed-off-by: Yifeng Li <tomli@tomli.me>
      Tested-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
      Cc: Teddy Wang <teddy.wang@siliconmotion.com>
      Cc: <stable@vger.kernel.org>  # v4.4+
      Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      c0615dcf
    • Yifeng Li's avatar
      fbdev: sm712fb: fix support for 1024x768-16 mode · ace34e8e
      Yifeng Li authored
      commit 6053d3a4 upstream.
      
      In order to support the 1024x600 panel on the Yeeloong Loongson MIPS
      laptop, the original 1024x768-16 table was modified to 1024x600-16,
      without keeping the original. This causes problems on x86 laptops, as
      1024x768-16 support was still claimed but not working.
      
      Fix it by introducing the 1024x768-16 mode.

      Signed-off-by: Yifeng Li <tomli@tomli.me>
      Tested-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
      Cc: Teddy Wang <teddy.wang@siliconmotion.com>
      Cc: <stable@vger.kernel.org>  # v4.4+
      Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      ace34e8e
    • Yifeng Li's avatar
      fbdev: sm712fb: fix crashes during framebuffer writes by correctly mapping VRAM · fb294796
      Yifeng Li authored
      commit 9e0e5999 upstream.
      
      On a Thinkpad s30 (Pentium III / i440MX, Lynx3DM), running fbtest or X
      will crash the machine instantly, because the VRAM/framebuffer is not
      mapped correctly.
      
      On SM712, the framebuffer starts at the beginning of address space, but
      SM720's framebuffer starts at the 1 MiB offset from the beginning. However,
      sm712fb fails to take this into account. As a result, writing to the
      framebuffer will destroy all the registers and kill the system immediately.
      Another problem is that the driver assumes 8 MiB of VRAM for SM720, but
      some SM720 systems, such as this IBM Thinkpad, only have 4 MiB of VRAM.
      
      Fix this problem by removing the hardcoded VRAM size, adding a function to
      query the amount of VRAM from register MCR76 on SM720, and adding proper
      framebuffer offset.
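
The mapping fix can be pictured as choosing a framebuffer offset per chip instead of assuming offset zero and a fixed size. A minimal sketch under stated assumptions: the enum, struct and function names are hypothetical, and the VRAM size is passed in by the caller rather than decoded from MCR76, because the commit message does not describe that register's layout.

```c
#include <stddef.h>

#define TOY_MIB ((size_t)1 << 20)

/* Hypothetical chip identifiers for this sketch. */
enum toy_chip { TOY_SM712, TOY_SM720 };

struct toy_fb_layout {
	size_t fb_offset; /* where the framebuffer starts in the mapped space */
	size_t fb_size;   /* usable VRAM in bytes, as detected by the caller */
};

/*
 * SM712: framebuffer at the beginning of the address space.
 * SM720: framebuffer at the 1 MiB offset from the beginning.
 */
static struct toy_fb_layout toy_fb_layout_for(enum toy_chip chip,
					      size_t vram_bytes)
{
	struct toy_fb_layout layout;

	layout.fb_offset = (chip == TOY_SM720) ? TOY_MIB : 0;
	layout.fb_size = vram_bytes; /* no hardcoded 4 MiB or 8 MiB here */
	return layout;
}
```

Writes that honor `fb_offset` stay clear of whatever sits below the framebuffer on SM720, which is the failure the paragraphs above describe.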
      
      Please note that the memory map may have additional problems on Big-Endian
      systems, which I do not have available for testing. But I highly suspect
      that the original code is also broken on Big-Endian machines for SM720, so
      at least we are not making the problem worse. Moreover, the driver also
      assumed SM710/SM712 has 4 MiB of VRAM, but a 2 MiB version exists as well
      and was used in earlier laptops, such as the IBM Thinkpad 240X; the driver
      would probably crash on them. I've never seen one of those machines and
      cannot fix it, but I have documented these problems in the comments.

      Signed-off-by: Yifeng Li <tomli@tomli.me>
      Tested-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
      Cc: Teddy Wang <teddy.wang@siliconmotion.com>
      Cc: <stable@vger.kernel.org>  # v4.4+
      Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      fb294796
    • Yifeng Li's avatar
      fbdev: sm712fb: fix boot screen glitch when sm712fb replaces VGA · 61f7fcb3
      Yifeng Li authored
      commit ec1587d5 upstream.
      
      When the machine is booted in VGA mode, loading sm712fb would cause
      a glitch of random pixels shown on the screen. To prevent it from
      happening, we first clear the entire framebuffer. We also need to
      stop calling smtcfb_setmode() during initialization, since the fbdev
      layer will call it for us later when it's ready.

      Signed-off-by: Yifeng Li <tomli@tomli.me>
      Tested-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
      Cc: Teddy Wang <teddy.wang@siliconmotion.com>
      Cc: <stable@vger.kernel.org>  # v4.4+
      Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      61f7fcb3
    • Yifeng Li's avatar
      fbdev: sm712fb: fix white screen of death on reboot, don't set CR3B-CR3F · c77769b7
      Yifeng Li authored
      commit 80690538 upstream.
      
      On a Thinkpad s30 (Pentium III / i440MX, Lynx3DM), rebooting with
      sm712fb framebuffer driver would cause a white screen of death on
      the next POST, presumably because the proper timings for the LCD panel
      were not reprogrammed properly by the BIOS.
      
      Experiments showed a few CRTC Scratch Registers, including CRT3D,
      CRT3E and CRT3F may be used internally by BIOS as some flags. CRT3B is
      a hardware testing register, we shouldn't mess with it. CRT3C has
      blanking signal and line compare control, which is not needed for this
      driver.
      
      Stop writing to CR3B-CR3F (a.k.a CRT3B-CRT3F) registers. Even if these
      registers don't have side effects on other systems, writing to them is
      still highly questionable.

      Signed-off-by: Yifeng Li <tomli@tomli.me>
      Tested-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
      Cc: Teddy Wang <teddy.wang@siliconmotion.com>
      Cc: <stable@vger.kernel.org>  # v4.4+
      Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      c77769b7
    • Yifeng Li's avatar
      fbdev: sm712fb: fix VRAM detection, don't set SR70/71/74/75 · 60c4e62c
      Yifeng Li authored
      commit dcf90705 upstream.
      
      On a Thinkpad s30 (Pentium III / i440MX, Lynx3DM), the amount of Video
      RAM is not detected correctly by the xf86-video-siliconmotion driver.
      This is because sm712fb overwrites the GPR71 Scratch Pad Register, which
      is set by BIOS on x86 and used to indicate amount of VRAM.
      
      Other Scratch Pad Registers, including GPR70/74/75, don't have the same
      side effect, but overwriting them is still questionable, as they are
      not related to modesetting.
      
      Stop writing to SR70/71/74/75 (a.k.a GPR70/71/74/75).

      Signed-off-by: Yifeng Li <tomli@tomli.me>
      Tested-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
      Cc: Teddy Wang <teddy.wang@siliconmotion.com>
      Cc: <stable@vger.kernel.org>  # v4.4+
      Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      60c4e62c
    • Yifeng Li's avatar
      fbdev: sm712fb: fix brightness control on reboot, don't set SR30 · 860bb76a
      Yifeng Li authored
      commit 5481115e upstream.
      
      On a Thinkpad s30 (Pentium III / i440MX, Lynx3DM), rebooting with
      sm712fb framebuffer driver would cause the role of brightness up/down
      button to swap.
      
      Experiments showed the FPR30 register caused this behavior. Moreover,
      even if this register doesn't have side effects on other systems,
      overwriting it is still highly questionable, since it was originally
      configured by the motherboard manufacturer by hardwiring pull-down
      resistors to indicate the type of LCD panel. We should not mess with
      it.
      
      Stop writing to the SR30 (a.k.a FPR30) register.

      Signed-off-by: Yifeng Li <tomli@tomli.me>
      Tested-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
      Cc: Teddy Wang <teddy.wang@siliconmotion.com>
      Cc: <stable@vger.kernel.org>  # v4.4+
      Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      860bb76a
    • Ard Biesheuvel's avatar
      fbdev/efifb: Ignore framebuffer memmap entries that lack any memory types · 6623270e
      Ard Biesheuvel authored
      commit f8585539 upstream.
      
      The following commit:
      
        38ac0287 ("fbdev/efifb: Honour UEFI memory map attributes when mapping the FB")
      
      updated the EFI framebuffer code to use memory mappings for the linear
      framebuffer that are permitted by the memory attributes described by the
      EFI memory map for the particular region, if the framebuffer happens to
      be covered by the EFI memory map (which is typically only the case for
      framebuffers in shared memory). This is required since non-x86 systems
      may require cacheable attributes for memory mappings that are shared
      with other masters (such as GPUs), and this information cannot be
      described by the Graphics Output Protocol (GOP) EFI protocol itself,
      and so we rely on the EFI memory map for this.
      
      As reported by James, this breaks some x86 systems:
      
        [ 1.173368] efifb: probing for efifb
        [ 1.173386] efifb: abort, cannot remap video memory 0x1d5000 @ 0xcf800000
        [ 1.173395] Trying to free nonexistent resource <00000000cf800000-00000000cf9d4bff>
        [ 1.173413] efi-framebuffer: probe of efi-framebuffer.0 failed with error -5
      
      The problem turns out to be that the memory map entry that describes the
      framebuffer has no memory attributes listed at all, and so we end up with
      a mem_flags value of 0x0.
      
      So work around this by ensuring that the memory map entry's attribute field
      has a sane value before using it to mask the set of usable attributes.
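
The workaround can be sketched as a guard in front of the masking step. This is a simplified model under stated assumptions, not the driver's code: the `TOY_` names and the function are hypothetical, and the bit values are chosen to mirror the UEFI cacheability attribute bits (UC, WC, WT, WB).

```c
#include <stdint.h>

/* Hypothetical stand-ins for the memory type attribute bits. */
#define TOY_MEM_UC 0x1ull
#define TOY_MEM_WC 0x2ull
#define TOY_MEM_WT 0x4ull
#define TOY_MEM_WB 0x8ull
#define TOY_MEM_TYPE_MASK \
	(TOY_MEM_UC | TOY_MEM_WC | TOY_MEM_WT | TOY_MEM_WB)

/*
 * Narrow the candidate mapping types by what the memory map entry
 * permits, but only if the entry lists at least one memory type.
 * An entry with no type bits carries no usable information, and
 * masking with it would leave a flags value of 0x0.
 */
static uint64_t toy_usable_mem_flags(uint64_t candidates, uint64_t entry_attr)
{
	if ((entry_attr & TOY_MEM_TYPE_MASK) == 0)
		return candidates;
	return candidates & entry_attr;
}
```

With an empty attribute field the candidates pass through unchanged, so the remap can still be attempted with the default mapping types.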

      Reported-by: James Hilliard <james.hilliard1@gmail.com>
      Tested-by: James Hilliard <james.hilliard1@gmail.com>
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: <stable@vger.kernel.org> # v4.19+
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: James Morse <james.morse@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Peter Jones <pjones@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-efi@vger.kernel.org
      Fixes: 38ac0287 ("fbdev/efifb: Honour UEFI memory map attributes when ...")
      Link: http://lkml.kernel.org/r/20190516213159.3530-2-ard.biesheuvel@linaro.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      6623270e
    • Dave Hansen's avatar
      x86/mpx, mm/core: Fix recursive munmap() corruption · bed41691
      Dave Hansen authored
      commit 5a28fc94 upstream.
      
      This is a bit of a mess, to put it mildly.  But, it's a bug
      that only seems to have shown up in 4.20 but wasn't noticed
      until now, because nobody uses MPX.
      
      MPX has the arch_unmap() hook inside of munmap() because MPX
      uses bounds tables that protect other areas of memory.  When
      memory is unmapped, there is also a need to unmap the MPX
      bounds tables.  Barring this, unused bounds tables can eat 80%
      of the address space.
      
      But, the recursive do_munmap() that gets called via arch_unmap()
      wreaks havoc with __do_munmap()'s state.  It can result in
      freeing populated page tables, accessing bogus VMA state,
      double-freed VMAs and more.
      
      See the "long story" further below for the gory details.
      
      To fix this, call arch_unmap() before __do_munmap() has a chance
      to do anything meaningful.  Also, remove the 'vma' argument
      and force the MPX code to do its own, independent VMA lookup.
      
      == UML / unicore32 impact ==
      
      Remove unused 'vma' argument to arch_unmap().  No functional
      change.
      
      I compile tested this on UML but not unicore32.
      
      == powerpc impact ==
      
      powerpc uses arch_unmap() to watch for munmap() on the
      VDSO and zero out 'current->mm->context.vdso_base'.  Moving
      arch_unmap() makes this happen earlier in __do_munmap().  But,
      'vdso_base' seems to only be used in perf and in the signal
      delivery that happens near the return to userspace.  I can not
      find any likely impact to powerpc, other than the zeroing
      happening a little earlier.
      
      powerpc does not use the 'vma' argument and is unaffected by
      its removal.
      
      I compile-tested a 64-bit powerpc defconfig.
      
      == x86 impact ==
      
      For the common success case this is functionally identical to
      what was there before.  For the munmap() failure case, it's
      possible that some MPX tables will be zapped for memory that
      continues to be in use.  But, this is an extraordinarily
      unlikely scenario and the harm would be that MPX provides no
      protection since the bounds table got reset (zeroed).
      
      I can't imagine anyone doing this:
      
      	ptr = mmap();
      	// use ptr
      	ret = munmap(ptr);
      	if (ret)
      		// oh, there was an error, I'll
      		// keep using ptr.
      
      Because if you're doing munmap(), you are *done* with the
      memory.  There's probably no good data in there _anyway_.
      
      This passes the original reproducer from Richard Biener as
      well as the existing mpx selftests/.
      
      The long story:
      
      munmap() has a couple of pieces:
      
       1. Find the affected VMA(s)
       2. Split the start/end one(s) if necessary
       3. Pull the VMAs out of the rbtree
       4. Actually zap the memory via unmap_region(), including
          freeing page tables (or queueing them to be freed).
       5. Fix up some of the accounting (like fput()) and actually
          free the VMA itself.
      
      This specific ordering was actually introduced by:
      
        dd2283f2 ("mm: mmap: zap pages with read mmap_sem in munmap")
      
      during the 4.20 merge window.  The previous __do_munmap() code
      was actually safe because the only thing after arch_unmap() was
      remove_vma_list().  arch_unmap() could not see 'vma' in the
      rbtree because it was detached, so it is not even capable of
      doing operations unsafe for remove_vma_list()'s use of 'vma'.
      
      Richard Biener reported a test that shows this in dmesg:
      
        [1216548.787498] BUG: Bad rss-counter state mm:0000000017ce560b idx:1 val:551
        [1216548.787500] BUG: non-zero pgtables_bytes on freeing mm: 24576
      
      What triggered this was the recursive do_munmap() called via
      arch_unmap().  It was freeing page tables that had not been
      properly zapped.
      
      But, the problem was bigger than this.  For one, arch_unmap()
      can free VMAs.  But, the calling __do_munmap() has variables
      that *point* to VMAs and obviously can't handle them just
      getting freed while the pointer is still in use.
      
      I tried a couple of things here.  First, I tried to fix the page
      table freeing problem in isolation, but I then found the VMA
      issue.  I also tried having the MPX code return a flag if it
      modified the rbtree which would force __do_munmap() to re-walk
      to restart.  That spiralled out of control in complexity pretty
      fast.
      
      Just moving arch_unmap() and accepting that the bonkers failure
      case might eat some bounds tables seems like the simplest viable
      fix.
      
      This was also reported in the following kernel bugzilla entry:
      
        https://bugzilla.kernel.org/show_bug.cgi?id=203123
      
      There are some reports that this commit triggered this bug:
      
        dd2283f2 ("mm: mmap: zap pages with read mmap_sem in munmap")
      
      While that commit certainly made the issues easier to hit, I believe
      the fundamental issue has been with us as long as MPX itself, thus
      the Fixes: tag below is for one of the original MPX commits.
      
      [ mingo: Minor edits to the changelog and the patch. ]

      Reported-by: Richard Biener <rguenther@suse.de>
      Reported-by: H.J. Lu <hjl.tools@gmail.com>
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: linux-arch@vger.kernel.org
      Cc: linux-mm@kvack.org
      Cc: linux-um@lists.infradead.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: stable@vger.kernel.org
      Fixes: dd2283f2 ("mm: mmap: zap pages with read mmap_sem in munmap")
      Link: http://lkml.kernel.org/r/20190419194747.5E1AD6DC@viggo.jf.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      bed41691
    • Nathan Chancellor's avatar
      objtool: Allow AR to be overridden with HOSTAR · 30e61ff2
      Nathan Chancellor authored
      commit 8ea58f1e upstream.
      
      Currently, this Makefile hardcodes GNU ar, meaning that if it is not
      available, there is no way to supply a different one and the build will
      fail.
      
        $ make AR=llvm-ar CC=clang LD=ld.lld HOSTAR=llvm-ar HOSTCC=clang \
               HOSTLD=ld.lld HOSTLDFLAGS=-fuse-ld=lld defconfig modules_prepare
        ...
          AR       /out/tools/objtool/libsubcmd.a
        /bin/sh: 1: ar: not found
        ...
      
      Follow the logic of HOST{CC,LD} and allow the user to specify a
      different ar tool via HOSTAR (which is used elsewhere in other
      tools/ Makefiles).

      Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
      Cc: <stable@vger.kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/80822a9353926c38fd7a152991c6292491a9d0e8.1558028966.git.jpoimboe@redhat.com
      Link: https://github.com/ClangBuiltLinux/linux/issues/481
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      30e61ff2
    • Florian Fainelli's avatar
      MIPS: perf: Fix build with CONFIG_CPU_BMIPS5000 enabled · b70cb364
      Florian Fainelli authored
      commit 1b1f01b6 upstream.
      
      arch/mips/kernel/perf_event_mipsxx.c: In function 'mipsxx_pmu_enable_event':
      arch/mips/kernel/perf_event_mipsxx.c:326:21: error: unused variable 'event' [-Werror=unused-variable]
        struct perf_event *event = container_of(evt, struct perf_event, hw);
                           ^~~~~
      
      Fix this by making use of IS_ENABLED() to simplify the code and avoid
      unnecessary ifdefery.
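
The general idea can be shown outside the kernel: when a feature test is an ordinary C condition rather than a preprocessor block, the compiler still sees the variable being used even if the branch is later discarded as dead code. A minimal sketch, assuming a plain 0/1 macro in place of the kernel's IS_ENABLED(); `TOY_FEATURE_MT` and the struct and function names are hypothetical:

```c
/* Hypothetical feature switch, standing in for a CONFIG_ option. */
#ifndef TOY_FEATURE_MT
#define TOY_FEATURE_MT 0
#endif

struct toy_event { int cpu; };
struct toy_hw { const struct toy_event *event; };

static int toy_pick_cpu(const struct toy_hw *hw, int fallback_cpu)
{
	/*
	 * With an #ifdef around the only use, this declaration would be
	 * reported as unused whenever the feature is off. A plain 'if'
	 * keeps the use visible to the compiler in every configuration.
	 */
	const struct toy_event *event = hw->event;

	if (TOY_FEATURE_MT)
		return event->cpu;
	return fallback_cpu;
}
```

Both branches are also type-checked in every configuration, which is a side benefit of this style over conditional compilation.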
      
      Fixes: 84002c88 ("MIPS: perf: Fix perf with MT counting other threads")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Paul Burton <paul.burton@mips.com>
      Cc: linux-mips@linux-mips.org
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-mips@vger.kernel.org
      Cc: stable@vger.kernel.org # v4.18+
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b70cb364
    • Adrian Hunter's avatar
      perf intel-pt: Fix sample timestamp wrt non-taken branches · 36dcf6ef
      Adrian Hunter authored
      commit 1b6599a9 upstream.
      
      The sample timestamp is updated to ensure that the timestamp represents
      the time of the sample and not a branch that the decoder is still
      walking towards. The sample timestamp is updated when the decoder
      returns, but the decoder does not return for non-taken branches. Update
      the sample timestamp in that case too.
      
      Note that commit 3f04d98e ("perf intel-pt: Improve sample
      timestamp") was also a stable fix and appears, for example, in v4.4
      stable tree as commit a4ebb58f ("perf intel-pt: Improve sample
      timestamp").

      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: stable@vger.kernel.org # v4.4+
      Fixes: 3f04d98e ("perf intel-pt: Improve sample timestamp")
      Link: http://lkml.kernel.org/r/20190510124143.27054-4-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      36dcf6ef
    • Adrian Hunter's avatar
      perf intel-pt: Fix improved sample timestamp · 41be88f5
      Adrian Hunter authored
      commit 61b6e08d upstream.
      
      The decoder uses its current timestamp in samples. Usually that is a
      timestamp that has already passed, but in some cases it is a timestamp
      for a branch that the decoder is walking towards, and consequently
      hasn't reached.
      
      The intel_pt_sample_time() function decides which is which, but was not
      handling TNT packets exactly correctly.
      
      In the case of TNT, the timestamp applies to the first branch, so the
      decoder must first walk to that branch.
      
      That means intel_pt_sample_time() should return true for TNT, and this
      patch makes that change. However, if the first branch is a non-taken
      branch (i.e. a 'N'), then intel_pt_sample_time() needs to return false
      for subsequent taken branches in the same TNT packet.
      
      To handle that, introduce a new state INTEL_PT_STATE_TNT_CONT to
      distinguish the cases.
      
      Note that commit 3f04d98e ("perf intel-pt: Improve sample
      timestamp") was also a stable fix and appears, for example, in v4.4
      stable tree as commit a4ebb58f ("perf intel-pt: Improve sample
      timestamp").

      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: stable@vger.kernel.org # v4.4+
      Fixes: 3f04d98e ("perf intel-pt: Improve sample timestamp")
      Link: http://lkml.kernel.org/r/20190510124143.27054-3-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      41be88f5
    • Adrian Hunter's avatar
      perf intel-pt: Fix instructions sampling rate · 40dc1fa6
      Adrian Hunter authored
      commit 7ba8fa20 upstream.
      
      The timestamp used to determine whether an instruction sample is made is
      an estimate based on the number of instructions since the last known
      timestamp. A consequence is that it might go backwards, which results in
      extra samples. Change it so that a sample is only made when the
      timestamp goes forwards.
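
The rule can be modelled as a small gate on the estimated timestamp. This is a simplified sketch under stated assumptions and not the decoder's actual logic: the struct, its fields and the function name are hypothetical, and the period is expressed in the same units as the timestamp.

```c
#include <stdbool.h>
#include <stdint.h>

struct toy_sampler {
	uint64_t period;         /* sampling period, in timestamp units */
	uint64_t last_sample_ts; /* estimated timestamp of the last sample */
};

/*
 * Decide whether to make a sample at the estimated timestamp est_ts.
 * The state only advances when a sample is made, so an estimate that
 * falls back behind the last sample can never trigger an extra one.
 */
static bool toy_should_sample(struct toy_sampler *s, uint64_t est_ts)
{
	if (est_ts <= s->last_sample_ts)
		return false; /* estimate did not move forwards */
	if (est_ts - s->last_sample_ts < s->period)
		return false; /* period has not elapsed yet */
	s->last_sample_ts = est_ts;
	return true;
}
```

Without the first check, an estimate that drops back and then crosses the same boundary again would be sampled repeatedly, which is one way to picture the burst of closely spaced samples in the "Before" output.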
      
      Note this does not affect a sampling period of 0 or sampling periods
      specified as a count of instructions.
      
      Example:
      
       Before:
      
       $ perf script --itrace=i10us
       ls 13812 [003] 2167315.222583:       3270 instructions:u:      7fac71e2e494 __GI___tunables_init+0xf4 (/lib/x86_64-linux-gnu/ld-2.28.so)
       ls 13812 [003] 2167315.222667:      30902 instructions:u:      7fac71e2da0f _dl_cache_libcmp+0x2f (/lib/x86_64-linux-gnu/ld-2.28.so)
       ls 13812 [003] 2167315.222667:         10 instructions:u:      7fac71e2d9ff _dl_cache_libcmp+0x1f (/lib/x86_64-linux-gnu/ld-2.28.so)
       ls 13812 [003] 2167315.222667:          8 instructions:u:      7fac71e2d9ea _dl_cache_libcmp+0xa (/lib/x86_64-linux-gnu/ld-2.28.so)
       ls 13812 [003] 2167315.222667:         14 instructions:u:      7fac71e2d9ea _dl_cache_libcmp+0xa (/lib/x86_64-linux-gnu/ld-2.28.so)
       ls 13812 [003] 2167315.222667:          6 instructions:u:      7fac71e2d9ff _dl_cache_libcmp+0x1f (/lib/x86_64-linux-gnu/ld-2.28.so)
       ls 13812 [003] 2167315.222667:         14 instructions:u:      7fac71e2d9ff _dl_cache_libcmp+0x1f (/lib/x86_64-linux-gnu/ld-2.28.so)
       ls 13812 [003] 2167315.222667:          4 instructions:u:      7fac71e2dab2 _dl_cache_libcmp+0xd2 (/lib/x86_64-linux-gnu/ld-2.28.so)
       ls 13812 [003] 2167315.222728:      16423 instructions:u:      7fac71e2477a _dl_map_object_deps+0x1ba (/lib/x86_64-linux-gnu/ld-2.28.so)
       ls 13812 [003] 2167315.222734:      12731 instructions:u:      7fac71e27938 _dl_name_match_p+0x68 (/lib/x86_64-linux-gnu/ld-2.28.so)
       ...
      
       After:
      
       $ perf script --itrace=i10us
       ls 13812 [003] 2167315.222583:       3270 instructions:u:      7fac71e2e494 __GI___tunables_init+0xf4 (/lib/x86_64-linux-gnu/ld-2.28.so)
       ls 13812 [003] 2167315.222667:      30902 instructions:u:      7fac71e2da0f _dl_cache_libcmp+0x2f (/lib/x86_64-linux-gnu/ld-2.28.so)
       ls 13812 [003] 2167315.222728:      16479 instructions:u:      7fac71e2477a _dl_map_object_deps+0x1ba (/lib/x86_64-linux-gnu/ld-2.28.so)
       ...

      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: stable@vger.kernel.org
      Fixes: f4aa0819 ("perf tools: Add Intel PT decoder")
      Link: http://lkml.kernel.org/r/20190510124143.27054-2-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      40dc1fa6