1. 01 Nov, 2018 1 commit
  2. 29 Oct, 2018 10 commits
    • Merge tag 'csky-for-linus-4.20' of https://github.com/c-sky/csky-linux · ac435075
      Linus Torvalds authored
      Pull C-SKY architecture port from Guo Ren:
       "This contains the Linux port for C-SKY(csky) based on linux-4.19
        Release, which has been through 10 rounds of review on mailing list.
      
        More information:
      
          http://en.c-sky.com
      
        The development repo:
      
          https://github.com/c-sky/csky-linux
      
        ABI Documentation:
      
          https://github.com/c-sky/csky-doc
      
        Here is the pre-built cross compiler for fast test from our CI:
      
          https://gitlab.com/c-sky/buildroot/-/jobs/101608095/artifacts/file/output/images/csky_toolchain_qemu_csky_ck807f_4.18_glibc_defconfig_482b221e52908be1c9b2ccb444255e1562bb7025.tar.xz
      
        We use buildroot as our CI-test environment. "LTP, Lmbench ..." will
        be tested for every commit. See here for more details:
      
          https://gitlab.com/c-sky/buildroot/pipelines
      
        We'll continuously improve the csky subsystem in the future"
      
      Arnd acks, and adds the following notes:
       "I did a thorough review of the ABI, which as usual mainly consists of
        spotting any files that don't use the asm-generic ABI itself, and
        having it changed so it matches exactly what we do on other new
        architectures.
      
        I also looked at every other patch and commented on maybe half of them
        where I saw something that did not quite seem right. Others have
        reviewed specific patches in greater depth. I'm sure that one could
        find more of the minor details, but as long as they are not ABI
        relevant, they can be fixed later.
      
        The only patch that is part of the ABI and that nobody reviewed is the
        signal handling. This is one of the areas I never worked on in much
        detail. I did not see anything wrong with it, but I also don't know
        what the problems with the other architectures are here, and we seem
        to be hitting issues occasionally, and we never managed to generalize
        this enough for new architectures to have a trivial implementation.
      
        I was originally hoping that we could have the 64-bit time_t
        interfaces ready in time to completely drop the 32-bit ones, but that
        did not happen. We might still remove them in the next merge window
        depending on whether the libc upstream people prefer to keep them or
        not.
      
        One more general comment: I think this may well be the last new CPU
        architecture we ever add to the kernel. Both nds32 and c-sky are made
        by companies that also work on risc-v, and generally speaking risc-v
        seems to be killing off any of the minor licensable instruction set
        projects, just like ARM has mostly killed off the custom
        vendor-specific instruction sets already.
      
        If we add another architecture in the future, it may instead be
        something like the LLVM bitcode or WebAssembly, who knows?"
      
      To which Geert Uytterhoeven pipes in about another architecture still in
      the pipeline: Kalray MPPA.
      
      * tag 'csky-for-linus-4.20' of https://github.com/c-sky/csky-linux: (24 commits)
        dt-bindings: interrupt-controller: C-SKY APB intc
        irqchip: add C-SKY APB bus interrupt controller
        dt-bindings: interrupt-controller: C-SKY SMP intc
        irqchip: add C-SKY SMP interrupt controller
        MAINTAINERS: Add csky
        dt-bindings: Add vendor prefix for csky
        dt-bindings: csky CPU Bindings
        csky: Misc headers
        csky: SMP support
        csky: Debug and Ptrace GDB
        csky: User access
        csky: Library functions
        csky: ELF and module probe
        csky: Atomic operations
        csky: IRQ handling
        csky: VDSO and rt_sigreturn
        csky: Process management and Signal
        csky: MMU and page table management
        csky: Cache and TLB routines
        csky: System Call
        ...
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 9f51ae62
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) GRO overflow entries are not unlinked properly, resulting in list
          poison pointers being dereferenced.
      
       2) Fix bridge build with ipv6 disabled, from Nikolay Aleksandrov.
      
       3) Direct packet access and other fixes in BPF from Daniel Borkmann.
      
       4) gred_change_table_def() gets passed the wrong pointer, a pointer to
          a set of unparsed attributes instead of the attribute itself. From
          Jakub Kicinski.
      
       5) Allow macsec device to be brought up even if its lowerdev is down,
          from Sabrina Dubroca.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
        net: diag: document swapped src/dst in udp_dump_one.
        macsec: let the administrator set UP state even if lowerdev is down
        macsec: update operstate when lower device changes
        net: sched: gred: pass the right attribute to gred_change_table_def()
        ptp: drop redundant kasprintf() to create worker name
        net: bridge: remove ipv6 zero address check in mcast queries
        net: Properly unlink GRO packets on overflow.
        bpf: fix wrong helper enablement in cgroup local storage
        bpf: add bpf_jit_limit knob to restrict unpriv allocations
        bpf: make direct packet write unclone more robust
        bpf: fix leaking uninitialized memory on pop/peek helpers
        bpf: fix direct packet write into pop/peek helpers
        bpf: fix cg_skb types to hint access type in may_access_direct_pkt_data
        bpf: fix direct packet access for flow dissector progs
        bpf: disallow direct packet access for unpriv in cg_skb
        bpf: fix test suite to enable all unpriv program types
        bpf, btf: fix a missing check bug in btf_parse
        selftests/bpf: add config fragments BPF_STREAM_PARSER and XDP_SOCKETS
        bpf: devmap: fix wrong interface selection in notifier_call
    • net: diag: document swapped src/dst in udp_dump_one. · 747569b0
      Lorenzo Colitti authored
      Since its inception, udp_dump_one has had a bug where userspace
      needs to swap src and dst addresses and ports in order to find
      the socket it wants. This is because it passes the socket source
      address to __udp[46]_lib_lookup's saddr argument, but those
      functions are intended to find local sockets matching received
      packets, so saddr is the remote address, not the local address.
      
      This can no longer be fixed for backwards compatibility reasons,
      so add a brief comment explaining that this is the case. This
      will avoid confusion and help ensure SOCK_DIAG implementations
      of new protocols don't have the same problem.
      
      Fixes: a925aa00 ("udp_diag: Implement the get_exact dumping functionality")
      Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
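The swap described in this commit can be illustrated with a small Python model. It is only a sketch: `Sock`, `lib_lookup` and `dump_one` are hypothetical names, not kernel interfaces, and the lookup is reduced to an exact four-tuple match. It shows that a receive-path lookup treats its saddr argument as the socket's remote end, so a request that passes the socket's own source address there finds nothing unless src and dst are swapped.

```python
from collections import namedtuple

# Hypothetical, simplified socket record (not a kernel structure).
Sock = namedtuple("Sock", "local_addr local_port remote_addr remote_port")


def lib_lookup(table, saddr, sport, daddr, dport):
    """Receive-path style lookup: (saddr, sport) is the sender of a packet,
    i.e. the socket's remote end; (daddr, dport) is the socket's local end."""
    for sk in table:
        if (sk.local_addr, sk.local_port,
                sk.remote_addr, sk.remote_port) == (daddr, dport, saddr, sport):
            return sk
    return None


def dump_one(table, req_src, req_sport, req_dst, req_dport):
    """Model of the documented behaviour: the request's source address is
    handed to the lookup as saddr without being swapped."""
    return lib_lookup(table, req_src, req_sport, req_dst, req_dport)


sk = Sock("192.0.2.1", 5000, "198.51.100.7", 53)
table = [sk]

# Asking with the socket's own view (src = local end) finds nothing.
natural = dump_one(table, sk.local_addr, sk.local_port,
                   sk.remote_addr, sk.remote_port)

# Asking with src and dst swapped finds the socket.
swapped = dump_one(table, sk.remote_addr, sk.remote_port,
                   sk.local_addr, sk.local_port)

print(natural)
print(swapped)
```

In this model the unswapped request returns `None` and the swapped one returns the socket, which is the userspace-visible quirk the new comment documents.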
    • Merge branch 'macsec-fixes' · 3bdf6bac
      David S. Miller authored
      Sabrina Dubroca says:
      
      ====================
      macsec: linkstate fixes
      
      This series fixes issues with handling administrative and operstate of
      macsec devices.
      
      Radu Rendec proposed another version of the first patch [0] but I'd
      rather not follow the behavior of vlan devices, going with what macvlan
      does instead. Patrick Talbert also reported the same issue to me.
      
      The second patch is a follow-up. The restriction on setting the device
      up is a bit unreasonable, and operstate provides the information we
      need in this case.
      
      [0] https://patchwork.ozlabs.org/patch/971374/
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • macsec: let the administrator set UP state even if lowerdev is down · 07bddef9
      Sabrina Dubroca authored
      Currently, the kernel doesn't let the administrator set a macsec device
      up unless its lower device is currently up. This is inconsistent, as a
      macsec device that is up won't automatically go down when its lower
      device goes down.
      
      Now that linkstate propagation works, there's really no reason for this
      limitation, so let's remove it.
      
      Fixes: c09440f7 ("macsec: introduce IEEE 802.1AE driver")
      Reported-by: Radu Rendec <radu.rendec@gmail.com>
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • macsec: update operstate when lower device changes · e6ac0758
      Sabrina Dubroca authored
      Like all other virtual devices (macvlan, vlan), the operstate of a
      macsec device should match the state of its lower device. This is done
      by calling netif_stacked_transfer_operstate from its netdevice notifier.
      
      We also need to call netif_stacked_transfer_operstate when a new macsec
      device is created, so that its operstate is set properly. This is only
      relevant when we try to bring the device up directly when we create it.
      
      Radu Rendec proposed a similar patch, inspired from the 802.1q driver,
      that included changing the administrative state of the macsec device,
      instead of just the operstate. This version is similar to what the
      macvlan driver does, and updates only the operstate.
      
      Fixes: c09440f7 ("macsec: introduce IEEE 802.1AE driver")
      Reported-by: Radu Rendec <radu.rendec@gmail.com>
      Reported-by: Patrick Talbert <ptalbert@redhat.com>
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
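The split between administrative state and operstate that the two macsec commits rely on can be sketched in Python. This is a toy model under stated assumptions, not the driver code: `Dev`, `StackedDev`, `transfer_operstate`, `lower_changed`, `set_admin_up` and `is_running` are hypothetical names, and operstate is reduced to a single boolean `carrier`. The model copies the lower device's state to the stacked device at creation and on every lower-device change, and it lets the administrator set the stacked device up regardless of the lower device.

```python
def transfer_operstate(lower, upper):
    """Copy the (simplified) operational state from lower to upper."""
    upper.carrier = lower.carrier


def lower_changed(lower, uppers):
    """Notifier stand-in: refresh every device stacked on 'lower'."""
    for upper in uppers:
        if upper.lower is lower:
            transfer_operstate(lower, upper)


def set_admin_up(dev):
    """Administrative UP; deliberately no check of the lower device."""
    dev.admin_up = True


def is_running(dev):
    """A device passes traffic only if it is both admin-up and has carrier."""
    return dev.admin_up and dev.carrier


class Dev:
    def __init__(self, name):
        self.name = name
        self.admin_up = False   # what the administrator asked for
        self.carrier = False    # simplified stand-in for operstate


class StackedDev(Dev):
    def __init__(self, name, lower):
        super().__init__(name)
        self.lower = lower
        transfer_operstate(lower, self)   # set operstate at creation time


eth0 = Dev("eth0")
macsec0 = StackedDev("macsec0", eth0)

set_admin_up(macsec0)                      # allowed although eth0 is down
state_lower_down = (macsec0.admin_up, is_running(macsec0))

eth0.carrier = True
lower_changed(eth0, [macsec0])
state_lower_up = (macsec0.admin_up, is_running(macsec0))

eth0.carrier = False
lower_changed(eth0, [macsec0])
state_lower_down_again = (macsec0.admin_up, is_running(macsec0))

print(state_lower_down, state_lower_up, state_lower_down_again)
```

The administrative flag never changes on its own in this model; only the operational state follows the lower device, which matches the macvlan-like behaviour the series describes.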
    • net: sched: gred: pass the right attribute to gred_change_table_def() · 38b4f18d
      Jakub Kicinski authored
      gred_change_table_def() takes a pointer to TCA_GRED_DPS attribute,
      and expects it will be able to interpret its contents as
      struct tc_gred_sopt.  Pass the correct gred attribute, instead of
      TCA_OPTIONS.
      
      This bug meant the table definition could never be changed after
      Qdisc was initialized (unless whatever TCA_OPTIONS contained both
      passed netlink validation and was a valid struct tc_gred_sopt...).
      
      Old behaviour:
      $ ip link add type dummy
      $ tc qdisc replace dev dummy0 parent root handle 7: \
           gred setup vqs 4 default 0
      $ tc qdisc replace dev dummy0 parent root handle 7: \
           gred setup vqs 4 default 0
      RTNETLINK answers: Invalid argument
      
      Now:
      $ ip link add type dummy
      $ tc qdisc replace dev dummy0 parent root handle 7: \
           gred setup vqs 4 default 0
      $ tc qdisc replace dev dummy0 parent root handle 7: \
           gred setup vqs 4 default 0
      $ tc qdisc replace dev dummy0 parent root handle 7: \
           gred setup vqs 4 default 0
      
      Fixes: f62d6b93 ("[PKT_SCHED]: GRED: Use central VQ change procedure")
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
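A rough Python model may help show why passing the outer attribute produced "Invalid argument". Everything in it is an assumption made for illustration: the attribute numbers, the `MAX_DPS` limit, the field layout in `SOPT_FMT`, and the helper names `nla`, `nla_data`, `nested` and `change_table_def` are not taken from the commit and should be checked against the kernel uapi headers. When the payload of the outer attribute is read as the options struct, the first field is really the header of the nested attribute, so validation fails.

```python
import struct

# Assumed values for illustration only; verify against the uapi headers.
TCA_OPTIONS = 2
TCA_GRED_DPS = 3
MAX_DPS = 16
SOPT_FMT = "=IIBBH"   # assumed layout: DPs, def_DP, grio, flags, pad


def nla(attr_type, payload):
    """Encode one attribute: 16-bit length, 16-bit type, payload, padding."""
    length = 4 + len(payload)
    padding = b"\0" * (-length % 4)
    return struct.pack("=HH", length, attr_type) + payload + padding


def nla_data(attr):
    """Return the payload bytes of an encoded attribute."""
    length, _type = struct.unpack_from("=HH", attr)
    return attr[4:length]


def nested(attr):
    """Split an attribute's payload into {type: encoded nested attribute}."""
    payload, found, off = nla_data(attr), {}, 0
    while off + 4 <= len(payload):
        length, attr_type = struct.unpack_from("=HH", payload, off)
        if length < 4:
            break
        found[attr_type] = payload[off:off + length]
        off += (length + 3) & ~3
    return found


def change_table_def(attr):
    """Read the payload as the options struct and validate it."""
    data = nla_data(attr)
    if len(data) < struct.calcsize(SOPT_FMT):
        return "EINVAL"
    dps, def_dp, _grio, _flags, _pad = struct.unpack_from(SOPT_FMT, data)
    if dps == 0 or dps > MAX_DPS or def_dp >= dps:
        return "EINVAL"
    return "OK"


# Rough equivalent of "gred setup vqs 4 default 0".
sopt = struct.pack(SOPT_FMT, 4, 0, 0, 0, 0)
opts = nla(TCA_OPTIONS, nla(TCA_GRED_DPS, sopt))

old_result = change_table_def(opts)                        # outer attribute
new_result = change_table_def(nested(opts)[TCA_GRED_DPS])  # nested attribute

print(old_result, new_result)
```

In this model the old call path yields `EINVAL` because the nested attribute header is read as a very large queue count, while the corrected path reads the real struct and yields `OK`.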
    • ptp: drop redundant kasprintf() to create worker name · 822c5f73
      Rasmus Villemoes authored
      Building with -Wformat-nonliteral, gcc complains
      
      drivers/ptp/ptp_clock.c: In function ‘ptp_clock_register’:
      drivers/ptp/ptp_clock.c:239:26: warning: format not a string literal and no format arguments [-Wformat-nonliteral]
                  worker_name : info->name);
      
      kthread_create_worker takes fmt+varargs to set the name of the
      worker, and that happens with a vsnprintf() to a stack buffer (that is
      then copied into task_comm). So there's no reason not to just pass
      "ptp%d", ptp->index to kthread_create_worker() and avoid the
      intermediate worker_name variable.
      Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Acked-by: Richard Cochran <richardcochran@gmail.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: remove ipv6 zero address check in mcast queries · 0fe5119e
      Nikolay Aleksandrov authored
      Recently a check was added which prevents marking of routers with zero
      source address, but for IPv6 that cannot happen as the relevant RFCs
      actually forbid such packets:
      RFC 2710 (MLDv1):
      "To be valid, the Query message MUST
       come from a link-local IPv6 Source Address, be at least 24 octets
       long, and have a correct MLD checksum."
      
      Same goes for RFC 3810.
      
      And also it can be seen as a requirement in ipv6_mc_check_mld_query()
      which is used by the bridge to validate the message before processing
      it. Thus any queries with :: source address won't be processed anyway.
      So just remove the check for zero IPv6 source address from the query
      processing function.
      
      Fixes: 5a2de63f ("bridge: do not add port to router list when receives query with source 0.0.0.0")
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'drm-next-2018-10-24' of git://anongit.freedesktop.org/drm/drm · 53b3b6bb
      Linus Torvalds authored
      Pull drm updates from Dave Airlie:
       "This is going to rebuild more than drm as it adds a new helper to
        list.h for doing bulk updates. Seemed like a reasonable addition to
        me.
      
        Otherwise the usual merge window stuff: lots of i915 and amdgpu, not so
        much nouveau, and piles of everything else.
      
        Core:
         - Adds a new list.h helper for doing bulk list updates for TTM.
         - Don't leak fb address in smem_start to userspace (comes with EXPORT
           workaround for people using mali out of tree hacks)
         - udmabuf device to turn memfd regions into dma-buf
         - Per-plane blend mode property
         - ref/unref replacements with get/put
         - fbdev conflicting framebuffers code cleaned up
         - host-endian format variants
         - panel orientation quirk for Acer One 10
      
        bridge:
         - TI SN65DSI86 chip support
      
        vkms:
         - GEM support.
         - Cursor support
      
        amdgpu:
         - Merge amdkfd and amdgpu into one module
         - CEC over DP AUX support
         - Picasso APU support + VCN dynamic powergating
         - Raven2 APU support
         - Vega20 enablement + kfd support
         - ACP powergating improvements
         - ABGR/XBGR display support
         - VCN jpeg support
         - xGMI support
         - DC i2c/aux cleanup
         - Ycbcr 4:2:0 support
         - GPUVM improvements
         - Powerplay and powerplay endian fixes
         - Display underflow fixes
      
        vmwgfx:
         - Move vmwgfx specific TTM code to vmwgfx
         - Split out vmwgfx buffer/resource validation code
         - Atomic operation rework
      
        bochs:
         - use more helpers
         - format/byteorder improvements
      
        qxl:
         - use more helpers
      
        i915:
         - GGTT coherency getparam
         - Turn off resource streamer API
         - More Icelake enablement + DMC firmware
         - Full PPGTT for Ivybridge, Haswell and Valleyview
         - DDB distribution based on resolution
         - Limited range DP display support
      
        nouveau:
         - CEC over DP AUX support
         - Initial HDMI 2.0 support
      
        virtio-gpu:
         - vmap support for PRIME objects
      
        tegra:
         - Initial Tegra194 support
         - DMA/IOMMU integration fixes
      
        msm:
         - a6xx perf improvements + clock prefix
         - GPU preemption optimisations
         - a6xx devfreq support
         - cursor support
      
        rockchip:
         - PX30 support
         - rgb output interface support
      
        mediatek:
         - HDMI output support on mt2701 and mt7623
      
        rcar-du:
         - Interlaced modes on Gen3
         - LVDS on R8A77980
         - D3 and E3 SoC support
      
        hisilicon:
         - misc fixes
      
        mxsfb:
         - runtime pm support
      
        sun4i:
         - R40 TCON support
         - Allwinner A64 support
         - R40 HDMI support
      
        omapdrm:
         - Driver rework changing display pipeline ordering to use common code
         - DMM memory barrier and irq fixes
         - Errata workarounds
      
        exynos:
         - out-bridge support for LVDS bridge driver
         - Samsung 16x16 tiled format support
         - Plane alpha and pixel blend mode support
      
        tilcdc:
         - suspend/resume update
      
        mali-dp:
         - misc updates"
      
      * tag 'drm-next-2018-10-24' of git://anongit.freedesktop.org/drm/drm: (1382 commits)
        firmware/dmc/icl: Add missing MODULE_FIRMWARE() for Icelake.
        drm/i915/icl: Fix signal_levels
        drm/i915/icl: Fix DDI/TC port clk_off bits
        drm/i915/icl: create function to identify combophy port
        drm/i915/gen9+: Fix initial readout for Y tiled framebuffers
        drm/i915: Large page offsets for pread/pwrite
        drm/i915/selftests: Disable shrinker across mmap-exhaustion
        drm/i915/dp: Link train Fallback on eDP only if fallback link BW can fit panel's native mode
        drm/i915: Fix intel_dp_mst_best_encoder()
        drm/i915: Skip vcpi allocation for MSTB ports that are gone
        drm/i915: Don't unset intel_connector->mst_port
        drm/i915: Only reset seqno if actually idle
        drm/i915: Use the correct crtc when sanitizing plane mapping
        drm/i915: Restore vblank interrupts earlier
        drm/i915: Check fb stride against plane max stride
        drm/amdgpu/vcn:Fix uninitialized symbol error
        drm: panel-orientation-quirks: Add quirk for Acer One 10 (S1003)
        drm/amd/amdgpu: Fix debugfs error handling
        drm/amdgpu: Update gc_9_0 golden settings.
        drm/amd/powerplay: update PPtable with DC BTC and Tvr SocLimit fields
        ...
  3. 28 Oct, 2018 7 commits
    • Merge tag 'vla-v4.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux · 746bb4ed
      Linus Torvalds authored
      Pull VLA removal from Kees Cook:
       "Globally warn on VLA use.
      
        This turns on "-Wvla" globally now that the last few trees with their
        VLA removals have landed (crypto, block, net, and powerpc).
      
        Arnd mentioned that there may be a couple more VLAs hiding in
        hard-to-find randconfigs, but nothing big has shaken out in the last
        month or so in linux-next.
      
        We should be basically VLA-free now! Wheee. :)
      
        Summary:
      
         - Remove unused fallback for BUILD_BUG_ON (which technically contains
           a VLA)
      
         - Lift -Wvla to the top-level Makefile"
      
      * tag 'vla-v4.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
        Makefile: Globally enable VLA warning
        compiler.h: give up __compiletime_assert_fallback()
    • Merge tag 'kbuild-v4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild · ac747c07
      Linus Torvalds authored
      Pull Kbuild updates from Masahiro Yamada:
      
       - optimize kallsyms slightly
      
       - remove check for old CFLAGS usage
      
       - add some compiler flags unconditionally instead of evaluating
         $(call cc-option,...)
      
       - fix variable shadowing in host tools
      
       - refactor scripts/mkmakefile
      
       - refactor various makefiles
      
      * tag 'kbuild-v4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
        modpost: Create macro to avoid variable shadowing
        ASN.1: Remove unnecessary shadowed local variable
        kbuild: use 'else ifeq' for checksrc to improve readability
        kbuild: remove unneeded link_multi_deps
        kbuild: add -Wno-unused-but-set-variable flag unconditionally
        kbuild: add -Wdeclaration-after-statement flag unconditionally
        kbuild: add -Wno-pointer-sign flag unconditionally
        modpost: remove leftover symbol prefix handling for module device table
        kbuild: simplify command line creation in scripts/mkmakefile
        kbuild: do not pass $(objtree) to scripts/mkmakefile
        kbuild: remove user ID check in scripts/mkmakefile
        kbuild: remove VERSION and PATCHLEVEL from $(objtree)/Makefile
        kbuild: add --include-dir flag only for out-of-tree build
        kbuild: remove dead code in cmd_files calculation in top Makefile
        kbuild: hide most of targets when running config or mixed targets
        kbuild: remove old check for CFLAGS use
        kbuild: prefix Makefile.dtbinst path with $(srctree) unconditionally
        kallsyms: remove left-over Blackfin code
        kallsyms: reduce size a little on 64-bit
    • Merge tag 'linux-kselftest-4.20-rc1' of... · f8cab69b
      Linus Torvalds authored
      Merge tag 'linux-kselftest-4.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
      
      Pull kselftest updates from Shuah Khan:
       "This Kselftest update for Linux 4.20-rc1 consists of:
      
         - Improvements to ftrace test suite from Masami Hiramatsu.
      
         - Color coded ftrace PASS / FAIL results from Steven Rostedt (VMware)
           to improve readability of reports.
      
         - watchdog fixes and an enhancement to add gettimeout and get|set
           pretimeout options from Jerry Hoemann.
      
         - Several fixes to warnings and spelling etc"
      
      * tag 'linux-kselftest-4.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (40 commits)
        selftests/ftrace: Strip escape sequences for log file
        selftests/ftrace: Use colored output when available
        selftests: fix warning: "_GNU_SOURCE" redefined
        selftests: kvm: Fix -Wformat warnings
        selftests/ftrace: Add color to the PASS / FAIL results
        kvm: selftests: fix spelling mistake "Insufficent" -> "Insufficient"
        selftests: gpio: Fix OUTPUT directory in Makefile
        selftests: gpio: restructure Makefile
        selftests: watchdog: Fix ioctl SET* error paths to take oneshot exit path
        selftests: watchdog: Add gettimeout and get|set pretimeout
        selftests: watchdog: Fix error message.
        selftests: watchdog: fix message when /dev/watchdog open fails
        selftests/ftrace: Add ftrace cpumask testcase
        selftests/ftrace: Add wakeup_rt tracer testcase
        selftests/ftrace: Add wakeup tracer testcase
        selftests/ftrace: Add stacktrace ftrace filter command testcase
        selftests/ftrace: Add trace_pipe testcase
        selftests/ftrace: Add function filter on module testcase
        selftests/ftrace: Add max stack tracer testcase
        selftests/ftrace: Add function profiling stat testcase
        ...
    • Merge branch 'xarray' of git://git.infradead.org/users/willy/linux-dax · dad4f140
      Linus Torvalds authored
      Pull XArray conversion from Matthew Wilcox:
       "The XArray provides an improved interface to the radix tree data
        structure, providing locking as part of the API, specifying GFP flags
        at allocation time, eliminating preloading, less re-walking the tree,
        more efficient iterations and not exposing RCU-protected pointers to
        its users.
      
        This patch set
      
         1. Introduces the XArray implementation
      
         2. Converts the pagecache to use it
      
         3. Converts memremap to use it
      
        The page cache is the most complex and important user of the radix
        tree, so converting it was most important. Converting the memremap
        code removes the only other user of the multiorder code, which allows
        us to remove the radix tree code that supported it.
      
        I have 40+ followup patches to convert many other users of the radix
        tree over to the XArray, but I'd like to get this part in first. The
        other conversions haven't been in linux-next and aren't suitable for
        applying yet, but you can see them in the xarray-conv branch if you're
        interested"
      
      * 'xarray' of git://git.infradead.org/users/willy/linux-dax: (90 commits)
        radix tree: Remove multiorder support
        radix tree test: Convert multiorder tests to XArray
        radix tree tests: Convert item_delete_rcu to XArray
        radix tree tests: Convert item_kill_tree to XArray
        radix tree tests: Move item_insert_order
        radix tree test suite: Remove multiorder benchmarking
        radix tree test suite: Remove __item_insert
        memremap: Convert to XArray
        xarray: Add range store functionality
        xarray: Move multiorder_check to in-kernel tests
        xarray: Move multiorder_shrink to kernel tests
        xarray: Move multiorder account test in-kernel
        radix tree test suite: Convert iteration test to XArray
        radix tree test suite: Convert tag_tagged_items to XArray
        radix tree: Remove radix_tree_clear_tags
        radix tree: Remove radix_tree_maybe_preload_order
        radix tree: Remove split/join code
        radix tree: Remove radix_tree_update_node_t
        page cache: Finish XArray conversion
        dax: Convert page fault handlers to XArray
        ...
    • net: Properly unlink GRO packets on overflow. · ece23711
      David S. Miller authored
      Just like with normal GRO processing, we have to initialize
      skb->next to NULL when we unlink overflow packets from the
      GRO hash lists.
      
      Fixes: d4546c25 ("net: Convert GRO SKB handling to list_head.")
      Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
      Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • modpost: Create macro to avoid variable shadowing · c2b1a922
      Leonardo Bras authored
      Create DEF_FIELD_ADDR_VAR as a more generic version of the DEF_FIELD_ADDR
      macro, allowing usage of a variable name other than the struct element name.
      Also, set DEF_FIELD_ADDR as a specific usage of DEF_FIELD_ADDR_VAR in which
      the var name is the same as the struct element name.
      Then, make use of DEF_FIELD_ADDR_VAR to create a variable of another name,
      in order to avoid variable shadowing.
      Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
    • ASN.1: Remove unnecessary shadowed local variable · 9e1e8194
      Leonardo Bras authored
      Remove an unnecessary shadowed local variable (start).
      It was used only once, with the same value it was started before
      the if block.
      Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
  4. 27 Oct, 2018 10 commits
    • HID: we do not randomly make new drivers 'default y' · 69d5b97c
      Linus Torvalds authored
      .. even when that "default y" is hidden syntactically as a
      
      	default !EXPERT
      
      it's wrong.
      
      The only reason something should be 'default y' is if it used to be
      built-in, and it was made configurable, and the 'default y' is just
      retaining the status quo.
      
      Alternatively, the hardware for the driver has become _so_ common that
      it really makes sense for everybody to build it.  Finally, one possible
      reason for 'default y' is because the option is not enabling any new
      code at all, but is just enabling other options (the networking people
      do this for vendor options, for example, so that you can disable whole
      vendors at a time).
      
      Clearly, none of these cases hold for the BigBen Interactive Kids'
      gamepad, and HID_BIGBEN_FF should thus most definitely not default
      to on for everybody.
      
      Cc: Hanno Zulla <kontakt@hanno.de>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'linux-watchdog-4.20-rc1' of git://www.linux-watchdog.org/linux-watchdog · 5ecf3e11
      Linus Torvalds authored
      Pull watchdog updates from Wim Van Sebroeck:
      
       - Add Armada 37xx CPU watchdog
      
       - w83627hf_wdt: Add Support for NCT6796D, NCT6797D, NCT6798D
      
       - hpwdt: several improvements
      
       - renesas_wdt: SPDX identifiers, stop when unregistering, support for
         R7S9210
      
       - rza_wdt: SPDX identifiers, support longer timeouts
      
       - core: fix null pointer dereference when releasing cdev
      
       - iTCO_wdt: Drop option vendorsupport=2
      
       - sama5d4: fix timeout-sec usage
      
       - lantiq_wdt: convert to watchdog framework
      
       - several small fixes
      
      * tag 'linux-watchdog-4.20-rc1' of git://www.linux-watchdog.org/linux-watchdog: (30 commits)
        watchdog: ts4800: release syscon device node in ts4800_wdt_probe()
        watchdog: armada_37xx_wdt: use do_div for u64 division
        documentation: watchdog: add documentation for armada-37xx-wdt
        dt-bindings: watchdog: Document armada-37xx-wdt binding
        watchdog: Add support for Armada 37xx CPU watchdog
        dt-bindings: watchdog: add mpc8xxx-wdt support
        watchdog: mpc8xxx: provide boot status
        MAINTAINERS: Fix file pattern for MEN Z069 watchdog driver
        dt-bindings: watchdog: renesas-wdt: Add support for R7S9210
        watchdog: rza_wdt: Support longer timeouts
        watchdog: hpwdt: Disable PreTimeout when Timeout is smaller
        watchdog: w83627hf_wdt: Support NCT6796D, NCT6797D, NCT6798D
        watchdog: mpc8xxx: use dev_xxxx() instead of pr_xxxx()
        watchdog: lantiq: add get_timeleft callback
        watchdog: lantiq: Convert to watchdog_device
        watchdog: lantiq: update register names to better match spec
        watchdog: sama5d4: fix timeout-sec usage
        watchdog: fix a small number of "watchog" typos in comments
        watchdog: rza_wdt: convert to SPDX identifiers
        watchdog: iTCO_wdt: Remove unused hooks
        ...
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · ed3f4e23
      Linus Torvalds authored
      Pull input updates from Dmitry Torokhov:
       "Just random driver fixups, nothing exciting"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
        Input: synaptics - avoid using uninitialized variable when probing
        Input: xen-kbdfront - mark expected switch fall-through
        Input: atmel_mxt_ts - mark expected switch fall-through
        Input: cyapa - mark expected switch fall-throughs
        Input: wm97xx-ts - fix exit path
        Input: of_touchscreen - add support for touchscreen-min-x|y
        Input: Fix DIR-685 touchkeys MAINTAINERS entry
        Input: elants_i2c - use DMA safe i2c when possible
        Input: silead - try firmware reload after unsuccessful resume
        Input: st1232 - set INPUT_PROP_DIRECT property
        Input: xilinx_ps2 - convert to using %pOFn instead of device_node.name
        Input: atmel_mxt_ts - fix multiple <linux/property.h> includes
        Input: sun4i-lradc - convert to using %pOFn instead of device_node.name
        Input: pwm-vibrator - correct pwms in DT binding example
      ed3f4e23
    • Linus Torvalds's avatar
      Merge tag 'rtc-4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux · c7b7eefa
      Linus Torvalds authored
      Pull RTC updates from Alexandre Belloni:
       "This cycle, there were mostly non-urgent fixes in drivers. I also
        finally unexported the non managed registration.
      
        Subsystem:
      
         - non devm managed registration is now removed from the driver API
      
         - all the unnecessary rtc_valid_tm() calls have been removed
      
        Drivers:
      
         - abx80X: watchdog support
      
         - cmos: fix non ACPI support
      
         - sc27xx: fix alarm support
      
         - Remove a possible sysfs race condition for ab8500, ds1307, ds1685,
           isl1208
      
         - Fix a possible race condition where an irq handler may be called
           before the rtc_device struct is allocated for mt6397, pl030,
           menelaus, armada38x"
      
      * tag 'rtc-4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (54 commits)
        rtc: sc27xx: Always read normal alarm when registering RTC device
        rtc: sc27xx: Add check to see if need to enable the alarm interrupt
        rtc: sc27xx: Remove interrupts disable and clear in probe()
        rtc: sc27xx: Clear SPG value update interrupt status
        rtc: sc27xx: Set wakeup capability before registering rtc device
        rtc: s35390a: Change buf's type to u8 in s35390a_init
        rtc: ds1307: fix ds1339 wakealarm support
        rtc: ds1685: simplify getting .driver_data
        rtc: m41t80: mark expected switch fall-through
        rtc: tegra: Propagate errors from platform_get_irq()
        rtc: cmos: Remove the `use_acpi_alarm' module parameter for !ACPI
        rtc: cmos: Fix non-ACPI undefined reference to `hpet_rtc_interrupt'
        rtc: mv: let the core handle invalid alarms
        rtc: vr41xx: switch to rtc_time64_to_tm/rtc_tm_to_time64
        rtc: ab8500: remove useless check
        rtc: ab8500: let the core handle range
        rtc: ab8500: use rtc_add_group
        rtc: rs5c348: report error when time is invalid
        rtc: rs5c348: remove forward declaration
        rtc: rs5c348: remove useless label
        ...
      c7b7eefa
    • Linus Torvalds's avatar
      Merge tag 'led-fix-for-4.20-rc1' of... · e5585453
      Linus Torvalds authored
      Merge tag 'led-fix-for-4.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds
      
      Pull LED fix from Jacek Anaszewski.
      
      * tag 'led-fix-for-4.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds:
        leds: gpio: set led_dat->gpiod pointer for OF defined GPIO leds
      e5585453
    • Linus Torvalds's avatar
      i2c-hid: properly terminate i2c_hid_dmi_desc_override_table[] array · b59dfdae
      Linus Torvalds authored
      Commit 9ee3e066 ("HID: i2c-hid: override HID descriptors for certain
      devices") added a new dmi_system_id quirk table to override certain HID
      report descriptors for some systems that lack them.
      
      But the table wasn't properly terminated, causing the dmi matching to
      walk off into la-la-land, and starting to treat random data as dmi
      descriptor pointers, causing boot-time oopses if you were at all
      unlucky.
      
      Terminate the array.
      
      We really should have some way to just statically check that arrays that
      should be terminated by an empty entry actually are so.  But the HID
      people really should have caught this themselves, rather than have me
      deal with an oops during the merge window.  Tssk, tssk.
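
The failure mode described above can be sketched in plain C. This is a minimal illustration, not the kernel's code: `struct quirk_entry`, `quirk_table`, `find_quirk()` and `table_is_terminated()` are hypothetical names, and the check shown runs at runtime rather than statically.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a quirk table entry (not struct dmi_system_id). */
struct quirk_entry {
	const char *ident;	/* NULL marks the terminating entry */
	int quirk_id;
};

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

static const struct quirk_entry quirk_table[] = {
	{ "vendor-a-laptop", 1 },
	{ "vendor-b-tablet", 2 },
	{ NULL, 0 },		/* terminator: without it the walk runs off the end */
};

/* Walk the table until the empty entry, as a sentinel-based matcher would. */
static const struct quirk_entry *find_quirk(const struct quirk_entry *table,
					    const char *ident)
{
	assert(table != NULL && ident != NULL);
	for (; table->ident != NULL; table++)
		if (strcmp(table->ident, ident) == 0)
			return table;
	return NULL;
}

/* Runtime sanity check: the final array slot must be the empty entry. */
static int table_is_terminated(const struct quirk_entry *table, size_t len)
{
	return len > 0 && table[len - 1].ident == NULL;
}
```

Calling `table_is_terminated(quirk_table, ARRAY_LEN(quirk_table))` once at initialization would catch a missing terminator before any lookup walks past the end of the array.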
      
      Cc: Julian Sax <jsbc@gmx.de>
      Cc: Benjamin Tissoires <benjamin.tissoires@redhat.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b59dfdae
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 6788fac8
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2018-10-27
      
      The following pull-request contains BPF updates for your *net* tree.
      
      The main changes are:
      
      1) Fix toctou race in BTF header validation, from Martin and Wenwen.
      
      2) Fix devmap interface comparison in notifier call which was
         neglecting netns, from Taehee.
      
      3) Several fixes in various places, for example, correcting direct
         packet access and helper function availability, from Daniel.
      
      4) Fix BPF kselftest config fragment to include af_xdp and sockmap,
         from Naresh.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6788fac8
    • Linus Torvalds's avatar
      Merge branch 'akpm' (patches from Andrew) · 345671ea
      Linus Torvalds authored
      Merge updates from Andrew Morton:
      
       - a few misc things
      
       - ocfs2 updates
      
       - most of MM
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (132 commits)
        hugetlbfs: dirty pages as they are added to pagecache
        mm: export add_swap_extent()
        mm: split SWP_FILE into SWP_ACTIVATED and SWP_FS
        tools/testing/selftests/vm/map_fixed_noreplace.c: add test for MAP_FIXED_NOREPLACE
        mm: thp: relocate flush_cache_range() in migrate_misplaced_transhuge_page()
        mm: thp: fix mmu_notifier in migrate_misplaced_transhuge_page()
        mm: thp: fix MADV_DONTNEED vs migrate_misplaced_transhuge_page race condition
        mm/kasan/quarantine.c: make quarantine_lock a raw_spinlock_t
        mm/gup: cache dev_pagemap while pinning pages
        Revert "x86/e820: put !E820_TYPE_RAM regions into memblock.reserved"
        mm: return zero_resv_unavail optimization
        mm: zero remaining unavailable struct pages
        tools/testing/selftests/vm/gup_benchmark.c: add MAP_HUGETLB option
        tools/testing/selftests/vm/gup_benchmark.c: add MAP_SHARED option
        tools/testing/selftests/vm/gup_benchmark.c: allow user specified file
        tools/testing/selftests/vm/gup_benchmark.c: fix 'write' flag usage
        mm/gup_benchmark.c: add additional pinning methods
        mm/gup_benchmark.c: time put_page()
        mm: don't raise MEMCG_OOM event due to failed high-order allocation
        mm/page-writeback.c: fix range_cyclic writeback vs writepages deadlock
        ...
      345671ea
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 49040081
      Linus Torvalds authored
      Pull networking fixes from David Miller:
       "What better way to start off a weekend than with some networking bug
        fixes:
      
        1) net namespace leak in dump filtering code of ipv4 and ipv6, fixed
           by David Ahern and Bjørn Mork.
      
        2) Handle bad checksums from hardware when using CHECKSUM_COMPLETE
           properly in UDP, from Sean Tranchetti.
      
        3) Remove TCA_OPTIONS from policy validation, it turns out we don't
           consistently use nested attributes for this across all packet
           schedulers. From David Ahern.
      
        4) Fix SKB corruption in cadence driver, from Tristram Ha.
      
        5) Fix broken WoL handling in r8169 driver, from Heiner Kallweit.
      
        6) Fix OOPS in pneigh_dump_table(), from Eric Dumazet"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (28 commits)
        net/neigh: fix NULL deref in pneigh_dump_table()
        net: allow traceroute with a specified interface in a vrf
        bridge: do not add port to router list when receives query with source 0.0.0.0
        net/smc: fix smc_buf_unuse to use the lgr pointer
        ipv6/ndisc: Preserve IPv6 control buffer if protocol error handlers are called
        net/{ipv4,ipv6}: Do not put target net if input nsid is invalid
        lan743x: Remove SPI dependency from Microchip group.
        drivers: net: remove <net/busy_poll.h> inclusion when not needed
        net: phy: genphy_10g_driver: Avoid NULL pointer dereference
        r8169: fix broken Wake-on-LAN from S5 (poweroff)
        octeontx2-af: Use GFP_ATOMIC under spin lock
        net: ethernet: cadence: fix socket buffer corruption problem
        net/ipv6: Allow onlink routes to have a device mismatch if it is the default route
        net: sched: Remove TCA_OPTIONS from policy
        ice: Poll for link status change
        ice: Allocate VF interrupts and set queue map
        ice: Introduce ice_dev_onetime_setup
        net: hns3: Fix for warning uninitialized symbol hw_err_lst3
        octeontx2-af: Copy the right amount of memory
        net: udp: fix handling of CHECKSUM_COMPLETE packets
        ...
      49040081
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc · a45dcff7
      Linus Torvalds authored
      Pull sparc fixes from David Miller:
       "Some more sparc fixups, mostly aimed at getting the allmodconfig build
        up and clean again"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
        sparc64: Rework xchg() definition to avoid warnings.
        sparc64: Export __node_distance.
        sparc64: Make corrupted user stacks more debuggable.
      a45dcff7
  5. 26 Oct, 2018 12 commits
    • Mike Kravetz's avatar
      hugetlbfs: dirty pages as they are added to pagecache · 22146c3c
      Mike Kravetz authored
      Some test systems were experiencing negative huge page reserve counts and
      incorrect file block counts.  This was traced to /proc/sys/vm/drop_caches
      removing clean pages from hugetlbfs file pagecaches.  When non-hugetlbfs
      explicit code removes the pages, the appropriate accounting is not
      performed.
      
      This can be recreated as follows:
       fallocate -l 2M /dev/hugepages/foo
       echo 1 > /proc/sys/vm/drop_caches
       fallocate -l 2M /dev/hugepages/foo
       grep -i huge /proc/meminfo
         AnonHugePages:         0 kB
         ShmemHugePages:        0 kB
         HugePages_Total:    2048
         HugePages_Free:     2047
         HugePages_Rsvd:    18446744073709551615
         HugePages_Surp:        0
         Hugepagesize:       2048 kB
         Hugetlb:         4194304 kB
       ls -lsh /dev/hugepages/foo
         4.0M -rw-r--r--. 1 root root 2.0M Oct 17 20:05 /dev/hugepages/foo
      
      To address this issue, dirty pages as they are added to pagecache.  This
      can easily be reproduced with fallocate as shown above.  Read faulted
      pages will eventually end up being marked dirty.  But there is a window
      where they are clean and could be impacted by code such as drop_caches.
      So, just dirty them all as they are added to the pagecache.
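
The HugePages_Rsvd value of 18446744073709551615 in the output above is what an unsigned 64-bit counter shows after being decremented below zero. A minimal sketch of that arithmetic follows; the function names are hypothetical, and the guarded variant is only an illustration, not the fix applied by this patch.

```c
#include <assert.h>
#include <limits.h>

/* Unsigned arithmetic is modular, so 0 - 1 wraps to the maximum value. */
static_assert((unsigned long)0 - 1 == ULONG_MAX,
	      "unsigned underflow wraps to ULONG_MAX");

/* Hypothetical unguarded decrement of a reserve-style counter. */
static unsigned long release_one_unchecked(unsigned long resv)
{
	return resv - 1;
}

/* Hypothetical guarded decrement that refuses to go below zero. */
static unsigned long release_one_checked(unsigned long resv)
{
	return resv > 0 ? resv - 1 : 0;
}
```

On a system where `unsigned long` is 64 bits wide, `ULONG_MAX` is 18446744073709551615, which is the "negative" reserve count reported in /proc/meminfo.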
      
      Link: http://lkml.kernel.org/r/b5be45b8-5afe-56cd-9482-28384699a049@oracle.com
      Fixes: 6bda666a ("hugepages: fold find_or_alloc_pages into huge_no_page()")
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      22146c3c
    • Omar Sandoval's avatar
      mm: export add_swap_extent() · aa8aa8a3
      Omar Sandoval authored
      Btrfs currently does not support swap files because swap's use of bmap
      does not work with copy-on-write and multiple devices.  See 35054394
      ("Btrfs: stop providing a bmap operation to avoid swapfile corruptions").
      
      However, the swap code has a mechanism for the filesystem to manually add
      swap extents using add_swap_extent() from the ->swap_activate() aop.
      iomap has done this since 67482129 ("iomap: add a swapfile activation
      function").  Btrfs will do the same in a later patch, so export
      add_swap_extent().
      
      Link: http://lkml.kernel.org/r/bb1208575e02829aae51b538709476964f97b1ea.1536704650.git.osandov@fb.com
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: David Sterba <dsterba@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      aa8aa8a3
    • Omar Sandoval's avatar
      mm: split SWP_FILE into SWP_ACTIVATED and SWP_FS · bc4ae27d
      Omar Sandoval authored
      The SWP_FILE flag serves two purposes: to make swap_{read,write}page() go
      through the filesystem, and to make swapoff() call ->swap_deactivate().
      For Btrfs, we want the latter but not the former, so split this flag into
      two.  This makes us always call ->swap_deactivate() if ->swap_activate()
      succeeded, not just if it didn't add any swap extents itself.
      
      This also resolves the issue of the very misleading name of SWP_FILE,
      which is only used for swap files over NFS.
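
The split described above can be sketched as two independent flag bits. This is an illustration only: the bit values are hypothetical, and the return convention assumed for the activation hook (negative for error, zero when the filesystem added no extents itself, positive when it did) is an assumption made for the sketch.

```c
#include <assert.h>

/* Hypothetical bit values; the kernel's actual definitions may differ. */
enum {
	SWP_ACTIVATED = 1 << 0,	/* activation succeeded, deactivate on swapoff */
	SWP_FS        = 1 << 1,	/* swap I/O must go through the filesystem */
};

static_assert((SWP_ACTIVATED & SWP_FS) == 0, "flags must not share bits");

/* Derive the flags from an activation result (assumed convention, see above). */
static unsigned int flags_after_activate(int ret)
{
	unsigned int flags = 0;

	if (ret >= 0)
		flags |= SWP_ACTIVATED;	/* any success needs a later deactivate */
	if (ret == 0)
		flags |= SWP_FS;	/* no extents added: route I/O via the fs */
	return flags;
}

static int must_deactivate(unsigned int flags)
{
	return (flags & SWP_ACTIVATED) != 0;
}

static int io_through_fs(unsigned int flags)
{
	return (flags & SWP_FS) != 0;
}
```

With a single combined flag, a filesystem that adds its own extents could not request the deactivate callback without also routing I/O through the filesystem; two bits make the Btrfs case (deactivate, but no filesystem I/O path) expressible.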
      
      Link: http://lkml.kernel.org/r/6d63d8668c4287a4f6d203d65696e96f80abdfc7.1536704650.git.osandov@fb.com
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: David Sterba <dsterba@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      bc4ae27d
    • Michael Ellerman's avatar
      tools/testing/selftests/vm/map_fixed_noreplace.c: add test for MAP_FIXED_NOREPLACE · 91cbacc3
      Michael Ellerman authored
      Add a test for MAP_FIXED_NOREPLACE, based on some code originally by Jann
      Horn.  This would have caught the overlap bug reported by Daniel Micay.
      
      I originally suggested to Michal that we create MAP_FIXED_NOREPLACE, but
      instead of writing a selftest I spent my time bike-shedding whether it
      should be called MAP_FIXED_SAFE/NOCLOBBER/WEAK/NEW ..  mea culpa.
      
      Link: http://lkml.kernel.org/r/20181013133929.28653-1-mpe@ellerman.id.au
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Florian Weimer <fweimer@redhat.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
      Cc: Joel Stanley <joel@jms.id.au>
      Cc: Jason Evans <jasone@google.com>
      Cc: David Goldblatt <davidtgoldblatt@gmail.com>
      Cc: Daniel Micay <danielmicay@gmail.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      91cbacc3
    • Andrea Arcangeli's avatar
      mm: thp: relocate flush_cache_range() in migrate_misplaced_transhuge_page() · 7eef5f97
      Andrea Arcangeli authored
      There should be no cache left by the time we overwrite the old transhuge
      pmd with the new one.  It's already too late to flush through the virtual
      address because we already copied the page data to the new physical
      address.
      
      So flush the cache before the data copy.
      
      Also delete the "end" variable to shut off an "unused variable" warning on
      x86 where flush_cache_range() is a noop.
      
      Link: http://lkml.kernel.org/r/20181015202311.7209-1-aarcange@redhat.com
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7eef5f97
    • Andrea Arcangeli's avatar
      mm: thp: fix mmu_notifier in migrate_misplaced_transhuge_page() · 7066f0f9
      Andrea Arcangeli authored
      change_huge_pmd() after arming the numa/protnone pmd doesn't flush the TLB
      right away.  do_huge_pmd_numa_page() flushes the TLB before calling
      migrate_misplaced_transhuge_page().  By the time do_huge_pmd_numa_page()
      runs some CPU could still access the page through the TLB.
      
      change_huge_pmd() before arming the numa/protnone transhuge pmd calls
      mmu_notifier_invalidate_range_start().  So there's no need for the
      mmu_notifier_invalidate_range_start()/mmu_notifier_invalidate_range_only_end()
      sequence in migrate_misplaced_transhuge_page() too, because by the time
      migrate_misplaced_transhuge_page() runs, the pmd mapping has already been
      invalidated in the secondary MMUs.  It has to be: if a secondary MMU could
      still write to the page, migrate_page_copy() would lose data.
      
      However an explicit mmu_notifier_invalidate_range() is needed before
      migrate_misplaced_transhuge_page() starts copying the data of the
      transhuge page or the below can happen for MMU notifier users sharing the
      primary MMU pagetables and only implementing ->invalidate_range:
      
      CPU0		CPU1		GPU sharing linux pagetables using
                                      only ->invalidate_range
      -----------	------------	---------
      				GPU secondary MMU writes to the page
      				mapped by the transhuge pmd
      change_pmd_range()
      mmu..._range_start()
      ->invalidate_range_start() noop
      change_huge_pmd()
      set_pmd_at(numa/protnone)
      pmd_unlock()
      		do_huge_pmd_numa_page()
      		CPU TLB flush globally (1)
      		CPU cannot write to page
      		migrate_misplaced_transhuge_page()
      				GPU writes to the page...
      		migrate_page_copy()
      				...GPU stops writing to the page
      CPU TLB flush (2)
      mmu..._range_end() (3)
      ->invalidate_range_stop() noop
      ->invalidate_range()
      				GPU secondary MMU is invalidated
      				and cannot write to the page anymore
      				(too late)
      
      Just like we need a CPU TLB flush (1) because the TLB flush (2) arrives
      too late, we also need a mmu_notifier_invalidate_range() before calling
      migrate_misplaced_transhuge_page(), because the ->invalidate_range() in
      (3) also arrives too late.
      
      This requirement is the result of the lazy optimization in
      change_huge_pmd() that releases the pmd_lock without first flushing the
      TLB and without first calling mmu_notifier_invalidate_range().
      
      Even converting the removed mmu_notifier_invalidate_range_only_end() into
      a mmu_notifier_invalidate_range_end() would not have been enough to fix
      this, because it would have run after migrate_page_copy().
      
      After the hugepage data copy is done migrate_misplaced_transhuge_page()
      can proceed and call set_pmd_at without having to flush the TLB or any
      secondary MMUs because the secondary MMU invalidate, just like the CPU TLB
      flush, has to happen before the migrate_page_copy() is called or it would
      be a bug in the first place (and it was for drivers using
      ->invalidate_range()).
      
      KVM is unaffected because it doesn't implement ->invalidate_range().
      
      The standard PAGE_SIZEd migrate_misplaced_page is less accelerated and
      uses the generic migrate_pages which transitions the pte from
      numa/protnone to a migration entry in try_to_unmap_one() and flushes TLBs
      and all mmu notifiers there before copying the page.
      
      Link: http://lkml.kernel.org/r/20181013002430.698-3-aarcange@redhat.com
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reviewed-by: Aaron Tomlin <atomlin@redhat.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7066f0f9
    • Andrea Arcangeli's avatar
      mm: thp: fix MADV_DONTNEED vs migrate_misplaced_transhuge_page race condition · d7c33934
      Andrea Arcangeli authored
      Patch series "migrate_misplaced_transhuge_page race conditions".
      
      Aaron found a new instance of the THP MADV_DONTNEED race against
      pmdp_clear_flush* variants, that was apparently left unfixed.
      
      While looking into the race found by Aaron, I may have found two more
      issues in migrate_misplaced_transhuge_page.
      
      These race conditions would not cause kernel instability, but they'd
      corrupt userland data or leave data non zero after MADV_DONTNEED.
      
      I did only minor testing, and I don't expect to be able to reproduce this
      (especially the lack of ->invalidate_range before migrate_page_copy,
      requires the latest iommu hardware or infiniband to reproduce).  The last
      patch is noop for x86 and it needs further review from maintainers of
      archs that implement flush_cache_range() (not in CC yet).
      
      To avoid confusion: it's not the first patch that introduces the bug fixed
      in the second patch.  Even before removing pmdp_huge_clear_flush_notify,
      the _notify part was called only after migrate_page_copy had already
      run.
      
      This patch (of 3):
      
      This is a corollary of ced10803 ("thp: fix MADV_DONTNEED vs.  numa
      balancing race"), 58ceeb6b ("thp: fix MADV_DONTNEED vs.  MADV_FREE
      race") and 5b7abeae ("thp: fix MADV_DONTNEED vs clear soft dirty
      race").
      
      When the above three fixes were posted Dave asked
      https://lkml.kernel.org/r/929b3844-aec2-0111-fef7-8002f9d4e2b9@intel.com
      but apparently this was missed.
      
      The pmdp_clear_flush* in migrate_misplaced_transhuge_page() was introduced
      in a54a407f ("mm: Close races between THP migration and PMD numa
      clearing").
      
      The important part of such commit is only the part where the page lock is
      not released until the first do_huge_pmd_numa_page() finished disarming
      the pagenuma/protnone.
      
      The addition of pmdp_clear_flush() wasn't beneficial to such commit and
      there's no commentary about such an addition either.
      
      I guess the pmdp_clear_flush() in such commit was added just in case for
      safety, but it ended up introducing the MADV_DONTNEED race condition found
      by Aaron.
      
      At that point in time nobody thought of such kind of MADV_DONTNEED race
      conditions yet (they were fixed later) so the code may have looked more
      robust by adding the pmdp_clear_flush().
      
      This specific race condition won't destabilize the kernel, but it can
      confuse userland because after MADV_DONTNEED the memory won't be zeroed
      out.
      
      This also optimizes the code and removes a superfluous TLB flush.
      
      [akpm@linux-foundation.org: reflow comment to 80 cols, fix grammar and typo (beacuse)]
      Link: http://lkml.kernel.org/r/20181013002430.698-2-aarcange@redhat.com
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Reported-by: Aaron Tomlin <atomlin@redhat.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d7c33934
    • Clark Williams's avatar
      mm/kasan/quarantine.c: make quarantine_lock a raw_spinlock_t · 026d1eaf
      Clark Williams authored
      The static lock quarantine_lock is used in quarantine.c to protect the
      quarantine queue data structures.  It is taken inside quarantine queue
      manipulation routines (quarantine_put(), quarantine_reduce() and
      quarantine_remove_cache()), with IRQs disabled.  This is not a problem on
      a stock kernel but is problematic on an RT kernel where spin locks are
      sleeping spinlocks, which can sleep and cannot be acquired with
      interrupts disabled.
      
      Convert the quarantine_lock to a raw_spinlock_t.  The usage of
      quarantine_lock is confined to quarantine.c and the work performed while
      the lock is held is used for debugging purposes.
      
      [bigeasy@linutronix.de: slightly altered the commit message]
      Link: http://lkml.kernel.org/r/20181010214945.5owshc3mlrh74z4b@linutronix.de
      Signed-off-by: Clark Williams <williams@redhat.com>
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Acked-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      026d1eaf
    • Keith Busch's avatar
      mm/gup: cache dev_pagemap while pinning pages · df06b37f
      Keith Busch authored
      Getting pages from ZONE_DEVICE memory needs to check the backing device's
      live-ness, which is tracked in the device's dev_pagemap metadata.  This
      metadata is stored in a radix tree and looking it up adds measurable
      software overhead.
      
      This patch avoids repeating this relatively costly operation when
      dev_pagemap is used by caching the last dev_pagemap while getting user
      pages.  The gup_benchmark kernel self test reports this reduces time to
      get user pages to as low as 1/3 of the previous time.
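
The caching idea can be sketched as follows: keep the result of the last lookup and reuse it while the next page still falls inside the same range. This is a simplified userspace illustration under stated assumptions: `struct pagemap`, `maps`, `slow_lookup()`, `cached_lookup()` and `pin_range()` are hypothetical, and a linear scan stands in for the radix tree lookup.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device range covering pfns in [start, end). */
struct pagemap {
	unsigned long start, end;
};

static struct pagemap maps[] = { { 0, 100 }, { 100, 200 } };
static unsigned long slow_lookups;	/* counts the expensive lookups */

/* Stand-in for the costly lookup; a linear scan replaces the radix tree. */
static struct pagemap *slow_lookup(unsigned long pfn)
{
	slow_lookups++;
	for (size_t i = 0; i < sizeof(maps) / sizeof(maps[0]); i++)
		if (pfn >= maps[i].start && pfn < maps[i].end)
			return &maps[i];
	return NULL;
}

/* Reuse the previous result while it still covers pfn. */
static struct pagemap *cached_lookup(unsigned long pfn, struct pagemap *last)
{
	if (last && pfn >= last->start && pfn < last->end)
		return last;
	return slow_lookup(pfn);
}

/* Walk a pfn range, carrying the last pagemap across iterations. */
static unsigned long pin_range(unsigned long first, unsigned long count)
{
	struct pagemap *pgmap = NULL;
	unsigned long pinned = 0;

	assert(first + count >= first);	/* guard against wraparound */
	for (unsigned long pfn = first; pfn < first + count; pfn++) {
		pgmap = cached_lookup(pfn, pgmap);
		if (!pgmap)
			break;
		pinned++;
	}
	return pinned;
}
```

In this sketch, walking 100 pages that span both ranges performs two slow lookups instead of 100, one per range entered.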
      
      Link: http://lkml.kernel.org/r/20181012173040.15669-1-keith.busch@intel.com
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Reviewed-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      df06b37f
    • Masayoshi Mizuma's avatar
      Revert "x86/e820: put !E820_TYPE_RAM regions into memblock.reserved" · 9fd61bc9
      Masayoshi Mizuma authored
      Commit 124049de ("x86/e820: put !E820_TYPE_RAM regions into
      memblock.reserved") breaks the movable_node kernel option because it
      changed the memory gap range to reserved memblock.  So, the node is marked
      as Normal zone even if the SRAT has Hot pluggable affinity.
      
          =====================================================================
          kernel: BIOS-e820: [mem 0x0000180000000000-0x0000180fffffffff] usable
          kernel: BIOS-e820: [mem 0x00001c0000000000-0x00001c0fffffffff] usable
          ...
          kernel: reserved[0x12]#011[0x0000181000000000-0x00001bffffffffff], 0x000003f000000000 bytes flags: 0x0
          ...
          kernel: ACPI: SRAT: Node 2 PXM 6 [mem 0x180000000000-0x1bffffffffff] hotplug
          kernel: ACPI: SRAT: Node 3 PXM 7 [mem 0x1c0000000000-0x1fffffffffff] hotplug
          ...
          kernel: Movable zone start for each node
          kernel:  Node 3: 0x00001c0000000000
          kernel: Early memory node ranges
          ...
          =====================================================================
      
      The original issue is fixed by the former patches, so let's revert commit
      124049de ("x86/e820: put !E820_TYPE_RAM regions into
      memblock.reserved").
      
      Link: http://lkml.kernel.org/r/20181002143821.5112-4-msys.mizuma@gmail.com
      Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
      Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Oscar Salvador <osalvador@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9fd61bc9
    • Pavel Tatashin's avatar
      mm: return zero_resv_unavail optimization · ec393a0f
      Pavel Tatashin authored
      When checking for valid pfns in zero_resv_unavail(), it is not necessary
      to verify that pfns within pageblock_nr_pages ranges are valid, only the
      first one needs to be checked.  This is because memory for pages is
      allocated in contiguous chunks that contain pageblock_nr_pages struct
      pages.
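
The loop shape this optimization implies can be sketched as below: test validity only at the pageblock-aligned pfn, and skip the whole block when that test fails. This is an illustration under stated assumptions: the block size of 4, `example_pfn_valid()` and `count_zeroable()` are hypothetical, and counting stands in for zeroing a struct page.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGEBLOCK_NR_PAGES 4UL	/* hypothetical, tiny for illustration */

typedef bool (*pfn_valid_fn)(unsigned long pfn);

/* Hypothetical validity rule: only the second block (pfns 4..7) is invalid. */
static bool example_pfn_valid(unsigned long pfn)
{
	return pfn / PAGEBLOCK_NR_PAGES != 1;
}

/* Count pfns in [start, end) that would be zeroed, checking one pfn per block. */
static unsigned long count_zeroable(unsigned long start, unsigned long end,
				    pfn_valid_fn valid)
{
	unsigned long count = 0;

	assert(start <= end);
	for (unsigned long pfn = start; pfn < end; pfn++) {
		unsigned long block = pfn - pfn % PAGEBLOCK_NR_PAGES;

		if (!valid(block)) {
			/* Jump to the block's last pfn; pfn++ moves to the next block. */
			pfn = block + PAGEBLOCK_NR_PAGES - 1;
			continue;
		}
		count++;	/* stands in for zeroing this struct page */
	}
	return count;
}
```

For the range [2, 14) with the rule above, pfns 2 and 3 are counted, pfns 4 to 7 are skipped after a single validity check, and pfns 8 to 13 are counted, giving 8.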
      
      Link: http://lkml.kernel.org/r/20181002143821.5112-3-msys.mizuma@gmail.com
      Signed-off-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
      Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
      Reviewed-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
      Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ec393a0f
    • Naoya Horiguchi's avatar
      mm: zero remaining unavailable struct pages · 907ec5fc
      Naoya Horiguchi authored
      Patch series "mm: Fix for movable_node boot option", v3.
      
      This patch series contains a fix for the movable_node boot option issue
      which was introduced by commit 124049de ("x86/e820: put !E820_TYPE_RAM
      regions into memblock.reserved").
      
      The commit breaks the option because it puts the memory gap ranges into
      memblock.reserved.  As a result, the node is placed in the Normal zone
      even if the SRAT marks it with hot-pluggable affinity.

      The first and second patches fix the original issue that the commit tried
      to fix; the third patch then reverts the commit.
      
      This patch (of 3):
      
      A kernel panic is triggered when reading /proc/kpageflags on a kernel
      booted with the kernel parameter 'memmap=nn[KMG]!ss[KMG]':
      
        BUG: unable to handle kernel paging request at fffffffffffffffe
        PGD 9b20e067 P4D 9b20e067 PUD 9b210067 PMD 0
        Oops: 0000 [#1] SMP PTI
        CPU: 2 PID: 1728 Comm: page-types Not tainted 4.17.0-rc6-mm1-v4.17-rc6-180605-0816-00236-g2dfb086ef02c+ #160
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.fc28 04/01/2014
        RIP: 0010:stable_page_flags+0x27/0x3c0
        Code: 00 00 00 0f 1f 44 00 00 48 85 ff 0f 84 a0 03 00 00 41 54 55 49 89 fc 53 48 8b 57 08 48 8b 2f 48 8d 42 ff 83 e2 01 48 0f 44 c7 <48> 8b 00 f6 c4 01 0f 84 10 03 00 00 31 db 49 8b 54 24 08 4c 89 e7
        RSP: 0018:ffffbbd44111fde0 EFLAGS: 00010202
        RAX: fffffffffffffffe RBX: 00007fffffffeff9 RCX: 0000000000000000
        RDX: 0000000000000001 RSI: 0000000000000202 RDI: ffffed1182fff5c0
        RBP: ffffffffffffffff R08: 0000000000000001 R09: 0000000000000001
        R10: ffffbbd44111fed8 R11: 0000000000000000 R12: ffffed1182fff5c0
        R13: 00000000000bffd7 R14: 0000000002fff5c0 R15: ffffbbd44111ff10
        FS:  00007efc4335a500(0000) GS:ffff93a5bfc00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: fffffffffffffffe CR3: 00000000b2a58000 CR4: 00000000001406e0
        Call Trace:
         kpageflags_read+0xc7/0x120
         proc_reg_read+0x3c/0x60
         __vfs_read+0x36/0x170
         vfs_read+0x89/0x130
         ksys_pread64+0x71/0x90
         do_syscall_64+0x5b/0x160
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7efc42e75e23
        Code: 09 00 ba 9f 01 00 00 e8 ab 81 f4 ff 66 2e 0f 1f 84 00 00 00 00 00 90 83 3d 29 0a 2d 00 00 75 13 49 89 ca b8 11 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 34 c3 48 83 ec 08 e8 db d3 01 00 48 89 04 24
      
      According to kernel bisection, this problem became visible due to commit
      f7f99100 which changes how struct pages are initialized.
      
      Memblock layout affects the pfn ranges covered by node/zone.  Consider
      that we have a VM with 2 NUMA nodes and each node has 4GB memory, and the
      default (no memmap= given) memblock layout is like below:
      
        MEMBLOCK configuration:
         memory size = 0x00000001fff75c00 reserved size = 0x000000000300c000
         memory.cnt  = 0x4
         memory[0x0]     [0x0000000000001000-0x000000000009efff], 0x000000000009e000 bytes on node 0 flags: 0x0
         memory[0x1]     [0x0000000000100000-0x00000000bffd6fff], 0x00000000bfed7000 bytes on node 0 flags: 0x0
         memory[0x2]     [0x0000000100000000-0x000000013fffffff], 0x0000000040000000 bytes on node 0 flags: 0x0
         memory[0x3]     [0x0000000140000000-0x000000023fffffff], 0x0000000100000000 bytes on node 1 flags: 0x0
         ...
      
      If you give memmap=1G!4G (so it just covers memory[0x2]),
      the range [0x100000000-0x13fffffff] is gone:
      
        MEMBLOCK configuration:
         memory size = 0x00000001bff75c00 reserved size = 0x000000000300c000
         memory.cnt  = 0x3
         memory[0x0]     [0x0000000000001000-0x000000000009efff], 0x000000000009e000 bytes on node 0 flags: 0x0
         memory[0x1]     [0x0000000000100000-0x00000000bffd6fff], 0x00000000bfed7000 bytes on node 0 flags: 0x0
         memory[0x2]     [0x0000000140000000-0x000000023fffffff], 0x0000000100000000 bytes on node 1 flags: 0x0
         ...
      
      This shrinks node 0's pfn range, because that range is calculated from
      the address range of memblock.memory.  As a result, some of the struct
      pages in the gap range are left uninitialized.
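
      The shrink can be reproduced with a small userspace sketch that uses the
      region values from the two dumps above.  It is a simplification under
      stated assumptions: struct region and node_end_pfn() are hypothetical
      names, 4 KiB pages are assumed, and the kernel's real node span
      calculation is more involved than taking the highest region end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumes 4 KiB pages */

/* Hypothetical, simplified view of one memblock.memory region. */
struct region {
	uint64_t base;
	uint64_t size;
	int nid;
};

/* Regions from the default layout (no memmap= given). */
const struct region default_layout[] = {
	{ 0x0000000000001000ULL, 0x000000000009e000ULL, 0 },
	{ 0x0000000000100000ULL, 0x00000000bfed7000ULL, 0 },
	{ 0x0000000100000000ULL, 0x0000000040000000ULL, 0 },
	{ 0x0000000140000000ULL, 0x0000000100000000ULL, 1 },
};

/* Regions after booting with memmap=1G!4G. */
const struct region memmap_layout[] = {
	{ 0x0000000000001000ULL, 0x000000000009e000ULL, 0 },
	{ 0x0000000000100000ULL, 0x00000000bfed7000ULL, 0 },
	{ 0x0000000140000000ULL, 0x0000000100000000ULL, 1 },
};

/* One past the highest pfn of any region on node nid, or 0 if none. */
uint64_t node_end_pfn(const struct region *r, size_t n, int nid)
{
	uint64_t end = 0;

	assert(r != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		uint64_t e = (r[i].base + r[i].size) >> PAGE_SHIFT;

		if (r[i].nid == nid && e > end)
			end = e;
	}
	return end;
}
```

      With these inputs, node 0 ends at pfn 0x140000 in the default layout
      but at pfn 0xbffd7 with memmap=1G!4G, so the struct pages between those
      two pfns fall outside node 0's range.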
      
      The function zero_resv_unavail() zeroes struct pages outside
      memblock.memory, but currently it covers only the reserved unavailable
      ranges (i.e.  memblock.reserved && !memblock.memory).  This patch extends
      it to cover all unavailable ranges, which fixes the reported issue.
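
      A minimal userspace sketch of "cover all unavailable ranges": walk the
      sorted memblock.memory ranges and count every pfn that falls in a gap
      between them, including the gap before the first range and the one up
      to the last pfn.  struct range, count_unavailable_pfns() and the max_pfn
      argument are hypothetical names for illustration, 4 KiB pages are
      assumed, and the real patch zeroes the struct pages rather than
      counting them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT	12	/* assumes 4 KiB pages */
#define PAGE_SIZE	(1ULL << PAGE_SHIFT)
#define PFN_DOWN(x)	((x) >> PAGE_SHIFT)
#define PFN_UP(x)	(((x) + PAGE_SIZE - 1) >> PAGE_SHIFT)

/* Hypothetical byte range [start, end) of one memblock.memory region. */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Count the pfns below max_pfn that are not covered by any range in mem[].
 * mem[] must be sorted by address and must not overlap.
 */
uint64_t count_unavailable_pfns(const struct range *mem, size_t n,
				uint64_t max_pfn)
{
	uint64_t next = 0, pgcnt = 0;

	for (size_t i = 0; i < n; i++) {
		assert(mem[i].start >= next);
		/* Gap between the previous range and this one. */
		if (next < mem[i].start)
			pgcnt += PFN_UP(mem[i].start) - PFN_DOWN(next);
		next = mem[i].end;
	}
	/* Tail gap between the last range and max_pfn, if any. */
	if (PFN_DOWN(next) < max_pfn)
		pgcnt += max_pfn - PFN_DOWN(next);
	return pgcnt;
}
```

      Fed with the ranges from the memmap=1G!4G dump above, the gap between
      0xbffd7000 and 0x140000000 is counted like any other hole, whether or
      not it is in memblock.reserved.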
      
      Link: http://lkml.kernel.org/r/20181002143821.5112-2-msys.mizuma@gmail.com
      Fixes: f7f99100 ("mm: stop zeroing memory during allocation in vmemmap")
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
      Tested-by: Oscar Salvador <osalvador@suse.de>
      Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
      Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>