1. 04 Feb, 2018 14 commits
    • bpf: fix selftests/bpf test_kmod.sh failure when CONFIG_BPF_JIT_ALWAYS_ON=y · 09584b40
      Yonghong Song authored
      With CONFIG_BPF_JIT_ALWAYS_ON defined in the config file,
      tools/testing/selftests/bpf/test_kmod.sh fails as shown below:
        [root@localhost bpf]# ./test_kmod.sh
        sysctl: setting key "net.core.bpf_jit_enable": Invalid argument
        [ JIT enabled:0 hardened:0 ]
        [  132.175681] test_bpf: #297 BPF_MAXINSNS: Jump, gap, jump, ... FAIL to prog_create err=-524 len=4096
        [  132.458834] test_bpf: Summary: 348 PASSED, 1 FAILED, [340/340 JIT'ed]
        [ JIT enabled:1 hardened:0 ]
        [  133.456025] test_bpf: #297 BPF_MAXINSNS: Jump, gap, jump, ... FAIL to prog_create err=-524 len=4096
        [  133.730935] test_bpf: Summary: 348 PASSED, 1 FAILED, [340/340 JIT'ed]
        [ JIT enabled:1 hardened:1 ]
        [  134.769730] test_bpf: #297 BPF_MAXINSNS: Jump, gap, jump, ... FAIL to prog_create err=-524 len=4096
        [  135.050864] test_bpf: Summary: 348 PASSED, 1 FAILED, [340/340 JIT'ed]
        [ JIT enabled:1 hardened:2 ]
        [  136.442882] test_bpf: #297 BPF_MAXINSNS: Jump, gap, jump, ... FAIL to prog_create err=-524 len=4096
        [  136.821810] test_bpf: Summary: 348 PASSED, 1 FAILED, [340/340 JIT'ed]
        [root@localhost bpf]#
      
      test_kmod.sh loads and removes test_bpf.ko multiple times with different
      settings for sysctl net.core.bpf_jit_{enable,harden}. The failing test #297
      of test_bpf.ko is designed such that JIT always fails.
      
      Commit 290af866 (bpf: introduce BPF_JIT_ALWAYS_ON config)
      introduced the following tightening logic:
          ...
              if (!bpf_prog_is_dev_bound(fp->aux)) {
                      fp = bpf_int_jit_compile(fp);
          #ifdef CONFIG_BPF_JIT_ALWAYS_ON
                      if (!fp->jited) {
                              *err = -ENOTSUPP;
                              return fp;
                      }
          #endif
          ...
      With this logic, test #297 always gets the return value -ENOTSUPP
      when CONFIG_BPF_JIT_ALWAYS_ON is defined, causing the test failure.
      
      This patch fixes the failure by marking test #297 as an expected failure
      when CONFIG_BPF_JIT_ALWAYS_ON is defined.
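
      The behavior described above can be modeled in user space. This is only
      a sketch: struct prog, select_runtime() and test_passes() are hypothetical
      names that do not exist in the kernel, and the two booleans stand in for
      the JIT result and for CONFIG_BPF_JIT_ALWAYS_ON.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* ENOTSUPP (524) is kernel-internal and absent from userspace <errno.h>. */
#ifndef ENOTSUPP
#define ENOTSUPP 524
#endif

/* Hypothetical stand-in for struct bpf_prog; only the field needed here. */
struct prog {
	bool jited;
};

/*
 * Hypothetical model of the runtime selection step. jit_ok says whether the
 * JIT managed to translate the program; always_on models
 * CONFIG_BPF_JIT_ALWAYS_ON. Returns 0 or a negative errno.
 */
static int select_runtime(struct prog *p, bool jit_ok, bool always_on)
{
	assert(p != NULL);
	p->jited = jit_ok;
	if (always_on && !p->jited)
		return -ENOTSUPP;	/* no interpreter fallback available */
	return 0;			/* JITed, or falls back to the interpreter */
}

/*
 * A test whose JIT always fails should expect -ENOTSUPP when always_on is
 * set, and success (interpreter fallback) otherwise.
 */
static bool test_passes(bool always_on)
{
	struct prog p;
	int err = select_runtime(&p, false, always_on);
	int expected = always_on ? -ENOTSUPP : 0;

	return err == expected;
}
```

      With the expectation adjusted this way, the err=-524 seen in the log
      above counts as the expected outcome instead of a failure.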
      
      Fixes: 290af866 (bpf: introduce BPF_JIT_ALWAYS_ON config)
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · a6b88814
      David S. Miller authored
      Alexei Starovoitov says:
      
      ====================
      pull-request: bpf 2018-02-02
      
      The following pull-request contains BPF updates for your *net* tree.
      
      The main changes are:
      
      1) support XDP attach in libbpf, from Eric.
      
      2) minor fixes, from Daniel, Jakub, Yonghong, Alexei.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 35277995
      Linus Torvalds authored
      Pull spectre/meltdown updates from Thomas Gleixner:
       "The next round of updates related to melted spectrum:
      
         - The initial set of spectre V1 mitigations:
      
             - Array index speculation blocker and its usage for syscall,
               fdtable and the nl80211 driver.
      
             - Speculation barrier and its usage in user access functions
      
         - Make indirect calls in KVM speculation safe
      
         - Blacklisting of known to be broken microcodes so IBPB/IBRS are not
           touched.
      
         - The initial IBPB support and its usage in context switch
      
         - The exposure of the new speculation MSRs to KVM guests.
      
         - A fix for a regression in x86/32 related to the cpu entry area
      
         - Proper whitelisting for known to be safe CPUs from the mitigations.
      
         - objtool fixes to deal properly with retpolines and alternatives
      
         - Exclude __init functions from retpolines which speeds up the boot
           process.
      
         - Removal of the syscall64 fast path and related cleanups and
           simplifications
      
         - Removal of the unpatched paravirt mode which is yet another source
           of unprotected indirect calls.
      
         - A new and undisputed version of the module mismatch warning
      
         - A couple of cleanup and correctness fixes all over the place
      
        Yet another step towards full mitigation. There are a few things still
        missing like the RSB underflow mitigation for Skylake and other small
        details, but that's being worked on.
      
        That said, I'm taking a belated christmas vacation for a week and hope
        that everything is magically solved when I'm back on Feb 12th"
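
      The array index speculation blocker listed above can be illustrated with
      a branch-free mask. The sketch below is a user-space approximation
      modeled on the generic mask computation; index_mask() and lookup() are
      hypothetical names, and the code relies on arithmetic right shift of
      negative values, which ISO C leaves implementation-defined.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define LONG_BITS (sizeof(long) * CHAR_BIT)

/*
 * Returns ~0UL when index < size and 0 otherwise, computed with arithmetic
 * only. Assumes size <= LONG_MAX so that the sign bit can carry the result.
 */
static unsigned long index_mask(unsigned long index, unsigned long size)
{
	assert(size <= (unsigned long)LONG_MAX);
	return (unsigned long)(~(long)(index | (size - 1UL - index))
			       >> (LONG_BITS - 1));
}

/*
 * Hypothetical lookup helper: the usual bounds check, followed by masking
 * so that an out-of-range index becomes 0 even if the check is bypassed
 * speculatively.
 */
static int lookup(const int *table, unsigned long size,
		  unsigned long index, int fallback)
{
	if (index >= size)
		return fallback;
	index &= index_mask(index, size);
	return table[index];
}
```

      The point of the mask is that the clamped index depends on data rather
      than on a predicted branch, so a mispredicted bounds check cannot steer
      the load outside the array.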
      
      * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
        KVM/SVM: Allow direct access to MSR_IA32_SPEC_CTRL
        KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL
        KVM/VMX: Emulate MSR_IA32_ARCH_CAPABILITIES
        KVM/x86: Add IBPB support
        KVM/x86: Update the reverse_cpuid list to include CPUID_7_EDX
        x86/speculation: Fix typo IBRS_ATT, which should be IBRS_ALL
        x86/pti: Mark constant arrays as __initconst
        x86/spectre: Simplify spectre_v2 command line parsing
        x86/retpoline: Avoid retpolines for built-in __init functions
        x86/kvm: Update spectre-v1 mitigation
        KVM: VMX: make MSR bitmaps per-VCPU
        x86/paravirt: Remove 'noreplace-paravirt' cmdline option
        x86/speculation: Use Indirect Branch Prediction Barrier in context switch
        x86/cpuid: Fix up "virtual" IBRS/IBPB/STIBP feature bits on Intel
        x86/spectre: Fix spelling mistake: "vunerable"-> "vulnerable"
        x86/spectre: Report get_user mitigation for spectre_v1
        nl80211: Sanitize array index in parse_txq_params
        vfs, fdtable: Prevent bounds-check bypass via speculative execution
        x86/syscall: Sanitize syscall table de-references under speculation
        x86/get_user: Use pointer masking to limit speculation
        ...
    • Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 0a646e9c
      Linus Torvalds authored
      Pull x86 fixes from Thomas Gleixner:
       "A small set of changes:
      
         - a fixup for kexec related to 5-level paging mode. That covers most
           of the cases except kexec from a 5-level kernel to a 4-level
           kernel. The latter needs more work and is going to come in 4.17
      
         - two trivial fixes for build warnings triggered by LTO and gcc-8"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/power: Fix swsusp_arch_resume prototype
        x86/dumpstack: Avoid uninitlized variable
        x86/kexec: Make kexec (mostly) work in 5-level paging mode
    • Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · f74a127f
      Linus Torvalds authored
      Pull irq fixes from Thomas Gleixner:
       "Two small changes:
      
         - a fix for an interrupt regression caused by the vector management
           changes in 4.15 affecting museum pieces which rely on interrupt
           probing for legacy (e.g. parallel port) devices.
      
           One of the startup calls in the autoprobe code was not changed to
           the new activate_and_startup() function resulting in a warning and
           as a consequence failing to discover the device interrupt.
      
         - a trivial update to the copyright/license header of the STM32 irq
           chip driver"
      
      * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        genirq: Make legacy autoprobing work again
        irqchip/stm32: Fix copyright
    • Merge tag 'for-linus-20180204' of git://git.kernel.dk/linux-block · 64b28683
      Linus Torvalds authored
      Pull more block updates from Jens Axboe:
       "Most of this is fixes and not new code/features:
      
         - skd fix from Arnd, fixing a build error dependent on slab allocator
           type.
      
         - blk-mq scheduler discard merging fixes, one from me and one from
           Keith. This fixes a segment miscalculation for blk-mq-sched, where
           we mistakenly think two segments are physically contiguous even
           though the request isn't carrying real data. Also fixes a bio-to-rq
           merge case.
      
         - Don't re-set a bit on the buffer_head flags, if it's already set.
           This can cause scalability concerns on bigger machines and
           workloads. From Kemi Wang.
      
         - Add BLK_STS_DEV_RESOURCE return value to blk-mq, allowing us to
           distinguish between a local (device related) resource starvation
           and a global one. The latter might happen without IO being in
           flight, so it has to be handled a bit differently. From Ming"
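
      The "don't re-set a bit that is already set" item above follows a
      test-before-set pattern. The sketch below shows that pattern with C11
      atomics; set_flag_if_clear() is a hypothetical helper, not the
      buffer_head API, and the assumption is that skipping the atomic write
      avoids needless contention on the shared word.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Sets the given bit only if it is currently clear. A plain load is done
 * first so that the common "already set" case avoids the atomic
 * read-modify-write. Returns true when this call performed the write.
 */
static bool set_flag_if_clear(atomic_ulong *flags, unsigned int bit)
{
	unsigned long mask;

	assert(bit < sizeof(unsigned long) * CHAR_BIT);
	mask = 1UL << bit;

	if (atomic_load_explicit(flags, memory_order_relaxed) & mask)
		return false;	/* already set: skip the write entirely */

	atomic_fetch_or(flags, mask);
	return true;
}
```

      The end state of the flag word is the same as with an unconditional set;
      only the number of writes to the shared word changes.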
      
      * tag 'for-linus-20180204' of git://git.kernel.dk/linux-block:
        block: skd: fix incorrect linux/slab_def.h inclusion
        buffer: Avoid setting buffer bits that are already set
        blk-mq-sched: Enable merging discard bio into request
        blk-mq: fix discard merge with scheduler attached
        blk-mq: introduce BLK_STS_DEV_RESOURCE
    • Merge tag 'ntb-4.16' of git://github.com/jonmason/ntb · d3658c22
      Linus Torvalds authored
      Pull NTB updates from Jon Mason:
       "Bug fixes galore, removal of the ntb atom driver, and updates to the
        ntb tools and tests to support the multi-port interface"
      
      * tag 'ntb-4.16' of git://github.com/jonmason/ntb: (37 commits)
        NTB: ntb_perf: fix cast to restricted __le32
        ntb_perf: Fix an error code in perf_copy_chunk()
        ntb_hw_switchtec: Make function switchtec_ntb_remove() static
        NTB: ntb_tool: fix memory leak on 'buf' on error exit path
        NTB: ntb_perf: fix printing of resource_size_t
        NTB: ntb_hw_idt: Set NTB_TOPO_SWITCH topology
        NTB: ntb_test: Update ntb_perf tests
        NTB: ntb_test: Update ntb_tool MW tests
        NTB: ntb_test: Add ntb_tool Message tests
        NTB: ntb_test: Update ntb_tool Scratchpad tests
        NTB: ntb_test: Update ntb_tool DB tests
        NTB: ntb_test: Update ntb_tool link tests
        NTB: ntb_test: Add ntb_tool port tests
        NTB: ntb_test: Safely use paths with whitespace
        NTB: ntb_perf: Add full multi-port NTB API support
        NTB: ntb_tool: Add full multi-port NTB API support
        NTB: ntb_pp: Add full multi-port NTB API support
        NTB: Fix UB/bug in ntb_mw_get_align()
        NTB: Set dma mask and dma coherent mask to NTB devices
        NTB: Rename NTB messaging API methods
        ...
    • Merge tag 'mailbox-v4.16' of git://git.linaro.org/landing-teams/working/fujitsu/integration · 8ac4840a
      Linus Torvalds authored
      Pull mailbox updates from Jassi Brar:
       "Misc driver changes only:
      
         - TI-MsgMgr: Fix print format for a printk
      
         - TI-MsgMgr: SPDX license switch for the driver
      
         - QCOM-IPC: Convert driver to use regmap
      
         - QCOM-IPC: Spawn sibling clock device from mailbox driver"
      
      * tag 'mailbox-v4.16' of git://git.linaro.org/landing-teams/working/fujitsu/integration:
        dt-bindings: mailbox: qcom: Document the APCS clock binding
        mailbox: qcom: Create APCS child device for clock controller
        mailbox: qcom: Convert APCS IPC driver to use regmap
        mailbox: ti-msgmgr: Use %zu for size_t print format
        mailbox: ti-msgmgr: Switch to SPDX Licensing
    • Merge branch 'i2c/for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · 4141cf67
      Linus Torvalds authored
      Pull i2c updates from Wolfram Sang:
       "I2C has the following changes for you:
      
         - new flag to mark DMA safe buffers in i2c_msg. Also, some
           infrastructure around it. And docs.
      
         - huge refactoring of the at24 driver led by the new maintainer
           Bartosz
      
         - update I2C bus recovery to send STOP after recovery
      
         - conversion from gpio to gpiod for I2C bus recovery
      
         - adding a fault-injector to the i2c-gpio driver
      
         - lots of small driver improvements, and bigger ones to
           i2c-sh_mobile"
      
      * 'i2c/for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (99 commits)
        i2c: mv64xxx: Add myself as maintainer for this driver
        i2c: mv64xxx: Fix clock resource by adding an optional bus clock
        i2c: mv64xxx: Remove useless test before clk_disable_unprepare
        i2c: mxs: use true and false for boolean values
        i2c: meson: update doc description to fix build warnings
        i2c: meson: add configurable divider factors
        dt-bindings: i2c: update documentation for the Meson-AXG
        i2c: imx-lpi2c: add runtime pm support
        i2c: rcar: fix some trivial typos in comments
        i2c: davinci: fix the cpufreq transition
        i2c: rk3x: add proper kerneldoc header
        i2c: rk3x: account for const type of of_device_id.data
        i2c: acorn: remove outdated path from file header
        i2c: acorn: add MODULE_LICENSE tag
        i2c: rcar: implement bus recovery
        i2c: send STOP after successful bus recovery
        i2c: ensure SDA is released in recovery if SDA is controllable
        i2c: add 'set_sda' to bus_recovery_info
        i2c: add identifier in declarations for i2c_bus_recovery
        i2c: make kerneldoc about bus recovery more precise
        ...
    • Merge tag 'fscrypt_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt · 3462ac57
      Linus Torvalds authored
      Pull fscrypt updates from Ted Ts'o:
       "Refactor support for encrypted symlinks to move common code to fscrypt"
      
      Ted also points out about the merge:
       "This makes the f2fs symlink code use the fscrypt_encrypt_symlink()
        from the fscrypt tree. This will end up dropping the kzalloc() ->
        f2fs_kzalloc() change, which means the fscrypt-specific allocation
        won't get tested by f2fs's kmalloc error injection system; which is
        fine"
      
      * tag 'fscrypt_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt: (26 commits)
        fscrypt: fix build with pre-4.6 gcc versions
        fscrypt: remove 'ci' parameter from fscrypt_put_encryption_info()
        fscrypt: document symlink length restriction
        fscrypt: fix up fscrypt_fname_encrypted_size() for internal use
        fscrypt: define fscrypt_fname_alloc_buffer() to be for presented names
        fscrypt: calculate NUL-padding length in one place only
        fscrypt: move fscrypt_symlink_data to fscrypt_private.h
        fscrypt: remove fscrypt_fname_usr_to_disk()
        ubifs: switch to fscrypt_get_symlink()
        ubifs: switch to fscrypt ->symlink() helper functions
        ubifs: free the encrypted symlink target
        f2fs: switch to fscrypt_get_symlink()
        f2fs: switch to fscrypt ->symlink() helper functions
        ext4: switch to fscrypt_get_symlink()
        ext4: switch to fscrypt ->symlink() helper functions
        fscrypt: new helper function - fscrypt_get_symlink()
        fscrypt: new helper functions for ->symlink()
        fscrypt: trim down fscrypt.h includes
        fscrypt: move fscrypt_is_dot_dotdot() to fs/crypto/fname.c
        fscrypt: move fscrypt_valid_enc_modes() to fscrypt_private.h
        ...
    • dt-bindings: mailbox: qcom: Document the APCS clock binding · 0ae7d327
      Georgi Djakov authored
      Update the binding documentation for APCS to mention that the APCS
      hardware block also exposes clock controller functionality.
      
      The APCS clock controller is a mux and half-integer divider. It has the
      main CPU PLL as an input and provides the clock for the application CPU.
      Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
      Reviewed-by: Rob Herring <robh@kernel.org>
      Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
      Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
    • mailbox: qcom: Create APCS child device for clock controller · c815d769
      Georgi Djakov authored
      There is a clock controller functionality provided by the APCS hardware
      block of msm8916 devices. The device-tree would represent an APCS node
      with both mailbox and clock provider properties.
      Create a platform child device for the clock controller functionality so
      the driver can probe and use APCS as parent.
      Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
      Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
      Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
    • mailbox: qcom: Convert APCS IPC driver to use regmap · c6a8b171
      Georgi Djakov authored
      This hardware block provides more functionality than just IPC. Convert
      it to regmap to allow other child platform devices to use the same regmap.
      Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
      Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
      Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
    • Merge tag 'usercopy-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux · 617aebe6
      Linus Torvalds authored
      Pull hardened usercopy whitelisting from Kees Cook:
       "Currently, hardened usercopy performs dynamic bounds checking on slab
        cache objects. This is good, but still leaves a lot of kernel memory
        available to be copied to/from userspace in the face of bugs.
      
        To further restrict what memory is available for copying, this creates
        a way to whitelist specific areas of a given slab cache object for
        copying to/from userspace, allowing much finer granularity of access
        control.
      
        Slab caches that are never exposed to userspace can declare no
        whitelist for their objects, thereby keeping them unavailable to
        userspace via dynamic copy operations. (Note, an implicit form of
        whitelisting is the use of constant sizes in usercopy operations and
        get_user()/put_user(); these bypass all hardened usercopy checks since
        these sizes cannot change at runtime.)
      
        This new check is WARN-by-default, so any mistakes can be found over
        the next several releases without breaking anyone's system.
      
        The series has roughly the following sections:
         - remove %p and improve reporting with offset
         - prepare infrastructure and whitelist kmalloc
         - update VFS subsystem with whitelists
         - update SCSI subsystem with whitelists
         - update network subsystem with whitelists
         - update process memory with whitelists
         - update per-architecture thread_struct with whitelists
         - update KVM with whitelists and fix ioctl bug
         - mark all other allocations as not whitelisted
         - update lkdtm for more sensible test coverage"
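
      The whitelisting idea described above amounts to a range check against a
      per-cache window. The following sketch is an illustration under stated
      assumptions: struct cache_desc and copy_allowed() are hypothetical names,
      and the window is modeled as an offset and size within each object.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of a slab cache and its usercopy window. */
struct cache_desc {
	size_t object_size;	/* size of each object in the cache */
	size_t useroffset;	/* start of the whitelisted region */
	size_t usersize;	/* length of the whitelisted region, 0 = none */
};

/*
 * Returns true when the copy of n bytes starting at offset (relative to the
 * start of the object) lies entirely inside the whitelisted region.
 * Written with subtractions only, so the comparisons cannot overflow.
 */
static bool copy_allowed(const struct cache_desc *c, size_t offset, size_t n)
{
	size_t rel;

	assert(c->useroffset <= c->object_size);
	assert(c->usersize <= c->object_size - c->useroffset);

	if (offset < c->useroffset)
		return false;
	rel = offset - c->useroffset;
	return rel <= c->usersize && n <= c->usersize - rel;
}
```

      A cache with usersize set to 0 rejects every non-empty copy, which
      matches the "declare no whitelist" case in the description.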
      
      * tag 'usercopy-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (38 commits)
        lkdtm: Update usercopy tests for whitelisting
        usercopy: Restrict non-usercopy caches to size 0
        kvm: x86: fix KVM_XEN_HVM_CONFIG ioctl
        kvm: whitelist struct kvm_vcpu_arch
        arm: Implement thread_struct whitelist for hardened usercopy
        arm64: Implement thread_struct whitelist for hardened usercopy
        x86: Implement thread_struct whitelist for hardened usercopy
        fork: Provide usercopy whitelisting for task_struct
        fork: Define usercopy region in thread_stack slab caches
        fork: Define usercopy region in mm_struct slab caches
        net: Restrict unwhitelisted proto caches to size 0
        sctp: Copy struct sctp_sock.autoclose to userspace using put_user()
        sctp: Define usercopy region in SCTP proto slab cache
        caif: Define usercopy region in caif proto slab cache
        ip: Define usercopy region in IP proto slab cache
        net: Define usercopy region in struct proto slab cache
        scsi: Define usercopy region in scsi_sense_cache slab cache
        cifs: Define usercopy region in cifs_request slab cache
        vxfs: Define usercopy region in vxfs_inode slab cache
        ufs: Define usercopy region in ufs_inode_cache slab cache
        ...
  2. 03 Feb, 2018 26 commits
    • KVM/SVM: Allow direct access to MSR_IA32_SPEC_CTRL · b2ac58f9
      KarimAllah Ahmed authored
      [ Based on a patch from Paolo Bonzini <pbonzini@redhat.com> ]
      
      ... basically doing exactly what we do for VMX:
      
      - Passthrough SPEC_CTRL to guests (if enabled in guest CPUID)
      - Save and restore SPEC_CTRL around VMExit and VMEntry only if the guest
        actually used it.
      Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
      Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jun Nakajima <jun.nakajima@intel.com>
      Cc: kvm@vger.kernel.org
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Ashok Raj <ashok.raj@intel.com>
      Link: https://lkml.kernel.org/r/1517669783-20732-1-git-send-email-karahmed@amazon.de
    • KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL · d28b387f
      KarimAllah Ahmed authored
      [ Based on a patch from Ashok Raj <ashok.raj@intel.com> ]
      
      Add direct access to MSR_IA32_SPEC_CTRL for guests. This is needed for
      guests that will only mitigate Spectre V2 through IBRS+IBPB and will not
      be using a retpoline+IBPB based approach.
      
      To avoid the overhead of saving and restoring the MSR_IA32_SPEC_CTRL for
      guests that do not actually use the MSR, only start saving and restoring
      when a non-zero value is written to it.
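
      The lazy save/restore policy in the paragraph above can be sketched as
      a small state machine. This is a simplified user-space model, not the
      KVM code: struct vcpu_model and both helpers are hypothetical, and
      keeping the flag set once it has been enabled is an assumption of the
      sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-vCPU state; the real KVM structures differ. */
struct vcpu_model {
	uint64_t spec_ctrl;	/* last value the guest wrote */
	bool save_restore;	/* switch the MSR around entry/exit? */
};

/* Models a guest write to the MSR. */
static void guest_write_spec_ctrl(struct vcpu_model *v, uint64_t val)
{
	assert(v != NULL);
	v->spec_ctrl = val;
	if (val != 0)
		v->save_restore = true;	/* first non-zero write enables switching */
}

/* Whether the entry/exit path needs to touch the MSR for this vCPU. */
static bool must_switch_msr(const struct vcpu_model *v)
{
	return v->save_restore;
}
```

      A guest that never writes a non-zero value keeps must_switch_msr()
      false, so it never pays the save/restore cost.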
      
      No attempt is made to handle STIBP here, intentionally. Filtering STIBP
      may be added in a future patch, which may require trapping all writes
      if we don't want to pass it through directly to the guest.
      
      [dwmw2: Clean up CPUID bits, save/restore manually, handle reset]
      Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
      Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jun Nakajima <jun.nakajima@intel.com>
      Cc: kvm@vger.kernel.org
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Ashok Raj <ashok.raj@intel.com>
      Link: https://lkml.kernel.org/r/1517522386-18410-5-git-send-email-karahmed@amazon.de
    • KVM/VMX: Emulate MSR_IA32_ARCH_CAPABILITIES · 28c1c9fa
      KarimAllah Ahmed authored
      Intel processors use the MSR_IA32_ARCH_CAPABILITIES MSR to indicate RDCL_NO
      (bit 0) and IBRS_ALL (bit 1). This is a read-only MSR. By default the
      contents will come directly from the hardware, but user-space can still
      override it.
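
      As an illustration of the bit layout and the override described above,
      the following user-space sketch decodes the two bits and picks between
      the hardware value and a user-space supplied one. The macro names,
      struct guest_caps and the helpers are chosen for this sketch and are
      not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions as stated in the commit message. */
#define CAP_RDCL_NO	(UINT64_C(1) << 0)
#define CAP_IBRS_ALL	(UINT64_C(1) << 1)

/* Hypothetical per-guest state: an optional user-space override. */
struct guest_caps {
	bool overridden;
	uint64_t user_value;
};

/* Value the guest reads: the override if present, else the host value. */
static uint64_t effective_caps(const struct guest_caps *g, uint64_t host_value)
{
	assert(g != NULL);
	return g->overridden ? g->user_value : host_value;
}

static bool has_rdcl_no(uint64_t caps)
{
	return (caps & CAP_RDCL_NO) != 0;
}

static bool has_ibrs_all(uint64_t caps)
{
	return (caps & CAP_IBRS_ALL) != 0;
}
```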
      
      [dwmw2: The bit in kvm_cpuid_7_0_edx_x86_features can be unconditional]
      Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jun Nakajima <jun.nakajima@intel.com>
      Cc: kvm@vger.kernel.org
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Ashok Raj <ashok.raj@intel.com>
      Link: https://lkml.kernel.org/r/1517522386-18410-4-git-send-email-karahmed@amazon.de
    • KVM/x86: Add IBPB support · 15d45071
      Ashok Raj authored
      The Indirect Branch Predictor Barrier (IBPB) is an indirect branch
      control mechanism. It keeps earlier branches from influencing
      later ones.
      
      Unlike IBRS and STIBP, IBPB does not define a new mode of operation.
      It's a command that ensures predicted branch targets aren't used after
      the barrier. Although IBRS and IBPB are enumerated by the same CPUID
      enumeration, IBPB is very different.
      
      IBPB helps mitigate against three potential attacks:
      
      * Mitigate guests from being attacked by other guests.
        - This is addressed by issuing IBPB when we do a guest switch.
      
      * Mitigate attacks from guest/ring3->host/ring3.
        These would require an IBPB during context switch in host, or after
        VMEXIT. The host process has two ways to mitigate
        - Either it can be compiled with retpoline
        - If it's going through context switch, and has set !dumpable then
          there is an IBPB in that path.
          (Tim's patch: https://patchwork.kernel.org/patch/10192871)
        - The case where after a VMEXIT you return back to Qemu might make
          Qemu attackable from guest when Qemu isn't compiled with retpoline.
        There are issues reported when doing IBPB on every VMEXIT that resulted
        in some tsc calibration woes in guest.
      
      * Mitigate guest/ring0->host/ring0 attacks.
        When host kernel is using retpoline it is safe against these attacks.
        If host kernel isn't using retpoline we might need to do an IBPB flush on
        every VMEXIT.
      
      Even when using retpoline for indirect calls, in certain conditions 'ret'
      can use the BTB on Skylake-era CPUs. There are other mitigations
      available like RSB stuffing/clearing.
      
      * IBPB is issued only for SVM during svm_free_vcpu().
        VMX has a vmclear and SVM doesn't.  Follow discussion here:
        https://lkml.org/lkml/2018/1/15/146
      
      Please refer to the following spec for more details on the enumeration
      and control.
      
      Refer here to get documentation about mitigations.
      
      https://software.intel.com/en-us/side-channel-security-support
      
      [peterz: rebase and changelog rewrite]
      [karahmed: - rebase
                 - vmx: expose PRED_CMD if guest has it in CPUID
                 - svm: only pass through IBPB if guest has it in CPUID
                 - vmx: support !cpu_has_vmx_msr_bitmap()
                 - vmx: support nested]
      [dwmw2: Expose CPUID bit too (AMD IBPB only for now as we lack IBRS)
              PRED_CMD is a write-only MSR]
      Signed-off-by: Ashok Raj <ashok.raj@intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: kvm@vger.kernel.org
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Jun Nakajima <jun.nakajima@intel.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Link: http://lkml.kernel.org/r/1515720739-43819-6-git-send-email-ashok.raj@intel.com
      Link: https://lkml.kernel.org/r/1517522386-18410-3-git-send-email-karahmed@amazon.de
    • KVM/x86: Update the reverse_cpuid list to include CPUID_7_EDX · b7b27aa0
      KarimAllah Ahmed authored
      [dwmw2: Stop using KF() for bits in it, too]
      Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Cc: kvm@vger.kernel.org
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Link: https://lkml.kernel.org/r/1517522386-18410-2-git-send-email-karahmed@amazon.de
    • Merge tag 'pstore-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux · 0771ad44
      Linus Torvalds authored
      Pull pstore update from Kees Cook:
       "Only a header cleanup this release; nice and quiet. :)
      
         - clean up hardirq header usage (Yang Shi)"
      
      * tag 'pstore-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
        fs: pstore: remove unused hardirq.h
      0771ad44
    • Linus Torvalds's avatar
      Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · 23aedc4b
      Linus Torvalds authored
      Pull ext4 updates from Ted Ts'o:
       "Only miscellaneous cleanups and bug fixes for ext4 this cycle"
      
      * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
        ext4: create ext4_kset dynamically
        ext4: create ext4_feat kobject dynamically
        ext4: release kobject/kset even when init/register fail
        ext4: fix incorrect indentation of if statement
        ext4: correct documentation for grpid mount option
        ext4: use 'sbi' instead of 'EXT4_SB(sb)'
        ext4: save error to disk in __ext4_grp_locked_error()
        jbd2: fix sphinx kernel-doc build warnings
        ext4: fix a race in the ext4 shutdown path
        mbcache: make sure c_entry_count is not decremented past zero
        ext4: no need flush workqueue before destroying it
        ext4: fixed alignment and minor code cleanup in ext4.h
        ext4: fix ENOSPC handling in DAX page fault handler
        dax: pass detailed error code from dax_iomap_fault()
        mbcache: revert "fs/mbcache.c: make count_objects() more robust"
        mbcache: initialize entry->e_referenced in mb_cache_entry_create()
        ext4: fix up remaining files with SPDX cleanups
      23aedc4b
    • Linus Torvalds's avatar
      Merge branch 'dmi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging · 85b8bac9
      Linus Torvalds authored
      Pull dmi subsystem updates/fixes from Jean Delvare.
      
      * 'dmi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
        firmware: dmi: handle missing DMI data gracefully
        firmware: dmi_scan: Fix handling of empty DMI strings
        firmware: dmi_scan: Drop dmi_initialized
        firmware: dmi: Optimize dmi_matches
      85b8bac9
    • Linus Torvalds's avatar
      Merge branch 'fixes-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security · 1726aa70
      Linus Torvalds authored
      Pull integrity fixes from James Morris:
      
       - add James Bottomley as a Trusted Keys maintainer.
      
       - IMA: re-initialize iint->atomic_flags on iint_free(), from Mimi.
      
      * 'fixes-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
        ima: re-initialize iint->atomic_flags
        maintainers: update trusted keys
      1726aa70
    • Thomas Gleixner's avatar
      Merge branch 'msr-bitmaps' of git://git.kernel.org/pub/scm/virt/kvm/kvm into x86/pti · a96223f1
      Thomas Gleixner authored
      Pull the KVM prerequisites so the IBPB patches apply.
      a96223f1
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · c80c238a
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) The bnx2x can hang if you give it a GSO packet with a segment size
          which is too big for the hardware, detect and drop in this case.
          From Daniel Axtens.
      
       2) Fix some overflows and pointer leaks in xtables, from Dmitry Vyukov.
      
       3) Missing RCU locking in igmp, from Eric Dumazet.
      
       4) Fix RX checksum handling on r8152, it can only checksum UDP and TCP
          packets. From Hayes Wang.
      
       5) Minor pacing tweak to TCP BBR congestion control, from Neal
          Cardwell.
      
       6) Missing RCU annotations in cls_u32, from Paolo Abeni.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (30 commits)
        Revert "defer call to mem_cgroup_sk_alloc()"
        soreuseport: fix mem leak in reuseport_add_sock()
        net: qlge: use memmove instead of skb_copy_to_linear_data
        net: qed: use correct strncpy() size
        net: cxgb4: avoid memcpy beyond end of source buffer
        cls_u32: add missing RCU annotation.
        r8152: set rx mode early when linking on
        r8152: fix wrong checksum status for received IPv4 packets
        nfp: fix TLV offset calculation
        net: pxa168_eth: add netconsole support
        net: igmp: add a missing rcu locking section
        ibmvnic: fix firmware version when no firmware level has been provided by the VIOS server
        vmxnet3: remove redundant initialization of pointer 'rq'
        lan78xx: remove redundant initialization of pointer 'phydev'
        net: jme: remove unused initialization of 'rxdesc'
        rtnetlink: remove check for IFLA_IF_NETNSID
        rocker: fix possible null pointer dereference in rocker_router_fib_event_work
        inet: Avoid unitialized variable warning in inet_unhash()
        net: bridge: Fix uninitialized error in br_fdb_sync_static()
        openvswitch: Remove padding from packet before L3+ conntrack processing
        ...
      c80c238a
    • Linus Torvalds's avatar
      Merge tag 'gfs2-4.16.fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 · 6ec4de89
      Linus Torvalds authored
      Pull GFS2 fixes from Bob Peterson:
       "Andreas Gruenbacher wrote two additional patches that we would like
        merged in this time. Both are regressions:
      
         - fix another kernel build dependency problem
      
         - fix a performance regression in glock dumps"
      
      * tag 'gfs2-4.16.fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
        gfs2: Glock dump performance regression fix
        gfs2: Fix the crc32c dependency
      6ec4de89
    • Linus Torvalds's avatar
      Merge tag 'scsi-postmerge' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · abbbd021
      Linus Torvalds authored
      Pull second set of SCSI updates from James Bottomley:
       "This is a set of three patches that depended on mq and zone changes in
        the block tree (now upstream)"
      
      * tag 'scsi-postmerge' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: sd: Remove zone write locking
        scsi: sd_zbc: Initialize device request queue zoned data
        scsi: scsi-mq-debugfs: Show more information
      abbbd021
    • Linus Torvalds's avatar
      Merge tag 'linux-kselftest-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest · 6cb7903e
      Linus Torvalds authored
      Pull kselftest updates from Shuah Khan:
       "This update to Kselftest consists of fixes, cleanups, and SPDX license
        additions"
      
      * tag 'linux-kselftest-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
        selftests: vm: update .gitignore with missing generated file
        selftests/x86: Add <test_name>{,_32,_64} targets
        selftests: Fix loss of test output in run_kselftests.sh
        selftest: ftrace: Fix to add 256 kprobe events correctly
        selftest: ftrace: Fix to pick text symbols for kprobes
        selftests: media_tests: Add SPDX license identifier
        selftests: kselftest.h: Add SPDX license identifier
        selftests: kselftest_install.sh: Add SPDX license identifier
        selftests: gen_kselftest_tar.h: Add SPDX license identifier
        selftests: media_tests: Fix Makefile 'clean' target warning
        tools/testing: Fix trailing semicolon
        kselftest: fix OOM in memory compaction test
        selftests: seccomp: fix compile error seccomp_bpf
      6cb7903e
    • Linus Torvalds's avatar
      pinctrl: remove include file from <linux/device.h> · 23c35f48
      Linus Torvalds authored
      When pulling the recent pinctrl merge, I was surprised by how a
      pinctrl-only pull request ended up rebuilding basically the whole
      kernel.
      
      The reason for that ended up being that <linux/device.h> included
      <linux/pinctrl/devinfo.h>, so any change to that file ended up causing
      pretty much every driver out there to be rebuilt.
      
      The reason for that was because 'struct device' has this in it:
      
          #ifdef CONFIG_PINCTRL
              struct dev_pin_info     *pins;
          #endif
      
      but we already avoid header includes for these kinds of things in that
      header file, preferring to just use a forward-declaration of the
      structure instead.  Exactly to avoid this kind of header dependency.
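
      A minimal sketch of the forward-declaration pattern described above.
      Only 'struct dev_pin_info' is named in this commit message; the
      names 'example_device' and 'example_has_pins' are hypothetical and
      are not taken from the kernel sources.

```c
#include <stddef.h>

/* Forward declaration: enough to declare a pointer member, so no
 * header defining the full structure has to be included here. */
struct dev_pin_info;

/* Hypothetical stand-in for 'struct device'; illustration only. */
struct example_device {
	const char *name;
	struct dev_pin_info *pins;	/* pointer to an incomplete type */
};

/* Code that only tests or passes the pointer along does not need
 * the full definition either. */
static int example_has_pins(const struct example_device *dev)
{
	return dev->pins != NULL;
}
```

      Only the files that actually dereference the pointer need the
      header with the full definition, which is what keeps the rebuild
      fan-out small.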
      
      Since some drivers seem to expect that <linux/pinctrl/devinfo.h> header
      to come in automatically, move the include to <linux/pinctrl/pinctrl.h>
      instead.  It might be better to just make the includes more targeted,
      but I'm not going to review every driver.
      
      It would definitely be good to have a tool for finding and minimizing
      header dependencies automatically - or at least help with them.  Right
      now we almost certainly end up having way too many of these things, and
      it's hard to test every single configuration.
      
      FWIW, you can get a sense of the "hotness" of a header file with something
      like this after doing a full build:
      
          find . -name '.*.o.cmd' -print0 |
              xargs -0 tail --lines=+2 |
              grep -v 'wildcard ' |
              tr ' \\' '\n' |
              sort | uniq -c | sort -n | less -S
      
      which isn't exact (there are other things in those '*.o.cmd' than just
      the dependencies, and the "--lines=+2" only removes the header), but
      might be a useful approximation.
      
      With this patch, <linux/pinctrl/devinfo.h> drops to "only" having 833
      users in the current x86-64 allmodconfig.  In contrast, <linux/device.h>
      has 14857 build files including it directly or indirectly.
      
      Of course, the headers that absolutely _everybody_ includes (things like
      <linux/types.h> etc) get a score of 23000+.
      
      Cc: Linus Walleij <linus.walleij@linaro.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      23c35f48
    • Ard Biesheuvel's avatar
      firmware: dmi: handle missing DMI data gracefully · a81114d0
      Ard Biesheuvel authored
      Currently, when booting a kernel with DMI support on a platform that has
      no DMI tables, the following output is emitted into the kernel log:
      
        [    0.128818] DMI not present or invalid.
        ...
        [    1.306659] dmi: Firmware registration failed.
        ...
        [    2.908681] dmi-sysfs: dmi entry is absent.
      
      The first one is a pr_info(), but the subsequent ones are pr_err()s that
      complain about a condition that is not really an error to begin with.
      
      So let's clean this up, and give up silently if dmi_available is not set.
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Acked-by: Martin Hundebøll <mnhu@prevas.dk>
      Signed-off-by: Jean Delvare <jdelvare@suse.de>
      a81114d0
    • Jean Delvare's avatar
      firmware: dmi_scan: Fix handling of empty DMI strings · a7770ae1
      Jean Delvare authored
      The handling of empty DMI strings looks quite broken to me:
      * Strings from 1 to 7 spaces are not considered empty.
      * True empty DMI strings (string index set to 0) are not considered
        empty, and result in allocating a 0-char string.
      * Strings with invalid index also result in allocating a 0-char
        string.
      * Strings starting with 8 spaces are all considered empty, even if
        non-space characters follow (sounds like a weird thing to do, but
        I have actually seen occurrences of this in DMI tables before.)
      * Strings which are considered empty are reported as 8 spaces,
        instead of being actually empty.
      
      Some of these issues are the result of an off-by-one error in memcmp,
      the rest is incorrect by design.
      
      So let's get it square: missing strings and strings made of only
      spaces, regardless of their length, should be treated as empty and
      no memory should be allocated for them. All other strings are
      non-empty and should be allocated.
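
      The rule stated above can be sketched as a small standalone
      predicate. This is an illustration under assumptions, not the code
      in dmi_scan.c: the helper name 'example_dmi_string_is_empty' is
      hypothetical, and a missing string is modelled here as a NULL
      pointer.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not the kernel's implementation: returns true
 * when a string is missing (NULL) or consists only of spaces. */
static bool example_dmi_string_is_empty(const char *s)
{
	if (s == NULL)
		return true;		/* missing string */
	while (*s == ' ')
		s++;			/* skip any number of spaces */
	return *s == '\0';		/* true if nothing but spaces was found */
}
```

      With a check of this shape, a string of 8 spaces followed by other
      characters counts as non-empty, while 1 to 7 spaces count as
      empty, matching the intent described in the list above.
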
      Signed-off-by: Jean Delvare <jdelvare@suse.de>
      Fixes: 79da4721 ("x86: fix DMI out of memory problems")
      Cc: Parag Warudkar <parag.warudkar@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      a7770ae1
    • Jean Delvare's avatar
      firmware: dmi_scan: Drop dmi_initialized · 7117794f
      Jean Delvare authored
      I don't think it makes sense to check for a possible bad
      initialization order at run time on every system when it is all
      decided at build time.
      
      A more efficient way to make sure developers do not introduce new
      calls to dmi_check_system() too early in the initialization sequence
      is to simply document the expected call order. That way, developers
      have a chance to get it right immediately, without having to
      test-boot their kernel, wonder why it does not work, and parse the
      kernel logs for a warning message. And we get rid of the run-time
      performance penalty as a nice side effect.
      Signed-off-by: Jean Delvare <jdelvare@suse.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      7117794f
    • Jean Delvare's avatar
      firmware: dmi: Optimize dmi_matches · 8cf4e6a0
      Jean Delvare authored
      Function dmi_matches can be made a bit faster:
      
      * The documented purpose of dmi_initialized is to catch too early
        calls to dmi_check_system(). I'm not fully convinced it justifies
        slowing down the initialization of all systems out there, but at
        least the check should not have been moved from dmi_check_system()
        to dmi_matches(). dmi_matches() is being called for every entry of
        the table passed to dmi_check_system(), causing the same redundant
        check to be performed again and again. So move it back to
        dmi_check_system(), reverting this specific portion of commit
        d7b1956f ("DMI: Introduce dmi_first_match to make the interface
        more flexible").
      
      * Don't check for the exact_match flag again when we already know its
        value.
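
      The first point amounts to moving a loop-invariant check out of
      the per-entry function and into its caller. A rough sketch with
      hypothetical names ('example_initialized', 'example_matches',
      'example_check_system'); it is not the actual DMI code, and the
      counter exists only to make the effect visible.

```c
#include <stdbool.h>
#include <stddef.h>

static bool example_initialized;	/* hypothetical stand-in for the init flag */
static unsigned int example_checks;	/* counts how often the flag is tested */

/* Per-entry matcher: performs no initialization check of its own. */
static bool example_matches(int entry, int wanted)
{
	return entry == wanted;
}

/* The initialization check runs once per call, before the loop,
 * rather than once per table entry. */
static int example_check_system(const int *table, size_t n, int wanted)
{
	int count = 0;
	size_t i;

	example_checks++;
	if (!example_initialized)
		return 0;

	for (i = 0; i < n; i++)
		if (example_matches(table[i], wanted))
			count++;
	return count;
}
```
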
      Signed-off-by: Jean Delvare <jdelvare@suse.de>
      Fixes: d7b1956f ("DMI: Introduce dmi_first_match to make the interface more flexible")
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Cc: Daniel Vetter <daniel.vetter@intel.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Jeff Garzik <jgarzik@redhat.com>
      8cf4e6a0
    • Alexei Starovoitov's avatar
      Merge branch 'libbpf-xdp-support' · 09c0656d
      Alexei Starovoitov authored
      Eric Leblond says:
      
      ====================
      Here is an updated v8 version:
      - add if_link.h in uapi and remove the definition
      - fix a commit message
      - remove uapi from an include
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      09c0656d
    • Eric Leblond's avatar
      samples/bpf: use bpf_set_link_xdp_fd · b259c2ff
      Eric Leblond authored
      Use bpf_set_link_xdp_fd instead of set_link_xdp_fd to remove some
      code duplication and benefit from netlink ext ack error messages.
      Signed-off-by: Eric Leblond <eric@regit.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      b259c2ff
    • Eric Leblond's avatar
      6061a3d6
    • Eric Leblond's avatar
      libbpf: add error reporting in XDP · bbf48c18
      Eric Leblond authored
      Parse the netlink ext attribute to get the error message returned by
      the card. The code is partially taken from libnl.

      We add netlink.h to the uapi includes of tools. To build the samples
      successfully we must avoid including the userspace netlink header,
      so nlattr.h has a define that prevents the inclusion. Using a
      direct define could have been an issue, as NLMSGERR_ATTR_MAX can
      change in the future.

      We also define SOL_NETLINK if it is not already defined, to avoid
      having to copy socket.h for a fixed value.
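
      A minimal sketch of the fallback define mentioned above. The
      placement and surrounding includes are assumptions made for
      illustration and are not copied from the libbpf sources; the value
      270 is the one the kernel's include/linux/socket.h uses for
      SOL_NETLINK.

```c
#include <sys/socket.h>

/* Older libc headers may not provide SOL_NETLINK, so fall back to the
 * fixed value used by the kernel's include/linux/socket.h. */
#ifndef SOL_NETLINK
#define SOL_NETLINK 270
#endif
```
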
      Signed-off-by: Eric Leblond <eric@regit.org>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      bbf48c18
    • Eric Leblond's avatar
      libbpf: add function to setup XDP · 949abbe8
      Eric Leblond authored
      Most of the code is taken from set_link_xdp_fd() in bpf_load.c and
      slightly modified to be library compliant.
      Signed-off-by: Eric Leblond <eric@regit.org>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      949abbe8
    • Eric Leblond's avatar
      tools: add netlink.h and if_link.h in tools uapi · dc2b9f19
      Eric Leblond authored
      The headers are necessary for libbpf compilation on systems with
      older versions of the headers.
      Signed-off-by: Eric Leblond <eric@regit.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      dc2b9f19
    • Roman Gushchin's avatar
      Revert "defer call to mem_cgroup_sk_alloc()" · edbe69ef
      Roman Gushchin authored
      This patch effectively reverts commit 9f1c2674 ("net: memcontrol:
      defer call to mem_cgroup_sk_alloc()").
      
      Moving mem_cgroup_sk_alloc() to the inet_csk_accept() completely breaks
      memcg socket memory accounting, as packets received before memcg
      pointer initialization are not accounted and are causing refcounting
      underflow on socket release.
      
      Actually the use-after-free problem was fixed by
      commit c0576e39 ("net: call cgroup_sk_alloc() earlier in
      sk_clone_lock()") for the cgroup pointer.
      
      So, let's revert it and call mem_cgroup_sk_alloc() just before
      cgroup_sk_alloc(). This is safe, as we hold a reference to the socket
      we're cloning, and it holds a reference to the memcg.
      
      Also, let's drop BUG_ON(mem_cgroup_is_root()) check from
      mem_cgroup_sk_alloc(). I see no reasons why bumping the root
      memcg counter is a good reason to panic, and there are no realistic
      ways to hit it.
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      edbe69ef