- 06 Feb, 2018 5 commits
Daniel Borkmann authored
John Fastabend says:

====================
A set of fixes for sockmap to resolve programs referencing sockmaps and closing without deleting all entries in the map and/or not detaching BPF programs attached to the map. Both leaving entries in the map and not detaching programs may result in the map failing to be removed by BPF infrastructure due to reference counts never reaching zero.

For this we pull in the ULP infrastructure to hook into the close() hook of the sock layer. This seemed natural because we have additional sockmap features (to add support for TX hooks) that will also use the ULP infrastructure. This allows us to clean up entries in the map when socks are closed() and avoid trying to get the sk_state_change() hook to fire in all cases.

The second issue resolved here occurs when users don't detach programs. The gist is a refcnt issue resolved by implementing the release callback. See patch for details.

For testing I ran both sample/sockmap and selftests bpf/test_maps.c. Dave Watson ran the TLS test suite on the v1 version of the patches without the put_module error path change.

v4 fix missing rcu_unlock()
v3 wrap psock reference in RCU
v2 changes rebased onto bpf-next with small update adding module_put
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
John Fastabend authored
When a program is attached to a map we increment the program refcnt to ensure that the program is not removed while it is potentially being referenced from the sockmap side. However, if this same program also references the map (a reasonably common pattern in my programs), the verifier also increments the map's refcnt. This ensures the map doesn't get garbage collected while the program has a reference to it.

So we are left in a state where the map holds a refcnt on the program, stopping it from being removed and releasing the map refcnt. Vice versa, the program holds a refcnt on the map, stopping it from releasing the refcnt on the prog. All this is fine as long as users detach the program while the map fd is still around. But if the user omits this detach command, we are left with a dangling map we can no longer release.

To resolve this, when the map fd is released, decrement the program references and remove any reference from the map to the program. This fixes the issue with a possibly dangling map and creates a user-side API constraint: the map fd must be held open for programs to be attached to a map.

Fixes: 174a79ff ("bpf: sockmap with sk redirect support")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
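The reference cycle described above, and the way dropping program references on map-fd release breaks it, can be modeled in a few lines of C. This is a simplified sketch, not the kernel's code: struct map, struct prog, map_fd_release() and its drop_progs switch are hypothetical, and "freed" flags stand in for actual freeing. run_scenario() loads a program that uses the map, attaches it, then closes both fds without a detach.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced objects: only enough state to show the cycle. */
struct prog;

struct map {
    int refcnt;
    bool freed;
    struct prog *attached;   /* map -> prog reference (sockmap attach) */
};

struct prog {
    int refcnt;
    bool freed;
    struct map *used_map;    /* prog -> map reference (taken by verifier) */
};

static void map_put(struct map *m);

static void prog_put(struct prog *p)
{
    if (--p->refcnt == 0) {
        p->freed = true;
        if (p->used_map)
            map_put(p->used_map);     /* prog drops its map reference */
    }
}

static void map_put(struct map *m)
{
    if (--m->refcnt == 0) {
        m->freed = true;
        if (m->attached)
            prog_put(m->attached);    /* map drops its prog reference */
    }
}

/* Attaching makes the map hold a reference on the program. */
static void map_attach(struct map *m, struct prog *p)
{
    p->refcnt++;
    m->attached = p;
}

/* Closing the map fd. With drop_progs set (the fix), program references
 * held by the map are released first, which breaks the cycle. */
static void map_fd_release(struct map *m, bool drop_progs)
{
    if (drop_progs && m->attached) {
        struct prog *p = m->attached;
        m->attached = NULL;
        prog_put(p);
    }
    map_put(m);                       /* the fd's own reference */
}

/* Returns true if both objects end up freed. */
static bool run_scenario(bool drop_progs)
{
    struct map m = { .refcnt = 1 };   /* held by the map fd */
    struct prog p = { .refcnt = 1 };  /* held by the prog fd */

    p.used_map = &m;                  /* verifier: prog -> map */
    m.refcnt++;
    map_attach(&m, &p);               /* sockmap: map -> prog */

    prog_put(&p);                     /* user closes the prog fd */
    map_fd_release(&m, drop_progs);   /* user closes the map fd, no detach */
    return m.freed && p.freed;
}
```

With drop_progs false both objects are left at refcnt 1 and never freed; with it true the program is released first, its map reference goes with it, and the fd's final put frees the map.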
-
John Fastabend authored
The selftests test_maps program was leaving dangling BPF sockmap programs around because not all psock elements were removed from the map. The elements in turn hold a reference on the BPF program they are attached to, causing BPF programs to stay open even after test_maps has completed.

The original intent was that sk_state_change() would be called when TCP socks went through the TCP_CLOSE state. However, because socks may be in the SOCK_DEAD state or the sock may be a listening socket, the event is not always triggered.

To resolve this, use the ULP infrastructure and register our own proto close() handler. This fixes the above case.

Fixes: 174a79ff ("bpf: sockmap with sk redirect support")
Reported-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
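The close() override described above can be sketched in user-space C. This is a minimal model, not the kernel's ULP API: struct proto and struct sock here are hypothetical reductions, and sockmap_hook()/sockmap_close() are illustrative names. At map-insert time the socket's original ops are saved and replaced by a copy whose close() first drops the map entry, then restores and calls the original close(). Because it runs on every close(), it does not depend on a state-change event firing.

```c
/* Hypothetical, reduced stand-ins for the kernel's struct proto / struct sock. */
struct sock;

struct proto {
    void (*close)(struct sock *sk, long timeout);
};

struct sock {
    const struct proto *prot;    /* active protocol ops */
    const struct proto *saved;   /* original ops, saved at map insert */
    struct proto own;            /* per-socket copy with our close() */
    int in_map;                  /* 1 while the socket is a map entry */
    int closed;                  /* set by the original close() */
};

/* Stand-in for the protocol's normal close(). */
static void tcp_close_stub(struct sock *sk, long timeout)
{
    (void)timeout;
    sk->closed = 1;
}

static const struct proto tcp_prot_stub = { .close = tcp_close_stub };

/* Our close(): remove the map entry, restore the original ops, and then
 * let the original close() run as usual. */
static void sockmap_close(struct sock *sk, long timeout)
{
    const struct proto *orig = sk->saved;

    sk->in_map = 0;              /* drop the psock entry (and its prog ref) */
    sk->prot = orig;
    orig->close(sk, timeout);
}

/* At map insert time: save the original ops and override close(). */
static void sockmap_hook(struct sock *sk)
{
    sk->saved = sk->prot;
    sk->own = *sk->prot;
    sk->own.close = sockmap_close;
    sk->prot = &sk->own;
    sk->in_map = 1;
}
```

Callers keep invoking sk->prot->close() exactly as before; the override is transparent to them.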
-
John Fastabend authored
Create a UID field and enum that can be used to assign ULPs to sockets. This saves a set of string comparisons if the ULP id is known.

For sockmap, which is added in the next patches, a ULP is used to hook into the close state of TCP sockets. In this case the ULP is added at map insert time, the ULP is known, and the insertion is done on the kernel side, so the named lookup is not needed.

Because we don't want to expose psock internals to user space socket options, a user-visible flag is also added. For TLS this flag is set; for BPF it is cleared.

Also remove pr_notice; the user gets an error code back and should check that rather than rely on logs.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
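A sketch of the two lookup paths described above: a named lookup for user space that skips ULPs not marked user-visible, and an id-based lookup for kernel-internal callers that avoids string comparisons. The enum values, struct ulp_ops fields, and function names are hypothetical; only the idea of a UID plus a user-visible flag comes from the commit.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical ULP ids, in the spirit of the UID enum the commit adds. */
enum ulp_id { ULP_ID_TLS = 0, ULP_ID_BPF = 1 };

struct ulp_ops {
    const char *name;
    enum ulp_id uid;
    bool user_visible;   /* may be selected by name from user space */
};

static const struct ulp_ops ulps[] = {
    { .name = "tls", .uid = ULP_ID_TLS, .user_visible = true  },
    { .name = "bpf", .uid = ULP_ID_BPF, .user_visible = false },
};

#define NUM_ULPS (sizeof(ulps) / sizeof(ulps[0]))

/* User-space path (e.g. a socket option): string lookup, hidden ULPs skipped. */
static const struct ulp_ops *ulp_find_by_name(const char *name)
{
    for (size_t i = 0; i < NUM_ULPS; i++)
        if (ulps[i].user_visible && strcmp(ulps[i].name, name) == 0)
            return &ulps[i];
    return NULL;
}

/* Kernel-internal path: the id is already known, no string comparisons. */
static const struct ulp_ops *ulp_find_by_id(enum ulp_id uid)
{
    for (size_t i = 0; i < NUM_ULPS; i++)
        if (ulps[i].uid == uid)
            return &ulps[i];
    return NULL;
}
```

User space asking for "bpf" by name gets nothing back, while sockmap can still attach the same ULP internally by id.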
-
Yonghong Song authored
The tests at tools/testing/selftests/bpf can run in batch mode, e.g.,

  make -C tools/testing/selftests/bpf run_tests

With the batch mode, I experienced intermittent test failures caused by test_xdp_redirect.sh:

  ....
  selftests: test_xdp_redirect [PASS]
  selftests: test_xdp_redirect.sh [PASS]
  RTNETLINK answers: File exists
  selftests: test_xdp_meta [FAILED]
  selftests: test_xdp_meta.sh [FAIL]
  ....

The following illustrates what caused the failure:

  (1) test_xdp_redirect creates veth pairs (veth1,veth11) and (veth2,veth22), and assigns veth11 and veth22 to namespaces ns1 and ns2 respectively.
  (2) At the end of the test_xdp_redirect test, ns1 and ns2 are deleted. During this process, the deletion of actual namespace resources, including deletion of veth1{1} and veth2{2}, is put into a workqueue to be processed asynchronously.
  (3) test_xdp_meta tries to create veth pair (veth1, veth2). The previous veth deletions in step (2) have not finished yet, and veth1 or veth2 may still be valid in the kernel, thus causing the failure.

The fix is to explicitly delete the veth pair before test_xdp_redirect exits. Only one end of each veth pair needs deletion, as the kernel deletes the other end automatically. test_xdp_meta is also fixed in a similar manner to avoid future potential issues.

Fixes: 996139e8 ("selftests: bpf: add a test for XDP redirect")
Fixes: 22c88526 ("bpf: improve selftests and add tests for meta pointer")
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
- 04 Feb, 2018 14 commits
Yonghong Song authored
With CONFIG_BPF_JIT_ALWAYS_ON defined in the config file, tools/testing/selftests/bpf/test_kmod.sh failed like below:

  [root@localhost bpf]# ./test_kmod.sh
  sysctl: setting key "net.core.bpf_jit_enable": Invalid argument
  [ JIT enabled:0 hardened:0 ]
  [ 132.175681] test_bpf: #297 BPF_MAXINSNS: Jump, gap, jump, ... FAIL to prog_create err=-524 len=4096
  [ 132.458834] test_bpf: Summary: 348 PASSED, 1 FAILED, [340/340 JIT'ed]
  [ JIT enabled:1 hardened:0 ]
  [ 133.456025] test_bpf: #297 BPF_MAXINSNS: Jump, gap, jump, ... FAIL to prog_create err=-524 len=4096
  [ 133.730935] test_bpf: Summary: 348 PASSED, 1 FAILED, [340/340 JIT'ed]
  [ JIT enabled:1 hardened:1 ]
  [ 134.769730] test_bpf: #297 BPF_MAXINSNS: Jump, gap, jump, ... FAIL to prog_create err=-524 len=4096
  [ 135.050864] test_bpf: Summary: 348 PASSED, 1 FAILED, [340/340 JIT'ed]
  [ JIT enabled:1 hardened:2 ]
  [ 136.442882] test_bpf: #297 BPF_MAXINSNS: Jump, gap, jump, ... FAIL to prog_create err=-524 len=4096
  [ 136.821810] test_bpf: Summary: 348 PASSED, 1 FAILED, [340/340 JIT'ed]
  [root@localhost bpf]#

The test_kmod.sh script loads and removes test_bpf.ko multiple times with different settings for sysctl net.core.bpf_jit_{enable,harden}. The failed test #297 of test_bpf.ko is designed such that JIT always fails.

Commit 290af866 (bpf: introduce BPF_JIT_ALWAYS_ON config) introduced the following tightening logic:

  ...
  if (!bpf_prog_is_dev_bound(fp->aux)) {
          fp = bpf_int_jit_compile(fp);
  #ifdef CONFIG_BPF_JIT_ALWAYS_ON
          if (!fp->jited) {
                  *err = -ENOTSUPP;
                  return fp;
          }
  #endif
  ...

With this logic, test #297 always gets the return value -ENOTSUPP when CONFIG_BPF_JIT_ALWAYS_ON is defined, causing the test failure. This patch fixes the failure by marking test #297 as an expected failure when CONFIG_BPF_JIT_ALWAYS_ON is defined.

Fixes: 290af866 (bpf: introduce BPF_JIT_ALWAYS_ON config)
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
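The "expected failure under a config option" idea can be sketched as a test descriptor whose flags depend on the same #ifdef as the kernel logic quoted above. This is a user-space illustration only: struct test_case, FLAG_EXPECTED_FAIL, prog_create_stub() and run_test() are hypothetical names, not test_bpf.c's actual definitions. ENOTSUPP is defined locally because it is kernel-internal (524, matching err=-524 in the log) and absent from user-space errno.h.

```c
/* Kernel-internal errno, not exported to user space. */
#ifndef ENOTSUPP
#define ENOTSUPP 524
#endif

#define FLAG_EXPECTED_FAIL 0x1   /* hypothetical flag */

struct test_case {
    const char *descr;
    int flags;
    int expected_errcode;
};

/* The test is marked as an expected failure only when the config
 * option that forbids the interpreter fallback is set. */
static const struct test_case test_297 = {
    .descr = "BPF_MAXINSNS: Jump, gap, jump, ...",
#ifdef CONFIG_BPF_JIT_ALWAYS_ON
    .flags = FLAG_EXPECTED_FAIL,
#else
    .flags = 0,
#endif
    .expected_errcode = -ENOTSUPP,
};

/* Stand-in for program creation when the JIT cannot compile the program. */
static int prog_create_stub(void)
{
#ifdef CONFIG_BPF_JIT_ALWAYS_ON
    return -ENOTSUPP;   /* no interpreter fallback allowed */
#else
    return 0;           /* interpreter runs the program instead */
#endif
}

/* Returns 1 on PASS: an expected failure passes only with the expected error. */
static int run_test(const struct test_case *t)
{
    int err = prog_create_stub();

    if (err == 0)
        return !(t->flags & FLAG_EXPECTED_FAIL);
    return (t->flags & FLAG_EXPECTED_FAIL) && err == t->expected_errcode;
}
```

Compiled with or without -DCONFIG_BPF_JIT_ALWAYS_ON, run_test(&test_297) reports a pass, which is the behavior the fix aims for.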
-
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
David S. Miller authored
Alexei Starovoitov says: ==================== pull-request: bpf 2018-02-02 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) support XDP attach in libbpf, from Eric. 2) minor fixes, from Daniel, Jakub, Yonghong, Alexei. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Linus Torvalds authored
Pull spectre/meltdown updates from Thomas Gleixner:
 "The next round of updates related to melted spectrum:

   - The initial set of spectre V1 mitigations:
       - Array index speculation blocker and its usage for syscall, fdtable and the nl80211 driver.
       - Speculation barrier and its usage in user access functions
   - Make indirect calls in KVM speculation safe
   - Blacklisting of known-to-be-broken microcodes so IBPB/IBRS are not touched.
   - The initial IBPB support and its usage in context switch
   - The exposure of the new speculation MSRs to KVM guests.
   - A fix for a regression in x86/32 related to the cpu entry area
   - Proper whitelisting for known-to-be-safe CPUs from the mitigations.
   - objtool fixes to deal properly with retpolines and alternatives
   - Exclude __init functions from retpolines, which speeds up the boot process.
   - Removal of the syscall64 fast path and related cleanups and simplifications
   - Removal of the unpatched paravirt mode, which is yet another source of indirect unprotected calls.
   - A new and undisputed version of the module mismatch warning
   - A couple of cleanup and correctness fixes all over the place

  Yet another step towards full mitigation. There are a few things still missing like the RSB underflow mitigation for Skylake and other small details, but that's being worked on.

  That said, I'm taking a belated Christmas vacation for a week and hope that everything is magically solved when I'm back on Feb 12th"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
  KVM/SVM: Allow direct access to MSR_IA32_SPEC_CTRL
  KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL
  KVM/VMX: Emulate MSR_IA32_ARCH_CAPABILITIES
  KVM/x86: Add IBPB support
  KVM/x86: Update the reverse_cpuid list to include CPUID_7_EDX
  x86/speculation: Fix typo IBRS_ATT, which should be IBRS_ALL
  x86/pti: Mark constant arrays as __initconst
  x86/spectre: Simplify spectre_v2 command line parsing
  x86/retpoline: Avoid retpolines for built-in __init functions
  x86/kvm: Update spectre-v1 mitigation
  KVM: VMX: make MSR bitmaps per-VCPU
  x86/paravirt: Remove 'noreplace-paravirt' cmdline option
  x86/speculation: Use Indirect Branch Prediction Barrier in context switch
  x86/cpuid: Fix up "virtual" IBRS/IBPB/STIBP feature bits on Intel
  x86/spectre: Fix spelling mistake: "vunerable"-> "vulnerable"
  x86/spectre: Report get_user mitigation for spectre_v1
  nl80211: Sanitize array index in parse_txq_params
  vfs, fdtable: Prevent bounds-check bypass via speculative execution
  x86/syscall: Sanitize syscall table de-references under speculation
  x86/get_user: Use pointer masking to limit speculation
  ...
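The array index speculation blocker in the first bullet clamps an index with a branch-free mask after the bounds check, so a mispredicted check cannot feed an out-of-range index into a load. The sketch below is modeled on the arithmetic of the kernel's generic array_index_mask_nospec(); the function names here are hypothetical, and it assumes index and size stay below LONG_MAX and that right-shifting a negative long is arithmetic (true for GCC and Clang). It shows the arithmetic only and is not a hardened implementation.

```c
#include <limits.h>

#define BITS_PER_LONG (sizeof(long) * CHAR_BIT)

/* ~0UL when index < size, 0 otherwise, computed without a branch:
 * if index >= size, (size - 1 - index) wraps and sets the top bit. */
static unsigned long index_mask_nospec(unsigned long index, unsigned long size)
{
    return ~(long)(index | (size - 1UL - index)) >> (BITS_PER_LONG - 1);
}

/* Out-of-range indexes collapse to 0 instead of reaching memory. */
static unsigned long index_nospec(unsigned long index, unsigned long size)
{
    return index & index_mask_nospec(index, size);
}

static int table[4] = { 10, 20, 30, 40 };

static int load_entry(unsigned long idx)
{
    if (idx >= 4)
        return -1;
    idx = index_nospec(idx, 4);   /* clamp again for the speculative path */
    return table[idx];
}
```

The second clamp looks redundant on the architectural path; it matters only when the CPU speculates past the `idx >= 4` check.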
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Linus Torvalds authored
Pull x86 fixes from Thomas Gleixner: "A small set of changes: - a fixup for kexec related to 5-level paging mode. That covers most of the cases except kexec from a 5-level kernel to a 4-level kernel. The latter needs more work and is going to come in 4.17 - two trivial fixes for build warnings triggered by LTO and gcc-8" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/power: Fix swsusp_arch_resume prototype x86/dumpstack: Avoid uninitlized variable x86/kexec: Make kexec (mostly) work in 5-level paging mode
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Linus Torvalds authored
Pull irq fixes from Thomas Gleixner:
 "Two small changes:

   - a fix for an interrupt regression caused by the vector management changes in 4.15 affecting museum pieces which rely on interrupt probing for legacy (e.g. parallel port) devices. One of the startup calls in the autoprobe code was not changed to the new activate_and_startup() function, resulting in a warning and, as a consequence, failing to discover the device interrupt.

   - a trivial update to the copyright/license header of the STM32 irq chip driver"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq: Make legacy autoprobing work again
  irqchip/stm32: Fix copyright
-
git://git.kernel.dk/linux-block
Linus Torvalds authored
Pull more block updates from Jens Axboe:
 "Most of this is fixes and not new code/features:

   - skd fix from Arnd, fixing a build error dependent on slab allocator type.

   - blk-mq scheduler discard merging fixes, one from me and one from Keith. This fixes a segment miscalculation for blk-mq-sched, where we mistakenly think two segments are physically contiguous even though the request isn't carrying real data. Also fixes a bio-to-rq merge case.

   - Don't re-set a bit on the buffer_head flags if it's already set. This can cause scalability concerns on bigger machines and workloads. From Kemi Wang.

   - Add BLK_STS_DEV_RESOURCE return value to blk-mq, allowing us to distinguish between a local (device related) resource starvation and a global one. The latter might happen without IO being in flight, so it has to be handled a bit differently. From Ming"

* tag 'for-linus-20180204' of git://git.kernel.dk/linux-block:
  block: skd: fix incorrect linux/slab_def.h inclusion
  buffer: Avoid setting buffer bits that are already set
  blk-mq-sched: Enable merging discard bio into request
  blk-mq: fix discard merge with scheduler attached
  blk-mq: introduce BLK_STS_DEV_RESOURCE
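The buffer_head change above follows a common pattern: read a shared flag word with a plain load before doing an atomic set, so a bit that is already set does not force a read-modify-write (and exclusive ownership of the cache line) on every call. The sketch uses C11 atomics as a stand-in for the kernel's test_bit()/set_bit(); BH_UPTODATE and both function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag bit in a shared state word (think bh->b_state). */
#define BH_UPTODATE (1UL << 0)

/* Unconditional atomic OR: always a read-modify-write on the cache line,
 * even when the bit is already set. */
static void set_bit_always(atomic_ulong *state, unsigned long bit)
{
    atomic_fetch_or(state, bit);
}

/* Test first: if the bit is already set (the common case for hot buffers),
 * the line is only read, not dirtied. Two racing callers may both see the
 * bit clear and both set it; that is harmless because OR is idempotent. */
static bool set_bit_if_clear(atomic_ulong *state, unsigned long bit)
{
    if (atomic_load_explicit(state, memory_order_relaxed) & bit)
        return false;            /* already set: no write issued */
    atomic_fetch_or(state, bit);
    return true;
}
```

The saving is only in write traffic; the final value of the flag word is the same with either helper.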
-
git://github.com/jonmason/ntb
Linus Torvalds authored
Pull NTB updates from Jon Mason: "Bug fixes galore, removal of the ntb atom driver, and updates to the ntb tools and tests to support the multi-port interface" * tag 'ntb-4.16' of git://github.com/jonmason/ntb: (37 commits) NTB: ntb_perf: fix cast to restricted __le32 ntb_perf: Fix an error code in perf_copy_chunk() ntb_hw_switchtec: Make function switchtec_ntb_remove() static NTB: ntb_tool: fix memory leak on 'buf' on error exit path NTB: ntb_perf: fix printing of resource_size_t NTB: ntb_hw_idt: Set NTB_TOPO_SWITCH topology NTB: ntb_test: Update ntb_perf tests NTB: ntb_test: Update ntb_tool MW tests NTB: ntb_test: Add ntb_tool Message tests NTB: ntb_test: Update ntb_tool Scratchpad tests NTB: ntb_test: Update ntb_tool DB tests NTB: ntb_test: Update ntb_tool link tests NTB: ntb_test: Add ntb_tool port tests NTB: ntb_test: Safely use paths with whitespace NTB: ntb_perf: Add full multi-port NTB API support NTB: ntb_tool: Add full multi-port NTB API support NTB: ntb_pp: Add full multi-port NTB API support NTB: Fix UB/bug in ntb_mw_get_align() NTB: Set dma mask and dma coherent mask to NTB devices NTB: Rename NTB messaging API methods ...
-
git://git.linaro.org/landing-teams/working/fujitsu/integration
Linus Torvalds authored
Pull mailbox updates from Jassi Brar:
 "Misc driver changes only:

   - TI-MsgMgr: Fix print format for a printk
   - TI-MsgMgr: SPDX license switch for the driver
   - QCOM-IPC: Convert driver to use regmap
   - QCOM-IPC: Spawn sibling clock device from mailbox driver"

* tag 'mailbox-v4.16' of git://git.linaro.org/landing-teams/working/fujitsu/integration:
  dt-bindings: mailbox: qcom: Document the APCS clock binding
  mailbox: qcom: Create APCS child device for clock controller
  mailbox: qcom: Convert APCS IPC driver to use regmap
  mailbox: ti-msgmgr: Use %zu for size_t print format
  mailbox: ti-msgmgr: Switch to SPDX Licensing
-
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Linus Torvalds authored
Pull i2c updates from Wolfram Sang: "I2C has the following changes for you: - new flag to mark DMA safe buffers in i2c_msg. Also, some infrastructure around it. And docs. - huge refactoring of the at24 driver led by the new maintainer Bartosz - update I2C bus recovery to send STOP after recovery - conversion from gpio to gpiod for I2C bus recovery - adding a fault-injector to the i2c-gpio driver - lots of small driver improvements, and bigger ones to i2c-sh_mobile" * 'i2c/for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (99 commits) i2c: mv64xxx: Add myself as maintainer for this driver i2c: mv64xxx: Fix clock resource by adding an optional bus clock i2c: mv64xxx: Remove useless test before clk_disable_unprepare i2c: mxs: use true and false for boolean values i2c: meson: update doc description to fix build warnings i2c: meson: add configurable divider factors dt-bindings: i2c: update documentation for the Meson-AXG i2c: imx-lpi2c: add runtime pm support i2c: rcar: fix some trivial typos in comments i2c: davinci: fix the cpufreq transition i2c: rk3x: add proper kerneldoc header i2c: rk3x: account for const type of of_device_id.data i2c: acorn: remove outdated path from file header i2c: acorn: add MODULE_LICENSE tag i2c: rcar: implement bus recovery i2c: send STOP after successful bus recovery i2c: ensure SDA is released in recovery if SDA is controllable i2c: add 'set_sda' to bus_recovery_info i2c: add identifier in declarations for i2c_bus_recovery i2c: make kerneldoc about bus recovery more precise ...
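The bus recovery items above (clock SCL until a stuck slave releases SDA, then send a STOP) can be modeled in plain C. Everything here is an assumption for illustration: struct bus simulates the lines, the slave is assumed to advance one bit per SCL falling edge, and the function names only loosely mirror the get_sda/set_scl/set_sda callbacks of struct i2c_bus_recovery_info. RECOVERY_PULSES is set to nine, enough to clock out a byte plus its ACK.

```c
#include <stdbool.h>

#define RECOVERY_PULSES 9   /* 8 data bits + ACK */

/* Hypothetical bus model: a slave holds SDA low until it has seen
 * `stuck_bits` more SCL falling edges. */
struct bus {
    int stuck_bits;   /* bits the slave still wants to shift out */
    bool scl;         /* SCL level driven by the master */
    bool sda;         /* SDA level driven by the master (true = released) */
    int stops;        /* STOP conditions seen on the bus */
};

/* SDA reads high only if neither master nor slave pulls it low. */
static bool get_sda(const struct bus *b)
{
    return b->sda && b->stuck_bits <= 0;
}

static void set_scl(struct bus *b, bool val)
{
    if (b->scl && !val && b->stuck_bits > 0)
        b->stuck_bits--;          /* slave advances on a falling edge */
    b->scl = val;
}

static void set_sda(struct bus *b, bool val)
{
    if (b->scl && !b->sda && val && b->stuck_bits <= 0)
        b->stops++;               /* SDA rising while SCL high: STOP */
    b->sda = val;
}

/* Pulse SCL until SDA is free, then issue a STOP. Returns true on success. */
static bool recover_bus(struct bus *b)
{
    set_sda(b, true);
    for (int i = 0; i < RECOVERY_PULSES && !get_sda(b); i++) {
        set_scl(b, false);
        set_scl(b, true);
    }
    if (!get_sda(b))
        return false;             /* slave still holds SDA */

    /* STOP: SDA low while SCL is low, raise SCL, then release SDA. */
    set_scl(b, false);
    set_sda(b, false);
    set_scl(b, true);
    set_sda(b, true);
    return true;
}
```

A slave stuck for a few bits is freed and followed by exactly one STOP; one that needs more than nine pulses makes recovery fail without a STOP.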
-
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt
Linus Torvalds authored
Pull fscrypt updates from Ted Ts'o: "Refactor support for encrypted symlinks to move common code to fscrypt" Ted also points out about the merge: "This makes the f2fs symlink code use the fscrypt_encrypt_symlink() from the fscrypt tree. This will end up dropping the kzalloc() -> f2fs_kzalloc() change, which means the fscrypt-specific allocation won't get tested by f2fs's kmalloc error injection system; which is fine" * tag 'fscrypt_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt: (26 commits) fscrypt: fix build with pre-4.6 gcc versions fscrypt: remove 'ci' parameter from fscrypt_put_encryption_info() fscrypt: document symlink length restriction fscrypt: fix up fscrypt_fname_encrypted_size() for internal use fscrypt: define fscrypt_fname_alloc_buffer() to be for presented names fscrypt: calculate NUL-padding length in one place only fscrypt: move fscrypt_symlink_data to fscrypt_private.h fscrypt: remove fscrypt_fname_usr_to_disk() ubifs: switch to fscrypt_get_symlink() ubifs: switch to fscrypt ->symlink() helper functions ubifs: free the encrypted symlink target f2fs: switch to fscrypt_get_symlink() f2fs: switch to fscrypt ->symlink() helper functions ext4: switch to fscrypt_get_symlink() ext4: switch to fscrypt ->symlink() helper functions fscrypt: new helper function - fscrypt_get_symlink() fscrypt: new helper functions for ->symlink() fscrypt: trim down fscrypt.h includes fscrypt: move fscrypt_is_dot_dotdot() to fs/crypto/fname.c fscrypt: move fscrypt_valid_enc_modes() to fscrypt_private.h ...
-
Georgi Djakov authored
Update the binding documentation for APCS to mention that the APCS hardware block also exposes a clock controller functionality.

The APCS clock controller is a mux and half-integer divider. It has the main CPU PLL as an input and provides the clock for the application CPU.

Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
-
Georgi Djakov authored
There is a clock controller functionality provided by the APCS hardware block of msm8916 devices. The device-tree would represent an APCS node with both mailbox and clock provider properties. Create a platform child device for the clock controller functionality so the driver can probe and use APCS as parent. Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org> Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
-
Georgi Djakov authored
This hardware block provides more functionality than just IPC. Convert it to regmap to allow other child platform devices to use the same regmap.

Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Linus Torvalds authored
Pull hardened usercopy whitelisting from Kees Cook:
 "Currently, hardened usercopy performs dynamic bounds checking on slab cache objects. This is good, but still leaves a lot of kernel memory available to be copied to/from userspace in the face of bugs.

  To further restrict what memory is available for copying, this creates a way to whitelist specific areas of a given slab cache object for copying to/from userspace, allowing much finer granularity of access control.

  Slab caches that are never exposed to userspace can declare no whitelist for their objects, thereby keeping them unavailable to userspace via dynamic copy operations. (Note, an implicit form of whitelisting is the use of constant sizes in usercopy operations and get_user()/put_user(); these bypass all hardened usercopy checks since these sizes cannot change at runtime.)

  This new check is WARN-by-default, so any mistakes can be found over the next several releases without breaking anyone's system.

  The series has roughly the following sections:
   - remove %p and improve reporting with offset
   - prepare infrastructure and whitelist kmalloc
   - update VFS subsystem with whitelists
   - update SCSI subsystem with whitelists
   - update network subsystem with whitelists
   - update process memory with whitelists
   - update per-architecture thread_struct with whitelists
   - update KVM with whitelists and fix ioctl bug
   - mark all other allocations as not whitelisted
   - update lkdtm for more sensible test coverage"

* tag 'usercopy-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (38 commits)
  lkdtm: Update usercopy tests for whitelisting
  usercopy: Restrict non-usercopy caches to size 0
  kvm: x86: fix KVM_XEN_HVM_CONFIG ioctl
  kvm: whitelist struct kvm_vcpu_arch
  arm: Implement thread_struct whitelist for hardened usercopy
  arm64: Implement thread_struct whitelist for hardened usercopy
  x86: Implement thread_struct whitelist for hardened usercopy
  fork: Provide usercopy whitelisting for task_struct
  fork: Define usercopy region in thread_stack slab caches
  fork: Define usercopy region in mm_struct slab caches
  net: Restrict unwhitelisted proto caches to size 0
  sctp: Copy struct sctp_sock.autoclose to userspace using put_user()
  sctp: Define usercopy region in SCTP proto slab cache
  caif: Define usercopy region in caif proto slab cache
  ip: Define usercopy region in IP proto slab cache
  net: Define usercopy region in struct proto slab cache
  scsi: Define usercopy region in scsi_sense_cache slab cache
  cifs: Define usercopy region in cifs_request slab cache
  vxfs: Define usercopy region in vxfs_inode slab cache
  ufs: Define usercopy region in ufs_inode_cache slab cache
  ...
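The whitelist check described above reduces to an interval test: a dynamic copy of n bytes at a given offset into a slab object is allowed only if it lies entirely inside the cache's declared [useroffset, useroffset + usersize) window. In the series, caches declare that window at creation time (via kmem_cache_create_usercopy()); the struct and function below are hypothetical user-space stand-ins, written so the bounds arithmetic cannot overflow.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cache descriptor with one whitelisted window per object. */
struct cache_desc {
    size_t object_size;
    size_t useroffset;
    size_t usersize;     /* 0: nothing may be copied dynamically */
};

/* Allow [offset, offset + n) only if it lies inside the object and inside
 * the whitelisted window. Subtractions are ordered to avoid overflow. */
static bool usercopy_allowed(const struct cache_desc *c, size_t offset, size_t n)
{
    if (offset > c->object_size || n > c->object_size - offset)
        return false;                        /* outside the object itself */
    if (offset < c->useroffset)
        return false;                        /* starts before the window */
    return offset - c->useroffset <= c->usersize &&
           n <= c->usersize - (offset - c->useroffset);
}
```

A cache with usersize 0, matching "Restrict non-usercopy caches to size 0" in the shortlog, rejects every non-empty dynamic copy.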
-
- 03 Feb, 2018 21 commits
KarimAllah Ahmed authored
[ Based on a patch from Paolo Bonzini <pbonzini@redhat.com> ] ... basically doing exactly what we do for VMX: - Passthrough SPEC_CTRL to guests (if enabled in guest CPUID) - Save and restore SPEC_CTRL around VMExit and VMEntry only if the guest actually used it. Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jun Nakajima <jun.nakajima@intel.com> Cc: kvm@vger.kernel.org Cc: Dave Hansen <dave.hansen@intel.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Asit Mallick <asit.k.mallick@intel.com> Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Ashok Raj <ashok.raj@intel.com> Link: https://lkml.kernel.org/r/1517669783-20732-1-git-send-email-karahmed@amazon.de
-
KarimAllah Ahmed authored
[ Based on a patch from Ashok Raj <ashok.raj@intel.com> ]

Add direct access to MSR_IA32_SPEC_CTRL for guests. This is needed for guests that will only mitigate Spectre V2 through IBRS+IBPB and will not be using a retpoline+IBPB based approach.

To avoid the overhead of saving and restoring the MSR_IA32_SPEC_CTRL for guests that do not actually use the MSR, only start saving and restoring when a non-zero value is written to it.

No attempt is made to handle STIBP here, intentionally. Filtering STIBP may be added in a future patch, which may require trapping all writes if we don't want to pass it through directly to the guest.

[dwmw2: Clean up CPUID bits, save/restore manually, handle reset]

Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jun Nakajima <jun.nakajima@intel.com>
Cc: kvm@vger.kernel.org
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ashok Raj <ashok.raj@intel.com>
Link: https://lkml.kernel.org/r/1517522386-18410-5-git-send-email-karahmed@amazon.de
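The "only save and restore once the guest writes a non-zero value" policy above can be sketched as follows. This is a hypothetical user-space model, not KVM's code: struct vcpu, the rdmsr/wrmsr stubs (which count MSR accesses instead of touching hardware), and the function names are all assumptions. Until the guest's first non-zero write, the MSR stays intercepted and the entry/exit path does no MSR work at all.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical vCPU state for MSR_IA32_SPEC_CTRL. */
struct vcpu {
    uint64_t spec_ctrl;        /* guest's value of the MSR */
    bool passthrough;          /* guest wrote non-zero: stop intercepting */
};

static uint64_t hw_spec_ctrl;          /* stands in for the physical MSR */
static unsigned long msr_accesses;     /* counts simulated MSR reads/writes */

static uint64_t rdmsr_stub(void) { msr_accesses++; return hw_spec_ctrl; }
static void wrmsr_stub(uint64_t v) { msr_accesses++; hw_spec_ctrl = v; }

/* Intercepted guest write: switch to passthrough on the first non-zero value. */
static void guest_wrmsr_spec_ctrl(struct vcpu *v, uint64_t val)
{
    v->spec_ctrl = val;
    if (val)
        v->passthrough = true;
}

/* Around VM entry/exit: save and restore only for guests that use the MSR. */
static void vcpu_run(struct vcpu *v, uint64_t host_val)
{
    if (v->passthrough)
        wrmsr_stub(v->spec_ctrl);      /* load guest value before entry */

    /* ... guest runs; with passthrough it may write the MSR directly ... */

    if (v->passthrough) {
        v->spec_ctrl = rdmsr_stub();   /* save guest value after exit */
        if (v->spec_ctrl != host_val)
            wrmsr_stub(host_val);      /* restore host value */
    }
}
```

A guest that never touches SPEC_CTRL costs zero MSR accesses per run; after a non-zero write each run pays for the load, save, and restore.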
-
KarimAllah Ahmed authored
Intel processors use MSR_IA32_ARCH_CAPABILITIES MSR to indicate RDCL_NO (bit 0) and IBRS_ALL (bit 1). This is a read-only MSR. By default the contents will come directly from the hardware, but user-space can still override it. [dwmw2: The bit in kvm_cpuid_7_0_edx_x86_features can be unconditional] Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jun Nakajima <jun.nakajima@intel.com> Cc: kvm@vger.kernel.org Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Asit Mallick <asit.k.mallick@intel.com> Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Ashok Raj <ashok.raj@intel.com> Link: https://lkml.kernel.org/r/1517522386-18410-4-git-send-email-karahmed@amazon.de
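Decoding the two bits named above is straightforward. The bit positions come from the text; the macro names match those used in the kernel's MSR definitions, while struct arch_caps and both functions are hypothetical helpers. guest_arch_caps() models the described default of passing the hardware value through unless user space supplied an override.

```c
#include <stdbool.h>
#include <stdint.h>

#define ARCH_CAP_RDCL_NO  (1ULL << 0)  /* not affected by rogue data cache load */
#define ARCH_CAP_IBRS_ALL (1ULL << 1)  /* IBRS can be enabled once and left on */

struct arch_caps {
    bool rdcl_no;
    bool ibrs_all;
};

static struct arch_caps decode_arch_caps(uint64_t msr)
{
    struct arch_caps c = {
        .rdcl_no  = (msr & ARCH_CAP_RDCL_NO) != 0,
        .ibrs_all = (msr & ARCH_CAP_IBRS_ALL) != 0,
    };
    return c;
}

/* Value reported to the guest: the hardware value by default, or the
 * value set by user space when an override exists. */
static uint64_t guest_arch_caps(uint64_t hw, bool has_override, uint64_t override)
{
    return has_override ? override : hw;
}
```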
-
Ashok Raj authored
The Indirect Branch Predictor Barrier (IBPB) is an indirect branch control mechanism. It keeps earlier branches from influencing later ones.

Unlike IBRS and STIBP, IBPB does not define a new mode of operation. It's a command that ensures predicted branch targets aren't used after the barrier. Although IBRS and IBPB are enumerated by the same CPUID enumeration, IBPB is very different.

IBPB helps mitigate against three potential attacks:

* Mitigate guests from being attacked by other guests.
  - This is addressed by issuing IBPB when we do a guest switch.

* Mitigate attacks from guest/ring3->host/ring3. These would require an IBPB during context switch in host, or after VMEXIT. The host process has two ways to mitigate:
  - Either it can be compiled with retpoline
  - If it's going through context switch, and has set !dumpable, then there is an IBPB in that path. (Tim's patch: https://patchwork.kernel.org/patch/10192871)
  - The case where after a VMEXIT you return back to Qemu might make Qemu attackable from guest when Qemu isn't compiled with retpoline.
  There are issues reported when doing IBPB on every VMEXIT that resulted in some tsc calibration woes in guest.

* Mitigate guest/ring0->host/ring0 attacks. When host kernel is using retpoline it is safe against these attacks. If host kernel isn't using retpoline we might need to do an IBPB flush on every VMEXIT.

Even when using retpoline for indirect calls, in certain conditions 'ret' can use the BTB on Skylake-era CPUs. There are other mitigations available like RSB stuffing/clearing.

* IBPB is issued only for SVM during svm_free_vcpu(). VMX has a vmclear and SVM doesn't. Follow discussion here: https://lkml.org/lkml/2018/1/15/146

Please refer to the following spec for more details on the enumeration and control, and for documentation about mitigations:

https://software.intel.com/en-us/side-channel-security-support

[peterz: rebase and changelog rewrite]
[karahmed: - rebase
           - vmx: expose PRED_CMD if guest has it in CPUID
           - svm: only pass through IBPB if guest has it in CPUID
           - vmx: support !cpu_has_vmx_msr_bitmap()
           - vmx: support nested]
[dwmw2: Expose CPUID bit too (AMD IBPB only for now as we lack IBRS)
        PRED_CMD is a write-only MSR]

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: kvm@vger.kernel.org
Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Jun Nakajima <jun.nakajima@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Link: http://lkml.kernel.org/r/1515720739-43819-6-git-send-email-ashok.raj@intel.com
Link: https://lkml.kernel.org/r/1517522386-18410-3-git-send-email-karahmed@amazon.de
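The first case above, issuing IBPB on a guest switch, amounts to remembering which vCPU was last loaded on a physical CPU and issuing the barrier only when a different one is loaded. The sketch is a hypothetical model: in KVM the barrier is a write of PRED_CMD_IBPB to MSR_IA32_PRED_CMD, which here is replaced by a counter, and struct vcpu_ctx, last_loaded and vcpu_load() are illustrative names.

```c
#include <stddef.h>

/* Hypothetical per-CPU state; the real barrier is an MSR write. */
struct vcpu_ctx { int id; };

static const struct vcpu_ctx *last_loaded;   /* last vCPU loaded on this CPU */
static unsigned int ibpb_count;

static void ibpb_stub(void) { ibpb_count++; }

/* Issue the barrier only when a different vCPU is loaded, so branch
 * predictions trained by one guest are not consumed by the next. */
static void vcpu_load(const struct vcpu_ctx *v)
{
    if (last_loaded != v)
        ibpb_stub();
    last_loaded = v;
}
```

Reloading the same vCPU, for example after it is rescheduled on the same CPU with no other guest in between, skips the barrier.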
-
KarimAllah Ahmed authored
[dwmw2: Stop using KF() for bits in it, too] Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Jim Mattson <jmattson@google.com> Cc: kvm@vger.kernel.org Cc: Radim Krčmář <rkrcmar@redhat.com> Link: https://lkml.kernel.org/r/1517522386-18410-2-git-send-email-karahmed@amazon.de
-
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Linus Torvalds authored
Pull pstore update from Kees Cook: "Only a header cleanup this release; nice and quiet. :) - clean up hardirq header usage (Yang Shi)" * tag 'pstore-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: fs: pstore: remove unused hardirq.h
-
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Linus Torvalds authored
Pull ext4 updates from Ted Ts'o: "Only miscellaneous cleanups and bug fixes for ext4 this cycle" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: create ext4_kset dynamically ext4: create ext4_feat kobject dynamically ext4: release kobject/kset even when init/register fail ext4: fix incorrect indentation of if statement ext4: correct documentation for grpid mount option ext4: use 'sbi' instead of 'EXT4_SB(sb)' ext4: save error to disk in __ext4_grp_locked_error() jbd2: fix sphinx kernel-doc build warnings ext4: fix a race in the ext4 shutdown path mbcache: make sure c_entry_count is not decremented past zero ext4: no need flush workqueue before destroying it ext4: fixed alignment and minor code cleanup in ext4.h ext4: fix ENOSPC handling in DAX page fault handler dax: pass detailed error code from dax_iomap_fault() mbcache: revert "fs/mbcache.c: make count_objects() more robust" mbcache: initialize entry->e_referenced in mb_cache_entry_create() ext4: fix up remaining files with SPDX cleanups
-
git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging
Linus Torvalds authored
Pull dmi subsystem updates/fixes from Jean Delvare. * 'dmi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging: firmware: dmi: handle missing DMI data gracefully firmware: dmi_scan: Fix handling of empty DMI strings firmware: dmi_scan: Drop dmi_initialized firmware: dmi: Optimize dmi_matches
-
Linus Torvalds authored
Merge branch 'fixes-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull integrity fixes from James Morris: - add James Bottomley as a Trusted Keys maintainer. - IMA: re-initialize iint->atomic_flags on iint_free(), from Mimi. * 'fixes-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: ima: re-initialize iint->atomic_flags maintainers: update trusted keys
-
git://git.kernel.org/pub/scm/virt/kvm/kvm
Thomas Gleixner authored
Pull the KVM prerequisites so the IBPB patches apply.
-
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds authored
Pull networking fixes from David Miller: 1) The bnx2x can hang if you give it a GSO packet with a segment size which is too big for the hardware, detect and drop in this case. From Daniel Axtens. 2) Fix some overflows and pointer leaks in xtables, from Dmitry Vyukov. 3) Missing RCU locking in igmp, from Eric Dumazet. 4) Fix RX checksum handling on r8152, it can only checksum UDP and TCP packets. From Hayes Wang. 5) Minor pacing tweak to TCP BBR congestion control, from Neal Cardwell. 6) Missing RCU annotations in cls_u32, from Paolo Abeni. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (30 commits) Revert "defer call to mem_cgroup_sk_alloc()" soreuseport: fix mem leak in reuseport_add_sock() net: qlge: use memmove instead of skb_copy_to_linear_data net: qed: use correct strncpy() size net: cxgb4: avoid memcpy beyond end of source buffer cls_u32: add missing RCU annotation. r8152: set rx mode early when linking on r8152: fix wrong checksum status for received IPv4 packets nfp: fix TLV offset calculation net: pxa168_eth: add netconsole support net: igmp: add a missing rcu locking section ibmvnic: fix firmware version when no firmware level has been provided by the VIOS server vmxnet3: remove redundant initialization of pointer 'rq' lan78xx: remove redundant initialization of pointer 'phydev' net: jme: remove unused initialization of 'rxdesc' rtnetlink: remove check for IFLA_IF_NETNSID rocker: fix possible null pointer dereference in rocker_router_fib_event_work inet: Avoid unitialized variable warning in inet_unhash() net: bridge: Fix uninitialized error in br_fdb_sync_static() openvswitch: Remove padding from packet before L3+ conntrack processing ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Linus Torvalds authored
Pull GFS2 fixes from Bob Peterson: "Andreas Gruenbacher wrote two additional patches that we would like merged in this time. Both are regressions: - fix another kernel build dependency problem - fix a performance regression in glock dumps" * tag 'gfs2-4.16.fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: Glock dump performance regression fix gfs2: Fix the crc32c dependency
-
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds authored
Pull second set of SCSI updates from James Bottomley: "This is a set of three patches that depended on mq and zone changes in the block tree (now upstream)" * tag 'scsi-postmerge' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: sd: Remove zone write locking scsi: sd_zbc: Initialize device request queue zoned data scsi: scsi-mq-debugfs: Show more information
-
Linus Torvalds authored
Merge tag 'linux-kselftest-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull kselftest updates from Shuah Khan: "This update to Kselftest consists of fixes, cleanups, and SPDX license additions" * tag 'linux-kselftest-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests: vm: update .gitignore with missing generated file selftests/x86: Add <test_name>{,_32,_64} targets selftests: Fix loss of test output in run_kselftests.sh selftest: ftrace: Fix to add 256 kprobe events correctly selftest: ftrace: Fix to pick text symbols for kprobes selftests: media_tests: Add SPDX license identifier selftests: kselftest.h: Add SPDX license identifier selftests: kselftest_install.sh: Add SPDX license identifier selftests: gen_kselftest_tar.h: Add SPDX license identifier selftests: media_tests: Fix Makefile 'clean' target warning tools/testing: Fix trailing semicolon kselftest: fix OOM in memory compaction test selftests: seccomp: fix compile error seccomp_bpf
-
Linus Torvalds authored
When pulling the recent pinctrl merge, I was surprised by how a pinctrl-only pull request ended up rebuilding basically the whole kernel. The reason for that ended up being that <linux/device.h> included <linux/pinctrl/devinfo.h>, so any change to that file ended up causing pretty much every driver out there to be rebuilt. The reason for that was because 'struct device' has this in it: #ifdef CONFIG_PINCTRL struct dev_pin_info *pins; #endif but we already avoid header includes for these kinds of things in that header file, preferring to just use a forward-declaration of the structure instead. Exactly to avoid this kind of header dependency. Since some drivers seem to expect that <linux/pinctrl/devinfo.h> header to come in automatically, move the include to <linux/pinctrl/pinctrl.h> instead. It might be better to just make the includes more targeted, but I'm not going to review every driver. It would definitely be good to have a tool for finding and minimizing header dependencies automatically - or at least help with them. Right now we almost certainly end up having way too many of these things, and it's hard to test every single configuration. FWIW, you can get a sense of the "hotness" of a header file with something like this after doing a full build: find . -name '.*.o.cmd' -print0 | xargs -0 tail --lines=+2 | grep -v 'wildcard ' | tr ' \\' '\n' | sort | uniq -c | sort -n | less -S which isn't exact (there are other things in those '*.o.cmd' than just the dependencies, and the "--lines=+2" only removes the header), but might be a useful approximation. With this patch, <linux/pinctrl/devinfo.h> drops to "only" having 833 users in the current x86-64 allmodconfig. In contrast, <linux/device.h> has 14857 build files including it directly or indirectly. Of course, the headers that absolutely _everybody_ includes (things like <linux/types.h> etc) get a score of 23000+. 
Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
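The forward-declaration technique mentioned above can be illustrated with a minimal sketch. `struct device_sketch` and `device_has_pins()` are hypothetical, heavily simplified stand-ins rather than the real `struct device`, and the `#ifdef CONFIG_PINCTRL` guard is left out for brevity. Because the structure only stores a pointer, the compiler never needs the layout of `struct dev_pin_info`, so the header does not have to include the file that defines it.

```c
#include <stddef.h>

/* Forward declaration: an incomplete type. No #include of the header that
 * defines it is needed, because only a pointer to it is stored below. */
struct dev_pin_info;

/* Hypothetical, heavily simplified stand-in for 'struct device'. */
struct device_sketch {
	const char *name;
	struct dev_pin_info *pins;	/* pointer to an incomplete type: allowed */
};

/* Code that only tests or passes the pointer also compiles without the
 * full definition; only files that dereference 'pins' need the header. */
static int device_has_pins(const struct device_sketch *dev)
{
	return dev->pins != NULL;
}
```

Only the translation units that actually look inside `struct dev_pin_info` would then need the defining header, which is what keeps edits to that header from rebuilding everything that includes the device header.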
-
Ard Biesheuvel authored
Currently, when booting a kernel with DMI support on a platform that has no DMI tables, the following output is emitted into the kernel log: [ 0.128818] DMI not present or invalid. ... [ 1.306659] dmi: Firmware registration failed. ... [ 2.908681] dmi-sysfs: dmi entry is absent. The first one is a pr_info(), but the subsequent ones are pr_err()s that complain about a condition that is not really an error to begin with. So let's clean this up, and give up silently if dmi_available is not set. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Martin Hundebøll <mnhu@prevas.dk> Signed-off-by: Jean Delvare <jdelvare@suse.de>
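The "give up silently" behaviour can be sketched as an early return guarded by the availability flag. Apart from the `dmi_available` flag named in the message, everything here (`dmi_register_sketch()`, the `registered` flag) is hypothetical, and returning 0 rather than an error code is an assumption, not necessarily what the patch does.

```c
#include <stdbool.h>

/* Set by the table scan when valid DMI data was found. */
static bool dmi_available;
/* Hypothetical: records whether registration work was actually done. */
static bool registered;

/* Hypothetical init step: with no DMI tables there is nothing to
 * register, so return quietly instead of logging an error. */
static int dmi_register_sketch(void)
{
	if (!dmi_available)
		return 0;	/* missing DMI data is not an error: stay silent */

	registered = true;	/* stand-in for the real registration work */
	return 0;
}
```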
-
Jean Delvare authored
The handling of empty DMI strings looks quite broken to me: * Strings from 1 to 7 spaces are not considered empty. * True empty DMI strings (string index set to 0) are not considered empty, and result in allocating a 0-char string. * Strings with invalid index also result in allocating a 0-char string. * Strings starting with 8 spaces are all considered empty, even if non-space characters follow (sounds like a weird thing to do, but I have actually seen occurrences of this in DMI tables before.) * Strings which are considered empty are reported as 8 spaces, instead of being actually empty. Some of these issues are the result of an off-by-one error in memcmp; the rest is incorrect by design. So let's get it square: missing strings and strings made of only spaces, regardless of their length, should be treated as empty and no memory should be allocated for them. All other strings are non-empty and should be allocated. Signed-off-by: Jean Delvare <jdelvare@suse.de> Fixes: 79da4721 ("x86: fix DMI out of memory problems") Cc: Parag Warudkar <parag.warudkar@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de>
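The rule the message settles on (index 0, an invalid index, or a string made only of spaces, of any length, counts as empty; everything else is kept) can be sketched in plain C. `dmi_is_blank()` and `dmi_pick_string()` are hypothetical helpers, the string table is modelled as a simple array, and the actual patch may be structured differently.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: non-zero if s consists only of spaces,
 * including the zero-length string, regardless of its length. */
static int dmi_is_blank(const char *s)
{
	return s[strspn(s, " ")] == '\0';
}

/* Hypothetical lookup: string indexes are 1-based and 0 means "no string".
 * Returns NULL (treated as empty, nothing to allocate) for index 0, an
 * out-of-range index, or an all-space string; otherwise the string. */
static const char *dmi_pick_string(const char *const *strings, int count,
				   int index)
{
	if (index <= 0 || index > count)
		return NULL;
	if (dmi_is_blank(strings[index - 1]))
		return NULL;
	return strings[index - 1];
}
```

Scanning with `strspn()` treats every length uniformly, so a string such as eight spaces followed by text is kept, while one to seven spaces is treated as empty, matching the rule stated above.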
-
Jean Delvare authored
I don't think it makes sense to check for a possible bad initialization order at run time on every system when it is all decided at build time. A more efficient way to make sure developers do not introduce new calls to dmi_check_system() too early in the initialization sequence is to simply document the expected call order. That way, developers have a chance to get it right immediately, without having to test-boot their kernel, wonder why it does not work, and parse the kernel logs for a warning message. And we get rid of the run-time performance penalty as a nice side effect. Signed-off-by: Jean Delvare <jdelvare@suse.de> Cc: Ingo Molnar <mingo@kernel.org>
-
Jean Delvare authored
Function dmi_matches can be made a bit faster: * The documented purpose of dmi_initialized is to catch calls to dmi_check_system() made too early. I'm not fully convinced it justifies slowing down the initialization of all systems out there, but at least the check should not have been moved from dmi_check_system() to dmi_matches(). dmi_matches() is being called for every entry of the table passed to dmi_check_system(), causing the same redundant check to be performed again and again. So move it back to dmi_check_system(), reverting this specific portion of commit d7b1956f ("DMI: Introduce dmi_first_match to make the interface more flexible"). * Don't check for the exact_match flag again when we already know its value. Signed-off-by: Jean Delvare <jdelvare@suse.de> Fixes: d7b1956f ("DMI: Introduce dmi_first_match to make the interface more flexible") Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@intel.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Jeff Garzik <jgarzik@redhat.com>
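Both optimizations (testing the initialization flag once per table rather than once per entry, and branching on `exact_match` a single time) can be shown with a minimal sketch. The types and names (`struct match_rule`, `tables_ready`, `rule_matches()`, `check_system()`) are hypothetical simplifications, not the real `struct dmi_system_id`, `dmi_matches()` or `dmi_check_system()`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a DMI match entry. */
struct match_rule {
	const char *substr;
	bool exact_match;
};

/* Hypothetical analogue of the dmi_initialized flag. */
static bool tables_ready;

/* Per-entry test: exact_match is examined once, then one comparison runs. */
static bool rule_matches(const struct match_rule *r, const char *value)
{
	if (r->exact_match)
		return strcmp(value, r->substr) == 0;
	return strstr(value, r->substr) != NULL;
}

/* Table walk: the "called too early" check is hoisted out of the loop,
 * so it runs once per call instead of once per entry. */
static int check_system(const struct match_rule *rules, size_t n,
			const char *value)
{
	int count = 0;

	if (!tables_ready) {
		fprintf(stderr, "check_system() called too early\n");
		return 0;
	}
	for (size_t i = 0; i < n; i++)
		if (rule_matches(&rules[i], value))
			count++;
	return count;
}
```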
-
Alexei Starovoitov authored
Eric Leblond says: ==================== Here is an updated v8 version: - add if_link.h in uapi and remove the definition - fix a commit message - remove uapi from an include ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Eric Leblond authored
Use bpf_set_link_xdp_fd instead of set_link_xdp_fd to remove some code duplication and benefit from netlink extended ack (extack) error messages. Signed-off-by: Eric Leblond <eric@regit.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-