1. 18 Dec, 2017 1 commit
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · 59436c9e
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf-next 2017-12-18
      
      The following pull-request contains BPF updates for your *net-next* tree.
      
      The main changes are:
      
      1) Allow arbitrary function calls from one BPF function to another BPF function.
         Until now, __always_inline had to be used for all functions in BPF C
         programs, unnecessarily causing LLVM to inflate code size. Handle this
         more naturally with support for BPF to BPF calls so that this
         __always_inline restriction can be overcome. As a result, this allows
         for better optimized code and finally makes it possible to introduce core
         BPF libraries in the future that can be reused across different projects.
         x86 and arm64 JIT support was added as well, from Alexei.
      
      2) Add infrastructure for tagging functions as error injectable and allow for
         BPF to return arbitrary error values when BPF is attached via kprobes on
         those. This way of injecting errors generically eases testing and debugging
         without having to recompile or restart the kernel. Functions opt in to
         this facility by being tagged with BPF_ALLOW_ERROR_INJECTION(), from Josef.
      
      3) For BPF offload via nfp JIT, add support for bpf_xdp_adjust_head() helper
         call for XDP programs. The first part of this work adds handling of BPF
         capabilities included in the firmware, and the later patches add support
         to the nfp verifier and JIT as well as some small optimizations,
         from Jakub.
      
      4) The bpftool now also gets support for basic cgroup BPF operations such
         as attaching, detaching and listing current BPF programs. As a requirement
         for the attach part, bpftool can now also load object files through
         'bpftool prog load'. This reuses libbpf which we have in the kernel tree
         as well. bpftool-cgroup man page is added along with it, from Roman.
      
      5) Commit e87c6bc3 ("bpf: permit multiple bpf attachments for a single
         perf event") previously added support for attaching multiple BPF programs
         to a single perf event. Since they are configured through perf's ioctl()
         interface, this work extends that interface with a PERF_EVENT_IOC_QUERY_BPF
         command that returns an array of the ids of one or more BPF progs that
         are currently attached, from Yonghong.
      
      6) Various minor fixes and cleanups to bpftool's Makefile as well as
         new 'uninstall' and 'doc-uninstall' targets for removing bpftool
         itself or previously installed documentation related to it, from Quentin.
      
      7) Add CONFIG_CGROUP_BPF=y to the BPF kernel selftest config file which is
         required for the test_dev_cgroup test case to run, from Naresh.
      
      8) Fix reporting of XDP prog_flags for nfp driver, from Jakub.
      
      9) Fix libbpf's exit code from the Makefile when libelf is not found on
         the system, also from Jakub.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 17 Dec, 2017 18 commits
    • trace: reenable preemption if we modify the ip · 46df3d20
      Josef Bacik authored
      Things got moved around between the original bpf_override_return patches
      and the final version, and now the ftrace kprobe dispatcher assumes that
      if you modified the ip, you also re-enabled preemption. Add a comment
      about this and re-enable preemption; this fixes the lockdep splat that
      happened when using this feature.
      
      Fixes: 9802d865 ("bpf: add a bpf_override_function helper")
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • nfp: set flags in the correct member of netdev_bpf · 4a29c0db
      Jakub Kicinski authored
      netdev_bpf.flags is the input member for installing the program.
      netdev_bpf.prog_flags is the output member for querying.  Set
      the correct one on query.
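      The input/output split can be sketched with a simplified, hypothetical
      stand-in for struct netdev_bpf. The struct, enum and driver state below
      are illustrative only and do not match the kernel's definitions; the
      point is that a query handler writes the output member (prog_flags),
      never the input member (flags).

```c
#include <stdint.h>

/* Hypothetical command set; not the kernel's enum. */
enum bpf_cmd_sketch { SETUP_PROG, QUERY_PROG };

/* Simplified, hypothetical stand-in for struct netdev_bpf. */
struct netdev_bpf_sketch {
	enum bpf_cmd_sketch command;
	uint32_t flags;      /* input: flags the program is installed with */
	uint32_t prog_flags; /* output: flags reported back on query */
};

/* Hypothetical driver state remembered from the last install. */
struct drv_state {
	uint32_t xdp_flags;
};

static void drv_handle_bpf(struct drv_state *st, struct netdev_bpf_sketch *bpf)
{
	switch (bpf->command) {
	case SETUP_PROG:
		st->xdp_flags = bpf->flags;      /* read the input member */
		break;
	case QUERY_PROG:
		bpf->prog_flags = st->xdp_flags; /* write the output member, not .flags */
		break;
	}
}
```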
      
      Fixes: 92f0292b ("net: xdp: report flags program was installed with on query")
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • libbpf: fix Makefile exit code if libelf not found · 21567ede
      Jakub Kicinski authored
      /bin/sh's exit does not recognize -1 as a number, leading to
      the following error message:
      
      /bin/sh: 1: exit: Illegal number: -1
      
      Use 1 as the exit code.
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • Merge branch 'bpf-to-bpf-function-calls' · ef9fde06
      Daniel Borkmann authored
      Alexei Starovoitov says:
      
      ====================
      First of all, a huge thank you to Daniel, John, Jakub, Edward and others who
      reviewed multiple iterations of this patch set over many months,
      and to Dave and others who gave critical feedback during netconf/netdev.
      
      The patch set is solid enough and we thought through numerous corner cases,
      but it's not the end. More follow-ups with code reorganization and features will follow.
      
      TLDR: Allow arbitrary function calls from one bpf function to another bpf function.
      
      Since the beginning of bpf, all bpf programs were represented as a single function
      and program authors were forced to use always_inline for all functions
      in their C code. That was causing llvm to unnecessarily inflate the code size
      and forcing developers to move code to header files with little code reuse.
      
      With a bit of additional complexity, teach the verifier to recognize
      arbitrary function calls from one bpf function to another as long as
      all of the functions are presented to the verifier as a single bpf program.
      Extended program layout:
      ..
      r1 = ..    // arg1
      r2 = ..    // arg2
      call pc+1  // function call pc-relative
      exit
      .. = r1    // access arg1
      .. = r2    // access arg2
      ..
      call pc+20 // second level of function call
      ...
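      A minimal sketch of how a pc-relative call in the layout above resolves
      to its callee. It assumes a locally redefined copy of the uapi struct
      bpf_insn, the uapi constants BPF_JMP, BPF_CALL and BPF_PSEUDO_CALL, and
      the convention that a pseudo call at index i with immediate imm targets
      instruction i + imm + 1. Under that convention, 'call pc+1' skips the
      following 'exit' and lands on the callee's first instruction. The helper
      names are hypothetical, not kernel functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local, simplified copy of the uapi struct bpf_insn layout. */
struct bpf_insn {
	uint8_t code;      /* opcode */
	uint8_t dst_reg:4; /* destination register */
	uint8_t src_reg:4; /* source register */
	int16_t off;       /* signed offset */
	int32_t imm;       /* signed immediate constant */
};

#define BPF_JMP         0x05
#define BPF_CALL        0x80
#define BPF_PSEUDO_CALL 1 /* src_reg value marking a bpf-to-bpf call */

/* True for a bpf-to-bpf call, false for a helper call or any other insn. */
static bool is_pseudo_call(const struct bpf_insn *insn)
{
	return insn->code == (BPF_JMP | BPF_CALL) &&
	       insn->src_reg == BPF_PSEUDO_CALL;
}

/* Index of the callee's first insn, or -1 if insn i is not a valid pseudo call. */
static int pseudo_call_target(const struct bpf_insn *insns, int cnt, int i)
{
	if (i < 0 || i >= cnt || !is_pseudo_call(&insns[i]))
		return -1;
	/* pc-relative: the offset counts from the instruction after the call */
	int target = i + insns[i].imm + 1;
	return (target >= 0 && target < cnt) ? target : -1;
}
```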
      
      It allows for better optimized code and finally makes it possible to introduce
      core bpf libraries that can be reused in different projects,
      since programs are no longer limited to a single elf file.
      With function calls, bpf can be compiled into multiple .o files.
      
      This patch is the first step. It detects programs that contain
      multiple functions and checks that calls between them are valid.
      It splits the sequence of bpf instructions (one program) into a set
      of bpf functions that call each other. Only calls to known
      functions are allowed. Since all functions are presented to
      the verifier at once, conceptually it is 'static linking'.
      
      Future plans:
      - introduce BPF_PROG_TYPE_LIBRARY and allow a set of bpf functions
        to be loaded into the kernel that can later be linked to other
        programs with concrete program types. Aka 'dynamic linking'.
      
      - introduce a function pointer type and indirect calls to allow
        bpf functions to call other dynamically loaded bpf functions while
        the caller bpf function is already executing. Aka 'runtime linking'.
        This will be a more generic and more flexible alternative
        to bpf_tail_calls.
      
      FAQ:
      Q: Do the interpreter and JIT changes mean that a new instruction is introduced?
      A: No. The call instruction technically stays the same. Now it can call
         both kernel helpers and other bpf functions.
         The calling convention stays the same as well.
         From the uapi point of view, the call insn got a new 'relocation' BPF_PSEUDO_CALL,
         similar to the BPF_PSEUDO_MAP_FD 'relocation' of the bpf_ldimm64 insn.
      
      Q: What had to change on the LLVM side?
      A: A trivial LLVM patch to allow calls was applied to the upcoming 6.0 release:
         https://reviews.llvm.org/rL318614
         along with a few bugfixes.
         Make sure to build the latest llvm to have bpf_call support.
      
      More details in the patches.
      ====================
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • selftests/bpf: additional bpf_call tests · 28ab173e
      Daniel Borkmann authored
      Add some additional checks for a few more corner cases.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: arm64: add JIT support for multi-function programs · db496944
      Alexei Starovoitov authored
      Similar to x64, add support for bpf-to-bpf calls.
      When a program has calls to in-kernel helpers, the target call offset
      is known at JIT time and the arm64 JIT needs 2 passes.
      With bpf-to-bpf calls, the dynamically allocated function start
      is unknown until all functions of the program are JITed.
      Therefore (just like x64) the arm64 JIT needs one extra pass over
      the program to emit correct call offsets.
      
      Implementation detail:
      Avoid being too clever in 64-bit immediate moves and
      always use 4 instructions (instead of 3-4 depending on the address)
      to make sure only one extra pass is needed.
      If some future optimization made it worthwhile to optimize
      'call 64-bit imm' further, the JIT would need to do 4 passes
      over the program instead of 3 as in this patch.
      For a typical bpf program address the mov needs 3 or 4 insns,
      so unconditionally emitting 4 insns to save an extra pass is a
      worthwhile trade-off at this stage of the JIT.
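      The fixed-length move can be sketched as follows. This assumes the
      standard A64 encodings of the 64-bit MOVZ (base 0xD2800000) and MOVK
      (base 0xF2800000) instructions. In those encodings the 16-bit chunk sits
      at bits 5-20, the shift selector hw at bits 21-22, and the destination
      register at bits 0-4. The function names are hypothetical and are not
      the kernel's emit helpers. Because the sketch always emits exactly four
      words, the size of the call sequence no longer depends on the callee
      address.

```c
#include <stdint.h>

/* A64 MOVZ Xd, #imm16, LSL #(16*hw): zero the register, then set one 16-bit chunk. */
static uint32_t a64_movz(unsigned rd, uint16_t imm16, unsigned hw)
{
	return 0xD2800000u | ((uint32_t)(hw & 3) << 21) |
	       ((uint32_t)imm16 << 5) | (rd & 0x1f);
}

/* A64 MOVK Xd, #imm16, LSL #(16*hw): replace one 16-bit chunk, keep the rest. */
static uint32_t a64_movk(unsigned rd, uint16_t imm16, unsigned hw)
{
	return 0xF2800000u | ((uint32_t)(hw & 3) << 21) |
	       ((uint32_t)imm16 << 5) | (rd & 0x1f);
}

/* Always emit exactly 4 insns, so the code size never depends on the address. */
static void emit_mov_imm64_fixed(uint32_t out[4], unsigned rd, uint64_t addr)
{
	out[0] = a64_movz(rd, (uint16_t)(addr & 0xffff), 0);
	for (unsigned hw = 1; hw < 4; hw++)
		out[hw] = a64_movk(rd, (uint16_t)(addr >> (16 * hw)), hw);
}
```

      A variable-length variant would skip zero chunks. That is what makes the
      size address-dependent and would force additional passes.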
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: x64: add JIT support for multi-function programs · 1c2a088a
      Alexei Starovoitov authored
      A typical JIT does several passes over bpf instructions to
      compute the total size and relative offsets of jumps and calls.
      With multiple bpf functions calling each other, all relative calls
      will have invalid offsets initially, therefore we need an additional
      last pass over the program to emit calls with correct offsets.
      For example, in the case of three bpf functions:
      main:
        call foo
        call bpf_map_lookup
        exit
      foo:
        call bar
        exit
      bar:
        exit
      
      We will call bpf_int_jit_compile() independently for main(), foo() and bar().
      The x64 JIT typically does 4-5 passes to converge.
      After these initial passes the image for these 3 functions
      will be good except for call targets, since the start addresses of
      foo() and bar() are unknown when we were JITing main()
      (note that call bpf_map_lookup will be resolved properly
      during the initial passes).
      Once the start addresses of all 3 functions are known, we patch
      call_insn->imm to point to the right functions and call
      bpf_int_jit_compile() again, which needs only one pass.
      Additional safety checks are done to make sure this
      last pass doesn't produce an image that is larger or smaller
      than the previous pass.
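      The final call-patching step can be illustrated with a small sketch. It
      assumes the x86-64 'call rel32' form (opcode 0xE8, 5 bytes long, with
      the displacement measured from the end of the instruction). The
      jited_func record and the helper names are hypothetical; this is not
      the kernel's JIT code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record of one function after the converged initial passes. */
struct jited_func {
	uint64_t image_start; /* address of this function's separately allocated image */
	uint32_t image_len;   /* image size found by the initial passes */
};

#define X64_CALL_LEN 5 /* 0xE8 opcode byte + 32-bit displacement */

/* Displacement for a call at call_site to target; false if it does not fit in rel32. */
static bool x64_call_rel32(uint64_t call_site, uint64_t target, int32_t *rel)
{
	int64_t diff = (int64_t)(target - (call_site + X64_CALL_LEN));

	if (diff < INT32_MIN || diff > INT32_MAX)
		return false;
	*rel = (int32_t)diff;
	return true;
}

/* Final pass: compute the call displacement and insist the image size is unchanged. */
static bool patch_call(const struct jited_func *caller, uint32_t call_off,
		       const struct jited_func *callee, uint32_t final_len,
		       int32_t *rel)
{
	if (final_len != caller->image_len)
		return false; /* a size change would invalidate offsets computed earlier */
	return x64_call_rel32(caller->image_start + call_off, callee->image_start, rel);
}
```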
      
      When constant blinding is on, it is applied to all functions
      at the first pass, since doing it once again at the last
      pass can change the size of the JITed code.
      
      Tested on x64 and arm64 hw with JIT on/off, blinding on/off.
      x64 JITs bpf-to-bpf calls correctly while arm64 falls back to the interpreter.
      All other JITs that support normal BPF_CALL will behave the same way,
      since a bpf-to-bpf call is equivalent to a bpf-to-kernel call from the
      JITs' point of view.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: fix net.core.bpf_jit_enable race · 60b58afc
      Alexei Starovoitov authored
      The global bpf_jit_enable variable is tested multiple times in the JITs,
      blinding and verifier core. A malicious root user can try to toggle
      it while programs are being loaded. This race condition was accounted
      for and there should be no issues, but it's safer to avoid
      this race condition altogether.
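      One common way to close such a race is to read the global once per
      program load. Every later stage then consults that snapshot instead of
      the live variable. Below is a minimal sketch using C11 atomics. The
      names jit_enable_sysctl and struct load_ctx, and the helper names, are
      hypothetical stand-ins, not the kernel's identifiers.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for net.core.bpf_jit_enable; root may flip it at any time. */
static atomic_int jit_enable_sysctl = 1;

/* Hypothetical per-load context that carries a single snapshot of the setting. */
struct load_ctx {
	bool jit_requested;
};

/* Read the global exactly once, at the start of the load. */
static void load_ctx_init(struct load_ctx *ctx)
{
	ctx->jit_requested = atomic_load(&jit_enable_sysctl) != 0;
}

/* Later stages consult the snapshot, never the live global. */
static bool verifier_prepares_for_jit(const struct load_ctx *ctx)
{
	return ctx->jit_requested;
}

static bool blinding_enabled(const struct load_ctx *ctx)
{
	return ctx->jit_requested;
}
```

      Even if the sysctl is flipped halfway through a load, the verifier and
      blinding stages still agree, because they read the same snapshot.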
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: add support for bpf_call to interpreter · 1ea47e01
      Alexei Starovoitov authored
      Though bpf_call is still the same call instruction and the
      calling convention for 'bpf to bpf' and 'bpf to helper' is the same,
      the interpreter has to operate on 'struct bpf_insn *'.
      To distinguish these two cases, add a kernel-internal opcode and
      mark call insns with it.
      This opcode is seen by the interpreter only. JITs will never see it.
      Also add a tiny bit of debug code to aid interpreter debugging.
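      A minimal sketch of the marking step. It assumes a locally redefined
      struct bpf_insn and the uapi constants BPF_JMP, BPF_CALL and
      BPF_PSEUDO_CALL. The internal opcode value INTERNAL_OP_CALL_BPF is
      hypothetical and picked only for illustration; the commit message does
      not say which value the kernel uses.

```c
#include <stddef.h>
#include <stdint.h>

/* Local, simplified copy of the uapi struct bpf_insn layout. */
struct bpf_insn {
	uint8_t code;
	uint8_t dst_reg:4;
	uint8_t src_reg:4;
	int16_t off;
	int32_t imm;
};

#define BPF_JMP         0x05
#define BPF_CALL        0x80
#define BPF_PSEUDO_CALL 1
/* Hypothetical kernel-internal opcode, chosen for illustration only. */
#define INTERNAL_OP_CALL_BPF (BPF_JMP | 0xe0)

/* Re-mark bpf-to-bpf calls so the interpreter can tell them apart from helper calls. */
static size_t mark_bpf_calls_for_interpreter(struct bpf_insn *insns, size_t cnt)
{
	size_t marked = 0;

	for (size_t i = 0; i < cnt; i++) {
		if (insns[i].code == (BPF_JMP | BPF_CALL) &&
		    insns[i].src_reg == BPF_PSEUDO_CALL) {
			insns[i].code = INTERNAL_OP_CALL_BPF; /* interpreter-only */
			marked++;
		}
	}
	return marked;
}

enum call_kind { NOT_A_CALL, CALL_HELPER, CALL_BPF_FUNC };

/* Interpreter-side dispatch: which kind of call does this insn represent? */
static enum call_kind classify_call(const struct bpf_insn *insn)
{
	if (insn->code == INTERNAL_OP_CALL_BPF)
		return CALL_BPF_FUNC;
	if (insn->code == (BPF_JMP | BPF_CALL))
		return CALL_HELPER;
	return NOT_A_CALL;
}
```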
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • selftests/bpf: add xdp noinline test · b0b04fc4
      Alexei Starovoitov authored
      Add a large semi-artificial XDP test with 18 functions to stress-test
      the bpf call verification logic.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • selftests/bpf: add bpf_call test · 3bc35c63
      Alexei Starovoitov authored
      Strip always_inline from test_l4lb.c and compile it with -fno-inline
      to let the verifier go through 11 functions with various function arguments
      and return values.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • libbpf: add support for bpf_call · 48cca7e4
      Alexei Starovoitov authored
      - recognize the relocation emitted by llvm
      - since all regular functions will be kept in the .text section and llvm
        takes care of pc-relative offsets in bpf_call instructions,
        simply copy all of .text to the relevant program section while adjusting
        bpf_call instructions in the program section to point to the newly copied
        body of instructions from .text
      - do so for all programs in the elf file
      - set all program types to the one passed to bpf_prog_load()
      
      Note: for elf files with multiple programs that use different
      functions in the .text section, we need 'linker'-style logic.
      This work is still TBD.
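      The copy-and-adjust step for a single program can be sketched as below.
      It assumes a hypothetical struct call_reloc. For each call in the
      program section, that struct records the index of the call and the
      instruction offset of its callee within .text. The real libbpf
      relocation records differ. Calls between functions inside .text need no
      adjustment here, because .text is copied as one contiguous block, so
      the pc-relative offsets llvm emitted stay valid.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Local, simplified copy of the uapi struct bpf_insn layout. */
struct bpf_insn {
	uint8_t code;
	uint8_t dst_reg:4;
	uint8_t src_reg:4;
	int16_t off;
	int32_t imm;
};

#define BPF_PSEUDO_CALL 1

/* Hypothetical relocation: insn insn_idx of the program calls .text insn text_off. */
struct call_reloc {
	size_t insn_idx;
	size_t text_off;
};

/* Returns a malloc'ed copy of prog followed by all of .text, with every
 * relocated call pointing into the copy; NULL on allocation failure. */
static struct bpf_insn *append_text(const struct bpf_insn *prog, size_t prog_cnt,
				    const struct bpf_insn *text, size_t text_cnt,
				    const struct call_reloc *relocs, size_t nrelocs)
{
	struct bpf_insn *out = malloc((prog_cnt + text_cnt) * sizeof(*out));

	if (!out)
		return NULL;
	memcpy(out, prog, prog_cnt * sizeof(*out));
	memcpy(out + prog_cnt, text, text_cnt * sizeof(*out));
	for (size_t r = 0; r < nrelocs; r++) {
		size_t i = relocs[r].insn_idx;
		size_t target = prog_cnt + relocs[r].text_off; /* start of the copied callee */

		out[i].src_reg = BPF_PSEUDO_CALL;
		/* pc-relative: counted from the insn after the call */
		out[i].imm = (int32_t)((int64_t)target - (int64_t)i - 1);
	}
	return out;
}
```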
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • selftests/bpf: add tests for stack_zero tracking · d98588ce
      Alexei Starovoitov authored
      Adjust two tests, since the verifier got smarter,
      and add a new one to test the stack_zero logic.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: teach verifier to recognize zero initialized stack · cc2b14d5
      Alexei Starovoitov authored
      Programs with function calls often pass various
      pointers via the stack. When all calls are inlined, llvm
      flattens stack accesses and optimizes away extra branches.
      When functions are not inlined, it becomes the job of
      the verifier to recognize zero-initialized stack to avoid
      exploring paths that the program will not take.
      The following program would fail otherwise:
      
      ptr = &buffer_on_stack;
      *ptr = 0;
      ...
      func_call(.., ptr, ...) {
        if (..)
          *ptr = bpf_map_lookup();
      }
      ...
      if (*ptr != 0) {
        // Access (*ptr)->field is valid.
        // Without stack_zero tracking such (*ptr)->field access
        // will be rejected
      }
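      Below is a minimal sketch of byte-granular stack_zero tracking. It is
      loosely modeled on the idea above rather than on the verifier's actual
      data structures. Every byte of a hypothetical 512-byte stack model is
      tagged as invalid, misc or zero. A store of a known-zero value tags its
      bytes as zero, and a load yields a known-zero scalar only if all the
      bytes it reads are tagged zero. The struct stack_model type and the
      helper names are hypothetical.

```c
#include <stdbool.h>

#define STACK_SIZE 512

/* Simplified per-byte slot types. */
enum slot_type { SLOT_INVALID, SLOT_MISC, SLOT_ZERO };

/* Hypothetical stack model: byte[k] describes the byte at fp-1-k. */
struct stack_model {
	enum slot_type byte[STACK_SIZE];
};

static void stack_model_init(struct stack_model *s)
{
	for (int k = 0; k < STACK_SIZE; k++)
		s->byte[k] = SLOT_INVALID;
}

/* Access [fp+off, fp+off+size) must lie entirely below fp and inside the stack. */
static bool access_ok(int off, int size)
{
	return size > 0 && off >= -STACK_SIZE && off + size <= 0;
}

/* Record a store; value_is_zero says whether the stored value is known to be 0. */
static bool stack_store(struct stack_model *s, int off, int size, bool value_is_zero)
{
	if (!access_ok(off, size))
		return false;
	for (int k = 0; k < size; k++)
		s->byte[-(off + k) - 1] = value_is_zero ? SLOT_ZERO : SLOT_MISC;
	return true;
}

/* A load is a known zero only if every byte it reads was stored as zero. */
static bool stack_load_is_zero(const struct stack_model *s, int off, int size)
{
	if (!access_ok(off, size))
		return false;
	for (int k = 0; k < size; k++)
		if (s->byte[-(off + k) - 1] != SLOT_ZERO)
			return false;
	return true;
}
```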
      
      Since stack slots are no longer uniformly invalid | spill | misc,
      add liveness marking to all slots, but do it in 8-byte chunks.
      So if nothing was read or written in the [fp-16, fp-9] range,
      it will be marked as LIVE_NONE.
      If any byte in that range was read, it will be marked LIVE_READ
      and the stacksafe() check will perform byte-by-byte verification.
      If all bytes in the range were written, the slot will be
      marked as LIVE_WRITTEN.
      This significantly speeds up state equality comparison
      and reduces the total number of states processed.
      
                          before   after
      bpf_lb-DLB_L3.o       2051    2003
      bpf_lb-DLB_L4.o       3287    3164
      bpf_lb-DUNKNOWN.o     1080    1080
      bpf_lxc-DDROP_ALL.o   24980   12361
      bpf_lxc-DUNKNOWN.o    34308   16605
      bpf_netdev.o          15404   10962
      bpf_overlay.o         7191    6679
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • selftests/bpf: add verifier tests for bpf_call · a7ff3eca
      Alexei Starovoitov authored
      Add extensive set of tests for bpf_call verification logic:
      
      calls: basic sanity
      calls: using r0 returned by callee
      calls: callee is using r1
      calls: callee using args1
      calls: callee using wrong args2
      calls: callee using two args
      calls: callee changing pkt pointers
      calls: two calls with args
      calls: two calls with bad jump
      calls: recursive call. test1
      calls: recursive call. test2
      calls: unreachable code
      calls: invalid call
      calls: jumping across function bodies. test1
      calls: jumping across function bodies. test2
      calls: call without exit
      calls: call into middle of ld_imm64
      calls: call into middle of other call
      calls: two calls with bad fallthrough
      calls: two calls with stack read
      calls: two calls with stack write
      calls: spill into caller stack frame
      calls: two calls with stack write and void return
      calls: ambiguous return value
      calls: two calls that return map_value
      calls: two calls that return map_value with bool condition
      calls: two calls that return map_value with incorrect bool check
      calls: two calls that receive map_value via arg=ptr_stack_of_caller. test1
      calls: two calls that receive map_value via arg=ptr_stack_of_caller. test2
      calls: two jumps that receive map_value via arg=ptr_stack_of_jumper. test3
      calls: two calls that receive map_value_ptr_or_null via arg. test1
      calls: two calls that receive map_value_ptr_or_null via arg. test2
      calls: pkt_ptr spill into caller stack
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: introduce function calls (verification) · f4d7e40a
      Alexei Starovoitov authored
      Allow arbitrary function calls from one bpf function to another bpf function.
      
      To recognize such a set of bpf functions, the verifier:
      1. runs control flow analysis to detect function boundaries
      2. proceeds with verification of all functions starting from the main (root) function.
         It recognizes that the stack of the caller can be accessed by the callee
         (if the caller passed a pointer to its stack to the callee) and the callee
         can store map_value and other pointers into the stack of the caller.
      3. keeps track of the stack_depth of each function to make sure that the total
         stack depth is still less than 512 bytes
      4. disallows pointers to the callee stack to be stored into the caller stack,
         since they will be invalid as soon as the callee returns
      5. to reuse all of the existing state_pruning logic, treats each function call
         as an independent call from the verifier's point of view.
         The verifier pretends to inline every function call it sees.
         It stores the callsite instruction index as part of the state to make sure
         that two calls to the same callee from two different places in the caller
         will be different from the state pruning point of view
      6. adds more safety checks to the liveness analysis
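      The stack_depth bookkeeping in item 3 amounts to finding the deepest
      path through the call graph. A minimal sketch follows, under these
      assumptions:
      - a hypothetical adjacency-matrix struct call_graph with a known frame
        size per function
      - function 0 is main()
      - an inclusive 512-byte limit (the text above says "less than 512
        bytes", so the exact boundary is an assumption here)
      Recursion is treated as an error.

```c
#include <stdbool.h>

#define MAX_FUNCS   16
#define STACK_LIMIT 512 /* assumed inclusive limit, see lead-in */

/* Hypothetical, simplified call graph: calls[f][g] is true if f calls g. */
struct call_graph {
	int nfuncs;
	int frame_size[MAX_FUNCS];
	bool calls[MAX_FUNCS][MAX_FUNCS];
};

/* Deepest cumulative stack usage starting at f, or -1 on recursion. */
static int max_depth_from(const struct call_graph *g, int f, bool on_path[MAX_FUNCS])
{
	int deepest_callee = 0;

	if (on_path[f])
		return -1; /* f is already on the current call chain: recursion */
	on_path[f] = true;
	for (int c = 0; c < g->nfuncs; c++) {
		if (!g->calls[f][c])
			continue;
		int d = max_depth_from(g, c, on_path);
		if (d < 0) {
			on_path[f] = false;
			return -1;
		}
		if (d > deepest_callee)
			deepest_callee = d;
	}
	on_path[f] = false;
	return g->frame_size[f] + deepest_callee;
}

static bool stack_depth_ok(const struct call_graph *g)
{
	bool on_path[MAX_FUNCS] = { false };
	int depth = max_depth_from(g, 0, on_path); /* function 0 is main() */

	return depth >= 0 && depth <= STACK_LIMIT;
}
```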
      
      Implementation details:
      . struct bpf_verifier_state now consists of all stack frames that
        led to this function
      . struct bpf_func_state represents one stack frame. It consists of
        the registers in the given frame and its stack
      . propagate_liveness() logic had a premature optimization where
        mark_reg_read() and mark_stack_slot_read() were manually inlined
        with a loop iterating over parents for each register or stack slot.
        Undo this optimization to reuse the more complex mark_*_read() logic
      . skip_callee() logic is not necessary from a safety point of view,
        but without it mark_*_read() markings become too conservative,
        since after returning from the function call a read of r6-r9
        will incorrectly propagate the read marks into the callee, causing
        inefficient pruning later
      . mark_*_read() logic is now aware of control flow, which makes it
        more complex. In the future the plan is to rewrite liveness
        to be hierarchical, so that liveness can be done within a
        basic block only and control flow will be responsible for
        propagating liveness information along the cfg and between calls.
      . tail_calls and ld_abs insns are not allowed in the programs with
        bpf-to-bpf calls
      . returning stack pointers to the caller or storing them into stack
        frame of the caller is not allowed
      
      Testing:
      . no difference in cilium processed_insn numbers
      . a large number of tests follow in the next patches
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: introduce function calls (function boundaries) · cc8b0b92
      Alexei Starovoitov authored
      Allow arbitrary function calls from one bpf function to another bpf function.
      
      Since the beginning of bpf, all bpf programs were represented as a single function
      and program authors were forced to use always_inline for all functions
      in their C code. That was causing llvm to unnecessarily inflate the code size
      and forcing developers to move code to header files with little code reuse.
      
      With a bit of additional complexity, teach the verifier to recognize
      arbitrary function calls from one bpf function to another as long as
      all of the functions are presented to the verifier as a single bpf program.
      New program layout:
      r6 = r1    // some code
      ..
      r1 = ..    // arg1
      r2 = ..    // arg2
      call pc+1  // function call pc-relative
      exit
      .. = r1    // access arg1
      .. = r2    // access arg2
      ..
      call pc+20 // second level of function call
      ...
      
      It allows for better optimized code and finally makes it possible to introduce
      core bpf libraries that can be reused in different projects,
      since programs are no longer limited to a single elf file.
      With function calls, bpf can be compiled into multiple .o files.
      
      This patch is the first step. It detects programs that contain
      multiple functions and checks that calls between them are valid.
      It splits the sequence of bpf instructions (one program) into a set
      of bpf functions that call each other. Only calls to known
      functions are allowed. In the future the verifier may allow
      calls to unresolved functions and will do dynamic linking.
      This logic supports statically linked bpf functions only.
      
      Such function boundary detection could have been done as part of
      control flow graph building in check_cfg(), but it's cleaner to
      separate function boundary detection and control flow checks within
      a subprogram (function) into logically independent steps.
      Follow-up patches may split check_cfg() further, but not check_subprogs().
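      A minimal sketch of the boundary-detection step. It works in three
      parts:
      - collect the targets of all pseudo calls as function start indices
      - sort and deduplicate them
      - check that a jump stays inside the subprogram it starts in
      It assumes a locally redefined struct bpf_insn, the uapi constants
      BPF_JMP, BPF_CALL and BPF_PSEUDO_CALL, and the i + imm + 1 target rule.
      The function names are hypothetical, and this is not the kernel's
      check_subprogs().

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Local, simplified copy of the uapi struct bpf_insn layout. */
struct bpf_insn {
	uint8_t code;
	uint8_t dst_reg:4;
	uint8_t src_reg:4;
	int16_t off;
	int32_t imm;
};

#define BPF_JMP         0x05
#define BPF_CALL        0x80
#define BPF_PSEUDO_CALL 1

static int cmp_int(const void *a, const void *b)
{
	int x = *(const int *)a, y = *(const int *)b;

	return (x > y) - (x < y);
}

/* Fill starts[] with the sorted, unique first-insn indices of all functions.
 * Index 0 (main) is always a start. Returns the count, or -1 on a call to an
 * unknown location or if starts[] is too small. */
static int find_subprog_starts(const struct bpf_insn *insns, int cnt,
			       int *starts, int max)
{
	int n = 0, uniq = 0;

	if (max < 1)
		return -1;
	starts[n++] = 0;
	for (int i = 0; i < cnt; i++) {
		if (insns[i].code != (BPF_JMP | BPF_CALL) ||
		    insns[i].src_reg != BPF_PSEUDO_CALL)
			continue;
		int target = i + insns[i].imm + 1;
		if (target < 0 || target >= cnt || n == max)
			return -1;
		starts[n++] = target;
	}
	qsort(starts, (size_t)n, sizeof(*starts), cmp_int);
	for (int k = 0; k < n; k++)
		if (uniq == 0 || starts[uniq - 1] != starts[k])
			starts[uniq++] = starts[k];
	return uniq;
}

/* Which function (index into sorted starts[]) contains instruction idx. */
static int subprog_of(const int *starts, int nstarts, int idx)
{
	int s = 0;

	for (int k = 0; k < nstarts; k++)
		if (starts[k] <= idx)
			s = k;
	return s;
}

/* A jump at insn i with offset off must land in the same function. */
static bool jump_stays_in_subprog(const int *starts, int nstarts, int i, int off)
{
	return subprog_of(starts, nstarts, i) ==
	       subprog_of(starts, nstarts, i + off + 1);
}
```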
      
      Only allow bpf-to-bpf calls for root and for non-hw-offloaded programs.
      These restrictions can be relaxed in the future.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · c30abd5e
      David S. Miller authored
      Three sets of overlapping changes, two in the packet scheduler
      and one in the meson-gxl PHY driver.
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 16 Dec, 2017 4 commits
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma · f3b5ad89
      Linus Torvalds authored
      Pull rdma fixes from Jason Gunthorpe:
       "More fixes from testing done on the rc kernel, including more SELinux
        testing. Looking forward, lockdep found a regression today in ipoib,
        which is still being fixed.
      
        Summary:
      
         - Fix for SELinux on the umad SMI path. Some old hardware does not
           fill the PKey properly exposing another bug in the newer SELinux
           code.
      
         - Check the input port, as this user-supplied value can exceed
           array bounds
      
         - Users are unable to use the hash field support as they want due to
           incorrect checks on the field restrictions, correct that so the
           feature works as intended
      
         - User triggerable oops in the NETLINK_RDMA handler
      
         - cxgb4 driver fix for a bad interaction with CQ flushing in iser
           caused by patches in this merge window, and bad CQ flushing during
           normal close.
      
         - Unbalanced memalloc_noio in ipoib in an error path"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
        IB/ipoib: Restore MM behavior in case of tx_ring allocation failure
        iw_cxgb4: only insert drain cqes if wq is flushed
        iw_cxgb4: only clear the ARMED bit if a notification is needed
        RDMA/netlink: Fix general protection fault
        IB/mlx4: Fix RSS hash fields restrictions
        IB/core: Don't enforce PKey security on SMI MADs
        IB/core: Bound check alternate path port number
    • Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · f25e2295
      Linus Torvalds authored
      Pull i2c fixes from Wolfram Sang:
       "Two bugfixes for the AT24 I2C eeprom driver and some minor corrections
        for I2C bus drivers"
      
      * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        i2c: piix4: Fix port number check on release
        i2c: stm32: Fix copyrights
        i2c-cht-wc: constify platform_device_id
        eeprom: at24: change nvmem stride to 1
        eeprom: at24: fix I2C device selection for runtime PM
    • Merge tag 'nfs-for-4.15-3' of git://git.linux-nfs.org/projects/anna/linux-nfs · d025fbf1
      Linus Torvalds authored
      Pull NFS client fixes from Anna Schumaker:
       "This has two stable bugfixes, one to fix a BUG_ON() when
        nfs_commit_inode() is called with no outstanding commit requests and
        another to fix a race in the SUNRPC receive codepath.
      
        Additionally, there are also fixes for an NFS client deadlock and an
        xprtrdma performance regression.
      
        Summary:
      
        Stable bugfixes:
         - NFS: Avoid a BUG_ON() in nfs_commit_inode() by not waiting for a
           commit in the case that there were no commit requests.
         - SUNRPC: Fix a race in the receive code path
      
        Other fixes:
         - NFS: Fix a deadlock in nfs client initialization
         - xprtrdma: Fix a performance regression for small IOs"
      
      * tag 'nfs-for-4.15-3' of git://git.linux-nfs.org/projects/anna/linux-nfs:
        SUNRPC: Fix a race in the receive code path
        nfs: don't wait on commit in nfs_commit_inode() if there were no commit requests
        xprtrdma: Spread reply processing over more CPUs
        nfs: fix a deadlock in nfs client initialization
    • Revert "mm: replace p??_write with pte_access_permitted in fault + gup paths" · f6f37321
      Linus Torvalds authored
      This reverts commits 5c9d2d5c, c7da82b8, and e7fe7b5c.
      
      We'll probably need to revisit this, but basically we should not
      complicate the get_user_pages_fast() case, and checking the actual page
      table protection key bits will require more care anyway, since the
      protection keys depend on the exact state of the VM in question.
      
      Particularly when doing a "remote" page lookup (ie in somebody else's VM,
      not your own), you need to be much more careful than this was.  Dave
      Hansen says:
      
       "So, the underlying bug here is that we now do a get_user_pages_remote()
        and then go ahead and do the p*_access_permitted() checks against the
        current PKRU. This was introduced recently with the addition of the
        new p??_access_permitted() calls.
      
        We have checks in the VMA path for the "remote" gups and we avoid
        consulting PKRU for them. This got missed in the pkeys selftests
        because I did a ptrace read, but not a *write*. I also didn't
        explicitly test it against something where a COW needed to be done"
      
      It's also not entirely clear that it makes sense to check the protection
      key bits at this level at all.  But one possible eventual solution is to
      make the get_user_pages_fast() case just abort if it sees protection key
      bits set, which makes us fall back to the regular get_user_pages() case,
      which then has a vma and can do the check there if we want to.
      
      We'll see.
      
      Somewhat related to this all: what we _do_ want to do some day is to
      check the PAGE_USER bit - it should obviously always be set for user
      pages, but it would be a good check to have back.  Because we have no
      generic way to test for it, we lost it as part of moving over from the
      architecture-specific x86 GUP implementation to the generic one in
      commit e585513b ("x86/mm/gup: Switch GUP to the generic
      get_user_page_fast() implementation").
      
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: "Jérôme Glisse" <jglisse@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f6f37321
  4. 15 Dec, 2017 17 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 7a3c296a
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Clamp timeouts to INT_MAX in conntrack, from Jay Elliot.
      
       2) Fix broken UAPI for BPF_PROG_TYPE_PERF_EVENT, from Hendrik
          Brueckner.
      
       3) Fix locking in ieee80211_sta_tear_down_BA_sessions, from Johannes
          Berg.
      
       4) Add missing barriers to ptr_ring, from Michael S. Tsirkin.
      
       5) Don't advertise gigabit in sh_eth when not available, from Thomas
          Petazzoni.
      
       6) Check network namespace when delivering to netlink taps, from Kevin
          Cernekee.
      
       7) Kill a race in raw_sendmsg(), from Mohamed Ghannam.
      
       8) Use correct address in TCP md5 lookups when replying to an incoming
          segment, from Christoph Paasch.
      
       9) Add schedule points to BPF map alloc/free, from Eric Dumazet.
      
      10) Don't allow silly mtu values to be used in ipv4/ipv6 multicast, also
          from Eric Dumazet.
      
      11) Fix SKB leak in tipc, from Jon Maloy.
      
      12) Disable MAC learning on OVS ports of mlxsw, from Yuval Mintz.
      
      13) SKB leak fix in skb_complete_tx_timestamp(), from Willem de Bruijn.
      
      14) Add some new qmi_wwan device IDs, from Daniele Palmas.
      
      15) Fix static key imbalance in ingress qdisc, from Jiri Pirko.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (76 commits)
        net: qcom/emac: Reduce timeout for mdio read/write
        net: sched: fix static key imbalance in case of ingress/clsact_init error
        net: sched: fix clsact init error path
        ip_gre: fix wrong return value of erspan_rcv
        net: usb: qmi_wwan: add Telit ME910 PID 0x1101 support
        pkt_sched: Remove TC_RED_OFFLOADED from uapi
        net: sched: Move to new offload indication in RED
        net: sched: Add TCA_HW_OFFLOAD
        net: aquantia: Increment driver version
        net: aquantia: Fix typo in ethtool statistics names
        net: aquantia: Update hw counters on hw init
        net: aquantia: Improve link state and statistics check interval callback
        net: aquantia: Fill in multicast counter in ndev stats from hardware
        net: aquantia: Fill ndev stat couters from hardware
        net: aquantia: Extend stat counters to 64bit values
        net: aquantia: Fix hardware DMA stream overload on large MRRS
        net: aquantia: Fix actual speed capabilities reporting
        sock: free skb in skb_complete_tx_timestamp on error
        s390/qeth: update takeover IPs after configuration change
        s390/qeth: lock IP table while applying takeover changes
        ...
      7a3c296a
    • Merge tag 'usb-4.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · c36c7a7c
      Linus Torvalds authored
      Pull USB fixes from Greg KH:
       "Here are some USB fixes for 4.15-rc4.
      
        There are the usual handful of gadget/dwc2/dwc3 fixes, as always, for
        reported issues. But the most important things in here are the core fix
        from Alan Stern to resolve a nasty security bug (my first attempt is
        reverted, Alan's was much cleaner), as well as a number of usbip fixes
        from Shuah Khan to resolve those reported security issues.
      
        All of these have been in linux-next with no reported issues"
      
      * tag 'usb-4.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
        USB: core: prevent malicious bNumInterfaces overflow
        Revert "USB: core: only clean up what we allocated"
        USB: core: only clean up what we allocated
        Revert "usb: gadget: allow to enable legacy drivers without USB_ETH"
        usb: gadget: webcam: fix V4L2 Kconfig dependency
        usb: dwc2: Fix TxFIFOn sizes and total TxFIFO size issues
        usb: dwc3: gadget: Fix PCM1 for ISOC EP with ep->mult less than 3
        usb: dwc3: of-simple: set dev_pm_ops
        usb: dwc3: of-simple: fix missing clk_disable_unprepare
        usb: dwc3: gadget: Wait longer for controller to end command processing
        usb: xhci: fix TDS for MTK xHCI1.1
        xhci: Don't add a virt_dev to the devs array before it's fully allocated
        usbip: fix stub_send_ret_submit() vulnerability to null transfer_buffer
        usbip: prevent vhci_hcd driver from leaking a socket pointer address
        usbip: fix stub_rx: harden CMD_SUBMIT path to handle malicious input
        usbip: fix stub_rx: get_pipe() to validate endpoint number
        tools/usbip: fixes potential (minor) "buffer overflow" (detected on recent gcc with -Werror)
        USB: uas and storage: Add US_FL_BROKEN_FUA for another JMicron JMS567 ID
        usb: musb: da8xx: fix babble condition handling
      c36c7a7c
    • nfp: bpf: correct printk formats for size_t · 0bce7c9a
      Jakub Kicinski authored
      The build bot reported warnings about invalid printk formats on 32-bit
      architectures.  Use %zu for size_t and %zd for the pointer difference.
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      0bce7c9a
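For readers less familiar with the length modifiers involved, here is a minimal userspace sketch, not taken from the nfp driver, of type-correct formatting for a buffer size and a pointer difference. The helper name format_sizes() is hypothetical. Note that the commit uses %zd for the pointer difference; this sketch uses standard C's %td, the modifier defined specifically for ptrdiff_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats a buffer length (size_t) and the distance
 * between two pointers into the same buffer (ptrdiff_t). */
static int format_sizes(char *out, size_t outlen, size_t len,
                        const char *start, const char *end)
{
    assert(out != NULL && outlen > 0);
    ptrdiff_t diff = end - start;   /* signed: may be negative */

    /* %zu is defined for size_t and %td for ptrdiff_t, so the format is
     * correct whatever integer types the target ABI picks for them.
     * %lu/%ld would only happen to match on some ABIs (e.g. 64-bit). */
    return snprintf(out, outlen, "len=%zu diff=%td", len, diff);
}
```

Passing sizeof(data) as len and two pointers into data as start and end yields, for example, "len=16 diff=10" for a 16-byte buffer and an offset of 10.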
    • Merge tag 'staging-4.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging · a84ec723
      Linus Torvalds authored
      Pull staging fixes from Greg KH:
       "Here are some small staging driver fixes for 4.15-rc4.
      
        One patch for the ccree driver to prevent an uninitialized value from
        being returned to a caller, and the other fixes a logic error in the
        pi433 driver"
      
      * tag 'staging-4.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
        staging: pi433: Fixes issue with bit shift in rf69_get_modulation
        staging: ccree: Uninitialized return in ssi_ahash_import()
      a84ec723
    • Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost · d6e47eed
      Linus Torvalds authored
      Pull virtio regression fixes from Michael Tsirkin:
       "Fixes two issues in the latest kernel"
      
      * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
        virtio_mmio: fix devm cleanup
        ptr_ring: fix up after recent ptr_ring changes
      d6e47eed
    • Merge tag 'for-4.15/dm-fixes' of... · ee1b43ec
      Linus Torvalds authored
      Merge tag 'for-4.15/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
      
      Pull device mapper fixes from Mike Snitzer:
      
       - fix a particularly nasty DM core bug in a 4.15 refcount_t conversion.
      
       - fix various targets to dm_register_target after module __init
         resources are created; otherwise racing lvm2 commands could result in
         a NULL pointer dereference during initialization of the associated DM
         kernel module.
      
       - fix regression in bio-based DM multipath queue_if_no_path handling.
      
       - fix DM bufio's shrinker to reclaim more than one buffer per scan.
      
      * tag 'for-4.15/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
        dm bufio: fix shrinker scans when (nr_to_scan < retain_target)
        dm mpath: fix bio-based multipath queue_if_no_path handling
        dm: fix various targets to dm_register_target after module __init resources created
        dm table: fix regression from improper dm_dev_internal.count refcount_t conversion
      ee1b43ec
    • Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 66dbbd72
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "The most important one is the bfa fix because it's easy to oops the
        kernel with this driver (this includes the commit that corrects the
        compiler warning in the original), a regression in the new timespec
        conversion in aacraid and a regression in the Fibre Channel ELS
        handling patch.
      
        The other three are a theoretical problem with termination in the
        vendor/host matching code and a use after free in lpfc.
      
        The additional patches are a fix for an I/O hang in the mq code under
        certain circumstances and a rare oops in some debugging code"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: core: Fix a scsi_show_rq() NULL pointer dereference
        scsi: MAINTAINERS: change FCoE list to linux-scsi
        scsi: libsas: fix length error in sas_smp_handler()
        scsi: bfa: fix type conversion warning
        scsi: core: run queue if SCSI device queue isn't ready and queue is idle
        scsi: scsi_devinfo: cleanly zero-pad devinfo strings
        scsi: scsi_devinfo: handle non-terminated strings
        scsi: bfa: fix access to bfad_im_port_s
        scsi: aacraid: address UBSAN warning regression
        scsi: libfc: fix ELS request handling
        scsi: lpfc: Use after free in lpfc_rq_buf_free()
      66dbbd72
    • Merge tag 'mmc-v4.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc · 07a20ed1
      Linus Torvalds authored
      Pull MMC fixes from Ulf Hansson:
       "A couple of MMC fixes:
      
         - fix use of uninitialized drv_type variable
      
         - apply NO_CMD23 quirk to some specific SD cards to make them work"
      
      * tag 'mmc-v4.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
        mmc: core: apply NO_CMD23 quirk to some specific cards
        mmc: core: properly init drv_type
      07a20ed1
    • Merge tag 'ceph-for-4.15-rc4' of git://github.com/ceph/ceph-client · dd3d66b8
      Linus Torvalds authored
      Pull ceph fix from Ilya Dryomov:
       "CephFS inode trimming fix from Zheng, marked for stable"
      
      * tag 'ceph-for-4.15-rc4' of git://github.com/ceph/ceph-client:
        ceph: drop negative child dentries before try pruning inode's alias
      dd3d66b8
    • Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs · 227701e0
      Linus Torvalds authored
      Pull overlayfs fixes from Miklos Szeredi:
      
       - fix incomplete syncing of filesystem
      
       - fix regression in readdir on ovl over 9p
      
       - only follow redirects when needed
      
       - misc fixes and cleanups
      
      * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
        ovl: fix overlay: warning prefix
        ovl: Use PTR_ERR_OR_ZERO()
        ovl: Sync upper dirty data when syncing overlayfs
        ovl: update ctx->pos on impure dir iteration
        ovl: Pass ovl_get_nlink() parameters in right order
        ovl: don't follow redirects if redirect_dir=off
      227701e0
    • net: qcom/emac: Reduce timeout for mdio read/write · 043ee1de
      Hemanth Puranik authored
      Currently an mdio read/write takes ~115us, because the delay
      between status checks is set to 100us.
      Reducing that delay to 1us brings an mdio read/write down to ~15us,
      which improves the response to link-up events.
      Signed-off-by: Hemanth Puranik <hpuranik@codeaurora.org>
      Acked-by: Timur Tabi <timur@codeaurora.org>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      043ee1de
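The numbers above follow from how a check-then-sleep poll loop behaves: completion can only be noticed at the next status check, so a transfer that finishes quickly still costs at least one full sleep interval. The sketch below is a simplified timing model, not the emac driver's code; poll_done_at() is a hypothetical name, and the model assumes instantaneous status checks and exact sleeps, so it reproduces the order of magnitude (100us vs 15us) rather than the measured ~115us.

```c
#include <assert.h>

/* Simplified model of a "check status, then sleep" MDIO poll loop.
 * Assumptions (not taken from the emac driver): the transfer becomes
 * ready at ready_us, each status check is instantaneous, and each sleep
 * lasts exactly interval_us. Returns the simulated time, in microseconds,
 * at which the loop observes completion, or -1 on timeout. */
static long poll_done_at(long ready_us, long interval_us, long timeout_us)
{
    assert(interval_us > 0);
    long now = 0;
    for (;;) {
        if (now >= ready_us)
            return now;            /* status check sees "done" */
        if (now >= timeout_us)
            return -1;             /* give up */
        now += interval_us;        /* sleep until the next check */
    }
}
```

With a transfer that is ready after 15us, poll_done_at(15, 100, 1000) gives 100 and poll_done_at(15, 1, 1000) gives 15. Under this model the 1us interval costs 16 status reads instead of 2, which is the trade-off for the faster response.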
    • Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · 06f976ec
      Linus Torvalds authored
      Pull arm64 fixes from Will Deacon:
       "There are some significant fixes in here for FP state corruption,
        hardware access/dirty PTE corruption and an erratum workaround for the
        Falkor CPU.
      
        I'm hoping that things finally settle down now, but never say never...
      
        Summary:
      
         - Fix FPSIMD context switch regression introduced in -rc2
      
         - Fix ABI break with SVE CPUID register reporting
      
         - Fix use of uninitialised variable
      
         - Fixes to hardware access/dirty management and sanity checking
      
         - CPU erratum workaround for Falkor CPUs
      
         - Fix reporting of writeable+executable mappings
      
         - Fix signal reporting for RAS errors"
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: fpsimd: Fix copying of FP state from signal frame into task struct
        arm64/sve: Report SVE to userspace via CPUID only if supported
        arm64: fix CONFIG_DEBUG_WX address reporting
        arm64: fault: avoid send SIGBUS two times
        arm64: hw_breakpoint: Use linux/uaccess.h instead of asm/uaccess.h
        arm64: Add software workaround for Falkor erratum 1041
        arm64: Define cputype macros for Falkor CPU
        arm64: mm: Fix false positives in set_pte_at access/dirty race detection
        arm64: mm: Fix pte_mkclean, pte_mkdirty semantics
        arm64: Initialise high_memory global variable earlier
      06f976ec
    • net: sched: fix static key imbalance in case of ingress/clsact_init error · b59e6979
      Jiri Pirko authored
      Move static key increments to the beginning of the init function
      so they pair 1:1 with decrements in ingress/clsact_destroy,
      which is called in case ingress/clsact_init fails.
      
      Fixes: 6529eaba ("net: sched: introduce tcf block infractructure")
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b59e6979
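A userspace model of the pairing rule, with hypothetical names and a plain counter standing in for the static key. Because the caller runs destroy whenever init fails, destroy's decrement always happens; incrementing at the very start of init, before anything that can fail, is what keeps the two balanced. Had the increment come after the failing step, a failed create would leave the counter at -1.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model of the ingress/clsact init/destroy contract.
 * All names are hypothetical; key_count stands in for the static key
 * and fail_init forces the error path. */
static int key_count;

struct fake_qdisc {
    bool block_ok;
};

static int fake_init(struct fake_qdisc *q, bool fail_init)
{
    key_count++;                   /* increment first, unconditionally */
    if (fail_init)
        return -1;                 /* e.g. block setup failed */
    q->block_ok = true;
    return 0;
}

static void fake_destroy(struct fake_qdisc *q)
{
    if (q->block_ok)
        q->block_ok = false;       /* release what init set up */
    key_count--;                   /* always pairs with the increment */
}

/* Mirrors the caller's rule: destroy runs even when init fails. */
static int fake_create(struct fake_qdisc *q, bool fail_init)
{
    int err = fake_init(q, fail_init);
    if (err)
        fake_destroy(q);
    return err;
}
```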
    • net: sched: fix clsact init error path · 343723dd
      Jiri Pirko authored
      Since qdisc_create() calls the destroy op when init fails, don't do
      cleanup in init; leave it to destroy.
      This fixes a use-after-free when trying to put an already freed block.
      
      Fixes: 6e40cf2d ("net: sched: use extended variants of block_get/put in ingress and clsact qdiscs")
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      343723dd
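The same caller contract, seen from the resource side: if init freed its block on failure and destroy then put it again, the second put would touch freed memory. The sketch below uses hypothetical names, with calloc/free standing in for the block get/put calls. It keeps all cleanup in destroy and clears each pointer as it is put, so a partially initialized object is released exactly once.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model: a refcounted "block" acquired by init. */
struct block { int refcnt; };

static int puts_done;              /* counts releases, for the example */

static void block_put(struct block **bp)
{
    struct block *b = *bp;
    if (!b)
        return;                    /* never acquired: nothing to put */
    *bp = NULL;                    /* never put the same block twice */
    puts_done++;
    if (--b->refcnt == 0)
        free(b);
}

struct clsact_model { struct block *ingress, *egress; };

/* Init acquires resources but does NOT release them on failure:
 * the caller runs destroy on error, so cleanup lives in one place. */
static int model_init(struct clsact_model *q, int fail_second)
{
    q->ingress = calloc(1, sizeof(*q->ingress));
    if (!q->ingress)
        return -1;
    q->ingress->refcnt = 1;
    if (fail_second)
        return -1;                 /* egress is never acquired */
    q->egress = calloc(1, sizeof(*q->egress));
    if (!q->egress)
        return -1;
    q->egress->refcnt = 1;
    return 0;
}

static void model_destroy(struct clsact_model *q)
{
    block_put(&q->egress);
    block_put(&q->ingress);
}
```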
    • net: phy: broadcom: Add entry for 5395 switch PHYs · 28dc4c8f
      Florian Fainelli authored
      Add an entry for the builtin PHYs present in the Broadcom BCM5395 switch. This
      allows us to retrieve the PHY statistics among other things.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Tested-by: Chris Healy <cphealy@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      28dc4c8f
    • Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · e53000b1
      Linus Torvalds authored
      Pull x86 fixes from Ingo Molnar:
       "Misc fixes:
      
         - fix the s2ram regression related to confusion around segment
           register restoration, plus related cleanups that make the code more
           robust
      
         - a guess-unwinder Kconfig dependency fix
      
         - an isoimage build target fix for certain tool chain combinations
      
         - instruction decoder opcode map fixes+updates, and the syncing of
           the kernel decoder headers to the objtool headers
      
         - a kmmio tracing fix
      
         - two 5-level paging related fixes
      
         - a topology enumeration fix on certain SMP systems"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        objtool: Resync objtool's instruction decoder source code copy with the kernel's latest version
        x86/decoder: Fix and update the opcodes map
        x86/power: Make restore_processor_context() sane
        x86/power/32: Move SYSENTER MSR restoration to fix_processor_context()
        x86/power/64: Use struct desc_ptr for the IDT in struct saved_context
        x86/unwinder/guess: Prevent using CONFIG_UNWINDER_GUESS=y with CONFIG_STACKDEPOT=y
        x86/build: Don't verify mtools configuration file for isoimage
        x86/mm/kmmio: Fix mmiotrace for page unaligned addresses
        x86/boot/compressed/64: Print error if 5-level paging is not supported
        x86/boot/compressed/64: Detect and handle 5-level paging at boot-time
        x86/smpboot: Do not use smp_num_siblings in __max_logical_packages calculation
      e53000b1
    • Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 1f76a755
      Linus Torvalds authored
      Pull locking fixes from Ingo Molnar:
       "Misc fixes:
      
         - Fix an S390 boot hang that was caused by the lock-break logic.
           Remove lock-break to begin with, as review suggested it was
           unreasonably fragile and our confidence in its continued good
           health is lower than our confidence in its removal.
      
         - Remove the lockdep cross-release checking code for now, because of
           unresolved false positive warnings. This should make lockdep work
           well everywhere again.
      
         - Get rid of the final (and single) ACCESS_ONCE() straggler and
           remove the API from v4.15.
      
         - Fix a liblockdep build warning"
      
      * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        tools/lib/lockdep: Add missing declaration of 'pr_cont()'
        checkpatch: Remove ACCESS_ONCE() warning
        compiler.h: Remove ACCESS_ONCE()
        tools/include: Remove ACCESS_ONCE()
        tools/perf: Convert ACCESS_ONCE() to READ_ONCE()
        locking/lockdep: Remove the cross-release locking checks
        locking/core: Remove break_lock field when CONFIG_GENERIC_LOCKBREAK=y
        locking/core: Fix deadlock during boot on systems with GENERIC_LOCKBREAK
      1f76a755
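As background for the ACCESS_ONCE() removal: ACCESS_ONCE(x) was essentially a volatile cast, *(volatile typeof(x) *)&(x), and READ_ONCE()/WRITE_ONCE() are its replacements. The scalar-only macros below, MY_READ_ONCE() and MY_WRITE_ONCE(), are hypothetical simplifications for illustration. They rely on the GNU C __typeof__ extension and omit the handling of non-scalar types that the real macros in include/linux/compiler.h provide.

```c
#include <assert.h>

/* Simplified, scalar-only sketches in the spirit of the kernel macros.
 * Hypothetical names; GNU C __typeof__ required (GCC/Clang). */
#define MY_READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define MY_WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* In the kernel another CPU might write such a flag; this
 * single-threaded sketch only demonstrates the access pattern. */
static int shared_flag;

/* The volatile access forces one real load per iteration, so the
 * compiler cannot hoist the read of shared_flag out of the loop. */
static int wait_iterations(int max_spins)
{
    assert(max_spins >= 0);
    int spins = 0;
    while (!MY_READ_ONCE(shared_flag) && spins < max_spins)
        spins++;
    return spins;
}
```

Note that a volatile access only constrains the compiler; it is not a memory barrier and gives no ordering guarantees between CPUs.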