1. 17 Dec, 2016 14 commits
    • Daniel Mack's avatar
      bpf: cgroup: annotate pointers in struct cgroup_bpf with __rcu · dcdc43d6
      Daniel Mack authored
      The member 'effective' in 'struct cgroup_bpf' is protected by RCU.
      Annotate it accordingly to squelch a sparse warning.
      Signed-off-by: default avatarDaniel Mack <daniel@zonque.org>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      dcdc43d6
    • David S. Miller's avatar
      Merge branch 'inet_csk_get_port-and-soreusport-fixes' · 28055c97
      David S. Miller authored
      Tom Herbert says:
      
      ====================
      inet: Fixes for inet_csk_get_port and soreusport
      
      This patch set fixes a couple of issues I noticed while debugging our
      softlockup issue in inet_csk_get_port.
      
      - Don't allow jump into port scan in inet_csk_get_port if function
        was called with non-zero port number (looking up explicit port
        number).
      - When inet_csk_get_port is called with zero port number (ie. perform
        scan) an reuseport is set on the socket, don't match sockets that
        also have reuseport set. The intent from the user should be
        to get a new port number and then explictly bind other
        sockets to that number using soreuseport.
      
      Tested:
      
      Ran first patch on production workload with no ill effect.
      
      For second patch, ran a little listener application and first
      demonstrated that unbound sockets with soreuseport can indeed
      be bound to unrelated soreuseport sockets.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      28055c97
    • Tom Herbert's avatar
      inet: Fix get port to handle zero port number with soreuseport set · 0643ee4f
      Tom Herbert authored
      A user may call listen with binding an explicit port with the intent
      that the kernel will assign an available port to the socket. In this
      case inet_csk_get_port does a port scan. For such sockets, the user may
      also set soreuseport with the intent a creating more sockets for the
      port that is selected. The problem is that the initial socket being
      opened could inadvertently choose an existing and unreleated port
      number that was already created with soreuseport.
      
      This patch adds a boolean parameter to inet_bind_conflict that indicates
      rather soreuseport is allowed for the check (in addition to
      sk->sk_reuseport). In calls to inet_bind_conflict from inet_csk_get_port
      the argument is set to true if an explicit port is being looked up (snum
      argument is nonzero), and is false if port scan is done.
      Signed-off-by: default avatarTom Herbert <tom@herbertland.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0643ee4f
    • Tom Herbert's avatar
      inet: Don't go into port scan when looking for specific bind port · 9af7e923
      Tom Herbert authored
      inet_csk_get_port is called with port number (snum argument) that may be
      zero or nonzero. If it is zero, then the intent is to find an available
      ephemeral port number to bind to. If snum is non-zero then the caller
      is asking to allocate a specific port number. In the latter case we
      never want to perform the scan in ephemeral port range. It is
      conceivable that this can happen if the "goto again" in "tb_found:"
      is done. This patch adds a check that snum is zero before doing
      the "goto again".
      Signed-off-by: default avatarTom Herbert <tom@herbertland.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9af7e923
    • Daniel Borkmann's avatar
      bpf, test_verifier: fix a test case error result on unprivileged · 0eb6984f
      Daniel Borkmann authored
      Running ./test_verifier as unprivileged lets 1 out of 98 tests fail:
      
        [...]
        #71 unpriv: check that printk is disallowed FAIL
        Unexpected error message!
        0: (7a) *(u64 *)(r10 -8) = 0
        1: (bf) r1 = r10
        2: (07) r1 += -8
        3: (b7) r2 = 8
        4: (bf) r3 = r1
        5: (85) call bpf_trace_printk#6
        unknown func bpf_trace_printk#6
        [...]
      
      The test case is correct, just that the error outcome changed with
      ebb676da ("bpf: Print function name in addition to function id").
      Same as with e00c7b21 ("bpf: fix multiple issues in selftest suite
      and samples") issue 2), so just fix up the function name.
      
      Fixes: ebb676da ("bpf: Print function name in addition to function id")
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0eb6984f
    • Daniel Borkmann's avatar
      bpf: fix regression on verifier pruning wrt map lookups · a08dd0da
      Daniel Borkmann authored
      Commit 57a09bf0 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL
      registers") introduced a regression where existing programs stopped
      loading due to reaching the verifier's maximum complexity limit,
      whereas prior to this commit they were loading just fine; the affected
      program has roughly 2k instructions.
      
      What was found is that state pruning couldn't be performed effectively
      anymore due to mismatches of the verifier's register state, in particular
      in the id tracking. It doesn't mean that 57a09bf0 is incorrect per
      se, but rather that verifier needs to perform a lot more work for the
      same program with regards to involved map lookups.
      
      Since commit 57a09bf0 is only about tracking registers with type
      PTR_TO_MAP_VALUE_OR_NULL, the id is only needed to follow registers
      until they are promoted through pattern matching with a NULL check to
      either PTR_TO_MAP_VALUE or UNKNOWN_VALUE type. After that point, the
      id becomes irrelevant for the transitioned types.
      
      For UNKNOWN_VALUE, id is already reset to 0 via mark_reg_unknown_value(),
      but not so for PTR_TO_MAP_VALUE where id is becoming stale. It's even
      transferred further into other types that don't make use of it. Among
      others, one example is where UNKNOWN_VALUE is set on function call
      return with RET_INTEGER return type.
      
      states_equal() will then fall through the memcmp() on register state;
      note that the second memcmp() uses offsetofend(), so the id is part of
      that since d2a4dd37 ("bpf: fix state equivalence"). But the bisect
      pointed already to 57a09bf0, where we really reach beyond complexity
      limit. What I found was that states_equal() often failed in this
      case due to id mismatches in spilled regs with registers in type
      PTR_TO_MAP_VALUE. Unlike non-spilled regs, spilled regs just perform
      a memcmp() on their reg state and don't have any other optimizations
      in place, therefore also id was relevant in this case for making a
      pruning decision.
      
      We can safely reset id to 0 as well when converting to PTR_TO_MAP_VALUE.
      For the affected program, it resulted in a ~17 fold reduction of
      complexity and let the program load fine again. Selftest suite also
      runs fine. The only other place where env->id_gen is used currently is
      through direct packet access, but for these cases id is long living, thus
      a different scenario.
      
      Also, the current logic in mark_map_regs() is not fully correct when
      marking NULL branch with UNKNOWN_VALUE. We need to cache the destination
      reg's id in any case. Otherwise, once we marked that reg as UNKNOWN_VALUE,
      it's id is reset and any subsequent registers that hold the original id
      and are of type PTR_TO_MAP_VALUE_OR_NULL won't be marked UNKNOWN_VALUE
      anymore, since mark_map_reg() reuses the uncached regs[regno].id that
      was just overridden. Note, we don't need to cache it outside of
      mark_map_regs(), since it's called once on this_branch and the other
      time on other_branch, which are both two independent verifier states.
      A test case for this is added here, too.
      
      Fixes: 57a09bf0 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registers")
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarThomas Graf <tgraf@suug.ch>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a08dd0da
    • David Ahern's avatar
      net: vrf: Drop conntrack data after pass through VRF device on Tx · eb63ecc1
      David Ahern authored
      Locally originated traffic in a VRF fails in the presence of a POSTROUTING
      rule. For example,
      
          $ iptables -t nat -A POSTROUTING -s 11.1.1.0/24  -j MASQUERADE
          $ ping -I red -c1 11.1.1.3
          ping: Warning: source address might be selected on device other than red.
          PING 11.1.1.3 (11.1.1.3) from 11.1.1.2 red: 56(84) bytes of data.
          ping: sendmsg: Operation not permitted
      
      Worse, the above causes random corruption resulting in a panic in random
      places (I have not seen a consistent backtrace).
      
      Call nf_reset to drop the conntrack info following the pass through the
      VRF device.  The nf_reset is needed on Tx but not Rx because of the order
      in which NF_HOOK's are hit: on Rx the VRF device is after the real ingress
      device and on Tx it is is before the real egress device. Connection
      tracking should be tied to the real egress device and not the VRF device.
      
      Fixes: 8f58336d ("net: Add ethernet header for pass through VRF device")
      Fixes: 35402e31 ("net: Add IPv6 support to VRF device")
      Signed-off-by: default avatarDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      eb63ecc1
    • David Ahern's avatar
      net: vrf: Fix NAT within a VRF · a0f37efa
      David Ahern authored
      Connection tracking with VRF is broken because the pass through the VRF
      device drops the connection tracking info. Removing the call to nf_reset
      allows DNAT and MASQUERADE to work across interfaces within a VRF.
      
      Fixes: 73e20b76 ("net: vrf: Add support for PREROUTING rules on vrf device")
      Signed-off-by: default avatarDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a0f37efa
    • David S. Miller's avatar
      Merge branch 'cls_flower-mask' · 8a9f5fdf
      David S. Miller authored
      Paul Blakey says:
      
      ====================
      net/sched: cls_flower: Fix mask handling
      
      The series fix how the mask is being handled.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8a9f5fdf
    • Paul Blakey's avatar
      net/sched: cls_flower: Use masked key when calling HW offloads · f93bd17b
      Paul Blakey authored
      Zero bits on the mask signify a "don't care" on the corresponding bits
      in key. Some HWs require those bits on the key to be zero. Since these
      bits are masked anyway, it's okay to provide the masked key to all
      drivers.
      
      Fixes: 5b33f488 ('net/flower: Introduce hardware offload support')
      Signed-off-by: default avatarPaul Blakey <paulb@mellanox.com>
      Reviewed-by: default avatarRoi Dayan <roid@mellanox.com>
      Acked-by: default avatarJiri Pirko <jiri@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f93bd17b
    • Paul Blakey's avatar
      net/sched: cls_flower: Use mask for addr_type · 970bfcd0
      Paul Blakey authored
      When addr_type is set, mask should also be set.
      
      Fixes: 66530bdf ('sched,cls_flower: set key address type when present')
      Fixes: bc3103f1 ('net/sched: cls_flower: Classify packet in ip tunnels')
      Signed-off-by: default avatarPaul Blakey <paulb@mellanox.com>
      Reviewed-by: default avatarRoi Dayan <roid@mellanox.com>
      Acked-by: default avatarJiri Pirko <jiri@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      970bfcd0
    • Bartosz Folta's avatar
      net: macb: Added PCI wrapper for Platform Driver. · 83a77e9e
      Bartosz Folta authored
      There are hardware PCI implementations of Cadence GEM network
      controller. This patch will allow to use such hardware with reuse of
      existing Platform Driver.
      Signed-off-by: default avatarBartosz Folta <bfolta@cadence.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      83a77e9e
    • Thomas Falcon's avatar
      ibmveth: calculate gso_segs for large packets · 94acf164
      Thomas Falcon authored
      Include calculations to compute the number of segments
      that comprise an aggregated large packet.
      Signed-off-by: default avatarThomas Falcon <tlfalcon@linux.vnet.ibm.com>
      Reviewed-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Reviewed-by: default avatarJonathan Maxwell <jmaxwell37@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      94acf164
    • Timur Tabi's avatar
      net: qcom/emac: don't try to claim clocks on ACPI systems · 026acd5f
      Timur Tabi authored
      On ACPI systems, clocks are not available to drivers directly.  They are
      handled exclusively by ACPI and/or firmware, so there is no clock driver.
      Calls to clk_get() always fail, so we should not even attempt to claim
      any clocks on ACPI systems.
      Signed-off-by: default avatarTimur Tabi <timur@codeaurora.org>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      026acd5f
  2. 16 Dec, 2016 7 commits
  3. 13 Dec, 2016 2 commits
  4. 12 Dec, 2016 17 commits