1. 09 Jan, 2017 11 commits
    • David Howells's avatar
      afs: Refcount the afs_call struct · 341f741f
      David Howells authored
      A static checker warning occurs in the AFS filesystem:
      
      	fs/afs/cmservice.c:155 SRXAFSCB_CallBack()
      	error: dereferencing freed memory 'call'
      
      due to the reply being sent before we access the server it points to.  The
      act of sending the reply causes the call to be freed if an error occurs
      (but not if it doesn't).
      
      On top of this, the lifetime handling of afs_call structs is fragile
      because they get passed around through workqueues without any sort of
      refcounting.
      
      Deal with the issues by:
      
       (1) Fix the maybe/maybe not nature of the reply sending functions with
           regards to whether they release the call struct.
      
       (2) Refcount the afs_call struct and sort out places that need to get/put
           references.
      
       (3) Pass a ref through the work queue and release (or pass on) that ref in
           the work function.  Care has to be taken because a work queue may
           already own a ref to the call.
      
       (4) Do the cleaning up in the put function only.
      
       (5) Simplify module cleanup by always incrementing afs_outstanding_calls
           whenever a call is allocated.
      
       (6) Set the backlog to 0 with kernel_listen() at the beginning of the
           process of closing the socket to prevent new incoming calls from
           occurring and to remove the contribution of preallocated calls from
           afs_outstanding_calls before we wait on it.
      
      A tracepoint is also added to monitor the afs_call refcount and lifetime.
      Reported-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Fixes: 08e0e7c8: "[AF_RXRPC]: Make the in-kernel AFS filesystem use AF_RXRPC."
      341f741f
    • David Howells's avatar
      rxrpc: Allow listen(sock, 0) to be used to disable listening · 210f0353
      David Howells authored
      Allow listen() with a backlog of 0 to be used to disable listening on an
      AF_RXRPC socket.  This also releases any preallocation, thereby making it
      easier for a kernel service to account for all allocated call structures
      when shutting down the service.
      
      The socket cannot thereafter have listening reenabled, but must rather be
      closed and reopened.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      210f0353
    • David Howells's avatar
      afs: Kill afs_wait_mode · 56ff9c83
      David Howells authored
      The afs_wait_mode struct isn't really necessary.  Client calls only use one
      of a choice of two (synchronous or the asynchronous) and incoming calls
      don't use the wait at all.  Replace with a boolean parameter.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      56ff9c83
    • David Howells's avatar
      afs: Add some tracepoints · 8e8d7f13
      David Howells authored
      Add three tracepoints to the AFS filesystem:
      
       (1) The afs_recv_data tracepoint logs data segments that are extracted
           from the data received from the peer through afs_extract_data().
      
       (2) The afs_notify_call tracepoint logs notification from AF_RXRPC of data
           coming in to an asynchronous call.
      
       (3) The afs_cb_call tracepoint logs incoming calls that have had their
           operation ID extracted and mapped into a supported cache manager
           service call.
      
      To make (3) work, the name strings in the afs_call_type struct objects have
      to be annotated with __tracepoint_string.  This is done with the CM_NAME()
      macro.
      
      Further, the AFS call state enum needs a name so that it can be used to
      declare parameter types.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      8e8d7f13
    • David S. Miller's avatar
      Merge branch 'tc-skb-diet' · 4289e60c
      David S. Miller authored
      Willem de Bruijn says:
      
      ====================
      convert tc_verd to integer bitfields
      
      The skb tc_verd field takes up two bytes but uses far fewer bits.
      Convert the remaining use cases to bitfields that fit in existing
      holes (depending on config options) and potentially save the two
      bytes in struct sk_buff.
      
      This patchset is based on an earlier set by Florian Westphal and its
      discussion (http://www.spinics.net/lists/netdev/msg329181.html).
      
      Patches 1 and 2 are low hanging fruit: removing the last traces of
        data that are no longer stored in tc_verd.
      
      Patches 3 and 4 convert tc_verd to individual bitfields (5 bits).
      
      Patch 5 reduces TC_AT to a single bitfield,
        as AT_STACK is not valid here (unlike in the case of TC_FROM).
      
      Patch 6 changes TC_FROM to two bitfields with clearly defined purpose.
      
      It may be possible to reduce storage further after this initial round.
      If tc_skip_classify is set only by IFB, testing skb_iif may suffice.
      The L2 header pushing/popping logic can perhaps be shared with
      AF_PACKET, which currently not pkt_type for the same purpose.
      
      Changes:
        RFC -> v1
          - (patch 3): remove no longer needed label in tfc_action_exec
          - (patch 5): set tc_at_ingress at the same points as existing
                       SET_TC_AT calls
      
      Tested ingress mirred + netem + ifb:
      
        ip link set dev ifb0 up
        tc qdisc add dev eth0 ingress
        tc filter add dev eth0 parent ffff: \
          u32 match ip dport 8000 0xffff \
          action mirred egress redirect dev ifb0
        tc qdisc add dev ifb0 root netem delay 1000ms
        nc -u -l 8000 &
        ssh $otherhost nc -u $host 8000
      
      Tested egress mirred:
      
        ip link add veth1 type veth peer name veth2
        ip link set dev veth1 up
        ip link set dev veth2 up
        tcpdump -n -i veth2 udp and dst port 8000 &
      
        tc qdisc add dev eth0 root handle 1: prio
        tc filter add dev eth0 parent 1:0 \
          u32 match ip dport 8000 0xffff \
          action mirred egress redirect dev veth1
        tc qdisc add dev veth1 root netem delay 1000ms
        nc -u $otherhost 8000
      
      Tested ingress mirred:
      
        ip link add veth1 type veth peer name veth2
        ip link add veth3 type veth peer name veth4
      
        ip netns add ns0
        ip netns add ns1
      
        for i in 1 2 3 4; do \
          NS=ns$((${i}%2)); \
          ip link set dev veth${i} netns ${NS}; \
          ip netns exec ${NS} \
            ip addr add dev veth${i} 192.168.1.${i}/24; \
          ip netns exec ${NS} \
            ip link set dev veth${i} up; \
        done
      
        ip netns exec ns0 tc qdisc add dev veth2 ingress
        ip netns exec ns0 \
          tc filter add dev veth2 parent ffff: \
            u32 match ip dport 8000 0xffff \
            action mirred ingress redirect dev veth4
      
        ip netns exec ns0 \
          tcpdump -n -i veth4 udp and dst port 8000 &
        ip netns exec ns1 \
          nc -u 192.168.1.2 8000
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4289e60c
    • Willem de Bruijn's avatar
      net-tc: convert tc_from to tc_from_ingress and tc_redirected · bc31c905
      Willem de Bruijn authored
      The tc_from field fulfills two roles. It encodes whether a packet was
      redirected by an act_mirred device and, if so, whether act_mirred was
      called on ingress or egress. Split it into separate fields.
      
      The information is needed by the special IFB loop, where packets are
      taken out of the normal path by act_mirred, forwarded to IFB, then
      reinjected at their original location (ingress or egress) by IFB.
      
      The IFB device cannot use skb->tc_at_ingress, because that may have
      been overwritten as the packet travels from act_mirred to ifb_xmit,
      when it passes through tc_classify on the IFB egress path. Cache this
      value in skb->tc_from_ingress.
      
      That field is valid only if a packet arriving at ifb_xmit came from
      act_mirred. Other packets can be crafted to reach ifb_xmit. These
      must be dropped. Set tc_redirected on redirection and drop all packets
      that do not have this bit set.
      
      Both fields are set only on cloned skbs in tc actions, so original
      packet sources do not have to clear the bit when reusing packets
      (notably, pktgen and octeon).
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bc31c905
    • Willem de Bruijn's avatar
      net-tc: convert tc_at to tc_at_ingress · 8dc07fdb
      Willem de Bruijn authored
      Field tc_at is used only within tc actions to distinguish ingress from
      egress processing. A single bit is sufficient for this purpose.
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8dc07fdb
    • Willem de Bruijn's avatar
      net-tc: convert tc_verd to integer bitfields · a5135bcf
      Willem de Bruijn authored
      Extract the remaining two fields from tc_verd and remove the __u16
      completely. TC_AT and TC_FROM are converted to equivalent two-bit
      integer fields tc_at and tc_from. Where possible, use existing
      helper skb_at_tc_ingress when reading tc_at. Introduce helper
      skb_reset_tc to clear fields.
      
      Not documenting tc_from and tc_at, because they will be replaced
      with single bit fields in follow-on patches.
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a5135bcf
    • Willem de Bruijn's avatar
      net-tc: extract skip classify bit from tc_verd · e7246e12
      Willem de Bruijn authored
      Packets sent by the IFB device skip subsequent tc classification.
      A single bit governs this state. Move it out of tc_verd in
      anticipation of removing that __u16 completely.
      
      The new bitfield tc_skip_classify temporarily uses one bit of a
      hole, until tc_verd is removed completely in a follow-up patch.
      
      Remove the bit hole comment. It could be 2, 3, 4 or 5 bits long.
      With that many options, little value in documenting it.
      
      Introduce a helper function to deduplicate the logic in the two
      sites that check this bit.
      
      The field tc_skip_classify is set only in IFB on skbs cloned in
      act_mirred, so original packet sources do not have to clear the
      bit when reusing packets (notably, pktgen and octeon).
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e7246e12
    • Willem de Bruijn's avatar
      net-tc: make MAX_RECLASSIFY_LOOP local · d6264071
      Willem de Bruijn authored
      This field is no longer kept in tc_verd. Remove it from the global
      definition of that struct.
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d6264071
    • Willem de Bruijn's avatar
      net-tc: remove unused tc_verd fields · aec745e2
      Willem de Bruijn authored
      Remove the last reference to tc_verd's munge and redirect ttl bits.
      These fields are no longer used.
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      aec745e2
  2. 08 Jan, 2017 29 commits