1. 27 Apr, 2015 4 commits
    • tc: built-in eBPF exec proxy · 4bd62446
      Daniel Borkmann authored
      This work follows upon commit 6256f8c9 ("tc, bpf: finalize eBPF
      support for cls and act front-end") and takes up the idea proposed by
      Hannes Frederic Sowa to spawn a shell (or any other command) that holds
      generated eBPF map file descriptors.
      
      File descriptors are fetched, based on their id, from the same unix
      domain socket as demonstrated in the bpf_agent. The shell is spawned
      via execvpe(2) and the map fds are passed over the environment, and are
      thus made available to applications in the fashion of std{in,out,err}
      for read/write access, for example in the case of iproute2's examples/bpf/:
      
        # env | grep BPF
        BPF_NUM_MAPS=3
        BPF_MAP1=6        <- BPF_MAP_ID_QUEUE (id 1)
        BPF_MAP0=5        <- BPF_MAP_ID_PROTO (id 0)
        BPF_MAP2=7        <- BPF_MAP_ID_DROPS (id 2)
      
        # ls -la /proc/self/fd
        [...]
        lrwx------. 1 root root 64 Apr 14 16:46 0 -> /dev/pts/4
        lrwx------. 1 root root 64 Apr 14 16:46 1 -> /dev/pts/4
        lrwx------. 1 root root 64 Apr 14 16:46 2 -> /dev/pts/4
        [...]
        lrwx------. 1 root root 64 Apr 14 16:46 5 -> anon_inode:bpf-map
        lrwx------. 1 root root 64 Apr 14 16:46 6 -> anon_inode:bpf-map
        lrwx------. 1 root root 64 Apr 14 16:46 7 -> anon_inode:bpf-map
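
An application started from the spawned shell could pick the map descriptors up from the environment roughly as follows. This is a minimal sketch, not part of the patch: the helper name `bpf_map_fds` is hypothetical, and it assumes map ids run from 0 to BPF_NUM_MAPS - 1, as in the listing above.

```python
import os


def bpf_map_fds(environ=None):
    """Return {map_id: fd} built from BPF_NUM_MAPS and BPF_MAP<id>.

    Assumption: ids are numbered 0 .. BPF_NUM_MAPS - 1, as in the
    example environment shown in the commit message.
    """
    env = os.environ if environ is None else environ
    count = int(env.get("BPF_NUM_MAPS", "0"))
    fds = {}
    for map_id in range(count):
        value = env.get(f"BPF_MAP{map_id}")
        if value is not None:
            # The value is the number of an already open, inherited fd.
            fds[map_id] = int(value)
    return fds


if __name__ == "__main__":
    for map_id, fd in sorted(bpf_map_fds().items()):
        print(f"map id {map_id} -> fd {fd}")
```

The returned numbers are only meaningful inside a process that inherited the descriptors from the shell; elsewhere they refer to unrelated files or to nothing at all.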
      
      The advantage (as opposed to the direct/native usage) is that the shell
      is now the owner of the map fds, so applications can terminate and easily
      reattach to the descriptors without any kernel changes. Moreover, multiple
      applications can easily read/write eBPF maps simultaneously.
      
      To let users experiment with this further, the next step is to add
      a small helper that handles simple data types, so that shell scripts
      can also make use of the bpf syscall, e.g. to read from and write to maps.
      
      Generally, this allows for prepopulating maps, or any runtime alteration
      that could influence eBPF program behaviour (e.g. different run-time
      classifications, skb modifications, ...), dumping of statistics, etc.
      
      Reference: http://thread.gmane.org/gmane.linux.network/357471/focus=357860
      Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Acked-by: Alexei Starovoitov <ast@plumgrid.com>
    • mroute: remove invalid check against NLM_F_MULTI · 505f9186
      Nicolas Dichtel authored
      This flag is only for the netlink protocol (multi-part messages); there is
      no reason to reject messages without it.
      
      Note that this flag was removed by the following kernel patches (v3.14):
      65886f439ab0 ipmr: fix mfc notification flags
      f518338b1603 ip6mr: fix mfc notification flags
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
    • libnamespaces: fix warning about syscall() · b765eda9
      Nicolas Dichtel authored
      The warning was:
      In file included from namespace.c:14:0:
      ../include/namespace.h: In function ‘setns’:
      ../include/namespace.h:37:2: warning: implicit declaration of function ‘syscall’ [-Wimplicit-function-declaration]
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
    • tc: fix compilation warning on 32bits arch · afa5158f
      Nicolas Dichtel authored
      The warning was:
      m_simple.c: In function ‘parse_simple’:
      m_simple.c:142:4: warning: format ‘%ld’ expects argument of type ‘long int’, but argument 3 has type ‘size_t’ [-Wformat]
      
      Useful to be able to compile with -Werror.
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
  2. 20 Apr, 2015 8 commits
  3. 13 Apr, 2015 9 commits
    • tc: add support for connmark action · b8d5c9a7
      Felix Fietkau authored
      Add support for the netfilter connmark action.
      
      Typical usage:
      ...lets tag outgoing icmp with mark 0x10..
      iptables -tmangle -A PREROUTING -p icmp -j CONNMARK --set-mark 0x10
      ..add on ingress of $ETH an extractor for connmark...
      tc filter add dev $ETH parent ffff: prio 4 protocol ip \
      u32 match ip protocol 1 0xff \
      flowid 1:1 \
      action connmark continue
      ...if the connmark was 0x11, we police to a ridic rate of 10Kbps
      tc filter add dev $ETH parent ffff: prio 5 protocol ip \
      handle 0x11 fw flowid 1:1 \
      action police rate 10kbit burst 10k
      
      Other ways to use connmark are to supply the zone, index and
      branching choice. Refer to the help output.
      Signed-off-by: Felix Fietkau <nbd@openwrt.org>
      Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
    • update kernel headers and add tc_connmark.h · 94f66538
      Stephen Hemminger authored
      Needed for later tc action patches
    • iproute2: unify naming for entries offloaded to hardware · aa05b988
      Andy Gospodarek authored
      The kernel now has the capability to offload FDB and FIB entries to hardware.
      It is important to let users know if table entries are also offloaded to
      hardware.  Currently offloaded FDB entries are indicated by the existence of
      the flag 'external' on the entry as of the following commit:
      
      commit 28467b7f
      Author: Scott Feldman <sfeldma@gmail.com>
      Date:   Thu Dec 4 09:57:15 2014 +0100
      
          bridge/fdb: add flag/indication for FDB entry synced from offload device
      
      When the patch to add support for indicating that FIB entries were also
      offloaded was posted to netdev by Scott Feldman, it became clear that 'external'
      would not be an ideal name for routes.  There could definitely be confusion
      about what this might mean since many routes are to external networks -- a
      collision/confusion that did not happen with FDB.
      
      Scott Feldman asked me to check with others and build consensus around a name.
      After speaking with several people about this I am proposing we refer to both
      FDB and FIB entries that are currently backed by hardware (based on the work
      done in rocker) with the flag 'offload' appended to the end of the entry.
      
      Some people liked the string 'external,' others liked 'hardware,' but the point
      is to communicate that these routes are available to something that will
      offload the forwarding normally done by the kernel.  Since the term 'offload'
      is used so frequently it seems appropriate to use the same language in
      ip/bridge output.
      
      The term 'offload' also seems to resonate with many of the people who have
      responded on Scott's original thread or to those who I reached out to directly
      and did respond to my query, so it seems we have reached consensus that it
      should be the term used going forward.
      
      v2: rebased against net-next branch
      Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
      CC: Jamal Hadi Salim <jhs@mojatatu.com>
      CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      CC: Jiri Pirko <jiri@resnulli.us>
      CC: John W. Linville <linville@tuxdriver.com>
      CC: Roopa Prabhu <roopa@cumulusnetworks.com>
      CC: Scott Feldman <sfeldma@gmail.com>
      CC: Stephen Hemminger <stephen@networkplumber.org>
    • Merge branch 'master' into net-next · 93531fac
      Stephen Hemminger authored
    • fix whitespace · 672acc72
      Stephen Hemminger authored
    • v4.0.0 · aed6d85d
      Stephen Hemminger authored
    • ipnetns: add a runtime check for RTM_GETNSID support · 4c7d9a58
      Nicolas Dichtel authored
      The goal of this patch is to test at runtime whether the RTM_GETNSID
      command is supported by the kernel.
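
A runtime check of this kind usually follows a "probe once, cache the answer" pattern. The sketch below illustrates the pattern only and is not the code from the patch: `FeatureProbe` and the `send_request` callable are hypothetical, and it assumes that a kernel lacking the command rejects the request with EOPNOTSUPP.

```python
import errno


class FeatureProbe:
    """Probe a kernel feature once and remember the result.

    send_request is a hypothetical callable that issues one request
    (for example an RTM_GETNSID query) and raises OSError on failure.
    """

    def __init__(self, send_request):
        self._send_request = send_request
        self._supported = None  # None means "not probed yet"

    def supported(self):
        if self._supported is None:
            try:
                self._send_request()
            except OSError as exc:
                # Assumption: only EOPNOTSUPP means the command is unknown.
                # Any other error shows the kernel understood the request.
                self._supported = exc.errno != errno.EOPNOTSUPP
            else:
                self._supported = True
        return self._supported
```

Callers would consult `supported()` before relying on nsid information and fall back to the older behaviour when it returns False.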
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
    • Nicolas Dichtel's avatar
      5a2ce868
    • Nicolas Dichtel's avatar
      694ed195
  4. 10 Apr, 2015 8 commits
  5. 07 Apr, 2015 6 commits
  6. 24 Mar, 2015 5 commits
    • ip: support RFC4191 router preference · 194e9b85
      Lubomir Rintel authored
      This allows querying and setting the route preference. It's usually set from
      the IPv6 Neighbor Discovery Router Advertisement messages.
      
      Introduced in "ipv6: expose RFC4191 route preference via rtnetlink", enqueued
      for Linux 4.1.
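
For orientation, RFC 4191 encodes the preference as a 2-bit signed value: 01 is high, 00 is medium, 11 is low, and 10 is reserved. A minimal sketch of mapping between that encoding and names follows; the function names are hypothetical and the exact strings the ip command prints or accepts are an assumption here.

```python
# RFC 4191 Prf encoding (2-bit signed integer).
PREF_BY_NAME = {"high": 0b01, "medium": 0b00, "low": 0b11}
NAME_BY_PREF = {value: name for name, value in PREF_BY_NAME.items()}


def pref_name(value):
    """Map a 2-bit preference value to a name; 0b10 is reserved."""
    return NAME_BY_PREF.get(value & 0b11, "reserved")


def pref_value(name):
    """Map a name to its 2-bit value; raises KeyError for unknown names."""
    return PREF_BY_NAME[name.lower()]
```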
      Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
    • add basic mpls support to iproute · dacc5d41
      Eric W. Biederman authored
      - Pull in the uapi mpls.h
      - Update rtnetlink.h to include the mpls rtnetlink notification multicast group.
      - Define AF_MPLS in utils.h if it is not defined from elsewhere
        as is done with AF_DECnet
      
      The address syntax for multiple mpls labels is a complete invention.
      When I looked there seemed to be no widespread convention for talking
      about an mpls label stack in text form.  Sometimes people did:
      "{ Label1, Label2, Label3 }", sometimes people would do:
      "[ label3, label2, label1 ]", and most of the time label
      stacks were not explicitly shown at all.
      
      The syntax I wound up using, so it would not have spaces and so it
      would be visually distinct from other kinds of addresses, is:
      
      label1/label2/label3 where label1 is the label at the top of the label
      stack and label3 is the label at the bottom of the label stack.
      
      When there is a single label this matches what seems to be convention
      with other tools.  Just print out the numeric value of the mpls label.
      
      The netlink protocol for labels uses the on-the-wire format for a
      label stack. The ttl and traffic class are expected to be 0.  Using
      the on-the-wire format is common and is what happens with other address
      types. BGP, when passing label stacks, also uses this technique, with the
      exception that the ttl byte is not included, making each label in a BGP
      label stack 3 bytes instead of 4.
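
The text syntax and the wire format described above can be related with a short sketch. It assumes the standard MPLS label stack entry layout (20-bit label, 3-bit traffic class, 1 bottom-of-stack bit, 8-bit TTL) and that the bottom-of-stack bit is set on the last label; `encode_label_stack` is a hypothetical helper, not a function from iproute2.

```python
import struct


def encode_label_stack(text, bgp=False):
    """Encode "label1/label2/label3" (top of stack first) to bytes.

    Traffic class and TTL are left at 0. With bgp=True the TTL byte
    is dropped, giving 3 bytes per label instead of 4.
    """
    labels = [int(part) for part in text.split("/")]
    out = bytearray()
    for index, label in enumerate(labels):
        if not 0 <= label < (1 << 20):
            raise ValueError(f"label out of range: {label}")
        bottom = 1 if index == len(labels) - 1 else 0
        # label: bits 31..12, traffic class: 11..9, S: bit 8, TTL: 7..0
        entry = (label << 12) | (bottom << 8)
        packed = struct.pack("!I", entry)  # network byte order
        out += packed[:3] if bgp else packed
    return bytes(out)


if __name__ == "__main__":
    print(encode_label_stack("16/100").hex())
```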
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
    • add support for the RTA_NEWDST attribute. · 6f7a9f4d
      Eric W. Biederman authored
      This attribute is like RTA_DST except it specifies the destination
      address to place on a packet when it leaves the host.  For ip based
      protocols this is destination NAT and not a common part of forwarding.
      For protocols like MPLS label swapping is something that typically
      happens on every hop.
      
      There is likely to be an RTA_NEWSRC at some point, so RTA_NEWDST
      is printed as "as to" and can be specified either as "as to"
      or just "as".
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
    • add support for the RTA_VIA attribute · 93ae2835
      Eric W. Biederman authored
      Add support for the RTA_VIA attribute that specifies an address family
      as well as an address for the next hop gateway.
      
      To make this easy to pass, reorder inet_prefix so that its tail
      is a proper RTA_VIA attribute.
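
As a rough illustration of "an address family as well as an address", the attribute payload can be pictured as a family field followed directly by the raw address bytes. The sketch below assumes a 16-bit family field in host byte order and omits the netlink attribute header; `pack_via` is a hypothetical name.

```python
import socket
import struct


def pack_via(family, address):
    """Build a via payload: 16-bit address family, then the raw address.

    Assumption: the family is a native-endian unsigned short and the
    address follows it with no padding in between.
    """
    return struct.pack("=H", family) + socket.inet_pton(family, address)


if __name__ == "__main__":
    print(pack_via(socket.AF_INET, "192.0.2.1").hex())
```

Carrying the family next to the address is what lets a route in one family (for example MPLS) name a next hop in another (for example IPv4).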
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
    • misc whitespace cleanup · 8e8f8de4
      Eric W. Biederman authored