1. 10 Aug, 2015 27 commits
    • net: fec: fix the race between xmit and bdp reclaiming path · c4bc44c6
      Kevin Hao authored
      When we transmit a fragmented skb, we may run into a race like the
      following scenario (assume txq->cur_tx is next to txq->dirty_tx):
                 cpu 0                                          cpu 1
        fec_enet_txq_submit_skb
          reserve a bdp for the first fragment
          fec_enet_txq_submit_frag_skb
             update the bdp for the other fragment
             update txq->cur_tx
                                                         fec_enet_tx_queue
                                                           bdp = fec_enet_get_nextdesc(txq->dirty_tx, fep, queue_id);
                                                           This is the bdp reserved for the first segment. Its
                                                           BD_ENET_TX_READY bit is not set yet and txq->cur_tx
                                                           already points to a bdp beyond this one, so we treat it as a
                                                           completed bdp and try to reclaim it.
          update the bdp for the first segment
          update txq->cur_tx
      
      So we shouldn't update txq->cur_tx until all the updates to the
      bdps used for fragments have been performed. Also add the corresponding
      memory barriers to guarantee that the updates to the bdps, dirty_tx and
      cur_tx are performed in the proper order.
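
A minimal sketch of the ordering described above, written as userspace C11 rather than driver code. The ring layout, `struct bd`, `BD_READY`, `submit_frags()` and `slot_is_reclaimable()` are hypothetical stand-ins, not the fec driver's definitions. Every descriptor is filled in first, and only then is `cur_tx` published with a release store, so a reclaim path that loads it with acquire semantics cannot see a half-written descriptor behind it.

```c
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 8u
#define BD_READY  0x8000u   /* hypothetical "owned by hardware" bit */

struct bd {
    uint16_t status;
    uint16_t len;
};

struct txq {
    struct bd ring[RING_SIZE];
    atomic_uint cur_tx;     /* next free slot, written only by the xmit path */
};

/* Queue nfrags (>= 1) buffers starting at cur_tx; returns the first slot used. */
static unsigned submit_frags(struct txq *q, const uint16_t *lens, unsigned nfrags)
{
    unsigned first = atomic_load_explicit(&q->cur_tx, memory_order_relaxed);
    unsigned i;

    /* Fill the trailing fragments first. */
    for (i = nfrags; i-- > 1; ) {
        struct bd *bdp = &q->ring[(first + i) % RING_SIZE];
        bdp->len = lens[i];
        bdp->status = BD_READY;
    }

    /* Then the descriptor reserved for the first segment. */
    q->ring[first % RING_SIZE].len = lens[0];
    q->ring[first % RING_SIZE].status = BD_READY;

    /* Only now make the new cur_tx visible to the reclaim path. */
    atomic_store_explicit(&q->cur_tx, (first + nfrags) % RING_SIZE,
                          memory_order_release);
    return first;
}

/* Reclaim side: a slot is completed only if it is behind cur_tx and not READY. */
static int slot_is_reclaimable(struct txq *q, unsigned slot)
{
    unsigned cur = atomic_load_explicit(&q->cur_tx, memory_order_acquire);

    if (slot == cur)
        return 0;           /* nothing has been queued at this slot */
    return (q->ring[slot].status & BD_READY) == 0;
}
```

Because `cur_tx` moves only after the last descriptor write, the window in which the reclaim path could mistake the reserved first descriptor for a completed one does not exist in this sketch.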
      Signed-off-by: Kevin Hao <haokexin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mlxsw-fixes' · 2cf1b5ce
      David S. Miller authored
      Jiri Pirko says:
      
      ====================
      mlxsw: Couple of fixes/adjustments
      
      Ido Schimmel (5):
        mlxsw: Call free_netdev when removing port
        mlxsw: Make system port to local port mapping explicit
        mlxsw: Simplify mlxsw_sx_port_xmit function
        mlxsw: Use correct skb length when dumping payload
        mlxsw: Fix use-after-free bug in mlxsw_sx_port_xmit
      
      Jiri Pirko (2):
        mlxsw: Make pci module dependent on HAS_DMA and HAS_IOMEM
        mlxsw: Strip FCS from incoming packets
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: Fix use-after-free bug in mlxsw_sx_port_xmit · e577516b
      Ido Schimmel authored
      Store the length of the skb before transmitting it and use it for stats
      instead of skb->len, since skb might have been freed already.
      
      This issue was discovered using the Kernel Address sanitizer (KASan).
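
A minimal userspace sketch of the pattern this fix applies, under the assumption that the transmit routine takes ownership of the buffer and may free it. `struct pkt`, `hw_transmit()` and `port_xmit()` are hypothetical names for illustration, not the mlxsw API.

```c
#include <stddef.h>
#include <stdlib.h>

struct pkt {
    size_t len;
    unsigned char *data;
};

struct port_stats {
    unsigned long tx_packets;
    unsigned long tx_bytes;
};

static struct pkt *pkt_alloc(size_t len)
{
    struct pkt *p = malloc(sizeof(*p));

    if (!p)
        return NULL;
    p->data = calloc(len, 1);
    if (!p->data) {
        free(p);
        return NULL;
    }
    p->len = len;
    return p;
}

/* Hypothetical transmit routine: consumes the packet and frees it. */
static int hw_transmit(struct pkt *p)
{
    free(p->data);
    free(p);
    return 0;
}

static int port_xmit(struct pkt *p, struct port_stats *stats)
{
    size_t len = p->len;        /* save the length before handing p over */
    int err = hw_transmit(p);   /* p must not be dereferenced after this */

    if (!err) {
        stats->tx_packets++;
        stats->tx_bytes += len; /* uses the saved copy, not p->len */
    }
    return err;
}
```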
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: Use correct skb length when dumping payload · 3bfcd347
      Ido Schimmel authored
      Do not use the length of the transmitted skb (which was freed), but
      that of the response skb.
      
      This issue was discovered using the Kernel Address sanitizer (KASan).
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: Simplify mlxsw_sx_port_xmit function · d003462a
      Ido Schimmel authored
      Previously we only checked if the transmission queue is not full in the
      middle of the xmit function. This led to complex logic due to the fact
      that sometimes we need to reallocate the headroom for our Tx header.
      
      Allow the switch driver to know if the transmission queue is not full
      before sending the packet and remove this complex logic.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: Strip FCS from incoming packets · 7b7b9cff
      Jiri Pirko authored
      FCS of incoming packets is already checked by HW. Just strip it out.
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: Make pci module dependent on HAS_DMA and HAS_IOMEM · 74ed207e
      Jiri Pirko authored
      This resolves compile errors on um-allyesconfig.
      
      Note that there are many other drivers which have the same issue.
      Reported-by: kbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: Make system port to local port mapping explicit · e61011b5
      Ido Schimmel authored
      System ports are unique identifiers in a multi-ASIC environment that
      represent all the available ports in the system. Local ports on the
      other hand, are unique only within the local ASIC.
      
      Since system port to local port mapping is not part of the HW-SW
      contract and since only single-ASIC configurations are currently
      supported, set an explicit 1:1 mapping by configuring the Switch System
      Port Record (SSPR) register.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: Call free_netdev when removing port · 26a80f6e
      Ido Schimmel authored
      When removing a port's netdevice we should also free the memory
      allocated by alloc_etherdev(). Do this by calling free_netdev() at the
      end of the teardown sequence.
      Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ethernet: Fix double word "the the" in eth.c · ecea4991
      Masanari Iida authored
      This patch fixes the doubled word "the the" in
      Documentation/DocBook/networking/API-eth-get-headlen.html
      Documentation/DocBook/networking/netdev.html
      Documentation/DocBook/networking.xml
      
      These files are generated from comments in the source,
      so the fix has to be made in the comment in net/ethernet/eth.c.
      Signed-off-by: Masanari Iida <standby24x7@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: add RealTek RTL8211DN phy id · 0024f892
      Shaohui Xie authored
      RTL8211DN is compatible with RTL8211E.
      Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mpls: Enforce payload type of traffic sent using explicit NULL · 118d5234
      Robert Shearman authored
      RFC 4182 s2 states that if an IPv4 Explicit NULL label is the only
      label on the stack, then after popping the resulting packet must be
      treated as an IPv4 packet and forwarded based on the IPv4 header. The
      same is true for IPv6 Explicit NULL with an IPv6 packet following.
      
      Therefore, when installing the IPv4/IPv6 Explicit NULL label routes,
      add an attribute that specifies the expected payload type for use at
      forwarding time for determining the type of the encapsulated packet
      instead of inspecting the first nibble of the packet.
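
A minimal sketch of the idea, assuming a route can carry an expected payload type. The enum, struct and function names are hypothetical, not the kernel's. The label values 0 (IPv4 Explicit NULL) and 2 (IPv6 Explicit NULL) are the reserved MPLS label values; routes without a recorded type fall back to inspecting the first nibble.

```c
#include <stdint.h>

enum payload_type { PT_UNSPEC, PT_IPV4, PT_IPV6 };

#define LABEL_IPV4_EXPLICIT_NULL 0u
#define LABEL_IPV6_EXPLICIT_NULL 2u

struct route {
    enum payload_type payload;  /* set when the route is installed */
};

/* Payload type to record when installing a route for a reserved label. */
static enum payload_type payload_for_label(uint32_t label)
{
    switch (label) {
    case LABEL_IPV4_EXPLICIT_NULL:
        return PT_IPV4;
    case LABEL_IPV6_EXPLICIT_NULL:
        return PT_IPV6;
    default:
        return PT_UNSPEC;
    }
}

/* At forwarding time, trust the route attribute before guessing. */
static enum payload_type classify(const struct route *rt, const uint8_t *pkt)
{
    if (rt->payload != PT_UNSPEC)
        return rt->payload;

    switch (pkt[0] >> 4) {      /* fallback: IP version nibble */
    case 4:
        return PT_IPV4;
    case 6:
        return PT_IPV6;
    default:
        return PT_UNSPEC;
    }
}
```

With the attribute set, a packet arriving on the IPv4 Explicit NULL label is treated as IPv4 even if its first nibble says otherwise.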
      Signed-off-by: Robert Shearman <rshearma@brocade.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'bpf-perf' · d74a790d
      David S. Miller authored
      Kaixu Xia says:
      
      ====================
      bpf: Introduce the new ability of eBPF programs to access hardware PMU counter
      
      This patchset is based on net-next:
       git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git
      commit 9dc20a64.
      
      Previous patch v6 url:
      https://lkml.org/lkml/2015/8/4/188
      
      changes in V7:
       - rebase the whole patch set to net-next tree(9dc20a64);
       - split out the core perf APIs into Patch 1/5;
       - change the return value of function perf_event_attrs()
         from struct perf_event * to const struct perf_event * in
         Patch 1/5;
       - rename the function perf_event_read_internal() to
         perf_event_read_local() and rewrite it in Patch 1/5;
       - rename the function check_func_limit() to
         check_map_func_compatibility() and remove the unnecessary passing
         of a pointer to a pointer in Patch 4/5;
      
      changes in V6:
       - make the Patch 1/4 commit message more meaningful and readable;
       - remove the unnecessary comment in Patch 2/4 and make it clean;
       - declare the function perf_event_release_kernel() in
         include/linux/perf_event.h to fix the build error when
         CONFIG_PERF_EVENTS isn't configured in Patch 2/4;
       - add function perf_event_attrs() to get the struct perf_event_attr
         in Patch 2/4;
       - move the related code from kernel/trace/bpf_trace.c to
         kernel/events/core.c and add function perf_event_read_internal() to
         avoid poking inside of the event outside of perf code in Patch 3/4;
       - generalize the func & map match-pair with an array in Patch 3/4;
      
      changes in V5:
       - move struct fd_array_map_ops* fd_ops to bpf_map;
       - move array perf event decrement refcnt function to
         map_free;
       - fix the NULL ptr of perf_event_get();
       - move bpf_perf_event_read() to kernel/bpf/bpf_trace.c;
       - get rid of the remaining struct bpf_prog;
       - move the unnecessary cast on void *;
      
      changes in V4:
       - make the bpf_prog_array_map more generic;
       - fix the bug of event refcnt leak;
       - use more useful errno in bpf_perf_event_read();
      
      changes in V3:
       - collapse V2 patches 1-3 into one;
       - drop the function map->ops->map_traverse_elem() and release
         the struct perf_event in map_free;
       - only allow to access bpf_perf_event_read() from programs;
       - update the perf_event_array_map elem via xchg();
       - pass index directly to bpf_perf_event_read() instead of
         MAP_KEY;
      
      changes in V2:
       - put atomic_long_inc_not_zero() between fdget() and fdput();
       - limit the event type to PERF_TYPE_RAW and PERF_TYPE_HARDWARE;
       - Only read the event counter on current CPU or on current
         process;
       - add new map type BPF_MAP_TYPE_PERF_EVENT_ARRAY to store the
         pointer to the struct perf_event;
       - according to the perf_event_map_fd and key, the function
         bpf_perf_event_read() can get the Hardware PMU counter value;
      
      Patch 5/5 is a simple example that shows how to use this new eBPF
      program ability. The PMU counter data can be read from
      /sys/kernel/debug/tracing/trace (or trace_pipe). The output below is
      the cycles PMU value sampled on 'kprobe/sys_write':
      
        $ cat /sys/kernel/debug/tracing/trace_pipe
        $ ./tracex6
             ...
             syslog-ng-548   [000] d..1    76.905673: : CPU-0   681765271
             syslog-ng-548   [000] d..1    76.905690: : CPU-0   681787855
             syslog-ng-548   [000] d..1    76.905707: : CPU-0   681810504
             syslog-ng-548   [000] d..1    76.905725: : CPU-0   681834771
             syslog-ng-548   [000] d..1    76.905745: : CPU-0   681859519
             syslog-ng-548   [000] d..1    76.905766: : CPU-0   681890419
             syslog-ng-548   [000] d..1    76.905783: : CPU-0   681914045
             syslog-ng-548   [000] d..1    76.905800: : CPU-0   681935950
             syslog-ng-548   [000] d..1    76.905816: : CPU-0   681958299
                    ls-690   [005] d..1    82.241308: : CPU-5   3138451
                    sh-691   [004] d..1    82.244570: : CPU-4   7324988
                 <...>-699   [007] d..1    99.961387: : CPU-7   3194027
                 <...>-695   [003] d..1    99.961474: : CPU-3   288901
                 <...>-695   [003] d..1    99.961541: : CPU-3   383145
                 <...>-695   [003] d..1    99.961591: : CPU-3   450365
                 <...>-695   [003] d..1    99.961639: : CPU-3   515751
                 <...>-695   [003] d..1    99.961686: : CPU-3   579047
             ...
      
      The patches in detail:
      
      Patch 1/5 adds the core perf APIs perf_event_attrs(),
      perf_event_get() and perf_event_read_local(), which are needed when
      accessing event counters in eBPF programs;
      
      Patch 2/5 rewrites part of the bpf_prog_array map code and makes it
      more generic;
      
      Patch 3/5 introduces a new bpf map type. This map only stores the
      pointer to struct perf_event;
      
      Patch 4/5 implements the function bpf_perf_event_read(), which gets the
      selected hardware PMU counter;
      
      Patch 5/5 gives a simple example.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • samples/bpf: example of get selected PMU counter value · 47efb302
      Kaixu Xia authored
      This is a simple example and shows how to use the new ability
      to get the selected Hardware PMU counter value.
      Signed-off-by: Kaixu Xia <xiakaixu@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: Implement function bpf_perf_event_read() that get the selected hardware PMU conuter · 35578d79
      Kaixu Xia authored
      According to the perf_event_map_fd and index, the function
      bpf_perf_event_read() can convert the corresponding map
      value to the pointer to struct perf_event and return the
      Hardware PMU counter value.
      Signed-off-by: Kaixu Xia <xiakaixu@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: Add new bpf map type to store the pointer to struct perf_event · ea317b26
      Kaixu Xia authored
      Introduce a new bpf map type 'BPF_MAP_TYPE_PERF_EVENT_ARRAY'.
      This map only stores the pointer to struct perf_event. The
      user space event FDs from perf_event_open() syscall are converted
      to the pointer to struct perf_event and stored in map.
      Signed-off-by: Kaixu Xia <xiakaixu@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: Make the bpf_prog_array_map more generic · 2a36f0b9
      Wang Nan authored
      All the map backends are of generic nature. In order to avoid
      adding much special code into the eBPF core, rewrite part of
      the bpf_prog_array map code and make it more generic. So the
      new perf_event_array map type can reuse most of code with
      bpf_prog_array map and add fewer lines of special code.
      Signed-off-by: Wang Nan <wangnan0@huawei.com>
      Signed-off-by: Kaixu Xia <xiakaixu@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • perf: add the necessary core perf APIs when accessing events counters in eBPF programs · ffe8690c
      Kaixu Xia authored
      This patch adds three core perf APIs:
       - perf_event_attrs(): export the struct perf_event_attr from struct
         perf_event;
       - perf_event_get(): get the struct perf_event from the given fd;
       - perf_event_read_local(): read the events counters active on the
         current CPU;
      These APIs are needed when accessing events counters in eBPF programs.
      
      The API perf_event_read_local() comes from Peter, so I have added the
      corresponding SOB.
      Signed-off-by: Kaixu Xia <xiakaixu@huawei.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mv88e6xxx-switchdev-fdb' · f1d5ca43
      David S. Miller authored
      Vivien Didelot says:
      
      ====================
      net: dsa: mv88e6xxx: support switchdev FDB objects
      
      This patchset refactors the DSA and mv88e6xxx code to use the switchdev FDB
      objects.
      
      The first two patches add minor but necessary changes to switchdev, the third
      one implements the switchdev glue in DSA for FDB routines, and the remaining
      ones refactor the FDB access functions in the mv88e6xxx code.
      
      Below is a usage example (ports 0-2 belong to br0, ports 3-4 belong to br1):
      
          # bridge fdb add 3c:97:0e:11:30:6e dev swp2
          # bridge fdb add 3c:97:0e:11:40:78 dev swp3
          # bridge fdb add 3c:97:0e:11:50:86 dev swp4
          # bridge fdb del 3c:97:0e:11:40:78 dev swp3
          # bridge fdb
          01:00:5e:00:00:01 dev eth0 self permanent
          01:00:5e:00:00:01 dev eth1 self permanent
          00:50:d2:10:78:15 dev swp0 master br0 permanent
          3c:97:0e:11:30:6e dev swp2 self static
          00:50:d2:10:78:15 dev swp3 master br1 permanent
          3c:97:0e:11:50:86 dev swp4 self static
          # cat /sys/kernel/debug/dsa0/atu
          # DB   T/P  Vec State Addr
          # 001  Port 004   e   3c:97:0e:11:30:6e
          # 004  Port 010   e   3c:97:0e:11:50:86
      
      For the 88E6xxx switches, FIDs 1 to num_ports will be reserved for non-bridged
      ports and bridge groups, and the remaining will be later used by VLANs.
      
      This change is necessary to welcome the support for hardware VLANs (which will
      follow soon).
      
      Changes in v2:
      
       - remove ndo_bridge_{get,set,del}link from switchdev/DSA glue code
      
       - use ether_addr_copy instead of memcpy for MAC addresses
      
       - constify MAC address in port_fdb_{add,del}
      
       - split the mv88e6xxx code refactoring into several patches
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mv88e6xxx: rework FDB add/del operations · 87820510
      Vivien Didelot authored
      Add a low level function for the ATU Load operation, and provide FDB add
      and delete wrapper functions.
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mv88e6xxx: rework FDB getnext operation · 6630e236
      Vivien Didelot authored
      This commit adds a low level _mv88e6xxx_atu_getnext function and helpers
      to rewrite the mv88e6xxx_port_fdb_getnext operation.
      
      A mv88e6xxx_atu_entry structure is added for convenient access to the
      hardware, and GLOBAL_ATU_FID is defined instead of the raw 0x01 value.
      
      The previous implementation did not handle a possible trunk mapping.
      If the related bit is set, then the ATU data register would contain the
      trunk ID, and not the port vector.
      
      Check this in the FDB getnext operation and do not handle it (yet).
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mv88e6xxx: rename ATU MAC accessors · 395059fb
      Vivien Didelot authored
      Rename the __mv88e6xxx_{read,write}_addr functions to more explicit
      _mv88e6xxx_atu_mac_{read,write} functions, which also respect the single
      underscore convention used in the file (meaning SMI lock must be held).
      
      In the meantime, define their MAC address parameters as an array of
      ETH_ALEN bytes instead of a char pointer.
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mv88e6xxx: extend fid mask · 368b1d9c
      Vivien Didelot authored
      The driver currently manages one FID per port (or bridge group), with a
      mask of DSA_MAX_PORTS bits, where 0 means that the FID is in use.
      
      The Marvell 88E6xxx switches support up to 4094 FIDs (from 1 to 0xfff;
      FID 0 means that multiple address databases are not being used).
      
      This patch replaces the fid_mask with an fid_bitmap of 4096 bits.
      
      From now on, FIDs 1 to num_ports are reserved for non-bridged ports and
      bridge groups (a bridge group gets the FID of its first member). The
      remaining bits will be reserved for VLAN entries.
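
A minimal userspace sketch of a 4096-bit FID bitmap with the reservation scheme described above. All names are hypothetical, and the sketch assumes a set bit means "in use", which is not necessarily the driver's convention.

```c
#include <limits.h>
#include <stdbool.h>

#define MAX_FIDS      4096u
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define FID_WORDS     ((MAX_FIDS + BITS_PER_WORD - 1) / BITS_PER_WORD)

struct fid_table {
    unsigned long bitmap[FID_WORDS];    /* bit set = FID in use (assumption) */
};

static void fid_set(struct fid_table *t, unsigned fid)
{
    t->bitmap[fid / BITS_PER_WORD] |= 1UL << (fid % BITS_PER_WORD);
}

static bool fid_test(const struct fid_table *t, unsigned fid)
{
    return (t->bitmap[fid / BITS_PER_WORD] >> (fid % BITS_PER_WORD)) & 1UL;
}

/* Reserve FID 0 (never handed out) and FIDs 1..num_ports for ports/bridges. */
static void fid_reserve_ports(struct fid_table *t, unsigned num_ports)
{
    unsigned fid;

    for (fid = 0; fid <= num_ports; fid++)
        fid_set(t, fid);
}

/* Hand out the first free FID above the per-port range, or 0 if none is left. */
static unsigned fid_alloc(struct fid_table *t, unsigned num_ports)
{
    unsigned fid;

    for (fid = num_ports + 1; fid < MAX_FIDS; fid++) {
        if (!fid_test(t, fid)) {
            fid_set(t, fid);
            return fid;
        }
    }
    return 0;
}
```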
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: add support for switchdev FDB objects · 55045ddd
      Vivien Didelot authored
      Remove the fdb_{add,del,getnext} function pointer in favor of new
      port_fdb_{add,del,getnext}.
      
      Implement the switchdev_port_obj_{add,del,dump} functions in DSA to
      support the SWITCHDEV_OBJ_PORT_FDB objects.
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: switchdev: support static FDB addresses · 89024826
      Vivien Didelot authored
      This patch adds a is_static boolean to the switchdev_obj_fdb structure,
      in order to set the ndm_state to either NUD_NOARP or NUD_REACHABLE.
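
A minimal sketch of the mapping this describes, assuming static entries map to NUD_NOARP and all others to NUD_REACHABLE. The constants carry the values from include/uapi/linux/neighbour.h; `fdb_ndm_state()` is a hypothetical helper, not a switchdev function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Neighbour states, values as in include/uapi/linux/neighbour.h. */
#define NUD_REACHABLE 0x02
#define NUD_NOARP     0x40

/* Pick the ndm_state reported for an FDB entry (mapping is an assumption). */
static uint16_t fdb_ndm_state(bool is_static)
{
    return is_static ? NUD_NOARP : NUD_REACHABLE;
}
```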
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: switchdev: change fdb addr for a byte array · 1525c386
      Vivien Didelot authored
      The address in the switchdev_obj_fdb structure is currently represented
      as a pointer. Replacing it with a 6-byte array allows switchdev to carry
      addresses directly read from hardware registers, not stored by the
      switch chip driver (as in Rocker).
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net:wimax: Fix doucble word "the the" in networking.xml · 4933d85c
      Masanari Iida authored
      This patch fixes the doubled word "the the"
      in Documentation/DocBook/networking.xml and
      Documentation/DocBook/networking/API-Wimax-report-rfkill-sw.html.
      
      These files are generated from comments in the source, so I had to
      fix the typo in net/wimax/io-rfkill.c
      Signed-off-by: Masanari Iida <standby24x7@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 07 Aug, 2015 13 commits
    • net: Fix race condition in store_rps_map · 10e4ea75
      Tom Herbert authored
      There is a race condition in store_rps_map that allows jump label
      count in rps_needed to go below zero. This can happen when
      concurrently attempting to set and clear a map.
      
      Scenario:
      
      1. rps_needed count is zero
      2. New map is assigned by setting thread, but rps_needed count _not_ yet
         incremented (rps_needed count still zero)
      3. Map is cleared by second thread, old_map set to the map just assigned
      4. Second thread performs static_key_slow_dec, rps_needed count now goes
         negative
      
      Fix is to increment or decrement rps_needed under the spinlock.
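
A minimal userspace sketch of the fix: the map swap and the counter update happen inside the same critical section, so another thread cannot observe the new map before its increment has been counted. The spinlock built on `atomic_flag`, `needed_count` and `store_map()` are hypothetical stand-ins for the kernel's spinlock, the rps_needed static key and store_rps_map().

```c
#include <stdatomic.h>
#include <stddef.h>

static atomic_flag map_lock = ATOMIC_FLAG_INIT;
static const void *cur_map;     /* NULL when no map is installed */
static int needed_count;        /* stand-in for the rps_needed count */

static void map_lock_acquire(void)
{
    while (atomic_flag_test_and_set_explicit(&map_lock, memory_order_acquire))
        ;                       /* spin until the lock is free */
}

static void map_lock_release(void)
{
    atomic_flag_clear_explicit(&map_lock, memory_order_release);
}

/* Install new_map (or clear with NULL) and adjust the count under the lock. */
static void store_map(const void *new_map)
{
    const void *old_map;

    map_lock_acquire();
    old_map = cur_map;
    cur_map = new_map;
    if (new_map)
        needed_count++;         /* counted before anyone else can see new_map */
    if (old_map)
        needed_count--;
    map_lock_release();
}
```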
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • openvswitch: Make 100 percents packets sampled when sampling rate is 1. · e05176a3
      Wenyu Zhang authored
      When the sampling rate is 1, the sampling probability is UINT32_MAX. The packet
      should be sampled even if prandom32() generates the number UINT32_MAX.
      And no packet needs to be sampled when the probability is 0.
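
A minimal sketch of a sampling decision with these two boundary properties. `should_sample()` is a hypothetical helper and `rnd` stands for the 32-bit random draw; this illustrates the comparison, not the openvswitch code itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decide whether to sample, given a probability scaled to 0..UINT32_MAX. */
static bool should_sample(uint32_t probability, uint32_t rnd)
{
    if (probability == 0)
        return false;           /* probability 0: never sample */

    /* "<=" rather than "<", so UINT32_MAX accepts every possible draw. */
    return rnd <= probability;
}
```

With a strict `<` comparison, a draw of UINT32_MAX would be rejected even at probability UINT32_MAX, which is the case the commit message describes.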
      Signed-off-by: Wenyu Zhang <wenyuz@vmware.com>
      Acked-by: Pravin B Shelar <pshelar@nicira.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vxlan: combine VXLAN_FLOWBASED into VXLAN_COLLECT_METADATA · da8b43c0
      Alexei Starovoitov authored
      IFLA_VXLAN_FLOWBASED is useless without IFLA_VXLAN_COLLECT_METADATA,
      so combine them into single IFLA_VXLAN_COLLECT_METADATA flag.
      'flowbased' doesn't convey real meaning of the vxlan tunnel mode.
      This mode can be used by routing, tc+bpf and ovs.
      Only ovs is strictly flow based, so 'collect metadata' is a better
      name for this tunnel mode.
      Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
      Acked-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'rds-tcp-netns' · e03c5128
      David S. Miller authored
      Sowmini Varadhan says:
      
      ====================
      RDS-TCP: Network namespace support
      
      This patch series contains the set of changes to correctly set up
      the infra for PF_RDS sockets that use TCP as the transport in multiple
      network namespaces.
      
      Patch 1 in the series is the minimal set of changes to allow
      a single instance of RDS-TCP to run in any (i.e init_net or other) net
      namespace.  The changes in this patch set ensure that the execution of
      'modprobe [-r] rds_tcp' sets up the kernel TCP sockets
      relative to the current netns, so that RDS applications can send/recv
      packets from that netns, and the netns can later be deleted cleanly.
      
      Patch 2 of the series further allows multiple RDS-TCP instances,
      one per network namespace. The changes in this patch allow dynamic
      creation/tear-down of RDS-TCP client and server sockets  across all
      current and future namespaces.
      
      v2 changes from RFC sent out earlier:
          David Ahern comments in patch 1, net_device notifier in patch 2,
          patch 3 broken off and submitted separately.
      v3: Cong Wang review comments.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • RDS-TCP: Support multiple RDS-TCP listen endpoints, one per netns. · 467fa153
      Sowmini Varadhan authored
      Register pernet subsys init/stop functions that will set up
      and tear down per-net RDS-TCP listen endpoints. Unregister
      pernet subsys functions on 'modprobe -r' to clean up these
      endpoints.
      
      Enable keepalive on both accept and connect socket endpoints.
      The keepalive timer expiration will ensure that client socket
      endpoints will be removed as appropriate from the netns when
      an interface is removed from a namespace.
      
      Register a device notifier callback that will clean up all
      sockets (and thus avoid the need to wait for keepalive timeout)
      when the loopback device is unregistered from the netns indicating
      that the netns is getting deleted.
      Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • RDS-TCP: Make RDS-TCP work correctly when it is set up in a netns other than init_net · d5a8ac28
      Sowmini Varadhan authored
      Open the sockets calling sock_create_kern() with the correct struct net
      pointer, and use that struct net pointer when verifying the
      address passed to rds_bind().
      Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · 1ebd08a7
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      Intel Wired LAN Driver Updates 2015-08-05
      
      This series contains updates to i40e, i40evf and e1000e.
      
      Anjali adds support for x772 devices to i40e and i40evf.  With the added
      support, x772 supports offloading of the outer UDP transmit and receive
      checksum for tunneled packets.  Also supports evicting ATR filters in the
      hardware, so update the driver with this new feature set.
      
      Raanan provides several fixes for e1000e, first rectifies the Energy
      Efficient Ethernet in Sx code so that it only applies to parts that
      actually support EEE in Sx.  Fix whitespace and moved ICH8 related define
      to the proper context.  Fixed the ASPM locking which was reported by
      Bjorn Helgaas.  Fix a workaround implementation for systime which could
      experience a large non-linear increment of the systime value when
      checking for overflow.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net_dbg_ratelimited: turn into no-op when !DEBUG · d92cff89
      Jason A. Donenfeld authored
      The pr_debug family of functions turns into a no-op when -DDEBUG is not
      specified, opting instead to call "no_printk", which gets compiled to a
      no-op (but retains gcc's nice warnings about printf-style arguments).
      
      The problem with net_dbg_ratelimited is that it is defined to be a
      variant of net_ratelimited_function, which expands to essentially:
      
          if (net_ratelimit())
              pr_debug(fmt, ...);
      
      When DEBUG is not defined, then this becomes,
      
          if (net_ratelimit())
              ;
      
      This seems benign, except it isn't. Firstly, there's the obvious
      overhead of calling net_ratelimit needlessly, which does quite some book
      keeping for the rate limiting. Given that the pr_debug and
      net_dbg_ratelimited family of functions are sprinkled liberally through
      performance critical code, with developers assuming they'll be compiled
      out to a no-op most of the time, we certainly do not want this needless
      book keeping. Secondly, and most visibly, even though no debug message
      is printed when DEBUG is not defined, if there is a flood of
      invocations, dmesg winds up peppered with messages such as
      "net_ratelimit: 320 callbacks suppressed". This is because our
      aforementioned net_ratelimit() function actually prints this text in
      some circumstances. It's especially odd to see this when there isn't any
      other accompanying debug message.
      
      So, in sum, it doesn't make sense to have this function's current
      behavior, and instead it should match what every other debug family of
      functions in the kernel does with !DEBUG -- nothing.
      
      This patch replaces calls to net_dbg_ratelimited when !DEBUG with
      no_printk, in keeping with the idiom of all the other debug print helpers.
      
      Also, though not strictly necessary, it guards the call with an if (0)
      so that all evaluation of any arguments is sure to be compiled out.
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d92cff89
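The difference between the two !DEBUG shapes described in the net_dbg_ratelimited commit above can be sketched in userspace C. This is an illustration only, not the kernel's code: `demo_ratelimit`, `demo_printk`, `demo_dbg_old`, `demo_dbg_new` and the two `*_path` helpers are hypothetical stand-ins for `net_ratelimit`, `printk` and the macro before and after the patch.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for net_ratelimit(): counts how often it is consulted. */
static int ratelimit_calls;

static int demo_ratelimit(void)
{
    ratelimit_calls++;
    return 1; /* always allow; real rate limiting is out of scope here */
}

/* Hypothetical stand-in for printk(). */
static int demo_printk(const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vprintf(fmt, ap);
    va_end(ap);
    return n;
}

/* Pre-patch shape with DEBUG off: the limiter still runs, nothing is printed. */
#define demo_dbg_old(...) do { if (demo_ratelimit()) { } } while (0)

/* Post-patch shape with DEBUG off: a dead branch, so neither the limiter
 * nor the arguments are evaluated, but the call is still compiled. */
#define demo_dbg_new(...) do { if (0) demo_printk(__VA_ARGS__); } while (0)

/* Returns the limiter call count after one "old" debug statement. */
int demo_old_path(void)
{
    demo_dbg_old("dropped %d packets\n", 1);
    return ratelimit_calls;
}

/* Returns the limiter call count after one "new" debug statement.
 * The increment of *evaluated sits in dead code and never happens. */
int demo_new_path(int *evaluated)
{
    assert(evaluated != NULL);
    demo_dbg_new("dropped %d packets\n", ++*evaluated);
    return ratelimit_calls;
}
```

Calling `demo_old_path()` bumps the limiter counter even though nothing is printed, while `demo_new_path()` leaves both the counter and its argument untouched, which is the behavior the commit message argues for.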
    • Roopa Prabhu's avatar
      af_mpls: add null dev check in find_outdev · 3dcb615e
      Roopa Prabhu authored
      This patch adds a NULL dev check for the case where cfg->rc_via_table ==
      NEIGH_LINK_TABLE or dev_get_by_index() failed.
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3dcb615e
    • David S. Miller's avatar
      Merge branch 'test-bpf-next' · 3c621818
      David S. Miller authored
      Nicolas Schichan says:
      
      ====================
      test_bpf improvements
      
      Please find below the patch series with my latest changes to test_bpf.
      
      The first patch checks for unexpected NULL generated skbs before
      running the filter.
      
      The second patch adds the possibility for tests to generate fragmented
      skbs.
      
      The third patch tests LD_ABS and LD_IND on fragmented skbs.
      
      The fourth patch adds the possibility to restrict the tests being run
      by specifying the name/id/range of the test(s) to run via module
      parameters.
      
      The fifth patch tests LD_ABS and LD_IND on non fragmented skbs with
      various sizes and alignments.
      
      The sixth and final patch checks that the interpreter or JIT correctly
      resets A and X to 0.
      
      This series is against today's net-next tree.
      
      Changes in V2:
      
      * move declaration of 'ptr' in if() block in patch 2/6.
      
      * fix various typos in patch 4/6
      
      * rework default init of test_range array and cleanup exclude_test()
        return condition in patch 4/6.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3c621818
    • Nicolas Schichan's avatar
      test_bpf: add tests checking that JIT/interpreter sets A and X to 0. · 86bf1721
      Nicolas Schichan authored
      It is mandatory for the JIT or interpreter to reset the A and X
      registers to 0 before running the filter. Check that it is the case on
      various ALU and JMP instructions.
      Signed-off-by: Nicolas Schichan <nschichan@freebox.fr>
      Acked-by: Alexei Starovoitov <ast@plumgrid.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      86bf1721
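The property checked by the tests in the commit above (A and X start at 0 on every run) can be illustrated with a toy interpreter. This is a sketch under loud assumptions: the opcode set, `struct demo_insn` and `demo_run` are hypothetical and much smaller than classic BPF; they are not the kernel's interpreter or any JIT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily reduced instruction set for illustration. */
enum demo_op {
    DEMO_LD_IMM,   /* A = k     */
    DEMO_LDX_IMM,  /* X = k     */
    DEMO_ADD_X,    /* A += X    */
    DEMO_TAX,      /* X = A     */
    DEMO_RET_A     /* return A  */
};

struct demo_insn {
    enum demo_op op;
    uint32_t k;
};

/* Runs a program and returns its result (0 if it falls off the end). */
uint32_t demo_run(const struct demo_insn *prog, size_t len)
{
    /* Both registers are reset on every invocation, so a filter that
     * reads A or X before writing them observes 0. */
    uint32_t A = 0, X = 0;

    assert(prog != NULL || len == 0);

    for (size_t pc = 0; pc < len; pc++) {
        switch (prog[pc].op) {
        case DEMO_LD_IMM:  A = prog[pc].k; break;
        case DEMO_LDX_IMM: X = prog[pc].k; break;
        case DEMO_ADD_X:   A += X;         break;
        case DEMO_TAX:     X = A;          break;
        case DEMO_RET_A:   return A;
        }
    }
    return 0;
}
```

A program that uses A and X before loading them, such as `ADD_X; RET_A`, should return 0 under this model; that is the kind of check the commit adds for ALU and JMP instructions.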
    • Nicolas Schichan's avatar
      test_bpf: add more tests for LD_ABS and LD_IND. · 08fcb08f
      Nicolas Schichan authored
      This exercises the LD_ABS and LD_IND instructions for various sizes and
      alignments. This also checks that X, when used as an offset by a
      BPF_IND instruction that comes first in a filter, is correctly set to 0.
      Signed-off-by: Nicolas Schichan <nschichan@freebox.fr>
      Acked-by: Alexei Starovoitov <ast@plumgrid.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      08fcb08f
    • Nicolas Schichan's avatar
      test_bpf: add module parameters to filter the tests to run. · d2648d4e
      Nicolas Schichan authored
      When developing the interpreter or a particular JIT, it can be
      useful to restrict the test list to a specific test or a
      particular range of tests.
      
      This patch adds the following module parameters to the test_bpf module:
      
      * test_name=<string>: only the specified named test will be run.
      
      * test_id=<number>: only the test with the specified id will be run
        (see the output of test_bpf without parameters to get the test id).
      
      * test_range=<number>,<number>: only the tests whose IDs fall within
        the specified range are run (see the output of test_bpf without
        parameters to get the test ids).
      
      Any invalid range, test id or test name will result in -EINVAL being
      returned and no tests being run.
      Signed-off-by: Nicolas Schichan <nschichan@freebox.fr>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d2648d4e
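The range filtering described in the test_range commit above might look roughly like the following. This is a minimal sketch, not the code from test_bpf: `demo_check_range` and `demo_exclude_test` are hypothetical names, and the exact validation rules (inclusive bounds, IDs starting at 0) are assumptions. Only the -EINVAL result for an invalid range comes from the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Validates an inclusive [first, last] ID range against the number of
 * available tests. Returns 0 if usable, -EINVAL otherwise (assumed rules). */
int demo_check_range(const int range[2], int num_tests)
{
    assert(num_tests >= 0);

    if (range[0] < 0 || range[1] >= num_tests || range[0] > range[1])
        return -EINVAL;
    return 0;
}

/* Returns true when the test with this ID lies outside the range
 * and should therefore be skipped. */
bool demo_exclude_test(int test_id, const int range[2])
{
    return test_id < range[0] || test_id > range[1];
}
```

With a range of 2 to 5 and ten tests, the range validates and only IDs 2 through 5 run; a reversed range such as 5 to 2 is rejected up front, so no tests run at all.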