Commits · 84e14fe353de7624872e582887712079ba0b2d56 · Kirill Smelkov / linux

30 Sep, 2017 2 commits

net-ipv6: add support for sockopt(SOL_IPV6, IPV6_FREEBIND) · 84e14fe3

Maciej Żenczykowski authored Sep 26, 2017

So far we've been relying on sockopt(SOL_IP, IP_FREEBIND) being usable
even on IPv6 sockets.

However, it turns out it is perfectly reasonable to want to set freebind
on an AF_INET6 SOCK_RAW socket - but there is no way to set any SOL_IP
socket option on such a socket (they're all blindly errored out).

One use case for this is to allow spoofing src ip on a raw socket
via sendmsg cmsg.

Tested:
  built, and booted
  # python
  >>> import socket
  >>> SOL_IP = socket.SOL_IP
  >>> SOL_IPV6 = socket.IPPROTO_IPV6
  >>> IP_FREEBIND = 15
  >>> IPV6_FREEBIND = 78
  >>> s = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM, 0)
  >>> s.getsockopt(SOL_IP, IP_FREEBIND)
  0
  >>> s.getsockopt(SOL_IPV6, IPV6_FREEBIND)
  0
  >>> s.setsockopt(SOL_IPV6, IPV6_FREEBIND, 1)
  >>> s.getsockopt(SOL_IP, IP_FREEBIND)
  1
  >>> s.getsockopt(SOL_IPV6, IPV6_FREEBIND)
  1
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

84e14fe3

net: ipv6: send NS for DAD when link operationally up · 1f372c7b

Mike Manning authored Sep 25, 2017

The NS for DAD are sent on admin up as long as a valid qdisc is found.
A race condition exists by which these packets will not egress the
interface if the operational state of the lower device is not yet up.
The solution is to delay DAD until the link is operationally up
according to RFC2863. Rather than only doing this, follow the existing
code checks by deferring IPv6 device initialization altogether. The fix
allows DAD on devices like tunnels that are controlled by userspace
control plane. The fix has no impact on regular deployments, but means
that there is no IPv6 connectivity until the port has been opened in
the case of port-based network access control, which should be
desirable.
Signed-off-by: Mike Manning <mmanning@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1f372c7b

29 Sep, 2017 10 commits

net: ipv4: remove fib_info arg to fib_check_nh · fa8fefaa

David Ahern authored Sep 27, 2017

fib_check_nh does not use the fib_info arg; remove t.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fa8fefaa

net: ipv4: remove fib_weight · c7c3e591

David Ahern authored Sep 27, 2017

fib_weight in fib_info is set but not used. Remove it and the
helpers for setting it.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c7c3e591

Merge branch 'bpf-extend-info' · fadad670

David S. Miller authored Sep 29, 2017

Martin KaFai Lau says:

====================
bpf: Extend bpf_{prog,map}_info

This patch series adds more fields to bpf_prog_info and bpf_map_info.
Please see individual patch for details.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

fadad670

bpf: Test new fields in bpf_attr and bpf_{prog, map}_info · 3a8ad560

Martin KaFai Lau authored Sep 27, 2017

This patch tests newly added fields of the bpf_attr,
bpf_prog_info and bpf_map_info.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

3a8ad560

bpf: Swap the order of checking prog_info and map_info · 6e525d06

Martin KaFai Lau authored Sep 27, 2017

This patch swaps the checking order.  It now checks the map_info
first and then prog_info.  It is a prep work for adding
test to the newly added fields (the map_ids of prog_info field
in particular).
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

6e525d06

bpf: libbpf: Provide basic API support to specify BPF obj name · 88cda1c9

Martin KaFai Lau authored Sep 27, 2017

This patch extends the libbpf to provide API support to
allow specifying BPF object name.

In tools/lib/bpf/libbpf, the C symbol of the function
and the map is used.  Regarding section name, all maps are
under the same section named "maps".  Hence, section name
is not a good choice for map's name.  To be consistent with
map, bpf_prog also follows and uses its function symbol as
the prog's name.

This patch adds logic to collect function's symbols in libbpf.
There is existing codes to collect the map's symbols and no change
is needed.

The bpf_load_program_name() and bpf_map_create_name() are
added to take the name argument.  For the other bpf_map_create_xxx()
variants, a name argument is directly added to them.

In samples/bpf, bpf_load.c in particular, the symbol is also
used as the map's name and the map symbols has already been
collected in the existing code.  For bpf_prog, bpf_load.c does
not collect the function symbol name.  We can consider to collect
them later if there is a need to continue supporting the bpf_load.c.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

88cda1c9

bpf: Add map_name to bpf_map_info · ad5b177b

Martin KaFai Lau authored Sep 27, 2017

This patch allows userspace to specify a name for a map
during BPF_MAP_CREATE.

The map's name can later be exported to user space
via BPF_OBJ_GET_INFO_BY_FD.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

ad5b177b

bpf: Add name, load_time, uid and map_ids to bpf_prog_info · cb4d2b3f

Martin KaFai Lau authored Sep 27, 2017

The patch adds name and load_time to struct bpf_prog_aux.  They
are also exported to bpf_prog_info.

The bpf_prog's name is passed by userspace during BPF_PROG_LOAD.
The kernel only stores the first (BPF_PROG_NAME_LEN - 1) bytes
and the name stored in the kernel is always \0 terminated.

The kernel will reject name that contains characters other than
isalnum() and '_'.  It will also reject name that is not null
terminated.

The existing 'user->uid' of the bpf_prog_aux is also exported to
the bpf_prog_info as created_by_uid.

The existing 'used_maps' of the bpf_prog_aux is exported to
the newly added members 'nr_map_ids' and 'map_ids' of
the bpf_prog_info.  On the input, nr_map_ids tells how
big the userspace's map_ids buffer is.  On the output,
nr_map_ids tells the exact user_map_cnt and it will only
copy up to the userspace's map_ids buffer is allowed.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

cb4d2b3f

tcp: fix under-evaluated ssthresh in TCP Vegas · cf5d74b8

Hoang Tran authored Sep 27, 2017

With the commit 76174004 (tcp: do not slow start when cwnd equals
ssthresh), the comparison to the reduced cwnd in tcp_vegas_ssthresh() would
under-evaluate the ssthresh.
Signed-off-by: Hoang Tran <hoang.tran@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>

cf5d74b8

net: bridge: add per-port group_fwd_mask with less restrictions · 5af48b59

Nikolay Aleksandrov authored Sep 27, 2017

We need to be able to transparently forward most link-local frames via
tunnels (e.g. vxlan, qinq). Currently the bridge's group_fwd_mask has a
mask which restricts the forwarding of STP and LACP, but we need to be able
to forward these over tunnels and control that forwarding on a per-port
basis thus add a new per-port group_fwd_mask option which only disallows
mac pause frames to be forwarded (they're always dropped anyway).
The patch does not change the current default situation - all of the others
are still restricted unless configured for forwarding.
We have successfully tested this patch with LACP and STP forwarding over
VxLAN and qinq tunnels.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5af48b59

28 Sep, 2017 28 commits

Merge branch 'hns3-dcb' · de9c8a6a

David S. Miller authored Sep 28, 2017

Yunsheng Lin says:

====================
Add support for DCB feature in hns3 driver

The patchset contains some enhancement related to DCB before
adding support for DCB feature.

This patchset depends on the following patchset:
https://patchwork.ozlabs.org/cover/815646/
https://patchwork.ozlabs.org/cover/816145/

High Level Architecture:

                   [ lldpad ]
                       |
                       |
                       |
                 [ hns3_dcbnl ]
                       |
                       |
                       |
                 [ hclge_dcb ]
                   /      \
                /            \
             /                  \
     [ hclge_main ]        [ hclge_tm ]

Current patch-set support following functionality:
   Use of lldptool to configure the tc schedule mode, tc
   bandwidth(if schedule mode is ETS), prio_tc_map and
   PFC parameter.

V3: Drop mqprio support

V2: Fix for not defining variables in local loop.

V1: Initial Submit.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

de9c8a6a

net: hns3: Add DCB support when interacting with network stack · 9df8f79a

Yunsheng Lin authored Sep 27, 2017

When using lldptool to configure DCB parameter, hclge_dcb module
call the client_ops->setup_tc to tell network stack which queue
and priority is using for specific tc.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9df8f79a

net: hns3: Setting for fc_mode and dcb enable flag in TM module · 7979a223

Yunsheng Lin authored Sep 27, 2017

After the DCB feature is supported, fc_mode and dcb enable flag
must be set according to the DCB parameter.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7979a223

net: hns3: Add dcb netlink interface for the support of DCB feature · 986743db

Yunsheng Lin authored Sep 27, 2017

This patch add dcb netlink interface by calling the interface from
hclge_dcb module.

This patch also update Makefile in order to build hns3_dcbnl module.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

986743db

net: hns3: Add hclge_dcb module for the support of DCB feature · cacde272

Yunsheng Lin authored Sep 27, 2017

The hclge_dcb module calls the interface from hclge_main/tm
and provide interface for the dcb netlink interface.

This patch also update Makefiles required to build the DCB
supported code in HNS3 Ethernet driver and update the existing
Kconfig file in the hisilicon folder.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cacde272

net: hns3: Add some interface for the support of DCB feature · 77f255c1

Yunsheng Lin authored Sep 27, 2017

This patch add some interface and export some interface from
hclge_tm and hclgc_main to support the upcoming DCB feature.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

77f255c1

net: hns3: Add tc-based TM support for sriov enabled port · cc9bb43a

Yunsheng Lin authored Sep 27, 2017

When sriov is enabled and TM is in tc-based mode, vf's TM
parameters is not set in TM initialization process.
This patch add the tc_based TM support for sriov enabled
using the information in vport struct.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cc9bb43a

net: hns3: Add support for port shaper setting in TM module · 0a5677d3

Yunsheng Lin authored Sep 27, 2017

This patch add a tm_port_shaper cmd and set port shaper
to HCLGE_ETHER_MAX_RATE on TM initialization process.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0a5677d3

net: hns3: Add support for PFC setting in TM module · 9dc2145d

Yunsheng Lin authored Sep 27, 2017

This patch add a pfc_pause_en cmd, and use it to configure
PFC option according to fc_mode in hdev->tm_info.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9dc2145d

net: hns3: Add support for dynamically buffer reallocation · acf61ecd

Yunsheng Lin authored Sep 27, 2017

Current buffer allocation can only happen at init, when
doing buffer reallocation after init, care must be taken
care of memory which priv_buf points to.
This patch fixes it by using a dynamic allocated temporary
memory. Because we only do buffer reallocation at init or
when setting up the DCB parameter, and priv_buf is only
used at buffer allocation process, so it is ok to use a
dynamic allocated temporary memory.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

acf61ecd

net: hns3: Support for dynamically assigning tx buffer to TC · 9ffe79a9

Yunsheng Lin authored Sep 27, 2017

This patch add support of dynamically assigning tx buffer to
TC when the TC is enabled.
It will save buffer for rx direction to avoid packet loss.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9ffe79a9

arp: make arp_hdr_len() return unsigned int · 6ade97da

Alexey Dobriyan authored Sep 26, 2017

Negative ARP header length are not a thing.

Constify arguments while I'm at it.

Space savings:

	add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-3 (-3)
	function                        old     new   delta
	arpt_do_table                  1163    1160      -3
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6ade97da

net-next/hinic: Fix a case of Tx Queue is Stopped forever · bbdc9e68

Aviad Krawczyk authored Sep 27, 2017

Fix the following scenario:
1. tx_free_poll is running on cpu X
2. xmit function is running on cpu Y and fails to get sq wqe
3. tx_free_poll frees wqes on cpu X and checks the queue is not stopped
4. xmit function stops the queue after failed to get sq wqe
5. The queue is stopped forever
Signed-off-by: Aviad Krawczyk <aviad.krawczyk@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bbdc9e68

net-next/hinic: Set Rxq irq to specific cpu for NUMA · 352f58b0

Aviad Krawczyk authored Sep 27, 2017

Set Rxq irq to specific cpu for allocating and receiving the skb from
the same node.
Signed-off-by: Aviad Krawczyk <aviad.krawczyk@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

352f58b0

Merge branch 'bpf-verifier-disassembly-improvements' · 93771b01

David S. Miller authored Sep 28, 2017

Edward Cree says:

====================
bpf/verifier: disassembly improvements

Fix the output of print_bpf_insn() for ALU ops that don't look like
 compound assignment (i.e. BPF_END and BPF_NEG).

Sample output for a short test program:
0: (b4) (u32) r0 = (u32) 0
1: (dc) r0 = be32 r0
2: (84) r0 = (u32) -r0
3: (95) exit
processed 4 insns, stack depth 0
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

93771b01

bpf/verifier: improve disassembly of BPF_NEG instructions · 73c864b3

Edward Cree authored Sep 26, 2017

BPF_NEG takes only one operand, unlike the bulk of BPF_ALU[64] which are
 compound-assignments.  So give it its own format in print_bpf_insn().
Signed-off-by: Edward Cree <ecree@solarflare.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

73c864b3

bpf/verifier: improve disassembly of BPF_END instructions · 2b7c6ba9

Edward Cree authored Sep 26, 2017

print_bpf_insn() was treating all BPF_ALU[64] the same, but BPF_END has a
 different structure: it has a size in insn->imm (even if it's BPF_X) and
 uses the BPF_SRC (X or K) to indicate which endianness to use.  So it
 needs different code to print it.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

2b7c6ba9

cxgb4: make function ch_flower_stats_cb, fixes warning · 4d8806fd

Colin Ian King authored Sep 26, 2017

The function ch_flower_stats_cb is local to the source and does not need
to be in global scope, so make it static.

Cleans up sparse warnings:
symbol 'ch_flower_stats_cb' was not declared. Should it be static?
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4d8806fd

Merge branch 'rtnl-pushdown-prep' · f46d9632

David S. Miller authored Sep 28, 2017

Florian Westphal says:

====================
rtnetlink: preparation patches for further rtnl lock pushdown/removal

Patches split large rtnl_fill_ifinfo into smaller chunks
to better see which parts

1. require rtnl
2. do not require it at all
3. rely on rtnl locking now but could be converted

Changes since v3:

I dropped the 'ifalias' patch, I have a change to decouple ifalias and
rtnl mutex, I will send it once this series has been merged.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

f46d9632

rtnetlink: rtnl_have_link_slave_info doesn't need rtnl · 4c82a95e

Florian Westphal authored Sep 26, 2017

it can be switched to rcu.
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

4c82a95e

rtnetlink: add helpers to dump netnsid information · b1e66b9a

Florian Westphal authored Sep 26, 2017

Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

b1e66b9a

rtnetlink: add helpers to dump vf information · 250fc3df

Florian Westphal authored Sep 26, 2017

similar to earlier patches, split out more parts of this function to
better see what is happening and where we assume rtnl is locked.
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

250fc3df

rtnetlink: add helper to put master and link ifindexes · 79110a04

Florian Westphal authored Sep 26, 2017

rtnl_fill_ifinfo currently requires caller to hold the rtnl mutex.
Unfortunately the function is quite large which makes it harder to see
which spots require the lock, which spots assume it and which ones could
do without.

Add helpers to factor out the ifindex dumping, one can use rcu to avoid
rtnl dependency.
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

79110a04

selftests: rtnetlink.sh: add rudimentary vrf test · 61f26d92

Florian Westphal authored Sep 26, 2017

Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

61f26d92

net_sched: use idr to allocate u32 filter handles · e7614370

Cong Wang authored Sep 25, 2017

Instead of calling u32_lookup_ht() in a loop to find
a unused handle, just switch to idr API to allocate
new handles. u32 filters are special as the handle
could contain a hash table id and a key id, so we
need two IDR to allocate each of them.

Cc: Chris Mi <chrism@mellanox.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e7614370

net_sched: use idr to allocate basic filter handles · 1d8134fe

Cong Wang authored Sep 25, 2017

Instead of calling basic_get() in a loop to find
a unused handle, just switch to idr API to allocate
new handles.

Cc: Chris Mi <chrism@mellanox.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1d8134fe

net_sched: use idr to allocate bpf filter handles · 76cf546c

Cong Wang authored Sep 25, 2017

Instead of calling cls_bpf_get() in a loop to find
a unused handle, just switch to idr API to allocate
new handles.

Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Chris Mi <chrism@mellanox.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

76cf546c

inetpeer: speed up inetpeer_invalidate_tree() · 8f1975e3

Eric Dumazet authored Sep 25, 2017

As measured in my prior patch ("sch_netem: faster rb tree removal"),
rbtree_postorder_for_each_entry_safe() is nice looking but much slower
than using rb_next() directly, except when tree is small enough
to fit in CPU caches (then the cost is the same)

From: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8f1975e3