Commits · 6f3dfb0dc831953187fea8e3b798768611441321 · Kirill Smelkov / linux

12 Jul, 2018 40 commits

net/sched: skbedit: use per-cpu counters · 6f3dfb0d

Davide Caratti authored Jul 11, 2018

use per-CPU counters, instead of sharing a single set of stats with all
cores: this removes the need of spinlocks when stats are read/updated.
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6f3dfb0d

tcp: use monotonic timestamps for PAWS · cca9bab1

Arnd Bergmann authored Jul 11, 2018

Using get_seconds() for timestamps is deprecated since it can lead
to overflows on 32-bit systems. While the interface generally doesn't
overflow until year 2106, the specific implementation of the TCP PAWS
algorithm breaks in 2038 when the intermediate signed 32-bit timestamps
overflow.

A related problem is that the local timestamps in CLOCK_REALTIME form
lead to unexpected behavior when settimeofday is called to set the system
clock backwards or forwards by more than 24 days.

While the first problem could be solved by using an overflow-safe method
of comparing the timestamps, a nicer solution is to use a monotonic
clocksource with ktime_get_seconds() that simply doesn't overflow (at
least not until 136 years after boot) and that doesn't change during
settimeofday().

To make 32-bit and 64-bit architectures behave the same way here, and
also save a few bytes in the tcp_options_received structure, I'm changing
the type to a 32-bit integer, which is now safe on all architectures.

Finally, the ts_recent_stamp field also (confusingly) gets used to store
a jiffies value in tcp_synq_overflow()/tcp_synq_no_recent_overflow().
This is currently safe, but changing the type to 32-bit requires
some small changes there to keep it working.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cca9bab1

net/tls: Use aead_request_alloc/free for request alloc/free · d2bdd268

Vakul Garg authored Jul 11, 2018

Instead of kzalloc/free for aead_request allocation and free, use
functions aead_request_alloc(), aead_request_free(). It ensures that
any sensitive crypto material held in crypto transforms is securely
erased from memory.
Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
Acked-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d2bdd268

tc-testing: add geneve options in tunnel_key unit tests · cba54f9c

Pieter Jansen van Vuuren authored Jul 10, 2018

Extend tc tunnel_key action unit tests with geneve options. Tests
include testing single and multiple geneve options, as well as
testing geneve options that are expected to fail.
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Acked-by: Lucas Bates <lucasb@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cba54f9c

Merge branch 'be2net-small-structures-clean-up' · f1fbeada

David S. Miller authored Jul 12, 2018

Ivan Vecera says:

====================
be2net: small structures clean-up

The series:
- removes unused / unneccessary fields in several be2net structures
- re-order fields in some structures to eliminate holes, cache-lines
  crosses
- as result reduces size of main struct be_adapter by 4kB
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

f1fbeada

be2net: move rss_flags field in rss_info to ensure proper alignment · 28ace84b

Ivan Vecera authored Jul 10, 2018

The current position of .rss_flags field in struct rss_info causes
that fields .rsstable and .rssqueue (both 128 bytes long) crosses
cache-line boundaries. Moving it at the end properly align all fields.

Before patch:
struct rss_info {
        u64                        rss_flags;            /*     0     8 */
        u8                         rsstable[128];        /*     8   128 */
        /* --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- */
        u8                         rss_queue[128];       /*   136   128 */
        /* --- cacheline 4 boundary (256 bytes) was 8 bytes ago --- */
        u8                         rss_hkey[40];         /*   264    40 */
};

After patch:
struct rss_info {
        u8                         rsstable[128];        /*     0   128 */
        /* --- cacheline 2 boundary (128 bytes) --- */
        u8                         rss_queue[128];       /*   128   128 */
        /* --- cacheline 4 boundary (256 bytes) --- */
        u8                         rss_hkey[40];         /*   256    40 */
        u64                        rss_flags;            /*   296     8 */
};
Signed-off-by: Ivan Vecera <cera@cera.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>

28ace84b

be2net: re-order fields in be_error_recovert to avoid hole · 03d231a9

Ivan Vecera authored Jul 10, 2018

- Unionize two u8 fields where only one of them is used depending on NIC
chipset.
- Move recovery_supported field after that union

These changes eliminate 7-bytes hole in the struct and makes it smaller
by 8 bytes.
Signed-off-by: Ivan Vecera <cera@cera.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>

03d231a9

be2net: remove unused tx_jiffies field from be_tx_stats · f9520b86
Ivan Vecera authored Jul 10, 2018
```
Signed-off-by: Ivan Vecera <cera@cera.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
```
f9520b86

be2net: move txcp field in be_tx_obj to eliminate holes in the struct · 646d2c10

Ivan Vecera authored Jul 10, 2018

Before patch:
struct be_tx_obj {
        u32                        db_offset;            /*     0     4 */

        /* XXX 4 bytes hole, try to pack */

        struct be_queue_info       q;                    /*     8    56 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        struct be_queue_info       cq;                   /*    64    56 */
        struct be_tx_compl_info    txcp;                 /*   120     4 */

        /* XXX 4 bytes hole, try to pack */

        /* --- cacheline 2 boundary (128 bytes) --- */
        struct sk_buff *           sent_skb_list[2048];  /*   128 16384 */
        ...
}:

After patch:
struct be_tx_obj {
        u32                        db_offset;            /*     0     4 */
        struct be_tx_compl_info    txcp;                 /*     4     4 */
        struct be_queue_info       q;                    /*     8    56 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        struct be_queue_info       cq;                   /*    64    56 */
        struct sk_buff *           sent_skb_list[2048];  /*   120 16384 */
        ...
};
Signed-off-by: Ivan Vecera <cera@cera.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>

646d2c10

be2net: reorder fields in be_eq_obj structure · e9c74cd8

Ivan Vecera authored Jul 10, 2018

Re-order fields in struct be_eq_obj to ensure that .napi field begins
at start of cache-line. Also the .adapter field is moved to the first
cache-line next to .q field and 3 fields (idx,msi_idx,spurious_intr)
and the 4-bytes hole to 3rd cache-line.
Signed-off-by: Ivan Vecera <cera@cera.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>

e9c74cd8

be2net: remove desc field from be_eq_obj · d6d9704a

Ivan Vecera authored Jul 10, 2018

The event queue description (be_eq_obj.desc) field is used only to format
string for IRQ name and it is not really needed to hold this value.
Remove it and use local variable to format string for IRQ name.
Signed-off-by: Ivan Vecera <cera@cera.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>

d6d9704a

be2net: remove unused old custom busy-poll fields · c1328a27

Ivan Vecera authored Jul 10, 2018

The commit fb6113e6 ("be2net: get rid of custom busy poll code")
replaced custom busy-poll code by the generic one but left several
macros and fields in struct be_eq_obj that are currently unused.
Remove this stuff.

Fixes: fb6113e6 ("be2net: get rid of custom busy poll code")
Signed-off-by: Ivan Vecera <cera@cera.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>

c1328a27

be2net: remove unused old AIC info · a5d7fcb6

Ivan Vecera authored Jul 10, 2018

The commit 2632bafd ("be2net: fix adaptive interrupt coalescing")
introduced a separate struct be_aic_obj to hold AIC information but
unfortunately left the old stuff in be_eq_obj. So remove it.

Fixes: 2632bafd ("be2net: fix adaptive interrupt coalescing")
Signed-off-by: Ivan Vecera <cera@cera.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>

a5d7fcb6

net: ethernet: ti: cpts: break cycle once late ts is matched · d0c694fc

Ivan Khoronzhuk authored Jul 10, 2018

The late ts queue can contain a bunch of skbs while hi rate testing,
no need to check all of them if timestamp is already matched.
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

d0c694fc

selftests: forwarding: mirror_gre_nh: Unset rp_filter on host VRF · 42801298

Petr Machata authored Jul 10, 2018

The mirrored packets arrive at $h3 encapsulated in GRE/IPv4, with IP
address from 192.0.2.128/28 network. However the interface is configured
as a member of 192.0.2.160/28 and there's no route directing traffic
from the former network through that interface. Correspondingly, the RP
filter on the VRF rejects it.

Therefore turn off the VRF's RP filter.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

42801298

Merge branch 'mlxsw-ERSPAN-Take-LACP-state-into-consideration' · d90a5215

David S. Miller authored Jul 11, 2018

Ido Schimmel says:

====================
mlxsw: ERSPAN: Take LACP state into consideration

Petr says:

When offloading mirror-to-gretap, mlxsw needs to preroute the path that
the encapsulated packet will take. That path may include a LAG device
above a front panel port. So far, mlxsw resolved the path to the first
up front panel slave of the LAG interface, but that only reflects
administrative state of the port. It neglects to consider whether the
port actually has a carrier, and what the LACP state is. This patch set
aims to address these problems.

Patch #1 publishes team_port_get_rcu().

Then in patch #2, a new function is introduced,
mlxsw_sp_port_dev_check(). That returns, for a given netdevice that is a
slave of a LAG device, whether that device is "txable", i.e. whether the
LAG master would send traffic through it. Since there's no good place to
put LAG-wide helpers, introduce a new header include/net/lag.h.

Finally in patch #3, fix the slave selection logic to take into
consideration whether a given slave has a carrier and whether it is
txable.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

d90a5215

mlxsw: spectrum_span: Change LAG lower selection · b5de82f3

Petr Machata authored Jul 10, 2018

When offloading mirror-to-gretap, mlxsw needs to preroute the path that
the encapsulated packet will take. That path may include a LAG device
above a front panel port. So far, mlxsw resolved the path to the first
up front panel slave of the LAG interface, but that only reflects
administrative state of the port. It neglects to consider whether the
port actually has a carrier, and what the LACP state is.

So instead of checking upness of the device, check carrier state and
txability.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b5de82f3

net: Add lag.h, net_lag_port_dev_txable() · eeed992b

Petr Machata authored Jul 10, 2018

LAG devices (team or bond) recognize for each one of their slave devices
whether LAG traffic is going to be sent through that device. Bond calls
such devices "active", team calls them "txable". When this state
changes, a NETDEV_CHANGELOWERSTATE notification is distributed, together
with a netdev_notifier_changelowerstate_info structure that for LAG
devices includes a tx_enabled flag that refers to the new state. The
notification thus makes it possible to react to the changes in txability
in drivers.

However there's no way to query txability from the outside on demand.
That is problematic namely for mlxsw, which when resolving ERSPAN packet
path, may encounter a LAG device, and needs to determine which of the
slaves it should choose.

To that end, introduce a new function, net_lag_port_dev_txable(), which
determines whether a given slave device is "active" or
"txable" (depending on the flavor of the LAG device). That function then
dispatches to per-LAG-flavor helpers, bond_is_active_slave_dev() resp.
team_port_dev_txable().

Because there currently is no good place where net_lag_port_dev_txable()
should be added, introduce a new header file, lag.h, which should from
now on hold any logic common to both team and bond. (But keep
netif_is_lag_master() together with the rest of netif_is_*_master()
functions).
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

eeed992b

team: Publish team_port_get_rcu() · 3443b00e

Petr Machata authored Jul 10, 2018

A follow-up patch adds a new entry point, team_port_dev_txable(). Making
it an ordinary exported function would mean that any module that may
need the service in one of the supported configurations also
unconditionally needs to pull in the team module, whether or not the
user actually intends to create team interfaces.

To prevent that, team_port_dev_txable() is defined in if_team.h, and
therefore all dependencies of that function also need to be
publicly-visible.

Therefore move team_port_get_rcu() from team.c to if_team.h.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3443b00e

macvlan: Change status when lower device goes down · 80fd2d6c

Travis Brown authored Jul 10, 2018

Today macvlan ignores the notification when a lower device goes
administratively down, preventing the lack of connectivity from
bubbling up.

Processing NETDEV_DOWN results in a macvlan state of LOWERLAYERDOWN
with NO-CARRIER which should be easy to interpret in userspace.

2: lower: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
3: macvlan@lower: <NO-CARRIER,BROADCAST,MULTICAST,UP,M-DOWN> mtu 1500 qdisc noqueue state LOWERLAYERDOWN mode DEFAULT group default qlen 1000
Signed-off-by: Suresh Krishnan <skrishnan@arista.com>
Signed-off-by: Travis Brown <travisb@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

80fd2d6c

Merge branch 'tipc-make-link-protocol-more-resilient' · 0e97c4fb

David S. Miller authored Jul 11, 2018

Jon Maloy says:

====================
tipc: make link protocol more resilient

These two commits make the link ptotocol more resilient to
infrastructures with frequent packet duplication and long delays.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

0e97c4fb

tipc: check session number before accepting link protocol messages · 7ea817f4

Jon Maloy authored Jul 10, 2018

In some virtual environments we observe a significant higher number of
packet reordering and delays than we have been used to traditionally.

This makes it necessary with stricter checks on incoming link protocol
messages' session number, which until now only has been validated for
RESET messages.

Since the other two message types, ACTIVATE and STATE messages also
carry this number, it is easy to extend the validation check to those
messages.

We also introduce a flag indicating if a link has a valid peer session
number or not. This eliminates the mixing of 32- and 16-bit arithmethics
we are currently using to achieve this.
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7ea817f4

tipc: add sequence number check for link STATE messages · 9012de50

Jon Maloy authored Jul 10, 2018

Some switch infrastructures produce huge amounts of packet duplicates.
This becomes a problem if those messages are STATE/NACK protocol
messages, causing unnecessary retransmissions of already accepted
packets.

We now introduce a unique sequence number per STATE protocol message
so that duplicates can be identified and ignored. This will also be
useful when tracing such cases, and to avert replay attacks when TIPC
is encrypted.

For compatibility reasons we have to introduce a new capability flag
TIPC_LINK_PROTO_SEQNO to handle this new feature.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9012de50

Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · e32f55f3

David S. Miller authored Jul 11, 2018

Jeff Kirsher says:

====================
L2 Fwd Offload & 10GbE Intel Driver Updates 2018-07-09

This patch series is meant to allow support for the L2 forward offload, aka
MACVLAN offload without the need for using ndo_select_queue.

The existing solution currently requires that we use ndo_select_queue in
the transmit path if we want to associate specific Tx queues with a given
MACVLAN interface. In order to get away from this we need to repurpose the
tc_to_txq array and XPS pointer for the MACVLAN interface and use those as
a means of accessing the queues on the lower device. As a result we cannot
offload a device that is configured as multiqueue, however it doesn't
really make sense to configure a macvlan interfaced as being multiqueue
anyway since it doesn't really have a qdisc of its own in the first place.

The big changes in this set are:
  Allow lower device to update tc_to_txq and XPS map of offloaded MACVLAN
  Disable XPS for single queue devices
  Replace accel_priv with sb_dev in ndo_select_queue
  Add sb_dev parameter to fallback function for ndo_select_queue
  Consolidated ndo_select_queue functions that appeared to be duplicates
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

e32f55f3

tcp: expose both send and receive intervals for rate sample · 4929c942

Deepti Raghavan authored Jul 09, 2018

Congestion control algorithms, which access the rate sample
through the tcp_cong_control function, only have access to the maximum
of the send and receive interval, for cases where the acknowledgment
rate may be inaccurate due to ACK compression or decimation. Algorithms
may want to use send rates and receive rates as separate signals.
Signed-off-by: Deepti Raghavan <deeptir@mit.edu>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4929c942

net: sched: fix unprotected access to rcu cookie pointer · e0479b67

Vlad Buslov authored Jul 09, 2018

Fix action attribute size calculation function to take rcu read lock and
access act_cookie pointer with rcu dereference.

Fixes: eec94fdb ("net: sched: use rcu for action cookie update")
Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e0479b67

Merge branch 'cxgb4-move-stats-fetched-from-firmware-to-debugfs' · 2368957a

David S. Miller authored Jul 11, 2018

Rahul Lakkireddy says:

====================
cxgb4: move stats fetched from firmware to debugfs

Some stats are fetched via slow firmware mailbox, which can cause
packet drops under heavy load. So, this series removes these stats
from ethtool -S and expose them via debugfs.

Patch 1 removes stats fetched via firmware from ethtool -S.
Patch 2 exposes stats removed in Patch 1 via debugfs.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

2368957a

cxgb4: expose stats fetched from firmware via debugfs · 31e5f5c3

Rahul Lakkireddy authored Jul 09, 2018

Expose stats obtained from firmware via debugfs. These stats can't
be part of ethtool -S because the slow firmware mailbox can cause
packet drops under heavy load.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

31e5f5c3

cxgb4: remove stats fetched from firmware · b351b16d

Rahul Lakkireddy authored Jul 09, 2018

When running ethtool -S, some stats are requested from firmware.
Since getting these stats via firmware mailbox is slow, some packets
get dropped under heavy load while running ethtool -S.

So, remove these stats from ethtool -S.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b351b16d

net: mvpp2: explicitly include linux/interrupt.h · b32b0881

Antoine Tenart authored Jul 09, 2018

The Marvell PPv2 driver uses interrupts and tasklet but does not
explicitly include linux/interrupt.h, relying on implicit includes. This
one particularly is included by chance after a long unlogical chain of
inclusions. Fix this so we do not get future build breaks.
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b32b0881

cnic: use kvzalloc to allocate memory for csk_tbl · fb8ed3af

Jan Dakinevich authored Jul 09, 2018

Size of csk_tbl is about 58K, which means 3rd order page allocation.
kvzalloc provides a fallback if no high order memory is available.
Signed-off-by: Jan Dakinevich <jan.dakinevich@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fb8ed3af

wimax/i2400m: remove redundant variables ack_status, bcf and protocol · 3c546728

Colin Ian King authored Jul 09, 2018

Variables ack_status, bcf and protocol are being assigned but are
never used hence they are redundant and can be removed.

Also declare ack_type as unsigned int rather than unsigned to clean
up a checkpatch warning.

Cleans up clang warnings:
warning: variable 'ack_status' set but not used [-Wunused-but-set-variable]
warning: variable 'bcf' set but not used [-Wunused-but-set-variable]
warning: variable 'protocol' set but not used [-Wunused-but-set-variable]
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3c546728

net: sched: act_ife: fix memory leak in ife init · 01e866bf

Vlad Buslov authored Jul 09, 2018

Free params if tcf_idr_check_alloc() returned error.

Fixes: 0190c1d4 ("net: sched: atomically check-allocate action")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

01e866bf

cxgb4: specify IQTYPE in fw_iq_cmd · 8dce04f1

Arjun Vynipadath authored Jul 09, 2018

congestion argument passed to t4_sge_alloc_rxq() is used
to differentiate between nic/ofld queues.
Signed-off-by: Arjun Vynipadath <arjun@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8dce04f1

Merge branch 'net-ipv6-addr_gen_mode-fixes' · 57cd07fb

David S. Miller authored Jul 11, 2018

Sabrina Dubroca says:

====================
net/ipv6: addr_gen_mode fixes

This series fixes bugs in handling of the addr_gen_mode option, mainly
related to the sysctl. A minor netlink issue was also present in the
initial commit introducing the option on a per-netdevice basis.

v2: add patch 4, requested by David Ahern during review of v1
    add patch 5, missing documentation for the sysctl
    patches 1, 2, 3 are unchanged
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

57cd07fb

Documentation: ip-sysctl.txt: document addr_gen_mode · f168db5e

Sabrina Dubroca authored Jul 09, 2018

addr_gen_mode was introduced in without documentation, add it now.

Fixes: d35a00b8 ("net/ipv6: allow sysctl to change link-local address generation mode")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f168db5e

net/ipv6: propagate net.ipv6.conf.all.addr_gen_mode to devices · f24c5987

Sabrina Dubroca authored Jul 09, 2018

This aligns the addr_gen_mode sysctl with the expected behavior of the
"all" variant.

Fixes: d35a00b8 ("net/ipv6: allow sysctl to change link-local address generation mode")
Suggested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

f24c5987

net/ipv6: reserve room for IFLA_INET6_ADDR_GEN_MODE · bdd72f41

Sabrina Dubroca authored Jul 09, 2018

inet6_ifla6_size() is called to check how much space is needed by
inet6_fill_link_af() and inet6_fill_ifinfo(), both of which include
the IFLA_INET6_ADDR_GEN_MODE attribute. Reserve some room for it.

Fixes: bc91b0f0 ("ipv6: addrconf: implement address generation modes")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bdd72f41

net/ipv6: don't reinitialize ndev->cnf.addr_gen_mode on new inet6_dev · 70c30d76

Sabrina Dubroca authored Jul 09, 2018

The value has already been copied from this netns's devconf_dflt, it
shouldn't be reset to the global kernel default.

Fixes: d35a00b8 ("net/ipv6: allow sysctl to change link-local address generation mode")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

70c30d76

net/ipv6: fix addrconf_sysctl_addr_gen_mode · c6dbf7aa

Sabrina Dubroca authored Jul 09, 2018

addrconf_sysctl_addr_gen_mode() has multiple problems. First, it ignores
the errors returned by proc_dointvec().

addrconf_sysctl_addr_gen_mode() calls proc_dointvec() directly, which
writes the value to memory, and then checks if it's valid and may return
EINVAL. If a bad value is given, the value displayed when reading
net.ipv6.conf.foo.addr_gen_mode next time will be invalid. In case the
value provided by the user was valid, addrconf_dev_config() won't be
called since idev->cnf.addr_gen_mode has already been updated.

Fix this in the usual way we deal with values that need to be checked
after the proc_do*() helper has returned: define a local ctl_table and
storage, call proc_dointvec() on that temporary area, then check and
store.

addrconf_sysctl_addr_gen_mode() also writes the new value to the global
ipv6_devconf_dflt, when we're writing to some netns's default, so that
new netns will inherit the value that was set by the change occuring in
any netns. That doesn't make any sense, so let's drop this assignment.

Finally, since addr_gen_mode is a __u32, switch to proc_douintvec().

Fixes: d35a00b8 ("net/ipv6: allow sysctl to change link-local address generation mode")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c6dbf7aa