1. 01 Apr, 2018 20 commits
    • Merge branch 'tipc-slim-down-name-table' · 6851cf28
      David S. Miller authored
      Jon Maloy says:
      
      ====================
      tipc: slim down name table
      
      We clean up and improve the name binding table:
      
       - Replace the memory consuming 'sub_sequence/service range' array with
         an RB tree.
       - Introduce support for overlapping service sequences/ranges
      
       v2: #1: Fixed a missing initialization reported by David Miller
           #4: Obsoleted and replaced a few more macros to get a consistent
               terminology in the API.
           #5: Added new commit to fix a potential string overflow bug (it
               is still only in net-next) reported by Arnd Bergmann
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: avoid possible string overflow · 7494cfa6
      Jon Maloy authored
      gcc points out that the combined length of the fixed-length inputs to
      l->name is larger than the destination buffer size:
      
      net/tipc/link.c: In function 'tipc_link_create':
      net/tipc/link.c:465:26: error: '%s' directive writing up to 32 bytes
      into a region of size between 26 and 58 [-Werror=format-overflow=]
      sprintf(l->name, "%s:%s-%s:unknown", self_str, if_name, peer_str);
      
      net/tipc/link.c:465:2: note: 'sprintf' output 11 or more bytes
      (assuming 75) into a destination of size 60
      sprintf(l->name, "%s:%s-%s:unknown", self_str, if_name, peer_str);
      
      A detailed analysis reveals that the theoretical maximum length of
      a link name is:
      max self_str + 1 + max if_name + 1 + max peer_str + 1 + max if_name =
      16 + 1 + 15 + 1 + 16 + 1 + 15 = 65
      Since we also need space for a trailing zero we now set MAX_LINK_NAME
      to 68.
      
      Just to be on the safe side we also replace the sprintf() call with
      snprintf().
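The length arithmetic and the switch to snprintf() can be sketched in user space as follows. This is only an illustration under stated assumptions, not the code from net/tipc/link.c: the constants MAX_ADDR_STR and MAX_IF_NAME and the helper build_link_name() are hypothetical names chosen to mirror the lengths quoted in the message, and the peer interface name is passed explicitly instead of the literal "unknown".

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical limits mirroring the lengths quoted in the commit message. */
#define MAX_ADDR_STR   16   /* longest self_str / peer_str, without NUL */
#define MAX_IF_NAME    15   /* longest interface name, without NUL */
#define MAX_LINK_NAME  68   /* buffer size chosen by the commit */

/* Longest possible "self:if-peer:if" string, excluding the trailing NUL. */
enum {
	LINK_NAME_MAX_LEN = MAX_ADDR_STR + 1 + MAX_IF_NAME + 1 +
			    MAX_ADDR_STR + 1 + MAX_IF_NAME
};

/*
 * Format a link name into buf. Returns 0 on success, or -1 if the name
 * did not fit and was truncated. For size > 0 the buffer is always
 * NUL-terminated, which is the property sprintf() cannot give.
 */
static int build_link_name(char *buf, size_t size, const char *self_str,
			   const char *if_name, const char *peer_str,
			   const char *peer_if)
{
	int n = snprintf(buf, size, "%s:%s-%s:%s",
			 self_str, if_name, peer_str, peer_if);

	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

A caller would typically declare `char name[MAX_LINK_NAME];` and treat a -1 return as a reason to reject the link rather than continue with a truncated name.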
      
      Fixes: 25b0b9c4 ("tipc: handle collisions of 32-bit node address
      hash values")
      Reported-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: tipc: rename address types in user api · 7a74d39c
      Jon Maloy authored
      The three address type structs in the user API have names that in
      reality reflect the specific, non-Linux environment where they were
      originally created.
      
      We now give them more intuitive names, in accordance with how TIPC is
      described in the current documentation.
      
      struct tipc_portid   -> struct tipc_socket_addr
      struct tipc_name     -> struct tipc_service_addr
      struct tipc_name_seq -> struct tipc_service_range
      
      To avoid confusion, we also update some comments and macro names to
      match the new terminology.
      
      For compatibility, we add macros that map all old names to the new ones.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: permit overlapping service ranges in name table · 37922ea4
      Jon Maloy authored
      With the new RB tree structure for service ranges it becomes possible to
      solve an old problem: we can now allow overlapping service ranges in
      the table.
      
      When inserting a new service range into the tree, we use 'lower' as
      primary key, and when necessary 'upper' as secondary key.
      
      Since there may now be multiple service ranges matching an indicated
      'lower' value, we must also add the 'upper' value to the functions
      used for removing publications, so that the correct, corresponding
      range item can be found.
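The two-level key ordering can be sketched as below. This is a plain, unbalanced binary search tree written for illustration, not the kernel's rbtree API and not the TIPC implementation; struct range_node and the helper names are hypothetical. It shows why removal needs both 'lower' and 'upper': two overlapping ranges may share the same 'lower'.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical range node; kernel code would embed an rb_node instead. */
struct range_node {
	uint32_t lower;
	uint32_t upper;
	struct range_node *left, *right;
};

/* Order by 'lower' first; fall back to 'upper' only on a tie. */
static int range_cmp(uint32_t lower_a, uint32_t upper_a,
		     uint32_t lower_b, uint32_t upper_b)
{
	if (lower_a != lower_b)
		return lower_a < lower_b ? -1 : 1;
	if (upper_a != upper_b)
		return upper_a < upper_b ? -1 : 1;
	return 0;
}

/* Return the node for exactly [lower, upper], creating it if needed.
 * Returns NULL only if the allocation fails. */
static struct range_node *range_insert(struct range_node **root,
				       uint32_t lower, uint32_t upper)
{
	struct range_node **link = root;

	while (*link) {
		int c = range_cmp(lower, upper, (*link)->lower, (*link)->upper);

		if (c == 0)
			return *link;
		link = c < 0 ? &(*link)->left : &(*link)->right;
	}
	*link = calloc(1, sizeof(**link));
	if (*link) {
		(*link)->lower = lower;
		(*link)->upper = upper;
	}
	return *link;
}

/* Exact lookup; both keys are needed because 'lower' alone may be ambiguous. */
static struct range_node *range_find(struct range_node *root,
				     uint32_t lower, uint32_t upper)
{
	while (root) {
		int c = range_cmp(lower, upper, root->lower, root->upper);

		if (c == 0)
			return root;
		root = c < 0 ? root->left : root->right;
	}
	return NULL;
}
```

With this ordering, [100, 150] and [100, 200] coexist as distinct nodes, and a withdrawal for one of them cannot accidentally match the other.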
      
      These changes guarantee that a well-formed publication/withdrawal item
      from a peer node will never be rejected, and make it possible to
      eliminate the problematic backlog functionality we currently have for
      handling such cases.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: refactor name table translate function · f20889f7
      Jon Maloy authored
      The tipc_nametbl_translate() function is ugly and hard to
      follow. This can be improved somewhat by introducing a stack variable
      for holding the publication list to be used and re-ordering the
      if-clauses for selection of algorithm.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: replace name table service range array with rb tree · 218527fe
      Jon Maloy authored
      The current design of the binding table has an unnecessarily
      memory-consuming and complex data structure. It aggregates the service
      range items into an array, which is expanded by a factor of two every
      time it becomes too small to hold a new item. Furthermore, the arrays
      never shrink when the number of ranges diminishes.
      
      We now replace this array with an RB tree that is holding the range
      items as tree nodes, each range directly holding a list of bindings.
      
      This, along with a few name changes, improves readability and reduces
      the volume of the code, as well as reducing memory consumption and
      hopefully improving the cache hit rate.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'bridge-mtu' · 24197ee2
      David S. Miller authored
      Nikolay Aleksandrov says:
      
      ====================
      net: bridge: MTU handling changes
      
      As previously discussed, the recent changes break some setups and could
      lead to packet drops. Thus the first patch reverts the behaviour so that
      the bridge follows the minimum MTU, but also keeps the ability to set the
      MTU to the maximum (out of all ports) if vlan filtering is enabled.
      Patch 02 is the bigger change in behaviour. We've always had trouble when
      configuring bridges and their MTU, which is auto-tuned on port events
      (add/del/changemtu), which means config software needs to chase it and
      fix it after each such event. After patch 02 we allow the user to
      configure any MTU (ETH_MIN/MAX limited), but once that is done the bridge
      stops auto tuning and relies on the user to keep the MTU correct.
      This should be compatible with cases that don't touch the MTU (or set it
      to the same value), while allowing the user to configure the MTU and not
      worry about it changing afterwards.
      
      The patches are intentionally split like this, so that if they get accepted
      and there are any complaints patch 02 can be reverted.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: disable bridge MTU auto tuning if it was set manually · 804b854d
      Nikolay Aleksandrov authored
      As Roopa noted today the biggest source of problems when configuring
      bridge and ports is that the bridge MTU keeps changing automatically on
      port events (add/del/changemtu). That leads to inconsistent behaviour
      and network config software needs to chase the MTU and fix it on each
      such event. Let's improve on that situation and allow for the user to
      set any MTU within ETH_MIN/MAX limits, but once manually configured it
      is the user's responsibility to keep it correct afterwards.
      
      If the MTU isn't manually set, the behaviour reverts to the previous
      one and the bridge follows the minimum MTU.
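The policy described above can be sketched as a small state machine. This is a user-space illustration under stated assumptions, not the bridge code: struct bridge_sketch and the helper names are hypothetical, the ETH_MIN/MAX range check is omitted, and falling back to 1500 when there are no ports is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

#define ETH_DATA_LEN 1500 /* default Ethernet payload size */

/* Hypothetical bridge state for illustration only. */
struct bridge_sketch {
	unsigned int mtu;
	bool mtu_set_by_user;	/* true once the user has configured the MTU */
};

/* Smallest MTU among the ports, or the Ethernet default if there are none. */
static unsigned int min_port_mtu(const unsigned int *port_mtus, size_t n)
{
	unsigned int mtu = ETH_DATA_LEN;
	size_t i;

	for (i = 0; i < n; i++)
		if (i == 0 || port_mtus[i] < mtu)
			mtu = port_mtus[i];
	return mtu;
}

/* Port event (add/del/changemtu): auto-tune only if the user never set the MTU. */
static void bridge_port_event(struct bridge_sketch *br,
			      const unsigned int *port_mtus, size_t n)
{
	if (!br->mtu_set_by_user)
		br->mtu = min_port_mtu(port_mtus, n);
}

/* Manual change: record the value and stop auto-tuning from now on. */
static void bridge_set_mtu(struct bridge_sketch *br, unsigned int mtu)
{
	br->mtu = mtu;
	br->mtu_set_by_user = true;
}
```

The single boolean is the whole design: once it is set, port events no longer touch the MTU, so configuration software does not have to chase it.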
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: set min MTU on port events and allow user to set max · f40aa233
      Nikolay Aleksandrov authored
      Recently the bridge was changed to automatically set maximum MTU on port
      events (add/del/changemtu) when vlan filtering is enabled, but that
      actually changes behaviour in a way which breaks some setups and can lead
      to packet drops. In order to still allow that maximum to be set while being
      compatible, we add the ability for the user to tune the bridge MTU up to
      the maximum when vlan filtering is enabled, but that has to be done
      explicitly and all port events (add/del/changemtu) lead to resetting that
      MTU to the minimum as before.
      Suggested-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'thunderx-DMAC-filtering' · 56c03cbf
      David S. Miller authored
      Vadim Lomovtsev says:
      
      ====================
      net: thunderx: implement DMAC filtering support
      
      By default CN88XX BGX accepts all incoming multicast and broadcast
      packets and filtering is disabled. The nic driver doesn't provide
      an ability to change such behaviour.
      
      This series is to implement DMAC filtering management for CN88XX
      nic driver allowing user to enable/disable filtering and configure
      specific MAC addresses to filter traffic.
      
      Changes from v1:
      build issues:
       - update code in order to address compiler warnings;
      checkpatch.pl reported issues:
       - update code in order to fit 80 symbols length;
       - update commit descriptions in order to fit 80 symbols length;
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: thunderx: add ndo_set_rx_mode callback implementation for VF · 37c3347e
      Vadim Lomovtsev authored
      The ndo_set_rx_mode() callback is called from atomic context, which
      causes message response timeouts in VF to PF communication via MSIx.
      To avoid that, we copy the passed mc list, parse the flags and queue
      the handling of the kernel request to an ordered workqueue.
      Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: thunderx: add workqueue control structures for handle ndo_set_rx_mode request · 1b6d55f2
      Vadim Lomovtsev authored
      The kernel calls the ndo_set_rx_mode() callback from atomic context,
      which causes messaging timeouts between VF and PF (as they’re implemented
      via MSIx). So in order to handle ndo_set_rx_mode() we need to move the
      processing out of atomic context.
      
      This commit implements necessary workqueue related structures to let VF
      queue kernel request processing in non-atomic context later.
      Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: thunderx: add XCAST messages handlers for PF · aba4a263
      Vadim Lomovtsev authored
      This commit is to add message handling for ndo_set_rx_mode()
      callback at PF side.
      Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: thunderx: add new messages for handle ndo_set_rx_mode callback · 0b849f58
      Vadim Lomovtsev authored
      The kernel calls the ndo_set_rx_mode() callback, supplying it with all
      necessary info, such as device state flags, the multicast MAC address
      list and so on. Since we have only 128 bits to communicate with the PF,
      we need to issue several requests to the PF, each carrying one small
      operation based on the input data.
      
      So this commit implements the following PF message codes along with new
      data structures for them:
      NIC_MBOX_MSG_RESET_XCAST to flush all filters configured for this
                                particular network interface (VF)
      NIC_MBOX_MSG_ADD_MCAST   to add new MAC address to DMAC filter registers
                                for this particular network interface (VF)
      NIC_MBOX_MSG_SET_XCAST   to apply filtering configuration to filter control
                                register
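How one rx-mode request might be split into 128-bit messages can be sketched as below. The message layout, the numeric codes and the helper build_rx_mode_msgs() are all hypothetical and chosen for illustration; only the three-step sequence (reset, one add per address, final set) comes from the description above.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical message codes; the driver's real values are not shown here. */
enum xcast_msg {
	MSG_RESET_XCAST = 1,	/* flush all filters of this VF */
	MSG_ADD_MCAST,		/* add one MAC address to the DMAC filters */
	MSG_SET_XCAST,		/* apply the filtering mode */
};

/* One mailbox message, padded to exactly 128 bits (hypothetical layout). */
struct mbox_msg {
	uint8_t msg;
	uint8_t mode;
	uint8_t mac[6];
	uint8_t pad[8];
};

_Static_assert(sizeof(struct mbox_msg) == 16, "mailbox message must be 128 bits");

/*
 * Split one rx-mode request into small messages: one reset, one add per
 * address, one final set. 'macs' holds n_macs addresses of 6 bytes each,
 * back to back. Returns the number of messages written, or 0 if 'out'
 * (capacity 'cap') is too small.
 */
static size_t build_rx_mode_msgs(struct mbox_msg *out, size_t cap,
				 const uint8_t *macs, size_t n_macs,
				 uint8_t mode)
{
	size_t i, n = 0;

	if (cap < n_macs + 2)
		return 0;

	memset(out, 0, (n_macs + 2) * sizeof(*out));
	out[n++].msg = MSG_RESET_XCAST;
	for (i = 0; i < n_macs; i++) {
		out[n].msg = MSG_ADD_MCAST;
		memcpy(out[n].mac, macs + i * 6, 6);
		n++;
	}
	out[n].msg = MSG_SET_XCAST;
	out[n].mode = mode;
	n++;
	return n;
}
```

Each message carries a single small operation, so a request with N multicast addresses costs N + 2 mailbox round trips in this sketch.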
      Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: thunderx: add multicast filter management support · ceb9ea21
      Vadim Lomovtsev authored
      The ThunderX NIC can be partitioned into up to 128 VFs, which are
      presented to the system as such. Each VF is mapped to a BGX:LMAC pair and
      is configured by the kernel individually. As a result, several VFs can be
      mapped onto the same BGX:LMAC pair, which can cause several multicast
      filtering configuration requests to the LMAC with the same MAC addresses.
      
      This commit adds the ThunderX NIC BGX filtering manipulation routines.
      Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: thunderx: add MAC address filter tracking for LMAC · 3a34ecfd
      Vadim Lomovtsev authored
      The ThunderX NIC has two Ethernet interfaces (BGX), each of which can
      have up to four Logical MACs (LMACs) configured. Each BGX has 32 filters
      that can be configured for filtering ingress packets. The number of
      filters available to a particular LMAC ranges from 8 (if four LMACs are
      configured per BGX) up to 32 (if only one LMAC is configured per BGX).
      
      At the same time the NIC can present up to 128 VFs to the OS as network
      interfaces, and the kernel will configure each of them with a set of MAC
      addresses for filtering. So to prevent duplicates in the BGX filter
      registers from different network interfaces, all filter configuration
      requests must be cached and tracked before they are applied to the BGX
      filter registers.
      
      This commit updates the LMAC structures with control fields for
      allocating/releasing the filter tracking list, and implements
      dmac array allocation/release per LMAC.
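The filter budget and the duplicate tracking can be sketched as follows. This is a user-space illustration, not the driver's data structures: struct lmac_filters and both helpers are hypothetical, and dividing the 32 filters evenly among the configured LMACs is an assumption consistent with the 8-to-32 range quoted above.

```c
#include <stdint.h>
#include <stddef.h>

#define BGX_FILTERS 32	/* filters per BGX, as stated in the commit message */

/* Filters available to each LMAC when lmac_count (1..4) LMACs share a BGX. */
static unsigned int filters_per_lmac(unsigned int lmac_count)
{
	return BGX_FILTERS / lmac_count;
}

/* Hypothetical per-LMAC tracking state. */
struct lmac_filters {
	uint64_t dmac[BGX_FILTERS];	/* cached MAC addresses */
	unsigned int used;		/* entries in use */
	unsigned int max;		/* quota from filters_per_lmac() */
};

/*
 * Track a filter request. Returns 1 if the address is new and should be
 * written to hardware, 0 if it is already present (a duplicate request,
 * for example from another VF on the same LMAC), and -1 if the LMAC's
 * quota is exhausted.
 */
static int lmac_track_dmac(struct lmac_filters *f, uint64_t mac)
{
	unsigned int i;

	for (i = 0; i < f->used; i++)
		if (f->dmac[i] == mac)
			return 0;
	if (f->used >= f->max)
		return -1;
	f->dmac[f->used++] = mac;
	return 1;
}
```

A linear scan is enough here because the list never holds more than 32 entries.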
      Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: thunderx: move filter register related macro into proper place · f8ad1f3f
      Vadim Lomovtsev authored
      The ThunderX NIC has a set of registers which allow configuring the
      filter policy for ingress packets. There are three possible modes
      of filtering multicasts, broadcasts and unicasts: accept all, reject all
      and accept only what the filter allows.
      
      The current implementation has an enum with all of them and two generic
      macros for enabling filtering at all (CAM_ACCEPT) and enabling/disabling
      broadcast packets, which also should be corrected in order to represent
      register bits properly. All these values are private to the driver and
      there is no need to ‘publish’ them via a header file.
      
      This commit moves the filtering register manipulation values from the
      header file into the source, explicitly assigning exact register
      values to them for use when configuring the registers.
      Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'meson8b' · 5e8b270f
      David S. Miller authored
      Martin Blumenstingl says:
      
      ====================
      Meson8m2 support for dwmac-meson8b
      
      The Meson8m2 SoC is an updated version of the Meson8 SoC. Some of the
      peripherals are shared with Meson8b (for example the watchdog registers
      and the internal temperature sensor calibration procedure).
      Meson8m2 also seems to include the same Gigabit MAC register layout as
      Meson8b.
      
      The registers in the Amlogic dwmac "glue" seem identical between Meson8b
      and Meson8m2. Manual testing seems to confirm this.
      
      To be extra-safe a new compatible string is added because there's no
      (public) documentation on the Meson8m2 SoC. This will allow us to
      implement any SoC-specific variations later on (if needed).
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: dwmac-meson8b: Add support for the Meson8m2 SoC · 7676693c
      Martin Blumenstingl authored
      The Meson8m2 SoC uses a similar (potentially even identical) register
      layout as the Meson8b and GXBB SoCs for the dwmac glue.
      Add a new compatible string and update the module description to
      indicate support for these SoCs.
      Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dt-bindings: net: meson-dwmac: add support for the Meson8m2 SoC · a5af1fb9
      Martin Blumenstingl authored
      The Meson8m2 SoC uses a similar (potentially even identical) register
      layout for the dwmac glue as Meson8b and GXBB. Unfortunately there is no
      documentation available.
      Testing shows that both RMII and RGMII PHYs are working if they are
      configured as on Meson8b. Add a new compatible string to the
      documentation so differences (if there are any) between Meson8m2 and the
      other SoCs can be taken care of within the driver.
      Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 30 Mar, 2018 20 commits
    • tc-testing: Add newline when writing test case files · c0b6edef
      Lucas Bates authored
      When using the -i feature to generate random ID numbers for test
      cases in tdc, the function that writes the JSON to file doesn't
      add a newline character to the end of the file, so we have to
      add our own.
      Signed-off-by: Lucas Bates <lucasb@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • liquidio: prevent rx queues from getting stalled · ccdd0b4c
      Raghu Vatsavayi authored
      This commit fixes RX traffic issues seen when we stress test the driver
      with continuous ifconfig up/down under very high traffic conditions.
      
      The reason for the issue is that, in the existing liquidio_stop function,
      NAPI is disabled even before the actual FW/HW interface is brought down
      via send_rx_ctrl_cmd(lio, 0). Between the NAPI disable and the actual
      interface down in firmware, firmware continuously enqueues rx traffic to
      the host. When an interrupt happens for new packets, the host irq handler
      fails to schedule NAPI as NAPI is already disabled.
      
      After "ifconfig <iface> up", the host re-enables NAPI but cannot schedule
      it until it receives another Rx interrupt. The host never receives an Rx
      interrupt as it never cleared the Rx interrupt it received during the
      interface down operation. The NIC Rx interrupt gets cleared only when the
      host processes the queue and clears the queue counts. The above anomaly
      leads to other issues like packet overflow in FW/HW queues and
      backpressure.
      
      Fix:
      This commit fixes this issue by disabling NAPI only after informing
      firmware to stop queueing packets to host via send_rx_ctrl_cmd(lio, 0).
      send_rx_ctrl_cmd is not visible in the patch as it is already there in the
      code. The DOWN command also waits for any pending packets to be processed
      by NAPI so that the deadlock will not occur.
      Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@cavium.com>
      Acked-by: Derek Chickles <derek.chickles@cavium.com>
      Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'ieee802154-for-davem-2018-03-29' of... · 6f14f49c
      David S. Miller authored
      Merge branch 'ieee802154-for-davem-2018-03-29' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan-next
      
      Stefan Schmidt says:
      
      ====================
      pull-request: ieee802154-next 2018-03-29
      
      An update from ieee802154 for *net-next*
      
      Colin fixed an unused variable in the new mcr20a driver.
      Harry fixed an uninitialised data read in the debugfs interface of the
      ca8210 driver.
      
      If there are any issues or you think these are too late for -rc1 (both can also
      go into -rc2 as they are simple fixes) let me know.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tc-testing: add connmark action tests · 1dad0f9f
      Roman Mashak authored
      Signed-off-by: Roman Mashak <mrv@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • MAINTAINERS: Update my email address from freescale to nxp · fe3f4e80
      Claudiu Manoil authored
      The freescale.com address will no longer be available.
      Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dt-bindings: net: renesas-ravb: Add support for r8a77470 SoC · 9b857563
      Biju Das authored
      Add a new compatible string for the RZ/G1C (R8A77470) SoC.
      Signed-off-by: Biju Das <biju.das@bp.renesas.com>
      Reviewed-by: Fabrizio Castro <fabrizio.castro@bp.renesas.com>
      Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
      Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'stmmac-DWMAC5' · 8bafb83e
      David S. Miller authored
      Jose Abreu says:
      
      ====================
      Fix TX Timeout and implement Safety Features
      
      Fix the TX Timeout handler to correctly reconfigure the whole system and
      start implementing features for DWMAC5 cores, specifically the Safety
      Features.
      
      Changes since v1:
      	- Display error stats in ethtool
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: Add support for DWMAC5 and implement Safety Features · 8bf993a5
      Jose Abreu authored
      This adds initial support for DWMAC5 and implements the Automotive Safety
      Package which is available from core version 5.10.
      
      The Automotive Safety Package (also called Safety Features) provides
      error protection in the core by implementing ECC Protection in
      memories, on-chip data path parity protection, FSM parity and timeout
      protection and Application/CSR interface timeout protection.
      
      In case of an uncorrectable error we call stmmac_global_err() and
      reconfigure the whole core.
      Signed-off-by: Jose Abreu <joabreu@synopsys.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Joao Pinto <jpinto@synopsys.com>
      Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
      Cc: Alexandre Torgue <alexandre.torgue@st.com>
      Cc: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: Rework and fix TX Timeout code · 34877a15
      Jose Abreu authored
      Currently the TX Timeout handler does not behave as expected and leads to
      an unrecoverable state. Rework the current implementation of TX Timeout
      handling to actually perform a complete reset of the driver state and IP.
      
      We use deferred work to init a task which will be responsible for
      resetting the system.
      Signed-off-by: Jose Abreu <joabreu@synopsys.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Joao Pinto <jpinto@synopsys.com>
      Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
      Cc: Alexandre Torgue <alexandre.torgue@st.com>
      Cc: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mvneta: remove duplicate *_coal assignment · 02281a35
      Jisheng Zhang authored
      The style of the rx/tx queue's *_coal member assignment is:
      
      static void foo_coal_set(...)
      {
      	set the coal in hw;
      	update queue's foo_coal member; [1]
      }
      
      Elsewhere, we call foo_coal_set(pp, queue->foo_coal), so the above [1]
      is redundant and can be removed.
      Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'do-not-allow-adding-routes-if-disable_ipv6-is-enabled' · e7696042
      David S. Miller authored
      Lorenzo Bianconi says:
      
      ====================
      do not allow adding routes if disable_ipv6 is enabled
      
      Do not allow userspace to add static ipv6 routes if disable_ipv6 is enabled.
      Update the disable_ipv6 documentation according to that change.
      
      Changes since v1:
      - added an extack message telling the user that IPv6 is disabled on the nexthop
        device
      - rebased on-top of net-next
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Documentation: ip-sysctl.txt: clarify disable_ipv6 · 2f0aaf7f
      Lorenzo Bianconi authored
      Clarify that when disable_ipv6 is enabled, the ipv6 routes are also
      deleted for the selected interface, and from then on it will not
      be possible to add addresses/routes to that interface.
      Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: do not set routes if disable_ipv6 has been enabled · 428604fb
      Lorenzo Bianconi authored
      Do not allow setting ipv6 routes from userspace if disable_ipv6 has been
      enabled. The issue can be triggered using the following reproducer:
      
      - sysctl net.ipv6.conf.all.disable_ipv6=1
      - ip -6 route add a:b:c:d::/64 dev em1
      - ip -6 route show
        a:b:c:d::/64 dev em1 metric 1024 pref medium
      
      Fix it by checking the disable_ipv6 value in the ip6_route_info_create routine.
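The shape of such a check can be sketched in user space as below. This is not the kernel code: struct dev_sketch and route_add_sketch() are hypothetical, and the -EACCES return value is an assumption of this sketch. The error string follows the cover letter's description of an extack message saying that IPv6 is disabled on the nexthop device.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-device IPv6 configuration. */
struct dev_sketch {
	const char *name;
	bool disable_ipv6;	/* mirrors the disable_ipv6 sysctl */
	unsigned int n_routes;
};

/*
 * Reject a new route when IPv6 is disabled on the nexthop device.
 * The error code is an assumption of this sketch; *errmsg, if given,
 * receives a human-readable reason, similar in spirit to an extack message.
 */
static int route_add_sketch(struct dev_sketch *dev, const char **errmsg)
{
	if (dev->disable_ipv6) {
		if (errmsg)
			*errmsg = "IPv6 is disabled on nexthop device";
		return -EACCES;
	}
	dev->n_routes++;
	return 0;
}
```

With a check of this kind in place, the "ip -6 route add" step of the reproducer would fail instead of silently installing the route.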
      Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next · d162190b
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter/IPVS updates for net-next
      
      The following patchset contains Netfilter/IPVS updates for your net-next
      tree. This batch comes with more input sanitization for xtables to
      address bug reports from fuzzers, preparation works to the flowtable
      infrastructure and assorted updates. In no particular order, they are:
      
      1) Make sure userspace provides a valid standard target verdict, from
         Florian Westphal.
      
      2) Sanitize error target size, also from Florian.
      
      3) Validate that last rule in basechain matches underflow/policy since
         userspace assumes this when decoding the ruleset blob that comes
         from the kernel, from Florian.
      
      4) Consolidate hook entry checks through xt_check_table_hooks(),
         patch from Florian.
      
      5) Cap ruleset allocations at 512 mbytes, 134217728 rules and reject
         very large compat offset arrays, so we have a reasonable upper limit
         and fuzzers don't exercise the oom-killer. Patches from Florian.
      
      6) Several WARN_ON checks on xtables mutex helper, from Florian.
      
      7) xt_rateest now has a hashtable per net, from Cong Wang.
      
      8) Consolidate counter allocation in xt_counters_alloc(), from Florian.
      
      9) Earlier xt_table_unlock() call in {ip,ip6,arp,eb}tables, patch
         from Xin Long.
      
      10) Set FLOW_OFFLOAD_DIR_* to IP_CT_DIR_* definitions, patch from
          Felix Fietkau.
      
      11) Consolidate code through flow_offload_fill_dir(), also from Felix.
      
      12) Inline ip6_dst_mtu_forward() just like ip_dst_mtu_maybe_forward()
          to remove a dependency with flowtable and ipv6.ko, from Felix.
      
      13) Cache mtu size in flow_offload_tuple object, this is safe for
          forwarding as f87c10a8 describes, from Felix.
      
      14) Rename nf_flow_table.c to nf_flow_table_core.c, to simplify the
          overly modular infrastructure, from Felix.
      
      15) Add rt0, rt2 and rt4 IPv6 routing extension support, patch from
          Ahmed Abdelsalam.
      
      16) Remove unused parameter in nf_conncount_count(), from Yi-Hung Wei.
      
      17) Support for counting only to nf_conncount infrastructure, patch
          from Yi-Hung Wei.
      
      18) Add strict NFT_CT_{SRC_IP,DST_IP,SRC_IP6,DST_IP6} key datatypes
          to nft_ct.
      
      19) Use boolean as return value from ipt_ah and from IPVS too, patch
          from Gustavo A. R. Silva.
      
      20) Remove useless parameters in nfnl_acct_overquota() and
          nf_conntrack_broadcast_help(), from Taehee Yoo.
      
      21) Use ipv6_addr_is_multicast() from xt_cluster, also from Taehee Yoo.
      
      22) Statify nf_tables_obj_lookup_byhandle, patch from Fengguang Wu.
      
      23) Fix typo in xt_limit, from Geert Uytterhoeven.
      
      24) Do not use VLAs in Netfilter code, again from Gustavo.
      
      25) Use ADD_COUNTER from ebtables, from Taehee Yoo.
      
      26) Bitshift support for CONNMARK and MARK targets, from Jack Ma.
      
      27) Use pr_*() and add pr_fmt(), from Arushi Singhal.
      
      28) Add synproxy support to ctnetlink.
      
      29) ICMP type and IGMP matching support for ebtables, patches from
          Matthias Schiffer.
      
      30) Support for the revision infrastructure to ebtables, from
          Bernie Harris.
      
      31) String match support for ebtables, also from Bernie.
      
      32) Documentation for the new flowtable infrastructure.
      
      33) Use generic comparison functions in ebt_stp, from Joe Perches.
      
      34) Demodularize filter chains in nftables.
      
      35) Register conntrack hooks in case nftables NAT chain is added.
      
      36) Merge assignments with return in a couple of spots in the
          Netfilter codebase, also from Arushi.
      
      37) Document that xtables percpu counters are stored in the same
          memory area, from Ben Hutchings.
      
      38) Revert mark_source_chains() sanity checks that break existing
          rulesets, from Florian Westphal.
      
      39) Use is_zero_ether_addr() in the ipset codebase, from Joe Perches.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d162190b
    • David S. Miller's avatar
      Merge branch 'Close-race-between-un-register_netdevice_notifier-and-pernet_operations' · b9a12601
      David S. Miller authored
      Kirill Tkhai says:
      
      ====================
      Close race between {un, }register_netdevice_notifier and pernet_operations
      
      The problem is that {,un}register_netdevice_notifier() do not take
      pernet_ops_rwsem, so they do not see network namespaces that are
      being initialized in setup_net() or torn down in cleanup_net(), since
      at that time the net is not hashed to net_namespace_list.
      
      This may lead to an imbalance: a notifier is called for the devices
      hashed to the net at the time of setup_net() (while the net is alive),
      but not at the time of cleanup_net(), and vice versa. See (3/3) for
      the scheme of the imbalance.
      
      This patchset fixes the problem by acquiring pernet_ops_rwsem
      at the time of {,un}register_netdevice_notifier() (3/3).
      (1-2/3) are preparations in xfrm and netfilter subsystems.
      
      The problem was introduced long ago, but backporting won't be easy,
      since every previous kernel version may have changes in netdevice
      notifiers, and they all need review and testing. Otherwise, there
      may be more pernet_operations that register or unregister
      netdevice notifiers, and that leads to deadlock (which was fixed
      in 1-2/3). This patchset is for net-next.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b9a12601
    • Kirill Tkhai's avatar
      net: Close race between {un, }register_netdevice_notifier() and setup_net()/cleanup_net() · 328fbe74
      Kirill Tkhai authored
      {un,}register_netdevice_notifier() iterate over all net namespaces
      hashed to net_namespace_list. But pernet_operations register and
      unregister netdevices in unhashed net namespaces, and netdevice
      notifiers do not see those. This results in asymmetry:
      
      1)Race with register_netdevice_notifier()
        pernet_operations::init(net)	...
         register_netdevice()		...
          call_netdevice_notifiers()  ...
            ... nb is not called ...
        ...				register_netdevice_notifier(nb) -> net skipped
        ...				...
        list_add_tail(&net->list, ..) ...
      
        Then, userspace stops using net, and it's destructed:
      
        pernet_operations::exit(net)
         unregister_netdevice()
          call_netdevice_notifiers()
            ... nb is called ...
      
      This always happens with net::loopback_dev, but it may not be the only device.
      
      2)Race with unregister_netdevice_notifier()
        pernet_operations::init(net)
         register_netdevice()
          call_netdevice_notifiers()
            ... nb is called ...
      
        Then, userspace stops using net, and it's destructed:
      
        list_del_rcu(&net->list)	...
        pernet_operations::exit(net)  unregister_netdevice_notifier(nb) -> net skipped
         dev_change_net_namespace()	...
          call_netdevice_notifiers()
            ... nb is not called ...
         unregister_netdevice()
          call_netdevice_notifiers()
            ... nb is not called ...
      
      This race is more dangerous, since dev_change_net_namespace() moves real
      network devices, which use non-trivial netdevice notifiers, and if this
      happens, the system will be left in an unpredictable state.
      
      The patch closes the race. During the testing I found two places,
      where register_netdevice_notifier() is called from pernet init/exit
      methods (which led to deadlock) and fixed them (see previous patches).
      
      The review led me to one more unusual registration place:
      raw_init() (CAN driver). It may cause problems if someone
      creates in-kernel CAN_RAW sockets, since they will be destroyed
      in the exit method and raw_release() will call
      unregister_netdevice_notifier(). But a grep over the kernel tree
      does not show anyone creating such sockets from kernel space.
      
      Theoretically, there can be more places like this that are
      hidden from review, but we will find them the first time they are
      hit (since there is no race, it will be 100% reproducible).
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      328fbe74
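The first race scheme above can be replayed deterministically in a toy userspace model. This is only an illustrative sketch: every `toy_*` name is hypothetical and none of it is kernel code. The `serialized` flag stands in for the effect of taking pernet_ops_rwsem, under the assumption that the lock keeps notifier registration from running between the pernet init method and the hashing of the net.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: these types are hypothetical, not the kernel's. */
struct toy_net {
	bool hashed;   /* visible on the namespace list */
	bool has_dev;  /* one device registered (think loopback) */
};

struct toy_nb {
	bool active;       /* notifier has been registered */
	int reg_events;    /* "device registered" notifications seen */
	int unreg_events;  /* "device unregistered" notifications seen */
};

/* Models pernet_operations::init: register a device, notify active nb. */
static void toy_pernet_init(struct toy_net *net, struct toy_nb *nb)
{
	net->has_dev = true;
	if (nb->active)
		nb->reg_events++;
}

/* Models pernet_operations::exit: unregister the device, notify nb. */
static void toy_pernet_exit(struct toy_net *net, struct toy_nb *nb)
{
	net->has_dev = false;
	if (nb->active)
		nb->unreg_events++;
}

/* Models notifier registration: replay only devices of hashed nets. */
static void toy_register_notifier(struct toy_nb *nb, struct toy_net *net)
{
	nb->active = true;
	if (net->hashed && net->has_dev)
		nb->reg_events++;
}

/* Returns reg_events - unreg_events; zero means balanced. */
int toy_race1_balance(bool serialized)
{
	struct toy_net net = { false, false };
	struct toy_nb nb = { false, 0, 0 };

	toy_pernet_init(&net, &nb);            /* nb not active yet */
	if (!serialized)
		toy_register_notifier(&nb, &net);  /* net unhashed: skipped */
	net.hashed = true;                     /* list_add_tail() step */
	if (serialized)
		toy_register_notifier(&nb, &net);  /* now sees the device */
	toy_pernet_exit(&net, &nb);            /* nb is called */

	assert(!net.has_dev);                  /* namespace torn down */
	return nb.reg_events - nb.unreg_events;
}
```

In this model, `toy_race1_balance(false)` ends with one unmatched unregister notification, while `toy_race1_balance(true)` is balanced.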
    • Kirill Tkhai's avatar
      netfilter: Rework xt_TEE netdevice notifier · 9e2f6c5d
      Kirill Tkhai authored
      Registering a netdevice notifier for every iptables entry
      is not good, since this breaks modularity, and
      the hidden synchronization is based on rtnl_lock().
      
      This patch reworks the synchronization via a new lock,
      while the rest of the logic remains as it was before.
      This is required for the next patch.
      
      Tested via:
      
      while :; do
      	unshare -n iptables -t mangle -A OUTPUT -j TEE --gateway 1.1.1.2 --oif lo;
      done
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9e2f6c5d
    • Kirill Tkhai's avatar
      xfrm: Register xfrm_dev_notifier in appropriate place · e9a441b6
      Kirill Tkhai authored
      Currently, the driver registers it from the pernet_operations::init
      method, and this breaks modularity, because initialization of a net
      namespace and registration of netdevice notifiers are orthogonal
      actions. We don't have per-namespace netdevice notifiers; all of them
      are global for all devices in all namespaces.
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e9a441b6
    • David S. Miller's avatar
      Merge branch 'Implement-of_get_nvmem_mac_address-helper' · caeeeda3
      David S. Miller authored
      Mike Looijmans says:
      
      ====================
      of_net: Implement of_get_nvmem_mac_address helper
      
      Posted this as a small set now, with an (optional) second patch that shows
      how the changes work and what I've used to test the code on a Topic Miami board.
      I've taken the liberty to add appropriate "Acked" and "Review" tags.
      
      v4: Replaced "6" with ETH_ALEN
      
      v3: Add patch that implements mac in nvmem for the Cadence MACB controller
          Remove the integrated of_get_mac_address call
      
      v2: Use of_nvmem_cell_get to avoid needing the associated device
          Use void* instead of char*
          Add devicetree binding doc
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      caeeeda3
    • Mike Looijmans's avatar
      net: macb: Try to retrieve MAC address from nvmem provider · aa076e3d
      Mike Looijmans authored
      Call of_get_nvmem_mac_address() to fetch the MAC address from an nvmem
      cell, if one is provided in the device tree. This allows the address to
      be stored in an I2C EEPROM device for example.
      Signed-off-by: Mike Looijmans <mike.looijmans@topic.nl>
      Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      aa076e3d
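As a rough userspace illustration of the checks a helper like this would plausibly make before accepting an address read from an nvmem cell, the sketch below validates the cell length against an ETH_ALEN-style constant and rejects all-zero and multicast addresses. The `toy_*` names and `TOY_ETH_ALEN` are hypothetical; this is an assumption about the shape of the logic, not the body of of_get_nvmem_mac_address().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_ETH_ALEN 6  /* hypothetical stand-in for the kernel's ETH_ALEN */

/* Accept only non-zero unicast addresses. */
static bool toy_valid_ether_addr(const unsigned char *addr)
{
	unsigned char any = 0;
	size_t i;

	for (i = 0; i < TOY_ETH_ALEN; i++)
		any |= addr[i];

	/* Low bit of the first octet set means multicast/broadcast. */
	return any != 0 && (addr[0] & 0x01) == 0;
}

/*
 * Copy a MAC address out of a raw cell buffer.
 * Returns 0 on success, -1 if the cell is missing, has the wrong
 * length, or does not hold a usable unicast address.
 */
int toy_mac_from_cell(const void *cell, size_t len, unsigned char *mac)
{
	assert(mac != NULL);

	if (cell == NULL || len != TOY_ETH_ALEN)
		return -1;
	if (!toy_valid_ether_addr(cell))
		return -1;

	memcpy(mac, cell, TOY_ETH_ALEN);
	return 0;
}
```

A caller would fall back to another address source whenever this returns -1, which matches the "if one is provided" wording in the commit message.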