1. 15 Feb, 2019 1 commit
    • Herbert Xu's avatar
      mac80211: Use linked list instead of rhashtable walk for mesh tables · b4c3fbe6
      Herbert Xu authored
      The mesh table code walks over hash tables for two purposes.  First of
      all it's used as part of a netlink dump process, but it is also used
      for looking up entries to delete using criteria other than the hash
      key.
      
      The second purpose is directly contrary to the design specification
      of rhashtable walks.  It is only meant for use by netlink dumps.
      
      This is because rhashtable is resizable and you cannot obtain a
      stable walk over it during a resize process.
      
      In fact mesh's use of rhashtable for dumping is bogus too.  Rather
      than using rhashtable walk's iterator to keep track of the current
      position, it always converts the current position to an integer
      which defeats the purpose of the iterator.
      
      Therefore this patch converts all uses of rhashtable walk into a
      simple linked list.
      
      This patch also adds a new spin lock to protect the hash table
      insertion/removal as well as the walk list modifications.  In fact
      the previous code was buggy as the removals can race with each
      other, potentially resulting in a double-free.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      b4c3fbe6
  2. 14 Feb, 2019 15 commits
    • Vivien Didelot's avatar
      net: ethernet: freescale: set FEC ethtool regs version · f9bcc9f3
      Vivien Didelot authored
      Currently the ethtool_regs version is set to 0 for FEC devices.
      
      Use this field to store the register dump version exposed by the
      kernel. The choosen version 2 corresponds to the kernel compile test:
      
              #if defined(CONFIG_M523x) || defined(CONFIG_M527x)
              || defined(CONFIG_M528x) || defined(CONFIG_M520x)
              || defined(CONFIG_M532x) || defined(CONFIG_ARM)
              || defined(CONFIG_ARM64) || defined(CONFIG_COMPILE_TEST)
      
      and version 1 corresponds to the opposite. Binaries of ethtool unaware
      of this version will dump the whole set as usual.
      Signed-off-by: default avatarVivien Didelot <vivien.didelot@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f9bcc9f3
    • Huang Zijiang's avatar
      net: hns: Fix object reference leaks in hns_dsaf_roce_reset() · c969c6e7
      Huang Zijiang authored
      The of_find_device_by_node() takes a reference to the underlying device
      structure, we should release that reference.
      Signed-off-by: default avatarHuang Zijiang <huang.zijiang@zte.com.cn>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c969c6e7
    • Jann Horn's avatar
      mm: page_alloc: fix ref bias in page_frag_alloc() for 1-byte allocs · 2c2ade81
      Jann Horn authored
      The basic idea behind ->pagecnt_bias is: If we pre-allocate the maximum
      number of references that we might need to create in the fastpath later,
      the bump-allocation fastpath only has to modify the non-atomic bias value
      that tracks the number of extra references we hold instead of the atomic
      refcount. The maximum number of allocations we can serve (under the
      assumption that no allocation is made with size 0) is nc->size, so that's
      the bias used.
      
      However, even when all memory in the allocation has been given away, a
      reference to the page is still held; and in the `offset < 0` slowpath, the
      page may be reused if everyone else has dropped their references.
      This means that the necessary number of references is actually
      `nc->size+1`.
      
      Luckily, from a quick grep, it looks like the only path that can call
      page_frag_alloc(fragsz=1) is TAP with the IFF_NAPI_FRAGS flag, which
      requires CAP_NET_ADMIN in the init namespace and is only intended to be
      used for kernel testing and fuzzing.
      
      To test for this issue, put a `WARN_ON(page_ref_count(page) == 0)` in the
      `offset < 0` path, below the virt_to_page() call, and then repeatedly call
      writev() on a TAP device with IFF_TAP|IFF_NO_PI|IFF_NAPI_FRAGS|IFF_NAPI,
      with a vector consisting of 15 elements containing 1 byte each.
      Signed-off-by: default avatarJann Horn <jannh@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2c2ade81
    • David S. Miller's avatar
      Merge branch 'net-phy-fix-locking-issue' · 61c4c0bc
      David S. Miller authored
      Heiner Kallweit says:
      
      ====================
      net: phy: fix locking issue
      
      Russell pointed out that the locking used in phy_is_started() isn't
      needed and misleading. This locking also contributes to a race fixed
      with patch 2.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      61c4c0bc
    • Heiner Kallweit's avatar
      net: phy: fix potential race in the phylib state machine · a2004907
      Heiner Kallweit authored
      Russell reported the following race in the phylib state machine
      (quoting from his mail):
      
      if (phy_polling_mode(phydev) && phy_is_started(phydev))
      	phy_queue_state_machine(phydev, PHY_STATE_TIME);
      
      state = PHY_UP
      thread 0			thread 1
      				phy_disconnect()
      				+-phy_is_started()
      phy_is_started()                |
      				`-phy_stop()
      				  +-phydev->state = PHY_HALTED
      				  `-phy_stop_machine()
      				    `-cancel_delayed_work_sync()
      phy_queue_state_machine()
      `-mod_delayed_work()
      
      At this point, the phydev->state_queue() has been added back onto the
      system workqueue despite phy_stop_machine() having been called and
      cancel_delayed_work_sync() called on it.
      
      Fix this by protecting the complete operation in thread 0.
      
      Fixes: 2b3e88ea ("net: phy: improve phy state checking")
      Reported-by: default avatarRussell King - ARM Linux admin <linux@armlinux.org.uk>
      Signed-off-by: default avatarHeiner Kallweit <hkallweit1@gmail.com>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a2004907
    • Heiner Kallweit's avatar
      net: phy: don't use locking in phy_is_started · a2fc9d7e
      Heiner Kallweit authored
      Russell suggested to remove the locking from phy_is_started() because
      the read is atomic anyway and actually the locking may be more
      misleading.
      
      Fixes: 2b3e88ea ("net: phy: improve phy state checking")
      Suggested-by: default avatarRussell King - ARM Linux admin <linux@armlinux.org.uk>
      Signed-off-by: default avatarHeiner Kallweit <hkallweit1@gmail.com>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a2fc9d7e
    • Deepa Dinamani's avatar
      selftests: fix timestamping Makefile · 39c13319
      Deepa Dinamani authored
      The clean target in the makefile conflicts with the generic
      kselftests lib.mk, and fails to properly remove the compiled
      test programs.
      
      Remove the redundant rule, the TEST_GEN_FILES will be already
      removed by the CLEAN macro in lib.mk.
      Signed-off-by: default avatarDeepa Dinamani <deepa.kernel@gmail.com>
      Acked-by: default avatarShuah Khan <shuah@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      39c13319
    • Dan Carpenter's avatar
      net: dsa: bcm_sf2: potential array overflow in bcm_sf2_sw_suspend() · 8d6ea932
      Dan Carpenter authored
      The value of ->num_ports comes from bcm_sf2_sw_probe() and it is less
      than or equal to DSA_MAX_PORTS.  The ds->ports[] array is used inside
      the dsa_is_user_port() and dsa_is_cpu_port() functions.  The ds->ports[]
      array is allocated in dsa_switch_alloc() and it has ds->num_ports
      elements so this leads to a static checker warning about a potential out
      of bounds read.
      
      Fixes: 8cfa9498 ("net: dsa: bcm_sf2: add suspend/resume callbacks")
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: default avatarVivien Didelot <vivien.didelot@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8d6ea932
    • Eric Dumazet's avatar
      net: fix possible overflow in __sk_mem_raise_allocated() · 5bf325a5
      Eric Dumazet authored
      With many active TCP sockets, fat TCP sockets could fool
      __sk_mem_raise_allocated() thanks to an overflow.
      
      They would increase their share of the memory, instead
      of decreasing it.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5bf325a5
    • John David Anglin's avatar
      dsa: mv88e6xxx: Ensure all pending interrupts are handled prior to exit · 7c0db24c
      John David Anglin authored
      The GPIO interrupt controller on the espressobin board only supports edge interrupts.
      If one enables the use of hardware interrupts in the device tree for the 88E6341, it is
      possible to miss an edge.  When this happens, the INTn pin on the Marvell switch is
      stuck low and no further interrupts occur.
      
      I found after adding debug statements to mv88e6xxx_g1_irq_thread_work() that there is
      a race in handling device interrupts (e.g. PHY link interrupts).  Some interrupts are
      directly cleared by reading the Global 1 status register.  However, the device interrupt
      flag, for example, is not cleared until all the unmasked SERDES and PHY ports are serviced.
      This is done by reading the relevant SERDES and PHY status register.
      
      The code only services interrupts whose status bit is set at the time of reading its status
      register.  If an interrupt event occurs after its status is read and before all interrupts
      are serviced, then this event will not be serviced and the INTn output pin will remain low.
      
      This is not a problem with polling or level interrupts since the handler will be called
      again to process the event.  However, it's a big problem when using level interrupts.
      
      The fix presented here is to add a loop around the code servicing switch interrupts.  If
      any pending interrupts remain after the current set has been handled, we loop and process
      the new set.  If there are no pending interrupts after servicing, we are sure that INTn has
      gone high and we will get an edge when a new event occurs.
      
      Tested on espressobin board.
      
      Fixes: dc30c35b ("net: dsa: mv88e6xxx: Implement interrupt support.")
      Signed-off-by: default avatarJohn David Anglin <dave.anglin@bell.net>
      Tested-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7c0db24c
    • Heiner Kallweit's avatar
      net: phy: fix interrupt handling in non-started states · b79555d5
      Heiner Kallweit authored
      phylib enables interrupts before phy_start() has been called, and if
      we receive an interrupt in a non-started state, the interrupt handler
      returns IRQ_NONE. This causes problems with at least one Marvell chip
      as reported by Andrew.
      Fix this by handling interrupts the same as in phy_mac_interrupt(),
      basically always running the phylib state machine. It knows when it
      has to do something and when not.
      This change allows to handle interrupts gracefully even if they
      occur in a non-started state.
      
      Fixes: 2b3e88ea ("net: phy: improve phy state checking")
      Reported-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarHeiner Kallweit <hkallweit1@gmail.com>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b79555d5
    • Xin Long's avatar
      sctp: set stream ext to NULL after freeing it in sctp_stream_outq_migrate · af98c5a7
      Xin Long authored
      In sctp_stream_init(), after sctp_stream_outq_migrate() freed the
      surplus streams' ext, but sctp_stream_alloc_out() returns -ENOMEM,
      stream->outcnt will not be set to 'outcnt'.
      
      With the bigger value on stream->outcnt, when closing the assoc and
      freeing its streams, the ext of those surplus streams will be freed
      again since those stream exts were not set to NULL after freeing in
      sctp_stream_outq_migrate(). Then the invalid-free issue reported by
      syzbot would be triggered.
      
      We fix it by simply setting them to NULL after freeing.
      
      Fixes: 5bbbbe32 ("sctp: introduce stream scheduler foundations")
      Reported-by: syzbot+58e480e7b28f2d890bfd@syzkaller.appspotmail.com
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Acked-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Acked-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      af98c5a7
    • Xin Long's avatar
      sctp: call gso_reset_checksum when computing checksum in sctp_gso_segment · fc228abc
      Xin Long authored
      Jianlin reported a panic when running sctp gso over gre over vlan device:
      
        [   84.772930] RIP: 0010:do_csum+0x6d/0x170
        [   84.790605] Call Trace:
        [   84.791054]  csum_partial+0xd/0x20
        [   84.791657]  gre_gso_segment+0x2c3/0x390
        [   84.792364]  inet_gso_segment+0x161/0x3e0
        [   84.793071]  skb_mac_gso_segment+0xb8/0x120
        [   84.793846]  __skb_gso_segment+0x7e/0x180
        [   84.794581]  validate_xmit_skb+0x141/0x2e0
        [   84.795297]  __dev_queue_xmit+0x258/0x8f0
        [   84.795949]  ? eth_header+0x26/0xc0
        [   84.796581]  ip_finish_output2+0x196/0x430
        [   84.797295]  ? skb_gso_validate_network_len+0x11/0x80
        [   84.798183]  ? ip_finish_output+0x169/0x270
        [   84.798875]  ip_output+0x6c/0xe0
        [   84.799413]  ? ip_append_data.part.50+0xc0/0xc0
        [   84.800145]  iptunnel_xmit+0x144/0x1c0
        [   84.800814]  ip_tunnel_xmit+0x62d/0x930 [ip_tunnel]
        [   84.801699]  gre_tap_xmit+0xac/0xf0 [ip_gre]
        [   84.802395]  dev_hard_start_xmit+0xa5/0x210
        [   84.803086]  sch_direct_xmit+0x14f/0x340
        [   84.803733]  __dev_queue_xmit+0x799/0x8f0
        [   84.804472]  ip_finish_output2+0x2e0/0x430
        [   84.805255]  ? skb_gso_validate_network_len+0x11/0x80
        [   84.806154]  ip_output+0x6c/0xe0
        [   84.806721]  ? ip_append_data.part.50+0xc0/0xc0
        [   84.807516]  sctp_packet_transmit+0x716/0xa10 [sctp]
        [   84.808337]  sctp_outq_flush+0xd7/0x880 [sctp]
      
      It was caused by SKB_GSO_CB(skb)->csum_start not set in sctp_gso_segment.
      sctp_gso_segment() calls skb_segment() with 'feature | NETIF_F_HW_CSUM',
      which causes SKB_GSO_CB(skb)->csum_start not to be set in skb_segment().
      
      For TCP/UDP, when feature supports HW_CSUM, CHECKSUM_PARTIAL will be set
      and gso_reset_checksum will be called to set SKB_GSO_CB(skb)->csum_start.
      
      So SCTP should do the same as TCP/UDP, to call gso_reset_checksum() when
      computing checksum in sctp_gso_segment.
      Reported-by: default avatarJianlin Shi <jishi@redhat.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Acked-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Acked-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      fc228abc
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · f325ef72
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter/IPVS fixes for net
      
      The following patchset contains Netfilter/IPVS fixes for net:
      
      1) Missing structure initialization in ebtables causes splat with
         32-bit user level on a 64-bit kernel, from Francesco Ruggeri.
      
      2) Missing dependency on nf_defrag in IPVS IPv6 codebase, from
         Andrea Claudi.
      
      3) Fix possible use-after-free from release path of target extensions.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f325ef72
    • David S. Miller's avatar
      Merge tag 'mlx5-fixes-2019-02-13' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 41ceb5e8
      David S. Miller authored
      Saeed Mahameed says:
      
      ====================
      Mellanox, mlx5 fixes 2019-02-13
      
      This series introduces some fixes to mlx5 driver.
      For more information please see tag log below.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      41ceb5e8
  3. 13 Feb, 2019 5 commits
    • Saeed Mahameed's avatar
      net/mlx5e: XDP, fix redirect resources availability check · 407e17b1
      Saeed Mahameed authored
      Currently mlx5 driver creates xdp redirect hw queues unconditionally on
      netdevice open, This is great until someone starts redirecting XDP traffic
      via ndo_xdp_xmit on mlx5 device and changes the device configuration at
      the same time, this might cause crashes, since the other device's napi
      is not aware of the mlx5 state change (resources un-availability).
      
      To fix this we must synchronize with other devices napi's on the system.
      Added a new flag under mlx5e_priv to determine XDP TX resources are
      available, set/clear it up when necessary and use synchronize_rcu()
      when the flag is turned off, so other napi's are in-sync with it, before
      we actually cleanup the hw resources.
      
      The flag is tested prior to committing to transmit on mlx5e_xdp_xmit, and
      it is sufficient to determine if it safe to transmit or not. The other
      two internal flags (MLX5E_STATE_OPENED and MLX5E_SQ_STATE_ENABLED) become
      unnecessary. Thus, they are removed from data path.
      
      Fixes: 58b99ee3 ("net/mlx5e: Add support for XDP_REDIRECT in device-out side")
      Reported-by: default avatarToke Høiland-Jørgensen <toke@redhat.com>
      Reviewed-by: default avatarTariq Toukan <tariqt@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      407e17b1
    • Tariq Toukan's avatar
      net/mlx5: Fix a compilation warning in events.c · 5400261e
      Tariq Toukan authored
      Eliminate the following compilation warning:
      
      drivers/net/ethernet/mellanox/mlx5/core/events.c: warning: 'error_str'
      may be used uninitialized in this function [-Wuninitialized]:  => 238:3
      
      Fixes: c2fb3db2 ("net/mlx5: Rework handling of port module events")
      Signed-off-by: default avatarTariq Toukan <tariqt@mellanox.com>
      Reviewed-by: default avatarMikhael Goikhman <migo@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      5400261e
    • Huy Nguyen's avatar
      net/mlx5: No command allowed when command interface is not ready · 4cab346b
      Huy Nguyen authored
      When EEH is injected and PCI bus stalls, mlx5's pci error detect
      function is called to deactivate the command interface and tear down
      the device. The issue is that there can be a thread that already
      passed MLX5_DEVICE_STATE_INTERNAL_ERROR check, it will send the command
      and stuck in the wait_func.
      
      Solution:
      Add function mlx5_cmd_flush to disable command interface and clear all
      the pending commands. When device state is set to
      MLX5_DEVICE_STATE_INTERNAL_ERROR, call mlx5_cmd_flush to ensure all
      pending threads waiting for firmware commands completion are terminated.
      
      Fixes: c1d4d2e9 ("net/mlx5: Avoid calling sleeping function by the health poll thread")
      Signed-off-by: default avatarHuy Nguyen <huyn@mellanox.com>
      Reviewed-by: default avatarDaniel Jurgens <danielj@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      4cab346b
    • Maria Pasechnik's avatar
      net/mlx5e: Fix NULL pointer derefernce in set channels error flow · fb35c534
      Maria Pasechnik authored
      New channels are applied to the priv channels only after they
      are successfully opened. Then, the indirection table should be built
      according to the new number of channels.
      Currently, such build is preformed independently of whether the
      channels opening is successful, and is not reverted on failure.
      
      The bug is caused due to removal of rss params from channels struct
      and moving it to priv struct. That change cause to independency between
      channels and rss params.
      This causes a crash on a later point, when accessing rqn of a non
      existing channel.
      
      This patch fixes it by moving the indirection table build right before
      switching the priv channels to new channels struct, after the new set of
      channels was successfully opened.
      
      Fixes: bbeb53b8 ("net/mlx5e: Move RSS params to a dedicated struct")
      Signed-off-by: default avatarMaria Pasechnik <mariap@mellanox.com>
      Reviewed-by: default avatarTariq Toukan <tariqt@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      fb35c534
    • Pablo Neira Ayuso's avatar
      netfilter: nft_compat: use-after-free when deleting targets · 753c111f
      Pablo Neira Ayuso authored
      Fetch pointer to module before target object is released.
      
      Fixes: 29e38801 ("netfilter: nf_tables: fix use-after-free when deleting compat expressions")
      Fixes: 0ca743a5 ("netfilter: nf_tables: add compatibility layer for x_tables")
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      753c111f
  4. 12 Feb, 2019 19 commits
    • Cong Wang's avatar
      team: avoid complex list operations in team_nl_cmd_options_set() · 2fdeee25
      Cong Wang authored
      The current opt_inst_list operations inside team_nl_cmd_options_set()
      is too complex to track:
      
          LIST_HEAD(opt_inst_list);
          nla_for_each_nested(...) {
              list_for_each_entry(opt_inst, &team->option_inst_list, list) {
                  if (__team_option_inst_tmp_find(&opt_inst_list, opt_inst))
                      continue;
                  list_add(&opt_inst->tmp_list, &opt_inst_list);
              }
          }
          team_nl_send_event_options_get(team, &opt_inst_list);
      
      as while we retrieve 'opt_inst' from team->option_inst_list, it could
      be added to the local 'opt_inst_list' for multiple times. The
      __team_option_inst_tmp_find() doesn't work, as the setter
      team_mode_option_set() still calls team->ops.exit() which uses
      ->tmp_list too in __team_options_change_check().
      
      Simplify the list operations by moving the 'opt_inst_list' and
      team_nl_send_event_options_get() into the nla_for_each_nested() loop so
      that it can be guranteed that we won't insert a same list entry for
      multiple times. Therefore, __team_option_inst_tmp_find() can be removed
      too.
      
      Fixes: 4fb0534f ("team: avoid adding twice the same option to the event list")
      Fixes: 2fcdb2c9 ("team: allow to send multiple set events in one message")
      Reported-by: syzbot+4d4af685432dc0e56c91@syzkaller.appspotmail.com
      Reported-by: syzbot+68ee510075cf64260cc4@syzkaller.appspotmail.com
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Acked-by: default avatarJiri Pirko <jiri@mellanox.com>
      Reviewed-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2fdeee25
    • David S. Miller's avatar
      Merge branch 'net_sched-some-fixes-for-cls_tcindex' · a090d794
      David S. Miller authored
      Cong Wang says:
      
      ====================
      net_sched: some fixes for cls_tcindex
      
      This patchset contains 3 bug fixes for tcindex filter. Please check
      each patch for details.
      
      v2: fix a compile error in patch 2
          drop netns refcnt in patch 1
      ====================
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      a090d794
    • Cong Wang's avatar
      net_sched: fix two more memory leaks in cls_tcindex · 1db817e7
      Cong Wang authored
      struct tcindex_filter_result contains two parts:
      struct tcf_exts and struct tcf_result.
      
      For the local variable 'cr', its exts part is never used but
      initialized without being released properly on success path. So
      just completely remove the exts part to fix this leak.
      
      For the local variable 'new_filter_result', it is never properly
      released if not used by 'r' on success path.
      
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1db817e7
    • Cong Wang's avatar
      net_sched: fix a memory leak in cls_tcindex · 033b228e
      Cong Wang authored
      When tcindex_destroy() destroys all the filter results in
      the perfect hash table, it invokes the walker to delete
      each of them. However, results with class==0 are skipped
      in either tcindex_walk() or tcindex_delete(), which causes
      a memory leak reported by kmemleak.
      
      This patch fixes it by skipping the walker and directly
      deleting these filter results so we don't miss any filter
      result.
      
      As a result of this change, we have to initialize exts->net
      properly in tcindex_alloc_perfect_hash(). For net-next, we
      need to consider whether we should initialize ->net in
      tcf_exts_init() instead, before that just directly test
      CONFIG_NET_CLS_ACT=y.
      
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      033b228e
    • Cong Wang's avatar
      net_sched: fix a race condition in tcindex_destroy() · 8015d93e
      Cong Wang authored
      tcindex_destroy() invokes tcindex_destroy_element() via
      a walker to delete each filter result in its perfect hash
      table, and tcindex_destroy_element() calls tcindex_delete()
      which schedules tcf RCU works to do the final deletion work.
      Unfortunately this races with the RCU callback
      __tcindex_destroy(), which could lead to use-after-free as
      reported by Adrian.
      
      Fix this by migrating this RCU callback to tcf RCU work too,
      as that workqueue is ordered, we will not have use-after-free.
      
      Note, we don't need to hold netns refcnt because we don't call
      tcf_exts_destroy() here.
      
      Fixes: 27ce4f05 ("net_sched: use tcf_queue_work() in tcindex filter")
      Reported-by: default avatarAdrian <bugs@abtelecom.ro>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8015d93e
    • David S. Miller's avatar
      Merge branch 'ena-races' · 6a7dd172
      David S. Miller authored
      Arthur Kiyanovski says:
      
      ====================
      net: ena: race condition bug fix and version update
      
      This patchset includes a fix to a race condition that can cause
      kernel panic, as well as a driver version update because of this
      fix.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6a7dd172
    • Arthur Kiyanovski's avatar
      net: ena: update driver version from 2.0.2 to 2.0.3 · d9b8656d
      Arthur Kiyanovski authored
      Update driver version due to bug fix.
      Signed-off-by: default avatarArthur Kiyanovski <akiyano@amazon.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d9b8656d
    • Arthur Kiyanovski's avatar
      net: ena: fix race between link up and device initalization · e1f1bd9b
      Arthur Kiyanovski authored
      Fix race condition between ena_update_on_link_change() and
      ena_restore_device().
      
      This race can occur if link notification arrives while the driver
      is performing a reset sequence. In this case link can be set up,
      enabling the device, before it is fully restored. If packets are
      sent at this time, the driver might access uninitialized data
      structures, causing kernel crash.
      
      Move the clearing of ENA_FLAG_ONGOING_RESET and netif_carrier_on()
      after ena_up() to ensure the device is ready when link is set up.
      
      Fixes: d18e4f68 ("net: ena: fix race condition between device reset and link up setup")
      Signed-off-by: default avatarArthur Kiyanovski <akiyano@amazon.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e1f1bd9b
    • Kal Conley's avatar
      net/packet: fix 4gb buffer limit due to overflow check · fc62814d
      Kal Conley authored
      When calculating rb->frames_per_block * req->tp_block_nr the result
      can overflow. Check it for overflow without limiting the total buffer
      size to UINT_MAX.
      
      This change fixes support for packet ring buffers >= UINT_MAX.
      
      Fixes: 8f8d28e4 ("net/packet: fix overflow in check for tp_frame_nr")
      Signed-off-by: default avatarKal Conley <kal.conley@dectris.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      fc62814d
    • Konstantin Khlebnikov's avatar
      inet_diag: fix reporting cgroup classid and fallback to priority · 1ec17dbd
      Konstantin Khlebnikov authored
      Field idiag_ext in struct inet_diag_req_v2 used as bitmap of requested
      extensions has only 8 bits. Thus extensions starting from DCTCPINFO
      cannot be requested directly. Some of them included into response
      unconditionally or hook into some of lower 8 bits.
      
      Extension INET_DIAG_CLASS_ID has not way to request from the beginning.
      
      This patch bundle it with INET_DIAG_TCLASS (ipv6 tos), fixes space
      reservation, and documents behavior for other extensions.
      
      Also this patch adds fallback to reporting socket priority. This filed
      is more widely used for traffic classification because ipv4 sockets
      automatically maps TOS to priority and default qdisc pfifo_fast knows
      about that. But priority could be changed via setsockopt SO_PRIORITY so
      INET_DIAG_TOS isn't enough for predicting class.
      
      Also cgroup2 obsoletes net_cls classid (it always zero), but we cannot
      reuse this field for reporting cgroup2 id because it is 64-bit (ino+gen).
      
      So, after this patch INET_DIAG_CLASS_ID will report socket priority
      for most common setup when net_cls isn't set and/or cgroup2 in use.
      
      Fixes: 0888e372 ("net: inet: diag: expose sockets cgroup classid")
      Signed-off-by: default avatarKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1ec17dbd
    • Eric Dumazet's avatar
      batman-adv: fix uninit-value in batadv_interface_tx() · 4ffcbfac
      Eric Dumazet authored
      KMSAN reported batadv_interface_tx() was possibly using a
      garbage value [1]
      
      batadv_get_vid() does have a pskb_may_pull() call
      but batadv_interface_tx() does not actually make sure
      this did not fail.
      
      [1]
      BUG: KMSAN: uninit-value in batadv_interface_tx+0x908/0x1e40 net/batman-adv/soft-interface.c:231
      CPU: 0 PID: 10006 Comm: syz-executor469 Not tainted 4.20.0-rc7+ #5
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x173/0x1d0 lib/dump_stack.c:113
       kmsan_report+0x12e/0x2a0 mm/kmsan/kmsan.c:613
       __msan_warning+0x82/0xf0 mm/kmsan/kmsan_instr.c:313
       batadv_interface_tx+0x908/0x1e40 net/batman-adv/soft-interface.c:231
       __netdev_start_xmit include/linux/netdevice.h:4356 [inline]
       netdev_start_xmit include/linux/netdevice.h:4365 [inline]
       xmit_one net/core/dev.c:3257 [inline]
       dev_hard_start_xmit+0x607/0xc40 net/core/dev.c:3273
       __dev_queue_xmit+0x2e42/0x3bc0 net/core/dev.c:3843
       dev_queue_xmit+0x4b/0x60 net/core/dev.c:3876
       packet_snd net/packet/af_packet.c:2928 [inline]
       packet_sendmsg+0x8306/0x8f30 net/packet/af_packet.c:2953
       sock_sendmsg_nosec net/socket.c:621 [inline]
       sock_sendmsg net/socket.c:631 [inline]
       __sys_sendto+0x8c4/0xac0 net/socket.c:1788
       __do_sys_sendto net/socket.c:1800 [inline]
       __se_sys_sendto+0x107/0x130 net/socket.c:1796
       __x64_sys_sendto+0x6e/0x90 net/socket.c:1796
       do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291
       entry_SYSCALL_64_after_hwframe+0x63/0xe7
      RIP: 0033:0x441889
      Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 bb 10 fc ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007ffdda6fd468 EFLAGS: 00000216 ORIG_RAX: 000000000000002c
      RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 0000000000441889
      RDX: 000000000000000e RSI: 00000000200000c0 RDI: 0000000000000003
      RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000216 R12: 00007ffdda6fd4c0
      R13: 00007ffdda6fd4b0 R14: 0000000000000000 R15: 0000000000000000
      
      Uninit was created at:
       kmsan_save_stack_with_flags mm/kmsan/kmsan.c:204 [inline]
       kmsan_internal_poison_shadow+0x92/0x150 mm/kmsan/kmsan.c:158
       kmsan_kmalloc+0xa6/0x130 mm/kmsan/kmsan_hooks.c:176
       kmsan_slab_alloc+0xe/0x10 mm/kmsan/kmsan_hooks.c:185
       slab_post_alloc_hook mm/slab.h:446 [inline]
       slab_alloc_node mm/slub.c:2759 [inline]
       __kmalloc_node_track_caller+0xe18/0x1030 mm/slub.c:4383
       __kmalloc_reserve net/core/skbuff.c:137 [inline]
       __alloc_skb+0x309/0xa20 net/core/skbuff.c:205
       alloc_skb include/linux/skbuff.h:998 [inline]
       alloc_skb_with_frags+0x1c7/0xac0 net/core/skbuff.c:5220
       sock_alloc_send_pskb+0xafd/0x10e0 net/core/sock.c:2083
       packet_alloc_skb net/packet/af_packet.c:2781 [inline]
       packet_snd net/packet/af_packet.c:2872 [inline]
       packet_sendmsg+0x661a/0x8f30 net/packet/af_packet.c:2953
       sock_sendmsg_nosec net/socket.c:621 [inline]
       sock_sendmsg net/socket.c:631 [inline]
       __sys_sendto+0x8c4/0xac0 net/socket.c:1788
       __do_sys_sendto net/socket.c:1800 [inline]
       __se_sys_sendto+0x107/0x130 net/socket.c:1796
       __x64_sys_sendto+0x6e/0x90 net/socket.c:1796
       do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291
       entry_SYSCALL_64_after_hwframe+0x63/0xe7
      
      Fixes: c6c8fea2 ("net: Add batman-adv meshing protocol")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Cc:	Marek Lindner <mareklindner@neomailbox.ch>
      Cc:	Simon Wunderlich <sw@simonwunderlich.de>
      Cc:	Antonio Quartulli <a@unstable.cc>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4ffcbfac
    • Li RongQing's avatar
      ipv6: propagate genlmsg_reply return code · d1f20798
      Li RongQing authored
      genlmsg_reply can fail, so propagate its return code
      
      Fixes: 915d7e5e ("ipv6: sr: add code base for control plane support of SR-IPv6")
      Signed-off-by: default avatarLi RongQing <lirongqing@baidu.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d1f20798
    • Saeed Mahameed's avatar
      net/mlx4_en: Force CHECKSUM_NONE for short ethernet frames · 29dded89
      Saeed Mahameed authored
      When an ethernet frame is padded to meet the minimum ethernet frame
      size, the padding octets are not covered by the hardware checksum.
      Fortunately the padding octets are usually zero's, which don't affect
      checksum. However, it is not guaranteed. For example, switches might
      choose to make other use of these octets.
      This repeatedly causes kernel hardware checksum fault.
      
      Prior to the cited commit below, skb checksum was forced to be
      CHECKSUM_NONE when padding is detected. After it, we need to keep
      skb->csum updated. However, fixing up CHECKSUM_COMPLETE requires to
      verify and parse IP headers, it does not worth the effort as the packets
      are so small that CHECKSUM_COMPLETE has no significant advantage.
      
      Future work: when reporting checksum complete is not an option for
      IP non-TCP/UDP packets, we can actually fallback to report checksum
      unnecessary, by looking at cqe IPOK bit.
      
      Fixes: 88078d98 ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends")
      Cc: Eric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarTariq Toukan <tariqt@mellanox.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      29dded89
    • Russell King's avatar
      net: phylink: avoid resolving link state too early · 87454b6e
      Russell King authored
      During testing on Armada 388 platforms, it was found with a certain
      module configuration that it was possible to trigger a kernel oops
      during the module load process, caused by the phylink resolver being
      triggered for a currently disabled interface.
      
      This problem was introduced by changing the way the SFP registration
      works, which now can result in the sfp link down notification being
      called during phylink_create().
      
      Fixes: b5bfc21a ("net: sfp: do not probe SFP module before we're attached")
      Signed-off-by: default avatarRussell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      87454b6e
    • Matteo Croce's avatar
      geneve: change NET_UDP_TUNNEL dependency to select · a7603ac1
      Matteo Croce authored
      Due to the depends on NET_UDP_TUNNEL, at the moment it is impossible to
      compile GENEVE if no other protocol depending on NET_UDP_TUNNEL is
      selected.
      
      Fix this changing the depends to a select, and drop NET_IP_TUNNEL from the
      select list, as it already depends on NET_UDP_TUNNEL.
      Signed-off-by: default avatarMatteo Croce <mcroce@redhat.com>
      Reviewed-and-tested-by: default avatarAndrea Claudi <aclaudi@redhat.com>
      Tested-by: default avatarDavide Caratti <dcaratti@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a7603ac1
    • Bert Kenward's avatar
      sfc: initialise found bitmap in efx_ef10_mtd_probe · c6528542
      Bert Kenward authored
      The bitmap of found partitions in efx_ef10_mtd_probe was not
      initialised, causing partitions to be suppressed based off whatever
      value was in the bitmap at the start.
      
      Fixes: 33664635 ("sfc: suppress duplicate nvmem partition types in efx_ef10_mtd_probe")
      Signed-off-by: default avatarBert Kenward <bkenward@solarflare.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c6528542
    • David S. Miller's avatar
      Merge tag 'mac80211-for-davem-2019-02-12' of... · 1ea06107
      David S. Miller authored
      Merge tag 'mac80211-for-davem-2019-02-12' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
      
      Johannes Berg says:
      
      ====================
      Just a few fixes:
       * aggregation session teardown with internal TXQs was
         continuing to send some frames marked as aggregation,
         fix from Ilan
       * IBSS join was missed during firmware restart, should
         such a thing happen
       * speculative execution based on the return value of
         cfg80211_classify8021d() - which is controlled by the
         sender of the packet - could be problematic in some
         code using it, prevent it
       * a few peer measurement fixes
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1ea06107
    • Andrea Claudi's avatar
      ipvs: fix dependency on nf_defrag_ipv6 · 098e13f5
      Andrea Claudi authored
      ipvs relies on nf_defrag_ipv6 module to manage IPv6 fragmentation,
      but lacks proper Kconfig dependencies and does not explicitly
      request defrag features.
      
      As a result, if netfilter hooks are not loaded, when IPv6 fragmented
      packet are handled by ipvs only the first fragment makes through.
      
      Fix it properly declaring the dependency on Kconfig and registering
      netfilter hooks on ip_vs_add_service() and ip_vs_new_dest().
      Reported-by: default avatarLi Shuang <shuali@redhat.com>
      Signed-off-by: default avatarAndrea Claudi <aclaudi@redhat.com>
      Acked-by: default avatarJulian Anastasov <ja@ssi.bg>
      Acked-by: default avatarSimon Horman <horms@verge.net.au>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      098e13f5
    • Tuong Lien's avatar
      tipc: fix link session and re-establish issues · 91986ee1
      Tuong Lien authored
      When a link endpoint is re-created (e.g. after a node reboot or
      interface reset), the link session number is varied by random, the peer
      endpoint will be synced with this new session number before the link is
      re-established.
      
      However, there is a shortcoming in this mechanism that can lead to the
      link never re-established or faced with a failure then. It happens when
      the peer endpoint is ready in ESTABLISHING state, the 'peer_session' as
      well as the 'in_session' flag have been set, but suddenly this link
      endpoint leaves. When it comes back with a random session number, there
      are two situations possible:
      
      1/ If the random session number is larger than (or equal to) the
      previous one, the peer endpoint will be updated with this new session
      upon receipt of a RESET_MSG from this endpoint, and the link can be re-
      established as normal. Otherwise, all the RESET_MSGs from this endpoint
      will be rejected by the peer. In turn, when this link endpoint receives
      one ACTIVATE_MSG from the peer, it will move to ESTABLISHED and start
      to send STATE_MSGs, but again these messages will be dropped by the
      peer due to wrong session.
      The peer link endpoint can still become ESTABLISHED after receiving a
      traffic message from this endpoint (e.g. a BCAST_PROTOCOL or
      NAME_DISTRIBUTOR), but since all the STATE_MSGs are invalid, the link
      will be forced down sooner or later!
      
      Even in case the random session number is larger than the previous one,
      it can be that the ACTIVATE_MSG from the peer arrives first, and this
      link endpoint moves quickly to ESTABLISHED without sending out any
      RESET_MSG yet. Consequently, the peer link will not be updated with the
      new session number, and the same link failure scenario as above will
      happen.
      
      2/ Another situation can be that, the peer link endpoint was reset due
      to any reasons in the meantime, its link state was set to RESET from
      ESTABLISHING but still in session, i.e. the 'in_session' flag is not
      reset...
      Now, if the random session number from this endpoint is less than the
      previous one, all the RESET_MSGs from this endpoint will be rejected by
      the peer. In the other direction, when this link endpoint receives a
      RESET_MSG from the peer, it moves to ESTABLISHING and starts to send
      ACTIVATE_MSGs, but all these messages will be rejected by the peer too.
      As a result, the link cannot be re-established but gets stuck with this
      link endpoint in state ESTABLISHING and the peer in RESET!
      
      Solution:
      
      ===========
      
      This link endpoint should not go directly to ESTABLISHED when getting
      ACTIVATE_MSG from the peer which may belong to the old session if the
      link was re-created. To ensure the session to be correct before the
      link is re-established, the peer endpoint in ESTABLISHING state will
      send back the last session number in ACTIVATE_MSG for a verification at
      this endpoint. Then, if needed, a new and more appropriate session
      number will be regenerated to force a re-synch first.
      
      In addition, when a link in ESTABLISHING state is reset, its state will
      move to RESET according to the link FSM, along with resetting the
      'in_session' flag (and the other data) as a normal link reset, it will
      also be deleted if requested.
      
      The solution is backward compatible.
      Acked-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Acked-by: default avatarYing Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarTuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      91986ee1