1. 23 Nov, 2018 23 commits
    • net/mlx5e: Always use the match level enum when parsing TC rule match · 52ae8d6c
      Or Gerlitz authored
      [ Upstream commit 83621b7d ]
      
      We get the match level (none, l2, l3, l4) while going over the match
      dissectors of an offloaded tc rule. When doing this, the match level
      enum values should be used, not the min inline enum values; fix that.
      
      This worked accidentally because both enums have the same numerical values.
      
      Fixes: d708f902 ('net/mlx5e: Get the required HW match level while parsing TC flow matches')
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Reviewed-by: Roi Dayan <roid@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
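
Why a wrong enum can go unnoticed: the following is a minimal sketch, not the driver code. The enum names, their members, and the `unsigned char` output parameter are all assumptions made for illustration. It shows that C accepts a constant from either enum, and that the result is identical when the numeric values happen to line up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two enums; the real driver names differ. */
enum match_level { MATCH_NONE, MATCH_L2, MATCH_L3, MATCH_L4 };
enum inline_mode { INLINE_NONE, INLINE_L2, INLINE_IP, INLINE_TCP_UDP };

/* Buggy variant: stores a constant from the wrong enum. C converts it
 * to an integer silently, so nothing flags the mistake. */
void parse_ip_match_buggy(unsigned char *level)
{
    assert(level != NULL);
    *level = INLINE_IP;
}

/* Fixed variant: stores the constant from the intended enum. */
void parse_ip_match_fixed(unsigned char *level)
{
    assert(level != NULL);
    *level = MATCH_L3;
}
```

Both functions store the same number as long as the two enums stay in step, which is why a bug of this kind stays latent until one of the enums changes.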
    • Revert "sctp: remove sctp_transport_pmtu_check" · 00497302
      Xin Long authored
      [ Upstream commit 69fec325 ]
      
      This reverts commit 22d7be26.
      
      The dst's mtu in transport can be updated from outside sctp, for
      example in xfrm, where the MTU information doesn't get synced between
      asoc, transport and dst, so the pmtu check in sctp_packet_config is
      still needed.
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net/mlx5e: RX, verify received packet size in Linear Striding RQ · 5fa9f2bd
      Moshe Shemesh authored
      [ Upstream commit 0073c8f7 ]
      
      With striding RQ we use MPWRQ (Multi Packet WQE RQ), which means that
      a WQE (RX descriptor) can be used for many packets, so the WQE is much
      bigger than the MTU. In virtualization setups where the port MTU can
      be larger than the VF MTU, a received packet that is bigger than the
      MTU won't be dropped by HW for hitting a too-small receive WQE. If we
      use a linear SKB in striding RQ, each stride only has room for an
      MTU-sized payload plus the skb info, so an oversized packet can cross
      the allocated page boundary and crash on the call to build_skb. The
      driver therefore needs to check the packet size and drop such packets.
      
      Introduce new SW rx counter, rx_oversize_pkts_sw_drop, which counts the
      number of packets dropped by the driver for being too large.
      
      As a new field is added to the RQ struct, re-open the channels whenever
      this field is being used in datapath (i.e., in the case of linear
      Striding RQ).
      
      Fixes: 619a8f2a ("net/mlx5e: Use linear SKB in Striding RQ")
      Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
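
The size check described in this commit can be sketched as follows. This is a simplified model, not the driver's datapath: `struct rq_model`, its `hw_mtu` field, and `rq_accept_packet()` are hypothetical names. Only the counter name echoes the `rx_oversize_pkts_sw_drop` counter mentioned in the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical receive-queue state; field names are illustrative. */
struct rq_model {
    uint32_t hw_mtu;                /* largest packet a stride can hold */
    uint64_t oversize_pkts_sw_drop; /* packets dropped by this check */
};

/* Returns true if the packet fits in one stride and may be turned into
 * a linear SKB; otherwise counts the drop and returns false. */
bool rq_accept_packet(struct rq_model *rq, uint32_t pkt_len)
{
    assert(rq != NULL);
    if (pkt_len > rq->hw_mtu) {
        rq->oversize_pkts_sw_drop++;
        return false;
    }
    return true;
}
```

The check has to run before any buffer is built around the packet data, since the point is to never touch memory past the stride.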
    • net/mlx5e: Adjust to max number of channles when re-attaching · 97cc2cc1
      Yuval Avnery authored
      [ Upstream commit a1f240f1 ]
      
      When the core driver enters the detach/attach flow after a PCI reset,
      the number of logical CPUs may have changed.
      As a result we need to update the CPU-affiliated resource tables:
      	1. indirect rqt list
      	2. eq table
      
      Reproduction (PowerPC):
      	echo 1000 > /sys/kernel/debug/powerpc/eeh_max_freezes
      	ppc64_cpu --smt=on
      	# Restart driver
      	modprobe -r ... ; modprobe ...
      	# Link up
      	ifconfig ...
      	# Only physical CPUs
      	ppc64_cpu --smt=off
      	# Inject PCI errors so PCI will reset - calling the pci error handler
      	echo 0x8000000000000000 > /sys/kernel/debug/powerpc/<PCI BUS>/err_injct_inboundA
      
      Call trace when trying to add non-existing rqs to an indirect rqt:
      	mlx5e_redirect_rqt+0x84/0x260 [mlx5_core] (unreliable)
      	mlx5e_redirect_rqts+0x188/0x190 [mlx5_core]
      	mlx5e_activate_priv_channels+0x488/0x570 [mlx5_core]
      	mlx5e_open_locked+0xbc/0x140 [mlx5_core]
      	mlx5e_open+0x50/0x130 [mlx5_core]
      	mlx5e_nic_enable+0x174/0x1b0 [mlx5_core]
      	mlx5e_attach_netdev+0x154/0x290 [mlx5_core]
      	mlx5e_attach+0x88/0xd0 [mlx5_core]
      	mlx5_attach_device+0x168/0x1e0 [mlx5_core]
      	mlx5_load_one+0x1140/0x1210 [mlx5_core]
      	mlx5_pci_resume+0x6c/0xf0 [mlx5_core]
      
      CQ creation will also fail when trying to use a non-existent EQ.
      
      Fixes: 89d44f0a ("net/mlx5_core: Add pci error handlers to mlx5_core driver")
      Signed-off-by: Yuval Avnery <yuvalav@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
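
One way to picture the adjustment on re-attach is to clamp the configured channel count to the current maximum and rebuild the indirection table. This is a sketch under assumptions: `struct chan_params`, `INDIR_SIZE`, and both function names are hypothetical, and the round-robin fill is only one plausible default.

```c
#include <assert.h>
#include <stddef.h>

#define INDIR_SIZE 8 /* hypothetical table size, kept small for the sketch */

struct chan_params {
    unsigned int num_channels;
    unsigned int indir_rqt[INDIR_SIZE]; /* entry = index of a receive queue */
};

/* Spread the table entries round-robin over the available channels. */
void build_default_indir(unsigned int *rqt, size_t n, unsigned int num_channels)
{
    size_t i;

    assert(rqt != NULL && num_channels > 0);
    for (i = 0; i < n; i++)
        rqt[i] = (unsigned int)(i % num_channels);
}

/* On re-attach, shrink the configuration if fewer channels are possible
 * now, so no table entry points at a queue that no longer exists. */
void adjust_on_attach(struct chan_params *p, unsigned int max_channels)
{
    assert(p != NULL && max_channels > 0);
    if (p->num_channels > max_channels) {
        p->num_channels = max_channels;
        build_default_indir(p->indir_rqt, INDIR_SIZE, max_channels);
    }
}
```

Going from 8 to 4 channels, for example, leaves every table entry below 4 after the adjustment.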
    • net/mlx5e: Claim TC hw offloads support only under a proper build config · 92a2f39f
      Or Gerlitz authored
      [ Upstream commit 077ecd78 ]
      
      Currently, we only support tc hw offloads when the eswitch
      support is compiled in, but we are not gating the advertisement
      of the NETIF_F_HW_TC feature on this config being set.
      
      Fix it, and while doing that, also avoid dealing with the feature
      on ethtool when the config is not set.
      
      Fixes: e8f887ac ('net/mlx5e: Introduce tc offload support')
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Reviewed-by: Roi Dayan <roid@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net/mlx5e: Don't match on vlan non-existence if ethertype is wildcarded · 5351b859
      Or Gerlitz authored
      [ Upstream commit d3a80bb5 ]
      
      For the "all" ethertype we should not care whether the packet has
      vlans. Besides being wrong, the way we did it caused FW error
      for rules such as:
      
      tc filter add dev eth0 protocol all parent ffff: \
      	prio 1 flower skip_sw action drop
      
      because the matching meta-data (outer headers bit in struct mlx5_flow_spec)
      wasn't set. Fix that by matching on vlan non-existence only if we were
      also told to match on the ethertype.
      
      Fixes: cee26487 ('net/mlx5e: Set vlan masks for all offloaded TC rules')
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Reported-by: Slava Ovsiienko <viacheslavo@mellanox.com>
      Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
      Reviewed-by: Roi Dayan <roid@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
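
The rule this commit describes can be modelled as a three-way decision. The sketch below is an illustration only: `struct vlan_match` and `set_vlan_match()` are hypothetical, and the real driver encodes the match in hardware match fields rather than two booleans.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a "VLAN tag present" match: the mask says
 * whether the bit is compared at all, the value says what it must be. */
struct vlan_match {
    bool mask;
    bool value;
};

void set_vlan_match(struct vlan_match *m, bool has_vlan_key, bool ethertype_given)
{
    assert(m != NULL);
    m->mask = false;
    m->value = false;

    if (has_vlan_key) {
        /* Rule matches on a VLAN: the tag must be present. */
        m->mask = true;
        m->value = true;
    } else if (ethertype_given) {
        /* Rule names an ethertype but no VLAN: match untagged only. */
        m->mask = true;
    }
    /* Ethertype wildcarded ("protocol all"): leave the bit unmatched. */
}
```

In this model a "protocol all" rule without a VLAN key leaves the mask cleared, so tagged and untagged packets both match.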
    • tipc: fix link re-establish failure · 961842dc
      Jon Maloy authored
      [ Upstream commit 7ab412d3 ]
      
      When a link failure is detected locally, the link is reset, the flag
      link->in_session is set to false, and a RESET_MSG with the 'stopping'
      bit set is sent to the peer.
      
      The purpose of this bit is to inform the peer that this endpoint is
      just going down, and that the peer should handle the reception of this
      particular RESET message as a local failure. This forces the peer to
      accept another RESET or ACTIVATE message from this endpoint before it
      can re-establish the link. This again is necessary to ensure that
      link session numbers are properly exchanged before the link comes up
      again.
      
      If a failure is detected locally at the same time at the peer endpoint,
      the peer will do the same, which is also correct behavior.
      
      However, when receiving such messages, the endpoints will not
      distinguish between 'stopping' RESETs and ordinary ones when it comes
      to updating session numbers. Both endpoints will copy the received
      session number and set their 'in_session' flags to true at the
      reception, while they are still expecting another RESET from the
      peer before they can go ahead and re-establish. This is contradictory,
      since, after applying the validation check referred to below, the
      'in_session' flag will cause rejection of all such messages, and the
      link will never come up again.
      
      We now fix this by not only handling received RESET/STOPPING messages
      as a local failure, but also by omitting to set a new session number
      and the 'in_session' flag in such cases.
      
      Fixes: 7ea817f4 ("tipc: check session number before accepting link protocol messages")
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: sched: cls_flower: validate nested enc_opts_policy to avoid warning · ed25a206
      Jakub Kicinski authored
      [ Upstream commit 63c82997 ]
      
      TCA_FLOWER_KEY_ENC_OPTS and TCA_FLOWER_KEY_ENC_OPTS_MASK can only
      currently contain further nested attributes, which are parsed by
      hand, so the policy is never actually used resulting in a W=1
      build warning:
      
      net/sched/cls_flower.c:492:1: warning: ‘enc_opts_policy’ defined but not used [-Wunused-const-variable=]
       enc_opts_policy[TCA_FLOWER_KEY_ENC_OPTS_MAX + 1] = {
      
      Add the validation anyway to avoid potential bugs when other
      attributes are added and to make the attribute structure slightly
      more clear. Validation will also set extack to point to the bad
      attribute on error.
      
      Fixes: 0a6e7778 ("net/sched: allow flower to match tunnel options")
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: Simon Horman <simon.horman@netronome.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net/sched: act_pedit: fix memory leak when IDR allocation fails · ae06e2f9
      Davide Caratti authored
      [ Upstream commit 19ab6910 ]
      
      tcf_idr_check_alloc() can return a negative value, on allocation failures
      (-ENOMEM) or IDR exhaustion (-ENOSPC): don't leak keys_ex in these cases.
      
      Fixes: 0190c1d4 ("net: sched: atomically check-allocate action")
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
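
The leak pattern is a buffer allocated before a call that can fail, with no free on the failure path. The sketch below models the fix in user space. `struct action`, `check_alloc_index()`, `action_init()`, and the `live_key_buffers` counter are all hypothetical; only the `keys_ex` name and the error codes come from the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

struct action {
    void *keys_ex; /* owned by the action once init succeeds */
};

int live_key_buffers; /* bookkeeping for the sketch: buffers not yet freed */

/* Hypothetical stand-in for the index allocator: a negative argument
 * simulates failure (for example -ENOMEM or -ENOSPC). */
static int check_alloc_index(int simulated_err)
{
    return simulated_err;
}

int action_init(struct action *act, int simulated_err)
{
    void *keys_ex;
    int err;

    assert(act != NULL);
    keys_ex = malloc(64);
    if (!keys_ex)
        return -ENOMEM;
    live_key_buffers++;

    err = check_alloc_index(simulated_err);
    if (err < 0) {
        /* The fix: release the buffer before bailing out. */
        free(keys_ex);
        live_key_buffers--;
        return err;
    }

    act->keys_ex = keys_ex; /* ownership moves to the action */
    return 0;
}

void action_destroy(struct action *act)
{
    assert(act != NULL);
    if (act->keys_ex) {
        free(act->keys_ex);
        act->keys_ex = NULL;
        live_key_buffers--;
    }
}
```

Without the `free()` in the error branch, every failed initialisation would leave one buffer behind.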
    • net: systemport: Protect stop from timeout · 3d6679c8
      Florian Fainelli authored
      [ Upstream commit 7cb6a2a2 ]
      
      A timing hazard exists when the network interface is stopped that
      allows a watchdog timeout to be processed by a separate core in
      parallel. This creates the potential for the timeout handler to
      wake the queues while the driver is shutting down, or access
      registers after their clocks have been removed.
      
      The more common case is that the watchdog timeout will produce a
      warning message which doesn't lead to a crash. The chances of this
      are greatly increased by the fact that bcm_sysport_netif_stop stops
      the transmit queues which can easily precipitate a watchdog timeout
      because of stale trans_start data in the queues.
      
      This commit corrects the behavior by ensuring that the watchdog
      timeout is disabled before entering bcm_sysport_netif_stop. There
      are currently only two users of the bcm_sysport_netif_stop function:
      close and suspend.
      
      The close case already handles the issue by exiting the RUNNING
      state before invoking the driver close service.
      
      The suspend case now performs the netif_device_detach to exit the
      PRESENT state before the call to bcm_sysport_netif_stop rather than
      after it.
      
      These behaviors prevent any future scheduling of the driver timeout
      service during the window. The netif_tx_stop_all_queues function
      in bcm_sysport_netif_stop is replaced with netif_tx_disable to ensure
      synchronization with any transmit or timeout threads that may
      already be executing on other cores.
      
      For symmetry, the netif_device_attach call upon resume is moved to
      after the call to bcm_sysport_netif_start. Since it wakes the transmit
      queues it is not necessary to invoke netif_tx_start_all_queues from
      bcm_sysport_netif_start so it is moved into the driver open service.
      
      Fixes: 40755a0f ("net: systemport: add suspend and resume support")
      Fixes: 80105bef ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC driver")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tuntap: fix multiqueue rx · 3e8f5d55
      Matthew Cover authored
      [ Upstream commit 8ebebcba ]
      
      When writing packets to a descriptor associated with a combined queue, the
      packets should end up on that queue.
      
      Before this change all packets written to any descriptor associated with a
      tap interface end up on rx-0, even when the descriptor is associated with a
      different queue.
      
      The rx traffic can be generated by either of the following.
        1. a simple tap program which spins up multiple queues and writes packets
           to each of the file descriptors
        2. tx from a qemu vm with a tap multiqueue netdev
      
      The queue for rx traffic can be observed by either of the following (done
      on the hypervisor in the qemu case).
        1. a simple netmap program which opens and reads from per-queue
           descriptors
        2. configuring RPS and doing per-cpu captures with rxtxcpu
      
      Alternatively, if you printk() the return value of skb_get_rx_queue() just
      before each instance of netif_receive_skb() in tun.c, you will get 65535
      for every skb.
      
      Calling skb_record_rx_queue() to set the rx queue to the queue_index fixes
      the association between descriptor and rx queue.
      Signed-off-by: Matthew Cover <matthew.cover@stackpath.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
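
The 65535 reported above is what an unrecorded rx queue looks like when the queue index is stored with an offset of one in a 16-bit field. The model below assumes that encoding. `struct skb_model` and both helpers are simplified stand-ins written for this note, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the relevant skb field. Zero means "no rx
 * queue recorded" under the assumed offset-by-one encoding. */
struct skb_model {
    uint16_t queue_mapping;
};

/* Record the rx queue: stored as index + 1 so that 0 stays "unset". */
void record_rx_queue(struct skb_model *skb, uint16_t rx_queue)
{
    assert(skb != NULL);
    skb->queue_mapping = (uint16_t)(rx_queue + 1);
}

/* Read the rx queue back: an unset mapping wraps around to 65535. */
uint16_t get_rx_queue(const struct skb_model *skb)
{
    assert(skb != NULL);
    return (uint16_t)(skb->queue_mapping - 1);
}
```

Reading an skb that never had its queue recorded yields 65535 in this model, and recording queue 3 makes the getter return 3.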
    • tipc: fix lockdep warning when reinitilaizing sockets · ce209966
      Jon Maloy authored
      [ Upstream commit adba75be ]
      
      We get the following warning:
      
      [   47.926140] 32-bit node address hash set to 2010a0a
      [   47.927202]
      [   47.927433] ================================
      [   47.928050] WARNING: inconsistent lock state
      [   47.928661] 4.19.0+ #37 Tainted: G            E
      [   47.929346] --------------------------------
      [   47.929954] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
      [   47.930116] swapper/3/0 [HC0[0]:SC1[3]:HE1:SE0] takes:
      [   47.930116] 00000000af8bc31e (&(&ht->lock)->rlock){+.?.}, at: rhashtable_walk_enter+0x36/0xb0
      [   47.930116] {SOFTIRQ-ON-W} state was registered at:
      [   47.930116]   _raw_spin_lock+0x29/0x60
      [   47.930116]   rht_deferred_worker+0x556/0x810
      [   47.930116]   process_one_work+0x1f5/0x540
      [   47.930116]   worker_thread+0x64/0x3e0
      [   47.930116]   kthread+0x112/0x150
      [   47.930116]   ret_from_fork+0x3a/0x50
      [   47.930116] irq event stamp: 14044
      [   47.930116] hardirqs last  enabled at (14044): [<ffffffff9a07fbba>] __local_bh_enable_ip+0x7a/0xf0
      [   47.938117] hardirqs last disabled at (14043): [<ffffffff9a07fb81>] __local_bh_enable_ip+0x41/0xf0
      [   47.938117] softirqs last  enabled at (14028): [<ffffffff9a0803ee>] irq_enter+0x5e/0x60
      [   47.938117] softirqs last disabled at (14029): [<ffffffff9a0804a5>] irq_exit+0xb5/0xc0
      [   47.938117]
      [   47.938117] other info that might help us debug this:
      [   47.938117]  Possible unsafe locking scenario:
      [   47.938117]
      [   47.938117]        CPU0
      [   47.938117]        ----
      [   47.938117]   lock(&(&ht->lock)->rlock);
      [   47.938117]   <Interrupt>
      [   47.938117]     lock(&(&ht->lock)->rlock);
      [   47.938117]
      [   47.938117]  *** DEADLOCK ***
      [   47.938117]
      [   47.938117] 2 locks held by swapper/3/0:
      [   47.938117]  #0: 0000000062c64f90 ((&d->timer)){+.-.}, at: call_timer_fn+0x5/0x280
      [   47.938117]  #1: 00000000ee39619c (&(&d->lock)->rlock){+.-.}, at: tipc_disc_timeout+0xc8/0x540 [tipc]
      [   47.938117]
      [   47.938117] stack backtrace:
      [   47.938117] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G            E     4.19.0+ #37
      [   47.938117] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [   47.938117] Call Trace:
      [   47.938117]  <IRQ>
      [   47.938117]  dump_stack+0x5e/0x8b
      [   47.938117]  print_usage_bug+0x1ed/0x1ff
      [   47.938117]  mark_lock+0x5b5/0x630
      [   47.938117]  __lock_acquire+0x4c0/0x18f0
      [   47.938117]  ? lock_acquire+0xa6/0x180
      [   47.938117]  lock_acquire+0xa6/0x180
      [   47.938117]  ? rhashtable_walk_enter+0x36/0xb0
      [   47.938117]  _raw_spin_lock+0x29/0x60
      [   47.938117]  ? rhashtable_walk_enter+0x36/0xb0
      [   47.938117]  rhashtable_walk_enter+0x36/0xb0
      [   47.938117]  tipc_sk_reinit+0xb0/0x410 [tipc]
      [   47.938117]  ? mark_held_locks+0x6f/0x90
      [   47.938117]  ? __local_bh_enable_ip+0x7a/0xf0
      [   47.938117]  ? lockdep_hardirqs_on+0x20/0x1a0
      [   47.938117]  tipc_net_finalize+0xbf/0x180 [tipc]
      [   47.938117]  tipc_disc_timeout+0x509/0x540 [tipc]
      [   47.938117]  ? call_timer_fn+0x5/0x280
      [   47.938117]  ? tipc_disc_msg_xmit.isra.19+0xa0/0xa0 [tipc]
      [   47.938117]  ? tipc_disc_msg_xmit.isra.19+0xa0/0xa0 [tipc]
      [   47.938117]  call_timer_fn+0xa1/0x280
      [   47.938117]  ? tipc_disc_msg_xmit.isra.19+0xa0/0xa0 [tipc]
      [   47.938117]  run_timer_softirq+0x1f2/0x4d0
      [   47.938117]  __do_softirq+0xfc/0x413
      [   47.938117]  irq_exit+0xb5/0xc0
      [   47.938117]  smp_apic_timer_interrupt+0xac/0x210
      [   47.938117]  apic_timer_interrupt+0xf/0x20
      [   47.938117]  </IRQ>
      [   47.938117] RIP: 0010:default_idle+0x1c/0x140
      [   47.938117] Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 41 54 55 53 65 8b 2d d8 2b 74 65 0f 1f 44 00 00 e8 c6 2c 8b ff fb f4 <65> 8b 2d c5 2b 74 65 0f 1f 44 00 00 5b 5d 41 5c c3 65 8b 05 b4 2b
      [   47.938117] RSP: 0018:ffffaf6ac0207ec8 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13
      [   47.938117] RAX: ffff8f5b3735e200 RBX: 0000000000000003 RCX: 0000000000000001
      [   47.938117] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff8f5b3735e200
      [   47.938117] RBP: 0000000000000003 R08: 0000000000000001 R09: 0000000000000000
      [   47.938117] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
      [   47.938117] R13: 0000000000000000 R14: ffff8f5b3735e200 R15: ffff8f5b3735e200
      [   47.938117]  ? default_idle+0x1a/0x140
      [   47.938117]  do_idle+0x1bc/0x280
      [   47.938117]  cpu_startup_entry+0x19/0x20
      [   47.938117]  start_secondary+0x187/0x1c0
      [   47.938117]  secondary_startup_64+0xa4/0xb0
      
      The reason seems to be that tipc_net_finalize()->tipc_sk_reinit() is
      calling the function rhashtable_walk_enter() within a timer interrupt.
      We fix this by executing tipc_net_finalize() in work queue context.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tipc: don't assume linear buffer when reading ancillary data · aaf13772
      Jon Maloy authored
      [ Upstream commit 1c1274a5 ]
      
      The code for reading ancillary data from a received buffer is assuming
      the buffer is linear. To make this assumption true we have to linearize
      the buffer before message data is read.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tg3: Add PHY reset for 5717/5719/5720 in change ring and flow control paths · 710c65c8
      Siva Reddy Kallam authored
      [ Upstream commit 59663e42 ]
      
      This patch avoids a PHY lockup on 5717/5719/5720 in the change ring
      and flow control paths. It solves the RX hang seen when ring or flow
      control parameters are changed continuously under heavy traffic from
      the peer.
      Signed-off-by: Siva Reddy Kallam <siva.kallam@broadcom.com>
      Acked-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tcp: Fix SOF_TIMESTAMPING_RX_HARDWARE to use the latest timestamp during TCP coalescing · 7e678227
      Stephen Mallon authored
      [ Upstream commit cadf9df2 ]
      
      During tcp coalescing ensure that the skb hardware timestamp refers to the
      highest sequence number data.
      Previously only the software timestamp was updated during coalescing.
      Signed-off-by: Stephen Mallon <stephen.mallon@sydney.edu.au>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
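
The behaviour described here amounts to carrying both timestamps over when a later segment is merged into an earlier one. The following is a minimal sketch. `struct seg_model`, its fields, and `coalesce()` are hypothetical and much simpler than the kernel's skb handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a queued TCP segment. */
struct seg_model {
    uint32_t end_seq;      /* sequence number just past the data */
    bool     has_rxtstamp; /* segment carries receive timestamps */
    uint64_t sw_tstamp;    /* software receive timestamp */
    uint64_t hw_tstamp;    /* hardware receive timestamp */
};

/* Merge `from` (the later data) into `to`. After the merge, both
 * timestamps describe the highest-sequence data in `to`. */
void coalesce(struct seg_model *to, const struct seg_model *from)
{
    assert(to != NULL && from != NULL);
    to->end_seq = from->end_seq;
    if (from->has_rxtstamp) {
        to->has_rxtstamp = true;
        to->sw_tstamp = from->sw_tstamp;
        to->hw_tstamp = from->hw_tstamp; /* the part that was missing */
    }
}
```

Copying only `sw_tstamp` would leave the hardware timestamp pointing at older data than the software one.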
    • sctp: not allow to set asoc prsctp_enable by sockopt · 7e86081c
      Xin Long authored
      [ Upstream commit cc3ccf26 ]
      
      As rfc7496#section4.5 says about SCTP_PR_SUPPORTED:
      
         This socket option allows the enabling or disabling of the
         negotiation of PR-SCTP support for future associations.  For existing
         associations, it allows one to query whether or not PR-SCTP support
         was negotiated on a particular association.
      
      It means only sctp sock's prsctp_enable can be set.
      
      Note that for the limitation of SCTP_{CURRENT|ALL}_ASSOC, we will
      add it when introducing SCTP_{FUTURE|CURRENT|ALL}_ASSOC for linux
      sctp in another patchset.
      
      v1->v2:
        - drop the params.assoc_id check as Neil suggested.
      
      Fixes: 28aa4c26 ("sctp: add SCTP_PR_SUPPORTED on sctp sockopt")
      Reported-by: Ying Xu <yinxu@redhat.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net-gro: reset skb->pkt_type in napi_reuse_skb() · a21a82a9
      Eric Dumazet authored
      [ Upstream commit 33d9a2c7 ]
      
      eth_type_trans() assumes initial value for skb->pkt_type
      is PACKET_HOST.
      
      This is indeed the value right after a fresh skb allocation.
      
      However, it is possible that GRO merged a packet with a different
      value (like PACKET_OTHERHOST in case macvlan is used), so
      we need to make sure napi->skb will have pkt_type set back to
      PACKET_HOST.
      
      Otherwise, valid packets might be dropped by the stack because
      their pkt_type is not PACKET_HOST.
      
      napi_reuse_skb() was added in commit 96e93eab ("gro: Add
      internal interfaces for VLAN"), but this bug has always
      been there.
      
      Fixes: 96e93eab ("gro: Add internal interfaces for VLAN")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
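
Why a stale pkt_type survives reuse: the classifier only ever moves the type away from "host", so a recycled buffer must be reset explicitly. The model below is an illustration under that assumption. The enum, `struct skb_model`, `classify()`, and `reuse_skb()` are hypothetical simplifications.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum pkt_type_model { PKT_HOST, PKT_OTHERHOST };

struct skb_model {
    enum pkt_type_model pkt_type;
};

/* Classifier that assumes the buffer starts out as PKT_HOST: it only
 * overwrites the type when the frame is not addressed to this host. */
void classify(struct skb_model *skb, bool addressed_to_us)
{
    assert(skb != NULL);
    if (!addressed_to_us)
        skb->pkt_type = PKT_OTHERHOST;
}

/* Recycling step with the fix applied: restore the initial type. */
void reuse_skb(struct skb_model *skb)
{
    assert(skb != NULL);
    skb->pkt_type = PKT_HOST;
}
```

If the reset is skipped, a buffer left at `PKT_OTHERHOST` keeps that type even for a frame addressed to this host, which is how valid packets end up dropped.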
    • net: bcmgenet: protect stop from timeout · 852c280d
      Doug Berger authored
      A timing hazard exists when the network interface is stopped that
      allows a watchdog timeout to be processed by a separate core in
      parallel. This creates the potential for the timeout handler to
      wake the queues while the driver is shutting down, or access
      registers after their clocks have been removed.
      
      The more common case is that the watchdog timeout will produce a
      warning message which doesn't lead to a crash. The chances of this
      are greatly increased by the fact that bcmgenet_netif_stop stops
      the transmit queues which can easily precipitate a watchdog timeout
      because of stale trans_start data in the queues.
      
      This commit corrects the behavior by ensuring that the watchdog
      timeout is disabled before entering bcmgenet_netif_stop. There
      are currently only two users of the bcmgenet_netif_stop function:
      close and suspend.
      
      The close case already handles the issue by exiting the RUNNING
      state before invoking the driver close service.
      
      The suspend case now performs the netif_device_detach to exit the
      PRESENT state before the call to bcmgenet_netif_stop rather than
      after it.
      
      These behaviors prevent any future scheduling of the driver timeout
      service during the window. The netif_tx_stop_all_queues function
      in bcmgenet_netif_stop is replaced with netif_tx_disable to ensure
      synchronization with any transmit or timeout threads that may
      already be executing on other cores.
      
      For symmetry, the netif_device_attach call upon resume is moved to
      after the call to bcmgenet_netif_start. Since it wakes the transmit
      queues it is not necessary to invoke netif_tx_start_all_queues from
      bcmgenet_netif_start so it is moved into the driver open service.
      
      [ Upstream commit 09e805d2 ]
      
      Fixes: 1c1008c7 ("net: bcmgenet: add main driver file")
      Signed-off-by: Doug Berger <opendmb@gmail.com>
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ipv6: Fix PMTU updates for UDP/raw sockets in presence of VRF · 5bb115fb
      David Ahern authored
      [ Upstream commit 7ddacfa5 ]
      
      Preethi reported that PMTU discovery for UDP/raw applications is not
      working in the presence of VRF when the socket is not bound to a device.
      The problem is that ip6_sk_update_pmtu does not consider the L3 domain
      of the skb device if the socket is not bound. Update the function to
      set oif to the L3 master device if relevant.
      
      Fixes: ca254490 ("net: Add VRF support to IPv6 stack")
      Reported-by: Preethi Ramachandra <preethir@juniper.net>
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ipv6: fix a dst leak when removing its exception · b536dd80
      Xin Long authored
      [ Upstream commit 761f6026 ]
      
      There is no need to hold dst before calling rt6_remove_exception_rt().
      The call to dst_hold_safe() in ip6_link_failure() was for ip6_del_rt(),
      which has been removed in Commit 93531c67 ("net/ipv6: separate
      handling of FIB entries from dst based routes"). Otherwise, it will
      cause a dst leak.
      
      This patch is to simply remove the dst_hold_safe() call before calling
      rt6_remove_exception_rt() and also do the same in ip6_del_cached_rt().
      It's safe, because the removal of the exception that holds its dst's
      refcnt is protected by rt6_exception_lock.
      
      Fixes: 93531c67 ("net/ipv6: separate handling of FIB entries from dst based routes")
      Fixes: 23fb93a4 ("net/ipv6: Cleanup exception and cache route handling")
      Reported-by: Li Shuang <shuali@redhat.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ip_tunnel: don't force DF when MTU is locked · 60258098
      Sabrina Dubroca authored
      [ Upstream commit 16f7eb2b ]
      
      The various types of tunnels running over IPv4 can ask to set the DF
      bit to do PMTU discovery. However, PMTU discovery is subject to the
      threshold set by the net.ipv4.route.min_pmtu sysctl, and is also
      disabled on routes with "mtu lock". In those cases, we shouldn't set
      the DF bit.
      
      This patch makes setting the DF bit conditional on the route's MTU
      locking state.
      
      This issue seems to be older than git history.
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
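
The conditional described above reduces to a small decision: the tunnel's wish to set DF is honoured only when the route's MTU is not locked. This is a sketch under assumptions. `tunnel_frag_off()` and the `DF_FLAG` macro are hypothetical names, and the flag is shown in host byte order for simplicity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DF_FLAG 0x4000u /* "don't fragment" bit of the IPv4 frag_off field */

/* Returns the frag_off flags for the outer header. A locked MTU means
 * PMTU discovery is off for this route, so DF must not be forced. */
uint16_t tunnel_frag_off(bool tunnel_wants_df, bool route_mtu_locked)
{
    if (route_mtu_locked)
        return 0;
    return tunnel_wants_df ? (uint16_t)DF_FLAG : 0;
}
```

With a locked MTU the function returns 0 regardless of the tunnel's own setting.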
    • ibmvnic: fix accelerated VLAN handling · a6870825
      Michał Mirosław authored
      [ Upstream commit e84b4794 ]
      
      Don't request tag insertion when it isn't present in outgoing skb.
      Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • flow_dissector: do not dissect l4 ports for fragments · ad6dfbd1
      배석진 authored
      [ Upstream commit 62230715 ]
      
      Only first fragment has the sport/dport information,
      not the following ones.
      
      If we want consistent hash for all fragments, we need to
      ignore ports even for first fragment.
      
      This bug is visible for IPv6 traffic, if incoming fragments
      do not have a flow label, since skb_get_hash() will give
      different results for first fragment and following ones.
      
      It is also visible if any routing rule wants dissection
      and sport or dport.
      
      See commit 5e5d6fed ("ipv6: route: dissect flow
      in input path if fib rules need it") for details.
      
      [edumazet] rewrote the changelog completely.
      
      Fixes: 06635a35 ("flow_dissect: use programable dissector in skb_flow_dissect and friends")
      Signed-off-by: 배석진 <soukjin.bae@samsung.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
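
The consistency argument can be illustrated with a toy flow hash that leaves the ports out for every fragment, the first one included. This is a sketch only. `struct pkt_model`, `flow_hash()`, and the mixing function are hypothetical and unrelated to the kernel's real hash.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet summary. Non-first fragments carry no L4 header,
 * so their ports are simply zero in this model. */
struct pkt_model {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
    bool     is_fragment; /* true for every fragment, first included */
};

/* Toy hash: addresses always, ports only for unfragmented packets. */
uint32_t flow_hash(const struct pkt_model *p)
{
    uint32_t h;

    assert(p != NULL);
    h = p->saddr * 31u + p->daddr;
    if (!p->is_fragment)
        h = h * 31u + (((uint32_t)p->sport << 16) | p->dport);
    return h;
}
```

Because ports are skipped whenever `is_fragment` is set, the first fragment (which has ports) and the later ones (which do not) hash to the same value.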
  2. 21 Nov, 2018 17 commits