1. 10 Aug, 2015 8 commits
    • Benjamin Poirier's avatar
      net-timestamp: Update skb_complete_tx_timestamp comment · 7a76a021
      Benjamin Poirier authored
      After "62bccb8c net-timestamp: Make the clone operation stand-alone from phy
      timestamping" the hwtstamps parameter of skb_complete_tx_timestamp() may no
      longer be NULL.
      Signed-off-by: default avatarBenjamin Poirier <bpoirier@suse.com>
      Cc: Alexander Duyck <alexander.h.duyck@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7a76a021
    • Florian Westphal's avatar
      ipv6: don't reject link-local nexthop on other interface · 330567b7
      Florian Westphal authored
      48ed7b26 ("ipv6: reject locally assigned nexthop addresses") is too
      strict; it rejects following corner-case:
      
      ip -6 route add default via fe80::1:2:3 dev eth1
      
      [ where fe80::1:2:3 is assigned to a local interface, but not eth1 ]
      
      Fix this by restricting search to given device if nh is linklocal.
      
      Joint work with Hannes Frederic Sowa.
      
      Fixes: 48ed7b26 ("ipv6: reject locally assigned nexthop addresses")
      Signed-off-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      330567b7
    • Daniel Borkmann's avatar
      netlink: make sure -EBUSY won't escape from netlink_insert · 4e7c1330
      Daniel Borkmann authored
      Linus reports the following deadlock on rtnl_mutex; triggered only
      once so far (extract):
      
      [12236.694209] NetworkManager  D 0000000000013b80     0  1047      1 0x00000000
      [12236.694218]  ffff88003f902640 0000000000000000 ffffffff815d15a9 0000000000000018
      [12236.694224]  ffff880119538000 ffff88003f902640 ffffffff81a8ff84 00000000ffffffff
      [12236.694230]  ffffffff81a8ff88 ffff880119c47f00 ffffffff815d133a ffffffff81a8ff80
      [12236.694235] Call Trace:
      [12236.694250]  [<ffffffff815d15a9>] ? schedule_preempt_disabled+0x9/0x10
      [12236.694257]  [<ffffffff815d133a>] ? schedule+0x2a/0x70
      [12236.694263]  [<ffffffff815d15a9>] ? schedule_preempt_disabled+0x9/0x10
      [12236.694271]  [<ffffffff815d2c3f>] ? __mutex_lock_slowpath+0x7f/0xf0
      [12236.694280]  [<ffffffff815d2cc6>] ? mutex_lock+0x16/0x30
      [12236.694291]  [<ffffffff814f1f90>] ? rtnetlink_rcv+0x10/0x30
      [12236.694299]  [<ffffffff8150ce3b>] ? netlink_unicast+0xfb/0x180
      [12236.694309]  [<ffffffff814f5ad3>] ? rtnl_getlink+0x113/0x190
      [12236.694319]  [<ffffffff814f202a>] ? rtnetlink_rcv_msg+0x7a/0x210
      [12236.694331]  [<ffffffff8124565c>] ? sock_has_perm+0x5c/0x70
      [12236.694339]  [<ffffffff814f1fb0>] ? rtnetlink_rcv+0x30/0x30
      [12236.694346]  [<ffffffff8150d62c>] ? netlink_rcv_skb+0x9c/0xc0
      [12236.694354]  [<ffffffff814f1f9f>] ? rtnetlink_rcv+0x1f/0x30
      [12236.694360]  [<ffffffff8150ce3b>] ? netlink_unicast+0xfb/0x180
      [12236.694367]  [<ffffffff8150d344>] ? netlink_sendmsg+0x484/0x5d0
      [12236.694376]  [<ffffffff810a236f>] ? __wake_up+0x2f/0x50
      [12236.694387]  [<ffffffff814cad23>] ? sock_sendmsg+0x33/0x40
      [12236.694396]  [<ffffffff814cb05e>] ? ___sys_sendmsg+0x22e/0x240
      [12236.694405]  [<ffffffff814cab75>] ? ___sys_recvmsg+0x135/0x1a0
      [12236.694415]  [<ffffffff811a9d12>] ? eventfd_write+0x82/0x210
      [12236.694423]  [<ffffffff811a0f9e>] ? fsnotify+0x32e/0x4c0
      [12236.694429]  [<ffffffff8108cb70>] ? wake_up_q+0x60/0x60
      [12236.694434]  [<ffffffff814cba09>] ? __sys_sendmsg+0x39/0x70
      [12236.694440]  [<ffffffff815d4797>] ? entry_SYSCALL_64_fastpath+0x12/0x6a
      
      It seems so far plausible that the recursive call into rtnetlink_rcv()
      looks suspicious. One way, where this could trigger is that the senders
      NETLINK_CB(skb).portid was wrongly 0 (which is rtnetlink socket), so
      the rtnl_getlink() request's answer would be sent to the kernel instead
      to the actual user process, thus grabbing rtnl_mutex() twice.
      
      One theory would be that netlink_autobind() triggered via netlink_sendmsg()
      internally overwrites the -EBUSY error to 0, but where it is wrongly
      originating from __netlink_insert() instead. That would reset the
      socket's portid to 0, which is then filled into NETLINK_CB(skb).portid
      later on. As commit d470e3b4 ("[NETLINK]: Fix two socket hashing bugs.")
      also puts it, -EBUSY should not be propagated from netlink_insert().
      
      It looks like it's very unlikely to reproduce. We need to trigger the
      rhashtable_insert_rehash() handler under a situation where rehashing
      currently occurs (one /rare/ way would be to hit ht->elasticity limits
      while not filled enough to expand the hashtable, but that would rather
      require a specifically crafted bind() sequence with knowledge about
      destination slots, seems unlikely). It probably makes sense to guard
      __netlink_insert() in any case and remap that error. It was suggested
      that EOVERFLOW might be better than an already overloaded ENOMEM.
      
      Reference: http://thread.gmane.org/gmane.linux.network/372676Reported-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Acked-by: default avatarThomas Graf <tgraf@suug.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4e7c1330
    • Ivan Vecera's avatar
      bna: fix interrupts storm caused by erroneous packets · ade4dc3e
      Ivan Vecera authored
      The commit "e29aa339 bna: Enable Multi Buffer RX" moved packets counter
      increment from the beginning of the NAPI processing loop after the check
      for erroneous packets so they are never accounted. This counter is used
      to inform firmware about number of processed completions (packets).
      As these packets are never acked the firmware fires IRQs for them again
      and again.
      
      Fixes: e29aa339 ("bna: Enable Multi Buffer RX")
      Signed-off-by: default avatarIvan Vecera <ivecera@redhat.com>
      Acked-by: default avatarRasesh Mody <rasesh.mody@qlogic.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ade4dc3e
    • David S. Miller's avatar
      Merge branch 'mvpp2-fixes' · ea708584
      David S. Miller authored
      Marcin Wojtas says:
      
      ====================
      Fixes for the network driver of Marvell Armada 375 SoC
      
      This is a set of three patches that fix long-lasting problems implemented in
      the initial support for the Armada 375 network controller.
      
      Due to an inappropriate concept of handling the per-CPU sent packets'
      processing on TX path the driver numerous problems occured, such as RCU
      stalls. Those have been fixed, of which details you can find in the commit
      logs. The patches were intensively tested on top of v4.2-rc5.
      
      I'm looking forward to any comments or remarks.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ea708584
    • Marcin Wojtas's avatar
      net: mvpp2: replace TX coalescing interrupts with hrtimer · edc660fa
      Marcin Wojtas authored
      The PP2 controller is capable of per-CPU TX processing, which means there are
      per-CPU banked register sets and queues. Current version of the driver supports
      TX packet coalescing - once on given CPU sent packets amount reaches a threshold
      value, an IRQ occurs. However, there is a single interrupt line responsible for
      CPU0/1 TX and RX events (the latter is not per-CPU, the hardware does not
      support RSS).
      
      When the top-half executes the interrupt cause is not known. This is why in
      NAPI poll function, along with RX processing, IRQ cause register on both
      CPU's is accessed in order to determine on which of them the TX coalescing
      threshold might have been reached. Thus the egress processing and releasing the
      buffers is able to take place on the corresponding CPU. Hitherto approach lead
      to an illegal usage of on_each_cpu function in softirq context.
      
      The problem is solved by resigning from TX coalescing interrupts and separating
      egress finalization from NAPI processing. For that purpose a method of using
      hrtimer is introduced. In main transmit function (mvpp2_tx) buffers are released
      once a software coalescing threshold is reached. In case not all the data is
      processed a timer is set on this CPU - in its interrupt context a tasklet is
      scheduled in which all queues are processed. At once only one timer per-CPU can
      be running, which is controlled by a dedicated flag.
      
      This commit removes TX processing from NAPI polling function, disables hardware
      coalescing and enables hrtimer with tasklet, using new per-CPU port structure
      (mvpp2_port_pcpu).
      Signed-off-by: default avatarMarcin Wojtas <mw@semihalf.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      edc660fa
    • Marcin Wojtas's avatar
      net: mvpp2: enable proper per-CPU TX buffers unmapping · 71ce391d
      Marcin Wojtas authored
      mvpp2 driver allows usage of per-CPU TX processing. Once the packets are
      prepared independetly on each CPU, the hardware enqueues the descriptors in
      common TX queue. After they are sent, the buffers and associated sk_buffs
      should be released on the corresponding CPU.
      
      This is why a special index is maintained in order to point to the right data to
      be released after transmission takes place. Each per-CPU TX queue comprise an
      array of sent sk_buffs, freed in mvpp2_txq_bufs_free function. However, the
      index was used there also for obtaining a descriptor (and therefore a buffer to
      be DMA-unmapped) from common TX queue, which was wrong, because it was not
      referring to the current CPU.
      
      This commit enables proper unmapping of sent data buffers by indexing them in
      per-CPU queues using a dedicated array for keeping their physical addresses.
      Signed-off-by: default avatarMarcin Wojtas <mw@semihalf.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      71ce391d
    • Marcin Wojtas's avatar
      net: mvpp2: remove excessive spinlocks from driver initialization · d53793c5
      Marcin Wojtas authored
      Using spinlocks protection during one-time driver initialization is not
      necessary. Moreover it resulted in invalid GFP_KERNEL allocation under the lock.
      
      This commit removes redundant spinlocks from buffer manager part of mvpp2
      initialization.
      Signed-off-by: default avatarMarcin Wojtas <mw@semihalf.com>
      Reported-by: default avatarAlexandre Fournier <alexandre.fournier@wisp-e.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d53793c5
  2. 07 Aug, 2015 18 commits
  3. 04 Aug, 2015 6 commits
  4. 03 Aug, 2015 7 commits
  5. 01 Aug, 2015 1 commit
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 7c764cec
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Must teardown SR-IOV before unregistering netdev in igb driver, from
          Alex Williamson.
      
       2) Fix ipv6 route unreachable crash in IPVS, from Alex Gartrell.
      
       3) Default route selection in ipv4 should take the prefix length, table
          ID, and TOS into account, from Julian Anastasov.
      
       4) sch_plug must have a reset method in order to purge all buffered
          packets when the qdisc is reset, likewise for sch_choke, from WANG
          Cong.
      
       5) Fix deadlock and races in slave_changelink/br_setport in bridging.
          From Nikolay Aleksandrov.
      
       6) mlx4 bug fixes (wrong index in port even propagation to VFs,
          overzealous BUG_ON assertion, etc.) from Ido Shamay, Jack
          Morgenstein, and Or Gerlitz.
      
       7) Turn off klog message about SCTP userspace interface compat that
          makes no sense at all, from Daniel Borkmann.
      
       8) Fix unbounded restarts of inet frag eviction process, causing NMI
          watchdog soft lockup messages, from Florian Westphal.
      
       9) Suspend/resume fixes for r8152 from Hayes Wang.
      
      10) Fix busy loop when MSG_WAITALL|MSG_PEEK is used in TCP recv, from
          Sabrina Dubroca.
      
      11) Fix performance regression when removing a lot of routes from the
          ipv4 routing tables, from Alexander Duyck.
      
      12) Fix device leak in AF_PACKET, from Lars Westerhoff.
      
      13) AF_PACKET also has a header length comparison bug due to signedness,
          from Alexander Drozdov.
      
      14) Fix bug in EBPF tail call generation on x86, from Daniel Borkmann.
      
      15) Memory leaks, TSO stats, watchdog timeout and other fixes to
          thunderx driver from Sunil Goutham and Thanneeru Srinivasulu.
      
      16) act_bpf can leak memory when replacing programs, from Daniel
          Borkmann.
      
      17) WOL packet fixes in gianfar driver, from Claudiu Manoil.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (79 commits)
        stmmac: fix missing MODULE_LICENSE in stmmac_platform
        gianfar: Enable device wakeup when appropriate
        gianfar: Fix suspend/resume for wol magic packet
        gianfar: Fix warning when CONFIG_PM off
        act_pedit: check binding before calling tcf_hash_release()
        net: sk_clone_lock() should only do get_net() if the parent is not a kernel socket
        net: sched: fix refcount imbalance in actions
        r8152: reset device when tx timeout
        r8152: add pre_reset and post_reset
        qlcnic: Fix corruption while copying
        act_bpf: fix memory leaks when replacing bpf programs
        net: thunderx: Fix for crash while BGX teardown
        net: thunderx: Add PCI driver shutdown routine
        net: thunderx: Fix crash when changing rss with mutliple traffic flows
        net: thunderx: Set watchdog timeout value
        net: thunderx: Wakeup TXQ only if CQE_TX are processed
        net: thunderx: Suppress alloc_pages() failure warnings
        net: thunderx: Fix TSO packet statistic
        net: thunderx: Fix memory leak when changing queue count
        net: thunderx: Fix RQ_DROP miscalculation
        ...
      7c764cec