1. 12 Aug, 2015 6 commits
    • David S. Miller's avatar
      Merge branch 'gianfar-fixes' · e941ba86
      David S. Miller authored
      Jakub Kicinski says:
      
      ====================
      gianfar: filer changes
      
      respinning with examples as requested.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e941ba86
    • Jakub Kicinski's avatar
      gianfar: remove faulty filer optimizer · 1f2b7293
      Jakub Kicinski authored
      Current filer rule optimization is broken in several ways:
       (1) Can perform reads/writes beyond end of allocated tables.
           (gianfar_ethtool.c:1326).
      
      (2) It breaks badly for rules with more than 2 specifiers
           (e.g. matching ip, port, tos).
      
      Example:
      # ethtool -N eth2 flow-type udp4 dst-ip 10.0.0.1 dst-port 1 tos 1 action 1
      Added rule with ID 254
      # ethtool -N eth2 flow-type udp4 dst-ip 10.0.0.2 dst-port 2 tos 2 action 9
      Added rule with ID 253
      # ethtool -N eth2 flow-type udp4 dst-ip 10.0.0.3 dst-port 3 tos 3 action 17
      Added rule with ID 252
      # ./filer_decode /sys/kernel/debug/gfar1/filer_raw
      00: MASK == 00000210 AND         Q:00           ctrl:00000080 prop:00000210
      01: FPR  == 00000210 AND CLE     Q:00           ctrl:00000281 prop:00000210
      02: MASK == ffffffff AND         Q:00           ctrl:00000080 prop:ffffffff
      03: DPT  == 00000003 AND         Q:00           ctrl:0000008e prop:00000003
      04: TOS  == 00000003 AND         Q:00           ctrl:0000008a prop:00000003
      05: DIA  == 0a000003 AND         Q:11           ctrl:0000448c prop:0a000003
      06: DPT  == 00000002 AND         Q:00           ctrl:0000008e prop:00000002
      07: TOS  == 00000002 AND         Q:00           ctrl:0000008a prop:00000002
      08: DIA  == 0a000002 AND         Q:09           ctrl:0000248c prop:0a000002
      09: DIA  == 0a000001 AND         Q:00           ctrl:0000008c prop:0a000001
      0a: DPT  == 00000001 AND         Q:00           ctrl:0000008e prop:00000001
      0b: TOS  == 00000001     CLE     Q:01           ctrl:0000060a prop:00000001
      ff: MASK >= 00000000             Q:00           ctrl:00000020 prop:00000000
      
      (Entire cluster gets AND-ed together).
      
       (3) We observed that the masking rules it generates do not
           play well with clustering on P2020.  Only first rule
           of the cluster would ever fire.  Given that optimizer
           relies heavily on masking this is very hard to fix.
      
      Example:
      # ethtool -N eth2 flow-type udp4 dst-ip 10.0.0.1 dst-port 1  action 1
      Added rule with ID 254
      # ethtool -N eth2 flow-type udp4 dst-ip 10.0.0.2 dst-port 2  action 9
      Added rule with ID 253
      # ethtool -N eth2 flow-type udp4 dst-ip 10.0.0.3 dst-port 3  action 17
      Added rule with ID 252
      # ./filer_decode /sys/kernel/debug/gfar1/filer_raw
      00: MASK == 00000210 AND         Q:00           ctrl:00000080 prop:00000210
      01: FPR  == 00000210 AND CLE     Q:00           ctrl:00000281 prop:00000210
      02: MASK == ffffffff AND         Q:00           ctrl:00000080 prop:ffffffff
      03: DPT  == 00000003 AND         Q:00           ctrl:0000008e prop:00000003
      04: DIA  == 0a000003             Q:11           ctrl:0000440c prop:0a000003
      05: DPT  == 00000002 AND         Q:00           ctrl:0000008e prop:00000002
      06: DIA  == 0a000002             Q:09           ctrl:0000240c prop:0a000002
      07: DIA  == 0a000001 AND         Q:00           ctrl:0000008c prop:0a000001
      08: DPT  == 00000001     CLE     Q:01           ctrl:0000060e prop:00000001
      ff: MASK >= 00000000             Q:00           ctrl:00000020 prop:00000000
      
      Which looks correct according to the spec but only the first
      (eth id 252)/last added rule for 10.0.0.3 will ever trigger.
      As if filer did not treat the AND CLE as cluster start but
      also kept AND-ing the rules.  We found no errata covering this.
      
      The fact that nobody noticed (2) or (3) makes me think
      that this feature is not very widely used and we should just
      remove it.
      Reported-by: default avatarAleksander Dutkowski <adutkowski@gmail.com>
      Signed-off-by: default avatarJakub Kicinski <kubakici@wp.pl>
      Acked-by: default avatarClaudiu Manoil <claudiu.manoil@freescale.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1f2b7293
    • Jakub Kicinski's avatar
      gianfar: correct list membership accounting · b5c8c890
      Jakub Kicinski authored
      At a cost of one line let's make sure .count is correct
      when calling gfar_process_filer_changes().
      Signed-off-by: default avatarJakub Kicinski <kubakici@wp.pl>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b5c8c890
    • Jakub Kicinski's avatar
      gianfar: correct filer table writing · a898fe04
      Jakub Kicinski authored
      MAX_FILER_IDX is the last usable index.  Using less-than
      will already guarantee that one entry for catch-all rule
      will be left, no need to subtract 1 here.
      Signed-off-by: default avatarJakub Kicinski <kubakici@wp.pl>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a898fe04
    • Venkat Venkatsubra's avatar
      bonding: Gratuitous ARP gets dropped when first slave added · b02e3e94
      Venkat Venkatsubra authored
      When the first slave is added (such as during bootup) the first
      gratuitous ARP gets dropped. We don't see this drop during a failover.
      The packet gets dropped in qdisc (noop_enqueue).
      
      The fix is to delay the sending of gratuitous ARPs till the bond dev's
      carrier is present.
      
      It can also be worked around by setting num_grat_arp to more than 1.
      Signed-off-by: default avatarVenkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b02e3e94
    • Florian Fainelli's avatar
      net: dsa: Do not override PHY interface if already configured · 211c504a
      Florian Fainelli authored
      In case we need to divert reads/writes using the slave MII bus, we may have
      already fetched a valid PHY interface property from Device Tree, and that
      mode is used by the PHY driver to make configuration decisions.
      
      If we could not fetch the "phy-mode" property, we will assign p->phy_interface
      to PHY_INTERFACE_MODE_NA, such that we can actually check for that condition as
      to whether or not we should override the interface value.
      
      Fixes: 19334920 ("net: dsa: Set valid phy interface type")
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      211c504a
  2. 11 Aug, 2015 7 commits
  3. 10 Aug, 2015 14 commits
    • David S. Miller's avatar
      Merge branch 'bnx2x-fixes' · 875a74b6
      David S. Miller authored
      Yuval Mintz says:
      
      ====================
      bnx2x: small fixes
      
      This adds 2 small fixes, one to error flows during memory release
      and the other to flash writes via ethtool API.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      875a74b6
    • Yuval Mintz's avatar
      bnx2x: Free NVRAM lock at end of each page · 0ea853df
      Yuval Mintz authored
      Writing each 4Kb page into flash might take up-to ~100 miliseconds,
      during which time management firmware cannot acces the nvram for its
      own uses.
      
      Firmware upgrade utility use the ethtool API to burn new flash images
      for the device via the ethtool API, doing so by writing several page-worth
      of data on each command. Such action might create problems for the
      management firmware, as the nvram might not be accessible for a long time.
      
      This patch changes the write implementation, releasing the nvram lock on
      the completion of each page, allowing the management firmware time to
      claim it and perform its own required actions.
      Signed-off-by: default avatarYuval Mintz <Yuval.Mintz@qlogic.com>
      Signed-off-by: default avatarAriel Elior <Ariel.Elior@qlogic.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0ea853df
    • Yuval Mintz's avatar
      bnx2x: Prevent null pointer dereference on SKB release · e1615903
      Yuval Mintz authored
      On error flows its possible to free an SKB even if it was not allocated.
      Signed-off-by: default avatarYuval Mintz <Yuval.Mintz@qlogic.com>
      Signed-off-by: default avatarAriel Elior <Ariel.Elior@qlogic.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e1615903
    • Dan Carpenter's avatar
      cxgb4: missing curly braces in t4_setup_debugfs() · 21a44763
      Dan Carpenter authored
      There were missing curly braces so it means we call add_debugfs_mem()
      unintentionally.
      
      Fixes: 3ccc6cf7 ('cxgb4: Adds support for T6 adapter')
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      21a44763
    • Benjamin Poirier's avatar
      net-timestamp: Update skb_complete_tx_timestamp comment · 7a76a021
      Benjamin Poirier authored
      After "62bccb8c net-timestamp: Make the clone operation stand-alone from phy
      timestamping" the hwtstamps parameter of skb_complete_tx_timestamp() may no
      longer be NULL.
      Signed-off-by: default avatarBenjamin Poirier <bpoirier@suse.com>
      Cc: Alexander Duyck <alexander.h.duyck@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7a76a021
    • Florian Westphal's avatar
      ipv6: don't reject link-local nexthop on other interface · 330567b7
      Florian Westphal authored
      48ed7b26 ("ipv6: reject locally assigned nexthop addresses") is too
      strict; it rejects following corner-case:
      
      ip -6 route add default via fe80::1:2:3 dev eth1
      
      [ where fe80::1:2:3 is assigned to a local interface, but not eth1 ]
      
      Fix this by restricting search to given device if nh is linklocal.
      
      Joint work with Hannes Frederic Sowa.
      
      Fixes: 48ed7b26 ("ipv6: reject locally assigned nexthop addresses")
      Signed-off-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      330567b7
    • Daniel Borkmann's avatar
      netlink: make sure -EBUSY won't escape from netlink_insert · 4e7c1330
      Daniel Borkmann authored
      Linus reports the following deadlock on rtnl_mutex; triggered only
      once so far (extract):
      
      [12236.694209] NetworkManager  D 0000000000013b80     0  1047      1 0x00000000
      [12236.694218]  ffff88003f902640 0000000000000000 ffffffff815d15a9 0000000000000018
      [12236.694224]  ffff880119538000 ffff88003f902640 ffffffff81a8ff84 00000000ffffffff
      [12236.694230]  ffffffff81a8ff88 ffff880119c47f00 ffffffff815d133a ffffffff81a8ff80
      [12236.694235] Call Trace:
      [12236.694250]  [<ffffffff815d15a9>] ? schedule_preempt_disabled+0x9/0x10
      [12236.694257]  [<ffffffff815d133a>] ? schedule+0x2a/0x70
      [12236.694263]  [<ffffffff815d15a9>] ? schedule_preempt_disabled+0x9/0x10
      [12236.694271]  [<ffffffff815d2c3f>] ? __mutex_lock_slowpath+0x7f/0xf0
      [12236.694280]  [<ffffffff815d2cc6>] ? mutex_lock+0x16/0x30
      [12236.694291]  [<ffffffff814f1f90>] ? rtnetlink_rcv+0x10/0x30
      [12236.694299]  [<ffffffff8150ce3b>] ? netlink_unicast+0xfb/0x180
      [12236.694309]  [<ffffffff814f5ad3>] ? rtnl_getlink+0x113/0x190
      [12236.694319]  [<ffffffff814f202a>] ? rtnetlink_rcv_msg+0x7a/0x210
      [12236.694331]  [<ffffffff8124565c>] ? sock_has_perm+0x5c/0x70
      [12236.694339]  [<ffffffff814f1fb0>] ? rtnetlink_rcv+0x30/0x30
      [12236.694346]  [<ffffffff8150d62c>] ? netlink_rcv_skb+0x9c/0xc0
      [12236.694354]  [<ffffffff814f1f9f>] ? rtnetlink_rcv+0x1f/0x30
      [12236.694360]  [<ffffffff8150ce3b>] ? netlink_unicast+0xfb/0x180
      [12236.694367]  [<ffffffff8150d344>] ? netlink_sendmsg+0x484/0x5d0
      [12236.694376]  [<ffffffff810a236f>] ? __wake_up+0x2f/0x50
      [12236.694387]  [<ffffffff814cad23>] ? sock_sendmsg+0x33/0x40
      [12236.694396]  [<ffffffff814cb05e>] ? ___sys_sendmsg+0x22e/0x240
      [12236.694405]  [<ffffffff814cab75>] ? ___sys_recvmsg+0x135/0x1a0
      [12236.694415]  [<ffffffff811a9d12>] ? eventfd_write+0x82/0x210
      [12236.694423]  [<ffffffff811a0f9e>] ? fsnotify+0x32e/0x4c0
      [12236.694429]  [<ffffffff8108cb70>] ? wake_up_q+0x60/0x60
      [12236.694434]  [<ffffffff814cba09>] ? __sys_sendmsg+0x39/0x70
      [12236.694440]  [<ffffffff815d4797>] ? entry_SYSCALL_64_fastpath+0x12/0x6a
      
      It seems so far plausible that the recursive call into rtnetlink_rcv()
      looks suspicious. One way, where this could trigger is that the senders
      NETLINK_CB(skb).portid was wrongly 0 (which is rtnetlink socket), so
      the rtnl_getlink() request's answer would be sent to the kernel instead
      to the actual user process, thus grabbing rtnl_mutex() twice.
      
      One theory would be that netlink_autobind() triggered via netlink_sendmsg()
      internally overwrites the -EBUSY error to 0, but where it is wrongly
      originating from __netlink_insert() instead. That would reset the
      socket's portid to 0, which is then filled into NETLINK_CB(skb).portid
      later on. As commit d470e3b4 ("[NETLINK]: Fix two socket hashing bugs.")
      also puts it, -EBUSY should not be propagated from netlink_insert().
      
      It looks like it's very unlikely to reproduce. We need to trigger the
      rhashtable_insert_rehash() handler under a situation where rehashing
      currently occurs (one /rare/ way would be to hit ht->elasticity limits
      while not filled enough to expand the hashtable, but that would rather
      require a specifically crafted bind() sequence with knowledge about
      destination slots, seems unlikely). It probably makes sense to guard
      __netlink_insert() in any case and remap that error. It was suggested
      that EOVERFLOW might be better than an already overloaded ENOMEM.
      
      Reference: http://thread.gmane.org/gmane.linux.network/372676Reported-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Acked-by: default avatarThomas Graf <tgraf@suug.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4e7c1330
    • Ivan Vecera's avatar
      bna: fix interrupts storm caused by erroneous packets · ade4dc3e
      Ivan Vecera authored
      The commit "e29aa339 bna: Enable Multi Buffer RX" moved packets counter
      increment from the beginning of the NAPI processing loop after the check
      for erroneous packets so they are never accounted. This counter is used
      to inform firmware about number of processed completions (packets).
      As these packets are never acked the firmware fires IRQs for them again
      and again.
      
      Fixes: e29aa339 ("bna: Enable Multi Buffer RX")
      Signed-off-by: default avatarIvan Vecera <ivecera@redhat.com>
      Acked-by: default avatarRasesh Mody <rasesh.mody@qlogic.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ade4dc3e
    • David S. Miller's avatar
      Merge branch 'mvpp2-fixes' · ea708584
      David S. Miller authored
      Marcin Wojtas says:
      
      ====================
      Fixes for the network driver of Marvell Armada 375 SoC
      
      This is a set of three patches that fix long-lasting problems implemented in
      the initial support for the Armada 375 network controller.
      
      Due to an inappropriate concept of handling the per-CPU sent packets'
      processing on TX path the driver numerous problems occured, such as RCU
      stalls. Those have been fixed, of which details you can find in the commit
      logs. The patches were intensively tested on top of v4.2-rc5.
      
      I'm looking forward to any comments or remarks.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ea708584
    • Marcin Wojtas's avatar
      net: mvpp2: replace TX coalescing interrupts with hrtimer · edc660fa
      Marcin Wojtas authored
      The PP2 controller is capable of per-CPU TX processing, which means there are
      per-CPU banked register sets and queues. Current version of the driver supports
      TX packet coalescing - once on given CPU sent packets amount reaches a threshold
      value, an IRQ occurs. However, there is a single interrupt line responsible for
      CPU0/1 TX and RX events (the latter is not per-CPU, the hardware does not
      support RSS).
      
      When the top-half executes the interrupt cause is not known. This is why in
      NAPI poll function, along with RX processing, IRQ cause register on both
      CPU's is accessed in order to determine on which of them the TX coalescing
      threshold might have been reached. Thus the egress processing and releasing the
      buffers is able to take place on the corresponding CPU. Hitherto approach lead
      to an illegal usage of on_each_cpu function in softirq context.
      
      The problem is solved by resigning from TX coalescing interrupts and separating
      egress finalization from NAPI processing. For that purpose a method of using
      hrtimer is introduced. In main transmit function (mvpp2_tx) buffers are released
      once a software coalescing threshold is reached. In case not all the data is
      processed a timer is set on this CPU - in its interrupt context a tasklet is
      scheduled in which all queues are processed. At once only one timer per-CPU can
      be running, which is controlled by a dedicated flag.
      
      This commit removes TX processing from NAPI polling function, disables hardware
      coalescing and enables hrtimer with tasklet, using new per-CPU port structure
      (mvpp2_port_pcpu).
      Signed-off-by: default avatarMarcin Wojtas <mw@semihalf.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      edc660fa
    • Marcin Wojtas's avatar
      net: mvpp2: enable proper per-CPU TX buffers unmapping · 71ce391d
      Marcin Wojtas authored
      mvpp2 driver allows usage of per-CPU TX processing. Once the packets are
      prepared independetly on each CPU, the hardware enqueues the descriptors in
      common TX queue. After they are sent, the buffers and associated sk_buffs
      should be released on the corresponding CPU.
      
      This is why a special index is maintained in order to point to the right data to
      be released after transmission takes place. Each per-CPU TX queue comprise an
      array of sent sk_buffs, freed in mvpp2_txq_bufs_free function. However, the
      index was used there also for obtaining a descriptor (and therefore a buffer to
      be DMA-unmapped) from common TX queue, which was wrong, because it was not
      referring to the current CPU.
      
      This commit enables proper unmapping of sent data buffers by indexing them in
      per-CPU queues using a dedicated array for keeping their physical addresses.
      Signed-off-by: default avatarMarcin Wojtas <mw@semihalf.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      71ce391d
    • Marcin Wojtas's avatar
      net: mvpp2: remove excessive spinlocks from driver initialization · d53793c5
      Marcin Wojtas authored
      Using spinlocks protection during one-time driver initialization is not
      necessary. Moreover it resulted in invalid GFP_KERNEL allocation under the lock.
      
      This commit removes redundant spinlocks from buffer manager part of mvpp2
      initialization.
      Signed-off-by: default avatarMarcin Wojtas <mw@semihalf.com>
      Reported-by: default avatarAlexandre Fournier <alexandre.fournier@wisp-e.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d53793c5
    • Phil Sutter's avatar
      netfilter: SYNPROXY: fix sending window update to client · 3c16241c
      Phil Sutter authored
      Upon receipt of SYNACK from the server, ipt_SYNPROXY first sends back an ACK to
      finish the server handshake, then calls nf_ct_seqadj_init() to initiate
      sequence number adjustment of forwarded packets to the client and finally sends
      a window update to the client to unblock it's TX queue.
      
      Since synproxy_send_client_ack() does not set synproxy_send_tcp()'s nfct
      parameter, no sequence number adjustment happens and the client receives the
      window update with incorrect sequence number. Depending on client TCP
      implementation, this leads to a significant delay (until a window probe is
      being sent).
      Signed-off-by: default avatarPhil Sutter <phil@nwl.cc>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      3c16241c
    • Phil Sutter's avatar
      netfilter: ip6t_SYNPROXY: fix NULL pointer dereference · 96fffb4f
      Phil Sutter authored
      This happens when networking namespaces are enabled.
      Suggested-by: default avatarPatrick McHardy <kaber@trash.net>
      Signed-off-by: default avatarPhil Sutter <phil@nwl.cc>
      Acked-by: default avatarPatrick McHardy <kaber@trash.net>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      96fffb4f
  4. 07 Aug, 2015 13 commits