1. 10 Dec, 2013 24 commits
    • 73af614a
      Jiri Pirko authored
    • neigh: wrap proc dointvec functions · cb5b09c1
      Jiri Pirko authored
      This will be needed later on to provide better management of default values.
      Signed-off-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • neigh: convert parms to an array · 1f9248e5
      Jiri Pirko authored
      This patch converts the neigh param members to an array. This allows easier
      manipulation which will be needed later on to provide better management of
      default values.
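      The shape of the change, roughly (identifier names are illustrative and
      may not match the patch exactly):

      	struct neigh_parms {
      		...
      		/* was: one named int member per tunable */
      		int data[NEIGH_VAR_DATA_MAX];
      	};

      	/* accessor indexed by a per-parameter enum value */
      	#define NEIGH_VAR(p, attr) ((p)->data[NEIGH_VAR_ ## attr])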
      Signed-off-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'phy_reset' · 65be6291
      David S. Miller authored
      Florian Fainelli says:
      
      ====================
      net: phy: consolidate PHY reset
      
      This patchset consolidates the PHY reset through the MII BMCR
      register by using a central place where this is done.
      
      This patchset resumes the work Kyle Moffett started here:
      https://lkml.org/lkml/2011/10/20/301
      
      Note that at this point, drivers doing funky things after issuing
      a PHY reset using phy_init_hw() will still suffer from PHY state
      machine problems; this will be taken care of later on.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sh_eth: do not issue a wild PHY reset through BMCR · 0c9eb5b9
      Florian Fainelli authored
      The sh_eth driver issues an uncontrolled PHY reset through the MII
      register BMCR but fails to wait for the reset to complete, and will also
      implicitly wipe out any previously applied PHY fixups. Use phy_init_hw()
      which remedies both problems.
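      The pattern of the conversion, roughly (not the exact driver diff):

      	/* before: uncontrolled reset, no wait for completion, fixups lost */
      	phy_write(phydev, MII_BMCR, BMCR_RESET);

      	/* after: resets, waits for BMCR_RESET to clear, re-applies fixups */
      	phy_init_hw(phydev);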
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: tc35815: use phy_init_hw for PHY reset · 01b0114e
      Florian Fainelli authored
      Instead of open-coding the PHY reset through MII BMCR, use phy_init_hw()
      which does that for us and also makes sure that any PHY specific fixups
      are applied.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: pxa168_eth: use phy_init_hw for PHY reset · 78de53f0
      Florian Fainelli authored
      Instead of open-coding a PHY reset through the MII BMCR register, use
      phy_init_hw() which does this for us and ensures that PHY device fixups
      are also applied. We also remove a call to ethernet_phy_reset() which is
      now unnecessary since phy_attach() calls phy_attach_direct() which in
      turn calls phy_init_hw().
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mv643xx_eth: use phy_init_hw to reset PHY · 7cd14636
      Florian Fainelli authored
      Instead of open-coding a PHY reset through the MII BMCR register, use
      phy_init_hw() which does that for us and will also make sure that PHY
      fixups are applied if required. We also remove a call to phy_reset()
      due to the following sequence of calls in the driver:
      
      phy_scan()
      	-> phy_connect()
      		-> phy_connect_direct()
      			-> phy_attach_direct()
      				-> phy_init_hw()
      
      and we only have a call to phy_init() after phy_scan().
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Tested-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: consolidate PHY reset in phy_init_hw() · 87aa9f9c
      Florian Fainelli authored
      There are quite a lot of drivers touching a PHY device MII_BMCR
      register to reset the PHY without taking care of:
      
      1) ensuring that BMCR_RESET is cleared after a given timeout
      2) the PHY state machine resuming to the proper state and re-applying
      potentially changed settings such as auto-negotiation
      
      Introduce phy_poll_reset() which will take care of polling the MII_BMCR
      for the BMCR_RESET bit to be cleared after a given timeout or return a
      timeout error code.
      
      In order to make sure the PHY is in a correct state, phy_init_hw() first
      issues a software reset through MII_BMCR and then applies any fixups.
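      A sketch of the polling logic described above (retry count and delays
      are illustrative):

      	static int phy_poll_reset(struct phy_device *phydev)
      	{
      		/* poll until BMCR_RESET clears, or give up after ~600 ms */
      		unsigned int retries = 12;
      		int ret;

      		do {
      			msleep(50);
      			ret = phy_read(phydev, MII_BMCR);
      			if (ret < 0)
      				return ret;
      		} while ((ret & BMCR_RESET) && --retries);

      		return (ret & BMCR_RESET) ? -ETIMEDOUT : 0;
      	}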
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bfin_mac: do not reset PHY after phy_start() · 06d87cec
      Florian Fainelli authored
      The PHY is already reset during driver probing, and this manual reset
      after calling phy_start() will wipe out board-specific PHY fixups and
      driver specific configuration initialization. Remove that explicit PHY
      reset.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: greth: use phy_read_status() · f0528ce7
      Florian Fainelli authored
      In case the greth driver is bound to anything but the Generic PHY
      driver or the PHY has a special read_status callback implemented,
      unexpected things will happen. Make sure that we use
      phy_read_status() which does the proper abstraction of calling the
      driver specific read_status() callback for a given PHY.
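      For reference, phy_read_status() is essentially a thin dispatcher to the
      bound driver's callback (sketch):

      	static inline int phy_read_status(struct phy_device *phydev)
      	{
      		/* driver-specific read_status(), not always genphy_read_status() */
      		return phydev->drv->read_status(phydev);
      	}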
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: use phy_init_hw instead of open-coding it · 2613f95f
      Florian Fainelli authored
      Use phy_init_hw() instead of open-coding it in phy_mii_ioctl(), this
      improves consistency and makes sure that we will not duplicate the same
      routine somewhere else.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: report link partner features through ethtool · 114002bc
      Florian Fainelli authored
      The PHY library already reads the MII_STAT1000 and MII_LPA registers in
      genphy_read_status(), so extend it to also populate the PHY device link
      partner advertised features such that we can feed this back into ethtool
      when asked for it in phy_ethtool_gset().
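      Roughly, the additions look like this (helper names as found in
      linux/mii.h; the lp_advertising field name is assumed from the
      description above):

      	lpa = phy_read(phydev, MII_LPA);
      	phydev->lp_advertising |= mii_lpa_to_ethtool_lpa_t(lpa);

      	lpagb = phy_read(phydev, MII_STAT1000);
      	phydev->lp_advertising |= mii_stat1000_to_ethtool_lpa_t(lpagb);

      	/* later, in phy_ethtool_gset() */
      	cmd->lp_advertising = phydev->lp_advertising;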
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tun: remove useless codes in tun_chr_aio_read() and tun_recvmsg() · 73713357
      Zhi Yong Wu authored
      Inspection of the related code shows that ret can never exceed len or
      total_len, so remove the useless checks from both of the above functions.
      Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • macvtap: remove useless codes in macvtap_aio_read() and macvtap_recvmsg() · 41e4af69
      Zhi Yong Wu authored
      Inspection of the related code shows that ret can never exceed len or
      total_len, so remove the useless checks from both of the above functions.
      Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • xen-netback: improve guest-receive-side flow control · ca2f09f2
      Paul Durrant authored
      The way that flow control works without this patch is that, in start_xmit()
      the code uses xenvif_count_skb_slots() to predict how many slots
      xenvif_gop_skb() will consume and then adds this to a 'req_cons_peek'
      counter which it then uses to determine if the shared ring has that amount
      of space available by checking whether 'req_prod' has passed that value.
      If the ring doesn't have space the tx queue is stopped.
      xenvif_gop_skb() will then consume slots and update 'req_cons' and issue
      responses, updating 'rsp_prod' as it goes. The frontend will consume those
      responses and post new requests, by updating req_prod. So, req_prod chases
      req_cons which chases rsp_prod, and can never exceed that value. Thus if
      xenvif_count_skb_slots() ever returns a number of slots greater than
      xenvif_gop_skb() uses, req_cons_peek will get to a value that req_prod cannot
      possibly achieve (since it's limited by the 'real' req_cons) and, if this
      happens enough times, req_cons_peek gets more than a ring size ahead of
      req_cons and the tx queue then remains stopped forever waiting for an
      unachievable amount of space to become available in the ring.
      
      Having two routines trying to calculate the same value is always going to be
      fragile, so this patch does away with that. All we essentially need to do is
      make sure that we have 'enough stuff' on our internal queue without letting
      it build up uncontrollably. So start_xmit() makes a cheap optimistic check
      of how much space is needed for an skb and only turns the queue off if that
      is unachievable. net_rx_action() is the place where we could do with an
      accurate prediction but, since that has proven tricky to calculate, a cheap
      worst-case (but not too bad) estimate is all we really need since the only
      thing we *must* prevent is xenvif_gop_skb() consuming more slots than are
      available.
      
      Without this patch I can trivially stall netback permanently by just doing
      a large guest to guest file copy between two Windows Server 2008R2 VMs on a
      single host.
      
      Patch tested with frontends in:
      - Windows Server 2008R2
      - CentOS 6.0
      - Debian Squeeze
      - Debian Wheezy
      - SLES11
      Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
      Cc: Wei Liu <wei.liu2@citrix.com>
      Cc: Ian Campbell <ian.campbell@citrix.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Annie Li <annie.li@oracle.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Acked-by: Wei Liu <wei.liu2@citrix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: remove interface state mirroring in bearer · 512137ee
      Erik Hugne authored
      struct 'tipc_bearer' is a generic representation of the underlying
      media type, and exists in a one-to-one relationship to each interface
      TIPC is using. The struct contains a 'blocked' flag that mirrors the
      operational and execution state of the represented interface, and is
      updated through notification calls from the latter. The users of
      tipc_bearer are checking this flag before each attempt to send a
      packet via the interface.
      
      This state mirroring serves no purpose in the current code base. TIPC
      links will not discover a media failure any faster through this
      mechanism, and in reality the flag only adds overhead at packet
      sending and reception.
      
      Furthermore, the fact that the flag needs to be protected by a spinlock
      aggregated into tipc_bearer has turned out to cause a serious and
      completely unnecessary deadlock problem.
      
      CPU0                                    CPU1
      ----                                    ----
      Time 0: bearer_disable()                link_timeout()
      Time 1:   spin_lock_bh(&b_ptr->lock)      tipc_link_push_queue()
      Time 2:   tipc_link_delete()                tipc_bearer_blocked(b_ptr)
      Time 3:     k_cancel_timer(&req->timer)       spin_lock_bh(&b_ptr->lock)
      Time 4:       del_timer_sync(&req->timer)
      
      I.e., del_timer_sync() on CPU0 never returns, because the timer handler
      on CPU1 is waiting for the bearer lock.
      
      We eliminate the 'blocked' flag from struct tipc_bearer, along with all
      tests on this flag. This not only resolves the deadlock, but also
      simplifies and speeds up the data path execution of TIPC. It also fits
      well into our ongoing effort to make the locking policy simpler and
      more manageable.
      
      An effect of this change is that we can get rid of functions such as
      tipc_bearer_blocked(), tipc_continue() and tipc_block_bearer().
      We replace the latter with a new function, tipc_reset_bearer(), which
      resets all links associated to the bearer immediately after an
      interface goes down.
      
      A user might notice one slight change in link behaviour after this
      change. When an interface goes down, (e.g. through a NETDEV_DOWN
      event) all attached links will be reset immediately, instead of
      leaving it to each link to detect the failure through a timer-driven
      mechanism. We consider this an improvement, and see no obvious risks
      with the new behavior.
      Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Reviewed-by: Paul Gortmaker <Paul.Gortmaker@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • x25: convert printks to pr_<level> · b73e9e3c
      wangweidong authored
      use pr_<level> instead of printk(LEVEL)
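      The conversion is mechanical, e.g. (illustrative line, not taken from
      the actual patch):

      	- printk(KERN_ERR "X.25: cannot register protocol\n");
      	+ pr_err("X.25: cannot register protocol\n");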
      Suggested-by: Joe Perches <joe@perches.com>
      Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • packet: introduce PACKET_QDISC_BYPASS socket option · d346a3fa
      Daniel Borkmann authored
      This patch introduces a PACKET_QDISC_BYPASS socket option, that
      allows for using a similar xmit() function as in pktgen instead
      of taking the dev_queue_xmit() path. This can be very useful when
      PF_PACKET applications are required to be used in a similar
      scenario as pktgen, but with full, flexible packet payload that
      needs to be provided, for example.
      
      By default, nothing changes in behaviour for normal PF_PACKET
      TX users, so everything stays as is for applications. New users,
      however, can now set PACKET_QDISC_BYPASS if needed, which i) prevents
      their own packets from reentering packet_rcv() and ii) pushes the frame
      directly to the driver.
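      A minimal usage sketch from the application side (error handling
      omitted):

      	int one = 1;
      	int fd = socket(PF_PACKET, SOCK_RAW, htons(ETH_P_ALL));

      	setsockopt(fd, SOL_PACKET, PACKET_QDISC_BYPASS, &one, sizeof(one));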
      
      In doing so we can increase pps (here 64 byte packets) for
      PF_PACKET a bit:
      
        # CPUs -- QDISC_BYPASS   -- qdisc path -- qdisc path[**]
        1 CPU  ==  1,509,628 pps --  1,208,708 --  1,247,436
        2 CPUs ==  3,198,659 pps --  2,536,012 --  1,605,779
        3 CPUs ==  4,787,992 pps --  3,788,740 --  1,735,610
        4 CPUs ==  6,173,956 pps --  4,907,799 --  1,909,114
        5 CPUs ==  7,495,676 pps --  5,956,499 --  2,014,422
        6 CPUs ==  9,001,496 pps --  7,145,064 --  2,155,261
        7 CPUs == 10,229,776 pps --  8,190,596 --  2,220,619
        8 CPUs == 11,040,732 pps --  9,188,544 --  2,241,879
        9 CPUs == 12,009,076 pps -- 10,275,936 --  2,068,447
       10 CPUs == 11,380,052 pps -- 11,265,337 --  1,578,689
       11 CPUs == 11,672,676 pps -- 11,845,344 --  1,297,412
       [...]
       20 CPUs == 11,363,192 pps -- 11,014,933 --  1,245,081
      
       [**]: qdisc path with packet_rcv(), which is probably how most people
             use it (hopefully not anymore if they don't need it)
      
      The test was done using a modified trafgen, sending a simple
      static 64 bytes packet, on all CPUs.  The trick in the fast
      "qdisc path" case, is to avoid reentering packet_rcv() by
      setting the RAW socket protocol to zero, like:
      socket(PF_PACKET, SOCK_RAW, 0);
      
      The tradeoffs are documented in this patch as well: clearly, if
      queues are busy, we will drop more packets, tc disciplines are
      ignored, and these packets are not visible to taps anymore. For
      a pktgen-like scenario, we argue that this is acceptable.
      
      The pointer to the xmit function has been placed in a packet
      socket structure hole between cached_dev and prot_hook, which is
      hot anyway as we're working on cached_dev in each send path.
      
      Done in joint work together with Jesper Dangaard Brouer.
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dev: move inline skb_needs_linearize helper to header · 4262e5cc
      Daniel Borkmann authored
      As we need it elsewhere, move the inline helper function of
      skb_needs_linearize() over to skbuff.h include file. While
      at it, also convert the return to 'bool' instead of 'int'
      and add a proper kernel doc.
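      For reference, the helper reads roughly as follows after the move
      (sketch; see skbuff.h for the exact version):

      	static inline bool skb_needs_linearize(struct sk_buff *skb,
      					       netdev_features_t features)
      	{
      		return skb_is_nonlinear(skb) &&
      		       ((skb_has_frag_list(skb) && !(features & NETIF_F_FRAGLIST)) ||
      			(skb_shinfo(skb)->nr_frags && !(features & NETIF_F_SG)));
      	}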
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 34f9f437
      David S. Miller authored
      Merge 'net' into 'net-next' to get the AF_PACKET bug fix that
      Daniel's direct transmit changes depend upon.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • packet: fix send path when running with proto == 0 · 66e56cd4
      Daniel Borkmann authored
      Commit e40526cb introduced a cached dev pointer that gets
      hooked into register_prot_hook() and __unregister_prot_hook() to
      update the device used for the send path.
      
      We need to fix this up, as otherwise this will not work with
      sockets created with protocol = 0, plus with sll_protocol = 0
      passed via sockaddr_ll when doing the bind.
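      A sketch of that socket-creation variant (interface name illustrative):

      	struct sockaddr_ll ll = {
      		.sll_family   = AF_PACKET,
      		.sll_protocol = 0,	/* no protocol bound for RX */
      		.sll_ifindex  = if_nametoindex("eth0"),
      	};
      	int fd = socket(PF_PACKET, SOCK_RAW, 0);	/* protocol == 0 */

      	bind(fd, (struct sockaddr *)&ll, sizeof(ll));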
      
      So instead, assign the pointer directly. The compiler can inline
      these helper functions automagically.
      
      While at it, also mark the cached dev fast-path as likely(),
      and document this variant of socket creation as it seems it is
      not widely used (seems not even the author of TX_RING was aware
      of that in his reference example [1]). Tested with reproducer
      from e40526cb.
      
       [1] http://wiki.ipxwarzone.com/index.php5?title=Linux_packet_mmap#Example
      
      Fixes: e40526cb ("packet: fix use after free race in send path when dev is released")
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Tested-by: Salam Noureddine <noureddine@aristanetworks.com>
      Tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • pkt_sched: give visibility to mq slave qdiscs · 95dc1929
      Eric Dumazet authored
      Commit 6da7c8fc ("qdisc: allow setting default queuing discipline")
      added the ability to change the default qdisc from pfifo_fast to, say, fq.
      
      But as most modern ethernet devices are multiqueue, we can't really
      see all the statistics from "tc -s qdisc show", as the default root
      qdisc is mq.
      
      This patch adds the calls to qdisc_list_add() to mq and mqprio.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next · fbec3706
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      Intel Wired LAN Driver Updates
      
      This series contains updates to i40e only.
      
      Jacob provides an i40e patch to get 1588 working correctly by separating
      the TSYNVALID and TSYNINDX fields in the receive descriptor.
      
      Jesse provides several i40e patches, first to correct the checking
      of the multi-bit state.  The hash is reported correctly in the RSS
      field if and only if the filter status is 3.  Other values of the
      filter status mean different things and we should not depend on a
      bitwise result.  Then provides a patch to enable a couple of
      workarounds based on revision ID that allow the driver to work
      more fully on early hardware.
      
      Shannon provides several i40e patches as well.  First sets the media
      type in the hardware structure based on the external connection type.
      Then provides a patch to only setup the rings that will be used.  Lastly
      provides a fix where the TESTING state was still set when exiting the
      ethtool diagnostics.
      
      Kevin Scott provides one i40e patch to add a new flag to the i40e_add_veb()
      which allows the driver to request the hardware to filter on layer 2
      parameters.
      
      Anjali provides four i40e patches, first refactors the reset code in
      order to re-size queues and vectors while the interface is still up.
      Then provides a patch to enable all PCTYPEs except FCoE for RSS.  Adds
      a message to notify the user of how many VFs are initialized on each
      port.  Lastly adds a new variable to track the number of PF instances;
      this is a global counter on purpose so that each PF loaded has a
      unique ID.
      
      Catherine bumps the driver version.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 09 Dec, 2013 12 commits
  3. 07 Dec, 2013 4 commits