1. 13 Feb, 2018 15 commits
    • net: Convert rtnetlink_net_ops · 46456675
      Kirill Tkhai authored
      rtnetlink_net_init() and rtnetlink_net_exit()
      create and destroy netlink socket net::rtnl.

      The socket is used to send rtnl notification via
      rtnl_net_notifyid(). There is no problem
      creating and destroying it in parallel with other
      pernet operations, as we link net in setup_net()
      after the socket is created, and destroy it
      in cleanup_net() after net is unhashed from all
      the lists and there are no RCU references to it.
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Acked-by: Andrei Vagin <avagin@virtuozzo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Convert netlink_net_ops · 194b95d2
      Kirill Tkhai authored
      The methods of netlink_net_ops create and destroy the "netlink"
      file, which is not interesting to foreign pernet_operations.
      So, netlink_net_ops may safely be made async.
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Acked-by: Andrei Vagin <avagin@virtuozzo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Convert net_defaults_ops · ff291d00
      Kirill Tkhai authored
      net_defaults_ops introduces only the net_defaults_init_net method,
      and it acts on net::core::sysctl_somaxconn, which
      is not interesting to the rest of the pernet_subsys and
      pernet_device lists. So, make it async.
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Acked-by: Andrei Vagin <avagin@virtuozzo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Convert net_inuse_ops · 604da74e
      Kirill Tkhai authored
      net_inuse_ops methods expose statistics in /proc.
      None of the rest of the pernet_subsys or pernet_device
      lists touches net::core::inuse.

      So, it's safe to make net_inuse_ops async.
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Acked-by: Andrei Vagin <avagin@virtuozzo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Convert nf_log_net_ops · c9d8fb91
      Kirill Tkhai authored
      The pernet_operations would have had a problem with parallel
      execution alongside others if init_net could be released.
      But it can't, and the rest is safe for that.
      There is a memory allocation, which nobody else is interested in,
      and sysctl registration. So, we make them async.
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Acked-by: Andrei Vagin <avagin@virtuozzo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Convert netfilter_net_ops · 95499299
      Kirill Tkhai authored
      Methods netfilter_net_init() and netfilter_net_exit()
      initialize net::nf::hooks and change the net-related proc
      directory of net. Other pernet_operations are not
      interested in foreign net::nf::hooks or proc entries,
      so it's safe to execute them in parallel with
      methods of other pernet operations.
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Acked-by: Andrei Vagin <avagin@virtuozzo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Convert sysctl_pernet_ops · 93d230fe
      Kirill Tkhai authored
      This patch starts to convert pernet_subsys, registered
      from core initcalls.
      
      Methods sysctl_net_init() and sysctl_net_exit() initialize
      net::sysctls table of a namespace.
      
      pernet_operations::init()/exit() methods from the rest
      of the list do not touch net::sysctls of strangers,
      so it's safe to execute sysctl_pernet_ops's methods
      in parallel with any other pernet_operations.
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Acked-by: Andrei Vagin <avagin@virtuozzo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Convert net_ns_ops methods · 3fc3b827
      Kirill Tkhai authored
      This patch starts to convert pernet_subsys, registered
      from pure initcalls.

      The net_ns_ops::net_ns_net_init/net_ns_net_exit methods use only
      ida_simple_* functions, which do not need synchronization.
      They are synchronized by the idr subsystem.

      So, net_ns_ops methods are able to be executed
      in parallel with methods of other pernet operations.
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Acked-by: Andrei Vagin <avagin@virtuozzo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Convert proc_net_ns_ops · f039e184
      Kirill Tkhai authored
      This patch starts to convert pernet_subsys, registered
      before initcalls.

      proc_net_ns_ops::proc_net_ns_init()/proc_net_ns_exit()
      {un,}register pernet net->proc_net and ->proc_net_stat.

      Constructors and destructors of other pernet_operations
      are not interested in a foreign net's proc_net and proc_net_stat.
      Proc filesystem primitives are synchronized on proc_subdir_lock.

      So, proc_net_ns_ops methods are able to be executed
      in parallel with methods of any other pernet operations.
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Acked-by: Andrei Vagin <avagin@virtuozzo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Allow pernet_operations to be executed in parallel · 447cd7a0
      Kirill Tkhai authored
      This adds a new pernet_operations::async flag to indicate operations
      whose ->init(), ->exit() and ->exit_batch() methods are allowed
      to be executed in parallel with the methods of any other pernet_operations.

      When there are only asynchronous pernet_operations in the system,
      net_mutex won't be taken for a net construction and destruction.

      Also, remove BUG_ON(mutex_is_locked()) from net_assign_generic()
      without replacing it with the equivalent net_sem check, as there is
      one more lockdep assert below.

      v3: Add comment near net_mutex.
      Suggested-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Acked-by: Andrei Vagin <avagin@virtuozzo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
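The rule described in this commit (net_mutex is skipped when every registered pernet_operations is async) can be sketched in plain userspace C. This is only an illustration under stated assumptions: `struct pernet_ops_sketch`, `nr_sync_ops()` and `need_net_mutex()` are hypothetical names, not the kernel's actual data structures or functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pernet_operations; only the flag matters. */
struct pernet_ops_sketch {
	const char *name;
	bool async;	/* methods may run in parallel with other pernet_operations */
};

/* Count the registered operations that still require serialization. */
static size_t nr_sync_ops(const struct pernet_ops_sketch *ops, size_t n)
{
	size_t i, count = 0;

	assert(ops != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (!ops[i].async)
			count++;
	return count;
}

/* The mutex is needed only while at least one non-async operation exists. */
static bool need_net_mutex(const struct pernet_ops_sketch *ops, size_t n)
{
	return nr_sync_ops(ops, n) > 0;
}
```

With a list made only of async entries, `need_net_mutex()` returns false; adding a single non-async entry makes it return true again, which is why the conversion has to proceed one pernet_operations at a time.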
    • net: Move mutex_unlock() in cleanup_net() up · bcab1ddd
      Kirill Tkhai authored
      net_sem protects against pernet_list changing, while
      ops_free_list() does a simple kfree(), and it can't
      race with other pernet_operations callbacks.

      So we may release net_mutex earlier than before.
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Acked-by: Andrei Vagin <avagin@virtuozzo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Introduce net_sem for protection of pernet_list · 1a57feb8
      Kirill Tkhai authored
      Currently, net_mutex is mostly used to protect the pernet operations
      list. It orders setup_net() and cleanup_net() with parallel
      {un,}register_pernet_operations() calls, so that the ->exit{,_batch}
      methods executed for a dying net belong to the same pernet operations
      whose ->init methods were called for it, even after the net namespace
      is unlinked from net_namespace_list in cleanup_net().

      But there are several problems with scalability. The first one
      is that more than one net can't be created or destroyed
      at the same moment on the node. For big machines with many cpus
      running many containers this is very noticeable.

      The second one is that synchronize_rcu() is needed after net
      is removed from net_namespace_list:
      
      Destroy net_ns:
      cleanup_net()
        mutex_lock(&net_mutex)
        list_del_rcu(&net->list)
        synchronize_rcu()                                  <--- Sleep there for ages
        list_for_each_entry_reverse(ops, &pernet_list, list)
          ops_exit_list(ops, &net_exit_list)
        list_for_each_entry_reverse(ops, &pernet_list, list)
          ops_free_list(ops, &net_exit_list)
        mutex_unlock(&net_mutex)
      
      This primitive is not fast, especially on systems with many processors
      and/or when preemptible RCU is enabled in the config. So, for the whole time
      cleanup_net() is waiting for the RCU grace period, creation of new net namespaces
      is not possible; the tasks that attempt it sleep on the same mutex:
      
      Create net_ns:
      copy_net_ns()
        mutex_lock_killable(&net_mutex)                    <--- Sleep there for ages
      
      I observed 20-30 second hangs of "unshare -n" on an ordinary 8-cpu laptop
      with preemptible RCU enabled after a round of CRIU tests finished.

      The solution is to convert net_mutex to an rw_semaphore and add fine-grained
      locks to the really small number of pernet_operations that actually need them.

      Then, pernet_operations::init/::exit methods, modifying the net-related data,
      will require down_read() locking only, while down_write() will be used
      for changing pernet_list (i.e., when modules are being loaded and unloaded).

      This gives a significant performance increase, after the whole patch set is applied,
      as you may see here:
      
      %for i in {1..10000}; do unshare -n bash -c exit; done
      
      *before*
      real 1m40,377s
      user 0m9,672s
      sys 0m19,928s
      
      *after*
      real 0m17,007s
      user 0m5,311s
      sys 0m11,779s
      
      (5.8 times faster)
      
      This patch starts replacing net_mutex with net_sem. It adds the rw_semaphore,
      describes the variables it protects, and uses it where appropriate.
      net_mutex is still present, and the next patches will kick it out step by step.
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Acked-by: Andrei Vagin <avagin@virtuozzo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
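The read/write split described in this commit can be illustrated with a toy, single-threaded model of rw_semaphore admission rules. It is not a real lock and not the kernel implementation; `struct rwsem_model` and the `model_*` helpers are hypothetical names chosen for this sketch. Readers stand for setup_net()/cleanup_net(), the writer for {un,}register_pernet_operations().

```c
#include <assert.h>
#include <stdbool.h>

/* Toy single-threaded model of rw_semaphore admission rules; not a real lock. */
struct rwsem_model {
	int readers;	/* holders that took the semaphore for read */
	bool writer;	/* true while a writer holds it */
};

/* Reader side (net construction/destruction): many holders at once. */
static bool model_down_read_trylock(struct rwsem_model *sem)
{
	if (sem->writer)
		return false;
	sem->readers++;
	return true;
}

static void model_up_read(struct rwsem_model *sem)
{
	assert(sem->readers > 0);
	sem->readers--;
}

/* Writer side (pernet_list modification): exclusive access. */
static bool model_down_write_trylock(struct rwsem_model *sem)
{
	if (sem->writer || sem->readers > 0)
		return false;
	sem->writer = true;
	return true;
}

static void model_up_write(struct rwsem_model *sem)
{
	assert(sem->writer);
	sem->writer = false;
}
```

In this model two readers are admitted together, which corresponds to two namespaces being created or destroyed at the same time, while a writer is admitted only when nobody else holds the semaphore.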
    • net: Cleanup in copy_net_ns() · 5ba049a5
      Kirill Tkhai authored
      Line up the destructor actions in the reverse order
      of the constructors. The next patches will add more actions,
      and this will be convenient if that order
      is kept.
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Acked-by: Andrei Vagin <avagin@virtuozzo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Assign net to net_namespace_list in setup_net() · 98f6c533
      Kirill Tkhai authored
      This patch merges two repeating pieces of code in one,
      and they will live in setup_net() now.
      
      The only change is that assignment:
      
      	init_net_initialized = true;
      
      becomes reordered with:
      
      	list_add_tail_rcu(&net->list, &net_namespace_list);
      
      The order has no visible effect, and it is a simple
      cleanup, because:

      init_net_initialized is used in the !CONFIG_NET_NS case
      to order proc_net_ns_ops registration occurring at boot time:

      	start_kernel()->proc_root_init()->proc_net_init(),
      with
      	net_ns_init()->setup_net(&init_net, &init_user_ns)

      also occurring at boot time from the same init_task.

      When there are no other tasks to race with them,
      for the single task it does not matter in which order
      two sequential independent loads are made.
      So we reorder them.
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Acked-by: Andrei Vagin <avagin@virtuozzo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · cf19e5e2
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      40GbE Intel Wired LAN Driver Updates 2018-02-12
      
      This series contains updates to i40e and i40evf.
      
      Alan fixes a spelling mistake in code comments.  Fixes an issue on older
      firmware versions or NPAR enabled PFs which do not support the
      I40E_FLAG_DISABLE_FW_LLDP flag and would get into a situation where any
      attempt to change any priv flag would be forbidden.
      
      Alex got busy with the ITR code and made several cleanups and fixes so
      that we can more easily understand what is going on.  The fixes included
      a computational fix when determining the register offset, as well as a
      fix for unnecessarily toggling the CLEARPBA bit which could lead to
      potential lost events if auto-masking is not enabled.
      
      Filip adds a necessary delay to recover after an EMP reset when using
      firmware version 4.33.
      
      Paweł adds a warning message for MFP devices when the link-down-on-close
      flag is set because it may affect other partitions.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 12 Feb, 2018 13 commits
    • i40e/i40evf: Add support for new mechanism of updating adaptive ITR · a0073a4b
      Alexander Duyck authored
      This patch replaces the existing mechanism for determining the correct
      value to program for adaptive ITR with yet another new and more
      complicated approach.
      
      The basic idea from a 30K foot view is that this new approach will push the
      Rx interrupt moderation up so that by default it starts in low latency and
      is gradually pushed up into a higher latency setup as long as doing so
      increases the number of packets processed, if the number of packets drops
      to 4 to 1 per packet we will reset and just base our ITR on the size of the
      packets being received. For Tx we leave it floating at a high interrupt
      delay and do not pull it down unless we start processing more than 112
      packets per interrupt. If we start exceeding that we will cut our interrupt
      rates in half until we are back below 112.
      
      The side effect of these patches is that we will be processing more
      packets per interrupt. This is both a good and a bad thing as it means we
      will not be blocking processing in the case of things like pktgen and XDP,
      but we will also be consuming a bit more CPU in the cases of things such as
      network throughput tests using netperf.
      
      One delta from this versus the ixgbe version of the changes is that I have
      made the interrupt moderation a bit more aggressive when we are in bulk
      mode by moving our "goldilocks zone" up from 48-96 to 56-112. The
      main motivation behind moving this is to address the fact that we need to
      update less frequently, and have more fine-grained control due to the
      separate Tx and Rx ITR times.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e/i40evf: Split container ITR into current_itr and target_itr · 556fdfd6
      Alexander Duyck authored
      This patch is mostly prep-work for replacing the current approach to
      programming the dynamic aka adaptive ITR. Specifically here what we are
      doing is splitting the Tx and Rx ITR each into two separate values.
      
      The first value current_itr represents the current value of the register.
      
      The second value target_itr represents the desired value of the register.
      
      The general plan by doing this is to allow for deferring the update of the
      ITR value under certain circumstances. For now we will work with what we
      have, but in the future I hope to change the behavior so that we always
      only update one ITR at a time using some simple logic to determine which
      ITR requires an update.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
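The split into a current and a target value described in this commit can be sketched as follows. This is a minimal illustration, not the driver code: `struct itr_container_sketch`, `itr_update_pending()` and `itr_commit()` are hypothetical names, and the register write is only simulated by copying the value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-direction container; field names follow the commit text. */
struct itr_container_sketch {
	uint16_t current_itr;	/* value last written to the register */
	uint16_t target_itr;	/* value the adaptive logic wants */
};

/* An update may be deferred for as long as the two values differ. */
static bool itr_update_pending(const struct itr_container_sketch *rc)
{
	assert(rc != NULL);
	return rc->current_itr != rc->target_itr;
}

/* Simulated register write: the target becomes the new current value. */
static void itr_commit(struct itr_container_sketch *rc)
{
	assert(rc != NULL);
	rc->current_itr = rc->target_itr;
}
```

Keeping both values lets the adaptive logic change `target_itr` freely while the code that touches the hardware decides when, and for which direction, the write actually happens.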
    • i40evf: Correctly populate rxitr_idx and txitr_idx · d4942d58
      Alexander Duyck authored
      While testing code for the recent ITR changes I found that updating the Tx
      ITR appeared to have no effect with everything defaulting to the Rx ITR. A
      bit of digging narrowed it down to the fact that we were asking the PF to
      associate all causes with ITR 0 as we weren't populating the itr_idx values
      for either Rx or Tx.

      To correct it I have added the configuration for these values to this
      patch. In addition I did some minor clean-up to just add a local pointer
      for the vector map instead of dereferencing it based off of the index
      repeatedly. In my opinion this makes the resultant code a bit more readable
      and saves us a few characters.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e/i40evf: Use usec value instead of reg value for ITR defines · 92418fb1
      Alexander Duyck authored
      Instead of using the register value for the defines when setting up the
      ring ITR we can just use the actual values and avoid the use of shifts and
      macros to translate between the values we have and the values we want.
      
      This helps to make the code more readable as we can quickly translate from
      one value to the other.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • net: make getname() functions return length rather than use int* parameter · 9b2c45d4
      Denys Vlasenko authored
      Changes since v1:
      Added changes in these files:
          drivers/infiniband/hw/usnic/usnic_transport.c
          drivers/staging/lustre/lnet/lnet/lib-socket.c
          drivers/target/iscsi/iscsi_target_login.c
          drivers/vhost/net.c
          fs/dlm/lowcomms.c
          fs/ocfs2/cluster/tcp.c
          security/tomoyo/network.c
      
      Before:
      All these functions either return a negative error indicator,
      or store length of sockaddr into "int *socklen" parameter
      and return zero on success.
      
      "int *socklen" parameter is awkward. For example, if caller does not
      care, it still needs to provide on-stack storage for the value
      it does not need.
      
      None of the many FOO_getname() functions of various protocols
      ever used old value of *socklen. They always just overwrite it.
      
      This change drops this parameter, and makes all these functions, on success,
      return length of sockaddr. It's always >= 0 and can be differentiated
      from an error.
      
      Tests in callers are changed from "if (err)" to "if (err < 0)", where needed.
      
      rpc_sockname() lost "int buflen" parameter, since its only use was
      to be passed to kernel_getsockname() as &buflen and subsequently
      not used in any way.
      
      Userspace API is not changed.
      
          text    data     bss      dec     hex filename
      30108430 2633624  873672 33615726 200ef6e vmlinux.before.o
      30108109 2633612  873672 33615393 200ee21 vmlinux.o
      Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
      CC: David S. Miller <davem@davemloft.net>
      CC: linux-kernel@vger.kernel.org
      CC: netdev@vger.kernel.org
      CC: linux-bluetooth@vger.kernel.org
      CC: linux-decnet-user@lists.sourceforge.net
      CC: linux-wireless@vger.kernel.org
      CC: linux-rdma@vger.kernel.org
      CC: linux-sctp@vger.kernel.org
      CC: linux-nfs@vger.kernel.org
      CC: linux-x25@vger.kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
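The calling-convention change in this commit can be shown side by side in a small userspace sketch. The types and function names here (`struct addr_sketch`, `old_getname()`, `new_getname()`) are hypothetical and much simpler than the kernel's proto_ops; only the return-value contract follows the commit text.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified address type standing in for struct sockaddr. */
struct addr_sketch {
	unsigned short family;
	char data[14];
};

/* Old style: 0 on success with the length stored via *socklen, <0 on error. */
static int old_getname(const struct addr_sketch *src, struct addr_sketch *dst,
		       int *socklen)
{
	assert(dst != NULL && socklen != NULL);
	if (!src)
		return -ENOTCONN;
	memcpy(dst, src, sizeof(*dst));
	*socklen = (int)sizeof(*dst);
	return 0;
}

/* New style: length (always >= 0) on success, negative errno on error. */
static int new_getname(const struct addr_sketch *src, struct addr_sketch *dst)
{
	assert(dst != NULL);
	if (!src)
		return -ENOTCONN;
	memcpy(dst, src, sizeof(*dst));
	return (int)sizeof(*dst);
}
```

A caller of the new form checks `if (err < 0)` rather than `if (err)`, and a caller that does not care about the length no longer needs an on-stack `int` just to satisfy the signature.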
    • i40e/i40evf: Don't bother setting the CLEARPBA bit · 4ff17929
      Alexander Duyck authored
      The CLEARPBA bit in the dynamic interrupt control register actually has
      no effect either way on the hardware. As per errata 28 in the XL710
      specification update the interrupt is actually cleared any time the
      register is written with the INTENA_MSK bit set to 0. As such the act of
      toggling the enable bit actually will trigger the interrupt being
      cleared and could lead to potential lost events if auto-masking is
      not enabled.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e/i40evf: Clean-up of bits related to using q_vector->reg_idx · 8b99b117
      Alexander Duyck authored
      This patch is a further clean-up related to the change over to using
      q_vector->reg_idx when accessing the ITR registers. Specifically the code
      appears to have several other spots where we were computing the register
      offset manually and this resulted in errors in a few spots.
      
      Specifically in the i40evf functions for mapping queues to vectors it
      appears we may have had an off by 1 error since (v_idx - 1) for the first
      q_vector with an index of 0 would result in us returning -1 if I am not
      mistaken.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e: use changed_flags to check I40E_FLAG_DISABLE_FW_LLDP · fe09ed0e
      Alan Brady authored
      Currently in i40e_set_priv_flags we use new_flags to check for the
      I40E_FLAG_DISABLE_FW_LLDP flag.  This is an issue for a few reasons.
      DISABLE_FW_LLDP is persistent across reboots/driver reloads.  This means
      we need some way to detect if FW LLDP is enabled on init.  We do this by
      trying to init_dcb and if it fails with EPERM we know LLDP is disabled
      in FW.
      
      This could be a problem on older FW versions or NPAR enabled PFs because
      there are situations where the FW could disable LLDP, but they do _not_
      support using this flag to change it.  If we do end up in this
      situation, the flag will be set, then when the user tries to change any
      priv flags, the driver thinks the user is trying to disable FW LLDP on a
      FW that doesn't support it and essentially forbids any priv flag
      changes.
      
      The fix is simple, instead of checking if this flag is set, we should be
      checking if the user is trying to _change_ the flag on unsupported FW
      versions.
      
      This patch also adds a comment explaining that the cmpxchg is the point
      of no return.  Once we put the new flags into pf->flags we can't back
      out.
      Signed-off-by: Alan Brady <alan.brady@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
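The difference between checking whether a flag is set and checking whether it is being changed can be sketched like this. The flag bit values, the `fw_supports_lldp_toggle` parameter and the `check_priv_flags()` helper are assumptions made for illustration; they are not taken from the i40e driver.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, chosen only for this sketch. */
#define SKETCH_FLAG_DISABLE_FW_LLDP	(1u << 0)
#define SKETCH_FLAG_OTHER		(1u << 1)

/*
 * Reject the request only when the user tries to *change* the LLDP flag
 * on firmware that cannot change it. A flag that is merely already set
 * must not block changes to unrelated private flags.
 */
static int check_priv_flags(uint32_t orig_flags, uint32_t new_flags,
			    bool fw_supports_lldp_toggle)
{
	uint32_t changed_flags = orig_flags ^ new_flags;

	if ((changed_flags & SKETCH_FLAG_DISABLE_FW_LLDP) &&
	    !fw_supports_lldp_toggle)
		return -EOPNOTSUPP;
	return 0;
}
```

With the XOR, a device that came up with the LLDP flag already set can still toggle `SKETCH_FLAG_OTHER`; testing `new_flags` directly would have refused that request.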
    • i40e: Warn when setting link-down-on-close while in MFP · 17b4d25c
      Paweł Jabłoński authored
      This patch adds a warning message when the link-down-on-close flag is
      set. The warning is printed only on MFP devices.
      Signed-off-by: Paweł Jabłoński <pawel.jablonski@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e: Add delay after EMP reset for firmware to recover · 1fa51a65
      Filip Sadowski authored
      This patch adds the necessary delay for 4.33 firmware to recover after
      an EMP reset. Without this patch the driver occasionally reinitializes
      structures too quickly to communicate with firmware after an EMP reset,
      causing AdminQ to time out.
      Signed-off-by: Filip Sadowski <filip.sadowski@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e/i40evf: Clean up logic for adaptive ITR · 71dc3719
      Alexander Duyck authored
      The logic for dynamic ITR update is confusing at best as there were odd
      paths chosen for how to find the rings associated with a given queue based
      on the vector index and other inconsistencies throughout the code.
      
      This patch is an attempt to clean up the logic so that we can more easily
      understand what is going on. Specifically if there is a Rx or Tx ring that
      is enabled in dynamic mode on the q_vector it is allowed to override the
      other side of the interrupt moderation. While it isn't correct all this
      patch is doing is cleaning up the logic for now so that when we come
      through and fix it we can more easily identify that this is wrong.
      
      The other big change made here is that we replace references to:
      	vsi->rx_rings[q_vector->v_idx]->itr_setting
      with:
      	q_vector->rx.ring->itr_setting
      
      The general idea is we can avoid the long pointer chase since just
      accessing q_vector->rx.ring is a single pointer access versus having to
      chase down vsi->rx_rings, and then finding the pointer in the array, and
      finally chasing down the itr_setting from there.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e/i40evf: Only track one ITR setting per ring instead of Tx/Rx · 40588ca6
      Alexander Duyck authored
      The rings are already split out into Tx and Rx rings so it doesn't make
      sense to have any single ring store both a Tx and Rx itr_setting value.
      Since that is the case drop the pair in favor of storing just a single ITR
      value.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e: fix typo in function description · 11a350c9
      Alan Brady authored
      'bufer' should be 'buffer'
      Signed-off-by: Alan Brady <alan.brady@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
  3. 11 Feb, 2018 9 commits
    • Linux 4.16-rc1 · 7928b2cb
      Linus Torvalds authored
    • unify {de,}mangle_poll(), get rid of kernel-side POLL... · 7a163b21
      Al Viro authored
      except, again, POLLFREE and POLL_BUSY_LOOP.
      
      With this, we finally get to the promised end result:
      
       - POLL{IN,OUT,...} are plain integers and *not* in __poll_t, so any
         stray instances of ->poll() still using those will be caught by
         sparse.
      
       - eventpoll.c and select.c warning-free wrt __poll_t
      
       - no more kernel-side definitions of POLL... - userland ones are
         visible through the entire kernel (and used pretty much only for
         mangle/demangle)
      
       - same behavior as after the first series (i.e. sparc et.al. epoll(2)
         working correctly).
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • vfs: do bulk POLL* -> EPOLL* replacement · a9a08845
      Linus Torvalds authored
      This is the mindless scripted replacement of kernel use of POLL*
      variables as described by Al, done by this script:
      
          for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
              L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
              for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
          done
      
      with de-mangling cleanups yet to come.
      
      NOTE! On almost all architectures, the EPOLL* constants have the same
      values as the POLL* constants do.  But the keyword here is "almost".
      For various bad reasons they aren't the same, and epoll() doesn't
      actually work quite correctly in some cases due to this on Sparc et al.
      
      The next patch from Al will sort out the final differences, and we
      should be all done.
      Scripted-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge branch 'work.poll2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · ee5daa13
      Linus Torvalds authored
      Pull more poll annotation updates from Al Viro:
       "This is preparation to solving the problems you've mentioned in the
        original poll series.
      
        After this series, the kernel is ready for running
      
            for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
                  L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
                  for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
            done
      
        as a bulk search-and-replace.
      
        After that, the kernel is ready to apply the patch to unify
        {de,}mangle_poll(), and then get rid of kernel-side POLL... uses
        entirely, and we should be all done with that stuff.
      
        Basically, that's what you suggested wrt KPOLL..., except that we can
        use EPOLL... instead - they already are arch-independent (and equal to
        what is currently kernel-side POLL...).
      
        After the preparations (in this series) switch to returning EPOLL...
        from ->poll() instances is completely mechanical and kernel-side
        POLL... can go away. The last step (killing kernel-side POLL... and
        unifying {de,}mangle_poll()) has to be done after the
        search-and-replace job, since we need userland-side POLL... for
        unified {de,}mangle_poll(), thus the cherry-pick at the last step.
      
        After that we will have:
      
         - POLL{IN,OUT,...} *not* in __poll_t, so any stray instances of
           ->poll() still using those will be caught by sparse.
      
         - eventpoll.c and select.c warning-free wrt __poll_t
      
         - no more kernel-side definitions of POLL... - userland ones are
           visible through the entire kernel (and used pretty much only for
           mangle/demangle)
      
         - same behavior as after the first series (i.e. sparc et al. epoll(2)
           working correctly)"
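
The mangle/demangle step described in the pull message translates between the arch-independent EPOLL... bits used inside the kernel and the POLL... bits that userland sees. A minimal Python sketch of that idea follows. The bit values and the names `remap`, `mangle` and `demangle` are hypothetical, picked only to show one flag whose value differs between the two encodings; they are not copied from any kernel header.

```python
# Hypothetical bit values for illustration only (not taken from kernel headers).
# "RDHUP" is given a different value on the userland side to show why a
# translation step is needed on architectures where the two sets differ.
KERNEL_BITS = {"IN": 0x0001, "OUT": 0x0004, "RDHUP": 0x2000}
USER_BITS = {"IN": 0x0001, "OUT": 0x0004, "RDHUP": 0x0800}


def remap(mask, src, dst):
    """Translate every flag set in `mask` from the `src` encoding to `dst`."""
    out = 0
    for name, bit in src.items():
        if mask & bit:
            out |= dst[name]
    return out


def mangle(kernel_mask):
    """Kernel-side (EPOLL-style) mask -> userland (POLL-style) mask."""
    return remap(kernel_mask, KERNEL_BITS, USER_BITS)


def demangle(user_mask):
    """Userland (POLL-style) mask -> kernel-side (EPOLL-style) mask."""
    return remap(user_mask, USER_BITS, KERNEL_BITS)


if __name__ == "__main__":
    k = KERNEL_BITS["IN"] | KERNEL_BITS["RDHUP"]
    print(hex(mangle(k)), hex(demangle(mangle(k))))
```

Where the two encodings agree bit for bit, both functions reduce to the identity, which matches the note that the constants are the same on almost all architectures.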
      
      * 'work.poll2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        annotate ep_scan_ready_list()
        ep_send_events_proc(): return result via esed->res
        preparation to switching ->poll() to returning EPOLL...
        add EPOLLNVAL, annotate EPOLL... and event_poll->event
        use linux/poll.h instead of asm/poll.h
        xen: fix poll misannotation
        smc: missing poll annotations
      ee5daa13
    • Linus Torvalds's avatar
      Merge tag 'xtensa-20180211' of git://github.com/jcmvbkbc/linux-xtensa · 3fc928dc
      Linus Torvalds authored
      Pull xtensa fix from Max Filippov:
       "Build fix for xtensa architecture with KASAN enabled"
      
      * tag 'xtensa-20180211' of git://github.com/jcmvbkbc/linux-xtensa:
        xtensa: fix build with KASAN
      3fc928dc
    • Linus Torvalds's avatar
      Merge tag 'nios2-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2 · 60d7a21a
      Linus Torvalds authored
      Pull nios2 update from Ley Foon Tan:
      
       - clean up old Kconfig options from defconfig
      
       - remove leading 0x and 0s from bindings notation in dts files
      
      * tag 'nios2-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2:
        nios2: defconfig: Cleanup from old Kconfig options
        nios2: dts: Remove leading 0x and 0s from bindings notation
      60d7a21a
    • Max Filippov's avatar
      xtensa: fix build with KASAN · f8d0cbf2
      Max Filippov authored
      The commit 917538e2 ("kasan: clean up KASAN_SHADOW_SCALE_SHIFT
      usage") removed KASAN_SHADOW_SCALE_SHIFT definition from
      include/linux/kasan.h and added it to architecture-specific headers,
      except for xtensa. This broke the xtensa build with KASAN enabled.
      Define KASAN_SHADOW_SCALE_SHIFT in arch/xtensa/include/asm/kasan.h
      
      Reported-by: kbuild test robot <fengguang.wu@intel.com>
      Fixes: 917538e2 ("kasan: clean up KASAN_SHADOW_SCALE_SHIFT usage")
      Acked-by: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
      f8d0cbf2
    • Krzysztof Kozlowski's avatar
      nios2: defconfig: Cleanup from old Kconfig options · e0691ebb
      Krzysztof Kozlowski authored
      Remove old, dead Kconfig option INET_LRO. It is gone since
      commit 7bbf3cae ("ipv4: Remove inet_lro library").
      Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
      Acked-by: Ley Foon Tan <ley.foon.tan@intel.com>
      e0691ebb
    • Mathieu Malaterre's avatar
      nios2: dts: Remove leading 0x and 0s from bindings notation · 5d13c731
      Mathieu Malaterre authored
      Improve the DTS files by removing all the leading "0x" and zeros to fix the
      following dtc warnings:
      
      Warning (unit_address_format): Node /XXX unit name should not have leading "0x"
      
      and
      
      Warning (unit_address_format): Node /XXX unit name should not have leading 0s
      
      Converted using the following command:
      
      find . -type f \( -iname *.dts -o -iname *.dtsi \) -exec sed -E -i -e "s/@0x([0-9a-fA-F\.]+)\s?\{/@\L\1 \{/g" -e "s/@0+([0-9a-fA-F\.]+)\s?\{/@\L\1 \{/g" {} +
      
      For simplicity, two sed expressions were used to solve each warning separately.
      
      To make the regex expression more robust a few other issues were resolved,
      namely setting unit-address to lower case, and adding a whitespace before
      the opening curly brace:
      
      https://elinux.org/Device_Tree_Linux#Linux_conventions
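
As a rough illustration of what the two sed expressions above do to a single line, here is a small Python sketch. The function name `fix_unit_address` and the sample node names are made up for the example, and the sketch covers only the per-line substitution, not the `find` traversal.

```python
import re

# Mirrors the first sed expression: "@0x<hex>" followed by optional space and "{".
HEX_PREFIX = re.compile(r"@0x([0-9a-fA-F.]+)\s?\{")
# Mirrors the second sed expression: "@" followed by one or more leading zeros.
LEADING_ZEROS = re.compile(r"@0+([0-9a-fA-F.]+)\s?\{")


def _normalise(match):
    # Lower-case the unit address and put exactly one space before "{".
    return "@" + match.group(1).lower() + " {"


def fix_unit_address(line):
    """Drop a leading 0x, then leading zeros, from a DTS unit address."""
    line = HEX_PREFIX.sub(_normalise, line)
    return LEADING_ZEROS.sub(_normalise, line)


if __name__ == "__main__":
    for sample in ("sdram@0xC0000000 {", "timer@0400{", "cpu@0 {"):
        print(fix_unit_address(sample))
```

Running the two patterns in sequence means an address such as `@0x00100000` first loses the `0x` and then the remaining leading zeros, while a bare `@0` is left alone because the second pattern needs at least one character after the zeros.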
      
      This is a follow up to commit 4c9847b7 ("dt-bindings: Remove leading 0x from bindings notation")
      Reported-by: David Daney <ddaney@caviumnetworks.com>
      Suggested-by: Rob Herring <robh@kernel.org>
      Signed-off-by: Mathieu Malaterre <malat@debian.org>
      Acked-by: Ley Foon Tan <ley.foon.tan@intel.com>
      5d13c731
  4. 10 Feb, 2018 3 commits
    • Linus Torvalds's avatar
      Merge tag 'pci-v4.16-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · d48fcbd8
      Linus Torvalds authored
      Pull PCI fix from Bjorn Helgaas:
       "Fix a POWER9/powernv INTx regression from the merge window (Alexey
        Kardashevskiy)"
      
      * tag 'pci-v4.16-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
        powerpc/pci: Fix broken INTx configuration via OF
      d48fcbd8
    • Linus Torvalds's avatar
      Merge tag 'for-linus-20180210' of git://git.kernel.dk/linux-block · 9454473c
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
       "A few fixes to round off the merge window on the block side:
      
         - a set of bcache fixes by way of Michael Lyle, from the usual bcache
           suspects.
      
         - add a simple-to-hook-into function for bpf EIO error injection.
      
         - fix blk-wbt that mischaracterized flushes as reads. Improve the logic
           so that flushes and writes are accounted as writes, and only reads
           as reads. From me.
      
         - fix requeue crash in BFQ, from Paolo"
      
      * tag 'for-linus-20180210' of git://git.kernel.dk/linux-block:
        block, bfq: add requeue-request hook
        bcache: fix for data collapse after re-attaching an attached device
        bcache: return attach error when no cache set exist
        bcache: set writeback_rate_update_seconds in range [1, 60] seconds
        bcache: fix for allocator and register thread race
        bcache: set error_limit correctly
        bcache: properly set task state in bch_writeback_thread()
        bcache: fix high CPU occupancy during journal
        bcache: add journal statistic
        block: Add should_fail_bio() for bpf error injection
        blk-wbt: account flush requests correctly
      9454473c
    • Linus Torvalds's avatar
      Merge tag 'platform-drivers-x86-v4.16-3' of git://github.com/dvhart/linux-pdx86 · cc5cb5af
      Linus Torvalds authored
      Pull x86 platform driver updates from Darren Hart:
       "Mellanox fixes and new system type support.
      
        Mostly data for new system types with a correction and an
        uninitialized variable fix"
      
      [ Pulling from github because git.infradead.org currently seems to be
        down for some reason, but Darren had a backup location    - Linus ]
      
      * tag 'platform-drivers-x86-v4.16-3' of git://github.com/dvhart/linux-pdx86:
        platform/x86: mlx-platform: Add support for new 200G IB and Ethernet systems
        platform/x86: mlx-platform: Add support for new msn201x system type
        platform/x86: mlx-platform: Add support for new msn274x system type
        platform/x86: mlx-platform: Fix power cable setting for msn21xx family
        platform/x86: mlx-platform: Add define for the negative bus
        platform/x86: mlx-platform: Use defines for bus assignment
        platform/mellanox: mlxreg-hotplug: Fix uninitialized variable
      cc5cb5af