1. 18 Nov, 2015 20 commits
    • David S. Miller's avatar
      Merge branch 'net-generic-busy-polling' · 85c72ba1
      David S. Miller authored
      Eric Dumazet says:
      
      ====================
      net: extend busy polling support
      
      This patch series extends busy polling range to tunnels devices,
      and adds busy polling generic support to all NAPI drivers.
      
      No need to provide ndo_busy_poll() method and extra synchronization
      between ndo_busy_poll() and normal napi->poll() method.
      This was proven very difficult and bug prone.
      
      mlx5 driver is changed to support busy polling using this new method,
      and a second mlx5 patch adds napi_complete_done() support and proper
      SNMP accounting.
      
      bnx2x and mlx4 drivers are converted to new infrastructure,
      reducing kernel bloat and improving performance.
      
      Latest patch, adding generic support, adds a new requirement :
      
       -free_netdev() and netif_napi_del() must be called from process context.
      
      Since this might not be the case in some drivers, we might have to
      either : fix the non conformant drivers (by disabling busy polling on them)
      or revert this last patch.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      85c72ba1
    • Eric Dumazet's avatar
      net: provide generic busy polling to all NAPI drivers · 93d05d4a
      Eric Dumazet authored
      NAPI drivers no longer need to observe a particular protocol
      to benefit from busy polling (CONFIG_NET_RX_BUSY_POLL=y)
      
      napi_hash_add() and napi_hash_del() are automatically called
      from core networking stack, respectively from
      netif_napi_add() and netif_napi_del()
      
      This patch depends on free_netdev() and netif_napi_del() being
      called from process context, which seems to be the norm.
      
      Drivers might still prefer to call napi_hash_del() on their
      own, since they might combine all the rcu grace periods into
      a single one, knowing their NAPI structures lifetime, while
      core networking stack has no idea of a possible combining.
      
      Once this patch proves to not bring serious regressions,
      we will cleanup drivers to either remove napi_hash_del()
      or provide appropriate rcu grace periods combining.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      93d05d4a
    • Eric Dumazet's avatar
      net: napi_hash_del() returns a boolean status · 34cbe27e
      Eric Dumazet authored
      napi_hash_del() will soon be used from both drivers (if they want)
      or core networking stack.
      
      Callers are responsibles to ensure an RCU grace period is respected
      before freeing napi structure : napi_hash_del() can signal if
      this RCU grace period is needed or not.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      34cbe27e
    • Eric Dumazet's avatar
      net: move napi_hash[] into read mostly section · 6180d9de
      Eric Dumazet authored
      We do not often add/delete a napi context.
      Moving napi_hash[] into read mostly section avoids potential false sharing.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6180d9de
    • Eric Dumazet's avatar
      net: add netif_tx_napi_add() · d64b5e85
      Eric Dumazet authored
      netif_tx_napi_add() is a variant of netif_napi_add()
      
      It should be used by drivers that use a napi structure
      to exclusively poll TX.
      
      We do not want to add this kind of napi in napi_hash[] in following
      patches, adding generic busy polling to all NAPI drivers.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d64b5e85
    • Eric Dumazet's avatar
      net: move skb_mark_napi_id() into core networking stack · 93f93a44
      Eric Dumazet authored
      We would like to automatically provide busy polling support
      to all NAPI drivers, without them having to implement anything.
      
      skb_mark_napi_id() can be called from napi_gro_receive() and
      napi_get_frags().
      
      Few drivers are still calling skb_mark_napi_id() because
      they use netif_receive_skb(). They should eventually call
      napi_gro_receive() instead. I will leave this to drivers
      maintainers.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      93f93a44
    • Eric Dumazet's avatar
      mlx4: remove mlx4_en_low_latency_recv() · 868fdb06
      Eric Dumazet authored
      Busy polling can now be handled in generic NAPI poll infrastructure.
      This removes complexity and fast path overhead :
      
      mlx4 used two spin_lock()/spin_unlock() pair per napi->poll() call
      in mlx4_en_cq_lock_napi()/mlx4_en_cq_unlock_napi()
      
      Tested:
      
      Without busy polling :
      
      lpaa23:~# echo 0 >/proc/sys/net/core/busy_read
      lpaa24:~# echo 0 >/proc/sys/net/core/busy_read
      lpaa23:~# ./netperf -H lpaa24 -t TCP_RR
      MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpaa24.prod.google.com () port 0 AF_INET : first burst 0
      Local /Remote
      Socket Size   Request  Resp.   Elapsed  Trans.
      Send   Recv   Size     Size    Time     Rate
      bytes  Bytes  bytes    bytes   secs.    per sec
      
      16384  87380  1        1       10.00    47330.78
      
      With busy polling :
      
      lpaa23:~# echo 70 >/proc/sys/net/core/busy_read
      lpaa24:~# echo 70 >/proc/sys/net/core/busy_read
      lpaa23:~# ./netperf -H lpaa24 -t TCP_RR
      MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpaa24.prod.google.com () port 0 AF_INET : first burst 0
      Local /Remote
      Socket Size   Request  Resp.   Elapsed  Trans.
      Send   Recv   Size     Size    Time     Rate
      bytes  Bytes  bytes    bytes   secs.    per sec
      
      16384  87380  1        1       10.00    97643.55
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      868fdb06
    • Eric Dumazet's avatar
      bnx2x: remove bnx2x_low_latency_recv() support · b59768c6
      Eric Dumazet authored
      Switch to native NAPI polling, as this reduces overhead and complexity.
      
      Normal path is faster, since one cmpxchg() is not anymore requested,
      and busy polling with the NAPI polling has same performance.
      
      Tested:
      lpk50:~# cat /proc/sys/net/core/busy_read
      70
      lpk50:~# nstat >/dev/null;./netperf -H lpk55 -t TCP_RR;nstat
      MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpk55.prod.google.com () port 0 AF_INET : first burst 0
      Local /Remote
      Socket Size   Request  Resp.   Elapsed  Trans.
      Send   Recv   Size     Size    Time     Rate
      bytes  Bytes  bytes    bytes   secs.    per sec
      
      16384  87380  1        1       10.00    40095.07
      16384  87380
      IpInReceives                    401062             0.0
      IpInDelivers                    401062             0.0
      IpOutRequests                   401079             0.0
      TcpActiveOpens                  7                  0.0
      TcpPassiveOpens                 3                  0.0
      TcpAttemptFails                 3                  0.0
      TcpEstabResets                  5                  0.0
      TcpInSegs                       401036             0.0
      TcpOutSegs                      401052             0.0
      TcpOutRsts                      38                 0.0
      UdpInDatagrams                  26                 0.0
      UdpOutDatagrams                 27                 0.0
      Ip6OutNoRoutes                  1                  0.0
      TcpExtDelayedACKs               1                  0.0
      TcpExtTCPPrequeued              98                 0.0
      TcpExtTCPDirectCopyFromPrequeue 98                 0.0
      TcpExtTCPHPHits                 4                  0.0
      TcpExtTCPHPHitsToUser           98                 0.0
      TcpExtTCPPureAcks               5                  0.0
      TcpExtTCPHPAcks                 101                0.0
      TcpExtTCPAbortOnData            6                  0.0
      TcpExtBusyPollRxPackets         400832             0.0
      TcpExtTCPOrigDataSent           400983             0.0
      IpExtInOctets                   21273867           0.0
      IpExtOutOctets                  21261254           0.0
      IpExtInNoECTPkts                401064             0.0
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b59768c6
    • Eric Dumazet's avatar
      mlx5: support napi_complete_done() · 44fb6fbb
      Eric Dumazet authored
      A NAPI poll handler should return number of RX packets processed,
      instead of 0 / budget.
      
      This allows proper busy poll accounting through LINUX_MIB_BUSYPOLLRXPACKETS
      SNMP counter.
      
      napi_complete_done() allows /sys/class/net/ethX/gro_flush_timeout
      to be used for finer GRO aggregation control.
      
      Tested:
      
      Enabled busy polling, and checked TcpExtBusyPollRxPackets counter is increasing.
      
      echo 70 >/proc/sys/net/core/busy_read
      nstat >/dev/null
      netperf -H target -t TCP_RR >/dev/null
      nstat | grep TcpExtBusyPollRxPackets
      TcpExtBusyPollRxPackets         490958             0.0
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Eli Cohen <eli@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      44fb6fbb
    • Eric Dumazet's avatar
      mlx5: add busy polling support · 7ae92ae5
      Eric Dumazet authored
      It is now easy to add busy polling support to a NAPI driver,
      with very little impact on normal input path.
      
      This patch serves as a reference implementation.
      
      Note:
      
      A followup patch will add proper napi_complete_done() in mlx5,
      so that LINUX_MIB_BUSYPOLLRXPACKETS snmp counter is properly handled.
      
      Tested:
      
      Normal TCP_RR results without busy polling :
      
      lpk51:~# echo 0 >/proc/sys/net/core/busy_read
      lpk52:~# echo 0 >/proc/sys/net/core/busy_read
      
      lpk51:~# ./netperf -H 192.168.4.52 -t TCP_RR -l 10
      MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.4.52 () port 0 AF_INET : first burst 0
      Local /Remote
      Socket Size   Request  Resp.   Elapsed  Trans.
      Send   Recv   Size     Size    Time     Rate
      bytes  Bytes  bytes    bytes   secs.    per sec
      
      16384  87380  1        1       10.00    53509.49
      16384  87380
      
      Now enable busy polling :
      
      lpk51:~# echo 70 >/proc/sys/net/core/busy_read
      lpk52:~# echo 70 >/proc/sys/net/core/busy_read
      
      lpk51:~# ./netperf -H 192.168.4.52 -t TCP_RR -l 10
      MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.4.52 () port 0 AF_INET : first burst 0
      Local /Remote
      Socket Size   Request  Resp.   Elapsed  Trans.
      Send   Recv   Size     Size    Time     Rate
      bytes  Bytes  bytes    bytes   secs.    per sec
      
      16384  87380  1        1       10.00    97530.92
      16384  87380
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7ae92ae5
    • Eric Dumazet's avatar
      net: network drivers no longer need to implement ndo_busy_poll() · ce6aea93
      Eric Dumazet authored
      Instead of having to implement complex ndo_busy_poll() method,
      drivers can simply rely on NAPI poll logic.
      
      Busy polling gains are mainly coming from polling itself,
      not on exact details on how we poll the device.
      
      ndo_busy_poll() if implemented can avoid touching
      napi state, but it adds extra synchronization between
      normal napi->poll() and busy poll handler, slowing down
      the common path (non busy polling) with extra atomic operations.
      In practice few drivers ever got busy poll because of the complexity.
      
      We could go one step further, and make busy polling
      available for all NAPI drivers, but this would require
      that all netif_napi_del() calls are done in process context
      so that we can call synchronize_rcu().
      Full audit would be required.
      
      Before this is done, a driver still needs to call :
      
      - skb_mark_napi_id() for each skb provided to the stack.
      - napi_hash_add() and napi_hash_del() to allocate a napi_id per napi struct.
      - Make sure RCU grace period is respected after napi_hash_del() before
        memory containing napi structure is freed.
      
      Followup patch implements busy poll for mlx5 driver as an example.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ce6aea93
    • Eric Dumazet's avatar
      net: allow BH servicing in sk_busy_loop() · 2a028ecb
      Eric Dumazet authored
      Instead of blocking BH in whole sk_busy_loop(), block them
      only around ->ndo_busy_poll() calls.
      
      This has many benefits.
      
      1) allow tunneled traffic to use busy poll as well as native traffic.
         Tunnels handlers usually call netif_rx() and depend on net_rx_action()
         being run (from sofirq handler)
      
      2) allow RFS/RPS being used (sending IPI to other cpus if needed)
      
      3) use the 'lets burn cpu cycles' budget to do useful work
         (like TX completions, timers, RCU callbacks...)
      
      4) reduce BH latencies, making busy poll a better citizen.
      
      Tested:
      
      Tested with SIT tunnel
      
      lpaa5:~# echo 0 >/proc/sys/net/core/busy_read
      lpaa5:~# ./netperf -H 2002:af6:786::1 -t TCP_RR
      MIGRATED TCP REQUEST/RESPONSE TEST from ::0 (::) port 0 AF_INET6 to 2002:af6:786::1 () port 0 AF_INET6 : first burst 0
      Local /Remote
      Socket Size   Request  Resp.   Elapsed  Trans.
      Send   Recv   Size     Size    Time     Rate
      bytes  Bytes  bytes    bytes   secs.    per sec
      
      16384  87380  1        1       10.00    37373.93
      16384  87380
      
      Now enable busy poll on both hosts
      
      lpaa5:~# echo 70 >/proc/sys/net/core/busy_read
      lpaa6:~# echo 70 >/proc/sys/net/core/busy_read
      
      lpaa5:~# ./netperf -H 2002:af6:786::1 -t TCP_RR
      MIGRATED TCP REQUEST/RESPONSE TEST from ::0 (::) port 0 AF_INET6 to 2002:af6:786::1 () port 0 AF_INET6 : first burst 0
      Local /Remote
      Socket Size   Request  Resp.   Elapsed  Trans.
      Send   Recv   Size     Size    Time     Rate
      bytes  Bytes  bytes    bytes   secs.    per sec
      
      16384  87380  1        1       10.00    58314.77
      16384  87380
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2a028ecb
    • Eric Dumazet's avatar
      net: un-inline sk_busy_loop() · 02d62e86
      Eric Dumazet authored
      There is really little gain from inlining this big function.
      We'll soon make it even bigger in following patches.
      
      This means we no longer need to export napi_by_id()
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      02d62e86
    • Eric Dumazet's avatar
      mlx4: mlx4_en_low_latency_recv() called with BH disabled · 5865316c
      Eric Dumazet authored
      mlx4_en_low_latency_recv() is called with BH disabled,
      as other ndo_busy_poll() methods.
      
      No need for spin_lock_bh()/spin_unlock_bh()
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5865316c
    • Eric Dumazet's avatar
      net: better skb->sender_cpu and skb->napi_id cohabitation · 52bd2d62
      Eric Dumazet authored
      skb->sender_cpu and skb->napi_id share a common storage,
      and we had various bugs about this.
      
      We had to call skb_sender_cpu_clear() in some places to
      not leave a prior skb->napi_id and fool netdev_pick_tx()
      
      As suggested by Alexei, we could split the space so that
      these errors can not happen.
      
      0 value being reserved as the common (not initialized) value,
      let's reserve [1 .. NR_CPUS] range for valid sender_cpu,
      and [NR_CPUS+1 .. ~0U] for valid napi_id.
      
      This will allow proper busy polling support over tunnels.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Suggested-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      52bd2d62
    • Ivan Vecera's avatar
      be2net: remove local variable 'status' · d37b4c0a
      Ivan Vecera authored
      The lancer_cmd_get_file_len() uses lancer_cmd_read_object() to get
      the current size of registers for ethtool registers dump. Returned status
      value is stored but not checked. The check itself is not necessary as
      the data_read output variable is initialized to 0 and status variable
      can be removed.
      Signed-off-by: default avatarIvan Vecera <ivecera@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d37b4c0a
    • huangdaode's avatar
      net: hisilicon: fix binding document of mdio · 6fbaa570
      huangdaode authored
      This patch explains the occasion of "hisilcon,mdio" and
      "hisilicon,hns-mdio" according to Arnd's comments.
      and reformat it according to comments from Rob<robh@kernel.org>.
      Signed-off-by: default avatarhuangdaode <huangdaode@hisilicon.com>
      Reviewed-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6fbaa570
    • Bastian Stender's avatar
      net ipv4: use preferred log methods · 09605cc1
      Bastian Stender authored
      Replace printk calls with preferred unconditional log method calls to keep
      kernel messages clean.
      
      Added newline to "too small MTU" message.
      Signed-off-by: default avatarBastian Stender <bst@pengutronix.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      09605cc1
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · 34258a32
      Linus Torvalds authored
      Pull s390 fixes from Martin Schwidefsky:
       "Assorted bug fixes, the mlock2 system call gets added, and one
        improvement.  The boot from dasd devices is now possible from a wider
        range of devices"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
        s390: remove SALIPL loader
        s390: wire up mlock2 system call
        s390: remove g5 elf platform support
        s390: avoid cache aliasing under z/VM and KVM
        s390/sclp: _sclp_wait_int(): retain full PSW mask
        s390/zcrypt: Fix initialisation when zcrypt is built-in
        s390/zcrypt: Fix kernel crash on systems without AP bus support
        s390: add support for ipl devices in subchannel sets > 0
        s390/ipl: fix out of bounds access in scpdata_write
        s390/pci_dma: improve debugging of errors during dma map
        s390/pci_dma: handle dma table failures
        s390/pci_dma: unify label of invalid translation table entries
        s390/syscalls: remove system call number calculation
        s390/cio: simplify css_generate_pgid
        s390/diag: add a s390 prefix to the diagnose trace point
        s390/head: fix error message on unsupported hardware
      34258a32
    • Linus Torvalds's avatar
      Merge tag 'hwmon-for-linus-v4.4-rc2' of... · 0d77a123
      Linus Torvalds authored
      Merge tag 'hwmon-for-linus-v4.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
      
      Pull hwmon fixes from Guenter Roeck:
       "Fix build issues in scpi and ina2xx drivers, update scpi driver to
        support recent firmware, and fix an uninitialized variable warning in
        applesmc driver"
      
      * tag 'hwmon-for-linus-v4.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
        hwmon: (scpi) skip unsupported sensors properly
        hwmon: (scpi) add thermal-of dependency
        hwmon : (applesmc) Fix uninitialized variables warnings
        hwmon: (ina2xx) Fix build issue by selecting REGMAP_I2C
      0d77a123
  2. 17 Nov, 2015 20 commits
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 7f151f1d
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Fix list tests in netfilter ingress support, from Florian Westphal.
      
       2) Fix reversal of input and output interfaces in ingress hook
          invocation, from Pablo Neira Ayuso.
      
       3) We have a use after free in r8169, caught by Dave Jones, fixed by
          Francois Romieu.
      
       4) Splice use-after-free fix in AF_UNIX frmo Hannes Frederic Sowa.
      
       5) Three ipv6 route handling bug fixes from Martin KaFai Lau:
          a) Don't create clone routes not managed by the fib6 tree
          b) Don't forget to check expiration of DST_NOCACHE routes.
          c) Handle rt->dst.from == NULL properly.
      
       6) Several AF_PACKET fixes wrt transport header setting and SKB
          protocol setting, from Daniel Borkmann.
      
       7) Fix thunder driver crash on shutdown, from Pavel Fedin.
      
       8) Several Mellanox driver fixes (max MTU calculations, use of correct
          DMA unmap in TX path, etc.) from Saeed Mahameed, Tariq Toukan, Doron
          Tsur, Achiad Shochat, Eran Ben Elisha, and Noa Osherovich.
      
       9) Several mv88e6060 DSA driver fixes (wrong bit definitions for
          certain registers, etc.) from Neil Armstrong.
      
      10) Make sure to disable preemption while updating per-cpu stats of ip
          tunnels, from Jason A.  Donenfeld.
      
      11) Various ARM64 bpf JIT fixes, from Yang Shi.
      
      12) Flush icache properly in ARM JITs, from Daniel Borkmann.
      
      13) Fix masking of RX and TX interrupts in ravb driver, from Masaru
          Nagai.
      
      14) Fix netdev feature propagation for devices not implementing
          ->ndo_set_features().  From Nikolay Aleksandrov.
      
      15) Big endian fix in vmxnet3 driver, from Shrikrishna Khare.
      
      16) RAW socket code increments incorrect SNMP counters, fix from Ben
          Cartwright-Cox.
      
      17) IPv6 multicast SNMP counters are bumped twice, fix from Neil Horman.
      
      18) Fix handling of VLAN headers on stacked devices when REORDER is
          disabled.  From Vlad Yasevich.
      
      19) Fix SKB leaks and use-after-free in ipvlan and macvlan drivers, from
          Sabrina Dubroca.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (83 commits)
        MAINTAINERS: Update Mellanox's Eth NIC driver entries
        net/core: revert "net: fix __netdev_update_features return.." and add comment
        af_unix: take receive queue lock while appending new skb
        rtnetlink: fix frame size warning in rtnl_fill_ifinfo
        net: use skb_clone to avoid alloc_pages failure.
        packet: Use PAGE_ALIGNED macro
        packet: Don't check frames_per_block against negative values
        net: phy: Use interrupts when available in NOLINK state
        phy: marvell: Add support for 88E1540 PHY
        arm64: bpf: make BPF prologue and epilogue align with ARM64 AAPCS
        macvlan: fix leak in macvlan_handle_frame
        ipvlan: fix use after free of skb
        ipvlan: fix leak in ipvlan_rcv_frame
        vlan: Do not put vlan headers back on bridge and macvlan ports
        vlan: Fix untag operations of stacked vlans with REORDER_HEADER off
        via-velocity: unconditionally drop frames with bad l2 length
        ipg: Remove ipg driver
        dl2k: Add support for IP1000A-based cards
        snmp: Remove duplicate OUTMCAST stat increment
        net: thunder: Check for driver data in nicvf_remove()
        ...
      7f151f1d
    • Or Gerlitz's avatar
      MAINTAINERS: Update Mellanox's Eth NIC driver entries · e7523a49
      Or Gerlitz authored
      Eugenia (Jenny) Emantayev is replacing Amir Vadai as the
      mlx4 Ethernet driver maintainer.
      
      Saeed Mahameed is assigned to maintain mlx5 Eth functionality.
      Signed-off-by: default avatarOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e7523a49
    • Nikolay Aleksandrov's avatar
      net/core: revert "net: fix __netdev_update_features return.." and add comment · 17b85d29
      Nikolay Aleksandrov authored
      This reverts commit 00ee5927 ("net: fix __netdev_update_features return
      on ndo_set_features failure")
      and adds a comment explaining why it's okay to return a value other than
      0 upon error. Some drivers might actually change flags and return an
      error so it's better to fire a spurious notification rather than miss
      these.
      
      CC: Michał Mirosław <mirq-linux@rere.qmqm.pl>
      Signed-off-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      17b85d29
    • Hannes Frederic Sowa's avatar
      af_unix: take receive queue lock while appending new skb · a3a116e0
      Hannes Frederic Sowa authored
      While possibly in future we don't necessarily need to use
      sk_buff_head.lock this is a rather larger change, as it affects the
      af_unix fd garbage collector, diag and socket cleanups. This is too much
      for a stable patch.
      
      For the time being grab sk_buff_head.lock without disabling bh and irqs,
      so don't use locked skb_queue_tail.
      
      Fixes: 869e7c62 ("net: af_unix: implement stream sendpage support")
      Cc: Eric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Reported-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a3a116e0
    • Hannes Frederic Sowa's avatar
      rtnetlink: fix frame size warning in rtnl_fill_ifinfo · b22b941b
      Hannes Frederic Sowa authored
      Fix the following warning:
      
        CC      net/core/rtnetlink.o
      net/core/rtnetlink.c: In function ‘rtnl_fill_ifinfo’:
      net/core/rtnetlink.c:1308:1: warning: the frame size of 2864 bytes is larger than 2048 bytes [-Wframe-larger-than=]
       }
       ^
      by splitting up the huge rtnl_fill_ifinfo into some smaller ones, so we
      don't have the huge frame allocations at the same time.
      
      Cc: Eric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b22b941b
    • Martin Zhang's avatar
      net: use skb_clone to avoid alloc_pages failure. · 19125c1a
      Martin Zhang authored
      1. new skb only need dst and ip address(v4 or v6).
      2. skb_copy may need high order pages, which is very rare on long running server.
      Signed-off-by: default avatarJunwei Zhang <linggao.zjw@alibaba-inc.com>
      Signed-off-by: default avatarMartin Zhang <martinbj2008@gmail.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      19125c1a
    • Tobias Klauser's avatar
      packet: Use PAGE_ALIGNED macro · 90836b67
      Tobias Klauser authored
      Use PAGE_ALIGNED(...) instead of open-coding it.
      Signed-off-by: default avatarTobias Klauser <tklauser@distanz.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      90836b67
    • Tobias Klauser's avatar
      packet: Don't check frames_per_block against negative values · 4194b491
      Tobias Klauser authored
      rb->frames_per_block is an unsigned int, thus can never be negative.
      
      Also fix spacing in the calculation of frames_per_block.
      Signed-off-by: default avatarTobias Klauser <tklauser@distanz.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4194b491
    • Andrew Lunn's avatar
      net: phy: Use interrupts when available in NOLINK state · 321beec5
      Andrew Lunn authored
      The NOLINK state will poll the phy once a second to see if the link
      has come up. If the phy has an interrupt line, this polling can be
      skipped, since the phy should interrupt when the link returns.
      Signed-off-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      321beec5
    • Andrew Lunn's avatar
      phy: marvell: Add support for 88E1540 PHY · 819ec8e1
      Andrew Lunn authored
      The 88E1540 can be found embedded in the Marvell 88E6352 switch.  It
      is compatible with the 88E1510, so add support for it, using the
      88E1510 specific functions.
      Signed-off-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      819ec8e1
    • Yang Shi's avatar
      arm64: bpf: make BPF prologue and epilogue align with ARM64 AAPCS · ec0738db
      Yang Shi authored
      Save and restore FP/LR in BPF prog prologue and epilogue, save SP to FP
      in prologue in order to get the correct stack backtrace.
      
      However, ARM64 JIT used FP (x29) as eBPF fp register, FP is subjected to
      change during function call so it may cause the BPF prog stack base address
      change too.
      
      Use x25 to replace FP as BPF stack base register (fp). Since x25 is callee
      saved register, so it will keep intact during function call.
      It is initialized in BPF prog prologue when BPF prog is started to run
      everytime. Save and restore x25/x26 in BPF prologue and epilogue to keep
      them intact for the outside of BPF. Actually, x26 is unnecessary, but SP
      requires 16 bytes alignment.
      
      So, the BPF stack layout looks like:
      
                                       high
               original A64_SP =>   0:+-----+ BPF prologue
                                      |FP/LR|
               current A64_FP =>  -16:+-----+
                                      | ... | callee saved registers
                                      +-----+
                                      |     | x25/x26
               BPF fp register => -80:+-----+
                                      |     |
                                      | ... | BPF prog stack
                                      |     |
                                      |     |
               current A64_SP =>      +-----+
                                      |     |
                                      | ... | Function call stack
                                      |     |
                                      +-----+
                                        low
      
      CC: Zi Shen Lim <zlim.lnx@gmail.com>
      CC: Xi Wang <xi.wang@gmail.com>
      Signed-off-by: default avatarYang Shi <yang.shi@linaro.org>
      Acked-by: default avatarZi Shen Lim <zlim.lnx@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ec0738db
    • Sabrina Dubroca's avatar
      macvlan: fix leak in macvlan_handle_frame · e639b8d8
      Sabrina Dubroca authored
      Reset pskb in macvlan_handle_frame in case skb_share_check returned a
      clone.
      
      Fixes: 8a4eb573 ("net: introduce rx_handler results and logic around that")
      Signed-off-by: default avatarSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e639b8d8
    • Sabrina Dubroca's avatar
      ipvlan: fix use after free of skb · a534dc52
      Sabrina Dubroca authored
      ipvlan_handle_frame is a rx_handler, and when it returns a value other
      than RX_HANDLER_CONSUMED (here, NET_RX_DROP aka RX_HANDLER_ANOTHER),
      __netif_receive_skb_core expects that the skb still exists and will
      process it further, but we just freed it.
      
      Fixes: 2ad7bf36 ("ipvlan: Initial check-in of the IPVLAN driver.")
      Signed-off-by: default avatarSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a534dc52
    • Sabrina Dubroca's avatar
      ipvlan: fix leak in ipvlan_rcv_frame · cf554ada
      Sabrina Dubroca authored
      Pass a **skb to ipvlan_rcv_frame so that if skb_share_check returns a
      new skb, we actually use it during further processing.
      
      It's safe to ignore the new skb in the ipvlan_xmit_* functions, because
      they call ipvlan_rcv_frame with local == true, so that dev_forward_skb
      is called and always takes ownership of the skb.
      
      Fixes: 2ad7bf36 ("ipvlan: Initial check-in of the IPVLAN driver.")
      Signed-off-by: default avatarSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      cf554ada
    • David S. Miller's avatar
      Merge branch 'vlan-reorder' · eb3f8b42
      David S. Miller authored
      Vladislav Yasevich says:
      
      ====================
      Fix issues with vlans without REORDER_HEADER
      
      A while ago Phil Sutter brought up an issue with vlans without
      REORDER_HEADER and bridges.  The problem was that if a vlan
      without REORDER_HEADER was a port in the bridge, the bridge ended
      up forwarding corrupted packets that still contained the vlan header.
      The same issue exists for bridge mode macvlan/macvtap devices.
      
      An additional issue with vlans without REORDER_HEADER is that stacking
      them also doesn't work.  The reason here is that skb_reorder_vlan_header()
      function assumes that it on ETH_HLEN bytes deep into the packet.  That
      is not the case, when you a vlan without REORRDER_HEADER flag set.
      
      This series attempts to correct these 2 issues.
      
      1) To solve the stacked vlans problem, the patch simply use
      skb->mac_len as an offset to start copying mac addresses that
      is part of header reordering.
      
      2) To fix the issue with bridge/macvlan/macvtap, the second patch
      simply doesn't write the vlan header back to the packet if the
      vlan device is either a bridge or a macvlan port.  This ends up
      being the simplest and least performance intrussive solution.
      
      I've considered extending patch 2 to all stacked devices (essentially
      checked for the presense of rx_handler), but that feels like a broader
      restriction and _may_ break existing uses.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      eb3f8b42
    • Vlad Yasevich's avatar
      vlan: Do not put vlan headers back on bridge and macvlan ports · 28f9ee22
      Vlad Yasevich authored
      When a vlan is configured with REORDER_HEADER set to 0, the vlan
      header is put back into the packet and makes it appear that
      the vlan header is still there even after it's been processed.
      This posses a problem for bridge and macvlan ports.  The packets
      passed to those device may be forwarded and at the time of the
      forward, vlan headers end up being unexpectedly present.
      
      With the patch, we make sure that we do not put the vlan header
      back (when REORDER_HEADER is 0) if a bridge or macvlan has
      been configured on top of the vlan device.
      Signed-off-by: default avatarVladislav Yasevich <vyasevic@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      28f9ee22
    • Vlad Yasevich's avatar
      vlan: Fix untag operations of stacked vlans with REORDER_HEADER off · a6e18ff1
      Vlad Yasevich authored
      When we have multiple stacked vlan devices all of which have
      turned off REORDER_HEADER flag, the untag operation does not
      locate the ethernet addresses correctly for nested vlans.
      The reason is that in case of REORDER_HEADER flag being off,
      the outer vlan headers are put back and the mac_len is adjusted
      to account for the presense of the header.  Then, the subsequent
      untag operation, for the next level vlan, always use VLAN_ETH_HLEN
      to locate the begining of the ethernet header and that ends up
      being a multiple of 4 bytes short of the actuall beginning
      of the mac header (the multiple depending on the how many vlan
      encapsulations ethere are).
      
      As a reslult, if there are multiple levles of vlan devices
      with REODER_HEADER being off, the recevied packets end up
      being dropped.
      
      To solve this, we use skb->mac_len as the offset.  The value
      is always set on receive path and starts out as a ETH_HLEN.
      The value is also updated when the vlan header manupations occur
      so we know it will be correct.
      Signed-off-by: default avatarVladislav Yasevich <vyasevic@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a6e18ff1
    • Timo Teräs's avatar
      via-velocity: unconditionally drop frames with bad l2 length · 6c606fa3
      Timo Teräs authored
      By default the driver allowed incorrect frames to be received. What is
      worse the code does not handle very short frames correctly. The FCS
      length is unconditionally subtracted, and the underflow can cause
      skb_put to be called with large number after implicit cast to unsigned.
      And indeed, an skb_over_panic() was observed with via-velocity.
      
      This removes the module parameter as it does not work in it's
      current state, and should be implemented via NETIF_F_RXALL if needed.
      Suggested-by: default avatarFrancois Romieu <romieu@fr.zoreil.com>
      Signed-off-by: default avatarTimo Teräs <timo.teras@iki.fi>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6c606fa3
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · a18ab2f6
      Linus Torvalds authored
      Pull vfs fixes from Al Viro:
       "A fs-cache regression fix, and adding a warning about obnoxiou^W
        moderation of list given in MAINTAINERS"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        MAINTAINERS: linux-cachefs@redhat.com is moderated for non-subscribers
        FS-Cache: Add missing initialization of ret in cachefiles_write_page()
      a18ab2f6
    • Linus Torvalds's avatar
      Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · 864f83a1
      Linus Torvalds authored
      Pull crypto fix from Herbert Xu:
       "This fixes a bug in the qat driver where a user-space pointer is
        dereferenced"
      
      * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
        crypto: qat - don't use userspace pointer
      864f83a1