1. 19 Aug, 2016 19 commits
    • Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · a5c88182
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      1GbE Intel Wired LAN Driver Updates 2016-08-18
      
      This series contains updates to igb only.
      
      Gangfeng Huang provides all the changes in the series to update the
      igb driver to support advanced receive side filters that direct receive
      packets by flows to different hardware queues. This enables tight
      control over routing a flow in the platform.  The first patch allows
      receive network flow classification to insert and remove receive filters
      through ethtool.  The second and third patches add the ability to insert
      and remove ethertype and VLAN priority filters through ethtool.
      
      Last patch just fixes an error message to return "Not supported" versus
      "Unknown error 524".
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • igb: fix error code in igb_add_ethtool_nfc_entry() · 54be8132
      Gangfeng Huang authored
      The error "rmgr: Cannot insert RX class rule: Operation not supported" is
      more meaningful than "rmgr: Cannot insert RX class rule: Unknown error 524".
      Signed-off-by: Gangfeng Huang <gangfeng.huang@ni.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
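The two messages differ because of the errno value that reaches userspace: 524 is the kernel-internal ENOTSUPP, for which the C library has no text, while EOPNOTSUPP is a standard errno. A minimal userspace sketch of that difference follows; `KERNEL_ENOTSUPP` and `rule_error_text()` are illustrative names and are not part of the patch or of ethtool.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Kernel-internal ENOTSUPP value; userspace <errno.h> does not define it. */
#define KERNEL_ENOTSUPP 524

/* Hypothetical helper: the text a tool would print for a positive errno. */
static const char *rule_error_text(int err)
{
	assert(err > 0);
	return strerror(err);
}
```

With glibc, `rule_error_text(EOPNOTSUPP)` should give the "Operation not supported" wording quoted above, while `rule_error_text(KERNEL_ENOTSUPP)` falls back to a generic "Unknown error" string.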
    • igb: support RX flow classification by VLAN priority · 7a277a96
      Gangfeng Huang authored
      This patch allows RX network flow classification to insert and remove
      VLAN priority filters through ethtool.
      
      Example:
      Add a VLAN priority filter:
      $ ethtool -N eth0 flow-type ether vlan 0x6000 vlan-mask 0x1FFF action 2 loc 1
      
      Show all filters:
      $ ethtool -n eth0
      4 RX rings available
      Total 1 rules
      
      Filter: 1
      	Flow Type: Raw Ethernet
      	Src MAC addr: 00:00:00:00:00:00 mask: FF:FF:FF:FF:FF:FF
      	Dest MAC addr: 00:00:00:00:00:00 mask: FF:FF:FF:FF:FF:FF
      	Ethertype: 0x0 mask: 0xFFFF
      	VLAN EtherType: 0x0 mask: 0xffff
      	VLAN: 0x6000 mask: 0x1fff
      	User-defined: 0x0 mask: 0xffffffffffffffff
      	Action: Direct to queue 2
      
      Delete the filter by location:
      $ ethtool -N eth0 delete 1
      Signed-off-by: Ruhao Gao <ruhao.gao@ni.com>
      Signed-off-by: Gangfeng Huang <gangfeng.huang@ni.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
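The values in the VLAN example can be read as follows: the priority occupies the top three bits of the 16-bit VLAN TCI, so 0x6000 is priority 3, and the mask 0x1FFF covers the remaining 13 bits. The sketch below assumes, as the filter listing suggests (an unused field shows an all-ones mask), that bits set in the mask are ignored when matching. The function names are illustrative and do not come from the driver.

```c
#include <assert.h>
#include <stdint.h>

#define VLAN_PRIO_SHIFT 13       /* priority occupies bits 15..13 of the TCI */
#define VLAN_PRIO_MASK  0xe000u

/* Priority (0..7) carried in a 16-bit VLAN TCI. */
static unsigned int vlan_tci_priority(uint16_t tci)
{
	return (tci & VLAN_PRIO_MASK) >> VLAN_PRIO_SHIFT;
}

/* Hypothetical matcher: bits set in ignore_mask are not compared. */
static int vlan_rule_matches(uint16_t rule_tci, uint16_t ignore_mask,
			     uint16_t pkt_tci)
{
	uint16_t care = (uint16_t)~ignore_mask;

	return (rule_tci & care) == (pkt_tci & care);
}

/* Usage check with the values from the example above. */
static void vlan_rule_example(void)
{
	assert(vlan_tci_priority(0x6000) == 3);
	/* Same priority, different VLAN ID: still matches. */
	assert(vlan_rule_matches(0x6000, 0x1fff, 0x6005));
	/* Priority 2 does not match a priority 3 rule. */
	assert(!vlan_rule_matches(0x6000, 0x1fff, 0x4005));
}
```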
    • igb: support RX flow classification by ethertype · 64c75d41
      Gangfeng Huang authored
      This patch allows RX network flow classification to insert and remove
      ethertype filters through ethtool.
      
      Example:
      Add an ethertype filter:
      $ ethtool -N eth0 flow-type ether proto 0x88F8 action 2
      
      Show all filters:
      $ ethtool -n eth0
      4 RX rings available
      Total 1 rules
      
      Filter: 15
      	Flow Type: Raw Ethernet
      	Src MAC addr: 00:00:00:00:00:00 mask: FF:FF:FF:FF:FF:FF
      	Dest MAC addr: 00:00:00:00:00:00 mask: FF:FF:FF:FF:FF:FF
      	Ethertype: 0x88F8 mask: 0x0
      	Action: Direct to queue 2
      
      Delete the filter by location:
      $ ethtool -N eth0 delete 15
      Signed-off-by: Ruhao Gao <ruhao.gao@ni.com>
      Signed-off-by: Gangfeng Huang <gangfeng.huang@ni.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • igb: add support of RX network flow classification · 0e71def2
      Gangfeng Huang authored
      This patch allows RX network flow classification to insert and remove
      Rx filters through ethtool. The ethtool interface has its own rules
      manager.
      
      Show all filters:
      $ ethtool -n eth0
      4 RX rings available
      Total 2 rules
      Signed-off-by: Ruhao Gao <ruhao.gao@ni.com>
      Signed-off-by: Gangfeng Huang <gangfeng.huang@ni.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • Merge branch 'qdisc-hash-fixes' · 3e7d2d45
      David S. Miller authored
      Jiri Kosina says:
      
      ====================
      qdisc-hashtable fixes
      
      The following two patches fix all the issues that have been reported
      against the conversion of qdisc linked list to hashtable (currently in
      net-next) so far.
      
      First patch adjusts handling of singleton qdiscs to the new semantics, and
      is rather straightforward.
      
      The second patch, which fixes the "cosmetic" issue of duplicate entries in
      the qdisc dump for ingress qdiscs, is a little bit more hairy; I personally
      would love to see all the already existing "if (ingress)"-like hacks go
      away (by, let's say, introducing a general TCQ_F_? flag), but that's way
      out of scope of this patchset (but already on my todo).
      
      Thanks a lot to Daniel Borkmann and David Ahern for reporting the issues
      and testing the patches promptly.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: avoid duplicates in qdisc dump · ea327469
      Jiri Kosina authored
      tc_dump_qdisc() performs dumping of the per-device qdiscs in two phases;
      first, the "standard" dev->qdisc is being dumped. Second, if there is/are
      ingress queue(s), they are being dumped as well.
      
      After conversion of netdevice's qdisc linked-list into hashtable, these
      two sets are no longer disjoint sets/lists, but are both
      "reachable" directly from netdevice's hashtable. As a consequence, the
      "full-depth" dump of the ingress qdiscs results in immediately hitting the
      netdevice hashtable again, and duplicating the dump that has already been
      performed for dev->qdisc.
      What in fact needs to be dumped in case of ingress queue is "just" the
      top-level ingress qdisc, as everything else has been dumped already.
      
      Fix this by extending tc_dump_qdisc_root() in a way that it can be instructed
      whether it should (while performing the "full" per-netdev qdisc dump) perform
      the whole recursion, or just dump "additional" top-level (ingress) qdiscs
      without performing any kind of recursion.
      
      This fixes duplicate dumps such as
      
      	qdisc mq 0: root
      	qdisc pfifo_fast 0: parent :4 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
      	qdisc pfifo_fast 0: parent :3 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
      	qdisc pfifo_fast 0: parent :2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
      	qdisc pfifo_fast 0: parent :1 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
      	qdisc clsact ffff: parent ffff:fff1
      	qdisc pfifo_fast 0: parent :4 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
      	qdisc pfifo_fast 0: parent :3 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
      	qdisc pfifo_fast 0: parent :2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
      	qdisc pfifo_fast 0: parent :1 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
      
      Fixes: 59cc1f61 ("net: sched: convert qdisc linked list to hashtable")
      Reported-by: Daniel Borkmann <daniel@iogearbox.net>
      Tested-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
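The effect of the recursion switch described in this commit can be illustrated with a toy model. This is only a sketch: `toy_netdev`, `toy_dump_root()` and `toy_dump_dev()` are hypothetical names, the model only counts dumped lines, and it assumes the device table holds the non-root qdiscs. It is not the kernel's tc_dump_qdisc_root().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy netdevice; field names are illustrative, not the kernel's. */
struct toy_netdev {
	const char *root;        /* stands in for dev->qdisc */
	const char *ingress;     /* top-level ingress qdisc, or NULL */
	size_t nr_children;      /* qdiscs reachable from the device table */
};

/* Count the lines one root contributes to the dump. */
static size_t toy_dump_root(const struct toy_netdev *dev, const char *root,
			    bool recur)
{
	assert(dev != NULL);
	if (!root)
		return 0;
	/* The root itself, plus the whole device table when recursing. */
	return 1 + (recur ? dev->nr_children : 0);
}

/* Dump dev->qdisc in full, then the ingress qdisc as instructed. */
static size_t toy_dump_dev(const struct toy_netdev *dev, bool recur_ingress)
{
	return toy_dump_root(dev, dev->root, true) +
	       toy_dump_root(dev, dev->ingress, recur_ingress);
}
```

For the mq/clsact layout in the listing above (four pfifo_fast children), recursing for the ingress qdisc gives the 10 lines shown, while dumping only the top-level ingress qdisc gives 6.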
    • net: sched: fix handling of singleton qdiscs with qdisc_hash · 69012ae4
      Jiri Kosina authored
      qdisc_match_from_root() is now iterating over per-netdevice qdisc
      hashtable instead of going through a linked-list of qdiscs (independently
      of the actual underlying netdev), which was the case before the switch to
      hashtable for qdiscs.
      
      For singleton qdiscs, there is no underlying netdev associated though, and
      therefore dumping a singleton qdisc will panic, as qdisc_dev(root) will
      always be NULL.
      
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000410
       IP: [<ffffffff8167efac>] qdisc_match_from_root+0x2c/0x70
       PGD 1aceba067 PUD 1aceb7067 PMD 0
       Oops: 0000 [#1] PREEMPT SMP
      [ ... ]
       task: ffff8801ec996e00 task.stack: ffff8801ec934000
       RIP: 0010:[<ffffffff8167efac>]  [<ffffffff8167efac>] qdisc_match_from_root+0x2c/0x70
       RSP: 0018:ffff8801ec937ab0  EFLAGS: 00010203
       RAX: 0000000000000408 RBX: ffff88025e612000 RCX: ffffffffffffffd8
       RDX: 0000000000000000 RSI: 00000000ffff0000 RDI: ffffffff81cf8100
       RBP: ffff8801ec937ab0 R08: 000000000001c160 R09: ffff8802668032c0
       R10: ffffffff81cf8100 R11: 0000000000000030 R12: 00000000ffff0000
       R13: ffff88025e612000 R14: ffffffff81cf3140 R15: 0000000000000000
       FS:  00007f24b9af6740(0000) GS:ffff88026f280000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000000410 CR3: 00000001aceec000 CR4: 00000000001406e0
       Stack:
        ffff8801ec937ad0 ffffffff81681210 ffff88025dd51a00 00000000fffffff1
        ffff8801ec937b88 ffffffff81681e4e ffffffff81c42bc0 ffff880262431500
        ffffffff81cf3140 ffff88025dd51a10 ffff88025dd51a24 00000000ec937b38
       Call Trace:
        [<ffffffff81681210>] qdisc_lookup+0x40/0x50
        [<ffffffff81681e4e>] tc_modify_qdisc+0x21e/0x550
        [<ffffffff8166ae25>] rtnetlink_rcv_msg+0x95/0x220
        [<ffffffff81209602>] ? __kmalloc_track_caller+0x172/0x230
        [<ffffffff8166ad90>] ? rtnl_newlink+0x870/0x870
        [<ffffffff816897b7>] netlink_rcv_skb+0xa7/0xc0
        [<ffffffff816657c8>] rtnetlink_rcv+0x28/0x30
        [<ffffffff8168919b>] netlink_unicast+0x15b/0x210
        [<ffffffff81689569>] netlink_sendmsg+0x319/0x390
        [<ffffffff816379f8>] sock_sendmsg+0x38/0x50
        [<ffffffff81638296>] ___sys_sendmsg+0x256/0x260
        [<ffffffff811b1275>] ? __pagevec_lru_add_fn+0x135/0x280
        [<ffffffff811b1a90>] ? pagevec_lru_move_fn+0xd0/0xf0
        [<ffffffff811b1140>] ? trace_event_raw_event_mm_lru_insertion+0x180/0x180
        [<ffffffff811b1b85>] ? __lru_cache_add+0x75/0xb0
        [<ffffffff817708a6>] ? _raw_spin_unlock+0x16/0x40
        [<ffffffff811d8dff>] ? handle_mm_fault+0x39f/0x1160
        [<ffffffff81638b15>] __sys_sendmsg+0x45/0x80
        [<ffffffff81638b62>] SyS_sendmsg+0x12/0x20
        [<ffffffff810038e7>] do_syscall_64+0x57/0xb0
      
      Fix this by special-casing singleton qdiscs (those that don't have
      underlying netdevice) and introduce immediate handling of those rather
      than trying to go over an underlying netdevice. We're in the same
      situation in tc_dump_qdisc_root() and tc_dump_tclass_root().
      
      Ultimately, this will have to be slightly reworked so that we are actually
      able to show singleton qdiscs (noop) in the dump properly; but we're not
      currently doing that anyway, so no regression there, and better do this in
      a gradual manner.
      
      Fixes: 59cc1f61 ("net: sched: convert qdisc linked list to hashtable")
      Reported-by: Daniel Borkmann <daniel@iogearbox.net>
      Tested-by: Daniel Borkmann <daniel@iogearbox.net>
      Reported-by: David Ahern <dsa@cumulusnetworks.com>
      Tested-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
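The special-casing of singleton qdiscs can be sketched in userspace as an early return taken before the device is dereferenced. The types and the function below are simplified stand-ins with hypothetical names, not the kernel's qdisc_match_from_root().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_TABLE_SIZE 4

struct toy_netdev;

struct toy_qdisc {
	uint32_t handle;
	struct toy_netdev *dev;   /* NULL for a singleton qdisc */
};

struct toy_netdev {
	struct toy_qdisc *table[TOY_TABLE_SIZE];  /* stand-in for the hashtable */
	size_t nr_table;
};

/* Look up a handle starting from root without touching a NULL device. */
static struct toy_qdisc *toy_match_from_root(struct toy_qdisc *root,
					     uint32_t handle)
{
	size_t i;

	if (!root)
		return NULL;
	if (root->handle == handle)
		return root;
	/* Singleton: there is no device table to walk, so stop here. */
	if (!root->dev)
		return NULL;
	assert(root->dev->nr_table <= TOY_TABLE_SIZE);
	for (i = 0; i < root->dev->nr_table; i++)
		if (root->dev->table[i]->handle == handle)
			return root->dev->table[i];
	return NULL;
}
```

Without the `!root->dev` check, a lookup that starts from a singleton would dereference a NULL device, which is the situation the oops above reports.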
    • Merge branch 'tipc-next' · e951f145
      David S. Miller authored
      Jon Maloy says:
      
      ====================
      tipc: bearer and link improvements
      
      The first commit makes it possible to set and check the 'blocked' state
      of a bearer from the generic bearer layer. The second commit is a small
      improvement to the link congestion mechanism.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: ensure that link congestion and wakeup use same criteria · 5a0950c2
      Jon Paul Maloy authored
      When an attempt is made to wake up a link after congestion, it uses a
      different, more generous criterion than the one applied when the link was
      originally declared congested. As a result, the link, and the sending
      process, will sometimes be woken up unnecessarily, only to return to
      congestion immediately when it turns out there is not enough space in its
      send queue to host the pending message. This is a waste of CPU cycles.
      
      We now change the function link_prepare_wakeup() to use exactly the same
      criteria as tipc_link_xmit(). However, since we are now excluding the
      window limit from the wakeup calculation, and the current backlog limit
      for the lowest level is too small to house even a single maximum-size
      message, we have to expand this limit. We do this by evaluating an
      alternative, minimum value during the setting of the importance limits.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
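The idea of using one criterion for both decisions, plus a floor on the queue limit, can be sketched as below. This is a simplified model under stated assumptions: the structure, the function names and the way the minimum is derived (packets needed for one maximum-size message at a given MTU) are hypothetical and do not reproduce the TIPC code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy send queue; lengths and limits are counted in packets. */
struct toy_backlog {
	unsigned int len;
	unsigned int limit;
};

/* Single criterion shared by the send path and the wakeup path. */
static bool toy_congested(const struct toy_backlog *q)
{
	return q->len >= q->limit;
}

/* Wake a sender only if the send path would accept its message. */
static bool toy_may_wake(const struct toy_backlog *q)
{
	return !toy_congested(q);
}

/* Limit that can hold at least one maximum-size message. */
static unsigned int toy_limit(unsigned int base, unsigned int max_msg_bytes,
			      unsigned int mtu)
{
	unsigned int need;

	assert(mtu > 0);
	need = (max_msg_bytes + mtu - 1) / mtu;   /* round up to whole packets */
	return base > need ? base : need;
}
```

Because `toy_may_wake()` is defined as the negation of `toy_congested()`, a woken sender cannot immediately hit the congestion check again for the same queue state.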
    • tipc: make bearer packet filtering generic · 0d051bf9
      Jon Paul Maloy authored
      In commit 5b7066c3 ("tipc: stricter filtering of packets in bearer
      layer") we introduced a method of filtering out messages while a bearer
      is being reset, so that links cannot be re-created and come back into
      working state while we are still in the process of shutting them down.
      
      This solution works well, but is limited to only work with L2 media, which
      is insufficient with the increasing use of UDP as carrier media.
      
      We now replace this solution with a more generic one, by introducing a
      new flag "up" in the generic struct tipc_bearer. This field will be set
      and reset at the same locations as with the previous solution, while
      the packet filtering is moved to the generic code for the sending side.
      On the receiving side, the filtering is still done in media specific
      code, but now including the UDP bearer.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'qed-next' · 37bd91d1
      David S. Miller authored
      Sudarsana Reddy Kalluru says:
      
      ====================
      qed*: Add support for additional statistics.
      
      The patch series adds qed/qede support for new statistics.
      Patch (1) adds a couple of statistics for "ethtool -S" display.
      Patch (2) adds support for per-queue statistics to ethtool display.
      Patch (3) adds qed support for NCSI statistics.
      
      Please consider applying this to 'net-next' branch.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qed: Add support for NCSI statistics. · 6c754246
      Sudarsana Reddy Kalluru authored
      The patch adds driver support for sending the NCSI statistics to the
      MFW. This is an asynchronous request from the MFW. Upon receiving it, the
      driver populates the required data and sends it to the MFW.
      Signed-off-by: Sudarsana Reddy Kalluru <sudarsana.kalluru@qlogic.com>
      Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qede: Add support for capturing additional stats in ethtool-stats display. · 1a5a366f
      Sudarsana Reddy Kalluru authored
      The patch adds driver support for capturing stats ttl0_discard and
      packet_too_big_discard in "ethtool -S" display.
      Signed-off-by: Sudarsana Reddy Kalluru <sudarsana.kalluru@qlogic.com>
      Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: atm: remove redundant null pointer check on dev->name · 0d135e4f
      Colin Ian King authored
      dev->name is a char array of IFNAMSIZ elements, hence can never be
      null, so the null pointer check is redundant. Remove it.
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
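The reasoning can be shown with a small standalone sketch: an array embedded in a structure always has an address, so the only meaningful test is on its contents. `toy_net_device` and `toy_dev_has_name()` are illustrative names; the real structure has many more fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define IFNAMSIZ 16

/* Cut-down stand-in for a device structure with an embedded name. */
struct toy_net_device {
	char name[IFNAMSIZ];
};

/*
 * dev->name decays to the address of the first element, which is never
 * NULL for a valid dev, so check whether the string is empty instead.
 */
static bool toy_dev_has_name(const struct toy_net_device *dev)
{
	assert(dev != NULL);
	return dev->name[0] != '\0';
}
```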
    • net: phy: Update copyright info · e202d4c6
      Appana Durga Kedareswara Rao authored
      Most of the input for implementing this driver was
      provided by Andrew Lunn.
      
      Update the driver with Andrew's copyright.
      Signed-off-by: Kedareswara rao Appana <appanad@xilinx.com>
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ethernet: macb: Add support for rx_clk · aead88bd
      shubhrajyoti.datta@xilinx.com authored
      Some platforms, like zynqmp ultrascale+, have a
      separate clock gate for the rx clock. Add an optional
      rx_clk so that the clock can be enabled.
      Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@xilinx.com>
      Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
      Acked-by: Rob Herring <robh@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · d52bfbda
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      40GbE Intel Wired LAN Driver Updates 2016-08-18
      
      This series contains updates to i40e and i40evf only.
      
      Wei Yongjun updates i40e to use list_move() instead of list_del() &
      list_add() operations.
      
      Anjali fixes an issue where the client->open call was not protected with
      the client instance mutex, which allowed client->close to be called before
      the open call completed.
      
      Catherine makes sure that the VLAN count (and stats) gets reset to 0
      after reset.
      
      Jake provides two patches, first adds the needed rtnl lock around
      i40evf_set_interrupt_capability() since i40evf_init_task() does not
      hold the rtnl_lock.  Second fixes an issue where users could reduce
      the number of channels (queues) below the current flow director
      filter rules targets.
      
      Dave fixes a warning generated by a static analysis tool by
      eliminating the irrelevant check and redundant assignment for the
      value of enabled_tc.
      
      Avinash fixes a sync issue where the iWARP device open is called
      before the PCI register writes are completed, by ensuring the register
      writes complete before exiting the setup function.
      
      Alan fixes a bug which causes RSS to continue to work after being
      disabled.
      
      Carolyn implements a feature change which allows using ethtool to set
      RSS hash options using fewer than four parameters if desired.
      
      Dan Carpenter cleans up a stray unlock.
      
      Sridhar exposes the "trust" flag to userspace via ndo_get_vf_config().
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 18 Aug, 2016 17 commits
  3. 17 Aug, 2016 4 commits
    • Merge branch 'strparser' · 48433419
      David S. Miller authored
      Tom Herbert says:
      
      ====================
      strp: Stream parser for messages
      
      This patch set introduces a utility for parsing application layer
      protocol messages in a TCP stream. This is a generalization of the
      mechanism implemented in Kernel Connection Multiplexor (KCM).
      
      This patch set adapts KCM to use the strparser. We expect that kTLS
      can use this mechanism also. RDS would probably be another candidate
      to use a common stream parsing mechanism.
      
      The API includes a context structure, a set of callbacks, utility
      functions, and a data ready function. The callbacks include
      a parse_msg function that is called to perform parsing (e.g.
      BPF parsing in case of KCM), and a rcv_msg function that is called
      when a full message has been completed.
      
      For strparser we specify the return codes from the parser to allow
      the backend to indicate that control of the socket should be
      transferred back to userspace to handle some exceptions in the
      stream. The return values are:
      
            >0 : indicates length of successfully parsed message
             0  : indicates more data must be received to parse the message
             -ESTRPIPE : current message should not be processed by the
                kernel, return control of the socket to userspace which
                can proceed to read the messages itself
             other < 0 : Error in parsing, give control back to userspace
                assuming that synchronization is lost and the stream
                is unrecoverable (application expected to close TCP socket)
      
      There is one issue I haven't been able to fully resolve. If parse_msg
      returns -ESTRPIPE (wants control back to userspace) the parser may
      already have consumed some bytes of the message. There is no way to
      put bytes back into the TCP receive queue and tcp_read_sock does not
      allow an easy way to peek messages. In lieu of a better solution, we
      return ENODATA on the socket to indicate that the data stream is
      unrecoverable (application needs to close socket). This condition
      should only happen if an application layer message header is split
      across two skbuffs and parsing just the first skbuff wasn't sufficient
      to determine that transfer to userspace is needed.
      
      This patch set contains:
      
        - strparser implementation
        - changes to kcm to use strparser
        - strparser.txt documentation
      
      v2:
        - Add copyright notice to C files
        - Remove GPL module license from strparser.c
        - Add report of rxpause
      
      v3:
        - Restore GPL module license
        - Use EXPORT_SYMBOL_GPL
      
      v4:
        - Removed unused function, changed another to be static as suggested
          by davem
        - Reworked data_ready to be called from upper layer, no longer requires
          taking over socket data_ready callback as suggested by Lance Chao
      
      Tested:
        - Ran a KCM thrash test for 24 hours. No behavioral or performance
          differences observed.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • strparser: Documentation · adcce4d5
      Tom Herbert authored
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • kcm: Use stream parser · 9b73896a
      Tom Herbert authored
      Adapt KCM to use the stream parser. This mostly involves removing
      the RX handling and setting up the strparser using the interface.
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • strparser: Stream parser for messages · 43a0c675
      Tom Herbert authored
      This patch introduces a utility for parsing application layer protocol
      messages in a TCP stream. This is a generalization of the mechanism
      implemented in Kernel Connection Multiplexor (KCM).
      
      The API includes a context structure, a set of callbacks, utility
      functions, and a data ready function.
      
      A stream parser instance is defined by a strparser structure that
      is bound to a TCP socket. The function to initialize the structure
      is:
      
      int strp_init(struct strparser *strp, struct sock *csk,
                    struct strp_callbacks *cb);
      
      csk is the TCP socket being bound to and cb are the parser callbacks.
      
      The upper layer calls strp_tcp_data_ready when data is ready on the lower
      socket for strparser to process. This should be called from a data_ready
      callback that is set on the socket:
      
      void strp_tcp_data_ready(struct strparser *strp);
      
      A parser is bound to a TCP socket by setting the data_ready function to
      strp_tcp_data_ready so that all receive indications on the socket
      go through the parser. This assumes that sk_user_data is set to
      the strparser structure.
      
      There are four callbacks.
       - parse_msg is called to parse the message (returns length or error).
       - rcv_msg is called when a complete message has been received
       - read_sock_done is called when data_ready function exits
       - abort_parser is called to abort the parser
      
      The input to parse_msg is an skbuff which contains next message under
      construction. The backend processing of parse_msg will parse the
      application layer protocol headers to determine the length of
      the message in the stream. The possible return values are:
      
         >0 : indicates length of successfully parsed message
         0  : indicates more data must be received to parse the message
         -ESTRPIPE : current message should not be processed by the
            kernel, return control of the socket to userspace which
            can proceed to read the messages itself
         other < 0 : Error in parsing, give control back to userspace
            assuming that synchronization is lost and the stream
            is unrecoverable (application expected to close TCP socket)
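The return-value convention listed above can be illustrated with a userspace sketch of a parse routine for a made-up wire format. Everything about the format is an assumption for illustration: a 4-byte header whose first byte is a type and whose last two bytes are the total message length in big-endian order. The function operates on a plain byte buffer rather than an skbuff and is not the strparser callback signature.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#ifndef ESTRPIPE
#define ESTRPIPE 86          /* Linux value; fallback for other platforms */
#endif

#define TOY_HDR_LEN      4
#define TOY_TYPE_DATA    0x01
#define TOY_TYPE_HANDOFF 0x02  /* hypothetical "userspace must handle" type */

/* Returns >0 (message length), 0 (need more data) or a negative errno. */
static int toy_parse_msg(const unsigned char *buf, size_t len)
{
	size_t msg_len;

	assert(buf != NULL || len == 0);
	if (len < TOY_HDR_LEN)
		return 0;                 /* header incomplete: need more data */
	if (buf[0] == TOY_TYPE_HANDOFF)
		return -ESTRPIPE;         /* hand the stream back to userspace */
	if (buf[0] != TOY_TYPE_DATA)
		return -EBADMSG;          /* unknown type: synchronization lost */
	msg_len = ((size_t)buf[2] << 8) | buf[3];
	if (msg_len < TOY_HDR_LEN)
		return -EBADMSG;          /* length cannot be below the header */
	return (int)msg_len;
}
```

Note that the length is returned as soon as the header is available, even if the rest of the message has not arrived yet; 0 is reserved for the case where the length itself cannot be determined.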
      
      In the case of an error return (< 0), strparser will stop the parser
      and report an error to userspace. The application must deal
      with the error. To handle the error the strparser is unbound
      from the TCP socket. If the error indicates that the TCP stream
      is at a recoverable point (ESTRPIPE) then the application
      can read the TCP socket to process the stream. Once the application
      has dealt with the exceptions in the stream, it may again bind the
      socket to a strparser to continue data operations.
      
      Note that ENODATA may be returned to the application. In this case
      parse_msg returned -ESTRPIPE, however strparser was unable to maintain
      synchronization of the stream (i.e. some of the message in question
      was already read by the parser).
      
      strp_pause and strp_unpause are used to provide flow control. For
      instance, if rcv_msg is called but the upper layer can't immediately
      consume the message it can hold the message and pause strparser.
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>