1. 16 Jan, 2023 13 commits
  2. 14 Jan, 2023 23 commits
  3. 13 Jan, 2023 4 commits
    • sock: add tracepoint for send recv length · 6e6eda44
      Yunhui Cui authored
      Add two tracepoints to monitor the tcp/udp traffic per process and
      per cgroup.
      
      Regarding monitoring the tcp/udp traffic of each process, there are two
      existing solutions, the first one is https://www.atoptool.nl/netatop.php.
      The second is via kprobe/kretprobe.
      
      Netatop solution is implemented by registering the hook function at the
      hook point provided by the netfilter framework.
      
      These hook functions may be in the soft interrupt context and cannot
      directly obtain the pid. Some data structures are added to bind packets
      and processes. For example, struct taskinfobucket, struct taskinfo ...
      
      Every time the process sends or receives packets, multiple hashmap
      lookups are needed, resulting in low performance, and the tcp/udp
      traffic statistics are inaccurate (for example, when multiple threads
      share a socket).
      
      We can obtain the information with kretprobe, but as we know, kprobe
      gets its result by trapping into an exception, which loses performance
      compared to a tracepoint.
      
      We compared the performance of tracepoints with the above two methods, and
      the results are as follows:
      
      ab -n 1000000 -c 1000 -r http://127.0.0.1/index.html
      without trace:
      Time per request: 39.660 [ms] (mean)
      Time per request: 0.040 [ms] (mean, across all concurrent requests)
      
      netatop:
      Time per request: 50.717 [ms] (mean)
      Time per request: 0.051 [ms] (mean, across all concurrent requests)
      
      kr:
      Time per request: 43.168 [ms] (mean)
      Time per request: 0.043 [ms] (mean, across all concurrent requests)
      
      tracepoint:
      Time per request: 41.004 [ms] (mean)
      Time per request: 0.041 [ms] (mean, across all concurrent requests)
      
      It can be seen that tracepoint has better performance.
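      Benchmark aside, the kind of per-process accounting these tracepoints
      enable can be sketched as follows. This is an illustration, not code
      from the patch; the event layout (pid, direction, length) is an
      assumption about what a userspace consumer would collect.

```python
from collections import defaultdict

def aggregate_traffic(events):
    """Sum bytes per pid and direction from a stream of tracepoint-style
    events. Each event is (pid, direction, length), mirroring the data a
    userspace consumer of the new send/recv tracepoints could gather."""
    totals = defaultdict(lambda: {"send": 0, "recv": 0})
    for pid, direction, length in events:
        totals[pid][direction] += length
    # Convert to plain dicts for a stable, printable result.
    return {pid: dict(t) for pid, t in totals.items()}

# Example event stream: two processes, mixed send/recv
events = [(1001, "send", 1500), (1001, "recv", 400),
          (1002, "send", 700), (1001, "send", 500)]
per_pid = aggregate_traffic(events)
```

      Because the tracepoints fire in process context, the pid is available
      directly, avoiding the packet-to-process binding structures that the
      netatop approach needs.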
      Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
      Signed-off-by: Xiongchun Duan <duanxiongchun@bytedance.com>
      Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'rmnet-tx-pkt-aggregation' · 8e8b6c63
      David S. Miller authored
      Daniele Palmas says:
      
      ====================
      net: add tx packets aggregation to ethtool and rmnet
      
      Hello maintainers and all,
      
      this patchset implements tx qmap packets aggregation in rmnet and generic
      ethtool support for that.
      
      Some low-cat Thread-x based modems are not capable of properly reaching
      the maximum allowed throughput in both tx and rx during a bidirectional
      test if tx packets aggregation is not enabled.
      
      I verified this problem with rmnet + qmi_wwan by using a MDM9207 Cat. 4 based modem
      (50Mbps/150Mbps max throughput). What is actually happening is pictured at
      https://drive.google.com/file/d/1gSbozrtd9h0X63i6vdkNpN68d-9sg8f9/view
      
      Testing with iperf TCP, when the rx and tx flows are tested individually
      there is no issue in tx and only minor issues in rx (not able to reach max
      throughput). When there are concurrent tx and rx flows, tx throughput has
      a huge drop; rx has a minor one, but it is still present.
      
      The same scenario with tx aggregation enabled is pictured at
      https://drive.google.com/file/d/1jcVIKNZD7K3lHtwKE5W02mpaloudYYih/view
      showing a regular graph.
      
      This issue does not happen with high-cat modems (e.g. SDX20), or at least it
      does not happen at the throughputs I'm able to test currently: maybe the same
      could happen when moving close to the maximum rates supported by those modems.
      Anyway, having the tx aggregation enabled should not hurt.
      
      The first attempt to solve this issue was in qmi_wwan qmap implementation,
      see the discussion at https://lore.kernel.org/netdev/20221019132503.6783-1-dnlplm@gmail.com/
      
      However, it turned out that rmnet was a better candidate for the implementation.
      
      Moreover, Greg and Jakub also suggested using ethtool for the
      configuration: not sure if I got their advice right, but this patchset
      also adds generic ethtool support for tx aggregation.
      
      The patches have been tested mainly against an MDM9207 based modem through USB
      and SDX55 through PCI (MHI).
      
      v2 should address the comments highlighted in the review: the implementation is
      still in rmnet, due to Subash's request of keeping tx aggregation there.
      
      v3 fixes ethtool-netlink.rst content that was out of table bounds and a
      W=1 build warning for patch 2.
      
      v4 solves a race related to egress_agg_params.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: qualcomm: rmnet: add ethtool support for configuring tx aggregation · db8a563a
      Daniele Palmas authored
      Add support for ETHTOOL_COALESCE_TX_AGGR for configuring the tx
      aggregation settings.
      Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: qualcomm: rmnet: add tx packets aggregation · 64b5d1f8
      Daniele Palmas authored
      Add tx packets aggregation.
      
      Bidirectional TCP throughput tests through iperf with low-cat
      Thread-x based modems revealed performance issues both in tx
      and rx.
      
      The Windows driver does not show this issue: inspecting USB
      packets revealed that the only notable change is the driver
      enabling tx packets aggregation.
      
      Tx packets aggregation is by default disabled and can be enabled
      by increasing the value of ETHTOOL_A_COALESCE_TX_MAX_AGGR_FRAMES.
      
      The maximum aggregated size is by default set to a reasonably low
      value in order to support the majority of modems.
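      A minimal sketch of this buffering scheme follows. It is illustrative
      only: the class name, default thresholds, and flush policy are invented
      here, and the real driver also forces a flush on a timer, which is
      omitted.

```python
class TxAggregator:
    """Buffer outgoing frames and hand them to the device as one batch
    once either the frame-count or the byte-size limit is reached."""

    def __init__(self, max_frames=10, max_bytes=8192):
        self.max_frames = max_frames
        self.max_bytes = max_bytes
        self.buf = []          # frames waiting to be aggregated
        self.buf_bytes = 0
        self.flushed = []      # batches "sent" to the device

    def send(self, pkt: bytes):
        # Flush first if adding this frame would exceed the byte limit.
        if self.buf and self.buf_bytes + len(pkt) > self.max_bytes:
            self.flush()
        self.buf.append(pkt)
        self.buf_bytes += len(pkt)
        # Flush once the frame-count limit is reached.
        if len(self.buf) >= self.max_frames:
            self.flush()

    def flush(self):
        if self.buf:
            self.flushed.append(b"".join(self.buf))
            self.buf, self.buf_bytes = [], 0
```

      Keeping the default byte limit conservative matches the commit's point
      that a reasonably low maximum aggregated size supports the majority of
      modems.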
      
      This implementation is based on patches available in Code Aurora
      repositories (msm kernel) whose main authors are
      
      Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
      Sean Tranchetti <stranche@codeaurora.org>
      Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
      Reviewed-by: Subash Abhinov Kasiviswanathan <quic_subashab@quicinc.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>