- 24 Oct, 2023 33 commits
-
-
Swarup Laxman Kotiaklapudi authored
Change ifconfig with ip command, on a system where ifconfig is not used this script will not work correcly. Test result with this patchset: sudo make TARGETS="net" kselftest .... TAP version 13 1..1 timeout set to 1500 selftests: net: route_localnet.sh run arp_announce test net.ipv4.conf.veth0.route_localnet = 1 net.ipv4.conf.veth1.route_localnet = 1 net.ipv4.conf.veth0.arp_announce = 2 net.ipv4.conf.veth1.arp_announce = 2 PING 127.25.3.14 (127.25.3.14) from 127.25.3.4 veth0: 56(84) bytes of data. 64 bytes from 127.25.3.14: icmp_seq=1 ttl=64 time=0.038 ms 64 bytes from 127.25.3.14: icmp_seq=2 ttl=64 time=0.068 ms 64 bytes from 127.25.3.14: icmp_seq=3 ttl=64 time=0.068 ms 64 bytes from 127.25.3.14: icmp_seq=4 ttl=64 time=0.068 ms 64 bytes from 127.25.3.14: icmp_seq=5 ttl=64 time=0.068 ms --- 127.25.3.14 ping statistics --- 5 packets transmitted, 5 received, 0% packet loss, time 4073ms rtt min/avg/max/mdev = 0.038/0.062/0.068/0.012 ms ok run arp_ignore test net.ipv4.conf.veth0.route_localnet = 1 net.ipv4.conf.veth1.route_localnet = 1 net.ipv4.conf.veth0.arp_ignore = 3 net.ipv4.conf.veth1.arp_ignore = 3 PING 127.25.3.14 (127.25.3.14) from 127.25.3.4 veth0: 56(84) bytes of data. 64 bytes from 127.25.3.14: icmp_seq=1 ttl=64 time=0.032 ms 64 bytes from 127.25.3.14: icmp_seq=2 ttl=64 time=0.065 ms 64 bytes from 127.25.3.14: icmp_seq=3 ttl=64 time=0.066 ms 64 bytes from 127.25.3.14: icmp_seq=4 ttl=64 time=0.065 ms 64 bytes from 127.25.3.14: icmp_seq=5 ttl=64 time=0.065 ms --- 127.25.3.14 ping statistics --- 5 packets transmitted, 5 received, 0% packet loss, time 4092ms rtt min/avg/max/mdev = 0.032/0.058/0.066/0.013 ms ok ok 1 selftests: net: route_localnet.sh ... Signed-off-by: Swarup Laxman Kotiaklapudi <swarupkotikalapudi@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20231023123422.2895-1-swarupkotikalapudi@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Florian Fainelli says: ==================== Switch DSA to inclusive terminology One of the action items following Netconf'23 is to switch subsystems to use inclusive terminology. DSA has been making extensive use of the "master" and "slave" words which are now replaced by "conduit" and "user" respectively. ==================== Link: https://lore.kernel.org/r/20231023181729.1191071-1-florian.fainelli@broadcom.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Florian Fainelli authored
This preserves the existing IFLA_DSA_MASTER which is part of the uAPI and creates an alias named IFLA_DSA_CONDUIT. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://lore.kernel.org/r/20231023181729.1191071-3-florian.fainelli@broadcom.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Florian Fainelli authored
Use more inclusive terms throughout the DSA subsystem by moving away from "master" which is replaced by "conduit" and "slave" which is replaced by "user". No functional changes. Acked-by: Rob Herring <robh@kernel.org> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://lore.kernel.org/r/20231023181729.1191071-2-florian.fainelli@broadcom.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Gerhard Engleder authored
Compiler warns about a possible format-overflow in tsnep_request_irq(): drivers/net/ethernet/engleder/tsnep_main.c:884:55: warning: 'sprintf' may write a terminating nul past the end of the destination [-Wformat-overflow=] sprintf(queue->name, "%s-rx-%d", name, ^ drivers/net/ethernet/engleder/tsnep_main.c:881:55: warning: 'sprintf' may write a terminating nul past the end of the destination [-Wformat-overflow=] sprintf(queue->name, "%s-tx-%d", name, ^ drivers/net/ethernet/engleder/tsnep_main.c:878:49: warning: '-txrx-' directive writing 6 bytes into a region of size between 5 and 25 [-Wformat-overflow=] sprintf(queue->name, "%s-txrx-%d", name, ^~~~~~ Actually overflow cannot happen. Name is limited to IFNAMSIZ, because netdev_name() is called during ndo_open(). queue_index is single char, because less than 10 queues are supported. Fix warning with snprintf(). Additionally increase buffer to 32 bytes, because those 7 additional bytes were unused anyway. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202310182028.vmDthIUa-lkp@intel.com/Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20231023183856.58373-1-gerhard@engleder-embedded.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Jakub Kicinski says: ==================== net: deduplicate netdev name allocation After recent fixes we have even more duplicated code in netdev name allocation helpers. There are two complications in this code. First, __dev_alloc_name() clobbers its output arg even if allocation fails, forcing callers to do extra copies. Second as our experience in commit 55a5ec9b ("Revert "net: core: dev_get_valid_name is now the same as dev_alloc_name_ns"") and commit 029b6d14 ("Revert "net: core: maybe return -EEXIST in __dev_alloc_name"") taught us, user space is very sensitive to the exact error codes. Align the callers of __dev_alloc_name(), and remove some of its complexity. v1: https://lore.kernel.org/all/20231020011856.3244410-1-kuba@kernel.org/ ==================== Link: https://lore.kernel.org/r/20231023152346.3639749-1-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Remove unnecessary else clauses after return. I copied this if / else construct from somewhere, it makes the code harder to read. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20231023152346.3639749-7-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
__dev_alloc_name() is only called by dev_prep_valid_name(), which already checks that name is valid. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20231023152346.3639749-6-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Prior to restructuring __dev_alloc_name() handled both printf and non-printf names. In a clever attempt at code reuse it always prints the name into a buffer and checks if it's a duplicate. Trust the bitmap, and return an error if its full. This shrinks the possible ID space by one from 32K to 32K - 1, as previously the max value would have been tried as a valid ID. It seems very unlikely that anyone would care as we heard no requests to increase the max beyond 32k. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20231023152346.3639749-5-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
All callers of __dev_valid_name() go thru dev_prep_valid_name() which handles the non-printf case. Focus __dev_alloc_name() on the sprintf case, remove the indentation level. Minor functional change of returning -EINVAL if % is not found, which should now never happen. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20231023152346.3639749-4-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
__dev_alloc_name() handles both the sprintf and non-sprintf target names. This complicates the code. dev_prep_valid_name() already handles the non-sprintf case, before calling __dev_alloc_name(), make the only other caller also go thru dev_prep_valid_name(). This way we can drop the non-sprintf handling in __dev_alloc_name() in one of the next changes. commit 55a5ec9b ("Revert "net: core: dev_get_valid_name is now the same as dev_alloc_name_ns"") and commit 029b6d14 ("Revert "net: core: maybe return -EEXIST in __dev_alloc_name"") tell us that we can't start returning -EEXIST from dev_alloc_name() on name duplicates. Bite the bullet and pass the expected errno to dev_prep_valid_name(). dev_prep_valid_name() must now propagate out the allocated id for printf names. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20231023152346.3639749-3-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Callers of __dev_alloc_name() want to pass dev->name as the output buffer. Make __dev_alloc_name() not clobber that buffer on failure, and remove the workarounds in callers. dev_alloc_name_ns() is now completely unnecessary. The extra strscpy() added here will be gone by the end of the patch series. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20231023152346.3639749-2-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Mat Martineau says: ==================== mptcp: convert Netlink code to use YAML spec This series from Davide converts most of the MPTCP Netlink interface (plus uAPI bits) to use sources generated by YNL using a YAML spec file. This new YAML file is useful to validate the API and to generate a good documentation page. Patch 1 modifies YNL spec to support "uns-admin-perm" for genetlink legacy. Patch 2 adds support for validating exact length of netlink attrs. Patch 3 converts Netlink structures from small_ops to ops to prepare the switch to YAML. Patch 4 adds the Netlink YAML spec for MPTCP. Patch 5 adds and uses a new header file generated from the new YAML spec. Patch 6 renames some handlers to match the ones generated from the YAML spec. Patch 7 adds and uses Netlink policies automatically generated from the YAML spec. ==================== Link: https://lore.kernel.org/r/20231023-send-net-next-20231023-1-v2-0-16b1f701f900@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Davide Caratti authored
generated with: $ ./tools/net/ynl/ynl-gen-c.py --mode kernel \ > --spec Documentation/netlink/specs/mptcp.yaml --source \ > -o net/mptcp/mptcp_pm_gen.c $ ./tools/net/ynl/ynl-gen-c.py --mode kernel \ > --spec Documentation/netlink/specs/mptcp.yaml --header \ > -o net/mptcp/mptcp_pm_gen.h Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/340Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20231023-send-net-next-20231023-1-v2-7-16b1f701f900@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Davide Caratti authored
so that they will match names generated from YAML spec. Link: https://github.com/multipath-tcp/mptcp_net-next/issues/340Suggested-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20231023-send-net-next-20231023-1-v2-6-16b1f701f900@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Davide Caratti authored
generated with: $ ./tools/net/ynl/ynl-gen-c.py --mode uapi \ > --spec Documentation/netlink/specs/mptcp.yaml \ > --header -o include/uapi/linux/mptcp_pm.h Link: https://github.com/multipath-tcp/mptcp_net-next/issues/340Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20231023-send-net-next-20231023-1-v2-5-16b1f701f900@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Davide Caratti authored
it describes most of the current netlink interface (uAPI definitions, doit/dumpit operations and attributes) Link: https://github.com/multipath-tcp/mptcp_net-next/issues/340Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20231023-send-net-next-20231023-1-v2-4-16b1f701f900@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Davide Caratti authored
in the current MPTCP control plane, all operations use a netlink attribute of the same type "MPTCP_PM_ATTR". However, add/del/get/flush operations only parse the first element in the message _ the one that describes MPTCP endpoints (that was named MPTCP_PM_ATTR_ADDR and mostly used in ADD_ADDR operations _ probably the similarity of "attr", "addr" and "add" might cause some confusion to human readers). Convert MPTCP from 'small_ops' to 'ops', thus allowing different attributes for each single operation, hopefully makes all this clearer to human readers. - use a separate attribute set for add/del/get/flush address operation, binary compatible with the existing one, to store the endpoint address. MPTCP_PM_ENDPOINT_ADDR is added to the uAPI (with the same value as MPTCP_PM_ATTR_ADDR) for these operations. - convert mptcp_pm_ops[] and add policy files accordingly. this prepares MPTCP control plane to be described as YAML spec. Link: https://github.com/multipath-tcp/mptcp_net-next/issues/340Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20231023-send-net-next-20231023-1-v2-3-16b1f701f900@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Davide Caratti authored
add support for 'exact-len' validation on netlink attributes. Link: https://github.com/multipath-tcp/mptcp_net-next/issues/340Acked-by: Matthieu Baerts <matttbe@kernel.org> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20231023-send-net-next-20231023-1-v2-2-16b1f701f900@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Davide Caratti authored
this flag maps to GENL_UNS_ADMIN_PERM and will be used by future specs. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20231023-send-net-next-20231023-1-v2-1-16b1f701f900@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Liu Jian authored
A helper function for printing non-work-conserving alarms is added in commit b00355db ("pkt_sched: sch_hfsc: sch_htb: Add non-work-conserving warning handler."). In this commit, use qdisc_warn_nonwc() instead of WARN_ONCE() to handle the non-work-conserving warning in qfq Qdisc. Signed-off-by: Liu Jian <liujian56@huawei.com> Link: https://lore.kernel.org/r/20231023064729.370649-1-liujian56@huawei.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Adam Ford authored
Currently there is a device tree entry called "local-mac-address" which can be filled by the bootloader or manually set.This is useful when the user does not want to use the MAC address programmed into the SoC. Currently, the davinci_emac reads the MAC from the DT, copies it from pdata->mac_addr to priv->mac_addr, then blindly overwrites it by reading from registers in the SoC, and falls back to a random MAC if it's still not valid. This completely ignores any MAC address in the device tree. In order to use the local-mac-address, check to see if the contents of priv->mac_addr are valid before falling back to reading from the SoC when the MAC address is not valid. Signed-off-by: Adam Ford <aford173@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20231022151911.4279-1-aford173@gmail.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Abel Wu authored
Before sockets became aware of net-memcg's memory pressure since commit e1aab161 ("socket: initial cgroup code."), the memory usage would be granted to raise if below average even when under protocol's pressure. This provides fairness among the sockets of same protocol. That commit changes this because the heuristic will also be effective when only memcg is under pressure which makes no sense. So revert that behavior. After reverting, __sk_mem_raise_allocated() no longer considers memcg's pressure. As memcgs are isolated from each other w.r.t. memory accounting, consuming one's budget won't affect others. So except the places where buffer sizes are needed to be tuned, allow workloads to use the memory they are provisioned. Signed-off-by: Abel Wu <wuyun.abel@bytedance.com> Acked-by: Shakeel Butt <shakeelb@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20231019120026.42215-3-wuyun.abel@bytedance.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Abel Wu authored
There are now two accounting infrastructures for skmem, while the heuristics in __sk_mem_raise_allocated() were actually introduced before memcg was born. Add some comments to clarify whether they can be applied to both infrastructures or not. Suggested-by: Shakeel Butt <shakeelb@google.com> Signed-off-by: Abel Wu <wuyun.abel@bytedance.com> Acked-by: Shakeel Butt <shakeelb@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20231019120026.42215-2-wuyun.abel@bytedance.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Abel Wu authored
Code cleanup for both better simplicity and readability. No functional change intended. Signed-off-by: Abel Wu <wuyun.abel@bytedance.com> Acked-by: Shakeel Butt <shakeelb@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20231019120026.42215-1-wuyun.abel@bytedance.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Jan Kiszka authored
Helps identifying the ports in udev rules e.g. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/895ae9c1-b6dd-4a97-be14-6f2b73c7b2b5@siemens.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Vishvambar Panth S authored
Currently all RX frames are timestamped which results in a performance penalty when timestamping is not needed. The default is now being changed to not timestamp any Rx frames (HWTSTAMP_FILTER_NONE), but support has been added to allow changing the desired RX timestamping mode (HWTSTAMP_FILTER_ALL - which was the previous setting and HWTSTAMP_FILTER_PTP_V2_EVENT are now supported) using SIOCSHWTSTAMP. All settings were tested using the hwstamp_ctl application. It is also noted that ptp4l, when started, preconfigures the device to timestamp using HWTSTAMP_FILTER_PTP_V2_EVENT, so this driver continues to work properly "out of the box". Test setup: x64 PC with LAN7430 ---> x64 PC as partner iperf3 with - Timestamp all incoming packets: - - - - - - - - - - - - - - - - - - - - - - - - - [ ID] Interval Transfer Bitrate Retr [ 5] 0.00-5.05 sec 517 MBytes 859 Mbits/sec 0 sender [ 5] 0.00-5.00 sec 515 MBytes 864 Mbits/sec receiver iperf Done. iperf3 with - Timestamp only PTP packets: - - - - - - - - - - - - - - - - - - - - - - - - - [ ID] Interval Transfer Bitrate Retr [ 5] 0.00-5.04 sec 563 MBytes 937 Mbits/sec 0 sender [ 5] 0.00-5.00 sec 561 MBytes 941 Mbits/sec receiver Signed-off-by: Vishvambar Panth S <vishvambarpanth.s@microchip.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20231020185801.25649-1-vishvambarpanth.s@microchip.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Jakub Kicinski authored
Yunsheng Lin says: ==================== introduce page_pool_alloc() related API In [1] & [2] & [3], there are usecases for veth and virtio_net to use frag support in page pool to reduce memory usage, and it may request different frag size depending on the head/tail room space for xdp_frame/shinfo and mtu/packet size. When the requested frag size is large enough that a single page can not be split into more than one frag, using frag support only have performance penalty because of the extra frag count handling for frag support. So this patchset provides a page pool API for the driver to allocate memory with least memory utilization and performance penalty when it doesn't know the size of memory it need beforehand. 1. https://patchwork.kernel.org/project/netdevbpf/patch/d3ae6bd3537fbce379382ac6a42f67e22f27ece2.1683896626.git.lorenzo@kernel.org/ 2. https://patchwork.kernel.org/project/netdevbpf/patch/20230526054621.18371-3-liangchen.linux@gmail.com/ 3. https://github.com/alobakin/linux/tree/iavf-pp-frag ==================== Link: https://lore.kernel.org/r/20231020095952.11055-1-linyunsheng@huawei.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Yunsheng Lin authored
Use page_pool_alloc() API to allocate memory with least memory utilization and performance penalty. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> CC: Lorenzo Bianconi <lorenzo@kernel.org> CC: Alexander Duyck <alexander.duyck@gmail.com> CC: Liang Chen <liangchen.linux@gmail.com> CC: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://lore.kernel.org/r/20231020095952.11055-6-linyunsheng@huawei.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Yunsheng Lin authored
As more drivers begin to use the fragment API, update the document about how to decide which API to use for the driver author. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> CC: Lorenzo Bianconi <lorenzo@kernel.org> CC: Alexander Duyck <alexander.duyck@gmail.com> CC: Liang Chen <liangchen.linux@gmail.com> CC: Alexander Lobakin <aleksander.lobakin@intel.com> CC: Dima Tisnek <dimaqq@gmail.com> Link: https://lore.kernel.org/r/20231020095952.11055-5-linyunsheng@huawei.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Yunsheng Lin authored
Currently page pool supports the below use cases: use case 1: allocate page without page splitting using page_pool_alloc_pages() API if the driver knows that the memory it need is always bigger than half of the page allocated from page pool. use case 2: allocate page frag with page splitting using page_pool_alloc_frag() API if the driver knows that the memory it need is always smaller than or equal to the half of the page allocated from page pool. There is emerging use case [1] & [2] that is a mix of the above two case: the driver doesn't know the size of memory it need beforehand, so the driver may use something like below to allocate memory with least memory utilization and performance penalty: if (size << 1 > max_size) page = page_pool_alloc_pages(); else page = page_pool_alloc_frag(); To avoid the driver doing something like above, add the page_pool_alloc() API to support the above use case, and update the true size of memory that is acctually allocated by updating '*size' back to the driver in order to avoid exacerbating truesize underestimate problem. Rename page_pool_free() which is used in the destroy process to __page_pool_destroy() to avoid confusion with the newly added API. 1. https://lore.kernel.org/all/d3ae6bd3537fbce379382ac6a42f67e22f27ece2.1683896626.git.lorenzo@kernel.org/ 2. https://lore.kernel.org/all/20230526054621.18371-3-liangchen.linux@gmail.com/Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> CC: Lorenzo Bianconi <lorenzo@kernel.org> CC: Alexander Duyck <alexander.duyck@gmail.com> CC: Liang Chen <liangchen.linux@gmail.com> CC: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://lore.kernel.org/r/20231020095952.11055-4-linyunsheng@huawei.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Yunsheng Lin authored
PP_FLAG_PAGE_FRAG is not really needed after pp_frag_count handling is unified and page_pool_alloc_frag() is supported in 32-bit arch with 64-bit DMA, so remove it. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> CC: Lorenzo Bianconi <lorenzo@kernel.org> CC: Alexander Duyck <alexander.duyck@gmail.com> CC: Liang Chen <liangchen.linux@gmail.com> CC: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://lore.kernel.org/r/20231020095952.11055-3-linyunsheng@huawei.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Yunsheng Lin authored
Currently when page_pool_create() is called with PP_FLAG_PAGE_FRAG flag, page_pool_alloc_pages() is only allowed to be called under the below constraints: 1. page_pool_fragment_page() need to be called to setup page->pp_frag_count immediately. 2. page_pool_defrag_page() often need to be called to drain the page->pp_frag_count when there is no more user will be holding on to that page. Those constraints exist in order to support a page to be split into multi fragments. And those constraints have some overhead because of the cache line dirtying/bouncing and atomic update. Those constraints are unavoidable for case when we need a page to be split into more than one fragment, but there is also case that we want to avoid the above constraints and their overhead when a page can't be split as it can only hold a fragment as requested by user, depending on different use cases: use case 1: allocate page without page splitting. use case 2: allocate page with page splitting. use case 3: allocate page with or without page splitting depending on the fragment size. Currently page pool only provide page_pool_alloc_pages() and page_pool_alloc_frag() API to enable the 1 & 2 separately, so we can not use a combination of 1 & 2 to enable 3, it is not possible yet because of the per page_pool flag PP_FLAG_PAGE_FRAG. So in order to allow allocating unsplit page without the overhead of split page while still allow allocating split page we need to remove the per page_pool flag in page_pool_is_last_frag(), as best as I can think of, it seems there are two methods as below: 1. Add per page flag/bit to indicate a page is split or not, which means we might need to update that flag/bit everytime the page is recycled, dirtying the cache line of 'struct page' for use case 1. 2. Unify the page->pp_frag_count handling for both split and unsplit page by assuming all pages in the page pool is split into a big fragment initially. As page pool already supports use case 1 without dirtying the cache line of 'struct page' whenever a page is recyclable, we need to support the above use case 3 with minimal overhead, especially not adding any noticeable overhead for use case 1, and we are already doing an optimization by not updating pp_frag_count in page_pool_defrag_page() for the last fragment user, this patch chooses to unify the pp_frag_count handling to support the above use case 3. There is no noticeable performance degradation and some justification for unifying the frag_count handling with this patch applied using a micro-benchmark testing in [1]. 1. https://lore.kernel.org/all/bf2591f8-7b3c-4480-bb2c-31dc9da1d6ac@huawei.com/Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> CC: Lorenzo Bianconi <lorenzo@kernel.org> CC: Alexander Duyck <alexander.duyck@gmail.com> CC: Liang Chen <liangchen.linux@gmail.com> CC: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://lore.kernel.org/r/20231020095952.11055-2-linyunsheng@huawei.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 23 Oct, 2023 7 commits
-
-
Jakub Kicinski authored
Merge tag 'for-net-next-2023-10-23' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Luiz Augusto von Dentz says: ==================== bluetooth-next pull request for net-next: - Add 0bda:b85b for Fn-Link RTL8852BE - ISO: Many fixes for broadcast support - Mark bcm4378/bcm4387 as BROKEN_LE_CODED - Add support ITTIM PE50-M75C - Add RTW8852BE device 13d3:3570 - Add support for QCA2066 - Add support for Intel Misty Peak - 8087:0038 * tag 'for-net-next-2023-10-23' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next: Bluetooth: hci_sync: Fix Opcode prints in bt_dev_dbg/err Bluetooth: Fix double free in hci_conn_cleanup Bluetooth: btmtksdio: enable bluetooth wakeup in system suspend Bluetooth: btusb: Add 0bda:b85b for Fn-Link RTL8852BE Bluetooth: hci_bcm4377: Mark bcm4378/bcm4387 as BROKEN_LE_CODED Bluetooth: ISO: Copy BASE if service data matches EIR_BAA_SERVICE_UUID Bluetooth: Make handle of hci_conn be unique Bluetooth: btusb: Add date->evt_skb is NULL check Bluetooth: ISO: Fix bcast listener cleanup Bluetooth: msft: __hci_cmd_sync() doesn't return NULL Bluetooth: ISO: Match QoS adv handle with BIG handle Bluetooth: ISO: Allow binding a bcast listener to 0 bises Bluetooth: btusb: Add RTW8852BE device 13d3:3570 to device tables Bluetooth: qca: add support for QCA2066 Bluetooth: ISO: Set CIS bit only for devices with CIS support Bluetooth: Add support for Intel Misty Peak - 8087:0038 Bluetooth: Add support ITTIM PE50-M75C Bluetooth: ISO: Pass BIG encryption info through QoS Bluetooth: ISO: Fix BIS cleanup ==================== Link: https://lore.kernel.org/r/20231023182119.3629194-1-luiz.dentz@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Jiri Pirko says: ==================== devlink: finish conversion to generated split_ops This patchset converts the remaining genetlink commands to generated split_ops and removes the existing small_ops arrays entirely alongside with shared netlink attribute policy. Patches #1-#6 are just small preparations and small fixes on multiple places. Note that couple of patches contain the "Fixes" tag but no need to put them into -net tree. Patch #7 is a simple rename preparation Patch #8 is the main one in this set and adds actual definitions of cmds in to yaml file. Patches #9-#10 finalize the change removing bits that are no longer in use. ==================== Link: https://lore.kernel.org/r/20231021112711.660606-1-jiri@resnulli.usSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jiri Pirko authored
All commands are now covered by generated split_ops. Remove the small_ops entirely alongside with unified devlink netlink policy array. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20231021112711.660606-11-jiri@resnulli.usSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jiri Pirko authored
The prototypes are now generated, remove the old ones. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20231021112711.660606-10-jiri@resnulli.usSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jiri Pirko authored
Currently, some of the commands are not described in devlink yaml file and are manually filled in net/devlink/netlink.c in small_ops. To make all part of split_ops, add definitions of the rest of the commands alongside with needed attributes and enums. Note that this focuses on the kernel side. The requests are fully described in order to generate split_op alongside with policies. Follow-up will describe the replies in order to make the userspace helpers complete. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20231021112711.660606-9-jiri@resnulli.usSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jiri Pirko authored
All remaining doit and dumpit netlink callback functions are going to be used by generated split ops. They expect certain name format. Rename the callback to be aligned with generated names. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20231021112711.660606-8-jiri@resnulli.usSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jiri Pirko authored
Since this enum is going to be used in generated userspace file, name it properly. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20231021112711.660606-7-jiri@resnulli.usSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-