- 09 Jan, 2019 1 commit
-
-
Jason Gunthorpe authored
'dev' is non NULL when the addr_len check triggers so it must goto a label that does the dev_put otherwise dev will have a leaked refcount. This bug causes the ib_ipoib module to become unloadable when using systemd-network as it triggers this check on InfiniBand links. Fixes: 99137b78 ("packet: validate address length") Reported-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 08 Jan, 2019 17 commits
-
-
David S. Miller authored
Daniel Borkmann says: ==================== pull-request: bpf 2019-01-08 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Fix BSD'ism in sendmsg(2) to rewrite unspecified IPv6 dst for unconnected UDP sockets with [::1] _after_ cgroup BPF invocation, from Andrey. 2) Follow-up fix to the speculation fix where we need to reject a corner case for sanitation when ptr and scalars are mixed in the same alu op. Also, some unrelated minor doc fixes, from Daniel. 3) Fix BPF kselftest's incorrect uses of create_and_get_cgroup() by not assuming fd of zero value to be the result of an error case, from Stanislav. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
Add a VLAN on a bridge port, delete it and make sure the PVID VLAN is not affected. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
When a VLAN is deleted from a bridge port we should not change the PVID unless the deleted VLAN is the PVID. Fixes: fe9ccc78 ("mlxsw: spectrum_switchdev: Don't batch VLAN operations") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
When running the test on the Spectrum ASIC the generated packets are counted on the ingress filter and injected back to the pipeline because of the 'pass' action. The router block then drops the packets due to checksum error, as the test generates packets with zero checksum. When running the test on an emulator that is not as strict about checksum errors the test fails since packets are counted twice. Once by the emulated ASIC on its ingress filter and again by the kernel as the emulator does not perform checksum validation and allows the packets to be trapped by a matching host route. Fix this by changing the action to 'drop', which will prevent the packet from continuing further in the pipeline to the router block. For veth pairs this change is essentially a NOP given packets are only processed once (by the kernel). Fixes: a0b61f3d ("selftests: forwarding: vxlan_bridge_1d: Add an ECN decap test") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
When adding / deleting VLANs to / from a bridge port, the bridge driver first tries to propagate the information via switchdev and falls back to the 8021q driver in case the underlying driver does not support switchdev. This can result in a memory leak [1] when VXLAN and mlxsw ports are enslaved to the bridge: $ ip link set dev vxlan0 master br0 # No mlxsw ports are enslaved to 'br0', so mlxsw ignores the switchdev # notification and the bridge driver adds the VLAN on 'vxlan0' via the # 8021q driver $ bridge vlan add vid 10 dev vxlan0 pvid untagged # mlxsw port is enslaved to the bridge $ ip link set dev swp1 master br0 # mlxsw processes the switchdev notification and the 8021q driver is # skipped $ bridge vlan del vid 10 dev vxlan0 This results in 'struct vlan_info' and 'struct vlan_vid_info' being leaked, as they were allocated by the 8021q driver during VLAN addition, but never freed as the 8021q driver was skipped during deletion. Fix this by introducing a new VLAN private flag that indicates whether the VLAN was added on the port by switchdev or the 8021q driver. If the VLAN was added by the 8021q driver, then we make sure to delete it via the 8021q driver as well. [1] unreferenced object 0xffff88822d20b1e8 (size 256): comm "bridge", pid 2532, jiffies 4295216998 (age 1188.830s) hex dump (first 32 bytes): e0 42 97 ce 81 88 ff ff 00 00 00 00 00 00 00 00 .B.............. 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000f82d851d>] kmem_cache_alloc_trace+0x1be/0x330 [<00000000e0178b02>] vlan_vid_add+0x661/0x920 [<00000000218ebd5f>] __vlan_add+0x1be9/0x3a00 [<000000006eafa1ca>] nbp_vlan_add+0x8b3/0xd90 [<000000003535392c>] br_vlan_info+0x132/0x410 [<00000000aedaa9dc>] br_afspec+0x75c/0x870 [<00000000f5716133>] br_setlink+0x3dc/0x6d0 [<00000000aceca5e2>] rtnl_bridge_setlink+0x615/0xb30 [<00000000a2f2d23e>] rtnetlink_rcv_msg+0x3a3/0xa80 [<0000000064097e69>] netlink_rcv_skb+0x152/0x3c0 [<000000008be8d614>] rtnetlink_rcv+0x21/0x30 [<000000009ab2ca25>] netlink_unicast+0x52f/0x740 [<00000000e7d9ac96>] netlink_sendmsg+0x9c7/0xf50 [<000000005d1e2050>] sock_sendmsg+0xbe/0x120 [<00000000d51426bc>] ___sys_sendmsg+0x778/0x8f0 [<00000000b9d7b2cc>] __sys_sendmsg+0x112/0x270 unreferenced object 0xffff888227454308 (size 32): comm "bridge", pid 2532, jiffies 4295216998 (age 1188.882s) hex dump (first 32 bytes): 88 b2 20 2d 82 88 ff ff 88 b2 20 2d 82 88 ff ff .. -...... -.... 81 00 0a 00 01 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000f82d851d>] kmem_cache_alloc_trace+0x1be/0x330 [<0000000018050631>] vlan_vid_add+0x3e6/0x920 [<00000000218ebd5f>] __vlan_add+0x1be9/0x3a00 [<000000006eafa1ca>] nbp_vlan_add+0x8b3/0xd90 [<000000003535392c>] br_vlan_info+0x132/0x410 [<00000000aedaa9dc>] br_afspec+0x75c/0x870 [<00000000f5716133>] br_setlink+0x3dc/0x6d0 [<00000000aceca5e2>] rtnl_bridge_setlink+0x615/0xb30 [<00000000a2f2d23e>] rtnetlink_rcv_msg+0x3a3/0xa80 [<0000000064097e69>] netlink_rcv_skb+0x152/0x3c0 [<000000008be8d614>] rtnetlink_rcv+0x21/0x30 [<000000009ab2ca25>] netlink_unicast+0x52f/0x740 [<00000000e7d9ac96>] netlink_sendmsg+0x9c7/0xf50 [<000000005d1e2050>] sock_sendmsg+0xbe/0x120 [<00000000d51426bc>] ___sys_sendmsg+0x778/0x8f0 [<00000000b9d7b2cc>] __sys_sendmsg+0x112/0x270 Fixes: d70e42b2 ("mlxsw: spectrum: Enable VxLAN enslavement to VLAN-aware bridges") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Cc: Roopa Prabhu <roopa@cumulusnetworks.com> Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Cc: bridge@lists.linux-foundation.org Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
Add a test case for the issue fixed by previous commit. In case the offloading of an unsupported VxLAN tunnel was triggered by adding the mapped VLAN to a local port, then error should be returned to the user. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
Adding a VLAN on a port can trigger the offload of a VXLAN tunnel which is already a member in the VLAN. In case the configuration of the VXLAN is not supported, the driver would return -EOPNOTSUPP. This is problematic since bridge code does not interpret this as error, but rather that it should try to setup the VLAN using the 8021q driver instead of switchdev. Fixes: d70e42b2 ("mlxsw: spectrum: Enable VxLAN enslavement to VLAN-aware bridges") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
Drivers are not supposed to return errors in switchdev commit phase if they returned OK in prepare phase. Otherwise, a WARNING is emitted. However, when the offloading of a VXLAN tunnel is triggered by the addition of a VLAN on a local port, it is not possible to guarantee that the commit phase will succeed without doing a lot of work. In these cases, the artificial division between prepare and commit phase does not make sense, so simply do the work in the prepare phase. Fixes: d70e42b2 ("mlxsw: spectrum: Enable VxLAN enslavement to VLAN-aware bridges") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
When VXLAN is a loadable module, MLXSW_SPECTRUM must not be built-in: drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c:2547: undefined reference to `vxlan_fdb_find_uc' Add Kconfig dependency to enforce usable configurations. Fixes: 1231e04f ("mlxsw: spectrum_switchdev: Add support for VxLAN encapsulation") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: kbuild test robot <lkp@intel.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
Make sure that lag port TX is disabled before mlxsw_sp_port_lag_leave() is called and prevent from possible EMAD error. Fixes: 0d65fc13 ("mlxsw: spectrum: Implement LAG port join/leave") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nir Dotan authored
Removal of the mlxsw driver on Spectrum-2 platforms hits an ASSERT_RTNL() in Spectrum-2 ACL Bloom filter and in ERP removal paths. This happens because the multicast router implementation in Spectrum-2 relies on ACLs. Taking the RTNL lock upon driver removal is useless since the driver first removes its ports and unregisters from notifiers so concurrent writes cannot happen at that time. The assertions were originally put as a reminder for future work involving ERP background optimization, but having these assertions only during addition serves this purpose as well. Therefore remove the ASSERT_RTNL() in both places related to ERP and Bloom filter removal. Fixes: cf7221a4 ("mlxsw: spectrum_router: Add Multicast routing support for Spectrum-2") Signed-off-by: Nir Dotan <nird@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nir Dotan authored
When writing to C-TCAM, mlxsw driver uses cregion->ops->entry_insert(). In case of C-TCAM HW insertion error, the opposite action should take place. Add error handling case in which the C-TCAM region entry is removed, by calling cregion->ops->entry_remove(). Fixes: a0a777b9 ("mlxsw: spectrum_acl: Start using A-TCAM") Signed-off-by: Nir Dotan <nird@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Heiner Kallweit authored
This soft dependency works around an issue where sometimes the genphy driver is used instead of the dedicated PHY driver. The root cause of the issue isn't clear yet. People reported the unloading/re-loading module r8169 helps, and also configuring this soft dependency in the modprobe config files. Important just seems to be that the realtek module is loaded before r8169. Once this has been applied preliminary fix 38af4b90 ("net: phy: add workaround for issue where PHY driver doesn't bind to the device") will be removed. Fixes: f1e911d5 ("r8169: add basic phylib support") Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Bryan Whitehead authored
It has been noticed that some phys do not have the registers required by the previous implementation. To fix this, instead of using phy_read, the required information is extracted from the phy_device structure. fixes: 23f0703c ("lan743x: Add main source files for new lan743x driver") Signed-off-by: Bryan Whitehead <Bryan.Whitehead@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eugene Syromiatnikov authored
The ioctl command is read/write (or just read, if the fact that user space writes n_samples field is ignored). Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eugene Syromiatnikov authored
Otherwise it is impossible to use it for something else, as it will break userspace that puts garbage there. The same check should be done in other structures, but the fact that data in reserved fields is ignored is already part of the kernel ABI. Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Daniel Borkmann says: ==================== pull-request: bpf 2019-01-08 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Fix BSD'ism in sendmsg(2) to rewrite unspecified IPv6 dst for unconnected UDP sockets with [::1] _after_ cgroup BPF invocation, from Andrey. 2) Follow-up fix to the speculation fix where we need to reject a corner case for sanitation when ptr and scalars are mixed in the same alu op. Also, some unrelated minor doc fixes, from Daniel. 3) Fix BPF kselftest's incorrect uses of create_and_get_cgroup() by not assuming fd of zero value to be the result of an error case, from Stanislav. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 07 Jan, 2019 14 commits
-
-
Alexei Starovoitov authored
Daniel Borkmann says: ==================== Two trivial doc follow-ups to i) remove deprecated kern_version mentioning in the design qa and ii) to mention stand-alone build and license of libbpf. Thanks! ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Daniel Borkmann authored
Given this came up couple of times, add a note to libbpf's readme about the semi-automated mirror for a stand-alone build which is officially managed by BPF folks. While at it, also explicitly state the libbpf license in the readme file. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Daniel Borkmann authored
Update the bpf_design_QA.rst to also reflect recent changes in 6c4fc209 ("bpf: remove useless version check for prog load"). Suggested-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Stanislav Fomichev authored
We have some tests that assume create_and_get_cgroup returns -1 on error which is incorrect (it returns 0 on error). Since fd might be zero in general case, change create_and_get_cgroup to return -1 on error and fix the users that assume 0 on error. Fixes: f269099a ("tools/bpf: add a selftest for bpf_get_current_cgroup_id() helper") Fixes: 7d2c6cfc ("bpf: use --cgroup in test_suite if supplied") v2: - instead of fixing the uses that assume -1 on error, convert the users that assume 0 on error (fd might be zero in general case) Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Cong Wang authored
In smc_release() we release smc->clcsock before unhash the smc sock, but a parallel smc_diag_dump() may be still reading smc->clcsock, therefore this could cause a use-after-free as reported by syzbot. Reported-and-tested-by: syzbot+fbd1e5476e4c94c7b34e@syzkaller.appspotmail.com Fixes: 51f1de79 ("net/smc: replace sock_put worker by socket refcounting") Cc: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reported-by: syzbot+0bf2e01269f1274b4b03@syzkaller.appspotmail.com Reported-by: syzbot+e3132895630f957306bc@syzkaller.appspotmail.com Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jason Gunthorpe authored
This driver requires regmap or the compile fails: drivers/phy/ti/phy-gmii-sel.c:43:27: error: array type has incomplete element type ‘struct reg_field’ const struct reg_field (*regfields)[PHY_GMII_SEL_LAST]; Add it to kconfig. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
JianJhen Chen authored
When handling DNAT'ed packets on a bridge device, the neighbour cache entry from lookup was used without checking its state. It means that a cache entry in the NUD_STALE state will be used directly instead of entering the NUD_DELAY state to confirm the reachability of the neighbor. This problem becomes worse after commit 2724680b ("neigh: Keep neighbour cache entries if number of them is small enough."), since all neighbour cache entries in the NUD_STALE state will be kept in the neighbour table as long as the number of cache entries does not exceed the value specified in gc_thresh1. This commit validates the state of a neighbour cache entry before using the entry. Signed-off-by: JianJhen Chen <kchen@synology.com> Reviewed-by: JinLin Chen <jlchen@synology.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Gustavo A. R. Silva authored
There is a memory leak in case genlmsg_put fails. Fix this by freeing *args* before return. Addresses-Coverity-ID: 1476406 ("Resource leak") Fixes: 46273cf7 ("tipc: fix a missing check of genlmsg_put") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Bjørn Mork authored
This function is unreadable enough without indenting mismatches and unnecessary line breaks. Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jacob Wen authored
Yes indeed, DIV_ROUND_UP is in kernel.h. Signed-off-by: Jacob Wen <jian.w.wen@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Heiner Kallweit authored
Avoid log spam caused by trying to read counters from the chip whilst it is in a PCI power-save state. Reference: https://bugzilla.kernel.org/show_bug.cgi?id=107421 Fixes: 1ef7286e ("r8169: Dereference MMIO address immediately before use") Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Oliver Hartkopp authored
Muyu Yu provided a POC where user root with CAP_NET_ADMIN can create a CAN frame modification rule that makes the data length code a higher value than the available CAN frame data size. In combination with a configured checksum calculation where the result is stored relatively to the end of the data (e.g. cgw_csum_xor_rel) the tail of the skb (e.g. frag_list pointer in skb_shared_info) can be rewritten which finally can cause a system crash. Michael Kubecek suggested to drop frames that have a DLC exceeding the available space after the modification process and provided a patch that can handle CAN FD frames too. Within this patch we also limit the length for the checksum calculations to the maximum of Classic CAN data length (8). CAN frames that are dropped by these additional checks are counted with the CGW_DELETED counter which indicates misconfigurations in can-gw rules. This fixes CVE-2019-3701. Reported-by: Muyu Yu <ieatmuttonchuan@gmail.com> Reported-by: Marcus Meissner <meissner@suse.de> Suggested-by: Michal Kubecek <mkubecek@suse.cz> Tested-by: Muyu Yu <ieatmuttonchuan@gmail.com> Tested-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Cc: linux-stable <stable@vger.kernel.org> # >= v3.2 Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Stephen Warren authored
pci_{,un}map_sg are deprecated and replaced by dma_{,un}map_sg. This is especially relevant since the rest of the driver uses the DMA API. Fix the driver to use the replacement APIs. Signed-off-by: Stephen Warren <swarren@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Stephen Warren authored
This patch solves a crash at the time of mlx4 driver unload or system shutdown. The crash occurs because dma_alloc_coherent() returns one value in mlx4_alloc_icm_coherent(), but a different value is passed to dma_free_coherent() in mlx4_free_icm_coherent(). In turn this is because when allocated, that pointer is passed to sg_set_buf() to record it, then when freed it is re-calculated by calling lowmem_page_address(sg_page()) which returns a different value. Solve this by recording the value that dma_alloc_coherent() returns, and passing this to dma_free_coherent(). This patch is roughly equivalent to commit 378efe79 ("RDMA/hns: Get rid of page operation after dma_alloc_coherent"). Based-on-code-from: Christoph Hellwig <hch@lst.de> Signed-off-by: Stephen Warren <swarren@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 06 Jan, 2019 3 commits
-
-
Alexei Starovoitov authored
Daniel Borkmann says: ==================== Follow-up fix to 979d63d5 ("bpf: prevent out of bounds speculation on pointer arithmetic") in order to reject a corner case for sanitation when ptr / scalars are mixed in the same alu op. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Daniel Borkmann authored
Add couple of test_verifier tests to check sanitation of alu op insn with pointer and scalar type coming from different paths. This also includes BPF insns of the test reproducer provided by Jann Horn. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Daniel Borkmann authored
While 979d63d5 ("bpf: prevent out of bounds speculation on pointer arithmetic") took care of rejecting alu op on pointer when e.g. pointer came from two different map values with different map properties such as value size, Jann reported that a case was not covered yet when a given alu op is used in both "ptr_reg += reg" and "numeric_reg += reg" from different branches where we would incorrectly try to sanitize based on the pointer's limit. Catch this corner case and reject the program instead. Fixes: 979d63d5 ("bpf: prevent out of bounds speculation on pointer arithmetic") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
- 05 Jan, 2019 5 commits
-
-
David Ahern authored
I realized the last patch calls dev_get_by_index_rcu in a branch not holding the rcu lock. Add the calls to rcu_read_lock and rcu_read_unlock. Fixes: ec90ad33 ("ipv6: Consider sk_bound_dev_if when binding a socket to a v4 mapped address") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alexei Starovoitov authored
Andrey Ignatov says: ==================== The patch set fixes BSD'ism in sys_sendmsg to rewrite unspecified destination IPv6 for unconnected UDP sockets in sys_sendmsg with [::1] in case when either CONFIG_CGROUP_BPF is enabled or when sys_sendmsg BPF hook sets destination IPv6 to [::]. Patch 1 is the fix and provides more details. Patch 2 adds two test cases to verify the fix. v1->v2: * Fix compile error in patch 1. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Andrey Ignatov authored
Test that sys_sendmsg BPF hook doesn't break sys_sendmsg behaviour to rewrite destination IPv6 = [::] with [::1] (BSD'ism). Two test cases are added: 1) User passes dst IPv6 = [::] and BPF_CGROUP_UDP6_SENDMSG program doesn't touch it. 2) User passes dst IPv6 != [::], but BPF_CGROUP_UDP6_SENDMSG program rewrites it with [::]. In both cases [::1] is used by sys_sendmsg code eventually and datagram is sent successfully for unconnected UDP socket. Example of relevant output: Test case: sendmsg6: set dst IP = [::] (BSD'ism) .. [PASS] Test case: sendmsg6: preserve dst IP = [::] (BSD'ism) .. [PASS] Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Andrey Ignatov authored
sys_sendmsg has supported unspecified destination IPv6 (wildcard) for unconnected UDP sockets since 876c7f41. When [::] is passed by user as destination, sys_sendmsg rewrites it with [::1] to be consistent with BSD (see "BSD'ism" comment in the code). This didn't work when cgroup-bpf was enabled though since the rewrite [::] -> [::1] happened before passing control to cgroup-bpf block where fl6.daddr was updated with passed by user sockaddr_in6.sin6_addr (that might or might not be changed by BPF program). That way if user passed [::] as dst IPv6 it was first rewritten with [::1] by original code from 876c7f41, but then rewritten back with [::] by cgroup-bpf block. It happened even when BPF_CGROUP_UDP6_SENDMSG program was not present (CONFIG_CGROUP_BPF=y was enough). The fix is to apply BSD'ism after cgroup-bpf block so that [::] is replaced with [::1] no matter where it came from: passed by user to sys_sendmsg or set by BPF_CGROUP_UDP6_SENDMSG program. Fixes: 1cedee13 ("bpf: Hooks for sys_sendmsg") Reported-by: Nitin Rawat <nitin.rawat@intel.com> Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
David Ahern authored
Similar to c5ee0663 ("ipv6: Consider sk_bound_dev_if when binding a socket to an address"), binding a socket to v4 mapped addresses needs to consider if the socket is bound to a device. This problem also exists from the beginning of git history. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-