- 28 Aug, 2017 28 commits
-
-
Gao Feng authored
The commit 520ac30f ("net_sched: drop packets after root qdisc lock is released) made a big change of tc for performance. But there are some points which are not changed in SFQ enqueue operation. 1. Fail to find the SFQ hash slot; 2. When the queue is full; Now use qdisc_drop instead free skb directly. Signed-off-by: Gao Feng <gfree.wind@vip.163.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Jiri Pirko says: ==================== mlxsw: spectrum: Fix couple of dpipe ipv4 host table bugs Arkadi Sharshevsky (1): mlxsw: spectrum_dpipe: Fix host table dump Jiri Pirko (1): mlxsw: spectrum: compile-in dpipe support only if devlink is enabled ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Arkadi Sharshevsky authored
During the neighbor traversal the neighbors from different families should be ignored. Fixes: c58035a74aba ("mlxsw: spectrum_dpipe: Add support for IPv4 host table dump") Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
Makes no sense to have dpipe compiled in when devlink is not enabled, because the devlink dpipe registation is noop function. So don't compile it in. This also fixes missing extern structs errors. Reported-by: kbuild test robot <fengguang.wu@intel.com> Fixes: a86f0309 ("mlxsw: spectrum_dpipe: Add support for IPv4 host table dump") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Dexuan Cui authored
Hyper-V Sockets (hv_sock) supplies a byte-stream based communication mechanism between the host and the guest. It uses VMBus ringbuffer as the transportation layer. With hv_sock, applications between the host (Windows 10, Windows Server 2016 or newer) and the guest can talk with each other using the traditional socket APIs. More info about Hyper-V Sockets is available here: "Make your own integration services": https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/user-guide/make-integration-service The patch implements the necessary support in Linux guest by introducing a new vsock transport for AF_VSOCK. Signed-off-by: Dexuan Cui <decui@microsoft.com> Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Andy King <acking@vmware.com> Cc: Dmitry Torokhov <dtor@vmware.com> Cc: George Zhang <georgezhang@vmware.com> Cc: Jorgen Hansen <jhansen@vmware.com> Cc: Reilly Grant <grantr@vmware.com> Cc: Asias He <asias@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Cathy Avery <cavery@redhat.com> Cc: Rolf Neugebauer <rolf.neugebauer@docker.com> Cc: Marcelo Cerri <marcelo.cerri@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
Add a basic test for checking whether kernel is populating the jited and xlated BPF images. It was used to confirm the behaviour change from commit d777b2dd ("bpf: don't zero out the info struct in bpf_obj_get_info_by_fd()"), which made bpf_obj_get_info_by_fd() usable for retrieving the image dumps. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Dan Carpenter authored
"err" is set to zero if bpf_map_area_alloc() fails so it means we return ERR_PTR(0) which is NULL. The caller, find_and_alloc_map(), is not expecting NULL returns and will oops. Fixes: 174a79ff ("bpf: sockmap with sk redirect support") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David Ahern authored
Twice patches trying to constify inet{6}_protocol have been reverted: 39294c3d ("Revert "ipv6: constify inet6_protocol structures"") to revert 3a3a4e30 and then 03157937 ("Revert "ipv4: make net_protocol const"") to revert aa8db499. Add a comment that the structures can not be const because the early_demux field can change based on a sysctl. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Willem de Bruijn authored
The xen driver initializes struct ubuf_info fields using designated initializers. I recently moved these fields inside a nested anonymous struct inside an anonymous union. I had missed this use case. This breaks compilation of xen-netback with older compilers. >From kbuild bot with gcc-4.4.7: drivers/net//xen-netback/interface.c: In function 'xenvif_init_queue': >> drivers/net//xen-netback/interface.c:554: error: unknown field 'ctx' specified in initializer >> drivers/net//xen-netback/interface.c:554: warning: missing braces around initializer drivers/net//xen-netback/interface.c:554: warning: (near initialization for '(anonymous).<anonymous>') >> drivers/net//xen-netback/interface.c:554: warning: initialization makes integer from pointer without a cast >> drivers/net//xen-netback/interface.c:555: error: unknown field 'desc' specified in initializer Add double braces around the designated initializers to match their nested position in the struct. After this, compilation succeeds again. Fixes: 4ab6c99d ("sock: MSG_ZEROCOPY notification coalescing") Reported-by: kbuild bot <lpk@intel.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
William Tu says: ==================== gre: add collect_md mode for ERSPAN tunnel This patch series provide collect_md mode for ERSPAN tunnel. The fist patch refactors the existing gre_fb_xmit function by exacting the route cache portion into a new function called prepare_fb_xmit. The second patch introduces the collect_md mode for ERSPAN tunnel, by calling the prepare_fb_xmit function and adding ERSPAN specific logic. The final patch adds the test case using bpf_skb_{set,get}_tunnel_{key,opt}. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
William Tu authored
Extend existing tests for vxlan, gre, geneve, ipip to include ERSPAN tunnel. Signed-off-by: William Tu <u9012063@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
William Tu authored
Similar to gre, vxlan, geneve, ipip tunnels, allow ERSPAN tunnels to operate in 'collect metadata' mode. bpf_skb_[gs]et_tunnel_key() helpers can make use of it right away. OVS can use it as well in the future. Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
William Tu authored
The patch refactors the gre_fb_xmit function, by creating prepare_fb_xmit function for later ERSPAN collect_md mode patch. Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David Ahern authored
This reverts commit aa8db499. Early demux structs can not be made const. Doing so results in: [ 84.967355] BUG: unable to handle kernel paging request at ffffffff81684b10 [ 84.969272] IP: proc_configure_early_demux+0x1e/0x3d [ 84.970544] PGD 1a0a067 [ 84.970546] P4D 1a0a067 [ 84.971212] PUD 1a0b063 [ 84.971733] PMD 80000000016001e1 [ 84.972669] Oops: 0003 [#1] SMP [ 84.973065] Modules linked in: ip6table_filter ip6_tables veth vrf [ 84.973833] CPU: 0 PID: 955 Comm: sysctl Not tainted 4.13.0-rc6+ #22 [ 84.974612] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014 [ 84.975855] task: ffff88003854ce00 task.stack: ffffc900005a4000 [ 84.976580] RIP: 0010:proc_configure_early_demux+0x1e/0x3d [ 84.977253] RSP: 0018:ffffc900005a7dd0 EFLAGS: 00010246 [ 84.977891] RAX: ffffffff81684b10 RBX: 0000000000000001 RCX: 0000000000000000 [ 84.978759] RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000000000000 [ 84.979628] RBP: ffffc900005a7dd0 R08: 0000000000000000 R09: 0000000000000000 [ 84.980501] R10: 0000000000000001 R11: 0000000000000008 R12: 0000000000000001 [ 84.981373] R13: ffffffffffffffea R14: ffffffff81a9b4c0 R15: 0000000000000002 [ 84.982249] FS: 00007feb237b7700(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000 [ 84.983231] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 84.983941] CR2: ffffffff81684b10 CR3: 0000000038492000 CR4: 00000000000406f0 [ 84.984817] Call Trace: [ 84.985133] proc_tcp_early_demux+0x29/0x30 I think this is the second time such a patch has been reverted. Cc: Bhumika Goyal <bhumirks@gmail.com> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Bhumika Goyal authored
Make this const as it is either used during a copy operation or passed to a const argument of the function rhltable_init Signed-off-by: Bhumika Goyal <bhumirks@gmail.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Bhumika Goyal authored
Make these const as they are only passed to a const argument of the function inet_add_protocol. Signed-off-by: Bhumika Goyal <bhumirks@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Bhumika Goyal authored
Make this const as it is only passed to a const argument of the function ebt_register_table. Signed-off-by: Bhumika Goyal <bhumirks@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
John Fastabend says: ==================== sockmap UAPI updates and fixes This series updates sockmap UAPI, adds additional test cases and provides a couple fixes. First the UAPI changes. The original API added two sockmap specific API artifacts (a) a new map_flags field with a sockmap specific update command and (b) a new sockmap specific attach field in the attach data structure. After this series instead of attaching programs with a single command now two commands are used to attach programs to maps individually. This allows us to add new programs easily in the future and avoids any specific sockmap data structure additions. The map_flags field is also removed and instead we allow socks to be added to multiple maps that may or may not have programs attached. This allows users to decide if a sock should run a SK_SKB program type on receive based on the map it is attached to. This is a nice improvement. See patches for specific details. More test cases were added to test above changes and also stress test the interface. Finally two fixes/improvements were made. First a missing rcu section was added. Second now sockmap can build without KCM being used to trigger 'y' on CONFIG_STREAM_PARSER by selecting a new BPF config option. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
John Fastabend authored
Sockmap is a bit different than normal stress tests that can run in parallel as is. We need to reuse the same socket pool and map pool to get good stress test cases. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
John Fastabend authored
SOCKMAP uses strparser code (compiled with Kconfig option CONFIG_STREAM_PARSER) to run the parser BPF program. Without this config option set sockmap wont be compiled. However, at the moment the only way to pull in the strparser code is to enable KCM. To resolve this create a BPF specific config option to pull only the strparser piece in that sockmap needs. This also allows folks who want to use BPF/syscall/maps but don't need sockmap to easily opt out. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
John Fastabend authored
After userspace pushes sockets into a sockmap it may not be receiving data (assuming stream_{parser|verdict} programs are attached). But, it may still want to manage the socks. A common pattern is to poll/select for a POLLRDHUP event so we can close the sock. This patch adds the logic to wake up these listeners. Also add TCP_SYN_SENT to the list of events to handle. We don't want to break the connection just because we happen to be in this state. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
John Fastabend authored
When attaching a program to sockmap we need to check map type is correct. Fixes: 174a79ff ("bpf: sockmap with sk redirect support") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
John Fastabend authored
Tests packet read/writes and additional skb fields. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
John Fastabend authored
Add some more sockmap tests to cover, - forwarding to NULL entries - more than two maps to test list ops - forwarding to different map Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
John Fastabend authored
References to psock must be done inside RCU critical section. Fixes: 174a79ff ("bpf: sockmap with sk redirect support") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
John Fastabend authored
The addition of map_flags BPF_SOCKMAP_STRPARSER flags was to handle a specific use case where we want to have BPF parse program disabled on an entry in a sockmap. However, Alexei found the API a bit cumbersome and I agreed. Lets remove the STRPARSER flag and support the use case by allowing socks to be in multiple maps. This allows users to create two maps one with programs attached and one without. When socks are added to maps they now inherit any programs attached to the map. This is a nice generalization and IMO improves the API. The API rules are less ambiguous and do not need a flag: - When a sock is added to a sockmap we have two cases, i. The sock map does not have any attached programs so we can add sock to map without inheriting bpf programs. The sock may exist in 0 or more other maps. ii. The sock map has an attached BPF program. To avoid duplicate bpf programs we only add the sock entry if it does not have an existing strparser/verdict attached, returning -EBUSY if a program is already attached. Otherwise attach the program and inherit strparser/verdict programs from the sock map. This allows for socks to be in a multiple maps for redirects and inherit a BPF program from a single map. Also this patch simplifies the logic around BPF_{EXIST|NOEXIST|ANY} flags. In the original patch I tried to be extra clever and only update map entries when necessary. Now I've decided the complexity is not worth it. If users constantly update an entry with the same sock for no reason (i.e. update an entry without actually changing any parameters on map or sock) we still do an alloc/release. Using this and allowing multiple entries of a sock to exist in a map the logic becomes much simpler. Note: Now that multiple maps are supported the "maps" pointer called when a socket is closed becomes a list of maps to remove the sock from. To keep the map up to date when a sock is added to the sockmap we must add the map/elem in the list. Likewise when it is removed we must remove it from the list. This results in searching the per psock list on delete operation. On TCP_CLOSE events we walk the list and remove the psock from all map/entry locations. I don't see any perf implications in this because at most I have a psock in two maps. If a psock were to be in many maps its possibly this might be noticeable on delete but I can't think of a reason to dup a psock in many maps. The sk_callback_lock is used to protect read/writes to the list. This was convenient because in all locations we were taking the lock anyways just after working on the list. Also the lock is per sock so in normal cases we shouldn't see any contention. Suggested-by: Alexei Starovoitov <ast@kernel.org> Fixes: 174a79ff ("bpf: sockmap with sk redirect support") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
John Fastabend authored
In the initial sockmap API we provided strparser and verdict programs using a single attach command by extending the attach API with a the attach_bpf_fd2 field. However, if we add other programs in the future we will be adding a field for every new possible type, attach_bpf_fd(3,4,..). This seems a bit clumsy for an API. So lets push the programs using two new type fields. BPF_SK_SKB_STREAM_PARSER BPF_SK_SKB_STREAM_VERDICT This has the advantage of having a readable name and can easily be extended in the future. Updates to samples and sockmap included here also generalize tests slightly to support upcoming patch for multiple map support. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Fixes: 174a79ff ("bpf: sockmap with sk redirect support") Suggested-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David Wu authored
This patch solves the following error: arch/arm/boot/dts/rk3228-evb.dtb: ERROR (phandle_references): Reference to non-existent node or label "phy0" Fixess db40f15b ("ARM: dts: rk3228-evb: Enable the integrated PHY for gmac") Signed-off-by: David Wu <david.wu@rock-chips.com> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 26 Aug, 2017 12 commits
-
-
Antoine Ténart authored
The MVPP22_XLG_CTRL1_FRAMESIZELIMIT define is used as an offset, but is defined as BIT(0). Updated its name to contains "OFFS" as in offset and fix its value using the offset value, 0. Reported-by: Stefan Chulski <stefanc@marvell.com> Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com> Fixes: 76eb1b1d ("net: mvpp2: set maximum packet size for 10G ports") Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queueDavid S. Miller authored
Jeff Kirsher says: ==================== 40GbE Intel Wired LAN Driver Updates 2017-08-25 This series contains updates to i40e and i40evf only. Mitch adjusts the max packet size to account for two VLAN tags. Sudheer provides a fix to ensure that the watchdog timer is scheduled immediately after admin queue operations are scheduled in i40evf_down(). Fixes an issue by adding locking around the admin queue command and update of state variables so that adminq_subtask will have the accurate information whenever it gets scheduled. Anjali fixes a bug where the PF flag setup should happen before the VMDq RSS queue count is initialized for VMDq VSI to get the right number of queues for RSS in the case of x722 devices. Fixed a problem with the hardware ATR eviction feature where the NVM setting was incorrect. Jake separates the flags into two types, hw_features and flags. The hw_features flags contain a set of features which are enabled at init time and will not contain feature flags that can be toggled. Everything else will remain in the flags variable, and can be modified anytime during run time. We should not be directly copying a cpumask_t, since it is bitmap and might not be copied correctly, so use cpumask_copy() instead. Stefan Assmann makes vf _offload_flags more "generic" by renaming it to vf_cap_flags, which allows other capabilities besides offloading to be added. Alan makes it such that if adaptive-rx/tx is enabled, the user cannot make any manual adjustments to interrupt moderation. Also makes it so that if ITR is disabled by adaptive-rx/tx is then enabled, ITR will be re-enabled. v2: Dropped patches #1 & #8 from the original patch series submission, while Jesse and Jake re-work their patches based on feedback from David Miller. Also removed the duplicate patch 3 that was accidentally sent out twice in the previous submission. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Jakub Kicinski says: ==================== nfp: SR-IOV ndos support This set adds basic SR-IOV including setting/getting VF MAC addresses, VLANs, link state and spoofcheck settings. It is wired up for both vNICs and representors (note: ip link will not report VF settings on VF/PF representors because they are not linked to the PF PCI device). Pablo and team add the basic implementation, Simon and Dirk follow up with the representor plumbing. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Simon Horman authored
Add basic ndo_set/get_vf to support SR-IOV on all types of port representors. Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Pablo Cascón authored
Add basic ndo_set/get_vf to support SR-IOV. VF to egress phy static mapping by now. Use vfcfg ABI version 2 to write the info to the FW and collect the return value from the mailbox. Signed-off-by: Pablo Cascón <pablo.cascon@netronome.com> Signed-off-by: Jimmy Kizito <jimmy.kizito@netronome.com> Signed-off-by: Rami Tomer <rami.tomer@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
syszkaller got a hang in tcp stack, related to a bug in tcp_sendpage_locked() root@syzkaller:~# cat /proc/3059/stack [<ffffffff83de926c>] __lock_sock+0x1dc/0x2f0 [<ffffffff83de9473>] lock_sock_nested+0xf3/0x110 [<ffffffff8408ce01>] tcp_sendmsg+0x21/0x50 [<ffffffff84163b6f>] inet_sendmsg+0x11f/0x5e0 [<ffffffff83dd8eea>] sock_sendmsg+0xca/0x110 [<ffffffff83dd9547>] kernel_sendmsg+0x47/0x60 [<ffffffff83de35dc>] sock_no_sendpage+0x1cc/0x280 [<ffffffff8408916b>] tcp_sendpage_locked+0x10b/0x160 [<ffffffff84089203>] tcp_sendpage+0x43/0x60 [<ffffffff841641da>] inet_sendpage+0x1aa/0x660 [<ffffffff83dd4fcd>] kernel_sendpage+0x8d/0xe0 [<ffffffff83dd50ac>] sock_sendpage+0x8c/0xc0 [<ffffffff81b63300>] pipe_to_sendpage+0x290/0x3b0 [<ffffffff81b67243>] __splice_from_pipe+0x343/0x750 [<ffffffff81b6a459>] splice_from_pipe+0x1e9/0x330 [<ffffffff81b6a5e0>] generic_splice_sendpage+0x40/0x50 [<ffffffff81b6b1d7>] SyS_splice+0x7b7/0x1610 [<ffffffff84d77a01>] entry_SYSCALL_64_fastpath+0x1f/0xbe Fixes: 306b13eb ("proto_ops: Add locked held versions of sendmsg and sendpage") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: Tom Herbert <tom@quantonium.net> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Cong Wang says: ==================== net_sched: clean up tc classes and u32 filter Patch 1 and patch 2 prepare for patch 3. Major changes are in patch 3 and patch 4, details are there too. v2: Add patch 1 and 2, group all into a patchset Fix a coding style issue in patch 4 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
WANG Cong authored
It is ugly to hide a u32-filter-specific pointer inside Qdisc, this breaks the TC layers: 1. Qdisc is a generic representation, should not have any specific data of any type 2. Qdisc layer is above filter layer, should only save filters in the list of struct tcf_proto. This pointer is used as the head of the chain of u32 hash tables, that is struct tc_u_hnode, because u32 filter is very special, it allows to create multiple hash tables within one qdisc and across multiple u32 filters. Instead of using this ugly pointer, we can just save it in a global hash table key'ed by (dev ifindex, qdisc handle), therefore we can still treat it as a per qdisc basis data structure conceptually. Of course, because of network namespaces, this key is not unique at all, but it is fine as we already have a pointer to Qdisc in struct tc_u_common, we can just compare the pointers when collision. And this only affects slow paths, has no impact to fast path, thanks to the pointer ->tp_c. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
WANG Cong authored
For TC classes, their ->get() and ->put() are always paired, and the reference counting is completely useless, because: 1) For class modification and dumping paths, we already hold RTNL lock, so all of these ->get(),->change(),->put() are atomic. 2) For filter bindiing/unbinding, we use other reference counter than this one, and they should have RTNL lock too. 3) For ->qlen_notify(), it is special because it is called on ->enqueue() path, but we already hold qdisc tree lock there, and we hold this tree lock when graft or delete the class too, so it should not be gone or changed until we release the tree lock. Therefore, this patch removes ->get() and ->put(), but: 1) Adds a new ->find() to find the pointer to a class by classid, no refcnt. 2) Move the original class destroy upon the last refcnt into ->delete(), right after releasing tree lock. This is fine because the class is already removed from hash when holding the lock. For those who also use ->put() as ->unbind(), just rename them to reflect this change. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
WANG Cong authored
Like for TC actions, ->delete() is a special case, we have to prepare and fill the notification before delete otherwise would get use-after-free after we remove the reference count. Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
WANG Cong authored
This is not needed if we move them up properly. Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Dan Carpenter authored
The skb_pad() function frees the skb on error, so this code has a double free. Fixes: 00e57a6d ("net-next/hinic: Add Tx operation") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-