- 12 Jun, 2015 12 commits
-
-
Florian Westphal authored
We store the rule blob per (possible) cpu. Unfortunately this means we can waste lot of memory on big smp machines. ipt_entry structure ('rule head') is 112 byte, so e.g. with maxcpu=64 one single rule eats close to 8k RAM. Since previous patch made counters percpu it appears there is nothing left in the rule blob that needs to be percpu. On my test system (144 possible cpus, 400k dummy rules) this change saves close to 9 Gigabyte of RAM. Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Florian Westphal authored
The binary arp/ip/ip6tables ruleset is stored per cpu. The only reason left as to why we need percpu duplication are the rule counters embedded into ipt_entry et al -- since each cpu has its own copy of the rules, all counters can be lockless. The downside is that the more cpus are supported, the more memory is required. Rules are not just duplicated per online cpu but for each possible cpu, i.e. if maxcpu is 144, then rule is duplicated 144 times, not for the e.g. 64 cores present. To save some memory and also improve utilization of shared caches it would be preferable to only store the rule blob once. So we first need to separate counters and the rule blob. Instead of using entry->counters, allocate this percpu and store the percpu address in entry->counters.pcnt on CONFIG_SMP. This change makes no sense as-is; it is merely an intermediate step to remove the percpu duplication of the rule set in a followup patch. Suggested-by: Eric Dumazet <edumazet@google.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Florian Westphal authored
If bridge netfilter is used with both bridge-nf-call-iptables and bridge-nf-filter-vlan-tagged enabled then ip fragments in VLAN frames are sent without the vlan header. This has never worked reliably. Turns out this relied on pre-3.5 behaviour where skb frag_list was used to store ip fragments; ip_fragment() then re-used these skbs. But since commit 3cc49492 ("ipv4: use skb coalescing in defragmentation") this is no longer the case. ip_do_fragment now needs to allocate new skbs, but these don't contain the vlan tag information anymore. Fix it by storing vlan information of the ressembled skb in the br netfilter percpu frag area, and restore them for each of the fragments. Fixes: 3cc49492 ("ipv4: use skb coalescing in defragmentation") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Florian Westphal authored
since commit d6b915e2 ("ip_fragment: don't forward defragmented DF packet") the largest fragment size is available in the IPCB. Therefore we no longer need to care about 'encapsulation' overhead of stripped PPPOE/VLAN headers since ip_do_fragment doesn't use device mtu in such cases. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Bernhard Thaler authored
IPv6 fragmented packets are not forwarded on an ethernet bridge with netfilter ip6_tables loaded. e.g. steps to reproduce 1) create a simple bridge like this modprobe br_netfilter brctl addbr br0 brctl addif br0 eth0 brctl addif br0 eth2 ifconfig eth0 up ifconfig eth2 up ifconfig br0 up 2) place a host with an IPv6 address on each side of the bridge set IPv6 address on host A: ip -6 addr add fd01:2345:6789:1::1/64 dev eth0 set IPv6 address on host B: ip -6 addr add fd01:2345:6789:1::2/64 dev eth0 3) run a simple ping command on host A with packets > MTU ping6 -s 4000 fd01:2345:6789:1::2 4) wait some time and run e.g. "ip6tables -t nat -nvL" on the bridge IPv6 fragmented packets traverse the bridge cleanly until somebody runs. "ip6tables -t nat -nvL". As soon as it is run (and netfilter modules are loaded) IPv6 fragmented packets do not traverse the bridge any more (you see no more responses in ping's output). After applying this patch IPv6 fragmented packets traverse the bridge cleanly in above scenario. Signed-off-by: Bernhard Thaler <bernhard.thaler@wvnet.at> [pablo@netfilter.org: small changes to br_nf_dev_queue_xmit] Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Bernhard Thaler authored
Prepare check_hbh_len() to be called from newly introduced br_validate_ipv6() in next commit. Signed-off-by: Bernhard Thaler <bernhard.thaler@wvnet.at> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Bernhard Thaler authored
br_parse_ip_options() does not parse any IP options, it validates IP packets as a whole and the function name is misleading. Rename br_parse_ip_options() to br_validate_ipv4() and remove unneeded commments. Signed-off-by: Bernhard Thaler <bernhard.thaler@wvnet.at> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Bernhard Thaler authored
Currently frag_max_size is member of br_input_skb_cb and copied back and forth using IPCB(skb) and BR_INPUT_SKB_CB(skb) each time it is changed or used. Attach frag_max_size to nf_bridge_info and set value in pre_routing and forward functions. Use its value in forward and xmit functions. Signed-off-by: Bernhard Thaler <bernhard.thaler@wvnet.at> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Bernhard Thaler authored
IPv4 iptables allows to REDIRECT/DNAT/SNAT any traffic over a bridge. e.g. REDIRECT $ sysctl -w net.bridge.bridge-nf-call-iptables=1 $ iptables -t nat -A PREROUTING -p tcp -m tcp --dport 8080 \ -j REDIRECT --to-ports 81 This does not work with ip6tables on a bridge in NAT66 scenario because the REDIRECT/DNAT/SNAT is not correctly detected. The bridge pre-routing (finish) netfilter hook has to check for a possible redirect and then fix the destination mac address. This allows to use the ip6tables rules for local REDIRECT/DNAT/SNAT REDIRECT similar to the IPv4 iptables version. e.g. REDIRECT $ sysctl -w net.bridge.bridge-nf-call-ip6tables=1 $ ip6tables -t nat -A PREROUTING -p tcp -m tcp --dport 8080 \ -j REDIRECT --to-ports 81 This patch makes it possible to use IPv6 NAT66 on a bridge. It was tested on a bridge with two interfaces using SNAT/DNAT NAT66 rules. Reported-by: Artie Hamilton <artiemhamilton@yahoo.com> Signed-off-by: Sven Eckelmann <sven@open-mesh.com> [bernhard.thaler@wvnet.at: rebased, add indirect call to ip6_route_input()] [bernhard.thaler@wvnet.at: rebased, split into separate patches] Signed-off-by: Bernhard Thaler <bernhard.thaler@wvnet.at> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Bernhard Thaler authored
Put br_nf_pre_routing_finish_ipv6() after daddr_was_changed() and br_nf_pre_routing_finish_bridge() to prepare calling these functions from there. Signed-off-by: Bernhard Thaler <bernhard.thaler@wvnet.at> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Bernhard Thaler authored
use binary AND on complement of BRNF_NF_BRIDGE_PREROUTING to unset bit in nf_bridge->mask. Signed-off-by: Bernhard Thaler <bernhard.thaler@wvnet.at> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Marcelo Ricardo Leitner authored
After db29a950 ("netfilter: conntrack: disable generic tracking for known protocols"), if the specific helper is built but not loaded (a standard for most distributions) systems with a restrictive firewall but weak configuration regarding netfilter modules to load, will silently stop working. This patch then puts a warning message so the sysadmin knows where to start looking into. It's a pr_warn_once regardless of protocol itself but it should be enough to give a hint on where to look. Cc: Florian Westphal <fw@strlen.de> Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
- 11 Jun, 2015 28 commits
-
-
David S. Miller authored
Eric Dumazet says: ==================== tcp: defer shinfo->gso_size|type settings We put shinfo->gso_segs in TCP_SKB_CB(skb) a while back for performance reasons. This was in commit cd7d8498 ("tcp: change tcp_skb_pcount() location") This patch series complete the job for gso_size and gso_type, so that we do not bring 2 extra cache lines in tcp write xmit fast path, and making tcp_init_tso_segs() simpler and faster. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
We had various issues in the past when TCP stack was modifying gso_size/gso_segs while clones were in flight. Commit c52e2421 ("tcp: must unclone packets before mangling them") fixed these bugs and added a WARN_ON_ONCE(skb_cloned(skb)); in tcp_set_skb_tso_segs() These bugs are now fixed, and because TCP stack now only sets shinfo->gso_size|segs on the clone itself, the check can be removed. As a result of this change, compiler inlines tcp_set_skb_tso_segs() in tcp_init_tso_segs() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
In commit cd7d8498 ("tcp: change tcp_skb_pcount() location") we stored gso_segs in a temporary cache hot location. This patch does the same for gso_size. This allows to save 2 cache line misses in tcp xmit path for the last packet that is considered but not sent because of various conditions (cwnd, tso defer, receiver window, TSQ...) Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
tcp_set_skb_tso_segs() & tcp_init_tso_segs() no longer use the sock pointer. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Our goal is to touch skb_shinfo(skb) only when absolutely needed, to avoid two cache line misses in TCP output path for last skb that is considered but not sent because of various conditions (cwnd, tso defer, receiver window, TSQ...) A packet is GSO only when skb_shinfo(skb)->gso_size is not zero. We can set skb_shinfo(skb)->gso_type to sk->sk_gso_type even for non GSO packets. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
tcp_gso_segment() and tcp_gro_receive() are not strictly part of TCP stack. They should not assume tcp_skb_mss(skb) is in fact skb_shinfo(skb)->gso_size. This will allow us to change tcp_skb_mss() in following patches. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Scott Feldman authored
Fix a BUG_ON() where CONFIG_NET_SWITCHDEV is set but the driver for a bridged port does not support switchdev_port_attr_set op. Don't BUG_ON() if -EOPNOTSUPP is returned. Also change BUG_ON() to netdev_err since this is a normal error path and does not warrant the use of BUG_ON(), which is reserved for unrecoverable errs. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Reported-by: Brenden Blanco <bblanco@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Ivan Vecera says: ==================== bna: clean-up The patches clean the bna driver. v2: changes & comments requested by Joe ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ivan Vecera authored
...and remove some of them. It is not necessary to log when .probe() and .remove() are called or when TxQ is started or stopped. Also log level of some of them was changed to more appropriate one (link up/down, firmware loading failure. Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ivan Vecera authored
Timeout functions are defined with 'void *' ptr argument. They should be defined directly with 'struct bfa_ioc *' type to avoid type conversions. Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ivan Vecera authored
Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ivan Vecera authored
Remove macros for manipulation with struct list_head and replace them with standard ones. Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ivan Vecera authored
Pointer cmpl used to iterate through completion entries is updated at the beginning of while loop as well as at the end. The update at the end of the loop is useless. Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ivan Vecera authored
Patch converts kzalloc->copy_from_user sequence to memdup_user. There is also one useless assignment of NULL to bnad->regdata as it is followed by assignment of kzalloc output. Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ivan Vecera authored
Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ivan Vecera authored
TX_E_PRIO_CHANGE event is never sent for bna_tx so it doesn't need to be handled. After this change bna_tx->flags cannot contain BNA_TX_F_PRIO_CHANGED flag and it can be also eliminated. Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ivan Vecera authored
The bna_rx_config struct member paused can be removed as it is never written and as it cannot have non-zero value the bna_rxf struct member flags also cannot have BNA_RXF_F_PAUSED value and is always zero. So the flags member can be removed as well as bna_rxf_flags enum and the code-paths that needs to have non-zero bna_rxf->flags. This clean-up makes bna_rxf_sm_paused state unsed and can be also removed. Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ivan Vecera authored
RXF_E_PAUSE & RXF_E_RESUME events are never sent for bna_rxf object so they needn't to be handled. The bna_rxf's state bna_rxf_sm_fltr_clr_wait and function bna_rxf_fltr_clear are unused after this so remove them also. Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ivan Vecera authored
Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ivan Vecera authored
Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ivan Vecera authored
Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ivan Vecera authored
removed: bna_rx_ucast_add bna_rx_ucast_del simplified: bna_enet_pause_config bna_rx_mcast_delall bna_rx_mcast_listset bna_rx_mode_set bna_rx_ucast_listset bna_rx_ucast_set Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ivan Vecera authored
Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ivan Vecera authored
replaced macros: BNA_MAC_IS_EQUAL -> ether_addr_equal BNA_POWER_OF_2 -> is_power_of_2 BNA_TO_POWER_OF_2_HIGH -> roundup_pow_of_two removed unused macros: bfa_fsm_get_state bfa_ioc_clr_stats bfa_ioc_fetch_stats bfa_ioc_get_alt_ioc_fwstate bfa_ioc_isr_mode_set bfa_ioc_maxfrsize bfa_ioc_mbox_cmd_pending bfa_ioc_ownership_reset bfa_ioc_rx_bbcredit bfa_ioc_state_disabled bfa_sm_cmp_state bfa_sm_get_state bfa_sm_send_event bfa_sm_set_state bfa_sm_state_decl BFA_STRING_32 BFI_ADAPTER_IS_{PROTO,TTV,UNSUPP) BFI_IOC_ENDIAN_SIG BNA_{C,RX,TX}Q_PAGE_INDEX_MAX BNA_{C,RX,TX}Q_PAGE_INDEX_MAX_SHIFT BNA_{C,RX,TX}Q_QPGE_PTR_GET BNA_IOC_TIMER_FREQ BNA_MESSAGE_SIZE BNA_QE_INDX_2_PTR BNA_QE_INDX_RANGE BNA_Q_GET_{C,P}I BNA_Q_{C,P}I_ADD BNA_Q_FREE_COUNT BNA_Q_IN_USE_COUNT BNA_TO_POWER_OF_2 containing_rec Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ivan Vecera authored
Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ivan Vecera authored
The patch converts mac_t type to widely used 'u8 [ETH_ALEN]'. Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ivan Vecera authored
Parameters of all ether_addr_copy instances were checked for proper alignment. Alignment of bnad_bcast_addr is forced to 2 as the implicit alignment is 1. I have also renamed address parameter of bnad_set_mac_address() to addr. The name mac_addr was a little bit confusing as the real parameter is struct sockaddr *. v2: added __aligned directive to bnad_bcast_addr, renamed parameter of bnad_set_mac_address() (thx joe@perches.com) Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Or Gerlitz says: ==================== mlx5 Ethernet driver update - Jun 11 2015 This series from Saeed, Achiad and Gal contains few fixes to the recently introduced mlx5 Ethernet functionality. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-