1. 20 Dec, 2018 7 commits
  2. 18 Dec, 2018 4 commits
    • Taehee Yoo's avatar
      netfilter: ipt_CLUSTERIP: check MAC address when duplicate config is set · 06aa151a
      Taehee Yoo authored
      If same destination IP address config is already existing, that config is
      just used. MAC address also should be same.
      However, there is no MAC address checking routine.
      So that MAC address checking routine is added.
      
      test commands:
         %iptables -A INPUT -p tcp -i lo -d 192.168.0.5 --dport 80 \
      	   -j CLUSTERIP --new --hashmode sourceip \
      	   --clustermac 01:00:5e:00:00:20 --total-nodes 2 --local-node 1
         %iptables -A INPUT -p tcp -i lo -d 192.168.0.5 --dport 80 \
      	   -j CLUSTERIP --new --hashmode sourceip \
      	   --clustermac 01:00:5e:00:00:21 --total-nodes 2 --local-node 1
      
      After this patch, above commands are disallowed.
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      06aa151a
    • Taehee Yoo's avatar
      netfilter: ipt_CLUSTERIP: fix sleep-in-atomic bug in clusterip_config_entry_put() · 2a61d8b8
      Taehee Yoo authored
      A proc_remove() can sleep. so that it can't be inside of spin_lock.
      Hence proc_remove() is moved to outside of spin_lock. and it also
      adds mutex to sync create and remove of proc entry(config->pde).
      
      test commands:
      SHELL#1
         %while :; do iptables -A INPUT -p udp -i enp2s0 -d 192.168.1.100 \
      	   --dport 9000  -j CLUSTERIP --new --hashmode sourceip \
      	   --clustermac 01:00:5e:00:00:21 --total-nodes 3 --local-node 3; \
      	   iptables -F; done
      
      SHELL#2
         %while :; do echo +1 > /proc/net/ipt_CLUSTERIP/192.168.1.100; \
      	   echo -1 > /proc/net/ipt_CLUSTERIP/192.168.1.100; done
      
      [ 2949.569864] BUG: sleeping function called from invalid context at kernel/sched/completion.c:99
      [ 2949.579944] in_atomic(): 1, irqs_disabled(): 0, pid: 5472, name: iptables
      [ 2949.587920] 1 lock held by iptables/5472:
      [ 2949.592711]  #0: 000000008f0ebcf2 (&(&cn->lock)->rlock){+...}, at: refcount_dec_and_lock+0x24/0x50
      [ 2949.603307] CPU: 1 PID: 5472 Comm: iptables Tainted: G        W         4.19.0-rc5+ #16
      [ 2949.604212] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB, BIOS 5.6.5 07/08/2015
      [ 2949.604212] Call Trace:
      [ 2949.604212]  dump_stack+0xc9/0x16b
      [ 2949.604212]  ? show_regs_print_info+0x5/0x5
      [ 2949.604212]  ___might_sleep+0x2eb/0x420
      [ 2949.604212]  ? set_rq_offline.part.87+0x140/0x140
      [ 2949.604212]  ? _rcu_barrier_trace+0x400/0x400
      [ 2949.604212]  wait_for_completion+0x94/0x710
      [ 2949.604212]  ? wait_for_completion_interruptible+0x780/0x780
      [ 2949.604212]  ? __kernel_text_address+0xe/0x30
      [ 2949.604212]  ? __lockdep_init_map+0x10e/0x5c0
      [ 2949.604212]  ? __lockdep_init_map+0x10e/0x5c0
      [ 2949.604212]  ? __init_waitqueue_head+0x86/0x130
      [ 2949.604212]  ? init_wait_entry+0x1a0/0x1a0
      [ 2949.604212]  proc_entry_rundown+0x208/0x270
      [ 2949.604212]  ? proc_reg_get_unmapped_area+0x370/0x370
      [ 2949.604212]  ? __lock_acquire+0x4500/0x4500
      [ 2949.604212]  ? complete+0x18/0x70
      [ 2949.604212]  remove_proc_subtree+0x143/0x2a0
      [ 2949.708655]  ? remove_proc_entry+0x390/0x390
      [ 2949.708655]  clusterip_tg_destroy+0x27a/0x630 [ipt_CLUSTERIP]
      [ ... ]
      
      Fixes: b3e456fc ("netfilter: ipt_CLUSTERIP: fix a race condition of proc file creation")
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      2a61d8b8
    • Taehee Yoo's avatar
      netfilter: ipt_CLUSTERIP: remove wrong WARN_ON_ONCE in netns exit routine · b12f7bad
      Taehee Yoo authored
      When network namespace is destroyed, both clusterip_tg_destroy() and
      clusterip_net_exit() are called. and clusterip_net_exit() is called
      before clusterip_tg_destroy().
      Hence cleanup check code in clusterip_net_exit() doesn't make sense.
      
      test commands:
         %ip netns add vm1
         %ip netns exec vm1 bash
         %ip link set lo up
         %iptables -A INPUT -p tcp -i lo -d 192.168.0.5 --dport 80 \
      	-j CLUSTERIP --new --hashmode sourceip \
      	--clustermac 01:00:5e:00:00:20 --total-nodes 2 --local-node 1
         %exit
         %ip netns del vm1
      
      splat looks like:
      [  341.184508] WARNING: CPU: 1 PID: 87 at net/ipv4/netfilter/ipt_CLUSTERIP.c:840 clusterip_net_exit+0x319/0x380 [ipt_CLUSTERIP]
      [  341.184850] Modules linked in: ipt_CLUSTERIP nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_tcpudp iptable_filter bpfilter ip_tables x_tables
      [  341.184850] CPU: 1 PID: 87 Comm: kworker/u4:2 Not tainted 4.19.0-rc5+ #16
      [  341.227509] Workqueue: netns cleanup_net
      [  341.227509] RIP: 0010:clusterip_net_exit+0x319/0x380 [ipt_CLUSTERIP]
      [  341.227509] Code: 0f 85 7f fe ff ff 48 c7 c2 80 64 2c c0 be a8 02 00 00 48 c7 c7 a0 63 2c c0 c6 05 18 6e 00 00 01 e8 bc 38 ff f5 e9 5b fe ff ff <0f> 0b e9 33 ff ff ff e8 4b 90 50 f6 e9 2d fe ff ff 48 89 df e8 de
      [  341.227509] RSP: 0018:ffff88011086f408 EFLAGS: 00010202
      [  341.227509] RAX: dffffc0000000000 RBX: 1ffff1002210de85 RCX: 0000000000000000
      [  341.227509] RDX: 1ffff1002210de85 RSI: ffff880110813be8 RDI: ffffed002210de58
      [  341.227509] RBP: ffff88011086f4d0 R08: 0000000000000000 R09: 0000000000000000
      [  341.227509] R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff1002210de81
      [  341.227509] R13: ffff880110625a48 R14: ffff880114cec8c8 R15: 0000000000000014
      [  341.227509] FS:  0000000000000000(0000) GS:ffff880116600000(0000) knlGS:0000000000000000
      [  341.227509] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  341.227509] CR2: 00007f11fd38e000 CR3: 000000013ca16000 CR4: 00000000001006e0
      [  341.227509] Call Trace:
      [  341.227509]  ? __clusterip_config_find+0x460/0x460 [ipt_CLUSTERIP]
      [  341.227509]  ? default_device_exit+0x1ca/0x270
      [  341.227509]  ? remove_proc_entry+0x1cd/0x390
      [  341.227509]  ? dev_change_net_namespace+0xd00/0xd00
      [  341.227509]  ? __init_waitqueue_head+0x130/0x130
      [  341.227509]  ops_exit_list.isra.10+0x94/0x140
      [  341.227509]  cleanup_net+0x45b/0x900
      [ ... ]
      
      Fixes: 613d0776 ("netfilter: exit_net cleanup check added")
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      b12f7bad
    • Taehee Yoo's avatar
      netfilter: ipt_CLUSTERIP: fix deadlock in netns exit routine · 5a86d68b
      Taehee Yoo authored
      When network namespace is destroyed, cleanup_net() is called.
      cleanup_net() holds pernet_ops_rwsem then calls each ->exit callback.
      So that clusterip_tg_destroy() is called by cleanup_net().
      And clusterip_tg_destroy() calls unregister_netdevice_notifier().
      
      But both cleanup_net() and clusterip_tg_destroy() hold same
      lock(pernet_ops_rwsem). hence deadlock occurrs.
      
      After this patch, only 1 notifier is registered when module is inserted.
      And all of configs are added to per-net list.
      
      test commands:
         %ip netns add vm1
         %ip netns exec vm1 bash
         %ip link set lo up
         %iptables -A INPUT -p tcp -i lo -d 192.168.0.5 --dport 80 \
      	-j CLUSTERIP --new --hashmode sourceip \
      	--clustermac 01:00:5e:00:00:20 --total-nodes 2 --local-node 1
         %exit
         %ip netns del vm1
      
      splat looks like:
      [  341.809674] ============================================
      [  341.809674] WARNING: possible recursive locking detected
      [  341.809674] 4.19.0-rc5+ #16 Tainted: G        W
      [  341.809674] --------------------------------------------
      [  341.809674] kworker/u4:2/87 is trying to acquire lock:
      [  341.809674] 000000005da2d519 (pernet_ops_rwsem){++++}, at: unregister_netdevice_notifier+0x8c/0x460
      [  341.809674]
      [  341.809674] but task is already holding lock:
      [  341.809674] 000000005da2d519 (pernet_ops_rwsem){++++}, at: cleanup_net+0x119/0x900
      [  341.809674]
      [  341.809674] other info that might help us debug this:
      [  341.809674]  Possible unsafe locking scenario:
      [  341.809674]
      [  341.809674]        CPU0
      [  341.809674]        ----
      [  341.809674]   lock(pernet_ops_rwsem);
      [  341.809674]   lock(pernet_ops_rwsem);
      [  341.809674]
      [  341.809674]  *** DEADLOCK ***
      [  341.809674]
      [  341.809674]  May be due to missing lock nesting notation
      [  341.809674]
      [  341.809674] 3 locks held by kworker/u4:2/87:
      [  341.809674]  #0: 00000000d9df6c92 ((wq_completion)"%s""netns"){+.+.}, at: process_one_work+0xafe/0x1de0
      [  341.809674]  #1: 00000000c2cbcee2 (net_cleanup_work){+.+.}, at: process_one_work+0xb60/0x1de0
      [  341.809674]  #2: 000000005da2d519 (pernet_ops_rwsem){++++}, at: cleanup_net+0x119/0x900
      [  341.809674]
      [  341.809674] stack backtrace:
      [  341.809674] CPU: 1 PID: 87 Comm: kworker/u4:2 Tainted: G        W         4.19.0-rc5+ #16
      [  341.809674] Workqueue: netns cleanup_net
      [  341.809674] Call Trace:
      [ ... ]
      [  342.070196]  down_write+0x93/0x160
      [  342.070196]  ? unregister_netdevice_notifier+0x8c/0x460
      [  342.070196]  ? down_read+0x1e0/0x1e0
      [  342.070196]  ? sched_clock_cpu+0x126/0x170
      [  342.070196]  ? find_held_lock+0x39/0x1c0
      [  342.070196]  unregister_netdevice_notifier+0x8c/0x460
      [  342.070196]  ? register_netdevice_notifier+0x790/0x790
      [  342.070196]  ? __local_bh_enable_ip+0xe9/0x1b0
      [  342.070196]  ? __local_bh_enable_ip+0xe9/0x1b0
      [  342.070196]  ? clusterip_tg_destroy+0x372/0x650 [ipt_CLUSTERIP]
      [  342.070196]  ? trace_hardirqs_on+0x93/0x210
      [  342.070196]  ? __bpf_trace_preemptirq_template+0x10/0x10
      [  342.070196]  ? clusterip_tg_destroy+0x372/0x650 [ipt_CLUSTERIP]
      [  342.123094]  clusterip_tg_destroy+0x3ad/0x650 [ipt_CLUSTERIP]
      [  342.123094]  ? clusterip_net_init+0x3d0/0x3d0 [ipt_CLUSTERIP]
      [  342.123094]  ? cleanup_match+0x17d/0x200 [ip_tables]
      [  342.123094]  ? xt_unregister_table+0x215/0x300 [x_tables]
      [  342.123094]  ? kfree+0xe2/0x2a0
      [  342.123094]  cleanup_entry+0x1d5/0x2f0 [ip_tables]
      [  342.123094]  ? cleanup_match+0x200/0x200 [ip_tables]
      [  342.123094]  __ipt_unregister_table+0x9b/0x1a0 [ip_tables]
      [  342.123094]  iptable_filter_net_exit+0x43/0x80 [iptable_filter]
      [  342.123094]  ops_exit_list.isra.10+0x94/0x140
      [  342.123094]  cleanup_net+0x45b/0x900
      [ ... ]
      
      Fixes: 202f59af ("netfilter: ipt_CLUSTERIP: do not hold dev")
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      5a86d68b
  3. 17 Dec, 2018 13 commits
  4. 13 Dec, 2018 2 commits
  5. 01 Dec, 2018 3 commits
    • Florian Westphal's avatar
      netfilter: nat: remove l4 protocol port rovers · 6ed5943f
      Florian Westphal authored
      This is a leftover from days where single-cpu systems were common:
      Store last port used to resolve a clash to use it as a starting point when
      the next conflict needs to be resolved.
      
      When we have parallel attempt to connect to same address:port pair,
      its likely that both cores end up computing the same "available" port,
      as both use same starting port, and newly used ports won't become
      visible to other cores until the conntrack gets confirmed later.
      
      One of the cores then has to drop the packet at insertion time because
      the chosen new tuple turns out to be in use after all.
      
      Lets simplify this: remove port rover and use a pseudo-random starting
      point.
      
      Note that this doesn't make netfilter default to 'fully random' mode;
      the 'rover' was only used if NAT could not reuse source port as-is.
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      6ed5943f
    • Pablo Neira Ayuso's avatar
      netfilter: remove NFC_* cache bits · c3e93059
      Pablo Neira Ayuso authored
      These are very very (for long time unused) caching infrastructure
      definition, remove then. They have nothing to do with the NFC subsystem.
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      c3e93059
    • Paul E. McKenney's avatar
      netfilter: Replace call_rcu_bh(), rcu_barrier_bh(), and synchronize_rcu_bh() · c8d1da40
      Paul E. McKenney authored
      Now that call_rcu()'s callback is not invoked until after bh-disable
      regions of code have completed (in addition to explicitly marked
      RCU read-side critical sections), call_rcu() can be used in place
      of call_rcu_bh().  Similarly, rcu_barrier() can be used in place of
      rcu_barrier_bh() and synchronize_rcu() in place of synchronize_rcu_bh().
      This commit therefore makes these changes.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.ibm.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      c8d1da40
  6. 12 Nov, 2018 6 commits
  7. 11 Nov, 2018 5 commits
    • Linus Torvalds's avatar
      Linux 4.20-rc2 · ccda4af0
      Linus Torvalds authored
      ccda4af0
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 7a3765ed
      Linus Torvalds authored
      Pull networking fixes from David Miller:
       "One last pull request before heading to Vancouver for LPC, here we have:
      
         1) Don't forget to free VSI contexts during ice driver unload, from
            Victor Raj.
      
         2) Don't forget napi delete calls during device remove in ice driver,
            from Dave Ertman.
      
         3) Don't request VLAN tag insertion of ibmvnic device when SKB
            doesn't have VLAN tags at all.
      
         4) IPV4 frag handling code has to accomodate the situation where two
            threads try to insert the same fragment into the hash table at the
            same time. From Eric Dumazet.
      
         5) Relatedly, don't flow separate on protocol ports for fragmented
            frames, also from Eric Dumazet.
      
         6) Memory leaks in qed driver, from Denis Bolotin.
      
         7) Correct valid MTU range in smsc95xx driver, from Stefan Wahren.
      
         8) Validate cls_flower nested policies properly, from Jakub Kicinski.
      
         9) Clearing of stats counters in mc88e6xxx driver doesn't retain
            important bits in the G1_STATS_OP register causing the chip to
            hang. Fix from Andrew Lunn"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (41 commits)
        act_mirred: clear skb->tstamp on redirect
        net: dsa: mv88e6xxx: Fix clearing of stats counters
        tipc: fix link re-establish failure
        net: sched: cls_flower: validate nested enc_opts_policy to avoid warning
        net: mvneta: correct typo
        flow_dissector: do not dissect l4 ports for fragments
        net: qualcomm: rmnet: Fix incorrect assignment of real_dev
        net: aquantia: allow rx checksum offload configuration
        net: aquantia: invalid checksumm offload implementation
        net: aquantia: fixed enable unicast on 32 macvlan
        net: aquantia: fix potential IOMMU fault after driver unbind
        net: aquantia: synchronized flow control between mac/phy
        net: smsc95xx: Fix MTU range
        net: stmmac: Fix RX packet size > 8191
        qed: Fix potential memory corruption
        qed: Fix SPQ entries not returned to pool in error flows
        qed: Fix blocking/unlimited SPQ entries leak
        qed: Fix memory/entry leak in qed_init_sp_request()
        inet: frags: better deal with smp races
        net: hns3: bugfix for not checking return value
        ...
      7a3765ed
    • Linus Torvalds's avatar
      Merge tag 'kbuild-fixes-v4.20' of... · e12e00e3
      Linus Torvalds authored
      Merge tag 'kbuild-fixes-v4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
      
      Pull Kbuild fixes from Masahiro Yamada:
      
       - fix build errors in binrpm-pkg and bindeb-pkg targets
      
       - fix false positive matches in merge_config.sh
      
       - fix build version mismatch in deb-pkg target
      
       - fix dtbs_install handling in (bin)deb-pkg target
      
       - revert a commit that allows setlocalversion to write to source tree
      
      * tag 'kbuild-fixes-v4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
        builddeb: Fix inclusion of dtbs in debian package
        Revert "scripts/setlocalversion: git: Make -dirty check more robust"
        kbuild: deb-pkg: fix too low build version number
        kconfig: merge_config: avoid false positive matches from comment lines
        kbuild: deb-pkg: fix bindeb-pkg breakage when O= is used
        kbuild: rpm-pkg: fix binrpm-pkg breakage when O= is used
      e12e00e3
    • Linus Torvalds's avatar
      Merge tag 'for-4.20-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 63a42e1a
      Linus Torvalds authored
      Pull btrfs fixes from David Sterba:
       "Several fixes to recent release (4.19, fixes tagged for stable) and
        other fixes"
      
      * tag 'for-4.20-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        Btrfs: fix missing delayed iputs on unmount
        Btrfs: fix data corruption due to cloning of eof block
        Btrfs: fix infinite loop on inode eviction after deduplication of eof block
        Btrfs: fix deadlock on tree root leaf when finding free extent
        btrfs: avoid link error with CONFIG_NO_AUTO_INLINE
        btrfs: tree-checker: Fix misleading group system information
        Btrfs: fix missing data checksums after a ranged fsync (msync)
        btrfs: fix pinned underflow after transaction aborted
        Btrfs: fix cur_offset in the error case for nocow
      63a42e1a
    • Linus Torvalds's avatar
      Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · c140f8b0
      Linus Torvalds authored
      Pull ext4 fixes from Ted Ts'o:
       "A large number of ext4 bug fixes, mostly buffer and memory leaks on
        error return cleanup paths"
      
      * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
        ext4: missing !bh check in ext4_xattr_inode_write()
        ext4: fix buffer leak in __ext4_read_dirblock() on error path
        ext4: fix buffer leak in ext4_expand_extra_isize_ea() on error path
        ext4: fix buffer leak in ext4_xattr_move_to_block() on error path
        ext4: release bs.bh before re-using in ext4_xattr_block_find()
        ext4: fix buffer leak in ext4_xattr_get_block() on error path
        ext4: fix possible leak of s_journal_flag_rwsem in error path
        ext4: fix possible leak of sbi->s_group_desc_leak in error path
        ext4: remove unneeded brelse call in ext4_xattr_inode_update_ref()
        ext4: avoid possible double brelse() in add_new_gdb() on error path
        ext4: avoid buffer leak in ext4_orphan_add() after prior errors
        ext4: avoid buffer leak on shutdown in ext4_mark_iloc_dirty()
        ext4: fix possible inode leak in the retry loop of ext4_resize_fs()
        ext4: fix missing cleanup if ext4_alloc_flex_bg_array() fails while resizing
        ext4: add missing brelse() update_backups()'s error path
        ext4: add missing brelse() add_new_gdb_meta_bg()'s error path
        ext4: add missing brelse() in set_flexbg_block_bitmap()'s error path
        ext4: avoid potential extra brelse in setup_new_flex_group_blocks()
      c140f8b0