1. 24 Oct, 2019 11 commits
    • vxlan: add adjacent link to limit depth level · 0ce1822c
      Taehee Yoo authored
      The current vxlan code doesn't limit the number of nested devices.
      Nested devices are handled recursively, and this routine needs a large
      amount of stack memory, so an unlimited number of nested devices can
      cause a stack overflow.
      
      In order to fix this issue, this patch adds adjacent links.
      The adjacent link APIs internally check the depth level.
      
      Test commands:
          ip link add dummy0 type dummy
          ip link add vxlan0 type vxlan id 0 group 239.1.1.1 dev dummy0 \
      	    dstport 4789
          for i in {1..100}
          do
      	    let A=$i-1
      	    ip link add vxlan$i type vxlan id $i group 239.1.1.1 \
      		    dev vxlan$A dstport 4789
          done
          ip link del dummy0
      
      The topmost upper link is vxlan100 and the lowest link is vxlan0.
      When vxlan0 is deleted, the upper devices are deleted recursively.
      This needs a large amount of stack memory, so it causes a stack overflow.
      
      Splat looks like:
      [  229.628477] =============================================================================
      [  229.629785] BUG page->ptl (Not tainted): Padding overwritten. 0x0000000026abf214-0x0000000091f6abb2
      [  229.629785] -----------------------------------------------------------------------------
      [  229.629785]
      [  229.655439] ==================================================================
      [  229.629785] INFO: Slab 0x00000000ff7cfda8 objects=19 used=19 fp=0x00000000fe33776c flags=0x200000000010200
      [  229.655688] BUG: KASAN: stack-out-of-bounds in unmap_single_vma+0x25a/0x2e0
      [  229.655688] Read of size 8 at addr ffff888113076928 by task vlan-network-in/2334
      [  229.655688]
      [  229.629785] Padding 0000000026abf214: 00 80 14 0d 81 88 ff ff 68 91 81 14 81 88 ff ff  ........h.......
      [  229.629785] Padding 0000000001e24790: 38 91 81 14 81 88 ff ff 68 91 81 14 81 88 ff ff  8.......h.......
      [  229.629785] Padding 00000000b39397c8: 33 30 62 a7 ff ff ff ff ff eb 60 22 10 f1 ff 1f  30b.......`"....
      [  229.629785] Padding 00000000bc98f53a: 80 60 07 13 81 88 ff ff 00 80 14 0d 81 88 ff ff  .`..............
      [  229.629785] Padding 000000002aa8123d: 68 91 81 14 81 88 ff ff f7 21 17 a7 ff ff ff ff  h........!......
      [  229.629785] Padding 000000001c8c2369: 08 81 14 0d 81 88 ff ff 03 02 00 00 00 00 00 00  ................
      [  229.629785] Padding 000000004e290c5d: 21 90 a2 21 10 ed ff ff 00 00 00 00 00 fc ff df  !..!............
      [  229.629785] Padding 000000000e25d731: 18 60 07 13 81 88 ff ff c0 8b 13 05 81 88 ff ff  .`..............
      [  229.629785] Padding 000000007adc7ab3: b3 8a b5 41 00 00 00 00                          ...A....
      [  229.629785] FIX page->ptl: Restoring 0x0000000026abf214-0x0000000091f6abb2=0x5a
      [  ... ]
      
      Fixes: acaf4e70 ("net: vxlan: when lower dev unregisters remove vxlan dev as well")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: core: add ignore flag to netdev_adjacent structure · 32b6d34f
      Taehee Yoo authored
      netdev_upper_dev_link() links an adjacent node and
      netdev_upper_dev_unlink() unlinks one.
      The unlink operation cannot fail, but the link operation can.
      
      To exchange adjacent nodes, we must first unlink the old adjacent
      node and then link the new one.
      If linking the new node fails, we must link the old adjacent node again,
      but that link operation can fail too,
      which eventually breaks the adjacent link relationship.
      
      This patch adds an ignore flag into the netdev_adjacent structure.
      If this flag is set, netdev_upper_dev_link() ignores an old adjacent
      node for a moment.
      
      This patch also adds new functions for other modules.
      netdev_adjacent_change_prepare()
      netdev_adjacent_change_commit()
      netdev_adjacent_change_abort()
      
      netdev_adjacent_change_prepare() inserts the new device into the adjacent
      list, but the new device cannot be used immediately.
      If netdev_adjacent_change_prepare() fails, it rolls back the adjacent
      list internally, so no other action is needed.
      netdev_adjacent_change_commit() deletes the old device from the adjacent
      list and allows the new device to be used.
      netdev_adjacent_change_abort() rolls back the adjacent list.
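
      The prepare/commit/abort sequence can be pictured with a small userspace
      model. This is only a sketch: the model_* names, the fixed-size list and
      the "pending" field are assumptions made for illustration and do not
      mirror the kernel's netdev_adjacent structure or the real functions.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_ADJ 4

/* Hypothetical stand-in for one adjacency-list entry; not the kernel struct. */
struct model_adj {
	int dev_id;    /* 0 marks a free slot */
	bool pending;  /* linked, but not yet usable */
};

struct model_adj_list {
	struct model_adj slot[MODEL_MAX_ADJ];
};

static struct model_adj *model_find(struct model_adj_list *l, int dev_id)
{
	for (size_t i = 0; i < MODEL_MAX_ADJ; i++)
		if (l->slot[i].dev_id == dev_id)
			return &l->slot[i];
	return NULL;
}

/* What a user of the list sees: pending entries are hidden. */
static bool model_usable(struct model_adj_list *l, int dev_id)
{
	struct model_adj *e = model_find(l, dev_id);

	return e != NULL && !e->pending;
}

/* Prepare: insert the new device as pending. On failure nothing changed. */
static int model_change_prepare(struct model_adj_list *l, int new_id)
{
	struct model_adj *free_slot = model_find(l, 0);

	if (free_slot == NULL)
		return -1;
	free_slot->dev_id = new_id;
	free_slot->pending = true;
	return 0;
}

/* Commit: drop the old device and make the new one usable. */
static void model_change_commit(struct model_adj_list *l, int old_id, int new_id)
{
	struct model_adj *old = model_find(l, old_id);
	struct model_adj *fresh = model_find(l, new_id);

	if (old != NULL) {
		old->dev_id = 0;
		old->pending = false;
	}
	if (fresh != NULL)
		fresh->pending = false;
}

/* Abort: remove the pending device; the old one was never touched. */
static void model_change_abort(struct model_adj_list *l, int new_id)
{
	struct model_adj *fresh = model_find(l, new_id);

	if (fresh != NULL) {
		fresh->dev_id = 0;
		fresh->pending = false;
	}
}
```

      In this model the old device stays usable until commit, so a failure
      between prepare and commit can always be undone without a second link
      operation that might itself fail.
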
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • macsec: fix refcnt leak in module exit routine · 2bce1ebe
      Taehee Yoo authored
      When a macsec interface is created, it takes a reference (refcnt) on its
      lower device (the real device). When the macsec interface is deleted, the
      refcnt is dropped in macsec_free_netdev(), which is the ->priv_destructor()
      of the macsec interface.
      
      The problem scenario is as follows:
      when nested macsec interfaces are being removed, the exit routine of the
      macsec module leaks refcnts.
      
      Test commands:
          ip link add dummy0 type dummy
          ip link add macsec0 link dummy0 type macsec
          ip link add macsec1 link macsec0 type macsec
          modprobe -rv macsec
      
      [  208.629433] unregister_netdevice: waiting for macsec0 to become free. Usage count = 1
      
      The steps of the macsec module's exit routine are:
      1. Call ->dellink() in __rtnl_link_unregister().
      2. Check the refcnt in netdev_run_todo() and, if it is not 0, wait for
      it to become 0.
      3. Call ->priv_destructor() in netdev_run_todo().
      
      Step 2 checks the refcnt, but only step 3 decreases it,
      so step 2 waits forever.
      
      With this patch, the macsec module no longer holds its own refcnt on the
      lower device, because it already holds one through
      netdev_upper_dev_link().
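
      The ordering problem can be reduced to a few lines of userspace C. This
      is a simplified model, not kernel code: the model_* names and the two
      policies are assumptions for illustration, and the real teardown involves
      far more state than a single counter.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the lower device (macsec0 in the test above). */
struct model_dev {
	int refcnt;
};

enum model_hold_policy {
	MODEL_HOLD_UNTIL_DESTRUCTOR, /* extra reference dropped only in step 3 */
	MODEL_HOLD_VIA_LINK_ONLY     /* only the adjacent-link reference exists */
};

/*
 * Returns true if the step-2 check (refcnt == 0) on the lower device
 * can succeed, i.e. the teardown does not wait forever.
 */
static bool model_teardown_completes(enum model_hold_policy policy)
{
	struct model_dev lower = { .refcnt = 0 };

	/* Creating the upper device links it to the lower device. */
	lower.refcnt++;
	if (policy == MODEL_HOLD_UNTIL_DESTRUCTOR)
		lower.refcnt++; /* the driver's own additional reference */

	/* Step 1: ->dellink() removes the adjacent link. */
	lower.refcnt--;

	/*
	 * Step 2: the check runs before the upper device's destructor
	 * (step 3), so a reference that only step 3 would drop is still
	 * counted here.
	 */
	return lower.refcnt == 0;
}
```

      With MODEL_HOLD_UNTIL_DESTRUCTOR the counter is still 1 at the check,
      which corresponds to the "Usage count = 1" message above.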
      
      Fixes: c09440f7 ("macsec: introduce IEEE 802.1AE driver")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • team: fix nested locking lockdep warning · 369f61be
      Taehee Yoo authored
      Team interfaces can be nested, so their lock variables can be nested too.
      But this lock uses a static lockdep key, and there is no nested-locking
      handling code such as mutex_lock_nested().
      As a result, lockdep warns about a circular locking scenario that
      cannot actually happen.
      To fix this, this patch makes the team module use a dynamic lock key
      instead of a static key.
      
      Test commands:
          ip link add team0 type team
          ip link add team1 type team
          ip link set team0 master team1
          ip link set team0 nomaster
          ip link set team1 master team0
          ip link set team1 nomaster
      
      Splat that looks like:
      [   40.364352] WARNING: possible recursive locking detected
      [   40.364964] 5.4.0-rc3+ #96 Not tainted
      [   40.365405] --------------------------------------------
      [   40.365973] ip/750 is trying to acquire lock:
      [   40.366542] ffff888060b34c40 (&team->lock){+.+.}, at: team_set_mac_address+0x151/0x290 [team]
      [   40.367689]
      	       but task is already holding lock:
      [   40.368729] ffff888051201c40 (&team->lock){+.+.}, at: team_del_slave+0x29/0x60 [team]
      [   40.370280]
      	       other info that might help us debug this:
      [   40.371159]  Possible unsafe locking scenario:
      
      [   40.371942]        CPU0
      [   40.372338]        ----
      [   40.372673]   lock(&team->lock);
      [   40.373115]   lock(&team->lock);
      [   40.373549]
      	       *** DEADLOCK ***
      
      [   40.374432]  May be due to missing lock nesting notation
      
      [   40.375338] 2 locks held by ip/750:
      [   40.375851]  #0: ffffffffabcc42b0 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x466/0x8a0
      [   40.376927]  #1: ffff888051201c40 (&team->lock){+.+.}, at: team_del_slave+0x29/0x60 [team]
      [   40.377989]
      	       stack backtrace:
      [   40.378650] CPU: 0 PID: 750 Comm: ip Not tainted 5.4.0-rc3+ #96
      [   40.379368] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [   40.380574] Call Trace:
      [   40.381208]  dump_stack+0x7c/0xbb
      [   40.381959]  __lock_acquire+0x269d/0x3de0
      [   40.382817]  ? register_lock_class+0x14d0/0x14d0
      [   40.383784]  ? check_chain_key+0x236/0x5d0
      [   40.384518]  lock_acquire+0x164/0x3b0
      [   40.385074]  ? team_set_mac_address+0x151/0x290 [team]
      [   40.385805]  __mutex_lock+0x14d/0x14c0
      [   40.386371]  ? team_set_mac_address+0x151/0x290 [team]
      [   40.387038]  ? team_set_mac_address+0x151/0x290 [team]
      [   40.387632]  ? mutex_lock_io_nested+0x1380/0x1380
      [   40.388245]  ? team_del_slave+0x60/0x60 [team]
      [   40.388752]  ? rcu_read_lock_sched_held+0x90/0xc0
      [   40.389304]  ? rcu_read_lock_bh_held+0xa0/0xa0
      [   40.389819]  ? lock_acquire+0x164/0x3b0
      [   40.390285]  ? lockdep_rtnl_is_held+0x16/0x20
      [   40.390797]  ? team_port_get_rtnl+0x90/0xe0 [team]
      [   40.391353]  ? __module_text_address+0x13/0x140
      [   40.391886]  ? team_set_mac_address+0x151/0x290 [team]
      [   40.392547]  team_set_mac_address+0x151/0x290 [team]
      [   40.393111]  dev_set_mac_address+0x1f0/0x3f0
      [ ... ]
      
      Fixes: 3d249d4c ("net: introduce ethernet teaming device")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: use dynamic lockdep key instead of subclass · 089bca2c
      Taehee Yoo authored
      All bonding devices share the same lockdep key, and the subclass is
      initialized with nest_level.
      But the actual nest_level value can change when a lower device is attached.
      At that moment the subclass should be updated, but doing so seems to be
      unsafe.
      So this patch makes bonding use a dynamic lockdep key instead of the
      subclass.
      
      Test commands:
          ip link add bond0 type bond
      
          for i in {1..5}
          do
      	    let A=$i-1
      	    ip link add bond$i type bond
      	    ip link set bond$i master bond$A
          done
          ip link set bond5 master bond0
      
      Splat looks like:
      [  307.992912] WARNING: possible recursive locking detected
      [  307.993656] 5.4.0-rc3+ #96 Tainted: G        W
      [  307.994367] --------------------------------------------
      [  307.995092] ip/761 is trying to acquire lock:
      [  307.995710] ffff8880513aac60 (&(&bond->stats_lock)->rlock#2/2){+.+.}, at: bond_get_stats+0xb8/0x500 [bonding]
      [  307.997045]
      	       but task is already holding lock:
      [  307.997923] ffff88805fcbac60 (&(&bond->stats_lock)->rlock#2/2){+.+.}, at: bond_get_stats+0xb8/0x500 [bonding]
      [  307.999215]
      	       other info that might help us debug this:
      [  308.000251]  Possible unsafe locking scenario:
      
      [  308.001137]        CPU0
      [  308.001533]        ----
      [  308.001915]   lock(&(&bond->stats_lock)->rlock#2/2);
      [  308.002609]   lock(&(&bond->stats_lock)->rlock#2/2);
      [  308.003302]
      		*** DEADLOCK ***
      
      [  308.004310]  May be due to missing lock nesting notation
      
      [  308.005319] 3 locks held by ip/761:
      [  308.005830]  #0: ffffffff9fcc42b0 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x466/0x8a0
      [  308.006894]  #1: ffff88805fcbac60 (&(&bond->stats_lock)->rlock#2/2){+.+.}, at: bond_get_stats+0xb8/0x500 [bonding]
      [  308.008243]  #2: ffffffff9f9219c0 (rcu_read_lock){....}, at: bond_get_stats+0x9f/0x500 [bonding]
      [  308.009422]
      	       stack backtrace:
      [  308.010124] CPU: 0 PID: 761 Comm: ip Tainted: G        W         5.4.0-rc3+ #96
      [  308.011097] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [  308.012179] Call Trace:
      [  308.012601]  dump_stack+0x7c/0xbb
      [  308.013089]  __lock_acquire+0x269d/0x3de0
      [  308.013669]  ? register_lock_class+0x14d0/0x14d0
      [  308.014318]  lock_acquire+0x164/0x3b0
      [  308.014858]  ? bond_get_stats+0xb8/0x500 [bonding]
      [  308.015520]  _raw_spin_lock_nested+0x2e/0x60
      [  308.016129]  ? bond_get_stats+0xb8/0x500 [bonding]
      [  308.017215]  bond_get_stats+0xb8/0x500 [bonding]
      [  308.018454]  ? bond_arp_rcv+0xf10/0xf10 [bonding]
      [  308.019710]  ? rcu_read_lock_held+0x90/0xa0
      [  308.020605]  ? rcu_read_lock_sched_held+0xc0/0xc0
      [  308.021286]  ? bond_get_stats+0x9f/0x500 [bonding]
      [  308.021953]  dev_get_stats+0x1ec/0x270
      [  308.022508]  bond_get_stats+0x1d1/0x500 [bonding]
      
      Fixes: d3fff6c4 ("net: add netdev_lockdep_set_classes() helper")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: fix unexpected IFF_BONDING bit unset · 65de65d9
      Taehee Yoo authored
      IFF_BONDING marks a device as a bonding master or a bonding slave.
      ->ndo_add_slave() sets the IFF_BONDING flag and ->ndo_del_slave() clears
      it.
      
      bond0<--bond1
      
      Both bond0 and bond1 are bonding devices, and they should keep the
      IFF_BONDING flag until they are removed.
      But bond1 loses IFF_BONDING in ->ndo_del_slave() because that routine
      does not check whether the slave device is itself a bonding device.
      This patch adds an interface type check before clearing the
      IFF_BONDING flag.
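
      The type check can be sketched in userspace C as follows. The structure,
      the flag value and the model_* helpers are stand-ins chosen for this
      illustration; they are not the kernel's definitions.

```c
#include <stdbool.h>

#define MODEL_IFF_BONDING 0x1u /* stand-in bit, not the kernel's value */

/* Hypothetical, minimal view of a network device. */
struct model_netdev {
	unsigned int priv_flags;
	bool is_bond_master; /* true if the device is itself a bonding device */
};

/* Enslaving always marks the device as part of a bond. */
static void model_enslave(struct model_netdev *slave)
{
	slave->priv_flags |= MODEL_IFF_BONDING;
}

/*
 * Releasing clears the flag only for devices that are not bonding
 * devices themselves, so a nested bond keeps IFF_BONDING.
 */
static void model_release(struct model_netdev *slave)
{
	if (!slave->is_bond_master)
		slave->priv_flags &= ~MODEL_IFF_BONDING;
}
```

      In this model, releasing bond1 from bond0 leaves its flag set, while
      releasing an ordinary device clears it.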
      
      Test commands:
          ip link add bond0 type bond
          ip link add bond1 type bond
          ip link set bond1 master bond0
          ip link set bond1 nomaster
          ip link del bond1 type bond
          ip link add bond1 type bond
      
      Splat looks like:
      [  226.665555] proc_dir_entry 'bonding/bond1' already registered
      [  226.666440] WARNING: CPU: 0 PID: 737 at fs/proc/generic.c:361 proc_register+0x2a9/0x3e0
      [  226.667571] Modules linked in: bonding af_packet sch_fq_codel ip_tables x_tables unix
      [  226.668662] CPU: 0 PID: 737 Comm: ip Not tainted 5.4.0-rc3+ #96
      [  226.669508] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [  226.670652] RIP: 0010:proc_register+0x2a9/0x3e0
      [  226.671612] Code: 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 39 01 00 00 48 8b 04 24 48 89 ea 48 c7 c7 a0 0b 14 9f 48 8b b0 e0 00 00 00 e8 07 e7 88 ff <0f> 0b 48 c7 c7 40 2d a5 9f e8 59 d6 23 01 48 8b 4c 24 10 48 b8 00
      [  226.675007] RSP: 0018:ffff888050e17078 EFLAGS: 00010282
      [  226.675761] RAX: dffffc0000000008 RBX: ffff88805fdd0f10 RCX: ffffffff9dd344e2
      [  226.676757] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff88806c9f6b8c
      [  226.677751] RBP: ffff8880507160f3 R08: ffffed100d940019 R09: ffffed100d940019
      [  226.678761] R10: 0000000000000001 R11: ffffed100d940018 R12: ffff888050716008
      [  226.679757] R13: ffff8880507160f2 R14: dffffc0000000000 R15: ffffed100a0e2c1e
      [  226.680758] FS:  00007fdc217cc0c0(0000) GS:ffff88806c800000(0000) knlGS:0000000000000000
      [  226.681886] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  226.682719] CR2: 00007f49313424d0 CR3: 0000000050e46001 CR4: 00000000000606f0
      [  226.683727] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  226.684725] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  226.685681] Call Trace:
      [  226.687089]  proc_create_seq_private+0xb3/0xf0
      [  226.687778]  bond_create_proc_entry+0x1b3/0x3f0 [bonding]
      [  226.691458]  bond_netdev_event+0x433/0x970 [bonding]
      [  226.692139]  ? __module_text_address+0x13/0x140
      [  226.692779]  notifier_call_chain+0x90/0x160
      [  226.693401]  register_netdevice+0x9b3/0xd80
      [  226.694010]  ? alloc_netdev_mqs+0x854/0xc10
      [  226.694629]  ? netdev_change_features+0xa0/0xa0
      [  226.695278]  ? rtnl_create_link+0x2ed/0xad0
      [  226.695849]  bond_newlink+0x2a/0x60 [bonding]
      [  226.696422]  __rtnl_newlink+0xb9f/0x11b0
      [  226.696968]  ? rtnl_link_unregister+0x220/0x220
      [ ... ]
      
      Fixes: 0b680e75 ("[PATCH] bonding: Add priv_flag to avoid event mishandling")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: core: add generic lockdep keys · ab92d68f
      Taehee Yoo authored
      Some interface types can be nested
      (VLAN, BONDING, TEAM, MACSEC, MACVLAN, IPVLAN, VIRT_WIFI, VXLAN, etc.).
      These interface types should set a lockdep class because, without a
      lockdep class key, lockdep always warns about nonexistent circular locking.
      
      In the current code, each of these interfaces has its own lockdep class
      keys and manages them itself, so there is a lot of duplicated code around
      drivers/net/ and net/.
      This patch adds new generic lockdep keys and some helper functions for them.
      
      This patch makes the following changes.
      a) Add lockdep class keys in struct net_device
         - qdisc_running, xmit, addr_list, qdisc_busylock
         - these keys are used as dynamic lockdep keys.
      b) When a net_device is allocated, the lockdep keys are registered.
         - alloc_netdev_mqs()
      c) When a net_device is freed, the lockdep keys are unregistered.
         - free_netdev()
      d) Add generic lockdep key helper functions
         - netdev_register_lockdep_key()
         - netdev_unregister_lockdep_key()
         - netdev_update_lockdep_key()
      e) Remove unnecessary generic lockdep macros and functions
      f) Remove unnecessary lockdep code from each interface.
      
      After this patch, interface modules don't need to maintain
      their own lockdep keys.
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: core: limit nested device depth · 5343da4c
      Taehee Yoo authored
      The current code doesn't limit the number of nested devices.
      Nested devices are handled recursively, which needs a large amount of
      stack memory, so unlimited nesting can cause a stack overflow.
      
      This patch adds upper_level and lower_level. They are common variables
      that represent the maximum upper/lower depth.
      When an upper/lower device is attached or detached,
      {lower/upper}_level are updated. If the maximum depth is bigger than 8,
      the attach routine fails and returns -EMLINK.
      
      In addition, this patch converts the recursive routines
      netdev_walk_all_{lower/upper} into iterative routines.
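
      The depth check and the iterative level update can be sketched in
      userspace C. This model assumes a single chain (each device has at most
      one upper and one lower device) and uses hypothetical model_* names; the
      kernel tracks full adjacency lists, so this shows only the idea.

```c
#include <errno.h>
#include <stddef.h>

#define MODEL_MAX_NEST_DEV 8 /* depth limit described in the text above */

/* Hypothetical device in a single chain of nested devices. */
struct model_dev {
	struct model_dev *upper;
	struct model_dev *lower;
	int upper_level; /* devices from this one up to the top, inclusive */
	int lower_level; /* devices from this one down to the bottom, inclusive */
};

static void model_dev_init(struct model_dev *d)
{
	d->upper = NULL;
	d->lower = NULL;
	d->upper_level = 1;
	d->lower_level = 1;
}

/* Attach "upper" on top of "lower"; refuse chains deeper than the limit. */
static int model_attach(struct model_dev *lower, struct model_dev *upper)
{
	struct model_dev *d;

	if (lower->upper != NULL || upper->lower != NULL)
		return -EBUSY;
	if (lower->lower_level + upper->upper_level > MODEL_MAX_NEST_DEV)
		return -EMLINK;

	lower->upper = upper;
	upper->lower = lower;

	/* Walk down with a loop, not recursion, to refresh upper_level. */
	for (d = lower; d != NULL; d = d->lower)
		d->upper_level = d->upper->upper_level + 1;

	/* Walk up with a loop to refresh lower_level. */
	for (d = upper; d != NULL; d = d->upper)
		d->lower_level = d->lower->lower_level + 1;

	return 0;
}
```

      Because both walks are plain loops, stack usage in this sketch does not
      grow with the nesting depth.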
      
      Test commands:
          ip link add dummy0 type dummy
          ip link add link dummy0 name vlan1 type vlan id 1
          ip link set vlan1 up
      
          for i in {2..55}
          do
      	    let A=$i-1
      
      	    ip link add vlan$i link vlan$A type vlan id $i
          done
          ip link del dummy0
      
      Splat looks like:
      [  155.513226][  T908] BUG: KASAN: use-after-free in __unwind_start+0x71/0x850
      [  155.514162][  T908] Write of size 88 at addr ffff8880608a6cc0 by task ip/908
      [  155.515048][  T908]
      [  155.515333][  T908] CPU: 0 PID: 908 Comm: ip Not tainted 5.4.0-rc3+ #96
      [  155.516147][  T908] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [  155.517233][  T908] Call Trace:
      [  155.517627][  T908]
      [  155.517918][  T908] Allocated by task 0:
      [  155.518412][  T908] (stack is not available)
      [  155.518955][  T908]
      [  155.519228][  T908] Freed by task 0:
      [  155.519885][  T908] (stack is not available)
      [  155.520452][  T908]
      [  155.520729][  T908] The buggy address belongs to the object at ffff8880608a6ac0
      [  155.520729][  T908]  which belongs to the cache names_cache of size 4096
      [  155.522387][  T908] The buggy address is located 512 bytes inside of
      [  155.522387][  T908]  4096-byte region [ffff8880608a6ac0, ffff8880608a7ac0)
      [  155.523920][  T908] The buggy address belongs to the page:
      [  155.524552][  T908] page:ffffea0001822800 refcount:1 mapcount:0 mapping:ffff88806c657cc0 index:0x0 compound_mapcount:0
      [  155.525836][  T908] flags: 0x100000000010200(slab|head)
      [  155.526445][  T908] raw: 0100000000010200 ffffea0001813808 ffffea0001a26c08 ffff88806c657cc0
      [  155.527424][  T908] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
      [  155.528429][  T908] page dumped because: kasan: bad access detected
      [  155.529158][  T908]
      [  155.529410][  T908] Memory state around the buggy address:
      [  155.530060][  T908]  ffff8880608a6b80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  155.530971][  T908]  ffff8880608a6c00: fb fb fb fb fb f1 f1 f1 f1 00 f2 f2 f2 f3 f3 f3
      [  155.531889][  T908] >ffff8880608a6c80: f3 fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  155.532806][  T908]                                            ^
      [  155.533509][  T908]  ffff8880608a6d00: fb fb fb fb fb fb fb fb fb f1 f1 f1 f1 00 00 00
      [  155.534436][  T908]  ffff8880608a6d80: f2 f3 f3 f3 f3 fb fb fb 00 00 00 00 00 00 00 00
      [ ... ]
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • keys: Fix memory leak in copy_net_ns · 82ecff65
      Takeshi Misawa authored
      If copy_net_ns() fails after net_alloc(), net->key_domain is leaked.
      Fix this by freeing key_domain in the error path.
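
      The shape of such an error-path fix can be shown with a userspace
      sketch. All names here (model_net, model_copy_net, the live-allocation
      counter) are hypothetical and only imitate the pattern; they are not the
      kernel's functions or the actual patch.

```c
#include <stdlib.h>

/* Hypothetical object that owns a second allocation. */
struct model_net {
	void *key_domain;
};

static int model_live_allocs; /* counts outstanding allocations */

static struct model_net *model_net_alloc(void)
{
	struct model_net *net = calloc(1, sizeof(*net));

	if (net == NULL)
		return NULL;
	net->key_domain = calloc(1, 32);
	if (net->key_domain == NULL) {
		free(net);
		return NULL;
	}
	model_live_allocs += 2;
	return net;
}

/* Frees only the outer object, like a plain "free the struct" helper. */
static void model_net_free(struct model_net *net)
{
	free(net);
	model_live_allocs--;
}

/* setup_fails simulates a failure after the allocation step. */
static struct model_net *model_copy_net(int setup_fails)
{
	struct model_net *net = model_net_alloc();

	if (net == NULL)
		return NULL;
	if (setup_fails) {
		/* Without this release, key_domain would be leaked. */
		free(net->key_domain);
		model_live_allocs--;
		model_net_free(net);
		return NULL;
	}
	return net;
}
```

      After a simulated failure, model_live_allocs returns to 0 in this
      sketch, which is the property the error path has to guarantee.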
      
      syzbot report:
      BUG: memory leak
      unreferenced object 0xffff8881175007e0 (size 32):
        comm "syz-executor902", pid 7069, jiffies 4294944350 (age 28.400s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<00000000a83ed741>] kmemleak_alloc_recursive include/linux/kmemleak.h:43 [inline]
          [<00000000a83ed741>] slab_post_alloc_hook mm/slab.h:439 [inline]
          [<00000000a83ed741>] slab_alloc mm/slab.c:3326 [inline]
          [<00000000a83ed741>] kmem_cache_alloc_trace+0x13d/0x280 mm/slab.c:3553
          [<0000000059fc92b9>] kmalloc include/linux/slab.h:547 [inline]
          [<0000000059fc92b9>] kzalloc include/linux/slab.h:742 [inline]
          [<0000000059fc92b9>] net_alloc net/core/net_namespace.c:398 [inline]
          [<0000000059fc92b9>] copy_net_ns+0xb2/0x220 net/core/net_namespace.c:445
          [<00000000a9d74bbc>] create_new_namespaces+0x141/0x2a0 kernel/nsproxy.c:103
          [<000000008047d645>] unshare_nsproxy_namespaces+0x7f/0x100 kernel/nsproxy.c:202
          [<000000005993ea6e>] ksys_unshare+0x236/0x490 kernel/fork.c:2674
          [<0000000019417e75>] __do_sys_unshare kernel/fork.c:2742 [inline]
          [<0000000019417e75>] __se_sys_unshare kernel/fork.c:2740 [inline]
          [<0000000019417e75>] __x64_sys_unshare+0x16/0x20 kernel/fork.c:2740
          [<00000000f4c5f2c8>] do_syscall_64+0x76/0x1a0 arch/x86/entry/common.c:296
          [<0000000038550184>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      syzbot also reported another leak in copy_net_ns -> setup_net.
      That problem is already fixed by cf47a0b8.
      
      Fixes: 9b242610 ("keys: Network namespace domain tag")
      Reported-and-tested-by: syzbot+3b3296d032353c33184b@syzkaller.appspotmail.com
      Signed-off-by: Takeshi Misawa <jeliantsurux@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: smsc: LAN8740: add PHY_RST_AFTER_CLK_EN flag · 76db2d46
      Martin Fuzzey authored
      The LAN8740, like the 8720, also requires a reset after enabling clock.
      The datasheet [1] 3.8.5.1 says:
      	"During a Hardware reset, an external clock must be supplied
      	to the XTAL1/CLKIN signal."
      
      I have observed this issue on a custom i.MX6 based board with
      the LAN8740A.
      
      [1] http://ww1.microchip.com/downloads/en/DeviceDoc/8740a.pdf
      
      Signed-off-by: Martin Fuzzey <martin.fuzzey@flowbird.group>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/flow_dissector: switch to siphash · 55667441
      Eric Dumazet authored
      Auto flow labels of UDP IPv6 packets use a 32-bit secret
      (static u32 hashrnd in net/core/flow_dissector.c) and
      apply jhash() over fields known by the receivers.
      
      Attackers can easily infer the 32bit secret and use this information
      to identify a device and/or user, since this 32bit secret is only
      set at boot time.
      
      Really, using jhash() to generate cookies sent on the wire
      is a serious security concern.
      
      Trying to change the rol32(hash, 16) in ip6_make_flowlabel() would be
      a dead end. Trying to periodically change the secret (like in sch_sfq.c)
      could change paths taken in the network for long lived flows.
      
      Let's switch to siphash, as we did in commit df453700
      ("inet: switch IP ID generator to siphash")
      
      Using a cryptographically strong pseudo random function will solve this
      privacy issue and more generally remove other weak points in the stack.
      
      Packet schedulers using skb_get_hash_perturb() benefit from this change.
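
      For readers unfamiliar with the primitive, here is a compact userspace
      sketch of SipHash-2-4, a keyed hash with a 128-bit secret. It is written
      for illustration only: model_siphash24() and its helpers are hypothetical
      names, and this is not the kernel's siphash implementation or the flow
      dissector code.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_ROTL64(x, b) (((x) << (b)) | ((x) >> (64 - (b))))

/* One SipRound over the four-word internal state. */
static void model_sipround(uint64_t v[4])
{
	v[0] += v[1]; v[1] = MODEL_ROTL64(v[1], 13); v[1] ^= v[0];
	v[0] = MODEL_ROTL64(v[0], 32);
	v[2] += v[3]; v[3] = MODEL_ROTL64(v[3], 16); v[3] ^= v[2];
	v[0] += v[3]; v[3] = MODEL_ROTL64(v[3], 21); v[3] ^= v[0];
	v[2] += v[1]; v[1] = MODEL_ROTL64(v[1], 17); v[1] ^= v[2];
	v[2] = MODEL_ROTL64(v[2], 32);
}

/* Load eight bytes as a little-endian 64-bit word. */
static uint64_t model_load_le64(const uint8_t *p)
{
	uint64_t x = 0;

	for (int i = 0; i < 8; i++)
		x |= (uint64_t)p[i] << (8 * i);
	return x;
}

/* SipHash-2-4 of data[0..len) under the 128-bit key (k0, k1). */
static uint64_t model_siphash24(const uint8_t *data, size_t len,
				uint64_t k0, uint64_t k1)
{
	uint64_t v[4] = {
		k0 ^ 0x736f6d6570736575ULL,
		k1 ^ 0x646f72616e646f6dULL,
		k0 ^ 0x6c7967656e657261ULL,
		k1 ^ 0x7465646279746573ULL,
	};
	size_t full = len & ~(size_t)7;
	uint64_t last = (uint64_t)len << 56; /* length goes in the top byte */
	size_t i;

	/* Compression: two rounds per full 8-byte word. */
	for (i = 0; i < full; i += 8) {
		uint64_t m = model_load_le64(data + i);

		v[3] ^= m;
		model_sipround(v);
		model_sipround(v);
		v[0] ^= m;
	}

	/* Remaining 0..7 bytes are packed into the final word. */
	for (i = full; i < len; i++)
		last |= (uint64_t)data[i] << (8 * (i - full));
	v[3] ^= last;
	model_sipround(v);
	model_sipround(v);
	v[0] ^= last;

	/* Finalization: four rounds. */
	v[2] ^= 0xff;
	for (i = 0; i < 4; i++)
		model_sipround(v);

	return v[0] ^ v[1] ^ v[2] ^ v[3];
}
```

      Unlike a hash seeded with a single 32-bit value, the output here depends
      on a 128-bit key, which is what makes recovering the secret from
      observed outputs impractical.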
      
      Fixes: b5677416 ("ipv6: Enable auto flow labels by default")
      Fixes: 42240901 ("ipv6: Implement different admin modes for automatic flow labels")
      Fixes: 67800f9b ("ipv6: Call skb_get_hash_flowi6 to get skb->hash in ip6_make_flowlabel")
      Fixes: cb1ce2ef ("ipv6: Implement automatic flow label generation on transmit")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Jonathan Berger <jonathann1@walla.com>
      Reported-by: Amit Klein <aksecurity@gmail.com>
      Reported-by: Benny Pinkas <benny@pinkas.net>
      Cc: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 22 Oct, 2019 12 commits
    • ipv6: include <net/addrconf.h> for missing declarations · 6c5d9c2a
      Ben Dooks (Codethink) authored
      Include <net/addrconf.h> for the missing declarations of
      various functions. Fixes the following sparse warnings:
      
      net/ipv6/addrconf_core.c:94:5: warning: symbol 'register_inet6addr_notifier' was not declared. Should it be static?
      net/ipv6/addrconf_core.c:100:5: warning: symbol 'unregister_inet6addr_notifier' was not declared. Should it be static?
      net/ipv6/addrconf_core.c:106:5: warning: symbol 'inet6addr_notifier_call_chain' was not declared. Should it be static?
      net/ipv6/addrconf_core.c:112:5: warning: symbol 'register_inet6addr_validator_notifier' was not declared. Should it be static?
      net/ipv6/addrconf_core.c:118:5: warning: symbol 'unregister_inet6addr_validator_notifier' was not declared. Should it be static?
      net/ipv6/addrconf_core.c:125:5: warning: symbol 'inet6addr_validator_notifier_call_chain' was not declared. Should it be static?
      net/ipv6/addrconf_core.c:237:6: warning: symbol 'in6_dev_finish_destroy' was not declared. Should it be static?
      Signed-off-by: Ben Dooks (Codethink) <ben.dooks@codethink.co.uk>
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
    • net: openvswitch: free vport unless register_netdevice() succeeds · 9464cc37
      Hillf Danton authored
      syzbot found the following crash on:
      
      HEAD commit:    1e78030e Merge tag 'mmc-v5.3-rc1' of git://git.kernel.org/..
      git tree:       upstream
      console output: https://syzkaller.appspot.com/x/log.txt?x=148d3d1a600000
      kernel config:  https://syzkaller.appspot.com/x/.config?x=30cef20daf3e9977
      dashboard link: https://syzkaller.appspot.com/bug?extid=13210896153522fe1ee5
      compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
      syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=136aa8c4600000
      C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=109ba792600000
      
      =====================================================================
      BUG: memory leak
      unreferenced object 0xffff8881207e4100 (size 128):
         comm "syz-executor032", pid 7014, jiffies 4294944027 (age 13.830s)
         hex dump (first 32 bytes):
           00 70 16 18 81 88 ff ff 80 af 8c 22 81 88 ff ff  .p........."....
           00 b6 23 17 81 88 ff ff 00 00 00 00 00 00 00 00  ..#.............
         backtrace:
           [<000000000eb78212>] kmemleak_alloc_recursive  include/linux/kmemleak.h:43 [inline]
           [<000000000eb78212>] slab_post_alloc_hook mm/slab.h:522 [inline]
           [<000000000eb78212>] slab_alloc mm/slab.c:3319 [inline]
           [<000000000eb78212>] kmem_cache_alloc_trace+0x145/0x2c0 mm/slab.c:3548
           [<00000000006ea6c6>] kmalloc include/linux/slab.h:552 [inline]
           [<00000000006ea6c6>] kzalloc include/linux/slab.h:748 [inline]
           [<00000000006ea6c6>] ovs_vport_alloc+0x37/0xf0  net/openvswitch/vport.c:130
           [<00000000f9a04a7d>] internal_dev_create+0x24/0x1d0  net/openvswitch/vport-internal_dev.c:164
           [<0000000056ee7c13>] ovs_vport_add+0x81/0x190  net/openvswitch/vport.c:199
           [<000000005434efc7>] new_vport+0x19/0x80 net/openvswitch/datapath.c:194
           [<00000000b7b253f1>] ovs_dp_cmd_new+0x22f/0x410  net/openvswitch/datapath.c:1614
           [<00000000e0988518>] genl_family_rcv_msg+0x2ab/0x5b0  net/netlink/genetlink.c:629
           [<00000000d0cc9347>] genl_rcv_msg+0x54/0x9c net/netlink/genetlink.c:654
           [<000000006694b647>] netlink_rcv_skb+0x61/0x170  net/netlink/af_netlink.c:2477
           [<0000000088381f37>] genl_rcv+0x29/0x40 net/netlink/genetlink.c:665
           [<00000000dad42a47>] netlink_unicast_kernel  net/netlink/af_netlink.c:1302 [inline]
           [<00000000dad42a47>] netlink_unicast+0x1ec/0x2d0  net/netlink/af_netlink.c:1328
           [<0000000067e6b079>] netlink_sendmsg+0x270/0x480  net/netlink/af_netlink.c:1917
           [<00000000aab08a47>] sock_sendmsg_nosec net/socket.c:637 [inline]
           [<00000000aab08a47>] sock_sendmsg+0x54/0x70 net/socket.c:657
           [<000000004cb7c11d>] ___sys_sendmsg+0x393/0x3c0 net/socket.c:2311
           [<00000000c4901c63>] __sys_sendmsg+0x80/0xf0 net/socket.c:2356
           [<00000000c10abb2d>] __do_sys_sendmsg net/socket.c:2365 [inline]
           [<00000000c10abb2d>] __se_sys_sendmsg net/socket.c:2363 [inline]
           [<00000000c10abb2d>] __x64_sys_sendmsg+0x23/0x30 net/socket.c:2363
      
      BUG: memory leak
      unreferenced object 0xffff88811723b600 (size 64):
         comm "syz-executor032", pid 7014, jiffies 4294944027 (age 13.830s)
         hex dump (first 32 bytes):
           01 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00  ................
           00 00 00 00 00 00 00 00 02 00 00 00 05 35 82 c1  .............5..
         backtrace:
           [<00000000352f46d8>] kmemleak_alloc_recursive  include/linux/kmemleak.h:43 [inline]
           [<00000000352f46d8>] slab_post_alloc_hook mm/slab.h:522 [inline]
           [<00000000352f46d8>] slab_alloc mm/slab.c:3319 [inline]
           [<00000000352f46d8>] __do_kmalloc mm/slab.c:3653 [inline]
           [<00000000352f46d8>] __kmalloc+0x169/0x300 mm/slab.c:3664
           [<000000008e48f3d1>] kmalloc include/linux/slab.h:557 [inline]
           [<000000008e48f3d1>] ovs_vport_set_upcall_portids+0x54/0xd0  net/openvswitch/vport.c:343
           [<00000000541e4f4a>] ovs_vport_alloc+0x7f/0xf0  net/openvswitch/vport.c:139
           [<00000000f9a04a7d>] internal_dev_create+0x24/0x1d0  net/openvswitch/vport-internal_dev.c:164
           [<0000000056ee7c13>] ovs_vport_add+0x81/0x190  net/openvswitch/vport.c:199
           [<000000005434efc7>] new_vport+0x19/0x80 net/openvswitch/datapath.c:194
           [<00000000b7b253f1>] ovs_dp_cmd_new+0x22f/0x410  net/openvswitch/datapath.c:1614
           [<00000000e0988518>] genl_family_rcv_msg+0x2ab/0x5b0  net/netlink/genetlink.c:629
           [<00000000d0cc9347>] genl_rcv_msg+0x54/0x9c net/netlink/genetlink.c:654
           [<000000006694b647>] netlink_rcv_skb+0x61/0x170  net/netlink/af_netlink.c:2477
           [<0000000088381f37>] genl_rcv+0x29/0x40 net/netlink/genetlink.c:665
           [<00000000dad42a47>] netlink_unicast_kernel  net/netlink/af_netlink.c:1302 [inline]
           [<00000000dad42a47>] netlink_unicast+0x1ec/0x2d0  net/netlink/af_netlink.c:1328
           [<0000000067e6b079>] netlink_sendmsg+0x270/0x480  net/netlink/af_netlink.c:1917
           [<00000000aab08a47>] sock_sendmsg_nosec net/socket.c:637 [inline]
           [<00000000aab08a47>] sock_sendmsg+0x54/0x70 net/socket.c:657
           [<000000004cb7c11d>] ___sys_sendmsg+0x393/0x3c0 net/socket.c:2311
           [<00000000c4901c63>] __sys_sendmsg+0x80/0xf0 net/socket.c:2356
      
      BUG: memory leak
      unreferenced object 0xffff8881228ca500 (size 128):
         comm "syz-executor032", pid 7015, jiffies 4294944622 (age 7.880s)
         hex dump (first 32 bytes):
           00 f0 27 18 81 88 ff ff 80 ac 8c 22 81 88 ff ff  ..'........"....
           40 b7 23 17 81 88 ff ff 00 00 00 00 00 00 00 00  @.#.............
         backtrace:
           [<000000000eb78212>] kmemleak_alloc_recursive  include/linux/kmemleak.h:43 [inline]
           [<000000000eb78212>] slab_post_alloc_hook mm/slab.h:522 [inline]
           [<000000000eb78212>] slab_alloc mm/slab.c:3319 [inline]
           [<000000000eb78212>] kmem_cache_alloc_trace+0x145/0x2c0 mm/slab.c:3548
           [<00000000006ea6c6>] kmalloc include/linux/slab.h:552 [inline]
           [<00000000006ea6c6>] kzalloc include/linux/slab.h:748 [inline]
           [<00000000006ea6c6>] ovs_vport_alloc+0x37/0xf0  net/openvswitch/vport.c:130
           [<00000000f9a04a7d>] internal_dev_create+0x24/0x1d0  net/openvswitch/vport-internal_dev.c:164
           [<0000000056ee7c13>] ovs_vport_add+0x81/0x190  net/openvswitch/vport.c:199
           [<000000005434efc7>] new_vport+0x19/0x80 net/openvswitch/datapath.c:194
           [<00000000b7b253f1>] ovs_dp_cmd_new+0x22f/0x410  net/openvswitch/datapath.c:1614
           [<00000000e0988518>] genl_family_rcv_msg+0x2ab/0x5b0  net/netlink/genetlink.c:629
           [<00000000d0cc9347>] genl_rcv_msg+0x54/0x9c net/netlink/genetlink.c:654
           [<000000006694b647>] netlink_rcv_skb+0x61/0x170  net/netlink/af_netlink.c:2477
           [<0000000088381f37>] genl_rcv+0x29/0x40 net/netlink/genetlink.c:665
           [<00000000dad42a47>] netlink_unicast_kernel  net/netlink/af_netlink.c:1302 [inline]
           [<00000000dad42a47>] netlink_unicast+0x1ec/0x2d0  net/netlink/af_netlink.c:1328
           [<0000000067e6b079>] netlink_sendmsg+0x270/0x480  net/netlink/af_netlink.c:1917
           [<00000000aab08a47>] sock_sendmsg_nosec net/socket.c:637 [inline]
           [<00000000aab08a47>] sock_sendmsg+0x54/0x70 net/socket.c:657
           [<000000004cb7c11d>] ___sys_sendmsg+0x393/0x3c0 net/socket.c:2311
           [<00000000c4901c63>] __sys_sendmsg+0x80/0xf0 net/socket.c:2356
           [<00000000c10abb2d>] __do_sys_sendmsg net/socket.c:2365 [inline]
           [<00000000c10abb2d>] __se_sys_sendmsg net/socket.c:2363 [inline]
           [<00000000c10abb2d>] __x64_sys_sendmsg+0x23/0x30 net/socket.c:2363
      =====================================================================
      
      The net core function register_netdevice() may fail with the vport's
      destruction callback either invoked or not. After commit 309b6697
      ("net: openvswitch: do not free vport if register_netdevice() is failed."),
      the driver no longer destroys the vport itself on failure, which results
      in the memory leak reported above.
      
      Fix it by releasing the vport unless the device is registered successfully.
      To do that, defer the callback assignment until the device is registered.
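
The ownership rule described above can be sketched in user-space C. This is only a toy model under stated assumptions: `struct netdev`, `fake_register()`, `create_internal_dev()` and the `live_vports` counter are hypothetical names invented for illustration, not the kernel's code. The point it tries to show is that when the callback is assigned only after registration succeeds, the create path can always free the vport on failure without risking a double free or a leak.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for struct vport and struct net_device. */
struct vport { int id; };

struct netdev {
    struct vport *vport;
    /* Destruction callback; the core may call it when tearing down. */
    void (*priv_destructor)(struct netdev *dev);
};

static int live_vports;  /* vports allocated but not yet freed */

static struct vport *vport_alloc(void)
{
    struct vport *vp = calloc(1, sizeof(*vp));

    if (vp)
        live_vports++;
    return vp;
}

static void vport_free(struct vport *vp)
{
    free(vp);
    live_vports--;
}

static void dev_destructor(struct netdev *dev)
{
    vport_free(dev->vport);
    dev->vport = NULL;
}

/* Toy failure modes: on failure the destructor may or may not have run. */
enum fail_mode { REG_OK, REG_FAIL_EARLY, REG_FAIL_LATE };

static int fake_register(struct netdev *dev, enum fail_mode mode)
{
    if (mode == REG_FAIL_EARLY)
        return -1;                      /* destructor not invoked */
    if (mode == REG_FAIL_LATE) {
        if (dev->priv_destructor)
            dev->priv_destructor(dev);  /* invoked only if already set */
        return -1;
    }
    return 0;
}

/* Create path with the callback assignment deferred until success. */
static int create_internal_dev(struct netdev *dev, enum fail_mode mode)
{
    dev->vport = vport_alloc();
    if (!dev->vport)
        return -1;
    dev->priv_destructor = NULL;

    if (fake_register(dev, mode) != 0) {
        /* No callback was set, so nobody else can have freed the vport. */
        vport_free(dev->vport);
        dev->vport = NULL;
        return -1;
    }
    dev->priv_destructor = dev_destructor;
    return 0;
}
```

In this model both failure modes leave `live_vports` at zero, and a successful registration hands the vport over to the destructor.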
      
      Reported-by: syzbot+13210896153522fe1ee5@syzkaller.appspotmail.com
      Fixes: 309b6697 ("net: openvswitch: do not free vport if register_netdevice() is failed.")
      Cc: Taehee Yoo <ap420073@gmail.com>
      Cc: Greg Rose <gvrose8192@gmail.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Cc: Ying Xue <ying.xue@windriver.com>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: Hillf Danton <hdanton@sina.com>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      [sbrivio: this was sent to dev@openvswitch.org and never made its way
       to netdev -- resending original patch]
      Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
      Reviewed-by: Greg Rose <gvrose8192@gmail.com>
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      9464cc37
    • selftests: Make l2tp.sh executable · b5b9181c
      David Ahern authored
      Kernel test robot reported that the l2tp.sh test script failed:
          # selftests: net: l2tp.sh
          # Warning: file l2tp.sh is not executable, correct this.
      
      Set executable bits.
      
      Fixes: e858ef1c ("selftests: Add l2tp tests")
      Reported-by: kernel test robot <rong.a.chen@intel.com>
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
    • net: sched: taprio: fix -Wmissing-prototypes warnings · d665c128
      Yi Wang authored
      We get one warning when building the kernel with W=1:
      net/sched/sch_taprio.c:1155:6: warning: no previous prototype for ‘taprio_offload_config_changed’ [-Wmissing-prototypes]
      
      Make the function static to fix this.
      
      Fixes: 9c66d156 ("taprio: Add support for hardware offloading")
      Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
      Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
    • Merge branch 'bnxt_en-bug-fixes' · 682fa9fb
      Jakub Kicinski authored
      Michael Chan says:
      
      ====================
      Devlink and error recovery bug fix patches.
      Most of the work is by Vasundhara Volam.
      ====================
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
    • bnxt_en: Avoid disabling pci device in bnxt_remove_one() for already disabled device. · f6824308
      Vasundhara Volam authored
      With the recently added error recovery logic, the device may already
      be disabled if the firmware recovery is unsuccessful.  In
      bnxt_remove_one(), check that the device is still enabled first
      before calling pci_disable_device().
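
The guard described above can be modelled in a few lines of user-space C. Everything here is a hypothetical stand-in: `struct fake_pci_dev`, the `fake_pci_*` helpers and `remove_one()` only imitate the shape of the check, and the warning counter is an assumption made so the effect is observable.

```c
#include <stdbool.h>

/* Hypothetical model of a PCI device with an enable counter. */
struct fake_pci_dev {
    int enable_cnt;
    int underflow_warnings;  /* times a disabled device was disabled again */
};

static bool fake_pci_is_enabled(const struct fake_pci_dev *pdev)
{
    return pdev->enable_cnt > 0;
}

static void fake_pci_disable_device(struct fake_pci_dev *pdev)
{
    if (pdev->enable_cnt == 0) {
        pdev->underflow_warnings++;  /* the situation the patch avoids */
        return;
    }
    pdev->enable_cnt--;
}

/* Remove path: only disable the device if it is still enabled. */
static void remove_one(struct fake_pci_dev *pdev)
{
    if (fake_pci_is_enabled(pdev))
        fake_pci_disable_device(pdev);
}
```

Calling `remove_one()` on a device that a failed recovery already disabled is then a no-op instead of a second disable.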
      
      Fixes: 3bc7d4a3 ("bnxt_en: Add BNXT_STATE_IN_FW_RESET state.")
      Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
    • bnxt_en: Minor formatting changes in FW devlink_health_reporter · f255ed1c
      Vasundhara Volam authored
      Minor formatting changes to the diagnose callback of the FW devlink
      health reporter.

      Suggested-by: Jiri Pirko <jiri@mellanox.com>
      Cc: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
    • bnxt_en: Adjust the time to wait before polling firmware readiness. · c6a9e7aa
      Vasundhara Volam authored
      When the firmware indicates that the driver needs to invoke a firmware
      reset, which is common to both the error recovery and live firmware reset
      paths, the driver needs a different time to wait before polling for
      firmware readiness.
      
      Modify the wait time to fw_reset_min_dsecs, which is initialised to the
      correct timeout for both error recovery and firmware reset.
      
      Fixes: 4037eb71 ("bnxt_en: Add a new BNXT_FW_RESET_STATE_POLL_FW_DOWN state.")
      Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
    • bnxt_en: Fix devlink NVRAM related byte order related issues. · 83a46a82
      Michael Chan authored
      The current code does not do endian swapping between the devlink
      parameter and the internal NVRAM representation.  Define a union to
      represent the little endian NVRAM data and add 2 helper functions to
      copy to and from the NVRAM data with the proper byte swapping.
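
A minimal sketch of what such a pair of helpers could look like, written as portable user-space C. The union layout and the names `nvm_data`, `copy_to_nvm()` and `copy_from_nvm()` are assumptions for illustration and are not taken from the driver. Byte shifts are used so the result is the same on little-endian and big-endian hosts.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical union for a value as it is stored in NVRAM. */
union nvm_data {
    uint8_t val8;
    uint8_t bytes[4];  /* raw little-endian bytes */
};

/* Store a host-order value as nbytes little-endian bytes (at most 4). */
static void copy_to_nvm(union nvm_data *dst, uint32_t host_val, size_t nbytes)
{
    memset(dst, 0, sizeof(*dst));
    for (size_t i = 0; i < nbytes && i < sizeof(dst->bytes); i++)
        dst->bytes[i] = (uint8_t)(host_val >> (8 * i));
}

/* Read nbytes little-endian bytes back into a host-order value. */
static uint32_t copy_from_nvm(const union nvm_data *src, size_t nbytes)
{
    uint32_t v = 0;

    for (size_t i = 0; i < nbytes && i < sizeof(src->bytes); i++)
        v |= (uint32_t)src->bytes[i] << (8 * i);
    return v;
}
```

Keeping the conversion in two helpers means no caller copies raw NVRAM bytes directly into a host-order variable.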
      
      Fixes: 782a624d ("bnxt_en: Add bnxt_en initial port params table and register it")
      Cc: Jiri Pirko <jiri@mellanox.com>
      Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
    • bnxt_en: Fix the size of devlink MSIX parameters. · c329230c
      Vasundhara Volam authored
      The current code that rounds up the NVRAM parameter bit size to the next
      byte size for the devlink parameter is not always correct.  The MSIX
      devlink parameters are 4 bytes and we don't get the correct size
      using this method.
      
      Fix it by adding a new dl_num_bytes member to the bnxt_dl_nvm_param
      structure which statically provides bytesize information according
      to the devlink parameter type definition.
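
To illustrate why rounding the bit width up to bytes can give the wrong size, here is a small sketch. The table entries, the 1-bit and 10-bit widths, and the names `nvm_param`, `bytes_from_bits()` and `params` are assumptions chosen for the example; only the idea of a statically provided `dl_num_bytes` comes from the text above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parameter table entry. */
struct nvm_param {
    const char *name;
    unsigned int nvm_num_bits;  /* width of the field in NVRAM */
    size_t dl_num_bytes;        /* size of the devlink-side value */
};

/* Old approach: derive the devlink size by rounding bits up to bytes. */
static size_t bytes_from_bits(unsigned int bits)
{
    return (bits + 7) / 8;
}

static const struct nvm_param params[] = {
    /* A 1-bit flag carried in a one-byte devlink value: rounding works. */
    { "example_flag",  1, sizeof(uint8_t) },
    /* An assumed 10-bit field carried in a 4-byte devlink value:
     * rounding yields 2 bytes, which does not match. */
    { "example_msix", 10, sizeof(uint32_t) },
};
```

With the size stored per entry, the copy length no longer depends on how narrow the NVRAM field happens to be.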
      
      Fixes: 782a624d ("bnxt_en: Add bnxt_en initial port params table and register it")
      Cc: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
    • net: stmmac: Fix the problem of tso_xmit · 34c15202
      yuqi jin authored
      When the address width of DMA is greater than 32, the packet header occupies
      a BD descriptor. The starting address of the data should be added to the
      header length.
      
      Fixes: a993db88 ("net: stmmac: Enable support for > 32 Bits addressing in XGMAC")
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
      Cc: Alexandre Torgue <alexandre.torgue@st.com>
      Cc: Jose Abreu <joabreu@synopsys.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
      Signed-off-by: yuqi jin <jinyuqi@huawei.com>
      Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
    • dynamic_debug: provide dynamic_hex_dump stub · 011c7289
      Arnd Bergmann authored
      The ionic driver started using dynamic_hex_dump(), but
      that is not always defined:
      
      drivers/net/ethernet/pensando/ionic/ionic_main.c:229:2: error: implicit declaration of function 'dynamic_hex_dump' [-Werror,-Wimplicit-function-declaration]
      
      Add a dummy implementation to use when CONFIG_DYNAMIC_DEBUG
      is disabled, printing nothing.
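
One common way to write such a stub is a macro that keeps the call visible to the compiler but never executes it. The sketch below is a user-space illustration: `SKETCH_DYNAMIC_DEBUG`, `sketch_hex_dump()` and `real_hex_dump()` are hypothetical names standing in for the config option and the kernel helpers, and the argument list is simplified.

```c
#include <stddef.h>
#include <stdio.h>

static int dump_calls;  /* counts real dumps, so the stub can be observed */

static void real_hex_dump(const char *prefix, const void *buf, size_t len)
{
    const unsigned char *p = buf;

    dump_calls++;
    printf("%s", prefix);
    for (size_t i = 0; i < len; i++)
        printf("%02x ", p[i]);
    printf("\n");
}

/* SKETCH_DYNAMIC_DEBUG is a hypothetical stand-in for CONFIG_DYNAMIC_DEBUG. */
#ifdef SKETCH_DYNAMIC_DEBUG
#define sketch_hex_dump(prefix, buf, len) real_hex_dump(prefix, buf, len)
#else
/*
 * Stub: the arguments are still type-checked, but the constant-false
 * condition means the dump is never performed.
 */
#define sketch_hex_dump(prefix, buf, len)        \
    do {                                         \
        if (0)                                   \
            real_hex_dump(prefix, buf, len);     \
    } while (0)
#endif
```

Compared with an empty macro, the `if (0)` form avoids unused-variable warnings at call sites and still catches argument type errors when the feature is disabled.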
      
      Fixes: 938962d5 ("ionic: Add adminq action")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Shannon Nelson <snelson@pensando.io>
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
  3. 21 Oct, 2019 3 commits
  4. 19 Oct, 2019 14 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 531e93d1
      Linus Torvalds authored
      Pull networking fixes from David Miller:
       "I was battling a cold after some recent trips, so quite a bit piled up
        meanwhile, sorry about that.
      
        Highlights:
      
         1) Fix fd leak in various bpf selftests, from Brian Vazquez.
      
         2) Fix crash in xsk when device doesn't support some methods, from
            Magnus Karlsson.
      
         3) Fix various leaks and use-after-free in rxrpc, from David Howells.
      
         4) Fix several SKB leaks due to confusion of who owns an SKB and who
            should release it in the llc code. From Eric Biggers.
      
         5) Kill a bunch of KCSAN warnings in TCP, from Eric Dumazet.
      
         6) Jumbo packets don't work after resume on r8169, as the BIOS resets
            the chip into non-jumbo mode during suspend. From Heiner Kallweit.
      
         7) Corrupt L2 header during MPLS push, from Davide Caratti.
      
         8) Prevent possible infinite loop in tc_ctl_action, from Eric
            Dumazet.
      
         9) Get register bits right in bcmgenet driver, based upon chip
            version. From Florian Fainelli.
      
        10) Fix mutex problems in microchip DSA driver, from Marek Vasut.
      
        11) Cure race between route lookup and invalidation in ipv4, from Wei
            Wang.
      
        12) Fix performance regression due to false sharing in 'net'
            structure, from Eric Dumazet"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (145 commits)
        net: reorder 'struct net' fields to avoid false sharing
        net: dsa: fix switch tree list
        net: ethernet: dwmac-sun8i: show message only when switching to promisc
        net: aquantia: add an error handling in aq_nic_set_multicast_list
        net: netem: correct the parent's backlog when corrupted packet was dropped
        net: netem: fix error path for corrupted GSO frames
        macb: propagate errors when getting optional clocks
        xen/netback: fix error path of xenvif_connect_data()
        net: hns3: fix mis-counting IRQ vector numbers issue
        net: usb: lan78xx: Connect PHY before registering MAC
        vsock/virtio: discard packets if credit is not respected
        vsock/virtio: send a credit update when buffer size is changed
        mlxsw: spectrum_trap: Push Ethernet header before reporting trap
        net: ensure correct skb->tstamp in various fragmenters
        net: bcmgenet: reset 40nm EPHY on energy detect
        net: bcmgenet: soft reset 40nm EPHYs before MAC init
        net: phy: bcm7xxx: define soft_reset for 40nm EPHY
        net: bcmgenet: don't set phydev->link from MAC
        net: Update address for MediaTek ethernet driver in MAINTAINERS
        ipv4: fix race condition between route lookup and invalidation
        ...
    • net: reorder 'struct net' fields to avoid false sharing · 2a06b898
      Eric Dumazet authored
      Intel test robot reported a ~7% regression on TCP_CRR tests
      that they bisected to the cited commit.
      
      Indeed, every time a new TCP socket is created or deleted,
      the atomic counter net->count is touched (via get_net(net)
      and put_net(net) calls)
      
      So cpus might have to reload a contended cache line in
      net_hash_mix(net) calls.
      
      We need to reorder 'struct net' fields to move @hash_mix
      in a read mostly cache line.
      
      We move in the first cache line fields that can be
      dirtied often.
      
      We probably will have to address in a followup patch
      the __randomize_layout that was added in linux-4.13,
      since this might break our placement choices.
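
The layout idea can be sketched with standard C11 alignment. This is a heavily reduced, hypothetical stand-in for the real structure: only `count` and `hash_mix` are named in the text above, while `struct net_sketch`, the `other_hot` field and the 64-byte line size are assumptions.

```c
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE 64  /* assumed cache line size */

/* Hypothetical, much reduced stand-in for 'struct net'. */
struct net_sketch {
    /* First cache line: fields that are dirtied often. */
    int count;      /* touched on every socket create and delete */
    int other_hot;  /* placeholder for other frequently written fields */

    /* Read-mostly field, forced onto the start of another cache line. */
    alignas(CACHE_LINE) unsigned int hash_mix;
};

/* Index of the cache line that a given struct offset falls into. */
static size_t cache_line_of(size_t offset)
{
    return offset / CACHE_LINE;
}
```

Readers of `hash_mix` then no longer share a line with writers of `count`. A randomized struct layout could undo this placement, which is the concern raised in the last paragraph above.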
      
      Fixes: 355b9855 ("netns: provide pure entropy for net_hash_mix()")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: kernel test robot <oliver.sang@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: fix switch tree list · 50c7d2ba
      Vivien Didelot authored
      If there are multiple switch trees on the device, only the last one
      will be listed, because the arguments of list_add_tail are swapped.
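
The effect of swapping the two arguments can be reproduced with a minimal circular doubly linked list. The implementation below is a user-space imitation written for this example; `struct tree`, `add_trees()` and `list_count()` are hypothetical names, and the argument order `(new, head)` follows the kernel list API convention.

```c
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

static void init_list_head(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* Insert new_entry just before head, i.e. at the tail of head's list. */
static void list_add_tail(struct list_head *new_entry, struct list_head *head)
{
    struct list_head *prev = head->prev;

    head->prev = new_entry;
    new_entry->next = head;
    new_entry->prev = prev;
    prev->next = new_entry;
}

/* Number of entries reachable from head. */
static size_t list_count(const struct list_head *head)
{
    size_t n = 0;

    for (const struct list_head *p = head->next; p != head; p = p->next)
        n++;
    return n;
}

/* Hypothetical stand-in for a switch tree with its list node. */
struct tree { int index; struct list_head list; };

/* Add n trees to tree_list, optionally with the arguments swapped. */
static size_t add_trees(struct list_head *tree_list, struct tree *trees,
                        size_t n, int swapped)
{
    init_list_head(tree_list);
    for (size_t i = 0; i < n; i++) {
        init_list_head(&trees[i].list);
        if (swapped)
            list_add_tail(tree_list, &trees[i].list);  /* buggy order */
        else
            list_add_tail(&trees[i].list, tree_list);  /* (new, head) */
    }
    return list_count(tree_list);
}
```

With the swapped order, each call re-links the global head into the freshly initialised node's one-element list, so earlier trees become unreachable from the head and only the last one is found when walking the list.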
      
      Fixes: 83c0afae ("net: dsa: Add new binding implementation")
      Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ethernet: dwmac-sun8i: show message only when switching to promisc · 05908d72
      Mans Rullgard authored
      Printing the info message every time more than the max number of mac
      addresses are requested generates unnecessary log spam.  Showing it only
      when the hw is not already in promiscuous mode is equally informative
      without being annoying.

      Signed-off-by: Mans Rullgard <mans@mansr.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: aquantia: add an error handling in aq_nic_set_multicast_list · 3d00cf2f
      Chenwandun authored
      Add error handling to aq_nic_set_multicast_list(): the function may not
      work correctly when hw_multicast_list_set fails. This also removes
      a gcc -Wunused-but-set-variable warning.

      Signed-off-by: Chenwandun <chenwandun@huawei.com>
      Reviewed-by: Igor Russkikh <igor.russkikh@aquantia.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'netem-fix-further-issues-with-packet-corruption' · 70873837
      David S. Miller authored
      Jakub Kicinski says:
      
      ====================
      net: netem: fix further issues with packet corruption
      
      This set is fixing two more issues with the netem packet corruption.
      
      First patch (which was previously posted) avoids NULL pointer dereference
      if the first frame gets freed due to allocation or checksum failure.
      v2 improves the clarity of the code a little as requested by Cong.
      
      Second patch ensures we don't return SUCCESS if the frame was in fact
      dropped. Thanks to this, the commit message for patch 1 no longer needs the
      "this will still break with a single-frame failure" disclaimer.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: netem: correct the parent's backlog when corrupted packet was dropped · e0ad032e
      Jakub Kicinski authored
      If packet corruption failed we jump to finish_segs and return
      NET_XMIT_SUCCESS. Seeing success will make the parent qdisc
      increment its backlog, that's incorrect - we need to return
      NET_XMIT_DROP.
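
A toy model of the accounting problem, assuming a parent that counts a packet only when the child reports success. The enum values and the names `qdisc_sketch`, `child_enqueue()` and `parent_enqueue()` are hypothetical stand-ins and do not reproduce the kernel's qdisc interfaces.

```c
/* Hypothetical stand-ins for NET_XMIT_SUCCESS and NET_XMIT_DROP. */
enum { XMIT_SUCCESS = 0, XMIT_DROP = 1 };

struct qdisc_sketch { unsigned int backlog_pkts; };

/*
 * Child enqueue. If corruption failed, the packet has been freed;
 * report_drop selects between the fixed and the old return value.
 */
static int child_enqueue(int corruption_failed, int report_drop)
{
    if (corruption_failed)
        return report_drop ? XMIT_DROP : XMIT_SUCCESS;
    return XMIT_SUCCESS;
}

/* The parent accounts a packet only when the child reports success. */
static void parent_enqueue(struct qdisc_sketch *parent,
                           int corruption_failed, int report_drop)
{
    if (child_enqueue(corruption_failed, report_drop) == XMIT_SUCCESS)
        parent->backlog_pkts++;
}
```

With the old return value the parent ends up counting a packet that no longer exists, so its backlog never drains back to zero.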
      
      Fixes: 6071bd1a ("netem: Segment GSO packets on enqueue")
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: netem: fix error path for corrupted GSO frames · a7fa12d1
      Jakub Kicinski authored
      To corrupt a GSO frame we first perform segmentation.  We then
      proceed using the first segment instead of the full GSO skb and
      requeue the rest of the segments as separate packets.
      
      If there are any issues with processing the first segment we
      still want to process the rest, therefore we jump to the
      finish_segs label.
      
      Commit 177b8007 ("net: netem: fix backlog accounting for
      corrupted GSO frames") started using the pointer to the first
      segment in the "rest of segments processing", but as mentioned
      above the first segment may have already been freed at this point.
      
      Backlog corrections for parent qdiscs have to be adjusted.
      
      Fixes: 177b8007 ("net: netem: fix backlog accounting for corrupted GSO frames")
      Reported-by: kbuild test robot <lkp@intel.com>
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Reported-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • macb: propagate errors when getting optional clocks · bd310aca
      Michael Tretter authored
      The tx_clk, rx_clk, and tsu_clk are optional. Currently the macb driver
      marks clock as not available if it receives an error when trying to get
      a clock. This is wrong, because a clock controller might return
      -EPROBE_DEFER if a clock is not available, but will eventually become
      available.
      
      In these cases, the driver would probe successfully but will never be
      able to adjust the clocks, because the clocks were not available during
      probe, but became available later.
      
      For example, the clock controller for the ZynqMP is implemented in the
      PMU firmware and the clocks are only available after the firmware driver
      has been probed.
      
      Use devm_clk_get_optional() instead of devm_clk_get() to get the
      optional clock and propagate all errors to the calling function.
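
The difference between swallowing every error and treating only a missing clock as optional can be sketched as follows. This is a user-space model, not the clk API: `clk_lookup()`, `get_clk_swallow_errors()`, `get_clk_optional()` and `SKETCH_EPROBE_DEFER` are hypothetical, and the assumption that "no such clock" is reported as `-ENOENT` is part of the model.

```c
#include <errno.h>
#include <stddef.h>

/* Stand-in for the kernel-internal EPROBE_DEFER error code. */
#define SKETCH_EPROBE_DEFER 517

struct clk_sketch { unsigned long rate; };

static struct clk_sketch the_clk = { 125000000UL };

/* Toy lookup: err == 0 yields a clock, otherwise -err is returned. */
static int clk_lookup(int err, struct clk_sketch **out)
{
    *out = NULL;
    if (err)
        return -err;
    *out = &the_clk;
    return 0;
}

/* Old behaviour: any error is treated as "clock not available". */
static int get_clk_swallow_errors(int err, struct clk_sketch **out)
{
    if (clk_lookup(err, out) != 0)
        *out = NULL;
    return 0;
}

/* Optional semantics: only a missing clock maps to NULL. */
static int get_clk_optional(int err, struct clk_sketch **out)
{
    int ret = clk_lookup(err, out);

    if (ret == -ENOENT)
        return 0;  /* clock is genuinely absent, *out stays NULL */
    return ret;    /* 0 on success, or an error such as a probe deferral */
}
```

A caller that propagates the deferral can be probed again later, once the clock provider has become available, instead of running forever without the clock.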
      Signed-off-by: Michael Tretter <m.tretter@pengutronix.de>
      Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
      Tested-by: Nicolas Ferre <nicolas.ferre@microchip.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • xen/netback: fix error path of xenvif_connect_data() · 3d5c1a03
      Juergen Gross authored
      xenvif_connect_data() calls module_put() in case of error. This is
      wrong as there is no related module_get().
      
      Remove the superfluous module_put().
      
      Fixes: 279f438e ("xen-netback: Don't destroy the netdev until the vif is shut down")
      Cc: <stable@vger.kernel.org> # 3.12
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Reviewed-by: Paul Durrant <paul@xen.org>
      Reviewed-by: Wei Liu <wei.liu@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hns3: fix mis-counting IRQ vector numbers issue · 580a05f9
      Yonglong Liu authored
      Currently, num_msi_left is meant to hold the number of NIC vectors,
      but if the PF supports RoCE, it holds the vector count of both
      NIC and RoCE, which is not expected.
      
      This may cause lost interrupts in some cases, because the
      NIC module uses vector resources that belong to RoCE.
      
      This patch adds a new variable, num_nic_msi, to store the number
      of NIC vectors, and adjusts the default TQP numbers and rss_size
      according to the value of num_nic_msi.
      
      Fixes: 46a3df9f ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support")
      Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
      Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'akpm' (patches from Andrew) · 998d7551
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "Rather a lot of fixes, almost all affecting mm/"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (26 commits)
        scripts/gdb: fix debugging modules on s390
        kernel/events/uprobes.c: only do FOLL_SPLIT_PMD for uprobe register
        mm/thp: allow dropping THP from page cache
        mm/vmscan.c: support removing arbitrary sized pages from mapping
        mm/thp: fix node page state in split_huge_page_to_list()
        proc/meminfo: fix output alignment
        mm/init-mm.c: include <linux/mman.h> for vm_committed_as_batch
        mm/filemap.c: include <linux/ramfs.h> for generic_file_vm_ops definition
        mm: include <linux/huge_mm.h> for is_vma_temporary_stack
        zram: fix race between backing_dev_show and backing_dev_store
        mm/memcontrol: update lruvec counters in mem_cgroup_move_account
        ocfs2: fix panic due to ocfs2_wq is null
        hugetlbfs: don't access uninitialized memmaps in pfn_range_valid_gigantic()
        mm: memblock: do not enforce current limit for memblock_phys* family
        mm: memcg: get number of pages on the LRU list in memcgroup base on lru_zone_size
        mm/gup: fix a misnamed "write" argument, and a related bug
        mm/gup_benchmark: add a missing "w" to getopt string
        ocfs2: fix error handling in ocfs2_setattr()
        mm: memcg/slab: fix panic in __free_slab() caused by premature memcg pointer release
        mm/memunmap: don't access uninitialized memmap in memunmap_pages()
        ...
    • scripts/gdb: fix debugging modules on s390 · 585d730d
      Ilya Leoshkevich authored
      Currently lx-symbols assumes that module text is always located at
      module->core_layout->base, but s390 uses the following layout:
      
        +------+  <- module->core_layout->base
        | GOT  |
        +------+  <- module->core_layout->base + module->arch->plt_offset
        | PLT  |
        +------+  <- module->core_layout->base + module->arch->plt_offset +
        | TEXT |     module->arch->plt_size
        +------+
      
      Therefore, when trying to debug modules on s390, all the symbol
      addresses are skewed by plt_offset + plt_size.
      
      Fix by adding plt_offset + plt_size to module_addr in
      load_module_symbols().
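
The address arithmetic implied by the layout diagram is small enough to write out. The actual fix lives in the gdb helper scripts; the C below only restates the calculation, and `struct module_layout_sketch` and `s390_text_addr()` are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical mirror of the fields named in the layout diagram. */
struct module_layout_sketch {
    uint64_t base;        /* module->core_layout->base */
    uint64_t plt_offset;  /* module->arch->plt_offset */
    uint64_t plt_size;    /* module->arch->plt_size */
};

/* Start of the module text when GOT and PLT precede it. */
static uint64_t s390_text_addr(const struct module_layout_sketch *m)
{
    return m->base + m->plt_offset + m->plt_size;
}
```

Using `base` alone would place every symbol too low by `plt_offset + plt_size`, which is the skew described above.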
      
      Link: http://lkml.kernel.org/r/20191017085917.81791-1-iii@linux.ibm.com
      Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
      Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>
      Cc: Kieran Bingham <kbingham@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kernel/events/uprobes.c: only do FOLL_SPLIT_PMD for uprobe register · aa5de305
      Song Liu authored
      Attaching uprobe to text section in THP splits the PMD mapped page table
      into PTE mapped entries.  On uprobe detach, we would like to regroup PMD
      mapped page table entry to regain performance benefit of THP.
      
      However, the regroup is broken for perf_event based trace_uprobe.  This
      is because perf_event based trace_uprobe calls uprobe_unregister twice
      on close: first in TRACE_REG_PERF_CLOSE, then in
      TRACE_REG_PERF_UNREGISTER.  The second call will split the PMD mapped
      page table entry, which is not the desired behavior.
      
      Fix this by only using FOLL_SPLIT_PMD for the uprobe register case.
      
      Add a WARN() to confirm that uprobe unregister never operates on huge
      pages, and abort the operation when this WARN() triggers.
      
      Link: http://lkml.kernel.org/r/20191017164223.2762148-6-songliubraving@fb.com
      Fixes: 5a52c9df ("uprobe: use FOLL_SPLIT_PMD instead of FOLL_SPLIT")
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: William Kucharski <william.kucharski@oracle.com>
      Cc: Yang Shi <yang.shi@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>