1. 14 Mar, 2022 3 commits
    • btrfs: don't log unnecessary boundary keys when logging directory · a450a4af
      Filipe Manana authored
      Before we start to log dir index keys from a leaf, we check if there is a
      previous index key, which normally is at the end of a leaf that was not
      changed in the current transaction. Then we log that key and set the start
      of logged range (item of type BTRFS_DIR_LOG_INDEX_KEY) to the offset of
      that key. This is to ensure that if there were deleted index keys between
      that key and the first key we are going to log, those deletions are
      replayed in case we need to replay the log after a power failure.
      However, we really don't need to log that previous key: we can just set the
      start of the logged range to that key's offset plus 1. This achieves the
      same and avoids logging one dir index key.
      
      The same logic is performed when we finish logging the index keys of a
      leaf and we find that the next leaf has index keys and was not changed in
      the current transaction. We currently log the first key of that next leaf
      and use its offset as the end of the range we log. This is just to ensure
      that if there were deleted index keys between the last index key we logged
      and the first key of that next leaf, those index keys are deleted if we end
      up replaying the log. However, that is not necessary: we can avoid logging
      that first index key of the next leaf and instead set the end of the
      logged range to match the offset of that index key minus 1.
      
      So avoid logging those index keys at the boundaries and adjust the start
      and end offsets of the logged ranges as described above.
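
The boundary adjustment can be illustrated with a small user-space model. This is only a sketch in Python, not the kernel code: `logged_range` and `replay` are hypothetical helpers, index key offsets are modeled as plain integers, and both neighbouring leaves are assumed to exist and to be unchanged.

```python
def logged_range(prev_offset, next_offset):
    """Range strictly between two boundary keys that are not logged.

    prev_offset: offset of the last index key of the previous, unchanged leaf.
    next_offset: offset of the first index key of the next, unchanged leaf.
    """
    return prev_offset + 1, next_offset - 1


def replay(committed, logged, start, end):
    """Toy replay: inside [start, end] the log is authoritative, so any
    committed key in that range that is missing from the log is dropped."""
    outside = {off for off in committed if off < start or off > end}
    return sorted(outside | set(logged))


# Keys 4 and 6 were deleted in the current transaction; 3 and 10 sit in
# unchanged leaves and are never copied into the log.
start, end = logged_range(prev_offset=3, next_offset=10)
print(start, end)
print(replay([2, 3, 4, 5, 6, 7, 10], logged=[5, 7], start=start, end=end))
```

In this model the range is 4 to 9, and the replay yields `[2, 3, 5, 7, 10]`: the deletions of 4 and 6 are still applied, while the boundary keys 3 and 10 survive without having been logged.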
      
      This patch is part of a patchset comprised of the following patches:
      
        1/4 btrfs: don't log unnecessary boundary keys when logging directory
        2/4 btrfs: put initial index value of a directory in a constant
        3/4 btrfs: stop copying old dir items when logging a directory
        4/4 btrfs: stop trying to log subdirectories created in past transactions
      
      Performance test results are listed in the changelog of patch 3/4.
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: reuse existing pointers from btrfs_ioctl · dc408ccd
      Sahil Kang authored
      btrfs_ioctl already contains pointers to the inode and btrfs_root
      structs, so we can pass them into the subfunctions instead of the
      toplevel struct file.
      Signed-off-by: Sahil Kang <sahil.kang@asilaycomputing.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: remove write and wait of struct walk_control · c816d705
      Filipe Manana authored
      The ->write and ->wait fields of struct walk_control, used for log trees,
      have not been used since 2008, more specifically since commit d0c803c4
      ("Btrfs: Record dirty pages tree-log pages in an extent_io tree"). So just
      remove them, along with the function btrfs_write_tree_block(), which is
      also no longer used after removing the ->write member.
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  2. 13 Mar, 2022 2 commits
  3. 12 Mar, 2022 8 commits
  4. 11 Mar, 2022 26 commits
  5. 10 Mar, 2022 1 commit
    • ice: Fix race condition during interface enslave · 5cb1ebdb
      Ivan Vecera authored
      Commit 5dbbbd01 ("ice: Avoid RTNL lock when re-creating
      auxiliary device") changed the process of re-creating the aux device
      so that ice_plug_aux_dev() is called from ice_service_task() context.
      This unfortunately opens a race window that can result in a deadlock
      when an interface has left a LAG and immediately enters a LAG again.
      
      Reproducer:
      ```
      #!/bin/bash
      # bash is needed for the {1..10} brace expansion used below
      
      ip link add lag0 type bond mode 1 miimon 100
      ip link set lag0
      
      for n in {1..10}; do
              echo Cycle: $n
              ip link set ens7f0 master lag0
              sleep 1
              ip link set ens7f0 nomaster
      done
      ```
      
      This results in:
      [20976.208697] Workqueue: ice ice_service_task [ice]
      [20976.213422] Call Trace:
      [20976.215871]  __schedule+0x2d1/0x830
      [20976.219364]  schedule+0x35/0xa0
      [20976.222510]  schedule_preempt_disabled+0xa/0x10
      [20976.227043]  __mutex_lock.isra.7+0x310/0x420
      [20976.235071]  enum_all_gids_of_dev_cb+0x1c/0x100 [ib_core]
      [20976.251215]  ib_enum_roce_netdev+0xa4/0xe0 [ib_core]
      [20976.256192]  ib_cache_setup_one+0x33/0xa0 [ib_core]
      [20976.261079]  ib_register_device+0x40d/0x580 [ib_core]
      [20976.266139]  irdma_ib_register_device+0x129/0x250 [irdma]
      [20976.281409]  irdma_probe+0x2c1/0x360 [irdma]
      [20976.285691]  auxiliary_bus_probe+0x45/0x70
      [20976.289790]  really_probe+0x1f2/0x480
      [20976.298509]  driver_probe_device+0x49/0xc0
      [20976.302609]  bus_for_each_drv+0x79/0xc0
      [20976.306448]  __device_attach+0xdc/0x160
      [20976.310286]  bus_probe_device+0x9d/0xb0
      [20976.314128]  device_add+0x43c/0x890
      [20976.321287]  __auxiliary_device_add+0x43/0x60
      [20976.325644]  ice_plug_aux_dev+0xb2/0x100 [ice]
      [20976.330109]  ice_service_task+0xd0c/0xed0 [ice]
      [20976.342591]  process_one_work+0x1a7/0x360
      [20976.350536]  worker_thread+0x30/0x390
      [20976.358128]  kthread+0x10a/0x120
      [20976.365547]  ret_from_fork+0x1f/0x40
      ...
      [20976.438030] task:ip              state:D stack:    0 pid:213658 ppid:213627 flags:0x00004084
      [20976.446469] Call Trace:
      [20976.448921]  __schedule+0x2d1/0x830
      [20976.452414]  schedule+0x35/0xa0
      [20976.455559]  schedule_preempt_disabled+0xa/0x10
      [20976.460090]  __mutex_lock.isra.7+0x310/0x420
      [20976.464364]  device_del+0x36/0x3c0
      [20976.467772]  ice_unplug_aux_dev+0x1a/0x40 [ice]
      [20976.472313]  ice_lag_event_handler+0x2a2/0x520 [ice]
      [20976.477288]  notifier_call_chain+0x47/0x70
      [20976.481386]  __netdev_upper_dev_link+0x18b/0x280
      [20976.489845]  bond_enslave+0xe05/0x1790 [bonding]
      [20976.494475]  do_setlink+0x336/0xf50
      [20976.502517]  __rtnl_newlink+0x529/0x8b0
      [20976.543441]  rtnl_newlink+0x43/0x60
      [20976.546934]  rtnetlink_rcv_msg+0x2b1/0x360
      [20976.559238]  netlink_rcv_skb+0x4c/0x120
      [20976.563079]  netlink_unicast+0x196/0x230
      [20976.567005]  netlink_sendmsg+0x204/0x3d0
      [20976.570930]  sock_sendmsg+0x4c/0x50
      [20976.574423]  ____sys_sendmsg+0x1eb/0x250
      [20976.586807]  ___sys_sendmsg+0x7c/0xc0
      [20976.606353]  __sys_sendmsg+0x57/0xa0
      [20976.609930]  do_syscall_64+0x5b/0x1a0
      [20976.613598]  entry_SYSCALL_64_after_hwframe+0x65/0xca
      
      1. The command 'ip link ... set nomaster' causes ice_plug_aux_dev() to be
         called from ice_service_task() context; the aux device is created
         and the associated device->lock is taken.
      2. The command 'ip link ... set master...' calls ice's notifier under
         the RTNL lock, and that notifier calls ice_unplug_aux_dev(). That
         function tries to take the aux device->lock, but it is already held
         by ice_plug_aux_dev() from step 1.
      3. Later, ice_plug_aux_dev() tries to take the RTNL lock, but it is
         already held by the task from step 2.
      4. Deadlock.
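
The four steps above describe a classic lock-order inversion, which can be sketched as a cycle in a wait-for graph. The following Python model is an illustration only: `find_deadlock` is a hypothetical helper, and the task and lock names are plain strings taken from the description above, not kernel objects.

```python
def find_deadlock(holds, waits):
    """Return the list of tasks that wait on each other in a cycle, or None.

    holds: dict mapping a task name to the set of locks it holds.
    waits: dict mapping a task name to the single lock it is blocked on.
    """
    owner = {lock: task for task, locks in holds.items() for lock in locks}
    for start in waits:
        seen = []
        task = start
        # Follow "task waits for lock owned by another task" edges.
        while task in waits and task not in seen:
            seen.append(task)
            task = owner.get(waits[task])
        if task is not None and task in seen:
            return seen[seen.index(task):]
    return None


holds = {"ice_service_task": {"aux device lock"}, "ip": {"RTNL"}}
waits = {"ice_service_task": "RTNL", "ip": "aux device lock"}
print(find_deadlock(holds, waits))
```

With the state reached after step 3, the helper reports both tasks as part of the cycle. If `ice_service_task` were not waiting for RTNL, no cycle would exist and the `ip` task would simply block until the aux device lock was released.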
      
      The patch fixes this issue with the following changes:
      - The bit ICE_FLAG_PLUG_AUX_DEV is kept set during the ice_plug_aux_dev()
        call in ice_service_task().
      - The bit is checked in ice_clear_rdma_cap(), and ice_unplug_aux_dev() is
        called only if it is not set. If it is set (in other words,
        plugging of the aux device was requested and ice_plug_aux_dev() is
        potentially running), the function only clears the bit.
      - Once the ice_plug_aux_dev() call (in ice_service_task()) has finished,
        the bit ICE_FLAG_PLUG_AUX_DEV is cleared, but it is also checked
        whether it was already cleared by ice_clear_rdma_cap(). If so, the
        aux device is unplugged.
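
The flag handshake described in the list above can be modeled in a few lines. This is a single-threaded Python sketch under stated assumptions, not the driver code: `PfModel`, `clear_rdma_cap` and `service_task` are hypothetical stand-ins, the `during_plug` hook only simulates the notifier running while the plug is in progress, and the atomic bit operations of the real driver are replaced by plain attribute accesses.

```python
class PfModel:
    """Toy stand-in for the driver state touched by the fix."""

    def __init__(self):
        self.plug_requested = False  # models the ICE_FLAG_PLUG_AUX_DEV bit
        self.aux_plugged = False     # models whether the aux device exists


def clear_rdma_cap(pf):
    # Notifier side: never unplug while a plug is requested or running.
    if pf.plug_requested:
        pf.plug_requested = False    # only cancel the request
    else:
        pf.aux_plugged = False       # safe to unplug directly


def service_task(pf, during_plug=None):
    if not pf.plug_requested:
        return
    pf.aux_plugged = True            # plug; the bit stays set meanwhile
    if during_plug is not None:
        during_plug(pf)              # simulate a racing notifier call
    still_set = pf.plug_requested    # test-and-clear of the bit
    pf.plug_requested = False
    if not still_set:
        pf.aux_plugged = False       # request was cancelled: unplug now


pf = PfModel()
pf.plug_requested = True
service_task(pf, during_plug=clear_rdma_cap)
print(pf.aux_plugged)
```

In the racing case the notifier never touches the aux device itself; the service task notices the cleared bit afterwards and performs the unplug, so in this model the device ends up unplugged.
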
      Signed-off-by: Ivan Vecera <ivecera@redhat.com>
      Co-developed-by: Petr Oros <poros@redhat.com>
      Signed-off-by: Petr Oros <poros@redhat.com>
      Reviewed-by: Dave Ertman <david.m.ertman@intel.com>
      Link: https://lore.kernel.org/r/20220310171641.3863659-1-ivecera@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>