1. 21 Jan, 2009 7 commits
  2. 16 Jan, 2009 2 commits
    • Btrfs: fix ioctl arg size (userland incompatible change!) · c071fcfd
      Chris Mason authored
      The structure used to send device in btrfs ioctl calls was not
      properly aligned, and so 32 bit ioctls would not work properly on
      64 bit kernels.
      
      We could fix this with compat ioctls, but we're just one byte away,
      and it doesn't make sense to carry the compat ioctls around
      forever at this stage in the project.
      
      This patch brings the ioctl arg up to an evenly aligned 4k.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
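The alignment problem described in this commit can be illustrated with a minimal sketch. This is not the Btrfs header: the struct names and the array sizes below are hypothetical, chosen only to show how an argument structure that ends one byte past a boundary gets different tail padding depending on how the ABI aligns a 64-bit member, while a structure sized to an even 4k does not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes, chosen only for illustration. */
#define DEMO_NAME_ODD   3073   /* one byte past a multiple of 8 */
#define DEMO_NAME_EVEN  4088   /* 8 + 4088 == 4096 */

/*
 * Layout whose total size depends on the alignment of int64_t:
 * 8 + 3073 = 3081 bytes of payload, rounded up to the struct alignment
 * (3084 where int64_t aligns to 4, 3088 where it aligns to 8).
 */
struct demo_args_odd {
    int64_t fd;
    char name[DEMO_NAME_ODD];
};

/* Layout with no tail padding under either alignment. */
struct demo_args_even {
    int64_t fd;
    char name[DEMO_NAME_EVEN];
};

/* The even layout is exactly 4k regardless of the ABI. */
static_assert(sizeof(struct demo_args_even) == 4096,
              "even layout must be exactly 4096 bytes");

/* Round n up to a multiple of align (align must be a power of two). */
static size_t demo_round_up(size_t n, size_t align)
{
    return (n + align - 1) & ~(align - 1);
}

/* Size the odd layout would have for a given int64_t alignment. */
static size_t demo_odd_size_for_align(size_t align)
{
    return demo_round_up(sizeof(int64_t) + DEMO_NAME_ODD, align);
}
```

Calling demo_odd_size_for_align() with 4 and with 8 gives two different sizes for the same declaration, which is the kind of mismatch a 32 bit caller and a 64 bit kernel would disagree on.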
    • Btrfs: Clear the device->running_pending flag before bailing on congestion · 1d9e2ae9
      Chris Mason authored
      Btrfs maintains a queue of async bio submissions so the checksumming
      threads don't have to wait on get_request_wait.  In order to avoid
      extra wakeups, this code has a running_pending flag that is used
      to tell new submissions they don't need to wake the thread.
      
      When the threads notice congestion on a single device, they
      may decide to requeue the job and move on to other devices.  This
      patch makes sure the running_pending flag is cleared before the
      job is requeued.
      
      It should help avoid IO stalls by making sure the task is woken up
      when new submissions come in.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
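The flag handling described in this commit can be sketched with a small single-threaded model. Everything here is hypothetical (the struct, its fields other than running_pending, and the function names are illustrative, not the Btrfs code); it only shows why the flag has to be cleared before the worker bails out on congestion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, single-threaded model of one device's submit queue. */
struct demo_device {
    int  pending;          /* queued bios waiting to be submitted */
    bool running_pending;  /* worker claims it is draining the queue */
    bool congested;        /* device currently reports congestion */
    int  wakeups;          /* how many times the worker was woken */
};

/* Submitter side: queue one bio and wake the worker only if it is
 * not already marked as running. */
static void demo_queue_bio(struct demo_device *dev)
{
    assert(dev != NULL);
    dev->pending++;
    if (!dev->running_pending)
        dev->wakeups++;
}

/* Worker side: drain the queue, or bail out on congestion.
 * Returns true if the job has to be requeued. */
static bool demo_run_pending(struct demo_device *dev)
{
    assert(dev != NULL);
    dev->running_pending = true;
    while (dev->pending > 0) {
        if (dev->congested) {
            /* Clear the flag before bailing so that later
             * submissions wake the worker again. */
            dev->running_pending = false;
            return true;
        }
        dev->pending--;
    }
    dev->running_pending = false;
    return false;
}
```

If the congestion branch returned without clearing running_pending, demo_queue_bio() would skip the wakeup for every later submission, which models the stall the commit message describes.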
  3. 09 Jan, 2009 1 commit
    • Btrfs: explicitly mark the tree log root for writeback · e293e97e
      Chris Mason authored
      Each subvolume has an extent_state_tree used to mark metadata
      that needs to be sent to disk while syncing the tree.  This is
      used in addition to the dirty bits on the pages themselves so that
      a single subvolume can be sent to disk efficiently in disk order.
      
      Normally this marking happens in btrfs_alloc_free_block, which also does
      special recording of dirty tree blocks for the tree log roots.
      
      Yan Zheng noticed that when the root of the log tree is allocated, it is added
      to the wrong writeback list.  The fix used here is to explicitly set
      it dirty as part of tree log creation.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  4. 08 Jan, 2009 1 commit
  5. 07 Jan, 2009 2 commits
  6. 06 Jan, 2009 8 commits
  7. 05 Jan, 2009 9 commits
  8. 24 Dec, 2008 8 commits
  9. 23 Dec, 2008 2 commits
    • edac: fix edac core deadlock when removing a device · d519c8d9
      Harry Ciao authored
      When deleting an edac device, we have to wait for its edac_dev.work to
      complete before deleting the whole edac_dev structure.  Since we have no
      idea which work item in the edac_poller's workqueue is the one we are
      concerned about, we wait for all work in the edac_poller's workqueue to be
      processed.  This is done via flush_cpu_workqueue(), which inserts a
      wq_barrier at the tail of the workqueue and then sleeps on the
      completion of this wq_barrier.  The edac_poller wakes up the sleepers
      when it reaches the barrier.
      
      EDAC core creates only one kernel worker thread, edac_poller, to run the
      work of all current edac devices.  They share the same callback function,
      edac_device_workq_function(), which grabs device_ctls_mutex before it
      checks the device.  This is exactly where edac_poller and rmmod have a
      good chance of deadlocking.
      
      In the call trace rmmod > ... >
      edac_device_del_device >
      edac_device_workq_teardown > flush_workqueue > flush_cpu_workqueue,
      
      device_ctls_mutex has already been grabbed by
      edac_device_del_device().  So, on one hand, rmmod sleeps on the
      completion of a wq_barrier while holding device_ctls_mutex; on the other
      hand, edac_poller blocks on the same mutex when it runs the work of any
      existing edac device (note that this edac_dev.work is likely to be
      totally unrelated to the one being removed) and never gets a chance to
      run the wq_barrier work that would wake rmmod up.
      
      edac_device_workq_teardown() should not be called within the critical
      region of device_ctls_mutex.  This matches edac_pci_del_device()
      and edac_mc_del_mc(), where edac_pci_workq_teardown() and
      edac_mc_workq_teardown() are called after the related mutexes are released.
      
      Moreover, an edac_dev.work should first check whether its device is
      being removed.  If so, it should bail out immediately.  Since not all
      existing edac devices are being removed, this "shutting down" flag should
      be confined to the edac device being removed.  The existing
      edac_dev.op_state can serve this purpose.
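
The two changes argued for above (tearing down the work after the mutex is released, and having the work callback bail out when its device is going away) can be sketched with a small single-threaded model. All names here are hypothetical stand-ins, not the EDAC core code: the lock is a plain flag rather than a kernel mutex, and the flush reports a would-be deadlock by returning false instead of hanging.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-device state; OFFLINE marks a device being removed. */
enum demo_op_state { DEMO_OP_RUNNING, DEMO_OP_OFFLINE };

struct demo_edac_dev {
    enum demo_op_state op_state;
    int polls;                    /* times the poll callback did work */
};

/* Stand-in for the shared controls mutex. */
static struct { bool held; } demo_ctls_lock;

static void demo_lock(void)   { assert(!demo_ctls_lock.held); demo_ctls_lock.held = true; }
static void demo_unlock(void) { assert(demo_ctls_lock.held);  demo_ctls_lock.held = false; }

/* Work callback shared by all devices: takes the lock, then bails out
 * immediately if this particular device is being removed. */
static bool demo_workq_function(struct demo_edac_dev *dev)
{
    bool polled = false;

    demo_lock();
    if (dev->op_state != DEMO_OP_OFFLINE) {
        dev->polls++;
        polled = true;
    }
    demo_unlock();
    return polled;
}

/* Model of flushing the queue: the pending work needs the lock, so a
 * flush issued while the caller still holds it could never finish.
 * Returns false in that case instead of hanging. */
static bool demo_flush(struct demo_edac_dev *pending)
{
    if (demo_ctls_lock.held)
        return false;
    demo_workq_function(pending);
    return true;
}

/* Removal: mark the device offline under the lock, release the lock,
 * and only then flush the (possibly unrelated) pending work. */
static bool demo_del_device(struct demo_edac_dev *dev,
                            struct demo_edac_dev *other_pending)
{
    demo_lock();
    dev->op_state = DEMO_OP_OFFLINE;
    demo_unlock();
    return demo_flush(other_pending);
}
```

In this model, calling demo_flush() between demo_lock() and demo_unlock() returns false, which corresponds to the rmmod hang; demo_del_device() avoids that by flushing only after the lock is dropped.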
      
      The original deadlock and the fix have been observed and
      tested on actual hardware.  Without the fix, removing an edac driver
      with rmmod results in the deadlock below:
      
      root@localhost:/root> rmmod mv64x60_edac
      EDAC DEBUG: mv64x60_dma_err_remove()
      EDAC DEBUG: edac_device_del_device()
      EDAC DEBUG: find_edac_device_by_dev()
      
      (hang for a moment)
      
      INFO: task edac-poller:2030 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      edac-poller   D 00000000     0  2030      2
      Call Trace:
      [df159dc0] [c0071e3c] free_hot_cold_page+0x17c/0x304 (unreliable)
      [df159e80] [c000a024] __switch_to+0x6c/0xa0
      [df159ea0] [c03587d8] schedule+0x2f4/0x4d8
      [df159f00] [c03598a8] __mutex_lock_slowpath+0xa0/0x174
      [df159f40] [e1030434] edac_device_workq_function+0x28/0xd8 [edac_core]
      [df159f60] [c003beb4] run_workqueue+0x114/0x218
      [df159f90] [c003c674] worker_thread+0x5c/0xc8
      [df159fd0] [c004106c] kthread+0x5c/0xa0
      [df159ff0] [c0013538] original_kernel_thread+0x44/0x60
      INFO: task rmmod:2062 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      rmmod         D 0ff2c9fc     0  2062   1839
      Call Trace:
      [df119c00] [c0437a74] 0xc0437a74 (unreliable)
      [df119cc0] [c000a024] __switch_to+0x6c/0xa0
      [df119ce0] [c03587d8] schedule+0x2f4/0x4d8
      [df119d40] [c03591dc] schedule_timeout+0xb0/0xf4
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • cgroups: avoid accessing uninitialized data in failure path · 20ca9b3f
      Li Zefan authored
      If cgroup_get_rootdir() fails, free_cg_links() is called in the
      failure path, but tmp_cg_links has not been initialized at that point.
      
      I introduced this bug in the 2.6.27 merge window.
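
The general pattern behind this kind of fix can be sketched as follows: initialize the temporary list head before any step that can jump to the shared cleanup code. This is a minimal illustration under stated assumptions, not the cgroup code; the list type and every demo_ name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal circular doubly linked list. */
struct demo_node {
    struct demo_node *next, *prev;
};

static void demo_list_init(struct demo_node *head)
{
    head->next = head;
    head->prev = head;
}

/* Unlink every entry and return how many were removed.
 * Only valid on a head that has been initialized. */
static int demo_free_links(struct demo_node *head)
{
    int n = 0;

    assert(head->next != NULL);
    while (head->next != head) {
        struct demo_node *e = head->next;

        head->next = e->next;
        e->next->prev = head;
        e->next = e->prev = NULL;
        n++;
    }
    return n;
}

/* Setup routine with a single cleanup label. The list head is
 * initialized before the first step that can fail, so the cleanup
 * path never walks uninitialized pointers. */
static int demo_setup(int fail_early, int *freed)
{
    struct demo_node tmp_links;
    int ret = 0;

    demo_list_init(&tmp_links);   /* must precede any goto out */

    if (fail_early) {
        ret = -1;
        goto out;
    }
    /* Later steps would add entries to tmp_links here. */
out:
    *freed = demo_free_links(&tmp_links);
    return ret;
}
```

Moving demo_list_init() below the early failure check would reproduce the class of bug described here: the cleanup code would walk whatever happened to be on the stack.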
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Acked-by: Serge Hallyn <serue@us.ibm.com>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>