1. 12 Jul, 2010 1 commit
    • ocfs2: No need to zero pages past i_size. · 693c241a
      Joel Becker authored
      When ocfs2 fills a hole, it does so by allocating clusters.  When a
      cluster is larger than the write, ocfs2 must zero the portions of the
      cluster outside of the write.  If the clustersize is smaller than a
      pagecache page, this is handled by the normal pagecache mechanisms, but
      when the clustersize is larger than a page, ocfs2's write code will zero
      the pages adjacent to the write.  This makes sure the entire cluster is
      zeroed correctly.
      
      Currently ocfs2 behaves exactly the same when writing past i_size.
      However, this means ocfs2 is writing zeroed pages for portions of a new
      cluster that are beyond i_size.  The page writeback code isn't expecting
      this.  It treats all pages past the one containing i_size as left behind
      due to a previous truncate operation.
      
      Thankfully, ocfs2 calculates the number of pages it will be working on
      up front.  The rest of the write code merely honors the original
      calculation.  We can simply trim the number of pages to only cover the
      actual file data.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Cc: stable@kernel.org
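The trimming described in this commit can be illustrated with a small userspace sketch. It is not the kernel patch: `sketch_trim_pages`, its parameters, and the 4 KiB page size are assumptions made for illustration. The idea is to cap the page count for a newly allocated cluster at the page holding the last byte of real file data (the end of the write or i_size, whichever is greater).

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4 KiB pagecache pages */

/*
 * Hypothetical helper: how many pages of a new cluster to work on.
 * start_index is the index of the first page of the cluster.
 */
static uint64_t sketch_trim_pages(uint64_t start_index,
                                  uint64_t pages_per_cluster,
                                  uint64_t write_pos, uint64_t write_len,
                                  uint64_t i_size)
{
    uint64_t write_end = write_pos + write_len;
    /* Last byte of real data: end of the write or i_size, whichever is greater. */
    uint64_t last_byte = write_end > i_size ? write_end : i_size;
    uint64_t end_index;

    if (last_byte == 0)
        return 0;

    /* Index one past the last page that holds file data. */
    end_index = ((last_byte - 1) >> SKETCH_PAGE_SHIFT) + 1;
    if (end_index <= start_index)
        return 0;

    /* Trim the cluster-sized page count so it stops at end_index. */
    if (start_index + pages_per_cluster > end_index)
        return end_index - start_index;
    return pages_per_cluster;
}
```

With a 32K cluster (8 pages) and an empty file, a 100-byte write at offset 0 touches one page rather than eight.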
  2. 08 Jul, 2010 2 commits
    • ocfs2: Zero the tail cluster when extending past i_size. · 5693486b
      Joel Becker authored
      ocfs2's allocation unit is the cluster.  This can be larger than a block
      or even a memory page.  This means that a file may have many blocks in
      its last extent that are beyond the block containing i_size.  There also
      may be more unwritten extents after that.
      
      When ocfs2 grows a file, it zeros the entire cluster in order to ensure
      future i_size growth will see cleared blocks.  Unfortunately,
      block_write_full_page() drops the pages past i_size.  This means that
      ocfs2 is actually leaking garbage data into the tail end of that last
      cluster.  This is a bug.
      
      We adjust ocfs2_write_begin_nolock() and ocfs2_extend_file() to detect
      when a write or truncate is past i_size.  They will use
      ocfs2_zero_extend() to ensure the data is properly zeroed.
      
      Older versions of ocfs2_zero_extend() simply zeroed every block between
      i_size and the zeroing position.  This presumes three things:
      
      1) There is allocation for all of these blocks.
      2) The extents are not unwritten.
      3) The extents are not refcounted.
      
      (1) and (2) hold true for non-sparse filesystems, which used to be the
      only users of ocfs2_zero_extend().  (3) is another bug.
      
      Since we're now using ocfs2_zero_extend() for sparse filesystems as
      well, we teach ocfs2_zero_extend() to check every extent between
      i_size and the zeroing position.  If the extent is unwritten, it is
      ignored.  If it is refcounted, it is CoWed.  Then it is zeroed.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Cc: stable@kernel.org
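The per-extent decision described in this commit can be sketched as a small classifier. The types and names below (`sketch_extent`, `sketch_classify`) are hypothetical and do not mirror the kernel's extent records. Treating unallocated ranges (holes) the same as unwritten extents is also an assumption here, since neither needs zeroing.

```c
#include <stdbool.h>

/* Hypothetical outcome for one extent between i_size and the zeroing position. */
enum sketch_action {
    SKETCH_SKIP,          /* nothing on disk needs zeroing */
    SKETCH_ZERO,          /* zero the range in place */
    SKETCH_COW_THEN_ZERO  /* break sharing first, then zero */
};

/* Hypothetical, simplified view of an extent's state. */
struct sketch_extent {
    bool allocated;   /* false for a hole (assumption: holes are skipped) */
    bool unwritten;   /* allocated but never written; reads as zeros already */
    bool refcounted;  /* shared with another file, must be CoWed before a write */
};

static enum sketch_action sketch_classify(const struct sketch_extent *e)
{
    if (!e->allocated || e->unwritten)
        return SKETCH_SKIP;
    if (e->refcounted)
        return SKETCH_COW_THEN_ZERO;
    return SKETCH_ZERO;
}
```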
    • ocfs2: When zero extending, do it by page. · a4bfb4cf
      Joel Becker authored
      ocfs2_zero_extend() does its zeroing block by block, but it calls a
      function named ocfs2_write_zero_page().  Let's have
      ocfs2_write_zero_page() handle the page level.  From
      ocfs2_zero_extend()'s perspective, it is now page-at-a-time.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Cc: stable@kernel.org
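A page-at-a-time walk of the kind this commit describes might look like the following userspace sketch. The names `sketch_zero_by_page` and `sketch_zero_fn` and the 4 KiB page size are assumptions. The callback stands in for whatever zeroes one page's worth of bytes and is not the signature of ocfs2_write_zero_page().

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL /* assumption: 4 KiB pages */

/* Hypothetical callback: zero the byte range [start, end) within one page. */
typedef void (*sketch_zero_fn)(uint64_t start, uint64_t end, void *ctx);

/*
 * Walk [from, to) one page at a time, handing each page-bounded
 * sub-range to fn. Returns the number of pages visited.
 */
static unsigned sketch_zero_by_page(uint64_t from, uint64_t to,
                                    sketch_zero_fn fn, void *ctx)
{
    unsigned pages = 0;

    while (from < to) {
        /* Start of the next page, clamped to the end of the range. */
        uint64_t next = (from / SKETCH_PAGE_SIZE + 1) * SKETCH_PAGE_SIZE;

        if (next > to)
            next = to;
        if (fn)
            fn(from, next, ctx);
        from = next;
        pages++;
    }
    return pages;
}
```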
  3. 15 Jun, 2010 3 commits
    • ocfs2: Limit default local alloc size within bitmap range. · 1739da40
      Tao Ma authored
      In commit 6b82021b, we increased
      our local alloc size and calculate how many megabytes we can
      get according to group size and volume size.
      But we also need to check the maximum number of bits a local alloc
      block bitmap can hold. With bs=512 and cs=32K on a local volume of 160G,
      it calculates 96MB while the maximum local alloc size is only
      76M. So the bitmap will overflow and corrupt the system truncate
      log file. See bug
      http://oss.oracle.com/bugzilla/show_bug.cgi?id=1262
      Signed-off-by: Tao Ma <tao.ma@oracle.com>
      Acked-by: Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
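The arithmetic behind the 76M limit can be sketched as follows. The helper names are hypothetical. The 304-byte bitmap used in the usage note is an assumption, picked because 304 bytes x 8 bits x 32K per cluster gives exactly the 76M quoted above. The commit message does not state the bitmap size.

```c
#include <stdint.h>

/*
 * Hypothetical helper: largest local alloc window, in megabytes, that a
 * bitmap of bitmap_bytes can describe when each bit covers one cluster.
 */
static uint32_t sketch_max_la_megs(uint32_t bitmap_bytes, uint32_t cluster_bytes)
{
    uint64_t max_bits = (uint64_t)bitmap_bytes * 8;

    return (uint32_t)((max_bits * cluster_bytes) >> 20);
}

/* Clamp a computed default to what the bitmap can actually represent. */
static uint32_t sketch_clamp_la_megs(uint32_t wanted_megs,
                                     uint32_t bitmap_bytes,
                                     uint32_t cluster_bytes)
{
    uint32_t max_megs = sketch_max_la_megs(bitmap_bytes, cluster_bytes);

    return wanted_megs > max_megs ? max_megs : wanted_megs;
}
```

Under these assumptions, `sketch_clamp_la_megs(96, 304, 32768)` reduces the computed 96MB default to 76MB, while a smaller request is left unchanged.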
    • ocfs2: Move orphan scan work to ocfs2_wq. · 40f165f4
      Tao Ma authored
      We used to run orphan scan work in the default work queue,
      but there is a corner case that can deadlock the system.
      The scenario is like this:
      1. Set the heartbeat threshold to 200. This gives us a good
         chance of running an orphan scan before our quorum decision.
      2. Mount node 1.
      3. After 1~2 minutes, mount node 2 (to make the bug easier
         to reproduce, add maxcpus=1 to the kernel command line).
      4. Node 1 does orphan scan work.
      5. Node 2 does orphan scan work.
      6. Node 1 does orphan scan work. After this, node 1 holds the orphan
         scan lock while node 2 knows node 1 is the master.
      7. ifdown eth2 on node 2 (eth2 is the ocfs2 interconnect).
      
      Now when node 2 begins orphan scan, the system queue is blocked.
      
      The root cause is that both orphan scan work and quorum decision work
      use the system event work queue. Orphan scan can block the event
      work queue (in dlm_wait_for_node_death), so the quorum decision work
      never gets a chance to proceed.
      
      This patch resolves it by moving orphan scan work to ocfs2_wq.
      Signed-off-by: Tao Ma <tao.ma@oracle.com>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
    • fs/ocfs2/dlm: Add missing spin_unlock · 6469272c
      Julia Lawall authored
      Add a spin_unlock missing on the error path.  Unlock as in the other code
      that leads to the leave label.
      
      The semantic match that finds this problem is as follows:
      (http://coccinelle.lip6.fr/)
      
      // <smpl>
      @@
      expression E1;
      @@
      
      * spin_lock(E1,...);
        <+... when != E1
        if (...) {
          ... when != E1
      *   return ...;
        }
        ...+>
      * spin_unlock(E1,...);
      // </smpl>
      Signed-off-by: Julia Lawall <julia@diku.dk>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
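The bug class that the semantic match above looks for can be shown with a userspace stand-in. Everything here is hypothetical: `sketch_lock` is a toy counter, not the kernel spinlock API, and `sketch_update` is not the dlm function that was patched. The point is only that the early-exit path must release the lock before jumping to the leave label.

```c
/* Toy lock that counts how many times it is currently held. */
struct sketch_lock {
    int held;
};

static void sketch_acquire(struct sketch_lock *l) { l->held++; }
static void sketch_release(struct sketch_lock *l) { l->held--; }

/* Hypothetical operation with an error path taken while the lock is held. */
static int sketch_update(struct sketch_lock *l, int *value, int delta)
{
    int ret = 0;

    sketch_acquire(l);
    if (delta < 0) {
        ret = -1;
        /* Without this release, the error path would leave the lock held. */
        sketch_release(l);
        goto leave;
    }
    *value += delta;
    sketch_release(l);

leave:
    return ret;
}
```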
  4. 12 Jun, 2010 1 commit
  5. 11 Jun, 2010 33 commits