    [PATCH] i_dirty_buffers locking fix
    Andrew Morton authored
    This fixes a race between try_to_free_buffers' call to
    __remove_inode_queue() and other users of b_inode_buffers
    (fsync_inode_buffers() and mark_buffer_dirty_inode()).  They are
    presently taking different locks.
    
    The patch relocates, redefines and clarifies(?) the role of
    inode.i_dirty_buffers.
    
    The 2.4 definition of i_dirty_buffers is "a list of random buffers
    which is protected by a kernel-wide lock".  This definition needs to be
    narrowed in the 2.5 context.  It is now
    
    "a list of buffers from a different mapping, protected by a lock within
    that mapping".  This list of buffers is specifically for fsync().
    
    As this is a "data plane" operation, all the structures have been moved
    out of the inode and into the address_space.  So address_space now has:
    
    list_head private_list;
    
         A list, available to the address_space for any purpose.  If
         that address_space chooses to use the helper functions
         mark_buffer_dirty_inode and sync_mapping_buffers() then this list
         will contain buffer_heads, attached via
         buffer_head.b_assoc_buffers.
    
         If the address_space does not call those helper functions
         then the list is free for other usage.  The only requirement is
         that the list be list_empty() at destroy_inode() time.
    
         At least, this is the objective.  At present,
         generic_file_write() will call generic_osync_inode(), which
         expects that list to contain buffer_heads.  So private_list isn't
         useful for anything else yet.
    
    spinlock_t private_lock;
    
         A spinlock, available to the address_space.
    
         If the address_space is using try_to_free_buffers(),
         mark_buffer_dirty_inode() and fsync_inode_buffers() then this
         lock is used to protect the private_list of *other* mappings which
         have listed buffers from *this* mapping onto themselves.
    
         That is: for buffer_heads, mapping_A->private_lock does not
         protect mapping_A->private_list!  It protects the b_assoc_buffers
         list from buffers which are backed by mapping_A and it protects
         mapping_B->private_list, mapping_C->private_list, ...
    
         So what we have here is a cross-mapping association.  S_ISREG
         mappings maintain a list of buffers from the blockdev's
         address_space which they need to know about for a successful
         fsync().  The locking follows the buffers: the lock is in the
         blockdev's mapping, not in the S_ISREG file's mapping.
    
         For address_spaces which use try_to_free_buffers,
         private_lock is also (and quite unrelatedly) used for protection
         of the buffer ring at page->private, providing exclusion
         between try_to_free_buffers(), __get_hash_table() and
         __set_page_dirty_buffers().  This is in fact its major use.
    
    address_space *assoc_mapping;
    
        Sigh.  This is the address of the mapping which backs the
        buffers which are attached to private_list.  It's here so that
        generic_osync_inode() can locate the lock which protects this
        mapping's private_list.  Will probably go away.
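
    Taken together, the new fields look roughly like this (a trimmed
    sketch; the rest of struct address_space is omitted and the
    comments are editorial):

        struct address_space {
                /* ... other members omitted ... */
                spinlock_t              private_lock;   /* guards other mappings'
                                                           private_lists, and this
                                                           mapping's page buffers */
                struct list_head        private_list;   /* buffers for fsync() */
                struct address_space    *assoc_mapping; /* backing of those buffers */
        };

    And, roughly, how sync_mapping_buffers() uses assoc_mapping to find
    the lock (simplified sketch; fsync_buffers_list() stands in for the
    internal helper which walks the list under that lock):

        int sync_mapping_buffers(struct address_space *mapping)
        {
                struct address_space *buffer_mapping = mapping->assoc_mapping;

                if (buffer_mapping == NULL || list_empty(&mapping->private_list))
                        return 0;

                /* the list is in *this* mapping; the lock is in the
                 * mapping which backs the listed buffers */
                return fsync_buffers_list(&buffer_mapping->private_lock,
                                          &mapping->private_list);
        }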
    
    
    A consequence of all the above is that:
    
        a) All the buffers on a mapping_A's ->private_list must come
           from the same mapping, mapping_B.  There is no requirement that
           mapping_B be a blockdev mapping, but that's how it's used.
    
           There is a BUG() check in mark_buffer_dirty_inode() for this.
    
        b) blockdev mappings never have any buffers on ->private_list.
           It just never happens, and doesn't make a lot of sense.
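
    A sketch of mark_buffer_dirty_inode() showing both points: the
    buffer is listed on the file mapping's ->private_list while holding
    the *backing* mapping's ->private_lock, and rule (a) is enforced by
    the BUG() check (simplified; details may differ from the actual
    code):

        void mark_buffer_dirty_inode(struct buffer_head *bh,
                                     struct inode *inode)
        {
                struct address_space *mapping = inode->i_mapping;
                struct address_space *buffer_mapping = bh->b_page->mapping;

                mark_buffer_dirty(bh);
                if (!mapping->assoc_mapping)
                        mapping->assoc_mapping = buffer_mapping;
                else if (mapping->assoc_mapping != buffer_mapping)
                        BUG();  /* rule (a): one backing mapping only */
                if (list_empty(&bh->b_assoc_buffers)) {
                        /* cross-mapping: the lock lives in buffer_mapping */
                        spin_lock(&buffer_mapping->private_lock);
                        list_move_tail(&bh->b_assoc_buffers,
                                       &mapping->private_list);
                        spin_unlock(&buffer_mapping->private_lock);
                }
        }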
    
    reiserfs was using b_inode_buffers for attaching dependent buffers to
    its journal, and that caused a few problems.  Fixed in
    reiserfs_releasepage.patch.