Commit 62c8d922 authored by Linus Torvalds's avatar Linus Torvalds

Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw

Pull GFS2 changes from Steven Whitehouse.

* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw: (24 commits)
  GFS2: Fix quota adjustment return code
  GFS2: Add rgrp information to block_alloc trace point
  GFS2: Eliminate unused "new" parameter to gfs2_meta_indirect_buffer
  GFS2: Update glock doc to add new stats info
  GFS2: Update main gfs2 doc
  GFS2: Remove redundant metadata block type check
  GFS2: Fix sgid propagation when using ACLs
  GFS2: eliminate log elements and simplify
  GFS2: Eliminate vestigial sd_log_le_rg
  GFS2: Eliminate needless parameter from function gfs2_setbit
  GFS2: Log code fixes
  GFS2: Remove unused argument from gfs2_internal_read
  GFS2: Remove bd_list_tr
  GFS2: Remove duplicate log code
  GFS2: Clean up log write code path
  GFS2: Use variable rather than qa to determine if unstuff necessary
  GFS2: Change variable blk to biblk
  GFS2: Fix function parameter comments in rgrp.c
  GFS2: Eliminate offset parameter to gfs2_setbit
  GFS2: Use slab for block reservation memory
  ...
parents 06930b94 500242ac
......@@ -61,7 +61,9 @@ go_unlock | Called on the final local unlock of a lock
go_dump | Called to print content of object for debugfs file, or on
| error to dump glock to the log.
go_type | The type of the glock, LM_TYPE_.....
go_min_hold_time | The minimum hold time
go_callback | Called if the DLM sends a callback to drop this lock
go_flags | GLOF_ASPACE is set, if the glock has an address space
| associated with it
The minimum hold time for each lock is the time after a remote lock
grant for which we ignore remote demote requests. This is in order to
......@@ -89,6 +91,7 @@ go_demote_ok | Sometimes | Yes
go_lock | Yes | No
go_unlock | Yes | No
go_dump | Sometimes | Yes
go_callback | Sometimes (N/A) | Yes
N.B. Operations must not drop either the bit lock or the spinlock
if its held on entry. go_dump and do_demote_ok must never block.
......@@ -111,4 +114,118 @@ itself (locking order as above), and the other, known as the iopen
glock is used in conjunction with the i_nlink field in the inode to
determine the lifetime of the inode in question. Locking of inodes
is on a per-inode basis. Locking of rgrps is on a per rgrp basis.
In general we prefer to lock local locks prior to cluster locks.
Glock Statistics
------------------
The stats are divided into two sets: those relating to the
super block and those relating to an individual glock. The
super block stats are done on a per cpu basis in order to
try and reduce the overhead of gathering them. They are also
further divided by glock type. All timings are in nanoseconds.
In the case of both the super block and glock statistics,
the same information is gathered in each case. The super
block timing statistics are used to provide default values for
the glock timing statistics, so that newly created glocks
should have, as far as possible, a sensible starting point.
The per-glock counters are initialised to zero when the
glock is created. The per-glock statistics are lost when
the glock is ejected from memory.
The statistics are divided into three pairs of mean and
variance, plus two counters. The mean/variance pairs are
smoothed exponential estimates and the algorithm used is
one which will be very familiar to those used to calculation
of round trip times in network code. See "TCP/IP Illustrated,
Volume 1", W. Richard Stevens, sect 21.3, "Round-Trip Time Measurement",
p. 299 and onwards. Also, Volume 2, Sect. 25.10, p. 838 and onwards.
Unlike the TCP/IP Illustrated case, the mean and variance are
not scaled, but are in units of integer nanoseconds.
The three pairs of mean/variance measure the following
things:
1. DLM lock time (non-blocking requests)
2. DLM lock time (blocking requests)
3. Inter-request time (again to the DLM)
A non-blocking request is one which will complete right
away, whatever the state of the DLM lock in question. That
currently means any requests when (a) the current state of
the lock is exclusive, i.e. a lock demotion (b) the requested
state is either null or unlocked (again, a demotion) or (c) the
"try lock" flag is set. A blocking request covers all the other
lock requests.
There are two counters. The first is there primarily to show
how many lock requests have been made, and thus how much data
has gone into the mean/variance calculations. The other counter
is counting queuing of holders at the top layer of the glock
code. Hopefully that number will be a lot larger than the number
of dlm lock requests issued.
So why gather these statistics? There are several reasons
we'd like to get a better idea of these timings:
1. To be able to better set the glock "min hold time"
2. To spot performance issues more easily
3. To improve the algorithm for selecting resource groups for
allocation (to base it on lock wait time, rather than blindly
using a "try lock")
Due to the smoothing action of the updates, a step change in
some input quantity being sampled will only fully be taken
into account after 8 samples (or 4 for the variance) and this
needs to be carefully considered when interpreting the
results.
Knowing both the time it takes a lock request to complete and
the average time between lock requests for a glock means we
can compute the total percentage of the time for which the
node is able to use a glock vs. time that the rest of the
cluster has its share. That will be very useful when setting
the lock min hold time.
Great care has been taken to ensure that we
measure exactly the quantities that we want, as accurately
as possible. There are always inaccuracies in any
measuring system, but I hope this is as accurate as we
can reasonably make it.
Per sb stats can be found here:
/sys/kernel/debug/gfs2/<fsname>/sbstats
Per glock stats can be found here:
/sys/kernel/debug/gfs2/<fsname>/glstats
Assuming that debugfs is mounted on /sys/kernel/debug and also
that <fsname> is replaced with the name of the gfs2 filesystem
in question.
The abbreviations used in the output as are follows:
srtt - Smoothed round trip time for non-blocking dlm requests
srttvar - Variance estimate for srtt
srttb - Smoothed round trip time for (potentially) blocking dlm requests
srttvarb - Variance estimate for srttb
sirt - Smoothed inter-request time (for dlm requests)
sirtvar - Variance estimate for sirt
dlm - Number of dlm requests made (dcnt in glstats file)
queue - Number of glock requests queued (qcnt in glstats file)
The sbstats file contains a set of these stats for each glock type (so 8 lines
for each type) and for each cpu (one column per cpu). The glstats file contains
a set of these stats for each glock in a similar format to the glocks file, but
using the format mean/variance for each of the timing stats.
The gfs2_glock_lock_time tracepoint prints out the current values of the stats
for the glock in question, along with some addition information on each dlm
reply that is received:
status - The status of the dlm request
flags - The dlm request flags
tdiff - The time taken by this specific request
(remaining fields as per above list)
Global File System
------------------
http://sources.redhat.com/cluster/wiki/
https://fedorahosted.org/cluster/wiki/HomePage
GFS is a cluster file system. It allows a cluster of computers to
simultaneously use a block device that is shared between them (with FC,
......@@ -30,7 +30,8 @@ needed, simply:
If you are using Fedora, you need to install the gfs2-utils package
and, for lock_dlm, you will also need to install the cman package
and write a cluster.conf as per the documentation.
and write a cluster.conf as per the documentation. For F17 and above
cman has been replaced by the dlm package.
GFS2 is not on-disk compatible with previous versions of GFS, but it
is pretty close.
......@@ -39,8 +40,6 @@ The following man pages can be found at the URL above:
fsck.gfs2 to repair a filesystem
gfs2_grow to expand a filesystem online
gfs2_jadd to add journals to a filesystem online
gfs2_tool to manipulate, examine and tune a filesystem
gfs2_quota to examine and change quota values in a filesystem
tunegfs2 to manipulate, examine and tune a filesystem
gfs2_convert to convert a gfs filesystem to gfs2 in-place
mount.gfs2 to help mount(8) mount a filesystem
mkfs.gfs2 to make a filesystem
......@@ -73,12 +73,8 @@ static int gfs2_set_mode(struct inode *inode, umode_t mode)
int error = 0;
if (mode != inode->i_mode) {
struct iattr iattr;
iattr.ia_valid = ATTR_MODE;
iattr.ia_mode = mode;
error = gfs2_setattr_simple(inode, &iattr);
inode->i_mode = mode;
mark_inode_dirty(inode);
}
return error;
......@@ -126,9 +122,7 @@ int gfs2_acl_create(struct gfs2_inode *dip, struct inode *inode)
return PTR_ERR(acl);
if (!acl) {
mode &= ~current_umask();
if (mode != inode->i_mode)
error = gfs2_set_mode(inode, mode);
return error;
return gfs2_set_mode(inode, mode);
}
if (S_ISDIR(inode->i_mode)) {
......
......@@ -36,8 +36,8 @@
#include "glops.h"
void gfs2_page_add_databufs(struct gfs2_inode *ip, struct page *page,
unsigned int from, unsigned int to)
static void gfs2_page_add_databufs(struct gfs2_inode *ip, struct page *page,
unsigned int from, unsigned int to)
{
struct buffer_head *head = page_buffers(page);
unsigned int bsize = head->b_size;
......@@ -517,15 +517,14 @@ static int gfs2_readpage(struct file *file, struct page *page)
/**
* gfs2_internal_read - read an internal file
* @ip: The gfs2 inode
* @ra_state: The readahead state (or NULL for no readahead)
* @buf: The buffer to fill
* @pos: The file position
* @size: The amount to read
*
*/
int gfs2_internal_read(struct gfs2_inode *ip, struct file_ra_state *ra_state,
char *buf, loff_t *pos, unsigned size)
int gfs2_internal_read(struct gfs2_inode *ip, char *buf, loff_t *pos,
unsigned size)
{
struct address_space *mapping = ip->i_inode.i_mapping;
unsigned long index = *pos / PAGE_CACHE_SIZE;
......@@ -943,8 +942,8 @@ static void gfs2_discard(struct gfs2_sbd *sdp, struct buffer_head *bh)
clear_buffer_dirty(bh);
bd = bh->b_private;
if (bd) {
if (!list_empty(&bd->bd_le.le_list) && !buffer_pinned(bh))
list_del_init(&bd->bd_le.le_list);
if (!list_empty(&bd->bd_list) && !buffer_pinned(bh))
list_del_init(&bd->bd_list);
else
gfs2_remove_from_journal(bh, current->journal_info, 0);
}
......@@ -1084,10 +1083,9 @@ int gfs2_releasepage(struct page *page, gfp_t gfp_mask)
bd = bh->b_private;
if (bd) {
gfs2_assert_warn(sdp, bd->bd_bh == bh);
gfs2_assert_warn(sdp, list_empty(&bd->bd_list_tr));
if (!list_empty(&bd->bd_le.le_list)) {
if (!list_empty(&bd->bd_list)) {
if (!buffer_pinned(bh))
list_del_init(&bd->bd_le.le_list);
list_del_init(&bd->bd_list);
else
bd = NULL;
}
......
......@@ -324,7 +324,7 @@ static int lookup_metapath(struct gfs2_inode *ip, struct metapath *mp)
if (!dblock)
return x + 1;
ret = gfs2_meta_indirect_buffer(ip, x+1, dblock, 0, &mp->mp_bh[x+1]);
ret = gfs2_meta_indirect_buffer(ip, x+1, dblock, &mp->mp_bh[x+1]);
if (ret)
return ret;
}
......@@ -882,7 +882,7 @@ static int recursive_scan(struct gfs2_inode *ip, struct buffer_head *dibh,
top = (__be64 *)(bh->b_data + sizeof(struct gfs2_dinode)) + mp->mp_list[0];
bottom = (__be64 *)(bh->b_data + sizeof(struct gfs2_dinode)) + sdp->sd_diptrs;
} else {
error = gfs2_meta_indirect_buffer(ip, height, block, 0, &bh);
error = gfs2_meta_indirect_buffer(ip, height, block, &bh);
if (error)
return error;
......@@ -1169,6 +1169,7 @@ static int do_grow(struct inode *inode, u64 size)
struct buffer_head *dibh;
struct gfs2_qadata *qa = NULL;
int error;
int unstuff = 0;
if (gfs2_is_stuffed(ip) &&
(size > (sdp->sd_sb.sb_bsize - sizeof(struct gfs2_dinode)))) {
......@@ -1183,13 +1184,14 @@ static int do_grow(struct inode *inode, u64 size)
error = gfs2_inplace_reserve(ip, 1);
if (error)
goto do_grow_qunlock;
unstuff = 1;
}
error = gfs2_trans_begin(sdp, RES_DINODE + RES_STATFS + RES_RG_BIT, 0);
if (error)
goto do_grow_release;
if (qa) {
if (unstuff) {
error = gfs2_unstuff_dinode(ip, NULL);
if (error)
goto do_end_trans;
......@@ -1208,7 +1210,7 @@ static int do_grow(struct inode *inode, u64 size)
do_end_trans:
gfs2_trans_end(sdp);
do_grow_release:
if (qa) {
if (unstuff) {
gfs2_inplace_release(ip);
do_grow_qunlock:
gfs2_quota_unlock(ip);
......
......@@ -558,14 +558,14 @@ static int gfs2_open(struct inode *inode, struct file *file)
}
/**
* gfs2_close - called to close a struct file
* gfs2_release - called to close a struct file
* @inode: the inode the struct file belongs to
* @file: the struct file being closed
*
* Returns: errno
*/
static int gfs2_close(struct inode *inode, struct file *file)
static int gfs2_release(struct inode *inode, struct file *file)
{
struct gfs2_sbd *sdp = inode->i_sb->s_fs_info;
struct gfs2_file *fp;
......@@ -1005,7 +1005,7 @@ const struct file_operations gfs2_file_fops = {
.unlocked_ioctl = gfs2_ioctl,
.mmap = gfs2_mmap,
.open = gfs2_open,
.release = gfs2_close,
.release = gfs2_release,
.fsync = gfs2_fsync,
.lock = gfs2_lock,
.flock = gfs2_flock,
......@@ -1019,7 +1019,7 @@ const struct file_operations gfs2_dir_fops = {
.readdir = gfs2_readdir,
.unlocked_ioctl = gfs2_ioctl,
.open = gfs2_open,
.release = gfs2_close,
.release = gfs2_release,
.fsync = gfs2_fsync,
.lock = gfs2_lock,
.flock = gfs2_flock,
......@@ -1037,7 +1037,7 @@ const struct file_operations gfs2_file_fops_nolock = {
.unlocked_ioctl = gfs2_ioctl,
.mmap = gfs2_mmap,
.open = gfs2_open,
.release = gfs2_close,
.release = gfs2_release,
.fsync = gfs2_fsync,
.splice_read = generic_file_splice_read,
.splice_write = generic_file_splice_write,
......@@ -1049,7 +1049,7 @@ const struct file_operations gfs2_dir_fops_nolock = {
.readdir = gfs2_readdir,
.unlocked_ioctl = gfs2_ioctl,
.open = gfs2_open,
.release = gfs2_close,
.release = gfs2_release,
.fsync = gfs2_fsync,
.llseek = default_llseek,
};
......
......@@ -94,7 +94,6 @@ static void gfs2_ail_empty_gl(struct gfs2_glock *gl)
/* A shortened, inline version of gfs2_trans_begin() */
tr.tr_reserved = 1 + gfs2_struct2blk(sdp, tr.tr_revokes, sizeof(u64));
tr.tr_ip = (unsigned long)__builtin_return_address(0);
INIT_LIST_HEAD(&tr.tr_list_buf);
gfs2_log_reserve(sdp, tr.tr_reserved);
BUG_ON(current->journal_info);
current->journal_info = &tr;
......@@ -379,11 +378,6 @@ int gfs2_inode_refresh(struct gfs2_inode *ip)
if (error)
return error;
if (gfs2_metatype_check(GFS2_SB(&ip->i_inode), dibh, GFS2_METATYPE_DI)) {
brelse(dibh);
return -EIO;
}
error = gfs2_dinode_in(ip, dibh->b_data);
brelse(dibh);
clear_bit(GIF_INVALID, &ip->i_flags);
......
......@@ -26,7 +26,7 @@
#define DIO_METADATA 0x00000020
struct gfs2_log_operations;
struct gfs2_log_element;
struct gfs2_bufdata;
struct gfs2_holder;
struct gfs2_glock;
struct gfs2_quota_data;
......@@ -52,7 +52,7 @@ struct gfs2_log_header_host {
*/
struct gfs2_log_operations {
void (*lo_add) (struct gfs2_sbd *sdp, struct gfs2_log_element *le);
void (*lo_add) (struct gfs2_sbd *sdp, struct gfs2_bufdata *bd);
void (*lo_before_commit) (struct gfs2_sbd *sdp);
void (*lo_after_commit) (struct gfs2_sbd *sdp, struct gfs2_ail *ai);
void (*lo_before_scan) (struct gfs2_jdesc *jd,
......@@ -64,11 +64,6 @@ struct gfs2_log_operations {
const char *lo_name;
};
struct gfs2_log_element {
struct list_head le_list;
const struct gfs2_log_operations *le_ops;
};
#define GBF_FULL 1
struct gfs2_bitmap {
......@@ -118,15 +113,10 @@ TAS_BUFFER_FNS(Zeronew, zeronew)
struct gfs2_bufdata {
struct buffer_head *bd_bh;
struct gfs2_glock *bd_gl;
u64 bd_blkno;
union {
struct list_head list_tr;
u64 blkno;
} u;
#define bd_list_tr u.list_tr
#define bd_blkno u.blkno
struct gfs2_log_element bd_le;
struct list_head bd_list;
const struct gfs2_log_operations *bd_ops;
struct gfs2_ail *bd_ail;
struct list_head bd_ail_st_list;
......@@ -411,13 +401,10 @@ struct gfs2_trans {
int tr_touched;
unsigned int tr_num_buf;
unsigned int tr_num_buf_new;
unsigned int tr_num_databuf_new;
unsigned int tr_num_buf_rm;
unsigned int tr_num_databuf_rm;
struct list_head tr_list_buf;
unsigned int tr_num_revoke;
unsigned int tr_num_revoke_rm;
};
......@@ -699,7 +686,6 @@ struct gfs2_sbd {
struct list_head sd_log_le_buf;
struct list_head sd_log_le_revoke;
struct list_head sd_log_le_rg;
struct list_head sd_log_le_databuf;
struct list_head sd_log_le_ordered;
......@@ -716,7 +702,9 @@ struct gfs2_sbd {
struct rw_semaphore sd_log_flush_lock;
atomic_t sd_log_in_flight;
struct bio *sd_log_bio;
wait_queue_head_t sd_log_flush_wait;
int sd_log_error;
unsigned int sd_log_flush_head;
u64 sd_log_flush_wrapped;
......
......@@ -17,10 +17,7 @@
extern int gfs2_releasepage(struct page *page, gfp_t gfp_mask);
extern int gfs2_internal_read(struct gfs2_inode *ip,
struct file_ra_state *ra_state,
char *buf, loff_t *pos, unsigned size);
extern void gfs2_page_add_databufs(struct gfs2_inode *ip, struct page *page,
unsigned int from, unsigned int to);
extern void gfs2_set_aops(struct inode *inode);
static inline int gfs2_is_stuffed(const struct gfs2_inode *ip)
......
......@@ -32,8 +32,6 @@
#include "dir.h"
#include "trace_gfs2.h"
#define PULL 1
/**
* gfs2_struct2blk - compute stuff
* @sdp: the filesystem
......@@ -359,18 +357,6 @@ int gfs2_log_reserve(struct gfs2_sbd *sdp, unsigned int blks)
return 0;
}
u64 gfs2_log_bmap(struct gfs2_sbd *sdp, unsigned int lbn)
{
struct gfs2_journal_extent *je;
list_for_each_entry(je, &sdp->sd_jdesc->extent_list, extent_list) {
if (lbn >= je->lblock && lbn < je->lblock + je->blocks)
return je->dblock + lbn - je->lblock;
}
return -1;
}
/**
* log_distance - Compute distance between two journal blocks
* @sdp: The GFS2 superblock
......@@ -466,17 +452,6 @@ static unsigned int current_tail(struct gfs2_sbd *sdp)
return tail;
}
void gfs2_log_incr_head(struct gfs2_sbd *sdp)
{
BUG_ON((sdp->sd_log_flush_head == sdp->sd_log_tail) &&
(sdp->sd_log_flush_head != sdp->sd_log_head));
if (++sdp->sd_log_flush_head == sdp->sd_jdesc->jd_blocks) {
sdp->sd_log_flush_head = 0;
sdp->sd_log_flush_wrapped = 1;
}
}
static void log_pull_tail(struct gfs2_sbd *sdp, unsigned int new_tail)
{
unsigned int dist = log_distance(sdp, new_tail, sdp->sd_log_tail);
......@@ -511,8 +486,8 @@ static int bd_cmp(void *priv, struct list_head *a, struct list_head *b)
{
struct gfs2_bufdata *bda, *bdb;
bda = list_entry(a, struct gfs2_bufdata, bd_le.le_list);
bdb = list_entry(b, struct gfs2_bufdata, bd_le.le_list);
bda = list_entry(a, struct gfs2_bufdata, bd_list);
bdb = list_entry(b, struct gfs2_bufdata, bd_list);
if (bda->bd_bh->b_blocknr < bdb->bd_bh->b_blocknr)
return -1;
......@@ -530,8 +505,8 @@ static void gfs2_ordered_write(struct gfs2_sbd *sdp)
gfs2_log_lock(sdp);
list_sort(NULL, &sdp->sd_log_le_ordered, &bd_cmp);
while (!list_empty(&sdp->sd_log_le_ordered)) {
bd = list_entry(sdp->sd_log_le_ordered.next, struct gfs2_bufdata, bd_le.le_list);
list_move(&bd->bd_le.le_list, &written);
bd = list_entry(sdp->sd_log_le_ordered.next, struct gfs2_bufdata, bd_list);
list_move(&bd->bd_list, &written);
bh = bd->bd_bh;
if (!buffer_dirty(bh))
continue;
......@@ -558,7 +533,7 @@ static void gfs2_ordered_wait(struct gfs2_sbd *sdp)
gfs2_log_lock(sdp);
while (!list_empty(&sdp->sd_log_le_ordered)) {
bd = list_entry(sdp->sd_log_le_ordered.prev, struct gfs2_bufdata, bd_le.le_list);
bd = list_entry(sdp->sd_log_le_ordered.prev, struct gfs2_bufdata, bd_list);
bh = bd->bd_bh;
if (buffer_locked(bh)) {
get_bh(bh);
......@@ -568,7 +543,7 @@ static void gfs2_ordered_wait(struct gfs2_sbd *sdp)
gfs2_log_lock(sdp);
continue;
}
list_del_init(&bd->bd_le.le_list);
list_del_init(&bd->bd_list);
}
gfs2_log_unlock(sdp);
}
......@@ -580,25 +555,19 @@ static void gfs2_ordered_wait(struct gfs2_sbd *sdp)
* Returns: the initialized log buffer descriptor
*/
static void log_write_header(struct gfs2_sbd *sdp, u32 flags, int pull)
static void log_write_header(struct gfs2_sbd *sdp, u32 flags)
{
u64 blkno = gfs2_log_bmap(sdp, sdp->sd_log_flush_head);
struct buffer_head *bh;
struct gfs2_log_header *lh;
unsigned int tail;
u32 hash;
bh = sb_getblk(sdp->sd_vfs, blkno);
lock_buffer(bh);
memset(bh->b_data, 0, bh->b_size);
set_buffer_uptodate(bh);
clear_buffer_dirty(bh);
int rw = WRITE_FLUSH_FUA | REQ_META;
struct page *page = mempool_alloc(gfs2_page_pool, GFP_NOIO);
lh = page_address(page);
clear_page(lh);
gfs2_ail1_empty(sdp);
tail = current_tail(sdp);
lh = (struct gfs2_log_header *)bh->b_data;
memset(lh, 0, sizeof(struct gfs2_log_header));
lh->lh_header.mh_magic = cpu_to_be32(GFS2_MAGIC);
lh->lh_header.mh_type = cpu_to_be32(GFS2_METATYPE_LH);
lh->lh_header.__pad0 = cpu_to_be64(0);
......@@ -608,31 +577,22 @@ static void log_write_header(struct gfs2_sbd *sdp, u32 flags, int pull)
lh->lh_flags = cpu_to_be32(flags);
lh->lh_tail = cpu_to_be32(tail);
lh->lh_blkno = cpu_to_be32(sdp->sd_log_flush_head);
hash = gfs2_disk_hash(bh->b_data, sizeof(struct gfs2_log_header));
hash = gfs2_disk_hash(page_address(page), sizeof(struct gfs2_log_header));
lh->lh_hash = cpu_to_be32(hash);
bh->b_end_io = end_buffer_write_sync;
get_bh(bh);
if (test_bit(SDF_NOBARRIERS, &sdp->sd_flags)) {
gfs2_ordered_wait(sdp);
log_flush_wait(sdp);
submit_bh(WRITE_SYNC | REQ_META | REQ_PRIO, bh);
} else {
submit_bh(WRITE_FLUSH_FUA | REQ_META, bh);
rw = WRITE_SYNC | REQ_META | REQ_PRIO;
}
wait_on_buffer(bh);
if (!buffer_uptodate(bh))
gfs2_io_error_bh(sdp, bh);
brelse(bh);
sdp->sd_log_idle = (tail == sdp->sd_log_flush_head);
gfs2_log_write_page(sdp, page);
gfs2_log_flush_bio(sdp, rw);
log_flush_wait(sdp);
if (sdp->sd_log_tail != tail)
log_pull_tail(sdp, tail);
else
gfs2_assert_withdraw(sdp, !pull);
sdp->sd_log_idle = (tail == sdp->sd_log_flush_head);
gfs2_log_incr_head(sdp);
}
/**
......@@ -678,15 +638,14 @@ void gfs2_log_flush(struct gfs2_sbd *sdp, struct gfs2_glock *gl)
gfs2_ordered_write(sdp);
lops_before_commit(sdp);
gfs2_log_flush_bio(sdp, WRITE);
if (sdp->sd_log_head != sdp->sd_log_flush_head) {
log_write_header(sdp, 0, 0);
log_write_header(sdp, 0);
} else if (sdp->sd_log_tail != current_tail(sdp) && !sdp->sd_log_idle){
gfs2_log_lock(sdp);
atomic_dec(&sdp->sd_log_blks_free); /* Adjust for unreserved buffer */
trace_gfs2_log_blocks(sdp, -1);
gfs2_log_unlock(sdp);
log_write_header(sdp, 0, PULL);
log_write_header(sdp, 0);
}
lops_after_commit(sdp, ai);
......@@ -735,21 +694,6 @@ static void log_refund(struct gfs2_sbd *sdp, struct gfs2_trans *tr)
gfs2_log_unlock(sdp);
}
static void buf_lo_incore_commit(struct gfs2_sbd *sdp, struct gfs2_trans *tr)
{
struct list_head *head = &tr->tr_list_buf;
struct gfs2_bufdata *bd;
gfs2_log_lock(sdp);
while (!list_empty(head)) {
bd = list_entry(head->next, struct gfs2_bufdata, bd_list_tr);
list_del_init(&bd->bd_list_tr);
tr->tr_num_buf--;
}
gfs2_log_unlock(sdp);
gfs2_assert_warn(sdp, !tr->tr_num_buf);
}
/**
* gfs2_log_commit - Commit a transaction to the log
* @sdp: the filesystem
......@@ -768,8 +712,6 @@ static void buf_lo_incore_commit(struct gfs2_sbd *sdp, struct gfs2_trans *tr)
void gfs2_log_commit(struct gfs2_sbd *sdp, struct gfs2_trans *tr)
{
log_refund(sdp, tr);
buf_lo_incore_commit(sdp, tr);
up_read(&sdp->sd_log_flush_lock);
if (atomic_read(&sdp->sd_log_pinned) > atomic_read(&sdp->sd_log_thresh1) ||
......@@ -798,8 +740,7 @@ void gfs2_log_shutdown(struct gfs2_sbd *sdp)
sdp->sd_log_flush_head = sdp->sd_log_head;
sdp->sd_log_flush_wrapped = 0;
log_write_header(sdp, GFS2_LOG_HEAD_UNMOUNT,
(sdp->sd_log_tail == current_tail(sdp)) ? 0 : PULL);
log_write_header(sdp, GFS2_LOG_HEAD_UNMOUNT);
gfs2_assert_warn(sdp, atomic_read(&sdp->sd_log_blks_free) == sdp->sd_jdesc->jd_blocks);
gfs2_assert_warn(sdp, sdp->sd_log_head == sdp->sd_log_tail);
......@@ -854,11 +795,9 @@ int gfs2_logd(void *data)
struct gfs2_sbd *sdp = data;
unsigned long t = 1;
DEFINE_WAIT(wait);
unsigned preflush;
while (!kthread_should_stop()) {
preflush = atomic_read(&sdp->sd_log_pinned);
if (gfs2_jrnl_flush_reqd(sdp) || t == 0) {
gfs2_ail1_empty(sdp);
gfs2_log_flush(sdp, NULL);
......
......@@ -52,8 +52,6 @@ extern unsigned int gfs2_struct2blk(struct gfs2_sbd *sdp, unsigned int nstruct,
unsigned int ssize);
extern int gfs2_log_reserve(struct gfs2_sbd *sdp, unsigned int blks);
extern void gfs2_log_incr_head(struct gfs2_sbd *sdp);
extern u64 gfs2_log_bmap(struct gfs2_sbd *sdp, unsigned int lbn);
extern void gfs2_log_flush(struct gfs2_sbd *sdp, struct gfs2_glock *gl);
extern void gfs2_log_commit(struct gfs2_sbd *sdp, struct gfs2_trans *trans);
extern void gfs2_remove_from_ail(struct gfs2_bufdata *bd);
......
This diff is collapsed.
......@@ -27,6 +27,8 @@ extern const struct gfs2_log_operations gfs2_rg_lops;
extern const struct gfs2_log_operations gfs2_databuf_lops;
extern const struct gfs2_log_operations *gfs2_log_ops[];
extern void gfs2_log_write_page(struct gfs2_sbd *sdp, struct page *page);
extern void gfs2_log_flush_bio(struct gfs2_sbd *sdp, int rw);
static inline unsigned int buf_limit(struct gfs2_sbd *sdp)
{
......@@ -44,17 +46,17 @@ static inline unsigned int databuf_limit(struct gfs2_sbd *sdp)
return limit;
}
static inline void lops_init_le(struct gfs2_log_element *le,
static inline void lops_init_le(struct gfs2_bufdata *bd,
const struct gfs2_log_operations *lops)
{
INIT_LIST_HEAD(&le->le_list);
le->le_ops = lops;
INIT_LIST_HEAD(&bd->bd_list);
bd->bd_ops = lops;
}
static inline void lops_add(struct gfs2_sbd *sdp, struct gfs2_log_element *le)
static inline void lops_add(struct gfs2_sbd *sdp, struct gfs2_bufdata *bd)
{
if (le->le_ops->lo_add)
le->le_ops->lo_add(sdp, le);
if (bd->bd_ops->lo_add)
bd->bd_ops->lo_add(sdp, bd);
}
static inline void lops_before_commit(struct gfs2_sbd *sdp)
......
......@@ -70,16 +70,6 @@ static void gfs2_init_gl_aspace_once(void *foo)
address_space_init_once(mapping);
}
static void *gfs2_bh_alloc(gfp_t mask, void *data)
{
return alloc_buffer_head(mask);
}
static void gfs2_bh_free(void *ptr, void *data)
{
return free_buffer_head(ptr);
}
/**
* init_gfs2_fs - Register GFS2 as a filesystem
*
......@@ -143,6 +133,12 @@ static int __init init_gfs2_fs(void)
if (!gfs2_quotad_cachep)
goto fail;
gfs2_rsrv_cachep = kmem_cache_create("gfs2_mblk",
sizeof(struct gfs2_blkreserv),
0, 0, NULL);
if (!gfs2_rsrv_cachep)
goto fail;
register_shrinker(&qd_shrinker);
error = register_filesystem(&gfs2_fs_type);
......@@ -164,8 +160,8 @@ static int __init init_gfs2_fs(void)
if (!gfs2_control_wq)
goto fail_recovery;
gfs2_bh_pool = mempool_create(1024, gfs2_bh_alloc, gfs2_bh_free, NULL);
if (!gfs2_bh_pool)
gfs2_page_pool = mempool_create_page_pool(64, 0);
if (!gfs2_page_pool)
goto fail_control;
gfs2_register_debugfs();
......@@ -186,6 +182,9 @@ static int __init init_gfs2_fs(void)
unregister_shrinker(&qd_shrinker);
gfs2_glock_exit();
if (gfs2_rsrv_cachep)
kmem_cache_destroy(gfs2_rsrv_cachep);
if (gfs2_quotad_cachep)
kmem_cache_destroy(gfs2_quotad_cachep);
......@@ -225,7 +224,8 @@ static void __exit exit_gfs2_fs(void)
rcu_barrier();
mempool_destroy(gfs2_bh_pool);
mempool_destroy(gfs2_page_pool);
kmem_cache_destroy(gfs2_rsrv_cachep);
kmem_cache_destroy(gfs2_quotad_cachep);
kmem_cache_destroy(gfs2_rgrpd_cachep);
kmem_cache_destroy(gfs2_bufdata_cachep);
......
......@@ -293,11 +293,10 @@ void gfs2_attach_bufdata(struct gfs2_glock *gl, struct buffer_head *bh,
bd->bd_bh = bh;
bd->bd_gl = gl;
INIT_LIST_HEAD(&bd->bd_list_tr);
if (meta)
lops_init_le(&bd->bd_le, &gfs2_buf_lops);
lops_init_le(bd, &gfs2_buf_lops);
else
lops_init_le(&bd->bd_le, &gfs2_databuf_lops);
lops_init_le(bd, &gfs2_databuf_lops);
bh->b_private = bd;
if (meta)
......@@ -313,7 +312,7 @@ void gfs2_remove_from_journal(struct buffer_head *bh, struct gfs2_trans *tr, int
if (test_clear_buffer_pinned(bh)) {
trace_gfs2_pin(bd, 0);
atomic_dec(&sdp->sd_log_pinned);
list_del_init(&bd->bd_le.le_list);
list_del_init(&bd->bd_list);
if (meta) {
gfs2_assert_warn(sdp, sdp->sd_log_num_buf);
sdp->sd_log_num_buf--;
......@@ -375,33 +374,24 @@ void gfs2_meta_wipe(struct gfs2_inode *ip, u64 bstart, u32 blen)
* @ip: The GFS2 inode
* @height: The level of this buf in the metadata (indir addr) tree (if any)
* @num: The block number (device relative) of the buffer
* @new: Non-zero if we may create a new buffer
* @bhp: the buffer is returned here
*
* Returns: errno
*/
int gfs2_meta_indirect_buffer(struct gfs2_inode *ip, int height, u64 num,
int new, struct buffer_head **bhp)
struct buffer_head **bhp)
{
struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
struct gfs2_glock *gl = ip->i_gl;
struct buffer_head *bh;
int ret = 0;
u32 mtype = height ? GFS2_METATYPE_IN : GFS2_METATYPE_DI;
if (new) {
BUG_ON(height == 0);
bh = gfs2_meta_new(gl, num);
gfs2_trans_add_bh(ip->i_gl, bh, 1);
gfs2_metatype_set(bh, GFS2_METATYPE_IN, GFS2_FORMAT_IN);
gfs2_buffer_clear_tail(bh, sizeof(struct gfs2_meta_header));
} else {
u32 mtype = height ? GFS2_METATYPE_IN : GFS2_METATYPE_DI;
ret = gfs2_meta_read(gl, num, DIO_WAIT, &bh);
if (ret == 0 && gfs2_metatype_check(sdp, bh, mtype)) {
brelse(bh);
ret = -EIO;
}
ret = gfs2_meta_read(gl, num, DIO_WAIT, &bh);
if (ret == 0 && gfs2_metatype_check(sdp, bh, mtype)) {
brelse(bh);
ret = -EIO;
}
*bhp = bh;
return ret;
......
......@@ -65,12 +65,12 @@ void gfs2_remove_from_journal(struct buffer_head *bh, struct gfs2_trans *tr,
void gfs2_meta_wipe(struct gfs2_inode *ip, u64 bstart, u32 blen);
int gfs2_meta_indirect_buffer(struct gfs2_inode *ip, int height, u64 num,
int new, struct buffer_head **bhp);
struct buffer_head **bhp);
static inline int gfs2_meta_inode_buffer(struct gfs2_inode *ip,
struct buffer_head **bhp)
{
return gfs2_meta_indirect_buffer(ip, 0, ip->i_no_addr, 0, bhp);
return gfs2_meta_indirect_buffer(ip, 0, ip->i_no_addr, bhp);
}
struct buffer_head *gfs2_meta_ra(struct gfs2_glock *gl, u64 dblock, u32 extlen);
......
......@@ -99,7 +99,6 @@ static struct gfs2_sbd *init_sbd(struct super_block *sb)
atomic_set(&sdp->sd_log_pinned, 0);
INIT_LIST_HEAD(&sdp->sd_log_le_buf);
INIT_LIST_HEAD(&sdp->sd_log_le_revoke);
INIT_LIST_HEAD(&sdp->sd_log_le_rg);
INIT_LIST_HEAD(&sdp->sd_log_le_databuf);
INIT_LIST_HEAD(&sdp->sd_log_le_ordered);
......
......@@ -652,7 +652,7 @@ static int gfs2_adjust_quota(struct gfs2_inode *ip, loff_t loc,
}
memset(&q, 0, sizeof(struct gfs2_quota));
err = gfs2_internal_read(ip, NULL, (char *)&q, &loc, sizeof(q));
err = gfs2_internal_read(ip, (char *)&q, &loc, sizeof(q));
if (err < 0)
return err;
......@@ -744,7 +744,7 @@ static int gfs2_adjust_quota(struct gfs2_inode *ip, loff_t loc,
i_size_write(inode, size);
inode->i_mtime = inode->i_atime = CURRENT_TIME;
mark_inode_dirty(inode);
return err;
return 0;
unlock_out:
unlock_page(page);
......@@ -852,7 +852,7 @@ static int update_qd(struct gfs2_sbd *sdp, struct gfs2_quota_data *qd)
memset(&q, 0, sizeof(struct gfs2_quota));
pos = qd2offset(qd);
error = gfs2_internal_read(ip, NULL, (char *)&q, &pos, sizeof(q));
error = gfs2_internal_read(ip, (char *)&q, &pos, sizeof(q));
if (error < 0)
return error;
......
This diff is collapsed.
......@@ -457,10 +457,10 @@ TRACE_EVENT(gfs2_bmap,
/* Keep track of blocks as they are allocated/freed */
TRACE_EVENT(gfs2_block_alloc,
TP_PROTO(const struct gfs2_inode *ip, u64 block, unsigned len,
u8 block_state),
TP_PROTO(const struct gfs2_inode *ip, struct gfs2_rgrpd *rgd,
u64 block, unsigned len, u8 block_state),
TP_ARGS(ip, block, len, block_state),
TP_ARGS(ip, rgd, block, len, block_state),
TP_STRUCT__entry(
__field( dev_t, dev )
......@@ -468,6 +468,8 @@ TRACE_EVENT(gfs2_block_alloc,
__field( u64, inum )
__field( u32, len )
__field( u8, block_state )
__field( u64, rd_addr )
__field( u32, rd_free_clone )
),
TP_fast_assign(
......@@ -476,14 +478,18 @@ TRACE_EVENT(gfs2_block_alloc,
__entry->inum = ip->i_no_addr;
__entry->len = len;
__entry->block_state = block_state;
__entry->rd_addr = rgd->rd_addr;
__entry->rd_free_clone = rgd->rd_free_clone;
),
TP_printk("%u,%u bmap %llu alloc %llu/%lu %s",
TP_printk("%u,%u bmap %llu alloc %llu/%lu %s rg:%llu rf:%u",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long long)__entry->inum,
(unsigned long long)__entry->start,
(unsigned long)__entry->len,
block_state_name(__entry->block_state))
block_state_name(__entry->block_state),
(unsigned long long)__entry->rd_addr,
__entry->rd_free_clone)
);
#endif /* _TRACE_GFS2_H */
......
......@@ -50,8 +50,6 @@ int gfs2_trans_begin(struct gfs2_sbd *sdp, unsigned int blocks,
if (revokes)
tr->tr_reserved += gfs2_struct2blk(sdp, revokes,
sizeof(u64));
INIT_LIST_HEAD(&tr->tr_list_buf);
gfs2_holder_init(sdp->sd_trans_gl, LM_ST_SHARED, 0, &tr->tr_t_gh);
error = gfs2_glock_nq(&tr->tr_t_gh);
......@@ -93,10 +91,21 @@ static void gfs2_log_release(struct gfs2_sbd *sdp, unsigned int blks)
up_read(&sdp->sd_log_flush_lock);
}
static void gfs2_print_trans(const struct gfs2_trans *tr)
{
print_symbol(KERN_WARNING "GFS2: Transaction created at: %s\n", tr->tr_ip);
printk(KERN_WARNING "GFS2: blocks=%u revokes=%u reserved=%u touched=%d\n",
tr->tr_blocks, tr->tr_revokes, tr->tr_reserved, tr->tr_touched);
printk(KERN_WARNING "GFS2: Buf %u/%u Databuf %u/%u Revoke %u/%u\n",
tr->tr_num_buf_new, tr->tr_num_buf_rm,
tr->tr_num_databuf_new, tr->tr_num_databuf_rm,
tr->tr_num_revoke, tr->tr_num_revoke_rm);
}
void gfs2_trans_end(struct gfs2_sbd *sdp)
{
struct gfs2_trans *tr = current->journal_info;
s64 nbuf;
BUG_ON(!tr);
current->journal_info = NULL;
......@@ -110,16 +119,13 @@ void gfs2_trans_end(struct gfs2_sbd *sdp)
return;
}
if (gfs2_assert_withdraw(sdp, tr->tr_num_buf <= tr->tr_blocks)) {
fs_err(sdp, "tr_num_buf = %u, tr_blocks = %u ",
tr->tr_num_buf, tr->tr_blocks);
print_symbol(KERN_WARNING "GFS2: Transaction created at: %s\n", tr->tr_ip);
}
if (gfs2_assert_withdraw(sdp, tr->tr_num_revoke <= tr->tr_revokes)) {
fs_err(sdp, "tr_num_revoke = %u, tr_revokes = %u ",
tr->tr_num_revoke, tr->tr_revokes);
print_symbol(KERN_WARNING "GFS2: Transaction created at: %s\n", tr->tr_ip);
}
nbuf = tr->tr_num_buf_new + tr->tr_num_databuf_new;
nbuf -= tr->tr_num_buf_rm;
nbuf -= tr->tr_num_databuf_rm;
if (gfs2_assert_withdraw(sdp, (nbuf <= tr->tr_blocks) &&
(tr->tr_num_revoke <= tr->tr_revokes)))
gfs2_print_trans(tr);
gfs2_log_commit(sdp, tr);
if (tr->tr_t_gh.gh_gl) {
......@@ -152,16 +158,16 @@ void gfs2_trans_add_bh(struct gfs2_glock *gl, struct buffer_head *bh, int meta)
gfs2_attach_bufdata(gl, bh, meta);
bd = bh->b_private;
}
lops_add(sdp, &bd->bd_le);
lops_add(sdp, bd);
}
void gfs2_trans_add_revoke(struct gfs2_sbd *sdp, struct gfs2_bufdata *bd)
{
BUG_ON(!list_empty(&bd->bd_le.le_list));
BUG_ON(!list_empty(&bd->bd_list));
BUG_ON(!list_empty(&bd->bd_ail_st_list));
BUG_ON(!list_empty(&bd->bd_ail_gl_list));
lops_init_le(&bd->bd_le, &gfs2_revoke_lops);
lops_add(sdp, &bd->bd_le);
lops_init_le(bd, &gfs2_revoke_lops);
lops_add(sdp, bd);
}
void gfs2_trans_add_unrevoke(struct gfs2_sbd *sdp, u64 blkno, unsigned int len)
......@@ -171,9 +177,9 @@ void gfs2_trans_add_unrevoke(struct gfs2_sbd *sdp, u64 blkno, unsigned int len)
unsigned int n = len;
gfs2_log_lock(sdp);
list_for_each_entry_safe(bd, tmp, &sdp->sd_log_le_revoke, bd_le.le_list) {
list_for_each_entry_safe(bd, tmp, &sdp->sd_log_le_revoke, bd_list) {
if ((bd->bd_blkno >= blkno) && (bd->bd_blkno < (blkno + len))) {
list_del_init(&bd->bd_le.le_list);
list_del_init(&bd->bd_list);
gfs2_assert_withdraw(sdp, sdp->sd_log_num_revoke);
sdp->sd_log_num_revoke--;
kmem_cache_free(gfs2_bufdata_cachep, bd);
......
......@@ -25,7 +25,8 @@ struct kmem_cache *gfs2_inode_cachep __read_mostly;
struct kmem_cache *gfs2_bufdata_cachep __read_mostly;
struct kmem_cache *gfs2_rgrpd_cachep __read_mostly;
struct kmem_cache *gfs2_quotad_cachep __read_mostly;
mempool_t *gfs2_bh_pool __read_mostly;
struct kmem_cache *gfs2_rsrv_cachep __read_mostly;
mempool_t *gfs2_page_pool __read_mostly;
void gfs2_assert_i(struct gfs2_sbd *sdp)
{
......
......@@ -152,7 +152,8 @@ extern struct kmem_cache *gfs2_inode_cachep;
extern struct kmem_cache *gfs2_bufdata_cachep;
extern struct kmem_cache *gfs2_rgrpd_cachep;
extern struct kmem_cache *gfs2_quotad_cachep;
extern mempool_t *gfs2_bh_pool;
extern struct kmem_cache *gfs2_rsrv_cachep;
extern mempool_t *gfs2_page_pool;
static inline unsigned int gfs2_tune_get_i(struct gfs2_tune *gt,
unsigned int *p)
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment