Commits · 6b82021b9e91cd689fdffadbcdb9a42597bbe764 · nexedi / linux

06 May, 2010 8 commits

ocfs2: increase the default size of local alloc windows · 6b82021b

Mark Fasheh authored Apr 05, 2010

I have observed that the current size of 8M gives us pretty poor
fragmentation on multi-threaded workloads which do lots of writes.

Generally, I can increase the size of local alloc windows and observe a
marked decrease in fragmentation, even up and beyond window sizes of 512
megabytes. This makes sense for a couple reasons - larger local alloc means
more room for reservation windows. On multi-node workloads the larger local
alloc helps as well because we don't have to do window slides as often.

Also, I removed the OCFS2_DEFAULT_LOCAL_ALLOC_SIZE constant as it is no
longer used and the comment above it was out of date.

To test fragmentation, I used a workload which launched 4 threads that did
4k writes into a series of about 140 alternating files.

With resv_level=2, and a 4k/4k file system I observed the following average
fragmentation for various localalloc= parameters:

localalloc=	avg. fragmentation
	8		48
	32		16
	64		10
	120		7

On larger cluster sizes, the difference is more dramatic.

The new default size top out at 256M, which we'll only get for cluster
sizes of 32K and above.
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

6b82021b

ocfs2: clean up localalloc mount option size parsing · 73c8a800

Mark Fasheh authored Apr 05, 2010

This patch pulls the local alloc sizing code into localalloc.c and provides
a callout to it from ocfs2_fill_super(). Behavior is essentially unchanged
except that I correctly calculate the maximum local alloc size. The old code
in ocfs2_parse_options() calculated the max size as:

ocfs2_local_alloc_size(sb) * 8

which is correct, in bits. Unfortunately though the option passed in is in
megabytes. Ultimately, this bug made no real difference - the shrink code
would catch a too-large size and bring it down to something reasonable.
Still, it's less than efficient as-is.
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

73c8a800

ocfs2: remove ocfs2_local_alloc_in_range() · a57c8fd2

Mark Fasheh authored Mar 16, 2010

Inodes are always allocated from the global bitmap now so we don't need this
any more. Also, the existing implementation bounces reservations around
needlessly.
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

a57c8fd2

ocfs2: allocate btree internal block groups from the global bitmap · 33d5d380

Mark Fasheh authored Feb 24, 2010

Otherwise, the need for a very large contiguous allocation tends to
wreak havoc on many inode allocation reservations on the local alloc, thus
ruining any chances for contiguousness.
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

33d5d380

ocfs2: use allocation reservations for directory data · e3b4a97d

Mark Fasheh authored Dec 07, 2009

Use the reservations system for unindexed dir tree allocations. We don't
bother with the indexed tree as reads from it are mostly random anyway.
Directory reservations are marked seperately, to allow the reservations code
a chance to optimize their window sizes. This patch allocates only 8 bits
for directory windows as they generally are not expected to grow as quickly
as file data. Future improvements to dir window sizing can trivially be
made.
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

e3b4a97d

ocfs2: use allocation reservations during file write · 4fe370af

Mark Fasheh authored Dec 07, 2009

Add a per-inode reservations structure and pass it through to the
reservations code.
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

4fe370af

ocfs2: allocation reservations · d02f00cc

Mark Fasheh authored Dec 07, 2009

This patch improves Ocfs2 allocation policy by allowing an inode to
reserve a portion of the local alloc bitmap for itself. The reserved
portion (allocation window) is advisory in that other allocation
windows might steal it if the local alloc bitmap becomes
full. Otherwise, the reservations are honored and guaranteed to be
free. When the local alloc window is moved to a different portion of
the bitmap, existing reservations are discarded.

Reservation windows are represented internally by a red-black
tree. Within that tree, each node represents the reservation window of
one inode. An LRU of active reservations is also maintained. When new
data is written, we allocate it from the inodes window. When all bits
in a window are exhausted, we allocate a new one as close to the
previous one as possible. Should we not find free space, an existing
reservation is pulled off the LRU and cannibalized.
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

d02f00cc

ocfs2: Make ocfs2_journal_dirty() void. · ec20cec7

Joel Becker authored Mar 19, 2010

jbd[2]_journal_dirty_metadata() only returns 0.  It's been returning 0
since before the kernel moved to git.  There is no point in checking
this error.

ocfs2_journal_dirty() has been faithfully returning the status since the
beginning.  All over ocfs2, we have blocks of code checking this can't
fail status.  In the past few years, we've tried to avoid adding these
checks, because they are pointless.  But anyone who looks at our code
assumes they are needed.

Finally, ocfs2_journal_dirty() is made a void function.  All error
checking is removed from other files.  We'll BUG_ON() the status of
jbd2_journal_dirty_metadata() just in case they change it someday.  They
won't.
Signed-off-by: Joel Becker <joel.becker@oracle.com>

ec20cec7

24 Mar, 2010 1 commit

ocfs2: Clear undo bits when local alloc is freed · b4414eea

Mark Fasheh authored Mar 11, 2010

When the local alloc file changes windows, unused bits are freed back to the
global bitmap. By defnition, those bits can not be in use by any file. Also,
the local alloc will never have been able to allocate those bits if they
were part of a previous truncate. Therefore it makes sense that we should
clear unused local alloc bits in the undo buffer so that they can be used
immediatly.

[ Modified to call it ocfs2_release_clusters() -- Joel ]
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

b4414eea

19 Mar, 2010 2 commits

ocfs2: Init meta_ac properly in ocfs2_create_empty_xattr_block. · b2317968

Tao Ma authored Mar 19, 2010

You can't store a pointer that you haven't filled in yet and expect it
to work.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

b2317968

ocfs2: Fix the update of name_offset when removing xattrs · dfe4d3d6

Tao Ma authored Mar 19, 2010

When replacing a xattr's value, in some case we wipe its name/value
first and then re-add it. The wipe is done by
ocfs2_xa_block_wipe_namevalue() when the xattr is in the inode or
block. We currently adjust name_offset for all the entries which have
(offset < name_offset). This does not adjust the entrie we're replacing.
Since we are replacing the entry, we don't adjust the total entry count.
When we calculate a new namevalue location, we trust the entries
now-wrong offset in ocfs2_xa_get_free_start().  The solution is to
also adjust the name_offset for the replaced entry, allowing
ocfs2_xa_get_free_start() to calculate the new namevalue location
correctly.

The following script can trigger a kernel panic easily.

echo 'y'|mkfs.ocfs2 --fs-features=local,xattr -b 4K $DEVICE
mount -t ocfs2 $DEVICE $MNT_DIR
FILE=$MNT_DIR/$RANDOM
for((i=0;i<76;i++))
do
string_76="a$string_76"
done
string_78="aa$string_76"
string_82="aaaa$string_78"

touch $FILE
setfattr -n 'user.test1234567890' -v $string_76 $FILE
setfattr -n 'user.test1234567890' -v $string_78 $FILE
setfattr -n 'user.test1234567890' -v $string_82 $FILE
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

dfe4d3d6

18 Mar, 2010 1 commit

ocfs2: Always try for maximum bits with new local alloc windows · b22b63eb

Mark Fasheh authored Mar 11, 2010

What we were doing before was to ask for the current window size as the
maximum allocation. This had the effect of limiting the amount of allocation
we could get for the local alloc during times when the window size was
shrunk due to fragmentation. In some cases, that could actually *increase*
fragmentation by artificially limiting the number of bits we can accept. So
while we still want to ask for a minimum number of bits equal to window
size, there is no reason why we should limit the number of bits the local
alloc should accept. Hence always allow the maximum number of local alloc
bits.
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

b22b63eb

17 Mar, 2010 4 commits

ocfs2: set i_mode on disk during acl operations · fcefd25a

Mark Fasheh authored Mar 15, 2010

ocfs2_set_acl() and ocfs2_init_acl() were setting i_mode on the in-memory
inode, but never setting it on the disk copy. Thus, acls were some times not
getting propagated between nodes. This patch fixes the issue by adding a
helper function ocfs2_acl_set_mode() which does this the right way.
ocfs2_set_acl() and ocfs2_init_acl() are then updated to call
ocfs2_acl_set_mode().
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

fcefd25a

ocfs2: Update i_blocks in reflink operations. · 6527f8f8

Tao Ma authored Mar 10, 2010

In reflink, we need to upate i_blocks for the target inode.
Reported-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

6527f8f8

ocfs2: Change bg_chain check for ocfs2_validate_gd_parent. · 78c37eb0

Tao Ma authored Mar 03, 2010

In ocfs2_validate_gd_parent, we check bg_chain against the
cl_next_free_rec of the dinode. Actually in resize, we have
the chance of bg_chain == cl_next_free_rec. So add some
additional condition check for it.

I also rename paramter "clean_error" to "resize", since the
old one is not clearly enough to indicate that we should only
meet with this case in resize.

btw, the correpsonding bug is
http://oss.oracle.com/bugzilla/show_bug.cgi?id=1230.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

78c37eb0

· ee860b6a

Sachin Prabhu authored Mar 10, 2010

[PATCH] Skip check for mandatory locks when unlocking

ocfs2_lock() will skip locks on file which has mode set to 02666. This
is a problem in cases where the mode of the file is changed after a
process has obtained a lock on the file.

ocfs2_lock() should skip the check for mandatory locks when unlocking a
file.
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

ee860b6a

15 Mar, 2010 21 commits

Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6 · a3d3203e

Linus Torvalds authored Mar 14, 2010

* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (34 commits)
  ACPI: processor: push file static MADT pointer into internal map_madt_entry()
  ACPI: processor: refactor internal map_lsapic_id()
  ACPI: processor: refactor internal map_x2apic_id()
  ACPI: processor: refactor internal map_lapic_id()
  ACPI: processor: driver doesn't need to evaluate _PDC
  ACPI: processor: remove early _PDC optin quirks
  ACPI: processor: add internal processor_physically_present()
  ACPI: processor: move acpi_get_cpuid into processor_core.c
  ACPI: processor: export acpi_get_cpuid()
  ACPI: processor: mv processor_pdc.c processor_core.c
  ACPI: processor: mv processor_core.c processor_driver.c
  ACPI: plan to delete "acpi=ht" boot option
  ACPI: remove "acpi=ht" DMI blacklist
  PNPACPI: add bus number support
  PNPACPI: add window support
  resource: add window support
  resource: add bus number support
  resource: expand IORESOURCE_TYPE_BITS to make room for bus resource type
  acpiphp: Execute ACPI _REG method for hotadded devices
  ACPI video: Be more liberal in validating _BQC behaviour
  ...

a3d3203e

init dynamic bin_attribute structures · f937331b

Wolfram Sang authored Mar 15, 2010

Commit 6992f533 ("sysfs: Use one lockdep
class per sysfs attribute.") introduced this requirement.  First, at25
was fixed manually.  Then, other occurences were found with coccinelle
and the following semantic patch.  Results were reviewed and fixed up:

    @ init @
    identifier struct_name, bin;
    @@

    	struct struct_name {
    		...
    		struct bin_attribute bin;
    		...
    	};

    @ main extends init @
    expression E;
    statement S;
    identifier name, err;
    @@

    (
    	struct struct_name *name;
    |
    -	struct struct_name *name = NULL;
    +	struct struct_name *name;
    )
    	...
    (
    	sysfs_bin_attr_init(&name->bin);
    |
    +	sysfs_bin_attr_init(&name->bin);
    	if (sysfs_create_bin_file(E, &name->bin))
    		S
    |
    +	sysfs_bin_attr_init(&name->bin);
    	err = sysfs_create_bin_file(E, &name->bin);
    )
Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

f937331b

Merge branches 'battery-2.6.34', 'bugzilla-10805', 'bugzilla-14668',... · ec28dcc6

Len Brown authored Mar 14, 2010

Merge branches 'battery-2.6.34', 'bugzilla-10805', 'bugzilla-14668', 'bugzilla-531916-power-state', 'ht-warn-2.6.34', 'pnp', 'processor-rename', 'sony-2.6.34', 'suse-bugzilla-531547', 'tz-check', 'video' and 'misc-2.6.34' into release

ec28dcc6

ACPI: processor: push file static MADT pointer into internal map_madt_entry() · 149fe9c2

Alex Chiang authored Feb 22, 2010

There's no real need for a pointer to the MADT to be global. The only
function who uses it is map_madt_entry.

This allows us to remove some more ugly #ifdefs.
Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>

149fe9c2

ACPI: processor: refactor internal map_lsapic_id() · eae701ce

Alex Chiang authored Feb 22, 2010

Un-nest the if statements for readability.

Remove comments that re-state the obvious.

Change the control flow so that we no longer need a temp variable.
Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>

eae701ce

ACPI: processor: refactor internal map_x2apic_id() · d6742095

Alex Chiang authored Feb 22, 2010

Untangle the nested if conditions to make this function look
more similar to the other map_*apic_id() functions.
Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>

d6742095

ACPI: processor: refactor internal map_lapic_id() · 11130736

Alex Chiang authored Feb 22, 2010

Untangle the if() statement a little for readability.
Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>

11130736

ACPI: processor: driver doesn't need to evaluate _PDC · d8191fa4

Alex Chiang authored Feb 22, 2010

Now that the early _PDC evaluation path knows how to correctly
evaluate _PDC on only physically present processors, there's no
need for the processor driver to evaluate it later when it loads.

To cover the hotplug case, push _PDC evaluation down into the
hotplug paths.

Cc: x86@kernel.org
Cc: Tony Luck <tony.luck@intel.com>
Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>

d8191fa4

ACPI: processor: remove early _PDC optin quirks · 3b1da4c5

Alex Chiang authored Feb 22, 2010

Now that we check for physically present processors before blindly
evaluating _PDC, we no longer need to maintain a DMI opt-in table
nor a kernel param.
Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>

3b1da4c5

ACPI: processor: add internal processor_physically_present() · 5d554a7b

Alex Chiang authored Feb 22, 2010

Detect if a processor is physically present before evaluating _PDC.

We want this because some BIOS will provide a _PDC even for processors
that are not present. These bogus _PDC methods then attempt to load
non-existent tables, which causes problems.

Avoid those bogus landmines.
Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>

5d554a7b

ACPI: processor: move acpi_get_cpuid into processor_core.c · 78ed8bd2

Alex Chiang authored Feb 22, 2010

Enumerating processors (via MADT/_MAT) belongs in the processor core,
which is always built-in, rather than living in the processor driver
which may not be built.
Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>

78ed8bd2

ACPI: processor: export acpi_get_cpuid() · 2e9d5e4e

Alex Chiang authored Feb 22, 2010

Rename static get_cpu_id() to acpi_get_cpuid() and export it.

This change also gives us an opportunity to remove the
#ifndef CONFIG_SMP from processor_driver.c and into a header file
where it properly belongs.
Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>

2e9d5e4e

ACPI: processor: mv processor_pdc.c processor_core.c · 4d5d4cd8

Alex Chiang authored Feb 22, 2010

We've renamed the old processor_core.c to processor_driver.c, to
convey the idea that it can be built modular and has driver-like
bits.

Now let's re-create a processor_core.c for the bits needed
statically by the rest of the kernel. The contents of processor_pdc.c
are a good starting spot, so let's just rename that file and
complete our three card monte.
Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>

4d5d4cd8

ACPI: processor: mv processor_core.c processor_driver.c · 0131aa3d

Alex Chiang authored Feb 22, 2010

The ACPI processor driver can be built as a module. But it has
pieces of code that should always be built statically into the
kernel.

The plan is for processor_core.c to contain the static bits while
processor_driver.c contains the module-like bits.

Since the bulk of the code in the current processor_core.c is
module-like, first step is to rename the file to processor_driver.c

Next step will re-create processor_core.c and cherry-pick out
the static bits.
Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>

0131aa3d

ACPI: plan to delete "acpi=ht" boot option · 4c81ba49
Len Brown authored Mar 14, 2010
```
Signed-off-by: Len Brown <len.brown@intel.com>
```
4c81ba49

ACPI: remove "acpi=ht" DMI blacklist · 8144c880

Len Brown authored Feb 18, 2010

SuSE added these entries when deploying ACPI in Linux-2.4.
I pulled them into Linux-2.6 on 2003-08-09.
Over the last 6+ years, several entries have proven to be
unnecessary and deleted, while no new entries have been added.
Matthew suggests that they now have negative value, and I agree.
Based-on-patch-by: Matthew Garrett <mjg59@srcf.ucam.org>
Signed-off-by: Len Brown <len.brown@intel.com>

8144c880

PNPACPI: add bus number support · 7e0e9c04

Bjorn Helgaas authored Mar 05, 2010

Add support for bus number resources.  This is for bridges with a range of
bus numbers behind them.  Previously, PNP ignored bus number resources.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>

7e0e9c04

PNPACPI: add window support · fa35b492

Bjorn Helgaas authored Mar 05, 2010

Add support for resource windows.  This is for bridge resources, i.e.,
regions where a bridge forwards transactions from the primary to the
secondary side.  This does not add support for *setting* windows via
the /proc interface.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>

fa35b492

resource: add window support · 9d7cca04

Bjorn Helgaas authored Mar 05, 2010

Add support for resource windows.  This is for bridge resources, i.e.,
regions where a bridge forwards transactions from the primary to the
secondary side.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>

9d7cca04

resource: add bus number support · 0f4050c7

Bjorn Helgaas authored Mar 05, 2010

Add support for bus number resources.  This is for bridges with a range of
bus numbers behind them.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>

0f4050c7

resource: expand IORESOURCE_TYPE_BITS to make room for bus resource type · cd7e9fcd

Bjorn Helgaas authored Mar 05, 2010

No functional change; this just makes room for another resource type.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>

cd7e9fcd

14 Mar, 2010 3 commits

tomoyo: fix potential use after free · 181427a7

Dan Carpenter authored Mar 13, 2010

The original code returns a freed pointer.  This function is expected to
return NULL on errors.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: James Morris <jmorris@namei.org>

181427a7

acpiphp: Execute ACPI _REG method for hotadded devices · d0607050

Shaohua Li authored Feb 25, 2010

Per ACPI spec, _ERG method should be executed before device driver
gets control for hotpluged device. Firmware might do some configuration
there. See http://bugzilla.kernel.org/show_bug.cgi?id=10805. In this
machine, _REG method of docked device will configure cardbus bridge.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Tested-by: Paul Martin <pm@debian.org>
Signed-off-by: Len Brown <len.brown@intel.com>

d0607050

ACPI video: Be more liberal in validating _BQC behaviour · 70287db8

Matthew Garrett authored Feb 16, 2010

Right now, if _BQC returns a value we don't understand we immediately
invalidate it. Change this behaviour so we only invalidate it if it
continues to give an invalid answer after we've already set a brightness.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Acked-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

70287db8