Merge http://linux.bkbits.net/linux-2.5

into lab343.munich.sgi.com:/home/hch/repo/bkbits/linux-2.5

Merge http://linux.bkbits.net/linux-2.5
into lab343.munich.sgi.com:/home/hch/repo/bkbits/linux-2.5
cfd2e1af · Christoph Hellwig · 1142fa81 · 9a3e1a96 · cfd2e1af · cfd2e1af
Commit cfd2e1af authored Feb 06, 2003 by Christoph Hellwig
41 changed files
--- a/Documentation/DocBook/journal-api.tmpl
+++ b/Documentation/DocBook/journal-api.tmpl
@@ -141,17 +141,14 @@ you are have done so you need to call journal_dirty_{meta,}data().
 Or if you've asked for access to a buffer you now know is now longer 
 required to be pushed back on the device you can call journal_forget()
 in much the same way as you might have used bforget() in the past.
-
 </para>

-
-
 <para>
 A journal_flush() may be called at any time to commit and checkpoint
 all your transactions.
 </para>
-<para>

+<para>
 Then at umount time , in your put_super() (2.4) or write_super() (2.5)
 you can then call journal_destroy() to clean up your in-core journal object.
 </para>
@@ -168,8 +165,8 @@ on another journal. Since transactions can't be nested/batched
 across differing journals, and another filesystem other than
 yours (say ext3) may be modified in a later syscall.
 </para>
-<para>

+<para>
 The second case to bear in mind is that journal_start() can 
 block if there isn't enough space in the journal for your transaction 
 (based on the passed nblocks param) - when it blocks it merely(!) needs to
@@ -180,10 +177,14 @@ were semaphores and include them in your semaphore ordering rules to prevent
 deadlocks. Note that journal_extend() has similar blocking behaviour to
 journal_start() so you can deadlock here just as easily as on journal_start().
 </para>
-<para>

-Try to reserve the right number of blocks the first time. ;-).
+<para>
+Try to reserve the right number of blocks the first time. ;-). This will
+be the maximum number of blocks you are going to touch in this transaction.
+I advise having a look at at least ext3_jbd.h to see the basis on which 
+ext3 uses to make these decisions.
 </para>
+
 <para>
 Another wriggle to watch out for is your on-disk block allocation strategy.
 why? Because, if you undo a delete, you need to ensure you haven't reused any
@@ -211,6 +212,30 @@ The opportunities for abuse and DOS attacks with this should be obvious,
 if you allow unprivileged userspace to trigger codepaths containing these
 calls.
 </para>
+
+<para>
+A new feature of jbd since 2.5.25 is commit callbacks with the new
+journal_callback_set() function you can now ask the journalling layer
+to call you back when the transaction is finally commited to disk, so that
+you can do some of your own management. The key to this is the journal_callback
+struct, this maintains the internal callback information but you can
+extend it like this:-
+</para>
+<programlisting>
+	struct  myfs_callback_s {
+		//Data structure element required by jbd..
+		struct journal_callback for_jbd;
+		// Stuff for myfs allocated together.
+		myfs_inode*    i_commited;
+	
+	}
+</programlisting>
+
+<para>
+this would be useful if you needed to know when data was commited to a 
+particular inode.
+</para>
+
 </sect1>

 <sect1>

--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
-November 2002             Kernel Parameters                     v2.5.49
+February 2003             Kernel Parameters                     v2.5.59
                          ~~~~~~~~~~~~~~~~~

 The following is a consolidated list of the kernel parameters as implemented
@@ -60,6 +60,7 @@ restrictions referred to are that the relevant option is valid if:
 	V4L	Video For Linux support is enabled.
 	VGA	The VGA console has been enabled.
 	VT	Virtual terminal support is enabled.
+	WDT	Watchdog support is enabled.
 	XT	IBM PC/XT MFM hard disk support is enabled.

 In addition, the following text indicates that the option:
@@ -98,6 +99,9 @@ running once the system is up.
 	advansys=	[HW,SCSI]
 			See header of drivers/scsi/advansys.c.

+	advwdt=		[HW,WDT] Advantech WDT
+			Format: <iostart>,<iostop>
+
 	aedsp16=	[HW,OSS] Audio Excel DSP 16
 			Format: <io>,<irq>,<dma>,<mss_io>,<mpu_io>,<mpu_irq>
 			See also header of sound/oss/aedsp16.c.
@@ -111,6 +115,9 @@ running once the system is up.
 	aic7xxx=	[HW,SCSI]
 			See Documentation/scsi/aic7xxx.txt.

+	aic79xx=	[HW,SCSI]
+			See Documentation/scsi/aic79xx.txt.
+
 	allowdma0	[ISAPNP]

 	AM53C974=	[HW,SCSI]
@@ -231,19 +238,11 @@ running once the system is up.
 	cs89x0_media=	[HW,NET]
 			Format: { rj45 | aui | bnc }
 
-	ctc=		[HW,NET]
-			See drivers/s390/net/ctcmain.c, comment before function
-			ctc_setup().
- 
 	cyclades=	[HW,SERIAL] Cyclades multi-serial port adapter.
 
 	dasd=		[HW,NET]    
 			See header of drivers/s390/block/dasd_devmap.c.

-	dasd_discipline=
-			[HW,NET]
-			See header of drivers/s390/block/dasd.c.
-
 	db9=		[HW,JOY]
 	db9_2=
 	db9_3=
@@ -254,9 +253,6 @@ running once the system is up.
 			Format: <area>[,<node>]
 			See also Documentation/networking/decnet.txt.

-	decr_overclock= [PPC]
-	decr_overclock_proc0=
-
 	devfs=		[DEVFS]
 			See Documentation/filesystems/devfs/boot-options.
 
@@ -305,6 +301,9 @@ running once the system is up.
 			This option is obsoleted by the "netdev=" option, which
 			has equivalent usage. See its documentation for details.

+	eurwdt=		[HW,WDT] Eurotech CPU-1220/1410 onboard watchdog.
+			Format: <io>[,<irq>]
+
 	fd_mcs=		[HW,SCSI]
 			See header of drivers/scsi/fd_mcs.c.

@@ -350,7 +349,9 @@ running once the system is up.
 	hisax=		[HW,ISDN]
 			See Documentation/isdn/README.HiSax.

-	hugepages=	[HW,IA-32] Maximal number of HugeTLB pages
+	hugepages=	[HW,IA-32,IA-64] Maximal number of HugeTLB pages.
+
+	noirqbalance	[IA-32,SMP,KNL] Disable kernel irq balancing

 	i8042_direct	[HW] Non-translated mode
 	i8042_dumbkbd
@@ -394,6 +395,10 @@ running once the system is up.

 	inttest=	[IA64]

+	io7=		[HW] IO7 for Marvel based alpha systems
+			See comment before marvel_specify_io7 in
+			arch/alpha/kernel/core_marvel.c.
+
 	ip=		[IP_PNP]
 			See Documentation/nfsroot.txt.

@@ -495,6 +500,7 @@ running once the system is up.
 
 	mdacon=		[MDA]
 			Format: <first>,<last>
+			Specifies range of consoles to be captured by the MDA.
 
 	mem=exactmap	[KNL,BOOT,IA-32] Enable setting of an exact
 			E820 memory map, as specified by the user.
@@ -576,6 +582,8 @@ running once the system is up.
 
 	nodisconnect	[HW,SCSI,M68K] Disables SCSI disconnects.

+	noexec		[IA-64]
+
 	nofxsr		[BUGS=IA-32]

 	nohighio	[BUGS=IA-32] Disable highmem block I/O.
@@ -599,7 +607,9 @@ running once the system is up.

 	noresume	[SWSUSP] Disables resume and restore original swap space.
 
-	no-scroll	[VGA]
+	no-scroll	[VGA] Disables scrollback.
+			This is required for the Braillex ib80-piezo Braille
+			reader made by F.H. Papenmeier (Germany).

 	nosbagart	[IA-64]

@@ -809,6 +819,9 @@ running once the system is up.
 			See a comment before function sbpcd_setup() in
 			drivers/cdrom/sbpcd.c.

+	sc1200wdt=	[HW,WDT] SC1200 WDT (watchdog) driver
+			Format: <io>[,<timeout>[,<isapnp>]]
+
 	scsi_debug_*=	[SCSI]
 			See drivers/scsi/scsi_debug.c.

@@ -997,9 +1010,6 @@ running once the system is up.
 	spia_pedr=
 	spia_peddr=

-	spread_lpevents=
-			[PPC]
-
 	sscape=		[HW,OSS]
 			Format: <io>,<irq>,<dma>,<mpu_io>,<mpu_irq>
 
@@ -1009,6 +1019,19 @@ running once the system is up.
 	st0x=		[HW,SCSI]
 			See header of drivers/scsi/seagate.c.

+	sti=		[HW]
+			Format: <num>
+			Set the STI (builtin display/keyboard on the HP-PARISC
+			machines) console (graphic card) which should be used
+			as the initial boot-console.
+			See also comment in drivers/video/console/sticore.c.
+
+	sti_font=	[HW]
+			See comment in drivers/video/console/sticore.c.
+
+	stifb=		[HW]
+			Format: bpp:<bpp1>[:<bpp2>[:<bpp3>...]]
+
 	stram_swap=	[HW,M68k]

 	swiotlb=	[IA-64] Number of I/O TLB slabs
@@ -1079,7 +1102,7 @@ running once the system is up.
 	wd7000=		[HW,SCSI]
 			See header of drivers/scsi/wd7000.c.

-	wdt=		[HW] Watchdog
+	wdt=		[WDT] Watchdog
 			See Documentation/watchdog.txt.

 	xd=		[HW,XT] Original XT pre-IDE (RLL encoded) disks.

--- a/arch/i386/kernel/io_apic.c
+++ b/arch/i386/kernel/io_apic.c
--- a/arch/i386/kernel/process.c
+++ b/arch/i386/kernel/process.c
@@ -86,7 +86,7 @@ void enable_hlt(void)
 */
 void default_idle(void)
 {
-	if (current_cpu_data.hlt_works_ok && !hlt_counter) {
+	if (!hlt_counter && current_cpu_data.hlt_works_ok) {
 		local_irq_disable();
 		if (!need_resched())
 			safe_halt();

--- a/arch/i386/mm/hugetlbpage.c
+++ b/arch/i386/mm/hugetlbpage.c
@@ -26,7 +26,6 @@ static long    htlbpagemem;
 int     htlbpage_max;
 static long    htlbzone_pages;

-struct vm_operations_struct hugetlb_vm_ops;
 static LIST_HEAD(htlbpage_freelist);
 static spinlock_t htlbpage_lock = SPIN_LOCK_UNLOCKED;

@@ -46,6 +45,7 @@ static struct page *alloc_hugetlb_page(void)
 	htlbpagemem--;
 	spin_unlock(&htlbpage_lock);
 	set_page_count(page, 1);
+	page->lru.prev = (void *)huge_page_release;
 	for (i = 0; i < (HPAGE_SIZE/PAGE_SIZE); ++i)
 		clear_highpage(&page[i]);
 	return page;
@@ -134,6 +134,7 @@ follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		page = pte_page(pte);
 		if (pages) {
 			page += ((start & ~HPAGE_MASK) >> PAGE_SHIFT);
+			get_page(page);
 			pages[i] = page;
 		}
 		if (vmas)
@@ -150,6 +151,82 @@ follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	return i;
 }

+#if 0	/* This is just for testing */
+struct page *
+follow_huge_addr(struct mm_struct *mm,
+	struct vm_area_struct *vma, unsigned long address, int write)
+{
+	unsigned long start = address;
+	int length = 1;
+	int nr;
+	struct page *page;
+
+	nr = follow_hugetlb_page(mm, vma, &page, NULL, &start, &length, 0);
+	if (nr == 1)
+		return page;
+	return NULL;
+}
+
+/*
+ * If virtual address `addr' lies within a huge page, return its controlling
+ * VMA, else NULL.
+ */
+struct vm_area_struct *hugepage_vma(struct mm_struct *mm, unsigned long addr)
+{
+	if (mm->used_hugetlb) {
+		struct vm_area_struct *vma = find_vma(mm, addr);
+		if (vma && is_vm_hugetlb_page(vma))
+			return vma;
+	}
+	return NULL;
+}
+
+int pmd_huge(pmd_t pmd)
+{
+	return 0;
+}
+
+struct page *
+follow_huge_pmd(struct mm_struct *mm, unsigned long address,
+		pmd_t *pmd, int write)
+{
+	return NULL;
+}
+
+#else
+
+struct page *
+follow_huge_addr(struct mm_struct *mm,
+	struct vm_area_struct *vma, unsigned long address, int write)
+{
+	return NULL;
+}
+
+struct vm_area_struct *hugepage_vma(struct mm_struct *mm, unsigned long addr)
+{
+	return NULL;
+}
+
+int pmd_huge(pmd_t pmd)
+{
+	return !!(pmd_val(pmd) & _PAGE_PSE);
+}
+
+struct page *
+follow_huge_pmd(struct mm_struct *mm, unsigned long address,
+		pmd_t *pmd, int write)
+{
+	struct page *page;
+
+	page = pte_page(*(pte_t *)pmd);
+	if (page) {
+		page += ((address & ~HPAGE_MASK) >> PAGE_SHIFT);
+		get_page(page);
+	}
+	return page;
+}
+#endif
+
 void free_huge_page(struct page *page)
 {
 	BUG_ON(page_count(page));
@@ -171,7 +248,8 @@ void huge_page_release(struct page *page)
 	free_huge_page(page);
 }

-void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start, unsigned long end)
+void unmap_hugepage_range(struct vm_area_struct *vma,
+		unsigned long start, unsigned long end)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long address;
@@ -181,8 +259,6 @@ void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start, unsig
 	BUG_ON(start & (HPAGE_SIZE - 1));
 	BUG_ON(end & (HPAGE_SIZE - 1));

-	spin_lock(&htlbpage_lock);
-	spin_unlock(&htlbpage_lock);
 	for (address = start; address < end; address += HPAGE_SIZE) {
 		pte = huge_pte_offset(mm, address);
 		if (pte_none(*pte))
@@ -195,7 +271,9 @@ void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start, unsig
 	flush_tlb_range(vma, start, end);
 }

-void zap_hugepage_range(struct vm_area_struct *vma, unsigned long start, unsigned long length)
+void
+zap_hugepage_range(struct vm_area_struct *vma,
+		unsigned long start, unsigned long length)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	spin_lock(&mm->page_table_lock);
@@ -206,6 +284,7 @@ void zap_hugepage_range(struct vm_area_struct *vma, unsigned long start, unsigne
 int hugetlb_prefault(struct address_space *mapping, struct vm_area_struct *vma)
 {
 	struct mm_struct *mm = current->mm;
+	struct inode *inode = mapping->host;
 	unsigned long addr;
 	int ret = 0;

@@ -229,6 +308,7 @@ int hugetlb_prefault(struct address_space *mapping, struct vm_area_struct *vma)
 			+ (vma->vm_pgoff >> (HPAGE_SHIFT - PAGE_SHIFT));
 		page = find_get_page(mapping, idx);
 		if (!page) {
+			loff_t i_size;
 			page = alloc_hugetlb_page();
 			if (!page) {
 				ret = -ENOMEM;
@@ -240,6 +320,9 @@ int hugetlb_prefault(struct address_space *mapping, struct vm_area_struct *vma)
 				free_huge_page(page);
 				goto out;
 			}
+			i_size = (loff_t)(idx + 1) * HPAGE_SIZE;
+			if (i_size > inode->i_size)
+				inode->i_size = i_size;
 		}
 		set_huge_pte(mm, vma, page, pte, vma->vm_flags & VM_WRITE);
 	}
@@ -298,8 +381,8 @@ int try_to_free_low(int count)

 int set_hugetlb_mem_size(int count)
 {
-	int j, lcount;
-	struct page *page, *map;
+	int lcount;
+	struct page *page;
 	extern long htlbzone_pages;
 	extern struct list_head htlbpage_freelist;

@@ -315,11 +398,6 @@ int set_hugetlb_mem_size(int count)
 			page = alloc_pages(__GFP_HIGHMEM, HUGETLB_PAGE_ORDER);
 			if (page == NULL)
 				break;
-			map = page;
-			for (j = 0; j < (HPAGE_SIZE / PAGE_SIZE); j++) {
-				SetPageReserved(map);
-				map++;
-			}
 			spin_lock(&htlbpage_lock);
 			list_add(&page->list, &htlbpage_freelist);
 			htlbpagemem++;
@@ -341,7 +419,8 @@ int set_hugetlb_mem_size(int count)
 	return (int) htlbzone_pages;
 }

-int hugetlb_sysctl_handler(ctl_table *table, int write, struct file *file, void *buffer, size_t *length)
+int hugetlb_sysctl_handler(ctl_table *table, int write,
+		struct file *file, void *buffer, size_t *length)
 {
 	proc_dointvec(table, write, file, buffer, length);
 	htlbpage_max = set_hugetlb_mem_size(htlbpage_max);
@@ -358,15 +437,13 @@ __setup("hugepages=", hugetlb_setup);

 static int __init hugetlb_init(void)
 {
-	int i, j;
+	int i;
 	struct page *page;

 	for (i = 0; i < htlbpage_max; ++i) {
 		page = alloc_pages(__GFP_HIGHMEM, HUGETLB_PAGE_ORDER);
 		if (!page)
 			break;
-		for (j = 0; j < HPAGE_SIZE/PAGE_SIZE; ++j)
-			SetPageReserved(&page[j]);
 		spin_lock(&htlbpage_lock);
 		list_add(&page->list, &htlbpage_freelist);
 		spin_unlock(&htlbpage_lock);
@@ -395,7 +472,14 @@ int is_hugepage_mem_enough(size_t size)
 	return 1;
 }

-static struct page *hugetlb_nopage(struct vm_area_struct * area, unsigned long address, int unused)
+/*
+ * We cannot handle pagefaults against hugetlb pages at all.  They cause
+ * handle_mm_fault() to try to instantiate regular-sized pages in the
+ * hugegpage VMA.  do_page_fault() is supposed to trap this, so BUG is we get
+ * this far.
+ */
+static struct page *hugetlb_nopage(struct vm_area_struct *vma,
+				unsigned long address, int unused)
 {
 	BUG();
 	return NULL;

--- a/arch/ia64/mm/hugetlbpage.c
+++ b/arch/ia64/mm/hugetlbpage.c
@@ -18,7 +18,6 @@
 #include <asm/tlb.h>
 #include <asm/tlbflush.h>

-static struct vm_operations_struct hugetlb_vm_ops;
 struct list_head htlbpage_freelist;
 spinlock_t htlbpage_lock = SPIN_LOCK_UNLOCKED;
 extern long htlbpagemem;
@@ -227,6 +226,7 @@ follow_hugetlb_page (struct mm_struct *mm, struct vm_area_struct *vma,
 		page = pte_page(pte);
 		if (pages) {
 			page += ((start & ~HPAGE_MASK) >> PAGE_SHIFT);
+			get_page(page);
 			pages[i] = page;
 		}
 		if (vmas)
@@ -303,11 +303,6 @@ set_hugetlb_mem_size (int count)
 			page = alloc_pages(__GFP_HIGHMEM, HUGETLB_PAGE_ORDER);
 			if (page == NULL)
 				break;
-			map = page;
-			for (j = 0; j < (HPAGE_SIZE / PAGE_SIZE); j++) {
-				SetPageReserved(map);
-				map++;
-			}
 			spin_lock(&htlbpage_lock);
 			list_add(&page->list, &htlbpage_freelist);
 			htlbpagemem++;
@@ -327,7 +322,7 @@ set_hugetlb_mem_size (int count)
 		map = page;
 		for (j = 0; j < (HPAGE_SIZE / PAGE_SIZE); j++) {
 			map->flags &= ~(1 << PG_locked | 1 << PG_error | 1 << PG_referenced |
-					1 << PG_dirty | 1 << PG_active | 1 << PG_reserved |
+					1 << PG_dirty | 1 << PG_active |
 					1 << PG_private | 1<< PG_writeback);
 			map++;
 		}
@@ -337,6 +332,14 @@ set_hugetlb_mem_size (int count)
 	return (int) htlbzone_pages;
 }

+static struct page *
+hugetlb_nopage(struct vm_area_struct *vma, unsigned long address, int unused)
+{
+	BUG();
+	return NULL;
+}
+
 static struct vm_operations_struct hugetlb_vm_ops = {
-	.close =	zap_hugetlb_resources
+	.nopage =	hugetlb_nopage,
+	.close =	zap_hugetlb_resources,
 };
--- a/arch/sparc64/mm/hugetlbpage.c
+++ b/arch/sparc64/mm/hugetlbpage.c
@@ -288,6 +288,7 @@ int follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		page = pte_page(pte);
 		if (pages) {
 			page += ((start & ~HPAGE_MASK) >> PAGE_SHIFT);
+			get_page(page);
 			pages[i] = page;
 		}
 		if (vmas)
@@ -584,11 +585,6 @@ int set_hugetlb_mem_size(int count)
 			page = alloc_pages(GFP_ATOMIC, HUGETLB_PAGE_ORDER);
 			if (page == NULL)
 				break;
-			map = page;
-			for (j = 0; j < (HPAGE_SIZE / PAGE_SIZE); j++) {
-				SetPageReserved(map);
-				map++;
-			}
 			spin_lock(&htlbpage_lock);
 			list_add(&page->list, &htlbpage_freelist);
 			htlbpagemem++;
@@ -613,7 +609,6 @@ int set_hugetlb_mem_size(int count)
 			map->flags &= ~(1UL << PG_locked | 1UL << PG_error |
 					1UL << PG_referenced |
 					1UL << PG_dirty | 1UL << PG_active |
-					1UL << PG_reserved |
 					1UL << PG_private | 1UL << PG_writeback);
 			set_page_count(page, 0);
 			map++;
@@ -624,6 +619,14 @@ int set_hugetlb_mem_size(int count)
 	return (int) htlbzone_pages;
 }

+static struct page *
+hugetlb_nopage(struct vm_area_struct *vma, unsigned long address, int unused)
+{
+	BUG();
+	return NULL;
+}
+
 static struct vm_operations_struct hugetlb_vm_ops = {
+	.nopage = hugetlb_nopage,
 	.close	= zap_hugetlb_resources,
 };
--- a/arch/x86_64/mm/hugetlbpage.c
+++ b/arch/x86_64/mm/hugetlbpage.c
@@ -25,7 +25,6 @@ static long    htlbpagemem;
 int     htlbpage_max;
 static long    htlbzone_pages;

-struct vm_operations_struct hugetlb_vm_ops;
 static LIST_HEAD(htlbpage_freelist);
 static spinlock_t htlbpage_lock = SPIN_LOCK_UNLOCKED;

@@ -134,6 +133,7 @@ follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		page = pte_page(pte);
 		if (pages) {
 			page += ((start & ~HPAGE_MASK) >> PAGE_SHIFT);
+			get_page(page);
 			pages[i] = page;
 		}
 		if (vmas)
@@ -204,6 +204,7 @@ void zap_hugepage_range(struct vm_area_struct *vma, unsigned long start, unsigne
 int hugetlb_prefault(struct address_space *mapping, struct vm_area_struct *vma)
 {
 	struct mm_struct *mm = current->mm;
+	struct inode = mapping->host;
 	unsigned long addr;
 	int ret = 0;

@@ -227,6 +228,8 @@ int hugetlb_prefault(struct address_space *mapping, struct vm_area_struct *vma)
 			+ (vma->vm_pgoff >> (HPAGE_SHIFT - PAGE_SHIFT));
 		page = find_get_page(mapping, idx);
 		if (!page) {
+			loff_t i_size;
+
 			page = alloc_hugetlb_page();
 			if (!page) {
 				ret = -ENOMEM;
@@ -238,6 +241,9 @@ int hugetlb_prefault(struct address_space *mapping, struct vm_area_struct *vma)
 				free_huge_page(page);
 				goto out;
 			}
+			i_size = (loff_t)(idx + 1) * HPAGE_SIZE;
+			if (i_size > inode->i_size)
+				inode->i_size = i_size;
 		}
 		set_huge_pte(mm, vma, page, pte, vma->vm_flags & VM_WRITE);
 	}
@@ -263,11 +269,6 @@ int set_hugetlb_mem_size(int count)
 			page = alloc_pages(__GFP_HIGHMEM, HUGETLB_PAGE_ORDER);
 			if (page == NULL)
 				break;
-			map = page;
-			for (j = 0; j < (HPAGE_SIZE / PAGE_SIZE); j++) {
-				SetPageReserved(map);
-				map++;
-			}
 			spin_lock(&htlbpage_lock);
 			list_add(&page->list, &htlbpage_freelist);
 			htlbpagemem++;
@@ -286,8 +287,9 @@ int set_hugetlb_mem_size(int count)
 		spin_unlock(&htlbpage_lock);
 		map = page;
 		for (j = 0; j < (HPAGE_SIZE / PAGE_SIZE); j++) {
-			map->flags &= ~(1 << PG_locked | 1 << PG_error | 1 << PG_referenced |
-					1 << PG_dirty | 1 << PG_active | 1 << PG_reserved |
+			map->flags &= ~(1 << PG_locked | 1 << PG_error |
+					1 << PG_referenced |
+					1 << PG_dirty | 1 << PG_active |
 					1 << PG_private | 1<< PG_writeback);
 			set_page_count(map, 0);
 			map++;
@@ -346,7 +348,8 @@ int hugetlb_report_meminfo(char *buf)
 			HPAGE_SIZE/1024);
 }

-static struct page * hugetlb_nopage(struct vm_area_struct * area, unsigned long address, int unused)
+static struct page *
+hugetlb_nopage(struct vm_area_struct *vma, unsigned long address, int unused)
 {
 	BUG();
 	return NULL;

--- a/drivers/block/ll_rw_blk.c
+++ b/drivers/block/ll_rw_blk.c
@@ -27,6 +27,8 @@
 #include <linux/completion.h>
 #include <linux/slab.h>

+static void blk_unplug_work(void *data);
+
 /*
 * For the allocated request tables
 */
@@ -237,6 +239,14 @@ void blk_queue_make_request(request_queue_t * q, make_request_fn * mfn)
 	blk_queue_hardsect_size(q, 512);
 	blk_queue_dma_alignment(q, 511);

+	q->unplug_thresh = 4;		/* hmm */
+	q->unplug_delay = (3 * HZ) / 1000;	/* 3 milliseconds */
+	if (q->unplug_delay == 0)
+		q->unplug_delay = 1;
+
+	init_timer(&q->unplug_timer);
+	INIT_WORK(&q->unplug_work, blk_unplug_work, q);
+
 	/*
 	 * by default assume old behaviour and bounce for any highmem page
 	 */
@@ -960,6 +970,7 @@ void blk_plug_device(request_queue_t *q)
 	if (!blk_queue_plugged(q)) {
 		spin_lock(&blk_plug_lock);
 		list_add_tail(&q->plug_list, &blk_plug_list);
+		mod_timer(&q->unplug_timer, jiffies + q->unplug_delay);
 		spin_unlock(&blk_plug_lock);
 	}
 }
@@ -974,6 +985,7 @@ int blk_remove_plug(request_queue_t *q)
 	if (blk_queue_plugged(q)) {
 		spin_lock(&blk_plug_lock);
 		list_del_init(&q->plug_list);
+		del_timer(&q->unplug_timer);
 		spin_unlock(&blk_plug_lock);
 		return 1;
 	}
@@ -992,6 +1004,8 @@ static inline void __generic_unplug_device(request_queue_t *q)
 	if (test_bit(QUEUE_FLAG_STOPPED, &q->queue_flags))
 		return;

+	del_timer(&q->unplug_timer);
+
 	/*
 	 * was plugged, fire request_fn if queue has stuff to do
 	 */
@@ -1020,6 +1034,18 @@ void generic_unplug_device(void *data)
 	spin_unlock_irq(q->queue_lock);
 }

+static void blk_unplug_work(void *data)
+{
+	generic_unplug_device(data);
+}
+
+static void blk_unplug_timeout(unsigned long data)
+{
+	request_queue_t *q = (request_queue_t *)data;
+
+	schedule_work(&q->unplug_work);
+}
+
 /**
 * blk_start_queue - restart a previously stopped queue
 * @q:    The &request_queue_t in question
@@ -1164,6 +1190,9 @@ void blk_cleanup_queue(request_queue_t * q)
 	count -= __blk_cleanup_queue(&q->rq[READ]);
 	count -= __blk_cleanup_queue(&q->rq[WRITE]);

+	del_timer_sync(&q->unplug_timer);
+	flush_scheduled_work();
+
 	if (count)
 		printk("blk_cleanup_queue: leaked requests (%d)\n", count);

@@ -1269,6 +1298,9 @@ int blk_init_queue(request_queue_t *q, request_fn_proc *rfn, spinlock_t *lock)
 	blk_queue_make_request(q, __make_request);
 	blk_queue_max_segment_size(q, MAX_SEGMENT_SIZE);

+	q->unplug_timer.function = blk_unplug_timeout;
+	q->unplug_timer.data = (unsigned long)q;
+
 	blk_queue_max_hw_segments(q, MAX_HW_SEGMENTS);
 	blk_queue_max_phys_segments(q, MAX_PHYS_SEGMENTS);

@@ -1811,7 +1843,15 @@ static int __make_request(request_queue_t *q, struct bio *bio)
 out:
 	if (freereq)
 		__blk_put_request(q, freereq);
+
+	if (blk_queue_plugged(q)) {
+		int nr_queued = (queue_nr_requests - q->rq[0].count) +
+				(queue_nr_requests - q->rq[1].count);
+		if (nr_queued == q->unplug_thresh)
+			__generic_unplug_device(q);
+	}
 	spin_unlock_irq(q->queue_lock);
+
 	return 0;

 end_io:

--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -350,15 +350,10 @@ static int do_bio_filebacked(struct loop_device *lo, struct bio *bio)
 	int ret;

 	pos = ((loff_t) bio->bi_sector << 9) + lo->lo_offset;
-
-	do {
 	if (bio_rw(bio) == WRITE)
 		ret = lo_send(lo, bio, lo->lo_blocksize, pos);
 	else
 		ret = lo_receive(lo, bio, lo->lo_blocksize, pos);
-
-	} while (++bio->bi_idx < bio->bi_vcnt);
-
 	return ret;
 }


--- a/drivers/media/video/Kconfig
+++ b/drivers/media/video/Kconfig
@@ -19,7 +19,7 @@ comment "Video Adapters"

 config VIDEO_BT848
 	tristate "BT848 Video For Linux"
-	depends on VIDEO_DEV && PCI && I2C_ALGOBIT
+	depends on VIDEO_DEV && PCI && I2C_ALGOBIT && SOUND
 	---help---
 	  Support for BT848 based frame grabber/overlay boards. This includes
 	  the Miro, Hauppauge and STB boards. Please read the material in

--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -127,9 +127,10 @@ void __wait_on_buffer(struct buffer_head * bh)
 	get_bh(bh);
 	do {
 		prepare_to_wait(wqh, &wait, TASK_UNINTERRUPTIBLE);
+		if (buffer_locked(bh)) {
 			blk_run_queues();
-		if (buffer_locked(bh))
 			io_schedule();
+		}
 	} while (buffer_locked(bh));
 	put_bh(bh);
 	finish_wait(wqh, &wait);
@@ -959,8 +960,6 @@ create_buffers(struct page * page, unsigned long size, int retry)
 	 * the reserve list is empty, we're sure there are 
 	 * async buffer heads in use.
 	 */
-	blk_run_queues();
-
 	free_more_memory();
 	goto try_again;
 }

--- a/fs/exec.c
+++ b/fs/exec.c
@@ -300,6 +300,8 @@ void put_dirty_page(struct task_struct * tsk, struct page *page, unsigned long a

 	pgd = pgd_offset(tsk->mm, address);
 	pte_chain = pte_chain_alloc(GFP_KERNEL);
+	if (!pte_chain)
+		goto out_sig;
 	spin_lock(&tsk->mm->page_table_lock);
 	pmd = pmd_alloc(tsk->mm, pgd, address);
 	if (!pmd)
@@ -325,6 +327,7 @@ void put_dirty_page(struct task_struct * tsk, struct page *page, unsigned long a
 	return;
 out:
 	spin_unlock(&tsk->mm->page_table_lock);
+out_sig:
 	__free_page(page);
 	force_sig(SIGKILL, tsk);
 	pte_chain_free(pte_chain);

--- a/fs/ext3/inode.c
+++ b/fs/ext3/inode.c
@@ -99,6 +99,34 @@ int ext3_forget(handle_t *handle, int is_metadata,
 	return err;
 }

+/*
+ * Work out how many blocks we need to progress with the next chunk of a
+ * truncate transaction.
+ */
+
+static unsigned long blocks_for_truncate(struct inode *inode) 
+{
+	unsigned long needed;
+	
+	needed = inode->i_blocks >> (inode->i_sb->s_blocksize_bits - 9);
+
+	/* Give ourselves just enough room to cope with inodes in which
+	 * i_blocks is corrupt: we've seen disk corruptions in the past
+	 * which resulted in random data in an inode which looked enough
+	 * like a regular file for ext3 to try to delete it.  Things
+	 * will go a bit crazy if that happens, but at least we should
+	 * try not to panic the whole kernel. */
+	if (needed < 2)
+		needed = 2;
+
+	/* But we need to bound the transaction so we don't overflow the
+	 * journal. */
+	if (needed > EXT3_MAX_TRANS_DATA) 
+		needed = EXT3_MAX_TRANS_DATA;
+
+	return EXT3_DATA_TRANS_BLOCKS + needed;
+}
+	
 /* 
 * Truncate transactions can be complex and absolutely huge.  So we need to
 * be able to restart the transaction at a conventient checkpoint to make
@@ -112,14 +140,9 @@ int ext3_forget(handle_t *handle, int is_metadata,

 static handle_t *start_transaction(struct inode *inode) 
 {
-	long needed;
 	handle_t *result;
 	
-	needed = inode->i_blocks;
-	if (needed > EXT3_MAX_TRANS_DATA) 
-		needed = EXT3_MAX_TRANS_DATA;
-	
-	result = ext3_journal_start(inode, EXT3_DATA_TRANS_BLOCKS + needed);
+	result = ext3_journal_start(inode, blocks_for_truncate(inode));
 	if (!IS_ERR(result))
 		return result;
 	
@@ -135,14 +158,9 @@ static handle_t *start_transaction(struct inode *inode)
 */
 static int try_to_extend_transaction(handle_t *handle, struct inode *inode)
 {
-	long needed;
-	
 	if (handle->h_buffer_credits > EXT3_RESERVE_TRANS_BLOCKS)
 		return 0;
-	needed = inode->i_blocks;
-	if (needed > EXT3_MAX_TRANS_DATA) 
-		needed = EXT3_MAX_TRANS_DATA;
-	if (!ext3_journal_extend(handle, EXT3_RESERVE_TRANS_BLOCKS + needed))
+	if (!ext3_journal_extend(handle, blocks_for_truncate(inode)))
 		return 0;
 	return 1;
 }
@@ -154,11 +172,8 @@ static int try_to_extend_transaction(handle_t *handle, struct inode *inode)
 */
 static int ext3_journal_test_restart(handle_t *handle, struct inode *inode)
 {
-	long needed = inode->i_blocks;
-	if (needed > EXT3_MAX_TRANS_DATA) 
-		needed = EXT3_MAX_TRANS_DATA;
 	jbd_debug(2, "restarting handle %p\n", handle);
-	return ext3_journal_restart(handle, EXT3_DATA_TRANS_BLOCKS + needed);
+	return ext3_journal_restart(handle, blocks_for_truncate(inode));
 }

 /*

--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -61,6 +61,12 @@ void __mark_inode_dirty(struct inode *inode, int flags)
 			sb->s_op->dirty_inode(inode);
 	}

+	/*
+	 * make sure that changes are seen by all cpus before we test i_state
+	 * -- mikulas
+	 */
+	smp_mb();
+
 	/* avoid the locking if we can */
 	if ((inode->i_state & flags) == flags)
 		return;
@@ -137,6 +143,12 @@ __sync_single_inode(struct inode *inode, struct writeback_control *wbc)
 	inode->i_state |= I_LOCK;
 	inode->i_state &= ~I_DIRTY;

+	/*
+	 * smp_rmb(); note: if you remove write_lock below, you must add this.
+	 * mark_inode_dirty doesn't take spinlock, make sure that inode is not
+	 * read speculatively by this cpu before &= ~I_DIRTY  -- mikulas
+	 */
+
 	write_lock(&mapping->page_lock);
 	if (wait || !wbc->for_kupdate || list_empty(&mapping->io_pages))
 		list_splice_init(&mapping->dirty_pages, &mapping->io_pages);
@@ -334,7 +346,6 @@ writeback_inodes(struct writeback_control *wbc)
 	}
 	spin_unlock(&sb_lock);
 	spin_unlock(&inode_lock);
-	blk_run_queues();
 }

 /*

--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
--- a/fs/jbd/journal.c
+++ b/fs/jbd/journal.c
--- a/fs/jbd/recovery.c
+++ b/fs/jbd/recovery.c
@@ -206,20 +206,22 @@ do {									\
 		var -= ((journal)->j_last - (journal)->j_first);	\
 } while (0)

-/*
- * journal_recover
+/**
+ * int journal_recover(journal_t *journal) - recovers a on-disk journal
+ * @journal: the journal to recover
 * 
 * The primary function for recovering the log contents when mounting a
 * journaled device.  
- * 
+ */
+int journal_recover(journal_t *journal)
+{
+/*
 * Recovery is done in three passes.  In the first pass, we look for the
 * end of the log.  In the second, we assemble the list of revoke
 * blocks.  In the third and final pass, we replay any un-revoked blocks
 * in the log.  
 */

-int journal_recover(journal_t *journal)
-{
 	int			err;
 	journal_superblock_t *	sb;

@@ -263,20 +265,23 @@ int journal_recover(journal_t *journal)
 	return err;
 }

-/*
- * journal_skip_recovery
+/**
+ * int journal_skip_recovery() - Start journal and wipe exiting records 
+ * @journal: journal to startup
 * 
 * Locate any valid recovery information from the journal and set up the
 * journal structures in memory to ignore it (presumably because the
 * caller has evidence that it is out of date).  
- *
+ * This function does'nt appear to be exorted..
+ */
+int journal_skip_recovery(journal_t *journal)
+{
+/*
 * We perform one pass over the journal to allow us to tell the user how
 * much recovery information is being erased, and to let us initialise
 * the journal transaction sequence numbers to the next unused ID. 
 */

-int journal_skip_recovery(journal_t *journal)
-{
 	int			err;
 	journal_superblock_t *	sb;


--- a/fs/jbd/transaction.c
+++ b/fs/jbd/transaction.c
--- a/fs/mpage.c
+++ b/fs/mpage.c
@@ -116,6 +116,49 @@ mpage_alloc(struct block_device *bdev,
 	return bio;
 }

+/*
+ * support function for mpage_readpages.  The fs supplied get_block might
+ * return an up to date buffer.  This is used to map that buffer into
+ * the page, which allows readpage to avoid triggering a duplicate call
+ * to get_block.
+ *
+ * The idea is to avoid adding buffers to pages that don't already have
+ * them.  So when the buffer is up to date and the page size == block size,
+ * this marks the page up to date instead of adding new buffers.
+ */
+static void 
+map_buffer_to_page(struct page *page, struct buffer_head *bh, int page_block) 
+{
+	struct inode *inode = page->mapping->host;
+	struct buffer_head *page_bh, *head;
+	int block = 0;
+
+	if (!page_has_buffers(page)) {
+		/*
+		 * don't make any buffers if there is only one buffer on
+		 * the page and the page just needs to be set up to date
+		 */
+		if (inode->i_blkbits == PAGE_CACHE_SHIFT && 
+		    buffer_uptodate(bh)) {
+			SetPageUptodate(page);    
+			return;
+		}
+		create_empty_buffers(page, 1 << inode->i_blkbits, 0);
+	}
+	head = page_buffers(page);
+	page_bh = head;
+	do {
+		if (block == page_block) {
+			page_bh->b_state = bh->b_state;
+			page_bh->b_bdev = bh->b_bdev;
+			page_bh->b_blocknr = bh->b_blocknr;
+			break;
+		}
+		page_bh = page_bh->b_this_page;
+		block++;
+	} while (page_bh != head);
+}
+
 /**
 * mpage_readpages - populate an address space with some pages, and
 *                       start reads against them.
@@ -186,6 +229,7 @@ do_mpage_readpage(struct bio *bio, struct page *page, unsigned nr_pages,
 	block_in_file = page->index << (PAGE_CACHE_SHIFT - blkbits);
 	last_block = (inode->i_size + blocksize - 1) >> blkbits;

+	bh.b_page = page;
 	for (page_block = 0; page_block < blocks_per_page;
 				page_block++, block_in_file++) {
 		bh.b_state = 0;
@@ -201,6 +245,17 @@ do_mpage_readpage(struct bio *bio, struct page *page, unsigned nr_pages,
 			continue;
 		}

+		/* some filesystems will copy data into the page during
+		 * the get_block call, in which case we don't want to
+		 * read it again.  map_buffer_to_page copies the data
+		 * we just collected from get_block into the page's buffers
+		 * so readpage doesn't have to repeat the get_block call
+		 */
+		if (buffer_uptodate(&bh)) {
+			map_buffer_to_page(page, &bh, page_block);
+			goto confused;
+		}
+	
 		if (first_hole != blocks_per_page)
 			goto confused;		/* hole -> non-hole */

@@ -256,7 +311,10 @@ do_mpage_readpage(struct bio *bio, struct page *page, unsigned nr_pages,
 confused:
 	if (bio)
 		bio = mpage_bio_submit(READ, bio);
+	if (!PageUptodate(page))
 	        block_read_full_page(page, get_block);
+	else
+		unlock_page(page);
 	goto out;
 }

@@ -344,6 +402,7 @@ mpage_writepage(struct bio *bio, struct page *page, get_block_t get_block,
 	sector_t boundary_block = 0;
 	struct block_device *boundary_bdev = NULL;
 	int length;
+	struct buffer_head map_bh;

 	if (page_has_buffers(page)) {
 		struct buffer_head *head = page_buffers(page);
@@ -401,8 +460,8 @@ mpage_writepage(struct bio *bio, struct page *page, get_block_t get_block,
 	BUG_ON(!PageUptodate(page));
 	block_in_file = page->index << (PAGE_CACHE_SHIFT - blkbits);
 	last_block = (inode->i_size - 1) >> blkbits;
+	map_bh.b_page = page;
 	for (page_block = 0; page_block < blocks_per_page; ) {
-		struct buffer_head map_bh;

 		map_bh.b_state = 0;
 		if (get_block(inode, block_in_file, &map_bh, 1))
@@ -559,7 +618,6 @@ mpage_writepages(struct address_space *mapping,
 	int (*writepage)(struct page *page, struct writeback_control *wbc);

 	if (wbc->nonblocking && bdi_write_congested(bdi)) {
-		blk_run_queues();
 		wbc->encountered_congestion = 1;
 		return 0;
 	}
@@ -614,7 +672,6 @@ mpage_writepages(struct address_space *mapping,
 			if (ret || (--(wbc->nr_to_write) <= 0))
 				done = 1;
 			if (wbc->nonblocking && bdi_write_congested(bdi)) {
-				blk_run_queues();
 				wbc->encountered_congestion = 1;
 				done = 1;
 			}

--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -535,6 +535,10 @@ static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
 	if (retval)
 		goto fput_in;

+	retval = security_file_permission (in_file, MAY_READ);
+	if (retval)
+		goto fput_in;
+
 	/*
 	 * Get output file, and verify that it is ok..
 	 */
@@ -556,6 +560,10 @@ static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
 	if (retval)
 		goto fput_out;

+	retval = security_file_permission (out_file, MAY_WRITE);
+	if (retval)
+		goto fput_out;
+
 	if (!ppos)
 		ppos = &in_file->f_pos;


--- a/fs/reiserfs/inode.c
+++ b/fs/reiserfs/inode.c
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -4,6 +4,8 @@
 #include <linux/major.h>
 #include <linux/genhd.h>
 #include <linux/list.h>
+#include <linux/timer.h>
+#include <linux/workqueue.h>
 #include <linux/pagemap.h>
 #include <linux/backing-dev.h>
 #include <linux/wait.h>
@@ -188,6 +190,14 @@ struct request_queue
 	unplug_fn		*unplug_fn;
 	merge_bvec_fn		*merge_bvec_fn;

+	/*
+	 * Auto-unplugging state
+	 */
+	struct timer_list	unplug_timer;
+	int			unplug_thresh;	/* After this many requests */
+	unsigned long		unplug_delay;	/* After this many jiffies */
+	struct work_struct	unplug_work;
+
 	struct backing_dev_info	backing_dev_info;

 	/*

--- a/include/linux/ext3_jbd.h
+++ b/include/linux/ext3_jbd.h
@@ -28,7 +28,7 @@
 * indirection blocks, the group and superblock summaries, and the data
 * block to complete the transaction.  */

-#define EXT3_SINGLEDATA_TRANS_BLOCKS	8
+#define EXT3_SINGLEDATA_TRANS_BLOCKS	8U

 /* Extended attributes may touch two data buffers, two bitmap buffers,
 * and two group and summaries. */
@@ -58,7 +58,7 @@ extern int ext3_writepage_trans_blocks(struct inode *inode);
 * start off at the maximum transaction size and grow the transaction
 * optimistically as we go. */

-#define EXT3_MAX_TRANS_DATA		64
+#define EXT3_MAX_TRANS_DATA		64U

 /* We break up a large truncate or write transaction once the handle's
 * buffer credits gets this low, we need either to extend the
@@ -67,7 +67,7 @@ extern int ext3_writepage_trans_blocks(struct inode *inode);
 * one block, plus two quota updates.  Quota allocations are not
 * needed. */

-#define EXT3_RESERVE_TRANS_BLOCKS	12
+#define EXT3_RESERVE_TRANS_BLOCKS	12U

 #define EXT3_INDEX_EXTRA_TRANS_BLOCKS	8


--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -20,16 +20,32 @@ int hugetlb_prefault(struct address_space *, struct vm_area_struct *);
 void huge_page_release(struct page *);
 int hugetlb_report_meminfo(char *);
 int is_hugepage_mem_enough(size_t);
+struct page *follow_huge_addr(struct mm_struct *mm, struct vm_area_struct *vma,
+			unsigned long address, int write);
+struct vm_area_struct *hugepage_vma(struct mm_struct *mm,
+					unsigned long address);
+struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
+				pmd_t *pmd, int write);
+int pmd_huge(pmd_t pmd);

 extern int htlbpage_max;

+static inline void
+mark_mm_hugetlb(struct mm_struct *mm, struct vm_area_struct *vma)
+{
+	if (is_vm_hugetlb_page(vma))
+		mm->used_hugetlb = 1;
+}
+
 #else /* !CONFIG_HUGETLB_PAGE */
+
 static inline int is_vm_hugetlb_page(struct vm_area_struct *vma)
 {
 	return 0;
 }

 #define follow_hugetlb_page(m,v,p,vs,a,b,i)	({ BUG(); 0; })
+#define follow_huge_addr(mm, vma, addr, write)	0
 #define copy_hugetlb_page_range(src, dst, vma)	({ BUG(); 0; })
 #define hugetlb_prefault(mapping, vma)		({ BUG(); 0; })
 #define zap_hugepage_range(vma, start, len)	BUG()
@@ -37,6 +53,14 @@ static inline int is_vm_hugetlb_page(struct vm_area_struct *vma)
 #define huge_page_release(page)			BUG()
 #define is_hugepage_mem_enough(size)		0
 #define hugetlb_report_meminfo(buf)		0
+#define hugepage_vma(mm, addr)			0
+#define mark_mm_hugetlb(mm, vma)		do { } while (0)
+#define follow_huge_pmd(mm, addr, pmd, write)	0
+#define pmd_huge(x)	0
+
+#ifndef HPAGE_MASK
+#define HPAGE_MASK	0		/* Keep the compiler happy */
+#endif

 #endif /* !CONFIG_HUGETLB_PAGE */


--- a/include/linux/jbd.h
+++ b/include/linux/jbd.h
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -208,24 +208,55 @@ struct page {
 * Also, many kernel routines increase the page count before a critical
 * routine so they can be sure the page doesn't go away from under them.
 */
-#define get_page(p)		atomic_inc(&(p)->count)
-#define __put_page(p)		atomic_dec(&(p)->count)
 #define put_page_testzero(p)				\
 	({						\
 		BUG_ON(page_count(page) == 0);		\
 		atomic_dec_and_test(&(p)->count);	\
 	})
+
 #define page_count(p)		atomic_read(&(p)->count)
 #define set_page_count(p,v) 	atomic_set(&(p)->count, v)
+#define __put_page(p)		atomic_dec(&(p)->count)

 extern void FASTCALL(__page_cache_release(struct page *));

+#ifdef CONFIG_HUGETLB_PAGE
+
+static inline void get_page(struct page *page)
+{
+	if (PageCompound(page))
+		page = (struct page *)page->lru.next;
+	atomic_inc(&page->count);
+}
+
+static inline void put_page(struct page *page)
+{
+	if (PageCompound(page)) {
+		page = (struct page *)page->lru.next;
+		if (page->lru.prev) {	/* destructor? */
+			(*(void (*)(struct page *))page->lru.prev)(page);
+			return;
+		}
+	}
+	if (!PageReserved(page) && put_page_testzero(page))
+		__page_cache_release(page);
+}
+
+#else		/* CONFIG_HUGETLB_PAGE */
+
+static inline void get_page(struct page *page)
+{
+	atomic_inc(&page->count);
+}
+
 static inline void put_page(struct page *page)
 {
 	if (!PageReserved(page) && put_page_testzero(page))
 		__page_cache_release(page);
 }

+#endif		/* CONFIG_HUGETLB_PAGE */
+
 /*
 * Multiple processes may "see" the same page. E.g. for untouched
 * mappings of /dev/null, all processes see the same page full of

--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -72,7 +72,8 @@

 #define PG_direct		16	/* ->pte_chain points directly at pte */
 #define PG_mappedtodisk		17	/* Has blocks allocated on-disk */
-#define PG_reclaim		18	/* To be recalimed asap */
+#define PG_reclaim		18	/* To be reclaimed asap */
+#define PG_compound		19	/* Part of a compound page */

 /*
 * Global page accounting.  One instance per CPU.  Only unsigned longs are
@@ -251,6 +252,10 @@ extern void get_full_page_state(struct page_state *ret);
 #define ClearPageReclaim(page)	clear_bit(PG_reclaim, &(page)->flags)
 #define TestClearPageReclaim(page) test_and_clear_bit(PG_reclaim, &(page)->flags)

+#define PageCompound(page)	test_bit(PG_compound, &(page)->flags)
+#define SetPageCompound(page)	set_bit(PG_compound, &(page)->flags)
+#define ClearPageCompound(page)	clear_bit(PG_compound, &(page)->flags)
+
 /*
 * The PageSwapCache predicate doesn't use a PG_flag at this time,
 * but it may again do so one day.

--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -201,7 +201,9 @@ struct mm_struct {
 	unsigned long swap_address;

 	unsigned dumpable:1;
-
+#ifdef CONFIG_HUGETLB_PAGE
+	int used_hugetlb;
+#endif
 	/* Architecture-specific MM context */
 	mm_context_t context;


--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -37,30 +37,120 @@
 #ifdef CONFIG_SMP
 #include <asm/spinlock.h>

-/*
- * !CONFIG_SMP and spin_lock_init not previously defined
- * (e.g. by including include/asm/spinlock.h)
- */
-#elif !defined(spin_lock_init)
+#else

-#ifndef CONFIG_PREEMPT
+#if !defined(CONFIG_PREEMPT) && !defined(CONFIG_DEBUG_SPINLOCK)
 # define atomic_dec_and_lock(atomic,lock) atomic_dec_and_test(atomic)
 # define ATOMIC_DEC_AND_LOCK
 #endif

+#ifdef CONFIG_DEBUG_SPINLOCK
+ 
+#define SPINLOCK_MAGIC	0x1D244B3C
+typedef struct {
+	unsigned long magic;
+	volatile unsigned long lock;
+	volatile unsigned int babble;
+	const char *module;
+	char *owner;
+	int oline;
+} spinlock_t;
+#define SPIN_LOCK_UNLOCKED (spinlock_t) { SPINLOCK_MAGIC, 0, 10, __FILE__ , NULL, 0}
+
+#define spin_lock_init(x) \
+	do { \
+		(x)->magic = SPINLOCK_MAGIC; \
+		(x)->lock = 0; \
+		(x)->babble = 5; \
+		(x)->module = __FILE__; \
+		(x)->owner = NULL; \
+		(x)->oline = 0; \
+	} while (0)
+
+#define CHECK_LOCK(x) \
+	do { \
+	 	if ((x)->magic != SPINLOCK_MAGIC) { \
+			printk(KERN_ERR "%s:%d: spin_is_locked on uninitialized spinlock %p.\n", \
+					__FILE__, __LINE__, (x)); \
+		} \
+	} while(0)
+
+#define _raw_spin_lock(x)		\
+	do { \
+	 	CHECK_LOCK(x); \
+		if ((x)->lock&&(x)->babble) { \
+			printk("%s:%d: spin_lock(%s:%p) already locked by %s/%d\n", \
+					__FILE__,__LINE__, (x)->module, \
+					(x), (x)->owner, (x)->oline); \
+			(x)->babble--; \
+		} \
+		(x)->lock = 1; \
+		(x)->owner = __FILE__; \
+		(x)->oline = __LINE__; \
+	} while (0)
+
+/* without debugging, spin_is_locked on UP always says
+ * FALSE. --> printk if already locked. */
+#define spin_is_locked(x) \
+	({ \
+	 	CHECK_LOCK(x); \
+		if ((x)->lock&&(x)->babble) { \
+			printk("%s:%d: spin_is_locked(%s:%p) already locked by %s/%d\n", \
+					__FILE__,__LINE__, (x)->module, \
+					(x), (x)->owner, (x)->oline); \
+			(x)->babble--; \
+		} \
+		0; \
+	})
+
+/* without debugging, spin_trylock on UP always says
+ * TRUE. --> printk if already locked. */
+#define _raw_spin_trylock(x) \
+	({ \
+	 	CHECK_LOCK(x); \
+		if ((x)->lock&&(x)->babble) { \
+			printk("%s:%d: spin_trylock(%s:%p) already locked by %s/%d\n", \
+					__FILE__,__LINE__, (x)->module, \
+					(x), (x)->owner, (x)->oline); \
+			(x)->babble--; \
+		} \
+		(x)->lock = 1; \
+		(x)->owner = __FILE__; \
+		(x)->oline = __LINE__; \
+		1; \
+	})
+
+#define spin_unlock_wait(x)	\
+	do { \
+	 	CHECK_LOCK(x); \
+		if ((x)->lock&&(x)->babble) { \
+			printk("%s:%d: spin_unlock_wait(%s:%p) owned by %s/%d\n", \
+					__FILE__,__LINE__, (x)->module, (x), \
+					(x)->owner, (x)->oline); \
+			(x)->babble--; \
+		}\
+	} while (0)
+
+#define _raw_spin_unlock(x) \
+	do { \
+	 	CHECK_LOCK(x); \
+		if (!(x)->lock&&(x)->babble) { \
+			printk("%s:%d: spin_unlock(%s:%p) not locked\n", \
+					__FILE__,__LINE__, (x)->module, (x));\
+			(x)->babble--; \
+		} \
+		(x)->lock = 0; \
+	} while (0)
+#else
 /*
 * gcc versions before ~2.95 have a nasty bug with empty initializers.
 */
 #if (__GNUC__ > 2)
  typedef struct { } spinlock_t;
-  typedef struct { } rwlock_t;
  #define SPIN_LOCK_UNLOCKED (spinlock_t) { }
-  #define RW_LOCK_UNLOCKED (rwlock_t) { }
 #else
  typedef struct { int gcc_is_buggy; } spinlock_t;
-  typedef struct { int gcc_is_buggy; } rwlock_t;
  #define SPIN_LOCK_UNLOCKED (spinlock_t) { 0 }
-  #define RW_LOCK_UNLOCKED (rwlock_t) { 0 }
 #endif

 /*
@@ -72,6 +162,18 @@
 #define _raw_spin_trylock(lock)	((void)(lock), 1)
 #define spin_unlock_wait(lock)	do { (void)(lock); } while(0)
 #define _raw_spin_unlock(lock)	do { (void)(lock); } while(0)
+#endif /* CONFIG_DEBUG_SPINLOCK */
+
+/* RW spinlocks: No debug version */
+
+#if (__GNUC__ > 2)
+  typedef struct { } rwlock_t;
+  #define RW_LOCK_UNLOCKED (rwlock_t) { }
+#else
+  typedef struct { int gcc_is_buggy; } rwlock_t;
+  #define RW_LOCK_UNLOCKED (rwlock_t) { 0 }
+#endif
+
 #define rwlock_init(lock)	do { (void)(lock); } while(0)
 #define _raw_read_lock(lock)	do { (void)(lock); } while(0)
 #define _raw_read_unlock(lock)	do { (void)(lock); } while(0)

--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -177,6 +177,7 @@ static int worker_thread(void *__startup)
 	current->flags |= PF_IOTHREAD;
 	cwq->thread = current;

+	set_user_nice(current, -10);
 	set_cpus_allowed(current, 1UL << cpu);

 	spin_lock_irq(&current->sig->siglock);

--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -259,9 +259,10 @@ void wait_on_page_bit(struct page *page, int bit_nr)

 	do {
 		prepare_to_wait(waitqueue, &wait, TASK_UNINTERRUPTIBLE);
+		if (test_bit(bit_nr, &page->flags)) {
 			sync_page(page);
-		if (test_bit(bit_nr, &page->flags))
 			io_schedule();
+		}
 	} while (test_bit(bit_nr, &page->flags));
 	finish_wait(waitqueue, &wait);
 }
@@ -326,10 +327,11 @@ void __lock_page(struct page *page)

 	while (TestSetPageLocked(page)) {
 		prepare_to_wait(wqh, &wait, TASK_UNINTERRUPTIBLE);
+		if (PageLocked(page)) {
 			sync_page(page);
-		if (PageLocked(page))
 			io_schedule();
 		}
+	}
 	finish_wait(wqh, &wait);
 }
 EXPORT_SYMBOL(__lock_page);

--- a/mm/fremap.c
+++ b/mm/fremap.c
@@ -53,8 +53,11 @@ int install_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	pte_t *pte, entry;
 	pgd_t *pgd;
 	pmd_t *pmd;
-	struct pte_chain *pte_chain = NULL;
+	struct pte_chain *pte_chain;

+	pte_chain = pte_chain_alloc(GFP_KERNEL);
+	if (!pte_chain)
+		goto err;
 	pgd = pgd_offset(mm, addr);
 	spin_lock(&mm->page_table_lock);

@@ -62,7 +65,6 @@ int install_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	if (!pmd)
 		goto err_unlock;

-	pte_chain = pte_chain_alloc(GFP_KERNEL);
 	pte = pte_alloc_map(mm, pmd, addr);
 	if (!pte)
 		goto err_unlock;
@@ -87,6 +89,7 @@ int install_page(struct mm_struct *mm, struct vm_area_struct *vma,
 err_unlock:
 	spin_unlock(&mm->page_table_lock);
 	pte_chain_free(pte_chain);
+err:
 	return err;
 }


--- a/mm/memory.c
+++ b/mm/memory.c
@@ -607,13 +607,22 @@ follow_page(struct mm_struct *mm, unsigned long address, int write)
 	pmd_t *pmd;
 	pte_t *ptep, pte;
 	unsigned long pfn;
+	struct vm_area_struct *vma;
+
+	vma = hugepage_vma(mm, address);
+	if (vma)
+		return follow_huge_addr(mm, vma, address, write);

 	pgd = pgd_offset(mm, address);
 	if (pgd_none(*pgd) || pgd_bad(*pgd))
 		goto out;

 	pmd = pmd_offset(pgd, address);
-	if (pmd_none(*pmd) || pmd_bad(*pmd))
+	if (pmd_none(*pmd))
+		goto out;
+	if (pmd_huge(*pmd))
+		return follow_huge_pmd(mm, address, pmd, write);
+	if (pmd_bad(*pmd))
 		goto out;

 	ptep = pte_offset_map(pmd, address);
@@ -926,9 +935,19 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct * vma,
 	struct page *old_page, *new_page;
 	unsigned long pfn = pte_pfn(pte);
 	struct pte_chain *pte_chain = NULL;
+	int ret;

-	if (!pfn_valid(pfn))
-		goto bad_wp_page;
+	if (unlikely(!pfn_valid(pfn))) {
+		/*
+		 * This should really halt the system so it can be debugged or
+		 * at least the kernel stops what it's doing before it corrupts
+		 * data, but for the moment just pretend this is OOM.
+		 */
+		pte_unmap(page_table);
+		printk(KERN_ERR "do_wp_page: bogus page at address %08lx\n",
+				address);
+		goto oom;
+	}
 	old_page = pfn_to_page(pfn);

 	if (!TestSetPageLocked(old_page)) {
@@ -936,10 +955,11 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct * vma,
 		unlock_page(old_page);
 		if (reuse) {
 			flush_cache_page(vma, address);
-			establish_pte(vma, address, page_table, pte_mkyoung(pte_mkdirty(pte_mkwrite(pte))));
+			establish_pte(vma, address, page_table,
+				pte_mkyoung(pte_mkdirty(pte_mkwrite(pte))));
 			pte_unmap(page_table);
-			spin_unlock(&mm->page_table_lock);
-			return VM_FAULT_MINOR;
+			ret = VM_FAULT_MINOR;
+			goto out;
 		}
 	}
 	pte_unmap(page_table);
@@ -950,11 +970,13 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct * vma,
 	page_cache_get(old_page);
 	spin_unlock(&mm->page_table_lock);

+	pte_chain = pte_chain_alloc(GFP_KERNEL);
+	if (!pte_chain)
+		goto no_mem;
 	new_page = alloc_page(GFP_HIGHUSER);
 	if (!new_page)
 		goto no_mem;
 	copy_cow_page(old_page,new_page,address);
-	pte_chain = pte_chain_alloc(GFP_KERNEL);

 	/*
 	 * Re-check the pte - we dropped the lock
@@ -973,25 +995,19 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct * vma,
 		new_page = old_page;
 	}
 	pte_unmap(page_table);
-	spin_unlock(&mm->page_table_lock);
 	page_cache_release(new_page);
 	page_cache_release(old_page);
-	pte_chain_free(pte_chain);
-	return VM_FAULT_MINOR;
+	ret = VM_FAULT_MINOR;
+	goto out;

-bad_wp_page:
-	pte_unmap(page_table);
-	spin_unlock(&mm->page_table_lock);
-	printk(KERN_ERR "do_wp_page: bogus page at address %08lx\n", address);
-	/*
-	 * This should really halt the system so it can be debugged or
-	 * at least the kernel stops what it's doing before it corrupts
-	 * data, but for the moment just pretend this is OOM.
-	 */
-	return VM_FAULT_OOM;
 no_mem:
 	page_cache_release(old_page);
-	return VM_FAULT_OOM;
+oom:
+	ret = VM_FAULT_OOM;
+out:
+	spin_unlock(&mm->page_table_lock);
+	pte_chain_free(pte_chain);
+	return ret;
 }

 static void vmtruncate_list(struct list_head *head, unsigned long pgoff)
@@ -1286,6 +1302,7 @@ do_no_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	struct page * new_page;
 	pte_t entry;
 	struct pte_chain *pte_chain;
+	int ret;

 	if (!vma->vm_ops || !vma->vm_ops->nopage)
 		return do_anonymous_page(mm, vma, page_table,
@@ -1301,6 +1318,10 @@ do_no_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	if (new_page == NOPAGE_OOM)
 		return VM_FAULT_OOM;

+	pte_chain = pte_chain_alloc(GFP_KERNEL);
+	if (!pte_chain)
+		goto oom;
+
 	/*
 	 * Should we do an early C-O-W break?
 	 */
@@ -1308,7 +1329,7 @@ do_no_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		struct page * page = alloc_page(GFP_HIGHUSER);
 		if (!page) {
 			page_cache_release(new_page);
-			return VM_FAULT_OOM;
+			goto oom;
 		}
 		copy_user_highpage(page, new_page, address);
 		page_cache_release(new_page);
@@ -1316,7 +1337,6 @@ do_no_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		new_page = page;
 	}

-	pte_chain = pte_chain_alloc(GFP_KERNEL);
 	spin_lock(&mm->page_table_lock);
 	page_table = pte_offset_map(pmd, address);

@@ -1346,15 +1366,20 @@ do_no_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		pte_unmap(page_table);
 		page_cache_release(new_page);
 		spin_unlock(&mm->page_table_lock);
-		pte_chain_free(pte_chain);
-		return VM_FAULT_MINOR;
+		ret = VM_FAULT_MINOR;
+		goto out;
 	}

 	/* no need to invalidate: a not-present page shouldn't be cached */
 	update_mmu_cache(vma, address, entry);
 	spin_unlock(&mm->page_table_lock);
+	ret = VM_FAULT_MAJOR;
+	goto out;
+oom:
+	ret = VM_FAULT_OOM;
+out:
 	pte_chain_free(pte_chain);
-	return VM_FAULT_MAJOR;
+	return ret;
 }

 /*
@@ -1422,6 +1447,10 @@ int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct * vma,
 	pgd = pgd_offset(mm, address);

 	inc_page_state(pgfault);
+
+	if (is_vm_hugetlb_page(vma))
+		return VM_FAULT_SIGBUS;	/* mapping truncation does this. */
+
 	/*
 	 * We need the page table lock to synchronize with kswapd
 	 * and the SMP-safe atomic PTE updates.

--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -362,6 +362,7 @@ static void vma_link(struct mm_struct *mm, struct vm_area_struct *vma,
 	if (mapping)
 		up(&mapping->i_shared_sem);

+	mark_mm_hugetlb(mm, vma);
 	mm->map_count++;
 	validate_mm(mm);
 }
@@ -1222,6 +1223,11 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len)
 		return 0;
 	/* we have  start < mpnt->vm_end  */

+	if (is_vm_hugetlb_page(mpnt)) {
+		if ((start & ~HPAGE_MASK) || (len & ~HPAGE_MASK))
+			return -EINVAL;
+	}
+
 	/* if it doesn't overlap, we have nothing.. */
 	end = start + len;
 	if (mpnt->vm_start >= end)
@@ -1423,7 +1429,6 @@ void exit_mmap(struct mm_struct *mm)
 		kmem_cache_free(vm_area_cachep, vma);
 		vma = next;
 	}
-		
 }

 /* Insert vm structure into process list sorted by address

--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -24,9 +24,9 @@

 static pte_t *get_one_pte_map_nested(struct mm_struct *mm, unsigned long addr)
 {
-	pgd_t * pgd;
-	pmd_t * pmd;
-	pte_t * pte = NULL;
+	pgd_t *pgd;
+	pmd_t *pmd;
+	pte_t *pte = NULL;

 	pgd = pgd_offset(mm, addr);
 	if (pgd_none(*pgd))
@@ -73,8 +73,8 @@ static inline int page_table_present(struct mm_struct *mm, unsigned long addr)

 static inline pte_t *alloc_one_pte_map(struct mm_struct *mm, unsigned long addr)
 {
-	pmd_t * pmd;
-	pte_t * pte = NULL;
+	pmd_t *pmd;
+	pte_t *pte = NULL;

 	pmd = pmd_alloc(mm, pgd_offset(mm, addr), addr);
 	if (pmd)
@@ -88,7 +88,7 @@ copy_one_pte(struct mm_struct *mm, pte_t *src, pte_t *dst,
 {
 	int error = 0;
 	pte_t pte;
-	struct page * page = NULL;
+	struct page *page = NULL;

 	if (pte_present(*src))
 		page = pte_page(*src);
@@ -183,12 +183,12 @@ static int move_page_tables(struct vm_area_struct *vma,
 	return -1;
 }

-static unsigned long move_vma(struct vm_area_struct * vma,
+static unsigned long move_vma(struct vm_area_struct *vma,
 	unsigned long addr, unsigned long old_len, unsigned long new_len,
 	unsigned long new_addr)
 {
-	struct mm_struct * mm = vma->vm_mm;
-	struct vm_area_struct * new_vma, * next, * prev;
+	struct mm_struct *mm = vma->vm_mm;
+	struct vm_area_struct *new_vma, *next, *prev;
 	int allocated_vma;
 	int split = 0;

@@ -196,14 +196,16 @@ static unsigned long move_vma(struct vm_area_struct * vma,
 	next = find_vma_prev(mm, new_addr, &prev);
 	if (next) {
 		if (prev && prev->vm_end == new_addr &&
-		    can_vma_merge(prev, vma->vm_flags) && !vma->vm_file && !(vma->vm_flags & VM_SHARED)) {
+		    can_vma_merge(prev, vma->vm_flags) && !vma->vm_file &&
+					!(vma->vm_flags & VM_SHARED)) {
 			spin_lock(&mm->page_table_lock);
 			prev->vm_end = new_addr + new_len;
 			spin_unlock(&mm->page_table_lock);
 			new_vma = prev;
 			if (next != prev->vm_next)
 				BUG();
-			if (prev->vm_end == next->vm_start && can_vma_merge(next, prev->vm_flags)) {
+			if (prev->vm_end == next->vm_start &&
+					can_vma_merge(next, prev->vm_flags)) {
 				spin_lock(&mm->page_table_lock);
 				prev->vm_end = next->vm_end;
 				__vma_unlink(mm, next, prev);
@@ -214,7 +216,8 @@ static unsigned long move_vma(struct vm_area_struct * vma,
 				kmem_cache_free(vm_area_cachep, next);
 			}
 		} else if (next->vm_start == new_addr + new_len &&
-			   can_vma_merge(next, vma->vm_flags) && !vma->vm_file && !(vma->vm_flags & VM_SHARED)) {
+			  	can_vma_merge(next, vma->vm_flags) &&
+				!vma->vm_file && !(vma->vm_flags & VM_SHARED)) {
 			spin_lock(&mm->page_table_lock);
 			next->vm_start = new_addr;
 			spin_unlock(&mm->page_table_lock);
@@ -223,7 +226,8 @@ static unsigned long move_vma(struct vm_area_struct * vma,
 	} else {
 		prev = find_vma(mm, new_addr-1);
 		if (prev && prev->vm_end == new_addr &&
-		    can_vma_merge(prev, vma->vm_flags) && !vma->vm_file && !(vma->vm_flags & VM_SHARED)) {
+		    can_vma_merge(prev, vma->vm_flags) && !vma->vm_file &&
+				!(vma->vm_flags & VM_SHARED)) {
 			spin_lock(&mm->page_table_lock);
 			prev->vm_end = new_addr + new_len;
 			spin_unlock(&mm->page_table_lock);
@@ -249,7 +253,7 @@ static unsigned long move_vma(struct vm_area_struct * vma,
 			INIT_LIST_HEAD(&new_vma->shared);
 			new_vma->vm_start = new_addr;
 			new_vma->vm_end = new_addr+new_len;
-			new_vma->vm_pgoff += (addr - vma->vm_start) >> PAGE_SHIFT;
+			new_vma->vm_pgoff += (addr-vma->vm_start) >> PAGE_SHIFT;
 			if (new_vma->vm_file)
 				get_file(new_vma->vm_file);
 			if (new_vma->vm_ops && new_vma->vm_ops->open)
@@ -428,7 +432,8 @@ unsigned long do_mremap(unsigned long addr,
 			if (vma->vm_flags & VM_SHARED)
 				map_flags |= MAP_SHARED;

-			new_addr = get_unmapped_area(vma->vm_file, 0, new_len, vma->vm_pgoff, map_flags);
+			new_addr = get_unmapped_area(vma->vm_file, 0, new_len,
+						vma->vm_pgoff, map_flags);
 			ret = new_addr;
 			if (new_addr & ~PAGE_MASK)
 				goto out;

--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -237,7 +237,6 @@ static void background_writeout(unsigned long _min_pages)
 				break;
 		}
 	}
-	blk_run_queues();
 }

 /*
@@ -308,7 +307,6 @@ static void wb_kupdate(unsigned long arg)
 		}
 		nr_to_write -= MAX_WRITEBACK_PAGES - wbc.nr_to_write;
 	}
-	blk_run_queues();
 	if (time_before(next_jif, jiffies + HZ))
 		next_jif = jiffies + HZ;
 	mod_timer(&wb_timer, next_jif);

--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -85,6 +85,62 @@ static void bad_page(const char *function, struct page *page)
 	page->mapping = NULL;
 }

+#ifndef CONFIG_HUGETLB_PAGE
+#define prep_compound_page(page, order) do { } while (0)
+#define destroy_compound_page(page, order) do { } while (0)
+#else
+/*
+ * Higher-order pages are called "compound pages".  They are structured thusly:
+ *
+ * The first PAGE_SIZE page is called the "head page".
+ *
+ * The remaining PAGE_SIZE pages are called "tail pages".
+ *
+ * All pages have PG_compound set.  All pages have their lru.next pointing at
+ * the head page (even the head page has this).
+ *
+ * The head page's lru.prev, if non-zero, holds the address of the compound
+ * page's put_page() function.
+ *
+ * The order of the allocation is stored in the first tail page's lru.prev.
+ * This is only for debug at present.  This usage means that zero-order pages
+ * may not be compound.
+ */
+static void prep_compound_page(struct page *page, int order)
+{
+	int i;
+	int nr_pages = 1 << order;
+
+	page->lru.prev = NULL;
+	page[1].lru.prev = (void *)order;
+	for (i = 0; i < nr_pages; i++) {
+		struct page *p = page + i;
+
+		SetPageCompound(p);
+		p->lru.next = (void *)page;
+	}
+}
+
+static void destroy_compound_page(struct page *page, int order)
+{
+	int i;
+	int nr_pages = 1 << order;
+
+	if (page[1].lru.prev != (void *)order)
+		bad_page(__FUNCTION__, page);
+
+	for (i = 0; i < nr_pages; i++) {
+		struct page *p = page + i;
+
+		if (!PageCompound(p))
+			bad_page(__FUNCTION__, page);
+		if (p->lru.next != (void *)page)
+			bad_page(__FUNCTION__, page);
+		ClearPageCompound(p);
+	}
+}
+#endif		/* CONFIG_HUGETLB_PAGE */
+
 /*
 * Freeing function for a buddy system allocator.
 *
@@ -114,6 +170,8 @@ static inline void __free_pages_bulk (struct page *page, struct page *base,
 {
 	unsigned long page_idx, index;

+	if (order)
+		destroy_compound_page(page, order);
 	page_idx = page - base;
 	if (page_idx & ~mask)
 		BUG();
@@ -409,6 +467,12 @@ void free_cold_page(struct page *page)
 	free_hot_cold_page(page, 1);
 }

+/*
+ * Really, prep_compound_page() should be called from __rmqueue_bulk().  But
+ * we cheat by calling it from here, in the order > 0 path.  Saves a branch
+ * or two.
+ */
+
 static struct page *buffered_rmqueue(struct zone *zone, int order, int cold)
 {
 	unsigned long flags;
@@ -435,6 +499,8 @@ static struct page *buffered_rmqueue(struct zone *zone, int order, int cold)
 		spin_lock_irqsave(&zone->lock, flags);
 		page = __rmqueue(zone, order);
 		spin_unlock_irqrestore(&zone->lock, flags);
+		if (order && page)
+			prep_compound_page(page, order);
 	}

 	if (page != NULL) {

--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -236,10 +236,8 @@ __do_page_cache_readahead(struct address_space *mapping, struct file *filp,
 	 * uptodate then the caller will launch readpage again, and
 	 * will then handle the error.
 	 */
-	if (ret) {
+	if (ret)
 		read_pages(mapping, filp, &page_pool, ret);
-		blk_run_queues();
-	}
 	BUG_ON(!list_empty(&page_pool));
 out:
 	return ret;

--- a/mm/slab.c
+++ b/mm/slab.c
@@ -439,7 +439,6 @@ struct arraycache_init initarray_generic __initdata = { { 0, BOOT_CPUCACHE_ENTRI
 static kmem_cache_t cache_cache = {
 	.lists		= LIST3_INIT(cache_cache.lists),
 	/* Allow for boot cpu != 0 */
-	.array		= { [0 ... NR_CPUS-1] = &initarray_cache.cache },
 	.batchcount	= 1,
 	.limit		= BOOT_CPUCACHE_ENTRIES,
 	.objsize	= sizeof(kmem_cache_t),
@@ -611,6 +610,7 @@ void __init kmem_cache_init(void)
 	init_MUTEX(&cache_chain_sem);
 	INIT_LIST_HEAD(&cache_chain);
 	list_add(&cache_cache.next, &cache_chain);
+	cache_cache.array[smp_processor_id()] = &initarray_cache.cache;

 	cache_estimate(0, cache_cache.objsize, 0,
 			&left_over, &cache_cache.num);

--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -957,7 +957,6 @@ int kswapd(void *p)
 		finish_wait(&pgdat->kswapd_wait, &wait);
 		get_page_state(&ps);
 		balance_pgdat(pgdat, 0, &ps);
-		blk_run_queues();
 	}
 }