Commit 1f60ade2 authored by Linus Torvalds

Merge master.kernel.org:/home/mingo/bk-sched

into home.transmeta.com:/home/torvalds/v2.5/linux
parents 8509486a 3986594c
@@ -50,27 +50,27 @@ prototypes:
 	int (*removexattr) (struct dentry *, const char *);
 locking rules:
-	all may block
-		BKL	i_sem(inode)
-lookup:		no	yes
-create:		no	yes
-link:		no	yes (both)
-mknod:		no	yes
-symlink:	no	yes
-mkdir:		no	yes
-unlink:		no	yes (both)
-rmdir:		no	yes (both)	(see below)
-rename:		no	yes (all)	(see below)
-readlink:	no	no
-follow_link:	no	no
-truncate:	no	yes		(see below)
-setattr:	no	yes
-permission:	yes	no
-getattr:	no	no
-setxattr:	no	yes
-getxattr:	no	yes
-listxattr:	no	yes
-removexattr:	no	yes
+	all may block, none have BKL
+		i_sem(inode)
+lookup:		yes
+create:		yes
+link:		yes (both)
+mknod:		yes
+symlink:	yes
+mkdir:		yes
+unlink:		yes (both)
+rmdir:		yes (both)	(see below)
+rename:		yes (all)	(see below)
+readlink:	no
+follow_link:	no
+truncate:	yes		(see below)
+setattr:	yes
+permission:	no
+getattr:	no
+setxattr:	yes
+getxattr:	yes
+listxattr:	yes
+removexattr:	yes
 	Additionally, ->rmdir(), ->unlink() and ->rename() have ->i_sem on
 victim.
 	cross-directory ->rename() has (per-superblock) ->s_vfs_rename_sem.
......
@@ -81,9 +81,9 @@ can relax your locking.
 [mandatory]
 	->lookup(), ->truncate(), ->create(), ->unlink(), ->mknod(), ->mkdir(),
-->rmdir(), ->link(), ->lseek(), ->symlink(), ->rename() and ->readdir()
-are called without BKL now. Grab it on the entry, drop upon return - that
-will guarantee the same locking you used to have. If your method or its
+->rmdir(), ->link(), ->lseek(), ->symlink(), ->rename(), ->permission()
+and ->readdir() are called without BKL now. Grab it on entry, drop upon return
+- that will guarantee the same locking you used to have. If your method or its
 parts do not need BKL - better yet, now you can shift lock_kernel() and
 unlock_kernel() so that they would protect exactly what needs to be
 protected.
......
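The "grab it on entry, drop upon return" advice above, as a minimal C sketch.
The method name foo_lookup() and its helper do_foo_lookup() are hypothetical,
not from this tree:

--------------------------------------------------------------
#include <linux/fs.h>
#include <linux/smp_lock.h>

/* Hypothetical ->lookup() that preserves the old BKL semantics by
 * taking the lock on entry and dropping it on return. */
static struct dentry *foo_lookup(struct inode *dir, struct dentry *dentry)
{
	struct dentry *res;

	lock_kernel();				/* what the VFS used to do for us */
	res = do_foo_lookup(dir, dentry);	/* the real work, assumed elsewhere */
	unlock_kernel();
	return res;
}
--------------------------------------------------------------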
@@ -948,120 +948,43 @@ program to load modules on demand.
 -----------------------------------------------
 The files in this directory can be used to tune the operation of the virtual
-memory (VM) subsystem of the Linux kernel. In addition, one of the files
-(bdflush) has some influence on disk usage.
-bdflush
--------
-This file controls the operation of the bdflush kernel daemon. It currently
-contains nine integer values, six of which are actually used by the kernel.
-They are listed in table 2-2.
-Table 2-2: Parameters in /proc/sys/vm/bdflush
-..............................................................................
- Value      Meaning
- nfract     Percentage of buffer cache dirty to activate bdflush
- ndirty     Maximum number of dirty blocks to write out per wake-cycle
- nrefill    Number of clean buffers to try to obtain each time we call refill
- nref_dirt  buffer threshold for activating bdflush when trying to refill
-            buffers.
- dummy      Unused
- age_buffer Time for normal buffer to age before we flush it
- age_super  Time for superblock to age before we flush it
- dummy      Unused
- dummy      Unused
-..............................................................................
-nfract
-------
-This parameter governs the maximum number of dirty buffers in the buffer
-cache. Dirty means that the contents of the buffer still have to be written to
-disk (as opposed to a clean buffer, which can just be forgotten about).
-Setting this to a higher value means that Linux can delay disk writes for a
-long time, but it also means that it will have to do a lot of I/O at once when
-memory becomes short. A lower value will spread out disk I/O more evenly.
-ndirty
-------
-Ndirty gives the maximum number of dirty buffers that bdflush can write to the
-disk at one time. A high value will mean delayed, bursty I/O, while a small
-value can lead to memory shortage when bdflush isn't woken up often enough.
-nrefill
--------
-This is the number of buffers that bdflush will add to the list of free
-buffers when refill_freelist() is called. It is necessary to allocate free
-buffers beforehand, since the buffers are often different sizes than the
-memory pages and some bookkeeping needs to be done beforehand. The higher the
-number, the more memory will be wasted and the less often refill_freelist()
-will need to run.
-nref_dirt
----------
-When refill_freelist() comes across more than nref_dirt dirty buffers, it will
-wake up bdflush.
-age_buffer and age_super
-------------------------
-Finally, the age_buffer and age_super parameters govern the maximum time Linux
-waits before writing out a dirty buffer to disk. The value is expressed in
-jiffies (clockticks), the number of jiffies per second is 100. Age_buffer is
-the maximum age for data blocks, while age_super is for filesystems meta data.
-buffermem
----------
-The three values in this file control how much memory should be used for
-buffer memory. The percentage is calculated as a percentage of total system
-memory.
-The values are:
-min_percent
------------
-This is the minimum percentage of memory that should be spent on buffer
-memory.
-borrow_percent
---------------
-When Linux is short on memory, and the buffer cache uses more than it has been
-allotted, the memory management (MM) subsystem will prune the buffer cache
-more heavily than other memory to compensate.
-max_percent
------------
-This is the maximum amount of memory that can be used for buffer memory.
-freepages
----------
-This file contains three values: min, low and high:
-min
----
-When the number of free pages in the system reaches this number, only the
-kernel can allocate more memory.
-low
----
-If the number of free pages falls below this point, the kernel starts swapping
-aggressively.
-high
-----
-The kernel tries to keep up to this amount of memory free; if memory falls
-below this point, the kernel starts gently swapping in the hopes that it never
-has to do really aggressive swapping.
+memory (VM) subsystem of the Linux kernel.
+dirty_background_ratio
+----------------------
+Contains, as a percentage of total system memory, the number of pages at which
+the pdflush background writeback daemon will start writing out dirty data.
+dirty_async_ratio
+-----------------
+Contains, as a percentage of total system memory, the number of pages at which
+a process which is generating disk writes will itself start writing out dirty
+data.
+dirty_sync_ratio
+----------------
+Contains, as a percentage of total system memory, the number of pages at which
+a process which is generating disk writes will itself start writing out dirty
+data and waiting upon completion of that writeout.
+dirty_writeback_centisecs
+-------------------------
+The pdflush writeback daemons will periodically wake up and write `old' data
+out to disk. This tunable expresses the interval between those wakeups, in
+100'ths of a second.
+dirty_expire_centisecs
+----------------------
+This tunable is used to define when dirty data is old enough to be eligible
+for writeout by the pdflush daemons. It is expressed in 100'ths of a second.
+Data which has been dirty in-memory for longer than this interval will be
+written out next time a pdflush daemon wakes up.
 kswapd
 ------
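A small userspace sketch showing how the dirty_* tunables documented above can
be read; it assumes only that each file holds one decimal integer, as
described:

--------------------------------------------------------------
#include <stdio.h>

/* Read one integer tunable from /proc/sys/vm into *val; 0 on success. */
static int read_vm_tunable(const char *name, int *val)
{
	char path[128];
	FILE *f;
	int ok;

	snprintf(path, sizeof(path), "/proc/sys/vm/%s", name);
	f = fopen(path, "r");
	if (!f)
		return -1;
	ok = (fscanf(f, "%d", val) == 1) ? 0 : -1;
	fclose(f);
	return ok;
}

int main(void)
{
	int v;

	if (read_vm_tunable("dirty_background_ratio", &v) == 0)
		printf("background writeback starts at %d%% of memory dirty\n", v);
	if (read_vm_tunable("dirty_writeback_centisecs", &v) == 0)
		printf("pdflush wakes every %d centisecs\n", v);
	return 0;
}
--------------------------------------------------------------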
@@ -1113,79 +1036,6 @@ On the other hand, enabling this feature can cause you to run out of memory
 and thrash the system to death, so large and/or important servers will want to
 set this value to 0.
-pagecache
----------
-This file does exactly the same job as buffermem, only this file controls the
-amount of memory allowed for memory mapping and generic caching of files.
-You don't want the minimum level to be too low, otherwise your system might
-thrash when memory is tight or fragmentation is high.
-pagetable_cache
----------------
-The kernel keeps a number of page tables in a per-processor cache (this helps
-a lot on SMP systems). The cache size for each processor will be between the
-low and the high value.
-On a low-memory, single CPU system, you can safely set these values to 0 so
-you don't waste memory. It is used on SMP systems so that the system can
-perform fast pagetable allocations without having to acquire the kernel memory
-lock.
-For large systems, the settings are probably fine. For normal systems they
-won't hurt a bit. For small systems ( less than 16MB ram) it might be
-advantageous to set both values to 0.
-swapctl
--------
-This file contains no less than 8 variables. All of these values are used by
-kswapd.
-The first four variables
-* sc_max_page_age,
-* sc_page_advance,
-* sc_page_decline and
-* sc_page_initial_age
-are used to keep track of Linux's page aging. Page aging is a bookkeeping
-method to track which pages of memory are often used, and which pages can be
-swapped out without consequences.
-When a page is swapped in, it starts at sc_page_initial_age (default 3) and
-when the page is scanned by kswapd, its age is adjusted according to the
-following scheme:
-* If the page was used since the last time we scanned, its age is increased
-  by sc_page_advance (default 3). Where the maximum value is given by
-  sc_max_page_age (default 20).
-* Otherwise (meaning it wasn't used) its age is decreased by sc_page_decline
-  (default 1).
-When a page reaches age 0, it's ready to be swapped out.
-The variables sc_age_cluster_fract, sc_age_cluster_min, sc_pageout_weight and
-sc_bufferout_weight, can be used to control kswapd's aggressiveness in
-swapping out pages.
-Sc_age_cluster_fract is used to calculate how many pages from a process are to
-be scanned by kswapd. The formula used is
-	(sc_age_cluster_fract divided by 1024) times resident set size
-So if you want kswapd to scan the whole process, sc_age_cluster_fract needs to
-have a value of 1024. The minimum number of pages kswapd will scan is
-represented by sc_age_cluster_min, which is done so that kswapd will also scan
-small processes.
-The values of sc_pageout_weight and sc_bufferout_weight are used to control
-how many tries kswapd will make in order to swap out one page/buffer. These
-values can be used to fine-tune the ratio between user pages and buffer/cache
-memory. When you find that your Linux system is swapping out too many process
-pages in order to satisfy buffer memory demands, you may want to either
-increase sc_bufferout_weight, or decrease the value of sc_pageout_weight.
 2.5 /proc/sys/dev - Device specific parameters
 ----------------------------------------------
......
@@ -9,116 +9,28 @@ This file contains the documentation for the sysctl files in
 /proc/sys/vm and is valid for Linux kernel version 2.2.
 The files in this directory can be used to tune the operation
-of the virtual memory (VM) subsystem of the Linux kernel, and
-one of the files (bdflush) also has a little influence on disk
-usage.
+of the virtual memory (VM) subsystem of the Linux kernel and
+the writeout of dirty data to disk.
 Default values and initialization routines for most of these
 files can be found in mm/swap.c.
 Currently, these files are in /proc/sys/vm:
-- bdflush
-- buffermem
-- freepages
 - kswapd
 - overcommit_memory
 - page-cluster
-- pagecache
-- pagetable_cache
+- dirty_async_ratio
+- dirty_background_ratio
+- dirty_expire_centisecs
+- dirty_sync_ratio
+- dirty_writeback_centisecs
 ==============================================================
-bdflush:
-This file controls the operation of the bdflush kernel
-daemon. The source code to this struct can be found in
-linux/fs/buffer.c. It currently contains 9 integer values,
-of which 4 are actually used by the kernel.
-From linux/fs/buffer.c:
---------------------------------------------------------------
-union bdflush_param {
-	struct {
-		int nfract;	/* Percentage of buffer cache dirty to
-				   activate bdflush */
-		int dummy1;	/* old "ndirty" */
-		int dummy2;	/* old "nrefill" */
-		int dummy3;	/* unused */
-		int interval;	/* jiffies delay between kupdate flushes */
-		int age_buffer;	/* Time for normal buffer to age */
-		int nfract_sync;/* Percentage of buffer cache dirty to
-				   activate bdflush synchronously */
-		int dummy4;	/* unused */
-		int dummy5;	/* unused */
-	} b_un;
-	unsigned int data[N_PARAM];
-} bdf_prm = {{30, 64, 64, 256, 5*HZ, 30*HZ, 60, 0, 0}};
---------------------------------------------------------------
-int nfract:
-The first parameter governs the maximum number of dirty
-buffers in the buffer cache. Dirty means that the contents
-of the buffer still have to be written to disk (as opposed
-to a clean buffer, which can just be forgotten about).
-Setting this to a high value means that Linux can delay disk
-writes for a long time, but it also means that it will have
-to do a lot of I/O at once when memory becomes short. A low
-value will spread out disk I/O more evenly, at the cost of
-more frequent I/O operations. The default value is 30%,
-the minimum is 0%, and the maximum is 100%.
-int interval:
-The fifth parameter, interval, is the minimum rate at
-which kupdate will wake and flush. The value is expressed in
-jiffies (clockticks), the number of jiffies per second is
-normally 100 (Alpha is 1024). Thus, x*HZ is x seconds. The
-default value is 5 seconds, the minimum is 0 seconds, and the
-maximum is 600 seconds.
-int age_buffer:
-The sixth parameter, age_buffer, governs the maximum time
-Linux waits before writing out a dirty buffer to disk. The
-value is in jiffies. The default value is 30 seconds,
-the minimum is 1 second, and the maximum 6,000 seconds.
-int nfract_sync:
-The seventh parameter, nfract_sync, governs the percentage
-of buffer cache that is dirty before bdflush activates
-synchronously. This can be viewed as the hard limit before
-bdflush forces buffers to disk. The default is 60%, the
-minimum is 0%, and the maximum is 100%.
-==============================================================
-buffermem:
-The three values in this file correspond to the values in
-the struct buffer_mem. It controls how much memory should
-be used for buffer memory. The percentage is calculated
-as a percentage of total system memory.
-The values are:
-min_percent    -- this is the minimum percentage of memory
-                  that should be spent on buffer memory
-borrow_percent -- UNUSED
-max_percent    -- UNUSED
-==============================================================
-freepages:
-This file contains the values in the struct freepages. That
-struct contains three members: min, low and high.
-The meaning of the numbers is:
-freepages.min	When the number of free pages in the system
-		reaches this number, only the kernel can
-		allocate more memory.
-freepages.low	If the number of free pages gets below this
-		point, the kernel starts swapping aggressively.
-freepages.high	The kernel tries to keep up to this amount of
-		memory free; if memory comes below this point,
-		the kernel gently starts swapping in the hopes
-		that it never has to do real aggressive swapping.
+dirty_async_ratio, dirty_background_ratio, dirty_expire_centisecs,
+dirty_sync_ratio dirty_writeback_centisecs:
+See Documentation/filesystems/proc.txt
 ==============================================================
@@ -180,38 +92,3 @@ The number of pages the kernel reads in at once is equal to
 2 ^ page-cluster. Values above 2 ^ 5 don't make much sense
 for swap because we only cluster swap data in 32-page groups.
-==============================================================
-pagecache:
-This file does exactly the same as buffermem, only this
-file controls the struct page_cache, and thus controls
-the amount of memory used for the page cache.
-In 2.2, the page cache is used for 3 main purposes:
-- caching read() data from files
-- caching mmap()ed data and executable files
-- swap cache
-When your system is both deep in swap and high on cache,
-it probably means that a lot of the swapped data is being
-cached, making for more efficient swapping than possible
-with the 2.0 kernel.
-==============================================================
-pagetable_cache:
-The kernel keeps a number of page tables in a per-processor
-cache (this helps a lot on SMP systems). The cache size for
-each processor will be between the low and the high value.
-On a low-memory, single CPU system you can safely set these
-values to 0 so you don't waste the memory. On SMP systems it
-is used so that the system can do fast pagetable allocations
-without having to acquire the kernel memory lock.
-For large systems, the settings are probably OK. For normal
-systems they won't hurt a bit. For small systems (<16MB ram)
-it might be advantageous to set both values to 0.
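A quick worked example of the 2 ^ page-cluster formula above (page_cluster = 3
is only an assumed setting, not a value from this commit):

--------------------------------------------------------------
#include <stdio.h>

int main(void)
{
	int page_cluster = 3;		/* assumed setting */
	int pages = 1 << page_cluster;	/* 2 ^ page-cluster */

	/* With 4 KB pages: 8 pages = 32 KB per swap read */
	printf("%d pages (%d KB with 4K pages)\n", pages, pages * 4);
	return 0;
}
--------------------------------------------------------------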
@@ -48,6 +48,8 @@
 #include "proto.h"
 #include "irq_impl.h"
+u64 jiffies_64;
 extern rwlock_t xtime_lock;
 extern unsigned long wall_jiffies;	/* kernel/timer.c */
......
@@ -32,6 +32,8 @@
 #include <asm/irq.h>
 #include <asm/leds.h>
+u64 jiffies_64;
 extern rwlock_t xtime_lock;
 extern unsigned long wall_jiffies;
......
@@ -44,6 +44,8 @@
 #include <asm/svinto.h>
+u64 jiffies_64;
 static int have_rtc;	/* used to remember if we have an RTC or not */
 /* define this if you need to use print_timestamp */
......
@@ -360,8 +360,9 @@ void __global_cli(void)
 	__save_flags(flags);
 	if (flags & (1 << EFLAGS_IF_SHIFT)) {
-		int cpu = smp_processor_id();
+		int cpu;
 		__cli();
+		cpu = smp_processor_id();
 		if (!local_irq_count(cpu))
 			get_irqlock(cpu);
 	}
@@ -369,11 +370,12 @@ void __global_cli(void)
 void __global_sti(void)
 {
-	int cpu = smp_processor_id();
+	int cpu = get_cpu();
 	if (!local_irq_count(cpu))
 		release_irqlock(cpu);
 	__sti();
+	put_cpu();
 }
 /*
......
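The __global_sti() change above is the get_cpu()/put_cpu() pattern: with a
preemptible kernel a task may migrate between CPUs, so a CPU number is only
stable while preemption is disabled. A hedged sketch (the per-CPU helper is
hypothetical):

--------------------------------------------------------------
/* get_cpu() disables preemption and returns the CPU id;
 * put_cpu() re-enables preemption. Between the two, the task
 * cannot migrate, so 'cpu' stays valid. */
static void touch_this_cpu(void)
{
	int cpu = get_cpu();	/* preemption disabled from here on */

	do_per_cpu_work(cpu);	/* hypothetical per-CPU helper */
	put_cpu();		/* preemption enabled again */
}
--------------------------------------------------------------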
@@ -65,6 +65,7 @@
 */
 #include <linux/irq.h>
+u64 jiffies_64;
 unsigned long cpu_khz;	/* Detected as we calibrate the TSC */
......
@@ -9,6 +9,7 @@
 O_TARGET := mm.o
-obj-y	 := init.o fault.o ioremap.o extable.o
+obj-y	 := init.o fault.o ioremap.o extable.o pageattr.o
+export-objs := pageattr.o
 include $(TOPDIR)/Rules.make
@@ -10,12 +10,13 @@
 #include <linux/vmalloc.h>
 #include <linux/init.h>
+#include <linux/slab.h>
 #include <asm/io.h>
 #include <asm/pgalloc.h>
 #include <asm/fixmap.h>
 #include <asm/cacheflush.h>
 #include <asm/tlbflush.h>
+#include <asm/pgtable.h>
 static inline void remap_area_pte(pte_t * pte, unsigned long address, unsigned long size,
 	unsigned long phys_addr, unsigned long flags)
@@ -155,6 +156,7 @@ void * __ioremap(unsigned long phys_addr, unsigned long size, unsigned long flag
 	area = get_vm_area(size, VM_IOREMAP);
 	if (!area)
 		return NULL;
+	area->phys_addr = phys_addr;
 	addr = area->addr;
 	if (remap_area_pages(VMALLOC_VMADDR(addr), phys_addr, size, flags)) {
 		vfree(addr);
@@ -163,10 +165,71 @@ void * __ioremap(unsigned long phys_addr, unsigned long size, unsigned long flag
 	return (void *) (offset + (char *)addr);
 }
+/**
+ * ioremap_nocache     -   map bus memory into CPU space
+ * @offset:    bus address of the memory
+ * @size:      size of the resource to map
+ *
+ * ioremap_nocache performs a platform specific sequence of operations to
+ * make bus memory CPU accessible via the readb/readw/readl/writeb/
+ * writew/writel functions and the other mmio helpers. The returned
+ * address is not guaranteed to be usable directly as a virtual
+ * address.
+ *
+ * This version of ioremap ensures that the memory is marked uncachable
+ * on the CPU as well as honouring existing caching rules from things like
+ * the PCI bus. Note that there are other caches and buffers on many
+ * busses. In particular driver authors should read up on PCI writes
+ *
+ * It's useful if some control registers are in such an area and
+ * write combining or read caching is not desirable:
+ *
+ * Must be freed with iounmap.
+ */
+void *ioremap_nocache (unsigned long phys_addr, unsigned long size)
+{
+	void *p = __ioremap(phys_addr, size, _PAGE_PCD);
+	if (!p)
+		return p;
+	if (phys_addr + size < virt_to_phys(high_memory)) {
+		struct page *ppage = virt_to_page(__va(phys_addr));
+		unsigned long npages = (size + PAGE_SIZE - 1) >> PAGE_SHIFT;
+		BUG_ON(phys_addr+size > (unsigned long)high_memory);
+		BUG_ON(phys_addr + size < phys_addr);
+		if (change_page_attr(ppage, npages, PAGE_KERNEL_NOCACHE) < 0) {
+			iounmap(p);
+			p = NULL;
+		}
+	}
+	return p;
+}
 void iounmap(void *addr)
 {
-	if (addr > high_memory)
-		return vfree((void *) (PAGE_MASK & (unsigned long) addr));
+	struct vm_struct *p;
+	if (addr < high_memory)
+		return;
+	p = remove_kernel_area(addr);
+	if (!p) {
+		printk("__iounmap: bad address %p\n", addr);
+		return;
+	}
+	BUG_ON(p->phys_addr == 0); /* not allocated with ioremap */
+	vmfree_area_pages(VMALLOC_VMADDR(p->addr), p->size);
+	if (p->flags && p->phys_addr < virt_to_phys(high_memory)) {
+		change_page_attr(virt_to_page(__va(p->phys_addr)),
+				 p->size >> PAGE_SHIFT,
+				 PAGE_KERNEL);
+	}
+	kfree(p);
 }
 void __init *bt_ioremap(unsigned long phys_addr, unsigned long size)
......
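A sketch of how a driver might use the new ioremap_nocache()/iounmap() pair
documented above; the physical address and register offset are invented for
illustration:

--------------------------------------------------------------
#include <asm/io.h>

#define FOO_REGS_PHYS	0xfebf0000UL	/* hypothetical MMIO region */
#define FOO_REGS_SIZE	0x1000
#define FOO_REG_ENABLE	0x04		/* hypothetical register */

static void *foo_regs;

static int foo_map(void)
{
	foo_regs = ioremap_nocache(FOO_REGS_PHYS, FOO_REGS_SIZE);
	if (!foo_regs)
		return -ENOMEM;
	writel(1, foo_regs + FOO_REG_ENABLE);	/* uncached MMIO write */
	return 0;
}

static void foo_unmap(void)
{
	iounmap(foo_regs);	/* required pairing, per the comment above */
}
--------------------------------------------------------------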
/*
* Copyright 2002 Andi Kleen, SuSE Labs.
* Thanks to Ben LaHaise for precious feedback.
*/
#include <linux/config.h>
#include <linux/mm.h>
#include <linux/sched.h>
#include <linux/highmem.h>
#include <linux/module.h>
#include <linux/slab.h>
#include <asm/uaccess.h>
#include <asm/processor.h>
static inline pte_t *lookup_address(unsigned long address)
{
pgd_t *pgd = pgd_offset_k(address);
pmd_t *pmd = pmd_offset(pgd, address);
if (pmd_large(*pmd))
return (pte_t *)pmd;
return pte_offset_kernel(pmd, address);
}
static struct page *split_large_page(unsigned long address, pgprot_t prot)
{
int i;
unsigned long addr;
struct page *base = alloc_pages(GFP_KERNEL, 0);
pte_t *pbase;
if (!base)
return NULL;
address = __pa(address);
addr = address & LARGE_PAGE_MASK;
pbase = (pte_t *)page_address(base);
for (i = 0; i < PTRS_PER_PTE; i++, addr += PAGE_SIZE) {
pbase[i] = pfn_pte(addr >> PAGE_SHIFT,
addr == address ? prot : PAGE_KERNEL);
}
return base;
}
static void flush_kernel_map(void *dummy)
{
/* Could use CLFLUSH here if the CPU supports it (Hammer,P4) */
if (boot_cpu_data.x86_model >= 4)
asm volatile("wbinvd":::"memory");
/* Flush all to work around Errata in early athlons regarding
* large page flushing.
*/
__flush_tlb_all();
}
static void set_pmd_pte(pte_t *kpte, unsigned long address, pte_t pte)
{
set_pte_atomic(kpte, pte); /* change init_mm */
#ifndef CONFIG_X86_PAE
{
struct list_head *l;
spin_lock(&mmlist_lock);
list_for_each(l, &init_mm.mmlist) {
struct mm_struct *mm = list_entry(l, struct mm_struct, mmlist);
pmd_t *pmd = pmd_offset(pgd_offset(mm, address), address);
set_pte_atomic((pte_t *)pmd, pte);
}
spin_unlock(&mmlist_lock);
}
#endif
}
/*
* No more special protections in this 2/4MB area - revert to a
* large page again.
*/
static inline void revert_page(struct page *kpte_page, unsigned long address)
{
pte_t *linear = (pte_t *)
pmd_offset(pgd_offset(&init_mm, address), address);
set_pmd_pte(linear, address,
pfn_pte((__pa(address) & LARGE_PAGE_MASK) >> PAGE_SHIFT,
PAGE_KERNEL_LARGE));
}
static int
__change_page_attr(struct page *page, pgprot_t prot, struct page **oldpage)
{
pte_t *kpte;
unsigned long address;
struct page *kpte_page;
#ifdef CONFIG_HIGHMEM
if (page >= highmem_start_page)
BUG();
#endif
address = (unsigned long)page_address(page);
kpte = lookup_address(address);
kpte_page = virt_to_page(((unsigned long)kpte) & PAGE_MASK);
if (pgprot_val(prot) != pgprot_val(PAGE_KERNEL)) {
if ((pte_val(*kpte) & _PAGE_PSE) == 0) {
pte_t old = *kpte;
pte_t standard = mk_pte(page, PAGE_KERNEL);
set_pte_atomic(kpte, mk_pte(page, prot));
if (pte_same(old,standard))
atomic_inc(&kpte_page->count);
} else {
struct page *split = split_large_page(address, prot);
if (!split)
return -ENOMEM;
set_pmd_pte(kpte,address,mk_pte(split, PAGE_KERNEL));
}
} else if ((pte_val(*kpte) & _PAGE_PSE) == 0) {
set_pte_atomic(kpte, mk_pte(page, PAGE_KERNEL));
atomic_dec(&kpte_page->count);
}
if (cpu_has_pse && (atomic_read(&kpte_page->count) == 1)) {
*oldpage = kpte_page;
revert_page(kpte_page, address);
}
return 0;
}
static inline void flush_map(void)
{
#ifdef CONFIG_SMP
smp_call_function(flush_kernel_map, NULL, 1, 1);
#endif
flush_kernel_map(NULL);
}
struct deferred_page {
struct deferred_page *next;
struct page *fpage;
};
static struct deferred_page *df_list; /* protected by init_mm.mmap_sem */
/*
* Change the page attributes of an page in the linear mapping.
*
* This should be used when a page is mapped with a different caching policy
* than write-back somewhere - some CPUs do not like it when mappings with
* different caching policies exist. This changes the page attributes of the
* in kernel linear mapping too.
*
* The caller needs to ensure that there are no conflicting mappings elsewhere.
* This function only deals with the kernel linear map.
*
* Caller must call global_flush_tlb() after this.
*/
int change_page_attr(struct page *page, int numpages, pgprot_t prot)
{
int err = 0;
struct page *fpage;
int i;
down_write(&init_mm.mmap_sem);
for (i = 0; i < numpages; i++, page++) {
fpage = NULL;
err = __change_page_attr(page, prot, &fpage);
if (err)
break;
if (fpage) {
struct deferred_page *df;
df = kmalloc(sizeof(struct deferred_page), GFP_KERNEL);
if (!df) {
flush_map();
__free_page(fpage);
} else {
df->next = df_list;
df->fpage = fpage;
df_list = df;
}
}
}
up_write(&init_mm.mmap_sem);
return err;
}
void global_flush_tlb(void)
{
struct deferred_page *df, *next_df;
down_read(&init_mm.mmap_sem);
df = xchg(&df_list, NULL);
up_read(&init_mm.mmap_sem);
flush_map();
for (; df; df = next_df) {
next_df = df->next;
if (df->fpage)
__free_page(df->fpage);
kfree(df);
}
}
EXPORT_SYMBOL(change_page_attr);
EXPORT_SYMBOL(global_flush_tlb);
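A sketch of the calling convention change_page_attr()'s comment requires,
remapping a single page uncached and then reverting it; 'buf' stands for some
valid lowmem kernel buffer and is only an assumption:

--------------------------------------------------------------
/* Sketch only: error handling trimmed. */
static void make_page_uncached_and_back(void *buf)
{
	struct page *pg = virt_to_page(buf);

	if (change_page_attr(pg, 1, PAGE_KERNEL_NOCACHE) == 0) {
		global_flush_tlb();	/* mandatory after change_page_attr() */
		/* ... use the page with its new caching policy ... */
		change_page_attr(pg, 1, PAGE_KERNEL);	/* revert */
		global_flush_tlb();
	}
}
--------------------------------------------------------------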
@@ -27,6 +27,8 @@ extern rwlock_t xtime_lock;
 extern unsigned long wall_jiffies;
 extern unsigned long last_time_offset;
+u64 jiffies_64;
 #ifdef CONFIG_IA64_DEBUG_IRQ
 unsigned long last_cli_ip;
......
@@ -24,6 +24,7 @@
 #include <linux/timex.h>
+u64 jiffies_64;
 static inline int set_rtc_mmss(unsigned long nowtime)
 {
......
@@ -32,6 +32,8 @@
 #define USECS_PER_JIFFY (1000000/HZ)
 #define USECS_PER_JIFFY_FRAC ((1000000ULL << 32) / HZ & 0xffffffff)
+u64 jiffies_64;
 /*
 * forward reference
 */
......
@@ -32,6 +32,8 @@
 #include <asm/sysmips.h>
 #include <asm/uaccess.h>
+u64 jiffies_64;
 extern asmlinkage void syscall_trace(void);
 asmlinkage int sys_pipe(abi64_no_regargs, struct pt_regs regs)
......
@@ -30,6 +30,8 @@
 #include <linux/timex.h>
+u64 jiffies_64;
 extern rwlock_t xtime_lock;
 static int timer_value;
......
@@ -70,6 +70,9 @@
 #include <asm/time.h>
+/* XXX false sharing with below? */
+u64 jiffies_64;
 unsigned long disarm_decr[NR_CPUS];
 extern int do_sys_settimeofday(struct timeval *tv, struct timezone *tz);
......
@@ -64,6 +64,8 @@
 void smp_local_timer_interrupt(struct pt_regs *);
+u64 jiffies_64;
 /* keep track of when we need to update the rtc */
 time_t last_rtc_update;
 extern rwlock_t xtime_lock;
......
@@ -39,6 +39,8 @@
 #define TICK_SIZE tick
+u64 jiffies_64;
 static ext_int_info_t ext_int_info_timer;
 static uint64_t init_timer_cc;
......
@@ -39,6 +39,8 @@
 #define TICK_SIZE tick
+u64 jiffies_64;
 static ext_int_info_t ext_int_info_timer;
 static uint64_t init_timer_cc;
......
@@ -70,6 +70,8 @@
 #endif /* CONFIG_CPU_SUBTYPE_ST40STB1 */
 #endif /* __sh3__ or __SH4__ */
+u64 jiffies_64;
 extern rwlock_t xtime_lock;
 extern unsigned long wall_jiffies;
 #define TICK_SIZE tick
......
@@ -43,6 +43,8 @@
 extern rwlock_t xtime_lock;
+u64 jiffies_64;
 enum sparc_clock_type sp_clock_typ;
 spinlock_t mostek_lock = SPIN_LOCK_UNLOCKED;
 unsigned long mstk48t02_regs = 0UL;
......
@@ -44,6 +44,8 @@ unsigned long mstk48t02_regs = 0UL;
 unsigned long ds1287_regs = 0UL;
 #endif
+u64 jiffies_64;
 static unsigned long mstk48t08_regs = 0UL;
 static unsigned long mstk48t59_regs = 0UL;
......
@@ -43,15 +43,9 @@ CFLAGS += -mcmodel=kernel
 CFLAGS += -pipe
 # this makes reading assembly source easier
 CFLAGS += -fno-reorder-blocks
-# needed for later gcc 3.1
 CFLAGS += -finline-limit=2000
-# needed for earlier gcc 3.1
-#CFLAGS += -fno-strength-reduce
 #CFLAGS += -g
-# prevent gcc from keeping the stack 16 byte aligned (FIXME)
-#CFLAGS += -mpreferred-stack-boundary=2
 HEAD := arch/x86_64/kernel/head.o arch/x86_64/kernel/head64.o arch/x86_64/kernel/init_task.o
 SUBDIRS := arch/x86_64/tools $(SUBDIRS) arch/x86_64/kernel arch/x86_64/mm arch/x86_64/lib
......
@@ -21,10 +21,6 @@ ROOT_DEV := CURRENT
 SVGA_MODE := -DSVGA_MODE=NORMAL_VGA
-# If you want the RAM disk device, define this to be the size in blocks.
-RAMDISK := -DRAMDISK=512
 # ---------------------------------------------------------------------------
 BOOT_INCL = $(TOPDIR)/include/linux/config.h \
......
@@ -47,8 +47,7 @@ define_bool CONFIG_EISA n
 define_bool CONFIG_X86_IO_APIC y
 define_bool CONFIG_X86_LOCAL_APIC y
-#currently broken:
-#bool 'MTRR (Memory Type Range Register) support' CONFIG_MTRR
+bool 'MTRR (Memory Type Range Register) support' CONFIG_MTRR
 bool 'Symmetric multi-processing support' CONFIG_SMP
 if [ "$CONFIG_SMP" = "n" ]; then
 	bool 'Preemptible Kernel' CONFIG_PREEMPT
@@ -226,6 +225,7 @@ if [ "$CONFIG_DEBUG_KERNEL" != "n" ]; then
 	bool ' Spinlock debugging' CONFIG_DEBUG_SPINLOCK
 	bool ' Additional run-time checks' CONFIG_CHECKING
 	bool ' Debug __init statements' CONFIG_INIT_DEBUG
+	bool ' Spinlock debugging' CONFIG_DEBUG_SPINLOCK
 fi
 endmenu
......
@@ -9,8 +9,9 @@ export-objs := ia32_ioctl.o sys_ia32.o
 all: ia32.o
 O_TARGET := ia32.o
-obj-$(CONFIG_IA32_EMULATION) := ia32entry.o sys_ia32.o ia32_ioctl.o ia32_signal.o \
-	ia32_binfmt.o fpu32.o socket32.o ptrace32.o
+obj-$(CONFIG_IA32_EMULATION) := ia32entry.o sys_ia32.o ia32_ioctl.o \
+	ia32_signal.o \
+	ia32_binfmt.o fpu32.o socket32.o ptrace32.o ipc32.o
 clean::
......
@@ -14,6 +14,7 @@
 #include <linux/smp.h>
 #include <linux/smp_lock.h>
 #include <linux/stddef.h>
+#include <linux/slab.h>
 /* Set EXTENT bits starting at BASE in BITMAP to value TURN_ON. */
 static void set_bitmap(unsigned long *bitmap, short base, short extent, int new_value)
@@ -61,27 +62,19 @@ asmlinkage int sys_ioperm(unsigned long from, unsigned long num, int turn_on)
 		return -EINVAL;
 	if (turn_on && !capable(CAP_SYS_RAWIO))
 		return -EPERM;
-	/*
-	 * If it's the first ioperm() call in this thread's lifetime, set the
-	 * IO bitmap up. ioperm() is much less timing critical than clone(),
-	 * this is why we delay this operation until now:
-	 */
-	if (!t->ioperm) {
-		/*
-		 * just in case ...
-		 */
-		memset(t->io_bitmap,0xff,(IO_BITMAP_SIZE+1)*4);
-		t->ioperm = 1;
-		/*
-		 * this activates it in the TSS
-		 */
+	if (!t->io_bitmap_ptr) {
+		t->io_bitmap_ptr = kmalloc((IO_BITMAP_SIZE+1)*4, GFP_KERNEL);
+		if (!t->io_bitmap_ptr)
+			return -ENOMEM;
+		memset(t->io_bitmap_ptr,0xff,(IO_BITMAP_SIZE+1)*4);
 		tss->io_map_base = IO_BITMAP_OFFSET;
 	}
 	/*
 	 * do it in the per-thread copy and in the TSS ...
 	 */
-	set_bitmap((unsigned long *) t->io_bitmap, from, num, !turn_on);
+	set_bitmap((unsigned long *) t->io_bitmap_ptr, from, num, !turn_on);
 	set_bitmap((unsigned long *) tss->io_bitmap, from, num, !turn_on);
 	return 0;
......
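For context, the syscall patched above is what backs the glibc ioperm(3)
wrapper. A small, hedged userspace example (needs CAP_SYS_RAWIO; port 0x80 is
the traditional POST port, and the value written is arbitrary):

--------------------------------------------------------------
#include <stdio.h>
#include <sys/io.h>

int main(void)
{
	if (ioperm(0x80, 1, 1)) {	/* enable access to one port */
		perror("ioperm");
		return 1;
	}
	outb(0x42, 0x80);		/* harmless write to port 0x80 */
	ioperm(0x80, 1, 0);		/* drop access again */
	return 0;
}
--------------------------------------------------------------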
@@ -39,6 +39,7 @@
 #include <linux/reboot.h>
 #include <linux/init.h>
 #include <linux/ctype.h>
+#include <linux/slab.h>
 #include <asm/uaccess.h>
 #include <asm/pgtable.h>
@@ -320,9 +321,6 @@ void show_regs(struct pt_regs * regs)
 	printk("CR2: %016lx CR3: %016lx CR4: %016lx\n", cr2, cr3, cr4);
 }
-#define __STR(x) #x
-#define __STR2(x) __STR(x)
 extern void load_gs_index(unsigned);
 /*
@@ -330,7 +328,13 @@ extern void load_gs_index(unsigned);
 */
 void exit_thread(void)
 {
-	/* nothing to do ... */
+	struct task_struct *me = current;
+	if (me->thread.io_bitmap_ptr) {
+		kfree(me->thread.io_bitmap_ptr);
+		me->thread.io_bitmap_ptr = NULL;
+		(init_tss + smp_processor_id())->io_map_base =
+			INVALID_IO_BITMAP_OFFSET;
+	}
 }
 void flush_thread(void)
@@ -392,6 +396,14 @@ int copy_thread(int nr, unsigned long clone_flags, unsigned long rsp,
 	unlazy_fpu(current);
 	p->thread.i387 = current->thread.i387;
+	if (unlikely(me->thread.io_bitmap_ptr != NULL)) {
+		p->thread.io_bitmap_ptr = kmalloc((IO_BITMAP_SIZE+1)*4, GFP_KERNEL);
+		if (!p->thread.io_bitmap_ptr)
+			return -ENOMEM;
+		memcpy(p->thread.io_bitmap_ptr, me->thread.io_bitmap_ptr,
+		       (IO_BITMAP_SIZE+1)*4);
+	}
 	return 0;
 }
@@ -491,21 +503,14 @@ void __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	/*
 	 * Handle the IO bitmap
 	 */
-	if (unlikely(prev->ioperm || next->ioperm)) {
-		if (next->ioperm) {
+	if (unlikely(prev->io_bitmap_ptr || next->io_bitmap_ptr)) {
+		if (next->io_bitmap_ptr) {
 			/*
 			 * 4 cachelines copy ... not good, but not that
 			 * bad either. Anyone got something better?
 			 * This only affects processes which use ioperm().
-			 * [Putting the TSSs into 4k-tlb mapped regions
-			 * and playing VM tricks to switch the IO bitmap
-			 * is not really acceptable.]
-			 * On x86-64 we could put multiple bitmaps into
-			 * the GDT and just switch offsets
-			 * This would require ugly special cases on overflow
-			 * though -AK
 			 */
-			memcpy(tss->io_bitmap, next->io_bitmap,
+			memcpy(tss->io_bitmap, next->io_bitmap_ptr,
 			       IO_BITMAP_SIZE*sizeof(u32));
 			tss->io_map_base = IO_BITMAP_OFFSET;
 		} else {
......
@@ -91,6 +91,9 @@ void pda_init(int cpu)
 	pda->me = pda;
 	pda->cpudata_offset = 0;
+	pda->active_mm = &init_mm;
+	pda->mmu_state = 0;
 	asm volatile("movl %0,%%fs ; movl %0,%%gs" :: "r" (0));
 	wrmsrl(MSR_GS_BASE, cpu_pda + cpu);
 }
......
@@ -84,7 +84,6 @@ struct rt_sigframe
 	char *pretcode;
 	struct ucontext uc;
 	struct siginfo info;
-	struct _fpstate fpstate;
 };
 static int
@@ -186,8 +185,7 @@ asmlinkage long sys_rt_sigreturn(struct pt_regs regs)
 */
 static int
-setup_sigcontext(struct sigcontext *sc, struct _fpstate *fpstate,
-		 struct pt_regs *regs, unsigned long mask)
+setup_sigcontext(struct sigcontext *sc, struct pt_regs *regs, unsigned long mask)
 {
 	int tmp, err = 0;
 	struct task_struct *me = current;
@@ -221,20 +219,17 @@ setup_sigcontext(struct sigcontext *sc, struct _fpstate *fpstate,
 	err |= __put_user(mask, &sc->oldmask);
 	err |= __put_user(me->thread.cr2, &sc->cr2);
-	tmp = save_i387(fpstate);
-	if (tmp < 0)
-		err = 1;
-	else
-		err |= __put_user(tmp ? fpstate : NULL, &sc->fpstate);
 	return err;
 }
 /*
 * Determine which stack to use..
 */
-static inline struct rt_sigframe *
-get_sigframe(struct k_sigaction *ka, struct pt_regs * regs)
+#define round_down(p, r) ((void *) ((unsigned long)((p) - (r) + 1) & ~((r)-1)))
+static void *
+get_stack(struct k_sigaction *ka, struct pt_regs *regs, unsigned long size)
 {
 	unsigned long rsp;
@@ -247,22 +242,34 @@ get_sigframe(struct k_sigaction *ka, struct pt_regs * regs)
 		rsp = current->sas_ss_sp + current->sas_ss_size;
 	}
-	rsp = (rsp - sizeof(struct _fpstate)) & ~(15UL);
-	rsp -= offsetof(struct rt_sigframe, fpstate);
-	return (struct rt_sigframe *) rsp;
+	return round_down(rsp - size, 16);
 }
 static void setup_rt_frame(int sig, struct k_sigaction *ka, siginfo_t *info,
 			   sigset_t *set, struct pt_regs * regs)
 {
-	struct rt_sigframe *frame;
+	struct rt_sigframe *frame = NULL;
+	struct _fpstate *fp = NULL;
 	int err = 0;
-	frame = get_sigframe(ka, regs);
+	if (current->used_math) {
+		fp = get_stack(ka, regs, sizeof(struct _fpstate));
+		frame = round_down((char *)fp - sizeof(struct rt_sigframe), 16) - 8;
+		if (!access_ok(VERIFY_WRITE, fp, sizeof(struct _fpstate))) {
+			goto give_sigsegv;
+		}
+		if (save_i387(fp) < 0)
+			err |= -1;
+	}
+	if (!frame)
+		frame = get_stack(ka, regs, sizeof(struct rt_sigframe)) - 8;
-	if (!access_ok(VERIFY_WRITE, frame, sizeof(*frame)))
+	if (!access_ok(VERIFY_WRITE, frame, sizeof(*frame))) {
 		goto give_sigsegv;
+	}
 	if (ka->sa.sa_flags & SA_SIGINFO) {
 		err |= copy_siginfo_to_user(&frame->info, info);
@@ -278,14 +285,10 @@ static void setup_rt_frame(int sig, struct k_sigaction *ka, siginfo_t *info,
 	err |= __put_user(sas_ss_flags(regs->rsp),
 			  &frame->uc.uc_stack.ss_flags);
 	err |= __put_user(current->sas_ss_size, &frame->uc.uc_stack.ss_size);
-	err |= setup_sigcontext(&frame->uc.uc_mcontext, &frame->fpstate,
-				regs, set->sig[0]);
+	err |= setup_sigcontext(&frame->uc.uc_mcontext, regs, set->sig[0]);
+	err |= __put_user(fp, &frame->uc.uc_mcontext.fpstate);
 	err |= __copy_to_user(&frame->uc.uc_sigmask, set, sizeof(*set));
-	if (err) {
-		goto give_sigsegv;
-	}
 	/* Set up to return from userspace. If provided, use a stub
 	   already in userspace. */
 	/* x86-64 should always use SA_RESTORER. */
@@ -297,7 +300,6 @@ static void setup_rt_frame(int sig, struct k_sigaction *ka, siginfo_t *info,
 	}
 	if (err) {
-		printk("fault 3\n");
 		goto give_sigsegv;
 	}
@@ -305,7 +307,6 @@ static void setup_rt_frame(int sig, struct k_sigaction *ka, siginfo_t *info,
 	printk("%d old rip %lx old rsp %lx old rax %lx\n", current->pid,regs->rip,regs->rsp,regs->rax);
 #endif
 	/* Set up registers for signal handler */
 	{
 		struct exec_domain *ed = current_thread_info()->exec_domain;
@@ -320,9 +321,10 @@ static void setup_rt_frame(int sig, struct k_sigaction *ka, siginfo_t *info,
 	   next argument after the signal number on the stack. */
 	regs->rsi = (unsigned long)&frame->info;
 	regs->rdx = (unsigned long)&frame->uc;
-	regs->rsp = (unsigned long) frame;
 	regs->rip = (unsigned long) ka->sa.sa_handler;
+	regs->rsp = (unsigned long)frame;
 	set_fs(USER_DS);
 	regs->eflags &= ~TF_MASK;
......
@@ -25,8 +25,6 @@
 /* The 'big kernel lock' */
 spinlock_t kernel_flag __cacheline_aligned_in_smp = SPIN_LOCK_UNLOCKED;
-struct tlb_state cpu_tlbstate[NR_CPUS] = {[0 ... NR_CPUS-1] = { &init_mm, 0 }};
 /*
 * the following functions deal with sending IPIs between CPUs.
 *
@@ -147,9 +145,9 @@ static spinlock_t tlbstate_lock = SPIN_LOCK_UNLOCKED;
 */
 static void inline leave_mm (unsigned long cpu)
 {
-	if (cpu_tlbstate[cpu].state == TLBSTATE_OK)
+	if (read_pda(mmu_state) == TLBSTATE_OK)
 		BUG();
-	clear_bit(cpu, &cpu_tlbstate[cpu].active_mm->cpu_vm_mask);
+	clear_bit(cpu, &read_pda(active_mm)->cpu_vm_mask);
 	__flush_tlb();
 }
@@ -164,18 +162,18 @@ static void inline leave_mm (unsigned long cpu)
 * the other cpus, but smp_invalidate_interrupt ignore flush ipis
 * for the wrong mm, and in the worst case we perform a superflous
 * tlb flush.
- * 1a2) set cpu_tlbstate to TLBSTATE_OK
+ * 1a2) set cpu mmu_state to TLBSTATE_OK
 *	Now the smp_invalidate_interrupt won't call leave_mm if cpu0
 *	was in lazy tlb mode.
- * 1a3) update cpu_tlbstate[].active_mm
+ * 1a3) update cpu active_mm
 *	Now cpu0 accepts tlb flushes for the new mm.
 * 1a4) set_bit(cpu, &new_mm->cpu_vm_mask);
 *	Now the other cpus will send tlb flush ipis.
 * 1a4) change cr3.
 * 1b) thread switch without mm change
- *	cpu_tlbstate[].active_mm is correct, cpu0 already handles
+ *	cpu active_mm is correct, cpu0 already handles
 *	flush ipis.
- * 1b1) set cpu_tlbstate to TLBSTATE_OK
+ * 1b1) set cpu mmu_state to TLBSTATE_OK
 * 1b2) test_and_set the cpu bit in cpu_vm_mask.
 *	Atomically set the bit [other cpus will start sending flush ipis],
 *	and test the bit.
@@ -188,7 +186,7 @@ static void inline leave_mm (unsigned long cpu)
 * runs in kernel space, the cpu could load tlb entries for user space
 * pages.
 *
- * The good news is that cpu_tlbstate is local to each cpu, no
+ * The good news is that cpu mmu_state is local to each cpu, no
 * write/read ordering problems.
 */
@@ -216,8 +214,8 @@ asmlinkage void smp_invalidate_interrupt (void)
 *	BUG();
 */
-	if (flush_mm == cpu_tlbstate[cpu].active_mm) {
-		if (cpu_tlbstate[cpu].state == TLBSTATE_OK) {
+	if (flush_mm == read_pda(active_mm)) {
+		if (read_pda(mmu_state) == TLBSTATE_OK) {
 			if (flush_va == FLUSH_ALL)
 				local_flush_tlb();
 			else
@@ -335,7 +333,7 @@ static inline void do_flush_tlb_all_local(void)
 	unsigned long cpu = smp_processor_id();
 	__flush_tlb_all();
-	if (cpu_tlbstate[cpu].state == TLBSTATE_LAZY)
+	if (read_pda(mmu_state) == TLBSTATE_LAZY)
 		leave_mm(cpu);
 }
......
@@ -47,7 +47,7 @@
 #define __vsyscall(nr) __attribute__ ((unused,__section__(".vsyscall_" #nr)))
-#define NO_VSYSCALL 1
+//#define NO_VSYSCALL 1
 #ifdef NO_VSYSCALL
 #include <asm/unistd.h>
......
@@ -189,3 +189,5 @@ EXPORT_SYMBOL_NOVERS(do_softirq_thunk);
 void out_of_line_bug(void);
 EXPORT_SYMBOL(out_of_line_bug);
+EXPORT_SYMBOL(init_level4_pgt);
@@ -12,7 +12,7 @@ obj-y = csum-partial.o csum-copy.o csum-wrappers.o delay.o \
 	thunk.o io.o clear_page.o copy_page.o
 obj-y += memcpy.o
 obj-y += memmove.o
-#obj-y += memset.o
+obj-y += memset.o
 obj-y += copy_user.o
 export-objs := io.o csum-wrappers.o csum-partial.o
......
-/* Copyright 2002 Andi Kleen, SuSE Labs */
-// #define FIX_ALIGNMENT 1
+/* Copyright 2002 Andi Kleen */
 /*
 * ISO C memset - set a memory block to a byte value.
@@ -11,51 +9,51 @@
 *
 * rax	original destination
 */
-	.globl ____memset
+	.globl __memset
+	.globl memset
 	.p2align
-____memset:
-	movq %rdi,%r10	/* save destination for return address */
-	movq %rdx,%r11	/* save count */
+memset:
+__memset:
+	movq %rdi,%r10
+	movq %rdx,%r11
 	/* expand byte value */
-	movzbl %sil,%ecx	/* zero extend char value */
-	movabs $0x0101010101010101,%rax /* expansion pattern */
-	mul %rcx	/* expand with rax, clobbers rdx */
-#ifdef FIX_ALIGNMENT
+	movzbl %sil,%ecx
+	movabs $0x0101010101010101,%rax
+	mul %rcx	/* with rax, clobbers rdx */
 	/* align dst */
 	movl %edi,%r9d
-	andl $7,%r9d	/* test unaligned bits */
+	andl $7,%r9d
 	jnz bad_alignment
 after_bad_alignment:
-#endif
-	movq %r11,%rcx	/* restore count */
-	shrq $6,%rcx	/* divide by 64 */
-	jz handle_tail	/* block smaller than 64 bytes? */
-	movl $64,%r8d	/* CSE loop block size */
+	movq %r11,%rcx
+	movl $64,%r8d
+	shrq $6,%rcx
+	jz handle_tail
 loop_64:
-	movnti %rax,0*8(%rdi)
-	movnti %rax,1*8(%rdi)
-	movnti %rax,2*8(%rdi)
-	movnti %rax,3*8(%rdi)
-	movnti %rax,4*8(%rdi)
-	movnti %rax,5*8(%rdi)
-	movnti %rax,6*8(%rdi)
-	movnti %rax,7*8(%rdi)	/* clear 64 byte blocks */
-	addq %r8,%rdi	/* increase pointer by 64 bytes */
-	loop loop_64	/* decrement rcx and if not zero loop */
+	movnti %rax,(%rdi)
+	movnti %rax,8(%rdi)
+	movnti %rax,16(%rdi)
+	movnti %rax,24(%rdi)
+	movnti %rax,32(%rdi)
+	movnti %rax,40(%rdi)
+	movnti %rax,48(%rdi)
+	movnti %rax,56(%rdi)
+	addq %r8,%rdi
+	loop loop_64
 	/* Handle tail in loops. The loops should be faster than hard
 	   to predict jump tables. */
 handle_tail:
 	movl %r11d,%ecx
-	andl $63,%ecx
-	jz handle_7
-	shrl $3,%ecx
+	andl $63&(~7),%ecx
+	shrl $3,%ecx
+	jz handle_7
 loop_8:
-	movnti %rax,(%rdi)	/* long words */
+	movnti %rax,(%rdi)
 	addq $8,%rdi
 	loop loop_8
@@ -64,22 +62,20 @@ handle_7:
 	andl $7,%ecx
 	jz ende
 loop_1:
-	movb %al,(%rdi)	/* bytes */
-	incq %rdi
+	movb %al,(%rdi)
+	addq $1,%rdi
 	loop loop_1
 ende:
 	movq %r10,%rax
 	ret
-#ifdef FIX_ALIGNMENT
 bad_alignment:
-	andq $-8,%r11	/* shorter than 8 bytes */
-	jz handle_7	/* if yes handle it in the tail code */
-	movnti %rax,(%rdi)	/* unaligned store of 8 bytes */
+	cmpq $7,%r11
+	jbe handle_7
+	movnti %rax,(%rdi)	/* unaligned store */
 	movq $8,%r8
-	subq %r9,%r8	/* compute alignment (8-misalignment) */
-	addq %r8,%rdi	/* fix destination */
-	subq %r8,%r11	/* fix count */
+	subq %r9,%r8
+	addq %r8,%rdi
+	subq %r8,%r11
 	jmp after_bad_alignment
-#endif
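The movabs/mul pair near the top of the routine is the usual byte-replication
trick: multiplying a byte by 0x0101010101010101 copies it into all eight byte
lanes of a 64-bit word, so the main loop can store eight bytes at a time. A
standalone C illustration:

--------------------------------------------------------------
#include <stdint.h>
#include <stdio.h>

/* Replicate one byte into every byte of a 64-bit word with a single
 * multiply; no carries occur because c < 0x100. */
static uint64_t expand_byte(unsigned char c)
{
	return (uint64_t)c * 0x0101010101010101ULL;
}

int main(void)
{
	printf("%016llx\n", (unsigned long long)expand_byte(0xab));
	/* prints: abababababababab */
	return 0;
}
--------------------------------------------------------------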
@@ -28,6 +28,7 @@
 #include <linux/types.h>
 #include <linux/blk.h>
 #include <linux/blkdev.h>
+#include <linux/bio.h>
 #include <linux/completion.h>
 #include <linux/delay.h>
 #include <linux/genhd.h>
......
@@ -30,6 +30,7 @@
 #include <linux/delay.h>
 #include <linux/major.h>
 #include <linux/fs.h>
+#include <linux/bio.h>
 #include <linux/blkpg.h>
 #include <linux/timer.h>
 #include <linux/proc_fs.h>
......
@@ -24,6 +24,7 @@
 #include <linux/version.h>
 #include <linux/types.h>
 #include <linux/pci.h>
+#include <linux/bio.h>
 #include <linux/kernel.h>
 #include <linux/slab.h>
 #include <linux/delay.h>
......
@@ -28,6 +28,7 @@
 #include <linux/fs.h>
 #include <linux/blkdev.h>
 #include <linux/elevator.h>
+#include <linux/bio.h>
 #include <linux/blk.h>
 #include <linux/config.h>
 #include <linux/module.h>
......
@@ -165,6 +165,7 @@ static int print_unex=1;
 #include <linux/errno.h>
 #include <linux/slab.h>
 #include <linux/mm.h>
+#include <linux/bio.h>
 #include <linux/string.h>
 #include <linux/fcntl.h>
 #include <linux/delay.h>
......
@@ -18,6 +18,7 @@
 #include <linux/errno.h>
 #include <linux/string.h>
 #include <linux/config.h>
+#include <linux/bio.h>
 #include <linux/mm.h>
 #include <linux/swap.h>
 #include <linux/init.h>
@@ -2002,8 +2003,8 @@ int __init blk_dev_init(void)
 	queue_nr_requests = (total_ram >> 8) & ~15;	/* One per quarter-megabyte */
 	if (queue_nr_requests < 32)
 		queue_nr_requests = 32;
-	if (queue_nr_requests > 512)
-		queue_nr_requests = 512;
+	if (queue_nr_requests > 256)
+		queue_nr_requests = 256;
 	/*
 	 * Batch frees according to queue length
......
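
The queue_nr_requests hunk above only lowers the clamp ceiling from 512 to 256; the heuristic itself is unchanged. Here is a standalone sketch of that sizing rule, assuming (for the illustration only) that total_ram is in kilobytes:

#include <stdio.h>

/* One request per quarter megabyte, rounded down to a multiple of 16
 * and clamped to [32, 256]; 256 is the new ceiling in the diff. */
static unsigned int size_queue(unsigned long total_ram_kb)
{
    unsigned long n = (total_ram_kb >> 8) & ~15UL;

    if (n < 32)
        n = 32;
    if (n > 256)
        n = 256;
    return (unsigned int)n;
}

int main(void)
{
    printf("8 MB   -> %u requests\n", size_queue(8UL * 1024));     /* 32 */
    printf("128 MB -> %u requests\n", size_queue(128UL * 1024));   /* clamped to 256 */
    return 0;
}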
...@@ -60,6 +60,7 @@ ...@@ -60,6 +60,7 @@
#include <linux/sched.h> #include <linux/sched.h>
#include <linux/fs.h> #include <linux/fs.h>
#include <linux/file.h> #include <linux/file.h>
#include <linux/bio.h>
#include <linux/stat.h> #include <linux/stat.h>
#include <linux/errno.h> #include <linux/errno.h>
#include <linux/major.h> #include <linux/major.h>
...@@ -168,6 +169,15 @@ static void figure_loop_size(struct loop_device *lo) ...@@ -168,6 +169,15 @@ static void figure_loop_size(struct loop_device *lo)
} }
static inline int lo_do_transfer(struct loop_device *lo, int cmd, char *rbuf,
char *lbuf, int size, int rblock)
{
if (!lo->transfer)
return 0;
return lo->transfer(lo, cmd, rbuf, lbuf, size, rblock);
}
static int static int
do_lo_send(struct loop_device *lo, struct bio_vec *bvec, int bsize, loff_t pos) do_lo_send(struct loop_device *lo, struct bio_vec *bvec, int bsize, loff_t pos)
{ {
...@@ -454,20 +464,43 @@ static struct bio *loop_get_buffer(struct loop_device *lo, struct bio *rbh) ...@@ -454,20 +464,43 @@ static struct bio *loop_get_buffer(struct loop_device *lo, struct bio *rbh)
out_bh: out_bh:
bio->bi_sector = rbh->bi_sector + (lo->lo_offset >> 9); bio->bi_sector = rbh->bi_sector + (lo->lo_offset >> 9);
bio->bi_rw = rbh->bi_rw; bio->bi_rw = rbh->bi_rw;
spin_lock_irq(&lo->lo_lock);
bio->bi_bdev = lo->lo_device; bio->bi_bdev = lo->lo_device;
spin_unlock_irq(&lo->lo_lock);
return bio; return bio;
} }
static int loop_make_request(request_queue_t *q, struct bio *rbh) static int
bio_transfer(struct loop_device *lo, struct bio *to_bio,
struct bio *from_bio)
{
unsigned long IV = loop_get_iv(lo, from_bio->bi_sector);
struct bio_vec *from_bvec, *to_bvec;
char *vto, *vfrom;
int ret = 0, i;
__bio_for_each_segment(from_bvec, from_bio, i, 0) {
to_bvec = &to_bio->bi_io_vec[i];
kmap(from_bvec->bv_page);
kmap(to_bvec->bv_page);
vfrom = page_address(from_bvec->bv_page) + from_bvec->bv_offset;
vto = page_address(to_bvec->bv_page) + to_bvec->bv_offset;
ret |= lo_do_transfer(lo, bio_data_dir(to_bio), vto, vfrom,
from_bvec->bv_len, IV);
kunmap(from_bvec->bv_page);
kunmap(to_bvec->bv_page);
}
return ret;
}
static int loop_make_request(request_queue_t *q, struct bio *old_bio)
{ {
struct bio *bh = NULL; struct bio *new_bio = NULL;
struct loop_device *lo; struct loop_device *lo;
unsigned long IV; unsigned long IV;
int rw = bio_rw(rbh); int rw = bio_rw(old_bio);
int unit = minor(to_kdev_t(rbh->bi_bdev->bd_dev)); int unit = minor(to_kdev_t(old_bio->bi_bdev->bd_dev));
if (unit >= max_loop) if (unit >= max_loop)
goto out; goto out;
...@@ -489,60 +522,41 @@ static int loop_make_request(request_queue_t *q, struct bio *rbh) ...@@ -489,60 +522,41 @@ static int loop_make_request(request_queue_t *q, struct bio *rbh)
goto err; goto err;
} }
blk_queue_bounce(q, &rbh); blk_queue_bounce(q, &old_bio);
/* /*
* file backed, queue for loop_thread to handle * file backed, queue for loop_thread to handle
*/ */
if (lo->lo_flags & LO_FLAGS_DO_BMAP) { if (lo->lo_flags & LO_FLAGS_DO_BMAP) {
loop_add_bio(lo, rbh); loop_add_bio(lo, old_bio);
return 0; return 0;
} }
/* /*
* piggy old buffer on original, and submit for I/O * piggy old buffer on original, and submit for I/O
*/ */
bh = loop_get_buffer(lo, rbh); new_bio = loop_get_buffer(lo, old_bio);
IV = loop_get_iv(lo, rbh->bi_sector); IV = loop_get_iv(lo, old_bio->bi_sector);
if (rw == WRITE) { if (rw == WRITE) {
if (lo_do_transfer(lo, WRITE, bio_data(bh), bio_data(rbh), if (bio_transfer(lo, new_bio, old_bio))
bh->bi_size, IV))
goto err; goto err;
} }
generic_make_request(bh); generic_make_request(new_bio);
return 0; return 0;
err: err:
if (atomic_dec_and_test(&lo->lo_pending)) if (atomic_dec_and_test(&lo->lo_pending))
up(&lo->lo_bh_mutex); up(&lo->lo_bh_mutex);
loop_put_buffer(bh); loop_put_buffer(new_bio);
out: out:
bio_io_error(rbh); bio_io_error(old_bio);
return 0; return 0;
inactive: inactive:
spin_unlock_irq(&lo->lo_lock); spin_unlock_irq(&lo->lo_lock);
goto out; goto out;
} }
static int do_bio_blockbacked(struct loop_device *lo, struct bio *bio,
struct bio *rbh)
{
unsigned long IV = loop_get_iv(lo, rbh->bi_sector);
struct bio_vec *from;
char *vto, *vfrom;
int ret = 0, i;
bio_for_each_segment(from, rbh, i) {
vfrom = page_address(from->bv_page) + from->bv_offset;
vto = page_address(bio->bi_io_vec[i].bv_page) + bio->bi_io_vec[i].bv_offset;
ret |= lo_do_transfer(lo, bio_data_dir(bio), vto, vfrom,
from->bv_len, IV);
}
return ret;
}
static inline void loop_handle_bio(struct loop_device *lo, struct bio *bio) static inline void loop_handle_bio(struct loop_device *lo, struct bio *bio)
{ {
int ret; int ret;
...@@ -556,7 +570,7 @@ static inline void loop_handle_bio(struct loop_device *lo, struct bio *bio) ...@@ -556,7 +570,7 @@ static inline void loop_handle_bio(struct loop_device *lo, struct bio *bio)
} else { } else {
struct bio *rbh = bio->bi_private; struct bio *rbh = bio->bi_private;
ret = do_bio_blockbacked(lo, bio, rbh); ret = bio_transfer(lo, bio, rbh);
bio_endio(rbh, !ret); bio_endio(rbh, !ret);
loop_put_buffer(bio); loop_put_buffer(bio);
...@@ -588,10 +602,8 @@ static int loop_thread(void *data) ...@@ -588,10 +602,8 @@ static int loop_thread(void *data)
set_user_nice(current, -20); set_user_nice(current, -20);
spin_lock_irq(&lo->lo_lock);
lo->lo_state = Lo_bound; lo->lo_state = Lo_bound;
atomic_inc(&lo->lo_pending); atomic_inc(&lo->lo_pending);
spin_unlock_irq(&lo->lo_lock);
/* /*
* up sem, we are running * up sem, we are running
......
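
The new bio_transfer() above folds the old inline WRITE-path transfer and do_bio_blockbacked() into one helper that walks the source and destination bios segment by segment and ORs the per-segment results, with lo_do_transfer() falling through cheaply when no transfer function is set. A stripped-down userspace model of that walk follows; seg and xfer_fn are invented stand-ins for bio_vec and lo->transfer, and the per-page kmap/kunmap is omitted.

#include <stdio.h>
#include <string.h>

struct seg { char *buf; size_t len; };      /* stand-in for bio_vec */

typedef int (*xfer_fn)(char *to, const char *from, size_t len);

/* stand-in for the lo->transfer == NULL case: plain copy, no transform */
static int plain_xfer(char *to, const char *from, size_t len)
{
    memcpy(to, from, len);
    return 0;
}

/* walk paired segments and OR the results, as bio_transfer() does */
static int transfer_all(struct seg *to, const struct seg *from, int nsegs,
                        xfer_fn fn)
{
    int ret = 0, i;

    for (i = 0; i < nsegs; i++)
        ret |= fn(to[i].buf, from[i].buf, from[i].len);
    return ret;
}

int main(void)
{
    char a[] = "abcd", b[sizeof(a)];
    char c[] = "efgh", d[sizeof(c)];
    struct seg from[2] = { { a, sizeof(a) }, { c, sizeof(c) } };
    struct seg to[2]   = { { b, sizeof(b) }, { d, sizeof(d) } };

    if (transfer_all(to, from, 2, plain_xfer) == 0)
        printf("%s %s\n", b, d);            /* abcd efgh */
    return 0;
}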
...@@ -39,6 +39,7 @@ ...@@ -39,6 +39,7 @@
#include <linux/init.h> #include <linux/init.h>
#include <linux/sched.h> #include <linux/sched.h>
#include <linux/fs.h> #include <linux/fs.h>
#include <linux/bio.h>
#include <linux/stat.h> #include <linux/stat.h>
#include <linux/errno.h> #include <linux/errno.h>
#include <linux/file.h> #include <linux/file.h>
......
...@@ -45,6 +45,8 @@ ...@@ -45,6 +45,8 @@
#include <linux/config.h> #include <linux/config.h>
#include <linux/string.h> #include <linux/string.h>
#include <linux/slab.h> #include <linux/slab.h>
#include <asm/atomic.h>
#include <linux/bio.h>
#include <linux/module.h> #include <linux/module.h>
#include <linux/init.h> #include <linux/init.h>
#include <linux/devfs_fs_kernel.h> #include <linux/devfs_fs_kernel.h>
......
...@@ -37,6 +37,7 @@ ...@@ -37,6 +37,7 @@
#include <linux/config.h> #include <linux/config.h>
#include <linux/sched.h> #include <linux/sched.h>
#include <linux/fs.h> #include <linux/fs.h>
#include <linux/bio.h>
#include <linux/kernel.h> #include <linux/kernel.h>
#include <linux/mm.h> #include <linux/mm.h>
#include <linux/mman.h> #include <linux/mman.h>
......
...@@ -118,8 +118,8 @@ struct agp_bridge_data { ...@@ -118,8 +118,8 @@ struct agp_bridge_data {
int (*remove_memory) (agp_memory *, off_t, int); int (*remove_memory) (agp_memory *, off_t, int);
agp_memory *(*alloc_by_type) (size_t, int); agp_memory *(*alloc_by_type) (size_t, int);
void (*free_by_type) (agp_memory *); void (*free_by_type) (agp_memory *);
unsigned long (*agp_alloc_page) (void); void *(*agp_alloc_page) (void);
void (*agp_destroy_page) (unsigned long); void (*agp_destroy_page) (void *);
int (*suspend)(void); int (*suspend)(void);
void (*resume)(void); void (*resume)(void);
......
...@@ -252,6 +252,7 @@ ...@@ -252,6 +252,7 @@
#include <linux/poll.h> #include <linux/poll.h>
#include <linux/init.h> #include <linux/init.h>
#include <linux/fs.h> #include <linux/fs.h>
#include <linux/tqueue.h>
#include <asm/processor.h> #include <asm/processor.h>
#include <asm/uaccess.h> #include <asm/uaccess.h>
......
...@@ -345,7 +345,8 @@ int ata_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned ...@@ -345,7 +345,8 @@ int ata_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned
if (!arg) { if (!arg) {
if (ide_spin_wait_hwgroup(drive)) if (ide_spin_wait_hwgroup(drive))
return -EBUSY; return -EBUSY;
else /* Do nothing, just unlock */
spin_unlock_irq(drive->channel->lock);
return 0; return 0;
} }
......
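
The two lines added in ata_ioctl() above repair a lock imbalance: judging by the added spin_unlock_irq(), ide_spin_wait_hwgroup() returns with the channel lock held on success and without it on failure, so even the do-nothing success path must unlock. A small pthread model of that pattern, with invented names; the mutex stands in for the spinlock:

#include <errno.h>
#include <pthread.h>

static pthread_mutex_t channel_lock = PTHREAD_MUTEX_INITIALIZER;

/* invented stand-in: succeed with the lock held, fail without it */
static int spin_wait_hwgroup(void)
{
    pthread_mutex_lock(&channel_lock);
    return 0;
}

static int ioctl_no_arg(void)
{
    if (spin_wait_hwgroup())
        return -EBUSY;                  /* failure path: lock not held */
    /* do nothing, just unlock: the path the diff fixes */
    pthread_mutex_unlock(&channel_lock);
    return 0;
}

int main(void)
{
    return ioctl_no_arg() || ioctl_no_arg();  /* second call would hang if leaked */
}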
...@@ -20,7 +20,7 @@ ...@@ -20,7 +20,7 @@
#include <linux/raid/md.h> #include <linux/raid/md.h>
#include <linux/slab.h> #include <linux/slab.h>
#include <linux/bio.h>
#include <linux/raid/linear.h> #include <linux/raid/linear.h>
#define MAJOR_NR MD_MAJOR #define MAJOR_NR MD_MAJOR
......
...@@ -224,7 +224,7 @@ static inline void invalidate_snap_cache(unsigned long start, unsigned long nr, ...@@ -224,7 +224,7 @@ static inline void invalidate_snap_cache(unsigned long start, unsigned long nr,
for (i = 0; i < nr; i++) for (i = 0; i < nr; i++)
{ {
bh = get_hash_table(dev, start++, blksize); bh = find_get_block(dev, start++, blksize);
if (bh) if (bh)
bforget(bh); bforget(bh);
} }
......
...@@ -209,6 +209,7 @@ ...@@ -209,6 +209,7 @@
#include <linux/hdreg.h> #include <linux/hdreg.h>
#include <linux/stat.h> #include <linux/stat.h>
#include <linux/fs.h> #include <linux/fs.h>
#include <linux/bio.h>
#include <linux/proc_fs.h> #include <linux/proc_fs.h>
#include <linux/blkdev.h> #include <linux/blkdev.h>
#include <linux/genhd.h> #include <linux/genhd.h>
......
...@@ -33,6 +33,7 @@ ...@@ -33,6 +33,7 @@
#include <linux/linkage.h> #include <linux/linkage.h>
#include <linux/raid/md.h> #include <linux/raid/md.h>
#include <linux/sysctl.h> #include <linux/sysctl.h>
#include <linux/bio.h>
#include <linux/raid/xor.h> #include <linux/raid/xor.h>
#include <linux/devfs_fs_kernel.h> #include <linux/devfs_fs_kernel.h>
......
...@@ -23,6 +23,7 @@ ...@@ -23,6 +23,7 @@
#include <linux/slab.h> #include <linux/slab.h>
#include <linux/spinlock.h> #include <linux/spinlock.h>
#include <linux/raid/multipath.h> #include <linux/raid/multipath.h>
#include <linux/bio.h>
#include <linux/buffer_head.h> #include <linux/buffer_head.h>
#include <asm/atomic.h> #include <asm/atomic.h>
......
...@@ -20,6 +20,7 @@ ...@@ -20,6 +20,7 @@
#include <linux/module.h> #include <linux/module.h>
#include <linux/raid/raid0.h> #include <linux/raid/raid0.h>
#include <linux/bio.h>
#define MAJOR_NR MD_MAJOR #define MAJOR_NR MD_MAJOR
#define MD_DRIVER #define MD_DRIVER
......
...@@ -23,6 +23,7 @@ ...@@ -23,6 +23,7 @@
*/ */
#include <linux/raid/raid1.h> #include <linux/raid/raid1.h>
#include <linux/bio.h>
#define MAJOR_NR MD_MAJOR #define MAJOR_NR MD_MAJOR
#define MD_DRIVER #define MD_DRIVER
......
...@@ -20,6 +20,7 @@ ...@@ -20,6 +20,7 @@
#include <linux/module.h> #include <linux/module.h>
#include <linux/slab.h> #include <linux/slab.h>
#include <linux/raid/raid5.h> #include <linux/raid/raid5.h>
#include <linux/bio.h>
#include <asm/bitops.h> #include <asm/bitops.h>
#include <asm/atomic.h> #include <asm/atomic.h>
......
...@@ -210,3 +210,4 @@ EXPORT_SYMBOL(pci_match_device); ...@@ -210,3 +210,4 @@ EXPORT_SYMBOL(pci_match_device);
EXPORT_SYMBOL(pci_register_driver); EXPORT_SYMBOL(pci_register_driver);
EXPORT_SYMBOL(pci_unregister_driver); EXPORT_SYMBOL(pci_unregister_driver);
EXPORT_SYMBOL(pci_dev_driver); EXPORT_SYMBOL(pci_dev_driver);
EXPORT_SYMBOL(pci_bus_type);
...@@ -20,6 +20,7 @@ ...@@ -20,6 +20,7 @@
#include <linux/init.h> #include <linux/init.h>
#include <linux/pci.h> #include <linux/pci.h>
#include <linux/sched.h> #include <linux/sched.h>
#include <linux/tqueue.h>
#include <linux/interrupt.h> #include <linux/interrupt.h>
#include <pcmcia/ss.h> #include <pcmcia/ss.h>
......
...@@ -6,6 +6,7 @@ ...@@ -6,6 +6,7 @@
#include <linux/init.h> #include <linux/init.h>
#include <linux/pci.h> #include <linux/pci.h>
#include <linux/sched.h> #include <linux/sched.h>
#include <linux/tqueue.h>
#include <linux/interrupt.h> #include <linux/interrupt.h>
#include <linux/delay.h> #include <linux/delay.h>
#include <linux/module.h> #include <linux/module.h>
......
...@@ -2,7 +2,7 @@ This file contains brief information about the SCSI tape driver. ...@@ -2,7 +2,7 @@ This file contains brief information about the SCSI tape driver.
The driver is currently maintained by Kai Mäkisara (email The driver is currently maintained by Kai Mäkisara (email
Kai.Makisara@metla.fi) Kai.Makisara@metla.fi)
Last modified: Tue Jan 22 21:08:57 2002 by makisara Last modified: Tue Jun 18 18:13:50 2002 by makisara
BASICS BASICS
...@@ -105,15 +105,19 @@ The default is BSD semantics. ...@@ -105,15 +105,19 @@ The default is BSD semantics.
BUFFERING BUFFERING
Old text:

The driver uses tape buffers allocated either at system initialization
or at run-time when needed. One buffer is used for each open tape
device. The size of the buffers is selectable at compile and/or boot
time. The buffers are used to store the data being transferred to/from
the SCSI adapter. The following buffering options are selectable at
compile time and/or at run time (via ioctl):

Buffering of data across write calls in fixed block mode (define
ST_BUFFER_WRITES).

New text:

The driver uses tape buffers allocated at run-time when needed and
freed when the device file is closed. One buffer is used for each
open tape device.

The size of the buffers is always at least one tape block. In fixed
block mode, the minimum buffer size is defined (in 1024 byte units) by
ST_FIXED_BUFFER_BLOCKS. With small block sizes this allows buffering of
several blocks and using one SCSI read or write to transfer all of the
blocks. Buffering of data across write calls in fixed block mode is
allowed if ST_BUFFER_WRITES is non-zero. Buffer allocation uses chunks of
memory having sizes 2^n * (page size). Because of this the actual
buffer size may be larger than the minimum allowable buffer size.
Asynchronous writing. Writing the buffer contents to the tape is Asynchronous writing. Writing the buffer contents to the tape is
started and the write call returns immediately. The status is checked started and the write call returns immediately. The status is checked
...@@ -128,30 +132,6 @@ attempted even if the user does not want to get all of the data at ...@@ -128,30 +132,6 @@ attempted even if the user does not want to get all of the data at
this read command. Should be disabled for those drives that don't like this read command. Should be disabled for those drives that don't like
a filemark to truncate a read request or that don't like backspacing. a filemark to truncate a read request or that don't like backspacing.
The buffer size is defined (in 1024 byte units) by ST_BUFFER_BLOCKS or
at boot time. If this size is not large enough, the driver tries to
temporarily enlarge the buffer. Buffer allocation uses chunks of
memory having sizes 2^n * (page size). Because of this the actual
buffer size may be larger than the buffer size specified with
ST_BUFFER_BLOCKS.
A small number of buffers are allocated at driver initialisation. The
maximum number of these buffers is defined by ST_MAX_BUFFERS. The
maximum can be changed with kernel or module startup options. One
buffer is allocated for each drive detected when the driver is
initialized up to the maximum.
The driver tries to allocate new buffers at run-time if
necessary. These buffers are freed after use. If the maximum number of
initial buffers is set to zero, all buffer allocation is done at
run-time. The advantage of run-time allocation is that memory is not
wasted for buffers not being used. The disadvantage is that there may
not be memory available at the time when a buffer is needed for the
first time (once a buffer is allocated, it is not released). This risk
should not be big if the tape drive is connected to a PCI adapter that
supports scatter/gather (the allocation is not limited to "DMA memory"
and the buffer can be composed of several fragments).
The threshold for triggering asynchronous write in fixed block mode The threshold for triggering asynchronous write in fixed block mode
is defined by ST_WRITE_THRESHOLD. This may be optimized for each is defined by ST_WRITE_THRESHOLD. This may be optimized for each
use pattern. The default triggers asynchronous write after three use pattern. The default triggers asynchronous write after three
......
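
The buffering text above says allocation uses chunks of 2^n pages, so the buffer actually obtained can exceed the configured minimum. A sketch of that rounding, assuming a 4 KB page size and modelling the buffer as a single chunk (the driver may assemble several such chunks via scatter/gather):

#include <stdio.h>

#define PAGE_SIZE_BYTES 4096UL          /* assumed for the illustration */

/* round a minimum byte count up to the next 2^n * (page size) chunk */
static unsigned long chunk_size(unsigned long min_bytes)
{
    unsigned long chunk = PAGE_SIZE_BYTES;

    while (chunk < min_bytes)
        chunk <<= 1;
    return chunk;
}

int main(void)
{
    /* ST_FIXED_BUFFER_BLOCKS is in 1024 byte units; 30 rounds up */
    printf("32 KB minimum -> %lu byte chunk\n", chunk_size(32UL * 1024));
    printf("30 KB minimum -> %lu byte chunk\n", chunk_size(30UL * 1024));
    return 0;
}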
...@@ -39,6 +39,7 @@ ...@@ -39,6 +39,7 @@
#include <linux/pci.h> #include <linux/pci.h>
#include <linux/delay.h> #include <linux/delay.h>
#include <linux/timer.h> #include <linux/timer.h>
#include <linux/init.h>
#include <linux/ioport.h> // request_region() prototype #include <linux/ioport.h> // request_region() prototype
#include <linux/vmalloc.h> // ioremap() #include <linux/vmalloc.h> // ioremap()
//#if LINUX_VERSION_CODE >= LinuxVersionCode(2,4,7) //#if LINUX_VERSION_CODE >= LinuxVersionCode(2,4,7)
......
...@@ -23,6 +23,7 @@ ...@@ -23,6 +23,7 @@
#include <linux/timer.h> #include <linux/timer.h>
#include <linux/string.h> #include <linux/string.h>
#include <linux/slab.h> #include <linux/slab.h>
#include <linux/bio.h>
#include <linux/ioport.h> #include <linux/ioport.h>
#include <linux/kernel.h> #include <linux/kernel.h>
#include <linux/stat.h> #include <linux/stat.h>
......
...@@ -36,6 +36,7 @@ ...@@ -36,6 +36,7 @@
#include <linux/kernel.h> #include <linux/kernel.h>
#include <linux/sched.h> #include <linux/sched.h>
#include <linux/mm.h> #include <linux/mm.h>
#include <linux/bio.h>
#include <linux/string.h> #include <linux/string.h>
#include <linux/hdreg.h> #include <linux/hdreg.h>
#include <linux/errno.h> #include <linux/errno.h>
......
...@@ -39,6 +39,7 @@ ...@@ -39,6 +39,7 @@
#include <linux/kernel.h> #include <linux/kernel.h>
#include <linux/sched.h> #include <linux/sched.h>
#include <linux/mm.h> #include <linux/mm.h>
#include <linux/bio.h>
#include <linux/string.h> #include <linux/string.h>
#include <linux/errno.h> #include <linux/errno.h>
#include <linux/cdrom.h> #include <linux/cdrom.h>
......
...@@ -3,7 +3,7 @@ ...@@ -3,7 +3,7 @@
Copyright 1995-2000 Kai Makisara. Copyright 1995-2000 Kai Makisara.
Last modified: Tue Jan 22 21:52:34 2002 by makisara Last modified: Sun May 5 15:09:56 2002 by makisara
*/ */
#ifndef _ST_OPTIONS_H #ifndef _ST_OPTIONS_H
...@@ -30,22 +30,17 @@ ...@@ -30,22 +30,17 @@
SENSE. */ SENSE. */
#define ST_DEFAULT_BLOCK 0 #define ST_DEFAULT_BLOCK 0
/* The tape driver buffer size in kilobytes. Must be non-zero. */ /* The minimum tape driver buffer size in kilobytes in fixed block mode.
#define ST_BUFFER_BLOCKS 32 Must be non-zero. */
#define ST_FIXED_BUFFER_BLOCKS 32
/* The number of kilobytes of data in the buffer that triggers an /* The number of kilobytes of data in the buffer that triggers an
asynchronous write in fixed block mode. See also ST_ASYNC_WRITES asynchronous write in fixed block mode. See also ST_ASYNC_WRITES
below. */ below. */
#define ST_WRITE_THRESHOLD_BLOCKS 30 #define ST_WRITE_THRESHOLD_BLOCKS 30
/* The maximum number of tape buffers the driver tries to allocate at
driver initialisation. The number is also constrained by the number
of drives detected. If more buffers are needed, they are allocated
at run time and freed after use. */
#define ST_MAX_BUFFERS 4
/* Maximum number of scatter/gather segments */ /* Maximum number of scatter/gather segments */
#define ST_MAX_SG 16 #define ST_MAX_SG 64
/* The number of scatter/gather segments to allocate at first try (must be /* The number of scatter/gather segments to allocate at first try (must be
smaller or equal to the maximum). */ smaller or equal to the maximum). */
......
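
ST_WRITE_THRESHOLD_BLOCKS above is the fixed-block-mode trigger for asynchronous writes described in the documentation earlier in this commit. A minimal sketch of the intended check; should_start_async_write is an invented name:

#include <stdio.h>

#define ST_WRITE_THRESHOLD_BLOCKS 30    /* 1024 byte units, from above */

/* invented helper: start an asynchronous write once enough data has
 * accumulated in the tape buffer */
static int should_start_async_write(unsigned long buffered_bytes)
{
    return buffered_bytes >= ST_WRITE_THRESHOLD_BLOCKS * 1024UL;
}

int main(void)
{
    printf("20 KB buffered: %d\n", should_start_async_write(20UL * 1024)); /* 0 */
    printf("30 KB buffered: %d\n", should_start_async_write(30UL * 1024)); /* 1 */
    return 0;
}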
...@@ -17,6 +17,7 @@ ...@@ -17,6 +17,7 @@
* *
*/ */
#include <linux/mm.h> #include <linux/mm.h>
#include <linux/bio.h>
#include <linux/blk.h> #include <linux/blk.h>
#include <linux/slab.h> #include <linux/slab.h>
#include <linux/iobuf.h> #include <linux/iobuf.h>
...@@ -284,8 +285,8 @@ struct bio *bio_copy(struct bio *bio, int gfp_mask, int copy) ...@@ -284,8 +285,8 @@ struct bio *bio_copy(struct bio *bio, int gfp_mask, int copy)
vto = kmap(bbv->bv_page); vto = kmap(bbv->bv_page);
} else { } else {
local_irq_save(flags); local_irq_save(flags);
vfrom = kmap_atomic(bv->bv_page, KM_BIO_IRQ); vfrom = kmap_atomic(bv->bv_page, KM_BIO_SRC_IRQ);
vto = kmap_atomic(bbv->bv_page, KM_BIO_IRQ); vto = kmap_atomic(bbv->bv_page, KM_BIO_DST_IRQ);
} }
memcpy(vto + bbv->bv_offset, vfrom + bv->bv_offset, bv->bv_len); memcpy(vto + bbv->bv_offset, vfrom + bv->bv_offset, bv->bv_len);
...@@ -293,8 +294,8 @@ struct bio *bio_copy(struct bio *bio, int gfp_mask, int copy) ...@@ -293,8 +294,8 @@ struct bio *bio_copy(struct bio *bio, int gfp_mask, int copy)
kunmap(bbv->bv_page); kunmap(bbv->bv_page);
kunmap(bv->bv_page); kunmap(bv->bv_page);
} else { } else {
kunmap_atomic(vto, KM_BIO_IRQ); kunmap_atomic(vto, KM_BIO_DST_IRQ);
kunmap_atomic(vfrom, KM_BIO_IRQ); kunmap_atomic(vfrom, KM_BIO_SRC_IRQ);
local_irq_restore(flags); local_irq_restore(flags);
} }
} }
......
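
The KM_BIO_IRQ to KM_BIO_SRC_IRQ/KM_BIO_DST_IRQ change above matters because bio_copy() keeps the source and destination pages mapped at the same time: each kmap_atomic() slot is one fixed per-CPU mapping window, so pushing both pages through a single slot would let the second mapping displace the first. A toy userspace model of that slot discipline; all names and the assert are invented for illustration:

#include <assert.h>
#include <stddef.h>
#include <string.h>

enum km_type { KM_SRC, KM_DST, KM_NR_SLOTS };

static void *slot[KM_NR_SLOTS];         /* one virtual window per slot */

static void *kmap_atomic_model(void *page, enum km_type type)
{
    assert(slot[type] == NULL);         /* reuse would clobber a live mapping */
    slot[type] = page;
    return page;                        /* real kmap returns the window address */
}

static void kunmap_atomic_model(enum km_type type)
{
    slot[type] = NULL;
}

int main(void)
{
    char src[4096] = "payload", dst[4096];
    char *vfrom = kmap_atomic_model(src, KM_SRC);   /* distinct slots, */
    char *vto = kmap_atomic_model(dst, KM_DST);     /* both live at once */

    memcpy(vto, vfrom, sizeof(src));
    kunmap_atomic_model(KM_DST);
    kunmap_atomic_model(KM_SRC);
    return 0;
}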
...@@ -1650,7 +1650,7 @@ ext3_clear_blocks(handle_t *handle, struct inode *inode, struct buffer_head *bh, ...@@ -1650,7 +1650,7 @@ ext3_clear_blocks(handle_t *handle, struct inode *inode, struct buffer_head *bh,
struct buffer_head *bh; struct buffer_head *bh;
*p = 0; *p = 0;
bh = sb_get_hash_table(inode->i_sb, nr); bh = sb_find_get_block(inode->i_sb, nr);
ext3_forget(handle, 0, inode, bh, nr); ext3_forget(handle, 0, inode, bh, nr);
} }
} }
......