Commit 1f60ade2 authored by Linus Torvalds

Merge master.kernel.org:/home/mingo/bk-sched

into home.transmeta.com:/home/torvalds/v2.5/linux
parents 8509486a 3986594c
......@@ -50,27 +50,27 @@ prototypes:
int (*removexattr) (struct dentry *, const char *);
locking rules:
all may block
BKL i_sem(inode)
lookup: no yes
create: no yes
link: no yes (both)
mknod: no yes
symlink: no yes
mkdir: no yes
unlink: no yes (both)
rmdir: no yes (both) (see below)
rename: no yes (all) (see below)
readlink: no no
follow_link: no no
truncate: no yes (see below)
setattr: no yes
permission: yes no
getattr: no no
setxattr: no yes
getxattr: no yes
listxattr: no yes
removexattr: no yes
all may block, none have BKL
i_sem(inode)
lookup: yes
create: yes
link: yes (both)
mknod: yes
symlink: yes
mkdir: yes
unlink: yes (both)
rmdir: yes (both) (see below)
rename: yes (all) (see below)
readlink: no
follow_link: no
truncate: yes (see below)
setattr: yes
permission: no
getattr: no
setxattr: yes
getxattr: yes
listxattr: yes
removexattr: yes
Additionally, ->rmdir(), ->unlink() and ->rename() have ->i_sem on
victim.
cross-directory ->rename() has (per-superblock) ->s_vfs_rename_sem.
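As a sketch of how the table above plays out in practice (illustrative
only, not the actual fs/namei.c code; vfs_create_sketch is a made-up
name), a directory-modifying method such as ->create() is entered with
the parent's i_sem already held by the VFS, and without the BKL:

	#include <linux/fs.h>

	/* Sketch: caller (VFS) side, 2.5-era semaphore API. */
	static int vfs_create_sketch(struct inode *dir, struct dentry *dentry,
				     int mode)
	{
		int err;

		down(&dir->i_sem);	/* i_sem(inode): yes */
		err = dir->i_op->create(dir, dentry, mode);
		up(&dir->i_sem);	/* no lock_kernel() - the BKL column is gone */
		return err;
	}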
......
......@@ -81,9 +81,9 @@ can relax your locking.
[mandatory]
->lookup(), ->truncate(), ->create(), ->unlink(), ->mknod(), ->mkdir(),
->rmdir(), ->link(), ->lseek(), ->symlink(), ->rename() and ->readdir()
are called without BKL now. Grab it on the entry, drop upon return - that
will guarantee the same locking you used to have. If your method or its
->rmdir(), ->link(), ->lseek(), ->symlink(), ->rename(), ->permission()
and ->readdir() are called without BKL now. Grab it on entry, drop upon return
- that will guarantee the same locking you used to have. If your method or its
parts do not need BKL - better yet, now you can shift lock_kernel() and
unlock_kernel() so that they would protect exactly what needs to be
protected.
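A minimal sketch of the "grab on entry, drop upon return" recipe (the
method and its helper are hypothetical, not code from the tree):

	#include <linux/smp_lock.h>
	#include <linux/fs.h>

	/* hypothetical helper doing the real work */
	extern struct dentry *my_lookup_impl(struct inode *, struct dentry *);

	static struct dentry *my_lookup(struct inode *dir, struct dentry *dentry)
	{
		struct dentry *res;

		lock_kernel();	/* restores the locking the method used to get */
		res = my_lookup_impl(dir, dentry);
		unlock_kernel();
		return res;
	}

Once the parts that really need the BKL are identified, the lock_kernel()
pair can be shifted inward to cover only them.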
......
......@@ -948,120 +948,43 @@ program to load modules on demand.
-----------------------------------------------
The files in this directory can be used to tune the operation of the virtual
memory (VM) subsystem of the Linux kernel. In addition, one of the files
(bdflush) has some influence on disk usage.
memory (VM) subsystem of the Linux kernel.
bdflush
-------
This file controls the operation of the bdflush kernel daemon. It currently
contains nine integer values, six of which are actually used by the kernel.
They are listed in table 2-2.
Table 2-2: Parameters in /proc/sys/vm/bdflush
..............................................................................
Value Meaning
nfract Percentage of buffer cache dirty to activate bdflush
ndirty Maximum number of dirty blocks to write out per wake-cycle
nrefill Number of clean buffers to try to obtain each time we call refill
nref_dirt Buffer threshold for activating bdflush when trying to refill
buffers.
dummy Unused
age_buffer Time for normal buffer to age before we flush it
age_super Time for superblock to age before we flush it
dummy Unused
dummy Unused
..............................................................................
nfract
------
This parameter governs the maximum number of dirty buffers in the buffer
cache. Dirty means that the contents of the buffer still have to be written to
disk (as opposed to a clean buffer, which can just be forgotten about).
Setting this to a higher value means that Linux can delay disk writes for a
long time, but it also means that it will have to do a lot of I/O at once when
memory becomes short. A lower value will spread out disk I/O more evenly.
ndirty
------
Ndirty gives the maximum number of dirty buffers that bdflush can write to the
disk at one time. A high value will mean delayed, bursty I/O, while a small
value can lead to memory shortage when bdflush isn't woken up often enough.
nrefill
-------
This is the number of buffers that bdflush will add to the list of free
buffers when refill_freelist() is called. It is necessary to allocate free
buffers beforehand, since the buffers are often different sizes than the
memory pages and some bookkeeping needs to be done beforehand. The higher the
number, the more memory will be wasted and the less often refill_freelist()
will need to run.
nref_dirt
---------
When refill_freelist() comes across more than nref_dirt dirty buffers, it will
wake up bdflush.
age_buffer and age_super
------------------------
Finally, the age_buffer and age_super parameters govern the maximum time Linux
waits before writing out a dirty buffer to disk. The value is expressed in
jiffies (clock ticks); the number of jiffies per second is 100. Age_buffer is
the maximum age for data blocks, while age_super is for filesystem metadata.
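As a hedged userspace illustration of table 2-2 (a sketch under the
assumption that the file holds nine whitespace-separated integers in the
order listed above):

	#include <stdio.h>

	int main(void)
	{
		int v[9], i;
		FILE *f = fopen("/proc/sys/vm/bdflush", "r");

		if (!f)
			return 1;
		for (i = 0; i < 9; i++)
			if (fscanf(f, "%d", &v[i]) != 1)
				break;
		fclose(f);
		if (i == 9)
			printf("nfract=%d%% ndirty=%d age_buffer=%d jiffies\n",
			       v[0], v[1], v[5]);
		return 0;
	}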
buffermem
---------
The three values in this file control how much memory should be used for
buffer memory. The percentage is calculated as a percentage of total system
memory.
The values are:
min_percent
-----------
dirty_background_ratio
----------------------
This is the minimum percentage of memory that should be spent on buffer
memory.
Contains, as a percentage of total system memory, the number of pages at which
the pdflush background writeback daemon will start writing out dirty data.
borrow_percent
--------------
dirty_async_ratio
-----------------
When Linux is short on memory, and the buffer cache uses more than it has been
allotted, the memory management (MM) subsystem will prune the buffer cache
more heavily than other memory to compensate.
Contains, as a percentage of total system memory, the number of pages at which
a process which is generating disk writes will itself start writing out dirty
data.
max_percent
-----------
dirty_sync_ratio
----------------
This is the maximum amount of memory that can be used for buffer memory.
Contains, as a percentage of total system memory, the number of pages at which
a process which is generating disk writes will itself start writing out dirty
data and waiting upon completion of that writeout.
freepages
---------
dirty_writeback_centisecs
-------------------------
This file contains three values: min, low and high:
The pdflush writeback daemons will periodically wake up and write `old' data
out to disk. This tunable expresses the interval between those wakeups, in
100'ths of a second.
min
---
When the number of free pages in the system reaches this number, only the
kernel can allocate more memory.
dirty_expire_centisecs
----------------------
low
---
If the number of free pages falls below this point, the kernel starts swapping
aggressively.
This tunable is used to define when dirty data is old enough to be eligible
for writeout by the pdflush daemons. It is expressed in 100'ths of a second.
Data which has been dirty in-memory for longer than this interval will be
written out next time a pdflush daemon wakes up.
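The three dirty_*_ratio tunables fit together as a ladder of thresholds.
A hedged sketch of that policy (names and default values here are
illustrative, not the actual mm/page-writeback.c code):

	static int dirty_background_ratio = 40;	/* illustrative values */
	static int dirty_async_ratio = 50;
	static int dirty_sync_ratio = 60;

	static void on_write_sketch(long dirty_pages, long total_pages)
	{
		long pct = dirty_pages * 100 / total_pages;

		if (pct >= dirty_sync_ratio) {
			/* writer writes out dirty data itself and waits */
		} else if (pct >= dirty_async_ratio) {
			/* writer starts writeout itself, without waiting */
		} else if (pct >= dirty_background_ratio) {
			/* pdflush is woken to write in the background */
		}
	}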
high
----
The kernel tries to keep up to this amount of memory free; if memory falls
below this point, the kernel starts gently swapping in the hopes that it never
has to do really aggressive swapping.
kswapd
------
......@@ -1113,79 +1036,6 @@ On the other hand, enabling this feature can cause you to run out of memory
and thrash the system to death, so large and/or important servers will want to
set this value to 0.
pagecache
---------
This file does exactly the same job as buffermem, only this file controls the
amount of memory allowed for memory mapping and generic caching of files.
You don't want the minimum level to be too low; otherwise your system might
thrash when memory is tight or fragmentation is high.
pagetable_cache
---------------
The kernel keeps a number of page tables in a per-processor cache (this helps
a lot on SMP systems). The cache size for each processor will be between the
low and the high value.
On a low-memory, single CPU system, you can safely set these values to 0 so
you don't waste memory. It is used on SMP systems so that the system can
perform fast pagetable allocations without having to acquire the kernel memory
lock.
For large systems, the settings are probably fine. For normal systems they
won't hurt a bit. For small systems (less than 16 MB RAM) it might be
advantageous to set both values to 0.
swapctl
-------
This file contains no less than 8 variables. All of these values are used by
kswapd.
The first four variables
* sc_max_page_age,
* sc_page_advance,
* sc_page_decline and
* sc_page_initial_age
are used to keep track of Linux's page aging. Page aging is a bookkeeping
method to track which pages of memory are often used, and which pages can be
swapped out without consequences.
When a page is swapped in, it starts at sc_page_initial_age (default 3) and
when the page is scanned by kswapd, its age is adjusted according to the
following scheme:
* If the page was used since the last time we scanned, its age is increased
by sc_page_advance (default 3), up to the maximum given by
sc_max_page_age (default 20).
* Otherwise (meaning it wasn't used) its age is decreased by sc_page_decline
(default 1).
When a page reaches age 0, it's ready to be swapped out.
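The aging rules reduce to a few lines. A hedged sketch (not the real
kernel code; only the sc_* defaults come from the text above):

	static int sc_page_advance = 3, sc_page_decline = 1;
	static int sc_max_page_age = 20;

	static void age_page_sketch(int *age, int referenced)
	{
		if (referenced) {
			*age += sc_page_advance;
			if (*age > sc_max_page_age)
				*age = sc_max_page_age;
		} else {
			*age -= sc_page_decline;
			if (*age < 0)
				*age = 0;	/* age 0: ready to be swapped out */
		}
	}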
The variables sc_age_cluster_fract, sc_age_cluster_min, sc_pageout_weight and
sc_bufferout_weight, can be used to control kswapd's aggressiveness in
swapping out pages.
Sc_age_cluster_fract is used to calculate how many pages from a process are to
be scanned by kswapd. The formula used is
(sc_age_cluster_fract divided by 1024) times resident set size
So if you want kswapd to scan the whole process, sc_age_cluster_fract needs to
have a value of 1024. The minimum number of pages kswapd will scan is
represented by sc_age_cluster_min, which is done so that kswapd will also scan
small processes.
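For example (numbers purely illustrative): with sc_age_cluster_fract set
to 512, a process with a 2048-page resident set gets (512 / 1024) * 2048
= 1024 pages scanned per pass, but never fewer than sc_age_cluster_min.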
The values of sc_pageout_weight and sc_bufferout_weight are used to control
how many tries kswapd will make in order to swap out one page/buffer. These
values can be used to fine-tune the ratio between user pages and buffer/cache
memory. When you find that your Linux system is swapping out too many process
pages in order to satisfy buffer memory demands, you may want to either
increase sc_bufferout_weight, or decrease the value of sc_pageout_weight.
2.5 /proc/sys/dev - Device specific parameters
----------------------------------------------
......
......@@ -9,116 +9,28 @@ This file contains the documentation for the sysctl files in
/proc/sys/vm and is valid for Linux kernel version 2.2.
The files in this directory can be used to tune the operation
of the virtual memory (VM) subsystem of the Linux kernel, and
one of the files (bdflush) also has a little influence on disk
usage.
of the virtual memory (VM) subsystem of the Linux kernel and
the writeout of dirty data to disk.
Default values and initialization routines for most of these
files can be found in mm/swap.c.
Currently, these files are in /proc/sys/vm:
- bdflush
- buffermem
- freepages
- kswapd
- overcommit_memory
- page-cluster
- pagecache
- pagetable_cache
- dirty_async_ratio
- dirty_background_ratio
- dirty_expire_centisecs
- dirty_sync_ratio
- dirty_writeback_centisecs
==============================================================
bdflush:
This file controls the operation of the bdflush kernel
daemon. The source code to this struct can be found in
linux/fs/buffer.c. It currently contains 9 integer values,
of which 4 are actually used by the kernel.
From linux/fs/buffer.c:
--------------------------------------------------------------
union bdflush_param {
struct {
int nfract; /* Percentage of buffer cache dirty to
activate bdflush */
int dummy1; /* old "ndirty" */
int dummy2; /* old "nrefill" */
int dummy3; /* unused */
int interval; /* jiffies delay between kupdate flushes */
int age_buffer; /* Time for normal buffer to age */
int nfract_sync;/* Percentage of buffer cache dirty to
activate bdflush synchronously */
int dummy4; /* unused */
int dummy5; /* unused */
} b_un;
unsigned int data[N_PARAM];
} bdf_prm = {{30, 64, 64, 256, 5*HZ, 30*HZ, 60, 0, 0}};
--------------------------------------------------------------
int nfract:
The first parameter governs the maximum number of dirty
buffers in the buffer cache. Dirty means that the contents
of the buffer still have to be written to disk (as opposed
to a clean buffer, which can just be forgotten about).
Setting this to a high value means that Linux can delay disk
writes for a long time, but it also means that it will have
to do a lot of I/O at once when memory becomes short. A low
value will spread out disk I/O more evenly, at the cost of
more frequent I/O operations. The default value is 30%,
the minimum is 0%, and the maximum is 100%.
int interval:
The fifth parameter, interval, is the minimum rate at
which kupdate will wake and flush. The value is expressed in
jiffies (clock ticks); the number of jiffies per second is
normally 100 (Alpha is 1024). Thus, x*HZ is x seconds. The
default value is 5 seconds, the minimum is 0 seconds, and the
maximum is 600 seconds.
int age_buffer:
The sixth parameter, age_buffer, governs the maximum time
Linux waits before writing out a dirty buffer to disk. The
value is in jiffies. The default value is 30 seconds,
the minimum is 1 second, and the maximum 6,000 seconds.
int nfract_sync:
The seventh parameter, nfract_sync, governs the percentage
of buffer cache that is dirty before bdflush activates
synchronously. This can be viewed as the hard limit before
bdflush forces buffers to disk. The default is 60%, the
minimum is 0%, and the maximum is 100%.
==============================================================
buffermem:
The three values in this file correspond to the values in
the struct buffer_mem. It controls how much memory should
be used for buffer memory. The percentage is calculated
as a percentage of total system memory.
The values are:
min_percent -- this is the minimum percentage of memory
that should be spent on buffer memory
borrow_percent -- UNUSED
max_percent -- UNUSED
==============================================================
freepages:
dirty_async_ratio, dirty_background_ratio, dirty_expire_centisecs,
dirty_sync_ratio and dirty_writeback_centisecs:
This file contains the values in the struct freepages. That
struct contains three members: min, low and high.
The meaning of the numbers is:
freepages.min When the number of free pages in the system
reaches this number, only the kernel can
allocate more memory.
freepages.low If the number of free pages gets below this
point, the kernel starts swapping aggressively.
freepages.high The kernel tries to keep up to this amount of
memory free; if memory comes below this point,
the kernel gently starts swapping in the hopes
that it never has to do real aggressive swapping.
See Documentation/filesystems/proc.txt
==============================================================
......@@ -180,38 +92,3 @@ The number of pages the kernel reads in at once is equal to
2 ^ page-cluster. Values above 2 ^ 5 don't make much sense
for swap because we only cluster swap data in 32-page groups.
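For example, page-cluster = 3 means 2^3 = 8 pages are read in at once;
since swap is clustered in 32-page groups, values above 5 (2^5 = 32)
buy nothing.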
==============================================================
pagecache:
This file does exactly the same as buffermem, only this
file controls the struct page_cache, and thus controls
the amount of memory used for the page cache.
In 2.2, the page cache is used for 3 main purposes:
- caching read() data from files
- caching mmap()ed data and executable files
- swap cache
When your system is both deep in swap and high on cache,
it probably means that a lot of the swapped data is being
cached, making for more efficient swapping than possible
with the 2.0 kernel.
==============================================================
pagetable_cache:
The kernel keeps a number of page tables in a per-processor
cache (this helps a lot on SMP systems). The cache size for
each processor will be between the low and the high value.
On a low-memory, single CPU system you can safely set these
values to 0 so you don't waste the memory. On SMP systems it
is used so that the system can do fast pagetable allocations
without having to acquire the kernel memory lock.
For large systems, the settings are probably OK. For normal
systems they won't hurt a bit. For small systems (<16MB ram)
it might be advantageous to set both values to 0.
......@@ -48,6 +48,8 @@
#include "proto.h"
#include "irq_impl.h"
u64 jiffies_64;
extern rwlock_t xtime_lock;
extern unsigned long wall_jiffies; /* kernel/timer.c */
......
......@@ -32,6 +32,8 @@
#include <asm/irq.h>
#include <asm/leds.h>
u64 jiffies_64;
extern rwlock_t xtime_lock;
extern unsigned long wall_jiffies;
......
......@@ -44,6 +44,8 @@
#include <asm/svinto.h>
u64 jiffies_64;
static int have_rtc; /* used to remember if we have an RTC or not */
/* define this if you need to use print_timestamp */
......
......@@ -360,8 +360,9 @@ void __global_cli(void)
__save_flags(flags);
if (flags & (1 << EFLAGS_IF_SHIFT)) {
int cpu = smp_processor_id();
int cpu;
__cli();
cpu = smp_processor_id();
if (!local_irq_count(cpu))
get_irqlock(cpu);
}
......@@ -369,11 +370,12 @@ void __global_cli(void)
void __global_sti(void)
{
int cpu = smp_processor_id();
int cpu = get_cpu();
if (!local_irq_count(cpu))
release_irqlock(cpu);
__sti();
put_cpu();
}
/*
......
......@@ -65,6 +65,7 @@
*/
#include <linux/irq.h>
u64 jiffies_64;
unsigned long cpu_khz; /* Detected as we calibrate the TSC */
......
......@@ -9,6 +9,7 @@
O_TARGET := mm.o
obj-y := init.o fault.o ioremap.o extable.o
obj-y := init.o fault.o ioremap.o extable.o pageattr.o
export-objs := pageattr.o
include $(TOPDIR)/Rules.make
......@@ -10,12 +10,13 @@
#include <linux/vmalloc.h>
#include <linux/init.h>
#include <linux/slab.h>
#include <asm/io.h>
#include <asm/pgalloc.h>
#include <asm/fixmap.h>
#include <asm/cacheflush.h>
#include <asm/tlbflush.h>
#include <asm/pgtable.h>
static inline void remap_area_pte(pte_t * pte, unsigned long address, unsigned long size,
unsigned long phys_addr, unsigned long flags)
......@@ -155,6 +156,7 @@ void * __ioremap(unsigned long phys_addr, unsigned long size, unsigned long flag
area = get_vm_area(size, VM_IOREMAP);
if (!area)
return NULL;
area->phys_addr = phys_addr;
addr = area->addr;
if (remap_area_pages(VMALLOC_VMADDR(addr), phys_addr, size, flags)) {
vfree(addr);
......@@ -163,10 +165,71 @@ void * __ioremap(unsigned long phys_addr, unsigned long size, unsigned long flag
return (void *) (offset + (char *)addr);
}
/**
* ioremap_nocache - map bus memory into CPU space
* @offset: bus address of the memory
* @size: size of the resource to map
*
* ioremap_nocache performs a platform specific sequence of operations to
* make bus memory CPU accessible via the readb/readw/readl/writeb/
* writew/writel functions and the other mmio helpers. The returned
* address is not guaranteed to be usable directly as a virtual
* address.
*
* This version of ioremap ensures that the memory is marked uncachable
* on the CPU as well as honouring existing caching rules from things like
* the PCI bus. Note that there are other caches and buffers on many
* busses. In particular driver authors should read up on PCI writes
*
* It's useful if some control registers are in such an area and
* write combining or read caching is not desirable:
*
* Must be freed with iounmap.
*/
void *ioremap_nocache (unsigned long phys_addr, unsigned long size)
{
void *p = __ioremap(phys_addr, size, _PAGE_PCD);
if (!p)
return p;
if (phys_addr + size < virt_to_phys(high_memory)) {
struct page *ppage = virt_to_page(__va(phys_addr));
unsigned long npages = (size + PAGE_SIZE - 1) >> PAGE_SHIFT;
BUG_ON(phys_addr+size > (unsigned long)high_memory);
BUG_ON(phys_addr + size < phys_addr);
if (change_page_attr(ppage, npages, PAGE_KERNEL_NOCACHE) < 0) {
iounmap(p);
p = NULL;
}
}
return p;
}
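A hedged usage sketch for the pair above (the physical address, size and
register offset are made up; a real driver takes them from its bus
resources):

	#include <linux/errno.h>
	#include <asm/io.h>

	static void *regs;

	static int probe_sketch(void)
	{
		regs = ioremap_nocache(0xfebf0000, 0x1000); /* hypothetical BAR */
		if (!regs)
			return -ENOMEM;
		writel(1, regs + 0x04);	/* uncached MMIO access via the helpers */
		return 0;
	}

	static void remove_sketch(void)
	{
		iounmap(regs);		/* must be freed with iounmap */
	}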
void iounmap(void *addr)
{
if (addr > high_memory)
return vfree((void *) (PAGE_MASK & (unsigned long) addr));
struct vm_struct *p;
if (addr < high_memory)
return;
p = remove_kernel_area(addr);
if (!p) {
printk("__iounmap: bad address %p\n", addr);
return;
}
BUG_ON(p->phys_addr == 0); /* not allocated with ioremap */
vmfree_area_pages(VMALLOC_VMADDR(p->addr), p->size);
if (p->flags && p->phys_addr < virt_to_phys(high_memory)) {
change_page_attr(virt_to_page(__va(p->phys_addr)),
p->size >> PAGE_SHIFT,
PAGE_KERNEL);
}
kfree(p);
}
void __init *bt_ioremap(unsigned long phys_addr, unsigned long size)
......
/*
* Copyright 2002 Andi Kleen, SuSE Labs.
* Thanks to Ben LaHaise for precious feedback.
*/
#include <linux/config.h>
#include <linux/mm.h>
#include <linux/sched.h>
#include <linux/highmem.h>
#include <linux/module.h>
#include <linux/slab.h>
#include <asm/uaccess.h>
#include <asm/processor.h>
static inline pte_t *lookup_address(unsigned long address)
{
pgd_t *pgd = pgd_offset_k(address);
pmd_t *pmd = pmd_offset(pgd, address);
if (pmd_large(*pmd))
return (pte_t *)pmd;
return pte_offset_kernel(pmd, address);
}
static struct page *split_large_page(unsigned long address, pgprot_t prot)
{
int i;
unsigned long addr;
struct page *base = alloc_pages(GFP_KERNEL, 0);
pte_t *pbase;
if (!base)
return NULL;
address = __pa(address);
addr = address & LARGE_PAGE_MASK;
pbase = (pte_t *)page_address(base);
for (i = 0; i < PTRS_PER_PTE; i++, addr += PAGE_SIZE) {
pbase[i] = pfn_pte(addr >> PAGE_SHIFT,
addr == address ? prot : PAGE_KERNEL);
}
return base;
}
static void flush_kernel_map(void *dummy)
{
/* Could use CLFLUSH here if the CPU supports it (Hammer,P4) */
if (boot_cpu_data.x86_model >= 4)
asm volatile("wbinvd":::"memory");
/* Flush all to work around Errata in early athlons regarding
* large page flushing.
*/
__flush_tlb_all();
}
static void set_pmd_pte(pte_t *kpte, unsigned long address, pte_t pte)
{
set_pte_atomic(kpte, pte); /* change init_mm */
#ifndef CONFIG_X86_PAE
{
struct list_head *l;
spin_lock(&mmlist_lock);
list_for_each(l, &init_mm.mmlist) {
struct mm_struct *mm = list_entry(l, struct mm_struct, mmlist);
pmd_t *pmd = pmd_offset(pgd_offset(mm, address), address);
set_pte_atomic((pte_t *)pmd, pte);
}
spin_unlock(&mmlist_lock);
}
#endif
}
/*
* No more special protections in this 2/4MB area - revert to a
* large page again.
*/
static inline void revert_page(struct page *kpte_page, unsigned long address)
{
pte_t *linear = (pte_t *)
pmd_offset(pgd_offset(&init_mm, address), address);
set_pmd_pte(linear, address,
pfn_pte((__pa(address) & LARGE_PAGE_MASK) >> PAGE_SHIFT,
PAGE_KERNEL_LARGE));
}
static int
__change_page_attr(struct page *page, pgprot_t prot, struct page **oldpage)
{
pte_t *kpte;
unsigned long address;
struct page *kpte_page;
#ifdef CONFIG_HIGHMEM
if (page >= highmem_start_page)
BUG();
#endif
address = (unsigned long)page_address(page);
kpte = lookup_address(address);
kpte_page = virt_to_page(((unsigned long)kpte) & PAGE_MASK);
if (pgprot_val(prot) != pgprot_val(PAGE_KERNEL)) {
if ((pte_val(*kpte) & _PAGE_PSE) == 0) {
pte_t old = *kpte;
pte_t standard = mk_pte(page, PAGE_KERNEL);
set_pte_atomic(kpte, mk_pte(page, prot));
if (pte_same(old,standard))
atomic_inc(&kpte_page->count);
} else {
struct page *split = split_large_page(address, prot);
if (!split)
return -ENOMEM;
set_pmd_pte(kpte,address,mk_pte(split, PAGE_KERNEL));
}
} else if ((pte_val(*kpte) & _PAGE_PSE) == 0) {
set_pte_atomic(kpte, mk_pte(page, PAGE_KERNEL));
atomic_dec(&kpte_page->count);
}
if (cpu_has_pse && (atomic_read(&kpte_page->count) == 1)) {
*oldpage = kpte_page;
revert_page(kpte_page, address);
}
return 0;
}
static inline void flush_map(void)
{
#ifdef CONFIG_SMP
smp_call_function(flush_kernel_map, NULL, 1, 1);
#endif
flush_kernel_map(NULL);
}
struct deferred_page {
struct deferred_page *next;
struct page *fpage;
};
static struct deferred_page *df_list; /* protected by init_mm.mmap_sem */
/*
* Change the page attributes of an page in the linear mapping.
*
* This should be used when a page is mapped with a different caching policy
* than write-back somewhere - some CPUs do not like it when mappings with
* different caching policies exist. This changes the page attributes of the
* in kernel linear mapping too.
*
* The caller needs to ensure that there are no conflicting mappings elsewhere.
* This function only deals with the kernel linear map.
*
* Caller must call global_flush_tlb() after this.
*/
int change_page_attr(struct page *page, int numpages, pgprot_t prot)
{
int err = 0;
struct page *fpage;
int i;
down_write(&init_mm.mmap_sem);
for (i = 0; i < numpages; i++, page++) {
fpage = NULL;
err = __change_page_attr(page, prot, &fpage);
if (err)
break;
if (fpage) {
struct deferred_page *df;
df = kmalloc(sizeof(struct deferred_page), GFP_KERNEL);
if (!df) {
flush_map();
__free_page(fpage);
} else {
df->next = df_list;
df->fpage = fpage;
df_list = df;
}
}
}
up_write(&init_mm.mmap_sem);
return err;
}
void global_flush_tlb(void)
{
struct deferred_page *df, *next_df;
down_read(&init_mm.mmap_sem);
df = xchg(&df_list, NULL);
up_read(&init_mm.mmap_sem);
flush_map();
for (; df; df = next_df) {
next_df = df->next;
if (df->fpage)
__free_page(df->fpage);
kfree(df);
}
}
EXPORT_SYMBOL(change_page_attr);
EXPORT_SYMBOL(global_flush_tlb);
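A hedged usage sketch for the exported pair (the caller and the page it
operates on are hypothetical; the flush calls follow the rule stated in
the comment above change_page_attr):

	#include <linux/mm.h>
	#include <asm/pgtable.h>

	/* prototypes as defined above */
	extern int change_page_attr(struct page *page, int numpages, pgprot_t prot);
	extern void global_flush_tlb(void);

	static void uncache_one_page_sketch(struct page *page)
	{
		change_page_attr(page, 1, PAGE_KERNEL_NOCACHE);
		global_flush_tlb();	/* caller must flush after changing attrs */
		/* ... use the page uncached ... */
		change_page_attr(page, 1, PAGE_KERNEL);	/* revert */
		global_flush_tlb();
	}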
......@@ -27,6 +27,8 @@ extern rwlock_t xtime_lock;
extern unsigned long wall_jiffies;
extern unsigned long last_time_offset;
u64 jiffies_64;
#ifdef CONFIG_IA64_DEBUG_IRQ
unsigned long last_cli_ip;
......
......@@ -24,6 +24,7 @@
#include <linux/timex.h>
u64 jiffies_64;
static inline int set_rtc_mmss(unsigned long nowtime)
{
......
......@@ -32,6 +32,8 @@
#define USECS_PER_JIFFY (1000000/HZ)
#define USECS_PER_JIFFY_FRAC ((1000000ULL << 32) / HZ & 0xffffffff)
u64 jiffies_64;
/*
* forward reference
*/
......
......@@ -32,6 +32,8 @@
#include <asm/sysmips.h>
#include <asm/uaccess.h>
u64 jiffies_64;
extern asmlinkage void syscall_trace(void);
asmlinkage int sys_pipe(abi64_no_regargs, struct pt_regs regs)
......
......@@ -30,6 +30,8 @@
#include <linux/timex.h>
u64 jiffies_64;
extern rwlock_t xtime_lock;
static int timer_value;
......
......@@ -70,6 +70,9 @@
#include <asm/time.h>
/* XXX false sharing with below? */
u64 jiffies_64;
unsigned long disarm_decr[NR_CPUS];
extern int do_sys_settimeofday(struct timeval *tv, struct timezone *tz);
......
......@@ -64,6 +64,8 @@
void smp_local_timer_interrupt(struct pt_regs *);
u64 jiffies_64;
/* keep track of when we need to update the rtc */
time_t last_rtc_update;
extern rwlock_t xtime_lock;
......
......@@ -39,6 +39,8 @@
#define TICK_SIZE tick
u64 jiffies_64;
static ext_int_info_t ext_int_info_timer;
static uint64_t init_timer_cc;
......
......@@ -39,6 +39,8 @@
#define TICK_SIZE tick
u64 jiffies_64;
static ext_int_info_t ext_int_info_timer;
static uint64_t init_timer_cc;
......
......@@ -70,6 +70,8 @@
#endif /* CONFIG_CPU_SUBTYPE_ST40STB1 */
#endif /* __sh3__ or __SH4__ */
u64 jiffies_64;
extern rwlock_t xtime_lock;
extern unsigned long wall_jiffies;
#define TICK_SIZE tick
......
......@@ -43,6 +43,8 @@
extern rwlock_t xtime_lock;
u64 jiffies_64;
enum sparc_clock_type sp_clock_typ;
spinlock_t mostek_lock = SPIN_LOCK_UNLOCKED;
unsigned long mstk48t02_regs = 0UL;
......
......@@ -44,6 +44,8 @@ unsigned long mstk48t02_regs = 0UL;
unsigned long ds1287_regs = 0UL;
#endif
u64 jiffies_64;
static unsigned long mstk48t08_regs = 0UL;
static unsigned long mstk48t59_regs = 0UL;
......
......@@ -43,15 +43,9 @@ CFLAGS += -mcmodel=kernel
CFLAGS += -pipe
# this makes reading assembly source easier
CFLAGS += -fno-reorder-blocks
# needed for later gcc 3.1
CFLAGS += -finline-limit=2000
# needed for earlier gcc 3.1
#CFLAGS += -fno-strength-reduce
#CFLAGS += -g
# prevent gcc from keeping the stack 16 byte aligned (FIXME)
#CFLAGS += -mpreferred-stack-boundary=2
HEAD := arch/x86_64/kernel/head.o arch/x86_64/kernel/head64.o arch/x86_64/kernel/init_task.o
SUBDIRS := arch/x86_64/tools $(SUBDIRS) arch/x86_64/kernel arch/x86_64/mm arch/x86_64/lib
......
......@@ -21,10 +21,6 @@ ROOT_DEV := CURRENT
SVGA_MODE := -DSVGA_MODE=NORMAL_VGA
# If you want the RAM disk device, define this to be the size in blocks.
RAMDISK := -DRAMDISK=512
# ---------------------------------------------------------------------------
BOOT_INCL = $(TOPDIR)/include/linux/config.h \
......
......@@ -47,8 +47,7 @@ define_bool CONFIG_EISA n
define_bool CONFIG_X86_IO_APIC y
define_bool CONFIG_X86_LOCAL_APIC y
#currently broken:
#bool 'MTRR (Memory Type Range Register) support' CONFIG_MTRR
bool 'MTRR (Memory Type Range Register) support' CONFIG_MTRR
bool 'Symmetric multi-processing support' CONFIG_SMP
if [ "$CONFIG_SMP" = "n" ]; then
bool 'Preemptible Kernel' CONFIG_PREEMPT
......@@ -226,6 +225,7 @@ if [ "$CONFIG_DEBUG_KERNEL" != "n" ]; then
bool ' Spinlock debugging' CONFIG_DEBUG_SPINLOCK
bool ' Additional run-time checks' CONFIG_CHECKING
bool ' Debug __init statements' CONFIG_INIT_DEBUG
bool ' Spinlock debugging' CONFIG_DEBUG_SPINLOCK
fi
endmenu
......
......@@ -9,8 +9,9 @@ export-objs := ia32_ioctl.o sys_ia32.o
all: ia32.o
O_TARGET := ia32.o
obj-$(CONFIG_IA32_EMULATION) := ia32entry.o sys_ia32.o ia32_ioctl.o ia32_signal.o \
ia32_binfmt.o fpu32.o socket32.o ptrace32.o
obj-$(CONFIG_IA32_EMULATION) := ia32entry.o sys_ia32.o ia32_ioctl.o \
ia32_signal.o \
ia32_binfmt.o fpu32.o socket32.o ptrace32.o ipc32.o
clean::
......
......@@ -14,6 +14,7 @@
#include <linux/smp.h>
#include <linux/smp_lock.h>
#include <linux/stddef.h>
#include <linux/slab.h>
/* Set EXTENT bits starting at BASE in BITMAP to value TURN_ON. */
static void set_bitmap(unsigned long *bitmap, short base, short extent, int new_value)
......@@ -61,27 +62,19 @@ asmlinkage int sys_ioperm(unsigned long from, unsigned long num, int turn_on)
return -EINVAL;
if (turn_on && !capable(CAP_SYS_RAWIO))
return -EPERM;
/*
* If it's the first ioperm() call in this thread's lifetime, set the
* IO bitmap up. ioperm() is much less timing critical than clone(),
* this is why we delay this operation until now:
*/
if (!t->ioperm) {
/*
* just in case ...
*/
memset(t->io_bitmap,0xff,(IO_BITMAP_SIZE+1)*4);
t->ioperm = 1;
/*
* this activates it in the TSS
*/
if (!t->io_bitmap_ptr) {
t->io_bitmap_ptr = kmalloc((IO_BITMAP_SIZE+1)*4, GFP_KERNEL);
if (!t->io_bitmap_ptr)
return -ENOMEM;
memset(t->io_bitmap_ptr,0xff,(IO_BITMAP_SIZE+1)*4);
tss->io_map_base = IO_BITMAP_OFFSET;
}
/*
* do it in the per-thread copy and in the TSS ...
*/
set_bitmap((unsigned long *) t->io_bitmap, from, num, !turn_on);
set_bitmap((unsigned long *) t->io_bitmap_ptr, from, num, !turn_on);
set_bitmap((unsigned long *) tss->io_bitmap, from, num, !turn_on);
return 0;
......
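From userspace, the delayed-allocation path above is exercised the first
time a process calls ioperm(). A hedged sketch (the port range is the
classic parallel-port example, chosen purely for illustration; requires
CAP_SYS_RAWIO):

	#include <stdio.h>
	#include <sys/io.h>

	int main(void)
	{
		if (ioperm(0x378, 3, 1)) {	/* from, num, turn_on */
			perror("ioperm");
			return 1;
		}
		outb(0xff, 0x378);	/* first ioperm() call set up io_bitmap_ptr */
		return 0;
	}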
......@@ -39,6 +39,7 @@
#include <linux/reboot.h>
#include <linux/init.h>
#include <linux/ctype.h>
#include <linux/slab.h>
#include <asm/uaccess.h>
#include <asm/pgtable.h>
......@@ -320,9 +321,6 @@ void show_regs(struct pt_regs * regs)
printk("CR2: %016lx CR3: %016lx CR4: %016lx\n", cr2, cr3, cr4);
}
#define __STR(x) #x
#define __STR2(x) __STR(x)
extern void load_gs_index(unsigned);
/*
......@@ -330,7 +328,13 @@ extern void load_gs_index(unsigned);
*/
void exit_thread(void)
{
/* nothing to do ... */
struct task_struct *me = current;
if (me->thread.io_bitmap_ptr) {
kfree(me->thread.io_bitmap_ptr);
me->thread.io_bitmap_ptr = NULL;
(init_tss + smp_processor_id())->io_map_base =
INVALID_IO_BITMAP_OFFSET;
}
}
void flush_thread(void)
......@@ -392,6 +396,14 @@ int copy_thread(int nr, unsigned long clone_flags, unsigned long rsp,
unlazy_fpu(current);
p->thread.i387 = current->thread.i387;
if (unlikely(me->thread.io_bitmap_ptr != NULL)) {
p->thread.io_bitmap_ptr = kmalloc((IO_BITMAP_SIZE+1)*4, GFP_KERNEL);
if (!p->thread.io_bitmap_ptr)
return -ENOMEM;
memcpy(p->thread.io_bitmap_ptr, me->thread.io_bitmap_ptr,
(IO_BITMAP_SIZE+1)*4);
}
return 0;
}
......@@ -491,21 +503,14 @@ void __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
/*
* Handle the IO bitmap
*/
if (unlikely(prev->ioperm || next->ioperm)) {
if (next->ioperm) {
if (unlikely(prev->io_bitmap_ptr || next->io_bitmap_ptr)) {
if (next->io_bitmap_ptr) {
/*
* 4 cachelines copy ... not good, but not that
* bad either. Anyone got something better?
* This only affects processes which use ioperm().
* [Putting the TSSs into 4k-tlb mapped regions
* and playing VM tricks to switch the IO bitmap
* is not really acceptable.]
* On x86-64 we could put multiple bitmaps into
* the GDT and just switch offsets
* This would require ugly special cases on overflow
* though -AK
*/
memcpy(tss->io_bitmap, next->io_bitmap,
memcpy(tss->io_bitmap, next->io_bitmap_ptr,
IO_BITMAP_SIZE*sizeof(u32));
tss->io_map_base = IO_BITMAP_OFFSET;
} else {
......
......@@ -91,6 +91,9 @@ void pda_init(int cpu)
pda->me = pda;
pda->cpudata_offset = 0;
pda->active_mm = &init_mm;
pda->mmu_state = 0;
asm volatile("movl %0,%%fs ; movl %0,%%gs" :: "r" (0));
wrmsrl(MSR_GS_BASE, cpu_pda + cpu);
}
......
......@@ -84,7 +84,6 @@ struct rt_sigframe
char *pretcode;
struct ucontext uc;
struct siginfo info;
struct _fpstate fpstate;
};
static int
......@@ -186,8 +185,7 @@ asmlinkage long sys_rt_sigreturn(struct pt_regs regs)
*/
static int
setup_sigcontext(struct sigcontext *sc, struct _fpstate *fpstate,
struct pt_regs *regs, unsigned long mask)
setup_sigcontext(struct sigcontext *sc, struct pt_regs *regs, unsigned long mask)
{
int tmp, err = 0;
struct task_struct *me = current;
......@@ -221,20 +219,17 @@ setup_sigcontext(struct sigcontext *sc, struct _fpstate *fpstate,
err |= __put_user(mask, &sc->oldmask);
err |= __put_user(me->thread.cr2, &sc->cr2);
tmp = save_i387(fpstate);
if (tmp < 0)
err = 1;
else
err |= __put_user(tmp ? fpstate : NULL, &sc->fpstate);
return err;
}
/*
* Determine which stack to use..
*/
static inline struct rt_sigframe *
get_sigframe(struct k_sigaction *ka, struct pt_regs * regs)
#define round_down(p, r) ((void *) ((unsigned long)((p) - (r) + 1) & ~((r)-1)))
static void *
get_stack(struct k_sigaction *ka, struct pt_regs *regs, unsigned long size)
{
unsigned long rsp;
......@@ -247,22 +242,34 @@ get_sigframe(struct k_sigaction *ka, struct pt_regs * regs)
rsp = current->sas_ss_sp + current->sas_ss_size;
}
rsp = (rsp - sizeof(struct _fpstate)) & ~(15UL);
rsp -= offsetof(struct rt_sigframe, fpstate);
return (struct rt_sigframe *) rsp;
return round_down(rsp - size, 16);
}
static void setup_rt_frame(int sig, struct k_sigaction *ka, siginfo_t *info,
sigset_t *set, struct pt_regs * regs)
{
struct rt_sigframe *frame;
struct rt_sigframe *frame = NULL;
struct _fpstate *fp = NULL;
int err = 0;
frame = get_sigframe(ka, regs);
if (current->used_math) {
fp = get_stack(ka, regs, sizeof(struct _fpstate));
frame = round_down((char *)fp - sizeof(struct rt_sigframe), 16) - 8;
if (!access_ok(VERIFY_WRITE, frame, sizeof(*frame)))
if (!access_ok(VERIFY_WRITE, fp, sizeof(struct _fpstate))) {
goto give_sigsegv;
}
if (save_i387(fp) < 0)
err |= -1;
}
if (!frame)
frame = get_stack(ka, regs, sizeof(struct rt_sigframe)) - 8;
if (!access_ok(VERIFY_WRITE, frame, sizeof(*frame))) {
goto give_sigsegv;
}
if (ka->sa.sa_flags & SA_SIGINFO) {
err |= copy_siginfo_to_user(&frame->info, info);
......@@ -278,14 +285,10 @@ static void setup_rt_frame(int sig, struct k_sigaction *ka, siginfo_t *info,
err |= __put_user(sas_ss_flags(regs->rsp),
&frame->uc.uc_stack.ss_flags);
err |= __put_user(current->sas_ss_size, &frame->uc.uc_stack.ss_size);
err |= setup_sigcontext(&frame->uc.uc_mcontext, &frame->fpstate,
regs, set->sig[0]);
err |= setup_sigcontext(&frame->uc.uc_mcontext, regs, set->sig[0]);
err |= __put_user(fp, &frame->uc.uc_mcontext.fpstate);
err |= __copy_to_user(&frame->uc.uc_sigmask, set, sizeof(*set));
if (err) {
goto give_sigsegv;
}
/* Set up to return from userspace. If provided, use a stub
already in userspace. */
/* x86-64 should always use SA_RESTORER. */
......@@ -297,7 +300,6 @@ static void setup_rt_frame(int sig, struct k_sigaction *ka, siginfo_t *info,
}
if (err) {
printk("fault 3\n");
goto give_sigsegv;
}
......@@ -305,7 +307,6 @@ static void setup_rt_frame(int sig, struct k_sigaction *ka, siginfo_t *info,
printk("%d old rip %lx old rsp %lx old rax %lx\n", current->pid,regs->rip,regs->rsp,regs->rax);
#endif
/* Set up registers for signal handler */
{
struct exec_domain *ed = current_thread_info()->exec_domain;
......@@ -320,9 +321,10 @@ static void setup_rt_frame(int sig, struct k_sigaction *ka, siginfo_t *info,
next argument after the signal number on the stack. */
regs->rsi = (unsigned long)&frame->info;
regs->rdx = (unsigned long)&frame->uc;
regs->rsp = (unsigned long) frame;
regs->rip = (unsigned long) ka->sa.sa_handler;
regs->rsp = (unsigned long)frame;
set_fs(USER_DS);
regs->eflags &= ~TF_MASK;
......
......@@ -25,8 +25,6 @@
/* The 'big kernel lock' */
spinlock_t kernel_flag __cacheline_aligned_in_smp = SPIN_LOCK_UNLOCKED;
struct tlb_state cpu_tlbstate[NR_CPUS] = {[0 ... NR_CPUS-1] = { &init_mm, 0 }};
/*
* the following functions deal with sending IPIs between CPUs.
*
......@@ -147,9 +145,9 @@ static spinlock_t tlbstate_lock = SPIN_LOCK_UNLOCKED;
*/
static void inline leave_mm (unsigned long cpu)
{
if (cpu_tlbstate[cpu].state == TLBSTATE_OK)
if (read_pda(mmu_state) == TLBSTATE_OK)
BUG();
clear_bit(cpu, &cpu_tlbstate[cpu].active_mm->cpu_vm_mask);
clear_bit(cpu, &read_pda(active_mm)->cpu_vm_mask);
__flush_tlb();
}
......@@ -164,18 +162,18 @@ static void inline leave_mm (unsigned long cpu)
* the other cpus, but smp_invalidate_interrupt ignores flush ipis
* for the wrong mm, and in the worst case we perform a superfluous
* tlb flush.
* 1a2) set cpu_tlbstate to TLBSTATE_OK
* 1a2) set cpu mmu_state to TLBSTATE_OK
* Now the smp_invalidate_interrupt won't call leave_mm if cpu0
* was in lazy tlb mode.
* 1a3) update cpu_tlbstate[].active_mm
* 1a3) update cpu active_mm
* Now cpu0 accepts tlb flushes for the new mm.
* 1a4) set_bit(cpu, &new_mm->cpu_vm_mask);
* Now the other cpus will send tlb flush ipis.
* 1a4) change cr3.
* 1b) thread switch without mm change
* cpu_tlbstate[].active_mm is correct, cpu0 already handles
* cpu active_mm is correct, cpu0 already handles
* flush ipis.
* 1b1) set cpu_tlbstate to TLBSTATE_OK
* 1b1) set cpu mmu_state to TLBSTATE_OK
* 1b2) test_and_set the cpu bit in cpu_vm_mask.
* Atomically set the bit [other cpus will start sending flush ipis],
* and test the bit.
......@@ -188,7 +186,7 @@ static void inline leave_mm (unsigned long cpu)
* runs in kernel space, the cpu could load tlb entries for user space
* pages.
*
* The good news is that cpu_tlbstate is local to each cpu, no
* The good news is that cpu mmu_state is local to each cpu, no
* write/read ordering problems.
*/
......@@ -216,8 +214,8 @@ asmlinkage void smp_invalidate_interrupt (void)
* BUG();
*/
if (flush_mm == cpu_tlbstate[cpu].active_mm) {
if (cpu_tlbstate[cpu].state == TLBSTATE_OK) {
if (flush_mm == read_pda(active_mm)) {
if (read_pda(mmu_state) == TLBSTATE_OK) {
if (flush_va == FLUSH_ALL)
local_flush_tlb();
else
......@@ -335,7 +333,7 @@ static inline void do_flush_tlb_all_local(void)
unsigned long cpu = smp_processor_id();
__flush_tlb_all();
if (cpu_tlbstate[cpu].state == TLBSTATE_LAZY)
if (read_pda(mmu_state) == TLBSTATE_LAZY)
leave_mm(cpu);
}
......
......@@ -47,7 +47,7 @@
#define __vsyscall(nr) __attribute__ ((unused,__section__(".vsyscall_" #nr)))
#define NO_VSYSCALL 1
//#define NO_VSYSCALL 1
#ifdef NO_VSYSCALL
#include <asm/unistd.h>
......
......@@ -189,3 +189,5 @@ EXPORT_SYMBOL_NOVERS(do_softirq_thunk);
void out_of_line_bug(void);
EXPORT_SYMBOL(out_of_line_bug);
EXPORT_SYMBOL(init_level4_pgt);
......@@ -12,7 +12,7 @@ obj-y = csum-partial.o csum-copy.o csum-wrappers.o delay.o \
thunk.o io.o clear_page.o copy_page.o
obj-y += memcpy.o
obj-y += memmove.o
#obj-y += memset.o
obj-y += memset.o
obj-y += copy_user.o
export-objs := io.o csum-wrappers.o csum-partial.o
......
/* Copyright 2002 Andi Kleen, SuSE Labs */
// #define FIX_ALIGNMENT 1
/* Copyright 2002 Andi Kleen */
/*
* ISO C memset - set a memory block to a byte value.
......@@ -11,51 +9,51 @@
*
* rax original destination
*/
.globl ____memset
.globl __memset
.globl memset
.p2align
____memset:
movq %rdi,%r10 /* save destination for return address */
movq %rdx,%r11 /* save count */
memset:
__memset:
movq %rdi,%r10
movq %rdx,%r11
/* expand byte value */
movzbl %sil,%ecx /* zero extend char value */
movabs $0x0101010101010101,%rax /* expansion pattern */
mul %rcx /* expand with rax, clobbers rdx */
movzbl %sil,%ecx
movabs $0x0101010101010101,%rax
mul %rcx /* with rax, clobbers rdx */
#ifdef FIX_ALIGNMENT
/* align dst */
movl %edi,%r9d
andl $7,%r9d /* test unaligned bits */
andl $7,%r9d
jnz bad_alignment
after_bad_alignment:
#endif
movq %r11,%rcx /* restore count */
shrq $6,%rcx /* divide by 64 */
jz handle_tail /* block smaller than 64 bytes? */
movl $64,%r8d /* CSE loop block size */
movq %r11,%rcx
movl $64,%r8d
shrq $6,%rcx
jz handle_tail
loop_64:
movnti %rax,0*8(%rdi)
movnti %rax,1*8(%rdi)
movnti %rax,2*8(%rdi)
movnti %rax,3*8(%rdi)
movnti %rax,4*8(%rdi)
movnti %rax,5*8(%rdi)
movnti %rax,6*8(%rdi)
movnti %rax,7*8(%rdi) /* clear 64 byte blocks */
addq %r8,%rdi /* increase pointer by 64 bytes */
loop loop_64 /* decrement rcx and if not zero loop */
movnti %rax,(%rdi)
movnti %rax,8(%rdi)
movnti %rax,16(%rdi)
movnti %rax,24(%rdi)
movnti %rax,32(%rdi)
movnti %rax,40(%rdi)
movnti %rax,48(%rdi)
movnti %rax,56(%rdi)
addq %r8,%rdi
loop loop_64
/* Handle tail in loops. The loops should be faster than hard-to-predict
jump tables. */
handle_tail:
movl %r11d,%ecx
andl $63,%ecx
shrl $3,%ecx
andl $63&(~7),%ecx
jz handle_7
shrl $3,%ecx
loop_8:
movnti %rax,(%rdi) /* long words */
movnti %rax,(%rdi)
addq $8,%rdi
loop loop_8
......@@ -64,22 +62,20 @@ handle_7:
andl $7,%ecx
jz ende
loop_1:
movb %al,(%rdi) /* bytes */
incq %rdi
movb %al,(%rdi)
addq $1,%rdi
loop loop_1
ende:
movq %r10,%rax
ret
#ifdef FIX_ALIGNMENT
bad_alignment:
andq $-8,%r11 /* shorter than 8 bytes */
jz handle_7 /* if yes handle it in the tail code */
movnti %rax,(%rdi) /* unaligned store of 8 bytes */
cmpq $7,%r11
jbe handle_7
movnti %rax,(%rdi) /* unaligned store */
movq $8,%r8
subq %r9,%r8 /* compute alignment (8-misalignment) */
addq %r8,%rdi /* fix destination */
subq %r8,%r11 /* fix count */
subq %r9,%r8
addq %r8,%rdi
subq %r8,%r11
jmp after_bad_alignment
#endif
......@@ -28,6 +28,7 @@
#include <linux/types.h>
#include <linux/blk.h>
#include <linux/blkdev.h>
#include <linux/bio.h>
#include <linux/completion.h>
#include <linux/delay.h>
#include <linux/genhd.h>
......
......@@ -30,6 +30,7 @@
#include <linux/delay.h>
#include <linux/major.h>
#include <linux/fs.h>
#include <linux/bio.h>
#include <linux/blkpg.h>
#include <linux/timer.h>
#include <linux/proc_fs.h>
......
......@@ -24,6 +24,7 @@
#include <linux/version.h>
#include <linux/types.h>
#include <linux/pci.h>
#include <linux/bio.h>
#include <linux/kernel.h>
#include <linux/slab.h>
#include <linux/delay.h>
......
......@@ -28,6 +28,7 @@
#include <linux/fs.h>
#include <linux/blkdev.h>
#include <linux/elevator.h>
#include <linux/bio.h>
#include <linux/blk.h>
#include <linux/config.h>
#include <linux/module.h>
......
......@@ -165,6 +165,7 @@ static int print_unex=1;
#include <linux/errno.h>
#include <linux/slab.h>
#include <linux/mm.h>
#include <linux/bio.h>
#include <linux/string.h>
#include <linux/fcntl.h>
#include <linux/delay.h>
......
......@@ -18,6 +18,7 @@
#include <linux/errno.h>
#include <linux/string.h>
#include <linux/config.h>
#include <linux/bio.h>
#include <linux/mm.h>
#include <linux/swap.h>
#include <linux/init.h>
......@@ -2002,8 +2003,8 @@ int __init blk_dev_init(void)
queue_nr_requests = (total_ram >> 8) & ~15; /* One per quarter-megabyte */
if (queue_nr_requests < 32)
queue_nr_requests = 32;
if (queue_nr_requests > 512)
queue_nr_requests = 512;
if (queue_nr_requests > 256)
queue_nr_requests = 256;
/*
* Batch frees according to queue length
......
......@@ -60,6 +60,7 @@
#include <linux/sched.h>
#include <linux/fs.h>
#include <linux/file.h>
#include <linux/bio.h>
#include <linux/stat.h>
#include <linux/errno.h>
#include <linux/major.h>
......@@ -168,6 +169,15 @@ static void figure_loop_size(struct loop_device *lo)
}
static inline int lo_do_transfer(struct loop_device *lo, int cmd, char *rbuf,
char *lbuf, int size, int rblock)
{
if (!lo->transfer)
return 0;
return lo->transfer(lo, cmd, rbuf, lbuf, size, rblock);
}
static int
do_lo_send(struct loop_device *lo, struct bio_vec *bvec, int bsize, loff_t pos)
{
......@@ -454,20 +464,43 @@ static struct bio *loop_get_buffer(struct loop_device *lo, struct bio *rbh)
out_bh:
bio->bi_sector = rbh->bi_sector + (lo->lo_offset >> 9);
bio->bi_rw = rbh->bi_rw;
spin_lock_irq(&lo->lo_lock);
bio->bi_bdev = lo->lo_device;
spin_unlock_irq(&lo->lo_lock);
return bio;
}
static int loop_make_request(request_queue_t *q, struct bio *rbh)
static int
bio_transfer(struct loop_device *lo, struct bio *to_bio,
struct bio *from_bio)
{
struct bio *bh = NULL;
unsigned long IV = loop_get_iv(lo, from_bio->bi_sector);
struct bio_vec *from_bvec, *to_bvec;
char *vto, *vfrom;
int ret = 0, i;
__bio_for_each_segment(from_bvec, from_bio, i, 0) {
to_bvec = &to_bio->bi_io_vec[i];
kmap(from_bvec->bv_page);
kmap(to_bvec->bv_page);
vfrom = page_address(from_bvec->bv_page) + from_bvec->bv_offset;
vto = page_address(to_bvec->bv_page) + to_bvec->bv_offset;
ret |= lo_do_transfer(lo, bio_data_dir(to_bio), vto, vfrom,
from_bvec->bv_len, IV);
kunmap(from_bvec->bv_page);
kunmap(to_bvec->bv_page);
}
return ret;
}
static int loop_make_request(request_queue_t *q, struct bio *old_bio)
{
struct bio *new_bio = NULL;
struct loop_device *lo;
unsigned long IV;
int rw = bio_rw(rbh);
int unit = minor(to_kdev_t(rbh->bi_bdev->bd_dev));
int rw = bio_rw(old_bio);
int unit = minor(to_kdev_t(old_bio->bi_bdev->bd_dev));
if (unit >= max_loop)
goto out;
......@@ -489,60 +522,41 @@ static int loop_make_request(request_queue_t *q, struct bio *rbh)
goto err;
}
blk_queue_bounce(q, &rbh);
blk_queue_bounce(q, &old_bio);
/*
* file backed, queue for loop_thread to handle
*/
if (lo->lo_flags & LO_FLAGS_DO_BMAP) {
loop_add_bio(lo, rbh);
loop_add_bio(lo, old_bio);
return 0;
}
/*
* piggy old buffer on original, and submit for I/O
*/
bh = loop_get_buffer(lo, rbh);
IV = loop_get_iv(lo, rbh->bi_sector);
new_bio = loop_get_buffer(lo, old_bio);
IV = loop_get_iv(lo, old_bio->bi_sector);
if (rw == WRITE) {
if (lo_do_transfer(lo, WRITE, bio_data(bh), bio_data(rbh),
bh->bi_size, IV))
if (bio_transfer(lo, new_bio, old_bio))
goto err;
}
generic_make_request(bh);
generic_make_request(new_bio);
return 0;
err:
if (atomic_dec_and_test(&lo->lo_pending))
up(&lo->lo_bh_mutex);
loop_put_buffer(bh);
loop_put_buffer(new_bio);
out:
bio_io_error(rbh);
bio_io_error(old_bio);
return 0;
inactive:
spin_unlock_irq(&lo->lo_lock);
goto out;
}
static int do_bio_blockbacked(struct loop_device *lo, struct bio *bio,
struct bio *rbh)
{
unsigned long IV = loop_get_iv(lo, rbh->bi_sector);
struct bio_vec *from;
char *vto, *vfrom;
int ret = 0, i;
bio_for_each_segment(from, rbh, i) {
vfrom = page_address(from->bv_page) + from->bv_offset;
vto = page_address(bio->bi_io_vec[i].bv_page) + bio->bi_io_vec[i].bv_offset;
ret |= lo_do_transfer(lo, bio_data_dir(bio), vto, vfrom,
from->bv_len, IV);
}
return ret;
}
static inline void loop_handle_bio(struct loop_device *lo, struct bio *bio)
{
int ret;
......@@ -556,7 +570,7 @@ static inline void loop_handle_bio(struct loop_device *lo, struct bio *bio)
} else {
struct bio *rbh = bio->bi_private;
ret = do_bio_blockbacked(lo, bio, rbh);
ret = bio_transfer(lo, bio, rbh);
bio_endio(rbh, !ret);
loop_put_buffer(bio);
......@@ -588,10 +602,8 @@ static int loop_thread(void *data)
set_user_nice(current, -20);
spin_lock_irq(&lo->lo_lock);
lo->lo_state = Lo_bound;
atomic_inc(&lo->lo_pending);
spin_unlock_irq(&lo->lo_lock);
/*
* up sem, we are running
......
......@@ -39,6 +39,7 @@
#include <linux/init.h>
#include <linux/sched.h>
#include <linux/fs.h>
#include <linux/bio.h>
#include <linux/stat.h>
#include <linux/errno.h>
#include <linux/file.h>
......
......@@ -45,6 +45,8 @@
#include <linux/config.h>
#include <linux/string.h>
#include <linux/slab.h>
#include <asm/atomic.h>
#include <linux/bio.h>
#include <linux/module.h>
#include <linux/init.h>
#include <linux/devfs_fs_kernel.h>
......
......@@ -37,6 +37,7 @@
#include <linux/config.h>
#include <linux/sched.h>
#include <linux/fs.h>
#include <linux/bio.h>
#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/mman.h>
......
......@@ -118,8 +118,8 @@ struct agp_bridge_data {
int (*remove_memory) (agp_memory *, off_t, int);
agp_memory *(*alloc_by_type) (size_t, int);
void (*free_by_type) (agp_memory *);
unsigned long (*agp_alloc_page) (void);
void (*agp_destroy_page) (unsigned long);
void *(*agp_alloc_page) (void);
void (*agp_destroy_page) (void *);
int (*suspend)(void);
void (*resume)(void);
......
......@@ -252,6 +252,7 @@
#include <linux/poll.h>
#include <linux/init.h>
#include <linux/fs.h>
#include <linux/tqueue.h>
#include <asm/processor.h>
#include <asm/uaccess.h>
......
......@@ -345,8 +345,9 @@ int ata_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned
if (!arg) {
if (ide_spin_wait_hwgroup(drive))
return -EBUSY;
else
return 0;
/* Do nothing, just unlock */
spin_unlock_irq(drive->channel->lock);
return 0;
}
return do_cmd_ioctl(drive, arg);
......
......@@ -20,7 +20,7 @@
#include <linux/raid/md.h>
#include <linux/slab.h>
#include <linux/bio.h>
#include <linux/raid/linear.h>
#define MAJOR_NR MD_MAJOR
......
......@@ -224,7 +224,7 @@ static inline void invalidate_snap_cache(unsigned long start, unsigned long nr,
for (i = 0; i < nr; i++)
{
bh = get_hash_table(dev, start++, blksize);
bh = find_get_block(dev, start++, blksize);
if (bh)
bforget(bh);
}
......
......@@ -209,6 +209,7 @@
#include <linux/hdreg.h>
#include <linux/stat.h>
#include <linux/fs.h>
#include <linux/bio.h>
#include <linux/proc_fs.h>
#include <linux/blkdev.h>
#include <linux/genhd.h>
......
......@@ -33,6 +33,7 @@
#include <linux/linkage.h>
#include <linux/raid/md.h>
#include <linux/sysctl.h>
#include <linux/bio.h>
#include <linux/raid/xor.h>
#include <linux/devfs_fs_kernel.h>
......
......@@ -23,6 +23,7 @@
#include <linux/slab.h>
#include <linux/spinlock.h>
#include <linux/raid/multipath.h>
#include <linux/bio.h>
#include <linux/buffer_head.h>
#include <asm/atomic.h>
......
......@@ -20,6 +20,7 @@
#include <linux/module.h>
#include <linux/raid/raid0.h>
#include <linux/bio.h>
#define MAJOR_NR MD_MAJOR
#define MD_DRIVER
......
......@@ -23,6 +23,7 @@
*/
#include <linux/raid/raid1.h>
#include <linux/bio.h>
#define MAJOR_NR MD_MAJOR
#define MD_DRIVER
......
......@@ -20,6 +20,7 @@
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/raid/raid5.h>
#include <linux/bio.h>
#include <asm/bitops.h>
#include <asm/atomic.h>
......
......@@ -210,3 +210,4 @@ EXPORT_SYMBOL(pci_match_device);
EXPORT_SYMBOL(pci_register_driver);
EXPORT_SYMBOL(pci_unregister_driver);
EXPORT_SYMBOL(pci_dev_driver);
EXPORT_SYMBOL(pci_bus_type);
......@@ -20,6 +20,7 @@
#include <linux/init.h>
#include <linux/pci.h>
#include <linux/sched.h>
#include <linux/tqueue.h>
#include <linux/interrupt.h>
#include <pcmcia/ss.h>
......
......@@ -6,6 +6,7 @@
#include <linux/init.h>
#include <linux/pci.h>
#include <linux/sched.h>
#include <linux/tqueue.h>
#include <linux/interrupt.h>
#include <linux/delay.h>
#include <linux/module.h>
......
......@@ -2,7 +2,7 @@ This file contains brief information about the SCSI tape driver.
The driver is currently maintained by Kai Mäkisara (email
Kai.Makisara@metla.fi)
Last modified: Tue Jan 22 21:08:57 2002 by makisara
Last modified: Tue Jun 18 18:13:50 2002 by makisara
BASICS
......@@ -105,15 +105,19 @@ The default is BSD semantics.
BUFFERING
The driver uses tape buffers allocated either at system initialization
or at run-time when needed. One buffer is used for each open tape
device. The size of the buffers is selectable at compile and/or boot
time. The buffers are used to store the data being transferred to/from
the SCSI adapter. The following buffering options are selectable at
compile time and/or at run time (via ioctl):
The driver uses tape buffers allocated at run-time when needed; each
buffer is freed when the device file is closed. One buffer is used for
each open tape device.
The size of the buffers is always at least one tape block. In fixed
block mode, the minimum buffer size is defined (in 1024 byte units) by
ST_FIXED_BUFFER_BLOCKS. With small block size this allows buffering of
several blocks and using one SCSI read or write to transfer all of the
blocks. Buffering of data across write calls in fixed block mode is
allowed if ST_BUFFER_WRITES is non-zero. Buffer allocation uses chunks of
memory having sizes 2^n * (page size). Because of this the actual
buffer size may be larger than the minimum allowable buffer size.
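For example (illustrative arithmetic): with 4 kB pages the allocation
chunks come in sizes 4, 8, 16, 32, ... kB, so a 24 kB minimum buffer may
end up as a 32 kB allocation.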
Buffering of data across write calls in fixed block mode (define
ST_BUFFER_WRITES).
Asynchronous writing. Writing the buffer contents to the tape is
started and the write call returns immediately. The status is checked
......@@ -128,30 +132,6 @@ attempted even if the user does not want to get all of the data at
this read command. Should be disabled for those drives that don't like
a filemark to truncate a read request or that don't like backspacing.
The buffer size is defined (in 1024 byte units) by ST_BUFFER_BLOCKS or
at boot time. If this size is not large enough, the driver tries to
temporarily enlarge the buffer. Buffer allocation uses chunks of
memory having sizes 2^n * (page size). Because of this the actual
buffer size may be larger than the buffer size specified with
ST_BUFFER_BLOCKS.
A small number of buffers are allocated at driver initialisation. The
maximum number of these buffers is defined by ST_MAX_BUFFERS. The
maximum can be changed with kernel or module startup options. One
buffer is allocated for each drive detected when the driver is
initialized up to the maximum.
The driver tries to allocate new buffers at run-time if
necessary. These buffers are freed after use. If the maximum number of
initial buffers is set to zero, all buffer allocation is done at
run-time. The advantage of run-time allocation is that memory is not
wasted for buffers not being used. The disadvantage is that there may
not be memory available at the time when a buffer is needed for the
first time (once a buffer is allocated, it is not released). This risk
should not be big if the tape drive is connected to a PCI adapter that
supports scatter/gather (the allocation is not limited to "DMA memory"
and the buffer can be composed of several fragments).
The threshold for triggering asynchronous write in fixed block mode
is defined by ST_WRITE_THRESHOLD. This may be optimized for each
use pattern. The default triggers asynchronous write after three
......
......@@ -39,6 +39,7 @@
#include <linux/pci.h>
#include <linux/delay.h>
#include <linux/timer.h>
#include <linux/init.h>
#include <linux/ioport.h> // request_region() prototype
#include <linux/vmalloc.h> // ioremap()
//#if LINUX_VERSION_CODE >= LinuxVersionCode(2,4,7)
......
......@@ -23,6 +23,7 @@
#include <linux/timer.h>
#include <linux/string.h>
#include <linux/slab.h>
#include <linux/bio.h>
#include <linux/ioport.h>
#include <linux/kernel.h>
#include <linux/stat.h>
......
......@@ -36,6 +36,7 @@
#include <linux/kernel.h>
#include <linux/sched.h>
#include <linux/mm.h>
#include <linux/bio.h>
#include <linux/string.h>
#include <linux/hdreg.h>
#include <linux/errno.h>
......
......@@ -39,6 +39,7 @@
#include <linux/kernel.h>
#include <linux/sched.h>
#include <linux/mm.h>
#include <linux/bio.h>
#include <linux/string.h>
#include <linux/errno.h>
#include <linux/cdrom.h>
......
......@@ -3,7 +3,7 @@
Copyright 1995-2000 Kai Makisara.
Last modified: Tue Jan 22 21:52:34 2002 by makisara
Last modified: Sun May 5 15:09:56 2002 by makisara
*/
#ifndef _ST_OPTIONS_H
......@@ -30,22 +30,17 @@
SENSE. */
#define ST_DEFAULT_BLOCK 0
/* The tape driver buffer size in kilobytes. Must be non-zero. */
#define ST_BUFFER_BLOCKS 32
/* The minimum tape driver buffer size in kilobytes in fixed block mode.
Must be non-zero. */
#define ST_FIXED_BUFFER_BLOCKS 32
/* The number of kilobytes of data in the buffer that triggers an
asynchronous write in fixed block mode. See also ST_ASYNC_WRITES
below. */
#define ST_WRITE_THRESHOLD_BLOCKS 30
/* The maximum number of tape buffers the driver tries to allocate at
driver initialisation. The number is also constrained by the number
of drives detected. If more buffers are needed, they are allocated
at run time and freed after use. */
#define ST_MAX_BUFFERS 4
/* Maximum number of scatter/gather segments */
#define ST_MAX_SG 16
#define ST_MAX_SG 64
/* The number of scatter/gather segments to allocate at first try (must be
smaller or equal to the maximum). */
......
......@@ -17,6 +17,7 @@
*
*/
#include <linux/mm.h>
#include <linux/bio.h>
#include <linux/blk.h>
#include <linux/slab.h>
#include <linux/iobuf.h>
......@@ -284,8 +285,8 @@ struct bio *bio_copy(struct bio *bio, int gfp_mask, int copy)
vto = kmap(bbv->bv_page);
} else {
local_irq_save(flags);
vfrom = kmap_atomic(bv->bv_page, KM_BIO_IRQ);
vto = kmap_atomic(bbv->bv_page, KM_BIO_IRQ);
vfrom = kmap_atomic(bv->bv_page, KM_BIO_SRC_IRQ);
vto = kmap_atomic(bbv->bv_page, KM_BIO_DST_IRQ);
}
memcpy(vto + bbv->bv_offset, vfrom + bv->bv_offset, bv->bv_len);
......@@ -293,8 +294,8 @@ struct bio *bio_copy(struct bio *bio, int gfp_mask, int copy)
kunmap(bbv->bv_page);
kunmap(bv->bv_page);
} else {
kunmap_atomic(vto, KM_BIO_IRQ);
kunmap_atomic(vfrom, KM_BIO_IRQ);
kunmap_atomic(vto, KM_BIO_DST_IRQ);
kunmap_atomic(vfrom, KM_BIO_SRC_IRQ);
local_irq_restore(flags);
}
}
......