Commit cfd2e1af authored by Christoph Hellwig

Merge http://linux.bkbits.net/linux-2.5

into lab343.munich.sgi.com:/home/hch/repo/bkbits/linux-2.5
parents 1142fa81 9a3e1a96
......@@ -141,17 +141,14 @@ you are have done so you need to call journal_dirty_{meta,}data().
Or if you've asked for access to a buffer you now know is no longer
required to be pushed back on the device you can call journal_forget()
in much the same way as you might have used bforget() in the past.
</para>
<para>
A journal_flush() may be called at any time to commit and checkpoint
all your transactions.
</para>
<para>
<para>
Then at umount time, in your put_super() (2.4) or write_super() (2.5),
you can call journal_destroy() to clean up your in-core journal object.
</para>
......@@ -168,8 +165,8 @@ on another journal. Since transactions can't be nested/batched
across differing journals, and another filesystem other than
yours (say ext3) may be modified in a later syscall.
</para>
<para>
<para>
The second case to bear in mind is that journal_start() can
block if there isn't enough space in the journal for your transaction
(based on the passed nblocks param) - when it blocks it merely(!) needs to
......@@ -180,10 +177,14 @@ were semaphores and include them in your semaphore ordering rules to prevent
deadlocks. Note that journal_extend() has similar blocking behaviour to
journal_start() so you can deadlock here just as easily as on journal_start().
</para>
<para>
Try to reserve the right number of blocks the first time. ;-).
<para>
Try to reserve the right number of blocks the first time. ;-). This will
be the maximum number of blocks you are going to touch in this transaction.
I advise having a look at at least ext3_jbd.h to see the basis on which
ext3 makes these decisions.
</para>
<para>
Another wriggle to watch out for is your on-disk block allocation strategy.
Why? Because, if you undo a delete, you need to ensure you haven't reused any
......@@ -211,6 +212,30 @@ The opportunities for abuse and DOS attacks with this should be obvious,
if you allow unprivileged userspace to trigger codepaths containing these
calls.
</para>
<para>
A new feature of jbd since 2.5.25 is commit callbacks. With the new
journal_callback_set() function you can now ask the journalling layer
to call you back when the transaction is finally committed to disk, so that
you can do some of your own management. The key to this is the journal_callback
struct, which maintains the internal callback information, but you can
extend it like this:
</para>
<programlisting>
struct myfs_callback_s {
//Data structure element required by jbd..
struct journal_callback for_jbd;
// Stuff for myfs allocated together.
myfs_inode* i_commited;
};
</programlisting>
<para>
This would be useful if you needed to know when data was committed to a
particular inode.
</para>
</sect1>
<sect1>
......
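The commit-callback paragraph in the jbd documentation hunk above relies on embedding `struct journal_callback` inside a filesystem-private record. A minimal userspace sketch of that embedding pattern follows. It is not the kernel API: the `jcb_func` member, `mock_commit()` and `myfs_commit_done()` are hypothetical stand-ins, and the real registration goes through `journal_callback_set()`, whose signature is not shown in this diff.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the jbd callback record (the real one is in-kernel). */
struct journal_callback {
    void (*jcb_func)(struct journal_callback *jcb, int error);
};

/* Filesystem-private record, laid out like the documentation's example. */
struct myfs_callback_s {
    struct journal_callback for_jbd; /* element required by jbd */
    int i_commited;                  /* hypothetical myfs bookkeeping flag */
};

/* Recover the enclosing record from a pointer to its embedded member. */
static struct myfs_callback_s *to_myfs(struct journal_callback *jcb)
{
    return (struct myfs_callback_s *)
        ((char *)jcb - offsetof(struct myfs_callback_s, for_jbd));
}

/* Callback: runs once the (mock) journal reports the commit result. */
static void myfs_commit_done(struct journal_callback *jcb, int error)
{
    struct myfs_callback_s *cb = to_myfs(jcb);

    cb->i_commited = error ? -1 : 1;
}

/* Mock of the journalling layer invoking the callback at commit time. */
static void mock_commit(struct journal_callback *jcb, int error)
{
    assert(jcb != NULL && jcb->jcb_func != NULL);
    jcb->jcb_func(jcb, error);
}
```

Because the journalling layer only ever sees the embedded member, the filesystem can allocate its own data together with the callback record and find it again without a separate lookup.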
November 2002 Kernel Parameters v2.5.49
February 2003 Kernel Parameters v2.5.59
~~~~~~~~~~~~~~~~~
The following is a consolidated list of the kernel parameters as implemented
......@@ -60,6 +60,7 @@ restrictions referred to are that the relevant option is valid if:
V4L Video For Linux support is enabled.
VGA The VGA console has been enabled.
VT Virtual terminal support is enabled.
WDT Watchdog support is enabled.
XT IBM PC/XT MFM hard disk support is enabled.
In addition, the following text indicates that the option:
......@@ -98,6 +99,9 @@ running once the system is up.
advansys= [HW,SCSI]
See header of drivers/scsi/advansys.c.
advwdt= [HW,WDT] Advantech WDT
Format: <iostart>,<iostop>
aedsp16= [HW,OSS] Audio Excel DSP 16
Format: <io>,<irq>,<dma>,<mss_io>,<mpu_io>,<mpu_irq>
See also header of sound/oss/aedsp16.c.
......@@ -111,6 +115,9 @@ running once the system is up.
aic7xxx= [HW,SCSI]
See Documentation/scsi/aic7xxx.txt.
aic79xx= [HW,SCSI]
See Documentation/scsi/aic79xx.txt.
allowdma0 [ISAPNP]
AM53C974= [HW,SCSI]
......@@ -231,19 +238,11 @@ running once the system is up.
cs89x0_media= [HW,NET]
Format: { rj45 | aui | bnc }
ctc= [HW,NET]
See drivers/s390/net/ctcmain.c, comment before function
ctc_setup().
cyclades= [HW,SERIAL] Cyclades multi-serial port adapter.
dasd= [HW,NET]
See header of drivers/s390/block/dasd_devmap.c.
dasd_discipline=
[HW,NET]
See header of drivers/s390/block/dasd.c.
db9= [HW,JOY]
db9_2=
db9_3=
......@@ -254,9 +253,6 @@ running once the system is up.
Format: <area>[,<node>]
See also Documentation/networking/decnet.txt.
decr_overclock= [PPC]
decr_overclock_proc0=
devfs= [DEVFS]
See Documentation/filesystems/devfs/boot-options.
......@@ -305,6 +301,9 @@ running once the system is up.
This option is obsoleted by the "netdev=" option, which
has equivalent usage. See its documentation for details.
eurwdt= [HW,WDT] Eurotech CPU-1220/1410 onboard watchdog.
Format: <io>[,<irq>]
fd_mcs= [HW,SCSI]
See header of drivers/scsi/fd_mcs.c.
......@@ -350,7 +349,9 @@ running once the system is up.
hisax= [HW,ISDN]
See Documentation/isdn/README.HiSax.
hugepages= [HW,IA-32] Maximal number of HugeTLB pages
hugepages= [HW,IA-32,IA-64] Maximal number of HugeTLB pages.
noirqbalance [IA-32,SMP,KNL] Disable kernel irq balancing
i8042_direct [HW] Non-translated mode
i8042_dumbkbd
......@@ -394,6 +395,10 @@ running once the system is up.
inttest= [IA64]
io7= [HW] IO7 for Marvel based alpha systems
See comment before marvel_specify_io7 in
arch/alpha/kernel/core_marvel.c.
ip= [IP_PNP]
See Documentation/nfsroot.txt.
......@@ -495,6 +500,7 @@ running once the system is up.
mdacon= [MDA]
Format: <first>,<last>
Specifies range of consoles to be captured by the MDA.
mem=exactmap [KNL,BOOT,IA-32] Enable setting of an exact
E820 memory map, as specified by the user.
......@@ -576,6 +582,8 @@ running once the system is up.
nodisconnect [HW,SCSI,M68K] Disables SCSI disconnects.
noexec [IA-64]
nofxsr [BUGS=IA-32]
nohighio [BUGS=IA-32] Disable highmem block I/O.
......@@ -599,7 +607,9 @@ running once the system is up.
noresume [SWSUSP] Disables resume and restore original swap space.
no-scroll [VGA]
no-scroll [VGA] Disables scrollback.
This is required for the Braillex ib80-piezo Braille
reader made by F.H. Papenmeier (Germany).
nosbagart [IA-64]
......@@ -809,6 +819,9 @@ running once the system is up.
See a comment before function sbpcd_setup() in
drivers/cdrom/sbpcd.c.
sc1200wdt= [HW,WDT] SC1200 WDT (watchdog) driver
Format: <io>[,<timeout>[,<isapnp>]]
scsi_debug_*= [SCSI]
See drivers/scsi/scsi_debug.c.
......@@ -997,9 +1010,6 @@ running once the system is up.
spia_pedr=
spia_peddr=
spread_lpevents=
[PPC]
sscape= [HW,OSS]
Format: <io>,<irq>,<dma>,<mpu_io>,<mpu_irq>
......@@ -1009,6 +1019,19 @@ running once the system is up.
st0x= [HW,SCSI]
See header of drivers/scsi/seagate.c.
sti= [HW]
Format: <num>
Set the STI (builtin display/keyboard on the HP-PARISC
machines) console (graphic card) which should be used
as the initial boot-console.
See also comment in drivers/video/console/sticore.c.
sti_font= [HW]
See comment in drivers/video/console/sticore.c.
stifb= [HW]
Format: bpp:<bpp1>[:<bpp2>[:<bpp3>...]]
stram_swap= [HW,M68k]
swiotlb= [IA-64] Number of I/O TLB slabs
......@@ -1079,7 +1102,7 @@ running once the system is up.
wd7000= [HW,SCSI]
See header of drivers/scsi/wd7000.c.
wdt= [HW] Watchdog
wdt= [WDT] Watchdog
See Documentation/watchdog.txt.
xd= [HW,XT] Original XT pre-IDE (RLL encoded) disks.
......
......@@ -86,7 +86,7 @@ void enable_hlt(void)
*/
void default_idle(void)
{
if (current_cpu_data.hlt_works_ok && !hlt_counter) {
if (!hlt_counter && current_cpu_data.hlt_works_ok) {
local_irq_disable();
if (!need_resched())
safe_halt();
......
......@@ -26,7 +26,6 @@ static long htlbpagemem;
int htlbpage_max;
static long htlbzone_pages;
struct vm_operations_struct hugetlb_vm_ops;
static LIST_HEAD(htlbpage_freelist);
static spinlock_t htlbpage_lock = SPIN_LOCK_UNLOCKED;
......@@ -46,6 +45,7 @@ static struct page *alloc_hugetlb_page(void)
htlbpagemem--;
spin_unlock(&htlbpage_lock);
set_page_count(page, 1);
page->lru.prev = (void *)huge_page_release;
for (i = 0; i < (HPAGE_SIZE/PAGE_SIZE); ++i)
clear_highpage(&page[i]);
return page;
......@@ -134,6 +134,7 @@ follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
page = pte_page(pte);
if (pages) {
page += ((start & ~HPAGE_MASK) >> PAGE_SHIFT);
get_page(page);
pages[i] = page;
}
if (vmas)
......@@ -150,6 +151,82 @@ follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
return i;
}
#if 0 /* This is just for testing */
struct page *
follow_huge_addr(struct mm_struct *mm,
struct vm_area_struct *vma, unsigned long address, int write)
{
unsigned long start = address;
int length = 1;
int nr;
struct page *page;
nr = follow_hugetlb_page(mm, vma, &page, NULL, &start, &length, 0);
if (nr == 1)
return page;
return NULL;
}
/*
* If virtual address `addr' lies within a huge page, return its controlling
* VMA, else NULL.
*/
struct vm_area_struct *hugepage_vma(struct mm_struct *mm, unsigned long addr)
{
if (mm->used_hugetlb) {
struct vm_area_struct *vma = find_vma(mm, addr);
if (vma && is_vm_hugetlb_page(vma))
return vma;
}
return NULL;
}
int pmd_huge(pmd_t pmd)
{
return 0;
}
struct page *
follow_huge_pmd(struct mm_struct *mm, unsigned long address,
pmd_t *pmd, int write)
{
return NULL;
}
#else
struct page *
follow_huge_addr(struct mm_struct *mm,
struct vm_area_struct *vma, unsigned long address, int write)
{
return NULL;
}
struct vm_area_struct *hugepage_vma(struct mm_struct *mm, unsigned long addr)
{
return NULL;
}
int pmd_huge(pmd_t pmd)
{
return !!(pmd_val(pmd) & _PAGE_PSE);
}
struct page *
follow_huge_pmd(struct mm_struct *mm, unsigned long address,
pmd_t *pmd, int write)
{
struct page *page;
page = pte_page(*(pte_t *)pmd);
if (page) {
page += ((address & ~HPAGE_MASK) >> PAGE_SHIFT);
get_page(page);
}
return page;
}
#endif
void free_huge_page(struct page *page)
{
BUG_ON(page_count(page));
......@@ -171,7 +248,8 @@ void huge_page_release(struct page *page)
free_huge_page(page);
}
void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start, unsigned long end)
void unmap_hugepage_range(struct vm_area_struct *vma,
unsigned long start, unsigned long end)
{
struct mm_struct *mm = vma->vm_mm;
unsigned long address;
......@@ -181,8 +259,6 @@ void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start, unsig
BUG_ON(start & (HPAGE_SIZE - 1));
BUG_ON(end & (HPAGE_SIZE - 1));
spin_lock(&htlbpage_lock);
spin_unlock(&htlbpage_lock);
for (address = start; address < end; address += HPAGE_SIZE) {
pte = huge_pte_offset(mm, address);
if (pte_none(*pte))
......@@ -195,7 +271,9 @@ void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start, unsig
flush_tlb_range(vma, start, end);
}
void zap_hugepage_range(struct vm_area_struct *vma, unsigned long start, unsigned long length)
void
zap_hugepage_range(struct vm_area_struct *vma,
unsigned long start, unsigned long length)
{
struct mm_struct *mm = vma->vm_mm;
spin_lock(&mm->page_table_lock);
......@@ -206,6 +284,7 @@ void zap_hugepage_range(struct vm_area_struct *vma, unsigned long start, unsigne
int hugetlb_prefault(struct address_space *mapping, struct vm_area_struct *vma)
{
struct mm_struct *mm = current->mm;
struct inode *inode = mapping->host;
unsigned long addr;
int ret = 0;
......@@ -229,6 +308,7 @@ int hugetlb_prefault(struct address_space *mapping, struct vm_area_struct *vma)
+ (vma->vm_pgoff >> (HPAGE_SHIFT - PAGE_SHIFT));
page = find_get_page(mapping, idx);
if (!page) {
loff_t i_size;
page = alloc_hugetlb_page();
if (!page) {
ret = -ENOMEM;
......@@ -240,6 +320,9 @@ int hugetlb_prefault(struct address_space *mapping, struct vm_area_struct *vma)
free_huge_page(page);
goto out;
}
i_size = (loff_t)(idx + 1) * HPAGE_SIZE;
if (i_size > inode->i_size)
inode->i_size = i_size;
}
set_huge_pte(mm, vma, page, pte, vma->vm_flags & VM_WRITE);
}
......@@ -298,8 +381,8 @@ int try_to_free_low(int count)
int set_hugetlb_mem_size(int count)
{
int j, lcount;
struct page *page, *map;
int lcount;
struct page *page;
extern long htlbzone_pages;
extern struct list_head htlbpage_freelist;
......@@ -315,11 +398,6 @@ int set_hugetlb_mem_size(int count)
page = alloc_pages(__GFP_HIGHMEM, HUGETLB_PAGE_ORDER);
if (page == NULL)
break;
map = page;
for (j = 0; j < (HPAGE_SIZE / PAGE_SIZE); j++) {
SetPageReserved(map);
map++;
}
spin_lock(&htlbpage_lock);
list_add(&page->list, &htlbpage_freelist);
htlbpagemem++;
......@@ -341,7 +419,8 @@ int set_hugetlb_mem_size(int count)
return (int) htlbzone_pages;
}
int hugetlb_sysctl_handler(ctl_table *table, int write, struct file *file, void *buffer, size_t *length)
int hugetlb_sysctl_handler(ctl_table *table, int write,
struct file *file, void *buffer, size_t *length)
{
proc_dointvec(table, write, file, buffer, length);
htlbpage_max = set_hugetlb_mem_size(htlbpage_max);
......@@ -358,15 +437,13 @@ __setup("hugepages=", hugetlb_setup);
static int __init hugetlb_init(void)
{
int i, j;
int i;
struct page *page;
for (i = 0; i < htlbpage_max; ++i) {
page = alloc_pages(__GFP_HIGHMEM, HUGETLB_PAGE_ORDER);
if (!page)
break;
for (j = 0; j < HPAGE_SIZE/PAGE_SIZE; ++j)
SetPageReserved(&page[j]);
spin_lock(&htlbpage_lock);
list_add(&page->list, &htlbpage_freelist);
spin_unlock(&htlbpage_lock);
......@@ -395,7 +472,14 @@ int is_hugepage_mem_enough(size_t size)
return 1;
}
static struct page *hugetlb_nopage(struct vm_area_struct * area, unsigned long address, int unused)
/*
* We cannot handle pagefaults against hugetlb pages at all. They cause
* handle_mm_fault() to try to instantiate regular-sized pages in the
* hugepage VMA. do_page_fault() is supposed to trap this, so BUG if we get
* this far.
*/
static struct page *hugetlb_nopage(struct vm_area_struct *vma,
unsigned long address, int unused)
{
BUG();
return NULL;
......
......@@ -18,7 +18,6 @@
#include <asm/tlb.h>
#include <asm/tlbflush.h>
static struct vm_operations_struct hugetlb_vm_ops;
struct list_head htlbpage_freelist;
spinlock_t htlbpage_lock = SPIN_LOCK_UNLOCKED;
extern long htlbpagemem;
......@@ -227,6 +226,7 @@ follow_hugetlb_page (struct mm_struct *mm, struct vm_area_struct *vma,
page = pte_page(pte);
if (pages) {
page += ((start & ~HPAGE_MASK) >> PAGE_SHIFT);
get_page(page);
pages[i] = page;
}
if (vmas)
......@@ -303,11 +303,6 @@ set_hugetlb_mem_size (int count)
page = alloc_pages(__GFP_HIGHMEM, HUGETLB_PAGE_ORDER);
if (page == NULL)
break;
map = page;
for (j = 0; j < (HPAGE_SIZE / PAGE_SIZE); j++) {
SetPageReserved(map);
map++;
}
spin_lock(&htlbpage_lock);
list_add(&page->list, &htlbpage_freelist);
htlbpagemem++;
......@@ -327,7 +322,7 @@ set_hugetlb_mem_size (int count)
map = page;
for (j = 0; j < (HPAGE_SIZE / PAGE_SIZE); j++) {
map->flags &= ~(1 << PG_locked | 1 << PG_error | 1 << PG_referenced |
1 << PG_dirty | 1 << PG_active | 1 << PG_reserved |
1 << PG_dirty | 1 << PG_active |
1 << PG_private | 1<< PG_writeback);
map++;
}
......@@ -337,6 +332,14 @@ set_hugetlb_mem_size (int count)
return (int) htlbzone_pages;
}
static struct page *
hugetlb_nopage(struct vm_area_struct *vma, unsigned long address, int unused)
{
BUG();
return NULL;
}
static struct vm_operations_struct hugetlb_vm_ops = {
.close = zap_hugetlb_resources
.nopage = hugetlb_nopage,
.close = zap_hugetlb_resources,
};
......@@ -288,6 +288,7 @@ int follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
page = pte_page(pte);
if (pages) {
page += ((start & ~HPAGE_MASK) >> PAGE_SHIFT);
get_page(page);
pages[i] = page;
}
if (vmas)
......@@ -584,11 +585,6 @@ int set_hugetlb_mem_size(int count)
page = alloc_pages(GFP_ATOMIC, HUGETLB_PAGE_ORDER);
if (page == NULL)
break;
map = page;
for (j = 0; j < (HPAGE_SIZE / PAGE_SIZE); j++) {
SetPageReserved(map);
map++;
}
spin_lock(&htlbpage_lock);
list_add(&page->list, &htlbpage_freelist);
htlbpagemem++;
......@@ -613,7 +609,6 @@ int set_hugetlb_mem_size(int count)
map->flags &= ~(1UL << PG_locked | 1UL << PG_error |
1UL << PG_referenced |
1UL << PG_dirty | 1UL << PG_active |
1UL << PG_reserved |
1UL << PG_private | 1UL << PG_writeback);
set_page_count(page, 0);
map++;
......@@ -624,6 +619,14 @@ int set_hugetlb_mem_size(int count)
return (int) htlbzone_pages;
}
static struct page *
hugetlb_nopage(struct vm_area_struct *vma, unsigned long address, int unused)
{
BUG();
return NULL;
}
static struct vm_operations_struct hugetlb_vm_ops = {
.nopage = hugetlb_nopage,
.close = zap_hugetlb_resources,
};
......@@ -25,7 +25,6 @@ static long htlbpagemem;
int htlbpage_max;
static long htlbzone_pages;
struct vm_operations_struct hugetlb_vm_ops;
static LIST_HEAD(htlbpage_freelist);
static spinlock_t htlbpage_lock = SPIN_LOCK_UNLOCKED;
......@@ -134,6 +133,7 @@ follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
page = pte_page(pte);
if (pages) {
page += ((start & ~HPAGE_MASK) >> PAGE_SHIFT);
get_page(page);
pages[i] = page;
}
if (vmas)
......@@ -204,6 +204,7 @@ void zap_hugepage_range(struct vm_area_struct *vma, unsigned long start, unsigne
int hugetlb_prefault(struct address_space *mapping, struct vm_area_struct *vma)
{
struct mm_struct *mm = current->mm;
struct inode *inode = mapping->host;
unsigned long addr;
int ret = 0;
......@@ -227,6 +228,8 @@ int hugetlb_prefault(struct address_space *mapping, struct vm_area_struct *vma)
+ (vma->vm_pgoff >> (HPAGE_SHIFT - PAGE_SHIFT));
page = find_get_page(mapping, idx);
if (!page) {
loff_t i_size;
page = alloc_hugetlb_page();
if (!page) {
ret = -ENOMEM;
......@@ -238,6 +241,9 @@ int hugetlb_prefault(struct address_space *mapping, struct vm_area_struct *vma)
free_huge_page(page);
goto out;
}
i_size = (loff_t)(idx + 1) * HPAGE_SIZE;
if (i_size > inode->i_size)
inode->i_size = i_size;
}
set_huge_pte(mm, vma, page, pte, vma->vm_flags & VM_WRITE);
}
......@@ -263,11 +269,6 @@ int set_hugetlb_mem_size(int count)
page = alloc_pages(__GFP_HIGHMEM, HUGETLB_PAGE_ORDER);
if (page == NULL)
break;
map = page;
for (j = 0; j < (HPAGE_SIZE / PAGE_SIZE); j++) {
SetPageReserved(map);
map++;
}
spin_lock(&htlbpage_lock);
list_add(&page->list, &htlbpage_freelist);
htlbpagemem++;
......@@ -286,8 +287,9 @@ int set_hugetlb_mem_size(int count)
spin_unlock(&htlbpage_lock);
map = page;
for (j = 0; j < (HPAGE_SIZE / PAGE_SIZE); j++) {
map->flags &= ~(1 << PG_locked | 1 << PG_error | 1 << PG_referenced |
1 << PG_dirty | 1 << PG_active | 1 << PG_reserved |
map->flags &= ~(1 << PG_locked | 1 << PG_error |
1 << PG_referenced |
1 << PG_dirty | 1 << PG_active |
1 << PG_private | 1<< PG_writeback);
set_page_count(map, 0);
map++;
......@@ -346,7 +348,8 @@ int hugetlb_report_meminfo(char *buf)
HPAGE_SIZE/1024);
}
static struct page * hugetlb_nopage(struct vm_area_struct * area, unsigned long address, int unused)
static struct page *
hugetlb_nopage(struct vm_area_struct *vma, unsigned long address, int unused)
{
BUG();
return NULL;
......
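Two pieces of arithmetic recur in the hugetlb hunks above: the index of the small page inside its huge page, `(address & ~HPAGE_MASK) >> PAGE_SHIFT`, and the file-size bump in `hugetlb_prefault()`, `i_size = (idx + 1) * HPAGE_SIZE`. The sketch below works both through in userspace. The shift values are assumptions (4 KB base pages, 4 MB huge pages), and the helper names are hypothetical; the real constants are per-architecture.

```c
#include <assert.h>

#define PAGE_SHIFT  12                    /* assumed: 4 KB base pages */
#define HPAGE_SHIFT 22                    /* assumed: 4 MB huge pages */
#define HPAGE_SIZE  (1UL << HPAGE_SHIFT)
#define HPAGE_MASK  (~(HPAGE_SIZE - 1))

/* Index of the base page that `address` falls in, within its huge page. */
static unsigned long subpage_index(unsigned long address)
{
    return (address & ~HPAGE_MASK) >> PAGE_SHIFT;
}

/*
 * File size after instantiating huge page number `idx`: the size only
 * ever grows, and it grows to the end of that huge page.
 */
static long long grown_i_size(long long i_size, unsigned long idx)
{
    long long end = (long long)(idx + 1) * (long long)HPAGE_SIZE;

    assert(i_size >= 0);
    return end > i_size ? end : i_size;
}
```

With these assumed sizes, address `0x00403000` is base page 3 of the huge page starting at `0x00400000`, and instantiating huge page index 2 in an empty file grows it to 12 MB.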
......@@ -27,6 +27,8 @@
#include <linux/completion.h>
#include <linux/slab.h>
static void blk_unplug_work(void *data);
/*
* For the allocated request tables
*/
......@@ -237,6 +239,14 @@ void blk_queue_make_request(request_queue_t * q, make_request_fn * mfn)
blk_queue_hardsect_size(q, 512);
blk_queue_dma_alignment(q, 511);
q->unplug_thresh = 4; /* hmm */
q->unplug_delay = (3 * HZ) / 1000; /* 3 milliseconds */
if (q->unplug_delay == 0)
q->unplug_delay = 1;
init_timer(&q->unplug_timer);
INIT_WORK(&q->unplug_work, blk_unplug_work, q);
/*
* by default assume old behaviour and bounce for any highmem page
*/
......@@ -960,6 +970,7 @@ void blk_plug_device(request_queue_t *q)
if (!blk_queue_plugged(q)) {
spin_lock(&blk_plug_lock);
list_add_tail(&q->plug_list, &blk_plug_list);
mod_timer(&q->unplug_timer, jiffies + q->unplug_delay);
spin_unlock(&blk_plug_lock);
}
}
......@@ -974,6 +985,7 @@ int blk_remove_plug(request_queue_t *q)
if (blk_queue_plugged(q)) {
spin_lock(&blk_plug_lock);
list_del_init(&q->plug_list);
del_timer(&q->unplug_timer);
spin_unlock(&blk_plug_lock);
return 1;
}
......@@ -992,6 +1004,8 @@ static inline void __generic_unplug_device(request_queue_t *q)
if (test_bit(QUEUE_FLAG_STOPPED, &q->queue_flags))
return;
del_timer(&q->unplug_timer);
/*
* was plugged, fire request_fn if queue has stuff to do
*/
......@@ -1020,6 +1034,18 @@ void generic_unplug_device(void *data)
spin_unlock_irq(q->queue_lock);
}
static void blk_unplug_work(void *data)
{
generic_unplug_device(data);
}
static void blk_unplug_timeout(unsigned long data)
{
request_queue_t *q = (request_queue_t *)data;
schedule_work(&q->unplug_work);
}
/**
* blk_start_queue - restart a previously stopped queue
* @q: The &request_queue_t in question
......@@ -1164,6 +1190,9 @@ void blk_cleanup_queue(request_queue_t * q)
count -= __blk_cleanup_queue(&q->rq[READ]);
count -= __blk_cleanup_queue(&q->rq[WRITE]);
del_timer_sync(&q->unplug_timer);
flush_scheduled_work();
if (count)
printk("blk_cleanup_queue: leaked requests (%d)\n", count);
......@@ -1269,6 +1298,9 @@ int blk_init_queue(request_queue_t *q, request_fn_proc *rfn, spinlock_t *lock)
blk_queue_make_request(q, __make_request);
blk_queue_max_segment_size(q, MAX_SEGMENT_SIZE);
q->unplug_timer.function = blk_unplug_timeout;
q->unplug_timer.data = (unsigned long)q;
blk_queue_max_hw_segments(q, MAX_HW_SEGMENTS);
blk_queue_max_phys_segments(q, MAX_PHYS_SEGMENTS);
......@@ -1811,7 +1843,15 @@ static int __make_request(request_queue_t *q, struct bio *bio)
out:
if (freereq)
__blk_put_request(q, freereq);
if (blk_queue_plugged(q)) {
int nr_queued = (queue_nr_requests - q->rq[0].count) +
(queue_nr_requests - q->rq[1].count);
if (nr_queued == q->unplug_thresh)
__generic_unplug_device(q);
}
spin_unlock_irq(q->queue_lock);
return 0;
end_io:
......
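The auto-unplug hunks above combine two triggers: a timer that fires after roughly 3 milliseconds, rounded up to at least one jiffy, and a threshold on the number of queued requests. A small userspace sketch of both calculations follows. The function names are hypothetical, and it assumes that `q->rq[].count` holds the number of free requests, as the subtraction in `__make_request()` suggests.

```c
#include <assert.h>

/* Delay in jiffies for a ~3 ms timeout, never less than one tick. */
static unsigned long unplug_delay_jiffies(unsigned long hz)
{
    unsigned long delay = (3 * hz) / 1000;

    return delay ? delay : 1;
}

/* Requests currently in flight across the read and write request lists. */
static int nr_queued(int queue_nr_requests, int free_reads, int free_writes)
{
    assert(free_reads <= queue_nr_requests);
    assert(free_writes <= queue_nr_requests);
    return (queue_nr_requests - free_reads) +
           (queue_nr_requests - free_writes);
}
```

At HZ=100 the integer division yields 0, which is why the diff clamps the delay to 1. Note also that the diff compares `nr_queued == q->unplug_thresh` with equality, so the early unplug fires when the count reaches the threshold exactly.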
......@@ -350,15 +350,10 @@ static int do_bio_filebacked(struct loop_device *lo, struct bio *bio)
int ret;
pos = ((loff_t) bio->bi_sector << 9) + lo->lo_offset;
do {
if (bio_rw(bio) == WRITE)
ret = lo_send(lo, bio, lo->lo_blocksize, pos);
else
ret = lo_receive(lo, bio, lo->lo_blocksize, pos);
} while (++bio->bi_idx < bio->bi_vcnt);
return ret;
}
......
......@@ -19,7 +19,7 @@ comment "Video Adapters"
config VIDEO_BT848
tristate "BT848 Video For Linux"
depends on VIDEO_DEV && PCI && I2C_ALGOBIT
depends on VIDEO_DEV && PCI && I2C_ALGOBIT && SOUND
---help---
Support for BT848 based frame grabber/overlay boards. This includes
the Miro, Hauppauge and STB boards. Please read the material in
......
......@@ -127,9 +127,10 @@ void __wait_on_buffer(struct buffer_head * bh)
get_bh(bh);
do {
prepare_to_wait(wqh, &wait, TASK_UNINTERRUPTIBLE);
if (buffer_locked(bh)) {
blk_run_queues();
if (buffer_locked(bh))
io_schedule();
}
} while (buffer_locked(bh));
put_bh(bh);
finish_wait(wqh, &wait);
......@@ -959,8 +960,6 @@ create_buffers(struct page * page, unsigned long size, int retry)
* the reserve list is empty, we're sure there are
* async buffer heads in use.
*/
blk_run_queues();
free_more_memory();
goto try_again;
}
......
......@@ -300,6 +300,8 @@ void put_dirty_page(struct task_struct * tsk, struct page *page, unsigned long a
pgd = pgd_offset(tsk->mm, address);
pte_chain = pte_chain_alloc(GFP_KERNEL);
if (!pte_chain)
goto out_sig;
spin_lock(&tsk->mm->page_table_lock);
pmd = pmd_alloc(tsk->mm, pgd, address);
if (!pmd)
......@@ -325,6 +327,7 @@ void put_dirty_page(struct task_struct * tsk, struct page *page, unsigned long a
return;
out:
spin_unlock(&tsk->mm->page_table_lock);
out_sig:
__free_page(page);
force_sig(SIGKILL, tsk);
pte_chain_free(pte_chain);
......
......@@ -99,6 +99,34 @@ int ext3_forget(handle_t *handle, int is_metadata,
return err;
}
/*
* Work out how many blocks we need to progress with the next chunk of a
* truncate transaction.
*/
static unsigned long blocks_for_truncate(struct inode *inode)
{
unsigned long needed;
needed = inode->i_blocks >> (inode->i_sb->s_blocksize_bits - 9);
/* Give ourselves just enough room to cope with inodes in which
* i_blocks is corrupt: we've seen disk corruptions in the past
* which resulted in random data in an inode which looked enough
* like a regular file for ext3 to try to delete it. Things
* will go a bit crazy if that happens, but at least we should
* try not to panic the whole kernel. */
if (needed < 2)
needed = 2;
/* But we need to bound the transaction so we don't overflow the
* journal. */
if (needed > EXT3_MAX_TRANS_DATA)
needed = EXT3_MAX_TRANS_DATA;
return EXT3_DATA_TRANS_BLOCKS + needed;
}
/*
* Truncate transactions can be complex and absolutely huge. So we need to
* be able to restart the transaction at a convenient checkpoint to make
......@@ -112,14 +140,9 @@ int ext3_forget(handle_t *handle, int is_metadata,
static handle_t *start_transaction(struct inode *inode)
{
long needed;
handle_t *result;
needed = inode->i_blocks;
if (needed > EXT3_MAX_TRANS_DATA)
needed = EXT3_MAX_TRANS_DATA;
result = ext3_journal_start(inode, EXT3_DATA_TRANS_BLOCKS + needed);
result = ext3_journal_start(inode, blocks_for_truncate(inode));
if (!IS_ERR(result))
return result;
......@@ -135,14 +158,9 @@ static handle_t *start_transaction(struct inode *inode)
*/
static int try_to_extend_transaction(handle_t *handle, struct inode *inode)
{
long needed;
if (handle->h_buffer_credits > EXT3_RESERVE_TRANS_BLOCKS)
return 0;
needed = inode->i_blocks;
if (needed > EXT3_MAX_TRANS_DATA)
needed = EXT3_MAX_TRANS_DATA;
if (!ext3_journal_extend(handle, EXT3_RESERVE_TRANS_BLOCKS + needed))
if (!ext3_journal_extend(handle, blocks_for_truncate(inode)))
return 0;
return 1;
}
......@@ -154,11 +172,8 @@ static int try_to_extend_transaction(handle_t *handle, struct inode *inode)
*/
static int ext3_journal_test_restart(handle_t *handle, struct inode *inode)
{
long needed = inode->i_blocks;
if (needed > EXT3_MAX_TRANS_DATA)
needed = EXT3_MAX_TRANS_DATA;
jbd_debug(2, "restarting handle %p\n", handle);
return ext3_journal_restart(handle, EXT3_DATA_TRANS_BLOCKS + needed);
return ext3_journal_restart(handle, blocks_for_truncate(inode));
}
/*
......
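The new `blocks_for_truncate()` helper above converts `i_blocks` (counted in 512-byte sectors) to filesystem blocks, then clamps the result between 2 and `EXT3_MAX_TRANS_DATA` before adding a fixed base. The sketch below reproduces that arithmetic in userspace. The cap of 64 matches the `EXT3_MAX_TRANS_DATA` value in this commit; `data_trans_blocks` is a hypothetical parameter standing in for `EXT3_DATA_TRANS_BLOCKS`, whose value this diff does not show.

```c
#include <assert.h>

#define MAX_TRANS_DATA 64U /* EXT3_MAX_TRANS_DATA as defined in this commit */

/*
 * Journal credits to request for the next chunk of a truncate.
 * i_blocks is in 512-byte sectors; blocksize_bits is log2 of the
 * filesystem block size (10 for 1 KB, 12 for 4 KB).
 */
static unsigned long truncate_credits(unsigned long i_blocks,
                                      unsigned blocksize_bits,
                                      unsigned long data_trans_blocks)
{
    unsigned long needed;

    assert(blocksize_bits >= 9);
    needed = i_blocks >> (blocksize_bits - 9);

    if (needed < 2)                 /* floor: tolerate a corrupt i_blocks */
        needed = 2;
    if (needed > MAX_TRANS_DATA)    /* ceiling: do not overflow the journal */
        needed = MAX_TRANS_DATA;

    return data_trans_blocks + needed;
}
```

For example, with 4 KB blocks and an assumed base of 10, an inode with 8 sectors needs 1 block, which is raised to 2 and gives 12 credits, while an inode with 100000 sectors is capped at 64 and gives 74.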
......@@ -61,6 +61,12 @@ void __mark_inode_dirty(struct inode *inode, int flags)
sb->s_op->dirty_inode(inode);
}
/*
* make sure that changes are seen by all cpus before we test i_state
* -- mikulas
*/
smp_mb();
/* avoid the locking if we can */
if ((inode->i_state & flags) == flags)
return;
......@@ -137,6 +143,12 @@ __sync_single_inode(struct inode *inode, struct writeback_control *wbc)
inode->i_state |= I_LOCK;
inode->i_state &= ~I_DIRTY;
/*
* smp_rmb(); note: if you remove write_lock below, you must add this.
* mark_inode_dirty doesn't take spinlock, make sure that inode is not
* read speculatively by this cpu before &= ~I_DIRTY -- mikulas
*/
write_lock(&mapping->page_lock);
if (wait || !wbc->for_kupdate || list_empty(&mapping->io_pages))
list_splice_init(&mapping->dirty_pages, &mapping->io_pages);
......@@ -334,7 +346,6 @@ writeback_inodes(struct writeback_control *wbc)
}
spin_unlock(&sb_lock);
spin_unlock(&inode_lock);
blk_run_queues();
}
/*
......
......@@ -206,20 +206,22 @@ do { \
var -= ((journal)->j_last - (journal)->j_first); \
} while (0)
/*
* journal_recover
/**
* int journal_recover(journal_t *journal) - recovers an on-disk journal
* @journal: the journal to recover
*
* The primary function for recovering the log contents when mounting a
* journaled device.
*
*/
int journal_recover(journal_t *journal)
{
/*
* Recovery is done in three passes. In the first pass, we look for the
* end of the log. In the second, we assemble the list of revoke
* blocks. In the third and final pass, we replay any un-revoked blocks
* in the log.
*/
int journal_recover(journal_t *journal)
{
int err;
journal_superblock_t * sb;
......@@ -263,20 +265,23 @@ int journal_recover(journal_t *journal)
return err;
}
/*
* journal_skip_recovery
/**
* int journal_skip_recovery() - Start journal and wipe existing records
* @journal: journal to startup
*
* Locate any valid recovery information from the journal and set up the
* journal structures in memory to ignore it (presumably because the
* caller has evidence that it is out of date).
*
* This function doesn't appear to be exported.
*/
int journal_skip_recovery(journal_t *journal)
{
/*
* We perform one pass over the journal to allow us to tell the user how
* much recovery information is being erased, and to let us initialise
* the journal transaction sequence numbers to the next unused ID.
*/
int journal_skip_recovery(journal_t *journal)
{
int err;
journal_superblock_t * sb;
......
......@@ -116,6 +116,49 @@ mpage_alloc(struct block_device *bdev,
return bio;
}
/*
* support function for mpage_readpages. The fs supplied get_block might
* return an up to date buffer. This is used to map that buffer into
* the page, which allows readpage to avoid triggering a duplicate call
* to get_block.
*
* The idea is to avoid adding buffers to pages that don't already have
* them. So when the buffer is up to date and the page size == block size,
* this marks the page up to date instead of adding new buffers.
*/
static void
map_buffer_to_page(struct page *page, struct buffer_head *bh, int page_block)
{
struct inode *inode = page->mapping->host;
struct buffer_head *page_bh, *head;
int block = 0;
if (!page_has_buffers(page)) {
/*
* don't make any buffers if there is only one buffer on
* the page and the page just needs to be set up to date
*/
if (inode->i_blkbits == PAGE_CACHE_SHIFT &&
buffer_uptodate(bh)) {
SetPageUptodate(page);
return;
}
create_empty_buffers(page, 1 << inode->i_blkbits, 0);
}
head = page_buffers(page);
page_bh = head;
do {
if (block == page_block) {
page_bh->b_state = bh->b_state;
page_bh->b_bdev = bh->b_bdev;
page_bh->b_blocknr = bh->b_blocknr;
break;
}
page_bh = page_bh->b_this_page;
block++;
} while (page_bh != head);
}
/**
* mpage_readpages - populate an address space with some pages, and
* start reads against them.
......@@ -186,6 +229,7 @@ do_mpage_readpage(struct bio *bio, struct page *page, unsigned nr_pages,
block_in_file = page->index << (PAGE_CACHE_SHIFT - blkbits);
last_block = (inode->i_size + blocksize - 1) >> blkbits;
bh.b_page = page;
for (page_block = 0; page_block < blocks_per_page;
page_block++, block_in_file++) {
bh.b_state = 0;
......@@ -201,6 +245,17 @@ do_mpage_readpage(struct bio *bio, struct page *page, unsigned nr_pages,
continue;
}
/* some filesystems will copy data into the page during
* the get_block call, in which case we don't want to
* read it again. map_buffer_to_page copies the data
* we just collected from get_block into the page's buffers
* so readpage doesn't have to repeat the get_block call
*/
if (buffer_uptodate(&bh)) {
map_buffer_to_page(page, &bh, page_block);
goto confused;
}
if (first_hole != blocks_per_page)
goto confused; /* hole -> non-hole */
......@@ -256,7 +311,10 @@ do_mpage_readpage(struct bio *bio, struct page *page, unsigned nr_pages,
confused:
if (bio)
bio = mpage_bio_submit(READ, bio);
if (!PageUptodate(page))
block_read_full_page(page, get_block);
else
unlock_page(page);
goto out;
}
......@@ -344,6 +402,7 @@ mpage_writepage(struct bio *bio, struct page *page, get_block_t get_block,
sector_t boundary_block = 0;
struct block_device *boundary_bdev = NULL;
int length;
struct buffer_head map_bh;
if (page_has_buffers(page)) {
struct buffer_head *head = page_buffers(page);
......@@ -401,8 +460,8 @@ mpage_writepage(struct bio *bio, struct page *page, get_block_t get_block,
BUG_ON(!PageUptodate(page));
block_in_file = page->index << (PAGE_CACHE_SHIFT - blkbits);
last_block = (inode->i_size - 1) >> blkbits;
map_bh.b_page = page;
for (page_block = 0; page_block < blocks_per_page; ) {
struct buffer_head map_bh;
map_bh.b_state = 0;
if (get_block(inode, block_in_file, &map_bh, 1))
......@@ -559,7 +618,6 @@ mpage_writepages(struct address_space *mapping,
int (*writepage)(struct page *page, struct writeback_control *wbc);
if (wbc->nonblocking && bdi_write_congested(bdi)) {
blk_run_queues();
wbc->encountered_congestion = 1;
return 0;
}
......@@ -614,7 +672,6 @@ mpage_writepages(struct address_space *mapping,
if (ret || (--(wbc->nr_to_write) <= 0))
done = 1;
if (wbc->nonblocking && bdi_write_congested(bdi)) {
blk_run_queues();
wbc->encountered_congestion = 1;
done = 1;
}
......
......@@ -535,6 +535,10 @@ static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
if (retval)
goto fput_in;
retval = security_file_permission (in_file, MAY_READ);
if (retval)
goto fput_in;
/*
* Get output file, and verify that it is ok..
*/
......@@ -556,6 +560,10 @@ static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
if (retval)
goto fput_out;
retval = security_file_permission (out_file, MAY_WRITE);
if (retval)
goto fput_out;
if (!ppos)
ppos = &in_file->f_pos;
......
This diff is collapsed.
......@@ -4,6 +4,8 @@
#include <linux/major.h>
#include <linux/genhd.h>
#include <linux/list.h>
#include <linux/timer.h>
#include <linux/workqueue.h>
#include <linux/pagemap.h>
#include <linux/backing-dev.h>
#include <linux/wait.h>
......@@ -188,6 +190,14 @@ struct request_queue
unplug_fn *unplug_fn;
merge_bvec_fn *merge_bvec_fn;
/*
* Auto-unplugging state
*/
struct timer_list unplug_timer;
int unplug_thresh; /* After this many requests */
unsigned long unplug_delay; /* After this many jiffies */
struct work_struct unplug_work;
struct backing_dev_info backing_dev_info;
/*
......
......@@ -28,7 +28,7 @@
* indirection blocks, the group and superblock summaries, and the data
* block to complete the transaction. */
#define EXT3_SINGLEDATA_TRANS_BLOCKS 8
#define EXT3_SINGLEDATA_TRANS_BLOCKS 8U
/* Extended attributes may touch two data buffers, two bitmap buffers,
* and two group and summaries. */
......@@ -58,7 +58,7 @@ extern int ext3_writepage_trans_blocks(struct inode *inode);
* start off at the maximum transaction size and grow the transaction
* optimistically as we go. */
#define EXT3_MAX_TRANS_DATA 64
#define EXT3_MAX_TRANS_DATA 64U
/* We break up a large truncate or write transaction once the handle's
* buffer credits gets this low, we need either to extend the
......@@ -67,7 +67,7 @@ extern int ext3_writepage_trans_blocks(struct inode *inode);
* one block, plus two quota updates. Quota allocations are not
* needed. */
#define EXT3_RESERVE_TRANS_BLOCKS 12
#define EXT3_RESERVE_TRANS_BLOCKS 12U
#define EXT3_INDEX_EXTRA_TRANS_BLOCKS 8
......
......@@ -20,16 +20,32 @@ int hugetlb_prefault(struct address_space *, struct vm_area_struct *);
void huge_page_release(struct page *);
int hugetlb_report_meminfo(char *);
int is_hugepage_mem_enough(size_t);
struct page *follow_huge_addr(struct mm_struct *mm, struct vm_area_struct *vma,
unsigned long address, int write);
struct vm_area_struct *hugepage_vma(struct mm_struct *mm,
unsigned long address);
struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
pmd_t *pmd, int write);
int pmd_huge(pmd_t pmd);
extern int htlbpage_max;
static inline void
mark_mm_hugetlb(struct mm_struct *mm, struct vm_area_struct *vma)
{
if (is_vm_hugetlb_page(vma))
mm->used_hugetlb = 1;
}
#else /* !CONFIG_HUGETLB_PAGE */
static inline int is_vm_hugetlb_page(struct vm_area_struct *vma)
{
return 0;
}
#define follow_hugetlb_page(m,v,p,vs,a,b,i) ({ BUG(); 0; })
#define follow_huge_addr(mm, vma, addr, write) 0
#define copy_hugetlb_page_range(src, dst, vma) ({ BUG(); 0; })
#define hugetlb_prefault(mapping, vma) ({ BUG(); 0; })
#define zap_hugepage_range(vma, start, len) BUG()
......@@ -37,6 +53,14 @@ static inline int is_vm_hugetlb_page(struct vm_area_struct *vma)
#define huge_page_release(page) BUG()
#define is_hugepage_mem_enough(size) 0
#define hugetlb_report_meminfo(buf) 0
#define hugepage_vma(mm, addr) 0
#define mark_mm_hugetlb(mm, vma) do { } while (0)
#define follow_huge_pmd(mm, addr, pmd, write) 0
#define pmd_huge(x) 0
#ifndef HPAGE_MASK
#define HPAGE_MASK 0 /* Keep the compiler happy */
#endif
#endif /* !CONFIG_HUGETLB_PAGE */
......
This diff is collapsed.
......@@ -208,24 +208,55 @@ struct page {
* Also, many kernel routines increase the page count before a critical
* routine so they can be sure the page doesn't go away from under them.
*/
#define get_page(p) atomic_inc(&(p)->count)
#define __put_page(p) atomic_dec(&(p)->count)
#define put_page_testzero(p) \
({ \
BUG_ON(page_count(page) == 0); \
atomic_dec_and_test(&(p)->count); \
})
#define page_count(p) atomic_read(&(p)->count)
#define set_page_count(p,v) atomic_set(&(p)->count, v)
#define __put_page(p) atomic_dec(&(p)->count)
extern void FASTCALL(__page_cache_release(struct page *));
#ifdef CONFIG_HUGETLB_PAGE
static inline void get_page(struct page *page)
{
if (PageCompound(page))
page = (struct page *)page->lru.next;
atomic_inc(&page->count);
}
static inline void put_page(struct page *page)
{
if (PageCompound(page)) {
page = (struct page *)page->lru.next;
if (page->lru.prev) { /* destructor? */
(*(void (*)(struct page *))page->lru.prev)(page);
return;
}
}
if (!PageReserved(page) && put_page_testzero(page))
__page_cache_release(page);
}
#else /* CONFIG_HUGETLB_PAGE */
static inline void get_page(struct page *page)
{
atomic_inc(&page->count);
}
static inline void put_page(struct page *page)
{
if (!PageReserved(page) && put_page_testzero(page))
__page_cache_release(page);
}
#endif /* CONFIG_HUGETLB_PAGE */
/*
* Multiple processes may "see" the same page. E.g. for untouched
* mappings of /dev/null, all processes see the same page full of
......
......@@ -72,7 +72,8 @@
#define PG_direct 16 /* ->pte_chain points directly at pte */
#define PG_mappedtodisk 17 /* Has blocks allocated on-disk */
#define PG_reclaim 18 /* To be recalimed asap */
#define PG_reclaim 18 /* To be reclaimed asap */
#define PG_compound 19 /* Part of a compound page */
/*
* Global page accounting. One instance per CPU. Only unsigned longs are
......@@ -251,6 +252,10 @@ extern void get_full_page_state(struct page_state *ret);
#define ClearPageReclaim(page) clear_bit(PG_reclaim, &(page)->flags)
#define TestClearPageReclaim(page) test_and_clear_bit(PG_reclaim, &(page)->flags)
#define PageCompound(page) test_bit(PG_compound, &(page)->flags)
#define SetPageCompound(page) set_bit(PG_compound, &(page)->flags)
#define ClearPageCompound(page) clear_bit(PG_compound, &(page)->flags)
/*
* The PageSwapCache predicate doesn't use a PG_flag at this time,
* but it may again do so one day.
......
......@@ -201,7 +201,9 @@ struct mm_struct {
unsigned long swap_address;
unsigned dumpable:1;
#ifdef CONFIG_HUGETLB_PAGE
int used_hugetlb;
#endif
/* Architecture-specific MM context */
mm_context_t context;
......
......@@ -37,30 +37,120 @@
#ifdef CONFIG_SMP
#include <asm/spinlock.h>
/*
* !CONFIG_SMP and spin_lock_init not previously defined
* (e.g. by including include/asm/spinlock.h)
*/
#elif !defined(spin_lock_init)
#else
#ifndef CONFIG_PREEMPT
#if !defined(CONFIG_PREEMPT) && !defined(CONFIG_DEBUG_SPINLOCK)
# define atomic_dec_and_lock(atomic,lock) atomic_dec_and_test(atomic)
# define ATOMIC_DEC_AND_LOCK
#endif
#ifdef CONFIG_DEBUG_SPINLOCK
#define SPINLOCK_MAGIC 0x1D244B3C
typedef struct {
unsigned long magic;
volatile unsigned long lock;
volatile unsigned int babble;
const char *module;
char *owner;
int oline;
} spinlock_t;
#define SPIN_LOCK_UNLOCKED (spinlock_t) { SPINLOCK_MAGIC, 0, 10, __FILE__ , NULL, 0}
#define spin_lock_init(x) \
do { \
(x)->magic = SPINLOCK_MAGIC; \
(x)->lock = 0; \
(x)->babble = 5; \
(x)->module = __FILE__; \
(x)->owner = NULL; \
(x)->oline = 0; \
} while (0)
#define CHECK_LOCK(x) \
do { \
if ((x)->magic != SPINLOCK_MAGIC) { \
printk(KERN_ERR "%s:%d: spin_is_locked on uninitialized spinlock %p.\n", \
__FILE__, __LINE__, (x)); \
} \
} while(0)
#define _raw_spin_lock(x) \
do { \
CHECK_LOCK(x); \
if ((x)->lock&&(x)->babble) { \
printk("%s:%d: spin_lock(%s:%p) already locked by %s/%d\n", \
__FILE__,__LINE__, (x)->module, \
(x), (x)->owner, (x)->oline); \
(x)->babble--; \
} \
(x)->lock = 1; \
(x)->owner = __FILE__; \
(x)->oline = __LINE__; \
} while (0)
/* without debugging, spin_is_locked on UP always says
* FALSE. --> printk if already locked. */
#define spin_is_locked(x) \
({ \
CHECK_LOCK(x); \
if ((x)->lock&&(x)->babble) { \
printk("%s:%d: spin_is_locked(%s:%p) already locked by %s/%d\n", \
__FILE__,__LINE__, (x)->module, \
(x), (x)->owner, (x)->oline); \
(x)->babble--; \
} \
0; \
})
/* without debugging, spin_trylock on UP always says
* TRUE. --> printk if already locked. */
#define _raw_spin_trylock(x) \
({ \
CHECK_LOCK(x); \
if ((x)->lock&&(x)->babble) { \
printk("%s:%d: spin_trylock(%s:%p) already locked by %s/%d\n", \
__FILE__,__LINE__, (x)->module, \
(x), (x)->owner, (x)->oline); \
(x)->babble--; \
} \
(x)->lock = 1; \
(x)->owner = __FILE__; \
(x)->oline = __LINE__; \
1; \
})
#define spin_unlock_wait(x) \
do { \
CHECK_LOCK(x); \
if ((x)->lock&&(x)->babble) { \
printk("%s:%d: spin_unlock_wait(%s:%p) owned by %s/%d\n", \
__FILE__,__LINE__, (x)->module, (x), \
(x)->owner, (x)->oline); \
(x)->babble--; \
}\
} while (0)
#define _raw_spin_unlock(x) \
do { \
CHECK_LOCK(x); \
if (!(x)->lock&&(x)->babble) { \
printk("%s:%d: spin_unlock(%s:%p) not locked\n", \
__FILE__,__LINE__, (x)->module, (x));\
(x)->babble--; \
} \
(x)->lock = 0; \
} while (0)
#else
/*
* gcc versions before ~2.95 have a nasty bug with empty initializers.
*/
#if (__GNUC__ > 2)
typedef struct { } spinlock_t;
typedef struct { } rwlock_t;
#define SPIN_LOCK_UNLOCKED (spinlock_t) { }
#define RW_LOCK_UNLOCKED (rwlock_t) { }
#else
typedef struct { int gcc_is_buggy; } spinlock_t;
typedef struct { int gcc_is_buggy; } rwlock_t;
#define SPIN_LOCK_UNLOCKED (spinlock_t) { 0 }
#define RW_LOCK_UNLOCKED (rwlock_t) { 0 }
#endif
/*
......@@ -72,6 +162,18 @@
#define _raw_spin_trylock(lock) ((void)(lock), 1)
#define spin_unlock_wait(lock) do { (void)(lock); } while(0)
#define _raw_spin_unlock(lock) do { (void)(lock); } while(0)
#endif /* CONFIG_DEBUG_SPINLOCK */
/* RW spinlocks: No debug version */
#if (__GNUC__ > 2)
typedef struct { } rwlock_t;
#define RW_LOCK_UNLOCKED (rwlock_t) { }
#else
typedef struct { int gcc_is_buggy; } rwlock_t;
#define RW_LOCK_UNLOCKED (rwlock_t) { 0 }
#endif
#define rwlock_init(lock) do { (void)(lock); } while(0)
#define _raw_read_lock(lock) do { (void)(lock); } while(0)
#define _raw_read_unlock(lock) do { (void)(lock); } while(0)
......
......@@ -177,6 +177,7 @@ static int worker_thread(void *__startup)
current->flags |= PF_IOTHREAD;
cwq->thread = current;
set_user_nice(current, -10);
set_cpus_allowed(current, 1UL << cpu);
spin_lock_irq(&current->sig->siglock);
......
......@@ -259,9 +259,10 @@ void wait_on_page_bit(struct page *page, int bit_nr)
do {
prepare_to_wait(waitqueue, &wait, TASK_UNINTERRUPTIBLE);
if (test_bit(bit_nr, &page->flags)) {
sync_page(page);
if (test_bit(bit_nr, &page->flags))
io_schedule();
}
} while (test_bit(bit_nr, &page->flags));
finish_wait(waitqueue, &wait);
}
......@@ -326,10 +327,11 @@ void __lock_page(struct page *page)
while (TestSetPageLocked(page)) {
prepare_to_wait(wqh, &wait, TASK_UNINTERRUPTIBLE);
if (PageLocked(page)) {
sync_page(page);
if (PageLocked(page))
io_schedule();
}
}
finish_wait(wqh, &wait);
}
EXPORT_SYMBOL(__lock_page);
......
......@@ -53,8 +53,11 @@ int install_page(struct mm_struct *mm, struct vm_area_struct *vma,
pte_t *pte, entry;
pgd_t *pgd;
pmd_t *pmd;
struct pte_chain *pte_chain = NULL;
struct pte_chain *pte_chain;
pte_chain = pte_chain_alloc(GFP_KERNEL);
if (!pte_chain)
goto err;
pgd = pgd_offset(mm, addr);
spin_lock(&mm->page_table_lock);
......@@ -62,7 +65,6 @@ int install_page(struct mm_struct *mm, struct vm_area_struct *vma,
if (!pmd)
goto err_unlock;
pte_chain = pte_chain_alloc(GFP_KERNEL);
pte = pte_alloc_map(mm, pmd, addr);
if (!pte)
goto err_unlock;
......@@ -87,6 +89,7 @@ int install_page(struct mm_struct *mm, struct vm_area_struct *vma,
err_unlock:
spin_unlock(&mm->page_table_lock);
pte_chain_free(pte_chain);
err:
return err;
}
......
......@@ -607,13 +607,22 @@ follow_page(struct mm_struct *mm, unsigned long address, int write)
pmd_t *pmd;
pte_t *ptep, pte;
unsigned long pfn;
struct vm_area_struct *vma;
vma = hugepage_vma(mm, address);
if (vma)
return follow_huge_addr(mm, vma, address, write);
pgd = pgd_offset(mm, address);
if (pgd_none(*pgd) || pgd_bad(*pgd))
goto out;
pmd = pmd_offset(pgd, address);
if (pmd_none(*pmd) || pmd_bad(*pmd))
if (pmd_none(*pmd))
goto out;
if (pmd_huge(*pmd))
return follow_huge_pmd(mm, address, pmd, write);
if (pmd_bad(*pmd))
goto out;
ptep = pte_offset_map(pmd, address);
......@@ -926,9 +935,19 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct * vma,
struct page *old_page, *new_page;
unsigned long pfn = pte_pfn(pte);
struct pte_chain *pte_chain = NULL;
int ret;
if (!pfn_valid(pfn))
goto bad_wp_page;
if (unlikely(!pfn_valid(pfn))) {
/*
* This should really halt the system so it can be debugged or
* at least the kernel stops what it's doing before it corrupts
* data, but for the moment just pretend this is OOM.
*/
pte_unmap(page_table);
printk(KERN_ERR "do_wp_page: bogus page at address %08lx\n",
address);
goto oom;
}
old_page = pfn_to_page(pfn);
if (!TestSetPageLocked(old_page)) {
......@@ -936,10 +955,11 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct * vma,
unlock_page(old_page);
if (reuse) {
flush_cache_page(vma, address);
establish_pte(vma, address, page_table, pte_mkyoung(pte_mkdirty(pte_mkwrite(pte))));
establish_pte(vma, address, page_table,
pte_mkyoung(pte_mkdirty(pte_mkwrite(pte))));
pte_unmap(page_table);
spin_unlock(&mm->page_table_lock);
return VM_FAULT_MINOR;
ret = VM_FAULT_MINOR;
goto out;
}
}
pte_unmap(page_table);
......@@ -950,11 +970,13 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct * vma,
page_cache_get(old_page);
spin_unlock(&mm->page_table_lock);
pte_chain = pte_chain_alloc(GFP_KERNEL);
if (!pte_chain)
goto no_mem;
new_page = alloc_page(GFP_HIGHUSER);
if (!new_page)
goto no_mem;
copy_cow_page(old_page,new_page,address);
pte_chain = pte_chain_alloc(GFP_KERNEL);
/*
* Re-check the pte - we dropped the lock
......@@ -973,25 +995,19 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct * vma,
new_page = old_page;
}
pte_unmap(page_table);
spin_unlock(&mm->page_table_lock);
page_cache_release(new_page);
page_cache_release(old_page);
pte_chain_free(pte_chain);
return VM_FAULT_MINOR;
ret = VM_FAULT_MINOR;
goto out;
bad_wp_page:
pte_unmap(page_table);
spin_unlock(&mm->page_table_lock);
printk(KERN_ERR "do_wp_page: bogus page at address %08lx\n", address);
/*
* This should really halt the system so it can be debugged or
* at least the kernel stops what it's doing before it corrupts
* data, but for the moment just pretend this is OOM.
*/
return VM_FAULT_OOM;
no_mem:
page_cache_release(old_page);
return VM_FAULT_OOM;
oom:
ret = VM_FAULT_OOM;
out:
spin_unlock(&mm->page_table_lock);
pte_chain_free(pte_chain);
return ret;
}
static void vmtruncate_list(struct list_head *head, unsigned long pgoff)
......@@ -1286,6 +1302,7 @@ do_no_page(struct mm_struct *mm, struct vm_area_struct *vma,
struct page * new_page;
pte_t entry;
struct pte_chain *pte_chain;
int ret;
if (!vma->vm_ops || !vma->vm_ops->nopage)
return do_anonymous_page(mm, vma, page_table,
......@@ -1301,6 +1318,10 @@ do_no_page(struct mm_struct *mm, struct vm_area_struct *vma,
if (new_page == NOPAGE_OOM)
return VM_FAULT_OOM;
pte_chain = pte_chain_alloc(GFP_KERNEL);
if (!pte_chain)
goto oom;
/*
* Should we do an early C-O-W break?
*/
......@@ -1308,7 +1329,7 @@ do_no_page(struct mm_struct *mm, struct vm_area_struct *vma,
struct page * page = alloc_page(GFP_HIGHUSER);
if (!page) {
page_cache_release(new_page);
return VM_FAULT_OOM;
goto oom;
}
copy_user_highpage(page, new_page, address);
page_cache_release(new_page);
......@@ -1316,7 +1337,6 @@ do_no_page(struct mm_struct *mm, struct vm_area_struct *vma,
new_page = page;
}
pte_chain = pte_chain_alloc(GFP_KERNEL);
spin_lock(&mm->page_table_lock);
page_table = pte_offset_map(pmd, address);
......@@ -1346,15 +1366,20 @@ do_no_page(struct mm_struct *mm, struct vm_area_struct *vma,
pte_unmap(page_table);
page_cache_release(new_page);
spin_unlock(&mm->page_table_lock);
pte_chain_free(pte_chain);
return VM_FAULT_MINOR;
ret = VM_FAULT_MINOR;
goto out;
}
/* no need to invalidate: a not-present page shouldn't be cached */
update_mmu_cache(vma, address, entry);
spin_unlock(&mm->page_table_lock);
ret = VM_FAULT_MAJOR;
goto out;
oom:
ret = VM_FAULT_OOM;
out:
pte_chain_free(pte_chain);
return VM_FAULT_MAJOR;
return ret;
}
/*
......@@ -1422,6 +1447,10 @@ int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct * vma,
pgd = pgd_offset(mm, address);
inc_page_state(pgfault);
if (is_vm_hugetlb_page(vma))
return VM_FAULT_SIGBUS; /* mapping truncation does this. */
/*
* We need the page table lock to synchronize with kswapd
* and the SMP-safe atomic PTE updates.
......
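Review note: the do_wp_page() and do_no_page() hunks above replace several scattered `return VM_FAULT_*` statements with a single exit path (`oom:` falling through to `out:`), so the pte_chain is freed in exactly one place. Below is a minimal user-space sketch of that control flow. It is not kernel code: `fault_sketch`, the `*_SKETCH` constants, and the `fail_*` parameters are hypothetical names, and `malloc`/`free` stand in for `pte_chain_alloc`/`alloc_page` and their release functions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for VM_FAULT_OOM / VM_FAULT_MINOR. */
enum { FAULT_OOM_SKETCH = -1, FAULT_MINOR_SKETCH = 1 };

/*
 * fail_first / fail_second simulate the first or second allocation
 * failing, so that both error paths can be exercised.
 */
static int fault_sketch(int fail_first, int fail_second)
{
	void *chain = NULL;	/* plays the role of pte_chain */
	void *page = NULL;	/* plays the role of new_page */
	int ret;

	/* Allocate up front, before any lock would be taken. */
	chain = fail_first ? NULL : malloc(16);
	if (!chain)
		goto oom;

	page = fail_second ? NULL : malloc(64);
	if (!page)
		goto oom;

	ret = FAULT_MINOR_SKETCH;
	goto out;

oom:
	ret = FAULT_OOM_SKETCH;
out:
	/* Single cleanup point; free(NULL) is a no-op in standard C. */
	free(page);
	free(chain);
	return ret;
}
```

The sketch relies on the cleanup call tolerating a NULL argument. The diff appears to rely on the same property of pte_chain_free(), since do_wp_page() initialises `pte_chain` to NULL and can reach `out:` before allocating it.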
......@@ -362,6 +362,7 @@ static void vma_link(struct mm_struct *mm, struct vm_area_struct *vma,
if (mapping)
up(&mapping->i_shared_sem);
mark_mm_hugetlb(mm, vma);
mm->map_count++;
validate_mm(mm);
}
......@@ -1222,6 +1223,11 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len)
return 0;
/* we have start < mpnt->vm_end */
if (is_vm_hugetlb_page(mpnt)) {
if ((start & ~HPAGE_MASK) || (len & ~HPAGE_MASK))
return -EINVAL;
}
/* if it doesn't overlap, we have nothing.. */
end = start + len;
if (mpnt->vm_start >= end)
......@@ -1423,7 +1429,6 @@ void exit_mmap(struct mm_struct *mm)
kmem_cache_free(vm_area_cachep, vma);
vma = next;
}
}
/* Insert vm structure into process list sorted by address
......
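Review note: the do_munmap() hunk above rejects a hugetlb unmap unless both `start` and `len` are multiples of the huge page size, using `x & ~HPAGE_MASK` to extract the offset within a huge page. Here is a small user-space sketch of that test. The 4 MB huge page size and every `*_SKETCH` name are assumptions for illustration; the real HPAGE_MASK is architecture-specific.

```c
#include <assert.h>

/* Assumed huge page size of 4 MB, for illustration only. */
#define HPAGE_SHIFT_SKETCH 22
#define HPAGE_SIZE_SKETCH  (1UL << HPAGE_SHIFT_SKETCH)
#define HPAGE_MASK_SKETCH  (~(HPAGE_SIZE_SKETCH - 1))

/*
 * Returns 1 when both the start address and the length are huge-page
 * aligned, 0 otherwise (the case where the diff returns -EINVAL).
 */
static int hugepage_range_ok(unsigned long start, unsigned long len)
{
	/* ~MASK keeps only the low bits, i.e. the offset inside a huge page. */
	if ((start & ~HPAGE_MASK_SKETCH) || (len & ~HPAGE_MASK_SKETCH))
		return 0;
	return 1;
}
```

Under the assumed size, `hugepage_range_ok(0x400000, 0x800000)` accepts the range, while a start of `0x401000` or a length of `0x1000` is rejected.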
......@@ -24,9 +24,9 @@
static pte_t *get_one_pte_map_nested(struct mm_struct *mm, unsigned long addr)
{
pgd_t * pgd;
pmd_t * pmd;
pte_t * pte = NULL;
pgd_t *pgd;
pmd_t *pmd;
pte_t *pte = NULL;
pgd = pgd_offset(mm, addr);
if (pgd_none(*pgd))
......@@ -73,8 +73,8 @@ static inline int page_table_present(struct mm_struct *mm, unsigned long addr)
static inline pte_t *alloc_one_pte_map(struct mm_struct *mm, unsigned long addr)
{
pmd_t * pmd;
pte_t * pte = NULL;
pmd_t *pmd;
pte_t *pte = NULL;
pmd = pmd_alloc(mm, pgd_offset(mm, addr), addr);
if (pmd)
......@@ -88,7 +88,7 @@ copy_one_pte(struct mm_struct *mm, pte_t *src, pte_t *dst,
{
int error = 0;
pte_t pte;
struct page * page = NULL;
struct page *page = NULL;
if (pte_present(*src))
page = pte_page(*src);
......@@ -183,12 +183,12 @@ static int move_page_tables(struct vm_area_struct *vma,
return -1;
}
static unsigned long move_vma(struct vm_area_struct * vma,
static unsigned long move_vma(struct vm_area_struct *vma,
unsigned long addr, unsigned long old_len, unsigned long new_len,
unsigned long new_addr)
{
struct mm_struct * mm = vma->vm_mm;
struct vm_area_struct * new_vma, * next, * prev;
struct mm_struct *mm = vma->vm_mm;
struct vm_area_struct *new_vma, *next, *prev;
int allocated_vma;
int split = 0;
......@@ -196,14 +196,16 @@ static unsigned long move_vma(struct vm_area_struct * vma,
next = find_vma_prev(mm, new_addr, &prev);
if (next) {
if (prev && prev->vm_end == new_addr &&
can_vma_merge(prev, vma->vm_flags) && !vma->vm_file && !(vma->vm_flags & VM_SHARED)) {
can_vma_merge(prev, vma->vm_flags) && !vma->vm_file &&
!(vma->vm_flags & VM_SHARED)) {
spin_lock(&mm->page_table_lock);
prev->vm_end = new_addr + new_len;
spin_unlock(&mm->page_table_lock);
new_vma = prev;
if (next != prev->vm_next)
BUG();
if (prev->vm_end == next->vm_start && can_vma_merge(next, prev->vm_flags)) {
if (prev->vm_end == next->vm_start &&
can_vma_merge(next, prev->vm_flags)) {
spin_lock(&mm->page_table_lock);
prev->vm_end = next->vm_end;
__vma_unlink(mm, next, prev);
......@@ -214,7 +216,8 @@ static unsigned long move_vma(struct vm_area_struct * vma,
kmem_cache_free(vm_area_cachep, next);
}
} else if (next->vm_start == new_addr + new_len &&
can_vma_merge(next, vma->vm_flags) && !vma->vm_file && !(vma->vm_flags & VM_SHARED)) {
can_vma_merge(next, vma->vm_flags) &&
!vma->vm_file && !(vma->vm_flags & VM_SHARED)) {
spin_lock(&mm->page_table_lock);
next->vm_start = new_addr;
spin_unlock(&mm->page_table_lock);
......@@ -223,7 +226,8 @@ static unsigned long move_vma(struct vm_area_struct * vma,
} else {
prev = find_vma(mm, new_addr-1);
if (prev && prev->vm_end == new_addr &&
can_vma_merge(prev, vma->vm_flags) && !vma->vm_file && !(vma->vm_flags & VM_SHARED)) {
can_vma_merge(prev, vma->vm_flags) && !vma->vm_file &&
!(vma->vm_flags & VM_SHARED)) {
spin_lock(&mm->page_table_lock);
prev->vm_end = new_addr + new_len;
spin_unlock(&mm->page_table_lock);
......@@ -249,7 +253,7 @@ static unsigned long move_vma(struct vm_area_struct * vma,
INIT_LIST_HEAD(&new_vma->shared);
new_vma->vm_start = new_addr;
new_vma->vm_end = new_addr+new_len;
new_vma->vm_pgoff += (addr - vma->vm_start) >> PAGE_SHIFT;
new_vma->vm_pgoff += (addr-vma->vm_start) >> PAGE_SHIFT;
if (new_vma->vm_file)
get_file(new_vma->vm_file);
if (new_vma->vm_ops && new_vma->vm_ops->open)
......@@ -428,7 +432,8 @@ unsigned long do_mremap(unsigned long addr,
if (vma->vm_flags & VM_SHARED)
map_flags |= MAP_SHARED;
new_addr = get_unmapped_area(vma->vm_file, 0, new_len, vma->vm_pgoff, map_flags);
new_addr = get_unmapped_area(vma->vm_file, 0, new_len,
vma->vm_pgoff, map_flags);
ret = new_addr;
if (new_addr & ~PAGE_MASK)
goto out;
......
......@@ -237,7 +237,6 @@ static void background_writeout(unsigned long _min_pages)
break;
}
}
blk_run_queues();
}
/*
......@@ -308,7 +307,6 @@ static void wb_kupdate(unsigned long arg)
}
nr_to_write -= MAX_WRITEBACK_PAGES - wbc.nr_to_write;
}
blk_run_queues();
if (time_before(next_jif, jiffies + HZ))
next_jif = jiffies + HZ;
mod_timer(&wb_timer, next_jif);
......
......@@ -85,6 +85,62 @@ static void bad_page(const char *function, struct page *page)
page->mapping = NULL;
}
#ifndef CONFIG_HUGETLB_PAGE
#define prep_compound_page(page, order) do { } while (0)
#define destroy_compound_page(page, order) do { } while (0)
#else
/*
* Higher-order pages are called "compound pages". They are structured thusly:
*
* The first PAGE_SIZE page is called the "head page".
*
* The remaining PAGE_SIZE pages are called "tail pages".
*
* All pages have PG_compound set. All pages have their lru.next pointing at
* the head page (even the head page has this).
*
* The head page's lru.prev, if non-zero, holds the address of the compound
* page's put_page() function.
*
* The order of the allocation is stored in the first tail page's lru.prev.
* This is only for debug at present. This usage means that zero-order pages
* may not be compound.
*/
static void prep_compound_page(struct page *page, int order)
{
int i;
int nr_pages = 1 << order;
page->lru.prev = NULL;
page[1].lru.prev = (void *)order;
for (i = 0; i < nr_pages; i++) {
struct page *p = page + i;
SetPageCompound(p);
p->lru.next = (void *)page;
}
}
static void destroy_compound_page(struct page *page, int order)
{
int i;
int nr_pages = 1 << order;
if (page[1].lru.prev != (void *)order)
bad_page(__FUNCTION__, page);
for (i = 0; i < nr_pages; i++) {
struct page *p = page + i;
if (!PageCompound(p))
bad_page(__FUNCTION__, page);
if (p->lru.next != (void *)page)
bad_page(__FUNCTION__, page);
ClearPageCompound(p);
}
}
#endif /* CONFIG_HUGETLB_PAGE */
/*
* Freeing function for a buddy system allocator.
*
......@@ -114,6 +170,8 @@ static inline void __free_pages_bulk (struct page *page, struct page *base,
{
unsigned long page_idx, index;
if (order)
destroy_compound_page(page, order);
page_idx = page - base;
if (page_idx & ~mask)
BUG();
......@@ -409,6 +467,12 @@ void free_cold_page(struct page *page)
free_hot_cold_page(page, 1);
}
/*
* Really, prep_compound_page() should be called from __rmqueue_bulk(). But
* we cheat by calling it from here, in the order > 0 path. Saves a branch
* or two.
*/
static struct page *buffered_rmqueue(struct zone *zone, int order, int cold)
{
unsigned long flags;
......@@ -435,6 +499,8 @@ static struct page *buffered_rmqueue(struct zone *zone, int order, int cold)
spin_lock_irqsave(&zone->lock, flags);
page = __rmqueue(zone, order);
spin_unlock_irqrestore(&zone->lock, flags);
if (order && page)
prep_compound_page(page, order);
}
if (page != NULL) {
......
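Review note: the comment added above prep_compound_page() describes the compound page layout: every constituent page has PG_compound set and its `lru.next` pointing at the head page, the head page's `lru.prev` optionally holds a destructor, and the first tail page's `lru.prev` holds the allocation order. The following user-space sketch models that layout. `struct page_sketch` and all `*_sketch` names are hypothetical stand-ins, not the kernel's `struct page`, and the sketch assumes `order >= 1`, matching the comment's statement that zero-order pages may not be compound.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for list_head and struct page. */
struct list_head_sketch { void *next, *prev; };

struct page_sketch {
	unsigned long flags;
	struct list_head_sketch lru;
};

#define PG_COMPOUND_SKETCH 19	/* bit number taken from the diff's PG_compound */

/* Mirrors the shape of prep_compound_page(); requires order >= 1. */
static void prep_compound_sketch(struct page_sketch *page, int order)
{
	int i;
	int nr_pages = 1 << order;

	page->lru.prev = NULL;				/* no destructor registered */
	page[1].lru.prev = (void *)(intptr_t)order;	/* order kept in first tail page */
	for (i = 0; i < nr_pages; i++) {
		page[i].flags |= 1UL << PG_COMPOUND_SKETCH;
		page[i].lru.next = page;		/* every page points at the head */
	}
}

/* Resolve any constituent page to its head, as get_page()/put_page() do. */
static struct page_sketch *compound_head_sketch(struct page_sketch *p)
{
	if (p->flags & (1UL << PG_COMPOUND_SKETCH))
		return p->lru.next;
	return p;
}
```

Because the head page also points at itself, reference counting can redirect through `lru.next` without first checking whether the page is a head or a tail, which is what the new get_page() and put_page() in the CONFIG_HUGETLB_PAGE branch do.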
......@@ -236,10 +236,8 @@ __do_page_cache_readahead(struct address_space *mapping, struct file *filp,
* uptodate then the caller will launch readpage again, and
* will then handle the error.
*/
if (ret) {
if (ret)
read_pages(mapping, filp, &page_pool, ret);
blk_run_queues();
}
BUG_ON(!list_empty(&page_pool));
out:
return ret;
......
......@@ -439,7 +439,6 @@ struct arraycache_init initarray_generic __initdata = { { 0, BOOT_CPUCACHE_ENTRI
static kmem_cache_t cache_cache = {
.lists = LIST3_INIT(cache_cache.lists),
/* Allow for boot cpu != 0 */
.array = { [0 ... NR_CPUS-1] = &initarray_cache.cache },
.batchcount = 1,
.limit = BOOT_CPUCACHE_ENTRIES,
.objsize = sizeof(kmem_cache_t),
......@@ -611,6 +610,7 @@ void __init kmem_cache_init(void)
init_MUTEX(&cache_chain_sem);
INIT_LIST_HEAD(&cache_chain);
list_add(&cache_cache.next, &cache_chain);
cache_cache.array[smp_processor_id()] = &initarray_cache.cache;
cache_estimate(0, cache_cache.objsize, 0,
&left_over, &cache_cache.num);
......
......@@ -957,7 +957,6 @@ int kswapd(void *p)
finish_wait(&pgdat->kswapd_wait, &wait);
get_page_state(&ps);
balance_pgdat(pgdat, 0, &ps);
blk_run_queues();
}
}
......