Commit cfd2e1af authored by Christoph Hellwig's avatar Christoph Hellwig

Merge http://linux.bkbits.net/linux-2.5

into lab343.munich.sgi.com:/home/hch/repo/bkbits/linux-2.5
parents 1142fa81 9a3e1a96
...@@ -141,17 +141,14 @@ you are have done so you need to call journal_dirty_{meta,}data(). ...@@ -141,17 +141,14 @@ you are have done so you need to call journal_dirty_{meta,}data().
Or if you've asked for access to a buffer you now know is now longer Or if you've asked for access to a buffer you now know is now longer
required to be pushed back on the device you can call journal_forget() required to be pushed back on the device you can call journal_forget()
in much the same way as you might have used bforget() in the past. in much the same way as you might have used bforget() in the past.
</para> </para>
<para> <para>
A journal_flush() may be called at any time to commit and checkpoint A journal_flush() may be called at any time to commit and checkpoint
all your transactions. all your transactions.
</para> </para>
<para>
<para>
Then at umount time , in your put_super() (2.4) or write_super() (2.5) Then at umount time , in your put_super() (2.4) or write_super() (2.5)
you can then call journal_destroy() to clean up your in-core journal object. you can then call journal_destroy() to clean up your in-core journal object.
</para> </para>
...@@ -168,8 +165,8 @@ on another journal. Since transactions can't be nested/batched ...@@ -168,8 +165,8 @@ on another journal. Since transactions can't be nested/batched
across differing journals, and another filesystem other than across differing journals, and another filesystem other than
yours (say ext3) may be modified in a later syscall. yours (say ext3) may be modified in a later syscall.
</para> </para>
<para>
<para>
The second case to bear in mind is that journal_start() can The second case to bear in mind is that journal_start() can
block if there isn't enough space in the journal for your transaction block if there isn't enough space in the journal for your transaction
(based on the passed nblocks param) - when it blocks it merely(!) needs to (based on the passed nblocks param) - when it blocks it merely(!) needs to
...@@ -180,10 +177,14 @@ were semaphores and include them in your semaphore ordering rules to prevent ...@@ -180,10 +177,14 @@ were semaphores and include them in your semaphore ordering rules to prevent
deadlocks. Note that journal_extend() has similar blocking behaviour to deadlocks. Note that journal_extend() has similar blocking behaviour to
journal_start() so you can deadlock here just as easily as on journal_start(). journal_start() so you can deadlock here just as easily as on journal_start().
</para> </para>
<para>
Try to reserve the right number of blocks the first time. ;-). <para>
Try to reserve the right number of blocks the first time. ;-). This will
be the maximum number of blocks you are going to touch in this transaction.
I advise having a look at at least ext3_jbd.h to see the basis on which
ext3 uses to make these decisions.
</para> </para>
<para> <para>
Another wriggle to watch out for is your on-disk block allocation strategy. Another wriggle to watch out for is your on-disk block allocation strategy.
why? Because, if you undo a delete, you need to ensure you haven't reused any why? Because, if you undo a delete, you need to ensure you haven't reused any
...@@ -211,6 +212,30 @@ The opportunities for abuse and DOS attacks with this should be obvious, ...@@ -211,6 +212,30 @@ The opportunities for abuse and DOS attacks with this should be obvious,
if you allow unprivileged userspace to trigger codepaths containing these if you allow unprivileged userspace to trigger codepaths containing these
calls. calls.
</para> </para>
<para>
A new feature of jbd since 2.5.25 is commit callbacks with the new
journal_callback_set() function you can now ask the journalling layer
to call you back when the transaction is finally commited to disk, so that
you can do some of your own management. The key to this is the journal_callback
struct, this maintains the internal callback information but you can
extend it like this:-
</para>
<programlisting>
struct myfs_callback_s {
//Data structure element required by jbd..
struct journal_callback for_jbd;
// Stuff for myfs allocated together.
myfs_inode* i_commited;
}
</programlisting>
<para>
this would be useful if you needed to know when data was commited to a
particular inode.
</para>
</sect1> </sect1>
<sect1> <sect1>
......
November 2002 Kernel Parameters v2.5.49 February 2003 Kernel Parameters v2.5.59
~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~
The following is a consolidated list of the kernel parameters as implemented The following is a consolidated list of the kernel parameters as implemented
...@@ -60,6 +60,7 @@ restrictions referred to are that the relevant option is valid if: ...@@ -60,6 +60,7 @@ restrictions referred to are that the relevant option is valid if:
V4L Video For Linux support is enabled. V4L Video For Linux support is enabled.
VGA The VGA console has been enabled. VGA The VGA console has been enabled.
VT Virtual terminal support is enabled. VT Virtual terminal support is enabled.
WDT Watchdog support is enabled.
XT IBM PC/XT MFM hard disk support is enabled. XT IBM PC/XT MFM hard disk support is enabled.
In addition, the following text indicates that the option: In addition, the following text indicates that the option:
...@@ -98,6 +99,9 @@ running once the system is up. ...@@ -98,6 +99,9 @@ running once the system is up.
advansys= [HW,SCSI] advansys= [HW,SCSI]
See header of drivers/scsi/advansys.c. See header of drivers/scsi/advansys.c.
advwdt= [HW,WDT] Advantech WDT
Format: <iostart>,<iostop>
aedsp16= [HW,OSS] Audio Excel DSP 16 aedsp16= [HW,OSS] Audio Excel DSP 16
Format: <io>,<irq>,<dma>,<mss_io>,<mpu_io>,<mpu_irq> Format: <io>,<irq>,<dma>,<mss_io>,<mpu_io>,<mpu_irq>
See also header of sound/oss/aedsp16.c. See also header of sound/oss/aedsp16.c.
...@@ -111,6 +115,9 @@ running once the system is up. ...@@ -111,6 +115,9 @@ running once the system is up.
aic7xxx= [HW,SCSI] aic7xxx= [HW,SCSI]
See Documentation/scsi/aic7xxx.txt. See Documentation/scsi/aic7xxx.txt.
aic79xx= [HW,SCSI]
See Documentation/scsi/aic79xx.txt.
allowdma0 [ISAPNP] allowdma0 [ISAPNP]
AM53C974= [HW,SCSI] AM53C974= [HW,SCSI]
...@@ -231,19 +238,11 @@ running once the system is up. ...@@ -231,19 +238,11 @@ running once the system is up.
cs89x0_media= [HW,NET] cs89x0_media= [HW,NET]
Format: { rj45 | aui | bnc } Format: { rj45 | aui | bnc }
ctc= [HW,NET]
See drivers/s390/net/ctcmain.c, comment before function
ctc_setup().
cyclades= [HW,SERIAL] Cyclades multi-serial port adapter. cyclades= [HW,SERIAL] Cyclades multi-serial port adapter.
dasd= [HW,NET] dasd= [HW,NET]
See header of drivers/s390/block/dasd_devmap.c. See header of drivers/s390/block/dasd_devmap.c.
dasd_discipline=
[HW,NET]
See header of drivers/s390/block/dasd.c.
db9= [HW,JOY] db9= [HW,JOY]
db9_2= db9_2=
db9_3= db9_3=
...@@ -254,9 +253,6 @@ running once the system is up. ...@@ -254,9 +253,6 @@ running once the system is up.
Format: <area>[,<node>] Format: <area>[,<node>]
See also Documentation/networking/decnet.txt. See also Documentation/networking/decnet.txt.
decr_overclock= [PPC]
decr_overclock_proc0=
devfs= [DEVFS] devfs= [DEVFS]
See Documentation/filesystems/devfs/boot-options. See Documentation/filesystems/devfs/boot-options.
...@@ -305,6 +301,9 @@ running once the system is up. ...@@ -305,6 +301,9 @@ running once the system is up.
This option is obsoleted by the "netdev=" option, which This option is obsoleted by the "netdev=" option, which
has equivalent usage. See its documentation for details. has equivalent usage. See its documentation for details.
eurwdt= [HW,WDT] Eurotech CPU-1220/1410 onboard watchdog.
Format: <io>[,<irq>]
fd_mcs= [HW,SCSI] fd_mcs= [HW,SCSI]
See header of drivers/scsi/fd_mcs.c. See header of drivers/scsi/fd_mcs.c.
...@@ -350,7 +349,9 @@ running once the system is up. ...@@ -350,7 +349,9 @@ running once the system is up.
hisax= [HW,ISDN] hisax= [HW,ISDN]
See Documentation/isdn/README.HiSax. See Documentation/isdn/README.HiSax.
hugepages= [HW,IA-32] Maximal number of HugeTLB pages hugepages= [HW,IA-32,IA-64] Maximal number of HugeTLB pages.
noirqbalance [IA-32,SMP,KNL] Disable kernel irq balancing
i8042_direct [HW] Non-translated mode i8042_direct [HW] Non-translated mode
i8042_dumbkbd i8042_dumbkbd
...@@ -394,6 +395,10 @@ running once the system is up. ...@@ -394,6 +395,10 @@ running once the system is up.
inttest= [IA64] inttest= [IA64]
io7= [HW] IO7 for Marvel based alpha systems
See comment before marvel_specify_io7 in
arch/alpha/kernel/core_marvel.c.
ip= [IP_PNP] ip= [IP_PNP]
See Documentation/nfsroot.txt. See Documentation/nfsroot.txt.
...@@ -495,6 +500,7 @@ running once the system is up. ...@@ -495,6 +500,7 @@ running once the system is up.
mdacon= [MDA] mdacon= [MDA]
Format: <first>,<last> Format: <first>,<last>
Specifies range of consoles to be captured by the MDA.
mem=exactmap [KNL,BOOT,IA-32] Enable setting of an exact mem=exactmap [KNL,BOOT,IA-32] Enable setting of an exact
E820 memory map, as specified by the user. E820 memory map, as specified by the user.
...@@ -576,6 +582,8 @@ running once the system is up. ...@@ -576,6 +582,8 @@ running once the system is up.
nodisconnect [HW,SCSI,M68K] Disables SCSI disconnects. nodisconnect [HW,SCSI,M68K] Disables SCSI disconnects.
noexec [IA-64]
nofxsr [BUGS=IA-32] nofxsr [BUGS=IA-32]
nohighio [BUGS=IA-32] Disable highmem block I/O. nohighio [BUGS=IA-32] Disable highmem block I/O.
...@@ -599,7 +607,9 @@ running once the system is up. ...@@ -599,7 +607,9 @@ running once the system is up.
noresume [SWSUSP] Disables resume and restore original swap space. noresume [SWSUSP] Disables resume and restore original swap space.
no-scroll [VGA] no-scroll [VGA] Disables scrollback.
This is required for the Braillex ib80-piezo Braille
reader made by F.H. Papenmeier (Germany).
nosbagart [IA-64] nosbagart [IA-64]
...@@ -809,6 +819,9 @@ running once the system is up. ...@@ -809,6 +819,9 @@ running once the system is up.
See a comment before function sbpcd_setup() in See a comment before function sbpcd_setup() in
drivers/cdrom/sbpcd.c. drivers/cdrom/sbpcd.c.
sc1200wdt= [HW,WDT] SC1200 WDT (watchdog) driver
Format: <io>[,<timeout>[,<isapnp>]]
scsi_debug_*= [SCSI] scsi_debug_*= [SCSI]
See drivers/scsi/scsi_debug.c. See drivers/scsi/scsi_debug.c.
...@@ -997,9 +1010,6 @@ running once the system is up. ...@@ -997,9 +1010,6 @@ running once the system is up.
spia_pedr= spia_pedr=
spia_peddr= spia_peddr=
spread_lpevents=
[PPC]
sscape= [HW,OSS] sscape= [HW,OSS]
Format: <io>,<irq>,<dma>,<mpu_io>,<mpu_irq> Format: <io>,<irq>,<dma>,<mpu_io>,<mpu_irq>
...@@ -1009,6 +1019,19 @@ running once the system is up. ...@@ -1009,6 +1019,19 @@ running once the system is up.
st0x= [HW,SCSI] st0x= [HW,SCSI]
See header of drivers/scsi/seagate.c. See header of drivers/scsi/seagate.c.
sti= [HW]
Format: <num>
Set the STI (builtin display/keyboard on the HP-PARISC
machines) console (graphic card) which should be used
as the initial boot-console.
See also comment in drivers/video/console/sticore.c.
sti_font= [HW]
See comment in drivers/video/console/sticore.c.
stifb= [HW]
Format: bpp:<bpp1>[:<bpp2>[:<bpp3>...]]
stram_swap= [HW,M68k] stram_swap= [HW,M68k]
swiotlb= [IA-64] Number of I/O TLB slabs swiotlb= [IA-64] Number of I/O TLB slabs
...@@ -1079,7 +1102,7 @@ running once the system is up. ...@@ -1079,7 +1102,7 @@ running once the system is up.
wd7000= [HW,SCSI] wd7000= [HW,SCSI]
See header of drivers/scsi/wd7000.c. See header of drivers/scsi/wd7000.c.
wdt= [HW] Watchdog wdt= [WDT] Watchdog
See Documentation/watchdog.txt. See Documentation/watchdog.txt.
xd= [HW,XT] Original XT pre-IDE (RLL encoded) disks. xd= [HW,XT] Original XT pre-IDE (RLL encoded) disks.
......
This diff is collapsed.
...@@ -86,7 +86,7 @@ void enable_hlt(void) ...@@ -86,7 +86,7 @@ void enable_hlt(void)
*/ */
void default_idle(void) void default_idle(void)
{ {
if (current_cpu_data.hlt_works_ok && !hlt_counter) { if (!hlt_counter && current_cpu_data.hlt_works_ok) {
local_irq_disable(); local_irq_disable();
if (!need_resched()) if (!need_resched())
safe_halt(); safe_halt();
......
...@@ -26,7 +26,6 @@ static long htlbpagemem; ...@@ -26,7 +26,6 @@ static long htlbpagemem;
int htlbpage_max; int htlbpage_max;
static long htlbzone_pages; static long htlbzone_pages;
struct vm_operations_struct hugetlb_vm_ops;
static LIST_HEAD(htlbpage_freelist); static LIST_HEAD(htlbpage_freelist);
static spinlock_t htlbpage_lock = SPIN_LOCK_UNLOCKED; static spinlock_t htlbpage_lock = SPIN_LOCK_UNLOCKED;
...@@ -46,6 +45,7 @@ static struct page *alloc_hugetlb_page(void) ...@@ -46,6 +45,7 @@ static struct page *alloc_hugetlb_page(void)
htlbpagemem--; htlbpagemem--;
spin_unlock(&htlbpage_lock); spin_unlock(&htlbpage_lock);
set_page_count(page, 1); set_page_count(page, 1);
page->lru.prev = (void *)huge_page_release;
for (i = 0; i < (HPAGE_SIZE/PAGE_SIZE); ++i) for (i = 0; i < (HPAGE_SIZE/PAGE_SIZE); ++i)
clear_highpage(&page[i]); clear_highpage(&page[i]);
return page; return page;
...@@ -134,6 +134,7 @@ follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma, ...@@ -134,6 +134,7 @@ follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
page = pte_page(pte); page = pte_page(pte);
if (pages) { if (pages) {
page += ((start & ~HPAGE_MASK) >> PAGE_SHIFT); page += ((start & ~HPAGE_MASK) >> PAGE_SHIFT);
get_page(page);
pages[i] = page; pages[i] = page;
} }
if (vmas) if (vmas)
...@@ -150,6 +151,82 @@ follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma, ...@@ -150,6 +151,82 @@ follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
return i; return i;
} }
#if 0 /* This is just for testing */
struct page *
follow_huge_addr(struct mm_struct *mm,
struct vm_area_struct *vma, unsigned long address, int write)
{
unsigned long start = address;
int length = 1;
int nr;
struct page *page;
nr = follow_hugetlb_page(mm, vma, &page, NULL, &start, &length, 0);
if (nr == 1)
return page;
return NULL;
}
/*
* If virtual address `addr' lies within a huge page, return its controlling
* VMA, else NULL.
*/
struct vm_area_struct *hugepage_vma(struct mm_struct *mm, unsigned long addr)
{
if (mm->used_hugetlb) {
struct vm_area_struct *vma = find_vma(mm, addr);
if (vma && is_vm_hugetlb_page(vma))
return vma;
}
return NULL;
}
int pmd_huge(pmd_t pmd)
{
return 0;
}
struct page *
follow_huge_pmd(struct mm_struct *mm, unsigned long address,
pmd_t *pmd, int write)
{
return NULL;
}
#else
struct page *
follow_huge_addr(struct mm_struct *mm,
struct vm_area_struct *vma, unsigned long address, int write)
{
return NULL;
}
struct vm_area_struct *hugepage_vma(struct mm_struct *mm, unsigned long addr)
{
return NULL;
}
int pmd_huge(pmd_t pmd)
{
return !!(pmd_val(pmd) & _PAGE_PSE);
}
struct page *
follow_huge_pmd(struct mm_struct *mm, unsigned long address,
pmd_t *pmd, int write)
{
struct page *page;
page = pte_page(*(pte_t *)pmd);
if (page) {
page += ((address & ~HPAGE_MASK) >> PAGE_SHIFT);
get_page(page);
}
return page;
}
#endif
void free_huge_page(struct page *page) void free_huge_page(struct page *page)
{ {
BUG_ON(page_count(page)); BUG_ON(page_count(page));
...@@ -171,7 +248,8 @@ void huge_page_release(struct page *page) ...@@ -171,7 +248,8 @@ void huge_page_release(struct page *page)
free_huge_page(page); free_huge_page(page);
} }
void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start, unsigned long end) void unmap_hugepage_range(struct vm_area_struct *vma,
unsigned long start, unsigned long end)
{ {
struct mm_struct *mm = vma->vm_mm; struct mm_struct *mm = vma->vm_mm;
unsigned long address; unsigned long address;
...@@ -181,8 +259,6 @@ void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start, unsig ...@@ -181,8 +259,6 @@ void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start, unsig
BUG_ON(start & (HPAGE_SIZE - 1)); BUG_ON(start & (HPAGE_SIZE - 1));
BUG_ON(end & (HPAGE_SIZE - 1)); BUG_ON(end & (HPAGE_SIZE - 1));
spin_lock(&htlbpage_lock);
spin_unlock(&htlbpage_lock);
for (address = start; address < end; address += HPAGE_SIZE) { for (address = start; address < end; address += HPAGE_SIZE) {
pte = huge_pte_offset(mm, address); pte = huge_pte_offset(mm, address);
if (pte_none(*pte)) if (pte_none(*pte))
...@@ -195,7 +271,9 @@ void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start, unsig ...@@ -195,7 +271,9 @@ void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start, unsig
flush_tlb_range(vma, start, end); flush_tlb_range(vma, start, end);
} }
void zap_hugepage_range(struct vm_area_struct *vma, unsigned long start, unsigned long length) void
zap_hugepage_range(struct vm_area_struct *vma,
unsigned long start, unsigned long length)
{ {
struct mm_struct *mm = vma->vm_mm; struct mm_struct *mm = vma->vm_mm;
spin_lock(&mm->page_table_lock); spin_lock(&mm->page_table_lock);
...@@ -206,6 +284,7 @@ void zap_hugepage_range(struct vm_area_struct *vma, unsigned long start, unsigne ...@@ -206,6 +284,7 @@ void zap_hugepage_range(struct vm_area_struct *vma, unsigned long start, unsigne
int hugetlb_prefault(struct address_space *mapping, struct vm_area_struct *vma) int hugetlb_prefault(struct address_space *mapping, struct vm_area_struct *vma)
{ {
struct mm_struct *mm = current->mm; struct mm_struct *mm = current->mm;
struct inode *inode = mapping->host;
unsigned long addr; unsigned long addr;
int ret = 0; int ret = 0;
...@@ -229,6 +308,7 @@ int hugetlb_prefault(struct address_space *mapping, struct vm_area_struct *vma) ...@@ -229,6 +308,7 @@ int hugetlb_prefault(struct address_space *mapping, struct vm_area_struct *vma)
+ (vma->vm_pgoff >> (HPAGE_SHIFT - PAGE_SHIFT)); + (vma->vm_pgoff >> (HPAGE_SHIFT - PAGE_SHIFT));
page = find_get_page(mapping, idx); page = find_get_page(mapping, idx);
if (!page) { if (!page) {
loff_t i_size;
page = alloc_hugetlb_page(); page = alloc_hugetlb_page();
if (!page) { if (!page) {
ret = -ENOMEM; ret = -ENOMEM;
...@@ -240,6 +320,9 @@ int hugetlb_prefault(struct address_space *mapping, struct vm_area_struct *vma) ...@@ -240,6 +320,9 @@ int hugetlb_prefault(struct address_space *mapping, struct vm_area_struct *vma)
free_huge_page(page); free_huge_page(page);
goto out; goto out;
} }
i_size = (loff_t)(idx + 1) * HPAGE_SIZE;
if (i_size > inode->i_size)
inode->i_size = i_size;
} }
set_huge_pte(mm, vma, page, pte, vma->vm_flags & VM_WRITE); set_huge_pte(mm, vma, page, pte, vma->vm_flags & VM_WRITE);
} }
...@@ -298,8 +381,8 @@ int try_to_free_low(int count) ...@@ -298,8 +381,8 @@ int try_to_free_low(int count)
int set_hugetlb_mem_size(int count) int set_hugetlb_mem_size(int count)
{ {
int j, lcount; int lcount;
struct page *page, *map; struct page *page;
extern long htlbzone_pages; extern long htlbzone_pages;
extern struct list_head htlbpage_freelist; extern struct list_head htlbpage_freelist;
...@@ -315,11 +398,6 @@ int set_hugetlb_mem_size(int count) ...@@ -315,11 +398,6 @@ int set_hugetlb_mem_size(int count)
page = alloc_pages(__GFP_HIGHMEM, HUGETLB_PAGE_ORDER); page = alloc_pages(__GFP_HIGHMEM, HUGETLB_PAGE_ORDER);
if (page == NULL) if (page == NULL)
break; break;
map = page;
for (j = 0; j < (HPAGE_SIZE / PAGE_SIZE); j++) {
SetPageReserved(map);
map++;
}
spin_lock(&htlbpage_lock); spin_lock(&htlbpage_lock);
list_add(&page->list, &htlbpage_freelist); list_add(&page->list, &htlbpage_freelist);
htlbpagemem++; htlbpagemem++;
...@@ -341,7 +419,8 @@ int set_hugetlb_mem_size(int count) ...@@ -341,7 +419,8 @@ int set_hugetlb_mem_size(int count)
return (int) htlbzone_pages; return (int) htlbzone_pages;
} }
int hugetlb_sysctl_handler(ctl_table *table, int write, struct file *file, void *buffer, size_t *length) int hugetlb_sysctl_handler(ctl_table *table, int write,
struct file *file, void *buffer, size_t *length)
{ {
proc_dointvec(table, write, file, buffer, length); proc_dointvec(table, write, file, buffer, length);
htlbpage_max = set_hugetlb_mem_size(htlbpage_max); htlbpage_max = set_hugetlb_mem_size(htlbpage_max);
...@@ -358,15 +437,13 @@ __setup("hugepages=", hugetlb_setup); ...@@ -358,15 +437,13 @@ __setup("hugepages=", hugetlb_setup);
static int __init hugetlb_init(void) static int __init hugetlb_init(void)
{ {
int i, j; int i;
struct page *page; struct page *page;
for (i = 0; i < htlbpage_max; ++i) { for (i = 0; i < htlbpage_max; ++i) {
page = alloc_pages(__GFP_HIGHMEM, HUGETLB_PAGE_ORDER); page = alloc_pages(__GFP_HIGHMEM, HUGETLB_PAGE_ORDER);
if (!page) if (!page)
break; break;
for (j = 0; j < HPAGE_SIZE/PAGE_SIZE; ++j)
SetPageReserved(&page[j]);
spin_lock(&htlbpage_lock); spin_lock(&htlbpage_lock);
list_add(&page->list, &htlbpage_freelist); list_add(&page->list, &htlbpage_freelist);
spin_unlock(&htlbpage_lock); spin_unlock(&htlbpage_lock);
...@@ -395,7 +472,14 @@ int is_hugepage_mem_enough(size_t size) ...@@ -395,7 +472,14 @@ int is_hugepage_mem_enough(size_t size)
return 1; return 1;
} }
static struct page *hugetlb_nopage(struct vm_area_struct * area, unsigned long address, int unused) /*
* We cannot handle pagefaults against hugetlb pages at all. They cause
* handle_mm_fault() to try to instantiate regular-sized pages in the
* hugegpage VMA. do_page_fault() is supposed to trap this, so BUG is we get
* this far.
*/
static struct page *hugetlb_nopage(struct vm_area_struct *vma,
unsigned long address, int unused)
{ {
BUG(); BUG();
return NULL; return NULL;
......
...@@ -18,7 +18,6 @@ ...@@ -18,7 +18,6 @@
#include <asm/tlb.h> #include <asm/tlb.h>
#include <asm/tlbflush.h> #include <asm/tlbflush.h>
static struct vm_operations_struct hugetlb_vm_ops;
struct list_head htlbpage_freelist; struct list_head htlbpage_freelist;
spinlock_t htlbpage_lock = SPIN_LOCK_UNLOCKED; spinlock_t htlbpage_lock = SPIN_LOCK_UNLOCKED;
extern long htlbpagemem; extern long htlbpagemem;
...@@ -227,6 +226,7 @@ follow_hugetlb_page (struct mm_struct *mm, struct vm_area_struct *vma, ...@@ -227,6 +226,7 @@ follow_hugetlb_page (struct mm_struct *mm, struct vm_area_struct *vma,
page = pte_page(pte); page = pte_page(pte);
if (pages) { if (pages) {
page += ((start & ~HPAGE_MASK) >> PAGE_SHIFT); page += ((start & ~HPAGE_MASK) >> PAGE_SHIFT);
get_page(page);
pages[i] = page; pages[i] = page;
} }
if (vmas) if (vmas)
...@@ -303,11 +303,6 @@ set_hugetlb_mem_size (int count) ...@@ -303,11 +303,6 @@ set_hugetlb_mem_size (int count)
page = alloc_pages(__GFP_HIGHMEM, HUGETLB_PAGE_ORDER); page = alloc_pages(__GFP_HIGHMEM, HUGETLB_PAGE_ORDER);
if (page == NULL) if (page == NULL)
break; break;
map = page;
for (j = 0; j < (HPAGE_SIZE / PAGE_SIZE); j++) {
SetPageReserved(map);
map++;
}
spin_lock(&htlbpage_lock); spin_lock(&htlbpage_lock);
list_add(&page->list, &htlbpage_freelist); list_add(&page->list, &htlbpage_freelist);
htlbpagemem++; htlbpagemem++;
...@@ -327,7 +322,7 @@ set_hugetlb_mem_size (int count) ...@@ -327,7 +322,7 @@ set_hugetlb_mem_size (int count)
map = page; map = page;
for (j = 0; j < (HPAGE_SIZE / PAGE_SIZE); j++) { for (j = 0; j < (HPAGE_SIZE / PAGE_SIZE); j++) {
map->flags &= ~(1 << PG_locked | 1 << PG_error | 1 << PG_referenced | map->flags &= ~(1 << PG_locked | 1 << PG_error | 1 << PG_referenced |
1 << PG_dirty | 1 << PG_active | 1 << PG_reserved | 1 << PG_dirty | 1 << PG_active |
1 << PG_private | 1<< PG_writeback); 1 << PG_private | 1<< PG_writeback);
map++; map++;
} }
...@@ -337,6 +332,14 @@ set_hugetlb_mem_size (int count) ...@@ -337,6 +332,14 @@ set_hugetlb_mem_size (int count)
return (int) htlbzone_pages; return (int) htlbzone_pages;
} }
static struct page *
hugetlb_nopage(struct vm_area_struct *vma, unsigned long address, int unused)
{
BUG();
return NULL;
}
static struct vm_operations_struct hugetlb_vm_ops = { static struct vm_operations_struct hugetlb_vm_ops = {
.close = zap_hugetlb_resources .nopage = hugetlb_nopage,
.close = zap_hugetlb_resources,
}; };
...@@ -288,6 +288,7 @@ int follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma, ...@@ -288,6 +288,7 @@ int follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
page = pte_page(pte); page = pte_page(pte);
if (pages) { if (pages) {
page += ((start & ~HPAGE_MASK) >> PAGE_SHIFT); page += ((start & ~HPAGE_MASK) >> PAGE_SHIFT);
get_page(page);
pages[i] = page; pages[i] = page;
} }
if (vmas) if (vmas)
...@@ -584,11 +585,6 @@ int set_hugetlb_mem_size(int count) ...@@ -584,11 +585,6 @@ int set_hugetlb_mem_size(int count)
page = alloc_pages(GFP_ATOMIC, HUGETLB_PAGE_ORDER); page = alloc_pages(GFP_ATOMIC, HUGETLB_PAGE_ORDER);
if (page == NULL) if (page == NULL)
break; break;
map = page;
for (j = 0; j < (HPAGE_SIZE / PAGE_SIZE); j++) {
SetPageReserved(map);
map++;
}
spin_lock(&htlbpage_lock); spin_lock(&htlbpage_lock);
list_add(&page->list, &htlbpage_freelist); list_add(&page->list, &htlbpage_freelist);
htlbpagemem++; htlbpagemem++;
...@@ -613,7 +609,6 @@ int set_hugetlb_mem_size(int count) ...@@ -613,7 +609,6 @@ int set_hugetlb_mem_size(int count)
map->flags &= ~(1UL << PG_locked | 1UL << PG_error | map->flags &= ~(1UL << PG_locked | 1UL << PG_error |
1UL << PG_referenced | 1UL << PG_referenced |
1UL << PG_dirty | 1UL << PG_active | 1UL << PG_dirty | 1UL << PG_active |
1UL << PG_reserved |
1UL << PG_private | 1UL << PG_writeback); 1UL << PG_private | 1UL << PG_writeback);
set_page_count(page, 0); set_page_count(page, 0);
map++; map++;
...@@ -624,6 +619,14 @@ int set_hugetlb_mem_size(int count) ...@@ -624,6 +619,14 @@ int set_hugetlb_mem_size(int count)
return (int) htlbzone_pages; return (int) htlbzone_pages;
} }
static struct page *
hugetlb_nopage(struct vm_area_struct *vma, unsigned long address, int unused)
{
BUG();
return NULL;
}
static struct vm_operations_struct hugetlb_vm_ops = { static struct vm_operations_struct hugetlb_vm_ops = {
.nopage = hugetlb_nopage,
.close = zap_hugetlb_resources, .close = zap_hugetlb_resources,
}; };
...@@ -25,7 +25,6 @@ static long htlbpagemem; ...@@ -25,7 +25,6 @@ static long htlbpagemem;
int htlbpage_max; int htlbpage_max;
static long htlbzone_pages; static long htlbzone_pages;
struct vm_operations_struct hugetlb_vm_ops;
static LIST_HEAD(htlbpage_freelist); static LIST_HEAD(htlbpage_freelist);
static spinlock_t htlbpage_lock = SPIN_LOCK_UNLOCKED; static spinlock_t htlbpage_lock = SPIN_LOCK_UNLOCKED;
...@@ -134,6 +133,7 @@ follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma, ...@@ -134,6 +133,7 @@ follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
page = pte_page(pte); page = pte_page(pte);
if (pages) { if (pages) {
page += ((start & ~HPAGE_MASK) >> PAGE_SHIFT); page += ((start & ~HPAGE_MASK) >> PAGE_SHIFT);
get_page(page);
pages[i] = page; pages[i] = page;
} }
if (vmas) if (vmas)
...@@ -204,6 +204,7 @@ void zap_hugepage_range(struct vm_area_struct *vma, unsigned long start, unsigne ...@@ -204,6 +204,7 @@ void zap_hugepage_range(struct vm_area_struct *vma, unsigned long start, unsigne
int hugetlb_prefault(struct address_space *mapping, struct vm_area_struct *vma) int hugetlb_prefault(struct address_space *mapping, struct vm_area_struct *vma)
{ {
struct mm_struct *mm = current->mm; struct mm_struct *mm = current->mm;
struct inode = mapping->host;
unsigned long addr; unsigned long addr;
int ret = 0; int ret = 0;
...@@ -227,6 +228,8 @@ int hugetlb_prefault(struct address_space *mapping, struct vm_area_struct *vma) ...@@ -227,6 +228,8 @@ int hugetlb_prefault(struct address_space *mapping, struct vm_area_struct *vma)
+ (vma->vm_pgoff >> (HPAGE_SHIFT - PAGE_SHIFT)); + (vma->vm_pgoff >> (HPAGE_SHIFT - PAGE_SHIFT));
page = find_get_page(mapping, idx); page = find_get_page(mapping, idx);
if (!page) { if (!page) {
loff_t i_size;
page = alloc_hugetlb_page(); page = alloc_hugetlb_page();
if (!page) { if (!page) {
ret = -ENOMEM; ret = -ENOMEM;
...@@ -238,6 +241,9 @@ int hugetlb_prefault(struct address_space *mapping, struct vm_area_struct *vma) ...@@ -238,6 +241,9 @@ int hugetlb_prefault(struct address_space *mapping, struct vm_area_struct *vma)
free_huge_page(page); free_huge_page(page);
goto out; goto out;
} }
i_size = (loff_t)(idx + 1) * HPAGE_SIZE;
if (i_size > inode->i_size)
inode->i_size = i_size;
} }
set_huge_pte(mm, vma, page, pte, vma->vm_flags & VM_WRITE); set_huge_pte(mm, vma, page, pte, vma->vm_flags & VM_WRITE);
} }
...@@ -263,11 +269,6 @@ int set_hugetlb_mem_size(int count) ...@@ -263,11 +269,6 @@ int set_hugetlb_mem_size(int count)
page = alloc_pages(__GFP_HIGHMEM, HUGETLB_PAGE_ORDER); page = alloc_pages(__GFP_HIGHMEM, HUGETLB_PAGE_ORDER);
if (page == NULL) if (page == NULL)
break; break;
map = page;
for (j = 0; j < (HPAGE_SIZE / PAGE_SIZE); j++) {
SetPageReserved(map);
map++;
}
spin_lock(&htlbpage_lock); spin_lock(&htlbpage_lock);
list_add(&page->list, &htlbpage_freelist); list_add(&page->list, &htlbpage_freelist);
htlbpagemem++; htlbpagemem++;
...@@ -286,8 +287,9 @@ int set_hugetlb_mem_size(int count) ...@@ -286,8 +287,9 @@ int set_hugetlb_mem_size(int count)
spin_unlock(&htlbpage_lock); spin_unlock(&htlbpage_lock);
map = page; map = page;
for (j = 0; j < (HPAGE_SIZE / PAGE_SIZE); j++) { for (j = 0; j < (HPAGE_SIZE / PAGE_SIZE); j++) {
map->flags &= ~(1 << PG_locked | 1 << PG_error | 1 << PG_referenced | map->flags &= ~(1 << PG_locked | 1 << PG_error |
1 << PG_dirty | 1 << PG_active | 1 << PG_reserved | 1 << PG_referenced |
1 << PG_dirty | 1 << PG_active |
1 << PG_private | 1<< PG_writeback); 1 << PG_private | 1<< PG_writeback);
set_page_count(map, 0); set_page_count(map, 0);
map++; map++;
...@@ -346,7 +348,8 @@ int hugetlb_report_meminfo(char *buf) ...@@ -346,7 +348,8 @@ int hugetlb_report_meminfo(char *buf)
HPAGE_SIZE/1024); HPAGE_SIZE/1024);
} }
static struct page * hugetlb_nopage(struct vm_area_struct * area, unsigned long address, int unused) static struct page *
hugetlb_nopage(struct vm_area_struct *vma, unsigned long address, int unused)
{ {
BUG(); BUG();
return NULL; return NULL;
......
...@@ -27,6 +27,8 @@ ...@@ -27,6 +27,8 @@
#include <linux/completion.h> #include <linux/completion.h>
#include <linux/slab.h> #include <linux/slab.h>
static void blk_unplug_work(void *data);
/* /*
* For the allocated request tables * For the allocated request tables
*/ */
...@@ -237,6 +239,14 @@ void blk_queue_make_request(request_queue_t * q, make_request_fn * mfn) ...@@ -237,6 +239,14 @@ void blk_queue_make_request(request_queue_t * q, make_request_fn * mfn)
blk_queue_hardsect_size(q, 512); blk_queue_hardsect_size(q, 512);
blk_queue_dma_alignment(q, 511); blk_queue_dma_alignment(q, 511);
q->unplug_thresh = 4; /* hmm */
q->unplug_delay = (3 * HZ) / 1000; /* 3 milliseconds */
if (q->unplug_delay == 0)
q->unplug_delay = 1;
init_timer(&q->unplug_timer);
INIT_WORK(&q->unplug_work, blk_unplug_work, q);
/* /*
* by default assume old behaviour and bounce for any highmem page * by default assume old behaviour and bounce for any highmem page
*/ */
...@@ -960,6 +970,7 @@ void blk_plug_device(request_queue_t *q) ...@@ -960,6 +970,7 @@ void blk_plug_device(request_queue_t *q)
if (!blk_queue_plugged(q)) { if (!blk_queue_plugged(q)) {
spin_lock(&blk_plug_lock); spin_lock(&blk_plug_lock);
list_add_tail(&q->plug_list, &blk_plug_list); list_add_tail(&q->plug_list, &blk_plug_list);
mod_timer(&q->unplug_timer, jiffies + q->unplug_delay);
spin_unlock(&blk_plug_lock); spin_unlock(&blk_plug_lock);
} }
} }
...@@ -974,6 +985,7 @@ int blk_remove_plug(request_queue_t *q) ...@@ -974,6 +985,7 @@ int blk_remove_plug(request_queue_t *q)
if (blk_queue_plugged(q)) { if (blk_queue_plugged(q)) {
spin_lock(&blk_plug_lock); spin_lock(&blk_plug_lock);
list_del_init(&q->plug_list); list_del_init(&q->plug_list);
del_timer(&q->unplug_timer);
spin_unlock(&blk_plug_lock); spin_unlock(&blk_plug_lock);
return 1; return 1;
} }
...@@ -992,6 +1004,8 @@ static inline void __generic_unplug_device(request_queue_t *q) ...@@ -992,6 +1004,8 @@ static inline void __generic_unplug_device(request_queue_t *q)
if (test_bit(QUEUE_FLAG_STOPPED, &q->queue_flags)) if (test_bit(QUEUE_FLAG_STOPPED, &q->queue_flags))
return; return;
del_timer(&q->unplug_timer);
/* /*
* was plugged, fire request_fn if queue has stuff to do * was plugged, fire request_fn if queue has stuff to do
*/ */
...@@ -1020,6 +1034,18 @@ void generic_unplug_device(void *data) ...@@ -1020,6 +1034,18 @@ void generic_unplug_device(void *data)
spin_unlock_irq(q->queue_lock); spin_unlock_irq(q->queue_lock);
} }
static void blk_unplug_work(void *data)
{
generic_unplug_device(data);
}
static void blk_unplug_timeout(unsigned long data)
{
request_queue_t *q = (request_queue_t *)data;
schedule_work(&q->unplug_work);
}
/** /**
* blk_start_queue - restart a previously stopped queue * blk_start_queue - restart a previously stopped queue
* @q: The &request_queue_t in question * @q: The &request_queue_t in question
...@@ -1164,6 +1190,9 @@ void blk_cleanup_queue(request_queue_t * q) ...@@ -1164,6 +1190,9 @@ void blk_cleanup_queue(request_queue_t * q)
count -= __blk_cleanup_queue(&q->rq[READ]); count -= __blk_cleanup_queue(&q->rq[READ]);
count -= __blk_cleanup_queue(&q->rq[WRITE]); count -= __blk_cleanup_queue(&q->rq[WRITE]);
del_timer_sync(&q->unplug_timer);
flush_scheduled_work();
if (count) if (count)
printk("blk_cleanup_queue: leaked requests (%d)\n", count); printk("blk_cleanup_queue: leaked requests (%d)\n", count);
...@@ -1269,6 +1298,9 @@ int blk_init_queue(request_queue_t *q, request_fn_proc *rfn, spinlock_t *lock) ...@@ -1269,6 +1298,9 @@ int blk_init_queue(request_queue_t *q, request_fn_proc *rfn, spinlock_t *lock)
blk_queue_make_request(q, __make_request); blk_queue_make_request(q, __make_request);
blk_queue_max_segment_size(q, MAX_SEGMENT_SIZE); blk_queue_max_segment_size(q, MAX_SEGMENT_SIZE);
q->unplug_timer.function = blk_unplug_timeout;
q->unplug_timer.data = (unsigned long)q;
blk_queue_max_hw_segments(q, MAX_HW_SEGMENTS); blk_queue_max_hw_segments(q, MAX_HW_SEGMENTS);
blk_queue_max_phys_segments(q, MAX_PHYS_SEGMENTS); blk_queue_max_phys_segments(q, MAX_PHYS_SEGMENTS);
...@@ -1811,7 +1843,15 @@ static int __make_request(request_queue_t *q, struct bio *bio) ...@@ -1811,7 +1843,15 @@ static int __make_request(request_queue_t *q, struct bio *bio)
out: out:
if (freereq) if (freereq)
__blk_put_request(q, freereq); __blk_put_request(q, freereq);
if (blk_queue_plugged(q)) {
int nr_queued = (queue_nr_requests - q->rq[0].count) +
(queue_nr_requests - q->rq[1].count);
if (nr_queued == q->unplug_thresh)
__generic_unplug_device(q);
}
spin_unlock_irq(q->queue_lock); spin_unlock_irq(q->queue_lock);
return 0; return 0;
end_io: end_io:
......
...@@ -350,15 +350,10 @@ static int do_bio_filebacked(struct loop_device *lo, struct bio *bio) ...@@ -350,15 +350,10 @@ static int do_bio_filebacked(struct loop_device *lo, struct bio *bio)
int ret; int ret;
pos = ((loff_t) bio->bi_sector << 9) + lo->lo_offset; pos = ((loff_t) bio->bi_sector << 9) + lo->lo_offset;
do {
if (bio_rw(bio) == WRITE) if (bio_rw(bio) == WRITE)
ret = lo_send(lo, bio, lo->lo_blocksize, pos); ret = lo_send(lo, bio, lo->lo_blocksize, pos);
else else
ret = lo_receive(lo, bio, lo->lo_blocksize, pos); ret = lo_receive(lo, bio, lo->lo_blocksize, pos);
} while (++bio->bi_idx < bio->bi_vcnt);
return ret; return ret;
} }
......
...@@ -19,7 +19,7 @@ comment "Video Adapters" ...@@ -19,7 +19,7 @@ comment "Video Adapters"
config VIDEO_BT848 config VIDEO_BT848
tristate "BT848 Video For Linux" tristate "BT848 Video For Linux"
depends on VIDEO_DEV && PCI && I2C_ALGOBIT depends on VIDEO_DEV && PCI && I2C_ALGOBIT && SOUND
---help--- ---help---
Support for BT848 based frame grabber/overlay boards. This includes Support for BT848 based frame grabber/overlay boards. This includes
the Miro, Hauppauge and STB boards. Please read the material in the Miro, Hauppauge and STB boards. Please read the material in
......
...@@ -127,9 +127,10 @@ void __wait_on_buffer(struct buffer_head * bh) ...@@ -127,9 +127,10 @@ void __wait_on_buffer(struct buffer_head * bh)
get_bh(bh); get_bh(bh);
do { do {
prepare_to_wait(wqh, &wait, TASK_UNINTERRUPTIBLE); prepare_to_wait(wqh, &wait, TASK_UNINTERRUPTIBLE);
if (buffer_locked(bh)) {
blk_run_queues(); blk_run_queues();
if (buffer_locked(bh))
io_schedule(); io_schedule();
}
} while (buffer_locked(bh)); } while (buffer_locked(bh));
put_bh(bh); put_bh(bh);
finish_wait(wqh, &wait); finish_wait(wqh, &wait);
...@@ -959,8 +960,6 @@ create_buffers(struct page * page, unsigned long size, int retry) ...@@ -959,8 +960,6 @@ create_buffers(struct page * page, unsigned long size, int retry)
* the reserve list is empty, we're sure there are * the reserve list is empty, we're sure there are
* async buffer heads in use. * async buffer heads in use.
*/ */
blk_run_queues();
free_more_memory(); free_more_memory();
goto try_again; goto try_again;
} }
......
...@@ -300,6 +300,8 @@ void put_dirty_page(struct task_struct * tsk, struct page *page, unsigned long a ...@@ -300,6 +300,8 @@ void put_dirty_page(struct task_struct * tsk, struct page *page, unsigned long a
pgd = pgd_offset(tsk->mm, address); pgd = pgd_offset(tsk->mm, address);
pte_chain = pte_chain_alloc(GFP_KERNEL); pte_chain = pte_chain_alloc(GFP_KERNEL);
if (!pte_chain)
goto out_sig;
spin_lock(&tsk->mm->page_table_lock); spin_lock(&tsk->mm->page_table_lock);
pmd = pmd_alloc(tsk->mm, pgd, address); pmd = pmd_alloc(tsk->mm, pgd, address);
if (!pmd) if (!pmd)
...@@ -325,6 +327,7 @@ void put_dirty_page(struct task_struct * tsk, struct page *page, unsigned long a ...@@ -325,6 +327,7 @@ void put_dirty_page(struct task_struct * tsk, struct page *page, unsigned long a
return; return;
out: out:
spin_unlock(&tsk->mm->page_table_lock); spin_unlock(&tsk->mm->page_table_lock);
out_sig:
__free_page(page); __free_page(page);
force_sig(SIGKILL, tsk); force_sig(SIGKILL, tsk);
pte_chain_free(pte_chain); pte_chain_free(pte_chain);
......
...@@ -99,6 +99,34 @@ int ext3_forget(handle_t *handle, int is_metadata, ...@@ -99,6 +99,34 @@ int ext3_forget(handle_t *handle, int is_metadata,
return err; return err;
} }
/*
* Work out how many blocks we need to progress with the next chunk of a
* truncate transaction.
*/
static unsigned long blocks_for_truncate(struct inode *inode)
{
unsigned long needed;
needed = inode->i_blocks >> (inode->i_sb->s_blocksize_bits - 9);
/* Give ourselves just enough room to cope with inodes in which
* i_blocks is corrupt: we've seen disk corruptions in the past
* which resulted in random data in an inode which looked enough
* like a regular file for ext3 to try to delete it. Things
* will go a bit crazy if that happens, but at least we should
* try not to panic the whole kernel. */
if (needed < 2)
needed = 2;
/* But we need to bound the transaction so we don't overflow the
* journal. */
if (needed > EXT3_MAX_TRANS_DATA)
needed = EXT3_MAX_TRANS_DATA;
return EXT3_DATA_TRANS_BLOCKS + needed;
}
/* /*
* Truncate transactions can be complex and absolutely huge. So we need to * Truncate transactions can be complex and absolutely huge. So we need to
* be able to restart the transaction at a conventient checkpoint to make * be able to restart the transaction at a conventient checkpoint to make
...@@ -112,14 +140,9 @@ int ext3_forget(handle_t *handle, int is_metadata, ...@@ -112,14 +140,9 @@ int ext3_forget(handle_t *handle, int is_metadata,
static handle_t *start_transaction(struct inode *inode) static handle_t *start_transaction(struct inode *inode)
{ {
long needed;
handle_t *result; handle_t *result;
needed = inode->i_blocks; result = ext3_journal_start(inode, blocks_for_truncate(inode));
if (needed > EXT3_MAX_TRANS_DATA)
needed = EXT3_MAX_TRANS_DATA;
result = ext3_journal_start(inode, EXT3_DATA_TRANS_BLOCKS + needed);
if (!IS_ERR(result)) if (!IS_ERR(result))
return result; return result;
...@@ -135,14 +158,9 @@ static handle_t *start_transaction(struct inode *inode) ...@@ -135,14 +158,9 @@ static handle_t *start_transaction(struct inode *inode)
*/ */
static int try_to_extend_transaction(handle_t *handle, struct inode *inode) static int try_to_extend_transaction(handle_t *handle, struct inode *inode)
{ {
long needed;
if (handle->h_buffer_credits > EXT3_RESERVE_TRANS_BLOCKS) if (handle->h_buffer_credits > EXT3_RESERVE_TRANS_BLOCKS)
return 0; return 0;
needed = inode->i_blocks; if (!ext3_journal_extend(handle, blocks_for_truncate(inode)))
if (needed > EXT3_MAX_TRANS_DATA)
needed = EXT3_MAX_TRANS_DATA;
if (!ext3_journal_extend(handle, EXT3_RESERVE_TRANS_BLOCKS + needed))
return 0; return 0;
return 1; return 1;
} }
...@@ -154,11 +172,8 @@ static int try_to_extend_transaction(handle_t *handle, struct inode *inode) ...@@ -154,11 +172,8 @@ static int try_to_extend_transaction(handle_t *handle, struct inode *inode)
*/ */
static int ext3_journal_test_restart(handle_t *handle, struct inode *inode) static int ext3_journal_test_restart(handle_t *handle, struct inode *inode)
{ {
long needed = inode->i_blocks;
if (needed > EXT3_MAX_TRANS_DATA)
needed = EXT3_MAX_TRANS_DATA;
jbd_debug(2, "restarting handle %p\n", handle); jbd_debug(2, "restarting handle %p\n", handle);
return ext3_journal_restart(handle, EXT3_DATA_TRANS_BLOCKS + needed); return ext3_journal_restart(handle, blocks_for_truncate(inode));
} }
/* /*
......
...@@ -61,6 +61,12 @@ void __mark_inode_dirty(struct inode *inode, int flags) ...@@ -61,6 +61,12 @@ void __mark_inode_dirty(struct inode *inode, int flags)
sb->s_op->dirty_inode(inode); sb->s_op->dirty_inode(inode);
} }
/*
* make sure that changes are seen by all cpus before we test i_state
* -- mikulas
*/
smp_mb();
/* avoid the locking if we can */ /* avoid the locking if we can */
if ((inode->i_state & flags) == flags) if ((inode->i_state & flags) == flags)
return; return;
...@@ -137,6 +143,12 @@ __sync_single_inode(struct inode *inode, struct writeback_control *wbc) ...@@ -137,6 +143,12 @@ __sync_single_inode(struct inode *inode, struct writeback_control *wbc)
inode->i_state |= I_LOCK; inode->i_state |= I_LOCK;
inode->i_state &= ~I_DIRTY; inode->i_state &= ~I_DIRTY;
/*
* smp_rmb(); note: if you remove write_lock below, you must add this.
* mark_inode_dirty doesn't take spinlock, make sure that inode is not
* read speculatively by this cpu before &= ~I_DIRTY -- mikulas
*/
write_lock(&mapping->page_lock); write_lock(&mapping->page_lock);
if (wait || !wbc->for_kupdate || list_empty(&mapping->io_pages)) if (wait || !wbc->for_kupdate || list_empty(&mapping->io_pages))
list_splice_init(&mapping->dirty_pages, &mapping->io_pages); list_splice_init(&mapping->dirty_pages, &mapping->io_pages);
...@@ -334,7 +346,6 @@ writeback_inodes(struct writeback_control *wbc) ...@@ -334,7 +346,6 @@ writeback_inodes(struct writeback_control *wbc)
} }
spin_unlock(&sb_lock); spin_unlock(&sb_lock);
spin_unlock(&inode_lock); spin_unlock(&inode_lock);
blk_run_queues();
} }
/* /*
......
This diff is collapsed.
This diff is collapsed.
...@@ -206,20 +206,22 @@ do { \ ...@@ -206,20 +206,22 @@ do { \
var -= ((journal)->j_last - (journal)->j_first); \ var -= ((journal)->j_last - (journal)->j_first); \
} while (0) } while (0)
/* /**
* journal_recover * int journal_recover(journal_t *journal) - recovers a on-disk journal
* @journal: the journal to recover
* *
* The primary function for recovering the log contents when mounting a * The primary function for recovering the log contents when mounting a
* journaled device. * journaled device.
* */
int journal_recover(journal_t *journal)
{
/*
* Recovery is done in three passes. In the first pass, we look for the * Recovery is done in three passes. In the first pass, we look for the
* end of the log. In the second, we assemble the list of revoke * end of the log. In the second, we assemble the list of revoke
* blocks. In the third and final pass, we replay any un-revoked blocks * blocks. In the third and final pass, we replay any un-revoked blocks
* in the log. * in the log.
*/ */
int journal_recover(journal_t *journal)
{
int err; int err;
journal_superblock_t * sb; journal_superblock_t * sb;
...@@ -263,20 +265,23 @@ int journal_recover(journal_t *journal) ...@@ -263,20 +265,23 @@ int journal_recover(journal_t *journal)
return err; return err;
} }
/* /**
* journal_skip_recovery * int journal_skip_recovery() - Start journal and wipe exiting records
* @journal: journal to startup
* *
* Locate any valid recovery information from the journal and set up the * Locate any valid recovery information from the journal and set up the
* journal structures in memory to ignore it (presumably because the * journal structures in memory to ignore it (presumably because the
* caller has evidence that it is out of date). * caller has evidence that it is out of date).
* * This function does'nt appear to be exorted..
*/
int journal_skip_recovery(journal_t *journal)
{
/*
* We perform one pass over the journal to allow us to tell the user how * We perform one pass over the journal to allow us to tell the user how
* much recovery information is being erased, and to let us initialise * much recovery information is being erased, and to let us initialise
* the journal transaction sequence numbers to the next unused ID. * the journal transaction sequence numbers to the next unused ID.
*/ */
int journal_skip_recovery(journal_t *journal)
{
int err; int err;
journal_superblock_t * sb; journal_superblock_t * sb;
......
This diff is collapsed.
...@@ -116,6 +116,49 @@ mpage_alloc(struct block_device *bdev, ...@@ -116,6 +116,49 @@ mpage_alloc(struct block_device *bdev,
return bio; return bio;
} }
/*
* support function for mpage_readpages. The fs supplied get_block might
* return an up to date buffer. This is used to map that buffer into
* the page, which allows readpage to avoid triggering a duplicate call
* to get_block.
*
* The idea is to avoid adding buffers to pages that don't already have
* them. So when the buffer is up to date and the page size == block size,
* this marks the page up to date instead of adding new buffers.
*/
static void
map_buffer_to_page(struct page *page, struct buffer_head *bh, int page_block)
{
struct inode *inode = page->mapping->host;
struct buffer_head *page_bh, *head;
int block = 0;
if (!page_has_buffers(page)) {
/*
* don't make any buffers if there is only one buffer on
* the page and the page just needs to be set up to date
*/
if (inode->i_blkbits == PAGE_CACHE_SHIFT &&
buffer_uptodate(bh)) {
SetPageUptodate(page);
return;
}
create_empty_buffers(page, 1 << inode->i_blkbits, 0);
}
head = page_buffers(page);
page_bh = head;
do {
if (block == page_block) {
page_bh->b_state = bh->b_state;
page_bh->b_bdev = bh->b_bdev;
page_bh->b_blocknr = bh->b_blocknr;
break;
}
page_bh = page_bh->b_this_page;
block++;
} while (page_bh != head);
}
/** /**
* mpage_readpages - populate an address space with some pages, and * mpage_readpages - populate an address space with some pages, and
* start reads against them. * start reads against them.
...@@ -186,6 +229,7 @@ do_mpage_readpage(struct bio *bio, struct page *page, unsigned nr_pages, ...@@ -186,6 +229,7 @@ do_mpage_readpage(struct bio *bio, struct page *page, unsigned nr_pages,
block_in_file = page->index << (PAGE_CACHE_SHIFT - blkbits); block_in_file = page->index << (PAGE_CACHE_SHIFT - blkbits);
last_block = (inode->i_size + blocksize - 1) >> blkbits; last_block = (inode->i_size + blocksize - 1) >> blkbits;
bh.b_page = page;
for (page_block = 0; page_block < blocks_per_page; for (page_block = 0; page_block < blocks_per_page;
page_block++, block_in_file++) { page_block++, block_in_file++) {
bh.b_state = 0; bh.b_state = 0;
...@@ -201,6 +245,17 @@ do_mpage_readpage(struct bio *bio, struct page *page, unsigned nr_pages, ...@@ -201,6 +245,17 @@ do_mpage_readpage(struct bio *bio, struct page *page, unsigned nr_pages,
continue; continue;
} }
/* some filesystems will copy data into the page during
* the get_block call, in which case we don't want to
* read it again. map_buffer_to_page copies the data
* we just collected from get_block into the page's buffers
* so readpage doesn't have to repeat the get_block call
*/
if (buffer_uptodate(&bh)) {
map_buffer_to_page(page, &bh, page_block);
goto confused;
}
if (first_hole != blocks_per_page) if (first_hole != blocks_per_page)
goto confused; /* hole -> non-hole */ goto confused; /* hole -> non-hole */
...@@ -256,7 +311,10 @@ do_mpage_readpage(struct bio *bio, struct page *page, unsigned nr_pages, ...@@ -256,7 +311,10 @@ do_mpage_readpage(struct bio *bio, struct page *page, unsigned nr_pages,
confused: confused:
if (bio) if (bio)
bio = mpage_bio_submit(READ, bio); bio = mpage_bio_submit(READ, bio);
if (!PageUptodate(page))
block_read_full_page(page, get_block); block_read_full_page(page, get_block);
else
unlock_page(page);
goto out; goto out;
} }
...@@ -344,6 +402,7 @@ mpage_writepage(struct bio *bio, struct page *page, get_block_t get_block, ...@@ -344,6 +402,7 @@ mpage_writepage(struct bio *bio, struct page *page, get_block_t get_block,
sector_t boundary_block = 0; sector_t boundary_block = 0;
struct block_device *boundary_bdev = NULL; struct block_device *boundary_bdev = NULL;
int length; int length;
struct buffer_head map_bh;
if (page_has_buffers(page)) { if (page_has_buffers(page)) {
struct buffer_head *head = page_buffers(page); struct buffer_head *head = page_buffers(page);
...@@ -401,8 +460,8 @@ mpage_writepage(struct bio *bio, struct page *page, get_block_t get_block, ...@@ -401,8 +460,8 @@ mpage_writepage(struct bio *bio, struct page *page, get_block_t get_block,
BUG_ON(!PageUptodate(page)); BUG_ON(!PageUptodate(page));
block_in_file = page->index << (PAGE_CACHE_SHIFT - blkbits); block_in_file = page->index << (PAGE_CACHE_SHIFT - blkbits);
last_block = (inode->i_size - 1) >> blkbits; last_block = (inode->i_size - 1) >> blkbits;
map_bh.b_page = page;
for (page_block = 0; page_block < blocks_per_page; ) { for (page_block = 0; page_block < blocks_per_page; ) {
struct buffer_head map_bh;
map_bh.b_state = 0; map_bh.b_state = 0;
if (get_block(inode, block_in_file, &map_bh, 1)) if (get_block(inode, block_in_file, &map_bh, 1))
...@@ -559,7 +618,6 @@ mpage_writepages(struct address_space *mapping, ...@@ -559,7 +618,6 @@ mpage_writepages(struct address_space *mapping,
int (*writepage)(struct page *page, struct writeback_control *wbc); int (*writepage)(struct page *page, struct writeback_control *wbc);
if (wbc->nonblocking && bdi_write_congested(bdi)) { if (wbc->nonblocking && bdi_write_congested(bdi)) {
blk_run_queues();
wbc->encountered_congestion = 1; wbc->encountered_congestion = 1;
return 0; return 0;
} }
...@@ -614,7 +672,6 @@ mpage_writepages(struct address_space *mapping, ...@@ -614,7 +672,6 @@ mpage_writepages(struct address_space *mapping,
if (ret || (--(wbc->nr_to_write) <= 0)) if (ret || (--(wbc->nr_to_write) <= 0))
done = 1; done = 1;
if (wbc->nonblocking && bdi_write_congested(bdi)) { if (wbc->nonblocking && bdi_write_congested(bdi)) {
blk_run_queues();
wbc->encountered_congestion = 1; wbc->encountered_congestion = 1;
done = 1; done = 1;
} }
......
...@@ -535,6 +535,10 @@ static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos, ...@@ -535,6 +535,10 @@ static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
if (retval) if (retval)
goto fput_in; goto fput_in;
retval = security_file_permission (in_file, MAY_READ);
if (retval)
goto fput_in;
/* /*
* Get output file, and verify that it is ok.. * Get output file, and verify that it is ok..
*/ */
...@@ -556,6 +560,10 @@ static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos, ...@@ -556,6 +560,10 @@ static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
if (retval) if (retval)
goto fput_out; goto fput_out;
retval = security_file_permission (out_file, MAY_WRITE);
if (retval)
goto fput_out;
if (!ppos) if (!ppos)
ppos = &in_file->f_pos; ppos = &in_file->f_pos;
......
This diff is collapsed.
...@@ -4,6 +4,8 @@ ...@@ -4,6 +4,8 @@
#include <linux/major.h> #include <linux/major.h>
#include <linux/genhd.h> #include <linux/genhd.h>
#include <linux/list.h> #include <linux/list.h>
#include <linux/timer.h>
#include <linux/workqueue.h>
#include <linux/pagemap.h> #include <linux/pagemap.h>
#include <linux/backing-dev.h> #include <linux/backing-dev.h>
#include <linux/wait.h> #include <linux/wait.h>
...@@ -188,6 +190,14 @@ struct request_queue ...@@ -188,6 +190,14 @@ struct request_queue
unplug_fn *unplug_fn; unplug_fn *unplug_fn;
merge_bvec_fn *merge_bvec_fn; merge_bvec_fn *merge_bvec_fn;
/*
* Auto-unplugging state
*/
struct timer_list unplug_timer;
int unplug_thresh; /* After this many requests */
unsigned long unplug_delay; /* After this many jiffies */
struct work_struct unplug_work;
struct backing_dev_info backing_dev_info; struct backing_dev_info backing_dev_info;
/* /*
......
...@@ -28,7 +28,7 @@ ...@@ -28,7 +28,7 @@
* indirection blocks, the group and superblock summaries, and the data * indirection blocks, the group and superblock summaries, and the data
* block to complete the transaction. */ * block to complete the transaction. */
#define EXT3_SINGLEDATA_TRANS_BLOCKS 8 #define EXT3_SINGLEDATA_TRANS_BLOCKS 8U
/* Extended attributes may touch two data buffers, two bitmap buffers, /* Extended attributes may touch two data buffers, two bitmap buffers,
* and two group and summaries. */ * and two group and summaries. */
...@@ -58,7 +58,7 @@ extern int ext3_writepage_trans_blocks(struct inode *inode); ...@@ -58,7 +58,7 @@ extern int ext3_writepage_trans_blocks(struct inode *inode);
* start off at the maximum transaction size and grow the transaction * start off at the maximum transaction size and grow the transaction
* optimistically as we go. */ * optimistically as we go. */
#define EXT3_MAX_TRANS_DATA 64 #define EXT3_MAX_TRANS_DATA 64U
/* We break up a large truncate or write transaction once the handle's /* We break up a large truncate or write transaction once the handle's
* buffer credits gets this low, we need either to extend the * buffer credits gets this low, we need either to extend the
...@@ -67,7 +67,7 @@ extern int ext3_writepage_trans_blocks(struct inode *inode); ...@@ -67,7 +67,7 @@ extern int ext3_writepage_trans_blocks(struct inode *inode);
* one block, plus two quota updates. Quota allocations are not * one block, plus two quota updates. Quota allocations are not
* needed. */ * needed. */
#define EXT3_RESERVE_TRANS_BLOCKS 12 #define EXT3_RESERVE_TRANS_BLOCKS 12U
#define EXT3_INDEX_EXTRA_TRANS_BLOCKS 8 #define EXT3_INDEX_EXTRA_TRANS_BLOCKS 8
......
...@@ -20,16 +20,32 @@ int hugetlb_prefault(struct address_space *, struct vm_area_struct *); ...@@ -20,16 +20,32 @@ int hugetlb_prefault(struct address_space *, struct vm_area_struct *);
void huge_page_release(struct page *); void huge_page_release(struct page *);
int hugetlb_report_meminfo(char *); int hugetlb_report_meminfo(char *);
int is_hugepage_mem_enough(size_t); int is_hugepage_mem_enough(size_t);
struct page *follow_huge_addr(struct mm_struct *mm, struct vm_area_struct *vma,
unsigned long address, int write);
struct vm_area_struct *hugepage_vma(struct mm_struct *mm,
unsigned long address);
struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
pmd_t *pmd, int write);
int pmd_huge(pmd_t pmd);
extern int htlbpage_max; extern int htlbpage_max;
static inline void
mark_mm_hugetlb(struct mm_struct *mm, struct vm_area_struct *vma)
{
if (is_vm_hugetlb_page(vma))
mm->used_hugetlb = 1;
}
#else /* !CONFIG_HUGETLB_PAGE */ #else /* !CONFIG_HUGETLB_PAGE */
static inline int is_vm_hugetlb_page(struct vm_area_struct *vma) static inline int is_vm_hugetlb_page(struct vm_area_struct *vma)
{ {
return 0; return 0;
} }
#define follow_hugetlb_page(m,v,p,vs,a,b,i) ({ BUG(); 0; }) #define follow_hugetlb_page(m,v,p,vs,a,b,i) ({ BUG(); 0; })
#define follow_huge_addr(mm, vma, addr, write) 0
#define copy_hugetlb_page_range(src, dst, vma) ({ BUG(); 0; }) #define copy_hugetlb_page_range(src, dst, vma) ({ BUG(); 0; })
#define hugetlb_prefault(mapping, vma) ({ BUG(); 0; }) #define hugetlb_prefault(mapping, vma) ({ BUG(); 0; })
#define zap_hugepage_range(vma, start, len) BUG() #define zap_hugepage_range(vma, start, len) BUG()
...@@ -37,6 +53,14 @@ static inline int is_vm_hugetlb_page(struct vm_area_struct *vma) ...@@ -37,6 +53,14 @@ static inline int is_vm_hugetlb_page(struct vm_area_struct *vma)
#define huge_page_release(page) BUG() #define huge_page_release(page) BUG()
#define is_hugepage_mem_enough(size) 0 #define is_hugepage_mem_enough(size) 0
#define hugetlb_report_meminfo(buf) 0 #define hugetlb_report_meminfo(buf) 0
#define hugepage_vma(mm, addr) 0
#define mark_mm_hugetlb(mm, vma) do { } while (0)
#define follow_huge_pmd(mm, addr, pmd, write) 0
#define pmd_huge(x) 0
#ifndef HPAGE_MASK
#define HPAGE_MASK 0 /* Keep the compiler happy */
#endif
#endif /* !CONFIG_HUGETLB_PAGE */ #endif /* !CONFIG_HUGETLB_PAGE */
......
This diff is collapsed.
...@@ -208,24 +208,55 @@ struct page { ...@@ -208,24 +208,55 @@ struct page {
* Also, many kernel routines increase the page count before a critical * Also, many kernel routines increase the page count before a critical
* routine so they can be sure the page doesn't go away from under them. * routine so they can be sure the page doesn't go away from under them.
*/ */
#define get_page(p) atomic_inc(&(p)->count)
#define __put_page(p) atomic_dec(&(p)->count)
#define put_page_testzero(p) \ #define put_page_testzero(p) \
({ \ ({ \
BUG_ON(page_count(page) == 0); \ BUG_ON(page_count(page) == 0); \
atomic_dec_and_test(&(p)->count); \ atomic_dec_and_test(&(p)->count); \
}) })
#define page_count(p) atomic_read(&(p)->count) #define page_count(p) atomic_read(&(p)->count)
#define set_page_count(p,v) atomic_set(&(p)->count, v) #define set_page_count(p,v) atomic_set(&(p)->count, v)
#define __put_page(p) atomic_dec(&(p)->count)
extern void FASTCALL(__page_cache_release(struct page *)); extern void FASTCALL(__page_cache_release(struct page *));
#ifdef CONFIG_HUGETLB_PAGE
static inline void get_page(struct page *page)
{
if (PageCompound(page))
page = (struct page *)page->lru.next;
atomic_inc(&page->count);
}
static inline void put_page(struct page *page)
{
if (PageCompound(page)) {
page = (struct page *)page->lru.next;
if (page->lru.prev) { /* destructor? */
(*(void (*)(struct page *))page->lru.prev)(page);
return;
}
}
if (!PageReserved(page) && put_page_testzero(page))
__page_cache_release(page);
}
#else /* CONFIG_HUGETLB_PAGE */
static inline void get_page(struct page *page)
{
atomic_inc(&page->count);
}
static inline void put_page(struct page *page) static inline void put_page(struct page *page)
{ {
if (!PageReserved(page) && put_page_testzero(page)) if (!PageReserved(page) && put_page_testzero(page))
__page_cache_release(page); __page_cache_release(page);
} }
#endif /* CONFIG_HUGETLB_PAGE */
/* /*
* Multiple processes may "see" the same page. E.g. for untouched * Multiple processes may "see" the same page. E.g. for untouched
* mappings of /dev/null, all processes see the same page full of * mappings of /dev/null, all processes see the same page full of
......
...@@ -72,7 +72,8 @@ ...@@ -72,7 +72,8 @@
#define PG_direct 16 /* ->pte_chain points directly at pte */ #define PG_direct 16 /* ->pte_chain points directly at pte */
#define PG_mappedtodisk 17 /* Has blocks allocated on-disk */ #define PG_mappedtodisk 17 /* Has blocks allocated on-disk */
#define PG_reclaim 18 /* To be recalimed asap */ #define PG_reclaim 18 /* To be reclaimed asap */
#define PG_compound 19 /* Part of a compound page */
/* /*
* Global page accounting. One instance per CPU. Only unsigned longs are * Global page accounting. One instance per CPU. Only unsigned longs are
...@@ -251,6 +252,10 @@ extern void get_full_page_state(struct page_state *ret); ...@@ -251,6 +252,10 @@ extern void get_full_page_state(struct page_state *ret);
#define ClearPageReclaim(page) clear_bit(PG_reclaim, &(page)->flags) #define ClearPageReclaim(page) clear_bit(PG_reclaim, &(page)->flags)
#define TestClearPageReclaim(page) test_and_clear_bit(PG_reclaim, &(page)->flags) #define TestClearPageReclaim(page) test_and_clear_bit(PG_reclaim, &(page)->flags)
#define PageCompound(page) test_bit(PG_compound, &(page)->flags)
#define SetPageCompound(page) set_bit(PG_compound, &(page)->flags)
#define ClearPageCompound(page) clear_bit(PG_compound, &(page)->flags)
/* /*
* The PageSwapCache predicate doesn't use a PG_flag at this time, * The PageSwapCache predicate doesn't use a PG_flag at this time,
* but it may again do so one day. * but it may again do so one day.
......
...@@ -201,7 +201,9 @@ struct mm_struct { ...@@ -201,7 +201,9 @@ struct mm_struct {
unsigned long swap_address; unsigned long swap_address;
unsigned dumpable:1; unsigned dumpable:1;
#ifdef CONFIG_HUGETLB_PAGE
int used_hugetlb;
#endif
/* Architecture-specific MM context */ /* Architecture-specific MM context */
mm_context_t context; mm_context_t context;
......
...@@ -37,30 +37,120 @@ ...@@ -37,30 +37,120 @@
#ifdef CONFIG_SMP #ifdef CONFIG_SMP
#include <asm/spinlock.h> #include <asm/spinlock.h>
/* #else
* !CONFIG_SMP and spin_lock_init not previously defined
* (e.g. by including include/asm/spinlock.h)
*/
#elif !defined(spin_lock_init)
#ifndef CONFIG_PREEMPT #if !defined(CONFIG_PREEMPT) && !defined(CONFIG_DEBUG_SPINLOCK)
# define atomic_dec_and_lock(atomic,lock) atomic_dec_and_test(atomic) # define atomic_dec_and_lock(atomic,lock) atomic_dec_and_test(atomic)
# define ATOMIC_DEC_AND_LOCK # define ATOMIC_DEC_AND_LOCK
#endif #endif
#ifdef CONFIG_DEBUG_SPINLOCK
#define SPINLOCK_MAGIC 0x1D244B3C
typedef struct {
unsigned long magic;
volatile unsigned long lock;
volatile unsigned int babble;
const char *module;
char *owner;
int oline;
} spinlock_t;
#define SPIN_LOCK_UNLOCKED (spinlock_t) { SPINLOCK_MAGIC, 0, 10, __FILE__ , NULL, 0}
#define spin_lock_init(x) \
do { \
(x)->magic = SPINLOCK_MAGIC; \
(x)->lock = 0; \
(x)->babble = 5; \
(x)->module = __FILE__; \
(x)->owner = NULL; \
(x)->oline = 0; \
} while (0)
#define CHECK_LOCK(x) \
do { \
if ((x)->magic != SPINLOCK_MAGIC) { \
printk(KERN_ERR "%s:%d: spin_is_locked on uninitialized spinlock %p.\n", \
__FILE__, __LINE__, (x)); \
} \
} while(0)
#define _raw_spin_lock(x) \
do { \
CHECK_LOCK(x); \
if ((x)->lock&&(x)->babble) { \
printk("%s:%d: spin_lock(%s:%p) already locked by %s/%d\n", \
__FILE__,__LINE__, (x)->module, \
(x), (x)->owner, (x)->oline); \
(x)->babble--; \
} \
(x)->lock = 1; \
(x)->owner = __FILE__; \
(x)->oline = __LINE__; \
} while (0)
/* without debugging, spin_is_locked on UP always says
* FALSE. --> printk if already locked. */
#define spin_is_locked(x) \
({ \
CHECK_LOCK(x); \
if ((x)->lock&&(x)->babble) { \
printk("%s:%d: spin_is_locked(%s:%p) already locked by %s/%d\n", \
__FILE__,__LINE__, (x)->module, \
(x), (x)->owner, (x)->oline); \
(x)->babble--; \
} \
0; \
})
/* without debugging, spin_trylock on UP always says
* TRUE. --> printk if already locked. */
#define _raw_spin_trylock(x) \
({ \
CHECK_LOCK(x); \
if ((x)->lock&&(x)->babble) { \
printk("%s:%d: spin_trylock(%s:%p) already locked by %s/%d\n", \
__FILE__,__LINE__, (x)->module, \
(x), (x)->owner, (x)->oline); \
(x)->babble--; \
} \
(x)->lock = 1; \
(x)->owner = __FILE__; \
(x)->oline = __LINE__; \
1; \
})
#define spin_unlock_wait(x) \
do { \
CHECK_LOCK(x); \
if ((x)->lock&&(x)->babble) { \
printk("%s:%d: spin_unlock_wait(%s:%p) owned by %s/%d\n", \
__FILE__,__LINE__, (x)->module, (x), \
(x)->owner, (x)->oline); \
(x)->babble--; \
}\
} while (0)
#define _raw_spin_unlock(x) \
do { \
CHECK_LOCK(x); \
if (!(x)->lock&&(x)->babble) { \
printk("%s:%d: spin_unlock(%s:%p) not locked\n", \
__FILE__,__LINE__, (x)->module, (x));\
(x)->babble--; \
} \
(x)->lock = 0; \
} while (0)
#else
/* /*
* gcc versions before ~2.95 have a nasty bug with empty initializers. * gcc versions before ~2.95 have a nasty bug with empty initializers.
*/ */
#if (__GNUC__ > 2) #if (__GNUC__ > 2)
typedef struct { } spinlock_t; typedef struct { } spinlock_t;
typedef struct { } rwlock_t;
#define SPIN_LOCK_UNLOCKED (spinlock_t) { } #define SPIN_LOCK_UNLOCKED (spinlock_t) { }
#define RW_LOCK_UNLOCKED (rwlock_t) { }
#else #else
typedef struct { int gcc_is_buggy; } spinlock_t; typedef struct { int gcc_is_buggy; } spinlock_t;
typedef struct { int gcc_is_buggy; } rwlock_t;
#define SPIN_LOCK_UNLOCKED (spinlock_t) { 0 } #define SPIN_LOCK_UNLOCKED (spinlock_t) { 0 }
#define RW_LOCK_UNLOCKED (rwlock_t) { 0 }
#endif #endif
/* /*
...@@ -72,6 +162,18 @@ ...@@ -72,6 +162,18 @@
#define _raw_spin_trylock(lock) ((void)(lock), 1) #define _raw_spin_trylock(lock) ((void)(lock), 1)
#define spin_unlock_wait(lock) do { (void)(lock); } while(0) #define spin_unlock_wait(lock) do { (void)(lock); } while(0)
#define _raw_spin_unlock(lock) do { (void)(lock); } while(0) #define _raw_spin_unlock(lock) do { (void)(lock); } while(0)
#endif /* CONFIG_DEBUG_SPINLOCK */
/* RW spinlocks: No debug version */
#if (__GNUC__ > 2)
typedef struct { } rwlock_t;
#define RW_LOCK_UNLOCKED (rwlock_t) { }
#else
typedef struct { int gcc_is_buggy; } rwlock_t;
#define RW_LOCK_UNLOCKED (rwlock_t) { 0 }
#endif
#define rwlock_init(lock) do { (void)(lock); } while(0) #define rwlock_init(lock) do { (void)(lock); } while(0)
#define _raw_read_lock(lock) do { (void)(lock); } while(0) #define _raw_read_lock(lock) do { (void)(lock); } while(0)
#define _raw_read_unlock(lock) do { (void)(lock); } while(0) #define _raw_read_unlock(lock) do { (void)(lock); } while(0)
......
...@@ -177,6 +177,7 @@ static int worker_thread(void *__startup) ...@@ -177,6 +177,7 @@ static int worker_thread(void *__startup)
current->flags |= PF_IOTHREAD; current->flags |= PF_IOTHREAD;
cwq->thread = current; cwq->thread = current;
set_user_nice(current, -10);
set_cpus_allowed(current, 1UL << cpu); set_cpus_allowed(current, 1UL << cpu);
spin_lock_irq(&current->sig->siglock); spin_lock_irq(&current->sig->siglock);
......
...@@ -259,9 +259,10 @@ void wait_on_page_bit(struct page *page, int bit_nr) ...@@ -259,9 +259,10 @@ void wait_on_page_bit(struct page *page, int bit_nr)
do { do {
prepare_to_wait(waitqueue, &wait, TASK_UNINTERRUPTIBLE); prepare_to_wait(waitqueue, &wait, TASK_UNINTERRUPTIBLE);
if (test_bit(bit_nr, &page->flags)) {
sync_page(page); sync_page(page);
if (test_bit(bit_nr, &page->flags))
io_schedule(); io_schedule();
}
} while (test_bit(bit_nr, &page->flags)); } while (test_bit(bit_nr, &page->flags));
finish_wait(waitqueue, &wait); finish_wait(waitqueue, &wait);
} }
...@@ -326,10 +327,11 @@ void __lock_page(struct page *page) ...@@ -326,10 +327,11 @@ void __lock_page(struct page *page)
while (TestSetPageLocked(page)) { while (TestSetPageLocked(page)) {
prepare_to_wait(wqh, &wait, TASK_UNINTERRUPTIBLE); prepare_to_wait(wqh, &wait, TASK_UNINTERRUPTIBLE);
if (PageLocked(page)) {
sync_page(page); sync_page(page);
if (PageLocked(page))
io_schedule(); io_schedule();
} }
}
finish_wait(wqh, &wait); finish_wait(wqh, &wait);
} }
EXPORT_SYMBOL(__lock_page); EXPORT_SYMBOL(__lock_page);
......
...@@ -53,8 +53,11 @@ int install_page(struct mm_struct *mm, struct vm_area_struct *vma, ...@@ -53,8 +53,11 @@ int install_page(struct mm_struct *mm, struct vm_area_struct *vma,
pte_t *pte, entry; pte_t *pte, entry;
pgd_t *pgd; pgd_t *pgd;
pmd_t *pmd; pmd_t *pmd;
struct pte_chain *pte_chain = NULL; struct pte_chain *pte_chain;
pte_chain = pte_chain_alloc(GFP_KERNEL);
if (!pte_chain)
goto err;
pgd = pgd_offset(mm, addr); pgd = pgd_offset(mm, addr);
spin_lock(&mm->page_table_lock); spin_lock(&mm->page_table_lock);
...@@ -62,7 +65,6 @@ int install_page(struct mm_struct *mm, struct vm_area_struct *vma, ...@@ -62,7 +65,6 @@ int install_page(struct mm_struct *mm, struct vm_area_struct *vma,
if (!pmd) if (!pmd)
goto err_unlock; goto err_unlock;
pte_chain = pte_chain_alloc(GFP_KERNEL);
pte = pte_alloc_map(mm, pmd, addr); pte = pte_alloc_map(mm, pmd, addr);
if (!pte) if (!pte)
goto err_unlock; goto err_unlock;
...@@ -87,6 +89,7 @@ int install_page(struct mm_struct *mm, struct vm_area_struct *vma, ...@@ -87,6 +89,7 @@ int install_page(struct mm_struct *mm, struct vm_area_struct *vma,
err_unlock: err_unlock:
spin_unlock(&mm->page_table_lock); spin_unlock(&mm->page_table_lock);
pte_chain_free(pte_chain); pte_chain_free(pte_chain);
err:
return err; return err;
} }
......
...@@ -607,13 +607,22 @@ follow_page(struct mm_struct *mm, unsigned long address, int write) ...@@ -607,13 +607,22 @@ follow_page(struct mm_struct *mm, unsigned long address, int write)
pmd_t *pmd; pmd_t *pmd;
pte_t *ptep, pte; pte_t *ptep, pte;
unsigned long pfn; unsigned long pfn;
struct vm_area_struct *vma;
vma = hugepage_vma(mm, address);
if (vma)
return follow_huge_addr(mm, vma, address, write);
pgd = pgd_offset(mm, address); pgd = pgd_offset(mm, address);
if (pgd_none(*pgd) || pgd_bad(*pgd)) if (pgd_none(*pgd) || pgd_bad(*pgd))
goto out; goto out;
pmd = pmd_offset(pgd, address); pmd = pmd_offset(pgd, address);
if (pmd_none(*pmd) || pmd_bad(*pmd)) if (pmd_none(*pmd))
goto out;
if (pmd_huge(*pmd))
return follow_huge_pmd(mm, address, pmd, write);
if (pmd_bad(*pmd))
goto out; goto out;
ptep = pte_offset_map(pmd, address); ptep = pte_offset_map(pmd, address);
...@@ -926,9 +935,19 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct * vma, ...@@ -926,9 +935,19 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct * vma,
struct page *old_page, *new_page; struct page *old_page, *new_page;
unsigned long pfn = pte_pfn(pte); unsigned long pfn = pte_pfn(pte);
struct pte_chain *pte_chain = NULL; struct pte_chain *pte_chain = NULL;
int ret;
if (!pfn_valid(pfn)) if (unlikely(!pfn_valid(pfn))) {
goto bad_wp_page; /*
* This should really halt the system so it can be debugged or
* at least the kernel stops what it's doing before it corrupts
* data, but for the moment just pretend this is OOM.
*/
pte_unmap(page_table);
printk(KERN_ERR "do_wp_page: bogus page at address %08lx\n",
address);
goto oom;
}
old_page = pfn_to_page(pfn); old_page = pfn_to_page(pfn);
if (!TestSetPageLocked(old_page)) { if (!TestSetPageLocked(old_page)) {
...@@ -936,10 +955,11 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct * vma, ...@@ -936,10 +955,11 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct * vma,
unlock_page(old_page); unlock_page(old_page);
if (reuse) { if (reuse) {
flush_cache_page(vma, address); flush_cache_page(vma, address);
establish_pte(vma, address, page_table, pte_mkyoung(pte_mkdirty(pte_mkwrite(pte)))); establish_pte(vma, address, page_table,
pte_mkyoung(pte_mkdirty(pte_mkwrite(pte))));
pte_unmap(page_table); pte_unmap(page_table);
spin_unlock(&mm->page_table_lock); ret = VM_FAULT_MINOR;
return VM_FAULT_MINOR; goto out;
} }
} }
pte_unmap(page_table); pte_unmap(page_table);
...@@ -950,11 +970,13 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct * vma, ...@@ -950,11 +970,13 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct * vma,
page_cache_get(old_page); page_cache_get(old_page);
spin_unlock(&mm->page_table_lock); spin_unlock(&mm->page_table_lock);
pte_chain = pte_chain_alloc(GFP_KERNEL);
if (!pte_chain)
goto no_mem;
new_page = alloc_page(GFP_HIGHUSER); new_page = alloc_page(GFP_HIGHUSER);
if (!new_page) if (!new_page)
goto no_mem; goto no_mem;
copy_cow_page(old_page,new_page,address); copy_cow_page(old_page,new_page,address);
pte_chain = pte_chain_alloc(GFP_KERNEL);
/* /*
* Re-check the pte - we dropped the lock * Re-check the pte - we dropped the lock
...@@ -973,25 +995,19 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct * vma, ...@@ -973,25 +995,19 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct * vma,
new_page = old_page; new_page = old_page;
} }
pte_unmap(page_table); pte_unmap(page_table);
spin_unlock(&mm->page_table_lock);
page_cache_release(new_page); page_cache_release(new_page);
page_cache_release(old_page); page_cache_release(old_page);
pte_chain_free(pte_chain); ret = VM_FAULT_MINOR;
return VM_FAULT_MINOR; goto out;
bad_wp_page:
pte_unmap(page_table);
spin_unlock(&mm->page_table_lock);
printk(KERN_ERR "do_wp_page: bogus page at address %08lx\n", address);
/*
* This should really halt the system so it can be debugged or
* at least the kernel stops what it's doing before it corrupts
* data, but for the moment just pretend this is OOM.
*/
return VM_FAULT_OOM;
no_mem: no_mem:
page_cache_release(old_page); page_cache_release(old_page);
return VM_FAULT_OOM; oom:
ret = VM_FAULT_OOM;
out:
spin_unlock(&mm->page_table_lock);
pte_chain_free(pte_chain);
return ret;
} }
static void vmtruncate_list(struct list_head *head, unsigned long pgoff) static void vmtruncate_list(struct list_head *head, unsigned long pgoff)
...@@ -1286,6 +1302,7 @@ do_no_page(struct mm_struct *mm, struct vm_area_struct *vma, ...@@ -1286,6 +1302,7 @@ do_no_page(struct mm_struct *mm, struct vm_area_struct *vma,
struct page * new_page; struct page * new_page;
pte_t entry; pte_t entry;
struct pte_chain *pte_chain; struct pte_chain *pte_chain;
int ret;
if (!vma->vm_ops || !vma->vm_ops->nopage) if (!vma->vm_ops || !vma->vm_ops->nopage)
return do_anonymous_page(mm, vma, page_table, return do_anonymous_page(mm, vma, page_table,
...@@ -1301,6 +1318,10 @@ do_no_page(struct mm_struct *mm, struct vm_area_struct *vma, ...@@ -1301,6 +1318,10 @@ do_no_page(struct mm_struct *mm, struct vm_area_struct *vma,
if (new_page == NOPAGE_OOM) if (new_page == NOPAGE_OOM)
return VM_FAULT_OOM; return VM_FAULT_OOM;
pte_chain = pte_chain_alloc(GFP_KERNEL);
if (!pte_chain)
goto oom;
/* /*
* Should we do an early C-O-W break? * Should we do an early C-O-W break?
*/ */
...@@ -1308,7 +1329,7 @@ do_no_page(struct mm_struct *mm, struct vm_area_struct *vma, ...@@ -1308,7 +1329,7 @@ do_no_page(struct mm_struct *mm, struct vm_area_struct *vma,
struct page * page = alloc_page(GFP_HIGHUSER); struct page * page = alloc_page(GFP_HIGHUSER);
if (!page) { if (!page) {
page_cache_release(new_page); page_cache_release(new_page);
return VM_FAULT_OOM; goto oom;
} }
copy_user_highpage(page, new_page, address); copy_user_highpage(page, new_page, address);
page_cache_release(new_page); page_cache_release(new_page);
...@@ -1316,7 +1337,6 @@ do_no_page(struct mm_struct *mm, struct vm_area_struct *vma, ...@@ -1316,7 +1337,6 @@ do_no_page(struct mm_struct *mm, struct vm_area_struct *vma,
new_page = page; new_page = page;
} }
pte_chain = pte_chain_alloc(GFP_KERNEL);
spin_lock(&mm->page_table_lock); spin_lock(&mm->page_table_lock);
page_table = pte_offset_map(pmd, address); page_table = pte_offset_map(pmd, address);
...@@ -1346,15 +1366,20 @@ do_no_page(struct mm_struct *mm, struct vm_area_struct *vma, ...@@ -1346,15 +1366,20 @@ do_no_page(struct mm_struct *mm, struct vm_area_struct *vma,
pte_unmap(page_table); pte_unmap(page_table);
page_cache_release(new_page); page_cache_release(new_page);
spin_unlock(&mm->page_table_lock); spin_unlock(&mm->page_table_lock);
pte_chain_free(pte_chain); ret = VM_FAULT_MINOR;
return VM_FAULT_MINOR; goto out;
} }
/* no need to invalidate: a not-present page shouldn't be cached */ /* no need to invalidate: a not-present page shouldn't be cached */
update_mmu_cache(vma, address, entry); update_mmu_cache(vma, address, entry);
spin_unlock(&mm->page_table_lock); spin_unlock(&mm->page_table_lock);
ret = VM_FAULT_MAJOR;
goto out;
oom:
ret = VM_FAULT_OOM;
out:
pte_chain_free(pte_chain); pte_chain_free(pte_chain);
return VM_FAULT_MAJOR; return ret;
} }
/* /*
...@@ -1422,6 +1447,10 @@ int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct * vma, ...@@ -1422,6 +1447,10 @@ int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct * vma,
pgd = pgd_offset(mm, address); pgd = pgd_offset(mm, address);
inc_page_state(pgfault); inc_page_state(pgfault);
if (is_vm_hugetlb_page(vma))
return VM_FAULT_SIGBUS; /* mapping truncation does this. */
/* /*
* We need the page table lock to synchronize with kswapd * We need the page table lock to synchronize with kswapd
* and the SMP-safe atomic PTE updates. * and the SMP-safe atomic PTE updates.
......
...@@ -362,6 +362,7 @@ static void vma_link(struct mm_struct *mm, struct vm_area_struct *vma, ...@@ -362,6 +362,7 @@ static void vma_link(struct mm_struct *mm, struct vm_area_struct *vma,
if (mapping) if (mapping)
up(&mapping->i_shared_sem); up(&mapping->i_shared_sem);
mark_mm_hugetlb(mm, vma);
mm->map_count++; mm->map_count++;
validate_mm(mm); validate_mm(mm);
} }
...@@ -1222,6 +1223,11 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len) ...@@ -1222,6 +1223,11 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len)
return 0; return 0;
/* we have start < mpnt->vm_end */ /* we have start < mpnt->vm_end */
if (is_vm_hugetlb_page(mpnt)) {
if ((start & ~HPAGE_MASK) || (len & ~HPAGE_MASK))
return -EINVAL;
}
/* if it doesn't overlap, we have nothing.. */ /* if it doesn't overlap, we have nothing.. */
end = start + len; end = start + len;
if (mpnt->vm_start >= end) if (mpnt->vm_start >= end)
...@@ -1423,7 +1429,6 @@ void exit_mmap(struct mm_struct *mm) ...@@ -1423,7 +1429,6 @@ void exit_mmap(struct mm_struct *mm)
kmem_cache_free(vm_area_cachep, vma); kmem_cache_free(vm_area_cachep, vma);
vma = next; vma = next;
} }
} }
/* Insert vm structure into process list sorted by address /* Insert vm structure into process list sorted by address
......
...@@ -24,9 +24,9 @@ ...@@ -24,9 +24,9 @@
static pte_t *get_one_pte_map_nested(struct mm_struct *mm, unsigned long addr) static pte_t *get_one_pte_map_nested(struct mm_struct *mm, unsigned long addr)
{ {
pgd_t * pgd; pgd_t *pgd;
pmd_t * pmd; pmd_t *pmd;
pte_t * pte = NULL; pte_t *pte = NULL;
pgd = pgd_offset(mm, addr); pgd = pgd_offset(mm, addr);
if (pgd_none(*pgd)) if (pgd_none(*pgd))
...@@ -73,8 +73,8 @@ static inline int page_table_present(struct mm_struct *mm, unsigned long addr) ...@@ -73,8 +73,8 @@ static inline int page_table_present(struct mm_struct *mm, unsigned long addr)
static inline pte_t *alloc_one_pte_map(struct mm_struct *mm, unsigned long addr) static inline pte_t *alloc_one_pte_map(struct mm_struct *mm, unsigned long addr)
{ {
pmd_t * pmd; pmd_t *pmd;
pte_t * pte = NULL; pte_t *pte = NULL;
pmd = pmd_alloc(mm, pgd_offset(mm, addr), addr); pmd = pmd_alloc(mm, pgd_offset(mm, addr), addr);
if (pmd) if (pmd)
...@@ -88,7 +88,7 @@ copy_one_pte(struct mm_struct *mm, pte_t *src, pte_t *dst, ...@@ -88,7 +88,7 @@ copy_one_pte(struct mm_struct *mm, pte_t *src, pte_t *dst,
{ {
int error = 0; int error = 0;
pte_t pte; pte_t pte;
struct page * page = NULL; struct page *page = NULL;
if (pte_present(*src)) if (pte_present(*src))
page = pte_page(*src); page = pte_page(*src);
...@@ -183,12 +183,12 @@ static int move_page_tables(struct vm_area_struct *vma, ...@@ -183,12 +183,12 @@ static int move_page_tables(struct vm_area_struct *vma,
return -1; return -1;
} }
static unsigned long move_vma(struct vm_area_struct * vma, static unsigned long move_vma(struct vm_area_struct *vma,
unsigned long addr, unsigned long old_len, unsigned long new_len, unsigned long addr, unsigned long old_len, unsigned long new_len,
unsigned long new_addr) unsigned long new_addr)
{ {
struct mm_struct * mm = vma->vm_mm; struct mm_struct *mm = vma->vm_mm;
struct vm_area_struct * new_vma, * next, * prev; struct vm_area_struct *new_vma, *next, *prev;
int allocated_vma; int allocated_vma;
int split = 0; int split = 0;
...@@ -196,14 +196,16 @@ static unsigned long move_vma(struct vm_area_struct * vma, ...@@ -196,14 +196,16 @@ static unsigned long move_vma(struct vm_area_struct * vma,
next = find_vma_prev(mm, new_addr, &prev); next = find_vma_prev(mm, new_addr, &prev);
if (next) { if (next) {
if (prev && prev->vm_end == new_addr && if (prev && prev->vm_end == new_addr &&
can_vma_merge(prev, vma->vm_flags) && !vma->vm_file && !(vma->vm_flags & VM_SHARED)) { can_vma_merge(prev, vma->vm_flags) && !vma->vm_file &&
!(vma->vm_flags & VM_SHARED)) {
spin_lock(&mm->page_table_lock); spin_lock(&mm->page_table_lock);
prev->vm_end = new_addr + new_len; prev->vm_end = new_addr + new_len;
spin_unlock(&mm->page_table_lock); spin_unlock(&mm->page_table_lock);
new_vma = prev; new_vma = prev;
if (next != prev->vm_next) if (next != prev->vm_next)
BUG(); BUG();
if (prev->vm_end == next->vm_start && can_vma_merge(next, prev->vm_flags)) { if (prev->vm_end == next->vm_start &&
can_vma_merge(next, prev->vm_flags)) {
spin_lock(&mm->page_table_lock); spin_lock(&mm->page_table_lock);
prev->vm_end = next->vm_end; prev->vm_end = next->vm_end;
__vma_unlink(mm, next, prev); __vma_unlink(mm, next, prev);
...@@ -214,7 +216,8 @@ static unsigned long move_vma(struct vm_area_struct * vma, ...@@ -214,7 +216,8 @@ static unsigned long move_vma(struct vm_area_struct * vma,
kmem_cache_free(vm_area_cachep, next); kmem_cache_free(vm_area_cachep, next);
} }
} else if (next->vm_start == new_addr + new_len && } else if (next->vm_start == new_addr + new_len &&
can_vma_merge(next, vma->vm_flags) && !vma->vm_file && !(vma->vm_flags & VM_SHARED)) { can_vma_merge(next, vma->vm_flags) &&
!vma->vm_file && !(vma->vm_flags & VM_SHARED)) {
spin_lock(&mm->page_table_lock); spin_lock(&mm->page_table_lock);
next->vm_start = new_addr; next->vm_start = new_addr;
spin_unlock(&mm->page_table_lock); spin_unlock(&mm->page_table_lock);
...@@ -223,7 +226,8 @@ static unsigned long move_vma(struct vm_area_struct * vma, ...@@ -223,7 +226,8 @@ static unsigned long move_vma(struct vm_area_struct * vma,
} else { } else {
prev = find_vma(mm, new_addr-1); prev = find_vma(mm, new_addr-1);
if (prev && prev->vm_end == new_addr && if (prev && prev->vm_end == new_addr &&
can_vma_merge(prev, vma->vm_flags) && !vma->vm_file && !(vma->vm_flags & VM_SHARED)) { can_vma_merge(prev, vma->vm_flags) && !vma->vm_file &&
!(vma->vm_flags & VM_SHARED)) {
spin_lock(&mm->page_table_lock); spin_lock(&mm->page_table_lock);
prev->vm_end = new_addr + new_len; prev->vm_end = new_addr + new_len;
spin_unlock(&mm->page_table_lock); spin_unlock(&mm->page_table_lock);
...@@ -249,7 +253,7 @@ static unsigned long move_vma(struct vm_area_struct * vma, ...@@ -249,7 +253,7 @@ static unsigned long move_vma(struct vm_area_struct * vma,
INIT_LIST_HEAD(&new_vma->shared); INIT_LIST_HEAD(&new_vma->shared);
new_vma->vm_start = new_addr; new_vma->vm_start = new_addr;
new_vma->vm_end = new_addr+new_len; new_vma->vm_end = new_addr+new_len;
new_vma->vm_pgoff += (addr - vma->vm_start) >> PAGE_SHIFT; new_vma->vm_pgoff += (addr-vma->vm_start) >> PAGE_SHIFT;
if (new_vma->vm_file) if (new_vma->vm_file)
get_file(new_vma->vm_file); get_file(new_vma->vm_file);
if (new_vma->vm_ops && new_vma->vm_ops->open) if (new_vma->vm_ops && new_vma->vm_ops->open)
...@@ -428,7 +432,8 @@ unsigned long do_mremap(unsigned long addr, ...@@ -428,7 +432,8 @@ unsigned long do_mremap(unsigned long addr,
if (vma->vm_flags & VM_SHARED) if (vma->vm_flags & VM_SHARED)
map_flags |= MAP_SHARED; map_flags |= MAP_SHARED;
new_addr = get_unmapped_area(vma->vm_file, 0, new_len, vma->vm_pgoff, map_flags); new_addr = get_unmapped_area(vma->vm_file, 0, new_len,
vma->vm_pgoff, map_flags);
ret = new_addr; ret = new_addr;
if (new_addr & ~PAGE_MASK) if (new_addr & ~PAGE_MASK)
goto out; goto out;
......
...@@ -237,7 +237,6 @@ static void background_writeout(unsigned long _min_pages) ...@@ -237,7 +237,6 @@ static void background_writeout(unsigned long _min_pages)
break; break;
} }
} }
blk_run_queues();
} }
/* /*
...@@ -308,7 +307,6 @@ static void wb_kupdate(unsigned long arg) ...@@ -308,7 +307,6 @@ static void wb_kupdate(unsigned long arg)
} }
nr_to_write -= MAX_WRITEBACK_PAGES - wbc.nr_to_write; nr_to_write -= MAX_WRITEBACK_PAGES - wbc.nr_to_write;
} }
blk_run_queues();
if (time_before(next_jif, jiffies + HZ)) if (time_before(next_jif, jiffies + HZ))
next_jif = jiffies + HZ; next_jif = jiffies + HZ;
mod_timer(&wb_timer, next_jif); mod_timer(&wb_timer, next_jif);
......
...@@ -85,6 +85,62 @@ static void bad_page(const char *function, struct page *page) ...@@ -85,6 +85,62 @@ static void bad_page(const char *function, struct page *page)
page->mapping = NULL; page->mapping = NULL;
} }
#ifndef CONFIG_HUGETLB_PAGE
#define prep_compound_page(page, order) do { } while (0)
#define destroy_compound_page(page, order) do { } while (0)
#else
/*
* Higher-order pages are called "compound pages". They are structured thusly:
*
* The first PAGE_SIZE page is called the "head page".
*
* The remaining PAGE_SIZE pages are called "tail pages".
*
* All pages have PG_compound set. All pages have their lru.next pointing at
* the head page (even the head page has this).
*
* The head page's lru.prev, if non-zero, holds the address of the compound
* page's put_page() function.
*
* The order of the allocation is stored in the first tail page's lru.prev.
* This is only for debug at present. This usage means that zero-order pages
* may not be compound.
*/
static void prep_compound_page(struct page *page, int order)
{
int i;
int nr_pages = 1 << order;
page->lru.prev = NULL;
page[1].lru.prev = (void *)order;
for (i = 0; i < nr_pages; i++) {
struct page *p = page + i;
SetPageCompound(p);
p->lru.next = (void *)page;
}
}
static void destroy_compound_page(struct page *page, int order)
{
int i;
int nr_pages = 1 << order;
if (page[1].lru.prev != (void *)order)
bad_page(__FUNCTION__, page);
for (i = 0; i < nr_pages; i++) {
struct page *p = page + i;
if (!PageCompound(p))
bad_page(__FUNCTION__, page);
if (p->lru.next != (void *)page)
bad_page(__FUNCTION__, page);
ClearPageCompound(p);
}
}
#endif /* CONFIG_HUGETLB_PAGE */
/* /*
* Freeing function for a buddy system allocator. * Freeing function for a buddy system allocator.
* *
...@@ -114,6 +170,8 @@ static inline void __free_pages_bulk (struct page *page, struct page *base, ...@@ -114,6 +170,8 @@ static inline void __free_pages_bulk (struct page *page, struct page *base,
{ {
unsigned long page_idx, index; unsigned long page_idx, index;
if (order)
destroy_compound_page(page, order);
page_idx = page - base; page_idx = page - base;
if (page_idx & ~mask) if (page_idx & ~mask)
BUG(); BUG();
...@@ -409,6 +467,12 @@ void free_cold_page(struct page *page) ...@@ -409,6 +467,12 @@ void free_cold_page(struct page *page)
free_hot_cold_page(page, 1); free_hot_cold_page(page, 1);
} }
/*
* Really, prep_compound_page() should be called from __rmqueue_bulk(). But
* we cheat by calling it from here, in the order > 0 path. Saves a branch
* or two.
*/
static struct page *buffered_rmqueue(struct zone *zone, int order, int cold) static struct page *buffered_rmqueue(struct zone *zone, int order, int cold)
{ {
unsigned long flags; unsigned long flags;
...@@ -435,6 +499,8 @@ static struct page *buffered_rmqueue(struct zone *zone, int order, int cold) ...@@ -435,6 +499,8 @@ static struct page *buffered_rmqueue(struct zone *zone, int order, int cold)
spin_lock_irqsave(&zone->lock, flags); spin_lock_irqsave(&zone->lock, flags);
page = __rmqueue(zone, order); page = __rmqueue(zone, order);
spin_unlock_irqrestore(&zone->lock, flags); spin_unlock_irqrestore(&zone->lock, flags);
if (order && page)
prep_compound_page(page, order);
} }
if (page != NULL) { if (page != NULL) {
......
...@@ -236,10 +236,8 @@ __do_page_cache_readahead(struct address_space *mapping, struct file *filp, ...@@ -236,10 +236,8 @@ __do_page_cache_readahead(struct address_space *mapping, struct file *filp,
* uptodate then the caller will launch readpage again, and * uptodate then the caller will launch readpage again, and
* will then handle the error. * will then handle the error.
*/ */
if (ret) { if (ret)
read_pages(mapping, filp, &page_pool, ret); read_pages(mapping, filp, &page_pool, ret);
blk_run_queues();
}
BUG_ON(!list_empty(&page_pool)); BUG_ON(!list_empty(&page_pool));
out: out:
return ret; return ret;
......
...@@ -439,7 +439,6 @@ struct arraycache_init initarray_generic __initdata = { { 0, BOOT_CPUCACHE_ENTRI ...@@ -439,7 +439,6 @@ struct arraycache_init initarray_generic __initdata = { { 0, BOOT_CPUCACHE_ENTRI
static kmem_cache_t cache_cache = { static kmem_cache_t cache_cache = {
.lists = LIST3_INIT(cache_cache.lists), .lists = LIST3_INIT(cache_cache.lists),
/* Allow for boot cpu != 0 */ /* Allow for boot cpu != 0 */
.array = { [0 ... NR_CPUS-1] = &initarray_cache.cache },
.batchcount = 1, .batchcount = 1,
.limit = BOOT_CPUCACHE_ENTRIES, .limit = BOOT_CPUCACHE_ENTRIES,
.objsize = sizeof(kmem_cache_t), .objsize = sizeof(kmem_cache_t),
...@@ -611,6 +610,7 @@ void __init kmem_cache_init(void) ...@@ -611,6 +610,7 @@ void __init kmem_cache_init(void)
init_MUTEX(&cache_chain_sem); init_MUTEX(&cache_chain_sem);
INIT_LIST_HEAD(&cache_chain); INIT_LIST_HEAD(&cache_chain);
list_add(&cache_cache.next, &cache_chain); list_add(&cache_cache.next, &cache_chain);
cache_cache.array[smp_processor_id()] = &initarray_cache.cache;
cache_estimate(0, cache_cache.objsize, 0, cache_estimate(0, cache_cache.objsize, 0,
&left_over, &cache_cache.num); &left_over, &cache_cache.num);
......
...@@ -957,7 +957,6 @@ int kswapd(void *p) ...@@ -957,7 +957,6 @@ int kswapd(void *p)
finish_wait(&pgdat->kswapd_wait, &wait); finish_wait(&pgdat->kswapd_wait, &wait);
get_page_state(&ps); get_page_state(&ps);
balance_pgdat(pgdat, 0, &ps); balance_pgdat(pgdat, 0, &ps);
blk_run_queues();
} }
} }
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment