Commit edf3c7a4 authored by Dave Kleikamp

Add JFS file system

parent a814d16f
@@ -54,6 +54,7 @@ o binutils 2.9.5.0.25 # ld -v
o util-linux 2.10o # fdformat --version
o modutils 2.4.2 # insmod -V
o e2fsprogs 1.25 # tune2fs
o jfsutils 1.0.14 # fsck.jfs -V
o reiserfsprogs 3.x.0j # reiserfsck 2>&1|grep reiserfsprogs
o pcmcia-cs 3.1.21 # cardmgr -V
o PPP 2.4.0 # pppd --version
@@ -106,8 +107,8 @@ assembling the 16-bit boot code, removing the need for as86 to compile
your kernel. This change does, however, mean that you need a recent
release of binutils.
System utilities System utililities
================ ==================
Architectural changes
---------------------
@@ -165,6 +166,16 @@ E2fsprogs
The latest version of e2fsprogs fixes several bugs in fsck and
debugfs. Obviously, it's a good idea to upgrade.
JFSutils
--------
The jfsutils package contains the utilities for the file system.
The following utilities are available:
o fsck.jfs - initiate replay of the transaction log, and check
and repair a JFS formatted partition.
o mkfs.jfs - create a JFS formatted partition.
o other file system utilities are also available in this package.
Reiserfsprogs
-------------
@@ -303,6 +314,10 @@ E2fsprogs
---------
o <http://prdownloads.sourceforge.net/e2fsprogs/e2fsprogs-1.25.tar.gz>
JFSutils
--------
o <http://oss.software.ibm.com/jfs>
Reiserfsprogs
-------------
o <ftp://ftp.namesys.com/pub/reiserfsprogs/reiserfsprogs-3.x.0j.tar.gz>
......
@@ -22,6 +22,8 @@ hpfs.txt
- info and mount options for the OS/2 HPFS.
isofs.txt
- info and mount options for the ISO 9660 (CDROM) filesystem.
jfs.txt
- info and mount options for the JFS filesystem.
ncpfs.txt
- info on Novell Netware(tm) filesystem using NCP protocol.
ntfs.txt
......
IBM's Journaled File System (JFS) for Linux version 1.0.15
Team members
Steve Best sbest@us.ibm.com
Dave Kleikamp shaggy@austin.ibm.com
Barry Arndt barndt@us.ibm.com
Christoph Hellwig hch@caldera.de
Release February 15, 2002 (version 1.0.15)
This is our fifty-third release of IBM's Enterprise JFS technology port to Linux.
Beta 1 was release 0.1.0 on 12/8/2000, Beta 2 was release 0.2.0 on 3/7/2001,
Beta 3 was release 0.3.0 on 4/30/2001, and release 1.0.0 on 6/28/2001.
Function and Fixes in drop 53 (1.0.15)
- Fix trap when appending to very large file
- Moving jfs headers into fs/jfs at Linus' request
- Move up to linux-2.5.4
- Fix file size limit on 32-bit (Andi Kleen)
- make changelog more readable and include only 1.0.0 and above (Christoph Hellwig)
- Don't allocate metadata pages from high memory. JFS keeps them kmapped too long causing deadlock.
- Fix xtree corruption when creating file with >= 64 GB of physically contiguous dasd
- Replace semaphore with struct completion for thread startup/shutdown (Benedikt Spranger)
- cleanup Tx alloc/free (Christoph Hellwig)
- Move up to linux-2.5.3
- thread cleanups (Christoph Hellwig)
- First step toward making tblocks and tlocks dynamically allocated. Intro tid_t and lid_t to
insulate the majority of the code from future changes. Also hide TxBlock and TxLock arrays
by using macros to get from tids and lids to real structures.
- minor list-handling cleanup (Christoph Hellwig)
- Replace altnext and altprev with struct list_head
- Clean up the debugging code and add support for collecting statistics (Christoph Hellwig)
Function and Fixes in drop 52 (1.0.14)
- Fix hang in invalidate_metapages when jfs.o is built as a module
- Fix anon_list removal logic in txLock
Function and Fixes in drop 51 (1.0.13)
- chmod changes on newly created directories are lost after umount (bug 2535)
- Page locking race fixes
- Improve metapage locking
- Fix timing window. Lock page while metapage is active to avoid page going
away before the metadata is released. (Fixed crash during mount/umount testing)
- Make changes for 2.5.2 kernel
- Fix race condition truncating large files
Function and Fixes in drop50 (1.0.12)
- Add O_DIRECT support
- Add support for 2.4.17 kernel
- Make sure COMMIT_STALE gets reset before the inode is unlocked. Fixing
this gets rid of XT_GETPAGE errors
- Remove invalid __exit keyword from metapage_exit and txExit.
- fix assert(log->cqueue.head == NULL) by waiting longer
Function and Fixes in drop49 (1.0.11)
- Readdir was not handling multibyte codepages correctly.
- Make mount option parsing more robust.
- Add iocharset mount option.
- Journalling of symlinks incorrect, resulting in logredo failure of -265.
- Add jfsutils information to Changes file
- Improve recoverability of the file system when metadata corruption is detected.
- Fix kernel OOPS when root inode is corrupted
Function and Fixes in drop48 (1.0.10)
- put inodes later on hash queues
- Fix boundary case in xtTruncate
- When invalidating metadata, try to flush the dirty buffers rather than sync them.
- Add another sanity check to avoid trapping when imap is corrupt
- Fix file truncate while removing large file (assert(cmp == 0))
- read_cache_page returns ERR_PTR, not NULL on error
- Add dtSearchNode and dtRelocate
- JFS needs to use generic_file_open & generic_file_llseek
- Remove lazyQwait, etc. It created an unnecessary bottleneck in TxBegin.
Function and Fixes in drop47 (1.0.9)
- Fix data corruption problem when creating files while deleting others. (jitterbug 183)
- Make sure all metadata is written before finalizing the log
- Fix serialization problem in shutdown by setting i_size of directory sooner. (bugzilla #334)
- JFS should quit whining when special files are marked dirty during read-only mount.
- Must always check rc after DT_GETPAGE
- Add diExtendFS
- Removing defconfig from JFS source - not really needed
Function and Fixes in drop46 (1.0.8)
- Synclist was being built backwards causing logredo to quit too early
- jfs_compat.h needs to include module.h
- uncomment EXPORTS_NO_SYMBOLS in super.c
- Minor code cleanup
- xtree of zero-truncated file not being logged
- Fix logging on file truncate
- remove unused metapage fields
Function and Fixes in drop45 (1.0.7)
- cleanup remove IS_KIOBUFIO define.
- cleanup remove TRUNC_NO_TOSS define.
- have jFYI's use the name directly from dentry
- Remove nul _ALLOC and _FREE macros and also make spinlocks static.
- cleanup add externs where needed in the header files
- jfs_write_inode is a bad place to call iput. Also limit warnings.
- More truncate cleanup
- Truncate cleanup
- Add missing statics in jfs_metapage.c
- fsync fixes
- Clean up symlink code - use page_symlink_inode_operations
- unicode handling cleanup
- cleanup replace UniChar with wchar_t
- Get rid of CDLL_* macros - use list.h instead
- 2.4.11-prex mount problem Call new_inode instead of get_empty_inode
- use kernel min/max macros
- Add MODULE_LICENSE stub for older kernels
- IA64/gcc3 fixes
- Log Manager fixes, introduce __SLEEP_COND macro
- Mark superblock dirty when some errors detected (forcing fsck to be run).
- More robust remounting from r/o to r/w.
- Misc. cleanup add static where appropriate
- small cleanup in jfs_umount_rw
- add MODULE_ stuff
- Set *dropped_lock in alloc_metapage
- Get rid of unused log list
- cleanup jfs_imap.c to remove _OLD_STUFF and _NO_MORE_MOUNT_INODE defines
- Log manager cleanup
- Transaction manager cleanup
- correct memory allocations flags
- Better handling of iterative truncation
- Change continue to break, otherwise we don't re-acquire LAZY_LOCK
Function and Fixes in drop44 (1.0.6)
- Create jfs_incore.h which merges linux/jfs_fs.h, linux/jfs_fs_i.h, and jfs_fs_sb.h
- Create a configuration option to handle JFS_DEBUG define
- Fixed a few cases where positive error codes were returned to the VFS.
- Replace jfs_dir_read by generic_read_dir.
- jfs_fsync_inode is only called by jfs_fsync_file, merge the two and rename to jfs_fsync.
- Add a bunch of missing externs.
- jfs_rwlock_lock is unused, nuke it.
- Always use atomic set/test_bit operations to protect jfs_ip->cflag
- Combine jfs_ip->flag with jfs_ip->cflag
- Fixed minor format errors reported by fsck
- cflags should be long so bitops always works correctly
- Use GFP_NOFS for runtime memory allocations
- Support VM changes in 2.4.10 of the kernel
- Remove ifdefs supporting older 2.4 kernels. JFS now requires at least 2.4.3 or 2.4.2-ac2
- Simplify and remove one use of IWRITE_TRYLOCK
- jfs_truncate was not passing tid to xtTruncate
- removed obsolete extent_page workaround
- correct recovery from failed diAlloc call (disk full)
- In write_metapage, don't call commit_write if prepare_write failed
Function and Fixes in drop43 (1.0.5)
- Allow separate allocation of JFS-private superblock/inode data.
- Remove checks in namei.c that are already done by the VFS.
- Remove redundant mutex defines.
- Replace all occurrences of #include <linux/malloc.h> with #include <linux/slab.h>
- Work around race condition in remount - fixes OOPS during shutdown
- Truncate large files incrementally (affects directories too)
Function and Fixes in drop42 (1.0.4)
- Fixed compiler warnings in the FS when building on 64-bit systems
- Fixed deadlock where jfsCommit hung in hold_metapage
- Fixed problems with remount
- Reserve metapages for jfsCommit thread
- Get rid of buggy invalidate_metapage & use discard_metapage
- Don't hand metapages to jfsIOthread (too many context switches) (jitterbug 125, bugzilla 238)
- Fix error message in jfs_strtoUCS
Function and Fixes in drop41 (1.0.3)
- Patch to move from previous release to latest release needs to update the version number in super.c
- Jitterbug problems (134,140,152) removing files have been fixed
- Set rc=ENOSPC if ialloc fails in jfs_create and jfs_mkdir
- Fixed jfs_txnmgr.c 775! assert
- Fixed jfs_txnmgr.c 884! assert(mp->nohomeok==0)
- Fix hang - prevent tblocks from being exhausted
- Fix oops trying to mount reiserfs
- Fail more gracefully in jfs_imap.c
- Print more information when char2uni fails
- Fix timing problem between Block map and metapage cache - jitterbug 139
- Code Cleanup (removed many ifdef's, obsolete code, ran code through indent) Mostly 2.4 tree
- Split source tree (Now have a separate source tree for 2.2, 2.4, and jfsutils)
Function and Fixes in drop40 (1.0.2)
- Fixed multiple truncate hang
- Fixed hang on unlink a file and sync happening at the same time
- Improved handling of kmalloc error conditions
- Fixed hang in blk_get_queue and SMP deadlock: bh_end_io call generic_make_request
(jitterbug 145 and 146)
- stbl was not set correctly in dtDelete
- changed trap to printk in dbAllocAG to avoid system hang
Function and Fixes in drop 39 (1.0.1)
- Fixed hang during copying files on 2.2.x series
- Fixed TxLock compile problem
- Fixed to correctly update the number of blocks for directories (this was causing the FS
to show fsck error after compiling mozilla).
- Fixed to prevent old data from being written to disk from the page cache.
Function and Fixes in drop 38 (1.0.0)
- Fixed some general log problems
Please send bugs, comments, cards and letters to linuxjfs@us.ibm.com.
The JFS mailing list can be subscribed to by using the link labeled "Mail list Subscribe"
at our web page http://oss.software.ibm.com/jfs/.
IBM's Journaled File System (JFS) for Linux version 1.0.15
Team members
Steve Best sbest@us.ibm.com
Dave Kleikamp shaggy@austin.ibm.com
Barry Arndt barndt@us.ibm.com
Christoph Hellwig hch@caldera.de
Release February 15, 2002 (version 1.0.15)
This is our fifty-third release of IBM's Enterprise JFS technology port to Linux.
Beta 1 was release 0.1.0 on 12/8/2000, Beta 2 was release 0.2.0 on 3/7/2001,
Beta 3 was release 0.3.0 on 4/30/2001, and release 1.0.0 on 6/28/2001.
The changelog.jfs file contains detailed information about the changes made in each
source code drop.
JFS has a source tree that can be built on 2.4.3 - 2.4.17 and 2.5.4 kernel.org
source trees.
Our current goal on the 2.5.x series of the kernel is to update to the latest
2.5.x version and only support the latest version of this kernel.
This will change when the distros start shipping the 2.5.x series of the kernel.
Our current goal on the 2.4.x series of the kernel is to continue to support
all of the kernels in this series as we do today.
Anonymous cvs access is available for the JFS tree. The steps below pull the JFS
cvs tree from the oss.software.ibm.com server.
id anoncvs
password anoncvs
To check out the 2.4.x series of the JFS files do the following:
CVSROOT should be set to :pserver:anoncvs@oss.software.ibm.com:/usr/cvs/jfs
cvs checkout linux24
To check out the 2.5.x series of the JFS files do the following:
CVSROOT should be set to :pserver:anoncvs@oss.software.ibm.com:/usr/cvs/jfs
cvs checkout linux25
To check out the JFS utilities do the following:
CVSROOT should be set to :pserver:anoncvs@oss.software.ibm.com:/usr/cvs/jfs
cvs checkout jfsutils
The cvs tree contains the latest changes being done to JFS. To receive notification
of commits to the cvs tree, please send e-mail to linuxjfs@us.ibm.com stating that
you would like notifications sent to you.
The jfs-2.4-1.0.15-patch.tar.gz is the easiest way to get the latest file system
source code on your system. There are also patch files that can move your jfs source
code from one release to another. If you have release 1.0.14 and would like to move
to release 1.0.15 the patch file named jfs-2.4-1_0_14-to-1_0_15-patch.gz will do that.
The jfs-2.4-1.0.15-patch.tar.gz file contains a readme and patch files for different
levels of the 2.4 kernel. Please see the README in the jfs-2.4-1.0.15-patch.tar.gz
file for help on applying the two patch files.
The following files in the kernel source tree have been changed so JFS can be built.
The jfs-2.4-1.0.15.tar.gz source tar ball contains each of the files below with
the extension of the kernel level it is associated with. As an example, there are now
four Config.in files named Config.in-2.4.0, Config.in-2.4.5, Config.in-2.4.7 and
Config.in-2.4.17.
If you use the jfs-2.4-1.0.15.tar.gz to build JFS you must rename each of the
kernel files to the file names listed below. The standard kernel from www.kernel.org
is the source of the kernel files that are included in the jfs tar file.
In sub dir fs Config.in, Makefile
In sub dir fs/nls Config.in
In sub dir Documentation Configure.help, Changes
In sub dir Documentation/filesystems 00-INDEX
In sub dir linux MAINTAINERS
Please backup the above files before the JFS tar file is added to the kernel source
tree. All JFS files are located in the include/linux/jfs or fs/jfs sub dirs.
Our development team has used Linux kernel levels 2.4.3 - 2.4.17
with gcc version egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)
for our port so far. A goal of the JFS team is to have JFS run on all architectures
that Linux supports; there is no architecture-specific code in JFS. So far, JFS has
been run on the following architectures: x86, PowerPC, Alpha, s/390, and ARM.
To build JFS, during the "make config" step of building the kernel, answer y to
"Prompt for development and/or incomplete code/drivers" in the "Code maturity level
options" section. In the Filesystems section, answer m to
JFS filesystem support (experimental) (CONFIG_JFS_FS) [Y/m/n?]
Build in /usr/src/linux with the commands:
make modules
make modules_install
If you rebuild jfs.o after having mounted and unmounted a partition, "modprobe -r jfs"
will unload the old module.
File system debugging messages are written to /var/log/messages.
Please see the readme in the utilities package for information about building
the JFS utilities.
JFS TODO list:
Plans for our near term development items
- get defrag capabilities operational in the FS
- get extendfs capabilities operational in the FS
- test EXTENDFS utility, for growing JFS partitions
- test defrag utility, calls file system to defrag the file system.
- add support for block sizes (512,1024,2048)
- add support for logfile on dedicated partition
Longer term work items
- get access control list functionality operational
- get extended attributes functionality operational
- add quota support
Please send bugs, comments, cards and letters to linuxjfs@us.ibm.com.
The JFS mailing list can be subscribed to by using the link labeled "Mail list Subscribe"
at our web page http://oss.software.ibm.com/jfs/.
@@ -852,6 +852,13 @@ L: jffs-dev@axis.com
W: http://sources.redhat.com/jffs2/
S: Maintained
JFS FILESYSTEM
P: Dave Kleikamp
M: shaggy@austin.ibm.com
L: jfs-discussion@oss.software.ibm.com
W: http://oss.software.ibm.com/jfs/
S: Supported
JOYSTICK DRIVER
P: Vojtech Pavlik
M: vojtech@suse.cz
......
@@ -859,6 +859,25 @@ CONFIG_ADFS_FS_RW
hard drives and ADFS-formatted floppy disks. This is experimental
codes, so if you're unsure, say N.
JFS filesystem support
CONFIG_JFS_FS
This is a port of IBM's Journaled Filesystem. More information is
available in the file Documentation/filesystems/jfs.txt.
If you do not intend to use the JFS filesystem, say N.
JFS Debugging
CONFIG_JFS_DEBUG
If you are experiencing any problems with the JFS filesystem, say
Y here. This will result in additional debugging messages being
written to the system log. Under normal circumstances, this
results in very little overhead.
JFS Statistics
CONFIG_JFS_STATISTICS
Enabling this option will cause statistics from the JFS file system
to be made available to the user in the /proc/fs/jfs/ directory.
CONFIG_DEVPTS_FS
You should say Y here if you said Y to "Unix98 PTY support" above.
You'll then get a virtual file system which can be mounted on
......
@@ -54,6 +54,10 @@ tristate 'ISO 9660 CDROM file system support' CONFIG_ISO9660_FS
dep_mbool ' Microsoft Joliet CDROM extensions' CONFIG_JOLIET $CONFIG_ISO9660_FS
dep_mbool ' Transparent decompression extension' CONFIG_ZISOFS $CONFIG_ISO9660_FS
tristate 'JFS filesystem support' CONFIG_JFS_FS
dep_mbool ' JFS debugging' CONFIG_JFS_DEBUG $CONFIG_JFS_FS
dep_mbool ' JFS statistics' CONFIG_JFS_STATISTICS $CONFIG_JFS_FS
tristate 'Minix fs support' CONFIG_MINIX_FS
tristate 'FreeVxFS file system support (VERITAS VxFS(TM) compatible)' CONFIG_VXFS_FS
......
@@ -67,6 +67,7 @@ subdir-$(CONFIG_ADFS_FS) += adfs
subdir-$(CONFIG_REISERFS_FS) += reiserfs
subdir-$(CONFIG_DEVPTS_FS) += devpts
subdir-$(CONFIG_SUN_OPENPROMFS) += openpromfs
subdir-$(CONFIG_JFS_FS) += jfs
obj-$(CONFIG_BINFMT_AOUT) += binfmt_aout.o
......
#
# Makefile for the Linux JFS filesystem routines.
#
# Note! Dependencies are done automagically by 'make dep', which also
# removes any old dependencies. DON'T put your own dependencies here
# unless it's something special (not a .c file).
#
# Note 2! The CFLAGS definitions are now in the main makefile.
O_TARGET := jfs.o
obj-y := super.o file.o inode.o namei.o jfs_mount.o jfs_umount.o \
jfs_xtree.o jfs_imap.o jfs_debug.o jfs_dmap.o \
jfs_unicode.o jfs_dtree.o jfs_inode.o \
jfs_extent.o symlink.o jfs_metapage.o \
jfs_logmgr.o jfs_txnmgr.o jfs_uniupr.o
obj-m := $(O_TARGET)
EXTRA_CFLAGS += -D_JFS_4K
include $(TOPDIR)/Rules.make
/*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/
#ifndef _H_ENDIAN24
#define _H_ENDIAN24
/*
* fs/jfs/endian24.h:
*
* Endian conversion for 24-bit data
*
*/
#define __swab24(x) \
({ \
__u32 __x = (x); \
((__u32)( \
((__x & (__u32)0x000000ffUL) << 16) | \
(__x & (__u32)0x0000ff00UL) | \
((__x & (__u32)0x00ff0000UL) >> 16) )); \
})
#if (defined(__KERNEL__) && defined(__LITTLE_ENDIAN)) || (defined(__BYTE_ORDER) && (__BYTE_ORDER == __LITTLE_ENDIAN))
#define __cpu_to_le24(x) ((__u32)(x))
#define __le24_to_cpu(x) ((__u32)(x))
#else
#define __cpu_to_le24(x) __swab24(x)
#define __le24_to_cpu(x) __swab24(x)
#endif
#ifdef __KERNEL__
#define cpu_to_le24 __cpu_to_le24
#define le24_to_cpu __le24_to_cpu
#endif
#endif /* !_H_ENDIAN24 */
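The swap performed by __swab24 can be tried outside the kernel. The following is a minimal userspace sketch, not part of the commit: the names swab24, put_le24 and get_le24 are hypothetical and exist only for this illustration. It exchanges the low and high bytes of a 24-bit quantity held in a 32-bit word and leaves the middle byte in place, which is the same bit manipulation the macro above spells out.

```c
#include <stdint.h>

/*
 * Userspace restatement of the __swab24 idea (name chosen for this sketch):
 * exchange byte 0 and byte 2 of a 24-bit value stored in the low bits of a
 * 32-bit word; byte 1 is left where it is.
 */
static uint32_t swab24(uint32_t x)
{
	return ((x & 0x000000ffu) << 16) |
	        (x & 0x0000ff00u)        |
	       ((x & 0x00ff0000u) >> 16);
}

/*
 * Hypothetical helpers: store and load a 24-bit value as three
 * little-endian bytes, independent of the host byte order.
 */
static void put_le24(uint8_t out[3], uint32_t v)
{
	out[0] = (uint8_t)(v & 0xff);
	out[1] = (uint8_t)((v >> 8) & 0xff);
	out[2] = (uint8_t)((v >> 16) & 0xff);
}

static uint32_t get_le24(const uint8_t in[3])
{
	return (uint32_t)in[0] |
	       ((uint32_t)in[1] << 8) |
	       ((uint32_t)in[2] << 16);
}
```

Applying swab24 twice returns the original value, which is why the header can use the same macro for both __cpu_to_le24 and __le24_to_cpu on big-endian hosts.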
/*
*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/
#include <linux/fs.h>
#include <linux/locks.h>
#include "jfs_incore.h"
#include "jfs_txnmgr.h"
#include "jfs_debug.h"
extern int generic_file_open(struct inode *, struct file *);
extern loff_t generic_file_llseek(struct file *, loff_t, int origin);
extern int jfs_commit_inode(struct inode *, int);
int jfs_fsync(struct file *file, struct dentry *dentry, int datasync)
{
struct inode *inode = dentry->d_inode;
int rc = 0;
rc = fsync_inode_data_buffers(inode);
if (!(inode->i_state & I_DIRTY))
return rc;
if (datasync || !(inode->i_state & I_DIRTY_DATASYNC))
return rc;
IWRITE_LOCK(inode);
rc |= jfs_commit_inode(inode, 1);
IWRITE_UNLOCK(inode);
return rc ? -EIO : 0;
}
struct file_operations jfs_file_operations = {
open: generic_file_open,
llseek: generic_file_llseek,
write: generic_file_write,
read: generic_file_read,
mmap: generic_file_mmap,
fsync: jfs_fsync,
};
/*
* Guts of jfs_truncate. Called with locks already held. Can be called
* with directory for truncating directory index table.
*/
void jfs_truncate_nolock(struct inode *ip, loff_t length)
{
loff_t newsize;
tid_t tid;
ASSERT(length >= 0);
if (test_cflag(COMMIT_Nolink, ip)) {
xtTruncate(0, ip, length, COMMIT_WMAP);
return;
}
do {
tid = txBegin(ip->i_sb, 0);
newsize = xtTruncate(tid, ip, length,
COMMIT_TRUNCATE | COMMIT_PWMAP);
if (newsize < 0) {
txEnd(tid);
break;
}
ip->i_mtime = ip->i_ctime = CURRENT_TIME;
mark_inode_dirty(ip);
txCommit(tid, 1, &ip, 0);
txEnd(tid);
} while (newsize > length); /* Truncate isn't always atomic */
}
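The do/while loop in jfs_truncate_nolock repeats the transaction until the size reported by xtTruncate reaches the requested length. The following is a minimal userspace sketch of that control flow, not JFS code: truncate_step stands in for xtTruncate, and MAX_STEP is an invented per-transaction limit (the source does not state how much one pass of xtTruncate removes).

```c
#include <stdint.h>

/* Invented limit on how much one "transaction" may remove (assumption). */
#define MAX_STEP 1000

/*
 * Stand-in for xtTruncate: shrink *size toward length by at most MAX_STEP
 * and return the resulting size.
 */
static int64_t truncate_step(int64_t *size, int64_t length)
{
	if (*size - length > MAX_STEP)
		*size -= MAX_STEP;
	else if (*size > length)
		*size = length;
	return *size;
}

/*
 * Same loop shape as jfs_truncate_nolock: one pass per transaction,
 * repeated while the new size is still above the target.
 * Returns the number of passes that were needed.
 */
static int truncate_all(int64_t *size, int64_t length)
{
	int passes = 0;
	int64_t newsize;

	do {
		newsize = truncate_step(size, length);
		passes++;
	} while (newsize > length);

	return passes;
}
```

With these assumed numbers, truncating a 3500-byte object to zero takes four passes. Splitting the work this way keeps each transaction bounded, which appears to be the motivation behind the "Truncate large files incrementally" changelog entry.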
static void jfs_truncate(struct inode *ip)
{
jFYI(1, ("jfs_truncate: size = 0x%lx\n", (ulong) ip->i_size));
IWRITE_LOCK(ip);
jfs_truncate_nolock(ip, ip->i_size);
IWRITE_UNLOCK(ip);
}
struct inode_operations jfs_file_inode_operations = {
truncate: jfs_truncate,
};
/*
*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/
#include <linux/fs.h>
#include <linux/locks.h>
#include <linux/slab.h>
#include "jfs_incore.h"
#include "jfs_filsys.h"
#include "jfs_imap.h"
#include "jfs_extent.h"
#include "jfs_unicode.h"
#include "jfs_debug.h"
extern struct inode_operations jfs_dir_inode_operations;
extern struct inode_operations jfs_file_inode_operations;
extern struct inode_operations jfs_symlink_inode_operations;
extern struct file_operations jfs_dir_operations;
extern struct file_operations jfs_file_operations;
struct address_space_operations jfs_aops;
extern int freeZeroLink(struct inode *);
void jfs_put_inode(struct inode *inode)
{
jFYI(1, ("In jfs_put_inode, inode = 0x%p\n", inode));
}
void jfs_read_inode(struct inode *inode)
{
jFYI(1, ("In jfs_read_inode, inode = 0x%p\n", inode));
if (diRead(inode))
goto bad_inode;
if (S_ISREG(inode->i_mode)) {
inode->i_op = &jfs_file_inode_operations;
inode->i_fop = &jfs_file_operations;
inode->i_mapping->a_ops = &jfs_aops;
} else if (S_ISDIR(inode->i_mode)) {
inode->i_op = &jfs_dir_inode_operations;
inode->i_fop = &jfs_dir_operations;
inode->i_mapping->a_ops = &jfs_aops;
inode->i_mapping->gfp_mask = GFP_NOFS;
} else if (S_ISLNK(inode->i_mode)) {
if (inode->i_size > IDATASIZE) {
inode->i_op = &page_symlink_inode_operations;
inode->i_mapping->a_ops = &jfs_aops;
} else
inode->i_op = &jfs_symlink_inode_operations;
} else {
init_special_inode(inode, inode->i_mode,
kdev_t_to_nr(inode->i_rdev));
}
return;
bad_inode:
make_bad_inode(inode);
}
/* This define is from fs/open.c */
#define special_file(m) (S_ISCHR(m)||S_ISBLK(m)||S_ISFIFO(m)||S_ISSOCK(m))
/*
* Workhorse of both fsync & write_inode
*/
int jfs_commit_inode(struct inode *inode, int wait)
{
int rc = 0;
tid_t tid;
static int noisy = 5;
jFYI(1, ("In jfs_commit_inode, inode = 0x%p\n", inode));
/*
* Don't commit if inode has been committed since last being
* marked dirty, or if it has been deleted.
*/
if (test_cflag(COMMIT_Nolink, inode) ||
!test_cflag(COMMIT_Dirty, inode))
return 0;
if (isReadOnly(inode)) {
/* kernel allows writes to devices on read-only
* partitions and may think inode is dirty
*/
if (!special_file(inode->i_mode) && noisy) {
jERROR(1, ("jfs_commit_inode(0x%p) called on "
"read-only volume\n", inode));
jERROR(1, ("Is remount racy?\n"));
noisy--;
}
return 0;
}
tid = txBegin(inode->i_sb, COMMIT_INODE);
rc = txCommit(tid, 1, &inode, wait ? COMMIT_SYNC : 0);
txEnd(tid);
return -rc;
}
void jfs_write_inode(struct inode *inode, int wait)
{
/*
	 * If COMMIT_Dirty is not set, the inode isn't really dirty.
* It has been committed since the last change, but was still
* on the dirty inode list
*/
if (test_cflag(COMMIT_Nolink, inode) ||
!test_cflag(COMMIT_Dirty, inode))
return;
IWRITE_LOCK(inode);
if (jfs_commit_inode(inode, wait)) {
jERROR(1, ("jfs_write_inode: jfs_commit_inode failed!\n"));
}
IWRITE_UNLOCK(inode);
}
void jfs_delete_inode(struct inode *inode)
{
jFYI(1, ("In jfs_delete_inode, inode = 0x%p\n", inode));
IWRITE_LOCK(inode);
if (test_cflag(COMMIT_Freewmap, inode))
freeZeroLink(inode);
diFree(inode);
IWRITE_UNLOCK(inode);
clear_inode(inode);
}
void jfs_dirty_inode(struct inode *inode)
{
static int noisy = 5;
if (isReadOnly(inode)) {
if (!special_file(inode->i_mode) && noisy) {
/* kernel allows writes to devices on read-only
* partitions and may try to mark inode dirty
*/
jERROR(1, ("jfs_dirty_inode called on "
"read-only volume\n"));
jERROR(1, ("Is remount racy?\n"));
noisy--;
}
return;
}
set_cflag(COMMIT_Dirty, inode);
}
static int jfs_get_block(struct inode *ip, sector_t lblock,
struct buffer_head *bh_result, int create)
{
s64 lblock64 = lblock;
int no_size_check = 0;
int rc = 0;
int take_locks;
xad_t xad;
s64 xaddr;
int xflag;
s32 xlen;
/*
* If this is a special inode (imap, dmap) or directory,
* the lock should already be taken
*/
take_locks = ((JFS_IP(ip)->fileset != AGGREGATE_I) &&
!S_ISDIR(ip->i_mode));
/*
* Take appropriate lock on inode
*/
if (take_locks) {
if (create)
IWRITE_LOCK(ip);
else
IREAD_LOCK(ip);
}
/*
* A directory's "data" is the inode index table, but i_size is the
* size of the d-tree, so don't check the offset against i_size
*/
if (S_ISDIR(ip->i_mode))
no_size_check = 1;
if ((no_size_check ||
((lblock64 << ip->i_sb->s_blocksize_bits) < ip->i_size)) &&
(xtLookup
(ip, lblock64, 1, &xflag, &xaddr, &xlen, no_size_check)
== 0) && xlen) {
if (xflag & XAD_NOTRECORDED) {
if (!create)
/*
* Allocated but not recorded, read treats
* this as a hole
*/
goto unlock;
#ifdef _JFS_4K
XADoffset(&xad, lblock64);
XADlength(&xad, xlen);
XADaddress(&xad, xaddr);
#else /* _JFS_4K */
/*
* As long as block size = 4K, this isn't a problem.
* We should mark the whole page not ABNR, but how
* will we know to mark the other blocks BH_New?
*/
BUG();
#endif /* _JFS_4K */
rc = extRecord(ip, &xad);
if (rc)
goto unlock;
bh_result->b_state |= (1UL << BH_New);
}
map_bh(bh_result, ip->i_sb, xaddr);
goto unlock;
}
if (!create)
goto unlock;
/*
* Allocate a new block
*/
#ifdef _JFS_4K
if ((rc =
extHint(ip, lblock64 << ip->i_sb->s_blocksize_bits, &xad)))
goto unlock;
rc = extAlloc(ip, 1, lblock64, &xad, FALSE);
if (rc)
goto unlock;
bh_result->b_state |= (1UL << BH_New);
map_bh(bh_result, ip->i_sb, addressXAD(&xad));
#else /* _JFS_4K */
/*
* We need to do whatever it takes to keep all but the last buffers
* in 4K pages - see jfs_write.c
*/
BUG();
#endif /* _JFS_4K */
unlock:
/*
* Release lock on inode
*/
if (take_locks) {
if (create)
IWRITE_UNLOCK(ip);
else
IREAD_UNLOCK(ip);
}
return -rc;
}
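jfs_get_block copies lblock into a 64-bit variable before shifting it by s_blocksize_bits and comparing the result with i_size. The following is a small userspace sketch of why the wider type matters, under the assumption that the block number could otherwise be a 32-bit value; the function names are hypothetical and not part of JFS.

```c
#include <stdint.h>

/* Byte offset of a logical block, computed in 64 bits as jfs_get_block does. */
static int64_t block_offset64(int64_t lblock, unsigned blocksize_bits)
{
	return lblock << blocksize_bits;
}

/* The same computation in 32 bits: wraps once the offset reaches 4 GB. */
static uint32_t block_offset32(uint32_t lblock, unsigned blocksize_bits)
{
	return lblock << blocksize_bits;
}

/*
 * The size check from jfs_get_block: a block is looked up only if its
 * first byte lies before the end of the file.
 */
static int block_within_size(int64_t lblock, unsigned blocksize_bits,
			     int64_t i_size)
{
	return block_offset64(lblock, blocksize_bits) < i_size;
}
```

With 4K blocks (blocksize_bits of 12), block 1048576 starts at byte offset 4294967296; the 32-bit version reports offset 0 for the same block, which would make the size check pass for a block far beyond the end of a small file.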
static int jfs_writepage(struct page *page)
{
return block_write_full_page(page, jfs_get_block);
}
static int jfs_readpage(struct file *file, struct page *page)
{
return block_read_full_page(page, jfs_get_block);
}
static int jfs_prepare_write(struct file *file,
struct page *page, unsigned from, unsigned to)
{
return block_prepare_write(page, from, to, jfs_get_block);
}
static int jfs_bmap(struct address_space *mapping, long block)
{
return generic_block_bmap(mapping, block, jfs_get_block);
}
static int jfs_direct_IO(int rw, struct inode *inode, struct kiobuf *iobuf,
unsigned long blocknr, int blocksize)
{
return generic_direct_IO(rw, inode, iobuf, blocknr,
blocksize, jfs_get_block);
}
struct address_space_operations jfs_aops = {
readpage: jfs_readpage,
writepage: jfs_writepage,
sync_page: block_sync_page,
prepare_write: jfs_prepare_write,
commit_write: generic_commit_write,
bmap: jfs_bmap,
direct_IO: jfs_direct_IO,
};
/*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*
*/
#ifndef _H_JFS_BTREE
#define _H_JFS_BTREE
/*
* jfs_btree.h: B+-tree
*
* JFS B+-tree (dtree and xtree) common definitions
*/
/*
* basic btree page - btpage_t
*/
typedef struct {
s64 next; /* 8: right sibling bn */
s64 prev; /* 8: left sibling bn */
u8 flag; /* 1: */
u8 rsrvd[7]; /* 7: type specific */
s64 self; /* 8: self address */
u8 entry[4064]; /* 4064: */
} btpage_t; /* (4096) */
/* btpage_t flag */
#define BT_TYPE 0x07 /* B+-tree index */
#define BT_ROOT 0x01 /* root page */
#define BT_LEAF 0x02 /* leaf page */
#define BT_INTERNAL 0x04 /* internal page */
#define BT_RIGHTMOST 0x10 /* rightmost page */
#define BT_LEFTMOST 0x20 /* leftmost page */
#define BT_SWAPPED 0x80 /* used by fsck for endian swapping */
/* btorder (in inode) */
#define BT_RANDOM 0x0000
#define BT_SEQUENTIAL 0x0001
#define BT_LOOKUP 0x0010
#define BT_INSERT 0x0020
#define BT_DELETE 0x0040
/*
* btree page buffer cache access
*/
#define BT_IS_ROOT(MP) (((MP)->xflag & COMMIT_PAGE) == 0)
/* get page from buffer page */
#define BT_PAGE(IP, MP, TYPE, ROOT)\
(BT_IS_ROOT(MP) ? (TYPE *)&JFS_IP(IP)->ROOT : (TYPE *)(MP)->data)
/* get the page buffer and the page for specified block address */
#define BT_GETPAGE(IP, BN, MP, TYPE, SIZE, P, RC, ROOT)\
{\
if ((BN) == 0)\
{\
MP = (metapage_t *)&JFS_IP(IP)->bxflag;\
P = (TYPE *)&JFS_IP(IP)->ROOT;\
RC = 0;\
jEVENT(0,("%d BT_GETPAGE returning root\n", __LINE__));\
}\
else\
{\
jEVENT(0,("%d BT_GETPAGE reading block %d\n", __LINE__,\
(int)BN));\
MP = read_metapage((IP), BN, SIZE, 1);\
if (MP) {\
RC = 0;\
P = (MP)->data;\
} else {\
P = NULL;\
jERROR(1,("bread failed!\n"));\
RC = EIO;\
}\
}\
}
#define BT_MARK_DIRTY(MP, IP)\
{\
if (BT_IS_ROOT(MP))\
mark_inode_dirty(IP);\
else\
mark_metapage_dirty(MP);\
}
/* put the page buffer */
#define BT_PUTPAGE(MP)\
{\
if (! BT_IS_ROOT(MP)) \
release_metapage(MP); \
}
/*
* btree traversal stack
*
* record the path traversed during the search;
* top frame records the leaf page/entry selected.
*/
#define MAXTREEHEIGHT 8
typedef struct btframe { /* stack frame */
s64 bn; /* 8: */
s16 index; /* 2: */
s16 lastindex; /* 2: */
struct metapage *mp; /* 4: */
} btframe_t; /* (16) */
typedef struct btstack {
btframe_t *top; /* 4: */
int nsplit; /* 4: */
btframe_t stack[MAXTREEHEIGHT];
} btstack_t;
#define BT_CLR(btstack)\
(btstack)->top = (btstack)->stack
#define BT_PUSH(BTSTACK, BN, INDEX)\
{\
(BTSTACK)->top->bn = BN;\
(BTSTACK)->top->index = INDEX;\
++(BTSTACK)->top;\
assert((BTSTACK)->top != &((BTSTACK)->stack[MAXTREEHEIGHT]));\
}
#define BT_POP(btstack)\
( (btstack)->top == (btstack)->stack ? NULL : --(btstack)->top )
#define BT_STACK(btstack)\
( (btstack)->top == (btstack)->stack ? NULL : (btstack)->top )
/* retrieve search results */
#define BT_GETSEARCH(IP, LEAF, BN, MP, TYPE, P, INDEX, ROOT)\
{\
BN = (LEAF)->bn;\
MP = (LEAF)->mp;\
if (BN)\
P = (TYPE *)MP->data;\
else\
P = (TYPE *)&JFS_IP(IP)->ROOT;\
INDEX = (LEAF)->index;\
}
/* put the page buffer of search */
#define BT_PUTSEARCH(BTSTACK)\
{\
if (! BT_IS_ROOT((BTSTACK)->top->mp))\
release_metapage((BTSTACK)->top->mp);\
}
#endif /* _H_JFS_BTREE */
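/*
* Illustration only, not part of the JFS sources: a minimal user-space
* sketch of how the traversal stack above behaves. frame_t, path_t,
* path_clear(), path_push() and path_pop() are hypothetical stand-ins
* for btframe_t, btstack_t, BT_CLR, BT_PUSH and BT_POP; the metapage
* pointer and nsplit are left out. Like BT_PUSH, path_push() asserts
* after advancing top, so at most MAXTREEHEIGHT - 1 frames fit.
*/

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAXTREEHEIGHT 8		/* same bound as jfs_btree.h */

/* Hypothetical stand-in for btframe_t (no metapage pointer). */
typedef struct {
	int64_t bn;		/* block number of the page visited */
	int16_t index;		/* entry index chosen within that page */
} frame_t;

/* Hypothetical stand-in for btstack_t. */
typedef struct {
	frame_t *top;		/* next free frame */
	frame_t stack[MAXTREEHEIGHT];
} path_t;

/* Like BT_CLR: empty the stack. */
static void path_clear(path_t *p)
{
	p->top = p->stack;
}

/* Like BT_PUSH: record a page/index pair, then advance top. */
static void path_push(path_t *p, int64_t bn, int16_t index)
{
	p->top->bn = bn;
	p->top->index = index;
	++p->top;
	assert(p->top != &p->stack[MAXTREEHEIGHT]);
}

/* Like BT_POP: return the most recent frame, or NULL when empty. */
static frame_t *path_pop(path_t *p)
{
	return p->top == p->stack ? NULL : --p->top;
}
```

/*
* A search would call path_clear() once, path_push() for each page it
* descends through, and path_pop() to walk back up toward the root.
*/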
/*
*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/
#include <linux/fs.h>
#include <linux/ctype.h>
#include <linux/module.h>
#include <linux/proc_fs.h>
#include <asm/uaccess.h>
#include "jfs_incore.h"
#include "jfs_filsys.h"
#include "jfs_debug.h"
#ifdef CONFIG_JFS_DEBUG
void dump_mem(char *label, void *data, int length)
{
int i, j;
int *intptr = data;
char *charptr = data;
char buf[10], line[80];
printk("%s: dump of %d bytes of data at 0x%p\n\n", label, length,
data);
for (i = 0; i < length; i += 16) {
line[0] = 0;
for (j = 0; (j < 4) && (i + j * 4 < length); j++) {
sprintf(buf, " %08x", intptr[i / 4 + j]);
strcat(line, buf);
}
buf[0] = ' ';
buf[2] = 0;
for (j = 0; (j < 16) && (i + j < length); j++) {
buf[1] =
isprint(charptr[i + j]) ? charptr[i + j] : '.';
strcat(line, buf);
}
printk("%s\n", line);
}
}
#ifdef CONFIG_PROC_FS
static int loglevel_read(char *page, char **start, off_t off,
int count, int *eof, void *data)
{
int len;
len = sprintf(page, "%d\n", jfsloglevel);
len -= off;
*start = page + off;
if (len > count)
len = count;
else
*eof = 1;
if (len < 0)
len = 0;
return len;
}
static int loglevel_write(struct file *file, const char *buffer,
unsigned long count, void *data)
{
char c;
if (get_user(c, buffer))
return -EFAULT;
/* yes, I know this is an ASCIIism. --hch */
if (c < '0' || c > '9')
return -EINVAL;
jfsloglevel = c - '0';
return count;
}
extern read_proc_t jfs_txanchor_read;
#ifdef CONFIG_JFS_STATISTICS
extern read_proc_t jfs_lmstats_read;
extern read_proc_t jfs_xtstat_read;
extern read_proc_t jfs_mpstat_read;
#endif
static struct proc_dir_entry *base;
static struct {
const char *name;
read_proc_t *read_fn;
write_proc_t *write_fn;
} Entries[] = {
{ "TxAnchor", jfs_txanchor_read, },
#ifdef CONFIG_JFS_STATISTICS
{ "lmstats", jfs_lmstats_read, },
{ "xtstat", jfs_xtstat_read, },
{ "mpstat", jfs_mpstat_read, },
#endif
{ "loglevel", loglevel_read, loglevel_write }
};
#define NPROCENT (sizeof(Entries)/sizeof(Entries[0]))
void jfs_proc_init(void)
{
int i;
if (!(base = proc_mkdir("jfs", proc_root_fs)))
return;
base->owner = THIS_MODULE;
for (i = 0; i < NPROCENT; i++) {
struct proc_dir_entry *p;
if ((p = create_proc_entry(Entries[i].name, 0, base))) {
p->read_proc = Entries[i].read_fn;
p->write_proc = Entries[i].write_fn;
}
}
}
void jfs_proc_clean(void)
{
int i;
if (base) {
for (i = 0; i < NPROCENT; i++)
remove_proc_entry(Entries[i].name, base);
remove_proc_entry("jfs", proc_root_fs);
}
}
#endif /* CONFIG_PROC_FS */
#endif /* CONFIG_JFS_DEBUG */
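/*
* Illustration only: a user-space sketch of the row layout dump_mem()
* produces, i.e. up to four 32-bit words in hex followed by one
* character per byte. format_dump_row() is a hypothetical helper, not
* a kernel function. Unlike dump_mem(), it copies only the bytes that
* exist when the length is not a multiple of four. The hex words are
* shown in host byte order, as in the original.
*/

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Format the 16-byte row starting at offset into line[], which must
 * hold at least 80 bytes (the size dump_mem() uses for its line).
 */
static void format_dump_row(const unsigned char *data, int offset,
			    int length, char *line)
{
	char buf[10];
	int j;

	assert(offset >= 0 && offset < length);
	line[0] = 0;

	/* up to four 32-bit words, each printed as " %08x" */
	for (j = 0; j < 4 && offset + j * 4 < length; j++) {
		uint32_t word = 0;
		int left = length - (offset + j * 4);

		memcpy(&word, data + offset + j * 4,
		       (size_t)(left < 4 ? left : 4));
		snprintf(buf, sizeof(buf), " %08x", (unsigned int)word);
		strcat(line, buf);
	}

	/* then each byte as a character, '.' for non-printable ones */
	buf[0] = ' ';
	buf[2] = 0;
	for (j = 0; j < 16 && offset + j < length; j++) {
		unsigned char c = data[offset + j];

		buf[1] = isprint(c) ? (char)c : '.';
		strcat(line, buf);
	}
}
```

/*
* A full dump would call format_dump_row() for offsets 0, 16, 32, ...
* and print each line, as the loop in dump_mem() does with printk().
*/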
/*
*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*
*/
#ifndef _H_JFS_DEBUG
#define _H_JFS_DEBUG
/*
* jfs_debug.h
*
* global debug message, data structure/macro definitions
* under control of CONFIG_JFS_DEBUG, CONFIG_JFS_STATISTICS;
*/
/*
* assert with traditional printf/panic
*/
#ifdef CONFIG_KERNEL_ASSERTS
/* kgdb stuff */
#define assert(p) KERNEL_ASSERT(#p, p)
#else
#define assert(p) {\
if (!(p))\
{\
printk("assert(%s)\n",#p);\
BUG();\
}\
}
#endif
/*
* debug ON
* --------
*/
#ifdef CONFIG_JFS_DEBUG
#define ASSERT(p) assert(p)
/* dump memory contents */
extern void dump_mem(char *label, void *data, int length);
extern int jfsloglevel;
/* information message: e.g., configuration, major event */
#define jFYI(button, prspec) \
do { if (button && jfsloglevel > 1) printk prspec; } while (0)
/* error event message: e.g., i/o error */
extern int jfsERROR;
#define jERROR(button, prspec) \
do { if (button && jfsloglevel > 0) { printk prspec; } } while (0)
/* debug event message: */
#define jEVENT(button,prspec) \
do { if (button) printk prspec; } while (0)
/*
* debug OFF
* ---------
*/
#else /* CONFIG_JFS_DEBUG */
#define dump_mem(label,data,length)
#define ASSERT(p)
#define jEVENT(button,prspec)
#define jERROR(button,prspec)
#define jFYI(button,prspec)
#endif /* CONFIG_JFS_DEBUG */
/*
* statistics
* ----------
*/
#ifdef CONFIG_JFS_STATISTICS
#define INCREMENT(x) ((x)++)
#define DECREMENT(x) ((x)--)
#define HIGHWATERMARK(x,y) ((x) = max((x), (y)))
#else
#define INCREMENT(x)
#define DECREMENT(x)
#define HIGHWATERMARK(x,y)
#endif /* CONFIG_JFS_STATISTICS */
#endif /* _H_JFS_DEBUG */
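/*
* Illustration only: a user-space sketch of the level-gated message
* macros above. The second macro argument is a complete, parenthesised
* argument list, so it can be pasted directly after the function name.
* loglevel, emitted, log_printf(), LOG_FYI, LOG_ERROR and
* messages_at_level() are hypothetical names; the kernel macros call
* printk and test jfsloglevel instead.
*/

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

static int loglevel = 1;	/* stand-in for jfsloglevel */
static int emitted;		/* messages that passed the level check */

/* printf-like sink that counts calls so the gating can be observed */
static int log_printf(const char *fmt, ...)
{
	va_list ap;
	int n;

	assert(fmt != NULL);
	va_start(ap, fmt);
	n = vprintf(fmt, ap);
	va_end(ap);
	emitted++;
	return n;
}

/* LOG_ERROR(1, ("rc=%d\n", rc)) expands to: log_printf ("rc=%d\n", rc); */
#define LOG_FYI(button, prspec) \
	do { if ((button) && loglevel > 1) log_printf prspec; } while (0)
#define LOG_ERROR(button, prspec) \
	do { if ((button) && loglevel > 0) log_printf prspec; } while (0)

/* Issue one FYI and one ERROR message; return how many were printed. */
static int messages_at_level(int level)
{
	loglevel = level;
	emitted = 0;
	LOG_FYI(1, ("mounted, level %d\n", level));
	LOG_ERROR(1, ("i/o error, rc=%d\n", 5));
	return emitted;
}
```

/*
* At level 0 nothing is printed, at level 1 only the error message,
* and at level 2 or above both messages appear.
*/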
/*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*
*/
#ifndef _H_JFS_DEFRAGFS
#define _H_JFS_DEFRAGFS
/*
* jfs_defragfs.h
*/
/*
* defragfs parameter list
*/
typedef struct {
uint flag; /* 4: */
u8 dev; /* 1: */
u8 pad[3]; /* 3: */
s32 fileset; /* 4: */
u32 inostamp; /* 4: */
u32 ino; /* 4: */
u32 gen; /* 4: */
s64 xoff; /* 8: */
s64 old_xaddr; /* 8: */
s64 new_xaddr; /* 8: */
s32 xlen; /* 4: */
} defragfs_t; /* (52) */
/* plist flag */
#define DEFRAGFS_SYNC 0x80000000
#define DEFRAGFS_COMMIT 0x40000000
#define DEFRAGFS_RELOCATE 0x10000000
#define INODE_TYPE 0x0000F000 /* IFREG or IFDIR */
#define EXTENT_TYPE 0x000000ff
#define DTPAGE 0x00000001
#define XTPAGE 0x00000002
#define DATAEXT 0x00000004
#define EAEXT 0x00000008
#endif /* _H_JFS_DEFRAGFS */
/*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*
*/
#ifndef _H_JFS_DINODE
#define _H_JFS_DINODE
/*
* jfs_dinode.h: on-disk inode manager
*
*/
#define INODESLOTSIZE 128
#define L2INODESLOTSIZE 7
#define log2INODESIZE 9 /* log2(bytes per dinode) */
/*
* on-disk inode (dinode_t): 512 bytes
*
* note: align 64-bit fields on 8-byte boundary.
*/
struct dinode {
/*
* I. base area (128 bytes)
* ------------------------
*
* define generic/POSIX attributes
*/
u32 di_inostamp; /* 4: stamp to show inode belongs to fileset */
s32 di_fileset; /* 4: fileset number */
u32 di_number; /* 4: inode number, aka file serial number */
u32 di_gen; /* 4: inode generation number */
pxd_t di_ixpxd; /* 8: inode extent descriptor */
s64 di_size; /* 8: size */
s64 di_nblocks; /* 8: number of blocks allocated */
u32 di_nlink; /* 4: number of links to the object */
u32 di_uid; /* 4: user id of owner */
u32 di_gid; /* 4: group id of owner */
u32 di_mode; /* 4: attribute, format and permission */
struct timestruc_t di_atime; /* 8: time last data accessed */
struct timestruc_t di_ctime; /* 8: time last status changed */
struct timestruc_t di_mtime; /* 8: time last data modified */
struct timestruc_t di_otime; /* 8: time created */
dxd_t di_acl; /* 16: acl descriptor */
dxd_t di_ea; /* 16: ea descriptor */
u32 di_next_index; /* 4: Next available dir_table index */
s32 di_acltype; /* 4: Type of ACL */
/*
* Extension Areas.
*
* Historically, the inode was partitioned into 4 128-byte areas,
* the last 3 being defined as unions which could have multiple
* uses. The first 96 bytes had been completely unused until
* an index table was added to the directory. It is now more
* useful to describe the last 3/4 of the inode as a single
* union. We would probably be better off redesigning the
* entire structure from scratch, but we don't want to break
* commonality with OS/2's JFS at this time.
*/
union {
struct {
/*
* This table contains the information needed to
* find a directory entry from a 32-bit index.
* If the index is small enough, the table is inline,
* otherwise, an x-tree root overlays this table
*/
dir_table_slot_t _table[12]; /* 96: inline */
dtroot_t _dtroot; /* 288: dtree root */
} _dir; /* (384) */
#define di_dirtable u._dir._table
#define di_dtroot u._dir._dtroot
#define di_parent di_dtroot.header.idotdot
#define di_DASD di_dtroot.header.DASD
struct {
union {
u8 _data[96]; /* 96: unused */
struct {
void *_imap; /* 4: unused */
u32 _gengen; /* 4: generator */
} _imap;
} _u1; /* 96: */
#define di_gengen u._file._u1._imap._gengen
union {
xtpage_t _xtroot;
struct {
u8 unused[16]; /* 16: */
dxd_t _dxd; /* 16: */
union {
u32 _rdev; /* 4: */
u8 _fastsymlink[128];
} _u;
u8 _inlineea[128];
} _special;
} _u2;
} _file;
#define di_xtroot u._file._u2._xtroot
#define di_dxd u._file._u2._special._dxd
#define di_btroot di_xtroot
#define di_inlinedata u._file._u2._special._u
#define di_rdev u._file._u2._special._u._rdev
#define di_fastsymlink u._file._u2._special._u._fastsymlink
#define di_inlineea u._file._u2._special._inlineea
} u;
};
typedef struct dinode dinode_t;
/* extended mode bits (on-disk inode di_mode) */
#define IFJOURNAL 0x00010000 /* journalled file */
#define ISPARSE 0x00020000 /* sparse file enabled */
#define INLINEEA 0x00040000 /* inline EA area free */
#define ISWAPFILE 0x00800000 /* file open for pager swap space */
/* more extended mode bits: attributes for OS/2 */
#define IREADONLY 0x02000000 /* no write access to file */
#define IARCHIVE 0x40000000 /* file archive bit */
#define ISYSTEM 0x08000000 /* system file */
#define IHIDDEN 0x04000000 /* hidden file */
#define IRASH 0x4E000000 /* mask for changeable attributes */
#define INEWNAME 0x80000000 /* non-8.3 filename format */
#define IDIRECTORY 0x20000000 /* directory (shadow of real bit) */
#define ATTRSHIFT 25 /* bits to shift to move attribute
specification to mode position */
#endif /*_H_JFS_DINODE */
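/*
* Illustration only: the OS/2 attribute bits above line up with a
* DOS-style attribute byte shifted left by ATTRSHIFT. The ATTR_*
* values below are the conventional DOS/OS2 attribute bits and are an
* assumption here, since this header does not define them.
* attr_to_mode() and mode_to_attr() are hypothetical helpers.
*/

```c
#include <assert.h>
#include <stdint.h>

/* values copied from jfs_dinode.h */
#define IREADONLY	0x02000000u
#define IHIDDEN		0x04000000u
#define ISYSTEM		0x08000000u
#define IDIRECTORY	0x20000000u
#define IARCHIVE	0x40000000u
#define IRASH		0x4E000000u
#define ATTRSHIFT	25

/* assumed DOS/OS2 attribute byte layout (not defined in this header) */
#define ATTR_READONLY	0x01u
#define ATTR_HIDDEN	0x02u
#define ATTR_SYSTEM	0x04u
#define ATTR_DIRECTORY	0x10u
#define ATTR_ARCHIVE	0x20u

/* Move an attribute byte into the di_mode position. */
static uint32_t attr_to_mode(uint32_t attr)
{
	assert(attr <= 0x7fu);	/* must fit in 32 bits after the shift */
	return attr << ATTRSHIFT;
}

/* Extract the attribute bits from a di_mode value. */
static uint32_t mode_to_attr(uint32_t mode)
{
	return mode >> ATTRSHIFT;
}
```

/*
* Under that assumption IRASH is exactly the read-only, archive,
* system and hidden bits shifted into place, matching its name.
*/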
/*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*
* jfs_dmap.h: block allocation map manager
*/
#ifndef _H_JFS_DMAP
#define _H_JFS_DMAP
#include "jfs_txnmgr.h"
#define BMAPVERSION 1 /* version number */
#define TREESIZE (256+64+16+4+1) /* size of a dmap tree */
#define LEAFIND (64+16+4+1) /* index of 1st leaf of a dmap tree */
#define LPERDMAP 256 /* num leaves per dmap tree */
#define L2LPERDMAP 8 /* l2 number of leaves per dmap tree */
#define DBWORD 32 /* # of blks covered by a map word */
#define L2DBWORD 5 /* l2 # of blks covered by a mword */
#define BUDMIN L2DBWORD /* max free string in a map word */
#define BPERDMAP (LPERDMAP * DBWORD) /* num of blks per dmap */
#define L2BPERDMAP 13 /* l2 num of blks per dmap */
#define CTLTREESIZE (1024+256+64+16+4+1) /* size of a dmapctl tree */
#define CTLLEAFIND (256+64+16+4+1) /* idx of 1st leaf of a dmapctl tree */
#define LPERCTL 1024 /* num of leaves per dmapctl tree */
#define L2LPERCTL 10 /* l2 num of leaves per dmapctl tree */
#define ROOT 0 /* index of the root of a tree */
#define NOFREE ((s8) -1) /* no blocks free */
#define MAXAG 128 /* max number of allocation groups */
#define L2MAXAG 7 /* l2 max num of AG */
#define L2MINAGSZ 25 /* l2 of minimum AG size in bytes */
#define BMAPBLKNO 0 /* lblkno of bmap within the map */
/*
* maximum l2 number of disk blocks at the various dmapctl levels.
*/
#define L2MAXL0SIZE (L2BPERDMAP + 1 * L2LPERCTL)
#define L2MAXL1SIZE (L2BPERDMAP + 2 * L2LPERCTL)
#define L2MAXL2SIZE (L2BPERDMAP + 3 * L2LPERCTL)
/*
* maximum number of disk blocks at the various dmapctl levels.
*/
#define MAXL0SIZE ((s64)1 << L2MAXL0SIZE)
#define MAXL1SIZE ((s64)1 << L2MAXL1SIZE)
#define MAXL2SIZE ((s64)1 << L2MAXL2SIZE)
#define MAXMAPSIZE MAXL2SIZE /* maximum aggregate map size */
/*
* determine the maximum free string for four (lower level) nodes
* of the tree.
*/
static __inline signed char TREEMAX(signed char *cp)
{
signed char tmp1, tmp2;
tmp1 = max(*(cp+2), *(cp+3));
tmp2 = max(*(cp), *(cp+1));
return max(tmp1, tmp2);
}
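/*
* Illustration only: a user-space sketch of how a four-way summary
* tree can be rebuilt with a TREEMAX-style helper. It assumes the
* children of node p sit at indices 4p+1 .. 4p+4, which is consistent
* with TREESIZE and LEAFIND above but is not stated in this header.
* max4(), rebuild_summary() and root_with_one_leaf() are hypothetical
* names, and any buddy merging the real allocator performs is omitted.
*/

```c
#include <assert.h>
#include <string.h>

#define TREESIZE	(256 + 64 + 16 + 4 + 1)	/* as in jfs_dmap.h */
#define LEAFIND		(64 + 16 + 4 + 1)
#define LPERDMAP	256
#define NOFREE		((signed char)-1)

/* Same result as TREEMAX(): the largest of four adjacent nodes. */
static signed char max4(const signed char *cp)
{
	signed char a = cp[0] > cp[1] ? cp[0] : cp[1];
	signed char b = cp[2] > cp[3] ? cp[2] : cp[3];

	return a > b ? a : b;
}

/* Recompute every internal node, bottom up, from its four children. */
static void rebuild_summary(signed char *stree)
{
	int p;

	for (p = LEAFIND - 1; p >= 0; p--)
		stree[p] = max4(&stree[4 * p + 1]);
}

/* Build a tree with one non-empty leaf and return the root value. */
static signed char root_with_one_leaf(int leaf, signed char value)
{
	signed char stree[TREESIZE];

	assert(leaf >= 0 && leaf < LPERDMAP);
	memset(stree, NOFREE, sizeof(stree));
	stree[LEAFIND + leaf] = value;
	rebuild_summary(stree);
	return stree[0];
}
```

/*
* The root then holds the largest leaf value in the tree, so a search
* can tell from the root alone whether a large enough free string
* exists anywhere below it.
*/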
/*
* convert disk block number to the logical block number of the dmap
* describing the disk block. s is the log2(number of logical blocks per page)
*
* The calculation figures out how many logical pages are in front of the dmap.
* - the number of dmaps preceding it
* - the number of L0 pages preceding its L0 page
* - the number of L1 pages preceding its L1 page
* - 3 is added to account for the L2, L1, and L0 page for this dmap
* - 1 is added to account for the control page of the map.
*/
#define BLKTODMAP(b,s) \
((((b) >> 13) + ((b) >> 23) + ((b) >> 33) + 3 + 1) << (s))
/*
* convert disk block number to the logical block number of the LEVEL 0
* dmapctl describing the disk block. s is the log2(number of logical blocks
* per page)
*
* The calculation figures out how many logical pages are in front of the L0.
* - the number of dmap pages preceding it
* - the number of L0 pages preceding it
* - the number of L1 pages preceding its L1 page
* - 2 is added to account for the L2, and L1 page for this L0
* - 1 is added to account for the control page of the map.
*/
#define BLKTOL0(b,s) \
(((((b) >> 23) << 10) + ((b) >> 23) + ((b) >> 33) + 2 + 1) << (s))
/*
* convert disk block number to the logical block number of the LEVEL 1
* dmapctl describing the disk block. s is the log2(number of logical blocks
* per page)
*
* The calculation figures out how many logical pages are in front of the L1.
* - the number of dmap pages preceding it
* - the number of L0 pages preceding it
* - the number of L1 pages preceding it
* - 1 is added to account for the L2 page
* - 1 is added to account for the control page of the map.
*/
#define BLKTOL1(b,s) \
(((((b) >> 33) << 20) + (((b) >> 33) << 10) + ((b) >> 33) + 1 + 1) << (s))
/*
* convert disk block number to the logical block number of the dmapctl
* at the specified level which describes the disk block.
*/
#define BLKTOCTL(b,s,l) \
(((l) == 2) ? 1 : ((l) == 1) ? BLKTOL1((b),(s)) : BLKTOL0((b),(s)))
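/*
* Illustration only: the BLKTODMAP, BLKTOL0 and BLKTOL1 arithmetic
* above, restated as user-space functions so the page layout can be
* checked by hand. blk_to_dmap(), blk_to_l0() and blk_to_l1() are
* hypothetical names. With s = 0 the map starts with the control page
* at 0, then L2 at 1, the first L1 at 2, the first L0 at 3 and the
* first dmap at 4.
*/

```c
#include <assert.h>
#include <stdint.h>

/* Logical block of the dmap covering block b; s = log2(blocks per page). */
static int64_t blk_to_dmap(int64_t b, int s)
{
	assert(b >= 0 && s >= 0);
	return ((b >> 13) + (b >> 23) + (b >> 33) + 3 + 1) << s;
}

/* Logical block of the level 0 dmapctl page covering block b. */
static int64_t blk_to_l0(int64_t b, int s)
{
	assert(b >= 0 && s >= 0);
	return (((b >> 23) << 10) + (b >> 23) + (b >> 33) + 2 + 1) << s;
}

/* Logical block of the level 1 dmapctl page covering block b. */
static int64_t blk_to_l1(int64_t b, int s)
{
	assert(b >= 0 && s >= 0);
	return (((b >> 33) << 20) + ((b >> 33) << 10) + (b >> 33) + 1 + 1) << s;
}
```

/*
* Each dmap covers 2^13 blocks and each dmapctl page has 2^10 leaves,
* which is where the shifts by 13, 23 and 33 come from. The second L0
* page, for example, follows the 1024 dmaps of the first one.
*/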
/*
* convert aggregate map size to the zero origin dmapctl level of the
* top dmapctl.
*/
#define BMAPSZTOLEV(size) \
(((size) <= MAXL0SIZE) ? 0 : ((size) <= MAXL1SIZE) ? 1 : 2)
/* convert disk block number to allocation group number.
*/
#define BLKTOAG(b,sbi) ((b) >> ((sbi)->bmap->db_agl2size))
/* convert allocation group number to starting disk block
* number.
*/
#define AGTOBLK(a,ip) \
((s64)(a) << (JFS_SBI((ip)->i_sb)->bmap->db_agl2size))
/*
* dmap summary tree
*
* dmaptree_t must be consistent with dmapctl_t.
*/
typedef struct {
s32 nleafs; /* 4: number of tree leafs */
s32 l2nleafs; /* 4: l2 number of tree leafs */
s32 leafidx; /* 4: index of first tree leaf */
s32 height; /* 4: height of the tree */
s8 budmin; /* 1: min l2 tree leaf value to combine */
s8 stree[TREESIZE]; /* TREESIZE: tree */
u8 pad[2]; /* 2: pad to word boundary */
} dmaptree_t; /* - 360 - */
/*
* dmap page per 8K blocks bitmap
*/
typedef struct {
s32 nblocks; /* 4: num blks covered by this dmap */
s32 nfree; /* 4: num of free blks in this dmap */
s64 start; /* 8: starting blkno for this dmap */
dmaptree_t tree; /* 360: dmap tree */
u8 pad[1672]; /* 1672: pad to 2048 bytes */
u32 wmap[LPERDMAP]; /* 1024: bits of the working map */
u32 pmap[LPERDMAP]; /* 1024: bits of the persistent map */
} dmap_t; /* - 4096 - */
/*
* disk map control page per level.
*
* dmapctl_t must be consistent with dmaptree_t.
*/
typedef struct {
s32 nleafs; /* 4: number of tree leafs */
s32 l2nleafs; /* 4: l2 number of tree leafs */
s32 leafidx; /* 4: index of the first tree leaf */
s32 height; /* 4: height of tree */
s8 budmin; /* 1: minimum l2 tree leaf value */
s8 stree[CTLTREESIZE]; /* CTLTREESIZE: dmapctl tree */
u8 pad[2714]; /* 2714: pad to 4096 */
} dmapctl_t; /* - 4096 - */
/*
* common definition for dmaptree_t within dmap and dmapctl
*/
typedef union {
dmaptree_t t1;
dmapctl_t t2;
} dmtree_t;
/* macros for accessing fields within dmtree_t */
#define dmt_nleafs t1.nleafs
#define dmt_l2nleafs t1.l2nleafs
#define dmt_leafidx t1.leafidx
#define dmt_height t1.height
#define dmt_budmin t1.budmin
#define dmt_stree t1.stree
/*
* on-disk aggregate disk allocation map descriptor.
*/
typedef struct {
s64 dn_mapsize; /* 8: number of blocks in aggregate */
s64 dn_nfree; /* 8: num free blks in aggregate map */
s32 dn_l2nbperpage; /* 4: number of blks per page */
s32 dn_numag; /* 4: total number of ags */
s32 dn_maxlevel; /* 4: number of active ags */
s32 dn_maxag; /* 4: max active alloc group number */
s32 dn_agpref; /* 4: preferred alloc group (hint) */
s32 dn_aglevel; /* 4: dmapctl level holding the AG */
s32 dn_agheigth; /* 4: height in dmapctl of the AG */
s32 dn_agwidth; /* 4: width in dmapctl of the AG */
s32 dn_agstart; /* 4: start tree index at AG height */
s32 dn_agl2size; /* 4: l2 num of blks per alloc group */
s64 dn_agfree[MAXAG]; /* 8*MAXAG: per AG free count */
s64 dn_agsize; /* 8: num of blks per alloc group */
s8 dn_maxfreebud; /* 1: max free buddy system */
u8 pad[3007]; /* 3007: pad to 4096 */
} dbmap_t; /* - 4096 - */
/*
* in-memory aggregate disk allocation map descriptor.
*/
typedef struct bmap {
dbmap_t db_bmap; /* on-disk aggregate map descriptor */
struct inode *db_ipbmap; /* ptr to aggregate map incore inode */
struct semaphore db_bmaplock; /* aggregate map lock */
u32 *db_DBmap;
} bmap_t;
/* macros for accessing fields within in-memory aggregate map descriptor */
#define db_mapsize db_bmap.dn_mapsize
#define db_nfree db_bmap.dn_nfree
#define db_agfree db_bmap.dn_agfree
#define db_agsize db_bmap.dn_agsize
#define db_agl2size db_bmap.dn_agl2size
#define db_agwidth db_bmap.dn_agwidth
#define db_agheigth db_bmap.dn_agheigth
#define db_agstart db_bmap.dn_agstart
#define db_numag db_bmap.dn_numag
#define db_maxlevel db_bmap.dn_maxlevel
#define db_aglevel db_bmap.dn_aglevel
#define db_agpref db_bmap.dn_agpref
#define db_maxag db_bmap.dn_maxag
#define db_maxfreebud db_bmap.dn_maxfreebud
#define db_l2nbperpage db_bmap.dn_l2nbperpage
/*
* macros for various conversions needed by the allocators.
* blkstol2(), cntlz(), and cnttz() are operating system dependent functions.
*/
/* convert number of blocks to log2 number of blocks, rounding up to
* the next log2 value if blocks is not a l2 multiple.
*/
#define BLKSTOL2(d) (blkstol2(d))
/* convert number of leafs to log2 leaf value */
#define NLSTOL2BSZ(n) (31 - cntlz((n)) + BUDMIN)
/* convert leaf index to log2 leaf value */
#define LITOL2BSZ(n,m,b) ((((n) == 0) ? (m) : cnttz((n))) + (b))
/* convert a block number to a dmap control leaf index */
#define BLKTOCTLLEAF(b,m) \
(((b) & (((s64)1 << ((m) + L2LPERCTL)) - 1)) >> (m))
/* convert log2 leaf value to buddy size */
#define BUDSIZE(s,m) (1 << ((s) - (m)))
/*
* external references.
*/
extern int dbMount(struct inode *ipbmap);
extern int dbUnmount(struct inode *ipbmap, int mounterror);
extern int dbFree(struct inode *ipbmap, s64 blkno, s64 nblocks);
extern int dbUpdatePMap(struct inode *ipbmap,
int free, s64 blkno, s64 nblocks, tblock_t * tblk);
extern int dbNextAG(struct inode *ipbmap);
extern int dbAlloc(struct inode *ipbmap, s64 hint, s64 nblocks, s64 * results);
extern int dbAllocExact(struct inode *ip, s64 blkno, int nblocks);
extern int dbReAlloc(struct inode *ipbmap,
s64 blkno, s64 nblocks, s64 addnblocks, s64 * results);
extern int dbSync(struct inode *ipbmap);
extern int dbAllocBottomUp(struct inode *ip, s64 blkno, s64 nblocks);
extern int dbExtendFS(struct inode *ipbmap, s64 blkno, s64 nblocks);
extern void dbFinalizeBmap(struct inode *ipbmap);
extern s64 dbMapFileSizeToMapSize(struct inode *ipbmap);
#endif /* _H_JFS_DMAP */
/*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/
/*
* Change History :
*
*/
#ifndef _H_JFS_DTREE
#define _H_JFS_DTREE
/*
* jfs_dtree.h: directory B+-tree manager
*/
#include "jfs_btree.h"
typedef union {
struct {
tid_t tid;
struct inode *ip;
u32 ino;
} leaf;
pxd_t xd;
} ddata_t;
/*
* entry segment/slot
*
* an entry consists of type dependent head/only segment/slot and
* additional segments/slots linked via next field;
* N.B. last/only segment of entry is terminated by next = -1;
*/
/*
* directory page slot
*/
typedef struct {
s8 next; /* 1: */
s8 cnt; /* 1: */
wchar_t name[15]; /* 30: */
} dtslot_t; /* (32) */
#define DATASLOTSIZE 16
#define L2DATASLOTSIZE 4
#define DTSLOTSIZE 32
#define L2DTSLOTSIZE 5
#define DTSLOTHDRSIZE 2
#define DTSLOTDATASIZE 30
#define DTSLOTDATALEN 15
/*
* internal node entry head/only segment
*/
typedef struct {
pxd_t xd; /* 8: child extent descriptor */
s8 next; /* 1: */
u8 namlen; /* 1: */
wchar_t name[11]; /* 22: 2-byte aligned */
} idtentry_t; /* (32) */
#define DTIHDRSIZE 10
#define DTIHDRDATALEN 11
/* compute number of slots for entry */
#define NDTINTERNAL(klen) ( ((4 + (klen)) + (15 - 1)) / 15 )
/*
* leaf node entry head/only segment
*
* For legacy filesystems, name contains 13 wchars -- no index field
*/
typedef struct {
u32 inumber; /* 4: 4-byte aligned */
s8 next; /* 1: */
u8 namlen; /* 1: */
wchar_t name[11]; /* 22: 2-byte aligned */
u32 index; /* 4: index into dir_table */
} ldtentry_t; /* (32) */
#define DTLHDRSIZE 6
#define DTLHDRDATALEN_LEGACY 13 /* Old (OS/2) format */
#define DTLHDRDATALEN 11
/*
* dir_table used for directory traversal during readdir
*/
/*
* Keep persistent index for directory entries
*/
#define DO_INDEX(INODE) (JFS_SBI((INODE)->i_sb)->mntflag & JFS_DIR_INDEX)
/*
* Maximum entry in inline directory table
*/
#define MAX_INLINE_DIRTABLE_ENTRY 13
typedef struct dir_table_slot {
u8 rsrvd; /* 1: */
u8 flag; /* 1: 0 if free */
u8 slot; /* 1: slot within leaf page of entry */
u8 addr1; /* 1: upper 8 bits of leaf page address */
u32 addr2; /* 4: lower 32 bits of leaf page address -OR-
index of next entry when this entry was deleted */
} dir_table_slot_t; /* (8) */
/*
* flag values
*/
#define DIR_INDEX_VALID 1
#define DIR_INDEX_FREE 0
#define DTSaddress(dir_table_slot, address64)\
{\
(dir_table_slot)->addr1 = ((u64)address64) >> 32;\
(dir_table_slot)->addr2 = __cpu_to_le32((address64) & 0xffffffff);\
}
#define addressDTS(dts)\
( ((s64)((dts)->addr1)) << 32 | __le32_to_cpu((dts)->addr2) )
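/*
* Illustration only: a user-space sketch of the 40-bit address split
* that DTSaddress and addressDTS perform. slot_t, slot_set_address()
* and slot_get_address() are hypothetical names. The little-endian
* conversion of addr2 is left out, so this models the in-memory value
* on a little-endian host only.
*/

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for dir_table_slot_t. */
typedef struct {
	uint8_t rsrvd;
	uint8_t flag;
	uint8_t slot;
	uint8_t addr1;		/* upper 8 bits of the address */
	uint32_t addr2;		/* lower 32 bits of the address */
} slot_t;

/* Like DTSaddress: split a 40-bit address across addr1 and addr2. */
static void slot_set_address(slot_t *s, int64_t addr)
{
	assert(addr >= 0 && addr < ((int64_t)1 << 40));
	s->addr1 = (uint8_t)((uint64_t)addr >> 32);
	s->addr2 = (uint32_t)(addr & 0xffffffff);
}

/* Like addressDTS: reassemble the 40-bit address. */
static int64_t slot_get_address(const slot_t *s)
{
	return ((int64_t)s->addr1 << 32) | s->addr2;
}
```

/*
* Storing the address in 1 + 4 bytes is what lets each table slot stay
* at 8 bytes while still addressing up to 2^40 blocks.
*/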
/* compute number of slots for entry */
#define NDTLEAF_LEGACY(klen) ( ((2 + (klen)) + (15 - 1)) / 15 )
#define NDTLEAF NDTINTERNAL
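/*
* Illustration only: where the constants in NDTINTERNAL and
* NDTLEAF_LEGACY come from. The head slot holds fewer name characters
* than a continuation slot (11 or 13 instead of 15), so the formula
* pads the name length by the difference before rounding up to whole
* slots. slots_by_formula(), slots_by_counting() and formulas_agree()
* are hypothetical names.
*/

```c
#include <assert.h>

#define SLOT_CHARS 15	/* name characters per continuation slot */

/* Closed form used by the macros: pad, then round up to whole slots. */
static int slots_by_formula(int klen, int head_chars)
{
	assert(klen >= 0 && head_chars > 0 && head_chars <= SLOT_CHARS);
	return ((SLOT_CHARS - head_chars) + klen + (SLOT_CHARS - 1)) / SLOT_CHARS;
}

/* Same answer by counting: one head slot, then 15 characters per slot. */
static int slots_by_counting(int klen, int head_chars)
{
	int slots = 1;
	int left = klen - head_chars;

	while (left > 0) {
		slots++;
		left -= SLOT_CHARS;
	}
	return slots;
}

/* Return 1 if both methods agree for every length up to max_klen. */
static int formulas_agree(int max_klen, int head_chars)
{
	int klen;

	for (klen = 0; klen <= max_klen; klen++)
		if (slots_by_formula(klen, head_chars) !=
		    slots_by_counting(klen, head_chars))
			return 0;
	return 1;
}
```

/*
* With head_chars = 11 the padding is 4, as in NDTINTERNAL; with
* head_chars = 13 it is 2, as in NDTLEAF_LEGACY.
*/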
/*
* directory root page (in-line in on-disk inode):
*
* cf. dtpage_t below.
*/
typedef union {
struct {
dasd_t DASD; /* 16: DASD limit/usage info F226941 */
u8 flag; /* 1: */
u8 nextindex; /* 1: next free entry in stbl */
s8 freecnt; /* 1: free count */
s8 freelist; /* 1: freelist header */
u32 idotdot; /* 4: parent inode number */
s8 stbl[8]; /* 8: sorted entry index table */
} header; /* (32) */
dtslot_t slot[9];
} dtroot_t;
#define PARENT(IP) \
(le32_to_cpu(JFS_IP(IP)->i_dtroot.header.idotdot))
#define DTROOTMAXSLOT 9
#define dtEmpty(IP) (JFS_IP(IP)->i_dtroot.header.nextindex == 0)
/*
* directory regular page:
*
* entry slot array of 32 byte slot
*
* sorted entry slot index table (stbl):
* contiguous slots at slot specified by stblindex,
* 1-byte per entry
* 512 byte block: 16 entry tbl (1 slot)
* 1024 byte block: 32 entry tbl (1 slot)
* 2048 byte block: 64 entry tbl (2 slot)
* 4096 byte block: 128 entry tbl (4 slot)
*
* data area:
* 512 byte block: 16 - 2 = 14 slot
* 1024 byte block: 32 - 2 = 30 slot
* 2048 byte block: 64 - 3 = 61 slot
* 4096 byte block: 128 - 5 = 123 slot
*
* N.B. index is 0-based; index fields refer to slot index
* except nextindex which refers to entry index in stbl;
* end of entry slot list or freelist is marked with -1.
*/
typedef union {
struct {
s64 next; /* 8: next sibling */
s64 prev; /* 8: previous sibling */
u8 flag; /* 1: */
u8 nextindex; /* 1: next entry index in stbl */
s8 freecnt; /* 1: */
s8 freelist; /* 1: slot index of head of freelist */
u8 maxslot; /* 1: number of slots in page slot[] */
u8 stblindex; /* 1: slot index of start of stbl */
u8 rsrvd[2]; /* 2: */
pxd_t self; /* 8: self pxd */
} header; /* (32) */
dtslot_t slot[128];
} dtpage_t;
#define DTPAGEMAXSLOT 128
#define DT8THPGNODEBYTES 512
#define DT8THPGNODETSLOTS 1
#define DT8THPGNODESLOTS 16
#define DTQTRPGNODEBYTES 1024
#define DTQTRPGNODETSLOTS 1
#define DTQTRPGNODESLOTS 32
#define DTHALFPGNODEBYTES 2048
#define DTHALFPGNODETSLOTS 2
#define DTHALFPGNODESLOTS 64
#define DTFULLPGNODEBYTES 4096
#define DTFULLPGNODETSLOTS 4
#define DTFULLPGNODESLOTS 128
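/*
* Illustration only: the slot counts in the directory page comment and
* the DT*PGNODE* constants above can be derived from the page size.
* A page has one 32-byte slot per 32 bytes, one of them is the header,
* and the sorted entry table needs one byte per slot. page_slots(),
* stbl_slots() and data_slots() are hypothetical names.
*/

```c
#include <assert.h>

#define SLOT_BYTES 32	/* DTSLOTSIZE */

/* Total number of slots in a directory page of the given size. */
static int page_slots(int page_bytes)
{
	assert(page_bytes >= 512 && page_bytes % SLOT_BYTES == 0);
	return page_bytes / SLOT_BYTES;
}

/* Slots taken by the sorted entry table: one byte per slot, rounded up. */
static int stbl_slots(int page_bytes)
{
	int n = page_slots(page_bytes);

	return (n + SLOT_BYTES - 1) / SLOT_BYTES;
}

/* Slots left for entries after the header slot and the table. */
static int data_slots(int page_bytes)
{
	return page_slots(page_bytes) - 1 - stbl_slots(page_bytes);
}
```

/*
* For a 4096-byte page this gives 128 slots, a 4-slot table and 123
* data slots, matching DTFULLPGNODESLOTS, DTFULLPGNODETSLOTS and the
* data area figures in the comment.
*/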
#define DTENTRYSTART 1
/* get sorted entry table of the page */
#define DT_GETSTBL(p) ( ((p)->header.flag & BT_ROOT) ?\
((dtroot_t *)(p))->header.stbl : \
(s8 *)&(p)->slot[(p)->header.stblindex] )
/*
* Flags for dtSearch
*/
#define JFS_CREATE 1
#define JFS_LOOKUP 2
#define JFS_REMOVE 3
#define JFS_RENAME 4
#define DIRENTSIZ(namlen) \
( (sizeof(struct dirent) - 2*(JFS_NAME_MAX+1) + 2*((namlen)+1) + 3) &~ 3 )
/*
* external declarations
*/
extern void dtInitRoot(tid_t tid, struct inode *ip, u32 idotdot);
extern int dtSearch(struct inode *ip, component_t * key,
ino_t * data, btstack_t * btstack, int flag);
extern int dtInsert(tid_t tid, struct inode *ip,
component_t * key, ino_t * ino, btstack_t * btstack);
extern int dtDelete(tid_t tid,
struct inode *ip, component_t * key, ino_t * data, int flag);
extern int dtRelocate(tid_t tid,
struct inode *ip, s64 lmxaddr, pxd_t * opxd, s64 nxaddr);
extern int dtModify(tid_t tid, struct inode *ip,
component_t * key, ino_t * orig_ino, ino_t new_ino, int flag);
extern int jfs_readdir(struct file *filp, void *dirent, filldir_t filldir);
#ifdef _JFS_DEBUG_DTREE
extern int dtDisplayTree(struct inode *ip);
extern int dtDisplayPage(struct inode *ip, s64 bn, dtpage_t * p);
#endif /* _JFS_DEBUG_DTREE */
#endif /* !_H_JFS_DTREE */
/*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/
#ifndef _H_JFS_EXTENDFS
#define _H_JFS_EXTENDFS
/*
* jfs_extendfs.h
*/
/*
* extendfs parameter list
*/
typedef struct {
u32 flag; /* 4: */
u8 dev; /* 1: */
u8 pad[3]; /* 3: */
s64 LVSize; /* 8: LV size in LV block */
s64 FSSize; /* 8: FS size in LV block */
s32 LogSize; /* 4: inlinelog size in LV block */
} extendfs_t; /* (28) */
/* plist flag */
#define EXTENDFS_QUERY 0x00000001
#endif /* _H_JFS_EXTENDFS */
/*
*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*
*
*
* Module: jfs_extent.c:
*/
#include <linux/fs.h>
#include "jfs_incore.h"
#include "jfs_dmap.h"
#include "jfs_extent.h"
#include "jfs_debug.h"
/*
* forward references
*/
static int extBalloc(struct inode *, s64, s64 *, s64 *);
static int extBrealloc(struct inode *, s64, s64, s64 *, s64 *);
int extRecord(struct inode *, xad_t *);
static s64 extRoundDown(s64 nb);
/*
* external references
*/
extern int dbExtend(struct inode *, s64, s64, s64);
extern int jfs_commit_inode(struct inode *, int);
#define DPD(a) (printk("(" #a "): %d\n",(a)))
#define DPC(a) (printk("(" #a "): %c\n",(a)))
/* print a 64-bit value as one or two 32-bit hex words */
#define DPL1(a) \
{ \
if ((a) >> 32) \
printk("(" #a "): %x%08x ",(u32)((a) >> 32),(u32)(a)); \
else \
printk("(" #a "): %x ",(u32)(a)); \
}
#define DPL(a) \
{ \
if ((a) >> 32) \
printk("(" #a "): %x%08x\n",(u32)((a) >> 32),(u32)(a)); \
else \
printk("(" #a "): %x\n",(u32)(a)); \
}
#define DPD1(a) (printk("(" #a "): %d ",(a)))
#define DPX(a) (printk("(" #a "): %08x\n",(a)))
#define DPX1(a) (printk("(" #a "): %08x ",(a)))
#define DPS(a) (printk("%s\n",(a)))
#define DPE(a) (printk("\nENTERING: %s\n",(a)))
#define DPE1(a) (printk("\nENTERING: %s",(a)))
#define DPS1(a) (printk(" %s ",(a)))
/*
* NAME: extAlloc()
*
* FUNCTION: allocate an extent for a specified page range within a
* file.
*
* PARAMETERS:
* ip - the inode of the file.
* xlen - requested extent length.
* pno - the starting page number within the file.
* xp - pointer to an xad. on entry, xad describes an
* extent that is used as an allocation hint if the
* xaddr of the xad is non-zero. on successful exit,
* the xad describes the newly allocated extent.
* abnr - boolean_t indicating whether the newly allocated extent
* should be marked as allocated but not recorded.
*
* RETURN VALUES:
* 0 - success
* EIO - i/o error.
* ENOSPC - insufficient disk resources.
*/
int
extAlloc(struct inode *ip, s64 xlen, s64 pno, xad_t * xp, boolean_t abnr)
{
struct jfs_sb_info *sbi = JFS_SBI(ip->i_sb);
s64 nxlen, nxaddr, xoff, hint, xaddr = 0;
int rc, nbperpage;
int xflag;
/* This blocks if we are low on resources */
txBeginAnon(ip->i_sb);
/* validate extent length */
if (xlen > MAXXLEN)
xlen = MAXXLEN;
/* get the number of blocks per page */
nbperpage = sbi->nbperpage;
/* get the page's starting extent offset */
xoff = pno << sbi->l2nbperpage;
/* check if an allocation hint was provided */
if ((hint = addressXAD(xp))) {
/* get the size of the extent described by the hint */
nxlen = lengthXAD(xp);
/* check if the hint is for the portion of the file
* immediately previous to the current allocation
* request and if hint extent has the same abnr
* value as the current request. if so, we can
* extend the hint extent to include the current
* extent if we can allocate the blocks immediately
* following the hint extent.
*/
if (offsetXAD(xp) + nxlen == xoff &&
abnr == ((xp->flag & XAD_NOTRECORDED) ? TRUE : FALSE))
xaddr = hint + nxlen;
/* adjust the hint to the last block of the extent */
hint += (nxlen - 1);
}
/* allocate the disk blocks for the extent. initially, extBalloc()
* will try to allocate disk blocks for the requested size (xlen).
* if this fails (xlen contiguous free blocks not available), it'll
* try to allocate a smaller number of blocks (producing a smaller
* extent), with this smaller number of blocks consisting of the
* requested number of blocks rounded down to the next smaller
* power of 2 number (i.e. 16 -> 8). it'll continue to round down
* and retry the allocation until the number of blocks to allocate
* is smaller than the number of blocks per page.
*/
nxlen = xlen;
if ((rc =
extBalloc(ip, hint ? hint : INOHINT(ip), &nxlen, &nxaddr))) {
return (rc);
}
/* determine the value of the extent flag */
xflag = (abnr == TRUE) ? XAD_NOTRECORDED : 0;
/* if we can extend the hint extent to cover the current request,
* extend it. otherwise, insert a new extent to
* cover the current request.
*/
if (xaddr && xaddr == nxaddr)
rc = xtExtend(0, ip, xoff, (int) nxlen, 0);
else
rc = xtInsert(0, ip, xflag, xoff, (int) nxlen, &nxaddr, 0);
/* if the extend or insert failed,
* free the newly allocated blocks and return the error.
*/
if (rc) {
dbFree(ip, nxaddr, nxlen);
return (rc);
}
/* update the number of blocks allocated to the file */
ip->i_blocks += LBLK2PBLK(ip->i_sb, nxlen);
/* set the results of the extent allocation */
XADaddress(xp, nxaddr);
XADlength(xp, nxlen);
XADoffset(xp, xoff);
xp->flag = xflag;
mark_inode_dirty(ip);
/*
* COMMIT_Synclist flags an anonymous tlock on a page that is on the
* sync list.
* We need to commit the inode to get the page written to disk.
*/
if (test_and_clear_cflag(COMMIT_Synclist,ip))
jfs_commit_inode(ip, 0);
return (0);
}
/*
* NAME: extRealloc()
*
* FUNCTION: extend the allocation of a file extent containing a
* partially backed last page.
*
* PARAMETERS:
* ip - the inode of the file.
* nxlen - requested size of the resulting extent.
* xp - pointer to an xad. on entry, the xad describes the
* extent to be extended. on successful exit, the xad
* describes the newly allocated extent.
* abnr - boolean_t indicating whether the newly allocated extent
* should be marked as allocated but not recorded.
*
* RETURN VALUES:
* 0 - success
* EIO - i/o error.
* ENOSPC - insufficient disk resources.
*/
int extRealloc(struct inode *ip, s64 nxlen, xad_t * xp, boolean_t abnr)
{
struct super_block *sb = ip->i_sb;
s64 xaddr, xlen, nxaddr, delta, xoff;
s64 ntail, nextend, ninsert;
int rc, nbperpage = JFS_SBI(sb)->nbperpage;
int xflag;
/* This blocks if we are low on resources */
txBeginAnon(ip->i_sb);
/* validate extent length */
if (nxlen > MAXXLEN)
nxlen = MAXXLEN;
/* get the extend (partial) page's disk block address and
* number of blocks.
*/
xaddr = addressXAD(xp);
xlen = lengthXAD(xp);
xoff = offsetXAD(xp);
/* if the extend page is abnr and if the request is for
* the extent to be allocated and recorded,
* make the page allocated and recorded.
*/
if ((xp->flag & XAD_NOTRECORDED) && !abnr) {
xp->flag = 0;
if ((rc = xtUpdate(0, ip, xp)))
return (rc);
}
/* try to allocate the requested number of blocks for the
* extent. extBrealloc() first tries to satisfy the request
* by extending the allocation in place. otherwise, it will
* try to allocate a new set of blocks large enough for the
* request. in satisfying a request, extBrealloc() may allocate
* less than what was requested but will always allocate enough
* space to satisfy the extend page.
*/
if ((rc = extBrealloc(ip, xaddr, xlen, &nxlen, &nxaddr)))
return (rc);
delta = nxlen - xlen;
/* check if the extend page is not abnr but the request is abnr
* and the allocated disk space is for more than one page. if this
* is the case, there is a mismatch of abnr between the extend page
* and the one or more pages following the extend page. as a result,
* two extents will have to be manipulated. the first will be that
* of the extent of the extend page and will be manipulated thru
* an xtExtend() or an xtTailgate(), depending upon whether the
* disk allocation occurred as an inplace extension. the second
* extent will be manipulated (created) through an xtInsert() and
* will be for the pages following the extend page.
*/
if (abnr && (!(xp->flag & XAD_NOTRECORDED)) && (nxlen > nbperpage)) {
ntail = nbperpage;
nextend = ntail - xlen;
ninsert = nxlen - nbperpage;
xflag = XAD_NOTRECORDED;
} else {
ntail = nxlen;
nextend = delta;
ninsert = 0;
xflag = xp->flag;
}
/* if we were able to extend the disk allocation in place,
* extend the extent. otherwise, move the extent to a
* new disk location.
*/
if (xaddr == nxaddr) {
/* extend the extent */
if ((rc = xtExtend(0, ip, xoff + xlen, (int) nextend, 0))) {
dbFree(ip, xaddr + xlen, delta);
return (rc);
}
} else {
/*
* move the extent to a new location:
*
* xtTailgate() accounts for relocated tail extent;
*/
if ((rc = xtTailgate(0, ip, xoff, (int) ntail, nxaddr, 0))) {
dbFree(ip, nxaddr, nxlen);
return (rc);
}
}
/* check if we need to also insert a new extent */
if (ninsert) {
/* perform the insert. if it fails, free the blocks
* to be inserted and make it appear that we only did
* the xtExtend() or xtTailgate() above.
*/
xaddr = nxaddr + ntail;
if (xtInsert (0, ip, xflag, xoff + ntail, (int) ninsert,
&xaddr, 0)) {
dbFree(ip, xaddr, (s64) ninsert);
delta = nextend;
nxlen = ntail;
xflag = 0;
}
}
/* update the inode with the number of blocks allocated */
ip->i_blocks += LBLK2PBLK(sb, delta);
/* set the return results */
XADaddress(xp, nxaddr);
XADlength(xp, nxlen);
XADoffset(xp, xoff);
xp->flag = xflag;
mark_inode_dirty(ip);
return (0);
}
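/*
 * Illustration only, not part of the original source: a user-space sketch of
 * how extRealloc() above splits a reallocation into a tail extent, an
 * extension, and an optional second (inserted) extent. The struct and
 * function names are hypothetical; int64_t stands in for s64, and the
 * "notrecorded" argument stands in for (xp->flag & XAD_NOTRECORDED).
 */

```c
#include <assert.h>
#include <stdint.h>

/* block counts for the tail extent, its extension, and a new insert */
struct realloc_split {
	int64_t ntail;
	int64_t nextend;
	int64_t ninsert;
};

static struct realloc_split
split_realloc(int64_t xlen, int64_t nxlen, int64_t nbperpage,
	      int abnr, int notrecorded)
{
	struct realloc_split s;

	assert(xlen > 0 && nxlen >= xlen && nbperpage > 0);
	if (abnr && !notrecorded && nxlen > nbperpage) {
		/* abnr mismatch: the extend page stays recorded,
		 * the following pages go into a separate abnr extent.
		 */
		s.ntail = nbperpage;
		s.nextend = s.ntail - xlen;
		s.ninsert = nxlen - nbperpage;
	} else {
		/* a single extent covers the whole allocation */
		s.ntail = nxlen;
		s.nextend = nxlen - xlen;
		s.ninsert = 0;
	}
	return s;
}
```

/*
 * For example, with 8 blocks per page, growing a recorded 3-block extent to
 * 16 blocks with abnr set gives ntail 8, nextend 5 and ninsert 8.
 */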
/*
* NAME: extHint()
*
* FUNCTION: produce an extent allocation hint for a file offset.
*
* PARAMETERS:
* ip - the inode of the file.
* offset - file offset for which the hint is needed.
* xp - pointer to the xad that is to be filled in with
* the hint.
*
* RETURN VALUES:
* 0 - success
* EIO - i/o error.
*/
int extHint(struct inode *ip, s64 offset, xad_t * xp)
{
struct super_block *sb = ip->i_sb;
xadlist_t xadl;
lxdlist_t lxdl;
lxd_t lxd;
s64 prev;
int rc, nbperpage = JFS_SBI(sb)->nbperpage;
/* init the hint as "no hint provided" */
XADaddress(xp, 0);
/* determine the starting extent offset of the page previous
* to the page containing the offset.
*/
prev = ((offset & ~POFFSET) >> JFS_SBI(sb)->l2bsize) - nbperpage;
/* if the offset is in the first page of the file,
* no hint provided.
*/
if (prev < 0)
return (0);
/* prepare to lookup the previous page's extent info */
lxdl.maxnlxd = 1;
lxdl.nlxd = 1;
lxdl.lxd = &lxd;
LXDoffset(&lxd, prev);
LXDlength(&lxd, nbperpage);
xadl.maxnxad = 1;
xadl.nxad = 0;
xadl.xad = xp;
/* perform the lookup */
if ((rc = xtLookupList(ip, &lxdl, &xadl, 0)))
return (rc);
/* check if no extent exists for the previous page.
* this is possible for sparse files.
*/
if (xadl.nxad == 0) {
// assert(ISSPARSE(ip));
return (0);
}
/* only preserve the abnr flag within the xad flags
* of the returned hint.
*/
xp->flag &= XAD_NOTRECORDED;
assert(xadl.nxad == 1);
assert(lengthXAD(xp) == nbperpage);
return (0);
}
/*
* NAME: extRecord()
*
* FUNCTION: change a page within a file from not recorded to recorded.
*
* PARAMETERS:
* ip - inode of the file.
* xp - pointer to the xad describing the file page.
*
* RETURN VALUES:
* 0 - success
* EIO - i/o error.
* ENOSPC - insufficient disk resources.
*/
int extRecord(struct inode *ip, xad_t * xp)
{
int rc;
txBeginAnon(ip->i_sb);
/* update the extent */
if ((rc = xtUpdate(0, ip, xp)))
return (rc);
#ifdef _STILL_TO_PORT
/* no longer abnr */
cp->cm_abnr = FALSE;
/* mark the cbuf as modified */
cp->cm_modified = TRUE;
#endif /* _STILL_TO_PORT */
return (0);
}
/*
* NAME: extFill()
*
* FUNCTION: allocate disk space for a file page that represents
* a file hole.
*
* PARAMETERS:
* ip - the inode of the file.
* xp - pointer to the xad of the file page that represents the hole.
*
* RETURN VALUES:
* 0 - success
* EIO - i/o error.
* ENOSPC - insufficient disk resources.
*/
int extFill(struct inode *ip, xad_t * xp)
{
int rc, nbperpage = JFS_SBI(ip->i_sb)->nbperpage;
/* extAlloc() takes a page number: convert the block offset by l2nbperpage */
s64 blkno = offsetXAD(xp) >> JFS_SBI(ip->i_sb)->l2nbperpage;
// assert(ISSPARSE(ip));
/* initialize the extent allocation hint */
XADaddress(xp, 0);
/* allocate an extent to fill the hole */
if ((rc = extAlloc(ip, nbperpage, blkno, xp, FALSE)))
return (rc);
assert(lengthXAD(xp) == nbperpage);
return (0);
}
/*
* NAME: extBalloc()
*
* FUNCTION: allocate disk blocks to form an extent.
*
* initially, we will try to allocate disk blocks for the
* requested size (nblocks). if this fails (nblocks
* contiguous free blocks not available), we'll try to allocate
* a smaller number of blocks (producing a smaller extent), with
* this smaller number of blocks consisting of the requested
* number of blocks rounded down to the next smaller power of 2
* number (i.e. 16 -> 8). we'll continue to round down and
* retry the allocation until the number of blocks to allocate
* is smaller than the number of blocks per page.
*
* PARAMETERS:
* ip - the inode of the file.
* hint - disk block number to be used as an allocation hint.
* *nblocks - pointer to an s64 value. on entry, this value specifies
* the desired number of blocks to be allocated. on successful
* exit, this value is set to the number of blocks actually
* allocated.
* blkno - pointer to a block address that is filled in on successful
* return with the starting block number of the newly
* allocated block range.
*
* RETURN VALUES:
* 0 - success
* EIO - i/o error.
* ENOSPC - insufficient disk resources.
*/
static int
extBalloc(struct inode *ip, s64 hint, s64 * nblocks, s64 * blkno)
{
s64 nb, nblks, daddr, max;
int rc, nbperpage = JFS_SBI(ip->i_sb)->nbperpage;
bmap_t *mp = JFS_SBI(ip->i_sb)->bmap;
/* get the number of blocks to initially attempt to allocate.
* we'll first try the number of blocks requested unless this
* number is greater than the maximum number of contiguous free
* blocks in the map. in that case, we'll start off with the
* maximum free.
*/
max = (s64) 1 << mp->db_maxfreebud;
if (*nblocks >= max && *nblocks > nbperpage)
nb = nblks = (max > nbperpage) ? max : nbperpage;
else
nb = nblks = *nblocks;
/* try to allocate blocks */
while ((rc = dbAlloc(ip, hint, nb, &daddr))) {
/* if something other than an out of space error,
* stop and return this error.
*/
if (rc != ENOSPC)
return (rc);
/* decrease the allocation request size */
nb = min(nblks, extRoundDown(nb));
/* give up if we cannot cover a page */
if (nb < nbperpage)
return (rc);
}
*nblocks = nb;
*blkno = daddr;
return (0);
}
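/*
 * Illustration only, not part of the original source: a user-space sketch of
 * the retry strategy described for extBalloc() above. The real dbAlloc() is
 * replaced by a hypothetical stand-in that succeeds only when the request
 * fits in the largest free run; sim_balloc() and round_down_pow2() are
 * made-up names, and the initial clamp against db_maxfreebud is left out.
 */

```c
#include <assert.h>
#include <stdint.h>

/* largest power of 2 strictly below nb (16 -> 8, 17 -> 16, 1 -> 0) */
static int64_t round_down_pow2(int64_t nb)
{
	uint64_t k = 1;

	assert(nb > 0);
	if (nb == 1)
		return 0;
	while (k * 2 < (uint64_t) nb)
		k *= 2;
	return (int64_t) k;
}

/*
 * returns the number of blocks that would be allocated, or -1 when the
 * request shrinks below one page (the ENOSPC case in the original).
 */
static int64_t sim_balloc(int64_t want, int64_t largest_run, int64_t nbperpage)
{
	int64_t nb = want;

	assert(want > 0 && nbperpage > 0);
	/* "nb > largest_run" stands in for dbAlloc() failing with ENOSPC */
	while (nb > largest_run) {
		nb = round_down_pow2(nb);
		if (nb < nbperpage)
			return -1;
	}
	return nb;
}
```

/*
 * For example, a request for 64 blocks against a largest free run of 20
 * blocks is retried as 32 and then 16, so 16 blocks are allocated.
 */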
/*
* NAME: extBrealloc()
*
* FUNCTION: attempt to extend an extent's allocation.
*
* initially, we will try to extend the extent's allocation
* in place. if this fails, we'll try to move the extent
* to a new set of blocks. if moving the extent, we initially
* will try to allocate disk blocks for the requested size
* (newnblks). if this fails (newnblks contiguous free blocks not
* available), we'll try to allocate a smaller number of
* blocks (producing a smaller extent), with this smaller
* number of blocks consisting of the requested number of
* blocks rounded down to the next smaller power of 2
* number (i.e. 16 -> 8). we'll continue to round down and
* retry the allocation until the number of blocks to allocate
* is smaller than the number of blocks per page.
*
* PARAMETERS:
* ip - the inode of the file.
* blkno - starting block number of the extent's current allocation.
* nblks - number of blocks within the extent's current allocation.
* newnblks - pointer to a s64 value. on entry, this value is
* the new desired extent size (number of blocks). on
* successful exit, this value is set to the extent's actual
* new size (new number of blocks).
* newblkno - the starting block number of the extent's new allocation.
*
* RETURN VALUES:
* 0 - success
* EIO - i/o error.
* ENOSPC - insufficient disk resources.
*/
static int
extBrealloc(struct inode *ip,
s64 blkno, s64 nblks, s64 * newnblks, s64 * newblkno)
{
int rc;
/* try to extend in place */
if ((rc = dbExtend(ip, blkno, nblks, *newnblks - nblks)) == 0) {
*newblkno = blkno;
return (0);
} else {
if (rc != ENOSPC)
return (rc);
}
/* in place extension not possible.
* try to move the extent to a new set of blocks.
*/
return (extBalloc(ip, blkno, newnblks, newblkno));
}
/*
* NAME: extRoundDown()
*
* FUNCTION: round down a specified number of blocks to the next
* smaller power of 2 number (a power of 2 is itself halved,
* i.e. 16 -> 8).
*
* PARAMETERS:
* nb - the number of blocks to round down.
*
* RETURN VALUES:
* next smallest power of 2 number.
*/
static s64 extRoundDown(s64 nb)
{
int i;
u64 m, k;
for (i = 0, m = (u64) 1 << 63; i < 64; i++, m >>= 1) {
if (m & nb)
break;
}
i = 63 - i;
k = (u64) 1 << i;
k = ((k - 1) & nb) ? k : k >> 1;
return (k);
}
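/*
 * Illustration only, not part of the original source: a user-space copy of
 * the bit scan in extRoundDown() above, with int64_t and uint64_t standing
 * in for s64 and u64. The name ext_round_down is hypothetical. The added
 * assert documents an assumption: for nb == 0 the scan finds no set bit and
 * the shift count would become negative.
 */

```c
#include <assert.h>
#include <stdint.h>

static int64_t ext_round_down(int64_t nb)
{
	int i;
	uint64_t m, k;

	assert(nb > 0);
	/* find the highest set bit of nb, scanning from bit 63 down */
	for (i = 0, m = (uint64_t) 1 << 63; i < 64; i++, m >>= 1) {
		if (m & (uint64_t) nb)
			break;
	}
	i = 63 - i;
	k = (uint64_t) 1 << i;
	/* keep k if nb has lower bits set, otherwise nb == k: halve it */
	k = ((k - 1) & (uint64_t) nb) ? k : k >> 1;
	return (int64_t) k;
}
```

/*
 * With this logic 17 and 24 both round down to 16, while 16 itself rounds
 * down to 8 and 1 rounds down to 0.
 */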
/*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/
#ifndef _H_JFS_EXTENT
#define _H_JFS_EXTENT
/* get block allocation hint as location of disk inode */
#define INOHINT(ip) \
(addressPXD(&(JFS_IP(ip)->ixpxd)) + lengthPXD(&(JFS_IP(ip)->ixpxd)) - 1)
extern int extAlloc(struct inode *, s64, s64, xad_t *, boolean_t);
extern int extFill(struct inode *, xad_t *);
extern int extHint(struct inode *, s64, xad_t *);
extern int extRealloc(struct inode *, s64, xad_t *, boolean_t);
extern int extRecord(struct inode *, xad_t *);
#endif /* _H_JFS_EXTENT */
/*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*
*/
#ifndef _H_JFS_FILSYS
#define _H_JFS_FILSYS
/*
* jfs_filsys.h
*
* file system (implementation-dependent) constants
*
* refer to <limits.h> for system wide implementation-dependent constants
*/
/*
* file system option (superblock flag)
*/
/* platform option (conditional compilation) */
#define JFS_AIX 0x80000000 /* AIX support */
/* POSIX name/directory support */
#define JFS_OS2 0x40000000 /* OS/2 support */
/* case-insensitive name/directory support */
#define JFS_DFS 0x20000000 /* DCE DFS LFS support */
#define JFS_LINUX 0x10000000 /* Linux support */
/* case-sensitive name/directory support */
/* directory option */
#define JFS_UNICODE 0x00000001 /* unicode name */
/* commit option */
#define JFS_COMMIT 0x00000f00 /* commit option mask */
#define JFS_GROUPCOMMIT 0x00000100 /* group (of 1) commit */
#define JFS_LAZYCOMMIT 0x00000200 /* lazy commit */
#define JFS_TMPFS 0x00000400 /* temporary file system -
* do not log/commit:
*/
/* log logical volume option */
#define JFS_INLINELOG 0x00000800 /* inline log within file system */
#define JFS_INLINEMOVE 0x00001000 /* inline log being moved */
/* Secondary aggregate inode table */
#define JFS_BAD_SAIT 0x00010000 /* current secondary ait is bad */
/* sparse regular file support */
#define JFS_SPARSE 0x00020000 /* sparse regular file */
/* DASD Limits F226941 */
#define JFS_DASD_ENABLED 0x00040000 /* DASD limits enabled */
#define JFS_DASD_PRIME 0x00080000 /* Prime DASD usage on boot */
/* big endian flag */
#define JFS_SWAP_BYTES 0x00100000 /* running on big endian computer */
/* Directory index */
#define JFS_DIR_INDEX 0x00200000 /* Persistent index for */
/* directory entries */
/*
* buffer cache configuration
*/
/* page size */
#ifdef PSIZE
#undef PSIZE
#endif
#define PSIZE 4096 /* page size (in byte) */
#define L2PSIZE 12 /* log2(PSIZE) */
#define POFFSET 4095 /* offset within page */
/* buffer page size */
#define BPSIZE PSIZE
/*
* fs fundamental size
*
* PSIZE >= file system block size >= PBSIZE >= DISIZE
*/
#define PBSIZE 512 /* physical block size (in byte) */
#define L2PBSIZE 9 /* log2(PBSIZE) */
#define DISIZE 512 /* on-disk inode size (in byte) */
#define L2DISIZE 9 /* log2(DISIZE) */
#define IDATASIZE 256 /* inode inline data size */
#define IXATTRSIZE 128 /* inode inline extended attribute size */
#define XTPAGE_SIZE 4096
#define log2_PAGESIZE 12
#define IAG_SIZE 4096
#define IAG_EXTENT_SIZE 4096
#define INOSPERIAG 4096 /* number of disk inodes per iag */
#define L2INOSPERIAG 12 /* l2 number of disk inodes per iag */
#define INOSPEREXT 32 /* number of disk inode per extent */
#define L2INOSPEREXT 5 /* l2 number of disk inode per extent */
#define IXSIZE (DISIZE * INOSPEREXT) /* inode extent size */
#define INOSPERPAGE 8 /* number of disk inodes per 4K page */
#define L2INOSPERPAGE 3 /* log2(INOSPERPAGE) */
#define IAGFREELIST_LWM 64
#define INODE_EXTENT_SIZE IXSIZE /* inode extent size */
#define NUM_INODE_PER_EXTENT INOSPEREXT
#define NUM_INODE_PER_IAG INOSPERIAG
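/*
 * Illustration only, not part of the original header: a user-space sketch of
 * how the log2 constants above relate to each other and how they could be
 * used to split an inode number into an IAG number, an extent within the
 * IAG, and a slot within the extent. The helpers ino_to_iag, ino_to_ext and
 * ino_to_slot are hypothetical names, not the macros JFS itself uses.
 */

```c
#include <assert.h>
#include <stdint.h>

#define PSIZE		4096
#define DISIZE		512
#define INOSPERIAG	4096
#define L2INOSPERIAG	12
#define INOSPEREXT	32
#define L2INOSPEREXT	5
#define IXSIZE		(DISIZE * INOSPEREXT)

/* sanity checks tying each constant to its log2 counterpart */
static void check_constants(void)
{
	assert((1 << L2INOSPERIAG) == INOSPERIAG);
	assert((1 << L2INOSPEREXT) == INOSPEREXT);
	assert(PSIZE / DISIZE == 8);	/* INOSPERPAGE */
}

/* number of the IAG that covers this inode */
static uint32_t ino_to_iag(uint32_t ino)
{
	return ino >> L2INOSPERIAG;
}

/* index of the inode extent within its IAG (0..127) */
static uint32_t ino_to_ext(uint32_t ino)
{
	return (ino & (INOSPERIAG - 1)) >> L2INOSPEREXT;
}

/* index of the inode within its extent (0..31) */
static uint32_t ino_to_slot(uint32_t ino)
{
	return ino & (INOSPEREXT - 1);
}
```

/*
 * Under these assumptions each IAG covers 128 inode extents of 16384 bytes,
 * and inode 4129 falls in IAG 1, extent 1, slot 1.
 */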
#define MINBLOCKSIZE 512
#define MAXBLOCKSIZE 4096
#define MAXFILESIZE ((s64)1 << 52)
#define JFS_LINK_MAX 65535 /* nlink_t is unsigned short */
/* Minimum number of bytes supported for a JFS partition */
#define MINJFS (0x1000000)
#define MINJFSTEXT "16"
/*
* file system block size -> physical block size
*/
#define LBOFFSET(x) ((x) & (PBSIZE - 1))
#define LBNUMBER(x) ((x) >> L2PBSIZE)
#define LBLK2PBLK(sb,b) ((b) << (sb->s_blocksize_bits - L2PBSIZE))
#define PBLK2LBLK(sb,b) ((b) >> (sb->s_blocksize_bits - L2PBSIZE))
/* size in byte -> last page number */
#define SIZE2PN(size) ( ((s64)((size) - 1)) >> (L2PSIZE) )
/* size in byte -> last file system block number */
#define SIZE2BN(size, l2bsize) ( ((s64)((size) - 1)) >> (l2bsize) )
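/*
 * Illustration only, not part of the original header: user-space functions
 * that mirror the conversion macros above so the arithmetic can be checked.
 * The function names are hypothetical, int64_t stands in for s64, and
 * blocksize_bits stands in for sb->s_blocksize_bits.
 */

```c
#include <assert.h>
#include <stdint.h>

#define L2PBSIZE 9	/* log2 of the 512-byte physical block */
#define L2PSIZE 12	/* log2 of the 4096-byte page */

/* mirrors LBLK2PBLK(): file system blocks -> 512-byte physical blocks */
static int64_t lblk_to_pblk(int blocksize_bits, int64_t b)
{
	assert(blocksize_bits >= L2PBSIZE && blocksize_bits <= L2PSIZE);
	return b << (blocksize_bits - L2PBSIZE);
}

/* mirrors PBLK2LBLK(): 512-byte physical blocks -> file system blocks */
static int64_t pblk_to_lblk(int blocksize_bits, int64_t b)
{
	assert(blocksize_bits >= L2PBSIZE && blocksize_bits <= L2PSIZE);
	return b >> (blocksize_bits - L2PBSIZE);
}

/* mirrors SIZE2PN(): size in bytes -> number of the last page */
static int64_t size_to_last_page(int64_t size)
{
	assert(size > 0);
	return (size - 1) >> L2PSIZE;
}

/* mirrors SIZE2BN(): size in bytes -> number of the last fs block */
static int64_t size_to_last_block(int64_t size, int l2bsize)
{
	assert(size > 0);
	return (size - 1) >> l2bsize;
}
```

/*
 * For example, with 4096-byte file system blocks, 5 blocks correspond to 40
 * physical blocks, and a 4097-byte file ends in page number 1.
 */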
/*
* fixed physical block address (physical block size = 512 byte)
*
* NOTE: since we can't guarantee a physical block size of 512 bytes the use of
* these macros should be removed and the byte offset macros used instead.
*/
#define SUPER1_B 64 /* primary superblock */
#define AIMAP_B (SUPER1_B + 8) /* 1st extent of aggregate inode map */
#define AITBL_B (AIMAP_B + 16) /*
* 1st extent of aggregate inode table
*/
#define SUPER2_B (AITBL_B + 32) /* 2ndary superblock pbn */
#define BMAP_B (SUPER2_B + 8) /* block allocation map */
/*
* SIZE_OF_SUPER defines the total amount of space reserved on disk for the
* superblock. This is not the same as the superblock structure, since all of
* this space is not currently being used.
*/
#define SIZE_OF_SUPER PSIZE
/*
* SIZE_OF_AG_TABLE defines the amount of space reserved to hold the AG table
*/
#define SIZE_OF_AG_TABLE PSIZE
/*
* SIZE_OF_MAP_PAGE defines the amount of disk space reserved for each page of
* the inode allocation map (to hold iag)
*/
#define SIZE_OF_MAP_PAGE PSIZE
/*
* fixed byte offset address
*/
#define SUPER1_OFF 0x8000 /* primary superblock */
#define AIMAP_OFF (SUPER1_OFF + SIZE_OF_SUPER)
/*
* Control page of aggregate inode map
* followed by 1st extent of map
*/
#define AITBL_OFF (AIMAP_OFF + (SIZE_OF_MAP_PAGE << 1))
/*
* 1st extent of aggregate inode table
*/
#define SUPER2_OFF (AITBL_OFF + INODE_EXTENT_SIZE)
/*
* secondary superblock
*/
#define BMAP_OFF (SUPER2_OFF + SIZE_OF_SUPER)
/*
* block allocation map
*/
/*
* The following macro is used to indicate the number of reserved disk blocks at
* the front of an aggregate, in terms of physical blocks. This value is
* currently defined to be 32K. This turns out to be the same as the primary
* superblock's address, since it directly follows the reserved blocks.
*/
#define AGGR_RSVD_BLOCKS SUPER1_B
/*
* The following macro is used to indicate the number of reserved bytes at the
* front of an aggregate. This value is currently defined to be 32K. This
* turns out to be the same as the primary superblock's byte offset, since it
* directly follows the reserved blocks.
*/
#define AGGR_RSVD_BYTES SUPER1_OFF
/*
* The following macro defines the byte offset for the first inode extent in
* the aggregate inode table. This allows us to find the self inode to find the
* rest of the table. Currently this value is 44K.
*/
#define AGGR_INODE_TABLE_START AITBL_OFF
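/*
 * Illustration only, not part of the original header: a user-space check
 * that the fixed byte offsets above agree with the fixed physical block
 * addresses when the physical block size is 512 bytes. The constants are
 * copied from this header; pblock_to_off() is a hypothetical helper.
 */

```c
#include <assert.h>

#define PSIZE			4096
#define PBSIZE			512
#define DISIZE			512
#define INOSPEREXT		32
#define INODE_EXTENT_SIZE	(DISIZE * INOSPEREXT)
#define SIZE_OF_SUPER		PSIZE
#define SIZE_OF_MAP_PAGE	PSIZE

/* fixed byte offsets */
#define SUPER1_OFF	0x8000
#define AIMAP_OFF	(SUPER1_OFF + SIZE_OF_SUPER)
#define AITBL_OFF	(AIMAP_OFF + (SIZE_OF_MAP_PAGE << 1))
#define SUPER2_OFF	(AITBL_OFF + INODE_EXTENT_SIZE)
#define BMAP_OFF	(SUPER2_OFF + SIZE_OF_SUPER)

/* fixed addresses in 512-byte physical blocks */
#define SUPER1_B	64
#define AIMAP_B		(SUPER1_B + 8)
#define AITBL_B		(AIMAP_B + 16)
#define SUPER2_B	(AITBL_B + 32)
#define BMAP_B		(SUPER2_B + 8)

/* byte offset of a 512-byte physical block */
static long pblock_to_off(long b)
{
	assert(b >= 0);
	return b * PBSIZE;
}
```

/*
 * With these values the primary superblock sits at 32K, the aggregate inode
 * table at 44K, the secondary superblock at 60K and the block map at 64K.
 */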
/*
* fixed reserved inode number
*/
/* aggregate inode */
#define AGGR_RESERVED_I 0 /* aggregate inode (reserved) */
#define AGGREGATE_I 1 /* aggregate inode map inode */
#define BMAP_I 2 /* aggregate block allocation map inode */
#define LOG_I 3 /* aggregate inline log inode */
#define BADBLOCK_I 4 /* aggregate bad block inode */
#define FILESYSTEM_I 16 /* 1st/only fileset inode in ait:
* fileset inode map inode
*/
/* per fileset inode */
#define FILESET_RSVD_I 0 /* fileset inode (reserved) */
#define FILESET_EXT_I 1 /* fileset inode extension */
#define ROOT_I 2 /* fileset root inode */
#define ACL_I 3 /* fileset ACL inode */
#define FILESET_OBJECT_I 4 /* the first fileset inode available for a file
* or directory or link...
*/
#define FIRST_FILESET_INO 16 /* the first aggregate inode which describes
* an inode. (To fsck this is also the first
* inode in part 2 of the agg inode table.)
*/
/*
* directory configuration
*/
#define JFS_NAME_MAX 255
#define JFS_PATH_MAX BPSIZE
/*
* file system state (superblock state)
*/
#define FM_CLEAN 0x00000000 /* file system is unmounted and clean */
#define FM_MOUNT 0x00000001 /* file system is mounted cleanly */
#define FM_DIRTY 0x00000002 /* file system was not unmounted and clean
* when mounted or
* commit failure occurred while being mounted:
* fsck() must be run to repair
*/
#define FM_LOGREDO 0x00000004 /* log based recovery (logredo()) failed:
* fsck() must be run to repair
*/
#define FM_EXTENDFS 0x00000008 /* file system extendfs() in progress */
#endif /* _H_JFS_FILSYS */
/*
*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*
*/
/*
* Change History :
*
*/
/*
* jfs_imap.c: inode allocation map manager
*
* Serialization:
* Each AG has a simple lock which is used to control the serialization of
* the AG level lists. This lock should be taken first whenever an AG
* level list will be modified or accessed.
*
* Each IAG is locked by obtaining the buffer for the IAG page.
*
* There is also an inode lock for the inode map inode. A read lock needs to
* be taken whenever an IAG is read from the map or the global level
* information is read. A write lock needs to be taken whenever the global
* level information is modified or an atomic operation needs to be used.
*
* If more than one IAG is read at one time, the read lock may not
* be given up until all of the IAG's are read. Otherwise, a deadlock
* may occur when trying to obtain the read lock while another thread
* holding the read lock is waiting on the IAG already being held.
*
* The control page of the inode map is read into memory by diMount().
* Thereafter it should only be modified in memory and then it will be
* written out when the filesystem is unmounted by diUnmount().
*/
#include <linux/fs.h>
#include <linux/slab.h>
#include <linux/locks.h>
#include "jfs_incore.h"
#include "jfs_filsys.h"
#include "jfs_dinode.h"
#include "jfs_dmap.h"
#include "jfs_imap.h"
#include "jfs_metapage.h"
#include "jfs_superblock.h"
#include "jfs_debug.h"
/*
* imap locks
*/
/* iag free list lock */
#define IAGFREE_LOCK_INIT(imap) init_MUTEX(&imap->im_freelock)
#define IAGFREE_LOCK(imap) down(&imap->im_freelock)
#define IAGFREE_UNLOCK(imap) up(&imap->im_freelock)
/* per ag iag list locks */
#define AG_LOCK_INIT(imap,index) init_MUTEX(&(imap->im_aglock[index]))
#define AG_LOCK(imap,agno) down(&imap->im_aglock[agno])
#define AG_UNLOCK(imap,agno) up(&imap->im_aglock[agno])
/*
* external references
*/
extern struct address_space_operations jfs_aops;
/*
* forward references
*/
static int diAllocAG(imap_t *, int, boolean_t, struct inode *);
static int diAllocAny(imap_t *, int, boolean_t, struct inode *);
static int diAllocBit(imap_t *, iag_t *, int);
static int diAllocExt(imap_t *, int, struct inode *);
static int diAllocIno(imap_t *, int, struct inode *);
static int diFindFree(u32, int);
static int diNewExt(imap_t *, iag_t *, int);
static int diNewIAG(imap_t *, int *, int, metapage_t **);
static void duplicateIXtree(struct super_block *, s64, int, s64 *);
static int diIAGRead(imap_t * imap, int, metapage_t **);
static int copy_from_dinode(dinode_t *, struct inode *);
static void copy_to_dinode(dinode_t *, struct inode *);
/*
* debug code for double-checking inode map
*/
/* #define _JFS_DEBUG_IMAP 1 */
#ifdef _JFS_DEBUG_IMAP
#define DBG_DIINIT(imap) DBGdiInit(imap)
#define DBG_DIALLOC(imap, ino) DBGdiAlloc(imap, ino)
#define DBG_DIFREE(imap, ino) DBGdiFree(imap, ino)
static void *DBGdiInit(imap_t * imap);
static void DBGdiAlloc(imap_t * imap, ino_t ino);
static void DBGdiFree(imap_t * imap, ino_t ino);
#else
#define DBG_DIINIT(imap)
#define DBG_DIALLOC(imap, ino)
#define DBG_DIFREE(imap, ino)
#endif /* _JFS_DEBUG_IMAP */
/*
* NAME: diMount()
*
* FUNCTION: initialize the incore inode map control structures for
* a fileset or aggregate at init time.
*
* the inode map's control structure (dinomap_t) is
* brought in from disk and placed in virtual memory.
*
* PARAMETERS:
* ipimap - pointer to inode map inode for the aggregate or fileset.
*
* RETURN VALUES:
* 0 - success
* ENOMEM - insufficient free virtual memory.
* EIO - i/o error.
*/
int diMount(struct inode *ipimap)
{
imap_t *imap;
metapage_t *mp;
int index;
dinomap_t *dinom_le;
/*
* allocate/initialize the in-memory inode map control structure
*/
/* allocate the in-memory inode map control structure. */
imap = (imap_t *) kmalloc(sizeof(imap_t), GFP_KERNEL);
if (imap == NULL) {
jERROR(1, ("diMount: kmalloc returned NULL!\n"));
return (ENOMEM);
}
/* read the on-disk inode map control structure. */
mp = read_metapage(ipimap,
IMAPBLKNO << JFS_SBI(ipimap->i_sb)->l2nbperpage,
PSIZE, 0);
if (mp == NULL) {
kfree(imap);
return (EIO);
}
/* copy the on-disk version to the in-memory version. */
dinom_le = (dinomap_t *) mp->data;
imap->im_freeiag = le32_to_cpu(dinom_le->in_freeiag);
imap->im_nextiag = le32_to_cpu(dinom_le->in_nextiag);
atomic_set(&imap->im_numinos, le32_to_cpu(dinom_le->in_numinos));
atomic_set(&imap->im_numfree, le32_to_cpu(dinom_le->in_numfree));
imap->im_nbperiext = le32_to_cpu(dinom_le->in_nbperiext);
imap->im_l2nbperiext = le32_to_cpu(dinom_le->in_l2nbperiext);
for (index = 0; index < MAXAG; index++) {
imap->im_agctl[index].inofree =
le32_to_cpu(dinom_le->in_agctl[index].inofree);
imap->im_agctl[index].extfree =
le32_to_cpu(dinom_le->in_agctl[index].extfree);
imap->im_agctl[index].numinos =
le32_to_cpu(dinom_le->in_agctl[index].numinos);
imap->im_agctl[index].numfree =
le32_to_cpu(dinom_le->in_agctl[index].numfree);
}
/* release the buffer. */
release_metapage(mp);
/*
* allocate/initialize inode allocation map locks
*/
/* allocate and init iag free list lock */
IAGFREE_LOCK_INIT(imap);
/* allocate and init ag list locks */
for (index = 0; index < MAXAG; index++) {
AG_LOCK_INIT(imap, index);
}
/* bind the inode map inode and inode map control structure
* to each other.
*/
imap->im_ipimap = ipimap;
JFS_IP(ipimap)->i_imap = imap;
// DBG_DIINIT(imap);
return (0);
}
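/*
 * Illustration only, not part of the original commit: diMount() runs every
 * on-disk field through le32_to_cpu() and diSync() does the reverse with
 * cpu_to_le32(). A minimal userspace sketch of that round trip follows.
 * demo_le32_to_cpu() and demo_cpu_to_le32() are hypothetical stand-ins
 * that work on raw bytes; they are not the kernel helpers themselves.
 */

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a 32-bit little-endian value, independent of host byte order. */
uint32_t demo_le32_to_cpu(const unsigned char *p)
{
	assert(p != NULL);
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Encode a host-order value as four little-endian bytes. */
void demo_cpu_to_le32(uint32_t v, unsigned char *p)
{
	assert(p != NULL);
	p[0] = (unsigned char)(v & 0xff);
	p[1] = (unsigned char)((v >> 8) & 0xff);
	p[2] = (unsigned char)((v >> 16) & 0xff);
	p[3] = (unsigned char)((v >> 24) & 0xff);
}
```

/*
 * Working byte by byte keeps the sketch correct on both little-endian and
 * big-endian hosts, which is the property the in-kernel macros provide.
 */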
/*
* NAME: diUnmount()
*
* FUNCTION: write to disk the incore inode map control structures for
* a fileset or aggregate at unmount time.
*
* PARAMETERS:
* ipimap - pointer to inode map inode for the aggregate or fileset.
*
* RETURN VALUES:
* 0 - success
* ENOMEM - insufficient free virtual memory.
* EIO - i/o error.
*/
int diUnmount(struct inode *ipimap, int mounterror)
{
imap_t *imap = JFS_IP(ipimap)->i_imap;
/*
* update the on-disk inode map control structure
*/
if (!(mounterror || isReadOnly(ipimap)))
diSync(ipimap);
/*
* Invalidate the page cache buffers
*/
truncate_inode_pages(ipimap->i_mapping, 0);
/*
* free in-memory control structure
*/
kfree(imap);
return (0);
}
/*
* diSync()
*/
int diSync(struct inode *ipimap)
{
dinomap_t *dinom_le;
imap_t *imp = JFS_IP(ipimap)->i_imap;
metapage_t *mp;
int index;
/*
* write imap global control page
*/
/* read the on-disk inode map control structure */
mp = get_metapage(ipimap,
IMAPBLKNO << JFS_SBI(ipimap->i_sb)->l2nbperpage,
PSIZE, 0);
if (mp == NULL) {
jERROR(1,("diSync: get_metapage failed!\n"));
return EIO;
}
/* copy the in-memory version to the on-disk version */
//memcpy(mp->data, &imp->im_imap,sizeof(dinomap_t));
dinom_le = (dinomap_t *) mp->data;
dinom_le->in_freeiag = cpu_to_le32(imp->im_freeiag);
dinom_le->in_nextiag = cpu_to_le32(imp->im_nextiag);
dinom_le->in_numinos = cpu_to_le32(atomic_read(&imp->im_numinos));
dinom_le->in_numfree = cpu_to_le32(atomic_read(&imp->im_numfree));
dinom_le->in_nbperiext = cpu_to_le32(imp->im_nbperiext);
dinom_le->in_l2nbperiext = cpu_to_le32(imp->im_l2nbperiext);
for (index = 0; index < MAXAG; index++) {
dinom_le->in_agctl[index].inofree =
cpu_to_le32(imp->im_agctl[index].inofree);
dinom_le->in_agctl[index].extfree =
cpu_to_le32(imp->im_agctl[index].extfree);
dinom_le->in_agctl[index].numinos =
cpu_to_le32(imp->im_agctl[index].numinos);
dinom_le->in_agctl[index].numfree =
cpu_to_le32(imp->im_agctl[index].numfree);
}
/* write out the control structure */
write_metapage(mp);
/*
* write out dirty pages of imap
*/
fsync_inode_data_buffers(ipimap);
diWriteSpecial(ipimap);
return (0);
}
/*
* NAME: diRead()
*
* FUNCTION: initialize an incore inode from disk.
*
* on entry, the specified incore inode should itself
* specify the disk inode number corresponding to the
* incore inode (i.e. i_number should be initialized).
*
* this routine handles incore inode initialization for
* both "special" and "regular" inodes. special inodes
* are those required early in the mount process and
* require special handling since much of the file system
* is not yet initialized. these "special" inodes are
* identified by a NULL inode map inode pointer and are
* actually initialized by a call to diReadSpecial().
*
* for regular inodes, the iag describing the disk inode
* is read from disk to determine the inode extent address
* for the disk inode. with the inode extent address in
* hand, the page of the extent that contains the disk
* inode is read and the disk inode is copied to the
* incore inode.
*
* PARAMETERS:
* ip - pointer to incore inode to be initialized from disk.
*
* RETURN VALUES:
* 0 - success
* EIO - i/o error.
* ENOMEM - insufficient memory
*
*/
int diRead(struct inode *ip)
{
struct jfs_sb_info *sbi = JFS_SBI(ip->i_sb);
int iagno, ino, extno, rc;
struct inode *ipimap;
dinode_t *dp;
iag_t *iagp;
metapage_t *mp;
s64 blkno, agstart;
imap_t *imap;
int block_offset;
int inodes_left;
uint pageno;
int rel_inode;
jFYI(1, ("diRead: ino = %ld\n", ip->i_ino));
ipimap = sbi->ipimap;
JFS_IP(ip)->ipimap = ipimap;
/* determine the iag number for this inode (number) */
iagno = INOTOIAG(ip->i_ino);
/* read the iag */
imap = JFS_IP(ipimap)->i_imap;
IREAD_LOCK(ipimap);
rc = diIAGRead(imap, iagno, &mp);
IREAD_UNLOCK(ipimap);
if (rc) {
jERROR(1, ("diRead: diIAGRead returned %d\n", rc));
return (rc);
}
iagp = (iag_t *) mp->data;
/* determine inode extent that holds the disk inode */
ino = ip->i_ino & (INOSPERIAG - 1);
extno = ino >> L2INOSPEREXT;
if ((lengthPXD(&iagp->inoext[extno]) != imap->im_nbperiext) ||
(addressPXD(&iagp->inoext[extno]) == 0)) {
jERROR(1, ("diRead: Bad inoext: 0x%lx, 0x%lx\n",
(ulong) addressPXD(&iagp->inoext[extno]),
(ulong) lengthPXD(&iagp->inoext[extno])));
release_metapage(mp);
updateSuper(ip->i_sb, FM_DIRTY);
return ESTALE;
}
/* get disk block number of the page within the inode extent
* that holds the disk inode.
*/
blkno = INOPBLK(&iagp->inoext[extno], ino, sbi->l2nbperpage);
/* get the ag for the iag */
agstart = le64_to_cpu(iagp->agstart);
release_metapage(mp);
rel_inode = (ino & (INOSPERPAGE - 1));
pageno = blkno >> sbi->l2nbperpage;
if ((block_offset = ((u32) blkno & (sbi->nbperpage - 1)))) {
/*
* OS/2 didn't always align inode extents on page boundaries
*/
inodes_left =
(sbi->nbperpage - block_offset) << sbi->l2niperblk;
if (rel_inode < inodes_left)
rel_inode += block_offset << sbi->l2niperblk;
else {
pageno += 1;
rel_inode -= inodes_left;
}
}
/* read the page of disk inode */
mp = read_metapage(ipimap, pageno << sbi->l2nbperpage, PSIZE, 1);
if (mp == 0) {
jERROR(1, ("diRead: read_metapage failed\n"));
return EIO;
}
/* locate the disk inode requested */
dp = (dinode_t *) mp->data;
dp += rel_inode;
if (ip->i_ino != le32_to_cpu(dp->di_number)) {
jERROR(1, ("diRead: i_ino != di_number\n"));
updateSuper(ip->i_sb, FM_DIRTY);
rc = EIO;
} else if (le32_to_cpu(dp->di_nlink) == 0) {
jERROR(1,
("diRead: di_nlink is zero. ino=%ld\n", ip->i_ino));
updateSuper(ip->i_sb, FM_DIRTY);
rc = ESTALE;
} else
/* copy the disk inode to the in-memory inode */
rc = copy_from_dinode(dp, ip);
release_metapage(mp);
/* set the ag for the inode */
JFS_IP(ip)->agno = BLKTOAG(agstart, sbi);
return (rc);
}
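/*
 * Illustration only, not part of the original commit: diRead() and diFree()
 * split an inode number into an iag number, an index within the iag, an
 * extent number and a bit within the extent. The sketch below mirrors that
 * arithmetic in userspace. The DEMO_* constants are assumptions: 32 inodes
 * per extent follows the comment in diReadSpecial(), while 4096 inodes per
 * iag and a HIGHORDER value of 0x80000000 are assumed, since the real
 * definitions live in headers not shown here.
 */

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_L2INOSPEREXT	5	/* 32 inodes per extent */
#define DEMO_INOSPEREXT		(1u << DEMO_L2INOSPEREXT)
#define DEMO_L2INOSPERIAG	12	/* 4096 inodes per iag (assumed) */
#define DEMO_INOSPERIAG		(1u << DEMO_L2INOSPERIAG)
#define DEMO_HIGHORDER		0x80000000u	/* assumed bit-0 mask */

struct demo_inopos {
	uint32_t iagno;	/* iag that describes the inode */
	uint32_t ino;	/* inode index within that iag */
	uint32_t extno;	/* inode extent index within the iag */
	uint32_t bitno;	/* inode index within the extent */
	uint32_t mask;	/* bit for the inode in wmap[extno] */
};

/* Split an inode number the way diRead()/diFree() do. */
struct demo_inopos demo_split_inum(uint32_t inum)
{
	struct demo_inopos pos;

	pos.iagno = inum >> DEMO_L2INOSPERIAG;
	pos.ino = inum & (DEMO_INOSPERIAG - 1);
	pos.extno = pos.ino >> DEMO_L2INOSPEREXT;
	pos.bitno = pos.ino & (DEMO_INOSPEREXT - 1);
	pos.mask = DEMO_HIGHORDER >> pos.bitno;
	return pos;
}

/* Inverse mapping, following the expression in diInitInode(). */
uint32_t demo_join_inum(uint32_t iagno, uint32_t ino)
{
	assert(ino < DEMO_INOSPERIAG);
	return (iagno << DEMO_L2INOSPERIAG) + ino;
}
```

/*
 * Under these assumed constants, inode 8262 lands in iag 2, extent 2,
 * bit 6, and demo_join_inum(2, 70) gives back 8262.
 */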
/*
* NAME: diReadSpecial()
*
* FUNCTION: initialize a 'special' inode from disk.
*
* this routine handles aggregate level inodes. The
* inode cache cannot differentiate between the
* aggregate inodes and the filesystem inodes, so we
* handle these here. We don't actually use the aggregate
* inode map, since these inodes are at a fixed location
* and in some cases the aggregate inode map isn't initialized
* yet.
*
* PARAMETERS:
* sb - filesystem superblock
* inum - aggregate inode number
*
* RETURN VALUES:
* new inode - success
* NULL - i/o error.
*/
struct inode *diReadSpecial(struct super_block *sb, ino_t inum)
{
struct jfs_sb_info *sbi = JFS_SBI(sb);
uint address;
dinode_t *dp;
struct inode *ip;
metapage_t *mp;
ip = new_inode(sb);
if (ip == NULL) {
jERROR(1,
("diReadSpecial: new_inode returned NULL!\n"));
return ip;
}
/*
* If ip->i_number >= 32 (INOSPEREXT), then read from secondary
* aggregate inode table.
*/
if (inum >= INOSPEREXT) {
address =
addressPXD(&sbi->ait2) >> sbi->l2nbperpage;
inum -= INOSPEREXT;
ASSERT(inum < INOSPEREXT);
JFS_IP(ip)->ipimap = sbi->ipaimap2;
} else {
address = AITBL_OFF >> L2PSIZE;
JFS_IP(ip)->ipimap = sbi->ipaimap;
}
ip->i_ino = inum;
address += inum >> 3; /* 8 inodes per 4K page */
/* read the page of fixed disk inode (AIT) in raw mode */
jEVENT(0,
("Reading aggregate inode %d from block %d\n", (uint) inum,
address));
mp = read_metapage(ip, address << sbi->l2nbperpage, PSIZE, 1);
if (mp == NULL) {
ip->i_sb = NULL;
ip->i_nlink = 1; /* Don't want iput() deleting it */
iput(ip);
return (NULL);
}
/* get the pointer to the disk inode of interest */
dp = (dinode_t *) (mp->data);
dp += inum % 8; /* 8 inodes per 4K page */
/* copy on-disk inode to in-memory inode */
if ((copy_from_dinode(dp, ip)) != 0) {
/* handle bad return by returning NULL for ip */
ip->i_sb = NULL;
ip->i_nlink = 1; /* Don't want iput() deleting it */
iput(ip);
/* release the page */
release_metapage(mp);
return (NULL);
}
ip->i_mapping->a_ops = &jfs_aops;
ip->i_mapping->gfp_mask = GFP_NOFS;
if ((inum == FILESYSTEM_I) && (JFS_IP(ip)->ipimap == sbi->ipaimap)) {
sbi->gengen = le32_to_cpu(dp->di_gengen);
sbi->inostamp = le32_to_cpu(dp->di_inostamp);
}
/* release the page */
release_metapage(mp);
return (ip);
}
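/*
 * Illustration only, not part of the original commit: diReadSpecial() and
 * diWriteSpecial() find a fixed aggregate inode by choosing the primary or
 * secondary table, then a page (inum >> 3) and a slot (inum % 8) within it.
 * A minimal sketch of that lookup, with hypothetical names; the base
 * address of each table is left out because it comes from the superblock.
 */

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_INOSPEREXT		32u	/* per the comment in diReadSpecial() */
#define DEMO_INOSPERPAGE	8u	/* 8 inodes per 4K page */

struct demo_aitpos {
	int secondary;	/* nonzero: secondary aggregate inode table */
	uint32_t page;	/* page offset from the start of the table */
	uint32_t slot;	/* disk inode slot within that page */
};

/* Locate an aggregate inode relative to the start of its table. */
struct demo_aitpos demo_ait_locate(uint32_t inum)
{
	struct demo_aitpos pos;

	/* mirrors ASSERT(inum < INOSPEREXT) after the subtraction */
	assert(inum < 2 * DEMO_INOSPEREXT);

	pos.secondary = inum >= DEMO_INOSPEREXT;
	if (pos.secondary)
		inum -= DEMO_INOSPEREXT;
	pos.page = inum >> 3;
	pos.slot = inum % DEMO_INOSPERPAGE;
	return pos;
}
```

/*
 * For example, aggregate inode 45 maps to the secondary table, page 1,
 * slot 5, because 45 - 32 = 13 and 13 = 1 * 8 + 5.
 */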
/*
* NAME: diWriteSpecial()
*
* FUNCTION: Write the special inode to disk
*
* PARAMETERS:
* ip - special inode
*
* RETURN VALUES: none
*/
void diWriteSpecial(struct inode *ip)
{
struct jfs_sb_info *sbi = JFS_SBI(ip->i_sb);
uint address;
dinode_t *dp;
ino_t inum = ip->i_ino;
metapage_t *mp;
/*
* If ip->i_number >= 32 (INOSPEREXT), then write to secondary
* aggregate inode table.
*/
if (!(ip->i_state & I_DIRTY))
return;
ip->i_state &= ~I_DIRTY;
if (inum >= INOSPEREXT) {
address =
addressPXD(&sbi->ait2) >> sbi->l2nbperpage;
inum -= INOSPEREXT;
ASSERT(inum < INOSPEREXT);
} else {
address = AITBL_OFF >> L2PSIZE;
}
address += inum >> 3; /* 8 inodes per 4K page */
/* read the page of fixed disk inode (AIT) in raw mode */
jEVENT(0,
("Reading aggregate inode %d from block %d\n", (uint) inum,
address));
mp = read_metapage(ip, address << sbi->l2nbperpage, PSIZE, 1);
if (mp == NULL) {
jERROR(1,
("diWriteSpecial: failed to read aggregate inode extent!\n"));
return;
}
/* get the pointer to the disk inode of interest */
dp = (dinode_t *) (mp->data);
dp += inum % 8; /* 8 inodes per 4K page */
/* copy in-memory inode to on-disk inode */
copy_to_dinode(dp, ip);
memcpy(&dp->di_xtroot, &JFS_IP(ip)->i_xtroot, 288);
if (inum == FILESYSTEM_I)
dp->di_gengen = cpu_to_le32(sbi->gengen);
/* write the page */
write_metapage(mp);
}
/*
* NAME: diFreeSpecial()
*
* FUNCTION: Free allocated space for special inode
*/
void diFreeSpecial(struct inode *ip)
{
if (ip == NULL) {
jERROR(1, ("diFreeSpecial called with NULL ip!\n"));
return;
}
fsync_inode_data_buffers(ip);
truncate_inode_pages(ip->i_mapping, 0);
iput(ip);
}
/*
* NAME: diWrite()
*
* FUNCTION: write the on-disk inode portion of the in-memory inode
* to its corresponding on-disk inode.
*
* on entry, the specified incore inode should itself
* specify the disk inode number corresponding to the
* incore inode (i.e. i_number should be initialized).
*
* the inode contains the inode extent address for the disk
* inode. with the inode extent address in hand, the
* page of the extent that contains the disk inode is
* read and the disk inode portion of the incore inode
* is copied to the disk inode.
*
* PARAMETERS:
* tid - transaction id
* ip - pointer to incore inode to be written to the inode extent.
*
* RETURN VALUES:
* 0 - success
* EIO - i/o error.
*/
int diWrite(tid_t tid, struct inode *ip)
{
struct jfs_sb_info *sbi = JFS_SBI(ip->i_sb);
struct jfs_inode_info *jfs_ip = JFS_IP(ip);
int rc = 0;
s32 ino;
dinode_t *dp;
s64 blkno;
int block_offset;
int inodes_left;
metapage_t *mp;
uint pageno;
int rel_inode;
int dioffset;
struct inode *ipimap;
uint type;
lid_t lid;
tlock_t *ditlck, *tlck;
linelock_t *dilinelock, *ilinelock;
lv_t *lv;
int n;
ipimap = jfs_ip->ipimap;
ino = ip->i_ino & (INOSPERIAG - 1);
assert(lengthPXD(&(jfs_ip->ixpxd)) ==
JFS_IP(ipimap)->i_imap->im_nbperiext);
assert(addressPXD(&(jfs_ip->ixpxd)));
/*
* read the page of disk inode containing the specified inode:
*/
/* compute the block address of the page */
blkno = INOPBLK(&(jfs_ip->ixpxd), ino, sbi->l2nbperpage);
rel_inode = (ino & (INOSPERPAGE - 1));
pageno = blkno >> sbi->l2nbperpage;
if ((block_offset = ((u32) blkno & (sbi->nbperpage - 1)))) {
/*
* OS/2 didn't always align inode extents on page boundaries
*/
inodes_left =
(sbi->nbperpage - block_offset) << sbi->l2niperblk;
if (rel_inode < inodes_left)
rel_inode += block_offset << sbi->l2niperblk;
else {
pageno += 1;
rel_inode -= inodes_left;
}
}
/* read the page of disk inode */
retry:
mp = read_metapage(ipimap, pageno << sbi->l2nbperpage, PSIZE, 1);
if (mp == 0)
return (EIO);
/* get the pointer to the disk inode */
dp = (dinode_t *) mp->data;
dp += rel_inode;
dioffset = (ino & (INOSPERPAGE - 1)) << L2DISIZE;
/*
* acquire transaction lock on the on-disk inode;
* N.B. tlock is acquired on ipimap not ip;
*/
if ((ditlck =
txLock(tid, ipimap, mp, tlckINODE | tlckENTRY)) == NULL)
goto retry;
dilinelock = (linelock_t *) & ditlck->lock;
/*
* copy btree root from in-memory inode to on-disk inode
*
* (tlock is taken from inline B+-tree root in in-memory
* inode when the B+-tree root is updated, which is pointed
* by jfs_ip->blid as well as being on tx tlock list)
*
* further processing of btree root is based on the copy
* in in-memory inode, where txLog() will log from, and,
* for xtree root, txUpdateMap() will update map and reset
* XAD_NEW bit;
*/
if (S_ISDIR(ip->i_mode) && (lid = jfs_ip->xtlid)) {
/*
* This is the special xtree inside the directory for storing
* the directory table
*/
xtpage_t *p, *xp;
xad_t *xad;
jfs_ip->xtlid = 0;
tlck = lid_to_tlock(lid);
assert(tlck->type & tlckXTREE);
tlck->type |= tlckBTROOT;
tlck->mp = mp;
ilinelock = (linelock_t *) & tlck->lock;
/*
* copy xtree root from inode to dinode:
*/
p = &jfs_ip->i_xtroot;
xp = (xtpage_t *) &dp->di_dirtable;
lv = (lv_t *) & ilinelock->lv;
for (n = 0; n < ilinelock->index; n++, lv++) {
memcpy(&xp->xad[lv->offset], &p->xad[lv->offset],
lv->length << L2XTSLOTSIZE);
}
/* reset on-disk (metadata page) xtree XAD_NEW bit */
xad = &xp->xad[XTENTRYSTART];
for (n = XTENTRYSTART;
n < le16_to_cpu(xp->header.nextindex); n++, xad++)
if (xad->flag & (XAD_NEW | XAD_EXTENDED))
xad->flag &= ~(XAD_NEW | XAD_EXTENDED);
}
if ((lid = jfs_ip->blid) == 0)
goto inlineData;
jfs_ip->blid = 0;
tlck = lid_to_tlock(lid);
type = tlck->type;
tlck->type |= tlckBTROOT;
tlck->mp = mp;
ilinelock = (linelock_t *) & tlck->lock;
/*
* regular file: 16 byte (XAD slot) granularity
*/
if (type & tlckXTREE) {
xtpage_t *p, *xp;
xad_t *xad;
/*
* copy xtree root from inode to dinode:
*/
p = &jfs_ip->i_xtroot;
xp = &dp->di_xtroot;
lv = (lv_t *) & ilinelock->lv;
for (n = 0; n < ilinelock->index; n++, lv++) {
memcpy(&xp->xad[lv->offset], &p->xad[lv->offset],
lv->length << L2XTSLOTSIZE);
}
/* reset on-disk (metadata page) xtree XAD_NEW bit */
xad = &xp->xad[XTENTRYSTART];
for (n = XTENTRYSTART;
n < le16_to_cpu(xp->header.nextindex); n++, xad++)
if (xad->flag & (XAD_NEW | XAD_EXTENDED))
xad->flag &= ~(XAD_NEW | XAD_EXTENDED);
}
/*
* directory: 32 byte (directory entry slot) granularity
*/
else if (type & tlckDTREE) {
dtpage_t *p, *xp;
/*
* copy dtree root from inode to dinode:
*/
p = (dtpage_t *) &jfs_ip->i_dtroot;
xp = (dtpage_t *) & dp->di_dtroot;
lv = (lv_t *) & ilinelock->lv;
for (n = 0; n < ilinelock->index; n++, lv++) {
memcpy(&xp->slot[lv->offset], &p->slot[lv->offset],
lv->length << L2DTSLOTSIZE);
}
} else {
jERROR(1, ("diWrite: UFO tlock\n"));
}
inlineData:
/*
* copy inline symlink from in-memory inode to on-disk inode
*/
if (S_ISLNK(ip->i_mode) && ip->i_size < IDATASIZE) {
lv = (lv_t *) & dilinelock->lv[dilinelock->index];
lv->offset = (dioffset + 2 * 128) >> L2INODESLOTSIZE;
lv->length = 2;
memcpy(&dp->di_fastsymlink, jfs_ip->i_inline, IDATASIZE);
dilinelock->index++;
}
#ifdef _STILL_TO_PORT
/*
* copy inline data from in-memory inode to on-disk inode:
* 128 byte slot granularity
*/
if (test_cflag(COMMIT_Inlineea, ip)) {
lv = (lv_t *) & dilinelock->lv[dilinelock->index];
lv->offset = (dioffset + 3 * 128) >> L2INODESLOTSIZE;
lv->length = 1;
memcpy(&dp->di_inlineea, &ip->i_inlineea, INODESLOTSIZE);
dilinelock->index++;
clear_cflag(COMMIT_Inlineea, ip);
}
#endif /* _STILL_TO_PORT */
/*
* lock/copy inode base: 128 byte slot granularity
*/
// baseDinode:
lv = (lv_t *) & dilinelock->lv[dilinelock->index];
lv->offset = dioffset >> L2INODESLOTSIZE;
copy_to_dinode(dp, ip);
if (test_and_clear_cflag(COMMIT_Dirtable, ip)) {
lv->length = 2;
memcpy(&dp->di_dirtable, &jfs_ip->i_dirtable, 96);
} else
lv->length = 1;
dilinelock->index++;
#ifdef _JFS_FASTDASD
/*
* We aren't logging changes to the DASD used in directory inodes,
* but we need to write them to disk. If we don't unmount cleanly,
* mount will recalculate the DASD used.
*/
if (S_ISDIR(ip->i_mode)
&& (ip->i_ipmnt->i_mntflag & JFS_DASD_ENABLED))
bcopy(&ip->i_DASD, &dp->di_DASD, sizeof(dasd_t));
#endif /* _JFS_FASTDASD */
/* release the buffer holding the updated on-disk inode.
* the buffer will be later written by commit processing.
*/
write_metapage(mp);
return (rc);
}
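/*
 * Illustration only, not part of the original commit: diRead() and diWrite()
 * both adjust the page number and the inode index when an inode extent does
 * not start on a page boundary (the OS/2 case noted above). The sketch
 * below repeats that adjustment in userspace. demo_inode_page() and its
 * parameters are hypothetical; l2nbperpage and l2niperblk stand for the
 * sbi fields of the same name.
 */

```c
#include <assert.h>
#include <stdint.h>

struct demo_pagepos {
	uint32_t pageno;	/* aligned page holding the disk inode */
	int rel_inode;		/* inode index within that page */
};

/*
 * blkno is the block where the inode's logical page starts and rel_inode
 * is the inode's index within that logical page.
 */
struct demo_pagepos demo_inode_page(int64_t blkno, int rel_inode,
				    int l2nbperpage, int l2niperblk)
{
	struct demo_pagepos pos;
	int nbperpage = 1 << l2nbperpage;
	int block_offset = (int)(blkno & (nbperpage - 1));
	int inodes_left;

	assert(blkno >= 0 && rel_inode >= 0);

	pos.pageno = (uint32_t)(blkno >> l2nbperpage);
	pos.rel_inode = rel_inode;
	if (block_offset) {
		/* inodes between blkno and the end of the aligned page */
		inodes_left = (nbperpage - block_offset) << l2niperblk;
		if (rel_inode < inodes_left) {
			pos.rel_inode += block_offset << l2niperblk;
		} else {
			pos.pageno += 1;
			pos.rel_inode -= inodes_left;
		}
	}
	return pos;
}
```

/*
 * With an assumed geometry of 4 blocks per page and 2 inodes per block,
 * an extent page starting at block 10 straddles pages 2 and 3: its inode 3
 * is slot 7 of page 2, and its inode 5 is slot 1 of page 3.
 */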
/*
* NAME: diFree(ip)
*
* FUNCTION: free a specified inode from the inode working map
* for a fileset or aggregate.
*
* if the inode to be freed represents the first (only)
* free inode within the iag, the iag will be placed on
* the ag free inode list.
*
* freeing the inode will cause the inode extent to be
* freed if the inode is the only allocated inode within
* the extent. in this case all the disk resource backing
* up the inode extent will be freed. in addition, the iag
* will be placed on the ag extent free list if the extent
* is the first free extent in the iag. if freeing the
* extent also means that no free inodes will exist for
* the iag, the iag will also be removed from the ag free
* inode list.
*
* the iag describing the inode will be freed if the extent
* is to be freed and it is the only backed extent within
* the iag. in this case, the iag will be removed from the
* ag free extent list and ag free inode list and placed on
* the inode map's free iag list.
*
* a careful update approach is used to provide consistency
* in the face of updates to multiple buffers. under this
* approach, all required buffers are obtained before making
* any updates and are held until all updates are complete.
*
* PARAMETERS:
* ip - inode to be freed.
*
* RETURN VALUES:
* 0 - success
* EIO - i/o error.
*/
int diFree(struct inode *ip)
{
int rc;
ino_t inum = ip->i_ino;
iag_t *iagp, *aiagp, *biagp, *ciagp, *diagp;
metapage_t *mp, *amp, *bmp, *cmp, *dmp;
int iagno, ino, extno, bitno, sword, agno;
int back, fwd;
u32 bitmap, mask;
struct inode *ipimap = JFS_SBI(ip->i_sb)->ipimap;
imap_t *imap = JFS_IP(ipimap)->i_imap;
s64 xaddr;
s64 xlen;
pxd_t freepxd;
tid_t tid;
struct inode *iplist[3];
tlock_t *tlck;
pxdlock_t *pxdlock;
/*
* This is just to suppress compiler warnings. The same logic that
* references these variables is used to initialize them.
*/
aiagp = biagp = ciagp = diagp = NULL;
/* get the iag number containing the inode.
*/
iagno = INOTOIAG(inum);
/* make sure that the iag is contained within
* the map.
*/
//assert(iagno < imap->im_nextiag);
if (iagno >= imap->im_nextiag) {
jERROR(1, ("diFree: inum = %d, iagno = %d, nextiag = %d\n",
(uint) inum, iagno, imap->im_nextiag));
dump_mem("imap", imap, 32);
updateSuper(ip->i_sb, FM_DIRTY);
return EIO;
}
/* get the allocation group for this ino.
*/
agno = JFS_IP(ip)->agno;
/* Lock the AG specific inode map information
*/
AG_LOCK(imap, agno);
/* Obtain read lock in imap inode. Don't release it until we have
* read all of the IAG's that we are going to.
*/
IREAD_LOCK(ipimap);
/* read the iag.
*/
if ((rc = diIAGRead(imap, iagno, &mp))) {
IREAD_UNLOCK(ipimap);
AG_UNLOCK(imap, agno);
return (rc);
}
iagp = (iag_t *) mp->data;
/* get the inode number and extent number of the inode within
* the iag and the inode number within the extent.
*/
ino = inum & (INOSPERIAG - 1);
extno = ino >> L2INOSPEREXT;
bitno = ino & (INOSPEREXT - 1);
mask = HIGHORDER >> bitno;
assert(le32_to_cpu(iagp->wmap[extno]) & mask);
#ifdef _STILL_TO_PORT
assert((le32_to_cpu(iagp->pmap[extno]) & mask) == 0);
#endif /* _STILL_TO_PORT */
assert(addressPXD(&iagp->inoext[extno]));
/* compute the bitmap for the extent reflecting the freed inode.
*/
bitmap = le32_to_cpu(iagp->wmap[extno]) & ~mask;
if (imap->im_agctl[agno].numfree > imap->im_agctl[agno].numinos) {
jERROR(1,("diFree: numfree > numinos\n"));
release_metapage(mp);
IREAD_UNLOCK(ipimap);
AG_UNLOCK(imap, agno);
updateSuper(ip->i_sb, FM_DIRTY);
return EIO;
}
/*
* inode extent still has some inodes or below low water mark:
* keep the inode extent;
*/
if (bitmap ||
imap->im_agctl[agno].numfree < 96 ||
(imap->im_agctl[agno].numfree < 288 &&
(((imap->im_agctl[agno].numfree * 100) /
imap->im_agctl[agno].numinos) <= 25))) {
/* if the iag currently has no free inodes (i.e.,
* the inode being freed is the first free inode of iag),
* insert the iag at head of the inode free list for the ag.
*/
if (iagp->nfreeinos == 0) {
/* check if there are any iags on the ag inode
* free list. if so, read the first one so that
* we can link the current iag onto the list at
* the head.
*/
if ((fwd = imap->im_agctl[agno].inofree) >= 0) {
/* read the iag that currently is the head
* of the list.
*/
if ((rc = diIAGRead(imap, fwd, &amp))) {
IREAD_UNLOCK(ipimap);
AG_UNLOCK(imap, agno);
release_metapage(mp);
return (rc);
}
aiagp = (iag_t *) amp->data;
/* make current head point back to the iag.
*/
aiagp->inofreeback = cpu_to_le32(iagno);
write_metapage(amp);
}
/* iag points forward to current head and iag
* becomes the new head of the list.
*/
iagp->inofreefwd =
cpu_to_le32(imap->im_agctl[agno].inofree);
iagp->inofreeback = -1;
imap->im_agctl[agno].inofree = iagno;
}
IREAD_UNLOCK(ipimap);
/* update the free inode summary map for the extent if
* freeing the inode means the extent will now have free
* inodes (i.e., the inode being freed is the first free
* inode of extent),
*/
if (iagp->wmap[extno] == ONES) {
sword = extno >> L2EXTSPERSUM;
bitno = extno & (EXTSPERSUM - 1);
iagp->inosmap[sword] &=
cpu_to_le32(~(HIGHORDER >> bitno));
}
/* update the bitmap.
*/
iagp->wmap[extno] = cpu_to_le32(bitmap);
DBG_DIFREE(imap, inum);
/* update the free inode counts at the iag, ag and
* map level.
*/
iagp->nfreeinos =
cpu_to_le32(le32_to_cpu(iagp->nfreeinos) + 1);
imap->im_agctl[agno].numfree += 1;
atomic_inc(&imap->im_numfree);
/* release the AG inode map lock
*/
AG_UNLOCK(imap, agno);
/* write the iag */
write_metapage(mp);
return (0);
}
/*
* inode extent has become free and above low water mark:
* free the inode extent;
*/
/*
* prepare to update iag list(s) (careful update step 1)
*/
amp = bmp = cmp = dmp = NULL;
fwd = back = -1;
/* check if the iag currently has no free extents. if so,
* it will be placed on the head of the ag extent free list.
*/
if (iagp->nfreeexts == 0) {
/* check if the ag extent free list has any iags.
* if so, read the iag at the head of the list now.
* this (head) iag will be updated later to reflect
* the addition of the current iag at the head of
* the list.
*/
if ((fwd = imap->im_agctl[agno].extfree) >= 0) {
if ((rc = diIAGRead(imap, fwd, &amp)))
goto error_out;
aiagp = (iag_t *) amp->data;
}
} else {
/* iag has free extents. check if the addition of a free
* extent will cause all extents to be free within this
* iag. if so, the iag will be removed from the ag extent
* free list and placed on the inode map's free iag list.
*/
if (iagp->nfreeexts == cpu_to_le32(EXTSPERIAG - 1)) {
/* in preparation for removing the iag from the
* ag extent free list, read the iags preceding
* and following the iag on the ag extent free
* list.
*/
if ((fwd = le32_to_cpu(iagp->extfreefwd)) >= 0) {
if ((rc = diIAGRead(imap, fwd, &amp)))
goto error_out;
aiagp = (iag_t *) amp->data;
}
if ((back = le32_to_cpu(iagp->extfreeback)) >= 0) {
if ((rc = diIAGRead(imap, back, &bmp)))
goto error_out;
biagp = (iag_t *) bmp->data;
}
}
}
/* remove the iag from the ag inode free list if freeing
* this extent causes the iag to have no free inodes.
*/
if (iagp->nfreeinos == cpu_to_le32(INOSPEREXT - 1)) {
int inofreeback = le32_to_cpu(iagp->inofreeback);
int inofreefwd = le32_to_cpu(iagp->inofreefwd);
/* in preparation for removing the iag from the
* ag inode free list, read the iags preceding
* and following the iag on the ag inode free
* list. before reading these iags, we must make
* sure that we don't already have them in hand
* from up above, since re-reading an iag (buffer)
* we are currently holding would cause a deadlock.
*/
if (inofreefwd >= 0) {
if (inofreefwd == fwd)
ciagp = (iag_t *) amp->data;
else if (inofreefwd == back)
ciagp = (iag_t *) bmp->data;
else {
if ((rc =
diIAGRead(imap, inofreefwd, &cmp)))
goto error_out;
assert(cmp != NULL);
ciagp = (iag_t *) cmp->data;
}
assert(ciagp != NULL);
}
if (inofreeback >= 0) {
if (inofreeback == fwd)
diagp = (iag_t *) amp->data;
else if (inofreeback == back)
diagp = (iag_t *) bmp->data;
else {
if ((rc =
diIAGRead(imap, inofreeback, &dmp)))
goto error_out;
assert(dmp != NULL);
diagp = (iag_t *) dmp->data;
}
assert(diagp != NULL);
}
}
IREAD_UNLOCK(ipimap);
/*
* invalidate any page of the inode extent freed from buffer cache;
*/
freepxd = iagp->inoext[extno];
xaddr = addressPXD(&iagp->inoext[extno]);
xlen = lengthPXD(&iagp->inoext[extno]);
invalidate_metapages(JFS_SBI(ip->i_sb)->direct_inode, xaddr, xlen);
/*
* update iag list(s) (careful update step 2)
*/
/* add the iag to the ag extent free list if this is the
* first free extent for the iag.
*/
if (iagp->nfreeexts == 0) {
if (fwd >= 0)
aiagp->extfreeback = cpu_to_le32(iagno);
iagp->extfreefwd =
cpu_to_le32(imap->im_agctl[agno].extfree);
iagp->extfreeback = -1;
imap->im_agctl[agno].extfree = iagno;
} else {
/* remove the iag from the ag extent list if all extents
* are now free and place it on the inode map iag free list.
*/
if (iagp->nfreeexts == cpu_to_le32(EXTSPERIAG - 1)) {
if (fwd >= 0)
aiagp->extfreeback = iagp->extfreeback;
if (back >= 0)
biagp->extfreefwd = iagp->extfreefwd;
else
imap->im_agctl[agno].extfree =
le32_to_cpu(iagp->extfreefwd);
iagp->extfreefwd = iagp->extfreeback = -1;
IAGFREE_LOCK(imap);
iagp->iagfree = cpu_to_le32(imap->im_freeiag);
imap->im_freeiag = iagno;
IAGFREE_UNLOCK(imap);
}
}
/* remove the iag from the ag inode free list if freeing
* this extent causes the iag to have no free inodes.
*/
if (iagp->nfreeinos == cpu_to_le32(INOSPEREXT - 1)) {
if ((int) le32_to_cpu(iagp->inofreefwd) >= 0)
ciagp->inofreeback = iagp->inofreeback;
if ((int) le32_to_cpu(iagp->inofreeback) >= 0)
diagp->inofreefwd = iagp->inofreefwd;
else
imap->im_agctl[agno].inofree =
le32_to_cpu(iagp->inofreefwd);
iagp->inofreefwd = iagp->inofreeback = -1;
}
/* update the inode extent address and working map
* to reflect the free extent.
* the permanent map should have been updated already
* for the inode being freed.
*/
assert(iagp->pmap[extno] == 0);
iagp->wmap[extno] = 0;
DBG_DIFREE(imap, inum);
PXDlength(&iagp->inoext[extno], 0);
PXDaddress(&iagp->inoext[extno], 0);
/* update the free extent and free inode summary maps
* to reflect the freed extent.
* the inode summary map is marked to indicate no inodes
* available for the freed extent.
*/
sword = extno >> L2EXTSPERSUM;
bitno = extno & (EXTSPERSUM - 1);
mask = HIGHORDER >> bitno;
iagp->inosmap[sword] |= cpu_to_le32(mask);
iagp->extsmap[sword] &= cpu_to_le32(~mask);
/* update the number of free inodes and number of free extents
* for the iag.
*/
iagp->nfreeinos = cpu_to_le32(le32_to_cpu(iagp->nfreeinos) -
(INOSPEREXT - 1));
iagp->nfreeexts = cpu_to_le32(le32_to_cpu(iagp->nfreeexts) + 1);
/* update the number of free inodes and backed inodes
* at the ag and inode map level.
*/
imap->im_agctl[agno].numfree -= (INOSPEREXT - 1);
imap->im_agctl[agno].numinos -= INOSPEREXT;
atomic_sub(INOSPEREXT - 1, &imap->im_numfree);
atomic_sub(INOSPEREXT, &imap->im_numinos);
if (amp)
write_metapage(amp);
if (bmp)
write_metapage(bmp);
if (cmp)
write_metapage(cmp);
if (dmp)
write_metapage(dmp);
/*
* start transaction to update block allocation map
* for the inode extent freed;
*
* N.B. AG_LOCK is released and iag will be released below, and
* other thread may allocate inode from/reusing the ixad freed
* BUT with new/different backing inode extent from the extent
* to be freed by the transaction;
*/
tid = txBegin(ipimap->i_sb, COMMIT_FORCE);
/* acquire tlock of the iag page of the freed ixad
* to force the page NOHOMEOK (even though no data is
* logged from the iag page) until NOREDOPAGE|FREEXTENT log
* for the free of the extent is committed;
* write FREEXTENT|NOREDOPAGE log record
* N.B. linelock is overlaid as freed extent descriptor;
*/
tlck = txLock(tid, ipimap, mp, tlckINODE | tlckFREE);
pxdlock = (pxdlock_t *) & tlck->lock;
pxdlock->flag = mlckFREEPXD;
pxdlock->pxd = freepxd;
pxdlock->index = 1;
write_metapage(mp);
iplist[0] = ipimap;
/*
* logredo needs the IAG number and IAG extent index in order
* to ensure that the IMap is consistent. The least disruptive
* way to pass these values through to the transaction manager
* is in the iplist array.
*
* It's not pretty, but it works.
*/
iplist[1] = (struct inode *) (size_t)iagno;
iplist[2] = (struct inode *) (size_t)extno;
rc = txCommit(tid, 1, &iplist[0], COMMIT_FORCE); // D233382
txEnd(tid);
/* unlock the AG inode map information */
AG_UNLOCK(imap, agno);
return (0);
error_out:
IREAD_UNLOCK(ipimap);
if (amp)
release_metapage(amp);
if (bmp)
release_metapage(bmp);
if (cmp)
release_metapage(cmp);
if (dmp)
release_metapage(dmp);
AG_UNLOCK(imap, agno);
release_metapage(mp);
return (rc);
}
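/*
 * Illustration only, not part of the original commit: diFree() keeps an
 * inode extent backed when it still holds allocated inodes or when the ag
 * is below its low water mark. The thresholds (96, 288, 25 percent) are
 * taken from the condition in diFree(); demo_keep_extent() itself is a
 * hypothetical helper written for this sketch.
 */

```c
#include <assert.h>
#include <stdint.h>

/*
 * bitmap:  working map of the extent after clearing the freed inode's bit
 * numfree: free inodes in the ag, numinos: backed inodes in the ag
 * Returns nonzero if the extent should stay allocated.
 */
int demo_keep_extent(uint32_t bitmap, int numfree, int numinos)
{
	/* same sanity rule diFree() enforces before this decision */
	assert(numfree >= 0 && numfree <= numinos);

	if (bitmap)		/* extent still has allocated inodes */
		return 1;
	if (numfree < 96)	/* too few free inodes in the ag */
		return 1;
	/* numinos >= numfree >= 96 here, so the division is safe */
	if (numfree < 288 && (numfree * 100) / numinos <= 25)
		return 1;
	return 0;		/* extent can be released */
}
```

/*
 * For example, an empty extent in an ag with 200 free inodes out of 1000
 * is kept (20 percent free), while 200 free out of 400 lets it go.
 */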
/*
* There are several places in the diAlloc* routines where we initialize
* the inode.
*/
static inline void
diInitInode(struct inode *ip, int iagno, int ino, int extno, iag_t * iagp)
{
struct jfs_sb_info *sbi = JFS_SBI(ip->i_sb);
struct jfs_inode_info *jfs_ip = JFS_IP(ip);
ip->i_ino = (iagno << L2INOSPERIAG) + ino;
DBG_DIALLOC(JFS_IP(jfs_ip->ipimap)->i_imap, ip->i_ino);
jfs_ip->ixpxd = iagp->inoext[extno];
jfs_ip->agno = BLKTOAG(le64_to_cpu(iagp->agstart), sbi);
}
/*
* NAME: diAlloc(pip,dir,ip)
*
* FUNCTION: allocate a disk inode from the inode working map
* for a fileset or aggregate.
*
* PARAMETERS:
* pip - pointer to incore inode for the parent inode.
* dir - TRUE if the new disk inode is for a directory.
* ip - pointer to a new inode
*
* RETURN VALUES:
* 0 - success.
* ENOSPC - insufficient disk resources.
* EIO - i/o error.
*/
int diAlloc(struct inode *pip, boolean_t dir, struct inode *ip)
{
int rc, ino, iagno, addext, extno, bitno, sword;
int nwords, rem, i, agno;
u32 mask, inosmap, extsmap;
struct inode *ipimap;
metapage_t *mp;
ino_t inum;
iag_t *iagp;
imap_t *imap;
/* get the pointers to the inode map inode and the
* corresponding imap control structure.
*/
ipimap = JFS_SBI(pip->i_sb)->ipimap;
imap = JFS_IP(ipimap)->i_imap;
JFS_IP(ip)->ipimap = ipimap;
JFS_IP(ip)->fileset = FILESYSTEM_I;
/* for a directory, the allocation policy is to start
* at the ag level using the preferred ag.
*/
if (dir == TRUE) {
agno = dbNextAG(JFS_SBI(pip->i_sb)->ipbmap);
AG_LOCK(imap, agno);
goto tryag;
}
/* for files, the policy starts off by trying to allocate from
* the same iag containing the parent disk inode:
* try to allocate the new disk inode close to the parent disk
* inode, using parent disk inode number + 1 as the allocation
* hint. (we use a left-to-right policy to attempt to avoid
* moving backward on the disk.) compute the hint within the
* file system and the iag.
*/
inum = pip->i_ino + 1;
ino = inum & (INOSPERIAG - 1);
/* back off the hint if it is outside of the iag */
if (ino == 0)
inum = pip->i_ino;
/* get the ag number of this iag */
agno = JFS_IP(pip)->agno;
/* lock the AG inode map information */
AG_LOCK(imap, agno);
/* Get read lock on imap inode */
IREAD_LOCK(ipimap);
/* get the iag number and read the iag */
iagno = INOTOIAG(inum);
if ((rc = diIAGRead(imap, iagno, &mp))) {
IREAD_UNLOCK(ipimap);
/* the AG lock taken above must also be dropped on this path */
AG_UNLOCK(imap, agno);
return (rc);
}
iagp = (iag_t *) mp->data;
/* determine if new inode extent is allowed to be added to the iag.
* new inode extent can be added to the iag if the ag
* has less than 32 free disk inodes and the iag has free extents.
*/
addext = (imap->im_agctl[agno].numfree < 32 && iagp->nfreeexts);
/*
* try to allocate from the IAG
*/
/* check if the inode may be allocated from the iag
* (i.e. the iag has free inodes or a new extent can be added).
*/
if (iagp->nfreeinos || addext) {
/* determine the extent number of the hint.
*/
extno = ino >> L2INOSPEREXT;
/* check if the extent containing the hint has backed
* inodes. if so, try to allocate within this extent.
*/
if (addressPXD(&iagp->inoext[extno])) {
bitno = ino & (INOSPEREXT - 1);
if ((bitno =
diFindFree(le32_to_cpu(iagp->wmap[extno]),
bitno))
< INOSPEREXT) {
ino = (extno << L2INOSPEREXT) + bitno;
/* a free inode (bit) was found within this
* extent, so allocate it.
*/
rc = diAllocBit(imap, iagp, ino);
IREAD_UNLOCK(ipimap);
if (rc) {
assert(rc == EIO);
} else {
/* set the results of the allocation
* and write the iag.
*/
diInitInode(ip, iagno, ino, extno,
iagp);
mark_metapage_dirty(mp);
}
release_metapage(mp);
/* free the AG lock and return.
*/
AG_UNLOCK(imap, agno);
return (rc);
}
if (!addext)
extno =
(extno ==
EXTSPERIAG - 1) ? 0 : extno + 1;
}
/*
* no free inodes within the extent containing the hint.
*
* try to allocate from the backed extents following the
* hint or, if appropriate (i.e. addext is true), allocate
* an extent of free inodes at or following the extent
* containing the hint.
*
* the free inode and free extent summary maps are used
* here, so determine the starting summary map position
* and the number of words we'll have to examine. again,
* the approach is to allocate following the hint, so we
* might have to initially ignore prior bits of the summary
* map that represent extents prior to the extent containing
* the hint and later revisit these bits.
*/
bitno = extno & (EXTSPERSUM - 1);
nwords = (bitno == 0) ? SMAPSZ : SMAPSZ + 1;
sword = extno >> L2EXTSPERSUM;
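/* mask any prior bits for the starting words of the
* summary map. shifting a 32-bit value by the full word
* width is undefined in C, so bitno == 0 (nothing to mask)
* is handled explicitly.
*/
mask = (bitno == 0) ? 0 : (ONES << (EXTSPERSUM - bitno));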
inosmap = le32_to_cpu(iagp->inosmap[sword]) | mask;
extsmap = le32_to_cpu(iagp->extsmap[sword]) | mask;
/* scan the free inode and free extent summary maps for
* free resources.
*/
for (i = 0; i < nwords; i++) {
/* check if this word of the free inode summary
* map describes an extent with free inodes.
*/
if (~inosmap) {
/* an extent with free inodes has been
* found. determine the extent number
* and the inode number within the extent.
*/
rem = diFindFree(inosmap, 0);
extno = (sword << L2EXTSPERSUM) + rem;
rem =
diFindFree(le32_to_cpu
(iagp->wmap[extno]), 0);
assert(rem < INOSPEREXT);
/* determine the inode number within the
* iag and allocate the inode from the
* map.
*/
ino = (extno << L2INOSPEREXT) + rem;
rc = diAllocBit(imap, iagp, ino);
IREAD_UNLOCK(ipimap);
if (rc) {
assert(rc == EIO);
} else {
/* set the results of the allocation
* and write the iag.
*/
diInitInode(ip, iagno, ino, extno,
iagp);
mark_metapage_dirty(mp);
}
release_metapage(mp);
/* free the AG lock and return.
*/
AG_UNLOCK(imap, agno);
return (rc);
}
/* check if we may allocate an extent of free
* inodes and whether this word of the free
* extents summary map describes a free extent.
*/
if (addext && ~extsmap) {
/* a free extent has been found. determine
* the extent number.
*/
rem = diFindFree(extsmap, 0);
extno = (sword << L2EXTSPERSUM) + rem;
/* allocate an extent of free inodes.
*/
if ((rc = diNewExt(imap, iagp, extno))) {
/* if there is no disk space for a
* new extent, try to allocate the
* disk inode from somewhere else.
*/
if (rc == ENOSPC)
break;
assert(rc == EIO);
} else {
/* set the results of the allocation
* and write the iag.
*/
diInitInode(ip, iagno,
extno << L2INOSPEREXT,
extno, iagp);
mark_metapage_dirty(mp);
}
release_metapage(mp);
/* free the imap inode & the AG lock & return.
*/
IREAD_UNLOCK(ipimap);
AG_UNLOCK(imap, agno);
return (rc);
}
/* move on to the next set of summary map words.
*/
sword = (sword == SMAPSZ - 1) ? 0 : sword + 1;
inosmap = le32_to_cpu(iagp->inosmap[sword]);
extsmap = le32_to_cpu(iagp->extsmap[sword]);
}
}
/* unlock imap inode */
IREAD_UNLOCK(ipimap);
/* nothing doing in this iag, so release it. */
release_metapage(mp);
tryag:
/*
* try to allocate anywhere within the same AG as the parent inode.
*/
rc = diAllocAG(imap, agno, dir, ip);
AG_UNLOCK(imap, agno);
if (rc != ENOSPC)
return (rc);
/*
* try to allocate in any AG.
*/
return (diAllocAny(imap, agno, dir, ip));
}
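/*
 * Illustrative sketch (not part of the JFS source): the summary-map
 * masking used in diAlloc() to skip extents that lie before the hint.
 * Bits are numbered from the high-order end, so OR-ing in a mask of
 * leading ones makes the earlier extents look "not free" for the first
 * pass over the starting word. The helper names and the constant
 * values are assumptions for this example. The sketch guards
 * bitno == 0 because a shift by the full word width is undefined in C.
 */

```c
#include <assert.h>
#include <stdint.h>

#define EXTSPERSUM 32		/* assumed: one summary bit per extent, 32 per word */
#define ONES       0xffffffffu

/* Mask covering the bitno high-order bits, i.e. extents before the hint. */
static uint32_t prior_mask(int bitno)
{
	assert(bitno >= 0 && bitno < EXTSPERSUM);
	return (bitno == 0) ? 0 : (ONES << (EXTSPERSUM - bitno));
}

/* Nonzero if the summary word still has a clear (free) bit at or after
 * position bitno once the earlier positions are masked out. This mirrors
 * the "if (~inosmap)" test in the scan loop.
 */
static int has_free_at_or_after(uint32_t summary, int bitno)
{
	return (uint32_t) ~(summary | prior_mask(bitno)) != 0;
}
```

/* A word whose only free bits are positions 0..3 (0x0fffffff) reports
 * nothing free when the hint is at position 4, which is why the loop
 * revisits the starting word at the end (nwords = SMAPSZ + 1).
 */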
/*
* NAME: diAllocAG(imap,agno,dir,ip)
*
* FUNCTION: allocate a disk inode from the allocation group.
*
* this routine first determines if a new extent of free
* inodes should be added for the allocation group, with
* the current request satisfied from this extent. if this
* is the case, an attempt will be made to do just that. if
* this attempt fails or it has been determined that a new
* extent should not be added, an attempt is made to satisfy
* the request by allocating an existing (backed) free inode
* from the allocation group.
*
* PRE CONDITION: Already have the AG lock for this AG.
*
* PARAMETERS:
* imap - pointer to inode map control structure.
* agno - allocation group to allocate from.
* dir - TRUE if the new disk inode is for a directory.
* ip - pointer to the new inode to be filled in on successful return
* with the disk inode number allocated, its extent address
* and the start of the ag.
*
* RETURN VALUES:
* 0 - success.
* ENOSPC - insufficient disk resources.
* EIO - i/o error.
*/
static int
diAllocAG(imap_t * imap, int agno, boolean_t dir, struct inode *ip)
{
int rc, addext, numfree, numinos;
/* get the number of free and the number of backed disk
* inodes currently within the ag.
*/
numfree = imap->im_agctl[agno].numfree;
numinos = imap->im_agctl[agno].numinos;
if (numfree > numinos) {
jERROR(1,("diAllocAG: numfree > numinos\n"));
updateSuper(ip->i_sb, FM_DIRTY);
return EIO;
}
/* determine if we should allocate a new extent of free inodes
* within the ag: for directory inodes, add a new extent
* if there are a small number of free inodes or number of free
* inodes is a small percentage of the number of backed inodes.
*/
if (dir == TRUE)
addext = (numfree < 64 ||
(numfree < 256
&& ((numfree * 100) / numinos) <= 20));
else
addext = (numfree == 0);
/*
* try to allocate a new extent of free inodes.
*/
if (addext) {
/* if free space is not available for this new extent, try
* below to allocate a free and existing (already backed)
* inode from the ag.
*/
if ((rc = diAllocExt(imap, agno, ip)) != ENOSPC)
return (rc);
}
/*
* try to allocate an existing free inode from the ag.
*/
return (diAllocIno(imap, agno, ip));
}
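/*
 * Illustrative sketch (not part of the JFS source): the "should a new
 * inode extent be added?" decision from diAllocAG(), pulled out into a
 * pure function so the thresholds can be checked in isolation. The
 * function name should_add_extent() is hypothetical; the thresholds
 * (64, 256, 20 percent) are the ones used above.
 */

```c
#include <assert.h>
#include <stdbool.h>

static bool should_add_extent(bool dir, int numfree, int numinos)
{
	/* diAllocAG() rejects numfree > numinos before reaching this point */
	assert(numfree >= 0 && numfree <= numinos);

	if (dir) {
		/* directories: grow early, while free inodes are scarce.
		 * the division is only reached when numfree >= 64, so
		 * numinos cannot be zero here.
		 */
		return numfree < 64 ||
		    (numfree < 256 && (numfree * 100) / numinos <= 20);
	}

	/* regular files: grow only when no free inode is left */
	return numfree == 0;
}
```

/* For example, a directory request with 100 free inodes out of 1000
 * backed inodes (10 percent) asks for a new extent, while 100 free out
 * of 200 backed (50 percent) does not.
 */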
/*
* NAME: diAllocAny(imap,agno,dir,ip)
*
* FUNCTION: allocate a disk inode from any other allocation group.
*
* this routine is called when an allocation attempt within
* the primary allocation group has failed. it attempts to
* allocate an inode from any allocation group other than the
* specified primary group.
*
* PARAMETERS:
* imap - pointer to inode map control structure.
* agno - primary allocation group (to avoid).
* dir - TRUE if the new disk inode is for a directory.
* ip - pointer to a new inode to be filled in on successful return
* with the disk inode number allocated, its extent address
* and the start of the ag.
*
* RETURN VALUES:
* 0 - success.
* ENOSPC - insufficient disk resources.
* EIO - i/o error.
*/
static int
diAllocAny(imap_t * imap, int agno, boolean_t dir, struct inode *ip)
{
int ag, rc;
int maxag = JFS_SBI(imap->im_ipimap->i_sb)->bmap->db_maxag;
/* try to allocate from the ags following agno up to
* the maximum ag number.
*/
for (ag = agno + 1; ag <= maxag; ag++) {
AG_LOCK(imap, ag);
rc = diAllocAG(imap, ag, dir, ip);
AG_UNLOCK(imap, ag);
if (rc != ENOSPC)
return (rc);
}
/* try to allocate from the ags in front of agno.
*/
for (ag = 0; ag < agno; ag++) {
AG_LOCK(imap, ag);
rc = diAllocAG(imap, ag, dir, ip);
AG_UNLOCK(imap, ag);
if (rc != ENOSPC)
return (rc);
}
/* no free disk inodes.
*/
return (ENOSPC);
}
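/*
 * Illustrative sketch (not part of the JFS source): the order in which
 * diAllocAny() visits allocation groups. It first walks the groups
 * after the primary one up to maxag, then wraps around to the groups in
 * front of it; the primary group itself is never retried. The helper
 * name ag_scan_order() and the caller-supplied output array are
 * assumptions for this example.
 */

```c
#include <assert.h>

/* Writes the visiting order into order[] (which must have room for
 * maxag entries) and returns the number of groups visited.
 */
static int ag_scan_order(int agno, int maxag, int *order)
{
	int ag, n = 0;

	assert(agno >= 0 && agno <= maxag);

	/* groups following agno, up to and including maxag */
	for (ag = agno + 1; ag <= maxag; ag++)
		order[n++] = ag;

	/* groups in front of agno */
	for (ag = 0; ag < agno; ag++)
		order[n++] = ag;

	return n;
}
```

/* With agno = 2 and maxag = 4 the groups are tried as 3, 4, 0, 1. */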
/*
* NAME: diAllocIno(imap,agno,ip)
*
* FUNCTION: allocate a disk inode from the allocation group's free
* inode list, returning an error if this free list is
* empty (i.e. no iags on the list).
*
* allocation occurs from the first iag on the list using
* the iag's free inode summary map to find the leftmost
* free inode in the iag.
*
* PRE CONDITION: Already have AG lock for this AG.
*
* PARAMETERS:
* imap - pointer to inode map control structure.
* agno - allocation group.
* ip - pointer to new inode to be filled in on successful return
* with the disk inode number allocated, its extent address
* and the start of the ag.
*
* RETURN VALUES:
* 0 - success.
* ENOSPC - insufficient disk resources.
* EIO - i/o error.
*/
static int diAllocIno(imap_t * imap, int agno, struct inode *ip)
{
int iagno, ino, rc, rem, extno, sword;
metapage_t *mp;
iag_t *iagp;
/* check if there are iags on the ag's free inode list.
*/
if ((iagno = imap->im_agctl[agno].inofree) < 0)
return (ENOSPC);
/* obtain read lock on imap inode */
IREAD_LOCK(imap->im_ipimap);
/* read the iag at the head of the list.
*/
if ((rc = diIAGRead(imap, iagno, &mp))) {
IREAD_UNLOCK(imap->im_ipimap);
return (rc);
}
iagp = (iag_t *) mp->data;
/* better be free inodes in this iag if it is on the
* list.
*/
//assert(iagp->nfreeinos);
if (!iagp->nfreeinos) {
jERROR(1,
("diAllocIno: nfreeinos = 0, but iag on freelist\n"));
jERROR(1, (" agno = %d, iagno = %d\n", agno, iagno));
dump_mem("iag", iagp, 64);
updateSuper(ip->i_sb, FM_DIRTY);
/* drop the imap read lock and the iag buffer before bailing out */
IREAD_UNLOCK(imap->im_ipimap);
release_metapage(mp);
return EIO;
}
/* scan the free inode summary map to find an extent
* with free inodes.
*/
for (sword = 0;; sword++) {
assert(sword < SMAPSZ);
if (~iagp->inosmap[sword])
break;
}
/* found an extent with free inodes. determine
* the extent number.
*/
rem = diFindFree(le32_to_cpu(iagp->inosmap[sword]), 0);
assert(rem < EXTSPERSUM);
extno = (sword << L2EXTSPERSUM) + rem;
/* find the first free inode in the extent.
*/
rem = diFindFree(le32_to_cpu(iagp->wmap[extno]), 0);
assert(rem < INOSPEREXT);
/* compute the inode number within the iag.
*/
ino = (extno << L2INOSPEREXT) + rem;
/* allocate the inode.
*/
rc = diAllocBit(imap, iagp, ino);
IREAD_UNLOCK(imap->im_ipimap);
if (rc) {
release_metapage(mp);
return (rc);
}
/* set the results of the allocation and write the iag.
*/
diInitInode(ip, iagno, ino, extno, iagp);
write_metapage(mp);
return (0);
}
/*
* NAME: diAllocExt(imap,agno,ip)
*
* FUNCTION: add a new extent of free inodes to an iag, allocating
* an inode from this extent to satisfy the current allocation
* request.
*
* this routine first tries to find an existing iag with free
* extents through the ag free extent list. if list is not
* empty, the head of the list will be selected as the home
* of the new extent of free inodes. otherwise (the list is
* empty), a new iag will be allocated for the ag to contain
* the extent.
*
* once an iag has been selected, the free extent summary map
* is used to locate a free extent within the iag and diNewExt()
* is called to initialize the extent, with initialization
* including the allocation of the first inode of the extent
* for the purpose of satisfying this request.
*
* PARAMETERS:
* imap - pointer to inode map control structure.
* agno - allocation group number.
* ip - pointer to new inode to be filled in on successful return
* with the disk inode number allocated, its extent address
* and the start of the ag.
*
* RETURN VALUES:
* 0 - success.
* ENOSPC - insufficient disk resources.
* EIO - i/o error.
*/
static int diAllocExt(imap_t * imap, int agno, struct inode *ip)
{
int rem, iagno, sword, extno, rc;
metapage_t *mp;
iag_t *iagp;
/* check if the ag has any iags with free extents. if not,
* allocate a new iag for the ag.
*/
if ((iagno = imap->im_agctl[agno].extfree) < 0) {
/* If successful, diNewIAG will obtain the read lock on the
* imap inode.
*/
if ((rc = diNewIAG(imap, &iagno, agno, &mp))) {
return (rc);
}
iagp = (iag_t *) mp->data;
/* set the ag number if this is a brand new iag
*/
iagp->agstart =
cpu_to_le64(AGTOBLK(agno, imap->im_ipimap));
} else {
/* read the iag.
*/
IREAD_LOCK(imap->im_ipimap);
if ((rc = diIAGRead(imap, iagno, &mp))) {
/* report the read failure to the caller instead of asserting */
IREAD_UNLOCK(imap->im_ipimap);
return (rc);
}
iagp = (iag_t *) mp->data;
}
/* using the free extent summary map, find a free extent.
*/
for (sword = 0;; sword++) {
assert(sword < SMAPSZ);
if (~iagp->extsmap[sword])
break;
}
/* determine the extent number of the free extent.
*/
rem = diFindFree(le32_to_cpu(iagp->extsmap[sword]), 0);
assert(rem < EXTSPERSUM);
extno = (sword << L2EXTSPERSUM) + rem;
/* initialize the new extent.
*/
rc = diNewExt(imap, iagp, extno);
IREAD_UNLOCK(imap->im_ipimap);
if (rc) {
/* something bad happened. if a new iag was allocated,
* place it back on the inode map's iag free list, and
* clear the ag number information.
*/
if (iagp->nfreeexts == cpu_to_le32(EXTSPERIAG)) {
IAGFREE_LOCK(imap);
iagp->iagfree = cpu_to_le32(imap->im_freeiag);
imap->im_freeiag = iagno;
IAGFREE_UNLOCK(imap);
}
write_metapage(mp);
return (rc);
}
/* set the results of the allocation and write the iag.
*/
diInitInode(ip, iagno, extno << L2INOSPEREXT, extno, iagp);
write_metapage(mp);
return (0);
}
/*
* NAME: diAllocBit(imap,iagp,ino)
*
* FUNCTION: allocate a backed inode from an iag.
*
* this routine performs the mechanics of allocating a
* specified inode from a backed extent.
*
* if the inode to be allocated represents the last free
* inode within the iag, the iag will be removed from the
* ag free inode list.
*
* a careful update approach is used to provide consistency
* in the face of updates to multiple buffers. under this
* approach, all required buffers are obtained before making
* any updates and are held until all updates are complete.
*
* PRE CONDITION: Already have buffer lock on iagp. Already have AG lock on
* this AG. Must have read lock on imap inode.
*
* PARAMETERS:
* imap - pointer to inode map control structure.
* iagp - pointer to iag.
* ino - inode number to be allocated within the iag.
*
* RETURN VALUES:
* 0 - success.
* ENOSPC - insufficient disk resources.
* EIO - i/o error.
*/
static int diAllocBit(imap_t * imap, iag_t * iagp, int ino)
{
int extno, bitno, agno, sword, rc;
metapage_t *amp, *bmp;
iag_t *aiagp = 0, *biagp = 0;
u32 mask;
/* check if this is the last free inode within the iag.
* if so, it will have to be removed from the ag free
* inode list, so get the iags preceding and following
* it on the list.
*/
if (iagp->nfreeinos == cpu_to_le32(1)) {
amp = bmp = NULL;
if ((int) le32_to_cpu(iagp->inofreefwd) >= 0) {
if ((rc =
diIAGRead(imap, le32_to_cpu(iagp->inofreefwd),
&amp)))
return (rc);
aiagp = (iag_t *) amp->data;
}
if ((int) le32_to_cpu(iagp->inofreeback) >= 0) {
if ((rc =
diIAGRead(imap,
le32_to_cpu(iagp->inofreeback),
&bmp))) {
if (amp)
release_metapage(amp);
return (rc);
}
biagp = (iag_t *) bmp->data;
}
}
/* get the ag number, extent number, inode number within
* the extent.
*/
agno = BLKTOAG(le64_to_cpu(iagp->agstart), JFS_SBI(imap->im_ipimap->i_sb));
extno = ino >> L2INOSPEREXT;
bitno = ino & (INOSPEREXT - 1);
/* compute the mask for setting the map.
*/
mask = HIGHORDER >> bitno;
/* the inode should be free and backed.
*/
assert((le32_to_cpu(iagp->pmap[extno]) & mask) == 0);
assert((le32_to_cpu(iagp->wmap[extno]) & mask) == 0);
assert(addressPXD(&iagp->inoext[extno]) != 0);
/* mark the inode as allocated in the working map.
*/
iagp->wmap[extno] |= cpu_to_le32(mask);
/* check if all inodes within the extent are now
* allocated. if so, update the free inode summary
* map to reflect this.
*/
if (iagp->wmap[extno] == ONES) {
sword = extno >> L2EXTSPERSUM;
bitno = extno & (EXTSPERSUM - 1);
iagp->inosmap[sword] |= cpu_to_le32(HIGHORDER >> bitno);
}
/* if this was the last free inode in the iag, remove the
* iag from the ag free inode list.
*/
if (iagp->nfreeinos == cpu_to_le32(1)) {
if (amp) {
aiagp->inofreeback = iagp->inofreeback;
write_metapage(amp);
}
if (bmp) {
biagp->inofreefwd = iagp->inofreefwd;
write_metapage(bmp);
} else {
imap->im_agctl[agno].inofree =
le32_to_cpu(iagp->inofreefwd);
}
iagp->inofreefwd = iagp->inofreeback = -1;
}
/* update the free inode count at the iag, ag, inode
* map levels.
*/
iagp->nfreeinos = cpu_to_le32(le32_to_cpu(iagp->nfreeinos) - 1);
imap->im_agctl[agno].numfree -= 1;
atomic_dec(&imap->im_numfree);
return (0);
}
/*
* NAME: diNewExt(imap,iagp,extno)
*
* FUNCTION: initialize a new extent of inodes for an iag, allocating
* the first inode of the extent for use for the current
* allocation request.
*
* disk resources are allocated for the new extent of inodes
* and the inodes themselves are initialized to reflect their
* existence within the extent (i.e. their inode numbers and
* inode extent addresses are set) and their initial state
* (mode and link count are set to zero).
*
* if the iag is new, it is not yet on an ag extent free list
* but will now be placed on this list.
*
* if the allocation of the new extent causes the iag to
* have no free extent, the iag will be removed from the
* ag extent free list.
*
* if the iag has no free backed inodes, it will be placed
* on the ag free inode list, since the addition of the new
* extent will now cause it to have free inodes.
*
* a careful update approach is used to provide consistency
* (i.e. list consistency) in the face of updates to multiple
* buffers. under this approach, all required buffers are
* obtained before making any updates and are held until all
* updates are complete.
*
* PRE CONDITION: Already have buffer lock on iagp. Already have AG lock on
* this AG. Must have read lock on imap inode.
*
* PARAMETERS:
* imap - pointer to inode map control structure.
* iagp - pointer to iag.
* extno - extent number.
*
* RETURN VALUES:
* 0 - success.
* ENOSPC - insufficient disk resources.
* EIO - i/o error.
*/
static int diNewExt(imap_t * imap, iag_t * iagp, int extno)
{
int agno, iagno, fwd, back, freei = 0, sword, rc;
iag_t *aiagp = 0, *biagp = 0, *ciagp = 0;
metapage_t *amp, *bmp, *cmp, *dmp;
struct inode *ipimap;
s64 blkno, hint;
int i, j;
u32 mask;
ino_t ino;
dinode_t *dp;
struct jfs_sb_info *sbi;
/* better have free extents.
*/
assert(iagp->nfreeexts);
/* get the inode map inode.
*/
ipimap = imap->im_ipimap;
sbi = JFS_SBI(ipimap->i_sb);
amp = bmp = cmp = NULL;
/* get the ag and iag numbers for this iag.
*/
agno = BLKTOAG(le64_to_cpu(iagp->agstart), sbi);
iagno = le32_to_cpu(iagp->iagnum);
/* check if this is the last free extent within the
* iag. if so, the iag must be removed from the ag
* free extent list, so get the iags preceding and
* following the iag on this list.
*/
if (iagp->nfreeexts == cpu_to_le32(1)) {
if ((fwd = le32_to_cpu(iagp->extfreefwd)) >= 0) {
if ((rc = diIAGRead(imap, fwd, &amp)))
return (rc);
aiagp = (iag_t *) amp->data;
}
if ((back = le32_to_cpu(iagp->extfreeback)) >= 0) {
if ((rc = diIAGRead(imap, back, &bmp)))
goto error_out;
biagp = (iag_t *) bmp->data;
}
} else {
/* the iag has free extents. if all extents are free
* (as is the case for a newly allocated iag), the iag
* must be added to the ag free extent list, so get
* the iag at the head of the list in preparation for
* adding this iag to this list.
*/
fwd = back = -1;
if (iagp->nfreeexts == cpu_to_le32(EXTSPERIAG)) {
if ((fwd = imap->im_agctl[agno].extfree) >= 0) {
if ((rc = diIAGRead(imap, fwd, &amp)))
goto error_out;
aiagp = (iag_t *) amp->data;
}
}
}
/* check if the iag has no free inodes. if so, the iag
* will have to be added to the ag free inode list, so get
* the iag at the head of the list in preparation for
* adding this iag to this list. in doing this, we must
* check if we already have the iag at the head of
* the list in hand.
*/
if (iagp->nfreeinos == 0) {
freei = imap->im_agctl[agno].inofree;
if (freei >= 0) {
if (freei == fwd) {
ciagp = aiagp;
} else if (freei == back) {
ciagp = biagp;
} else {
if ((rc = diIAGRead(imap, freei, &cmp)))
goto error_out;
ciagp = (iag_t *) cmp->data;
}
assert(ciagp != NULL);
}
}
/* allocate disk space for the inode extent.
*/
if ((extno == 0) || (addressPXD(&iagp->inoext[extno - 1]) == 0))
hint = ((s64) agno << sbi->bmap->db_agl2size) - 1;
else
hint = addressPXD(&iagp->inoext[extno - 1]) +
lengthPXD(&iagp->inoext[extno - 1]) - 1;
if ((rc = dbAlloc(ipimap, hint, (s64) imap->im_nbperiext, &blkno)))
goto error_out;
/* compute the inode number of the first inode within the
* extent.
*/
ino = (iagno << L2INOSPERIAG) + (extno << L2INOSPEREXT);
/* initialize the inodes within the newly allocated extent a
* page at a time.
*/
for (i = 0; i < imap->im_nbperiext; i += sbi->nbperpage) {
/* get a buffer for this page of disk inodes.
*/
dmp = get_metapage(ipimap, blkno + i, PSIZE, 1);
if (dmp == NULL) {
rc = EIO;
goto error_out;
}
dp = (dinode_t *) dmp->data;
/* initialize the inode number, mode, link count and
* inode extent address.
*/
for (j = 0; j < INOSPERPAGE; j++, dp++, ino++) {
dp->di_inostamp = cpu_to_le32(sbi->inostamp);
dp->di_number = cpu_to_le32(ino);
dp->di_fileset = cpu_to_le32(FILESYSTEM_I);
dp->di_mode = 0;
dp->di_nlink = 0;
PXDaddress(&(dp->di_ixpxd), blkno);
PXDlength(&(dp->di_ixpxd), imap->im_nbperiext);
}
write_metapage(dmp);
}
/* if this is the last free extent within the iag, remove the
* iag from the ag free extent list.
*/
if (iagp->nfreeexts == cpu_to_le32(1)) {
if (fwd >= 0)
aiagp->extfreeback = iagp->extfreeback;
if (back >= 0)
biagp->extfreefwd = iagp->extfreefwd;
else
imap->im_agctl[agno].extfree =
le32_to_cpu(iagp->extfreefwd);
iagp->extfreefwd = iagp->extfreeback = -1;
} else {
/* if the iag has all free extents (newly allocated iag),
* add the iag to the ag free extent list.
*/
if (iagp->nfreeexts == cpu_to_le32(EXTSPERIAG)) {
if (fwd >= 0)
aiagp->extfreeback = cpu_to_le32(iagno);
iagp->extfreefwd = cpu_to_le32(fwd);
iagp->extfreeback = -1;
imap->im_agctl[agno].extfree = iagno;
}
}
/* if the iag has no free inodes, add the iag to the
* ag free inode list.
*/
if (iagp->nfreeinos == 0) {
if (freei >= 0)
ciagp->inofreeback = cpu_to_le32(iagno);
iagp->inofreefwd =
cpu_to_le32(imap->im_agctl[agno].inofree);
iagp->inofreeback = -1;
imap->im_agctl[agno].inofree = iagno;
}
/* initialize the extent descriptor of the extent. */
PXDlength(&iagp->inoext[extno], imap->im_nbperiext);
PXDaddress(&iagp->inoext[extno], blkno);
/* initialize the working and persistent map of the extent.
* the working map will be initialized such that
* it indicates the first inode of the extent is allocated.
*/
iagp->wmap[extno] = cpu_to_le32(HIGHORDER);
iagp->pmap[extno] = 0;
/* update the free inode and free extent summary maps
* for the extent to indicate the extent has free inodes
* and no longer represents a free extent.
*/
sword = extno >> L2EXTSPERSUM;
mask = HIGHORDER >> (extno & (EXTSPERSUM - 1));
iagp->extsmap[sword] |= cpu_to_le32(mask);
iagp->inosmap[sword] &= cpu_to_le32(~mask);
/* update the free inode and free extent counts for the
* iag.
*/
iagp->nfreeinos = cpu_to_le32(le32_to_cpu(iagp->nfreeinos) +
(INOSPEREXT - 1));
iagp->nfreeexts = cpu_to_le32(le32_to_cpu(iagp->nfreeexts) - 1);
/* update the free and backed inode counts for the ag.
*/
imap->im_agctl[agno].numfree += (INOSPEREXT - 1);
imap->im_agctl[agno].numinos += INOSPEREXT;
/* update the free and backed inode counts for the inode map.
*/
atomic_add(INOSPEREXT - 1, &imap->im_numfree);
atomic_add(INOSPEREXT, &imap->im_numinos);
/* write the iags.
*/
if (amp)
write_metapage(amp);
if (bmp)
write_metapage(bmp);
if (cmp)
write_metapage(cmp);
return (0);
error_out:
/* release the iags.
*/
if (amp)
release_metapage(amp);
if (bmp)
release_metapage(bmp);
if (cmp)
release_metapage(cmp);
return (rc);
}
/*
* NAME: diNewIAG(imap,iagnop,agno,mpp)
*
* FUNCTION: allocate a new iag for an allocation group.
*
* first tries to allocate the iag from the inode map
* iagfree list:
* if the list has free iags, the head of the list is removed
* and returned to satisfy the request.
* if the inode map's iag free list is empty, the inode map
* is extended to hold a new iag. this new iag is initialized
* and returned to satisfy the request.
*
* PARAMETERS:
* imap - pointer to inode map control structure.
* iagnop - pointer to an iag number set with the number of the
* newly allocated iag upon successful return.
* agno - allocation group number.
* mpp - Buffer pointer to be filled in with new IAG's buffer
*
* RETURN VALUES:
* 0 - success.
* ENOSPC - insufficient disk resources.
* EIO - i/o error.
*
* serialization:
* AG lock held on entry/exit;
* write lock on the map is held inside;
* read lock on the map is held on successful completion;
*
* note: new iag transaction:
* . synchronously write iag;
* . write log of xtree and inode of imap;
* . commit;
* . synchronous write of xtree (right to left, bottom to top);
* . at start of logredo(): init in-memory imap with one additional iag page;
* . at end of logredo(): re-read imap inode to determine
* new imap size;
*/
static int
diNewIAG(imap_t * imap, int *iagnop, int agno, metapage_t ** mpp)
{
int rc;
int iagno, i, xlen;
struct inode *ipimap;
struct super_block *sb;
struct jfs_sb_info *sbi;
metapage_t *mp;
iag_t *iagp;
s64 xaddr = 0;
s64 blkno;
tid_t tid;
#ifdef _STILL_TO_PORT
xad_t xad;
#endif /* _STILL_TO_PORT */
struct inode *iplist[1];
/* pick up pointers to the inode map and mount inodes */
ipimap = imap->im_ipimap;
sb = ipimap->i_sb;
sbi = JFS_SBI(sb);
/* acquire the free iag lock */
IAGFREE_LOCK(imap);
/* if there are any iags on the inode map free iag list,
* allocate the iag from the head of the list.
*/
if (imap->im_freeiag >= 0) {
/* pick up the iag number at the head of the list */
iagno = imap->im_freeiag;
/* determine the logical block number of the iag */
blkno = IAGTOLBLK(iagno, sbi->l2nbperpage);
} else {
/* no free iags. the inode map will have to be extended
* to include a new iag.
*/
/* acquire inode map lock */
IWRITE_LOCK(ipimap);
assert(ipimap->i_size >> L2PSIZE == imap->im_nextiag + 1);
/* get the next available iag number */
iagno = imap->im_nextiag;
/* make sure that we have not exceeded the maximum inode
* number limit.
*/
if (iagno > (MAXIAGS - 1)) {
/* release the inode map lock */
IWRITE_UNLOCK(ipimap);
rc = ENOSPC;
goto out;
}
/*
* synchronously append new iag page.
*/
/* determine the logical address of iag page to append */
blkno = IAGTOLBLK(iagno, sbi->l2nbperpage);
/* Allocate extent for new iag page */
xlen = sbi->nbperpage;
if ((rc = dbAlloc(ipimap, 0, (s64) xlen, &xaddr))) {
/* release the inode map lock */
IWRITE_UNLOCK(ipimap);
goto out;
}
/* assign a buffer for the page */
mp = get_metapage(ipimap, xaddr, PSIZE, 1);
//bp = bmAssign(ipimap, blkno, xaddr, PSIZE, bmREAD_PAGE);
if (!mp) {
/* Free the blocks allocated for the iag since it was
* not successfully added to the inode map
*/
dbFree(ipimap, xaddr, (s64) xlen);
/* release the inode map lock */
IWRITE_UNLOCK(ipimap);
rc = EIO;
goto out;
}
iagp = (iag_t *) mp->data;
/* init the iag */
memset(iagp, 0, sizeof(iag_t));
iagp->iagnum = cpu_to_le32(iagno);
iagp->inofreefwd = iagp->inofreeback = -1;
iagp->extfreefwd = iagp->extfreeback = -1;
iagp->iagfree = -1;
iagp->nfreeinos = 0;
iagp->nfreeexts = cpu_to_le32(EXTSPERIAG);
/* initialize the free inode summary map (free extent
* summary map initialization handled by the memset above).
*/
for (i = 0; i < SMAPSZ; i++)
iagp->inosmap[i] = ONES;
flush_metapage(mp);
#ifdef _STILL_TO_PORT
/* synchronously write the iag page */
if (bmWrite(bp)) {
/* Free the blocks allocated for the iag since it was
* not successfully added to the inode map
*/
dbFree(ipimap, xaddr, (s64) xlen);
/* release the inode map lock */
IWRITE_UNLOCK(ipimap);
rc = EIO;
goto out;
}
/* Now the iag is on disk */
/*
* start transaction for the update of the inode map
* addressing structure pointing to the new iag page;
*/
#endif /* _STILL_TO_PORT */
tid = txBegin(sb, COMMIT_FORCE);
/* update the inode map addressing structure to point to it */
if ((rc =
xtInsert(tid, ipimap, 0, blkno, xlen, &xaddr, 0))) {
/* end the transaction started above so it is not leaked */
txEnd(tid);
/* Free the blocks allocated for the iag since it was
* not successfully added to the inode map
*/
dbFree(ipimap, xaddr, (s64) xlen);
/* release the inode map lock */
IWRITE_UNLOCK(ipimap);
goto out;
}
/* update the inode map's inode to reflect the extension */
ipimap->i_size += PSIZE;
ipimap->i_blocks += LBLK2PBLK(sb, xlen);
/*
* txCommit(COMMIT_FORCE) will synchronously write address
* index pages and inode after commit in careful update order
* of address index pages (right to left, bottom up);
*/
iplist[0] = ipimap;
rc = txCommit(tid, 1, &iplist[0], COMMIT_FORCE);
txEnd(tid);
duplicateIXtree(sb, blkno, xlen, &xaddr);
/* update the next available iag number */
imap->im_nextiag += 1;
/* Add the iag to the iag free list so we don't lose the iag
* if a failure happens now.
*/
imap->im_freeiag = iagno;
/* Until we have logredo working, we want the imap inode &
* control page to be up to date.
*/
diSync(ipimap);
/* release the inode map lock */
IWRITE_UNLOCK(ipimap);
}
/* obtain read lock on map */
IREAD_LOCK(ipimap);
/* read the iag */
if ((rc = diIAGRead(imap, iagno, &mp))) {
IREAD_UNLOCK(ipimap);
rc = EIO;
goto out;
}
iagp = (iag_t *) mp->data;
/* remove the iag from the iag free list */
imap->im_freeiag = le32_to_cpu(iagp->iagfree);
iagp->iagfree = -1;
/* set the return iag number and buffer pointer */
*iagnop = iagno;
*mpp = mp;
out:
/* release the iag free lock */
IAGFREE_UNLOCK(imap);
return (rc);
}
/*
* NAME: diIAGRead()
*
* FUNCTION: get the buffer for the specified iag within a fileset
* or aggregate inode map.
*
* PARAMETERS:
* imap - pointer to inode map control structure.
* iagno - iag number.
* mpp - pointer to the buffer pointer to be filled in on successful
* exit.
*
* SERIALIZATION:
* must have read lock on imap inode
* (When called by diExtendFS, the filesystem is quiesced, therefore
* the read lock is unnecessary.)
*
* RETURN VALUES:
* 0 - success.
* EIO - i/o error.
*/
static int diIAGRead(imap_t * imap, int iagno, metapage_t ** mpp)
{
struct inode *ipimap = imap->im_ipimap;
s64 blkno;
/* compute the logical block number of the iag. */
blkno = IAGTOLBLK(iagno, JFS_SBI(ipimap->i_sb)->l2nbperpage);
/* read the iag. */
*mpp = read_metapage(ipimap, blkno, PSIZE, 0);
if (*mpp == NULL) {
return (EIO);
}
return (0);
}
/*
* NAME: diFindFree()
*
* FUNCTION: find the first free bit in a word starting at
* the specified bit position.
*
* PARAMETERS:
* word - word to be examined.
* start - starting bit position.
*
* RETURN VALUES:
* bit position of first free bit in the word or 32 if
* no free bits were found.
*/
static int diFindFree(u32 word, int start)
{
int bitno;
assert(start < 32);
/* scan the word for the first free bit. */
for (word <<= start, bitno = start; bitno < 32;
bitno++, word <<= 1) {
if ((word & HIGHORDER) == 0)
break;
}
return (bitno);
}
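/*
 * Illustrative sketch (not part of the JFS source): a standalone copy
 * of the diFindFree() scan with a few worked inputs. Bit 0 is the
 * high-order bit of the word, a set bit means "allocated", and the
 * result is 32 when no clear bit exists at or after the start position.
 * The name find_free() and the value of HIGHORDER are assumptions for
 * this example.
 */

```c
#include <assert.h>
#include <stdint.h>

#define HIGHORDER 0x80000000u	/* assumed: high-order bit of a 32-bit word */

static int find_free(uint32_t word, int start)
{
	int bitno;

	assert(start >= 0 && start < 32);

	/* shift the start position up to the high-order bit, then test
	 * one bit per iteration until a clear bit is found.
	 */
	for (word <<= start, bitno = start; bitno < 32;
	     bitno++, word <<= 1) {
		if ((word & HIGHORDER) == 0)
			break;
	}
	return bitno;
}
```

/* Worked inputs: find_free(0x80000000, 0) skips the allocated bit 0 and
 * yields 1; find_free(0xf0000000, 2) yields 4; a full word yields 32,
 * which callers compare against INOSPEREXT or EXTSPERSUM.
 */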
/*
* NAME: diUpdatePMap()
*
* FUNCTION: Update the persistent map in an IAG for the allocation or
* freeing of the specified inode.
*
* PRE CONDITIONS: Working map has already been updated for allocate.
*
* PARAMETERS:
* ipimap - Incore inode map inode
* inum - Number of inode to mark in permanent map
* is_free - If TRUE indicates inode should be marked freed, otherwise
* indicates inode should be marked allocated.
* tblk - transaction block whose lsn is applied to the iag's metapage
*
* RETURNS: 0 for success
*/
int
diUpdatePMap(struct inode *ipimap,
unsigned long inum, boolean_t is_free, tblock_t * tblk)
{
int rc;
iag_t *iagp;
metapage_t *mp;
int iagno, ino, extno, bitno;
imap_t *imap;
u32 mask;
log_t *log;
int lsn, difft, diffp;
imap = JFS_IP(ipimap)->i_imap;
/* get the iag number containing the inode */
iagno = INOTOIAG(inum);
/* make sure that the iag is contained within the map */
assert(iagno < imap->im_nextiag);
/* read the iag */
IREAD_LOCK(ipimap);
rc = diIAGRead(imap, iagno, &mp);
IREAD_UNLOCK(ipimap);
if (rc)
return (rc);
iagp = (iag_t *) mp->data;
/* get the inode's index within the iag, the number of the extent
* that holds it, and its bit position within that extent.
*/
ino = inum & (INOSPERIAG - 1);
extno = ino >> L2INOSPEREXT;
bitno = ino & (INOSPEREXT - 1);
mask = HIGHORDER >> bitno;
/*
* mark the inode free in persistent map:
*/
if (is_free == TRUE) {
/* The inode should have been allocated both in working
* map and in persistent map;
* the inode will be freed from working map at the release
* of the last reference;
*/
// assert(le32_to_cpu(iagp->wmap[extno]) & mask);
if (!(le32_to_cpu(iagp->wmap[extno]) & mask)) {
jERROR(1,
("diUpdatePMap: inode %ld not marked as allocated in wmap!\n",
inum));
updateSuper(ipimap->i_sb, FM_DIRTY);
}
// assert(le32_to_cpu(iagp->pmap[extno]) & mask);
if (!(le32_to_cpu(iagp->pmap[extno]) & mask)) {
jERROR(1,
("diUpdatePMap: inode %ld not marked as allocated in pmap!\n",
inum));
updateSuper(ipimap->i_sb, FM_DIRTY);
}
/* update the bitmap for the extent of the freed inode */
iagp->pmap[extno] &= cpu_to_le32(~mask);
}
/*
* mark the inode allocated in persistent map:
*/
else {
/* The inode should be already allocated in the working map
* and should be free in persistent map;
*/
assert(le32_to_cpu(iagp->wmap[extno]) & mask);
assert((le32_to_cpu(iagp->pmap[extno]) & mask) == 0);
/* update the bitmap for the extent of the allocated inode */
iagp->pmap[extno] |= cpu_to_le32(mask);
}
/*
* update iag lsn
*/
lsn = tblk->lsn;
log = JFS_SBI(tblk->sb)->log;
if (mp->lsn != 0) {
/* inherit older/smaller lsn */
logdiff(difft, lsn, log);
logdiff(diffp, mp->lsn, log);
if (difft < diffp) {
mp->lsn = lsn;
/* move mp after tblock in logsync list */
LOGSYNC_LOCK(log);
list_del(&mp->synclist);
list_add(&mp->synclist, &tblk->synclist);
LOGSYNC_UNLOCK(log);
}
/* inherit younger/larger clsn */
LOGSYNC_LOCK(log);
assert(mp->clsn);
logdiff(difft, tblk->clsn, log);
logdiff(diffp, mp->clsn, log);
if (difft > diffp)
mp->clsn = tblk->clsn;
LOGSYNC_UNLOCK(log);
} else {
mp->log = log;
mp->lsn = lsn;
/* insert mp after tblock in logsync list */
LOGSYNC_LOCK(log);
log->count++;
list_add(&mp->synclist, &tblk->synclist);
mp->clsn = tblk->clsn;
LOGSYNC_UNLOCK(log);
}
// bmLazyWrite(mp, log->flag & JFS_COMMIT);
write_metapage(mp);
return (0);
}
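/*
 * Illustrative sketch, not part of the JFS sources: how diUpdatePMap()
 * splits an inode number into an iag number, an extent number and a bit
 * mask. The SK_* constants are assumptions inferred from EXTSPERIAG (128)
 * and the "per 4096 inodes" note in jfs_imap.h; sk_locate() and
 * struct sk_inopos are hypothetical names.
 */

```c
#include <assert.h>
#include <stdint.h>

#define SK_L2INOSPEREXT 5                        /* assumed: 32 inodes per extent */
#define SK_INOSPEREXT   (1 << SK_L2INOSPEREXT)
#define SK_L2INOSPERIAG 12                       /* assumed: 4096 inodes per iag */
#define SK_INOSPERIAG   (1 << SK_L2INOSPERIAG)
#define SK_HIGHORDER    0x80000000u              /* assumed high-order bit mask */

struct sk_inopos {
	int iagno;      /* which iag page holds the inode */
	int extno;      /* which extent (map word) within the iag */
	int bitno;      /* which bit within the map word, 0 = MSB */
	uint32_t mask;  /* single-bit mask for wmap[extno] / pmap[extno] */
};

static struct sk_inopos sk_locate(unsigned long inum)
{
	struct sk_inopos p;
	int ino = (int)(inum & (SK_INOSPERIAG - 1));

	p.iagno = (int)(inum >> SK_L2INOSPERIAG);
	p.extno = ino >> SK_L2INOSPEREXT;
	p.bitno = ino & (SK_INOSPEREXT - 1);
	p.mask = SK_HIGHORDER >> p.bitno;
	assert(p.extno < SK_INOSPERIAG / SK_INOSPEREXT);
	return p;
}
```

/* Under these assumptions inode 4133 (4096 + 37) lands in iag 1, extent 1, bit 5. */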
/*
* diExtendFS()
*
* function: update imap for extendfs();
*
* note: AG size has been increased s.t. each k old contiguous AGs are
* coalesced into a new AG;
*/
int diExtendFS(struct inode *ipimap, struct inode *ipbmap)
{
int rc, rcx = 0;
imap_t *imap = JFS_IP(ipimap)->i_imap;
iag_t *iagp = 0, *hiagp = 0;
bmap_t *mp = JFS_SBI(ipbmap->i_sb)->bmap;
metapage_t *bp, *hbp;
int i, n, head;
int numinos, xnuminos = 0, xnumfree = 0;
s64 agstart;
jEVENT(0, ("diExtendFS: nextiag:%d numinos:%d numfree:%d\n",
imap->im_nextiag, atomic_read(&imap->im_numinos),
atomic_read(&imap->im_numfree)));
/*
* reconstruct imap
*
* coalesce contiguous k (newAGSize/oldAGSize) AGs;
* i.e., (AGi, ..., AGj) where i = k*n and j = k*(n+1) - 1 to AGn;
* note: new AG size = old AG size * (2**x).
*/
/* init per AG control information im_agctl[] */
for (i = 0; i < MAXAG; i++) {
imap->im_agctl[i].inofree = -1; /* free inode list */
imap->im_agctl[i].extfree = -1; /* free extent list */
imap->im_agctl[i].numinos = 0; /* number of backed inodes */
imap->im_agctl[i].numfree = 0; /* number of free backed inodes */
}
/*
* process each iag_t page of the map.
*
* rebuild AG Free Inode List, AG Free Inode Extent List;
*/
for (i = 0; i < imap->im_nextiag; i++) {
if ((rc = diIAGRead(imap, i, &bp))) {
rcx = rc;
continue;
}
iagp = (iag_t *) bp->data;
assert(le32_to_cpu(iagp->iagnum) == i);
/* leave free iag in the free iag list */
if (iagp->nfreeexts == cpu_to_le32(EXTSPERIAG)) {
release_metapage(bp);
continue;
}
/* agstart that computes to the same ag is treated as same; */
agstart = le64_to_cpu(iagp->agstart);
/* iagp->agstart = agstart & ~(mp->db_agsize - 1); */
n = agstart >> mp->db_agl2size;
/*
printf("diExtendFS: iag:%d agstart:%Ld agno:%d\n", i, agstart, n);
*/
/* compute backed inodes */
numinos = (EXTSPERIAG - le32_to_cpu(iagp->nfreeexts))
<< L2INOSPEREXT;
if (numinos > 0) {
/* merge AG backed inodes */
imap->im_agctl[n].numinos += numinos;
xnuminos += numinos;
}
/* if any backed free inodes, insert at AG free inode list */
if ((int) le32_to_cpu(iagp->nfreeinos) > 0) {
if ((head = imap->im_agctl[n].inofree) == -1)
iagp->inofreefwd = iagp->inofreeback = -1;
else {
if ((rc = diIAGRead(imap, head, &hbp))) {
rcx = rc;
goto nextiag;
}
hiagp = (iag_t *) hbp->data;
/* iagnum is already in on-disk (little-endian) form */
hiagp->inofreeback = iagp->iagnum;
iagp->inofreefwd = cpu_to_le32(head);
iagp->inofreeback = -1;
write_metapage(hbp);
}
imap->im_agctl[n].inofree =
le32_to_cpu(iagp->iagnum);
/* merge AG backed free inodes */
imap->im_agctl[n].numfree +=
le32_to_cpu(iagp->nfreeinos);
xnumfree += le32_to_cpu(iagp->nfreeinos);
}
/* if any free extents, insert at AG free extent list */
if (le32_to_cpu(iagp->nfreeexts) > 0) {
if ((head = imap->im_agctl[n].extfree) == -1)
iagp->extfreefwd = iagp->extfreeback = -1;
else {
if ((rc = diIAGRead(imap, head, &hbp))) {
rcx = rc;
goto nextiag;
}
hiagp = (iag_t *) hbp->data;
hiagp->extfreeback = iagp->iagnum;
iagp->extfreefwd = cpu_to_le32(head);
iagp->extfreeback = -1;
write_metapage(hbp);
}
imap->im_agctl[n].extfree =
le32_to_cpu(iagp->iagnum);
}
nextiag:
write_metapage(bp);
}
ASSERT(xnuminos == atomic_read(&imap->im_numinos) &&
xnumfree == atomic_read(&imap->im_numfree));
return rcx;
}
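/*
 * Illustrative sketch, not part of the JFS sources: the two calculations
 * diExtendFS() performs per iag. The AG number is agstart shifted by the
 * log2 AG size, so a larger AG size maps several old AGs onto one new AG;
 * the backed inode count comes from the extents that are not free.
 * The SK_* values and function names are hypothetical; 32 inodes per
 * extent is an assumption.
 */

```c
#include <assert.h>
#include <stdint.h>

#define SK_EXTSPERIAG   128  /* matches EXTSPERIAG in jfs_imap.h */
#define SK_L2INOSPEREXT 5    /* assumed: 32 inodes per extent */

/* AG index of an iag whose AG starts at block 'agstart'. */
static int sk_ag_of(int64_t agstart, int agl2size)
{
	assert(agstart >= 0 && agl2size >= 0 && agl2size < 63);
	return (int)(agstart >> agl2size);
}

/* Number of backed inodes in an iag with 'nfreeexts' free extents. */
static int sk_backed_inodes(int nfreeexts)
{
	assert(nfreeexts >= 0 && nfreeexts <= SK_EXTSPERIAG);
	return (SK_EXTSPERIAG - nfreeexts) << SK_L2INOSPEREXT;
}
```

/* Doubling the AG size (agl2size 13 -> 14) folds old AGs 2 and 3 into new AG 1. */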
/*
* duplicateIXtree()
*
* serialization: IWRITE_LOCK held on entry/exit
*
* note: shadow page with regular inode (rel.2);
*/
static void
duplicateIXtree(struct super_block *sb, s64 blkno, int xlen, s64 * xaddr)
{
int rc;
tid_t tid;
struct inode *ip;
metapage_t *mpsuper;
struct jfs_superblock *j_sb;
/* if AIT2 ipmap2 is bad, do not try to update it */
if (JFS_SBI(sb)->mntflag & JFS_BAD_SAIT) /* s_flag */
return;
ip = diReadSpecial(sb, FILESYSTEM_I + INOSPEREXT);
if (ip == 0) {
JFS_SBI(sb)->mntflag |= JFS_BAD_SAIT;
if ((rc = readSuper(sb, &mpsuper)))
return;
j_sb = (struct jfs_superblock *) (mpsuper->data);
j_sb->s_flag |= JFS_BAD_SAIT;
write_metapage(mpsuper);
return;
}
/* start transaction */
tid = txBegin(sb, COMMIT_FORCE);
/* update the inode map addressing structure to point to it */
if ((rc = xtInsert(tid, ip, 0, blkno, xlen, xaddr, 0))) {
JFS_SBI(sb)->mntflag |= JFS_BAD_SAIT;
txAbort(tid, 1);
goto cleanup;
}
/* update the inode map's inode to reflect the extension */
ip->i_size += PSIZE;
ip->i_blocks += LBLK2PBLK(sb, xlen);
rc = txCommit(tid, 1, &ip, COMMIT_FORCE);
cleanup:
txEnd(tid);
diFreeSpecial(ip);
}
/*
* NAME: copy_from_dinode()
*
* FUNCTION: Copies inode info from disk inode to in-memory inode
*
* RETURN VALUES:
* 0 - always (the copy itself cannot fail)
*/
static int copy_from_dinode(dinode_t * dip, struct inode *ip)
{
struct jfs_inode_info *jfs_ip = JFS_IP(ip);
jfs_ip->fileset = le32_to_cpu(dip->di_fileset);
jfs_ip->mode2 = le32_to_cpu(dip->di_mode);
ip->i_mode = le32_to_cpu(dip->di_mode) & 0xffff;
ip->i_nlink = le32_to_cpu(dip->di_nlink);
ip->i_uid = le32_to_cpu(dip->di_uid);
ip->i_gid = le32_to_cpu(dip->di_gid);
ip->i_size = le64_to_cpu(dip->di_size);
ip->i_atime = le32_to_cpu(dip->di_atime.tv_sec);
ip->i_mtime = le32_to_cpu(dip->di_mtime.tv_sec);
ip->i_ctime = le32_to_cpu(dip->di_ctime.tv_sec);
ip->i_blksize = ip->i_sb->s_blocksize;
ip->i_blocks = LBLK2PBLK(ip->i_sb, le64_to_cpu(dip->di_nblocks));
ip->i_version = ++event;
ip->i_generation = le32_to_cpu(dip->di_gen);
jfs_ip->ixpxd = dip->di_ixpxd; /* in-memory pxd's are little-endian */
jfs_ip->acl = dip->di_acl; /* as are dxd's */
jfs_ip->ea = dip->di_ea;
jfs_ip->next_index = le32_to_cpu(dip->di_next_index);
jfs_ip->otime = le32_to_cpu(dip->di_otime.tv_sec);
jfs_ip->acltype = le32_to_cpu(dip->di_acltype);
/*
* We may only need to do this for "special" inodes (dmap, imap)
*/
if (S_ISCHR(ip->i_mode) || S_ISBLK(ip->i_mode))
ip->i_rdev = to_kdev_t(le32_to_cpu(dip->di_rdev));
else if (S_ISDIR(ip->i_mode)) {
/* 384 = 96-byte directory index table + 288-byte dtree root */
memcpy(&jfs_ip->i_dirtable, &dip->di_dirtable, 384);
} else if (!S_ISFIFO(ip->i_mode)) {
/* 288 = size of the xtree root */
memcpy(&jfs_ip->i_xtroot, &dip->di_xtroot, 288);
}
/* Zero the in-memory-only stuff */
jfs_ip->cflag = 0;
jfs_ip->btindex = 0;
jfs_ip->btorder = 0;
jfs_ip->bxflag = 0;
jfs_ip->blid = 0;
jfs_ip->atlhead = 0;
jfs_ip->atltail = 0;
jfs_ip->xtlid = 0;
return (0);
}
/*
* NAME: copy_to_dinode()
*
* FUNCTION: Copies inode info from in-memory inode to disk inode
*/
static void copy_to_dinode(dinode_t * dip, struct inode *ip)
{
struct jfs_inode_info *jfs_ip = JFS_IP(ip);
dip->di_fileset = cpu_to_le32(jfs_ip->fileset);
dip->di_inostamp = cpu_to_le32(JFS_SBI(ip->i_sb)->inostamp);
dip->di_number = cpu_to_le32(ip->i_ino);
dip->di_gen = cpu_to_le32(ip->i_generation);
dip->di_size = cpu_to_le64(ip->i_size);
dip->di_nblocks = cpu_to_le64(PBLK2LBLK(ip->i_sb, ip->i_blocks));
dip->di_nlink = cpu_to_le32(ip->i_nlink);
dip->di_uid = cpu_to_le32(ip->i_uid);
dip->di_gid = cpu_to_le32(ip->i_gid);
/*
* mode2 is only needed for storing the higher order bits.
* Trust i_mode for the lower order ones
*/
dip->di_mode = cpu_to_le32((jfs_ip->mode2 & 0xffff0000) | ip->i_mode);
dip->di_atime.tv_sec = cpu_to_le32(ip->i_atime);
dip->di_atime.tv_nsec = 0;
dip->di_ctime.tv_sec = cpu_to_le32(ip->i_ctime);
dip->di_ctime.tv_nsec = 0;
dip->di_mtime.tv_sec = cpu_to_le32(ip->i_mtime);
dip->di_mtime.tv_nsec = 0;
dip->di_ixpxd = jfs_ip->ixpxd; /* in-memory pxd's are little-endian */
dip->di_acl = jfs_ip->acl; /* as are dxd's */
dip->di_ea = jfs_ip->ea;
dip->di_next_index = cpu_to_le32(jfs_ip->next_index);
dip->di_otime.tv_sec = cpu_to_le32(jfs_ip->otime);
dip->di_otime.tv_nsec = 0;
dip->di_acltype = cpu_to_le32(jfs_ip->acltype);
if (S_ISCHR(ip->i_mode) || S_ISBLK(ip->i_mode))
dip->di_rdev = cpu_to_le32(kdev_t_to_nr(ip->i_rdev));
}
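/*
 * Illustrative sketch, not part of the JFS sources: how the 32-bit
 * on-disk mode is split and rejoined by copy_from_dinode() and
 * copy_to_dinode(). The low 16 bits are the VFS i_mode; the high 16 bits
 * carry JFS-specific flags kept in mode2. Function names are hypothetical
 * and byte-order conversion is left out.
 */

```c
#include <assert.h>
#include <stdint.h>

/* Low 16 bits of the disk mode, as handed to the VFS. */
static uint16_t sk_vfs_mode(uint32_t di_mode)
{
	return (uint16_t)(di_mode & 0xffff);
}

/* Rebuild the disk mode: JFS flags from mode2, permission bits from i_mode. */
static uint32_t sk_disk_mode(uint32_t mode2, uint16_t i_mode)
{
	return (mode2 & 0xffff0000u) | i_mode;
}
```

/* A stale low half in mode2 is ignored; only i_mode supplies those bits. */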
#ifdef _JFS_DEBUG_IMAP
/*
* DBGdiInit()
*/
static void *DBGdiInit(imap_t * imap)
{
u32 *dimap;
int size;
size = 64 * 1024;
if ((dimap = (u32 *) xmalloc(size, L2PSIZE, kernel_heap)) == NULL)
assert(0);
bzero((void *) dimap, size);
imap->im_DBGdimap = dimap;
return dimap;
}
/*
* DBGdiAlloc()
*/
static void DBGdiAlloc(imap_t * imap, ino_t ino)
{
u32 *dimap = imap->im_DBGdimap;
int w, b;
u32 m;
w = ino >> 5;
b = ino & 31;
m = 0x80000000 >> b;
assert(w < 64 * 256);
if (dimap[w] & m) {
printk("DEBUG diAlloc: duplicate alloc ino:0x%x\n", ino);
}
dimap[w] |= m;
}
/*
* DBGdiFree()
*/
static void DBGdiFree(imap_t * imap, ino_t ino)
{
u32 *dimap = imap->im_DBGdimap;
int w, b;
u32 m;
w = ino >> 5;
b = ino & 31;
m = 0x80000000 >> b;
assert(w < 64 * 256);
if ((dimap[w] & m) == 0) {
printk("DEBUG diFree: duplicate free ino:0x%x\n", ino);
}
dimap[w] &= ~m;
}
static void dump_cp(imap_t * ipimap, char *function, int line)
{
printk("\n* ********* *\nControl Page %s %d\n", function, line);
printk("FreeIAG %d\tNextIAG %d\n", ipimap->im_freeiag,
ipimap->im_nextiag);
printk("NumInos %d\tNumFree %d\n",
atomic_read(&ipimap->im_numinos),
atomic_read(&ipimap->im_numfree));
printk("AG InoFree %d\tAG ExtFree %d\n",
ipimap->im_agctl[0].inofree, ipimap->im_agctl[0].extfree);
printk("AG NumInos %d\tAG NumFree %d\n",
ipimap->im_agctl[0].numinos, ipimap->im_agctl[0].numfree);
}
static void dump_iag(iag_t * iag, char *function, int line)
{
printk("\n* ********* *\nIAG %s %d\n", function, line);
printk("IagNum %d\tIAG Free %d\n", le32_to_cpu(iag->iagnum),
le32_to_cpu(iag->iagfree));
printk("InoFreeFwd %d\tInoFreeBack %d\n",
le32_to_cpu(iag->inofreefwd),
le32_to_cpu(iag->inofreeback));
printk("ExtFreeFwd %d\tExtFreeBack %d\n",
le32_to_cpu(iag->extfreefwd),
le32_to_cpu(iag->extfreeback));
printk("NFreeInos %d\tNFreeExts %d\n", le32_to_cpu(iag->nfreeinos),
le32_to_cpu(iag->nfreeexts));
}
#endif /* _JFS_DEBUG_IMAP */
/*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/
#ifndef _H_JFS_IMAP
#define _H_JFS_IMAP
#include "jfs_txnmgr.h"
/*
* jfs_imap.h: disk inode manager
*/
#define EXTSPERIAG 128 /* number of disk inode extents per iag */
#define IMAPBLKNO 0 /* lblkno of dinomap within inode map */
#define SMAPSZ 4 /* number of words per summary map */
#define EXTSPERSUM 32 /* number of extents per summary map entry */
#define L2EXTSPERSUM 5 /* l2 number of extents per summary map */
#define PGSPERIEXT 4 /* number of 4K pages per dinode extent */
#define MAXIAGS ((1<<20)-1) /* maximum number of iags */
#define MAXAG 128 /* maximum number of allocation groups */
#define AMAPSIZE 512 /* bytes in the IAG allocation maps */
#define SMAPSIZE 16 /* bytes in the IAG summary maps */
/* convert inode number to iag number */
#define INOTOIAG(ino) ((ino) >> L2INOSPERIAG)
/* convert iag number to logical block number of the iag page */
#define IAGTOLBLK(iagno,l2nbperpg) (((iagno) + 1) << (l2nbperpg))
/* get the starting block number of the 4K page of an inode extent
* that contains ino.
*/
#define INOPBLK(pxd,ino,l2nbperpg) (addressPXD((pxd)) + \
((((ino) & (INOSPEREXT-1)) >> L2INOSPERPAGE) << (l2nbperpg)))
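/*
 * Illustrative sketch, not part of the JFS sources: the arithmetic behind
 * IAGTOLBLK. Page 0 of the inode map is the control page (IMAPBLKNO), so
 * iag number n lives in page n + 1; shifting by l2nbperpg converts pages
 * to logical blocks. sk_iag_to_lblk() is a hypothetical name.
 */

```c
#include <assert.h>
#include <stdint.h>

/* Logical block number of the page that holds iag 'iagno'. */
static int64_t sk_iag_to_lblk(int iagno, int l2nbperpage)
{
	assert(iagno >= 0 && l2nbperpage >= 0 && l2nbperpage < 32);
	return ((int64_t)iagno + 1) << l2nbperpage;
}
```

/* With one block per page iag 0 is at block 1; with four blocks per page it is at block 4. */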
/*
* inode allocation map:
*
* inode allocation map consists of
* . the inode map control page and
* . inode allocation group pages (per 4096 inodes)
* which are addressed by standard JFS xtree.
*/
/*
* inode allocation group page (per 4096 inodes of an AG)
*/
typedef struct {
s64 agstart; /* 8: starting block of ag */
s32 iagnum; /* 4: inode allocation group number */
s32 inofreefwd; /* 4: ag inode free list forward */
s32 inofreeback; /* 4: ag inode free list back */
s32 extfreefwd; /* 4: ag inode extent free list forward */
s32 extfreeback; /* 4: ag inode extent free list back */
s32 iagfree; /* 4: iag free list */
/* summary map: 1 bit per inode extent */
s32 inosmap[SMAPSZ]; /* 16: sum map of mapwords w/ free inodes;
* note: this indicates free and backed
* inodes, if the extent is not backed the
* value will be 1. if the extent is
* backed but all inodes are being used the
* value will be 1. if the extent is
* backed but at least one of the inodes is
* free the value will be 0.
*/
s32 extsmap[SMAPSZ]; /* 16: sum map of mapwords w/ free extents */
s32 nfreeinos; /* 4: number of free inodes */
s32 nfreeexts; /* 4: number of free extents */
/* (72) */
u8 pad[1976]; /* 1976: pad to 2048 bytes */
/* allocation bit map: 1 bit per inode (0 - free, 1 - allocated) */
u32 wmap[EXTSPERIAG]; /* 512: working allocation map */
u32 pmap[EXTSPERIAG]; /* 512: persistent allocation map */
pxd_t inoext[EXTSPERIAG]; /* 1024: inode extent addresses */
} iag_t; /* (4096) */
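/*
 * Illustrative sketch, not part of the JFS sources: a user-space mirror
 * of iag_t used to check the byte counts given in the comments above
 * (72-byte header, padding to 2048, 4096 bytes in total). sk_pxd_t is a
 * stand-in that only assumes pxd_t occupies 8 bytes; all sk_* names are
 * hypothetical.
 */

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_EXTSPERIAG 128
#define SK_SMAPSZ     4

typedef struct {
	uint32_t w[2];  /* assumed: an extent descriptor is 8 bytes */
} sk_pxd_t;

typedef struct {
	int64_t agstart;
	int32_t iagnum;
	int32_t inofreefwd;
	int32_t inofreeback;
	int32_t extfreefwd;
	int32_t extfreeback;
	int32_t iagfree;
	int32_t inosmap[SK_SMAPSZ];
	int32_t extsmap[SK_SMAPSZ];
	int32_t nfreeinos;
	int32_t nfreeexts;
	uint8_t pad[1976];
	uint32_t wmap[SK_EXTSPERIAG];
	uint32_t pmap[SK_EXTSPERIAG];
	sk_pxd_t inoext[SK_EXTSPERIAG];
} sk_iag_t;

static size_t sk_iag_size(void)        { return sizeof(sk_iag_t); }
static size_t sk_iag_pad_offset(void)  { return offsetof(sk_iag_t, pad); }
static size_t sk_iag_wmap_offset(void) { return offsetof(sk_iag_t, wmap); }
```

/* The maps start at the 2048-byte midpoint and the whole page is 4096 bytes. */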
/*
* per AG control information (in inode map control page)
*/
typedef struct {
s32 inofree; /* 4: free inode list anchor */
s32 extfree; /* 4: free extent list anchor */
s32 numinos; /* 4: number of backed inodes */
s32 numfree; /* 4: number of free inodes */
} iagctl_t; /* (16) */
/*
* per fileset/aggregate inode map control page
*/
typedef struct {
s32 in_freeiag; /* 4: free iag list anchor */
s32 in_nextiag; /* 4: next free iag number */
s32 in_numinos; /* 4: num of backed inodes */
s32 in_numfree; /* 4: num of free backed inodes */
s32 in_nbperiext; /* 4: num of blocks per inode extent */
s32 in_l2nbperiext; /* 4: l2 of in_nbperiext */
s32 in_diskblock; /* 4: for standalone test driver */
s32 in_maxag; /* 4: for standalone test driver */
u8 pad[2016]; /* 2016: pad to 2048 */
iagctl_t in_agctl[MAXAG]; /* 2048: AG control information */
} dinomap_t; /* (4096) */
/*
* In-core inode map control page
*/
typedef struct inomap {
dinomap_t im_imap; /* 4096: inode allocation control */
struct inode *im_ipimap; /* 4: ptr to inode for imap */
struct semaphore im_freelock; /* 4: iag free list lock */
struct semaphore im_aglock[MAXAG]; /* 512: per AG locks */
u32 *im_DBGdimap;
atomic_t im_numinos; /* num of backed inodes */
atomic_t im_numfree; /* num of free backed inodes */
} imap_t;
#define im_freeiag im_imap.in_freeiag
#define im_nextiag im_imap.in_nextiag
#define im_agctl im_imap.in_agctl
#define im_nbperiext im_imap.in_nbperiext
#define im_l2nbperiext im_imap.in_l2nbperiext
/* for standalone testdriver
*/
#define im_diskblock im_imap.in_diskblock
#define im_maxag im_imap.in_maxag
extern int diFree(struct inode *);
extern int diAlloc(struct inode *, boolean_t, struct inode *);
extern int diSync(struct inode *);
/* external references */
extern int diUpdatePMap(struct inode *ipimap, unsigned long inum,
boolean_t is_free, tblock_t * tblk);
#ifdef _STILL_TO_PORT
extern int diExtendFS(inode_t * ipimap, inode_t * ipbmap);
#endif /* _STILL_TO_PORT */
extern int diMount(struct inode *);
extern int diUnmount(struct inode *, int);
extern int diRead(struct inode *);
extern void diClearExtension(struct inode *);
extern struct inode *diReadSpecial(struct super_block *, ino_t);
extern void diWriteSpecial(struct inode *);
extern void diFreeSpecial(struct inode *);
extern int diWrite(tid_t tid, struct inode *);
#endif /* _H_JFS_IMAP */
/*
*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*
*/
#ifndef _H_JFS_INCORE
#define _H_JFS_INCORE
#include <linux/slab.h>
#include <asm/bitops.h>
#include "jfs_types.h"
#include "jfs_xtree.h"
#include "jfs_dtree.h"
/*
* JFS magic number
*/
#define JFS_SUPER_MAGIC 0x3153464a /* "JFS1" */
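/*
 * Illustrative sketch, not part of the JFS sources: why 0x3153464a reads
 * as "JFS1". Emitting the value least significant byte first, which is
 * how a little-endian machine stores it, gives the four ASCII characters.
 * sk_magic_to_text() is a hypothetical helper.
 */

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Write the four bytes of 'magic', low byte first, as a C string. */
static void sk_magic_to_text(uint32_t magic, char out[5])
{
	int i;

	for (i = 0; i < 4; i++)
		out[i] = (char)((magic >> (8 * i)) & 0xff);
	out[4] = '\0';
}
```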
/*
* Due to header ordering problems this can't be in jfs_lock.h
*/
typedef struct jfs_rwlock {
struct rw_semaphore rw_sem;
atomic_t in_use; /* for hacked implementation of trylock */
} jfs_rwlock_t;
/*
* JFS-private inode information
*/
struct jfs_inode_info {
int fileset; /* fileset number (always 16)*/
uint mode2; /* jfs-specific mode */
pxd_t ixpxd; /* inode extent descriptor */
dxd_t acl; /* dxd describing acl */
dxd_t ea; /* dxd describing ea */
time_t otime; /* time created */
uint next_index; /* next available directory entry index */
int acltype; /* Type of ACL */
short btorder; /* access order */
short btindex; /* btpage entry index*/
struct inode *ipimap; /* inode map */
long cflag; /* commit flags */
u16 bxflag; /* xflag of pseudo buffer? */
unchar agno; /* ag number */
unchar pad; /* pad */
lid_t blid; /* lid of pseudo buffer? */
lid_t atlhead; /* anonymous tlock list head */
lid_t atltail; /* anonymous tlock list tail */
struct list_head anon_inode_list; /* inodes having anonymous txns */
struct list_head mp_list; /* metapages in inode's address space */
jfs_rwlock_t rdwrlock; /* read/write lock */
lid_t xtlid; /* lid of xtree lock on directory */
union {
struct {
xtpage_t _xtroot; /* 288: xtree root */
struct inomap *_imap; /* 4: inode map header */
} file;
struct {
dir_table_slot_t _table[12]; /* 96: directory index */
dtroot_t _dtroot; /* 288: dtree root */
} dir;
struct {
unchar _unused[16]; /* 16: */
dxd_t _dxd; /* 16: */
unchar _inline[128]; /* 128: inline symlink */
} link;
} u;
struct inode vfs_inode;
};
#define i_xtroot u.file._xtroot
#define i_imap u.file._imap
#define i_dirtable u.dir._table
#define i_dtroot u.dir._dtroot
#define i_inline u.link._inline
/*
* cflag
*/
enum cflags {
COMMIT_New, /* never committed inode */
COMMIT_Nolink, /* inode committed with zero link count */
COMMIT_Inlineea, /* commit inode inline EA */
COMMIT_Freewmap, /* free WMAP at iClose() */
COMMIT_Dirty, /* Inode is really dirty */
COMMIT_Holdlock, /* Hold the IWRITE_LOCK until commit is done */
COMMIT_Dirtable, /* commit changes to di_dirtable */
COMMIT_Stale, /* data extent is no longer valid */
COMMIT_Synclist, /* metadata pages on group commit synclist */
};
#define set_cflag(flag, ip) set_bit(flag, &(JFS_IP(ip)->cflag))
#define clear_cflag(flag, ip) clear_bit(flag, &(JFS_IP(ip)->cflag))
#define test_cflag(flag, ip) test_bit(flag, &(JFS_IP(ip)->cflag))
#define test_and_clear_cflag(flag, ip) \
test_and_clear_bit(flag, &(JFS_IP(ip)->cflag))
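/*
 * Illustrative sketch, not part of the JFS sources: the cflags enum holds
 * bit numbers, not masks, so each flag is one bit of the cflag word. This
 * user-space version is NOT atomic, unlike the kernel bit operations the
 * macros above wrap. All sk_* names are hypothetical and only three of
 * the flags are reproduced.
 */

```c
#include <assert.h>

enum sk_cflags {
	SK_COMMIT_New,     /* bit 0 */
	SK_COMMIT_Nolink,  /* bit 1 */
	SK_COMMIT_Dirty,   /* bit 2 in this sketch only */
};

static void sk_set_cflag(int flag, unsigned long *word)
{
	assert(flag >= 0 && flag < 32);
	*word |= 1UL << flag;
}

static void sk_clear_cflag(int flag, unsigned long *word)
{
	assert(flag >= 0 && flag < 32);
	*word &= ~(1UL << flag);
}

static int sk_test_cflag(int flag, const unsigned long *word)
{
	assert(flag >= 0 && flag < 32);
	return (int)((*word >> flag) & 1UL);
}

/* Return the old bit value and leave the bit cleared. */
static int sk_test_and_clear_cflag(int flag, unsigned long *word)
{
	int was_set = sk_test_cflag(flag, word);

	sk_clear_cflag(flag, word);
	return was_set;
}
```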
/*
* JFS-private superblock information.
*/
struct jfs_sb_info {
unsigned long mntflag; /* 4: aggregate attributes */
struct inode *ipbmap; /* 4: block map inode */
struct inode *ipaimap; /* 4: aggregate inode map inode */
struct inode *ipaimap2; /* 4: secondary aimap inode */
struct inode *ipimap; /* 4: fileset inode map inode */
struct jfs_log *log; /* 4: log */
short bsize; /* 2: logical block size */
short l2bsize; /* 2: log2 logical block size */
short nbperpage; /* 2: blocks per page */
short l2nbperpage; /* 2: log2 blocks per page */
short l2niperblk; /* 2: log2 inodes per block */
short reserved; /* 2: reserved (padding) */
pxd_t logpxd; /* 8: pxd describing log */
pxd_t ait2; /* 8: pxd describing AIT copy */
/* Formerly in ipimap */
uint gengen; /* 4: inode generation generator*/
uint inostamp; /* 4: shows inode belongs to fileset*/
/* Formerly in ipbmap */
struct bmap *bmap; /* 4: incore bmap descriptor */
struct nls_table *nls_tab; /* 4: current codepage */
struct inode *direct_inode; /* 4: inode for physical I/O */
struct address_space *direct_mapping; /* 4: mapping for physical I/O */
uint state; /* 4: mount/recovery state */
};
static inline struct jfs_inode_info *JFS_IP(struct inode *inode)
{
return list_entry(inode, struct jfs_inode_info, vfs_inode);
}
#define JFS_SBI(sb) ((struct jfs_sb_info *)(sb)->u.generic_sbp)
#define isReadOnly(ip) ((JFS_SBI((ip)->i_sb)->log) ? 0 : 1)
#endif /* _H_JFS_INCORE */
/*
*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/
#include <linux/fs.h>
#include "jfs_incore.h"
#include "jfs_filsys.h"
#include "jfs_imap.h"
#include "jfs_dinode.h"
#include "jfs_debug.h"
/*
* NAME: ialloc()
*
* FUNCTION: Allocate a new inode
*
*/
struct inode *ialloc(struct inode *parent, umode_t mode)
{
struct super_block *sb = parent->i_sb;
struct inode *inode;
struct jfs_inode_info *jfs_inode;
int rc;
inode = new_inode(sb);
if (!inode) {
jERROR(1, ("ialloc: new_inode returned NULL!\n"));
return inode;
}
jfs_inode = JFS_IP(inode);
rc = diAlloc(parent, S_ISDIR(mode), inode);
if (rc) {
jERROR(1, ("ialloc: diAlloc returned %d!\n", rc));
make_bad_inode(inode);
iput(inode);
return NULL;
}
inode->i_uid = current->fsuid;
if (parent->i_mode & S_ISGID) {
inode->i_gid = parent->i_gid;
if (S_ISDIR(mode))
mode |= S_ISGID;
} else
inode->i_gid = current->fsgid;
inode->i_mode = mode;
if (S_ISDIR(mode))
jfs_inode->mode2 = IDIRECTORY | mode;
else
jfs_inode->mode2 = INLINEEA | ISPARSE | mode;
inode->i_blksize = sb->s_blocksize;
inode->i_blocks = 0;
inode->i_mtime = inode->i_atime = inode->i_ctime = CURRENT_TIME;
jfs_inode->otime = inode->i_ctime;
inode->i_version = ++event;
inode->i_generation = JFS_SBI(sb)->gengen++;
jfs_inode->cflag = 0;
set_cflag(COMMIT_New, inode);
/* Zero remaining fields */
memset(&jfs_inode->acl, 0, sizeof(dxd_t));
memset(&jfs_inode->ea, 0, sizeof(dxd_t));
jfs_inode->next_index = 0;
jfs_inode->acltype = 0;
jfs_inode->btorder = 0;
jfs_inode->btindex = 0;
jfs_inode->bxflag = 0;
jfs_inode->blid = 0;
jfs_inode->atlhead = 0;
jfs_inode->atltail = 0;
jfs_inode->xtlid = 0;
jFYI(1, ("ialloc returns inode = 0x%p\n", inode));
return inode;
}
/*
* NAME: iwritelocklist()
*
* FUNCTION: Lock multiple inodes in sorted order to avoid deadlock
*
*/
void iwritelocklist(int n, ...)
{
va_list ilist;
struct inode *sort[4]; /* callers must pass n <= 4 */
struct inode *ip;
int k, m;
va_start(ilist, n);
for (k = 0; k < n; k++)
sort[k] = va_arg(ilist, struct inode *);
va_end(ilist);
/* Bubble sort in descending order */
do {
m = 0;
for (k = 0; k < n; k++)
if ((k + 1) < n
&& sort[k + 1]->i_ino > sort[k]->i_ino) {
ip = sort[k];
sort[k] = sort[k + 1];
sort[k + 1] = ip;
m++;
}
} while (m);
/* Lock them */
for (k = 0; k < n; k++) {
IWRITE_LOCK(sort[k]);
}
}
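/*
 * Illustrative sketch, not part of the JFS sources: the ordering step of
 * iwritelocklist() in user space. Every caller sorts by inode number in
 * descending order before locking, so two callers holding overlapping
 * sets always acquire the shared locks in the same order. struct sk_inode,
 * SK_MAX_LOCKS and sk_sort_desc() are hypothetical; the locking itself is
 * omitted.
 */

```c
#include <assert.h>

#define SK_MAX_LOCKS 4  /* mirrors the sort[4] array above */

struct sk_inode {
	unsigned long i_ino;
};

/* Bubble sort the pointer array by i_ino, largest first. */
static void sk_sort_desc(struct sk_inode **v, int n)
{
	int k, swapped;

	assert(n >= 0 && n <= SK_MAX_LOCKS);
	do {
		swapped = 0;
		for (k = 0; k + 1 < n; k++) {
			if (v[k + 1]->i_ino > v[k]->i_ino) {
				struct sk_inode *tmp = v[k];

				v[k] = v[k + 1];
				v[k + 1] = tmp;
				swapped = 1;
			}
		}
	} while (swapped);
}
```

/* After sorting, the locks would be taken from v[0] to v[n - 1]. */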
/*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/
#ifndef _H_JFS_INODE
#define _H_JFS_INODE
extern struct inode *ialloc(struct inode *, umode_t);
#endif /* _H_JFS_INODE */
/*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/
#ifndef _H_JFS_LOCK
#define _H_JFS_LOCK
#include <linux/spinlock.h>
#include <linux/sched.h>
/*
* jfs_lock.h
*
* JFS lock definition for globally referenced locks
*/
/* readers/writer lock: thread-thread */
/*
* RW semaphores do not currently have a trylock function. Since the
* implementation varies by platform, I have implemented a platform-independent
* wrapper around the rw_semaphore routines. If this turns out to be the best
* way of avoiding our locking problems, I will push to get a trylock
* implemented in the kernel, but I'd rather find a way to avoid having to
* use it.
*/
#define RDWRLOCK_T jfs_rwlock_t
static inline void RDWRLOCK_INIT(jfs_rwlock_t * Lock)
{
init_rwsem(&Lock->rw_sem);
atomic_set(&Lock->in_use, 0);
}
static inline void READ_LOCK(jfs_rwlock_t * Lock)
{
atomic_inc(&Lock->in_use);
down_read(&Lock->rw_sem);
}
static inline void READ_UNLOCK(jfs_rwlock_t * Lock)
{
up_read(&Lock->rw_sem);
atomic_dec(&Lock->in_use);
}
static inline void WRITE_LOCK(jfs_rwlock_t * Lock)
{
atomic_inc(&Lock->in_use);
down_write(&Lock->rw_sem);
}
static inline int WRITE_TRYLOCK(jfs_rwlock_t * Lock)
{
if (atomic_read(&Lock->in_use))
return 0;
WRITE_LOCK(Lock);
return 1;
}
static inline void WRITE_UNLOCK(jfs_rwlock_t * Lock)
{
up_write(&Lock->rw_sem);
atomic_dec(&Lock->in_use);
}
#define IREAD_LOCK(ip) READ_LOCK(&JFS_IP(ip)->rdwrlock)
#define IREAD_UNLOCK(ip) READ_UNLOCK(&JFS_IP(ip)->rdwrlock)
#define IWRITE_LOCK(ip) WRITE_LOCK(&JFS_IP(ip)->rdwrlock)
#define IWRITE_TRYLOCK(ip) WRITE_TRYLOCK(&JFS_IP(ip)->rdwrlock)
#define IWRITE_UNLOCK(ip) WRITE_UNLOCK(&JFS_IP(ip)->rdwrlock)
#define IWRITE_LOCK_LIST iwritelocklist
extern void iwritelocklist(int, ...);
/*
* Conditional sleep where condition is protected by spinlock
*
* lock_cmd and unlock_cmd take and release the spinlock
*/
#define __SLEEP_COND(wq, cond, lock_cmd, unlock_cmd) \
do { \
DECLARE_WAITQUEUE(__wait, current); \
\
add_wait_queue(&wq, &__wait); \
for (;;) { \
set_current_state(TASK_UNINTERRUPTIBLE);\
if (cond) \
break; \
unlock_cmd; \
schedule(); \
lock_cmd; \
} \
current->state = TASK_RUNNING; \
remove_wait_queue(&wq, &__wait); \
} while (0)
#endif /* _H_JFS_LOCK */
/*
*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*
*/
/*
* jfs_logmgr.c: log manager
*
* for related information, see transaction manager (jfs_txnmgr.c), and
* recovery manager (jfs_logredo.c).
*
* note: for detail, RTFS.
*
* log buffer manager:
* special purpose buffer manager supporting log i/o requirements.
* per log serial pageout of logpage
* queuing i/o requests and redrive i/o at iodone
* maintain current logpage buffer
* no caching since append only
* appropriate jfs buffer cache buffers as needed
*
* group commit:
* transactions which wrote COMMIT records in the same in-memory
* log page during the pageout of previous/current log page(s) are
* committed together by the pageout of the page.
*
* TBD lazy commit:
* transactions are committed asynchronously when the log page
* containing their COMMIT record is paged out once it becomes full;
*
* serialization:
* . a per log lock serializes log write.
* . a per log lock serializes group commit.
* . a per log lock serializes log open/close;
*
* TBD log integrity:
* careful-write (ping-pong) of last logpage to recover from crash
* in overwrite.
* detection of split (out-of-order) write of physical sectors
* of last logpage via timestamp at end of each sector
* (with its mirror data array at trailer).
*
* alternatives:
* lsn - 64-bit monotonically increasing integer vs
* 32-bit lspn and page eor.
*/
#include <linux/fs.h>
#include <linux/locks.h>
#include <linux/slab.h>
#include <linux/blkdev.h>
#include <linux/interrupt.h>
#include <linux/smp_lock.h>
#include "jfs_incore.h"
#include "jfs_filsys.h"
#include "jfs_metapage.h"
#include "jfs_txnmgr.h"
#include "jfs_debug.h"
/*
* lbuf's ready to be redriven. Protected by log_redrive_lock (jfsIOtask)
*/
static lbuf_t *log_redrive_list;
static spinlock_t log_redrive_lock = SPIN_LOCK_UNLOCKED;
/*
* log read/write serialization (per log)
*/
#define LOG_LOCK_INIT(log) init_MUTEX(&(log)->loglock)
#define LOG_LOCK(log) down(&((log)->loglock))
#define LOG_UNLOCK(log) up(&((log)->loglock))
/*
* log group commit serialization (per log)
*/
#define LOGGC_LOCK_INIT(log) spin_lock_init(&(log)->gclock)
#define LOGGC_LOCK(log) spin_lock_irq(&(log)->gclock)
#define LOGGC_UNLOCK(log) spin_unlock_irq(&(log)->gclock)
#define LOGGC_WAKEUP(tblk) wake_up(&(tblk)->gcwait)
/*
* log sync serialization (per log)
*/
#define LOGSYNC_DELTA(logsize) min((logsize)/8, 128*LOGPSIZE)
#define LOGSYNC_BARRIER(logsize) ((logsize)/4)
/*
#define LOGSYNC_DELTA(logsize) min((logsize)/4, 256*LOGPSIZE)
#define LOGSYNC_BARRIER(logsize) ((logsize)/2)
*/
/*
* log buffer cache synchronization
*/
static spinlock_t jfsLCacheLock = SPIN_LOCK_UNLOCKED;
#define LCACHE_LOCK(flags) spin_lock_irqsave(&jfsLCacheLock, flags)
#define LCACHE_UNLOCK(flags) spin_unlock_irqrestore(&jfsLCacheLock, flags)
/*
* See __SLEEP_COND in jfs_locks.h
*/
#define LCACHE_SLEEP_COND(wq, cond, flags) \
do { \
if (cond) \
break; \
__SLEEP_COND(wq, cond, LCACHE_LOCK(flags), LCACHE_UNLOCK(flags)); \
} while (0)
#define LCACHE_WAKEUP(event) wake_up(event)
/*
* lbuf buffer cache (lCache) control
*/
/* log buffer manager pageout control (cumulative, inclusive) */
#define lbmREAD 0x0001
#define lbmWRITE 0x0002 /* enqueue at tail of write queue;
* init pageout if at head of queue;
*/
#define lbmRELEASE 0x0004 /* remove from write queue
* at completion of pageout;
* do not free/recycle it yet:
* caller will free it;
*/
#define lbmSYNC 0x0008 /* do not return to freelist
* when removed from write queue;
*/
#define lbmFREE 0x0010 /* return to freelist
* at completion of pageout;
* the buffer may be recycled;
*/
#define lbmDONE 0x0020
#define lbmERROR 0x0040
#define lbmGC 0x0080 /* lbmIODone to perform post-GC processing
* of log page
*/
#define lbmDIRECT 0x0100
/*
* external references
*/
extern void vPut(struct inode *ip);
extern void txLazyUnlock(tblock_t * tblk);
extern int jfs_thread_stopped(void);
extern struct task_struct *jfsIOtask;
extern struct completion jfsIOwait;
/*
* forward references
*/
static int lmWriteRecord(log_t * log, tblock_t * tblk, lrd_t * lrd,
tlock_t * tlck);
static int lmNextPage(log_t * log);
static int lmLogInit(log_t * log);
static int lmLogShutdown(log_t * log);
static int lbmLogInit(log_t * log);
static void lbmLogShutdown(log_t * log);
static lbuf_t *lbmAllocate(log_t * log, int);
static void lbmFree(lbuf_t * bp);
static void lbmfree(lbuf_t * bp);
static int lbmRead(log_t * log, int pn, lbuf_t ** bpp);
static void lbmWrite(log_t * log, lbuf_t * bp, int flag, int cant_block);
static void lbmDirectWrite(log_t * log, lbuf_t * bp, int flag);
static int lbmIOWait(lbuf_t * bp, int flag);
static int lbmIODone(struct bio *bio, int);
#ifdef _STILL_TO_PORT
static void lbmDirectIODone(iobuf_t * ddbp);
#endif /* _STILL_TO_PORT */
void lbmStartIO(lbuf_t * bp);
void lmGCwrite(log_t * log, int cant_block);
/*
* statistics
*/
#ifdef CONFIG_JFS_STATISTICS
struct lmStat {
uint commit; /* # of commit */
uint pagedone; /* # of page written */
uint submitted; /* # of pages submitted */
} lmStat;
#endif
/*
* NAME: lmLog()
*
* FUNCTION: write a log record;
*
* PARAMETER:
*
* RETURN: lsn - offset to the next log record to write (end-of-log);
* -1 - error;
*
* note: todo: log error handler
*/
int lmLog(log_t * log, tblock_t * tblk, lrd_t * lrd, tlock_t * tlck)
{
int lsn;
int diffp, difft;
metapage_t *mp = NULL;
jFYI(1, ("lmLog: log:0x%p tblk:0x%p, lrd:0x%p tlck:0x%p\n",
log, tblk, lrd, tlck));
LOG_LOCK(log);
/* log by (out-of-transaction) JFS ? */
if (tblk == NULL)
goto writeRecord;
/* log from page ? */
if (tlck == NULL ||
tlck->type & tlckBTROOT || (mp = tlck->mp) == NULL)
goto writeRecord;
/*
* initialize/update page/transaction recovery lsn
*/
lsn = log->lsn;
LOGSYNC_LOCK(log);
/*
* initialize page lsn if first log write of the page
*/
if (mp->lsn == 0) {
mp->log = log;
mp->lsn = lsn;
log->count++;
/* insert page at tail of logsynclist */
list_add_tail(&mp->synclist, &log->synclist);
}
/*
* initialize/update lsn of tblock of the page
*
* transaction inherits oldest lsn of pages associated
* with allocation/deallocation of resources (their
* log records are used to reconstruct allocation map
* at recovery time: inode for inode allocation map,
* B+-tree index of extent descriptors for block
* allocation map);
* allocation map pages inherit transaction lsn at
* commit time to allow forwarding log syncpt past log
* records associated with allocation/deallocation of
* resources only after persistent map of these map pages
* have been updated and propagated to home.
*/
/*
* initialize transaction lsn:
*/
if (tblk->lsn == 0) {
/* inherit lsn of its first page logged */
tblk->lsn = mp->lsn;
log->count++;
/* insert tblock after the page on logsynclist */
list_add(&tblk->synclist, &mp->synclist);
}
/*
* update transaction lsn:
*/
else {
/* inherit oldest/smallest lsn of page */
logdiff(diffp, mp->lsn, log);
logdiff(difft, tblk->lsn, log);
if (diffp < difft) {
/* update tblock lsn with page lsn */
tblk->lsn = mp->lsn;
/* move tblock after page on logsynclist */
list_del(&tblk->synclist);
list_add(&tblk->synclist, &mp->synclist);
}
}
LOGSYNC_UNLOCK(log);
/*
* write the log record
*/
writeRecord:
lsn = lmWriteRecord(log, tblk, lrd, tlck);
/*
* forward log syncpt if log reached next syncpt trigger
*/
logdiff(diffp, lsn, log);
if (diffp >= log->nextsync)
lsn = lmLogSync(log, 0);
/* update end-of-log lsn */
log->lsn = lsn;
LOG_UNLOCK(log);
/* return end-of-log address */
return lsn;
}
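/*
 * Illustration only, not part of the original commit. lmLog() above orders
 * page and transaction lsns by their distance from the last sync point
 * rather than by raw value, because the log is circular. logdiff() is
 * defined outside this file; the sketch below ASSUMES it computes
 * (lsn - syncpt) modulo the log size. struct demo_log, log_distance() and
 * older_lsn() are hypothetical names used only for this example.
 */

```c
#include <assert.h>

/* Hypothetical stand-in for the two log_t fields that matter here. */
struct demo_log {
	int syncpt;	/* lsn of the last sync point */
	int logsize;	/* usable log size in bytes */
};

/*
 * Distance in bytes from the sync point to lsn, walking forward around
 * the circular log. ASSUMPTION: this mirrors what logdiff() does.
 */
static int log_distance(const struct demo_log *log, int lsn)
{
	int diff = lsn - log->syncpt;

	if (diff < 0)
		diff += log->logsize;	/* lsn has wrapped past the end */
	assert(diff >= 0 && diff < log->logsize);
	return diff;
}

/* Return whichever of two lsns is older, i.e. closer to the sync point. */
static int older_lsn(const struct demo_log *log, int a, int b)
{
	return log_distance(log, a) < log_distance(log, b) ? a : b;
}
```

/*
 * With syncpt = 900 and logsize = 1000, lsn 100 has already wrapped, so it
 * is 200 bytes past the sync point and therefore newer than lsn 950.
 */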
/*
* NAME: lmWriteRecord()
*
* FUNCTION: move the log record to current log page
*
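* PARAMETER: log - log to append to
* tblk - transaction block (NULL if out-of-transaction)
* lrd - log record descriptor
* tlck - transaction lock describing the data to log (NULL if none)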
*
* RETURN: end-of-log address
*
* serialization: LOG_LOCK() held on entry/exit
*/
static int
lmWriteRecord(log_t * log, tblock_t * tblk, lrd_t * lrd, tlock_t * tlck)
{
int lsn = 0; /* end-of-log address */
lbuf_t *bp; /* dst log page buffer */
logpage_t *lp; /* dst log page */
caddr_t dst; /* destination address in log page */
int dstoffset; /* end-of-log offset in log page */
int freespace; /* free space in log page */
caddr_t p; /* src meta-data page */
caddr_t src;
int srclen;
int nbytes; /* number of bytes to move */
int i;
int len;
linelock_t *linelock;
lv_t *lv;
lvd_t *lvd;
int l2linesize;
len = 0;
/* retrieve destination log page to write */
bp = (lbuf_t *) log->bp;
lp = (logpage_t *) bp->l_ldata;
dstoffset = log->eor;
/* any log data to write ? */
if (tlck == NULL)
goto moveLrd;
/*
* move log record data
*/
/* retrieve source meta-data page to log */
if (tlck->flag & tlckPAGELOCK) {
p = (caddr_t) (tlck->mp->data);
linelock = (linelock_t *) & tlck->lock;
}
/* retrieve source in-memory inode to log */
else if (tlck->flag & tlckINODELOCK) {
if (tlck->type & tlckDTREE)
p = (caddr_t) &JFS_IP(tlck->ip)->i_dtroot;
else
p = (caddr_t) &JFS_IP(tlck->ip)->i_xtroot;
linelock = (linelock_t *) & tlck->lock;
}
#ifdef _JFS_WIP
else if (tlck->flag & tlckINLINELOCK) {
inlinelock = (inlinelock_t *) & tlck;
p = (caddr_t) & inlinelock->pxd;
linelock = (linelock_t *) & tlck;
}
#endif /* _JFS_WIP */
else {
jERROR(2, ("lmWriteRecord: UFO tlck:0x%p\n", tlck));
return 0; /* Probably should trap */
}
l2linesize = linelock->l2linesize;
moveData:
ASSERT(linelock->index <= linelock->maxcnt);
lv = (lv_t *) & linelock->lv;
for (i = 0; i < linelock->index; i++, lv++) {
if (lv->length == 0)
continue;
/* is page full ? */
if (dstoffset >= LOGPSIZE - LOGPTLRSIZE) {
/* page became full: move on to next page */
lmNextPage(log);
bp = log->bp;
lp = (logpage_t *) bp->l_ldata;
dstoffset = LOGPHDRSIZE;
}
/*
* move log vector data
*/
src = (u8 *) p + (lv->offset << l2linesize);
srclen = lv->length << l2linesize;
len += srclen;
while (srclen > 0) {
freespace = (LOGPSIZE - LOGPTLRSIZE) - dstoffset;
nbytes = min(freespace, srclen);
dst = (caddr_t) lp + dstoffset;
memcpy(dst, src, nbytes);
dstoffset += nbytes;
/* is page not full ? */
if (dstoffset < LOGPSIZE - LOGPTLRSIZE)
break;
/* page became full: move on to next page */
lmNextPage(log);
bp = (lbuf_t *) log->bp;
lp = (logpage_t *) bp->l_ldata;
dstoffset = LOGPHDRSIZE;
srclen -= nbytes;
src += nbytes;
}
/*
* move log vector descriptor
*/
len += 4;
lvd = (lvd_t *) ((caddr_t) lp + dstoffset);
lvd->offset = cpu_to_le16(lv->offset);
lvd->length = cpu_to_le16(lv->length);
dstoffset += 4;
jFYI(1,
("lmWriteRecord: lv offset:%d length:%d\n",
lv->offset, lv->length));
}
if ((i = linelock->next)) {
linelock = (linelock_t *) lid_to_tlock(i);
goto moveData;
}
/*
* move log record descriptor
*/
moveLrd:
lrd->length = cpu_to_le16(len);
src = (caddr_t) lrd;
srclen = LOGRDSIZE;
while (srclen > 0) {
freespace = (LOGPSIZE - LOGPTLRSIZE) - dstoffset;
nbytes = min(freespace, srclen);
dst = (caddr_t) lp + dstoffset;
memcpy(dst, src, nbytes);
dstoffset += nbytes;
srclen -= nbytes;
/* is there more to move than the free space of the page ? */
if (srclen)
goto pageFull;
/*
* end of log record descriptor
*/
/* update last log record eor */
log->eor = dstoffset;
bp->l_eor = dstoffset;
lsn = (log->page << L2LOGPSIZE) + dstoffset;
if (lrd->type & cpu_to_le16(LOG_COMMIT)) {
tblk->clsn = lsn;
jFYI(1,
("wr: tclsn:0x%x, beor:0x%x\n", tblk->clsn,
bp->l_eor));
INCREMENT(lmStat.commit); /* # of commit */
/*
* enqueue tblock for group commit:
*
* enqueue tblock of non-trivial/synchronous COMMIT
* at tail of group commit queue
* (trivial/asynchronous COMMITs are ignored by
* group commit.)
*/
LOGGC_LOCK(log);
/* init tblock gc state */
tblk->flag = tblkGC_QUEUE;
tblk->bp = log->bp;
tblk->pn = log->page;
tblk->eor = log->eor;
init_waitqueue_head(&tblk->gcwait);
/* enqueue transaction to commit queue */
tblk->cqnext = NULL;
if (log->cqueue.head) {
log->cqueue.tail->cqnext = tblk;
log->cqueue.tail = tblk;
} else
log->cqueue.head = log->cqueue.tail = tblk;
LOGGC_UNLOCK(log);
}
jFYI(1,
("lmWriteRecord: lrd:0x%04x bp:0x%p pn:%d eor:0x%x\n",
le16_to_cpu(lrd->type), log->bp, log->page,
dstoffset));
/* page not full ? */
if (dstoffset < LOGPSIZE - LOGPTLRSIZE)
return lsn;
pageFull:
/* page became full: move on to next page */
lmNextPage(log);
bp = (lbuf_t *) log->bp;
lp = (logpage_t *) bp->l_ldata;
dstoffset = LOGPHDRSIZE;
src += nbytes;
}
return lsn;
}
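/*
 * Illustration only, not part of the original commit. The copy loops in
 * lmWriteRecord() fill the data area of a log page (between header and
 * trailer) and spill into the next page when it is full. The sketch below
 * shows that pattern in isolation. All DEMO_* sizes and the demo_cursor
 * type are hypothetical; the real sizes come from LOGPSIZE, LOGPHDRSIZE
 * and LOGPTLRSIZE, and the real code calls lmNextPage() (which also wraps
 * around the log) where the sketch simply moves to the next array slot.
 */

```c
#include <assert.h>
#include <string.h>

#define DEMO_PSIZE	64	/* hypothetical page size */
#define DEMO_HDR	8	/* hypothetical page header size */
#define DEMO_TLR	8	/* hypothetical page trailer size */
#define DEMO_NPAGES	4

struct demo_cursor {
	unsigned char pages[DEMO_NPAGES][DEMO_PSIZE];
	int page;	/* index of the current page */
	int off;	/* next free byte in the current page */
};

/* Append len bytes, moving to the next page whenever the data area fills. */
static void demo_append(struct demo_cursor *c, const void *data, int len)
{
	const unsigned char *src = data;

	while (len > 0) {
		int freespace = (DEMO_PSIZE - DEMO_TLR) - c->off;
		int nbytes = len < freespace ? len : freespace;

		memcpy(&c->pages[c->page][c->off], src, nbytes);
		c->off += nbytes;
		src += nbytes;
		len -= nbytes;

		if (c->off == DEMO_PSIZE - DEMO_TLR) {
			/* data area full: start after the next page's header */
			assert(c->page + 1 < DEMO_NPAGES);
			c->page++;
			c->off = DEMO_HDR;
		}
	}
}
```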
/*
* NAME: lmNextPage()
*
* FUNCTION: write current page and allocate next page.
*
* PARAMETER: log
*
* RETURN: 0
*
* serialization: LOG_LOCK() held on entry/exit
*/
static int lmNextPage(log_t * log)
{
logpage_t *lp;
int lspn; /* log sequence page number */
int pn; /* current page number */
lbuf_t *bp;
lbuf_t *nextbp;
tblock_t *tblk;
jFYI(1, ("lmNextPage\n"));
/* get current log page number and log sequence page number */
pn = log->page;
bp = log->bp;
lp = (logpage_t *) bp->l_ldata;
lspn = le32_to_cpu(lp->h.page);
LOGGC_LOCK(log);
/*
* write or queue the full page at the tail of write queue
*/
/* get the tail tblk on commit queue */
tblk = log->cqueue.tail;
/* every tblk that has a COMMIT record on the current page,
* and has not been committed, must be on the commit queue,
* since a tblk is put on the commit queue at the time
* its COMMIT record is written to the page, before the
* page becomes full (even though the tblk thread
* that wrote the COMMIT record may currently be
* suspended);
*/
/* is page bound with outstanding tail tblk ? */
if (tblk && tblk->pn == pn) {
/* mark tblk for end-of-page */
tblk->flag |= tblkGC_EOP;
/* if page is not already on write queue,
* just enqueue (no lbmWRITE to prevent redrive)
* buffer to wqueue to ensure correct serial order
* of the pages since log pages will be added
* continuously (tblk bound with the page hasn't
* got around to init write of the page, either
* preempted or the page got filled by its COMMIT
* record);
* pages with COMMIT are paged out explicitly by
* tblk in lmGroupCommit();
*/
if (bp->l_wqnext == NULL) {
/* bp->l_ceor = bp->l_eor; */
/* lp->h.eor = lp->t.eor = bp->l_ceor; */
lbmWrite(log, bp, 0, 0);
}
}
/* page is not bound with outstanding tblk:
* init write or mark it to be redriven (lbmWRITE)
*/
else {
/* finalize the page */
bp->l_ceor = bp->l_eor;
lp->h.eor = lp->t.eor = cpu_to_le16(bp->l_ceor);
lbmWrite(log, bp, lbmWRITE | lbmRELEASE | lbmFREE, 0);
}
LOGGC_UNLOCK(log);
/*
* allocate/initialize next page
*/
/* if log wraps, the first data page of log is 2
* (0 never used, 1 is superblock).
*/
log->page = (pn == log->size - 1) ? 2 : pn + 1;
log->eor = LOGPHDRSIZE; /* ? valid page empty/full at logRedo() */
/* allocate/initialize next log page buffer */
nextbp = lbmAllocate(log, log->page);
nextbp->l_eor = log->eor;
log->bp = nextbp;
/* initialize next log page */
lp = (logpage_t *) nextbp->l_ldata;
lp->h.page = lp->t.page = cpu_to_le32(lspn + 1);
lp->h.eor = lp->t.eor = cpu_to_le16(LOGPHDRSIZE);
jFYI(1, ("lmNextPage done\n"));
return 0;
}
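/*
 * Illustration only, not part of the original commit. lmNextPage() keeps
 * two counters: the physical page number, which wraps back to page 2 at
 * the end of the log (page 0 is never used and page 1 is the superblock),
 * and the log sequence page number, which simply increments. The sketch
 * below isolates that bookkeeping; struct demo_pos and demo_advance() are
 * hypothetical names.
 */

```c
#include <assert.h>

#define DEMO_FIRST_DATA_PAGE 2	/* page 0 unused, page 1 holds the log superblock */

struct demo_pos {
	int pn;		/* physical page number inside the log */
	int lspn;	/* log sequence page number, incremented on every page */
};

/* Advance to the next log page; size is the log size in pages. */
static struct demo_pos demo_advance(struct demo_pos cur, int size)
{
	struct demo_pos next;

	assert(size > DEMO_FIRST_DATA_PAGE);
	assert(cur.pn >= DEMO_FIRST_DATA_PAGE && cur.pn < size);
	next.pn = (cur.pn == size - 1) ? DEMO_FIRST_DATA_PAGE : cur.pn + 1;
	next.lspn = cur.lspn + 1;
	return next;
}
```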
/*
* NAME: lmGroupCommit()
*
* FUNCTION: group commit
* initiate pageout of the pages with COMMIT in the order of
* page number - redrive pageout of the page at the head of
* pageout queue until full page has been written.
*
* RETURN: 0 - success (lazy commits return without waiting)
* EIO - the log page write failed
*
* NOTE:
* LOGGC_LOCK serializes log group commit queue, and
* transaction blocks on the commit queue.
* N.B. LOG_LOCK is NOT held during lmGroupCommit().
*/
int lmGroupCommit(log_t * log, tblock_t * tblk)
{
int rc = 0;
LOGGC_LOCK(log);
/* group committed already ? */
if (tblk->flag & tblkGC_COMMITTED) {
if (tblk->flag & tblkGC_ERROR)
rc = EIO;
LOGGC_UNLOCK(log);
return rc;
}
jFYI(1,
("lmGroup Commit: tblk = 0x%p, gcrtc = %d\n", tblk,
log->gcrtc));
/*
* no group commit pageout in progress
*/
if ((!(log->cflag & logGC_PAGEOUT)) && log->cqueue.head) {
/*
* only transaction in the commit queue:
*
* start one-transaction group commit as
* its group leader.
*/
log->cflag |= logGC_PAGEOUT;
lmGCwrite(log, 0);
}
/* lmGCwrite gives up LOGGC_LOCK, check again */
if (tblk->flag & tblkGC_COMMITTED) {
if (tblk->flag & tblkGC_ERROR)
rc = EIO;
LOGGC_UNLOCK(log);
return rc;
}
/* upcount transaction waiting for completion
*/
log->gcrtc++;
if (tblk->xflag & COMMIT_LAZY) {
tblk->flag |= tblkGC_LAZY;
LOGGC_UNLOCK(log);
return 0;
}
tblk->flag |= tblkGC_READY;
__SLEEP_COND(tblk->gcwait, (tblk->flag & tblkGC_COMMITTED),
LOGGC_LOCK(log), LOGGC_UNLOCK(log));
/* removed from commit queue */
if (tblk->flag & tblkGC_ERROR)
rc = EIO;
LOGGC_UNLOCK(log);
return rc;
}
/*
* NAME: lmGCwrite()
*
* FUNCTION: group commit write
* initiate write of log page, building a group of all transactions
* with commit records on that page.
*
* RETURN: None
*
* NOTE:
* LOGGC_LOCK must be held by caller.
* N.B. LOG_LOCK is NOT held during lmGroupCommit().
*/
void lmGCwrite(log_t * log, int cant_write)
{
lbuf_t *bp;
logpage_t *lp;
int gcpn; /* group commit page number */
tblock_t *tblk;
tblock_t *xtblk;
/*
* build the commit group of a log page
*
* scan commit queue and make a commit group of all
* transactions with COMMIT records on the same log page.
*/
/* get the head tblk on the commit queue */
tblk = xtblk = log->cqueue.head;
gcpn = tblk->pn;
while (tblk && tblk->pn == gcpn) {
xtblk = tblk;
/* state transition: (QUEUE, READY) -> COMMIT */
tblk->flag |= tblkGC_COMMIT;
tblk = tblk->cqnext;
}
tblk = xtblk; /* last tblk of the page */
/*
* pageout to commit transactions on the log page.
*/
bp = (lbuf_t *) tblk->bp;
lp = (logpage_t *) bp->l_ldata;
/* is page already full ? */
if (tblk->flag & tblkGC_EOP) {
/* mark page to free at end of group commit of the page */
tblk->flag &= ~tblkGC_EOP;
tblk->flag |= tblkGC_FREE;
bp->l_ceor = bp->l_eor;
lp->h.eor = lp->t.eor = cpu_to_le16(bp->l_ceor);
jEVENT(0,
("gc: tclsn:0x%x, bceor:0x%x\n", tblk->clsn,
bp->l_ceor));
lbmWrite(log, bp, lbmWRITE | lbmRELEASE | lbmGC,
cant_write);
}
/* page is not yet full */
else {
bp->l_ceor = tblk->eor; /* ? bp->l_ceor = bp->l_eor; */
lp->h.eor = lp->t.eor = cpu_to_le16(bp->l_ceor);
jEVENT(0,
("gc: tclsn:0x%x, bceor:0x%x\n", tblk->clsn,
bp->l_ceor));
lbmWrite(log, bp, lbmWRITE | lbmGC, cant_write);
}
}
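/*
 * Illustration only, not part of the original commit. lmGCwrite() walks
 * the commit queue from its head and groups every transaction whose COMMIT
 * record sits on the same log page as the head's. The sketch below shows
 * that scan on a plain singly-linked list. struct demo_tblk, its in_group
 * field (standing in for the tblkGC_COMMIT flag) and demo_build_group()
 * are hypothetical. Unlike the driver code, the sketch tolerates an empty
 * queue.
 */

```c
#include <assert.h>
#include <stddef.h>

struct demo_tblk {
	int pn;				/* log page holding the COMMIT record */
	int in_group;			/* set when picked for the current group */
	struct demo_tblk *cqnext;	/* next transaction on the commit queue */
};

/*
 * Mark the leading run of queue entries that share the head's page
 * number and return the last entry of that run (NULL for an empty queue).
 */
static struct demo_tblk *demo_build_group(struct demo_tblk *head)
{
	struct demo_tblk *last = NULL;
	struct demo_tblk *t;

	if (head == NULL)
		return NULL;

	for (t = head; t != NULL && t->pn == head->pn; t = t->cqnext) {
		t->in_group = 1;
		last = t;
	}
	return last;
}
```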
/*
* NAME: lmPostGC()
*
* FUNCTION: group commit post-processing
* Processes transactions after their commit records have been written
* to disk, redriving log I/O if necessary.
*
* RETURN: None
*
* NOTE:
* This routine is called at interrupt time by lbmIODone
*/
void lmPostGC(lbuf_t * bp)
{
unsigned long flags;
log_t *log = bp->l_log;
logpage_t *lp;
tblock_t *tblk;
//LOGGC_LOCK(log);
spin_lock_irqsave(&log->gclock, flags);
/*
* current pageout of group commit completed.
*
* remove/wakeup transactions from commit queue who were
* group committed with the current log page
*/
while ((tblk = log->cqueue.head) && (tblk->flag & tblkGC_COMMIT)) {
/* if transaction was marked GC_COMMIT then
* it has been shipped in the current pageout
* and made it to disk - it is committed.
*/
if (bp->l_flag & lbmERROR)
tblk->flag |= tblkGC_ERROR;
/* remove it from the commit queue */
log->cqueue.head = tblk->cqnext;
if (log->cqueue.head == NULL)
log->cqueue.tail = NULL;
tblk->flag &= ~tblkGC_QUEUE;
tblk->cqnext = 0;
jEVENT(0,
("lmPostGC: tblk = 0x%p, flag = 0x%x\n", tblk,
tblk->flag));
if (!(tblk->xflag & COMMIT_FORCE))
/*
* Hand tblk over to lazy commit thread
*/
txLazyUnlock(tblk);
else {
/* state transition: COMMIT -> COMMITTED */
tblk->flag |= tblkGC_COMMITTED;
if (tblk->flag & tblkGC_READY) {
log->gcrtc--;
LOGGC_WAKEUP(tblk);
}
}
/* was page full before pageout ?
* (and this is the last tblk bound with the page)
*/
if (tblk->flag & tblkGC_FREE)
lbmFree(bp);
/* did page become full after pageout ?
* (and this is the last tblk bound with the page)
*/
else if (tblk->flag & tblkGC_EOP) {
/* finalize the page */
lp = (logpage_t *) bp->l_ldata;
bp->l_ceor = bp->l_eor;
lp->h.eor = lp->t.eor = cpu_to_le16(bp->l_eor);
jEVENT(0, ("lmPostGC: calling lbmWrite\n"));
lbmWrite(log, bp, lbmWRITE | lbmRELEASE | lbmFREE,
1);
}
}
/* are there any transactions that have entered lmGroupCommit()
* (whose COMMITs are after that of the last log page written)?
* They are waiting for a new group commit (above at (SLEEP 1)):
* select the latest ready transaction as new group leader and
* wake her up to lead her group.
*/
if ((log->gcrtc > 0) && log->cqueue.head)
/*
* Call lmGCwrite with new group leader
*/
lmGCwrite(log, 1);
/* no transactions are ready yet (transactions are only just
* queued (GC_QUEUE) and have not entered group commit yet):
* the first transaction entering group commit
* will elect herself as new group leader.
*/
else
log->cflag &= ~logGC_PAGEOUT;
//LOGGC_UNLOCK(log);
spin_unlock_irqrestore(&log->gclock, flags);
return;
}
/*
* NAME: lmLogSync()
*
* FUNCTION: write log SYNCPT record for specified log
* if new sync address is available
* (normally the case if sync() is executed by back-ground
* process).
* if not, explicitly run jfs_blogsync() to initiate
* getting of new sync address.
* calculate new value of i_nextsync which determines when
* this code is called again.
*
* this is called only from lmLog().
*
* PARAMETER: log - log structure
* nosyncwait - nonzero if called from outside of a transaction, e.g., sync()
*
* RETURN: lsn - end-of-log address
*
* serialization: LOG_LOCK() held on entry/exit
*/
int lmLogSync(log_t * log, int nosyncwait)
{
int logsize;
int written; /* written since last syncpt */
int free; /* free space left available */
int delta; /* additional delta to write normally */
int more; /* additional write granted */
lrd_t lrd;
int lsn;
struct logsyncblk *lp;
/*
* forward syncpt
*/
/* if last sync is same as last syncpt,
* invoke sync point forward processing to update sync.
*/
if (log->sync == log->syncpt) {
LOGSYNC_LOCK(log);
/* ToDo: push dirty metapages out to disk */
// bmLogSync(log);
if (list_empty(&log->synclist))
log->sync = log->lsn;
else {
lp = list_entry(log->synclist.next,
struct logsyncblk, synclist);
log->sync = lp->lsn;
}
LOGSYNC_UNLOCK(log);
}
/* if sync is different from last syncpt,
* write a SYNCPT record with syncpt = sync.
* reset syncpt = sync
*/
if (log->sync != log->syncpt) {
struct jfs_sb_info *sbi = JFS_SBI(log->sb);
/*
* We need to make sure all of the "written" metapages
* actually make it to disk
*/
fsync_inode_data_buffers(sbi->ipbmap);
fsync_inode_data_buffers(sbi->ipimap);
fsync_inode_data_buffers(sbi->direct_inode);
lrd.logtid = 0;
lrd.backchain = 0;
lrd.type = cpu_to_le16(LOG_SYNCPT);
lrd.length = 0;
lrd.log.syncpt.sync = cpu_to_le32(log->sync);
lsn = lmWriteRecord(log, NULL, &lrd, NULL);
log->syncpt = log->sync;
} else
lsn = log->lsn;
/*
* setup next syncpt trigger (SWAG)
*/
logsize = log->logsize;
logdiff(written, lsn, log);
free = logsize - written;
delta = LOGSYNC_DELTA(logsize);
more = min(free / 2, delta);
if (more < 2 * LOGPSIZE) {
jEVENT(1,
("\n ... Log Wrap ... Log Wrap ... Log Wrap ...\n\n"));
/*
* log wrapping
*
* option 1 - panic ? No.!
* option 2 - shutdown file systems
* associated with log ?
* option 3 - extend log ?
*/
/*
* option 4 - second chance
*
* mark log wrapped, and continue.
* when all active transactions are completed,
* mark log valid for recovery.
* if crashed during invalid state, log state
* implies invalid log, forcing fsck().
*/
/* mark log state log wrap in log superblock */
/* log->state = LOGWRAP; */
/* reset sync point computation */
log->syncpt = log->sync = lsn;
log->nextsync = delta;
} else
/* next syncpt trigger = written + more */
log->nextsync = written + more;
/* return if lmLogSync() from outside of transaction, e.g., sync() */
if (nosyncwait)
return lsn;
/* if number of bytes written from last sync point is more
* than 1/4 of the log size, stop new transactions from
* starting until all current transactions are completed
* by setting syncbarrier flag.
*/
if (written > LOGSYNC_BARRIER(logsize) && logsize > 32 * LOGPSIZE) {
log->syncbarrier = 1;
jFYI(1, ("log barrier on: lsn=0x%x syncpt=0x%x\n", lsn,
log->syncpt));
}
return lsn;
}
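/*
 * Illustration only, not part of the original commit. The tail of
 * lmLogSync() decides how many more bytes may be logged before the next
 * sync point is attempted. The sketch below repeats that arithmetic with
 * plain integers. DEMO_LOGPSIZE is an ASSUMED page size of 4096 bytes
 * (LOGPSIZE is defined outside this file), and demo_min(),
 * demo_sync_delta() and demo_next_sync() are hypothetical names.
 */

```c
#include <assert.h>

#define DEMO_LOGPSIZE 4096	/* ASSUMED log page size in bytes */

static int demo_min(int a, int b)
{
	return a < b ? a : b;
}

/* Same formula as the active LOGSYNC_DELTA macro in this file. */
static int demo_sync_delta(int logsize)
{
	return demo_min(logsize / 8, 128 * DEMO_LOGPSIZE);
}

/*
 * Return the next sync trigger for a log of logsize bytes of which
 * 'written' bytes have been used since the last sync point. *wrapped is
 * set to 1 when fewer than two pages of headroom would be granted, the
 * "log wrap" case, in which the trigger is reset to the plain delta.
 */
static int demo_next_sync(int logsize, int written, int *wrapped)
{
	int free_bytes = logsize - written;
	int delta = demo_sync_delta(logsize);
	int more = demo_min(free_bytes / 2, delta);

	*wrapped = more < 2 * DEMO_LOGPSIZE;
	return *wrapped ? delta : written + more;
}
```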
/*
* NAME: lmLogOpen()
*
* FUNCTION: open the log on first open;
* insert filesystem in the active list of the log.
*
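* PARAMETER: sb - super block of the file system
* logptr - log structure (out)
*
* RETURN: 0 - success
* ENOMEM - log structure could not be allocated
* errors returned by lmLogInit()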
*
* serialization:
*/
int lmLogOpen(struct super_block *sb, log_t ** logptr)
{
int rc;
kdev_t logdev; /* dev_t of log device */
log_t *log;
logdev = sb->s_dev;
#ifdef _STILL_TO_PORT
/*
* open the inode representing the log device (aka log inode)
*/
if (logdev != fsdev)
goto externalLog;
#endif /* _STILL_TO_PORT */
/*
* in-line log in host file system
*
* file system and log have a 1-to-1 relationship;
*/
// inlineLog:
*logptr = log = kmalloc(sizeof(log_t), GFP_KERNEL);
if (log == 0)
return ENOMEM;
memset(log, 0, sizeof(log_t));
log->sb = sb; /* This should be a list */
log->flag = JFS_INLINELOG;
log->dev = logdev;
log->base = addressPXD(&JFS_SBI(sb)->logpxd);
log->size = lengthPXD(&JFS_SBI(sb)->logpxd) >>
(L2LOGPSIZE - sb->s_blocksize_bits);
log->l2bsize = sb->s_blocksize_bits;
ASSERT(L2LOGPSIZE >= sb->s_blocksize_bits);
/*
* initialize log.
*/
if ((rc = lmLogInit(log)))
goto errout10;
#ifdef _STILL_TO_PORT
goto out;
/*
* external log as separate logical volume
*
* file systems to log may have n-to-1 relationship;
*/
externalLog:
/*
* open log inode
*
* log inode is reserved inode of (dev_t = log device,
* fileset number = 0, i_number = 0), which acquire
* one i_count for each open by file system.
*
* hand craft dummy vfs to force iget() the special case of
* an in-memory inode allocation without on-disk inode
*/
memset(&dummyvfs, 0, sizeof(struct vfs));
dummyvfs.filesetvfs.vfs_data = NULL;
dummyvfs.dummyvfs.dev = logdev;
dummyvfs.dummyvfs.ipmnt = NULL;
ICACHE_LOCK();
rc = iget((struct vfs *) &dummyvfs, 0, (inode_t **) & log, 0);
ICACHE_UNLOCK();
if (rc)
return rc;
log->flag = 0;
log->dev = logdev;
log->base = 0;
log->size = 0;
/*
* serialize open/close between multiple file systems
* bound with the log;
*/
ip = (inode_t *) log;
IWRITE_LOCK(ip);
/*
* subsequent open: add file system to log active file system list
*/
#ifdef _JFS_OS2
if (log->strat2p)
#endif /* _JFS_OS2 */
{
if (rc = lmLogFileSystem(log, fsdev, 1))
goto errout10;
IWRITE_UNLOCK(ip);
*iplog = ip;
jFYI(1, ("lmLogOpen: exit(0)\n"));
return 0;
}
/* decouple log inode from dummy vfs */
vPut(ip);
/*
* first open:
*/
#ifdef _JFS_OS2
/*
* establish access to the single/shared (already open) log device
*/
logdevfp = (void *) logStrat2;
log->strat2p = logStrat2;
log->strat3p = logStrat3;
log->l2pbsize = 9; /* todo: when OS/2 have multiple external log */
#endif /* _JFS_OS2 */
/*
* initialize log:
*/
if (rc = lmLogInit(log))
goto errout20;
/*
* add file system to log active file system list
*/
if (rc = lmLogFileSystem(log, fsdev, 1))
goto errout30;
/*
* insert log device into log device list
*/
out:
#endif /* _STILL_TO_PORT */
jFYI(1, ("lmLogOpen: exit(0)\n"));
return 0;
/*
* unwind on error
*/
#ifdef _STILL_TO_PORT
errout30: /* unwind lbmLogInit() */
lbmLogShutdown(log);
errout20: /* close external log device */
#endif /* _STILL_TO_PORT */
errout10: /* free log inode */
kfree(log);
jFYI(1, ("lmLogOpen: exit(%d)\n", rc));
return rc;
}
/*
* NAME: lmLogInit()
*
* FUNCTION: log initialization at first log open.
*
* logredo() (or logformat()) should have been run previously.
* initialize the log inode from log superblock.
* set the log state in the superblock to LOGMOUNT and
* write SYNCPT log record.
*
* PARAMETER: log - log structure
*
* RETURN: 0 - if ok
* EINVAL - bad log magic number or superblock dirty
* error returned from logwait()
*
* serialization: single first open thread
*/
static int lmLogInit(log_t * log)
{
int rc = 0;
lrd_t lrd;
logsuper_t *logsuper;
lbuf_t *bpsuper;
lbuf_t *bp;
logpage_t *lp;
int lsn;
jFYI(1, ("lmLogInit: log:0x%p\n", log));
/*
* log inode is overlaid on generic inode where
* dinode have been zeroed out by iRead();
*/
/*
* initialize log i/o
*/
if ((rc = lbmLogInit(log)))
return rc;
/*
* validate log superblock
*/
if ((rc = lbmRead(log, 1, &bpsuper)))
goto errout10;
logsuper = (logsuper_t *) bpsuper->l_ldata;
if (logsuper->magic != cpu_to_le32(LOGMAGIC)) {
jERROR(1, ("*** Log Format Error ! ***\n"));
rc = EINVAL;
goto errout20;
}
/* logredo() should have been run successfully. */
if (logsuper->state != cpu_to_le32(LOGREDONE)) {
jERROR(1, ("*** Log Is Dirty ! ***\n"));
rc = EINVAL;
goto errout20;
}
/* initialize log inode from log superblock */
if (log->flag & JFS_INLINELOG) {
if (log->size != le32_to_cpu(logsuper->size)) {
rc = EINVAL;
goto errout20;
}
jFYI(0,
("lmLogInit: inline log:0x%p base:0x%Lx size:0x%x\n",
log, (unsigned long long) log->base, log->size));
} else {
log->size = le32_to_cpu(logsuper->size);
jFYI(0,
("lmLogInit: external log:0x%p base:0x%Lx size:0x%x\n",
log, (unsigned long long) log->base, log->size));
}
log->flag |= JFS_GROUPCOMMIT;
/*
log->flag |= JFS_LAZYCOMMIT;
*/
log->page = le32_to_cpu(logsuper->end) / LOGPSIZE;
log->eor = le32_to_cpu(logsuper->end) - (LOGPSIZE * log->page);
/*
* initialize for log append write mode
*/
/* establish current/end-of-log page/buffer */
if ((rc = lbmRead(log, log->page, &bp)))
goto errout20;
lp = (logpage_t *) bp->l_ldata;
jFYI(1, ("lmLogInit: lsn:0x%x page:%d eor:%d:%d\n",
le32_to_cpu(logsuper->end), log->page, log->eor,
le16_to_cpu(lp->h.eor)));
// ASSERT(log->eor == lp->h.eor);
log->bp = bp;
bp->l_pn = log->page;
bp->l_eor = log->eor;
/* initialize the group commit serialization lock */
LOGGC_LOCK_INIT(log);
/* if current page is full, move on to next page */
if (log->eor >= LOGPSIZE - LOGPTLRSIZE)
lmNextPage(log);
/* allocate/initialize the log write serialization lock */
LOG_LOCK_INIT(log);
/*
* initialize log syncpoint
*/
/*
* write the first SYNCPT record with syncpoint = 0
* (i.e., log redo up to HERE !);
* remove current page from lbm write queue at end of pageout
* (to write log superblock update), but do not release to freelist;
*/
lrd.logtid = 0;
lrd.backchain = 0;
lrd.type = cpu_to_le16(LOG_SYNCPT);
lrd.length = 0;
lrd.log.syncpt.sync = 0;
lsn = lmWriteRecord(log, NULL, &lrd, NULL);
bp = log->bp;
bp->l_ceor = bp->l_eor;
lp = (logpage_t *) bp->l_ldata;
lp->h.eor = lp->t.eor = cpu_to_le16(bp->l_eor);
lbmWrite(log, bp, lbmWRITE | lbmSYNC, 0);
if ((rc = lbmIOWait(bp, 0)))
goto errout30;
/* initialize logsync parameters */
log->logsize = (log->size - 2) << L2LOGPSIZE;
log->lsn = lsn;
log->syncpt = lsn;
log->sync = log->syncpt;
log->nextsync = LOGSYNC_DELTA(log->logsize);
init_waitqueue_head(&log->syncwait);
jFYI(1, ("lmLogInit: lsn:0x%x syncpt:0x%x sync:0x%x\n",
log->lsn, log->syncpt, log->sync));
LOGSYNC_LOCK_INIT(log);
INIT_LIST_HEAD(&log->synclist);
log->cqueue.head = log->cqueue.tail = 0;
log->count = 0;
/*
* initialize for lazy/group commit
*/
log->clsn = lsn;
/*
* update/write superblock
*/
logsuper->state = cpu_to_le32(LOGMOUNT);
log->serial = le32_to_cpu(logsuper->serial) + 1;
logsuper->serial = cpu_to_le32(log->serial);
lbmDirectWrite(log, bpsuper, lbmWRITE | lbmRELEASE | lbmSYNC);
if ((rc = lbmIOWait(bpsuper, lbmFREE)))
goto errout30;
jFYI(1, ("lmLogInit: exit(%d)\n", rc));
return 0;
/*
* unwind on error
*/
errout30: /* release log page */
lbmFree(bp);
errout20: /* release log superblock */
lbmFree(bpsuper);
errout10: /* unwind lbmLogInit() */
lbmLogShutdown(log);
jFYI(1, ("lmLogInit: exit(%d)\n", rc));
return rc;
}
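/*
 * Illustration only, not part of the original commit. An lsn in this file
 * is a byte address inside the log: lmWriteRecord() builds it as
 * (page << L2LOGPSIZE) + offset, and lmLogInit() splits the saved
 * end-of-log address back into a page number and an in-page offset. The
 * sketch below shows both directions. DEMO_L2LOGPSIZE = 12 is an
 * ASSUMPTION (L2LOGPSIZE is defined outside this file), and
 * demo_make_lsn() / demo_split_lsn() are hypothetical names.
 */

```c
#include <assert.h>

#define DEMO_L2LOGPSIZE	12	/* ASSUMED log2 of the log page size */
#define DEMO_LOGPSIZE	(1 << DEMO_L2LOGPSIZE)

/* Combine a page number and an in-page offset into a log address. */
static int demo_make_lsn(int page, int eor)
{
	assert(eor >= 0 && eor < DEMO_LOGPSIZE);
	return (page << DEMO_L2LOGPSIZE) + eor;
}

/* Split a log address back into page number and in-page offset. */
static void demo_split_lsn(int lsn, int *page, int *eor)
{
	*page = lsn / DEMO_LOGPSIZE;
	*eor = lsn - DEMO_LOGPSIZE * *page;
}
```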
/*
* NAME: lmLogClose()
*
* FUNCTION: remove file system <ipmnt> from active list of log <iplog>
* and close it on last close.
*
* PARAMETER: sb - superblock
* log - log inode
*
* RETURN: errors from subroutines
*
* serialization:
*/
int lmLogClose(struct super_block *sb, log_t * log)
{
int rc;
jFYI(1, ("lmLogClose: log:0x%p\n", log));
/*
* in-line log in host file system
*/
// inlineLog:
#ifdef _STILL_TO_PORT
if (log->flag & JFS_INLINELOG) {
rc = lmLogShutdown(log);
goto out1;
}
/*
* external log as separate logical volume
*/
externalLog:
/* serialize open/close between multiple file systems
* associated with the log
*/
IWRITE_LOCK(iplog);
/*
* remove file system from log active file system list
*/
rc = lmLogFileSystem(log, fsdev, 0);
if (iplog->i_count > 1)
goto out2;
/*
* last close: shut down log
*/
rc = ((rc1 = lmLogShutdown(log)) && rc == 0) ? rc1 : rc;
out1:
#else /* _STILL_TO_PORT */
rc = lmLogShutdown(log);
#endif /* _STILL_TO_PORT */
// out2:
jFYI(0, ("lmLogClose: exit(%d)\n", rc));
return rc;
}
/*
* NAME: lmLogShutdown()
*
* FUNCTION: log shutdown at last LogClose().
*
* write log syncpt record.
* update super block to set redone flag to 0.
*
* PARAMETER: log - log inode
*
* RETURN: 0 - success
*
* serialization: single last close thread
*/
static int lmLogShutdown(log_t * log)
{
int rc;
lrd_t lrd;
int lsn;
logsuper_t *logsuper;
lbuf_t *bpsuper;
lbuf_t *bp;
logpage_t *lp;
jFYI(1, ("lmLogShutdown: log:0x%p\n", log));
if (log->cqueue.head || !list_empty(&log->synclist)) {
/*
* If there was very recent activity, we may need to wait
* for the lazycommit thread to catch up
*/
int i;
for (i = 0; i < 800; i++) { /* Too much? */
current->state = TASK_INTERRUPTIBLE;
schedule_timeout(HZ / 4);
if ((log->cqueue.head == NULL) &&
list_empty(&log->synclist))
break;
}
}
assert(log->cqueue.head == NULL);
assert(list_empty(&log->synclist));
/*
* We need to make sure all of the "written" metapages
* actually make it to disk
*/
fsync_no_super(log->sb->s_bdev);
/*
* write the last SYNCPT record with syncpoint = 0
* (i.e., log redo up to HERE !)
*/
lrd.logtid = 0;
lrd.backchain = 0;
lrd.type = cpu_to_le16(LOG_SYNCPT);
lrd.length = 0;
lrd.log.syncpt.sync = 0;
lsn = lmWriteRecord(log, NULL, &lrd, NULL);
bp = log->bp;
lp = (logpage_t *) bp->l_ldata;
lp->h.eor = lp->t.eor = cpu_to_le16(bp->l_eor);
lbmWrite(log, log->bp, lbmWRITE | lbmRELEASE | lbmSYNC, 0);
lbmIOWait(log->bp, lbmFREE);
/*
* synchronous update log superblock
* mark log state as shutdown cleanly
* (i.e., Log does not need to be replayed).
*/
if ((rc = lbmRead(log, 1, &bpsuper)))
goto out;
logsuper = (logsuper_t *) bpsuper->l_ldata;
logsuper->state = cpu_to_le32(LOGREDONE);
logsuper->end = cpu_to_le32(lsn);
lbmDirectWrite(log, bpsuper, lbmWRITE | lbmRELEASE | lbmSYNC);
rc = lbmIOWait(bpsuper, lbmFREE);
jFYI(1, ("lmLogShutdown: lsn:0x%x page:%d eor:%d\n",
lsn, log->page, log->eor));
out:
/*
* shutdown per log i/o
*/
lbmLogShutdown(log);
if (rc) {
jFYI(1, ("lmLogShutdown: exit(%d)\n", rc));
}
return rc;
}
#ifdef _STILL_TO_PORT
/*
* NAME: lmLogFileSystem()
*
* FUNCTION: insert (<activate> = true)/remove (<activate> = false)
* file system into/from log active file system list.
*
* PARAMETER: log - pointer to logs inode.
* fsdev - dev_t of filesystem.
* serial - pointer to returned log serial number
* activate - insert/remove device from active list.
*
* RETURN: 0 - success
* errors returned by vms_iowait().
*
* serialization: IWRITE_LOCK(log inode) held on entry/exit
*/
static int lmLogFileSystem(log_t * log, dev_t fsdev, int activate)
{
int rc = 0;
int bit, word;
logsuper_t *logsuper;
lbuf_t *bpsuper;
/*
* insert/remove file system device to log active file system list.
*/
if ((rc = lbmRead(log, 1, &bpsuper)))
return rc;
logsuper = (logsuper_t *) bpsuper->l_ldata;
bit = MINOR(fsdev);
word = bit / 32;
bit -= 32 * word;
if (activate)
logsuper->active[word] |=
cpu_to_le32((LEFTMOSTONE >> bit));
else
logsuper->active[word] &=
cpu_to_le32((~(LEFTMOSTONE >> bit)));
/*
* synchronous write log superblock:
*
* write sidestream bypassing write queue:
* at file system mount, log super block is updated for
* activation of the file system before any log record
* (MOUNT record) of the file system, and at file system
* unmount, all meta data for the file system has been
* flushed before log super block is updated for deactivation
* of the file system.
*/
lbmDirectWrite(log, bpsuper, lbmWRITE | lbmRELEASE | lbmSYNC);
rc = lbmIOWait(bpsuper, lbmFREE);
return rc;
}
#endif /* _STILL_TO_PORT */
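/*
 * Illustrative sketch, not part of the original JFS source: one way the
 * device minor number could be mapped onto the active[] bit vector, in
 * the manner of lmLogFileSystem() above. sketch_active_mask() is a
 * hypothetical name, and LEFTMOSTONE is assumed to be 0x80000000 (its
 * definition is not shown here). Byte-order conversion (cpu_to_le32)
 * is left out.
 */
static inline unsigned int sketch_active_mask(unsigned int minor,
					      unsigned int *word)
{
	/* each 32-bit word of the vector covers 32 minor numbers */
	*word = minor / 32;
	/* bit 0 within a word is the most significant bit */
	return 0x80000000u >> (minor % 32);
}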
/*
* lmLogQuiesce()
*/
int lmLogQuiesce(log_t * log)
{
int rc;
rc = lmLogShutdown(log);
return rc;
}
/*
* lmLogResume()
*/
int lmLogResume(log_t * log, struct super_block *sb)
{
struct jfs_sb_info *sbi = JFS_SBI(sb);
int rc;
log->base = addressPXD(&sbi->logpxd);
log->size =
(lengthPXD(&sbi->logpxd) << sb->s_blocksize_bits) >> L2LOGPSIZE;
rc = lmLogInit(log);
return rc;
}
/*
* log buffer manager (lbm)
* ------------------------
*
* special purpose buffer manager supporting log i/o requirements.
*
* per log write queue:
* log pageout occurs in serial order via a fifo write queue,
* restricted to a single i/o in progress at any one time.
* the queue is a circular singly-linked list
* (log->wqueue points to the tail, and buffers are linked via
* the bp->l_wqnext field); it holds the log pages that are in
* pageout or waiting for pageout, in serial pageout order.
*/
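/*
 * Illustrative sketch, not part of the original JFS source: the circular
 * singly-linked write queue described above, reduced to its link
 * manipulation. struct sketch_wqbuf and the sketch_wq_*() helpers are
 * hypothetical names; the real code works on lbuf_t through log->wqueue
 * and bp->l_wqnext while holding LCACHE_LOCK(), which is omitted here.
 * Keeping only a tail pointer gives O(1) access to both ends, since the
 * head is always tail->next.
 */
struct sketch_wqbuf {
	struct sketch_wqbuf *next;	/* 0 when not on the queue */
};

/* append bp behind the current tail; bp becomes the new tail */
static inline void sketch_wq_insert_tail(struct sketch_wqbuf **tailp,
					 struct sketch_wqbuf *bp)
{
	struct sketch_wqbuf *tail = *tailp;

	if (tail == 0) {
		bp->next = bp;		/* single element points at itself */
	} else {
		bp->next = tail->next;	/* new tail links to the head */
		tail->next = bp;
	}
	*tailp = bp;
}

/* detach and return the head (oldest buffer), or 0 if the queue is empty */
static inline struct sketch_wqbuf *
sketch_wq_remove_head(struct sketch_wqbuf **tailp)
{
	struct sketch_wqbuf *tail = *tailp;
	struct sketch_wqbuf *head;

	if (tail == 0)
		return 0;
	head = tail->next;
	if (head == tail)
		*tailp = 0;		/* queue is now empty */
	else
		tail->next = head->next;
	head->next = 0;
	return head;
}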
/*
* lbmLogInit()
*
* initialize per log I/O setup at lmLogInit()
*/
static int lbmLogInit(log_t * log)
{ /* log inode */
int i;
lbuf_t *lbuf;
jFYI(1, ("lbmLogInit: log:0x%p\n", log));
/* initialize current buffer cursor */
log->bp = NULL;
/* initialize log device write queue */
log->wqueue = NULL;
/*
* Each log has its own buffer pages allocated to it. These are
* not managed by the page cache. This ensures that a transaction
* writing to the log does not block trying to allocate a page from
* the page cache (for the log). This would be bad, since page
* allocation waits on the kswapd thread that may be committing inodes
* which would cause log activity. Was that clear? I'm trying to
* avoid deadlock here.
*/
init_waitqueue_head(&log->free_wait);
log->lbuf_free = NULL;
for (i = 0; i < LOGPAGES; i++) {
lbuf = kmalloc(sizeof(lbuf_t), GFP_KERNEL);
if (lbuf == 0)
goto error;
lbuf->l_ldata = (char *) __get_free_page(GFP_KERNEL);
if (lbuf->l_ldata == 0) {
kfree(lbuf);
goto error;
}
lbuf->l_log = log;
init_waitqueue_head(&lbuf->l_ioevent);
lbuf->l_freelist = log->lbuf_free;
log->lbuf_free = lbuf;
}
return (0);
error:
lbmLogShutdown(log);
return (ENOMEM);
}
/*
* lbmLogShutdown()
*
* finalize per log I/O setup at lmLogShutdown()
*/
static void lbmLogShutdown(log_t * log)
{
lbuf_t *lbuf;
jFYI(1, ("lbmLogShutdown: log:0x%p\n", log));
lbuf = log->lbuf_free;
while (lbuf) {
lbuf_t *next = lbuf->l_freelist;
free_page((unsigned long) lbuf->l_ldata);
kfree(lbuf);
lbuf = next;
}
log->bp = NULL;
}
/*
* lbmAllocate()
*
* allocate an empty log buffer
*/
static lbuf_t *lbmAllocate(log_t * log, int pn)
{
lbuf_t *bp;
unsigned long flags;
/*
* recycle from log buffer freelist if any
*/
LCACHE_LOCK(flags);
LCACHE_SLEEP_COND(log->free_wait, (bp = log->lbuf_free), flags);
log->lbuf_free = bp->l_freelist;
LCACHE_UNLOCK(flags);
bp->l_flag = 0;
bp->l_wqnext = NULL;
bp->l_freelist = NULL;
bp->l_pn = pn;
bp->l_blkno = log->base + (pn << (L2LOGPSIZE - log->l2bsize));
bp->l_ceor = 0;
return bp;
}
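/*
 * Illustrative sketch, not part of the original JFS source: the address
 * arithmetic used by lbmAllocate() above and by lbmRead()/lbmStartIO()
 * further down, combined into one hypothetical helper. It assumes
 * 4096-byte log pages (L2LOGPSIZE = 12), 512-byte sectors (the "- 9" in
 * the bio setup), and a file system block size between 512 and 4096
 * bytes, i.e. 9 <= l2bsize <= 12.
 */
static inline long long sketch_log_page_to_sector(long long base, int pn,
						  int l2bsize)
{
	/* log page number -> file system block number within the log */
	long long blkno = base + ((long long) pn << (12 - l2bsize));

	/* file system block number -> 512-byte sector number */
	return blkno << (l2bsize - 9);
}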
/*
* lbmFree()
*
* release a log buffer to freelist
*/
static void lbmFree(lbuf_t * bp)
{
unsigned long flags;
LCACHE_LOCK(flags);
lbmfree(bp);
LCACHE_UNLOCK(flags);
}
static void lbmfree(lbuf_t * bp)
{
log_t *log = bp->l_log;
assert(bp->l_wqnext == NULL);
/*
* return the buffer to head of freelist
*/
bp->l_freelist = log->lbuf_free;
log->lbuf_free = bp;
wake_up(&log->free_wait);
return;
}
#ifdef _THIS_IS_NOT_USED
/*
* lbmRelease()
*
* remove the log buffer from log device write queue;
*/
static void lbmRelease(log_t * log, uint flag)
{
lbuf_t *bp, *tail;
unsigned long flags;
bp = log->bp;
LCACHE_LOCK(flags);
tail = log->wqueue;
/* single element queue */
if (bp == tail) {
log->wqueue = NULL;
bp->l_wqnext = NULL;
}
/* multi element queue */
else {
tail->l_wqnext = bp->l_wqnext;
bp->l_wqnext = NULL;
}
if (flag & lbmFREE)
lbmfree(bp);
LCACHE_UNLOCK(flags);
}
#endif /* _THIS_IS_NOT_USED */
/*
* NAME: lbmRedrive
*
* FUNCTION: add a log buffer to the log redrive list
*
* PARAMETER:
* bp - log buffer
*
* NOTES:
* Takes log_redrive_lock.
*/
static inline void lbmRedrive(lbuf_t *bp)
{
unsigned long flags;
spin_lock_irqsave(&log_redrive_lock, flags);
bp->l_redrive_next = log_redrive_list;
log_redrive_list = bp;
spin_unlock_irqrestore(&log_redrive_lock, flags);
wake_up_process(jfsIOtask);
}
/*
* lbmRead()
*/
static int lbmRead(log_t * log, int pn, lbuf_t ** bpp)
{
struct bio *bio;
lbuf_t *bp;
/*
* allocate a log buffer
*/
*bpp = bp = lbmAllocate(log, pn);
jFYI(1, ("lbmRead: bp:0x%p pn:0x%x\n", bp, pn));
bp->l_flag |= lbmREAD;
bio = bio_alloc(GFP_NOFS, 1);
bio->bi_sector = bp->l_blkno << (log->l2bsize - 9);
bio->bi_dev = log->dev;
bio->bi_io_vec[0].bv_page = virt_to_page(bp->l_ldata);
bio->bi_io_vec[0].bv_len = LOGPSIZE;
bio->bi_io_vec[0].bv_offset = 0;
bio->bi_vcnt = 1;
bio->bi_idx = 0;
bio->bi_size = LOGPSIZE;
bio->bi_end_io = lbmIODone;
bio->bi_private = bp;
submit_bio(READ, bio);
run_task_queue(&tq_disk);
wait_event(bp->l_ioevent, (bp->l_flag != lbmREAD));
return 0;
}
/*
* lbmWrite()
*
* buffer at head of pageout queue stays after completion of
* partial-page pageout and redriven by explicit initiation of
* pageout by caller until full-page pageout is completed and
* released.
*
* device driver i/o done redrives pageout of new buffer at
* head of pageout queue when current buffer at head of pageout
* queue is released at the completion of its full-page pageout.
*
* LOGGC_LOCK() serializes lbmWrite() by lmNextPage() and lmGroupCommit().
* LCACHE_LOCK() serializes xflag between lbmWrite() and lbmIODone()
*/
static void lbmWrite(log_t * log, lbuf_t * bp, int flag, int cant_block)
{
lbuf_t *tail;
unsigned long flags;
jFYI(1, ("lbmWrite: bp:0x%p flag:0x%x pn:0x%x\n",
bp, flag, bp->l_pn));
/* map the logical block address to physical block address */
bp->l_blkno =
log->base + (bp->l_pn << (L2LOGPSIZE - log->l2bsize));
LCACHE_LOCK(flags); /* disable+lock */
/*
* initialize buffer for device driver
*/
bp->l_flag = flag;
/*
* insert bp at tail of write queue associated with log
*
* (request is either for bp already/currently at head of queue
* or new bp to be inserted at tail)
*/
tail = log->wqueue;
/* is buffer not already on write queue ? */
if (bp->l_wqnext == NULL) {
/* insert at tail of wqueue */
if (tail == NULL) {
log->wqueue = bp;
bp->l_wqnext = bp;
} else {
log->wqueue = bp;
bp->l_wqnext = tail->l_wqnext;
tail->l_wqnext = bp;
}
tail = bp;
}
/* is buffer at head of wqueue and for write ? */
if ((bp != tail->l_wqnext) || !(flag & lbmWRITE)) {
LCACHE_UNLOCK(flags); /* unlock+enable */
return;
}
LCACHE_UNLOCK(flags); /* unlock+enable */
if (cant_block)
lbmRedrive(bp);
else if (flag & lbmSYNC)
lbmStartIO(bp);
else {
LOGGC_UNLOCK(log);
lbmStartIO(bp);
LOGGC_LOCK(log);
}
}
/*
* lbmDirectWrite()
*
* initiate pageout bypassing write queue for sidestream
* (e.g., log superblock) write;
*/
static void lbmDirectWrite(log_t * log, lbuf_t * bp, int flag)
{
jEVENT(0, ("lbmDirectWrite: bp:0x%p flag:0x%x pn:0x%x\n",
bp, flag, bp->l_pn));
/*
* initialize buffer for device driver
*/
bp->l_flag = flag | lbmDIRECT;
/* map the logical block address to physical block address */
bp->l_blkno =
log->base + (bp->l_pn << (L2LOGPSIZE - log->l2bsize));
/*
* initiate pageout of the page
*/
lbmStartIO(bp);
}
/*
* NAME: lbmStartIO()
*
* FUNCTION: Interface to DD strategy routine
*
* RETURN: none
*
* serialization: LCACHE_LOCK() is NOT held during log i/o;
*/
void lbmStartIO(lbuf_t * bp)
{
struct bio *bio;
log_t *log = bp->l_log;
jFYI(1, ("lbmStartIO\n"));
bio = bio_alloc(GFP_NOFS, 1);
bio->bi_sector = bp->l_blkno << (log->l2bsize - 9);
bio->bi_dev = log->dev;
bio->bi_io_vec[0].bv_page = virt_to_page(bp->l_ldata);
bio->bi_io_vec[0].bv_len = LOGPSIZE;
bio->bi_io_vec[0].bv_offset = 0;
bio->bi_vcnt = 1;
bio->bi_idx = 0;
bio->bi_size = LOGPSIZE;
bio->bi_end_io = lbmIODone;
bio->bi_private = bp;
submit_bio(WRITE, bio);
INCREMENT(lmStat.submitted);
run_task_queue(&tq_disk);
jFYI(1, ("lbmStartIO done\n"));
}
/*
* lbmIOWait()
*/
static int lbmIOWait(lbuf_t * bp, int flag)
{
unsigned long flags;
int rc = 0;
jFYI(1,
("lbmIOWait1: bp:0x%p flag:0x%x:0x%x\n", bp, bp->l_flag,
flag));
LCACHE_LOCK(flags); /* disable+lock */
LCACHE_SLEEP_COND(bp->l_ioevent, (bp->l_flag & lbmDONE), flags);
rc = (bp->l_flag & lbmERROR) ? EIO : 0;
if (flag & lbmFREE)
lbmfree(bp);
LCACHE_UNLOCK(flags); /* unlock+enable */
jFYI(1,
("lbmIOWait2: bp:0x%p flag:0x%x:0x%x\n", bp, bp->l_flag,
flag));
return rc;
}
/*
* lbmIODone()
*
* executed at INTIODONE level
*/
static int lbmIODone(struct bio *bio, int nr_sectors)
{
lbuf_t *bp = bio->bi_private;
lbuf_t *nextbp, *tail;
log_t *log;
unsigned long flags;
/*
* get back jfs buffer bound to the i/o buffer
*/
jEVENT(0, ("lbmIODone: bp:0x%p flag:0x%x\n", bp, bp->l_flag));
LCACHE_LOCK(flags); /* disable+lock */
bp->l_flag |= lbmDONE;
if (!test_bit(BIO_UPTODATE, &bio->bi_flags)) {
bp->l_flag |= lbmERROR;
jERROR(1, ("lbmIODone: I/O error in JFS log\n"));
}
bio_put(bio);
/*
* pagein completion
*/
if (bp->l_flag & lbmREAD) {
bp->l_flag &= ~lbmREAD;
LCACHE_UNLOCK(flags); /* unlock+enable */
/* wakeup I/O initiator */
LCACHE_WAKEUP(&bp->l_ioevent);
return 0;
}
/*
* pageout completion
*
* the bp at the head of write queue has completed pageout.
*
* if single-commit/full-page pageout, remove the current buffer
* from head of pageout queue, and redrive pageout with
* the new buffer at head of pageout queue;
* otherwise, the partial-page pageout buffer stays at
* the head of pageout queue to be redriven for pageout
* by lmGroupCommit() until full-page pageout is completed.
*/
bp->l_flag &= ~lbmWRITE;
INCREMENT(lmStat.pagedone);
/* update committed lsn */
log = bp->l_log;
log->clsn = (bp->l_pn << L2LOGPSIZE) + bp->l_ceor;
if (bp->l_flag & lbmDIRECT) {
LCACHE_WAKEUP(&bp->l_ioevent);
LCACHE_UNLOCK(flags);
return 0;
}
tail = log->wqueue;
/* single element queue */
if (bp == tail) {
/* remove head buffer of full-page pageout
* from log device write queue
*/
if (bp->l_flag & lbmRELEASE) {
log->wqueue = NULL;
bp->l_wqnext = NULL;
}
}
/* multi element queue */
else {
/* remove head buffer of full-page pageout
* from log device write queue
*/
if (bp->l_flag & lbmRELEASE) {
nextbp = tail->l_wqnext = bp->l_wqnext;
bp->l_wqnext = NULL;
/*
* redrive pageout of next page at head of write queue:
* redrive next page without any bound tblk
* (i.e., page w/o any COMMIT records), or
* first page of new group commit which has been
* queued after current page (subsequent pageout
* is performed synchronously, except page without
* any COMMITs) by lmGroupCommit() as indicated
* by lbmWRITE flag;
*/
if (nextbp->l_flag & lbmWRITE) {
/*
* We can't do the I/O at interrupt time.
* The jfsIO thread can do it
*/
lbmRedrive(nextbp);
}
}
}
/*
* synchronous pageout:
*
* buffer has not necessarily been removed from write queue
* (e.g., synchronous write of partial-page with COMMIT):
* leave buffer for i/o initiator to dispose
*/
if (bp->l_flag & lbmSYNC) {
LCACHE_UNLOCK(flags); /* unlock+enable */
/* wakeup I/O initiator */
LCACHE_WAKEUP(&bp->l_ioevent);
}
/*
* Group Commit pageout:
*/
else if (bp->l_flag & lbmGC) {
LCACHE_UNLOCK(flags);
lmPostGC(bp);
}
/*
* asynchronous pageout:
*
* buffer must have been removed from write queue:
* insert buffer at head of freelist where it can be recycled
*/
else {
assert(bp->l_flag & lbmRELEASE);
assert(bp->l_flag & lbmFREE);
lbmfree(bp);
LCACHE_UNLOCK(flags); /* unlock+enable */
}
return 0;
}
int jfsIOWait(void *arg)
{
lbuf_t *bp;
jFYI(1, ("jfsIOWait is here!\n"));
lock_kernel();
daemonize();
current->tty = NULL;
strcpy(current->comm, "jfsIO");
unlock_kernel();
jfsIOtask = current;
spin_lock_irq(&current->sigmask_lock);
siginitsetinv(&current->blocked,
sigmask(SIGHUP) | sigmask(SIGKILL) | sigmask(SIGSTOP)
| sigmask(SIGCONT));
spin_unlock_irq(&current->sigmask_lock);
complete(&jfsIOwait);
do {
spin_lock_irq(&log_redrive_lock);
while ((bp = log_redrive_list)) {
log_redrive_list = bp->l_redrive_next;
bp->l_redrive_next = NULL;
spin_unlock_irq(&log_redrive_lock);
lbmStartIO(bp);
spin_lock_irq(&log_redrive_lock);
}
spin_unlock_irq(&log_redrive_lock);
set_current_state(TASK_INTERRUPTIBLE);
schedule();
} while (!jfs_thread_stopped());
jFYI(1,("jfsIOWait being killed!\n"));
complete(&jfsIOwait);
return 0;
}
#ifdef _STILL_TO_PORT
/*
* lbmDirectIODone()
*
* iodone() for lbmDirectWrite() to bypass write queue;
* executed at INTIODONE level;
*/
static void lbmDirectIODone(iobuf_t * iobp)
{
lbuf_t *bp;
unsigned long flags;
/*
* get back jfs buffer bound to the io buffer
*/
bp = (lbuf_t *) iobp->b_jfsbp;
jEVENT(0,
("lbmDirectIODone: bp:0x%p flag:0x%x\n", bp, bp->l_flag));
LCACHE_LOCK(flags); /* disable+lock */
bp->l_flag |= lbmDONE;
if (iobp->b_flags & B_ERROR) {
bp->l_flag |= lbmERROR;
#ifdef _JFS_OS2
SysLogError();
#endif
}
/*
* pageout completion
*/
bp->l_flag &= ~lbmWRITE;
/*
* synchronous pageout:
*/
if (bp->l_flag & lbmSYNC) {
LCACHE_UNLOCK(flags); /* unlock+enable */
/* wakeup I/O initiator */
LCACHE_WAKEUP(&bp->l_ioevent);
}
/*
* asynchronous pageout:
*/
else {
assert(bp->l_flag & lbmRELEASE);
assert(bp->l_flag & lbmFREE);
lbmfree(bp);
LCACHE_UNLOCK(flags); /* unlock+enable */
}
}
#endif /* _STILL_TO_PORT */
#ifdef _STILL_TO_PORT
/*
* NAME: lmLogFormat()/jfs_logform()
*
* FUNCTION: format file system log (ref. jfs_logform()).
*
* PARAMETERS:
* log - log inode (with common mount inode base);
* logAddress - start address of log space in FS block;
* logSize - length of log space in FS block;
*
* RETURN: 0 - success
* -1 - i/o error
*/
int lmLogFormat(inode_t * ipmnt, s64 logAddress, int logSize)
{
int rc = 0;
cbuf_t *bp;
logsuper_t *logsuper;
logpage_t *lp;
int lspn; /* log sequence page number */
struct lrd *lrd_ptr;
int npbperpage, npages;
jFYI(0, ("lmLogFormat: logAddress:%Ld logSize:%d\n",
logAddress, logSize));
/* allocate a JFS buffer */
bp = rawAllocate();
/* map the logical block address to physical block address */
bp->cm_blkno = logAddress << ipmnt->i_l2bfactor;
npbperpage = LOGPSIZE >> ipmnt->i_l2pbsize;
npages = logSize / (LOGPSIZE >> ipmnt->i_l2bsize);
/*
* log space:
*
* page 0 - reserved;
* page 1 - log superblock;
* page 2 - log data page: A SYNC log record is written
* into this page at logform time;
* pages 3-N - log data page: set to empty log data pages;
*/
/*
* init log superblock: log page 1
*/
logsuper = (logsuper_t *) bp->cm_cdata;
logsuper->magic = cpu_to_le32(LOGMAGIC);
logsuper->version = cpu_to_le32(LOGVERSION);
logsuper->state = cpu_to_le32(LOGREDONE);
logsuper->flag = cpu_to_le32(ipmnt->i_mntflag); /* ? */
logsuper->size = cpu_to_le32(npages);
logsuper->bsize = cpu_to_le32(ipmnt->i_bsize);
logsuper->l2bsize = cpu_to_le32(ipmnt->i_l2bsize);
logsuper->end =
cpu_to_le32(2 * LOGPSIZE + LOGPHDRSIZE + LOGRDSIZE);
bp->cm_blkno += npbperpage;
rawWrite(ipmnt, bp, 0);
/*
* init pages 2 to npages-1 as log data pages:
*
* log sequence page number (lspn) initialization:
*
* pn:      0     1     2     3               n-1
*       +-----+-----+=====+=====+===.....===+=====+
* lspn:               N-1    0     1         N-2
*                   <--- N page circular file ---->
*
* the N (= npages-2) data pages of the log are maintained as
* a circular file for the log records;
* lspn grows by 1 monotonically as each log page is written
* to the circular file of the log;
* since the AIX DUMMY log record is dropped for this XJFS,
* setLogpage() will not reset the page number even if
* the eor is equal to LOGPHDRSIZE. for the binary search in the
* find-log-end process to still work, we have to simulate the
* log wrap situation at log format time:
* the 1st log page written has the highest lspn (N-1), and
* the succeeding log pages have lspn in ascending order,
* 0, 1, ..., (N-2).
*/
lp = (logpage_t *) bp->cm_cdata;
/*
* initialize 1st log page to be written: lspn = N - 1;
* a SYNCPT log record is written to this page
*/
lp->h.page = lp->t.page = cpu_to_le32(npages - 3);
lp->h.eor = lp->t.eor = cpu_to_le16(LOGPHDRSIZE + LOGRDSIZE);
lrd_ptr = (struct lrd *) &lp->data;
lrd_ptr->logtid = 0;
lrd_ptr->backchain = 0;
lrd_ptr->type = cpu_to_le16(LOG_SYNCPT);
lrd_ptr->length = 0;
lrd_ptr->log.syncpt.sync = 0;
bp->cm_blkno += npbperpage;
rawWrite(ipmnt, bp, 0);
/*
* initialize succeeding log pages: lspn = 0, 1, ..., (N-2)
*/
for (lspn = 0; lspn < npages - 3; lspn++) {
lp->h.page = lp->t.page = cpu_to_le32(lspn);
lp->h.eor = lp->t.eor = cpu_to_le16(LOGPHDRSIZE);
bp->cm_blkno += npbperpage;
rawWrite(ipmnt, bp, 0);
}
/*
* finalize log
*/
/* release the buffer */
rawRelease(bp);
return rc;
}
#endif /* _STILL_TO_PORT */
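/*
 * Illustrative sketch, not part of the original JFS source: the log
 * sequence page number that lmLogFormat() above assigns to each data
 * page, written as a hypothetical pure function. Pages 0 and 1 are the
 * reserved page and the log superblock, so the function is only
 * meaningful for 2 <= pn < npages. With N = npages - 2 data pages,
 * page 2 receives the highest number (N-1) to simulate a wrapped log,
 * and pages 3..npages-1 receive 0..N-2.
 */
static inline int sketch_format_lspn(int pn, int npages)
{
	if (pn == 2)
		return npages - 3;	/* N - 1: first page written */
	return pn - 3;			/* 0, 1, ..., N - 2 */
}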
#ifdef CONFIG_JFS_STATISTICS
int jfs_lmstats_read(char *buffer, char **start, off_t offset, int length,
int *eof, void *data)
{
int len = 0;
off_t begin;
len += sprintf(buffer,
"JFS Logmgr stats\n"
"================\n"
"commits = %d\n"
"writes submitted = %d\n"
"writes completed = %d\n",
lmStat.commit,
lmStat.submitted,
lmStat.pagedone);
begin = offset;
*start = buffer + begin;
len -= begin;
if (len > length)
len = length;
else
*eof = 1;
if (len < 0)
len = 0;
return len;
}
#endif /* CONFIG_JFS_STATISTICS */
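/*
 * Illustrative sketch, not part of the original JFS source: the
 * offset/length windowing done at the end of jfs_lmstats_read() above,
 * isolated as a hypothetical helper. Given the total number of bytes
 * formatted into the buffer, it returns how many bytes fall inside the
 * caller's window and sets *eof once the window reaches the end of the
 * formatted text.
 */
static inline int sketch_proc_window(int total, int offset, int length,
				     int *eof)
{
	int len = total - offset;	/* bytes remaining past offset */

	if (len > length)
		len = length;		/* more data remains for a later read */
	else
		*eof = 1;		/* this read reaches the end */
	if (len < 0)
		len = 0;		/* offset was beyond the end */
	return len;
}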
/*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/
#ifndef _H_JFS_LOGMGR
#define _H_JFS_LOGMGR
#include "jfs_filsys.h"
#include "jfs_lock.h"
/*
* log manager configuration parameters
*/
/* log page size */
#define LOGPSIZE 4096
#define L2LOGPSIZE 12
#define LOGPAGES 16 /* Log pages per mounted file system */
/*
* log logical volume
*
* a log is used to make the commit operation on journalled
* files within the same logical volume group atomic.
* a log is implemented with a logical volume.
* there is one log per logical volume group.
*
* block 0 of the log logical volume is not used (ipl etc).
* block 1 contains a log "superblock" and is used by logFormat(),
* lmLogInit(), lmLogShutdown(), and logRedo() to record status
* of the log but is not otherwise used during normal processing.
* blocks 2 - (N-1) are used to contain log records.
*
* when a volume group is varied-on-line, logRedo() must have
* been executed before the file systems (logical volumes) in
* the volume group can be mounted.
*/
/*
* log superblock (block 1 of logical volume)
*/
#define LOGSUPER_B 1
#define LOGSTART_B 2
#define LOGMAGIC 0x87654321
#define LOGVERSION 1
typedef struct {
u32 magic; /* 4: log lv identifier */
s32 version; /* 4: version number */
s32 serial; /* 4: log open/mount counter */
s32 size; /* 4: size in number of LOGPSIZE blocks */
s32 bsize; /* 4: logical block size in byte */
s32 l2bsize; /* 4: log2 of bsize */
u32 flag; /* 4: option */
u32 state; /* 4: state - see below */
s32 end; /* 4: addr of last log record set by logredo */
u32 active[8]; /* 32: active file systems bit vector */
s32 rsrvd[LOGPSIZE / 4 - 17];
} logsuper_t;
/* log flag: commit option (see jfs_filsys.h) */
/* log state */
#define LOGMOUNT 0 /* log mounted by lmLogInit() */
#define LOGREDONE 1 /* log shutdown by lmLogShutdown().
* log redo completed by logredo().
*/
#define LOGWRAP 2 /* log wrapped */
#define LOGREADERR 3 /* log read error detected in logredo() */
/*
* log logical page
*
* (this comment should be rewritten !)
* the header and trailer structures (h,t) will normally have
* the same page and eor value.
* An exception to this occurs when a complete page write is not
* accomplished on a power failure. Since the hardware may "split write"
* sectors in the page, any out of order sequence may occur during powerfail
* and needs to be recognized during log replay. The xor value is
* an "exclusive or" of all log words in the page up to eor. This
* 32 bit xor is stored with the top 16 bits in the header and the
* bottom 16 bits in the trailer. logredo can easily recognize pages
* that were not completed by reconstructing this xor and checking
* the log page.
*
* Previous versions of the operating system did not allow split
* writes and detected partially written records in logredo by
* ordering the updates to the header, trailer, and the move of data
* into the logdata area. The order: (1) data is moved (2) header
* is updated (3) trailer is updated. In logredo, when the header
* differed from the trailer, the header and trailer were reconciled
* as follows: if h.page != t.page they were set to the smaller of
* the two and h.eor and t.eor set to 8 (i.e. empty page). if (only)
* h.eor != t.eor they were set to the smaller of their two values.
*/
typedef struct {
struct { /* header */
s32 page; /* 4: log sequence page number */
s16 rsrvd; /* 2: */
s16 eor; /* 2: end-of-log offset of last record write */
} h;
s32 data[LOGPSIZE / 4 - 4]; /* log record area */
struct { /* trailer */
s32 page; /* 4: normally the same as h.page */
s16 rsrvd; /* 2: */
s16 eor; /* 2: normally the same as h.eor */
} t;
} logpage_t;
#define LOGPHDRSIZE 8 /* log page header size */
#define LOGPTLRSIZE 8 /* log page trailer size */
/*
* log record
*
* (this comment should be rewritten !)
* jfs uses only "after" log records (only a single writer is allowed
* in a page, pages are written to temporary paging space if
* they must be written to disk before commit, and i/o is
* scheduled for modified pages to their home location after
* the log records containing the after values and the commit
* record is written to the log on disk, undo discards the copy
* in main-memory.)
*
* a log record consists of a data area of variable length followed by
* a descriptor of fixed size LOGRDSIZE bytes.
* the data area is rounded up to an integral number of 4-bytes and
* must be no longer than LOGPSIZE.
* the descriptor is of size of multiple of 4-bytes and aligned on a
* 4-byte boundary.
* records are packed one after the other in the data area of log pages.
* (sometimes a DUMMY record is inserted so that at least one record ends
* on every page or the longest record is placed on at most two pages).
* the field eor in page header/trailer points to the byte following
* the last record on a page.
*/
/* log record types */
#define LOG_COMMIT 0x8000
#define LOG_SYNCPT 0x4000
#define LOG_MOUNT 0x2000
#define LOG_REDOPAGE 0x0800
#define LOG_NOREDOPAGE 0x0080
#define LOG_NOREDOINOEXT 0x0040
#define LOG_UPDATEMAP 0x0008
#define LOG_NOREDOFILE 0x0001
/* REDOPAGE/NOREDOPAGE log record data type */
#define LOG_INODE 0x0001
#define LOG_XTREE 0x0002
#define LOG_DTREE 0x0004
#define LOG_BTROOT 0x0010
#define LOG_EA 0x0020
#define LOG_ACL 0x0040
#define LOG_DATA 0x0080
#define LOG_NEW 0x0100
#define LOG_EXTEND 0x0200
#define LOG_RELOCATE 0x0400
#define LOG_DIR_XTREE 0x0800 /* Xtree is in directory inode */
/* UPDATEMAP log record descriptor type */
#define LOG_ALLOCXADLIST 0x0080
#define LOG_ALLOCPXDLIST 0x0040
#define LOG_ALLOCXAD 0x0020
#define LOG_ALLOCPXD 0x0010
#define LOG_FREEXADLIST 0x0008
#define LOG_FREEPXDLIST 0x0004
#define LOG_FREEXAD 0x0002
#define LOG_FREEPXD 0x0001
typedef struct lrd {
/*
* type independent area
*/
s32 logtid; /* 4: log transaction identifier */
s32 backchain; /* 4: ptr to prev record of same transaction */
u16 type; /* 2: record type */
s16 length; /* 2: length of data in record (in byte) */
s32 aggregate; /* 4: file system lv/aggregate */
/* (16) */
/*
* type dependent area (20)
*/
union {
/*
* COMMIT: commit
*
* transaction commit: no type-dependent information;
*/
/*
* REDOPAGE: after-image
*
* apply after-image;
*
* N.B. REDOPAGE, NOREDOPAGE, and UPDATEMAP must be same format;
*/
struct {
u32 fileset; /* 4: fileset number */
u32 inode; /* 4: inode number */
u16 type; /* 2: REDOPAGE record type */
s16 l2linesize; /* 2: log2 of line size */
pxd_t pxd; /* 8: on-disk page pxd */
} redopage; /* (20) */
/*
* NOREDOPAGE: the page is freed
*
* do not apply after-image records which precede this record
* in the log with the same page block number to this page.
*
* N.B. REDOPAGE, NOREDOPAGE, and UPDATEMAP must be same format;
*/
struct {
s32 fileset; /* 4: fileset number */
u32 inode; /* 4: inode number */
u16 type; /* 2: NOREDOPAGE record type */
s16 rsrvd; /* 2: reserved */
pxd_t pxd; /* 8: on-disk page pxd */
} noredopage; /* (20) */
/*
* UPDATEMAP: update block allocation map
*
* either in-line PXD,
* or out-of-line XADLIST;
*
* N.B. REDOPAGE, NOREDOPAGE, and UPDATEMAP must be same format;
*/
struct {
u32 fileset; /* 4: fileset number */
u32 inode; /* 4: inode number */
u16 type; /* 2: UPDATEMAP record type */
s16 nxd; /* 2: number of extents */
pxd_t pxd; /* 8: pxd */
} updatemap; /* (20) */
/*
* NOREDOINOEXT: the inode extent is freed
*
* do not apply after-image records which precede this
* record in the log with any of the 4 page block
* numbers in this inode extent.
*
* NOTE: The fileset and pxd fields MUST remain in
* the same fields in the REDOPAGE record format.
*
*/
struct {
s32 fileset; /* 4: fileset number */
s32 iagnum; /* 4: IAG number */
s32 inoext_idx; /* 4: inode extent index */
pxd_t pxd; /* 8: on-disk page pxd */
} noredoinoext; /* (20) */
/*
* SYNCPT: log sync point
*
* replay log up to syncpt address specified;
*/
struct {
s32 sync; /* 4: syncpt address (0 = here) */
} syncpt;
/*
* MOUNT: file system mount
*
* file system mount: no type-dependent information;
*/
/*
* ? FREEXTENT: free specified extent(s)
*
* free specified extent(s) from block allocation map
* N.B.: nextent should be length of data/sizeof(xad_t)
*/
struct {
s32 type; /* 4: FREEXTENT record type */
s32 nextent; /* 4: number of extents */
/* data: PXD or XAD list */
} freextent;
/*
* ? NOREDOFILE: this file is freed
*
* do not apply records which precede this record in the log
* with the same inode number.
*
* NOREDOFILE must be the first to be written at commit
* (last to be read in logredo()) - it prevents
* replay of preceding updates of all preceding generations
* of the inumber esp. the on-disk inode itself,
* but does NOT prevent
* replay of the
*/
struct {
s32 fileset; /* 4: fileset number */
u32 inode; /* 4: inode number */
} noredofile;
/*
* ? NEWPAGE:
*
* metadata type dependent
*/
struct {
s32 fileset; /* 4: fileset number */
u32 inode; /* 4: inode number */
s32 type; /* 4: NEWPAGE record type */
pxd_t pxd; /* 8: on-disk page pxd */
} newpage;
/*
* ? DUMMY: filler
*
* no type-dependent information
*/
} log;
} lrd_t; /* (36) */
#define LOGRDSIZE (sizeof(struct lrd))
/*
* line vector descriptor
*/
typedef struct {
s16 offset;
s16 length;
} lvd_t;
/*
* log logical volume
*/
typedef struct jfs_log {
struct super_block *sb; /* 4: This is used to sync metadata
* before writing syncpt. Will
* need to be a list if we share
* the log between fs's
*/
kdev_t dev; /* 4: log lv number */
struct file *devfp; /* 4: log device file */
s32 serial; /* 4: log mount serial number */
s64 base; /* @8: log extent address (inline log ) */
int size; /* 4: log size in log page (in page) */
int l2bsize; /* 4: log2 of bsize */
uint flag; /* 4: flag */
uint state; /* 4: state */
struct lbuf *lbuf_free; /* 4: free lbufs */
wait_queue_head_t free_wait; /* 4: */
/* log write */
int logtid; /* 4: log tid */
int page; /* 4: page number of eol page */
int eor; /* 4: eor of last record in eol page */
struct lbuf *bp; /* 4: current log page buffer */
struct semaphore loglock; /* 4: log write serialization lock */
/* syncpt */
int nextsync; /* 4: bytes to write before next syncpt */
int active; /* 4: */
int syncbarrier; /* 4: */
wait_queue_head_t syncwait; /* 4: */
/* commit */
uint cflag; /* 4: */
struct { /* 8: FIFO commit queue header */
struct tblock *head;
struct tblock *tail;
} cqueue;
int gcrtc; /* 4: GC_READY transaction count */
struct tblock *gclrt; /* 4: latest GC_READY transaction */
spinlock_t gclock; /* 4: group commit lock */
int logsize; /* 4: log data area size in byte */
int lsn; /* 4: end-of-log */
int clsn; /* 4: clsn */
int syncpt; /* 4: addr of last syncpt record */
int sync; /* 4: addr from last logsync() */
struct list_head synclist; /* 8: logsynclist anchor */
spinlock_t synclock; /* 4: synclist lock */
struct lbuf *wqueue; /* 4: log pageout queue */
int count; /* 4: count */
} log_t;
/*
* group commit flag
*/
/* log_t */
#define logGC_PAGEOUT 0x00000001
/* tblock_t/lbuf_t */
#define tblkGC_QUEUE 0x0001
#define tblkGC_READY 0x0002
#define tblkGC_COMMIT 0x0004
#define tblkGC_COMMITTED 0x0008
#define tblkGC_EOP 0x0010
#define tblkGC_FREE 0x0020
#define tblkGC_LEADER 0x0040
#define tblkGC_ERROR 0x0080
#define tblkGC_LAZY 0x0100 // D230860
#define tblkGC_UNLOCKED 0x0200 // D230860
/*
* log cache buffer header
*/
typedef struct lbuf {
log_t *l_log; /* 4: log associated with buffer */
/*
* data buffer base area
*/
uint l_flag; /* 4: pageout control flags */
struct lbuf *l_wqnext; /* 4: write queue link */
struct lbuf *l_freelist; /* 4: freelistlink */
int l_pn; /* 4: log page number */
int l_eor; /* 4: log record eor */
int l_ceor; /* 4: committed log record eor */
s64 l_blkno; /* 8: log page block number */
caddr_t l_ldata; /* 4: data page */
wait_queue_head_t l_ioevent; /* 4: i/o done event */
struct page *l_page; /* The page itself */
} lbuf_t;
/* Reuse l_freelist for redrive list */
#define l_redrive_next l_freelist
/*
* logsynclist block
*
* common logsyncblk prefix for jbuf_t and tblock_t
*/
typedef struct logsyncblk {
u16 xflag; /* flags */
u16 flag; /* only meaningful in tblock_t */
lid_t lid; /* lock id */
s32 lsn; /* log sequence number */
struct list_head synclist; /* log sync list link */
} logsyncblk_t;
/*
* logsynclist serialization (per log)
*/
#define LOGSYNC_LOCK_INIT(log) spin_lock_init(&(log)->synclock)
#define LOGSYNC_LOCK(log) spin_lock(&(log)->synclock)
#define LOGSYNC_UNLOCK(log) spin_unlock(&(log)->synclock)
/* compute the difference in bytes of lsn from sync point */
#define logdiff(diff, lsn, log)\
{\
diff = (lsn) - (log)->syncpt;\
if (diff < 0)\
diff += (log)->logsize;\
}
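/*
 * Illustrative sketch, not part of the original JFS source: the
 * wrap-around arithmetic of the logdiff() macro above, written as a
 * hypothetical function so the result can be checked in isolation.
 * Because the log data area is circular, an lsn that is numerically
 * smaller than the sync point lies logsize bytes ahead of it.
 */
static inline int sketch_logdiff(int lsn, int syncpt, int logsize)
{
	int diff = lsn - syncpt;

	if (diff < 0)
		diff += logsize;	/* lsn has wrapped past the end */
	return diff;
}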
extern int lmLogOpen(struct super_block *sb, log_t ** log);
extern int lmLogClose(struct super_block *sb, log_t * log);
extern int lmLogSync(log_t * log, int nosyncwait);
extern int lmLogQuiesce(log_t * log);
extern int lmLogResume(log_t * log, struct super_block *sb);
extern int lmLogFormat(struct super_block *sb, s64 logAddress, int logSize);
#endif /* _H_JFS_LOGMGR */
/*
*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*
* Module: jfs/jfs_metapage.c
*
*/
#include <linux/fs.h>
#include <linux/init.h>
#include "jfs_incore.h"
#include "jfs_filsys.h"
#include "jfs_metapage.h"
#include "jfs_txnmgr.h"
#include "jfs_debug.h"
extern struct task_struct *jfsCommitTask;
static unsigned int metapages = 1024; /* ??? Need a better number */
static unsigned int free_metapages;
static metapage_t *metapage_buf;
static unsigned long meta_order;
static metapage_t *meta_free_list = NULL;
static spinlock_t meta_lock = SPIN_LOCK_UNLOCKED;
static wait_queue_head_t meta_wait;
#ifdef CONFIG_JFS_STATISTICS
struct {
uint pagealloc; /* # of page allocations */
uint pagefree; /* # of page frees */
uint lockwait; /* # of sleeping lock_metapage() calls */
uint allocwait; /* # of sleeping alloc_metapage() calls */
} mpStat;
#endif
#define HASH_BITS 10 /* This makes hash_table 1 4K page */
#define HASH_SIZE (1 << HASH_BITS)
static metapage_t **hash_table = NULL;
static unsigned long hash_order;
static inline int metapage_locked(struct metapage *mp)
{
return test_bit(META_locked, &mp->flag);
}
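/*
 * Despite its name, this returns non-zero when the metapage was already
 * locked (the lock was NOT acquired) and zero when the caller now holds it.
 */
static inline int trylock_metapage(struct metapage *mp)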
{
return test_and_set_bit(META_locked, &mp->flag);
}
static inline void unlock_metapage(struct metapage *mp)
{
clear_bit(META_locked, &mp->flag);
wake_up(&mp->wait);
}
static void __lock_metapage(struct metapage *mp)
{
DECLARE_WAITQUEUE(wait, current);
INCREMENT(mpStat.lockwait);
add_wait_queue_exclusive(&mp->wait, &wait);
do {
set_current_state(TASK_UNINTERRUPTIBLE);
if (metapage_locked(mp)) {
spin_unlock(&meta_lock);
schedule();
spin_lock(&meta_lock);
}
} while (trylock_metapage(mp));
__set_current_state(TASK_RUNNING);
remove_wait_queue(&mp->wait, &wait);
}
/* needs meta_lock */
static inline void lock_metapage(struct metapage *mp)
{
if (trylock_metapage(mp))
__lock_metapage(mp);
}
/* We're currently re-evaluating the method we use to write metadata
* pages. Currently, we have to make sure there are no dirty buffer_heads
* hanging around after we free the metadata page, since the same
* physical disk blocks may be used in a different address space and we
* can't write old data over the good data.
*
* The best way to do this now is with block_invalidate_page. However,
* this is only available in the newer kernels and is not exported
* to modules. block_flushpage is the next best, but it too is not exported
* to modules.
*
* In a module, about the best we have is generic_buffer_fdatasync. This
* synchronously writes any dirty buffers. This is not optimal, but it will
* keep old dirty buffers from overwriting newer data.
*/
static inline void invalidate_page(metapage_t *mp)
{
#ifdef MODULE
generic_buffer_fdatasync(mp->mapping->host, mp->index, mp->index + 1);
#else
lock_page(mp->page);
block_flushpage(mp->page, 0);
UnlockPage(mp->page);
#endif
}
int __init metapage_init(void)
{
int i;
metapage_t *last = NULL;
metapage_t *mp;
/*
* Initialize wait queue
*/
init_waitqueue_head(&meta_wait);
/*
* Allocate the metapage structures
*/
for (meta_order = 0;
((PAGE_SIZE << meta_order) / sizeof(metapage_t)) < metapages;
meta_order++);
metapages = (PAGE_SIZE << meta_order) / sizeof(metapage_t);
jFYI(1, ("metapage_init: metapage size = %Zd, metapages = %d\n",
sizeof(metapage_t), metapages));
metapage_buf =
(metapage_t *) __get_free_pages(GFP_KERNEL, meta_order);
assert(metapage_buf);
memset(metapage_buf, 0, PAGE_SIZE << meta_order);
mp = metapage_buf;
for (i = 0; i < metapages; i++, mp++) {
mp->flag = 0;
set_bit(META_free, &mp->flag);
init_waitqueue_head(&mp->wait);
mp->hash_next = last;
last = mp;
}
meta_free_list = last;
free_metapages = metapages;
/*
* Now the hash list
*/
for (hash_order = 0;
((PAGE_SIZE << hash_order) / sizeof(void *)) < HASH_SIZE;
hash_order++);
hash_table =
(metapage_t **) __get_free_pages(GFP_KERNEL, hash_order);
assert(hash_table);
memset(hash_table, 0, PAGE_SIZE << hash_order);
return 0;
}
void metapage_exit(void)
{
free_pages((unsigned long) metapage_buf, meta_order);
free_pages((unsigned long) hash_table, hash_order);
metapage_buf = 0; /* This is a signal to the jfsIOwait thread */
}
/*
* Get metapage structure from freelist
*
* Caller holds meta_lock
*/
static metapage_t *alloc_metapage(int *dropped_lock)
{
metapage_t *new;
*dropped_lock = FALSE;
/*
* Reserve two metapages for the lazy commit thread. Otherwise
* we may deadlock with holders of metapages waiting for tlocks
* that the lazy thread should be freeing.
*/
if ((free_metapages < 3) && (current != jfsCommitTask)) {
INCREMENT(mpStat.allocwait);
*dropped_lock = TRUE;
__SLEEP_COND(meta_wait, (free_metapages > 2),
spin_lock(&meta_lock), spin_unlock(&meta_lock));
}
assert(meta_free_list);
new = meta_free_list;
meta_free_list = new->hash_next;
free_metapages--;
return new;
}
/*
* Put metapage on freelist (holding meta_lock)
*/
static inline void __free_metapage(metapage_t * mp)
{
mp->flag = 0;
set_bit(META_free, &mp->flag);
mp->hash_next = meta_free_list;
meta_free_list = mp;
free_metapages++;
wake_up(&meta_wait);
}
/*
* Put metapage on freelist (not holding meta_lock)
*/
static inline void free_metapage(metapage_t * mp)
{
spin_lock(&meta_lock);
__free_metapage(mp);
spin_unlock(&meta_lock);
}
/*
* Basically same hash as in pagemap.h, but using our hash table
*/
static metapage_t **meta_hash(struct address_space *mapping,
unsigned long index)
{
#define i (((unsigned long)mapping)/ \
(sizeof(struct inode) & ~(sizeof(struct inode) -1 )))
#define s(x) ((x) + ((x) >> HASH_BITS))
return hash_table + (s(i + index) & (HASH_SIZE - 1));
#undef i
#undef s
}
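/*
 * Illustration only, not part of the JFS sources: a user-space sketch of
 * the bucket computation in meta_hash(). The expression
 * sizeof(struct inode) & ~(sizeof(struct inode) - 1) keeps only the lowest
 * set bit of the structure size, and that value is the divisor applied to
 * the mapping address. The demo_* names and the inode size of 424 bytes
 * used in the note below are assumptions made for the example.
 */

```c
#include <assert.h>

#define DEMO_HASH_BITS 10
#define DEMO_HASH_SIZE (1UL << DEMO_HASH_BITS)

/* n & ~(n - 1) keeps only the lowest set bit of n. */
static unsigned long demo_lowest_set_bit(unsigned long n)
{
	return n & ~(n - 1);
}

/* Mirror of the i and s(x) helper macros, with the inode size passed in. */
static unsigned long demo_bucket(unsigned long mapping_addr,
				 unsigned long inode_size,
				 unsigned long index)
{
	unsigned long i, x;

	assert(inode_size != 0);
	i = mapping_addr / demo_lowest_set_bit(inode_size);
	x = i + index;
	/* fold the high bits down, then mask to the table size */
	return (x + (x >> DEMO_HASH_BITS)) & (DEMO_HASH_SIZE - 1);
}
```

/*
 * For an assumed inode size of 424 bytes the divisor is 8, so a mapping at
 * address 0x1000 with index 3 lands in bucket 515.
 */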
static metapage_t *search_hash(metapage_t ** hash_ptr,
struct address_space *mapping,
unsigned long index)
{
metapage_t *ptr;
for (ptr = *hash_ptr; ptr; ptr = ptr->hash_next) {
if ((ptr->mapping == mapping) && (ptr->index == index))
return ptr;
}
return NULL;
}
static void add_to_hash(metapage_t * mp, metapage_t ** hash_ptr)
{
if (*hash_ptr)
(*hash_ptr)->hash_prev = mp;
mp->hash_prev = NULL;
mp->hash_next = *hash_ptr;
*hash_ptr = mp;
list_add(&mp->inode_list, &JFS_IP(mp->mapping->host)->mp_list);
}
static void remove_from_hash(metapage_t * mp, metapage_t ** hash_ptr)
{
list_del(&mp->inode_list);
if (mp->hash_prev)
mp->hash_prev->hash_next = mp->hash_next;
else {
assert(*hash_ptr == mp);
*hash_ptr = mp->hash_next;
}
if (mp->hash_next)
mp->hash_next->hash_prev = mp->hash_prev;
}
/*
* Direct address space operations
*/
static int direct_get_block(struct inode *ip, sector_t lblock,
struct buffer_head *bh_result, int create)
{
if (create)
bh_result->b_state |= (1UL << BH_New);
map_bh(bh_result, ip->i_sb, lblock);
return 0;
}
static int direct_writepage(struct page *page)
{
return block_write_full_page(page, direct_get_block);
}
static int direct_readpage(struct file *fp, struct page *page)
{
return block_read_full_page(page, direct_get_block);
}
static int direct_prepare_write(struct file *file, struct page *page,
unsigned from, unsigned to)
{
return block_prepare_write(page, from, to, direct_get_block);
}
static int direct_bmap(struct address_space *mapping, long block)
{
return generic_block_bmap(mapping, block, direct_get_block);
}
struct address_space_operations direct_aops = {
readpage: direct_readpage,
writepage: direct_writepage,
sync_page: block_sync_page,
prepare_write: direct_prepare_write,
commit_write: generic_commit_write,
bmap: direct_bmap,
};
metapage_t *__get_metapage(struct inode *inode,
unsigned long lblock, unsigned int size,
int absolute, unsigned long new)
{
int dropped_lock;
metapage_t **hash_ptr;
int l2BlocksPerPage;
int l2bsize;
struct address_space *mapping;
metapage_t *mp;
unsigned long page_index;
unsigned long page_offset;
jFYI(1, ("__get_metapage: inode = 0x%p, lblock = 0x%lx\n",
inode, lblock));
if (absolute)
mapping = JFS_SBI(inode->i_sb)->direct_mapping;
else
mapping = inode->i_mapping;
spin_lock(&meta_lock);
hash_ptr = meta_hash(mapping, lblock);
mp = search_hash(hash_ptr, mapping, lblock);
if (mp) {
page_found:
if (test_bit(META_discard, &mp->flag)) {
assert(new); /* It's okay to reuse a discarded
* metapage if we expect it to be empty
*/
clear_bit(META_discard, &mp->flag);
}
mp->count++;
jFYI(1, ("__get_metapage: found 0x%p, in hash\n", mp));
assert(mp->logical_size == size);
lock_metapage(mp);
spin_unlock(&meta_lock);
} else {
l2bsize = inode->i_sb->s_blocksize_bits;
l2BlocksPerPage = PAGE_CACHE_SHIFT - l2bsize;
page_index = lblock >> l2BlocksPerPage;
page_offset = (lblock - (page_index << l2BlocksPerPage)) <<
l2bsize;
if ((page_offset + size) > PAGE_SIZE) {
spin_unlock(&meta_lock);
jERROR(1, ("MetaData crosses page boundary!!\n"));
return NULL;
}
mp = alloc_metapage(&dropped_lock);
if (dropped_lock) {
/* alloc_metapage blocked, we need to search the hash
* again. (The goto is ugly, maybe we'll clean this
* up in the future.)
*/
metapage_t *mp2;
mp2 = search_hash(hash_ptr, mapping, lblock);
if (mp2) {
__free_metapage(mp);
mp = mp2;
goto page_found;
}
}
mp->flag = 0;
lock_metapage(mp);
if (absolute)
set_bit(META_absolute, &mp->flag);
mp->xflag = COMMIT_PAGE;
mp->count = 1;
atomic_set(&mp->nohomeok,0);
mp->mapping = mapping;
mp->index = lblock;
mp->page = 0;
mp->logical_size = size;
add_to_hash(mp, hash_ptr);
spin_unlock(&meta_lock);
if (new) {
jFYI(1,
("__get_metapage: Calling grab_cache_page\n"));
mp->page = grab_cache_page(mapping, page_index);
if (!mp->page) {
jERROR(1, ("grab_cache_page failed!\n"));
spin_lock(&meta_lock);
remove_from_hash(mp, hash_ptr);
__free_metapage(mp);
spin_unlock(&meta_lock);
return NULL;
} else
INCREMENT(mpStat.pagealloc);
} else {
jFYI(1,
("__get_metapage: Calling read_cache_page\n"));
mp->page =
read_cache_page(mapping, page_index,
(filler_t *) mapping->a_ops->
readpage, NULL);
if (IS_ERR(mp->page)) {
jERROR(1, ("read_cache_page failed!\n"));
spin_lock(&meta_lock);
remove_from_hash(mp, hash_ptr);
__free_metapage(mp);
spin_unlock(&meta_lock);
return NULL;
} else
INCREMENT(mpStat.pagealloc);
lock_page(mp->page);
}
mp->data = (void *) (kmap(mp->page) + page_offset);
}
jFYI(1, ("__get_metapage: returning = 0x%p\n", mp));
return mp;
}
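/*
 * Illustration only, not part of the JFS sources: a user-space sketch of
 * how __get_metapage() turns a logical block number into a page index and
 * a byte offset within that page, and why it rejects metadata that would
 * cross a page boundary. A 4K page (shift of 12) is assumed, and the
 * demo_* names are hypothetical.
 */

```c
#include <assert.h>

#define DEMO_PAGE_SHIFT 12U	/* assumed 4K page */

struct demo_page_pos {
	unsigned long page_index;	/* which page holds the block */
	unsigned long page_offset;	/* byte offset of the block in the page */
};

/* Returns 0 on success, -1 if [offset, offset + size) leaves the page. */
static int demo_locate(unsigned long lblock, unsigned int l2bsize,
		       unsigned int size, struct demo_page_pos *pos)
{
	unsigned int l2_blocks_per_page;

	assert(l2bsize <= DEMO_PAGE_SHIFT);
	l2_blocks_per_page = DEMO_PAGE_SHIFT - l2bsize;

	pos->page_index = lblock >> l2_blocks_per_page;
	pos->page_offset =
	    (lblock - (pos->page_index << l2_blocks_per_page)) << l2bsize;

	if (pos->page_offset + size > (1UL << DEMO_PAGE_SHIFT))
		return -1;
	return 0;
}
```

/*
 * With 1K blocks (l2bsize = 10), block 5 is the second block of page 1,
 * so the offset is 1024; a 4096-byte request there would cross the page.
 */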
void hold_metapage(metapage_t * mp, int force)
{
spin_lock(&meta_lock);
mp->count++;
if (force) {
ASSERT (!(test_bit(META_forced, &mp->flag)));
if (trylock_metapage(mp))
set_bit(META_forced, &mp->flag);
} else
lock_metapage(mp);
spin_unlock(&meta_lock);
}
static void __write_metapage(metapage_t * mp)
{
struct inode *ip = (struct inode *) mp->mapping->host;
unsigned long page_index;
unsigned long page_offset;
int rc;
int l2bsize = ip->i_sb->s_blocksize_bits;
int l2BlocksPerPage = PAGE_CACHE_SHIFT - l2bsize;
jFYI(1, ("__write_metapage: mp = 0x%p\n", mp));
if (test_bit(META_discard, &mp->flag)) {
/*
* This metadata is no longer valid
*/
clear_bit(META_dirty, &mp->flag);
return;
}
page_index = mp->page->index;
page_offset =
(mp->index - (page_index << l2BlocksPerPage)) << l2bsize;
rc = mp->mapping->a_ops->prepare_write(NULL, mp->page, page_offset,
page_offset +
mp->logical_size);
if (rc) {
jERROR(1, ("prepare_write returned %d!\n", rc));
ClearPageUptodate(mp->page);
kunmap(mp->page);
clear_bit(META_dirty, &mp->flag);
return;
}
rc = mp->mapping->a_ops->commit_write(NULL, mp->page, page_offset,
page_offset +
mp->logical_size);
if (rc) {
jERROR(1, ("commit_write returned %d\n", rc));
}
clear_bit(META_dirty, &mp->flag);
jFYI(1, ("__write_metapage done\n"));
}
void release_metapage(metapage_t * mp)
{
log_t *log;
struct inode *ip;
jFYI(1,
("release_metapage: mp = 0x%p, flag = 0x%lx\n", mp,
mp->flag));
spin_lock(&meta_lock);
if (test_bit(META_forced, &mp->flag)) {
clear_bit(META_forced, &mp->flag);
mp->count--;
spin_unlock(&meta_lock);
return;
}
ip = (struct inode *) mp->mapping->host;
assert(mp->count);
if (--mp->count || atomic_read(&mp->nohomeok)) {
unlock_metapage(mp);
spin_unlock(&meta_lock);
} else {
remove_from_hash(mp, meta_hash(mp->mapping, mp->index));
spin_unlock(&meta_lock);
if (mp->page) {
kunmap(mp->page);
mp->data = 0;
if (test_bit(META_dirty, &mp->flag))
__write_metapage(mp);
UnlockPage(mp->page);
if (test_bit(META_sync, &mp->flag)) {
sync_metapage(mp);
clear_bit(META_sync, &mp->flag);
}
if (test_bit(META_discard, &mp->flag))
invalidate_page(mp);
page_cache_release(mp->page);
INCREMENT(mpStat.pagefree);
}
if (mp->lsn) {
/*
* Remove metapage from logsynclist.
*/
log = mp->log;
LOGSYNC_LOCK(log);
mp->log = 0;
mp->lsn = 0;
mp->clsn = 0;
log->count--;
list_del(&mp->synclist);
LOGSYNC_UNLOCK(log);
}
free_metapage(mp);
}
jFYI(1, ("release_metapage: done\n"));
}
void invalidate_metapages(struct inode *ip, unsigned long addr,
unsigned long len)
{
metapage_t **hash_ptr;
unsigned long lblock;
int l2BlocksPerPage = PAGE_CACHE_SHIFT - ip->i_sb->s_blocksize_bits;
struct address_space *mapping = ip->i_mapping;
metapage_t *mp;
#ifndef MODULE
struct page *page;
#endif
/*
* First, mark metapages to discard. They will eventually be
* released, but should not be written.
*/
for (lblock = addr; lblock < addr + len;
lblock += 1 << l2BlocksPerPage) {
hash_ptr = meta_hash(mapping, lblock);
spin_lock(&meta_lock);
mp = search_hash(hash_ptr, mapping, lblock);
if (mp) {
set_bit(META_discard, &mp->flag);
spin_unlock(&meta_lock);
/*
* If in the metapage cache, we've got the page locked
*/
#ifdef MODULE
UnlockPage(mp->page);
generic_buffer_fdatasync(mp->mapping->host, mp->index,
mp->index+1);
lock_page(mp->page);
#else
block_flushpage(mp->page, 0);
#endif
} else {
spin_unlock(&meta_lock);
#ifdef MODULE
generic_buffer_fdatasync(ip, lblock >> l2BlocksPerPage,
(lblock >> l2BlocksPerPage) + 1);
#else
page = find_lock_page(mapping,
lblock >> l2BlocksPerPage);
if (page) {
block_flushpage(page, 0);
UnlockPage(page);
}
#endif
}
}
}
void invalidate_inode_metapages(struct inode *inode)
{
struct list_head *ptr;
metapage_t *mp;
spin_lock(&meta_lock);
list_for_each(ptr, &JFS_IP(inode)->mp_list) {
mp = list_entry(ptr, metapage_t, inode_list);
clear_bit(META_dirty, &mp->flag);
set_bit(META_discard, &mp->flag);
kunmap(mp->page);
UnlockPage(mp->page);
page_cache_release(mp->page);
INCREMENT(mpStat.pagefree);
mp->data = 0;
mp->page = 0;
}
spin_unlock(&meta_lock);
truncate_inode_pages(inode->i_mapping, 0);
}
#ifdef CONFIG_JFS_STATISTICS
int jfs_mpstat_read(char *buffer, char **start, off_t offset, int length,
int *eof, void *data)
{
int len = 0;
off_t begin;
len += sprintf(buffer,
"JFS Metapage statistics\n"
"=======================\n"
"metapages in use = %d\n"
"page allocations = %d\n"
"page frees = %d\n"
"lock waits = %d\n"
"allocation waits = %d\n",
metapages - free_metapages,
mpStat.pagealloc,
mpStat.pagefree,
mpStat.lockwait,
mpStat.allocwait);
begin = offset;
*start = buffer + begin;
len -= begin;
if (len > length)
len = length;
else
*eof = 1;
if (len < 0)
len = 0;
return len;
}
#endif
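/*
 * Illustration only, not part of the JFS sources: a user-space sketch of
 * the offset/length windowing at the end of jfs_mpstat_read(). Given the
 * total number of bytes formatted, it works out how many bytes to hand
 * back for a read at a given offset and whether end-of-file was reached.
 * The name demo_window is hypothetical.
 */

```c
#include <assert.h>

/*
 * total:  bytes produced by the formatter
 * offset: position the reader asked for
 * length: size of the reader's buffer
 * eof:    set to 1 when nothing remains beyond this chunk
 */
static int demo_window(int total, long offset, int length, int *eof)
{
	int len;

	assert(length >= 0);
	len = total - (int) offset;
	if (len > length)
		len = length;	/* more data remains for a later read */
	else
		*eof = 1;
	if (len < 0)
		len = 0;	/* offset was past the end of the data */
	return len;
}
```

/*
 * For 100 bytes of output and a 40-byte buffer, a read at offset 0 returns
 * 40 bytes without eof, and a read at offset 80 returns the last 20 bytes
 * with eof set.
 */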
/*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/
#ifndef _H_JFS_METAPAGE
#define _H_JFS_METAPAGE
#include <linux/pagemap.h>
typedef struct metapage {
/* Common logsyncblk prefix (see jfs_logmgr.h) */
u16 xflag;
u16 unused;
lid_t lid;
int lsn;
struct list_head synclist;
/* End of logsyncblk prefix */
unsigned long flag; /* See Below */
unsigned long count; /* Reference count */
void *data; /* Data pointer */
/* list management stuff */
struct metapage *hash_prev;
struct metapage *hash_next; /* Also used for free list */
struct list_head inode_list; /* per-inode metapage list */
/*
* mapping & index become redundant, but we need these here to
* add the metapage to the hash before we have the real page
*/
struct address_space *mapping;
unsigned long index;
wait_queue_head_t wait;
/* implementation */
struct page *page;
unsigned long logical_size;
/* Journal management */
int clsn;
atomic_t nohomeok;
struct jfs_log *log;
} metapage_t;
/*
* Direct-access address space operations
*/
extern struct address_space_operations direct_aops;
/* metapage flag */
#define META_locked 0
#define META_absolute 1
#define META_free 2
#define META_dirty 3
#define META_sync 4
#define META_discard 5
#define META_forced 6
#define mark_metapage_dirty(mp) set_bit(META_dirty, &(mp)->flag)
/* function prototypes */
extern metapage_t *__get_metapage(struct inode *inode,
unsigned long lblock, unsigned int size,
int absolute, unsigned long new);
#define read_metapage(inode, lblock, size, absolute)\
__get_metapage(inode, lblock, size, absolute, FALSE)
#define get_metapage(inode, lblock, size, absolute)\
__get_metapage(inode, lblock, size, absolute, TRUE)
extern void release_metapage(metapage_t *);
#define flush_metapage(mp) \
{\
set_bit(META_dirty, &(mp)->flag);\
set_bit(META_sync, &(mp)->flag);\
release_metapage(mp);\
}
#define sync_metapage(mp) \
generic_buffer_fdatasync((struct inode *)(mp)->mapping->host,\
(mp)->page->index, (mp)->page->index + 1)
#define write_metapage(mp) \
{\
set_bit(META_dirty, &(mp)->flag);\
release_metapage(mp);\
}
#define discard_metapage(mp) \
{\
clear_bit(META_dirty, &(mp)->flag);\
set_bit(META_discard, &(mp)->flag);\
release_metapage(mp);\
}
extern void hold_metapage(metapage_t *, int);
/*
* This routine uses the hash to explicitly find a small number of pages
*/
extern void invalidate_metapages(struct inode *, unsigned long, unsigned long);
/*
* This one uses mp_list to invalidate all pages for an inode
*/
extern void invalidate_inode_metapages(struct inode *inode);
#endif /* _H_JFS_METAPAGE */
/*
* MODULE_NAME: jfs_mount.c
*
* COMPONENT_NAME: sysjfs
*
*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/
/*
* Change History :
*
*/
/*
* Module: jfs_mount.c
*
* note: file system in transition to aggregate/fileset:
*
* file system mount is interpreted as the mount of aggregate,
* if not already mounted, and mount of the single/only fileset in
* the aggregate;
*
* a file system/aggregate is represented by an internal inode
* (aka mount inode) initialized with aggregate superblock;
* each vfs represents a fileset, and points to its "fileset inode
* allocation map inode" (aka fileset inode):
* (an aggregate itself is structured recursively as a fileset:
* an internal vfs is constructed and points to its "fileset inode
* allocation map inode" (aka aggregate inode) where each inode
* represents a fileset inode) so that inode number is mapped to
* on-disk inode in uniform way at both aggregate and fileset level;
*
* each vnode/inode of a fileset is linked to its vfs (to facilitate
* per fileset inode operations, e.g., unmount of a fileset, etc.);
* each inode points to the mount inode (to facilitate access to
* per aggregate information, e.g., block size, etc.) as well as
* its file set inode.
*
* aggregate
* ipmnt
* mntvfs -> fileset ipimap+ -> aggregate ipbmap -> aggregate ipaimap;
* fileset vfs -> vp(1) <-> ... <-> vp(n) <->vproot;
*/
#include <linux/fs.h>
#include "jfs_incore.h"
#include "jfs_filsys.h"
#include "jfs_superblock.h"
#include "jfs_dmap.h"
#include "jfs_imap.h"
#include "jfs_metapage.h"
#include "jfs_debug.h"
/*
* forward references
*/
static int chkSuper(struct super_block *);
static int logMOUNT(struct super_block *sb);
/*
* NAME: jfs_mount(sb)
*
* FUNCTION: vfs_mount()
*
* PARAMETER: sb - super block
*
* RETURN: EBUSY - device already mounted or open for write
* EBUSY - cvrdvp already mounted;
* EBUSY - mount table full
* ENOTDIR - cvrdvp not directory on a device mount
* ENXIO - device open failure
*/
int jfs_mount(struct super_block *sb)
{
int rc = 0; /* Return code */
struct jfs_sb_info *sbi = JFS_SBI(sb);
struct inode *ipaimap = NULL;
struct inode *ipaimap2 = NULL;
struct inode *ipimap = NULL;
struct inode *ipbmap = NULL;
jFYI(1, ("\nMount JFS\n"));
/*
* read/validate superblock
* (initialize mount inode from the superblock)
*/
if ((rc = chkSuper(sb))) {
goto errout20;
}
ipaimap = diReadSpecial(sb, AGGREGATE_I);
if (ipaimap == NULL) {
jERROR(1, ("jfs_mount: Failed to read AGGREGATE_I\n"));
rc = EIO;
goto errout20;
}
sbi->ipaimap = ipaimap;
jFYI(1, ("jfs_mount: ipaimap:0x%p\n", ipaimap));
/*
* initialize aggregate inode allocation map
*/
if ((rc = diMount(ipaimap))) {
jERROR(1,
("jfs_mount: diMount(ipaimap) failed w/rc = %d\n",
rc));
goto errout21;
}
/*
* open aggregate block allocation map
*/
ipbmap = diReadSpecial(sb, BMAP_I);
if (ipbmap == NULL) {
rc = EIO;
goto errout22;
}
jFYI(1, ("jfs_mount: ipbmap:0x%p\n", ipbmap));
sbi->ipbmap = ipbmap;
/*
* initialize aggregate block allocation map
*/
if ((rc = dbMount(ipbmap))) {
jERROR(1, ("jfs_mount: dbMount failed w/rc = %d\n", rc));
goto errout22;
}
/*
* open the secondary aggregate inode allocation map
*
* This is a duplicate of the aggregate inode allocation map.
*
* hand craft a vfs in the same fashion as we did to read ipaimap.
* By adding INOSPEREXT (32) to the inode number, we are telling
* diReadSpecial that we are reading from the secondary aggregate
* inode table. This also creates a unique entry in the inode hash
* table.
*/
if ((sbi->mntflag & JFS_BAD_SAIT) == 0) {
ipaimap2 = diReadSpecial(sb, AGGREGATE_I + INOSPEREXT);
if (ipaimap2 == 0) {
jERROR(1,
("jfs_mount: Failed to read secondary AGGREGATE_I\n"));
rc = EIO;
goto errout35;
}
sbi->ipaimap2 = ipaimap2;
jFYI(1, ("jfs_mount: ipaimap2:0x%p\n", ipaimap2));
/*
* initialize secondary aggregate inode allocation map
*/
if ((rc = diMount(ipaimap2))) {
jERROR(1,
("jfs_mount: diMount(ipaimap2) failed, rc = %d\n",
rc));
goto errout35;
}
} else
/* Secondary aggregate inode table is not valid */
sbi->ipaimap2 = 0;
/*
* mount (the only/single) fileset
*/
/*
* open fileset inode allocation map (aka fileset inode)
*/
ipimap = diReadSpecial(sb, FILESYSTEM_I);
if (ipimap == NULL) {
jERROR(1, ("jfs_mount: Failed to read FILESYSTEM_I\n"));
/* open fileset secondary inode allocation map */
rc = EIO;
goto errout40;
}
jFYI(1, ("jfs_mount: ipimap:0x%p\n", ipimap));
/* map further access of per fileset inodes by the fileset inode */
sbi->ipimap = ipimap;
/* initialize fileset inode allocation map */
if ((rc = diMount(ipimap))) {
jERROR(1, ("jfs_mount: diMount failed w/rc = %d\n", rc));
goto errout41;
}
jFYI(1, ("Mount JFS Complete.\n"));
goto out;
/*
* unwind on error
*/
//errout42: /* close fileset inode allocation map */
diUnmount(ipimap, 1);
errout41: /* close fileset inode allocation map inode */
diFreeSpecial(ipimap);
errout40: /* fileset closed */
/* close secondary aggregate inode allocation map */
if (ipaimap2) {
diUnmount(ipaimap2, 1);
diFreeSpecial(ipaimap2);
}
errout35:
/* close aggregate block allocation map */
dbUnmount(ipbmap, 1);
diFreeSpecial(ipbmap);
errout22: /* close aggregate inode allocation map */
diUnmount(ipaimap, 1);
errout21: /* close aggregate inodes */
diFreeSpecial(ipaimap);
errout20: /* aggregate closed */
out:
if (rc) {
jERROR(1, ("Mount JFS Failure: %d\n", rc));
}
return rc;
}
/*
* NAME: jfs_mount_rw(sb, remount)
*
* FUNCTION: Completes read-write mount, or remounts read-only volume
* as read-write
*/
int jfs_mount_rw(struct super_block *sb, int remount)
{
struct jfs_sb_info *sbi = JFS_SBI(sb);
log_t *log;
int rc;
/*
* If we are re-mounting a previously read-only volume, we want to
* re-read the inode and block maps, since fsck.jfs may have updated
* them.
*/
if (remount) {
if (chkSuper(sb) || (sbi->state != FM_CLEAN))
return -EINVAL;
truncate_inode_pages(sbi->ipimap->i_mapping, 0);
truncate_inode_pages(sbi->ipbmap->i_mapping, 0);
diUnmount(sbi->ipimap, 1);
if ((rc = diMount(sbi->ipimap))) {
jERROR(1,("jfs_mount_rw: diMount failed!\n"));
return rc;
}
dbUnmount(sbi->ipbmap, 1);
if ((rc = dbMount(sbi->ipbmap))) {
jERROR(1,("jfs_mount_rw: dbMount failed!\n"));
return rc;
}
}
#ifdef _STILL_TO_PORT
/*
* get log device associated with the fs being mounted;
*/
if (ipmnt->i_mntflag & JFS_INLINELOG) {
vfsp->vfs_logVPB = vfsp->vfs_hVPB;
vfsp->vfs_logvpfs = vfsp->vfs_vpfsi;
} else if (vfsp->vfs_logvpfs == NULL) {
/*
* XXX: there's only one external log per system;
*/
jERROR(1, ("jfs_mount: Mount Failure! No Log Device.\n"));
goto errout30;
}
logdev = vfsp->vfs_logvpfs->vpi_unit;
ipmnt->i_logdev = logdev;
#endif /* _STILL_TO_PORT */
/*
* open/initialize log
*/
if ((rc = lmLogOpen(sb, &log)))
return rc;
JFS_SBI(sb)->log = log;
/*
* update file system superblock;
*/
if ((rc = updateSuper(sb, FM_MOUNT))) {
jERROR(1,
("jfs_mount: updateSuper failed w/rc = %d\n", rc));
lmLogClose(sb, log);
JFS_SBI(sb)->log = 0;
return rc;
}
/*
* write MOUNT log record of the file system
*/
logMOUNT(sb);
return rc;
}
/*
* chkSuper()
*
* validate the superblock of the file system to be mounted and
* get the file system parameters.
*
* returns
* 0 with fragsize set if check successful
* error code if not successful
*/
static int chkSuper(struct super_block *sb)
{
int rc = 0;
metapage_t *mp;
struct jfs_sb_info *sbi = JFS_SBI(sb);
struct jfs_superblock *j_sb;
int AIM_bytesize, AIT_bytesize;
int expected_AIM_bytesize, expected_AIT_bytesize;
s64 AIM_byte_addr, AIT_byte_addr, fsckwsp_addr;
s64 byte_addr_diff0, byte_addr_diff1;
s32 bsize;
if ((rc = readSuper(sb, &mp)))
return rc;
j_sb = (struct jfs_superblock *) (mp->data);
/*
* validate superblock
*/
/* validate fs signature */
if (strncmp(j_sb->s_magic, JFS_MAGIC, 4) ||
j_sb->s_version != cpu_to_le32(JFS_VERSION)) {
//rc = EFORMAT;
rc = EINVAL;
goto out;
}
bsize = le32_to_cpu(j_sb->s_bsize);
#ifdef _JFS_4K
if (bsize != PSIZE) {
jERROR(1, ("Currently only 4K block size supported!\n"));
rc = EINVAL;
goto out;
}
#endif /* _JFS_4K */
jFYI(1, ("superblock: flag:0x%08x state:0x%08x size:0x%Lx\n",
le32_to_cpu(j_sb->s_flag), le32_to_cpu(j_sb->s_state),
(unsigned long long) le64_to_cpu(j_sb->s_size)));
/* validate the descriptors for Secondary AIM and AIT */
if ((j_sb->s_flag & cpu_to_le32(JFS_BAD_SAIT)) !=
cpu_to_le32(JFS_BAD_SAIT)) {
expected_AIM_bytesize = 2 * PSIZE;
AIM_bytesize = lengthPXD(&(j_sb->s_aim2)) * bsize;
expected_AIT_bytesize = 4 * PSIZE;
AIT_bytesize = lengthPXD(&(j_sb->s_ait2)) * bsize;
AIM_byte_addr = addressPXD(&(j_sb->s_aim2)) * bsize;
AIT_byte_addr = addressPXD(&(j_sb->s_ait2)) * bsize;
byte_addr_diff0 = AIT_byte_addr - AIM_byte_addr;
fsckwsp_addr = addressPXD(&(j_sb->s_fsckpxd)) * bsize;
byte_addr_diff1 = fsckwsp_addr - AIT_byte_addr;
if ((AIM_bytesize != expected_AIM_bytesize) ||
(AIT_bytesize != expected_AIT_bytesize) ||
(byte_addr_diff0 != AIM_bytesize) ||
(byte_addr_diff1 <= AIT_bytesize))
j_sb->s_flag |= cpu_to_le32(JFS_BAD_SAIT);
}
/* in release 1, the flag MUST reflect inline log, and group commit */
if ((j_sb->s_flag & cpu_to_le32(JFS_INLINELOG)) !=
cpu_to_le32(JFS_INLINELOG))
j_sb->s_flag |= cpu_to_le32(JFS_INLINELOG);
if ((j_sb->s_flag & cpu_to_le32(JFS_GROUPCOMMIT)) !=
cpu_to_le32(JFS_GROUPCOMMIT))
j_sb->s_flag |= cpu_to_le32(JFS_GROUPCOMMIT);
jFYI(0, ("superblock: flag:0x%08x state:0x%08x size:0x%Lx\n",
le32_to_cpu(j_sb->s_flag), le32_to_cpu(j_sb->s_state),
(unsigned long long) le64_to_cpu(j_sb->s_size)));
/* validate fs state */
if (j_sb->s_state != cpu_to_le32(FM_CLEAN) &&
!(sb->s_flags & MS_RDONLY)) {
jERROR(1,
("jfs_mount: Mount Failure: File System Dirty.\n"));
rc = EINVAL;
goto out;
}
sbi->state = le32_to_cpu(j_sb->s_state);
sbi->mntflag = le32_to_cpu(j_sb->s_flag);
/*
* JFS always does I/O by 4K pages. Don't tell the buffer cache
* that we use anything else (leave s_blocksize alone).
*/
sbi->bsize = bsize;
sbi->l2bsize = le16_to_cpu(j_sb->s_l2bsize);
/*
* For now, ignore s_pbsize, l2bfactor. All I/O going through buffer
* cache.
*/
sbi->nbperpage = PSIZE >> sbi->l2bsize;
sbi->l2nbperpage = L2PSIZE - sbi->l2bsize;
sbi->l2niperblk = sbi->l2bsize - L2DISIZE;
if (sbi->mntflag & JFS_INLINELOG)
sbi->logpxd = j_sb->s_logpxd;
sbi->ait2 = j_sb->s_ait2;
out:
release_metapage(mp);
return rc;
}
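/*
 * Illustration only, not part of the JFS sources: a user-space sketch of
 * the per-mount geometry that chkSuper() derives from s_l2bsize. The
 * constants are assumptions for the example: a 4K page (PSIZE 4096,
 * L2PSIZE 12) and a 512-byte on-disk inode (L2DISIZE 9). The real values
 * are defined in other JFS headers that are not shown here, and the
 * demo_* names are hypothetical.
 */

```c
#include <assert.h>

#define DEMO_PSIZE	4096	/* assumed page size in bytes */
#define DEMO_L2PSIZE	12	/* log2(DEMO_PSIZE) */
#define DEMO_L2DISIZE	9	/* assumed log2 of the on-disk inode size */

struct demo_geometry {
	int nbperpage;		/* aggregate blocks per page */
	int l2nbperpage;	/* log2 of blocks per page */
	int l2niperblk;		/* log2 of inodes per aggregate block */
};

static struct demo_geometry demo_derive_geometry(int l2bsize)
{
	struct demo_geometry g;

	assert(l2bsize >= DEMO_L2DISIZE && l2bsize <= DEMO_L2PSIZE);
	g.nbperpage = DEMO_PSIZE >> l2bsize;
	g.l2nbperpage = DEMO_L2PSIZE - l2bsize;
	g.l2niperblk = l2bsize - DEMO_L2DISIZE;
	return g;
}
```

/*
 * Under these assumptions a 4K block size (l2bsize = 12) gives one block
 * per page and eight inodes per block (l2niperblk = 3).
 */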
/*
* updateSuper()
*
* synchronously update the superblock if the file system is mounted
* read-write.
*/
int updateSuper(struct super_block *sb, uint state)
{
int rc;
metapage_t *mp;
struct jfs_superblock *j_sb;
/*
* Only fsck can fix dirty state
*/
if (JFS_SBI(sb)->state == FM_DIRTY)
return 0;
if ((rc = readSuper(sb, &mp)))
return rc;
j_sb = (struct jfs_superblock *) (mp->data);
j_sb->s_state = cpu_to_le32(state);
JFS_SBI(sb)->state = state;
if (state == FM_MOUNT) {
/* record log's dev_t and mount serial number */
j_sb->s_logdev =
cpu_to_le32(kdev_t_to_nr(JFS_SBI(sb)->log->dev));
j_sb->s_logserial = cpu_to_le32(JFS_SBI(sb)->log->serial);
} else if (state == FM_CLEAN) {
/*
* If this volume is shared with OS/2, OS/2 will need to
* recalculate DASD usage, since we don't deal with it.
*/
if (j_sb->s_flag & cpu_to_le32(JFS_DASD_ENABLED))
j_sb->s_flag |= cpu_to_le32(JFS_DASD_PRIME);
}
flush_metapage(mp);
return 0;
}
/*
* readSuper()
*
* read superblock by raw sector address
*/
int readSuper(struct super_block *sb, metapage_t ** mpp)
{
/* read in primary superblock */
*mpp = read_metapage(JFS_SBI(sb)->direct_inode,
SUPER1_OFF >> sb->s_blocksize_bits, PSIZE, 1);
if (*mpp == NULL) {
/* read in secondary/replicated superblock */
*mpp = read_metapage(JFS_SBI(sb)->direct_inode,
SUPER2_OFF >> sb->s_blocksize_bits,
PSIZE, 1);
}
return *mpp ? 0 : 1;
}
/*
* logMOUNT()
*
* function: write a MOUNT log record for file system.
*
* MOUNT record keeps logredo() from processing log records
* for this file system past this point in log.
* it is harmless if mount fails.
*
* note: MOUNT record is at aggregate level, not at fileset level,
* since log records of previous mounts of a fileset
* (e.g., AFTER record of extent allocation) have to be processed
* to update block allocation map at aggregate level.
*/
static int logMOUNT(struct super_block *sb)
{
log_t *log = JFS_SBI(sb)->log;
lrd_t lrd;
lrd.logtid = 0;
lrd.backchain = 0;
lrd.type = cpu_to_le16(LOG_MOUNT);
lrd.length = 0;
lrd.aggregate = cpu_to_le32(kdev_t_to_nr(sb->s_dev));
lmLog(log, NULL, &lrd, NULL);
return 0;
}
/*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*
*/
#ifndef _H_JFS_SUPERBLOCK
#define _H_JFS_SUPERBLOCK
/*
* jfs_superblock.h
*/
/*
* make the magic number something a human could read
*/
#define JFS_MAGIC "JFS1" /* Magic word: Version 1 */
#define JFS_VERSION 1 /* Version number: Version 1 */
#define LV_NAME_SIZE 11 /* MUST BE 11 for OS/2 boot sector */
/*
* aggregate superblock
*
* The name superblock is too close to super_block, so the name has been
* changed to jfs_superblock. The utilities are still using the old name.
*/
struct jfs_superblock {
char s_magic[4]; /* 4: magic number */
u32 s_version; /* 4: version number */
s64 s_size; /* 8: aggregate size in hardware/LVM blocks;
* VFS: number of blocks
*/
s32 s_bsize; /* 4: aggregate block size in bytes;
* VFS: fragment size
*/
s16 s_l2bsize; /* 2: log2 of s_bsize */
s16 s_l2bfactor; /* 2: log2(s_bsize/hardware block size) */
s32 s_pbsize; /* 4: hardware/LVM block size in bytes */
s16 s_l2pbsize; /* 2: log2 of s_pbsize */
s16 pad; /* 2: padding necessary for alignment */
u32 s_agsize; /* 4: allocation group size in aggr. blocks */
u32 s_flag; /* 4: aggregate attributes:
* see jfs_filsys.h
*/
u32 s_state; /* 4: mount/unmount/recovery state:
* see jfs_filsys.h
*/
s32 s_compress; /* 4: > 0 if data compression */
pxd_t s_ait2; /* 8: first extent of secondary
* aggregate inode table
*/
pxd_t s_aim2; /* 8: first extent of secondary
* aggregate inode map
*/
u32 s_logdev; /* 4: device address of log */
s32 s_logserial; /* 4: log serial number at aggregate mount */
pxd_t s_logpxd; /* 8: inline log extent */
pxd_t s_fsckpxd; /* 8: inline fsck work space extent */
struct timestruc_t s_time; /* 8: time last updated */
s32 s_fsckloglen; /* 4: Number of filesystem blocks reserved for
* the fsck service log.
* N.B. These blocks are divided among the
* versions kept. This is not a per
* version size.
* N.B. These blocks are included in the
* length field of s_fsckpxd.
*/
s8 s_fscklog; /* 1: which fsck service log is most recent
* 0 => no service log data yet
* 1 => the first one
* 2 => the 2nd one
*/
char s_fpack[11]; /* 11: file system volume name
* N.B. This must be 11 bytes to
* conform with the OS/2 BootSector
* requirements
*/
/* extendfs() parameter under s_state & FM_EXTENDFS */
s64 s_xsize; /* 8: extendfs s_size */
pxd_t s_xfsckpxd; /* 8: extendfs fsckpxd */
pxd_t s_xlogpxd; /* 8: extendfs logpxd */
/* - 128 byte boundary - */
/*
* DFS VFS support (preliminary)
*/
char s_attach; /* 1: VFS: flag: set when aggregate is attached
*/
u8 rsrvd4[7]; /* 7: reserved - set to 0 */
u64 totalUsable; /* 8: VFS: total of 1K blocks which are
* available to "normal" (non-root) users.
*/
u64 minFree; /* 8: VFS: # of 1K blocks held in reserve for
* exclusive use of root. This value can be 0,
* and if it is then totalUsable will be equal
* to # of blocks in aggregate. I believe this
* means that minFree + totalUsable = # blocks.
* In that case, we don't need to store both
* totalUsable and minFree since we can compute
* one from the other. I would guess minFree
* would be the one we should store, and
* totalUsable would be the one we should
* compute. (Just a guess...)
*/
u64 realFree; /* 8: VFS: # of free 1K blocks can be used by
* "normal" users. It may be this is something
* we should compute when asked for instead of
* storing in the superblock. I don't know how
* often this information is needed.
*/
/*
* graffiti area
*/
};
extern int readSuper(struct super_block *, struct metapage **);
extern int updateSuper(struct super_block *, uint);
#endif /*_H_JFS_SUPERBLOCK */
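/*
 * Illustrative sketch only, not part of the JFS sources: one way a caller
 * might check the readable magic defined above. The helper name
 * jfs_magic_ok() is hypothetical. It assumes s_magic is a raw 4-byte field
 * with no NUL terminator, as declared in struct jfs_superblock, so it
 * compares exactly 4 bytes rather than using a string compare.
 */

```c
#include <assert.h>
#include <string.h>

#define JFS_MAGIC "JFS1"	/* same value as the define in the header */

/*
 * Hypothetical helper: returns 1 if the 4-byte magic field matches
 * JFS_MAGIC, 0 otherwise. The field is not NUL-terminated, so only
 * the first 4 bytes are compared.
 */
static int jfs_magic_ok(const char magic[4])
{
	return memcmp(magic, JFS_MAGIC, 4) == 0;
}
```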
/*
*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/
/*
* jfs_txnmgr.c: transaction manager
*
* notes:
* transaction starts with txBegin() and ends with txCommit()
* or txAbort().
*
* tlock is acquired at the time of update;
* (obviate scan at commit time for xtree and dtree)
* tlock and mp point to each other;
* (no hashlist for mp -> tlock).
*
* special cases:
* tlock on in-memory inode:
* in-place tlock in the in-memory inode itself;
* converted to page lock by iWrite() at commit time.
*
* tlock during write()/mmap() under anonymous transaction (tid = 0):
* transferred (?) to transaction at commit time.
*
* use the page itself to update allocation maps
* (obviate intermediate replication of allocation/deallocation data)
* hold on to mp+lock thru update of maps
*/
#include <linux/fs.h>
#include <linux/locks.h>
#include <linux/vmalloc.h>
#include <linux/smp_lock.h>
#include "jfs_incore.h"
#include "jfs_filsys.h"
#include "jfs_metapage.h"
#include "jfs_dinode.h"
#include "jfs_imap.h"
#include "jfs_dmap.h"
#include "jfs_superblock.h"
#include "jfs_debug.h"
/*
* transaction management structures
*/
static struct {
/* tblock */
int freetid; /* 4: index of a free tid structure */
wait_queue_head_t freewait; /* 4: eventlist of free tblock */
/* tlock */
int freelock; /* 4: index first free lock word */
wait_queue_head_t freelockwait; /* 4: eventlist of free tlock */
wait_queue_head_t lowlockwait; /* 4: eventlist of ample tlocks */
int tlocksInUse; /* 4: Number of tlocks in use */
spinlock_t LazyLock; /* 4: synchronize sync_queue & unlock_queue */
/* tblock_t *sync_queue; * 4: Transactions waiting for data sync */
tblock_t *unlock_queue; /* 4: Transactions waiting to be released */
tblock_t *unlock_tail; /* 4: Tail of unlock_queue */
struct list_head anon_list; /* inodes having anonymous txns */
struct list_head anon_list2; /* inodes having anonymous txns
that couldn't be sync'ed */
} TxAnchor;
static int nTxBlock = 512; /* number of transaction blocks */
struct tblock *TxBlock; /* transaction block table */
static int nTxLock = 2048; /* number of transaction locks */
static int TxLockLWM = 2048*.4; /* Low water mark for number of txLocks used */
static int TxLockHWM = 2048*.8; /* High water mark for number of txLocks used */
struct tlock *TxLock; /* transaction lock table */
static int TlocksLow = 0; /* Indicates low number of available tlocks */
/*
* transaction management lock
*/
static spinlock_t jfsTxnLock = SPIN_LOCK_UNLOCKED;
#define TXN_LOCK() spin_lock(&jfsTxnLock)
#define TXN_UNLOCK() spin_unlock(&jfsTxnLock)
#define LAZY_LOCK_INIT() spin_lock_init(&TxAnchor.LazyLock)
#define LAZY_LOCK(flags) spin_lock_irqsave(&TxAnchor.LazyLock, flags)
#define LAZY_UNLOCK(flags) spin_unlock_irqrestore(&TxAnchor.LazyLock, flags)
/*
* Retry logic exists outside these macros to protect from spurious wakeups.
*/
static inline void TXN_SLEEP_DROP_LOCK(wait_queue_head_t * event)
{
DECLARE_WAITQUEUE(wait, current);
add_wait_queue(event, &wait);
set_current_state(TASK_UNINTERRUPTIBLE);
TXN_UNLOCK();
schedule();
current->state = TASK_RUNNING;
remove_wait_queue(event, &wait);
}
#define TXN_SLEEP(event)\
do {\
TXN_SLEEP_DROP_LOCK(event);\
TXN_LOCK();\
} while (0)
#define TXN_WAKEUP(event) wake_up_all(event)
/*
* statistics
*/
struct {
tid_t maxtid; /* 4: biggest tid ever used */
lid_t maxlid; /* 4: biggest lid ever used */
int ntid; /* 4: # of transactions performed */
int nlid; /* 4: # of tlocks acquired */
int waitlock; /* 4: # of tlock wait */
} stattx;
/*
* external references
*/
extern int lmGroupCommit(log_t * log, tblock_t * tblk);
extern void lmSync(log_t *);
extern int readSuper(struct super_block *sb, metapage_t ** bpp);
extern int jfs_commit_inode(struct inode *, int);
extern int jfs_thread_stopped(void);
extern struct task_struct *jfsCommitTask;
extern struct completion jfsIOwait;
extern struct task_struct *jfsSyncTask;
/*
* forward references
*/
int diLog(log_t * log, tblock_t * tblk, lrd_t * lrd, tlock_t * tlck,
commit_t * cd);
int dataLog(log_t * log, tblock_t * tblk, lrd_t * lrd, tlock_t * tlck);
void dtLog(log_t * log, tblock_t * tblk, lrd_t * lrd, tlock_t * tlck);
void inlineLog(log_t * log, tblock_t * tblk, lrd_t * lrd, tlock_t * tlck);
void mapLog(log_t * log, tblock_t * tblk, lrd_t * lrd, tlock_t * tlck);
void txAbortCommit(commit_t * cd, int exval);
static void txAllocPMap(struct inode *ip, maplock_t * maplock,
tblock_t * tblk);
void txForce(tblock_t * tblk);
static int txLog(log_t * log, tblock_t * tblk, commit_t * cd);
int txMoreLock(void);
static void txUpdateMap(tblock_t * tblk);
static void txRelease(tblock_t * tblk);
void xtLog(log_t * log, tblock_t * tblk, lrd_t * lrd, tlock_t * tlck);
static void LogSyncRelease(metapage_t * mp);
/*
* transaction block/lock management
* ---------------------------------
*/
/*
* Get a transaction lock from the free list. If the number in use is
* greater than the high water mark, wake up the sync daemon. This should
* free some anonymous transaction locks. (TXN_LOCK must be held.)
*/
static lid_t txLockAlloc(void)
{
lid_t lid;
while (!(lid = TxAnchor.freelock))
TXN_SLEEP(&TxAnchor.freelockwait);
TxAnchor.freelock = TxLock[lid].next;
HIGHWATERMARK(stattx.maxlid, lid);
if ((++TxAnchor.tlocksInUse > TxLockHWM) && (TlocksLow == 0)) {
jEVENT(0,("txLockAlloc TlocksLow\n"));
TlocksLow = 1;
wake_up_process(jfsSyncTask);
}
return lid;
}
static void txLockFree(lid_t lid)
{
TxLock[lid].next = TxAnchor.freelock;
TxAnchor.freelock = lid;
TxAnchor.tlocksInUse--;
if (TlocksLow && (TxAnchor.tlocksInUse < TxLockLWM)) {
jEVENT(0,("txLockFree TlocksLow no more\n"));
TlocksLow = 0;
TXN_WAKEUP(&TxAnchor.lowlockwait);
}
TXN_WAKEUP(&TxAnchor.freelockwait);
}
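/*
 * Illustrative sketch only, not part of the JFS sources: a user-space model
 * of the index-linked free list and the high/low water mark hysteresis used
 * by txLockAlloc() and txLockFree() above. The names (NLOCK, HWM, LWM,
 * slot_alloc, slot_free, table_init) and the small table size are
 * assumptions made for the example. Index 0 is reserved as the "empty"
 * marker, as in the tlock table. Where the kernel code sleeps on an empty
 * list, this sketch simply returns 0.
 */

```c
#include <assert.h>

#define NLOCK 10		/* hypothetical table size; slot 0 is reserved */
#define HWM   8			/* hypothetical high water mark */
#define LWM   4			/* hypothetical low water mark */

static int next_free[NLOCK];	/* next_free[i]: index of the next free slot */
static int freelock;		/* head of the free list, 0 = list empty */
static int in_use;		/* number of slots currently allocated */
static int low;			/* set when in_use rises above HWM,
				 * cleared when it falls below LWM */

/* Chain slots 1..NLOCK-1 together; the last slot terminates the list. */
static void table_init(void)
{
	int k;

	for (k = 1; k < NLOCK - 1; k++)
		next_free[k] = k + 1;
	next_free[k] = 0;
	freelock = 1;
	in_use = 0;
	low = 0;
}

/* Pop a slot from the free list; returns 0 if no slot is free. */
static int slot_alloc(void)
{
	int id = freelock;

	if (id == 0)
		return 0;
	freelock = next_free[id];
	if (++in_use > HWM && !low)
		low = 1;
	return id;
}

/* Push a slot back on the head of the free list. */
static void slot_free(int id)
{
	next_free[id] = freelock;
	freelock = id;
	if (--in_use < LWM && low)
		low = 0;
}
```

/*
 * The gap between the two marks keeps the "low" state from flapping: once
 * set, it stays set until usage has dropped well below the point that
 * triggered it.
 */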
/*
* NAME: txInit()
*
* FUNCTION: initialize transaction management structures
*
* RETURN:
*
* serialization: single thread at jfs_init()
*/
int txInit(void)
{
int k, size;
/*
* initialize transaction block (tblock) table
*
* transaction id (tid) = tblock index
* tid = 0 is reserved.
*/
size = sizeof(tblock_t) * nTxBlock;
TxBlock = (tblock_t *) vmalloc(size);
if (TxBlock == NULL)
return ENOMEM;
for (k = 1; k < nTxBlock - 1; k++) {
TxBlock[k].next = k + 1;
init_waitqueue_head(&TxBlock[k].gcwait);
init_waitqueue_head(&TxBlock[k].waitor);
}
TxBlock[k].next = 0;
init_waitqueue_head(&TxBlock[k].gcwait);
init_waitqueue_head(&TxBlock[k].waitor);
TxAnchor.freetid = 1;
init_waitqueue_head(&TxAnchor.freewait);
stattx.maxtid = 1; /* statistics */
/*
* initialize transaction lock (tlock) table
*
* transaction lock id = tlock index
* tlock id = 0 is reserved.
*/
size = sizeof(tlock_t) * nTxLock;
TxLock = (tlock_t *) vmalloc(size);
if (TxLock == NULL) {
vfree(TxBlock);
return ENOMEM;
}
/* initialize tlock table */
for (k = 1; k < nTxLock - 1; k++)
TxLock[k].next = k + 1;
TxLock[k].next = 0;
init_waitqueue_head(&TxAnchor.freelockwait);
init_waitqueue_head(&TxAnchor.lowlockwait);
TxAnchor.freelock = 1;
TxAnchor.tlocksInUse = 0;
INIT_LIST_HEAD(&TxAnchor.anon_list);
INIT_LIST_HEAD(&TxAnchor.anon_list2);
stattx.maxlid = 1; /* statistics */
return 0;
}
/*
* NAME: txExit()
*
* FUNCTION: clean up when module is unloaded
*/
void txExit(void)
{
vfree(TxLock);
TxLock = 0;
vfree(TxBlock);
TxBlock = 0;
}
/*
* NAME: txBegin()
*
* FUNCTION: start a transaction.
*
* PARAMETER: sb - superblock
* flag - force for nested tx;
*
* RETURN: tid - transaction id
*
* note: COMMIT_FORCE allows a nested tx to start without waiting,
* to prevent deadlock on the logsync barrier;
*/
tid_t txBegin(struct super_block *sb, int flag)
{
tid_t t;
tblock_t *tblk;
log_t *log;
jFYI(1, ("txBegin: flag = 0x%x\n", flag));
log = (log_t *) JFS_SBI(sb)->log;
TXN_LOCK();
retry:
if (flag != COMMIT_FORCE) {
/*
* synchronize with logsync barrier
*/
if (log->syncbarrier) {
TXN_SLEEP(&log->syncwait);
goto retry;
}
}
if (flag == 0) {
/*
* Don't begin transaction if we're getting starved for tlocks
* unless COMMIT_FORCE (imap changes) or COMMIT_INODE (which
* may ultimately free tlocks)
*/
if (TlocksLow) {
TXN_SLEEP(&TxAnchor.lowlockwait);
goto retry;
}
}
/*
* allocate transaction id/block
*/
if ((t = TxAnchor.freetid) == 0) {
jFYI(1, ("txBegin: waiting for free tid\n"));
TXN_SLEEP(&TxAnchor.freewait);
goto retry;
}
tblk = tid_to_tblock(t);
if ((tblk->next == 0) && (current != jfsCommitTask)) {
/* Save one tblk for jfsCommit thread */
jFYI(1, ("txBegin: waiting for free tid\n"));
TXN_SLEEP(&TxAnchor.freewait);
goto retry;
}
TxAnchor.freetid = tblk->next;
/*
* initialize transaction
*/
/*
* We can't zero the whole thing or we screw up another thread being
* awakened after sleeping on tblk->waitor
*
* memset(tblk, 0, sizeof(tblock_t));
*/
tblk->next = tblk->last = tblk->xflag = tblk->flag = tblk->lsn = 0;
tblk->sb = sb;
++log->logtid;
tblk->logtid = log->logtid;
++log->active;
HIGHWATERMARK(stattx.maxtid, t); /* statistics */
INCREMENT(stattx.ntid); /* statistics */
TXN_UNLOCK();
jFYI(1, ("txBegin: returning tid = %d\n", t));
return t;
}
/*
* NAME: txBeginAnon()
*
* FUNCTION: start an anonymous transaction.
* Blocks if a logsync barrier is active or available tlocks
* are low, to prevent anonymous tlocks from depleting supply.
*
* PARAMETER: sb - superblock
*
* RETURN: none
*/
void txBeginAnon(struct super_block *sb)
{
log_t *log;
log = (log_t *) JFS_SBI(sb)->log;
TXN_LOCK();
retry:
/*
* synchronize with logsync barrier
*/
if (log->syncbarrier) {
TXN_SLEEP(&log->syncwait);
goto retry;
}
/*
* Don't begin transaction if we're getting starved for tlocks
*/
if (TlocksLow) {
TXN_SLEEP(&TxAnchor.lowlockwait);
goto retry;
}
TXN_UNLOCK();
}
/*
* txEnd()
*
* function: free specified transaction block.
*
* logsync barrier processing:
*
* serialization:
*/
void txEnd(tid_t tid)
{
tblock_t *tblk = tid_to_tblock(tid);
log_t *log;
jFYI(1, ("txEnd: tid = %d\n", tid));
TXN_LOCK();
/*
* wakeup transactions waiting on the page locked
* by the current transaction
*/
TXN_WAKEUP(&tblk->waitor);
log = (log_t *) JFS_SBI(tblk->sb)->log;
/*
* Lazy commit thread can't free this guy until we mark it UNLOCKED,
* otherwise, we would be left with a transaction that may have been
* reused.
*
* Lazy commit thread will turn off tblkGC_LAZY before calling this
* routine.
*/
if (tblk->flag & tblkGC_LAZY) {
jFYI(1,
("txEnd called w/lazy tid: %d, tblk = 0x%p\n",
tid, tblk));
TXN_UNLOCK();
spin_lock_irq(&log->gclock); // LOGGC_LOCK
tblk->flag |= tblkGC_UNLOCKED;
spin_unlock_irq(&log->gclock); // LOGGC_UNLOCK
return;
}
jFYI(1, ("txEnd: tid: %d, tblk = 0x%p\n", tid, tblk));
assert(tblk->next == 0);
/*
* insert tblock back on freelist
*/
tblk->next = TxAnchor.freetid;
TxAnchor.freetid = tid;
/*
* mark the tblock not active
*/
--log->active;
/*
* synchronize with logsync barrier
*/
if (log->syncbarrier && log->active == 0) {
/* forward log syncpt */
/* lmSync(log); */
jFYI(1, (" log barrier off: 0x%x\n", log->lsn));
/* enable new transactions start */
log->syncbarrier = 0;
/* wakeup all waitors for logsync barrier */
TXN_WAKEUP(&log->syncwait);
}
/*
* wakeup all waitors for a free tblock
*/
TXN_WAKEUP(&TxAnchor.freewait);
TXN_UNLOCK();
jFYI(1, ("txEnd: exiting\n"));
}
/*
* txLock()
*
* function: acquire a transaction lock on the specified <mp>
*
* parameter:
*
* return: transaction lock id
*
* serialization:
*/
tlock_t *txLock(tid_t tid, struct inode *ip, metapage_t * mp, int type)
{
struct jfs_inode_info *jfs_ip = JFS_IP(ip);
int dir_xtree = 0;
lid_t lid;
tid_t xtid;
tlock_t *tlck;
xtlock_t *xtlck;
linelock_t *linelock;
xtpage_t *p;
tblock_t *tblk;
TXN_LOCK();
if (S_ISDIR(ip->i_mode) && (type & tlckXTREE) &&
!(mp->xflag & COMMIT_PAGE)) {
/*
* Directory inode is special. It can have both an xtree tlock
* and a dtree tlock associated with it.
*/
dir_xtree = 1;
lid = jfs_ip->xtlid;
} else
lid = mp->lid;
/* is page not locked by a transaction ? */
if (lid == 0)
goto allocateLock;
jFYI(1, ("txLock: tid:%d ip:0x%p mp:0x%p lid:%d\n",
tid, ip, mp, lid));
/* is page locked by the requester transaction ? */
tlck = lid_to_tlock(lid);
if ((xtid = tlck->tid) == tid)
goto grantLock;
/*
* is page locked by anonymous transaction/lock ?
*
* (page update without transaction (i.e., file write) is
* locked under anonymous transaction tid = 0:
* anonymous tlocks maintained on anonymous tlock list of
* the inode of the page and available to all anonymous
* transactions until txCommit() time at which point
* they are transferred to the transaction tlock list of
* the committing transaction of the inode)
*/
if (xtid == 0) {
tlck->tid = tid;
tblk = tid_to_tblock(tid);
/*
* The order of the tlocks in the transaction is important
* (during truncate, child xtree pages must be freed before
* parent's tlocks change the working map).
* Take tlock off anonymous list and add to tail of
* transaction list
*
* Note: We really need to get rid of the tid & lid and
* use list_head's. This code is getting UGLY!
*/
if (jfs_ip->atlhead == lid) {
if (jfs_ip->atltail == lid) {
/* only anonymous txn.
* Remove from anon_list
*/
list_del_init(&jfs_ip->anon_inode_list);
}
jfs_ip->atlhead = tlck->next;
} else {
lid_t last;
for (last = jfs_ip->atlhead;
lid_to_tlock(last)->next != lid;
last = lid_to_tlock(last)->next) {
assert(last);
}
lid_to_tlock(last)->next = tlck->next;
if (jfs_ip->atltail == lid)
jfs_ip->atltail = last;
}
/* insert the tlock at tail of transaction tlock list */
if (tblk->next)
lid_to_tlock(tblk->last)->next = lid;
else
tblk->next = lid;
tlck->next = 0;
tblk->last = lid;
goto grantLock;
}
goto waitLock;
/*
* allocate a tlock
*/
allocateLock:
lid = txLockAlloc();
tlck = lid_to_tlock(lid);
/*
* initialize tlock
*/
tlck->tid = tid;
/* mark tlock for meta-data page */
if (mp->xflag & COMMIT_PAGE) {
tlck->flag = tlckPAGELOCK;
/* mark the page dirty and nohomeok */
mark_metapage_dirty(mp);
atomic_inc(&mp->nohomeok);
jFYI(1,
("locking mp = 0x%p, nohomeok = %d tid = %d tlck = 0x%p\n",
mp, atomic_read(&mp->nohomeok), tid, tlck));
/* if anonymous transaction, and buffer is on the group
* commit synclist, mark inode to show this. This will
* prevent the buffer from being marked nohomeok for too
* long a time.
*/
if ((tid == 0) && mp->lsn)
set_cflag(COMMIT_Synclist, ip);
}
/* mark tlock for in-memory inode */
else
tlck->flag = tlckINODELOCK;
tlck->type = 0;
/* bind the tlock and the page */
tlck->ip = ip;
tlck->mp = mp;
if (dir_xtree)
jfs_ip->xtlid = lid;
else
mp->lid = lid;
/*
* enqueue transaction lock to transaction/inode
*/
/* insert the tlock at tail of transaction tlock list */
if (tid) {
tblk = tid_to_tblock(tid);
if (tblk->next)
lid_to_tlock(tblk->last)->next = lid;
else
tblk->next = lid;
tlck->next = 0;
tblk->last = lid;
}
/* anonymous transaction:
* insert the tlock at head of inode anonymous tlock list
*/
else {
tlck->next = jfs_ip->atlhead;
jfs_ip->atlhead = lid;
if (tlck->next == 0) {
/* This inode's first anonymous transaction */
jfs_ip->atltail = lid;
list_add_tail(&jfs_ip->anon_inode_list,
&TxAnchor.anon_list);
}
}
/* initialize type dependent area for linelock */
linelock = (linelock_t *) & tlck->lock;
linelock->next = 0;
linelock->flag = tlckLINELOCK;
linelock->maxcnt = TLOCKSHORT;
linelock->index = 0;
switch (type & tlckTYPE) {
case tlckDTREE:
linelock->l2linesize = L2DTSLOTSIZE;
break;
case tlckXTREE:
linelock->l2linesize = L2XTSLOTSIZE;
xtlck = (xtlock_t *) linelock;
xtlck->header.offset = 0;
xtlck->header.length = 2;
if (type & tlckNEW) {
xtlck->lwm.offset = XTENTRYSTART;
} else {
if (mp->xflag & COMMIT_PAGE)
p = (xtpage_t *) mp->data;
else
p = &jfs_ip->i_xtroot;
xtlck->lwm.offset =
le16_to_cpu(p->header.nextindex);
}
xtlck->lwm.length = 0; /* ! */
xtlck->index = 2;
break;
case tlckINODE:
linelock->l2linesize = L2INODESLOTSIZE;
break;
case tlckDATA:
linelock->l2linesize = L2DATASLOTSIZE;
break;
default:
jERROR(1, ("UFO tlock:0x%p\n", tlck));
}
/*
* update tlock vector
*/
grantLock:
tlck->type |= type;
TXN_UNLOCK();
return tlck;
/*
* page is being locked by another transaction:
*/
waitLock:
/* Only locks on ipimap or ipaimap should reach here */
/* assert(jfs_ip->fileset == AGGREGATE_I); */
if (jfs_ip->fileset != AGGREGATE_I) {
jERROR(1, ("txLock: trying to lock locked page!\n"));
dump_mem("ip", ip, sizeof(struct inode));
dump_mem("mp", mp, sizeof(metapage_t));
dump_mem("Locker's tblk", tid_to_tblock(tid),
sizeof(tblock_t));
dump_mem("Tlock", tlck, sizeof(tlock_t));
BUG();
}
INCREMENT(stattx.waitlock); /* statistics */
release_metapage(mp);
jEVENT(0, ("txLock: in waitLock, tid = %d, xtid = %d, lid = %d\n",
tid, xtid, lid));
TXN_SLEEP_DROP_LOCK(&tid_to_tblock(xtid)->waitor);
jEVENT(0, ("txLock: awakened tid = %d, lid = %d\n", tid, lid));
return NULL;
}
/*
* NAME: txRelease()
*
* FUNCTION: Release buffers associated with transaction locks, but don't
* mark homeok yet. This allows other transactions to modify
* buffers, but won't let them go to disk until commit record
* actually gets written.
*
* PARAMETER:
* tblk -
*
* RETURN: none
*/
static void txRelease(tblock_t * tblk)
{
metapage_t *mp;
lid_t lid;
tlock_t *tlck;
TXN_LOCK();
for (lid = tblk->next; lid; lid = tlck->next) {
tlck = lid_to_tlock(lid);
if ((mp = tlck->mp) != NULL &&
(tlck->type & tlckBTROOT) == 0) {
assert(mp->xflag & COMMIT_PAGE);
mp->lid = 0;
}
}
/*
* wakeup transactions waiting on a page locked
* by the current transaction
*/
TXN_WAKEUP(&tblk->waitor);
TXN_UNLOCK();
}
/*
* NAME: txUnlock()
*
* FUNCTION: Initiates pageout of pages modified by tid in journalled
* objects and frees their lockwords.
*
* PARAMETER:
* flag -
*
* RETURN: none
*/
static void txUnlock(tblock_t * tblk, int flag)
{
tlock_t *tlck;
linelock_t *linelock;
lid_t lid, next, llid, k;
metapage_t *mp;
log_t *log;
int force;
int difft, diffp;
jFYI(1, ("txUnlock: tblk = 0x%p\n", tblk));
log = (log_t *) JFS_SBI(tblk->sb)->log;
force = flag & COMMIT_FLUSH;
if (log->syncbarrier)
force |= COMMIT_FORCE;
/*
* mark page under tlock homeok (its log has been written):
* if caller has specified FORCE (e.g., iRecycle()), or
* if syncwait for the log is set (i.e., the log sync point
* has fallen behind), or
* if syncpt is set for the page, or
* if the page is new, initiate pageout;
* otherwise, leave the page in memory.
*/
for (lid = tblk->next; lid; lid = next) {
tlck = lid_to_tlock(lid);
next = tlck->next;
jFYI(1, ("unlocking lid = %d, tlck = 0x%p\n", lid, tlck));
/* unbind page from tlock */
if ((mp = tlck->mp) != NULL &&
(tlck->type & tlckBTROOT) == 0) {
assert(mp->xflag & COMMIT_PAGE);
/* hold buffer
*
* It's possible that someone else has the metapage.
* The only things we're changing are nohomeok, which
* is handled atomically, and clsn which is protected
* by the LOGSYNC_LOCK.
*/
hold_metapage(mp, 1);
assert(atomic_read(&mp->nohomeok) > 0);
atomic_dec(&mp->nohomeok);
/* inherit younger/larger clsn */
LOGSYNC_LOCK(log);
if (mp->clsn) {
logdiff(difft, tblk->clsn, log);
logdiff(diffp, mp->clsn, log);
if (difft > diffp)
mp->clsn = tblk->clsn;
} else
mp->clsn = tblk->clsn;
LOGSYNC_UNLOCK(log);
assert(!(tlck->flag & tlckFREEPAGE));
if (tlck->flag & tlckWRITEPAGE) {
write_metapage(mp);
} else {
/* release page which has been forced */
release_metapage(mp);
}
}
/* insert tlock, and linelock(s) of the tlock if any,
* at head of freelist
*/
TXN_LOCK();
llid = ((linelock_t *) & tlck->lock)->next;
while (llid) {
linelock = (linelock_t *) lid_to_tlock(llid);
k = linelock->next;
txLockFree(llid);
llid = k;
}
txLockFree(lid);
TXN_UNLOCK();
}
tblk->next = tblk->last = 0;
/*
* remove tblock from logsynclist
* (allocation map pages inherited lsn of tblk and
* has been inserted in logsync list at txUpdateMap())
*/
if (tblk->lsn) {
LOGSYNC_LOCK(log);
log->count--;
list_del(&tblk->synclist);
LOGSYNC_UNLOCK(log);
}
}
/*
* txMaplock()
*
* function: allocate a transaction lock for freed page/entry;
* for freed page, maplock is used as xtlock/dtlock type;
*/
tlock_t *txMaplock(tid_t tid, struct inode *ip, int type)
{
struct jfs_inode_info *jfs_ip = JFS_IP(ip);
lid_t lid;
tblock_t *tblk;
tlock_t *tlck;
maplock_t *maplock;
TXN_LOCK();
/*
* allocate a tlock
*/
lid = txLockAlloc();
tlck = lid_to_tlock(lid);
/*
* initialize tlock
*/
tlck->tid = tid;
/* bind the tlock and the object */
tlck->flag = tlckINODELOCK;
tlck->ip = ip;
tlck->mp = NULL;
tlck->type = type;
/*
* enqueue transaction lock to transaction/inode
*/
/* insert the tlock at tail of transaction tlock list */
if (tid) {
tblk = tid_to_tblock(tid);
if (tblk->next)
lid_to_tlock(tblk->last)->next = lid;
else
tblk->next = lid;
tlck->next = 0;
tblk->last = lid;
}
/* anonymous transaction:
* insert the tlock at head of inode anonymous tlock list
*/
else {
tlck->next = jfs_ip->atlhead;
jfs_ip->atlhead = lid;
if (tlck->next == 0) {
/* This inode's first anonymous transaction */
jfs_ip->atltail = lid;
list_add_tail(&jfs_ip->anon_inode_list,
&TxAnchor.anon_list);
}
}
TXN_UNLOCK();
/* initialize type dependent area for maplock */
maplock = (maplock_t *) & tlck->lock;
maplock->next = 0;
maplock->maxcnt = 0;
maplock->index = 0;
return tlck;
}
/*
* txLinelock()
*
* function: allocate a transaction lock for log vector list
*/
linelock_t *txLinelock(linelock_t * tlock)
{
lid_t lid;
tlock_t *tlck;
linelock_t *linelock;
TXN_LOCK();
/* allocate a TxLock structure */
lid = txLockAlloc();
tlck = lid_to_tlock(lid);
TXN_UNLOCK();
/* initialize linelock */
linelock = (linelock_t *) tlck;
linelock->next = 0;
linelock->flag = tlckLINELOCK;
linelock->maxcnt = TLOCKLONG;
linelock->index = 0;
/* append linelock after tlock */
linelock->next = tlock->next;
tlock->next = lid;
return linelock;
}
/*
* transaction commit management
* -----------------------------
*/
/*
* NAME: txCommit()
*
* FUNCTION: commit the changes to the objects specified in
* clist. For journalled segments only the
* changes of the caller are committed, i.e., by tid.
* for non-journalled segments the data are flushed to
* disk and then the change to the disk inode and indirect
* blocks committed (so blocks newly allocated to the
* segment will be made a part of the segment atomically).
*
* all of the segments specified in clist must be in
* one file system. no more than 6 segments are needed
* to handle all unix svcs.
*
* if the i_nlink field (i.e. disk inode link count)
* is zero, and the type of inode is a regular file or
* directory, or symbolic link, the inode is truncated
* to zero length. the truncation is committed but the
* VM resources are unaffected until it is closed (see
* iput and iclose).
*
* PARAMETER:
*
* RETURN:
*
* serialization:
* on entry the inode lock on each segment is assumed
* to be held.
*
* i/o error:
*/
int txCommit(tid_t tid, /* transaction identifier */
int nip, /* number of inodes to commit */
struct inode **iplist, /* list of inodes to commit */
int flag)
{
int rc = 0, rc1 = 0;
commit_t cd;
log_t *log;
tblock_t *tblk;
lrd_t *lrd;
int lsn;
struct inode *ip;
struct jfs_inode_info *jfs_ip;
int k, n;
ino_t top;
struct super_block *sb;
jFYI(1, ("txCommit, tid = %d, flag = %d\n", tid, flag));
/* is read-only file system ? */
if (isReadOnly(iplist[0])) {
rc = EROFS;
goto TheEnd;
}
sb = cd.sb = iplist[0]->i_sb;
cd.tid = tid;
if (tid == 0)
tid = txBegin(sb, 0);
tblk = tid_to_tblock(tid);
/*
* initialize commit structure
*/
log = (log_t *) JFS_SBI(sb)->log;
cd.log = log;
/* initialize log record descriptor in commit */
lrd = &cd.lrd;
lrd->logtid = cpu_to_le32(tblk->logtid);
lrd->backchain = 0;
tblk->xflag |= flag;
if ((flag & (COMMIT_FORCE | COMMIT_SYNC)) == 0)
tblk->xflag |= COMMIT_LAZY;
/*
* prepare non-journaled objects for commit
*
* flush data pages of non-journaled file
* to prevent the file getting non-initialized disk blocks
* in case of crash.
* (new blocks - )
*/
cd.iplist = iplist;
cd.nip = nip;
/*
* acquire transaction lock on (on-disk) inodes
*
* update on-disk inode from in-memory inode
* acquiring transaction locks for AFTER records
* on the on-disk inode of file object
*
* sort the inodes array by inode number in descending order
* to prevent deadlock when acquiring transaction lock
* of on-disk inodes on multiple on-disk inode pages by
* multiple concurrent transactions
*/
for (k = 0; k < cd.nip; k++) {
top = (cd.iplist[k])->i_ino;
for (n = k + 1; n < cd.nip; n++) {
ip = cd.iplist[n];
if (ip->i_ino > top) {
top = ip->i_ino;
cd.iplist[n] = cd.iplist[k];
cd.iplist[k] = ip;
}
}
ip = cd.iplist[k];
jfs_ip = JFS_IP(ip);
/*
* BUGBUG - Should we call filemap_fdatasync here instead
* of fsync_inode_data?
* If we do, we have a deadlock condition since we may end
* up recursively calling jfs_get_block with the IWRITELOCK
* held. We may be able to do away with IWRITELOCK while
* committing transactions and use i_sem instead.
*/
if ((!S_ISDIR(ip->i_mode))
&& (tblk->xflag & COMMIT_DELETE) == 0)
fsync_inode_data_buffers(ip);
/*
* Mark inode as not dirty. It will still be on the dirty
* inode list, but we'll know not to commit it again unless
* it gets marked dirty again
*/
clear_cflag(COMMIT_Dirty, ip);
/* inherit anonymous tlock(s) of inode */
if (jfs_ip->atlhead) {
lid_to_tlock(jfs_ip->atltail)->next = tblk->next;
tblk->next = jfs_ip->atlhead;
if (!tblk->last)
tblk->last = jfs_ip->atltail;
jfs_ip->atlhead = jfs_ip->atltail = 0;
TXN_LOCK();
list_del_init(&jfs_ip->anon_inode_list);
TXN_UNLOCK();
}
/*
* acquire transaction lock on on-disk inode page
* (become first tlock of the tblk's tlock list)
*/
if (((rc = diWrite(tid, ip))))
goto out;
}
/*
* write log records from transaction locks
*
* txUpdateMap() resets XAD_NEW in XAD.
*/
if ((rc = txLog(log, tblk, &cd)))
goto TheEnd;
/*
* Ensure that inode isn't reused before
* lazy commit thread finishes processing
*/
if (tblk->xflag & (COMMIT_CREATE | COMMIT_DELETE))
atomic_inc(&tblk->ip->i_count);
if (tblk->xflag & COMMIT_DELETE) {
ip = tblk->ip;
assert((ip->i_nlink == 0) && !test_cflag(COMMIT_Nolink, ip));
set_cflag(COMMIT_Nolink, ip);
}
/*
* write COMMIT log record
*/
lrd->type = cpu_to_le16(LOG_COMMIT);
lrd->length = 0;
lsn = lmLog(log, tblk, lrd, NULL);
lmGroupCommit(log, tblk);
/*
* - transaction is now committed -
*/
/*
* force pages in careful update
* (imap addressing structure update)
*/
if (flag & COMMIT_FORCE)
txForce(tblk);
/*
* update allocation map.
*
* update inode allocation map and inode:
* free pager lock on memory object of inode if any.
* update block allocation map.
*
* txUpdateMap() resets XAD_NEW in XAD.
*/
if (tblk->xflag & COMMIT_FORCE)
txUpdateMap(tblk);
/*
* free transaction locks and pageout/free pages
*/
txRelease(tblk);
if ((tblk->flag & tblkGC_LAZY) == 0)
txUnlock(tblk, flag);
/*
* reset in-memory object state
*/
for (k = 0; k < cd.nip; k++) {
ip = cd.iplist[k];
jfs_ip = JFS_IP(ip);
/*
* reset in-memory inode state
*/
jfs_ip->bxflag = 0;
jfs_ip->blid = 0;
}
out:
if (rc != 0)
txAbortCommit(&cd, rc);
else
rc = rc1;
TheEnd:
jFYI(1, ("txCommit: tid = %d, returning %d\n", tid, rc));
return rc;
}
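/*
 * Illustrative sketch only, not part of the JFS sources: the in-place
 * descending sort that txCommit() applies to iplist, reduced to plain inode
 * numbers. The function name sort_descending() and the use of an
 * unsigned long array are assumptions made for the example. The point of
 * the ordering is that every committing transaction takes its on-disk inode
 * locks in the same global order, which avoids lock-order deadlocks between
 * concurrent commits.
 */

```c
#include <assert.h>
#include <stddef.h>

/*
 * Sort inode numbers in descending order, in place. For each position k,
 * any later element larger than the current one is swapped into k, so
 * after the inner loop ino[k] holds the largest remaining value.
 */
static void sort_descending(unsigned long *ino, size_t n)
{
	size_t k, j;

	for (k = 0; k < n; k++) {
		unsigned long top = ino[k];

		for (j = k + 1; j < n; j++) {
			if (ino[j] > top) {
				top = ino[j];
				ino[j] = ino[k];
				ino[k] = top;
			}
		}
	}
}
```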
/*
* NAME: txLog()
*
* FUNCTION: Writes AFTER log records for all lines modified
* by tid for segments specified by inodes in comdata.
* Code assumes only WRITELOCKS are recorded in lockwords.
*
* PARAMETERS:
*
* RETURN :
*/
static int txLog(log_t * log, tblock_t * tblk, commit_t * cd)
{
int rc = 0;
struct inode *ip;
lid_t lid;
tlock_t *tlck;
lrd_t *lrd = &cd->lrd;
/*
* write log record(s) for each tlock of transaction,
*/
for (lid = tblk->next; lid; lid = tlck->next) {
tlck = lid_to_tlock(lid);
tlck->flag |= tlckLOG;
/* initialize lrd common */
ip = tlck->ip;
lrd->aggregate = cpu_to_le32(kdev_t_to_nr(ip->i_dev));
lrd->log.redopage.fileset = cpu_to_le32(JFS_IP(ip)->fileset);
lrd->log.redopage.inode = cpu_to_le32(ip->i_ino);
if (tlck->mp)
hold_metapage(tlck->mp, 0);
/* write log record of page from the tlock */
switch (tlck->type & tlckTYPE) {
case tlckXTREE:
xtLog(log, tblk, lrd, tlck);
break;
case tlckDTREE:
dtLog(log, tblk, lrd, tlck);
break;
case tlckINODE:
diLog(log, tblk, lrd, tlck, cd);
break;
case tlckMAP:
mapLog(log, tblk, lrd, tlck);
break;
case tlckDATA:
dataLog(log, tblk, lrd, tlck);
break;
default:
jERROR(1, ("UFO tlock:0x%p\n", tlck));
}
if (tlck->mp)
release_metapage(tlck->mp);
}
return rc;
}
/*
* diLog()
*
* function: log inode tlock and format maplock to update bmap;
*/
int diLog(log_t * log,
tblock_t * tblk, lrd_t * lrd, tlock_t * tlck, commit_t * cd)
{
int rc = 0;
metapage_t *mp;
pxd_t *pxd;
pxdlock_t *pxdlock;
mp = tlck->mp;
/* initialize as REDOPAGE record format */
lrd->log.redopage.type = cpu_to_le16(LOG_INODE);
lrd->log.redopage.l2linesize = cpu_to_le16(L2INODESLOTSIZE);
pxd = &lrd->log.redopage.pxd;
/*
* inode after image
*/
if (tlck->type & tlckENTRY) {
/* log after-image for logredo(): */
lrd->type = cpu_to_le16(LOG_REDOPAGE);
// *pxd = mp->cm_pxd;
PXDaddress(pxd, mp->index);
PXDlength(pxd,
mp->logical_size >> tblk->sb->s_blocksize_bits);
lrd->backchain = cpu_to_le32(lmLog(log, tblk, lrd, tlck));
/* mark page as homeward bound */
tlck->flag |= tlckWRITEPAGE;
} else if (tlck->type & tlckFREE) {
/*
* free inode extent
*
* (pages of the freed inode extent have been invalidated and
* a maplock for free of the extent has been formatted at
* txLock() time);
*
* the tlock had been acquired on the inode allocation map page
* (iag) that specifies the freed extent, even though the map
* page is not itself logged, to prevent pageout of the map
* page before the log;
*/
assert(tlck->type & tlckFREE);
/* log LOG_NOREDOINOEXT of the freed inode extent for
* logredo() to start NoRedoPage filters, and to update
* imap and bmap for free of the extent;
*/
lrd->type = cpu_to_le16(LOG_NOREDOINOEXT);
/*
* For the LOG_NOREDOINOEXT record, we need
* to pass the IAG number and the inode extent
* index (within that IAG) of the extent being
* released. These have been passed to us in
* iplist[1] and iplist[2].
*/
lrd->log.noredoinoext.iagnum =
cpu_to_le32((u32) (size_t) cd->iplist[1]);
lrd->log.noredoinoext.inoext_idx =
cpu_to_le32((u32) (size_t) cd->iplist[2]);
pxdlock = (pxdlock_t *) & tlck->lock;
*pxd = pxdlock->pxd;
lrd->backchain = cpu_to_le32(lmLog(log, tblk, lrd, NULL));
/* update bmap */
tlck->flag |= tlckUPDATEMAP;
/* mark page as homeward bound */
tlck->flag |= tlckWRITEPAGE;
} else {
jERROR(2, ("diLog: UFO type tlck:0x%p\n", tlck));
}
#ifdef _JFS_WIP
/*
* alloc/free external EA extent
*
* a maplock for txUpdateMap() to update bPWMAP for alloc/free
* of the extent has been formatted at txLock() time;
*/
else {
assert(tlck->type & tlckEA);
/* log LOG_UPDATEMAP for logredo() to update bmap for
* alloc of new (and free of old) external EA extent;
*/
lrd->type = cpu_to_le16(LOG_UPDATEMAP);
pxdlock = (pxdlock_t *) & tlck->lock;
nlock = pxdlock->index;
for (i = 0; i < nlock; i++, pxdlock++) {
if (pxdlock->flag & mlckALLOCPXD)
lrd->log.updatemap.type =
cpu_to_le16(LOG_ALLOCPXD);
else
lrd->log.updatemap.type =
cpu_to_le16(LOG_FREEPXD);
lrd->log.updatemap.nxd = cpu_to_le16(1);
lrd->log.updatemap.pxd = pxdlock->pxd;
lrd->backchain =
cpu_to_le32(lmLog(log, tblk, lrd, NULL));
}
/* update bmap */
tlck->flag |= tlckUPDATEMAP;
}
#endif /* _JFS_WIP */
return rc;
}
/*
* dataLog()
*
* function: log data tlock
*/
int dataLog(log_t * log, tblock_t * tblk, lrd_t * lrd, tlock_t * tlck)
{
metapage_t *mp;
pxd_t *pxd;
int rc;
s64 xaddr;
int xflag;
s32 xlen;
mp = tlck->mp;
/* initialize as REDOPAGE record format */
lrd->log.redopage.type = cpu_to_le16(LOG_DATA);
lrd->log.redopage.l2linesize = cpu_to_le16(L2DATASLOTSIZE);
pxd = &lrd->log.redopage.pxd;
/* log after-image for logredo(): */
lrd->type = cpu_to_le16(LOG_REDOPAGE);
if (JFS_IP(tlck->ip)->next_index < MAX_INLINE_DIRTABLE_ENTRY) {
/*
* The table has been truncated; we must have deleted
* the last entry, so don't bother logging this.
*/
mp->lid = 0;
atomic_dec(&mp->nohomeok);
discard_metapage(mp);
tlck->mp = 0;
return 0;
}
rc = xtLookup(tlck->ip, mp->index, 1, &xflag, &xaddr, &xlen, 1);
if (rc || (xlen == 0)) {
jERROR(1, ("dataLog: can't find physical address\n"));
return 0;
}
PXDaddress(pxd, xaddr);
PXDlength(pxd, mp->logical_size >> tblk->sb->s_blocksize_bits);
lrd->backchain = cpu_to_le32(lmLog(log, tblk, lrd, tlck));
/* mark page as homeward bound */
tlck->flag |= tlckWRITEPAGE;
return 0;
}
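/*
 * Illustrative sketch only, not part of the JFS sources: the log routines
 * above size each pxd with "mp->logical_size >> tblk->sb->s_blocksize_bits",
 * i.e. a byte count converted to file system blocks by a right shift.
 * The shift equals a division by the block size when the block size is a
 * power of two. example_bytes_to_blocks() is a hypothetical name, and the
 * block sizes used with it are assumptions chosen for the example.
 */
```c
/* Convert a byte count to a block count for a block size of
 * (1 << blocksize_bits) bytes; a partial trailing block is truncated.
 */
static unsigned int example_bytes_to_blocks(unsigned int nbytes,
					    unsigned int blocksize_bits)
{
	return nbytes >> blocksize_bits;
}
```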
/*
* dtLog()
*
* function: log dtree tlock and format maplock to update bmap;
*/
void dtLog(log_t * log, tblock_t * tblk, lrd_t * lrd, tlock_t * tlck)
{
struct inode *ip;
metapage_t *mp;
pxdlock_t *pxdlock;
pxd_t *pxd;
ip = tlck->ip;
mp = tlck->mp;
/* initialize as REDOPAGE/NOREDOPAGE record format */
lrd->log.redopage.type = cpu_to_le16(LOG_DTREE);
lrd->log.redopage.l2linesize = cpu_to_le16(L2DTSLOTSIZE);
pxd = &lrd->log.redopage.pxd;
if (tlck->type & tlckBTROOT)
lrd->log.redopage.type |= cpu_to_le16(LOG_BTROOT);
/*
* page extension via relocation: entry insertion;
* page extension in-place: entry insertion;
* new right page from page split, reinitialized in-line
* root from root page split: entry insertion;
*/
if (tlck->type & (tlckNEW | tlckEXTEND)) {
/* log after-image of the new page for logredo():
* mark log (LOG_NEW) for logredo() to initialize
* freelist and update bmap for alloc of the new page;
*/
lrd->type = cpu_to_le16(LOG_REDOPAGE);
if (tlck->type & tlckEXTEND)
lrd->log.redopage.type |= cpu_to_le16(LOG_EXTEND);
else
lrd->log.redopage.type |= cpu_to_le16(LOG_NEW);
// *pxd = mp->cm_pxd;
PXDaddress(pxd, mp->index);
PXDlength(pxd,
mp->logical_size >> tblk->sb->s_blocksize_bits);
lrd->backchain = cpu_to_le32(lmLog(log, tblk, lrd, tlck));
/* format a maplock for txUpdateMap() to update bPMAP for
* alloc of the new page;
*/
if (tlck->type & tlckBTROOT)
return;
tlck->flag |= tlckUPDATEMAP;
pxdlock = (pxdlock_t *) & tlck->lock;
pxdlock->flag = mlckALLOCPXD;
pxdlock->pxd = *pxd;
pxdlock->index = 1;
/* mark page as homeward bound */
tlck->flag |= tlckWRITEPAGE;
return;
}
/*
* entry insertion/deletion,
* sibling page link update (old right page before split);
*/
if (tlck->type & (tlckENTRY | tlckRELINK)) {
/* log after-image for logredo(): */
lrd->type = cpu_to_le16(LOG_REDOPAGE);
PXDaddress(pxd, mp->index);
PXDlength(pxd,
mp->logical_size >> tblk->sb->s_blocksize_bits);
lrd->backchain = cpu_to_le32(lmLog(log, tblk, lrd, tlck));
/* mark page as homeward bound */
tlck->flag |= tlckWRITEPAGE;
return;
}
/*
* page deletion: page has been invalidated
* page relocation: source extent
*
* (a maplock for free of the page has been formatted
* at txLock() time);
*/
if (tlck->type & (tlckFREE | tlckRELOCATE)) {
/* log LOG_NOREDOPAGE of the deleted page for logredo()
* to start NoRedoPage filter and to update bmap for free
* of the deleted page
*/
lrd->type = cpu_to_le16(LOG_NOREDOPAGE);
pxdlock = (pxdlock_t *) & tlck->lock;
*pxd = pxdlock->pxd;
lrd->backchain = cpu_to_le32(lmLog(log, tblk, lrd, NULL));
/* a maplock for txUpdateMap() for free of the page
* has been formatted at txLock() time;
*/
tlck->flag |= tlckUPDATEMAP;
}
return;
}
/*
* xtLog()
*
* function: log xtree tlock and format maplock to update bmap;
*/
void xtLog(log_t * log, tblock_t * tblk, lrd_t * lrd, tlock_t * tlck)
{
struct inode *ip;
metapage_t *mp;
xtpage_t *p;
xtlock_t *xtlck;
maplock_t *maplock;
xdlistlock_t *xadlock;
pxdlock_t *pxdlock;
pxd_t *pxd;
int next, lwm, hwm;
ip = tlck->ip;
mp = tlck->mp;
/* initialize as REDOPAGE/NOREDOPAGE record format */
lrd->log.redopage.type = cpu_to_le16(LOG_XTREE);
lrd->log.redopage.l2linesize = cpu_to_le16(L2XTSLOTSIZE);
pxd = &lrd->log.redopage.pxd;
if (tlck->type & tlckBTROOT) {
lrd->log.redopage.type |= cpu_to_le16(LOG_BTROOT);
p = &JFS_IP(ip)->i_xtroot;
if (S_ISDIR(ip->i_mode))
lrd->log.redopage.type |=
cpu_to_le16(LOG_DIR_XTREE);
} else
p = (xtpage_t *) mp->data;
next = le16_to_cpu(p->header.nextindex);
xtlck = (xtlock_t *) & tlck->lock;
maplock = (maplock_t *) & tlck->lock;
xadlock = (xdlistlock_t *) maplock;
/*
* entry insertion/extension;
* sibling page link update (old right page before split);
*/
if (tlck->type & (tlckNEW | tlckGROW | tlckRELINK)) {
/* log after-image for logredo():
* logredo() will update bmap for alloc of new/extended
* extents (XAD_NEW|XAD_EXTENDED) of XAD[lwm:next) from
* after-image of XADlist;
* logredo() resets (XAD_NEW|XAD_EXTENDED) flag when
* applying the after-image to the meta-data page.
*/
lrd->type = cpu_to_le16(LOG_REDOPAGE);
// *pxd = mp->cm_pxd;
PXDaddress(pxd, mp->index);
PXDlength(pxd,
mp->logical_size >> tblk->sb->s_blocksize_bits);
lrd->backchain = cpu_to_le32(lmLog(log, tblk, lrd, tlck));
/* format a maplock for txUpdateMap() to update bPMAP
* for alloc of new/extended extents of XAD[lwm:next)
* from the page itself;
* txUpdateMap() resets (XAD_NEW|XAD_EXTENDED) flag.
*/
lwm = xtlck->lwm.offset;
if (lwm == 0)
lwm = XTPAGEMAXSLOT;
if (lwm == next)
goto out;
assert(lwm < next);
tlck->flag |= tlckUPDATEMAP;
xadlock->flag = mlckALLOCXADLIST;
xadlock->count = next - lwm;
if ((xadlock->count <= 2) && (tblk->xflag & COMMIT_LAZY)) {
int i;
/*
* Lazy commit may allow xtree to be modified before
* txUpdateMap runs. Copy xad into linelock to
* preserve correct data.
*/
xadlock->xdlist = &xtlck->pxdlock;
memcpy(xadlock->xdlist, &p->xad[lwm],
sizeof(xad_t) * xadlock->count);
for (i = 0; i < xadlock->count; i++)
p->xad[lwm + i].flag &=
~(XAD_NEW | XAD_EXTENDED);
} else {
/*
* xdlist will point into the inode's xtree; ensure
* that the transaction is not committed lazily.
*/
xadlock->xdlist = &p->xad[lwm];
tblk->xflag &= ~COMMIT_LAZY;
}
jFYI(1,
("xtLog: alloc ip:0x%p mp:0x%p tlck:0x%p lwm:%d count:%d\n",
tlck->ip, mp, tlck, lwm, xadlock->count));
maplock->index = 1;
out:
/* mark page as homeward bound */
tlck->flag |= tlckWRITEPAGE;
return;
}
/*
* page deletion: file deletion/truncation (ref. xtTruncate())
*
* (page will be invalidated after log is written and bmap
* is updated from the page);
*/
if (tlck->type & tlckFREE) {
/* LOG_NOREDOPAGE log for NoRedoPage filter:
* if page free from file delete, NoRedoFile filter from
* inode image of zero link count will subsume NoRedoPage
* filters for each page;
* if page free from file truncation, write NoRedoPage
* filter;
*
* update of block allocation map for the page itself:
* if page free from deletion and truncation, LOG_UPDATEMAP
* log for the page itself is generated from processing
* its parent page xad entries;
*/
/* if page free from file truncation, log LOG_NOREDOPAGE
* of the deleted page for logredo() to start NoRedoPage
* filter for the page;
*/
if (tblk->xflag & COMMIT_TRUNCATE) {
/* write NOREDOPAGE for the page */
lrd->type = cpu_to_le16(LOG_NOREDOPAGE);
PXDaddress(pxd, mp->index);
PXDlength(pxd,
mp->logical_size >> tblk->sb->
s_blocksize_bits);
lrd->backchain =
cpu_to_le32(lmLog(log, tblk, lrd, NULL));
if (tlck->type & tlckBTROOT) {
/* Empty xtree must be logged */
lrd->type = cpu_to_le16(LOG_REDOPAGE);
lrd->backchain =
cpu_to_le32(lmLog(log, tblk, lrd, tlck));
}
}
/* init LOG_UPDATEMAP of the freed extents
* XAD[XTENTRYSTART:hwm) from the deleted page itself
* for logredo() to update bmap;
*/
lrd->type = cpu_to_le16(LOG_UPDATEMAP);
lrd->log.updatemap.type = cpu_to_le16(LOG_FREEXADLIST);
xtlck = (xtlock_t *) & tlck->lock;
hwm = xtlck->hwm.offset;
lrd->log.updatemap.nxd =
cpu_to_le16(hwm - XTENTRYSTART + 1);
/* reformat linelock for lmLog() */
xtlck->header.offset = XTENTRYSTART;
xtlck->header.length = hwm - XTENTRYSTART + 1;
xtlck->index = 1;
lrd->backchain = cpu_to_le32(lmLog(log, tblk, lrd, tlck));
/* format a maplock for txUpdateMap() to update bmap
* to free extents of XAD[XTENTRYSTART:hwm) from the
* deleted page itself;
*/
tlck->flag |= tlckUPDATEMAP;
xadlock->flag = mlckFREEXADLIST;
xadlock->count = hwm - XTENTRYSTART + 1;
if ((xadlock->count <= 2) && (tblk->xflag & COMMIT_LAZY)) {
/*
* Lazy commit may allow xtree to be modified before
* txUpdateMap runs. Copy xad into linelock to
* preserve correct data.
*/
xadlock->xdlist = &xtlck->pxdlock;
memcpy(xadlock->xdlist, &p->xad[XTENTRYSTART],
sizeof(xad_t) * xadlock->count);
} else {
/*
* xdlist will point into the inode's xtree; ensure
* that transaction is not committed lazily unless
* we're deleting the inode (unlink). In that case
* we have special logic for the inode to be
* unlocked by the lazy commit thread.
*/
xadlock->xdlist = &p->xad[XTENTRYSTART];
if ((tblk->xflag & COMMIT_LAZY) &&
(tblk->xflag & COMMIT_DELETE) &&
(tblk->ip == ip))
set_cflag(COMMIT_Holdlock, ip);
else
tblk->xflag &= ~COMMIT_LAZY;
}
jFYI(1,
("xtLog: free ip:0x%p mp:0x%p count:%d lwm:2\n",
tlck->ip, mp, xadlock->count));
maplock->index = 1;
/* mark page as invalid */
if (((tblk->xflag & COMMIT_PWMAP) || S_ISDIR(ip->i_mode))
&& !(tlck->type & tlckBTROOT))
tlck->flag |= tlckFREEPAGE;
/*
else (tblk->xflag & COMMIT_PMAP)
? release the page;
*/
return;
}
/*
* page/entry truncation: file truncation (ref. xtTruncate())
*
* |----------+------+------+---------------|
*            |      |      |
*            |      |     hwm - hwm before truncation
*            |     next - truncation point
*           lwm - lwm before truncation
* header ?
*/
if (tlck->type & tlckTRUNCATE) {
pxd_t tpxd; /* truncated extent of xad */
/*
* For truncation the entire linelock may be used, so it would
* be difficult to store xad list in linelock itself.
* Therefore, we'll just force transaction to be committed
* synchronously, so that xtree pages won't be changed before
* txUpdateMap runs.
*/
tblk->xflag &= ~COMMIT_LAZY;
lwm = xtlck->lwm.offset;
if (lwm == 0)
lwm = XTPAGEMAXSLOT;
hwm = xtlck->hwm.offset;
/*
* write log records
*/
/*
* allocate entries XAD[lwm:next):
*/
if (lwm < next) {
/* log after-image for logredo():
* logredo() will update bmap for alloc of new/extended
* extents (XAD_NEW|XAD_EXTENDED) of XAD[lwm:next) from
* after-image of XADlist;
* logredo() resets (XAD_NEW|XAD_EXTENDED) flag when
* applying the after-image to the meta-data page.
*/
lrd->type = cpu_to_le16(LOG_REDOPAGE);
PXDaddress(pxd, mp->index);
PXDlength(pxd,
mp->logical_size >> tblk->sb->
s_blocksize_bits);
lrd->backchain =
cpu_to_le32(lmLog(log, tblk, lrd, tlck));
}
/*
* truncate entry XAD[hwm == next - 1]:
*/
if (hwm == next - 1) {
/* init LOG_UPDATEMAP for logredo() to update bmap for
* free of truncated delta extent of the truncated
* entry XAD[next - 1]:
* (xtlck->pxdlock = truncated delta extent);
*/
pxdlock = (pxdlock_t *) & xtlck->pxdlock;
/* assert(pxdlock->type & tlckTRUNCATE); */
lrd->type = cpu_to_le16(LOG_UPDATEMAP);
lrd->log.updatemap.type = cpu_to_le16(LOG_FREEPXD);
lrd->log.updatemap.nxd = cpu_to_le16(1);
lrd->log.updatemap.pxd = pxdlock->pxd;
tpxd = pxdlock->pxd; /* save to format maplock */
lrd->backchain =
cpu_to_le32(lmLog(log, tblk, lrd, NULL));
}
/*
* free entries XAD[next:hwm]:
*/
if (hwm >= next) {
/* init LOG_UPDATEMAP of the freed extents
* XAD[next:hwm] from the deleted page itself
* for logredo() to update bmap;
*/
lrd->type = cpu_to_le16(LOG_UPDATEMAP);
lrd->log.updatemap.type =
cpu_to_le16(LOG_FREEXADLIST);
xtlck = (xtlock_t *) & tlck->lock;
hwm = xtlck->hwm.offset;
lrd->log.updatemap.nxd =
cpu_to_le16(hwm - next + 1);
/* reformat linelock for lmLog() */
xtlck->header.offset = next;
xtlck->header.length = hwm - next + 1;
xtlck->index = 1;
lrd->backchain =
cpu_to_le32(lmLog(log, tblk, lrd, tlck));
}
/*
* format maplock(s) for txUpdateMap() to update bmap
*/
maplock->index = 0;
/*
* allocate entries XAD[lwm:next):
*/
if (lwm < next) {
/* format a maplock for txUpdateMap() to update bPMAP
* for alloc of new/extended extents of XAD[lwm:next)
* from the page itself;
* txUpdateMap() resets (XAD_NEW|XAD_EXTENDED) flag.
*/
tlck->flag |= tlckUPDATEMAP;
xadlock->flag = mlckALLOCXADLIST;
xadlock->count = next - lwm;
xadlock->xdlist = &p->xad[lwm];
jFYI(1,
("xtLog: alloc ip:0x%p mp:0x%p count:%d lwm:%d next:%d\n",
tlck->ip, mp, xadlock->count, lwm, next));
maplock->index++;
xadlock++;
}
/*
* truncate entry XAD[hwm == next - 1]:
*/
if (hwm == next - 1) {
pxdlock_t *pxdlock;
/* format a maplock for txUpdateMap() to update bmap
* to free truncated delta extent of the truncated
* entry XAD[next - 1];
* (xtlck->pxdlock = truncated delta extent);
*/
tlck->flag |= tlckUPDATEMAP;
pxdlock = (pxdlock_t *) xadlock;
pxdlock->flag = mlckFREEPXD;
pxdlock->count = 1;
pxdlock->pxd = tpxd;
jFYI(1,
("xtLog: truncate ip:0x%p mp:0x%p count:%d hwm:%d\n",
ip, mp, pxdlock->count, hwm));
maplock->index++;
xadlock++;
}
/*
* free entries XAD[next:hwm]:
*/
if (hwm >= next) {
/* format a maplock for txUpdateMap() to update bmap
* to free extents of XAD[next:hwm] from the deleted
* page itself;
*/
tlck->flag |= tlckUPDATEMAP;
xadlock->flag = mlckFREEXADLIST;
xadlock->count = hwm - next + 1;
xadlock->xdlist = &p->xad[next];
jFYI(1,
("xtLog: free ip:0x%p mp:0x%p count:%d next:%d hwm:%d\n",
tlck->ip, mp, xadlock->count, next, hwm));
maplock->index++;
}
/* mark page as homeward bound */
tlck->flag |= tlckWRITEPAGE;
}
return;
}
/*
* mapLog()
*
* function: log from maplock of freed data extents;
*/
void mapLog(log_t * log, tblock_t * tblk, lrd_t * lrd, tlock_t * tlck)
{
pxdlock_t *pxdlock;
int i, nlock;
pxd_t *pxd;
/*
* page relocation: free the source page extent
*
* a maplock for txUpdateMap() for free of the page
* has been formatted at txLock() time saving the src
* relocated page address;
*/
if (tlck->type & tlckRELOCATE) {
/* log LOG_NOREDOPAGE of the old relocated page
* for logredo() to start NoRedoPage filter;
*/
lrd->type = cpu_to_le16(LOG_NOREDOPAGE);
pxdlock = (pxdlock_t *) & tlck->lock;
pxd = &lrd->log.redopage.pxd;
*pxd = pxdlock->pxd;
lrd->backchain = cpu_to_le32(lmLog(log, tblk, lrd, NULL));
/* (N.B. currently, logredo() does NOT update bmap
* for free of the page itself for (LOG_XTREE|LOG_NOREDOPAGE);
* if page free from relocation, LOG_UPDATEMAP log is
* specifically generated now for logredo()
* to update bmap for free of src relocated page;
* (new flag LOG_RELOCATE may be introduced which will
* inform logredo() to start NoRedoPage filter and also
* update block allocation map at the same time, thus
* avoiding an extra log write);
*/
lrd->type = cpu_to_le16(LOG_UPDATEMAP);
lrd->log.updatemap.type = cpu_to_le16(LOG_FREEPXD);
lrd->log.updatemap.nxd = cpu_to_le16(1);
lrd->log.updatemap.pxd = pxdlock->pxd;
lrd->backchain = cpu_to_le32(lmLog(log, tblk, lrd, NULL));
/* a maplock for txUpdateMap() for free of the page
* has been formatted at txLock() time;
*/
tlck->flag |= tlckUPDATEMAP;
return;
}
/*
* Otherwise it's not a relocate request
*
*/
else {
/* log LOG_UPDATEMAP for logredo() to update bmap for
* free of truncated/relocated delta extent of the data;
* e.g.: external EA extent, relocated/truncated extent
* from xtTailgate();
*/
lrd->type = cpu_to_le16(LOG_UPDATEMAP);
pxdlock = (pxdlock_t *) & tlck->lock;
nlock = pxdlock->index;
for (i = 0; i < nlock; i++, pxdlock++) {
if (pxdlock->flag & mlckALLOCPXD)
lrd->log.updatemap.type =
cpu_to_le16(LOG_ALLOCPXD);
else
lrd->log.updatemap.type =
cpu_to_le16(LOG_FREEPXD);
lrd->log.updatemap.nxd = cpu_to_le16(1);
lrd->log.updatemap.pxd = pxdlock->pxd;
lrd->backchain =
cpu_to_le32(lmLog(log, tblk, lrd, NULL));
jFYI(1, ("mapLog: xaddr:0x%lx xlen:0x%x\n",
(ulong) addressPXD(&pxdlock->pxd),
lengthPXD(&pxdlock->pxd)));
}
/* update bmap */
tlck->flag |= tlckUPDATEMAP;
}
}
/*
* txEA()
*
* function: acquire maplock for EA/ACL extents or
* set COMMIT_INLINE flag;
*/
void txEA(tid_t tid, struct inode *ip, dxd_t * oldea, dxd_t * newea)
{
tlock_t *tlck = NULL;
pxdlock_t *maplock = NULL, *pxdlock = NULL;
/*
* format maplock for alloc of new EA extent
*/
if (newea) {
/* Since the newea could be a completely zeroed entry we need to
* check for the two flags which indicate we should actually
* commit new EA data
*/
if (newea->flag & DXD_EXTENT) {
tlck = txMaplock(tid, ip, tlckMAP);
maplock = (pxdlock_t *) & tlck->lock;
pxdlock = (pxdlock_t *) maplock;
pxdlock->flag = mlckALLOCPXD;
PXDaddress(&pxdlock->pxd, addressDXD(newea));
PXDlength(&pxdlock->pxd, lengthDXD(newea));
pxdlock++;
maplock->index = 1;
} else if (newea->flag & DXD_INLINE) {
tlck = NULL;
set_cflag(COMMIT_Inlineea, ip);
}
}
/*
* format maplock for free of old EA extent
*/
if (!test_cflag(COMMIT_Nolink, ip) && (oldea->flag & DXD_EXTENT)) {
if (tlck == NULL) {
tlck = txMaplock(tid, ip, tlckMAP);
maplock = (pxdlock_t *) & tlck->lock;
pxdlock = (pxdlock_t *) maplock;
maplock->index = 0;
}
pxdlock->flag = mlckFREEPXD;
PXDaddress(&pxdlock->pxd, addressDXD(oldea));
PXDlength(&pxdlock->pxd, lengthDXD(oldea));
maplock->index++;
}
}
/*
* txForce()
*
* function: synchronously write pages locked by transaction
* after txLog() but before txUpdateMap();
*/
void txForce(tblock_t * tblk)
{
tlock_t *tlck;
lid_t lid, next;
metapage_t *mp;
/*
* reverse the order of transaction tlocks in
* careful update order of address index pages
* (right to left, bottom up)
*/
tlck = lid_to_tlock(tblk->next);
lid = tlck->next;
tlck->next = 0;
while (lid) {
tlck = lid_to_tlock(lid);
next = tlck->next;
tlck->next = tblk->next;
tblk->next = lid;
lid = next;
}
/*
* synchronously write the page, and
* hold the page for txUpdateMap();
*/
for (lid = tblk->next; lid; lid = next) {
tlck = lid_to_tlock(lid);
next = tlck->next;
if ((mp = tlck->mp) != NULL &&
(tlck->type & tlckBTROOT) == 0) {
assert(mp->xflag & COMMIT_PAGE);
if (tlck->flag & tlckWRITEPAGE) {
tlck->flag &= ~tlckWRITEPAGE;
/* do not release page to freelist */
assert(atomic_read(&mp->nohomeok));
hold_metapage(mp, 0);
write_metapage(mp);
}
}
}
}
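/*
 * Illustrative sketch only, not part of the JFS sources: txForce() above
 * reverses the transaction's tlock chain in place. The chain is linked by
 * small integer lock ids rather than pointers, and id 0 terminates it.
 * The same effect can be shown on a plain table; struct example_lock and
 * example_reverse() are hypothetical names used only for this sketch.
 */
```c
struct example_lock {
	unsigned short next;	/* id of the next lock, 0 = end of list */
};

/* Reverse the id-linked list that starts at head; return the new head id. */
static unsigned short example_reverse(struct example_lock *table,
				      unsigned short head)
{
	unsigned short prev = 0, lid = head, next;

	while (lid) {
		next = table[lid].next;	/* remember the rest of the chain */
		table[lid].next = prev;	/* point this lock back at its predecessor */
		prev = lid;
		lid = next;
	}
	return prev;
}
```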
/*
* txUpdateMap()
*
* function: update persistent allocation map (and working map
* if appropriate);
*
* parameter: tblk - transaction block whose tlocks carry the maplocks
*/
static void txUpdateMap(tblock_t * tblk)
{
struct inode *ip;
struct inode *ipimap;
lid_t lid;
tlock_t *tlck;
maplock_t *maplock;
pxdlock_t pxdlock;
int maptype;
int k, nlock;
metapage_t *mp = 0;
ipimap = JFS_SBI(tblk->sb)->ipimap;
maptype = (tblk->xflag & COMMIT_PMAP) ? COMMIT_PMAP : COMMIT_PWMAP;
/*
* update block allocation map
*
* update allocation state in pmap (and wmap) and
* update lsn of the pmap page;
*/
/*
* scan each tlock/page of transaction for block allocation/free:
*
* for each tlock/page of transaction, update map.
* ? are there tlock for pmap and pwmap at the same time ?
*/
for (lid = tblk->next; lid; lid = tlck->next) {
tlck = lid_to_tlock(lid);
if ((tlck->flag & tlckUPDATEMAP) == 0)
continue;
if (tlck->flag & tlckFREEPAGE) {
/*
* Another thread may attempt to reuse freed space
* immediately, so we want to get rid of the metapage
* before anyone else has a chance to get it.
* Lock metapage, update maps, then invalidate
* the metapage.
*/
mp = tlck->mp;
ASSERT(mp->xflag & COMMIT_PAGE);
hold_metapage(mp, 0);
}
/*
* extent list:
* . in-line PXD list:
* . out-of-line XAD list:
*/
maplock = (maplock_t *) & tlck->lock;
nlock = maplock->index;
for (k = 0; k < nlock; k++, maplock++) {
/*
* allocate blocks in persistent map:
*
* blocks have been allocated from wmap at alloc time;
*/
if (maplock->flag & mlckALLOC) {
txAllocPMap(ipimap, maplock, tblk);
}
/*
* free blocks in persistent and working map:
* blocks will be freed in pmap and then in wmap;
*
* ? tblock specifies the PMAP/PWMAP based upon
* transaction
*
* free blocks in persistent map:
* blocks will be freed from wmap at last reference
* release of the object for regular files;
*
* Always free blocks from both persistent & working
* maps for directories
*/
else { /* (maplock->flag & mlckFREE) */
if (S_ISDIR(tlck->ip->i_mode))
txFreeMap(ipimap, maplock,
tblk, COMMIT_PWMAP);
else
txFreeMap(ipimap, maplock,
tblk, maptype);
}
}
if (tlck->flag & tlckFREEPAGE) {
if (!(tblk->flag & tblkGC_LAZY)) {
/* This is equivalent to txRelease */
ASSERT(mp->lid == lid);
tlck->mp->lid = 0;
}
assert(atomic_read(&mp->nohomeok) == 1);
atomic_dec(&mp->nohomeok);
discard_metapage(mp);
tlck->mp = 0;
}
}
/*
* update inode allocation map
*
* update allocation state in pmap and
* update lsn of the pmap page;
* update in-memory inode flag/state
*
* unlock mapper/write lock
*/
if (tblk->xflag & COMMIT_CREATE) {
ip = tblk->ip;
ASSERT(test_cflag(COMMIT_New, ip));
clear_cflag(COMMIT_New, ip);
diUpdatePMap(ipimap, ip->i_ino, FALSE, tblk);
ipimap->i_state |= I_DIRTY;
/* update persistent block allocation map
* for the allocation of inode extent;
*/
pxdlock.flag = mlckALLOCPXD;
pxdlock.pxd = JFS_IP(ip)->ixpxd;
pxdlock.index = 1;
txAllocPMap(ip, (maplock_t *) & pxdlock, tblk);
iput(ip);
} else if (tblk->xflag & COMMIT_DELETE) {
ip = tblk->ip;
diUpdatePMap(ipimap, ip->i_ino, TRUE, tblk);
ipimap->i_state |= I_DIRTY;
if (test_and_clear_cflag(COMMIT_Holdlock, ip)) {
if (tblk->flag & tblkGC_LAZY)
IWRITE_UNLOCK(ip);
}
iput(ip);
}
}
/*
* txAllocPMap()
*
* function: allocate from persistent map;
*
* parameter:
* ipbmap -
* maplock -
* xad list:
* pxd:
*
* maptype -
* allocate from persistent map;
* free from persistent map;
* (e.g., tmp file - free from working map at release
* of last reference);
* free from persistent and working map;
*
* lsn - log sequence number;
*/
static void txAllocPMap(struct inode *ip, maplock_t * maplock,
tblock_t * tblk)
{
struct inode *ipbmap = JFS_SBI(ip->i_sb)->ipbmap;
xdlistlock_t *xadlistlock;
xad_t *xad;
s64 xaddr;
int xlen;
pxdlock_t *pxdlock;
xdlistlock_t *pxdlistlock;
pxd_t *pxd;
int n;
/*
* allocate from persistent map;
*/
if (maplock->flag & mlckALLOCXADLIST) {
xadlistlock = (xdlistlock_t *) maplock;
xad = xadlistlock->xdlist;
for (n = 0; n < xadlistlock->count; n++, xad++) {
if (xad->flag & (XAD_NEW | XAD_EXTENDED)) {
xaddr = addressXAD(xad);
xlen = lengthXAD(xad);
dbUpdatePMap(ipbmap, FALSE, xaddr,
(s64) xlen, tblk);
xad->flag &= ~(XAD_NEW | XAD_EXTENDED);
jFYI(1,
("allocPMap: xaddr:0x%lx xlen:%d\n",
(ulong) xaddr, xlen));
}
}
} else if (maplock->flag & mlckALLOCPXD) {
pxdlock = (pxdlock_t *) maplock;
xaddr = addressPXD(&pxdlock->pxd);
xlen = lengthPXD(&pxdlock->pxd);
dbUpdatePMap(ipbmap, FALSE, xaddr, (s64) xlen, tblk);
jFYI(1,
("allocPMap: xaddr:0x%lx xlen:%d\n", (ulong) xaddr,
xlen));
} else { /* (maplock->flag & mlckALLOCPXDLIST) */
pxdlistlock = (xdlistlock_t *) maplock;
pxd = pxdlistlock->xdlist;
for (n = 0; n < pxdlistlock->count; n++, pxd++) {
xaddr = addressPXD(pxd);
xlen = lengthPXD(pxd);
dbUpdatePMap(ipbmap, FALSE, xaddr, (s64) xlen,
tblk);
jFYI(1,
("allocPMap: xaddr:0x%lx xlen:%d\n",
(ulong) xaddr, xlen));
}
}
}
/*
* txFreeMap()
*
* function: free from persistent and/or working map;
*
* todo: optimization
*/
void txFreeMap(struct inode *ip,
maplock_t * maplock, tblock_t * tblk, int maptype)
{
struct inode *ipbmap = JFS_SBI(ip->i_sb)->ipbmap;
xdlistlock_t *xadlistlock;
xad_t *xad;
s64 xaddr;
int xlen;
pxdlock_t *pxdlock;
xdlistlock_t *pxdlistlock;
pxd_t *pxd;
int n;
jFYI(1,
("txFreeMap: tblk:0x%p maplock:0x%p maptype:0x%x\n",
tblk, maplock, maptype));
/*
* free from persistent map;
*/
if (maptype == COMMIT_PMAP || maptype == COMMIT_PWMAP) {
if (maplock->flag & mlckFREEXADLIST) {
xadlistlock = (xdlistlock_t *) maplock;
xad = xadlistlock->xdlist;
for (n = 0; n < xadlistlock->count; n++, xad++) {
if (!(xad->flag & XAD_NEW)) {
xaddr = addressXAD(xad);
xlen = lengthXAD(xad);
dbUpdatePMap(ipbmap, TRUE, xaddr,
(s64) xlen, tblk);
jFYI(1,
("freePMap: xaddr:0x%lx xlen:%d\n",
(ulong) xaddr, xlen));
}
}
} else if (maplock->flag & mlckFREEPXD) {
pxdlock = (pxdlock_t *) maplock;
xaddr = addressPXD(&pxdlock->pxd);
xlen = lengthPXD(&pxdlock->pxd);
dbUpdatePMap(ipbmap, TRUE, xaddr, (s64) xlen,
tblk);
jFYI(1,
("freePMap: xaddr:0x%lx xlen:%d\n",
(ulong) xaddr, xlen));
} else { /* (maplock->flag & mlckFREEPXDLIST) */
pxdlistlock = (xdlistlock_t *) maplock;
pxd = pxdlistlock->xdlist;
for (n = 0; n < pxdlistlock->count; n++, pxd++) {
xaddr = addressPXD(pxd);
xlen = lengthPXD(pxd);
dbUpdatePMap(ipbmap, TRUE, xaddr,
(s64) xlen, tblk);
jFYI(1,
("freePMap: xaddr:0x%lx xlen:%d\n",
(ulong) xaddr, xlen));
}
}
}
/*
* free from working map;
*/
if (maptype == COMMIT_PWMAP || maptype == COMMIT_WMAP) {
if (maplock->flag & mlckFREEXADLIST) {
xadlistlock = (xdlistlock_t *) maplock;
xad = xadlistlock->xdlist;
for (n = 0; n < xadlistlock->count; n++, xad++) {
xaddr = addressXAD(xad);
xlen = lengthXAD(xad);
dbFree(ip, xaddr, (s64) xlen);
xad->flag = 0;
jFYI(1,
("freeWMap: xaddr:0x%lx xlen:%d\n",
(ulong) xaddr, xlen));
}
} else if (maplock->flag & mlckFREEPXD) {
pxdlock = (pxdlock_t *) maplock;
xaddr = addressPXD(&pxdlock->pxd);
xlen = lengthPXD(&pxdlock->pxd);
dbFree(ip, xaddr, (s64) xlen);
jFYI(1,
("freeWMap: xaddr:0x%lx xlen:%d\n",
(ulong) xaddr, xlen));
} else { /* (maplock->flag & mlckFREEPXDLIST) */
pxdlistlock = (xdlistlock_t *) maplock;
pxd = pxdlistlock->xdlist;
for (n = 0; n < pxdlistlock->count; n++, pxd++) {
xaddr = addressPXD(pxd);
xlen = lengthPXD(pxd);
dbFree(ip, xaddr, (s64) xlen);
jFYI(1,
("freeWMap: xaddr:0x%lx xlen:%d\n",
(ulong) xaddr, xlen));
}
}
}
}
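/*
 * Illustrative sketch only, not part of the JFS sources: txFreeMap() above
 * frees from the persistent map when maptype is COMMIT_PMAP or COMMIT_PWMAP,
 * and from the working map when it is COMMIT_PWMAP or COMMIT_WMAP. The
 * enum values, EX_* masks and example_maps_to_free() below are hypothetical
 * stand-ins that only restate that selection; they are not the real flags.
 */
```c
enum example_maptype { EX_PMAP, EX_WMAP, EX_PWMAP };

#define EX_TOUCH_PERSISTENT	0x1
#define EX_TOUCH_WORKING	0x2

/* Return a mask of the maps a free request with this maptype would touch. */
static int example_maps_to_free(enum example_maptype maptype)
{
	int touched = 0;

	if (maptype == EX_PMAP || maptype == EX_PWMAP)
		touched |= EX_TOUCH_PERSISTENT;
	if (maptype == EX_PWMAP || maptype == EX_WMAP)
		touched |= EX_TOUCH_WORKING;
	return touched;
}
```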
/*
* txFreelock()
*
* function: remove tlock from inode anonymous locklist
*/
void txFreelock(struct inode *ip)
{
struct jfs_inode_info *jfs_ip = JFS_IP(ip);
tlock_t *xtlck, *tlck;
lid_t xlid = 0, lid;
if (!jfs_ip->atlhead)
return;
xtlck = (tlock_t *) &jfs_ip->atlhead;
while ((lid = xtlck->next)) {
tlck = lid_to_tlock(lid);
if (tlck->flag & tlckFREELOCK) {
xtlck->next = tlck->next;
txLockFree(lid);
} else {
xtlck = tlck;
xlid = lid;
}
}
if (jfs_ip->atlhead)
jfs_ip->atltail = xlid;
else {
jfs_ip->atltail = 0;
/*
* If inode was on anon_list, remove it
*/
TXN_LOCK();
list_del_init(&jfs_ip->anon_inode_list);
TXN_UNLOCK();
}
}
/*
* txAbort()
*
* function: abort tx before commit;
*
* frees line-locks and segment locks for all
* segments in comdata structure.
* Optionally sets state of file-system to FM_DIRTY in super-block.
* The log age of in-memory page frames held by the caller
* is reset to 0 (to avoid logwrap).
*/
void txAbort(tid_t tid, int dirty)
{
lid_t lid, next;
metapage_t *mp;
tblock_t *tblk = tid_to_tblock(tid);
jEVENT(1, ("txAbort: tid:%d dirty:0x%x\n", tid, dirty));
/*
* free tlocks of the transaction
*/
for (lid = tblk->next; lid; lid = next) {
next = lid_to_tlock(lid)->next;
mp = lid_to_tlock(lid)->mp;
if (mp) {
mp->lid = 0;
/*
* reset lsn of page to avoid logwrap:
*
* (page may have been previously committed by another
* transaction(s) but has not been paged, i.e.,
* it may be on logsync list even though it has not
* been logged for the current tx.)
*/
if (mp->xflag & COMMIT_PAGE && mp->lsn)
LogSyncRelease(mp);
}
/* insert tlock at head of freelist */
TXN_LOCK();
txLockFree(lid);
TXN_UNLOCK();
}
/* caller will free the transaction block */
tblk->next = tblk->last = 0;
/*
* mark filesystem dirty
*/
if (dirty)
updateSuper(tblk->sb, FM_DIRTY);
return;
}
/*
* txAbortCommit()
*
* function: abort commit.
*
* frees tlocks of the transaction: line-locks and segment locks for all
* segments in the comdata structure; frees malloc storage;
* sets the state of the file system to FM_DIRTY in the super-block.
* The log age of in-memory page frames held by the caller
* is reset to 0 (to avoid logwrap).
*/
void txAbortCommit(commit_t * cd, int exval)
{
tblock_t *tblk;
tid_t tid;
lid_t lid, next;
metapage_t *mp;
assert(exval == EIO || exval == ENOMEM);
jEVENT(1, ("txAbortCommit: cd:0x%p\n", cd));
/*
* free tlocks of the transaction
*/
tid = cd->tid;
tblk = tid_to_tblock(tid);
for (lid = tblk->next; lid; lid = next) {
next = lid_to_tlock(lid)->next;
mp = lid_to_tlock(lid)->mp;
if (mp) {
mp->lid = 0;
/*
* reset lsn of page to avoid logwrap;
*/
if (mp->xflag & COMMIT_PAGE)
LogSyncRelease(mp);
}
/* insert tlock at head of freelist */
TXN_LOCK();
txLockFree(lid);
TXN_UNLOCK();
}
tblk->next = tblk->last = 0;
/* free the transaction block */
txEnd(tid);
/*
* mark filesystem dirty
*/
updateSuper(cd->sb, FM_DIRTY);
}
/*
* txLazyCommit(void)
*
* All transactions except those changing ipimap (COMMIT_FORCE) are
* processed by this routine. This ensures that the inode and block
* allocation maps are updated in order. For synchronous transactions,
* let the user thread finish processing after txUpdateMap() is called.
*/
void txLazyCommit(tblock_t * tblk)
{
log_t *log;
while (((tblk->flag & tblkGC_READY) == 0) &&
((tblk->flag & tblkGC_UNLOCKED) == 0)) {
/* We must have gotten ahead of the user thread
*/
jFYI(1,
("jfs_lazycommit: tblk 0x%p not unlocked\n", tblk));
schedule();
}
jFYI(1, ("txLazyCommit: processing tblk 0x%p\n", tblk));
txUpdateMap(tblk);
log = (log_t *) JFS_SBI(tblk->sb)->log;
spin_lock_irq(&log->gclock); // LOGGC_LOCK
tblk->flag |= tblkGC_COMMITTED;
if ((tblk->flag & tblkGC_READY) || (tblk->flag & tblkGC_LAZY))
log->gcrtc--;
if (tblk->flag & tblkGC_READY)
wake_up(&tblk->gcwait); // LOGGC_WAKEUP
spin_unlock_irq(&log->gclock); // LOGGC_UNLOCK
if (tblk->flag & tblkGC_LAZY) {
txUnlock(tblk, 0);
tblk->flag &= ~tblkGC_LAZY;
txEnd(tblk - TxBlock); /* Convert back to tid */
}
jFYI(1, ("txLazyCommit: done: tblk = 0x%p\n", tblk));
}
/*
* jfs_lazycommit(void)
*
* To be run as a kernel daemon. If lbmIODone is called in an interrupt
* context, or where blocking is not wanted, this routine will process
* committed transactions from the unlock queue.
*/
int jfs_lazycommit(void)
{
int WorkDone;
tblock_t *tblk;
unsigned long flags;
lock_kernel();
daemonize();
current->tty = NULL;
strcpy(current->comm, "jfsCommit");
unlock_kernel();
jfsCommitTask = current;
spin_lock_irq(&current->sigmask_lock);
siginitsetinv(&current->blocked,
sigmask(SIGHUP) | sigmask(SIGKILL) | sigmask(SIGSTOP)
| sigmask(SIGCONT));
spin_unlock_irq(&current->sigmask_lock);
LAZY_LOCK_INIT();
TxAnchor.unlock_queue = TxAnchor.unlock_tail = 0;
complete(&jfsIOwait);
do {
restart:
WorkDone = 0;
while ((tblk = TxAnchor.unlock_queue)) {
/*
* We can't get ahead of user thread. Spinning is
* simpler than blocking/waking. We shouldn't spin
* very long, since user thread shouldn't be blocking
* between lmGroupCommit & txEnd.
*/
WorkDone = 1;
LAZY_LOCK(flags);
/*
* Remove first transaction from queue
*/
TxAnchor.unlock_queue = tblk->cqnext;
tblk->cqnext = 0;
if (TxAnchor.unlock_tail == tblk)
TxAnchor.unlock_tail = 0;
LAZY_UNLOCK(flags);
txLazyCommit(tblk);
/*
* We can be running indefinitely if other processors
* are adding transactions to this list
*/
if (need_resched()) {
current->state = TASK_RUNNING;
schedule();
}
}
if (WorkDone)
goto restart;
set_current_state(TASK_INTERRUPTIBLE);
schedule();
} while (!jfs_thread_stopped());
if (TxAnchor.unlock_queue)
jERROR(1, ("jfs_lazycommit being killed with pending transactions!\n"));
else
jFYI(1, ("jfs_lazycommit being killed\n"));
complete(&jfsIOwait);
return 0;
}
void txLazyUnlock(tblock_t * tblk)
{
unsigned long flags;
LAZY_LOCK(flags);
if (TxAnchor.unlock_tail)
TxAnchor.unlock_tail->cqnext = tblk;
else
TxAnchor.unlock_queue = tblk;
TxAnchor.unlock_tail = tblk;
tblk->cqnext = 0;
LAZY_UNLOCK(flags);
wake_up_process(jfsCommitTask);
}
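/*
 * A minimal userspace sketch of the unlock queue handled by txLazyUnlock()
 * and jfs_lazycommit() above: a singly linked FIFO with head and tail
 * pointers. The sketch_* names are hypothetical, and the LAZY_LOCK
 * serialization and the wake-up of the commit thread are deliberately
 * left out, so this shows only the pointer handling.
 */
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for tblock_t: only the queue link matters here. */
struct sketch_tblk {
	struct sketch_tblk *cqnext;	/* next block on the queue */
	int id;				/* payload for illustration only */
};

/* Hypothetical stand-in for the unlock_queue/unlock_tail pair in TxAnchor. */
struct sketch_anchor {
	struct sketch_tblk *unlock_queue;	/* head: oldest entry */
	struct sketch_tblk *unlock_tail;	/* tail: newest entry */
};

/* Append at the tail, as txLazyUnlock() does. */
void sketch_enqueue(struct sketch_anchor *a, struct sketch_tblk *t)
{
	assert(t != NULL);
	if (a->unlock_tail)
		a->unlock_tail->cqnext = t;
	else
		a->unlock_queue = t;	/* queue was empty */
	a->unlock_tail = t;
	t->cqnext = NULL;
}

/* Remove from the head, as the loop in jfs_lazycommit() does. */
struct sketch_tblk *sketch_dequeue(struct sketch_anchor *a)
{
	struct sketch_tblk *t = a->unlock_queue;

	if (!t)
		return NULL;
	a->unlock_queue = t->cqnext;
	t->cqnext = NULL;
	if (a->unlock_tail == t)	/* removed the last entry */
		a->unlock_tail = NULL;
	return t;
}
```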
static void LogSyncRelease(metapage_t * mp)
{
log_t *log = mp->log;
assert(atomic_read(&mp->nohomeok));
assert(log);
atomic_dec(&mp->nohomeok);
if (atomic_read(&mp->nohomeok))
return;
hold_metapage(mp, 0);
LOGSYNC_LOCK(log);
mp->log = NULL;
mp->lsn = 0;
mp->clsn = 0;
log->count--;
list_del_init(&mp->synclist);
LOGSYNC_UNLOCK(log);
release_metapage(mp);
}
/*
* jfs_sync(void)
*
* To be run as a kernel daemon. This is awakened when tlocks run low.
* We write any inodes that have anonymous tlocks so they will become
* available.
*/
int jfs_sync(void)
{
struct inode *ip;
struct jfs_inode_info *jfs_ip;
lock_kernel();
daemonize();
current->tty = NULL;
strcpy(current->comm, "jfsSync");
unlock_kernel();
jfsSyncTask = current;
spin_lock_irq(&current->sigmask_lock);
siginitsetinv(&current->blocked,
sigmask(SIGHUP) | sigmask(SIGKILL) | sigmask(SIGSTOP)
| sigmask(SIGCONT));
spin_unlock_irq(&current->sigmask_lock);
complete(&jfsIOwait);
do {
/*
* write each inode on the anonymous inode list
*/
TXN_LOCK();
while (TlocksLow && !list_empty(&TxAnchor.anon_list)) {
jfs_ip = list_entry(TxAnchor.anon_list.next,
struct jfs_inode_info,
anon_inode_list);
ip = &jfs_ip->vfs_inode;
/*
* We must release the TXN_LOCK since our
* IWRITE_TRYLOCK implementation may still block
*/
TXN_UNLOCK();
if (IWRITE_TRYLOCK(ip)) {
/*
* inode will be removed from anonymous list
* when it is committed
*/
jfs_commit_inode(ip, 0);
IWRITE_UNLOCK(ip);
/*
* Just to be safe. I don't know how
* long we can run without blocking
*/
if (need_resched()) {
current->state = TASK_RUNNING;
schedule();
}
TXN_LOCK();
} else {
/* We can't get the write lock. It may
* be held by a thread waiting for tlocks
* so let's not block here. Save it to
* put back on the anon_list.
*/
/*
* We released TXN_LOCK, let's make sure
* this inode is still there
*/
TXN_LOCK();
if (TxAnchor.anon_list.next !=
&jfs_ip->anon_inode_list)
continue;
/* Take off anon_list */
list_del(&jfs_ip->anon_inode_list);
/* Put on anon_list2 */
list_add(&jfs_ip->anon_inode_list,
&TxAnchor.anon_list2);
}
}
/* Add anon_list2 back to anon_list */
if (!list_empty(&TxAnchor.anon_list2)) {
list_splice(&TxAnchor.anon_list2, &TxAnchor.anon_list);
INIT_LIST_HEAD(&TxAnchor.anon_list2);
}
TXN_UNLOCK();
set_current_state(TASK_INTERRUPTIBLE);
schedule();
} while (!jfs_thread_stopped());
jFYI(1, ("jfs_sync being killed\n"));
complete(&jfsIOwait);
return 0;
}
#ifdef CONFIG_PROC_FS
int jfs_txanchor_read(char *buffer, char **start, off_t offset, int length,
int *eof, void *data)
{
int len = 0;
off_t begin;
char *freewait;
char *freelockwait;
char *lowlockwait;
freewait =
waitqueue_active(&TxAnchor.freewait) ? "active" : "empty";
freelockwait =
waitqueue_active(&TxAnchor.freelockwait) ? "active" : "empty";
lowlockwait =
waitqueue_active(&TxAnchor.lowlockwait) ? "active" : "empty";
len += sprintf(buffer,
"JFS TxAnchor\n"
"============\n"
"freetid = %d\n"
"freewait = %s\n"
"freelock = %d\n"
"freelockwait = %s\n"
"lowlockwait = %s\n"
"tlocksInUse = %d\n"
"unlock_queue = 0x%p\n"
"unlock_tail = 0x%p\n",
TxAnchor.freetid,
freewait,
TxAnchor.freelock,
freelockwait,
lowlockwait,
TxAnchor.tlocksInUse,
TxAnchor.unlock_queue,
TxAnchor.unlock_tail);
begin = offset;
*start = buffer + begin;
len -= begin;
if (len > length)
len = length;
else
*eof = 1;
if (len < 0)
len = 0;
return len;
}
#endif
/*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/
/*
* Change History :
*
*/
#ifndef _H_JFS_TXNMGR
#define _H_JFS_TXNMGR
/*
* jfs_txnmgr.h: transaction manager
*/
#include "jfs_logmgr.h"
/*
* Hide implementation of TxBlock and TxLock
*/
#define tid_to_tblock(tid) (&TxBlock[tid])
#define lid_to_tlock(lid) (&TxLock[lid])
/*
* transaction block
*/
typedef struct tblock {
/*
* tblock_t and jbuf_t common area: struct logsyncblk
*
* the following 5 fields are the same as struct logsyncblk
* which is common to tblock and jbuf to form logsynclist
*/
u16 xflag; /* tx commit type */
u16 flag; /* tx commit state */
lid_t dummy; /* Must keep structures common */
s32 lsn; /* recovery lsn */
struct list_head synclist; /* logsynclist link */
/* lock management */
struct super_block *sb; /* 4: super block */
lid_t next; /* 2: index of first tlock of tid */
lid_t last; /* 2: index of last tlock of tid */
wait_queue_head_t waitor; /* 4: tids waiting on this tid */
/* log management */
u32 logtid; /* 4: log transaction id */
/* (32) */
/* commit management */
struct tblock *cqnext; /* 4: commit queue link */
s32 clsn; /* 4: commit lsn */
struct lbuf *bp; /* 4: */
s32 pn; /* 4: commit record log page number */
s32 eor; /* 4: commit record eor */
wait_queue_head_t gcwait; /* 4: group commit event list:
* ready transactions wait on this
* event for group commit completion.
*/
struct inode *ip; /* 4: inode being created or deleted */
s32 rsrvd; /* 4: */
} tblock_t; /* (64) */
extern struct tblock *TxBlock; /* transaction block table */
/* commit flags: tblk->xflag */
#define COMMIT_SYNC 0x0001 /* synchronous commit */
#define COMMIT_FORCE 0x0002 /* force pageout at end of commit */
#define COMMIT_FLUSH 0x0004 /* init flush at end of commit */
#define COMMIT_MAP 0x00f0
#define COMMIT_PMAP 0x0010 /* update pmap */
#define COMMIT_WMAP 0x0020 /* update wmap */
#define COMMIT_PWMAP 0x0040 /* update pwmap */
#define COMMIT_FREE 0x0f00
#define COMMIT_DELETE 0x0100 /* inode delete */
#define COMMIT_TRUNCATE 0x0200 /* file truncation */
#define COMMIT_CREATE 0x0400 /* inode create */
#define COMMIT_LAZY 0x0800 /* lazy commit */
#define COMMIT_PAGE 0x1000 /* Identifies element as metapage */
#define COMMIT_INODE 0x2000 /* Identifies element as inode */
/* group commit flags tblk->flag: see jfs_logmgr.h */
/*
* transaction lock
*/
typedef struct tlock {
lid_t next; /* index next lockword on tid locklist
* next lockword on freelist
*/
tid_t tid; /* transaction id holding lock */
u16 flag; /* 2: lock control */
u16 type; /* 2: log type */
struct metapage *mp; /* 4: object page buffer locked */
struct inode *ip; /* 4: object */
/* (16) */
s16 lock[24]; /* 48: overlay area */
} tlock_t; /* (64) */
extern struct tlock *TxLock; /* transaction lock table */
/*
* tlock flag
*/
/* txLock state */
#define tlckPAGELOCK 0x8000
#define tlckINODELOCK 0x4000
#define tlckLINELOCK 0x2000
#define tlckINLINELOCK 0x1000
/* lmLog state */
#define tlckLOG 0x0800
/* updateMap state */
#define tlckUPDATEMAP 0x0080
/* freeLock state */
#define tlckFREELOCK 0x0008
#define tlckWRITEPAGE 0x0004
#define tlckFREEPAGE 0x0002
/*
* tlock type
*/
#define tlckTYPE 0xfe00
#define tlckINODE 0x8000
#define tlckXTREE 0x4000
#define tlckDTREE 0x2000
#define tlckMAP 0x1000
#define tlckEA 0x0800
#define tlckACL 0x0400
#define tlckDATA 0x0200
#define tlckBTROOT 0x0100
#define tlckOPERATION 0x00ff
#define tlckGROW 0x0001 /* file grow */
#define tlckREMOVE 0x0002 /* file delete */
#define tlckTRUNCATE 0x0004 /* file truncate */
#define tlckRELOCATE 0x0008 /* file/directory relocate */
#define tlckENTRY 0x0001 /* directory insert/delete */
#define tlckEXTEND 0x0002 /* directory extend in-line */
#define tlckSPLIT 0x0010 /* split page */
#define tlckNEW 0x0020 /* new page from split */
#define tlckFREE 0x0040 /* free page */
#define tlckRELINK 0x0080 /* update sibling pointer */
/*
* linelock for lmLog()
*
* note: linelock_t and its variations are overlaid
* at tlock.lock: watch for alignment;
*/
typedef struct {
u8 offset; /* 1: */
u8 length; /* 1: */
} lv_t; /* (2) */
#define TLOCKSHORT 20
#define TLOCKLONG 28
typedef struct {
u16 next; /* 2: next linelock */
s8 maxcnt; /* 1: */
s8 index; /* 1: */
u16 flag; /* 2: */
u8 type; /* 1: */
u8 l2linesize; /* 1: log2 of linesize */
/* (8) */
lv_t lv[20]; /* 40: */
} linelock_t; /* (48) */
#define dtlock_t linelock_t
#define itlock_t linelock_t
typedef struct {
u16 next; /* 2: */
s8 maxcnt; /* 1: */
s8 index; /* 1: */
u16 flag; /* 2: */
u8 type; /* 1: */
u8 l2linesize; /* 1: log2 of linesize */
/* (8) */
lv_t header; /* 2: */
lv_t lwm; /* 2: low water mark */
lv_t hwm; /* 2: high water mark */
lv_t twm; /* 2: */
/* (16) */
s32 pxdlock[8]; /* 32: */
} xtlock_t; /* (48) */
/*
* maplock for txUpdateMap()
*
* note: maplock_t and its variations are overlaid
* at tlock.lock/linelock: watch for alignment;
* N.B. next field may be set by linelock, and should not
* be modified by maplock;
* N.B. index of the first pxdlock specifies index of next
* free maplock (i.e., number of maplock) in the tlock;
*/
typedef struct {
u16 next; /* 2: */
u8 maxcnt; /* 1: */
u8 index; /* 1: next free maplock index */
u16 flag; /* 2: */
u8 type; /* 1: */
u8 count; /* 1: number of pxd/xad */
/* (8) */
pxd_t pxd; /* 8: */
} maplock_t; /* (16): */
/* maplock flag */
#define mlckALLOC 0x00f0
#define mlckALLOCXADLIST 0x0080
#define mlckALLOCPXDLIST 0x0040
#define mlckALLOCXAD 0x0020
#define mlckALLOCPXD 0x0010
#define mlckFREE 0x000f
#define mlckFREEXADLIST 0x0008
#define mlckFREEPXDLIST 0x0004
#define mlckFREEXAD 0x0002
#define mlckFREEPXD 0x0001
#define pxdlock_t maplock_t
typedef struct {
u16 next; /* 2: */
u8 maxcnt; /* 1: */
u8 index; /* 1: */
u16 flag; /* 2: */
u8 type; /* 1: */
u8 count; /* 1: number of pxd/xad */
/* (8) */
void *xdlist; /* 4: pxd/xad list */
s32 rsrvd; /* 4: */
} xdlistlock_t; /* (16): */
/*
* commit
*
* parameter to the commit manager routines
*/
typedef struct commit {
tid_t tid; /* 4: tid = index of tblock */
int flag; /* 4: flags */
log_t *log; /* 4: log */
struct super_block *sb; /* 4: superblock */
int nip; /* 4: number of entries in iplist */
struct inode **iplist; /* 4: list of pointers to inodes */
/* (32) */
/* log record descriptor on 64-bit boundary */
lrd_t lrd; /* : log record descriptor */
} commit_t;
/*
* external declarations
*/
extern tlock_t *txLock(tid_t tid, struct inode *ip, struct metapage *mp, int flag);
extern tlock_t *txMaplock(tid_t tid, struct inode *ip, int flag);
extern int txCommit(tid_t tid, int nip, struct inode **iplist, int flag);
extern tid_t txBegin(struct super_block *sb, int flag);
extern void txBeginAnon(struct super_block *sb);
extern void txEnd(tid_t tid);
extern void txAbort(tid_t tid, int dirty);
extern linelock_t *txLinelock(linelock_t * tlock);
extern void txFreeMap(struct inode *ip,
maplock_t * maplock, tblock_t * tblk, int maptype);
extern void txEA(tid_t tid, struct inode *ip, dxd_t * oldea, dxd_t * newea);
extern void txFreelock(struct inode *ip);
extern int lmLog(log_t * log, tblock_t * tblk, lrd_t * lrd, tlock_t * tlck);
#endif /* _H_JFS_TXNMGR */
/*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/
#ifndef _H_JFS_TYPES
#define _H_JFS_TYPES
/*
* jfs_types.h:
*
* basic type/utility definitions
*
* note: this header file must be the 1st include file
* of the JFS include list in every JFS .c file.
*/
#include <linux/types.h>
#include <linux/nls.h>
#include "endian24.h"
/*
* transaction and lock id's
*/
typedef uint tid_t;
typedef uint lid_t;
/*
* Almost identical to Linux's timespec, but not quite
*/
struct timestruc_t {
u32 tv_sec;
u32 tv_nsec;
};
/*
* handy
*/
#define LEFTMOSTONE 0x80000000
#define HIGHORDER 0x80000000u /* high order bit on */
#define ONES 0xffffffffu /* all bits on */
typedef int boolean_t;
#define TRUE 1
#define FALSE 0
/*
* logical xd (lxd)
*/
typedef struct {
unsigned len:24;
unsigned off1:8;
u32 off2;
} lxd_t;
/* lxd_t field construction */
#define LXDlength(lxd, length32) ( (lxd)->len = (length32) )
#define LXDoffset(lxd, offset64)\
{\
(lxd)->off1 = ((s64)(offset64)) >> 32;\
(lxd)->off2 = (offset64) & 0xffffffff;\
}
/* lxd_t field extraction */
#define lengthLXD(lxd) ( (lxd)->len )
#define offsetLXD(lxd)\
( ((s64)((lxd)->off1)) << 32 | (lxd)->off2 )
/* lxd list */
typedef struct {
s16 maxnlxd;
s16 nlxd;
lxd_t *lxd;
} lxdlist_t;
/*
* physical xd (pxd)
*/
typedef struct {
unsigned len:24;
unsigned addr1:8;
u32 addr2;
} pxd_t;
/* pxd_t field construction */
#define PXDlength(pxd, length32) ((pxd)->len = __cpu_to_le24(length32))
#define PXDaddress(pxd, address64)\
{\
(pxd)->addr1 = ((s64)(address64)) >> 32;\
(pxd)->addr2 = __cpu_to_le32((address64) & 0xffffffff);\
}
/* pxd_t field extraction */
#define lengthPXD(pxd) __le24_to_cpu((pxd)->len)
#define addressPXD(pxd)\
( ((s64)((pxd)->addr1)) << 32 | __le32_to_cpu((pxd)->addr2))
/* pxd list */
typedef struct {
s16 maxnpxd;
s16 npxd;
pxd_t pxd[8];
} pxdlist_t;
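/*
 * A minimal userspace sketch of how PXDaddress() and addressPXD() split
 * and rejoin a 40-bit block address: the top 8 bits go to addr1 and the
 * low 32 bits to addr2. The sketch_* names are hypothetical. The struct
 * uses plain host-order fields instead of bit-fields, and the
 * little-endian conversions done by the real macros are omitted.
 */
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for pxd_t with ordinary host-order members. */
struct sketch_pxd {
	uint32_t len;	/* only the low 24 bits would be used */
	uint8_t addr1;	/* bits 32..39 of the block address */
	uint32_t addr2;	/* bits 0..31 of the block address */
};

/* Split a block address of at most 40 bits into its two parts. */
void sketch_set_address(struct sketch_pxd *pxd, uint64_t address)
{
	assert(address < (UINT64_C(1) << 40));
	pxd->addr1 = (uint8_t)(address >> 32);
	pxd->addr2 = (uint32_t)(address & 0xffffffffu);
}

/* Rejoin the two parts into the original block address. */
uint64_t sketch_get_address(const struct sketch_pxd *pxd)
{
	return ((uint64_t)pxd->addr1 << 32) | pxd->addr2;
}
```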
/*
* data extent descriptor (dxd)
*/
typedef struct {
unsigned flag:8; /* 1: flags */
unsigned rsrvd:24; /* 3: */
u32 size; /* 4: size in bytes */
unsigned len:24; /* 3: length in unit of fsblksize */
unsigned addr1:8; /* 1: address in unit of fsblksize */
u32 addr2; /* 4: address in unit of fsblksize */
} dxd_t; /* - 16 - */
/* dxd_t flags */
#define DXD_INDEX 0x80 /* B+-tree index */
#define DXD_INLINE 0x40 /* in-line data extent */
#define DXD_EXTENT 0x20 /* out-of-line single extent */
#define DXD_FILE 0x10 /* out-of-line file (inode) */
#define DXD_CORRUPT 0x08 /* Inconsistency detected */
/* dxd_t field construction
* Conveniently, the PXD macros work for DXD
*/
#define DXDlength PXDlength
#define DXDaddress PXDaddress
#define lengthDXD lengthPXD
#define addressDXD addressPXD
/*
* directory entry argument
*/
typedef struct component_name {
int namlen;
wchar_t *name;
} component_t;
/*
* DASD limit information - stored in directory inode
*/
typedef struct dasd {
u8 thresh; /* Alert Threshold (in percent) */
u8 delta; /* Alert Threshold delta (in percent) */
u8 rsrvd1;
u8 limit_hi; /* DASD limit (in logical blocks) */
u32 limit_lo; /* DASD limit (in logical blocks) */
u8 rsrvd2[3];
u8 used_hi; /* DASD usage (in logical blocks) */
u32 used_lo; /* DASD usage (in logical blocks) */
} dasd_t;
#define DASDLIMIT(dasdp) \
(((u64)((dasdp)->limit_hi) << 32) + __le32_to_cpu((dasdp)->limit_lo))
#define setDASDLIMIT(dasdp, limit)\
{\
(dasdp)->limit_hi = ((u64)(limit)) >> 32;\
(dasdp)->limit_lo = __cpu_to_le32(limit);\
}
#define DASDUSED(dasdp) \
(((u64)((dasdp)->used_hi) << 32) + __le32_to_cpu((dasdp)->used_lo))
#define setDASDUSED(dasdp, used)\
{\
(dasdp)->used_hi = ((u64)(used)) >> 32;\
(dasdp)->used_lo = __cpu_to_le32(used);\
}
#endif /* !_H_JFS_TYPES */
/*
*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/
/*
* Change History :
*/
/*
* jfs_umount.c
*
* note: file system in transition to aggregate/fileset:
* (ref. jfs_mount.c)
*
* file system unmount is interpreted as unmount of the single/only
* fileset in the aggregate and, if unmount of the last fileset,
* as unmount of the aggregate;
*/
#include <linux/fs.h>
#include "jfs_incore.h"
#include "jfs_filsys.h"
#include "jfs_superblock.h"
#include "jfs_dmap.h"
#include "jfs_imap.h"
#include "jfs_metapage.h"
#include "jfs_debug.h"
/*
* NAME: jfs_umount(sb)
*
* FUNCTION: vfs_umount()
*
* PARAMETERS: sb - super block of the file system being unmounted
*
* RETURN : 0 for a read-only mount, otherwise the return code
* of updateSuper()
*/
int jfs_umount(struct super_block *sb)
{
int rc = 0;
log_t *log;
struct jfs_sb_info *sbi = JFS_SBI(sb);
struct inode *ipbmap = sbi->ipbmap;
struct inode *ipimap = sbi->ipimap;
struct inode *ipaimap = sbi->ipaimap;
struct inode *ipaimap2 = sbi->ipaimap2;
jFYI(1, ("\n UnMount JFS: sb:0x%p\n", sb));
/*
* update superblock and close log
*
* if mounted read-write and log based recovery was enabled
*/
if ((log = sbi->log)) {
/*
* close log:
*
* remove file system from log active file system list.
*/
log = sbi->log;
rc = lmLogClose(sb, log);
}
/*
* close fileset inode allocation map (aka fileset inode)
*/
jEVENT(0, ("jfs_umount: close ipimap:0x%p\n", ipimap));
diUnmount(ipimap, 0);
diFreeSpecial(ipimap);
sbi->ipimap = NULL;
/*
* close secondary aggregate inode allocation map
*/
ipaimap2 = sbi->ipaimap2;
if (ipaimap2) {
jEVENT(0, ("jfs_umount: close ipaimap2:0x%p\n", ipaimap2));
diUnmount(ipaimap2, 0);
diFreeSpecial(ipaimap2);
sbi->ipaimap2 = NULL;
}
/*
* close aggregate inode allocation map
*/
ipaimap = sbi->ipaimap;
jEVENT(0, ("jfs_umount: close ipaimap:0x%p\n", ipaimap));
diUnmount(ipaimap, 0);
diFreeSpecial(ipaimap);
sbi->ipaimap = NULL;
/*
* close aggregate block allocation map
*/
jEVENT(0, ("jfs_umount: close ipbmap:%p\n", ipbmap));
dbUnmount(ipbmap, 0);
diFreeSpecial(ipbmap);
sbi->ipbmap = NULL;
/*
* ensure all file system file pages are propagated to their
* home blocks on disk (and their in-memory buffer pages are
* invalidated) BEFORE updating file system superblock state
* (to signify file system is unmounted cleanly, and thus in
* consistent state) and log superblock active file system
* list (to signify skip logredo()).
*/
if (log) /* log = NULL if read-only mount */
rc = updateSuper(sb, FM_CLEAN);
jFYI(0, (" UnMount JFS Complete: %d\n", rc));
return rc;
}
int jfs_umount_rw(struct super_block *sb)
{
struct jfs_sb_info *sbi = JFS_SBI(sb);
if (!sbi->log)
return 0;
/*
* close log:
*
* remove file system from log active file system list.
*/
lmLogClose(sb, sbi->log);
dbSync(sbi->ipbmap);
diSync(sbi->ipimap);
sbi->log = 0;
return updateSuper(sb, FM_CLEAN);
}
/*
*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/
#include <linux/fs.h>
#include <linux/slab.h>
#include "jfs_types.h"
#include "jfs_filsys.h"
#include "jfs_unicode.h"
#include "jfs_debug.h"
/*
* NAME: jfs_strfromUCS_le()
*
* FUNCTION: Convert little-endian unicode string to character string
*
*/
int jfs_strfromUCS_le(char *to, const wchar_t * from, /* LITTLE ENDIAN */
int len, struct nls_table *codepage)
{
int i;
int outlen = 0;
for (i = 0; (i < len) && from[i]; i++) {
int charlen;
charlen =
codepage->uni2char(le16_to_cpu(from[i]), &to[outlen],
NLS_MAX_CHARSET_SIZE);
if (charlen > 0) {
outlen += charlen;
} else {
to[outlen++] = '?';
}
}
to[outlen] = 0;
jEVENT(0, ("jfs_strfromUCS returning %d - '%s'\n", outlen, to));
return outlen;
}
/*
* NAME: jfs_strtoUCS()
*
* FUNCTION: Convert character string to unicode string
*
*/
int jfs_strtoUCS(wchar_t * to,
const char *from, int len, struct nls_table *codepage)
{
int charlen;
int i;
jEVENT(0, ("jfs_strtoUCS - '%s'\n", from));
for (i = 0; len && *from; i++, from += charlen, len -= charlen) {
charlen = codepage->char2uni(from, len, &to[i]);
if (charlen < 1) {
jERROR(1, ("jfs_strtoUCS: char2uni returned %d.\n",
charlen));
jERROR(1, ("charset = %s, char = 0x%x\n",
codepage->charset, (unsigned char) *from));
to[i] = 0x003f; /* a question mark */
charlen = 1;
}
}
jEVENT(0, (" returning %d\n", i));
to[i] = 0;
return i;
}
/*
* NAME: get_UCSname()
*
* FUNCTION: Allocate and translate to unicode string
*
*/
int get_UCSname(component_t * uniName, struct dentry *dentry,
struct nls_table *nls_tab)
{
int length = dentry->d_name.len;
if (length > JFS_NAME_MAX)
return ENAMETOOLONG;
uniName->name =
kmalloc((length + 1) * sizeof(wchar_t), GFP_NOFS);
if (uniName->name == NULL)
return ENOMEM;
uniName->namlen = jfs_strtoUCS(uniName->name, dentry->d_name.name,
length, nls_tab);
return 0;
}
/*
* unistrk: Unicode kernel case support
*
* Function:
* Convert a unicode character to upper or lower case using
* compressed tables.
*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*
*
*/
#include <asm/byteorder.h>
#include "jfs_types.h"
typedef struct {
wchar_t start;
wchar_t end;
signed char *table;
} UNICASERANGE;
extern signed char UniUpperTable[512];
extern UNICASERANGE UniUpperRange[];
extern int get_UCSname(component_t *, struct dentry *, struct nls_table *);
extern int jfs_strfromUCS_le(char *, const wchar_t *, int, struct nls_table *);
#define free_UCSname(COMP) kfree((COMP)->name)
/*
* UniStrcpy: Copy a string
*/
static inline wchar_t *UniStrcpy(wchar_t * ucs1, const wchar_t * ucs2)
{
wchar_t *anchor = ucs1; /* save the start of result string */
while ((*ucs1++ = *ucs2++));
return anchor;
}
/*
* UniStrncpy: Copy length limited string with pad
*/
static inline wchar_t *UniStrncpy(wchar_t * ucs1, const wchar_t * ucs2,
size_t n)
{
wchar_t *anchor = ucs1;
while (n-- && *ucs2) /* Copy the strings */
*ucs1++ = *ucs2++;
n++;
while (n--) /* Pad with nulls */
*ucs1++ = 0;
return anchor;
}
/*
* UniStrncmp_le: Compare length limited string - native to little-endian
*/
static inline int UniStrncmp_le(const wchar_t * ucs1, const wchar_t * ucs2,
size_t n)
{
if (!n)
return 0; /* Null strings are equal */
while ((*ucs1 == __le16_to_cpu(*ucs2)) && *ucs1 && --n) {
ucs1++;
ucs2++;
}
return (int) *ucs1 - (int) __le16_to_cpu(*ucs2);
}
/*
* UniStrncpy_le: Copy length limited string with pad to little-endian
*/
static inline wchar_t *UniStrncpy_le(wchar_t * ucs1, const wchar_t * ucs2,
size_t n)
{
wchar_t *anchor = ucs1;
while (n-- && *ucs2) /* Copy the strings */
*ucs1++ = __le16_to_cpu(*ucs2++);
n++;
while (n--) /* Pad with nulls */
*ucs1++ = 0;
return anchor;
}
/*
* UniToupper: Convert a unicode character to upper case
*/
static inline wchar_t UniToupper(register wchar_t uc)
{
register UNICASERANGE *rp;
if (uc < sizeof(UniUpperTable)) { /* Latin characters */
return uc + UniUpperTable[uc]; /* Use base tables */
} else {
rp = UniUpperRange; /* Use range tables */
while (rp->start) {
if (uc < rp->start) /* Before start of range */
return uc; /* Uppercase = input */
if (uc <= rp->end) /* In range */
return uc + rp->table[uc - rp->start];
rp++; /* Try next range */
}
}
return uc; /* Past last range */
}
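/*
 * A minimal userspace sketch of the range-table walk in UniToupper()
 * above. Each range carries a table of signed offsets that are added to
 * the character, and the ranges are sorted, so the walk can stop at the
 * first range that starts past the character. The sketch_* names and the
 * two tiny tables are hypothetical and cover only five Greek and five
 * Cyrillic small letters.
 */
```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical miniature of UNICASERANGE. */
struct sketch_range {
	uint16_t start;			/* first character of the range */
	uint16_t end;			/* last character of the range */
	const signed char *table;	/* offset to add, one per character */
};

/* Greek small alpha..epsilon (0x03b1-0x03b5) map to 0x0391-0x0395. */
static const signed char sketch_greek[5] = { -32, -32, -32, -32, -32 };

/* Cyrillic small a..de (0x0430-0x0434) map to 0x0410-0x0414. */
static const signed char sketch_cyrillic[5] = { -32, -32, -32, -32, -32 };

/* Sorted by start; a zero start terminates the list. */
static const struct sketch_range sketch_ranges[] = {
	{ 0x03b1, 0x03b5, sketch_greek },
	{ 0x0430, 0x0434, sketch_cyrillic },
	{ 0, 0, NULL }
};

uint16_t sketch_toupper(uint16_t uc)
{
	const struct sketch_range *rp;

	for (rp = sketch_ranges; rp->start; rp++) {
		if (uc < rp->start)	/* before this range: no later one matches */
			return uc;
		if (uc <= rp->end)	/* inside this range: apply the offset */
			return (uint16_t)(uc + rp->table[uc - rp->start]);
	}
	return uc;			/* past the last range */
}
```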
/*
* UniStrupr: Upper case a unicode string
*/
static inline wchar_t *UniStrupr(register wchar_t * upin)
{
register wchar_t *up;
up = upin;
while (*up) { /* For all characters */
*up = UniToupper(*up);
up++;
}
return upin; /* Return input pointer */
}
/*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*
* jfs_uniupr.c - Unicode compressed case ranges
*
*/
#include <linux/fs.h>
#include "jfs_unicode.h"
/*
* Latin upper case
*/
signed char UniUpperTable[512] = {
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 000-00f */
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 010-01f */
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 020-02f */
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 030-03f */
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 040-04f */
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 050-05f */
0,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32, /* 060-06f */
-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32, 0, 0, 0, 0, 0, /* 070-07f */
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 080-08f */
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 090-09f */
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 0a0-0af */
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 0b0-0bf */
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 0c0-0cf */
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 0d0-0df */
-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32, /* 0e0-0ef */
-32,-32,-32,-32,-32,-32,-32, 0,-32,-32,-32,-32,-32,-32,-32,121, /* 0f0-0ff */
0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, /* 100-10f */
0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, /* 110-11f */
0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, /* 120-12f */
0, 0, 0, -1, 0, -1, 0, -1, 0, 0, -1, 0, -1, 0, -1, 0, /* 130-13f */
-1, 0, -1, 0, -1, 0, -1, 0, -1, 0, 0, -1, 0, -1, 0, -1, /* 140-14f */
0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, /* 150-15f */
0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, /* 160-16f */
0, -1, 0, -1, 0, -1, 0, -1, 0, 0, -1, 0, -1, 0, -1, 0, /* 170-17f */
0, 0, 0, -1, 0, -1, 0, 0, -1, 0, 0, 0, -1, 0, 0, 0, /* 180-18f */
0, 0, -1, 0, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, /* 190-19f */
0, -1, 0, -1, 0, -1, 0, 0, -1, 0, 0, 0, 0, -1, 0, 0, /* 1a0-1af */
-1, 0, 0, 0, -1, 0, -1, 0, 0, -1, 0, 0, 0, -1, 0, 0, /* 1b0-1bf */
0, 0, 0, 0, 0, -1, -2, 0, -1, -2, 0, -1, -2, 0, -1, 0, /* 1c0-1cf */
-1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1,-79, 0, -1, /* 1d0-1df */
0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, /* 1e0-1ef */
0, 0, -1, -2, 0, -1, 0, 0, 0, -1, 0, -1, 0, -1, 0, -1, /* 1f0-1ff */
};
/* Upper case range - Greek */
static signed char UniCaseRangeU03a0[47] = {
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,-38,-37,-37,-37, /* 3a0-3af */
0,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32, /* 3b0-3bf */
-32,-32,-31,-32,-32,-32,-32,-32,-32,-32,-32,-32,-64,-63,-63,
};
/* Upper case range - Cyrillic */
static signed char UniCaseRangeU0430[48] = {
-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32, /* 430-43f */
-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32, /* 440-44f */
0,-80,-80,-80,-80,-80,-80,-80,-80,-80,-80,-80,-80, 0,-80,-80, /* 450-45f */
};
/* Upper case range - Extended cyrillic */
static signed char UniCaseRangeU0490[61] = {
0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, /* 490-49f */
0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, /* 4a0-4af */
0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, /* 4b0-4bf */
0, 0, -1, 0, -1, 0, 0, 0, -1, 0, 0, 0, -1,
};
/* Upper case range - Extended latin and greek */
static signed char UniCaseRangeU1e00[509] = {
0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, /* 1e00-1e0f */
0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, /* 1e10-1e1f */
0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, /* 1e20-1e2f */
0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, /* 1e30-1e3f */
0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, /* 1e40-1e4f */
0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, /* 1e50-1e5f */
0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, /* 1e60-1e6f */
0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, /* 1e70-1e7f */
0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, /* 1e80-1e8f */
0, -1, 0, -1, 0, -1, 0, 0, 0, 0, 0,-59, 0, -1, 0, -1, /* 1e90-1e9f */
0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, /* 1ea0-1eaf */
0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, /* 1eb0-1ebf */
0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, /* 1ec0-1ecf */
0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, /* 1ed0-1edf */
0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, /* 1ee0-1eef */
0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, 0, 0, 0, 0, 0, /* 1ef0-1eff */
8, 8, 8, 8, 8, 8, 8, 8, 0, 0, 0, 0, 0, 0, 0, 0, /* 1f00-1f0f */
8, 8, 8, 8, 8, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 1f10-1f1f */
8, 8, 8, 8, 8, 8, 8, 8, 0, 0, 0, 0, 0, 0, 0, 0, /* 1f20-1f2f */
8, 8, 8, 8, 8, 8, 8, 8, 0, 0, 0, 0, 0, 0, 0, 0, /* 1f30-1f3f */
8, 8, 8, 8, 8, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 1f40-1f4f */
0, 8, 0, 8, 0, 8, 0, 8, 0, 0, 0, 0, 0, 0, 0, 0, /* 1f50-1f5f */
8, 8, 8, 8, 8, 8, 8, 8, 0, 0, 0, 0, 0, 0, 0, 0, /* 1f60-1f6f */
74, 74, 86, 86, 86, 86,100,100, 0, 0,112,112,126,126, 0, 0, /* 1f70-1f7f */
8, 8, 8, 8, 8, 8, 8, 8, 0, 0, 0, 0, 0, 0, 0, 0, /* 1f80-1f8f */
8, 8, 8, 8, 8, 8, 8, 8, 0, 0, 0, 0, 0, 0, 0, 0, /* 1f90-1f9f */
8, 8, 8, 8, 8, 8, 8, 8, 0, 0, 0, 0, 0, 0, 0, 0, /* 1fa0-1faf */
8, 8, 0, 9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 1fb0-1fbf */
0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 1fc0-1fcf */
8, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 1fd0-1fdf */
8, 8, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 1fe0-1fef */
0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 0, 0, 0,
};
/* Upper case range - Wide latin */
static signed char UniCaseRangeUff40[27] = {
0,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32, /* ff40-ff4f */
-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,
};
/*
* Upper Case Range
*/
UNICASERANGE UniUpperRange[] = {
{ 0x03a0, 0x03ce, UniCaseRangeU03a0 },
{ 0x0430, 0x045f, UniCaseRangeU0430 },
{ 0x0490, 0x04cc, UniCaseRangeU0490 },
{ 0x1e00, 0x1ffc, UniCaseRangeU1e00 },
{ 0xff40, 0xff5a, UniCaseRangeUff40 },
{ 0, 0, 0 }
};
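Each range table above holds, for every code point between a start and an end value, a signed delta that is added to the code point to obtain its upper-case form. The following is a minimal sketch of how such a table could be consulted. The `case_range` struct, the `to_upper_range` helper and the reduced Cyrillic-only table are illustrative assumptions, not the lookup routine JFS itself uses.

```c
#include <stddef.h>

/* Hypothetical stand-in for UNICASERANGE: [start, end] plus per-code-point deltas. */
struct case_range {
	unsigned short start;
	unsigned short end;
	const signed char *table;
};

/* Deltas for U+0430..U+045F, copied from UniCaseRangeU0430. */
static const signed char cyrillic_deltas[48] = {
	-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,
	-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,
	  0,-80,-80,-80,-80,-80,-80,-80,-80,-80,-80,-80,-80,  0,-80,-80,
};

/* Sorted by start; a zero start terminates the list, as in UniUpperRange. */
static const struct case_range upper_ranges[] = {
	{ 0x0430, 0x045f, cyrillic_deltas },
	{ 0, 0, NULL }
};

/* Return the upper-case form of uc, or uc itself if no range covers it. */
unsigned short to_upper_range(unsigned short uc)
{
	const struct case_range *rp;

	for (rp = upper_ranges; rp->start; rp++) {
		if (uc < rp->start)	/* ranges are sorted: no later range can match */
			return uc;
		if (uc <= rp->end)	/* inside this range: apply the stored delta */
			return (unsigned short)(uc + rp->table[uc - rp->start]);
	}
	return uc;			/* past the last range */
}
```

Storing deltas rather than target code points keeps each entry within a signed char, which is presumably why the tables are declared that way.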
/*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/
/*
* Change History :
*
*/
#ifndef _H_JFS_XTREE
#define _H_JFS_XTREE
/*
* jfs_xtree.h: extent allocation descriptor B+-tree manager
*/
#include "jfs_btree.h"
/*
* extent allocation descriptor (xad)
*/
typedef struct xad {
unsigned flag:8; /* 1: flag */
unsigned rsvrd:16; /* 2: reserved */
unsigned off1:8; /* 1: offset in unit of fsblksize */
u32 off2; /* 4: offset in unit of fsblksize */
unsigned len:24; /* 3: length in unit of fsblksize */
unsigned addr1:8; /* 1: address in unit of fsblksize */
u32 addr2; /* 4: address in unit of fsblksize */
} xad_t; /* (16) */
#define MAXXLEN ((1 << 24) - 1)
#define XTSLOTSIZE 16
#define L2XTSLOTSIZE 4
/* xad_t field construction */
#define XADoffset(xad, offset64)\
{\
(xad)->off1 = ((u64)offset64) >> 32;\
(xad)->off2 = __cpu_to_le32((offset64) & 0xffffffff);\
}
#define XADaddress(xad, address64)\
{\
(xad)->addr1 = ((u64)address64) >> 32;\
(xad)->addr2 = __cpu_to_le32((address64) & 0xffffffff);\
}
#define XADlength(xad, length32) (xad)->len = __cpu_to_le24(length32)
/* xad_t field extraction */
#define offsetXAD(xad)\
( ((s64)((xad)->off1)) << 32 | __le32_to_cpu((xad)->off2))
#define addressXAD(xad)\
( ((s64)((xad)->addr1)) << 32 | __le32_to_cpu((xad)->addr2))
#define lengthXAD(xad) __le24_to_cpu((xad)->len)
/* xad list */
typedef struct {
s16 maxnxad;
s16 nxad;
xad_t *xad;
} xadlist_t;
/* xad_t flags */
#define XAD_NEW 0x01 /* new */
#define XAD_EXTENDED 0x02 /* extended */
#define XAD_COMPRESSED 0x04 /* compressed with recorded length */
#define XAD_NOTRECORDED 0x08 /* allocated but not recorded */
#define XAD_COW 0x10 /* copy-on-write */
/* possible values for maxentry */
#define XTROOTINITSLOT_DIR 6
#define XTROOTINITSLOT 10
#define XTROOTMAXSLOT 18
#define XTPAGEMAXSLOT 256
#define XTENTRYSTART 2
/*
* xtree page:
*/
typedef union {
struct xtheader {
s64 next; /* 8: */
s64 prev; /* 8: */
u8 flag; /* 1: */
u8 rsrvd1; /* 1: */
s16 nextindex; /* 2: next index = number of entries */
s16 maxentry; /* 2: max number of entries */
s16 rsrvd2; /* 2: */
pxd_t self; /* 8: self */
} header; /* (32) */
xad_t xad[XTROOTMAXSLOT]; /* 16 * maxentry: xad array */
} xtpage_t;
/*
* external declaration
*/
extern int xtLookup(struct inode *ip, s64 lstart, s64 llen,
int *pflag, s64 * paddr, int *plen, int flag);
extern int xtLookupList(struct inode *ip, lxdlist_t * lxdlist,
xadlist_t * xadlist, int flag);
extern void xtInitRoot(tid_t tid, struct inode *ip);
extern int xtInsert(tid_t tid, struct inode *ip,
int xflag, s64 xoff, int xlen, s64 * xaddrp, int flag);
extern int xtExtend(tid_t tid, struct inode *ip, s64 xoff, int xlen,
int flag);
extern int xtTailgate(tid_t tid, struct inode *ip,
s64 xoff, int xlen, s64 xaddr, int flag);
extern int xtUpdate(tid_t tid, struct inode *ip, struct xad *nxad);
extern int xtDelete(tid_t tid, struct inode *ip, s64 xoff, int xlen,
int flag);
extern s64 xtTruncate(tid_t tid, struct inode *ip, s64 newsize, int type);
extern s64 xtTruncate_pmap(tid_t tid, struct inode *ip, s64 committed_size);
extern int xtRelocate(tid_t tid, struct inode *ip,
xad_t * oxad, s64 nxaddr, int xtype);
extern int xtAppend(tid_t tid,
struct inode *ip, int xflag, s64 xoff, int maxblocks,
int *xlenp, s64 * xaddrp, int flag);
#ifdef _JFS_DEBUG_XTREE
extern int xtDisplayTree(struct inode *ip);
extern int xtDisplayPage(struct inode *ip, s64 bn, xtpage_t * p);
#endif /* _JFS_DEBUG_XTREE */
#endif /* !_H_JFS_XTREE */
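The xad_t descriptor in jfs_xtree.h stores its offset and address as 40-bit quantities, split into an 8-bit high part (off1, addr1) and a 32-bit low part (off2, addr2). The following user-space sketch shows that split for the offset only. It leaves out the byte-order conversion done by the real macros, and `demo_xad`, `demo_set_offset` and `demo_get_offset` are hypothetical names, not part of JFS.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for the offset fields of xad_t. */
struct demo_xad {
	uint8_t  off1;	/* bits 32..39 of the offset */
	uint32_t off2;	/* bits 0..31 of the offset */
};

/* Split a 40-bit offset into its 8-bit and 32-bit parts (cf. XADoffset). */
void demo_set_offset(struct demo_xad *xad, uint64_t offset)
{
	xad->off1 = (uint8_t)(offset >> 32);
	xad->off2 = (uint32_t)(offset & 0xffffffffu);
}

/* Reassemble the 40-bit offset from its two parts (cf. offsetXAD). */
uint64_t demo_get_offset(const struct demo_xad *xad)
{
	return ((uint64_t)xad->off1 << 32) | xad->off2;
}
```

With offsets counted in file system blocks, 40 bits address far more blocks than a 32-bit field could, while the descriptor still fits in 16 bytes.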
/*
*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*
* Module: jfs/namei.c
*
*/
/*
* Change History :
*
*/
#include <linux/fs.h>
#include "jfs_incore.h"
#include "jfs_inode.h"
#include "jfs_dinode.h"
#include "jfs_dmap.h"
#include "jfs_unicode.h"
#include "jfs_metapage.h"
#include "jfs_debug.h"
#include <linux/locks.h>
#include <linux/slab.h>
extern struct inode_operations jfs_file_inode_operations;
extern struct inode_operations jfs_symlink_inode_operations;
extern struct file_operations jfs_file_operations;
extern struct address_space_operations jfs_aops;
extern int jfs_fsync(struct file *, struct dentry *, int);
extern void jfs_truncate_nolock(struct inode *, loff_t);
/*
* forward references
*/
struct inode_operations jfs_dir_inode_operations;
struct file_operations jfs_dir_operations;
s64 commitZeroLink(tid_t, struct inode *);
/*
* NAME: jfs_create(dip, dentry, mode)
*
* FUNCTION: create a regular file in the parent directory <dip>
* with name = <from dentry> and mode = <mode>
*
* PARAMETER: dip - parent directory vnode
* dentry - dentry of new file
* mode - create mode (rwxrwxrwx).
*
* RETURN: Errors from subroutines
*
*/
int jfs_create(struct inode *dip, struct dentry *dentry, int mode)
{
int rc = 0;
tid_t tid; /* transaction id */
struct inode *ip = NULL; /* new file inode */
ino_t ino;
component_t dname; /* new file name */
btstack_t btstack;
struct inode *iplist[2];
tblock_t *tblk;
jFYI(1, ("jfs_create: dip:0x%p name:%s\n", dip, dentry->d_name.name));
IWRITE_LOCK(dip);
/*
* search parent directory for entry/freespace
* (dtSearch() returns parent directory page pinned)
*/
if ((rc = get_UCSname(&dname, dentry, JFS_SBI(dip->i_sb)->nls_tab)))
goto out1;
/*
* Either ialloc() or txBegin() may block. Deadlock can occur if we
* block there while holding the dtree page, so we allocate the inode &
* begin the transaction before we search the directory.
*/
ip = ialloc(dip, mode);
if (ip == NULL) {
rc = ENOSPC;
goto out2;
}
tid = txBegin(dip->i_sb, 0);
if ((rc = dtSearch(dip, &dname, &ino, &btstack, JFS_CREATE))) {
jERROR(1, ("jfs_create: dtSearch returned %d\n", rc));
ip->i_nlink = 0;
iput(ip);
txEnd(tid);
goto out2;
}
tblk = tid_to_tblock(tid);
tblk->xflag |= COMMIT_CREATE;
tblk->ip = ip;
iplist[0] = dip;
iplist[1] = ip;
/*
* initialize the child XAD tree root in-line in inode
*/
xtInitRoot(tid, ip);
/*
* create entry in parent directory for new file
* (dtInsert() releases parent directory page)
*/
ino = ip->i_ino;
if ((rc = dtInsert(tid, dip, &dname, &ino, &btstack))) {
jERROR(1, ("jfs_create: dtInsert returned %d\n", rc));
/* discard new inode */
ip->i_nlink = 0;
iput(ip);
if (rc == EIO)
txAbort(tid, 1); /* Marks Filesystem dirty */
else
txAbort(tid, 0); /* Filesystem full */
txEnd(tid);
goto out2;
}
ip->i_op = &jfs_file_inode_operations;
ip->i_fop = &jfs_file_operations;
ip->i_mapping->a_ops = &jfs_aops;
insert_inode_hash(ip);
mark_inode_dirty(ip);
d_instantiate(dentry, ip);
dip->i_version = ++event;
dip->i_ctime = dip->i_mtime = CURRENT_TIME;
mark_inode_dirty(dip);
rc = txCommit(tid, 2, &iplist[0], 0);
txEnd(tid);
out2:
free_UCSname(&dname);
out1:
IWRITE_UNLOCK(dip);
jFYI(1, ("jfs_create: rc:%d\n", -rc));
return -rc;
}
/*
* NAME: jfs_mkdir(dip, dentry, mode)
*
* FUNCTION: create a child directory in the parent directory <dip>
* with name = <from dentry> and mode = <mode>
*
* PARAMETER: dip - parent directory vnode
* dentry - dentry of child directory
* mode - create mode (rwxrwxrwx).
*
* RETURN: Errors from subroutines
*
* note:
* EACCES: user needs search+write permission on the parent directory
*/
int jfs_mkdir(struct inode *dip, struct dentry *dentry, int mode)
{
int rc = 0;
tid_t tid; /* transaction id */
struct inode *ip = NULL; /* child directory inode */
ino_t ino;
component_t dname; /* child directory name */
btstack_t btstack;
struct inode *iplist[2];
tblock_t *tblk;
jFYI(1, ("jfs_mkdir: dip:0x%p name:%s\n", dip, dentry->d_name.name));
IWRITE_LOCK(dip);
/* link count overflow on parent directory ? */
if (dip->i_nlink == JFS_LINK_MAX) {
rc = EMLINK;
goto out1;
}
/*
* search parent directory for entry/freespace
* (dtSearch() returns parent directory page pinned)
*/
if ((rc = get_UCSname(&dname, dentry, JFS_SBI(dip->i_sb)->nls_tab)))
goto out1;
/*
* Either ialloc() or txBegin() may block. Deadlock can occur if we
* block there while holding the dtree page, so we allocate the inode &
* begin the transaction before we search the directory.
*/
ip = ialloc(dip, S_IFDIR | mode);
if (ip == NULL) {
rc = ENOSPC;
goto out2;
}
tid = txBegin(dip->i_sb, 0);
if ((rc = dtSearch(dip, &dname, &ino, &btstack, JFS_CREATE))) {
jERROR(1, ("jfs_mkdir: dtSearch returned %d\n", rc));
ip->i_nlink = 0;
iput(ip);
txEnd(tid);
goto out2;
}
tblk = tid_to_tblock(tid);
tblk->xflag |= COMMIT_CREATE;
tblk->ip = ip;
iplist[0] = dip;
iplist[1] = ip;
/*
* initialize the child directory in-line in inode
*/
dtInitRoot(tid, ip, dip->i_ino);
/*
* create entry in parent directory for child directory
* (dtInsert() releases parent directory page)
*/
ino = ip->i_ino;
if ((rc = dtInsert(tid, dip, &dname, &ino, &btstack))) {
jERROR(1, ("jfs_mkdir: dtInsert returned %d\n", rc));
/* discard new directory inode */
ip->i_nlink = 0;
iput(ip);
if (rc == EIO)
txAbort(tid, 1); /* Marks Filesystem dirty */
else
txAbort(tid, 0); /* Filesystem full */
txEnd(tid);
goto out2;
}
ip->i_nlink = 2; /* for '.' */
ip->i_op = &jfs_dir_inode_operations;
ip->i_fop = &jfs_dir_operations;
ip->i_mapping->a_ops = &jfs_aops;
ip->i_mapping->gfp_mask = GFP_NOFS;
insert_inode_hash(ip);
mark_inode_dirty(ip);
d_instantiate(dentry, ip);
/* update parent directory inode */
dip->i_nlink++; /* for '..' from child directory */
dip->i_version = ++event;
dip->i_ctime = dip->i_mtime = CURRENT_TIME;
mark_inode_dirty(dip);
rc = txCommit(tid, 2, &iplist[0], 0);
txEnd(tid);
out2:
free_UCSname(&dname);
out1:
IWRITE_UNLOCK(dip);
jFYI(1, ("jfs_mkdir: rc:%d\n", -rc));
return -rc;
}
/*
* NAME: jfs_rmdir(dip, dentry)
*
* FUNCTION: remove a link to child directory
*
* PARAMETER: dip - parent inode
* dentry - child directory dentry
*
* RETURN: EINVAL - if name is . or ..
* EINVAL - if . or .. exist but are invalid.
* errors from subroutines
*
* note:
* if other threads have the directory open when the last link
* is removed, the "." and ".." entries, if present, are removed before
* rmdir() returns and no new entries may be created in the directory,
* but the directory is not removed until the last reference to
* the directory is released (cf. unlink() of a regular file).
*/
int jfs_rmdir(struct inode *dip, struct dentry *dentry)
{
int rc;
tid_t tid; /* transaction id */
struct inode *ip = dentry->d_inode;
ino_t ino;
component_t dname;
struct inode *iplist[2];
tblock_t *tblk;
jFYI(1, ("jfs_rmdir: dip:0x%p name:%s\n", dip, dentry->d_name.name));
IWRITE_LOCK_LIST(2, dip, ip);
/* directory must be empty to be removed */
if (!dtEmpty(ip)) {
IWRITE_UNLOCK(ip);
IWRITE_UNLOCK(dip);
rc = ENOTEMPTY;
goto out;
}
if ((rc = get_UCSname(&dname, dentry, JFS_SBI(dip->i_sb)->nls_tab))) {
IWRITE_UNLOCK(ip);
IWRITE_UNLOCK(dip);
goto out;
}
tid = txBegin(dip->i_sb, 0);
iplist[0] = dip;
iplist[1] = ip;
tblk = tid_to_tblock(tid);
tblk->xflag |= COMMIT_DELETE;
tblk->ip = ip;
/*
* delete the entry of target directory from parent directory
*/
ino = ip->i_ino;
if ((rc = dtDelete(tid, dip, &dname, &ino, JFS_REMOVE))) {
jERROR(1, ("jfs_rmdir: dtDelete returned %d\n", rc));
if (rc == EIO)
txAbort(tid, 1);
txEnd(tid);
IWRITE_UNLOCK(ip);
IWRITE_UNLOCK(dip);
goto out2;
}
/* update parent directory's link count corresponding
* to ".." entry of the target directory deleted
*/
dip->i_nlink--;
dip->i_ctime = dip->i_mtime = CURRENT_TIME;
dip->i_version = ++event;
mark_inode_dirty(dip);
/*
* OS/2 could have created EA and/or ACL
*/
/* free EA from both persistent and working map */
if (JFS_IP(ip)->ea.flag & DXD_EXTENT) {
/* free EA pages */
txEA(tid, ip, &JFS_IP(ip)->ea, NULL);
}
JFS_IP(ip)->ea.flag = 0;
/* free ACL from both persistent and working map */
if (JFS_IP(ip)->acl.flag & DXD_EXTENT) {
/* free ACL pages */
txEA(tid, ip, &JFS_IP(ip)->acl, NULL);
}
JFS_IP(ip)->acl.flag = 0;
/* mark the target directory as deleted */
ip->i_nlink = 0;
mark_inode_dirty(ip);
rc = txCommit(tid, 2, &iplist[0], 0);
txEnd(tid);
IWRITE_UNLOCK(ip);
/*
* Truncating the directory index table is not guaranteed. It
* may need to be done iteratively
*/
if (test_cflag(COMMIT_Stale, dip)) {
if (dip->i_size > 1)
jfs_truncate_nolock(dip, 0);
clear_cflag(COMMIT_Stale, dip);
}
IWRITE_UNLOCK(dip);
d_delete(dentry);
out2:
free_UCSname(&dname);
out:
jFYI(1, ("jfs_rmdir: rc:%d\n", rc));
return -rc;
}
/*
* NAME: jfs_unlink(dip, dentry)
*
* FUNCTION: remove the link to the object named by <dentry>
* from parent directory <dip>
*
* PARAMETER: dip - inode of parent directory
* dentry - dentry of object to be removed
*
* RETURN: errors from subroutines
*
* note:
* temporary file: if one or more processes have the file open
* when the last link is removed, the link will be removed before
* unlink() returns, but the removal of the file contents will be
* postponed until all references to the file are closed.
*
* JFS does NOT support unlink() on directories.
*
*/
int jfs_unlink(struct inode *dip, struct dentry *dentry)
{
int rc;
tid_t tid; /* transaction id */
struct inode *ip = dentry->d_inode;
ino_t ino;
component_t dname; /* object name */
struct inode *iplist[2];
tblock_t *tblk;
s64 new_size = 0;
int commit_flag;
jFYI(1, ("jfs_unlink: dip:0x%p name:%s\n", dip, dentry->d_name.name));
if ((rc = get_UCSname(&dname, dentry, JFS_SBI(dip->i_sb)->nls_tab)))
goto out;
IWRITE_LOCK_LIST(2, ip, dip);
tid = txBegin(dip->i_sb, 0);
iplist[0] = dip;
iplist[1] = ip;
/*
* delete the entry of target file from parent directory
*/
ino = ip->i_ino;
if ((rc = dtDelete(tid, dip, &dname, &ino, JFS_REMOVE))) {
jERROR(1, ("jfs_unlink: dtDelete returned %d\n", rc));
if (rc == EIO)
txAbort(tid, 1); /* Marks FS Dirty */
txEnd(tid);
IWRITE_UNLOCK(ip);
IWRITE_UNLOCK(dip);
goto out1;
}
ASSERT(ip->i_nlink);
ip->i_ctime = dip->i_ctime = dip->i_mtime = CURRENT_TIME;
dip->i_version = ++event;
mark_inode_dirty(dip);
/* update target's inode */
ip->i_nlink--;
mark_inode_dirty(ip);
/*
* commit zero link count object
*/
if (ip->i_nlink == 0) {
assert(!test_cflag(COMMIT_Nolink, ip));
/* free block resources */
if ((new_size = commitZeroLink(tid, ip)) < 0) {
txAbort(tid, 1); /* Marks FS Dirty */
txEnd(tid);
IWRITE_UNLOCK(ip);
IWRITE_UNLOCK(dip);
rc = -new_size; /* We return -rc */
goto out1;
}
tblk = tid_to_tblock(tid);
tblk->xflag |= COMMIT_DELETE;
tblk->ip = ip;
}
/*
* If the truncate started by commitZeroLink() was incomplete
* (new_size != 0), commit the transaction synchronously; an
* asynchronous commit of a partially truncated file can result
* in timing problems.
*/
if (new_size)
commit_flag = COMMIT_SYNC;
else
commit_flag = 0;
rc = txCommit(tid, 2, &iplist[0], commit_flag);
txEnd(tid);
while (new_size && (rc == 0)) {
tid = txBegin(dip->i_sb, 0);
new_size = xtTruncate_pmap(tid, ip, new_size);
if (new_size < 0) {
txAbort(tid, 1); /* Marks FS Dirty */
rc = -new_size; /* We return -rc */
} else
rc = txCommit(tid, 2, &iplist[0], COMMIT_SYNC);
txEnd(tid);
}
if (!test_cflag(COMMIT_Holdlock, ip))
IWRITE_UNLOCK(ip);
/*
* Truncating the directory index table is not guaranteed. It
* may need to be done iteratively
*/
if (test_cflag(COMMIT_Stale, dip)) {
if (dip->i_size > 1)
jfs_truncate_nolock(dip, 0);
clear_cflag(COMMIT_Stale, dip);
}
IWRITE_UNLOCK(dip);
d_delete(dentry);
out1:
free_UCSname(&dname);
out:
jFYI(1, ("jfs_unlink: rc:%d\n", -rc));
return -rc;
}
/*
* NAME: commitZeroLink()
*
* FUNCTION: for non-directory, called by jfs_unlink() and jfs_rename(),
* truncate a regular file or symbolic link to zero
* length. return 0 if type is not one of these.
*
* if the file is currently associated with a VM segment
* only permanent disk and inode map resources are freed,
* and neither the inode nor indirect blocks are modified
* so that the resources can be later freed in the work
* map by ctrunc1.
* if there is no VM segment on entry, the resources are
* freed in both work and permanent map.
* (? for temporary file - memory object is cached even
* after no reference:
* reference count > 0 - )
*
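* PARAMETERS: tid - transaction id
* ip - inode of the file to truncate
*
* RETURN: 0 if there is nothing further to free;
* > 0 if xtTruncate_pmap() did not finish (the caller passes
* this value back to xtTruncate_pmap());
* < 0 (negated errno) on error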
*/
s64 commitZeroLink(tid_t tid, struct inode *ip)
{
int filetype, committype;
tblock_t *tblk;
jFYI(1, ("commitZeroLink: tid = %d, ip = 0x%p\n", tid, ip));
filetype = ip->i_mode & S_IFMT;
switch (filetype) {
case S_IFREG:
break;
case S_IFLNK:
/* fast symbolic link */
if (ip->i_size <= 256) {
ip->i_size = 0;
return 0;
}
break;
default:
assert(filetype != S_IFDIR);
return 0;
}
#ifdef _STILL_TO_PORT
/*
* free from block allocation map:
*
* if there is no cache control element associated with
* the file, free resources in both persistent and work map;
* otherwise just persistent map.
*/
if (ip->i_cacheid) {
committype = COMMIT_PMAP;
/* mark for iClose() to free from working map */
set_cflag(COMMIT_Freewmap, ip);
} else
committype = COMMIT_PWMAP;
#else /* _STILL_TO_PORT */
set_cflag(COMMIT_Freewmap, ip);
committype = COMMIT_PMAP;
#endif /* _STILL_TO_PORT */
/* mark transaction of block map update type */
tblk = tid_to_tblock(tid);
tblk->xflag |= committype;
/*
* free EA
*/
if (JFS_IP(ip)->ea.flag & DXD_EXTENT) {
#ifdef _STILL_TO_PORT
/* free EA pages from cache */
if (committype == COMMIT_PWMAP)
bmExtentInvalidate(ip, addressDXD(&ip->i_ea),
lengthDXD(&ip->i_ea));
#endif /* _STILL_TO_PORT */
/* acquire maplock on EA to be freed from block map */
txEA(tid, ip, &JFS_IP(ip)->ea, NULL);
if (committype == COMMIT_PWMAP)
JFS_IP(ip)->ea.flag = 0;
}
/*
* free ACL
*/
if (JFS_IP(ip)->acl.flag & DXD_EXTENT) {
#ifdef _STILL_TO_PORT
/* free ACL pages from cache */
if (committype == COMMIT_PWMAP)
bmExtentInvalidate(ip, addressDXD(&ip->i_acl),
lengthDXD(&ip->i_acl));
#endif /* _STILL_TO_PORT */
/* acquire maplock on ACL to be freed from block map */
txEA(tid, ip, &JFS_IP(ip)->acl, NULL);
if (committype == COMMIT_PWMAP)
JFS_IP(ip)->acl.flag = 0;
}
/*
* free xtree/data (truncate to zero length):
* free xtree/data pages from cache if COMMIT_PWMAP,
* free xtree/data blocks from persistent block map, and
* free xtree/data blocks from working block map if COMMIT_PWMAP;
*/
if (ip->i_size)
return xtTruncate_pmap(tid, ip, 0);
return 0;
}
/*
* NAME: freeZeroLink()
*
* FUNCTION: for non-directory, called by iClose(),
* free resources of a file from cache and WORKING map
* for a file previously committed with zero link count
* while associated with a pager object,
*
* PARAMETER: ip - pointer to inode of file.
*
* RETURN: 0 -ok
*/
int freeZeroLink(struct inode *ip)
{
int rc = 0;
int type;
jFYI(1, ("freeZeroLink: ip = 0x%p\n", ip));
/* return if not reg or symbolic link or if size is
* already ok.
*/
type = ip->i_mode & S_IFMT;
switch (type) {
case S_IFREG:
break;
case S_IFLNK:
/* if it's contained in the inode, there is nothing to do */
if (ip->i_size <= 256)
return 0;
break;
default:
return 0;
}
/*
* free EA
*/
if (JFS_IP(ip)->ea.flag & DXD_EXTENT) {
s64 xaddr;
int xlen;
maplock_t maplock; /* maplock for COMMIT_WMAP */
pxdlock_t *pxdlock; /* maplock for COMMIT_WMAP */
/* free EA pages from cache */
xaddr = addressDXD(&JFS_IP(ip)->ea);
xlen = lengthDXD(&JFS_IP(ip)->ea);
#ifdef _STILL_TO_PORT
bmExtentInvalidate(ip, xaddr, xlen);
#endif
/* free EA extent from working block map */
maplock.index = 1;
pxdlock = (pxdlock_t *) & maplock;
pxdlock->flag = mlckFREEPXD;
PXDaddress(&pxdlock->pxd, xaddr);
PXDlength(&pxdlock->pxd, xlen);
txFreeMap(ip, pxdlock, 0, COMMIT_WMAP);
}
/*
* free ACL
*/
if (JFS_IP(ip)->acl.flag & DXD_EXTENT) {
s64 xaddr;
int xlen;
maplock_t maplock; /* maplock for COMMIT_WMAP */
pxdlock_t *pxdlock; /* maplock for COMMIT_WMAP */
/* free ACL pages from cache */
xaddr = addressDXD(&JFS_IP(ip)->acl);
xlen = lengthDXD(&JFS_IP(ip)->acl);
#ifdef _STILL_TO_PORT
bmExtentInvalidate(ip, xaddr, xlen);
#endif
/* free ACL extent from working block map */
maplock.index = 1;
pxdlock = (pxdlock_t *) & maplock;
pxdlock->flag = mlckFREEPXD;
PXDaddress(&pxdlock->pxd, xaddr);
PXDlength(&pxdlock->pxd, xlen);
txFreeMap(ip, pxdlock, 0, COMMIT_WMAP);
}
/*
* free xtree/data (truncate to zero length):
* free xtree/data pages from cache, and
* free xtree/data blocks from working block map;
*/
if (ip->i_size)
rc = xtTruncate(0, ip, 0, COMMIT_WMAP);
return rc;
}
/*
* NAME: jfs_link(old_dentry, dir, dentry)
*
* FUNCTION: create a link to the object <old_dentry> by the name
* = <from dentry> in the parent directory <dir>
*
* PARAMETER: old_dentry - dentry of target object
* dir - parent directory of new link
* dentry - dentry of new link to target object
*
* RETURN: Errors from subroutines
*
* note:
* JFS does NOT support link() on directories (to prevent circular
* path in the directory hierarchy);
* EPERM: the target object is a directory, and either the caller
* does not have appropriate privileges or the implementation prohibits
* using link() on directories [XPG4.2].
*
* JFS does NOT support links between file systems:
* EXDEV: target object and new link are on different file systems and
* implementation does not support links between file systems [XPG4.2].
*/
int jfs_link(struct dentry *old_dentry,
struct inode *dir, struct dentry *dentry)
{
int rc;
tid_t tid;
struct inode *ip = old_dentry->d_inode;
ino_t ino;
component_t dname;
btstack_t btstack;
struct inode *iplist[2];
jFYI(1,
("jfs_link: %s %s\n", old_dentry->d_name.name,
dentry->d_name.name));
/* The checks for links between filesystems and permissions are
handled by the VFS layer */
/* JFS does NOT support link() on directories */
if (S_ISDIR(ip->i_mode))
return -EPERM;
IWRITE_LOCK_LIST(2, dir, ip);
tid = txBegin(ip->i_sb, 0);
if (ip->i_nlink == JFS_LINK_MAX) {
rc = EMLINK;
goto out;
}
/*
* scan parent directory for entry/freespace
*/
if ((rc = get_UCSname(&dname, dentry, JFS_SBI(ip->i_sb)->nls_tab)))
goto out;
if ((rc = dtSearch(dir, &dname, &ino, &btstack, JFS_CREATE)))
goto free_dname;
/*
* create entry for new link in parent directory
*/
ino = ip->i_ino;
if ((rc = dtInsert(tid, dir, &dname, &ino, &btstack)))
goto free_dname;
dir->i_version = ++event;
/* update object inode */
ip->i_nlink++; /* for new link */
ip->i_ctime = CURRENT_TIME;
mark_inode_dirty(dir);
atomic_inc(&ip->i_count);
d_instantiate(dentry, ip);
iplist[0] = ip;
iplist[1] = dir;
rc = txCommit(tid, 2, &iplist[0], 0);
free_dname:
/* release the name obtained from get_UCSname() on success and error paths */
free_UCSname(&dname);
out:
IWRITE_UNLOCK(dir);
IWRITE_UNLOCK(ip);
txEnd(tid);
jFYI(1, ("jfs_link: rc:%d\n", rc));
return -rc;
}
/*
* NAME: jfs_symlink(dip, dentry, name)
*
* FUNCTION: creates a symbolic link to <symlink> by name <name>
* in directory <dip>
*
* PARAMETER: dip - parent directory vnode
* dentry - dentry of symbolic link
* name - the path name of the existing object
* that will be the source of the link
*
* RETURN: errors from subroutines
*
* note:
* ENAMETOOLONG: pathname resolution of a symbolic link produced
* an intermediate result whose length exceeds PATH_MAX [XPG4.2]
*/
int jfs_symlink(struct inode *dip, struct dentry *dentry, const char *name)
{
int rc;
tid_t tid;
ino_t ino = 0;
component_t dname;
int ssize; /* source pathname size */
btstack_t btstack;
struct inode *ip = dentry->d_inode;
unchar *i_fastsymlink;
s64 xlen = 0;
int bmask = 0, xsize;
s64 xaddr;
metapage_t *mp;
struct super_block *sb;
tblock_t *tblk;
struct inode *iplist[2];
jFYI(1, ("jfs_symlink: dip:0x%p name:%s\n", dip, name));
IWRITE_LOCK(dip);
ssize = strlen(name) + 1;
tid = txBegin(dip->i_sb, 0);
/*
* search parent directory for entry/freespace
* (dtSearch() returns parent directory page pinned)
*/
if ((rc = get_UCSname(&dname, dentry, JFS_SBI(dip->i_sb)->nls_tab)))
goto out1;
if ((rc = dtSearch(dip, &dname, &ino, &btstack, JFS_CREATE)))
goto out2;
/*
* allocate on-disk/in-memory inode for symbolic link:
* (iAlloc() returns new, locked inode)
*/
ip = ialloc(dip, S_IFLNK | 0777);
if (ip == NULL) {
BT_PUTSEARCH(&btstack);
rc = ENOSPC;
goto out2;
}
tblk = tid_to_tblock(tid);
tblk->xflag |= COMMIT_CREATE;
tblk->ip = ip;
/*
* create entry for symbolic link in parent directory
*/
ino = ip->i_ino;
if ((rc = dtInsert(tid, dip, &dname, &ino, &btstack))) {
jERROR(1, ("jfs_symlink: dtInsert returned %d\n", rc));
/* discard new inode */
ip->i_nlink = 0;
iput(ip);
goto out2;
}
/* fix symlink access permission
* (dir_create() ANDs in the u.u_cmask,
* but symlinks really need to be 777 access)
*/
ip->i_mode |= 0777;
/*
* write symbolic link target path name
*/
xtInitRoot(tid, ip);
/*
* write source path name inline in on-disk inode (fast symbolic link)
*/
if (ssize <= IDATASIZE) {
ip->i_op = &jfs_symlink_inode_operations;
i_fastsymlink = JFS_IP(ip)->i_inline;
memcpy(i_fastsymlink, name, ssize);
ip->i_size = ssize - 1;
jFYI(1,
("jfs_symlink: fast symlink added ssize:%d name:%s \n",
ssize, name));
}
/*
* write source path name in a single extent
*/
else {
jFYI(1, ("jfs_symlink: allocate extent ip:0x%p\n", ip));
ip->i_op = &page_symlink_inode_operations;
ip->i_mapping->a_ops = &jfs_aops;
/*
* even though the data of symlink object (source
* path name) is treated as non-journaled user data,
* it is read/written thru buffer cache for performance.
*/
sb = ip->i_sb;
bmask = JFS_SBI(sb)->bsize - 1;
xsize = (ssize + bmask) & ~bmask;
xaddr = 0;
xlen = xsize >> JFS_SBI(sb)->l2bsize;
if ((rc = xtInsert(tid, ip, 0, 0, xlen, &xaddr, 0)) == 0) {
ip->i_size = ssize - 1;
while (ssize) {
int copy_size = min(ssize, PSIZE);
mp = get_metapage(ip, xaddr, PSIZE, 1);
if (mp == NULL) {
dtDelete(tid, dip, &dname, &ino,
JFS_REMOVE);
ip->i_nlink = 0;
iput(ip);
rc = EIO;
goto out2;
}
memcpy(mp->data, name, copy_size);
flush_metapage(mp);
#if 0
mark_buffer_uptodate(bp, 1);
mark_buffer_dirty(bp, 1);
if (IS_SYNC(dip)) {
ll_rw_block(WRITE, 1, &bp);
wait_on_buffer(bp);
}
brelse(bp);
#endif /* 0 */
ssize -= copy_size;
name += copy_size; /* advance source past the bytes just copied */
xaddr += JFS_SBI(sb)->nbperpage;
}
ip->i_blocks = LBLK2PBLK(sb, xlen);
} else {
dtDelete(tid, dip, &dname, &ino, JFS_REMOVE);
ip->i_nlink = 0;
iput(ip);
rc = ENOSPC;
goto out2;
}
}
dip->i_version = ++event;
insert_inode_hash(ip);
mark_inode_dirty(ip);
d_instantiate(dentry, ip);
/*
* commit update of parent directory and link object
*
* if extent allocation failed (ENOSPC),
* the parent inode is committed regardless to avoid
* backing out parent directory update (by dtInsert())
* and subsequent dtDelete() which is harmless wrt
* integrity concern.
* the symlink inode will be freed by iput() at exit
* as it has a zero link count (by dtDelete()) and
* no permanent resources.
*/
iplist[0] = dip;
if (rc == 0) {
iplist[1] = ip;
rc = txCommit(tid, 2, &iplist[0], 0);
} else
rc = txCommit(tid, 1, &iplist[0], 0);
out2:
free_UCSname(&dname);
out1:
IWRITE_UNLOCK(dip);
txEnd(tid);
jFYI(1, ("jfs_symlink: rc:%d\n", -rc));
return -rc;
}
/*
* NAME: jfs_rename
*
* FUNCTION: rename a file or directory
*/
int jfs_rename(struct inode *old_dir, struct dentry *old_dentry,
struct inode *new_dir, struct dentry *new_dentry)
{
btstack_t btstack;
ino_t ino;
component_t new_dname;
struct inode *new_ip;
component_t old_dname;
struct inode *old_ip;
int rc;
tid_t tid;
tlock_t *tlck;
dtlock_t *dtlck;
lv_t *lv;
int ipcount;
struct inode *iplist[4];
tblock_t *tblk;
s64 new_size = 0;
int commit_flag;
jFYI(1,
("jfs_rename: %s %s\n", old_dentry->d_name.name,
new_dentry->d_name.name));
old_ip = old_dentry->d_inode;
new_ip = new_dentry->d_inode;
if (old_dir == new_dir) {
if (new_ip)
IWRITE_LOCK_LIST(3, old_dir, old_ip, new_ip);
else
IWRITE_LOCK_LIST(2, old_dir, old_ip);
} else {
if (new_ip)
IWRITE_LOCK_LIST(4, old_dir, new_dir, old_ip,
new_ip);
else
IWRITE_LOCK_LIST(3, old_dir, new_dir, old_ip);
}
if ((rc = get_UCSname(&old_dname, old_dentry,
JFS_SBI(old_dir->i_sb)->nls_tab)))
goto out1;
if ((rc = get_UCSname(&new_dname, new_dentry,
JFS_SBI(old_dir->i_sb)->nls_tab)))
goto out2;
/*
* Make sure source inode number is what we think it is
*/
rc = dtSearch(old_dir, &old_dname, &ino, &btstack, JFS_LOOKUP);
if (rc || (ino != old_ip->i_ino)) {
rc = ENOENT;
goto out3;
}
/*
* Make sure dest inode number (if any) is what we think it is
*/
rc = dtSearch(new_dir, &new_dname, &ino, &btstack, JFS_LOOKUP);
if (rc == 0) {
if ((new_ip == 0) || (ino != new_ip->i_ino)) {
rc = ESTALE;
goto out3;
}
} else if (rc != ENOENT)
goto out3;
else if (new_ip) {
/* no entry exists, but one was expected */
rc = ESTALE;
goto out3;
}
if (S_ISDIR(old_ip->i_mode)) {
if (new_ip) {
if (!dtEmpty(new_ip)) {
rc = ENOTEMPTY;
goto out3;
}
} else if ((new_dir != old_dir) &&
(new_dir->i_nlink == JFS_LINK_MAX)) {
rc = EMLINK;
goto out3;
}
}
/*
* The real work starts here
*/
tid = txBegin(new_dir->i_sb, 0);
if (new_ip) {
/*
* Change existing directory entry to new inode number
*/
ino = new_ip->i_ino;
rc = dtModify(tid, new_dir, &new_dname, &ino,
old_ip->i_ino, JFS_RENAME);
if (rc)
goto out4;
new_ip->i_nlink--;
if (S_ISDIR(new_ip->i_mode)) {
new_ip->i_nlink--;
assert(new_ip->i_nlink == 0);
tblk = tid_to_tblock(tid);
tblk->xflag |= COMMIT_DELETE;
tblk->ip = new_ip;
} else if (new_ip->i_nlink == 0) {
assert(!test_cflag(COMMIT_Nolink, new_ip));
/* free block resources */
if ((new_size = commitZeroLink(tid, new_ip)) < 0) {
txAbort(tid, 1); /* Marks FS Dirty */
rc = -new_size; /* We return -rc */
goto out4;
}
tblk = tid_to_tblock(tid);
tblk->xflag |= COMMIT_DELETE;
tblk->ip = new_ip;
} else {
new_ip->i_ctime = CURRENT_TIME;
mark_inode_dirty(new_ip);
}
} else {
/*
* Add new directory entry
*/
rc = dtSearch(new_dir, &new_dname, &ino, &btstack,
JFS_CREATE);
if (rc) {
jERROR(1,
("jfs_rename didn't expect dtSearch to fail w/rc = %d\n",
rc));
goto out4;
}
ino = old_ip->i_ino;
rc = dtInsert(tid, new_dir, &new_dname, &ino, &btstack);
if (rc) {
jERROR(1,
("jfs_rename: dtInsert failed w/rc = %d\n",
rc));
goto out4;
}
if (S_ISDIR(old_ip->i_mode))
new_dir->i_nlink++;
}
/*
* Remove old directory entry
*/
ino = old_ip->i_ino;
rc = dtDelete(tid, old_dir, &old_dname, &ino, JFS_REMOVE);
if (rc) {
jERROR(1,
("jfs_rename did not expect dtDelete to return rc = %d\n",
rc));
txAbort(tid, 1); /* Marks Filesystem dirty */
goto out4;
}
if (S_ISDIR(old_ip->i_mode)) {
old_dir->i_nlink--;
if (old_dir != new_dir) {
/*
* Change inode number of parent for moved directory
*/
JFS_IP(old_ip)->i_dtroot.header.idotdot =
cpu_to_le32(new_dir->i_ino);
/* Linelock header of dtree */
tlck = txLock(tid, old_ip,
(metapage_t *) & JFS_IP(old_ip)->bxflag,
tlckDTREE | tlckBTROOT);
dtlck = (dtlock_t *) & tlck->lock;
ASSERT(dtlck->index == 0);
lv = (lv_t *) & dtlck->lv[0];
lv->offset = 0;
lv->length = 1;
dtlck->index++;
}
}
/*
* Update ctime on changed/moved inodes & mark dirty
*/
old_ip->i_ctime = CURRENT_TIME;
mark_inode_dirty(old_ip);
new_dir->i_version = ++event;
new_dir->i_ctime = CURRENT_TIME;
mark_inode_dirty(new_dir);
/* Build list of inodes modified by this transaction */
ipcount = 0;
iplist[ipcount++] = old_ip;
if (new_ip)
iplist[ipcount++] = new_ip;
iplist[ipcount++] = old_dir;
if (old_dir != new_dir) {
iplist[ipcount++] = new_dir;
old_dir->i_version = ++event;
old_dir->i_ctime = CURRENT_TIME;
mark_inode_dirty(old_dir);
}
/*
* Incomplete truncate of file data can
* result in timing problems unless we synchronously commit the
* transaction.
*/
if (new_size)
commit_flag = COMMIT_SYNC;
else
commit_flag = 0;
rc = txCommit(tid, ipcount, iplist, commit_flag);
/*
* Don't unlock new_ip if COMMIT_Holdlock is set
*/
if (new_ip && test_cflag(COMMIT_Holdlock, new_ip))
new_ip = 0;
out4:
txEnd(tid);
while (new_size && (rc == 0)) {
tid = txBegin(new_ip->i_sb, 0);
new_size = xtTruncate_pmap(tid, new_ip, new_size);
if (new_size < 0) {
txAbort(tid, 1);
rc = -new_size; /* We return -rc */
} else
rc = txCommit(tid, 1, &new_ip, COMMIT_SYNC);
txEnd(tid);
}
out3:
free_UCSname(&new_dname);
out2:
free_UCSname(&old_dname);
out1:
IWRITE_UNLOCK(old_ip);
if (old_dir != new_dir)
IWRITE_UNLOCK(new_dir);
if (new_ip)
IWRITE_UNLOCK(new_ip);
/*
* Truncating the directory index table is not guaranteed. It
* may need to be done iteratively
*/
if (test_cflag(COMMIT_Stale, old_dir)) {
if (old_dir->i_size > 1)
jfs_truncate_nolock(old_dir, 0);
clear_cflag(COMMIT_Stale, old_dir);
}
IWRITE_UNLOCK(old_dir);
jFYI(1, ("jfs_rename: returning %d\n", rc));
return -rc;
}
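Note on the return convention used by jfs_rename and jfs_mknod above: the internal helpers appear to hand back 0 or a positive errno value (for example ENOENT or ESTALE), and the VFS entry point negates it once on the way out with "return -rc". The following is a minimal userspace sketch of that boundary, not kernel code. The names internal_search and vfs_entry are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical internal helper: returns 0 on success or a POSITIVE
 * errno value, the way dtSearch() results are tested in the code above. */
static int internal_search(int found)
{
    return found ? 0 : ENOENT;
}

/* Hypothetical VFS-facing wrapper: negates exactly once, at the boundary. */
static int vfs_entry(int found)
{
    int rc = internal_search(found);

    assert(rc >= 0);    /* internal codes are never negative in this sketch */
    return -rc;         /* 0 stays 0, ENOENT becomes -ENOENT */
}
```

Keeping the negation in one place makes it harder to double-negate an error by accident when helpers call each other.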
/*
* NAME: jfs_mknod
*
* FUNCTION: Create a special file (device)
*/
int jfs_mknod(struct inode *dir, struct dentry *dentry, int mode, int rdev)
{
btstack_t btstack;
component_t dname;
ino_t ino;
struct inode *ip;
struct inode *iplist[2];
int rc;
tid_t tid;
tblock_t *tblk;
jFYI(1, ("jfs_mknod: %s\n", dentry->d_name.name));
if ((rc = get_UCSname(&dname, dentry, JFS_SBI(dir->i_sb)->nls_tab)))
goto out;
IWRITE_LOCK(dir);
ip = ialloc(dir, mode);
if (ip == NULL) {
rc = ENOSPC;
goto out1;
}
tid = txBegin(dir->i_sb, 0);
if ((rc = dtSearch(dir, &dname, &ino, &btstack, JFS_CREATE))) {
ip->i_nlink = 0;
iput(ip);
txEnd(tid);
goto out1;
}
tblk = tid_to_tblock(tid);
tblk->xflag |= COMMIT_CREATE;
tblk->ip = ip;
ino = ip->i_ino;
if ((rc = dtInsert(tid, dir, &dname, &ino, &btstack))) {
ip->i_nlink = 0;
iput(ip);
txEnd(tid);
goto out1;
}
if (S_ISREG(ip->i_mode)) {
ip->i_op = &jfs_file_inode_operations;
ip->i_fop = &jfs_file_operations;
ip->i_mapping->a_ops = &jfs_aops;
} else
init_special_inode(ip, ip->i_mode, rdev);
insert_inode_hash(ip);
mark_inode_dirty(ip);
d_instantiate(dentry, ip);
dir->i_version = ++event;
dir->i_ctime = dir->i_mtime = CURRENT_TIME;
mark_inode_dirty(dir);
iplist[0] = dir;
iplist[1] = ip;
rc = txCommit(tid, 2, iplist, 0);
txEnd(tid);
out1:
IWRITE_UNLOCK(dir);
free_UCSname(&dname);
out:
jFYI(1, ("jfs_mknod: returning %d\n", rc));
return -rc;
}
static struct dentry *jfs_lookup(struct inode *dip, struct dentry *dentry)
{
btstack_t btstack;
ino_t inum;
struct inode *ip;
component_t key;
const char *name = dentry->d_name.name;
int len = dentry->d_name.len;
int rc;
jFYI(1, ("jfs_lookup: name = %s\n", name));
if ((name[0] == '.') && (len == 1))
inum = dip->i_ino;
else if (strcmp(name, "..") == 0)
inum = PARENT(dip);
else {
if ((rc =
get_UCSname(&key, dentry, JFS_SBI(dip->i_sb)->nls_tab)))
return ERR_PTR(-rc);
IREAD_LOCK(dip);
rc = dtSearch(dip, &key, &inum, &btstack, JFS_LOOKUP);
IREAD_UNLOCK(dip);
free_UCSname(&key);
if (rc == ENOENT) {
d_add(dentry, NULL);
return ERR_PTR(0);
} else if (rc) {
jERROR(1,
("jfs_lookup: dtSearch returned %d\n", rc));
return ERR_PTR(-rc);
}
}
ip = iget(dip->i_sb, inum);
if (ip == NULL) {
jERROR(1,
("jfs_lookup: iget failed on inum %d\n",
(uint) inum));
return ERR_PTR(-EACCES);
}
d_add(dentry, ip);
return ERR_PTR(0);
}
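Note on the return values of jfs_lookup above: errors are returned as ERR_PTR(-rc), and success is returned as ERR_PTR(0), which is simply a null pointer. The following userspace sketch shows the general idea of encoding a small negative errno in a pointer value. The helper names and the MAX_ERRNO bound are assumptions made for illustration; the kernel's own macros in this tree may use a different bound.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095  /* assumed bound for this sketch */

/* Encode a (negative) errno value as a pointer. */
static void *err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

/* Recover the errno value from an encoded pointer. */
static long ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* True if the pointer value lies in the top MAX_ERRNO addresses. */
static int is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}
```

With this encoding a caller can tell "no error" (a null pointer) apart from a failure without a separate out-parameter.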
struct inode_operations jfs_dir_inode_operations = {
create: jfs_create,
lookup: jfs_lookup,
link: jfs_link,
unlink: jfs_unlink,
symlink: jfs_symlink,
mkdir: jfs_mkdir,
rmdir: jfs_rmdir,
mknod: jfs_mknod,
rename: jfs_rename,
};
struct file_operations jfs_dir_operations = {
read: generic_read_dir,
readdir: jfs_readdir,
fsync: jfs_fsync,
};
/*
*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/
#include <linux/fs.h>
#include <linux/locks.h>
#include <linux/config.h>
#include <linux/module.h>
#include <linux/slab.h>
#include <asm/uaccess.h>
#include "jfs_incore.h"
#include "jfs_filsys.h"
#include "jfs_metapage.h"
#include "jfs_superblock.h"
#include "jfs_dmap.h"
#include "jfs_imap.h"
#include "jfs_debug.h"
MODULE_DESCRIPTION("The Journaled Filesystem (JFS)");
MODULE_AUTHOR("Steve Best/Dave Kleikamp/Barry Arndt, IBM");
MODULE_LICENSE("GPL");
static kmem_cache_t * jfs_inode_cachep;
static int in_shutdown;
static pid_t jfsIOthread;
static pid_t jfsCommitThread;
static pid_t jfsSyncThread;
struct task_struct *jfsIOtask;
struct task_struct *jfsCommitTask;
struct task_struct *jfsSyncTask;
DECLARE_COMPLETION(jfsIOwait);
#ifdef CONFIG_JFS_DEBUG
int jfsloglevel = 1;
MODULE_PARM(jfsloglevel, "i");
MODULE_PARM_DESC(jfsloglevel, "Specify JFS loglevel (0, 1 or 2)");
#endif
/*
* External declarations
*/
extern int jfs_mount(struct super_block *);
extern int jfs_mount_rw(struct super_block *, int);
extern int jfs_umount(struct super_block *);
extern int jfs_umount_rw(struct super_block *);
extern int jfsIOWait(void *);
extern int jfs_lazycommit(void *);
extern int jfs_sync(void *);
extern void jfs_put_inode(struct inode *inode);
extern void jfs_read_inode(struct inode *inode);
extern void jfs_dirty_inode(struct inode *inode);
extern void jfs_delete_inode(struct inode *inode);
extern void jfs_write_inode(struct inode *inode, int wait);
#if defined(CONFIG_JFS_DEBUG) && defined(CONFIG_PROC_FS)
extern void jfs_proc_init(void);
extern void jfs_proc_clean(void);
#endif
int jfs_thread_stopped(void)
{
unsigned long signr;
siginfo_t info;
spin_lock_irq(&current->sigmask_lock);
signr = dequeue_signal(&current->blocked, &info);
spin_unlock_irq(&current->sigmask_lock);
if (signr == SIGKILL && in_shutdown)
return 1;
return 0;
}
static struct inode *jfs_alloc_inode(struct super_block *sb)
{
struct jfs_inode_info *jfs_inode;
jfs_inode = kmem_cache_alloc(jfs_inode_cachep, GFP_NOFS);
if (!jfs_inode)
return NULL;
return &jfs_inode->vfs_inode;
}
static void jfs_destroy_inode(struct inode *inode)
{
kmem_cache_free(jfs_inode_cachep, JFS_IP(inode));
}
static int jfs_statfs(struct super_block *sb, struct statfs *buf)
{
struct jfs_sb_info *sbi = JFS_SBI(sb);
s64 maxinodes;
imap_t *imap = JFS_IP(sbi->ipimap)->i_imap;
jFYI(1, ("In jfs_statfs\n"));
buf->f_type = JFS_SUPER_MAGIC;
buf->f_bsize = sbi->bsize;
buf->f_blocks = sbi->bmap->db_mapsize;
buf->f_bfree = sbi->bmap->db_nfree;
buf->f_bavail = sbi->bmap->db_nfree;
/*
* If we really return the number of allocated & free inodes, some
* applications will fail because they won't see enough free inodes.
* We'll try to calculate some guess as to how many inodes we can
* really allocate
*
* buf->f_files = atomic_read(&imap->im_numinos);
* buf->f_ffree = atomic_read(&imap->im_numfree);
*/
maxinodes = min((s64) atomic_read(&imap->im_numinos) +
((sbi->bmap->db_nfree >> imap->im_l2nbperiext)
<< L2INOSPEREXT), (s64)0xffffffffLL);
buf->f_files = maxinodes;
buf->f_ffree = maxinodes - (atomic_read(&imap->im_numinos) -
atomic_read(&imap->im_numfree));
buf->f_namelen = JFS_NAME_MAX;
return 0;
}
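Note on the inode estimate in jfs_statfs above: the reported inode count is the number of inodes already allocated plus the number that could still be carved out of free blocks, clamped to 32 bits. The sketch below reproduces that arithmetic in userspace. The value of L2INOSPEREXT (5, meaning 32 inodes per inode extent) is an assumption here, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define L2INOSPEREXT 5  /* assumed: log2 of 32 inodes per inode extent */

/* Allocated inodes plus inodes that free blocks could still hold,
 * clamped to 0xffffffff as in jfs_statfs(). */
static int64_t estimate_max_inodes(int64_t numinos, int64_t nfree_blocks,
                                   int l2nbperiext)
{
    int64_t extra, total;

    assert(numinos >= 0 && nfree_blocks >= 0 && l2nbperiext >= 0);
    extra = (nfree_blocks >> l2nbperiext) << L2INOSPEREXT;
    total = numinos + extra;
    return total < INT64_C(0xffffffff) ? total : INT64_C(0xffffffff);
}

/* Free inodes = estimated maximum minus inodes currently in use. */
static int64_t estimate_free_inodes(int64_t maxinodes, int64_t numinos,
                                    int64_t numfree)
{
    return maxinodes - (numinos - numfree);
}
```

For example, with 64 allocated inodes, 1000 free blocks and 4 blocks per inode extent (l2nbperiext = 2), 250 further extents fit, giving 64 + 250 * 32 = 8064 inodes.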
static void jfs_put_super(struct super_block *sb)
{
struct jfs_sb_info *sbi = JFS_SBI(sb);
int rc;
jFYI(1, ("In jfs_put_super\n"));
rc = jfs_umount(sb);
if (rc) {
jERROR(1, ("jfs_umount failed with return code %d\n", rc));
}
unload_nls(sbi->nls_tab);
sbi->nls_tab = NULL;
/*
* We need to clean out the direct_inode pages since this inode
* is not in the inode hash.
*/
fsync_inode_data_buffers(sbi->direct_inode);
truncate_inode_pages(sbi->direct_mapping, 0);
iput(sbi->direct_inode);
sbi->direct_inode = NULL;
sbi->direct_mapping = NULL;
JFS_SBI(sb) = 0;
kfree(sbi);
}
static int parse_options (char * options, struct jfs_sb_info *sbi)
{
void *nls_map = NULL;
char * this_char;
char * value;
if (!options)
return 1;
for (this_char = strtok (options, ",");
this_char != NULL;
this_char = strtok (NULL, ",")) {
if ((value = strchr (this_char, '=')) != NULL)
*value++ = 0;
if (!strcmp (this_char, "iocharset")) {
if (!value || !*value)
goto needs_arg;
if (nls_map) /* specified iocharset twice! */
unload_nls(nls_map);
nls_map = load_nls(value);
if (!nls_map) {
printk(KERN_ERR "JFS: charset not found\n");
goto cleanup;
}
/* Silently ignore the quota options */
} else if (!strcmp (this_char, "grpquota")
|| !strcmp (this_char, "noquota")
|| !strcmp (this_char, "quota")
|| !strcmp (this_char, "usrquota"))
/* Don't do anything ;-) */ ;
else {
printk(KERN_ERR "JFS: Unrecognized mount option %s\n", this_char);
goto cleanup;
}
}
if (nls_map) {
/* Discard old (if remount) */
if (sbi->nls_tab)
unload_nls(sbi->nls_tab);
sbi->nls_tab = nls_map;
}
return 1;
needs_arg:
printk(KERN_ERR "JFS: %s needs an argument\n", this_char);
cleanup:
if (nls_map)
unload_nls(nls_map);
return 0;
}
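Note on parse_options above: the option string is split on commas, each token is split at the first '=', and unknown options cause a failure return of 0. The following is a userspace sketch of the same loop, assuming a hypothetical struct opts that stores the charset name as text instead of loading an NLS table. It is an illustration only and does not call load_nls.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical result holder; the kernel code keeps an NLS table instead. */
struct opts {
    char iocharset[32];
};

/* Returns 1 on success and 0 on failure, like parse_options() above.
 * strtok() modifies the options buffer in place. */
static int parse_opts(char *options, struct opts *out)
{
    char *tok, *value;

    assert(out != NULL);
    if (!options)
        return 1;       /* no options is not an error */
    for (tok = strtok(options, ","); tok; tok = strtok(NULL, ",")) {
        value = strchr(tok, '=');
        if (value)
            *value++ = '\0';    /* split "key=value" into two strings */
        if (!strcmp(tok, "iocharset")) {
            if (!value || !*value)
                return 0;       /* iocharset needs an argument */
            snprintf(out->iocharset, sizeof(out->iocharset), "%s", value);
        } else if (!strcmp(tok, "grpquota") || !strcmp(tok, "noquota") ||
                   !strcmp(tok, "quota") || !strcmp(tok, "usrquota")) {
            /* quota options are accepted and ignored */
        } else {
            return 0;           /* unrecognized option */
        }
    }
    return 1;
}
```

Because strtok() writes into its argument, the caller has to pass a writable buffer, which is why the kernel function takes a plain char pointer rather than a const one.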
int jfs_remount(struct super_block *sb, int *flags, char *data)
{
struct jfs_sb_info *sbi = JFS_SBI(sb);
if (!parse_options(data, sbi)) {
return -EINVAL;
}
if ((sb->s_flags & MS_RDONLY) && !(*flags & MS_RDONLY)) {
/*
* Invalidate any previously read metadata. fsck may
* have changed the on-disk data since we mounted r/o
*/
truncate_inode_pages(sbi->direct_mapping, 0);
return jfs_mount_rw(sb, 1);
} else if ((!(sb->s_flags & MS_RDONLY)) && (*flags & MS_RDONLY))
return jfs_umount_rw(sb);
return 0;
}
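Note on jfs_remount above: only a change in the read-only bit triggers work. Going from read-only to read-write calls jfs_mount_rw, the reverse calls jfs_umount_rw, and anything else is a no-op. The sketch below isolates that decision in userspace. The enum and function name are hypothetical; MS_RDONLY is given its Linux value of 1.

```c
#include <assert.h>

#define MS_RDONLY 1UL   /* Linux mount flag: mount read-only */

enum remount_action { REMOUNT_NONE, REMOUNT_TO_RW, REMOUNT_TO_RO };

/* Compare the current and requested flags and pick the transition. */
static enum remount_action classify_remount(unsigned long cur_flags,
                                            unsigned long new_flags)
{
    int was_ro = (cur_flags & MS_RDONLY) != 0;
    int want_ro = (new_flags & MS_RDONLY) != 0;

    if (was_ro && !want_ro)
        return REMOUNT_TO_RW;   /* would call jfs_mount_rw(sb, 1) */
    if (!was_ro && want_ro)
        return REMOUNT_TO_RO;   /* would call jfs_umount_rw(sb) */
    return REMOUNT_NONE;
}
```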
static struct super_operations jfs_sops = {
alloc_inode: jfs_alloc_inode,
destroy_inode: jfs_destroy_inode,
read_inode: jfs_read_inode,
dirty_inode: jfs_dirty_inode,
write_inode: jfs_write_inode,
put_inode: jfs_put_inode,
delete_inode: jfs_delete_inode,
put_super: jfs_put_super,
statfs: jfs_statfs,
remount_fs: jfs_remount,
};
static int jfs_fill_super(struct super_block *sb, void *data, int silent)
{
struct jfs_sb_info *sbi;
struct inode *inode;
int rc;
jFYI(1, ("In jfs_fill_super: s_flags=0x%lx\n", sb->s_flags));
sbi = kmalloc(sizeof(struct jfs_sb_info), GFP_KERNEL);
if (!sbi)
return -ENOMEM; /* allocation failure, not lack of disk space */
JFS_SBI(sb) = sbi;
memset(sbi, 0, sizeof(struct jfs_sb_info));
if (!parse_options((char *)data, sbi)) {
kfree(sbi);
return -EINVAL;
}
/*
* Initialize blocksize to 4K.
*/
sb->s_blocksize = PSIZE;
sb->s_blocksize_bits = L2PSIZE;
set_blocksize(sb->s_dev, PSIZE);
sb->s_op = &jfs_sops;
/*
* Initialize direct-mapping inode/address-space
*/
inode = new_inode(sb);
if (inode == NULL)
goto out_kfree;
inode->i_ino = 0;
inode->i_nlink = 1;
inode->i_size = 0x0000010000000000LL;
inode->i_mapping->a_ops = &direct_aops;
inode->i_mapping->gfp_mask = GFP_NOFS;
sbi->direct_inode = inode;
sbi->direct_mapping = inode->i_mapping;
rc = jfs_mount(sb);
if (rc) {
if (!silent) {
jERROR(1,
("jfs_mount failed w/return code = %d\n",
rc));
}
goto out_mount_failed;
}
if (sb->s_flags & MS_RDONLY)
sbi->log = 0;
else {
rc = jfs_mount_rw(sb, 0);
if (rc) {
if (!silent) {
jERROR(1,
("jfs_mount_rw failed w/return code = %d\n",
rc));
}
goto out_no_rw;
}
}
sb->s_magic = JFS_SUPER_MAGIC;
inode = iget(sb, ROOT_I);
if (!inode || is_bad_inode(inode))
goto out_no_root;
sb->s_root = d_alloc_root(inode);
if (!sb->s_root)
goto out_no_root;
if (!sbi->nls_tab)
sbi->nls_tab = load_nls_default();
sb->s_maxbytes = ((u64) sb->s_blocksize) << 40;
#if BITS_PER_LONG == 32
sb->s_maxbytes = min((u64)PAGE_CACHE_SIZE << 32, sb->s_maxbytes);
#endif
return 0;
out_no_root:
jEVENT(1, ("jfs_fill_super: get root inode failed\n"));
if (inode)
iput(inode);
out_no_rw:
rc = jfs_umount(sb);
if (rc) {
jERROR(1, ("jfs_umount failed with return code %d\n", rc));
}
out_mount_failed:
fsync_inode_data_buffers(sbi->direct_inode);
truncate_inode_pages(sbi->direct_mapping, 0);
make_bad_inode(sbi->direct_inode);
iput(sbi->direct_inode);
sbi->direct_inode = NULL;
sbi->direct_mapping = NULL;
out_kfree:
if (sbi->nls_tab)
unload_nls(sbi->nls_tab);
kfree(sbi);
return -EINVAL;
}
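Note on the s_maxbytes computation in jfs_fill_super above: the limit is the block size shifted left by 40 bits, and on 32-bit builds it is additionally capped at PAGE_CACHE_SIZE shifted left by 32 bits. The sketch below works through that arithmetic in userspace. The function name is hypothetical, and the 4096-byte page size used in the example is an assumption that holds on common platforms but not all.

```c
#include <assert.h>
#include <stdint.h>

/* Maximum file size as computed in jfs_fill_super(), parameterized so
 * both the 32-bit and 64-bit branches can be evaluated anywhere. */
static uint64_t jfs_max_bytes(uint32_t blocksize, uint32_t page_cache_size,
                              int bits_per_long)
{
    uint64_t max = (uint64_t)blocksize << 40;

    assert(bits_per_long == 32 || bits_per_long == 64);
    if (bits_per_long == 32) {
        /* The page cache index is a 32-bit long on these builds. */
        uint64_t cap = (uint64_t)page_cache_size << 32;

        if (cap < max)
            max = cap;
    }
    return max;
}
```

With 4K blocks and 4K pages this gives 2^52 bytes on a 64-bit build and 2^44 bytes (16 TiB) on a 32-bit build.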
static struct super_block *jfs_get_sb(struct file_system_type *fs_type,
int flags, char *dev_name, void *data)
{
return get_sb_bdev(fs_type, flags, dev_name, data, jfs_fill_super);
}
static struct file_system_type jfs_fs_type = {
owner: THIS_MODULE,
name: "jfs",
get_sb: jfs_get_sb,
fs_flags: FS_REQUIRES_DEV,
};
extern int metapage_init(void);
extern int txInit(void);
extern void txExit(void);
extern void metapage_exit(void);
static void init_once(void * foo, kmem_cache_t * cachep, unsigned long flags)
{
struct jfs_inode_info *jfs_ip = (struct jfs_inode_info *) foo;
if ((flags & (SLAB_CTOR_VERIFY|SLAB_CTOR_CONSTRUCTOR)) ==
SLAB_CTOR_CONSTRUCTOR) {
INIT_LIST_HEAD(&jfs_ip->anon_inode_list);
INIT_LIST_HEAD(&jfs_ip->mp_list);
RDWRLOCK_INIT(&jfs_ip->rdwrlock);
inode_init_once(&jfs_ip->vfs_inode);
}
}
static int __init init_jfs_fs(void)
{
int rc;
printk("JFS development version: $Name: $\n");
jfs_inode_cachep =
kmem_cache_create("jfs_ip",
sizeof(struct jfs_inode_info),
0, 0, init_once, NULL);
if (jfs_inode_cachep == NULL)
return -ENOMEM;
/*
* Metapage initialization
*/
rc = metapage_init();
if (rc) {
jERROR(1, ("metapage_init failed w/rc = %d\n", rc));
goto free_slab;
}
/*
* Transaction Manager initialization
*/
rc = txInit();
if (rc) {
jERROR(1, ("txInit failed w/rc = %d\n", rc));
goto free_metapage;
}
/*
* I/O completion thread (endio)
*/
jfsIOthread = kernel_thread(jfsIOWait, 0,
CLONE_FS | CLONE_FILES |
CLONE_SIGHAND);
if (jfsIOthread < 0) {
jERROR(1,
("init_jfs_fs: fork failed w/rc = %d\n",
jfsIOthread));
rc = -jfsIOthread; /* rc is still 0 here; record the failure */
goto end_txmngr;
}
wait_for_completion(&jfsIOwait); /* Wait until IO thread starts */
jfsCommitThread = kernel_thread(jfs_lazycommit, 0,
CLONE_FS | CLONE_FILES |
CLONE_SIGHAND);
if (jfsCommitThread < 0) {
jERROR(1,
("init_jfs_fs: fork failed w/rc = %d\n",
jfsCommitThread));
rc = -jfsCommitThread; /* rc is still 0 here; record the failure */
goto kill_iotask;
}
wait_for_completion(&jfsIOwait); /* Wait until Commit thread starts */
jfsSyncThread = kernel_thread(jfs_sync, 0,
CLONE_FS | CLONE_FILES |
CLONE_SIGHAND);
if (jfsSyncThread < 0) {
jERROR(1,
("init_jfs_fs: fork failed w/rc = %d\n",
jfsSyncThread));
rc = -jfsSyncThread; /* rc is still 0 here; record the failure */
goto kill_committask;
}
wait_for_completion(&jfsIOwait); /* Wait until Sync thread starts */
#if defined(CONFIG_JFS_DEBUG) && defined(CONFIG_PROC_FS)
jfs_proc_init();
#endif
return register_filesystem(&jfs_fs_type);
kill_committask:
send_sig(SIGKILL, jfsCommitTask, 1);
wait_for_completion(&jfsIOwait); /* Wait until Commit thread exits */
kill_iotask:
send_sig(SIGKILL, jfsIOtask, 1);
wait_for_completion(&jfsIOwait); /* Wait until IO thread exits */
end_txmngr:
txExit();
free_metapage:
metapage_exit();
free_slab:
kmem_cache_destroy(jfs_inode_cachep);
return -rc;
}
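Note on the error handling in init_jfs_fs above: each initialization step that succeeds gains a matching label at the bottom of the function, and a failure jumps to the label that undoes everything completed so far, in reverse order. The following userspace sketch shows that goto ladder with three hypothetical stages. The stage, undo and trace names are illustrative and record only the order of operations.

```c
#include <assert.h>
#include <errno.h>

static int trace[8];    /* order of operations: +n = init, -n = undo */
static int ntrace;

static void note(int step)
{
    assert(ntrace < 8);
    trace[ntrace++] = step;
}

/* Hypothetical init stage: fails with EIO when n == fail_at. */
static int stage(int n, int fail_at)
{
    if (n == fail_at)
        return EIO;
    note(n);
    return 0;
}

static void undo(int n)
{
    note(-n);
}

/* Run three stages; on failure, undo the completed ones in reverse. */
static int init_all(int fail_at)
{
    int rc;

    ntrace = 0;
    if ((rc = stage(1, fail_at)))
        return -rc;         /* nothing to undo yet */
    if ((rc = stage(2, fail_at)))
        goto undo1;
    if ((rc = stage(3, fail_at)))
        goto undo2;
    return 0;
undo2:
    undo(2);
undo1:
    undo(1);
    return -rc;
}
```

The labels fall through into one another, so jumping to a later label automatically runs every earlier cleanup step as well.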
static void __exit exit_jfs_fs(void)
{
jFYI(1, ("exit_jfs_fs called\n"));
in_shutdown = 1;
txExit();
metapage_exit();
send_sig(SIGKILL, jfsIOtask, 1);
wait_for_completion(&jfsIOwait); /* Wait until IO thread exits */
send_sig(SIGKILL, jfsCommitTask, 1);
wait_for_completion(&jfsIOwait); /* Wait until Commit thread exits */
send_sig(SIGKILL, jfsSyncTask, 1);
wait_for_completion(&jfsIOwait); /* Wait until Sync thread exits */
#if defined(CONFIG_JFS_DEBUG) && defined(CONFIG_PROC_FS)
jfs_proc_clean();
#endif
unregister_filesystem(&jfs_fs_type);
kmem_cache_destroy(jfs_inode_cachep);
}
EXPORT_NO_SYMBOLS;
module_init(init_jfs_fs)
module_exit(exit_jfs_fs)
/*
*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*
* JFS fast symlink handling code
*/
#include <linux/fs.h>
#include "jfs_incore.h"
static int jfs_readlink(struct dentry *, char *buffer, int buflen);
static int jfs_follow_link(struct dentry *dentry, struct nameidata *nd);
/*
* symlinks can't do much...
*/
struct inode_operations jfs_symlink_inode_operations = {
readlink: jfs_readlink,
follow_link: jfs_follow_link,
};
static int jfs_follow_link(struct dentry *dentry, struct nameidata *nd)
{
char *s = JFS_IP(dentry->d_inode)->i_inline;
return vfs_follow_link(nd, s);
}
static int jfs_readlink(struct dentry *dentry, char *buffer, int buflen)
{
char *s = JFS_IP(dentry->d_inode)->i_inline;
return vfs_readlink(dentry, buffer, buflen, s);
}
@@ -12,7 +12,7 @@ fi
 # msdos and Joliet want NLS
 if [ "$CONFIG_JOLIET" = "y" -o "$CONFIG_FAT_FS" != "n" \
 -o "$CONFIG_NTFS_FS" != "n" -o "$CONFIG_NCPFS_NLS" = "y" \
--o "$CONFIG_SMB_NLS" = "y" ]; then
+-o "$CONFIG_SMB_NLS" = "y" -o "$CONFIG_JFS_FS" != "n" ]; then
 define_bool CONFIG_NLS y
 else
 define_bool CONFIG_NLS n