Commit edf3c7a4 authored by Dave Kleikamp's avatar Dave Kleikamp

Add JFS file system

parent a814d16f
......@@ -54,6 +54,7 @@ o binutils 2.9.5.0.25 # ld -v
o util-linux 2.10o # fdformat --version
o modutils 2.4.2 # insmod -V
o e2fsprogs 1.25 # tune2fs
o jfsutils 1.0.14 # fsck.jfs -V
o reiserfsprogs 3.x.0j # reiserfsck 2>&1|grep reiserfsprogs
o pcmcia-cs 3.1.21 # cardmgr -V
o PPP 2.4.0 # pppd --version
......@@ -106,8 +107,8 @@ assembling the 16-bit boot code, removing the need for as86 to compile
your kernel. This change does, however, mean that you need a recent
release of binutils.
System utilities
================
Architectural changes
---------------------
......@@ -165,6 +166,16 @@ E2fsprogs
The latest version of e2fsprogs fixes several bugs in fsck and
debugfs. Obviously, it's a good idea to upgrade.
JFSutils
--------
The jfsutils package contains the utilities for the JFS file system.
The following utilities are available:
o fsck.jfs - initiate replay of the transaction log, and check
and repair a JFS formatted partition.
o mkfs.jfs - create a JFS formatted partition.
o other file system utilities are also available in this package.
Reiserfsprogs
-------------
......@@ -303,6 +314,10 @@ E2fsprogs
---------
o <http://prdownloads.sourceforge.net/e2fsprogs/e2fsprogs-1.25.tar.gz>
JFSutils
--------
o <http://oss.software.ibm.com/jfs>
Reiserfsprogs
-------------
o <ftp://ftp.namesys.com/pub/reiserfsprogs/reiserfsprogs-3.x.0j.tar.gz>
......
......@@ -22,6 +22,8 @@ hpfs.txt
- info and mount options for the OS/2 HPFS.
isofs.txt
- info and mount options for the ISO 9660 (CDROM) filesystem.
jfs.txt
- info and mount options for the JFS filesystem.
ncpfs.txt
- info on Novell Netware(tm) filesystem using NCP protocol.
ntfs.txt
......
IBM's Journaled File System (JFS) for Linux version 1.0.15
Team members
Steve Best sbest@us.ibm.com
Dave Kleikamp shaggy@austin.ibm.com
Barry Arndt barndt@us.ibm.com
Christoph Hellwig hch@caldera.de
Release February 15, 2002 (version 1.0.15)
This is our fifty-third release of IBM's Enterprise JFS technology port to Linux.
Beta 1 was release 0.1.0 on 12/8/2000, Beta 2 was release 0.2.0 on 3/7/2001,
Beta 3 was release 0.3.0 on 4/30/2001, and release 1.0.0 on 6/28/2001.
Function and Fixes in drop 53 (1.0.15)
- Fix trap when appending to very large file
- Moving jfs headers into fs/jfs at Linus' request
- Move up to linux-2.5.4
- Fix file size limit on 32-bit (Andi Kleen)
- make changelog more readable and include only 1.0.0 and above (Christoph Hellwig)
- Don't allocate metadata pages from high memory. JFS keeps them kmapped too long, causing deadlock.
- Fix xtree corruption when creating file with >= 64 GB of physically contiguous dasd
- Replace semaphore with struct completion for thread startup/shutdown (Benedikt Spranger)
- cleanup Tx alloc/free (Christoph Hellwig)
- Move up to linux-2.5.3
- thread cleanups (Christoph Hellwig)
- First step toward making tblocks and tlocks dynamically allocated. Introduce tid_t and lid_t to
insulate the majority of the code from future changes. Also hide TxBlock and TxLock arrays
by using macros to get from tids and lids to real structures.
- minor list-handling cleanup (Christoph Hellwig)
- Replace altnext and altprev with struct list_head
- Clean up the debugging code and add support for collecting statistics (Christoph Hellwig)
Function and Fixes in drop 52 (1.0.14)
- Fix hang in invalidate_metapages when jfs.o is built as a module
- Fix anon_list removal logic in txLock
Function and Fixes in drop 51 (1.0.13)
- chmod changes on newly created directories are lost after umount (bug 2535)
- Page locking race fixes
- Improve metapage locking
- Fix timing window. Lock page while metapage is active to avoid page going
away before the metadata is released. (Fixed crash during mount/umount testing)
- Make changes for 2.5.2 kernel
- Fix race condition truncating large files
Function and Fixes in drop50 (1.0.12)
- Add O_DIRECT support
- Add support for 2.4.17 kernel
- Make sure COMMIT_STALE gets reset before the inode is unlocked. Fixing
this gets rid of XT_GETPAGE errors
- Remove invalid __exit keyword from metapage_exit and txExit.
- fix assert(log->cqueue.head == NULL) by waiting longer
Function and Fixes in drop49 (1.0.11)
- Readdir was not handling multibyte codepages correctly.
- Make mount option parsing more robust.
- Add iocharset mount option.
- Journalling of symlinks incorrect, resulting in logredo failure of -265.
- Add jfsutils information to Changes file
- Improve recoverability of the file system when metadata corruption is detected.
- Fix kernel OOPS when root inode is corrupted
Function and Fixes in drop48 (1.0.10)
- put inodes later on hash queues
- Fix boundary case in xtTruncate
- When invalidating metadata, try to flush the dirty buffers rather than sync them.
- Add another sanity check to avoid trapping when imap is corrupt
- Fix file truncate while removing large file (assert(cmp == 0))
- read_cache_page returns ERR_PTR, not NULL on error
- Add dtSearchNode and dtRelocate
- JFS needs to use generic_file_open & generic_file_llseek
- Remove lazyQwait, etc. It created an unnecessary bottleneck in TxBegin.
Function and Fixes in drop47 (1.0.9)
- Fix data corruption problem when creating files while deleting others. (jitterbug 183)
- Make sure all metadata is written before finalizing the log
- Fix serialization problem in shutdown by setting i_size of directory sooner. (bugzilla #334)
- JFS should quit whining when special files are marked dirty during read-only mount.
- Must always check rc after DT_GETPAGE
- Add diExtendFS
- Removing defconfig from JFS source - not really needed
Function and Fixes in drop46 (1.0.8)
- Synclist was being built backwards causing logredo to quit too early
- jfs_compat.h needs to include module.h
- uncomment EXPORTS_NO_SYMBOLS in super.c
- Minor code cleanup
- xtree of zero-truncated file not being logged
- Fix logging on file truncate
- remove unused metapage fields
Function and Fixes in drop45 (1.0.7)
- cleanup remove IS_KIOBUFIO define.
- cleanup remove TRUNC_NO_TOSS define.
- have jFYI's use the name directly from dentry
- Remove null _ALLOC and _FREE macros and also make spinlocks static.
- cleanup add externs where needed in the header files
- jfs_write_inode is a bad place to call iput. Also limit warnings.
- More truncate cleanup
- Truncate cleanup
- Add missing statics in jfs_metapage.c
- fsync fixes
- Clean up symlink code - use page_symlink_inode_operations
- unicode handling cleanup
- cleanup replace UniChar with wchar_t
- Get rid of CDLL_* macros - use list.h instead
- 2.4.11-prex mount problem: call new_inode instead of get_empty_inode
- use kernel min/max macros
- Add MODULE_LICENSE stub for older kernels
- IA64/gcc3 fixes
- Log Manager fixes, introduce __SLEEP_COND macro
- Mark superblock dirty when some errors detected (forcing fsck to be run).
- More robust remounting from r/o to r/w.
- Misc. cleanup add static where appropriate
- small cleanup in jfs_umount_rw
- add MODULE_ stuff
- Set *dropped_lock in alloc_metapage
- Get rid of unused log list
- cleanup jfs_imap.c to remove _OLD_STUFF and _NO_MORE_MOUNT_INODE defines
- Log manager cleanup
- Transaction manager cleanup
- correct memory allocations flags
- Better handling of iterative truncation
- Change continue to break, otherwise we don't re-acquire LAZY_LOCK
Function and Fixes in drop44 (1.0.6)
- Create jfs_incore.h which merges linux/jfs_fs.h, linux/jfs_fs_i.h, and jfs_fs_sb.h
- Create a configuration option to handle JFS_DEBUG define
- Fixed a few cases where positive error codes were returned to the VFS.
- Replace jfs_dir_read by generic_read_dir.
- jfs_fsync_inode is only called by jfs_fsync_file, merge the two and rename to jfs_fsync.
- Add a bunch of missing externs.
- jfs_rwlock_lock is unused, nuke it.
- Always use atomic set/test_bit operations to protect jfs_ip->cflag
- Combine jfs_ip->flag with jfs_ip->cflag
- Fixed minor format errors reported by fsck
- cflags should be long so bitops always works correctly
- Use GFP_NOFS for runtime memory allocations
- Support VM changes in 2.4.10 of the kernel
- Remove ifdefs supporting older 2.4 kernels. JFS now requires at least 2.4.3 or 2.4.2-ac2
- Simplify and remove one use of IWRITE_TRYLOCK
- jfs_truncate was not passing tid to xtTruncate
- removed obsolete extent_page workaround
- correct recovery from failed diAlloc call (disk full)
- In write_metapage, don't call commit_write if prepare_write failed
Function and Fixes in drop43 (1.0.5)
- Allow separate allocation of JFS-private superblock/inode data.
- Remove checks in namei.c that are already done by the VFS.
- Remove redundant mutex defines.
- Replace all occurrences of #include <linux/malloc.h> with #include <linux/slab.h>
- Work around race condition in remount -fixes OOPS during shutdown
- Truncate large files incrementally (affects directories too)
Function and Fixes in drop42 (1.0.4)
- Fixed compiler warnings in the FS when building on 64 bits systems
- Fixed deadlock where jfsCommit hung in hold_metapage
- Fixed problems with remount
- Reserve metapages for jfsCommit thread
- Get rid of buggy invalidate_metapage & use discard_metapage
- Don't hand metapages to jfsIOthread (too many context switches) (jitterbug 125, bugzilla 238)
- Fix error message in jfs_strtoUCS
Function and Fixes in drop41 (1.0.3)
- Patch to move from previous release to latest release needs to update the version number in super.c
- Jitterbug problems (134,140,152) removing files have been fixed
- Set rc=ENOSPC if ialloc fails in jfs_create and jfs_mkdir
- Fixed jfs_txnmgr.c 775! assert
- Fixed jfs_txnmgr.c 884! assert(mp->nohomeok==0)
- Fix hang - prevent tblocks from being exhausted
- Fix oops trying to mount reiserfs
- Fail more gracefully in jfs_imap.c
- Print more information when char2uni fails
- Fix timing problem between Block map and metapage cache - jitterbug 139
- Code Cleanup (removed many ifdef's, obsolete code, ran code through indent) Mostly 2.4 tree
- Split source tree (Now have a separate source tree for 2.2, 2.4, and jfsutils)
Function and Fixes in drop40 (1.0.2)
- Fixed multiple truncate hang
- Fixed hang on unlink a file and sync happening at the same time
- Improved handling of kmalloc error conditions
- Fixed hang in blk_get_queue and SMP deadlock: bh_end_io calling generic_make_request
(jitterbug 145 and 146)
- stbl was not correctly set in dtDelete
- changed trap to printk in dbAllocAG to avoid system hang
Function and Fixes in drop 39 (1.0.1)
- Fixed hang during copying files on 2.2.x series
- Fixed TxLock compile problem
- Fixed to correctly update the number of blocks for directories (this was causing the FS
to show fsck errors after compiling mozilla).
- Fixed to prevent old data from being written to disk from the page cache.
Function and Fixes in drop 38 (1.0.0)
- Fixed some general log problems
Please send bugs, comments, cards and letters to linuxjfs@us.ibm.com.
You can subscribe to the JFS mailing list by using the link labeled "Mail list Subscribe"
at our web page http://oss.software.ibm.com/jfs/.
IBM's Journaled File System (JFS) for Linux version 1.0.15
Team members
Steve Best sbest@us.ibm.com
Dave Kleikamp shaggy@austin.ibm.com
Barry Arndt barndt@us.ibm.com
Christoph Hellwig hch@caldera.de
Release February 15, 2002 (version 1.0.15)
This is our fifty-third release of IBM's Enterprise JFS technology port to Linux.
Beta 1 was release 0.1.0 on 12/8/2000, Beta 2 was release 0.2.0 on 3/7/2001,
Beta 3 was release 0.3.0 on 4/30/2001, and release 1.0.0 on 6/28/2001.
The changelog.jfs file contains detailed information of changes done in each source
code drop.
JFS has a source tree that can be built on 2.4.3 - 2.4.17 and 2.5.4 kernel.org
source trees.
Our current goal for the 2.5.x kernel series is to track the latest
2.5.x version and support only that version of the kernel.
This will change when the distros start shipping the 2.5.x series of the kernel.
Our current goal for the 2.4.x kernel series is to continue to support
all of the kernels in this series, as we do today.
Anonymous CVS access is available for the JFS tree. The steps below are
what is needed to pull the JFS CVS tree from the oss.software.ibm.com server.
id anoncvs
password anoncvs
To checkout 2.4.x series of the JFS files do the following:
CVSROOT should be set to :pserver:anoncvs@oss.software.ibm.com:/usr/cvs/jfs
cvs checkout linux24
To checkout 2.5.2 series of the JFS files do the following:
CVSROOT should be set to :pserver:anoncvs@oss.software.ibm.com:/usr/cvs/jfs
cvs checkout linux25
To checkout the JFS utilities do the following:
CVSROOT should be set to :pserver:anoncvs@oss.software.ibm.com:/usr/cvs/jfs
cvs checkout jfsutils
The cvs tree contains the latest changes being done to JFS. To receive notification
of commits to the cvs tree, please send e-mail to linuxjfs@us.ibm.com stating that
you would like notifications sent to you.
The jfs-2.4-1.0.15-patch.tar.gz is the easiest way to get the latest file system
source code on your system. There are also patch files that can move your jfs source
code from one release to another. If you have release 1.0.14 and would like to move
to release 1.0.15, the patch file named jfs-2.4-1_0_14-to-1_0_15-patch.gz will do that.
The jfs-2.4-1.0.15-patch.tar.gz file contains a readme and patch files for different
levels of the 2.4 kernel. Please see the README in the jfs-2.4-1.0.15-patch.tar.gz
file for help on applying the two patch files.
The following files in the kernel source tree have been changed so JFS can be built.
The jfs-2.4-1.0.15.tar.gz source tar ball contains each of the files below with
the extension of the kernel level it is associated with. As an example, there are now
four Config.in files named Config.in-2.4.0, Config.in-2.4.5, Config.in-2.4.7 and
Config.in-2.4.17.
If you use the jfs-2.4-1.0.15.tar.gz to build JFS you must rename each of the
kernel files to the file names listed below. The standard kernel from www.kernel.org
is the source of the kernel files that are included in the jfs tar file.
In sub dir fs Config.in, Makefile
In sub dir fs/nls Config.in
In sub dir Documentation Configure.help, Changes
In sub dir Documentation/filesystems 00-INDEX
In sub dir linux MAINTAINERS
Please back up the above files before the JFS tar file is added to the kernel source
tree. All JFS files are located in the include/linux/jfs or fs/jfs sub dirs.
Our development team has used Linux kernel levels 2.4.3 - 2.4.17
with gcc version egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)
for our port so far. A goal of the JFS team is to have JFS run on all architectures
that Linux supports; there is no architecture-specific code in JFS. JFS has been run
on the following architectures (x86, PowerPC, Alpha, s/390, ARM) so far.
To make JFS build, answer y to "Prompt for development and/or incomplete
code/drivers" in the Code maturity level options section during the
"make config" step of building the kernel. In the Filesystems section,
answer m to
JFS filesystem support (experimental) (CONFIG_JFS_FS) [Y/m/n?]
Build in /usr/src/linux with the command:
make modules
make modules_install
If you rebuild jfs.o after having mounted and unmounted a partition, "modprobe -r jfs"
will unload the old module.
File system debugging messages are written to /var/log/messages.
Please see the readme in the utilities package for information about building
the JFS utilities.
JFS TODO list:
Plans for our near term development items
- get defrag capabilities operational in the FS
- get extendfs capabilities operational in the FS
- test EXTENDFS utility for growing JFS partitions
- test defrag utility, which calls into the file system to defragment itself
- add support for block sizes (512,1024,2048)
- add support for logfile on dedicated partition
Longer term work items
- get access control list functionality operational
- get extended attributes functionality operational
- add quota support
Please send bugs, comments, cards and letters to linuxjfs@us.ibm.com.
You can subscribe to the JFS mailing list by using the link labeled "Mail list Subscribe"
at our web page http://oss.software.ibm.com/jfs/.
......@@ -852,6 +852,13 @@ L: jffs-dev@axis.com
W: http://sources.redhat.com/jffs2/
S: Maintained
JFS FILESYSTEM
P: Dave Kleikamp
M: shaggy@austin.ibm.com
L: jfs-discussion@oss.software.ibm.com
W: http://oss.software.ibm.com/jfs/
S: Supported
JOYSTICK DRIVER
P: Vojtech Pavlik
M: vojtech@suse.cz
......
......@@ -859,6 +859,25 @@ CONFIG_ADFS_FS_RW
hard drives and ADFS-formatted floppy disks. This is experimental
code, so if you're unsure, say N.
JFS filesystem support
CONFIG_JFS_FS
This is a port of IBM's Journaled Filesystem. More information is
available in the file Documentation/filesystems/jfs.txt.
If you do not intend to use the JFS filesystem, say N.
JFS Debugging
CONFIG_JFS_DEBUG
If you are experiencing any problems with the JFS filesystem, say
Y here. This will cause additional debugging messages to be
written to the system log. Under normal circumstances, this
results in very little overhead.
JFS Statistics
CONFIG_JFS_STATISTICS
Enabling this option will cause statistics from the JFS file system
to be made available to the user in the /proc/fs/jfs/ directory.
CONFIG_DEVPTS_FS
You should say Y here if you said Y to "Unix98 PTY support" above.
You'll then get a virtual file system which can be mounted on
......
......@@ -54,6 +54,10 @@ tristate 'ISO 9660 CDROM file system support' CONFIG_ISO9660_FS
dep_mbool ' Microsoft Joliet CDROM extensions' CONFIG_JOLIET $CONFIG_ISO9660_FS
dep_mbool ' Transparent decompression extension' CONFIG_ZISOFS $CONFIG_ISO9660_FS
tristate 'JFS filesystem support' CONFIG_JFS_FS
dep_mbool ' JFS debugging' CONFIG_JFS_DEBUG $CONFIG_JFS_FS
dep_mbool ' JFS statistics' CONFIG_JFS_STATISTICS $CONFIG_JFS_FS
tristate 'Minix fs support' CONFIG_MINIX_FS
tristate 'FreeVxFS file system support (VERITAS VxFS(TM) compatible)' CONFIG_VXFS_FS
......
......@@ -67,6 +67,7 @@ subdir-$(CONFIG_ADFS_FS) += adfs
subdir-$(CONFIG_REISERFS_FS) += reiserfs
subdir-$(CONFIG_DEVPTS_FS) += devpts
subdir-$(CONFIG_SUN_OPENPROMFS) += openpromfs
subdir-$(CONFIG_JFS_FS) += jfs
obj-$(CONFIG_BINFMT_AOUT) += binfmt_aout.o
......
#
# Makefile for the Linux JFS filesystem routines.
#
# Note! Dependencies are done automagically by 'make dep', which also
# removes any old dependencies. DON'T put your own dependencies here
# unless it's something special (not a .c file).
#
# Note 2! The CFLAGS definitions are now in the main makefile.
O_TARGET := jfs.o
obj-y := super.o file.o inode.o namei.o jfs_mount.o jfs_umount.o \
jfs_xtree.o jfs_imap.o jfs_debug.o jfs_dmap.o \
jfs_unicode.o jfs_dtree.o jfs_inode.o \
jfs_extent.o symlink.o jfs_metapage.o \
jfs_logmgr.o jfs_txnmgr.o jfs_uniupr.o
obj-m := $(O_TARGET)
EXTRA_CFLAGS += -D_JFS_4K
include $(TOPDIR)/Rules.make
/*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/
#ifndef _H_ENDIAN24
#define _H_ENDIAN24
/*
* fs/jfs/endian24.h:
*
* Endian conversion for 24-bit data
*
*/
#define __swab24(x) \
({ \
__u32 __x = (x); \
((__u32)( \
((__x & (__u32)0x000000ffUL) << 16) | \
(__x & (__u32)0x0000ff00UL) | \
((__x & (__u32)0x00ff0000UL) >> 16) )); \
})
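/*
 * Worked example (illustrative values): __swab24(0x123456) mirrors the
 * three low-order bytes and leaves the unused high byte zero:
 *
 *	(0x56 << 16) | 0x3400 | 0x12 = 0x563412
 */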
#if (defined(__KERNEL__) && defined(__LITTLE_ENDIAN)) || (defined(__BYTE_ORDER) && (__BYTE_ORDER == __LITTLE_ENDIAN))
#define __cpu_to_le24(x) ((__u32)(x))
#define __le24_to_cpu(x) ((__u32)(x))
#else
#define __cpu_to_le24(x) __swab24(x)
#define __le24_to_cpu(x) __swab24(x)
#endif
#ifdef __KERNEL__
#define cpu_to_le24 __cpu_to_le24
#define le24_to_cpu __le24_to_cpu
#endif
#endif /* !_H_ENDIAN24 */
/*
*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/
#include <linux/fs.h>
#include <linux/locks.h>
#include "jfs_incore.h"
#include "jfs_txnmgr.h"
#include "jfs_debug.h"
extern int generic_file_open(struct inode *, struct file *);
extern loff_t generic_file_llseek(struct file *, loff_t, int origin);
extern int jfs_commit_inode(struct inode *, int);
int jfs_fsync(struct file *file, struct dentry *dentry, int datasync)
{
struct inode *inode = dentry->d_inode;
int rc = 0;
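/*
 * Write the dirty data pages first.  The inode itself is committed
 * below only when it is still dirty, this is not a pure datasync
 * request, and datasync-relevant metadata (I_DIRTY_DATASYNC) has
 * changed.
 */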
rc = fsync_inode_data_buffers(inode);
if (!(inode->i_state & I_DIRTY))
return rc;
if (datasync || !(inode->i_state & I_DIRTY_DATASYNC))
return rc;
IWRITE_LOCK(inode);
rc |= jfs_commit_inode(inode, 1);
IWRITE_UNLOCK(inode);
return rc ? -EIO : 0;
}
struct file_operations jfs_file_operations = {
open: generic_file_open,
llseek: generic_file_llseek,
write: generic_file_write,
read: generic_file_read,
mmap: generic_file_mmap,
fsync: jfs_fsync,
};
/*
* Guts of jfs_truncate. Called with locks already held. Can be called
* with directory for truncating directory index table.
*/
void jfs_truncate_nolock(struct inode *ip, loff_t length)
{
loff_t newsize;
tid_t tid;
ASSERT(length >= 0);
if (test_cflag(COMMIT_Nolink, ip)) {
xtTruncate(0, ip, length, COMMIT_WMAP);
return;
}
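/*
 * xtTruncate may shrink the file in several passes to bound the
 * size of a transaction; each pass is committed in its own
 * transaction before the next one starts.
 */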
do {
tid = txBegin(ip->i_sb, 0);
newsize = xtTruncate(tid, ip, length,
COMMIT_TRUNCATE | COMMIT_PWMAP);
if (newsize < 0) {
txEnd(tid);
break;
}
ip->i_mtime = ip->i_ctime = CURRENT_TIME;
mark_inode_dirty(ip);
txCommit(tid, 1, &ip, 0);
txEnd(tid);
} while (newsize > length); /* Truncate isn't always atomic */
}
static void jfs_truncate(struct inode *ip)
{
jFYI(1, ("jfs_truncate: size = 0x%lx\n", (ulong) ip->i_size));
IWRITE_LOCK(ip);
jfs_truncate_nolock(ip, ip->i_size);
IWRITE_UNLOCK(ip);
}
struct inode_operations jfs_file_inode_operations = {
truncate: jfs_truncate,
};
/*
*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/
#include <linux/fs.h>
#include <linux/locks.h>
#include <linux/slab.h>
#include "jfs_incore.h"
#include "jfs_filsys.h"
#include "jfs_imap.h"
#include "jfs_extent.h"
#include "jfs_unicode.h"
#include "jfs_debug.h"
extern struct inode_operations jfs_dir_inode_operations;
extern struct inode_operations jfs_file_inode_operations;
extern struct inode_operations jfs_symlink_inode_operations;
extern struct file_operations jfs_dir_operations;
extern struct file_operations jfs_file_operations;
struct address_space_operations jfs_aops;
extern int freeZeroLink(struct inode *);
void jfs_put_inode(struct inode *inode)
{
jFYI(1, ("In jfs_put_inode, inode = 0x%p\n", inode));
}
void jfs_read_inode(struct inode *inode)
{
jFYI(1, ("In jfs_read_inode, inode = 0x%p\n", inode));
if (diRead(inode))
goto bad_inode;
if (S_ISREG(inode->i_mode)) {
inode->i_op = &jfs_file_inode_operations;
inode->i_fop = &jfs_file_operations;
inode->i_mapping->a_ops = &jfs_aops;
} else if (S_ISDIR(inode->i_mode)) {
inode->i_op = &jfs_dir_inode_operations;
inode->i_fop = &jfs_dir_operations;
inode->i_mapping->a_ops = &jfs_aops;
inode->i_mapping->gfp_mask = GFP_NOFS;
} else if (S_ISLNK(inode->i_mode)) {
if (inode->i_size > IDATASIZE) {
inode->i_op = &page_symlink_inode_operations;
inode->i_mapping->a_ops = &jfs_aops;
} else
inode->i_op = &jfs_symlink_inode_operations;
} else {
init_special_inode(inode, inode->i_mode,
kdev_t_to_nr(inode->i_rdev));
}
return;
bad_inode:
make_bad_inode(inode);
}
/* This define is from fs/open.c */
#define special_file(m) (S_ISCHR(m)||S_ISBLK(m)||S_ISFIFO(m)||S_ISSOCK(m))
/*
* Workhorse of both fsync & write_inode
*/
int jfs_commit_inode(struct inode *inode, int wait)
{
int rc = 0;
tid_t tid;
static int noisy = 5;
jFYI(1, ("In jfs_commit_inode, inode = 0x%p\n", inode));
/*
* Don't commit if inode has been committed since last being
* marked dirty, or if it has been deleted.
*/
if (test_cflag(COMMIT_Nolink, inode) ||
!test_cflag(COMMIT_Dirty, inode))
return 0;
if (isReadOnly(inode)) {
/* kernel allows writes to devices on read-only
* partitions and may think inode is dirty
*/
if (!special_file(inode->i_mode) && noisy) {
jERROR(1, ("jfs_commit_inode(0x%p) called on "
"read-only volume\n", inode));
jERROR(1, ("Is remount racy?\n"));
noisy--;
}
return 0;
}
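/*
 * Commit the inode under a transaction so that logredo can replay
 * it.  txCommit returns a positive error code, hence the negation
 * on return.
 */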
tid = txBegin(inode->i_sb, COMMIT_INODE);
rc = txCommit(tid, 1, &inode, wait ? COMMIT_SYNC : 0);
txEnd(tid);
return -rc;
}
void jfs_write_inode(struct inode *inode, int wait)
{
/*
* If COMMIT_DIRTY is not set, the inode isn't really dirty.
* It has been committed since the last change, but was still
* on the dirty inode list
*/
if (test_cflag(COMMIT_Nolink, inode) ||
!test_cflag(COMMIT_Dirty, inode))
return;
IWRITE_LOCK(inode);
if (jfs_commit_inode(inode, wait)) {
jERROR(1, ("jfs_write_inode: jfs_commit_inode failed!\n"));
}
IWRITE_UNLOCK(inode);
}
void jfs_delete_inode(struct inode *inode)
{
jFYI(1, ("In jfs_delete_inode, inode = 0x%p\n", inode));
IWRITE_LOCK(inode);
if (test_cflag(COMMIT_Freewmap, inode))
freeZeroLink(inode);
diFree(inode);
IWRITE_UNLOCK(inode);
clear_inode(inode);
}
void jfs_dirty_inode(struct inode *inode)
{
static int noisy = 5;
if (isReadOnly(inode)) {
if (!special_file(inode->i_mode) && noisy) {
/* kernel allows writes to devices on read-only
* partitions and may try to mark inode dirty
*/
jERROR(1, ("jfs_dirty_inode called on "
"read-only volume\n"));
jERROR(1, ("Is remount racy?\n"));
noisy--;
}
return;
}
set_cflag(COMMIT_Dirty, inode);
}
static int jfs_get_block(struct inode *ip, sector_t lblock,
struct buffer_head *bh_result, int create)
{
s64 lblock64 = lblock;
int no_size_check = 0;
int rc = 0;
int take_locks;
xad_t xad;
s64 xaddr;
int xflag;
s32 xlen;
/*
* If this is a special inode (imap, dmap) or directory,
* the lock should already be taken
*/
take_locks = ((JFS_IP(ip)->fileset != AGGREGATE_I) &&
!S_ISDIR(ip->i_mode));
/*
* Take appropriate lock on inode
*/
if (take_locks) {
if (create)
IWRITE_LOCK(ip);
else
IREAD_LOCK(ip);
}
/*
* A directory's "data" is the inode index table, but i_size is the
* size of the d-tree, so don't check the offset against i_size
*/
if (S_ISDIR(ip->i_mode))
no_size_check = 1;
if ((no_size_check ||
((lblock64 << ip->i_sb->s_blocksize_bits) < ip->i_size)) &&
(xtLookup
(ip, lblock64, 1, &xflag, &xaddr, &xlen, no_size_check)
== 0) && xlen) {
if (xflag & XAD_NOTRECORDED) {
if (!create)
/*
* Allocated but not recorded, read treats
* this as a hole
*/
goto unlock;
#ifdef _JFS_4K
XADoffset(&xad, lblock64);
XADlength(&xad, xlen);
XADaddress(&xad, xaddr);
#else /* _JFS_4K */
/*
* As long as block size = 4K, this isn't a problem.
* We should mark the whole page not ABNR, but how
* will we know to mark the other blocks BH_New?
*/
BUG();
#endif /* _JFS_4K */
rc = extRecord(ip, &xad);
if (rc)
goto unlock;
bh_result->b_state |= (1UL << BH_New);
}
map_bh(bh_result, ip->i_sb, xaddr);
goto unlock;
}
if (!create)
goto unlock;
/*
* Allocate a new block
*/
#ifdef _JFS_4K
if ((rc =
extHint(ip, lblock64 << ip->i_sb->s_blocksize_bits, &xad)))
goto unlock;
rc = extAlloc(ip, 1, lblock64, &xad, FALSE);
if (rc)
goto unlock;
bh_result->b_state |= (1UL << BH_New);
map_bh(bh_result, ip->i_sb, addressXAD(&xad));
#else /* _JFS_4K */
/*
* We need to do whatever it takes to keep all but the last buffers
* in 4K pages - see jfs_write.c
*/
BUG();
#endif /* _JFS_4K */
unlock:
/*
* Release lock on inode
*/
if (take_locks) {
if (create)
IWRITE_UNLOCK(ip);
else
IREAD_UNLOCK(ip);
}
return -rc;
}
static int jfs_writepage(struct page *page)
{
return block_write_full_page(page, jfs_get_block);
}
static int jfs_readpage(struct file *file, struct page *page)
{
return block_read_full_page(page, jfs_get_block);
}
static int jfs_prepare_write(struct file *file,
struct page *page, unsigned from, unsigned to)
{
return block_prepare_write(page, from, to, jfs_get_block);
}
static int jfs_bmap(struct address_space *mapping, long block)
{
return generic_block_bmap(mapping, block, jfs_get_block);
}
static int jfs_direct_IO(int rw, struct inode *inode, struct kiobuf *iobuf,
unsigned long blocknr, int blocksize)
{
return generic_direct_IO(rw, inode, iobuf, blocknr,
blocksize, jfs_get_block);
}
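/*
 * The read, write and mapping operations below all funnel through
 * jfs_get_block; the generic page cache helpers perform the actual
 * I/O once a block mapping has been resolved.
 */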
struct address_space_operations jfs_aops = {
readpage: jfs_readpage,
writepage: jfs_writepage,
sync_page: block_sync_page,
prepare_write: jfs_prepare_write,
commit_write: generic_commit_write,
bmap: jfs_bmap,
direct_IO: jfs_direct_IO,
};
/*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*
*/
#ifndef _H_JFS_BTREE
#define _H_JFS_BTREE
/*
* jfs_btree.h: B+-tree
*
* JFS B+-tree (dtree and xtree) common definitions
*/
/*
* basic btree page - btpage_t
*/
typedef struct {
s64 next; /* 8: right sibling bn */
s64 prev; /* 8: left sibling bn */
u8 flag; /* 1: */
u8 rsrvd[7]; /* 7: type specific */
s64 self; /* 8: self address */
u8 entry[4064]; /* 4064: */
} btpage_t; /* (4096) */
/* btpage_t flag */
#define BT_TYPE 0x07 /* B+-tree index */
#define BT_ROOT 0x01 /* root page */
#define BT_LEAF 0x02 /* leaf page */
#define BT_INTERNAL 0x04 /* internal page */
#define BT_RIGHTMOST 0x10 /* rightmost page */
#define BT_LEFTMOST 0x20 /* leftmost page */
#define BT_SWAPPED 0x80 /* used by fsck for endian swapping */
/* btorder (in inode) */
#define BT_RANDOM 0x0000
#define BT_SEQUENTIAL 0x0001
#define BT_LOOKUP 0x0010
#define BT_INSERT 0x0020
#define BT_DELETE 0x0040
/*
* btree page buffer cache access
*/
#define BT_IS_ROOT(MP) (((MP)->xflag & COMMIT_PAGE) == 0)
/* get page from buffer page */
#define BT_PAGE(IP, MP, TYPE, ROOT)\
(BT_IS_ROOT(MP) ? (TYPE *)&JFS_IP(IP)->ROOT : (TYPE *)(MP)->data)
/* get the page buffer and the page for specified block address */
#define BT_GETPAGE(IP, BN, MP, TYPE, SIZE, P, RC, ROOT)\
{\
if ((BN) == 0)\
{\
MP = (metapage_t *)&JFS_IP(IP)->bxflag;\
P = (TYPE *)&JFS_IP(IP)->ROOT;\
RC = 0;\
jEVENT(0,("%d BT_GETPAGE returning root\n", __LINE__));\
}\
else\
{\
jEVENT(0,("%d BT_GETPAGE reading block %d\n", __LINE__,\
(int)BN));\
MP = read_metapage((IP), BN, SIZE, 1);\
if (MP) {\
RC = 0;\
P = (MP)->data;\
} else {\
P = NULL;\
jERROR(1,("bread failed!\n"));\
RC = EIO;\
}\
}\
}
#define BT_MARK_DIRTY(MP, IP)\
{\
if (BT_IS_ROOT(MP))\
mark_inode_dirty(IP);\
else\
mark_metapage_dirty(MP);\
}
/* put the page buffer */
#define BT_PUTPAGE(MP)\
{\
if (! BT_IS_ROOT(MP)) \
release_metapage(MP); \
}
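/*
 * Typical access pattern (a minimal sketch; ip, bn and the dtree
 * types stand in for a real caller's state):
 *
 *	metapage_t *mp;
 *	dtpage_t *p;
 *	int rc;
 *
 *	BT_GETPAGE(ip, bn, mp, dtpage_t, PSIZE, p, rc, i_dtroot);
 *	if (rc == 0) {
 *		... modify the page through p ...
 *		BT_MARK_DIRTY(mp, ip);
 *		BT_PUTPAGE(mp);
 *	}
 */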
/*
* btree traversal stack
*
* record the path traversed during the search;
* top frame record the leaf page/entry selected.
*/
#define MAXTREEHEIGHT 8
typedef struct btframe { /* stack frame */
s64 bn; /* 8: */
s16 index; /* 2: */
s16 lastindex; /* 2: */
struct metapage *mp; /* 4: */
} btframe_t; /* (16) */
typedef struct btstack {
btframe_t *top; /* 4: */
int nsplit; /* 4: */
btframe_t stack[MAXTREEHEIGHT];
} btstack_t;
#define BT_CLR(btstack)\
(btstack)->top = (btstack)->stack
#define BT_PUSH(BTSTACK, BN, INDEX)\
{\
(BTSTACK)->top->bn = BN;\
(BTSTACK)->top->index = INDEX;\
++(BTSTACK)->top;\
assert((BTSTACK)->top != &((BTSTACK)->stack[MAXTREEHEIGHT]));\
}
#define BT_POP(btstack)\
( (btstack)->top == (btstack)->stack ? NULL : --(btstack)->top )
#define BT_STACK(btstack)\
( (btstack)->top == (btstack)->stack ? NULL : (btstack)->top )
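/*
 * The search routines (e.g. dtSearch) descend the tree pushing each
 * visited page, then unwind the stack on the way back out; a minimal
 * sketch of the pattern:
 *
 *	btstack_t btstack;
 *	btframe_t *frame;
 *
 *	BT_CLR(&btstack);
 *	... for each internal page on the path ...
 *		BT_PUSH(&btstack, bn, index);
 *	... after the leaf is found ...
 *	while ((frame = BT_POP(&btstack)))
 *		... process or release the saved frames ...
 */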
/* retrieve search results */
#define BT_GETSEARCH(IP, LEAF, BN, MP, TYPE, P, INDEX, ROOT)\
{\
BN = (LEAF)->bn;\
MP = (LEAF)->mp;\
if (BN)\
P = (TYPE *)MP->data;\
else\
P = (TYPE *)&JFS_IP(IP)->ROOT;\
INDEX = (LEAF)->index;\
}
/* put the page buffer of search */
#define BT_PUTSEARCH(BTSTACK)\
{\
if (! BT_IS_ROOT((BTSTACK)->top->mp))\
release_metapage((BTSTACK)->top->mp);\
}
#endif /* _H_JFS_BTREE */
/*
*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/
#include <linux/fs.h>
#include <linux/ctype.h>
#include <linux/module.h>
#include <linux/proc_fs.h>
#include <asm/uaccess.h>
#include "jfs_incore.h"
#include "jfs_filsys.h"
#include "jfs_debug.h"
#ifdef CONFIG_JFS_DEBUG
void dump_mem(char *label, void *data, int length)
{
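/* dump 16 bytes per line: four hex words followed by the
 * printable-ASCII rendering of the same bytes
 */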
int i, j;
int *intptr = data;
char *charptr = data;
char buf[10], line[80];
printk("%s: dump of %d bytes of data at 0x%p\n\n", label, length,
data);
for (i = 0; i < length; i += 16) {
line[0] = 0;
for (j = 0; (j < 4) && (i + j * 4 < length); j++) {
sprintf(buf, " %08x", intptr[i / 4 + j]);
strcat(line, buf);
}
buf[0] = ' ';
buf[2] = 0;
for (j = 0; (j < 16) && (i + j < length); j++) {
buf[1] =
isprint(charptr[i + j]) ? charptr[i + j] : '.';
strcat(line, buf);
}
printk("%s\n", line);
}
}
#ifdef CONFIG_PROC_FS
static int loglevel_read(char *page, char **start, off_t off,
int count, int *eof, void *data)
{
int len;
len = sprintf(page, "%d\n", jfsloglevel);
len -= off;
*start = page + off;
if (len > count)
len = count;
else
*eof = 1;
if (len < 0)
len = 0;
return len;
}
static int loglevel_write(struct file *file, const char *buffer,
unsigned long count, void *data)
{
char c;
if (get_user(c, buffer))
return -EFAULT;
/* yes, I know this is an ASCIIism. --hch */
if (c < '0' || c > '9')
return -EINVAL;
jfsloglevel = c - '0';
return count;
}
extern read_proc_t jfs_txanchor_read;
#ifdef CONFIG_JFS_STATISTICS
extern read_proc_t jfs_lmstats_read;
extern read_proc_t jfs_xtstat_read;
extern read_proc_t jfs_mpstat_read;
#endif
static struct proc_dir_entry *base;
static struct {
const char *name;
read_proc_t *read_fn;
write_proc_t *write_fn;
} Entries[] = {
{ "TxAnchor", jfs_txanchor_read, },
#ifdef CONFIG_JFS_STATISTICS
{ "lmstats", jfs_lmstats_read, },
{ "xtstat", jfs_xtstat_read, },
{ "mpstat", jfs_mpstat_read, },
#endif
{ "loglevel", loglevel_read, loglevel_write }
};
#define NPROCENT (sizeof(Entries)/sizeof(Entries[0]))
void jfs_proc_init(void)
{
int i;
if (!(base = proc_mkdir("jfs", proc_root_fs)))
return;
base->owner = THIS_MODULE;
for (i = 0; i < NPROCENT; i++) {
struct proc_dir_entry *p;
if ((p = create_proc_entry(Entries[i].name, 0, base))) {
p->read_proc = Entries[i].read_fn;
p->write_proc = Entries[i].write_fn;
}
}
}
void jfs_proc_clean(void)
{
int i;
if (base) {
for (i = 0; i < NPROCENT; i++)
remove_proc_entry(Entries[i].name, base);
remove_proc_entry("jfs", base);
}
}
#endif /* CONFIG_PROC_FS */
#endif /* CONFIG_JFS_DEBUG */
/*
*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*
*/
#ifndef _H_JFS_DEBUG
#define _H_JFS_DEBUG
/*
* jfs_debug.h
*
* global debug message, data structure/macro definitions
* under control of CONFIG_JFS_DEBUG, CONFIG_JFS_STATISTICS;
*/
/*
* assert with traditional printf/panic
*/
#ifdef CONFIG_KERNEL_ASSERTS
/* kgdb stuff */
#define assert(p) KERNEL_ASSERT(#p, p)
#else
#define assert(p) {\
if (!(p))\
{\
printk("assert(%s)\n",#p);\
BUG();\
}\
}
#endif
/*
* debug ON
* --------
*/
#ifdef CONFIG_JFS_DEBUG
#define ASSERT(p) assert(p)
/* dump memory contents */
extern void dump_mem(char *label, void *data, int length);
extern int jfsloglevel;
/* information message: e.g., configuration, major event */
#define jFYI(button, prspec) \
do { if (button && jfsloglevel > 1) printk prspec; } while (0)
/* error event message: e.g., i/o error */
extern int jfsERROR;
#define jERROR(button, prspec) \
do { if (button && jfsloglevel > 0) { printk prspec; } } while (0)
/* debug event message: */
#define jEVENT(button,prspec) \
do { if (button) printk prspec; } while (0)
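/*
 * Example (illustrative): jFYI(1, ("In jfs_mount\n")) prints only when
 * jfsloglevel is greater than 1; jERROR fires whenever jfsloglevel is
 * at least 1, and jEVENT whenever its button argument is nonzero.
 */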
/*
* debug OFF
* ---------
*/
#else /* CONFIG_JFS_DEBUG */
#define dump_mem(label,data,length)
#define ASSERT(p)
#define jEVENT(button,prspec)
#define jERROR(button,prspec)
#define jFYI(button,prspec)
#endif /* CONFIG_JFS_DEBUG */
/*
* statistics
* ----------
*/
#ifdef CONFIG_JFS_STATISTICS
#define INCREMENT(x) ((x)++)
#define DECREMENT(x) ((x)--)
#define HIGHWATERMARK(x,y) ((x) = max((x), (y)))
#else
#define INCREMENT(x)
#define DECREMENT(x)
#define HIGHWATERMARK(x,y)
#endif /* CONFIG_JFS_STATISTICS */
#endif /* _H_JFS_DEBUG */
/*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*
*/
#ifndef _H_JFS_DEFRAGFS
#define _H_JFS_DEFRAGFS
/*
* jfs_defragfs.h
*/
/*
* defragfs parameter list
*/
typedef struct {
uint flag; /* 4: */
u8 dev; /* 1: */
u8 pad[3]; /* 3: */
s32 fileset; /* 4: */
u32 inostamp; /* 4: */
u32 ino; /* 4: */
u32 gen; /* 4: */
s64 xoff; /* 8: */
s64 old_xaddr; /* 8: */
s64 new_xaddr; /* 8: */
s32 xlen; /* 4: */
} defragfs_t; /* (52) */
/* plist flag */
#define DEFRAGFS_SYNC 0x80000000
#define DEFRAGFS_COMMIT 0x40000000
#define DEFRAGFS_RELOCATE 0x10000000
#define INODE_TYPE 0x0000F000 /* IFREG or IFDIR */
#define EXTENT_TYPE 0x000000ff
#define DTPAGE 0x00000001
#define XTPAGE 0x00000002
#define DATAEXT 0x00000004
#define EAEXT 0x00000008
#endif /* _H_JFS_DEFRAGFS */
/*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*
*/
#ifndef _H_JFS_DINODE
#define _H_JFS_DINODE
/*
* jfs_dinode.h: on-disk inode manager
*
*/
#define INODESLOTSIZE 128
#define L2INODESLOTSIZE 7
#define log2INODESIZE 9 /* log2(bytes per dinode) */
/*
* on-disk inode (dinode_t): 512 bytes
*
* note: align 64-bit fields on 8-byte boundary.
*/
struct dinode {
/*
* I. base area (128 bytes)
* ------------------------
*
* define generic/POSIX attributes
*/
u32 di_inostamp; /* 4: stamp to show inode belongs to fileset */
s32 di_fileset; /* 4: fileset number */
u32 di_number; /* 4: inode number, aka file serial number */
u32 di_gen; /* 4: inode generation number */
pxd_t di_ixpxd; /* 8: inode extent descriptor */
s64 di_size; /* 8: size */
s64 di_nblocks; /* 8: number of blocks allocated */
u32 di_nlink; /* 4: number of links to the object */
u32 di_uid; /* 4: user id of owner */
u32 di_gid; /* 4: group id of owner */
u32 di_mode; /* 4: attribute, format and permission */
struct timestruc_t di_atime; /* 8: time last data accessed */
struct timestruc_t di_ctime; /* 8: time last status changed */
struct timestruc_t di_mtime; /* 8: time last data modified */
struct timestruc_t di_otime; /* 8: time created */
dxd_t di_acl; /* 16: acl descriptor */
dxd_t di_ea; /* 16: ea descriptor */
u32 di_next_index; /* 4: Next available dir_table index */
s32 di_acltype; /* 4: Type of ACL */
/*
* Extension Areas.
*
* Historically, the inode was partitioned into 4 128-byte areas,
* the last 3 being defined as unions which could have multiple
* uses. The first 96 bytes had been completely unused until
* an index table was added to the directory. It is now more
* useful to describe the last 3/4 of the inode as a single
* union. We would probably be better off redesigning the
* entire structure from scratch, but we don't want to break
* commonality with OS/2's JFS at this time.
*/
union {
struct {
/*
* This table contains the information needed to
* find a directory entry from a 32-bit index.
* If the index is small enough, the table is inline,
* otherwise, an x-tree root overlays this table
*/
dir_table_slot_t _table[12]; /* 96: inline */
dtroot_t _dtroot; /* 288: dtree root */
} _dir; /* (384) */
#define di_dirtable u._dir._table
#define di_dtroot u._dir._dtroot
#define di_parent di_dtroot.header.idotdot
#define di_DASD di_dtroot.header.DASD
struct {
union {
u8 _data[96]; /* 96: unused */
struct {
void *_imap; /* 4: unused */
u32 _gengen; /* 4: generator */
} _imap;
} _u1; /* 96: */
#define di_gengen u._file._u1._imap._gengen
union {
xtpage_t _xtroot;
struct {
u8 unused[16]; /* 16: */
dxd_t _dxd; /* 16: */
union {
u32 _rdev; /* 4: */
u8 _fastsymlink[128];
} _u;
u8 _inlineea[128];
} _special;
} _u2;
} _file;
#define di_xtroot u._file._u2._xtroot
#define di_dxd u._file._u2._special._dxd
#define di_btroot di_xtroot
#define di_inlinedata u._file._u2._special._u
#define di_rdev u._file._u2._special._u._rdev
#define di_fastsymlink u._file._u2._special._u._fastsymlink
#define di_inlineea u._file._u2._special._inlineea
} u;
};
typedef struct dinode dinode_t;
/* extended mode bits (on-disk inode di_mode) */
#define IFJOURNAL 0x00010000 /* journalled file */
#define ISPARSE 0x00020000 /* sparse file enabled */
#define INLINEEA 0x00040000 /* inline EA area free */
#define ISWAPFILE 0x00800000 /* file open for pager swap space */
/* more extended mode bits: attributes for OS/2 */
#define IREADONLY 0x02000000 /* no write access to file */
#define IARCHIVE 0x40000000 /* file archive bit */
#define ISYSTEM 0x08000000 /* system file */
#define IHIDDEN 0x04000000 /* hidden file */
#define IRASH 0x4E000000 /* mask for changeable attributes */
#define INEWNAME 0x80000000 /* non-8.3 filename format */
#define IDIRECTORY 0x20000000 /* directory (shadow of real bit) */
#define ATTRSHIFT 25 /* bits to shift to move attribute
specification to mode position */
#endif /*_H_JFS_DINODE */
......
/*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*
* jfs_dmap.h: block allocation map manager
*/
#ifndef _H_JFS_DMAP
#define _H_JFS_DMAP
#include "jfs_txnmgr.h"
#define BMAPVERSION 1 /* version number */
#define TREESIZE (256+64+16+4+1) /* size of a dmap tree */
#define LEAFIND (64+16+4+1) /* index of 1st leaf of a dmap tree */
#define LPERDMAP 256 /* num leaves per dmap tree */
#define L2LPERDMAP 8 /* l2 number of leaves per dmap tree */
#define DBWORD 32 /* # of blks covered by a map word */
#define L2DBWORD 5 /* l2 # of blks covered by a mword */
#define BUDMIN L2DBWORD /* max free string in a map word */
#define BPERDMAP (LPERDMAP * DBWORD) /* num of blks per dmap */
#define L2BPERDMAP 13 /* l2 num of blks per dmap */
#define CTLTREESIZE (1024+256+64+16+4+1) /* size of a dmapctl tree */
#define CTLLEAFIND (256+64+16+4+1) /* idx of 1st leaf of a dmapctl tree */
#define LPERCTL 1024 /* num of leaves per dmapctl tree */
#define L2LPERCTL 10 /* l2 num of leaves per dmapctl tree */
#define ROOT 0 /* index of the root of a tree */
#define NOFREE ((s8) -1) /* no blocks free */
#define MAXAG 128 /* max number of allocation groups */
#define L2MAXAG 7 /* l2 max num of AG */
#define L2MINAGSZ 25 /* l2 of minimum AG size in bytes */
#define BMAPBLKNO 0 /* lblkno of bmap within the map */
/*
* maximum l2 number of disk blocks at the various dmapctl levels.
*/
#define L2MAXL0SIZE (L2BPERDMAP + 1 * L2LPERCTL)
#define L2MAXL1SIZE (L2BPERDMAP + 2 * L2LPERCTL)
#define L2MAXL2SIZE (L2BPERDMAP + 3 * L2LPERCTL)
/*
* maximum number of disk blocks at the various dmapctl levels.
*/
#define MAXL0SIZE ((s64)1 << L2MAXL0SIZE)
#define MAXL1SIZE ((s64)1 << L2MAXL1SIZE)
#define MAXL2SIZE ((s64)1 << L2MAXL2SIZE)
#define MAXMAPSIZE MAXL2SIZE /* maximum aggregate map size */
/*
* determine the maximum free string for four (lower level) nodes
* of the tree.
*/
static __inline signed char TREEMAX(signed char *cp)
{
signed char tmp1, tmp2;
tmp1 = max(*(cp+2), *(cp+3));
tmp2 = max(*(cp), *(cp+1));
return max(tmp1, tmp2);
}
/*
* convert disk block number to the logical block number of the dmap
* describing the disk block. s is the log2(number of logical blocks per page)
*
* The calculation figures out how many logical pages are in front of the dmap.
* - the number of dmaps preceding it
* - the number of L0 pages preceding its L0 page
* - the number of L1 pages preceding its L1 page
* - 3 is added to account for the L2, L1, and L0 page for this dmap
* - 1 is added to account for the control page of the map.
*/
#define BLKTODMAP(b,s) \
((((b) >> 13) + ((b) >> 23) + ((b) >> 33) + 3 + 1) << (s))
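/*
 * Worked example (4K metadata pages, so s = 0): block 0 yields
 * (0 + 0 + 0 + 3 + 1) = 4, i.e. the first dmap sits behind the map
 * control page and the L2, L1 and L0 pages; block 8192, the first
 * block of the second dmap, yields logical block 5.
 */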
/*
* convert disk block number to the logical block number of the LEVEL 0
* dmapctl describing the disk block. s is the log2(number of logical blocks
* per page)
*
* The calculation figures out how many logical pages are in front of the L0.
* - the number of dmap pages preceding it
* - the number of L0 pages preceding it
* - the number of L1 pages preceding its L1 page
* - 2 is added to account for the L2, and L1 page for this L0
* - 1 is added to account for the control page of the map.
*/
#define BLKTOL0(b,s) \
(((((b) >> 23) << 10) + ((b) >> 23) + ((b) >> 33) + 2 + 1) << (s))
/*
* convert disk block number to the logical block number of the LEVEL 1
* dmapctl describing the disk block. s is the log2(number of logical blocks
* per page)
*
* The calculation figures out how many logical pages are in front of the L1.
* - the number of dmap pages preceding it
* - the number of L0 pages preceding it
* - the number of L1 pages preceding it
* - 1 is added to account for the L2 page
* - 1 is added to account for the control page of the map.
*/
#define BLKTOL1(b,s) \
(((((b) >> 33) << 20) + (((b) >> 33) << 10) + ((b) >> 33) + 1 + 1) << (s))
/*
* convert disk block number to the logical block number of the dmapctl
* at the specified level which describes the disk block.
*/
#define BLKTOCTL(b,s,l) \
(((l) == 2) ? 1 : ((l) == 1) ? BLKTOL1((b),(s)) : BLKTOL0((b),(s)))
/*
* convert aggregate map size to the zero origin dmapctl level of the
* top dmapctl.
*/
#define BMAPSZTOLEV(size) \
(((size) <= MAXL0SIZE) ? 0 : ((size) <= MAXL1SIZE) ? 1 : 2)
/* convert disk block number to allocation group number.
*/
#define BLKTOAG(b,sbi) ((b) >> ((sbi)->bmap->db_agl2size))
/* convert allocation group number to starting disk block
* number.
*/
#define AGTOBLK(a,ip) \
((s64)(a) << (JFS_SBI((ip)->i_sb)->bmap->db_agl2size))
/*
* dmap summary tree
*
* dmaptree_t must be consistent with dmapctl_t.
*/
typedef struct {
s32 nleafs; /* 4: number of tree leafs */
s32 l2nleafs; /* 4: l2 number of tree leafs */
s32 leafidx; /* 4: index of first tree leaf */
s32 height; /* 4: height of the tree */
s8 budmin; /* 1: min l2 tree leaf value to combine */
s8 stree[TREESIZE]; /* TREESIZE: tree */
u8 pad[2]; /* 2: pad to word boundary */
} dmaptree_t; /* - 360 - */
/*
* dmap page per 8K blocks bitmap
*/
typedef struct {
s32 nblocks; /* 4: num blks covered by this dmap */
s32 nfree; /* 4: num of free blks in this dmap */
s64 start; /* 8: starting blkno for this dmap */
dmaptree_t tree; /* 360: dmap tree */
u8 pad[1672]; /* 1672: pad to 2048 bytes */
u32 wmap[LPERDMAP]; /* 1024: bits of the working map */
u32 pmap[LPERDMAP]; /* 1024: bits of the persistent map */
} dmap_t; /* - 4096 - */
/*
* disk map control page per level.
*
* dmapctl_t must be consistent with dmaptree_t.
*/
typedef struct {
s32 nleafs; /* 4: number of tree leafs */
s32 l2nleafs; /* 4: l2 number of tree leafs */
s32 leafidx; /* 4: index of the first tree leaf */
s32 height; /* 4: height of tree */
s8 budmin; /* 1: minimum l2 tree leaf value */
s8 stree[CTLTREESIZE]; /* CTLTREESIZE: dmapctl tree */
u8 pad[2714]; /* 2714: pad to 4096 */
} dmapctl_t; /* - 4096 - */
/*
* common definition for dmaptree_t within dmap and dmapctl
*/
typedef union {
dmaptree_t t1;
dmapctl_t t2;
} dmtree_t;
/* macros for accessing fields within dmtree_t */
#define dmt_nleafs t1.nleafs
#define dmt_l2nleafs t1.l2nleafs
#define dmt_leafidx t1.leafidx
#define dmt_height t1.height
#define dmt_budmin t1.budmin
#define dmt_stree t1.stree
/*
* on-disk aggregate disk allocation map descriptor.
*/
typedef struct {
s64 dn_mapsize; /* 8: number of blocks in aggregate */
s64 dn_nfree; /* 8: num free blks in aggregate map */
s32 dn_l2nbperpage; /* 4: number of blks per page */
s32 dn_numag; /* 4: total number of ags */
s32 dn_maxlevel; /* 4: number of active ags */
s32 dn_maxag; /* 4: max active alloc group number */
s32 dn_agpref; /* 4: preferred alloc group (hint) */
s32 dn_aglevel; /* 4: dmapctl level holding the AG */
s32 dn_agheigth; /* 4: height in dmapctl of the AG */
s32 dn_agwidth; /* 4: width in dmapctl of the AG */
s32 dn_agstart; /* 4: start tree index at AG height */
s32 dn_agl2size; /* 4: l2 num of blks per alloc group */
s64 dn_agfree[MAXAG]; /* 8*MAXAG: per AG free count */
s64 dn_agsize; /* 8: num of blks per alloc group */
s8 dn_maxfreebud; /* 1: max free buddy system */
u8 pad[3007]; /* 3007: pad to 4096 */
} dbmap_t; /* - 4096 - */
/*
* in-memory aggregate disk allocation map descriptor.
*/
typedef struct bmap {
dbmap_t db_bmap; /* on-disk aggregate map descriptor */
struct inode *db_ipbmap; /* ptr to aggregate map incore inode */
struct semaphore db_bmaplock; /* aggregate map lock */
u32 *db_DBmap;
} bmap_t;
/* macros for accessing fields within in-memory aggregate map descriptor */
#define db_mapsize db_bmap.dn_mapsize
#define db_nfree db_bmap.dn_nfree
#define db_agfree db_bmap.dn_agfree
#define db_agsize db_bmap.dn_agsize
#define db_agl2size db_bmap.dn_agl2size
#define db_agwidth db_bmap.dn_agwidth
#define db_agheigth db_bmap.dn_agheigth
#define db_agstart db_bmap.dn_agstart
#define db_numag db_bmap.dn_numag
#define db_maxlevel db_bmap.dn_maxlevel
#define db_aglevel db_bmap.dn_aglevel
#define db_agpref db_bmap.dn_agpref
#define db_maxag db_bmap.dn_maxag
#define db_maxfreebud db_bmap.dn_maxfreebud
#define db_l2nbperpage db_bmap.dn_l2nbperpage
/*
* macros for various conversions needed by the allocators.
* blkstol2(), cntlz(), and cnttz() are operating system dependent functions.
*/
/* convert number of blocks to log2 number of blocks, rounding up to
* the next log2 value if blocks is not a l2 multiple.
*/
#define BLKSTOL2(d) (blkstol2(d))
/* convert number of leafs to log2 leaf value */
#define NLSTOL2BSZ(n) (31 - cntlz((n)) + BUDMIN)
/* convert leaf index to log2 leaf value */
#define LITOL2BSZ(n,m,b) ((((n) == 0) ? (m) : cnttz((n))) + (b))
/* convert a block number to a dmap control leaf index */
#define BLKTOCTLLEAF(b,m) \
(((b) & (((s64)1 << ((m) + L2LPERCTL)) - 1)) >> (m))
/* convert log2 leaf value to buddy size */
#define BUDSIZE(s,m) (1 << ((s) - (m)))
/*
* external references.
*/
extern int dbMount(struct inode *ipbmap);
extern int dbUnmount(struct inode *ipbmap, int mounterror);
extern int dbFree(struct inode *ipbmap, s64 blkno, s64 nblocks);
extern int dbUpdatePMap(struct inode *ipbmap,
int free, s64 blkno, s64 nblocks, tblock_t * tblk);
extern int dbNextAG(struct inode *ipbmap);
extern int dbAlloc(struct inode *ipbmap, s64 hint, s64 nblocks, s64 * results);
extern int dbAllocExact(struct inode *ip, s64 blkno, int nblocks);
extern int dbReAlloc(struct inode *ipbmap,
s64 blkno, s64 nblocks, s64 addnblocks, s64 * results);
extern int dbSync(struct inode *ipbmap);
extern int dbAllocBottomUp(struct inode *ip, s64 blkno, s64 nblocks);
extern int dbExtendFS(struct inode *ipbmap, s64 blkno, s64 nblocks);
extern void dbFinalizeBmap(struct inode *ipbmap);
extern s64 dbMapFileSizeToMapSize(struct inode *ipbmap);
#endif /* _H_JFS_DMAP */
......
/*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/
/*
* Change History :
*
*/
#ifndef _H_JFS_DTREE
#define _H_JFS_DTREE
/*
* jfs_dtree.h: directory B+-tree manager
*/
#include "jfs_btree.h"
typedef union {
struct {
tid_t tid;
struct inode *ip;
u32 ino;
} leaf;
pxd_t xd;
} ddata_t;
/*
* entry segment/slot
*
* an entry consists of type dependent head/only segment/slot and
 * additional segments/slots linked via next field;
* N.B. last/only segment of entry is terminated by next = -1;
*/
/*
* directory page slot
*/
typedef struct {
s8 next; /* 1: */
s8 cnt; /* 1: */
wchar_t name[15]; /* 30: */
} dtslot_t; /* (32) */
#define DATASLOTSIZE 16
#define L2DATASLOTSIZE 4
#define DTSLOTSIZE 32
#define L2DTSLOTSIZE 5
#define DTSLOTHDRSIZE 2
#define DTSLOTDATASIZE 30
#define DTSLOTDATALEN 15
/*
* internal node entry head/only segment
*/
typedef struct {
pxd_t xd; /* 8: child extent descriptor */
s8 next; /* 1: */
u8 namlen; /* 1: */
wchar_t name[11]; /* 22: 2-byte aligned */
} idtentry_t; /* (32) */
#define DTIHDRSIZE 10
#define DTIHDRDATALEN 11
/* compute number of slots for entry */
#define NDTINTERNAL(klen) ( ((4 + (klen)) + (15 - 1)) / 15 )
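/*
 * Worked example (editor's addition): an internal entry for a 20-char
 * key needs NDTINTERNAL(20) = ((4 + 20) + 14) / 15 = 2 slots -- the
 * head segment holds 11 wchars (DTIHDRDATALEN) and one dtslot_t
 * continuation slot holds the remaining 9 (up to DTSLOTDATALEN = 15).
 */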
/*
* leaf node entry head/only segment
*
* For legacy filesystems, name contains 13 wchars -- no index field
*/
typedef struct {
u32 inumber; /* 4: 4-byte aligned */
s8 next; /* 1: */
u8 namlen; /* 1: */
wchar_t name[11]; /* 22: 2-byte aligned */
u32 index; /* 4: index into dir_table */
} ldtentry_t; /* (32) */
#define DTLHDRSIZE 6
#define DTLHDRDATALEN_LEGACY 13 /* Old (OS/2) format */
#define DTLHDRDATALEN 11
/*
* dir_table used for directory traversal during readdir
*/
/*
* Keep persistent index for directory entries
*/
#define DO_INDEX(INODE) (JFS_SBI((INODE)->i_sb)->mntflag & JFS_DIR_INDEX)
/*
* Maximum entry in inline directory table
*/
#define MAX_INLINE_DIRTABLE_ENTRY 13
typedef struct dir_table_slot {
u8 rsrvd; /* 1: */
u8 flag; /* 1: 0 if free */
u8 slot; /* 1: slot within leaf page of entry */
u8 addr1; /* 1: upper 8 bits of leaf page address */
u32 addr2; /* 4: lower 32 bits of leaf page address -OR-
index of next entry when this entry was deleted */
} dir_table_slot_t; /* (8) */
/*
* flag values
*/
#define DIR_INDEX_VALID 1
#define DIR_INDEX_FREE 0
#define DTSaddress(dir_table_slot, address64)\
{\
(dir_table_slot)->addr1 = ((u64)address64) >> 32;\
(dir_table_slot)->addr2 = __cpu_to_le32((address64) & 0xffffffff);\
}
#define addressDTS(dts)\
( ((s64)((dts)->addr1)) << 32 | __le32_to_cpu((dts)->addr2) )
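/*
 * Illustrative round trip (editor's addition): DTSaddress() splits a
 * 40-bit leaf page address across addr1/addr2, storing the low word
 * little-endian, and addressDTS() reassembles it:
 *
 *	dir_table_slot_t dts;
 *	DTSaddress(&dts, 0x0102030405ULL);
 *	// dts.addr1 = 0x01, dts.addr2 = __cpu_to_le32(0x02030405)
 *	assert(addressDTS(&dts) == 0x0102030405LL);
 */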
/* compute number of slots for entry */
#define NDTLEAF_LEGACY(klen) ( ((2 + (klen)) + (15 - 1)) / 15 )
#define NDTLEAF NDTINTERNAL
/*
* directory root page (in-line in on-disk inode):
*
* cf. dtpage_t below.
*/
typedef union {
struct {
dasd_t DASD; /* 16: DASD limit/usage info F226941 */
u8 flag; /* 1: */
u8 nextindex; /* 1: next free entry in stbl */
s8 freecnt; /* 1: free count */
s8 freelist; /* 1: freelist header */
u32 idotdot; /* 4: parent inode number */
s8 stbl[8]; /* 8: sorted entry index table */
} header; /* (32) */
dtslot_t slot[9];
} dtroot_t;
#define PARENT(IP) \
(le32_to_cpu(JFS_IP(IP)->i_dtroot.header.idotdot))
#define DTROOTMAXSLOT 9
#define dtEmpty(IP) (JFS_IP(IP)->i_dtroot.header.nextindex == 0)
/*
* directory regular page:
*
* entry slot array of 32 byte slot
*
* sorted entry slot index table (stbl):
* contiguous slots at slot specified by stblindex,
* 1-byte per entry
* 512 byte block: 16 entry tbl (1 slot)
* 1024 byte block: 32 entry tbl (1 slot)
* 2048 byte block: 64 entry tbl (2 slot)
* 4096 byte block: 128 entry tbl (4 slot)
*
* data area:
* 512 byte block: 16 - 2 = 14 slot
* 1024 byte block: 32 - 2 = 30 slot
* 2048 byte block: 64 - 3 = 61 slot
* 4096 byte block: 128 - 5 = 123 slot
*
* N.B. index is 0-based; index fields refer to slot index
* except nextindex which refers to entry index in stbl;
 * end of entry slot list or freelist is marked with -1.
*/
typedef union {
struct {
s64 next; /* 8: next sibling */
s64 prev; /* 8: previous sibling */
u8 flag; /* 1: */
u8 nextindex; /* 1: next entry index in stbl */
s8 freecnt; /* 1: */
s8 freelist; /* 1: slot index of head of freelist */
u8 maxslot; /* 1: number of slots in page slot[] */
u8 stblindex; /* 1: slot index of start of stbl */
u8 rsrvd[2]; /* 2: */
pxd_t self; /* 8: self pxd */
} header; /* (32) */
dtslot_t slot[128];
} dtpage_t;
#define DTPAGEMAXSLOT 128
#define DT8THPGNODEBYTES 512
#define DT8THPGNODETSLOTS 1
#define DT8THPGNODESLOTS 16
#define DTQTRPGNODEBYTES 1024
#define DTQTRPGNODETSLOTS 1
#define DTQTRPGNODESLOTS 32
#define DTHALFPGNODEBYTES 2048
#define DTHALFPGNODETSLOTS 2
#define DTHALFPGNODESLOTS 64
#define DTFULLPGNODEBYTES 4096
#define DTFULLPGNODETSLOTS 4
#define DTFULLPGNODESLOTS 128
#define DTENTRYSTART 1
/* get sorted entry table of the page */
#define DT_GETSTBL(p) ( ((p)->header.flag & BT_ROOT) ?\
((dtroot_t *)(p))->header.stbl : \
(s8 *)&(p)->slot[(p)->header.stblindex] )
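/*
 * Usage sketch (editor's addition): walking a directory page's entries
 * in sorted order via the stbl, as the dtree routines do.  p may be
 * either a dtroot_t or a dtpage_t thanks to the BT_ROOT test above:
 *
 *	dtpage_t *p = ...;		// current directory page
 *	s8 *stbl = DT_GETSTBL(p);
 *	int i;
 *	for (i = 0; i < p->header.nextindex; i++) {
 *		ldtentry_t *e = (ldtentry_t *) &p->slot[stbl[i]];
 *		// e->namlen wchars in e->name, continued via e->next
 *	}
 */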
/*
* Flags for dtSearch
*/
#define JFS_CREATE 1
#define JFS_LOOKUP 2
#define JFS_REMOVE 3
#define JFS_RENAME 4
#define DIRENTSIZ(namlen) \
( (sizeof(struct dirent) - 2*(JFS_NAME_MAX+1) + 2*((namlen)+1) + 3) &~ 3 )
/*
* external declarations
*/
extern void dtInitRoot(tid_t tid, struct inode *ip, u32 idotdot);
extern int dtSearch(struct inode *ip, component_t * key,
ino_t * data, btstack_t * btstack, int flag);
extern int dtInsert(tid_t tid, struct inode *ip,
component_t * key, ino_t * ino, btstack_t * btstack);
extern int dtDelete(tid_t tid,
struct inode *ip, component_t * key, ino_t * data, int flag);
extern int dtRelocate(tid_t tid,
struct inode *ip, s64 lmxaddr, pxd_t * opxd, s64 nxaddr);
extern int dtModify(tid_t tid, struct inode *ip,
component_t * key, ino_t * orig_ino, ino_t new_ino, int flag);
extern int jfs_readdir(struct file *filp, void *dirent, filldir_t filldir);
#ifdef _JFS_DEBUG_DTREE
extern int dtDisplayTree(struct inode *ip);
extern int dtDisplayPage(struct inode *ip, s64 bn, dtpage_t * p);
#endif /* _JFS_DEBUG_DTREE */
#endif /* !_H_JFS_DTREE */
/*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/
#ifndef _H_JFS_EXTENDFS
#define _H_JFS_EXTENDFS
/*
* jfs_extendfs.h
*/
/*
* extendfs parameter list
*/
typedef struct {
u32 flag; /* 4: */
u8 dev; /* 1: */
u8 pad[3]; /* 3: */
s64 LVSize; /* 8: LV size in LV block */
s64 FSSize; /* 8: FS size in LV block */
s32 LogSize; /* 4: inlinelog size in LV block */
} extendfs_t; /* (28) */
/* plist flag */
#define EXTENDFS_QUERY 0x00000001
#endif /* _H_JFS_EXTENDFS */
/*
*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*
*
*
* Module: jfs_extent.c:
*/
#include <linux/fs.h>
#include "jfs_incore.h"
#include "jfs_dmap.h"
#include "jfs_extent.h"
#include "jfs_debug.h"
/*
* forward references
*/
static int extBalloc(struct inode *, s64, s64 *, s64 *);
static int extBrealloc(struct inode *, s64, s64, s64 *, s64 *);
int extRecord(struct inode *, xad_t *);
static s64 extRoundDown(s64 nb);
/*
* external references
*/
extern int dbExtend(struct inode *, s64, s64, s64);
extern int jfs_commit_inode(struct inode *, int);
#define DPD(a) (printk("(a): %d\n",(a)))
#define DPC(a) (printk("(a): %c\n",(a)))
#define DPL1(a) \
{ \
if ((a) >> 32) \
printk("(a): %x%08x ",(a)); \
else \
printk("(a): %x ",(a) << 32); \
}
#define DPL(a) \
{ \
if ((a) >> 32) \
printk("(a): %x%08x\n",(a)); \
else \
printk("(a): %x\n",(a) << 32); \
}
#define DPD1(a) (printk("(a): %d ",(a)))
#define DPX(a) (printk("(a): %08x\n",(a)))
#define DPX1(a) (printk("(a): %08x ",(a)))
#define DPS(a) (printk("%s\n",(a)))
#define DPE(a) (printk("\nENTERING: %s\n",(a)))
#define DPE1(a) (printk("\nENTERING: %s",(a)))
#define DPS1(a) (printk(" %s ",(a)))
/*
* NAME: extAlloc()
*
* FUNCTION: allocate an extent for a specified page range within a
* file.
*
* PARAMETERS:
* ip - the inode of the file.
* xlen - requested extent length.
 * pno - the starting page number within the file.
* xp - pointer to an xad. on entry, xad describes an
* extent that is used as an allocation hint if the
* xaddr of the xad is non-zero. on successful exit,
* the xad describes the newly allocated extent.
* abnr - boolean_t indicating whether the newly allocated extent
* should be marked as allocated but not recorded.
*
* RETURN VALUES:
* 0 - success
* EIO - i/o error.
* ENOSPC - insufficient disk resources.
*/
int
extAlloc(struct inode *ip, s64 xlen, s64 pno, xad_t * xp, boolean_t abnr)
{
struct jfs_sb_info *sbi = JFS_SBI(ip->i_sb);
s64 nxlen, nxaddr, xoff, hint, xaddr = 0;
int rc, nbperpage;
int xflag;
/* This blocks if we are low on resources */
txBeginAnon(ip->i_sb);
/* validate extent length */
if (xlen > MAXXLEN)
xlen = MAXXLEN;
/* get the number of blocks per page */
nbperpage = sbi->nbperpage;
/* get the page's starting extent offset */
xoff = pno << sbi->l2nbperpage;
/* check if an allocation hint was provided */
if ((hint = addressXAD(xp))) {
/* get the size of the extent described by the hint */
nxlen = lengthXAD(xp);
/* check if the hint is for the portion of the file
* immediately previous to the current allocation
 * request and if the hint extent has the same abnr
* value as the current request. if so, we can
* extend the hint extent to include the current
* extent if we can allocate the blocks immediately
* following the hint extent.
*/
if (offsetXAD(xp) + nxlen == xoff &&
abnr == ((xp->flag & XAD_NOTRECORDED) ? TRUE : FALSE))
xaddr = hint + nxlen;
/* adjust the hint to the last block of the extent */
hint += (nxlen - 1);
}
/* allocate the disk blocks for the extent. initially, extBalloc()
* will try to allocate disk blocks for the requested size (xlen).
 * if this fails (xlen contiguous free blocks not available), it'll
* try to allocate a smaller number of blocks (producing a smaller
* extent), with this smaller number of blocks consisting of the
* requested number of blocks rounded down to the next smaller
* power of 2 number (i.e. 16 -> 8). it'll continue to round down
* and retry the allocation until the number of blocks to allocate
* is smaller than the number of blocks per page.
*/
nxlen = xlen;
if ((rc =
extBalloc(ip, hint ? hint : INOHINT(ip), &nxlen, &nxaddr))) {
return (rc);
}
/* determine the value of the extent flag */
xflag = (abnr == TRUE) ? XAD_NOTRECORDED : 0;
/* if we can extend the hint extent to cover the current request,
* extend it. otherwise, insert a new extent to
* cover the current request.
*/
if (xaddr && xaddr == nxaddr)
rc = xtExtend(0, ip, xoff, (int) nxlen, 0);
else
rc = xtInsert(0, ip, xflag, xoff, (int) nxlen, &nxaddr, 0);
/* if the extend or insert failed,
* free the newly allocated blocks and return the error.
*/
if (rc) {
dbFree(ip, nxaddr, nxlen);
return (rc);
}
/* update the number of blocks allocated to the file */
ip->i_blocks += LBLK2PBLK(ip->i_sb, nxlen);
/* set the results of the extent allocation */
XADaddress(xp, nxaddr);
XADlength(xp, nxlen);
XADoffset(xp, xoff);
xp->flag = xflag;
mark_inode_dirty(ip);
/*
 * COMMIT_Synclist flags an anonymous tlock on a page that is on
 * the sync list.
 * We need to commit the inode to get the page written to disk.
*/
if (test_and_clear_cflag(COMMIT_Synclist,ip))
jfs_commit_inode(ip, 0);
return (0);
}
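/*
 * Caller's-eye sketch (editor's addition, hypothetical caller): the
 * write path can seed the xad with extHint() before allocating one
 * page worth of blocks at page pno.  Assumes L2PSIZE (4K pages) from
 * jfs_filsys.h:
 *
 *	xad_t xad;
 *	int rc = extHint(ip, (s64) pno << L2PSIZE, &xad);
 *	if (rc == 0)
 *		rc = extAlloc(ip, sbi->nbperpage, pno, &xad, FALSE);
 *	// on success, xad describes the newly allocated extent
 */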
/*
* NAME: extRealloc()
*
* FUNCTION: extend the allocation of a file extent containing a
 * partially backed last page.
*
* PARAMETERS:
* ip - the inode of the file.
 * nxlen - requested size of the resulting extent.
* xp - pointer to an xad. on successful exit, the xad
* describes the newly allocated extent.
* abnr - boolean_t indicating whether the newly allocated extent
* should be marked as allocated but not recorded.
*
* RETURN VALUES:
* 0 - success
* EIO - i/o error.
* ENOSPC - insufficient disk resources.
*/
int extRealloc(struct inode *ip, s64 nxlen, xad_t * xp, boolean_t abnr)
{
struct super_block *sb = ip->i_sb;
s64 xaddr, xlen, nxaddr, delta, xoff;
s64 ntail, nextend, ninsert;
int rc, nbperpage = JFS_SBI(sb)->nbperpage;
int xflag;
/* This blocks if we are low on resources */
txBeginAnon(ip->i_sb);
/* validate extent length */
if (nxlen > MAXXLEN)
nxlen = MAXXLEN;
/* get the extend (partial) page's disk block address and
* number of blocks.
*/
xaddr = addressXAD(xp);
xlen = lengthXAD(xp);
xoff = offsetXAD(xp);
/* if the extend page is abnr and if the request is for
* the extent to be allocated and recorded,
* make the page allocated and recorded.
*/
if ((xp->flag & XAD_NOTRECORDED) && !abnr) {
xp->flag = 0;
if ((rc = xtUpdate(0, ip, xp)))
return (rc);
}
	/* try to allocate the requested number of blocks for the
	 * extent. dbReAlloc() first tries to satisfy the request
	 * by extending the allocation in place. otherwise, it will
	 * try to allocate a new set of blocks large enough for the
	 * request. in satisfying a request, dbReAlloc() may allocate
	 * less than what was requested but will always allocate enough
	 * space to satisfy the extend page.
*/
if ((rc = extBrealloc(ip, xaddr, xlen, &nxlen, &nxaddr)))
return (rc);
delta = nxlen - xlen;
/* check if the extend page is not abnr but the request is abnr
* and the allocated disk space is for more than one page. if this
 * is the case, there is a mismatch of abnr between the extend page
 * and the one or more pages following the extend page. as a result,
 * two extents will have to be manipulated. the first will be that
 * of the extent of the extend page and will be manipulated through
 * an xtExtend() or an xtTailgate(), depending upon whether the
 * disk allocation occurred as an in-place extension. the second
* extent will be manipulated (created) through an xtInsert() and
* will be for the pages following the extend page.
*/
if (abnr && (!(xp->flag & XAD_NOTRECORDED)) && (nxlen > nbperpage)) {
ntail = nbperpage;
nextend = ntail - xlen;
ninsert = nxlen - nbperpage;
xflag = XAD_NOTRECORDED;
} else {
ntail = nxlen;
nextend = delta;
ninsert = 0;
xflag = xp->flag;
}
/* if we were able to extend the disk allocation in place,
* extend the extent. otherwise, move the extent to a
* new disk location.
*/
if (xaddr == nxaddr) {
/* extend the extent */
if ((rc = xtExtend(0, ip, xoff + xlen, (int) nextend, 0))) {
dbFree(ip, xaddr + xlen, delta);
return (rc);
}
} else {
/*
* move the extent to a new location:
*
* xtTailgate() accounts for relocated tail extent;
*/
if ((rc = xtTailgate(0, ip, xoff, (int) ntail, nxaddr, 0))) {
dbFree(ip, nxaddr, nxlen);
return (rc);
}
}
/* check if we need to also insert a new extent */
if (ninsert) {
/* perform the insert. if it fails, free the blocks
* to be inserted and make it appear that we only did
* the xtExtend() or xtTailgate() above.
*/
xaddr = nxaddr + ntail;
if (xtInsert (0, ip, xflag, xoff + ntail, (int) ninsert,
&xaddr, 0)) {
dbFree(ip, xaddr, (s64) ninsert);
delta = nextend;
nxlen = ntail;
xflag = 0;
}
}
/* update the inode with the number of blocks allocated */
ip->i_blocks += LBLK2PBLK(sb, delta);
/* set the return results */
XADaddress(xp, nxaddr);
XADlength(xp, nxlen);
XADoffset(xp, xoff);
xp->flag = xflag;
mark_inode_dirty(ip);
return (0);
}
/*
* NAME: extHint()
*
* FUNCTION: produce an extent allocation hint for a file offset.
*
* PARAMETERS:
* ip - the inode of the file.
* offset - file offset for which the hint is needed.
* xp - pointer to the xad that is to be filled in with
* the hint.
*
* RETURN VALUES:
* 0 - success
* EIO - i/o error.
*/
int extHint(struct inode *ip, s64 offset, xad_t * xp)
{
struct super_block *sb = ip->i_sb;
xadlist_t xadl;
lxdlist_t lxdl;
lxd_t lxd;
s64 prev;
int rc, nbperpage = JFS_SBI(sb)->nbperpage;
/* init the hint as "no hint provided" */
XADaddress(xp, 0);
/* determine the starting extent offset of the page previous
* to the page containing the offset.
*/
prev = ((offset & ~POFFSET) >> JFS_SBI(sb)->l2bsize) - nbperpage;
	/* if the offset is in the first page of the file,
	 * no hint is provided.
*/
if (prev < 0)
return (0);
/* prepare to lookup the previous page's extent info */
lxdl.maxnlxd = 1;
lxdl.nlxd = 1;
lxdl.lxd = &lxd;
LXDoffset(&lxd, prev)
LXDlength(&lxd, nbperpage);
xadl.maxnxad = 1;
xadl.nxad = 0;
xadl.xad = xp;
/* perform the lookup */
if ((rc = xtLookupList(ip, &lxdl, &xadl, 0)))
return (rc);
	/* check if no extent exists for the previous page.
* this is possible for sparse files.
*/
if (xadl.nxad == 0) {
// assert(ISSPARSE(ip));
return (0);
}
/* only preserve the abnr flag within the xad flags
* of the returned hint.
*/
xp->flag &= XAD_NOTRECORDED;
assert(xadl.nxad == 1);
assert(lengthXAD(xp) == nbperpage);
return (0);
}
/*
* NAME: extRecord()
*
 * FUNCTION: change a page within a file from not recorded to recorded.
*
* PARAMETERS:
* ip - inode of the file.
* cp - cbuf of the file page.
*
* RETURN VALUES:
* 0 - success
* EIO - i/o error.
* ENOSPC - insufficient disk resources.
*/
int extRecord(struct inode *ip, xad_t * xp)
{
int rc;
txBeginAnon(ip->i_sb);
/* update the extent */
if ((rc = xtUpdate(0, ip, xp)))
return (rc);
#ifdef _STILL_TO_PORT
/* no longer abnr */
cp->cm_abnr = FALSE;
/* mark the cbuf as modified */
cp->cm_modified = TRUE;
#endif /* _STILL_TO_PORT */
return (0);
}
/*
* NAME: extFill()
*
* FUNCTION: allocate disk space for a file page that represents
* a file hole.
*
* PARAMETERS:
* ip - the inode of the file.
 * cp - cbuf of the file page representing the hole.
*
* RETURN VALUES:
* 0 - success
* EIO - i/o error.
* ENOSPC - insufficient disk resources.
*/
int extFill(struct inode *ip, xad_t * xp)
{
int rc, nbperpage = JFS_SBI(ip->i_sb)->nbperpage;
	s64 blkno = offsetXAD(xp) >> JFS_SBI(ip->i_sb)->l2nbperpage;
// assert(ISSPARSE(ip));
/* initialize the extent allocation hint */
XADaddress(xp, 0);
/* allocate an extent to fill the hole */
if ((rc = extAlloc(ip, nbperpage, blkno, xp, FALSE)))
return (rc);
	assert(lengthXAD(xp) == nbperpage);
return (0);
}
/*
* NAME: extBalloc()
*
* FUNCTION: allocate disk blocks to form an extent.
*
* initially, we will try to allocate disk blocks for the
* requested size (nblocks). if this fails (nblocks
 * contiguous free blocks not available), we'll try to allocate
* a smaller number of blocks (producing a smaller extent), with
* this smaller number of blocks consisting of the requested
* number of blocks rounded down to the next smaller power of 2
* number (i.e. 16 -> 8). we'll continue to round down and
* retry the allocation until the number of blocks to allocate
* is smaller than the number of blocks per page.
*
* PARAMETERS:
* ip - the inode of the file.
* hint - disk block number to be used as an allocation hint.
* *nblocks - pointer to an s64 value. on entry, this value specifies
 * the desired number of blocks to be allocated. on successful
* exit, this value is set to the number of blocks actually
* allocated.
* blkno - pointer to a block address that is filled in on successful
* return with the starting block number of the newly
* allocated block range.
*
* RETURN VALUES:
* 0 - success
* EIO - i/o error.
* ENOSPC - insufficient disk resources.
*/
static int
extBalloc(struct inode *ip, s64 hint, s64 * nblocks, s64 * blkno)
{
s64 nb, nblks, daddr, max;
int rc, nbperpage = JFS_SBI(ip->i_sb)->nbperpage;
bmap_t *mp = JFS_SBI(ip->i_sb)->bmap;
/* get the number of blocks to initially attempt to allocate.
* we'll first try the number of blocks requested unless this
 * number is greater than the maximum number of contiguous free
* blocks in the map. in that case, we'll start off with the
* maximum free.
*/
max = (s64) 1 << mp->db_maxfreebud;
if (*nblocks >= max && *nblocks > nbperpage)
nb = nblks = (max > nbperpage) ? max : nbperpage;
else
nb = nblks = *nblocks;
/* try to allocate blocks */
while ((rc = dbAlloc(ip, hint, nb, &daddr))) {
/* if something other than an out of space error,
* stop and return this error.
*/
if (rc != ENOSPC)
return (rc);
/* decrease the allocation request size */
nb = min(nblks, extRoundDown(nb));
/* give up if we cannot cover a page */
if (nb < nbperpage)
return (rc);
}
*nblocks = nb;
*blkno = daddr;
return (0);
}
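/*
 * Worked trace (editor's addition): with nbperpage = 8 and a request of
 * *nblocks = 32, suppose the map has no run of 32 free blocks:
 *
 *	dbAlloc(..., 32, ...) -> ENOSPC
 *	nb = min(32, extRoundDown(32)) = 16;	dbAlloc -> ENOSPC
 *	nb = min(32, extRoundDown(16)) = 8;	dbAlloc -> 0 (success)
 *
 * extBalloc() then returns 8 blocks in *nblocks; had nb dropped below
 * nbperpage (8), the last ENOSPC would have been returned instead.
 */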
/*
* NAME: extBrealloc()
*
* FUNCTION: attempt to extend an extent's allocation.
*
* initially, we will try to extend the extent's allocation
* in place. if this fails, we'll try to move the extent
* to a new set of blocks. if moving the extent, we initially
* will try to allocate disk blocks for the requested size
 * (nnew). if this fails (nnew contiguous free blocks not
 * available), we'll try to allocate a smaller number of
* blocks (producing a smaller extent), with this smaller
* number of blocks consisting of the requested number of
* blocks rounded down to the next smaller power of 2
* number (i.e. 16 -> 8). we'll continue to round down and
* retry the allocation until the number of blocks to allocate
* is smaller than the number of blocks per page.
*
* PARAMETERS:
* ip - the inode of the file.
 * blkno - starting block number of the extent's current allocation.
 * nblks - number of blocks within the extent's current allocation.
 * newnblks - pointer to a s64 value. on entry, this value is the
 * new desired extent size (number of blocks). on
 * successful exit, this value is set to the extent's actual
 * new size (new number of blocks).
 * newblkno - the starting block number of the extent's new allocation.
*
* RETURN VALUES:
* 0 - success
* EIO - i/o error.
* ENOSPC - insufficient disk resources.
*/
static int
extBrealloc(struct inode *ip,
s64 blkno, s64 nblks, s64 * newnblks, s64 * newblkno)
{
int rc;
/* try to extend in place */
if ((rc = dbExtend(ip, blkno, nblks, *newnblks - nblks)) == 0) {
*newblkno = blkno;
return (0);
} else {
if (rc != ENOSPC)
return (rc);
}
/* in place extension not possible.
* try to move the extent to a new set of blocks.
*/
return (extBalloc(ip, blkno, newnblks, newblkno));
}
/*
* NAME: extRoundDown()
*
* FUNCTION: round down a specified number of blocks to the next
* smallest power of 2 number.
*
* PARAMETERS:
 * nb - the number of blocks to round down.
*
* RETURN VALUES:
* next smallest power of 2 number.
*/
static s64 extRoundDown(s64 nb)
{
int i;
u64 m, k;
for (i = 0, m = (u64) 1 << 63; i < 64; i++, m >>= 1) {
if (m & nb)
break;
}
i = 63 - i;
k = (u64) 1 << i;
k = ((k - 1) & nb) ? k : k >> 1;
return (k);
}
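/*
 * Worked values (editor's addition): the loop finds the highest set
 * bit, then the final test steps an exact power of two down one level,
 * matching the "16 -> 8" behavior described in the allocators above:
 *
 *	extRoundDown(13) = 8	(13 = 0b1101,  k = 8,  (k-1) & 13 != 0)
 *	extRoundDown(16) = 8	(16 = 0b10000, k = 16, (k-1) & 16 == 0)
 */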
/*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/
#ifndef _H_JFS_EXTENT
#define _H_JFS_EXTENT
/* get block allocation allocation hint as location of disk inode */
#define INOHINT(ip) \
(addressPXD(&(JFS_IP(ip)->ixpxd)) + lengthPXD(&(JFS_IP(ip)->ixpxd)) - 1)
extern int extAlloc(struct inode *, s64, s64, xad_t *, boolean_t);
extern int extFill(struct inode *, xad_t *);
extern int extHint(struct inode *, s64, xad_t *);
extern int extRealloc(struct inode *, s64, xad_t *, boolean_t);
extern int extRecord(struct inode *, xad_t *);
#endif /* _H_JFS_EXTENT */
/*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*
*/
#ifndef _H_JFS_FILSYS
#define _H_JFS_FILSYS
/*
* jfs_filsys.h
*
* file system (implementation-dependent) constants
*
* refer to <limits.h> for system wide implementation-dependent constants
*/
/*
* file system option (superblock flag)
*/
/* platform option (conditional compilation) */
#define JFS_AIX 0x80000000 /* AIX support */
/* POSIX name/directory support */
#define JFS_OS2 0x40000000 /* OS/2 support */
/* case-insensitive name/directory support */
#define JFS_DFS 0x20000000 /* DCE DFS LFS support */
#define JFS_LINUX 0x10000000 /* Linux support */
/* case-sensitive name/directory support */
/* directory option */
#define JFS_UNICODE 0x00000001 /* unicode name */
/* commit option */
#define JFS_COMMIT 0x00000f00 /* commit option mask */
#define JFS_GROUPCOMMIT 0x00000100 /* group (of 1) commit */
#define JFS_LAZYCOMMIT 0x00000200 /* lazy commit */
#define JFS_TMPFS 0x00000400 /* temporary file system -
* do not log/commit:
*/
/* log logical volume option */
#define JFS_INLINELOG 0x00000800 /* inline log within file system */
#define JFS_INLINEMOVE 0x00001000 /* inline log being moved */
/* Secondary aggregate inode table */
#define JFS_BAD_SAIT 0x00010000 /* current secondary ait is bad */
/* sparse regular file support */
#define JFS_SPARSE 0x00020000 /* sparse regular file */
/* DASD Limits F226941 */
#define JFS_DASD_ENABLED 0x00040000 /* DASD limits enabled */
#define JFS_DASD_PRIME 0x00080000 /* Prime DASD usage on boot */
/* big endian flag */
#define JFS_SWAP_BYTES 0x00100000 /* running on big endian computer */
/* Directory index */
#define JFS_DIR_INDEX 0x00200000 /* Persistent index for */
/* directory entries */
/*
* buffer cache configuration
*/
/* page size */
#ifdef PSIZE
#undef PSIZE
#endif
#define PSIZE 4096 /* page size (in byte) */
#define L2PSIZE 12 /* log2(PSIZE) */
#define POFFSET 4095 /* offset within page */
/* buffer page size */
#define BPSIZE PSIZE
/*
* fs fundamental size
*
* PSIZE >= file system block size >= PBSIZE >= DISIZE
*/
#define PBSIZE 512 /* physical block size (in byte) */
#define L2PBSIZE 9 /* log2(PBSIZE) */
#define DISIZE 512 /* on-disk inode size (in byte) */
#define L2DISIZE 9 /* log2(DISIZE) */
#define IDATASIZE 256 /* inode inline data size */
#define IXATTRSIZE 128 /* inode inline extended attribute size */
#define XTPAGE_SIZE 4096
#define log2_PAGESIZE 12
#define IAG_SIZE 4096
#define IAG_EXTENT_SIZE 4096
#define INOSPERIAG 4096 /* number of disk inodes per iag */
#define L2INOSPERIAG 12 /* l2 number of disk inodes per iag */
#define INOSPEREXT 32 /* number of disk inode per extent */
#define L2INOSPEREXT 5 /* l2 number of disk inode per extent */
#define IXSIZE (DISIZE * INOSPEREXT) /* inode extent size */
#define INOSPERPAGE 8 /* number of disk inodes per 4K page */
#define L2INOSPERPAGE 3 /* log2(INOSPERPAGE) */
#define IAGFREELIST_LWM 64
#define INODE_EXTENT_SIZE IXSIZE /* inode extent size */
#define NUM_INODE_PER_EXTENT INOSPEREXT
#define NUM_INODE_PER_IAG INOSPERIAG
#define MINBLOCKSIZE 512
#define MAXBLOCKSIZE 4096
#define MAXFILESIZE ((s64)1 << 52)
#define JFS_LINK_MAX 65535 /* nlink_t is unsigned short */
/* Minimum number of bytes supported for a JFS partition */
#define MINJFS (0x1000000)
#define MINJFSTEXT "16"
/*
* file system block size -> physical block size
*/
#define LBOFFSET(x) ((x) & (PBSIZE - 1))
#define LBNUMBER(x) ((x) >> L2PBSIZE)
#define LBLK2PBLK(sb,b) ((b) << (sb->s_blocksize_bits - L2PBSIZE))
#define PBLK2LBLK(sb,b) ((b) >> (sb->s_blocksize_bits - L2PBSIZE))
/* size in byte -> last page number */
#define SIZE2PN(size) ( ((s64)((size) - 1)) >> (L2PSIZE) )
/* size in byte -> last file system block number */
#define SIZE2BN(size, l2bsize) ( ((s64)((size) - 1)) >> (l2bsize) )
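/*
 * Worked examples (editor's addition), assuming a 4096-byte file system
 * block on a 512-byte physical block device (s_blocksize_bits = 12):
 *
 *	LBLK2PBLK(sb, 10) = 10 << (12 - 9) = 80	physical blocks
 *	PBLK2LBLK(sb, 80) = 80 >> (12 - 9) = 10	logical blocks
 *	SIZE2PN(8192)     = (8192 - 1) >> 12 = 1	(last page number)
 *	SIZE2BN(8192, 12) = (8192 - 1) >> 12 = 1	(last fs block number)
 */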
/*
* fixed physical block address (physical block size = 512 byte)
*
* NOTE: since we can't guarantee a physical block size of 512 bytes the use of
* these macros should be removed and the byte offset macros used instead.
*/
#define SUPER1_B 64 /* primary superblock */
#define AIMAP_B (SUPER1_B + 8) /* 1st extent of aggregate inode map */
#define AITBL_B (AIMAP_B + 16) /*
* 1st extent of aggregate inode table
*/
#define SUPER2_B (AITBL_B + 32) /* 2ndary superblock pbn */
#define BMAP_B (SUPER2_B + 8) /* block allocation map */
/*
* SIZE_OF_SUPER defines the total amount of space reserved on disk for the
* superblock. This is not the same as the superblock structure, since all of
* this space is not currently being used.
*/
#define SIZE_OF_SUPER PSIZE
/*
* SIZE_OF_AG_TABLE defines the amount of space reserved to hold the AG table
*/
#define SIZE_OF_AG_TABLE PSIZE
/*
* SIZE_OF_MAP_PAGE defines the amount of disk space reserved for each page of
* the inode allocation map (to hold iag)
*/
#define SIZE_OF_MAP_PAGE PSIZE
/*
* fixed byte offset address
*/
#define SUPER1_OFF 0x8000 /* primary superblock */
#define AIMAP_OFF (SUPER1_OFF + SIZE_OF_SUPER)
/*
* Control page of aggregate inode map
* followed by 1st extent of map
*/
#define AITBL_OFF (AIMAP_OFF + (SIZE_OF_MAP_PAGE << 1))
/*
* 1st extent of aggregate inode table
*/
#define SUPER2_OFF (AITBL_OFF + INODE_EXTENT_SIZE)
/*
* secondary superblock
*/
#define BMAP_OFF (SUPER2_OFF + SIZE_OF_SUPER)
/*
* block allocation map
*/
/*
* The following macro is used to indicate the number of reserved disk blocks at
* the front of an aggregate, in terms of physical blocks. This value is
* currently defined to be 32K. This turns out to be the same as the primary
* superblock's address, since it directly follows the reserved blocks.
*/
#define AGGR_RSVD_BLOCKS SUPER1_B
/*
* The following macro is used to indicate the number of reserved bytes at the
* front of an aggregate. This value is currently defined to be 32K. This
* turns out to be the same as the primary superblock's byte offset, since it
* directly follows the reserved blocks.
*/
#define AGGR_RSVD_BYTES SUPER1_OFF
/*
* The following macro defines the byte offset for the first inode extent in
* the aggregate inode table. This allows us to find the self inode to find the
* rest of the table. Currently this value is 44K.
*/
#define AGGR_INODE_TABLE_START AITBL_OFF
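/*
 * Resulting fixed layout (editor's addition, derived from the macros
 * above with PSIZE = 4K and INODE_EXTENT_SIZE = 512 * 32 = 16K):
 *
 *	SUPER1_OFF	32K	primary superblock
 *	AIMAP_OFF	36K	aggregate inode map (control page + map)
 *	AITBL_OFF	44K	first aggregate inode extent
 *	SUPER2_OFF	60K	secondary superblock
 *	BMAP_OFF	64K	block allocation map
 */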
/*
* fixed reserved inode number
*/
/* aggregate inode */
#define AGGR_RESERVED_I 0 /* aggregate inode (reserved) */
#define AGGREGATE_I 1 /* aggregate inode map inode */
#define BMAP_I 2 /* aggregate block allocation map inode */
#define LOG_I 3 /* aggregate inline log inode */
#define BADBLOCK_I 4 /* aggregate bad block inode */
#define FILESYSTEM_I 16 /* 1st/only fileset inode in ait:
* fileset inode map inode
*/
/* per fileset inode */
#define FILESET_RSVD_I 0 /* fileset inode (reserved) */
#define FILESET_EXT_I 1 /* fileset inode extension */
#define ROOT_I 2 /* fileset root inode */
#define ACL_I 3 /* fileset ACL inode */
#define FILESET_OBJECT_I 4 /* the first fileset inode available for a file
* or directory or link...
*/
#define FIRST_FILESET_INO 16 /* the first aggregate inode which describes
* an inode. (To fsck this is also the first
* inode in part 2 of the agg inode table.)
*/
/*
* directory configuration
*/
#define JFS_NAME_MAX 255
#define JFS_PATH_MAX BPSIZE
/*
* file system state (superblock state)
*/
#define FM_CLEAN 0x00000000 /* file system is unmounted and clean */
#define FM_MOUNT 0x00000001 /* file system is mounted cleanly */
#define FM_DIRTY 0x00000002 /* file system was not unmounted and clean
* when mounted or
* commit failure occurred while being mounted:
* fsck() must be run to repair
*/
#define FM_LOGREDO 0x00000004 /* log based recovery (logredo()) failed:
* fsck() must be run to repair
*/
#define FM_EXTENDFS 0x00000008 /* file system extendfs() in progress */
#endif /* _H_JFS_FILSYS */
/*
*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*
*/
/*
* Change History :
*
*/
/*
* jfs_imap.c: inode allocation map manager
*
* Serialization:
* Each AG has a simple lock which is used to control the serialization of
* the AG level lists. This lock should be taken first whenever an AG
* level list will be modified or accessed.
*
* Each IAG is locked by obtaining the buffer for the IAG page.
*
 * There is also an inode lock for the inode map inode. A read lock needs to
* be taken whenever an IAG is read from the map or the global level
* information is read. A write lock needs to be taken whenever the global
* level information is modified or an atomic operation needs to be used.
*
* If more than one IAG is read at one time, the read lock may not
 * be given up until all of the IAGs are read. Otherwise, a deadlock
* may occur when trying to obtain the read lock while another thread
* holding the read lock is waiting on the IAG already being held.
*
* The control page of the inode map is read into memory by diMount().
* Thereafter it should only be modified in memory and then it will be
* written out when the filesystem is unmounted by diUnmount().
*/
#include <linux/fs.h>
#include <linux/slab.h>
#include <linux/locks.h>
#include "jfs_incore.h"
#include "jfs_filsys.h"
#include "jfs_dinode.h"
#include "jfs_dmap.h"
#include "jfs_imap.h"
#include "jfs_metapage.h"
#include "jfs_superblock.h"
#include "jfs_debug.h"
/*
* imap locks
*/
/* iag free list lock */
#define IAGFREE_LOCK_INIT(imap) init_MUTEX(&imap->im_freelock)
#define IAGFREE_LOCK(imap) down(&imap->im_freelock)
#define IAGFREE_UNLOCK(imap) up(&imap->im_freelock)
/* per ag iag list locks */
#define AG_LOCK_INIT(imap,index) init_MUTEX(&(imap->im_aglock[index]))
#define AG_LOCK(imap,agno) down(&imap->im_aglock[agno])
#define AG_UNLOCK(imap,agno) up(&imap->im_aglock[agno])
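/*
 * Locking sketch (editor's addition): the order used throughout this
 * file is the AG lock first, then the imap inode read lock, as in
 * diFree():
 *
 *	AG_LOCK(imap, agno);
 *	IREAD_LOCK(ipimap);
 *	rc = diIAGRead(imap, iagno, &mp);
 *	IREAD_UNLOCK(ipimap);	// only after all needed IAGs are read
 *	...
 *	AG_UNLOCK(imap, agno);
 */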
/*
* external references
*/
extern struct address_space_operations jfs_aops;
/*
* forward references
*/
static int diAllocAG(imap_t *, int, boolean_t, struct inode *);
static int diAllocAny(imap_t *, int, boolean_t, struct inode *);
static int diAllocBit(imap_t *, iag_t *, int);
static int diAllocExt(imap_t *, int, struct inode *);
static int diAllocIno(imap_t *, int, struct inode *);
static int diFindFree(u32, int);
static int diNewExt(imap_t *, iag_t *, int);
static int diNewIAG(imap_t *, int *, int, metapage_t **);
static void duplicateIXtree(struct super_block *, s64, int, s64 *);
static int diIAGRead(imap_t * imap, int, metapage_t **);
static int copy_from_dinode(dinode_t *, struct inode *);
static void copy_to_dinode(dinode_t *, struct inode *);
/*
* debug code for double-checking inode map
*/
/* #define _JFS_DEBUG_IMAP 1 */
#ifdef _JFS_DEBUG_IMAP
#define DBG_DIINIT(imap) DBGdiInit(imap)
#define DBG_DIALLOC(imap, ino) DBGdiAlloc(imap, ino)
#define DBG_DIFREE(imap, ino) DBGdiFree(imap, ino)
static void *DBGdiInit(imap_t * imap);
static void DBGdiAlloc(imap_t * imap, ino_t ino);
static void DBGdiFree(imap_t * imap, ino_t ino);
#else
#define DBG_DIINIT(imap)
#define DBG_DIALLOC(imap, ino)
#define DBG_DIFREE(imap, ino)
#endif /* _JFS_DEBUG_IMAP */
/*
* NAME: diMount()
*
* FUNCTION: initialize the incore inode map control structures for
* a fileset or aggregate init time.
*
* the inode map's control structure (dinomap_t) is
* brought in from disk and placed in virtual memory.
*
* PARAMETERS:
* ipimap - pointer to inode map inode for the aggregate or fileset.
*
* RETURN VALUES:
* 0 - success
* ENOMEM - insufficient free virtual memory.
* EIO - i/o error.
*/
int diMount(struct inode *ipimap)
{
imap_t *imap;
metapage_t *mp;
int index;
dinomap_t *dinom_le;
/*
* allocate/initialize the in-memory inode map control structure
*/
/* allocate the in-memory inode map control structure. */
imap = (imap_t *) kmalloc(sizeof(imap_t), GFP_KERNEL);
if (imap == NULL) {
jERROR(1, ("diMount: kmalloc returned NULL!\n"));
return (ENOMEM);
}
/* read the on-disk inode map control structure. */
mp = read_metapage(ipimap,
IMAPBLKNO << JFS_SBI(ipimap->i_sb)->l2nbperpage,
PSIZE, 0);
if (mp == NULL) {
kfree(imap);
return (EIO);
}
/* copy the on-disk version to the in-memory version. */
dinom_le = (dinomap_t *) mp->data;
imap->im_freeiag = le32_to_cpu(dinom_le->in_freeiag);
imap->im_nextiag = le32_to_cpu(dinom_le->in_nextiag);
atomic_set(&imap->im_numinos, le32_to_cpu(dinom_le->in_numinos));
atomic_set(&imap->im_numfree, le32_to_cpu(dinom_le->in_numfree));
imap->im_nbperiext = le32_to_cpu(dinom_le->in_nbperiext);
imap->im_l2nbperiext = le32_to_cpu(dinom_le->in_l2nbperiext);
for (index = 0; index < MAXAG; index++) {
imap->im_agctl[index].inofree =
le32_to_cpu(dinom_le->in_agctl[index].inofree);
imap->im_agctl[index].extfree =
le32_to_cpu(dinom_le->in_agctl[index].extfree);
imap->im_agctl[index].numinos =
le32_to_cpu(dinom_le->in_agctl[index].numinos);
imap->im_agctl[index].numfree =
le32_to_cpu(dinom_le->in_agctl[index].numfree);
}
/* release the buffer. */
release_metapage(mp);
/*
* allocate/initialize inode allocation map locks
*/
/* allocate and init iag free list lock */
IAGFREE_LOCK_INIT(imap);
/* allocate and init ag list locks */
for (index = 0; index < MAXAG; index++) {
AG_LOCK_INIT(imap, index);
}
/* bind the inode map inode and inode map control structure
* to each other.
*/
imap->im_ipimap = ipimap;
JFS_IP(ipimap)->i_imap = imap;
// DBG_DIINIT(imap);
return (0);
}
/*
* NAME: diUnmount()
*
* FUNCTION: write to disk the incore inode map control structures for
* a fileset or aggregate at unmount time.
*
* PARAMETERS:
* ipimap - pointer to inode map inode for the aggregate or fileset.
*
* RETURN VALUES:
* 0 - success
* ENOMEM - insufficient free virtual memory.
* EIO - i/o error.
*/
int diUnmount(struct inode *ipimap, int mounterror)
{
imap_t *imap = JFS_IP(ipimap)->i_imap;
/*
* update the on-disk inode map control structure
*/
if (!(mounterror || isReadOnly(ipimap)))
diSync(ipimap);
/*
* Invalidate the page cache buffers
*/
truncate_inode_pages(ipimap->i_mapping, 0);
/*
* free in-memory control structure
*/
kfree(imap);
return (0);
}
/*
* diSync()
*/
int diSync(struct inode *ipimap)
{
dinomap_t *dinom_le;
imap_t *imp = JFS_IP(ipimap)->i_imap;
metapage_t *mp;
int index;
/*
	 * write imap global control page
*/
/* read the on-disk inode map control structure */
mp = get_metapage(ipimap,
IMAPBLKNO << JFS_SBI(ipimap->i_sb)->l2nbperpage,
PSIZE, 0);
if (mp == NULL) {
jERROR(1,("diSync: get_metapage failed!\n"));
return EIO;
}
/* copy the in-memory version to the on-disk version */
//memcpy(mp->data, &imp->im_imap,sizeof(dinomap_t));
dinom_le = (dinomap_t *) mp->data;
dinom_le->in_freeiag = cpu_to_le32(imp->im_freeiag);
dinom_le->in_nextiag = cpu_to_le32(imp->im_nextiag);
dinom_le->in_numinos = cpu_to_le32(atomic_read(&imp->im_numinos));
dinom_le->in_numfree = cpu_to_le32(atomic_read(&imp->im_numfree));
dinom_le->in_nbperiext = cpu_to_le32(imp->im_nbperiext);
dinom_le->in_l2nbperiext = cpu_to_le32(imp->im_l2nbperiext);
for (index = 0; index < MAXAG; index++) {
dinom_le->in_agctl[index].inofree =
cpu_to_le32(imp->im_agctl[index].inofree);
dinom_le->in_agctl[index].extfree =
cpu_to_le32(imp->im_agctl[index].extfree);
dinom_le->in_agctl[index].numinos =
cpu_to_le32(imp->im_agctl[index].numinos);
dinom_le->in_agctl[index].numfree =
cpu_to_le32(imp->im_agctl[index].numfree);
}
/* write out the control structure */
write_metapage(mp);
/*
* write out dirty pages of imap
*/
fsync_inode_data_buffers(ipimap);
diWriteSpecial(ipimap);
return (0);
}
/*
* NAME: diRead()
*
* FUNCTION: initialize an incore inode from disk.
*
 *		on entry, the specified incore inode should itself
* specify the disk inode number corresponding to the
* incore inode (i.e. i_number should be initialized).
*
* this routine handles incore inode initialization for
* both "special" and "regular" inodes. special inodes
* are those required early in the mount process and
* require special handling since much of the file system
* is not yet initialized. these "special" inodes are
* identified by a NULL inode map inode pointer and are
* actually initialized by a call to diReadSpecial().
*
* for regular inodes, the iag describing the disk inode
* is read from disk to determine the inode extent address
* for the disk inode. with the inode extent address in
* hand, the page of the extent that contains the disk
* inode is read and the disk inode is copied to the
* incore inode.
*
* PARAMETERS:
* ip - pointer to incore inode to be initialized from disk.
*
* RETURN VALUES:
* 0 - success
* EIO - i/o error.
* ENOMEM - insufficient memory
*
*/
int diRead(struct inode *ip)
{
struct jfs_sb_info *sbi = JFS_SBI(ip->i_sb);
int iagno, ino, extno, rc;
struct inode *ipimap;
dinode_t *dp;
iag_t *iagp;
metapage_t *mp;
s64 blkno, agstart;
imap_t *imap;
int block_offset;
int inodes_left;
uint pageno;
int rel_inode;
jFYI(1, ("diRead: ino = %ld\n", ip->i_ino));
ipimap = sbi->ipimap;
JFS_IP(ip)->ipimap = ipimap;
/* determine the iag number for this inode (number) */
iagno = INOTOIAG(ip->i_ino);
/* read the iag */
imap = JFS_IP(ipimap)->i_imap;
IREAD_LOCK(ipimap);
rc = diIAGRead(imap, iagno, &mp);
IREAD_UNLOCK(ipimap);
if (rc) {
jERROR(1, ("diRead: diIAGRead returned %d\n", rc));
return (rc);
}
iagp = (iag_t *) mp->data;
/* determine inode extent that holds the disk inode */
ino = ip->i_ino & (INOSPERIAG - 1);
extno = ino >> L2INOSPEREXT;
if ((lengthPXD(&iagp->inoext[extno]) != imap->im_nbperiext) ||
(addressPXD(&iagp->inoext[extno]) == 0)) {
jERROR(1, ("diRead: Bad inoext: 0x%lx, 0x%lx\n",
(ulong) addressPXD(&iagp->inoext[extno]),
(ulong) lengthPXD(&iagp->inoext[extno])));
release_metapage(mp);
updateSuper(ip->i_sb, FM_DIRTY);
return ESTALE;
}
/* get disk block number of the page within the inode extent
* that holds the disk inode.
*/
blkno = INOPBLK(&iagp->inoext[extno], ino, sbi->l2nbperpage);
/* get the ag for the iag */
agstart = le64_to_cpu(iagp->agstart);
release_metapage(mp);
rel_inode = (ino & (INOSPERPAGE - 1));
pageno = blkno >> sbi->l2nbperpage;
if ((block_offset = ((u32) blkno & (sbi->nbperpage - 1)))) {
/*
* OS/2 didn't always align inode extents on page boundaries
*/
inodes_left =
(sbi->nbperpage - block_offset) << sbi->l2niperblk;
if (rel_inode < inodes_left)
rel_inode += block_offset << sbi->l2niperblk;
else {
pageno += 1;
rel_inode -= inodes_left;
}
}
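	/*
	 * Worked example (editor's addition): with 1K fs blocks,
	 * nbperpage = 4 and l2niperblk = 1 (two 512-byte inodes per
	 * block). An extent starting at block_offset 1 within its 4K
	 * page leaves inodes_left = (4 - 1) << 1 = 6 in that page, so
	 * rel_inode 0..5 shifts up by 1 << 1 = 2, while rel_inode 6..7
	 * lands at slot 0..1 of the following page.
	 */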
/* read the page of disk inode */
mp = read_metapage(ipimap, pageno << sbi->l2nbperpage, PSIZE, 1);
if (mp == 0) {
jERROR(1, ("diRead: read_metapage failed\n"));
return EIO;
}
	/* locate the disk inode requested */
dp = (dinode_t *) mp->data;
dp += rel_inode;
if (ip->i_ino != le32_to_cpu(dp->di_number)) {
jERROR(1, ("diRead: i_ino != di_number\n"));
updateSuper(ip->i_sb, FM_DIRTY);
rc = EIO;
} else if (le32_to_cpu(dp->di_nlink) == 0) {
jERROR(1,
("diRead: di_nlink is zero. ino=%ld\n", ip->i_ino));
updateSuper(ip->i_sb, FM_DIRTY);
rc = ESTALE;
} else
/* copy the disk inode to the in-memory inode */
rc = copy_from_dinode(dp, ip);
release_metapage(mp);
/* set the ag for the inode */
JFS_IP(ip)->agno = BLKTOAG(agstart, sbi);
return (rc);
}
/*
* NAME: diReadSpecial()
*
* FUNCTION: initialize a 'special' inode from disk.
*
 *		this routine handles aggregate level inodes. The
* inode cache cannot differentiate between the
* aggregate inodes and the filesystem inodes, so we
* handle these here. We don't actually use the aggregate
* inode map, since these inodes are at a fixed location
* and in some cases the aggregate inode map isn't initialized
* yet.
*
* PARAMETERS:
* sb - filesystem superblock
* inum - aggregate inode number
*
* RETURN VALUES:
* new inode - success
* NULL - i/o error.
*/
struct inode *diReadSpecial(struct super_block *sb, ino_t inum)
{
struct jfs_sb_info *sbi = JFS_SBI(sb);
uint address;
dinode_t *dp;
struct inode *ip;
metapage_t *mp;
ip = new_inode(sb);
if (ip == NULL) {
jERROR(1,
("diReadSpecial: new_inode returned NULL!\n"));
return ip;
}
/*
* If ip->i_number >= 32 (INOSPEREXT), then read from secondary
* aggregate inode table.
*/
if (inum >= INOSPEREXT) {
address =
addressPXD(&sbi->ait2) >> sbi->l2nbperpage;
inum -= INOSPEREXT;
ASSERT(inum < INOSPEREXT);
JFS_IP(ip)->ipimap = sbi->ipaimap2;
} else {
address = AITBL_OFF >> L2PSIZE;
JFS_IP(ip)->ipimap = sbi->ipaimap;
}
ip->i_ino = inum;
address += inum >> 3; /* 8 inodes per 4K page */
/* read the page of fixed disk inode (AIT) in raw mode */
jEVENT(0,
("Reading aggregate inode %d from block %d\n", (uint) inum,
address));
mp = read_metapage(ip, address << sbi->l2nbperpage, PSIZE, 1);
if (mp == NULL) {
ip->i_sb = NULL;
ip->i_nlink = 1; /* Don't want iput() deleting it */
iput(ip);
return (NULL);
}
/* get the pointer to the disk inode of interest */
dp = (dinode_t *) (mp->data);
dp += inum % 8; /* 8 inodes per 4K page */
/* copy on-disk inode to in-memory inode */
if ((copy_from_dinode(dp, ip)) != 0) {
/* handle bad return by returning NULL for ip */
ip->i_sb = NULL;
ip->i_nlink = 1; /* Don't want iput() deleting it */
iput(ip);
/* release the page */
release_metapage(mp);
return (NULL);
}
ip->i_mapping->a_ops = &jfs_aops;
ip->i_mapping->gfp_mask = GFP_NOFS;
if ((inum == FILESYSTEM_I) && (JFS_IP(ip)->ipimap == sbi->ipaimap)) {
sbi->gengen = le32_to_cpu(dp->di_gengen);
sbi->inostamp = le32_to_cpu(dp->di_inostamp);
}
/* release the page */
release_metapage(mp);
return (ip);
}
/*
* NAME: diWriteSpecial()
*
* FUNCTION: Write the special inode to disk
*
* PARAMETERS:
* ip - special inode
*
* RETURN VALUES: none
*/
void diWriteSpecial(struct inode *ip)
{
struct jfs_sb_info *sbi = JFS_SBI(ip->i_sb);
uint address;
dinode_t *dp;
ino_t inum = ip->i_ino;
metapage_t *mp;
/*
* If ip->i_number >= 32 (INOSPEREXT), then write to secondary
* aggregate inode table.
*/
if (!(ip->i_state & I_DIRTY))
return;
ip->i_state &= ~I_DIRTY;
if (inum >= INOSPEREXT) {
address =
addressPXD(&sbi->ait2) >> sbi->l2nbperpage;
inum -= INOSPEREXT;
ASSERT(inum < INOSPEREXT);
} else {
address = AITBL_OFF >> L2PSIZE;
}
address += inum >> 3; /* 8 inodes per 4K page */
/* read the page of fixed disk inode (AIT) in raw mode */
jEVENT(0,
("Reading aggregate inode %d from block %d\n", (uint) inum,
address));
mp = read_metapage(ip, address << sbi->l2nbperpage, PSIZE, 1);
if (mp == NULL) {
jERROR(1,
("diWriteSpecial: failed to read aggregate inode extent!\n"));
return;
}
/* get the pointer to the disk inode of interest */
dp = (dinode_t *) (mp->data);
dp += inum % 8; /* 8 inodes per 4K page */
/* copy on-disk inode to in-memory inode */
copy_to_dinode(dp, ip);
memcpy(&dp->di_xtroot, &JFS_IP(ip)->i_xtroot, 288);
if (inum == FILESYSTEM_I)
dp->di_gengen = cpu_to_le32(sbi->gengen);
/* write the page */
write_metapage(mp);
}
/*
* NAME: diFreeSpecial()
*
* FUNCTION: Free allocated space for special inode
*/
void diFreeSpecial(struct inode *ip)
{
if (ip == NULL) {
jERROR(1, ("diFreeSpecial called with NULL ip!\n"));
return;
}
fsync_inode_data_buffers(ip);
truncate_inode_pages(ip->i_mapping, 0);
iput(ip);
}
/*
* NAME: diWrite()
*
* FUNCTION: write the on-disk inode portion of the in-memory inode
* to its corresponding on-disk inode.
*
 *		on entry, the specified incore inode should itself
* specify the disk inode number corresponding to the
* incore inode (i.e. i_number should be initialized).
*
* the inode contains the inode extent address for the disk
* inode. with the inode extent address in hand, the
* page of the extent that contains the disk inode is
* read and the disk inode portion of the incore inode
* is copied to the disk inode.
*
* PARAMETERS:
 *	tid - transaction id
* ip - pointer to incore inode to be written to the inode extent.
*
* RETURN VALUES:
* 0 - success
* EIO - i/o error.
*/
int diWrite(tid_t tid, struct inode *ip)
{
struct jfs_sb_info *sbi = JFS_SBI(ip->i_sb);
struct jfs_inode_info *jfs_ip = JFS_IP(ip);
int rc = 0;
s32 ino;
dinode_t *dp;
s64 blkno;
int block_offset;
int inodes_left;
metapage_t *mp;
uint pageno;
int rel_inode;
int dioffset;
struct inode *ipimap;
uint type;
lid_t lid;
tlock_t *ditlck, *tlck;
linelock_t *dilinelock, *ilinelock;
lv_t *lv;
int n;
ipimap = jfs_ip->ipimap;
ino = ip->i_ino & (INOSPERIAG - 1);
assert(lengthPXD(&(jfs_ip->ixpxd)) ==
JFS_IP(ipimap)->i_imap->im_nbperiext);
assert(addressPXD(&(jfs_ip->ixpxd)));
/*
* read the page of disk inode containing the specified inode:
*/
/* compute the block address of the page */
blkno = INOPBLK(&(jfs_ip->ixpxd), ino, sbi->l2nbperpage);
rel_inode = (ino & (INOSPERPAGE - 1));
pageno = blkno >> sbi->l2nbperpage;
if ((block_offset = ((u32) blkno & (sbi->nbperpage - 1)))) {
/*
* OS/2 didn't always align inode extents on page boundaries
*/
inodes_left =
(sbi->nbperpage - block_offset) << sbi->l2niperblk;
if (rel_inode < inodes_left)
rel_inode += block_offset << sbi->l2niperblk;
else {
pageno += 1;
rel_inode -= inodes_left;
}
}
/* read the page of disk inode */
retry:
mp = read_metapage(ipimap, pageno << sbi->l2nbperpage, PSIZE, 1);
if (mp == 0)
return (EIO);
/* get the pointer to the disk inode */
dp = (dinode_t *) mp->data;
dp += rel_inode;
dioffset = (ino & (INOSPERPAGE - 1)) << L2DISIZE;
/*
* acquire transaction lock on the on-disk inode;
* N.B. tlock is acquired on ipimap not ip;
*/
if ((ditlck =
txLock(tid, ipimap, mp, tlckINODE | tlckENTRY)) == NULL)
goto retry;
dilinelock = (linelock_t *) & ditlck->lock;
/*
* copy btree root from in-memory inode to on-disk inode
*
* (tlock is taken from inline B+-tree root in in-memory
* inode when the B+-tree root is updated, which is pointed
* by jfs_ip->blid as well as being on tx tlock list)
*
* further processing of btree root is based on the copy
* in in-memory inode, where txLog() will log from, and,
* for xtree root, txUpdateMap() will update map and reset
* XAD_NEW bit;
*/
if (S_ISDIR(ip->i_mode) && (lid = jfs_ip->xtlid)) {
/*
* This is the special xtree inside the directory for storing
* the directory table
*/
xtpage_t *p, *xp;
xad_t *xad;
jfs_ip->xtlid = 0;
tlck = lid_to_tlock(lid);
assert(tlck->type & tlckXTREE);
tlck->type |= tlckBTROOT;
tlck->mp = mp;
ilinelock = (linelock_t *) & tlck->lock;
/*
* copy xtree root from inode to dinode:
*/
p = &jfs_ip->i_xtroot;
xp = (xtpage_t *) &dp->di_dirtable;
lv = (lv_t *) & ilinelock->lv;
for (n = 0; n < ilinelock->index; n++, lv++) {
memcpy(&xp->xad[lv->offset], &p->xad[lv->offset],
lv->length << L2XTSLOTSIZE);
}
/* reset on-disk (metadata page) xtree XAD_NEW bit */
xad = &xp->xad[XTENTRYSTART];
for (n = XTENTRYSTART;
n < le16_to_cpu(xp->header.nextindex); n++, xad++)
if (xad->flag & (XAD_NEW | XAD_EXTENDED))
xad->flag &= ~(XAD_NEW | XAD_EXTENDED);
}
if ((lid = jfs_ip->blid) == 0)
goto inlineData;
jfs_ip->blid = 0;
tlck = lid_to_tlock(lid);
type = tlck->type;
tlck->type |= tlckBTROOT;
tlck->mp = mp;
ilinelock = (linelock_t *) & tlck->lock;
/*
* regular file: 16 byte (XAD slot) granularity
*/
if (type & tlckXTREE) {
xtpage_t *p, *xp;
xad_t *xad;
/*
* copy xtree root from inode to dinode:
*/
p = &jfs_ip->i_xtroot;
xp = &dp->di_xtroot;
lv = (lv_t *) & ilinelock->lv;
for (n = 0; n < ilinelock->index; n++, lv++) {
memcpy(&xp->xad[lv->offset], &p->xad[lv->offset],
lv->length << L2XTSLOTSIZE);
}
/* reset on-disk (metadata page) xtree XAD_NEW bit */
xad = &xp->xad[XTENTRYSTART];
for (n = XTENTRYSTART;
n < le16_to_cpu(xp->header.nextindex); n++, xad++)
if (xad->flag & (XAD_NEW | XAD_EXTENDED))
xad->flag &= ~(XAD_NEW | XAD_EXTENDED);
}
/*
* directory: 32 byte (directory entry slot) granularity
*/
else if (type & tlckDTREE) {
dtpage_t *p, *xp;
/*
* copy dtree root from inode to dinode:
*/
p = (dtpage_t *) &jfs_ip->i_dtroot;
xp = (dtpage_t *) & dp->di_dtroot;
lv = (lv_t *) & ilinelock->lv;
for (n = 0; n < ilinelock->index; n++, lv++) {
memcpy(&xp->slot[lv->offset], &p->slot[lv->offset],
lv->length << L2DTSLOTSIZE);
}
} else {
jERROR(1, ("diWrite: UFO tlock\n"));
}
inlineData:
/*
* copy inline symlink from in-memory inode to on-disk inode
*/
if (S_ISLNK(ip->i_mode) && ip->i_size < IDATASIZE) {
lv = (lv_t *) & dilinelock->lv[dilinelock->index];
lv->offset = (dioffset + 2 * 128) >> L2INODESLOTSIZE;
lv->length = 2;
memcpy(&dp->di_fastsymlink, jfs_ip->i_inline, IDATASIZE);
dilinelock->index++;
}
#ifdef _STILL_TO_PORT
/*
* copy inline data from in-memory inode to on-disk inode:
* 128 byte slot granularity
*/
	if (test_cflag(COMMIT_Inlineea, ip)) {
lv = (lv_t *) & dilinelock->lv[dilinelock->index];
lv->offset = (dioffset + 3 * 128) >> L2INODESLOTSIZE;
lv->length = 1;
memcpy(&dp->di_inlineea, &ip->i_inlineea, INODESLOTSIZE);
dilinelock->index++;
clear_cflag(COMMIT_Inlineea, ip);
}
#endif /* _STILL_TO_PORT */
/*
* lock/copy inode base: 128 byte slot granularity
*/
// baseDinode:
lv = (lv_t *) & dilinelock->lv[dilinelock->index];
lv->offset = dioffset >> L2INODESLOTSIZE;
copy_to_dinode(dp, ip);
if (test_and_clear_cflag(COMMIT_Dirtable, ip)) {
lv->length = 2;
memcpy(&dp->di_dirtable, &jfs_ip->i_dirtable, 96);
} else
lv->length = 1;
dilinelock->index++;
#ifdef _JFS_FASTDASD
/*
* We aren't logging changes to the DASD used in directory inodes,
* but we need to write them to disk. If we don't unmount cleanly,
* mount will recalculate the DASD used.
*/
if (S_ISDIR(ip->i_mode)
&& (ip->i_ipmnt->i_mntflag & JFS_DASD_ENABLED))
bcopy(&ip->i_DASD, &dp->di_DASD, sizeof(dasd_t));
#endif /* _JFS_FASTDASD */
/* release the buffer holding the updated on-disk inode.
* the buffer will be later written by commit processing.
*/
write_metapage(mp);
return (rc);
}
/*
* NAME: diFree(ip)
*
* FUNCTION: free a specified inode from the inode working map
* for a fileset or aggregate.
*
* if the inode to be freed represents the first (only)
* free inode within the iag, the iag will be placed on
* the ag free inode list.
*
* freeing the inode will cause the inode extent to be
* freed if the inode is the only allocated inode within
* the extent. in this case all the disk resource backing
* up the inode extent will be freed. in addition, the iag
* will be placed on the ag extent free list if the extent
* is the first free extent in the iag. if freeing the
* extent also means that no free inodes will exist for
* the iag, the iag will also be removed from the ag free
* inode list.
*
* the iag describing the inode will be freed if the extent
* is to be freed and it is the only backed extent within
* the iag. in this case, the iag will be removed from the
* ag free extent list and ag free inode list and placed on
* the inode map's free iag list.
*
* a careful update approach is used to provide consistency
* in the face of updates to multiple buffers. under this
* approach, all required buffers are obtained before making
* any updates and are held until all updates are complete.
*
* PARAMETERS:
* ip - inode to be freed.
*
* RETURN VALUES:
* 0 - success
* EIO - i/o error.
*/
int diFree(struct inode *ip)
{
int rc;
ino_t inum = ip->i_ino;
iag_t *iagp, *aiagp, *biagp, *ciagp, *diagp;
metapage_t *mp, *amp, *bmp, *cmp, *dmp;
int iagno, ino, extno, bitno, sword, agno;
int back, fwd;
u32 bitmap, mask;
struct inode *ipimap = JFS_SBI(ip->i_sb)->ipimap;
imap_t *imap = JFS_IP(ipimap)->i_imap;
s64 xaddr;
s64 xlen;
pxd_t freepxd;
tid_t tid;
struct inode *iplist[3];
tlock_t *tlck;
pxdlock_t *pxdlock;
/*
* This is just to suppress compiler warnings. The same logic that
* references these variables is used to initialize them.
*/
aiagp = biagp = ciagp = diagp = NULL;
/* get the iag number containing the inode.
*/
iagno = INOTOIAG(inum);
/* make sure that the iag is contained within
* the map.
*/
//assert(iagno < imap->im_nextiag);
if (iagno >= imap->im_nextiag) {
jERROR(1, ("diFree: inum = %d, iagno = %d, nextiag = %d\n",
(uint) inum, iagno, imap->im_nextiag));
dump_mem("imap", imap, 32);
updateSuper(ip->i_sb, FM_DIRTY);
return EIO;
}
/* get the allocation group for this ino.
*/
agno = JFS_IP(ip)->agno;
/* Lock the AG specific inode map information
*/
AG_LOCK(imap, agno);
/* Obtain read lock in imap inode. Don't release it until we have
* read all of the IAG's that we are going to.
*/
IREAD_LOCK(ipimap);
/* read the iag.
*/
if ((rc = diIAGRead(imap, iagno, &mp))) {
IREAD_UNLOCK(ipimap);
AG_UNLOCK(imap, agno);
return (rc);
}
iagp = (iag_t *) mp->data;
	/* get the inode number within the iag, the extent number of
	 * that inode, and the inode number within the extent.
*/
ino = inum & (INOSPERIAG - 1);
extno = ino >> L2INOSPEREXT;
bitno = ino & (INOSPEREXT - 1);
mask = HIGHORDER >> bitno;
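	/*
	 * Worked example of the decomposition above (assuming the usual
	 * constants INOSPERIAG == 4096 and INOSPEREXT == 32): inum 5000
	 * gives iagno = 5000 >> 12 = 1, ino = 5000 & 4095 = 904,
	 * extno = 904 >> 5 = 28, bitno = 904 & 31 = 8, and
	 * mask = 0x80000000 >> 8 = 0x00800000 selects inode 8 of extent 28.
	 */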
assert(le32_to_cpu(iagp->wmap[extno]) & mask);
#ifdef _STILL_TO_PORT
assert((le32_to_cpu(iagp->pmap[extno]) & mask) == 0);
#endif /* _STILL_TO_PORT */
assert(addressPXD(&iagp->inoext[extno]));
/* compute the bitmap for the extent reflecting the freed inode.
*/
bitmap = le32_to_cpu(iagp->wmap[extno]) & ~mask;
if (imap->im_agctl[agno].numfree > imap->im_agctl[agno].numinos) {
jERROR(1,("diFree: numfree > numinos\n"));
release_metapage(mp);
IREAD_UNLOCK(ipimap);
AG_UNLOCK(imap, agno);
updateSuper(ip->i_sb, FM_DIRTY);
return EIO;
}
/*
* inode extent still has some inodes or below low water mark:
* keep the inode extent;
*/
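	/*
	 * Concretely: keep the extent if any of its inodes is still in use
	 * (bitmap != 0), if the ag has fewer than 96 free inodes outright,
	 * or if it has fewer than 288 free inodes and they amount to no
	 * more than 25% of its backed inodes -- the "low water mark"
	 * referred to above.
	 */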
if (bitmap ||
imap->im_agctl[agno].numfree < 96 ||
(imap->im_agctl[agno].numfree < 288 &&
(((imap->im_agctl[agno].numfree * 100) /
imap->im_agctl[agno].numinos) <= 25))) {
/* if the iag currently has no free inodes (i.e.,
* the inode being freed is the first free inode of iag),
* insert the iag at head of the inode free list for the ag.
*/
if (iagp->nfreeinos == 0) {
/* check if there are any iags on the ag inode
* free list. if so, read the first one so that
* we can link the current iag onto the list at
* the head.
*/
if ((fwd = imap->im_agctl[agno].inofree) >= 0) {
/* read the iag that currently is the head
* of the list.
*/
if ((rc = diIAGRead(imap, fwd, &amp))) {
IREAD_UNLOCK(ipimap);
AG_UNLOCK(imap, agno);
release_metapage(mp);
return (rc);
}
aiagp = (iag_t *) amp->data;
/* make current head point back to the iag.
*/
aiagp->inofreeback = cpu_to_le32(iagno);
write_metapage(amp);
}
/* iag points forward to current head and iag
* becomes the new head of the list.
*/
iagp->inofreefwd =
cpu_to_le32(imap->im_agctl[agno].inofree);
iagp->inofreeback = -1;
imap->im_agctl[agno].inofree = iagno;
}
IREAD_UNLOCK(ipimap);
/* update the free inode summary map for the extent if
* freeing the inode means the extent will now have free
* inodes (i.e., the inode being freed is the first free
* inode of extent),
*/
if (iagp->wmap[extno] == ONES) {
sword = extno >> L2EXTSPERSUM;
bitno = extno & (EXTSPERSUM - 1);
iagp->inosmap[sword] &=
cpu_to_le32(~(HIGHORDER >> bitno));
}
/* update the bitmap.
*/
iagp->wmap[extno] = cpu_to_le32(bitmap);
DBG_DIFREE(imap, inum);
/* update the free inode counts at the iag, ag and
* map level.
*/
iagp->nfreeinos =
cpu_to_le32(le32_to_cpu(iagp->nfreeinos) + 1);
imap->im_agctl[agno].numfree += 1;
atomic_inc(&imap->im_numfree);
/* release the AG inode map lock
*/
AG_UNLOCK(imap, agno);
/* write the iag */
write_metapage(mp);
return (0);
}
/*
* inode extent has become free and above low water mark:
* free the inode extent;
*/
/*
* prepare to update iag list(s) (careful update step 1)
*/
amp = bmp = cmp = dmp = NULL;
fwd = back = -1;
/* check if the iag currently has no free extents. if so,
* it will be placed on the head of the ag extent free list.
*/
if (iagp->nfreeexts == 0) {
/* check if the ag extent free list has any iags.
* if so, read the iag at the head of the list now.
* this (head) iag will be updated later to reflect
* the addition of the current iag at the head of
* the list.
*/
if ((fwd = imap->im_agctl[agno].extfree) >= 0) {
if ((rc = diIAGRead(imap, fwd, &amp)))
goto error_out;
aiagp = (iag_t *) amp->data;
}
} else {
/* iag has free extents. check if the addition of a free
* extent will cause all extents to be free within this
* iag. if so, the iag will be removed from the ag extent
* free list and placed on the inode map's free iag list.
*/
if (iagp->nfreeexts == cpu_to_le32(EXTSPERIAG - 1)) {
/* in preparation for removing the iag from the
			 * ag extent free list, read the iags preceding
* and following the iag on the ag extent free
* list.
*/
if ((fwd = le32_to_cpu(iagp->extfreefwd)) >= 0) {
if ((rc = diIAGRead(imap, fwd, &amp)))
goto error_out;
aiagp = (iag_t *) amp->data;
}
if ((back = le32_to_cpu(iagp->extfreeback)) >= 0) {
if ((rc = diIAGRead(imap, back, &bmp)))
goto error_out;
biagp = (iag_t *) bmp->data;
}
}
}
/* remove the iag from the ag inode free list if freeing
	 * this extent causes the iag to have no free inodes.
*/
if (iagp->nfreeinos == cpu_to_le32(INOSPEREXT - 1)) {
int inofreeback = le32_to_cpu(iagp->inofreeback);
int inofreefwd = le32_to_cpu(iagp->inofreefwd);
/* in preparation for removing the iag from the
		 * ag inode free list, read the iags preceding
		 * and following the iag on the ag inode free
		 * list. before reading these iags, we must make
		 * sure that we don't already have them in hand
		 * from up above, since re-reading an iag (buffer)
* we are currently holding would cause a deadlock.
*/
if (inofreefwd >= 0) {
if (inofreefwd == fwd)
ciagp = (iag_t *) amp->data;
else if (inofreefwd == back)
ciagp = (iag_t *) bmp->data;
else {
if ((rc =
diIAGRead(imap, inofreefwd, &cmp)))
goto error_out;
assert(cmp != NULL);
ciagp = (iag_t *) cmp->data;
}
assert(ciagp != NULL);
}
if (inofreeback >= 0) {
if (inofreeback == fwd)
diagp = (iag_t *) amp->data;
else if (inofreeback == back)
diagp = (iag_t *) bmp->data;
else {
if ((rc =
diIAGRead(imap, inofreeback, &dmp)))
goto error_out;
assert(dmp != NULL);
diagp = (iag_t *) dmp->data;
}
assert(diagp != NULL);
}
}
IREAD_UNLOCK(ipimap);
/*
* invalidate any page of the inode extent freed from buffer cache;
*/
freepxd = iagp->inoext[extno];
xaddr = addressPXD(&iagp->inoext[extno]);
xlen = lengthPXD(&iagp->inoext[extno]);
invalidate_metapages(JFS_SBI(ip->i_sb)->direct_inode, xaddr, xlen);
/*
* update iag list(s) (careful update step 2)
*/
/* add the iag to the ag extent free list if this is the
* first free extent for the iag.
*/
if (iagp->nfreeexts == 0) {
if (fwd >= 0)
aiagp->extfreeback = cpu_to_le32(iagno);
iagp->extfreefwd =
cpu_to_le32(imap->im_agctl[agno].extfree);
iagp->extfreeback = -1;
imap->im_agctl[agno].extfree = iagno;
} else {
/* remove the iag from the ag extent list if all extents
* are now free and place it on the inode map iag free list.
*/
if (iagp->nfreeexts == cpu_to_le32(EXTSPERIAG - 1)) {
if (fwd >= 0)
aiagp->extfreeback = iagp->extfreeback;
if (back >= 0)
biagp->extfreefwd = iagp->extfreefwd;
else
imap->im_agctl[agno].extfree =
le32_to_cpu(iagp->extfreefwd);
iagp->extfreefwd = iagp->extfreeback = -1;
IAGFREE_LOCK(imap);
iagp->iagfree = cpu_to_le32(imap->im_freeiag);
imap->im_freeiag = iagno;
IAGFREE_UNLOCK(imap);
}
}
/* remove the iag from the ag inode free list if freeing
* this extent causes the iag to have no free inodes.
*/
if (iagp->nfreeinos == cpu_to_le32(INOSPEREXT - 1)) {
if ((int) le32_to_cpu(iagp->inofreefwd) >= 0)
ciagp->inofreeback = iagp->inofreeback;
if ((int) le32_to_cpu(iagp->inofreeback) >= 0)
diagp->inofreefwd = iagp->inofreefwd;
else
imap->im_agctl[agno].inofree =
le32_to_cpu(iagp->inofreefwd);
iagp->inofreefwd = iagp->inofreeback = -1;
}
/* update the inode extent address and working map
* to reflect the free extent.
* the permanent map should have been updated already
* for the inode being freed.
*/
assert(iagp->pmap[extno] == 0);
iagp->wmap[extno] = 0;
DBG_DIFREE(imap, inum);
PXDlength(&iagp->inoext[extno], 0);
PXDaddress(&iagp->inoext[extno], 0);
/* update the free extent and free inode summary maps
* to reflect the freed extent.
* the inode summary map is marked to indicate no inodes
* available for the freed extent.
*/
sword = extno >> L2EXTSPERSUM;
bitno = extno & (EXTSPERSUM - 1);
mask = HIGHORDER >> bitno;
iagp->inosmap[sword] |= cpu_to_le32(mask);
iagp->extsmap[sword] &= cpu_to_le32(~mask);
/* update the number of free inodes and number of free extents
* for the iag.
*/
iagp->nfreeinos = cpu_to_le32(le32_to_cpu(iagp->nfreeinos) -
(INOSPEREXT - 1));
iagp->nfreeexts = cpu_to_le32(le32_to_cpu(iagp->nfreeexts) + 1);
/* update the number of free inodes and backed inodes
* at the ag and inode map level.
*/
imap->im_agctl[agno].numfree -= (INOSPEREXT - 1);
imap->im_agctl[agno].numinos -= INOSPEREXT;
atomic_sub(INOSPEREXT - 1, &imap->im_numfree);
atomic_sub(INOSPEREXT, &imap->im_numinos);
if (amp)
write_metapage(amp);
if (bmp)
write_metapage(bmp);
if (cmp)
write_metapage(cmp);
if (dmp)
write_metapage(dmp);
/*
* start transaction to update block allocation map
* for the inode extent freed;
*
* N.B. AG_LOCK is released and iag will be released below, and
	 * another thread may allocate an inode from (i.e. reuse) the freed
	 * ixad, BUT with a new/different backing inode extent from the extent
* to be freed by the transaction;
*/
tid = txBegin(ipimap->i_sb, COMMIT_FORCE);
/* acquire tlock of the iag page of the freed ixad
* to force the page NOHOMEOK (even though no data is
* logged from the iag page) until NOREDOPAGE|FREEXTENT log
* for the free of the extent is committed;
* write FREEXTENT|NOREDOPAGE log record
* N.B. linelock is overlaid as freed extent descriptor;
*/
tlck = txLock(tid, ipimap, mp, tlckINODE | tlckFREE);
pxdlock = (pxdlock_t *) & tlck->lock;
pxdlock->flag = mlckFREEPXD;
pxdlock->pxd = freepxd;
pxdlock->index = 1;
write_metapage(mp);
iplist[0] = ipimap;
/*
* logredo needs the IAG number and IAG extent index in order
* to ensure that the IMap is consistent. The least disruptive
* way to pass these values through to the transaction manager
* is in the iplist array.
*
* It's not pretty, but it works.
*/
iplist[1] = (struct inode *) (size_t)iagno;
iplist[2] = (struct inode *) (size_t)extno;
rc = txCommit(tid, 1, &iplist[0], COMMIT_FORCE); // D233382
txEnd(tid);
/* unlock the AG inode map information */
AG_UNLOCK(imap, agno);
return (0);
error_out:
IREAD_UNLOCK(ipimap);
if (amp)
release_metapage(amp);
if (bmp)
release_metapage(bmp);
if (cmp)
release_metapage(cmp);
if (dmp)
release_metapage(dmp);
AG_UNLOCK(imap, agno);
release_metapage(mp);
return (rc);
}
/*
* There are several places in the diAlloc* routines where we initialize
* the inode.
*/
static inline void
diInitInode(struct inode *ip, int iagno, int ino, int extno, iag_t * iagp)
{
struct jfs_sb_info *sbi = JFS_SBI(ip->i_sb);
struct jfs_inode_info *jfs_ip = JFS_IP(ip);
ip->i_ino = (iagno << L2INOSPERIAG) + ino;
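	/* e.g. (with L2INOSPERIAG == 12, an assumption consistent with the
	 * 4096-inode iag) iagno 1 and ino 904 compose to i_ino 5000, the
	 * inverse of the decomposition performed in diFree().
	 */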
	DBG_DIALLOC(JFS_IP(sbi->ipimap)->i_imap, ip->i_ino);
jfs_ip->ixpxd = iagp->inoext[extno];
jfs_ip->agno = BLKTOAG(le64_to_cpu(iagp->agstart), sbi);
}
/*
* NAME: diAlloc(pip,dir,ip)
*
* FUNCTION: allocate a disk inode from the inode working map
* for a fileset or aggregate.
*
* PARAMETERS:
* pip - pointer to incore inode for the parent inode.
* dir - TRUE if the new disk inode is for a directory.
* ip - pointer to a new inode
*
* RETURN VALUES:
* 0 - success.
* ENOSPC - insufficient disk resources.
* EIO - i/o error.
*/
int diAlloc(struct inode *pip, boolean_t dir, struct inode *ip)
{
int rc, ino, iagno, addext, extno, bitno, sword;
int nwords, rem, i, agno;
u32 mask, inosmap, extsmap;
struct inode *ipimap;
metapage_t *mp;
ino_t inum;
iag_t *iagp;
imap_t *imap;
/* get the pointers to the inode map inode and the
* corresponding imap control structure.
*/
ipimap = JFS_SBI(pip->i_sb)->ipimap;
imap = JFS_IP(ipimap)->i_imap;
JFS_IP(ip)->ipimap = ipimap;
JFS_IP(ip)->fileset = FILESYSTEM_I;
/* for a directory, the allocation policy is to start
* at the ag level using the preferred ag.
*/
if (dir == TRUE) {
agno = dbNextAG(JFS_SBI(pip->i_sb)->ipbmap);
AG_LOCK(imap, agno);
goto tryag;
}
/* for files, the policy starts off by trying to allocate from
* the same iag containing the parent disk inode:
* try to allocate the new disk inode close to the parent disk
* inode, using parent disk inode number + 1 as the allocation
* hint. (we use a left-to-right policy to attempt to avoid
* moving backward on the disk.) compute the hint within the
* file system and the iag.
*/
inum = pip->i_ino + 1;
ino = inum & (INOSPERIAG - 1);
	/* back off the hint if it is outside of the iag */
if (ino == 0)
inum = pip->i_ino;
/* get the ag number of this iag */
agno = JFS_IP(pip)->agno;
/* lock the AG inode map information */
AG_LOCK(imap, agno);
/* Get read lock on imap inode */
IREAD_LOCK(ipimap);
/* get the iag number and read the iag */
iagno = INOTOIAG(inum);
	if ((rc = diIAGRead(imap, iagno, &mp))) {
		IREAD_UNLOCK(ipimap);
		AG_UNLOCK(imap, agno);
		return (rc);
	}
iagp = (iag_t *) mp->data;
/* determine if new inode extent is allowed to be added to the iag.
* new inode extent can be added to the iag if the ag
* has less than 32 free disk inodes and the iag has free extents.
*/
addext = (imap->im_agctl[agno].numfree < 32 && iagp->nfreeexts);
/*
* try to allocate from the IAG
*/
/* check if the inode may be allocated from the iag
* (i.e. the inode has free inodes or new extent can be added).
*/
if (iagp->nfreeinos || addext) {
/* determine the extent number of the hint.
*/
extno = ino >> L2INOSPEREXT;
/* check if the extent containing the hint has backed
* inodes. if so, try to allocate within this extent.
*/
if (addressPXD(&iagp->inoext[extno])) {
bitno = ino & (INOSPEREXT - 1);
if ((bitno =
diFindFree(le32_to_cpu(iagp->wmap[extno]),
bitno))
< INOSPEREXT) {
ino = (extno << L2INOSPEREXT) + bitno;
/* a free inode (bit) was found within this
* extent, so allocate it.
*/
rc = diAllocBit(imap, iagp, ino);
IREAD_UNLOCK(ipimap);
if (rc) {
assert(rc == EIO);
} else {
/* set the results of the allocation
* and write the iag.
*/
diInitInode(ip, iagno, ino, extno,
iagp);
mark_metapage_dirty(mp);
}
release_metapage(mp);
/* free the AG lock and return.
*/
AG_UNLOCK(imap, agno);
return (rc);
}
if (!addext)
extno =
(extno ==
EXTSPERIAG - 1) ? 0 : extno + 1;
}
/*
* no free inodes within the extent containing the hint.
*
* try to allocate from the backed extents following
* hint or, if appropriate (i.e. addext is true), allocate
* an extent of free inodes at or following the extent
* containing the hint.
*
* the free inode and free extent summary maps are used
* here, so determine the starting summary map position
* and the number of words we'll have to examine. again,
* the approach is to allocate following the hint, so we
* might have to initially ignore prior bits of the summary
* map that represent extents prior to the extent containing
* the hint and later revisit these bits.
*/
bitno = extno & (EXTSPERSUM - 1);
nwords = (bitno == 0) ? SMAPSZ : SMAPSZ + 1;
sword = extno >> L2EXTSPERSUM;
/* mask any prior bits for the starting words of the
* summary map.
*/
mask = ONES << (EXTSPERSUM - bitno);
inosmap = le32_to_cpu(iagp->inosmap[sword]) | mask;
extsmap = le32_to_cpu(iagp->extsmap[sword]) | mask;
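		/*
		 * e.g. a hint in extent 40 (with EXTSPERSUM == 32) gives
		 * sword = 1, bitno = 8: the mask sets the top 8 bits, so
		 * extents 32-39 (those before the hint in this word) are
		 * ignored on the first pass and revisited after the scan
		 * wraps around.
		 */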
/* scan the free inode and free extent summary maps for
* free resources.
*/
for (i = 0; i < nwords; i++) {
/* check if this word of the free inode summary
* map describes an extent with free inodes.
*/
if (~inosmap) {
/* an extent with free inodes has been
* found. determine the extent number
* and the inode number within the extent.
*/
rem = diFindFree(inosmap, 0);
extno = (sword << L2EXTSPERSUM) + rem;
rem =
diFindFree(le32_to_cpu
(iagp->wmap[extno]), 0);
assert(rem < INOSPEREXT);
/* determine the inode number within the
* iag and allocate the inode from the
* map.
*/
ino = (extno << L2INOSPEREXT) + rem;
rc = diAllocBit(imap, iagp, ino);
IREAD_UNLOCK(ipimap);
if (rc) {
assert(rc == EIO);
} else {
/* set the results of the allocation
* and write the iag.
*/
diInitInode(ip, iagno, ino, extno,
iagp);
mark_metapage_dirty(mp);
}
release_metapage(mp);
/* free the AG lock and return.
*/
AG_UNLOCK(imap, agno);
return (rc);
}
/* check if we may allocate an extent of free
* inodes and whether this word of the free
* extents summary map describes a free extent.
*/
if (addext && ~extsmap) {
/* a free extent has been found. determine
* the extent number.
*/
rem = diFindFree(extsmap, 0);
extno = (sword << L2EXTSPERSUM) + rem;
/* allocate an extent of free inodes.
*/
if ((rc = diNewExt(imap, iagp, extno))) {
/* if there is no disk space for a
* new extent, try to allocate the
* disk inode from somewhere else.
*/
if (rc == ENOSPC)
break;
assert(rc == EIO);
} else {
/* set the results of the allocation
* and write the iag.
*/
diInitInode(ip, iagno,
extno << L2INOSPEREXT,
extno, iagp);
mark_metapage_dirty(mp);
}
release_metapage(mp);
/* free the imap inode & the AG lock & return.
*/
IREAD_UNLOCK(ipimap);
AG_UNLOCK(imap, agno);
return (rc);
}
/* move on to the next set of summary map words.
*/
sword = (sword == SMAPSZ - 1) ? 0 : sword + 1;
inosmap = le32_to_cpu(iagp->inosmap[sword]);
extsmap = le32_to_cpu(iagp->extsmap[sword]);
}
}
/* unlock imap inode */
IREAD_UNLOCK(ipimap);
/* nothing doing in this iag, so release it. */
release_metapage(mp);
tryag:
/*
* try to allocate anywhere within the same AG as the parent inode.
*/
rc = diAllocAG(imap, agno, dir, ip);
AG_UNLOCK(imap, agno);
if (rc != ENOSPC)
return (rc);
/*
* try to allocate in any AG.
*/
return (diAllocAny(imap, agno, dir, ip));
}
/*
* NAME: diAllocAG(imap,agno,dir,ip)
*
* FUNCTION: allocate a disk inode from the allocation group.
*
* this routine first determines if a new extent of free
* inodes should be added for the allocation group, with
* the current request satisfied from this extent. if this
* is the case, an attempt will be made to do just that. if
* this attempt fails or it has been determined that a new
* extent should not be added, an attempt is made to satisfy
* the request by allocating an existing (backed) free inode
* from the allocation group.
*
* PRE CONDITION: Already have the AG lock for this AG.
*
* PARAMETERS:
* imap - pointer to inode map control structure.
* agno - allocation group to allocate from.
* dir - TRUE if the new disk inode is for a directory.
* ip - pointer to the new inode to be filled in on successful return
* with the disk inode number allocated, its extent address
* and the start of the ag.
*
* RETURN VALUES:
* 0 - success.
* ENOSPC - insufficient disk resources.
* EIO - i/o error.
*/
static int
diAllocAG(imap_t * imap, int agno, boolean_t dir, struct inode *ip)
{
int rc, addext, numfree, numinos;
/* get the number of free and the number of backed disk
* inodes currently within the ag.
*/
numfree = imap->im_agctl[agno].numfree;
numinos = imap->im_agctl[agno].numinos;
if (numfree > numinos) {
jERROR(1,("diAllocAG: numfree > numinos\n"));
updateSuper(ip->i_sb, FM_DIRTY);
return EIO;
}
/* determine if we should allocate a new extent of free inodes
* within the ag: for directory inodes, add a new extent
* if there are a small number of free inodes or number of free
* inodes is a small percentage of the number of backed inodes.
*/
if (dir == TRUE)
addext = (numfree < 64 ||
(numfree < 256
&& ((numfree * 100) / numinos) <= 20));
else
addext = (numfree == 0);
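	/*
	 * (The 64/256/20% figures above play the same role as the
	 * 96/288/25% keep-extent thresholds in diFree(): directories
	 * grow new extents aggressively while free inodes are scarce,
	 * regular files only when none are free at all.)
	 */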
/*
* try to allocate a new extent of free inodes.
*/
if (addext) {
		/* if free space is not available for this new extent, try
* below to allocate a free and existing (already backed)
* inode from the ag.
*/
if ((rc = diAllocExt(imap, agno, ip)) != ENOSPC)
return (rc);
}
/*
* try to allocate an existing free inode from the ag.
*/
return (diAllocIno(imap, agno, ip));
}
/*
 * NAME:	diAllocAny(imap,agno,dir,ip)
*
* FUNCTION: allocate a disk inode from any other allocation group.
*
* this routine is called when an allocation attempt within
 *	the primary allocation group has failed. it attempts to
* allocate an inode from any allocation group other than the
* specified primary group.
*
* PARAMETERS:
* imap - pointer to inode map control structure.
* agno - primary allocation group (to avoid).
* dir - TRUE if the new disk inode is for a directory.
* ip - pointer to a new inode to be filled in on successful return
* with the disk inode number allocated, its extent address
* and the start of the ag.
*
* RETURN VALUES:
* 0 - success.
* ENOSPC - insufficient disk resources.
* EIO - i/o error.
*/
static int
diAllocAny(imap_t * imap, int agno, boolean_t dir, struct inode *ip)
{
int ag, rc;
int maxag = JFS_SBI(imap->im_ipimap->i_sb)->bmap->db_maxag;
/* try to allocate from the ags following agno up to
* the maximum ag number.
*/
for (ag = agno + 1; ag <= maxag; ag++) {
AG_LOCK(imap, ag);
rc = diAllocAG(imap, ag, dir, ip);
AG_UNLOCK(imap, ag);
if (rc != ENOSPC)
return (rc);
}
/* try to allocate from the ags in front of agno.
*/
for (ag = 0; ag < agno; ag++) {
AG_LOCK(imap, ag);
rc = diAllocAG(imap, ag, dir, ip);
AG_UNLOCK(imap, ag);
if (rc != ENOSPC)
return (rc);
}
/* no free disk inodes.
*/
return (ENOSPC);
}
/*
* NAME: diAllocIno(imap,agno,ip)
*
* FUNCTION: allocate a disk inode from the allocation group's free
* inode list, returning an error if this free list is
* empty (i.e. no iags on the list).
*
* allocation occurs from the first iag on the list using
* the iag's free inode summary map to find the leftmost
* free inode in the iag.
*
* PRE CONDITION: Already have AG lock for this AG.
*
* PARAMETERS:
* imap - pointer to inode map control structure.
* agno - allocation group.
* ip - pointer to new inode to be filled in on successful return
* with the disk inode number allocated, its extent address
* and the start of the ag.
*
* RETURN VALUES:
* 0 - success.
* ENOSPC - insufficient disk resources.
* EIO - i/o error.
*/
static int diAllocIno(imap_t * imap, int agno, struct inode *ip)
{
int iagno, ino, rc, rem, extno, sword;
metapage_t *mp;
iag_t *iagp;
/* check if there are iags on the ag's free inode list.
*/
if ((iagno = imap->im_agctl[agno].inofree) < 0)
return (ENOSPC);
/* obtain read lock on imap inode */
IREAD_LOCK(imap->im_ipimap);
/* read the iag at the head of the list.
*/
if ((rc = diIAGRead(imap, iagno, &mp))) {
IREAD_UNLOCK(imap->im_ipimap);
return (rc);
}
iagp = (iag_t *) mp->data;
/* better be free inodes in this iag if it is on the
* list.
*/
//assert(iagp->nfreeinos);
if (!iagp->nfreeinos) {
jERROR(1,
("diAllocIno: nfreeinos = 0, but iag on freelist\n"));
jERROR(1, (" agno = %d, iagno = %d\n", agno, iagno));
dump_mem("iag", iagp, 64);
updateSuper(ip->i_sb, FM_DIRTY);
return EIO;
}
/* scan the free inode summary map to find an extent
* with free inodes.
*/
for (sword = 0;; sword++) {
assert(sword < SMAPSZ);
if (~iagp->inosmap[sword])
break;
}
	/* found an extent with free inodes. determine
* the extent number.
*/
rem = diFindFree(le32_to_cpu(iagp->inosmap[sword]), 0);
assert(rem < EXTSPERSUM);
extno = (sword << L2EXTSPERSUM) + rem;
/* find the first free inode in the extent.
*/
rem = diFindFree(le32_to_cpu(iagp->wmap[extno]), 0);
assert(rem < INOSPEREXT);
/* compute the inode number within the iag.
*/
ino = (extno << L2INOSPEREXT) + rem;
/* allocate the inode.
*/
rc = diAllocBit(imap, iagp, ino);
IREAD_UNLOCK(imap->im_ipimap);
if (rc) {
release_metapage(mp);
return (rc);
}
/* set the results of the allocation and write the iag.
*/
diInitInode(ip, iagno, ino, extno, iagp);
write_metapage(mp);
return (0);
}
/*
* NAME: diAllocExt(imap,agno,ip)
*
* FUNCTION: add a new extent of free inodes to an iag, allocating
* an inode from this extent to satisfy the current allocation
* request.
*
* this routine first tries to find an existing iag with free
 *	extents through the ag free extent list. if the list is not
* empty, the head of the list will be selected as the home
* of the new extent of free inodes. otherwise (the list is
* empty), a new iag will be allocated for the ag to contain
* the extent.
*
* once an iag has been selected, the free extent summary map
* is used to locate a free extent within the iag and diNewExt()
* is called to initialize the extent, with initialization
* including the allocation of the first inode of the extent
* for the purpose of satisfying this request.
*
* PARAMETERS:
* imap - pointer to inode map control structure.
* agno - allocation group number.
* ip - pointer to new inode to be filled in on successful return
* with the disk inode number allocated, its extent address
* and the start of the ag.
*
* RETURN VALUES:
* 0 - success.
* ENOSPC - insufficient disk resources.
* EIO - i/o error.
*/
static int diAllocExt(imap_t * imap, int agno, struct inode *ip)
{
int rem, iagno, sword, extno, rc;
metapage_t *mp;
iag_t *iagp;
/* check if the ag has any iags with free extents. if not,
* allocate a new iag for the ag.
*/
if ((iagno = imap->im_agctl[agno].extfree) < 0) {
/* If successful, diNewIAG will obtain the read lock on the
* imap inode.
*/
if ((rc = diNewIAG(imap, &iagno, agno, &mp))) {
return (rc);
}
iagp = (iag_t *) mp->data;
/* set the ag number if this a brand new iag
*/
iagp->agstart =
cpu_to_le64(AGTOBLK(agno, imap->im_ipimap));
} else {
/* read the iag.
*/
IREAD_LOCK(imap->im_ipimap);
if ((rc = diIAGRead(imap, iagno, &mp))) {
assert(0);
}
iagp = (iag_t *) mp->data;
}
/* using the free extent summary map, find a free extent.
*/
for (sword = 0;; sword++) {
assert(sword < SMAPSZ);
if (~iagp->extsmap[sword])
break;
}
/* determine the extent number of the free extent.
*/
rem = diFindFree(le32_to_cpu(iagp->extsmap[sword]), 0);
assert(rem < EXTSPERSUM);
extno = (sword << L2EXTSPERSUM) + rem;
/* initialize the new extent.
*/
rc = diNewExt(imap, iagp, extno);
IREAD_UNLOCK(imap->im_ipimap);
if (rc) {
/* something bad happened. if a new iag was allocated,
* place it back on the inode map's iag free list, and
* clear the ag number information.
*/
if (iagp->nfreeexts == cpu_to_le32(EXTSPERIAG)) {
IAGFREE_LOCK(imap);
iagp->iagfree = cpu_to_le32(imap->im_freeiag);
imap->im_freeiag = iagno;
IAGFREE_UNLOCK(imap);
}
write_metapage(mp);
return (rc);
}
/* set the results of the allocation and write the iag.
*/
diInitInode(ip, iagno, extno << L2INOSPEREXT, extno, iagp);
write_metapage(mp);
return (0);
}
/*
* NAME: diAllocBit(imap,iagp,ino)
*
* FUNCTION: allocate a backed inode from an iag.
*
* this routine performs the mechanics of allocating a
* specified inode from a backed extent.
*
* if the inode to be allocated represents the last free
* inode within the iag, the iag will be removed from the
* ag free inode list.
*
* a careful update approach is used to provide consistency
* in the face of updates to multiple buffers. under this
* approach, all required buffers are obtained before making
 *	any updates and are held until all updates are complete.
*
* PRE CONDITION: Already have buffer lock on iagp. Already have AG lock on
* this AG. Must have read lock on imap inode.
*
* PARAMETERS:
* imap - pointer to inode map control structure.
* iagp - pointer to iag.
* ino - inode number to be allocated within the iag.
*
* RETURN VALUES:
* 0 - success.
* ENOSPC - insufficient disk resources.
* EIO - i/o error.
*/
static int diAllocBit(imap_t * imap, iag_t * iagp, int ino)
{
int extno, bitno, agno, sword, rc;
metapage_t *amp, *bmp;
iag_t *aiagp = 0, *biagp = 0;
u32 mask;
/* check if this is the last free inode within the iag.
* if so, it will have to be removed from the ag free
	 * inode list, so get the iags preceding and
* it on the list.
*/
if (iagp->nfreeinos == cpu_to_le32(1)) {
amp = bmp = NULL;
if ((int) le32_to_cpu(iagp->inofreefwd) >= 0) {
if ((rc =
diIAGRead(imap, le32_to_cpu(iagp->inofreefwd),
&amp)))
return (rc);
aiagp = (iag_t *) amp->data;
}
if ((int) le32_to_cpu(iagp->inofreeback) >= 0) {
if ((rc =
diIAGRead(imap,
le32_to_cpu(iagp->inofreeback),
&bmp))) {
if (amp)
release_metapage(amp);
return (rc);
}
biagp = (iag_t *) bmp->data;
}
}
/* get the ag number, extent number, inode number within
* the extent.
*/
agno = BLKTOAG(le64_to_cpu(iagp->agstart), JFS_SBI(imap->im_ipimap->i_sb));
extno = ino >> L2INOSPEREXT;
bitno = ino & (INOSPEREXT - 1);
/* compute the mask for setting the map.
*/
mask = HIGHORDER >> bitno;
/* the inode should be free and backed.
*/
assert((le32_to_cpu(iagp->pmap[extno]) & mask) == 0);
assert((le32_to_cpu(iagp->wmap[extno]) & mask) == 0);
assert(addressPXD(&iagp->inoext[extno]) != 0);
/* mark the inode as allocated in the working map.
*/
iagp->wmap[extno] |= cpu_to_le32(mask);
/* check if all inodes within the extent are now
* allocated. if so, update the free inode summary
* map to reflect this.
*/
if (iagp->wmap[extno] == ONES) {
sword = extno >> L2EXTSPERSUM;
bitno = extno & (EXTSPERSUM - 1);
iagp->inosmap[sword] |= cpu_to_le32(HIGHORDER >> bitno);
}
/* if this was the last free inode in the iag, remove the
* iag from the ag free inode list.
*/
if (iagp->nfreeinos == cpu_to_le32(1)) {
if (amp) {
aiagp->inofreeback = iagp->inofreeback;
write_metapage(amp);
}
if (bmp) {
biagp->inofreefwd = iagp->inofreefwd;
write_metapage(bmp);
} else {
imap->im_agctl[agno].inofree =
le32_to_cpu(iagp->inofreefwd);
}
iagp->inofreefwd = iagp->inofreeback = -1;
}
/* update the free inode count at the iag, ag, inode
* map levels.
*/
iagp->nfreeinos = cpu_to_le32(le32_to_cpu(iagp->nfreeinos) - 1);
imap->im_agctl[agno].numfree -= 1;
atomic_dec(&imap->im_numfree);
return (0);
}
/*
* NAME: diNewExt(imap,iagp,extno)
*
* FUNCTION: initialize a new extent of inodes for an iag, allocating
* the first inode of the extent for use for the current
* allocation request.
*
* disk resources are allocated for the new extent of inodes
* and the inodes themselves are initialized to reflect their
* existence within the extent (i.e. their inode numbers and
* inode extent addresses are set) and their initial state
* (mode and link count are set to zero).
*
* if the iag is new, it is not yet on an ag extent free list
* but will now be placed on this list.
*
* if the allocation of the new extent causes the iag to
* have no free extent, the iag will be removed from the
* ag extent free list.
*
* if the iag has no free backed inodes, it will be placed
* on the ag free inode list, since the addition of the new
* extent will now cause it to have free inodes.
*
* a careful update approach is used to provide consistency
* (i.e. list consistency) in the face of updates to multiple
* buffers. under this approach, all required buffers are
* obtained before making any updates and are held until all
* updates are complete.
*
* PRE CONDITION: Already have buffer lock on iagp. Already have AG lock on
* this AG. Must have read lock on imap inode.
*
* PARAMETERS:
* imap - pointer to inode map control structure.
* iagp - pointer to iag.
* extno - extent number.
*
* RETURN VALUES:
* 0 - success.
* ENOSPC - insufficient disk resources.
* EIO - i/o error.
*/
static int diNewExt(imap_t * imap, iag_t * iagp, int extno)
{
int agno, iagno, fwd, back, freei = 0, sword, rc;
iag_t *aiagp = 0, *biagp = 0, *ciagp = 0;
metapage_t *amp, *bmp, *cmp, *dmp;
struct inode *ipimap;
s64 blkno, hint;
int i, j;
u32 mask;
ino_t ino;
dinode_t *dp;
struct jfs_sb_info *sbi;
/* better have free extents.
*/
assert(iagp->nfreeexts);
/* get the inode map inode.
*/
ipimap = imap->im_ipimap;
sbi = JFS_SBI(ipimap->i_sb);
amp = bmp = cmp = NULL;
/* get the ag and iag numbers for this iag.
*/
agno = BLKTOAG(le64_to_cpu(iagp->agstart), sbi);
iagno = le32_to_cpu(iagp->iagnum);
/* check if this is the last free extent within the
* iag. if so, the iag must be removed from the ag
	 * free extent list, so get the iags preceding and
* following the iag on this list.
*/
if (iagp->nfreeexts == cpu_to_le32(1)) {
if ((fwd = le32_to_cpu(iagp->extfreefwd)) >= 0) {
if ((rc = diIAGRead(imap, fwd, &amp)))
return (rc);
aiagp = (iag_t *) amp->data;
}
if ((back = le32_to_cpu(iagp->extfreeback)) >= 0) {
if ((rc = diIAGRead(imap, back, &bmp)))
goto error_out;
biagp = (iag_t *) bmp->data;
}
} else {
/* the iag has free extents. if all extents are free
* (as is the case for a newly allocated iag), the iag
* must be added to the ag free extent list, so get
* the iag at the head of the list in preparation for
* adding this iag to this list.
*/
fwd = back = -1;
if (iagp->nfreeexts == cpu_to_le32(EXTSPERIAG)) {
if ((fwd = imap->im_agctl[agno].extfree) >= 0) {
if ((rc = diIAGRead(imap, fwd, &amp)))
goto error_out;
aiagp = (iag_t *) amp->data;
}
}
}
/* check if the iag has no free inodes. if so, the iag
* will have to be added to the ag free inode list, so get
* the iag at the head of the list in preparation for
* adding this iag to this list. in doing this, we must
* check if we already have the iag at the head of
* the list in hand.
*/
if (iagp->nfreeinos == 0) {
freei = imap->im_agctl[agno].inofree;
if (freei >= 0) {
if (freei == fwd) {
ciagp = aiagp;
} else if (freei == back) {
ciagp = biagp;
} else {
if ((rc = diIAGRead(imap, freei, &cmp)))
goto error_out;
ciagp = (iag_t *) cmp->data;
}
assert(ciagp != NULL);
}
}
/* allocate disk space for the inode extent.
*/
if ((extno == 0) || (addressPXD(&iagp->inoext[extno - 1]) == 0))
hint = ((s64) agno << sbi->bmap->db_agl2size) - 1;
else
hint = addressPXD(&iagp->inoext[extno - 1]) +
lengthPXD(&iagp->inoext[extno - 1]) - 1;
if ((rc = dbAlloc(ipimap, hint, (s64) imap->im_nbperiext, &blkno)))
goto error_out;
/* compute the inode number of the first inode within the
* extent.
*/
ino = (iagno << L2INOSPERIAG) + (extno << L2INOSPEREXT);
/* initialize the inodes within the newly allocated extent a
* page at a time.
*/
for (i = 0; i < imap->im_nbperiext; i += sbi->nbperpage) {
/* get a buffer for this page of disk inodes.
*/
dmp = get_metapage(ipimap, blkno + i, PSIZE, 1);
if (dmp == NULL) {
rc = EIO;
goto error_out;
}
dp = (dinode_t *) dmp->data;
/* initialize the inode number, mode, link count and
* inode extent address.
*/
for (j = 0; j < INOSPERPAGE; j++, dp++, ino++) {
dp->di_inostamp = cpu_to_le32(sbi->inostamp);
dp->di_number = cpu_to_le32(ino);
dp->di_fileset = cpu_to_le32(FILESYSTEM_I);
dp->di_mode = 0;
dp->di_nlink = 0;
PXDaddress(&(dp->di_ixpxd), blkno);
PXDlength(&(dp->di_ixpxd), imap->im_nbperiext);
}
write_metapage(dmp);
}
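	/*
	 * (With the usual geometry -- PGSPERIEXT == 4 pages per extent and,
	 * assuming 512-byte disk inodes, INOSPERPAGE == 8 -- the loop above
	 * initializes all 32 dinodes of the new extent, one metapage at a
	 * time.)
	 */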
/* if this is the last free extent within the iag, remove the
* iag from the ag free extent list.
*/
if (iagp->nfreeexts == cpu_to_le32(1)) {
if (fwd >= 0)
aiagp->extfreeback = iagp->extfreeback;
if (back >= 0)
biagp->extfreefwd = iagp->extfreefwd;
else
imap->im_agctl[agno].extfree =
le32_to_cpu(iagp->extfreefwd);
iagp->extfreefwd = iagp->extfreeback = -1;
} else {
/* if the iag has all free extents (newly allocated iag),
* add the iag to the ag free extent list.
*/
if (iagp->nfreeexts == cpu_to_le32(EXTSPERIAG)) {
if (fwd >= 0)
aiagp->extfreeback = cpu_to_le32(iagno);
iagp->extfreefwd = cpu_to_le32(fwd);
iagp->extfreeback = -1;
imap->im_agctl[agno].extfree = iagno;
}
}
/* if the iag has no free inodes, add the iag to the
* ag free inode list.
*/
if (iagp->nfreeinos == 0) {
if (freei >= 0)
ciagp->inofreeback = cpu_to_le32(iagno);
iagp->inofreefwd =
cpu_to_le32(imap->im_agctl[agno].inofree);
iagp->inofreeback = -1;
imap->im_agctl[agno].inofree = iagno;
}
/* initialize the extent descriptor of the extent. */
PXDlength(&iagp->inoext[extno], imap->im_nbperiext);
PXDaddress(&iagp->inoext[extno], blkno);
/* initialize the working and persistent map of the extent.
* the working map will be initialized such that
* it indicates the first inode of the extent is allocated.
*/
iagp->wmap[extno] = cpu_to_le32(HIGHORDER);
iagp->pmap[extno] = 0;
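	/*
	 * Only the working map records this allocation for now; the
	 * persistent map stays clear until the allocating transaction
	 * commits and diUpdatePMap() runs (its precondition: "Working
	 * map has already been updated for allocate").
	 */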
/* update the free inode and free extent summary maps
* for the extent to indicate the extent has free inodes
* and no longer represents a free extent.
*/
sword = extno >> L2EXTSPERSUM;
mask = HIGHORDER >> (extno & (EXTSPERSUM - 1));
iagp->extsmap[sword] |= cpu_to_le32(mask);
iagp->inosmap[sword] &= cpu_to_le32(~mask);
/* update the free inode and free extent counts for the
* iag.
*/
iagp->nfreeinos = cpu_to_le32(le32_to_cpu(iagp->nfreeinos) +
(INOSPEREXT - 1));
iagp->nfreeexts = cpu_to_le32(le32_to_cpu(iagp->nfreeexts) - 1);
/* update the free and backed inode counts for the ag.
*/
imap->im_agctl[agno].numfree += (INOSPEREXT - 1);
imap->im_agctl[agno].numinos += INOSPEREXT;
/* update the free and backed inode counts for the inode map.
*/
atomic_add(INOSPEREXT - 1, &imap->im_numfree);
atomic_add(INOSPEREXT, &imap->im_numinos);
/* write the iags.
*/
if (amp)
write_metapage(amp);
if (bmp)
write_metapage(bmp);
if (cmp)
write_metapage(cmp);
return (0);
error_out:
/* release the iags.
*/
if (amp)
release_metapage(amp);
if (bmp)
release_metapage(bmp);
if (cmp)
release_metapage(cmp);
return (rc);
}
/*
 * NAME:	diNewIAG(imap,iagnop,agno,mpp)
*
* FUNCTION: allocate a new iag for an allocation group.
*
* first tries to allocate the iag from the inode map
* iagfree list:
* if the list has free iags, the head of the list is removed
* and returned to satisfy the request.
* if the inode map's iag free list is empty, the inode map
* is extended to hold a new iag. this new iag is initialized
* and returned to satisfy the request.
*
* PARAMETERS:
* imap - pointer to inode map control structure.
* iagnop - pointer to an iag number set with the number of the
* newly allocated iag upon successful return.
* agno - allocation group number.
 *	mpp	- metapage pointer to be filled in with the new IAG's buffer
*
* RETURN VALUES:
* 0 - success.
* ENOSPC - insufficient disk resources.
* EIO - i/o error.
*
* serialization:
* AG lock held on entry/exit;
* write lock on the map is held inside;
* read lock on the map is held on successful completion;
*
* note: new iag transaction:
* . synchronously write iag;
* . write log of xtree and inode of imap;
* . commit;
* . synchronous write of xtree (right to left, bottom to top);
* . at start of logredo(): init in-memory imap with one additional iag page;
* . at end of logredo(): re-read imap inode to determine
* new imap size;
*/
static int
diNewIAG(imap_t * imap, int *iagnop, int agno, metapage_t ** mpp)
{
int rc;
int iagno, i, xlen;
struct inode *ipimap;
struct super_block *sb;
struct jfs_sb_info *sbi;
metapage_t *mp;
iag_t *iagp;
s64 xaddr = 0;
s64 blkno;
tid_t tid;
#ifdef _STILL_TO_PORT
xad_t xad;
#endif /* _STILL_TO_PORT */
struct inode *iplist[1];
/* pick up pointers to the inode map and mount inodes */
ipimap = imap->im_ipimap;
sb = ipimap->i_sb;
sbi = JFS_SBI(sb);
/* acquire the free iag lock */
IAGFREE_LOCK(imap);
/* if there are any iags on the inode map free iag list,
* allocate the iag from the head of the list.
*/
if (imap->im_freeiag >= 0) {
/* pick up the iag number at the head of the list */
iagno = imap->im_freeiag;
/* determine the logical block number of the iag */
blkno = IAGTOLBLK(iagno, sbi->l2nbperpage);
} else {
		/* no free iags. the inode map will have to be extended
* to include a new iag.
*/
/* acquire inode map lock */
IWRITE_LOCK(ipimap);
assert(ipimap->i_size >> L2PSIZE == imap->im_nextiag + 1);
		/* get the next available iag number */
iagno = imap->im_nextiag;
/* make sure that we have not exceeded the maximum inode
* number limit.
*/
if (iagno > (MAXIAGS - 1)) {
/* release the inode map lock */
IWRITE_UNLOCK(ipimap);
rc = ENOSPC;
goto out;
}
/*
* synchronously append new iag page.
*/
/* determine the logical address of iag page to append */
blkno = IAGTOLBLK(iagno, sbi->l2nbperpage);
/* Allocate extent for new iag page */
xlen = sbi->nbperpage;
if ((rc = dbAlloc(ipimap, 0, (s64) xlen, &xaddr))) {
/* release the inode map lock */
IWRITE_UNLOCK(ipimap);
goto out;
}
/* assign a buffer for the page */
mp = get_metapage(ipimap, xaddr, PSIZE, 1);
//bp = bmAssign(ipimap, blkno, xaddr, PSIZE, bmREAD_PAGE);
if (!mp) {
/* Free the blocks allocated for the iag since it was
* not successfully added to the inode map
*/
dbFree(ipimap, xaddr, (s64) xlen);
/* release the inode map lock */
IWRITE_UNLOCK(ipimap);
rc = EIO;
goto out;
}
iagp = (iag_t *) mp->data;
/* init the iag */
memset(iagp, 0, sizeof(iag_t));
iagp->iagnum = cpu_to_le32(iagno);
iagp->inofreefwd = iagp->inofreeback = -1;
iagp->extfreefwd = iagp->extfreeback = -1;
iagp->iagfree = -1;
iagp->nfreeinos = 0;
iagp->nfreeexts = cpu_to_le32(EXTSPERIAG);
/* initialize the free inode summary map (free extent
		 * summary map initialization handled by the memset above).
*/
for (i = 0; i < SMAPSZ; i++)
iagp->inosmap[i] = ONES;
flush_metapage(mp);
#ifdef _STILL_TO_PORT
/* synchronously write the iag page */
if (bmWrite(bp)) {
/* Free the blocks allocated for the iag since it was
* not successfully added to the inode map
*/
dbFree(ipimap, xaddr, (s64) xlen);
/* release the inode map lock */
IWRITE_UNLOCK(ipimap);
rc = EIO;
goto out;
}
/* Now the iag is on disk */
/*
	 * start transaction to update the inode map
	 * addressing structure to point to the new iag page;
*/
#endif /* _STILL_TO_PORT */
tid = txBegin(sb, COMMIT_FORCE);
/* update the inode map addressing structure to point to it */
		if ((rc =
		     xtInsert(tid, ipimap, 0, blkno, xlen, &xaddr, 0))) {
			txAbort(tid, 0);
			txEnd(tid);
			/* Free the blocks allocated for the iag since it was
			 * not successfully added to the inode map
			 */
			dbFree(ipimap, xaddr, (s64) xlen);
			/* release the inode map lock */
			IWRITE_UNLOCK(ipimap);
			goto out;
		}
/* update the inode map's inode to reflect the extension */
ipimap->i_size += PSIZE;
ipimap->i_blocks += LBLK2PBLK(sb, xlen);
/*
* txCommit(COMMIT_FORCE) will synchronously write address
* index pages and inode after commit in careful update order
* of address index pages (right to left, bottom up);
*/
iplist[0] = ipimap;
rc = txCommit(tid, 1, &iplist[0], COMMIT_FORCE);
txEnd(tid);
duplicateIXtree(sb, blkno, xlen, &xaddr);
		/* update the next available iag number */
imap->im_nextiag += 1;
/* Add the iag to the iag free list so we don't lose the iag
* if a failure happens now.
*/
imap->im_freeiag = iagno;
/* Until we have logredo working, we want the imap inode &
* control page to be up to date.
*/
diSync(ipimap);
/* release the inode map lock */
IWRITE_UNLOCK(ipimap);
}
/* obtain read lock on map */
IREAD_LOCK(ipimap);
/* read the iag */
if ((rc = diIAGRead(imap, iagno, &mp))) {
IREAD_UNLOCK(ipimap);
rc = EIO;
goto out;
}
iagp = (iag_t *) mp->data;
/* remove the iag from the iag free list */
imap->im_freeiag = le32_to_cpu(iagp->iagfree);
iagp->iagfree = -1;
/* set the return iag number and buffer pointer */
*iagnop = iagno;
*mpp = mp;
out:
/* release the iag free lock */
IAGFREE_UNLOCK(imap);
return (rc);
}
/*
* NAME: diIAGRead()
*
* FUNCTION: get the buffer for the specified iag within a fileset
* or aggregate inode map.
*
* PARAMETERS:
* imap - pointer to inode map control structure.
* iagno - iag number.
 *	mpp	- pointer to the metapage pointer to be filled in on successful
* exit.
*
* SERIALIZATION:
* must have read lock on imap inode
* (When called by diExtendFS, the filesystem is quiesced, therefore
* the read lock is unnecessary.)
*
* RETURN VALUES:
* 0 - success.
* EIO - i/o error.
*/
static int diIAGRead(imap_t * imap, int iagno, metapage_t ** mpp)
{
struct inode *ipimap = imap->im_ipimap;
s64 blkno;
/* compute the logical block number of the iag. */
blkno = IAGTOLBLK(iagno, JFS_SBI(ipimap->i_sb)->l2nbperpage);
/* read the iag. */
*mpp = read_metapage(ipimap, blkno, PSIZE, 0);
if (*mpp == NULL) {
return (EIO);
}
return (0);
}
/*
* NAME: diFindFree()
*
* FUNCTION: find the first free bit in a word starting at
* the specified bit position.
*
* PARAMETERS:
* word - word to be examined.
* start - starting bit position.
*
* RETURN VALUES:
* bit position of first free bit in the word or 32 if
* no free bits were found.
*/
static int diFindFree(u32 word, int start)
{
int bitno;
assert(start < 32);
/* scan the word for the first free bit. */
for (word <<= start, bitno = start; bitno < 32;
bitno++, word <<= 1) {
if ((word & HIGHORDER) == 0)
break;
}
return (bitno);
}
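/*
 * e.g. diFindFree(0xf0000000, 0) returns 4 (the first clear bit after the
 * four set high-order bits), while diFindFree(ONES, 0) returns 32,
 * meaning the word has no free bit.
 */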
/*
* NAME: diUpdatePMap()
*
* FUNCTION: Update the persistent map in an IAG for the allocation or
* freeing of the specified inode.
*
* PRE CONDITIONS: Working map has already been updated for allocate.
*
* PARAMETERS:
* ipimap - Incore inode map inode
* inum - Number of inode to mark in permanent map
* is_free - If TRUE indicates inode should be marked freed, otherwise
 *	indicates inode should be marked allocated.
 *	tblk	- transaction block of the transaction being committed
*
* RETURNS: 0 for success
*/
int
diUpdatePMap(struct inode *ipimap,
unsigned long inum, boolean_t is_free, tblock_t * tblk)
{
int rc;
iag_t *iagp;
metapage_t *mp;
int iagno, ino, extno, bitno;
imap_t *imap;
u32 mask;
log_t *log;
int lsn, difft, diffp;
imap = JFS_IP(ipimap)->i_imap;
/* get the iag number containing the inode */
iagno = INOTOIAG(inum);
/* make sure that the iag is contained within the map */
assert(iagno < imap->im_nextiag);
/* read the iag */
IREAD_LOCK(ipimap);
rc = diIAGRead(imap, iagno, &mp);
IREAD_UNLOCK(ipimap);
if (rc)
return (rc);
iagp = (iag_t *) mp->data;
	/* get the inode number within the iag, the extent number of
	 * that inode, and the inode number within the extent.
*/
ino = inum & (INOSPERIAG - 1);
extno = ino >> L2INOSPEREXT;
bitno = ino & (INOSPEREXT - 1);
mask = HIGHORDER >> bitno;
/*
* mark the inode free in persistent map:
*/
if (is_free == TRUE) {
/* The inode should have been allocated both in working
* map and in persistent map;
		 * the inode will be freed from the working map at the
		 * release of the last reference;
*/
// assert(le32_to_cpu(iagp->wmap[extno]) & mask);
if (!(le32_to_cpu(iagp->wmap[extno]) & mask)) {
jERROR(1,
("diUpdatePMap: inode %ld not marked as allocated in wmap!\n",
inum));
updateSuper(ipimap->i_sb, FM_DIRTY);
}
// assert(le32_to_cpu(iagp->pmap[extno]) & mask);
if (!(le32_to_cpu(iagp->pmap[extno]) & mask)) {
jERROR(1,
("diUpdatePMap: inode %ld not marked as allocated in pmap!\n",
inum));
updateSuper(ipimap->i_sb, FM_DIRTY);
}
/* update the bitmap for the extent of the freed inode */
iagp->pmap[extno] &= cpu_to_le32(~mask);
}
/*
* mark the inode allocated in persistent map:
*/
else {
/* The inode should be already allocated in the working map
* and should be free in persistent map;
*/
assert(le32_to_cpu(iagp->wmap[extno]) & mask);
assert((le32_to_cpu(iagp->pmap[extno]) & mask) == 0);
/* update the bitmap for the extent of the allocated inode */
iagp->pmap[extno] |= cpu_to_le32(mask);
}
/*
* update iag lsn
*/
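	/*
	 * (Descriptive note: a metapage on the logsync list must carry the
	 * oldest lsn among the transactions that dirtied it, so that log
	 * space reclaim forces the page to disk no later than its oldest
	 * unwritten update requires; the commit lsn (clsn) moves the other
	 * way, tracking the youngest commit touching the page.)
	 */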
lsn = tblk->lsn;
log = JFS_SBI(tblk->sb)->log;
if (mp->lsn != 0) {
/* inherit older/smaller lsn */
logdiff(difft, lsn, log);
logdiff(diffp, mp->lsn, log);
if (difft < diffp) {
mp->lsn = lsn;
/* move mp after tblock in logsync list */
LOGSYNC_LOCK(log);
list_del(&mp->synclist);
list_add(&mp->synclist, &tblk->synclist);
LOGSYNC_UNLOCK(log);
}
/* inherit younger/larger clsn */
LOGSYNC_LOCK(log);
assert(mp->clsn);
logdiff(difft, tblk->clsn, log);
logdiff(diffp, mp->clsn, log);
if (difft > diffp)
mp->clsn = tblk->clsn;
LOGSYNC_UNLOCK(log);
} else {
mp->log = log;
mp->lsn = lsn;
/* insert mp after tblock in logsync list */
LOGSYNC_LOCK(log);
log->count++;
list_add(&mp->synclist, &tblk->synclist);
mp->clsn = tblk->clsn;
LOGSYNC_UNLOCK(log);
}
// bmLazyWrite(mp, log->flag & JFS_COMMIT);
write_metapage(mp);
return (0);
}
/*
* diExtendFS()
*
* function: update imap for extendfs();
*
* note: AG size has been increased s.t. each k old contiguous AGs are
* coalesced into a new AG;
*/
int diExtendFS(struct inode *ipimap, struct inode *ipbmap)
{
int rc, rcx = 0;
imap_t *imap = JFS_IP(ipimap)->i_imap;
iag_t *iagp = 0, *hiagp = 0;
bmap_t *mp = JFS_SBI(ipbmap->i_sb)->bmap;
metapage_t *bp, *hbp;
int i, n, head;
int numinos, xnuminos = 0, xnumfree = 0;
s64 agstart;
jEVENT(0, ("diExtendFS: nextiag:%d numinos:%d numfree:%d\n",
imap->im_nextiag, atomic_read(&imap->im_numinos),
atomic_read(&imap->im_numfree)));
/*
* reconstruct imap
*
* coalesce contiguous k (newAGSize/oldAGSize) AGs;
* i.e., (AGi, ..., AGj) where i = k*n and j = k*(n+1) - 1 to AGn;
* note: new AG size = old AG size * (2**x).
*/
/* init per AG control information im_agctl[] */
for (i = 0; i < MAXAG; i++) {
imap->im_agctl[i].inofree = -1; /* free inode list */
imap->im_agctl[i].extfree = -1; /* free extent list */
imap->im_agctl[i].numinos = 0; /* number of backed inodes */
imap->im_agctl[i].numfree = 0; /* number of free backed inodes */
}
/*
* process each iag_t page of the map.
*
* rebuild AG Free Inode List, AG Free Inode Extent List;
*/
for (i = 0; i < imap->im_nextiag; i++) {
if ((rc = diIAGRead(imap, i, &bp))) {
rcx = rc;
continue;
}
iagp = (iag_t *) bp->data;
assert(le32_to_cpu(iagp->iagnum) == i);
/* leave free iag in the free iag list */
if (iagp->nfreeexts == cpu_to_le32(EXTSPERIAG)) {
release_metapage(bp);
continue;
}
		/* agstart values that compute to the same new ag are
		 * treated as the same ag; */
agstart = le64_to_cpu(iagp->agstart);
/* iagp->agstart = agstart & ~(mp->db_agsize - 1); */
n = agstart >> mp->db_agl2size;
/*
printf("diExtendFS: iag:%d agstart:%Ld agno:%d\n", i, agstart, n);
*/
/* compute backed inodes */
numinos = (EXTSPERIAG - le32_to_cpu(iagp->nfreeexts))
<< L2INOSPEREXT;
if (numinos > 0) {
/* merge AG backed inodes */
imap->im_agctl[n].numinos += numinos;
xnuminos += numinos;
}
/* if any backed free inodes, insert at AG free inode list */
if ((int) le32_to_cpu(iagp->nfreeinos) > 0) {
if ((head = imap->im_agctl[n].inofree) == -1)
iagp->inofreefwd = iagp->inofreeback = -1;
else {
if ((rc = diIAGRead(imap, head, &hbp))) {
rcx = rc;
goto nextiag;
}
hiagp = (iag_t *) hbp->data;
				hiagp->inofreeback = iagp->iagnum;
iagp->inofreefwd = cpu_to_le32(head);
iagp->inofreeback = -1;
write_metapage(hbp);
}
imap->im_agctl[n].inofree =
le32_to_cpu(iagp->iagnum);
/* merge AG backed free inodes */
imap->im_agctl[n].numfree +=
le32_to_cpu(iagp->nfreeinos);
xnumfree += le32_to_cpu(iagp->nfreeinos);
}
/* if any free extents, insert at AG free extent list */
if (le32_to_cpu(iagp->nfreeexts) > 0) {
if ((head = imap->im_agctl[n].extfree) == -1)
iagp->extfreefwd = iagp->extfreeback = -1;
else {
if ((rc = diIAGRead(imap, head, &hbp))) {
rcx = rc;
goto nextiag;
}
hiagp = (iag_t *) hbp->data;
hiagp->extfreeback = iagp->iagnum;
iagp->extfreefwd = cpu_to_le32(head);
iagp->extfreeback = -1;
write_metapage(hbp);
}
imap->im_agctl[n].extfree =
le32_to_cpu(iagp->iagnum);
}
nextiag:
write_metapage(bp);
}
ASSERT(xnuminos == atomic_read(&imap->im_numinos) &&
xnumfree == atomic_read(&imap->im_numfree));
return rcx;
}
/*
* duplicateIXtree()
*
* serialization: IWRITE_LOCK held on entry/exit
*
* note: shadow page with regular inode (rel.2);
*/
static void
duplicateIXtree(struct super_block *sb, s64 blkno, int xlen, s64 * xaddr)
{
int rc;
tid_t tid;
struct inode *ip;
metapage_t *mpsuper;
struct jfs_superblock *j_sb;
/* if AIT2 ipmap2 is bad, do not try to update it */
if (JFS_SBI(sb)->mntflag & JFS_BAD_SAIT) /* s_flag */
return;
ip = diReadSpecial(sb, FILESYSTEM_I + INOSPEREXT);
if (ip == 0) {
JFS_SBI(sb)->mntflag |= JFS_BAD_SAIT;
if ((rc = readSuper(sb, &mpsuper)))
return;
j_sb = (struct jfs_superblock *) (mpsuper->data);
j_sb->s_flag |= JFS_BAD_SAIT;
write_metapage(mpsuper);
return;
}
/* start transaction */
tid = txBegin(sb, COMMIT_FORCE);
/* update the inode map addressing structure to point to it */
if ((rc = xtInsert(tid, ip, 0, blkno, xlen, xaddr, 0))) {
JFS_SBI(sb)->mntflag |= JFS_BAD_SAIT;
txAbort(tid, 1);
goto cleanup;
}
/* update the inode map's inode to reflect the extension */
ip->i_size += PSIZE;
ip->i_blocks += LBLK2PBLK(sb, xlen);
rc = txCommit(tid, 1, &ip, COMMIT_FORCE);
cleanup:
txEnd(tid);
diFreeSpecial(ip);
}
/*
* NAME: copy_from_dinode()
*
* FUNCTION: Copies inode info from disk inode to in-memory inode
*
* RETURN VALUES:
* 0 - success
* ENOMEM - insufficient memory
*/
static int copy_from_dinode(dinode_t * dip, struct inode *ip)
{
struct jfs_inode_info *jfs_ip = JFS_IP(ip);
jfs_ip->fileset = le32_to_cpu(dip->di_fileset);
jfs_ip->mode2 = le32_to_cpu(dip->di_mode);
ip->i_mode = le32_to_cpu(dip->di_mode) & 0xffff;
ip->i_nlink = le32_to_cpu(dip->di_nlink);
ip->i_uid = le32_to_cpu(dip->di_uid);
ip->i_gid = le32_to_cpu(dip->di_gid);
ip->i_size = le64_to_cpu(dip->di_size);
ip->i_atime = le32_to_cpu(dip->di_atime.tv_sec);
ip->i_mtime = le32_to_cpu(dip->di_mtime.tv_sec);
ip->i_ctime = le32_to_cpu(dip->di_ctime.tv_sec);
ip->i_blksize = ip->i_sb->s_blocksize;
ip->i_blocks = LBLK2PBLK(ip->i_sb, le64_to_cpu(dip->di_nblocks));
ip->i_version = ++event;
ip->i_generation = le32_to_cpu(dip->di_gen);
jfs_ip->ixpxd = dip->di_ixpxd; /* in-memory pxd's are little-endian */
jfs_ip->acl = dip->di_acl; /* as are dxd's */
jfs_ip->ea = dip->di_ea;
jfs_ip->next_index = le32_to_cpu(dip->di_next_index);
jfs_ip->otime = le32_to_cpu(dip->di_otime.tv_sec);
jfs_ip->acltype = le32_to_cpu(dip->di_acltype);
/*
* We may only need to do this for "special" inodes (dmap, imap)
*/
if (S_ISCHR(ip->i_mode) || S_ISBLK(ip->i_mode))
ip->i_rdev = to_kdev_t(le32_to_cpu(dip->di_rdev));
else if (S_ISDIR(ip->i_mode)) {
memcpy(&jfs_ip->i_dirtable, &dip->di_dirtable, 384);
} else if (!S_ISFIFO(ip->i_mode)) {
memcpy(&jfs_ip->i_xtroot, &dip->di_xtroot, 288);
}
/* Zero the in-memory-only stuff */
jfs_ip->cflag = 0;
jfs_ip->btindex = 0;
jfs_ip->btorder = 0;
jfs_ip->bxflag = 0;
jfs_ip->blid = 0;
jfs_ip->atlhead = 0;
jfs_ip->atltail = 0;
jfs_ip->xtlid = 0;
return (0);
}
/*
* NAME: copy_to_dinode()
*
* FUNCTION: Copies inode info from in-memory inode to disk inode
*/
static void copy_to_dinode(dinode_t * dip, struct inode *ip)
{
struct jfs_inode_info *jfs_ip = JFS_IP(ip);
dip->di_fileset = cpu_to_le32(jfs_ip->fileset);
dip->di_inostamp = cpu_to_le32(JFS_SBI(ip->i_sb)->inostamp);
dip->di_number = cpu_to_le32(ip->i_ino);
dip->di_gen = cpu_to_le32(ip->i_generation);
dip->di_size = cpu_to_le64(ip->i_size);
dip->di_nblocks = cpu_to_le64(PBLK2LBLK(ip->i_sb, ip->i_blocks));
dip->di_nlink = cpu_to_le32(ip->i_nlink);
dip->di_uid = cpu_to_le32(ip->i_uid);
dip->di_gid = cpu_to_le32(ip->i_gid);
/*
* mode2 is only needed for storing the higher order bits.
* Trust i_mode for the lower order ones
*/
dip->di_mode = cpu_to_le32((jfs_ip->mode2 & 0xffff0000) | ip->i_mode);
dip->di_atime.tv_sec = cpu_to_le32(ip->i_atime);
dip->di_atime.tv_nsec = 0;
dip->di_ctime.tv_sec = cpu_to_le32(ip->i_ctime);
dip->di_ctime.tv_nsec = 0;
dip->di_mtime.tv_sec = cpu_to_le32(ip->i_mtime);
dip->di_mtime.tv_nsec = 0;
dip->di_ixpxd = jfs_ip->ixpxd; /* in-memory pxd's are little-endian */
dip->di_acl = jfs_ip->acl; /* as are dxd's */
dip->di_ea = jfs_ip->ea;
dip->di_next_index = cpu_to_le32(jfs_ip->next_index);
dip->di_otime.tv_sec = cpu_to_le32(jfs_ip->otime);
dip->di_otime.tv_nsec = 0;
dip->di_acltype = cpu_to_le32(jfs_ip->acltype);
if (S_ISCHR(ip->i_mode) || S_ISBLK(ip->i_mode))
dip->di_rdev = cpu_to_le32(kdev_t_to_nr(ip->i_rdev));
}
#ifdef _JFS_DEBUG_IMAP
/*
* DBGdiInit()
*/
static void DBGdiInit(imap_t * imap)
{
u32 *dimap;
int size;
size = 64 * 1024;
if ((dimap = (u32 *) xmalloc(size, L2PSIZE, kernel_heap)) == NULL)
assert(0);
bzero((void *) dimap, size);
imap->im_DBGdimap = dimap;
}
/*
* DBGdiAlloc()
*/
static void DBGdiAlloc(imap_t * imap, ino_t ino)
{
u32 *dimap = imap->im_DBGdimap;
int w, b;
u32 m;
w = ino >> 5;
b = ino & 31;
m = 0x80000000 >> b;
assert(w < 64 * 256);
if (dimap[w] & m) {
printk("DEBUG diAlloc: duplicate alloc ino:0x%x\n", ino);
}
dimap[w] |= m;
}
/*
* DBGdiFree()
*/
static void DBGdiFree(imap_t * imap, ino_t ino)
{
u32 *dimap = imap->im_DBGdimap;
int w, b;
u32 m;
w = ino >> 5;
b = ino & 31;
m = 0x80000000 >> b;
assert(w < 64 * 256);
if ((dimap[w] & m) == 0) {
printk("DEBUG diFree: duplicate free ino:0x%x\n", ino);
}
dimap[w] &= ~m;
}
static void dump_cp(imap_t * ipimap, char *function, int line)
{
printk("\n* ********* *\nControl Page %s %d\n", function, line);
printk("FreeIAG %d\tNextIAG %d\n", ipimap->im_freeiag,
ipimap->im_nextiag);
printk("NumInos %d\tNumFree %d\n",
atomic_read(&ipimap->im_numinos),
atomic_read(&ipimap->im_numfree));
printk("AG InoFree %d\tAG ExtFree %d\n",
ipimap->im_agctl[0].inofree, ipimap->im_agctl[0].extfree);
printk("AG NumInos %d\tAG NumFree %d\n",
ipimap->im_agctl[0].numinos, ipimap->im_agctl[0].numfree);
}
static void dump_iag(iag_t * iag, char *function, int line)
{
printk("\n* ********* *\nIAG %s %d\n", function, line);
printk("IagNum %d\tIAG Free %d\n", le32_to_cpu(iag->iagnum),
le32_to_cpu(iag->iagfree));
printk("InoFreeFwd %d\tInoFreeBack %d\n",
le32_to_cpu(iag->inofreefwd),
le32_to_cpu(iag->inofreeback));
printk("ExtFreeFwd %d\tExtFreeBack %d\n",
le32_to_cpu(iag->extfreefwd),
le32_to_cpu(iag->extfreeback));
printk("NFreeInos %d\tNFreeExts %d\n", le32_to_cpu(iag->nfreeinos),
le32_to_cpu(iag->nfreeexts));
}
#endif /* _JFS_DEBUG_IMAP */
/*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/
#ifndef _H_JFS_IMAP
#define _H_JFS_IMAP
#include "jfs_txnmgr.h"
/*
* jfs_imap.h: disk inode manager
*/
#define EXTSPERIAG 128 /* number of disk inode extents per iag */
#define IMAPBLKNO 0 /* lblkno of dinomap within inode map */
#define SMAPSZ 4 /* number of words per summary map */
#define EXTSPERSUM 32 /* number of extents per summary map entry */
#define L2EXTSPERSUM 5 /* l2 number of extents per summary map */
#define PGSPERIEXT 4 /* number of 4K pages per dinode extent */
#define MAXIAGS ((1<<20)-1) /* maximum number of iags */
#define MAXAG 128 /* maximum number of allocation groups */
#define AMAPSIZE 512 /* bytes in the IAG allocation maps */
#define SMAPSIZE 16 /* bytes in the IAG summary maps */
/* convert inode number to iag number */
#define INOTOIAG(ino) ((ino) >> L2INOSPERIAG)
/* convert iag number to logical block number of the iag page */
#define IAGTOLBLK(iagno,l2nbperpg) (((iagno) + 1) << (l2nbperpg))
/* get the starting block number of the 4K page of an inode extent
* that contains ino.
*/
#define INOPBLK(pxd,ino,l2nbperpg) (addressPXD((pxd)) + \
((((ino) & (INOSPEREXT-1)) >> L2INOSPERPAGE) << (l2nbperpg)))
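/*
 * Worked example of the two macros above (a sketch; assumes the usual
 * JFS geometry from jfs_dinode.h: INOSPEREXT == 32 inodes per extent,
 * L2INOSPERPAGE == 3, i.e. 8 inodes per 4K page, and L2INOSPERIAG == 12,
 * i.e. 4096 inodes per iag):
 *
 *   ino = 5000: INOTOIAG(5000) = 5000 >> 12 = 1, so the inode lives
 *   in iag 1; 5000 & (INOSPEREXT-1) = 8 and 8 >> 3 = 1, so with a 4K
 *   block size (l2nbperpg == 0) INOPBLK() returns the block address
 *   of the second 4K page of the inode extent.
 */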
/*
* inode allocation map:
*
* inode allocation map consists of
* . the inode map control page and
* . inode allocation group pages (per 4096 inodes)
* which are addressed by standard JFS xtree.
*/
/*
* inode allocation group page (per 4096 inodes of an AG)
*/
typedef struct {
s64 agstart; /* 8: starting block of ag */
s32 iagnum; /* 4: inode allocation group number */
s32 inofreefwd; /* 4: ag inode free list forward */
s32 inofreeback; /* 4: ag inode free list back */
s32 extfreefwd; /* 4: ag inode extent free list forward */
s32 extfreeback; /* 4: ag inode extent free list back */
s32 iagfree; /* 4: iag free list */
/* summary map: 1 bit per inode extent */
s32 inosmap[SMAPSZ]; /* 16: sum map of mapwords w/ free inodes;
* note: a bit covers one inode extent and
* encodes both backed and free state: the
* bit is 1 if the extent is not backed, or
* is backed with every inode in use; it is
* 0 only if the extent is backed and at
* least one of its inodes is free (see the
* sketch following this struct).
*/
s32 extsmap[SMAPSZ]; /* 16: sum map of mapwords w/ free extents */
s32 nfreeinos; /* 4: number of free inodes */
s32 nfreeexts; /* 4: number of free extents */
/* (72) */
u8 pad[1976]; /* 1976: pad to 2048 bytes */
/* allocation bit map: 1 bit per inode (0 - free, 1 - allocated) */
u32 wmap[EXTSPERIAG]; /* 512: working allocation map */
u32 pmap[EXTSPERIAG]; /* 512: persistent allocation map */
pxd_t inoext[EXTSPERIAG]; /* 1024: inode extent addresses */
} iag_t; /* (4096) */
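/*
 * A minimal sketch of how the summary maps above are meant to be
 * consumed (a hypothetical helper, not the allocator in jfs_imap.c;
 * assumes le32_to_cpu is visible here, as it is throughout jfs): a 0
 * bit in inosmap[] marks an extent that is backed and still has a free
 * inode, so a searcher scans the SMAPSZ summary words first and only
 * then the EXTSPERSUM extents each word covers.
 */
static inline int iagFirstFreeWord(iag_t * iagp)
{
int word;
for (word = 0; word < SMAPSZ; word++)
if (~le32_to_cpu(iagp->inosmap[word])) /* any 0 bit ? */
return word; /* extents word*EXTSPERSUM .. word*EXTSPERSUM+31 */
return -1; /* no backed extent with a free inode */
}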
/*
* per AG control information (in inode map control page)
*/
typedef struct {
s32 inofree; /* 4: free inode list anchor */
s32 extfree; /* 4: free extent list anchor */
s32 numinos; /* 4: number of backed inodes */
s32 numfree; /* 4: number of free inodes */
} iagctl_t; /* (16) */
/*
* per fileset/aggregate inode map control page
*/
typedef struct {
s32 in_freeiag; /* 4: free iag list anchor */
s32 in_nextiag; /* 4: next free iag number */
s32 in_numinos; /* 4: num of backed inodes */
s32 in_numfree; /* 4: num of free backed inodes */
s32 in_nbperiext; /* 4: num of blocks per inode extent */
s32 in_l2nbperiext; /* 4: l2 of in_nbperiext */
s32 in_diskblock; /* 4: for standalone test driver */
s32 in_maxag; /* 4: for standalone test driver */
u8 pad[2016]; /* 2016: pad to 2048 */
iagctl_t in_agctl[MAXAG]; /* 2048: AG control information */
} dinomap_t; /* (4096) */
/*
* In-core inode map control page
*/
typedef struct inomap {
dinomap_t im_imap; /* 4096: inode allocation control */
struct inode *im_ipimap; /* 4: ptr to inode for imap */
struct semaphore im_freelock; /* 4: iag free list lock */
struct semaphore im_aglock[MAXAG]; /* 512: per AG locks */
u32 *im_DBGdimap;
atomic_t im_numinos; /* num of backed inodes */
atomic_t im_numfree; /* num of free backed inodes */
} imap_t;
#define im_freeiag im_imap.in_freeiag
#define im_nextiag im_imap.in_nextiag
#define im_agctl im_imap.in_agctl
#define im_nbperiext im_imap.in_nbperiext
#define im_l2nbperiext im_imap.in_l2nbperiext
/* for standalone testdriver
*/
#define im_diskblock im_imap.in_diskblock
#define im_maxag im_imap.in_maxag
extern int diFree(struct inode *);
extern int diAlloc(struct inode *, boolean_t, struct inode *);
extern int diSync(struct inode *);
/* external references */
extern int diUpdatePMap(struct inode *ipimap, unsigned long inum,
boolean_t is_free, tblock_t * tblk);
#ifdef _STILL_TO_PORT
extern int diExtendFS(inode_t * ipimap, inode_t * ipbmap);
#endif /* _STILL_TO_PORT */
extern int diMount(struct inode *);
extern int diUnmount(struct inode *, int);
extern int diRead(struct inode *);
extern void diClearExtension(struct inode *);
extern struct inode *diReadSpecial(struct super_block *, ino_t);
extern void diWriteSpecial(struct inode *);
extern void diFreeSpecial(struct inode *);
extern int diWrite(tid_t tid, struct inode *);
#endif /* _H_JFS_IMAP */
/*
*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*
*/
#ifndef _H_JFS_INCORE
#define _H_JFS_INCORE
#include <linux/slab.h>
#include <asm/bitops.h>
#include "jfs_types.h"
#include "jfs_xtree.h"
#include "jfs_dtree.h"
/*
* JFS magic number
*/
#define JFS_SUPER_MAGIC 0x3153464a /* "JFS1" */
/*
* Due to header ordering problems this can't be in jfs_lock.h
*/
typedef struct jfs_rwlock {
struct rw_semaphore rw_sem;
atomic_t in_use; /* for hacked implementation of trylock */
} jfs_rwlock_t;
/*
* JFS-private inode information
*/
struct jfs_inode_info {
int fileset; /* fileset number (always 16) */
uint mode2; /* jfs-specific mode */
pxd_t ixpxd; /* inode extent descriptor */
dxd_t acl; /* dxd describing acl */
dxd_t ea; /* dxd describing ea */
time_t otime; /* time created */
uint next_index; /* next available directory entry index */
int acltype; /* Type of ACL */
short btorder; /* access order */
short btindex; /* btpage entry index */
struct inode *ipimap; /* inode map */
long cflag; /* commit flags */
u16 bxflag; /* xflag of pseudo buffer? */
unchar agno; /* ag number */
unchar pad; /* pad */
lid_t blid; /* lid of pseudo buffer? */
lid_t atlhead; /* anonymous tlock list head */
lid_t atltail; /* anonymous tlock list tail */
struct list_head anon_inode_list; /* inodes having anonymous txns */
struct list_head mp_list; /* metapages in inode's address space */
jfs_rwlock_t rdwrlock; /* read/write lock */
lid_t xtlid; /* lid of xtree lock on directory */
union {
struct {
xtpage_t _xtroot; /* 288: xtree root */
struct inomap *_imap; /* 4: inode map header */
} file;
struct {
dir_table_slot_t _table[12]; /* 96: directory index */
dtroot_t _dtroot; /* 288: dtree root */
} dir;
struct {
unchar _unused[16]; /* 16: */
dxd_t _dxd; /* 16: */
unchar _inline[128]; /* 128: inline symlink */
} link;
} u;
struct inode vfs_inode;
};
#define i_xtroot u.file._xtroot
#define i_imap u.file._imap
#define i_dirtable u.dir._table
#define i_dtroot u.dir._dtroot
#define i_inline u.link._inline
/*
* cflag
*/
enum cflags {
COMMIT_New, /* never committed inode */
COMMIT_Nolink, /* inode committed with zero link count */
COMMIT_Inlineea, /* commit inode inline EA */
COMMIT_Freewmap, /* free WMAP at iClose() */
COMMIT_Dirty, /* Inode is really dirty */
COMMIT_Holdlock, /* Hold the IWRITE_LOCK until commit is done */
COMMIT_Dirtable, /* commit changes to di_dirtable */
COMMIT_Stale, /* data extent is no longer valid */
COMMIT_Synclist, /* metadata pages on group commit synclist */
};
#define set_cflag(flag, ip) set_bit(flag, &(JFS_IP(ip)->cflag))
#define clear_cflag(flag, ip) clear_bit(flag, &(JFS_IP(ip)->cflag))
#define test_cflag(flag, ip) test_bit(flag, &(JFS_IP(ip)->cflag))
#define test_and_clear_cflag(flag, ip) \
test_and_clear_bit(flag, &(JFS_IP(ip)->cflag))
/*
* JFS-private superblock information.
*/
struct jfs_sb_info {
unsigned long mntflag; /* 4: aggregate attributes */
struct inode *ipbmap; /* 4: block map inode */
struct inode *ipaimap; /* 4: aggregate inode map inode */
struct inode *ipaimap2; /* 4: secondary aimap inode */
struct inode *ipimap; /* 4: fileset inode map inode */
struct jfs_log *log; /* 4: log */
short bsize; /* 2: logical block size */
short l2bsize; /* 2: log2 logical block size */
short nbperpage; /* 2: blocks per page */
short l2nbperpage; /* 2: log2 blocks per page */
short l2niperblk; /* 2: log2 inodes per page */
short reserved; /* 2: reserved */
pxd_t logpxd; /* 8: pxd describing log */
pxd_t ait2; /* 8: pxd describing AIT copy */
/* Formerly in ipimap */
uint gengen; /* 4: inode generation generator*/
uint inostamp; /* 4: shows inode belongs to fileset*/
/* Formerly in ipbmap */
struct bmap *bmap; /* 4: incore bmap descriptor */
struct nls_table *nls_tab; /* 4: current codepage */
struct inode *direct_inode; /* 4: inode for physical I/O */
struct address_space *direct_mapping; /* 4: mapping for physical I/O */
uint state; /* 4: mount/recovery state */
};
static inline struct jfs_inode_info *JFS_IP(struct inode *inode)
{
return list_entry(inode, struct jfs_inode_info, vfs_inode);
}
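/*
 * JFS_IP() above is the standard container_of idiom: vfs_inode is
 * embedded in struct jfs_inode_info, so list_entry() backs up from the
 * generic VFS inode to the enclosing jfs inode, e.g. (a sketch):
 *
 *   pxd_t ixpxd = JFS_IP(inode)->ixpxd;
 */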
#define JFS_SBI(sb) ((struct jfs_sb_info *)(sb)->u.generic_sbp)
#define isReadOnly(ip) ((JFS_SBI((ip)->i_sb)->log) ? 0 : 1)
#endif /* _H_JFS_INCORE */
/*
*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/
#include <linux/fs.h>
#include "jfs_incore.h"
#include "jfs_filsys.h"
#include "jfs_imap.h"
#include "jfs_dinode.h"
#include "jfs_debug.h"
/*
* NAME: ialloc()
*
* FUNCTION: Allocate a new inode
*
*/
struct inode *ialloc(struct inode *parent, umode_t mode)
{
struct super_block *sb = parent->i_sb;
struct inode *inode;
struct jfs_inode_info *jfs_inode;
int rc;
inode = new_inode(sb);
if (!inode) {
jERROR(1, ("ialloc: new_inode returned NULL!\n"));
return inode;
}
jfs_inode = JFS_IP(inode);
rc = diAlloc(parent, S_ISDIR(mode), inode);
if (rc) {
jERROR(1, ("ialloc: diAlloc returned %d!\n", rc));
make_bad_inode(inode);
iput(inode);
return NULL;
}
inode->i_uid = current->fsuid;
if (parent->i_mode & S_ISGID) {
inode->i_gid = parent->i_gid;
if (S_ISDIR(mode))
mode |= S_ISGID;
} else
inode->i_gid = current->fsgid;
inode->i_mode = mode;
if (S_ISDIR(mode))
jfs_inode->mode2 = IDIRECTORY | mode;
else
jfs_inode->mode2 = INLINEEA | ISPARSE | mode;
inode->i_blksize = sb->s_blocksize;
inode->i_blocks = 0;
inode->i_mtime = inode->i_atime = inode->i_ctime = CURRENT_TIME;
jfs_inode->otime = inode->i_ctime;
inode->i_version = ++event;
inode->i_generation = JFS_SBI(sb)->gengen++;
jfs_inode->cflag = 0;
set_cflag(COMMIT_New, inode);
/* Zero remaining fields */
memset(&jfs_inode->acl, 0, sizeof(dxd_t));
memset(&jfs_inode->ea, 0, sizeof(dxd_t));
jfs_inode->next_index = 0;
jfs_inode->acltype = 0;
jfs_inode->btorder = 0;
jfs_inode->btindex = 0;
jfs_inode->bxflag = 0;
jfs_inode->blid = 0;
jfs_inode->atlhead = 0;
jfs_inode->atltail = 0;
jfs_inode->xtlid = 0;
jFYI(1, ("ialloc returns inode = 0x%p\n", inode));
return inode;
}
/*
* NAME: iwritelocklist()
*
* FUNCTION: Lock multiple inodes in sorted order to avoid deadlock
*
*/
void iwritelocklist(int n, ...)
{
va_list ilist;
struct inode *sort[4];
struct inode *ip;
int k, m;
va_start(ilist, n);
for (k = 0; k < n; k++)
sort[k] = va_arg(ilist, struct inode *);
va_end(ilist);
/* Bubble sort in descending order */
do {
m = 0;
for (k = 0; k < n; k++)
if ((k + 1) < n
&& sort[k + 1]->i_ino > sort[k]->i_ino) {
ip = sort[k];
sort[k] = sort[k + 1];
sort[k + 1] = ip;
m++;
}
} while (m);
/* Lock them */
for (k = 0; k < n; k++) {
IWRITE_LOCK(sort[k]);
}
}
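/*
 * Usage sketch for iwritelocklist() (hypothetical call site; real
 * callers go through the IWRITE_LOCK_LIST macro in jfs_lock.h).  Note
 * that sort[] above holds at most 4 inodes:
 *
 *   IWRITE_LOCK_LIST(3, old_dir, new_dir, old_ip);
 *   ... update all three inodes under their write locks ...
 *   IWRITE_UNLOCK(old_ip);
 *   IWRITE_UNLOCK(new_dir);
 *   IWRITE_UNLOCK(old_dir);
 */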
/*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/
#ifndef _H_JFS_INODE
#define _H_JFS_INODE
extern struct inode *ialloc(struct inode *, umode_t);
#endif /* _H_JFS_INODE */
/*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/
#ifndef _H_JFS_LOCK
#define _H_JFS_LOCK
#include <linux/spinlock.h>
#include <linux/sched.h>
/*
* jfs_lock.h
*
* JFS lock definition for globally referenced locks
*/
/* readers/writer lock: thread-thread */
/*
* RW semaphores do not currently have a trylock function. Since the
* implementation varies by platform, I have implemented a platform-independent
* wrapper around the rw_semaphore routines. If this turns out to be the best
* way of avoiding our locking problems, I will push to get a trylock
* implemented in the kernel, but I'd rather find a way to avoid having to
* use it.
*/
#define RDWRLOCK_T jfs_rwlock_t
static inline void RDWRLOCK_INIT(jfs_rwlock_t * Lock)
{
init_rwsem(&Lock->rw_sem);
atomic_set(&Lock->in_use, 0);
}
static inline void READ_LOCK(jfs_rwlock_t * Lock)
{
atomic_inc(&Lock->in_use);
down_read(&Lock->rw_sem);
}
static inline void READ_UNLOCK(jfs_rwlock_t * Lock)
{
up_read(&Lock->rw_sem);
atomic_dec(&Lock->in_use);
}
static inline void WRITE_LOCK(jfs_rwlock_t * Lock)
{
atomic_inc(&Lock->in_use);
down_write(&Lock->rw_sem);
}
static inline int WRITE_TRYLOCK(jfs_rwlock_t * Lock)
{
if (atomic_read(&Lock->in_use))
return 0;
WRITE_LOCK(Lock);
return 1;
}
static inline void WRITE_UNLOCK(jfs_rwlock_t * Lock)
{
up_write(&Lock->rw_sem);
atomic_dec(&Lock->in_use);
}
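/*
 * Sketch of the trylock semantics (approximate, as the comment above
 * concedes): in_use is tested without holding rw_sem, so a reader can
 * slip in between the test and down_write(), meaning a "successful"
 * WRITE_TRYLOCK() may still sleep briefly; a non-zero in_use makes it
 * fail conservatively.  Intended use:
 *
 *   if (WRITE_TRYLOCK(&JFS_IP(ip)->rdwrlock)) {
 *           ... short critical section ...
 *           WRITE_UNLOCK(&JFS_IP(ip)->rdwrlock);
 *   }
 */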
#define IREAD_LOCK(ip) READ_LOCK(&JFS_IP(ip)->rdwrlock)
#define IREAD_UNLOCK(ip) READ_UNLOCK(&JFS_IP(ip)->rdwrlock)
#define IWRITE_LOCK(ip) WRITE_LOCK(&JFS_IP(ip)->rdwrlock)
#define IWRITE_TRYLOCK(ip) WRITE_TRYLOCK(&JFS_IP(ip)->rdwrlock)
#define IWRITE_UNLOCK(ip) WRITE_UNLOCK(&JFS_IP(ip)->rdwrlock)
#define IWRITE_LOCK_LIST iwritelocklist
extern void iwritelocklist(int, ...);
/*
* Conditional sleep where condition is protected by spinlock
*
* lock_cmd and unlock_cmd take and release the spinlock
*/
#define __SLEEP_COND(wq, cond, lock_cmd, unlock_cmd) \
do { \
DECLARE_WAITQUEUE(__wait, current); \
\
add_wait_queue(&wq, &__wait); \
for (;;) { \
set_current_state(TASK_UNINTERRUPTIBLE);\
if (cond) \
break; \
unlock_cmd; \
schedule(); \
lock_cmd; \
} \
current->state = TASK_RUNNING; \
remove_wait_queue(&wq, &__wait); \
} while (0)
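/*
 * Usage sketch: the caller tests cond while holding the spinlock that
 * lock_cmd/unlock_cmd name; __SLEEP_COND() drops that lock across
 * schedule() and retakes it before re-testing.  The LCACHE_SLEEP_COND()
 * wrapper in jfs_logmgr.c is the in-tree pattern:
 *
 *   LCACHE_LOCK(flags);
 *   LCACHE_SLEEP_COND(log->free_wait, (bp = log->lbuf_free), flags);
 *   log->lbuf_free = bp->l_freelist;
 *   LCACHE_UNLOCK(flags);
 */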
#endif /* _H_JFS_LOCK */
/*
*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*
*/
/*
* jfs_logmgr.c: log manager
*
* for related information, see transaction manager (jfs_txnmgr.c), and
* recovery manager (jfs_logredo.c).
*
* note: for detail, RTFS.
*
* log buffer manager:
* special purpose buffer manager supporting log i/o requirements.
* per log serial pageout of logpage
* queuing i/o requests and redrive i/o at iodone
* maintain current logpage buffer
* no caching since append only
* appropriate jfs buffer cache buffers as needed
*
* group commit:
* transactions which wrote COMMIT records in the same in-memory
* log page during the pageout of previous/current log page(s) are
* committed together by the pageout of the page.
*
* TBD lazy commit:
* transactions are committed asynchronously when the log page
* containing their COMMIT record is paged out as it becomes full;
*
* serialization:
* . a per log lock serializes log write.
* . a per log lock serializes group commit.
* . a per log lock serializes log open/close;
*
* TBD log integrity:
* careful-write (ping-pong) of last logpage to recover from crash
* in overwrite.
* detection of split (out-of-order) write of physical sectors
* of last logpage via timestamp at end of each sector
* (with its mirror data array at the trailer).
*
* alternatives:
* lsn - 64-bit monotonically increasing integer vs
* 32-bit lspn and page eor.
*/
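/*
 * Distances between log sequence numbers are taken throughout this
 * file with logdiff() from jfs_logmgr.h.  A sketch of its likely shape
 * (an assumption here; the real macro lives in that header): the byte
 * distance from the last sync point, wrapped around the circular log:
 *
 *   #define logdiff(diff, lsn, log) \
 *   { \
 *           (diff) = (lsn) - (log)->syncpt; \
 *           if ((diff) < 0) \
 *                   (diff) += (log)->logsize; \
 *   }
 */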
#include <linux/fs.h>
#include <linux/locks.h>
#include <linux/slab.h>
#include <linux/blkdev.h>
#include <linux/interrupt.h>
#include <linux/smp_lock.h>
#include "jfs_incore.h"
#include "jfs_filsys.h"
#include "jfs_metapage.h"
#include "jfs_txnmgr.h"
#include "jfs_debug.h"
/*
* lbuf's ready to be redriven. Protected by log_redrive_lock (jfsIOtask)
*/
static lbuf_t *log_redrive_list;
static spinlock_t log_redrive_lock = SPIN_LOCK_UNLOCKED;
/*
* log read/write serialization (per log)
*/
#define LOG_LOCK_INIT(log) init_MUTEX(&(log)->loglock)
#define LOG_LOCK(log) down(&((log)->loglock))
#define LOG_UNLOCK(log) up(&((log)->loglock))
/*
* log group commit serialization (per log)
*/
#define LOGGC_LOCK_INIT(log) spin_lock_init(&(log)->gclock)
#define LOGGC_LOCK(log) spin_lock_irq(&(log)->gclock)
#define LOGGC_UNLOCK(log) spin_unlock_irq(&(log)->gclock)
#define LOGGC_WAKEUP(tblk) wake_up(&(tblk)->gcwait)
/*
* log sync serialization (per log)
*/
#define LOGSYNC_DELTA(logsize) min((logsize)/8, 128*LOGPSIZE)
#define LOGSYNC_BARRIER(logsize) ((logsize)/4)
/*
#define LOGSYNC_DELTA(logsize) min((logsize)/4, 256*LOGPSIZE)
#define LOGSYNC_BARRIER(logsize) ((logsize)/2)
*/
/*
* log buffer cache synchronization
*/
static spinlock_t jfsLCacheLock = SPIN_LOCK_UNLOCKED;
#define LCACHE_LOCK(flags) spin_lock_irqsave(&jfsLCacheLock, flags)
#define LCACHE_UNLOCK(flags) spin_unlock_irqrestore(&jfsLCacheLock, flags)
/*
* See __SLEEP_COND in jfs_lock.h
*/
#define LCACHE_SLEEP_COND(wq, cond, flags) \
do { \
if (cond) \
break; \
__SLEEP_COND(wq, cond, LCACHE_LOCK(flags), LCACHE_UNLOCK(flags)); \
} while (0)
#define LCACHE_WAKEUP(event) wake_up(event)
/*
* lbuf buffer cache (lCache) control
*/
/* log buffer manager pageout control (cumulative, inclusive) */
#define lbmREAD 0x0001
#define lbmWRITE 0x0002 /* enqueue at tail of write queue;
* init pageout if at head of queue;
*/
#define lbmRELEASE 0x0004 /* remove from write queue
* at completion of pageout;
* do not free/recycle it yet:
* caller will free it;
*/
#define lbmSYNC 0x0008 /* do not return to freelist
* when removed from write queue;
*/
#define lbmFREE 0x0010 /* return to freelist
* at completion of pageout;
* the buffer may be recycled;
*/
#define lbmDONE 0x0020
#define lbmERROR 0x0040
#define lbmGC 0x0080 /* lbmIODone to perform post-GC processing
* of log page
*/
#define lbmDIRECT 0x0100
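/*
 * The pageout flags above compose; a quick key to the combinations
 * used below:
 *
 *   lbmWRITE | lbmSYNC                  write and wait (lmLogInit)
 *   lbmWRITE | lbmRELEASE | lbmFREE     full page: write, dequeue and
 *                                       recycle the buffer
 *   lbmWRITE [| lbmRELEASE] | lbmGC     group-commit page; lbmIODone
 *                                       finishes it via lmPostGC()
 *   lbmWRITE | lbmRELEASE | lbmSYNC     sidestream superblock write
 *                                       (lbmDirectWrite)
 */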
/*
* external references
*/
extern void vPut(struct inode *ip);
extern void txLazyUnlock(tblock_t * tblk);
extern int jfs_thread_stopped(void);
extern struct task_struct *jfsIOtask;
extern struct completion jfsIOwait;
/*
* forward references
*/
static int lmWriteRecord(log_t * log, tblock_t * tblk, lrd_t * lrd,
tlock_t * tlck);
static int lmNextPage(log_t * log);
static int lmLogInit(log_t * log);
static int lmLogShutdown(log_t * log);
static int lbmLogInit(log_t * log);
static void lbmLogShutdown(log_t * log);
static lbuf_t *lbmAllocate(log_t * log, int);
static void lbmFree(lbuf_t * bp);
static void lbmfree(lbuf_t * bp);
static int lbmRead(log_t * log, int pn, lbuf_t ** bpp);
static void lbmWrite(log_t * log, lbuf_t * bp, int flag, int cant_block);
static void lbmDirectWrite(log_t * log, lbuf_t * bp, int flag);
static int lbmIOWait(lbuf_t * bp, int flag);
static int lbmIODone(struct bio *bio, int);
#ifdef _STILL_TO_PORT
static void lbmDirectIODone(iobuf_t * ddbp);
#endif /* _STILL_TO_PORT */
void lbmStartIO(lbuf_t * bp);
void lmGCwrite(log_t * log, int cant_block);
/*
* statistics
*/
#ifdef CONFIG_JFS_STATISTICS
struct lmStat {
uint commit; /* # of commit */
uint pagedone; /* # of page written */
uint submitted; /* # of pages submitted */
} lmStat;
#endif
/*
* NAME: lmLog()
*
* FUNCTION: write a log record;
*
* PARAMETER: log - log to write the record to
* tblk - transaction block (NULL for out-of-transaction writes)
* lrd - log record descriptor
* tlck - transaction lock (NULL if there is no log data)
*
* RETURN: lsn - offset to the next log record to write (end-of-log);
* -1 - error;
*
* note: todo: log error handler
*/
int lmLog(log_t * log, tblock_t * tblk, lrd_t * lrd, tlock_t * tlck)
{
int lsn;
int diffp, difft;
metapage_t *mp = NULL;
jFYI(1, ("lmLog: log:0x%p tblk:0x%p, lrd:0x%p tlck:0x%p\n",
log, tblk, lrd, tlck));
LOG_LOCK(log);
/* log by (out-of-transaction) JFS ? */
if (tblk == NULL)
goto writeRecord;
/* log from page ? */
if (tlck == NULL ||
tlck->type & tlckBTROOT || (mp = tlck->mp) == NULL)
goto writeRecord;
/*
* initialize/update page/transaction recovery lsn
*/
lsn = log->lsn;
LOGSYNC_LOCK(log);
/*
* initialize page lsn if first log write of the page
*/
if (mp->lsn == 0) {
mp->log = log;
mp->lsn = lsn;
log->count++;
/* insert page at tail of logsynclist */
list_add_tail(&mp->synclist, &log->synclist);
}
/*
* initialize/update lsn of tblock of the page
*
* transaction inherits oldest lsn of pages associated
* with allocation/deallocation of resources (their
* log records are used to reconstruct allocation map
* at recovery time: inode for inode allocation map,
* B+-tree index of extent descriptors for block
* allocation map);
* allocation map pages inherit transaction lsn at
* commit time to allow forwarding log syncpt past log
* records associated with allocation/deallocation of
* resources only after persistent map of these map pages
* have been updated and propagated to home.
*/
/*
* initialize transaction lsn:
*/
if (tblk->lsn == 0) {
/* inherit lsn of its first page logged */
tblk->lsn = mp->lsn;
log->count++;
/* insert tblock after the page on logsynclist */
list_add(&tblk->synclist, &mp->synclist);
}
/*
* update transaction lsn:
*/
else {
/* inherit oldest/smallest lsn of page */
logdiff(diffp, mp->lsn, log);
logdiff(difft, tblk->lsn, log);
if (diffp < difft) {
/* update tblock lsn with page lsn */
tblk->lsn = mp->lsn;
/* move tblock after page on logsynclist */
list_del(&tblk->synclist);
list_add(&tblk->synclist, &mp->synclist);
}
}
LOGSYNC_UNLOCK(log);
/*
* write the log record
*/
writeRecord:
lsn = lmWriteRecord(log, tblk, lrd, tlck);
/*
* forward log syncpt if log reached next syncpt trigger
*/
logdiff(diffp, lsn, log);
if (diffp >= log->nextsync)
lsn = lmLogSync(log, 0);
/* update end-of-log lsn */
log->lsn = lsn;
LOG_UNLOCK(log);
/* return end-of-log address */
return lsn;
}
/*
* NAME: lmWriteRecord()
*
* FUNCTION: move the log record to current log page
*
* PARAMETER: log - log
* tblk - transaction block (may be NULL)
* lrd - log record descriptor
* tlck - transaction lock (NULL to move only the lrd)
*
* RETURN: end-of-log address
*
* serialization: LOG_LOCK() held on entry/exit
*/
static int
lmWriteRecord(log_t * log, tblock_t * tblk, lrd_t * lrd, tlock_t * tlck)
{
int lsn = 0; /* end-of-log address */
lbuf_t *bp; /* dst log page buffer */
logpage_t *lp; /* dst log page */
caddr_t dst; /* destination address in log page */
int dstoffset; /* end-of-log offset in log page */
int freespace; /* free space in log page */
caddr_t p; /* src meta-data page */
caddr_t src;
int srclen;
int nbytes; /* number of bytes to move */
int i;
int len;
linelock_t *linelock;
lv_t *lv;
lvd_t *lvd;
int l2linesize;
len = 0;
/* retrieve destination log page to write */
bp = (lbuf_t *) log->bp;
lp = (logpage_t *) bp->l_ldata;
dstoffset = log->eor;
/* any log data to write ? */
if (tlck == NULL)
goto moveLrd;
/*
* move log record data
*/
/* retrieve source meta-data page to log */
if (tlck->flag & tlckPAGELOCK) {
p = (caddr_t) (tlck->mp->data);
linelock = (linelock_t *) & tlck->lock;
}
/* retrieve source in-memory inode to log */
else if (tlck->flag & tlckINODELOCK) {
if (tlck->type & tlckDTREE)
p = (caddr_t) &JFS_IP(tlck->ip)->i_dtroot;
else
p = (caddr_t) &JFS_IP(tlck->ip)->i_xtroot;
linelock = (linelock_t *) & tlck->lock;
}
#ifdef _JFS_WIP
else if (tlck->flag & tlckINLINELOCK) {
inlinelock = (inlinelock_t *) & tlck;
p = (caddr_t) & inlinelock->pxd;
linelock = (linelock_t *) & tlck;
}
#endif /* _JFS_WIP */
else {
jERROR(2, ("lmWriteRecord: UFO tlck:0x%p\n", tlck));
return 0; /* Probably should trap */
}
l2linesize = linelock->l2linesize;
moveData:
ASSERT(linelock->index <= linelock->maxcnt);
lv = (lv_t *) & linelock->lv;
for (i = 0; i < linelock->index; i++, lv++) {
if (lv->length == 0)
continue;
/* is page full ? */
if (dstoffset >= LOGPSIZE - LOGPTLRSIZE) {
/* page is full: move on to next page */
lmNextPage(log);
bp = log->bp;
lp = (logpage_t *) bp->l_ldata;
dstoffset = LOGPHDRSIZE;
}
/*
* move log vector data
*/
src = (u8 *) p + (lv->offset << l2linesize);
srclen = lv->length << l2linesize;
len += srclen;
while (srclen > 0) {
freespace = (LOGPSIZE - LOGPTLRSIZE) - dstoffset;
nbytes = min(freespace, srclen);
dst = (caddr_t) lp + dstoffset;
memcpy(dst, src, nbytes);
dstoffset += nbytes;
/* is page not full ? */
if (dstoffset < LOGPSIZE - LOGPTLRSIZE)
break;
/* page is full: move on to next page */
lmNextPage(log);
bp = (lbuf_t *) log->bp;
lp = (logpage_t *) bp->l_ldata;
dstoffset = LOGPHDRSIZE;
srclen -= nbytes;
src += nbytes;
}
/*
* move log vector descriptor
*/
len += 4;
lvd = (lvd_t *) ((caddr_t) lp + dstoffset);
lvd->offset = cpu_to_le16(lv->offset);
lvd->length = cpu_to_le16(lv->length);
dstoffset += 4;
jFYI(1,
("lmWriteRecord: lv offset:%d length:%d\n",
lv->offset, lv->length));
}
if ((i = linelock->next)) {
linelock = (linelock_t *) lid_to_tlock(i);
goto moveData;
}
/*
* move log record descriptor
*/
moveLrd:
lrd->length = cpu_to_le16(len);
src = (caddr_t) lrd;
srclen = LOGRDSIZE;
while (srclen > 0) {
freespace = (LOGPSIZE - LOGPTLRSIZE) - dstoffset;
nbytes = min(freespace, srclen);
dst = (caddr_t) lp + dstoffset;
memcpy(dst, src, nbytes);
dstoffset += nbytes;
srclen -= nbytes;
/* are there more to move than freespace of page ? */
if (srclen)
goto pageFull;
/*
* end of log record descriptor
*/
/* update last log record eor */
log->eor = dstoffset;
bp->l_eor = dstoffset;
lsn = (log->page << L2LOGPSIZE) + dstoffset;
if (lrd->type & cpu_to_le16(LOG_COMMIT)) {
tblk->clsn = lsn;
jFYI(1,
("wr: tclsn:0x%x, beor:0x%x\n", tblk->clsn,
bp->l_eor));
INCREMENT(lmStat.commit); /* # of commit */
/*
* enqueue tblock for group commit:
*
* enqueue tblock of non-trivial/synchronous COMMIT
* at tail of group commit queue
* (trivial/asynchronous COMMITs are ignored by
* group commit.)
*/
LOGGC_LOCK(log);
/* init tblock gc state */
tblk->flag = tblkGC_QUEUE;
tblk->bp = log->bp;
tblk->pn = log->page;
tblk->eor = log->eor;
init_waitqueue_head(&tblk->gcwait);
/* enqueue transaction to commit queue */
tblk->cqnext = NULL;
if (log->cqueue.head) {
log->cqueue.tail->cqnext = tblk;
log->cqueue.tail = tblk;
} else
log->cqueue.head = log->cqueue.tail = tblk;
LOGGC_UNLOCK(log);
}
jFYI(1,
("lmWriteRecord: lrd:0x%04x bp:0x%p pn:%d eor:0x%x\n",
le16_to_cpu(lrd->type), log->bp, log->page,
dstoffset));
/* page not full ? */
if (dstoffset < LOGPSIZE - LOGPTLRSIZE)
return lsn;
pageFull:
/* page is full: move on to next page */
lmNextPage(log);
bp = (lbuf_t *) log->bp;
lp = (logpage_t *) bp->l_ldata;
dstoffset = LOGPHDRSIZE;
src += nbytes;
}
return lsn;
}
/*
* NAME: lmNextPage()
*
* FUNCTION: write current page and allocate next page.
*
* PARAMETER: log
*
* RETURN: 0
*
* serialization: LOG_LOCK() held on entry/exit
*/
static int lmNextPage(log_t * log)
{
logpage_t *lp;
int lspn; /* log sequence page number */
int pn; /* current page number */
lbuf_t *bp;
lbuf_t *nextbp;
tblock_t *tblk;
jFYI(1, ("lmNextPage\n"));
/* get current log page number and log sequence page number */
pn = log->page;
bp = log->bp;
lp = (logpage_t *) bp->l_ldata;
lspn = le32_to_cpu(lp->h.page);
LOGGC_LOCK(log);
/*
* write or queue the full page at the tail of write queue
*/
/* get the tail tblk on commit queue */
tblk = log->cqueue.tail;
/* every tblk that has a COMMIT record on the current page,
* and has not been committed, must be on the commit queue,
* since a tblk is queued on the commit queue at the time
* its COMMIT record is written to the page, before the
* page becomes full (even if the tblk thread that wrote
* the COMMIT record is currently suspended);
*/
/* is page bound with outstanding tail tblk ? */
if (tblk && tblk->pn == pn) {
/* mark tblk for end-of-page */
tblk->flag |= tblkGC_EOP;
/* if page is not already on write queue,
* just enqueue (no lbmWRITE to prevent redrive)
* buffer to wqueue to ensure correct serial order
* of the pages since log pages will be added
* continuously (tblk bound with the page hasn't
* got around to init write of the page, either
* preempted or the page got filled by its COMMIT
* record);
* pages with COMMIT are paged out explicitly by
* tblk in lmGroupCommit();
*/
if (bp->l_wqnext == NULL) {
/* bp->l_ceor = bp->l_eor; */
/* lp->h.eor = lp->t.eor = bp->l_ceor; */
lbmWrite(log, bp, 0, 0);
}
}
/* page is not bound with outstanding tblk:
* init write or mark it to be redriven (lbmWRITE)
*/
else {
/* finalize the page */
bp->l_ceor = bp->l_eor;
lp->h.eor = lp->t.eor = cpu_to_le16(bp->l_ceor);
lbmWrite(log, bp, lbmWRITE | lbmRELEASE | lbmFREE, 0);
}
LOGGC_UNLOCK(log);
/*
* allocate/initialize next page
*/
/* if log wraps, the first data page of log is 2
* (0 never used, 1 is superblock).
*/
log->page = (pn == log->size - 1) ? 2 : pn + 1;
log->eor = LOGPHDRSIZE; /* ? valid page empty/full at logRedo() */
/* allocate/initialize next log page buffer */
nextbp = lbmAllocate(log, log->page);
nextbp->l_eor = log->eor;
log->bp = nextbp;
/* initialize next log page */
lp = (logpage_t *) nextbp->l_ldata;
lp->h.page = lp->t.page = cpu_to_le32(lspn + 1);
lp->h.eor = lp->t.eor = cpu_to_le16(LOGPHDRSIZE);
jFYI(1, ("lmNextPage done\n"));
return 0;
}
/*
* NAME: lmGroupCommit()
*
* FUNCTION: group commit
* initiate pageout of the pages with COMMIT in the order of
* page number - redrive pageout of the page at the head of
* pageout queue until full page has been written.
*
* RETURN:
*
* NOTE:
* LOGGC_LOCK serializes log group commit queue, and
* transaction blocks on the commit queue.
* N.B. LOG_LOCK is NOT held during lmGroupCommit().
*/
int lmGroupCommit(log_t * log, tblock_t * tblk)
{
int rc = 0;
LOGGC_LOCK(log);
/* group committed already ? */
if (tblk->flag & tblkGC_COMMITTED) {
if (tblk->flag & tblkGC_ERROR)
rc = EIO;
LOGGC_UNLOCK(log);
return rc;
}
jFYI(1,
("lmGroup Commit: tblk = 0x%p, gcrtc = %d\n", tblk,
log->gcrtc));
/*
* no group commit pageout is in progress: start one
*/
if ((!(log->cflag & logGC_PAGEOUT)) && log->cqueue.head) {
/*
* only transaction in the commit queue:
*
* start one-transaction group commit as
* its group leader.
*/
log->cflag |= logGC_PAGEOUT;
lmGCwrite(log, 0);
}
/* lmGCwrite gives up LOGGC_LOCK, check again */
if (tblk->flag & tblkGC_COMMITTED) {
if (tblk->flag & tblkGC_ERROR)
rc = EIO;
LOGGC_UNLOCK(log);
return rc;
}
/* upcount transaction waiting for completion
*/
log->gcrtc++;
if (tblk->xflag & COMMIT_LAZY) {
tblk->flag |= tblkGC_LAZY;
LOGGC_UNLOCK(log);
return 0;
}
tblk->flag |= tblkGC_READY;
__SLEEP_COND(tblk->gcwait, (tblk->flag & tblkGC_COMMITTED),
LOGGC_LOCK(log), LOGGC_UNLOCK(log));
/* removed from commit queue */
if (tblk->flag & tblkGC_ERROR)
rc = EIO;
LOGGC_UNLOCK(log);
return rc;
}
/*
* NAME: lmGCwrite()
*
* FUNCTION: group commit write
* initiate write of log page, building a group of all transactions
* with commit records on that page.
*
* RETURN: None
*
* NOTE:
* LOGGC_LOCK must be held by caller.
* N.B. LOG_LOCK is NOT held during lmGroupCommit().
*/
void lmGCwrite(log_t * log, int cant_write)
{
lbuf_t *bp;
logpage_t *lp;
int gcpn; /* group commit page number */
tblock_t *tblk;
tblock_t *xtblk;
/*
* build the commit group of a log page
*
* scan commit queue and make a commit group of all
* transactions with COMMIT records on the same log page.
*/
/* get the head tblk on the commit queue */
tblk = xtblk = log->cqueue.head;
gcpn = tblk->pn;
while (tblk && tblk->pn == gcpn) {
xtblk = tblk;
/* state transition: (QUEUE, READY) -> COMMIT */
tblk->flag |= tblkGC_COMMIT;
tblk = tblk->cqnext;
}
tblk = xtblk; /* last tblk of the page */
/*
* pageout to commit transactions on the log page.
*/
bp = (lbuf_t *) tblk->bp;
lp = (logpage_t *) bp->l_ldata;
/* is page already full ? */
if (tblk->flag & tblkGC_EOP) {
/* mark page to free at end of group commit of the page */
tblk->flag &= ~tblkGC_EOP;
tblk->flag |= tblkGC_FREE;
bp->l_ceor = bp->l_eor;
lp->h.eor = lp->t.eor = cpu_to_le16(bp->l_ceor);
jEVENT(0,
("gc: tclsn:0x%x, bceor:0x%x\n", tblk->clsn,
bp->l_ceor));
lbmWrite(log, bp, lbmWRITE | lbmRELEASE | lbmGC,
cant_write);
}
/* page is not yet full */
else {
bp->l_ceor = tblk->eor; /* ? bp->l_ceor = bp->l_eor; */
lp->h.eor = lp->t.eor = cpu_to_le16(bp->l_ceor);
jEVENT(0,
("gc: tclsn:0x%x, bceor:0x%x\n", tblk->clsn,
bp->l_ceor));
lbmWrite(log, bp, lbmWRITE | lbmGC, cant_write);
}
}
/*
* NAME: lmPostGC()
*
* FUNCTION: group commit post-processing
* Processes transactions after their commit records have been written
* to disk, redriving log I/O if necessary.
*
* RETURN: None
*
* NOTE:
* This routine is called at interrupt time by lbmIODone
*/
void lmPostGC(lbuf_t * bp)
{
unsigned long flags;
log_t *log = bp->l_log;
logpage_t *lp;
tblock_t *tblk;
//LOGGC_LOCK(log);
spin_lock_irqsave(&log->gclock, flags);
/*
* current pageout of group commit completed.
*
* remove/wakeup transactions from commit queue who were
* group committed with the current log page
*/
while ((tblk = log->cqueue.head) && (tblk->flag & tblkGC_COMMIT)) {
/* if transaction was marked GC_COMMIT then
* it has been shipped in the current pageout
* and made it to disk - it is committed.
*/
if (bp->l_flag & lbmERROR)
tblk->flag |= tblkGC_ERROR;
/* remove it from the commit queue */
log->cqueue.head = tblk->cqnext;
if (log->cqueue.head == NULL)
log->cqueue.tail = NULL;
tblk->flag &= ~tblkGC_QUEUE;
tblk->cqnext = 0;
jEVENT(0,
("lmPostGC: tblk = 0x%p, flag = 0x%x\n", tblk,
tblk->flag));
if (!(tblk->xflag & COMMIT_FORCE))
/*
* Hand tblk over to lazy commit thread
*/
txLazyUnlock(tblk);
else {
/* state transition: COMMIT -> COMMITTED */
tblk->flag |= tblkGC_COMMITTED;
if (tblk->flag & tblkGC_READY) {
log->gcrtc--;
LOGGC_WAKEUP(tblk);
}
}
/* was page full before pageout ?
* (and this is the last tblk bound with the page)
*/
if (tblk->flag & tblkGC_FREE)
lbmFree(bp);
/* did page become full after pageout ?
* (and this is the last tblk bound with the page)
*/
else if (tblk->flag & tblkGC_EOP) {
/* finalize the page */
lp = (logpage_t *) bp->l_ldata;
bp->l_ceor = bp->l_eor;
lp->h.eor = lp->t.eor = cpu_to_le16(bp->l_eor);
jEVENT(0, ("lmPostGC: calling lbmWrite\n"));
lbmWrite(log, bp, lbmWRITE | lbmRELEASE | lbmFREE,
1);
}
}
/* are there any transactions that have entered lmGroupCommit()
* (whose COMMITs are after that of the last log page written)?
* They are waiting for a new group commit (above at (SLEEP 1)):
* select the latest ready transaction as the new group leader
* and wake it up to lead its group.
*/
if ((log->gcrtc > 0) && log->cqueue.head)
/*
* Call lmGCwrite with new group leader
*/
lmGCwrite(log, 1);
/* no transactions are ready yet (transactions are only just
* queued (GC_QUEUE) and have not entered group commit yet).
* The first transaction entering group commit
* will elect itself as the new group leader.
*/
else
log->cflag &= ~logGC_PAGEOUT;
//LOGGC_UNLOCK(log);
spin_unlock_irqrestore(&log->gclock, flags);
return;
}
/*
* NAME: lmLogSync()
*
* FUNCTION: write a log SYNCPT record for the specified log
* if a new sync address is available
* (normally the case if sync() is executed by a background
* process).
* if not, explicitly run jfs_blogsync() to initiate
* getting a new sync address.
* calculate the new value of nextsync, which determines when
* this code is called again.
*
* this is called only from lmLog().
*
* PARAMETER: log - log structure
* nosyncwait - 1 if called from outside a transaction
* (e.g., sync()); skip the syncbarrier check
*
* RETURN: lsn - end-of-log address
*
* serialization: LOG_LOCK() held on entry/exit
*/
int lmLogSync(log_t * log, int nosyncwait)
{
int logsize;
int written; /* written since last syncpt */
int free; /* free space left available */
int delta; /* additional delta to write normally */
int more; /* additional write granted */
lrd_t lrd;
int lsn;
struct logsyncblk *lp;
/*
* forward syncpt
*/
/* if last sync is same as last syncpt,
* invoke sync point forward processing to update sync.
*/
if (log->sync == log->syncpt) {
LOGSYNC_LOCK(log);
/* ToDo: push dirty metapages out to disk */
// bmLogSync(log);
if (list_empty(&log->synclist))
log->sync = log->lsn;
else {
lp = list_entry(log->synclist.next,
struct logsyncblk, synclist);
log->sync = lp->lsn;
}
LOGSYNC_UNLOCK(log);
}
/* if sync is different from last syncpt,
* write a SYNCPT record with syncpt = sync.
* reset syncpt = sync
*/
if (log->sync != log->syncpt) {
struct jfs_sb_info *sbi = JFS_SBI(log->sb);
/*
* We need to make sure all of the "written" metapages
* actually make it to disk
*/
fsync_inode_data_buffers(sbi->ipbmap);
fsync_inode_data_buffers(sbi->ipimap);
fsync_inode_data_buffers(sbi->direct_inode);
lrd.logtid = 0;
lrd.backchain = 0;
lrd.type = cpu_to_le16(LOG_SYNCPT);
lrd.length = 0;
lrd.log.syncpt.sync = cpu_to_le32(log->sync);
lsn = lmWriteRecord(log, NULL, &lrd, NULL);
log->syncpt = log->sync;
} else
lsn = log->lsn;
/*
* setup next syncpt trigger (SWAG)
*/
logsize = log->logsize;
logdiff(written, lsn, log);
free = logsize - written;
delta = LOGSYNC_DELTA(logsize);
more = min(free / 2, delta);
if (more < 2 * LOGPSIZE) {
jEVENT(1,
("\n ... Log Wrap ... Log Wrap ... Log Wrap ...\n\n"));
/*
* log wrapping
*
* option 1 - panic? No!
* option 2 - shutdown file systems
* associated with log ?
* option 3 - extend log ?
*/
/*
* option 4 - second chance
*
* mark log wrapped, and continue.
* when all active transactions are completed,
* mark log valid for recovery.
* if crashed during the invalid state, the log state
* implies an invalid log, forcing fsck().
*/
/* mark log state log wrap in log superblock */
/* log->state = LOGWRAP; */
/* reset sync point computation */
log->syncpt = log->sync = lsn;
log->nextsync = delta;
} else
/* next syncpt trigger = written + more */
log->nextsync = written + more;
/* return if lmLogSync() from outside of transaction, e.g., sync() */
if (nosyncwait)
return lsn;
/* if number of bytes written from last sync point is more
* than 1/4 of the log size, stop new transactions from
* starting until all current transactions are completed
* by setting syncbarrier flag.
*/
if (written > LOGSYNC_BARRIER(logsize) && logsize > 32 * LOGPSIZE) {
log->syncbarrier = 1;
jFYI(1, ("log barrier on: lsn=0x%x syncpt=0x%x\n", lsn,
log->syncpt));
}
return lsn;
}
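/*
 * Worked example of the trigger arithmetic above (a sketch, with the
 * usual 4K LOGPSIZE): for a 32MB log, delta = min(logsize/8,
 * 128*LOGPSIZE) = 512KB.  With 1MB written since the last syncpt,
 * free = 31MB and more = min(free/2, delta) = 512KB, so
 * nextsync = 1.5MB: the next SYNCPT is cut after another 512KB of
 * log records.
 */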
/*
* NAME: lmLogOpen()
*
* FUNCTION: open the log on first open;
* insert filesystem in the active list of the log.
*
* PARAMETER: sb - superblock of the file system
* logptr - pointer to the returned log structure (out)
*
* RETURN: 0 - success; ENOMEM or errors from lmLogInit()
*
* serialization:
*/
int lmLogOpen(struct super_block *sb, log_t ** logptr)
{
int rc;
kdev_t logdev; /* dev_t of log device */
log_t *log;
logdev = sb->s_dev;
#ifdef _STILL_TO_PORT
/*
* open the inode representing the log device (aka log inode)
*/
if (logdev != fsdev)
goto externalLog;
#endif /* _STILL_TO_PORT */
/*
* in-line log in host file system
*
* file system and log have a 1-to-1 relationship;
*/
// inlineLog:
*logptr = log = kmalloc(sizeof(log_t), GFP_KERNEL);
if (log == 0)
return ENOMEM;
memset(log, 0, sizeof(log_t));
log->sb = sb; /* This should be a list */
log->flag = JFS_INLINELOG;
log->dev = logdev;
log->base = addressPXD(&JFS_SBI(sb)->logpxd);
log->size = lengthPXD(&JFS_SBI(sb)->logpxd) >>
(L2LOGPSIZE - sb->s_blocksize_bits);
log->l2bsize = sb->s_blocksize_bits;
ASSERT(L2LOGPSIZE >= sb->s_blocksize_bits);
/*
* initialize log.
*/
if ((rc = lmLogInit(log)))
goto errout10;
#ifdef _STILL_TO_PORT
goto out;
/*
* external log as separate logical volume
*
* file systems and log may have an n-to-1 relationship;
*/
externalLog:
/*
* open log inode
*
* log inode is reserved inode of (dev_t = log device,
* fileset number = 0, i_number = 0), which acquire
* one i_count for each open by file system.
*
* hand craft dummy vfs to force iget() the special case of
* an in-memory inode allocation without on-disk inode
*/
memset(&dummyvfs, 0, sizeof(struct vfs));
dummyvfs.filesetvfs.vfs_data = NULL;
dummyvfs.dummyvfs.dev = logdev;
dummyvfs.dummyvfs.ipmnt = NULL;
ICACHE_LOCK();
rc = iget((struct vfs *) &dummyvfs, 0, (inode_t **) & log, 0);
ICACHE_UNLOCK();
if (rc)
return rc;
log->flag = 0;
log->dev = logdev;
log->base = 0;
log->size = 0;
/*
* serialize open/close between multiple file systems
* bound with the log;
*/
ip = (inode_t *) log;
IWRITE_LOCK(ip);
/*
* subsequent open: add file system to log active file system list
*/
#ifdef _JFS_OS2
if (log->strat2p)
#endif /* _JFS_OS2 */
{
if (rc = lmLogFileSystem(log, fsdev, 1))
goto errout10;
IWRITE_UNLOCK(ip);
*iplog = ip;
jFYI(1, ("lmLogOpen: exit(0)\n"));
return 0;
}
/* decouple log inode from dummy vfs */
vPut(ip);
/*
* first open:
*/
#ifdef _JFS_OS2
/*
* establish access to the single/shared (already open) log device
*/
logdevfp = (void *) logStrat2;
log->strat2p = logStrat2;
log->strat3p = logStrat3;
log->l2pbsize = 9; /* todo: when OS/2 have multiple external log */
#endif /* _JFS_OS2 */
/*
* initialize log:
*/
if (rc = lmLogInit(log))
goto errout20;
/*
* add file system to log active file system list
*/
if (rc = lmLogFileSystem(log, fsdev, 1))
goto errout30;
/*
* insert log device into log device list
*/
out:
#endif /* _STILL_TO_PORT */
jFYI(1, ("lmLogOpen: exit(0)\n"));
return 0;
/*
* unwind on error
*/
#ifdef _STILL_TO_PORT
errout30: /* unwind lbmLogInit() */
lbmLogShutdown(log);
errout20: /* close external log device */
#endif /* _STILL_TO_PORT */
errout10: /* free log inode */
kfree(log);
jFYI(1, ("lmLogOpen: exit(%d)\n", rc));
return rc;
}
/*
* NAME: lmLogInit()
*
* FUNCTION: log initialization at first log open.
*
* logredo() (or logformat()) should have been run previously.
* initialize the log inode from log superblock.
* set the log state in the superblock to LOGMOUNT and
* write SYNCPT log record.
*
* PARAMETER: log - log structure
*
* RETURN: 0 - if ok
* EINVAL - bad log magic number or superblock dirty
* error returned from logwait()
*
* serialization: single first open thread
*/
static int lmLogInit(log_t * log)
{
int rc = 0;
lrd_t lrd;
logsuper_t *logsuper;
lbuf_t *bpsuper;
lbuf_t *bp;
logpage_t *lp;
int lsn;
jFYI(1, ("lmLogInit: log:0x%p\n", log));
/*
* log inode is overlaid on generic inode where
* dinode has been zeroed out by iRead();
*/
/*
* initialize log i/o
*/
if ((rc = lbmLogInit(log)))
return rc;
/*
* validate log superblock
*/
if ((rc = lbmRead(log, 1, &bpsuper)))
goto errout10;
logsuper = (logsuper_t *) bpsuper->l_ldata;
if (logsuper->magic != cpu_to_le32(LOGMAGIC)) {
jERROR(1, ("*** Log Format Error ! ***\n"));
rc = EINVAL;
goto errout20;
}
/* logredo() should have been run successfully. */
if (logsuper->state != cpu_to_le32(LOGREDONE)) {
jERROR(1, ("*** Log Is Dirty ! ***\n"));
rc = EINVAL;
goto errout20;
}
/* initialize log inode from log superblock */
if (log->flag & JFS_INLINELOG) {
if (log->size != le32_to_cpu(logsuper->size)) {
rc = EINVAL;
goto errout20;
}
jFYI(0,
("lmLogInit: inline log:0x%p base:0x%Lx size:0x%x\n",
log, (unsigned long long) log->base, log->size));
} else {
log->size = le32_to_cpu(logsuper->size);
jFYI(0,
("lmLogInit: external log:0x%p base:0x%Lx size:0x%x\n",
log, (unsigned long long) log->base, log->size));
}
log->flag |= JFS_GROUPCOMMIT;
/*
log->flag |= JFS_LAZYCOMMIT;
*/
log->page = le32_to_cpu(logsuper->end) / LOGPSIZE;
log->eor = le32_to_cpu(logsuper->end) - (LOGPSIZE * log->page);
/*
* initialize for log append write mode
*/
/* establish current/end-of-log page/buffer */
if ((rc = lbmRead(log, log->page, &bp)))
goto errout20;
lp = (logpage_t *) bp->l_ldata;
jFYI(1, ("lmLogInit: lsn:0x%x page:%d eor:%d:%d\n",
le32_to_cpu(logsuper->end), log->page, log->eor,
le16_to_cpu(lp->h.eor)));
// ASSERT(log->eor == lp->h.eor);
log->bp = bp;
bp->l_pn = log->page;
bp->l_eor = log->eor;
/* initialize the group commit serialization lock */
LOGGC_LOCK_INIT(log);
/* if current page is full, move on to next page */
if (log->eor >= LOGPSIZE - LOGPTLRSIZE)
lmNextPage(log);
/* allocate/initialize the log write serialization lock */
LOG_LOCK_INIT(log);
/*
* initialize log syncpoint
*/
/*
* write the first SYNCPT record with syncpoint = 0
* (i.e., log redo up to HERE !);
* remove current page from lbm write queue at end of pageout
* (to write log superblock update), but do not release to freelist;
*/
lrd.logtid = 0;
lrd.backchain = 0;
lrd.type = cpu_to_le16(LOG_SYNCPT);
lrd.length = 0;
lrd.log.syncpt.sync = 0;
lsn = lmWriteRecord(log, NULL, &lrd, NULL);
bp = log->bp;
bp->l_ceor = bp->l_eor;
lp = (logpage_t *) bp->l_ldata;
lp->h.eor = lp->t.eor = cpu_to_le16(bp->l_eor);
lbmWrite(log, bp, lbmWRITE | lbmSYNC, 0);
if ((rc = lbmIOWait(bp, 0)))
goto errout30;
/* initialize logsync parameters */
log->logsize = (log->size - 2) << L2LOGPSIZE;
log->lsn = lsn;
log->syncpt = lsn;
log->sync = log->syncpt;
log->nextsync = LOGSYNC_DELTA(log->logsize);
init_waitqueue_head(&log->syncwait);
jFYI(1, ("lmLogInit: lsn:0x%x syncpt:0x%x sync:0x%x\n",
log->lsn, log->syncpt, log->sync));
LOGSYNC_LOCK_INIT(log);
INIT_LIST_HEAD(&log->synclist);
log->cqueue.head = log->cqueue.tail = 0;
log->count = 0;
/*
* initialize for lazy/group commit
*/
log->clsn = lsn;
/*
* update/write superblock
*/
logsuper->state = cpu_to_le32(LOGMOUNT);
log->serial = le32_to_cpu(logsuper->serial) + 1;
logsuper->serial = cpu_to_le32(log->serial);
lbmDirectWrite(log, bpsuper, lbmWRITE | lbmRELEASE | lbmSYNC);
if ((rc = lbmIOWait(bpsuper, lbmFREE)))
goto errout30;
jFYI(1, ("lmLogInit: exit(%d)\n", rc));
return 0;
/*
* unwind on error
*/
errout30: /* release log page */
lbmFree(bp);
errout20: /* release log superblock */
lbmFree(bpsuper);
errout10: /* unwind lbmLogInit() */
lbmLogShutdown(log);
jFYI(1, ("lmLogInit: exit(%d)\n", rc));
return rc;
}
/*
* NAME: lmLogClose()
*
* FUNCTION: remove the file system from the active list of the log
* and close the log on last close.
*
* PARAMETER: sb - superblock
* log - log structure
*
* RETURN: errors from subroutines
*
* serialization:
*/
int lmLogClose(struct super_block *sb, log_t * log)
{
int rc;
jFYI(1, ("lmLogClose: log:0x%p\n", log));
/*
* in-line log in host file system
*/
// inlineLog:
#ifdef _STILL_TO_PORT
if (log->flag & JFS_INLINELOG) {
rc = lmLogShutdown(log);
goto out1;
}
/*
* external log as separate logical volume
*/
externalLog:
/* serialize open/close between multiple file systems
* associated with the log
*/
IWRITE_LOCK(iplog);
/*
* remove file system from log active file system list
*/
rc = lmLogFileSystem(log, fsdev, 0);
if (iplog->i_count > 1)
goto out2;
/*
* last close: shut down log
*/
rc = ((rc1 = lmLogShutdown(log)) && rc == 0) ? rc1 : rc;
out1:
#else /* _STILL_TO_PORT */
rc = lmLogShutdown(log);
#endif /* _STILL_TO_PORT */
// out2:
jFYI(0, ("lmLogClose: exit(%d)\n", rc));
return rc;
}
/*
* NAME: lmLogShutdown()
*
* FUNCTION: log shutdown at last LogClose().
*
* write log syncpt record.
* update super block to set redone flag to 0.
*
* PARAMETER: log - log inode
*
* RETURN: 0 - success
*
* serialization: single last close thread
*/
static int lmLogShutdown(log_t * log)
{
int rc;
lrd_t lrd;
int lsn;
logsuper_t *logsuper;
lbuf_t *bpsuper;
lbuf_t *bp;
logpage_t *lp;
jFYI(1, ("lmLogShutdown: log:0x%p\n", log));
if (log->cqueue.head || !list_empty(&log->synclist)) {
/*
* If there was very recent activity, we may need to wait
* for the lazycommit thread to catch up
*/
int i;
for (i = 0; i < 800; i++) { /* Too much? */
current->state = TASK_INTERRUPTIBLE;
schedule_timeout(HZ / 4);
if ((log->cqueue.head == NULL) &&
list_empty(&log->synclist))
break;
}
}
assert(log->cqueue.head == NULL);
assert(list_empty(&log->synclist));
/*
* We need to make sure all of the "written" metapages
* actually make it to disk
*/
fsync_no_super(log->sb->s_bdev);
/*
* write the last SYNCPT record with syncpoint = 0
* (i.e., log redo up to HERE !)
*/
lrd.logtid = 0;
lrd.backchain = 0;
lrd.type = cpu_to_le16(LOG_SYNCPT);
lrd.length = 0;
lrd.log.syncpt.sync = 0;
lsn = lmWriteRecord(log, NULL, &lrd, NULL);
bp = log->bp;
lp = (logpage_t *) bp->l_ldata;
lp->h.eor = lp->t.eor = cpu_to_le16(bp->l_eor);
lbmWrite(log, log->bp, lbmWRITE | lbmRELEASE | lbmSYNC, 0);
lbmIOWait(log->bp, lbmFREE);
/*
* synchronous update log superblock
* mark log state as shutdown cleanly
* (i.e., Log does not need to be replayed).
*/
if ((rc = lbmRead(log, 1, &bpsuper)))
goto out;
logsuper = (logsuper_t *) bpsuper->l_ldata;
logsuper->state = cpu_to_le32(LOGREDONE);
logsuper->end = cpu_to_le32(lsn);
lbmDirectWrite(log, bpsuper, lbmWRITE | lbmRELEASE | lbmSYNC);
rc = lbmIOWait(bpsuper, lbmFREE);
jFYI(1, ("lmLogShutdown: lsn:0x%x page:%d eor:%d\n",
lsn, log->page, log->eor));
out:
/*
* shutdown per log i/o
*/
lbmLogShutdown(log);
if (rc) {
jFYI(1, ("lmLogShutdown: exit(%d)\n", rc));
}
return rc;
}
#ifdef _STILL_TO_PORT
/*
* NAME: lmLogFileSystem()
*
* FUNCTION: insert (<activate> = true)/remove (<activate> = false)
* file system into/from log active file system list.
*
* PARAMETER: log - pointer to log structure.
* fsdev - dev_t of filesystem.
* activate - insert/remove device from active list.
*
* RETURN: 0 - success
* errors returned by lbmIOWait().
*
* serialization: IWRITE_LOCK(log inode) held on entry/exit
*/
static int lmLogFileSystem(log_t * log, dev_t fsdev, int activate)
{
int rc = 0;
int bit, word;
logsuper_t *logsuper;
lbuf_t *bpsuper;
/*
* insert/remove file system device to log active file system list.
*/
if ((rc = lbmRead(log, 1, &bpsuper)))
return rc;
logsuper = (logsuper_t *) bpsuper->l_ldata;
bit = MINOR(fsdev);
word = bit / 32;
bit -= 32 * word;
if (activate)
logsuper->active[word] |=
cpu_to_le32((LEFTMOSTONE >> bit));
else
logsuper->active[word] &=
cpu_to_le32((~(LEFTMOSTONE >> bit)));
/*
* synchronous write log superblock:
*
* write sidestream bypassing write queue:
* at file system mount, log super block is updated for
* activation of the file system before any log record
* (MOUNT record) of the file system, and at file system
* unmount, all meta data for the file system has been
* flushed before log super block is updated for deactivation
* of the file system.
*/
lbmDirectWrite(log, bpsuper, lbmWRITE | lbmRELEASE | lbmSYNC);
rc = lbmIOWait(bpsuper, lbmFREE);
return rc;
}
#endif /* _STILL_TO_PORT */
/*
* lmLogQuiesce()
*/
int lmLogQuiesce(log_t * log)
{
int rc;
rc = lmLogShutdown(log);
return rc;
}
/*
* lmLogResume()
*/
int lmLogResume(log_t * log, struct super_block *sb)
{
struct jfs_sb_info *sbi = JFS_SBI(sb);
int rc;
log->base = addressPXD(&sbi->logpxd);
log->size =
(lengthPXD(&sbi->logpxd) << sb->s_blocksize_bits) >> L2LOGPSIZE;
rc = lmLogInit(log);
return rc;
}
/*
* log buffer manager (lbm)
* ------------------------
*
* special purpose buffer manager supporting log i/o requirements.
*
* per log write queue:
* log pageout occurs in serial order via a fifo write queue,
* restricted to a single i/o in progress at any one time.
* a circular singly-linked list
* (log->wqueue points to the tail, and buffers are linked via
* the bp->l_wqnext field) maintains log pages in pageout or
* waiting for pageout, in serial pageout order.
*/
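/*
 * Shape of that write queue, as a sketch: a single tail pointer into a
 * circular singly-linked list, so both the head (tail->l_wqnext) and
 * the tail are reachable in O(1):
 *
 *   log->wqueue --> bpN --> bp1 --> bp2 --> ... --> bpN
 *
 * lbmWrite() below inserts at the tail; i/o completion redrives the
 * buffer at the head.
 */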
/*
* lbmLogInit()
*
* initialize per log I/O setup at lmLogInit()
*/
static int lbmLogInit(log_t * log)
{ /* log inode */
int i;
lbuf_t *lbuf;
jFYI(1, ("lbmLogInit: log:0x%p\n", log));
/* initialize current buffer cursor */
log->bp = NULL;
/* initialize log device write queue */
log->wqueue = NULL;
/*
* Each log has its own buffer pages allocated to it. These are
* not managed by the page cache. This ensures that a transaction
* writing to the log does not block trying to allocate a page from
* the page cache (for the log). This would be bad, since page
* allocation waits on the kswapd thread that may be committing inodes
* which would cause log activity. Was that clear? I'm trying to
* avoid deadlock here.
*/
init_waitqueue_head(&log->free_wait);
log->lbuf_free = NULL;
for (i = 0; i < LOGPAGES; i++) {
lbuf = kmalloc(sizeof(lbuf_t), GFP_KERNEL);
if (lbuf == 0)
goto error;
lbuf->l_ldata = (char *) __get_free_page(GFP_KERNEL);
if (lbuf->l_ldata == 0) {
kfree(lbuf);
goto error;
}
lbuf->l_log = log;
init_waitqueue_head(&lbuf->l_ioevent);
lbuf->l_freelist = log->lbuf_free;
log->lbuf_free = lbuf;
}
return (0);
error:
lbmLogShutdown(log);
return (ENOMEM);
}
/*
* lbmLogShutdown()
*
* finalize per log I/O setup at lmLogShutdown()
*/
static void lbmLogShutdown(log_t * log)
{
lbuf_t *lbuf;
jFYI(1, ("lbmLogShutdown: log:0x%p\n", log));
lbuf = log->lbuf_free;
while (lbuf) {
lbuf_t *next = lbuf->l_freelist;
free_page((unsigned long) lbuf->l_ldata);
kfree(lbuf);
lbuf = next;
}
log->bp = NULL;
}
/*
* lbmAllocate()
*
* allocate an empty log buffer
*/
static lbuf_t *lbmAllocate(log_t * log, int pn)
{
lbuf_t *bp;
unsigned long flags;
/*
* recycle from log buffer freelist if any
*/
LCACHE_LOCK(flags);
LCACHE_SLEEP_COND(log->free_wait, (bp = log->lbuf_free), flags);
log->lbuf_free = bp->l_freelist;
LCACHE_UNLOCK(flags);
bp->l_flag = 0;
bp->l_wqnext = NULL;
bp->l_freelist = NULL;
bp->l_pn = pn;
bp->l_blkno = log->base + (pn << (L2LOGPSIZE - log->l2bsize));
bp->l_ceor = 0;
return bp;
}
/*
* lbmFree()
*
* release a log buffer to freelist
*/
static void lbmFree(lbuf_t * bp)
{
unsigned long flags;
LCACHE_LOCK(flags);
lbmfree(bp);
LCACHE_UNLOCK(flags);
}
static void lbmfree(lbuf_t * bp)
{
log_t *log = bp->l_log;
assert(bp->l_wqnext == NULL);
/*
* return the buffer to head of freelist
*/
bp->l_freelist = log->lbuf_free;
log->lbuf_free = bp;
wake_up(&log->free_wait);
return;
}
#ifdef _THIS_IS_NOT_USED
/*
* lbmRelease()
*
* remove the log buffer from log device write queue;
*/
static void lbmRelease(log_t * log, uint flag)
{
lbuf_t *bp, *tail;
unsigned long flags;
bp = log->bp;
LCACHE_LOCK(flags);
tail = log->wqueue;
/* single element queue */
if (bp == tail) {
log->wqueue = NULL;
bp->l_wqnext = NULL;
}
/* multi element queue */
else {
tail->l_wqnext = bp->l_wqnext;
bp->l_wqnext = NULL;
}
if (flag & lbmFREE)
lbmfree(bp);
LCACHE_UNLOCK(flags);
}
#endif /* _THIS_IS_NOT_USED */
/*
* NAME: lbmRedrive
*
* FUNCTION: add a log buffer to the log redrive list
*
* PARAMETER:
* bp - log buffer
*
* NOTES:
* Takes log_redrive_lock.
*/
static inline void lbmRedrive(lbuf_t *bp)
{
unsigned long flags;
spin_lock_irqsave(&log_redrive_lock, flags);
bp->l_redrive_next = log_redrive_list;
log_redrive_list = bp;
spin_unlock_irqrestore(&log_redrive_lock, flags);
wake_up_process(jfsIOtask);
}
/*
* lbmRead()
*/
static int lbmRead(log_t * log, int pn, lbuf_t ** bpp)
{
struct bio *bio;
lbuf_t *bp;
/*
* allocate a log buffer
*/
*bpp = bp = lbmAllocate(log, pn);
jFYI(1, ("lbmRead: bp:0x%p pn:0x%x\n", bp, pn));
bp->l_flag |= lbmREAD;
bio = bio_alloc(GFP_NOFS, 1);
bio->bi_sector = bp->l_blkno << (log->l2bsize - 9);
bio->bi_dev = log->dev;
bio->bi_io_vec[0].bv_page = virt_to_page(bp->l_ldata);
bio->bi_io_vec[0].bv_len = LOGPSIZE;
bio->bi_io_vec[0].bv_offset = 0;
bio->bi_vcnt = 1;
bio->bi_idx = 0;
bio->bi_size = LOGPSIZE;
bio->bi_end_io = lbmIODone;
bio->bi_private = bp;
submit_bio(READ, bio);
run_task_queue(&tq_disk);
wait_event(bp->l_ioevent, (bp->l_flag != lbmREAD));
return 0;
}
/*
* lbmWrite()
*
* buffer at head of pageout queue stays after completion of
* partial-page pageout and redriven by explicit initiation of
* pageout by caller until full-page pageout is completed and
* released.
*
* device driver i/o done redrives pageout of new buffer at
* head of pageout queue when current buffer at head of pageout
* queue is released at the completion of its full-page pageout.
*
* LOGGC_LOCK() serializes lbmWrite() by lmNextPage() and lmGroupCommit().
* LCACHE_LOCK() serializes xflag between lbmWrite() and lbmIODone()
*/
static void lbmWrite(log_t * log, lbuf_t * bp, int flag, int cant_block)
{
lbuf_t *tail;
unsigned long flags;
jFYI(1, ("lbmWrite: bp:0x%p flag:0x%x pn:0x%x\n",
bp, flag, bp->l_pn));
/* map the logical block address to physical block address */
bp->l_blkno =
log->base + (bp->l_pn << (L2LOGPSIZE - log->l2bsize));
LCACHE_LOCK(flags); /* disable+lock */
/*
* initialize buffer for device driver
*/
bp->l_flag = flag;
/*
* insert bp at tail of write queue associated with log
*
* (request is either for bp already/currently at head of queue
* or new bp to be inserted at tail)
*/
tail = log->wqueue;
/* is buffer not already on write queue ? */
if (bp->l_wqnext == NULL) {
/* insert at tail of wqueue */
if (tail == NULL) {
log->wqueue = bp;
bp->l_wqnext = bp;
} else {
log->wqueue = bp;
bp->l_wqnext = tail->l_wqnext;
tail->l_wqnext = bp;
}
tail = bp;
}
/* is buffer at head of wqueue and for write ? */
if ((bp != tail->l_wqnext) || !(flag & lbmWRITE)) {
LCACHE_UNLOCK(flags); /* unlock+enable */
return;
}
LCACHE_UNLOCK(flags); /* unlock+enable */
if (cant_block)
lbmRedrive(bp);
else if (flag & lbmSYNC)
lbmStartIO(bp);
else {
LOGGC_UNLOCK(log);
lbmStartIO(bp);
LOGGC_LOCK(log);
}
}
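/*
 * Illustrative sketch of the circular write queue manipulated above
 * (log->wqueue always points at the tail; tail->l_wqnext is the head):
 *
 *	empty:			log->wqueue == NULL
 *	one buffer A:		log->wqueue == A, A->l_wqnext == A
 *	after inserting B:	log->wqueue == B, B->l_wqnext == A,
 *				A->l_wqnext == B
 */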
/*
* lbmDirectWrite()
*
* initiate pageout bypassing write queue for sidestream
* (e.g., log superblock) write;
*/
static void lbmDirectWrite(log_t * log, lbuf_t * bp, int flag)
{
jEVENT(0, ("lbmDirectWrite: bp:0x%p flag:0x%x pn:0x%x\n",
bp, flag, bp->l_pn));
/*
* initialize buffer for device driver
*/
bp->l_flag = flag | lbmDIRECT;
/* map the logical block address to physical block address */
bp->l_blkno =
log->base + (bp->l_pn << (L2LOGPSIZE - log->l2bsize));
/*
* initiate pageout of the page
*/
lbmStartIO(bp);
}
/*
* NAME: lbmStartIO()
*
* FUNCTION: Interface to DD strategy routine
*
* RETURN: none
*
* serialization: LCACHE_LOCK() is NOT held during log i/o;
*/
void lbmStartIO(lbuf_t * bp)
{
struct bio *bio;
log_t *log = bp->l_log;
jFYI(1, ("lbmStartIO\n"));
bio = bio_alloc(GFP_NOFS, 1);
bio->bi_sector = bp->l_blkno << (log->l2bsize - 9);
bio->bi_dev = log->dev;
bio->bi_io_vec[0].bv_page = virt_to_page(bp->l_ldata);
bio->bi_io_vec[0].bv_len = LOGPSIZE;
bio->bi_io_vec[0].bv_offset = 0;
bio->bi_vcnt = 1;
bio->bi_idx = 0;
bio->bi_size = LOGPSIZE;
bio->bi_end_io = lbmIODone;
bio->bi_private = bp;
submit_bio(WRITE, bio);
INCREMENT(lmStat.submitted);
run_task_queue(&tq_disk);
jFYI(1, ("lbmStartIO done\n"));
}
/*
* lbmIOWait()
*/
static int lbmIOWait(lbuf_t * bp, int flag)
{
unsigned long flags;
int rc = 0;
jFYI(1,
("lbmIOWait1: bp:0x%p flag:0x%x:0x%x\n", bp, bp->l_flag,
flag));
LCACHE_LOCK(flags); /* disable+lock */
LCACHE_SLEEP_COND(bp->l_ioevent, (bp->l_flag & lbmDONE), flags);
rc = (bp->l_flag & lbmERROR) ? EIO : 0;
if (flag & lbmFREE)
lbmfree(bp);
LCACHE_UNLOCK(flags); /* unlock+enable */
jFYI(1,
("lbmIOWait2: bp:0x%p flag:0x%x:0x%x\n", bp, bp->l_flag,
flag));
return rc;
}
/*
* lbmIODone()
*
* executed at INTIODONE level
*/
static int lbmIODone(struct bio *bio, int nr_sectors)
{
lbuf_t *bp = bio->bi_private;
lbuf_t *nextbp, *tail;
log_t *log;
unsigned long flags;
/*
* get back jfs buffer bound to the i/o buffer
*/
jEVENT(0, ("lbmIODone: bp:0x%p flag:0x%x\n", bp, bp->l_flag));
LCACHE_LOCK(flags); /* disable+lock */
bp->l_flag |= lbmDONE;
if (!test_bit(BIO_UPTODATE, &bio->bi_flags)) {
bp->l_flag |= lbmERROR;
jERROR(1, ("lbmIODone: I/O error in JFS log\n"));
}
bio_put(bio);
/*
* pagein completion
*/
if (bp->l_flag & lbmREAD) {
bp->l_flag &= ~lbmREAD;
LCACHE_UNLOCK(flags); /* unlock+enable */
/* wakeup I/O initiator */
LCACHE_WAKEUP(&bp->l_ioevent);
return 0;
}
/*
* pageout completion
*
* the bp at the head of write queue has completed pageout.
*
* if single-commit/full-page pageout, remove the current buffer
* from head of pageout queue, and redrive pageout with
* the new buffer at head of pageout queue;
* otherwise, the partial-page pageout buffer stays at
* the head of pageout queue to be redriven for pageout
* by lmGroupCommit() until full-page pageout is completed.
*/
bp->l_flag &= ~lbmWRITE;
INCREMENT(lmStat.pagedone);
/* update committed lsn */
log = bp->l_log;
log->clsn = (bp->l_pn << L2LOGPSIZE) + bp->l_ceor;
if (bp->l_flag & lbmDIRECT) {
LCACHE_WAKEUP(&bp->l_ioevent);
LCACHE_UNLOCK(flags);
return 0;
}
tail = log->wqueue;
/* single element queue */
if (bp == tail) {
/* remove head buffer of full-page pageout
* from log device write queue
*/
if (bp->l_flag & lbmRELEASE) {
log->wqueue = NULL;
bp->l_wqnext = NULL;
}
}
/* multi element queue */
else {
/* remove head buffer of full-page pageout
* from log device write queue
*/
if (bp->l_flag & lbmRELEASE) {
nextbp = tail->l_wqnext = bp->l_wqnext;
bp->l_wqnext = NULL;
/*
* redrive pageout of next page at head of write queue:
* redrive next page without any bound tblk
* (i.e., page w/o any COMMIT records), or
* first page of new group commit which has been
* queued after current page (subsequent pageout
* is performed synchronously, except page without
* any COMMITs) by lmGroupCommit() as indicated
* by lbmWRITE flag;
*/
if (nextbp->l_flag & lbmWRITE) {
/*
* We can't do the I/O at interrupt time.
* The jfsIO thread can do it
*/
lbmRedrive(nextbp);
}
}
}
/*
* synchronous pageout:
*
* buffer has not necessarily been removed from write queue
* (e.g., synchronous write of partial-page with COMMIT):
* leave buffer for i/o initiator to dispose
*/
if (bp->l_flag & lbmSYNC) {
LCACHE_UNLOCK(flags); /* unlock+enable */
/* wakeup I/O initiator */
LCACHE_WAKEUP(&bp->l_ioevent);
}
/*
* Group Commit pageout:
*/
else if (bp->l_flag & lbmGC) {
LCACHE_UNLOCK(flags);
lmPostGC(bp);
}
/*
* asynchronous pageout:
*
* buffer must have been removed from write queue:
* insert buffer at head of freelist where it can be recycled
*/
else {
assert(bp->l_flag & lbmRELEASE);
assert(bp->l_flag & lbmFREE);
lbmfree(bp);
LCACHE_UNLOCK(flags); /* unlock+enable */
}
return 0;
}
int jfsIOWait(void *arg)
{
lbuf_t *bp;
jFYI(1, ("jfsIOWait is here!\n"));
lock_kernel();
daemonize();
current->tty = NULL;
strcpy(current->comm, "jfsIO");
unlock_kernel();
jfsIOtask = current;
spin_lock_irq(&current->sigmask_lock);
siginitsetinv(&current->blocked,
sigmask(SIGHUP) | sigmask(SIGKILL) | sigmask(SIGSTOP)
| sigmask(SIGCONT));
spin_unlock_irq(&current->sigmask_lock);
complete(&jfsIOwait);
do {
spin_lock_irq(&log_redrive_lock);
while ((bp = log_redrive_list)) {
log_redrive_list = bp->l_redrive_next;
bp->l_redrive_next = NULL;
spin_unlock_irq(&log_redrive_lock);
lbmStartIO(bp);
spin_lock_irq(&log_redrive_lock);
}
spin_unlock_irq(&log_redrive_lock);
set_current_state(TASK_INTERRUPTIBLE);
schedule();
} while (!jfs_thread_stopped());
jFYI(1,("jfsIOWait being killed!\n"));
complete(&jfsIOwait);
return 0;
}
#ifdef _STILL_TO_PORT
/*
* lbmDirectIODone()
*
* iodone() for lbmDirectWrite() to bypass write queue;
* executed at INTIODONE level;
*/
static void lbmDirectIODone(iobuf_t * iobp)
{
lbuf_t *bp;
unsigned long flags;
/*
* get back jfs buffer bound to the io buffer
*/
bp = (lbuf_t *) iobp->b_jfsbp;
jEVENT(0,
("lbmDirectIODone: bp:0x%p flag:0x%x\n", bp, bp->l_flag));
LCACHE_LOCK(flags); /* disable+lock */
bp->l_flag |= lbmDONE;
if (iobp->b_flags & B_ERROR) {
bp->l_flag |= lbmERROR;
#ifdef _JFS_OS2
SysLogError();
#endif
}
/*
* pageout completion
*/
bp->l_flag &= ~lbmWRITE;
/*
* synchronous pageout:
*/
if (bp->l_flag & lbmSYNC) {
LCACHE_UNLOCK(flags); /* unlock+enable */
/* wakeup I/O initiator */
LCACHE_WAKEUP(&bp->l_ioevent);
}
/*
* asynchronous pageout:
*/
else {
assert(bp->l_flag & lbmRELEASE);
assert(bp->l_flag & lbmFREE);
lbmfree(bp);
LCACHE_UNLOCK(flags); /* unlock+enable */
}
}
#endif /* _STILL_TO_PORT */
#ifdef _STILL_TO_PORT
/*
* NAME: lmLogFormat()/jfs_logform()
*
* FUNCTION: format file system log (ref. jfs_logform()).
*
* PARAMETERS:
* log - log inode (with common mount inode base);
* logAddress - start address of log space in FS block;
* logSize - length of log space in FS block;
*
* RETURN: 0 - success
* -1 - i/o error
*/
int lmLogFormat(inode_t * ipmnt, s64 logAddress, int logSize)
{
int rc = 0;
cbuf_t *bp;
logsuper_t *logsuper;
logpage_t *lp;
int lspn; /* log sequence page number */
struct lrd *lrd_ptr;
int npbperpage, npages;
jFYI(0, ("lmLogFormat: logAddress:%Ld logSize:%d\n",
logAddress, logSize));
/* allocate a JFS buffer */
bp = rawAllocate();
/* map the logical block address to physical block address */
bp->cm_blkno = logAddress << ipmnt->i_l2bfactor;
npbperpage = LOGPSIZE >> ipmnt->i_l2pbsize;
npages = logSize / (LOGPSIZE >> ipmnt->i_l2bsize);
/*
* log space:
*
* page 0 - reserved;
* page 1 - log superblock;
 * page 2 - log data page: a SYNCPT log record is written
* into this page at logform time;
* pages 3-N - log data page: set to empty log data pages;
*/
/*
* init log superblock: log page 1
*/
logsuper = (logsuper_t *) bp->cm_cdata;
logsuper->magic = cpu_to_le32(LOGMAGIC);
logsuper->version = cpu_to_le32(LOGVERSION);
logsuper->state = cpu_to_le32(LOGREDONE);
logsuper->flag = cpu_to_le32(ipmnt->i_mntflag); /* ? */
logsuper->size = cpu_to_le32(npages);
logsuper->bsize = cpu_to_le32(ipmnt->i_bsize);
logsuper->l2bsize = cpu_to_le32(ipmnt->i_l2bsize);
logsuper->end =
cpu_to_le32(2 * LOGPSIZE + LOGPHDRSIZE + LOGRDSIZE);
bp->cm_blkno += npbperpage;
rawWrite(ipmnt, bp, 0);
/*
* init pages 2 to npages-1 as log data pages:
*
* log page sequence number (lpsn) initialization:
*
* pn: 0 1 2 3 n-1
* +-----+-----+=====+=====+===.....===+=====+
* lspn: N-1 0 1 N-2
* <--- N page circular file ---->
*
 * the N (= npages-2) data pages of the log are maintained as
* a circular file for the log records;
* lpsn grows by 1 monotonically as each log page is written
* to the circular file of the log;
 * Since the AIX DUMMY log record is dropped from this XJFS,
 * setLogpage() will not reset the page number even if
 * the eor is equal to LOGPHDRSIZE. In order for the binary
 * search to still work in the find-log-end process, we have to
 * simulate the log wrap situation at log format time.
 * The 1st log page written will have the highest lpsn. The
 * succeeding log pages will then have lpsn in ascending order
 * starting from 0, ..., (N-2).
*/
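/*
 * Worked example (illustrative): for npages = 16 there are
 * N = npages - 2 = 14 data pages (pn 2..15). The first data page
 * written (pn 2) gets lpsn npages - 3 = 13, the highest, and
 * pn 3..15 then get lpsn 0, 1, ..., 12 in ascending order,
 * simulating a log that has already wrapped once.
 */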
lp = (logpage_t *) bp->cm_cdata;
/*
 * initialize 1st log page to be written: lpsn = N - 1;
 * a SYNCPT log record is written into this page
*/
lp->h.page = lp->t.page = cpu_to_le32(npages - 3);
lp->h.eor = lp->t.eor = cpu_to_le16(LOGPHDRSIZE + LOGRDSIZE);
lrd_ptr = (struct lrd *) &lp->data;
lrd_ptr->logtid = 0;
lrd_ptr->backchain = 0;
lrd_ptr->type = cpu_to_le16(LOG_SYNCPT);
lrd_ptr->length = 0;
lrd_ptr->log.syncpt.sync = 0;
bp->cm_blkno += npbperpage;
rawWrite(ipmnt, bp, 0);
/*
* initialize succeeding log pages: lpsn = 0, 1, ..., (N-2)
*/
for (lspn = 0; lspn < npages - 3; lspn++) {
lp->h.page = lp->t.page = cpu_to_le32(lspn);
lp->h.eor = lp->t.eor = cpu_to_le16(LOGPHDRSIZE);
bp->cm_blkno += npbperpage;
rawWrite(ipmnt, bp, 0);
}
/*
* finalize log
*/
/* release the buffer */
rawRelease(bp);
return rc;
}
#endif /* _STILL_TO_PORT */
#ifdef CONFIG_JFS_STATISTICS
int jfs_lmstats_read(char *buffer, char **start, off_t offset, int length,
int *eof, void *data)
{
int len = 0;
off_t begin;
len += sprintf(buffer,
"JFS Logmgr stats\n"
"================\n"
"commits = %d\n"
"writes submitted = %d\n"
"writes completed = %d\n",
lmStat.commit,
lmStat.submitted,
lmStat.pagedone);
begin = offset;
*start = buffer + begin;
len -= begin;
if (len > length)
len = length;
else
*eof = 1;
if (len < 0)
len = 0;
return len;
}
#endif /* CONFIG_JFS_STATISTICS */
/*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/
#ifndef _H_JFS_LOGMGR
#define _H_JFS_LOGMGR
#include "jfs_filsys.h"
#include "jfs_lock.h"
/*
* log manager configuration parameters
*/
/* log page size */
#define LOGPSIZE 4096
#define L2LOGPSIZE 12
#define LOGPAGES 16 /* Log pages per mounted file system */
/*
* log logical volume
*
* a log is used to make the commit operation on journalled
* files within the same logical volume group atomic.
* a log is implemented with a logical volume.
* there is one log per logical volume group.
*
* block 0 of the log logical volume is not used (ipl etc).
* block 1 contains a log "superblock" and is used by logFormat(),
* lmLogInit(), lmLogShutdown(), and logRedo() to record status
* of the log but is not otherwise used during normal processing.
* blocks 2 - (N-1) are used to contain log records.
*
* when a volume group is varied-on-line, logRedo() must have
* been executed before the file systems (logical volumes) in
* the volume group can be mounted.
*/
/*
* log superblock (block 1 of logical volume)
*/
#define LOGSUPER_B 1
#define LOGSTART_B 2
#define LOGMAGIC 0x87654321
#define LOGVERSION 1
typedef struct {
u32 magic; /* 4: log lv identifier */
s32 version; /* 4: version number */
s32 serial; /* 4: log open/mount counter */
s32 size; /* 4: size in number of LOGPSIZE blocks */
s32 bsize; /* 4: logical block size in byte */
s32 l2bsize; /* 4: log2 of bsize */
u32 flag; /* 4: option */
u32 state; /* 4: state - see below */
s32 end; /* 4: addr of last log record set by logredo */
u32 active[8]; /* 32: active file systems bit vector */
s32 rsrvd[LOGPSIZE / 4 - 17];
} logsuper_t;
/* log flag: commit option (see jfs_filsys.h) */
/* log state */
#define LOGMOUNT 0 /* log mounted by lmLogInit() */
#define LOGREDONE 1 /* log shutdown by lmLogShutdown().
* log redo completed by logredo().
*/
#define LOGWRAP 2 /* log wrapped */
#define LOGREADERR 3 /* log read error detected in logredo() */
/*
* log logical page
*
* (this comment should be rewritten !)
* the header and trailer structures (h,t) will normally have
* the same page and eor value.
* An exception to this occurs when a complete page write is not
* accomplished on a power failure. Since the hardware may "split write"
* sectors in the page, any out of order sequence may occur during powerfail
* and needs to be recognized during log replay. The xor value is
* an "exclusive or" of all log words in the page up to eor. This
* 32 bit eor is stored with the top 16 bits in the header and the
* bottom 16 bits in the trailer. logredo can easily recognize pages
* that were not completed by reconstructing this eor and checking
* the log page.
*
* Previous versions of the operating system did not allow split
* writes and detected partially written records in logredo by
* ordering the updates to the header, trailer, and the move of data
* into the logdata area. The order: (1) data is moved (2) header
* is updated (3) trailer is updated. In logredo, when the header
* differed from the trailer, the header and trailer were reconciled
* as follows: if h.page != t.page they were set to the smaller of
* the two and h.eor and t.eor set to 8 (i.e. empty page). if (only)
* h.eor != t.eor they were set to the smaller of their two values.
*/
typedef struct {
struct { /* header */
s32 page; /* 4: log sequence page number */
s16 rsrvd; /* 2: */
s16 eor; /* 2: end-of-log offset of last record write */
} h;
s32 data[LOGPSIZE / 4 - 4]; /* log record area */
struct { /* trailer */
s32 page; /* 4: normally the same as h.page */
s16 rsrvd; /* 2: */
s16 eor; /* 2: normally the same as h.eor */
} t;
} logpage_t;
#define LOGPHDRSIZE 8 /* log page header size */
#define LOGPTLRSIZE 8 /* log page trailer size */
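/*
 * Size check (illustrative): with LOGPSIZE = 4096 the data area holds
 * (LOGPSIZE/4 - 4) 32-bit words = 4080 bytes, which together with the
 * 8-byte header and 8-byte trailer fills the page exactly:
 * 8 + 4080 + 8 = 4096.
 */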
/*
* log record
*
* (this comment should be rewritten !)
* jfs uses only "after" log records (only a single writer is allowed
 * in a page, pages are written to temporary paging space
 * if they must be written to disk before commit, and i/o is
* scheduled for modified pages to their home location after
* the log records containing the after values and the commit
* record is written to the log on disk, undo discards the copy
* in main-memory.)
*
* a log record consists of a data area of variable length followed by
* a descriptor of fixed size LOGRDSIZE bytes.
* the data area is rounded up to an integral number of 4-bytes and
* must be no longer than LOGPSIZE.
 * the descriptor is a multiple of 4 bytes in size and is aligned
 * on a 4-byte boundary.
* records are packed one after the other in the data area of log pages.
* (sometimes a DUMMY record is inserted so that at least one record ends
* on every page or the longest record is placed on at most two pages).
* the field eor in page header/trailer points to the byte following
* the last record on a page.
*/
/* log record types */
#define LOG_COMMIT 0x8000
#define LOG_SYNCPT 0x4000
#define LOG_MOUNT 0x2000
#define LOG_REDOPAGE 0x0800
#define LOG_NOREDOPAGE 0x0080
#define LOG_NOREDOINOEXT 0x0040
#define LOG_UPDATEMAP 0x0008
#define LOG_NOREDOFILE 0x0001
/* REDOPAGE/NOREDOPAGE log record data type */
#define LOG_INODE 0x0001
#define LOG_XTREE 0x0002
#define LOG_DTREE 0x0004
#define LOG_BTROOT 0x0010
#define LOG_EA 0x0020
#define LOG_ACL 0x0040
#define LOG_DATA 0x0080
#define LOG_NEW 0x0100
#define LOG_EXTEND 0x0200
#define LOG_RELOCATE 0x0400
#define LOG_DIR_XTREE 0x0800 /* Xtree is in directory inode */
/* UPDATEMAP log record descriptor type */
#define LOG_ALLOCXADLIST 0x0080
#define LOG_ALLOCPXDLIST 0x0040
#define LOG_ALLOCXAD 0x0020
#define LOG_ALLOCPXD 0x0010
#define LOG_FREEXADLIST 0x0008
#define LOG_FREEPXDLIST 0x0004
#define LOG_FREEXAD 0x0002
#define LOG_FREEPXD 0x0001
typedef struct lrd {
/*
* type independent area
*/
s32 logtid; /* 4: log transaction identifier */
s32 backchain; /* 4: ptr to prev record of same transaction */
u16 type; /* 2: record type */
s16 length; /* 2: length of data in record (in byte) */
s32 aggregate; /* 4: file system lv/aggregate */
/* (16) */
/*
* type dependent area (20)
*/
union {
/*
* COMMIT: commit
*
* transaction commit: no type-dependent information;
*/
/*
* REDOPAGE: after-image
*
* apply after-image;
*
* N.B. REDOPAGE, NOREDOPAGE, and UPDATEMAP must be same format;
*/
struct {
u32 fileset; /* 4: fileset number */
u32 inode; /* 4: inode number */
u16 type; /* 2: REDOPAGE record type */
s16 l2linesize; /* 2: log2 of line size */
pxd_t pxd; /* 8: on-disk page pxd */
} redopage; /* (20) */
/*
* NOREDOPAGE: the page is freed
*
* do not apply after-image records which precede this record
* in the log with the same page block number to this page.
*
* N.B. REDOPAGE, NOREDOPAGE, and UPDATEMAP must be same format;
*/
struct {
s32 fileset; /* 4: fileset number */
u32 inode; /* 4: inode number */
u16 type; /* 2: NOREDOPAGE record type */
s16 rsrvd; /* 2: reserved */
pxd_t pxd; /* 8: on-disk page pxd */
} noredopage; /* (20) */
/*
* UPDATEMAP: update block allocation map
*
* either in-line PXD,
* or out-of-line XADLIST;
*
* N.B. REDOPAGE, NOREDOPAGE, and UPDATEMAP must be same format;
*/
struct {
u32 fileset; /* 4: fileset number */
u32 inode; /* 4: inode number */
u16 type; /* 2: UPDATEMAP record type */
s16 nxd; /* 2: number of extents */
pxd_t pxd; /* 8: pxd */
} updatemap; /* (20) */
/*
* NOREDOINOEXT: the inode extent is freed
*
* do not apply after-image records which precede this
 * record in the log with any of the 4 page block
* numbers in this inode extent.
*
* NOTE: The fileset and pxd fields MUST remain in
* the same fields in the REDOPAGE record format.
*
*/
struct {
s32 fileset; /* 4: fileset number */
s32 iagnum; /* 4: IAG number */
s32 inoext_idx; /* 4: inode extent index */
pxd_t pxd; /* 8: on-disk page pxd */
} noredoinoext; /* (20) */
/*
* SYNCPT: log sync point
*
 * replay log up to the syncpt address specified;
*/
struct {
s32 sync; /* 4: syncpt address (0 = here) */
} syncpt;
/*
* MOUNT: file system mount
*
* file system mount: no type-dependent information;
*/
/*
* ? FREEXTENT: free specified extent(s)
*
* free specified extent(s) from block allocation map
* N.B.: nextents should be length of data/sizeof(xad_t)
*/
struct {
s32 type; /* 4: FREEXTENT record type */
s32 nextent; /* 4: number of extents */
/* data: PXD or XAD list */
} freextent;
/*
* ? NOREDOFILE: this file is freed
*
* do not apply records which precede this record in the log
* with the same inode number.
*
 * NOREDOFILE must be the first to be written at commit
* (last to be read in logredo()) - it prevents
* replay of preceding updates of all preceding generations
* of the inumber esp. the on-disk inode itself,
* but does NOT prevent
* replay of the
*/
struct {
s32 fileset; /* 4: fileset number */
u32 inode; /* 4: inode number */
} noredofile;
/*
* ? NEWPAGE:
*
* metadata type dependent
*/
struct {
s32 fileset; /* 4: fileset number */
u32 inode; /* 4: inode number */
s32 type; /* 4: NEWPAGE record type */
pxd_t pxd; /* 8: on-disk page pxd */
} newpage;
/*
* ? DUMMY: filler
*
* no type-dependent information
*/
} log;
} lrd_t; /* (36) */
#define LOGRDSIZE (sizeof(struct lrd))
/*
* line vector descriptor
*/
typedef struct {
s16 offset;
s16 length;
} lvd_t;
/*
* log logical volume
*/
typedef struct jfs_log {
struct super_block *sb; /* 4: This is used to sync metadata
* before writing syncpt. Will
* need to be a list if we share
* the log between fs's
*/
kdev_t dev; /* 4: log lv number */
struct file *devfp; /* 4: log device file */
s32 serial; /* 4: log mount serial number */
s64 base; /* @8: log extent address (inline log) */
int size; /* 4: log size in log page (in page) */
int l2bsize; /* 4: log2 of bsize */
uint flag; /* 4: flag */
uint state; /* 4: state */
struct lbuf *lbuf_free; /* 4: free lbufs */
wait_queue_head_t free_wait; /* 4: */
/* log write */
int logtid; /* 4: log tid */
int page; /* 4: page number of eol page */
int eor; /* 4: eor of last record in eol page */
struct lbuf *bp; /* 4: current log page buffer */
struct semaphore loglock; /* 4: log write serialization lock */
/* syncpt */
int nextsync; /* 4: bytes to write before next syncpt */
int active; /* 4: */
int syncbarrier; /* 4: */
wait_queue_head_t syncwait; /* 4: */
/* commit */
uint cflag; /* 4: */
struct { /* 8: FIFO commit queue header */
struct tblock *head;
struct tblock *tail;
} cqueue;
int gcrtc; /* 4: GC_READY transaction count */
struct tblock *gclrt; /* 4: latest GC_READY transaction */
spinlock_t gclock; /* 4: group commit lock */
int logsize; /* 4: log data area size in byte */
int lsn; /* 4: end-of-log */
int clsn; /* 4: clsn */
int syncpt; /* 4: addr of last syncpt record */
int sync; /* 4: addr from last logsync() */
struct list_head synclist; /* 8: logsynclist anchor */
spinlock_t synclock; /* 4: synclist lock */
struct lbuf *wqueue; /* 4: log pageout queue */
int count; /* 4: count */
} log_t;
/*
* group commit flag
*/
/* log_t */
#define logGC_PAGEOUT 0x00000001
/* tblock_t/lbuf_t */
#define tblkGC_QUEUE 0x0001
#define tblkGC_READY 0x0002
#define tblkGC_COMMIT 0x0004
#define tblkGC_COMMITTED 0x0008
#define tblkGC_EOP 0x0010
#define tblkGC_FREE 0x0020
#define tblkGC_LEADER 0x0040
#define tblkGC_ERROR 0x0080
#define tblkGC_LAZY 0x0100 // D230860
#define tblkGC_UNLOCKED 0x0200 // D230860
/*
* log cache buffer header
*/
typedef struct lbuf {
log_t *l_log; /* 4: log associated with buffer */
/*
* data buffer base area
*/
uint l_flag; /* 4: pageout control flags */
struct lbuf *l_wqnext; /* 4: write queue link */
struct lbuf *l_freelist; /* 4: freelist link */
int l_pn; /* 4: log page number */
int l_eor; /* 4: log record eor */
int l_ceor; /* 4: committed log record eor */
s64 l_blkno; /* 8: log page block number */
caddr_t l_ldata; /* 4: data page */
wait_queue_head_t l_ioevent; /* 4: i/o done event */
struct page *l_page; /* The page itself */
} lbuf_t;
/* Reuse l_freelist for redrive list */
#define l_redrive_next l_freelist
/*
* logsynclist block
*
* common logsyncblk prefix for jbuf_t and tblock_t
*/
typedef struct logsyncblk {
u16 xflag; /* flags */
u16 flag; /* only meaningful in tblock_t */
lid_t lid; /* lock id */
s32 lsn; /* log sequence number */
struct list_head synclist; /* log sync list link */
} logsyncblk_t;
/*
* logsynclist serialization (per log)
*/
#define LOGSYNC_LOCK_INIT(log) spin_lock_init(&(log)->synclock)
#define LOGSYNC_LOCK(log) spin_lock(&(log)->synclock)
#define LOGSYNC_UNLOCK(log) spin_unlock(&(log)->synclock)
/* compute the difference in bytes of lsn from sync point */
#define logdiff(diff, lsn, log)\
{\
diff = (lsn) - (log)->syncpt;\
if (diff < 0)\
diff += (log)->logsize;\
}
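/*
 * Worked example (illustrative numbers): with log->logsize = 0x100000
 * and log->syncpt = 0xff000, an lsn of 0x1000 that has wrapped past
 * the end of the log yields
 *
 *	diff = 0x1000 - 0xff000 = -0xfe000; diff += 0x100000 -> 0x2000
 *
 * i.e. the lsn lies 0x2000 bytes beyond the last sync point.
 */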
extern int lmLogOpen(struct super_block *sb, log_t ** log);
extern int lmLogClose(struct super_block *sb, log_t * log);
extern int lmLogSync(log_t * log, int nosyncwait);
extern int lmLogQuiesce(log_t * log);
extern int lmLogResume(log_t * log, struct super_block *sb);
extern int lmLogFormat(struct super_block *sb, s64 logAddress, int logSize);
#endif /* _H_JFS_LOGMGR */
/*
*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*
* Module: jfs/jfs_metapage.c
*
*/
#include <linux/fs.h>
#include <linux/init.h>
#include "jfs_incore.h"
#include "jfs_filsys.h"
#include "jfs_metapage.h"
#include "jfs_txnmgr.h"
#include "jfs_debug.h"
extern struct task_struct *jfsCommitTask;
static unsigned int metapages = 1024; /* ??? Need a better number */
static unsigned int free_metapages;
static metapage_t *metapage_buf;
static unsigned long meta_order;
static metapage_t *meta_free_list = NULL;
static spinlock_t meta_lock = SPIN_LOCK_UNLOCKED;
static wait_queue_head_t meta_wait;
#ifdef CONFIG_JFS_STATISTICS
struct {
uint pagealloc; /* # of page allocations */
uint pagefree; /* # of page frees */
uint lockwait; /* # of sleeping lock_metapage() calls */
uint allocwait; /* # of sleeping alloc_metapage() calls */
} mpStat;
#endif
#define HASH_BITS 10 /* This makes hash_table one 4K page */
#define HASH_SIZE (1 << HASH_BITS)
static metapage_t **hash_table = NULL;
static unsigned long hash_order;
static inline int metapage_locked(struct metapage *mp)
{
return test_bit(META_locked, &mp->flag);
}
static inline int trylock_metapage(struct metapage *mp)
{
return test_and_set_bit(META_locked, &mp->flag);
}
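/*
 * Note: test_and_set_bit() returns the previous value of the bit, so
 * trylock_metapage() returns non-zero exactly when the metapage was
 * already locked, i.e. when the trylock failed - hence the
 * "if (trylock_metapage(mp))" slow paths below.
 */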
static inline void unlock_metapage(struct metapage *mp)
{
clear_bit(META_locked, &mp->flag);
wake_up(&mp->wait);
}
static void __lock_metapage(struct metapage *mp)
{
DECLARE_WAITQUEUE(wait, current);
INCREMENT(mpStat.lockwait);
add_wait_queue_exclusive(&mp->wait, &wait);
do {
set_current_state(TASK_UNINTERRUPTIBLE);
if (metapage_locked(mp)) {
spin_unlock(&meta_lock);
schedule();
spin_lock(&meta_lock);
}
} while (trylock_metapage(mp));
__set_current_state(TASK_RUNNING);
remove_wait_queue(&mp->wait, &wait);
}
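/*
 * Note on the wait loop above: the task is put on the wait queue and
 * set TASK_UNINTERRUPTIBLE before meta_lock is dropped, so a wake_up()
 * from unlock_metapage() arriving between the locked test and
 * schedule() simply makes schedule() return immediately - no wakeup
 * can be lost.
 */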
/* needs meta_lock */
static inline void lock_metapage(struct metapage *mp)
{
if (trylock_metapage(mp))
__lock_metapage(mp);
}
/* We're currently re-evaluating the method we use to write metadata
* pages. Currently, we have to make sure there no dirty buffer_heads
* hanging around after we free the metadata page, since the same
* physical disk blocks may be used in a different address space and we
* can't write old data over the good data.
*
* The best way to do this now is with block_invalidate_page. However,
* this is only available in the newer kernels and is not exported
* to modules. block_flushpage is the next best, but it too is not exported
* to modules.
*
* In a module, about the best we have is generic_buffer_fdatasync. This
* synchronously writes any dirty buffers. This is not optimal, but it will
* keep old dirty buffers from overwriting newer data.
*/
static inline void invalidate_page(metapage_t *mp)
{
#ifdef MODULE
generic_buffer_fdatasync(mp->mapping->host, mp->index, mp->index + 1);
#else
lock_page(mp->page);
block_flushpage(mp->page, 0);
UnlockPage(mp->page);
#endif
}
int __init metapage_init(void)
{
int i;
metapage_t *last = NULL;
metapage_t *mp;
/*
* Initialize wait queue
*/
init_waitqueue_head(&meta_wait);
/*
* Allocate the metapage structures
*/
for (meta_order = 0;
((PAGE_SIZE << meta_order) / sizeof(metapage_t)) < metapages;
meta_order++);
metapages = (PAGE_SIZE << meta_order) / sizeof(metapage_t);
jFYI(1, ("metapage_init: metapage size = %Zd, metapages = %d\n",
sizeof(metapage_t), metapages));
metapage_buf =
(metapage_t *) __get_free_pages(GFP_KERNEL, meta_order);
assert(metapage_buf);
memset(metapage_buf, 0, PAGE_SIZE << meta_order);
mp = metapage_buf;
for (i = 0; i < metapages; i++, mp++) {
mp->flag = 0;
set_bit(META_free, &mp->flag);
init_waitqueue_head(&mp->wait);
mp->hash_next = last;
last = mp;
}
meta_free_list = last;
free_metapages = metapages;
/*
* Now the hash list
*/
for (hash_order = 0;
((PAGE_SIZE << hash_order) / sizeof(void *)) < HASH_SIZE;
hash_order++);
hash_table =
(metapage_t **) __get_free_pages(GFP_KERNEL, hash_order);
assert(hash_table);
memset(hash_table, 0, PAGE_SIZE << hash_order);
return 0;
}
void metapage_exit(void)
{
free_pages((unsigned long) metapage_buf, meta_order);
free_pages((unsigned long) hash_table, hash_order);
metapage_buf = 0; /* This is a signal to the jfsIOwait thread */
}
/*
* Get metapage structure from freelist
*
* Caller holds meta_lock
*/
static metapage_t *alloc_metapage(int *dropped_lock)
{
metapage_t *new;
*dropped_lock = FALSE;
/*
* Reserve two metapages for the lazy commit thread. Otherwise
* we may deadlock with holders of metapages waiting for tlocks
* that lazy thread should be freeing.
*/
if ((free_metapages < 3) && (current != jfsCommitTask)) {
INCREMENT(mpStat.allocwait);
*dropped_lock = TRUE;
__SLEEP_COND(meta_wait, (free_metapages > 2),
spin_lock(&meta_lock), spin_unlock(&meta_lock));
}
assert(meta_free_list);
new = meta_free_list;
meta_free_list = new->hash_next;
free_metapages--;
return new;
}
/*
* Put metapage on freelist (holding meta_lock)
*/
static inline void __free_metapage(metapage_t * mp)
{
mp->flag = 0;
set_bit(META_free, &mp->flag);
mp->hash_next = meta_free_list;
meta_free_list = mp;
free_metapages++;
wake_up(&meta_wait);
}
/*
* Put metapage on freelist (not holding meta_lock)
*/
static inline void free_metapage(metapage_t * mp)
{
spin_lock(&meta_lock);
__free_metapage(mp);
spin_unlock(&meta_lock);
}
/*
* Basically same hash as in pagemap.h, but using our hash table
*/
static metapage_t **meta_hash(struct address_space *mapping,
unsigned long index)
{
#define i (((unsigned long)mapping)/ \
(sizeof(struct inode) & ~(sizeof(struct inode) -1 )))
#define s(x) ((x) + ((x) >> HASH_BITS))
return hash_table + (s(i + index) & (HASH_SIZE - 1));
#undef i
#undef s
}
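/*
 * Rough sketch of the hashing above: sizeof(struct inode) &
 * ~(sizeof(struct inode) - 1) isolates the lowest set bit of the
 * size, so the "i" macro divides the mapping pointer by a power of
 * two, stripping low bits that carry little information; s(x) then
 * folds the bits above HASH_BITS back into the low bits before
 * masking with HASH_SIZE - 1, so both the mapping and the index
 * influence the bucket chosen.
 */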
static metapage_t *search_hash(metapage_t ** hash_ptr,
struct address_space *mapping,
unsigned long index)
{
metapage_t *ptr;
for (ptr = *hash_ptr; ptr; ptr = ptr->hash_next) {
if ((ptr->mapping == mapping) && (ptr->index == index))
return ptr;
}
return NULL;
}
static void add_to_hash(metapage_t * mp, metapage_t ** hash_ptr)
{
if (*hash_ptr)
(*hash_ptr)->hash_prev = mp;
mp->hash_prev = NULL;
mp->hash_next = *hash_ptr;
*hash_ptr = mp;
list_add(&mp->inode_list, &JFS_IP(mp->mapping->host)->mp_list);
}
static void remove_from_hash(metapage_t * mp, metapage_t ** hash_ptr)
{
list_del(&mp->inode_list);
if (mp->hash_prev)
mp->hash_prev->hash_next = mp->hash_next;
else {
assert(*hash_ptr == mp);
*hash_ptr = mp->hash_next;
}
if (mp->hash_next)
mp->hash_next->hash_prev = mp->hash_prev;
}
/*
* Direct address space operations
*/
static int direct_get_block(struct inode *ip, sector_t lblock,
struct buffer_head *bh_result, int create)
{
if (create)
bh_result->b_state |= (1UL << BH_New);
map_bh(bh_result, ip->i_sb, lblock);
return 0;
}
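/*
 * Note: the direct address space maps a logical block to the same
 * physical block number (lblock -> lblock), since this mapping
 * addresses the raw volume itself rather than blocks within a file.
 */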
static int direct_writepage(struct page *page)
{
return block_write_full_page(page, direct_get_block);
}
static int direct_readpage(struct file *fp, struct page *page)
{
return block_read_full_page(page, direct_get_block);
}
static int direct_prepare_write(struct file *file, struct page *page,
unsigned from, unsigned to)
{
return block_prepare_write(page, from, to, direct_get_block);
}
static int direct_bmap(struct address_space *mapping, long block)
{
return generic_block_bmap(mapping, block, direct_get_block);
}
struct address_space_operations direct_aops = {
readpage: direct_readpage,
writepage: direct_writepage,
sync_page: block_sync_page,
prepare_write: direct_prepare_write,
commit_write: generic_commit_write,
bmap: direct_bmap,
};
metapage_t *__get_metapage(struct inode *inode,
unsigned long lblock, unsigned int size,
int absolute, unsigned long new)
{
int dropped_lock;
metapage_t **hash_ptr;
int l2BlocksPerPage;
int l2bsize;
struct address_space *mapping;
metapage_t *mp;
unsigned long page_index;
unsigned long page_offset;
jFYI(1, ("__get_metapage: inode = 0x%p, lblock = 0x%lx\n",
inode, lblock));
if (absolute)
mapping = JFS_SBI(inode->i_sb)->direct_mapping;
else
mapping = inode->i_mapping;
spin_lock(&meta_lock);
hash_ptr = meta_hash(mapping, lblock);
mp = search_hash(hash_ptr, mapping, lblock);
if (mp) {
page_found:
if (test_bit(META_discard, &mp->flag)) {
assert(new); /* It's okay to reuse a discarded page
* if we expect it to be empty
*/
clear_bit(META_discard, &mp->flag);
}
mp->count++;
jFYI(1, ("__get_metapage: found 0x%p, in hash\n", mp));
assert(mp->logical_size == size);
lock_metapage(mp);
spin_unlock(&meta_lock);
} else {
l2bsize = inode->i_sb->s_blocksize_bits;
l2BlocksPerPage = PAGE_CACHE_SHIFT - l2bsize;
page_index = lblock >> l2BlocksPerPage;
page_offset = (lblock - (page_index << l2BlocksPerPage)) <<
l2bsize;
if ((page_offset + size) > PAGE_SIZE) {
spin_unlock(&meta_lock);
jERROR(1, ("MetaData crosses page boundary!!\n"));
return NULL;
}
mp = alloc_metapage(&dropped_lock);
if (dropped_lock) {
/* alloc_metapage blocked, we need to search the hash
* again. (The goto is ugly, maybe we'll clean this
* up in the future.)
*/
metapage_t *mp2;
mp2 = search_hash(hash_ptr, mapping, lblock);
if (mp2) {
__free_metapage(mp);
mp = mp2;
goto page_found;
}
}
mp->flag = 0;
lock_metapage(mp);
if (absolute)
set_bit(META_absolute, &mp->flag);
mp->xflag = COMMIT_PAGE;
mp->count = 1;
atomic_set(&mp->nohomeok,0);
mp->mapping = mapping;
mp->index = lblock;
mp->page = 0;
mp->logical_size = size;
add_to_hash(mp, hash_ptr);
spin_unlock(&meta_lock);
if (new) {
jFYI(1,
("__get_metapage: Calling grab_cache_page\n"));
mp->page = grab_cache_page(mapping, page_index);
if (!mp->page) {
jERROR(1, ("grab_cache_page failed!\n"));
spin_lock(&meta_lock);
remove_from_hash(mp, hash_ptr);
__free_metapage(mp);
spin_unlock(&meta_lock);
return NULL;
} else
INCREMENT(mpStat.pagealloc);
} else {
jFYI(1,
("__get_metapage: Calling read_cache_page\n"));
mp->page =
read_cache_page(mapping, lblock,
(filler_t *) mapping->a_ops->
readpage, NULL);
if (IS_ERR(mp->page)) {
jERROR(1, ("read_cache_page failed!\n"));
spin_lock(&meta_lock);
remove_from_hash(mp, hash_ptr);
__free_metapage(mp);
spin_unlock(&meta_lock);
return NULL;
} else
INCREMENT(mpStat.pagealloc);
lock_page(mp->page);
}
mp->data = (void *) (kmap(mp->page) + page_offset);
}
jFYI(1, ("__get_metapage: returning = 0x%p\n", mp));
return mp;
}
void hold_metapage(metapage_t * mp, int force)
{
spin_lock(&meta_lock);
mp->count++;
if (force) {
ASSERT (!(test_bit(META_forced, &mp->flag)));
if (trylock_metapage(mp))
set_bit(META_forced, &mp->flag);
} else
lock_metapage(mp);
spin_unlock(&meta_lock);
}
static void __write_metapage(metapage_t * mp)
{
struct inode *ip = (struct inode *) mp->mapping->host;
unsigned long page_index;
unsigned long page_offset;
int rc;
int l2bsize = ip->i_sb->s_blocksize_bits;
int l2BlocksPerPage = PAGE_CACHE_SHIFT - l2bsize;
jFYI(1, ("__write_metapage: mp = 0x%p\n", mp));
if (test_bit(META_discard, &mp->flag)) {
/*
* This metadata is no longer valid
*/
clear_bit(META_dirty, &mp->flag);
return;
}
page_index = mp->page->index;
page_offset =
(mp->index - (page_index << l2BlocksPerPage)) << l2bsize;
rc = mp->mapping->a_ops->prepare_write(NULL, mp->page, page_offset,
page_offset +
mp->logical_size);
if (rc) {
jERROR(1, ("prepare_write return %d!\n", rc));
ClearPageUptodate(mp->page);
kunmap(mp->page);
clear_bit(META_dirty, &mp->flag);
return;
}
rc = mp->mapping->a_ops->commit_write(NULL, mp->page, page_offset,
page_offset +
mp->logical_size);
if (rc) {
jERROR(1, ("commit_write returned %d\n", rc));
}
clear_bit(META_dirty, &mp->flag);
jFYI(1, ("__write_metapage done\n"));
}
void release_metapage(metapage_t * mp)
{
log_t *log;
struct inode *ip;
jFYI(1,
("release_metapage: mp = 0x%p, flag = 0x%lx\n", mp,
mp->flag));
spin_lock(&meta_lock);
if (test_bit(META_forced, &mp->flag)) {
clear_bit(META_forced, &mp->flag);
mp->count--;
spin_unlock(&meta_lock);
return;
}
ip = (struct inode *) mp->mapping->host;
assert(mp->count);
if (--mp->count || atomic_read(&mp->nohomeok)) {
unlock_metapage(mp);
spin_unlock(&meta_lock);
} else {
remove_from_hash(mp, meta_hash(mp->mapping, mp->index));
spin_unlock(&meta_lock);
if (mp->page) {
kunmap(mp->page);
mp->data = 0;
if (test_bit(META_dirty, &mp->flag))
__write_metapage(mp);
UnlockPage(mp->page);
if (test_bit(META_sync, &mp->flag)) {
sync_metapage(mp);
clear_bit(META_sync, &mp->flag);
}
if (test_bit(META_discard, &mp->flag))
invalidate_page(mp);
page_cache_release(mp->page);
INCREMENT(mpStat.pagefree);
}
if (mp->lsn) {
/*
* Remove metapage from logsynclist.
*/
log = mp->log;
LOGSYNC_LOCK(log);
mp->log = 0;
mp->lsn = 0;
mp->clsn = 0;
log->count--;
list_del(&mp->synclist);
LOGSYNC_UNLOCK(log);
}
free_metapage(mp);
}
jFYI(1, ("release_metapage: done\n"));
}
void invalidate_metapages(struct inode *ip, unsigned long addr,
unsigned long len)
{
metapage_t **hash_ptr;
unsigned long lblock;
int l2BlocksPerPage = PAGE_CACHE_SHIFT - ip->i_sb->s_blocksize_bits;
struct address_space *mapping = ip->i_mapping;
metapage_t *mp;
#ifndef MODULE
struct page *page;
#endif
/*
* First, mark metapages to discard. They will eventually be
* released, but should not be written.
*/
for (lblock = addr; lblock < addr + len;
lblock += 1 << l2BlocksPerPage) {
hash_ptr = meta_hash(mapping, lblock);
spin_lock(&meta_lock);
mp = search_hash(hash_ptr, mapping, lblock);
if (mp) {
set_bit(META_discard, &mp->flag);
spin_unlock(&meta_lock);
/*
* If in the metapage cache, we've got the page locked
*/
#ifdef MODULE
UnlockPage(mp->page);
generic_buffer_fdatasync(mp->mapping->host, mp->index,
mp->index+1);
lock_page(mp->page);
#else
block_flushpage(mp->page, 0);
#endif
} else {
spin_unlock(&meta_lock);
#ifdef MODULE
generic_buffer_fdatasync(ip, lblock << l2BlocksPerPage,
(lblock + 1) << l2BlocksPerPage);
#else
page = find_lock_page(mapping,
lblock >> l2BlocksPerPage);
if (page) {
block_flushpage(page, 0);
UnlockPage(page);
}
#endif
}
}
}
void invalidate_inode_metapages(struct inode *inode)
{
struct list_head *ptr;
metapage_t *mp;
spin_lock(&meta_lock);
list_for_each(ptr, &JFS_IP(inode)->mp_list) {
mp = list_entry(ptr, metapage_t, inode_list);
clear_bit(META_dirty, &mp->flag);
set_bit(META_discard, &mp->flag);
kunmap(mp->page);
UnlockPage(mp->page);
page_cache_release(mp->page);
INCREMENT(mpStat.pagefree);
mp->data = 0;
mp->page = 0;
}
spin_unlock(&meta_lock);
truncate_inode_pages(inode->i_mapping, 0);
}
#ifdef CONFIG_JFS_STATISTICS
int jfs_mpstat_read(char *buffer, char **start, off_t offset, int length,
int *eof, void *data)
{
int len = 0;
off_t begin;
len += sprintf(buffer,
"JFS Metapage statistics\n"
"=======================\n"
"metapages in use = %d\n"
"page allocations = %d\n"
"page frees = %d\n"
"lock waits = %d\n"
"allocation waits = %d\n",
metapages - free_metapages,
mpStat.pagealloc,
mpStat.pagefree,
mpStat.lockwait,
mpStat.allocwait);
begin = offset;
*start = buffer + begin;
len -= begin;
if (len > length)
len = length;
else
*eof = 1;
if (len < 0)
len = 0;
return len;
}
#endif
/*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/
#ifndef _H_JFS_METAPAGE
#define _H_JFS_METAPAGE
#include <linux/pagemap.h>
typedef struct metapage {
/* Common logsyncblk prefix (see jfs_logmgr.h) */
u16 xflag;
u16 unused;
lid_t lid;
int lsn;
struct list_head synclist;
/* End of logsyncblk prefix */
unsigned long flag; /* See Below */
unsigned long count; /* Reference count */
void *data; /* Data pointer */
/* list management stuff */
struct metapage *hash_prev;
struct metapage *hash_next; /* Also used for free list */
struct list_head inode_list; /* per-inode metapage list */
/*
* mapping & index become redundant, but we need these here to
* add the metapage to the hash before we have the real page
*/
struct address_space *mapping;
unsigned long index;
wait_queue_head_t wait;
/* implementation */
struct page *page;
unsigned long logical_size;
/* Journal management */
int clsn;
atomic_t nohomeok;
struct jfs_log *log;
} metapage_t;
/*
* Direct-access address space operations
*/
extern struct address_space_operations direct_aops;
/* metapage flag */
#define META_locked 0
#define META_absolute 1
#define META_free 2
#define META_dirty 3
#define META_sync 4
#define META_discard 5
#define META_forced 6
#define mark_metapage_dirty(mp) set_bit(META_dirty, &(mp)->flag)
/* function prototypes */
extern metapage_t *__get_metapage(struct inode *inode,
unsigned long lblock, unsigned int size,
int absolute, unsigned long new);
#define read_metapage(inode, lblock, size, absolute)\
__get_metapage(inode, lblock, size, absolute, FALSE)
#define get_metapage(inode, lblock, size, absolute)\
__get_metapage(inode, lblock, size, absolute, TRUE)
extern void release_metapage(metapage_t *);
#define flush_metapage(mp) \
{\
set_bit(META_dirty, &(mp)->flag);\
set_bit(META_sync, &(mp)->flag);\
release_metapage(mp);\
}
#define sync_metapage(mp) \
generic_buffer_fdatasync((struct inode *)mp->mapping->host,\
mp->page->index, mp->page->index + 1)
#define write_metapage(mp) \
{\
set_bit(META_dirty, &(mp)->flag);\
release_metapage(mp);\
}
#define discard_metapage(mp) \
{\
clear_bit(META_dirty, &(mp)->flag);\
set_bit(META_discard, &(mp)->flag);\
release_metapage(mp);\
}
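/*
 * Usage sketch (illustrative only - "ip" and "blkno" stand for a
 * mapped inode and a logical block number, and are not defined here):
 *
 *	metapage_t *mp = read_metapage(ip, blkno, PSIZE, 0);
 *	if (mp) {
 *		... modify the buffer through mp->data ...
 *		write_metapage(mp);	marks META_dirty, then releases
 *	}
 *
 * get_metapage() is used instead of read_metapage() when the caller
 * will fully initialize the page and need not read it from disk;
 * flush_metapage() additionally forces a synchronous write at release.
 */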
extern void hold_metapage(metapage_t *, int);
/*
* This routine uses hash to explicitly find small number of pages
*/
extern void invalidate_metapages(struct inode *, unsigned long, unsigned long);
/*
* This one uses mp_list to invalidate all pages for an inode
*/
extern void invalidate_inode_metapages(struct inode *inode);
#endif /* _H_JFS_METAPAGE */
/*
* MODULE_NAME: jfs_mount.c
*
* COMPONENT_NAME: sysjfs
*
*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/
/*
* Change History :
*
*/
/*
* Module: jfs_mount.c
*
* note: file system in transition to aggregate/fileset:
*
* file system mount is interpreted as the mount of aggregate,
* if not already mounted, and mount of the single/only fileset in
* the aggregate;
*
* a file system/aggregate is represented by an internal inode
* (aka mount inode) initialized with aggregate superblock;
* each vfs represents a fileset, and points to its "fileset inode
* allocation map inode" (aka fileset inode):
 * (an aggregate itself is structured recursively as a fileset:
 * an internal vfs is constructed and points to its "fileset inode
 * allocation map inode" (aka aggregate inode) where each inode
 * represents a fileset inode) so that an inode number is mapped to
 * its on-disk inode in a uniform way at both aggregate and fileset level;
*
* each vnode/inode of a fileset is linked to its vfs (to facilitate
* per fileset inode operations, e.g., unmount of a fileset, etc.);
* each inode points to the mount inode (to facilitate access to
* per aggregate information, e.g., block size, etc.) as well as
* its file set inode.
*
* aggregate
* ipmnt
* mntvfs -> fileset ipimap+ -> aggregate ipbmap -> aggregate ipaimap;
* fileset vfs -> vp(1) <-> ... <-> vp(n) <->vproot;
*/
#include <linux/fs.h>
#include "jfs_incore.h"
#include "jfs_filsys.h"
#include "jfs_superblock.h"
#include "jfs_dmap.h"
#include "jfs_imap.h"
#include "jfs_metapage.h"
#include "jfs_debug.h"
/*
* forward references
*/
static int chkSuper(struct super_block *);
static int logMOUNT(struct super_block *sb);
/*
* NAME: jfs_mount(sb)
*
* FUNCTION: vfs_mount()
*
* PARAMETER: sb - super block
*
* RETURN: EBUSY - device already mounted or open for write
* EBUSY - cvrdvp already mounted;
* EBUSY - mount table full
* ENOTDIR - cvrdvp not directory on a device mount
* ENXIO - device open failure
*/
int jfs_mount(struct super_block *sb)
{
int rc = 0; /* Return code */
struct jfs_sb_info *sbi = JFS_SBI(sb);
struct inode *ipaimap = NULL;
struct inode *ipaimap2 = NULL;
struct inode *ipimap = NULL;
struct inode *ipbmap = NULL;
jFYI(1, ("\nMount JFS\n"));
/*
* read/validate superblock
* (initialize mount inode from the superblock)
*/
if ((rc = chkSuper(sb))) {
goto errout20;
}
ipaimap = diReadSpecial(sb, AGGREGATE_I);
if (ipaimap == NULL) {
jERROR(1, ("jfs_mount: Faild to read AGGREGATE_I\n"));
rc = EIO;
goto errout20;
}
sbi->ipaimap = ipaimap;
jFYI(1, ("jfs_mount: ipaimap:0x%p\n", ipaimap));
/*
* initialize aggregate inode allocation map
*/
if ((rc = diMount(ipaimap))) {
jERROR(1,
("jfs_mount: diMount(ipaimap) failed w/rc = %d\n",
rc));
goto errout21;
}
/*
* open aggregate block allocation map
*/
ipbmap = diReadSpecial(sb, BMAP_I);
if (ipbmap == NULL) {
rc = EIO;
goto errout22;
}
jFYI(1, ("jfs_mount: ipbmap:0x%p\n", ipbmap));
sbi->ipbmap = ipbmap;
/*
* initialize aggregate block allocation map
*/
if ((rc = dbMount(ipbmap))) {
jERROR(1, ("jfs_mount: dbMount failed w/rc = %d\n", rc));
goto errout22;
}
/*
* open the secondary aggregate inode allocation map
*
* This is a duplicate of the aggregate inode allocation map.
*
* hand craft a vfs in the same fashion as we did to read ipaimap.
* By adding INOSPEREXT (32) to the inode number, we are telling
* diReadSpecial that we are reading from the secondary aggregate
* inode table. This also creates a unique entry in the inode hash
* table.
*/
if ((sbi->mntflag & JFS_BAD_SAIT) == 0) {
ipaimap2 = diReadSpecial(sb, AGGREGATE_I + INOSPEREXT);
if (ipaimap2 == 0) {
jERROR(1,
("jfs_mount: Faild to read AGGREGATE_I\n"));
rc = EIO;
goto errout35;
}
sbi->ipaimap2 = ipaimap2;
jFYI(1, ("jfs_mount: ipaimap2:0x%p\n", ipaimap2));
/*
* initialize secondary aggregate inode allocation map
*/
if ((rc = diMount(ipaimap2))) {
jERROR(1,
("jfs_mount: diMount(ipaimap2) failed, rc = %d\n",
rc));
goto errout35;
}
} else
/* Secondary aggregate inode table is not valid */
sbi->ipaimap2 = 0;
/*
* mount (the only/single) fileset
*/
/*
* open fileset inode allocation map (aka fileset inode)
*/
ipimap = diReadSpecial(sb, FILESYSTEM_I);
if (ipimap == NULL) {
jERROR(1, ("jfs_mount: Failed to read FILESYSTEM_I\n"));
/* open fileset secondary inode allocation map */
rc = EIO;
goto errout40;
}
jFYI(1, ("jfs_mount: ipimap:0x%p\n", ipimap));
/* map further access of per fileset inodes by the fileset inode */
sbi->ipimap = ipimap;
/* initialize fileset inode allocation map */
if ((rc = diMount(ipimap))) {
jERROR(1, ("jfs_mount: diMount failed w/rc = %d\n", rc));
goto errout41;
}
jFYI(1, ("Mount JFS Complete.\n"));
goto out;
/*
* unwind on error
*/
//errout42: /* close fileset inode allocation map */
diUnmount(ipimap, 1);
errout41: /* close fileset inode allocation map inode */
diFreeSpecial(ipimap);
errout40: /* fileset closed */
/* close secondary aggregate inode allocation map */
if (ipaimap2) {
diUnmount(ipaimap2, 1);
diFreeSpecial(ipaimap2);
}
errout35:
/* close aggregate block allocation map */
dbUnmount(ipbmap, 1);
diFreeSpecial(ipbmap);
errout22: /* close aggregate inode allocation map */
diUnmount(ipaimap, 1);
errout21: /* close aggregate inodes */
diFreeSpecial(ipaimap);
errout20: /* aggregate closed */
out:
if (rc) {
jERROR(1, ("Mount JFS Failure: %d\n", rc));
}
return rc;
}
/*
* NAME: jfs_mount_rw(sb, remount)
*
* FUNCTION: Completes read-write mount, or remounts read-only volume
* as read-write
*/
int jfs_mount_rw(struct super_block *sb, int remount)
{
struct jfs_sb_info *sbi = JFS_SBI(sb);
log_t *log;
int rc;
/*
* If we are re-mounting a previously read-only volume, we want to
* re-read the inode and block maps, since fsck.jfs may have updated
* them.
*/
if (remount) {
if (chkSuper(sb) || (sbi->state != FM_CLEAN))
return -EINVAL;
truncate_inode_pages(sbi->ipimap->i_mapping, 0);
truncate_inode_pages(sbi->ipbmap->i_mapping, 0);
diUnmount(sbi->ipimap, 1);
if ((rc = diMount(sbi->ipimap))) {
jERROR(1,("jfs_mount_rw: diMount failed!\n"));
return rc;
}
dbUnmount(sbi->ipbmap, 1);
if ((rc = dbMount(sbi->ipbmap))) {
jERROR(1,("jfs_mount_rw: dbMount failed!\n"));
return rc;
}
}
#ifdef _STILL_TO_PORT
/*
* get log device associated with the fs being mounted;
*/
if (ipmnt->i_mntflag & JFS_INLINELOG) {
vfsp->vfs_logVPB = vfsp->vfs_hVPB;
vfsp->vfs_logvpfs = vfsp->vfs_vpfsi;
} else if (vfsp->vfs_logvpfs == NULL) {
/*
* XXX: there's only one external log per system;
*/
jERROR(1, ("jfs_mount: Mount Failure! No Log Device.\n"));
goto errout30;
}
logdev = vfsp->vfs_logvpfs->vpi_unit;
ipmnt->i_logdev = logdev;
#endif /* _STILL_TO_PORT */
/*
* open/initialize log
*/
if ((rc = lmLogOpen(sb, &log)))
return rc;
JFS_SBI(sb)->log = log;
/*
* update file system superblock;
*/
if ((rc = updateSuper(sb, FM_MOUNT))) {
jERROR(1,
("jfs_mount: updateSuper failed w/rc = %d\n", rc));
lmLogClose(sb, log);
JFS_SBI(sb)->log = 0;
return rc;
}
/*
* write MOUNT log record of the file system
*/
logMOUNT(sb);
return rc;
}
/*
* chkSuper()
*
* validate the superblock of the file system to be mounted and
* get the file system parameters.
*
* returns
* 0 with fragsize set if check successful
* error code if not successful
*/
static int chkSuper(struct super_block *sb)
{
int rc = 0;
metapage_t *mp;
struct jfs_sb_info *sbi = JFS_SBI(sb);
struct jfs_superblock *j_sb;
int AIM_bytesize, AIT_bytesize;
int expected_AIM_bytesize, expected_AIT_bytesize;
s64 AIM_byte_addr, AIT_byte_addr, fsckwsp_addr;
s64 byte_addr_diff0, byte_addr_diff1;
s32 bsize;
if ((rc = readSuper(sb, &mp)))
return rc;
j_sb = (struct jfs_superblock *) (mp->data);
/*
* validate superblock
*/
/* validate fs signature */
if (strncmp(j_sb->s_magic, JFS_MAGIC, 4) ||
j_sb->s_version != cpu_to_le32(JFS_VERSION)) {
//rc = EFORMAT;
rc = EINVAL;
goto out;
}
bsize = le32_to_cpu(j_sb->s_bsize);
#ifdef _JFS_4K
if (bsize != PSIZE) {
jERROR(1, ("Currently only 4K block size supported!\n"));
rc = EINVAL;
goto out;
}
#endif /* _JFS_4K */
jFYI(1, ("superblock: flag:0x%08x state:0x%08x size:0x%Lx\n",
le32_to_cpu(j_sb->s_flag), le32_to_cpu(j_sb->s_state),
(unsigned long long) le64_to_cpu(j_sb->s_size)));
/* validate the descriptors for Secondary AIM and AIT */
if ((j_sb->s_flag & cpu_to_le32(JFS_BAD_SAIT)) !=
cpu_to_le32(JFS_BAD_SAIT)) {
expected_AIM_bytesize = 2 * PSIZE;
AIM_bytesize = lengthPXD(&(j_sb->s_aim2)) * bsize;
expected_AIT_bytesize = 4 * PSIZE;
AIT_bytesize = lengthPXD(&(j_sb->s_ait2)) * bsize;
AIM_byte_addr = addressPXD(&(j_sb->s_aim2)) * bsize;
AIT_byte_addr = addressPXD(&(j_sb->s_ait2)) * bsize;
byte_addr_diff0 = AIT_byte_addr - AIM_byte_addr;
fsckwsp_addr = addressPXD(&(j_sb->s_fsckpxd)) * bsize;
byte_addr_diff1 = fsckwsp_addr - AIT_byte_addr;
if ((AIM_bytesize != expected_AIM_bytesize) ||
(AIT_bytesize != expected_AIT_bytesize) ||
(byte_addr_diff0 != AIM_bytesize) ||
(byte_addr_diff1 <= AIT_bytesize))
j_sb->s_flag |= cpu_to_le32(JFS_BAD_SAIT);
}
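/*
 * In other words (illustrative summary): the secondary AIM must be
 * exactly 2 pages and the secondary AIT exactly 4 pages, the AIT
 * must start immediately after the AIM, and the fsck work space
 * must lie beyond the end of the AIT; otherwise the secondary
 * copies are marked bad.
 */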
/* in release 1, the flag MUST reflect inline log, and group commit */
if ((j_sb->s_flag & cpu_to_le32(JFS_INLINELOG)) !=
cpu_to_le32(JFS_INLINELOG))
j_sb->s_flag |= cpu_to_le32(JFS_INLINELOG);
if ((j_sb->s_flag & cpu_to_le32(JFS_GROUPCOMMIT)) !=
cpu_to_le32(JFS_GROUPCOMMIT))
j_sb->s_flag |= cpu_to_le32(JFS_GROUPCOMMIT);
jFYI(0, ("superblock: flag:0x%08x state:0x%08x size:0x%Lx\n",
le32_to_cpu(j_sb->s_flag), le32_to_cpu(j_sb->s_state),
(unsigned long long) le64_to_cpu(j_sb->s_size)));
/* validate fs state */
if (j_sb->s_state != cpu_to_le32(FM_CLEAN) &&
!(sb->s_flags & MS_RDONLY)) {
jERROR(1,
("jfs_mount: Mount Failure: File System Dirty.\n"));
rc = EINVAL;
goto out;
}
sbi->state = le32_to_cpu(j_sb->s_state);
sbi->mntflag = le32_to_cpu(j_sb->s_flag);
/*
* JFS always does I/O by 4K pages. Don't tell the buffer cache
* that we use anything else (leave s_blocksize alone).
*/
sbi->bsize = bsize;
sbi->l2bsize = le16_to_cpu(j_sb->s_l2bsize);
/*
* For now, ignore s_pbsize, l2bfactor. All I/O going through buffer
* cache.
*/
sbi->nbperpage = PSIZE >> sbi->l2bsize;
sbi->l2nbperpage = L2PSIZE - sbi->l2bsize;
sbi->l2niperblk = sbi->l2bsize - L2DISIZE;
if (sbi->mntflag & JFS_INLINELOG)
sbi->logpxd = j_sb->s_logpxd;
sbi->ait2 = j_sb->s_ait2;
out:
release_metapage(mp);
return rc;
}
/*
* updateSuper()
*
 * synchronously update the superblock if the file system is mounted read-write.
*/
int updateSuper(struct super_block *sb, uint state)
{
int rc;
metapage_t *mp;
struct jfs_superblock *j_sb;
/*
* Only fsck can fix dirty state
*/
if (JFS_SBI(sb)->state == FM_DIRTY)
return 0;
if ((rc = readSuper(sb, &mp)))
return rc;
j_sb = (struct jfs_superblock *) (mp->data);
j_sb->s_state = cpu_to_le32(state);
JFS_SBI(sb)->state = state;
if (state == FM_MOUNT) {
/* record log's dev_t and mount serial number */
j_sb->s_logdev =
cpu_to_le32(kdev_t_to_nr(JFS_SBI(sb)->log->dev));
j_sb->s_logserial = cpu_to_le32(JFS_SBI(sb)->log->serial);
} else if (state == FM_CLEAN) {
/*
* If this volume is shared with OS/2, OS/2 will need to
* recalculate DASD usage, since we don't deal with it.
*/
if (j_sb->s_flag & cpu_to_le32(JFS_DASD_ENABLED))
j_sb->s_flag |= cpu_to_le32(JFS_DASD_PRIME);
}
flush_metapage(mp);
return 0;
}
/*
* readSuper()
*
* read superblock by raw sector address
*/
int readSuper(struct super_block *sb, metapage_t ** mpp)
{
/* read in primary superblock */
*mpp = read_metapage(JFS_SBI(sb)->direct_inode,
SUPER1_OFF >> sb->s_blocksize_bits, PSIZE, 1);
if (*mpp == NULL) {
/* read in secondary/replicated superblock */
*mpp = read_metapage(JFS_SBI(sb)->direct_inode,
SUPER2_OFF >> sb->s_blocksize_bits,
PSIZE, 1);
}
return *mpp ? 0 : 1;
}
/*
* logMOUNT()
*
* function: write a MOUNT log record for file system.
*
* MOUNT record keeps logredo() from processing log records
* for this file system past this point in log.
* it is harmless if mount fails.
*
* note: MOUNT record is at aggregate level, not at fileset level,
* since log records of previous mounts of a fileset
* (e.g., AFTER record of extent allocation) have to be processed
* to update block allocation map at aggregate level.
*/
static int logMOUNT(struct super_block *sb)
{
log_t *log = JFS_SBI(sb)->log;
lrd_t lrd;
lrd.logtid = 0;
lrd.backchain = 0;
lrd.type = cpu_to_le16(LOG_MOUNT);
lrd.length = 0;
lrd.aggregate = cpu_to_le32(kdev_t_to_nr(sb->s_dev));
lmLog(log, NULL, &lrd, NULL);
return 0;
}
/*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*
*/
#ifndef _H_JFS_SUPERBLOCK
#define _H_JFS_SUPERBLOCK
/*
* jfs_superblock.h
*/
/*
* make the magic number something a human could read
*/
#define JFS_MAGIC "JFS1" /* Magic word: Version 1 */
#define JFS_VERSION 1 /* Version number: Version 1 */
#define LV_NAME_SIZE 11 /* MUST BE 11 for OS/2 boot sector */
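/*
 * A minimal validation sketch (assumed caller; the real check is done
 * by the mount-time superblock code):
 *
 *	j_sb = (struct jfs_superblock *) mp->data;
 *	if (strncmp(j_sb->s_magic, JFS_MAGIC, 4) ||
 *	    le32_to_cpu(j_sb->s_version) > JFS_VERSION)
 *		return EINVAL;
 */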
/*
* aggregate superblock
*
* The name superblock is too close to super_block, so the name has been
* changed to jfs_superblock. The utilities are still using the old name.
*/
struct jfs_superblock {
char s_magic[4]; /* 4: magic number */
u32 s_version; /* 4: version number */
s64 s_size; /* 8: aggregate size in hardware/LVM blocks;
* VFS: number of blocks
*/
s32 s_bsize; /* 4: aggregate block size in bytes;
* VFS: fragment size
*/
s16 s_l2bsize; /* 2: log2 of s_bsize */
s16 s_l2bfactor; /* 2: log2(s_bsize/hardware block size) */
s32 s_pbsize; /* 4: hardware/LVM block size in bytes */
s16 s_l2pbsize; /* 2: log2 of s_pbsize */
s16 pad; /* 2: padding necessary for alignment */
u32 s_agsize; /* 4: allocation group size in aggr. blocks */
u32 s_flag; /* 4: aggregate attributes:
* see jfs_filsys.h
*/
u32 s_state; /* 4: mount/unmount/recovery state:
* see jfs_filsys.h
*/
s32 s_compress; /* 4: > 0 if data compression */
pxd_t s_ait2; /* 8: first extent of secondary
* aggregate inode table
*/
pxd_t s_aim2; /* 8: first extent of secondary
* aggregate inode map
*/
u32 s_logdev; /* 4: device address of log */
s32 s_logserial; /* 4: log serial number at aggregate mount */
pxd_t s_logpxd; /* 8: inline log extent */
pxd_t s_fsckpxd; /* 8: inline fsck work space extent */
struct timestruc_t s_time; /* 8: time last updated */
s32 s_fsckloglen; /* 4: Number of filesystem blocks reserved for
* the fsck service log.
* N.B. These blocks are divided among the
* versions kept. This is not a per
* version size.
* N.B. These blocks are included in the
* length field of s_fsckpxd.
*/
s8 s_fscklog; /* 1: which fsck service log is most recent
* 0 => no service log data yet
* 1 => the first one
* 2 => the 2nd one
*/
char s_fpack[11]; /* 11: file system volume name
* N.B. This must be 11 bytes to
* conform with the OS/2 BootSector
* requirements
*/
/* extendfs() parameter under s_state & FM_EXTENDFS */
s64 s_xsize; /* 8: extendfs s_size */
pxd_t s_xfsckpxd; /* 8: extendfs fsckpxd */
pxd_t s_xlogpxd; /* 8: extendfs logpxd */
/* - 128 byte boundary - */
/*
* DFS VFS support (preliminary)
*/
char s_attach; /* 1: VFS: flag: set when aggregate is attached
*/
u8 rsrvd4[7]; /* 7: reserved - set to 0 */
u64 totalUsable; /* 8: VFS: total of 1K blocks which are
* available to "normal" (non-root) users.
*/
u64 minFree; /* 8: VFS: # of 1K blocks held in reserve for
* exclusive use of root. This value can be 0,
* and if it is then totalUsable will be equal
* to # of blocks in aggregate. I believe this
* means that minFree + totalUsable = # blocks.
* In that case, we don't need to store both
* totalUsable and minFree since we can compute
* one from the other. I would guess minFree
* would be the one we should store, and
* totalUsable would be the one we should
* compute. (Just a guess...)
*/
u64 realFree; /* 8: VFS: # of free 1K blocks can be used by
* "normal" users. It may be this is something
* we should compute when asked for instead of
* storing in the superblock. I don't know how
* often this information is needed.
*/
/*
* graffiti area
*/
};
extern int readSuper(struct super_block *, struct metapage **);
extern int updateSuper(struct super_block *, uint);
#endif /*_H_JFS_SUPERBLOCK */
/*
*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/
/*
* jfs_txnmgr.c: transaction manager
*
* notes:
* transaction starts with txBegin() and ends with txCommit()
* or txAbort().
*
* tlock is acquired at the time of update;
* (obviate scan at commit time for xtree and dtree)
* tlock and mp points to each other;
* (no hashlist for mp -> tlock).
*
* special cases:
* tlock on in-memory inode:
* in-place tlock in the in-memory inode itself;
* converted to page lock by iWrite() at commit time.
*
* tlock during write()/mmap() under anonymous transaction (tid = 0):
* transferred (?) to transaction at commit time.
*
* use the page itself to update allocation maps
* (obviate intermediate replication of allocation/deallocation data)
* hold on to mp+lock thru update of maps
*/
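/*
 * illustrative lifecycle (a sketch only; the real callers are the
 * jfs_* operations elsewhere in the tree):
 *
 *	tid = txBegin(ip->i_sb, 0);
 *	tlck = txLock(tid, ip, mp, tlckXTREE | tlckGROW);
 *	... modify the page covered by tlck ...
 *	rc = txCommit(tid, 1, &ip, 0);
 *	txEnd(tid);
 */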
#include <linux/fs.h>
#include <linux/locks.h>
#include <linux/vmalloc.h>
#include <linux/smp_lock.h>
#include "jfs_incore.h"
#include "jfs_filsys.h"
#include "jfs_metapage.h"
#include "jfs_dinode.h"
#include "jfs_imap.h"
#include "jfs_dmap.h"
#include "jfs_superblock.h"
#include "jfs_debug.h"
/*
* transaction management structures
*/
static struct {
/* tblock */
int freetid; /* 4: index of a free tid structure */
wait_queue_head_t freewait; /* 4: eventlist of free tblock */
/* tlock */
int freelock; /* 4: index first free lock word */
wait_queue_head_t freelockwait; /* 4: eventlist of free tlock */
wait_queue_head_t lowlockwait; /* 4: eventlist of ample tlocks */
int tlocksInUse; /* 4: Number of tlocks in use */
spinlock_t LazyLock; /* 4: synchronize sync_queue & unlock_queue */
/* tblock_t *sync_queue; * 4: Transactions waiting for data sync */
tblock_t *unlock_queue; /* 4: Transactions waiting to be released */
tblock_t *unlock_tail; /* 4: Tail of unlock_queue */
struct list_head anon_list; /* inodes having anonymous txns */
struct list_head anon_list2; /* inodes having anonymous txns
that couldn't be sync'ed */
} TxAnchor;
static int nTxBlock = 512; /* number of transaction blocks */
struct tblock *TxBlock; /* transaction block table */
static int nTxLock = 2048; /* number of transaction locks */
static int TxLockLWM = 2048*.4; /* Low water mark for number of txLocks used */
static int TxLockHWM = 2048*.8; /* High water mark for number of txLocks used */
struct tlock *TxLock; /* transaction lock table */
static int TlocksLow = 0; /* Indicates low number of available tlocks */
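/*
 * With the defaults above the water marks work out to 2048 * .4 == 819
 * and 2048 * .8 == 1638 tlocks in use (the floating-point products are
 * truncated by the int initializers).
 */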
/*
* transaction management lock
*/
static spinlock_t jfsTxnLock = SPIN_LOCK_UNLOCKED;
#define TXN_LOCK() spin_lock(&jfsTxnLock)
#define TXN_UNLOCK() spin_unlock(&jfsTxnLock)
#define LAZY_LOCK_INIT() spin_lock_init(&TxAnchor.LazyLock);
#define LAZY_LOCK(flags) spin_lock_irqsave(&TxAnchor.LazyLock, flags)
#define LAZY_UNLOCK(flags) spin_unlock_irqrestore(&TxAnchor.LazyLock, flags)
/*
* Retry logic exists outside these macros to protect against spurious wakeups.
*/
static inline void TXN_SLEEP_DROP_LOCK(wait_queue_head_t * event)
{
DECLARE_WAITQUEUE(wait, current);
add_wait_queue(event, &wait);
set_current_state(TASK_UNINTERRUPTIBLE);
TXN_UNLOCK();
schedule();
current->state = TASK_RUNNING;
remove_wait_queue(event, &wait);
}
#define TXN_SLEEP(event)\
{\
TXN_SLEEP_DROP_LOCK(event);\
TXN_LOCK();\
}
#define TXN_WAKEUP(event) wake_up_all(event)
/*
* statistics
*/
struct {
tid_t maxtid; /* 4: biggest tid ever used */
lid_t maxlid; /* 4: biggest lid ever used */
int ntid; /* 4: # of transactions performed */
int nlid; /* 4: # of tlocks acquired */
int waitlock; /* 4: # of tlock wait */
} stattx;
/*
* external references
*/
extern int lmGroupCommit(log_t * log, tblock_t * tblk);
extern void lmSync(log_t *);
extern int readSuper(struct super_block *sb, metapage_t ** bpp);
extern int jfs_commit_inode(struct inode *, int);
extern int jfs_thread_stopped(void);
extern struct task_struct *jfsCommitTask;
extern struct completion jfsIOwait;
extern struct task_struct *jfsSyncTask;
/*
* forward references
*/
int diLog(log_t * log, tblock_t * tblk, lrd_t * lrd, tlock_t * tlck,
commit_t * cd);
int dataLog(log_t * log, tblock_t * tblk, lrd_t * lrd, tlock_t * tlck);
void dtLog(log_t * log, tblock_t * tblk, lrd_t * lrd, tlock_t * tlck);
void inlineLog(log_t * log, tblock_t * tblk, lrd_t * lrd, tlock_t * tlck);
void mapLog(log_t * log, tblock_t * tblk, lrd_t * lrd, tlock_t * tlck);
void txAbortCommit(commit_t * cd, int exval);
static void txAllocPMap(struct inode *ip, maplock_t * maplock,
tblock_t * tblk);
void txForce(tblock_t * tblk);
static int txLog(log_t * log, tblock_t * tblk, commit_t * cd);
int txMoreLock(void);
static void txUpdateMap(tblock_t * tblk);
static void txRelease(tblock_t * tblk);
void xtLog(log_t * log, tblock_t * tblk, lrd_t * lrd, tlock_t * tlck);
static void LogSyncRelease(metapage_t * mp);
/*
* transaction block/lock management
* ---------------------------------
*/
/*
* Get a transaction lock from the free list. If the number in use is
* greater than the high water mark, wake up the sync daemon. This should
* free some anonymous transaction locks. (TXN_LOCK must be held.)
*/
static lid_t txLockAlloc(void)
{
lid_t lid;
while (!(lid = TxAnchor.freelock))
TXN_SLEEP(&TxAnchor.freelockwait);
TxAnchor.freelock = TxLock[lid].next;
HIGHWATERMARK(stattx.maxlid, lid);
if ((++TxAnchor.tlocksInUse > TxLockHWM) && (TlocksLow == 0)) {
jEVENT(0,("txLockAlloc TlocksLow\n"));
TlocksLow = 1;
wake_up_process(jfsSyncTask);
}
return lid;
}
static void txLockFree(lid_t lid)
{
TxLock[lid].next = TxAnchor.freelock;
TxAnchor.freelock = lid;
TxAnchor.tlocksInUse--;
if (TlocksLow && (TxAnchor.tlocksInUse < TxLockLWM)) {
jEVENT(0,("txLockFree TlocksLow no more\n"));
TlocksLow = 0;
TXN_WAKEUP(&TxAnchor.lowlockwait);
}
TXN_WAKEUP(&TxAnchor.freelockwait);
}
/*
* NAME: txInit()
*
* FUNCTION: initialize transaction management structures
*
* RETURN:
*
* serialization: single thread at jfs_init()
*/
int txInit(void)
{
int k, size;
/*
* initialize transaction block (tblock) table
*
* transaction id (tid) = tblock index
* tid = 0 is reserved.
*/
size = sizeof(tblock_t) * nTxBlock;
TxBlock = (tblock_t *) vmalloc(size);
if (TxBlock == NULL)
return ENOMEM;
for (k = 1; k < nTxBlock - 1; k++) {
TxBlock[k].next = k + 1;
init_waitqueue_head(&TxBlock[k].gcwait);
init_waitqueue_head(&TxBlock[k].waitor);
}
TxBlock[k].next = 0;
init_waitqueue_head(&TxBlock[k].gcwait);
init_waitqueue_head(&TxBlock[k].waitor);
TxAnchor.freetid = 1;
init_waitqueue_head(&TxAnchor.freewait);
stattx.maxtid = 1; /* statistics */
/*
* initialize transaction lock (tlock) table
*
* transaction lock id = tlock index
* tlock id = 0 is reserved.
*/
size = sizeof(tlock_t) * nTxLock;
TxLock = (tlock_t *) vmalloc(size);
if (TxLock == NULL) {
vfree(TxBlock);
return ENOMEM;
}
/* initialize tlock table */
for (k = 1; k < nTxLock - 1; k++)
TxLock[k].next = k + 1;
TxLock[k].next = 0;
init_waitqueue_head(&TxAnchor.freelockwait);
init_waitqueue_head(&TxAnchor.lowlockwait);
TxAnchor.freelock = 1;
TxAnchor.tlocksInUse = 0;
INIT_LIST_HEAD(&TxAnchor.anon_list);
INIT_LIST_HEAD(&TxAnchor.anon_list2);
stattx.maxlid = 1; /* statistics */
return 0;
}
/*
* NAME: txExit()
*
* FUNCTION: clean up when module is unloaded
*/
void txExit(void)
{
vfree(TxLock);
TxLock = 0;
vfree(TxBlock);
TxBlock = 0;
}
/*
* NAME: txBegin()
*
* FUNCTION: start a transaction.
*
* PARAMETER: sb - superblock
* flag - force for nested tx;
*
* RETURN: tid - transaction id
*
* note: the force flag allows a tx to be started for a nested tx
* to prevent deadlock on the logsync barrier;
*/
tid_t txBegin(struct super_block *sb, int flag)
{
tid_t t;
tblock_t *tblk;
log_t *log;
jFYI(1, ("txBegin: flag = 0x%x\n", flag));
log = (log_t *) JFS_SBI(sb)->log;
TXN_LOCK();
retry:
if (flag != COMMIT_FORCE) {
/*
* synchronize with logsync barrier
*/
if (log->syncbarrier) {
TXN_SLEEP(&log->syncwait);
goto retry;
}
}
if (flag == 0) {
/*
* Don't begin transaction if we're getting starved for tlocks
* unless COMMIT_FORCE (imap changes) or COMMIT_INODE (which
* may ultimately free tlocks)
*/
if (TlocksLow) {
TXN_SLEEP(&TxAnchor.lowlockwait);
goto retry;
}
}
/*
* allocate transaction id/block
*/
if ((t = TxAnchor.freetid) == 0) {
jFYI(1, ("txBegin: waiting for free tid\n"));
TXN_SLEEP(&TxAnchor.freewait);
goto retry;
}
tblk = tid_to_tblock(t);
if ((tblk->next == 0) && (current != jfsCommitTask)) {
/* Save one tblk for jfsCommit thread */
jFYI(1, ("txBegin: waiting for free tid\n"));
TXN_SLEEP(&TxAnchor.freewait);
goto retry;
}
TxAnchor.freetid = tblk->next;
/*
* initialize transaction
*/
/*
* We can't zero the whole thing or we screw up another thread being
* awakened after sleeping on tblk->waitor
*
* memset(tblk, 0, sizeof(tblock_t));
*/
tblk->next = tblk->last = tblk->xflag = tblk->flag = tblk->lsn = 0;
tblk->sb = sb;
++log->logtid;
tblk->logtid = log->logtid;
++log->active;
HIGHWATERMARK(stattx.maxtid, t); /* statistics */
INCREMENT(stattx.ntid); /* statistics */
TXN_UNLOCK();
jFYI(1, ("txBegin: returning tid = %d\n", t));
return t;
}
/*
* NAME: txBeginAnon()
*
* FUNCTION: start an anonymous transaction.
* Blocks if the logsync barrier is in effect or available tlocks
* are low, to prevent anonymous tlocks from depleting the supply.
*
* PARAMETER: sb - superblock
*
* RETURN: none
*/
void txBeginAnon(struct super_block *sb)
{
log_t *log;
log = (log_t *) JFS_SBI(sb)->log;
TXN_LOCK();
retry:
/*
* synchronize with logsync barrier
*/
if (log->syncbarrier) {
TXN_SLEEP(&log->syncwait);
goto retry;
}
/*
* Don't begin transaction if we're getting starved for tlocks
*/
if (TlocksLow) {
TXN_SLEEP(&TxAnchor.lowlockwait);
goto retry;
}
TXN_UNLOCK();
}
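/*
 * illustrative write-path sketch (assumed caller; the real users are
 * the write/mmap paths elsewhere in the tree):
 *
 *	txBeginAnon(ip->i_sb);
 *	tlck = txLock(0, ip, mp, tlckXTREE | tlckGROW);
 *
 * the tlock stays on the inode's anonymous tlock list until the next
 * txCommit() on the inode adopts it.
 */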
/*
* txEnd()
*
* function: free specified transaction block.
*
* logsync barrier processing:
*
* serialization:
*/
void txEnd(tid_t tid)
{
tblock_t *tblk = tid_to_tblock(tid);
log_t *log;
jFYI(1, ("txEnd: tid = %d\n", tid));
TXN_LOCK();
/*
* wakeup transactions waiting on the page locked
* by the current transaction
*/
TXN_WAKEUP(&tblk->waitor);
log = (log_t *) JFS_SBI(tblk->sb)->log;
/*
* Lazy commit thread can't free this guy until we mark it UNLOCKED;
* otherwise, we would be left with a transaction that may have been
* reused.
*
* Lazy commit thread will turn off tblkGC_LAZY before calling this
* routine.
*/
if (tblk->flag & tblkGC_LAZY) {
jFYI(1,
("txEnd called w/lazy tid: %d, tblk = 0x%p\n",
tid, tblk));
TXN_UNLOCK();
spin_lock_irq(&log->gclock); // LOGGC_LOCK
tblk->flag |= tblkGC_UNLOCKED;
spin_unlock_irq(&log->gclock); // LOGGC_UNLOCK
return;
}
jFYI(1, ("txEnd: tid: %d, tblk = 0x%p\n", tid, tblk));
assert(tblk->next == 0);
/*
* insert tblock back on freelist
*/
tblk->next = TxAnchor.freetid;
TxAnchor.freetid = tid;
/*
* mark the tblock not active
*/
--log->active;
/*
* synchronize with logsync barrier
*/
if (log->syncbarrier && log->active == 0) {
/* forward log syncpt */
/* lmSync(log); */
jFYI(1, (" log barrier off: 0x%x\n", log->lsn));
/* enable new transactions start */
log->syncbarrier = 0;
/* wakeup all waitors for logsync barrier */
TXN_WAKEUP(&log->syncwait);
}
/*
* wakeup all waitors for a free tblock
*/
TXN_WAKEUP(&TxAnchor.freewait);
TXN_UNLOCK();
jFYI(1, ("txEnd: exitting\n"));
}
/*
* txLock()
*
* function: acquire a transaction lock on the specified <mp>
*
* parameter:
*
* return: transaction lock id
*
* serialization:
*/
tlock_t *txLock(tid_t tid, struct inode *ip, metapage_t * mp, int type)
{
struct jfs_inode_info *jfs_ip = JFS_IP(ip);
int dir_xtree = 0;
lid_t lid;
tid_t xtid;
tlock_t *tlck;
xtlock_t *xtlck;
linelock_t *linelock;
xtpage_t *p;
tblock_t *tblk;
TXN_LOCK();
if (S_ISDIR(ip->i_mode) && (type & tlckXTREE) &&
!(mp->xflag & COMMIT_PAGE)) {
/*
* Directory inode is special. It can have both an xtree tlock
* and a dtree tlock associated with it.
*/
dir_xtree = 1;
lid = jfs_ip->xtlid;
} else
lid = mp->lid;
/* is page not locked by a transaction ? */
if (lid == 0)
goto allocateLock;
jFYI(1, ("txLock: tid:%d ip:0x%p mp:0x%p lid:%d\n",
tid, ip, mp, lid));
/* is page locked by the requester transaction ? */
tlck = lid_to_tlock(lid);
if ((xtid = tlck->tid) == tid)
goto grantLock;
/*
* is page locked by anonymous transaction/lock ?
*
* (page update without transaction (i.e., file write) is
* locked under anonymous transaction tid = 0:
* anonymous tlocks maintained on anonymous tlock list of
* the inode of the page and available to all anonymous
* transactions until txCommit() time at which point
* they are transferred to the transaction tlock list of
* the committing transaction of the inode)
*/
if (xtid == 0) {
tlck->tid = tid;
tblk = tid_to_tblock(tid);
/*
* The order of the tlocks in the transaction is important
* (during truncate, child xtree pages must be freed before
* parent's tlocks change the working map).
* Take tlock off anonymous list and add to tail of
* transaction list
*
* Note: We really need to get rid of the tid & lid and
* use list_head's. This code is getting UGLY!
*/
if (jfs_ip->atlhead == lid) {
if (jfs_ip->atltail == lid) {
/* only anonymous txn.
* Remove from anon_list
*/
list_del_init(&jfs_ip->anon_inode_list);
}
jfs_ip->atlhead = tlck->next;
} else {
lid_t last;
for (last = jfs_ip->atlhead;
lid_to_tlock(last)->next != lid;
last = lid_to_tlock(last)->next) {
assert(last);
}
lid_to_tlock(last)->next = tlck->next;
if (jfs_ip->atltail == lid)
jfs_ip->atltail = last;
}
/* insert the tlock at tail of transaction tlock list */
if (tblk->next)
lid_to_tlock(tblk->last)->next = lid;
else
tblk->next = lid;
tlck->next = 0;
tblk->last = lid;
goto grantLock;
}
goto waitLock;
/*
* allocate a tlock
*/
allocateLock:
lid = txLockAlloc();
tlck = lid_to_tlock(lid);
/*
* initialize tlock
*/
tlck->tid = tid;
/* mark tlock for meta-data page */
if (mp->xflag & COMMIT_PAGE) {
tlck->flag = tlckPAGELOCK;
/* mark the page dirty and nohomeok */
mark_metapage_dirty(mp);
atomic_inc(&mp->nohomeok);
jFYI(1,
("locking mp = 0x%p, nohomeok = %d tid = %d tlck = 0x%p\n",
mp, atomic_read(&mp->nohomeok), tid, tlck));
/* if anonymous transaction, and buffer is on the group
* commit synclist, mark inode to show this. This will
* prevent the buffer from being marked nohomeok for too
* long a time.
*/
if ((tid == 0) && mp->lsn)
set_cflag(COMMIT_Synclist, ip);
}
/* mark tlock for in-memory inode */
else
tlck->flag = tlckINODELOCK;
tlck->type = 0;
/* bind the tlock and the page */
tlck->ip = ip;
tlck->mp = mp;
if (dir_xtree)
jfs_ip->xtlid = lid;
else
mp->lid = lid;
/*
* enqueue transaction lock to transaction/inode
*/
/* insert the tlock at tail of transaction tlock list */
if (tid) {
tblk = tid_to_tblock(tid);
if (tblk->next)
lid_to_tlock(tblk->last)->next = lid;
else
tblk->next = lid;
tlck->next = 0;
tblk->last = lid;
}
/* anonymous transaction:
* insert the tlock at head of inode anonymous tlock list
*/
else {
tlck->next = jfs_ip->atlhead;
jfs_ip->atlhead = lid;
if (tlck->next == 0) {
/* This inode's first anonymous transaction */
jfs_ip->atltail = lid;
list_add_tail(&jfs_ip->anon_inode_list,
&TxAnchor.anon_list);
}
}
/* initialize type dependent area for linelock */
linelock = (linelock_t *) & tlck->lock;
linelock->next = 0;
linelock->flag = tlckLINELOCK;
linelock->maxcnt = TLOCKSHORT;
linelock->index = 0;
switch (type & tlckTYPE) {
case tlckDTREE:
linelock->l2linesize = L2DTSLOTSIZE;
break;
case tlckXTREE:
linelock->l2linesize = L2XTSLOTSIZE;
xtlck = (xtlock_t *) linelock;
xtlck->header.offset = 0;
xtlck->header.length = 2;
if (type & tlckNEW) {
xtlck->lwm.offset = XTENTRYSTART;
} else {
if (mp->xflag & COMMIT_PAGE)
p = (xtpage_t *) mp->data;
else
p = &jfs_ip->i_xtroot;
xtlck->lwm.offset =
le16_to_cpu(p->header.nextindex);
}
xtlck->lwm.length = 0; /* ! */
xtlck->index = 2;
break;
case tlckINODE:
linelock->l2linesize = L2INODESLOTSIZE;
break;
case tlckDATA:
linelock->l2linesize = L2DATASLOTSIZE;
break;
default:
jERROR(1, ("UFO tlock:0x%p\n", tlck));
}
/*
* update tlock vector
*/
grantLock:
tlck->type |= type;
TXN_UNLOCK();
return tlck;
/*
* page is being locked by another transaction:
*/
waitLock:
/* Only locks on ipimap or ipaimap should reach here */
/* assert(jfs_ip->fileset == AGGREGATE_I); */
if (jfs_ip->fileset != AGGREGATE_I) {
jERROR(1, ("txLock: trying to lock locked page!\n"));
dump_mem("ip", ip, sizeof(struct inode));
dump_mem("mp", mp, sizeof(metapage_t));
dump_mem("Locker's tblk", tid_to_tblock(tid),
sizeof(tblock_t));
dump_mem("Tlock", tlck, sizeof(tlock_t));
BUG();
}
INCREMENT(stattx.waitlock); /* statistics */
release_metapage(mp);
jEVENT(0, ("txLock: in waitLock, tid = %d, xtid = %d, lid = %d\n",
tid, xtid, lid));
TXN_SLEEP_DROP_LOCK(&tid_to_tblock(xtid)->waitor);
jEVENT(0, ("txLock: awakened tid = %d, lid = %d\n", tid, lid));
return NULL;
}
/*
* NAME: txRelease()
*
* FUNCTION: Release buffers associated with transaction locks, but don't
* mark homeok yet. This allows other transactions to modify
* buffers, but won't let them go to disk until the commit record
* actually gets written.
*
* PARAMETER:
* tblk -
*
* RETURN: Errors from subroutines.
*/
static void txRelease(tblock_t * tblk)
{
metapage_t *mp;
lid_t lid;
tlock_t *tlck;
TXN_LOCK();
for (lid = tblk->next; lid; lid = tlck->next) {
tlck = lid_to_tlock(lid);
if ((mp = tlck->mp) != NULL &&
(tlck->type & tlckBTROOT) == 0) {
assert(mp->xflag & COMMIT_PAGE);
mp->lid = 0;
}
}
/*
* wakeup transactions waiting on a page locked
* by the current transaction
*/
TXN_WAKEUP(&tblk->waitor);
TXN_UNLOCK();
}
/*
* NAME: txUnlock()
*
* FUNCTION: Initiates pageout of pages modified by tid in journalled
* objects and frees their lockwords.
*
* PARAMETER:
* flag -
*
* RETURN: Errors from subroutines.
*/
static void txUnlock(tblock_t * tblk, int flag)
{
tlock_t *tlck;
linelock_t *linelock;
lid_t lid, next, llid, k;
metapage_t *mp;
log_t *log;
int force;
int difft, diffp;
jFYI(1, ("txUnlock: tblk = 0x%p\n", tblk));
log = (log_t *) JFS_SBI(tblk->sb)->log;
force = flag & COMMIT_FLUSH;
if (log->syncbarrier)
force |= COMMIT_FORCE;
/*
* mark page under tlock homeok (its log has been written):
* if caller has specified FORCE (e.g., iRecycle()), or
* if syncwait for the log is set (i.e., the log sync point
* has fallen behind), or
* if syncpt is set for the page, or
* if the page is new, initiate pageout;
* otherwise, leave the page in memory.
*/
for (lid = tblk->next; lid; lid = next) {
tlck = lid_to_tlock(lid);
next = tlck->next;
jFYI(1, ("unlocking lid = %d, tlck = 0x%p\n", lid, tlck));
/* unbind page from tlock */
if ((mp = tlck->mp) != NULL &&
(tlck->type & tlckBTROOT) == 0) {
assert(mp->xflag & COMMIT_PAGE);
/* hold buffer
*
* It's possible that someone else has the metapage.
* The only things we're changing are nohomeok, which
* is handled atomically, and clsn, which is protected
* by the LOGSYNC_LOCK.
*/
hold_metapage(mp, 1);
assert(atomic_read(&mp->nohomeok) > 0);
atomic_dec(&mp->nohomeok);
/* inherit younger/larger clsn */
LOGSYNC_LOCK(log);
if (mp->clsn) {
logdiff(difft, tblk->clsn, log);
logdiff(diffp, mp->clsn, log);
if (difft > diffp)
mp->clsn = tblk->clsn;
} else
mp->clsn = tblk->clsn;
LOGSYNC_UNLOCK(log);
assert(!(tlck->flag & tlckFREEPAGE));
if (tlck->flag & tlckWRITEPAGE) {
write_metapage(mp);
} else {
/* release page which has been forced */
release_metapage(mp);
}
}
/* insert tlock, and linelock(s) of the tlock if any,
* at head of freelist
*/
TXN_LOCK();
llid = ((linelock_t *) & tlck->lock)->next;
while (llid) {
linelock = (linelock_t *) lid_to_tlock(llid);
k = linelock->next;
txLockFree(llid);
llid = k;
}
txLockFree(lid);
TXN_UNLOCK();
}
tblk->next = tblk->last = 0;
/*
* remove tblock from logsynclist
* (allocation map pages inherited lsn of tblk and
* have been inserted in the logsync list at txUpdateMap())
*/
if (tblk->lsn) {
LOGSYNC_LOCK(log);
log->count--;
list_del(&tblk->synclist);
LOGSYNC_UNLOCK(log);
}
}
/*
* txMaplock()
*
* function: allocate a transaction lock for freed page/entry;
* for freed page, maplock is used as xtlock/dtlock type;
*/
tlock_t *txMaplock(tid_t tid, struct inode *ip, int type)
{
struct jfs_inode_info *jfs_ip = JFS_IP(ip);
lid_t lid;
tblock_t *tblk;
tlock_t *tlck;
maplock_t *maplock;
TXN_LOCK();
/*
* allocate a tlock
*/
lid = txLockAlloc();
tlck = lid_to_tlock(lid);
/*
* initialize tlock
*/
tlck->tid = tid;
/* bind the tlock and the object */
tlck->flag = tlckINODELOCK;
tlck->ip = ip;
tlck->mp = NULL;
tlck->type = type;
/*
* enqueue transaction lock to transaction/inode
*/
/* insert the tlock at tail of transaction tlock list */
if (tid) {
tblk = tid_to_tblock(tid);
if (tblk->next)
lid_to_tlock(tblk->last)->next = lid;
else
tblk->next = lid;
tlck->next = 0;
tblk->last = lid;
}
/* anonymous transaction:
* insert the tlock at head of inode anonymous tlock list
*/
else {
tlck->next = jfs_ip->atlhead;
jfs_ip->atlhead = lid;
if (tlck->next == 0) {
/* This inode's first anonymous transaction */
jfs_ip->atltail = lid;
list_add_tail(&jfs_ip->anon_inode_list,
&TxAnchor.anon_list);
}
}
TXN_UNLOCK();
/* initialize type dependent area for maplock */
maplock = (maplock_t *) & tlck->lock;
maplock->next = 0;
maplock->maxcnt = 0;
maplock->index = 0;
return tlck;
}
/*
* txLinelock()
*
* function: allocate a transaction lock for log vector list
*/
linelock_t *txLinelock(linelock_t * tlock)
{
lid_t lid;
tlock_t *tlck;
linelock_t *linelock;
TXN_LOCK();
/* allocate a TxLock structure */
lid = txLockAlloc();
tlck = lid_to_tlock(lid);
TXN_UNLOCK();
/* initialize linelock */
linelock = (linelock_t *) tlck;
linelock->next = 0;
linelock->flag = tlckLINELOCK;
linelock->maxcnt = TLOCKLONG;
linelock->index = 0;
/* append linelock after tlock */
linelock->next = tlock->next;
tlock->next = lid;
return linelock;
}
/*
* transaction commit management
* -----------------------------
*/
/*
* NAME: txCommit()
*
* FUNCTION: commit the changes to the objects specified in
* clist. For journalled segments only the
* changes of the caller are committed, i.e., by tid.
* for non-journalled segments the data are flushed to
* disk and then the change to the disk inode and indirect
* blocks committed (so blocks newly allocated to the
* segment will be made a part of the segment atomically).
*
* all of the segments specified in clist must be in
* one file system. no more than 6 segments are needed
* to handle all unix svcs.
*
* if the i_nlink field (i.e. disk inode link count)
* is zero, and the type of inode is a regular file or
* directory, or symbolic link, the inode is truncated
* to zero length. the truncation is committed but the
* VM resources are unaffected until it is closed (see
* iput and iclose).
*
* PARAMETER:
*
* RETURN:
*
* serialization:
* on entry the inode lock on each segment is assumed
* to be held.
*
* i/o error:
*/
int txCommit(tid_t tid, /* transaction identifier */
int nip, /* number of inodes to commit */
struct inode **iplist, /* list of inode to commit */
int flag)
{
int rc = 0, rc1 = 0;
commit_t cd;
log_t *log;
tblock_t *tblk;
lrd_t *lrd;
int lsn;
struct inode *ip;
struct jfs_inode_info *jfs_ip;
int k, n;
ino_t top;
struct super_block *sb;
jFYI(1, ("txCommit, tid = %d, flag = %d\n", tid, flag));
/* is read-only file system ? */
if (isReadOnly(iplist[0])) {
rc = EROFS;
goto TheEnd;
}
sb = cd.sb = iplist[0]->i_sb;
cd.tid = tid;
if (tid == 0)
tid = txBegin(sb, 0);
tblk = tid_to_tblock(tid);
/*
* initialize commit structure
*/
log = (log_t *) JFS_SBI(sb)->log;
cd.log = log;
/* initialize log record descriptor in commit */
lrd = &cd.lrd;
lrd->logtid = cpu_to_le32(tblk->logtid);
lrd->backchain = 0;
tblk->xflag |= flag;
if ((flag & (COMMIT_FORCE | COMMIT_SYNC)) == 0)
tblk->xflag |= COMMIT_LAZY;
/*
* prepare non-journaled objects for commit
*
* flush data pages of non-journaled file
* to prevent the file from getting uninitialized disk blocks
* in case of crash.
* (new blocks - )
*/
cd.iplist = iplist;
cd.nip = nip;
/*
* acquire transaction lock on (on-disk) inodes
*
* update on-disk inode from in-memory inode
* acquiring transaction locks for AFTER records
* on the on-disk inode of file object
*
* sort the inodes array by inode number in descending order
* to prevent deadlock when acquiring transaction lock
* of on-disk inodes on multiple on-disk inode pages by
* multiple concurrent transactions
*/
for (k = 0; k < cd.nip; k++) {
top = (cd.iplist[k])->i_ino;
for (n = k + 1; n < cd.nip; n++) {
ip = cd.iplist[n];
if (ip->i_ino > top) {
top = ip->i_ino;
cd.iplist[n] = cd.iplist[k];
cd.iplist[k] = ip;
}
}
ip = cd.iplist[k];
jfs_ip = JFS_IP(ip);
/*
* BUGBUG - Should we call filemap_fdatasync here instead
* of fsync_inode_data?
* If we do, we have a deadlock condition since we may end
* up recursively calling jfs_get_block with the IWRITELOCK
* held. We may be able to do away with IWRITELOCK while
* committing transactions and use i_sem instead.
*/
if ((!S_ISDIR(ip->i_mode))
&& (tblk->flag & COMMIT_DELETE) == 0)
fsync_inode_data_buffers(ip);
/*
* Mark inode as not dirty. It will still be on the dirty
* inode list, but we'll know not to commit it again unless
* it gets marked dirty again
*/
clear_cflag(COMMIT_Dirty, ip);
/* inherit anonymous tlock(s) of inode */
if (jfs_ip->atlhead) {
lid_to_tlock(jfs_ip->atltail)->next = tblk->next;
tblk->next = jfs_ip->atlhead;
if (!tblk->last)
tblk->last = jfs_ip->atltail;
jfs_ip->atlhead = jfs_ip->atltail = 0;
TXN_LOCK();
list_del_init(&jfs_ip->anon_inode_list);
TXN_UNLOCK();
}
/*
* acquire transaction lock on on-disk inode page
* (become first tlock of the tblk's tlock list)
*/
if (((rc = diWrite(tid, ip))))
goto out;
}
/*
* write log records from transaction locks
*
* txUpdateMap() resets XAD_NEW in XAD.
*/
if ((rc = txLog(log, tblk, &cd)))
goto TheEnd;
/*
* Ensure that inode isn't reused before
* lazy commit thread finishes processing
*/
if (tblk->xflag & (COMMIT_CREATE | COMMIT_DELETE))
atomic_inc(&tblk->ip->i_count);
if (tblk->xflag & COMMIT_DELETE) {
ip = tblk->ip;
assert((ip->i_nlink == 0) && !test_cflag(COMMIT_Nolink, ip));
set_cflag(COMMIT_Nolink, ip);
}
/*
* write COMMIT log record
*/
lrd->type = cpu_to_le16(LOG_COMMIT);
lrd->length = 0;
lsn = lmLog(log, tblk, lrd, NULL);
lmGroupCommit(log, tblk);
/*
* - transaction is now committed -
*/
/*
* force pages in careful update
* (imap addressing structure update)
*/
if (flag & COMMIT_FORCE)
txForce(tblk);
/*
* update allocation map.
*
* update inode allocation map and inode:
* free pager lock on memory object of inode if any.
* update block allocation map.
*
* txUpdateMap() resets XAD_NEW in XAD.
*/
if (tblk->xflag & COMMIT_FORCE)
txUpdateMap(tblk);
/*
* free transaction locks and pageout/free pages
*/
txRelease(tblk);
if ((tblk->flag & tblkGC_LAZY) == 0)
txUnlock(tblk, flag);
/*
* reset in-memory object state
*/
for (k = 0; k < cd.nip; k++) {
ip = cd.iplist[k];
jfs_ip = JFS_IP(ip);
/*
* reset in-memory inode state
*/
jfs_ip->bxflag = 0;
jfs_ip->blid = 0;
}
out:
if (rc != 0)
txAbortCommit(&cd, rc);
else
rc = rc1;
TheEnd:
jFYI(1, ("txCommit: tid = %d, returning %d\n", tid, rc));
return rc;
}
/*
* NAME: txLog()
*
* FUNCTION: Writes AFTER log records for all lines modified
* by tid for segments specified by inodes in comdata.
* Code assumes only WRITELOCKS are recorded in lockwords.
*
* PARAMETERS:
*
* RETURN :
*/
static int txLog(log_t * log, tblock_t * tblk, commit_t * cd)
{
int rc = 0;
struct inode *ip;
lid_t lid;
tlock_t *tlck;
lrd_t *lrd = &cd->lrd;
/*
* write log record(s) for each tlock of transaction,
*/
for (lid = tblk->next; lid; lid = tlck->next) {
tlck = lid_to_tlock(lid);
tlck->flag |= tlckLOG;
/* initialize lrd common */
ip = tlck->ip;
lrd->aggregate = cpu_to_le32(kdev_t_to_nr(ip->i_dev));
lrd->log.redopage.fileset = cpu_to_le32(JFS_IP(ip)->fileset);
lrd->log.redopage.inode = cpu_to_le32(ip->i_ino);
if (tlck->mp)
hold_metapage(tlck->mp, 0);
/* write log record of page from the tlock */
switch (tlck->type & tlckTYPE) {
case tlckXTREE:
xtLog(log, tblk, lrd, tlck);
break;
case tlckDTREE:
dtLog(log, tblk, lrd, tlck);
break;
case tlckINODE:
diLog(log, tblk, lrd, tlck, cd);
break;
case tlckMAP:
mapLog(log, tblk, lrd, tlck);
break;
case tlckDATA:
dataLog(log, tblk, lrd, tlck);
break;
default:
jERROR(1, ("UFO tlock:0x%p\n", tlck));
}
if (tlck->mp)
release_metapage(tlck->mp);
}
return rc;
}
/*
* diLog()
*
* function: log inode tlock and format maplock to update bmap;
*/
int diLog(log_t * log,
tblock_t * tblk, lrd_t * lrd, tlock_t * tlck, commit_t * cd)
{
int rc = 0;
metapage_t *mp;
pxd_t *pxd;
pxdlock_t *pxdlock;
mp = tlck->mp;
/* initialize as REDOPAGE record format */
lrd->log.redopage.type = cpu_to_le16(LOG_INODE);
lrd->log.redopage.l2linesize = cpu_to_le16(L2INODESLOTSIZE);
pxd = &lrd->log.redopage.pxd;
/*
* inode after image
*/
if (tlck->type & tlckENTRY) {
/* log after-image for logredo(): */
lrd->type = cpu_to_le16(LOG_REDOPAGE);
// *pxd = mp->cm_pxd;
PXDaddress(pxd, mp->index);
PXDlength(pxd,
mp->logical_size >> tblk->sb->s_blocksize_bits);
lrd->backchain = cpu_to_le32(lmLog(log, tblk, lrd, tlck));
/* mark page as homeward bound */
tlck->flag |= tlckWRITEPAGE;
} else if (tlck->type & tlckFREE) {
/*
* free inode extent
*
* (pages of the freed inode extent have been invalidated and
* a maplock for free of the extent has been formatted at
* txLock() time);
*
* the tlock had been acquired on the inode allocation map page
* (iag) that specifies the freed extent, even though the map
* page is not itself logged, to prevent pageout of the map
* page before the log;
*/
assert(tlck->type & tlckFREE);
/* log LOG_NOREDOINOEXT of the freed inode extent for
* logredo() to start NoRedoPage filters, and to update
* imap and bmap for free of the extent;
*/
lrd->type = cpu_to_le16(LOG_NOREDOINOEXT);
/*
* For the LOG_NOREDOINOEXT record, we need
* to pass the IAG number and inode extent
* index (within that IAG) from which the
* extent is being released. These have been
* passed to us in iplist[1] and iplist[2].
*/
lrd->log.noredoinoext.iagnum =
cpu_to_le32((u32) (size_t) cd->iplist[1]);
lrd->log.noredoinoext.inoext_idx =
cpu_to_le32((u32) (size_t) cd->iplist[2]);
pxdlock = (pxdlock_t *) & tlck->lock;
*pxd = pxdlock->pxd;
lrd->backchain = cpu_to_le32(lmLog(log, tblk, lrd, NULL));
/* update bmap */
tlck->flag |= tlckUPDATEMAP;
/* mark page as homeward bound */
tlck->flag |= tlckWRITEPAGE;
} else {
jERROR(2, ("diLog: UFO type tlck:0x%p\n", tlck));
}
#ifdef _JFS_WIP
/*
* alloc/free external EA extent
*
* a maplock for txUpdateMap() to update bPWMAP for alloc/free
* of the extent has been formatted at txLock() time;
*/
else {
assert(tlck->type & tlckEA);
/* log LOG_UPDATEMAP for logredo() to update bmap for
* alloc of new (and free of old) external EA extent;
*/
lrd->type = cpu_to_le16(LOG_UPDATEMAP);
pxdlock = (pxdlock_t *) & tlck->lock;
nlock = pxdlock->index;
for (i = 0; i < nlock; i++, pxdlock++) {
if (pxdlock->flag & mlckALLOCPXD)
lrd->log.updatemap.type =
cpu_to_le16(LOG_ALLOCPXD);
else
lrd->log.updatemap.type =
cpu_to_le16(LOG_FREEPXD);
lrd->log.updatemap.nxd = cpu_to_le16(1);
lrd->log.updatemap.pxd = pxdlock->pxd;
lrd->backchain =
cpu_to_le32(lmLog(log, tblk, lrd, NULL));
}
/* update bmap */
tlck->flag |= tlckUPDATEMAP;
}
#endif /* _JFS_WIP */
return rc;
}
/*
* dataLog()
*
* function: log data tlock
*/
int dataLog(log_t * log, tblock_t * tblk, lrd_t * lrd, tlock_t * tlck)
{
metapage_t *mp;
pxd_t *pxd;
int rc;
s64 xaddr;
int xflag;
s32 xlen;
mp = tlck->mp;
/* initialize as REDOPAGE record format */
lrd->log.redopage.type = cpu_to_le16(LOG_DATA);
lrd->log.redopage.l2linesize = cpu_to_le16(L2DATASLOTSIZE);
pxd = &lrd->log.redopage.pxd;
/* log after-image for logredo(): */
lrd->type = cpu_to_le16(LOG_REDOPAGE);
if (JFS_IP(tlck->ip)->next_index < MAX_INLINE_DIRTABLE_ENTRY) {
/*
* The table has been truncated; we must have deleted
* the last entry, so don't bother logging this
*/
mp->lid = 0;
atomic_dec(&mp->nohomeok);
discard_metapage(mp);
tlck->mp = 0;
return 0;
}
rc = xtLookup(tlck->ip, mp->index, 1, &xflag, &xaddr, &xlen, 1);
if (rc || (xlen == 0)) {
jERROR(1, ("dataLog: can't find physical address\n"));
return 0;
}
PXDaddress(pxd, xaddr);
PXDlength(pxd, mp->logical_size >> tblk->sb->s_blocksize_bits);
lrd->backchain = cpu_to_le32(lmLog(log, tblk, lrd, tlck));
/* mark page as homeward bound */
tlck->flag |= tlckWRITEPAGE;
return 0;
}
/*
* dtLog()
*
* function: log dtree tlock and format maplock to update bmap;
*/
void dtLog(log_t * log, tblock_t * tblk, lrd_t * lrd, tlock_t * tlck)
{
struct inode *ip;
metapage_t *mp;
pxdlock_t *pxdlock;
pxd_t *pxd;
ip = tlck->ip;
mp = tlck->mp;
/* initialize as REDOPAGE/NOREDOPAGE record format */
lrd->log.redopage.type = cpu_to_le16(LOG_DTREE);
lrd->log.redopage.l2linesize = cpu_to_le16(L2DTSLOTSIZE);
pxd = &lrd->log.redopage.pxd;
if (tlck->type & tlckBTROOT)
lrd->log.redopage.type |= cpu_to_le16(LOG_BTROOT);
/*
* page extension via relocation: entry insertion;
* page extension in-place: entry insertion;
* new right page from page split, reinitialized in-line
* root from root page split: entry insertion;
*/
if (tlck->type & (tlckNEW | tlckEXTEND)) {
/* log after-image of the new page for logredo():
* mark log (LOG_NEW) for logredo() to initialize
* freelist and update bmap for alloc of the new page;
*/
lrd->type = cpu_to_le16(LOG_REDOPAGE);
if (tlck->type & tlckEXTEND)
lrd->log.redopage.type |= cpu_to_le16(LOG_EXTEND);
else
lrd->log.redopage.type |= cpu_to_le16(LOG_NEW);
// *pxd = mp->cm_pxd;
PXDaddress(pxd, mp->index);
PXDlength(pxd,
mp->logical_size >> tblk->sb->s_blocksize_bits);
lrd->backchain = cpu_to_le32(lmLog(log, tblk, lrd, tlck));
/* format a maplock for txUpdateMap() to update bPMAP for
* alloc of the new page;
*/
if (tlck->type & tlckBTROOT)
return;
tlck->flag |= tlckUPDATEMAP;
pxdlock = (pxdlock_t *) & tlck->lock;
pxdlock->flag = mlckALLOCPXD;
pxdlock->pxd = *pxd;
pxdlock->index = 1;
/* mark page as homeward bound */
tlck->flag |= tlckWRITEPAGE;
return;
}
/*
* entry insertion/deletion,
* sibling page link update (old right page before split);
*/
if (tlck->type & (tlckENTRY | tlckRELINK)) {
/* log after-image for logredo(): */
lrd->type = cpu_to_le16(LOG_REDOPAGE);
PXDaddress(pxd, mp->index);
PXDlength(pxd,
mp->logical_size >> tblk->sb->s_blocksize_bits);
lrd->backchain = cpu_to_le32(lmLog(log, tblk, lrd, tlck));
/* mark page as homeward bound */
tlck->flag |= tlckWRITEPAGE;
return;
}
/*
* page deletion: page has been invalidated
* page relocation: source extent
*
* a maplock for free of the page has been formatted
* at txLock() time);
*/
if (tlck->type & (tlckFREE | tlckRELOCATE)) {
/* log LOG_NOREDOPAGE of the deleted page for logredo()
* to start NoRedoPage filter and to update bmap for free
* of the deleted page
*/
lrd->type = cpu_to_le16(LOG_NOREDOPAGE);
pxdlock = (pxdlock_t *) & tlck->lock;
*pxd = pxdlock->pxd;
lrd->backchain = cpu_to_le32(lmLog(log, tblk, lrd, NULL));
/* a maplock for txUpdateMap() for free of the page
* has been formatted at txLock() time;
*/
tlck->flag |= tlckUPDATEMAP;
}
return;
}
/*
* xtLog()
*
* function: log xtree tlock and format maplock to update bmap;
*/
void xtLog(log_t * log, tblock_t * tblk, lrd_t * lrd, tlock_t * tlck)
{
struct inode *ip;
metapage_t *mp;
xtpage_t *p;
xtlock_t *xtlck;
maplock_t *maplock;
xdlistlock_t *xadlock;
pxdlock_t *pxdlock;
pxd_t *pxd;
int next, lwm, hwm;
ip = tlck->ip;
mp = tlck->mp;
/* initialize as REDOPAGE/NOREDOPAGE record format */
lrd->log.redopage.type = cpu_to_le16(LOG_XTREE);
lrd->log.redopage.l2linesize = cpu_to_le16(L2XTSLOTSIZE);
pxd = &lrd->log.redopage.pxd;
if (tlck->type & tlckBTROOT) {
lrd->log.redopage.type |= cpu_to_le16(LOG_BTROOT);
p = &JFS_IP(ip)->i_xtroot;
if (S_ISDIR(ip->i_mode))
lrd->log.redopage.type |=
cpu_to_le16(LOG_DIR_XTREE);
} else
p = (xtpage_t *) mp->data;
next = le16_to_cpu(p->header.nextindex);
xtlck = (xtlock_t *) & tlck->lock;
maplock = (maplock_t *) & tlck->lock;
xadlock = (xdlistlock_t *) maplock;
/*
* entry insertion/extension;
* sibling page link update (old right page before split);
*/
if (tlck->type & (tlckNEW | tlckGROW | tlckRELINK)) {
/* log after-image for logredo():
* logredo() will update bmap for alloc of new/extended
* extents (XAD_NEW|XAD_EXTEND) of XAD[lwm:next) from
* after-image of XADlist;
* logredo() resets (XAD_NEW|XAD_EXTEND) flag when
* applying the after-image to the meta-data page.
*/
lrd->type = cpu_to_le16(LOG_REDOPAGE);
// *pxd = mp->cm_pxd;
PXDaddress(pxd, mp->index);
PXDlength(pxd,
mp->logical_size >> tblk->sb->s_blocksize_bits);
lrd->backchain = cpu_to_le32(lmLog(log, tblk, lrd, tlck));
/* format a maplock for txUpdateMap() to update bPMAP
* for alloc of new/extended extents of XAD[lwm:next)
* from the page itself;
* txUpdateMap() resets (XAD_NEW|XAD_EXTEND) flag.
*/
lwm = xtlck->lwm.offset;
if (lwm == 0)
lwm = XTPAGEMAXSLOT;
if (lwm == next)
goto out;
assert(lwm < next);
tlck->flag |= tlckUPDATEMAP;
xadlock->flag = mlckALLOCXADLIST;
xadlock->count = next - lwm;
if ((xadlock->count <= 2) && (tblk->xflag & COMMIT_LAZY)) {
int i;
/*
* Lazy commit may allow xtree to be modified before
* txUpdateMap runs. Copy xad into linelock to
* preserve correct data.
*/
xadlock->xdlist = &xtlck->pxdlock;
memcpy(xadlock->xdlist, &p->xad[lwm],
sizeof(xad_t) * xadlock->count);
for (i = 0; i < xadlock->count; i++)
p->xad[lwm + i].flag &=
~(XAD_NEW | XAD_EXTENDED);
} else {
/*
* xdlist will point into the inode's xtree; ensure
* that transaction is not committed lazily.
*/
xadlock->xdlist = &p->xad[lwm];
tblk->xflag &= ~COMMIT_LAZY;
}
jFYI(1,
("xtLog: alloc ip:0x%p mp:0x%p tlck:0x%p lwm:%d count:%d\n",
tlck->ip, mp, tlck, lwm, xadlock->count));
maplock->index = 1;
out:
/* mark page as homeward bound */
tlck->flag |= tlckWRITEPAGE;
return;
}
/*
* page deletion: file deletion/truncation (ref. xtTruncate())
*
* (page will be invalidated after log is written and bmap
* is updated from the page);
*/
if (tlck->type & tlckFREE) {
/* LOG_NOREDOPAGE log for NoRedoPage filter:
* if page free from file delete, NoRedoFile filter from
* inode image of zero link count will subsume NoRedoPage
* filters for each page;
* if page free from file truncation, write NoRedoPage
* filter;
*
* update of the block allocation map for the page itself:
* if page free from deletion and truncation, LOG_UPDATEMAP
* log for the page itself is generated from processing
* its parent page xad entries;
*/
/* if page free from file truncation, log LOG_NOREDOPAGE
* of the deleted page for logredo() to start NoRedoPage
* filter for the page;
*/
if (tblk->xflag & COMMIT_TRUNCATE) {
/* write NOREDOPAGE for the page */
lrd->type = cpu_to_le16(LOG_NOREDOPAGE);
PXDaddress(pxd, mp->index);
PXDlength(pxd,
mp->logical_size >> tblk->sb->
s_blocksize_bits);
lrd->backchain =
cpu_to_le32(lmLog(log, tblk, lrd, NULL));
if (tlck->type & tlckBTROOT) {
/* Empty xtree must be logged */
lrd->type = cpu_to_le16(LOG_REDOPAGE);
lrd->backchain =
cpu_to_le32(lmLog(log, tblk, lrd, tlck));
}
}
/* init LOG_UPDATEMAP of the freed extents
* XAD[XTENTRYSTART:hwm) from the deleted page itself
* for logredo() to update bmap;
*/
lrd->type = cpu_to_le16(LOG_UPDATEMAP);
lrd->log.updatemap.type = cpu_to_le16(LOG_FREEXADLIST);
xtlck = (xtlock_t *) & tlck->lock;
hwm = xtlck->hwm.offset;
lrd->log.updatemap.nxd =
cpu_to_le16(hwm - XTENTRYSTART + 1);
/* reformat linelock for lmLog() */
xtlck->header.offset = XTENTRYSTART;
xtlck->header.length = hwm - XTENTRYSTART + 1;
xtlck->index = 1;
lrd->backchain = cpu_to_le32(lmLog(log, tblk, lrd, tlck));
/* format a maplock for txUpdateMap() to update bmap
* to free extents of XAD[XTENTRYSTART:hwm) from the
* deleted page itself;
*/
tlck->flag |= tlckUPDATEMAP;
xadlock->flag = mlckFREEXADLIST;
xadlock->count = hwm - XTENTRYSTART + 1;
if ((xadlock->count <= 2) && (tblk->xflag & COMMIT_LAZY)) {
/*
* Lazy commit may allow xtree to be modified before
* txUpdateMap runs. Copy xad into linelock to
* preserve correct data.
*/
xadlock->xdlist = &xtlck->pxdlock;
memcpy(xadlock->xdlist, &p->xad[XTENTRYSTART],
sizeof(xad_t) * xadlock->count);
} else {
/*
* xdlist will point into the inode's xtree; ensure
* that transaction is not committed lazily unless
* we're deleting the inode (unlink). In that case
* we have special logic for the inode to be
* unlocked by the lazy commit thread.
*/
xadlock->xdlist = &p->xad[XTENTRYSTART];
if ((tblk->xflag & COMMIT_LAZY) &&
(tblk->xflag & COMMIT_DELETE) &&
(tblk->ip == ip))
set_cflag(COMMIT_Holdlock, ip);
else
tblk->xflag &= ~COMMIT_LAZY;
}
jFYI(1,
("xtLog: free ip:0x%p mp:0x%p count:%d lwm:2\n",
tlck->ip, mp, xadlock->count));
maplock->index = 1;
/* mark page as invalid */
if (((tblk->xflag & COMMIT_PWMAP) || S_ISDIR(ip->i_mode))
&& !(tlck->type & tlckBTROOT))
tlck->flag |= tlckFREEPAGE;
/*
else (tblk->xflag & COMMIT_PMAP)
? release the page;
*/
return;
}
/*
* page/entry truncation: file truncation (ref. xtTruncate())
*
* |----------+------+------+---------------|
* | | |
* | | hwm - hwm before truncation
* | next - truncation point
* lwm - lwm before truncation
* header ?
*/
if (tlck->type & tlckTRUNCATE) {
pxd_t tpxd; /* truncated extent of xad */
/*
* For truncation the entire linelock may be used, so it would
* be difficult to store xad list in linelock itself.
* Therefore, we'll just force transaction to be committed
* synchronously, so that xtree pages won't be changed before
* txUpdateMap runs.
*/
tblk->xflag &= ~COMMIT_LAZY;
lwm = xtlck->lwm.offset;
if (lwm == 0)
lwm = XTPAGEMAXSLOT;
hwm = xtlck->hwm.offset;
/*
* write log records
*/
/*
* allocate entries XAD[lwm:next]:
*/
if (lwm < next) {
/* log after-image for logredo():
* logredo() will update bmap for alloc of new/extended
* extents (XAD_NEW|XAD_EXTEND) of XAD[lwm:next) from
* after-image of XADlist;
* logredo() resets (XAD_NEW|XAD_EXTEND) flag when
* applying the after-image to the meta-data page.
*/
lrd->type = cpu_to_le16(LOG_REDOPAGE);
PXDaddress(pxd, mp->index);
PXDlength(pxd,
mp->logical_size >> tblk->sb->
s_blocksize_bits);
lrd->backchain =
cpu_to_le32(lmLog(log, tblk, lrd, tlck));
}
/*
* truncate entry XAD[hwm == next - 1]:
*/
if (hwm == next - 1) {
/* init LOG_UPDATEMAP for logredo() to update bmap for
* free of truncated delta extent of the truncated
* entry XAD[next - 1]:
* (xtlck->pxdlock = truncated delta extent);
*/
pxdlock = (pxdlock_t *) & xtlck->pxdlock;
/* assert(pxdlock->type & tlckTRUNCATE); */
lrd->type = cpu_to_le16(LOG_UPDATEMAP);
lrd->log.updatemap.type = cpu_to_le16(LOG_FREEPXD);
lrd->log.updatemap.nxd = cpu_to_le16(1);
lrd->log.updatemap.pxd = pxdlock->pxd;
tpxd = pxdlock->pxd; /* save to format maplock */
lrd->backchain =
cpu_to_le32(lmLog(log, tblk, lrd, NULL));
}
/*
* free entries XAD[next:hwm]:
*/
if (hwm >= next) {
/* init LOG_UPDATEMAP of the freed extents
* XAD[next:hwm] from the deleted page itself
* for logredo() to update bmap;
*/
lrd->type = cpu_to_le16(LOG_UPDATEMAP);
lrd->log.updatemap.type =
cpu_to_le16(LOG_FREEXADLIST);
xtlck = (xtlock_t *) & tlck->lock;
hwm = xtlck->hwm.offset;
lrd->log.updatemap.nxd =
cpu_to_le16(hwm - next + 1);
/* reformat linelock for lmLog() */
xtlck->header.offset = next;
xtlck->header.length = hwm - next + 1;
xtlck->index = 1;
lrd->backchain =
cpu_to_le32(lmLog(log, tblk, lrd, tlck));
}
/*
* format maplock(s) for txUpdateMap() to update bmap
*/
maplock->index = 0;
/*
* allocate entries XAD[lwm:next):
*/
if (lwm < next) {
/* format a maplock for txUpdateMap() to update bPMAP
* for alloc of new/extended extents of XAD[lwm:next)
* from the page itself;
* txUpdateMap() resets (XAD_NEW|XAD_EXTEND) flag.
*/
tlck->flag |= tlckUPDATEMAP;
xadlock->flag = mlckALLOCXADLIST;
xadlock->count = next - lwm;
xadlock->xdlist = &p->xad[lwm];
jFYI(1,
("xtLog: alloc ip:0x%p mp:0x%p count:%d lwm:%d next:%d\n",
tlck->ip, mp, xadlock->count, lwm, next));
maplock->index++;
xadlock++;
}
/*
* truncate entry XAD[hwm == next - 1]:
*/
if (hwm == next - 1) {
pxdlock_t *pxdlock;
/* format a maplock for txUpdateMap() to update bmap
* to free truncated delta extent of the truncated
* entry XAD[next - 1];
* (xtlck->pxdlock = truncated delta extent);
*/
tlck->flag |= tlckUPDATEMAP;
pxdlock = (pxdlock_t *) xadlock;
pxdlock->flag = mlckFREEPXD;
pxdlock->count = 1;
pxdlock->pxd = tpxd;
jFYI(1,
("xtLog: truncate ip:0x%p mp:0x%p count:%d hwm:%d\n",
ip, mp, pxdlock->count, hwm));
maplock->index++;
xadlock++;
}
/*
* free entries XAD[next:hwm]:
*/
if (hwm >= next) {
/* format a maplock for txUpdateMap() to update bmap
* to free extents of XAD[next:hwm] from the deleted
* page itself;
*/
tlck->flag |= tlckUPDATEMAP;
xadlock->flag = mlckFREEXADLIST;
xadlock->count = hwm - next + 1;
xadlock->xdlist = &p->xad[next];
jFYI(1,
("xtLog: free ip:0x%p mp:0x%p count:%d next:%d hwm:%d\n",
tlck->ip, mp, xadlock->count, next, hwm));
maplock->index++;
}
/* mark page as homeward bound */
tlck->flag |= tlckWRITEPAGE;
}
return;
}
/*
* mapLog()
*
* function: log from maplock of freed data extents;
*/
void mapLog(log_t * log, tblock_t * tblk, lrd_t * lrd, tlock_t * tlck)
{
pxdlock_t *pxdlock;
int i, nlock;
pxd_t *pxd;
/*
* page relocation: free the source page extent
*
* a maplock for txUpdateMap() for free of the page
* has been formatted at txLock() time saving the src
* relocated page address;
*/
if (tlck->type & tlckRELOCATE) {
/* log LOG_NOREDOPAGE of the old relocated page
* for logredo() to start NoRedoPage filter;
*/
lrd->type = cpu_to_le16(LOG_NOREDOPAGE);
pxdlock = (pxdlock_t *) & tlck->lock;
pxd = &lrd->log.redopage.pxd;
*pxd = pxdlock->pxd;
lrd->backchain = cpu_to_le32(lmLog(log, tblk, lrd, NULL));
/* (N.B. currently, logredo() does NOT update bmap
* for free of the page itself for (LOG_XTREE|LOG_NOREDOPAGE);
* if page free from relocation, LOG_UPDATEMAP log is
* specifically generated now for logredo()
* to update bmap for free of src relocated page;
* (new flag LOG_RELOCATE may be introduced which will
* inform logredo() to start NORedoPage filter and also
* update block allocation map at the same time, thus
* avoiding an extra log write);
*/
lrd->type = cpu_to_le16(LOG_UPDATEMAP);
lrd->log.updatemap.type = cpu_to_le16(LOG_FREEPXD);
lrd->log.updatemap.nxd = cpu_to_le16(1);
lrd->log.updatemap.pxd = pxdlock->pxd;
lrd->backchain = cpu_to_le32(lmLog(log, tblk, lrd, NULL));
/* a maplock for txUpdateMap() for free of the page
* has been formatted at txLock() time;
*/
tlck->flag |= tlckUPDATEMAP;
return;
}
/*
* Otherwise it's not a relocate request
*/
else {
/* log LOG_UPDATEMAP for logredo() to update bmap for
* free of truncated/relocated delta extent of the data;
* e.g.: external EA extent, relocated/truncated extent
* from xtTailgate();
*/
lrd->type = cpu_to_le16(LOG_UPDATEMAP);
pxdlock = (pxdlock_t *) & tlck->lock;
nlock = pxdlock->index;
for (i = 0; i < nlock; i++, pxdlock++) {
if (pxdlock->flag & mlckALLOCPXD)
lrd->log.updatemap.type =
cpu_to_le16(LOG_ALLOCPXD);
else
lrd->log.updatemap.type =
cpu_to_le16(LOG_FREEPXD);
lrd->log.updatemap.nxd = cpu_to_le16(1);
lrd->log.updatemap.pxd = pxdlock->pxd;
lrd->backchain =
cpu_to_le32(lmLog(log, tblk, lrd, NULL));
jFYI(1, ("mapLog: xaddr:0x%lx xlen:0x%x\n",
(ulong) addressPXD(&pxdlock->pxd),
lengthPXD(&pxdlock->pxd)));
}
/* update bmap */
tlck->flag |= tlckUPDATEMAP;
}
}
/*
* txEA()
*
* function: acquire maplock for EA/ACL extents or
* set COMMIT_INLINE flag;
*/
void txEA(tid_t tid, struct inode *ip, dxd_t * oldea, dxd_t * newea)
{
tlock_t *tlck = NULL;
pxdlock_t *maplock = NULL, *pxdlock = NULL;
/*
* format maplock for alloc of new EA extent
*/
if (newea) {
/* Since the newea could be a completely zeroed entry we need to
* check for the two flags which indicate we should actually
* commit new EA data
*/
if (newea->flag & DXD_EXTENT) {
tlck = txMaplock(tid, ip, tlckMAP);
maplock = (pxdlock_t *) & tlck->lock;
pxdlock = (pxdlock_t *) maplock;
pxdlock->flag = mlckALLOCPXD;
PXDaddress(&pxdlock->pxd, addressDXD(newea));
PXDlength(&pxdlock->pxd, lengthDXD(newea));
pxdlock++;
maplock->index = 1;
} else if (newea->flag & DXD_INLINE) {
tlck = NULL;
set_cflag(COMMIT_Inlineea, ip);
}
}
/*
* format maplock for free of old EA extent
*/
if (!test_cflag(COMMIT_Nolink, ip) && oldea->flag & DXD_EXTENT) {
if (tlck == NULL) {
tlck = txMaplock(tid, ip, tlckMAP);
maplock = (pxdlock_t *) & tlck->lock;
pxdlock = (pxdlock_t *) maplock;
maplock->index = 0;
}
pxdlock->flag = mlckFREEPXD;
PXDaddress(&pxdlock->pxd, addressDXD(oldea));
PXDlength(&pxdlock->pxd, lengthDXD(oldea));
maplock->index++;
}
}
/*
* txForce()
*
* function: synchronously write pages locked by transaction
* after txLog() but before txUpdateMap();
*/
void txForce(tblock_t * tblk)
{
tlock_t *tlck;
lid_t lid, next;
metapage_t *mp;
/*
* reverse the order of transaction tlocks in
* careful update order of address index pages
* (right to left, bottom up)
*/
tlck = lid_to_tlock(tblk->next);
lid = tlck->next;
tlck->next = 0;
while (lid) {
tlck = lid_to_tlock(lid);
next = tlck->next;
tlck->next = tblk->next;
tblk->next = lid;
lid = next;
}
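/*
 * e.g. a tlock list t1 -> t2 -> t3 now reads t3 -> t2 -> t1,
 * so pages are forced bottom up, right to left.
 */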
/*
* synchronously write the page, and
* hold the page for txUpdateMap();
*/
for (lid = tblk->next; lid; lid = next) {
tlck = lid_to_tlock(lid);
next = tlck->next;
if ((mp = tlck->mp) != NULL &&
(tlck->type & tlckBTROOT) == 0) {
assert(mp->xflag & COMMIT_PAGE);
if (tlck->flag & tlckWRITEPAGE) {
tlck->flag &= ~tlckWRITEPAGE;
/* do not release page to freelist */
assert(atomic_read(&mp->nohomeok));
hold_metapage(mp, 0);
write_metapage(mp);
}
}
}
}
/*
* txUpdateMap()
*
* function: update persistent allocation map (and working map
* if appropriate);
*
* parameter:
*/
static void txUpdateMap(tblock_t * tblk)
{
struct inode *ip;
struct inode *ipimap;
lid_t lid;
tlock_t *tlck;
maplock_t *maplock;
pxdlock_t pxdlock;
int maptype;
int k, nlock;
metapage_t *mp = 0;
ipimap = JFS_SBI(tblk->sb)->ipimap;
maptype = (tblk->xflag & COMMIT_PMAP) ? COMMIT_PMAP : COMMIT_PWMAP;
/*
* update block allocation map
*
* update allocation state in pmap (and wmap) and
* update lsn of the pmap page;
*/
/*
* scan each tlock/page of transaction for block allocation/free:
*
* for each tlock/page of transaction, update map.
* ? are there tlock for pmap and pwmap at the same time ?
*/
for (lid = tblk->next; lid; lid = tlck->next) {
tlck = lid_to_tlock(lid);
if ((tlck->flag & tlckUPDATEMAP) == 0)
continue;
if (tlck->flag & tlckFREEPAGE) {
/*
* Another thread may attempt to reuse freed space
* immediately, so we want to get rid of the metapage
* before anyone else has a chance to get it.
* Lock metapage, update maps, then invalidate
* the metapage.
*/
mp = tlck->mp;
ASSERT(mp->xflag & COMMIT_PAGE);
hold_metapage(mp, 0);
}
/*
* extent list:
* . in-line PXD list:
* . out-of-line XAD list:
*/
maplock = (maplock_t *) & tlck->lock;
nlock = maplock->index;
for (k = 0; k < nlock; k++, maplock++) {
/*
* allocate blocks in persistent map:
*
* blocks have been allocated from wmap at alloc time;
*/
if (maplock->flag & mlckALLOC) {
txAllocPMap(ipimap, maplock, tblk);
}
/*
* free blocks in persistent and working map:
* blocks will be freed in pmap and then in wmap;
*
* ? tblock specifies the PMAP/PWMAP based upon
* transaction
*
* free blocks in persistent map:
* blocks will be freed from wmap at last reference
* release of the object for regular files;
*
* Always free blocks from both persistent & working
* maps for directories
*/
else { /* (maplock->flag & mlckFREE) */
if (S_ISDIR(tlck->ip->i_mode))
txFreeMap(ipimap, maplock,
tblk, COMMIT_PWMAP);
else
txFreeMap(ipimap, maplock,
tblk, maptype);
}
}
if (tlck->flag & tlckFREEPAGE) {
if (!(tblk->flag & tblkGC_LAZY)) {
/* This is equivalent to txRelease */
ASSERT(mp->lid == lid);
tlck->mp->lid = 0;
}
assert(atomic_read(&mp->nohomeok) == 1);
atomic_dec(&mp->nohomeok);
discard_metapage(mp);
tlck->mp = 0;
}
}
/*
* update inode allocation map
*
* update allocation state in pmap and
* update lsn of the pmap page;
* update in-memory inode flag/state
*
* unlock mapper/write lock
*/
if (tblk->xflag & COMMIT_CREATE) {
ip = tblk->ip;
ASSERT(test_cflag(COMMIT_New, ip));
clear_cflag(COMMIT_New, ip);
diUpdatePMap(ipimap, ip->i_ino, FALSE, tblk);
ipimap->i_state |= I_DIRTY;
/* update persistent block allocation map
* for the allocation of inode extent;
*/
pxdlock.flag = mlckALLOCPXD;
pxdlock.pxd = JFS_IP(ip)->ixpxd;
pxdlock.index = 1;
txAllocPMap(ip, (maplock_t *) & pxdlock, tblk);
iput(ip);
} else if (tblk->xflag & COMMIT_DELETE) {
ip = tblk->ip;
diUpdatePMap(ipimap, ip->i_ino, TRUE, tblk);
ipimap->i_state |= I_DIRTY;
if (test_and_clear_cflag(COMMIT_Holdlock, ip)) {
if (tblk->flag & tblkGC_LAZY)
IWRITE_UNLOCK(ip);
}
iput(ip);
}
}
/*
* txAllocPMap()
*
* function: allocate from persistent map;
*
* parameter:
* ipbmap -
 * maplock -
* xad list:
* pxd:
*
* maptype -
* allocate from persistent map;
* free from persistent map;
 * (e.g., tmp file - free from working map at release
* of last reference);
* free from persistent and working map;
*
* lsn - log sequence number;
*/
static void txAllocPMap(struct inode *ip, maplock_t * maplock,
tblock_t * tblk)
{
struct inode *ipbmap = JFS_SBI(ip->i_sb)->ipbmap;
xdlistlock_t *xadlistlock;
xad_t *xad;
s64 xaddr;
int xlen;
pxdlock_t *pxdlock;
xdlistlock_t *pxdlistlock;
pxd_t *pxd;
int n;
/*
* allocate from persistent map;
*/
if (maplock->flag & mlckALLOCXADLIST) {
xadlistlock = (xdlistlock_t *) maplock;
xad = xadlistlock->xdlist;
for (n = 0; n < xadlistlock->count; n++, xad++) {
if (xad->flag & (XAD_NEW | XAD_EXTENDED)) {
xaddr = addressXAD(xad);
xlen = lengthXAD(xad);
dbUpdatePMap(ipbmap, FALSE, xaddr,
(s64) xlen, tblk);
xad->flag &= ~(XAD_NEW | XAD_EXTENDED);
jFYI(1,
("allocPMap: xaddr:0x%lx xlen:%d\n",
(ulong) xaddr, xlen));
}
}
} else if (maplock->flag & mlckALLOCPXD) {
pxdlock = (pxdlock_t *) maplock;
xaddr = addressPXD(&pxdlock->pxd);
xlen = lengthPXD(&pxdlock->pxd);
dbUpdatePMap(ipbmap, FALSE, xaddr, (s64) xlen, tblk);
jFYI(1,
("allocPMap: xaddr:0x%lx xlen:%d\n", (ulong) xaddr,
xlen));
} else { /* (maplock->flag & mlckALLOCPXDLIST) */
pxdlistlock = (xdlistlock_t *) maplock;
pxd = pxdlistlock->xdlist;
for (n = 0; n < pxdlistlock->count; n++, pxd++) {
xaddr = addressPXD(pxd);
xlen = lengthPXD(pxd);
dbUpdatePMap(ipbmap, FALSE, xaddr, (s64) xlen,
tblk);
jFYI(1,
("allocPMap: xaddr:0x%lx xlen:%d\n",
(ulong) xaddr, xlen));
}
}
}
/*
* txFreeMap()
*
* function: free from persistent and/or working map;
*
* todo: optimization
*/
void txFreeMap(struct inode *ip,
maplock_t * maplock, tblock_t * tblk, int maptype)
{
struct inode *ipbmap = JFS_SBI(ip->i_sb)->ipbmap;
xdlistlock_t *xadlistlock;
xad_t *xad;
s64 xaddr;
int xlen;
pxdlock_t *pxdlock;
xdlistlock_t *pxdlistlock;
pxd_t *pxd;
int n;
jFYI(1,
("txFreeMap: tblk:0x%p maplock:0x%p maptype:0x%x\n",
tblk, maplock, maptype));
/*
* free from persistent map;
*/
if (maptype == COMMIT_PMAP || maptype == COMMIT_PWMAP) {
if (maplock->flag & mlckFREEXADLIST) {
xadlistlock = (xdlistlock_t *) maplock;
xad = xadlistlock->xdlist;
for (n = 0; n < xadlistlock->count; n++, xad++) {
if (!(xad->flag & XAD_NEW)) {
xaddr = addressXAD(xad);
xlen = lengthXAD(xad);
dbUpdatePMap(ipbmap, TRUE, xaddr,
(s64) xlen, tblk);
jFYI(1,
("freePMap: xaddr:0x%lx xlen:%d\n",
(ulong) xaddr, xlen));
}
}
} else if (maplock->flag & mlckFREEPXD) {
pxdlock = (pxdlock_t *) maplock;
xaddr = addressPXD(&pxdlock->pxd);
xlen = lengthPXD(&pxdlock->pxd);
dbUpdatePMap(ipbmap, TRUE, xaddr, (s64) xlen,
tblk);
jFYI(1,
("freePMap: xaddr:0x%lx xlen:%d\n",
(ulong) xaddr, xlen));
} else { /* (maplock->flag & mlckALLOCPXDLIST) */
pxdlistlock = (xdlistlock_t *) maplock;
pxd = pxdlistlock->xdlist;
for (n = 0; n < pxdlistlock->count; n++, pxd++) {
xaddr = addressPXD(pxd);
xlen = lengthPXD(pxd);
dbUpdatePMap(ipbmap, TRUE, xaddr,
(s64) xlen, tblk);
jFYI(1,
("freePMap: xaddr:0x%lx xlen:%d\n",
(ulong) xaddr, xlen));
}
}
}
/*
* free from working map;
*/
if (maptype == COMMIT_PWMAP || maptype == COMMIT_WMAP) {
if (maplock->flag & mlckFREEXADLIST) {
xadlistlock = (xdlistlock_t *) maplock;
xad = xadlistlock->xdlist;
for (n = 0; n < xadlistlock->count; n++, xad++) {
xaddr = addressXAD(xad);
xlen = lengthXAD(xad);
dbFree(ip, xaddr, (s64) xlen);
xad->flag = 0;
jFYI(1,
("freeWMap: xaddr:0x%lx xlen:%d\n",
(ulong) xaddr, xlen));
}
} else if (maplock->flag & mlckFREEPXD) {
pxdlock = (pxdlock_t *) maplock;
xaddr = addressPXD(&pxdlock->pxd);
xlen = lengthPXD(&pxdlock->pxd);
dbFree(ip, xaddr, (s64) xlen);
jFYI(1,
("freeWMap: xaddr:0x%lx xlen:%d\n",
(ulong) xaddr, xlen));
} else { /* (maplock->flag & mlckFREEPXDLIST) */
pxdlistlock = (xdlistlock_t *) maplock;
pxd = pxdlistlock->xdlist;
for (n = 0; n < pxdlistlock->count; n++, pxd++) {
xaddr = addressPXD(pxd);
xlen = lengthPXD(pxd);
dbFree(ip, xaddr, (s64) xlen);
jFYI(1,
("freeWMap: xaddr:0x%lx xlen:%d\n",
(ulong) xaddr, xlen));
}
}
}
}
/*
* txFreelock()
*
* function: remove tlock from inode anonymous locklist
*/
void txFreelock(struct inode *ip)
{
struct jfs_inode_info *jfs_ip = JFS_IP(ip);
tlock_t *xtlck, *tlck;
lid_t xlid = 0, lid;
if (!jfs_ip->atlhead)
return;
xtlck = (tlock_t *) &jfs_ip->atlhead;
while ((lid = xtlck->next)) {
tlck = lid_to_tlock(lid);
if (tlck->flag & tlckFREELOCK) {
xtlck->next = tlck->next;
txLockFree(lid);
} else {
xtlck = tlck;
xlid = lid;
}
}
if (jfs_ip->atlhead)
jfs_ip->atltail = xlid;
else {
jfs_ip->atltail = 0;
/*
* If inode was on anon_list, remove it
*/
TXN_LOCK();
list_del_init(&jfs_ip->anon_inode_list);
TXN_UNLOCK();
}
}
/*
* txAbort()
*
* function: abort tx before commit;
*
 * frees line-locks and segment locks for all
 * segments in the comdata structure.
 * Optionally sets state of file-system to FM_DIRTY in super-block.
 * log ages of in-memory page frames held by the caller
 * are reset to 0 (to avoid log wrap).
*/
void txAbort(tid_t tid, int dirty)
{
lid_t lid, next;
metapage_t *mp;
tblock_t *tblk = tid_to_tblock(tid);
jEVENT(1, ("txAbort: tid:%d dirty:0x%x\n", tid, dirty));
/*
* free tlocks of the transaction
*/
for (lid = tblk->next; lid; lid = next) {
next = lid_to_tlock(lid)->next;
mp = lid_to_tlock(lid)->mp;
if (mp) {
mp->lid = 0;
/*
 * reset lsn of page to avoid log wrap:
*
 * (the page may have been committed by other
 * transaction(s) but has not yet been written home, i.e.,
 * it may be on the logsync list even though it has not
 * been logged for the current tx.)
*/
if (mp->xflag & COMMIT_PAGE && mp->lsn)
LogSyncRelease(mp);
}
/* insert tlock at head of freelist */
TXN_LOCK();
txLockFree(lid);
TXN_UNLOCK();
}
/* caller will free the transaction block */
tblk->next = tblk->last = 0;
/*
* mark filesystem dirty
*/
if (dirty)
updateSuper(tblk->sb, FM_DIRTY);
return;
}
/*
* txAbortCommit()
*
* function: abort commit.
*
 * frees tlocks of the transaction; line-locks and segment locks for all
 * segments in the comdata structure. frees malloc storage.
 * sets state of file-system to FM_DIRTY in super-block.
 * log ages of in-memory page frames held by the caller
 * are reset to 0 (to avoid log wrap).
*/
void txAbortCommit(commit_t * cd, int exval)
{
tblock_t *tblk;
tid_t tid;
lid_t lid, next;
metapage_t *mp;
assert(exval == EIO || exval == ENOMEM);
jEVENT(1, ("txAbortCommit: cd:0x%p\n", cd));
/*
* free tlocks of the transaction
*/
tid = cd->tid;
tblk = tid_to_tblock(tid);
for (lid = tblk->next; lid; lid = next) {
next = lid_to_tlock(lid)->next;
mp = lid_to_tlock(lid)->mp;
if (mp) {
mp->lid = 0;
/*
 * reset lsn of page to avoid log wrap;
*/
if (mp->xflag & COMMIT_PAGE)
LogSyncRelease(mp);
}
/* insert tlock at head of freelist */
TXN_LOCK();
txLockFree(lid);
TXN_UNLOCK();
}
tblk->next = tblk->last = 0;
/* free the transaction block */
txEnd(tid);
/*
* mark filesystem dirty
*/
updateSuper(cd->sb, FM_DIRTY);
}
/*
* txLazyCommit(void)
*
* All transactions except those changing ipimap (COMMIT_FORCE) are
 * processed by this routine. This ensures that the inode and block
* allocation maps are updated in order. For synchronous transactions,
* let the user thread finish processing after txUpdateMap() is called.
*/
void txLazyCommit(tblock_t * tblk)
{
log_t *log;
while (((tblk->flag & tblkGC_READY) == 0) &&
((tblk->flag & tblkGC_UNLOCKED) == 0)) {
/* We must have gotten ahead of the user thread
*/
jFYI(1,
("jfs_lazycommit: tblk 0x%p not unlocked\n", tblk));
schedule();
}
jFYI(1, ("txLazyCommit: processing tblk 0x%p\n", tblk));
txUpdateMap(tblk);
log = (log_t *) JFS_SBI(tblk->sb)->log;
spin_lock_irq(&log->gclock); // LOGGC_LOCK
tblk->flag |= tblkGC_COMMITTED;
if ((tblk->flag & tblkGC_READY) || (tblk->flag & tblkGC_LAZY))
log->gcrtc--;
if (tblk->flag & tblkGC_READY)
wake_up(&tblk->gcwait); // LOGGC_WAKEUP
spin_unlock_irq(&log->gclock); // LOGGC_UNLOCK
if (tblk->flag & tblkGC_LAZY) {
txUnlock(tblk, 0);
tblk->flag &= ~tblkGC_LAZY;
txEnd(tblk - TxBlock); /* Convert back to tid */
}
jFYI(1, ("txLazyCommit: done: tblk = 0x%p\n", tblk));
}
/*
* jfs_lazycommit(void)
*
* To be run as a kernel daemon. If lbmIODone is called in an interrupt
* context, or where blocking is not wanted, this routine will process
* committed transactions from the unlock queue.
*/
int jfs_lazycommit(void)
{
int WorkDone;
tblock_t *tblk;
unsigned long flags;
lock_kernel();
daemonize();
current->tty = NULL;
strcpy(current->comm, "jfsCommit");
unlock_kernel();
jfsCommitTask = current;
spin_lock_irq(&current->sigmask_lock);
siginitsetinv(&current->blocked,
sigmask(SIGHUP) | sigmask(SIGKILL) | sigmask(SIGSTOP)
| sigmask(SIGCONT));
spin_unlock_irq(&current->sigmask_lock);
LAZY_LOCK_INIT();
TxAnchor.unlock_queue = TxAnchor.unlock_tail = 0;
complete(&jfsIOwait);
do {
restart:
WorkDone = 0;
while ((tblk = TxAnchor.unlock_queue)) {
/*
* We can't get ahead of user thread. Spinning is
* simpler than blocking/waking. We shouldn't spin
* very long, since user thread shouldn't be blocking
* between lmGroupCommit & txEnd.
*/
WorkDone = 1;
LAZY_LOCK(flags);
/*
* Remove first transaction from queue
*/
TxAnchor.unlock_queue = tblk->cqnext;
tblk->cqnext = 0;
if (TxAnchor.unlock_tail == tblk)
TxAnchor.unlock_tail = 0;
LAZY_UNLOCK(flags);
txLazyCommit(tblk);
/*
 * We can be running indefinitely if other processors
* are adding transactions to this list
*/
if (need_resched()) {
current->state = TASK_RUNNING;
schedule();
}
}
if (WorkDone)
goto restart;
set_current_state(TASK_INTERRUPTIBLE);
schedule();
} while (!jfs_thread_stopped());
if (TxAnchor.unlock_queue)
jERROR(1, ("jfs_lazycommit being killed with pending transactions!\n"));
else
jFYI(1, ("jfs_lazycommit being killed\n"));
complete(&jfsIOwait);
return 0;
}
void txLazyUnlock(tblock_t * tblk)
{
unsigned long flags;
LAZY_LOCK(flags);
if (TxAnchor.unlock_tail)
TxAnchor.unlock_tail->cqnext = tblk;
else
TxAnchor.unlock_queue = tblk;
TxAnchor.unlock_tail = tblk;
tblk->cqnext = 0;
LAZY_UNLOCK(flags);
wake_up_process(jfsCommitTask);
}
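/*
 * txLazyUnlock() is the producer side of the unlock queue: it appends
 * a committed tblock under LAZY_LOCK and wakes jfsCommitTask, while
 * jfs_lazycommit() above consumes entries from the head of the queue
 * and runs txLazyCommit() on each.
 */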
static void LogSyncRelease(metapage_t * mp)
{
log_t *log = mp->log;
assert(atomic_read(&mp->nohomeok));
assert(log);
atomic_dec(&mp->nohomeok);
if (atomic_read(&mp->nohomeok))
return;
hold_metapage(mp, 0);
LOGSYNC_LOCK(log);
mp->log = NULL;
mp->lsn = 0;
mp->clsn = 0;
log->count--;
list_del_init(&mp->synclist);
LOGSYNC_UNLOCK(log);
release_metapage(mp);
}
/*
* jfs_sync(void)
*
* To be run as a kernel daemon. This is awakened when tlocks run low.
* We write any inodes that have anonymous tlocks so they will become
* available.
*/
int jfs_sync(void)
{
struct inode *ip;
struct jfs_inode_info *jfs_ip;
lock_kernel();
daemonize();
current->tty = NULL;
strcpy(current->comm, "jfsSync");
unlock_kernel();
jfsSyncTask = current;
spin_lock_irq(&current->sigmask_lock);
siginitsetinv(&current->blocked,
sigmask(SIGHUP) | sigmask(SIGKILL) | sigmask(SIGSTOP)
| sigmask(SIGCONT));
spin_unlock_irq(&current->sigmask_lock);
complete(&jfsIOwait);
do {
/*
* write each inode on the anonymous inode list
*/
TXN_LOCK();
while (TlocksLow && !list_empty(&TxAnchor.anon_list)) {
jfs_ip = list_entry(TxAnchor.anon_list.next,
struct jfs_inode_info,
anon_inode_list);
ip = &jfs_ip->vfs_inode;
/*
* We must release the TXN_LOCK since our
* IWRITE_TRYLOCK implementation may still block
*/
TXN_UNLOCK();
if (IWRITE_TRYLOCK(ip)) {
/*
* inode will be removed from anonymous list
* when it is committed
*/
jfs_commit_inode(ip, 0);
IWRITE_UNLOCK(ip);
/*
* Just to be safe. I don't know how
* long we can run without blocking
*/
if (need_resched()) {
current->state = TASK_RUNNING;
schedule();
}
TXN_LOCK();
} else {
/* We can't get the write lock. It may
 * be held by a thread waiting for tlocks,
* so let's not block here. Save it to
* put back on the anon_list.
*/
/*
* We released TXN_LOCK, let's make sure
* this inode is still there
*/
TXN_LOCK();
if (TxAnchor.anon_list.next !=
&jfs_ip->anon_inode_list)
continue;
/* Take off anon_list */
list_del(&jfs_ip->anon_inode_list);
/* Put on anon_list2 */
list_add(&jfs_ip->anon_inode_list,
&TxAnchor.anon_list2);
}
}
/* Add anon_list2 back to anon_list */
if (!list_empty(&TxAnchor.anon_list2)) {
list_splice(&TxAnchor.anon_list2, &TxAnchor.anon_list);
INIT_LIST_HEAD(&TxAnchor.anon_list2);
}
TXN_UNLOCK();
set_current_state(TASK_INTERRUPTIBLE);
schedule();
} while (!jfs_thread_stopped());
jFYI(1, ("jfs_sync being killed\n"));
complete(&jfsIOwait);
return 0;
}
#ifdef CONFIG_PROC_FS
int jfs_txanchor_read(char *buffer, char **start, off_t offset, int length,
int *eof, void *data)
{
int len = 0;
off_t begin;
char *freewait;
char *freelockwait;
char *lowlockwait;
freewait =
waitqueue_active(&TxAnchor.freewait) ? "active" : "empty";
freelockwait =
waitqueue_active(&TxAnchor.freelockwait) ? "active" : "empty";
lowlockwait =
waitqueue_active(&TxAnchor.lowlockwait) ? "active" : "empty";
len += sprintf(buffer,
"JFS TxAnchor\n"
"============\n"
"freetid = %d\n"
"freewait = %s\n"
"freelock = %d\n"
"freelockwait = %s\n"
"lowlockwait = %s\n"
"tlocksInUse = %d\n"
"unlock_queue = 0x%p\n"
"unlock_tail = 0x%p\n",
TxAnchor.freetid,
freewait,
TxAnchor.freelock,
freelockwait,
lowlockwait,
TxAnchor.tlocksInUse,
TxAnchor.unlock_queue,
TxAnchor.unlock_tail);
begin = offset;
*start = buffer + begin;
len -= begin;
if (len > length)
len = length;
else
*eof = 1;
if (len < 0)
len = 0;
return len;
}
#endif
/*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/
/*
* Change History :
*
*/
#ifndef _H_JFS_TXNMGR
#define _H_JFS_TXNMGR
/*
* jfs_txnmgr.h: transaction manager
*/
#include "jfs_logmgr.h"
/*
* Hide implementation of TxBlock and TxLock
*/
#define tid_to_tblock(tid) (&TxBlock[tid])
#define lid_to_tlock(lid) (&TxLock[lid])
/*
* transaction block
*/
typedef struct tblock {
/*
* tblock_t and jbuf_t common area: struct logsyncblk
*
* the following 5 fields are the same as struct logsyncblk
* which is common to tblock and jbuf to form logsynclist
*/
u16 xflag; /* tx commit type */
u16 flag; /* tx commit state */
lid_t dummy; /* Must keep structures common */
s32 lsn; /* recovery lsn */
struct list_head synclist; /* logsynclist link */
/* lock management */
struct super_block *sb; /* 4: super block */
lid_t next; /* 2: index of first tlock of tid */
lid_t last; /* 2: index of last tlock of tid */
wait_queue_head_t waitor; /* 4: tids waiting on this tid */
/* log management */
u32 logtid; /* 4: log transaction id */
/* (32) */
/* commit management */
struct tblock *cqnext; /* 4: commit queue link */
s32 clsn; /* 4: commit lsn */
struct lbuf *bp; /* 4: */
s32 pn; /* 4: commit record log page number */
s32 eor; /* 4: commit record eor */
wait_queue_head_t gcwait; /* 4: group commit event list:
* ready transactions wait on this
* event for group commit completion.
*/
struct inode *ip; /* 4: inode being created or deleted */
s32 rsrvd; /* 4: */
} tblock_t; /* (64) */
extern struct tblock *TxBlock; /* transaction block table */
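/*
 * Illustrative sketch (not in the original header; assumes tid_t from
 * jfs_types.h is visible, per its first-include note): since TxBlock
 * is a flat array, a tblock pointer maps back to its tid by pointer
 * arithmetic -- the inverse of tid_to_tblock(), which txLazyCommit()
 * spells as "tblk - TxBlock".
 */
static inline tid_t tblock_to_tid(struct tblock *tblk)
{
	return (tid_t) (tblk - TxBlock);
}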
/* commit flags: tblk->xflag */
#define COMMIT_SYNC 0x0001 /* synchronous commit */
#define COMMIT_FORCE 0x0002 /* force pageout at end of commit */
#define COMMIT_FLUSH 0x0004 /* init flush at end of commit */
#define COMMIT_MAP 0x00f0
#define COMMIT_PMAP 0x0010 /* update pmap */
#define COMMIT_WMAP 0x0020 /* update wmap */
#define COMMIT_PWMAP 0x0040 /* update pwmap */
#define COMMIT_FREE 0x0f00
#define COMMIT_DELETE 0x0100 /* inode delete */
#define COMMIT_TRUNCATE 0x0200 /* file truncation */
#define COMMIT_CREATE 0x0400 /* inode create */
#define COMMIT_LAZY 0x0800 /* lazy commit */
#define COMMIT_PAGE 0x1000 /* Identifies element as metapage */
#define COMMIT_INODE 0x2000 /* Identifies element as inode */
/* group commit flags tblk->flag: see jfs_logmgr.h */
/*
* transaction lock
*/
typedef struct tlock {
lid_t next; /* index next lockword on tid locklist
* next lockword on freelist
*/
tid_t tid; /* transaction id holding lock */
u16 flag; /* 2: lock control */
u16 type; /* 2: log type */
struct metapage *mp; /* 4: object page buffer locked */
struct inode *ip; /* 4: object */
/* (16) */
s16 lock[24]; /* 48: overlay area */
} tlock_t; /* (64) */
extern struct tlock *TxLock; /* transaction lock table */
/*
* tlock flag
*/
/* txLock state */
#define tlckPAGELOCK 0x8000
#define tlckINODELOCK 0x4000
#define tlckLINELOCK 0x2000
#define tlckINLINELOCK 0x1000
/* lmLog state */
#define tlckLOG 0x0800
/* updateMap state */
#define tlckUPDATEMAP 0x0080
/* freeLock state */
#define tlckFREELOCK 0x0008
#define tlckWRITEPAGE 0x0004
#define tlckFREEPAGE 0x0002
/*
* tlock type
*/
#define tlckTYPE 0xfe00
#define tlckINODE 0x8000
#define tlckXTREE 0x4000
#define tlckDTREE 0x2000
#define tlckMAP 0x1000
#define tlckEA 0x0800
#define tlckACL 0x0400
#define tlckDATA 0x0200
#define tlckBTROOT 0x0100
#define tlckOPERATION 0x00ff
#define tlckGROW 0x0001 /* file grow */
#define tlckREMOVE 0x0002 /* file delete */
#define tlckTRUNCATE 0x0004 /* file truncate */
#define tlckRELOCATE 0x0008 /* file/directory relocate */
#define tlckENTRY 0x0001 /* directory insert/delete */
#define tlckEXTEND 0x0002 /* directory extend in-line */
#define tlckSPLIT 0x0010 /* split page */
#define tlckNEW 0x0020 /* new page from split */
#define tlckFREE 0x0040 /* free page */
#define tlckRELINK 0x0080 /* update sibling pointer */
/*
* linelock for lmLog()
*
* note: linelock_t and its variations are overlaid
* at tlock.lock: watch for alignment;
*/
typedef struct {
u8 offset; /* 1: */
u8 length; /* 1: */
} lv_t; /* (2) */
#define TLOCKSHORT 20
#define TLOCKLONG 28
typedef struct {
u16 next; /* 2: next linelock */
s8 maxcnt; /* 1: */
s8 index; /* 1: */
u16 flag; /* 2: */
u8 type; /* 1: */
u8 l2linesize; /* 1: log2 of linesize */
/* (8) */
lv_t lv[20]; /* 40: */
} linelock_t; /* (48) */
#define dtlock_t linelock_t
#define itlock_t linelock_t
typedef struct {
u16 next; /* 2: */
s8 maxcnt; /* 1: */
s8 index; /* 1: */
u16 flag; /* 2: */
u8 type; /* 1: */
u8 l2linesize; /* 1: log2 of linesize */
/* (8) */
lv_t header; /* 2: */
lv_t lwm; /* 2: low water mark */
lv_t hwm; /* 2: high water mark */
lv_t twm; /* 2: */
/* (16) */
s32 pxdlock[8]; /* 32: */
} xtlock_t; /* (48) */
/*
* maplock for txUpdateMap()
*
* note: maplock_t and its variations are overlaid
* at tlock.lock/linelock: watch for alignment;
* N.B. next field may be set by linelock, and should not
* be modified by maplock;
* N.B. index of the first pxdlock specifies index of next
* free maplock (i.e., number of maplock) in the tlock;
*/
typedef struct {
u16 next; /* 2: */
u8 maxcnt; /* 1: */
u8 index; /* 1: next free maplock index */
u16 flag; /* 2: */
u8 type; /* 1: */
u8 count; /* 1: number of pxd/xad */
/* (8) */
pxd_t pxd; /* 8: */
} maplock_t; /* (16): */
/* maplock flag */
#define mlckALLOC 0x00f0
#define mlckALLOCXADLIST 0x0080
#define mlckALLOCPXDLIST 0x0040
#define mlckALLOCXAD 0x0020
#define mlckALLOCPXD 0x0010
#define mlckFREE 0x000f
#define mlckFREEXADLIST 0x0008
#define mlckFREEPXDLIST 0x0004
#define mlckFREEXAD 0x0002
#define mlckFREEPXD 0x0001
#define pxdlock_t maplock_t
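/*
 * Usage sketch (illustrative; assumes the pxd macros from jfs_types.h
 * have been included first, per its header note): recording a
 * single-extent free in the tlock overlay fills one pxdlock and sets
 * index to 1, mirroring what txEA() does in jfs_txnmgr.c for an
 * out-of-line EA extent.
 */
static inline void pxdlock_record_free(pxdlock_t * pxdlock, s64 xaddr,
				       int xlen)
{
	pxdlock->flag = mlckFREEPXD;	/* free, single pxd */
	PXDaddress(&pxdlock->pxd, xaddr);
	PXDlength(&pxdlock->pxd, xlen);
	pxdlock->index = 1;		/* one maplock entry in use */
}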
typedef struct {
u16 next; /* 2: */
u8 maxcnt; /* 1: */
u8 index; /* 1: */
u16 flag; /* 2: */
u8 type; /* 1: */
u8 count; /* 1: number of pxd/xad */
/* (8) */
void *xdlist; /* 4: pxd/xad list */
s32 rsrvd; /* 4: */
} xdlistlock_t; /* (16): */
/*
* commit
*
* parameter to the commit manager routines
*/
typedef struct commit {
tid_t tid; /* 4: tid = index of tblock */
int flag; /* 4: flags */
log_t *log; /* 4: log */
struct super_block *sb; /* 4: superblock */
int nip; /* 4: number of entries in iplist */
struct inode **iplist; /* 4: list of pointers to inodes */
/* (32) */
/* log record descriptor on 64-bit boundary */
lrd_t lrd; /* : log record descriptor */
} commit_t;
/*
* external declarations
*/
extern tlock_t *txLock(tid_t tid, struct inode *ip, struct metapage *mp, int flag);
extern tlock_t *txMaplock(tid_t tid, struct inode *ip, int flag);
extern int txCommit(tid_t tid, int nip, struct inode **iplist, int flag);
extern tid_t txBegin(struct super_block *sb, int flag);
extern void txBeginAnon(struct super_block *sb);
extern void txEnd(tid_t tid);
extern void txAbort(tid_t tid, int dirty);
extern linelock_t *txLinelock(linelock_t * tlock);
extern void txFreeMap(struct inode *ip,
maplock_t * maplock, tblock_t * tblk, int maptype);
extern void txEA(tid_t tid, struct inode *ip, dxd_t * oldea, dxd_t * newea);
extern void txFreelock(struct inode *ip);
extern int lmLog(log_t * log, tblock_t * tblk, lrd_t * lrd, tlock_t * tlck);
#endif /* _H_JFS_TXNMGR */
/*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/
#ifndef _H_JFS_TYPES
#define _H_JFS_TYPES
/*
* jfs_types.h:
*
* basic type/utility definitions
*
* note: this header file must be the 1st include file
 * of the JFS include list in every JFS .c file.
*/
#include <linux/types.h>
#include <linux/nls.h>
#include "endian24.h"
/*
* transaction and lock id's
*/
typedef uint tid_t;
typedef uint lid_t;
/*
* Almost identical to Linux's timespec, but not quite
*/
struct timestruc_t {
u32 tv_sec;
u32 tv_nsec;
};
/*
* handy
*/
#define LEFTMOSTONE 0x80000000
#define HIGHORDER 0x80000000u /* high order bit on */
#define ONES 0xffffffffu /* all bits on */
typedef int boolean_t;
#define TRUE 1
#define FALSE 0
/*
* logical xd (lxd)
*/
typedef struct {
unsigned len:24;
unsigned off1:8;
u32 off2;
} lxd_t;
/* lxd_t field construction */
#define LXDlength(lxd, length32) ( (lxd)->len = length32 )
#define LXDoffset(lxd, offset64)\
{\
(lxd)->off1 = ((s64)offset64) >> 32;\
(lxd)->off2 = (offset64) & 0xffffffff;\
}
/* lxd_t field extraction */
#define lengthLXD(lxd) ( (lxd)->len )
#define offsetLXD(lxd)\
( ((s64)((lxd)->off1)) << 32 | (lxd)->off2 )
/* lxd list */
typedef struct {
s16 maxnlxd;
s16 nlxd;
lxd_t *lxd;
} lxdlist_t;
/*
* physical xd (pxd)
*/
typedef struct {
unsigned len:24;
unsigned addr1:8;
u32 addr2;
} pxd_t;
/* xd_t field construction */
#define PXDlength(pxd, length32) ((pxd)->len = __cpu_to_le24(length32))
#define PXDaddress(pxd, address64)\
{\
(pxd)->addr1 = ((s64)address64) >> 32;\
(pxd)->addr2 = __cpu_to_le32((address64) & 0xffffffff);\
}
/* xd_t field extraction */
#define lengthPXD(pxd) __le24_to_cpu((pxd)->len)
#define addressPXD(pxd)\
( ((s64)((pxd)->addr1)) << 32 | __le32_to_cpu((pxd)->addr2))
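/*
 * Worked example (hypothetical values): a pxd packs a 24-bit length
 * and a 40-bit block address; the high 8 address bits land in addr1
 * and the low 32 bits, little-endian, in addr2.
 */
static inline void pxd_example(pxd_t * pxd)
{
	s64 blkno = 0x123456789aLL;	/* hypothetical 40-bit block number */

	PXDaddress(pxd, blkno);	/* addr1 = 0x12, addr2 = le32(0x3456789a) */
	PXDlength(pxd, 16);	/* extent of 16 filesystem blocks */
	/* now addressPXD(pxd) == blkno and lengthPXD(pxd) == 16 */
}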
/* pxd list */
typedef struct {
s16 maxnpxd;
s16 npxd;
pxd_t pxd[8];
} pxdlist_t;
/*
* data extent descriptor (dxd)
*/
typedef struct {
unsigned flag:8; /* 1: flags */
unsigned rsrvd:24; /* 3: */
u32 size; /* 4: size in bytes */
unsigned len:24; /* 3: length in unit of fsblksize */
unsigned addr1:8; /* 1: address in unit of fsblksize */
u32 addr2; /* 4: address in unit of fsblksize */
} dxd_t; /* - 16 - */
/* dxd_t flags */
#define DXD_INDEX 0x80 /* B+-tree index */
#define DXD_INLINE 0x40 /* in-line data extent */
#define DXD_EXTENT 0x20 /* out-of-line single extent */
#define DXD_FILE 0x10 /* out-of-line file (inode) */
#define DXD_CORRUPT 0x08 /* Inconsistency detected */
/* dxd_t field construction
* Conveniently, the PXD macros work for DXD
*/
#define DXDlength PXDlength
#define DXDaddress PXDaddress
#define lengthDXD lengthPXD
#define addressDXD addressPXD
/*
* directory entry argument
*/
typedef struct component_name {
int namlen;
wchar_t *name;
} component_t;
/*
* DASD limit information - stored in directory inode
*/
typedef struct dasd {
u8 thresh; /* Alert Threshold (in percent) */
u8 delta; /* Alert Threshold delta (in percent) */
u8 rsrvd1;
u8 limit_hi; /* DASD limit (in logical blocks) */
u32 limit_lo; /* DASD limit (in logical blocks) */
u8 rsrvd2[3];
u8 used_hi; /* DASD usage (in logical blocks) */
u32 used_lo; /* DASD usage (in logical blocks) */
} dasd_t;
#define DASDLIMIT(dasdp) \
(((u64)((dasdp)->limit_hi) << 32) + __le32_to_cpu((dasdp)->limit_lo))
#define setDASDLIMIT(dasdp, limit)\
{\
(dasdp)->limit_hi = ((u64)limit) >> 32;\
(dasdp)->limit_lo = __cpu_to_le32(limit);\
}
#define DASDUSED(dasdp) \
(((u64)((dasdp)->used_hi) << 32) + __le32_to_cpu((dasdp)->used_lo))
#define setDASDUSED(dasdp, used)\
{\
(dasdp)->used_hi = ((u64)used) >> 32;\
(dasdp)->used_lo = __cpu_to_le32(used);\
}
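/*
 * Sketch (an assumption about how the alert threshold is meant to be
 * used; the structure above only documents thresh as a percentage):
 * compare current usage against thresh percent of the limit.
 */
static inline int dasd_over_thresh(dasd_t * dasdp)
{
	/* nonzero when usage reaches the alert threshold */
	return DASDUSED(dasdp) * 100 >= DASDLIMIT(dasdp) * dasdp->thresh;
}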
#endif /* !_H_JFS_TYPES */
/*
*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/
/*
* Change History :
*/
/*
* jfs_umount.c
*
* note: file system in transition to aggregate/fileset:
* (ref. jfs_mount.c)
*
 * file system unmount is interpreted as unmount of the single/only
 * fileset in the aggregate and, if unmount of the last fileset,
 * as unmount of the aggregate;
*/
#include <linux/fs.h>
#include "jfs_incore.h"
#include "jfs_filsys.h"
#include "jfs_superblock.h"
#include "jfs_dmap.h"
#include "jfs_imap.h"
#include "jfs_metapage.h"
#include "jfs_debug.h"
/*
* NAME: jfs_umount(vfsp, flags, crp)
*
* FUNCTION: vfs_umount()
*
* PARAMETERS: vfsp - virtual file system pointer
* flags - unmount for shutdown
* crp - credential
*
* RETURN : EBUSY - device has open files
*/
int jfs_umount(struct super_block *sb)
{
int rc = 0;
log_t *log;
struct jfs_sb_info *sbi = JFS_SBI(sb);
struct inode *ipbmap = sbi->ipbmap;
struct inode *ipimap = sbi->ipimap;
struct inode *ipaimap = sbi->ipaimap;
struct inode *ipaimap2 = sbi->ipaimap2;
jFYI(1, ("\n UnMount JFS: sb:0x%p\n", sb));
/*
* update superblock and close log
*
* if mounted read-write and log based recovery was enabled
*/
if ((log = sbi->log)) {
/*
* close log:
*
* remove file system from log active file system list.
*/
rc = lmLogClose(sb, log);
}
/*
* close fileset inode allocation map (aka fileset inode)
*/
jEVENT(0, ("jfs_umount: close ipimap:0x%p\n", ipimap));
diUnmount(ipimap, 0);
diFreeSpecial(ipimap);
sbi->ipimap = NULL;
/*
* close secondary aggregate inode allocation map
*/
ipaimap2 = sbi->ipaimap2;
if (ipaimap2) {
jEVENT(0, ("jfs_umount: close ipaimap2:0x%p\n", ipaimap2));
diUnmount(ipaimap2, 0);
diFreeSpecial(ipaimap2);
sbi->ipaimap2 = NULL;
}
/*
* close aggregate inode allocation map
*/
ipaimap = sbi->ipaimap;
jEVENT(0, ("jfs_umount: close ipaimap:0x%p\n", ipaimap));
diUnmount(ipaimap, 0);
diFreeSpecial(ipaimap);
sbi->ipaimap = NULL;
/*
* close aggregate block allocation map
*/
jEVENT(0, ("jfs_umount: close ipbmap:%p\n", ipbmap));
dbUnmount(ipbmap, 0);
diFreeSpecial(ipbmap);
sbi->ipbmap = NULL;
/*
* ensure all file system file pages are propagated to their
* home blocks on disk (and their in-memory buffer pages are
* invalidated) BEFORE updating file system superblock state
* (to signify file system is unmounted cleanly, and thus in
* consistent state) and log superblock active file system
* list (to signify skip logredo()).
*/
if (log) /* log = NULL if read-only mount */
rc = updateSuper(sb, FM_CLEAN);
jFYI(0, (" UnMount JFS Complete: %d\n", rc));
return rc;
}
int jfs_umount_rw(struct super_block *sb)
{
struct jfs_sb_info *sbi = JFS_SBI(sb);
if (!sbi->log)
return 0;
/*
* close log:
*
* remove file system from log active file system list.
*/
lmLogClose(sb, sbi->log);
dbSync(sbi->ipbmap);
diSync(sbi->ipimap);
sbi->log = 0;
return updateSuper(sb, FM_CLEAN);
}
/*
*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/
#include <linux/fs.h>
#include <linux/slab.h>
#include "jfs_types.h"
#include "jfs_filsys.h"
#include "jfs_unicode.h"
#include "jfs_debug.h"
/*
 * NAME: jfs_strfromUCS_le()
*
* FUNCTION: Convert little-endian unicode string to character string
*
*/
int jfs_strfromUCS_le(char *to, const wchar_t * from, /* LITTLE ENDIAN */
int len, struct nls_table *codepage)
{
int i;
int outlen = 0;
for (i = 0; (i < len) && from[i]; i++) {
int charlen;
charlen =
codepage->uni2char(le16_to_cpu(from[i]), &to[outlen],
NLS_MAX_CHARSET_SIZE);
if (charlen > 0) {
outlen += charlen;
} else {
to[outlen++] = '?';
}
}
to[outlen] = 0;
jEVENT(0, ("jfs_strfromUCS returning %d - '%s'\n", outlen, to));
return outlen;
}
/*
* NAME: jfs_strtoUCS()
*
* FUNCTION: Convert character string to unicode string
*
*/
int jfs_strtoUCS(wchar_t * to,
const char *from, int len, struct nls_table *codepage)
{
int charlen;
int i;
jEVENT(0, ("jfs_strtoUCS - '%s'\n", from));
for (i = 0; len && *from; i++, from += charlen, len -= charlen) {
charlen = codepage->char2uni(from, len, &to[i]);
if (charlen < 1) {
jERROR(1, ("jfs_strtoUCS: char2uni returned %d.\n",
charlen));
jERROR(1, ("charset = %s, char = 0x%x\n",
codepage->charset, (unsigned char) *from));
to[i] = 0x003f; /* a question mark */
charlen = 1;
}
}
jEVENT(0, (" returning %d\n", i));
to[i] = 0;
return i;
}
/*
* NAME: get_UCSname()
*
* FUNCTION: Allocate and translate to unicode string
*
*/
int get_UCSname(component_t * uniName, struct dentry *dentry,
struct nls_table *nls_tab)
{
int length = dentry->d_name.len;
if (length > JFS_NAME_MAX)
return ENAMETOOLONG;
uniName->name =
kmalloc((length + 1) * sizeof(wchar_t), GFP_NOFS);
if (uniName->name == NULL)
return ENOSPC;
uniName->namlen = jfs_strtoUCS(uniName->name, dentry->d_name.name,
length, nls_tab);
return 0;
}
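/*
 * Usage sketch (hypothetical caller): translate a dentry name, use it
 * for a directory operation, then release it -- the pattern the namei
 * paths follow.
 */
static int get_UCSname_example(struct dentry *dentry,
			       struct nls_table *nls_tab)
{
	component_t dname;
	int rc;

	if ((rc = get_UCSname(&dname, dentry, nls_tab)))
		return rc;	/* ENAMETOOLONG or ENOSPC */
	/* ... pass &dname to dtSearch()/dtInsert()/dtDelete() ... */
	kfree(dname.name);	/* i.e. free_UCSname(&dname) */
	return 0;
}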
/*
* unistrk: Unicode kernel case support
*
* Function:
* Convert a unicode character to upper or lower case using
* compressed tables.
*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*
*
*/
#include <asm/byteorder.h>
#include "jfs_types.h"
typedef struct {
wchar_t start;
wchar_t end;
signed char *table;
} UNICASERANGE;
extern signed char UniUpperTable[512];
extern UNICASERANGE UniUpperRange[];
extern int get_UCSname(component_t *, struct dentry *, struct nls_table *);
extern int jfs_strfromUCS_le(char *, const wchar_t *, int, struct nls_table *);
#define free_UCSname(COMP) kfree((COMP)->name)
/*
* UniStrcpy: Copy a string
*/
static inline wchar_t *UniStrcpy(wchar_t * ucs1, const wchar_t * ucs2)
{
wchar_t *anchor = ucs1; /* save the start of result string */
while ((*ucs1++ = *ucs2++));
return anchor;
}
/*
* UniStrncpy: Copy length limited string with pad
*/
static inline wchar_t *UniStrncpy(wchar_t * ucs1, const wchar_t * ucs2,
size_t n)
{
wchar_t *anchor = ucs1;
while (n-- && *ucs2) /* Copy the strings */
*ucs1++ = *ucs2++;
n++;
while (n--) /* Pad with nulls */
*ucs1++ = 0;
return anchor;
}
/*
* UniStrncmp_le: Compare length limited string - native to little-endian
*/
static inline int UniStrncmp_le(const wchar_t * ucs1, const wchar_t * ucs2,
size_t n)
{
if (!n)
return 0; /* Null strings are equal */
while ((*ucs1 == __le16_to_cpu(*ucs2)) && *ucs1 && --n) {
ucs1++;
ucs2++;
}
return (int) *ucs1 - (int) __le16_to_cpu(*ucs2);
}
/*
* UniStrncpy_le: Copy length limited string with pad to little-endian
*/
static inline wchar_t *UniStrncpy_le(wchar_t * ucs1, const wchar_t * ucs2,
size_t n)
{
wchar_t *anchor = ucs1;
while (n-- && *ucs2) /* Copy the strings */
*ucs1++ = __le16_to_cpu(*ucs2++);
n++;
while (n--) /* Pad with nulls */
*ucs1++ = 0;
return anchor;
}
/*
* UniToupper: Convert a unicode character to upper case
*/
static inline wchar_t UniToupper(register wchar_t uc)
{
register UNICASERANGE *rp;
if (uc < sizeof(UniUpperTable)) { /* Latin characters */
return uc + UniUpperTable[uc]; /* Use base tables */
} else {
rp = UniUpperRange; /* Use range tables */
while (rp->start) {
if (uc < rp->start) /* Before start of range */
return uc; /* Uppercase = input */
if (uc <= rp->end) /* In range */
return uc + rp->table[uc - rp->start];
rp++; /* Try next range */
}
}
return uc; /* Past last range */
}
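/*
 * Worked examples (values follow the tables in jfs_uniupr.c): Latin
 * characters index UniUpperTable directly, higher code points walk
 * UniUpperRange:
 *
 *	UniToupper(0x0061) == 0x0041	'a' -> 'A' (base table, -32)
 *	UniToupper(0x03b1) == 0x0391	alpha -> ALPHA (Greek range, -32)
 *	UniToupper(0x2603) == 0x2603	not covered by any range; unchanged
 */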
/*
* UniStrupr: Upper case a unicode string
*/
static inline wchar_t *UniStrupr(register wchar_t * upin)
{
register wchar_t *up;
up = upin;
while (*up) { /* For all characters */
*up = UniToupper(*up);
up++;
}
return upin; /* Return input pointer */
}
/*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*
* jfs_uniupr.c - Unicode compressed case ranges
*
*/
#include <linux/fs.h>
#include "jfs_unicode.h"
/*
* Latin upper case
*/
signed char UniUpperTable[512] = {
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 000-00f */
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 010-01f */
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 020-02f */
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 030-03f */
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 040-04f */
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 050-05f */
0,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32, /* 060-06f */
-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32, 0, 0, 0, 0, 0, /* 070-07f */
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 080-08f */
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 090-09f */
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 0a0-0af */
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 0b0-0bf */
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 0c0-0cf */
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 0d0-0df */
-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32, /* 0e0-0ef */
-32,-32,-32,-32,-32,-32,-32, 0,-32,-32,-32,-32,-32,-32,-32,121, /* 0f0-0ff */
0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, /* 100-10f */
0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, /* 110-11f */
0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, /* 120-12f */
0, 0, 0, -1, 0, -1, 0, -1, 0, 0, -1, 0, -1, 0, -1, 0, /* 130-13f */
-1, 0, -1, 0, -1, 0, -1, 0, -1, 0, 0, -1, 0, -1, 0, -1, /* 140-14f */
0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, /* 150-15f */
0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, /* 160-16f */
0, -1, 0, -1, 0, -1, 0, -1, 0, 0, -1, 0, -1, 0, -1, 0, /* 170-17f */
0, 0, 0, -1, 0, -1, 0, 0, -1, 0, 0, 0, -1, 0, 0, 0, /* 180-18f */
0, 0, -1, 0, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, /* 190-19f */
0, -1, 0, -1, 0, -1, 0, 0, -1, 0, 0, 0, 0, -1, 0, 0, /* 1a0-1af */
-1, 0, 0, 0, -1, 0, -1, 0, 0, -1, 0, 0, 0, -1, 0, 0, /* 1b0-1bf */
0, 0, 0, 0, 0, -1, -2, 0, -1, -2, 0, -1, -2, 0, -1, 0, /* 1c0-1cf */
-1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1,-79, 0, -1, /* 1d0-1df */
0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, /* 1e0-1ef */
0, 0, -1, -2, 0, -1, 0, 0, 0, -1, 0, -1, 0, -1, 0, -1, /* 1f0-1ff */
};
/* Upper case range - Greek */
static signed char UniCaseRangeU03a0[47] = {
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,-38,-37,-37,-37, /* 3a0-3af */
0,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32, /* 3b0-3bf */
-32,-32,-31,-32,-32,-32,-32,-32,-32,-32,-32,-32,-64,-63,-63,
};
/* Upper case range - Cyrillic */
static signed char UniCaseRangeU0430[48] = {
-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32, /* 430-43f */
-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32, /* 440-44f */
0,-80,-80,-80,-80,-80,-80,-80,-80,-80,-80,-80,-80, 0,-80,-80, /* 450-45f */
};
/* Upper case range - Extended cyrillic */
static signed char UniCaseRangeU0490[61] = {
0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, /* 490-49f */
0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, /* 4a0-4af */
0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, /* 4b0-4bf */
0, 0, -1, 0, -1, 0, 0, 0, -1, 0, 0, 0, -1,
};
/* Upper case range - Extended latin and greek */
static signed char UniCaseRangeU1e00[509] = {
0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, /* 1e00-1e0f */
0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, /* 1e10-1e1f */
0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, /* 1e20-1e2f */
0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, /* 1e30-1e3f */
0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, /* 1e40-1e4f */
0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, /* 1e50-1e5f */
0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, /* 1e60-1e6f */
0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, /* 1e70-1e7f */
0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, /* 1e80-1e8f */
0, -1, 0, -1, 0, -1, 0, 0, 0, 0, 0,-59, 0, -1, 0, -1, /* 1e90-1e9f */
0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, /* 1ea0-1eaf */
0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, /* 1eb0-1ebf */
0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, /* 1ec0-1ecf */
0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, /* 1ed0-1edf */
0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, /* 1ee0-1eef */
0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, 0, 0, 0, 0, 0, /* 1ef0-1eff */
8, 8, 8, 8, 8, 8, 8, 8, 0, 0, 0, 0, 0, 0, 0, 0, /* 1f00-1f0f */
8, 8, 8, 8, 8, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 1f10-1f1f */
8, 8, 8, 8, 8, 8, 8, 8, 0, 0, 0, 0, 0, 0, 0, 0, /* 1f20-1f2f */
8, 8, 8, 8, 8, 8, 8, 8, 0, 0, 0, 0, 0, 0, 0, 0, /* 1f30-1f3f */
8, 8, 8, 8, 8, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 1f40-1f4f */
0, 8, 0, 8, 0, 8, 0, 8, 0, 0, 0, 0, 0, 0, 0, 0, /* 1f50-1f5f */
8, 8, 8, 8, 8, 8, 8, 8, 0, 0, 0, 0, 0, 0, 0, 0, /* 1f60-1f6f */
74, 74, 86, 86, 86, 86,100,100, 0, 0,112,112,126,126, 0, 0, /* 1f70-1f7f */
8, 8, 8, 8, 8, 8, 8, 8, 0, 0, 0, 0, 0, 0, 0, 0, /* 1f80-1f8f */
8, 8, 8, 8, 8, 8, 8, 8, 0, 0, 0, 0, 0, 0, 0, 0, /* 1f90-1f9f */
8, 8, 8, 8, 8, 8, 8, 8, 0, 0, 0, 0, 0, 0, 0, 0, /* 1fa0-1faf */
8, 8, 0, 9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 1fb0-1fbf */
0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 1fc0-1fcf */
8, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 1fd0-1fdf */
8, 8, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 1fe0-1fef */
0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 0, 0, 0,
};
/* Upper case range - Wide latin */
static signed char UniCaseRangeUff40[27] = {
0,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32, /* ff40-ff4f */
-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,-32,
};
/*
* Upper Case Range
*/
UNICASERANGE UniUpperRange[] = {
{ 0x03a0, 0x03ce, UniCaseRangeU03a0 },
{ 0x0430, 0x045f, UniCaseRangeU0430 },
{ 0x0490, 0x04cc, UniCaseRangeU0490 },
{ 0x1e00, 0x1ffc, UniCaseRangeU1e00 },
{ 0xff40, 0xff5a, UniCaseRangeUff40 },
{ 0, 0, 0 }
};
/*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/
/*
* Change History :
*
*/
#ifndef _H_JFS_XTREE
#define _H_JFS_XTREE
/*
* jfs_xtree.h: extent allocation descriptor B+-tree manager
*/
#include "jfs_btree.h"
/*
* extent allocation descriptor (xad)
*/
typedef struct xad {
unsigned flag:8; /* 1: flag */
unsigned rsvrd:16; /* 2: reserved */
unsigned off1:8; /* 1: offset in unit of fsblksize */
u32 off2; /* 4: offset in unit of fsblksize */
unsigned len:24; /* 3: length in unit of fsblksize */
unsigned addr1:8; /* 1: address in unit of fsblksize */
u32 addr2; /* 4: address in unit of fsblksize */
} xad_t; /* (16) */
#define MAXXLEN ((1 << 24) - 1)
#define XTSLOTSIZE 16
#define L2XTSLOTSIZE 4
/* xad_t field construction */
#define XADoffset(xad, offset64)\
{\
(xad)->off1 = ((u64)offset64) >> 32;\
(xad)->off2 = __cpu_to_le32((offset64) & 0xffffffff);\
}
#define XADaddress(xad, address64)\
{\
(xad)->addr1 = ((u64)address64) >> 32;\
(xad)->addr2 = __cpu_to_le32((address64) & 0xffffffff);\
}
#define XADlength(xad, length32) (xad)->len = __cpu_to_le24(length32)
/* xad_t field extraction */
#define offsetXAD(xad)\
( ((s64)((xad)->off1)) << 32 | __le32_to_cpu((xad)->off2))
#define addressXAD(xad)\
( ((s64)((xad)->addr1)) << 32 | __le32_to_cpu((xad)->addr2))
#define lengthXAD(xad) __le24_to_cpu((xad)->len)
/* xad list */
typedef struct {
s16 maxnxad;
s16 nxad;
xad_t *xad;
} xadlist_t;
/* xad_t flags */
#define XAD_NEW 0x01 /* new */
#define XAD_EXTENDED 0x02 /* extended */
#define XAD_COMPRESSED 0x04 /* compressed with recorded length */
#define XAD_NOTRECORDED 0x08 /* allocated but not recorded */
#define XAD_COW 0x10 /* copy-on-write */
/* possible values for maxentry */
#define XTROOTINITSLOT_DIR 6
#define XTROOTINITSLOT 10
#define XTROOTMAXSLOT 18
#define XTPAGEMAXSLOT 256
#define XTENTRYSTART 2
/*
* xtree page:
*/
typedef union {
struct xtheader {
s64 next; /* 8: */
s64 prev; /* 8: */
u8 flag; /* 1: */
u8 rsrvd1; /* 1: */
s16 nextindex; /* 2: next index = number of entries */
s16 maxentry; /* 2: max number of entries */
s16 rsrvd2; /* 2: */
pxd_t self; /* 8: self */
} header; /* (32) */
xad_t xad[XTROOTMAXSLOT]; /* 16 * maxentry: xad array */
} xtpage_t;
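/*
 * Note: the 32-byte header overlays the first two 16-byte xad slots of
 * the page, which is why entries begin at slot XTENTRYSTART (2).
 */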
/*
* external declaration
*/
extern int xtLookup(struct inode *ip, s64 lstart, s64 llen,
int *pflag, s64 * paddr, int *plen, int flag);
extern int xtLookupList(struct inode *ip, lxdlist_t * lxdlist,
xadlist_t * xadlist, int flag);
extern void xtInitRoot(tid_t tid, struct inode *ip);
extern int xtInsert(tid_t tid, struct inode *ip,
int xflag, s64 xoff, int xlen, s64 * xaddrp, int flag);
extern int xtExtend(tid_t tid, struct inode *ip, s64 xoff, int xlen,
int flag);
extern int xtTailgate(tid_t tid, struct inode *ip,
s64 xoff, int xlen, s64 xaddr, int flag);
extern int xtUpdate(tid_t tid, struct inode *ip, struct xad *nxad);
extern int xtDelete(tid_t tid, struct inode *ip, s64 xoff, int xlen,
int flag);
extern s64 xtTruncate(tid_t tid, struct inode *ip, s64 newsize, int type);
extern s64 xtTruncate_pmap(tid_t tid, struct inode *ip, s64 committed_size);
extern int xtRelocate(tid_t tid, struct inode *ip,
xad_t * oxad, s64 nxaddr, int xtype);
extern int xtAppend(tid_t tid,
struct inode *ip, int xflag, s64 xoff, int maxblocks,
int *xlenp, s64 * xaddrp, int flag);
#ifdef _JFS_DEBUG_XTREE
extern int xtDisplayTree(struct inode *ip);
extern int xtDisplayPage(struct inode *ip, s64 bn, xtpage_t * p);
#endif /* _JFS_DEBUG_XTREE */
#endif /* !_H_JFS_XTREE */
/*
*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*
* Module: jfs/namei.c
*
*/
/*
* Change History :
*
*/
#include <linux/fs.h>
#include "jfs_incore.h"
#include "jfs_inode.h"
#include "jfs_dinode.h"
#include "jfs_dmap.h"
#include "jfs_unicode.h"
#include "jfs_metapage.h"
#include "jfs_debug.h"
#include <linux/locks.h>
#include <linux/slab.h>
extern struct inode_operations jfs_file_inode_operations;
extern struct inode_operations jfs_symlink_inode_operations;
extern struct file_operations jfs_file_operations;
extern struct address_space_operations jfs_aops;
extern int jfs_fsync(struct file *, struct dentry *, int);
extern void jfs_truncate_nolock(struct inode *, loff_t);
/*
* forward references
*/
struct inode_operations jfs_dir_inode_operations;
struct file_operations jfs_dir_operations;
s64 commitZeroLink(tid_t, struct inode *);
/*
* NAME: jfs_create(dip, dentry, mode)
*
* FUNCTION: create a regular file in the parent directory <dip>
* with name = <from dentry> and mode = <mode>
*
* PARAMETER: dip - parent directory vnode
* dentry - dentry of new file
* mode - create mode (rwxrwxrwx).
*
* RETURN: Errors from subroutines
*
*/
int jfs_create(struct inode *dip, struct dentry *dentry, int mode)
{
int rc = 0;
tid_t tid; /* transaction id */
struct inode *ip = NULL; /* newly created inode */
ino_t ino;
component_t dname; /* new file name */
btstack_t btstack;
struct inode *iplist[2];
tblock_t *tblk;
jFYI(1, ("jfs_create: dip:0x%p name:%s\n", dip, dentry->d_name.name));
IWRITE_LOCK(dip);
/*
* search parent directory for entry/freespace
* (dtSearch() returns parent directory page pinned)
*/
if ((rc = get_UCSname(&dname, dentry, JFS_SBI(dip->i_sb)->nls_tab)))
goto out1;
/*
* Either iAlloc() or txBegin() may block. Deadlock can occur if we
* block there while holding dtree page, so we allocate the inode &
* begin the transaction before we search the directory.
*/
ip = ialloc(dip, mode);
if (ip == NULL) {
rc = ENOSPC;
goto out2;
}
tid = txBegin(dip->i_sb, 0);
if ((rc = dtSearch(dip, &dname, &ino, &btstack, JFS_CREATE))) {
jERROR(1, ("jfs_create: dtSearch returned %d\n", rc));
ip->i_nlink = 0;
iput(ip);
txEnd(tid);
goto out2;
}
tblk = tid_to_tblock(tid);
tblk->xflag |= COMMIT_CREATE;
tblk->ip = ip;
iplist[0] = dip;
iplist[1] = ip;
/*
* initialize the child XAD tree root in-line in inode
*/
xtInitRoot(tid, ip);
/*
 * create entry in parent directory for child file
* (dtInsert() releases parent directory page)
*/
ino = ip->i_ino;
if ((rc = dtInsert(tid, dip, &dname, &ino, &btstack))) {
jERROR(1, ("jfs_create: dtInsert returned %d\n", rc));
/* discard new inode */
ip->i_nlink = 0;
iput(ip);
if (rc == EIO)
txAbort(tid, 1); /* Marks Filesystem dirty */
else
txAbort(tid, 0); /* Filesystem full */
txEnd(tid);
goto out2;
}
ip->i_op = &jfs_file_inode_operations;
ip->i_fop = &jfs_file_operations;
ip->i_mapping->a_ops = &jfs_aops;
insert_inode_hash(ip);
mark_inode_dirty(ip);
d_instantiate(dentry, ip);
dip->i_version = ++event;
dip->i_ctime = dip->i_mtime = CURRENT_TIME;
mark_inode_dirty(dip);
rc = txCommit(tid, 2, &iplist[0], 0);
txEnd(tid);
out2:
free_UCSname(&dname);
out1:
IWRITE_UNLOCK(dip);
jFYI(1, ("jfs_create: rc:%d\n", -rc));
return -rc;
}
/*
* NAME: jfs_mkdir(dip, dentry, mode)
*
* FUNCTION: create a child directory in the parent directory <dip>
* with name = <from dentry> and mode = <mode>
*
* PARAMETER: dip - parent directory vnode
* dentry - dentry of child directory
* mode - create mode (rwxrwxrwx).
*
* RETURN: Errors from subroutines
*
* note:
 * EACCES: user needs search+write permission on the parent directory
*/
int jfs_mkdir(struct inode *dip, struct dentry *dentry, int mode)
{
int rc = 0;
tid_t tid; /* transaction id */
struct inode *ip = NULL; /* child directory inode */
ino_t ino;
component_t dname; /* child directory name */
btstack_t btstack;
struct inode *iplist[2];
tblock_t *tblk;
jFYI(1, ("jfs_mkdir: dip:0x%p name:%s\n", dip, dentry->d_name.name));
IWRITE_LOCK(dip);
/* link count overflow on parent directory ? */
if (dip->i_nlink == JFS_LINK_MAX) {
rc = EMLINK;
goto out1;
}
/*
* search parent directory for entry/freespace
* (dtSearch() returns parent directory page pinned)
*/
if ((rc = get_UCSname(&dname, dentry, JFS_SBI(dip->i_sb)->nls_tab)))
goto out1;
/*
* Either iAlloc() or txBegin() may block. Deadlock can occur if we
* block there while holding dtree page, so we allocate the inode &
* begin the transaction before we search the directory.
*/
ip = ialloc(dip, S_IFDIR | mode);
if (ip == NULL) {
rc = ENOSPC;
goto out2;
}
tid = txBegin(dip->i_sb, 0);
if ((rc = dtSearch(dip, &dname, &ino, &btstack, JFS_CREATE))) {
jERROR(1, ("jfs_mkdir: dtSearch returned %d\n", rc));
ip->i_nlink = 0;
iput(ip);
txEnd(tid);
goto out2;
}
tblk = tid_to_tblock(tid);
tblk->xflag |= COMMIT_CREATE;
tblk->ip = ip;
iplist[0] = dip;
iplist[1] = ip;
/*
* initialize the child directory in-line in inode
*/
dtInitRoot(tid, ip, dip->i_ino);
/*
* create entry in parent directory for child directory
* (dtInsert() releases parent directory page)
*/
ino = ip->i_ino;
if ((rc = dtInsert(tid, dip, &dname, &ino, &btstack))) {
jERROR(1, ("jfs_mkdir: dtInsert returned %d\n", rc));
/* discard new directory inode */
ip->i_nlink = 0;
iput(ip);
if (rc == EIO)
txAbort(tid, 1); /* Marks Filesystem dirty */
else
txAbort(tid, 0); /* Filesystem full */
txEnd(tid);
goto out2;
}
ip->i_nlink = 2; /* for '.' */
ip->i_op = &jfs_dir_inode_operations;
ip->i_fop = &jfs_dir_operations;
ip->i_mapping->a_ops = &jfs_aops;
ip->i_mapping->gfp_mask = GFP_NOFS;
insert_inode_hash(ip);
mark_inode_dirty(ip);
d_instantiate(dentry, ip);
/* update parent directory inode */
dip->i_nlink++; /* for '..' from child directory */
dip->i_version = ++event;
dip->i_ctime = dip->i_mtime = CURRENT_TIME;
mark_inode_dirty(dip);
rc = txCommit(tid, 2, &iplist[0], 0);
txEnd(tid);
out2:
free_UCSname(&dname);
out1:
IWRITE_UNLOCK(dip);
jFYI(1, ("jfs_mkdir: rc:%d\n", -rc));
return -rc;
}
/*
* NAME: jfs_rmdir(dip, dentry)
*
* FUNCTION: remove a link to child directory
*
* PARAMETER: dip - parent inode
* dentry - child directory dentry
*
* RETURN: EINVAL - if name is . or ..
* EINVAL - if . or .. exist but are invalid.
* errors from subroutines
*
* note:
* if other threads have the directory open when the last link
* is removed, the "." and ".." entries, if present, are removed before
* rmdir() returns and no new entries may be created in the directory,
* but the directory is not removed until the last reference to
* the directory is released (cf.unlink() of regular file).
*/
int jfs_rmdir(struct inode *dip, struct dentry *dentry)
{
int rc;
tid_t tid; /* transaction id */
struct inode *ip = dentry->d_inode;
ino_t ino;
component_t dname;
struct inode *iplist[2];
tblock_t *tblk;
jFYI(1, ("jfs_rmdir: dip:0x%p name:%s\n", dip, dentry->d_name.name));
IWRITE_LOCK_LIST(2, dip, ip);
/* directory must be empty to be removed */
if (!dtEmpty(ip)) {
IWRITE_UNLOCK(ip);
IWRITE_UNLOCK(dip);
rc = ENOTEMPTY;
goto out;
}
if ((rc = get_UCSname(&dname, dentry, JFS_SBI(dip->i_sb)->nls_tab))) {
IWRITE_UNLOCK(ip);
IWRITE_UNLOCK(dip);
goto out;
}
tid = txBegin(dip->i_sb, 0);
iplist[0] = dip;
iplist[1] = ip;
tblk = tid_to_tblock(tid);
tblk->xflag |= COMMIT_DELETE;
tblk->ip = ip;
/*
* delete the entry of target directory from parent directory
*/
ino = ip->i_ino;
if ((rc = dtDelete(tid, dip, &dname, &ino, JFS_REMOVE))) {
jERROR(1, ("jfs_rmdir: dtDelete returned %d\n", rc));
if (rc == EIO)
txAbort(tid, 1);
txEnd(tid);
IWRITE_UNLOCK(ip);
IWRITE_UNLOCK(dip);
goto out2;
}
/* update parent directory's link count corresponding
 * to the ".." entry of the deleted target directory
 */
dip->i_nlink--;
dip->i_ctime = dip->i_mtime = CURRENT_TIME;
dip->i_version = ++event;
mark_inode_dirty(dip);
/*
* OS/2 could have created EA and/or ACL
*/
/* free EA from both persistent and working map */
if (JFS_IP(ip)->ea.flag & DXD_EXTENT) {
/* free EA pages */
txEA(tid, ip, &JFS_IP(ip)->ea, NULL);
}
JFS_IP(ip)->ea.flag = 0;
/* free ACL from both persistent and working map */
if (JFS_IP(ip)->acl.flag & DXD_EXTENT) {
/* free ACL pages */
txEA(tid, ip, &JFS_IP(ip)->acl, NULL);
}
JFS_IP(ip)->acl.flag = 0;
/* mark the target directory as deleted */
ip->i_nlink = 0;
mark_inode_dirty(ip);
rc = txCommit(tid, 2, &iplist[0], 0);
txEnd(tid);
IWRITE_UNLOCK(ip);
/*
* Truncating the directory index table is not guaranteed. It
* may need to be done iteratively
*/
if (test_cflag(COMMIT_Stale, dip)) {
if (dip->i_size > 1)
jfs_truncate_nolock(dip, 0);
clear_cflag(COMMIT_Stale, dip);
}
IWRITE_UNLOCK(dip);
d_delete(dentry);
out2:
free_UCSname(&dname);
out:
jFYI(1, ("jfs_rmdir: rc:%d\n", rc));
return -rc;
}
/*
* NAME: jfs_unlink(dip, dentry)
*
* FUNCTION: remove a link to object <vp> named by <name>
* from parent directory <dvp>
*
* PARAMETER: dip - inode of parent directory
* dentry - dentry of object to be removed
*
* RETURN: errors from subroutines
*
* note:
* temporary file: if one or more processes have the file open
* when the last link is removed, the link will be removed before
* unlink() returns, but the removal of the file contents will be
* postponed until all references to the files are closed.
*
* JFS does NOT support unlink() on directories.
*
*/
int jfs_unlink(struct inode *dip, struct dentry *dentry)
{
int rc;
tid_t tid; /* transaction id */
struct inode *ip = dentry->d_inode;
ino_t ino;
component_t dname; /* object name */
struct inode *iplist[2];
tblock_t *tblk;
s64 new_size = 0;
int commit_flag;
jFYI(1, ("jfs_unlink: dip:0x%p name:%s\n", dip, dentry->d_name.name));
if ((rc = get_UCSname(&dname, dentry, JFS_SBI(dip->i_sb)->nls_tab)))
goto out;
IWRITE_LOCK_LIST(2, ip, dip);
tid = txBegin(dip->i_sb, 0);
iplist[0] = dip;
iplist[1] = ip;
/*
* delete the entry of target file from parent directory
*/
ino = ip->i_ino;
if ((rc = dtDelete(tid, dip, &dname, &ino, JFS_REMOVE))) {
jERROR(1, ("jfs_unlink: dtDelete returned %d\n", rc));
if (rc == EIO)
txAbort(tid, 1); /* Marks FS Dirty */
txEnd(tid);
IWRITE_UNLOCK(ip);
IWRITE_UNLOCK(dip);
goto out1;
}
ASSERT(ip->i_nlink);
ip->i_ctime = dip->i_ctime = dip->i_mtime = CURRENT_TIME;
dip->i_version = ++event;
mark_inode_dirty(dip);
/* update target's inode */
ip->i_nlink--;
mark_inode_dirty(ip);
/*
* commit zero link count object
*/
if (ip->i_nlink == 0) {
assert(!test_cflag(COMMIT_Nolink, ip));
/* free block resources */
if ((new_size = commitZeroLink(tid, ip)) < 0) {
txAbort(tid, 1); /* Marks FS Dirty */
txEnd(tid);
IWRITE_UNLOCK(ip);
IWRITE_UNLOCK(dip);
rc = -new_size; /* We return -rc */
goto out1;
}
tblk = tid_to_tblock(tid);
tblk->xflag |= COMMIT_DELETE;
tblk->ip = ip;
}
/*
 * An incomplete truncate of the file data can cause timing
 * problems unless we commit the transaction synchronously.
 */
if (new_size)
commit_flag = COMMIT_SYNC;
else
commit_flag = 0;
rc = txCommit(tid, 2, &iplist[0], commit_flag);
txEnd(tid);
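/*
 * xtTruncate_pmap() may not be able to free everything in a
 * single transaction; it returns the size still to be
 * truncated, so keep committing synchronous chunks until it
 * reports zero.
 */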
while (new_size && (rc == 0)) {
tid = txBegin(dip->i_sb, 0);
new_size = xtTruncate_pmap(tid, ip, new_size);
if (new_size < 0) {
txAbort(tid, 1); /* Marks FS Dirty */
rc = -new_size; /* We return -rc */
} else
rc = txCommit(tid, 2, &iplist[0], COMMIT_SYNC);
txEnd(tid);
}
if (!test_cflag(COMMIT_Holdlock, ip))
IWRITE_UNLOCK(ip);
/*
* Truncating the directory index table is not guaranteed. It
* may need to be done iteratively
*/
if (test_cflag(COMMIT_Stale, dip)) {
if (dip->i_size > 1)
jfs_truncate_nolock(dip, 0);
clear_cflag(COMMIT_Stale, dip);
}
IWRITE_UNLOCK(dip);
d_delete(dentry);
out1:
free_UCSname(&dname);
out:
jFYI(1, ("jfs_unlink: rc:%d\n", -rc));
return -rc;
}
/*
* NAME: commitZeroLink()
*
 * FUNCTION: for a non-directory, called by jfs_unlink() and
 * jfs_rename(): truncate a regular file or symbolic
 * link to zero length; return 0 if the type is not
 * one of these.
*
* if the file is currently associated with a VM segment
* only permanent disk and inode map resources are freed,
* and neither the inode nor indirect blocks are modified
* so that the resources can be later freed in the work
* map by ctrunc1.
* if there is no VM segment on entry, the resources are
* freed in both work and permanent map.
 * (a temporary file's memory object may remain cached even
 * after the last reference is dropped)
 *
 * PARAMETERS: tid - transaction id
 * ip - inode of the file to truncate
 *
 * RETURN: >= 0 - number of bytes still to be truncated
 * < 0 - negated errors from subroutines
*/
s64 commitZeroLink(tid_t tid, struct inode *ip)
{
int filetype, committype;
tblock_t *tblk;
jFYI(1, ("commitZeroLink: tid = %d, ip = 0x%p\n", tid, ip));
filetype = ip->i_mode & S_IFMT;
switch (filetype) {
case S_IFREG:
break;
case S_IFLNK:
/* fast symbolic link */
if (ip->i_size <= 256) {
ip->i_size = 0;
return 0;
}
break;
default:
assert(filetype != S_IFDIR);
return 0;
}
#ifdef _STILL_TO_PORT
/*
* free from block allocation map:
*
* if there is no cache control element associated with
* the file, free resources in both persistent and work map;
* otherwise just persistent map.
*/
if (ip->i_cacheid) {
committype = COMMIT_PMAP;
/* mark for iClose() to free from working map */
set_cflag(COMMIT_Freewmap, ip);
} else
committype = COMMIT_PWMAP;
#else /* _STILL_TO_PORT */
set_cflag(COMMIT_Freewmap, ip);
committype = COMMIT_PMAP;
#endif /* _STILL_TO_PORT */
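/*
 * In the Linux port only the persistent map is updated here;
 * COMMIT_Freewmap tells freeZeroLink() to release the working
 * map entries later, when the inode is finally released.
 */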
/* mark transaction of block map update type */
tblk = tid_to_tblock(tid);
tblk->xflag |= committype;
/*
* free EA
*/
if (JFS_IP(ip)->ea.flag & DXD_EXTENT) {
#ifdef _STILL_TO_PORT
/* free EA pages from cache */
if (committype == COMMIT_PWMAP)
bmExtentInvalidate(ip, addressDXD(&ip->i_ea),
lengthDXD(&ip->i_ea));
#endif /* _STILL_TO_PORT */
/* acquire maplock on EA to be freed from block map */
txEA(tid, ip, &JFS_IP(ip)->ea, NULL);
if (committype == COMMIT_PWMAP)
JFS_IP(ip)->ea.flag = 0;
}
/*
* free ACL
*/
if (JFS_IP(ip)->acl.flag & DXD_EXTENT) {
#ifdef _STILL_TO_PORT
/* free ACL pages from cache */
if (committype == COMMIT_PWMAP)
bmExtentInvalidate(ip, addressDXD(&ip->i_acl),
lengthDXD(&ip->i_acl));
#endif /* _STILL_TO_PORT */
/* acquire maplock on ACL to be freed from block map */
txEA(tid, ip, &JFS_IP(ip)->acl, NULL);
if (committype == COMMIT_PWMAP)
JFS_IP(ip)->acl.flag = 0;
}
/*
* free xtree/data (truncate to zero length):
* free xtree/data pages from cache if COMMIT_PWMAP,
* free xtree/data blocks from persistent block map, and
* free xtree/data blocks from working block map if COMMIT_PWMAP;
*/
if (ip->i_size)
return xtTruncate_pmap(tid, ip, 0);
return 0;
}
/*
* NAME: freeZeroLink()
*
 * FUNCTION: for a non-directory, called by iClose():
 * free a file's resources from the cache and the WORKING map,
 * for a file previously committed with a zero link count
 * while associated with a pager object.
 *
* PARAMETER: ip - pointer to inode of file.
*
 * RETURN: 0 - ok
*/
int freeZeroLink(struct inode *ip)
{
int rc = 0;
int type;
jFYI(1, ("freeZeroLink: ip = 0x%p\n", ip));
/* return if not reg or symbolic link or if size is
* already ok.
*/
type = ip->i_mode & S_IFMT;
switch (type) {
case S_IFREG:
break;
case S_IFLNK:
/* if it's contained in the inode, there's nothing to do */
if (ip->i_size <= 256)
return 0;
break;
default:
return 0;
}
/*
* free EA
*/
if (JFS_IP(ip)->ea.flag & DXD_EXTENT) {
s64 xaddr;
int xlen;
maplock_t maplock; /* maplock for COMMIT_WMAP */
pxdlock_t *pxdlock; /* maplock for COMMIT_WMAP */
/* free EA pages from cache */
xaddr = addressDXD(&JFS_IP(ip)->ea);
xlen = lengthDXD(&JFS_IP(ip)->ea);
#ifdef _STILL_TO_PORT
bmExtentInvalidate(ip, xaddr, xlen);
#endif
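/*
 * no transaction is running here, so build a one-entry
 * maplock on the stack and let txFreeMap() release the
 * extent from the working map directly
 */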
/* free EA extent from working block map */
maplock.index = 1;
pxdlock = (pxdlock_t *) & maplock;
pxdlock->flag = mlckFREEPXD;
PXDaddress(&pxdlock->pxd, xaddr);
PXDlength(&pxdlock->pxd, xlen);
txFreeMap(ip, pxdlock, 0, COMMIT_WMAP);
}
/*
* free ACL
*/
if (JFS_IP(ip)->acl.flag & DXD_EXTENT) {
s64 xaddr;
int xlen;
maplock_t maplock; /* maplock for COMMIT_WMAP */
pxdlock_t *pxdlock; /* maplock for COMMIT_WMAP */
/* free ACL pages from cache */
xaddr = addressDXD(&JFS_IP(ip)->acl);
xlen = lengthDXD(&JFS_IP(ip)->acl);
#ifdef _STILL_TO_PORT
bmExtentInvalidate(ip, xaddr, xlen);
#endif
/* free ACL extent from working block map */
maplock.index = 1;
pxdlock = (pxdlock_t *) & maplock;
pxdlock->flag = mlckFREEPXD;
PXDaddress(&pxdlock->pxd, xaddr);
PXDlength(&pxdlock->pxd, xlen);
txFreeMap(ip, pxdlock, 0, COMMIT_WMAP);
}
/*
* free xtree/data (truncate to zero length):
* free xtree/data pages from cache, and
* free xtree/data blocks from working block map;
*/
if (ip->i_size)
rc = xtTruncate(0, ip, 0, COMMIT_WMAP);
return rc;
}
/*
 * NAME: jfs_link(old_dentry, dir, dentry)
 *
 * FUNCTION: create a link to the object of <old_dentry> by the
 * name of <dentry> in the parent directory <dir>
 *
 * PARAMETER: old_dentry - dentry of the existing target object
 * dir - inode of the parent directory of the new link
 * dentry - dentry of the new link
*
* RETURN: Errors from subroutines
*
* note:
* JFS does NOT support link() on directories (to prevent circular
* path in the directory hierarchy);
* EPERM: the target object is a directory, and either the caller
* does not have appropriate privileges or the implementation prohibits
* using link() on directories [XPG4.2].
*
* JFS does NOT support links between file systems:
* EXDEV: target object and new link are on different file systems and
* implementation does not support links between file systems [XPG4.2].
*/
int jfs_link(struct dentry *old_dentry,
struct inode *dir, struct dentry *dentry)
{
int rc;
tid_t tid;
struct inode *ip = old_dentry->d_inode;
ino_t ino;
component_t dname;
btstack_t btstack;
struct inode *iplist[2];
jFYI(1,
("jfs_link: %s %s\n", old_dentry->d_name.name,
dentry->d_name.name));
/* The checks for links between filesystems and permissions are
handled by the VFS layer */
/* JFS does NOT support link() on directories */
if (S_ISDIR(ip->i_mode))
return -EPERM;
IWRITE_LOCK_LIST(2, dir, ip);
tid = txBegin(ip->i_sb, 0);
if (ip->i_nlink == JFS_LINK_MAX) {
rc = EMLINK;
goto out;
}
/*
* scan parent directory for entry/freespace
*/
if ((rc = get_UCSname(&dname, dentry, JFS_SBI(ip->i_sb)->nls_tab)))
goto out;
if ((rc = dtSearch(dir, &dname, &ino, &btstack, JFS_CREATE)))
goto free_dname;
/*
* create entry for new link in parent directory
*/
ino = ip->i_ino;
if ((rc = dtInsert(tid, dir, &dname, &ino, &btstack)))
goto free_dname;
dir->i_version = ++event;
/* update object inode */
ip->i_nlink++; /* for new link */
ip->i_ctime = CURRENT_TIME;
mark_inode_dirty(dir);
atomic_inc(&ip->i_count);
d_instantiate(dentry, ip);
iplist[0] = ip;
iplist[1] = dir;
rc = txCommit(tid, 2, &iplist[0], 0);
free_dname:
free_UCSname(&dname);
out:
IWRITE_UNLOCK(dir);
IWRITE_UNLOCK(ip);
txEnd(tid);
jFYI(1, ("jfs_link: rc:%d\n", rc));
return -rc;
}
/*
* NAME: jfs_symlink(dip, dentry, name)
*
* FUNCTION: creates a symbolic link to <symlink> by name <name>
* in directory <dip>
*
* PARAMETER: dip - parent directory vnode
* dentry - dentry of symbolic link
* name - the path name of the existing object
* that will be the source of the link
*
* RETURN: errors from subroutines
*
* note:
* ENAMETOOLONG: pathname resolution of a symbolic link produced
* an intermediate result whose length exceeds PATH_MAX [XPG4.2]
*/
int jfs_symlink(struct inode *dip, struct dentry *dentry, const char *name)
{
int rc;
tid_t tid;
ino_t ino = 0;
component_t dname;
int ssize; /* source pathname size */
btstack_t btstack;
struct inode *ip = dentry->d_inode;
unchar *i_fastsymlink;
s64 xlen = 0;
int bmask = 0, xsize;
s64 xaddr;
metapage_t *mp;
struct super_block *sb;
tblock_t *tblk;
struct inode *iplist[2];
jFYI(1, ("jfs_symlink: dip:0x%p name:%s\n", dip, name));
IWRITE_LOCK(dip);
ssize = strlen(name) + 1;
tid = txBegin(dip->i_sb, 0);
/*
* search parent directory for entry/freespace
* (dtSearch() returns parent directory page pinned)
*/
if ((rc = get_UCSname(&dname, dentry, JFS_SBI(dip->i_sb)->nls_tab)))
goto out1;
if ((rc = dtSearch(dip, &dname, &ino, &btstack, JFS_CREATE)))
goto out2;
/*
* allocate on-disk/in-memory inode for symbolic link:
* (iAlloc() returns new, locked inode)
*/
ip = ialloc(dip, S_IFLNK | 0777);
if (ip == NULL) {
BT_PUTSEARCH(&btstack);
rc = ENOSPC;
goto out2;
}
tblk = tid_to_tblock(tid);
tblk->xflag |= COMMIT_CREATE;
tblk->ip = ip;
/*
* create entry for symbolic link in parent directory
*/
ino = ip->i_ino;
if ((rc = dtInsert(tid, dip, &dname, &ino, &btstack))) {
jERROR(1, ("jfs_symlink: dtInsert returned %d\n", rc));
/* discard the new inode */
ip->i_nlink = 0;
iput(ip);
goto out2;
}
/* fix symlink access permission
* (dir_create() ANDs in the u.u_cmask,
* but symlinks really need to be 777 access)
*/
ip->i_mode |= 0777;
/*
* write symbolic link target path name
*/
xtInitRoot(tid, ip);
/*
* write source path name inline in on-disk inode (fast symbolic link)
*/
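/*
 * A path of up to IDATASIZE bytes fits in the inode's inline
 * data area, so no extent is allocated and the fast symlink
 * handlers (jfs_readlink/jfs_follow_link below) resolve the
 * target straight from i_inline.
 */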
if (ssize <= IDATASIZE) {
ip->i_op = &jfs_symlink_inode_operations;
i_fastsymlink = JFS_IP(ip)->i_inline;
memcpy(i_fastsymlink, name, ssize);
ip->i_size = ssize - 1;
jFYI(1,
("jfs_symlink: fast symlink added ssize:%d name:%s \n",
ssize, name));
}
/*
* write source path name in a single extent
*/
else {
jFYI(1, ("jfs_symlink: allocate extent ip:0x%p\n", ip));
ip->i_op = &page_symlink_inode_operations;
ip->i_mapping->a_ops = &jfs_aops;
/*
* even though the data of symlink object (source
* path name) is treated as non-journaled user data,
 * it is read/written through the buffer cache for performance.
*/
sb = ip->i_sb;
bmask = JFS_SBI(sb)->bsize - 1;
xsize = (ssize + bmask) & ~bmask;
xaddr = 0;
xlen = xsize >> JFS_SBI(sb)->l2bsize;
if ((rc = xtInsert(tid, ip, 0, 0, xlen, &xaddr, 0)) == 0) {
ip->i_size = ssize - 1;
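/*
 * copy the target path into the new extent one metapage
 * (PSIZE bytes) at a time
 */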
while (ssize) {
int copy_size = min(ssize, PSIZE);
mp = get_metapage(ip, xaddr, PSIZE, 1);
if (mp == NULL) {
dtDelete(tid, dip, &dname, &ino,
JFS_REMOVE);
ip->i_nlink = 0;
iput(ip);
rc = EIO;
goto out2;
}
memcpy(mp->data, name, copy_size);
flush_metapage(mp);
#if 0
mark_buffer_uptodate(bp, 1);
mark_buffer_dirty(bp, 1);
if (IS_SYNC(dip)) {
ll_rw_block(WRITE, 1, &bp);
wait_on_buffer(bp);
}
brelse(bp);
#endif /* 0 */
ssize -= copy_size;
xaddr += JFS_SBI(sb)->nbperpage;
}
ip->i_blocks = LBLK2PBLK(sb, xlen);
} else {
dtDelete(tid, dip, &dname, &ino, JFS_REMOVE);
ip->i_nlink = 0;
iput(ip);
rc = ENOSPC;
goto out2;
}
}
dip->i_version = ++event;
insert_inode_hash(ip);
mark_inode_dirty(ip);
d_instantiate(dentry, ip);
/*
* commit update of parent directory and link object
*
* if extent allocation failed (ENOSPC),
* the parent inode is committed regardless to avoid
* backing out parent directory update (by dtInsert())
* and subsequent dtDelete() which is harmless wrt
* integrity concern.
* the symlink inode will be freed by iput() at exit
* as it has a zero link count (by dtDelete()) and
 * no permanent resources.
*/
iplist[0] = dip;
if (rc == 0) {
iplist[1] = ip;
rc = txCommit(tid, 2, &iplist[0], 0);
} else
rc = txCommit(tid, 1, &iplist[0], 0);
out2:
free_UCSname(&dname);
out1:
IWRITE_UNLOCK(dip);
txEnd(tid);
jFYI(1, ("jfs_symlink: rc:%d\n", -rc));
return -rc;
}
/*
* NAME: jfs_rename
*
* FUNCTION: rename a file or directory
*/
int jfs_rename(struct inode *old_dir, struct dentry *old_dentry,
struct inode *new_dir, struct dentry *new_dentry)
{
btstack_t btstack;
ino_t ino;
component_t new_dname;
struct inode *new_ip;
component_t old_dname;
struct inode *old_ip;
int rc;
tid_t tid;
tlock_t *tlck;
dtlock_t *dtlck;
lv_t *lv;
int ipcount;
struct inode *iplist[4];
tblock_t *tblk;
s64 new_size = 0;
int commit_flag;
jFYI(1,
("jfs_rename: %s %s\n", old_dentry->d_name.name,
new_dentry->d_name.name));
old_ip = old_dentry->d_inode;
new_ip = new_dentry->d_inode;
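/*
 * write-lock every inode the rename may modify: the source,
 * both parent directories (when distinct), and any existing
 * target
 */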
if (old_dir == new_dir) {
if (new_ip)
IWRITE_LOCK_LIST(3, old_dir, old_ip, new_ip);
else
IWRITE_LOCK_LIST(2, old_dir, old_ip);
} else {
if (new_ip)
IWRITE_LOCK_LIST(4, old_dir, new_dir, old_ip,
new_ip);
else
IWRITE_LOCK_LIST(3, old_dir, new_dir, old_ip);
}
if ((rc = get_UCSname(&old_dname, old_dentry,
JFS_SBI(old_dir->i_sb)->nls_tab)))
goto out1;
if ((rc = get_UCSname(&new_dname, new_dentry,
JFS_SBI(old_dir->i_sb)->nls_tab)))
goto out2;
/*
* Make sure source inode number is what we think it is
*/
rc = dtSearch(old_dir, &old_dname, &ino, &btstack, JFS_LOOKUP);
if (rc || (ino != old_ip->i_ino)) {
rc = ENOENT;
goto out3;
}
/*
* Make sure dest inode number (if any) is what we think it is
*/
rc = dtSearch(new_dir, &new_dname, &ino, &btstack, JFS_LOOKUP);
if (rc == 0) {
if ((new_ip == 0) || (ino != new_ip->i_ino)) {
rc = ESTALE;
goto out3;
}
} else if (rc != ENOENT)
goto out3;
else if (new_ip) {
/* no entry exists, but one was expected */
rc = ESTALE;
goto out3;
}
if (S_ISDIR(old_ip->i_mode)) {
if (new_ip) {
if (!dtEmpty(new_ip)) {
rc = ENOTEMPTY;
goto out3;
}
} else if ((new_dir != old_dir) &&
(new_dir->i_nlink == JFS_LINK_MAX)) {
rc = EMLINK;
goto out3;
}
}
/*
* The real work starts here
*/
tid = txBegin(new_dir->i_sb, 0);
if (new_ip) {
/*
* Change existing directory entry to new inode number
*/
ino = new_ip->i_ino;
rc = dtModify(tid, new_dir, &new_dname, &ino,
old_ip->i_ino, JFS_RENAME);
if (rc)
goto out4;
new_ip->i_nlink--;
if (S_ISDIR(new_ip->i_mode)) {
new_ip->i_nlink--;
assert(new_ip->i_nlink == 0);
tblk = tid_to_tblock(tid);
tblk->xflag |= COMMIT_DELETE;
tblk->ip = new_ip;
} else if (new_ip->i_nlink == 0) {
assert(!test_cflag(COMMIT_Nolink, new_ip));
/* free block resources */
if ((new_size = commitZeroLink(tid, new_ip)) < 0) {
txAbort(tid, 1); /* Marks FS Dirty */
rc = -new_size; /* We return -rc */
goto out4;
}
tblk = tid_to_tblock(tid);
tblk->xflag |= COMMIT_DELETE;
tblk->ip = new_ip;
} else {
new_ip->i_ctime = CURRENT_TIME;
mark_inode_dirty(new_ip);
}
} else {
/*
* Add new directory entry
*/
rc = dtSearch(new_dir, &new_dname, &ino, &btstack,
JFS_CREATE);
if (rc) {
jERROR(1,
("jfs_rename didn't expect dtSearch to fail w/rc = %d\n",
rc));
goto out4;
}
ino = old_ip->i_ino;
rc = dtInsert(tid, new_dir, &new_dname, &ino, &btstack);
if (rc) {
jERROR(1,
("jfs_rename: dtInsert failed w/rc = %d\n",
rc));
goto out4;
}
if (S_ISDIR(old_ip->i_mode))
new_dir->i_nlink++;
}
/*
* Remove old directory entry
*/
ino = old_ip->i_ino;
rc = dtDelete(tid, old_dir, &old_dname, &ino, JFS_REMOVE);
if (rc) {
jERROR(1,
("jfs_rename did not expect dtDelete to return rc = %d\n",
rc));
txAbort(tid, 1); /* Marks Filesystem dirty */
goto out4;
}
if (S_ISDIR(old_ip->i_mode)) {
old_dir->i_nlink--;
if (old_dir != new_dir) {
/*
* Change inode number of parent for moved directory
*/
JFS_IP(old_ip)->i_dtroot.header.idotdot =
cpu_to_le32(new_dir->i_ino);
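/*
 * the dtree root is embedded in the inode, so journaling a
 * single slot of its header is enough to log the idotdot
 * change above
 */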
/* Linelock header of dtree */
tlck = txLock(tid, old_ip,
(metapage_t *) & JFS_IP(old_ip)->bxflag,
tlckDTREE | tlckBTROOT);
dtlck = (dtlock_t *) & tlck->lock;
ASSERT(dtlck->index == 0);
lv = (lv_t *) & dtlck->lv[0];
lv->offset = 0;
lv->length = 1;
dtlck->index++;
}
}
/*
* Update ctime on changed/moved inodes & mark dirty
*/
old_ip->i_ctime = CURRENT_TIME;
mark_inode_dirty(old_ip);
new_dir->i_version = ++event;
new_dir->i_ctime = CURRENT_TIME;
mark_inode_dirty(new_dir);
/* Build list of inodes modified by this transaction */
ipcount = 0;
iplist[ipcount++] = old_ip;
if (new_ip)
iplist[ipcount++] = new_ip;
iplist[ipcount++] = old_dir;
if (old_dir != new_dir) {
iplist[ipcount++] = new_dir;
old_dir->i_version = ++event;
old_dir->i_ctime = CURRENT_TIME;
mark_inode_dirty(old_dir);
}
/*
* Incomplete truncate of file data can
* result in timing problems unless we synchronously commit the
* transaction.
*/
if (new_size)
commit_flag = COMMIT_SYNC;
else
commit_flag = 0;
rc = txCommit(tid, ipcount, iplist, commit_flag);
/*
* Don't unlock new_ip if COMMIT_HOLDLOCK is set
*/
if (new_ip && test_cflag(COMMIT_Holdlock, new_ip))
new_ip = 0;
out4:
txEnd(tid);
while (new_size && (rc == 0)) {
tid = txBegin(new_ip->i_sb, 0);
new_size = xtTruncate_pmap(tid, new_ip, new_size);
if (new_size < 0) {
txAbort(tid, 1);
rc = -new_size; /* We return -rc */
} else
rc = txCommit(tid, 1, &new_ip, COMMIT_SYNC);
txEnd(tid);
}
out3:
free_UCSname(&new_dname);
out2:
free_UCSname(&old_dname);
out1:
IWRITE_UNLOCK(old_ip);
if (old_dir != new_dir)
IWRITE_UNLOCK(new_dir);
if (new_ip)
IWRITE_UNLOCK(new_ip);
/*
* Truncating the directory index table is not guaranteed. It
* may need to be done iteratively
*/
if (test_cflag(COMMIT_Stale, old_dir)) {
if (old_dir->i_size > 1)
jfs_truncate_nolock(old_dir, 0);
clear_cflag(COMMIT_Stale, old_dir);
}
IWRITE_UNLOCK(old_dir);
jFYI(1, ("jfs_rename: returning %d\n", rc));
return -rc;
}
/*
* NAME: jfs_mknod
*
* FUNCTION: Create a special file (device)
*/
int jfs_mknod(struct inode *dir, struct dentry *dentry, int mode, int rdev)
{
btstack_t btstack;
component_t dname;
ino_t ino;
struct inode *ip;
struct inode *iplist[2];
int rc;
tid_t tid;
tblock_t *tblk;
jFYI(1, ("jfs_mknod: %s\n", dentry->d_name.name));
if ((rc = get_UCSname(&dname, dentry, JFS_SBI(dir->i_sb)->nls_tab)))
goto out;
IWRITE_LOCK(dir);
ip = ialloc(dir, mode);
if (ip == NULL) {
rc = ENOSPC;
goto out1;
}
tid = txBegin(dir->i_sb, 0);
if ((rc = dtSearch(dir, &dname, &ino, &btstack, JFS_CREATE))) {
ip->i_nlink = 0;
iput(ip);
txEnd(tid);
goto out1;
}
tblk = tid_to_tblock(tid);
tblk->xflag |= COMMIT_CREATE;
tblk->ip = ip;
ino = ip->i_ino;
if ((rc = dtInsert(tid, dir, &dname, &ino, &btstack))) {
ip->i_nlink = 0;
iput(ip);
txEnd(tid);
goto out1;
}
if (S_ISREG(ip->i_mode)) {
ip->i_op = &jfs_file_inode_operations;
ip->i_fop = &jfs_file_operations;
ip->i_mapping->a_ops = &jfs_aops;
} else
init_special_inode(ip, ip->i_mode, rdev);
insert_inode_hash(ip);
mark_inode_dirty(ip);
d_instantiate(dentry, ip);
dir->i_version = ++event;
dir->i_ctime = dir->i_mtime = CURRENT_TIME;
mark_inode_dirty(dir);
iplist[0] = dir;
iplist[1] = ip;
rc = txCommit(tid, 2, iplist, 0);
txEnd(tid);
out1:
IWRITE_UNLOCK(dir);
free_UCSname(&dname);
out:
jFYI(1, ("jfs_mknod: returning %d\n", rc));
return -rc;
}
static struct dentry *jfs_lookup(struct inode *dip, struct dentry *dentry)
{
btstack_t btstack;
ino_t inum;
struct inode *ip;
component_t key;
const char *name = dentry->d_name.name;
int len = dentry->d_name.len;
int rc;
jFYI(1, ("jfs_lookup: name = %s\n", name));
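/* "." and ".." are not looked up in the dtree: "." is this
 * directory itself and ".." comes from the idotdot field via
 * PARENT()
 */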
if ((name[0] == '.') && (len == 1))
inum = dip->i_ino;
else if (strcmp(name, "..") == 0)
inum = PARENT(dip);
else {
if ((rc =
get_UCSname(&key, dentry, JFS_SBI(dip->i_sb)->nls_tab)))
return ERR_PTR(-rc);
IREAD_LOCK(dip);
rc = dtSearch(dip, &key, &inum, &btstack, JFS_LOOKUP);
IREAD_UNLOCK(dip);
free_UCSname(&key);
if (rc == ENOENT) {
d_add(dentry, NULL);
return ERR_PTR(0);
} else if (rc) {
jERROR(1,
("jfs_lookup: dtSearch returned %d\n", rc));
return ERR_PTR(-rc);
}
}
ip = iget(dip->i_sb, inum);
if (ip == NULL) {
jERROR(1,
("jfs_lookup: iget failed on inum %d\n",
(uint) inum));
return ERR_PTR(-EACCES);
}
d_add(dentry, ip);
return ERR_PTR(0);
}
struct inode_operations jfs_dir_inode_operations = {
create: jfs_create,
lookup: jfs_lookup,
link: jfs_link,
unlink: jfs_unlink,
symlink: jfs_symlink,
mkdir: jfs_mkdir,
rmdir: jfs_rmdir,
mknod: jfs_mknod,
rename: jfs_rename,
};
struct file_operations jfs_dir_operations = {
read: generic_read_dir,
readdir: jfs_readdir,
fsync: jfs_fsync,
};
/*
*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/
#include <linux/fs.h>
#include <linux/locks.h>
#include <linux/config.h>
#include <linux/module.h>
#include <linux/slab.h>
#include <asm/uaccess.h>
#include "jfs_incore.h"
#include "jfs_filsys.h"
#include "jfs_metapage.h"
#include "jfs_superblock.h"
#include "jfs_dmap.h"
#include "jfs_imap.h"
#include "jfs_debug.h"
MODULE_DESCRIPTION("The Journaled Filesystem (JFS)");
MODULE_AUTHOR("Steve Best/Dave Kleikamp/Barry Arndt, IBM");
MODULE_LICENSE("GPL");
static kmem_cache_t * jfs_inode_cachep;
static int in_shutdown;
static pid_t jfsIOthread;
static pid_t jfsCommitThread;
static pid_t jfsSyncThread;
struct task_struct *jfsIOtask;
struct task_struct *jfsCommitTask;
struct task_struct *jfsSyncTask;
DECLARE_COMPLETION(jfsIOwait);
#ifdef CONFIG_JFS_DEBUG
int jfsloglevel = 1;
MODULE_PARM(jfsloglevel, "i");
MODULE_PARM_DESC(jfsloglevel, "Specify JFS loglevel (0, 1 or 2)");
#endif
/*
* External declarations
*/
extern int jfs_mount(struct super_block *);
extern int jfs_mount_rw(struct super_block *, int);
extern int jfs_umount(struct super_block *);
extern int jfs_umount_rw(struct super_block *);
extern int jfsIOWait(void *);
extern int jfs_lazycommit(void *);
extern int jfs_sync(void *);
extern void jfs_put_inode(struct inode *inode);
extern void jfs_read_inode(struct inode *inode);
extern void jfs_dirty_inode(struct inode *inode);
extern void jfs_delete_inode(struct inode *inode);
extern void jfs_write_inode(struct inode *inode, int wait);
#if defined(CONFIG_JFS_DEBUG) && defined(CONFIG_PROC_FS)
extern void jfs_proc_init(void);
extern void jfs_proc_clean(void);
#endif
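/*
 * Helper for the JFS daemons: dequeue the pending signal and
 * report whether it is the SIGKILL sent by exit_jfs_fs() at
 * module unload, i.e. whether the thread should really exit.
 */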
int jfs_thread_stopped(void)
{
unsigned long signr;
siginfo_t info;
spin_lock_irq(&current->sigmask_lock);
signr = dequeue_signal(&current->blocked, &info);
spin_unlock_irq(&current->sigmask_lock);
if (signr == SIGKILL && in_shutdown)
return 1;
return 0;
}
static struct inode *jfs_alloc_inode(struct super_block *sb)
{
struct jfs_inode_info *jfs_inode;
jfs_inode = kmem_cache_alloc(jfs_inode_cachep, GFP_NOFS);
if (!jfs_inode)
return NULL;
return &jfs_inode->vfs_inode;
}
static void jfs_destroy_inode(struct inode *inode)
{
kmem_cache_free(jfs_inode_cachep, JFS_IP(inode));
}
static int jfs_statfs(struct super_block *sb, struct statfs *buf)
{
struct jfs_sb_info *sbi = JFS_SBI(sb);
s64 maxinodes;
imap_t *imap = JFS_IP(sbi->ipimap)->i_imap;
jFYI(1, ("In jfs_statfs\n"));
buf->f_type = JFS_SUPER_MAGIC;
buf->f_bsize = sbi->bsize;
buf->f_blocks = sbi->bmap->db_mapsize;
buf->f_bfree = sbi->bmap->db_nfree;
buf->f_bavail = sbi->bmap->db_nfree;
/*
* If we really return the number of allocated & free inodes, some
* applications will fail because they won't see enough free inodes.
 * We'll try to calculate some guess as to how many inodes we can
* really allocate
*
* buf->f_files = atomic_read(&imap->im_numinos);
* buf->f_ffree = atomic_read(&imap->im_numfree);
*/
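/*
 * The guess below: inodes already allocated, plus an inode
 * extent's worth (1 << L2INOSPEREXT inodes) for every free
 * chunk of 1 << im_l2nbperiext blocks, capped at the 32-bit
 * statfs limit.
 */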
maxinodes = min((s64) atomic_read(&imap->im_numinos) +
((sbi->bmap->db_nfree >> imap->im_l2nbperiext)
<< L2INOSPEREXT), (s64)0xffffffffLL);
buf->f_files = maxinodes;
buf->f_ffree = maxinodes - (atomic_read(&imap->im_numinos) -
atomic_read(&imap->im_numfree));
buf->f_namelen = JFS_NAME_MAX;
return 0;
}
static void jfs_put_super(struct super_block *sb)
{
struct jfs_sb_info *sbi = JFS_SBI(sb);
int rc;
jFYI(1, ("In jfs_put_super\n"));
rc = jfs_umount(sb);
if (rc) {
jERROR(1, ("jfs_umount failed with return code %d\n", rc));
}
unload_nls(sbi->nls_tab);
sbi->nls_tab = NULL;
/*
* We need to clean out the direct_inode pages since this inode
* is not in the inode hash.
*/
fsync_inode_data_buffers(sbi->direct_inode);
truncate_inode_pages(sbi->direct_mapping, 0);
iput(sbi->direct_inode);
sbi->direct_inode = NULL;
sbi->direct_mapping = NULL;
JFS_SBI(sb) = 0;
kfree(sbi);
}
static int parse_options (char * options, struct jfs_sb_info *sbi)
{
void *nls_map = NULL;
char * this_char;
char * value;
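/*
 * mount options arrive as a comma-separated list, e.g.
 * "iocharset=iso8859-1,quota"; each token may carry an
 * "=value" suffix
 */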
if (!options)
return 1;
for (this_char = strtok (options, ",");
this_char != NULL;
this_char = strtok (NULL, ",")) {
if ((value = strchr (this_char, '=')) != NULL)
*value++ = 0;
if (!strcmp (this_char, "iocharset")) {
if (!value || !*value)
goto needs_arg;
if (nls_map) /* specified iocharset twice! */
unload_nls(nls_map);
nls_map = load_nls(value);
if (!nls_map) {
printk(KERN_ERR "JFS: charset not found\n");
goto cleanup;
}
/* Silently ignore the quota options */
} else if (!strcmp (this_char, "grpquota")
|| !strcmp (this_char, "noquota")
|| !strcmp (this_char, "quota")
|| !strcmp (this_char, "usrquota"))
/* Don't do anything ;-) */ ;
else {
printk ("jfs: Unrecognized mount option %s\n", this_char);
goto cleanup;
}
}
if (nls_map) {
/* Discard old (if remount) */
if (sbi->nls_tab)
unload_nls(sbi->nls_tab);
sbi->nls_tab = nls_map;
}
return 1;
needs_arg:
printk(KERN_ERR "JFS: %s needs an argument\n", this_char);
cleanup:
if (nls_map)
unload_nls(nls_map);
return 0;
}
int jfs_remount(struct super_block *sb, int *flags, char *data)
{
struct jfs_sb_info *sbi = JFS_SBI(sb);
if (!parse_options(data, sbi)) {
return -EINVAL;
}
if ((sb->s_flags & MS_RDONLY) && !(*flags & MS_RDONLY)) {
/*
* Invalidate any previously read metadata. fsck may
* have changed the on-disk data since we mounted r/o
*/
truncate_inode_pages(sbi->direct_mapping, 0);
return jfs_mount_rw(sb, 1);
} else if ((!(sb->s_flags & MS_RDONLY)) && (*flags & MS_RDONLY))
return jfs_umount_rw(sb);
return 0;
}
static struct super_operations jfs_sops = {
alloc_inode: jfs_alloc_inode,
destroy_inode: jfs_destroy_inode,
read_inode: jfs_read_inode,
dirty_inode: jfs_dirty_inode,
write_inode: jfs_write_inode,
put_inode: jfs_put_inode,
delete_inode: jfs_delete_inode,
put_super: jfs_put_super,
statfs: jfs_statfs,
remount_fs: jfs_remount,
};
static int jfs_fill_super(struct super_block *sb, void *data, int silent)
{
struct jfs_sb_info *sbi;
struct inode *inode;
int rc;
jFYI(1, ("In jfs_fill_super: s_flags=0x%lx\n", sb->s_flags));
sbi = kmalloc(sizeof(struct jfs_sb_info), GFP_KERNEL);
if (!sbi)
return -ENOMEM;
memset(sbi, 0, sizeof(struct jfs_sb_info));
JFS_SBI(sb) = sbi;
if (!parse_options((char *)data, sbi)) {
kfree(sbi);
return -EINVAL;
}
/*
* Initialize blocksize to 4K.
*/
sb->s_blocksize = PSIZE;
sb->s_blocksize_bits = L2PSIZE;
set_blocksize(sb->s_dev, PSIZE);
sb->s_op = &jfs_sops;
/*
* Initialize direct-mapping inode/address-space
*/
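/*
 * This private inode is never hashed; it only provides an
 * address_space for raw metadata pages. i_size is set to
 * 2^40 (1 TB) so any legal metadata offset is addressable.
 */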
inode = new_inode(sb);
if (inode == NULL)
goto out_kfree;
inode->i_ino = 0;
inode->i_nlink = 1;
inode->i_size = 0x0000010000000000LL;
inode->i_mapping->a_ops = &direct_aops;
inode->i_mapping->gfp_mask = GFP_NOFS;
sbi->direct_inode = inode;
sbi->direct_mapping = inode->i_mapping;
rc = jfs_mount(sb);
if (rc) {
if (!silent) {
jERROR(1,
("jfs_mount failed w/return code = %d\n",
rc));
}
goto out_mount_failed;
}
if (sb->s_flags & MS_RDONLY)
sbi->log = 0;
else {
rc = jfs_mount_rw(sb, 0);
if (rc) {
if (!silent) {
jERROR(1,
("jfs_mount_rw failed w/return code = %d\n",
rc));
}
goto out_no_rw;
}
}
sb->s_magic = JFS_SUPER_MAGIC;
inode = iget(sb, ROOT_I);
if (!inode || is_bad_inode(inode))
goto out_no_root;
sb->s_root = d_alloc_root(inode);
if (!sb->s_root)
goto out_no_root;
if (!sbi->nls_tab)
sbi->nls_tab = load_nls_default();
sb->s_maxbytes = ((u64) sb->s_blocksize) << 40;
#if BITS_PER_LONG == 32
sb->s_maxbytes = min((u64)PAGE_CACHE_SIZE << 32, sb->s_maxbytes);
#endif
return 0;
out_no_root:
jEVENT(1, ("jfs_fill_super: get root inode failed\n"));
if (inode)
iput(inode);
out_no_rw:
rc = jfs_umount(sb);
if (rc) {
jERROR(1, ("jfs_umount failed with return code %d\n", rc));
}
out_mount_failed:
fsync_inode_data_buffers(sbi->direct_inode);
truncate_inode_pages(sbi->direct_mapping, 0);
make_bad_inode(sbi->direct_inode);
iput(sbi->direct_inode);
sbi->direct_inode = NULL;
sbi->direct_mapping = NULL;
out_kfree:
if (sbi->nls_tab)
unload_nls(sbi->nls_tab);
kfree(sbi);
return -EINVAL;
}
static struct super_block *jfs_get_sb(struct file_system_type *fs_type,
int flags, char *dev_name, void *data)
{
return get_sb_bdev(fs_type, flags, dev_name, data, jfs_fill_super);
}
static struct file_system_type jfs_fs_type = {
owner: THIS_MODULE,
name: "jfs",
get_sb: jfs_get_sb,
fs_flags: FS_REQUIRES_DEV,
};
extern int metapage_init(void);
extern int txInit(void);
extern void txExit(void);
extern void metapage_exit(void);
static void init_once(void * foo, kmem_cache_t * cachep, unsigned long flags)
{
struct jfs_inode_info *jfs_ip = (struct jfs_inode_info *) foo;
if ((flags & (SLAB_CTOR_VERIFY|SLAB_CTOR_CONSTRUCTOR)) ==
SLAB_CTOR_CONSTRUCTOR) {
INIT_LIST_HEAD(&jfs_ip->anon_inode_list);
INIT_LIST_HEAD(&jfs_ip->mp_list);
RDWRLOCK_INIT(&jfs_ip->rdwrlock);
inode_init_once(&jfs_ip->vfs_inode);
}
}
static int __init init_jfs_fs(void)
{
int rc;
printk("JFS development version: $Name: $\n");
jfs_inode_cachep =
kmem_cache_create("jfs_ip",
sizeof(struct jfs_inode_info),
0, 0, init_once, NULL);
if (jfs_inode_cachep == NULL)
return -ENOMEM;
/*
* Metapage initialization
*/
rc = metapage_init();
if (rc) {
jERROR(1, ("metapage_init failed w/rc = %d\n", rc));
goto free_slab;
}
/*
* Transaction Manager initialization
*/
rc = txInit();
if (rc) {
jERROR(1, ("txInit failed w/rc = %d\n", rc));
goto free_metapage;
}
/*
* I/O completion thread (endio)
*/
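/*
 * the single jfsIOwait completion is reused as the startup
 * (and, at unload, shutdown) handshake for all three daemons
 */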
jfsIOthread = kernel_thread(jfsIOWait, 0,
CLONE_FS | CLONE_FILES |
CLONE_SIGHAND);
if (jfsIOthread < 0) {
jERROR(1,
("init_jfs_fs: fork failed w/rc = %d\n",
jfsIOthread));
rc = -jfsIOthread;
goto end_txmngr;
}
wait_for_completion(&jfsIOwait); /* Wait until IO thread starts */
jfsCommitThread = kernel_thread(jfs_lazycommit, 0,
CLONE_FS | CLONE_FILES |
CLONE_SIGHAND);
if (jfsCommitThread < 0) {
jERROR(1,
("init_jfs_fs: fork failed w/rc = %d\n",
jfsCommitThread));
rc = -jfsCommitThread;
goto kill_iotask;
}
wait_for_completion(&jfsIOwait); /* Wait until Commit thread starts */
jfsSyncThread = kernel_thread(jfs_sync, 0,
CLONE_FS | CLONE_FILES |
CLONE_SIGHAND);
if (jfsSyncThread < 0) {
jERROR(1,
("init_jfs_fs: fork failed w/rc = %d\n",
jfsSyncThread));
rc = -jfsSyncThread;
goto kill_committask;
}
wait_for_completion(&jfsIOwait); /* Wait until Sync thread starts */
#if defined(CONFIG_JFS_DEBUG) && defined(CONFIG_PROC_FS)
jfs_proc_init();
#endif
return register_filesystem(&jfs_fs_type);
kill_committask:
send_sig(SIGKILL, jfsCommitTask, 1);
wait_for_completion(&jfsIOwait); /* Wait until Commit thread exits */
kill_iotask:
send_sig(SIGKILL, jfsIOtask, 1);
wait_for_completion(&jfsIOwait); /* Wait until IO thread exits */
end_txmngr:
txExit();
free_metapage:
metapage_exit();
free_slab:
kmem_cache_destroy(jfs_inode_cachep);
return -rc;
}
static void __exit exit_jfs_fs(void)
{
jFYI(1, ("exit_jfs_fs called\n"));
in_shutdown = 1;
txExit();
metapage_exit();
send_sig(SIGKILL, jfsIOtask, 1);
wait_for_completion(&jfsIOwait); /* Wait until IO thread exits */
send_sig(SIGKILL, jfsCommitTask, 1);
wait_for_completion(&jfsIOwait); /* Wait until Commit thread exits */
send_sig(SIGKILL, jfsSyncTask, 1);
wait_for_completion(&jfsIOwait); /* Wait until Sync thread exits */
#if defined(CONFIG_JFS_DEBUG) && defined(CONFIG_PROC_FS)
jfs_proc_clean();
#endif
unregister_filesystem(&jfs_fs_type);
kmem_cache_destroy(jfs_inode_cachep);
}
EXPORT_NO_SYMBOLS;
module_init(init_jfs_fs)
module_exit(exit_jfs_fs)
/*
*
* Copyright (c) International Business Machines Corp., 2000
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
* the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*
* JFS fast symlink handling code
*/
#include <linux/fs.h>
#include "jfs_incore.h"
static int jfs_readlink(struct dentry *, char *buffer, int buflen);
static int jfs_follow_link(struct dentry *dentry, struct nameidata *nd);
/*
* symlinks can't do much...
*/
struct inode_operations jfs_symlink_inode_operations = {
readlink: jfs_readlink,
follow_link: jfs_follow_link,
};
static int jfs_follow_link(struct dentry *dentry, struct nameidata *nd)
{
char *s = JFS_IP(dentry->d_inode)->i_inline;
return vfs_follow_link(nd, s);
}
static int jfs_readlink(struct dentry *dentry, char *buffer, int buflen)
{
char *s = JFS_IP(dentry->d_inode)->i_inline;
return vfs_readlink(dentry, buffer, buflen, s);
}
......@@ -12,7 +12,7 @@ fi
# msdos and Joliet want NLS
if [ "$CONFIG_JOLIET" = "y" -o "$CONFIG_FAT_FS" != "n" \
-o "$CONFIG_NTFS_FS" != "n" -o "$CONFIG_NCPFS_NLS" = "y" \
-o "$CONFIG_SMB_NLS" = "y" ]; then
-o "$CONFIG_SMB_NLS" = "y" -o "$CONFIG_JFS_FS" != "n" ]; then
define_bool CONFIG_NLS y
else
define_bool CONFIG_NLS n
......