- 09 Sep, 2002 2 commits
-
-
bk://thebsh.namesys.com/bk/reiser3-linux-2.5Linus Torvalds authored
into home.transmeta.com:/home/torvalds/v2.5/linux
-
Ingo Molnar authored
This avoids a crash that can be caused by a CLONE_DETACHED thread.
-
- 08 Sep, 2002 12 commits
-
-
Alexander Viro authored
There are 4 different scenarios of late boot: 1. no initrd or ROOT_DEV is ram0. That's the simplest one - we want whatever is on ROOT_DEV as final root. 2. initrd is there, ROOT_DEV is not ram0, /linuxrc on initrd doesn't exit. We want initrd mounted, /linuxrc launched and /linuxrc will mount whatever it wants, maybe do pivot_root and exec init itself. Task with PID 1 (parent of linuxrc) will sit there reaping zombies, never leaving the kernel mode. 3. initrd is there, ROOT_DEV is not ram0, /linuxrc on initrd does exit and sets real-root-dev to 256 (1:0, aka. ram0). We want initrd mounted, /linuxrc launched and we expect linuxrc to mount all stuff we need, maybe do pivot root and exit. Parent of /linuxrc (PID 1) will proceed to exec init once /linuxrc is done. 4. initrd is there, ROOT_DEV is not ram0, /linuxrc on initrd might have done something or not, but when it exits real-root-dev is not ram0. We want initrd mounted, /linuxrc launched and when it exits we are going to mount final root according to real-root-dev. If there is /initrd on the final root, initrd will be moved there. Otherwise initrd will be unmounted and its memory (if possible) freed. Then we exec init from final root. Note that we want the parent of linuxrc chrooted to initrd while linuxrc runs - otherwise things like request_module() will be rather unhappy. That goes for all variants that run linuxrc. Scenarios above go in order of increasing complexity. Let's start with #4: we had loaded initrd we mount initrd on /root we open / and /old (on initrd) chdir /root mount -- move . / chroot . Now we have initrd mounted on /, we are chrooted into it but keep opened descriptors of / and /old, so we'll be able to break out of jail later. we fork a child that will be linuxrc child closes opened descriptors, opens /dev/console, dups it to stdout and stderr, does setsid and execs /linuxrc. parent sits there reaping zombies until child is finished. Note that both parent and linuxrc are chrooted into /initrd and if linuxrc calls pivot_root, the parent will also have its root/cwd switched. OK, child is finished and after checking real_root_dev we see that it's not MKDEV(1,0). Now we know that it's scenario #4. We break out of jail, doing the following: fchdir to /old on rootfs mount --move / . fchdir to / on rootfs chroot to . That will move initrd to /old and leave us with root and cwd in / of rootfs. We can close these two descriptors now - they'd done their job. We mount final root to /root We attempt to mount -- move /old /root/initrd; if we are successful - we chdir to /root, mount --move . / and chroot to . That will leave us with * final root on / * initrd on /initrd of final root * cwd and root on final root. At that point we simply exec init. Now, if mount --move had failed, we got to clean up the mess. We unmount (with MNT_DETACH) initrd from /old and do BLKFLSBUF on ram0. After that we have final root on /root, initrd maybe still alive, but not mounted anywhere and our root/cwd in / of rootfs. Again, chdir /root mount --move . / chroot to . and we have final root mounted on /, we are chrooted into it and it's time for exec init. That's it for scenario 4. The rest will be simpler - there's less work to do. #3 diverges from #4 after linuxrc had finished and we had already broken out of jail. Whatever we got from linuxrc is mounted on /old now, so we move it back to /, get chrooted there and exec init. We could've left earlier (skipping the move to /old and move back parts), but that would lead to even messier logics in prepare_namespace() ;-/ #2 means that parent of /linuxrc never gets past waiting its child to finish. End of story. #1 is the simplest variant - it mounts final root on /root and then does usual "chdir there, mount --move . /, chroot to ." and execs init. Relevant code is in prepare_namespace()/handle_initrd() and yes, it's messy. Had been even worse... ;-/
-
Hugh Dickins authored
CONFIG_M386 kernel running on PPro+ processor with X86_FEATURE_PGE may set _PAGE_GLOBAL bit: then __flush_tlb_one must use invlpg instruction. H. J. Lu reports (LKML 8 Sept) that his P4 reboots due to this problem.
-
Ingo Molnar authored
This fixes the bootup crash. There were two initialization bugs: - INIT_SIGNAL needs to set shared_pending. - exec() needs to set up newsig properly. the second one caused the crash Anton saw.
-
Andrew Morton authored
This patch uses the atomic copy_from_user() facility in generic_file_write(). This required a change in the prepare_write/commit_write API definition. It is no longer the case that these functions will kmap the page for you. If any part of the kernel wants to get at the page in the write path, it now has to kmap it for itself. The best way to do this is with kmap_atomic(KM_USER0). This patch updates all callers. It also converts several places which were unnecessarily using kmap() over to using kmap_atomic(). The reiserfs changes here are Oleg Drokin's revised version. The patch has been tested with loop, ext2, ext3, reiserfs, jfs, minixfs, vfat, iso9660, nfs and the ramdisk driver. I haven't fixed the racy deadlock avoidance thing in generic_file_write() - the case where we take a fault when the source and dest of the copy are both the same pagecache page. There is a printk in there now which will trigger if the page was unexpectedly not present. And guess what? I get 50-100 of them when running `dbench 64' on mem=48m. This deadlock can happen.
-
Andrew Morton authored
This patch allows the kernel to hold atomic kmaps in file_read_actor(). We try to fault in the page, then take an atomic kmap. If the atomic copy_to_user() then faults, drop a printk and fall back to kmap().
-
Andrew Morton authored
The patch implements the atomic copy_*_user() function. If the kernel takes a pagefault while running copy_*_user() in an atomic region, the copy_*_user() will fail (return a short value). And with this patch, holding an atomic kmap() puts the CPU into an atomic region. - Increment preempt_count() in kmap_atomic() regardless of the setting of CONFIG_PREEMPT. The pagefault handler recognises this as an atomic region and refuses to service the fault. copy_*_user will return a non-zero value. - Attempts to propagate the in_atomic() predicate to all the other highmem-capable architectures' pagefault handlers. But the code is only tested on x86. - Fixed a PPC bug in kunmap_atomic(): it forgot to reenable preemption if HIGHMEM_DEBUG is turned on. - Fixed a sparc bug in kunmap_atomic(): it forgot to reenable preemption all the time, for non-fixmap pages. - Fix an error in <linux/highmem.h> - in the CONFIG_HIGHMEM=n case, kunmap_atomic() takes an address, not a page *.
-
Andrew Morton authored
Fix a problem noticed by Ed Tomlinson: under shifting workloads the shrink_zone() logic will refill the inactive load too slowly. Bale out of the zone scan when we've reclaimed enough pages. Fixes a rarely-occurring problem wherein refill_inactive_zone() ends up shuffling 100,000 pages and generally goes silly. This needs to be revisited - we should go on and rebalance the lower zones even if we reclaimed enough pages from highmem.
-
Andrew Morton authored
Back out the use of preempt_count to signify atomicity wrt pagefaults. We won't do it that way - in_atomic() works fine.
-
Andrew Morton authored
Fix the ENOSPC recovery code in __block_write_full_page() - Don't write out clean buffers. - Set PG_writeback before submitting the IO. Otherwise the completion handler will go BUG when it sees a non-PageWriteback page. If the IO is very fast, or synchronous.
-
Andrew Morton authored
Patch from Bjorn Helgaas, via Rusty. Change: On node 0 totalpages: 61031 <--- not including holes zone(0): 65172 pages. <--- including holes zone(1): 0 pages. ... zone(2): 0 pages. to: On node 0 totalpages: 61031 <--- not including holes DMA zone: 61031 pages <--- not including holes Normal zone: 0 pages HighMem zone: 0 pages
-
Ingo Molnar authored
Support POSIX compliant thread signals on a kernel level with usable debugging (broadcast SIGSTOP, SIGCONT) and thread group management (broadcast SIGKILL), plus to load-balance 'process' signals between threads for better signal performance. Changes: - POSIX thread semantics for signals there are 7 'types' of actions a signal can take: specific, load-balance, kill-all, kill-all+core, stop-all, continue-all and ignore. Depending on the POSIX specifications each signal has one of the types defined for both the 'handler defined' and the 'handler not defined (kernel default)' case. Here is the table: ---------------------------------------------------------- | | userspace | kernel | ---------------------------------------------------------- | SIGHUP | load-balance | kill-all | | SIGINT | load-balance | kill-all | | SIGQUIT | load-balance | kill-all+core | | SIGILL | specific | kill-all+core | | SIGTRAP | specific | kill-all+core | | SIGABRT/SIGIOT | specific | kill-all+core | | SIGBUS | specific | kill-all+core | | SIGFPE | specific | kill-all+core | | SIGKILL | n/a | kill-all | | SIGUSR1 | load-balance | kill-all | | SIGSEGV | specific | kill-all+core | | SIGUSR2 | load-balance | kill-all | | SIGPIPE | specific | kill-all | | SIGALRM | load-balance | kill-all | | SIGTERM | load-balance | kill-all | | SIGCHLD | load-balance | ignore | | SIGCONT | load-balance | continue-all | | SIGSTOP | n/a | stop-all | | SIGTSTP | load-balance | stop-all | | SIGTTIN | load-balancen | stop-all | | SIGTTOU | load-balancen | stop-all | | SIGURG | load-balance | ignore | | SIGXCPU | specific | kill-all+core | | SIGXFSZ | specific | kill-all+core | | SIGVTALRM | load-balance | kill-all | | SIGPROF | specific | kill-all | | SIGPOLL/SIGIO | load-balance | kill-all | | SIGSYS/SIGUNUSED | specific | kill-all+core | | SIGSTKFLT | specific | kill-all | | SIGWINCH | load-balance | ignore | | SIGPWR | load-balance | kill-all | | SIGRTMIN-SIGRTMAX | load-balance | kill-all | ---------------------------------------------------------- as you can see it from the list, signals that have handlers defined never get broadcasted - they are either specific or load-balanced. - CLONE_THREAD implies CLONE_SIGHAND It does not make much sense to have a thread group that does not share signal handlers. In fact in the patch i'm using the signal spinlock to lock access to the thread group. I made the siglock IRQ-safe, thus we can load-balance signals from interrupt contexts as well. (we cannot take the tasklist lock in write mode from IRQ handlers.) this is not as clean as i'd like it to be, but it's the best i could come up with so far. - thread group list management reworked. threads are now removed from the group if the thread is unhashed from the PID table. This makes the most sense. This also helps with another feature that relies on an intact thread group list: multithreaded coredumps. - child reparenting reworked. the O(N) algorithm in forget_original_parent() causes massive performance problems if a large number of threads exit from the group. Performance improves more than 10-fold if the following simple rules are followed instead: - reparent children to the *previous* thread [exiting or not] - if a thread is detached then reparent to init. - fast broadcasting of kernel-internal SIGSTOP, SIGCONT, SIGKILL, etc. kernel-internal broadcasted signals are a potential DoS problem, since they might generate massive amounts of GFP_ATOMIC allocations of siginfo structures. The important thing to note is that the siginfo structure does not actually have to be allocated and queued - the signal processing code has all the information it needs, neither of these signals carries any information in the siginfo structure. This makes a broadcast SIGKILL a very simple operation: all threads get the bit 9 set in their pending bitmask. The speedup due to this was significant - and the robustness win is invaluable. - sys_execve() should not kill off 'all other' threads. the 'exec kills all threads if the master thread does the exec()' is a POSIX(-ish) thing that should not be hardcoded in the kernel in this case. to handle POSIX exec() semantics, glibc uses a special syscall, which kills 'all but self' threads: sys_exit_allbutself(). the straightforward exec() implementation just calls sys_exit_allbutself() and then sys_execve(). (this syscall is also be used internally if the thread group leader thread sys_exit()s or sys_exec()s, to ensure the integrity of the thread group.)
-
Ivan Kokshaysky authored
Added PCI_BUS_NUM_RESOURCES as Ben suggested. Default value is 4 and can be overridden by arch (probably in asm/system.h). pci_read_bridge_bases() and pci_assign_bus_resource() changed accordingly. "for (i = 0 ; i < 4; i++)" in pci_add_new_bus() not changed, as it's used _only_ for pci-pci and cardbus bridges.
-
- 07 Sep, 2002 26 commits
-
-
Randy Hron authored
This patch is based on changes I've used for 2.5.31, 2.5.31-mm1, 2.5.32-mm1, 2.5.32-mm2, and 2.5.33-mm1. Without the patch, 2.5.x during heavy benchmark/stress testing eventually locks up with these final messages: kernel: qlogicfc0 : no handle slots, this should not happen. kernel: hostdata->queued is 6, in_ptr: 7d This is a combination of Doug Ledford's patch: http://marc.theaimsgroup.com/?l=linux-kernel&m=103005703808312&w=2 and Eric Weigle's patch: http://marc.theaimsgroup.com/?l=linux-kernel&m=103005790509079&w=2 2.5.33 (and all predecessors i've tested) locked up without it.
-
Linus Torvalds authored
-
Linus Torvalds authored
but it does...
-
Linus Torvalds authored
into home.transmeta.com:/home/torvalds/v2.5/linux
-
Alexander Viro authored
* we remove the paritition 0 from ->part[] and put the old contents of ->part[0] into gendisk itself; indexes are shifted, obviously. * ->part is allocated at add_gendisk() time and freed at del_gendisk() according to value of ->minor_shift; static arrays of hd_struct are gone from drivers, ditto for manual allocations a-la ide. As the matter of fact, none of the drivers know about struct hd_struct now.
-
Alexander Viro authored
new helpers - get_capacity(gendisk)/set_capacity(gendisk, sectors). Drivers switched to these; that eliminates most of the accesses to disk->part[]... in the drivers (and makes code more readable, while we are at it). That had caught several bugs when minor had been used in place of minor>>minor_shift (acsi.c is especially nasty in that respect; I don't know if it had ever been used with multiple devices...)
-
Alexander Viro authored
ide switched from hwif->gd[i] to hwif->drive[i]->disk - IOW, instead of array of two pointers to gendisks refered from hwif, we keep these pointers in relevant drives. Cleaned up.
-
Alexander Viro authored
SCSI cdroms got gendisks.
-
Alexander Viro authored
invalidate_buffers() pulled from cdrom ->reset() into its caller. At that point only cdrom.c using cdi->dev. That will play a bit later.
-
Alexander Viro authored
minor cleanup in cdu31a.c
-
Alexander Viro authored
mcdx.c cleaned up, uses of cdi->dev eliminated
-
Alexander Viro authored
pcd.c cleaned up, uses of cdi->dev eliminated, abuse of macros killed (it used to have #define PCD pcd[unit] #define PI PCD.pi and expected 'unit' to be local variable in each function that used these (== almost every function in there)).
-
Alexander Viro authored
Lindent pcd.c.
-
Alexander Viro authored
pcd.c - killed RR and WR macros (replaced with inlines without hidden arguments; the first step in cleanup, they were monstrous).
-
Alexander Viro authored
sbpcd.c - d eliminated, ditto for uses of cdi->dev (we set cdi->handle pointing to structure we neeed). Cleaned up a bit.
-
Alexander Viro authored
sbpcd.[c,h] - uses of D_S[d] replaced with uses of *current_drive.
-
Alexander Viro authored
sbpcd.c - sigh... It used to have a global variable inventively called 'd'. Current disk number. Tons of uses, 99% of them being D_S[d].<blah>. Added a new variable - current_drive. Said animal is equal to D_S + d - it's reassigned at the same place as d.
-
Alexander Viro authored
killed passing minors around; we always pass a pointer to structure; scsi_CDs made static. That killed uses of cdi->dev in sr.c and friends.
-
Alexander Viro authored
Global search'n'replace job - 'SCp' (Scsi_CD pointer - I'm not kidding; and yes, they spell it "Scsi") replaced with 'cd' (sr.c, sr_ioctl.c, sr_vendor.c).
-
Alexander Viro authored
sr.c: we set SCp->cdi.name from the very beginning, which allows to kill passing minors in many cases (we can use "%s...", SCp->cd.name instead of "sr%d...", minor and that turns out to be the majority of places where we use minors at all).
-
Alexander Viro authored
new helper - update_partition(disk, partition_number); does the right thing wrt devfs and driverfs (un)registration of partition entries. BLKPG ioctls fixed - now they call that beast rather than calling only devfs side. New helper - rescan_partitions(disk, bdev); does all work with wiping/rereading/etc. and fs/block_dev.c now uses it instead of check_partition(). The latter became static.
-
Alexander Viro authored
similar to ->flags and ->driverfs_dev_arr, ->de_arr[] got replaced with its (single) element + flag.
-
Alexander Viro authored
Each hd_struct used to have int number; in it. It's used _only_ in disk->part[0] - disk->part[n].number is never assigned/checked for any positive n. Moved from hd_struct to gendisk (disk->part[0].number to disk->number).
-
Alexander Viro authored
disk->driverfs_dev_arr is either NULL or consists of exactly one element. Same change as above (struct device ** -> struct device *); old "is the pointer to array itself NULL or not?" replaced with a flag (in disk->flags).
-
Alexander Viro authored
Seeing that now disk->flags[] always consists of one element, we replace char *flags with int flags, remove the junk from places that used to allocate these "arrays" and do obvious updates of the code (s/->flags[0]/->flags/).
-
Alexander Viro authored
call of driverfs_remove_partitions() pulled into del_gendisk(); function isn't exported anymore. Both it and driverfs_create_partitions() cleaned up.
-