1. 08 Sep, 2002 9 commits
    • [PATCH] Use kmap_atomic() for generic_file_write() · 86ee4c5d
      Andrew Morton authored
      This patch uses the atomic copy_from_user() facility in
      generic_file_write().
      
      This required a change in the prepare_write/commit_write API
      definition.  It is no longer the case that these functions will kmap
      the page for you.
      
      If any part of the kernel wants to get at the page in the write path,
      it now has to kmap it for itself.  The best way to do this is with
      kmap_atomic(KM_USER0).
      
      This patch updates all callers.  It also converts several places which
      were unnecessarily using kmap() over to using kmap_atomic().
      
      The reiserfs changes here are Oleg Drokin's revised version.
      
      The patch has been tested with loop, ext2, ext3, reiserfs, jfs,
      minixfs, vfat, iso9660, nfs and the ramdisk driver.
      
      I haven't fixed the racy deadlock avoidance thing in
      generic_file_write() - the case where we take a fault when the source
      and dest of the copy are both the same pagecache page.
      
      There is a printk in there now which will trigger if the page was
      unexpectedly not present.  And guess what?  I get 50-100 of them when
      running `dbench 64' on mem=48m.   This deadlock can happen.
    • [PATCH] Use kmap_atomic() for generic_file_read() · 88a3b490
      Andrew Morton authored
      This patch allows the kernel to hold atomic kmaps in file_read_actor().
      
      We try to fault in the page, then take an atomic kmap.  If the atomic
      copy_to_user() then faults, drop a printk and fall back to kmap().
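The fallback described above can be sketched as a small user-space model. This is not kernel code: every `model_*` name, the `present` flag and the `evicted_after_prefault` switch are invented for illustration, and the real file_read_actor() works on pagecache pages and user addresses rather than plain buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct model_dest {
    char *addr;
    int   present;            /* 0: the next atomic access would fault */
};

static int model_fallbacks;   /* counts how often the slow path ran */

/* Atomic copy: refuses to service a fault, returns bytes left uncopied. */
static size_t model_copy_atomic(struct model_dest *d, const char *src, size_t n)
{
    if (!d->present)
        return n;
    memcpy(d->addr, src, n);
    return 0;
}

/* Sleeping copy: allowed to fault the destination in first. */
static size_t model_copy_sleeping(struct model_dest *d, const char *src, size_t n)
{
    d->present = 1;
    memcpy(d->addr, src, n);
    return 0;
}

/*
 * Shape of the read path: pre-fault the destination, try the copy under an
 * atomic mapping, and fall back to the sleeping mapping if that copy faults.
 * evicted_after_prefault simulates the page going away in between.
 */
static size_t model_read_actor(struct model_dest *d, const char *src, size_t n,
                               int evicted_after_prefault)
{
    size_t left;

    assert(d != NULL && src != NULL);
    d->present = 1;                              /* "try to fault in the page" */
    if (evicted_after_prefault)
        d->present = 0;

    left = model_copy_atomic(d, src, n);         /* under the atomic kmap */
    if (left) {
        model_fallbacks++;                       /* the commit drops a printk here */
        left = model_copy_sleeping(d, src, n);   /* under plain kmap() */
    }
    return left;
}
```

In this model the slow path only runs when the pre-faulted page has disappeared again, which matches the intent that the fallback should be rare.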
    • [PATCH] atomic copy_*_user infrastructure · 4b19c940
      Andrew Morton authored
      The patch implements the atomic copy_*_user() function.
      
      If the kernel takes a pagefault while running copy_*_user() in an
      atomic region, the copy_*_user() will fail (return a short value).
      
      And with this patch, holding an atomic kmap() puts the CPU into an
      atomic region.
      
      - Increment preempt_count() in kmap_atomic() regardless of the
        setting of CONFIG_PREEMPT.  The pagefault handler recognises this as
        an atomic region and refuses to service the fault.  copy_*_user will
        return a non-zero value.
      
      - Attempts to propagate the in_atomic() predicate to all the other
        highmem-capable architectures' pagefault handlers.  But the code is
        only tested on x86.
      
      - Fixed a PPC bug in kunmap_atomic(): it forgot to reenable
        preemption if HIGHMEM_DEBUG is turned on.
      
      - Fixed a sparc bug in kunmap_atomic(): it forgot to reenable
        preemption all the time, for non-fixmap pages.
      
      - Fix an error in <linux/highmem.h> - in the CONFIG_HIGHMEM=n case,
        kunmap_atomic() takes an address, not a page *.
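A rough user-space model of the behaviour this commit describes, under stated assumptions: the `model_*` names and the `present` flag are hypothetical, and the model fails the whole copy on a refused fault, whereas the real copy_*_user() may return any short count.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* User-space model only: these names are hypothetical, not the kernel's. */
static int model_preempt_count;          /* stands in for preempt_count() */

static int model_in_atomic(void)
{
    return model_preempt_count != 0;
}

/* Taking an atomic kmap always bumps the count, so the CPU is "atomic". */
static void model_kmap_atomic(void)
{
    model_preempt_count++;
}

static void model_kunmap_atomic(void)
{
    assert(model_preempt_count > 0);
    model_preempt_count--;
}

struct model_user_buf {
    char *addr;
    int   present;    /* 0 means touching it would take a pagefault */
};

/*
 * Returns the number of bytes NOT copied, in the style of copy_*_user().
 * A fault inside an atomic region is refused, so the copy fails.
 * Outside an atomic region the fault is serviced and the copy succeeds.
 */
static size_t model_copy_to_user(struct model_user_buf *dst,
                                 const char *src, size_t n)
{
    if (!dst->present) {
        if (model_in_atomic())
            return n;          /* fault refused: nothing copied */
        dst->present = 1;      /* fault serviced (the real path may sleep) */
    }
    memcpy(dst->addr, src, n);
    return 0;
}
```

The caller is expected to check the return value and retry outside the atomic region, since a non-zero result no longer implies a bad user pointer.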
    • [PATCH] refill the inactive list more quickly · 5f607d6e
      Andrew Morton authored
      Fix a problem noticed by Ed Tomlinson: under shifting workloads the
      shrink_zone() logic will refill the inactive list too slowly.
      
      Bale out of the zone scan when we've reclaimed enough pages.  Fixes a
      rarely-occurring problem wherein refill_inactive_zone() ends up
      shuffling 100,000 pages and generally goes silly.
      
      This needs to be revisited - we should go on and rebalance the lower
      zones even if we reclaimed enough pages from highmem.
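The early exit can be illustrated with a toy model. It assumes the scan walks from the highest zone downwards (suggested by the remark about highmem and the lower zones) and that each zone simply hands back up to `reclaimable` pages; the `model_*` names are invented and the real shrink_zone() logic is far more involved.

```c
#include <assert.h>

/* Hypothetical model of a zone: just a count of pages it can give back. */
struct model_zone {
    int reclaimable;
};

struct model_scan_result {
    int reclaimed;
    int zones_scanned;
};

/* Walk zones from highest index to lowest, stopping once the target is met. */
static struct model_scan_result
model_shrink_zones(struct model_zone *zones, int nr_zones, int nr_to_reclaim)
{
    struct model_scan_result r = { 0, 0 };
    int i;

    assert(zones != 0 && nr_zones > 0);
    for (i = nr_zones - 1; i >= 0; i--) {
        int want = nr_to_reclaim - r.reclaimed;
        int got;

        if (want <= 0)
            break;             /* enough pages: bail out of the zone scan */
        got = zones[i].reclaimable < want ? zones[i].reclaimable : want;
        zones[i].reclaimable -= got;
        r.reclaimed += got;
        r.zones_scanned++;
    }
    return r;
}
```

With zones holding 10, 20 and 50 reclaimable pages and a target of 40, only the last zone is touched, which is also why the lower zones are left unbalanced in that case.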
    • [PATCH] Back out the initial work for atomic copy_*_user() · 9fdbd959
      Andrew Morton authored
      Back out the use of preempt_count to signify atomicity wrt pagefaults.
      We won't do it that way - in_atomic() works fine.
    • [PATCH] Fix the __block_write_full_page() error path. · 0e64a39d
      Andrew Morton authored
      Fix the ENOSPC recovery code in __block_write_full_page()
      
      - Don't write out clean buffers.
      
      - Set PG_writeback before submitting the IO.  Otherwise, if the IO is
        very fast or synchronous, the completion handler will go BUG when it
        sees a non-PageWriteback page.
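The ordering problem in the second point can be shown with a toy model. The `model_*` names, the `bug_hit` flag and the synchronous "device" are all assumptions made for illustration; the real code operates on buffer heads and page flags.

```c
#include <assert.h>

struct model_page {
    int writeback;   /* stands in for PG_writeback */
    int bug_hit;     /* set where the real completion handler would BUG */
};

/* Completion handler: expects the page to be under writeback. */
static void model_end_io(struct model_page *p)
{
    if (!p->writeback)
        p->bug_hit = 1;
    p->writeback = 0;
}

/* A "very fast or synchronous" device: completes before submit returns. */
static void model_submit_sync(struct model_page *p)
{
    model_end_io(p);
}

/* Wrong order: the flag is set after the completion has already run. */
static void model_write_flag_after_submit(struct model_page *p)
{
    model_submit_sync(p);
    p->writeback = 1;
}

/* Order described in the commit: flag first, then submit. */
static void model_write_flag_before_submit(struct model_page *p)
{
    assert(!p->writeback);
    p->writeback = 1;
    model_submit_sync(p);
}
```

In the wrong order the handler trips its check, and the page is also left marked as under writeback with no completion pending to clear it.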
    • [PATCH] Fix the boot-time reporting of each zone's available pages · d98b1feb
      Andrew Morton authored
      Patch from Bjorn Helgaas, via Rusty.
      
      Change:
      
        On node 0 totalpages: 61031         <--- not including holes
        zone(0): 65172 pages.               <--- including holes
        zone(1): 0 pages.                   ...
        zone(2): 0 pages.
      
      to:
      
        On node 0 totalpages: 61031         <--- not including holes
        DMA zone: 61031 pages               <--- not including holes
        Normal zone: 0 pages
        HighMem zone: 0 pages
    • [PATCH] shared thread signals · 6dfc8897
      Ingo Molnar authored
      Support POSIX compliant thread signals on a kernel level with usable
      debugging (broadcast SIGSTOP, SIGCONT) and thread group management
      (broadcast SIGKILL), plus load-balancing of 'process' signals between
      threads for better signal performance.
      
      Changes:
      
      - POSIX thread semantics for signals
      
      there are 7 'types' of actions a signal can take: specific, load-balance,
      kill-all, kill-all+core, stop-all, continue-all and ignore. Depending on
      the POSIX specifications each signal has one of the types defined for both
      the 'handler defined' and the 'handler not defined (kernel default)' case.  
      Here is the table:
      
       ----------------------------------------------------------
       |                    |  userspace       |  kernel        |
       ----------------------------------------------------------
       |  SIGHUP            |  load-balance    |  kill-all      |
       |  SIGINT            |  load-balance    |  kill-all      |
       |  SIGQUIT           |  load-balance    |  kill-all+core |
       |  SIGILL            |  specific        |  kill-all+core |
       |  SIGTRAP           |  specific        |  kill-all+core |
       |  SIGABRT/SIGIOT    |  specific        |  kill-all+core |
       |  SIGBUS            |  specific        |  kill-all+core |
       |  SIGFPE            |  specific        |  kill-all+core |
       |  SIGKILL           |  n/a             |  kill-all      |
       |  SIGUSR1           |  load-balance    |  kill-all      |
       |  SIGSEGV           |  specific        |  kill-all+core |
       |  SIGUSR2           |  load-balance    |  kill-all      |
       |  SIGPIPE           |  specific        |  kill-all      |
       |  SIGALRM           |  load-balance    |  kill-all      |
       |  SIGTERM           |  load-balance    |  kill-all      |
       |  SIGCHLD           |  load-balance    |  ignore        |
       |  SIGCONT           |  load-balance    |  continue-all  |
       |  SIGSTOP           |  n/a             |  stop-all      |
       |  SIGTSTP           |  load-balance    |  stop-all      |
       |  SIGTTIN           |  load-balance    |  stop-all      |
       |  SIGTTOU           |  load-balance    |  stop-all      |
       |  SIGURG            |  load-balance    |  ignore        |
       |  SIGXCPU           |  specific        |  kill-all+core |
       |  SIGXFSZ           |  specific        |  kill-all+core |
       |  SIGVTALRM         |  load-balance    |  kill-all      |
       |  SIGPROF           |  specific        |  kill-all      |
       |  SIGPOLL/SIGIO     |  load-balance    |  kill-all      |
       |  SIGSYS/SIGUNUSED  |  specific        |  kill-all+core |
       |  SIGSTKFLT         |  specific        |  kill-all      |
       |  SIGWINCH          |  load-balance    |  ignore        |
       |  SIGPWR            |  load-balance    |  kill-all      |
       |  SIGRTMIN-SIGRTMAX |  load-balance    |  kill-all      |
       ----------------------------------------------------------
      
      as you can see from the table, signals that have handlers defined never
      get broadcast - they are either specific or load-balanced.
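One way to picture the table is as a lookup keyed by signal number with two columns. The sketch below copies a handful of rows from the table above; the enum, struct and function names are hypothetical and say nothing about how the patch itself encodes these rules.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>

enum model_action {
    MODEL_NA, MODEL_SPECIFIC, MODEL_LOAD_BALANCE, MODEL_KILL_ALL,
    MODEL_KILL_ALL_CORE, MODEL_STOP_ALL, MODEL_CONTINUE_ALL, MODEL_IGNORE
};

struct model_sig_row {
    int sig;
    enum model_action with_handler;    /* 'userspace' column */
    enum model_action kernel_default;  /* 'kernel' column */
};

/* A subset of the rows listed in the commit message. */
static const struct model_sig_row model_rows[] = {
    { SIGHUP,  MODEL_LOAD_BALANCE, MODEL_KILL_ALL      },
    { SIGSEGV, MODEL_SPECIFIC,     MODEL_KILL_ALL_CORE },
    { SIGKILL, MODEL_NA,           MODEL_KILL_ALL      },
    { SIGCHLD, MODEL_LOAD_BALANCE, MODEL_IGNORE        },
    { SIGCONT, MODEL_LOAD_BALANCE, MODEL_CONTINUE_ALL  },
    { SIGSTOP, MODEL_NA,           MODEL_STOP_ALL      },
};

/* Returns MODEL_NA for signals that are not in the subset above. */
static enum model_action model_signal_action(int sig, int has_handler)
{
    size_t i;

    for (i = 0; i < sizeof(model_rows) / sizeof(model_rows[0]); i++)
        if (model_rows[i].sig == sig)
            return has_handler ? model_rows[i].with_handler
                               : model_rows[i].kernel_default;
    return MODEL_NA;
}

/* Broadcast actions are the ones that affect every thread in the group. */
static int model_is_broadcast(enum model_action a)
{
    return a == MODEL_KILL_ALL || a == MODEL_KILL_ALL_CORE ||
           a == MODEL_STOP_ALL || a == MODEL_CONTINUE_ALL;
}
```

Looking up any row with `has_handler` set never yields a broadcast action, which is the property the paragraph above points out.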
      
      - CLONE_THREAD implies CLONE_SIGHAND
      
      It does not make much sense to have a thread group that does not share
      signal handlers. In fact in the patch i'm using the signal spinlock to
      lock access to the thread group. I made the siglock IRQ-safe, thus we can
      load-balance signals from interrupt contexts as well. (we cannot take the
      tasklist lock in write mode from IRQ handlers.)
      
      this is not as clean as i'd like it to be, but it's the best i could come
      up with so far.
      
      - thread group list management reworked.
      
      threads are now removed from the group if the thread is unhashed from the
      PID table. This makes the most sense. This also helps with another feature 
      that relies on an intact thread group list: multithreaded coredumps.
      
      - child reparenting reworked.
      
      the O(N) algorithm in forget_original_parent() causes massive performance
      problems if a large number of threads exit from the group. Performance 
      improves more than 10-fold if the following simple rules are followed 
      instead:
      
       - reparent children to the *previous* thread [exiting or not]
       - if a thread is detached then reparent to init.
      
      - fast broadcasting of kernel-internal SIGSTOP, SIGCONT, SIGKILL, etc.
      
      kernel-internal broadcasted signals are a potential DoS problem, since
      they might generate massive amounts of GFP_ATOMIC allocations of siginfo
      structures. The important thing to note is that the siginfo structure does
      not actually have to be allocated and queued - the signal processing code
      has all the information it needs, neither of these signals carries any
      information in the siginfo structure. This makes a broadcast SIGKILL a
      very simple operation: all threads get the bit 9 set in their pending
      bitmask. The speedup due to this was significant - and the robustness win
      is invaluable.
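The allocation-free broadcast can be sketched as follows. This is a user-space illustration only: the `model_thread` struct, the circular `next` list and the choice to store signal N at bit N-1 of the mask are assumptions, not details taken from the patch.

```c
#include <assert.h>
#include <signal.h>

struct model_thread {
    unsigned long pending;        /* one bit per signal */
    struct model_thread *next;    /* circular thread-group list */
};

/* Assumed convention: signal N lives at bit N-1 of the mask. */
static unsigned long model_sigmask(int sig)
{
    assert(sig >= 1 && sig <= 32);
    return 1UL << (sig - 1);
}

/*
 * Mark sig pending on every thread in the group and return how many
 * threads were touched.  Nothing is allocated or queued.
 */
static int model_broadcast(struct model_thread *leader, int sig)
{
    struct model_thread *t = leader;
    int n = 0;

    do {
        t->pending |= model_sigmask(sig);
        n++;
        t = t->next;
    } while (t != leader);
    return n;
}
```

Because the loop only sets bits, it cannot fail for lack of memory, which is the robustness point made above.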
      
      - sys_execve() should not kill off 'all other' threads.
      
      the 'exec kills all threads if the master thread does the exec()' is a
      POSIX(-ish) thing that should not be hardcoded in the kernel in this case.
      
      to handle POSIX exec() semantics, glibc uses a special syscall, which
      kills 'all but self' threads: sys_exit_allbutself().
      
      the straightforward exec() implementation just calls sys_exit_allbutself()  
      and then sys_execve().
      
      (this syscall is also used internally if the thread group leader
      thread sys_exit()s or sys_exec()s, to ensure the integrity of the thread
      group.)
    • [PATCH] pci bus resources, transparent bridges · 36780249
      Ivan Kokshaysky authored
      Added PCI_BUS_NUM_RESOURCES as Ben suggested. Default value is 4
      and can be overridden by arch (probably in asm/system.h).
      pci_read_bridge_bases() and pci_assign_bus_resource() changed
      accordingly. "for (i = 0 ; i < 4; i++)" in pci_add_new_bus() not
      changed, as it's used _only_ for pci-pci and cardbus bridges.
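The overridable default is the usual `#ifndef` pattern. In the sketch below only PCI_BUS_NUM_RESOURCES and its default of 4 come from the commit message; the `model_*` structs and the counting helper are hypothetical stand-ins for the real bus structures.

```c
#include <assert.h>
#include <stddef.h>

/* An architecture header could define this first to override the default. */
#ifndef PCI_BUS_NUM_RESOURCES
#define PCI_BUS_NUM_RESOURCES 4
#endif

struct model_resource {
    unsigned long start, end;
};

/* Hypothetical stand-in for a bus with a fixed-size resource table. */
struct model_pci_bus {
    struct model_resource *resource[PCI_BUS_NUM_RESOURCES];
};

/* Loops use the constant rather than a literal 4, so an override is honoured. */
static int model_count_bus_resources(const struct model_pci_bus *bus)
{
    int i, n = 0;

    for (i = 0; i < PCI_BUS_NUM_RESOURCES; i++)
        if (bus->resource[i])
            n++;
    return n;
}
```

Any loop that keeps a literal 4, like the one in pci_add_new_bus() mentioned above, stays correct only while it deals with bridges that really have at most four windows.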
  2. 07 Sep, 2002 31 commits
    • [PATCH] qlogic "this should not happen" fix · be4bde60
      Randy Hron authored
      This patch is based on changes I've used for 2.5.31, 2.5.31-mm1,
      2.5.32-mm1, 2.5.32-mm2, and 2.5.33-mm1.
      
      Without the patch, 2.5.x during heavy benchmark/stress testing
      eventually locks up with these final messages:
      
      kernel: qlogicfc0 : no handle slots, this should not happen.
      kernel: hostdata->queued is 6, in_ptr: 7d
      
      This is a combination of Doug Ledford's patch:
      http://marc.theaimsgroup.com/?l=linux-kernel&m=103005703808312&w=2
      and Eric Weigle's patch:
      http://marc.theaimsgroup.com/?l=linux-kernel&m=103005790509079&w=2
      
      2.5.33 (and all predecessors i've tested) locked up without it.
    • Linus Torvalds's avatar
    • Linus Torvalds's avatar
      5f6e8ce4
    • Merge home.transmeta.com:/home/torvalds/v2.5/viro · 165088f9
      Linus Torvalds authored
      into home.transmeta.com:/home/torvalds/v2.5/linux
    • [PATCH] (25/25) more cleanups of struct gendisk. · e86a3786
      Alexander Viro authored
      	* we remove partition 0 from ->part[] and put the old
      contents of ->part[0] into gendisk itself; indexes are shifted, obviously.
      	* ->part is allocated at add_gendisk() time and freed at del_gendisk()
      according to the value of ->minor_shift; static arrays of hd_struct are gone
      from drivers, ditto for manual allocations a-la ide.  As a matter of fact,
      none of the drivers know about struct hd_struct now.
    • [PATCH] (24/25) disk capacity helpers · 3708de94
      Alexander Viro authored
      	new helpers - get_capacity(gendisk)/set_capacity(gendisk, sectors).
      Drivers switched to these; that eliminates most of the accesses to
      disk->part[]... in the drivers (and makes code more readable, while
      we are at it).  That caught several bugs where minor had been
      used in place of minor>>minor_shift (acsi.c is especially nasty in
      that respect; I don't know if it had ever been used with multiple
      devices...)
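A minimal sketch of what such helpers and the minor-versus-disk-index confusion look like. The `model_gendisk` layout, the sector type and `model_disk_index()` are assumptions for illustration; only the helper names get_capacity()/set_capacity() and the `minor>>minor_shift` expression come from the message above.

```c
#include <assert.h>

typedef unsigned long long model_sector_t;

/* Hypothetical disk descriptor; where the kernel stores capacity is not shown here. */
struct model_gendisk {
    int first_minor;
    int minor_shift;            /* log2 of the minors reserved per disk */
    model_sector_t capacity;    /* size in sectors */
};

/* Accessors in the spirit of get_capacity()/set_capacity(). */
static inline model_sector_t model_get_capacity(const struct model_gendisk *disk)
{
    return disk->capacity;
}

static inline void model_set_capacity(struct model_gendisk *disk,
                                      model_sector_t sectors)
{
    disk->capacity = sectors;
}

/* Index of the disk that owns a minor: shift away the partition bits. */
static inline int model_disk_index(int minor, int minor_shift)
{
    return minor >> minor_shift;
}
```

With 16 minors per disk (`minor_shift` of 4), minor 17 belongs to disk 1. Indexing a per-disk array with the raw minor would pick entry 17, and with a single device (minors 0 to 15 only) the error shows up only for partition minors, which may be why such bugs went unnoticed.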
    • [PATCH] (23/25) move pointer to gendisk from hwif to drive · 07586b33
      Alexander Viro authored
      	 ide switched from hwif->gd[i] to hwif->drive[i]->disk - IOW, instead
      of an array of two pointers to gendisks referred to from hwif, we keep these
      pointers in the relevant drives.  Cleaned up.
    • [PATCH] (22/25) gendisks for SCSI cdroms · c276ff4d
      Alexander Viro authored
      	SCSI cdroms got gendisks.
    • [PATCH] (21/25) cdrom->reset() cleanup · 5c5302d4
      Alexander Viro authored
      	invalidate_buffers() pulled from cdrom ->reset() into its caller.
      At that point only cdrom.c is using cdi->dev.  That will come into play a bit later.
    • [PATCH] (20/25) cdu31a.c cleanup · ffe63c1e
      Alexander Viro authored
      	minor cleanup in cdu31a.c
    • [PATCH] (19/25) mcdx.c cleanup · ac26e454
      Alexander Viro authored
      	mcdx.c cleaned up, uses of cdi->dev eliminated
    • [PATCH] (18/25) pcd.c - cleanup, killed uses of cdi->dev · 25b9b8f4
      Alexander Viro authored
      	 pcd.c cleaned up, uses of cdi->dev eliminated, abuse of macros killed
      (it used to have
      #define PCD pcd[unit]
      #define PI PCD.pi
      and expected 'unit' to be local variable in each function that used these
      (== almost every function in there)).
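The macro pattern criticised above can be reproduced in a few lines. The sketch is hypothetical: `model_unit`, `model_pcd` and both accessor functions are made up, and only the shape of `#define PCD pcd[unit]` is taken from the message.

```c
#include <assert.h>

struct model_unit {
    int pi;
    int present;
};

static struct model_unit model_pcd[4];

/* Old style: only compiles where a local variable named `unit` is in scope. */
#define MODEL_PCD model_pcd[unit]

static int model_old_present(int unit)
{
    return MODEL_PCD.present;    /* silently depends on the parameter name */
}

/* New style: the dependency is an explicit, typed parameter. */
static int model_new_present(const struct model_unit *cd)
{
    return cd->present;
}
```

Renaming the parameter of `model_old_present()` breaks the build, while the pointer-taking version can be called from any context that has a `struct model_unit *` in hand.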
    • [PATCH] (17/25) Lindent pcd.c · a20c0ab1
      Alexander Viro authored
      	Lindent pcd.c.
    • [PATCH] (16/25) pcd.c - beginning of macroectomy · 9aab49e3
      Alexander Viro authored
      	 pcd.c - killed RR and WR macros (replaced with inlines without hidden
      arguments; the first step in cleanup, they were monstrous).
    • [PATCH] (15/25) sbpcd.c - killed uses of cdi->dev · 8b3e8b7a
      Alexander Viro authored
      	sbpcd.c - d eliminated, ditto for uses of cdi->dev (we set cdi->handle
      pointing to the structure we need).  Cleaned up a bit.
    • [PATCH] (14/25) sbpcd.c - use *current_drive instead of D_S[d] · 3aa5472a
      Alexander Viro authored
      	 sbpcd.[c,h] - uses of D_S[d] replaced with uses of *current_drive.
    • [PATCH] (13/25) sbpcd.c - beginning of cleanup · 62ea21cb
      Alexander Viro authored
      	sbpcd.c - sigh... It used to have a global variable inventively called
      'd'.  Current disk number.  Tons of uses, 99% of them being D_S[d].<blah>.
      Added a new variable - current_drive.  Said animal is equal to D_S + d -
      it's reassigned at the same place as d.
    • [PATCH] (12/25) sr.c passes pointers instead of minors now · 6f14c533
      Alexander Viro authored
      	 killed passing minors around; we always pass a pointer to structure;
      scsi_CDs made static.  That killed uses of cdi->dev in sr.c and friends.
    • [PATCH] (11/25) sr.c naming cleanup · c7633c22
      Alexander Viro authored
      	Global search'n'replace job - 'SCp' (Scsi_CD pointer - I'm not kidding;
      and yes, they spell it "Scsi") replaced with 'cd' (sr.c, sr_ioctl.c,
      sr_vendor.c).
    • [PATCH] (10/25) sr.c device name handling · e342ff2e
      Alexander Viro authored
      	sr.c: we set SCp->cdi.name from the very beginning, which allows us
      to kill passing minors in many cases (we can use "%s...", SCp->cdi.name instead
      of "sr%d...", minor and that turns out to be the majority of places where
      we use minors at all).
    • [PATCH] (9/25) update_partition() · 897c924e
      Alexander Viro authored
      	new helper - update_partition(disk, partition_number); does the
      right thing wrt devfs and driverfs (un)registration of partition entries.
      BLKPG ioctls fixed - now they call that beast rather than calling only
      devfs side.  New helper - rescan_partitions(disk, bdev); does all work
      with wiping/rereading/etc. and fs/block_dev.c now uses it instead of
      check_partition().  The latter became static.
    • [PATCH] (8/25) Removing bogus arrays - ->de_arr[] · 06f55b09
      Alexander Viro authored
      	similar to ->flags and ->driverfs_dev_arr, ->de_arr[] got replaced
      with its (single) element + flag.
    • [PATCH] (7/25) Removing bogus arrays - ->part[].number · db09b5fc
      Alexander Viro authored
      	Each hd_struct used to have int number; in it.  It's used _only_
      in disk->part[0] - disk->part[n].number is never assigned/checked for any
      positive n.  Moved from hd_struct to gendisk (disk->part[0].number to
      disk->number).
    • [PATCH] (6/25) Removing bogus arrays - ->driverfs_dev_arr[] · c5f45a70
      Alexander Viro authored
      	disk->driverfs_dev_arr is either NULL or consists of exactly one
      element.  Same change as above (struct device ** -> struct device *); old
      "is the pointer to array itself NULL or not?" replaced with a flag (in
      disk->flags).
    • [PATCH] (5/25) Removing bogus arrays - ->flags[] · ab3bfaa2
      Alexander Viro authored
      	Seeing that now disk->flags[] always consists of one element, we
      replace char *flags with int flags, remove the junk from places that used
      to allocate these "arrays" and do obvious updates of the code
      (s/->flags[0]/->flags/).
    • [PATCH] (4/25) Unexporting driverfs_remove_partitions() · 097b3217
      Alexander Viro authored
      	call of driverfs_remove_partitions() pulled into del_gendisk();
      function isn't exported anymore.  Both it and driverfs_create_partitions()
      cleaned up.
    • [PATCH] (3/25) Removing useless minor arguments · 36bd834b
      Alexander Viro authored
      	driverfs_remove_partitions(), devfs_register_partitions(),
      driverfs_create_partitions(), devfs_register_partition() and devfs_register_disc()
      have lost their 'minor' argument - it's always disk->first_minor these days.
      disk_name() takes a partition number instead of a minor now.  Callers of
      wipe_partitions() in fs/block_dev.c expanded.  The remaining caller passes
      a gendisk instead of a kdev_t now.
    • [PATCH] (2/25) Removing ->nr_real · 4e493886
      Alexander Viro authored
      	Since ->nr_real is always 1 now, we can remove that field completely.
      Removed the last remnants of switch in disk_name() (it could have been killed
      a long time ago, I just forgot to remove the last two cases when md and i2o
      got converted).  Collapsed several instances of
      disk->part[minor - disk->first_minor] - in cases when we know that we deal
      with disk->part[0].
    • [PATCH] (1/25) Unexporting helper functions · b3152267
      Alexander Viro authored
      	wipe_partitions() and driverfs_register_partitions(..., 1) (i.e.
      unregistering them) pulled into del_gendisk() and removed from callers.
      grok_partitions() merged with register_disk().  devfs_register_partitions(),
      grok_partitions() and wipe_partitions() not exported anymore.
    • [PATCH] alpha: misc fixes · b340c708
      Ivan Kokshaysky authored
      Patch set from Jay Estabrook:
      
       - include/asm-alpha/dma.h:
      	Add MAX_DMA_ADDR for SABLE and ALCOR
      
       - include/asm-alpha/floppy.h:
      	enable the full CROSS_64KB macro for all platforms
      
       - include/asm-alpha/core_t2.h:
      	fix HAE usage
      
       - arch/alpha/kernel/pci.c:
      	fiddle with quirk_cypress
      
       - arch/alpha/kernel/traps.c:
      	prevent opDEC_check() from multiple calls (EV4 SMP SABLEs)
      
       - arch/alpha/kernel/proto.h:
      	make t2_pci_tbi() real
      
       - arch/alpha/kernel/time.c:
      	shorten timeout delay
      
       - arch/alpha/kernel/sys_alcor.c:
      	use ALCOR_MAX_DMA_ADDR because of the 1GB limit on ISA devices
      
       - arch/alpha/kernel/core_t2.c:
      	add S/G support and allow direct-map to handle 2GB of memory
      
       - arch/alpha/kernel/core_tsunami.c:
      	rework alignment requirements for ISA DMA, esp. for ACER platforms
      
       - arch/alpha/kernel/sys_sable.c:
      	fix MAX_DMA_ADDR for the 1GB limitation
      
       - arch/alpha/kernel/pci_impl.h:
      	add T2_DEFAULT_MEM_BASE to help avoid HAE use
      
       - arch/alpha/kernel/pci_iommu.c:
      	fix ISA_DMA_MASK calculation, and force ISA alignment to 64KB
    • [PATCH] alpha: compile fixes · 7d1d6131
      Ivan Kokshaysky authored
       - add another argument to do_fork();
       - assorted compile fixes.