1. 18 Feb, 2004 40 commits
    • [PATCH] dm: Tidy up the error path for alloc_dev() · 6b1b56f9
      Andrew Morton authored
      From: Joe Thornber <thornber@redhat.com>
      
      Tidy up the error path for alloc_dev()
      6b1b56f9
    • [PATCH] dm: Maintain ordering when deferring bios · 54e37e09
      Andrew Morton authored
      From: Joe Thornber <thornber@redhat.com>
      
      Make sure that we maintain ordering when deferring bios.
      54e37e09
    • [PATCH] dm: Get rid of struct dm_deferred_io in dm.c · a0befbbc
      Andrew Morton authored
      From: Joe Thornber <thornber@redhat.com>
      
      Remove struct dm_deferred_io from dm.c.  [Christophe Saout]
      a0befbbc
    • [PATCH] dm: Move to_bytes() and to_sectors() into dm.h · 0901c174
      Andrew Morton authored
      From: Joe Thornber <thornber@redhat.com>
      
      Move to_bytes() and to_sectors() into dm.h
      0901c174
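For readers unfamiliar with these helpers, here is a minimal sketch of what such sector/byte conversions typically look like. It assumes 512-byte sectors (a shift of 9); the `SECTOR_SHIFT` name and the `sector_t` typedef are stand-ins for illustration and this is not a copy of dm.h.

```c
#include <stdint.h>

/* Assumption: 512-byte sectors, i.e. a shift of 9 bits. */
#define SECTOR_SHIFT 9

typedef uint64_t sector_t;	/* illustrative stand-in for the kernel type */

/* Convert a sector count to a byte count. */
static inline uint64_t to_bytes(sector_t n)
{
	return n << SECTOR_SHIFT;
}

/* Convert a byte count to whole sectors, discarding any partial sector. */
static inline sector_t to_sectors(uint64_t n)
{
	return n >> SECTOR_SHIFT;
}
```

Under these assumptions, 8 sectors correspond to 4096 bytes, and a byte count that is not a multiple of 512 is rounded down.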
    • [PATCH] dm: Export dm_vcalloc() · c087ec3d
      Andrew Morton authored
      From: Joe Thornber <thornber@redhat.com>
      
      Export dm_vcalloc()
      c087ec3d
    • [PATCH] md: Allow partitioning of MD devices. · 1797a796
      Andrew Morton authored
      From: NeilBrown <neilb@cse.unsw.edu.au>
      
      With this patch, md uses two major numbers for arrays.
      
      One major is number 9, with name 'md', and has unpartitioned md arrays, one
      per minor number.
      
      The other major is allocated dynamically, with name 'mdp', and has one array
      for every 64 minors, allowing for up to 63 partitions.
      
      The arrays under one major are completely separate from the arrays under the
      other.
      
      The preferred names for devices with the new major are of the form:
      
        /dev/md/d1p3  # partition 3 of device 1 - minor 67
      
      When a partitioned md device is assembled, the partitions are not recognised
      until after the whole-array device is opened again.  A future version of
      mdadm will perform this open itself, making the extra step transparent.
      1797a796
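The minor-number layout described in the md partitioning message can be sketched as simple arithmetic. The macro and function names below are hypothetical, and treating partition 0 as the whole-array device is an inference from "64 minors, up to 63 partitions" rather than something the message states outright.

```c
#include <assert.h>

/* From the commit message: one array for every 64 minors under 'mdp'. */
#define MDP_MINORS_PER_ARRAY 64	/* hypothetical name */

/*
 * Minor number for partition `part` of array `dev`.
 * Assumption: part 0 denotes the whole array, parts 1..63 its partitions.
 */
static unsigned int mdp_minor(unsigned int dev, unsigned int part)
{
	assert(part < MDP_MINORS_PER_ARRAY);
	return dev * MDP_MINORS_PER_ARRAY + part;
}
```

With this layout, `mdp_minor(1, 3)` gives 67, matching the `/dev/md/d1p3` example in the message.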
    • [PATCH] md: Dynamically limit size of bio requests used for raid1 resync · 5077fef0
      Andrew Morton authored
      From: NeilBrown <neilb@cse.unsw.edu.au>
      
      Currently raid1 uses PAGE_SIZE read/write requests for resync, as it doesn't
      know how to honour per-device restrictions.  This patch uses bio_add_page
      to honour those restrictions and raises the limit on request size to 64K.
      This has a measurable impact on rebuild speed (25M/s -> 60M/s).
      5077fef0
    • [PATCH] md: Avoid unnecessary bio allocation during raid1 resync · 89654f5b
      Andrew Morton authored
      From: NeilBrown <neilb@cse.unsw.edu.au>
      
      For each resync request, we allocate a "r1_bio" which has a bio "master_bio"
      attached that goes largely unused.  We also allocate a read_bio which is
      used.  This patch removes the read_bio and just uses the master_bio instead.
      
      This fixes a bug wherein bi_bdev of the master_bio wasn't being set, but was
      being used.
      
      We also introduce a new "sectors" field into the r1_bio as we can no longer
      rely on master_bio->bi_sectors.
      89654f5b
    • [PATCH] md: Remove some un-needed fields from r1bio_s · d0d464b1
      Andrew Morton authored
      From: NeilBrown <neilb@cse.unsw.edu.au>
      
      next_r1 is never used, so it can just go.
      
      read_bio isn't needed as we can easily use one of the pointers in the
      write_bios array - write_bios[->read_disk].  So rename "write_bios" to "bios"
      and store the pointer to the read bio in there.
      d0d464b1
    • [PATCH] md: Discard the cmd field from r1_bio structure · ebf7768e
      Andrew Morton authored
      From: NeilBrown <neilb@cse.unsw.edu.au>
      
      The only time it is really needed is to differentiate a retry-on-fail from a
      write-after-read-for-resync request to raid1d.  So we use a bit in 'state'
      for that.
      ebf7768e
    • [PATCH] md: Split read and write end_request handlers · c1dd448e
      Andrew Morton authored
      From: NeilBrown <neilb@cse.unsw.edu.au>
      
      Instead of having a single end_request handler that must determine whether it
      was a read or a write request, we have two separate handlers, which makes
      each of them easier to follow.
      c1dd448e
    • [PATCH] md: Print "deprecated" warning when START_ARRAY is used. · a2c4e506
      Andrew Morton authored
      From: NeilBrown <neilb@cse.unsw.edu.au>
      
      The "START_ARRAY" ioctl depends on major/minor numbers (as stored in the raid
      superblock) being stable across reboots, which is increasingly untrue.
      
      There are better ways to start an array (e.g.  with mdadm) so we mark the
      ioctl as deprecated for 2.6, and will remove it in 2.7.
      a2c4e506
    • [PATCH] kNFSd:fix build problems in nfs w/o proc_fs on 2.6.0-test5 · 67afcb4f
      Andrew Morton authored
      From: NeilBrown <neilb@cse.unsw.edu.au>
      
      From: Stephen Hemminger <shemminger@osdl.org>
      Date: Fri, 12 Sep 2003 11:31:06 -0700
      
      NFS won't build w/o CONFIG_PROC_FS.  Looks like typos (or a C++
      programmer) in stats.h
      67afcb4f
    • [PATCH] kNFSd: convert NFS /proc interfaces to seq_file · 2a0807bd
      Andrew Morton authored
      From: NeilBrown <neilb@cse.unsw.edu.au>
      
      From: shemminger@osdl.org Sat Sep  6 09:19:50 2003
      Date: Fri, 5 Sep 2003 16:19:30 -0700
      
      Converts /proc/net/rpc/nfs and /proc/net/rpc/nfsd to use the simpler
      seq_file interface.
      2a0807bd
    • [PATCH] kNFSd: ip_map_init does a kmalloc which isn't checked... · bbcc5fa8
      Andrew Morton authored
      From: NeilBrown <neilb@cse.unsw.edu.au>
      
      There is no way to return an error from a cache init routine, so instead we
      make sure to pre-allocate the memory needed, and free it after the lookup
      if the lookup failed.
      bbcc5fa8
    • [PATCH] kNFSd: Allow sunrpc/svc cache init function to modify the "key" · 9417bd87
      Andrew Morton authored
      From: NeilBrown <neilb@cse.unsw.edu.au>
      
      When adding an item to a sunrpc/svc cache that contains kmalloced data it is
      useful to move the kmalloced data out of the key object into the new cache
      object rather than copying (as then we would need to cope with kmalloc
      failure and such).  This means modifying the original.
      
      If the kmalloced data forms part of the key, then we must not move the data
      out until after the key isn't needed any more.  So this patch moves the
      call to "INIT" on a new item (which fills in the key) to *after* the item
      has been found (or not), and also makes sure we only call the HASH function
      once.
      
      Thanks to "J. Bruce Fields" <bfields@fieldses.org>
      
      also
      
       1/ remove unnecessary assignment
       2/ fix comments that lag behind implementation.
      9417bd87
    • [PATCH] kNFSd: Fix possible scheduling_while_atomic in cache.c · 16b82dca
      Andrew Morton authored
      From: NeilBrown <neilb@cse.unsw.edu.au>
      
      We currently call cache_put, which can schedule(), under a spin_lock.  This
      patch moves that call outside the spinlock.
      16b82dca
    • [PATCH] #if versus #ifdef cleanup · c65febbb
      Andrew Morton authored
      From: Valdis.Kletnieks@vt.edu
      
      15 changes of #if to #ifdef, and 2 places where CONFIG_FOO should be
      defined(CONFIG_FOO).  This gets rid of spurious warnings when building with
      "-Wundef", so that the remaining warnings point at preprocessor commands like:
      
      #if CONFIG_ETRAX_DS1302_RSTBIT == 27
      
      where you will be told that an undefined symbol is being replaced by zero,
      rather than getting silent weirdness and unexpected code generation.
      c65febbb
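The distinction behind the #if/#ifdef cleanup can be sketched as follows. In C, an identifier that is not defined as a macro is replaced by 0 inside an `#if` expression, which is what "-Wundef" warns about. The `CONFIG_EXAMPLE_*` names are hypothetical and deliberately left undefined.

```c
/*
 * CONFIG_EXAMPLE_RSTBIT and CONFIG_EXAMPLE_FEATURE are hypothetical
 * options, deliberately left undefined in this sketch.
 */

/* Value test guarded by defined(): no reliance on "undefined means 0". */
#if defined(CONFIG_EXAMPLE_RSTBIT) && CONFIG_EXAMPLE_RSTBIT == 27
#define RSTBIT_IS_27 1
#else
#define RSTBIT_IS_27 0
#endif

/* Boolean options are tested for existence rather than for value. */
#ifdef CONFIG_EXAMPLE_FEATURE
#define HAVE_EXAMPLE_FEATURE 1
#else
#define HAVE_EXAMPLE_FEATURE 0
#endif

static int rstbit_is_27(void)
{
	return RSTBIT_IS_27;
}

static int have_example_feature(void)
{
	return HAVE_EXAMPLE_FEATURE;
}
```

Both helpers return 0 here because neither option is defined, and neither form should trigger a "-Wundef" warning.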
    • [PATCH] MIPS: New 2.6 serial drivers · b7df53b3
      Andrew Morton authored
      From: Ralf Baechle <ralf@linux-mips.org>
      
      Three new MIPS-specific serial drivers.  ip22.c is derived from the sparc
      zilog driver; guess we should write a generic Zilog driver at some point ...
      b7df53b3
    • [PATCH] Enable coredumps > 2GB · 95b387a4
      Andrew Morton authored
      From: Andi Kleen <ak@muc.de>
      
      Some x86-64 users were complaining that coredumps >2GB don't work.
      
      This will enable large coredump for everybody.  Apparently the 32bit
      gdb/binutils cannot handle them, but I hear the binutils people are working
      on fixing that.  I doubt it will harm people - unreadable coredumps are not
      worse than no coredump and it won't make any difference in space usage if
      you get a 1.99GB or a 2.5GB coredump.  So just enable it unconditionally.
      If it really is a problem for 32bit, the rlimit defaults in
      resource.h could be changed.
      
      For file systems that don't support O_LARGEFILE you should just get a
      truncated coredump for big address spaces.
      95b387a4
    • [PATCH] devfs: race fixes and cleanup · bf98c406
      Andrew Morton authored
      From: Andrey Borzenkov <arvidjaar@mail.ru>
      
      - use struct nameidata in devfs_d_revalidate_wait to detect when it is
        called without i_sem held; take i_sem on parent in this case.  This
        prevents both a deadlock with devfs_lookup, by allowing it to drop i_sem
        consistently, and an oops in d_instantiate, by ensuring that it always
        runs protected
      
      - remove dead code that deals with major number allocation.  The only
        remaining user was devfs itself, and the patch changes it to use
        register_chardev to get the device number for the internal /dev/.devfsd
        and /dev/.statd.
      
      - remove dead auto allocation flag as well
      
      - remove code that does module get on dev open - it is handled by fops_get.
         Use init_special_inode consistently
      
      - get rid of struct cdev_type and bdev_type - both have just single dev_t
        now
      bf98c406
    • [PATCH] snprintf fixes · 01d1a791
      Andrew Morton authored
      From: Juergen Quade <quade@hsnr.de>
      
      Lots of places in the kernel are using [v]snprintf wrongly: they assume it
      returns the number of characters copied.  It doesn't.  It returns the
      number of characters which _would_ have been copied had the buffer not been
      filled up.
      
      So create new functions vscnprintf() and scnprintf() which have the
      expected (sane) semantics, and migrate callers over to using them.
      01d1a791
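The difference in return value described in the snprintf message can be illustrated in userspace. This is a sketch under stated assumptions, not the kernel implementation: `my_scnprintf` is a hypothetical name, built on the standard `vsnprintf`, which returns the length the output would have had.

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Userspace sketch of "number of characters actually written" semantics.
 * my_scnprintf is a hypothetical name, not the kernel's scnprintf().
 */
static int my_scnprintf(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int n;

	va_start(args, fmt);
	n = vsnprintf(buf, size, fmt, args);	/* length the output *would* have */
	va_end(args);

	if (n < 0)
		return 0;			/* formatting error: report nothing written */
	if ((size_t)n < size)
		return n;			/* everything fitted */
	return size ? (int)(size - 1) : 0;	/* truncated: exclude the trailing NUL */
}
```

With a 4-byte buffer and the string "hello", standard `snprintf` returns 5, whereas this sketch returns 3, the number of characters that actually landed in the buffer. Code that advances a pointer by the return value therefore stays inside the buffer.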
    • [PATCH] bd_set_size i_size handling · 53b15b86
      Andrew Morton authored
      We need to hold i_sem while running i_size_write().  But that seems like a
      lot of fuss and deadlock potential.  So just write the dang thing.
      53b15b86
    • [PATCH] Mark intermezzo as broken · eaaec5b5
      Andrew Morton authored
      The NGROUPS changes broke it, and we're not sure how to fix it, and nobody
      appears to be working on or testing intermezzo.
      eaaec5b5
    • [PATCH] NGROUPS 2.6.2rc2 + fixups · a937b06e
      Andrew Morton authored
      From: Tim Hockin <thockin@sun.com>,
            Neil Brown <neilb@cse.unsw.edu.au>,
            me
      
      New groups infrastructure.  task->groups and task->ngroups are replaced by
      task->group_info.  group_info is a refcounted, dynamic struct with an array
      of pages.  This allows for large numbers of groups.  The current limit of
      32 groups has been raised to 64k groups.  It can be raised more by changing
      the NGROUPS_MAX constant in limits.h
      a937b06e
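The "refcounted, dynamic struct with an array of pages" from the NGROUPS message can be sketched in userspace as below. All names carry a `_sketch` suffix because they are hypothetical; the 4K block size is an assumption, and the plain integer refcount is not thread-safe, unlike what a kernel implementation would need.

```c
#include <stdlib.h>

#define BLOCK_BYTES 4096	/* assumption: one 4K page per block */
#define GIDS_PER_BLOCK (BLOCK_BYTES / sizeof(unsigned int))

struct group_info_sketch {
	int usage;			/* reference count (not atomic in this sketch) */
	unsigned int ngroups;		/* number of group IDs stored */
	unsigned int nblocks;		/* number of page-sized blocks */
	unsigned int *blocks[];		/* one pointer per block */
};

/* The i-th group ID: pick the block, then the slot within it. */
#define GROUP_AT_SKETCH(gi, i) \
	((gi)->blocks[(i) / GIDS_PER_BLOCK][(i) % GIDS_PER_BLOCK])

/* Allocate room for ngroups IDs; returns NULL if any allocation fails. */
static struct group_info_sketch *groups_alloc_sketch(unsigned int ngroups)
{
	unsigned int nblocks = (ngroups + GIDS_PER_BLOCK - 1) / GIDS_PER_BLOCK;
	struct group_info_sketch *gi;
	unsigned int i;

	gi = malloc(sizeof(*gi) + nblocks * sizeof(gi->blocks[0]));
	if (!gi)
		return NULL;
	gi->usage = 1;
	gi->ngroups = ngroups;
	gi->nblocks = nblocks;
	for (i = 0; i < nblocks; i++) {
		gi->blocks[i] = calloc(1, BLOCK_BYTES);
		if (!gi->blocks[i])
			goto undo;
	}
	return gi;

undo:
	/* Free only the blocks that were successfully allocated. */
	while (i--)
		free(gi->blocks[i]);
	free(gi);
	return NULL;
}

/* Drop a reference; free everything when the last one goes away. */
static void put_group_info_sketch(struct group_info_sketch *gi)
{
	unsigned int i;

	if (--gi->usage > 0)
		return;
	for (i = 0; i < gi->nblocks; i++)
		free(gi->blocks[i]);
	free(gi);
}
```

Splitting the array into page-sized blocks avoids one large contiguous allocation: with 4-byte IDs, 64k groups fit in 64 separate 4K blocks.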
    • [PATCH] bonding alias revert and documentation fix · 7e594425
      Andrew Morton authored
      From: Rusty Russell <rusty@rustcorp.com.au>
      
      Jeff Garzik disliked the bonding driver knowing it was called "bond0".
      Remove that alias, and revert documentation.
      7e594425
    • [PATCH] add some more MODULE_ALIASes · 69b848dd
      Andrew Morton authored
      From: Rusty Russell <rusty@rustcorp.com.au>
      
      New MODULE_ALIASes in:
      1) arch/i386/kernel/microcode.c
      2) drivers/char/genrtc.c
      3) drivers/ide/ide-tape.c
      4) drivers/net/bonding/bond_main.c
      5) drivers/net/bsd_comp.c
      6) drivers/net/ppp_deflate.c
      7) drivers/net/ppp_generic.c
      69b848dd
    • [PATCH] Documentation: remove /etc/modules.conf refs · bf5e91d7
      Andrew Morton authored
      From: Rusty Russell <rusty@rustcorp.com.au>
      
      Someone complained about the number of references to /etc/modules.conf in
      the documentation.  While fixing them up (and examples where changed),
      removed those which are redundant due to MODULE_ALIAS.
      bf5e91d7
    • [PATCH] AMD Elan is a different subarch · 4aef2132
      Andrew Morton authored
      From: Adrian Bunk <bunk@fs.tum.de>
      
      - AMD Elan is a different subarch, you can't configure a kernel that runs
        on both the AMD Elan and other i386 CPUs
      
      - added optimizing CFLAGS for the AMD Elan
      4aef2132
    • [PATCH] gcc 2.95 supports -march=k6 (no need for check_gcc) · b26c400f
      Andrew Morton authored
      From: Adrian Bunk <bunk@fs.tum.de>
      
      gcc 2.95 supports -march=k6 (no need for check_gcc)
      b26c400f
    • [PATCH] add Pentium M and Pentium-4 M options · 53720dcf
      Andrew Morton authored
      From: Adrian Bunk <bunk@fs.tum.de>
      
      add Pentium M and Pentium-4 M options:
      
      - add MPENTIUMM (equivalent to PENTIUMIII except for a bigger
        X86_L1_CACHE_SHIFT)
      
      - document that MPENTIUM4 is the right choice for a Pentium-4 M
      53720dcf
    • [PATCH] Limit hashtable sizes · 7453596a
      Andrew Morton authored
      From: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
      
      The issue of exceedingly large hash tables was discussed on the
      mailing list a while back, but seems to have slipped through the cracks.
      
      What we found is it's not a problem for x86 (and most other
      architectures) because __get_free_pages won't be able to get anything
      beyond order MAX_ORDER-1 (10) which means at most those hash tables are
      4MB each (assume 4K page size).  However, on ia64, in order to support
      larger hugeTLB page size, the MAX_ORDER is bumped up to 18, which now
      means a 2GB upper limit enforced by the page allocator (assume 16K page
      size).  PPC64 is another example that bumps up MAX_ORDER.
      
      Last time I checked, the tcp ehash table was taking a whopping (insane!)
      2GB on one of our large machines.  The dentry and inode hash tables also
      take a considerable amount of memory.
      
      Setting the size of these tables is difficult: they need to be constrained on
      many-zone ia64 machines, but this could cause significant performance
      problems when there are (for example) 100 million dentries in cache.
      Large-memory machines which do not slice that memory up into huge numbers of
      zones do not need to run the risk of this slowdown.
      
      So the sizing algorithms remain essentially unchanged, and boot-time options
      are provided which permit the tables to be scaled down.
      7453596a
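The 4MB and 2GB figures quoted in the hashtable message follow from the largest order the page allocator can serve, MAX_ORDER-1. A small sketch of that arithmetic, where the function name is hypothetical and the page sizes are those the message assumes:

```c
#include <stdint.h>

/*
 * Largest contiguous allocation in bytes: 2^(max_order - 1) pages,
 * each of 2^page_shift bytes.
 */
static uint64_t max_alloc_bytes(unsigned int page_shift, unsigned int max_order)
{
	return (uint64_t)1 << (page_shift + max_order - 1);
}
```

For x86 with 4K pages (shift 12) and a top order of 10, this gives 2^22 bytes = 4MB; for ia64 with 16K pages (shift 14) and MAX_ORDER 18, it gives 2^31 bytes = 2GB.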
    • [PATCH] Use CPU_UP_PREPARE properly · 86c1b9ae
      Andrew Morton authored
      From: Rusty Russell <rusty@rustcorp.com.au>
      
      The cpu hotplug code actually provides two notifiers: CPU_UP_PREPARE
      which precedes the online and can fail, and CPU_ONLINE which can't.
      
      Current usage is only done at boot, so this distinction doesn't
      matter, but it's a bad example to set.  This also means that the
      migration threads do not have to be higher priority than the
      others, since they are ready to go before any CPU_ONLINE callbacks
      are done.
      
      This patch is experimental but fairly straightforward: I haven't been
      able to test it since extracting it from the hotplug cpu code, so it's
      possible I screwed something up.
      86c1b9ae
    • [PATCH] Remove More Unnecessary CPU Notifiers · 79caa7d5
      Andrew Morton authored
      From: Rusty Russell <rusty@rustcorp.com.au>
      
      Three more removed CPU notifiers extracted from the hotplug CPU patch.
      
      kernel/softirq.c: the tasklet cpu preparation callback is useless:
      the vectors are already initialized to NULL.  Even with the hotplug
      CPU patches, they're of little or no use.
      
      fs/buffer.c: once again, they are already initialized to zero.
      
      mm/page_alloc.c: once again, already initialized to zero.
      79caa7d5
    • [PATCH] Minor workqueue.c cleanup · d01feda8
      Andrew Morton authored
      From: Rusty Russell <rusty@rustcorp.com.au>
      
      Move duplicated code to __queue_work(), and don't set the CPU for
      queue_delayed_work() until the timer goes off.  The second one only has an
      effect on CONFIG_HOTPLUG_CPU where the CPU goes down and the timer goes off
      on a different CPU than it was scheduled on.
      d01feda8
    • [PATCH] Remove kstat cpu notifiers · 35651c8c
      Andrew Morton authored
      From: Rusty Russell <rusty@rustcorp.com.au>
      
      Some well-meaning person put a notifier in for CPUs to update the kstat
      structures in sched.c.  However, it does nothing, and even with the full
      hotplug CPU patch, it still does nothing.
      
      Simple counters very rarely need anything done when CPUs come up or go
      down.  If you have per-cpu caches, or per-cpu threads, you need to do
      something.  But very rarely for stats.
      35651c8c
    • [PATCH] kthread primitive · 933ba102
      Andrew Morton authored
      From: Rusty Russell <rusty@rustcorp.com.au>
      
      These two patches provide the framework for stopping kernel threads to
      allow hotplug CPU.  This one just adds kthread.c and kthread.h, next
      one uses it.
      
      Most importantly, adds a Monty Python quote to the kernel.
      
      Details:
      
      The hotplug CPU code introduces two major problems:
      
      1) Threads which previously never stopped (migration thread,
         ksoftirqd, keventd) have to be stopped cleanly as CPUs go offline.
      2) Threads which previously never had to be created now have
         to be created when a CPU goes online.
      
      Unfortunately, stopping a thread is fairly baroque, involving memory
      barriers, a completion and spinning until the task is actually dead
      (for example, complete_and_exit() must be used if inside a module).
      
      There are also three problems in starting a thread:
      1) Doing it from a random process context risks environment contamination:
         better to do it from keventd to guarantee a clean environment, a-la
         call_usermodehelper.
      2) Getting the task struct without races is hard: see kernel/sched.c
         migration_call(), kernel/workqueue.c create_workqueue_thread().
      3) There are races in starting a thread for a CPU which is not yet
         online: migration thread does a complex dance at the moment for
         a similar reason (there may be no migration thread to migrate us).
      
      Place all this logic in some primitives to make life easier:
      kthread_create() and kthread_stop().  These primitives require no
      extra data-structures in the caller: they operate on normal "struct
      task_struct"s.
      
      Other changes:
      
      - Expose keventd_up(), as keventd and migration threads will use kthread to
        launch, and kthread normally uses workqueues and must recognize this case.
      
      - Kthreads created at boot before "keventd" are spawned directly.  However,
        this means that they don't have all signals blocked, and hence can be
        killed.  The simplest solution is to always explicitly block all signals in
        the kthread.
      
      - Change over the migration threads, the workqueue threads and the
        ksoftirqd threads to use kthread.
      
      - module.c currently spawns threads directly to stop the machine, so a
        module can be atomically tested for removal.
      
      - Unfortunately, this means that the current task is manipulated (which
        races with set_cpus_allowed, for example), and it can't set its priority
        artificially high.  Using a kernel thread can solve this cleanly, and with
        kthread_run, it's simple.
      
      - kthreads use keventd, so they inherit its cpus_allowed mask.  Unset it.
        All current users set it explicitly anyway, but it's nice to fix.
      
      - call_usermodehelper uses keventd, so the process created inherits its
        cpus_allowed mask.  Unset it.
      
      - Prevent errors in boot when cpus_possible() contains a cpu which is not
        online (i.e. a cpu didn't come up).  This doesn't happen on x86, since a
        boot failure makes that CPU no longer possible (hacky, but it works).
      
      - When the cpu fails to come up, some callbacks do kthread_stop(), which
        doesn't work without keventd (which hasn't started yet).  Call it directly,
        and take care that it restores signal state (note: do_sigaction does a
        flush on blocked signals, so we don't need to repeat it).
      933ba102
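The create/stop pairing described in the kthread message has a rough userspace analogue, sketched here with POSIX threads. This is not the kernel API: every `worker_*` name is hypothetical, and the real primitives also deal with keventd, signals and CPU affinity, which this sketch ignores. It only shows the core idea of a stop flag plus waiting until the thread is actually gone.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct worker {
	pthread_t thread;
	atomic_bool should_stop;	/* set by worker_stop(), polled by the thread */
	int (*fn)(struct worker *w);	/* thread body; returns an exit code */
	int exit_code;
};

static void *worker_trampoline(void *arg)
{
	struct worker *w = arg;

	w->exit_code = w->fn(w);
	return NULL;
}

/* Analogue of creating a thread: returns NULL on failure. */
static struct worker *worker_create(int (*fn)(struct worker *w))
{
	struct worker *w = malloc(sizeof(*w));

	if (!w)
		return NULL;
	atomic_init(&w->should_stop, false);
	w->fn = fn;
	w->exit_code = 0;
	if (pthread_create(&w->thread, NULL, worker_trampoline, w) != 0) {
		free(w);
		return NULL;
	}
	return w;
}

/* Polled by the thread body to find out whether it has been asked to exit. */
static bool worker_should_stop(struct worker *w)
{
	return atomic_load(&w->should_stop);
}

/* Analogue of stopping: request exit, wait for the thread, return its code. */
static int worker_stop(struct worker *w)
{
	int code;

	atomic_store(&w->should_stop, true);
	pthread_join(w->thread, NULL);	/* the thread is really gone after this */
	code = w->exit_code;
	free(w);
	return code;
}

/* Example thread body: idle until asked to stop, then report a code. */
static int spin_until_stopped(struct worker *w)
{
	while (!worker_should_stop(w))
		sched_yield();
	return 7;
}
```

As with the primitives in the message, the caller needs no extra data structure beyond the handle it gets back: `worker_stop(worker_create(spin_until_stopped))` style usage returns the thread body's exit code once the thread has finished.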
    • [PATCH] ACPI PM timer · ad77865c
      Andrew Morton authored
      From: Dominik Brodowski <linux@dominikbrodowski.de>,
            John Stultz <johnstul@us.ibm.com>,
            Dmitry Torokhov
      
      Add the ACPI Power Management Timer as an x86 kernel timing source.  Unlike
      the Time Stamp Counter, it is a reliable timing source which does not get
      affected by aggressive power management features like CPU frequency scaling.
      
      Some ideas and some code are based on Arjan van de Ven's implementation for
      2.4, and on R. Byron Moore's drivers/acpi/hardware/hwtimer.c.
      
      
      We also replace the loop based delay_pmtmr with a TSC based delay_pmtmr,
      which resolves a number of issues caused by the loop based delay.  Unsynced
      TSCs as well as frequency-changing TSCs will affect the length of __delay(),
      but it seems this method works best.
      ad77865c
    • [PATCH] loop: remove redundant initialisation · ee6afa31
      Andrew Morton authored
      From: "Yury V. Umanets" <umka@namesys.com>
      
      This removes a redundant assignment in loop.
      ee6afa31
    • [PATCH] loop.c doesn't fail init gracefully · 685eba2c
      Andrew Morton authored
      From: BlaisorBlade <blaisorblade_spam@yahoo.it>
      
      loop_init doesn't fail gracefully for two reasons:
      
      1) If initialization of the loop driver fails, we have a call to
         devfs_add("loop") without any devfs_remove; I add that.
      
      2) On lwn.net 2.6 kernel docs, Jonathan Corbet says: "If you are calling
         add_disk() in your driver initialization routine, you should not fail
         the initialization process after the first call."
      
      So I make loop.c conform to this request by moving add_disk after all
      memory allocations.
      685eba2c