1. 12 Apr, 2003 24 commits
    • [PATCH] kNFSd: NFSD binary compatibility breakage · 88d416db
      Neil Brown authored
      The removal of "struct nfsctl_uidmap" from "nfsctl_fdparm" broke
      binary compatibility on 64-bit platforms (strictly speaking: on all
      platforms with alignof(void *) > alignof(int)).  The problem is that
      nfsctl_uidmap contained a "char *", which forced the alignment of the
      entire union to be 64 bits.  With the removal of the uidmap, the
      required alignment drops to 32 bits.  Since the first member is only
      32 bits in size, this breaks compatibility with user-space.  Patch
      below fixes the problem.
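
      A minimal sketch of the layout change described above, using
      hypothetical stand-in types rather than the real nfsctl structures.
      A union that contains a pointer is aligned like the pointer, so on
      platforms with alignof(char *) > alignof(int) it lands at a different
      offset after a 32-bit member than a union that holds only an int.

```c
#include <stddef.h>

/* Hypothetical stand-ins; the names and members are assumptions. */
union sketch_arg_with_ptr {
    int fd;
    char *map;      /* the pointer forces pointer alignment on the union */
};

union sketch_arg_without_ptr {
    int fd;         /* only int alignment is required now */
};

struct sketch_ctl_old {
    int cmd;                          /* 32-bit first member */
    union sketch_arg_with_ptr u;
};

struct sketch_ctl_new {
    int cmd;
    union sketch_arg_without_ptr u;
};

/* Offset of the union as user space compiled against the old header sees it. */
size_t sketch_old_union_offset(void)
{
    return offsetof(struct sketch_ctl_old, u);
}

/* Offset of the union once the pointer member is gone. */
size_t sketch_new_union_offset(void)
{
    return offsetof(struct sketch_ctl_new, u);
}
```

      Where the two offsets differ, a kernel built with the new layout
      misreads arguments from binaries built against the old one.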
    • [PATCH] kNFSd: Return correct result for ACCESS(READ) on eXecute-only file. · 4fe13364
      Neil Brown authored
      Currently, an NFSv3 ACCESS check for READ permission on an
      eXecute-only file will succeed where it should fail.
      
      This is because nfsd_permission allows READ access to eXecute only
      files so that mode 711 executables can be loaded and run, and
      nfsd_access simply uses nfsd_permission.
      
      This patch changes nfsd_permission to only map eXecute permission to
      read permission if MAY_OWNER_OVERRIDE was set.  This is only set
      when trying to read from a file, so ACCESS will no longer be tricked.
      
      This change will only affect callers of nfsd_permission that specify
      MAY_READ and not MAY_OWNER_OVERRIDE, and nfsd_access is the only
      routine that calls nfsd_permission (via fh_verify) that way.
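
      The rule can be sketched as follows.  This is an illustration of the
      described behaviour, not the nfsd_permission code: the function name,
      the flag values and the simplified mode argument are all assumptions.

```c
/* Illustrative flag values; assumptions, not taken from the nfsd headers. */
#define SKETCH_MAY_EXEC            1
#define SKETCH_MAY_WRITE           2
#define SKETCH_MAY_READ            4
#define SKETCH_MAY_OWNER_OVERRIDE 64

/*
 * granted: rwx bits the caller holds on the file (4 = r, 2 = w, 1 = x).
 * acc:     requested access plus optional SKETCH_MAY_OWNER_OVERRIDE.
 * Returns 0 if access is allowed, -1 if it is denied.
 */
int sketch_permission(unsigned int granted, int acc)
{
    unsigned int want = (unsigned int)acc &
        (SKETCH_MAY_READ | SKETCH_MAY_WRITE | SKETCH_MAY_EXEC);

    if ((want & granted) == want)
        return 0;

    /*
     * Execute permission stands in for read permission only when the
     * request comes from a real read (owner override set), so that
     * execute-only binaries can still be loaded.
     */
    if (want == SKETCH_MAY_READ &&
        (acc & SKETCH_MAY_OWNER_OVERRIDE) &&
        (granted & SKETCH_MAY_EXEC))
        return 0;

    return -1;
}
```

      An ACCESS-style query passes SKETCH_MAY_READ alone and is refused on
      an execute-only file, while a read that also passes the override flag
      still succeeds.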
    • [PATCH] kNFSd: nfsd/export.c tidyup and add missing exp_put · 3a280533
      Neil Brown authored
      There was a missing exp_put in export.c so that after a client
      mounts an exported filesystem, the server would never be able to
      unmount, even after trying to unexport.  This is fixed by the last
      chunk of this patch.
      
      Also assorted cleanups to the code found while hunting.
    • [PATCH] Put all functions in kallsyms · 4b4bd81a
      Andrew Morton authored
      From: Rusty Russell <rusty@rustcorp.com.au>
      
      Introduce _sinittext and _einittext (cf. _stext and _etext), so kallsyms
      includes __init functions.
      
      TODO: Use huffman name compression and 16-bit offsets (see IDE
      oopser patch)
    • [PATCH] use spinlocking in the ext2 inode allocator · 77a9874a
      Andrew Morton authored
      From Alex Tomas and myself
      
      It is identical in concept to the block allocator change.  It uses the same
      hashed spinlock.
    • [PATCH] use spinlocking in the ext2 block allocator · c14c1a44
      Andrew Morton authored
      From Alex Tomas and myself
      
      ext2 currently uses lock_super() to protect the filesystem's in-core block
      allocation bitmaps.
      
      On big SMP machines the contention on that semaphore is causing high context
      switch rates, large amounts of idle time and reduced throughput.
      
      The context switch rate can also worsen block allocation: if several tasks
      are trying to allocate blocks inside the same blockgroup for different files,
      madly rotating between those tasks will cause the files' blocks to be
      intermingled.
      
      On SDET and dbench-style workloads (lots of tasks doing lots of allocation)
      this patch (and a similar one for the inode allocator) improves throughput on
      an 8-way by ~15%.  On 16-way NUMAQ the speedup is 150%.
      
      What we do is to remove the lock altogether and just rely on the atomic
      semantics of test_and_set_bit(): if the allocator sees a block was free it
      runs test_and_set_bit().  If that fails, then we raced and the allocator will
      go and look for another block.
      
      Of course, we don't really use test_and_set_bit() because that
      isn't endian-independent.  New atomic endian-independent functions are
      introduced: ext2_set_bit_atomic() and ext2_clear_bit_atomic().  We do not
      need ext2_test_bit_atomic(), since even if ext2_test_bit() returns the wrong
      result, that error will be detected and naturally handled in the subsequent
      ext2_set_bit_atomic().
      
      For little-endian machines the new atomic ops map directly onto the
      test_and_set_bit(), etc.
      
      For big-endian machines we provide the architecture's implementation with the
      address of a spinlock which can be taken around the nonatomic ext2_set_bit().
      The spinlocks are hashed, and the hash is scaled according to the machine
      size.  Architectures are free to implement optimised versions of
      ext2_set_bit_atomic() and ext2_clear_bit_atomic().
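
      The lockless claim-and-retry idea can be sketched in user-space C,
      with C11 atomics standing in for the kernel bit operations.  The
      sketch_* names are hypothetical and this is not the ext2 code.

```c
#include <limits.h>
#include <stdatomic.h>

#define SKETCH_WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Plain test: the answer may be stale by the time the caller acts on it. */
int sketch_test_bit(unsigned int nr, _Atomic unsigned long *map)
{
    unsigned long mask = 1UL << (nr % SKETCH_WORD_BITS);

    return (atomic_load(&map[nr / SKETCH_WORD_BITS]) & mask) != 0;
}

/* Atomically set the bit and return its previous value. */
int sketch_test_and_set_bit(unsigned int nr, _Atomic unsigned long *map)
{
    unsigned long mask = 1UL << (nr % SKETCH_WORD_BITS);
    unsigned long old = atomic_fetch_or(&map[nr / SKETCH_WORD_BITS], mask);

    return (old & mask) != 0;
}

/* Returns the index of the block that was claimed, or -1 if none is free. */
int sketch_alloc_block(_Atomic unsigned long *map, unsigned int nbits)
{
    unsigned int nr;

    for (nr = 0; nr < nbits; nr++) {
        if (sketch_test_bit(nr, map))
            continue;               /* looks busy, try the next block */
        if (!sketch_test_and_set_bit(nr, map))
            return (int)nr;         /* bit was clear: the block is ours */
        /* Bit was already set: we raced and lost, so keep looking. */
    }
    return -1;
}
```

      A stale answer from the plain test is harmless here, because the
      atomic set that follows decides who actually owns the block.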
    • [PATCH] blockgroup_lock: hashed spinlocks for ext2 and ext3 · c9db333a
      Andrew Morton authored
      ext2 and ext3 per-blockgroup metadata needs locking.  An fs-wide lock is
      expensive, and a per-blockgroup lock consumes too much storage (up to 32768
      blockgroups per filesystem).  We need something in-between.
      
      blockgroup_locks are very simple hashed spinlocks which provide this
      compromise.  The size of the lock is scaled by NR_CPUS to implement an
      additional speed/space tradeoff.
      
      These locks are actually fairly generic.  However I presented them as something
      which is specific to ext2 and ext3 so that people wouldn't go using them all
      over the place.  They consume a lot of storage.
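
      A hashed lock of this kind can be sketched as a small fixed array of
      spinlocks indexed by the low bits of the blockgroup number.  The
      names and the fixed table size below are assumptions; the real
      structure scales its size with NR_CPUS.

```c
#include <stdatomic.h>

#define SKETCH_NR_LOCKS 8   /* must be a power of two; value is an assumption */

struct sketch_bgl {
    atomic_flag locks[SKETCH_NR_LOCKS];
};

/* Many blockgroups share one lock: group N maps to slot N mod table size. */
unsigned int sketch_bgl_index(unsigned int group)
{
    return group & (SKETCH_NR_LOCKS - 1);
}

void sketch_bgl_init(struct sketch_bgl *bgl)
{
    unsigned int i;

    for (i = 0; i < SKETCH_NR_LOCKS; i++)
        atomic_flag_clear(&bgl->locks[i]);
}

void sketch_bgl_lock(struct sketch_bgl *bgl, unsigned int group)
{
    atomic_flag *lock = &bgl->locks[sketch_bgl_index(group)];

    /* Spin until the previous value was clear, i.e. we took the lock. */
    while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
        ;
}

void sketch_bgl_unlock(struct sketch_bgl *bgl, unsigned int group)
{
    atomic_flag_clear_explicit(&bgl->locks[sketch_bgl_index(group)],
                               memory_order_release);
}
```

      Two groups that hash to the same slot contend with each other, which
      is the price paid for not storing one lock per blockgroup.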
    • [PATCH] percpu_counters: approximate but scalable counters · ba8e8755
      Andrew Morton authored
      Several places in ext2 and ext3 are using filesystem-wide counters which use
      global locking.  Mainly for the orlov allocator's heuristics.
      
      To solve the contention which this causes we can trade off accuracy against
      speed.
      
      This patch introduces a "percpu_counter" library type in which the counts are
      per-cpu and are periodically spilled into a global counter.  Readers only
      read the global counter.
      
      These objects are *large*.  On a 32 CPU P4, they are 4 kbytes.  On a 4 way
      p3, 128 bytes.
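
      The accuracy/speed tradeoff can be sketched like this.  It is a
      single-threaded illustration with hypothetical names and an assumed
      batch size; the real type takes a lock when it spills into the
      global count.

```c
#define SKETCH_NR_CPUS 4
#define SKETCH_BATCH   8    /* spill threshold; value is an assumption */

struct sketch_percpu_counter {
    long global;                    /* the only field readers look at */
    long local[SKETCH_NR_CPUS];     /* per-cpu deltas not yet spilled */
};

/* Add amount on behalf of cpu, spilling once the local delta gets large. */
void sketch_counter_mod(struct sketch_percpu_counter *c, int cpu, long amount)
{
    long delta = c->local[cpu] + amount;

    if (delta >= SKETCH_BATCH || delta <= -SKETCH_BATCH) {
        c->global += delta;         /* the real code would lock here */
        delta = 0;
    }
    c->local[cpu] = delta;
}

/* Cheap and approximate: unspilled per-cpu deltas are not included. */
long sketch_counter_read(const struct sketch_percpu_counter *c)
{
    return c->global;
}
```

      In this sketch a reader can be off by up to SKETCH_BATCH - 1 per CPU
      in either direction.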
    • [PATCH] /proc/meminfo documentation · f688c084
      Andrew Morton authored
      From: Dave Hansen <haveblue@us.ibm.com>
      
      Documents the information in /proc/meminfo
    • [PATCH] vmalloc stats in /proc/meminfo · ffa5b8eb
      Andrew Morton authored
      From: Matt Porter <porter@cox.net>
      
      There was a thread a while back on lkml where Dave Hansen proposed this
      simple vmalloc usage reporting patch.  The thread pretty much died out as
      most people seemed focused on what VM loading type bugs it could solve.  I
      had posted that this type of information was really valuable in debugging
      embedded Linux board ports.  A common example is where people do arch
      specific setup that limits their vmalloc space and then they find modules
      won't load.  ;) Having the Vmalloc* info readily available is real useful in
      helping folks to fix their kernel ports.
    • [PATCH] /proc/interrupts allocates too much memory · 873015a8
      Andrew Morton authored
      From: David Mosberger <davidm@napali.hpl.hp.com>
      
      interrupts_open() can easily try to kmalloc() more memory than
      supported by kmalloc.  E.g., with 16KB page size and NR_CPUS==64, it
      would try to allocate 147456 bytes.
      
      The workaround below is to allocate 4KB per 8 CPUs.  Not really a
      solution, but the fundamental problem is that /proc/interrupts
      shouldn't use a fixed buffer size in the first place.  I suppose
      another solution would be to use vmalloc() instead.  It all feels like
      bandaids though.
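
      The quoted figure is what PAGE_SIZE * (1 + NR_CPUS / 8) yields for a
      16KB page and 64 CPUs.  Whether that is the exact expression used in
      interrupts_open(), and whether the workaround keeps the same shape
      with 4KB in place of PAGE_SIZE, are assumptions in this sketch.

```c
#include <stddef.h>

/* Assumed original sizing: one page, plus one page per 8 CPUs. */
size_t sketch_old_bufsize(size_t page_size, unsigned int nr_cpus)
{
    return page_size * (1 + nr_cpus / 8);
}

/* Assumed workaround sizing: fixed 4KB units instead of whole pages. */
size_t sketch_new_bufsize(unsigned int nr_cpus)
{
    return (size_t)4096 * (1 + nr_cpus / 8);
}
```

      With 64 CPUs the first form asks for 9 * 16384 = 147456 bytes, while
      the second asks for 9 * 4096 = 36864 bytes.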
    • [PATCH] Fix kmalloc_sizes[] indexing · 830d6ef2
      Andrew Morton authored
      From: Brian Gerst and David Mosberger
      
      The previous fix to the kmalloc_sizes[] array didn't null-terminate the
      correct array.
      
      Fix that up, and also avoid running ARRAY_SIZE() against an array which is
      really a null-terminated list.
    • [PATCH] architecture hooks for mem_map initialization · 17817b89
      Andrew Morton authored
      From: Christoph Hellwig <hch@lst.de>
      
      This patch is from the IA64 tree, with minor cleanups from me.
      
      Split out initialization of pgdat->node_mem_map into a separate function
      and allow architectures to override it.  This is needed for HP IA64
      machines that have a virtually mapped memory map to support big
      memory holes without having to use discontigmem.
      
      (memmap_init_zone is non-static to allow the IA64 code to use it -
       I did that instead of passing its address into the arch hook as
       it is done currently in the IA64 tree)
    • [PATCH] bootmem speedup from the IA64 tree · 79e626e1
      Andrew Morton authored
      From: Christoph Hellwig <hch@lst.de>
      
      This patch is from the IA64 tree, with some minor cleanups by me.
      David described it as:
      
        This is a performance speed up and some minor indentation fixups.

        The problem is that the bootmem code is (a) hugely slow and (b) has
        execution time that grows quadratically with the size of the bootmap bitmap.
        This causes noticeable slowdowns, especially on machines with (relatively)
        large holes in the physical memory map.  Issue (b) is addressed by
        maintaining the "last_success" cache, so that we start the next search
        from the place where we last found some memory (this part of the patch
        could stand additional reviewing/testing).  Issue (a) is addressed by
        using find_next_zero_bit() instead of the slow bit-by-bit testing.
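
      Both ideas can be sketched in user-space C.  The search helper skips
      fully allocated words instead of testing every bit, and the allocator
      resumes from the last successful position.  All sketch_* names are
      hypothetical, and the wrap-around policy is an assumption rather than
      the bootmem code's behaviour.

```c
#include <limits.h>
#include <stddef.h>

#define SKETCH_WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Index of the first zero bit in [start, size), or size if there is none. */
size_t sketch_find_next_zero_bit(const unsigned long *map, size_t size,
                                 size_t start)
{
    size_t i = start;

    while (i < size) {
        unsigned long word = map[i / SKETCH_WORD_BITS];

        if (i % SKETCH_WORD_BITS == 0 && word == ~0UL) {
            i += SKETCH_WORD_BITS;      /* whole word in use: skip it */
            continue;
        }
        if (!(word & (1UL << (i % SKETCH_WORD_BITS))))
            return i;
        i++;
    }
    return size;
}

struct sketch_bootmap {
    unsigned long *map;
    size_t size;            /* number of valid bits in map */
    size_t last_success;    /* where the previous search succeeded */
};

/* Claim one free page; returns its index, or size when the map is full. */
size_t sketch_alloc_page(struct sketch_bootmap *b)
{
    size_t i = sketch_find_next_zero_bit(b->map, b->size, b->last_success);

    if (i == b->size) {
        /* Nothing at or after the cached position: rescan the front. */
        i = sketch_find_next_zero_bit(b->map, b->last_success, 0);
        if (i == b->last_success)
            return b->size;
    }
    b->map[i / SKETCH_WORD_BITS] |= 1UL << (i % SKETCH_WORD_BITS);
    b->last_success = i;
    return i;
}
```

      Starting at last_success avoids rescanning the allocated prefix on
      every call, which is where the quadratic cost came from.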
    • [PATCH] convert file_lock to a spinlock · a413a276
      Andrew Morton authored
      Time to write a 2M file, one byte at a time:
      
      Before:
              1.09s user 4.92s system 99% cpu 6.014 total
              0.74s user 5.28s system 99% cpu 6.023 total
              1.03s user 4.97s system 100% cpu 5.991 total
      
      After:
              0.79s user 5.17s system 99% cpu 5.993 total
              0.79s user 5.17s system 100% cpu 5.957 total
              0.84s user 5.11s system 100% cpu 5.942 total
    • [PATCH] correct vm_page_prot on stack pages · 300c2652
      Andrew Morton authored
      From: David Mosberger <davidm@napali.hpl.hp.com>
      
      The patch below is needed to make it possible to map stack pages
      without execution permission (as we do on ia64).
    • [PATCH] don't clear PG_uptodate on ENOSPC · 2accc2e3
      Andrew Morton authored
      If get_block() returns -ENOSPC __block_write_full_page() is currently
      clearing PG_uptodate.
      
      That doesn't make any sense - failure to allocate space (or an IO error) does
      not make the page not uptodate.  It will create pages which are dirty, mapped
      into pagetables and not uptodate, which is a nonsensical state.
    • [PATCH] Fix deadlock with ext3+quota · 36b4f825
      Andrew Morton authored
      From: Jan Kara <jack@ucw.cz>
      
      Fixes a deadlock-causing lock-ranking bug between dqio_sem and
      journal_start().
      
      It sets up the needed infrastructure so that the quota code's sync_dquot()
      operation can call into ext3 and arrange for the transaction start to be
      nested outside the taking of dqio_sem.
    • [PATCH] Remove flush_page_to_ram() · edf20d3a
      Andrew Morton authored
      From: Hugh Dickins <hugh@veritas.com>
      
      This patch removes the long deprecated flush_page_to_ram.  We have
      two different schemes for doing this cache flushing stuff, the old
      flush_page_to_ram way and the not so old flush_dcache_page etc. way:
      see DaveM's Documentation/cachetlb.txt.  Keeping flush_page_to_ram
      around is confusing, and makes it harder to get this done right.
      
      All architectures are updated, but the only ones where it amounts
      to more than deleting a line or two are m68k, mips, mips64 and v850.
      
      I followed a prescription from DaveM (though not to the letter), that
      those arches with non-nop flush_page_to_ram need to do what it did
      in their clear_user_page and copy_user_page and flush_dcache_page.
      
      Dave is concerned that, in the v850 nb85e case, this patch leaves its
      flush_dcache_page as was, uses it in clear_user_page and copy_user_page,
      instead of making them all flush icache as well.  That may be wrong:
      I'm just hesitant to add cruft blindly, changing a flush_dcache macro
      to flush icache too; and naively hope that the necessary flush_icache
      calls are already in place.  Miles, please let us know which way is
      right for v850 nb85e - thanks.
    • [PATCH] remove the test for null waitqueue in __wake_up() · 831cbe24
      Andrew Morton authored
      I've had a warning in there for 4-5 months and it has never triggered.  I
      think it's safe to remove this test.
    • [PATCH] Fix gen_rtc compilation error · a802b873
      Andrew Morton authored
      From: Geert Uytterhoeven <geert@linux-m68k.org>
      
      It updates include/asm-{generic,parisc}/rtc.h for the recent changes in
      drivers/char/genrtc.c and include/asm-{m68k,ppc}/rtc.h.
      
      get_rtc_time() now returns some RTC flags instead of a 0/-1 success/failure
      indicator.  These flags include:
      
         - RTC_BATT_BAD: RTC battery is bad (can be detected on PA-RISC)
         - RTC_24H: Clock runs in 24 hour mode
      
      Most of these flags are the same as drivers/char/rtc.c, but RTC_BATT_BAD is a
      new one.
    • [PATCH] radix_tree_delete API improvement · ed49cb09
      Andrew Morton authored
      radix_tree_delete() currently returns 0 on success, -ENOENT if there was
      nothing to delete.
      
      But it is more useful to return the address of the deleted item on success
      and NULL if there was no matching item.  It can potentially save a
      lookup+delete operation.
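
      The calling convention can be sketched with a flat slot table
      standing in for the tree.  This is not a radix tree and the sketch_*
      names are hypothetical; only the return-value contract is
      illustrated.

```c
#include <stddef.h>

#define SKETCH_SLOTS 16

struct sketch_tree {
    void *slots[SKETCH_SLOTS];
};

/* Remove the item at index and hand it back, or return NULL if absent. */
void *sketch_delete(struct sketch_tree *tree, unsigned long index)
{
    void *item;

    if (index >= SKETCH_SLOTS)
        return NULL;

    item = tree->slots[index];
    tree->slots[index] = NULL;
    return item;
}
```

      A caller that needs the removed object no longer has to look it up
      first and then delete it; a single call does both.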
    • [PATCH] kobject hotplug fixes · 932fd605
      Andrew Morton authored
      - allocated storage `envp' was being leaked on an error path
      
      - kmalloc() returns void*, no need to cast it
      
      - don't return 0 from a void-returning function
      
      Greg has acked this patch.
    • Ben Collins's avatar
      a94538ff
  2. 11 Apr, 2003 16 commits