1. 01 Nov, 2002 14 commits
    • [PATCH] scsi_get_request_dev() cleanup · e09824be
      Alexander Viro authored
      	_Now_ we can clean the scsi_get_request_dev() up.  Indeed, for
      any SCSI request we either have ->rq_dev == NODEV and ->rq_disk == NULL
      or ->rq_disk->private_data points to the address of the template in question.
      IOW, scsi_get_request_dev() becomes simply
      {
      	struct gendisk *p = req->rq_disk;
      	return p ? *(struct Scsi_Device_Template **)p->private_data : NULL;
      }
      and that allows us to kill ->max_major, ->min_major and ->major in
      Scsi_Device_Template, along with the last non-trivial use of ->rq_dev.
    • [PATCH] sg template · 1c48fba3
      Alexander Viro authored
      	Ditto for sg.c
    • [PATCH] sd template · eb7f5858
      Alexander Viro authored
      	Ditto for sd.c
    • [PATCH] sr template · 4a17ca93
      Alexander Viro authored
      	Ditto for sr.c
    • [PATCH] st template · 1061e346
      Alexander Viro authored
      	Ditto for st.c
    • [PATCH] osst template · 2572ba40
      Alexander Viro authored
      	Next 5 chunks prepare cleanup of scsi_get_request_dev().  Namely,
      scsi_disk/scsi_cd/... get a new field - pointer to Scsi_Device_Template.
      It is initialized with address of that driver's template.  sr.c and sd.c
      have disk->private_data pointing to that field (instead of pointing to
      entire structure).  osst.c, st.c and sg.c get a gendisk - allocated but
      not registered (obviously) - with ->private_data set in the same way.  When
      they set ->rq_dev, they also set ->rq_disk.
      	This chunk does it for osst.c
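
      	A rough sketch of that wiring (the struct and variable names here
      are illustrative guesses in the style of osst.c, not lifted from the
      patch itself):

      	static Scsi_Device_Template osst_template;

      	struct os_scsi_tape_sketch {
      		Scsi_Device_Template *driver;	/* new field */
      		struct gendisk *disk;		/* allocated, never registered */
      	};

      	/* at attach time: */
      	tpnt->driver = &osst_template;
      	tpnt->disk->private_data = &tpnt->driver;	/* address of the field */

      	/* and wherever a request's ->rq_dev gets set: */
      	req->rq_dev = dev;
      	req->rq_disk = tpnt->disk;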
    • [PATCH] file->private_data in st.c and osst.c · 02558d08
      Alexander Viro authored
      	->open() of st and osst sets file->private_data to the Scsi_Tape in
      question; the other methods use it (same as in sg.c).
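
      	A minimal sketch of the pattern (the table lookup and the read
      helper are assumptions; st.c indexes its tapes by minor number):

      	static int st_open(struct inode *inode, struct file *filp)
      	{
      		Scsi_Tape *STp = scsi_tapes[TAPE_NR(inode->i_rdev)];	/* assumed lookup */
      		filp->private_data = STp;
      		return 0;
      	}

      	static ssize_t st_read(struct file *filp, char *buf, size_t count, loff_t *ppos)
      	{
      		Scsi_Tape *STp = filp->private_data;	/* no inode lookup needed */
      		return st_do_read(STp, buf, count, ppos);	/* hypothetical helper */
      	}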
    • [PATCH] tape_name() in st.c · 10e76361
      Alexander Viro authored
      	* new inlined helper: tape_name(tape)
      	* most of TAPE_NR() uses replaced with that animal
      ("st%d ...", TAPE_NR(STp), ... -> "%s ...", tape_name(STp), ... )
    • [PATCH] tape_name() in osst.c · 84e4e82e
      Alexander Viro authored
      	* new inlined helper: tape_name(tape)
      	* most of TAPE_NR() uses replaced with that animal
      ("osst%d ...", TAPE_NR(STp), ... -> "%s ...", tape_name(STp), ... )
    • [PATCH] more alpha build fixes · 6c61f0f9
      Ivan Kokshaysky authored
      - isapnp: asm/io.h is needed for inb() etc.;
      - sync up with 2.5.44 vmlinux.lds changes.
    • [PATCH] fix 2.5.45 initrd breakage · f208c0ee
      Alexander Viro authored
      OK, that's my f*ckup in rd.c (not on the initrd path, actually) + a couple
      of f*ckups from Pat (mine: forgot to bump ->bd_count in rd_open(); Pat's:
      dropped the reference to gendisk on del_gendisk(), resulting in use of a
      kfree'd object + tried to remove a symlink that didn't exist).
      
      This fixes these.  It also changes the order of blkdev_put()/del_gendisk()
      in initrd_release() - better safe than sorry.
      
      It got initrd working on my boxen...
    • Merge http://jfs.bkbits.net/linux-2.5 · e6f13c6c
      Linus Torvalds authored
      into home.transmeta.com:/home/torvalds/v2.5/linux
    • JFS: add posix acls · 1e1c167a
      Dave Kleikamp authored
      The posix acls are implemented as extended attributes and are compatible
      with ext2/ext3 posix acls.
    • Merge jfs@jfs.bkbits.net:linux-2.5 · 19d973f0
      Dave Kleikamp authored
      into shaggy.austin.ibm.com:/shaggy/bk/jfs-2.5
  2. 31 Oct, 2002 26 commits
    • Merge http://lia64.bkbits.net/to-linus-2.5 · 19cdce9c
      Linus Torvalds authored
      into home.transmeta.com:/home/torvalds/v2.5/linux
    • Merge bk://linux-bt.bkbits.net/bt-2.5 · 2b738648
      Linus Torvalds authored
      into home.transmeta.com:/home/torvalds/v2.5/linux
    • ia64: Update defconfig. · 50155729
      David Mosberger authored
    • [PATCH] ia64: 2.5.44 NUMA fixups · b3dc1acc
      Erich Focht authored
      Dear David,
      
      please find attached two patches for the latest 2.5.44-ia64. They fix
      some problems and simplify things a bit.
      
      remove_nodeid-2.5.44.patch:
      This comes from Kimi. In 2.5.44 we suddenly had two definitions for
      numa_node_id(), one was IA64 specific (local_cpu_data->nodeid) while
      the other one is now platform independent:
      __cpu_to_node(smp_processor_id()). After some discussions we decided
      to remove the nodeid from the local_cpu_data and keep the definition of
      all other platforms.  By using the cpu_to_node_map[] we are also
      faster when doing multiple lookups, as all node ids come in a single
      cache line (which is not bounced around, as its content is only
      read).
      
      
      ia64_topology_fixup-2.5.44.patch:
      I'm following here the latest fixup for i386 from Matthew Dobson. The
      __node_to_cpu_mask() macro now accesses an array which is initialized
      after the ACPI CPU discovery. It also simplifies
      __node_to_first_cpu(). A compiler warning has been fixed, too.
      
      
      Please apply these to your kernel tree.
    • ia64: Sync up with 2.5.45. · a376ed89
      David Mosberger authored
    • [PATCH] fix UP proc.c compile warning · 715814fd
      Robert Love authored
      The hyper-threading in /proc/cpuinfo patch introduced a compile warning
      under UP.
      
      Fixed thus.
    • [PATCH] Clear TLS on execve · 490f7ca4
      Luca Barbieri authored
      This trivial patch causes the TLS to be cleared on execve (code is in
      flush_thread).  This is necessary to avoid ESRCH errors when
      set_thread_area is asked to choose a free TLS entry after several nested
      execve's.
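
      Conceptually the fix is one line in flush_thread() - a sketch, assuming
      the tls_array field used by the i386 TLS patches:

      	/* arch/i386/kernel/process.c:flush_thread(), roughly: */
      	memset(current->thread.tls_array, 0, sizeof(current->thread.tls_array));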
      
      The LDT also has a similar problem, but it is less serious because the
      LDT code doesn't scan for free entries.  I'll probably send a patch to
      fix this too, unless there is something important relying on this
      behavior.
    • Merge bk://cifs.bkbits.net/linux-2.5cifs · 08b27f50
      Linus Torvalds authored
      into home.transmeta.com:/home/torvalds/v2.5/linux
    • [PATCH] fix APIC errors on oprofile restore · c8a9fb59
      John Levon authored
      As per comment:
      
        restoring APIC_LVTPC can trigger an apic error because the delivery
        mode and vector nr combination can be illegal.  That's by design: on
        power-on the apic lvt contains a zero vector nr, which is legal only
        for NMI delivery mode.  So inhibit the apic error before restoring lvtpc
    • [PATCH] fix sys_lookup_dcookie prototype · 1c03b1a9
      John Levon authored
      We need to use u64 because the future 64-bit ports can theoretically
      return the same value for two different dentries, as pointed out by
      Ulrich Weigand.
      
      The patch also changes the return value of the syscall to give the length
      of data copied, needed for valgrind support (this bit is by Philippe Elie).
      
      Note this is not a complete fix for mixed 32/64: userspace needs to
      figure out the kernel pointer size when reading from the buffer. But
      that's another fix...
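
      As far as I can tell, the fixed prototype comes out as:

      	asmlinkage long sys_lookup_dcookie(u64 cookie64, char *buf, size_t len);
      	/* returns the number of bytes copied, or a negative errno */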
      
      NOTE! any oprofile users will need to upgrade after this goes in, and
      the user-space equivalent is checked into CVS.  Sorry for the inconvenience
    • [PATCH] additional arch support for per-cpu kernel_stat · 97679f9c
      Andrew Morton authored
      Companion to the previous patch: all the support needed for non-ia32
      architectures.
    • [PATCH] make kernel_stat use per-cpu infrastructure · fd3e6205
      Andrew Morton authored
      Patch from Ravikiran G Thirumalai <kiran@in.ibm.com>
      
      1. Break the disk stats out of kernel_stat and move them to blkdev.h
      
      2. Group the cpu stats in kernel_stat and make them "per_cpu" instead of
         an NR_CPUS array (see the sketch after this list)
      
      3. Remove EXPORT_SYMBOL(kstat) from ksyms.c (as I noticed that no module is
         using kstat)
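
      The sketch for point 2 (the accessor macro name is my guess; the
      per-cpu API itself is from this era):

      	/* before: one global kernel_stat with foo[NR_CPUS] fields inside */
      	DEFINE_PER_CPU(struct kernel_stat, kstat);
      	#define kstat_cpu(cpu)	per_cpu(kstat, cpu)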
    • [PATCH] uninlining in ipc/* · 8f2215c6
      Andrew Morton authored
      Uninlines some large functions in the ipc code.
      
      Before:
         text    data     bss     dec     hex filename
        30226     224     192   30642    77b2 ipc/built-in.o
      
      After:
         text    data     bss     dec     hex filename
        20274     224     192   20690    50d2 ipc/built-in.o
    • [PATCH] use RCU for IPC locking · bb468c02
      Andrew Morton authored
      Patch from Mingming, Rusty, Hugh, Dipankar, me:
      
      - It greatly reduces the lock contention by having one lock per id.
        The global spinlock is removed and a spinlock is added in
        kern_ipc_perm structure.
      
      - Uses Read-Copy Update (RCU) in grow_ary() for lock-free resizing
        (sketched below).
      
      - In the places where ipc_rmid() is called, defer calling ipc_free()
        to RCU callbacks.  This is to prevent ipc_lock() returning an invalid
        pointer after ipc_rmid().  In addition, use the workqueue to enable
        RCU freeing of vmalloced entries.
      
      Also some other changes:
      
      - Remove redundant ipc_lockall/ipc_unlockall
      
      - Now ipc_unlock() takes the IPC ID pointer directly as its argument,
        avoiding an extra lookup in the array.
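
      A condensed sketch of the per-id lock plus RCU lookup described above
      (simplified; the exact field names are assumptions):

      	struct kern_ipc_perm *ipc_lock(struct ipc_ids *ids, int id)
      	{
      		struct kern_ipc_perm *out;
      		int lid = id % SEQ_MULTIPLIER;

      		rcu_read_lock();
      		if (lid >= ids->size) {
      			rcu_read_unlock();
      			return NULL;
      		}
      		out = ids->entries[lid].p;	/* entries[] may be swapped by grow_ary() */
      		if (out == NULL) {
      			rcu_read_unlock();
      			return NULL;
      		}
      		spin_lock(&out->lock);		/* per-id lock, not a global one */
      		return out;
      	}

      The matching ipc_unlock() would then drop the per-id spinlock and leave
      the RCU read-side critical section.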
      
      The changes are made based on input from Hugh Dickins, Manfred
      Spraul and Dipankar Sarma.  In addition, Cliff White has run OSDL's
      dbt1 test on a 2-way against the earlier version of this patch.
      Results show about 2-6% improvement in the average number of
      transactions per second.  Here is the summary of his tests:
      
                              2.5.42-mm2      2.5.42-mm2-ipclock
                              ----------------------------------
      Average over 5 runs     85.0 BT         89.8 BT
      Std Deviation 5 runs     7.4 BT          1.0 BT

      Average over 4 best     88.15 BT        90.2 BT
      Std Deviation 4 best     2.8 BT          0.5 BT
      
      
      Also, another test today from Bill Hartner:
      
      I tested Mingming's RCU ipc lock patch using a *new* microbenchmark -
      semopbench, written to test the performance of Mingming's patch.
      I also ran a 3-hour stress test and it completed successfully.
      
      Explanation of the microbenchmark is below the results.
      Here is a link to the microbenchmark source.
      
      http://www-124.ibm.com/developerworks/opensource/linuxperf/semopbench/semopbench.c
      
      SUT: 8-way 700 MHz PIII
      
      I tested 2.5.44-mm2 and 2.5.44-mm2 + RCU ipc patch
      
      >semopbench -g 64 -s 16 -n 16384 -r > sem.results.out
      >readprofile -m /boot/System.map | sort -n +0 -r > sem.profile.out
      
      The metric is seconds per repetition.  Lower is better.
      
      kernel              run 1     run 2
                          seconds   seconds
      ==================  =======   =======
      2.5.44-mm2          515.1       515.4
      2.5.44-mm2+rcu-ipc   46.7        46.7
      
      With Mingming's patch, the test completes 10X faster.
    • [PATCH] tmpfs support for remap_file_pages · 0a4b1945
      Andrew Morton authored
      From Hugh
      
      Instate Ingo's shmem_populate on top of the previous patches, now using
      shmem_getpage(,,,SGP_QUICK) for the nonblocking case (its find_lock_page
      may block, but rarely for long).  Note install_page will need redefining
      if PAGE_CACHE_SIZE departs from PAGE_SIZE; note pgoff to populate must
      be in terms of PAGE_SIZE; note page_cache_release if install_page fails.
      
      filemap_populate similarly needs page_cache_release when install_page
      fails, but filemap.c not included in this patch since we started out
      from 2.5.43 rather than 2.5.43-mm2: whereas patches 1-8 could go
      directly to 2.5.43, this 9/9 belongs with Ingo's population work.
    • [PATCH] sys_remap_file_pages · d16dc20c
      Andrew Morton authored
      Ingo's remap_file_pages patch.  Supported on ia32, x86-64, sparc
      and sparc64.  Others will need to update mman.h and the syscall
      tables.
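
      For flavor, a hedged userspace sketch of the call (five-argument form;
      a libc wrapper is assumed here - in practice this would go through
      syscall(2) until libc catches up):

      	#include <sys/mman.h>
      	#include <unistd.h>

      	/* rearrange which file pages back an existing MAP_SHARED mapping,
      	   without any munmap()/mmap() churn (error handling omitted) */
      	void shuffle(int fd)
      	{
      		size_t pg = getpagesize();
      		char *win = mmap(NULL, 4 * pg, PROT_READ, MAP_SHARED, fd, 0);
      		/* make the first window page show file page 3 instead of 0: */
      		remap_file_pages(win, pg, 0, 3, 0);
      	}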
    • [PATCH] strip pagecache from to-be-reaped inodes · f9a316fa
      Andrew Morton authored
      With large highmem machines and many small cached files it is possible
      to encounter ZONE_NORMAL allocation failures.  This can be demonstrated
      with a large number of one-byte files on a 7G machine.
      
      All lowmem is filled with icache and all those inodes have a small
      amount of highmem pagecache which makes them unfreeable.
      
      The patch strips the pagecache from inodes as they come off the tail of
      the inode_unused list.
      
      I play tricks in there, peeking at the head of the inode_unused list to
      pick up the inode again after running iput().  The alternatives seemed
      to involve more widespread changes, or running invalidate_inode_pages()
      under inode_lock, which would be a bad thing from a scheduling latency
      and lock contention point of view.
    • [PATCH] exempt swapcache pages from "use once" handling · 1bbb1949
      Andrew Morton authored
      The kernel will presently reclaim swapcache pages as they come off the
      tail of the inactive list even if they are referenced.  That's the
      "use-once" pagecache path and shouldn't be applied to swapcache pages.
      
      This affects very few pages in practice because all those pages tend to
      be mapped into pagetables anyway.
    • [PATCH] empty the deferred lru-addition buffers in swapin_readahead · e550cf78
      Andrew Morton authored
      If we're about to return to userspace after performing some swap
      readahead, the pages in the deferred-addition LRU queues could stay
      there for some time.  So drain them after performing readahead.
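
      Presumably this reduces to one call at the tail of swapin_readahead()
      (lru_add_drain() being the existing helper that spills the per-cpu
      deferred-addition batches):

      	void swapin_readahead(swp_entry_t entry)
      	{
      		/* ... read in up to 1 << page_cluster pages around 'entry' ... */
      		lru_add_drain();	/* don't leave readahead pages parked off-LRU */
      	}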
    • [PATCH] start anon pages on the active list (properly this time) · 33709b5c
      Andrew Morton authored
      Use lru_cache_add_active() to ensure that pages which are, or will be,
      mapped into pagetables start out on the active list.
    • [PATCH] lru_add_active(): for starting pages on the active list · 228c3d15
      Andrew Morton authored
      This is the first in a series of patches which tune up the 2.5
      performance under heavy swap loads.
      
      Throughput on stupid swapstormy tests is increased by 1.5x to 3x.
      Still about 20% behind 2.4 with multithreaded tests.  That is not
      easily fixable - the virtual scan tends to apply a form of load
      control: particular processes are heavily swapped out so the others can
      get ahead.  With 2.5 all processes make very even progress and much
      more swapping is needed.  It's on par with 2.4 for single-process
      swapstorms.
      
      
      In this patch:
      
      The code which tries to start mapped pages out on the active list
      doesn't work very well.  It uses an "is it mapped into pagetables"
      test, which doesn't work for, say, swap readahead pages.  They are not
      mapped into pagetables when they are spilled onto the LRU.
      
      So create a new `lru_cache_add_active()' function for deferred addition
      of pages to their active list.
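
      Plausibly it mirrors lru_cache_add(), batching through a per-cpu pagevec
      (a sketch; the pagevec and helper names are assumptions):

      	void lru_cache_add_active(struct page *page)
      	{
      		struct pagevec *pvec = &lru_add_active_pvecs[get_cpu()];

      		page_cache_get(page);
      		if (!pagevec_add(pvec, page))		/* batch full: spill it */
      			__pagevec_lru_add_active(pvec);
      		put_cpu();
      	}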
      
      Also move mark_page_accessed() from filemap.c to swap.c where all
      similar functions live.  And teach it to not try to move pages which
      are in the deferred-addition list onto the active list.  That won't
      work, and it's bogusly clearing PageReferenced in that case.
      
      The deferred-addition lists are a pest.  But lru_cache_add used to be
      really expensive in some workloads on some machines.  Must persist.
    • [PATCH] flush_dcache_page in get_user_pages() · e735f278
      Andrew Morton authored
      Davem said:
      
      "Ho hum, it is tricky :-)))
      
       At bio_map_user() you need to see the user's most recent write to the
       page if you are going "user --> device".  So if "user --> device"
       bio_map_user() must flush_dcache_page().
      
       I find the write_to_vm condition confusing, which is probably why I am
       sitting here spelling this out :-)
      
       At bio_unmap_user(), if we are going "device --> user" you have to
       flush_dcache_page().  And actually, this flush could just as
       legitimately occur at bio_map_user() time.
      
       Therefore, the easiest thing to do is always flush_dcache_page() at
       bio_map_user().
      
       All the other cases are going to be like this, so we might as well
       cut to the chase and flush_dcache_page() for all the pages inside of
       get_user_pages()."
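
      So the change presumably boils down to an extra line in the
      page-gathering loop (a sketch; get_page_map() as the lookup helper is
      an assumption):

      	/* in get_user_pages(), for each page handed back to the caller: */
      	if (pages) {
      		pages[i] = get_page_map(map);
      		flush_dcache_page(pages[i]);	/* unconditional, covers both directions */
      	}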
    • [PATCH] uninline some things in mm/*.c · 79425084
      Andrew Morton authored
      Tuned for gcc-2.95.3:
      
      	filemap.c:	10815 -> 10046
      	highmem.c:	3392 -> 3104
      	mmap.c:		5998 -> 5854
      	mremap.c:	3058 -> 2802
      	msync.c:	1521 -> 1489
      	page_alloc.c:	8487 -> 8167
    • [PATCH] speedup heuristic for get_unmapped_area · 631709da
      Andrew Morton authored
      [I was going to send shared pagetables today, but it failed in
       my testing under X :( ]
      
      the first one is an mmap inefficiency that was reported by Saurabh Desai.
      The test_str02 NPTL test-utility does the following: it tests the maximum
      number of threads by creating a new thread, each of which creates a new
      thread itself, etc. It basically creates thousands of parallel threads,
      which means thousands of thread stacks.
      
      NPTL uses mmap() to allocate new default thread stacks - and POSIX
      requires us to install a 'guard page' as well, which is done via
      mprotect(PROT_NONE) on the first page of the stack. This means that tons
      of NPTL threads means 2* tons of vmas per MM, all allocated in a forward
      fashion starting at the virtual address of 1 GB (TASK_UNMAPPED_BASE).
      
      Saurabh reported a slowdown after the first couple of thousands of
      threads, which i can reproduce as well. The reason for this slowdown is
      the get_unmapped_area() implementation, which tries to achieve the most
      compact virtual memory allocation, by searching for the vma at
      TASK_UNMAPPED_BASE, and then linearly searching for a hole. With thousands
      of linearly allocated vmas this is an increasingly painful thing to do ...
      
      obviously, high-performance threaded applications will create stacks
      without the guard page, which triggers the anon-vma merging code so we end
      up with one large vma, not tons of small vmas.
      
      it's also possible for userspace to be smarter by setting aside a stack
      space and keeping a bitmap of allocated stacks and using MAP_FIXED (this
      also enables it to do the guard page not via mprotect() but by keeping the
      stacks apart by 1 page - ie. half the number of vmas) - but this also
      decreases flexibility.
      
      So i think that the default behavior nevertheless makes sense as well, so
      IMO we should optimize it in the kernel.
      
      there are various solutions to this problem, none of which solve the
      problem in a 100% sufficient way, so i went for the simplest approach: i
      added code to cache the 'last known hole' address in mm->free_area_cache,
      which is used as a hint to get_unmapped_area().
      
      this fixed the test_str02 testcase wonderfully, thread creation
      performance for this testcase is O(1) again, but this simpler solution
      obviously has a number of weak spots, and the (unlikely but possible)
      worst-case is quite close to the current situation. In any case, this
      approach does not sacrifice the perfect VM compactness our mmap()
      implementation achieves, so it's a performance optimization with no
      externally visible consequences.
      
      The most generic and still perfectly-compact VM allocation solution would
      be to have a vma tree for the 'inverse virtual memory space', ie. a tree
      of free virtual memory ranges, which could be searched and iterated like
      the space of allocated vmas. I think we could do this by extending vmas,
      but the drawback is larger vmas. This does not save us from having to scan
      vmas linearly still, because the size constraint is still present, but at
      least most of the anon-mmap activities are constant sized. (both malloc()
      and the thread-stack allocator use mostly fixed sizes.)
      
      This patch contains some fixes from Dave Miller - on some architectures
      it is not possible to evaluate TASK_UNMAPPED_BASE at compile-time.
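
      The hint itself is small - roughly this shape inside
      arch_get_unmapped_area() (a sketch from the description; the exact
      bounds checks are assumptions):

      	addr = mm->free_area_cache;	/* start at the last known hole */
      	for (vma = find_vma(mm, addr); ; vma = vma->vm_next) {
      		if (TASK_SIZE - len < addr)
      			return -ENOMEM;
      		if (!vma || addr + len <= vma->vm_start) {
      			mm->free_area_cache = addr + len;	/* remember for next time */
      			return addr;
      		}
      		addr = vma->vm_end;
      	}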
    • [PATCH] Orlov block allocator for ext2 · b2205dc0
      Andrew Morton authored
      This is Al's implementation of the Orlov block allocator for ext2.
      
      At least doubles the throughput for the traverse-a-kernel-tree
      test and is well tested.
      
      I still need to do the ext3 version.
      
      No effort has been put into tuning it at this time, so more gains
      are probably possible.
    • Merge bk://ldm.bkbits.net/linux-2.5-kobject · 4856e09e
      Linus Torvalds authored
      into home.transmeta.com:/home/torvalds/v2.5/linux