1. 15 Sep, 2002 6 commits
    • [PATCH] fix reverse map accounting leak · 05d9bac3
      Andrew Morton authored
      From Hugh Dickins.  Fix a leak in the /proc/meminfo:ReverseMaps
      accounting.
    • [PATCH] hugetlb pages · c9d3808f
      Andrew Morton authored
      Rohit Seth's ia32 huge tlb pages patch.
      
      Anton Blanchard took a look at this today; he seemed happy
      with it and said he could borrow bits.
    • [PATCH] resurrect /proc/meminfo:Buffers · fca174cc
      Andrew Morton authored
      The /proc/meminfo:Buffers statistic is quite useful - it tells us
      how effective we are being at caching filesystem metadata.
      
      For example, increases in this figure are a measure of success of the
      slablru and buffer_head-limitation patches.
      
      The patch resurrects buffermem accounting.  The metric is calculated
      on-demand, via a walk of the blockdev hashtable.
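      A minimal user-space sketch of the on-demand approach: instead of updating a counter on every page addition and removal, the total is computed only when it is read, by walking every hash chain. The `bdev_entry` structure and `sum_blockdev_pages()` are hypothetical names, not the patch's code, and the locking the real walk would need is omitted.

```c
#include <stddef.h>

/* Hypothetical, simplified block-device entry: only what the walk needs. */
struct bdev_entry {
    unsigned long nrpages;   /* pages cached in this device's mapping */
    struct bdev_entry *next; /* hash-chain link */
};

/* Compute the statistic on demand by visiting every entry in every
 * bucket, rather than maintaining a running counter. Locking omitted. */
static unsigned long sum_blockdev_pages(struct bdev_entry *const *buckets,
                                        size_t nbuckets)
{
    unsigned long total = 0;

    for (size_t i = 0; i < nbuckets; i++)
        for (const struct bdev_entry *e = buckets[i]; e != NULL; e = e->next)
            total += e->nrpages;
    return total;
}
```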
    • [PATCH] low-latency zap_page_range · e572ef2e
      Andrew Morton authored
      zap_page_range and truncate are the two main latency problems
      in the VM/VFS.  The radix-tree-based truncate grinds that into
      the dust, but no algorithmic fixes for pagetable takedown have
      presented themselves...
      
      Patch from Robert Love.
      
      Attached patch implements a low latency version of "zap_page_range()".
      
      Calls with even moderately large page ranges result in very long lock
      held times and consequently very long periods of non-preemptibility.
      This function is in my list of the top 3 worst offenders.  It is gross.
      
      This new version reimplements zap_page_range() as a loop over
      ZAP_BLOCK_SIZE chunks.  After each iteration, if a reschedule is
      pending, we drop page_table_lock and automagically preempt.  Note we cannot
      blindly drop the locks and reschedule (e.g. for the non-preempt
      case) since this codepath may be entered with other locks
      held.
      
      ... I am sure you are familiar with all this; it's the same deal as your
      low-latency work.  This patch implements the "cond_resched_lock()" as we
      discussed some time back.  I think this solution should be acceptable to
      you and Linus.
      
      There are other misc. cleanups, too.
      
      This new zap_page_range() yields latency too-low-to-benchmark: <<1ms.
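      The chunked loop described above can be sketched in user space as follows. This is an illustration under assumptions, not Robert Love's patch: the ZAP_BLOCK_SIZE value, the `resched_pending` callback standing in for need_resched(), and the `zap_result` counters are all hypothetical, and the page_table_lock handling is only marked by comments.

```c
/* Hypothetical chunk size: the commit names ZAP_BLOCK_SIZE but not its value. */
#define ZAP_BLOCK_SIZE (8UL * 4096UL)

struct zap_result {
    unsigned long chunks;     /* ZAP_BLOCK_SIZE-sized pieces processed */
    unsigned long lock_drops; /* times the lock would be released to reschedule */
};

/* Stand-in for "is a reschedule pending?" (need_resched() in the kernel). */
typedef int (*resched_pending_fn)(unsigned long chunks_done);

static struct zap_result zap_range_chunked(unsigned long start, unsigned long size,
                                           resched_pending_fn resched_pending)
{
    struct zap_result r = {0, 0};
    unsigned long end = start + size;

    /* take page_table_lock here (elided) */
    while (start < end) {
        unsigned long block = end - start;

        if (block > ZAP_BLOCK_SIZE)
            block = ZAP_BLOCK_SIZE;
        /* unmap the pages in [start, start + block) here (elided) */
        start += block;
        r.chunks++;
        /* cond_resched_lock()-style step: between chunks, drop and retake
         * the lock only if a reschedule is actually pending. */
        if (start < end && resched_pending(r.chunks))
            r.lock_drops++;
    }
    /* release page_table_lock here (elided) */
    return r;
}

static int every_other_chunk(unsigned long chunks_done) { return chunks_done % 2 == 1; }
static int never(unsigned long chunks_done) { (void)chunks_done; return 0; }
```

      Dropping the lock only between chunks, and only when a reschedule is pending, bounds the lock hold time to one chunk's worth of work without paying for an unlock/relock on every iteration.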
    • Linux v2.5.35 · 697f3abe
      Linus Torvalds authored
    • Merge bk://ppc.bkbits.net/for-linus-ppc · 11a5dbb4
      Linus Torvalds authored
      into home.transmeta.com:/home/torvalds/v2.5/linux
  2. 16 Sep, 2002 10 commits
  3. 15 Sep, 2002 10 commits
    • [PATCH] thread exec fix, BK-curr · 71ee22d3
      Ingo Molnar authored
      The broadcast SIGKILL remained pending in the new thread as well, and killed
      it prematurely ...
    • 9325c684
      Linus Torvalds authored
    • [PATCH] thread-exec-2.5.34-B1, BK-curr · 63540cea
      Ingo Molnar authored
      This implements one of the last missing POSIX threading details - exec()
      semantics.  Previous kernels had code that tried to handle it, but that
      code had a number of disadvantages:
      
       - it only worked if the exec()-ing thread was the thread group leader,
         creating an asymmetry. This does not work if the thread group leader
         has exited already.
      
       - it was racy: it sent a SIGKILL to every thread in the group but did not
         wait for them to actually process the SIGKILL. It did a yield() but
         that is not enough. All 'other' threads have to finish processing
         before we can continue with the exec().
      
      This adds the same logic, but extended with the following enhancements:
      
       - works from non-leader threads just as much as the thread group leader.
      
       - waits for all other threads to exit before continuing with the exec().
      
       - reuses the PID of the group.
      
      It would perhaps be a more generic approach to add a new syscall,
      sys_ungroup() - which would do largely what de_thread() does in this
      patch.
      
      But it's not really needed now - posix_spawn() is currently implemented
      via starting a non-CLONE_THREAD helper thread that does a sys_exec().
      There's no API currently that needs a direct exec() from a thread - but
      it could be created (such as pthread_exec_np()).  It would have the
      advantage of not having to go through a helper thread, but the
      difference is minimal.
    • [PATCH] exit-fix-2.5.34-C0, BK-curr · 7cd0a691
      Ingo Molnar authored
      This fixes one more exit-time resource accounting issue - and it's also
      a speedup and a thread-tree (to-be thread-aware pstree) visual
      improvement.
      
      In the current code we reparent detached threads to the init thread.
      This worked but was not very nice in ps output: threads showed up as
      being related to init.  There was also a resource-accounting issue: upon
      exit they updated their parent's (i.e. init's) rusage fields,
      effectively losing these statistics.  E.g. 'time' under-reports CPU
      usage if the threaded app is Ctrl-C-ed prematurely.
      
      The solution is to reparent threads to the group leader - this is now
      very easy since we have p->group_leader cached and it's also valid all
      the time.  It's also somewhat faster for applications that use
      CLONE_THREAD but do not use the CLONE_DETACHED feature.
    • [PATCH] wait4-fix-2.5.34-B2, BK-curr · 975639b1
      Ingo Molnar authored
      This fixes a number of bugs that broke ptrace:
      
       - wait4 must not inhibit TASK_STOPPED processes even for thread group
         leaders.
      
       - do_notify_parent() should not delay the notification of parents if
         the thread in question is ptraced.
      
      strace now works as expected for CLONE_THREAD applications as well.
    • [PATCH] exit-thread-2.5.34-A0, BK-curr · a8194b4e
      Ingo Molnar authored
      This optimizes sys_exit_group() to only take the siglock if it's a true
      thread group.  Boots & works fine.
    • [PATCH] detached-fix-2.5.34-A0, BK-curr · 292c2c8d
      Ingo Molnar authored
      This fixes three resource accounting related bugs introduced by detached
      threads:
      
       - the 'child CPU usage' fields were updated in wait4 until now - this was
         slightly buggy for a number of reasons, e.g. if the exit_code writeout
         faults then it's possible to trigger this code multiple times.
      
       - those threads that do not go through wait4 were not properly accounted.
      
       - sched_exit() was incorrectly assuming that current == parent. In the
         detached case p->parent is the real parent.
      
      With this patch applied, things like 'time' work again for new-style
      threaded apps.
    • [PATCH] clone-fix-2.5.34-A0, BK-curr · 97600f56
      Ingo Molnar authored
      This fixes a clone-flags bug noticed by Roland McGrath.  The current
      CLONE_DETACHED & CLONE_THREAD forcing code did things in the wrong
      order, which makes it possible to force an oops the following way:
      
              main () { syscall(120, 0x00400000); }
      
      Instead of changing the order of CLONE_SIGHAND and CLONE_THREAD flag
      forcing (which would fix the bug), the proper approach is to fail with
      -EINVAL if invalid combinations of clone flags are detected.  This
      change does not affect existing applications.
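      A sketch of the "fail with -EINVAL" approach described above, as a plain C function. The flag values follow the Linux clone ABI (0x00400000 is CLONE_DETACHED, the value passed in the reproducer). The two rules checked here, that CLONE_THREAD requires CLONE_SIGHAND and that CLONE_DETACHED requires CLONE_THREAD, are an assumption about which combinations the patch rejects, and `check_clone_flags()` is a hypothetical name.

```c
#include <errno.h>

/* Linux clone flag values, guarded in case <sched.h> already defines them. */
#ifndef CLONE_VM
#define CLONE_VM       0x00000100
#endif
#ifndef CLONE_SIGHAND
#define CLONE_SIGHAND  0x00000800
#endif
#ifndef CLONE_THREAD
#define CLONE_THREAD   0x00010000
#endif
#ifndef CLONE_DETACHED
#define CLONE_DETACHED 0x00400000
#endif

/* Reject inconsistent combinations up front instead of silently
 * "forcing" extra flags on (assumed rule set, see lead-in). */
static int check_clone_flags(unsigned long flags)
{
    /* A thread group must share signal handlers. */
    if ((flags & CLONE_THREAD) && !(flags & CLONE_SIGHAND))
        return -EINVAL;
    /* Detached threads only make sense inside a thread group. */
    if ((flags & CLONE_DETACHED) && !(flags & CLONE_THREAD))
        return -EINVAL;
    return 0;
}
```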
    • [PATCH] wait4-fix-2.5.34-A0, BK-curr · a969214c
      Ingo Molnar authored
      The attached patch (against BK-curr) fixes a sys_wait4() bug noticed by
      Ulrich Drepper. The kernel would not block properly if there were eligible
      children delayed due to the new delayed thread-group-leader logic. The
      solution is to introduce a new type of 'eligible child' - and skip
      over delayed children but set the wait4 flag nevertheless.

      The libpthreads testcase that failed due to this bug now works fine.
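      A hedged sketch of the scan logic described above, reduced to an array of child states. The point is the fix: a delayed child is skipped, but it still marks the caller as having an eligible child, so the caller blocks instead of returning -ECHILD. The enum, the WAIT_WOULD_BLOCK sentinel and `wait_scan()` are hypothetical and not taken from the kernel.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical child states for this sketch. */
enum child_state {
    CHILD_RUNNING,        /* eligible, has not exited yet */
    CHILD_ZOMBIE,         /* exited and reapable now */
    CHILD_DELAYED_ZOMBIE  /* exited group leader whose reaping is delayed */
};

#define WAIT_WOULD_BLOCK (-1000) /* hypothetical: caller should sleep and retry */

/* Returns the index of a reapable child, WAIT_WOULD_BLOCK if some eligible
 * child exists but none can be reaped yet, or -ECHILD if there is nothing
 * to wait for at all. */
static int wait_scan(const enum child_state *kids, size_t n)
{
    int flag = 0; /* "an eligible child exists" */

    for (size_t i = 0; i < n; i++) {
        switch (kids[i]) {
        case CHILD_ZOMBIE:
            return (int)i;
        case CHILD_DELAYED_ZOMBIE:
            /* Skip it, but still count it as eligible (the fix). */
            flag = 1;
            break;
        case CHILD_RUNNING:
            flag = 1;
            break;
        }
    }
    return flag ? WAIT_WOULD_BLOCK : -ECHILD;
}
```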
    • Merge samba.org:/home/paulus/kernel/linux-2.5 · 81803bc1
      Paul Mackerras authored
      into samba.org:/home/paulus/kernel/for-linus-ppc
  4. 14 Sep, 2002 10 commits
    • Merge samba.org:/home/paulus/kernel/linux-2.5 · 48f97bf4
      Paul Mackerras authored
      into samba.org:/home/paulus/kernel/for-linus-ppc
    • Make sure MTRR setting is atomic on SMP, since · c7ce0140
      Linus Torvalds authored
       - HT CPUs can share the MTRR state between cores
       - the code uses static variables that are shared
    • Merge master.kernel.org:/home/acme/BK/llc-2.5 · 0ef01f36
      Linus Torvalds authored
      into home.transmeta.com:/home/torvalds/v2.5/linux
    • [PATCH] hide-threads-2.5.34-C1 · a5d2bf7b
      Ingo Molnar authored
      I fixed up the 'remove thread group inferiors from the tasklist' patch. I
      think I managed to find a reasonably good construct to iterate over all
      threads:
      
      	do_each_thread(g, p) {
      		...
      	} while_each_thread(g, p);
      
      The only caveat with this is that the construct suggests a single loop -
      while it's two loops internally - and 'break' will not work. I added a
      comment to sched.h that warns about this, but perhaps it would help more
      to have naming that suggests two loops:
      
      	for_each_process_do_each_thread(g, p) {
      		...
      	} while_each_thread(g, p);
      
      but this looks a bit too long. I don't know. We might as well use it all
      unrolled and no helper macros - although with the above construct it's
      pretty straightforward to iterate over all threads in the system.
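      A user-space sketch of how two nested loops can hide behind a do/while-looking pair of macros, and why 'break' misbehaves. The three-argument macros, the sentinel list head and the simplified `struct task` are assumptions for illustration; the macros described above take two arguments and walk the real task list.

```c
#include <stddef.h>

/* Hypothetical, simplified task structure (not the kernel's task_struct). */
struct task {
    int pid;
    struct task *next_process;  /* ring of group leaders through 'head' */
    struct task *next_in_group; /* ring of the threads of one group */
};

/* Outer for loop over processes, inner do/while over each one's threads. */
#define do_each_thread(head, g, t) \
    for ((g) = (t) = (head); ((g) = (t) = (g)->next_process) != (head); ) do

#define while_each_thread(g, t) \
    while (((t) = (t)->next_in_group) != (g))

static int count_threads(struct task *head)
{
    struct task *g, *t;
    int n = 0;

    do_each_thread(head, g, t) {
        n++;
    } while_each_thread(g, t);
    return n;
}

/* 'break' leaves only the inner do/while; the outer loop keeps going. */
static int count_with_break(struct task *head)
{
    struct task *g, *t;
    int n = 0;

    do_each_thread(head, g, t) {
        n++;
        break;
    } while_each_thread(g, t);
    return n;
}

/* Demo list: one process with threads 1, 2, 3 and one with thread 10. */
static struct task *build_demo(void)
{
    static struct task head, a1, a2, a3, b1;

    head = (struct task){0, &a1, &head};
    a1 = (struct task){1, &b1, &a2};
    a2 = (struct task){2, NULL, &a3};
    a3 = (struct task){3, NULL, &a1};
    b1 = (struct task){10, &head, &b1};
    return &head;
}
```

      On the demo list, count_threads() visits all 4 threads, while count_with_break() returns 2 rather than 1: each 'break' ends only the current process's thread loop, which is the caveat the sched.h comment warns about.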
    • [PATCH] 2.5.34-bk fcntl lockup · 8fd85682
      Petr Vandrovec authored
      This fixes an endless loop, which never calls schedule(), that happens as
      soon as smbd invokes fcntl64(7, F_SETLK64, ...).  fcntl_setlk64 gets cmd
      F_SETLK64, not the F_SETLK that is tested in the loop.

      Maybe the return value from posix_lock_file should be changed to -EINPROGRESS
      or -EJUKEBOX instead of testing the passed cmd in callers, but this one-liner
      works too. If you prefer changing the posix_lock_file return value to clearly
      distinguish between -EAGAIN and a queued lock request, I'll do that.
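      A sketch of the loop shape the fix is about. The command codes, `try_lock_conflict()` and the `max_waits` safety valve are hypothetical; the point is that a non-blocking request has to leave the loop on -EAGAIN, so the command test must recognize the 64-bit variant too. In this sketch a single condition accepts both codes; that is not necessarily how the one-liner is written.

```c
#include <errno.h>

/* Illustrative command codes only; real values come from <fcntl.h>. */
enum lock_cmd { CMD_SETLK, CMD_SETLKW, CMD_SETLK64, CMD_SETLKW64 };

/* Hypothetical stand-in for posix_lock_file() when the lock conflicts. */
static int try_lock_conflict(void)
{
    return -EAGAIN;
}

/* Returns how many times the loop would have waited before giving up.
 * 'max_waits' is a sketch-only safety valve; the kernel loop instead
 * sleeps until the conflicting lock goes away. */
static int setlk_waits(enum lock_cmd cmd, int max_waits)
{
    int waits = 0;

    for (;;) {
        int error = try_lock_conflict();

        /* Non-blocking requests must give up on -EAGAIN. Testing only the
         * F_SETLK code here is the bug: F_SETLK64 never leaves the loop. */
        if (error != -EAGAIN || cmd == CMD_SETLK || cmd == CMD_SETLK64)
            break;
        if (++waits >= max_waits)
            break;
        /* Blocking request: wait for the conflicting lock, then retry. */
    }
    return waits;
}
```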
    • [PATCH] signal failures in nightly LTP test · bbd9f14c
      Ingo Molnar authored
      On 13 Sep 2002, Paul Larson wrote:
      >
      > The nightly LTP test against the 2.5 kernel bk tree last night turned up
      > some test failures we don't normally see.  These failures did not show
      > up in the run from the previous night.
      
      [...]
      > I found what was breaking this, looks like it was this change from your
      > shared thread signals patch:
      > -	if (sig < 1 || sig > _NSIG ||
      > -	    (act && (sig == SIGKILL || sig == SIGSTOP)))
      > +	if (sig < 1 || sig > _NSIG || (act && sig_kernel_only(sig)))
      
      This fixes this bug and a number of others in the same class - the
      signal behavior bitmasks should never be consulted before making sure
      that the signal is in the word range.
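      A minimal sketch of the ordering rule: check that the signal number is a valid bit index before consulting a one-word bitmask, because shifting `1UL` by the word width or more is undefined in C. `sig_kernel_only_checked()` and `KERNEL_ONLY_MASK` are hypothetical names, and using the width of `unsigned long` as the bound is an assumption of this sketch.

```c
#include <limits.h>
#include <signal.h>

/* Signals whose disposition user space may not change. */
#define KERNEL_ONLY_MASK ((1UL << SIGKILL) | (1UL << SIGSTOP))
#define BITS_PER_WORD ((int)(sizeof(unsigned long) * CHAR_BIT))

/* Range check first, bitmask second: never shift by an out-of-range count. */
static int sig_kernel_only_checked(int sig)
{
    if (sig < 0 || sig >= BITS_PER_WORD)
        return 0;
    return (KERNEL_ONLY_MASK & (1UL << sig)) != 0;
}
```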
    • [PATCH] thread exit deadlock bug · eda4d244
      Ingo Molnar authored
      This fixes the Mozilla SMP lockup in the exit path.
    • [PATCH] PATCH - cset 1.497.59.25 breaks MD autodetect · e335a273
      Neil Brown authored
      The partition changes shifted a lot of indexes down one, but this one
      shouldn't have been shifted...
    • Merge au1.ibm.com:/home/paulus/kernel/linux-2.5 · bb5eec4a
      Paul Mackerras authored
      into au1.ibm.com:/home/paulus/kernel/for-linus-ppc
    • [LLC] remove all tmr ev structs & fix psnap and p8022 wrt ui sending · ad2bce43
      Arnaldo Carvalho de Melo authored
      . No need for the timer_running member on llc_timer,
        we only need it in one place, and timer_pending is
        equivalent. One more procom OS generalisation killed.
      . Move the skb->protocol assignment in llc_build_and_send_pkt
        routines and llc_ui_send_data to the caller, this is the common
        practice in Linux networking code (think netif_rx) and required
        to keep the request functions in psnap and p8022 simple.
      . Remove the rpt_status (report status) ev members, not
        used at all, not even in the original procom code.
      . Convert psnap and p8022 request functions to use
        llc_ui_build_and_send_ui_pkt, removing all the prim cruft.
  5. 13 Sep, 2002 4 commits
    • [PATCH] Use a sync iocb for generic_file_read · acf7aa2c
      Andrew Morton authored
      This adds support for synchronous iocbs and converts generic_file_read
      to use a sync iocb to call into generic_file_aio_read.
      
      The tests I've run with lmbench on a piii-866 showed no difference in
      file re-read speed when forced to use a completion path via aio_complete
      and an -EIOCBQUEUED return from generic_file_aio_read -- people with
      slower machines might want to test this to see if we can tune it any
      better.  Also included is a fix for a missing call into the aio code
      from the fork code.  This patch sets things up for making
      generic_file_aio_read actually asynchronous.
    • [PATCH] readv/writev speedup · a83638a4
      Andrew Morton authored
      This is Janet Morgan's patch which converts the readv/writev code
      to submit all segments for IO before waiting on them, rather than
      submitting each segment separately.
      
      This is a critical performance fix for O_DIRECT reads and writes.
      Prior to this change, O_DIRECT vectored IO was forced to wait for
      completion against each segment of the iovec rather than submitting all
      segments and waiting on the lot.  E.g. for ten segments, this code will
      be ten times faster.
      
      There will also be moderate improvements for buffered IO - smaller code
      paths, plus writev() only takes i_sem once.
      
      The patch ended up quite large unfortunately - turned out that the only
      sane way to implement this without duplicating significant amounts of
      code (the generic_file_write() bounds checking, all the O_DIRECT
      handling, etc) was to redo generic_file_read() and generic_file_write()
      to take an iovec/nr_segs pair rather than `buf, count'.
      
      New exported functions generic_file_readv() and generic_file_writev()
      have been added:
      
      ssize_t generic_file_readv(struct file *filp, const struct iovec *iov,
                                unsigned long nr_segs, loff_t *ppos);
      ssize_t generic_file_writev(struct file *file, const struct iovec *iov,
                                unsigned long nr_segs, loff_t *ppos);
      
      If a driver does not use these in their file_operations then they will
      continue to use the old readv/writev code, which sits in a loop calling
      fops->read() or fops->write().
      
      ext2, ext3, JFS and the blockdev driver are currently using this
      capability.
      
      Some coding cleanups were made in fs/read_write.c.  Mainly:
      
      - pass "READ" or "WRITE" around to indicate the direction of the
        operation, rather than the (confusing, inverted)
        VERIFY_READ/VERIFY_WRITE.
      
      - Use the identifier `nr_segs' everywhere to indicate the iovec
        length rather than `count', which is often used to indicate the
        number of bytes in the syscall.  It was confusing the heck out of me.
      
      - Some cleanups to the raw driver.
      
      - Some additional generality in fs/direct_io.c: the core `struct dio'
        used to be a "populate-and-go" thing.  Janet has broken that up so
        you can initialise a struct dio once, then loop around feeding it
        more file segments, then wait on completion against everything.
      
      - In a couple of places we needed to handle the situation where we
        knew, a-priori, that the user was going to get a short read or write.
        File size limit exceeded, read past i_size, etc.  We handled that by
        shortening the iovec in-place with iov_shorten().  Which is not
        particularly pretty, but neither were the alternatives.
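      A sketch of in-place iovec shortening as the last bullet describes it: keep segments until the byte limit is reached, trim the last kept segment, and return the new segment count. `iov_shorten_sketch()` is a hypothetical name; this illustrates the technique and is not the patch's iov_shorten().

```c
#include <stddef.h>
#include <sys/uio.h>

/* Trim an iovec array in place so that it describes at most 'to' bytes.
 * Returns the new number of segments; the last kept segment's iov_len
 * is reduced if necessary. Segments past it are simply not counted. */
static unsigned long iov_shorten_sketch(struct iovec *iov, unsigned long nr_segs,
                                        size_t to)
{
    unsigned long seg = 0;
    size_t len = 0;

    while (seg < nr_segs) {
        seg++;
        if (len + iov->iov_len >= to) {
            iov->iov_len = to - len; /* cut this segment short */
            break;
        }
        len += iov->iov_len;
        iov++;
    }
    return seg;
}
```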
    • [PATCH] NMI watchdog SMP fix · d8fcce3f
      Ingo Molnar authored
      This makes NMIs work - otherwise they go to CPU 0 only and any hard
      lockup on the other CPUs will not be detected by the nmi_watchdog.
    • Merge master.kernel.org:/home/davem/BK/net-2.5 · d038b8c5
      Linus Torvalds authored
      into home.transmeta.com:/home/torvalds/v2.5/linux