Commits · 292ae1292c4f1f7c46a5dd0e67ef9741855d716b · Kirill Smelkov / linux

23 Aug, 2004 40 commits

[PATCH] ppc64: tweak schedule_timeout in __cpu_die · 292ae129

Nathan Lynch authored Aug 22, 2004

The current code does schedule_timeout(HZ) when waiting for a cpu to die,
which is a bit coarse and tends to limit the "throughput" of my stress
tests :)

Change the HZ timeout to HZ/5, increase the number of tries to 25 so the
overall wait time is similar.  In practice, I've never seen the loop need
more than two iterations.
Signed-off-by: Nathan Lynch <nathanl@austin.ibm.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
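
The arithmetic behind this change can be sketched in user space. This is a minimal illustration, not the kernel code: HZ is assumed to be 1000, sleeping is simulated with a tick counter, and wait_for_cpu_death, dies_on_nth_poll and worst_case_wait are hypothetical names.

```c
#include <stdbool.h>

#define HZ 1000 /* assumed tick rate, for illustration only */

/* Hypothetical predicate: returns true once the cpu has gone offline. */
typedef bool (*cpu_dead_fn)(void *ctx);

/*
 * Poll is_dead up to `tries` times, "sleeping" `interval` ticks after
 * each failed poll. The sleep is simulated by adding to a counter.
 * Returns the ticks waited on success, or -1 if every try failed.
 */
long wait_for_cpu_death(cpu_dead_fn is_dead, void *ctx,
			int tries, long interval)
{
	long waited = 0;

	for (int i = 0; i < tries; i++) {
		if (is_dead(ctx))
			return waited;
		waited += interval; /* stands in for schedule_timeout() */
	}
	return -1;
}

/* Example predicate: reports death on the Nth poll (N stored in ctx). */
bool dies_on_nth_poll(void *ctx)
{
	int *polls_left = ctx;

	return --(*polls_left) <= 0;
}

/* Worst-case wait budget in ticks. */
long worst_case_wait(int tries, long interval)
{
	return tries * interval;
}
```

With 25 tries of HZ/5 ticks the worst case is 5*HZ ticks, while a cpu that is dead by the second poll costs a single HZ/5 sleep instead of a full HZ.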


[PATCH] ppc64: switch screen_info init to C99 · 37cfbd31

Olof Johansson authored Aug 22, 2004

Minor cleanup: Use C99 initializers for the screen_info struct.
Signed-off-by: Olof Johansson <olof@austin.ibm.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


[PATCH] ppc64: bolted SLB entry for iSeries · d983cfb9

David Gibson authored Aug 22, 2004

Tested, at least basically, on Power4 iSeries with shared processors, on
Power4 pSeries and RS64 (non-SLB) iSeries machines.

On pSeries SLB machines we "bolt" an SLB entry for the first segment of the
vmalloc() area into the SLB, to reduce the SLB miss rate.  This caused
problems, so was disabled, on iSeries because the bolted entry was not
restored properly on shared processor switch.  This patch adds information
about the bolted vmalloc segment to the lpar map, which should be restored
on shared processor switch.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


[PATCH] ppc64: HVSI driver · 9024871d

Hollis Blanchard authored Aug 22, 2004

This is a console driver for IBM's p5 servers; please consider it for
inclusion.  I've addressed all the comments I've received so far.
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


[PATCH] ppc64: Fix v_regs pointer setup · 923bf111

Srivatsa Vaddagiri authored Aug 22, 2004

During signal testing, we found that the v_regs pointer was not set up
correctly.  v_regs was made to point to itself, as a result of which the
pointer was corrupted when the vector registers were copied over.  When the
signal handler returned, restore_sigcontext tried dereferencing the invalid
pointer and in the process killed the app with SIGSEGV.
Signed-off-by: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


[PATCH] ppc64: Reduce verbosity of RTAS error logs · 0cc8f1e0

Paul Mackerras authored Aug 22, 2004

Currently on pSeries systems the kernel will print out a hex dump of any
error events reported by the platform at boot time. These can be rather
large and are practically incomprehensible to humans. With this patch, the
kernel will by default print a 1-line summary for each error reported with
the severity, type, etc. printed as text strings. The old behaviour is
still available by using the rtasmsgs=on kernel command line option. The
patch also renames some RTAS-specific symbols to start with "RTAS_".
Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


[PATCH] ppc64 Fix unbalanced pci_dev_put in EEH code · 7c1645c5

Paul Mackerras authored Aug 22, 2004

The EEH code currently can end up doing an extra pci_dev_put() in the case
where we hot-unplug a card for which we are ignoring EEH errors (e.g.  a
graphics card).  This patch fixes that problem by only maintaining a
reference to the PCI device if we have entered any of its resource
addresses into our address -> PCI device cache.  This patch is based on an
earlier patch by Linas Vepstas.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


[PATCH] ppc64: log firmware errors during boot · 8a163c94

Paul Mackerras authored Aug 22, 2004

Firmware can report errors at any time, and not atypically during boot.
However, these reports were being discarded until rtasd came up, which
occurs fairly late in the boot cycle.  As a result, firmware errors during
boot were being silently ignored.
Signed-off-by: Linas Vepstas <linas@linas.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


[PATCH] ppc64: C99 initializers in INIT_THREAD · cab74470

David Gibson authored Aug 22, 2004

Fairly trivial PPC64 cleanup.  This patch makes the ppc64 INIT_THREAD
#define use C99 initializers, which will make it less likely to get broken
if we need to change thread_struct.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
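
Why designated initializers are less fragile can be shown with a small sketch. The struct below is a hypothetical stand-in, not the real thread_struct, and the field names and values are made up for illustration.

```c
/* Hypothetical stand-in for a structure such as thread_struct. */
struct demo_thread {
	unsigned long ksp;        /* kernel stack pointer */
	unsigned long fs;         /* address-space selector */
	unsigned int  fpexc_mode; /* floating-point exception mode */
};

/*
 * Designated form: every value names its field, so reordering fields
 * or inserting a new one in struct demo_thread cannot silently shift
 * a value into the wrong member. A positional initializer such as
 * { 0x1000UL, 1UL, 2U } would depend on declaration order.
 */
#define DEMO_INIT_THREAD { \
	.ksp = 0x1000UL, \
	.fs = 1UL, \
	.fpexc_mode = 2U, \
}

struct demo_thread make_demo_thread(void)
{
	struct demo_thread t = DEMO_INIT_THREAD;

	return t;
}
```

Members that are not named in the initializer are zero-initialized, so adding a field later needs no change to the macro unless the new field wants a non-zero default.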


[PATCH] ppc64: fix idle loop for offline cpu · b83f8c40

Paul Mackerras authored Aug 22, 2004

In the default_idle and dedicated_idle loops, there are some inner loops
out of which we should break if the cpu is marked offline.  Otherwise, it
is possible for the cpu to get stuck and never actually go offline.
shared_idle is unaffected.
Signed-off-by: Nathan Lynch <nathanl@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


[PATCH] ppc64: Don't call scheduler on offline cpu · 8861f2cb

Paul Mackerras authored Aug 22, 2004

When taking a cpu offline, once the cpu has been removed from
cpu_online_map, it is not supposed to service any more interrupts.  This
presents a problem on ppc64 because we cannot truly disable the
decrementer.  There used to be cpu_is_offline() checks in several scheduler
functions (e.g.  rebalance_tick()) which papered over this issue, but these
checks were removed recently.  So with recent 2.6 kernels, an attempt to
offline a cpu can result in a crash in find_busiest_group().  This patch
prevents such crashes.
Signed-off-by: Nathan Lynch <nathanl@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


[PATCH] ppc64: set tbl->it_type in iommu code · 7d2d3531

Paul Mackerras authored Aug 22, 2004

Here is a patch that sets struct iommu_table->it_type to TCE_PCI in
pSeries_iommu.c.  This is just for code completeness (and it is updated in
iSeries_iommu.c, but was somehow missed in pSeries_iommu.c).
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


[PATCH] ppc64: Fix oprofile error messages · cda707e2

Anton Blanchard authored Aug 22, 2004

Clean up an oprofile error message; it was missing a newline.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


[PATCH] ppc64: add missing EXPORT_SYMBOLS for oprofile · 23cbe308

Anton Blanchard authored Aug 22, 2004

Add some missing exports, required for oprofile to be compiled as a module.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


[PATCH] ppc64: allow oprofile module to be safely unloaded · 3c09bf53

Anton Blanchard authored Aug 22, 2004

Allow the oprofile module to be unloaded; previously we never removed the
oprofile-specific interrupt handler.  Handle the pending exception case in
the dummy interrupt handler instead.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


[PATCH] ppc64: disable oprofile debug messages · c52f92f1

Anton Blanchard authored Aug 22, 2004

Disable oprofile debug messages.  They aren't much use now that things are
working reliably.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


[PATCH] ppc64: POWER4 oprofile update · 619f1e75

Anton Blanchard authored Aug 22, 2004

POWER4 oprofile updates from Carl Love.

- Create mmcr0, mmcr1, mmcra oprofilefs files.
- Use kernel and user profile disable bits. (Some modifications by me)
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


[PATCH] ppc64: remove unnecessary cpu maps · f3abb77d

Paul Mackerras authored Aug 22, 2004

With cpu_present_map, we don't need these any longer.
Signed-off-by: Nathan Lynch <nathanl@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


[PATCH] ppc64: rework secondary SMT thread setup at boot · 8923632c

Paul Mackerras authored Aug 22, 2004

Our (ab)use of cpu_possible_map in setup_system to start secondary SMT
threads bothers me.  Mark such threads in cpu_possible_map during early
boot; let RTAS tell us which present cpus are still offline later so we can
start them.

I'm not totally sure about this one, it might be better to set up
cpu_sibling_map in prom_hold_cpus and use that in setup_system.
Signed-off-by: Nathan Lynch <nathanl@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


[PATCH] ppc64: use cpu_present_map in ppc64 · 686a8677

Paul Mackerras authored Aug 22, 2004

Adopt the "standard" cpu_present_map for describing cpus which are present
in the system, but not necessarily online.  cpu_present_map is meant to be
a superset of cpu_online_map and a subset of cpu_possible_map.
Signed-off-by: Nathan Lynch <nathanl@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
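
The superset/subset relationship between the three maps can be expressed as a bitmask check. This is a sketch under assumptions: a 64-bit integer stands in for the kernel's cpumask_t, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 64-cpu mask; the kernel uses cpumask_t instead. */
typedef uint64_t demo_cpumask;

/* True when every bit set in `sub` is also set in `super`. */
bool demo_mask_is_subset(demo_cpumask sub, demo_cpumask super)
{
	return (sub & ~super) == 0;
}

/* Checks that online is within present, and present within possible. */
bool demo_cpu_maps_consistent(demo_cpumask online, demo_cpumask present,
			      demo_cpumask possible)
{
	return demo_mask_is_subset(online, present) &&
	       demo_mask_is_subset(present, possible);
}
```

For example, online = 0x3, present = 0x7, possible = 0xF is consistent, while an online bit for a cpu that is not present (online = 0x9 with the same present mask) is not.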


[PATCH] ppc64: use platform numbering of cpus for hypervisor calls. · 04aff4ba

Paul Mackerras authored Aug 22, 2004

We were using Linux's cpu numbering for cpu-related hypervisor calls (e.g. 
vpa registration, H_CONFER).  It happened to work most of the time because
Linux and the hypervisor usually, but not always, have the same numbering
for cpus.
Signed-off-by: Nathan Lynch <nathanl@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
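
The kind of translation involved can be sketched as a table lookup done before the call. Everything here is hypothetical: the table contents, the names, and the stub that stands in for a hypervisor call; the real patch uses the kernel's own per-cpu data.

```c
#define DEMO_NR_CPUS 4

/*
 * Hypothetical table: the index is the Linux cpu number, the value is
 * the id the platform uses for the same cpu. The two need not match.
 */
static const unsigned int demo_hw_cpu_id[DEMO_NR_CPUS] = { 0, 2, 4, 6 };

/* Translate a Linux cpu number (must be < DEMO_NR_CPUS). */
unsigned int demo_get_hw_cpu_id(unsigned int linux_cpu)
{
	return demo_hw_cpu_id[linux_cpu];
}

/* Stub for a cpu-related hypervisor call; echoes the id it was given. */
unsigned int demo_hcall_confer(unsigned int hw_cpu)
{
	return hw_cpu;
}

/* Always translate before crossing into the hypervisor. */
unsigned int demo_confer_to(unsigned int linux_cpu)
{
	return demo_hcall_confer(demo_get_hw_cpu_id(linux_cpu));
}
```

Passing the Linux number straight through would only be right for cpu 0 in this table, which mirrors the "happened to work most of the time" behaviour described above.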


[PATCH] ppc64: include profile.c in kernel/irq.c · 1092beac

Dave Hansen authored Aug 22, 2004

arch/ppc64/kernel/irq.c: In function `init_irq_proc':
arch/ppc64/kernel/irq.c:797: warning: implicit declaration of function
`create_prof_cpu_mask'
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


[PATCH] ppc64: set time-related systemcfg fields · 5c7de485

Paul Mackerras authored Aug 22, 2004

Somewhere along the line we lost the code that updates some fields of the
systemcfg structure that are used for translating timebase values to time
of day.  I want to get rid of the systemcfg structure eventually, but
applications are using it (and in particular these fields) and I don't want
to break the ABI in a stable kernel series.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


[PATCH] ppc64: remove old asm offsets · 50cabfcb

Anton Blanchard authored Aug 22, 2004

Remove some unused things in asm-offsets.c
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


[PATCH] ppc64: reduce stack overflow warning threshold · 4574fa8f

Anton Blanchard authored Aug 22, 2004

Reduce the stack overflow warning from 4kB to 2kB now that it's been in and
tested for a while.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
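
A threshold check of this kind can be sketched as follows. The assumptions are loud ones: a 16 kB, size-aligned, downward-growing stack, a `reserved` area at the bottom (standing in for whatever the kernel keeps there), and hypothetical function names. The actual ppc64 check may differ in detail.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_THREAD_SIZE 16384UL /* assumed stack size, a power of two */

/*
 * Bytes between the stack pointer and the bottom of its stack, given
 * that stacks are DEMO_THREAD_SIZE-aligned and grow downwards.
 */
unsigned long demo_stack_left(uintptr_t sp)
{
	return (unsigned long)(sp & (DEMO_THREAD_SIZE - 1));
}

/* True when fewer than `threshold` usable bytes remain. */
bool demo_stack_low(uintptr_t sp, unsigned long reserved,
		    unsigned long threshold)
{
	return demo_stack_left(sp) < reserved + threshold;
}
```

With 3000 bytes left, a 4096-byte threshold warns and a 2048-byte threshold stays quiet, which is the effect of lowering the limit.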


[PATCH] ppc32: fix warnings on Ebony MTD build · 6a819234

Matt Porter authored Aug 22, 2004

This patch removes warnings associated with Ebony MTD related defines.
Please apply.
Signed-off-by: Eugene Surovegin <ebs@ebshome.net>
Signed-off-by: Matt Porter <mporter@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


[PATCH] ppc32: Fix bug in altivec emulation · a2679264

Paul Mackerras authored Aug 22, 2004

This patch fixes a bug in the kernel emulation of altivec instructions with
denormalized operands.  The emulation of the vmaddfp and vnmsubfp
instructions was giving the wrong answer because I had the wrong order of
operands to the fmadds and fnmsubs instructions.  This patch fixes it for
both ppc32 and ppc64.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
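
Why operand order matters here can be seen from the arithmetic alone. In the PowerPC architecture fmadds computes frA*frC + frB and fnmsubs computes -(frA*frC - frB), so the multiplier and the addend are not interchangeable. The sketch below uses plain C arithmetic with hypothetical helper names; it does not reproduce the emulation code.

```c
/* Multiply-add shape: a * c + b. */
float demo_madd(float a, float c, float b)
{
	return a * c + b;
}

/* Negative multiply-subtract shape: -(a * c - b). */
float demo_nmsub(float a, float c, float b)
{
	return -(a * c - b);
}
```

For instance demo_madd(2, 3, 4) is 10 but demo_madd(2, 4, 3), with the last two operands swapped, is 11.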


[PATCH] ppc32: export __dma_sync & __dma_sync_page · de5cdff5

Eugene Surovegin authored Aug 22, 2004

This patch adds missing exports for __dma_sync and __dma_sync_page (DMA API
helpers for non-coherent cache PPCs).
Signed-off-by: Eugene Surovegin <ebs@ebshome.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


[PATCH] ppc32: add docs for noltlbs and nobats parameters · 934b8f1a

Matt Porter authored Aug 22, 2004

Adds documentation of the PPC noltlbs and nobats kernel cmdline parameters.
noltlbs is a new option and nobats never had an entry.
Signed-off-by: Matt Porter <mporter@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


[PATCH] ppc32: emulate obsolete instructions · 79e2e096

Paul Mackerras authored Aug 22, 2004

This patch adds emulation in the illegal instruction handler for a couple
of old instructions that are no longer implemented in the PPC970 and later
chips.  This patch adds the code for both ppc32 and ppc64, and cleans up
the ppc64 traps.c a bit, along the lines of the ppc32 code.  It also makes
sure that the ppc64 code generates a SIGTRAP after emulating an instruction
if single-stepping is enabled.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


[PATCH] ppc32: handle misaligned string/multiple insns · 1862e9f2

Paul Mackerras authored Aug 22, 2004

This patch adds code to the ppc32 alignment exception handler to make it
handle the load/store string and load/store multiple word instructions. 
This is an issue for older CPUs such as the PPC601, which traps on
load/store string instructions which cross a page boundary (newer CPUs
handle this in hardware).  I have a little test program which exercises
this code, so I am reasonably confident it's correct.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
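
What emulating a load-multiple means can be sketched in portable C. The lmw instruction loads consecutive words into registers rD through r31; the sketch reads each word byte by byte so that alignment does not matter. This is an illustration with hypothetical names, not the handler's code, which also has to deal with user-space access and faults.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_NGPRS 32

/* Read one big-endian 32-bit word from a possibly misaligned address. */
uint32_t demo_load_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Emulate "load multiple word": fill gpr[rd] .. gpr[31] from
 * consecutive words starting at mem. Returns the bytes consumed.
 */
size_t demo_emulate_lmw(uint32_t gpr[DEMO_NGPRS], unsigned int rd,
			const uint8_t *mem)
{
	size_t off = 0;

	for (unsigned int r = rd; r < DEMO_NGPRS; r++, off += 4)
		gpr[r] = demo_load_be32(mem + off);
	return off;
}
```

Calling demo_emulate_lmw(gpr, 30, buf + 1) on an odd address fills gpr[30] and gpr[31] and leaves the lower registers untouched.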


[PATCH] ppc32: make PPC40x large tlb mapping optional · bc3152e4

Matt Porter authored Aug 22, 2004

This makes the PPC40x lowmem large tlb mapping selectable via a cmdline
option.  This allows use of the normal page-sized mapping so that kernel
text can be read only if desired.
Signed-off-by: Josh Boyer <jwboyer@charter.net>
Signed-off-by: Matt Porter <mporter@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


[PATCH] ppc32: optimize/fix timer_interrupt loop · 07a134d6

Matt Porter authored Aug 22, 2004

The following patch fixes the situation where the loop condition could
generate a next_dec of zero while exiting the loop.  This is suboptimal on
Classic PPC because it forces another interrupt to occur and reenter the
handler.  It is fatal on Book E cores, because their decrementer is stopped
when writing a zero (Classic interrupts on a 0->-1 transition, Book E
interrupts on a 1->0 transition).  Instead, stay in the loop on a
next_dec==0.
Signed-off-by: Matt Porter <mporter@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
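
The loop condition can be modelled with plain integers. This is a sketch under assumptions, with hypothetical names: `elapsed` is the number of timebase ticks since the last accounted jiffy, and each pass through the loop accounts one jiffy. The point is the `<= 0` test, which keeps the loop going when the next decrementer value would be exactly zero.

```c
/*
 * Returns the value to program into the decrementer and stores the
 * number of jiffies accounted in *jiffies_handled. Because the test
 * is "<= 0", the returned value is always at least 1.
 */
long demo_next_decrementer(long ticks_per_jiffy, long elapsed,
			   long *jiffies_handled)
{
	long next_dec;

	*jiffies_handled = 0;
	while ((next_dec = ticks_per_jiffy - elapsed) <= 0) {
		elapsed -= ticks_per_jiffy; /* account one jiffy */
		(*jiffies_handled)++;
	}
	return next_dec;
}
```

With 100 ticks per jiffy and exactly 100 ticks elapsed, a `< 0` test would exit with next_dec == 0; the `<= 0` test handles one more jiffy and returns 100.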


[PATCH] ppc32: remove hardcoded offsets from ppc asm · d6e3c04a

Benjamin Herrenschmidt authored Aug 22, 2004

This patch by Vincent Hanquez removes some hard-coded offsets for accessing
thread info fields from assembly, using the normal offset generation
mechanism that we already have for other things instead.
Signed-off-by: Vincent Hanquez <tab@snarc.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


[PATCH] Make i386 die() more resilient against recursive errors · 2fa69d93

Keith Owens authored Aug 22, 2004

Make i386 die() more resilient against recursive errors, almost a cut
and paste of the ia64 die() routine.  Much of the patch is indentation
changes.

Mainly to make it easier to add crash, lcrash, kmsgdump or other RAS patches. 
They are invoked from die() and if they crash themselves, we have to avoid
recursive loops in die().
Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
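
One way to guard against recursion is to remember which cpu is already inside die() and how many times it has re-entered. The sketch below shows only that bookkeeping; it has no real locking, the names are hypothetical, and the depth limit is a parameter chosen for illustration rather than the value the patch uses.

```c
#include <stdbool.h>

/* Hypothetical "owner + depth" state; a real kernel pairs it with a lock. */
struct demo_die_state {
	int owner_cpu; /* cpu currently inside die(), or -1 for none */
	int depth;     /* entries by that cpu, including nested ones */
};

/*
 * Called on every entry to die(). Returns true when a report should
 * still be printed, false once the same cpu has re-entered so often
 * that further output is suppressed.
 */
bool demo_die_should_report(struct demo_die_state *s, int cpu,
			    int max_depth)
{
	if (s->owner_cpu != cpu) {
		/* First entry on this cpu: take ownership. */
		s->owner_cpu = cpu;
		s->depth = 0;
	}
	return ++s->depth < max_depth;
}
```

If a crash-dump hook invoked from die() faults and lands back in die() on the same cpu, the depth counter climbs until the function returns false and the caller can stop instead of looping.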


[PATCH] NMI trigger switch support for debugging(updated) · 7f2b65bd

Akiyama Nobuyuki authored Aug 22, 2004

I made a patch for debugging with the help of the NMI trigger switch.
When the kernel hangs severely, keyboard operations (e.g. Ctrl-Alt-Del)
don't work properly. This patch enables debugging information
to be displayed on the console in this case.
I think this feature is necessary as standard functionality.
Please feel free to use this patch and let me know if you have
any comments.

Background:

When trouble occurs in the kernel, we usually begin to investigate
with the following information:
 - panic >> panic message.
 - oops >> CPU registers and stack trace.
 - hang >> **NONE** no standard method established.

How it works:

Most IA32 servers have an NMI switch that fires an NMI interrupt.
The NMI can interrupt the kernel even when it is in a serious state,
for example a deadlock with interrupts disabled.
When the NMI switch is pressed after this feature is activated,
CPU registers and a stack trace are displayed on the console and then
a panic occurs.
This feature is activated or deactivated with sysctl.

On the IA32 architecture, only the following are defined as reasons
for an NMI interrupt:
 - memory parity error
 - I/O check error
The reason code of the NMI switch is not defined, so this patch assumes
that all undefined NMI interrupts are fired by the NMI switch.
However, oprofile and the NMI watchdog also use undefined NMI interrupts.
Therefore this feature cannot be used at the same time as oprofile
or the NMI watchdog. This feature hands the NMI interrupt over to oprofile
and the NMI watchdog, so when they have been activated, this feature
doesn't work even if it is activated.
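
The decision described above can be sketched as a small classifier. The bit values for the two defined reasons are an assumption (bits 7 and 6 of the reason byte), and the enum and function names are hypothetical; the real handler reads the reason from hardware and calls into oprofile or the watchdog rather than taking a flag.

```c
#include <stdbool.h>

#define DEMO_NMI_MEM_PARITY 0x80 /* assumed: memory parity error bit */
#define DEMO_NMI_IO_CHECK   0x40 /* assumed: I/O check error bit */

enum demo_nmi_action {
	DEMO_NMI_PARITY,     /* defined reason: memory parity error */
	DEMO_NMI_IOCHK,      /* defined reason: I/O check error */
	DEMO_NMI_OTHER_USER, /* oprofile or the NMI watchdog takes it */
	DEMO_NMI_PANIC,      /* treat as NMI switch: dump state, panic */
	DEMO_NMI_IGNORE      /* unknown NMI, feature not enabled */
};

enum demo_nmi_action demo_classify_nmi(unsigned char reason,
				       bool claimed_by_other,
				       bool unknown_nmi_panic)
{
	if (reason & DEMO_NMI_MEM_PARITY)
		return DEMO_NMI_PARITY;
	if (reason & DEMO_NMI_IO_CHECK)
		return DEMO_NMI_IOCHK;
	/* Undefined reason: other NMI users get the first look. */
	if (claimed_by_other)
		return DEMO_NMI_OTHER_USER;
	return unknown_nmi_panic ? DEMO_NMI_PANIC : DEMO_NMI_IGNORE;
}
```

The ordering makes the limitation visible: while oprofile or the watchdog is active, an undefined NMI never reaches the panic branch.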

Supported architecture:

IA32

Setup:

Set up the system control parameter as follows:

# sysctl -w kernel.unknown_nmi_panic=1
kernel.unknown_nmi_panic = 1

If the NMI switch is pressed, CPU registers and stack trace will
be displayed on console and then panic occurs.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


[PATCH] fix reading string module parameters in sysfs · f1577452

Arnd Bergmann authored Aug 22, 2004

Reading the contents of a module_param_string through sysfs currently
oopses because the param_get_charp() function cannot operate on a
kparam_string struct.  This introduces the required param_get_string.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
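
The type mismatch can be illustrated in user space. A getter for a char pointer parameter expects its argument to point at the pointer itself, while a string parameter is described by a small struct holding a length and a buffer pointer. The struct layout and all names below are assumptions made for the sketch, not the kernel's definitions.

```c
#include <stdio.h>

/* Assumed shape of a string-parameter descriptor: length plus buffer. */
struct demo_kparam_string {
	unsigned int maxlen;
	char *string;
};

/* Getter for a char * parameter: arg points at the pointer itself. */
int demo_get_charp(char *out, size_t outlen, const void *arg)
{
	return snprintf(out, outlen, "%s", *(char *const *)arg);
}

/* Getter for a string parameter: arg points at the descriptor. */
int demo_get_string(char *out, size_t outlen, const void *arg)
{
	const struct demo_kparam_string *kps = arg;

	return snprintf(out, outlen, "%s", kps->string);
}
```

Handing a descriptor to demo_get_charp would make it interpret the first bytes of the struct, which start with the length, as a pointer; that is the kind of misuse that ends in an oops.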


[PATCH] proc fs task name locking fix · 4b4b699d

Mike Kravetz authored Aug 22, 2004

Races have been observed between exec-time overwriting of task->comm and
/proc accesses to the same data.  This causes environment string
information to appear in /proc.

Fix that up by taking task_lock() around updates to and accesses to
task->comm.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


[PATCH] tg3 section fix · 9026a8d6

Randy Dunlap authored Aug 22, 2004

add_pin_to_irq() should not be __init; it is used after init code.
Signed-off-by: Randy Dunlap <rddunlap@osdl.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


[PATCH] context-switching overhead in X, ioport() · a55702bb

Ingo Molnar authored Aug 22, 2004

while debugging/improving scheduling latencies i got the following
strange latency report from Lee Revell:

  http://krustophenia.net/testresults.php?dataset=2.6.8.1-P6#/var/www/2.6.8.1-P6

this trace shows a 120 usec latency caused by XFree86, on a 600 MHz x86
system. Looking closer reveals:

  00000002 0.006ms (+0.003ms): __switch_to (schedule)
  00000002 0.088ms (+0.082ms): finish_task_switch (schedule)

it took more than 80 usecs for XFree86 to do a context-switch!

it turns out that the reason for this (massive) context-switching
overhead is the following change in 2.6.8:

      [PATCH] larger IO bitmaps

To demonstrate the effect of this change i've written ioperm-latency.c
(attached), which gives the following on vanilla 2.6.8.1:

  # ./ioperm-latency
  default no ioperm:             scheduling latency: 2528 cycles
  turning on port 80 ioperm:     scheduling latency: 10563 cycles
  turning on port 65535 ioperm:  scheduling latency: 10517 cycles

the ChangeSet says:

        Now, with the lazy bitmap allocation and per-CPU TSS, this
        will really not drain any resources I think.

this is plain wrong. An increase in the IO bitmap size introduces
per-context-switch overhead as well: we now have to copy an 8K bitmap
every time XFree86 context-switches - even though XFree86 never uses
ports higher than 1024! I've straced XFree86 on a number of x86 systems
and in every instance ioperm() was used - so i'd say the majority of x86
Linux systems running 2.6.8.1 are affected by this problem.

This not only causes lots of overhead, it also trashes ~16K out of the
L1 and L2 caches, on every context-switch. It's as if XFree86 did a L1
cache flush on every context-switch ...

the simple solution would be to revert IO_BITMAP_BITS back to 1024 and
release 2.6.8.2?

I've implemented another solution as well, which tracks the
highest-enabled port # for every task and does the copying of the bitmap
intelligently. (patch attached) The patched kernel gives:

  # ./ioperm-latency
  default no ioperm:             scheduling latency: 2423 cycles
  turning on port 80 ioperm:     scheduling latency: 2503 cycles
  turning on port 65535 ioperm:  scheduling latency: 10607 cycles

this is much more acceptable - the full overhead only occurs in the very
unlikely event of a task using the high ioport range. X doesn't suffer
any significant overhead.

(tracking the maximum allowed port # also allows a simplification of
io_bitmap handling: e.g. we don't do the invalid-offset trick anymore -
the IO bitmap in the TSS is always valid and secure.)
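
the idea of copying only up to the highest enabled port can be sketched like this. It is an illustration with hypothetical names, not the patch itself: the bitmap is a plain byte array, a set bit denies access, and the stale tail left by a previous task with a larger range is filled with ones.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_IO_BITMAP_BITS  65536
#define DEMO_IO_BITMAP_BYTES (DEMO_IO_BITMAP_BITS / 8) /* 8192 */

/* Bytes needed to cover ports 0 .. max_port inclusive. */
size_t demo_bitmap_bytes_for(unsigned int max_port)
{
	return (size_t)max_port / 8 + 1;
}

/*
 * Install a task's bitmap into the per-cpu copy. Only task_bytes are
 * copied; if the previous task used a longer prefix, the remainder
 * of that prefix is reset to "all ports denied". Returns the number
 * of bytes copied from the task.
 */
size_t demo_switch_io_bitmap(unsigned char *tss, const unsigned char *task,
			     size_t task_bytes, size_t prev_bytes)
{
	memcpy(tss, task, task_bytes);
	if (prev_bytes > task_bytes)
		memset(tss + task_bytes, 0xff, prev_bytes - task_bytes);
	return task_bytes;
}
```

a task that only enables port 0x80 needs 17 bytes copied per switch, ports up to 1023 need 128 bytes, and only a task that enables port 65535 pays for the full 8192.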

I tested the patch on x86 SMP and UP, it works fine for me. I tested
boundary conditions as well, it all seems secure.

	Ingo

#include <errno.h>
#include <stdio.h>
#include <sched.h>
#include <signal.h>
#include <sys/io.h>
#include <stdlib.h>
#include <unistd.h>
#include <linux/unistd.h>

#define CYCLES(x) asm volatile ("rdtsc" :"=a" (x)::"edx")

#define __NR_sched_set_affinity 241
_syscall3 (int, sched_set_affinity, pid_t, pid, unsigned int, mask_len, unsigned long *, mask)

/*
 * Use a pair of RT processes bound to the same CPU to measure
 * context-switch overhead:
 */
static void measure(void)
{
	unsigned long i, min = ~0UL, pid, mask = 1, t1, t2;

	sched_set_affinity(0, sizeof(mask), &mask);

	pid = fork();
	if (!pid)
		for (;;) {
			asm volatile ("sti; nop; cli");
			sched_yield();
		}

	sched_yield();
	for (i = 0; i < 100; i++) {
		asm volatile ("sti; nop; cli");
		CYCLES(t1);
		sched_yield();
		CYCLES(t2);
		if (i > 10) {
			if (t2 - t1 < min)
				min = t2 - t1;
		}
	}
	asm volatile ("sti");

	kill(pid, 9);
	printf("scheduling latency: %lu cycles\n", min);
	sched_yield();
}

int main(void)
{
	struct sched_param p = { .sched_priority = 2 };
	unsigned long mask = 1;

	if (iopl(3)) {
		printf("need to run as root!\n");
		exit(-1);
	}
	sched_setscheduler(0, SCHED_FIFO, &p);
	sched_set_affinity(0, sizeof(mask), &mask);

	printf("default no ioperm:             ");
	measure();

	printf("turning on port 80 ioperm:     ");
	ioperm(0x80,1,1);
	measure();

	printf("turning on port 65535 ioperm:  ");
	if (ioperm(0xffff,1,1))
		printf("FAILED - older kernel.\n");
	else
		measure();

	return 0;
}
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
