1. 21 Sep, 2010 3 commits
    • Steven Rostedt's avatar
      sched: Try not to migrate higher priority RT tasks · 43fa5460
      Steven Rostedt authored
      When first working on the RT scheduler design, we concentrated on
      keeping all CPUs running RT tasks instead of having multiple RT
      tasks on a single CPU waiting for the migration thread to move
      them. Instead we take a more proactive stance and push or pull RT
      tasks from one CPU to another on wakeup or scheduling.
      
      When an RT task wakes up on a CPU that is running another RT task,
      instead of preempting it and killing the cache of the running RT
      task, we look to see if we can migrate the RT task that is waking
      up, even if the RT task waking up is of higher priority.
      
      This may sound a bit odd, but RT tasks should be limited in
      migration by the user anyway. But in practice, people do not do
      this, which causes high prio RT tasks to bounce around the CPUs.
      This becomes even worse when we have priority inheritance, because
      a high prio task can block on a lower prio task and boost its
      priority. When the lower prio task wakes up the high prio task, if
      it happens to be on the same CPU it will migrate off of it.
      
      But in reality, the above does not happen much either, because the
      wake up of the lower prio task, which has already been boosted, if
      it was on the same CPU as the higher prio task, it would then
      migrate off of it. But anyway, we do not want to migrate them
      either.
      
      To examine the scheduling, I created a test program and examined it
      under kernelshark. The test program created CPU * 2 threads, where
      each thread had a different priority. The program takes different
      options. The options used in this change log was to have priority
      inheritance mutexes or not.
      
      All threads did the following loop:
      
      static void grab_lock(long id, int iter, int l)
      {
      	ftrace_write("thread %ld iter %d, taking lock %d\n",
      		     id, iter, l);
      	pthread_mutex_lock(&locks[l]);
      	ftrace_write("thread %ld iter %d, took lock %d\n",
      		     id, iter, l);
      	busy_loop(nr_tasks - id);
      	ftrace_write("thread %ld iter %d, unlock lock %d\n",
      		     id, iter, l);
      	pthread_mutex_unlock(&locks[l]);
      }
      
      void *start_task(void *id)
      {
      	[...]
      	while (!done) {
      		for (l = 0; l < nr_locks; l++) {
      			grab_lock(id, i, l);
      			ftrace_write("thread %ld iter %d sleeping\n",
      				     id, i);
      			ms_sleep(id);
      		}
      		i++;
      	}
      	[...]
      }
      
      The busy_loop(ms) keeps the CPU spinning for ms milliseconds. The
      ms_sleep(ms) sleeps for ms milliseconds. The ftrace_write() writes
      to the ftrace buffer to help analyze via ftrace.
      
      The higher the id, the higher the prio, the shorter it does the
      busy loop, but the longer it spins. This is usually the case with
      RT tasks, the lower priority tasks usually run longer than higher
      priority tasks.
      
      At the end of the test, it records the number of loops each thread
      took, as well as the number of voluntary preemptions, non-voluntary
      preemptions, and number of migrations each thread took, taking the
      information from /proc/$$/sched and /proc/$$/status.
      
      Running this on a 4 CPU processor, the results without changes to
      the kernel looked like this:
      
      Task        vol    nonvol   migrated     iterations
      ----        ---    ------   --------     ----------
        0:         53      3220       1470             98
        1:        562       773        724             98
        2:        752       933       1375             98
        3:        749        39        697             98
        4:        758         5        515             98
        5:        764         2        679             99
        6:        761         2        535             99
        7:        757         3        346             99
      
      total:     5156       4977      6341            787
      
      Each thread regardless of priority migrated a few hundred times.
      The higher priority tasks, were a little better but still took
      quite an impact.
      
      By letting higher priority tasks bump the lower prio task from the
      CPU, things changed a bit:
      
      Task        vol    nonvol   migrated     iterations
      ----        ---    ------   --------     ----------
        0:         37      2835       1937             98
        1:        666      1821       1865             98
        2:        654      1003       1385             98
        3:        664       635        973             99
        4:        698       197        352             99
        5:        703       101        159             99
        6:        708         1         75             99
        7:        713         1          2             99
      
      total:     4843       6594      6748            789
      
      The total # of migrations did not change (several runs showed the
      difference all within the noise). But we now see a dramatic
      improvement to the higher priority tasks. (kernelshark showed that
      the watchdog timer bumped the highest priority task to give it the
      2 count. This was actually consistent with every run).
      
      Notice that the # of iterations did not change either.
      
      The above was with priority inheritance mutexes. That is, when the
      higher prority task blocked on a lower priority task, the lower
      priority task would inherit the higher priority task (which shows
      why task 6 was bumped so many times). When not using priority
      inheritance mutexes, the current kernel shows this:
      
      Task        vol    nonvol   migrated     iterations
      ----        ---    ------   --------     ----------
        0:         56      3101       1892             95
        1:        594       713        937             95
        2:        625       188        618             95
        3:        628         4        491             96
        4:        640         7        468             96
        5:        631         2        501             96
        6:        641         1        466             96
        7:        643         2        497             96
      
      total:     4458       4018      5870            765
      
      Not much changed with or without priority inheritance mutexes. But
      if we let the high priority task bump lower priority tasks on
      wakeup we see:
      
      Task        vol    nonvol   migrated     iterations
      ----        ---    ------   --------     ----------
        0:        115      3439       2782             98
        1:        633      1354       1583             99
        2:        652       919       1218             99
        3:        645       713        934             99
        4:        690         3          3             99
        5:        694         1          4             99
        6:        720         3          4             99
        7:        747         0          1            100
      
      Which shows a even bigger change. The big difference between task 3
      and task 4 is because we have only 4 CPUs on the machine, causing
      the 4 highest prio tasks to always have preference.
      
      Although I did not measure cache misses, and I'm sure there would
      be little to measure since the test was not data intensive, I could
      imagine large improvements for higher priority tasks when dealing
      with lower priority tasks. Thus, I'm satisfied with making the
      change and agreeing with what Gregory Haskins argued a few years
      ago when we first had this discussion.
      
      One final note. All tasks in the above tests were RT tasks. Any RT
      task will always preempt a non RT task that is running on the CPU
      the RT task wants to run on.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Gregory Haskins <ghaskins@novell.com>
      LKML-Reference: <20100921024138.605460343@goodmis.org>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      43fa5460
    • Venkatesh Pallipadi's avatar
      sched: Increment cache_nice_tries only on periodic lb · 58b26c4c
      Venkatesh Pallipadi authored
      scheduler uses cache_nice_tries as an indicator to do cache_hot and
      active load balance, when normal load balance fails. Currently,
      this value is changed on any failed load balance attempt. That ends
      up being not so nice to workloads that enter/exit idle often, as
      they do more frequent new_idle balance and that pretty soon results
      in cache hot tasks being pulled in.
      
      Making the cache_nice_tries ignore failed new_idle balance seems to
      make better sense. With that only the failed load balance in
      periodic load balance gets accounted and the rate of accumulation
      of cache_nice_tries will not depend on idle entry/exit (short
      running sleep-wakeup kind of tasks). This reduces movement of
      cache_hot tasks.
      
      schedstat diff (after-before) excerpt from a workload that has
      frequent and short wakeup-idle pattern (:2 in cpu col below refers
      to NEWIDLE idx) This snapshot was across ~400 seconds.
      
      Without this change:
      domainstats:  domain0
       cpu     cnt      bln      fld      imb     gain    hgain  nobusyq  nobusyg
       0:2  306487   219575    73167  110069413    44583    19070     1172   218403
       1:2  292139   194853    81421  120893383    50745    21902     1259   193594
       2:2  283166   174607    91359  129699642    54931    23688     1287   173320
       3:2  273998   161788    93991  132757146    57122    24351     1366   160422
       4:2  289851   215692    62190  83398383    36377    13680      851   214841
       5:2  316312   222146    77605  117582154    49948    20281      988   221158
       6:2  297172   195596    83623  122133390    52801    21301      929   194667
       7:2  283391   178078    86378  126622761    55122    22239      928   177150
       8:2  297655   210359    72995  110246694    45798    19777     1125   209234
       9:2  297357   202011    79363  119753474    50953    22088     1089   200922
      10:2  278797   178703    83180  122514385    52969    22726     1128   177575
      11:2  272661   167669    86978  127342327    55857    24342     1195   166474
      12:2  293039   204031    73211  110282059    47285    19651      948   203083
      13:2  289502   196762    76803  114712942    49339    20547     1016   195746
      14:2  264446   169609    78292  115715605    50459    21017      982   168627
      15:2  260968   163660    80142  116811793    51483    21281     1064   162596
      
      With this change:
      domainstats:  domain0
       cpu     cnt      bln      fld      imb     gain    hgain  nobusyq  nobusyg
       0:2  272347   187380    77455  105420270    24975        1      953   186427
       1:2  267276   172360    86234  116242264    28087        6     1028   171332
       2:2  259769   156777    93281  123243134    30555        1     1043   155734
       3:2  250870   143129    97627  127370868    32026        6     1188   141941
       4:2  248422   177116    64096  78261112    22202        2      757   176359
       5:2  275595   180683    84950  116075022    29400        6      778   179905
       6:2  262418   162609    88944  119256898    31056        4      817   161792
       7:2  252204   147946    92646  122388300    32879        4      824   147122
       8:2  262335   172239    81631  110477214    26599        4      864   171375
       9:2  261563   164775    88016  117203621    28331        3      849   163926
      10:2  243389   140949    93379  121353071    29585        2      909   140040
      11:2  242795   134651    98310  124768957    30895        2     1016   133635
      12:2  255234   166622    79843  104696912    26483        4      746   165876
      13:2  244944   151595    83855  109808099    27787        3      801   150794
      14:2  241301   140982    89935  116954383    30403        6      845   140137
      15:2  232271   128564    92821  119185207    31207        4     1416   127148
      Signed-off-by: default avatarVenkatesh Pallipadi <venki@google.com>
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1284167957-3675-1-git-send-email-venki@google.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      58b26c4c
    • Ingo Molnar's avatar
      Merge commit 'v2.6.36-rc5' into sched/core · cf84fd96
      Ingo Molnar authored
      Merge reason: Pick up the latest fixes in -rc5.
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      cf84fd96
  2. 20 Sep, 2010 19 commits
    • Linus Torvalds's avatar
      Linux 2.6.36-rc5 · b30a3f62
      Linus Torvalds authored
      b30a3f62
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging-2.6 · 6b3d2cc4
      Linus Torvalds authored
      * git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging-2.6:
        Staging: vt6655: fix buffer overflow
        Revert: "Staging: batman-adv: Adding netfilter-bridge hooks"
      6b3d2cc4
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6 · 0c4ab345
      Linus Torvalds authored
      * git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6:
        USB: musb: MAINTAINERS: Fix my mail address
        USB: serial/mos*: prevent reading uninitialized stack memory
        USB: otg: twl4030: fix phy initialization(v1)
        USB: EHCI: Disable langwell/penwell LPM capability
        usb: musb_debugfs: don't use the struct file private_data field with seq_files
      0c4ab345
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6 · 36ff4a55
      Linus Torvalds authored
      * git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6:
        serial: mfd: fix bug in serial_hsu_remove()
        serial: amba-pl010: fix set_ldisc
      36ff4a55
    • Dan Carpenter's avatar
      Staging: vt6655: fix buffer overflow · dd173abf
      Dan Carpenter authored
      "param->u.wpa_associate.wpa_ie_len" comes from the user.  We should
      check it so that the copy_from_user() doesn't overflow the buffer.
      
      Also further down in the function, we assume that if
      "param->u.wpa_associate.wpa_ie_len" is set then "abyWPAIE[0]" is
      initialized.  To make that work, I changed the test here to say that if
      "wpa_ie_len" is set then "wpa_ie" has to be a valid pointer or we return
      -EINVAL.
      
      Oddly, we only use the first element of the abyWPAIE[] array.  So I
      suspect there may be some other issues in this function.
      Signed-off-by: default avatarDan Carpenter <error27@gmail.com>
      Cc: stable <stable@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@suse.de>
      dd173abf
    • Sven Eckelmann's avatar
      Revert: "Staging: batman-adv: Adding netfilter-bridge hooks" · 350aede6
      Sven Eckelmann authored
      This reverts commit 96d592ed.
      
      The netfilter hook seems to be misused and may leak skbs in situations
      when NF_HOOK returns NF_STOLEN. It may not filter everything as
      expected. Also the ethernet bridge tables are not yet capable to
      understand batman-adv packet correctly.
      
      It was only added for testing purposes and can be removed again.
      Reported-by: default avatarVasiliy Kulikov <segooon@gmail.com>
      Signed-off-by: default avatarSven Eckelmann <sven.eckelmann@gmx.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@suse.de>
      350aede6
    • Feng Tang's avatar
      serial: mfd: fix bug in serial_hsu_remove() · e3671ac4
      Feng Tang authored
      Medfield HSU driver deal with 4 pci devices(3 uart ports + 1 dma controller),
      so in pci remove func, we need handle them differently
      Signed-off-by: default avatarFeng Tang <feng.tang@intel.com>
      Signed-off-by: default avatarAlan Cox <alan@linux.intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@suse.de>
      e3671ac4
    • Mika Westerberg's avatar
      serial: amba-pl010: fix set_ldisc · 476f771c
      Mika Westerberg authored
      Commit d87d9b7d ("tty: serial - fix tty referencing in set_ldisc") changed
      set_ldisc to take ldisc number as parameter. This patch fixes AMBA PL010 driver
      according the new prototype.
      Signed-off-by: default avatarMika Westerberg <mika.westerberg@iki.fi>
      Cc: Alan Cox <alan@linux.intel.com>
      Cc: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@suse.de>
      476f771c
    • Felipe Balbi's avatar
      USB: musb: MAINTAINERS: Fix my mail address · f299470a
      Felipe Balbi authored
      If we don't, contributors to musb and any USB OMAP
      code will be sending mails to an unexistent inbox.
      Signed-off-by: default avatarFelipe Balbi <balbi@ti.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@suse.de>
      f299470a
    • Dan Rosenberg's avatar
      USB: serial/mos*: prevent reading uninitialized stack memory · a0846f18
      Dan Rosenberg authored
      The TIOCGICOUNT device ioctl in both mos7720.c and mos7840.c allows
      unprivileged users to read uninitialized stack memory, because the
      "reserved" member of the serial_icounter_struct struct declared on the
      stack is not altered or zeroed before being copied back to the user.
      This patch takes care of it.
      Signed-off-by: default avatarDan Rosenberg <dan.j.rosenberg@gmail.com>
      Cc: stable <stable@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@suse.de>
      a0846f18
    • Ming Lei's avatar
      USB: otg: twl4030: fix phy initialization(v1) · fc8f2a76
      Ming Lei authored
      Commit 461c3177(into 2.6.36-v3)
      is put forward to power down phy if no usb cable is connected,
      but does introduce the two issues below:
      
      1), phy is not into work state if usb cable is connected
      with PC during poweron, so musb device mode is not usable
      in such case, follows the reasons:
      	-twl4030_phy_resume is not called, so
      		regulators are not enabled
      		i2c access are not enabled
      		usb mode not configurated
      
      2), The kernel warings[1] of regulators 'unbalanced disables'
      is caused if poweron without usb cable connected
      with PC or b-device.
      
      This patch fixes the two issues above:
      	-power down phy only if no usb cable is connected with PC
      and b-device
      	-do phy initialization(via __twl4030_phy_resume) if usb cable
      is connected with PC(vbus event) or another b-device(ID event) in
      twl4030_usb_probe.
      
      This patch also doesn't put VUSB3V1 LDO into active mode in
      twl4030_usb_ldo_init until VBUS/ID change detected, so we can
      save more power consumption than before.
      
      This patch is verified OK on Beagle board either connected with
      usb cable or not when poweron.
      
      [1]. warnings of 'unbalanced disables' of regulators.
      [root@OMAP3EVM /]# dmesg
      ------------[ cut here ]------------
      WARNING: at drivers/regulator/core.c:1357 _regulator_disable+0x38/0x128()
      unbalanced disables for VUSB1V8
      Modules linked in:
      Backtrace:
      [<c0030c48>] (dump_backtrace+0x0/0x110) from [<c034f5a8>] (dump_stack+0x18/0x1c)
       r7:c78179d8 r6:c01ed6b8 r5:c0410822 r4:0000054d
      [<c034f590>] (dump_stack+0x0/0x1c) from [<c0057da8>] (warn_slowpath_common+0x54/0x6c)
      [<c0057d54>] (warn_slowpath_common+0x0/0x6c) from [<c0057e64>] (warn_slowpath_fmt+0x38/0x40)
       r9:00000000 r8:00000000 r7:c78e6608 r6:00000000 r5:fffffffb
       r4:c78e6c00
      [<c0057e2c>] (warn_slowpath_fmt+0x0/0x40) from [<c01ed6b8>] (_regulator_disable+0x38/0x128)
       r3:c0410e53 r2:c0410ad5
      [<c01ed680>] (_regulator_disable+0x0/0x128) from [<c01ed87c>] (regulator_disable+0x24/0x38)
       r7:c78e6608 r6:00000000 r5:c78e6c40 r4:c78e6c00
      [<c01ed858>] (regulator_disable+0x0/0x38) from [<c02382dc>] (twl4030_phy_power+0x15c/0x17c)
       r5:c78595c0 r4:00000000
      [<c0238180>] (twl4030_phy_power+0x0/0x17c) from [<c023831c>] (twl4030_phy_suspend+0x20/0x2c)
       r6:00000000 r5:c78595c0 r4:c78595c0
      [<c02382fc>] (twl4030_phy_suspend+0x0/0x2c) from [<c0238638>] (twl4030_usb_irq+0x11c/0x16c)
       r5:c78595c0 r4:00000040
      [<c023851c>] (twl4030_usb_irq+0x0/0x16c) from [<c034ec18>] (twl4030_usb_probe+0x2c4/0x32c)
       r6:00000000 r5:00000000 r4:c78595c0
      [<c034e954>] (twl4030_usb_probe+0x0/0x32c) from [<c02152a0>] (platform_drv_probe+0x20/0x24)
       r7:00000000 r6:c047d49c r5:c78e6608 r4:c047d49c
      [<c0215280>] (platform_drv_probe+0x0/0x24) from [<c0214244>] (driver_probe_device+0xd0/0x190)
      [<c0214174>] (driver_probe_device+0x0/0x190) from [<c02143d4>] (__device_attach+0x44/0x48)
       r7:00000000 r6:c78e6608 r5:c78e6608 r4:c047d49c
      [<c0214390>] (__device_attach+0x0/0x48) from [<c0213694>] (bus_for_each_drv+0x50/0x90)
       r5:c0214390 r4:00000000
      [<c0213644>] (bus_for_each_drv+0x0/0x90) from [<c0214474>] (device_attach+0x70/0x94)
       r6:c78e663c r5:c78e6608 r4:c78e6608
      [<c0214404>] (device_attach+0x0/0x94) from [<c02134fc>] (bus_probe_device+0x2c/0x48)
       r7:00000000 r6:00000002 r5:c78e6608 r4:c78e6600
      [<c02134d0>] (bus_probe_device+0x0/0x48) from [<c0211e48>] (device_add+0x340/0x4b4)
      [<c0211b08>] (device_add+0x0/0x4b4) from [<c021597c>] (platform_device_add+0x110/0x16c)
      [<c021586c>] (platform_device_add+0x0/0x16c) from [<c0220cb0>] (add_numbered_child+0xd8/0x118)
       r7:00000000 r6:c045f15c r5:c78e6600 r4:00000000
      [<c0220bd8>] (add_numbered_child+0x0/0x118) from [<c001c618>] (twl_probe+0x3a4/0x72c)
      [<c001c274>] (twl_probe+0x0/0x72c) from [<c02601ac>] (i2c_device_probe+0x7c/0xa4)
      [<c0260130>] (i2c_device_probe+0x0/0xa4) from [<c0214244>] (driver_probe_device+0xd0/0x190)
       r5:c7856e20 r4:c047c860
      [<c0214174>] (driver_probe_device+0x0/0x190) from [<c02143d4>] (__device_attach+0x44/0x48)
       r7:c7856e04 r6:c7856e20 r5:c7856e20 r4:c047c860
      [<c0214390>] (__device_attach+0x0/0x48) from [<c0213694>] (bus_for_each_drv+0x50/0x90)
       r5:c0214390 r4:00000000
      [<c0213644>] (bus_for_each_drv+0x0/0x90) from [<c0214474>] (device_attach+0x70/0x94)
       r6:c7856e54 r5:c7856e20 r4:c7856e20
      [<c0214404>] (device_attach+0x0/0x94) from [<c02134fc>] (bus_probe_device+0x2c/0x48)
       r7:c7856e04 r6:c78fd048 r5:c7856e20 r4:c7856e20
      [<c02134d0>] (bus_probe_device+0x0/0x48) from [<c0211e48>] (device_add+0x340/0x4b4)
      [<c0211b08>] (device_add+0x0/0x4b4) from [<c0211fd8>] (device_register+0x1c/0x20)
      [<c0211fbc>] (device_register+0x0/0x20) from [<c0260aa8>] (i2c_new_device+0xec/0x150)
       r5:c7856e00 r4:c7856e20
      [<c02609bc>] (i2c_new_device+0x0/0x150) from [<c0260dc0>] (i2c_register_adapter+0xa0/0x1c4)
       r7:00000000 r6:c78fd078 r5:c78fd048 r4:c781d5c0
      [<c0260d20>] (i2c_register_adapter+0x0/0x1c4) from [<c0260f80>] (i2c_add_numbered_adapter+0x9c/0xb4)
       r7:00000a28 r6:c04600a8 r5:c78fd048 r4:00000000
      [<c0260ee4>] (i2c_add_numbered_adapter+0x0/0xb4) from [<c034efa4>] (omap_i2c_probe+0x324/0x3e8)
       r5:00000000 r4:c78fd000
      [<c034ec80>] (omap_i2c_probe+0x0/0x3e8) from [<c02152a0>] (platform_drv_probe+0x20/0x24)
      [<c0215280>] (platform_drv_probe+0x0/0x24) from [<c0214244>] (driver_probe_device+0xd0/0x190)
      [<c0214174>] (driver_probe_device+0x0/0x190) from [<c021436c>] (__driver_attach+0x68/0x8c)
       r7:c78b2140 r6:c047e214 r5:c04600e4 r4:c04600b0
      [<c0214304>] (__driver_attach+0x0/0x8c) from [<c021399c>] (bus_for_each_dev+0x50/0x84)
       r7:c78b2140 r6:c047e214 r5:c0214304 r4:00000000
      [<c021394c>] (bus_for_each_dev+0x0/0x84) from [<c0214068>] (driver_attach+0x20/0x28)
       r6:c047e214 r5:c047e214 r4:c00270d0
      [<c0214048>] (driver_attach+0x0/0x28) from [<c0213274>] (bus_add_driver+0xa8/0x228)
      [<c02131cc>] (bus_add_driver+0x0/0x228) from [<c02146a4>] (driver_register+0xb0/0x13c)
      [<c02145f4>] (driver_register+0x0/0x13c) from [<c0215744>] (platform_driver_register+0x4c/0x60)
       r9:00000000 r8:c001f688 r7:00000013 r6:c005b6fc r5:c00083dc
      r4:c00270d0
      [<c02156f8>] (platform_driver_register+0x0/0x60) from [<c001f69c>] (omap_i2c_init_driver+0x14/0x1c)
      [<c001f688>] (omap_i2c_init_driver+0x0/0x1c) from [<c002c460>] (do_one_initcall+0xd0/0x1a4)
      [<c002c390>] (do_one_initcall+0x0/0x1a4) from [<c0008478>] (kernel_init+0x9c/0x154)
      [<c00083dc>] (kernel_init+0x0/0x154) from [<c005b6fc>] (do_exit+0x0/0x688)
       r5:c00083dc r4:00000000
      ---[ end trace 1b75b31a2719ed1d ]---
      Signed-off-by: default avatarMing Lei <tom.leiming@gmail.com>
      Cc: David Brownell <dbrownell@users.sourceforge.net>
      Cc: Felipe Balbi <me@felipebalbi.com>
      Cc: Anand Gadiyar <gadiyar@ti.com>
      Cc: Mike Frysinger <vapier@gentoo.org>
      Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@suse.de>
      fc8f2a76
    • Alek Du's avatar
      USB: EHCI: Disable langwell/penwell LPM capability · fc928250
      Alek Du authored
      We have to do so due to HW limitation.
      Signed-off-by: default avatarAlek Du <alek.du@intel.com>
      Signed-off-by: default avatarAlan Cox <alan@linux.intel.com>
      Cc: stable <stable@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@suse.de>
      
      fc928250
    • Mathias Nyman's avatar
      usb: musb_debugfs: don't use the struct file private_data field with seq_files · 024cfa59
      Mathias Nyman authored
      seq_files use the private_data field of a file struct for storing a seq_file structure,
      data should be stored in seq_file's own private field (e.g. file->private_data->private)
      Otherwise seq_release() will free the private data when the file is closed.
      Signed-off-by: default avatarMathias Nyman <mathias.nyman@nokia.com>
      Cc: stable <stable@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@suse.de>
      024cfa59
    • Al Viro's avatar
      frv: double syscall restarts, syscall restart in sigreturn() · ed1cde68
      Al Viro authored
      We need to make sure that only the first do_signal() to be handled on
      the way out syscall will bother with syscall restarts; additionally, the
      check on the "signal has user handler" path had been wrong - compare
      with restart prevention in sigreturn()...
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ed1cde68
    • Al Viro's avatar
      frv: handling of restart into restart_syscall is fscked · 44c7afff
      Al Viro authored
      do_signal() should place the syscall number in gr7, not gr8 when
      handling ERESTART_WOULDBLOCK.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      44c7afff
    • Al Viro's avatar
      frv: avoid infinite loop of SIGSEGV delivery · ad0acab4
      Al Viro authored
      Use force_sigsegv() rather than force_sig(SIGSEGV, ...) as the former
      resets the SEGV handler pointer which will kill the process, rather than
      leaving it open to an infinite loop if the SEGV handler itself caused a
      SEGV signal.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ad0acab4
    • Al Viro's avatar
      frv: fix address verification holes in setup_frame/setup_rt_frame · 5f4ad04a
      Al Viro authored
      a) sa_handler might be maliciously set to point to kernel memory;
         blindly dereferencing it in FDPIC case is a Bad Idea(tm).
      
      b) I'm not sure you need that set_fs(USER_DS) there at all, but if you
         do, you'd better do it *before* checking the frame you've decided to
         use with access_ok(), lest sigaltstack() becomes a convenient
         roothole.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      5f4ad04a
    • Al Viro's avatar
      frv: restart_block.fn needs to be reset on sigreturn · 20cd514d
      Al Viro authored
      Reset restart_block.fn on executing a sigreturn such that any currently
      pending system call restarts will be forced to return -EINTR.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      20cd514d
    • Hugh Dickins's avatar
      mm: further fix swapin race condition · 31c4a3d3
      Hugh Dickins authored
      Commit 4969c119 ("mm: fix swapin race condition") is now agreed to
      be incomplete.  There's a race, not very much less likely than the
      original race envisaged, in which it is further necessary to check that
      the swapcache page's swap has not changed.
      
      Here's the reasoning: cast in terms of reuse_swap_page(), but probably
      could be reformulated to rely on try_to_free_swap() instead, or on
      swapoff+swapon.
      
      A, faults into do_swap_page(): does page1 = lookup_swap_cache(swap1) and
      comes through the lock_page(page1).
      
      B, a racing thread of the same process, faults on the same address: does
      page1 = lookup_swap_cache(swap1) and now waits in lock_page(page1), but
      for whatever reason is unlucky not to get the lock any time soon.
      
      A carries on through do_swap_page(), a write fault, but cannot reuse the
      swap page1 (another reference to swap1).  Unlocks the page1 (but B
      doesn't get it yet), does COW in do_wp_page(), page2 now in that pte.
      
      C, perhaps the parent of A+B, comes in and write faults the same swap
      page1 into its mm, reuse_swap_page() succeeds this time, swap1 is freed.
      
      kswapd comes in after some time (B still unlucky) and swaps out some
      pages from A+B and C: it allocates the original swap1 to page2 in A+B,
      and some other swap2 to the original page1 now in C.  But does not
      immediately free page1 (actually it couldn't: B holds a reference),
      leaving it in swap cache for now.
      
      B at last gets the lock on page1, hooray! Is PageSwapCache(page1)? Yes.
      Is pte_same(*page_table, orig_pte)? Yes, because page2 has now been
      given the swap1 which page1 used to have.  So B proceeds to insert page1
      into A+B's page_table, though its content now belongs to C, quite
      different from what A wrote there.
      
      B ought to have checked that page1's swap was still swap1.
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Reviewed-by: default avatarRik van Riel <riel@redhat.com>
      Cc: stable@kernel.org
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      31c4a3d3
  3. 19 Sep, 2010 15 commits
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha-2.6 · 2422084a
      Linus Torvalds authored
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha-2.6:
        alpha: deal with multiple simultaneously pending signals
        alpha: fix a 14 years old bug in sigreturn tracing
        alpha: unb0rk sigsuspend() and rt_sigsuspend()
        alpha: belated ERESTART_RESTARTBLOCK race fix
        alpha: Shift perf event pending work earlier in timer interrupt
        alpha: wire up fanotify and prlimit64 syscalls
        alpha: kill big kernel lock
        alpha: fix build breakage in asm/cacheflush.h
        alpha: remove unnecessary cast from void* in assignment.
        alpha: Use static const char * const where possible
      2422084a
    • Linus Torvalds's avatar
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 · 7d7dee96
      Linus Torvalds authored
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (21 commits)
        dca: disable dca on IOAT ver.3.0 multiple-IOH platforms
        netpoll: Disable IRQ around RCU dereference in netpoll_rx
        sctp: Do not reset the packet during sctp_packet_config().
        net/llc: storing negative error codes in unsigned short
        MAINTAINERS: move atlx discussions to netdev
        drivers/net/cxgb3/cxgb3_main.c: prevent reading uninitialized stack memory
        drivers/net/eql.c: prevent reading uninitialized stack memory
        drivers/net/usb/hso.c: prevent reading uninitialized memory
        xfrm: dont assume rcu_read_lock in xfrm_output_one()
        r8169: Handle rxfifo errors on 8168 chips
        3c59x: Remove atomic context inside vortex_{set|get}_wol
        tcp: Prevent overzealous packetization by SWS logic.
        net: RPS needs to depend upon USE_GENERIC_SMP_HELPERS
        phylib: fix PAL state machine restart on resume
        net: use rcu_barrier() in rollback_registered_many
        bonding: correctly process non-linear skbs
        ipv4: enable getsockopt() for IP_NODEFRAG
        ipv4: force_igmp_version ignored when a IGMPv3 query received
        ppp: potential NULL dereference in ppp_mp_explode()
        net/llc: make opt unsigned in llc_ui_setsockopt()
        ...
      7d7dee96
    • Linus Torvalds's avatar
      Merge branch 's5p-fixes-for-linus' of... · f1c9c979
      Linus Torvalds authored
      Merge branch 's5p-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung
      
      * 's5p-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung:
        ARM: S3C64XX: Add IORESOURCE_IRQ_HIGHLEVEL flag to dm9000 on mach-real6410
        ARM: S3C64XX: Fix coding style errors on mach-real6410
        ARM: S3C64XX: Prototype SPI devices
        ARM: S3C64XX: Fix dev-spi build
        ARM: SAMSUNG: Fix on s5p_gpio_[get,set]_drvstr
        ARM: SAMSUNG: Fix on drive strength value
        ARM: S5PV210: Add FIMC clocks
        ARM: S5PV210: Reduce the iodesc length of systimer
        ARM: S5PV210: Update I2C-1 Clock Register Property.
        ARM: S5P: Decrease IO Registers memory region size on FIMC
        ARM: S5P: Fix DMA coherent mask for FIMC
      f1c9c979
    • Jan Harkes's avatar
      Coda: mount hangs because of missed REQ_WRITE rename · 112d421d
      Jan Harkes authored
      Coda's REQ_* defines were renamed to avoid clashes with the block layer
      (commit 4aeefdc6: "coda: fixup clash with block layer REQ_*
      defines").
      
      However one was missed and response messages are no longer matched with
      requests and waiting threads are no longer woken up.  This patch fixes
      this.
      Signed-off-by: default avatarJan Harkes <jaharkes@cs.cmu.edu>
      [ Also fixed up whitespace while at it  -Linus ]
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      112d421d
    • Al Viro's avatar
      alpha: deal with multiple simultaneously pending signals · 494486a1
      Al Viro authored
      Unlike the other targets, alpha sets _one_ sigframe and
      buggers off until the next syscall/interrupt, even if
      more signals are pending.  It leads to quite a few unpleasant
      inconsistencies, starting with SIGSEGV potentially arriving
      not where it should and including e.g. mess with sigsuspend();
      consider two pending signals blocked until sigsuspend()
      unblocks them.  We pick the first one; then, if we are hit
      by interrupt while in the handler, we process the second one
      as well.  If we are not, and if no syscalls had been made,
      we get out of the first handler and leave the second signal
      pending; normally sigreturn() would've picked it anyway, but
      here it starts with restoring the original mask and voila -
      the second signal is blocked again.  On everything else we
      get both delivered consistently.
      
      It's actually easy to fix; the only thing to watch out for
      is prevention of double syscall restart.  Fortunately, the
      idea I've nicked from arm fix by rmk works just fine...
      
      Testcase demonstrating the behaviour in question; on alpha
      we get one or both flags set (usually one), on everything
      else both are always set.
      	#include <signal.h>
      	#include <stdio.h>
      	int had1, had2;
      	void f1(int sig) { had1 = 1; }
      	void f2(int sig) { had2 = 1; }
      	main()
      	{
      		sigset_t set1, set2;
      		sigemptyset(&set1);
      		sigemptyset(&set2);
      		sigaddset(&set2, 1);
      		sigaddset(&set2, 2);
      		signal(1, f1);
      		signal(2, f2);
      		sigprocmask(SIG_SETMASK, &set2, NULL);
      		raise(1);
      		raise(2);
      		sigsuspend(&set1);
      		printf("had1:%d had2:%d\n", had1, had2);
      	}
      Tested-by: default avatarMichael Cree <mcree@orcon.net.nz>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarMatt Turner <mattst88@gmail.com>
      494486a1
    • Al Viro's avatar
      alpha: fix a 14 years old bug in sigreturn tracing · 53293638
      Al Viro authored
      The way sigreturn() is implemented on alpha breaks PTRACE_SYSCALL,
      all way back to 1.3.95 when alpha has grown PTRACE_SYSCALL support.
      
      What happens is direct return to ret_from_syscall, in order to bypass
      mangling of a3 (error indicator) and prevent other mutilations of
      registers (e.g. by syscall restart).  That's fine, but... the entire
      TIF_SYSCALL_TRACE codepath is kept separate on alpha and post-syscall
      stopping/notifying the tracer is after the syscall.  And the normal
      path we are forcibly switching to doesn't have it.
      
      So we end up with *one* stop in traced sigreturn() vs. two in other
      syscalls.  And yes, strace is visibly broken by that; try to strace
      the following
      	#include <signal.h>
      	#include <stdio.h>
      	void f(int sig) {}
      	main()
      	{
      		signal(SIGHUP, f);
      		raise(SIGHUP);
      		write(1, "eeeek\n", 6);
      	}
      and watch the show.  The
      	close(1)                                = 405
      in the end of strace output is coming from return value of write() (6 ==
      __NR_close on alpha) and syscall number of exit_group() (__NR_exit_group ==
      405 there).
      
      The fix is fairly simple - the only thing we end up missing is the call
      of syscall_trace() and we can tell whether we'd been called from the
      SYSCALL_TRACE path by checking ra value.  Since we are setting the
      switch_stack up (that's what sys_sigreturn() does), we have the right
      environment for calling syscall_trace() - just before we call
      undo_switch_stack() and return.  Since undo_switch_stack() will overwrite
      s0 anyway, we can use it to store the result of "has it been called from
      SYSCALL_TRACE path?" check.  The same thing applies in rt_sigreturn().
      Tested-by: default avatarMichael Cree <mcree@orcon.net.nz>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarMatt Turner <mattst88@gmail.com>
      53293638
    • Al Viro's avatar
      alpha: unb0rk sigsuspend() and rt_sigsuspend() · 392fb6e3
      Al Viro authored
      Old code used to set regs->r0 and regs->r19 to force the right
      return value.  Leaving that after switch to ERESTARTNOHAND
      was a Bad Idea(tm), since now that screws the restart - if we
      hit the case when get_signal_to_deliver() returns 0, we will
      step back to syscall insn, with v0 set to EINTR and a3 to 1.
      The latter won't matter, since EINTR is 4, aka __NR_write.
      
      Testcase:
      
      	#include <signal.h>
      	#define _GNU_SOURCE
      	#include <unistd.h>
      	#include <sys/syscall.h>
      
      	main()
      	{
      		sigset_t mask;
      		sigemptyset(&mask);
      		sigaddset(&mask, SIGCONT);
      		sigprocmask(SIG_SETMASK, &mask, NULL);
      		kill(0, SIGCONT);
      		syscall(__NR_sigsuspend, 1, "b0rken\n", 7);
      	}
      
      results on alpha in immediate message to stdout...
      
      Fix is obvious; moreover, since we don't need regs anymore, we can
      switch to normal prototypes for these guys and lose the wrappers.
      Even better, rt_sigsuspend() is identical to generic version in
      kernel/signal.c now.
      Tested-by: default avatarMichael Cree <mcree@orcon.net.nz>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarMatt Turner <mattst88@gmail.com>
      392fb6e3
    • Al Viro's avatar
      alpha: belated ERESTART_RESTARTBLOCK race fix · 2deba1bd
      Al Viro authored
      same thing as had been done on other targets back in 2003 -
      move setting ->restart_block.fn into {rt_,}sigreturn().
      Tested-by: default avatarMichael Cree <mcree@orcon.net.nz>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarMatt Turner <mattst88@gmail.com>
      2deba1bd
    • Michael Cree's avatar
      alpha: Shift perf event pending work earlier in timer interrupt · bdc8b891
      Michael Cree authored
      Pending work from the performance event subsystem is executed in
      the timer interrupt.  This patch shifts the call to
      perf_event_do_pending() before the call to update_process_times()
      as the latter may call back into the perf event subsystem and it
      is prudent to have the pending work executed first.
      Signed-off-by: default avatarMichael Cree <mcree@orcon.net.nz>
      Signed-off-by: default avatarMatt Turner <mattst88@gmail.com>
      bdc8b891
    • Mikael Pettersson's avatar
      alpha: wire up fanotify and prlimit64 syscalls · 531f0474
      Mikael Pettersson authored
      The 2.6.36-rc kernel added three new system calls:
      fanotify_init, fanotify_mark, and prlimit64.  This
      patch wires them up on Alpha.
      
      Built and booted on an XP900.  Untested beyond that.
      Signed-off-by: default avatarMikael Pettersson <mikpe@it.uu.se>
      Signed-off-by: default avatarMatt Turner <mattst88@gmail.com>
      531f0474
    • Arnd Bergmann's avatar
      alpha: kill big kernel lock · 12e750d9
      Arnd Bergmann authored
      All uses of the BKL on alpha are totally bogus, nothing
      is really protected by this. Remove the remaining users
      so we don't have to mark alpha as 'depends on BKL'.
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: linux-alpha@vger.kernel.org
      Signed-off-by: default avatarMatt Turner <mattst88@gmail.com>
      12e750d9
    • Tejun Heo's avatar
      alpha: fix build breakage in asm/cacheflush.h · b97f897d
      Tejun Heo authored
      Alpha SMP flush_icache_user_range() is implemented as an inline
      function inside include/asm/cacheflush.h.  It dereferences @current
      but doesn't include linux/sched.h and thus causes build failure if
      linux/sched.h wasn't included previously.  Fix it by including the
      needed header file explicitly.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reported-by: default avatarStephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: default avatarMatt Turner <mattst88@gmail.com>
      b97f897d
    • matt mooney's avatar
    • Joe Perches's avatar
  4. 18 Sep, 2010 3 commits