1. 16 Jul, 2014 2 commits
    • sched: Transform resched_task() into resched_curr() · 8875125e
      Kirill Tkhai authored
      We always use resched_task() with the rq->curr argument; it's not
      possible to reschedule any task but the rq's current one.
      
      The patch introduces resched_curr(struct rq *) to replace all of
      the repeating patterns. The main aim is cleanup, but there is a
      small size benefit too:
      
        (before)
      	$ size kernel/sched/built-in.o
      	   text	   data	    bss	    dec	    hex	filename
      	155274	  16445	   7042	 178761	  2ba49	kernel/sched/built-in.o
      
      	$ size vmlinux
      	   text	   data	    bss	    dec	    hex	filename
      	7411490	1178376	 991232	9581098	 92322a	vmlinux
      
        (after)
      	$ size kernel/sched/built-in.o
      	   text	   data	    bss	    dec	    hex	filename
      	155130	  16445	   7042	 178617	  2b9b9	kernel/sched/built-in.o
      
      	$ size vmlinux
      	   text	   data	    bss	    dec	    hex	filename
      	7411362	1178376	 991232	9580970	 9231aa	vmlinux
      
      I was choosing between resched_curr() and resched_rq(), and the
      first name looks better to me.
      
      There is a little lie in Documentation/trace/ftrace.txt: I have not
      actually collected the tracing output again, in the hope that the
      patch won't make execution times much worse :)
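
      The call-site pattern, sketched (a minimal illustration only, not
      the actual kernel diff):

      	/* before: every caller dereferenced rq->curr itself */
      	resched_task(rq->curr);

      	/* after: the helper takes the runqueue and resolves curr */
      	resched_curr(rq);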
      Signed-off-by: Kirill Tkhai <tkhai@yandex.ru>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/20140628200219.1778.18735.stgit@localhost
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/deadline: Kill task_struct->pi_top_task · 466af29b
      Oleg Nesterov authored
      Remove task_struct->pi_top_task. The only user, rt_mutex_setprio(),
      can use a local variable instead.
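
      Sketched shape of the change (hedged illustration; the helper name
      is assumed from the rtmutex API of that era):

      	/* before: the top boosting waiter was cached in task_struct */
      	p->pi_top_task = rt_mutex_get_top_task(p);

      	/* after: rt_mutex_setprio() just keeps it in a local */
      	struct task_struct *top_task = rt_mutex_get_top_task(p);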
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Juri Lelli <juri.lelli@gmail.com>
      Cc: Alex Thorlton <athorlton@sgi.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Daeseok Youn <daeseok.youn@gmail.com>
      Cc: Dario Faggioli <raistlin@linux.it>
      Cc: Davidlohr Bueso <davidlohr@hp.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matthew Dempsky <mdempsky@chromium.org>
      Cc: Michal Simek <michal.simek@xilinx.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Link: http://lkml.kernel.org/r/20140606165206.GB29465@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  2. 05 Jul, 2014 17 commits
  3. 18 Jun, 2014 4 commits
  4. 17 Jun, 2014 1 commit
  5. 16 Jun, 2014 11 commits
    • nohz: Use IPI implicit full barrier against rq->nr_running r/w · 3882ec64
      Frederic Weisbecker authored
      A full dynticks CPU is allowed to stop its tick when a single task
      runs. But when a new task gets enqueued, the CPU must be notified
      so that it can restart its tick to maintain local fairness and
      other accounting details.
      
      This notification is performed by way of an IPI. Then when the target
      receives the IPI, we expect it to see the new value of rq->nr_running.
      
      Hence the following ordering scenario:
      
         CPU 0                   CPU 1
      
         write rq->nr_running    get IPI
         smp_wmb()               smp_rmb()
         send IPI                read rq->nr_running
      
      But Paul McKenney says that nowadays IPIs imply a full barrier on
      all architectures, so we can safely remove this pair and rely on
      the implicit barriers that come with IPI send/receive. Let's just
      comment on this new assumption.
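
      In code, the change amounts to roughly this (hedged sketch; the
      kick helper stands in for the real tick re-evaluation):

      	/* CPU 0: enqueue path */
      	rq->nr_running++;
      	/* no explicit smp_wmb() anymore: sending the IPI already
      	 * implies a full barrier on all architectures */
      	smp_send_reschedule(cpu);

      	/* CPU 1: IPI handler; receiving the IPI also implies a full
      	 * barrier, so the write above is visible without smp_rmb() */
      	if (rq->nr_running > 1)
      		tick_nohz_full_kick();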
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
    • nohz: Use nohz own full kick on 2nd task enqueue · fd2ac4f4
      Frederic Weisbecker authored
      Now that we have a nohz full remote kick based on irq work, let's
      use it to notify a CPU that it's exiting single task mode.

      This unbloats the scheduler IPI a bit; the nohz code was abusing
      it for its cool "callable anywhere/anytime" properties.
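
      Roughly, the enqueue side becomes (hedged sketch;
      tick_nohz_full_kick_cpu() is the helper from the "nohz: Support
      nohz full remote kick" patch below):

      	/* a second runnable task ends single-task mode: kick the
      	 * target through the nohz irq work, not the sched IPI */
      	if (rq->nr_running == 2 && tick_nohz_full_cpu(rq->cpu))
      		tick_nohz_full_kick_cpu(rq->cpu);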
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
    • nohz: Switch to nohz full remote kick on timer enqueue · 53c5fa16
      Frederic Weisbecker authored
      When a new timer is enqueued on a full dynticks target, that CPU must
      re-evaluate the next tick to handle the timer correctly.
      
      This is currently performed through the scheduler IPI. However this
      happens at the cost of off-topic workarounds in that fast path to
      make it call irq_exit().
      
      As we plan to remove this hack from the scheduler IPI, let's use
      the nohz full kick instead. Pretty much any IPI fits for that job
      as long as it calls irq_exit(). The nohz full kick just happens to
      be handy and readily available here.

      If it proves to be overkill in the future, we can still turn that
      timer kick into an empty IPI.
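
      The wakeup path then reduces to something like (hedged sketch of
      the described change):

      	static bool wake_up_full_nohz_cpu(int cpu)
      	{
      		if (tick_nohz_full_cpu(cpu)) {
      			/* the irq work handler runs irq_exit() and
      			 * re-evaluates the next tick for us */
      			tick_nohz_full_kick_cpu(cpu);
      			return true;
      		}
      		return false;
      	}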
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
    • nohz: Support nohz full remote kick · 3d36aebc
      Frederic Weisbecker authored
      Remotely kicking a full nohz CPU in order to make it re-evaluate its
      next tick is currently implemented using the scheduler IPI.
      
      However this bloats a scheduler fast path with an off-topic
      feature. The scheduler IPI was abused here for its cool "callable
      anywhere/anytime" properties.
      
      But now that the irq work subsystem can queue remote callbacks, it's
      a perfect fit to safely queue IPIs when interrupts are disabled
      without worrying about concurrent callers.
      
      So let's implement the remote kick on top of irq work. This is
      going to be used when a new event requires the next tick to be
      re-evaluated: more than one task competing for the CPU, a timer
      armed, ...
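
      The new helper, roughly as described (hedged sketch; it rides on
      the remote irq work queueing added in the patch below):

      	void tick_nohz_full_kick_cpu(int cpu)
      	{
      		if (!tick_nohz_full_cpu(cpu))
      			return;

      		/* safe with irqs disabled and concurrent callers */
      		irq_work_queue_on(&per_cpu(nohz_full_kick_work, cpu), cpu);
      	}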
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
    • irq_work: Implement remote queueing · 47885016
      Frederic Weisbecker authored
      irq work currently only supports local callbacks. However its code
      is mostly ready to run remote callbacks, and we have some potential
      users.
      
      The full nohz subsystem currently open codes its own remote irq
      work on top of the scheduler IPI when it wants a CPU to re-evaluate
      its next tick. However this ad hoc solution bloats the scheduler
      IPI.
      
      Let's just extend the irq work subsystem to support remote queueing
      on top of the generic SMP IPI to handle this kind of user. This
      shouldn't add noticeable overhead.
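
      The remote entry point, sketched (hedged; the claim and list
      helpers are the ones used by the existing local path):

      	bool irq_work_queue_on(struct irq_work *work, int cpu)
      	{
      		/* only queue if the work is not already pending */
      		if (!irq_work_claim(work))
      			return false;

      		/* only the enqueuer of the first work on the remote
      		 * list needs to send the generic SMP IPI */
      		if (llist_add(&work->llnode, &per_cpu(raised_list, cpu)))
      			arch_send_call_function_single_ipi(cpu);

      		return true;
      	}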
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
    • irq_work: Split raised and lazy lists · b93e0b8f
      Frederic Weisbecker authored
      An irq work can be handled from two places: from the tick if the work
      carries the "lazy" flag and the tick is periodic, or from a self IPI.
      
      We merge all these works into a single list and use a per-CPU latch
      to avoid raising a self-IPI when one is already pending.
      
      Now we could do away with this ugly latch if the list were made
      only of non-lazy works. Just enqueueing a work on an empty list
      would be enough to know whether we need to raise an IPI.
      
      Also, we are going to implement remote irq work queueing, for which
      the per-CPU latch would need to become atomic in the global scope.
      That's too bad because, here as well, just enqueueing a work on an
      empty list of non-lazy works would be enough to know whether we
      need to raise an IPI.
      
      So let's take a way out of this: split the works into two distinct
      lists, one for the works that can be handled by the next tick and
      another one for those handled by the IPI. Just checking whether the
      latter is empty when we queue a new work is enough to know if we
      need to raise an IPI.
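
      Sketch of the resulting enqueue logic (hedged; close to the
      behaviour described above rather than the exact diff):

      	bool irq_work_queue(struct irq_work *work)
      	{
      		if (!irq_work_claim(work))
      			return false;

      		preempt_disable();

      		if (work->flags & IRQ_WORK_LAZY) {
      			/* lazy works wait for the next tick, unless
      			 * the tick is stopped */
      			if (llist_add(&work->llnode, this_cpu_ptr(&lazy_list)) &&
      			    tick_nohz_tick_stopped())
      				arch_irq_work_raise();
      		} else {
      			/* first work on the raised list raises the
      			 * self-IPI; no latch needed */
      			if (llist_add(&work->llnode, this_cpu_ptr(&raised_list)))
      				arch_irq_work_raise();
      		}

      		preempt_enable();
      		return true;
      	}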
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
    • Revert "offb: Add palette hack for little endian" · 68986c9f
      Benjamin Herrenschmidt authored
      This reverts commit e1edf18b.
      
      This patch was a misguided attempt at fixing offb for LE ppc64
      kernels on BE qemu, but it is just wrong: it breaks real LE/LE
      setups, LE with real HW, and existing mixed-endian systems that
      did the right thing with the appropriate device-tree property.
      Bad reviewing on my part, sorry.
      
      The right fix is to either make qemu change its endian when
      the guest changes endian (working on that) or to use the
      existing foreign endian support.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      CC: <stable@vger.kernel.org> [v3.13+]
    • Linux 3.16-rc1 · 7171511e
      Linus Torvalds authored
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · a9be2242
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Fix checksumming regressions, from Tom Herbert.
      
       2) Undo unintentional permissions changes for SCTP rto_alpha and
          rto_beta sysctl knobs, from Daniel Borkmann.
      
       3) VXLAN, like other IP tunnels, should advertise its
          encapsulation size using dev->needed_headroom instead of
          dev->hard_header_len. From Cong Wang.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
        net: sctp: fix permissions for rto_alpha and rto_beta knobs
        vxlan: Checksum fixes
        net: add skb_pop_rcv_encapsulation
        udp: call __skb_checksum_complete when doing full checksum
        net: Fix save software checksum complete
        net: Fix GSO constants to match NETIF flags
        udp: ipv4: do not waste time in __udp4_lib_mcast_demux_lookup
        vxlan: use dev->needed_headroom instead of dev->hard_header_len
        MAINTAINERS: update cxgb4 maintainer
    • Merge tag 'clk-for-linus-3.16-part2' of git://git.linaro.org/people/mike.turquette/linux · dd1845af
      Linus Torvalds authored
      Pull more clock framework updates from Mike Turquette:
       "This contains the second half the of the clk changes for 3.16.
      
        They are simply fixes and code refactoring for the OMAP clock drivers.
        The sunxi clock driver changes include splitting out the one
        mega-driver into several smaller pieces and adding support for the A31
        SoC clocks"
      
      * tag 'clk-for-linus-3.16-part2' of git://git.linaro.org/people/mike.turquette/linux: (25 commits)
        clk: sunxi: document PRCM clock compatible strings
        clk: sunxi: add PRCM (Power/Reset/Clock Management) clks support
        clk: sun6i: Protect SDRAM gating bit
        clk: sun6i: Protect CPU clock
        clk: sunxi: Rework clock protection code
        clk: sunxi: Move the GMAC clock to a file of its own
        clk: sunxi: Move the 24M oscillator to a file of its own
        clk: sunxi: Remove calls to clk_put
        clk: sunxi: document new A31 USB clock compatible
        clk: sunxi: Implement A31 USB clock
        ARM: dts: OMAP5/DRA7: use omap5-mpu-dpll-clock capable of dealing with higher frequencies
        CLK: TI: dpll: support OMAP5 MPU DPLL that need special handling for higher frequencies
        ARM: OMAP5+: dpll: support Duty Cycle Correction(DCC)
        CLK: TI: clk-54xx: Set the rate for dpll_abe_m2x2_ck
        CLK: TI: Driver for DRA7 ATL (Audio Tracking Logic)
        dt:/bindings: DRA7 ATL (Audio Tracking Logic) clock bindings
        ARM: dts: dra7xx-clocks: Correct name for atl clkin3 clock
        CLK: TI: gate: add composite interface clock to OMAP2 only build
        ARM: OMAP2: clock: add DT boot support for cpufreq_ck
        CLK: TI: OMAP2: add clock init support
        ...
    • Merge git://git.infradead.org/users/willy/linux-nvme · b55b3902
      Linus Torvalds authored
      Pull NVMe update from Matthew Wilcox:
       "Mostly bugfixes again for the NVMe driver.  I'd like to call out the
        exported tracepoint in the block layer; I believe Keith has cleared
        this with Jens.
      
        We've had a few reports from people who're really pounding on NVMe
        devices at scale, hence the timeout changes (and new module
        parameters), hotplug cpu deadlock, tracepoints, and minor performance
        tweaks"
      
      [ Jens hadn't seen that tracepoint thing, but is ok with it - it will
        end up going away when mq conversion happens ]
      
      * git://git.infradead.org/users/willy/linux-nvme: (22 commits)
        NVMe: Fix START_STOP_UNIT Scsi->NVMe translation.
        NVMe: Use Log Page constants in SCSI emulation
        NVMe: Define Log Page constants
        NVMe: Fix hot cpu notification dead lock
        NVMe: Rename io_timeout to nvme_io_timeout
        NVMe: Use last bytes of f/w rev SCSI Inquiry
        NVMe: Adhere to request queue block accounting enable/disable
        NVMe: Fix nvme get/put queue semantics
        NVMe: Delete NVME_GET_FEAT_TEMP_THRESH
        NVMe: Make admin timeout a module parameter
        NVMe: Make iod bio timeout a parameter
        NVMe: Prevent possible NULL pointer dereference
        NVMe: Fix the buffer size passed in GetLogPage(CDW10.NUMD)
        NVMe: Update data structures for NVMe 1.2
        NVMe: Enable BUILD_BUG_ON checks
        NVMe: Update namespace and controller identify structures to the 1.1a spec
        NVMe: Flush with data support
        NVMe: Configure support for block flush
        NVMe: Add tracepoints
        NVMe: Protect against badly formatted CQEs
        ...
  6. 15 Jun, 2014 5 commits
    • net: sctp: fix permissions for rto_alpha and rto_beta knobs · b58537a1
      Daniel Borkmann authored
      Commit 3fd091e7 ("[SCTP]: Remove multiple levels of msecs
      to jiffies conversions.") silently changed the permissions for
      the rto_alpha and rto_beta knobs from 0644 to 0444. The purpose
      of this was to discourage users from tweaking rto_alpha and
      rto_beta in production environments, since they are key to
      correctly computing rtt/srtt.
      
      RFC 4960, under section 6.3.1 "RTO Calculation", says regarding
      rto_alpha and rto_beta under rules C3 and C4:
      
        [...]
        C3)  When a new RTT measurement R' is made, set
      
             RTTVAR <- (1 - RTO.Beta) * RTTVAR + RTO.Beta * |SRTT - R'|
      
             and
      
             SRTT <- (1 - RTO.Alpha) * SRTT + RTO.Alpha * R'
      
             Note: The value of SRTT used in the update to RTTVAR
             is its value before updating SRTT itself using the
             second assignment. After the computation, update
             RTO <- SRTT + 4 * RTTVAR.
      
        C4)  When data is in flight and when allowed by rule C5
             below, a new RTT measurement MUST be made each round
             trip. Furthermore, new RTT measurements SHOULD be
             made no more than once per round trip for a given
             destination transport address. There are two reasons
             for this recommendation: First, it appears that
             measuring more frequently often does not in practice
             yield any significant benefit [ALLMAN99]; second,
             if measurements are made more often, then the values
             of RTO.Alpha and RTO.Beta in rule C3 above should be
             adjusted so that SRTT and RTTVAR still adjust to
             changes at roughly the same rate (in terms of how many
             round trips it takes them to reflect new values) as
             they would if making only one measurement per
             round-trip and using RTO.Alpha and RTO.Beta as given
             in rule C3. However, the exact nature of these
             adjustments remains a research issue.
        [...]
      
      While adjusting rto_alpha and rto_beta is discouraged, and it is
      not further specified how to adjust them, the RFC also doesn't
      explicitly forbid it; rather, it gives RECOMMENDED default values
      (rto_alpha=3, rto_beta=2). We have a couple of users who relied on
      the old permissions before they got changed. That said, if someone
      really has the urge to adjust them, we can allow it with a warning
      in the log.
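
      Which is what this patch does, restoring mode 0644 with a one-time
      warning on write (hedged sketch of the handler):

      	static int proc_sctp_do_alpha_beta(struct ctl_table *ctl, int write,
      					   void __user *buffer, size_t *lenp,
      					   loff_t *ppos)
      	{
      		if (write)
      			pr_warn_once("Changing rto_alpha or rto_beta may lead to "
      				     "suboptimal rtt/srtt estimations!\n");

      		return proc_dointvec_minmax(ctl, write, buffer, lenp, ppos);
      	}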
      
      Fixes: 3fd091e7 ("[SCTP]: Remove multiple levels of msecs to jiffies conversions.")
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Cc: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'csum_fixes' · e4f7ae93
      David S. Miller authored
      Tom Herbert says:
      
      ====================
      Fixes related to some recent checksum modifications.
      
      - Fix GSO constants to match NETIF flags
      - Fix logic in saving checksum complete in __skb_checksum_complete
      - Call __skb_checksum_complete from UDP if we are checksumming over
        the whole packet, in order to save the checksum.
      - Fixes to VXLAN to work correctly with checksum complete
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vxlan: Checksum fixes · f79b064c
      Tom Herbert authored
      Call skb_pop_rcv_encapsulation and skb_postpull_rcsum for the
      Ethernet header so that VXLAN receive works properly with
      checksum complete.
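
      Sketched receive-side fixup (hedged; see the
      skb_pop_rcv_encapsulation patch below for the first helper):

      	/* leaving the UDP encapsulation: fix up ip_summed state */
      	skb_pop_rcv_encapsulation(skb);

      	/* the inner Ethernet header gets pulled, so fold it out of
      	 * any CHECKSUM_COMPLETE value */
      	skb_postpull_rcsum(skb, eth_hdr(skb), ETH_HLEN);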
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: add skb_pop_rcv_encapsulation · e5eb4e30
      Tom Herbert authored
      This function is used by UDP encapsulation protocols on RX when
      crossing an encapsulation boundary. If ip_summed is set to
      CHECKSUM_UNNECESSARY and encapsulation is not set, change it to
      CHECKSUM_NONE, since the checksum has not been validated within
      the encapsulation. csum_valid is cleared for the same reason.
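
      Roughly (hedged sketch matching the description above):

      	static inline void skb_pop_rcv_encapsulation(struct sk_buff *skb)
      	{
      		/* only keep CHECKSUM_UNNECESSARY if the device said it
      		 * is valid across the encapsulation */
      		if (skb->ip_summed == CHECKSUM_UNNECESSARY &&
      		    !skb->encapsulation)
      			skb->ip_summed = CHECKSUM_NONE;

      		skb->encapsulation = 0;
      		skb->csum_valid = 0;
      	}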
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • udp: call __skb_checksum_complete when doing full checksum · bbdff225
      Tom Herbert authored
      In __udp_lib_checksum_complete, check whether the checksum is being
      done over all the data (len is equal to skb->len) and, if it is,
      call __skb_checksum_complete instead of __skb_checksum_complete_head.
      This allows the checksum to be saved by checksum complete.
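
      A hedged sketch of the check (the checksum-coverage field name is
      assumed from the UDP/UDP-Lite code):

      	static inline __sum16 __udp_lib_checksum_complete(struct sk_buff *skb)
      	{
      		return (UDP_SKB_CB(skb)->cscov == skb->len ?
      			__skb_checksum_complete(skb) :
      			__skb_checksum_complete_head(skb, UDP_SKB_CB(skb)->cscov));
      	}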
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>