1. 09 Jul, 2014 14 commits
    • Paul E. McKenney's avatar
      rcu: Remove CONFIG_PROVE_RCU_DELAY · 11992c70
      Paul E. McKenney authored
      The CONFIG_PROVE_RCU_DELAY Kconfig parameter doesn't appear to be very
      effective at finding race conditions, so this commit removes it.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      [ paulmck: Remove definition and uses as noted by Paul Bolle. ]
      11992c70
    • Shan Wei's avatar
      rcu: Use __this_cpu_read() instead of per_cpu_ptr() · d860d403
      Shan Wei authored
      The __this_cpu_read() function produces better code than does
      per_cpu_ptr() on both ARM and x86.  For example, gcc (Ubuntu/Linaro
      4.7.3-12ubuntu1) 4.7.3 produces the following:
      
      ARMv7 per_cpu_ptr():
      
      force_quiescent_state:
          mov    r3, sp    @,
          bic    r1, r3, #8128    @ tmp171,,
          ldr    r2, .L98    @ tmp169,
          bic    r1, r1, #63    @ tmp170, tmp171,
          ldr    r3, [r0, #220]    @ __ptr, rsp_6(D)->rda
          ldr    r1, [r1, #20]    @ D.35903_68->cpu, D.35903_68->cpu
          mov    r6, r0    @ rsp, rsp
          ldr    r2, [r2, r1, asl #2]    @ tmp173, __per_cpu_offset
          add    r3, r3, r2    @ tmp175, __ptr, tmp173
          ldr    r5, [r3, #12]    @ rnp_old, D.29162_13->mynode
      
      ARMv7 __this_cpu_read():
      
      force_quiescent_state:
          ldr    r3, [r0, #220]    @ rsp_7(D)->rda, rsp_7(D)->rda
          mov    r6, r0    @ rsp, rsp
          add    r3, r3, #12    @ __ptr, rsp_7(D)->rda,
          ldr    r5, [r2, r3]    @ rnp_old, *D.29176_13
      
      Using gcc 4.8.2:
      
      x86_64 per_cpu_ptr():
      
          movl %gs:cpu_number,%edx    # cpu_number, pscr_ret__
          movslq    %edx, %rdx    # pscr_ret__, pscr_ret__
          movq    __per_cpu_offset(,%rdx,8), %rdx    # __per_cpu_offset, tmp93
          movq    %rdi, %r13    # rsp, rsp
          movq    1000(%rdi), %rax    # rsp_9(D)->rda, __ptr
          movq    24(%rdx,%rax), %r12    # _15->mynode, rnp_old
      
      x86_64 __this_cpu_read():
      
          movq    %rdi, %r13    # rsp, rsp
          movq    1000(%rdi), %rax    # rsp_9(D)->rda, rsp_9(D)->rda
          movq %gs:24(%rax),%r12    # _10->mynode, rnp_old
      
      Because this change produces significant benefits for these two very
      diverse architectures, this commit makes this change.
      Signed-off-by: default avatarShan Wei <davidshan@tencent.com>
      Acked-by: default avatarChristoph Lameter <cl@linux.com>
      Signed-off-by: default avatarPranith Kumar <bobby.prani@gmail.com>
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: default avatarJosh Triplett <josh@joshtriplett.org>
      Reviewed-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      d860d403
    • Paul E. McKenney's avatar
      rcu: Don't use NMIs to dump other CPUs' stacks · bc1dce51
      Paul E. McKenney authored
      Although NMI-based stack dumps are in principle more accurate, they are
      also more likely to trigger deadlocks.  This commit therefore replaces
      all uses of trigger_all_cpu_backtrace() with rcu_dump_cpu_stacks(), so
      that the CPU detecting an RCU CPU stall does the stack dumping.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      bc1dce51
    • Paul E. McKenney's avatar
      rcu: Bind grace-period kthreads to non-NO_HZ_FULL CPUs · c0f489d2
      Paul E. McKenney authored
      Binding the grace-period kthreads to the timekeeping CPU resulted in
      significant performance decreases for some workloads.  For more detail,
      see:
      
      https://lkml.org/lkml/2014/6/3/395 for benchmark numbers
      
      https://lkml.org/lkml/2014/6/4/218 for CPU statistics
      
      It turns out that it is necessary to bind the grace-period kthreads
      to the timekeeping CPU only when all but CPU 0 is a nohz_full CPU
      on the one hand or if CONFIG_NO_HZ_FULL_SYSIDLE=y on the other.
      In other cases, it suffices to bind the grace-period kthreads to the
      set of non-nohz_full CPUs.
      
      This commit therefore creates a tick_nohz_not_full_mask that is the
      complement of tick_nohz_full_mask, and then binds the grace-period
      kthread to the set of CPUs indicated by this new mask, which covers
      the CONFIG_NO_HZ_FULL_SYSIDLE=n case.  The CONFIG_NO_HZ_FULL_SYSIDLE=y
      case still binds the grace-period kthreads to the timekeeping CPU.
      This commit also includes the tick_nohz_full_enabled() check suggested
      by Frederic Weisbecker.
      Reported-by: default avatarJet Chen <jet.chen@intel.com>
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      [ paulmck: Created housekeeping_affine() and housekeeping_mask per
        fweisbec feedback. ]
      c0f489d2
    • Paul E. McKenney's avatar
      rcu: Simplify priority boosting by putting rt_mutex in rcu_node · abaa93d9
      Paul E. McKenney authored
      RCU priority boosting currently checks for boosting via a pointer in
      task_struct.  However, this is not needed: As Oleg noted, if the
      rt_mutex is placed in the rcu_node instead of on the booster's stack,
      the boostee can simply check it see if it owns the lock.  This commit
      makes this change, shrinking task_struct by one pointer and the kernel
      by thirteen lines.
      Suggested-by: default avatarOleg Nesterov <oleg@redhat.com>
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      abaa93d9
    • Pranith Kumar's avatar
      rcu: Check both root and current rcu_node when setting up future grace period · 48bd8e9b
      Pranith Kumar authored
      The rcu_start_future_gp() function checks the current rcu_node's ->gpnum
      and ->completed twice, once without ACCESS_ONCE() and once with it.
      Which is pointless because we hold that rcu_node's ->lock at that point.
      The intent was to check the current rcu_node structure and the root
      rcu_node structure, the latter locklessly with ACCESS_ONCE().  This
      commit therefore makes that change.
      
      The reason that it is safe to locklessly check the root rcu_nodes's
      ->gpnum and ->completed fields is that we hold the current rcu_node's
      ->lock, which constrains the root rcu_node's ability to change its
      ->gpnum and ->completed fields.  Of course, if there is a single rcu_node
      structure, then rnp_root==rnp, and holding the lock prevents all changes.
      If there is more than one rcu_node structure, then the code updates the
      fields in the following order:
      
      1.	Increment rnp_root->gpnum to start new grace period.
      2.	Increment rnp->gpnum to initialize the current rcu_node,
      	continuing initialization for the new grace period.
      3.	Increment rnp_root->completed to end the current grace period.
      4.	Increment rnp->completed to continue cleaning up after the
      	old grace period.
      
      So there are four possible combinations of relative values of these
      four fields:
      
      N   N   N   N:  RCU idle, new grace period must be initiated.
      		Although rnp_root->gpnum might be incremented immediately
      		after we check, that will just result in unnecessary work.
      		The grace period already started, and we try to start it.
      
      N+1 N   N   N:  RCU grace period just started.  No further change is
      		possible because we hold rnp->lock, so the checks of
      		rnp_root->gpnum and rnp_root->completed are stable.
      		We know that our request for a future grace period will
      		be seen during grace-period cleanup.
      
      N+1 N   N+1 N:  RCU grace period is ongoing.  Because rnp->gpnum is
      		different than rnp->completed, we won't even look at
      		rnp_root->gpnum and rnp_root->completed, so the possible
      		concurrent change to rnp_root->completed does not matter.
      		We know that our request for a future grace period will
      		be seen during grace-period cleanup, which cannot pass
      		this rcu_node because we hold its ->lock.
      
      N+1 N+1 N+1 N:  RCU grace period has ended, but not yet been cleaned up.
      		Because rnp->gpnum is different than rnp->completed, we
      		won't look at rnp_root->gpnum and rnp_root->completed, so
      		the possible concurrent change to rnp_root->completed does
      		not matter.  We know that our request for a future grace
      		period will be seen during grace-period cleanup, which
      		cannot pass this rcu_node because we hold its ->lock.
      
      Therefore, despite initial appearances, the lockless check is safe.
      Signed-off-by: default avatarPranith Kumar <bobby.prani@gmail.com>
      [ paulmck: Update comment to say why the lockless check is safe. ]
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      48bd8e9b
    • Paul E. McKenney's avatar
      rcu: Allow post-unlock reference for rt_mutex · dfeb9765
      Paul E. McKenney authored
      The current approach to RCU priority boosting uses an rt_mutex strictly
      for its priority-boosting side effects.  The rt_mutex_init_proxy_locked()
      function is used by the booster to initialize the lock as held by the
      boostee.  The booster then uses rt_mutex_lock() to acquire this rt_mutex,
      which priority-boosts the boostee.  When the boostee reaches the end
      of its outermost RCU read-side critical section, it checks a field in
      its task structure to see whether it has been boosted, and, if so, uses
      rt_mutex_unlock() to release the rt_mutex.  The booster can then go on
      to boost the next task that is blocking the current RCU grace period.
      
      But reasonable implementations of rt_mutex_unlock() might result in the
      boostee referencing the rt_mutex's data after releasing it.  But the
      booster might have re-initialized the rt_mutex between the time that the
      boostee released it and the time that it later referenced it.  This is
      clearly asking for trouble, so this commit introduces a completion that
      forces the booster to wait until the boostee has completely finished with
      the rt_mutex, thus avoiding the case where the booster is re-initializing
      the rt_mutex before the last boostee's last reference to that rt_mutex.
      
      This of course does introduce some overhead, but the priority-boosting
      code paths are miles from any possible fastpath, and the overhead of
      executing the completion will normally be quite small compared to the
      overhead of priority boosting and deboosting, so this should be OK.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      dfeb9765
    • Paul E. McKenney's avatar
      rcu: Loosen __call_rcu()'s rcu_head alignment constraint · 1146edcb
      Paul E. McKenney authored
      The m68k architecture aligns only to 16-bit boundaries, which can cause
      the align-to-32-bits check in __call_rcu() to trigger.  Because there is
      currently no known potential need for more than one low-order bit, this
      commit loosens the check to 16-bit boundaries.
      Reported-by: default avatarGreg Ungerer <gerg@uclinux.org>
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      1146edcb
    • Paul E. McKenney's avatar
      rcu: Eliminate read-modify-write ACCESS_ONCE() calls · a792563b
      Paul E. McKenney authored
      RCU contains code of the following forms:
      
      	ACCESS_ONCE(x)++;
      	ACCESS_ONCE(x) += y;
      	ACCESS_ONCE(x) -= y;
      
      Now these constructs do operate correctly, but they really result in a
      pair of volatile accesses, one to do the load and another to do the store.
      This can be confusing, as the casual reader might well assume that (for
      example) gcc might generate a memory-to-memory add instruction for each
      of these three cases.  In fact, gcc will do no such thing.  Also, there
      is a good chance that the kernel will move to separate load and store
      variants of ACCESS_ONCE(), and constructs like the above could easily
      confuse both people and scripts attempting to make that sort of change.
      Finally, most of RCU's read-modify-write uses of ACCESS_ONCE() really
      only need the store to be volatile, so that the read-modify-write form
      might be misleading.
      
      This commit therefore changes the above forms in RCU so that each instance
      of ACCESS_ONCE() either does a load or a store, but not both.  In a few
      cases, ACCESS_ONCE() was not critical, for example, for maintaining
      statisitics.  In these cases, ACCESS_ONCE() has been dispensed with
      entirely.
      Suggested-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      a792563b
    • Paul E. McKenney's avatar
      rcu: Remove redundant ACCESS_ONCE() from tick_do_timer_cpu · 4da117cf
      Paul E. McKenney authored
      In kernels built with CONFIG_NO_HZ_FULL, tick_do_timer_cpu is constant
      once boot completes.  Thus, there is no need to wrap it in ACCESS_ONCE()
      in code that is built only when CONFIG_NO_HZ_FULL.  This commit therefore
      removes the redundant ACCESS_ONCE().
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Reviewed-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      4da117cf
    • Fabian Frederick's avatar
      rcu: Make rcu node arrays static const char * const · b4426b49
      Fabian Frederick authored
      Those two arrays are being passed to lockdep_init_map(), which expects
      const char *, and are stored in lockdep_map the same way.
      
      Cc: Dipankar Sarma <dipankar@in.ibm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarFabian Frederick <fabf@skynet.be>
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      b4426b49
    • Paul E. McKenney's avatar
      signal: Explain local_irq_save() call · c41247e1
      Paul E. McKenney authored
      The explicit local_irq_save() in __lock_task_sighand() is needed to avoid
      a potential deadlock condition, as noted in a841796f (signal:
      align __lock_task_sighand() irq disabling and RCU).  However, someone
      reading the code might be forgiven for concluding that this separate
      local_irq_save() was completely unnecessary.  This commit therefore adds
      a comment referencing the shiny new block comment on rcu_read_unlock().
      Reported-by: default avatarOleg Nesterov <oleg@redhat.com>
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: default avatarOleg Nesterov <oleg@redhat.com>
      Reviewed-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      c41247e1
    • Paul E. McKenney's avatar
    • Paul E. McKenney's avatar
  2. 23 Jun, 2014 2 commits
    • Paul E. McKenney's avatar
      rcu: Reduce overhead of cond_resched() checks for RCU · 4a81e832
      Paul E. McKenney authored
      Commit ac1bea85 (Make cond_resched() report RCU quiescent states)
      fixed a problem where a CPU looping in the kernel with but one runnable
      task would give RCU CPU stall warnings, even if the in-kernel loop
      contained cond_resched() calls.  Unfortunately, in so doing, it introduced
      performance regressions in Anton Blanchard's will-it-scale "open1" test.
      The problem appears to be not so much the increased cond_resched() path
      length as an increase in the rate at which grace periods complete, which
      increased per-update grace-period overhead.
      
      This commit takes a different approach to fixing this bug, mainly by
      moving the RCU-visible quiescent state from cond_resched() to
      rcu_note_context_switch(), and by further reducing the check to a
      simple non-zero test of a single per-CPU variable.  However, this
      approach requires that the force-quiescent-state processing send
      resched IPIs to the offending CPUs.  These will be sent only once
      the grace period has reached an age specified by the boot/sysfs
      parameter rcutree.jiffies_till_sched_qs, or once the grace period
      reaches an age halfway to the point at which RCU CPU stall warnings
      will be emitted, whichever comes first.
      Reported-by: default avatarDave Hansen <dave.hansen@intel.com>
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Christoph Lameter <cl@gentwo.org>
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Reviewed-by: default avatarJosh Triplett <josh@joshtriplett.org>
      [ paulmck: Made rcu_momentary_dyntick_idle() as suggested by the
        ktest build robot.  Also fixed smp_mb() comment as noted by
        Oleg Nesterov. ]
      
      Merge with e552592e (Reduce overhead of cond_resched() checks for RCU)
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      4a81e832
    • Paul E. McKenney's avatar
      rcu: Export debug_init_rcu_head() and and debug_init_rcu_head() · 546a9d85
      Paul E. McKenney authored
      Currently, call_rcu() relies on implicit allocation and initialization
      for the debug-objects handling of RCU callbacks.  If you hammer the
      kernel hard enough with Sasha's modified version of trinity, you can end
      up with the sl*b allocators recursing into themselves via this implicit
      call_rcu() allocation.
      
      This commit therefore exports the debug_init_rcu_head() and
      debug_rcu_head_free() functions, which permits the allocators to allocated
      and pre-initialize the debug-objects information, so that there no longer
      any need for call_rcu() to do that initialization, which in turn prevents
      the recursion into the memory allocators.
      Reported-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Suggested-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Looks-good-to: Christoph Lameter <cl@linux.com>
      546a9d85
  3. 16 Jun, 2014 4 commits
    • Linus Torvalds's avatar
      Linux 3.16-rc1 · 7171511e
      Linus Torvalds authored
      7171511e
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · a9be2242
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Fix checksumming regressions, from Tom Herbert.
      
       2) Undo unintentional permissions changes for SCTP rto_alpha and
          rto_beta sysfs knobs, from Denial Borkmann.
      
       3) VXLAN, like other IP tunnels, should advertize it's encapsulation
          size using dev->needed_headroom instead of dev->hard_header_len.
          From Cong Wang.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
        net: sctp: fix permissions for rto_alpha and rto_beta knobs
        vxlan: Checksum fixes
        net: add skb_pop_rcv_encapsulation
        udp: call __skb_checksum_complete when doing full checksum
        net: Fix save software checksum complete
        net: Fix GSO constants to match NETIF flags
        udp: ipv4: do not waste time in __udp4_lib_mcast_demux_lookup
        vxlan: use dev->needed_headroom instead of dev->hard_header_len
        MAINTAINERS: update cxgb4 maintainer
      a9be2242
    • Linus Torvalds's avatar
      Merge tag 'clk-for-linus-3.16-part2' of git://git.linaro.org/people/mike.turquette/linux · dd1845af
      Linus Torvalds authored
      Pull more clock framework updates from Mike Turquette:
       "This contains the second half the of the clk changes for 3.16.
      
        They are simply fixes and code refactoring for the OMAP clock drivers.
        The sunxi clock driver changes include splitting out the one
        mega-driver into several smaller pieces and adding support for the A31
        SoC clocks"
      
      * tag 'clk-for-linus-3.16-part2' of git://git.linaro.org/people/mike.turquette/linux: (25 commits)
        clk: sunxi: document PRCM clock compatible strings
        clk: sunxi: add PRCM (Power/Reset/Clock Management) clks support
        clk: sun6i: Protect SDRAM gating bit
        clk: sun6i: Protect CPU clock
        clk: sunxi: Rework clock protection code
        clk: sunxi: Move the GMAC clock to a file of its own
        clk: sunxi: Move the 24M oscillator to a file of its own
        clk: sunxi: Remove calls to clk_put
        clk: sunxi: document new A31 USB clock compatible
        clk: sunxi: Implement A31 USB clock
        ARM: dts: OMAP5/DRA7: use omap5-mpu-dpll-clock capable of dealing with higher frequencies
        CLK: TI: dpll: support OMAP5 MPU DPLL that need special handling for higher frequencies
        ARM: OMAP5+: dpll: support Duty Cycle Correction(DCC)
        CLK: TI: clk-54xx: Set the rate for dpll_abe_m2x2_ck
        CLK: TI: Driver for DRA7 ATL (Audio Tracking Logic)
        dt:/bindings: DRA7 ATL (Audio Tracking Logic) clock bindings
        ARM: dts: dra7xx-clocks: Correct name for atl clkin3 clock
        CLK: TI: gate: add composite interface clock to OMAP2 only build
        ARM: OMAP2: clock: add DT boot support for cpufreq_ck
        CLK: TI: OMAP2: add clock init support
        ...
      dd1845af
    • Linus Torvalds's avatar
      Merge git://git.infradead.org/users/willy/linux-nvme · b55b3902
      Linus Torvalds authored
      Pull NVMe update from Matthew Wilcox:
       "Mostly bugfixes again for the NVMe driver.  I'd like to call out the
        exported tracepoint in the block layer; I believe Keith has cleared
        this with Jens.
      
        We've had a few reports from people who're really pounding on NVMe
        devices at scale, hence the timeout changes (and new module
        parameters), hotplug cpu deadlock, tracepoints, and minor performance
        tweaks"
      
      [ Jens hadn't seen that tracepoint thing, but is ok with it - it will
        end up going away when mq conversion happens ]
      
      * git://git.infradead.org/users/willy/linux-nvme: (22 commits)
        NVMe: Fix START_STOP_UNIT Scsi->NVMe translation.
        NVMe: Use Log Page constants in SCSI emulation
        NVMe: Define Log Page constants
        NVMe: Fix hot cpu notification dead lock
        NVMe: Rename io_timeout to nvme_io_timeout
        NVMe: Use last bytes of f/w rev SCSI Inquiry
        NVMe: Adhere to request queue block accounting enable/disable
        NVMe: Fix nvme get/put queue semantics
        NVMe: Delete NVME_GET_FEAT_TEMP_THRESH
        NVMe: Make admin timeout a module parameter
        NVMe: Make iod bio timeout a parameter
        NVMe: Prevent possible NULL pointer dereference
        NVMe: Fix the buffer size passed in GetLogPage(CDW10.NUMD)
        NVMe: Update data structures for NVMe 1.2
        NVMe: Enable BUILD_BUG_ON checks
        NVMe: Update namespace and controller identify structures to the 1.1a spec
        NVMe: Flush with data support
        NVMe: Configure support for block flush
        NVMe: Add tracepoints
        NVMe: Protect against badly formatted CQEs
        ...
      b55b3902
  4. 15 Jun, 2014 11 commits
    • Daniel Borkmann's avatar
      net: sctp: fix permissions for rto_alpha and rto_beta knobs · b58537a1
      Daniel Borkmann authored
      Commit 3fd091e7 ("[SCTP]: Remove multiple levels of msecs
      to jiffies conversions.") has silently changed permissions for
      rto_alpha and rto_beta knobs from 0644 to 0444. The purpose of
      this was to discourage users from tweaking rto_alpha and
      rto_beta knobs in production environments since they are key
      to correctly compute rtt/srtt.
      
      RFC4960 under section 6.3.1. RTO Calculation says regarding
      rto_alpha and rto_beta under rule C3 and C4:
      
        [...]
        C3)  When a new RTT measurement R' is made, set
      
             RTTVAR <- (1 - RTO.Beta) * RTTVAR + RTO.Beta * |SRTT - R'|
      
             and
      
             SRTT <- (1 - RTO.Alpha) * SRTT + RTO.Alpha * R'
      
             Note: The value of SRTT used in the update to RTTVAR
             is its value before updating SRTT itself using the
             second assignment. After the computation, update
             RTO <- SRTT + 4 * RTTVAR.
      
        C4)  When data is in flight and when allowed by rule C5
             below, a new RTT measurement MUST be made each round
             trip. Furthermore, new RTT measurements SHOULD be
             made no more than once per round trip for a given
             destination transport address. There are two reasons
             for this recommendation: First, it appears that
             measuring more frequently often does not in practice
             yield any significant benefit [ALLMAN99]; second,
             if measurements are made more often, then the values
             of RTO.Alpha and RTO.Beta in rule C3 above should be
             adjusted so that SRTT and RTTVAR still adjust to
             changes at roughly the same rate (in terms of how many
             round trips it takes them to reflect new values) as
             they would if making only one measurement per
             round-trip and using RTO.Alpha and RTO.Beta as given
             in rule C3. However, the exact nature of these
             adjustments remains a research issue.
        [...]
      
      While it is discouraged to adjust rto_alpha and rto_beta
      and not further specified how to adjust them, the RFC also
      doesn't explicitly forbid it, but rather gives a RECOMMENDED
      default value (rto_alpha=3, rto_beta=2). We have a couple
      of users relying on the old permissions before they got
      changed. That said, if someone really has the urge to adjust
      them, we could allow it with a warning in the log.
      
      Fixes: 3fd091e7 ("[SCTP]: Remove multiple levels of msecs to jiffies conversions.")
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Cc: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b58537a1
    • David S. Miller's avatar
      Merge branch 'csum_fixes' · e4f7ae93
      David S. Miller authored
      Tom Herbert says:
      
      ====================
      Fixes related to some recent checksum modifications.
      
      - Fix GSO constants to match NETIF flags
      - Fix logic in saving checksum complete in __skb_checksum_complete
      - Call __skb_checksum_complete from UDP if we are checksumming over
        whole packet in order to save checksum.
      - Fixes to VXLAN to work correctly with checksum complete
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e4f7ae93
    • Tom Herbert's avatar
      vxlan: Checksum fixes · f79b064c
      Tom Herbert authored
      Call skb_pop_rcv_encapsulation and postpull_rcsum for the Ethernet
      header to work properly with checksum complete.
      Signed-off-by: default avatarTom Herbert <therbert@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f79b064c
    • Tom Herbert's avatar
      net: add skb_pop_rcv_encapsulation · e5eb4e30
      Tom Herbert authored
      This function is used by UDP encapsulation protocols in RX when
      crossing encapsulation boundary. If ip_summed is set to
      CHECKSUM_UNNECESSARY and encapsulation is not set, change to
      CHECKSUM_NONE since the checksum has not been validated within the
      encapsulation. Clears csum_valid by the same rationale.
      Signed-off-by: default avatarTom Herbert <therbert@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e5eb4e30
    • Tom Herbert's avatar
      udp: call __skb_checksum_complete when doing full checksum · bbdff225
      Tom Herbert authored
      In __udp_lib_checksum_complete check if checksum is being done over all
      the data (len is equal to skb->len) and if it is call
      __skb_checksum_complete instead of __skb_checksum_complete_head. This
      allows checksum to be saved in checksum complete.
      Signed-off-by: default avatarTom Herbert <therbert@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bbdff225
    • Tom Herbert's avatar
      net: Fix save software checksum complete · 46fb51eb
      Tom Herbert authored
      Geert reported issues regarding checksum complete and UDP.
      The logic introduced in commit 7e3cead5
      ("net: Save software checksum complete") is not correct.
      
      This patch:
      1) Restores code in __skb_checksum_complete_header except for setting
         CHECKSUM_UNNECESSARY. This function may be calculating checksum on
         something less than skb->len.
      2) Adds saving checksum to __skb_checksum_complete. The full packet
         checksum 0..skb->len is calculated without adding in pseudo header.
         This value is saved in skb->csum and then the pseudo header is added
         to that to derive the checksum for validation.
      3) In both __skb_checksum_complete_header and __skb_checksum_complete,
         set skb->csum_valid to whether checksum of zero was computed. This
         allows skb_csum_unnecessary to return true without changing to
         CHECKSUM_UNNECESSARY which was done previously.
      4) Copy new csum related bits in __copy_skb_header.
      Reported-by: default avatarGeert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: default avatarTom Herbert <therbert@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      46fb51eb
    • Tom Herbert's avatar
      net: Fix GSO constants to match NETIF flags · 4b28252c
      Tom Herbert authored
      Joseph Gasparakis reported that VXLAN GSO offload stopped working with
      i40e device after recent UDP changes. The problem is that the
      SKB_GSO_* bits are out of sync with the corresponding NETIF flags. This
      patch fixes that. Also, we add BUILD_BUG_ONs in net_gso_ok for several
      GSO constants that were missing to avoid the problem in the future.
      Reported-by: default avatarJoseph Gasparakis <joseph.gasparakis@intel.com>
      Signed-off-by: default avatarTom Herbert <therbert@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4b28252c
    • Linus Torvalds's avatar
      Merge tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · abf04af7
      Linus Torvalds authored
      Pull more SCSI updates from James Bottomley:
       "This is just a couple of drivers (hpsa and lpfc) that got left out for
        further testing in linux-next.  We also have one fix to a prior
        submission (qla2xxx sparse)"
      
      * tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (36 commits)
        qla2xxx: fix sparse warnings introduced by previous target mode t10-dif patch
        lpfc: Update lpfc version to driver version 10.2.8001.0
        lpfc: Fix ExpressLane priority setup
        lpfc: mark old devices as obsolete
        lpfc: Fix for initializing RRQ bitmap
        lpfc: Fix for cleaning up stale ring flag and sp_queue_event entries
        lpfc: Update lpfc version to driver version 10.2.8000.0
        lpfc: Update Copyright on changed files from 8.3.45 patches
        lpfc: Update Copyright on changed files
        lpfc: Fixed locking for scsi task management commands
        lpfc: Convert runtime references to old xlane cfg param to fof cfg param
        lpfc: Fix FW dump using sysfs
        lpfc: Fix SLI4 s abort loop to process all FCP rings and under ring_lock
        lpfc: Fixed kernel panic in lpfc_abort_handler
        lpfc: Fix locking for postbufq when freeing
        lpfc: Fix locking for lpfc_hba_down_post
        lpfc: Fix dynamic transitions of FirstBurst from on to off
        hpsa: fix handling of hpsa_volume_offline return value
        hpsa: return -ENOMEM not -1 on kzalloc failure in hpsa_get_device_id
        hpsa: remove messages about volume status VPD inquiry page not supported
        ...
      abf04af7
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs · 16d52ef7
      Linus Torvalds authored
      Pull more btrfs updates from Chris Mason:
       "This has a few fixes since our last pull and a new ioctl for doing
        btree searches from userland.  It's very similar to the existing
        ioctl, but lets us return larger items back down to the app"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
        btrfs: fix error handling in create_pending_snapshot
        btrfs: fix use of uninit "ret" in end_extent_writepage()
        btrfs: free ulist in qgroup_shared_accounting() error path
        Btrfs: fix qgroups sanity test crash or hang
        btrfs: prevent RCU warning when dereferencing radix tree slot
        Btrfs: fix unfinished readahead thread for raid5/6 degraded mounting
        btrfs: new ioctl TREE_SEARCH_V2
        btrfs: tree_search, search_ioctl: direct copy to userspace
        btrfs: new function read_extent_buffer_to_user
        btrfs: tree_search, copy_to_sk: return needed size on EOVERFLOW
        btrfs: tree_search, copy_to_sk: return EOVERFLOW for too small buffer
        btrfs: tree_search, search_ioctl: accept varying buffer
        btrfs: tree_search: eliminate redundant nr_items check
      16d52ef7
    • Linus Torvalds's avatar
      Merge git://git.kvack.org/~bcrl/aio-next · a311c480
      Linus Torvalds authored
      Pull aio fix and cleanups from Ben LaHaise:
       "This consists of a couple of code cleanups plus a minor bug fix"
      
      * git://git.kvack.org/~bcrl/aio-next:
        aio: cleanup: flatten kill_ioctx()
        aio: report error from io_destroy() when threads race in io_destroy()
        fs/aio.c: Remove ctx parameter in kiocb_cancel
      a311c480
    • Al Viro's avatar
      fix __swap_writepage() compile failure on old gcc versions · 05064084
      Al Viro authored
      Tetsuo Handa wrote:
       "Commit 62a8067a ("bio_vec-backed iov_iter") introduced an unnamed
        union inside a struct which gcc-4.4.7 cannot handle.  Name the unnamed
         union as u in order to fix build failure"
      
      Let's do this instead: there is only one place in the entire tree that
      steps into this breakage.  Anon structs and unions work in older gcc
      versions; as the matter of fact, we have those in the tree - see e.g.
      struct ieee80211_tx_info in include/net/mac80211.h
      
      What doesn't work is handling their initializers:
      
      struct {
      	int a;
      	union {
      		int b;
      		char c;
      	};
      } x[2] = {{.a = 1, .c = 'a'}, {.a = 0, .b = 1}};
      
      is the obvious syntax for initializer, perfectly fine for C11 and
      handled correctly by gcc-4.7 or later.
      
      Earlier versions, though, break on it - declaration is fine and so's
      access to fields (i.e.  x[0].c = 'a'; would produce the right code), but
      members of the anon structs and unions are not inserted into the right
      namespace.  Tellingly, those older versions will not barf on struct {int
      a; struct {int a;};}; - looks like they just have it hacked up somewhere
      around the handling of .  and -> instead of doing the right thing.
      
      The easiest way to deal with that crap is to turn initialization of
      those fields (in the only place where we have such initializer of
      iov_iter) into plain assignment.
      Reported-by: default avatarTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Reported-by: default avatarRussell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      05064084
  5. 14 Jun, 2014 4 commits
  6. 13 Jun, 2014 5 commits