1. 19 Oct, 2017 4 commits
  2. 18 Oct, 2017 7 commits
  3. 16 Oct, 2017 6 commits
  4. 12 Oct, 2017 1 commit
  5. 09 Oct, 2017 8 commits
  6. 04 Oct, 2017 1 commit
    • s390: use generic rwsem implementation · 91a1fad7
      Heiko Carstens authored
      We never optimized our rwsem inline assemblies to make use of the new
      atomic instructions. The generic rwsem implementation implicitly makes
      use of the new instructions, since it implements the required rwsem
      primitives with atomic operations, which we did optimize.
      
      However, even when compiling for old architectures, the generic variant
      still generates better code. So it's time to simply remove our old
      code and switch to the generic implementation.
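
      For illustration, a minimal user-space sketch of that idea, with assumed
      names and an assumed count encoding (this is not the actual generic rwsem
      code), showing that the fast paths boil down to single atomic operations:

      	#include <stdatomic.h>
      	#include <stdbool.h>

      	#define SKETCH_WRITER_BIAS (-0x10000L)	/* assumed "writer holds it" encoding */

      	struct sketch_rwsem {
      		atomic_long count;		/* > 0: readers, < 0: writer active */
      	};

      	static bool sketch_down_read_trylock(struct sketch_rwsem *sem)
      	{
      		/* The whole reader fast path is one atomic add. */
      		if (atomic_fetch_add(&sem->count, 1) >= 0)
      			return true;
      		atomic_fetch_sub(&sem->count, 1);	/* writer active, back out */
      		return false;
      	}

      	static bool sketch_down_write_trylock(struct sketch_rwsem *sem)
      	{
      		long expected = 0;
      		/* A writer needs an empty semaphore: no readers, no writer. */
      		return atomic_compare_exchange_strong(&sem->count, &expected,
      						      SKETCH_WRITER_BIAS);
      	}

      	static void sketch_up_read(struct sketch_rwsem *sem)
      	{
      		atomic_fetch_sub(&sem->count, 1);
      	}
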
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
  7. 29 Sep, 2017 11 commits
  8. 28 Sep, 2017 2 commits
    • s390/rwlock: introduce rwlock wait queueing · eb3b7b84
      Martin Schwidefsky authored
      Like the common queued rwlock code the s390 implementation uses the
      queued spinlock code on a spinlock_t embedded in the rwlock_t to achieve
      the queueing. The encoding of the rwlock_t differs though: the counter
      field in the rwlock_t is split into two parts. The upper two bytes hold
      the write bit and the write wait counter, the lower two bytes hold the
      read counter.
      
      The arch_read_lock operation works exactly like the common qrwlock, but
      the enqueue operation for a writer follows a different logic. After the
      inline attempt to take the rwlock for write has failed, the writer first
      increases the write wait counter, acquires the wait spin_lock for the
      queueing, and then loops until there are no readers and the write bit
      is zero.
      Without the write wait counter a CPU that just released the rwlock
      could immediately reacquire the lock in the inline code, bypassing all
      outstanding read and write waiters. For s390 this would cause massive
      imbalances in favour of writers in case of a contended rwlock.
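
      For illustration, a user-space sketch of the writer enqueue path described
      above, with assumed bit positions and a pthread mutex standing in for the
      wait spinlock (this is not the s390 kernel code):

      	#include <pthread.h>
      	#include <stdatomic.h>
      	#include <stdint.h>

      	#define RW_READ_MASK	0x0000ffffu	/* lower two bytes: read counter */
      	#define RW_WRITE_BIT	0x80000000u	/* assumed place of the write bit */
      	#define RW_WWAIT_INC	0x00010000u	/* one unit of the write wait counter */

      	struct sketch_rwlock {
      		_Atomic uint32_t cnts;		/* the split counter word */
      		pthread_mutex_t wait_lock;	/* stands in for the queueing spinlock */
      	};

      	static void sketch_write_lock_slow(struct sketch_rwlock *rw)
      	{
      		uint32_t old;

      		/* Announce the waiting writer first, so a CPU that just dropped
      		   the lock cannot simply retake it in its inline fast path. */
      		atomic_fetch_add(&rw->cnts, RW_WWAIT_INC);

      		pthread_mutex_lock(&rw->wait_lock);	/* queue behind other waiters */

      		do {
      			old = atomic_load(&rw->cnts);
      			/* Wait for no readers and a clear write bit, then drop our
      			   wait count and set the write bit in one compare-and-swap. */
      		} while ((old & (RW_READ_MASK | RW_WRITE_BIT)) != 0 ||
      			 !atomic_compare_exchange_weak(&rw->cnts, &old,
      					(old - RW_WWAIT_INC) | RW_WRITE_BIT));

      		pthread_mutex_unlock(&rw->wait_lock);
      	}
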
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
    • s390/spinlock: introduce spinlock wait queueing · b96f7d88
      Martin Schwidefsky authored
      The queued spinlock code for s390 follows the principles of the common
      code qspinlock implementation but with a few notable differences.
      
      The format of the spinlock_t locking word differs: s390 needs to store
      the logical CPU number of the lock holder in the spinlock_t to be able
      to use the diagnose 9c directed yield hypervisor call.
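
      For illustration, a user-space sketch of what storing the CPU number in the
      lock word enables; the yield helper is only a stub for the diagnose 9c
      hypercall and all names are assumed (this is not the actual wait code):

      	#include <stdatomic.h>
      	#include <stdint.h>

      	/* Stub: on s390 this would issue diagnose 9c to hand cycles to that CPU. */
      	static void sketch_yield_to_cpu(unsigned int cpu) { (void)cpu; }

      	/* 0 means "unlocked"; otherwise the word holds the owner's CPU number + 1. */
      	static void sketch_spin_wait(_Atomic uint32_t *lock, uint32_t my_lockval)
      	{
      		uint32_t expected = 0;

      		while (!atomic_compare_exchange_weak(lock, &expected, my_lockval)) {
      			if (expected)			/* lock word names the holder */
      				sketch_yield_to_cpu(expected - 1);
      			expected = 0;			/* retry against an empty lock */
      		}
      	}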
      
      The inline code sequences for spin_lock and spin_unlock are nice and
      short. The inline portion of a spin_lock now typically looks like this:
      
      	lhi	%r0,0			# 0 indicates an empty lock
      	l	%r1,0x3a0		# CPU number + 1 from lowcore
      	cs	%r0,%r1,<some_lock>	# lock operation
      	jnz	call_wait		# on failure call wait function
      locked:
      	...
      call_wait:
      	la	%r2,<some_lock>
      	brasl	%r14,arch_spin_lock_wait
      	j	locked
      
      A spin_unlock is as simple as before:
      
      	lhi	%r0,0
      	sth	%r0,2(%r2)		# unlock operation
      
      After a CPU has queued itself it may not enable interrupts again for the
      arch_spin_lock_flags() variant. The arch_spin_lock_wait_flags wait function
      is removed.
      
      To improve performance the code implements opportunistic lock stealing.
      If the wait function finds a spinlock_t that indicates that the lock is
      free but there are queued waiters, the CPU may steal the lock up to three
      times without queueing itself. Each steal updates the steal counter
      in the lock word to prevent more than three steals. The counter is
      reset when the CPU next in the queue successfully takes the lock.
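
      For illustration, a sketch of the steal check, with an assumed 2-bit steal
      counter in the top bits of the lock word (not the actual s390 lock word
      layout):

      	#include <stdatomic.h>
      	#include <stdbool.h>
      	#include <stdint.h>

      	#define LOCK_CPU_MASK		0x0000ffffu	/* holder: logical CPU number + 1 */
      	#define LOCK_STEAL_SHIFT	30		/* assumed place of the steal counter */
      	#define LOCK_STEAL_MASK		(3u << LOCK_STEAL_SHIFT)

      	/* Returns true if the caller grabbed the lock without queueing itself. */
      	static bool sketch_try_steal(_Atomic uint32_t *lock, uint32_t my_lockval)
      	{
      		uint32_t old = atomic_load(lock);

      		if (old & LOCK_CPU_MASK)		/* somebody still holds the lock */
      			return false;
      		if ((old & LOCK_STEAL_MASK) == LOCK_STEAL_MASK)
      			return false;			/* three steals already, go queue */

      		/* Take the lock and bump the steal counter in one compare-and-swap;
      		   the queued CPU that eventually gets the lock resets the counter. */
      		return atomic_compare_exchange_strong(lock, &old,
      				(old + (1u << LOCK_STEAL_SHIFT)) | my_lockval);
      	}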
      
      While the queued spinlocks improve performance in a system with dedicated
      CPUs, in a virtualized environment with continuously overcommitted CPUs
      the queued spinlocks can have a negative effect on performance. This
      is due to the fact that a queued CPU that is preempted by the hypervisor
      will block the queue at some point even without holding the lock. With
      the classic spinlock it does not matter if a CPU is preempted that waits
      for the lock. Therefore use the queued spinlock code only if the system
      runs with dedicated CPUs and fall back to classic spinlocks when running
      with shared CPUs.
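
      For illustration, the selection amounts to a boot-time switch of this shape
      (names assumed, not the actual s390 code):

      	#include <stdbool.h>

      	static bool sketch_use_queued_spinlocks;	/* decided once at boot */

      	/* Queued waiting only pays off when a waiting CPU cannot be preempted
      	   away by the hypervisor, i.e. when the CPUs are dedicated. */
      	static void sketch_select_spinlock_variant(bool cpus_are_dedicated)
      	{
      		sketch_use_queued_spinlocks = cpus_are_dedicated;
      	}
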
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>