/*****************************************************************************

Copyright (c) 1995, 2009, Innobase Oy. All Rights Reserved.
Copyright (c) 2008, Google Inc.

Portions of this file contain modifications contributed and copyrighted by
Google, Inc. Those modifications are gratefully acknowledged and are described
briefly in the InnoDB documentation. The contributions by Google are
incorporated with their permission, and subject to the conditions contained in
the file COPYING.Google.

This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation; version 2 of the License.

This program is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with
this program; if not, write to the Free Software Foundation, Inc., 59 Temple
Place, Suite 330, Boston, MA 02111-1307 USA

*****************************************************************************/

/**************************************************//**
@file sync/sync0sync.c
Mutex, the basic synchronization primitive

Created 9/5/1995 Heikki Tuuri
*******************************************************/

#include "sync0sync.h"
#ifdef UNIV_NONINL
#include "sync0sync.ic"
#endif

#include "sync0rw.h"
#include "buf0buf.h"
#include "srv0srv.h"
#include "buf0types.h"
#include "os0sync.h" /* for HAVE_ATOMIC_BUILTINS */

/*
	REASONS FOR IMPLEMENTING THE SPIN LOCK MUTEX
	============================================

Semaphore operations in operating systems are slow: Solaris on a 1993 Sparc
takes 3 microseconds (us) for a lock-unlock pair and Windows NT on a 1995
Pentium takes 20 microseconds for a lock-unlock pair. Therefore, we have to
implement our own efficient spin lock mutex. Future operating systems may
provide efficient spin locks, but we cannot count on that.

Another reason for implementing a spin lock is that on multiprocessor systems
it can be more efficient for a processor to run a loop waiting for the
semaphore to be released than to switch to a different thread. A thread switch
takes 25 us on both platforms mentioned above. See Gray and Reuter's book
Transaction processing for background.

How long should the spin loop last before suspending the thread? On a
uniprocessor, spinning does not help at all, because if the thread owning the
mutex is not executing, it cannot be released. Spinning actually wastes
resources.

On a multiprocessor, we do not know if the thread owning the mutex is
executing or not. Thus it would make sense to spin as long as the operation
guarded by the mutex would typically last assuming that the thread is
executing. If the mutex is not released by that time, we may assume that the
thread owning the mutex is not executing and suspend the waiting thread.

A typical operation (where no i/o involved) guarded by a mutex or a read-write
lock may last 1 - 20 us on the current Pentium platform. The longest
operations are the binary searches on an index node.

We conclude that the best choice is to set the spin time at 20 us. Then the
system should work well on a multiprocessor. On a uniprocessor we have to
make sure that thread switches due to mutex collisions are not frequent,
i.e., they do not happen every 100 us or so, because that wastes too much
resources. If the thread switches are not frequent, the 20 us wasted in spin
loop is not too much.

Empirical studies on the effect of spin time should be done for different
platforms.


	IMPLEMENTATION OF THE MUTEX
	===========================

For background, see Curt Schimmel's book on Unix implementation on modern
architectures. The key points in the implementation are atomicity and
serialization of memory accesses. The test-and-set instruction (XCHG in
Pentium) must be atomic. As new processors may have weak memory models, also
serialization of memory references may be necessary. The successor of Pentium,
P6, has at least one mode where the memory model is weak. As far as we know,
in Pentium all memory accesses are serialized in the program order and we do
not have to worry about the memory model. On other processors there are
special machine instructions called a fence, memory barrier, or storage
barrier (STBAR in Sparc), which can be used to serialize the memory accesses
to happen in program order relative to the fence instruction.

Leslie Lamport has devised a "bakery algorithm" to implement a mutex without
the atomic test-and-set, but his algorithm should be modified for weak memory
models. We do not use Lamport's algorithm, because we guess it is slower than
the atomic test-and-set.

Our mutex implementation works as follows: First, we perform the atomic
test-and-set instruction on the memory word. If the test returns zero, we
know we got the lock first. If the test returns not zero, some other thread
was quicker and got the lock: then we spin in a loop reading the memory word,
waiting for it to become zero. It is wise to just read the word in the loop, not
perform numerous test-and-set instructions, because they generate memory
traffic between the cache and the main memory. The read loop can just access
the cache, saving bus bandwidth.

If we cannot acquire the mutex lock in the specified time, we reserve a cell
in the wait array and set the waiters byte in the mutex to 1. To avoid a race
condition, after setting the waiters byte and before suspending the waiting
thread, we still have to check that the mutex is reserved, because it may
have happened that the thread which was holding the mutex has just released
it and did not see the waiters byte set to 1, a case which would lead the
other thread to an infinite wait.

LEMMA 1: After a thread resets the event of a mutex (or rw_lock), some
=======
thread will eventually call os_event_set() on that particular event.
Thus no infinite wait is possible in this case.

Proof:	After making the reservation the thread sets the waiters field in the
mutex to 1. Then it checks that the mutex is still reserved by some thread,
or it reserves the mutex for itself. In any case, some thread (which may be
also some earlier thread, not necessarily the one currently holding the mutex)
will set the waiters field to 0 in mutex_exit, and then call
os_event_set() with the mutex as an argument.
Q.E.D.

LEMMA 2: If an os_event_set() call is made after some thread has called
=======
the os_event_reset() and before it starts wait on that event, the call
will not be lost to the second thread. This is true even if there is an
intervening call to os_event_reset() by another thread.
Thus no infinite wait is possible in this case.

Proof (non-windows platforms): os_event_reset() returns a monotonically
increasing value of signal_count. This value is increased at every
call of os_event_set(). If thread A has called os_event_reset() followed
by thread B calling os_event_set() and then some other thread C calling
os_event_reset(), the is_set flag of the event will be set to FALSE;
but now if thread A calls os_event_wait_low() with the signal_count
value returned from the earlier call of os_event_reset(), it will
return immediately without waiting.
Q.E.D.

Proof (windows): If there is a writer thread which is forced to wait for
the lock, it may be able to set the state of rw_lock to RW_LOCK_WAIT_EX.
The design of rw_lock ensures that there is one and only one thread
that is able to change the state to RW_LOCK_WAIT_EX and this thread is
guaranteed to acquire the lock after it is released by the current
holders and before any other waiter gets the lock.
On Windows this thread waits on a separate event, i.e. wait_ex_event.
Since only one thread can wait on this event there is no chance
of this event getting reset before the writer starts wait on it.
Therefore, this thread is guaranteed to catch the os_event_set()
signalled unconditionally at the release of the lock.
Q.E.D. */
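/* The following is a minimal sketch of the acquire-side protocol described
above.  It is kept out of the build with #if 0 and every toy_* name is
hypothetical; the real protocol lives in mutex_spin_wait() and mutex_exit()
below.  The point it illustrates is the ordering required by LEMMA 1:
reserve the wait cell, set the waiters flag, and only then re-test the lock
word before going to sleep. */
#if 0
static
void
toy_mutex_enter(toy_mutex_t*	m)
{
	ulint	i;

	for (;;) {
		/* Spin by only READING the lock word: plain reads are
		served from the cache and avoid the bus traffic that
		repeated test-and-set instructions would cause. */
		for (i = 0; i < TOY_SPIN_ROUNDS
		     && toy_get_lock_word(m) != 0; i++) {
			toy_cpu_relax();
		}

		if (toy_test_and_set(m) == 0) {

			return;	/* got the lock on the fast path */
		}

		/* Slow path: reserve a cell in the wait array first,
		then announce ourselves in the waiters flag. */
		toy_wait_cell_reserve(m);
		toy_set_waiters(m, 1);

		/* Re-test after setting waiters: the holder may have
		released the mutex without seeing waiters == 1. */
		if (toy_test_and_set(m) == 0) {
			toy_wait_cell_free(m);

			return;
		}

		/* Sleep; toy_mutex_exit() clears waiters and then sets
		the event, so by LEMMA 1 this wait cannot last forever. */
		toy_wait_cell_wait(m);
	}
}
#endif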

/* Number of spin waits on mutexes: for performance monitoring */

/** The number of iterations in the mutex_spin_wait() spin loop.
Intended for performance monitoring. */
static ib_int64_t	mutex_spin_round_count		= 0;
/** The number of mutex_spin_wait() calls.  Intended for
performance monitoring. */
static ib_int64_t	mutex_spin_wait_count		= 0;
/** The number of OS waits in mutex_spin_wait().  Intended for
performance monitoring. */
static ib_int64_t	mutex_os_wait_count		= 0;
/** The number of mutex_exit() calls. Intended for performance
monitoring. */
UNIV_INTERN ib_int64_t	mutex_exit_count		= 0;

/** The global array of wait cells for implementation of the database's own
mutexes and read-write locks */
UNIV_INTERN sync_array_t*	sync_primary_wait_array;

/** This variable is set to TRUE when sync_init is called */
UNIV_INTERN ibool	sync_initialized	= FALSE;

/** An acquired mutex or rw-lock and its level in the latching order */
typedef struct sync_level_struct	sync_level_t;
/** Mutexes or rw-locks held by a thread */
typedef struct sync_thread_struct	sync_thread_t;

#ifdef UNIV_SYNC_DEBUG
/** The latch levels currently owned by threads are stored in this data
structure; the size of this array is OS_THREAD_MAX_N */

UNIV_INTERN sync_thread_t*	sync_thread_level_arrays;

/** Mutex protecting sync_thread_level_arrays */
UNIV_INTERN mutex_t		sync_thread_mutex;
#endif /* UNIV_SYNC_DEBUG */

/** Global list of database mutexes (not OS mutexes) created. */
UNIV_INTERN ut_list_base_node_t  mutex_list;

/** Mutex protecting the mutex_list variable */
UNIV_INTERN mutex_t mutex_list_mutex;

#ifdef UNIV_SYNC_DEBUG
/** Latching order checks start when this is set TRUE */
UNIV_INTERN ibool	sync_order_checks_on	= FALSE;
#endif /* UNIV_SYNC_DEBUG */

/** Mutexes or rw-locks held by a thread */
struct sync_thread_struct{
	os_thread_id_t	id;	/*!< OS thread id */
	sync_level_t*	levels;	/*!< level array for this thread; if
				this is NULL this slot is unused */
};

/** Number of slots reserved for each OS thread in the sync level array */
#define SYNC_THREAD_N_LEVELS	10000

/** An acquired mutex or rw-lock and its level in the latching order */
struct sync_level_struct{
	void*	latch;	/*!< pointer to a mutex or an rw-lock; NULL means that
			the slot is empty */
	ulint	level;	/*!< level of the latch in the latching order */
};

/******************************************************************//**
Creates, or rather, initializes a mutex object in a specified memory
location (which must be appropriately aligned). The mutex is initialized
in the reset state. Explicit freeing of the mutex with mutex_free is
necessary only if the memory block containing it is freed. */
UNIV_INTERN
void
mutex_create_func(
/*==============*/
	mutex_t*	mutex,		/*!< in: pointer to memory */
#ifdef UNIV_DEBUG
	const char*	cmutex_name,	/*!< in: mutex name */
# ifdef UNIV_SYNC_DEBUG
	ulint		level,		/*!< in: level */
# endif /* UNIV_SYNC_DEBUG */
#endif /* UNIV_DEBUG */
	const char*	cfile_name,	/*!< in: file name where created */
	ulint		cline)		/*!< in: file line where created */
{
#if defined(HAVE_ATOMIC_BUILTINS)
	mutex_reset_lock_word(mutex);
#else
	os_fast_mutex_init(&(mutex->os_fast_mutex));
	mutex->lock_word = 0;
#endif
	mutex->event = os_event_create(NULL);
	mutex_set_waiters(mutex, 0);
#ifdef UNIV_DEBUG
	mutex->magic_n = MUTEX_MAGIC_N;
#endif /* UNIV_DEBUG */
#ifdef UNIV_SYNC_DEBUG
	mutex->line = 0;
	mutex->file_name = "not yet reserved";
	mutex->level = level;
#endif /* UNIV_SYNC_DEBUG */
	mutex->cfile_name = cfile_name;
	mutex->cline = cline;
	mutex->count_os_wait = 0;
#ifdef UNIV_DEBUG
	mutex->cmutex_name=	  cmutex_name;
	mutex->count_using=	  0;
	mutex->mutex_type=	  0;
	mutex->lspent_time=	  0;
	mutex->lmax_spent_time=     0;
	mutex->count_spin_loop= 0;
	mutex->count_spin_rounds=   0;
	mutex->count_os_yield=  0;
#endif /* UNIV_DEBUG */

	/* Check that lock_word is aligned; this is important on Intel */
	ut_ad(((ulint)(&(mutex->lock_word))) % 4 == 0);

	/* NOTE! The very first mutexes are not put to the mutex list */

	if ((mutex == &mutex_list_mutex)
#ifdef UNIV_SYNC_DEBUG
	    || (mutex == &sync_thread_mutex)
#endif /* UNIV_SYNC_DEBUG */
	    ) {

		return;
	}

	mutex_enter(&mutex_list_mutex);

	ut_ad(UT_LIST_GET_LEN(mutex_list) == 0
	      || UT_LIST_GET_FIRST(mutex_list)->magic_n == MUTEX_MAGIC_N);

	UT_LIST_ADD_FIRST(list, mutex_list, mutex);

	mutex_exit(&mutex_list_mutex);
}
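/* A short usage sketch (not compiled) of the mutex API implemented in this
file.  my_mutex is a hypothetical example object; mutex_create(),
mutex_enter(), mutex_exit() and mutex_free() are the macros and functions
declared in sync0sync.h that wrap the *_func() routines defined here, used
exactly as in sync_init() below. */
#if 0
static mutex_t	my_mutex;

static
void
toy_mutex_usage(void)
{
	mutex_create(&my_mutex, SYNC_NO_ORDER_CHECK);

	mutex_enter(&my_mutex);
	/* ... critical section protected by my_mutex ... */
	mutex_exit(&my_mutex);

	/* Needed only if the memory block holding the mutex is freed. */
	mutex_free(&my_mutex);
}
#endif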

/******************************************************************//**
Calling this function is obligatory only if the memory buffer containing
the mutex is freed. Removes a mutex object from the mutex list. The mutex
is checked to be in the reset state. */
UNIV_INTERN
void
mutex_free(
/*=======*/
	mutex_t*	mutex)	/*!< in: mutex */
{
	ut_ad(mutex_validate(mutex));
	ut_a(mutex_get_lock_word(mutex) == 0);
	ut_a(mutex_get_waiters(mutex) == 0);

#ifdef UNIV_MEM_DEBUG
	if (mutex == &mem_hash_mutex) {
		ut_ad(UT_LIST_GET_LEN(mutex_list) == 1);
		ut_ad(UT_LIST_GET_FIRST(mutex_list) == &mem_hash_mutex);
		UT_LIST_REMOVE(list, mutex_list, mutex);
		goto func_exit;
	}
#endif /* UNIV_MEM_DEBUG */

	if (mutex != &mutex_list_mutex
#ifdef UNIV_SYNC_DEBUG
	    && mutex != &sync_thread_mutex
#endif /* UNIV_SYNC_DEBUG */
	    ) {

		mutex_enter(&mutex_list_mutex);

		ut_ad(!UT_LIST_GET_PREV(list, mutex)
		      || UT_LIST_GET_PREV(list, mutex)->magic_n
		      == MUTEX_MAGIC_N);
		ut_ad(!UT_LIST_GET_NEXT(list, mutex)
		      || UT_LIST_GET_NEXT(list, mutex)->magic_n
		      == MUTEX_MAGIC_N);

		UT_LIST_REMOVE(list, mutex_list, mutex);

		mutex_exit(&mutex_list_mutex);
	}

	os_event_free(mutex->event);
#ifdef UNIV_MEM_DEBUG
func_exit:
#endif /* UNIV_MEM_DEBUG */
#if !defined(HAVE_ATOMIC_BUILTINS)
	os_fast_mutex_free(&(mutex->os_fast_mutex));
#endif
	/* If we free the mutex protecting the mutex list (freeing is
	not necessary), we have to reset the magic number AFTER removing
	it from the list. */
#ifdef UNIV_DEBUG
	mutex->magic_n = 0;
#endif /* UNIV_DEBUG */
}

/********************************************************************//**
NOTE! Use the corresponding macro in the header file, not this function
directly. Tries to lock the mutex for the current thread. If the lock is not
acquired immediately, returns with return value 1.
@return	0 if succeeded, 1 if not */
UNIV_INTERN
ulint
mutex_enter_nowait_func(
/*====================*/
	mutex_t*	mutex,		/*!< in: pointer to mutex */
	const char*	file_name __attribute__((unused)),
					/*!< in: file name where mutex
					requested */
	ulint		line __attribute__((unused)))
					/*!< in: line where requested */
{
	ut_ad(mutex_validate(mutex));

	if (!mutex_test_and_set(mutex)) {

		ut_d(mutex->thread_id = os_thread_get_curr_id());
#ifdef UNIV_SYNC_DEBUG
		mutex_set_debug_info(mutex, file_name, line);
#endif

		return(0);	/* Succeeded! */
	}

	return(1);
}
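/* Sketch (not compiled) of how the non-blocking variant is typically used.
mutex_enter_nowait() is the wrapper macro in sync0sync.h that supplies
__FILE__ and __LINE__; my_mutex is a hypothetical mutex_t created elsewhere
with mutex_create(). */
#if 0
	if (mutex_enter_nowait(&my_mutex) == 0) {
		/* Acquired without waiting. */
		mutex_exit(&my_mutex);
	} else {
		/* Mutex was busy: the caller can retry, fall back to the
		blocking mutex_enter(), or give up. */
	}
#endif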

#ifdef UNIV_DEBUG
/******************************************************************//**
Checks that the mutex has been initialized.
@return	TRUE */
UNIV_INTERN
ibool
mutex_validate(
/*===========*/
	const mutex_t*	mutex)	/*!< in: mutex */
{
	ut_a(mutex);
	ut_a(mutex->magic_n == MUTEX_MAGIC_N);

	return(TRUE);
}

/******************************************************************//**
Checks that the current thread owns the mutex. Works only in the debug
version.
@return	TRUE if owns */
UNIV_INTERN
ibool
mutex_own(
/*======*/
	const mutex_t*	mutex)	/*!< in: mutex */
{
	ut_ad(mutex_validate(mutex));

	return(mutex_get_lock_word(mutex) == 1
	       && os_thread_eq(mutex->thread_id, os_thread_get_curr_id()));
}
#endif /* UNIV_DEBUG */

/******************************************************************//**
Sets the waiters field in a mutex. */
UNIV_INTERN
void
mutex_set_waiters(
/*==============*/
	mutex_t*	mutex,	/*!< in: mutex */
	ulint		n)	/*!< in: value to set */
{
	volatile ulint*	ptr;		/* declared volatile to ensure that
					the value is stored to memory */
	ut_ad(mutex);

	ptr = &(mutex->waiters);

	*ptr = n;		/* Here we assume that the write of a single
				word in memory is atomic */
}

/******************************************************************//**
Reserves a mutex for the current thread. If the mutex is reserved, the
function spins a preset time (controlled by SYNC_SPIN_ROUNDS), waiting
for the mutex before suspending the thread. */
UNIV_INTERN
void
mutex_spin_wait(
/*============*/
	mutex_t*	mutex,		/*!< in: pointer to mutex */
	const char*	file_name,	/*!< in: file name where mutex
					requested */
	ulint		line)		/*!< in: line where requested */
{
	ulint	   index; /* index of the reserved wait cell */
	ulint	   i;	  /* spin round count */
#ifdef UNIV_DEBUG
	ib_int64_t lstart_time = 0, lfinish_time; /* for timing os_wait */
	ulint ltime_diff;
	ulint sec;
	ulint ms;
	uint timer_started = 0;
#endif /* UNIV_DEBUG */
	ut_ad(mutex);

	/* This update is not thread safe, but we don't mind if the count
	isn't exact. Moved out of ifdef that follows because we are willing
	to sacrifice the cost of counting this as the data is valuable.
	Count the number of calls to mutex_spin_wait. */
	mutex_spin_wait_count++;

mutex_loop:

	i = 0;

	/* Spin waiting for the lock word to become zero. Note that we do
	not have to assume that the read access to the lock word is atomic,
	as the actual locking is always committed with atomic test-and-set.
	In reality, however, all processors probably have an atomic read of
	a memory word. */

spin_loop:
	ut_d(mutex->count_spin_loop++);

	while (mutex_get_lock_word(mutex) != 0 && i < SYNC_SPIN_ROUNDS) {
		if (srv_spin_wait_delay) {
			ut_delay(ut_rnd_interval(0, srv_spin_wait_delay));
		}

		i++;
	}

	if (i == SYNC_SPIN_ROUNDS) {
#ifdef UNIV_DEBUG
		mutex->count_os_yield++;
#ifndef UNIV_HOTBACKUP
		if (timed_mutexes && timer_started == 0) {
			ut_usectime(&sec, &ms);
			lstart_time= (ib_int64_t)sec * 1000000 + ms;
			timer_started = 1;
		}
#endif /* UNIV_HOTBACKUP */
#endif /* UNIV_DEBUG */
		os_thread_yield();
	}

#ifdef UNIV_SRV_PRINT_LATCH_WAITS
	fprintf(stderr,
		"Thread %lu spin wait mutex at %p"
		" cfile %s cline %lu rnds %lu\n",
		(ulong) os_thread_pf(os_thread_get_curr_id()), (void*) mutex,
		mutex->cfile_name, (ulong) mutex->cline, (ulong) i);
#endif

	mutex_spin_round_count += i;

	ut_d(mutex->count_spin_rounds += i);

	if (mutex_test_and_set(mutex) == 0) {
		/* Succeeded! */

		ut_d(mutex->thread_id = os_thread_get_curr_id());
#ifdef UNIV_SYNC_DEBUG
		mutex_set_debug_info(mutex, file_name, line);
#endif

		goto finish_timing;
	}

	/* We may end up with a situation where lock_word is 0 but the OS
	fast mutex is still reserved. On FreeBSD the OS does not seem to
	schedule a thread which is constantly calling pthread_mutex_trylock
	(in mutex_test_and_set implementation). Then we could end up
	spinning here indefinitely. The following 'i++' stops this infinite
	spin. */

	i++;

	if (i < SYNC_SPIN_ROUNDS) {
		goto spin_loop;
	}

	sync_array_reserve_cell(sync_primary_wait_array, mutex,
				SYNC_MUTEX, file_name, line, &index);

	/* The memory order of the array reservation and the change in the
	waiters field is important: when we suspend a thread, we first
	reserve the cell and then set waiters field to 1. When threads are
	released in mutex_exit, the waiters field is first set to zero and
	then the event is set to the signaled state. */

	mutex_set_waiters(mutex, 1);

	/* Try to reserve the mutex a few more times */
	for (i = 0; i < 4; i++) {
		if (mutex_test_and_set(mutex) == 0) {
			/* Succeeded! Free the reserved wait cell */

			sync_array_free_cell(sync_primary_wait_array, index);

			ut_d(mutex->thread_id = os_thread_get_curr_id());
#ifdef UNIV_SYNC_DEBUG
			mutex_set_debug_info(mutex, file_name, line);
#endif

#ifdef UNIV_SRV_PRINT_LATCH_WAITS
			fprintf(stderr, "Thread %lu spin wait succeeds at 2:"
				" mutex at %p\n",
				(ulong) os_thread_pf(os_thread_get_curr_id()),
				(void*) mutex);
#endif

			goto finish_timing;

			/* Note that in this case we leave the waiters field
			set to 1. We cannot reset it to zero, as we do not
			know if there are other waiters. */
		}
	}

	/* Now we know that there has been some thread holding the mutex
	after the change in the wait array and the waiters field was made.
	Now there is no risk of infinite wait on the event. */

#ifdef UNIV_SRV_PRINT_LATCH_WAITS
	fprintf(stderr,
		"Thread %lu OS wait mutex at %p cfile %s cline %lu rnds %lu\n",
		(ulong) os_thread_pf(os_thread_get_curr_id()), (void*) mutex,
		mutex->cfile_name, (ulong) mutex->cline, (ulong) i);
#endif

	mutex_os_wait_count++;

	mutex->count_os_wait++;
#ifdef UNIV_DEBUG
	/* !!!!! Sometimes os_wait can be called without os_thread_yield */
#ifndef UNIV_HOTBACKUP
	if (timed_mutexes == 1 && timer_started == 0) {
		ut_usectime(&sec, &ms);
		lstart_time= (ib_int64_t)sec * 1000000 + ms;
		timer_started = 1;
	}
#endif /* UNIV_HOTBACKUP */
#endif /* UNIV_DEBUG */

	sync_array_wait_event(sync_primary_wait_array, index);
	goto mutex_loop;

finish_timing:
#ifdef UNIV_DEBUG
	if (timed_mutexes == 1 && timer_started==1) {
		ut_usectime(&sec, &ms);
		lfinish_time= (ib_int64_t)sec * 1000000 + ms;

		ltime_diff= (ulint) (lfinish_time - lstart_time);
		mutex->lspent_time += ltime_diff;

		if (mutex->lmax_spent_time < ltime_diff) {
			mutex->lmax_spent_time= ltime_diff;
		}
	}
#endif /* UNIV_DEBUG */
	return;
}

/******************************************************************//**
Releases the threads waiting in the primary wait array for this mutex. */
UNIV_INTERN
void
mutex_signal_object(
/*================*/
	mutex_t*	mutex)	/*!< in: mutex */
{
	mutex_set_waiters(mutex, 0);

	/* The memory order of resetting the waiters field and
	signaling the object is important. See LEMMA 1 above. */
	os_event_set(mutex->event);
	sync_array_object_signalled(sync_primary_wait_array);
}

#ifdef UNIV_SYNC_DEBUG
/******************************************************************//**
Sets the debug information for a reserved mutex. */
UNIV_INTERN
void
mutex_set_debug_info(
/*=================*/
	mutex_t*	mutex,		/*!< in: mutex */
	const char*	file_name,	/*!< in: file where requested */
	ulint		line)		/*!< in: line where requested */
{
	ut_ad(mutex);
	ut_ad(file_name);

	sync_thread_add_level(mutex, mutex->level);

	mutex->file_name = file_name;
	mutex->line	 = line;
}

/******************************************************************//**
Gets the debug information for a reserved mutex. */
UNIV_INTERN
void
mutex_get_debug_info(
/*=================*/
	mutex_t*	mutex,		/*!< in: mutex */
	const char**	file_name,	/*!< out: file where requested */
	ulint*		line,		/*!< out: line where requested */
	os_thread_id_t* thread_id)	/*!< out: id of the thread which owns
					the mutex */
{
	ut_ad(mutex);

	*file_name = mutex->file_name;
	*line	   = mutex->line;
	*thread_id = mutex->thread_id;
}

/******************************************************************//**
Prints debug info of currently reserved mutexes. */
static
void
mutex_list_print_info(
/*==================*/
	FILE*	file)		/*!< in: file where to print */
{
	mutex_t*	mutex;
	const char*	file_name;
	ulint		line;
	os_thread_id_t	thread_id;
	ulint		count		= 0;

	fputs("----------\n"
	      "MUTEX INFO\n"
	      "----------\n", file);

	mutex_enter(&mutex_list_mutex);

	mutex = UT_LIST_GET_FIRST(mutex_list);

	while (mutex != NULL) {
		count++;

		if (mutex_get_lock_word(mutex) != 0) {
			mutex_get_debug_info(mutex, &file_name, &line,
					     &thread_id);
			fprintf(file,
				"Locked mutex: addr %p thread %ld"
				" file %s line %ld\n",
				(void*) mutex, os_thread_pf(thread_id),
				file_name, line);
		}

		mutex = UT_LIST_GET_NEXT(list, mutex);
	}

	fprintf(file, "Total number of mutexes %ld\n", count);

	mutex_exit(&mutex_list_mutex);
}

/******************************************************************//**
Counts currently reserved mutexes. Works only in the debug version.
@return	number of reserved mutexes */
UNIV_INTERN
ulint
mutex_n_reserved(void)
/*==================*/
{
	mutex_t*	mutex;
	ulint		count		= 0;

	mutex_enter(&mutex_list_mutex);

	mutex = UT_LIST_GET_FIRST(mutex_list);

	while (mutex != NULL) {
		if (mutex_get_lock_word(mutex) != 0) {

			count++;
		}

		mutex = UT_LIST_GET_NEXT(list, mutex);
	}

	mutex_exit(&mutex_list_mutex);

	ut_a(count >= 1);

	return(count - 1); /* Subtract one, because this function itself
			   was holding one mutex (mutex_list_mutex) */
}

/******************************************************************//**
Returns TRUE if no mutex or rw-lock is currently locked. Works only in
the debug version.
@return	TRUE if no mutexes and rw-locks reserved */
UNIV_INTERN
ibool
sync_all_freed(void)
/*================*/
{
	return(mutex_n_reserved() + rw_lock_n_locked() == 0);
}

/******************************************************************//**
Gets the value in the nth slot in the thread level arrays.
@return	pointer to thread slot */
static
sync_thread_t*
sync_thread_level_arrays_get_nth(
/*=============================*/
	ulint	n)	/*!< in: slot number */
{
	ut_ad(n < OS_THREAD_MAX_N);

	return(sync_thread_level_arrays + n);
}

/******************************************************************//**
Looks for the thread slot for the calling thread.
@return	pointer to thread slot, NULL if not found */
static
sync_thread_t*
sync_thread_level_arrays_find_slot(void)
/*====================================*/

{
	sync_thread_t*	slot;
	os_thread_id_t	id;
	ulint		i;

	id = os_thread_get_curr_id();

	for (i = 0; i < OS_THREAD_MAX_N; i++) {

		slot = sync_thread_level_arrays_get_nth(i);

		if (slot->levels && os_thread_eq(slot->id, id)) {

			return(slot);
		}
	}

	return(NULL);
}

/******************************************************************//**
Looks for an unused thread slot.
@return	pointer to thread slot */
static
sync_thread_t*
sync_thread_level_arrays_find_free(void)
/*====================================*/

{
	sync_thread_t*	slot;
	ulint		i;

	for (i = 0; i < OS_THREAD_MAX_N; i++) {

		slot = sync_thread_level_arrays_get_nth(i);

		if (slot->levels == NULL) {

			return(slot);
		}
	}

	return(NULL);
}

/******************************************************************//**
Gets the value in the nth slot in the thread level array.
@return	pointer to level slot */
static
sync_level_t*
sync_thread_levels_get_nth(
/*=======================*/
	sync_level_t*	arr,	/*!< in: pointer to level array for an OS
				thread */
	ulint		n)	/*!< in: slot number */
{
	ut_ad(n < SYNC_THREAD_N_LEVELS);

	return(arr + n);
}

/******************************************************************//**
Checks if all the level values stored in the level array are greater than
the given limit.
@return	TRUE if all greater */
static
ibool
sync_thread_levels_g(
/*=================*/
	sync_level_t*	arr,	/*!< in: pointer to level array for an OS
				thread */
	ulint		limit,	/*!< in: level limit */
	ulint		warn)	/*!< in: TRUE=display a diagnostic message */
{
	sync_level_t*	slot;
	rw_lock_t*	lock;
	mutex_t*	mutex;
	ulint		i;

	for (i = 0; i < SYNC_THREAD_N_LEVELS; i++) {

		slot = sync_thread_levels_get_nth(arr, i);

		if (slot->latch != NULL) {
			if (slot->level <= limit) {

				if (!warn) {

					return(FALSE);
				}

				lock = slot->latch;
				mutex = slot->latch;

				fprintf(stderr,
					"InnoDB: sync levels should be"
					" > %lu but a level is %lu\n",
					(ulong) limit, (ulong) slot->level);

				if (mutex->magic_n == MUTEX_MAGIC_N) {
					fprintf(stderr,
						"Mutex created at %s %lu\n",
						mutex->cfile_name,
						(ulong) mutex->cline);

					if (mutex_get_lock_word(mutex) != 0) {
						const char*	file_name;
						ulint		line;
						os_thread_id_t	thread_id;

						mutex_get_debug_info(
							mutex, &file_name,
							&line, &thread_id);

						fprintf(stderr,
							"InnoDB: Locked mutex:"
							" addr %p thread %ld"
							" file %s line %ld\n",
							(void*) mutex,
							os_thread_pf(
								thread_id),
							file_name,
							(ulong) line);
					} else {
						fputs("Not locked\n", stderr);
					}
				} else {
					rw_lock_print(lock);
				}

				return(FALSE);
			}
		}
	}

	return(TRUE);
}

/******************************************************************//**
Checks if the level value is stored in the level array.
@return	TRUE if stored */
static
ibool
sync_thread_levels_contain(
/*=======================*/
	sync_level_t*	arr,	/*!< in: pointer to level array for an OS
				thread */
	ulint		level)	/*!< in: level */
{
	sync_level_t*	slot;
	ulint		i;

	for (i = 0; i < SYNC_THREAD_N_LEVELS; i++) {

		slot = sync_thread_levels_get_nth(arr, i);

		if (slot->latch != NULL) {
			if (slot->level == level) {

				return(TRUE);
			}
		}
	}

	return(FALSE);
}

/******************************************************************//**
Checks if the level array for the current thread contains a
mutex or rw-latch at the specified level.
@return	a matching latch, or NULL if not found */
UNIV_INTERN
void*
sync_thread_levels_contains(
/*========================*/
	ulint	level)			/*!< in: latching order level
					(SYNC_DICT, ...)*/
{
	sync_level_t*	arr;
	sync_thread_t*	thread_slot;
	sync_level_t*	slot;
	ulint		i;

	if (!sync_order_checks_on) {

		return(NULL);
	}

	mutex_enter(&sync_thread_mutex);

	thread_slot = sync_thread_level_arrays_find_slot();

	if (thread_slot == NULL) {

		mutex_exit(&sync_thread_mutex);

		return(NULL);
	}

	arr = thread_slot->levels;

	for (i = 0; i < SYNC_THREAD_N_LEVELS; i++) {

		slot = sync_thread_levels_get_nth(arr, i);

		if (slot->latch != NULL && slot->level == level) {

			mutex_exit(&sync_thread_mutex);
			return(slot->latch);
		}
	}

	mutex_exit(&sync_thread_mutex);

	return(NULL);
}

/******************************************************************//**
Checks that the level array for the current thread is empty.
@return	a latch, or NULL if empty except the exceptions specified below */
UNIV_INTERN
void*
sync_thread_levels_nonempty_gen(
/*============================*/
	ibool	dict_mutex_allowed)	/*!< in: TRUE if dictionary mutex is
					allowed to be owned by the thread,
					also purge_is_running mutex is
					allowed */
{
	sync_level_t*	arr;
	sync_thread_t*	thread_slot;
	sync_level_t*	slot;
	ulint		i;

	if (!sync_order_checks_on) {

		return(NULL);
	}

	mutex_enter(&sync_thread_mutex);

	thread_slot = sync_thread_level_arrays_find_slot();

	if (thread_slot == NULL) {

		mutex_exit(&sync_thread_mutex);

		return(NULL);
	}

	arr = thread_slot->levels;

	for (i = 0; i < SYNC_THREAD_N_LEVELS; i++) {

		slot = sync_thread_levels_get_nth(arr, i);

		if (slot->latch != NULL
		    && (!dict_mutex_allowed
			|| (slot->level != SYNC_DICT
			    && slot->level != SYNC_DICT_OPERATION))) {

			mutex_exit(&sync_thread_mutex);
			ut_error;

			return(slot->latch);
		}
	}

	mutex_exit(&sync_thread_mutex);

	return(NULL);
}

/******************************************************************//**
Checks that the level array for the current thread is empty.
@return	TRUE if empty */
UNIV_INTERN
ibool
sync_thread_levels_empty(void)
/*==========================*/
{
	return(sync_thread_levels_empty_gen(FALSE));
}

/******************************************************************//**
Adds a latch and its level in the thread level array. Allocates the memory
for the array if called first time for this OS thread. Makes the checks
against other latch levels stored in the array for this thread. */
UNIV_INTERN
void
sync_thread_add_level(
/*==================*/
	void*	latch,	/*!< in: pointer to a mutex or an rw-lock */
	ulint	level)	/*!< in: level in the latching order; if
			SYNC_LEVEL_VARYING, nothing is done */
{
	sync_level_t*	array;
	sync_level_t*	slot;
	sync_thread_t*	thread_slot;
	ulint		i;

	if (!sync_order_checks_on) {

		return;
	}

	if ((latch == (void*)&sync_thread_mutex)
	    || (latch == (void*)&mutex_list_mutex)
	    || (latch == (void*)&rw_lock_debug_mutex)
	    || (latch == (void*)&rw_lock_list_mutex)) {

		return;
	}

	if (level == SYNC_LEVEL_VARYING) {

		return;
	}

	mutex_enter(&sync_thread_mutex);

	thread_slot = sync_thread_level_arrays_find_slot();

	if (thread_slot == NULL) {
		/* We have to allocate the level array for a new thread */
		array = ut_malloc(sizeof(sync_level_t) * SYNC_THREAD_N_LEVELS);

		thread_slot = sync_thread_level_arrays_find_free();

		thread_slot->id = os_thread_get_curr_id();
		thread_slot->levels = array;

		for (i = 0; i < SYNC_THREAD_N_LEVELS; i++) {

			slot = sync_thread_levels_get_nth(array, i);

			slot->latch = NULL;
		}
	}

	array = thread_slot->levels;

	/* NOTE that there is a problem with _NODE and _LEAF levels: if the
	B-tree height changes, then a leaf can change to an internal node
	or the other way around. We do not know at present if this can cause
	unnecessary assertion failures below. */

	switch (level) {
	case SYNC_NO_ORDER_CHECK:
	case SYNC_EXTERN_STORAGE:
	case SYNC_TREE_NODE_FROM_HASH:
		/* Do no order checking */
		break;
	case SYNC_MEM_POOL:
	case SYNC_MEM_HASH:
	case SYNC_RECV:
	case SYNC_WORK_QUEUE:
	case SYNC_LOG:
	case SYNC_THR_LOCAL:
	case SYNC_ANY_LATCH:
	case SYNC_TRX_SYS_HEADER:
	case SYNC_FILE_FORMAT_TAG:
	case SYNC_DOUBLEWRITE:
	case SYNC_BUF_POOL:
	case SYNC_SEARCH_SYS:
	case SYNC_SEARCH_SYS_CONF:
	case SYNC_TRX_LOCK_HEAP:
	case SYNC_KERNEL:
	case SYNC_IBUF_BITMAP_MUTEX:
	case SYNC_RSEG:
	case SYNC_TRX_UNDO:
	case SYNC_PURGE_LATCH:
	case SYNC_PURGE_SYS:
	case SYNC_DICT_AUTOINC_MUTEX:
	case SYNC_DICT_OPERATION:
	case SYNC_DICT_HEADER:
	case SYNC_TRX_I_S_RWLOCK:
	case SYNC_TRX_I_S_LAST_READ:
		if (!sync_thread_levels_g(array, level, TRUE)) {
			fprintf(stderr,
				"InnoDB: sync_thread_levels_g(array, %lu)"
				" does not hold!\n", level);
			ut_error;
		}
		break;
	case SYNC_BUF_BLOCK:
		/* Either the thread must own the buffer pool mutex
		(buf_pool_mutex), or it is allowed to latch only ONE
		buffer block (block->mutex or buf_pool_zip_mutex). */
		if (!sync_thread_levels_g(array, level, FALSE)) {
			ut_a(sync_thread_levels_g(array, level - 1, TRUE));
			ut_a(sync_thread_levels_contain(array, SYNC_BUF_POOL));
		}
		break;
	case SYNC_REC_LOCK:
		if (sync_thread_levels_contain(array, SYNC_KERNEL)) {
			ut_a(sync_thread_levels_g(array, SYNC_REC_LOCK - 1,
						  TRUE));
		} else {
			ut_a(sync_thread_levels_g(array, SYNC_REC_LOCK, TRUE));
		}
		break;
	case SYNC_IBUF_BITMAP:
		/* Either the thread must own the master mutex to all
		the bitmap pages, or it is allowed to latch only ONE
		bitmap page. */
		if (sync_thread_levels_contain(array,
					       SYNC_IBUF_BITMAP_MUTEX)) {
			ut_a(sync_thread_levels_g(array, SYNC_IBUF_BITMAP - 1,
						  TRUE));
		} else {
			ut_a(sync_thread_levels_g(array, SYNC_IBUF_BITMAP,
						  TRUE));
		}
		break;
	case SYNC_FSP_PAGE:
		ut_a(sync_thread_levels_contain(array, SYNC_FSP));
		break;
	case SYNC_FSP:
		ut_a(sync_thread_levels_contain(array, SYNC_FSP)
		     || sync_thread_levels_g(array, SYNC_FSP, TRUE));
		break;
	case SYNC_TRX_UNDO_PAGE:
		ut_a(sync_thread_levels_contain(array, SYNC_TRX_UNDO)
		     || sync_thread_levels_contain(array, SYNC_RSEG)
		     || sync_thread_levels_contain(array, SYNC_PURGE_SYS)
		     || sync_thread_levels_g(array, SYNC_TRX_UNDO_PAGE, TRUE));
		break;
	case SYNC_RSEG_HEADER:
		ut_a(sync_thread_levels_contain(array, SYNC_RSEG));
		break;
	case SYNC_RSEG_HEADER_NEW:
		ut_a(sync_thread_levels_contain(array, SYNC_KERNEL)
		     && sync_thread_levels_contain(array, SYNC_FSP_PAGE));
		break;
	case SYNC_TREE_NODE:
		ut_a(sync_thread_levels_contain(array, SYNC_INDEX_TREE)
		     || sync_thread_levels_contain(array, SYNC_DICT_OPERATION)
		     || sync_thread_levels_g(array, SYNC_TREE_NODE - 1, TRUE));
		break;
	case SYNC_TREE_NODE_NEW:
		ut_a(sync_thread_levels_contain(array, SYNC_FSP_PAGE)
		     || sync_thread_levels_contain(array, SYNC_IBUF_MUTEX));
		break;
	case SYNC_INDEX_TREE:
		if (sync_thread_levels_contain(array, SYNC_IBUF_MUTEX)
		    && sync_thread_levels_contain(array, SYNC_FSP)) {
			ut_a(sync_thread_levels_g(array, SYNC_FSP_PAGE - 1,
						  TRUE));
		} else {
			ut_a(sync_thread_levels_g(array, SYNC_TREE_NODE - 1,
						  TRUE));
		}
		break;
	case SYNC_IBUF_MUTEX:
		ut_a(sync_thread_levels_g(array, SYNC_FSP_PAGE - 1, TRUE));
		break;
	case SYNC_IBUF_PESS_INSERT_MUTEX:
		ut_a(sync_thread_levels_g(array, SYNC_FSP - 1, TRUE));
		ut_a(!sync_thread_levels_contain(array, SYNC_IBUF_MUTEX));
		break;
	case SYNC_IBUF_HEADER:
		ut_a(sync_thread_levels_g(array, SYNC_FSP - 1, TRUE));
		ut_a(!sync_thread_levels_contain(array, SYNC_IBUF_MUTEX));
		ut_a(!sync_thread_levels_contain(array,
						 SYNC_IBUF_PESS_INSERT_MUTEX));
		break;
	case SYNC_DICT:
#ifdef UNIV_DEBUG
		ut_a(buf_debug_prints
		     || sync_thread_levels_g(array, SYNC_DICT, TRUE));
#else /* UNIV_DEBUG */
		ut_a(sync_thread_levels_g(array, SYNC_DICT, TRUE));
#endif /* UNIV_DEBUG */
		break;
	default:
		ut_error;
	}

	for (i = 0; i < SYNC_THREAD_N_LEVELS; i++) {

		slot = sync_thread_levels_get_nth(array, i);

		if (slot->latch == NULL) {
			slot->latch = latch;
			slot->level = level;

			break;
		}
	}

	ut_a(i < SYNC_THREAD_N_LEVELS);

	mutex_exit(&sync_thread_mutex);
}

/******************************************************************//**
Removes a latch from the thread level array if it is found there.
@return TRUE if found in the array; it is no error if the latch is
not found, as we presently are not able to determine the level for
every latch reservation the program does */
UNIV_INTERN
ibool
sync_thread_reset_level(
/*====================*/
	void*	latch)	/*!< in: pointer to a mutex or an rw-lock */
{
	sync_level_t*	array;
	sync_level_t*	slot;
	sync_thread_t*	thread_slot;
	ulint		i;

	if (!sync_order_checks_on) {

		return(FALSE);
	}

	if ((latch == (void*)&sync_thread_mutex)
	    || (latch == (void*)&mutex_list_mutex)
	    || (latch == (void*)&rw_lock_debug_mutex)
	    || (latch == (void*)&rw_lock_list_mutex)) {

		return(FALSE);
	}

	mutex_enter(&sync_thread_mutex);

	thread_slot = sync_thread_level_arrays_find_slot();

	if (thread_slot == NULL) {

		ut_error;

		mutex_exit(&sync_thread_mutex);
		return(FALSE);
	}

	array = thread_slot->levels;

	for (i = 0; i < SYNC_THREAD_N_LEVELS; i++) {

		slot = sync_thread_levels_get_nth(array, i);

		if (slot->latch == latch) {
			slot->latch = NULL;

			mutex_exit(&sync_thread_mutex);

			return(TRUE);
		}
	}

	if (((mutex_t*) latch)->magic_n != MUTEX_MAGIC_N) {
		rw_lock_t*	rw_lock;

		rw_lock = (rw_lock_t*) latch;

		if (rw_lock->level == SYNC_LEVEL_VARYING) {
			mutex_exit(&sync_thread_mutex);

			return(TRUE);
		}
	}

	ut_error;

	mutex_exit(&sync_thread_mutex);

	return(FALSE);
}
#endif /* UNIV_SYNC_DEBUG */

/******************************************************************//**
Initializes the synchronization data structures. */
UNIV_INTERN
void
sync_init(void)
/*===========*/
{
#ifdef UNIV_SYNC_DEBUG
	sync_thread_t*	thread_slot;
	ulint		i;
#endif /* UNIV_SYNC_DEBUG */

	ut_a(sync_initialized == FALSE);

	sync_initialized = TRUE;

	/* Create the primary system wait array which is protected by an OS
	mutex */

	sync_primary_wait_array = sync_array_create(OS_THREAD_MAX_N,
						    SYNC_ARRAY_OS_MUTEX);
#ifdef UNIV_SYNC_DEBUG
	/* Create the thread latch level array where the latch levels
	are stored for each OS thread */

	sync_thread_level_arrays = ut_malloc(OS_THREAD_MAX_N
					     * sizeof(sync_thread_t));
	for (i = 0; i < OS_THREAD_MAX_N; i++) {

		thread_slot = sync_thread_level_arrays_get_nth(i);
		thread_slot->levels = NULL;
	}
#endif /* UNIV_SYNC_DEBUG */
	/* Init the mutex list and create the mutex to protect it. */

	UT_LIST_INIT(mutex_list);
	mutex_create(&mutex_list_mutex, SYNC_NO_ORDER_CHECK);
#ifdef UNIV_SYNC_DEBUG
	mutex_create(&sync_thread_mutex, SYNC_NO_ORDER_CHECK);
#endif /* UNIV_SYNC_DEBUG */

	/* Init the rw-lock list and create the mutex to protect it. */

	UT_LIST_INIT(rw_lock_list);
	mutex_create(&rw_lock_list_mutex, SYNC_NO_ORDER_CHECK);

#ifdef UNIV_SYNC_DEBUG
	mutex_create(&rw_lock_debug_mutex, SYNC_NO_ORDER_CHECK);

	rw_lock_debug_event = os_event_create(NULL);
	rw_lock_debug_waiters = FALSE;
#endif /* UNIV_SYNC_DEBUG */
}

/******************************************************************//**
Frees the resources in InnoDB's own synchronization data structures. Use
os_sync_free() after calling this. */
UNIV_INTERN
void
sync_close(void)
/*===========*/
{
	mutex_t*	mutex;

	sync_array_free(sync_primary_wait_array);

	mutex = UT_LIST_GET_FIRST(mutex_list);

	while (mutex) {
#ifdef UNIV_MEM_DEBUG
		if (mutex == &mem_hash_mutex) {
			mutex = UT_LIST_GET_NEXT(list, mutex);
			continue;
		}
#endif /* UNIV_MEM_DEBUG */
		mutex_free(mutex);
		mutex = UT_LIST_GET_FIRST(mutex_list);
	}

	mutex_free(&mutex_list_mutex);
#ifdef UNIV_SYNC_DEBUG
	mutex_free(&sync_thread_mutex);

	/* Switch latching order checks off in sync0sync.c */
	sync_order_checks_on = FALSE;
#endif /* UNIV_SYNC_DEBUG */

	sync_initialized = FALSE;
}

/*******************************************************************//**
Prints wait info of the sync system. */
UNIV_INTERN
void
sync_print_wait_info(
/*=================*/
	FILE*	file)		/*!< in: file where to print */
{
#ifdef UNIV_SYNC_DEBUG
	fprintf(file, "Mutex exits %llu, rws exits %llu, rwx exits %llu\n",
		mutex_exit_count, rw_s_exit_count, rw_x_exit_count);
#endif

	fprintf(file,
		"Mutex spin waits %llu, rounds %llu, OS waits %llu\n"
		"RW-shared spins %llu, OS waits %llu;"
		" RW-excl spins %llu, OS waits %llu\n",
		mutex_spin_wait_count,
		mutex_spin_round_count,
		mutex_os_wait_count,
		rw_s_spin_wait_count,
		rw_s_os_wait_count,
		rw_x_spin_wait_count,
		rw_x_os_wait_count);

	fprintf(file,
		"Spin rounds per wait: %.2f mutex, %.2f RW-shared, "
		"%.2f RW-excl\n",
		(double) mutex_spin_round_count /
		(mutex_spin_wait_count ? mutex_spin_wait_count : 1),
		(double) rw_s_spin_round_count /
		(rw_s_spin_wait_count ? rw_s_spin_wait_count : 1),
		(double) rw_x_spin_round_count /
		(rw_x_spin_wait_count ? rw_x_spin_wait_count : 1));
}

/*******************************************************************//**
Prints info of the sync system. */
UNIV_INTERN
void
sync_print(
/*=======*/
	FILE*	file)		/*!< in: file where to print */
{
#ifdef UNIV_SYNC_DEBUG
	mutex_list_print_info(file);

	rw_lock_list_print_info(file);
#endif /* UNIV_SYNC_DEBUG */

	sync_array_print_info(file, sync_primary_wait_array);

	sync_print_wait_info(file);
}