Commit 30e5a9bd authored by unknown

Post-vacation-musing fixes to WL#3071 "Maria checkpoint":

Changes to how synchronous checkpoint requests are executed, and to how
the background LRD flushing thread refrains from using all resources.
See the comments on each file below.


storage/maria/checkpoint.c:
  I was not happy that checkpoint requests which want to know the
  success/error of their request got inaccurate information in case of error
  (no error string, etc.). Instead of implementing a more complete
  communication protocol between requestor and executor, I make the requestor
  do the execution itself; I call this a synchronous checkpoint.
  For asynchronous checkpoints (the requestor does not want to know
  success/error and does not want to wait for completion), nothing changes:
  the checkpoint is still executed by the background thread (see the sketch
  after these notes). Also fixes to comments, constants and mutex usage.
storage/maria/checkpoint.h:
  New prototypes of the "API" (the calls exposed by the checkpoint module).
storage/maria/least_recently_dirtied.c:
  A better solution than sleeping one second after flushing a piece of the
  LRD: instead we pthread_yield(). Hopefully this slows the background
  thread down (keeping it from using all the disk's bandwidth) when other
  threads are competing, and does not slow it down when this thread is alone
  (where we do want it to run fast, without useless sleeps).
  This thread now probes for asynchronous checkpoint requests every few
  seconds.
parent c262b880
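
To make the "requestor executes it itself" design concrete, here is a
minimal caller-side sketch. Only the two prototypes and CHECKPOINT_LEVEL
come from the patch's checkpoint.h below; the wrapper maria_checkpoint()
is hypothetical:

/* Hypothetical dispatch wrapper, for illustration only */
static my_bool maria_checkpoint(CHECKPOINT_LEVEL level,
                                my_bool wait_for_completion)
{
  if (wait_for_completion)
  {
    /*
      The requestor runs the checkpoint in its own thread, so on error it
      has the complete context (error code, error string) available.
    */
    return execute_synchronous_checkpoint(level);
  }
  /* Fire-and-forget: the background thread will execute it later. */
  request_asynchronous_checkpoint(level);
  return FALSE;
}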
@@ -13,11 +13,7 @@ typedef enum enum_checkpoint_level {
   FULL /* also flush all dirty pages */
 } CHECKPOINT_LEVEL;
 
-/*
-  Call this when you want to request a checkpoint.
-  In real life it will be called by log_write_record() and by client thread
-  which explicitly wants to do checkpoint (ALTER ENGINE CHECKPOINT
-  checkpoint_level).
-*/
-int request_checkpoint(CHECKPOINT_LEVEL level, my_bool wait_for_completion);
+void request_asynchronous_checkpoint(CHECKPOINT_LEVEL level);
+my_bool execute_synchronous_checkpoint(CHECKPOINT_LEVEL level);
+my_bool execute_asynchronous_checkpoint_if_any();
 /* that's all that's needed in the interface */
@@ -48,6 +48,15 @@
   Key cache has grouping already somehow Monty said (investigate that).
 */
 #define FLUSH_GROUP_SIZE 512 /* 8 MB */
+/*
+  We don't want to probe for checkpoint requests all the time (it takes
+  the log mutex).
+  If FLUSH_GROUP_SIZE is 8MB, assuming a local disk which can write 30MB/s
+  (1.8GB/min), probing every 16th call to flush_one_group_from_LRD() is
+  every 16*8=128MB, which is every 128/30=4.3 seconds.
+  Using a power of 2 gives a fast modulo operation.
+*/
+#define CHECKPOINT_PROBING_PERIOD_LOG2 4
 
 /*
   This thread does background flush of pieces of the LRD, and all checkpoints.
@@ -56,19 +65,19 @@
 pthread_handler_decl background_flush_and_checkpoint_thread()
 {
   char *flush_group_buffer= my_malloc(PAGE_SIZE*FLUSH_GROUP_SIZE);
+  uint flush_calls= 0;
   while (this_thread_not_killed)
   {
-    lock(log_mutex);
-    if (checkpoint_request)
-      checkpoint(); /* will unlock mutex */
-    else
-    {
-      unlock(log_mutex);
-      lock(global_LRD_mutex);
-      flush_one_group_from_LRD();
-      safemutex_assert_not_owner(global_LRD_mutex);
-    }
-    my_sleep(1000000); /* one second ? */
+    if ((flush_calls++ & ((1 << CHECKPOINT_PROBING_PERIOD_LOG2) - 1)) == 0)
+      execute_asynchronous_checkpoint_if_any();
+    lock(global_LRD_mutex);
+    flush_one_group_from_LRD();
+    safemutex_assert_not_owner(global_LRD_mutex);
+    /*
+      We are a background thread, leave time for client threads or we would
+      monopolize the disk:
+    */
+    pthread_yield();
   }
   my_free(flush_group_buffer);
 }
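
A note on the parenthesization of the probe condition above: in C, ==
binds tighter than &, so the superficially natural form
flush_calls++ & MASK == 0 would evaluate flush_calls++ & (MASK == 0) and
never probe. The mask (1 << CHECKPOINT_PROBING_PERIOD_LOG2) - 1 is 15, so
the low four bits being zero means the counter is a multiple of 16. A
small standalone check of the cadence (all names are local to this
snippet):

#include <assert.h>

#define CHECKPOINT_PROBING_PERIOD_LOG2 4  /* probe every 2^4 = 16th call */

int main(void)
{
  unsigned int flush_calls, probes= 0;
  for (flush_calls= 0; flush_calls < 160; flush_calls++)
  {
    /* low LOG2 bits all zero <=> flush_calls is a multiple of 16 */
    if ((flush_calls & ((1U << CHECKPOINT_PROBING_PERIOD_LOG2) - 1)) == 0)
      probes++;
  }
  assert(probes == 10);  /* 160 calls / 16 per probe = 10 probes */
  return 0;
}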
@@ -155,6 +164,12 @@ flush_one_group_from_LRD()
   */
   }
   free(array);
+  /*
+    MikaelR noted that he observed that Linux's file cache may never fsync
+    to disk until this cache is full, at which point it decides to empty
+    the cache, making the machine very slow. A solution was to fsync after
+    writing 2 MB.
+  */
 }
 
 /* flushes all pages from LRD up to approximately rec_lsn>=max_lsn */
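
As a sketch of the fsync-after-2-MB idea the new comment records (the
patch itself only notes the observation; write_with_periodic_fsync() and
FSYNC_EVERY_BYTES are hypothetical names, assuming a POSIX file
descriptor):

#include <unistd.h>

#define FSYNC_EVERY_BYTES (2*1024*1024) /* the 2 MB from the note above */

/*
  Write a buffer, fsync()ing every ~2 MB so the OS file cache never
  accumulates one huge burst that stalls the machine when it is emptied.
*/
static int write_with_periodic_fsync(int fd, const char *buf, size_t len)
{
  size_t done= 0, since_sync= 0;
  while (done < len)
  {
    ssize_t w= write(fd, buf + done, len - done);
    if (w < 0)
      return 1;
    done+= (size_t)w;
    since_sync+= (size_t)w;
    if (since_sync >= FSYNC_EVERY_BYTES)
    {
      if (fsync(fd))
        return 1;
      since_sync= 0;
    }
  }
  return 0;
}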
@@ -165,7 +180,7 @@ int flush_all_LRD_to_lsn(LSN max_lsn)
     max_lsn= LRD->first->prev->rec_lsn;
   while (LRD->first->rec_lsn < max_lsn)
   {
-    if (flush_one_group_from_LRD()) /* will unlock mutex */
+    if (flush_one_group_from_LRD()) /* will unlock LRD mutex */
       return 1;
     /* scheduler may preempt us here so that we don't take full CPU */
     lock(global_LRD_mutex);
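
Finally, a hedged sketch of a possible caller, in the same pseudocode
style as the file. Judging from the loop above, flush_all_LRD_to_lsn() is
entered with global_LRD_mutex held (it reads LRD->first under it); the
caller name and its use during a checkpoint are assumptions:

/* Hypothetical caller: flush the LRD up to the checkpoint's start LSN
   before the checkpoint record is written. */
static int checkpoint_flush_LRD(LSN checkpoint_start_lsn)
{
  lock(global_LRD_mutex); /* flush_all_LRD_to_lsn() expects it held */
  return flush_all_LRD_to_lsn(checkpoint_start_lsn);
}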