Reduce the number of Elasticsearch client instances that are created

When collecting instance profile credentials in AWS, each client instantiation is a HTTP request to an external web service. This service may rate-limit us if we perform too many requests in a given time period. A typical GitLab deployment will have many processes running, and each of those processes needs *one* elasticsearch client instance. The client instance is thread-safe and handles concurrent requests very well, with HTTP keep-alive connections. Prior to this commit, each process using elasticsearch would instantiate one client per *class* that used `Elasticsearch::Model::Client`. So a multi-node setup might look like: ``` * Server A * Unicorn parent * Unicorn child A * Client for Project class * Client for Repository class * Client for Issue class * ... * Unicorn child B * Client for Project class * Client for Repository class * Client for Issue class * ... * Sidekiq * Client for Project class * Client for Repository class * Client for Issue class * ... * Server B * Unicorn master * ... (same as above) * Sidekiq * .... (same as above) ``` (total: N, plus N per unicorn child, multipled by the number of servers) Following this commit, we have the following clients instead: ``` * Server A * Sidekiq (1 client) * Unicorn parent * Unicorn child A (1 client) * Unicorn child b (1 client) * Server B * ... (same as above) ``` (total: 1, + 1 per unicorn child, multipled by the number of servers) This drastically reduces the number of HTTP connections we make to the Elasticsearch and AWS instance profile credentials servers, and should come with a small increase in performance due to better utilisation of those connections.

Reduce the number of Elasticsearch client instances that are created
When collecting instance profile credentials in AWS, each client instantiation is a HTTP request to an external web service. This service may rate-limit us if we perform too many requests in a given time period. A typical GitLab deployment will have many processes running, and each of those processes needs *one* elasticsearch client instance. The client instance is thread-safe and handles concurrent requests very well, with HTTP keep-alive connections. Prior to this commit, each process using elasticsearch would instantiate one client per *class* that used `Elasticsearch::Model::Client`. So a multi-node setup might look like: ``` * Server A * Unicorn parent * Unicorn child A * Client for Project class * Client for Repository class * Client for Issue class * ... * Unicorn child B * Client for Project class * Client for Repository class * Client for Issue class * ... * Sidekiq * Client for Project class * Client for Repository class * Client for Issue class * ... * Server B * Unicorn master * ... (same as above) * Sidekiq * .... (same as above) ``` (total: N, plus N per unicorn child, multipled by the number of servers) Following this commit, we have the following clients instead: ``` * Server A * Sidekiq (1 client) * Unicorn parent * Unicorn child A (1 client) * Unicorn child b (1 client) * Server B * ... (same as above) ``` (total: 1, + 1 per unicorn child, multipled by the number of servers) This drastically reduces the number of HTTP connections we make to the Elasticsearch and AWS instance profile credentials servers, and should come with a small increase in performance due to better utilisation of those connections.
d87f2518 · Nick Thomas · 7f9dac98 · d87f2518 · d87f2518 · d87f2518
Commit d87f2518 authored Nov 16, 2017 by Nick Thomas
3 changed files
--- a/changelogs/unreleased-ee/3650-persist-elasticsearch-client.yml
+++ b/changelogs/unreleased-ee/3650-persist-elasticsearch-client.yml
+---
+title: Reduce the number of Elasticsearch client instances that are created
+merge_request: 3432
+author:
+type: fixed
--- a/config/initializers/elastic_client_setup.rb
+++ b/config/initializers/elastic_client_setup.rb
@@ -5,20 +5,38 @@ require 'gitlab/current_settings'
 module Elasticsearch
  module Model
    module Client
+      # This mutex is only used to synchronize *creation* of a new client, so
+      # all including classes can share the same client instance
+      CLIENT_MUTEX = Mutex.new
+
+      cattr_accessor :cached_client
+      cattr_accessor :cached_config
+
      module ClassMethods
        include Gitlab::CurrentSettings

-        def client(client = nil)
-          if @client.nil? || es_configuration_changed?
-            @es_config = current_application_settings.elasticsearch_config
-            @client = ::Gitlab::Elastic::Client.build(@es_config)
-          end
+        # Override the default ::Elasticsearch::Model::Client implementation to
+        # return a client configured from application settings. All including
+        # classes will use the same instance, which is refreshed automatically
+        # if the settings change.
+        #
+        # _client is present to match the arity of the overridden method, where
+        # it is also not used.
+        #
+        # @return [Elasticsearch::Transport::Client]
+        def client(_client = nil)
+          store = ::Elasticsearch::Model::Client

-          @client
-        end
+          store::CLIENT_MUTEX.synchronize do
+            config = current_application_settings.elasticsearch_config
+
+            if store.cached_client.nil? || config != store.cached_config
+              store.cached_client = ::Gitlab::Elastic::Client.build(config)
+              store.cached_config = config
+            end
+          end

-        def es_configuration_changed?
-          @es_config != current_application_settings.elasticsearch_config
+          store.cached_client
        end
      end
    end

--- a/spec/elastic_integration/elasticsearch_model_client_spec.rb
+++ b/spec/elastic_integration/elasticsearch_model_client_spec.rb
+require 'spec_helper'
+
+# This module is monkey-patched in config/initializers/elastic_client_setup.rb
+describe "Monkey-patches to ::Elasticsearch::Model::Client" do
+  before do
+    stub_application_setting(elasticsearch_url: ['http://localhost:9200'])
+  end
+
+  it 'uses the same client instance for all subclasses' do
+    a = Class.new { include ::Elasticsearch::Model }
+    b = Class.new { include ::Elasticsearch::Model }
+    c = Class.new(b)
+
+    expect(::Gitlab::Elastic::Client).to receive(:build).with(anything) { :fake_client }.once
+
+    # Ensure that the same client instance is used between classes and between
+    # requests
+    [a, b, c, b, c, b, a].each do |klass|
+      expect(klass.__elasticsearch__.client).to eq(:fake_client)
+    end
+  end
+end