Commit c7c495a0 authored by Vitali Tatarintev's avatar Vitali Tatarintev

Merge branch 'bvl-rename-target-duration-to-urgency' into 'master'

Rename the target_duration to urgency

See merge request gitlab-org/gitlab!71573
parents 4fc1e318 5de524bc
......@@ -139,8 +139,8 @@ class ApplicationController < ActionController::Base
self.class.feature_category_for_action(action_name).to_s
end
def target_duration
self.class.target_duration_for_action(action_name)
def urgency
self.class.urgency_for_action(action_name)
end
protected
......
......@@ -23,7 +23,7 @@ class SearchController < ApplicationController
layout 'search'
feature_category :global_search
target_duration :very_fast, [:opensearch]
urgency :high, [:opensearch]
def show
@project = search_service.project
......
......@@ -26,7 +26,8 @@ regular controller endpoints. It consists of these counters:
1. `gitlab_sli:rails_request_apdex:success_total`: This counter gets
incremented for every successful request that performed faster than
the [defined target duration](#adjusting-request-target-duration).
the [defined target duration depending on the endpoint's
urgency](#adjusting-request-urgency).
Both these counters are labeled with:
......@@ -52,15 +53,19 @@ For example: for the web-service, we want at least 99.8% of requests
to be faster than their target duration.
These are the targets we use for alerting and service montoring. So
durations should be set keeping those into account.
durations should be set keeping those into account. So we would not
cause alerts. But the goal would be to set the urgency to a target
that users would be satisfied with.
Both successful measurements and unsuccessful ones have an impact on the
error budget for stage groups.
## Adjusting request target duration
## Adjusting request urgency
Not all endpoints perform the same type of work, so it is possible to
define different durations for different endpoints.
define different urgencies for different endpoints. An endpoint with a
lower urgency can have a longer request duration than endpoints that
are high urgency.
Long-running requests are more expensive for our
infrastructure: while one request is being served, the thread remains
......@@ -71,9 +76,9 @@ process. The request is in fact a noisy neighbor for other requests
handled by the worker. This is why the upper bound for a target
duration is capped at 5 seconds.
## Increasing the target duration (setting a slower target)
## Decreasing the urgency (setting a higher target duration)
Increasing the target duration on an existing endpoint can be done on
Increasing the urgency on an existing endpoint can be done on
a case-by-case basis. Please take the following into account:
1. Apdex is about perceived performance, if a user is actively waiting
......@@ -84,10 +89,10 @@ a case-by-case basis. Please take the following into account:
A product manager can help to identify how an endpoint is used.
1. The workload for some endpoints can sometimes differ greatly
depending on the parameters specified by the caller. The target
duration needs to accomodate that. In some cases, it might be
interesting to define a separate [application
SLI](index.md#defining-a-new-sli) for what the endpoint is doing.
depending on the parameters specified by the caller. The urgency
needs to accomodate that. In some cases, it might be interesting to
define a separate [application SLI](index.md#defining-a-new-sli)
for what the endpoint is doing.
When the endpoints in certain cases turn into no-ops, making them
very fast, we should ignore these fast requests when setting the
......@@ -99,11 +104,12 @@ a case-by-case basis. Please take the following into account:
1. Consider the dependent resources consumed by the endpoint. If the endpoint
loads a lot of data from Gitaly or the database and this is causing
it to not perform satisfactory. It could be better to optimize the
way the data is loaded rather than increasing the target duration.
way the data is loaded rather than increasing the target duration
by lowering the urgency.
In cases like this, it might be appropriate to temporarily increase
the duration to make the endpoint meet SLO, if this is bearable for
the infrastructure. In such cases, please link an issue from a code
In cases like this, it might be appropriate to temporarily decrease
urgency to make the endpoint meet SLO, if this is bearable for the
infrastructure. In such cases, please link an issue from a code
comment.
If the endpoint consumes a lot of CPU time, we should also consider
......@@ -117,20 +123,19 @@ a case-by-case basis. Please take the following into account:
view. We cannot scale up the fleet fast enough to accomodate for
the incoming slow requests alongside the regular traffic.
When increasing the target duration for an existing endpoint, please
involve a [Scalability team
member](https://about.gitlab.com/handbook/engineering/infrastructure/team/scalability/#team-members)
When lowering the urgency for an existing endpoint, please involve a
[Scalability team member](https://about.gitlab.com/handbook/engineering/infrastructure/team/scalability/#team-members)
in the review. We can use request rates and durations available in the
logs to come up with a recommendation. Picking a threshold can be done
using the same process as for [decreasing a target
duration](#decreasing-a-target-duration-setting-a-faster-target), picking a duration that is
higher than the SLO for the service.
using the same process as for [increasing
urgency](#increasing-urgency-setting-a-lower-target-duration), picking
a duration that is higher than the SLO for the service.
We shouldn't set the longest durations on endpoints in the merge
requests that introduces them, since we don't yet have data to support
the decision.
## Decreasing a target duration (setting a faster target)
## Increasing urgency (setting a lower target duration)
When decreasing the target duration, we need to make sure the endpoint
still meets SLO for the fleet that handles the request. You can use the
......@@ -158,62 +163,61 @@ Since decreasing a threshold too much could result in alerts for the
apdex degradation, please also involve a Scalability team member in
the merge reqeust.
## How to adjust the target duration
## How to adjust the urgency
The target duration can be specified similar to how endpoints [get a
feature category](../feature_categorization/index.md).
The urgency can be specified similar to how endpoints [get a feature
category](../feature_categorization/index.md).
For endpoints that don't have a specific target, the default of 1s
(medium) will be used.
For endpoints that don't have a specific target, the default urgency (1s duration) will be used.
The following configurations are available:
| Name | Duration in seconds | Notes |
|------------|---------------------|-----------------------------------------------|
| :very_fast | 0.25s | |
| :fast | 0.5s | |
| :medium | 1s | This is the default when nothing is specified |
| :slow | 5s | |
| Urgency | Duration in seconds | Notes |
|----------|---------------------|-----------------------------------------------|
| :high | 0.25s | |
| :medium | 0.5s | |
| :default | 1s | This is the default when nothing is specified |
| :low | 5s | |
### Rails controller
A duration can be specified for all actions in a controller like this:
An urgency can be specified for all actions in a controller like this:
```ruby
class Boards::ListsController < ApplicationController
target_duration :fast
urgency :high
end
```
To specify the duration also for certain actions in a controller, they
To specify the urgency also for certain actions in a controller, they
can be specified like this:
```ruby
class Boards::ListsController < ApplicationController
target_duration :fast, [:index, :show]
urgency :high, [:index, :show]
end
```
### Grape endpoints
To specify the duration for an entire API class, this can be done as
To specify the urgency for an entire API class, this can be done as
follows:
```ruby
module API
class Issues < ::API::Base
target_duration :slow
urgency :low
end
end
```
To specify the duration also for certain actions in a API class, they
To specify the urgency also for certain actions in a API class, they
can be specified like this:
```ruby
module API
class Issues < ::API::Base
target_duration :fast, [
urgency :medium, [
'/groups/:id/issues',
'/groups/:id/issues_statistics'
]
......@@ -221,10 +225,10 @@ module API
end
```
Or, we can specify a custom duration per endpoint:
Or, we can specify the urgency per endpoint:
```ruby
get 'client/features', target_duration: :fast do
get 'client/features', urgency: :low do
# endpoint logic
end
```
......@@ -9,8 +9,8 @@ module API
feature_category_for_action(path_for_app(app))
end
def target_duration_for_app(app)
target_duration_for_action(path_for_app(app))
def urgency_for_app(app)
urgency_for_action(path_for_app(app))
end
def path_for_app(app)
......@@ -27,8 +27,8 @@ module API
feature_category(category, actions)
end
if target = route_options.delete(:target_duration)
target_duration(target, actions)
if target = route_options.delete(:urgency)
urgency(target, actions)
end
super
......
......@@ -164,7 +164,7 @@ module API
#
# Check whether an SSH key is known to GitLab
#
get '/authorized_keys', feature_category: :source_code_management, target_duration: :very_fast do
get '/authorized_keys', feature_category: :source_code_management, urgency: :high do
fingerprint = Gitlab::InsecureKeyFingerprint.new(params.fetch(:key)).fingerprint_sha256
key = Key.find_by_fingerprint_sha256(fingerprint)
......
......@@ -30,7 +30,7 @@ module API
end
desc 'Get a list of features'
get 'client/features', target_duration: :fast do
get 'client/features', urgency: :medium do
present :version, 1
present :features, feature_flags, with: ::API::Entities::UnleashFeature
end
......
......@@ -5,7 +5,7 @@ module Gitlab
extend ActiveSupport::Concern
include Gitlab::ClassAttributes
DEFAULT_TARGET_DURATION = Config::TARGET_DURATIONS.fetch(:medium)
DEFAULT_URGENCY = Config::REQUEST_URGENCIES.fetch(:default)
class_methods do
def feature_category(category, actions = [])
......@@ -17,13 +17,13 @@ module Gitlab
category || superclass_feature_category_for_action(action)
end
def target_duration(duration, actions = [])
endpoint_attributes.set(actions, target_duration: duration)
def urgency(urgency_name, actions = [])
endpoint_attributes.set(actions, urgency: urgency_name)
end
def target_duration_for_action(action)
duration = endpoint_attributes.attribute_for_action(action, :target_duration)
duration || superclass_target_duration_for_action(action) || DEFAULT_TARGET_DURATION
def urgency_for_action(action)
urgency = endpoint_attributes.attribute_for_action(action, :urgency)
urgency || superclass_urgency_for_action(action) || DEFAULT_URGENCY
end
private
......@@ -38,10 +38,10 @@ module Gitlab
superclass.feature_category_for_action(action)
end
def superclass_target_duration_for_action(action)
return unless superclass.respond_to?(:target_duration_for_action)
def superclass_urgency_for_action(action)
return unless superclass.respond_to?(:urgency_for_action)
superclass.target_duration_for_action(action)
superclass.urgency_for_action(action)
end
end
end
......
......@@ -3,14 +3,14 @@
module Gitlab
module EndpointAttributes
class Config
Duration = Struct.new(:name, :duration)
TARGET_DURATIONS = [
Duration.new(:very_fast, 0.25),
Duration.new(:fast, 0.5),
Duration.new(:medium, 1),
Duration.new(:slow, 5)
RequestUrgency = Struct.new(:name, :duration)
REQUEST_URGENCIES = [
RequestUrgency.new(:high, 0.25),
RequestUrgency.new(:medium, 0.5),
RequestUrgency.new(:default, 1),
RequestUrgency.new(:low, 5)
].index_by(&:name).freeze
SUPPORTED_ATTRIBUTES = %i[feature_category target_duration].freeze
SUPPORTED_ATTRIBUTES = %i[feature_category urgency].freeze
def initialize
@default_attributes = {}
......@@ -38,8 +38,8 @@ module Gitlab
def attribute_for_action(action, attribute_name)
value = @action_attributes.dig(action.to_s, attribute_name) || @default_attributes[attribute_name]
# Translate target duration to a representative struct
value = TARGET_DURATIONS[value] if attribute_name == :target_duration
# Translate urgency to a representative struct
value = REQUEST_URGENCIES[value] if attribute_name == :urgency
value
end
......@@ -49,8 +49,8 @@ module Gitlab
unsupported_attributes = (attributes.keys - SUPPORTED_ATTRIBUTES).present?
raise ArgumentError, "Attributes not supported: #{unsupported_attributes.join(", ")}" if unsupported_attributes
if attributes[:target_duration].present? && !TARGET_DURATIONS.key?(attributes[:target_duration])
raise ArgumentError, "Target duration not supported: #{attributes[:target_duration]}"
if attributes[:urgency].present? && !REQUEST_URGENCIES.key?(attributes[:urgency])
raise ArgumentError, "Urgency not supported: #{attributes[:urgency]}"
end
end
......
......@@ -132,12 +132,12 @@ module Gitlab
def satisfactory?(env, elapsed)
target =
if env['api.endpoint'].present?
env['api.endpoint'].options[:for].try(:target_duration_for_app, env['api.endpoint'])
elsif env['action_controller.instance'].present? && env['action_controller.instance'].respond_to?(:target_duration)
env['action_controller.instance'].target_duration
env['api.endpoint'].options[:for].try(:urgency_for_app, env['api.endpoint'])
elsif env['action_controller.instance'].present? && env['action_controller.instance'].respond_to?(:urgency)
env['action_controller.instance'].urgency
end
target ||= Gitlab::EndpointAttributes::DEFAULT_TARGET_DURATION
target ||= Gitlab::EndpointAttributes::DEFAULT_URGENCY
elapsed < target.duration
end
......
......@@ -18,7 +18,7 @@ RSpec.describe ::API::Base do
let(:api_handler) do
Class.new(described_class) do
feature_category :foo
target_duration :fast
urgency :medium
namespace '/test' do
get 'hello' do
......@@ -34,9 +34,9 @@ RSpec.describe ::API::Base do
expect(api_handler.feature_category_for_app(app_hi)).to eq(:foo)
end
it 'sets target duration for a particular route', :aggregate_failures do
expect(api_handler.target_duration_for_app(app_hello)).to be_a_target_duration(:fast)
expect(api_handler.target_duration_for_app(app_hi)).to be_a_target_duration(:fast)
it 'sets request urgency for a particular route', :aggregate_failures do
expect(api_handler.urgency_for_app(app_hello)).to be_request_urgency(:medium)
expect(api_handler.urgency_for_app(app_hi)).to be_request_urgency(:medium)
end
end
......@@ -44,9 +44,9 @@ RSpec.describe ::API::Base do
let(:api_handler) do
Class.new(described_class) do
namespace '/test' do
get 'hello', feature_category: :foo, target_duration: :slow do
get 'hello', feature_category: :foo, urgency: :low do
end
post 'hi', feature_category: :bar, target_duration: :fast do
post 'hi', feature_category: :bar, urgency: :medium do
end
end
end
......@@ -57,9 +57,9 @@ RSpec.describe ::API::Base do
expect(api_handler.feature_category_for_app(app_hi)).to eq(:bar)
end
it 'sets target duration for a particular route', :aggregate_failures do
expect(api_handler.target_duration_for_app(app_hello)).to be_a_target_duration(:slow)
expect(api_handler.target_duration_for_app(app_hi)).to be_a_target_duration(:fast)
it 'sets request urgency for a particular route', :aggregate_failures do
expect(api_handler.urgency_for_app(app_hello)).to be_request_urgency(:low)
expect(api_handler.urgency_for_app(app_hi)).to be_request_urgency(:medium)
end
end
......@@ -67,12 +67,12 @@ RSpec.describe ::API::Base do
let(:api_handler) do
Class.new(described_class) do
feature_category :foo, ['/test/hello']
target_duration :slow, ['/test/hello']
urgency :low, ['/test/hello']
namespace '/test' do
get 'hello' do
end
post 'hi', feature_category: :bar, target_duration: :fast do
post 'hi', feature_category: :bar, urgency: :medium do
end
end
end
......@@ -84,8 +84,8 @@ RSpec.describe ::API::Base do
end
it 'sets target duration for a particular route', :aggregate_failures do
expect(api_handler.target_duration_for_app(app_hello)).to be_a_target_duration(:slow)
expect(api_handler.target_duration_for_app(app_hi)).to be_a_target_duration(:fast)
expect(api_handler.urgency_for_app(app_hello)).to be_request_urgency(:low)
expect(api_handler.urgency_for_app(app_hi)).to be_request_urgency(:medium)
end
end
end
......
# frozen_string_literal: true
require 'fast_spec_helper'
require_relative "../../support/matchers/be_request_urgency"
require_relative "../../../lib/gitlab/endpoint_attributes"
RSpec.describe Gitlab::EndpointAttributes do
......@@ -16,15 +17,15 @@ RSpec.describe Gitlab::EndpointAttributes do
feature_category :bar, %w(index show)
feature_category :quux, %w(destroy)
target_duration :fast, %w(do_a)
target_duration :slow, %w(do_b do_c)
urgency :high, %w(do_a)
urgency :low, %w(do_b do_c)
end
end
let(:subclass) do
Class.new(controller) do
feature_category :baz, %w(subclass_index)
target_duration :very_fast, %w(superclass_do_something)
urgency :high, %w(superclass_do_something)
end
end
......@@ -38,17 +39,17 @@ RSpec.describe Gitlab::EndpointAttributes do
expect(controller.feature_category_for_action("destroy")).to eq(:quux)
end
it "falls back to medium when target_duration was not defined", :aggregate_failures do
expect(base_controller.target_duration_for_action("hello")).to be_a_target_duration(:medium)
expect(controller.target_duration_for_action("update")).to be_a_target_duration(:medium)
expect(controller.target_duration_for_action("index")).to be_a_target_duration(:medium)
expect(controller.target_duration_for_action("destroy")).to be_a_target_duration(:medium)
it "falls back to default when urgency was not defined", :aggregate_failures do
expect(base_controller.urgency_for_action("hello")).to be_request_urgency(:default)
expect(controller.urgency_for_action("update")).to be_request_urgency(:default)
expect(controller.urgency_for_action("index")).to be_request_urgency(:default)
expect(controller.urgency_for_action("destroy")).to be_request_urgency(:default)
end
it "returns the expected target_duration", :aggregate_failures do
expect(controller.target_duration_for_action("do_a")).to be_a_target_duration(:fast)
expect(controller.target_duration_for_action("do_b")).to be_a_target_duration(:slow)
expect(controller.target_duration_for_action("do_c")).to be_a_target_duration(:slow)
it "returns the expected urgency", :aggregate_failures do
expect(controller.urgency_for_action("do_a")).to be_request_urgency(:high)
expect(controller.urgency_for_action("do_b")).to be_request_urgency(:low)
expect(controller.urgency_for_action("do_c")).to be_request_urgency(:low)
end
it "returns feature category for an implied action if not specify actions" do
......@@ -62,10 +63,10 @@ RSpec.describe Gitlab::EndpointAttributes do
it "returns expected duration for an implied action if not specify actions" do
klass = Class.new(base_controller) do
feature_category :foo
target_duration :slow
urgency :low
end
expect(klass.target_duration_for_action("index")).to be_a_target_duration(:slow)
expect(klass.target_duration_for_action("show")).to be_a_target_duration(:slow)
expect(klass.urgency_for_action("index")).to be_request_urgency(:low)
expect(klass.urgency_for_action("show")).to be_request_urgency(:low)
end
it "returns the expected category for categories defined in subclasses" do
......@@ -76,12 +77,12 @@ RSpec.describe Gitlab::EndpointAttributes do
expect(subclass.feature_category_for_action("update")).to eq(:foo)
end
it "returns the expected target_duration for categories defined in subclasses" do
expect(subclass.target_duration_for_action("superclass_do_something")).to be_a_target_duration(:very_fast)
it "returns the expected urgency for categories defined in subclasses" do
expect(subclass.urgency_for_action("superclass_do_something")).to be_request_urgency(:high)
end
it "falls back to superclass's expected duration" do
expect(subclass.target_duration_for_action("do_a")).to be_a_target_duration(:fast)
expect(subclass.urgency_for_action("do_a")).to be_request_urgency(:high)
end
it "raises an error when defining for the controller and for individual actions" do
......@@ -105,10 +106,10 @@ RSpec.describe Gitlab::EndpointAttributes do
it "raises an error when multiple calls define the same action" do
expect do
Class.new(base_controller) do
target_duration :fast, [:world]
target_duration :slow, ["world"]
urgency :high, [:world]
urgency :low, ["world"]
end
end.to raise_error(ArgumentError, "Attributes re-defined for action world: target_duration")
end.to raise_error(ArgumentError, "Attributes re-defined for action world: urgency")
end
it "does not raise an error when multiple calls define the same action and configs" do
......@@ -116,8 +117,8 @@ RSpec.describe Gitlab::EndpointAttributes do
Class.new(base_controller) do
feature_category :hello, [:world]
feature_category :hello, ["world"]
target_duration :fast, [:moon]
target_duration :fast, ["moon"]
urgency :medium, [:moon]
urgency :medium, ["moon"]
end
end.not_to raise_error
end
......@@ -125,8 +126,8 @@ RSpec.describe Gitlab::EndpointAttributes do
it "raises an error if the expected duration is not supported" do
expect do
Class.new(base_controller) do
target_duration :super_slow
urgency :super_slow
end
end.to raise_error(ArgumentError, "Target duration not supported: super_slow")
end.to raise_error(ArgumentError, "Urgency not supported: super_slow")
end
end
......@@ -163,30 +163,30 @@ RSpec.describe Gitlab::Metrics::RequestsRackMiddleware, :aggregate_failures do
end
context 'SLI satisfactory' do
where(:target, :duration, :success) do
where(:request_urgency_name, :duration, :success) do
[
[:very_fast, 0.1, true],
[:very_fast, 0.25, false],
[:very_fast, 0.3, false],
[:fast, 0.3, true],
[:fast, 0.5, false],
[:fast, 0.6, false],
[:medium, 0.6, true],
[:medium, 1.0, false],
[:medium, 1.2, false],
[:slow, 4.5, true],
[:slow, 5.0, false],
[:slow, 6, false]
[:high, 0.1, true],
[:high, 0.25, false],
[:high, 0.3, false],
[:medium, 0.3, true],
[:medium, 0.5, false],
[:medium, 0.6, false],
[:default, 0.6, true],
[:default, 1.0, false],
[:default, 1.2, false],
[:low, 4.5, true],
[:low, 5.0, false],
[:low, 6, false]
]
end
with_them do
context 'Grape API handler having expected duration setup' do
let(:api_handler) do
target_duration = target # target is a DSL provided by Rspec, it's invisible to the inner block
request_urgency = request_urgency_name
Class.new(::API::Base) do
feature_category :hello_world, ['/projects/:id/archive']
target_duration target_duration, ['/projects/:id/archive']
urgency request_urgency, ['/projects/:id/archive']
end
end
......@@ -215,10 +215,10 @@ RSpec.describe Gitlab::Metrics::RequestsRackMiddleware, :aggregate_failures do
context 'Rails controller having expected duration setup' do
let(:controller) do
target_duration = target # target is a DSL provided by Rspec, it's invisible to the inner block
request_urgency = request_urgency_name
Class.new(ApplicationController) do
feature_category :hello_world, [:index, :show]
target_duration target_duration, [:index, :show]
urgency request_urgency, [:index, :show]
end
end
......
# frozen_string_literal: true
RSpec::Matchers.define :be_a_target_duration do |expected|
match do |actual|
actual.is_a?(::Gitlab::EndpointAttributes::Config::Duration) && actual.name == expected
end
end
# frozen_string_literal: true
RSpec::Matchers.define :be_request_urgency do |expected|
match do |actual|
actual.is_a?(::Gitlab::EndpointAttributes::Config::RequestUrgency) &&
actual.name == expected
end
end
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment