From 90765f973e817f2a464ed7d94f41c6b273ee76f3 Mon Sep 17 00:00:00 2001 From: Stan Hu <stanhu@gmail.com> Date: Wed, 15 Jun 2016 19:00:32 +0000 Subject: [PATCH] Merge branch 'fair-usage-of-shared-runners' into 'master' Fair usage of Shared Runners ## What does this MR do? Introduces a fair usage scheduler for shared runners. It tries to assign builds to shared runner from projects that have the lowest number of builds currently running on shared runners. **Example 1**: ``` We have following builds in queue: build 1 for project 1 build 2 for project 1 build 3 for project 1 build 4 for project 2 build 5 for project 2 build 6 for project 3 With the new algorithm we will assign builds in following order: - We choose build 1, because project 1 doesn't run currently any builds and has the lowest build number from projects that doesn't run builds, - We choose build 4, because project 2 doesn't run currently any builds and has the lowest build number from projects that doesn't run builds, - We choose build 6, because project 3 doesn't run currently any builds and has the lowest build number from projects that doesn't run builds, - We choose build 2, because project 1 as other it runs 1 build, - We choose build 5, because project 2 runs 1 build, where project 1 runs 2 builds now, - We choose build 3, because project 1 and runs 2 builds. ``` **Example 2**: ``` We have following builds in queue: build 1 for project 1 build 2 for project 1 build 3 for project 1 build 4 for project 2 build 5 for project 2 build 6 for project 3 With the new algorithm we will assign builds in following order: - We choose build 1, because project 1 doesn't run currently any builds and has the lowest build number from projects that doesn't run builds, - We finish build 1, - We choose build 2, because project 1 doesn't run currently any builds and has the lowest build number from projects that doesn't run builds, - We choose build 4, because project 2 doesn't run currently any builds and has the lowest build number from projects that doesn't run builds, - We finish build 4, - We choose build 5, because project 2 doesn't run currently any builds and has the lowest build number from projects that doesn't run builds, - We choose build 6, because project 3 doesn't run currently any builds, - We choose build 3, because project 1, 2 and 3 runs exactly one build now, ``` ## Why was this MR needed? Currently, we are scheduling builds using FIFO. This is catastrophic if there are projects that create a 100-300 jobs, this basically eats most of available shared runners. ## Performance All this logic is implemented with the help of SQL queries, because this is the fastest way to process 1k-2k pending builds in queue. It's not the fastest SQL query, because it sorts based on number of running_builds, and this forces to calculate a number of running builds for all dependent projects. However, since we have one/two shared runners that asks every few seconds for builds this should have minimal impact on DB performance. ``` explain analyze SELECT "ci_builds".* FROM "ci_builds" JOIN (SELECT "ci_builds"."gl_project_id", count(case when status = 'running' AND runner_id = (SELECT "ci_runners"."id" FROM "ci_runners" WHERE "ci_runners"."is_shared" = 't') then 1 end) as running_builds FROM "ci_builds" INNER JOIN "projects" ON "projects"."id" = "ci_builds"."gl_project_id" AND "projects"."pending_delete" = 'f' WHERE "ci_builds"."type" IN ('Ci::Build') AND "ci_builds"."status" IN ('running', 'pending') AND "projects"."builds_enabled" = 't' AND "projects"."shared_runners_enabled" = 't' GROUP BY "ci_builds"."gl_project_id") AS projects ON ci_builds.gl_project_id=projects.gl_project_id WHERE "ci_builds"."type" IN ('Ci::Build') AND "ci_builds"."status" = 'pending' AND "ci_builds"."runner_id" IS NULL ORDER BY projects.running_builds ASC, ci_builds.id ASC; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------------------- ----------- Sort (cost=64777.28..64777.29 rows=1 width=1010) (actual time=301.794..302.535 rows=1537 loops=1) Sort Key: (count(CASE WHEN (((public.ci_builds.status)::text = 'running'::text) AND (public.ci_builds.runner_id = $0)) THEN 1 ELSE NULL::integer END)), public.ci _builds.id Sort Method: quicksort Memory: 1423kB -> Nested Loop (cost=63279.78..64777.27 rows=1 width=1010) (actual time=66.384..298.724 rows=1537 loops=1) -> HashAggregate (cost=63177.15..63177.30 rows=15 width=15) (actual time=65.641..65.851 rows=187 loops=1) InitPlan 1 (returns $0) -> Seq Scan on ci_runners (cost=0.00..26963.66 rows=1 width=4) (actual time=1.145..34.381 rows=1 loops=1) Filter: is_shared Rows Removed by Filter: 6965 -> Nested Loop (cost=0.00..36186.34 rows=2715 width=15) (actual time=0.065..29.717 rows=1710 loops=1) -> Index Scan using index_ci_builds_on_status on ci_builds (cost=0.00..8913.95 rows=3577 width=15) (actual time=0.051..12.012 rows=2583 loops =1) Index Cond: ((status)::text = ANY ('{running,pending}'::text[])) Filter: ((type)::text = 'Ci::Build'::text) Rows Removed by Filter: 1219 -> Index Scan using projects_pkey on projects (cost=0.00..7.61 rows=1 width=4) (actual time=0.003..0.004 rows=1 loops=2583) Index Cond: (id = public.ci_builds.gl_project_id) Filter: ((NOT pending_delete) AND builds_enabled AND shared_runners_enabled) Rows Removed by Filter: 0 -> Bitmap Heap Scan on ci_builds (cost=102.63..106.64 rows=1 width=1002) (actual time=1.216..1.231 rows=8 loops=187) Recheck Cond: ((gl_project_id = public.ci_builds.gl_project_id) AND ((status)::text = 'pending'::text)) Filter: ((runner_id IS NULL) AND ((type)::text = 'Ci::Build'::text)) -> BitmapAnd (cost=102.63..102.63 rows=1 width=0) (actual time=1.201..1.201 rows=0 loops=187) -> Bitmap Index Scan on index_ci_builds_on_gl_project_id (cost=0.00..10.52 rows=241 width=0) (actual time=0.406..0.406 rows=1944 loops=187) Index Cond: (gl_project_id = public.ci_builds.gl_project_id) -> Bitmap Index Scan on index_ci_builds_on_status (cost=0.00..91.78 rows=3089 width=0) (actual time=0.652..0.652 rows=3362 loops=187) Index Cond: ((status)::text = 'pending'::text) Total runtime: 303.832 ms ``` ## Specific runners It doesn't affect the specific runners which still serve builds FIFO. @stanhu @markpundsack @yorickpeterse What do you think? See merge request !4634 --- CHANGELOG | 1 + app/services/ci/register_build_service.rb | 23 +++++-- .../ci/register_build_service_spec.rb | 62 +++++++++++++++++++ 3 files changed, 80 insertions(+), 6 deletions(-) diff --git a/CHANGELOG b/CHANGELOG index bf5fb0af4e..13d9f96edf 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -23,6 +23,7 @@ v 8.9.0 (unreleased) - Added descriptions to notification settings dropdown - Improve note validation to prevent errors when creating invalid note via API - Reduce number of fog gem dependencies + - Implement a fair usage of shared runners - Remove project notification settings associated with deleted projects - Fix 404 page when viewing TODOs that contain milestones or labels in different projects - Add a metric for the number of new Redis connections created by a transaction diff --git a/app/services/ci/register_build_service.rb b/app/services/ci/register_build_service.rb index 4ff268a6f0..f0ed09a629 100644 --- a/app/services/ci/register_build_service.rb +++ b/app/services/ci/register_build_service.rb @@ -7,15 +7,19 @@ module Ci builds = if current_runner.shared? - # don't run projects which have not enables shared runners - builds.joins(:project).where(projects: { builds_enabled: true, shared_runners_enabled: true }) + builds. + # don't run projects which have not enabled shared runners + joins(:project).where(projects: { builds_enabled: true, shared_runners_enabled: true }). + + # this returns builds that are ordered by number of running builds + # we prefer projects that don't use shared runners at all + joins("LEFT JOIN (#{running_builds_for_shared_runners.to_sql}) AS project_builds ON ci_builds.gl_project_id=project_builds.gl_project_id"). + order('COALESCE(project_builds.running_builds, 0) ASC', 'ci_builds.id ASC') else - # do run projects which are only assigned to this runner - builds.where(project: current_runner.projects.where(builds_enabled: true)) + # do run projects which are only assigned to this runner (FIFO) + builds.where(project: current_runner.projects.where(builds_enabled: true)).order('created_at ASC') end - builds = builds.order('created_at ASC') - build = builds.find do |build| build.can_be_served?(current_runner) end @@ -35,5 +39,12 @@ module Ci rescue StateMachines::InvalidTransition nil end + + private + + def running_builds_for_shared_runners + Ci::Build.running.where(runner: Ci::Runner.shared). + group(:gl_project_id).select(:gl_project_id, 'count(*) AS running_builds') + end end end diff --git a/spec/services/ci/register_build_service_spec.rb b/spec/services/ci/register_build_service_spec.rb index d91fc57429..f28f2f1438 100644 --- a/spec/services/ci/register_build_service_spec.rb +++ b/spec/services/ci/register_build_service_spec.rb @@ -45,11 +45,73 @@ module Ci end end + context 'deleted projects' do + before do + project.update(pending_delete: true) + end + + context 'for shared runners' do + before do + project.update(shared_runners_enabled: true) + end + + it 'does not pick a build' do + expect(service.execute(shared_runner)).to be_nil + end + end + + context 'for specific runner' do + it 'does not pick a build' do + expect(service.execute(specific_runner)).to be_nil + end + end + end + context 'allow shared runners' do before do project.update(shared_runners_enabled: true) end + context 'for multiple builds' do + let!(:project2) { create :empty_project, shared_runners_enabled: true } + let!(:pipeline2) { create :ci_pipeline, project: project2 } + let!(:project3) { create :empty_project, shared_runners_enabled: true } + let!(:pipeline3) { create :ci_pipeline, project: project3 } + let!(:build1_project1) { pending_build } + let!(:build2_project1) { FactoryGirl.create :ci_build, pipeline: pipeline } + let!(:build3_project1) { FactoryGirl.create :ci_build, pipeline: pipeline } + let!(:build1_project2) { FactoryGirl.create :ci_build, pipeline: pipeline2 } + let!(:build2_project2) { FactoryGirl.create :ci_build, pipeline: pipeline2 } + let!(:build1_project3) { FactoryGirl.create :ci_build, pipeline: pipeline3 } + + it 'prefers projects without builds first' do + # it gets for one build from each of the projects + expect(service.execute(shared_runner)).to eq(build1_project1) + expect(service.execute(shared_runner)).to eq(build1_project2) + expect(service.execute(shared_runner)).to eq(build1_project3) + + # then it gets a second build from each of the projects + expect(service.execute(shared_runner)).to eq(build2_project1) + expect(service.execute(shared_runner)).to eq(build2_project2) + + # in the end the third build + expect(service.execute(shared_runner)).to eq(build3_project1) + end + + it 'equalises number of running builds' do + # after finishing the first build for project 1, get a second build from the same project + expect(service.execute(shared_runner)).to eq(build1_project1) + build1_project1.success + expect(service.execute(shared_runner)).to eq(build2_project1) + + expect(service.execute(shared_runner)).to eq(build1_project2) + build1_project2.success + expect(service.execute(shared_runner)).to eq(build2_project2) + expect(service.execute(shared_runner)).to eq(build1_project3) + expect(service.execute(shared_runner)).to eq(build3_project1) + end + end + context 'shared runner' do let(:build) { service.execute(shared_runner) } -- 2.30.9