Port the GitOps page to the development docs

Move the first half of the GitOps page from an external repository to the development documentation. The second half of the page discusses an architectural implementation that doesn't yet exist, so I didn't add it.

Commit 06f4f791 authored Dec 09, 2020 by Amy Qualls

Showing 2 changed files with 150 additions and 0 deletions

doc/.vale/gitlab/spelling-exceptions.txt +2 -0

doc/development/agent/gitops.md +148 -0

--- a/doc/.vale/gitlab/spelling-exceptions.txt
+++ b/doc/.vale/gitlab/spelling-exceptions.txt
@@ -172,6 +172,8 @@ globals
 Gmail
 Gollum
 Google
+goroutine
+goroutines
 Gosec
 Gradle
 Grafana

--- a/doc/development/agent/gitops.md
+++ b/doc/development/agent/gitops.md
+---
+stage: Configure
+group: Configure
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#designated-technical-writers
+---
+
+# GitOps with the Kubernetes Agent **(PREMIUM ONLY)**
+
+The [GitLab Kubernetes Agent](../../user/clusters/agent/index.md) supports the
+[pull-based version](https://www.gitops.tech/#pull-based-deployments) of
+[GitOps](https://www.gitops.tech/). To be useful, the feature must be able to perform these tasks:
+
+- Connect one or more Kubernetes clusters to a GitLab project or group.
+- Synchronize cluster-wide state from a Git repository.
+- Synchronize namespace-scoped state from a Git repository.
+- Control the following settings:
+
+  - The kinds of objects an agent can manage.
+  - Enabling the namespaced mode of operation for managing objects only in a specific namespace.
+  - Enabling the non-namespaced mode of operation for managing objects in any namespace, and
+    managing non-namespaced objects.
+
+- Synchronize state from one or more Git repositories into a cluster.
+- Configure multiple agents running in different clusters to synchronize state
+  from the same repository.
+
+## GitOps architecture
+
+In this architecture, `agentk`, which runs in the Kubernetes cluster, periodically
+fetches configuration from `kas`, spawning a goroutine for each configured GitOps
+repository. Each goroutine makes a streaming `GetObjectsToSynchronize()` gRPC call.
+`kas` accepts these requests, then checks if this agent is authorized to access
+this GitLab repository. If authorized, `kas` polls Gitaly for repository updates
+and sends the latest manifests to the agent.
+
+Before each poll, `kas` verifies with GitLab that the agent's token is still valid.
+When `agentk` receives an updated manifest, it performs a synchronization using
+[`gitops-engine`](https://github.com/argoproj/gitops-engine).
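The `kas` side of the flow described above can be sketched roughly as follows. This is a minimal illustration, not the `kas` source: `pollDeps`, `pollOnce`, and `demoPoll` are hypothetical names, the three callbacks stand in for the GitLab token check, the Gitaly poll, and the gRPC stream to `agentk`, and sending only when the commit changes is an assumption about what "repository updates" means.

```go
package main

import (
	"errors"
	"fmt"
)

// pollDeps groups hypothetical stand-ins for the systems kas talks to.
type pollDeps struct {
	tokenValid func(token string) bool           // stands in for the GitLab token check
	headCommit func(repo string) (string, error) // stands in for polling Gitaly
	send       func(commit string)               // stands in for streaming manifests to agentk
}

var errUnauthorized = errors.New("agent token is no longer valid")

// pollOnce runs one iteration: verify the token, poll the repository, and
// send manifests only if the commit differs from the last one sent.
// It returns the commit to remember for the next iteration.
func pollOnce(d pollDeps, token, repo, lastSent string) (string, error) {
	if !d.tokenValid(token) {
		return lastSent, errUnauthorized
	}
	commit, err := d.headCommit(repo)
	if err != nil {
		return lastSent, err
	}
	if commit != lastSent {
		d.send(commit)
	}
	return commit, nil
}

// demoPoll simulates three polls followed by a poll after token revocation.
func demoPoll() ([]string, error) {
	commits := []string{"a1", "a1", "b2"}
	i := 0
	valid := true
	var sent []string
	d := pollDeps{
		tokenValid: func(string) bool { return valid },
		headCommit: func(string) (string, error) { c := commits[i]; i++; return c, nil },
		send:       func(c string) { sent = append(sent, c) },
	}
	last := ""
	var err error
	for range commits {
		if last, err = pollOnce(d, "token", "group/infra", last); err != nil {
			return sent, err
		}
	}
	valid = false // the token is revoked before the next poll
	_, err = pollOnce(d, "token", "group/infra", last)
	return sent, err
}

func main() {
	sent, err := demoPoll()
	fmt.Println(sent, err) // prints "[a1 b2] agent token is no longer valid"
}
```

Checking the token at the top of every iteration means a revoked token stops the stream at the next poll, even though the gRPC call itself is long-lived.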
+
+If a repository is removed from the list, `agentk` stops the `GetObjectsToSynchronize()`
+calls for that repository.
+
+```mermaid
+graph TB
+  agentk -- fetch configuration --> kas
+  agentk -- fetch GitOps manifests --> kas
+
+  subgraph "GitLab"
+  kas[kas]
+  GitLabRoR[GitLab RoR]
+  Gitaly[Gitaly]
+  kas -- poll GitOps repositories --> Gitaly
+  kas -- authZ for agentk --> GitLabRoR
+  kas -- fetch configuration --> Gitaly
+  end
+
+  subgraph "Kubernetes cluster"
+  agentk[agentk]
+  end
+```
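The goroutine-per-repository behavior on the `agentk` side could look something like the sketch below. It is an assumption-laden illustration rather than the actual `agentk` code: `syncManager`, `applyConfig`, and the `run` callback (which stands in for the streaming `GetObjectsToSynchronize()` call) are hypothetical names, and cancellation through `context` is one plausible way to stop calls for removed repositories.

```go
package main

import (
	"context"
	"fmt"
	"sort"
	"sync"
)

// syncManager tracks one worker goroutine per configured GitOps repository.
type syncManager struct {
	mu      sync.Mutex
	wg      sync.WaitGroup
	workers map[string]context.CancelFunc          // repository -> cancel for its goroutine
	run     func(ctx context.Context, repo string) // stands in for the streaming gRPC call
}

func newSyncManager(run func(ctx context.Context, repo string)) *syncManager {
	return &syncManager{workers: map[string]context.CancelFunc{}, run: run}
}

// applyConfig reconciles running workers with a freshly fetched repository list:
// it starts a goroutine for each new repository and cancels removed ones.
func (m *syncManager) applyConfig(parent context.Context, repos []string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	wanted := make(map[string]bool, len(repos))
	for _, r := range repos {
		wanted[r] = true
		if _, ok := m.workers[r]; ok {
			continue // already synchronizing this repository
		}
		ctx, cancel := context.WithCancel(parent)
		m.workers[r] = cancel
		m.wg.Add(1)
		go func(repo string) {
			defer m.wg.Done()
			m.run(ctx, repo)
		}(r)
	}
	for r, cancel := range m.workers {
		if !wanted[r] {
			cancel() // repository removed from the list: stop its calls
			delete(m.workers, r)
		}
	}
}

// active returns the sorted repositories that currently have a worker.
func (m *syncManager) active() []string {
	m.mu.Lock()
	defer m.mu.Unlock()
	out := make([]string, 0, len(m.workers))
	for r := range m.workers {
		out = append(out, r)
	}
	sort.Strings(out)
	return out
}

// stop cancels every worker and waits for the goroutines to exit.
func (m *syncManager) stop() {
	m.applyConfig(context.Background(), nil)
	m.wg.Wait()
}

// demo applies two configurations in a row and reports the surviving workers.
func demo() []string {
	m := newSyncManager(func(ctx context.Context, repo string) {
		<-ctx.Done() // a real worker would receive manifests until cancelled
	})
	m.applyConfig(context.Background(), []string{"group/app", "group/infra"})
	m.applyConfig(context.Background(), []string{"group/infra"})
	got := m.active()
	m.stop()
	return got
}

func main() {
	fmt.Println(demo()) // prints "[group/infra]"
}
```

Keying the workers by repository makes each configuration fetch idempotent: repositories that stay in the list keep their existing stream, and only the difference is started or stopped.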
+
+## Architecture considered but not implemented
+
+As part of the implementation process, this architecture was considered, but ultimately
+not implemented.
+
+In this architecture, `agentk` periodically fetches configuration from `kas`. For each
+configured GitOps repository, it spawns a goroutine. Each goroutine then spawns a
+copy of [`git-sync`](https://github.com/kubernetes/git-sync). Each `git-sync` copy polls
+a particular repository and invokes a corresponding webhook on `agentk` when the
+repository changes. When that
+happens, `agentk` performs a synchronization using
+[`gitops-engine`](https://github.com/argoproj/gitops-engine).
+
+For repositories no longer in the list, `agentk` stops corresponding goroutines
+and `git-sync` copies, also deleting their cloned repositories from disk:
+
+```mermaid
+graph TB
+  agentk -- fetch configuration --> kas
+  git-sync -- poll GitOps repositories --> GitLabRoR
+
+  subgraph "GitLab"
+  kas[kas]
+  GitLabRoR[GitLab RoR]
+  kas -- authZ for agentk --> GitLabRoR
+  kas -- fetch configuration --> Gitaly[Gitaly]
+  end
+
+  subgraph "Kubernetes cluster"
+  agentk[agentk]
+  git-sync[git-sync]
+  agentk -- control --> git-sync
+  git-sync -- notify about changes --> agentk
+  end
+```
+
+## Comparing implemented and non-implemented architectures
+
+Both architectures attempt to answer the same question: how to grant an agent
+access to a non-public repository?
+
+In the **implemented** architecture:
+
+- Favorable: Fewer moving parts, as `git-sync` and `git` are not used, making this
+  design more reliable.
+- Favorable: Uses existing connectivity and authentication mechanisms (gRPC + `agentk` token).
+- Favorable: No polling through external infrastructure. Saves traffic and avoids
+  noise in access logs.
+
+In the **unimplemented** architecture:
+
+- Favorable: `agentk` uses `git-sync` to access repositories with standard protocols
+  (either HTTPS, or SSH and Git) with accepted authentication and authorization methods.
+
+  - Unfavorable: The user must put credentials into a `secret`. GitLab doesn't have
+    a mechanism for per-repository tokens for robots.
+  - Unfavorable: Rotating all credentials is more work than rotating a single `agentk` token.
+
+- Unfavorable: A dependency on an external component (`git-sync`) that can be avoided.
+- Unfavorable: More network traffic and connections than the implemented design.
+
+### Ideas considered for the unimplemented design
+
+As part of the design process, these ideas were considered and discarded:
+
+- Running `git-sync` and `gitops-engine` as part of `kas`.
+
+  - Favorable: More code and infrastructure under our control for GitLab.com.
+  - Unfavorable: Running an arbitrary number of `git-sync` processes would require
+    an unbounded amount of RAM and disk space.
+  - Unfavorable: Unclear which `kas` replica is responsible for which agent and
+    repository synchronization. If done as part of `agentk`, leader election can be
+    done using [client-go](https://pkg.go.dev/k8s.io/client-go/tools/leaderelection?tab=doc).
+
+- Running `git-sync` and a "`gitops-engine` driver" helper program as a separate
+  Kubernetes `Deployment`.
+
+  - Favorable: Better isolation and higher resiliency. For example, if the node
+    with `agentk` dies, not all synchronization stops.
+  - Favorable: Each deployment has its own memory and disk limits.
+  - Favorable: Per-repository synchronization identity (distinct `ServiceAccount`)
+    can be implemented.
+  - Unfavorable: Time consuming to implement properly:
+
+    - Each `Deployment` needs CRUD (create, read, update, and delete) permissions.
+    - Users may want to customize a `Deployment`, or add and remove satellite objects
+      like `PodDisruptionBudget`, `HorizontalPodAutoscaler`, and `PodSecurityPolicy`.
+    - Metrics, monitoring, and logs for the `Deployment`.