info:To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#designated-technical-writers
info:To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#designated-technical-writers
The regex for names is: `/\A[a-z0-9]([-a-z0-9]*[a-z0-9])?\z/`.
## Multiple agents in a cluster
A Kubernetes cluster may have 0 or more agents running in it. Each agent likely
has a different configuration. Some may enable features A and B, and some may
enable features B and C. This flexibility enables different groups of people to
use different features of the agent in the same cluster.
For example, [Priyanka (Platform Engineer)](https://about.gitlab.com/handbook/marketing/strategic-marketing/roles-personas/#priyanka-platform-engineer)
may want to use cluster-wide features of the agent, while
a distinct Kubernetes identity, with a distinct set of permissions attached to it.
These permissions enable the agent administrator to follow the
[principle of least privilege](https://en.wikipedia.org/wiki/Principle_of_least_privilege)
and minimize the permissions each particular agent needs.
## Kubernetes Agent authentication
When adding a new agent, GitLab provides the user with a bearer access token. The
agent uses this token to authenticate with GitLab. This token is a random string
and does not encode any information in it, but it is secret and must
be treated with care. Store it as a `Secret` in Kubernetes.
Each agent can have 0 or more tokens in a GitLab database. Having several valid
tokens helps you rotate tokens without needing to re-register an agent. Each token
record in the database has the following fields:
- Agent identity it belongs to.
- Token value. Encrypted at rest.
- Creation time.
- Who created it.
- Revocation flag to mark token as revoked.
- Revocation time.
- Who revoked it.
- A text field to store any comments the administrator may want to make about the token for future self.
Tokens can be managed by users with `maintainer` and higher level of
[permissions](../../user/permissions.md).
Tokens are immutable, and only the following fields can be updated:
- Revocation flag. Can only be updated to `true` once, but immutable after that.
- Revocation time. Set to the current time when revocation flag is set, but immutable after that.
- Comments field. Can be updated any number of times, including after the token has been revoked.
The agent sends its token, along with each request, to GitLab to authenticate itself.
For each request, GitLab checks the token's validity:
- Does the token exist in the database?
- Has the token been revoked?
This information may be cached for some time to reduce load on the database.
## Kubernetes Agent authorization
GitLab provides the following information in its response for a given Agent access token:
- Agent configuration Git repository. (The agent doesn't support per-folder authorization.)
- Agent name.
## Create an agent
You can create an agent by following the [user documentation](../../user/clusters/agent/index.md#create-an-agent-record-in-gitlab), or via Rails console:
info:To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#designated-technical-writers
This file was moved to [another location](https://gitlab.com/gitlab-org/cluster-integration/gitlab-agent/-/blob/master/doc/architecture.md).
This page contains developer-specific information about the GitLab Kubernetes Agent.
[End-user documentation about the GitLab Kubernetes Agent](../../user/clusters/agent/index.md)
is also available.
The agent can help you perform tasks like these:
- Integrate a cluster, located behind a firewall or NAT, with GitLab. To
learn more, read [issue #212810, Invert the model GitLab.com uses for Kubernetes integration by leveraging long lived reverse tunnels](https://gitlab.com/gitlab-org/gitlab/-/issues/212810).
- Access API endpoints in a cluster in real time. For an example use case, read
[issue #218220, Allow Prometheus in K8s cluster to be installed manually](https://gitlab.com/gitlab-org/gitlab/-/issues/218220#note_348729266).
- Enable real-time features by pushing information about events happening in a cluster.
For example, you could build a cluster view dashboard to visualize changes in progress
in a cluster. For more information about these efforts, read about the
[Real-Time Working Group](https://about.gitlab.com/company/team/structure/working-groups/real-time/).
- Enable a [cache of Kubernetes objects through informers](https://github.com/kubernetes/client-go/blob/ccd5becdffb7fd8006e31341baaaacd14db2dcb7/tools/cache/shared_informer.go#L34-L183),
kept up-to-date with very low latency. This cache helps you:
- Reduce or eliminate information propagation latency by avoiding Kubernetes API calls
and polling, and only fetching data from an up-to-date cache.
- Lower the load placed on the Kubernetes API by removing polling.
- Eliminate any rate-limiting errors by removing polling.
- Simplify backend code by replacing polling code with cache access. While it's another
API call, no polling is needed. This example describes [fetching cached data synchronously from the front end](https://gitlab.com/gitlab-org/gitlab/-/issues/217792#note_348582537) instead of fetching data from the Kubernetes API.
## Architecture of the Kubernetes Agent
The GitLab Kubernetes Agent and the GitLab Kubernetes Agent Server use
This diagram describes how GitLab (`GitLab RoR`), the GitLab Kubernetes Agent (`agentk`), and the GitLab Kubernetes Agent Server (`kas`) work together.
```mermaid
graph TB
agentk -- gRPC bidirectional streaming --> kas
subgraph "GitLab"
kas[kas]
GitLabRoR[GitLab RoR] -- gRPC --> kas
kas -- gRPC --> Gitaly[Gitaly]
kas -- REST API --> GitLabRoR
end
subgraph "Kubernetes cluster"
agentk[agentk]
end
```
-`GitLab RoR` is the main GitLab application. It uses gRPC to talk to `kas`.
-`agentk` is the GitLab Kubernetes Agent. It keeps a connection established to a
`kas` instance, waiting for requests to process. It may also actively send information
about things happening in the cluster.
-`kas` is the GitLab Kubernetes Agent Server, and is responsible for:
- Accepting requests from `agentk`.
-[Authentication of requests](https://gitlab.com/gitlab-org/cluster-integration/gitlab-agent/-/blob/master/doc/identity_and_auth.md) from `agentk` by querying `GitLab RoR`.
- Fetching agent's configuration from a corresponding Git repository by querying Gitaly.
- Matching incoming requests from `GitLab RoR` with existing connections from
the right `agentk`, forwarding requests to it and forwarding responses back.
- (Optional) Sending notifications through ActionCable for events received from `agentk`.
- Polling manifest repositories for [GitOps support](https://gitlab.com/gitlab-org/cluster-integration/gitlab-agent/-/blob/master/doc/gitops.md) by communicating with Gitaly.
GitLab prefers to add logic into `kas` rather than `agentk`. `agentk` should be kept
streamlined and small to minimize the need for upgrades. On GitLab.com, `kas` is
managed by GitLab, so upgrades and features can be added without requiring you
to upgrade `agentk` in your clusters.
`agentk` can't be viewed as a dumb reverse proxy because features are planned to be built
[on top of the cache with informers](https://github.com/kubernetes/client-go/blob/ccd5becdffb7fd8006e31341baaaacd14db2dcb7/tools/cache/shared_informer.go#L34-L183).
<!-- This redirect file can be deleted after <2022-06-24>. -->
<!-- Before deletion, see: https://docs.gitlab.com/ee/development/documentation/#move-or-rename-a-page -->
info:To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#designated-technical-writers
[the agent you created](../../user/clusters/agent/index.md#create-an-agent-record-in-gitlab). This file must not contain a newline character. You can create the file with this command:
```shell
echo-n"<TOKEN>"> token.txt
```
1. Start the binaries with the following commands:
```shell
# Need GitLab to start
gdk start
# Stop GDK's version of kas
gdk stop gitlab-k8s-agent
# Start kas
bazel run //cmd/kas ----configuration-file="$(pwd)/cfg.yaml"
```
1. In a new terminal window, run this command to start `agentk`:
```shell
bazel run //cmd/agentk ----kas-address=grpc://127.0.0.1:8150 --token-file="$(pwd)/token.txt"
The `kas` QA tests currently have some limitations. You can run them manually on GDK, but they don't
run automatically with the nightly jobs against the live environment. See the section below
to learn how to run them against different environments.
### Prepare
Before performing any of these tests, if you have a `k3s` instance running, make sure to
stop it manually before running them. Otherwise, the tests might fail with the message
`failed to remove k3s cluster`.
You might need to specify the correct Agent image version that matches the `kas` image version. You can use the `GITLAB_AGENTK_VERSION` local environment for this.
### Against `staging`
1. Go to your local `qa/qa/service/cluster_provider/k3s.rb` and comment out
[this line](https://gitlab.com/gitlab-org/gitlab/-/blob/5b15540ea78298a106150c3a1d6ed26416109b9d/qa/qa/service/cluster_provider/k3s.rb#L8) and
1. Go to your `qa/qa/fixtures/kubernetes_agent/agentk-manifest.yaml.erb` and comment out [this line](https://gitlab.com/gitlab-org/gitlab/-/blob/a55b78532cfd29426cf4e5b4edda81407da9d449/qa/qa/fixtures/kubernetes_agent/agentk-manifest.yaml.erb#L27) and uncomment [this line](https://gitlab.com/gitlab-org/gitlab/-/blob/a55b78532cfd29426cf4e5b4edda81407da9d449/qa/qa/fixtures/kubernetes_agent/agentk-manifest.yaml.erb#L28).
GDK's `kas` listens on `grpc`, not on `wss`.
1. Go to the GDK's root folder and `cd gitlab/qa`.
1. On the contrary to staging, run the QA test in GDK as admin, which is the default choice. To do so, use the default sandbox group and run the command below. Make sure to adjust your credentials if necessary, otherwise, the test might fail:
info:To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#designated-technical-writers
info:To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#designated-technical-writers
For this architecture, this diagram shows a request to `agentk 3, Pod1` for the list of pods:
```mermaid
sequenceDiagram
GitLab->>+kas1: Get list of running<br />Pods from agentk<br />with agent_id=3
Note right of kas1: kas1 checks for<br />agent connected with agent_id=3.<br />It does not.<br />Queries Redis
kas1->>+Redis: Get list of connected agents<br />with agent_id=3
Redis-->-kas1: List of connected agents<br />with agent_id=3
Note right of kas1: kas1 picks a specific agentk instance<br />to address and talks to<br />the corresponding kas instance,<br />specifying which agentk instance<br />to route the request to.
kas1->>+kas2: Get the list of running Pods<br />from agentk 3, Pod1
kas2->>+agentk 3 Pod1: Get list of Pods
agentk 3 Pod1->>-kas2: Get list of Pods
kas2-->>-kas1: List of running Pods<br />from agentk 3, Pod1
kas1-->>-GitLab: List of running Pods<br />from agentk with agent_id=3
```
Each `kas` instance tracks the agents connected to it in Redis. For each agent, it
stores a serialized protobuf object with information about the agent. When an agent
disconnects, `kas` removes all corresponding information from Redis. For both events,
`kas` publishes a notification to a Redis [pub-sub channel](https://redis.io/topics/pubsub).
Each agent, while logically a single entity, can have multiple replicas (multiple pods)
in a cluster. `kas` accommodates that and records per-replica (generally per-connection)
information. Each open `GetConfiguration()` streaming request is given
a unique identifier which, combined with agent ID, identifies an `agentk` instance.
gRPC can keep multiple TCP connections open for a single target host. `agentk` only
runs one `GetConfiguration()` streaming request. `kas` uses that connection, and
doesn't see idle TCP connections because they are handled by the gRPC framework.
Each `kas` instance provides information to Redis, so other `kas` instances can discover and access it.
Information is stored in Redis with an [expiration time](https://redis.io/commands/expire),
to expire information for `kas` instances that become unavailable. To prevent
information from expiring too quickly, `kas` periodically updates the expiration time
for valid entries. Before terminating, `kas` cleans up the information it adds into Redis.
When `kas` must atomically update multiple data structures in Redis, it uses
[transactions](https://redis.io/topics/transactions) to ensure data consistency.
Grouped data items must have the same expiration time.
In addition to the existing `agentk -> kas` gRPC endpoint, `kas` exposes two new,
separate gRPC endpoints for GitLab and for `kas -> kas` requests. Each endpoint
is a separate network listener, making it easier to control network access to endpoints
and allowing separate configuration for each endpoint.
Databases, like PostgreSQL, aren't used because the data is transient, with no need
to reliably persist it.
### `GitLab : kas` external endpoint
GitLab authenticates with `kas` using JWT and the same shared secret used by the
`kas -> GitLab` communication. The JWT issuer should be `gitlab` and the audience
should be `gitlab-kas`.
When accessed through this endpoint, `kas` plays the role of request router.
If a request from GitLab comes but no connected agent can handle it, `kas` blocks
and waits for a suitable agent to connect to it or to another `kas` instance. It
stops waiting when the client disconnects, or when some long timeout happens, such
as client timeout. `kas` is notified of new agent connections through a
[pub-sub channel](https://redis.io/topics/pubsub) to avoid frequent polling.
When a suitable agent connects, `kas` routes the request to it.
### `kas : kas` internal endpoint
This endpoint is an implementation detail, an internal API, and should not be used
by any other system. It's protected by JWT using a secret, shared among all `kas`
instances. No other system must have access to this secret.
When accessed through this endpoint, `kas` uses the request itself to determine
which `agentk` to send the request to. It prevents request cycles by only following
the instructions in the request, rather than doing discovery. It's the responsibility
of the `kas` receiving the request from the _external_ endpoint to retry and re-route
requests. This method ensures a single central component for each request can determine
how a request is routed, rather than distributing the decision across several `kas` instances.
### Reverse gRPC tunnel
This section explains how the `agentk` -> `kas` reverse gRPC tunnel is implemented.
info:To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#designated-technical-writers
@@ -23,7 +23,7 @@ tasks in a secure and cloud-native way. It enables:
-[CI/CD Tunnel](ci_cd_tunnel.md) that enables users to access Kubernetes clusters from GitLab CI/CD jobs even if there is no network connectivity between GitLab Runner and a cluster.
Many more features are planned. Please review [our roadmap](https://gitlab.com/groups/gitlab-org/-/epics/3329)
and [our development documentation](../../../development/agent/index.md).
and [our development documentation](https://gitlab.com/gitlab-org/cluster-integration/gitlab-agent/-/tree/master/doc).