Commit b5665e3a authored by Nick Gaskill

Merge branch 'eread/unify-all-gitaly-troubleshooting' into 'master'

Unify all Gitaly troubleshooting

See merge request gitlab-org/gitlab!63824
parents e67370db 1a2d028d
...@@ -391,26 +391,24 @@ Before troubleshooting, see the Gitaly and Gitaly Cluster ...@@ -391,26 +391,24 @@ Before troubleshooting, see the Gitaly and Gitaly Cluster
The following sections provide possible solutions to Gitaly errors. The following sections provide possible solutions to Gitaly errors.
See also: See also [Gitaly timeout](../../user/admin_area/settings/gitaly_timeouts.md) settings.
- [Gitaly timeout](../../user/admin_area/settings/gitaly_timeouts.md) settings.
- [Gitaly troubleshooting information](../reference_architectures/troubleshooting.md#troubleshooting-gitaly)
in reference architecture documentation.
#### Check versions when using standalone Gitaly servers #### Check versions when using standalone Gitaly servers
When using standalone Gitaly servers, you must make sure they are the same version When using standalone Gitaly servers, you must make sure they are the same version
as GitLab to ensure full compatibility. Check **Admin Area > Overview > Gitaly Servers** on as GitLab to ensure full compatibility:
your GitLab instance and confirm all Gitaly servers indicate that they are up to date.
1. Go to **Admin Area > Overview > Gitaly Servers** on your GitLab instance.
1. Confirm all Gitaly servers indicate that they are up to date.
#### `gitaly-debug` #### Use `gitaly-debug`
The `gitaly-debug` command provides "production debugging" tools for Gitaly and Git The `gitaly-debug` command provides "production debugging" tools for Gitaly and Git
performance. It is intended to help production engineers and support performance. It is intended to help production engineers and support
engineers investigate Gitaly performance problems. engineers investigate Gitaly performance problems.
If you're using GitLab 11.6 or newer, this tool should be installed on If you're using GitLab 11.6 or newer, this tool should be installed on
your GitLab / Gitaly server already at `/opt/gitlab/embedded/bin/gitaly-debug`. your GitLab or Gitaly server already at `/opt/gitlab/embedded/bin/gitaly-debug`.
If you're investigating an older GitLab version you can compile this If you're investigating an older GitLab version you can compile this
tool offline and copy the executable to your server: tool offline and copy the executable to your server:
...@@ -432,13 +430,13 @@ gitaly-debug -h ...@@ -432,13 +430,13 @@ gitaly-debug -h
remote: GitLab: 401 Unauthorized remote: GitLab: 401 Unauthorized
``` ```
You need to sync your `gitlab-secrets.json` file with your Gitaly clients (GitLab You need to sync your `gitlab-secrets.json` file with your GitLab
app nodes). application nodes.
#### Client side gRPC logs #### Client side gRPC logs
Gitaly uses the [gRPC](https://grpc.io/) RPC framework. The Ruby gRPC Gitaly uses the [gRPC](https://grpc.io/) RPC framework. The Ruby gRPC
client has its own log file which may contain debugging information when client has its own log file which may contain useful information when
you are seeing Gitaly errors. You can control the log level of the you are seeing Gitaly errors. You can control the log level of the
gRPC client with the `GRPC_LOG_LEVEL` environment variable. The gRPC client with the `GRPC_LOG_LEVEL` environment variable. The
default level is `WARN`. default level is `WARN`.
...@@ -490,12 +488,13 @@ so, there's not that much visibility into what goes on inside ...@@ -490,12 +488,13 @@ so, there's not that much visibility into what goes on inside
If you have Prometheus set up to scrape your Gitaly process, you can see If you have Prometheus set up to scrape your Gitaly process, you can see
request rates and error codes for individual RPCs in `gitaly-ruby` by request rates and error codes for individual RPCs in `gitaly-ruby` by
querying `grpc_client_handled_total`. Strictly speaking, this metric does querying `grpc_client_handled_total`.
not differentiate between `gitaly-ruby` and other RPCs. However from GitLab 11.9,
all gRPC calls made by Gitaly itself are internal calls from the main Gitaly process to one of its - In theory, this metric does not differentiate between `gitaly-ruby` and other RPCs.
`gitaly-ruby` sidecars. - In practice from GitLab 11.9, all gRPC calls made by Gitaly itself are internal calls from the
main Gitaly process to one of its `gitaly-ruby` sidecars.
Assuming your `grpc_client_handled_total` counter observes only Gitaly, Assuming your `grpc_client_handled_total` counter only observes Gitaly,
the following query shows you RPCs are (most likely) internally the following query shows you RPCs are (most likely) internally
implemented as calls to `gitaly-ruby`: implemented as calls to `gitaly-ruby`:
...@@ -529,7 +528,7 @@ Confirm the following are all true: ...@@ -529,7 +528,7 @@ Confirm the following are all true:
- When any user adds or modifies a file from the repository using the GitLab - When any user adds or modifies a file from the repository using the GitLab
UI, it immediately fails with a red `401 Unauthorized` banner. UI, it immediately fails with a red `401 Unauthorized` banner.
- Creating a new project and [initializing it with a README](../../user/project/working_with_projects.md#blank-projects) - Creating a new project and [initializing it with a README](../../user/project/working_with_projects.md#blank-projects)
successfully creates the project, but doesn't create the README. successfully creates the project but doesn't create the README.
- When [tailing the logs](https://docs.gitlab.com/omnibus/settings/logs.html#tail-logs-in-a-console-on-the-server) - When [tailing the logs](https://docs.gitlab.com/omnibus/settings/logs.html#tail-logs-in-a-console-on-the-server)
on a Gitaly client and reproducing the error, you get `401` errors on a Gitaly client and reproducing the error, you get `401` errors
when reaching the [`/api/v4/internal/allowed`](../../development/internal_api.md) endpoint: when reaching the [`/api/v4/internal/allowed`](../../development/internal_api.md) endpoint:
...@@ -611,22 +610,24 @@ Verify you can reach Gitaly by using TCP: ...@@ -611,22 +610,24 @@ Verify you can reach Gitaly by using TCP:
sudo gitlab-rake gitlab:tcp_check[GITALY_SERVER_IP,GITALY_LISTEN_PORT] sudo gitlab-rake gitlab:tcp_check[GITALY_SERVER_IP,GITALY_LISTEN_PORT]
``` ```
If the TCP connection fails, check your network settings and your firewall rules. If the TCP connection:
If the TCP connection succeeds, your networking and firewall rules are correct.
If you use proxy servers in your command line environment, such as Bash, these - Fails, check your network settings and your firewall rules.
can interfere with your gRPC traffic. - Succeeds, your networking and firewall rules are correct.
If you use Bash or a compatible command line environment, run the following commands If you use proxy servers in your command line environment such as Bash, these can interfere with
to determine whether you have proxy servers configured: your gRPC traffic.
If you use Bash or a compatible command line environment, run the following commands to determine
whether you have proxy servers configured:
```shell ```shell
echo $http_proxy echo $http_proxy
echo $https_proxy echo $https_proxy
``` ```
If either of these variables have a value, your Gitaly CLI connections may be If either of these variables have a value, your Gitaly CLI connections may be getting routed through
getting routed through a proxy which cannot connect to Gitaly. a proxy which cannot connect to Gitaly.
To remove the proxy setting, run the following commands (depending on which variables had values): To remove the proxy setting, run the following commands (depending on which variables had values):
...@@ -663,6 +664,22 @@ it's likely that the Gitaly servers are experiencing ...@@ -663,6 +664,22 @@ it's likely that the Gitaly servers are experiencing
Ensure the Gitaly clients and servers are synchronized, and use an NTP time Ensure the Gitaly clients and servers are synchronized, and use an NTP time
server to keep them synchronized. server to keep them synchronized.
#### Gitaly not listening on new address after reconfiguring
When updating the `gitaly['listen_addr']` or `gitaly['prometheus_listen_addr']` values, Gitaly may
continue to listen on the old address after a `sudo gitlab-ctl reconfigure`.
When this occurs, run `sudo gitlab-ctl restart` to resolve the issue. This should no longer be
necessary because [this issue](https://gitlab.com/gitlab-org/gitaly/-/issues/2521) is resolved.
#### Permission denied errors appearing in Gitaly logs when accessing repositories from a standalone Gitaly node
If this error occurs even though file permissions are correct, it's likely that the Gitaly node is
experiencing [clock drift](https://en.wikipedia.org/wiki/Clock_drift).
Please ensure that the GitLab and Gitaly nodes are synchronized and use an NTP time
server to keep them synchronized if possible.
### Troubleshoot Praefect (Gitaly Cluster) ### Troubleshoot Praefect (Gitaly Cluster)
The following sections provide possible solutions to Gitaly Cluster errors. The following sections provide possible solutions to Gitaly Cluster errors.
...@@ -717,7 +734,7 @@ For example: ...@@ -717,7 +734,7 @@ For example:
{"level":"error","msg":"Error updating node: pq: relation \"node_status\" does not exist","pid":210882,"praefectName":"gitlab1x4m:0.0.0.0:2305","time":"2021-04-01T19:26:19.473Z","virtual_storage":"praefect-cluster-1"} {"level":"error","msg":"Error updating node: pq: relation \"node_status\" does not exist","pid":210882,"praefectName":"gitlab1x4m:0.0.0.0:2305","time":"2021-04-01T19:26:19.473Z","virtual_storage":"praefect-cluster-1"}
``` ```
To solve this, the database schema migration can be done using `sql-migrate` subcommand of To solve this, the database schema migration can be done using `sql-migrate` sub-command of
the `praefect` command: the `praefect` command:
```shell ```shell
......
...@@ -206,211 +206,8 @@ To make sure your configuration is correct: ...@@ -206,211 +206,8 @@ To make sure your configuration is correct:
## Troubleshooting Gitaly ## Troubleshooting Gitaly
If you have any problems when using standalone Gitaly nodes, first For troubleshooting information, see Gitaly and Gitaly Cluster
[check all the versions are up to date](../gitaly/index.md#check-versions-when-using-standalone-gitaly-servers). [troubleshooting information](../gitaly/index.md).
### `gitaly-debug`
The `gitaly-debug` command provides "production debugging" tools for Gitaly and Git
performance. It is intended to help production engineers and support
engineers investigate Gitaly performance problems.
If you're using GitLab 11.6 or newer, this tool should be installed on
your GitLab / Gitaly server already at `/opt/gitlab/embedded/bin/gitaly-debug`.
If you're investigating an older GitLab version you can compile this
tool offline and copy the executable to your server:
```shell
git clone https://gitlab.com/gitlab-org/gitaly.git
cd gitaly/cmd/gitaly-debug
GOOS=linux GOARCH=amd64 go build -o gitaly-debug
```
To see the help page of `gitaly-debug` for a list of supported sub-commands, run:
```shell
gitaly-debug -h
```
### Commits, pushes, and clones return a 401
```plaintext
remote: GitLab: 401 Unauthorized
```
You will need to sync your `gitlab-secrets.json` file with your GitLab
app nodes.
### Client side gRPC logs
Gitaly uses the [gRPC](https://grpc.io/) RPC framework. The Ruby gRPC
client has its own log file which may contain useful information when
you are seeing Gitaly errors. You can control the log level of the
gRPC client with the `GRPC_LOG_LEVEL` environment variable. The
default level is `WARN`.
You can run a gRPC trace with:
```shell
sudo GRPC_TRACE=all GRPC_VERBOSITY=DEBUG gitlab-rake gitlab:gitaly:check
```
### Observing `gitaly-ruby` traffic
[`gitaly-ruby`](../gitaly/configure_gitaly.md#gitaly-ruby) is an internal implementation detail of Gitaly,
so, there's not that much visibility into what goes on inside
`gitaly-ruby` processes.
If you have Prometheus set up to scrape your Gitaly process, you can see
request rates and error codes for individual RPCs in `gitaly-ruby` by
querying `grpc_client_handled_total`. Strictly speaking, this metric does
not differentiate between `gitaly-ruby` and other RPCs, but in practice
(as of GitLab 11.9), all gRPC calls made by Gitaly itself are internal
calls from the main Gitaly process to one of its `gitaly-ruby` sidecars.
Assuming your `grpc_client_handled_total` counter only observes Gitaly,
the following query shows you RPCs are (most likely) internally
implemented as calls to `gitaly-ruby`:
```prometheus
sum(rate(grpc_client_handled_total[5m])) by (grpc_method) > 0
```
### Repository changes fail with a `401 Unauthorized` error
If you're running Gitaly on its own server and notice that users can
successfully clone and fetch repositories (via both SSH and HTTPS), but can't
push to them or make changes to the repository in the web UI without getting a
`401 Unauthorized` message, then it's possible Gitaly is failing to authenticate
with the other nodes due to having the wrong secrets file.
Confirm the following are all true:
- When any user performs a `git push` to any repository on this Gitaly node, it
fails with the following error (note the `401 Unauthorized`):
```shell
remote: GitLab: 401 Unauthorized
To <REMOTE_URL>
! [remote rejected] branch-name -> branch-name (pre-receive hook declined)
error: failed to push some refs to '<REMOTE_URL>'
```
- When any user adds or modifies a file from the repository using the GitLab
UI, it immediately fails with a red `401 Unauthorized` banner.
- Creating a new project and [initializing it with a README](../../user/project/working_with_projects.md#blank-projects)
successfully creates the project but doesn't create the README.
- When [tailing the logs](https://docs.gitlab.com/omnibus/settings/logs.html#tail-logs-in-a-console-on-the-server) on an app node and reproducing the error, you get `401` errors
when reaching the [`/api/v4/internal/allowed`](../../development/internal_api.md) endpoint:
```shell
# api_json.log
{
"time": "2019-07-18T00:30:14.967Z",
"severity": "INFO",
"duration": 0.57,
"db": 0,
"view": 0.57,
"status": 401,
"method": "POST",
"path": "\/api\/v4\/internal\/allowed",
"params": [
{
"key": "action",
"value": "git-receive-pack"
},
{
"key": "changes",
"value": "REDACTED"
},
{
"key": "gl_repository",
"value": "REDACTED"
},
{
"key": "project",
"value": "\/path\/to\/project.git"
},
{
"key": "protocol",
"value": "web"
},
{
"key": "env",
"value": "{\"GIT_ALTERNATE_OBJECT_DIRECTORIES\":[],\"GIT_ALTERNATE_OBJECT_DIRECTORIES_RELATIVE\":[],\"GIT_OBJECT_DIRECTORY\":null,\"GIT_OBJECT_DIRECTORY_RELATIVE\":null}"
},
{
"key": "user_id",
"value": "2"
},
{
"key": "secret_token",
"value": "[FILTERED]"
}
],
"host": "gitlab.example.com",
"ip": "REDACTED",
"ua": "Ruby",
"route": "\/api\/:version\/internal\/allowed",
"queue_duration": 4.24,
"gitaly_calls": 0,
"gitaly_duration": 0,
"correlation_id": "XPUZqTukaP3"
}
# nginx_access.log
[IP] - - [18/Jul/2019:00:30:14 +0000] "POST /api/v4/internal/allowed HTTP/1.1" 401 30 "" "Ruby"
```
To fix this problem, confirm that your `gitlab-secrets.json` file
on the Gitaly node matches the one on all other nodes. If it doesn't match,
update the secrets file on the Gitaly node to match the others, then
[reconfigure the node](../restart_gitlab.md#omnibus-gitlab-reconfigure).
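
One way to check whether two copies of the secrets file match is to compare their checksums. The following is a minimal sketch, not an official GitLab procedure. It assumes you have copied the secrets file from another node to a local path on the Gitaly node. The `secrets_match` function name and the `/tmp/gitlab-secrets.app-node.json` path are hypothetical; `/etc/gitlab/gitlab-secrets.json` is the usual location on Omnibus installations.

```shell
# Hypothetical helper: exits 0 when both files have the same SHA-256 checksum,
# 1 when they differ, and 2 when either file cannot be read.
secrets_match() {
  [ -r "$1" ] && [ -r "$2" ] || return 2
  [ "$(sha256sum < "$1")" = "$(sha256sum < "$2")" ]
}

# Example usage (run as a user who can read both files):
# secrets_match /etc/gitlab/gitlab-secrets.json /tmp/gitlab-secrets.app-node.json \
#   && echo "Secrets match" || echo "Secrets differ or a file is unreadable"
```

Comparing checksums rather than file contents avoids printing any secret values to the terminal.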
### Command line tools cannot connect to Gitaly
If you are having trouble connecting to a Gitaly node with command line (CLI) tools, and certain actions result in a `14: Connect Failed` error message, it means that gRPC cannot reach your Gitaly node.
Verify that you can reach Gitaly via TCP:
```shell
sudo gitlab-rake gitlab:tcp_check[GITALY_SERVER_IP,GITALY_LISTEN_PORT]
```
If the TCP connection fails, check your network settings and your firewall rules. If the TCP connection succeeds, your networking and firewall rules are correct.
If you use proxy servers in your command line environment, such as Bash, these can interfere with your gRPC traffic.
If you use Bash or a compatible command line environment, run the following commands to determine whether you have proxy servers configured:
```shell
echo $http_proxy
echo $https_proxy
```
If either of these variables has a value, your Gitaly CLI connections may be getting routed through a proxy which cannot connect to Gitaly.
To remove the proxy setting, run the following commands (depending on which variables had values):
```shell
unset http_proxy
unset https_proxy
```
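
The check and the cleanup above can be combined so that only the variables that actually have values are unset. This is a minimal sketch for Bash or a compatible shell; the `list_proxy_vars` function name is hypothetical, and it only looks at the two variables named above.

```shell
# Hypothetical helper: prints the name of each proxy variable that has a value.
list_proxy_vars() {
  [ -n "${http_proxy:-}" ] && echo "http_proxy"
  [ -n "${https_proxy:-}" ] && echo "https_proxy"
  return 0
}

# Unset only the proxy variables that are set in the current shell session.
for name in $(list_proxy_vars); do
  unset "$name"
done
```

The change applies to the current shell session only. If the variables are exported from a shell profile, they return in new sessions.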
### Gitaly not listening on new address after reconfiguring
When updating the `gitaly['listen_addr']` or `gitaly['prometheus_listen_addr']` values, Gitaly may continue to listen on the old address after a `sudo gitlab-ctl reconfigure`.
When this occurs, performing a `sudo gitlab-ctl restart` will resolve the issue. This will no longer be necessary after [this issue](https://gitlab.com/gitlab-org/gitaly/-/issues/2521) is resolved.
### Permission denied errors appearing in Gitaly logs when accessing repositories from a standalone Gitaly node
If this error occurs even though file permissions are correct, it's likely that
the Gitaly node is experiencing
[clock drift](https://en.wikipedia.org/wiki/Clock_drift).
Please ensure that the GitLab and Gitaly nodes are synchronized and use an NTP time
server to keep them synchronized if possible.
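
To get a rough idea of the clock difference between two nodes, you can compare their Unix timestamps. This is a minimal sketch under stated assumptions: the `clock_offset` function name and the `gitaly.example.com` hostname are hypothetical, and the usage example assumes SSH access from the GitLab node to the Gitaly node.

```shell
# Hypothetical helper: prints the absolute difference, in seconds,
# between two Unix timestamps.
clock_offset() {
  local_time=$1
  remote_time=$2
  diff=$((local_time - remote_time))
  [ "$diff" -lt 0 ] && diff=$((-diff))
  echo "$diff"
}

# Example usage. gitaly.example.com is a placeholder hostname.
# clock_offset "$(date +%s)" "$(ssh gitaly.example.com date +%s)"
```

The SSH round trip adds some delay to the measurement, so treat a very small result as noise rather than as evidence of drift.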
## Troubleshooting the GitLab Rails application ## Troubleshooting the GitLab Rails application
......