Commit adddc040 authored by Kati Paizee's avatar Kati Paizee

Merge branch 'DmitrijsF-master-patch-57153' into 'master'

Update elasticsearch.md to replace future tense with present tense

See merge request gitlab-org/gitlab!75740
parents cea38abd 3aa12e23
...@@ -207,7 +207,7 @@ The following Elasticsearch settings are available: ...@@ -207,7 +207,7 @@ The following Elasticsearch settings are available:
| `Password` | The password of your Elasticsearch instance. | | `Password` | The password of your Elasticsearch instance. |
| `Number of Elasticsearch shards` | Elasticsearch indexes are split into multiple shards for performance reasons. In general, you should use at least 5 shards, and indexes with tens of millions of documents need to have more shards ([see below](#guidance-on-choosing-optimal-cluster-configuration)). Changes to this value do not take effect until the index is recreated. You can read more about tradeoffs in the [Elasticsearch documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/scalability.html). | | `Number of Elasticsearch shards` | Elasticsearch indexes are split into multiple shards for performance reasons. In general, you should use at least 5 shards, and indexes with tens of millions of documents need to have more shards ([see below](#guidance-on-choosing-optimal-cluster-configuration)). Changes to this value do not take effect until the index is recreated. You can read more about tradeoffs in the [Elasticsearch documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/scalability.html). |
| `Number of Elasticsearch replicas` | Each Elasticsearch shard can have a number of replicas. These are a complete copy of the shard, and can provide increased query performance or resilience against hardware failure. Increasing this value increases total disk space required by the index. | | `Number of Elasticsearch replicas` | Each Elasticsearch shard can have a number of replicas. These are a complete copy of the shard, and can provide increased query performance or resilience against hardware failure. Increasing this value increases total disk space required by the index. |
| `Limit namespaces and projects that can be indexed` | Enabling this allows you to select namespaces and projects to index. All other namespaces and projects use database search instead. If you enable this option but do not select any namespaces or projects, none will be indexed. [Read more below](#limit-namespaces-and-projects). | `Limit namespaces and projects that can be indexed` | Enabling this allows you to select namespaces and projects to index. All other namespaces and projects use database search instead. If you enable this option but do not select any namespaces or projects, none are indexed. [Read more below](#limit-namespaces-and-projects).
| `Using AWS hosted Elasticsearch with IAM credentials` | Sign your Elasticsearch requests using [AWS IAM authorization](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html), [AWS EC2 Instance Profile Credentials](https://docs.aws.amazon.com/codedeploy/latest/userguide/getting-started-create-iam-instance-profile.html#getting-started-create-iam-instance-profile-cli), or [AWS ECS Tasks Credentials](https://docs.aws.amazon.com/AmazonECS/latest/userguide/task-iam-roles.html). Please refer to [Identity and Access Management in Amazon Elasticsearch Service](https://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/es-ac.html) for details of AWS hosted Elasticsearch domain access policy configuration. | | `Using AWS hosted Elasticsearch with IAM credentials` | Sign your Elasticsearch requests using [AWS IAM authorization](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html), [AWS EC2 Instance Profile Credentials](https://docs.aws.amazon.com/codedeploy/latest/userguide/getting-started-create-iam-instance-profile.html#getting-started-create-iam-instance-profile-cli), or [AWS ECS Tasks Credentials](https://docs.aws.amazon.com/AmazonECS/latest/userguide/task-iam-roles.html). Please refer to [Identity and Access Management in Amazon Elasticsearch Service](https://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/es-ac.html) for details of AWS hosted Elasticsearch domain access policy configuration. |
| `AWS Region` | The AWS region in which your Elasticsearch service is located. | | `AWS Region` | The AWS region in which your Elasticsearch service is located. |
| `AWS Access Key` | The AWS access key. | | `AWS Access Key` | The AWS access key. |
...@@ -233,7 +233,7 @@ If you select `Limit namespaces and projects that can be indexed`, more options ...@@ -233,7 +233,7 @@ If you select `Limit namespaces and projects that can be indexed`, more options
You can select namespaces and projects to index exclusively. Note that if the namespace is a group, it includes You can select namespaces and projects to index exclusively. Note that if the namespace is a group, it includes
any subgroups and projects belonging to those subgroups to be indexed as well. any subgroups and projects belonging to those subgroups to be indexed as well.
Advanced Search only provides cross-group code/commit search (global) if all name-spaces are indexed. In this particular scenario where only a subset of namespaces are indexed, a global search will not provide a code or commit scope. This is possible only in the scope of an indexed namespace. There is no way to code/commit search in multiple indexed namespaces (when only a subset of namespaces has been indexed). For example if two groups are indexed, there is no way to run a single code search on both. You can only run a code search on the first group and then on the second. Advanced Search only provides cross-group code/commit search (global) if all name-spaces are indexed. In this particular scenario where only a subset of namespaces are indexed, a global search does not provide a code or commit scope. This is possible only in the scope of an indexed namespace. There is no way to code/commit search in multiple indexed namespaces (when only a subset of namespaces has been indexed). For example if two groups are indexed, there is no way to run a single code search on both. You can only run a code search on the first group and then on the second.
You can filter the selection dropdown by writing part of the namespace or project name you're interested in. You can filter the selection dropdown by writing part of the namespace or project name you're interested in.
...@@ -355,7 +355,7 @@ Setting this value too high can have adverse performance impacts as your cluster ...@@ -355,7 +355,7 @@ Setting this value too high can have adverse performance impacts as your cluster
may become heavily saturated with searches and writes. Setting this value too may become heavily saturated with searches and writes. Setting this value too
low may lead the reindexing process to take a very long time to complete. low may lead the reindexing process to take a very long time to complete.
The best value for this will depend on your cluster size, whether you're willing The best value for this depends on your cluster size, whether you're willing
to accept some degraded search performance during reindexing, and how important to accept some degraded search performance during reindexing, and how important
it is for the reindex to finish quickly and resume indexing. it is for the reindex to finish quickly and resume indexing.
...@@ -430,23 +430,23 @@ In order to debug issues with the migrations you can check the [`elasticsearch.l ...@@ -430,23 +430,23 @@ In order to debug issues with the migrations you can check the [`elasticsearch.l
### Retry a halted migration ### Retry a halted migration
Some migrations are built with a retry limit. If the migration cannot finish within the retry limit, Some migrations are built with a retry limit. If the migration cannot finish within the retry limit,
it will be halted and a notification will be displayed in the Advanced Search integration settings. it is halted and a notification is displayed in the Advanced Search integration settings.
It is recommended to check the [`elasticsearch.log` file](../administration/logs.md#elasticsearchlog) to It is recommended to check the [`elasticsearch.log` file](../administration/logs.md#elasticsearchlog) to
debug why the migration was halted and make any changes before retrying the migration. Once you believe you've debug why the migration was halted and make any changes before retrying the migration. Once you believe you've
fixed the cause of the failure, click "Retry migration", and the migration will be scheduled to be retried fixed the cause of the failure, click "Retry migration", and the migration is scheduled to be retried
in the background. in the background.
If you cannot get the migration to succeed, you may If you cannot get the migration to succeed, you may
consider the [last resort to recreate the index from consider the [last resort to recreate the index from
scratch](#last-resort-to-recreate-an-index). This may allow you to skip over scratch](#last-resort-to-recreate-an-index). This may allow you to skip over
the problem because a newly created index will skip all migrations as the index the problem because a newly created index skips all migrations as the index
will be recreated with the correct up-to-date schema. is recreated with the correct up-to-date schema.
### All migrations must be finished before doing a major upgrade ### All migrations must be finished before doing a major upgrade
Before doing a major version GitLab upgrade, you should have completed all Before doing a major version GitLab upgrade, you should have completed all
migrations that exist up until the latest minor version before that major migrations that exist up until the latest minor version before that major
version. If you have halted migrations, these will need to be resolved and version. If you have halted migrations, these need to be resolved and
[retried](#retry-a-halted-migration) before proceeding with a major version [retried](#retry-a-halted-migration) before proceeding with a major version
upgrade. Read more about [upgrading to a new major upgrade. Read more about [upgrading to a new major
version](../update/index.md#upgrading-to-a-new-major-version). version](../update/index.md#upgrading-to-a-new-major-version).
...@@ -468,7 +468,7 @@ The following are some available Rake tasks: ...@@ -468,7 +468,7 @@ The following are some available Rake tasks:
| [`sudo gitlab-rake gitlab:elastic:resume_indexing`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/lib/tasks/gitlab/elastic.rake) | Resumes Elasticsearch indexing. | | [`sudo gitlab-rake gitlab:elastic:resume_indexing`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/lib/tasks/gitlab/elastic.rake) | Resumes Elasticsearch indexing. |
| [`sudo gitlab-rake gitlab:elastic:index_projects`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/lib/tasks/gitlab/elastic.rake) | Iterates over all projects and queues Sidekiq jobs to index them in the background. | | [`sudo gitlab-rake gitlab:elastic:index_projects`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/lib/tasks/gitlab/elastic.rake) | Iterates over all projects and queues Sidekiq jobs to index them in the background. |
| [`sudo gitlab-rake gitlab:elastic:index_projects_status`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/lib/tasks/gitlab/elastic.rake) | Determines the overall status of the indexing. It is done by counting the total number of indexed projects, dividing by a count of the total number of projects, then multiplying by 100. | | [`sudo gitlab-rake gitlab:elastic:index_projects_status`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/lib/tasks/gitlab/elastic.rake) | Determines the overall status of the indexing. It is done by counting the total number of indexed projects, dividing by a count of the total number of projects, then multiplying by 100. |
| [`sudo gitlab-rake gitlab:elastic:clear_index_status`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/lib/tasks/gitlab/elastic.rake) | Deletes all instances of IndexStatus for all projects. Note that this command will result in a complete wipe of the index, and it should be used with caution. | | [`sudo gitlab-rake gitlab:elastic:clear_index_status`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/lib/tasks/gitlab/elastic.rake) | Deletes all instances of IndexStatus for all projects. Note that this command results in a complete wipe of the index, and it should be used with caution. |
| [`sudo gitlab-rake gitlab:elastic:create_empty_index`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/lib/tasks/gitlab/elastic.rake) | Generates empty indexes (the default index and a separate issues index) and assigns an alias for each on the Elasticsearch side only if it doesn't already exist. | | [`sudo gitlab-rake gitlab:elastic:create_empty_index`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/lib/tasks/gitlab/elastic.rake) | Generates empty indexes (the default index and a separate issues index) and assigns an alias for each on the Elasticsearch side only if it doesn't already exist. |
| [`sudo gitlab-rake gitlab:elastic:delete_index`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/lib/tasks/gitlab/elastic.rake) | Removes the GitLab indexes and aliases (if they exist) on the Elasticsearch instance. | | [`sudo gitlab-rake gitlab:elastic:delete_index`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/lib/tasks/gitlab/elastic.rake) | Removes the GitLab indexes and aliases (if they exist) on the Elasticsearch instance. |
| [`sudo gitlab-rake gitlab:elastic:recreate_index`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/lib/tasks/gitlab/elastic.rake) | Wrapper task for `gitlab:elastic:delete_index` and `gitlab:elastic:create_empty_index`. | | [`sudo gitlab-rake gitlab:elastic:recreate_index`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/lib/tasks/gitlab/elastic.rake) | Wrapper task for `gitlab:elastic:delete_index` and `gitlab:elastic:create_empty_index`. |
...@@ -501,7 +501,7 @@ I, [2019-03-04T21:27:05.215266 #3384] INFO -- : Indexing GitLab User / test (ID ...@@ -501,7 +501,7 @@ I, [2019-03-04T21:27:05.215266 #3384] INFO -- : Indexing GitLab User / test (ID
## Advanced Search index scopes ## Advanced Search index scopes
When performing a search, the GitLab index will use the following scopes: When performing a search, the GitLab index uses the following scopes:
| Scope Name | What it searches | | Scope Name | What it searches |
| ---------------- | ---------------------- | | ---------------- | ---------------------- |
...@@ -521,25 +521,25 @@ When performing a search, the GitLab index will use the following scopes: ...@@ -521,25 +521,25 @@ When performing a search, the GitLab index will use the following scopes:
For basic guidance on choosing a cluster configuration you may refer to [Elastic Cloud Calculator](https://cloud.elastic.co/pricing). You can find more information below. For basic guidance on choosing a cluster configuration you may refer to [Elastic Cloud Calculator](https://cloud.elastic.co/pricing). You can find more information below.
- Generally, you will want to use at least a 2-node cluster configuration with one replica, which will allow you to have resilience. If your storage usage is growing quickly, you may want to plan horizontal scaling (adding more nodes) beforehand. - Generally, you want to use at least a 2-node cluster configuration with one replica, which allows you to have resilience. If your storage usage is growing quickly, you may want to plan horizontal scaling (adding more nodes) beforehand.
- It's not recommended to use HDD storage with the search cluster, because it will take a hit on performance. It's better to use SSD storage (NVMe or SATA SSD drives for example). - It's not recommended to use HDD storage with the search cluster, because it takes a hit on performance. It's better to use SSD storage (NVMe or SATA SSD drives for example).
- You can use the [GitLab Performance Tool](https://gitlab.com/gitlab-org/quality/performance) to benchmark search performance with different search cluster sizes and configurations. - You can use the [GitLab Performance Tool](https://gitlab.com/gitlab-org/quality/performance) to benchmark search performance with different search cluster sizes and configurations.
- `Heap size` should be set to no more than 50% of your physical RAM. Additionally, it shouldn't be set to more than the threshold for zero-based compressed oops. The exact threshold varies, but 26 GB is safe on most systems, but can also be as large as 30 GB on some systems. See [Heap size settings](https://www.elastic.co/guide/en/elasticsearch/reference/current/important-settings.html#heap-size-settings) and [Setting JVM options](https://www.elastic.co/guide/en/elasticsearch/reference/current/jvm-options.html) for more details. - `Heap size` should be set to no more than 50% of your physical RAM. Additionally, it shouldn't be set to more than the threshold for zero-based compressed oops. The exact threshold varies, but 26 GB is safe on most systems, but can also be as large as 30 GB on some systems. See [Heap size settings](https://www.elastic.co/guide/en/elasticsearch/reference/current/important-settings.html#heap-size-settings) and [Setting JVM options](https://www.elastic.co/guide/en/elasticsearch/reference/current/jvm-options.html) for more details.
- Number of CPUs (CPU cores) per node usually corresponds to the `Number of Elasticsearch shards` setting described below. - Number of CPUs (CPU cores) per node usually corresponds to the `Number of Elasticsearch shards` setting described below.
- A good guideline is to ensure you keep the number of shards per node below 20 per GB heap it has configured. A node with a 30GB heap should therefore have a maximum of 600 shards, but the further below this limit you can keep it the better. This will generally help the cluster stay in good health. - A good guideline is to ensure you keep the number of shards per node below 20 per GB heap it has configured. A node with a 30GB heap should therefore have a maximum of 600 shards, but the further below this limit you can keep it the better. This generally helps the cluster stay in good health.
- Number of Elasticsearch shards: - Number of Elasticsearch shards:
- Small shards result in small segments, which increases overhead. Aim to keep the average shard size between at least a few GB and a few tens of GB. - Small shards result in small segments, which increases overhead. Aim to keep the average shard size between at least a few GB and a few tens of GB.
- Another consideration is the number of documents. To determine the number of shards to use, sum the numbers in the **Menu > Admin > Dashboard > Statistics** pane (the number of documents to be indexed), divide by 5 million, and add 5. For example: - Another consideration is the number of documents. To determine the number of shards to use, sum the numbers in the **Menu > Admin > Dashboard > Statistics** pane (the number of documents to be indexed), divide by 5 million, and add 5. For example:
- If you have fewer than about 2,000,000 documents, use the default of 5 shards - If you have fewer than about 2,000,000 documents, use the default of 5 shards
- 10,000,000 documents: `10000000/5000000 + 5` = 7 shards - 10,000,000 documents: `10000000/5000000 + 5` = 7 shards
- 100,000,000 documents: `100000000/5000000 + 5` = 25 shards - 100,000,000 documents: `100000000/5000000 + 5` = 25 shards
- `refresh_interval` is a per index setting. You may want to adjust that from default `1s` to a bigger value if you don't need data in real-time. This will change how soon you will see fresh results. If that's important for you, you should leave it as close as possible to the default value. - `refresh_interval` is a per index setting. You may want to adjust that from default `1s` to a bigger value if you don't need data in real-time. This changes how soon you see fresh results. If that's important for you, you should leave it as close as possible to the default value.
- You might want to raise [`indices.memory.index_buffer_size`](https://www.elastic.co/guide/en/elasticsearch/reference/current/indexing-buffer.html) to 30% or 40% if you have a lot of heavy indexing operations. - You might want to raise [`indices.memory.index_buffer_size`](https://www.elastic.co/guide/en/elasticsearch/reference/current/indexing-buffer.html) to 30% or 40% if you have a lot of heavy indexing operations.
### Advanced Search integration settings guidance ### Advanced Search integration settings guidance
- The `Number of Elasticsearch shards` setting usually corresponds with the number of CPUs available in your cluster. For example, if you have a 3-node cluster with 4 cores each, this means you will benefit from having at least 3*4=12 shards in the cluster. It's only possible to change the shards number by using [Split index API](https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-split-index.html) or by reindexing to a different index with a changed number of shards. - The `Number of Elasticsearch shards` setting usually corresponds with the number of CPUs available in your cluster. For example, if you have a 3-node cluster with 4 cores each, this means you benefit from having at least 3*4=12 shards in the cluster. It's only possible to change the shards number by using [Split index API](https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-split-index.html) or by reindexing to a different index with a changed number of shards.
- The `Number of Elasticsearch replicas` setting should most of the time be equal to `1` (each shard will have 1 replica). Using `0` is not recommended, because losing one node will corrupt the index. - The `Number of Elasticsearch replicas` setting should most of the time be equal to `1` (each shard has 1 replica). Using `0` is not recommended, because losing one node corrupts the index.
### Indexing large instances ### Indexing large instances
...@@ -548,7 +548,7 @@ This section may be helpful in the event that the other ...@@ -548,7 +548,7 @@ This section may be helpful in the event that the other
due to large volumes of data being indexed. due to large volumes of data being indexed.
WARNING: WARNING:
Indexing a large instance will generate a lot of Sidekiq jobs. Indexing a large instance generates a lot of Sidekiq jobs.
Make sure to prepare for this task by having a [Scalable and Highly Available Make sure to prepare for this task by having a [Scalable and Highly Available
Setup](../administration/reference_architectures/index.md) or creating [extra Setup](../administration/reference_architectures/index.md) or creating [extra
Sidekiq processes](../administration/operations/extra_sidekiq_processes.md). Sidekiq processes](../administration/operations/extra_sidekiq_processes.md).
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment