Commit 8ddee1f1 authored by Mike Jang's avatar Mike Jang

Merge branch 'jeromezng-master-patch-00802' into 'master'

Move Event Dictionary and Product Analytics Guide To Handbook

See merge request gitlab-org/gitlab!46654
parents 363ccccb 78f8a639
...@@ -10,7 +10,7 @@ Please check the ~"product analytics" [guide](https://docs.gitlab.com/ee/develop ...@@ -10,7 +10,7 @@ Please check the ~"product analytics" [guide](https://docs.gitlab.com/ee/develop
MSG MSG
UPDATE_METRICS_DEFINITIONS_MESSAGE = <<~MSG UPDATE_METRICS_DEFINITIONS_MESSAGE = <<~MSG
When adding, changing, or updating metrics, please update the [Event dictionary Usage Ping table](https://docs.gitlab.com/ee/development/product_analytics/event_dictionary.html#usage-ping). When adding, changing, or updating metrics, please update the [Event dictionary Usage Ping table](https://about.gitlab.com/handbook/product/product-analytics-guide#event-dictionary).
MSG MSG
......
...@@ -183,7 +183,7 @@ See [database guidelines](database/index.md). ...@@ -183,7 +183,7 @@ See [database guidelines](database/index.md).
## Product Analytics guides ## Product Analytics guides
- [Product Analytics guide](product_analytics/index.md) - [Product Analytics guide](https://about.gitlab.com/handbook/product/product-analytics-guide/)
- [Usage Ping guide](product_analytics/usage_ping.md) - [Usage Ping guide](product_analytics/usage_ping.md)
- [Snowplow guide](product_analytics/snowplow.md) - [Snowplow guide](product_analytics/snowplow.md)
......
...@@ -47,7 +47,7 @@ the `author` field. GitLab team members **should not**. ...@@ -47,7 +47,7 @@ the `author` field. GitLab team members **should not**.
- Any user-facing change **must** have a changelog entry. This includes both visual changes (regardless of how minor), and changes to the rendered DOM which impact how a screen reader may announce the content. - Any user-facing change **must** have a changelog entry. This includes both visual changes (regardless of how minor), and changes to the rendered DOM which impact how a screen reader may announce the content.
- Any client-facing change to our REST and GraphQL APIs **must** have a changelog entry. - Any client-facing change to our REST and GraphQL APIs **must** have a changelog entry.
- Performance improvements **should** have a changelog entry. - Performance improvements **should** have a changelog entry.
- Changes that need to be documented in the Product Analytics [Event Dictionary](product_analytics/event_dictionary.md) - Changes that need to be documented in the Product Analytics [Event Dictionary](https://about.gitlab.com/handbook/product/product-analytics-guide#event-dictionary)
also require a changelog entry. also require a changelog entry.
- _Any_ contribution from a community member, no matter how small, **may** have - _Any_ contribution from a community member, no matter how small, **may** have
a changelog entry regardless of these guidelines if the contributor wants one. a changelog entry regardless of these guidelines if the contributor wants one.
...@@ -55,7 +55,7 @@ the `author` field. GitLab team members **should not**. ...@@ -55,7 +55,7 @@ the `author` field. GitLab team members **should not**.
- Any docs-only changes **should not** have a changelog entry. - Any docs-only changes **should not** have a changelog entry.
- Any change behind a disabled feature flag **should not** have a changelog entry. - Any change behind a disabled feature flag **should not** have a changelog entry.
- Any change behind an enabled feature flag **should** have a changelog entry. - Any change behind an enabled feature flag **should** have a changelog entry.
- Any change that adds new usage data metrics and changes that needs to be documented in Product Analytics [Event Dictionary](telemetry/event_dictionary.md) **should** have a changelog entry. - Any change that adds new usage data metrics and changes that needs to be documented in Product Analytics [Event Dictionary](https://about.gitlab.com/handbook/product/product-analytics-guide#event-dictionary) **should** have a changelog entry.
- A change that [removes a feature flag](feature_flags/development.md) **should** have a changelog entry - - A change that [removes a feature flag](feature_flags/development.md) **should** have a changelog entry -
only if the feature flag did not default to true already. only if the feature flag did not default to true already.
- A fix for a regression introduced and then fixed in the same release (i.e., - A fix for a regression introduced and then fixed in the same release (i.e.,
......
...@@ -27,7 +27,7 @@ A database review is required for: ...@@ -27,7 +27,7 @@ A database review is required for:
database review. database review.
- Changes in usage data metrics that use `count` and `distinct_count`. - Changes in usage data metrics that use `count` and `distinct_count`.
These metrics could have complex queries over large tables. These metrics could have complex queries over large tables.
See the [Product Analytics Guide](product_analytics/usage_ping.md#implementing-usage-ping) See the [Product Analytics Guide](https://about.gitlab.com/handbook/product/product-analytics-guide/)
for implementation details. for implementation details.
A database reviewer is expected to look out for obviously complex A database reviewer is expected to look out for obviously complex
......
...@@ -115,7 +115,7 @@ addressed. ...@@ -115,7 +115,7 @@ addressed.
To determine whether the experiment is a success or not, we must implement tracking events To determine whether the experiment is a success or not, we must implement tracking events
to acquire data for analyzing. We can send events to Snowplow via either the backend or frontend. to acquire data for analyzing. We can send events to Snowplow via either the backend or frontend.
Read the [product analytics guide](../product_analytics/index.md) for more details. Read the [product analytics guide](https://about.gitlab.com/handbook/product/product-analytics-guide/) for more details.
#### Track backend events #### Track backend events
......
--- ---
stage: Growth redirect_to: 'https://about.gitlab.com/handbook/product/product-analytics-guide/'
group: Product Analytics
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#designated-technical-writers
--- ---
# Event Dictionary This document was moved to [another location](https://about.gitlab.com/handbook/product/product-analytics-guide/).
**Note: We've temporarily moved the Event Dictionary to a [Google Sheet](https://docs.google.com/spreadsheets/d/1VzE8R72Px_Y_LlE3Z05LxUlG_dumWe3vl-HeUo70TPw/edit?usp=sharing)**. The previous Markdown table exceeded 600 rows making it difficult to manage. In the future, our intention is to move this back into our docs using a [YAML file](https://gitlab.com/gitlab-org/gitlab-docs/-/issues/823).
The event dictionary is a single source of truth for the metrics and events we collect for product usage data. The Event Dictionary lists all the metrics and events we track, why we're tracking them, and where they are tracked.
This is a living document that is updated any time a new event is planned or implemented. It includes the following information.
- Section, stage, or group
- Description
- Implementation status
- Availability by plan type
- Code path
We're currently focusing our Event Dictionary on [Usage Ping](usage_ping.md). In the future, we will also include [Snowplow](snowplow.md). We currently have an initiative across the entire product organization to complete the [Event Dictionary for Usage Ping](https://gitlab.com/groups/gitlab-org/-/epics/4174).
## Instructions
1. Open the Event Dictionary and fill in all the **PM to edit** columns highlighted in yellow.
1. Check that all the metrics and events are assigned to the correct section, stage, or group. If a metric is used across many groups, assign it to the stage. If a metric is used across many stages, assign it to the section. If a metric is incorrectly assigned to another section, stage, or group, let the PM know you have reassigned it. If your group has no assigned metrics and events, check that your metrics and events are not incorrectly assigned to another PM.
1. Add descriptions of what your metrics and events are tracking. Work with your Engineering team or the Product Analytics team if you need help understanding this.
1. Add what plans this metric is available on. Work with your Engineering team or the Product Analytics team if you need help understanding this.
## Planned metrics and events
For future metrics and events you plan to track, please add them to the Event Dictionary and note the status as `Planned`, `In Progress`, or `Implemented`. Once you have confirmed the metric has been implemented and have confirmed the metric data is in our data warehouse, change the status to **Data Available**.
--- ---
stage: Growth redirect_to: 'https://about.gitlab.com/handbook/product/product-analytics-guide/'
group: Product Analytics
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#designated-technical-writers
--- ---
# Product Analytics Guide This document was moved to [another location](https://about.gitlab.com/handbook/product/product-analytics-guide/).
At GitLab, we collect product usage data for the purpose of helping us build a better product. Data helps GitLab understand which parts of the product need improvement and which features we should build next. Product usage data also helps our team better understand the reasons why people use GitLab. With this knowledge we are able to make better product decisions.
We encourage users to enable tracking, and we embrace full transparency with our tracking approach so it can be easily understood and trusted.
By enabling tracking, users can:
- Contribute back to the wider community.
- Help GitLab improve on the product.
## Our tracking tools
We use three methods to gather product usage data:
- [Snowplow](#snowplow)
- [Usage Ping](#usage-ping)
- [Database import](#database-import)
### Snowplow
Snowplow is an enterprise-grade marketing and product analytics platform which helps track the way
users engage with our website and application.
Snowplow consists of two components:
- [Snowplow JS](https://github.com/snowplow/snowplow/wiki/javascript-tracker) tracks client-side
events.
- [Snowplow Ruby](https://github.com/snowplow/snowplow/wiki/ruby-tracker) tracks server-side events.
For more details, read the [Snowplow](snowplow.md) guide.
### Usage Ping
Usage Ping is a method for GitLab Inc to collect usage data on a GitLab instance. Usage Ping is primarily composed of row counts for different tables in the instance’s database. By comparing these counts month over month (or week over week), we can get a rough sense for how an instance is using the different features within the product. This high-level data is used to help our product, support, and sales teams.
For more details, read the [Usage Ping](usage_ping.md) guide.
### Database import
Database imports are full imports of data into GitLab's data warehouse. For GitLab.com, the PostgreSQL database is loaded into Snowflake data warehouse every 6 hours. For more details, see the [data team handbook](https://about.gitlab.com/handbook/business-ops/data-team/platform/#extract-and-load).
## What data can be tracked
Our different tracking tools allows us to track different types of events. The event types and examples of what data can be tracked are outlined below.
The availability of event types and their tracking tools varies by segment. For example, on Self-Managed Users, we only have reporting using Database records via Usage Ping.
| Event Types | SaaS Instance | SaaS Plan | SaaS Group | SaaS Session | SaaS User | SM Instance | SM Plan | SM Group | SM Session | SM User |
|----------------------------------------|---------------|-----------|------------|--------------|-----------|-------------|---------|----------|------------|---------|
| Snowplow (JS Pageview events) | ✅ | 📅 | 📅 | ✅ | 📅 | 📅 | 📅 | 📅 | 📅 | 📅 |
| Snowplow (JS UI events) | ✅ | 📅 | 📅 | ✅ | 📅 | 📅 | 📅 | 📅 | 📅 | 📅 |
| Snowplow (Ruby Pageview events) | ✅ | 📅 | 📅 | ✅ | 📅 | 📅 | 📅 | 📅 | 📅 | 📅 |
| Snowplow (Ruby CRUD / API events) | ✅ | 📅 | 📅 | ✅ | 📅 | 📅 | 📅 | 📅 | 📅 | 📅 |
| Usage Ping (Redis UI counters) | 🔄 | 🔄 | 🔄 | ✖️ | 🔄 | 🔄 | 🔄 | 🔄 | ✖️ | 🔄 |
| Usage Ping (Redis Pageview counters) | 🔄 | 🔄 | 🔄 | ✖️ | 🔄 | 🔄 | 🔄 | 🔄 | ✖️ | 🔄 |
| Usage Ping (Redis CRUD / API counters) | 🔄 | 🔄 | 🔄 | ✖️ | 🔄 | 🔄 | 🔄 | 🔄 | ✖️ | 🔄 |
| Usage Ping (Database counters) | ✅ | 🔄 | 📅 | ✖️ | ✅ | ✅ | ✅ | ✅ | ✖️ | ✅ |
| Usage Ping (Instance settings) | ✅ | 🔄 | 📅 | ✖️ | ✅ | ✅ | ✅ | ✅ | ✖️ | ✅ |
| Usage Ping (Integration settings) | ✅ | 🔄 | 📅 | ✖️ | ✅ | ✅ | ✅ | ✅ | ✖️ | ✅ |
| Database import (Database records) | ✅ | ✅ | ✅ | ✖️ | ✅ | ✖️ | ✖️ | ✖️ | ✖️ | ✖️ |
[Source file](https://docs.google.com/spreadsheets/d/1e8Afo41Ar8x3JxAXJF3nL83UxVZ3hPIyXdt243VnNuE/edit?usp=sharing)
**Legend**
✅ Available, 🔄 In Progress, 📅 Planned, ✖️ Not Possible
SaaS = GitLab.com. SM = Self-Managed instance
### Pageview events
- Number of sessions that visited the /dashboard/groups page
### UI events
- Number of sessions that clicked on a button or link
- Number of sessions that closed a modal
UI events are any interface-driven actions from the browser including click data.
### CRUD or API events
- Number of Git pushes
- Number of GraphQL queries
- Number of requests to a Rails action or controller
These are backend events that include the creation, read, update, deletion of records, and other events that might be triggered from layers other than those available in the interface.
### Database records
These are raw database records which can be explored using business intelligence tools like Sisense. The full list of available tables can be found in [structure.sql](https://gitlab.com/gitlab-org/gitlab/-/blob/master/db/structure.sql).
### Instance settings
These are settings of your instance such as the instance's Git version and if certain features are enabled such as `container_registry_enabled`.
### Integration settings
These are integrations your GitLab instance interacts with such as an [external storage provider](../../administration/static_objects_external_storage.md) or an [external container registry](../../administration/packages/container_registry.md#use-an-external-container-registry-with-gitlab-as-an-auth-endpoint). These services must be able to send data back into a GitLab instance for data to be tracked.
## Reporting level
Our reporting levels of aggregate or individual reporting varies by segment. For example, on Self-Managed Users, we can report at an aggregate user level using Usage Ping but not on an Individual user level.
| Aggregated Reporting | SaaS Instance | SaaS Plan | SaaS Group | SaaS Session | SaaS User | SM Instance | SM Plan | SM Group | SM Session | SM User |
|----------------------|---------------|-----------|------------|--------------|-----------|-------------|---------|----------|------------|---------|
| Snowplow | ✅ | 📅 | 📅 | ✅ | 📅 | ✅ | 📅 | 📅 | ✅ | 📅 |
| Usage Ping | ✅ | 🔄 | 📅 | 📅 | ✅ | ✅ | ✅ | ✅ | 📅 | ✅ |
| Database import | ✅ | ✅ | ✅ | ✖️ | ✅ | ✖️ | ✖️ | ✖️ | ✖️ | ✖️ |
| Identifiable Reporting | SaaS Instance | SaaS Plan | SaaS Group | SaaS Session | SaaS User | SM Instance | SM Plan | SM Group | SM Session | SM User |
|------------------------|---------------|-----------|------------|--------------|-----------|-------------|---------|----------|------------|---------|
| Snowplow | ✅ | 📅 | 📅 | ✅ | 📅 | ✖️ | ✖️ | ✖️ | ✖️ | ✖️ |
| Usage Ping | ✅ | 🔄 | 📅 | ✖️ | ✖️ | ✅ | ✅ | ✖️ | ✖️ | ✖️ |
| Database import | ✅ | ✅ | ✅ | ✖️ | ✅ | ✖️ | ✖️ | ✖️ | ✖️ | ✖️ |
**Legend**
✅ Available, 🔄 In Progress, 📅 Planned, ✖️ Not Possible
SaaS = GitLab.com. SM = Self-Managed instance
## Reporting time period
Our reporting time periods varies by segment. For example, on Self-Managed Users, we can report all time counts and 28 day counts in Usage Ping.
| Reporting Time Period | All Time | 28 Days | 7 Days | Daily |
|-----------------------|----------|---------|--------|-------|
| Snowplow | ✅ | ✅ | ✅ | ✅ |
| Usage Ping | ✅ | ✅ | 📅 | ✖️ |
| Database import | ✅ | ✅ | ✅ | ✅ |
**Legend**
✅ Available, 🔄 In Progress, 📅 Planned, ✖️ Not Possible
## Systems overview
The systems overview is a simplified diagram showing the interactions between GitLab Inc and self-managed instances.
![Product Analytics Overview](../img/telemetry_system_overview.png)
[Source file](https://app.diagrams.net/#G13DVpN-XnhWGz9tqReIj8pp1UE4ehk_EC)
### GitLab Inc
For Product Analytics purposes, GitLab Inc has three major components:
1. [Data Infrastructure](https://about.gitlab.com/handbook/business-ops/data-team/platform/infrastructure/): This contains everything managed by our data team including Sisense Dashboards for visualization, Snowflake for Data Warehousing, incoming data sources such as PostgreSQL Pipeline and S3 Bucket, and lastly our data collectors [GitLab.com's Snowplow Collector](https://gitlab.com/gitlab-com/gl-infra/readiness/-/tree/master/library/snowplow/) and GitLab's Versions Application.
1. GitLab.com: This is the production GitLab application which is made up of a Client and Server. On the Client or browser side, a Snowplow JS Tracker (Frontend) is used to track client-side events. On the Server or application side, a Snowplow Ruby Tracker (Backend) is used to track server-side events. The server also contains Usage Ping which leverages a PostgreSQL database and a Redis in-memory data store to report on usage data. Lastly, the server also contains System Logs which are generated from running the GitLab application.
1. [Monitoring infrastructure](https://about.gitlab.com/handbook/engineering/monitoring/): This is the infrastructure used to ensure GitLab.com is operating smoothly. System Logs are sent from GitLab.com to our monitoring infrastructure and collected by a FluentD collector. From FluentD, logs are either sent to long term Google Cloud Services cold storage via Stackdriver, or, they are sent to our Elastic Cluster via Cloud Pub/Sub which can be explored in real-time using Kibana.
### Self-managed
For Product Analytics purposes, self-managed instances have two major components:
1. Data infrastructure: Having a data infrastructure setup is optional on self-managed instances. If you'd like to collect Snowplow tracking events for your self-managed instance, you can setup your own self-managed Snowplow collector and configure your Snowplow events to point to your own collector.
1. GitLab: A self-managed GitLab instance contains all of the same components as GitLab.com mentioned above.
### Differences between GitLab Inc and Self-managed
As shown by the orange lines, on GitLab.com Snowplow JS, Snowplow Ruby, Usage Ping, and PostgreSQL database imports all flow into GitLab Inc's data infrastructure. However, on self-managed, only Usage Ping flows into GitLab Inc's data infrastructure.
As shown by the green lines, on GitLab.com system logs flow into GitLab Inc's monitoring infrastructure. On self-managed, there are no logs sent to GitLab Inc's monitoring infrastructure.
Note (1): Snowplow JS and Snowplow Ruby are available on self-managed, however, the Snowplow Collector endpoint is set to a self-managed Snowplow Collector which GitLab Inc does not have access to.
## Additional information
More useful links:
- [Product Analytics Direction](https://about.gitlab.com/direction/product-analytics/)
- [Data Analysis Process](https://about.gitlab.com/handbook/business-ops/data-team/#data-analysis-process/)
- [Data for Product Managers](https://about.gitlab.com/handbook/business-ops/data-team/programs/data-for-product-managers/)
- [Data Infrastructure](https://about.gitlab.com/handbook/business-ops/data-team/platform/infrastructure/)
...@@ -10,7 +10,7 @@ This guide provides an overview of how Snowplow works, and implementation detail ...@@ -10,7 +10,7 @@ This guide provides an overview of how Snowplow works, and implementation detail
For more information about Product Analytics, see: For more information about Product Analytics, see:
- [Product Analytics Guide](index.md) - [Product Analytics Guide](https://about.gitlab.com/handbook/product/product-analytics-guide/)
- [Usage Ping Guide](usage_ping.md) - [Usage Ping Guide](usage_ping.md)
More useful links: More useful links:
......
...@@ -15,7 +15,7 @@ This guide describes Usage Ping's purpose and how it's implemented. ...@@ -15,7 +15,7 @@ This guide describes Usage Ping's purpose and how it's implemented.
For more information about Product Analytics, see: For more information about Product Analytics, see:
- [Product Analytics Guide](index.md) - [Product Analytics Guide](https://about.gitlab.com/handbook/product/product-analytics-guide/)
- [Snowplow Guide](snowplow.md) - [Snowplow Guide](snowplow.md)
More useful links: More useful links:
...@@ -613,7 +613,7 @@ We also use `#database-lab` and [explain.depesz.com](https://explain.depesz.com/ ...@@ -613,7 +613,7 @@ We also use `#database-lab` and [explain.depesz.com](https://explain.depesz.com/
### 5. Add the metric definition ### 5. Add the metric definition
When adding, changing, or updating metrics, please update the [Event Dictionary's **Usage Ping** table](event_dictionary.md). When adding, changing, or updating metrics, please update the [Event Dictionary's **Usage Ping** table](https://about.gitlab.com/handbook/product/product-analytics-guide#event-dictionary).
### 6. Add new metric to Versions Application ### 6. Add new metric to Versions Application
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment