# Scale Testing
This folder contains CLI commands, Terraform code, and scripts to aid in performing load tests of Coder.
At a high level, it performs the following steps:
- Using the Terraform code in `./terraform`, stands up a preconfigured Google Cloud environment
  consisting of a VPC, GKE Cluster, and CloudSQL instance.
  > **Note:** You must have an existing Google Cloud project available.
- Creates a dedicated namespace for Coder and installs Coder using the Helm chart in this namespace.
- Configures the Coder deployment with random credentials and a predefined Kubernetes template.
  > **Note:** These credentials are stored in `${PROJECT_ROOT}/scaletest/.coderv2/coder.env`.
- Creates a number of workspaces and waits for them to all start successfully. These workspaces
  are ephemeral and do not contain any persistent resources.
- Waits for 10 minutes to allow things to settle and establish a baseline.
- Generates web terminal traffic to all workspaces for 30 minutes.
- Directly after traffic generation, captures goroutine and heap snapshots of the Coder deployment.
- Tears down all resources (unless `--skip-cleanup` is specified).
## Usage
The main entrypoint is the `scaletest.sh` script.
```console
$ scaletest.sh --help
Usage: scaletest.sh --name <name> --project <project> --num-workspaces <num-workspaces> --scenario <scenario> [--dry-run] [--skip-cleanup]
```
### Required arguments:
- `--name`: Name for the loadtest. This is added as a prefix to resources created by Terraform (e.g. `joe-big-loadtest`).
- `--project`: Google Cloud project in which to create the resources (example: `my-loadtest-project`).
- `--num-workspaces`: Number of workspaces to create (example: `10`).
- `--scenario`: Deployment scenario to use (example: `small`). See `terraform/scenario-*.tfvars`.
> **Note:** In order to capture Prometheus metrics, you must define the environment variables
> `SCALETEST_PROMETHEUS_REMOTE_WRITE_USER` and `SCALETEST_PROMETHEUS_REMOTE_WRITE_PASSWORD`.
### Optional arguments:
- `--dry-run`: Do not perform any action and instead print what would be executed.
- `--skip-cleanup`: Do not perform any cleanup. You will be responsible for deleting any resources this creates.
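
The arguments above might be combined as in the following minimal sketch. The values are placeholders taken from the examples in this section, the `./scaletest.sh` path assumes the command is run from the `scaletest` directory, and `scaletest_cmd` is a hypothetical helper that only prints the command line so it can be reviewed before anything is run.

```shell
# Placeholder values; substitute your own.
name="joe-big-loadtest"
project="my-loadtest-project"
num_workspaces=10
scenario="small"

# Hypothetical helper: prints a dry-run invocation instead of executing it.
scaletest_cmd() {
  printf './scaletest.sh --name %s --project %s --num-workspaces %s --scenario %s --dry-run\n' \
    "$1" "$2" "$3" "$4"
}

scaletest_cmd "$name" "$project" "$num_workspaces" "$scenario"
```

Starting with `--dry-run` shows what would be executed without creating any cloud resources; drop the flag once the printed plan looks right.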
### Environment Variables
All of the above arguments may be specified as environment variables. Consult the script for details.
### Prometheus Metrics
To capture Prometheus metrics from the loadtest, two environment variables are required:
- `SCALETEST_PROMETHEUS_REMOTE_WRITE_USER`
- `SCALETEST_PROMETHEUS_REMOTE_WRITE_PASSWORD`
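
As a sketch, both variables could be exported and checked before starting a run. The credential values are placeholders, and `check_prometheus_env` is a hypothetical helper, not part of `scaletest.sh`.

```shell
# Placeholder credentials; use the values issued by your metrics provider.
export SCALETEST_PROMETHEUS_REMOTE_WRITE_USER="example-user"
export SCALETEST_PROMETHEUS_REMOTE_WRITE_PASSWORD="example-password"

# Hypothetical pre-flight check: returns 1 if either variable is empty or unset.
check_prometheus_env() {
  missing=0
  if [ -z "${SCALETEST_PROMETHEUS_REMOTE_WRITE_USER:-}" ]; then
    echo "SCALETEST_PROMETHEUS_REMOTE_WRITE_USER is not set" >&2
    missing=1
  fi
  if [ -z "${SCALETEST_PROMETHEUS_REMOTE_WRITE_PASSWORD:-}" ]; then
    echo "SCALETEST_PROMETHEUS_REMOTE_WRITE_PASSWORD is not set" >&2
    missing=1
  fi
  return "$missing"
}

check_prometheus_env && echo "Prometheus remote write credentials are set"
```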
### Enterprise License
To add an Enterprise license, set the `SCALETEST_CODER_LICENSE` environment variable to the JWT string.
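
A signed JWT in compact form consists of three dot-separated segments, so a quick shape check can catch a truncated or mis-pasted license before a run. The sketch below assumes a placeholder license value; `looks_like_jwt` is a hypothetical helper, not part of `scaletest.sh`, and it checks only the shape of the string, not its signature or contents.

```shell
# Hypothetical helper: succeeds only for exactly three non-empty,
# dot-separated segments.
looks_like_jwt() {
  case "$1" in
    *.*.*.*) return 1 ;;   # three or more dots: too many segments
    ?*.?*.?*) return 0 ;;  # three non-empty segments
    *) return 1 ;;
  esac
}

license="header.payload.signature"  # placeholder, not a real license

if looks_like_jwt "$license"; then
  export SCALETEST_CODER_LICENSE="$license"
  echo "SCALETEST_CODER_LICENSE is set"
else
  echo "license does not look like a JWT" >&2
fi
```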
## Scenarios
A scenario defines a number of variables that override the default Terraform variables.
A number of existing scenarios are provided in `scaletest/terraform/scenario-*.tfvars`.
For example, `scenario-small.tfvars` includes the following variable definitions:
```
nodepool_machine_type_coder = "t2d-standard-2"
nodepool_machine_type_workspaces = "t2d-standard-2"
coder_cpu = "1000m" # Leaving 1 CPU for system workloads
coder_mem = "4Gi" # Leaving 4GB for system workloads
```
To create your own scenario, simply add a new file `terraform/scenario-$SCENARIO_NAME.tfvars`.
In this file, override variables as required, consulting `vars.tf` as needed.
You can then use this scenario by specifying `--scenario $SCENARIO_NAME`.
For example, if your scenario file were named `scenario-big-whopper2x.tfvars`, you would specify
`--scenario=big-whopper2x`.
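
Creating such a scenario file could look like the sketch below. The variable names come from the `scenario-small.tfvars` excerpt above, but the values are illustrative only and are not a sizing recommendation. `SCENARIO_DIR` is a hypothetical variable that defaults to a temporary directory so the sketch does not touch a real checkout; point it at `terraform/` for actual use.

```shell
SCENARIO_NAME="big-whopper2x"
# Assumption: write to a temp dir by default; set SCENARIO_DIR=terraform in a real checkout.
SCENARIO_DIR="${SCENARIO_DIR:-$(mktemp -d)}"
SCENARIO_FILE="$SCENARIO_DIR/scenario-$SCENARIO_NAME.tfvars"

# Illustrative overrides; consult vars.tf for the full set of variables.
cat > "$SCENARIO_FILE" <<'EOF'
nodepool_machine_type_coder      = "t2d-standard-4"
nodepool_machine_type_workspaces = "t2d-standard-4"
coder_cpu = "2000m"
coder_mem = "8Gi"
EOF

echo "Wrote $SCENARIO_FILE; select it with --scenario $SCENARIO_NAME"
```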
## Utility scripts
A number of utility scripts are provided in `lib`, and are used by `scaletest.sh`:
- `coder_shim.sh`: A convenience script to run the `coder` binary with a predefined config root.
  This is intended to allow running Coder CLI commands against the loadtest cluster without
  modifying a user's existing Coder CLI configuration.
- `coder_init.sh`: Performs first-time user setup of an existing Coder instance, generating
  a random password for the admin user. The admin user is named `admin@coder.com` by default.
  Credentials are written to `scaletest/.coderv2/coder.env`.
- `coder_workspacetraffic.sh`: Runs traffic generation against the loadtest cluster and creates
  a monitoring manifest for the traffic generation pod. This pod will restart automatically
  after the traffic generation has completed.
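
The idea behind `coder_shim.sh` can be sketched as a small wrapper function. This is not the script's actual contents: `coder_with_config_root`, `CODER_BIN`, and `CODER_CONFIG_ROOT` are hypothetical names, and the sketch assumes the Coder CLI's `--global-config` flag is what selects the config directory.

```shell
# Assumed locations; adjust to your checkout.
CODER_CONFIG_ROOT="${CODER_CONFIG_ROOT:-scaletest/.coderv2}"
CODER_BIN="${CODER_BIN:-coder}"

# Hypothetical wrapper: runs the Coder CLI against a dedicated config root,
# leaving the user's default CLI configuration untouched.
coder_with_config_root() {
  "$CODER_BIN" --global-config "$CODER_CONFIG_ROOT" "$@"
}

# Example (requires the coder binary and a running deployment):
#   coder_with_config_root users list
```

Keeping the config root separate means that logging in to the loadtest cluster does not overwrite the session for any other Coder deployment you use.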
## Grafana Dashboard
A sample Grafana dashboard is provided in `scaletest_dashboard.json`. This dashboard is intended
to be imported into an existing Grafana instance. It provides a number of useful metrics:
- **Control Plane Resources**: CPU, memory, and network usage for the Coder deployment, as well as the number of pod restarts.
- **Database**: Rows inserted/updated/deleted/returned, active connections, and transactions per second. Fine-grained `sqlQuerier` metrics are provided for Coder's database as well, broken down by query method.
- **HTTP requests**: Number of HTTP requests per second, broken down by status code and path.
- **Workspace Resources**: CPU, memory, and network usage for all workspaces.
- **Workspace Agents**: Workspace agent network usage, connection latency, and number of active connections.
- **Workspace Traffic**: Statistics related to workspace traffic generation.
- **Internals**: Provisioner job timings, concurrency, workspace builds, and AuthZ duration.
A subset of these metrics may be useful for a production deployment, but some are only useful
for load testing.
> **Note:** In particular, `sqlQuerier` metrics produce a large number of time series and may cause
> increased charges in your metrics provider.