docs: use scale testing utility (#12643)

Marcin Tojek 2024-03-22 12:33:31 +01:00 committed by GitHub
parent 37a05372fa
commit a7d9d87ba2
13 changed files with 833 additions and 140 deletions


@ -604,6 +604,9 @@ jobs:
      - name: Setup sqlc
        uses: ./.github/actions/setup-sqlc
      - name: make gen
        run: "make --output-sync -j -B gen"
      - name: Format
        run: |
          cd offlinedocs


@ -1,110 +1,18 @@
We scale-test Coder with [a built-in utility](#scale-testing-utility) that can
be used in your environment for insights into how Coder scales with your
infrastructure. For scale-testing Kubernetes clusters, we recommend installing
and using the dedicated Coder template,
[scaletest-runner](https://github.com/coder/coder/tree/main/scaletest/templates/scaletest-runner).
## General concepts
Coder runs workspace operations in a queue. The number of concurrent builds will
be limited to the number of provisioner daemons across all coderd replicas.
- **coderd**: Coder's primary service. Learn more about
[Coder's architecture](../about/architecture.md)
- **coderd replicas**: Replicas (often via Kubernetes) for high availability;
this is an [enterprise feature](../enterprise.md)
- **concurrent workspace builds**: Workspace operations (e.g.
create/stop/delete/apply) across all users
- **concurrent connections**: Any connection to a workspace (e.g. SSH, web
terminal, `coder_app`)
- **provisioner daemons**: Coder runs one workspace build per provisioner
daemon. One coderd replica can host many daemons
- **scaletest**: Our scale-testing utility, built into the `coder` command line.
```text
2 coderd replicas * 30 provisioner daemons = 60 max concurrent workspace builds
```
## Infrastructure recommendations
> Note: The below are guidelines for planning your infrastructure. Your mileage
> may vary depending on your templates, workflows, and users.
When planning your infrastructure, we recommend you consider the following:
1. CPU and memory requirements for `coderd`. We recommend allocating 1 CPU core
and 2 GB RAM per `coderd` replica at minimum. See
[Concurrent users](#concurrent-users) for more details.
1. CPU and memory requirements for
[external provisioners](../admin/provisioners.md#running-external-provisioners),
if required. We recommend allocating 1 CPU core and 1 GB RAM per 5 concurrent
workspace builds to external provisioners. Note that this may vary depending
on the template used. See
[Concurrent workspace builds](#concurrent-workspace-builds) for more details.
By default, `coderd` runs 3 integrated provisioners.
1. CPU and memory requirements for the database used by `coderd`. We recommend
allocating an additional 1 CPU core to the database used by Coder for every
1000 active users.
1. CPU and memory requirements for workspaces created by Coder. This will vary
depending on users' needs. However, the Coder agent itself requires at
minimum 0.1 CPU cores and 256 MB to run inside a workspace.
### Concurrent users
We recommend allocating 2 CPU cores and 4 GB RAM per `coderd` replica per 1000
active users. We also recommend allocating an additional 1 CPU core to the
database used by Coder for every 1000 active users. Inactive users do not
consume Coder resources, although workspaces configured to auto-start will
consume resources when they are built.
Users' primary mode of accessing Coder will also affect resource requirements.
If users will be accessing workspaces primarily via Coder's HTTP interface, we
recommend doubling the number of cores and RAM allocated per user. For example,
if you expect 1000 users accessing workspaces via the web, we recommend
allocating 4 CPU cores and 8 GB RAM.
Users accessing workspaces via SSH will consume fewer resources, as SSH
connections are not proxied through Coder.
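As a back-of-the-envelope illustration of the guidance above, the following sketch computes a starting allocation. The variable names and the `web_heavy` toggle are assumptions for illustration; adjust them to your own usage patterns.

```shell
#!/usr/bin/env bash
# Rough coderd sizing sketch based on the guidance above (illustrative only).
users=1000      # expected number of active users (multiples of 1000 divide evenly)
web_heavy=true  # true if most users access workspaces via Coder's HTTP interface

cpu=$((users / 1000 * 2))  # 2 CPU cores per 1000 active users
ram=$((users / 1000 * 4))  # 4 GB RAM per 1000 active users
db_cpu=$((users / 1000))   # +1 database CPU core per 1000 active users

if [ "$web_heavy" = true ]; then
  cpu=$((cpu * 2))  # double the allocation for primarily web-based access
  ram=$((ram * 2))
fi

echo "coderd (per replica): ${cpu} cores, ${ram} GB RAM; database: +${db_cpu} cores"
```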
### Concurrent workspace builds
Workspace builds are CPU-intensive, as they rely on Terraform. Various
[Terraform providers](https://registry.terraform.io/browse/providers) have
different resource requirements. When tested with our
[kubernetes](https://github.com/coder/coder/tree/main/examples/templates/kubernetes)
template, `coderd` will consume roughly 0.25 cores per concurrent workspace
build. For effective provisioning, our helm chart prefers to schedule
[one coderd replica per-node](https://github.com/coder/coder/blob/main/helm/coder/values.yaml#L188-L202).
We recommend:
- Running `coderd` on a dedicated set of nodes. This will prevent other
workloads from interfering with workspace builds. You can use
[node selectors](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodeselector),
or
[taints and tolerations](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/)
to achieve this.
- Disabling autoscaling for `coderd` nodes. Autoscaling can cause interruptions
for users, see [Autoscaling](#autoscaling) for more details.
- (Enterprise-only) Running external provisioners instead of Coder's built-in
provisioners (`CODER_PROVISIONER_DAEMONS=0`) will offload workspace
provisioning from the `coderd` nodes. For more details, see
[External provisioners](../admin/provisioners.md#running-external-provisioners).
- Alternatively, if you increase the number of integrated provisioner daemons in
`coderd` (`CODER_PROVISIONER_DAEMONS>3`), allocate additional resources to
`coderd` to compensate (approx. 0.25 cores and 256 MB per provisioner daemon).
For example, to support 120 concurrent workspace builds:
- Create a cluster/nodepool with 4 nodes, 8 cores each (AWS: `t3.2xlarge`, GCP:
`e2-highcpu-8`)
- Run coderd with 4 replicas, 30 provisioner daemons each
(`CODER_PROVISIONER_DAEMONS=30`)
- Ensure Coder's [PostgreSQL server](./configure.md#postgresql-database) can use
up to 2 cores and 4 GB RAM
Learn more about [Coder's architecture](../about/architecture.md) and our
[scale-testing methodology](architectures/index.md#scale-testing-methodology).
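To make the arithmetic in the example above explicit, here is a quick sketch. The replica count itself is controlled by your deployment (e.g. Helm values); only the daemon count per replica is set through the environment variable.

```shell
# 4 coderd replicas x 30 provisioner daemons each = 120 max concurrent builds.
replicas=4
daemons_per_replica=30
echo "max concurrent workspace builds: $((replicas * daemons_per_replica))"

# Set on each coderd replica to run 30 integrated provisioner daemons.
export CODER_PROVISIONER_DAEMONS=30
```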
## Recent scale tests
> Note: the below information is for reference purposes only, and is not
> intended to be used as guidelines for infrastructure sizing. Review the
> [Reference Architectures](architectures/index.md) for hardware sizing
> recommendations.
| Environment | Coder CPU | Coder RAM | Coder Replicas | Database | Users | Concurrent builds | Concurrent connections (Terminal/SSH) | Coder Version | Last tested |
| ---------------- | --------- | --------- | -------------- | ----------------- | ----- | ----------------- | ------------------------------------- | ------------- | ------------ |
@ -123,58 +31,78 @@ Since Coder's performance is highly dependent on the templates and workflows you
support, you may wish to use our internal scale testing utility against your own
environments.
> Note: This utility is experimental. It is not subject to any compatibility
> guarantees, and may cause interruptions for your users. To avoid potential
> outages and orphaned resources, we recommend running scale tests on a
> secondary "staging" environment or a dedicated
> [Kubernetes playground cluster](https://github.com/coder/coder/tree/main/scaletest/terraform).
> Run it against a production environment at your own risk.
### Create workspaces
The following command will provision a number of Coder workspaces using the
specified template and extra parameters.
```shell
coder exp scaletest create-workspaces \
  --retry 5 \
  --count "${SCALETEST_PARAM_NUM_WORKSPACES}" \
  --template "${SCALETEST_PARAM_TEMPLATE}" \
  --concurrency "${SCALETEST_PARAM_CREATE_CONCURRENCY}" \
  --timeout 5h \
  --job-timeout 5h \
  --no-cleanup \
  --output json:"${SCALETEST_RESULTS_DIR}/create-workspaces.json"

# Run `coder exp scaletest create-workspaces --help` for all usage
```
The command does the following:
1. Create `${SCALETEST_PARAM_NUM_WORKSPACES}` workspaces concurrently
(concurrency level: `${SCALETEST_PARAM_CREATE_CONCURRENCY}`) using the
template `${SCALETEST_PARAM_TEMPLATE}`.
1. Leave workspaces running to use in next steps (`--no-cleanup` option).
1. Store provisioning results in JSON format (see the example after this list).
1. If you don't want the creation process to be interrupted by any errors, use
the `--retry 5` flag.
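If `jq` is available, you can inspect the stored results afterwards. This is a minimal sketch: the path mirrors the `--output` flag above, and the exact JSON structure may differ between Coder versions.

```shell
# Pretty-print the provisioning results captured by --output.
jq . "${SCALETEST_RESULTS_DIR}/create-workspaces.json"
```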
### Traffic Generation
Given an existing set of workspaces created previously with `create-workspaces`,
the following command will generate traffic similar to that of Coder's Web
Terminal against those workspaces.
```shell
# Produce load at about 625MB/s (25MB/40ms).
coder exp scaletest workspace-traffic \
  --template "${SCALETEST_PARAM_GREEDY_AGENT_TEMPLATE}" \
  --bytes-per-tick $((1024 * 1024 * 25)) \
  --tick-interval 40ms \
  --timeout "$((delay))s" \
  --job-timeout "$((delay))s" \
  --scaletest-prometheus-address 0.0.0.0:21113 \
  --target-workspaces "0:100" \
  --trace=false \
  --output json:"${SCALETEST_RESULTS_DIR}/traffic-${type}-greedy-agent.json"
```
Traffic generation can be parametrized:
1. Send `bytes-per-tick` every `tick-interval`.
1. Enable tracing for performance debugging.
1. Target a range of workspaces with `--target-workspaces 0:100`.
1. For dashboard traffic: Target a range of users with `--target-users 0:100`.
1. Store provisioning results in JSON format.
1. Expose a dedicated Prometheus address (`--scaletest-prometheus-address`) for
scaletest-specific metrics.
The `workspace-traffic` command also supports other modes, such as SSH traffic
and workspace apps:
1. For SSH traffic: Use the `--ssh` flag to generate SSH traffic instead of Web
Terminal traffic, as shown below.
1. For workspace app traffic: Use the `--app [wsdi|wsec|wsra]` flag to select
app behavior (modes: _WebSocket discard_, _WebSocket echo_, _WebSocket read_).
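A minimal sketch of the SSH variant follows; the byte rate, interval, and timeout values are illustrative only.

```shell
coder exp scaletest workspace-traffic \
  --ssh \
  --bytes-per-tick 1024 \
  --tick-interval 100ms \
  --timeout 5m \
  --output json:"${SCALETEST_RESULTS_DIR}/traffic-ssh.json"
```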
### Cleanup
@ -182,11 +110,101 @@ The scaletest utility will attempt to clean up all workspaces it creates. If you
wish to clean up all workspaces, you can run the following command:
```shell
coder exp scaletest cleanup \
  --cleanup-job-timeout 2h \
  --cleanup-timeout 15min
```
This will delete all workspaces and users with the prefix `scaletest-`.
## Scale testing template
Consider using a dedicated
[scaletest-runner](https://github.com/coder/coder/tree/main/scaletest/templates/scaletest-runner)
template alongside the CLI utility for testing large-scale Kubernetes clusters.
The template deploys a main workspace with scripts used to orchestrate Coder,
creating workspaces, generating workspace traffic, or load-testing workspace
apps.
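For example, the template can be imported into your deployment roughly as follows. This is a sketch that assumes you have cloned the `coder/coder` repository and have permission to push templates; check `coder templates push --help` for the flags available in your Coder version.

```shell
git clone https://github.com/coder/coder.git
cd coder
# Create or update the scaletest-runner template from the repository sources.
coder templates push scaletest-runner \
  --directory ./scaletest/templates/scaletest-runner \
  --yes
```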
### Parameters
The _scaletest-runner_ offers the following configuration options:
- Workspace size selection: minimal/small/medium/large (_default_: minimal,
which contains just enough resources for a Coder agent to run without
additional workloads)
- Number of workspaces
- Wait duration between scenarios or staggered approach
The template exposes parameters to control the traffic dimensions for SSH
connections, workspace apps, and dashboard tests:
- Traffic duration of the load test scenario
- Traffic percentage of targeted workspaces
- Bytes per tick and tick interval
- _For workspace apps_: modes (echo, read random data, or write and discard)
Scale testing concurrency can be controlled with the following parameters:
- Enable parallel scenarios - interleave different traffic patterns (SSH,
workspace apps, dashboard traffic, etc.)
- Workspace creation concurrency level (_default_: 10)
- Job concurrency level - generate workspace traffic using multiple jobs
(_default_: 0)
- Cleanup concurrency level
### Kubernetes cluster
It is recommended to learn how to operate the _scaletest-runner_ before running
it against the staging cluster (or production at your own risk). Coder provides
different
[workspace configurations](https://github.com/coder/coder/tree/main/scaletest/templates)
that operators can deploy depending on the traffic projections.
There are a few workspace size options available:
| Workspace size | vCPU | Memory | Persistent storage | Details |
| -------------- | ---- | ------ | ----------------- | ----------------------------------------------------- |
| minimal | 1 | 2 Gi | None | |
| small | 1 | 1 Gi | None | |
| medium | 2 | 2 Gi | None | Medium-sized cluster offers the greedy agent variant. |
| large | 4 | 4 Gi | None | |
Note: Review the selected cluster template and edit the node affinity to match
your setup.
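For instance, assuming a GKE cluster and `kubectl` access, you can quickly check which nodepool labels exist before adjusting the affinity:

```shell
# List nodes with their GKE nodepool label; the templates default to
# cloud.google.com/gke-nodepool=big-workspaces.
kubectl get nodes -L cloud.google.com/gke-nodepool
```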
#### Greedy agent
The greedy agent variant is a template modification that makes the Coder agent
transmit large metadata (size: 4K) while reporting stats. The transmission of
large chunks puts extra overhead on coderd instances and agents when handling
and storing the data.
Use this template variant to verify the limits of your cluster's performance.
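You can reproduce the metadata size locally with the same command the template runs; the exact byte count may vary slightly with base64 line wrapping.

```shell
# 3072 random bytes base64-encode to roughly 4 KB of metadata per report.
dd if=/dev/urandom bs=3072 count=1 status=none | base64 | wc -c
```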
### Observability
During scale tests, operators can monitor progress using a Grafana dashboard.
Coder offers a comprehensive overview
[dashboard](https://github.com/coder/coder/blob/main/scaletest/scaletest_dashboard.json)
that can be imported into your internal Grafana deployment.
This dashboard provides insights into various aspects, including:
- Utilization of resources within the Coder control plane (CPU, memory, pods)
- Database performance metrics (CPU, memory, I/O, connections, queries)
- Coderd API performance (requests, latency, error rate)
- Resource consumption within Coder workspaces (CPU, memory, network usage)
- Internal metrics related to provisioner jobs
Note: Database metrics are disabled by default and can be enabled by setting the
environment variable `CODER_PROMETHEUS_COLLECT_DB_METRICS` to `true`.
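For example, when running `coder server` directly, the metrics endpoint and database metrics can be enabled as follows; in a Kubernetes deployment, set the same variables in the coderd environment. This is a sketch, and `CODER_PROMETHEUS_ENABLE` is the standard toggle for the Prometheus endpoint rather than something specific to scale testing.

```shell
# Enable the Prometheus endpoint and include database metrics.
export CODER_PROMETHEUS_ENABLE=true
export CODER_PROMETHEUS_COLLECT_DB_METRICS=true
coder server
```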
It is highly recommended to deploy a solution for centralized log collection and
aggregation. The presence of error logs may indicate an underscaled deployment
of Coder, necessitating action from operators.
## Autoscaling
We generally do not recommend using an autoscaler that modifies the number of
@ -228,6 +246,6 @@ an annotation on the coderd deployment.
## Troubleshooting
If a load test fails or if you are experiencing performance issues during
day-to-day use, you can leverage Coder's [Prometheus metrics](./prometheus.md)
to identify bottlenecks during scale tests. Additionally, you can use your
existing cloud monitoring stack to measure load, view server logs, etc.
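As a starting point, you can inspect the raw metrics endpoint directly. This is a sketch that assumes Prometheus metrics are enabled and exposed on the default address; adjust the host and port to your deployment.

```shell
# Dump coderd metrics and filter for Coder-specific series.
curl -s http://127.0.0.1:2112/metrics | grep -E '^coderd_'
```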


@ -0,0 +1,7 @@
# kubernetes-large
Provisions a large-sized workspace with no persistent storage.
_Note_: It is assumed you will be running workspaces on a dedicated GKE nodepool.
By default, this template sets a node affinity of `cloud.google.com/gke-nodepool` = `big-workspaces`.
The nodepool affinity can be customized with the variable `kubernetes_nodepool_workspaces`.
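For example, a sketch of overriding the nodepool when importing this template; the flag names follow `coder templates push --help`, and `workspaces-pool` is an illustrative nodepool name.

```shell
coder templates push kubernetes-large \
  --directory . \
  --variable kubernetes_nodepool_workspaces=workspaces-pool
```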


@ -0,0 +1,88 @@
terraform {
required_providers {
coder = {
source = "coder/coder"
version = "~> 0.7.0"
}
kubernetes = {
source = "hashicorp/kubernetes"
version = "~> 2.18"
}
}
}
provider "coder" {}
provider "kubernetes" {
config_path = null # always use host
}
variable "kubernetes_nodepool_workspaces" {
description = "Kubernetes nodepool for Coder workspaces"
type = string
default = "big-workspaces"
}
data "coder_workspace" "me" {}
resource "coder_agent" "main" {
os = "linux"
arch = "amd64"
startup_script_timeout = 180
startup_script = ""
}
resource "kubernetes_pod" "main" {
count = data.coder_workspace.me.start_count
metadata {
name = "coder-${lower(data.coder_workspace.me.owner)}-${lower(data.coder_workspace.me.name)}"
namespace = "coder-big"
labels = {
"app.kubernetes.io/name" = "coder-workspace"
"app.kubernetes.io/instance" = "coder-workspace-${lower(data.coder_workspace.me.owner)}-${lower(data.coder_workspace.me.name)}"
}
}
spec {
security_context {
run_as_user = "1000"
fs_group = "1000"
}
container {
name = "dev"
image = "docker.io/codercom/enterprise-minimal:ubuntu"
image_pull_policy = "Always"
command = ["sh", "-c", coder_agent.main.init_script]
security_context {
run_as_user = "1000"
}
env {
name = "CODER_AGENT_TOKEN"
value = coder_agent.main.token
}
resources {
requests = {
"cpu" = "4"
"memory" = "4Gi"
}
limits = {
"cpu" = "4"
"memory" = "4Gi"
}
}
}
affinity {
node_affinity {
required_during_scheduling_ignored_during_execution {
node_selector_term {
match_expressions {
key = "cloud.google.com/gke-nodepool"
operator = "In"
values = ["${var.kubernetes_nodepool_workspaces}"]
}
}
}
}
}
}
}


@ -0,0 +1,7 @@
# kubernetes-medium-greedy
Provisions a medium-sized workspace with no persistent storage. Greedy agent variant.
_Note_: It is assumed you will be running workspaces on a dedicated GKE nodepool.
By default, this template sets a node affinity of `cloud.google.com/gke-nodepool` = `big-workspaces`.
The nodepool affinity can be customized with the variable `kubernetes_nodepool_workspaces`.


@ -0,0 +1,202 @@
terraform {
required_providers {
coder = {
source = "coder/coder"
version = "~> 0.7.0"
}
kubernetes = {
source = "hashicorp/kubernetes"
version = "~> 2.18"
}
}
}
provider "coder" {}
provider "kubernetes" {
config_path = null # always use host
}
variable "kubernetes_nodepool_workspaces" {
description = "Kubernetes nodepool for Coder workspaces"
type = string
default = "big-workspaces"
}
data "coder_workspace" "me" {}
resource "coder_agent" "main" {
os = "linux"
arch = "amd64"
startup_script_timeout = 180
startup_script = ""
# Greedy metadata (3072 bytes base64 encoded is 4097 bytes).
metadata {
display_name = "Meta 01"
key = "01_meta"
script = "dd if=/dev/urandom bs=3072 count=1 status=none | base64"
interval = 1
timeout = 10
}
metadata {
display_name = "Meta 02"
key = "02_meta"
script = "dd if=/dev/urandom bs=3072 count=1 status=none | base64"
interval = 1
timeout = 10
}
metadata {
display_name = "Meta 03"
key = "03_meta"
script = "dd if=/dev/urandom bs=3072 count=1 status=none | base64"
interval = 1
timeout = 10
}
metadata {
display_name = "Meta 04"
key = "04_meta"
script = "dd if=/dev/urandom bs=3072 count=1 status=none | base64"
interval = 1
timeout = 10
}
metadata {
display_name = "Meta 05"
key = "05_meta"
script = "dd if=/dev/urandom bs=3072 count=1 status=none | base64"
interval = 1
timeout = 10
}
metadata {
display_name = "Meta 06"
key = "06_meta"
script = "dd if=/dev/urandom bs=3072 count=1 status=none | base64"
interval = 1
timeout = 10
}
metadata {
display_name = "Meta 07"
key = "07_meta"
script = "dd if=/dev/urandom bs=3072 count=1 status=none | base64"
interval = 1
timeout = 10
}
metadata {
display_name = "Meta 08"
key = "08_meta"
script = "dd if=/dev/urandom bs=3072 count=1 status=none | base64"
interval = 1
timeout = 10
}
metadata {
display_name = "Meta 09"
key = "09_meta"
script = "dd if=/dev/urandom bs=3072 count=1 status=none | base64"
interval = 1
timeout = 10
}
metadata {
display_name = "Meta 10"
key = "10_meta"
script = "dd if=/dev/urandom bs=3072 count=1 status=none | base64"
interval = 1
timeout = 10
}
metadata {
display_name = "Meta 11"
key = "11_meta"
script = "dd if=/dev/urandom bs=3072 count=1 status=none | base64"
interval = 1
timeout = 10
}
metadata {
display_name = "Meta 12"
key = "12_meta"
script = "dd if=/dev/urandom bs=3072 count=1 status=none | base64"
interval = 1
timeout = 10
}
metadata {
display_name = "Meta 13"
key = "13_meta"
script = "dd if=/dev/urandom bs=3072 count=1 status=none | base64"
interval = 1
timeout = 10
}
metadata {
display_name = "Meta 14"
key = "14_meta"
script = "dd if=/dev/urandom bs=3072 count=1 status=none | base64"
interval = 1
timeout = 10
}
metadata {
display_name = "Meta 15"
key = "15_meta"
script = "dd if=/dev/urandom bs=3072 count=1 status=none | base64"
interval = 1
timeout = 10
}
metadata {
display_name = "Meta 16"
key = "16_meta"
script = "dd if=/dev/urandom bs=3072 count=1 status=none | base64"
interval = 1
timeout = 10
}
}
resource "kubernetes_pod" "main" {
count = data.coder_workspace.me.start_count
metadata {
name = "coder-${lower(data.coder_workspace.me.owner)}-${lower(data.coder_workspace.me.name)}"
namespace = "coder-big"
labels = {
"app.kubernetes.io/name" = "coder-workspace"
"app.kubernetes.io/instance" = "coder-workspace-${lower(data.coder_workspace.me.owner)}-${lower(data.coder_workspace.me.name)}"
}
}
spec {
security_context {
run_as_user = "1000"
fs_group = "1000"
}
container {
name = "dev"
image = "docker.io/codercom/enterprise-minimal:ubuntu"
image_pull_policy = "Always"
command = ["sh", "-c", coder_agent.main.init_script]
security_context {
run_as_user = "1000"
}
env {
name = "CODER_AGENT_TOKEN"
value = coder_agent.main.token
}
resources {
requests = {
"cpu" = "2"
"memory" = "2Gi"
}
limits = {
"cpu" = "2"
"memory" = "2Gi"
}
}
}
affinity {
node_affinity {
required_during_scheduling_ignored_during_execution {
node_selector_term {
match_expressions {
key = "cloud.google.com/gke-nodepool"
operator = "In"
values = ["${var.kubernetes_nodepool_workspaces}"]
}
}
}
}
}
}
}


@ -0,0 +1,7 @@
# kubernetes-medium
Provisions a medium-sized workspace with no persistent storage.
_Note_: It is assumed you will be running workspaces on a dedicated GKE nodepool.
By default, this template sets a node affinity of `cloud.google.com/gke-nodepool` = `big-workspaces`.
The nodepool affinity can be customized with the variable `kubernetes_nodepool_workspaces`.


@ -0,0 +1,88 @@
terraform {
required_providers {
coder = {
source = "coder/coder"
version = "~> 0.7.0"
}
kubernetes = {
source = "hashicorp/kubernetes"
version = "~> 2.18"
}
}
}
provider "coder" {}
provider "kubernetes" {
config_path = null # always use host
}
variable "kubernetes_nodepool_workspaces" {
description = "Kubernetes nodepool for Coder workspaces"
type = string
default = "big-workspaces"
}
data "coder_workspace" "me" {}
resource "coder_agent" "main" {
os = "linux"
arch = "amd64"
startup_script_timeout = 180
startup_script = ""
}
resource "kubernetes_pod" "main" {
count = data.coder_workspace.me.start_count
metadata {
name = "coder-${lower(data.coder_workspace.me.owner)}-${lower(data.coder_workspace.me.name)}"
namespace = "coder-big"
labels = {
"app.kubernetes.io/name" = "coder-workspace"
"app.kubernetes.io/instance" = "coder-workspace-${lower(data.coder_workspace.me.owner)}-${lower(data.coder_workspace.me.name)}"
}
}
spec {
security_context {
run_as_user = "1000"
fs_group = "1000"
}
container {
name = "dev"
image = "docker.io/codercom/enterprise-minimal:ubuntu"
image_pull_policy = "Always"
command = ["sh", "-c", coder_agent.main.init_script]
security_context {
run_as_user = "1000"
}
env {
name = "CODER_AGENT_TOKEN"
value = coder_agent.main.token
}
resources {
requests = {
"cpu" = "2"
"memory" = "2Gi"
}
limits = {
"cpu" = "2"
"memory" = "2Gi"
}
}
}
affinity {
node_affinity {
required_during_scheduling_ignored_during_execution {
node_selector_term {
match_expressions {
key = "cloud.google.com/gke-nodepool"
operator = "In"
values = ["${var.kubernetes_nodepool_workspaces}"]
}
}
}
}
}
}
}


@ -0,0 +1,7 @@
# kubernetes-minimal
Provisions a minimal-sized workspace with no persistent storage.
_Note_: It is assumed you will be running workspaces on a dedicated GKE nodepool.
By default, this template sets a node affinity of `cloud.google.com/gke-nodepool` = `big-workspaces`.
The nodepool affinity can be customized with the variable `kubernetes_nodepool_workspaces`.


@ -0,0 +1,170 @@
terraform {
required_providers {
coder = {
source = "coder/coder"
version = "~> 0.12.0"
}
kubernetes = {
source = "hashicorp/kubernetes"
version = "~> 2.18"
}
}
}
provider "coder" {}
provider "kubernetes" {
config_path = null # always use host
}
variable "kubernetes_nodepool_workspaces" {
description = "Kubernetes nodepool for Coder workspaces"
type = string
default = "big-workspaces"
}
data "coder_workspace" "me" {}
resource "coder_agent" "m" {
os = "linux"
arch = "amd64"
startup_script_timeout = 180
startup_script = ""
metadata {
display_name = "CPU Usage"
key = "0_cpu_usage"
script = "coder stat cpu"
interval = 10
timeout = 1
}
metadata {
display_name = "RAM Usage"
key = "1_ram_usage"
script = "coder stat mem"
interval = 10
timeout = 1
}
}
resource "coder_script" "websocat" {
agent_id = coder_agent.m.id
display_name = "websocat"
script = <<EOF
curl -sSL -o /tmp/websocat https://github.com/vi/websocat/releases/download/v1.12.0/websocat.x86_64-unknown-linux-musl
chmod +x /tmp/websocat
/tmp/websocat --exit-on-eof --binary ws-l:127.0.0.1:1234 mirror: &
/tmp/websocat --exit-on-eof --binary ws-l:127.0.0.1:1235 cmd:'dd if=/dev/urandom' &
/tmp/websocat --exit-on-eof --binary ws-l:127.0.0.1:1236 cmd:'dd of=/dev/null' &
wait
EOF
run_on_start = true
}
resource "coder_app" "ws_echo" {
agent_id = coder_agent.m.id
slug = "wsec" # Short slug so URL doesn't exceed limit: https://wsec--main--scaletest-UN9UmkDA-0--scaletest-SMXCCYVP-0--apps.big.cdr.dev
display_name = "WebSocket Echo"
url = "http://localhost:1234"
subdomain = true
share = "authenticated"
}
resource "coder_app" "ws_random" {
agent_id = coder_agent.m.id
slug = "wsra" # Short slug so URL doesn't exceed limit: https://wsra--main--scaletest-UN9UmkDA-0--scaletest-SMXCCYVP-0--apps.big.cdr.dev
display_name = "WebSocket Random"
url = "http://localhost:1235"
subdomain = true
share = "authenticated"
}
resource "coder_app" "ws_discard" {
agent_id = coder_agent.m.id
slug = "wsdi" # Short slug so URL doesn't exceed limit: https://wsdi--main--scaletest-UN9UmkDA-0--scaletest-SMXCCYVP-0--apps.big.cdr.dev
display_name = "WebSocket Discard"
url = "http://localhost:1236"
subdomain = true
share = "authenticated"
}
resource "kubernetes_deployment" "main" {
count = data.coder_workspace.me.start_count
metadata {
name = "coder-${lower(data.coder_workspace.me.owner)}-${lower(data.coder_workspace.me.name)}"
namespace = "coder-big"
labels = {
"app.kubernetes.io/name" = "coder-workspace"
"app.kubernetes.io/instance" = "coder-workspace-${lower(data.coder_workspace.me.owner)}-${lower(data.coder_workspace.me.name)}"
"app.kubernetes.io/part-of" = "coder"
"com.coder.resource" = "true"
"com.coder.workspace.id" = data.coder_workspace.me.id
"com.coder.workspace.name" = data.coder_workspace.me.name
"com.coder.user.id" = data.coder_workspace.me.owner_id
"com.coder.user.username" = data.coder_workspace.me.owner
}
}
spec {
replicas = 1
selector {
match_labels = {
"app.kubernetes.io/instance" = "coder-workspace-${lower(data.coder_workspace.me.owner)}-${lower(data.coder_workspace.me.name)}"
}
}
strategy {
type = "Recreate"
}
template {
metadata {
labels = {
"app.kubernetes.io/name" = "coder-workspace"
"app.kubernetes.io/instance" = "coder-workspace-${lower(data.coder_workspace.me.owner)}-${lower(data.coder_workspace.me.name)}"
}
}
spec {
security_context {
run_as_user = "1000"
fs_group = "1000"
}
container {
name = "dev"
image = "docker.io/codercom/enterprise-minimal:ubuntu"
image_pull_policy = "IfNotPresent"
command = ["sh", "-c", coder_agent.m.init_script]
security_context {
run_as_user = "1000"
}
env {
name = "CODER_AGENT_TOKEN"
value = coder_agent.m.token
}
resources {
requests = {
"cpu" = "100m"
"memory" = "320Mi"
}
limits = {
"cpu" = "100m"
"memory" = "320Mi"
}
}
}
affinity {
node_affinity {
required_during_scheduling_ignored_during_execution {
node_selector_term {
match_expressions {
key = "cloud.google.com/gke-nodepool"
operator = "In"
values = ["${var.kubernetes_nodepool_workspaces}"]
}
}
}
}
}
}
}
}
}


@ -0,0 +1,7 @@
# kubernetes-small
Provisions a small-sized workspace with no persistent storage.
_Note_: It is assumed you will be running workspaces on a dedicated GKE nodepool.
By default, this template sets a node affinity of `cloud.google.com/gke-nodepool` = `big-workspaces`.
The nodepool affinity can be customized with the variable `kubernetes_nodepool_workspaces`.


@ -0,0 +1,88 @@
terraform {
required_providers {
coder = {
source = "coder/coder"
version = "~> 0.7.0"
}
kubernetes = {
source = "hashicorp/kubernetes"
version = "~> 2.18"
}
}
}
provider "coder" {}
provider "kubernetes" {
config_path = null # always use host
}
variable "kubernetes_nodepool_workspaces" {
description = "Kubernetes nodepool for Coder workspaces"
type = string
default = "big-workspaces"
}
data "coder_workspace" "me" {}
resource "coder_agent" "main" {
os = "linux"
arch = "amd64"
startup_script_timeout = 180
startup_script = ""
}
resource "kubernetes_pod" "main" {
count = data.coder_workspace.me.start_count
metadata {
name = "coder-${lower(data.coder_workspace.me.owner)}-${lower(data.coder_workspace.me.name)}"
namespace = "coder-big"
labels = {
"app.kubernetes.io/name" = "coder-workspace"
"app.kubernetes.io/instance" = "coder-workspace-${lower(data.coder_workspace.me.owner)}-${lower(data.coder_workspace.me.name)}"
}
}
spec {
security_context {
run_as_user = "1000"
fs_group = "1000"
}
container {
name = "dev"
image = "docker.io/codercom/enterprise-base:ubuntu"
image_pull_policy = "Always"
command = ["sh", "-c", coder_agent.main.init_script]
security_context {
run_as_user = "1000"
}
env {
name = "CODER_AGENT_TOKEN"
value = coder_agent.main.token
}
resources {
requests = {
"cpu" = "1"
"memory" = "1Gi"
}
limits = {
"cpu" = "1"
"memory" = "1Gi"
}
}
}
affinity {
node_affinity {
required_during_scheduling_ignored_during_execution {
node_selector_term {
match_expressions {
key = "cloud.google.com/gke-nodepool"
operator = "In"
values = ["${var.kubernetes_nodepool_workspaces}"]
}
}
}
}
}
}
}


@ -44,7 +44,7 @@ locals {
scaletest_run_id = "scaletest-${replace(time_static.start_time.rfc3339, ":", "-")}"
scaletest_run_dir = "/home/coder/${local.scaletest_run_id}"
scaletest_run_start_time = time_static.start_time.rfc3339
grafana_url = "https://grafana.corp.tld"
grafana_dashboard_uid = "qLVSTR-Vz"
grafana_dashboard_name = "coderv2-loadtest-dashboard"
}
@ -625,6 +625,8 @@ resource "coder_agent" "main" {
vscode = false
ssh_helper = false
}
startup_script_timeout = 86400
shutdown_script_timeout = 7200
startup_script_behavior = "blocking"
startup_script = file("startup.sh")
shutdown_script = file("shutdown.sh")
@ -734,10 +736,9 @@ resource "coder_app" "prometheus" {
agent_id = coder_agent.main.id
slug = "01-prometheus"
display_name = "Prometheus"
url = "https://grafana.corp.tld:9443"
icon = "https://prometheus.io/assets/favicons/favicon-32x32.png"
external = true
}
resource "coder_app" "manual_cleanup" {