docs: describe reference architectures (#12609)

Marcin Tojek 2024-03-15 17:01:45 +01:00 committed by GitHub
parent b0c4e7504c
commit bed2545636
6 changed files with 544 additions and 7 deletions


@ -0,0 +1,51 @@
# Reference Architecture: up to 1,000 users
The 1,000 users architecture is designed to cover a wide range of workflows.
Examples of organizations that might use this architecture include medium-sized
tech startups, educational institutions, or small to mid-sized enterprises.
**Target load**: API: up to 180 RPS
**High Availability**: non-essential for small deployments
## Hardware recommendations
### Coderd nodes
| Users | Node capacity | Replicas | GCP | AWS | Azure |
| ----------- | ------------------- | ------------------- | --------------- | ---------- | ----------------- |
| Up to 1,000 | 2 vCPU, 8 GB memory | 1-2 / 1 coderd each | `n1-standard-2` | `t3.large` | `Standard_D2s_v3` |
**Footnotes**:
- For small deployments (ca. 100 users, 10 concurrent workspace builds), it is
acceptable to deploy provisioners on `coderd` nodes.
### Provisioner nodes
| Users | Node capacity | Replicas | GCP | AWS | Azure |
| ----------- | -------------------- | ------------------------------ | ---------------- | ------------ | ----------------- |
| Up to 1,000 | 8 vCPU, 32 GB memory | 2 nodes / 30 provisioners each | `t2d-standard-8` | `t3.2xlarge` | `Standard_D8s_v3` |
**Footnotes**:
- An external provisioner is deployed as a Kubernetes pod.
### Workspace nodes
| Users | Node capacity | Replicas | GCP | AWS | Azure |
| ----------- | -------------------- | ----------------------- | ---------------- | ------------ | ----------------- |
| Up to 1,000 | 8 vCPU, 32 GB memory | 64 / 16 workspaces each | `t2d-standard-8` | `t3.2xlarge` | `Standard_D8s_v3` |
**Footnotes**:
- Assumes that a workspace user needs at least 2 GB of memory to work effectively. We
recommend against over-provisioning memory for developer workloads, as this may
lead to OOMKiller invocations.
- Maximum number of Kubernetes workspace pods per node: 256
### Database nodes
| Users | Node capacity | Replicas | Storage | GCP | AWS | Azure |
| ----------- | ------------------- | -------- | ------- | ------------------ | ------------- | ----------------- |
| Up to 1,000 | 2 vCPU, 8 GB memory | 1 | 512 GB | `db-custom-2-7680` | `db.t3.large` | `Standard_D2s_v3` |


@ -0,0 +1,59 @@
# Reference Architecture: up to 2,000 users
In the 2,000 users architecture, there is a moderate increase in traffic,
suggesting a growing user base or expanding operations. This setup is
well-suited for mid-sized companies experiencing growth or for universities
seeking to accommodate their expanding user populations.
Users can be evenly distributed between two regions or attached to different
clusters.
**Target load**: API: up to 300 RPS
**High Availability**: The mode is _enabled_; multiple replicas provide higher
deployment reliability under load.
## Hardware recommendations
### Coderd nodes
| Users | Node capacity | Replicas | GCP | AWS | Azure |
| ----------- | -------------------- | ----------------------- | --------------- | ----------- | ----------------- |
| Up to 2,000 | 4 vCPU, 16 GB memory | 2 nodes / 1 coderd each | `n1-standard-4` | `t3.xlarge` | `Standard_D4s_v3` |
### Provisioner nodes
| Users | Node capacity | Replicas | GCP | AWS | Azure |
| ----------- | -------------------- | ------------------------------ | ---------------- | ------------ | ----------------- |
| Up to 2,000 | 8 vCPU, 32 GB memory | 4 nodes / 30 provisioners each | `t2d-standard-8` | `t3.2xlarge` | `Standard_D8s_v3` |
**Footnotes**:
- An external provisioner is deployed as a Kubernetes pod.
- It is not recommended to run provisioner daemons on `coderd` nodes.
- Consider separating provisioners into different namespaces to support
zero-trust or multi-cloud deployments.
### Workspace nodes
| Users | Node capacity | Replicas | GCP | AWS | Azure |
| ----------- | -------------------- | ------------------------ | ---------------- | ------------ | ----------------- |
| Up to 2,000 | 8 vCPU, 32 GB memory | 128 / 16 workspaces each | `t2d-standard-8` | `t3.2xlarge` | `Standard_D8s_v3` |
**Footnotes**:
- Assumes that a workspace user needs 2 GB of memory to work effectively
- Maximum number of Kubernetes workspace pods per node: 256
- Nodes can be distributed across two regions, not necessarily split evenly,
depending on developer team sizes
### Database nodes
| Users | Node capacity | Replicas | Storage | GCP | AWS | Azure |
| ----------- | -------------------- | -------- | ------- | ------------------- | -------------- | ----------------- |
| Up to 2,000 | 4 vCPU, 16 GB memory | 1 | 1 TB | `db-custom-4-15360` | `db.t3.xlarge` | `Standard_D4s_v3` |
**Footnotes**:
- Consider adding more replicas if workspace activity exceeds 500 workspace
builds per day or if you need higher RPS.


@ -0,0 +1,62 @@
# Reference Architecture: up to 3,000 users
The 3,000 users architecture targets large-scale enterprises, possibly with both
on-premises and cloud deployments.
**Target load**: API: up to 550 RPS
**High Availability**: Typically, such scale requires a fully-managed HA
PostgreSQL service, and all Coder observability features enabled for operational
purposes.
**Observability**: Deploy monitoring solutions to gather Prometheus metrics and
visualize them with Grafana to gain detailed insights into infrastructure and
application behavior. This allows operators to respond quickly to incidents and
continuously improve the reliability and performance of the platform.
## Hardware recommendations
### Coderd nodes
| Users | Node capacity | Replicas | GCP | AWS | Azure |
| ----------- | -------------------- | ----------------- | --------------- | ----------- | ----------------- |
| Up to 3,000 | 8 vCPU, 32 GB memory | 4 / 1 coderd each | `n1-standard-4` | `t3.xlarge` | `Standard_D4s_v3` |
### Provisioner nodes
| Users | Node capacity | Replicas | GCP | AWS | Azure |
| ----------- | -------------------- | ------------------------ | ---------------- | ------------ | ----------------- |
| Up to 3,000 | 8 vCPU, 32 GB memory | 8 / 30 provisioners each | `t2d-standard-8` | `t3.2xlarge` | `Standard_D8s_v3` |
**Footnotes**:
- An external provisioner is deployed as a Kubernetes pod.
- It is strongly discouraged to run provisioner daemons on `coderd` nodes at
this level of scale.
- Separate provisioners into different namespaces to support zero-trust or
multi-cloud deployments.
### Workspace nodes
| Users | Node capacity | Replicas | GCP | AWS | Azure |
| ----------- | -------------------- | ------------------------------ | ---------------- | ------------ | ----------------- |
| Up to 3,000 | 8 vCPU, 32 GB memory | 256 nodes / 12 workspaces each | `t2d-standard-8` | `t3.2xlarge` | `Standard_D8s_v3` |
**Footnotes**:
- Assumes that a workspace user needs 2 GB of memory to work effectively
- Maximum number of Kubernetes workspace pods per node: 256
- Since workspace nodes can be distributed across regions, on-premises networks,
and cloud environments, consider using separate namespaces to support zero-trust
or multi-cloud deployments.
### Database nodes
| Users | Node capacity | Replicas | Storage | GCP | AWS | Azure |
| ----------- | -------------------- | -------- | ------- | ------------------- | --------------- | ----------------- |
| Up to 3,000 | 8 vCPU, 32 GB memory | 2 | 1.5 TB | `db-custom-8-30720` | `db.t3.2xlarge` | `Standard_D8s_v3` |
**Footnotes**:
- Consider adding more replicas if workspace activity exceeds 1,500 workspace
builds per day or if you need higher RPS.


@ -0,0 +1,344 @@
# Reference Architectures
This document provides prescriptive solutions and reference architectures to
support successful deployments of up to 3,000 users, and outlines at a high level
the methodology currently used to scale-test Coder.
## General concepts
This section outlines core concepts and terminology essential for understanding
Coder's architecture and deployment strategies.
### Administrator
An administrator is a user role within the Coder platform with elevated
privileges. Admins have access to administrative functions such as user
management, template definitions, insights, and deployment configuration.
### Coder
Coder, also known as _coderd_, is the main service recommended for deployment
with multiple replicas to ensure high availability. It provides an API for
managing workspaces and templates. Each _coderd_ replica has the capability to
host multiple [provisioners](#provisioner).
### User
A user is an individual who utilizes the Coder platform to develop, test, and
deploy applications using workspaces. Users can select available templates to
provision workspaces. They interact with Coder using the web interface, the CLI
tool, or directly calling API methods.
### Workspace
A workspace refers to an isolated development environment where users can write,
build, and run code. Workspaces are fully configurable and can be tailored to
specific project requirements, providing developers with a consistent and
efficient development environment. Workspaces can be autostarted and
autostopped, enabling efficient resource management.
Users can connect to workspaces using SSH or via workspace applications like
`code-server`, facilitating collaboration and remote access. Additionally,
workspaces can be parameterized, allowing users to customize settings and
configurations based on their unique needs. Workspaces are instantiated using
Coder templates and deployed on resources created by provisioners.
### Template
A template in Coder is a predefined configuration for creating workspaces.
Templates streamline the process of workspace creation by providing
pre-configured settings, tooling, and dependencies. They are built by template
administrators on top of Terraform, allowing for efficient management of
infrastructure resources. Additionally, templates can utilize Coder modules to
leverage existing features shared with other templates, enhancing flexibility
and consistency across deployments. Templates describe provisioning rules for
infrastructure resources offered by Terraform providers.
### Workspace Proxy
A workspace proxy serves as a relay connection option for developers connecting
to their workspace over SSH, a workspace app, or through port forwarding. It
helps reduce network latency for geo-distributed teams by minimizing the
distance network traffic needs to travel. Notably, workspace proxies do not
handle dashboard connections or API calls.
### Provisioner
Provisioners in Coder execute Terraform during workspace and template builds.
While the platform includes built-in provisioner daemons by default, there are
advantages to employing external provisioners. These external daemons provide
secure build environments and reduce server load, improving performance and
scalability. Each provisioner can handle a single concurrent workspace build,
allowing for efficient resource allocation and workload management.
### Registry
The Coder Registry is a platform where you can find starter templates and
_Modules_ for various cloud services and platforms.
Templates help create self-service development environments using
Terraform-defined infrastructure, while _Modules_ simplify template creation by
providing common features like workspace applications, third-party integrations,
or helper scripts.
Please note that the Registry is a hosted service and isn't available for
offline use.
## Scale-testing methodology
Scaling Coder involves planning and testing to ensure it can handle more load
without compromising service. This process encompasses infrastructure setup,
traffic projections, and aggressive testing to identify and mitigate potential
bottlenecks.
A dedicated Kubernetes cluster for Coder is a Kubernetes cluster specifically
configured to host and manage Coder workloads. Kubernetes provides container
orchestration capabilities, allowing Coder to efficiently deploy, scale, and
manage workspaces across a distributed infrastructure. This ensures high
availability, fault tolerance, and scalability for Coder deployments. Coder is
deployed on this cluster using the
[Helm chart](../install/kubernetes#install-coder-with-helm).
Our scale tests include the following stages:
1. Prepare environment: create expected users and provision workspaces.
2. SSH connections: establish user connections with agents, verifying their
ability to echo back received content.
3. Web Terminal: verify the PTY connection used for communication with Web
Terminal.
4. Workspace application traffic: assess the handling of user connections with
specific workspace apps, confirming their capability to echo back received
content effectively.
5. Dashboard evaluation: verify the responsiveness and stability of Coder
dashboards under varying load conditions. This is achieved by simulating user
interactions using instances of headless Chromium browsers.
6. Cleanup: delete workspaces and users created in step 1.
### Infrastructure and setup requirements
The scale test runner can distribute the workload so that individual scenarios
overlap, based on the workflow configuration:
| | T0 | T1 | T2 | T3 | T4 | T5 | T6 |
| -------------------- | --- | --- | --- | --- | --- | --- | --- |
| SSH connections | X | X | X | X | | | |
| Web Terminal (PTY) | | X | X | X | X | | |
| Workspace apps | | | X | X | X | X | |
| Dashboard (headless) | | | | X | X | X | X |
This pattern closely reflects how our customers naturally use the system. SSH
connections are heavily utilized because they're the primary communication
channel for IDEs using the VS Code and JetBrains plugins.
The basic setup of the scale test environment involves:
1. Scale tests runner (32 vCPU, 128 GB RAM)
2. Coder: 2 replicas (4 vCPU, 16 GB RAM)
3. Database: 1 instance (2 vCPU, 32 GB RAM)
4. Provisioner: 50 instances (0.5 vCPU, 512 MB RAM)
The test is deemed successful if users do not experience interruptions in their
workflows, `coderd` does not crash or require restarts, and no other internal
errors are observed.
### Traffic Projections
In our scale tests, we simulate activity from 2000 users, 2000 workspaces, and
2000 agents, with two items of workspace agent metadata being sent every 10
seconds. Here are the resulting metrics:
Coder:
- Median CPU usage for _coderd_: 3 vCPU, peaking at 3.7 vCPU while all tests are
running concurrently.
- Median API request rate: 350 RPS during dashboard tests, 250 RPS during Web
Terminal and workspace apps tests.
- 2000 agent API connections with latency: p90 at 60 ms, p95 at 220 ms.
- On average, 2400 WebSocket connections during dashboard tests.
Provisionerd:
- Median CPU usage is 0.35 vCPU during workspace provisioning.
Database:
- Median CPU utilization is 80%, with a significant portion dedicated to writing
workspace agent metadata.
- Memory utilization averages at 40%.
- `write_ops_count` between 6.7 and 8.4 operations per second.
## Available reference architectures
- [Up to 1,000 users](1k-users.md)
- [Up to 2,000 users](2k-users.md)
- [Up to 3,000 users](3k-users.md)
## Hardware recommendations
### Control plane: coderd
To ensure stability and reliability of the Coder control plane, it's essential
to focus on node sizing, resource limits, and the number of replicas. We
recommend referencing public cloud providers such as AWS, GCP, and Azure for
guidance on optimal configurations. A reasonable approach involves using scaling
formulas based on factors like CPU, memory, and the number of users.
While the minimum requirements specify 1 CPU core and 2 GB of memory per
`coderd` replica, it is recommended to allocate additional resources depending
on the workload size to ensure deployment stability.
#### CPU and memory usage
Enabling [agent stats collection](../../cli.md#--prometheus-collect-agent-stats)
(optional) may increase memory consumption.
Enabling direct connections between users and workspace agents (apps or SSH
traffic) can help prevent an increase in CPU usage. It is recommended to keep
[this option enabled](../../cli.md#--disable-direct-connections) unless there
are compelling reasons to disable it.
Inactive users do not consume Coder resources.
#### Scaling formula
When determining scaling requirements, consider the following factors:
- `1 vCPU x 2 GB memory x 250 users`: A reasonable formula to determine resource
allocation based on the number of users and their expected usage patterns.
- API latency/response time: Monitor API latency and response times to ensure
optimal performance under varying loads.
- Average number of HTTP requests: Track the average number of HTTP requests to
gauge system usage and identify potential bottlenecks.
- The number of proxied connections: For a very high number of proxied
connections, more memory is required.
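As a quick planning aid, the formula can be expressed as a small calculation. The
following is a minimal sketch (in Python) assuming the `1 vCPU x 2 GB memory x
250 users` ratio and a two-replica deployment; the rounding and per-replica
minimums are assumptions, not prescriptive values:

```python
import math

def coderd_resources(users: int, replicas: int = 2) -> dict:
    """Estimate coderd resources from the 1 vCPU x 2 GB memory x 250 users rule."""
    units = math.ceil(users / 250)          # each unit serves up to 250 users
    total_vcpu = max(units, 1)              # 1 vCPU per 250 users
    total_memory_gb = 2 * total_vcpu        # 2 GB of memory per vCPU unit
    return {
        "total_vcpu": total_vcpu,
        "total_memory_gb": total_memory_gb,
        # Split the totals across replicas, keeping the 1 vCPU / 2 GB per-replica minimum.
        "per_replica_vcpu": max(math.ceil(total_vcpu / replicas), 1),
        "per_replica_memory_gb": max(math.ceil(total_memory_gb / replicas), 2),
    }

print(coderd_resources(users=2000))
# {'total_vcpu': 8, 'total_memory_gb': 16, 'per_replica_vcpu': 4, 'per_replica_memory_gb': 8}
```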
**HTTP API latency**
For a reliable Coder deployment dealing with medium to high loads, it's
important that API calls for workspace/template queries and workspace build
operations respond within 300 ms. However, API template insights calls, which
involve browsing workspace agent stats and user activity data, may require more
time. Moreover, the Coder API exposes long-lived WebSocket connections for the
Web Terminal (bidirectional) and workspace events/logs (unidirectional).
If the Coder deployment expects traffic from developers spread across the globe,
be aware that customer-facing latency might be higher because of the distance
between users and the load balancer. Fortunately, the latency can be improved
with a deployment of Coder [workspace proxies](../workspace-proxies.md).
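To spot-check the 300 ms target from a client's perspective, a simple probe
against the API can be enough. Below is a minimal sketch using only the Python
standard library; the deployment URL and token are placeholders, and the
`/api/v2/workspaces` endpoint and `Coder-Session-Token` header should be
verified against your Coder version:

```python
import time
import urllib.request

CODER_URL = "https://coder.example.com"   # placeholder deployment URL
SESSION_TOKEN = "your-session-token"      # placeholder API token

def median_latency_ms(path: str = "/api/v2/workspaces", samples: int = 10) -> float:
    """Return the median wall-clock latency (in ms) of an authenticated GET request."""
    timings = []
    for _ in range(samples):
        req = urllib.request.Request(
            CODER_URL + path,
            headers={"Coder-Session-Token": SESSION_TOKEN},
        )
        start = time.monotonic()
        with urllib.request.urlopen(req) as resp:
            resp.read()
        timings.append((time.monotonic() - start) * 1000)
    timings.sort()
    return timings[len(timings) // 2]

if __name__ == "__main__":
    print(f"median latency: {median_latency_ms():.0f} ms (target: < 300 ms)")
```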
**Node Autoscaling**
We recommend disabling autoscaling for `coderd` nodes, as autoscaling can cause
interruptions for user connections. See [Autoscaling](../scale.md#autoscaling)
for more details.
### Control plane: provisionerd
Each external provisioner can run a single concurrent workspace build. For
example, running 10 provisioner containers will allow 10 users to start
workspaces at the same time.
By default, the Coder server runs 3 built-in provisioner daemons, but the
_Enterprise_ Coder release allows running external provisioners to offload
workspace provisioning from the `coderd` nodes.
#### Scaling formula
When determining scaling requirements, consider the following factors:
- `1 vCPU x 1 GB memory x 2 concurrent workspace builds`: A formula to determine
resource allocation based on the number of concurrent workspace builds and the
standard complexity of a Terraform template. _Rule of thumb_: the more
provisioners are free/available, the more concurrent workspace builds can be
performed.
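Applied to a target build concurrency, the formula yields the number of
provisioners and their aggregate resources. A minimal sketch, assuming one
concurrent build per provisioner and the `1 vCPU x 1 GB memory x 2 concurrent
workspace builds` ratio; rounding is an assumption:

```python
import math

def provisioner_resources(concurrent_builds: int) -> dict:
    """Estimate external provisioner count and resources for a build concurrency target."""
    return {
        "provisioners": concurrent_builds,                    # one concurrent build per provisioner
        "total_vcpu": math.ceil(concurrent_builds / 2),       # 1 vCPU per 2 concurrent builds
        "total_memory_gb": math.ceil(concurrent_builds / 2),  # 1 GB per 2 concurrent builds
    }

print(provisioner_resources(concurrent_builds=60))
# {'provisioners': 60, 'total_vcpu': 30, 'total_memory_gb': 30}
```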
**Node Autoscaling**
Autoscaling provisioners is not an easy problem to solve unless you can predict
when the number of concurrent workspace builds will increase.
We recommend disabling autoscaling and adjusting the number of provisioners to
developer needs based on workspace build queuing times.
### Data plane: Workspaces
To determine workspace resource limits and keep the best developer experience
for workspace users, administrators must be aware of a few assumptions.
- Workspace pods run on the same Kubernetes cluster, but possibly in a different
namespace or on a separate set of nodes.
- Workspace limits (per workspace user):
  - Evaluate the workspace utilization pattern. For instance, web application
    development does not require high CPU capacity at all times, but will spike
    during builds or testing.
  - Evaluate the minimal limits for a single workspace. Include in the
    calculation the requirements for the Coder agent running in an idle
    workspace: 0.1 vCPU and 256 MB of memory. For instance, developers can
    choose between 0.5-8 vCPUs and 1-16 GB of memory.
#### Scaling formula
When determining scaling requirements, consider the following factors:
- `1 vCPU x 2 GB memory x 1 workspace`: A formula to determine resource
allocation based on the minimal requirements for an idle workspace with a
running Coder agent and occasional CPU and memory bursts for building
projects.
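The same baseline can be used to estimate how many workspace nodes a deployment
needs. A minimal sketch, assuming `2 GB` of memory per idle workspace as the
binding resource (CPU treated as burstable) and the 256 pods-per-node cap from
the reference architectures; any extra headroom is an assumption to add per
deployment:

```python
import math

def workspace_nodes(workspaces: int, node_memory_gb: int = 32,
                    memory_per_workspace_gb: int = 2,
                    max_pods_per_node: int = 256) -> int:
    """Estimate workspace node count, treating memory as the binding resource.

    CPU is assumed to be burstable and shared between workspaces, matching the
    utilization pattern described above; adjust if your workloads are CPU-bound.
    """
    workspaces_per_node = min(node_memory_gb // memory_per_workspace_gb, max_pods_per_node)
    return math.ceil(workspaces / workspaces_per_node)

print(workspace_nodes(1000))  # 63 nodes of 32 GB at 16 workspaces each
```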
**Node Autoscaling**
Workspace nodes can be set to operate in autoscaling mode to mitigate the risk
of prolonged high resource utilization.
One approach is to scale up workspace nodes when total CPU usage or memory
consumption reaches 80%. Another option is to scale based on metrics such as the
number of workspaces or active users. It's important to note that as new users
onboard, the autoscaling configuration should account for ongoing workspaces.
Scaling down workspace nodes to zero is not recommended, as it will result in
longer wait times for workspace provisioning by users. However, this may be
necessary for workspaces with special resource requirements (e.g. GPUs) that
incur significant cost overheads.
### Data plane: External database
While running in production, Coder requires access to an external PostgreSQL
database. Depending on the scale of the user base, workspace activity, and High
Availability requirements, the CPU and memory resources required by Coder's
database may vary.
#### Scaling formula
When determining scaling requirements, take into account the following
considerations:
- `2 vCPU x 8 GB RAM x 512 GB storage`: A baseline for database requirements for
a Coder deployment with fewer than 1,000 users and a low activity level (30%
active users). This capacity should be sufficient to support 100 external
provisioners.
- Storage size depends on user activity, workspace builds, log verbosity,
overhead on database encryption, etc.
- Allocate two additional CPU cores to the database instance for every 1,000
active users.
- Enable _High Availability_ mode for the database engine for large-scale
deployments.
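These considerations can be combined into a rough sizing estimate. A minimal
sketch, assuming the `2 vCPU x 8 GB RAM x 512 GB storage` baseline, two extra
cores per 1,000 active users, and a memory-to-core ratio matching the baseline;
storage growth is left as a deployment-specific input:

```python
def database_resources(active_users: int) -> dict:
    """Estimate database sizing from the 2 vCPU / 8 GB / 512 GB baseline."""
    extra_cores = 2 * (active_users // 1000)  # two additional cores per 1,000 active users
    return {
        "vcpu": 2 + extra_cores,
        "memory_gb": 8 + 4 * extra_cores,     # assumed 4 GB per core, matching the baseline ratio
        "storage_gb": 512,                    # grows with build activity, log verbosity, encryption overhead
    }

print(database_resources(active_users=2000))
# {'vcpu': 6, 'memory_gb': 24, 'storage_gb': 512}
```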
If you enable [database encryption](../encryption.md) in Coder, consider
allocating an additional CPU core to every `coderd` replica.
#### Performance optimization guidelines
We provide the following general recommendations for PostgreSQL settings:
- Increase the number of vCPUs if CPU utilization or database latency is high.
- Allocate extra memory if database performance is poor, CPU utilization is low,
and memory utilization is high.
- Utilize faster disk options (higher IOPS) such as SSDs or NVMe drives to
improve performance and possibly reduce database load.
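One practical signal behind the memory and IOPS recommendations is the buffer
cache hit ratio. A minimal sketch, assuming the `psycopg2` driver and access to
PostgreSQL's standard `pg_stat_database` view; the connection string is a
placeholder:

```python
import psycopg2

# Placeholder DSN - point it at Coder's PostgreSQL database.
conn = psycopg2.connect("postgresql://coder:password@db.example.com:5432/coder")

with conn, conn.cursor() as cur:
    cur.execute(
        """
        SELECT blks_hit, blks_read
        FROM pg_stat_database
        WHERE datname = current_database();
        """
    )
    blks_hit, blks_read = cur.fetchone()
    hit_ratio = blks_hit / max(blks_hit + blks_read, 1)
    # A ratio well below ~0.99 suggests frequent disk reads: consider more
    # memory (e.g. shared_buffers) or faster disks with higher IOPS.
    print(f"buffer cache hit ratio: {hit_ratio:.4f}")
```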


@ -106,12 +106,13 @@ For example, to support 120 concurrent workspace builds:
> Note: the below information is for reference purposes only and is not
> intended to be used as guidelines for infrastructure sizing.
| Environment | Coder CPU | Coder RAM | Coder Replicas | Database | Users | Concurrent builds | Concurrent connections (Terminal/SSH) | Coder Version | Last tested |
| ---------------- | --------- | --------- | -------------- | ----------------- | ----- | ----------------- | ------------------------------------- | ------------- | ------------ |
| Kubernetes (GKE) | 3 cores | 12 GB | 1 | db-f1-micro | 200 | 3 | 200 simulated | `v0.24.1` | Jun 26, 2023 |
| Kubernetes (GKE) | 4 cores | 8 GB | 1 | db-custom-1-3840 | 1500 | 20 | 1,500 simulated | `v0.24.1` | Jun 27, 2023 |
| Kubernetes (GKE) | 2 cores | 4 GB | 1 | db-custom-1-3840 | 500 | 20 | 500 simulated | `v0.27.2` | Jul 27, 2023 |
| Kubernetes (GKE) | 2 cores | 8 GB | 2 | db-custom-2-7680 | 1000 | 20 | 1000 simulated | `v2.2.1` | Oct 9, 2023 |
| Kubernetes (GKE) | 4 cores | 16 GB | 2 | db-custom-8-30720 | 2000 | 50 | 2000 simulated | `v2.8.4` | Feb 28, 2024 |
> Note: a simulated connection reads and writes random data at 40KB/s per
> connection.


@ -375,10 +375,30 @@
},
{
"title": "Scaling Coder",
"description": "Reference architecture and load testing tools",
"description": "Learn how to use load testing tools",
"path": "./admin/scale.md",
"icon_path": "./images/icons/scale.svg"
},
{
"title": "Reference Architectures",
"description": "Learn about reference architectures for Coder",
"path": "./admin/architectures/index.md",
"icon_path": "./images/icons/scale.svg",
"children": [
{
"title": "Up to 1,000 users",
"path": "./admin/architectures/1k-users.md"
},
{
"title": "Up to 2,000 users",
"path": "./admin/architectures/2k-users.md"
},
{
"title": "Up to 3,000 users",
"path": "./admin/architectures/3k-users.md"
}
]
},
{
"title": "External Provisioners",
"description": "Run provisioners isolated from the Coder server",