These will show up when configuring the application along with the
client ID and everything else. Should make it easier to configure the
application, otherwise you will have to go look up the URLs in the
docs (which are not yet written).
Co-authored-by: Steven Masley <stevenmasley@gmail.com>
This one is huge, and I'm sorry.
The problem is that once I change `tailnet.Conn` to start doing v2 behavior, I kind of have to change it everywhere, including in CoderSDK (CLI), the agent, wsproxy, and ServerTailnet.
There is still a bit more cleanup to do, and I need to add code so that when we lose connection to the Coordinator, we mark all peers as LOST, but that will be in a separate PR since this is big enough!
* fix: doing a noop patch to templates resulted in 404
The patch response did not include the template. The UI required the
template to be returned to form the new page path
null is more explicit, and harder to make occur by mistake.
* fix: allow ports in wildcard url configuration
This just forwards the port to the ui that generates urls.
Our existing parsing + regex already supported ports for
subdomain app requests.
wsproxy also needs to be updated to use tailnet v2 because the `tailnet.Conn` stores peers by ID, and the peerID was not being carried by the JSON protocol. This adds a query param to the endpoint to conditionally switch to the new protocol.
Fixes a flake seen here: https://github.com/coder/coder/actions/runs/7541558190/job/20528545916
```
=== FAIL: enterprise/provisionerd TestRemoteConnector_Fuzz (0.06s)
t.go:84: 2024-01-16 12:32:27.024 [info] connector: failed provisioner authentication remote_addr=[::1]:45138 ...
error= failed to receive jobID:
github.com/coder/coder/v2/enterprise/provisionerd.(*remoteConnector).authenticate
/home/runner/actions-runner/_work/coder/coder/enterprise/provisionerd/remoteprovisioners.go:438
- bufio.Scanner: token too long
t.go:84: 2024-01-16 12:32:27.024 [debu] connector: closed connection remote_addr=[::1]:45138 error=<nil>
remoteprovisioners_test.go:209:
Error Trace: /home/runner/actions-runner/_work/coder/coder/enterprise/provisionerd/remoteprovisioners_test.go:209
Error: "2992256" is not less than "2097152"
Test: TestRemoteConnector_Fuzz
Messages: should not allow more than 1 MiB
```
This was an attempt to test that malicious actors can't abuse our authentication protocol to make us allocate a bunch of memory.
However, the test asserted on the number of bytes sent by the fuzzer, not the number of bytes read (& allocated) by the service. The former is affected by network queue sizes and is thus flaky without actively managing the socket queues, which I don't think we want to do.
In actual practise, the thing that matters is how much memory the bufio Scanner allocates. By inspection, the scanner will allocate up to 64k, and testing this is true devolves into testing the go standard library, which I don't think is worth doing.
So... let's just drop the assertion because
a) its flaky,
b) it doesn't test what we actually want to test,
c) the behavior we actually care about is part of the standard library.
- Adds a new query BatchUpdateLastUsedAt
- Adds calls to BatchUpdateLastUsedAt in app stats handler upon flush
- Passes a stats flush channel to apptest setup scaffolding and updates unit tests to assert modifications to LastUsedAt.
Cli errors are pretty formatted. This handles nested pretty types. Before it found the first error it could understand and return that. Now it will print the full error stack with more information.
To prevent information loss, a "[Trace=...]" was added to capture some extra error context for debugging.
The `SingleTailnet` behavior only checked to see if the `MultiAgent` was
closed, but the websocket error was not being propogated into the
`MultiAgent`, causing it to never be swapped for a new working one.
Fixes https://github.com/coder/coder/issues/11401
Before:
```
Coder Workspace Proxy v0.0.0-devel+85ff030 - Your Self-Hosted Remote Development Platform
Started HTTP listener at http://0.0.0.0:3001
View the Web UI: http://127.0.0.1:3001
==> Logs will stream in below (press ctrl+c to gracefully exit):
2024-01-04 20:11:56.376 [warn] net.workspace-proxy.servertailnet: broadcast server node to agents ...
error= write message:
github.com/coder/coder/v2/enterprise/wsproxy/wsproxysdk.(*remoteMultiAgentHandler).writeJSON
/home/coder/coder/enterprise/wsproxy/wsproxysdk/wsproxysdk.go:524
- failed to write msg: WebSocket closed: failed to read frame header: EOF
```
After:
```
Coder Workspace Proxy v0.0.0-devel+12f1878 - Your Self-Hosted Remote Development Platform
Started HTTP listener at http://0.0.0.0:3001
View the Web UI: http://127.0.0.1:3001
==> Logs will stream in below (press ctrl+c to gracefully exit):
2024-01-04 20:26:38.545 [warn] net.workspace-proxy.servertailnet: multiagent closed, reinitializing
2024-01-04 20:26:38.546 [erro] net.workspace-proxy.servertailnet: reinit multi agent ...
error= dial coordinate websocket:
github.com/coder/coder/v2/enterprise/wsproxy/wsproxysdk.(*Client).DialCoordinator
/home/coder/coder/enterprise/wsproxy/wsproxysdk/wsproxysdk.go:454
- failed to WebSocket dial: failed to send handshake request: Get "http://127.0.0.1:3000/api/v2/workspaceproxies/me/coordinate": dial tcp 127.0.0.1:3000: connect: connection refused
2024-01-04 20:26:38.587 [erro] net.workspace-proxy.servertailnet: reinit multi agent ...
error= dial coordinate websocket:
github.com/coder/coder/v2/enterprise/wsproxy/wsproxysdk.(*Client).DialCoordinator
/home/coder/coder/enterprise/wsproxy/wsproxysdk/wsproxysdk.go:454
- failed to WebSocket dial: failed to send handshake request: Get "http://127.0.0.1:3000/api/v2/workspaceproxies/me/coordinate": dial tcp 127.0.0.1:3000: connect: connection refusedhandshake request: Get "http://127.0.0.1:3000/api/v2/workspaceproxies/me/coordinate": dial tcp 127.0.0.1:3000: connect: connection refused
2024-01-04 20:26:40.446 [info] net.workspace-proxy.servertailnet: successfully reinitialized multiagent agents=0 took=1.900892615s
```
* assert provisioner daemon version and api_version in unit tests
* add build info in HTTP header, extract codersdk.BuildVersionHeader
* add api_version to codersdk.ProvisionerDaemon
* testutil.MustString -> testutil.MustRandString
* Add database tables for OAuth2 applications
These are applications that will be able to use OAuth2 to get an API key
from Coder.
* Add endpoints for managing OAuth2 applications
These let you add, update, and remove OAuth2 applications.
* Add frontend for managing OAuth2 applications
* Adds UpdateProvisionerDaemonLastSeenAt
* Adds heartbeat to provisioner daemons
* Inserts provisioner daemons to database upon start
* Ensures TagOwner is an empty string and not nil
* Adds COALESCE() in idx_provisioner_daemons_name_owner_key
Part of #10532
DRPC transport over yamux and in-mem pipes was previously only used on the provisioner APIs, but now will also be used in tailnet. Moved to subpackage of codersdk to avoid import loops.
Fixes flake https://github.com/coder/coder/runs/19639217635
AGPL coordinator used to process node updates for single_tailnet synchronously, but it's been refactored to process async, so in this test we need to wait for it to be processed.
This sends the email the license was issued to, and whether or not it's a trial in the telemetry payload. It's a bit janky since the license parsing is all enterprise licensed.
- Adds a --name argument to provisionerd start
- Plumbs through name to integrated and external provisioners
- Defaults to hostname if not specified for external, hostname-N for integrated
- Adds cliutil.Hostname
* Force typegen types for some fields of derp health report
* Explicitly allocate slices for RegionReport.{Errors,Warnings} to avoid nulls in API response
Spotted during a code read. ConnIO unlocks the mutex before attempting to write to the response channel, which could allow another goroutine to call Close() and close the channel, causing a panic.
Fix is to hold the mutex. This won't cause a deadlock because the `select{}` has a `default` case, so we won't block even if the receiver isn't keeping up.
Adds support for graceful disconnect to PGCoordinator. When peers gracefully disconnect, they send a disconnect message. This triggers the peer to be disconnected from all tunneled peers.
The Multi-Agent Client supports graceful disconnect, since it is in memory and we know that when it is closed, we really mean to disconnect.
The v1 agent and client Websocket connections do not support graceful disconnect, since the v1 protocol doesn't have this feature. That means that if a v1 peer connects to a v2 peer, when the v1 peer's coordinator connection is closed, the v2 peer will
see it as "lost" since we don't know whether the v1 peer meant to disconnect, or it just lost connectivity to the coordinator.
- Adds a template_insights pseudo-resource
- Grants auditor and template admin roles read access on template_insights
- Updates existing RBAC checks to check for read template_insights, falling back to template update permissions where necessary
- Updates TemplateLayout to show Insights tab if can read template_insights or can update template
Adds a health check for workspace proxies:
- Healthy iff all proxies are healthy and the same version,
- Warning if some proxies are unhealthy,
- Error if all proxies are unhealthy, or do not all have the same version.
fixes#10810
The tailnet coordinators don't depend on replicasync, so we can still enable HA coordinators even if the relay URL is unset.
The in-memory, non-HA coordinator probably has lower latency than the PG Coordinator, since we have to query the database, so enterprise customers might want to disable it for single-replica deployments, but this PR default-enables the HA coordinator. We could add support later to disable it if anyone complains. Latency setting up connections matters, but I don't believe the coordinator contributes significantly at this point for reasonable postgres round-trip-time.
* feat: implement deprecated flag for templates to prevent new workspaces
* Add deprecated filter to template fetching
* Add deprecated to template table
* Add deprecated notice to template page
* Add ui to deprecate a template
re: #10528
Refactors PG Coordinator to work with the Tailnet v2 API, including wrappers for the existing v1 API.
The debug endpoint functions, but doesn't return sensible data, that will be in another stacked PR.
Fixes an issue where remote forwards are not correctly torn down when using OpenSSH with `coder ssh --stdio`. OpenSSH sends a disconnect signal, but then also sends SIGHUP to `coder`. Previously, we just exited when we got SIGHUP, and this raced against properly disconnecting.
Fixes https://github.com/coder/customers/issues/327
* Updated testingWithOwnerUser ruleguard rule to detect:
a) Passing client from coderdenttest.New() to clitest.SetupConfig() similar to what already exists for AGPL code
b) Usage of any method of the owner client from coderdenttest.New() - all usages of the owner client must be justified with a `//nolint:gocritic` comment.
* Fixed resulting linter complaints.
* Added new coderdtest helpers CreateGroup and UpdateTemplateMeta.
* Modified check_enterprise_import.sh to ignore scripts/rules.go.
Fixes test flake seen here: https://github.com/coder/coder/runs/17562370632
It's inherently flaky to create a context with a timeout and then later call `t.Parallel()` since it causes the test to wait until all non-parallel tests have completed before resuming execution. By the time execution has resumed, the context may
have expired. The amount of time before resuming is dependent on machine resources and number of test cases, which are inherently variable.
* feat: support configurable web terminal rendering
- Added a deployment option for configuring web terminal rendering.
Valid values are 'webgl', 'canvas', and 'dom'.
- Fixes an issue where workspaces that are eligible for auto-deletion
are retried every tick (1 minute) even if the previous deletion
transition failed.
The updated logic only attempts to delete workspaces that previously
failed once a day (24 hours since last attempt).
- Adds an audit log for workspaces automatically transitioned to the dormant
state.
- Imposes a mininum of 1 minute on cleanup-related fields. This is to
prevent accidental API misuse from resulting in catastrophe.
* chore: rename `git_auth` to `external_auth` in our schema
We're changing Git auth to be external auth. It will support
any OAuth2 or OIDC provider.
To split up the larger change I want to contribute the schema
changes first, and I'll add the feature itself in another PR.
* Fix names
* Fix outdated view
* Rename some additional places
* Fix sort order
* Fix template versions auth route
* Fix types
* Fix dbauthz
* Adds agenttest.New() helper function
* Makes sure agent gets closed on test cleanup
* Makes sure you don't forget to set session token
* Sets the agent and client logger automatically
Fixes#9823.
- Decomposes UpdateWorkspaceBuildByID into UpdateWorkspaceBuildProvisionerStateByID and UpdateWorkspaceBuildDeadlineByID.
- Replaces existing invocations of UpdateWorkspaceBuildByID with the newer queries where applicable.
- Modifies GetActiveWorkspaceBuildsByTemplateID to not return incomplete workspace builds.
* fix: use AlwaysEnable for licenses with all features
Signed-off-by: Spike Curtis <spike@coder.com>
* use dbtime.Now() intead of time.Now()
Signed-off-by: Spike Curtis <spike@coder.com>
---------
Signed-off-by: Spike Curtis <spike@coder.com>
- Broadens scope of data generation in TestServerDBCrypt over all user login types, statuses, and deletion status.
- Adds support for specifying user status / user deletion status in dbgen
- Adds more comprehensive logging in TestServerDBCrypt upon test failure (to be generalized and expanded upon in a follow-up)
- Adds AllUserIDs query, updates dbcrypt to use this instead of GetUsers.
- Adds dbtestutil.WithTimezone(tz) to allow setting the timezone for a test database.
- Modifies our test database setup code to pick a consistently weird timezone for the database.
- Adds the facility randtz.Name() to pick a random timezone which is consistent across subtests (via sync.Once).
- Adds a linter rule to warn against setting the test database timezone to UTC.
This change reduces the CPU consumption of --help by ~50%.
Also, this change removes ANSI escape codes from our golden files. I
don't think those were worth the inability to parallelize golden file tests and
global state fragility.
This change will improve over CLI performance and "snappiness" as well as
substantially reduce our test times. Preliminary benchmarks show
`coder server --help` times cut from 300ms to 120ms on my dogfood
instance.
The inefficiency of lipgloss disproportionately impacts our system, as all help
text for every command is generated whenever any command is invoked.
The `pretty` API could clean up a lot of the code (e.g., by replacing
complex string concatenations with Printf), but this commit is too
expansive as is so that work will be done in a follow up.
See also: https://github.com/coder/coder/pull/9522
- Adds commands `server dbcrypt {rotate,decrypt,delete}` to re-encrypt, decrypt, or delete encrypted data, respectively.
- Plumbs through dbcrypt in enterprise/coderd (including unit tests).
- Adds documentation in admin/encryption.md.
This enables dbcrypt by default, but the feature is soft-enforced on supplying external token encryption keys. Without specifying any keys, encryption/decryption is a no-op.
This is an alternative approach to #9519 and removes 2 MB instead of 1
MB (1.2 MB accounted for by embedded migration SQL files).
Combined with #9481, #9506, #9508, #9517, a total of 5 MB is removed.
Ref: #9380
This removes an indirect import of `coderd/database` from the CLI and
results in a logical separation between server related and generalized
schedule.
No size change (yet).
Ref: #9380