This PR solves #10478 by auto-filling previously used template values in create and update workspace flows.
I decided against explicit user values in settings for these reasons:
* Autofill is far easier to implement
* Users benefit from autofill _by default_ — we don't need to teach them new concepts
* If we decide that autofill creates more harm than good, we can remove it without breaking compatibility
`agentsdk` depends on `agent/proto` because it needs to get the version to dial.
Therefore, the conversion routines need to live in `agentsdk` so that we can convert to and from the Manifest.
I briefly considered refactoring the agent to only reference `proto.Manifest`, but decided against it because we might have multiple protocol versions in the future, its useful to have a protocol-independent data structure.
Fixes#8218
Removes `wsconncache` and related "is legacy?" functions and API calls that were used by it.
The only leftover is that Agents still use the legacy IP, so that back level clients or workspace proxies can dial them correctly.
We should eventually remove this: #11819
This PR updates the Agent API to use the appearance.Fetcher, which is set by entitlement code in Enterprise coderd.
This brings the agentapi into compliance with the Enterprise feature.
The new Agent API needs an interface for ServiceBanners, so this PR creates it and refactors the AGPL and Enterprise code to achieve it.
Before we depended on the fact that the HTTP endpoint was missing to serve an empty ServiceBanner on AGPL deployments, but that won't work with dRPC, so we need a real interface to call.
#7439 added global caching of RBAC results.
Calls are cached based on hash(subject, object, action).
We often use dbauthz.AsSystemRestricted to handle "internal" authz calls, and these are often repeated with similar arguments and are likely to get cached.
So a transient error doing an authz check on a system function will be cached for up to a minute.
I'm just starting off with excluding context.Canceled but there's likely a whole suite of different errors we want to also exclude from the global cache.
- Adds column `favorite` to workspaces table
- Adds API endpoints to favorite/unfavorite workspaces
- Modifies sorting order to return owners' favorite workspaces first
Fixes 2 related issues:
1. wsconncache had incorrect logic to test whether to send DERPMap updates, sending if the maps were equivalent, instead of if they were _not equivalent_.
2. configmaps used a bugged check to test equality between DERPMaps, since it contains a map and the map entries are serialized in random order. Instead, we avoid comparing the protobufs and instead depend on the existing function that compares `tailcfg.DERPMap`. This also has the effect of reducing the number of times we convert to and from protobuf.
fixes#10531
Adds a check for `version` on connection to the Agent API websocket endpoint. This is primarily for future-proofing, so that up-level agents get a sensible error if they connect to a back-level Coderd.
It also refactors the location of the `CurrentVersion` variables, to be part of the `proto` packages, since the versions refer to the APIs defined therein.
Adds support to `ServerTailnet` to set all peers lost before attempting to reconnect to the coordinator. In practice, this only really affects `wsproxy` since coderd has a local connection to the coordinator that only goes down if we're shutting down or change licenses.
These will show up when configuring the application along with the
client ID and everything else. Should make it easier to configure the
application, otherwise you will have to go look up the URLs in the
docs (which are not yet written).
Co-authored-by: Steven Masley <stevenmasley@gmail.com>
Rather than passing all the deployment values. This is to make it
easier to generate API keys as part of the oauth flow.
I also added and fixed a test for when the lifetime is set and the
default and expiration are unset.
Co-authored-by: Steven Masley <stevenmasley@gmail.com>
We don't have visibility into some feature usage, so this adds a lot of fields missing from `database.Template` to `telemetry.Template`. Deprecation message is not collected, just whether it's set or not.
adds setAllPeersLost to the configMaps subcomponent of tailnet.Conn --- we'll call this when we disconnect from a coordinator so we'll eventually clean up peers if they disconnect while we are retrying the coordinator connection (or we don't succeed in reconnecting to the coordinator).
This one is huge, and I'm sorry.
The problem is that once I change `tailnet.Conn` to start doing v2 behavior, I kind of have to change it everywhere, including in CoderSDK (CLI), the agent, wsproxy, and ServerTailnet.
There is still a bit more cleanup to do, and I need to add code so that when we lose connection to the Coordinator, we mark all peers as LOST, but that will be in a separate PR since this is big enough!
* fix: allow ports in wildcard url configuration
This just forwards the port to the ui that generates urls.
Our existing parsing + regex already supported ports for
subdomain app requests.
- Adds a new query BatchUpdateLastUsedAt
- Adds calls to BatchUpdateLastUsedAt in app stats handler upon flush
- Passes a stats flush channel to apptest setup scaffolding and updates unit tests to assert modifications to LastUsedAt.
The `SingleTailnet` behavior only checked to see if the `MultiAgent` was
closed, but the websocket error was not being propogated into the
`MultiAgent`, causing it to never be swapped for a new working one.
Fixes https://github.com/coder/coder/issues/11401
Before:
```
Coder Workspace Proxy v0.0.0-devel+85ff030 - Your Self-Hosted Remote Development Platform
Started HTTP listener at http://0.0.0.0:3001
View the Web UI: http://127.0.0.1:3001
==> Logs will stream in below (press ctrl+c to gracefully exit):
2024-01-04 20:11:56.376 [warn] net.workspace-proxy.servertailnet: broadcast server node to agents ...
error= write message:
github.com/coder/coder/v2/enterprise/wsproxy/wsproxysdk.(*remoteMultiAgentHandler).writeJSON
/home/coder/coder/enterprise/wsproxy/wsproxysdk/wsproxysdk.go:524
- failed to write msg: WebSocket closed: failed to read frame header: EOF
```
After:
```
Coder Workspace Proxy v0.0.0-devel+12f1878 - Your Self-Hosted Remote Development Platform
Started HTTP listener at http://0.0.0.0:3001
View the Web UI: http://127.0.0.1:3001
==> Logs will stream in below (press ctrl+c to gracefully exit):
2024-01-04 20:26:38.545 [warn] net.workspace-proxy.servertailnet: multiagent closed, reinitializing
2024-01-04 20:26:38.546 [erro] net.workspace-proxy.servertailnet: reinit multi agent ...
error= dial coordinate websocket:
github.com/coder/coder/v2/enterprise/wsproxy/wsproxysdk.(*Client).DialCoordinator
/home/coder/coder/enterprise/wsproxy/wsproxysdk/wsproxysdk.go:454
- failed to WebSocket dial: failed to send handshake request: Get "http://127.0.0.1:3000/api/v2/workspaceproxies/me/coordinate": dial tcp 127.0.0.1:3000: connect: connection refused
2024-01-04 20:26:38.587 [erro] net.workspace-proxy.servertailnet: reinit multi agent ...
error= dial coordinate websocket:
github.com/coder/coder/v2/enterprise/wsproxy/wsproxysdk.(*Client).DialCoordinator
/home/coder/coder/enterprise/wsproxy/wsproxysdk/wsproxysdk.go:454
- failed to WebSocket dial: failed to send handshake request: Get "http://127.0.0.1:3000/api/v2/workspaceproxies/me/coordinate": dial tcp 127.0.0.1:3000: connect: connection refusedhandshake request: Get "http://127.0.0.1:3000/api/v2/workspaceproxies/me/coordinate": dial tcp 127.0.0.1:3000: connect: connection refused
2024-01-04 20:26:40.446 [info] net.workspace-proxy.servertailnet: successfully reinitialized multiagent agents=0 took=1.900892615s
```
This test case fails with an error log, showing "context canceled" when trying to send an acquired job to an in-mem provisionerd.
https://github.com/coder/coder/runs/20331469006
In this case, we don't want to supress this error, since it could mean that we acquired a job, locked it in the database, then failed to send it to a provisioner.
(We also don't want to mark the job as failed because we don't know whether the job made it to the provisionerd or not --- in the failed test you can see that the job is actually processed just fine).
The reason we got context canceled is because the API was shutting down --- we don't want provisionerdserver to abruptly stop processing job stuff as the API shuts down as this will leave jobs in a bad state. This PR fixes up the use of contexts with provisionerdserver and the associated drpc service calls.
Fixes#11451
A refactor of the Agent API passes metrics as protobufs, which include pointers to label name/value pairs. The aggregator tested for sameness by doing a shallow compare of label values, which for different stats reports would compare unequal because the pointers would be different.
This fix does a deep compare.
While testing I also noted that we neglect to compare template names. This is unlikely to have caused any issue in practice, since the combination of username/workspace is unique, but in the context of comparing metric labels we should do the comparison.
If a user creates a workspace, deletes it, then recreates from a different template, we could in principle have reported incorrect stats for the old template.
* assert provisioner daemon version and api_version in unit tests
* add build info in HTTP header, extract codersdk.BuildVersionHeader
* add api_version to codersdk.ProvisionerDaemon
* testutil.MustString -> testutil.MustRandString
Refactors the code that handles monitoring an agent websocket with pings and updating the connection times in the DB.
Consolidates v1 and v2 agent APIs under the same code for this.
One substantive change (not _just_ a refactor) is that I've made it so that we actually disconnect if the agent fails to respond to our pings, rather than the old behavior where we would update the database, but not actually tear down the websocket.
Refactors our DRPC service definitions slightly.
In the previous version, I inserted the RPCs from the tailnet proto directly into the Agent service. This makes things hard to deal with because DRPC then generates a new set of methods with new interfaces with the `DRPCAgent_` prefixed. Since you can't have a single method that takes different argument types, we couldn't reuse the implementation of those RFCs without a lot of extra classes and pass-thru methods.
Instead, the "right" way to do it is to integrate at the DRPC layer. So, we have two DRPC services available over the Agent websocket, and register them both on the DRPC `mux`.
Since the tailnet proto RPC service is now for both clients and agents, I renamed some things to clarify and shorten.
This PR also removes the `TailnetAPI` implementation from the `agentapi` package, and the next PR in the stack replaces it with the implementation from the `tailnet` package.
* Add database tables for OAuth2 applications
These are applications that will be able to use OAuth2 to get an API key
from Coder.
* Add endpoints for managing OAuth2 applications
These let you add, update, and remove OAuth2 applications.
* Add frontend for managing OAuth2 applications
* feat: enable csrf token header
* Exempt external auth requets
* ensure dev server bypasses CSRF
* external auth is just get requests
* Add some more routes
* Extra assurance nothing breaks
* chore: fix flake, use time closer to actual test
The tests were queued, and the autostart time was being set
to the time the table was created, not when the test was actually
being run. This diff was causing failures in CI
* Adds UpdateProvisionerDaemonLastSeenAt
* Adds heartbeat to provisioner daemons
* Inserts provisioner daemons to database upon start
* Ensures TagOwner is an empty string and not nil
* Adds COALESCE() in idx_provisioner_daemons_name_owner_key
closes#10532
Adds v2 support to the /coordinate endpoint via a query parameter.
v1 already has test cases, and we haven't implemented v2 at the client yet, so the only new test case is an unsupported version.
Part of #10532
DRPC transport over yamux and in-mem pipes was previously only used on the provisioner APIs, but now will also be used in tailnet. Moved to subpackage of codersdk to avoid import loops.
This sends the email the license was issued to, and whether or not it's a trial in the telemetry payload. It's a bit janky since the license parsing is all enterprise licensed.
Addresses the issue in #11185 for the StringMap datatype.
There are other slice data types in our database package that also need to be fixed, but that'll be a different PR
Adds column api_version to the provisioner_daemons table.
This is distinct from the coderd version, and is used to handle breaking changes in the provisioner daemon API.
* chore(Makefile): use golangci-lint version from dogfood Dockerfile
* chore(dogfood/Dockerfile): update golangci-lint to latest version
* chore(coderd): address linter complaints
- Adds a --name argument to provisionerd start
- Plumbs through name to integrated and external provisioners
- Defaults to hostname if not specified for external, hostname-N for integrated
- Adds cliutil.Hostname
* feat: add endpoints to list all authed external apps
Listing the apps allows users to auth to external apps without going through the create workspace flow.
* Force typegen types for some fields of derp health report
* Explicitly allocate slices for RegionReport.{Errors,Warnings} to avoid nulls in API response
Updates coder/customers#365
This PR updates our migration framework to run all migrations in a single transaction. This is the same behavior we had in v1 and ensures that failed migrations don't bring the whole deployment down. If a migration fails now, it will automatically be rolled back to the previous version, allowing the deployment to continue functioning.
Adds cleanup queries to clean out "lost" peer and tunnel state after 24 hours. We leave this state in the database so that anything trying to connect to the peer can see that it was lost, but clean it up after 24 hours to ensure our table doesn't grow without bounds.
Relates to #8965
- Added error codes for separate code paths in health checks
- Prefixed errors and warnings with error code prefixes
- Added a docs page with details on each code, cause and solution
Co-authored-by: Muhammad Atif Ali <atif@coder.com>
The 'userOIDC' method body was getting unwieldy.
I think there is a good way to redesign the flow, but
I do not want to undertake that at this time.
The easy win is just to move some LoC to other methods
and cleanup the main method.
Drop "New" and "Builder" from the function names, in favor of the top-level resource created. This shortens tests and gives a nice syntax. Since everything is a builder, the prefix and suffix don't add much value and just make things harder to read.
I've also chosen to leave `Do()` as the function to insert into the database. Even though it's a builder pattern, I fear `.Build()` might be confusing with Workspace Builds. One other idea is `Insert()` but if we later add dbfake functions that update, this might be inconsistent.
Convert to builder for consistency with rest of the package. This will make it easier to use, and means we can drop "Builder" from function arguments since they are all builders in the package.
* Adds workspace proxy section to health page
* Conditionally places workspace proxy warnings in errors or warnings based on calculated severity
* Adds some more stories we were missing for HealthPage
- Adds a template_insights pseudo-resource
- Grants auditor and template admin roles read access on template_insights
- Updates existing RBAC checks to check for read template_insights, falling back to template update permissions where necessary
- Updates TemplateLayout to show Insights tab if can read template_insights or can update template
Adds a health check for workspace proxies:
- Healthy iff all proxies are healthy and the same version,
- Warning if some proxies are unhealthy,
- Error if all proxies are unhealthy, or do not all have the same version.
I'd like to convert dbfake into a builder pattern to prevent a proliferation of XXXWithYYY methods. This is one step of the way by removing the Non-builder function.
Refactors SSH tests to skip provisionerd and instead use dbfake to insert workspaces and builds. This should make tests faster and more reliable.
dbfake.WorkspaceBuild is refactored to use a "builder" pattern with "fluent" options, as the number of options and variants was starting to get out of hand.
* feat: implement deprecated flag for templates to prevent new workspaces
* Add deprecated filter to template fetching
* Add deprecated to template table
* Add deprecated notice to template page
* Add ui to deprecate a template
Marked as a breaking change as the previous activity bump was always the TTL duration of the workspace/template.
This change is more cost conservative, only bumping by 1 hour for workspace activity. To accommodate wrap around, eg bumping a workspace into the next autostart, the deadline is bumped by the TTL if the workspace crosses the autostart threshold.
This is a niche case that is likely caused by an idle terminal making a workspace survive through a night. The next morning, the workspace will get activity bumped the default TTL on the autostart, being similar to as if the workspace was autostarted again.
In practice, a good way to avoid this is to set a max_deadline of <24hrs to avoid wrap around entirely.
* coder list: adds information about next start / stop to available columns (not default)
* coder schedule: show now essentially coder list with a different set of columns
* Updates cli schedule unit tests to use new dbfake
Co-authored-by: Mathias Fredriksson <mafredri@gmail.com>
* Updated testingWithOwnerUser ruleguard rule to detect:
a) Passing client from coderdenttest.New() to clitest.SetupConfig() similar to what already exists for AGPL code
b) Usage of any method of the owner client from coderdenttest.New() - all usages of the owner client must be justified with a `//nolint:gocritic` comment.
* Fixed resulting linter complaints.
* Added new coderdtest helpers CreateGroup and UpdateTemplateMeta.
* Modified check_enterprise_import.sh to ignore scripts/rules.go.
* Adds a Contains() method on MockAuditor to help with asserting the presence of an audit log with specific fields.
* Updates existing usages of verifyAuditWorkspaceCreated to use the new helper
* Updates test referenced in PR#10396.
* feat: add dbfakedata for workspace builds and resources
This creates `coderdtest.NewWithDatabase` and adds a series of
helper functions to `dbfake` that insert structured fake data
for resources into the database.
It allows us to remove provisionerd from a significant amount of
tests which should speed them up and reduce flakes.
* Rename dbfakedata to dbfake
* Migrate workspaceagents_test.go to use the new dbfake
* Migrate agent_test.go to use the new fakes
* Fix comments
Fixes flake seen here: https://github.com/coder/coder/actions/runs/6716682414/job/18253279654
The test used a cron schedule to compute autobuild ticks, with ticks every hour on the hour. The default TTL was set to an hour. Usually, the next tick is less than one hour in the future, unless the test runs at :00 past the hour, which it did in my flake'd
run. But, given that this is an autostop test, the cron schedule is irrelevant (such schedules are used for auto_start_). So, I've removed it from the test and compute the build ticks directly.
Also, the test originally had the workspace TTL set to longer than the default template TTL, and then tested that no build happened when the tick was prior to both. This seems odd to me, as we want to demonstrate the the executor disregards the workspace TTL.
So, I changed the test to set the workspace TTL shorter, and then send in a tick between the two, verify that we don't autostop, then a tick after the template TTL and verify that we do.