Fixes#10979
Testing code that listens on a specific port has created a long battle with flakes. Previous attempts to deal with this include opening a listener on a port chosen by the OS, then closing the listener, noting the port and starting the test with that port.
This still flakes, notably in macOS which has a proclivity to reuse ports quickly.
Instead of fighting with the chaos that is an OS networking stack, this PR fakes the host networking in tests.
I've taken a small step here, only faking out the Listen() calls that port-forward makes, but I think over time we should be transitioning all networking the CLI does to an abstract interface so we can fake it. This allows us to run in parallel without flakes and
presents an opportunity to test error paths as well.
- Adds a --name argument to provisionerd start
- Plumbs through name to integrated and external provisioners
- Defaults to hostname if not specified for external, hostname-N for integrated
- Adds cliutil.Hostname
Fixes flake seen here: https://github.com/coder/coder/runs/19170327767
The goroutine that attempts to dial the socket didn't complete before the test did. Here we add an explicit wait for it to complete in each run of the loop.
Drop "New" and "Builder" from the function names, in favor of the top-level resource created. This shortens tests and gives a nice syntax. Since everything is a builder, the prefix and suffix don't add much value and just make things harder to read.
I've also chosen to leave `Do()` as the function to insert into the database. Even though it's a builder pattern, I fear `.Build()` might be confusing with Workspace Builds. One other idea is `Insert()` but if we later add dbfake functions that update, this might be inconsistent.
Man, graceful shutdown is hard. Even after my changes, we were still hitting a graceful shutdown race: https://github.com/coder/coder/runs/18886842123
The problem was that while we attempt a graceful shutdown at the SSH layer by closing the session for writing, we were not giving it a chance to complete before continuing to tear down the stack of closers, including one that closes the netstack, and thus drop the TCP connection before it closes.
I'd like to convert dbfake into a builder pattern to prevent a proliferation of XXXWithYYY methods. This is one step of the way by removing the Non-builder function.
Refactors SSH tests to skip provisionerd and instead use dbfake to insert workspaces and builds. This should make tests faster and more reliable.
dbfake.WorkspaceBuild is refactored to use a "builder" pattern with "fluent" options, as the number of options and variants was starting to get out of hand.
* feat: implement deprecated flag for templates to prevent new workspaces
* Add deprecated filter to template fetching
* Add deprecated to template table
* Add deprecated notice to template page
* Add ui to deprecate a template
> Can someone help me understand the differences between these env variables:
>
> CODER_REDIRECT_TO_ACCESS_URL
> CODER_TLS_REDIRECT_HTTP_TO_HTTPS
> CODER_TLS_REDIRECT_HTTP
Oh man, what a mess. It looks like `CODER_TLS_REDIRECT_HTTP ` appears in our config docs. Maybe that was the initial name for the environment variable?
At some point, both the flag and the environment variable were `--tls-redirect-http-to-https` and `CODER_TLS_REDIRECT_HTTP_TO_HTTPS`. `CODER_TLS_REDIRECT_HTTP` did nothing.
However, then we introduced `CODER_REDIRECT_TO_ACCESS_URL`, we put in some deprecation code that was maybe fat-fingered such that we accept the environment variable `CODER_TLS_REDIRECT_HTTP` but the flag `--tls-redirect-http-to-https`. Our docs still refer to `CODER_TLS_REDIRECT_HTTP` at https://coder.com/docs/v2/latest/admin/configure#address
So, I think what we gotta do is still accept `CODER_TLS_REDIRECT_HTTP` since it was working and in an example doc, but also fix the deprecation code to accept `CODER_TLS_REDIRECT_HTTP_TO_HTTPS` environment variable.
Re-enables TestSSH/RemoteForward_Unix_Signal and addresses the underlying race: we were not closing the remote forward on context expiry, only the session and connection.
However, there is still a more fundamental issue in that we don't have the ability to ensure that TCP sessions are properly terminated before tearing down the Tailnet conn. This is due to the assumption in the sockets API, that the underlying IP interface is long
lived compared with the TCP socket, and thus closing a socket returns immediately and does not wait for the TCP termination handshake --- that is handled async in the tcpip stack. However, this assumption does not hold for us and tailnet, since on shutdown,
we also tear down the tailnet connection, and this can race with the TCP termination.
Closing the remote forward explicitly should prevent forward state from accumulating, since the Close() function waits for a reply from the remote SSH server.
I've also attempted to workaround the TCP/tailnet issue for `--stdio` by using `CloseWrite()` instead of `Close()`. By closing the write side of the connection, half-close the TCP connection, and the server detects this and closes the other direction, which then
triggers our read loop to exit only after the server has had a chance to process the close.
TODO in a stacked PR is to implement this logic for `vscodessh` as well.
Adds a Logger to cli Invocation and standardizes CLI commands to use it. clitest creates a test logger by default so that CLI command logs are captured in the test logs.
CLI commands that do their own log configuration are modified to add sinks to the existing logger, rather than create a new one. This ensures we still capture logs in CLI tests.
Fixes an issue where remote forwards are not correctly torn down when using OpenSSH with `coder ssh --stdio`. OpenSSH sends a disconnect signal, but then also sends SIGHUP to `coder`. Previously, we just exited when we got SIGHUP, and this raced against properly disconnecting.
Fixes https://github.com/coder/customers/issues/327
* coder list: adds information about next start / stop to available columns (not default)
* coder schedule: show now essentially coder list with a different set of columns
* Updates cli schedule unit tests to use new dbfake
Co-authored-by: Mathias Fredriksson <mafredri@gmail.com>
* feat: add dbfakedata for workspace builds and resources
This creates `coderdtest.NewWithDatabase` and adds a series of
helper functions to `dbfake` that insert structured fake data
for resources into the database.
It allows us to remove provisionerd from a significant amount of
tests which should speed them up and reduce flakes.
* Rename dbfakedata to dbfake
* Migrate workspaceagents_test.go to use the new dbfake
* Migrate agent_test.go to use the new fakes
* Fix comments
I've said it before, I'll say it again: you can't create a timed context before calling `t.Parallel()` and then use it after.
Fixes flakes like https://github.com/coder/coder/actions/runs/6716682414/job/18253279157
I've chosen just to drop `t.Parallel()` entirely rather than create a second context after the parallel call, since the vast majority of the test time happens before where the parallel call was. It does all the tailnet setup before `t.Parallel()`.
Leaving a call to `t.Parallel()` is a bug risk for future maintainers to come in and use the wrong context in the latter part of the test by accident.
- Set viewport size to avoid responsive mode
- Added way more debug logging
- Added facility to write a screenshot on error in verbose mode.
- Added a deadline for each iteraction of clicking on and waiting for a thing.
* feat: support configurable web terminal rendering
- Added a deployment option for configuring web terminal rendering.
Valid values are 'webgl', 'canvas', and 'dom'.
* Detects the following pattern where the CLI is initialized with a client authenticated as the "first user":
client := coderdtest.New(t, ...)
[...]
user := coderdtest.CreateFirstUser(t, client)
[...]
clitest.SetupConfig(t, client, root)
* Updates documentation regarding role permissions on workspaces.
AwaitWorkspaceAgent calls testify.require which isn't allowed from a goroutine and causes cascading failures in the test suite such as: https://github.com/coder/coder/actions/runs/6458768855/job/17533163316
I don't believe these functions serve a direct purpose since nothing else is "waiting" for the functions to return before doing other things.
Adds `CODER_AGENT_TOKEN_FILE` which will read the agent token from
a file if `CODER_AGENT_TOKEN` is not provided. Using a Kubernetes
Secret with a volume-mounted file is a more secure way to provide
the agent token instead of an environment variable.
- Adds an audit log for workspaces automatically transitioned to the dormant
state.
- Imposes a mininum of 1 minute on cleanup-related fields. This is to
prevent accidental API misuse from resulting in catastrophe.
- Removes the `exp scaletest` command from the slim binary
- Updates scaletest-runner template to fetch the full binary from the running Coder instance
* feat: allow external services to be authable
* Refactor external auth config structure for defaults
* Add support for new config properties
* Change the name of external auth
* Move externalauth -> external-auth
* Run gen
* Fix tests
* Fix MW tests
* Fix git auth redirect
* Fix lint
* Fix name
* Allow any ID
* Fix invalid type test
* Fix e2e tests
* Fix comments
* Fix colors
* Allow accepting any type as string
* Run gen
* Fix href
* Adds a set of actions to automatically interact with a Coder instance using chromedp
* Integrates the chromedp actions into the scaletest dashboard command,
* Re-enables the previously disabled unit tests for scaletest/dashboard
* Removes previous dashboard actions based around codersdk
* chore: move `/gitauth` to `/externalauth` on the frontend
This actually took a lot more jank than anticipated,
so I wanted to split this up before adding the ability
to embed new providers.
* Rename FE
* Fix em' up
* Fix linting error
* Fix e2e tests
* chore: update helm golden files
* chore: rename `git_auth` to `external_auth` in our schema
We're changing Git auth to be external auth. It will support
any OAuth2 or OIDC provider.
To split up the larger change I want to contribute the schema
changes first, and I'll add the feature itself in another PR.
* Fix names
* Fix outdated view
* Rename some additional places
* Fix sort order
* Fix template versions auth route
* Fix types
* Fix dbauthz
* Adds agenttest.New() helper function
* Makes sure agent gets closed on test cleanup
* Makes sure you don't forget to set session token
* Sets the agent and client logger automatically
- Adds dbtestutil.WithTimezone(tz) to allow setting the timezone for a test database.
- Modifies our test database setup code to pick a consistently weird timezone for the database.
- Adds the facility randtz.Name() to pick a random timezone which is consistent across subtests (via sync.Once).
- Adds a linter rule to warn against setting the test database timezone to UTC.
This change reduces the CPU consumption of --help by ~50%.
Also, this change removes ANSI escape codes from our golden files. I
don't think those were worth the inability to parallelize golden file tests and
global state fragility.
- An opt-in feature has been added to the agent to allow
deprioritizing non coder-related processes for CPU by setting their
niceness level to 10.
- Opting in to the feature requires setting CODER_PROC_PRIO_MGMT to a non-empty value.
This change will improve over CLI performance and "snappiness" as well as
substantially reduce our test times. Preliminary benchmarks show
`coder server --help` times cut from 300ms to 120ms on my dogfood
instance.
The inefficiency of lipgloss disproportionately impacts our system, as all help
text for every command is generated whenever any command is invoked.
The `pretty` API could clean up a lot of the code (e.g., by replacing
complex string concatenations with Printf), but this commit is too
expansive as is so that work will be done in a follow up.
See also: https://github.com/coder/coder/pull/9522
- Adds commands `server dbcrypt {rotate,decrypt,delete}` to re-encrypt, decrypt, or delete encrypted data, respectively.
- Plumbs through dbcrypt in enterprise/coderd (including unit tests).
- Adds documentation in admin/encryption.md.
This enables dbcrypt by default, but the feature is soft-enforced on supplying external token encryption keys. Without specifying any keys, encryption/decryption is a no-op.
This change removes one use of `coderd/database` from the slim binary
and more correctly uses codersdk instead of database or provisionerd
packages.
No size change (yet).
Ref: #9380
This is an alternative approach to #9519 and removes 2 MB instead of 1
MB (1.2 MB accounted for by embedded migration SQL files).
Combined with #9481, #9506, #9508, #9517, a total of 5 MB is removed.
Ref: #9380
This removes an indirect import of `coderd/database` from the CLI and
results in a logical separation between server related and generalized
schedule.
No size change (yet).
Ref: #9380
* feat: add `template_active_version_id` to workspaces
This reduces a fetch in the VS Code extension when getting the
active version update message!
* Fix entities.ts
* Fix golden gen
* docs: expand on TTL flags
* make: gen
* Discard changes to site/src/api/api.ts
* Discard changes to site/src/xServices/templateVersion/templateVersionXService.ts
---------
Co-authored-by: Muhammad Atif Ali <matifali@live.com>
Co-authored-by: Muhammad Atif Ali <atif@coder.com>
* chore: rename locked to dormant
- The following columns have been updated:
- workspace.locked_at -> dormant_at
- template.inactivity_ttl -> time_til_dormant
- template.locked_ttl -> time_til_dormant_autodelete
This change has also been reflected in the SDK.
A route has also been updated from /workspaces/<id>/lock to /workspaces/<id>/dormant
* chore: add /v2 to import module path
go mod requires semantic versioning with versions greater than 1.x
This was a mechanical update by running:
```
go install github.com/marwan-at-work/mod/cmd/mod@latest
mod upgrade
```
Migrate generated files to import /v2
* Fix gen
* The batchstats warning went out on every Ctrl+C in my development
Rule of silence:
The provisioner and connect messages messages were sent out on every startup
without a corresponding user event, making them annoying and more-so
debug messages.
This allows specifying a command to run that can output headers for
cases where users require dynamic headers (like to authenticate to their
VPN).
The primary use case is to add this flag in SSH configs created by the
VS Code plugin, although maybe config-ssh should do the same.
* add flag for auto create groups
* fixup! add flag for auto create groups
* sync missing groups
Also added a regex filter to filter out groups that are not
important
* chore: rename startup logs to agent logs
This also adds a `source` property to every agent log. It
should allow us to group logs and display them nicer in
the UI as they stream in.
* Fix migration order
* Fix naming
* Rename the frontend
* Fix tests
* Fix down migration
* Match enums for workspace agent logs
* Fix inserting log source
* Fix migration order
* Fix logs tests
* Fix psql insert
- For cgroups v1 the wrong cgroup file was being read
to determine max memory. This commit updates the file
from '/sys/fs/cgroup/memory/memory.max_usage_in_bytes' to
'/sys/fs/cgroup/memory/memory.limit_in_bytes'
`--variable` is used frequently enough to deserve a shorthand. Unfortunately,
`-v` is taken by verbose, and `-V` is too easily confused with version or
verbose, so we're left with "--var".
* feat(cli): check if dotfiles install script is executable
* feat(docs): add section for dotfiles setup and document executable fix
---------
Co-authored-by: Mathias Fredriksson <mafredri@gmail.com>
Co-authored-by: Muhammad Atif Ali <matifali@live.com>
* Attempts reading cgroupv1 quota, period, usage from /sys/fs/cgroup/cpu,cpuacct by default
* Fall back to /sys/fs/cgroup/cpu for v1 quota and period
* Fall back to /sys/fs/cgroup/cpuacct for v1 usage
Fixes https://github.com/coder/coder/issues/8468