In anticipation of needing the `LogSender` to run on a context that doesn't get immediately canceled when you `Close()` the agent, I've undertaken a little refactor to manage the goroutines that get run against the Tailnet and Agent API connection.
It controls two contexts: one that is canceled as soon as graceful shutdown starts, and another that stays alive so graceful shutdown can complete.
Fixes race seen here: https://github.com/coder/coder/runs/21852483781
What happens is that the agent connects, completes the test, and then disconnects before the Eventually condition runs. The waiter then times out because it's looking for a connected agent.
Then, since it's a `require` in a goroutine, the `tGo` cleanup hangs and the whole test suite times out after 10 minutes.
Anyway, `agenttest.New` doesn't block, and we don't actually need to wait for the agent to connect, since a successful SSH session is evidence that it connected.
Fixes flake seen here: https://github.com/coder/coder/runs/19170327767
The goroutine that attempts to dial the socket didn't complete before the test did. Here we add an explicit wait for it to complete in each run of the loop.
Drop "New" and "Builder" from the function names, in favor of the top-level resource created. This shortens tests and gives a nice syntax. Since everything is a builder, the prefix and suffix don't add much value and just make things harder to read.
I've also chosen to keep `Do()` as the function that inserts into the database. Even though this is a builder pattern, I fear `.Build()` could be confused with Workspace Builds. Another option is `Insert()`, but if we later add dbfake functions that update rows, that name would be inconsistent.
I'd like to convert dbfake to a builder pattern to prevent a proliferation of XXXWithYYY methods. This is one step along the way: it removes the non-builder function.
Refactors SSH tests to skip provisionerd and instead use dbfake to insert workspaces and builds. This should make tests faster and more reliable.
dbfake.WorkspaceBuild is refactored to use a "builder" pattern with "fluent" options, as the number of options and variants was starting to get out of hand.
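A rough sketch of the builder-with-fluent-options shape. Everything here is hypothetical: `WorkspaceBuild`, `Transition`, `Resource`, and the in-memory `fakeDB` stand in for the real dbfake types and database; only the `Do()` name follows the choice described above.

```go
package main

// Hypothetical stand-ins for database rows.
type Workspace struct {
	ID    int
	Owner string
}

type Build struct {
	WorkspaceID int
	Transition  string
	Resources   []string
}

// fakeDB is a hypothetical in-memory store.
type fakeDB struct {
	workspaces []Workspace
	builds     []Build
}

// WorkspaceBuildBuilder accumulates options; nothing is inserted until Do().
type WorkspaceBuildBuilder struct {
	db         *fakeDB
	ws         Workspace
	transition string
	resources  []string
}

// WorkspaceBuild starts a builder named after the top-level resource.
func WorkspaceBuild(db *fakeDB, owner string) WorkspaceBuildBuilder {
	return WorkspaceBuildBuilder{db: db, ws: Workspace{Owner: owner}, transition: "start"}
}

// Each option returns a modified copy, so calls can be chained.
func (b WorkspaceBuildBuilder) Transition(t string) WorkspaceBuildBuilder {
	b.transition = t
	return b
}

func (b WorkspaceBuildBuilder) Resource(name string) WorkspaceBuildBuilder {
	// Copy before appending so builders derived from the same template
	// never share a backing array.
	b.resources = append(append([]string(nil), b.resources...), name)
	return b
}

// Do inserts the workspace and its build, returning both.
func (b WorkspaceBuildBuilder) Do() (Workspace, Build) {
	b.ws.ID = len(b.db.workspaces) + 1
	b.db.workspaces = append(b.db.workspaces, b.ws)
	build := Build{WorkspaceID: b.ws.ID, Transition: b.transition, Resources: b.resources}
	b.db.builds = append(b.db.builds, build)
	return b.ws, build
}
```

Usage reads as one chain, e.g. `WorkspaceBuild(db, "alice").Transition("stop").Resource("main").Do()`. Because each option returns a copy, a partially configured builder can be reused as a template without one chain affecting another.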
Re-enables TestSSH/RemoteForward_Unix_Signal and addresses the underlying race: we were not closing the remote forward on context expiry, only the session and connection.
However, there is still a more fundamental issue: we can't ensure that TCP sessions are properly terminated before tearing down the Tailnet conn. The sockets API assumes that the underlying IP interface is long-lived
compared with the TCP socket, so closing a socket returns immediately and does not wait for the TCP termination handshake; that is handled asynchronously in the tcpip stack. This assumption does not hold for us with tailnet, since on shutdown
we also tear down the tailnet connection, and this can race with the TCP termination.
Closing the remote forward explicitly should prevent forward state from accumulating, since the Close() function waits for a reply from the remote SSH server.
I've also attempted to work around the TCP/tailnet issue for `--stdio` by using `CloseWrite()` instead of `Close()`. Closing the write side half-closes the TCP connection; the server detects this and closes the other direction, which then
triggers our read loop to exit only after the server has had a chance to process the close.
A TODO for a stacked PR is to implement this logic for `vscodessh` as well.
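A self-contained illustration of the half-close sequence over loopback TCP, using the standard library's `(*net.TCPConn).CloseWrite`. The `halfCloseDemo` function and its echo-style server are hypothetical; the server stands in for the remote side that notices EOF and then closes its own direction.

```go
package main

import (
	"io"
	"net"
)

// halfCloseDemo sends payload, half-closes the write side with CloseWrite,
// then reads until the server closes its side. The read loop only ends
// (io.EOF) after the server has processed our close.
func halfCloseDemo(payload string) (string, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer ln.Close()

	// Hypothetical server: read until the client's FIN, reply, then close.
	go func() {
		conn, err := ln.Accept()
		if err != nil {
			return
		}
		defer conn.Close()
		data, _ := io.ReadAll(conn) // returns once the client half-closes
		_, _ = conn.Write(append([]byte("got: "), data...))
	}()

	conn, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return "", err
	}
	defer conn.Close()
	if _, err := conn.Write([]byte(payload)); err != nil {
		return "", err
	}
	// Half-close: send FIN but keep the read side open.
	if err := conn.(*net.TCPConn).CloseWrite(); err != nil {
		return "", err
	}
	// io.ReadAll returns nil error at EOF, i.e. when the server closes.
	reply, err := io.ReadAll(conn)
	return string(reply), err
}
```

Calling `halfCloseDemo("hello")` should return `"got: hello"`, since the server only replies and closes after it has seen the client's half-close.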
Adds a Logger to cli Invocation and standardizes CLI commands to use it. clitest creates a test logger by default so that CLI command logs are captured in the test logs.
CLI commands that do their own log configuration are modified to add sinks to the existing logger, rather than create a new one. This ensures we still capture logs in CLI tests.
Fixes an issue where remote forwards are not correctly torn down when using OpenSSH with `coder ssh --stdio`. OpenSSH sends a disconnect signal, but then also sends SIGHUP to `coder`. Previously, we just exited when we got SIGHUP, and this raced against properly disconnecting.
Fixes https://github.com/coder/customers/issues/327
AwaitWorkspaceAgent calls testify's `require`, which calls `t.FailNow` and so isn't allowed from a goroutine other than the test's own; this causes cascading failures in the test suite such as: https://github.com/coder/coder/actions/runs/6458768855/job/17533163316
I don't believe these functions serve a direct purpose since nothing else is "waiting" for the functions to return before doing other things.
* Adds agenttest.New() helper function
* Makes sure agent gets closed on test cleanup
* Makes sure you don't forget to set the session token
* Sets the agent and client logger automatically
* chore: add /v2 to import module path
Go modules require a `/vN` suffix in the module path for major versions 2 and above (semantic import versioning).
This was a mechanical update by running:
```
go install github.com/marwan-at-work/mod/cmd/mod@latest
mod upgrade
```
Migrate generated files to import /v2
* Fix gen
- (breaking) Protects Logger and LogBodies fields of codersdk.Client with its mutex. This addresses a data race in cli/scaletest.
- Fillets the existing cli/createworkspaces unit test and moves the testing logic there into the tests under scaletest/createworkspaces.
- Adds testutil.RaceEnabled bool const and conditionally skips previously-skipped tests under scaletest/ if the race detector is enabled. This is unfortunate and sad, but I would prefer to have these tests at least running without the race detector than not running at all.
- Adds IgnoreErrors option to fake in-memory agent loggers; having the agents fail the test immediately when they encounter any sort of error isn't really helpful.
* fix(cli/ssh): Avoid connection hang when workspace is stopped
Two issues are addressed here:
1. We were not detecting disconnects due to waiting for Stdin to close
(disconnect would only propagate after entering input and failing to
write to the connection).
2. In other scenarios, where the connection drop is not detected, we now
also watch workspace status and drop the connection when a workspace
reaches the stopped state.
Fixes: https://github.com/coder/jetbrains-coder/issues/199
Refs: #6180, #6175
* chore: rename `AgentConn` to `WorkspaceAgentConn`
The codersdk was becoming bloated with consts for the workspace
agent that made no sense to a reader. `Tailnet*` is an example
of these consts.
* chore: remove `Get` prefix from *Client functions
* chore: remove `BypassRatelimits` option in `codersdk.Client`
It feels wrong to have this as a direct option because it's so infrequently
needed by API callers. It's better to directly modify headers in the two
places that we actually use it.
* Merge `appearance.go` and `buildinfo.go` into `deployment.go`
* Merge `experiments.go` and `features.go` into `deployment.go`
* Fix `make gen` referencing old type names
* Merge `error.go` into `client.go`
`codersdk.Response` lived in `error.go`, which is wrong.
* chore: refactor workspace agent functions into agentsdk
It was odd conflating the codersdk that clients should use
with functions that only the agent should use. This separates
them into two SDKs that are closely coupled, but separate.
* Merge `insights.go` into `deployment.go`
* Merge `organizationmember.go` into `organizations.go`
* Merge `quota.go` into `workspaces.go`
* Rename `sse.go` to `serversentevents.go`
* Rename `codersdk.WorkspaceAppHostResponse` to `codersdk.AppHostResponse`
* Format `.vscode/settings.json`
* Fix outdated naming in `api.ts`
* Fix app host response
* Fix unsupported type
* Fix imported type
* test: Fix GPG test so it does not inherit parent parallelism
Running a subtest in a parent with `t.Parallel()` and using `t.Setenv`
is not allowed in Go 1.20, so we move it to a separate test function.
* Fix shadowed import
Writing to stdin for `coder ssh` too early could result in the input
being discarded. To work around this we add a new `ptytest` method
called `ReadRune` that lets us read one character of output. This will
indicate the command is ready to accept input.
It could be one character of the prompt, or of the loading message shown
while waiting for the connection to be established.
* Add extio
* feat: Add `vscodeipc` subcommand for VS Code Extension
This enables the VS Code extension to communicate with a Coder client.
The extension will download the slim binary from `/bin/*` for the
respective client architecture and OS, then execute `coder vscodeipc`
for the connecting workspace.
* Add authentication header, improve comments, and add tests for the CLI
* Update cli/vscodeipc_test.go
Co-authored-by: Mathias Fredriksson <mafredri@gmail.com>
* Update cli/vscodeipc_test.go
Co-authored-by: Mathias Fredriksson <mafredri@gmail.com>
* Update cli/vscodeipc/vscodeipc_test.go
Co-authored-by: Mathias Fredriksson <mafredri@gmail.com>
* Fix requested changes
* Fix IPC tests
* Fix shell execution
* Fix nix flake
* Silence usage
Co-authored-by: Mathias Fredriksson <mafredri@gmail.com>
* feat: Add connection_timeout and troubleshooting_url to agent
This commit adds the connection timeout and troubleshooting url fields
to coder agents.
If an initial connection cannot be established within `connection_timeout`
seconds, the agent status is marked as `"timeout"`.
The troubleshooting URL, if configured in the Terraform template, can be
presented to the user when the agent state is either `"timeout"` or
`"disconnected"`.
Fixes #4678
This feature is used by the coder agent to exchange a new token. By
protecting the SessionToken via mutex we ensure there are no data races
when accessing it.
* fix: Refactor agent to consume API client
This simplifies a lot of code by creating an interface for
the codersdk client into the agent. It also moves agent
authentication code so instance identity will work between
restarts.
Fixes #3485 and #4082.
* Fix client reconnections