Refactors our DRPC service definitions slightly.
In the previous version, I inserted the RPCs from the tailnet proto directly into the Agent service. This makes things hard to deal with because DRPC then generates a new set of methods with new interfaces with the `DRPCAgent_` prefixed. Since you can't have a single method that takes different argument types, we couldn't reuse the implementation of those RFCs without a lot of extra classes and pass-thru methods.
Instead, the "right" way to do it is to integrate at the DRPC layer. So, we have two DRPC services available over the Agent websocket, and register them both on the DRPC `mux`.
Since the tailnet proto RPC service is now for both clients and agents, I renamed some things to clarify and shorten.
This PR also removes the `TailnetAPI` implementation from the `agentapi` package, and the next PR in the stack replaces it with the implementation from the `tailnet` package.
* chore: add unit test to excercise flake
* Implement a *fix for cron stop() before run()
This fix still has a race condition. I do not see a clean solution
without modifying the cron libary. The cron library uses a boolean
to indicate running, and that boolean needs to be set to "true"
before we call "Close()". Or "Close()" should prevent "Run()"
from doing anything.
In either case, this solves the issue for a niche unit test bug
in which the test finishes, calling Close(), before there was
an oppertunity to start the go routine. It probably isn't worth
a lot of time investment, and this fix will suffice
* chore: check if process is nil
We check if process is nil in the ports_supported file.
Just matching that defensive check, not sure if it can be nil.
Fixes#10799
The flake happens when we try to remote forward, but the port we've chosen is not free. In the flaked example, it's actually the SSH listener that occupies the port we try to remote forward, leading to confusing reads (c.f. the linked issue).
This fix simplies the tests considerably by using the Go ssh client, rather than shelling out to OpenSSH. This avoids using a pseudoterminal, avoids the need for starting any local OS listeners to communicate the forwarding (go SSH just returns in-process listeners), and avoids an OS listener to wire OpenSSH up to the agentConn.
With the simplied logic, we can immediately tell if a remote forward on a random port fails, so we can do this in a loop until success or timeout.
I've also simplified and fixed up the other forwarding tests. Since we set up forwarding in-process with Go ssh, we can remove a lot of the `require.Eventually` logic.
Fixes an issue where remote forwards are not correctly torn down when using OpenSSH with `coder ssh --stdio`. OpenSSH sends a disconnect signal, but then also sends SIGHUP to `coder`. Previously, we just exited when we got SIGHUP, and this raced against properly disconnecting.
Fixes https://github.com/coder/customers/issues/327
* Fit once during creation
This does not fix any bugs (that I know of) but we only need to fit once
when the terminal is created, not every time we reconnect. Granted,
currently we do not support reconnecting without refreshing anyway so it
does not really matter, but this just seems more correct.
Plus now we will not have to pass the fit addon around.
* Pass size when connecting web socket URL
I think this will solve an issue where screen does does not correctly
handle an immediate resize. It seems to ignore the resize, but even if
you send it again nothing changes, seemingly thinking it is already at
that size?
* Use new struct for decoding reconnecting pty requests
Decoding a JSON message does not touch omitted (or null) fields so once
a message with a resize comes in, every single message from that point
will cause a resize.
I am not sure if this is an actual problem in practice but at the very
least it seems unintentional.
* Adds agenttest.New() helper function
* Makes sure agent gets closed on test cleanup
* Makes sure you don't forget to set session token
* Sets the agent and client logger automatically
- An opt-in feature has been added to the agent to allow
deprioritizing non coder-related processes for CPU by setting their
niceness level to 10.
- Opting in to the feature requires setting CODER_PROC_PRIO_MGMT to a non-empty value.
* chore: add /v2 to import module path
go mod requires semantic versioning with versions greater than 1.x
This was a mechanical update by running:
```
go install github.com/marwan-at-work/mod/cmd/mod@latest
mod upgrade
```
Migrate generated files to import /v2
* Fix gen
It looks like it is possible for screen to use control sequences instead
of literal newlines which fails the tests.
This reuses the existing readUntil function used in other pty tests.
I forgot that waiting on the cond releases the lock so it was possible
to get pty output after writing the buffer but before adding the pty to
the map. To fix, add the pty to the map while under the same lock where
we read from the buffer.
The rest does not need to be behind the lock so I moved it out of
doAttach, and that also means we no longer need
waitForStateOrContextLocked.
Also, this can hit a logger error saying the attach failed which fails
the tests however it is not that the attach failed, just that the
process already ran and exited, so when the process exits do not
set an error, instead for now assume this is an expected close.
* Add screen backend for reconnecting ptys
The screen portion is a port from wsep. There is an interface that lets
you choose between screen and the previous method. By default it will
choose screen if it is installed but this can be overidden (mostly for
tests).
The tests use a scanner instead of a reader now because the reader will
loop infinitely at the end of a stream.
Replace /bin/bash with bash since bash is not always in /bin.
* Remove connection_id from reconnecting PTY logger
This serves multiple connections so it makes no sense to scope it to a
single connection.
Also lets us use "connection_id" when logging write errors instead of
"other_conn_id".
* Use PATH to test buffered reconnecting pty
* chore: rename startup logs to agent logs
This also adds a `source` property to every agent log. It
should allow us to group logs and display them nicer in
the UI as they stream in.
* Fix migration order
* Fix naming
* Rename the frontend
* Fix tests
* Fix down migration
* Match enums for workspace agent logs
* Fix inserting log source
* Fix migration order
* Fix logs tests
* Fix psql insert
This commit adds two new `agentsdk` functions, `StartupLogsSender` and
`StartupLogsWriter` that can be used by any client looking to send
startup logs to coderd.
We also refactor the `agent` to use these new functions.
As a bonus, agent startup logs are separated into "info" and "error"
levels to separate stdout and stderr.
---------
Co-authored-by: Marcin Tojek <mtojek@users.noreply.github.com>
During agent close it was possible for the startup script logs consumer
to enter a deadlock state where by agent close was waiting via
`a.trackConnGoroutine` and the log reader for a flush event.
This refactor removes the mutex in favor of channel communication and
relies on two goroutines without shared state.
This commit reverts some of the changes in #8029 and implements an
alternative method of keeping track of when the startup script has ended
and there will be no more logs.
This is achieved by adding new agent fields for tracking when the agent
enters the "starting" and "ready"/"start_error" lifecycle states. The
timestamps simplify logic since we don't need understand if the current
state is before or after the state we're interested in. They can also be
used to show data like how long the startup script took to execute. This
also allowed us to remove the EOF field from the logs as the
implementation was problematic when we returned the EOF log entry in the
response since requesting _after_ that ID would give no logs and the API
would thus lose track of EOF.
* feat(coderd,agent): send startup log eof at the end
* fix(coderd): fix edge case in startup log pubsub
* fix(coderd): ensure startup logs are closed on lifecycle state change (fallback)
* fix(codersdk): fix startup log channel shared memory bug
* fix(site): remove the EOF log line
* chore: skip timing-sensistive AgentMetadata test in the standard suite
* Add test-timing target
* fix windows?
* Works on my Windows desktop?
* Use tag system
* fixup! Use tag system