Commit Graph

302 Commits

Author SHA1 Message Date
Asher 02ee724d9f
fix: do terminal emulation in reconnecting pty tests (#9114)
It looks like it is possible for screen to use control sequences instead
of literal newlines which fails the tests.

This reuses the existing readUntil function used in other pty tests.
2023-08-16 13:02:03 -08:00
Colin Adler 344d32b2f1
feat(coderd): expire agents from server tailnet (#9092) 2023-08-14 20:38:37 -05:00
Asher a08f7b8fb9
fix: catch missing output with reconnecting PTY (#9094)
I forgot that waiting on the cond releases the lock so it was possible
to get pty output after writing the buffer but before adding the pty to
the map.  To fix, add the pty to the map while under the same lock where
we read from the buffer.

The rest does not need to be behind the lock so I moved it out of
doAttach, and that also means we no longer need
waitForStateOrContextLocked.

Also, this can hit a logger error saying the attach failed which fails
the tests however it is not that the attach failed, just that the
process already ran and exited, so when the process exits do not
set an error, instead for now assume this is an expected close.
2023-08-14 15:54:23 -08:00
Asher b993cab49a
fix: use screen for reconnecting terminal sessions on Linux if available (#8640)
* Add screen backend for reconnecting ptys

The screen portion is a port from wsep.  There is an interface that lets
you choose between screen and the previous method.  By default it will
choose screen if it is installed but this can be overidden (mostly for
tests).

The tests use a scanner instead of a reader now because the reader will
loop infinitely at the end of a stream.

Replace /bin/bash with bash since bash is not always in /bin.

* Remove connection_id from reconnecting PTY logger

This serves multiple connections so it makes no sense to scope it to a
single connection.

Also lets us use "connection_id" when logging write errors instead of
"other_conn_id".

* Use PATH to test buffered reconnecting pty
2023-08-14 11:19:13 -08:00
Dean Sheather 07fd73c4a0
chore: allow multiple agent subsystems, add exectrace (#8933) 2023-08-08 22:10:28 -07:00
Dean Sheather 3c52b01850
chore: add tailscale magicsock debug logging controls (#8982) 2023-08-08 17:56:08 +00:00
Dean Sheather b955c5fefc
fix: avoid agent runLoop exiting due to ws ping (#8852) 2023-08-02 07:25:07 +00:00
Dean Sheather c575292ba6
fix: fix tailnet netcheck issues (#8802) 2023-08-02 01:50:43 +10:00
Kyle Carberry bd944e0d21
chore: rename startup logs to agent logs (#8649)
* chore: rename startup logs to agent logs

This also adds a `source` property to every agent log. It
should allow us to group logs and display them nicer in
the UI as they stream in.

* Fix migration order

* Fix naming

* Rename the frontend

* Fix tests

* Fix down migration

* Match enums for workspace agent logs

* Fix inserting log source

* Fix migration order

* Fix logs tests

* Fix psql insert
2023-07-28 15:57:23 +00:00
Ammar Bandukwala 25e30c6f41
feat(cli): support fine-grained server log filtering (#8748) 2023-07-26 16:46:22 -05:00
Dean Sheather 2f0a9996e7
chore: add derpserver to wsproxy, add proxies to derpmap (#7311) 2023-07-27 02:21:04 +10:00
Colin Adler 71d4e4e6e8
fix(agent): check agent metadata every second instead of minute (#8614) 2023-07-20 14:02:58 -05:00
Colin Adler c8d65de4b7
test(agent): fix `TestAgent_Metadata/Once` flake (#8613) 2023-07-20 18:49:44 +00:00
Mathias Fredriksson 5fd77ad7cf
test(agent): fix service banner and metadata intervals (#8516) 2023-07-14 16:10:26 +03:00
Colin Adler c47b78c44b
chore: replace wsconncache with a single tailnet (#8176) 2023-07-12 17:37:31 -05:00
Mathias Fredriksson e508d9aa6e
fix(agent/usershell): check shell on darwin via dscl (#8366) 2023-07-11 20:27:50 +03:00
Mathias Fredriksson 34c3f919dc
fix(agent/agentssh): check for hushlogin via afero fs (#8358) 2023-07-07 13:30:23 +03:00
Mathias Fredriksson 3f058f28e7
test(agent): use afero for motd tests to allow parallel execution (#8329) 2023-07-06 10:57:51 +03:00
Asher 6015319e9d
feat: show service banner in SSH/TTY sessions (#8186)
* Allow workspace agents to get appearance
* Poll for service banner every two minutes
* Show service banner before MOTD if not quiet
2023-06-30 10:41:29 -08:00
Mathias Fredriksson 6d176aee5d
test(agent): fix lifecycle test flakeyness (#8230) 2023-06-27 12:44:16 +00:00
Mathias Fredriksson 3b9b06fe5a
feat(codersdk/agentsdk): add `StartupLogsSender` and `StartupLogsWriter` (#8129)
This commit adds two new `agentsdk` functions, `StartupLogsSender` and
`StartupLogsWriter` that can be used by any client looking to send
startup logs to coderd.

We also refactor the `agent` to use these new functions.

As a bonus, agent startup logs are separated into "info" and "error"
levels to separate stdout and stderr.

---------

Co-authored-by: Marcin Tojek <mtojek@users.noreply.github.com>
2023-06-22 23:28:59 +03:00
Spike Curtis e738123a9c
chore: log ssh connection disconnects with errors (#8143)
Signed-off-by: Spike Curtis <spike@coder.com>
2023-06-22 11:37:50 +04:00
Dean Sheather a28d422c35
feat: add flag to disable all direct connections (#7936) 2023-06-21 22:02:05 +00:00
Marcin Tojek 4fb4c9b270
chore: add more rules to ensure logs consistency (#8104) 2023-06-21 12:00:38 +02:00
Spike Curtis 1c8f564fdb
feat: add logging of ssh connections to agent (#8096)
* feat: adds logging of ssh connections to agent

Signed-off-by: Spike Curtis <spike@coder.com>

* code review improvements

Signed-off-by: Spike Curtis <spike@coder.com>

---------

Signed-off-by: Spike Curtis <spike@coder.com>
2023-06-21 13:49:58 +04:00
Mathias Fredriksson ea4b7d60d7
fix(agent): refactor `trackScriptLogs` to avoid deadlock (#8084)
During agent close it was possible for the startup script logs consumer
to enter a deadlock state where by agent close was waiting via
`a.trackConnGoroutine` and the log reader for a flush event.

This refactor removes the mutex in favor of channel communication and
relies on two goroutines without shared state.
2023-06-20 18:05:11 +03:00
Mathias Fredriksson 8dac0356ed
refactor: replace startup script logs EOF with starting/ready time (#8082)
This commit reverts some of the changes in #8029 and implements an
alternative method of keeping track of when the startup script has ended
and there will be no more logs.

This is achieved by adding new agent fields for tracking when the agent
enters the "starting" and "ready"/"start_error" lifecycle states. The
timestamps simplify logic since we don't need understand if the current
state is before or after the state we're interested in. They can also be
used to show data like how long the startup script took to execute. This
also allowed us to remove the EOF field from the logs as the
implementation was problematic when we returned the EOF log entry in the
response since requesting _after_ that ID would give no logs and the API
would thus lose track of EOF.
2023-06-20 14:41:55 +03:00
Marcin Tojek b1d1b63113
chore: ensure logs consistency across Coder (#8083) 2023-06-20 12:30:45 +02:00
Mathias Fredriksson 0c5077464b
fix: avoid missed logs when streaming startup logs (#8029)
* feat(coderd,agent): send startup log eof at the end

* fix(coderd): fix edge case in startup log pubsub

* fix(coderd): ensure startup logs are closed on lifecycle state change (fallback)

* fix(codersdk): fix startup log channel shared memory bug

* fix(site): remove the EOF log line
2023-06-16 17:14:22 +03:00
Marcin Tojek 247f8a973f
feat: replace ssh maxTimeout with keep-alive mechanism (#8062)
* Bump up coder/ssh

* feat: Set default agent timeout to ~72h

* Address PR comments

* Fix
2023-06-16 15:22:18 +02:00
Mathias Fredriksson 74fdcb1ace
fix(agent/agentssh): wait for sessions to exit (#8008) 2023-06-13 17:52:31 +00:00
Mathias Fredriksson c916a9e67f
fix(agent): guard against multiple rpty race for same id (#7998)
* fix(agent): guard against multiple rpty race for same id
* fix(agent): ensure pty is closed on error
2023-06-13 15:14:07 +00:00
Ammar Bandukwala fcca639d38
test(agent/agentssh): close SSH servers in all tests (#7911)
Potentially solves the flake seen here:

https://github.com/coder/coder/actions/runs/5167029213/jobs/9307647816.
2023-06-07 23:43:38 +00:00
Marcin Tojek 14efdadd3c
feat: Collect agent SSH metrics (#7584) 2023-05-25 12:52:36 +02:00
Jon Ayers 00a2413c03
feat: add telemetry support for workspace agent subsystem (#7579) 2023-05-17 22:49:25 -05:00
Kyle Carberry 70d2203b9e
chore: reduce the log output of skipped tests (#7520)
With the introduction of the workspace proxy tests there was a lot
of output if a test was eventually skipped.
2023-05-14 19:37:00 -05:00
Spike Curtis 9c030a8888
fix: pty.Start respects context on Windows too (#7373)
* fix: pty.Start respects context on Windows too

Signed-off-by: Spike Curtis <spike@coder.com>

* Fix windows imports; rename ToExec -> AsExec

Signed-off-by: Spike Curtis <spike@coder.com>

* Fix import in windows test

Signed-off-by: Spike Curtis <spike@coder.com>

---------

Signed-off-by: Spike Curtis <spike@coder.com>
2023-05-03 11:43:05 +04:00
Ammar Bandukwala 465fe8658d
chore: skip timing-sensistive AgentMetadata test in the standard suite (#7237)
* chore: skip timing-sensistive AgentMetadata test in the standard suite

* Add test-timing target

* fix windows?

* Works on my Windows desktop?

* Use tag system

* fixup! Use tag system
2023-05-02 10:41:41 +00:00
Marcin Tojek bb0a38b161
feat: Implement aggregator for agent metrics (#7259) 2023-04-27 12:34:00 +02:00
Spike Curtis b6666cf1cf
chore: tailnet debug logging (#7260)
* Enable discovery (disco) debug

Signed-off-by: Spike Curtis <spike@coder.com>

* Better debug on reconnectingPTY

Signed-off-by: Spike Curtis <spike@coder.com>

* Agent logging in appstest

Signed-off-by: Spike Curtis <spike@coder.com>

* More reconnectingPTY logging

Signed-off-by: Spike Curtis <spike@coder.com>

* Add logging to coordinator

Signed-off-by: Spike Curtis <spike@coder.com>

* Update agent/agent.go

Co-authored-by: Mathias Fredriksson <mafredri@gmail.com>

* Update agent/agent.go

Co-authored-by: Mathias Fredriksson <mafredri@gmail.com>

* Update agent/agent.go

Co-authored-by: Mathias Fredriksson <mafredri@gmail.com>

* Update agent/agent.go

Co-authored-by: Mathias Fredriksson <mafredri@gmail.com>

* Clarify logs; remove unrelated changes

Signed-off-by: Spike Curtis <spike@coder.com>

---------

Signed-off-by: Spike Curtis <spike@coder.com>
Co-authored-by: Mathias Fredriksson <mafredri@gmail.com>
2023-04-27 13:59:01 +04:00
Colin Adler 3eb7f06bf1
feat(agent): add http debug routes for magicsock (#7287) 2023-04-26 13:01:49 -05:00
Spike Curtis 29cbc5404a Reconnecting PTY waits for command output or EOF (#7279)
Signed-off-by: Spike Curtis <spike@coder.com>
2023-04-26 09:02:06 +04:00
Spike Curtis daee91c6dc
refactor: PTY & SSH (#7100)
* Add ssh tests for longoutput, orphan

Signed-off-by: Spike Curtis <spike@coder.com>

* PTY/SSH tests & improvements

Signed-off-by: Spike Curtis <spike@coder.com>

* Fix some tests

Signed-off-by: Spike Curtis <spike@coder.com>

* Fix linting

Signed-off-by: Spike Curtis <spike@coder.com>

* fmt

Signed-off-by: Spike Curtis <spike@coder.com>

* Fix windows test

Signed-off-by: Spike Curtis <spike@coder.com>

* Windows copy test

Signed-off-by: Spike Curtis <spike@coder.com>

* WIP Windows pty handling

Signed-off-by: Spike Curtis <spike@coder.com>

* Fix truncation tests

Signed-off-by: Spike Curtis <spike@coder.com>

* Appease linter/fmt

Signed-off-by: Spike Curtis <spike@coder.com>

* Fix typo

Signed-off-by: Spike Curtis <spike@coder.com>

* Rework truncation test to not assume OS buffers

Signed-off-by: Spike Curtis <spike@coder.com>

* Disable orphan test on Windows --- uses sh

Signed-off-by: Spike Curtis <spike@coder.com>

* agent_test running SSH in pty use ptytest.Start

Signed-off-by: Spike Curtis <spike@coder.com>

* More detail about closing pseudoconsole on windows

Signed-off-by: Spike Curtis <spike@coder.com>

* Code review fixes

Signed-off-by: Spike Curtis <spike@coder.com>

* Rearrange ptytest method order

Signed-off-by: Spike Curtis <spike@coder.com>

* Protect pty.Resize on windows from races

Signed-off-by: Spike Curtis <spike@coder.com>

* Fix windows bugs

Signed-off-by: Spike Curtis <spike@coder.com>

* PTY doesn't extend PTYCmd

Signed-off-by: Spike Curtis <spike@coder.com>

* Fix windows types

Signed-off-by: Spike Curtis <spike@coder.com>

---------

Signed-off-by: Spike Curtis <spike@coder.com>
2023-04-24 14:53:57 +04:00
Mathias Fredriksson 712098fa2b
test(agent): Increase the time to wait for agent reachable (#7245) 2023-04-21 19:40:17 +00:00
Kyle Carberry fd84df769d
fix: add `DISPLAY` env var for X11 connections (#7248)
* fix: add `DISPLAY` env var for X11 connections

* Update agent/agentssh/agentssh.go

Co-authored-by: Mathias Fredriksson <mafredri@gmail.com>

---------

Co-authored-by: Mathias Fredriksson <mafredri@gmail.com>
2023-04-21 16:43:49 +00:00
Kyle Carberry f39e6a79de
feat: add support for X11 forwarding (#7205)
* feat: add support for X11 forwarding

* Only run X forwarding on Linux

* Fix piping

* Fix comments
2023-04-21 15:52:40 +00:00
Mathias Fredriksson 300ae4a6bf
test(agent): Fix TestAgent_UnixRemoteForwarding timeout (#7235) 2023-04-21 01:35:51 +03:00
Mathias Fredriksson b3b26a62f2
test(agent/reaper): Fix restructure issue (#7168)
In #7164 we accidentally removed the "in CI" check, this fixes it.
2023-04-17 17:39:10 +00:00
Mathias Fredriksson bf0fed4f3f
chore: Update pion/udp and improve parallel/non-parallel tests (#7164)
* test(all): Improve and fix subtests with parallell/nonparallel parents

* chore: Update pion/udp to fix buffer close
2023-04-17 20:23:10 +03:00
Muhammad Atif Ali bb43713d38
fix: VSCode desktop connection (#7120)
Co-authored-by: Mathias Fredriksson <mafredri@gmail.com>
2023-04-14 17:32:18 +03:00
Ammar Bandukwala 24d8644c0b
chore: de-flake TestWorkspaceAgent_Metadata (round 2) (#7039)
This time, we keep the timing / "racey" tests, but avoid running
them in the harsher CI conditions.
2023-04-06 21:10:13 +00:00
Mathias Fredriksson aa660e0631
feat(agentssh): Gracefully close SSH sessions on Close (#7027)
By tracking and closing sessions manually before closing the underlying
connections, we ensure that the termination is propagated to SSH/SFTP
clients and they're not left waiting for a connection timeout.

Refs: #6177
2023-04-06 19:57:30 +03:00
Mathias Fredriksson 0224426e5b
refactor(agent): Move SSH server into agentssh package (#7004)
Refs: #6177
2023-04-06 19:39:22 +03:00
Mathias Fredriksson 121c2bcde8
test(agent): Fix tests without cmd.Wait() (#7029) 2023-04-06 16:45:53 +03:00
Kyle Carberry bc18f6c113
fix: add `CODER_AGENT_TAILNET_LISTEN_PORT` for specifying a static tailnet port (#6980)
Fixes #5175.
2023-04-03 16:20:19 +00:00
Ammar Bandukwala 34debbf837
fix(agent): prevent goroutine pile up in reportMetadataLoop (#6957) 2023-04-01 16:34:42 -05:00
Ammar Bandukwala ca4fa81570
feat: add agent metadata (#6614) 2023-03-31 15:26:19 -05:00
Kyle Carberry 04e404e448
chore: dial the remote socket continually until connect (#6891)
It's possible that the command starts but the socket isn't ready
even when the file exists.
2023-03-30 15:36:23 +00:00
Mathias Fredriksson 90d18dd2e5
fix(agent): Close stdin and stdout separately to fix pty output loss (#6862)
Fixes #6656
Closes #6840
2023-03-29 21:58:38 +03:00
Mathias Fredriksson 891bbda995
fix(agent): More protection for lost output of SSH PTY commands (#6833)
Fixes #6656 (part 2)
2023-03-28 09:11:15 +00:00
Mathias Fredriksson 76bdde7f1b
fix(agent): Prevent SSH TTYs from losing command output on exit (#6777) 2023-03-24 18:23:41 +00:00
Kyle Carberry cb7375450b
feat: add startup script logs to the ui (#6558)
* Add startup script logs to the database

* Add coderd endpoints for startup script logs

* Push startup script logs from agent

* Pull startup script logs on frontend

* Rename queries

* Add constraint

* Start creating log sending loop

* Add log sending to the agent

* Add tests for streaming logs

* Shorten notify channel name

* Add FE

* Improve bulk log performance

* Finish UI display

* Fix startup log visibility

* Add warning for overflow

* Fix agent queue logs overflow

* Display staartup logs in a virtual DOM for performance

* Fix agent queue with loads of logs

* Fix authorize test

* Remove faulty test

* Fix startup and shutdown reporting error

* Fix gen

* Fix comments

* Periodically purge old database entries

* Add test fixture for migration

* Add Storybook

* Check if there are logs when displaying features

* Fix startup component overflow gap

* Fix startup log wrapping

---------

Co-authored-by: Asher <ash@coder.com>
2023-03-23 14:09:13 -05:00
Benjamin Sejas 7076dee522
feat(agent): Add SSH max timeout option for coder agent (#6596)
* feat(agent): Add SSH max timeout option for coder agent

* Fix lint and update test golden snapshot
2023-03-15 09:08:50 -05:00
Steven Masley 2abae42cec
feat: Ignore agent pprof port in listening ports (#6515)
* feat: Ignore agent pprof port in listening ports
2023-03-09 10:53:00 -06:00
Kyle Carberry 5304b4e483
feat: add connection statistics for workspace agents (#6469)
* fix: don't make session counts cumulative

This made for some weird tracking... we want the point-in-time
number of counts!

* Add databasefake query for getting agent stats

* Add deployment stats endpoint

* The query... works?!?

* Fix aggregation query

* Select from multiple tables instead

* Fix continuous stats

* Increase period of stat refreshes

* Add workspace counts to deployment stats

* fmt

* Add a slight bit of responsiveness

* Fix template version editor overflow

* Add refresh button

* Fix font family on button

* Fix latest stat being reported

* Revert agent conn stats

* Fix linting error

* Fix tests

* Fix gen

* Fix migrations

* Block on sending stat updates

* Add test fixtures

* Fix response structure

* make gen
2023-03-08 21:05:45 -06:00
Kyle Carberry 29ced72cda
chore: fix stats leaking in tests (#6478)
See https://github.com/coder/coder/actions/runs/4350254306/jobs/7601134509
2023-03-07 04:09:02 +00:00
Mathias Fredriksson 22e3ff96be
feat(agent): Add shutdown lifecycle states and shutdown_script support (#6139)
* feat(api): Add agent shutdown lifecycle states

* feat(agent): Add shutdown_script support

* feat(agent): Add shutdown_script timeout

* feat(site): Support new agent lifecycle states

---

Co-authored-by: Marcin Tojek <marcin@coder.com>
2023-03-06 21:34:00 +02:00
Kyle Carberry 2ff1c6d613
feat: add agent stats for different connection types (#6412)
This allows us to track when our extensions are used, when the
web terminal is used, and average connection latency to the agent.
2023-03-02 08:06:00 -06:00
Kyle Carberry 05e449943d
chore: convert agent stats to use a table (#6374)
* chore: convert workspace agent stats from json to table

* chore: convert agent stats to use a table

Backwards compatibility becomes hard when all agent stats are in a JSON blob.
We also want to query this table for new agents that are failing health checks
so we can display it in the UI.

* Fix migration using default values
2023-02-28 13:33:33 -06:00
Kyle Carberry 3d8b77d6f1
chore: improve clarity of the agent logs (#6345)
I looked through these logs when debugging and there was a bit of spam!
2023-02-27 09:20:24 -06:00
Mathias Fredriksson 677721e4a1
fix(tailnet): Skip nodes without DERP, avoid use of RemoveAllPeers (#6320)
* fix(tailnet): Skip nodes without DERP, avoid use of RemoveAllPeers
2023-02-24 18:16:29 +02:00
Ammar Bandukwala f05609b4da
chore: format Go more aggressively 2023-02-18 18:32:09 -06:00
Mathias Fredriksson 41ae01d2e9
fix: Improve closure of provisioner and agent tailnet dial (#6199) 2023-02-14 14:57:48 +00:00
Colin Adler 4432cd08d6
chore: update tailscale (#6091) 2023-02-09 21:43:18 -06:00
Mathias Fredriksson 6f3f7f2937
fix(agent): Allow signal propagation when running as PID 1 (#6141) 2023-02-09 23:07:21 +02:00
Kyle Carberry 691495d761
feat: add `expanded_directory` to the agent for extension support (#6087)
This will enable opening the default `dir` of an agent in
the VS Code extension!
2023-02-07 21:35:09 +00:00
Kyle Carberry f6effdb63e
fix: redirect the user to the home directory if dir is not set (#6085)
This was blocking SSH connections from being established if a dir
that wasn't created yet is set.
2023-02-07 20:28:41 +00:00
Kyle Carberry 2c2bbcc019
chore: update tests to support fish (#6023)
* fix: update tests to add fish support

* Track connections for SSH sessions to prevent leaks

* Revert SSH conn handling
2023-02-03 12:25:11 -06:00
Mathias Fredriksson e6f5623627
chore: Rename agent statistics server to http api server (#5961) 2023-02-01 20:05:57 +02:00
Mathias Fredriksson f4d6afb01d
feat(agent): Allow specifying log directory via flag or env (#5915) 2023-01-30 18:39:52 +02:00
Kyle Carberry 0d08065488
fix: use a waitgroup to ensure all connections are cleaned up in agent (#5910)
* fix: use a waitgroup to ensure all connections are cleaned up in agent

There was a race where connections would be created at the same time as close.
The `net.Conn` produced by Tailscale doesn't close then the listener does.

* Remove accidental test
2023-01-29 17:20:30 -06:00
Kyle Carberry 7ad87505c8
chore: move agent functions from `codersdk` into `agentsdk` (#5903)
* chore: rename `AgentConn` to `WorkspaceAgentConn`

The codersdk was becoming bloated with consts for the workspace
agent that made no sense to a reader. `Tailnet*` is an example
of these consts.

* chore: remove `Get` prefix from *Client functions

* chore: remove `BypassRatelimits` option in `codersdk.Client`

It feels wrong to have this as a direct option because it's so infrequently
needed by API callers. It's better to directly modify headers in the two
places that we actually use it.

* Merge `appearance.go` and `buildinfo.go` into `deployment.go`

* Merge `experiments.go` and `features.go` into `deployment.go`

* Fix `make gen` referencing old type names

* Merge `error.go` into `client.go`

`codersdk.Response` lived in `error.go`, which is wrong.

* chore: refactor workspace agent functions into agentsdk

It was odd conflating the codersdk that clients should use
with functions that only the agent should use. This separates
them into two SDKs that are closely coupled, but separate.

* Merge `insights.go` into `deployment.go`

* Merge `organizationmember.go` into `organizations.go`

* Merge `quota.go` into `workspaces.go`

* Rename `sse.go` to `serversentevents.go`

* Rename `codersdk.WorkspaceAppHostResponse` to `codersdk.AppHostResponse`

* Format `.vscode/settings.json`

* Fix outdated naming in `api.ts`

* Fix app host response

* Fix unsupported type

* Fix imported type
2023-01-29 15:47:24 -06:00
Colin Adler 1cd5f38cb0
feat: add debug server for tailnet coordinators (#5861)
Implements a Tailscale-like debug server for our in-memory coordinator. This should provide some visibility into why connections could be failing.
Resolves: https://github.com/coder/coder/issues/5845

![image](https://user-images.githubusercontent.com/6332295/214680832-2724d633-2d54-44d6-a7ce-5841e5824ee5.png)
2023-01-25 21:27:36 +00:00
Mathias Fredriksson 138887de7e
feat: Add workspace agent lifecycle state reporting (#5785) 2023-01-24 14:24:27 +02:00
Colin Adler d2ae16dd22
fix: routinely ping agent websocket to ensure liveness (#5824) 2023-01-23 20:05:29 +00:00
Cian Johnston 73afdd7c09
chore: agent_test.go: use ptty.Peek() instead of expecting caret in TestAgent_SessionTTYShell (#5821) 2023-01-23 11:23:25 +00:00
Kyle Carberry 9f6edab53b
feat: replace vscodeipc with vscodessh (#5645)
The VS Code extension has been refactored to use VS Code
Remote SSH instead of using the private API.

This changes the structure to continue using SSH, but
output network information periodically to a file.
2023-01-10 04:23:17 +00:00
Dean Sheather f1fe2b5c06
feat: add GPG forwarding to coder ssh (#5482) 2023-01-06 07:52:19 +00:00
Dean Sheather 1bc4eb5329
fix: fix security vulnerabilities reported by CodeQL (#5467) 2022-12-19 19:25:59 +00:00
Kyle Carberry d170d27e80
feat: add `external` property to `coder_app` (#5425)
* Add schema

* feat: add `external` property to `coder_app`

This allows exposing applications that open an external URL.
2022-12-14 15:54:18 -06:00
Mathias Fredriksson 012a9e759e
fix: Avoid deadlock in AgentReportStats Close during agent Close (#5415)
Since AgentReportStats takes a stats function which was doing mutex
locking on agent shutdown, it was possible for there to be a deadlock
depending on how the AgentReportsStats Close function is implemented.

This mostly seems to happen on Windows test runners as it's pretty hard
to hit this edge case. The bug currently only exists in the test
implementation of AgentReportStats, however, this was refactored to be
more robust in case of future changes.
2022-12-14 18:45:46 +02:00
Mathias Fredriksson 4fc4c01cea
fix: Enable reconnectingpty loadtest and fix/improve logging (#5403)
* fix: Enable reconnectingpty loadtest and fix/improve logging

This commit re-enabled reconnectingpty loadtests after a logging
refactor of `(*agent).handleReconnectingPTY`. The reasons the tests were
flaking was that `logger.Error` was being called and `slogtest` failing
the test.

We could have set the option for `slogtest` to disable failing, but that
could hide real issues. The current approach improves reconnectingpty
logging overall and provides more insight into what's happening. It's
expected that reconnectingpty sessions fail after the agent is closed,
so calling `logger.Error` at that point is not wanted.

Ref: #5322
2022-12-13 21:28:07 +02:00
Mathias Fredriksson 760419a965
chore: Refactor agent tests to avoid `t.Run` when not needed (#5376)
It turns out that writing tests that contain subtests should probably be
limited to table-based tests and tests that share a common setup shared
between tests.

Writing tests with a subtest like this:

```
func TestSomething(t *testing.T) {
	t.Run("Subtest", func(t *testing.t) {})
}
```

Has the following disadvantages:

- It can lead to multiple tests failing with `(unknown)` status when
  only one of the subtests hang (never exit)
- In Go 1.20rc1, using `t.Setenv` is no longer allowed if the parent
  test is parallel
2022-12-12 22:20:46 +02:00
Mathias Fredriksson 88bb901283
fix: Close tailnet if agent is closed during creation (#5375) 2022-12-12 11:26:49 +00:00
Mathias Fredriksson 05130db571
fix: Improve closing of services in agent tests (#5355) 2022-12-09 12:22:27 +02:00
Marcin Tojek d3200382f6
fix: agent panics on closed network (#5295)
* fix: agent panics on closed network

* Remove a.network = network

* Fix

* Fix

* Fix
2022-12-05 23:18:23 +01:00
Mathias Fredriksson fa641554e8
fix: Improve agent connection tracking when agent is closed (#5253) 2022-12-02 16:24:40 +02:00
Mathias Fredriksson eff99f78fa
feat: Add support for MOTD file in coder agents (#5147) 2022-11-24 12:22:20 +00:00
Colin Adler ae38bbeab6
chore: refactor agent stats streaming (#5112) 2022-11-18 16:46:53 -06:00
Dean Sheather 69e8c9e7b4
feat: add reconnectingpty loadtest (#5083) 2022-11-17 16:57:15 +00:00
Mathias Fredriksson e72927f3ab
fix: Avoid running shell twice in coder agent (#5061)
The users login shell would be executed as:

	/bin/bash -c '/bin/bash -l'

This simplifies the command for login shells so that the executed
command is:

	/bin/bash -l
2022-11-14 14:01:22 +02:00
Mathias Fredriksson c515085450
fix: Unify context usage for agent cmd and logs (#5059) 2022-11-14 13:48:44 +02:00
Geoffrey Huntley cbb1e91372
feat(windows): default to PowerShell v7 over v6 and fallback to cmd.exe (#5053) 2022-11-14 15:43:40 +10:00
Mathias Fredriksson f017548a9c
fix: Return correct exit code for SFTP sessions (#5044)
Fixes #5038
2022-11-13 23:22:50 +02:00
Ammar Bandukwala 73f91e4690
ci: use big runners (#4990)
* chore: Close idle connections on test cleanup

It's possible that this was the source of a leak on Windows...

* ci: use big runners

* fix: Improve tailnet connections by reducing timeouts

This awaits connection ping before running a dial. Before,
we were hitting the TCP retransmission and handshake timeouts,
which could intermittently add 1 or 5 seconds to a connection
being initialized.

* Add logging to Startupscript test

* Add better logging

* Write startup script logs to fs dir

* Fix startup script test

* Fix startup script test

* Reduce test timeout

* Use central tmp dir in agent

* Adjust output

* Skip startup script test on Windows

Co-authored-by: Kyle Carberry <kyle@carberry.com>
2022-11-13 14:23:23 -06:00
Kyle Carberry 82f494c99c
fix: Improve tailnet connections by reducing timeouts (#5043)
* fix: Improve tailnet connections by reducing timeouts

This awaits connection ping before running a dial. Before,
we were hitting the TCP retransmission and handshake timeouts,
which could intermittently add 1 or 5 seconds to a connection
being initialized.

* Update Tailscale
2022-11-13 11:33:05 -06:00
Kyle Carberry 16e9b1eb1a
fix: Add timeouts to every tailnet ping (#4986)
A ping isn't guaranteed to deliver, so these need to have a
tight timeout for tests to not flake.
2022-11-09 20:12:51 +00:00
Dean Sheather d82364b9b5
feat: make trace provider in loadtest, add tracing to sdk (#4939) 2022-11-09 08:10:48 +10:00
Kyle Carberry 165b6fbc6a
fix: Use app slugs instead of the display name to report health (#4944)
All applications without display names were reporting broken health.
2022-11-07 23:35:01 +00:00
Kyle Carberry 8e743d28c8
fix: Use instance identity session token for git subcommands (#4884)
This broke using gitssh with instance identity!
2022-11-04 09:44:36 -07:00
Kyle Carberry 104d6608d9
feat: Add `VSCODE_PROXY_URI` to surface code-server ports (#4798)
* feat: Add `VSCODE_PROXY_URI` to surface code-server ports

Fixes #4776.

* Check if app host is provided
2022-11-04 04:45:43 +00:00
Ben Potter 9b76b10206
chore: hide Coder message on code-server's "Getting Started" page (#4847) 2022-11-03 11:04:27 -05:00
Dean Sheather 10df2fd4fb
feat: add new required slug property to coder_app, use in URLs (#4573) 2022-10-28 17:41:31 +00:00
Kyle Carberry b34a67e6cb
fix: Allow custom Git OAuth URLs (#4758)
Fixes an issue reported in Discord where custom endpoints
weren't working.
2022-10-27 10:38:05 -07:00
Mathias Fredriksson a0bdb4fca2
fix: Remove pkg/sftp fork, fix SFTP test (#4759) 2022-10-26 16:02:06 +03:00
Kyle Carberry eec406b739
feat: Add Git auth for GitHub, GitLab, Azure DevOps, and BitBucket (#4670)
* Add scaffolding

* Move migration

* Add endpoints for gitauth

* Add configuration files and tests!

* Update typesgen

* Convert configuration format for git auth

* Fix unclosed database conn

* Add overriding VS Code configuration

* Fix Git screen

* Write VS Code special configuration if providers exist

* Enable automatic cloning from VS Code

* Add tests for gitaskpass

* Fix feature visibiliy

* Add banner for too many configurations

* Fix update loop for oauth token

* Jon comments

* Add deployment config page
2022-10-24 19:46:24 -05:00
Kyle Carberry bf3224e373
fix: Refactor agent to consume API client (#4715)
* fix: Refactor agent to consume API client

This simplifies a lot of code by creating an interface for
the codersdk client into the agent. It also moves agent
authentication code so instance identity will work between
restarts.

Fixes #3485 and #4082.

* Fix client reconnections
2022-10-23 22:35:08 -05:00
Mathias Fredriksson 173b7a2c83
fix: Start SFTP sessions in user home (working directory) (#4549)
* fix: Start SFTP sessions in user home (working directory)

This commit switches to our fork of `pkg/sftp` which includes a Server
option for changing the current working directory.

Attempt to upstream: https://github.com/pkg/sftp/pull/528

Supercedes and closes #4420

Fixes #3620

* Update fork
2022-10-21 09:54:06 -05:00
Colin Adler 0a5e5544b1
fix: `time.NewTicker` leaks (#4630) 2022-10-18 15:26:21 -05:00
Colin Adler 29acd25b4e
fix: chrome requests hanging over port-forward (#4588) 2022-10-17 11:45:29 -05:00
Kyle Carberry 2ba4a62a0d
feat: Add high availability for multiple replicas (#4555)
* feat: HA tailnet coordinator

* fixup! feat: HA tailnet coordinator

* fixup! feat: HA tailnet coordinator

* remove printlns

* close all connections on coordinator

* impelement high availability feature

* fixup! impelement high availability feature

* fixup! impelement high availability feature

* fixup! impelement high availability feature

* fixup! impelement high availability feature

* Add replicas

* Add DERP meshing to arbitrary addresses

* Move packages to highavailability folder

* Move coordinator to high availability package

* Add flags for HA

* Rename to replicasync

* Denest packages for replicas

* Add test for multiple replicas

* Fix coordination test

* Add HA to the helm chart

* Rename function pointer

* Add warnings for HA

* Add the ability to block endpoints

* Add flag to disable P2P connections

* Wow, I made the tests pass

* Add replicas endpoint

* Ensure close kills replica

* Update sql

* Add database latency to high availability

* Pipe TLS to DERP mesh

* Fix DERP mesh with TLS

* Add tests for TLS

* Fix replica sync TLS

* Fix RootCA for replica meshing

* Remove ID from replicasync

* Fix getting certificates for meshing

* Remove excessive locking

* Fix linting

* Store mesh key in the database

* Fix replica key for tests

* Fix types gen

* Fix unlocking unlocked

* Fix race in tests

* Update enterprise/derpmesh/derpmesh.go

Co-authored-by: Colin Adler <colin1adler@gmail.com>

* Rename to syncReplicas

* Reuse http client

* Delete old replicas on a CRON

* Fix race condition in connection tests

* Fix linting

* Fix nil type

* Move pubsub to in-memory for twenty test

* Add comment for configuration tweaking

* Fix leak with transport

* Fix close leak in derpmesh

* Fix race when creating server

* Remove handler update

* Skip test on Windows

* Fix DERP mesh test

* Wrap HTTP handler replacement in mutex

* Fix error message for relay

* Fix API handler for normal tests

* Fix speedtest

* Fix replica resend

* Fix derpmesh send

* Ping async

* Increase wait time of template version jobd

* Fix race when closing replica sync

* Add name to client

* Log the derpmap being used

* Don't connect if DERP is empty

* Improve agent coordinator logging

* Fix lock in coordinator

* Fix relay addr

* Fix race when updating durations

* Fix client publish race

* Run pubsub loop in a queue

* Store agent nodes in order

* Fix coordinator locking

* Check for closed pipe

Co-authored-by: Colin Adler <colin1adler@gmail.com>
2022-10-17 13:43:30 +00:00
Dean Sheather 29a2fe46e8
fix: fix builds on windows_arm64 (#4388) 2022-10-06 23:42:58 +10:00
Dean Sheather 1386465631
feat: add endpoint to get listening ports in agent (#4260) 2022-10-06 22:38:22 +10:00
Garrett Delfosse 20bcb04e8a
fix: use correct interval for healthcheck loop (#4212) 2022-09-26 21:00:58 +00:00
Kyle Carberry 39cf329404
fix: Replace access URL for built-in DERP servers (#4197)
Fixes #4195.
2022-09-26 12:56:04 -05:00
Garrett Delfosse 4c8be34d81
feat: add health check monitoring to workspace apps (#4114) 2022-09-23 15:51:04 -04:00
Mathias Fredriksson 6b365f46f5
fix: Ensure coordinator is closed and freed in agent (#4164)
* fix: Close coordinator on context cancellation

* fix: Refactor runCoordinator so that previous is closed/freed
2022-09-23 18:08:13 +03:00
Kyle Carberry a7ee8b31e0
fix: Don't use StatusAbnormalClosure (#4155) 2022-09-22 18:26:05 +00:00
Kyle Carberry 714c366d16
chore: Remove WebRTC networking (#3881)
* chore: Remove WebRTC networking

* Fix race condition

* Fix WebSocket not closing
2022-09-19 19:46:29 -05:00
Kyle Carberry 0f8c2f592e
feat: Use Tailscale networking by default (#4003)
* feat: Use Tailscale networking by default

Removal of WebRTC code will happen in another PR, but it
felt dangerious to default and remove in a single commit.

Ideally, we can release this version and collect final
thoughts and  feedback before a full commitment.

* Remove UNIX forwarding

Tailscale doesn't support this, and adding support
for it shouldn't block our rollout. Customers can
always forward over SSH.

* Update cli/portforward_test.go

Co-authored-by: Dean Sheather <dean@deansheather.com>

Co-authored-by: Dean Sheather <dean@deansheather.com>
2022-09-13 15:55:56 -05:00
Mathias Fredriksson 09da3858ce
fix: Terminal emulation used by SSH sessions (#3473)
Fixes #3371
2022-09-12 19:27:51 +03:00
Kyle Carberry 00104096c2
fix: Resolve CI flakes for tailnet agent (#3924) 2022-09-07 09:24:58 -05:00
Kyle Carberry 80352656e9
fix: Improve speedtest by adding direct connection toggle (#3919)
It's weird to test connection speeds over DERP, because most
connections will eventually migrate to direct.
2022-09-07 03:21:08 +00:00
Kyle Carberry 1254e7a902
feat: Add speedtest command for tailnet (#3874) 2022-09-05 17:15:49 -05:00
Mathias Fredriksson 3ca6f1fcd4
fix: Prevent nil pointer deref in reconnectingPTY (#3871)
Related #3870
2022-09-05 16:45:10 +03:00
Kyle Carberry de219d966d
fix: Run Tailnet SSH connections in a goroutine (#3838)
This was causing SSH connections in parallel to fail 🤦!
2022-09-02 11:58:15 -05:00
Ammar Bandukwala 04b03792cb
feat: add last used to Workspaces page (#3816) 2022-09-02 00:08:51 +00:00
Ammar Bandukwala 30f8fd9b95
Daily Active User Metrics (#3735)
* agent: add StatsReporter

* Stabilize protoc
2022-09-01 14:58:23 -05:00
Kyle Carberry 9bd83e5ec7
feat: Add Tailscale networking (#3505)
* fix: Add coder user to docker group on installation

This makes for a simpler setup, and reduces the likelihood
a user runs into a strange issue.

* Add wgnet

* Add ping

* Add listening

* Finish refactor to make this work

* Add interface for swapping

* Fix conncache with interface

* chore: update gvisor

* fix tailscale types

* linting

* more linting

* Add coordinator

* Add coordinator tests

* Fix coordination

* It compiles!

* Move all connection negotiation in-memory

* Migrate coordinator to use net.conn

* Add closed func

* Fix close listener func

* Make reconnecting PTY work

* Fix reconnecting PTY

* Update CI to Go 1.19

* Add CLI flags for DERP mapping

* Fix Tailnet test

* Rename ConnCoordinator to TailnetCoordinator

* Remove print statement from workspace agent test

* Refactor wsconncache to use tailnet

* Remove STUN from unit tests

* Add migrate back to dump

* chore: Upgrade to Go 1.19

This is required as part of #3505.

* Fix reconnecting PTY tests

* fix: update wireguard-go to fix devtunnel

* fix migration numbers

* linting

* Return early for status if endpoints are empty

* Update cli/server.go

Co-authored-by: Colin Adler <colin1adler@gmail.com>

* Update cli/server.go

Co-authored-by: Colin Adler <colin1adler@gmail.com>

* Fix frontend entites

* Fix agent bicopy

* Fix race condition for the last node

* Fix down migration

* Fix connection RBAC

* Fix migration numbers

* Fix forwarding TCP to a local port

* Implement ping for tailnet

* Rename to ForceHTTP

* Add external derpmapping

* Expose DERP region names to the API

* Add global option to enable Tailscale networking for web

* Mark DERP flags hidden while testing

* Update DERP map on reconnect

* Add close func to workspace agents

* Fix race condition in upstream dependency

* Fix feature columns race condition

Co-authored-by: Colin Adler <colin1adler@gmail.com>
2022-08-31 20:09:44 -05:00
Mathias Fredriksson e44f7adb7e
feat: Set SSH env vars: `SSH_CLIENT`, `SSH_CONNECTION` and `SSH_TTY` (#3622)
Fixes #2339
2022-08-23 21:19:57 +03:00
Mathias Fredriksson 5025fe2fa0
fix: Protect circular buffer during close in reconnectingPTY (#3646) 2022-08-23 16:07:31 +00:00
Mathias Fredriksson f7ccfa2ab9
feat: Set `CODER=true` in workspaces (#3637)
Fixes #2340
2022-08-23 14:29:01 +03:00
Kyle Carberry b0fe9bcdd1
chore: Upgrade to Go 1.19 (#3617)
This is required as part of #3505.
2022-08-21 22:32:53 +00:00
Kyle Carberry 16c12e976e
chore: Improve agent logging (#3483) 2022-08-12 07:01:00 -05:00
Ammar Bandukwala 19fcf60864
ci: add typo detection (#3327)
And fix them.
2022-08-01 09:29:52 -04:00
Mathias Fredriksson 4730c589fe
chore: Use standardized test timeouts and delays (#3291) 2022-08-01 15:45:05 +03:00
Mathias Fredriksson 29d44b6283
fix: Guard pty window resize after close (#3270)
Could help alleviate #3236.
2022-07-28 19:07:11 +00:00
Spike Curtis 36ffdce065
Return proper exit code on ssh with TTY (#3192)
* Return proper exit code on ssh with TTY

Signed-off-by: Spike Curtis <spike@coder.com>

* Fix revive lint

Signed-off-by: Spike Curtis <spike@coder.com>

* Fix Windows exit code for missing command

Signed-off-by: Spike Curtis <spike@coder.com>

* Fix close error handling on agent TTY

Signed-off-by: Spike Curtis <spike@coder.com>
2022-07-27 14:23:28 -05:00
Mathias Fredriksson 0b86c8047c
fix: Close connections in agent tests (#3196) 2022-07-26 13:24:54 +03:00
Mathias Fredriksson 51dd1fde3b
fix: Remove use of `require` in `require.Eventually` in tests (#3110)
* fix: Remove use of `require` in `require.Eventually` in tests

Because require uses `t.FailNow()` and `require.Eventually` runs the
function in a goroutine, which is not allowed.

* feat: Add ruleguard for require.Eventually

Co-authored-by: Cian Johnston <cian@coder.com>
2022-07-22 20:02:49 +03:00