Commit Graph

124 Commits

Author SHA1 Message Date
Mathias Fredriksson e6f5623627
chore: Rename agent statistics server to http api server (#5961) 2023-02-01 20:05:57 +02:00
Mathias Fredriksson f4d6afb01d
feat(agent): Allow specifying log directory via flag or env (#5915) 2023-01-30 18:39:52 +02:00
Kyle Carberry 0d08065488
fix: use a waitgroup to ensure all connections are cleaned up in agent (#5910)
* fix: use a waitgroup to ensure all connections are cleaned up in agent

There was a race where connections would be created at the same time as close.
The `net.Conn` produced by Tailscale doesn't close when the listener does.

* Remove accidental test
2023-01-29 17:20:30 -06:00
Kyle Carberry 7ad87505c8
chore: move agent functions from `codersdk` into `agentsdk` (#5903)
* chore: rename `AgentConn` to `WorkspaceAgentConn`

The codersdk was becoming bloated with consts for the workspace
agent that made no sense to a reader. `Tailnet*` is an example
of these consts.

* chore: remove `Get` prefix from *Client functions

* chore: remove `BypassRatelimits` option in `codersdk.Client`

It feels wrong to have this as a direct option because it's so infrequently
needed by API callers. It's better to directly modify headers in the two
places that we actually use it.

* Merge `appearance.go` and `buildinfo.go` into `deployment.go`

* Merge `experiments.go` and `features.go` into `deployment.go`

* Fix `make gen` referencing old type names

* Merge `error.go` into `client.go`

`codersdk.Response` lived in `error.go`, which is wrong.

* chore: refactor workspace agent functions into agentsdk

It was odd conflating the codersdk that clients should use
with functions that only the agent should use. This separates
them into two SDKs that are closely coupled, but separate.

* Merge `insights.go` into `deployment.go`

* Merge `organizationmember.go` into `organizations.go`

* Merge `quota.go` into `workspaces.go`

* Rename `sse.go` to `serversentevents.go`

* Rename `codersdk.WorkspaceAppHostResponse` to `codersdk.AppHostResponse`

* Format `.vscode/settings.json`

* Fix outdated naming in `api.ts`

* Fix app host response

* Fix unsupported type

* Fix imported type
2023-01-29 15:47:24 -06:00
Colin Adler 1cd5f38cb0
feat: add debug server for tailnet coordinators (#5861)
Implements a Tailscale-like debug server for our in-memory coordinator. This should provide some visibility into why connections could be failing.
Resolves: https://github.com/coder/coder/issues/5845

![image](https://user-images.githubusercontent.com/6332295/214680832-2724d633-2d54-44d6-a7ce-5841e5824ee5.png)
2023-01-25 21:27:36 +00:00
Mathias Fredriksson 138887de7e
feat: Add workspace agent lifecycle state reporting (#5785) 2023-01-24 14:24:27 +02:00
Colin Adler d2ae16dd22
fix: routinely ping agent websocket to ensure liveness (#5824) 2023-01-23 20:05:29 +00:00
Cian Johnston 73afdd7c09
chore: agent_test.go: use ptty.Peek() instead of expecting caret in TestAgent_SessionTTYShell (#5821) 2023-01-23 11:23:25 +00:00
Kyle Carberry 9f6edab53b
feat: replace vscodeipc with vscodessh (#5645)
The VS Code extension has been refactored to use VS Code
Remote SSH instead of using the private API.

This changes the structure to continue using SSH, but
output network information periodically to a file.
2023-01-10 04:23:17 +00:00
Dean Sheather f1fe2b5c06
feat: add GPG forwarding to coder ssh (#5482) 2023-01-06 07:52:19 +00:00
Dean Sheather 1bc4eb5329
fix: fix security vulnerabilities reported by CodeQL (#5467) 2022-12-19 19:25:59 +00:00
Kyle Carberry d170d27e80
feat: add `external` property to `coder_app` (#5425)
* Add schema

* feat: add `external` property to `coder_app`

This allows exposing applications that open an external URL.
2022-12-14 15:54:18 -06:00
Mathias Fredriksson 012a9e759e
fix: Avoid deadlock in AgentReportStats Close during agent Close (#5415)
Since AgentReportStats takes a stats function which was doing mutex
locking on agent shutdown, it was possible for there to be a deadlock
depending on how the AgentReportStats Close function is implemented.

This mostly seems to happen on Windows test runners as it's pretty hard
to hit this edge case. The bug currently only exists in the test
implementation of AgentReportStats, however, this was refactored to be
more robust in case of future changes.
2022-12-14 18:45:46 +02:00
Mathias Fredriksson 4fc4c01cea
fix: Enable reconnectingpty loadtest and fix/improve logging (#5403)
* fix: Enable reconnectingpty loadtest and fix/improve logging

This commit re-enabled reconnectingpty loadtests after a logging
refactor of `(*agent).handleReconnectingPTY`. The tests were flaking
because `logger.Error` was being called, which causes `slogtest` to fail
the test.

We could have set the option for `slogtest` to disable failing, but that
could hide real issues. The current approach improves reconnectingpty
logging overall and provides more insight into what's happening. It's
expected that reconnectingpty sessions fail after the agent is closed,
so calling `logger.Error` at that point is not wanted.

Ref: #5322
2022-12-13 21:28:07 +02:00
Mathias Fredriksson 760419a965
chore: Refactor agent tests to avoid `t.Run` when not needed (#5376)
It turns out that tests containing subtests should probably be limited
to table-based tests and tests whose subtests share a common setup.

Writing tests with a subtest like this:

```
func TestSomething(t *testing.T) {
	t.Run("Subtest", func(t *testing.T) {})
}
```

Has the following disadvantages:

- It can lead to multiple tests failing with `(unknown)` status when
  only one of the subtests hangs (never exits)
- In Go 1.20rc1, using `t.Setenv` is no longer allowed if the parent
  test is parallel
2022-12-12 22:20:46 +02:00
Mathias Fredriksson 88bb901283
fix: Close tailnet if agent is closed during creation (#5375) 2022-12-12 11:26:49 +00:00
Mathias Fredriksson 05130db571
fix: Improve closing of services in agent tests (#5355) 2022-12-09 12:22:27 +02:00
Marcin Tojek d3200382f6
fix: agent panics on closed network (#5295)
* fix: agent panics on closed network

* Remove a.network = network

* Fix

* Fix

* Fix
2022-12-05 23:18:23 +01:00
Mathias Fredriksson fa641554e8
fix: Improve agent connection tracking when agent is closed (#5253) 2022-12-02 16:24:40 +02:00
Mathias Fredriksson eff99f78fa
feat: Add support for MOTD file in coder agents (#5147) 2022-11-24 12:22:20 +00:00
Colin Adler ae38bbeab6
chore: refactor agent stats streaming (#5112) 2022-11-18 16:46:53 -06:00
Dean Sheather 69e8c9e7b4
feat: add reconnectingpty loadtest (#5083) 2022-11-17 16:57:15 +00:00
Mathias Fredriksson e72927f3ab
fix: Avoid running shell twice in coder agent (#5061)
The user's login shell would be executed as:

	/bin/bash -c '/bin/bash -l'

This simplifies the command for login shells so that the executed
command is:

	/bin/bash -l
2022-11-14 14:01:22 +02:00
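
The difference described in #5061 above can be illustrated with `os/exec`. This is a sketch only: `wrappedLoginShell` and `directLoginShell` are hypothetical names, and the agent's real command construction is not shown in this log. The first form starts a shell whose only job is to start a second shell; the second starts the login shell directly.

```go
package main

import (
	"fmt"
	"os/exec"
)

// wrappedLoginShell mirrors the old behavior: the shell is asked to run
// itself again as a login shell, so two shell processes are started.
// Hypothetical helper for illustration only.
func wrappedLoginShell(shell string) *exec.Cmd {
	return exec.Command(shell, "-c", shell+" -l")
}

// directLoginShell mirrors the simplified behavior: the login shell is
// executed directly. Hypothetical helper for illustration only.
func directLoginShell(shell string) *exec.Cmd {
	return exec.Command(shell, "-l")
}

func main() {
	// Only the argument vectors are printed; nothing is executed.
	fmt.Println(wrappedLoginShell("/bin/bash").Args)
	fmt.Println(directLoginShell("/bin/bash").Args)
}
```
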
Mathias Fredriksson c515085450
fix: Unify context usage for agent cmd and logs (#5059) 2022-11-14 13:48:44 +02:00
Geoffrey Huntley cbb1e91372
feat(windows): default to PowerShell v7 over v6 and fallback to cmd.exe (#5053) 2022-11-14 15:43:40 +10:00
Mathias Fredriksson f017548a9c
fix: Return correct exit code for SFTP sessions (#5044)
Fixes #5038
2022-11-13 23:22:50 +02:00
Ammar Bandukwala 73f91e4690
ci: use big runners (#4990)
* chore: Close idle connections on test cleanup

It's possible that this was the source of a leak on Windows...

* ci: use big runners

* fix: Improve tailnet connections by reducing timeouts

This awaits connection ping before running a dial. Before,
we were hitting the TCP retransmission and handshake timeouts,
which could intermittently add 1 or 5 seconds to a connection
being initialized.

* Add logging to Startupscript test

* Add better logging

* Write startup script logs to fs dir

* Fix startup script test

* Fix startup script test

* Reduce test timeout

* Use central tmp dir in agent

* Adjust output

* Skip startup script test on Windows

Co-authored-by: Kyle Carberry <kyle@carberry.com>
2022-11-13 14:23:23 -06:00
Kyle Carberry 82f494c99c
fix: Improve tailnet connections by reducing timeouts (#5043)
* fix: Improve tailnet connections by reducing timeouts

This awaits connection ping before running a dial. Before,
we were hitting the TCP retransmission and handshake timeouts,
which could intermittently add 1 or 5 seconds to a connection
being initialized.

* Update Tailscale
2022-11-13 11:33:05 -06:00
Kyle Carberry 16e9b1eb1a
fix: Add timeouts to every tailnet ping (#4986)
A ping isn't guaranteed to be delivered, so each one needs a
tight timeout to keep tests from flaking.
2022-11-09 20:12:51 +00:00
Dean Sheather d82364b9b5
feat: make trace provider in loadtest, add tracing to sdk (#4939) 2022-11-09 08:10:48 +10:00
Kyle Carberry 165b6fbc6a
fix: Use app slugs instead of the display name to report health (#4944)
All applications without display names were reporting broken health.
2022-11-07 23:35:01 +00:00
Kyle Carberry 8e743d28c8
fix: Use instance identity session token for git subcommands (#4884)
This broke using gitssh with instance identity!
2022-11-04 09:44:36 -07:00
Kyle Carberry 104d6608d9
feat: Add `VSCODE_PROXY_URI` to surface code-server ports (#4798)
* feat: Add `VSCODE_PROXY_URI` to surface code-server ports

Fixes #4776.

* Check if app host is provided
2022-11-04 04:45:43 +00:00
Ben Potter 9b76b10206
chore: hide Coder message on code-server's "Getting Started" page (#4847) 2022-11-03 11:04:27 -05:00
Dean Sheather 10df2fd4fb
feat: add new required slug property to coder_app, use in URLs (#4573) 2022-10-28 17:41:31 +00:00
Kyle Carberry b34a67e6cb
fix: Allow custom Git OAuth URLs (#4758)
Fixes an issue reported in Discord where custom endpoints
weren't working.
2022-10-27 10:38:05 -07:00
Mathias Fredriksson a0bdb4fca2
fix: Remove pkg/sftp fork, fix SFTP test (#4759) 2022-10-26 16:02:06 +03:00
Kyle Carberry eec406b739
feat: Add Git auth for GitHub, GitLab, Azure DevOps, and BitBucket (#4670)
* Add scaffolding

* Move migration

* Add endpoints for gitauth

* Add configuration files and tests!

* Update typesgen

* Convert configuration format for git auth

* Fix unclosed database conn

* Add overriding VS Code configuration

* Fix Git screen

* Write VS Code special configuration if providers exist

* Enable automatic cloning from VS Code

* Add tests for gitaskpass

* Fix feature visibility

* Add banner for too many configurations

* Fix update loop for oauth token

* Jon comments

* Add deployment config page
2022-10-24 19:46:24 -05:00
Kyle Carberry bf3224e373
fix: Refactor agent to consume API client (#4715)
* fix: Refactor agent to consume API client

This simplifies a lot of code by creating an interface for
the codersdk client into the agent. It also moves agent
authentication code so instance identity will work between
restarts.

Fixes #3485 and #4082.

* Fix client reconnections
2022-10-23 22:35:08 -05:00
Mathias Fredriksson 173b7a2c83
fix: Start SFTP sessions in user home (working directory) (#4549)
* fix: Start SFTP sessions in user home (working directory)

This commit switches to our fork of `pkg/sftp` which includes a Server
option for changing the current working directory.

Attempt to upstream: https://github.com/pkg/sftp/pull/528

Supersedes and closes #4420

Fixes #3620

* Update fork
2022-10-21 09:54:06 -05:00
Colin Adler 0a5e5544b1
fix: `time.NewTicker` leaks (#4630) 2022-10-18 15:26:21 -05:00
Colin Adler 29acd25b4e
fix: chrome requests hanging over port-forward (#4588) 2022-10-17 11:45:29 -05:00
Kyle Carberry 2ba4a62a0d
feat: Add high availability for multiple replicas (#4555)
* feat: HA tailnet coordinator

* fixup! feat: HA tailnet coordinator

* fixup! feat: HA tailnet coordinator

* remove printlns

* close all connections on coordinator

* implement high availability feature

* fixup! implement high availability feature

* fixup! implement high availability feature

* fixup! implement high availability feature

* fixup! implement high availability feature

* Add replicas

* Add DERP meshing to arbitrary addresses

* Move packages to highavailability folder

* Move coordinator to high availability package

* Add flags for HA

* Rename to replicasync

* Denest packages for replicas

* Add test for multiple replicas

* Fix coordination test

* Add HA to the helm chart

* Rename function pointer

* Add warnings for HA

* Add the ability to block endpoints

* Add flag to disable P2P connections

* Wow, I made the tests pass

* Add replicas endpoint

* Ensure close kills replica

* Update sql

* Add database latency to high availability

* Pipe TLS to DERP mesh

* Fix DERP mesh with TLS

* Add tests for TLS

* Fix replica sync TLS

* Fix RootCA for replica meshing

* Remove ID from replicasync

* Fix getting certificates for meshing

* Remove excessive locking

* Fix linting

* Store mesh key in the database

* Fix replica key for tests

* Fix types gen

* Fix unlocking unlocked

* Fix race in tests

* Update enterprise/derpmesh/derpmesh.go

Co-authored-by: Colin Adler <colin1adler@gmail.com>

* Rename to syncReplicas

* Reuse http client

* Delete old replicas on a CRON

* Fix race condition in connection tests

* Fix linting

* Fix nil type

* Move pubsub to in-memory for twenty test

* Add comment for configuration tweaking

* Fix leak with transport

* Fix close leak in derpmesh

* Fix race when creating server

* Remove handler update

* Skip test on Windows

* Fix DERP mesh test

* Wrap HTTP handler replacement in mutex

* Fix error message for relay

* Fix API handler for normal tests

* Fix speedtest

* Fix replica resend

* Fix derpmesh send

* Ping async

* Increase wait time of template version jobd

* Fix race when closing replica sync

* Add name to client

* Log the derpmap being used

* Don't connect if DERP is empty

* Improve agent coordinator logging

* Fix lock in coordinator

* Fix relay addr

* Fix race when updating durations

* Fix client publish race

* Run pubsub loop in a queue

* Store agent nodes in order

* Fix coordinator locking

* Check for closed pipe

Co-authored-by: Colin Adler <colin1adler@gmail.com>
2022-10-17 13:43:30 +00:00
Dean Sheather 29a2fe46e8
fix: fix builds on windows_arm64 (#4388) 2022-10-06 23:42:58 +10:00
Dean Sheather 1386465631
feat: add endpoint to get listening ports in agent (#4260) 2022-10-06 22:38:22 +10:00
Garrett Delfosse 20bcb04e8a
fix: use correct interval for healthcheck loop (#4212) 2022-09-26 21:00:58 +00:00
Kyle Carberry 39cf329404
fix: Replace access URL for built-in DERP servers (#4197)
Fixes #4195.
2022-09-26 12:56:04 -05:00
Garrett Delfosse 4c8be34d81
feat: add health check monitoring to workspace apps (#4114) 2022-09-23 15:51:04 -04:00
Mathias Fredriksson 6b365f46f5
fix: Ensure coordinator is closed and freed in agent (#4164)
* fix: Close coordinator on context cancellation

* fix: Refactor runCoordinator so that previous is closed/freed
2022-09-23 18:08:13 +03:00
Kyle Carberry a7ee8b31e0
fix: Don't use StatusAbnormalClosure (#4155) 2022-09-22 18:26:05 +00:00