coder

Commit Graph

Author	SHA1	Message	Date
Spike Curtis	f01cab9894	feat: use tailnet v2 API for coordination (#11638 ) This one is huge, and I'm sorry. The problem is that once I change `tailnet.Conn` to start doing v2 behavior, I kind of have to change it everywhere, including in CoderSDK (CLI), the agent, wsproxy, and ServerTailnet. There is still a bit more cleanup to do, and I need to add code so that when we lose connection to the Coordinator, we mark all peers as LOST, but that will be in a separate PR since this is big enough!	2024-01-22 11:07:50 +04:00
Spike Curtis	58873fa7e2	chore: remove unused context/cancel in tailnet Conn (#11399 ) Spotted during code read; unused fields	2024-01-05 08:15:42 +04:00
Spike Curtis	520c3a8ff7	fix: use TSMP for pings and checking reachability (#11306 ) We're seeing some flaky tests related to agent connectivity - https://github.com/coder/coder/actions/runs/7286675441/job/19856270998 I'm pretty sure what happened in this one is that the client opened a connection while the wgengine was in the process of reconfiguring the wireguard device, so the fact that the peer became "active" as a result of traffic being sent was not noticed. The test calls `AwaitReachable()` but this only tests the disco layer, so it doesn't wait for wireguard to come up. I think we should be using TSMP for pinging and reachability, since this operates at the IP layer, and therefore requires that wireguard comes up before being successful. This should also help with the problems we have seen where a TCP connection starts before wireguard is up and the initial round trip has to wait for the 5 second wireguard handshake retry. fixes: #11294	2024-01-02 15:53:52 +04:00
Spike Curtis	f400d8a0c5	fix: handle SIGHUP from OpenSSH (#10638 ) Fixes an issue where remote forwards are not correctly torn down when using OpenSSH with `coder ssh --stdio`. OpenSSH sends a disconnect signal, but then also sends SIGHUP to `coder`. Previously, we just exited when we got SIGHUP, and this raced against properly disconnecting. Fixes https://github.com/coder/customers/issues/327	2023-11-13 15:14:42 +04:00
Spike Curtis	94eb9b8db1	fix: disable t.Parallel on TestPortForward (#10449 ) I've said it before, I'll say it again: you can't create a timed context before calling `t.Parallel()` and then use it after. Fixes flakes like https://github.com/coder/coder/actions/runs/6716682414/job/18253279157 I've chosen just to drop `t.Parallel()` entirely rather than create a second context after the parallel call, since the vast majority of the test time happens before where the parallel call was. It does all the tailnet setup before `t.Parallel()`. Leaving a call to `t.Parallel()` is a bug risk for future maintainers to come in and use the wrong context in the latter part of the test by accident.	2023-11-01 13:45:13 +04:00
Spike Curtis	236e84c4d6	feat: add logging for forwarded TCP connections part of #7963 log TCP connections as they are forwarded by gVisor	2023-10-09 19:41:26 +04:00
Colin Adler	03a7d2f70b	chore: fix servertailnet test flake (#10110 ) https://github.com/coder/coder/actions/runs/6424100765/job/17444018788?pr=10083#step:5:771	2023-10-06 11:31:53 -05:00
Mathias Fredriksson	19d7da3d24	refactor(coderd/database): split `Time` and `Now` into `dbtime` package (#9482 ) Ref: #9380	2023-09-01 16:50:12 +00:00
Colin Adler	64ef867b4f	fix(tailnet): re-add keepalives (#9410 )	2023-08-29 15:21:30 -05:00
Dean Sheather	64df076328	feat: add server flag to force DERP to use always websockets (#9238 )	2023-08-24 17:22:31 +00:00
Kyle Carberry	22e781eced	chore: add /v2 to import module path (#9072 ) * chore: add /v2 to import module path go mod requires semantic versioning with versions greater than 1.x This was a mechanical update by running: ``` go install github.com/marwan-at-work/mod/cmd/mod@latest mod upgrade ``` Migrate generated files to import /v2 * Fix gen	2023-08-18 18:55:43 +00:00
Colin Adler	5b2ea2e94f	fix(tailnet): disable wireguard trimming (#9098 ) Co-authored-by: Spike Curtis <spike@coder.com>	2023-08-15 14:26:56 -05:00
Colin Adler	344d32b2f1	feat(coderd): expire agents from server tailnet (#9092 )	2023-08-14 20:38:37 -05:00
Colin Adler	bc862fa493	chore: upgrade tailscale to v1.46.1 (#8913 )	2023-08-09 19:50:26 +00:00
Dean Sheather	3c52b01850	chore: add tailscale magicsock debug logging controls (#8982 )	2023-08-08 17:56:08 +00:00
Ammar Bandukwala	25e30c6f41	feat(cli): support fine-grained server log filtering (#8748 )	2023-07-26 16:46:22 -05:00
Colin Adler	c47b78c44b	chore: replace wsconncache with a single tailnet (#8176 )	2023-07-12 17:37:31 -05:00
Kyle Carberry	f40865bc2f	chore: use mutex around `blockEndpoints` (#8209 ) https://github.com/coder/coder/actions/runs/5378950122/jobs/9759972142	2023-06-26 10:01:50 -05:00
Dean Sheather	a28d422c35	feat: add flag to disable all direct connections (#7936 )	2023-06-21 22:02:05 +00:00
Marcin Tojek	b1d1b63113	chore: ensure logs consistency across Coder (#8083 )	2023-06-20 12:30:45 +02:00
Marcin Tojek	247f8a973f	feat: replace ssh maxTimeout with keep-alive mechanism (#8062 ) * Bump up coder/ssh * feat: Set default agent timeout to ~72h * Address PR comments * Fix	2023-06-16 15:22:18 +02:00
Spike Curtis	b3689c8f64	Only send tailnet nodes updates with preferred DERP (#7387 ) Signed-off-by: Spike Curtis <spike@coder.com>	2023-05-04 14:30:57 +04:00
Colin Adler	3eb7f06bf1	feat(agent): add http debug routes for magicsock (#7287 )	2023-04-26 13:01:49 -05:00
Colin Adler	745868fd8a	revert: chore: upgrade tailscale (#7236 )	2023-04-20 17:58:22 -05:00
Colin Adler	a86830a283	chore: upgrade tailscale (#7207 )	2023-04-20 13:29:56 -05:00
Colin Adler	fbf329fbb7	fix(tailnet): set TCP keepalive idle to 72 hours for SSH conns (#7196 )	2023-04-18 17:53:11 -05:00
Kyle Carberry	bc18f6c113	fix: add `CODER_AGENT_TAILNET_LISTEN_PORT` for specifying a static tailnet port (#6980 ) Fixes #5175.	2023-04-03 16:20:19 +00:00
Josh Vawdrey	97f77c4507	feat: allow DERP headers to be set (#6572 ) * feat: allow DERP headers to be set * chore: remove custom flag * Clone DERP header on client create * Adjust to use interface to cast headers --------- Co-authored-by: Kyle Carberry <kyle@carberry.com>	2023-03-21 18:43:20 +00:00
Kyle Carberry	17bc5794d4	fix: direct embedded derp traffic directly to the server (#6595 ) Prior to this change, DERP traffic would route from `coderd` to the `CODER_ACCESS_URL` to reach the internal DERP server, which may have resulted in slower connections due to proxying, or the failure of web traffic entirely. If your Coder deployment has a proxy in front of it, your traffic through web terminals, apps, and port-forwarding is about to get a lot faster!	2023-03-14 14:46:47 +00:00
Kyle Carberry	7a8ccda40e	chore: copy forced derp websockets to fix flake (#6475 ) See: https://github.com/coder/coder/actions/runs/4350034299/jobs/7600478389	2023-03-06 21:29:41 -06:00
Kyle Carberry	2ff1c6d613	feat: add agent stats for different connection types (#6412 ) This allows us to track when our extensions are used, when the web terminal is used, and average connection latency to the agent.	2023-03-02 08:06:00 -06:00
Kyle Carberry	1724cbf872	feat: automatically use websockets if DERP upgrade is unavailable (#6381 ) * feat: automatically use websockets if DERP upgrade is unavailable This might be our biggest hangup for deployments at the moment... Load balancers by default do not support the DERP protocol, so many of our prospects and customers run into failing workspace connections. This automatically swaps to use WebSockets, and reports the reason to coderd. In a future contribution, a warning will appear by the agent if it was forced to use WebSockets instead of DERP. * Fix nil pointer type in Tailscale dep * Fix requested changes	2023-03-01 22:18:14 +00:00
Mathias Fredriksson	cae8b88f60	fix(tailnet): Avoid logging netmap (#6342 )	2023-02-25 08:06:38 +00:00
Mathias Fredriksson	677721e4a1	fix(tailnet): Skip nodes without DERP, avoid use of RemoveAllPeers (#6320 ) * fix(tailnet): Skip nodes without DERP, avoid use of RemoveAllPeers	2023-02-24 18:16:29 +02:00
Mathias Fredriksson	a414de9e81	fix(tailnet): Improve tailnet setup and agentconn stability (#6292 ) * fix(tailnet): Improve start and close to detect connection races * fix: Prevent agentConn use before ready via AwaitReachable * fix(tailnet): Ensure connstats are closed on conn close * fix(codersdk): Use AwaitReachable in DialWorkspaceAgent * fix(tailnet): Improve logging via slog.Helper()	2023-02-24 13:11:28 +02:00
Colin Adler	a54de6093b	feat: add `coder ping` (#6161 )	2023-02-13 10:38:00 -06:00
Kyle Carberry	c0c83f17b2	fix: follow tailscale idioms for when to update nodes (#6164 )	2023-02-10 16:59:24 -06:00
Colin Adler	4432cd08d6	chore: update tailscale (#6091 )	2023-02-09 21:43:18 -06:00
Colin Adler	52ecd35c8f	fix(wsconncache): only allow one peer per connection (#5886 ) If an agent went away and reconnected, the wsconncache connection would be polluted for about 10m because there would be two peers with the same IP. The old peer always had priority, which caused the dashboard to try and always dial the old peer until it was removed. Fixes: https://github.com/coder/coder/issues/5292	2023-01-26 22:23:35 +00:00
Kyle Carberry	e61234f260	feat: Add `vscodeipc` subcommand for VS Code Extension (#5326 ) * Add extio * feat: Add `vscodeipc` subcommand for VS Code Extension This enables the VS Code extension to communicate with a Coder client. The extension will download the slim binary from `/bin/` for the respective client architecture and OS, then execute `coder vscodeipc` for the connecting workspace. Add authentication header, improve comments, and add tests for the CLI * Update cli/vscodeipc_test.go Co-authored-by: Mathias Fredriksson <mafredri@gmail.com> * Update cli/vscodeipc_test.go Co-authored-by: Mathias Fredriksson <mafredri@gmail.com> * Update cli/vscodeipc/vscodeipc_test.go Co-authored-by: Mathias Fredriksson <mafredri@gmail.com> * Fix requested changes * Fix IPC tests * Fix shell execution * Fix nix flake * Silence usage Co-authored-by: Mathias Fredriksson <mafredri@gmail.com>	2022-12-18 17:50:06 -06:00
Colin Adler	ae38bbeab6	chore: refactor agent stats streaming (#5112 )	2022-11-18 16:46:53 -06:00
Mathias Fredriksson	d9a83fc723	fix: Refactor tailnet conn AwaitReachable to allow for pings >1s RTT (#5096 )	2022-11-15 20:59:22 +02:00
Kyle Carberry	82f494c99c	fix: Improve tailnet connections by reducing timeouts (#5043 ) * fix: Improve tailnet connections by reducing timeouts This awaits connection ping before running a dial. Before, we were hitting the TCP retransmission and handshake timeouts, which could intermittently add 1 or 5 seconds to a connection being initialized. * Update Tailscale	2022-11-13 11:33:05 -06:00
Kyle Carberry	29dc5f66b8	experiment: Switch to BuildJet Linux Runners (#4846 )	2022-11-01 20:56:33 +00:00
Kyle Carberry	288e7d1045	fix: Flake on `TestReplica/TwentyConcurrent` (#4842 ) This could actually cause connections to intermittently fail too when a CPU is absolutely pegged. It just so happens that only our runners have been that slow! Fixes #4607.	2022-11-01 20:28:34 +00:00
Kyle Carberry	2ba4a62a0d	feat: Add high availability for multiple replicas (#4555 ) * feat: HA tailnet coordinator * fixup! feat: HA tailnet coordinator * fixup! feat: HA tailnet coordinator * remove printlns * close all connections on coordinator * impelement high availability feature * fixup! impelement high availability feature * fixup! impelement high availability feature * fixup! impelement high availability feature * fixup! impelement high availability feature * Add replicas * Add DERP meshing to arbitrary addresses * Move packages to highavailability folder * Move coordinator to high availability package * Add flags for HA * Rename to replicasync * Denest packages for replicas * Add test for multiple replicas * Fix coordination test * Add HA to the helm chart * Rename function pointer * Add warnings for HA * Add the ability to block endpoints * Add flag to disable P2P connections * Wow, I made the tests pass * Add replicas endpoint * Ensure close kills replica * Update sql * Add database latency to high availability * Pipe TLS to DERP mesh * Fix DERP mesh with TLS * Add tests for TLS * Fix replica sync TLS * Fix RootCA for replica meshing * Remove ID from replicasync * Fix getting certificates for meshing * Remove excessive locking * Fix linting * Store mesh key in the database * Fix replica key for tests * Fix types gen * Fix unlocking unlocked * Fix race in tests * Update enterprise/derpmesh/derpmesh.go Co-authored-by: Colin Adler <colin1adler@gmail.com> * Rename to syncReplicas * Reuse http client * Delete old replicas on a CRON * Fix race condition in connection tests * Fix linting * Fix nil type * Move pubsub to in-memory for twenty test * Add comment for configuration tweaking * Fix leak with transport * Fix close leak in derpmesh * Fix race when creating server * Remove handler update * Skip test on Windows * Fix DERP mesh test * Wrap HTTP handler replacement in mutex * Fix error message for relay * Fix API handler for normal tests * Fix speedtest * Fix replica resend * Fix derpmesh send * Ping async * Increase wait time of template version jobd * Fix race when closing replica sync * Add name to client * Log the derpmap being used * Don't connect if DERP is empty * Improve agent coordinator logging * Fix lock in coordinator * Fix relay addr * Fix race when updating durations * Fix client publish race * Run pubsub loop in a queue * Store agent nodes in order * Fix coordinator locking * Check for closed pipe Co-authored-by: Colin Adler <colin1adler@gmail.com>	2022-10-17 13:43:30 +00:00
Kyle Carberry	b8ec5c786d	fix: Ensure tailnet coordinations are sent orderly (#4198 )	2022-09-26 10:16:04 -05:00
Kyle Carberry	99013b3aed	chore: Close dials in tailnet conn on close (#4174 ) Fixes a race seen in: https://github.com/coder/coder/actions/runs/3114263658/jobs/5049905647	2022-09-23 12:10:47 -05:00
Kyle Carberry	80b45f1aa1	fix: Buffer tailnet nodes from connection initialization (#4159 ) * fix: Don't use StatusAbnormalClosure This is reserved for WASM use, and might be the cause of some weird leaks. * Add close to provisioner logs	2022-09-22 20:22:49 +00:00
Kyle Carberry	5c0d63d31f	fix: Only hold `tailnet.Conn.Close()` for a short duration (#4015 ) fix: Only hold `tailnet.Conn.Close()` for a short duration The long duration could be cause to a test deadlock. Add closed chan to listener struct	2022-09-12 17:46:45 +00:00

57 Commits