coder

Commit Graph

Author	SHA1	Message	Date
Colin Adler	15157c1c40	chore: add network integration test suite scaffolding (#13072 ) * chore: add network integration test suite scaffolding * dean comments	2024-04-26 17:48:41 +00:00
Colin Adler	777dfbe965	feat(enterprise): add ready for handshake support to pgcoord (#12935 )	2024-04-16 15:01:10 -05:00
Colin Adler	e801e878ba	feat: add agent acks to in-memory coordinator (#12786 ) When an agent receives a node, it responds with an ACK which is relayed to the client. After the client receives the ACK, it's allowed to begin pinging.	2024-04-10 17:15:33 -05:00
Marcin Tojek	b6359b0a89	fix: ignore gomock temporary files (#12924 )	2024-04-10 08:48:56 +00:00
Colin Adler	4d5a7b2d56	chore(codersdk): move all tailscale imports out of `codersdk` (#12735 ) Currently, importing `codersdk` just to interact with the API requires importing tailscale, which causes builds to fail unless manually using our fork.	2024-03-26 12:44:31 -05:00
Colin Adler	66154f937e	fix(coderd): pass block endpoints into servertailnet (#12149 )	2024-03-08 05:29:54 +00:00
Colin Adler	e5d911462f	fix(tailnet): enforce valid agent and client addresses (#12197 ) This adds the ability for `TunnelAuth` to also authorize incoming wireguard node IPs, preventing agents from reporting anything other than their static IP generated from the agent ID.	2024-03-01 09:02:33 -06:00
Spike Curtis	4e7beee102	feat: show tailnet peer diagnostics after coder ping (#12314 ) Beginnings of a solution to #12297 Doesn't cover disco or definitively display whether we successfully connected to DERP, but shows some checklist diagnostics for connecting to an agent. For this first PR, I just added it to `coder ping` to see how we like it, but could be incorporated into `coder ssh` _et al._ after a timeout. ``` $ coder ping dogfood2 p2p connection established in 147ms pong from dogfood2 p2p via 95.217.xxx.yyy:42631 in 147ms pong from dogfood2 p2p via 95.217.xxx.yyy:42631 in 140ms pong from dogfood2 p2p via 95.217.xxx.yyy:42631 in 140ms ✔ preferred DERP region 999 (Council Bluffs, Iowa) ✔ sent local data to Coder networking coodinator ✔ received remote agent data from Coder networking coordinator preferred DERP 10013 (Europe Fly.io (Paris)) endpoints: 95.217.xxx.yyy:42631, 95.217.xxx.yyy:37576, 172.17.0.1:37576, 172.20.0.10:37576 ✔ Wireguard handshake 11s ago ```	2024-02-27 22:04:46 +04:00
Spike Curtis	af3fdc68c3	chore: refactor agent routines that use the v2 API (#12223 ) In anticipation of needing the `LogSender` to run on a context that doesn't get immediately canceled when you `Close()` the agent, I've undertaken a little refactor to manage the goroutines that get run against the Tailnet and Agent API connection. This handles controlling two contexts, one that gets canceled right away at the start of graceful shutdown, and another that stays up to allow graceful shutdown to complete.	2024-02-23 11:04:23 +04:00
Dean Sheather	9861830e87	fix: never send local endpoints if disabled (#12138 )	2024-02-20 15:51:25 +10:00
Cian Johnston	a2cbb0f87f	fix(enterprise/coderd): check provisionerd API version on connection (#12191 )	2024-02-16 18:43:07 +00:00
Spike Curtis	2d0b9106c0	fix: change servertailnet to register the DERP dialer before setting DERP map (#12137 ) I noticed a possible race where tailnet.Conn can try to dial the embedded region before we've set our custom dialer that send the DERP in-memory. This closes that race and adds a test case for servertailnet with no STUN and an embedded relay	2024-02-15 10:51:12 +04:00
Spike Curtis	e5ba586e30	fix: fix graceful disconnect in DialWorkspaceAgent (#11993 ) I noticed in testing that the CLI wasn't correctly sending the disconnect message when it shuts down, and thus agents are seeing this as a "lost" peer, rather than a "disconnected" one. What was happening is that we just used a single context for everything from the netconn to the RPCs, and when the context was canceled we failed to send the disconnect message due to canceled context. So, this PR splits things into two contexts, with a graceful one set to last up to 1 second longer than the main one.	2024-02-05 14:01:37 +04:00
Spike Curtis	bb99cb7d2b	chore: move FakeCoordinator to tailnettest (#11992 ) Moves FakeCoordinator to tailnettest since it's reused in testing multiple packages in this stack of PRs.	2024-02-05 13:49:32 +04:00
Spike Curtis	646ac942b2	chore: rename FakeCoordinator for export (#11991 ) Part of a stack that fixes graceful disconnect from the CLI to tailnet. I reuse FakeCoordinator in a test for graceful disconnects.	2024-02-05 13:33:31 +04:00
Spike Curtis	520b12e1a2	fix: close MultiAgentConn when coordinator closes (#11941 ) Fixes an issue where a MultiAgentConn isn't closed properly when the coordinator it is connected to is closed. Since servertailnet checks whether the conn is closed before reinitializing, it is important that we check this, otherwise servertailnet can get stuck if the coordinator closes (e.g. when we switch from AGPL to PGCoordinator after decoding a license).	2024-01-31 00:38:19 +04:00
Spike Curtis	d3983e4dba	feat: add logging to client tailnet yamux (#11908 ) Adds logging to yamux when used for tailnet client connections, e.g. CLI and wsproxy. This could be useful for debugging connection issues with tailnet v2 API.	2024-01-30 09:58:59 +04:00
Spike Curtis	1e8a9c09fe	chore: remove legacy wsconncache (#11816 ) Fixes #8218 Removes `wsconncache` and related "is legacy?" functions and API calls that were used by it. The only leftover is that Agents still use the legacy IP, so that back level clients or workspace proxies can dial them correctly. We should eventually remove this: #11819	2024-01-30 07:56:36 +04:00
Colin Adler	bc14e926d8	feat: add option to speedtest to dump a pcap of network traffic (#11848 )	2024-01-29 09:57:31 -06:00
Spike Curtis	5cbb76b47a	fix: stop spamming DERP map updates for equivalent maps (#11792 ) Fixes 2 related issues: 1. wsconncache had incorrect logic to test whether to send DERPMap updates, sending if the maps were equivalent, instead of if they were _not equivalent_. 2. configmaps used a bugged check to test equality between DERPMaps, since it contains a map and the map entries are serialized in random order. Instead, we avoid comparing the protobufs and instead depend on the existing function that compares `tailcfg.DERPMap`. This also has the effect of reducing the number of times we convert to and from protobuf.	2024-01-24 16:27:15 +04:00
Spike Curtis	059e533544	feat: agent uses Tailnet v2 API for DERPMap updates (#11698 ) Switches the Agent to use Tailnet v2 API to get DERPMap updates. Subsequent PRs will do the same for the CLI (`codersdk`) and `wsproxy`.	2024-01-23 14:42:07 +04:00
Spike Curtis	3e0e7f8739	feat: check agent API version on connection (#11696 ) fixes #10531 Adds a check for `version` on connection to the Agent API websocket endpoint. This is primarily for future-proofing, so that up-level agents get a sensible error if they connect to a back-level Coderd. It also refactors the location of the `CurrentVersion` variables, to be part of the `proto` packages, since the versions refer to the APIs defined therein.	2024-01-23 14:27:49 +04:00
Spike Curtis	eb12fd7d92	feat: make ServerTailnet set peers lost when it reconnects to the coordinator (#11682 ) Adds support to `ServerTailnet` to set all peers lost before attempting to reconnect to the coordinator. In practice, this only really affects `wsproxy` since coderd has a local connection to the coordinator that only goes down if we're shutting down or change licenses.	2024-01-23 13:17:56 +04:00
Spike Curtis	5388a1b6d7	fix: use TSMP ping for reachability, not latency (#11749 ) Use TSMP ping for reachability, but leave Disco ping for when we call Ping() since we often use that to determine whether we have a direct connection. Also adds unit tests to make sure Ping() returns direct connection vs DERP correctly.	2024-01-22 17:37:15 +04:00
Spike Curtis	7ffd99cfe2	fix: use DiscoPing (partially reverts #11306 ) (#11744 )	2024-01-22 12:40:21 +00:00
Spike Curtis	3d85cdfa11	feat: set peers lost when disconnected from coordinator (#11681 ) Adds support to Coordination to call SetAllPeersLost() when it is closed. This ensure that when we disconnect from a Coordinator, we set all peers lost. This covers CoderSDK (CLI client) and Agent. Next PR will cover MultiAgent (notably, `wsproxy`).	2024-01-22 15:26:20 +04:00
Spike Curtis	b7b936547d	feat: add setAllPeersLost to the configMaps subcomponent (#11665 ) adds setAllPeersLost to the configMaps subcomponent of tailnet.Conn --- we'll call this when we disconnect from a coordinator so we'll eventually clean up peers if they disconnect while we are retrying the coordinator connection (or we don't succeed in reconnecting to the coordinator).	2024-01-22 12:12:15 +04:00
Spike Curtis	f01cab9894	feat: use tailnet v2 API for coordination (#11638 ) This one is huge, and I'm sorry. The problem is that once I change `tailnet.Conn` to start doing v2 behavior, I kind of have to change it everywhere, including in CoderSDK (CLI), the agent, wsproxy, and ServerTailnet. There is still a bit more cleanup to do, and I need to add code so that when we lose connection to the Coordinator, we mark all peers as LOST, but that will be in a separate PR since this is big enough!	2024-01-22 11:07:50 +04:00
Spike Curtis	8910ac715c	feat: add tailnet v2 support to wsproxy coordinate endpoint (#11637 ) wsproxy also needs to be updated to use tailnet v2 because the `tailnet.Conn` stores peers by ID, and the peerID was not being carried by the JSON protocol. This adds a query param to the endpoint to conditionally switch to the new protocol.	2024-01-18 10:10:36 +04:00
Spike Curtis	07427e06f7	chore: add setBlockEndpoints to nodeUpdater (#11636 ) nodeUpdater also needs block endpoints, so that it can stop sending nodes with endpoints.	2024-01-18 10:02:15 +04:00
Spike Curtis	5b4de667d6	chore: add setCallback to nodeUpdater (#11635 ) we need to be able to (re-)set the node callback when we lose and regain connection to a coordinator over the network.	2024-01-18 09:51:09 +04:00
Spike Curtis	e725f9d7d4	chore: stop passing addresses on configMaps constructor (#11634 ) moving this out of the constructor so that setting this when creating a new `tailnet.Conn` triggers configuring the engine.	2024-01-18 09:43:28 +04:00
Spike Curtis	a514df71ed	chore: add setDERPMap to configMaps (#11590 ) Add setDERPMap	2024-01-18 09:34:30 +04:00
Spike Curtis	25e289e1f6	chore: add setAddresses to nodeUpdater (#11571 ) Adds setAddresses to nodeUpdater	2024-01-18 09:24:16 +04:00
Spike Curtis	2aa3cbbd03	chore: add logging to nodeUpdater (#11569 ) Add debug logging for nodeUpdater and configMaps	2024-01-17 14:15:45 +04:00
Spike Curtis	38d9ce5267	chore: add setStatus support to nodeUpdater (#11568 ) Add support for the wgengine Status callback to nodeUpdater	2024-01-17 09:06:34 +04:00
Spike Curtis	f6dc707511	chore: add DERPForcedWebsocket to nodeUpdater (#11567 ) Add support for DERPForcedWebsocket to nodeUpdater	2024-01-17 08:55:45 +04:00
Spike Curtis	8701dbc874	chore: add nodeUpdater to tailnet (#11539 ) Adds a nodeUpdater component, which serves a similar role to configMaps, but tracks information from tailscale going out to the coordinator as node updates. This first PR just handles netInfo, subsequent PRs will handle DERP forced websockets, endpoints, and addresses.	2024-01-11 09:29:42 +04:00
Spike Curtis	7005fb1b2f	chore: add support for blockEndpoints to configMaps (#11512 ) Adds support for setting blockEndpoints on the configMaps	2024-01-11 09:18:31 +04:00
Spike Curtis	617ecbfb1f	chore: add support for peer updates to tailnet.configMaps (#11487 ) Adds support to configMaps to handle peer updates including lost and disconnected peers	2024-01-11 09:11:43 +04:00
Spike Curtis	89e3bbe0f5	chore: add configMaps component to tailnet (#11400 ) Work in progress on a subcomponent of the Conn which will handle configuring the wireguard engine on changes. I've implemented setAddresses as the simplest case and added unit tests of the reconfiguration loop. Besides making the code easier to test and understand, the goal is for this component to handle disconnect and loss updates about peers, and thereby, implement the v2 Tailnet API. Further PRs will handle peer updates, status updates, and net info updates. Then, after the subcomponent is implemented and tested, I will refactor Conn to use it instead of the current monolithic architecture.	2024-01-10 10:58:53 +04:00
Cian Johnston	4d2fe2685a	chore(coderd): extract api version validation to util package (#11407 )	2024-01-05 10:22:07 +00:00
Spike Curtis	58873fa7e2	chore: remove unused context/cancel in tailnet Conn (#11399 ) Spotted during code read; unused fields	2024-01-05 08:15:42 +04:00
Steven Masley	dd05a6b13a	chore: mockgen archived, moved to new location (#11415 ) * chore: mockgen archived, moved to new location	2024-01-04 18:35:56 -06:00
Spike Curtis	520c3a8ff7	fix: use TSMP for pings and checking reachability (#11306 ) We're seeing some flaky tests related to agent connectivity - https://github.com/coder/coder/actions/runs/7286675441/job/19856270998 I'm pretty sure what happened in this one is that the client opened a connection while the wgengine was in the process of reconfiguring the wireguard device, so the fact that the peer became "active" as a result of traffic being sent was not noticed. The test calls `AwaitReachable()` but this only tests the disco layer, so it doesn't wait for wireguard to come up. I think we should be using TSMP for pinging and reachability, since this operates at the IP layer, and therefore requires that wireguard comes up before being successful. This should also help with the problems we have seen where a TCP connection starts before wireguard is up and the initial round trip has to wait for the 5 second wireguard handshake retry. fixes: #11294	2024-01-02 15:53:52 +04:00
Spike Curtis	25f2abf9ab	chore: remove tailnet from agent API and rename client API to tailnet (#11303 ) Refactors our DRPC service definitions slightly. In the previous version, I inserted the RPCs from the tailnet proto directly into the Agent service. This makes things hard to deal with because DRPC then generates a new set of methods with new interfaces with the `DRPCAgent_` prefixed. Since you can't have a single method that takes different argument types, we couldn't reuse the implementation of those RFCs without a lot of extra classes and pass-thru methods. Instead, the "right" way to do it is to integrate at the DRPC layer. So, we have two DRPC services available over the Agent websocket, and register them both on the DRPC `mux`. Since the tailnet proto RPC service is now for both clients and agents, I renamed some things to clarify and shorten. This PR also removes the `TailnetAPI` implementation from the `agentapi` package, and the next PR in the stack replaces it with the implementation from the `tailnet` package.	2024-01-02 10:02:45 +04:00
Spike Curtis	d257f8163d	feat: implement DERP streaming on tailnet Client API (#11302 ) Implements DERPMap streaming from client API. In a subsequent PR I plan to remove the implementation in coderd/agentapi in favor of the tailnet one	2024-01-02 08:07:57 +04:00
Dean Sheather	e46431078c	feat: add AgentAPI using DRPC (#10811 ) Co-authored-by: Spike Curtis <spike@coder.com>	2023-12-18 22:53:28 +10:00
Spike Curtis	a58e4febb9	feat: add tailnet v2 Service and Client (#11225 ) Part of #10532 Adds a tailnet ClientService that accepts a net.Conn and serves v1 or v2 of the tailnet API. Also adds a DRPCService that implements the DRPC interface for the v2 API. This component is within the ClientService, but needs to be reusable and exported so that we can also embed it in the Agent API. Finally, includes a NewDRPCClient function that takes a net.Conn and runs dRPC in yamux over it on the client side.	2023-12-15 12:48:39 +04:00
Spike Curtis	30f032d282	feat: add tailnet ValidateVersion (#11223 ) Part of #10532 Adds a method to validate a requested version of the tailnet API	2023-12-15 11:49:30 +04:00

1 2 3

146 Commits