Commit Graph

26 Commits

Author SHA1 Message Date
Kyle Carberry 1724cbf872
feat: automatically use websockets if DERP upgrade is unavailable (#6381)
* feat: automatically use websockets if DERP upgrade is unavailable

This might be our biggest hangup for deployments at the moment...

Load balancers by default do not support the DERP protocol, so many
of our prospects and customers run into failing workspace connections.
This automatically swaps to use WebSockets, and reports the reason to
coderd.

In a future contribution, a warning will appear by the agent if it was
forced to use WebSockets instead of DERP.

* Fix nil pointer type in Tailscale dep

* Fix requested changes
2023-03-01 22:18:14 +00:00
Mathias Fredriksson cae8b88f60
fix(tailnet): Avoid logging netmap (#6342) 2023-02-25 08:06:38 +00:00
Mathias Fredriksson 677721e4a1
fix(tailnet): Skip nodes without DERP, avoid use of RemoveAllPeers (#6320)
* fix(tailnet): Skip nodes without DERP, avoid use of RemoveAllPeers
2023-02-24 18:16:29 +02:00
Mathias Fredriksson a414de9e81
fix(tailnet): Improve tailnet setup and agentconn stability (#6292)
* fix(tailnet): Improve start and close to detect connection races

* fix: Prevent agentConn use before ready via AwaitReachable

* fix(tailnet): Ensure connstats are closed on conn close

* fix(codersdk): Use AwaitReachable in DialWorkspaceAgent

* fix(tailnet): Improve logging via slog.Helper()
2023-02-24 13:11:28 +02:00
Colin Adler a54de6093b
feat: add `coder ping` (#6161) 2023-02-13 10:38:00 -06:00
Kyle Carberry c0c83f17b2
fix: follow tailscale idioms for when to update nodes (#6164) 2023-02-10 16:59:24 -06:00
Colin Adler 4432cd08d6
chore: update tailscale (#6091) 2023-02-09 21:43:18 -06:00
Colin Adler 52ecd35c8f
fix(wsconncache): only allow one peer per connection (#5886)
If an agent went away and reconnected, the wsconncache connection would
be polluted for about 10m because there would be two peers with the
same IP. The old peer always had priority, which caused the dashboard to
try and always dial the old peer until it was removed.

Fixes: https://github.com/coder/coder/issues/5292
2023-01-26 22:23:35 +00:00
Kyle Carberry e61234f260
feat: Add `vscodeipc` subcommand for VS Code Extension (#5326)
* Add extio

* feat: Add `vscodeipc` subcommand for VS Code Extension

This enables the VS Code extension to communicate with a Coder client.
The extension will download the slim binary from `/bin/*` for the
respective client architecture and OS, then execute `coder vscodeipc`
for the connecting workspace.

* Add authentication header, improve comments, and add tests for the CLI

* Update cli/vscodeipc_test.go

Co-authored-by: Mathias Fredriksson <mafredri@gmail.com>

* Update cli/vscodeipc_test.go

Co-authored-by: Mathias Fredriksson <mafredri@gmail.com>

* Update cli/vscodeipc/vscodeipc_test.go

Co-authored-by: Mathias Fredriksson <mafredri@gmail.com>

* Fix requested changes

* Fix IPC tests

* Fix shell execution

* Fix nix flake

* Silence usage

Co-authored-by: Mathias Fredriksson <mafredri@gmail.com>
2022-12-18 17:50:06 -06:00
Colin Adler ae38bbeab6
chore: refactor agent stats streaming (#5112) 2022-11-18 16:46:53 -06:00
Mathias Fredriksson d9a83fc723
fix: Refactor tailnet conn AwaitReachable to allow for pings >1s RTT (#5096) 2022-11-15 20:59:22 +02:00
Kyle Carberry 82f494c99c
fix: Improve tailnet connections by reducing timeouts (#5043)
* fix: Improve tailnet connections by reducing timeouts

This awaits connection ping before running a dial. Before,
we were hitting the TCP retransmission and handshake timeouts,
which could intermittently add 1 or 5 seconds to a connection
being initialized.

* Update Tailscale
2022-11-13 11:33:05 -06:00
Kyle Carberry 29dc5f66b8
experiment: Switch to BuildJet Linux Runners (#4846) 2022-11-01 20:56:33 +00:00
Kyle Carberry 288e7d1045
fix: Flake on `TestReplica/TwentyConcurrent` (#4842)
This could actually cause connections to intermittently fail too
when a CPU is absolutely pegged. It just so happens that only
our runners have been that slow!

Fixes #4607.
2022-11-01 20:28:34 +00:00
Kyle Carberry 2ba4a62a0d
feat: Add high availability for multiple replicas (#4555)
* feat: HA tailnet coordinator

* fixup! feat: HA tailnet coordinator

* fixup! feat: HA tailnet coordinator

* remove printlns

* close all connections on coordinator

* impelement high availability feature

* fixup! impelement high availability feature

* fixup! impelement high availability feature

* fixup! impelement high availability feature

* fixup! impelement high availability feature

* Add replicas

* Add DERP meshing to arbitrary addresses

* Move packages to highavailability folder

* Move coordinator to high availability package

* Add flags for HA

* Rename to replicasync

* Denest packages for replicas

* Add test for multiple replicas

* Fix coordination test

* Add HA to the helm chart

* Rename function pointer

* Add warnings for HA

* Add the ability to block endpoints

* Add flag to disable P2P connections

* Wow, I made the tests pass

* Add replicas endpoint

* Ensure close kills replica

* Update sql

* Add database latency to high availability

* Pipe TLS to DERP mesh

* Fix DERP mesh with TLS

* Add tests for TLS

* Fix replica sync TLS

* Fix RootCA for replica meshing

* Remove ID from replicasync

* Fix getting certificates for meshing

* Remove excessive locking

* Fix linting

* Store mesh key in the database

* Fix replica key for tests

* Fix types gen

* Fix unlocking unlocked

* Fix race in tests

* Update enterprise/derpmesh/derpmesh.go

Co-authored-by: Colin Adler <colin1adler@gmail.com>

* Rename to syncReplicas

* Reuse http client

* Delete old replicas on a CRON

* Fix race condition in connection tests

* Fix linting

* Fix nil type

* Move pubsub to in-memory for twenty test

* Add comment for configuration tweaking

* Fix leak with transport

* Fix close leak in derpmesh

* Fix race when creating server

* Remove handler update

* Skip test on Windows

* Fix DERP mesh test

* Wrap HTTP handler replacement in mutex

* Fix error message for relay

* Fix API handler for normal tests

* Fix speedtest

* Fix replica resend

* Fix derpmesh send

* Ping async

* Increase wait time of template version jobd

* Fix race when closing replica sync

* Add name to client

* Log the derpmap being used

* Don't connect if DERP is empty

* Improve agent coordinator logging

* Fix lock in coordinator

* Fix relay addr

* Fix race when updating durations

* Fix client publish race

* Run pubsub loop in a queue

* Store agent nodes in order

* Fix coordinator locking

* Check for closed pipe

Co-authored-by: Colin Adler <colin1adler@gmail.com>
2022-10-17 13:43:30 +00:00
Kyle Carberry b8ec5c786d
fix: Ensure tailnet coordinations are sent orderly (#4198) 2022-09-26 10:16:04 -05:00
Kyle Carberry 99013b3aed
chore: Close dials in tailnet conn on close (#4174)
Fixes a race seen in: https://github.com/coder/coder/actions/runs/3114263658/jobs/5049905647
2022-09-23 12:10:47 -05:00
Kyle Carberry 80b45f1aa1
fix: Buffer tailnet nodes from connection initialization (#4159)
* fix: Don't use StatusAbnormalClosure

This is reserved for WASM use, and might be the cause of some weird leaks.

* Add close to provisioner logs
2022-09-22 20:22:49 +00:00
Kyle Carberry 5c0d63d31f
fix: Only hold `tailnet.*Conn.Close()` for a short duration (#4015)
* fix: Only hold `tailnet.*Conn.Close()` for a short duration

The long duration could be cause to a test deadlock.

* Add closed chan to listener struct
2022-09-12 17:46:45 +00:00
Kyle Carberry 5b5bc1da56
feat: Add local configuration option for DERP mapping (#3996)
This allows entirely airgapped geodistributed deployments of Coder!
2022-09-11 16:45:49 -05:00
Kyle Carberry 7718fa53c9
fix: Use a channel for bufferring tailnet connection updates (#3940) 2022-09-07 22:18:35 -05:00
Kyle Carberry dca24bd15d
fix: Don't clear out peers that haven't connected yet (#3916)
This was causing parallel connections to fail, because they wouldn't
be established yet.
2022-09-06 21:27:59 +00:00
Kyle Carberry 2fa77a9bbd
fix: Run status callbacks async to solve tailnet race (#3866) 2022-09-05 10:43:24 -05:00
Ammar Bandukwala 30f8fd9b95
Daily Active User Metrics (#3735)
* agent: add StatsReporter

* Stabilize protoc
2022-09-01 14:58:23 -05:00
Kyle Carberry 6826b976d7
fix: Add latency-check for DERP over HTTP(s) (#3788)
* fix: Add latency-check for DERP over HTTP(s)

This fixes scenarios where latency wasn't being reported if
a connection had UDP entirely blocked.

* Add inactivity ping

* Improve coordinator error reporting consistency
2022-09-01 16:41:47 +00:00
Kyle Carberry 9bd83e5ec7
feat: Add Tailscale networking (#3505)
* fix: Add coder user to docker group on installation

This makes for a simpler setup, and reduces the likelihood
a user runs into a strange issue.

* Add wgnet

* Add ping

* Add listening

* Finish refactor to make this work

* Add interface for swapping

* Fix conncache with interface

* chore: update gvisor

* fix tailscale types

* linting

* more linting

* Add coordinator

* Add coordinator tests

* Fix coordination

* It compiles!

* Move all connection negotiation in-memory

* Migrate coordinator to use net.conn

* Add closed func

* Fix close listener func

* Make reconnecting PTY work

* Fix reconnecting PTY

* Update CI to Go 1.19

* Add CLI flags for DERP mapping

* Fix Tailnet test

* Rename ConnCoordinator to TailnetCoordinator

* Remove print statement from workspace agent test

* Refactor wsconncache to use tailnet

* Remove STUN from unit tests

* Add migrate back to dump

* chore: Upgrade to Go 1.19

This is required as part of #3505.

* Fix reconnecting PTY tests

* fix: update wireguard-go to fix devtunnel

* fix migration numbers

* linting

* Return early for status if endpoints are empty

* Update cli/server.go

Co-authored-by: Colin Adler <colin1adler@gmail.com>

* Update cli/server.go

Co-authored-by: Colin Adler <colin1adler@gmail.com>

* Fix frontend entites

* Fix agent bicopy

* Fix race condition for the last node

* Fix down migration

* Fix connection RBAC

* Fix migration numbers

* Fix forwarding TCP to a local port

* Implement ping for tailnet

* Rename to ForceHTTP

* Add external derpmapping

* Expose DERP region names to the API

* Add global option to enable Tailscale networking for web

* Mark DERP flags hidden while testing

* Update DERP map on reconnect

* Add close func to workspace agents

* Fix race condition in upstream dependency

* Fix feature columns race condition

Co-authored-by: Colin Adler <colin1adler@gmail.com>
2022-08-31 20:09:44 -05:00