Instead of removing the mappings of unhealthy coordinators entirely,
mark them as lost instead. This prevents peers from disappearing from
other peers if a coordinator misses a heartbeat.
When starting a workspace, if the deadline crosses an autostart boundary, the deadline is set to autostart + TTL.
This copies the behavior in `ActivityBumpWorkspace`, but does not require activity.
When an agent receives a node, it responds with an ACK which is relayed
to the client. After the client receives the ACK, it's allowed to begin
pinging.
fixes#12923
Prevents Coordinate peer connections from generating spurious database queries like DeleteTailnetPeer when the coordinator is unhealthy.
It does this by checking the health of the querier before accepting a connection, rather than unconditionally accepting it only for it to get swatted down later.
* chore: merge apikey/token session config values
There is a confusing difference between an apikey and a token. This
difference leaks into our configs. This change does not resolve the
difference. It only groups the config values to try and manage any
bloat that occurs from adding more similar config values
* chore: merge authorization contexts
Instead of 2 auth contexts from apikey and dbauthz, merge them to
just use dbauthz. It is annoying to have two.
* fixup authorization reference
Currently, importing `codersdk` just to interact with the API requires
importing tailscale, which causes builds to fail unless manually using
our fork.
This cleans up `root.go` a bit, adds tests for middleware HTTP transport
functions, and removes two HTTP requests we always always performed previously
when executing *any* client command.
It should improve CLI performance (especially for users with higher latency).
* chore: remove max_ttl from templates
Completely removing max_ttl as a feature on template scheduling. Must use other template scheduling features to achieve autostop.
* chore: add org ID as optional param to AcquireJob
* chore: plumb through organization id to provisioner daemons
* add org id to provisioner domain key
* enforce org id argument
* dbgen provisioner jobs defaults to default org
In the peer healthcheck code, when an error pinging peers is detected we
write a "replicaErr" string with the error reason. However, if there are
no peer replicas to ping we returned early without setting the string to
empty. This would cause replicas that had peers (which were failing) and
then the peers left to permanently show an error until a new peer
appeared.
Also demotes DERP replica checking to a "warning" rather than an "error"
which should prevent the primary from removing the proxy from the region
map if DERP meshing is non-functional. This can happen without causing
problems if the peer is shutting down so we don't want to disrupt
everything if there isn't an issue.
* fix: separate signals for passive, active, and forced shutdown
`SIGTERM`: Passive shutdown stopping provisioner daemons from accepting new
jobs but waiting for existing jobs to successfully complete.
`SIGINT` (old existing behavior): Notify provisioner daemons to cancel in-flight jobs, wait 5s for jobs to be exited, then force quit.
`SIGKILL`: Untouched from before, will force-quit.
* Revert dramatic signal changes
* Rename
* Fix shutdown behavior for provisioner daemons
* Add test for graceful shutdown
* fix: do not set max deadline for workspaces on template update
When templates are updated and schedule data is changed, we update all
running workspaces to have up-to-date scheduling information that sticks
to the new policy.
When updating the max_deadline for existing running workspaces, if the
max_deadline was before now()+2h we would set the max_deadline to
now()+2h.
Builds that don't/shouldn't have a max_deadline have it set to 0, which
is always before now()+2h, and thus would always have the max_deadline
updated.
* test: add unit test to excercise template schedule bug
---------
Co-authored-by: Steven Masley <stevenmasley@gmail.com>
- Reworks the proxy registration loop into a struct (so I can add a `RegisterNow` method)
- Changes the proxy registration loop interval to 15s (previously 30s)
- Adds test which tests bidirectional DERP meshing on all possible paths between 6 workspace proxy replicas
Related to https://github.com/coder/customers/issues/438
This adds the ability for `TunnelAuth` to also authorize incoming wireguard node IPs, preventing agents from reporting anything other than their static IP generated from the agent ID.