* chore: Refactor Enterprise code to layer on top of AGPL
This is an experiment to invert the import order of the Enterprise
code to layer on top of AGPL.
* Fix Garrett's comments
* Add pointer.Handle to atomically obtain references
This uses a context to ensure the same value persists through
multiple executions to `Load()`.
* Remove entitlements API from AGPL coderd
* Remove AGPL Coder entitlements endpoint test
* Fix warnings output
* Add command-line flag to toggle audit logging
* Fix hasLicense being set
* Remove features interface
* Fix audit logging default
* Add bash as a dependency
* Add comment
* Add tests for resync and pubsub, and add back previous exp backoff retry
* Separate authz code again
* Add pointer loading example from comment
* Fix duplicate test, remove pointer.Handle
* Fix expired license
* Add entitlements struct
* Fix context passing
* fix: Add latency-check for DERP over HTTP(s)
This fixes scenarios where latency wasn't being reported if
a connection had UDP entirely blocked.
* Add inactivity ping
* Improve coordinator error reporting consistency
* fix: Add coder user to docker group on installation
This makes for a simpler setup, and reduces the likelihood
a user runs into a strange issue.
* Add wgnet
* Add ping
* Add listening
* Finish refactor to make this work
* Add interface for swapping
* Fix conncache with interface
* chore: update gvisor
* fix tailscale types
* linting
* more linting
* Add coordinator
* Add coordinator tests
* Fix coordination
* It compiles!
* Move all connection negotiation in-memory
* Migrate coordinator to use net.conn
* Add closed func
* Fix close listener func
* Make reconnecting PTY work
* Fix reconnecting PTY
* Update CI to Go 1.19
* Add CLI flags for DERP mapping
* Fix Tailnet test
* Rename ConnCoordinator to TailnetCoordinator
* Remove print statement from workspace agent test
* Refactor wsconncache to use tailnet
* Remove STUN from unit tests
* Add migrate back to dump
* chore: Upgrade to Go 1.19
This is required as part of #3505.
* Fix reconnecting PTY tests
* fix: update wireguard-go to fix devtunnel
* fix migration numbers
* linting
* Return early for status if endpoints are empty
* Update cli/server.go
Co-authored-by: Colin Adler <colin1adler@gmail.com>
* Update cli/server.go
Co-authored-by: Colin Adler <colin1adler@gmail.com>
* Fix frontend entites
* Fix agent bicopy
* Fix race condition for the last node
* Fix down migration
* Fix connection RBAC
* Fix migration numbers
* Fix forwarding TCP to a local port
* Implement ping for tailnet
* Rename to ForceHTTP
* Add external derpmapping
* Expose DERP region names to the API
* Add global option to enable Tailscale networking for web
* Mark DERP flags hidden while testing
* Update DERP map on reconnect
* Add close func to workspace agents
* Fix race condition in upstream dependency
* Fix feature columns race condition
Co-authored-by: Colin Adler <colin1adler@gmail.com>
* fix: Change uses of t.Cleanup -> defer in test bodies
Mixing t.Cleanup and defer can lead to unexpected order of execution.
* fix: Ensure t.Cleanup is not aborted by require
* chore: Add helper annotations
* fix: Remove use of `require` in `require.Eventually` in tests
Because require uses `t.FailNow()` and `require.Eventually` runs the
function in a goroutine, which is not allowed.
* feat: Add ruleguard for require.Eventually
Co-authored-by: Cian Johnston <cian@coder.com>
Two coderd unit tests (TestPatchCancelTemplateVersion/Success and TestPatchCancelWorkspaceBuild) implied erroneously that the job was canceled successfully.
This is not the case, as these unit tests do not include a Provision_Complete response in the input to the
echo provisioner. Now explicitly checking the job error and bumping the force cancel interval to be longer.
Fixes#3083.
* test: Use a template to prevent migrations from running for every test
* Create a single makefile target
* Fix built-in race
* Extend timeout of built-in PostgreSQL fetch
* feat: Add anonymized telemetry to report product usage
This adds a background service to report telemetry to a Coder
server for usage data. There will be realtime event data sent
in the future, but for now usage will report on a CRON.
* Fix flake and requested changes
* Add reporting options for setup
* Add reporting for workspaces
* Add resources as they are reported
* Track API key usage
* Ensure telemetry is tracked prior to exit
* fix: Use in-memory filesystem for echo provisioner tests
This should reduce IO in CI to shave some time off tests!
* test: Increase timeouts to reduce flakes
It's difficult to understand what's timing out due to a lock
vs. taking a long time. This should help resolve! 🕵️
This PR adds fields to templates that constrain values for workspaces derived from that template.
- Autostop: Adds a field max_ttl on the template which limits the maximum value of ttl on all workspaces derived from that template. Defaulting to 168 hours, enforced on edits to workspace metadata. New workspaces will default to the templates's `max_ttl` if not specified.
- Autostart: Adds a field min_autostart_duration which limits the minimum duration between successive autostarts of a template, measured from a single reference time. Defaulting to 1 hour, enforced on edits to workspace metadata.
* feat: Add app support
This adds apps as a property to a workspace agent.
The resource is added to the Terraform provider here:
https://github.com/coder/terraform-provider-coder/pull/17
Apps will be opened in the dashboard or via the CLI
with `coder open <name>`. If `command` is specified, a
terminal will appear locally and in the web. If `target`
is specified, the browser will open to an exposed instance
of that target.
* Compare fields in apps test
* Update Terraform provider to use relative path
* Add some basic structure for routing
* chore: Remove interface from coderd and lift API surface
Abstracting coderd into an interface added misdirection because
the interface was never intended to be fulfilled outside of a single
implementation.
This lifts the abstraction, and attaches all handlers to a root struct
named `*coderd.API`.
* Add basic proxy logic
* Add proxying based on path
* Add app proxying for wildcards
* Add wsconncache
* fix: Race when writing to a closed pipe
This is such an intermittent race it's difficult to track,
but regardless this is an improvement to the code.
* fix: Race when writing to a closed pipe
This is such an intermittent race it's difficult to track,
but regardless this is an improvement to the code.
* fix: Race when writing to a closed pipe
This is such an intermittent race it's difficult to track,
but regardless this is an improvement to the code.
* fix: Race when writing to a closed pipe
This is such an intermittent race it's difficult to track,
but regardless this is an improvement to the code.
* Add workspace route proxying endpoint
- Makes the workspace conn cache concurrency-safe
- Reduces unnecessary open checks in `peer.Channel`
- Fixes the use of a temporary context when dialing a workspace agent
* Add embed errors
* chore: Refactor site to improve testing
It was difficult to develop this package due to the
embed build tag being mandatory on the tests. The logic
to test doesn't require any embedded files.
* Add test for error handler
* Remove unused access url
* Add RBAC tests
* Fix dial agent syntax
* Fix linting errors
* Fix gen
* Fix icon required
* Adjust migration number
* Fix proxy error status code
* Fix empty db lookup
* Changes all public-facing codersdk types to use a plain int64 (milliseconds) instead of time.Duration.
* Makes autostart_schedule a *string as it may not be present.
* Adds a utils/ptr package with some useful methods.
* feat: Member roles are implied and never exlpicitly added
* Rename "GetAllUserRoles" to "GetAuthorizationRoles"
* feat: Add migration to remove implied roles
* rename user auth role middleware
Removes 5-second wait in autobuild.executor unit tests:
- Adds a write-only channel to Executor and plumbs through to unit tests
- Modifies runOnce to return an executor.RunStats struct and write to statsCh if not nil
Abstracting coderd into an interface added misdirection because
the interface was never intended to be fulfilled outside of a single
implementation.
This lifts the abstraction, and attaches all handlers to a root struct
named `*coderd.API`.
* database: add autostart_schedule and ttl to InsertWorkspace; make gen
* coderd: workspaces: consume additional fields of CreateWorkspaceRequest
* cli: update: add support for TTL and autostart_schedule
* cli: create: add unit tests
* coder: import `time/tzdata` for embedded timezone database
* autobuild: fix unit test that only runs with a real db
This PR adds a package lifecycle and an Executor implementation that attempts to schedule a build of workspaces with autostart configured.
- lifecycle.Executor takes a chan time.Time in its constructor (e.g. time.Tick(time.Minute))
- Whenever a value is received from this channel, it executes one iteration of looping through the workspaces and triggering lifecycle operations.
- When the context passed to the executor is Done, it exits.
- Only workspaces that meet the following criteria will have a lifecycle operation applied to them:
- Workspace has a valid and non-empty autostart or autostop schedule (either)
- Workspace's last build was successful
- The following transitions will be applied depending on the current workspace state:
- If the workspace is currently running, it will be stopped.
- If the workspace is currently stopped, it will be started.
- Otherwise, nothing will be done.
- Workspace builds will be created with the same parameters and template version as the last successful build (for example, template version)
This removes split ownership for workspaces. They are now
a resource of organizations and have a designated owner,
which is a user.
This enables simple administration for commands like:
- `coder stop ben/dev`
- `coder build logs colin/arch`
or if we decide to allow administrators to access workspaces,
they could even SSH using this syntax: `coder ssh colin/dev`.
* Improve CLI documentation
* feat: Allow workspace resources to attach multiple agents
This enables a "kubernetes_pod" to attach multiple agents that
could be for multiple services. Each agent is required to have
a unique name, so SSH syntax is:
`coder ssh <workspace>.<agent>`
A resource can have zero agents too, they aren't required.
* Add tree view
* Improve table UI
* feat: Allow workspace resources to attach multiple agents
This enables a "kubernetes_pod" to attach multiple agents that
could be for multiple services. Each agent is required to have
a unique name, so SSH syntax is:
`coder ssh <workspace>.<agent>`
A resource can have zero agents too, they aren't required.
* Rename `tunnel` to `skip-tunnel`
This command was `true` by default, which causes
a confusing user experience.
* Add disclaimer about editing templates
* Add help to template create
* Improve workspace create flow
* Add end-to-end test for config-ssh
* Improve testing of config-ssh
* Fix workspace list
* Fix config ssh tests
* Update cli/configssh.go
Co-authored-by: Cian Johnston <public@cianjohnston.ie>
* Fix requested changes
* Remove socat requirement
* Fix resources not reading in TTY
Co-authored-by: Cian Johnston <public@cianjohnston.ie>
This enables a "kubernetes_pod" to attach multiple agents that
could be for multiple services. Each agent is required to have
a unique name, so SSH syntax is:
`coder ssh <workspace>.<agent>`
A resource can have zero agents too, they aren't required.
Customer feedback indicated projects was a confusing name.
After querying the team internally, it seemed unanimous
that it is indeed a confusing name.
Here's for a lil less confusion @ashmeer7 🥂
* feat: Add AWS instance identity authentication
This allows zero-trust authentication for all AWS instances.
Prior to this, AWS instances could be used by passing `CODER_TOKEN`
as an environment variable to the startup script. AWS explicitly
states that secrets should not be passed in startup scripts because
it's user-readable.
* Fix sha256 verbosity
* Fix HTTP client being exposed on auth
* chore: Move httpmw to /coderd directory
httpmw is specific to coderd and should be scoped under coderd
* chore: Move httpapi to /coderd directory
httpapi is specific to coderd and should be scoped under coderd
* chore: Move database to /coderd directory
database is specific to coderd and should be scoped under coderd
* chore: Update codecov & gitattributes for generated files
* chore: Update Makefile
* ci: Update DataDog GitHub branch to fallback to GITHUB_REF
This was detecting branches, but not our "main" branch before.
Hopefully this fixes it!
* Add basic Terraform Provider
* Rename post files to upload
* Add tests for resources
* Skip instance identity test
* Add tests for ensuring agent get's passed through properly
* Fix linting errors
* Add echo path
* Fix agent authentication
* fix: Convert all jobs to use a common resource and agent type
This enables a consistent API for project import and provisioned resources.
* Add "coder_workspace" data source
* feat: Remove magical parameters from being injected
This is a much cleaner abstraction. Explicitly declaring the user
parameters for each provisioner makes for significantly simpler
testing.
* feat: Add agent authentication based on instance ID
Each cloud has it's own unique instance identity signatures, which
can be used for zero-token authentication. This change adds support
for tracking by "instance_id", and automatically authenticating
with Google Cloud.
* Add test for CLI
* Fix workspace agent request name
* Fix race with adding to wait group
* Fix name of instance identity token
WebSockets hijack the HTTP connection from the server, causing
server.Close() to not wait for these connections to fully cleanup.
This adds a global wait-group to the coderd API, which ensures all
WebSocket HTTP handlers have properly exited before returning.
* feat: Add workspace agent for SSH
This adds the initial agent that supports TTY
and execution over SSH. It functions across MacOS,
Windows, and Linux.
This does not handle the coderd interaction yet,
but does setup a simple path forward.
* Fix pty tests on Windows
* Fix log race
* Lock around dial error to fix log output
* Fix context return early
* fix: Leaking yamux session after HTTP handler is closed
Closes#317. We depended on the context canceling the yamux connection,
but this isn't a sync operation. Explicitly calling close ensures the
handler waits for yamux to complete before exit.
* Lock around close return
* Force failure with log
* Fix failed handler
* Upgrade dep
* Fix defer inside loops
* Fix context cancel for HTTP requests
* Fix resize
* fix: Leaking yamux session after HTTP handler is closed
Closes#317. The httptest server cancels the context after the connection
is closed, but if a connection takes a long time to close, the request
would never end. This applies a context to the entire listener that cancels
on test cleanup.
After discussion with @bryphe-coder, reducing the parallel limit on
Windows is likely to reduce failures as well.
* Switch to windows-2022 to improve decompression
* Invalidate cache on matrix OS
* Refactor parameter parsing to return nil values if none computed
* Refactor parameter to allow for hiding redisplay
* Refactor parameters to enable schema matching
* Refactor provisionerd to dynamically update parameter schemas
* Refactor job update for provisionerd
* Handle multiple states correctly when provisioning a project
* Add project import job resource table
* Basic creation flow works!
* Create project fully works!!!
* Only show job status if completed
* Add create workspace support
* Replace Netflix/go-expect with ActiveState
* Fix linting errors
* Use forked chzyer/readline
* Add create workspace CLI
* Add CLI test
* Move jobs to their own APIs
* Remove go-expect
* Fix requested changes
* Skip workspacecreate test on windows
* Nest jobs under an organization
* Rename project parameter to parameter schema
* Update references when computing project parameters
* Add files endpoint
* Allow one-off project import jobs
* Allow variables to be injected that are not defined by the schema
* Update API to use jobs first
* Fix CLI tests
* Fix linting
* Fix hex length for files table
* Reduce memory allocation for windows
* refactor: Rename ProjectParameter to ProjectVersionParameter
This was confusing with ParameterValue before. It still is a bit,
but this should help distinguish scope.
* Add project version resources table
* Allow project parameters to optionally have user and workspace
* Add dry run for provisioners
* Add resource detection on project import
* chore: Rename ProjectHistory to ProjectVersion
Version more accurately represents version storage. This
forks from the WorkspaceHistory name, but I think it's
easier to understand Workspace history.
* Rename files
* Standardize tests a bit more
* Remove Server struct from coderdtest
* Improve test coverage for workspace history
* Fix linting errors
* Fix coderd test leak
* Fix coderd test leak
* Improve workspace history logs
* Standardize test structure for codersdk
* Fix linting errors
* Fix WebSocket compression
* Update coderd/workspaces.go
Co-authored-by: Bryan <bryan@coder.com>
* Add test for listing project parameters
* Cache npm dependencies with setup node
* Remove windows npm cache key
Co-authored-by: Bryan <bryan@coder.com>
This replaces the cdr-basic provisioner type with
"echo". It reads binary data from the directory
and returns the responses in order.
This is used to test project and workspace job logic.
* feat: Add history middleware parameters
These will be used for streaming logs, checking status,
and other operations related to workspace and project
history.
* refactor: Move all HTTP routes to top-level struct
Nesting all structs behind their respective structures
is leaky, and promotes naming conflicts between handlers.
Our HTTP routes cannot have conflicts, so neither should
function naming.
* Add provisioner daemon routes
* Add periodic updates
* Skip pubsub if short
* Return jobs with WorkspaceHistory
* Add endpoints for extracting singular history
* The full end-to-end operation works
* fix: Disable compression for websocket dRPC transport (#145)
There is a race condition in the interop between the websocket and `dRPC`: https://github.com/coder/coder/runs/5038545709?check_suite_focus=true#step:7:117 - it seems both the websocket and dRPC feel like they own the `byte[]` being sent between them. This can lead to data races, in which both `dRPC` and the websocket are writing.
This is just tracking some experimentation to fix that race condition
## Run results: ##
- Run 1: peer test failure
- Run 2: peer test failure
- Run 3: `TestWorkspaceHistory/CreateHistory` - https://github.com/coder/coder/runs/5040858460?check_suite_focus=true#step:8:45
```
status code 412: The provided project history is running. Wait for it to complete importing!`
```
- Run 4: `TestWorkspaceHistory/CreateHistory` - https://github.com/coder/coder/runs/5040957999?check_suite_focus=true#step:7:176
```
workspacehistory_test.go:122:
Error Trace: workspacehistory_test.go:122
Error: Condition never satisfied
Test: TestWorkspaceHistory/CreateHistory
```
- Run 5: peer failure
- Run 6: Pass ✅
- Run 7: Peer failure
## Open Questions: ##
### Is `dRPC` or `websocket` at fault for the data race?
It looks like this condition is specifically happening when `dRPC` decides to [`SendError`]). This constructs a new byte payload from [`MarshalError`](f6e369438f/drpcwire/error.go (L15)) - so `dRPC` has created this buffer and owns it.
From `dRPC`'s perspective, the callstack looks like this:
- [`sendPacket`](f6e369438f/drpcstream/stream.go (L253))
- [`writeFrame`](f6e369438f/drpcwire/writer.go (L65))
- [`AppendFrame`](f6e369438f/drpcwire/packet.go (L128))
- with finally the data race happening here:
```go
// AppendFrame appends a marshaled form of the frame to the provided buffer.
func AppendFrame(buf []byte, fr Frame) []byte {
...
out := buf
out = append(out, control). // <---------
```
This should be fine, since `dPRC` create this buffer, and is taking the byte buffer constructed from `MarshalError` and tacking a bunch of headers on it to create a proper frame.
Once `dRPC` is done writing, it _hangs onto the buffer and resets it here__: f6e369438f/drpcwire/writer.go (L73)
However... the websocket implementation, once it gets the buffer, it runs a `statelessDeflate` [here](8dee580a7f/write.go (L180)), which compresses the buffer on the fly. This functionality actually [mutates the buffer in place](a1a9cfc821/flate/stateless.go (L94)), which is where get our race.
In the case where the `byte[]` aren't being manipulated anywhere else, this compress-in-place operation would be safe, and that's probably the case for most over-the-wire usages. In this case, though, where we're plumbing `dRPC` -> websocket, they both are manipulating it (`dRPC` is reusing the buffer for the next `write`, and `websocket` is compressing on the fly).
### Why does cloning on `Read` fail?
Get a bunch of errors like:
```
2022/02/02 19:26:10 [WARN] yamux: frame for missing stream: Vsn:0 Type:0 Flags:0 StreamID:0 Length:0
2022/02/02 19:26:25 [ERR] yamux: Failed to read header: unexpected EOF
2022/02/02 19:26:25 [ERR] yamux: Failed to read header: unexpected EOF
2022/02/02 19:26:25 [WARN] yamux: frame for missing stream: Vsn:0 Type:0 Flags:0 StreamID:0 Length:0
```
# UPDATE:
We decided we could disable websocket compression, which would avoid the race because the in-place `deflate` operaton would no longer be run. Trying that out now:
- Run 1: ✅
- Run 2: https://github.com/coder/coder/runs/5042645522?check_suite_focus=true#step:8:338
- Run 3: ✅
- Run 4: https://github.com/coder/coder/runs/5042988758?check_suite_focus=true#step:7:168
- Run 5: ✅
* fix: Remove race condition with acquiredJobDone channel (#148)
Found another data race while running the tests: https://github.com/coder/coder/runs/5044320845?check_suite_focus=true#step:7:83
__Issue:__ There is a race in the p.acquiredJobDone chan - in particular, there can be a case where we're waiting on the channel to finish (in close) with <-p.acquiredJobDone, but in parallel, an acquireJob could've been started, which would create a new channel for p.acquiredJobDone. There is a similar race in `close(..)`ing the channel, which also came up in test runs.
__Fix:__ Instead of recreating the channel everytime, we can use `sync.WaitGroup` to accomplish the same functionality - a semaphore to make close wait for the current job to wrap up.
* fix: Bump up workspace history timeout (#149)
This is an attempted fix for failures like: https://github.com/coder/coder/runs/5043435263?check_suite_focus=true#step:7:32
Looking at the timing of the test:
```
t.go:56: 2022-02-02 21:33:21.964 [DEBUG] (terraform-provisioner) <provision.go:139> ran apply
t.go:56: 2022-02-02 21:33:21.991 [DEBUG] (provisionerd) <provisionerd.go:162> skipping acquire; job is already running
t.go:56: 2022-02-02 21:33:22.050 [DEBUG] (provisionerd) <provisionerd.go:162> skipping acquire; job is already running
t.go:56: 2022-02-02 21:33:22.090 [DEBUG] (provisionerd) <provisionerd.go:162> skipping acquire; job is already running
t.go:56: 2022-02-02 21:33:22.140 [DEBUG] (provisionerd) <provisionerd.go:162> skipping acquire; job is already running
t.go:56: 2022-02-02 21:33:22.195 [DEBUG] (provisionerd) <provisionerd.go:162> skipping acquire; job is already running
t.go:56: 2022-02-02 21:33:22.240 [DEBUG] (provisionerd) <provisionerd.go:162> skipping acquire; job is already running
workspacehistory_test.go:122:
Error Trace: workspacehistory_test.go:122
Error: Condition never satisfied
Test: TestWorkspaceHistory/CreateHistory
```
It appears that the `terraform apply` job had just finished - with less than a second to spare until our `require.Eventually` completes - but there's still work to be done (ie, collecting the state files). So my suspicion is that terraform might, in some cases, exceed our 5s timeout.
Note that in the setup for this test - there is a similar project history wait that waits for 15s, so I borrowed that here.
In the future - we can look at potentially using a simple echo provider to exercise this in the unit test, in a way that is more reliable in terms of timing. I'll log an issue to track that.
Co-authored-by: Bryan <bryan@coder.com>
* feat: Add organizations endpoint for users
This moves the /user endpoint to /users/me instead. This
will reduce code duplication.
This adds /users/<name>/organizations to list organizations
a user has access to. It doesn't contain the permissions a
user has over the organizations, but that will come in a future
contribution.
* Fix requested changes
* Fix tests
* Fix timeout
* Add test for UserOrgs
* Add test for userparam getting
* Add test for NoUser
* ci: Run tests using PostgreSQL database and mock
This allows us to use the mock database for quick iterative testing,
and have confidence from CI using a real PostgreSQL database.
PostgreSQL tests are only ran on Linux. They are *really* slow on MacOS
and Windows runners, and don't provide much additional confidence.
* Only run PostgreSQL tests once for speed
* Fix race condition of log after close
Not all resources were cleaned up immediately after a peer connection was
closed. DataChannels could have a goroutine exit after Close() prior to this.
* Fix comment
* chore: Fix golangci-lint configuration and patch errors
Due to misconfiguration of a linting rules directory, our linter has not been
working properly. This change fixes the configuration issue, and all remaining
linting errors.
* Fix race in peer logging
* Fix race and return
* Lock on bufferred amount low
* Fix mutex lock
* feat: Add authentication and personal user endpoint
This contribution adds a lot of scaffolding for the database fake
and testability of coderd.
A new endpoint "/user" is added to return the currently authenticated
user to the requester.
* Use TestMain to catch leak instead
* Add userpassword package
* Add WIP
* Add user auth
* Fix test
* Add comments
* Fix login response
* Fix order
* Fix generated code
* Update httpapi/httpapi.go
Co-authored-by: Bryan <bryan@coder.com>
Co-authored-by: Bryan <bryan@coder.com>