This allows deployments using our Prometheus export t determine
the number of active users in the past hour.
The interval is an hour to align with API key last used refresh times.
SSH connections poll to check shutdown time, so this will be accurate
even on long-running connections without dashboard requests.
* fix: Remove explicit coverpkg github.com/coder/coder/codersdk
This package is already covered by ./...
* fix: Ignore test utils in coverage (clitest, coderdtest, ptytest)
This cleans up agent association code to explicitly map a single
agent to a single resource. This will fix#1884, and unblock
a prospect from beginning a POC.
* feat: Add app support
This adds apps as a property to a workspace agent.
The resource is added to the Terraform provider here:
https://github.com/coder/terraform-provider-coder/pull/17
Apps will be opened in the dashboard or via the CLI
with `coder open <name>`. If `command` is specified, a
terminal will appear locally and in the web. If `target`
is specified, the browser will open to an exposed instance
of that target.
* Compare fields in apps test
* Update Terraform provider to use relative path
* Add some basic structure for routing
* chore: Remove interface from coderd and lift API surface
Abstracting coderd into an interface added misdirection because
the interface was never intended to be fulfilled outside of a single
implementation.
This lifts the abstraction, and attaches all handlers to a root struct
named `*coderd.API`.
* Add basic proxy logic
* Add proxying based on path
* Add app proxying for wildcards
* Add wsconncache
* fix: Race when writing to a closed pipe
This is such an intermittent race it's difficult to track,
but regardless this is an improvement to the code.
* fix: Race when writing to a closed pipe
This is such an intermittent race it's difficult to track,
but regardless this is an improvement to the code.
* fix: Race when writing to a closed pipe
This is such an intermittent race it's difficult to track,
but regardless this is an improvement to the code.
* fix: Race when writing to a closed pipe
This is such an intermittent race it's difficult to track,
but regardless this is an improvement to the code.
* Add workspace route proxying endpoint
- Makes the workspace conn cache concurrency-safe
- Reduces unnecessary open checks in `peer.Channel`
- Fixes the use of a temporary context when dialing a workspace agent
* Add embed errors
* chore: Refactor site to improve testing
It was difficult to develop this package due to the
embed build tag being mandatory on the tests. The logic
to test doesn't require any embedded files.
* Add test for error handler
* Remove unused access url
* Add RBAC tests
* Fix dial agent syntax
* Fix linting errors
* Fix gen
* Fix icon required
* Adjust migration number
* Fix proxy error status code
* Fix empty db lookup
* added error boundary and error ui components
* add body txt and standardize btn size
* added story
* feat: added error boundary
closes#1013
* committing lockfile
* added email body to help link
* feat: Add web terminal with reconnecting TTYs
This adds a web terminal that can reconnect to resume sessions!
No more disconnects, and no more bad bufferring!
* Add xstate service
* Add the webpage for accessing a web terminal
* Add terminal page tests
* Use Ticker instead of Timer
* Active Windows mode on Windows
These values were ignored. Environment variables are applied to
new sessions, and are refreshed on reconnect. This is cool because
a workspace could be updated with new environment variables without
requiring a complete start/stop.
The startup script is only ran once regardless of changes, which
feels like the expected behavior.
This fixes the dependency tree by adding recursion. It
now finds indirect connections and associates it with
an agent.
An example is attached which surfaced this issue.
This enables a "kubernetes_pod" to attach multiple agents that
could be for multiple services. Each agent is required to have
a unique name, so SSH syntax is:
`coder ssh <workspace>.<agent>`
A resource can have zero agents too, they aren't required.
This also resolves build time and commit hash using the
Go 1.18 debug/buildinfo package. An external URL is outputted
on running version as well to easily route the caller to a
release or commit.
* feat: Add stage to build logs
This adds a stage property to logs, and refactors the job logs
cliui.
It also adds tests to the cliui for build logs!
* feat: Add stage to build logs
This adds a stage property to logs, and refactors the job logs
cliui.
It also adds tests to the cliui for build logs!
* feat: Add config-ssh and tests for resiliency
* Rename "Echo" test to "ImmediateExit"
* Fix Terraform resource agent association
* Fix logs post-cancel
* Fix select on Windows
* Remove terraform init logs
* Move timer into it's own loop
* Fix race condition in provisioner jobs
* Fix requested changes
Summary:
This commit is a bit of a shotgun fix for various project settings.
Realistically, they could've been separate commits, but this is
convenience for just getting things into a green state to unblock
further work.
Details:
- Use our version of TS in vscode plugins
- organize vscode/settings.json
- fix tsconfig.test and tsconfig.prod (removes errors in test files)
- only use prod tsconfig in webpack
- point .eslintrc to both test and prod configs
- cleanup storybook
- running eslint in my workspace was OOMing. I configured
maxWorkers like we had in v1 to fix this.
- remove .storybook from code coverage
- remove .js files from code coverage --> after moving away
from Next.js, we don't allowJS in our tsconfig anymore. We only
use JS for configurations, it's not allowed in src code!
* ci: Update DataDog GitHub branch to fallback to GITHUB_REF
This was detecting branches, but not our "main" branch before.
Hopefully this fixes it!
* Add basic Terraform Provider
* Rename post files to upload
* Add tests for resources
* Skip instance identity test
* Add tests for ensuring agent get's passed through properly
* Fix linting errors
* Add echo path
* Fix agent authentication
* fix: Convert all jobs to use a common resource and agent type
This enables a consistent API for project import and provisioned resources.
* feat: Add agent authentication based on instance ID
Each cloud has it's own unique instance identity signatures, which
can be used for zero-token authentication. This change adds support
for tracking by "instance_id", and automatically authenticating
with Google Cloud.
* Add test for CLI
* Fix workspace agent request name
* Fix race with adding to wait group
* Fix name of instance identity token
* Initial agent
* fix: Use buffered reader in peer to fix ShortBuffer
This prevents a io.ErrShortBuffer from occurring when the byte
slice being read is smaller than the chunks sent from the opposite
pipe.
This makes sense for unordered connections, where transmission is
not guarunteed, but does not make sense for TCP-like connections.
We use a bufio.Reader when ordered to ensure data isn't lost.
* SSH server works!
* Start Windows support
* Something works
* Refactor pty package to support Windows spawn
* SSH server now works on Windows
* Fix non-Windows
* Fix Linux PTY render
* FIx linux build tests
* Remove agent and wintest
* Add test for Windows resize
* Fix linting errors
* Add Windows environment variables
* Add strings import
* Add comment for attrs
* Add goleak
* Add require import
This prevents a io.ErrShortBuffer from occurring when the byte
slice being read is smaller than the chunks sent from the opposite
pipe.
This makes sense for unordered connections, where transmission is
not guaranteed, but does not make sense for TCP-like connections.
We use a bufio.Reader when ordered to ensure data isn't lost.
* Refactor parameter parsing to return nil values if none computed
* Refactor parameter to allow for hiding redisplay
* Refactor parameters to enable schema matching
* Refactor provisionerd to dynamically update parameter schemas
* Refactor job update for provisionerd
* Handle multiple states correctly when provisioning a project
* Add project import job resource table
* Basic creation flow works!
* Create project fully works!!!
* Only show job status if completed
* Add create workspace support
* Replace Netflix/go-expect with ActiveState
* Fix linting errors
* Use forked chzyer/readline
* Add create workspace CLI
* Add CLI test
* Move jobs to their own APIs
* Remove go-expect
* Fix requested changes
* Skip workspacecreate test on windows
* feat: Add history middleware parameters
These will be used for streaming logs, checking status,
and other operations related to workspace and project
history.
* refactor: Move all HTTP routes to top-level struct
Nesting all structs behind their respective structures
is leaky, and promotes naming conflicts between handlers.
Our HTTP routes cannot have conflicts, so neither should
function naming.
* Add provisioner daemon routes
* Add periodic updates
* Skip pubsub if short
* Return jobs with WorkspaceHistory
* Add endpoints for extracting singular history
* The full end-to-end operation works
* fix: Disable compression for websocket dRPC transport (#145)
There is a race condition in the interop between the websocket and `dRPC`: https://github.com/coder/coder/runs/5038545709?check_suite_focus=true#step:7:117 - it seems both the websocket and dRPC feel like they own the `byte[]` being sent between them. This can lead to data races, in which both `dRPC` and the websocket are writing.
This is just tracking some experimentation to fix that race condition
## Run results: ##
- Run 1: peer test failure
- Run 2: peer test failure
- Run 3: `TestWorkspaceHistory/CreateHistory` - https://github.com/coder/coder/runs/5040858460?check_suite_focus=true#step:8:45
```
status code 412: The provided project history is running. Wait for it to complete importing!`
```
- Run 4: `TestWorkspaceHistory/CreateHistory` - https://github.com/coder/coder/runs/5040957999?check_suite_focus=true#step:7:176
```
workspacehistory_test.go:122:
Error Trace: workspacehistory_test.go:122
Error: Condition never satisfied
Test: TestWorkspaceHistory/CreateHistory
```
- Run 5: peer failure
- Run 6: Pass ✅
- Run 7: Peer failure
## Open Questions: ##
### Is `dRPC` or `websocket` at fault for the data race?
It looks like this condition is specifically happening when `dRPC` decides to [`SendError`]). This constructs a new byte payload from [`MarshalError`](f6e369438f/drpcwire/error.go (L15)) - so `dRPC` has created this buffer and owns it.
From `dRPC`'s perspective, the callstack looks like this:
- [`sendPacket`](f6e369438f/drpcstream/stream.go (L253))
- [`writeFrame`](f6e369438f/drpcwire/writer.go (L65))
- [`AppendFrame`](f6e369438f/drpcwire/packet.go (L128))
- with finally the data race happening here:
```go
// AppendFrame appends a marshaled form of the frame to the provided buffer.
func AppendFrame(buf []byte, fr Frame) []byte {
...
out := buf
out = append(out, control). // <---------
```
This should be fine, since `dPRC` create this buffer, and is taking the byte buffer constructed from `MarshalError` and tacking a bunch of headers on it to create a proper frame.
Once `dRPC` is done writing, it _hangs onto the buffer and resets it here__: f6e369438f/drpcwire/writer.go (L73)
However... the websocket implementation, once it gets the buffer, it runs a `statelessDeflate` [here](8dee580a7f/write.go (L180)), which compresses the buffer on the fly. This functionality actually [mutates the buffer in place](a1a9cfc821/flate/stateless.go (L94)), which is where get our race.
In the case where the `byte[]` aren't being manipulated anywhere else, this compress-in-place operation would be safe, and that's probably the case for most over-the-wire usages. In this case, though, where we're plumbing `dRPC` -> websocket, they both are manipulating it (`dRPC` is reusing the buffer for the next `write`, and `websocket` is compressing on the fly).
### Why does cloning on `Read` fail?
Get a bunch of errors like:
```
2022/02/02 19:26:10 [WARN] yamux: frame for missing stream: Vsn:0 Type:0 Flags:0 StreamID:0 Length:0
2022/02/02 19:26:25 [ERR] yamux: Failed to read header: unexpected EOF
2022/02/02 19:26:25 [ERR] yamux: Failed to read header: unexpected EOF
2022/02/02 19:26:25 [WARN] yamux: frame for missing stream: Vsn:0 Type:0 Flags:0 StreamID:0 Length:0
```
# UPDATE:
We decided we could disable websocket compression, which would avoid the race because the in-place `deflate` operaton would no longer be run. Trying that out now:
- Run 1: ✅
- Run 2: https://github.com/coder/coder/runs/5042645522?check_suite_focus=true#step:8:338
- Run 3: ✅
- Run 4: https://github.com/coder/coder/runs/5042988758?check_suite_focus=true#step:7:168
- Run 5: ✅
* fix: Remove race condition with acquiredJobDone channel (#148)
Found another data race while running the tests: https://github.com/coder/coder/runs/5044320845?check_suite_focus=true#step:7:83
__Issue:__ There is a race in the p.acquiredJobDone chan - in particular, there can be a case where we're waiting on the channel to finish (in close) with <-p.acquiredJobDone, but in parallel, an acquireJob could've been started, which would create a new channel for p.acquiredJobDone. There is a similar race in `close(..)`ing the channel, which also came up in test runs.
__Fix:__ Instead of recreating the channel everytime, we can use `sync.WaitGroup` to accomplish the same functionality - a semaphore to make close wait for the current job to wrap up.
* fix: Bump up workspace history timeout (#149)
This is an attempted fix for failures like: https://github.com/coder/coder/runs/5043435263?check_suite_focus=true#step:7:32
Looking at the timing of the test:
```
t.go:56: 2022-02-02 21:33:21.964 [DEBUG] (terraform-provisioner) <provision.go:139> ran apply
t.go:56: 2022-02-02 21:33:21.991 [DEBUG] (provisionerd) <provisionerd.go:162> skipping acquire; job is already running
t.go:56: 2022-02-02 21:33:22.050 [DEBUG] (provisionerd) <provisionerd.go:162> skipping acquire; job is already running
t.go:56: 2022-02-02 21:33:22.090 [DEBUG] (provisionerd) <provisionerd.go:162> skipping acquire; job is already running
t.go:56: 2022-02-02 21:33:22.140 [DEBUG] (provisionerd) <provisionerd.go:162> skipping acquire; job is already running
t.go:56: 2022-02-02 21:33:22.195 [DEBUG] (provisionerd) <provisionerd.go:162> skipping acquire; job is already running
t.go:56: 2022-02-02 21:33:22.240 [DEBUG] (provisionerd) <provisionerd.go:162> skipping acquire; job is already running
workspacehistory_test.go:122:
Error Trace: workspacehistory_test.go:122
Error: Condition never satisfied
Test: TestWorkspaceHistory/CreateHistory
```
It appears that the `terraform apply` job had just finished - with less than a second to spare until our `require.Eventually` completes - but there's still work to be done (ie, collecting the state files). So my suspicion is that terraform might, in some cases, exceed our 5s timeout.
Note that in the setup for this test - there is a similar project history wait that waits for 15s, so I borrowed that here.
In the future - we can look at potentially using a simple echo provider to exercise this in the unit test, in a way that is more reliable in terms of timing. I'll log an issue to track that.
Co-authored-by: Bryan <bryan@coder.com>
* feat: Add history middleware parameters
These will be used for streaming logs, checking status,
and other operations related to workspace and project
history.
* refactor: Move all HTTP routes to top-level struct
Nesting all structs behind their respective structures
is leaky, and promotes naming conflicts between handlers.
Our HTTP routes cannot have conflicts, so neither should
function naming.
* chore: Add VS Code recommended extensions
This adds extensions I feel work well for the project.
With this, a "Run on Save" extension is added that runs
"make gen" on save of query.sql to regenerate models.
* Fix formatting
* feat: Create provisioner abstraction
Creates a provisioner abstraction that takes prior art from the Terraform plugin system. It's safe to assume this code will change a lot when it becomes integrated with provisionerd.
Closes#10.
* Ignore generated files in diff view
* Check for unstaged file changes
* Install protoc-gen-go
* Use proper drpc plugin version
* Fix serve closed pipe
* Install sqlc with curl for speed
* Fix install command
* Format CI action
* Add linguist-generated and closed pipe test
* Cleanup code from comments
* Add dRPC comment
* Add Terraform installer for cross-platform
* Build provisioner tests on Linux only
* chore: Add golangci-lint and codecov
* Use consistent file names
* Format settings.json
* Add golangci-lint and codecov GitHub Actions
* Add base Go file for linting
* Add test coverage