Although the terraform-exec docs don't indicate this, the result of
"terraform show" isn't actually the state... it's a trimmed version
of the state that excludes resource identifiers, essentially removing
all state that did exist.
Tests will be written to ensure Terraform state reconciliation can occur.
This will happen in another PR, as dogfood is currently broken because of this.
The Terraform Provisioner depended on the statefile content
being at a specific path, which disallowed the use of external
state providers. This fixes it!
* fix: Update GIT_COMMITTER_NAME to use username
This was a mistake when adding the committer fields 🤦.
* fix: Use environment variables for agent authentication
Using files led to situations where running "coder server --dev" would
break `gitssh`. This is applicable in a production environment too. Users
should be able to log into another Coder deployment from their workspace.
Users can still set "CODER_URL" if they'd like with agent env vars!
This fixes the dependency tree by adding recursion. It
now finds indirect connections and associates it with
an agent.
An example is attached which surfaced this issue.
This enables a "kubernetes_pod" to attach multiple agents that
could be for multiple services. Each agent is required to have
a unique name, so SSH syntax is:
`coder ssh <workspace>.<agent>`
A resource can have zero agents too, they aren't required.
It was occasionally failing without any clear indication of
what to fix on our side. The plugins weren't being found
by Terraform.
We already disable this on Windows, so figured it's fine on
Darwin too considering most production deployments will be Linux.
It's possible for websocket close messages to be too long, which cause
them to silently fail without a proper close message. See error below:
```
2022-03-31 17:08:34.862 [INFO] (stdlib) <close_notjs.go:72> "2022/03/31 17:08:34 websocket: failed to marshal close frame: reason string max is 123 but got \"insert provisioner daemon:Cannot encode []database.ProvisionerType into oid 19098 - []database.ProvisionerType must implement Encoder or be converted to a string\" with length 161"
```
* feat: Add stage to build logs
This adds a stage property to logs, and refactors the job logs
cliui.
It also adds tests to the cliui for build logs!
* feat: Add stage to build logs
This adds a stage property to logs, and refactors the job logs
cliui.
It also adds tests to the cliui for build logs!
* feat: Add config-ssh and tests for resiliency
* Rename "Echo" test to "ImmediateExit"
* Fix Terraform resource agent association
* Fix logs post-cancel
* Fix select on Windows
* Remove terraform init logs
* Move timer into it's own loop
* Fix race condition in provisioner jobs
* Fix requested changes
* feat: Add AWS instance identity authentication
This allows zero-trust authentication for all AWS instances.
Prior to this, AWS instances could be used by passing `CODER_TOKEN`
as an environment variable to the startup script. AWS explicitly
states that secrets should not be passed in startup scripts because
it's user-readable.
* feat: Support caching provisioner assets
This caches the Terraform binary, and Terraform plugins.
Eventually, it could cache other temporary files.
* chore: fix linter
Co-authored-by: Garrett <garrett@coder.com>
* feat: Add stage to build logs
This adds a stage property to logs, and refactors the job logs
cliui.
It also adds tests to the cliui for build logs!
* Fix comments
The logic required a constant value before, which disallowed dynamic
value injection into the agent. This isn't an accurate limitation,
so inverting the logic resolves it.
This update exposes the workspace name and owner, and changes
authentication methods to be explicit. Implicit authentication
added unnecessary complexity and introduced inconsistency.
* ci: Update DataDog GitHub branch to fallback to GITHUB_REF
This was detecting branches, but not our "main" branch before.
Hopefully this fixes it!
* Add basic Terraform Provider
* Rename post files to upload
* Add tests for resources
* Skip instance identity test
* Add tests for ensuring agent get's passed through properly
* Fix linting errors
* Add echo path
* Fix agent authentication
* fix: Convert all jobs to use a common resource and agent type
This enables a consistent API for project import and provisioned resources.
* Add "coder_workspace" data source
* feat: Remove magical parameters from being injected
This is a much cleaner abstraction. Explicitly declaring the user
parameters for each provisioner makes for significantly simpler
testing.
* feat: Add graceful exits to provisionerd
Terraform (or other provisioners) may need to cleanup state, or
cancel actions before exit. This adds the ability to gracefully
exit provisionerd.
* Fix cancel error check
* feat: Add destroy to workspace provision job
This enables the full flow of create/update/delete.
* fix: Use plan to detect resource agent association
Before this used the configuration object which detected all resources
regardless of count.
* ci: Update DataDog GitHub branch to fallback to GITHUB_REF
This was detecting branches, but not our "main" branch before.
Hopefully this fixes it!
* Add basic Terraform Provider
* Rename post files to upload
* Add tests for resources
* Skip instance identity test
* Add tests for ensuring agent get's passed through properly
* Fix linting errors
* Add echo path
* Fix agent authentication
* fix: Convert all jobs to use a common resource and agent type
This enables a consistent API for project import and provisioned resources.
* Add "coder_workspace" data source
* feat: Remove magical parameters from being injected
This is a much cleaner abstraction. Explicitly declaring the user
parameters for each provisioner makes for significantly simpler
testing.
* feat: Add graceful exits to provisionerd
Terraform (or other provisioners) may need to cleanup state, or
cancel actions before exit. This adds the ability to gracefully
exit provisionerd.
* Fix cancel error check
* feat: Add destroy to workspace provision job
This enables the full flow of create/update/delete.
* ci: Update DataDog GitHub branch to fallback to GITHUB_REF
This was detecting branches, but not our "main" branch before.
Hopefully this fixes it!
* Add basic Terraform Provider
* Rename post files to upload
* Add tests for resources
* Skip instance identity test
* Add tests for ensuring agent get's passed through properly
* Fix linting errors
* Add echo path
* Fix agent authentication
* fix: Convert all jobs to use a common resource and agent type
This enables a consistent API for project import and provisioned resources.
* Add "coder_workspace" data source
* feat: Remove magical parameters from being injected
This is a much cleaner abstraction. Explicitly declaring the user
parameters for each provisioner makes for significantly simpler
testing.
* feat: Add graceful exits to provisionerd
Terraform (or other provisioners) may need to cleanup state, or
cancel actions before exit. This adds the ability to gracefully
exit provisionerd.
* Fix cancel error check
* ci: Update DataDog GitHub branch to fallback to GITHUB_REF
This was detecting branches, but not our "main" branch before.
Hopefully this fixes it!
* Add basic Terraform Provider
* Rename post files to upload
* Add tests for resources
* Skip instance identity test
* Add tests for ensuring agent get's passed through properly
* Fix linting errors
* Add echo path
* Fix agent authentication
* fix: Convert all jobs to use a common resource and agent type
This enables a consistent API for project import and provisioned resources.
* Add "coder_workspace" data source
* feat: Remove magical parameters from being injected
This is a much cleaner abstraction. Explicitly declaring the user
parameters for each provisioner makes for significantly simpler
testing.
* feat: Add agent authentication based on instance ID
Each cloud has it's own unique instance identity signatures, which
can be used for zero-token authentication. This change adds support
for tracking by "instance_id", and automatically authenticating
with Google Cloud.
* Add test for CLI
* Fix workspace agent request name
* Fix race with adding to wait group
* Fix name of instance identity token
* Refactor parameter parsing to return nil values if none computed
* Refactor parameter to allow for hiding redisplay
* Refactor parameters to enable schema matching
* Refactor provisionerd to dynamically update parameter schemas
* Refactor job update for provisionerd
* Handle multiple states correctly when provisioning a project
* Add project import job resource table
* Basic creation flow works!
* Create project fully works!!!
* Only show job status if completed
* Add create workspace support
* Replace Netflix/go-expect with ActiveState
* Fix linting errors
* Use forked chzyer/readline
* Add create workspace CLI
* Add CLI test
* Move jobs to their own APIs
* Remove go-expect
* Fix requested changes
* Skip workspacecreate test on windows
* refactor: Rename ProjectParameter to ProjectVersionParameter
This was confusing with ParameterValue before. It still is a bit,
but this should help distinguish scope.
* Add project version resources table
* Allow project parameters to optionally have user and workspace
* Add dry run for provisioners
* Add resource detection on project import
* chore: Rename ProjectHistory to ProjectVersion
Version more accurately represents version storage. This
forks from the WorkspaceHistory name, but I think it's
easier to understand Workspace history.
* Rename files
* Standardize tests a bit more
* Remove Server struct from coderdtest
* Improve test coverage for workspace history
* Fix linting errors
* Fix coderd test leak
* Fix coderd test leak
* Improve workspace history logs
* Standardize test structure for codersdk
* Fix linting errors
* Fix WebSocket compression
* Update coderd/workspaces.go
Co-authored-by: Bryan <bryan@coder.com>
* Add test for listing project parameters
* Cache npm dependencies with setup node
* Remove windows npm cache key
Co-authored-by: Bryan <bryan@coder.com>
This replaces the cdr-basic provisioner type with
"echo". It reads binary data from the directory
and returns the responses in order.
This is used to test project and workspace job logic.
* feat: Add parameter querying to the API
* feat: Add streaming endpoint for workspace history
Enables a buildlog-like flow for reading job output.
* Fix empty parameter source and destination
* Add comment for usage of workspace history logs endpoint
Enforces a consistent test package layout.
This makes it difficult to test internal
functionality, which I believe promotes
healthy decomposition, and minimal package
exports.
* feat: Add history middleware parameters
These will be used for streaming logs, checking status,
and other operations related to workspace and project
history.
* refactor: Move all HTTP routes to top-level struct
Nesting all structs behind their respective structures
is leaky, and promotes naming conflicts between handlers.
Our HTTP routes cannot have conflicts, so neither should
function naming.
* Add provisioner daemon routes
* Add periodic updates
* Skip pubsub if short
* Return jobs with WorkspaceHistory
* Add endpoints for extracting singular history
* The full end-to-end operation works
* fix: Disable compression for websocket dRPC transport (#145)
There is a race condition in the interop between the websocket and `dRPC`: https://github.com/coder/coder/runs/5038545709?check_suite_focus=true#step:7:117 - it seems both the websocket and dRPC feel like they own the `byte[]` being sent between them. This can lead to data races, in which both `dRPC` and the websocket are writing.
This is just tracking some experimentation to fix that race condition
## Run results: ##
- Run 1: peer test failure
- Run 2: peer test failure
- Run 3: `TestWorkspaceHistory/CreateHistory` - https://github.com/coder/coder/runs/5040858460?check_suite_focus=true#step:8:45
```
status code 412: The provided project history is running. Wait for it to complete importing!`
```
- Run 4: `TestWorkspaceHistory/CreateHistory` - https://github.com/coder/coder/runs/5040957999?check_suite_focus=true#step:7:176
```
workspacehistory_test.go:122:
Error Trace: workspacehistory_test.go:122
Error: Condition never satisfied
Test: TestWorkspaceHistory/CreateHistory
```
- Run 5: peer failure
- Run 6: Pass ✅
- Run 7: Peer failure
## Open Questions: ##
### Is `dRPC` or `websocket` at fault for the data race?
It looks like this condition is specifically happening when `dRPC` decides to [`SendError`]). This constructs a new byte payload from [`MarshalError`](f6e369438f/drpcwire/error.go (L15)) - so `dRPC` has created this buffer and owns it.
From `dRPC`'s perspective, the callstack looks like this:
- [`sendPacket`](f6e369438f/drpcstream/stream.go (L253))
- [`writeFrame`](f6e369438f/drpcwire/writer.go (L65))
- [`AppendFrame`](f6e369438f/drpcwire/packet.go (L128))
- with finally the data race happening here:
```go
// AppendFrame appends a marshaled form of the frame to the provided buffer.
func AppendFrame(buf []byte, fr Frame) []byte {
...
out := buf
out = append(out, control). // <---------
```
This should be fine, since `dPRC` create this buffer, and is taking the byte buffer constructed from `MarshalError` and tacking a bunch of headers on it to create a proper frame.
Once `dRPC` is done writing, it _hangs onto the buffer and resets it here__: f6e369438f/drpcwire/writer.go (L73)
However... the websocket implementation, once it gets the buffer, it runs a `statelessDeflate` [here](8dee580a7f/write.go (L180)), which compresses the buffer on the fly. This functionality actually [mutates the buffer in place](a1a9cfc821/flate/stateless.go (L94)), which is where get our race.
In the case where the `byte[]` aren't being manipulated anywhere else, this compress-in-place operation would be safe, and that's probably the case for most over-the-wire usages. In this case, though, where we're plumbing `dRPC` -> websocket, they both are manipulating it (`dRPC` is reusing the buffer for the next `write`, and `websocket` is compressing on the fly).
### Why does cloning on `Read` fail?
Get a bunch of errors like:
```
2022/02/02 19:26:10 [WARN] yamux: frame for missing stream: Vsn:0 Type:0 Flags:0 StreamID:0 Length:0
2022/02/02 19:26:25 [ERR] yamux: Failed to read header: unexpected EOF
2022/02/02 19:26:25 [ERR] yamux: Failed to read header: unexpected EOF
2022/02/02 19:26:25 [WARN] yamux: frame for missing stream: Vsn:0 Type:0 Flags:0 StreamID:0 Length:0
```
# UPDATE:
We decided we could disable websocket compression, which would avoid the race because the in-place `deflate` operaton would no longer be run. Trying that out now:
- Run 1: ✅
- Run 2: https://github.com/coder/coder/runs/5042645522?check_suite_focus=true#step:8:338
- Run 3: ✅
- Run 4: https://github.com/coder/coder/runs/5042988758?check_suite_focus=true#step:7:168
- Run 5: ✅
* fix: Remove race condition with acquiredJobDone channel (#148)
Found another data race while running the tests: https://github.com/coder/coder/runs/5044320845?check_suite_focus=true#step:7:83
__Issue:__ There is a race in the p.acquiredJobDone chan - in particular, there can be a case where we're waiting on the channel to finish (in close) with <-p.acquiredJobDone, but in parallel, an acquireJob could've been started, which would create a new channel for p.acquiredJobDone. There is a similar race in `close(..)`ing the channel, which also came up in test runs.
__Fix:__ Instead of recreating the channel everytime, we can use `sync.WaitGroup` to accomplish the same functionality - a semaphore to make close wait for the current job to wrap up.
* fix: Bump up workspace history timeout (#149)
This is an attempted fix for failures like: https://github.com/coder/coder/runs/5043435263?check_suite_focus=true#step:7:32
Looking at the timing of the test:
```
t.go:56: 2022-02-02 21:33:21.964 [DEBUG] (terraform-provisioner) <provision.go:139> ran apply
t.go:56: 2022-02-02 21:33:21.991 [DEBUG] (provisionerd) <provisionerd.go:162> skipping acquire; job is already running
t.go:56: 2022-02-02 21:33:22.050 [DEBUG] (provisionerd) <provisionerd.go:162> skipping acquire; job is already running
t.go:56: 2022-02-02 21:33:22.090 [DEBUG] (provisionerd) <provisionerd.go:162> skipping acquire; job is already running
t.go:56: 2022-02-02 21:33:22.140 [DEBUG] (provisionerd) <provisionerd.go:162> skipping acquire; job is already running
t.go:56: 2022-02-02 21:33:22.195 [DEBUG] (provisionerd) <provisionerd.go:162> skipping acquire; job is already running
t.go:56: 2022-02-02 21:33:22.240 [DEBUG] (provisionerd) <provisionerd.go:162> skipping acquire; job is already running
workspacehistory_test.go:122:
Error Trace: workspacehistory_test.go:122
Error: Condition never satisfied
Test: TestWorkspaceHistory/CreateHistory
```
It appears that the `terraform apply` job had just finished - with less than a second to spare until our `require.Eventually` completes - but there's still work to be done (ie, collecting the state files). So my suspicion is that terraform might, in some cases, exceed our 5s timeout.
Note that in the setup for this test - there is a similar project history wait that waits for 15s, so I borrowed that here.
In the future - we can look at potentially using a simple echo provider to exercise this in the unit test, in a way that is more reliable in terms of timing. I'll log an issue to track that.
Co-authored-by: Bryan <bryan@coder.com>
This brings an async service that parses and
provisions to life! It's separated from coderd
intentionally to allow for simpler testing.
Integration with coderd will come in another PR!
* fix: Synchronize peer logging with a channel
We were depending on the close mutex to properly
report connection state. This ensures the RTC
connection is properly closed before returning.
* Disable pion logging
* Remove buffer
* Try ICE servers
* Remove flushed
* Add diagram explaining handshake
* Fix candidate accept ordering
* Add debug logging to peerbroker
* Fix send ordering
* Lock adding ICE candidate
* Add test for negotiating out of order
* Reduce connection to a single negotiation channel
* Improve test times by pre-installing Terraform
* Lock remote session description being applied
* Organize conn
* Revert to multi-channel setup
* Properly close ICE gatherer
* Improve comments
* Try removing buffered candidates
* Buffer local and remote messages
* Log dTLS transport state
* Add pion logging
* feat: Add parameter and jobs database schema
This modifies a prior migration which is typically forbidden,
but because we're pre-production deployment I felt grouping
would be helpful to future contributors.
This adds database functions that are required for the provisioner
daemon and job queue logic.
* feat: Compute project build parameters
Adds a projectparameter package to compute build-time project
values for a provided scope.
This package will be used to return which variables are being
used for a build, and can visually indicate the hierarchy to
a user.
* Fix terraform provisioner
* Improve naming, abstract inject to consume scope
* Run CI on all branches
* chore: Fix golangci-lint configuration and patch errors
Due to misconfiguration of a linting rules directory, our linter has not been
working properly. This change fixes the configuration issue, and all remaining
linting errors.
* Fix race in peer logging
* Fix race and return
* Lock on bufferred amount low
* Fix mutex lock
* feat: Add authentication and personal user endpoint
This contribution adds a lot of scaffolding for the database fake
and testability of coderd.
A new endpoint "/user" is added to return the currently authenticated
user to the requester.
* Use TestMain to catch leak instead
* Add userpassword package
* Add WIP
* Add user auth
* Fix test
* Add comments
* Fix login response
* Fix order
* Fix generated code
* Update httpapi/httpapi.go
Co-authored-by: Bryan <bryan@coder.com>
Co-authored-by: Bryan <bryan@coder.com>
* feat: Create broker for negotiating connections
WebRTC require an exchange of encryption keys and network hops to connect. This package pipes the exchange over gRPC. This will be used in all connecting clients and agents.
* Regenerate protobuf definition
* Cache Go build and test
* Fix gRPC language with dRPC
Co-authored-by: Bryan <bryan@coder.com>
Co-authored-by: Bryan <bryan@coder.com>
* feat: Create provisioner abstraction
Creates a provisioner abstraction that takes prior art from the Terraform plugin system. It's safe to assume this code will change a lot when it becomes integrated with provisionerd.
Closes#10.
* Ignore generated files in diff view
* Check for unstaged file changes
* Install protoc-gen-go
* Use proper drpc plugin version
* Fix serve closed pipe
* Install sqlc with curl for speed
* Fix install command
* Format CI action
* Add linguist-generated and closed pipe test
* Cleanup code from comments
* Add dRPC comment
* Add Terraform installer for cross-platform
* Build provisioner tests on Linux only