coder

Commit Graph

Author	SHA1	Message	Date
Colin Adler	d2ae16dd22	fix: routinely ping agent websocket to ensure liveness (#5824 )	2023-01-23 20:05:29 +00:00
Marcin Tojek	971e36781b	chore: improve logging in provisionerd_test (#5353 )	2022-12-09 13:11:54 +01:00
Colin Adler	ab3b3d5fca	feat: add debouncing to provisionerd rpc calls (#5198 )	2022-12-01 16:54:53 -06:00
Mathias Fredriksson	8ff89c4288	fix: Fix flakeyness of TestProvisionerd/ReconnectAndComplete (#5169 )	2022-11-24 14:09:56 +02:00
Colin Adler	1f20cab110	fix: don't use yamux for in-memory provisioner{,d} streams (#5136 )	2022-11-22 12:19:32 -06:00
Ammar Bandukwala	97dbd4dc5d	Implement Quotas v3 (#5012 ) * provisioner/terraform: add cost to resource_metadata * provisionerd/runner: use Options struct * Complete provisionerd implementation * Add quota_allowance to groups * Combine Quota and RBAC licenses * Add Opts to InTx	2022-11-14 17:57:33 +00:00
Ammar Bandukwala	95fb59696e	Refactor Provisioner to distinguish Plan and Apply (#5036 )	2022-11-11 16:45:58 -06:00
Kyle Carberry	30281852d6	feat: Add buffering to provisioner job logs (#4918 ) * feat: Add buffering to provisioner job logs This should improve overall build performance, and especially under load. It removes the old `id` column on the `provisioner_job_logs` table and replaces it with an auto-incrementing big integer to preserve order. Funny enough, we never had to care about order before because inserts would at minimum be 1ms different. Now they aren't, so the order needs to be preserved. * Fix log buffering * Fix frontend log streaming * Fix JS test	2022-11-06 20:50:34 -06:00
Mathias Fredriksson	4730c589fe	chore: Use standardized test timeouts and delays (#3291 )	2022-08-01 15:45:05 +03:00
Mathias Fredriksson	6916d34458	fix: Fix cleanup in test helpers, prefer `defer` in tests (#3113 ) * fix: Change uses of t.Cleanup -> defer in test bodies Mixing t.Cleanup and defer can lead to unexpected order of execution. * fix: Ensure t.Cleanup is not aborted by require * chore: Add helper annotations	2022-07-25 19:22:02 +03:00
Kyle Carberry	4f1df88529	fix: Always output job failure reason in provisioner daemon tests (#2850 ) This flake can be seen here: https://github.com/coder/coder/runs/7186604615?check_suite_focus=true	2022-07-10 14:52:33 -05:00
Spike Curtis	22febc749a	provisionerd sends failed or complete last (#2732 ) * provisionerd sends failed or complete last Signed-off-by: Spike Curtis <spike@coder.com> * Move runner into package Signed-off-by: Spike Curtis <spike@coder.com> * Remove jobRunner interface Signed-off-by: Spike Curtis <spike@coder.com> * renames and slight reworking from code review Signed-off-by: Spike Curtis <spike@coder.com> * Reword comment about okToSend Signed-off-by: Spike Curtis <spike@coder.com>	2022-07-01 09:55:46 -07:00
Dean Sheather	6be8a373e0	feat: run a terraform plan before creating workspaces with the given template parameters (#1732 )	2022-06-02 00:44:53 +10:00
Colin Adler	98ccd0eb89	feat: add README parsing to template versions (#1500 )	2022-05-17 15:00:48 -05:00
Colin Adler	9319c39257	fix: additional provisionerd test double closes (#1267 )	2022-05-03 08:02:54 -05:00
Kyle Carberry	603b7da413	test: Wrap provisionerd channel closes in sync.Once (#1181 ) This caused a few flakes, so figured I'd tackle all of them: https://github.com/coder/coder/runs/6167856950?check_suite_focus=true#step:9:246	2022-04-26 09:44:16 -05:00
Kyle Carberry	947e8f9d2e	fix: Manually format external URL (#1168 ) path.Join escaped the double slash!	2022-04-25 22:22:31 +00:00
Kyle Carberry	c8e566fe42	feat: Add links for registering git key (#1125 )	2022-04-25 04:27:45 +00:00
Kyle Carberry	68f67c54b6	fix: Add sync.Once to prevent double close in test (#1124 ) https://github.com/coder/coder/runs/6151451291?check_suite_focus=true	2022-04-25 04:06:18 +00:00
Kyle Carberry	db7ed4d019	fix: Add resiliency to daemon connections (#1116 ) Connections could fail when massive payloads were transmitted. This fixes an upstream bug in dRPC where the connection would end with a context canceled if a message was too large. This adds retransmission of completion and failures too. If Coder somehow loses connection with a provisioner daemon, upon the next connection the state will be properly reported.	2022-04-24 20:33:19 -05:00
Kyle Carberry	02ad3f14f5	chore: Rename Projects to Templates (#880 ) Customer feedback indicated projects was a confusing name. After querying the team internally, it seemed unanimous that it is indeed a confusing name. Here's for a lil less confusion @ashmeer7 🥂	2022-04-06 12:42:40 -05:00
Kyle Carberry	a502a5fa14	feat: Add AWS instance identity authentication (#570 ) * feat: Add AWS instance identity authentication This allows zero-trust authentication for all AWS instances. Prior to this, AWS instances could be used by passing `CODER_TOKEN` as an environment variable to the startup script. AWS explicitly states that secrets should not be passed in startup scripts because it's user-readable. * Fix sha256 verbosity * Fix HTTP client being exposed on auth	2022-03-28 19:31:03 +00:00
Kyle Carberry	b33dec9d38	feat: Add stage to build logs (#577 ) * feat: Add stage to build logs This adds a stage property to logs, and refactors the job logs cliui. It also adds tests to the cliui for build logs! * Fix comments	2022-03-28 18:43:22 +00:00
Kyle Carberry	c451f4e685	feat: Add templates to create working release (#422 ) * Add templates * Move API structs to codersdk * Back to green tests! * It all works, but now with tea! 🧋 * It works! * Add cancellation to provisionerd * Tests pass! * Add deletion of workspaces and projects * Fix agent lock * Add clog * Fix linting errors * Remove unused CLI tests * Rename daemon to start * Fix leaking command * Fix promptui test * Update agent connection frequency * Skip login tests on Windows * Increase tunnel connect timeout * Fix templater * Lower test requirements * Fix embed * Disable promptui tests for Windows * Fix write newline * Fix PTY write newline * Fix CloseReader * Fix compilation on Windows * Fix linting error * Remove bubbletea * Cleanup readwriter * Use embedded templates instead of serving over API * Move templates to examples * Improve workspace create flow * Fix Windows build * Fix tests * Fix linting errors * Fix untar with extracting max size * Fix newline char	2022-03-22 13:17:50 -06:00
Kyle Carberry	bf0ae8f573	feat: Refactor API routes to use UUIDs instead of friendly names (#401 ) * Add client for agent * Cleanup code * Fix linting error * Rename routes to be simpler * Rename workspace history to workspace build * Refactor HTTP middlewares to use UUIDs * Cleanup routes * Compiles! * Fix files and organizations * Fix querying * Fix agent lock * Cleanup database abstraction * Add parameters * Fix linting errors * Fix log race * Lock on close wait * Fix log cleanup * Fix e2e tests * Fix upstream version of opencensus-go * Update coderdtest.go * Fix coverpkg * Fix codecov ignore	2022-03-07 11:40:54 -06:00
Kyle Carberry	ea5efbd37f	test: Fix flake with context.Cancelled in provisionerd (#386 ) This occurred because the context can cancel at the same time a response is sent. This isn't a bug, because the complete still occurs.	2022-03-01 01:56:54 +00:00
Kyle Carberry	fd5eceb0b8	test: Fix test flake panic in provisionerd (#383 ) Closes #382.	2022-02-28 13:10:34 -08:00
Kyle Carberry	9d2803e07a	feat: Add graceful exits to provisionerd (#372 ) * ci: Update DataDog GitHub branch to fallback to GITHUB_REF This was detecting branches, but not our "main" branch before. Hopefully this fixes it! * Add basic Terraform Provider * Rename post files to upload * Add tests for resources * Skip instance identity test * Add tests for ensuring agent gets passed through properly * Fix linting errors * Add echo path * Fix agent authentication * fix: Convert all jobs to use a common resource and agent type This enables a consistent API for project import and provisioned resources. * Add "coder_workspace" data source * feat: Remove magical parameters from being injected This is a much cleaner abstraction. Explicitly declaring the user parameters for each provisioner makes for significantly simpler testing. * feat: Add graceful exits to provisionerd Terraform (or other provisioners) may need to cleanup state, or cancel actions before exit. This adds the ability to gracefully exit provisionerd. * Fix cancel error check	2022-02-28 18:40:49 +00:00
Kyle Carberry	e5c95552cd	feat: Remove magical parameters from being injected (#371 ) * ci: Update DataDog GitHub branch to fallback to GITHUB_REF This was detecting branches, but not our "main" branch before. Hopefully this fixes it! * Add basic Terraform Provider * Rename post files to upload * Add tests for resources * Skip instance identity test * Add tests for ensuring agent gets passed through properly * Fix linting errors * Add echo path * Fix agent authentication * fix: Convert all jobs to use a common resource and agent type This enables a consistent API for project import and provisioned resources. * Add "coder_workspace" data source * feat: Remove magical parameters from being injected This is a much cleaner abstraction. Explicitly declaring the user parameters for each provisioner makes for significantly simpler testing.	2022-02-28 18:26:01 +00:00
Kyle Carberry	6bdef0697c	test: Fix race condition in provisionerd on cleanup (#322 ) These goroutines could be run after the pipe has already been closed. I'm not certain this resolves this specific leak: https://github.com/coder/coder/runs/5249481202?check_suite_focus=true#step:7:186 ...but I find it likely.	2022-02-18 10:43:56 -06:00
Kyle Carberry	154b9bce57	feat: Add "coder projects create" command (#246 ) * Refactor parameter parsing to return nil values if none computed * Refactor parameter to allow for hiding redisplay * Refactor parameters to enable schema matching * Refactor provisionerd to dynamically update parameter schemas * Refactor job update for provisionerd * Handle multiple states correctly when provisioning a project * Add project import job resource table * Basic creation flow works! * Create project fully works!!! * Only show job status if completed * Add create workspace support * Replace Netflix/go-expect with ActiveState * Fix linting errors * Use forked chzyer/readline * Add create workspace CLI * Add CLI test * Move jobs to their own APIs * Remove go-expect * Fix requested changes * Skip workspacecreate test on windows	2022-02-12 13:34:04 -06:00
Kyle Carberry	795bba2af4	feat: Add dry run for provisioners (#178 ) * refactor: Rename ProjectParameter to ProjectVersionParameter This was confusing with ParameterValue before. It still is a bit, but this should help distinguish scope. * Add project version resources table * Allow project parameters to optionally have user and workspace * Add dry run for provisioners * Add resource detection on project import	2022-02-07 19:35:18 -06:00
Kyle Carberry	ed705f6af2	refactor: Generalize log ownership to allow for scratch jobs (#182 ) * refactor: Generalize log ownership to allow for scratch jobs Importing may fail when creating a project. We don't want to lose this output, but we don't want to allow users to create a failing project. This generalizes logs to soon enable one-off situations where a user can upload their archive, create a project, and watch the output parse to completion. * Improve file table schema by using hash * Fix racey test by allowing logs before * Add debug logging for PostgreSQL insert	2022-02-07 15:32:37 -06:00
Kyle Carberry	1796dc6c2f	chore: Add test helpers to improve coverage (#166 ) * chore: Rename ProjectHistory to ProjectVersion Version more accurately represents version storage. This forks from the WorkspaceHistory name, but I think it's easier to understand Workspace history. * Rename files * Standardize tests a bit more * Remove Server struct from coderdtest * Improve test coverage for workspace history * Fix linting errors * Fix coderd test leak * Fix coderd test leak * Improve workspace history logs * Standardize test structure for codersdk * Fix linting errors * Fix WebSocket compression * Update coderd/workspaces.go Co-authored-by: Bryan <bryan@coder.com> * Add test for listing project parameters * Cache npm dependencies with setup node * Remove windows npm cache key Co-authored-by: Bryan <bryan@coder.com>	2022-02-05 18:24:51 -06:00
Kyle Carberry	e75bde4e31	feat: Add provisionerdaemon to coderd (#141 ) * feat: Add history middleware parameters These will be used for streaming logs, checking status, and other operations related to workspace and project history. * refactor: Move all HTTP routes to top-level struct Nesting all structs behind their respective structures is leaky, and promotes naming conflicts between handlers. Our HTTP routes cannot have conflicts, so neither should function naming. * Add provisioner daemon routes * Add periodic updates * Skip pubsub if short * Return jobs with WorkspaceHistory * Add endpoints for extracting singular history * The full end-to-end operation works * fix: Disable compression for websocket dRPC transport (#145) There is a race condition in the interop between the websocket and `dRPC`: https://github.com/coder/coder/runs/5038545709?check_suite_focus=true#step:7:117 - it seems both the websocket and dRPC feel like they own the `byte[]` being sent between them. This can lead to data races, in which both `dRPC` and the websocket are writing. This is just tracking some experimentation to fix that race condition ## Run results: ## - Run 1: peer test failure - Run 2: peer test failure - Run 3: `TestWorkspaceHistory/CreateHistory` - https://github.com/coder/coder/runs/5040858460?check_suite_focus=true#step:8:45 ``` status code 412: The provided project history is running. Wait for it to complete importing!` ``` - Run 4: `TestWorkspaceHistory/CreateHistory` - https://github.com/coder/coder/runs/5040957999?check_suite_focus=true#step:7:176 ``` workspacehistory_test.go:122: Error Trace: workspacehistory_test.go:122 Error: Condition never satisfied Test: TestWorkspaceHistory/CreateHistory ``` - Run 5: peer failure - Run 6: Pass ✅ - Run 7: Peer failure ## Open Questions: ## ### Is `dRPC` or `websocket` at fault for the data race? It looks like this condition is specifically happening when `dRPC` decides to [`SendError`]). 
This constructs a new byte payload from [`MarshalError`](`f6e369438f/drpcwire/error.go (L15)`) - so `dRPC` has created this buffer and owns it. From `dRPC`'s perspective, the callstack looks like this: - [`sendPacket`](`f6e369438f/drpcstream/stream.go (L253)`) - [`writeFrame`](`f6e369438f/drpcwire/writer.go (L65)`) - [`AppendFrame`](`f6e369438f/drpcwire/packet.go (L128)`) - with finally the data race happening here: ```go // AppendFrame appends a marshaled form of the frame to the provided buffer. func AppendFrame(buf []byte, fr Frame) []byte { ... out := buf out = append(out, control) // <--------- ``` This should be fine, since `dRPC` created this buffer, and is taking the byte buffer constructed from `MarshalError` and tacking a bunch of headers on it to create a proper frame. Once `dRPC` is done writing, it _hangs onto the buffer and resets it here_: `f6e369438f/drpcwire/writer.go (L73)` However... the websocket implementation, once it gets the buffer, runs a `statelessDeflate` [here](`8dee580a7f/write.go (L180)`), which compresses the buffer on the fly. This functionality actually [mutates the buffer in place](`a1a9cfc821/flate/stateless.go (L94)`), which is where we get our race. In the case where the `byte[]` aren't being manipulated anywhere else, this compress-in-place operation would be safe, and that's probably the case for most over-the-wire usages. In this case, though, where we're plumbing `dRPC` -> websocket, they both are manipulating it (`dRPC` is reusing the buffer for the next `write`, and `websocket` is compressing on the fly). ### Why does cloning on `Read` fail? 
Get a bunch of errors like: ``` 2022/02/02 19:26:10 [WARN] yamux: frame for missing stream: Vsn:0 Type:0 Flags:0 StreamID:0 Length:0 2022/02/02 19:26:25 [ERR] yamux: Failed to read header: unexpected EOF 2022/02/02 19:26:25 [ERR] yamux: Failed to read header: unexpected EOF 2022/02/02 19:26:25 [WARN] yamux: frame for missing stream: Vsn:0 Type:0 Flags:0 StreamID:0 Length:0 ``` # UPDATE: We decided we could disable websocket compression, which would avoid the race because the in-place `deflate` operation would no longer be run. Trying that out now: - Run 1: ✅ - Run 2: https://github.com/coder/coder/runs/5042645522?check_suite_focus=true#step:8:338 - Run 3: ✅ - Run 4: https://github.com/coder/coder/runs/5042988758?check_suite_focus=true#step:7:168 - Run 5: ✅ * fix: Remove race condition with acquiredJobDone channel (#148) Found another data race while running the tests: https://github.com/coder/coder/runs/5044320845?check_suite_focus=true#step:7:83 __Issue:__ There is a race in the p.acquiredJobDone chan - in particular, there can be a case where we're waiting on the channel to finish (in close) with <-p.acquiredJobDone, but in parallel, an acquireJob could've been started, which would create a new channel for p.acquiredJobDone. There is a similar race in `close(..)`ing the channel, which also came up in test runs. __Fix:__ Instead of recreating the channel every time, we can use `sync.WaitGroup` to accomplish the same functionality - a semaphore to make close wait for the current job to wrap up. 
* fix: Bump up workspace history timeout (#149) This is an attempted fix for failures like: https://github.com/coder/coder/runs/5043435263?check_suite_focus=true#step:7:32 Looking at the timing of the test: ``` t.go:56: 2022-02-02 21:33:21.964 [DEBUG] (terraform-provisioner) <provision.go:139> ran apply t.go:56: 2022-02-02 21:33:21.991 [DEBUG] (provisionerd) <provisionerd.go:162> skipping acquire; job is already running t.go:56: 2022-02-02 21:33:22.050 [DEBUG] (provisionerd) <provisionerd.go:162> skipping acquire; job is already running t.go:56: 2022-02-02 21:33:22.090 [DEBUG] (provisionerd) <provisionerd.go:162> skipping acquire; job is already running t.go:56: 2022-02-02 21:33:22.140 [DEBUG] (provisionerd) <provisionerd.go:162> skipping acquire; job is already running t.go:56: 2022-02-02 21:33:22.195 [DEBUG] (provisionerd) <provisionerd.go:162> skipping acquire; job is already running t.go:56: 2022-02-02 21:33:22.240 [DEBUG] (provisionerd) <provisionerd.go:162> skipping acquire; job is already running workspacehistory_test.go:122: Error Trace: workspacehistory_test.go:122 Error: Condition never satisfied Test: TestWorkspaceHistory/CreateHistory ``` It appears that the `terraform apply` job had just finished - with less than a second to spare until our `require.Eventually` completes - but there's still work to be done (ie, collecting the state files). So my suspicion is that terraform might, in some cases, exceed our 5s timeout. Note that in the setup for this test - there is a similar project history wait that waits for 15s, so I borrowed that here. In the future - we can look at potentially using a simple echo provider to exercise this in the unit test, in a way that is more reliable in terms of timing. I'll log an issue to track that. Co-authored-by: Bryan <bryan@coder.com>	2022-02-03 20:34:50 +00:00
Kyle Carberry	3ba8242764	feat: Add provisionerd service (#127 ) This brings an async service that parses and provisions to life! It's separated from coderd intentionally to allow for simpler testing. Integration with coderd will come in another PR!	2022-02-01 12:15:54 -06:00

36 Commits