package coderdtest
import (
	"bytes"
	"context"
	"crypto"
	"crypto/rand"
	"crypto/rsa"
	"crypto/sha256"
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"database/sql"
	"encoding/base64"
	"encoding/json"
	"encoding/pem"
	"errors"
	"fmt"
"io"
	"math/big"
	"net"
	"net/http"
	"net/http/httptest"
	"net/url"
	"regexp"
	"strconv"
	"strings"
	"sync"
	"sync/atomic"
	"testing"
"time"

	"cloud.google.com/go/compute/metadata"
	"github.com/fullsailor/pkcs7"
	"github.com/golang-jwt/jwt/v4"
	"github.com/google/uuid"
	"github.com/moby/moby/pkg/namesgenerator"
	"github.com/prometheus/client_golang/prometheus"
	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
	"golang.org/x/xerrors"
	"google.golang.org/api/idtoken"
	"google.golang.org/api/option"
	"tailscale.com/derp"
	"tailscale.com/net/stun/stuntest"
	"tailscale.com/tailcfg"
	"tailscale.com/types/key"
	"tailscale.com/types/nettype"
	"cdr.dev/slog"
	"cdr.dev/slog/sloggers/sloghuman"
	"cdr.dev/slog/sloggers/slogtest"

	"github.com/coder/coder/v2/coderd"
	"github.com/coder/coder/v2/coderd/audit"
	"github.com/coder/coder/v2/coderd/autobuild"
	"github.com/coder/coder/v2/coderd/awsidentity"
	"github.com/coder/coder/v2/coderd/batchstats"
	"github.com/coder/coder/v2/coderd/database"
	"github.com/coder/coder/v2/coderd/database/dbauthz"
	"github.com/coder/coder/v2/coderd/database/dbtestutil"
	"github.com/coder/coder/v2/coderd/database/pubsub"
	"github.com/coder/coder/v2/coderd/externalauth"
	"github.com/coder/coder/v2/coderd/gitsshkey"
	"github.com/coder/coder/v2/coderd/healthcheck"
	"github.com/coder/coder/v2/coderd/httpapi"
	"github.com/coder/coder/v2/coderd/httpmw"
	"github.com/coder/coder/v2/coderd/rbac"
	"github.com/coder/coder/v2/coderd/schedule"
	"github.com/coder/coder/v2/coderd/telemetry"
	"github.com/coder/coder/v2/coderd/unhanger"
	"github.com/coder/coder/v2/coderd/updatecheck"
	"github.com/coder/coder/v2/coderd/util/ptr"
	"github.com/coder/coder/v2/coderd/workspaceapps"
	"github.com/coder/coder/v2/codersdk"
	"github.com/coder/coder/v2/codersdk/agentsdk"
	"github.com/coder/coder/v2/cryptorand"
	"github.com/coder/coder/v2/provisioner/echo"
	"github.com/coder/coder/v2/provisionerd"
	provisionerdproto "github.com/coder/coder/v2/provisionerd/proto"
	"github.com/coder/coder/v2/provisionersdk"
	sdkproto "github.com/coder/coder/v2/provisionersdk/proto"
	"github.com/coder/coder/v2/tailnet"
	"github.com/coder/coder/v2/testutil"
)

// AppSecurityKey is a 96-byte key used to sign JWTs and encrypt JWEs for
// workspace app tokens in tests.
var AppSecurityKey = must(workspaceapps.KeyFromString("6465616e207761732068657265206465616e207761732068657265206465616e207761732068657265206465616e207761732068657265206465616e207761732068657265206465616e207761732068657265206465616e2077617320686572"))
type Options struct {
	// AccessURL denotes a custom access URL. By default we use the httptest
	// server's URL. Setting this may result in unexpected behavior (especially
	// with running agents).
	AccessURL             *url.URL
	AppHostname           string
	AWSCertificates       awsidentity.Certificates
	Authorizer            rbac.Authorizer
	AzureCertificates     x509.VerifyOptions
	GithubOAuth2Config    *coderd.GithubOAuth2Config
	RealIPConfig          *httpmw.RealIPConfig
	OIDCConfig            *coderd.OIDCConfig
	GoogleTokenValidator  *idtoken.Validator
	SSHKeygenAlgorithm    gitsshkey.Algorithm
	AutobuildTicker       <-chan time.Time
	AutobuildStats        chan<- autobuild.Stats
	Auditor               audit.Auditor
	TLSCertificates       []tls.Certificate
	ExternalAuthConfigs   []*externalauth.Config
	TrialGenerator        func(context.Context, string) error
	TemplateScheduleStore schedule.TemplateScheduleStore
	Coordinator           tailnet.Coordinator

	HealthcheckFunc    func(ctx context.Context, apiKey string) *healthcheck.Report
	HealthcheckTimeout time.Duration
	HealthcheckRefresh time.Duration

	// All rate limits default to -1 (unlimited) in tests if not set.
	APIRateLimit   int
	LoginRateLimit int
	FilesRateLimit int

	// IncludeProvisionerDaemon when true means to start an in-memory provisionerd.
	IncludeProvisionerDaemon    bool
	MetricsCacheRefreshInterval time.Duration
	AgentStatsRefreshInterval   time.Duration
	DeploymentValues            *codersdk.DeploymentValues

	// Set update check options to enable update check.
	UpdateCheckOptions *updatecheck.Options

	// Overriding the database is heavily discouraged.
	// It should only be used in cases where multiple Coder
	// test instances are running against the same database.
	Database database.Store
	Pubsub   pubsub.Pubsub

	ConfigSSH codersdk.SSHConfigResponse

	SwaggerEndpoint bool

	// Logger should only be overridden if you expect errors
	// as part of your test.
	Logger       *slog.Logger
	StatsBatcher *batchstats.Batcher

	WorkspaceAppsStatsCollectorOptions workspaceapps.StatsCollectorOptions
}

// New constructs a codersdk client connected to an in-memory API instance.
func New(t testing.TB, options *Options) *codersdk.Client {
	client, _ := newWithCloser(t, options)
	return client
}

// NewWithDatabase constructs a codersdk client connected to an in-memory API instance.
// The database is returned to provide direct data manipulation for tests.
func NewWithDatabase(t testing.TB, options *Options) (*codersdk.Client, database.Store) {
	client, _, api := NewWithAPI(t, options)
	return client, api.Database
}

// NewWithProvisionerCloser returns a client as well as a handle to close
// the provisioner. This is a temporary function while work is done to
// standardize how provisioners are registered with coderd. The option
// to include a provisioner is set to true for convenience.
func NewWithProvisionerCloser(t testing.TB, options *Options) (*codersdk.Client, io.Closer) {
	if options == nil {
		options = &Options{}
	}
	options.IncludeProvisionerDaemon = true
	client, closer := newWithCloser(t, options)
	return client, closer
}

// newWithCloser constructs a codersdk client connected to an in-memory API instance.
// The returned closer closes a provisioner if one was provided.
// The API is intentionally not returned here because coderd tests should not
// require a handle to the API. Do not expose the API or wrath shall descend
// upon thee. Even the io.Closer that is exposed here shouldn't be exposed
// and is a temporary measure while the API to register provisioners is ironed
// out.
func newWithCloser(t testing.TB, options *Options) (*codersdk.Client, io.Closer) {
	client, closer, _ := NewWithAPI(t, options)
	return client, closer
}
2023-05-08 13:59:01 +00:00
func NewOptions ( t testing . TB , options * Options ) ( func ( http . Handler ) , context . CancelFunc , * url . URL , * coderd . Options ) {
2023-06-13 17:18:31 +00:00
t . Helper ( )
2022-02-21 20:36:29 +00:00
if options == nil {
options = & Options { }
}
2023-10-06 09:27:12 +00:00
if options . Logger == nil {
logger := slogtest . Make ( t , nil ) . Leveled ( slog . LevelDebug )
options . Logger = & logger
}
2022-04-19 13:48:13 +00:00
if options . GoogleTokenValidator == nil {
2022-02-21 20:36:29 +00:00
ctx , cancelFunc := context . WithCancel ( context . Background ( ) )
t . Cleanup ( cancelFunc )
var err error
2022-04-19 13:48:13 +00:00
options . GoogleTokenValidator , err = idtoken . NewValidator ( ctx , option . WithoutAuthentication ( ) )
2022-02-21 20:36:29 +00:00
require . NoError ( t , err )
}
2022-05-13 16:14:24 +00:00
if options . AutobuildTicker == nil {
2022-05-11 22:03:02 +00:00
ticker := make ( chan time . Time )
2022-05-13 16:14:24 +00:00
options . AutobuildTicker = ticker
2022-05-11 22:03:02 +00:00
t . Cleanup ( func ( ) { close ( ticker ) } )
}
	if options.AutobuildStats != nil {
		t.Cleanup(func() {
			close(options.AutobuildStats)
		})
	}
	if options.Authorizer == nil {
		defAuth := rbac.NewCachingAuthorizer(prometheus.NewRegistry())
		if _, ok := t.(*testing.T); ok {
			options.Authorizer = &RecordingAuthorizer{
				Wrapped: defAuth,
			}
		} else {
			// In benchmarks, the recording authorizer greatly skews results.
			options.Authorizer = defAuth
		}
	}
	if options.Database == nil {
		options.Database, options.Pubsub = dbtestutil.NewDB(t)
	}

	// Some routes expect a deployment ID, so just make sure one exists.
	// Check first in case the caller already set up this database.
	// nolint:gocritic // Setting up unit test data inside test helper.
	depID, err := options.Database.GetDeploymentID(dbauthz.AsSystemRestricted(context.Background()))
	if xerrors.Is(err, sql.ErrNoRows) || depID == "" {
		// nolint:gocritic // Setting up unit test data inside test helper.
		err := options.Database.InsertDeploymentID(dbauthz.AsSystemRestricted(context.Background()), uuid.NewString())
		require.NoError(t, err, "insert a deployment id")
	}

	if options.DeploymentValues == nil {
		options.DeploymentValues = DeploymentValues(t)
	}
	// This value is not safe to run in parallel. Force it to be false.
	options.DeploymentValues.DisableOwnerWorkspaceExec = false
	// If no rate limits are set, disable all rate limiting for tests.
	if options.APIRateLimit == 0 {
		options.APIRateLimit = -1
	}
	if options.LoginRateLimit == 0 {
		options.LoginRateLimit = -1
	}
	if options.FilesRateLimit == 0 {
		options.FilesRateLimit = -1
	}
	if options.StatsBatcher == nil {
		ctx, cancel := context.WithCancel(context.Background())
		t.Cleanup(cancel)
		batcher, closeBatcher, err := batchstats.New(ctx,
			batchstats.WithStore(options.Database),
			// Avoid cluttering up test output.
			batchstats.WithLogger(slog.Make(sloghuman.Sink(io.Discard))),
		)
		require.NoError(t, err, "create stats batcher")
		options.StatsBatcher = batcher
		t.Cleanup(closeBatcher)
	}
	accessControlStore := &atomic.Pointer[dbauthz.AccessControlStore]{}
	var acs dbauthz.AccessControlStore = dbauthz.AGPLTemplateAccessControlStore{}
	accessControlStore.Store(&acs)

	var templateScheduleStore atomic.Pointer[schedule.TemplateScheduleStore]
	if options.TemplateScheduleStore == nil {
		options.TemplateScheduleStore = schedule.NewAGPLTemplateScheduleStore()
	}
	templateScheduleStore.Store(&options.TemplateScheduleStore)

	var auditor atomic.Pointer[audit.Auditor]
	if options.Auditor == nil {
		options.Auditor = audit.NewNop()
	}
	auditor.Store(&options.Auditor)
	ctx, cancelFunc := context.WithCancel(context.Background())
	lifecycleExecutor := autobuild.NewExecutor(
		ctx,
		options.Database,
		options.Pubsub,
		&templateScheduleStore,
		&auditor,
		accessControlStore,
		*options.Logger,
		options.AutobuildTicker,
	).WithStatsChannel(options.AutobuildStats)
	lifecycleExecutor.Run()

	hangDetectorTicker := time.NewTicker(options.DeploymentValues.JobHangDetectorInterval.Value())
	defer hangDetectorTicker.Stop()
	hangDetector := unhanger.New(ctx, options.Database, options.Pubsub, options.Logger.Named("unhanger.detector"), hangDetectorTicker.C)
	hangDetector.Start()
	t.Cleanup(hangDetector.Close)
	var mutex sync.RWMutex
	var handler http.Handler
	srv := httptest.NewUnstartedServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		mutex.RLock()
		handler := handler
		mutex.RUnlock()
		if handler != nil {
			handler.ServeHTTP(w, r)
		}
	}))
	srv.Config.BaseContext = func(_ net.Listener) context.Context {
		return ctx
	}
	if options.TLSCertificates != nil {
		srv.TLS = &tls.Config{
			Certificates: options.TLSCertificates,
			MinVersion:   tls.VersionTLS12,
		}
		srv.StartTLS()
	} else {
		srv.Start()
	}
	t.Cleanup(srv.Close)

	tcpAddr, ok := srv.Listener.Addr().(*net.TCPAddr)
	require.True(t, ok)
	serverURL, err := url.Parse(srv.URL)
	require.NoError(t, err)
	serverURL.Host = fmt.Sprintf("localhost:%d", tcpAddr.Port)

	derpPort, err := strconv.Atoi(serverURL.Port())
	require.NoError(t, err)

	accessURL := options.AccessURL
	if accessURL == nil {
		accessURL = serverURL
	}

	// If the STUNAddresses setting is empty or the default, start a STUN
	// server. Otherwise, use the value as-is.
	var (
		stunAddresses   []string
		dvStunAddresses = options.DeploymentValues.DERP.Server.STUNAddresses.Value()
	)
	if len(dvStunAddresses) == 0 || dvStunAddresses[0] == "stun.l.google.com:19302" {
		stunAddr, stunCleanup := stuntest.ServeWithPacketListener(t, nettype.Std{})
		stunAddr.IP = net.ParseIP("127.0.0.1")
		t.Cleanup(stunCleanup)
		stunAddresses = []string{stunAddr.String()}
		options.DeploymentValues.DERP.Server.STUNAddresses = stunAddresses
	} else if dvStunAddresses[0] != tailnet.DisableSTUN {
		stunAddresses = options.DeploymentValues.DERP.Server.STUNAddresses.Value()
	}
	derpServer := derp.NewServer(key.NewNode(), tailnet.Logger(options.Logger.Named("derp").Leveled(slog.LevelDebug)))
	derpServer.SetMeshKey("test-key")

	// Match the CLI default.
	if options.SSHKeygenAlgorithm == "" {
		options.SSHKeygenAlgorithm = gitsshkey.AlgorithmEd25519
	}

	var appHostnameRegex *regexp.Regexp
	if options.AppHostname != "" {
		var err error
		appHostnameRegex, err = httpapi.CompileHostnamePattern(options.AppHostname)
		require.NoError(t, err)
	}

	region := &tailcfg.DERPRegion{
		EmbeddedRelay: true,
		RegionID:      int(options.DeploymentValues.DERP.Server.RegionID.Value()),
		RegionCode:    options.DeploymentValues.DERP.Server.RegionCode.String(),
		RegionName:    options.DeploymentValues.DERP.Server.RegionName.String(),
		Nodes: []*tailcfg.DERPNode{{
			Name:     fmt.Sprintf("%db", options.DeploymentValues.DERP.Server.RegionID),
			RegionID: int(options.DeploymentValues.DERP.Server.RegionID.Value()),
			IPv4:     "127.0.0.1",
			DERPPort: derpPort,
			// The STUN port is added as a separate node by tailnet.NewDERPMap()
			// if direct connections are enabled.
			STUNPort:         -1,
			InsecureForTests: true,
			ForceHTTP:        options.TLSCertificates == nil,
		}},
	}
	if !options.DeploymentValues.DERP.Server.Enable.Value() {
		region = nil
	}
	derpMap, err := tailnet.NewDERPMap(ctx, region, stunAddresses, "", "", options.DeploymentValues.DERP.Config.BlockDirect.Value())
	require.NoError(t, err)
	return func(h http.Handler) {
			mutex.Lock()
			defer mutex.Unlock()
			handler = h
		}, cancelFunc, serverURL, &coderd.Options{
			AgentConnectionUpdateFrequency: 150 * time.Millisecond,
			// Force a long disconnection timeout to ensure
			// agents are not marked as disconnected during slow tests.
			AgentInactiveDisconnectTimeout: testutil.WaitShort,
			AccessURL:                      accessURL,
			AppHostname:                    options.AppHostname,
			AppHostnameRegex:               appHostnameRegex,
			Logger:                         *options.Logger,
			CacheDir:                       t.TempDir(),
			Database:                       options.Database,
			Pubsub:                         options.Pubsub,
			ExternalAuthConfigs:            options.ExternalAuthConfigs,

			Auditor:                            options.Auditor,
			AWSCertificates:                    options.AWSCertificates,
			AzureCertificates:                  options.AzureCertificates,
			GithubOAuth2Config:                 options.GithubOAuth2Config,
			RealIPConfig:                       options.RealIPConfig,
			OIDCConfig:                         options.OIDCConfig,
			GoogleTokenValidator:               options.GoogleTokenValidator,
			SSHKeygenAlgorithm:                 options.SSHKeygenAlgorithm,
			DERPServer:                         derpServer,
			APIRateLimit:                       options.APIRateLimit,
			LoginRateLimit:                     options.LoginRateLimit,
			FilesRateLimit:                     options.FilesRateLimit,
			Authorizer:                         options.Authorizer,
			Telemetry:                          telemetry.NewNoop(),
			TemplateScheduleStore:              &templateScheduleStore,
			AccessControlStore:                 accessControlStore,
			TLSCertificates:                    options.TLSCertificates,
			TrialGenerator:                     options.TrialGenerator,
			TailnetCoordinator:                 options.Coordinator,
			BaseDERPMap:                        derpMap,
			DERPMapUpdateFrequency:             150 * time.Millisecond,
			MetricsCacheRefreshInterval:        options.MetricsCacheRefreshInterval,
			AgentStatsRefreshInterval:          options.AgentStatsRefreshInterval,
			DeploymentValues:                   options.DeploymentValues,
			DeploymentOptions:                  codersdk.DeploymentOptionsWithoutSecrets(options.DeploymentValues.Options()),
			UpdateCheckOptions:                 options.UpdateCheckOptions,
			SwaggerEndpoint:                    options.SwaggerEndpoint,
			AppSecurityKey:                     AppSecurityKey,
			SSHConfig:                          options.ConfigSSH,
			HealthcheckFunc:                    options.HealthcheckFunc,
			HealthcheckTimeout:                 options.HealthcheckTimeout,
			HealthcheckRefresh:                 options.HealthcheckRefresh,
			StatsBatcher:                       options.StatsBatcher,
			WorkspaceAppsStatsCollectorOptions: options.WorkspaceAppsStatsCollectorOptions,
		}
}
// NewWithAPI constructs an in-memory API instance and returns a client to talk
// to it. Most tests never need a reference to the API, but AuthorizationTest
// in this module uses it. Do not expose the API or wrath shall descend upon
// thee.
func NewWithAPI(t testing.TB, options *Options) (*codersdk.Client, io.Closer, *coderd.API) {
	if options == nil {
		options = &Options{}
	}
	setHandler, cancelFunc, serverURL, newOptions := NewOptions(t, options)
	// We set the handler after server creation for the access URL.
	coderAPI := coderd.New(newOptions)
	setHandler(coderAPI.RootHandler)
	var provisionerCloser io.Closer = nopcloser{}
	if options.IncludeProvisionerDaemon {
		provisionerCloser = NewProvisionerDaemon(t, coderAPI)
	}
	client := codersdk.New(serverURL)
	t.Cleanup(func() {
		cancelFunc()
		_ = provisionerCloser.Close()
		_ = coderAPI.Close()
		client.HTTPClient.CloseIdleConnections()
	})
	return client, provisionerCloser, coderAPI
}
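
// A typical use of these helpers looks like the following (illustrative
// sketch only; CreateFirstUser and the option values shown are assumptions
// about the surrounding test suite, not part of this function):
//
//	client := coderdtest.New(t, &coderdtest.Options{
//		IncludeProvisionerDaemon: true,
//	})
//	_ = coderdtest.CreateFirstUser(t, client)
//
// Cleanup of the API, the provisioner daemon, and idle HTTP connections is
// registered via t.Cleanup, so tests do not need to close anything manually.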
// provisionerdCloser wraps a provisioner daemon as an io.Closer that can be
// called multiple times.
type provisionerdCloser struct {
	mu     sync.Mutex
	closed bool
	d      *provisionerd.Server
}

func (c *provisionerdCloser) Close() error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.closed {
		return nil
	}
	c.closed = true
	ctx, cancel := context.WithTimeout(context.Background(), testutil.WaitShort)
	defer cancel()
	shutdownErr := c.d.Shutdown(ctx)
	closeErr := c.d.Close()
	if shutdownErr != nil {
		return shutdownErr
	}
	return closeErr
}
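
// Because Close is idempotent, callers may close eagerly in a test body even
// though the same closer is also closed during t.Cleanup; the second call is
// a no-op. An illustrative sketch (assuming a *coderd.API named coderAPI):
//
//	closer := coderdtest.NewProvisionerDaemon(t, coderAPI)
//	defer closer.Close() // safe even if t.Cleanup closes it again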
// NewProvisionerDaemon launches a provisionerd instance configured to work
// well with coderd testing. It registers the "echo" provisioner for
// quick testing.
func NewProvisionerDaemon(t testing.TB, coderAPI *coderd.API) io.Closer {
	t.Helper()

	// t.Cleanup runs in last-added, first-called order. t.TempDir() will
	// delete the directory on cleanup, so we want to make sure the echoServer
	// is closed before we attempt to delete its work directory. It also seems
	// that t.TempDir() is not safe to call from a different goroutine.
	workDir := t.TempDir()

	echoClient, echoServer := provisionersdk.MemTransportPipe()
	ctx, cancelFunc := context.WithCancel(context.Background())
	t.Cleanup(func() {
		_ = echoClient.Close()
		_ = echoServer.Close()
		cancelFunc()
	})
feat: Add provisionerdaemon to coderd (#141)
* feat: Add history middleware parameters
These will be used for streaming logs, checking status,
and other operations related to workspace and project
history.
* refactor: Move all HTTP routes to top-level struct
Nesting all structs behind their respective structures
is leaky, and promotes naming conflicts between handlers.
Our HTTP routes cannot have conflicts, so neither should
function naming.
* Add provisioner daemon routes
* Add periodic updates
* Skip pubsub if short
* Return jobs with WorkspaceHistory
* Add endpoints for extracting singular history
* The full end-to-end operation works
* fix: Disable compression for websocket dRPC transport (#145)
There is a race condition in the interop between the websocket and `dRPC`: https://github.com/coder/coder/runs/5038545709?check_suite_focus=true#step:7:117 - it seems both the websocket and dRPC feel like they own the `byte[]` being sent between them. This can lead to data races, in which both `dRPC` and the websocket are writing.
This is just tracking some experimentation to fix that race condition
## Run results: ##
- Run 1: peer test failure
- Run 2: peer test failure
- Run 3: `TestWorkspaceHistory/CreateHistory` - https://github.com/coder/coder/runs/5040858460?check_suite_focus=true#step:8:45
```
status code 412: The provided project history is running. Wait for it to complete importing!`
```
- Run 4: `TestWorkspaceHistory/CreateHistory` - https://github.com/coder/coder/runs/5040957999?check_suite_focus=true#step:7:176
```
workspacehistory_test.go:122:
Error Trace: workspacehistory_test.go:122
Error: Condition never satisfied
Test: TestWorkspaceHistory/CreateHistory
```
- Run 5: peer failure
- Run 6: Pass ✅
- Run 7: Peer failure
## Open Questions: ##
### Is `dRPC` or `websocket` at fault for the data race?
It looks like this condition is specifically happening when `dRPC` decides to [`SendError`]). This constructs a new byte payload from [`MarshalError`](https://github.com/storj/drpc/blob/f6e369438f636b47ee788095d3fc13062ffbd019/drpcwire/error.go#L15) - so `dRPC` has created this buffer and owns it.
From `dRPC`'s perspective, the callstack looks like this:
- [`sendPacket`](https://github.com/storj/drpc/blob/f6e369438f636b47ee788095d3fc13062ffbd019/drpcstream/stream.go#L253)
- [`writeFrame`](https://github.com/storj/drpc/blob/f6e369438f636b47ee788095d3fc13062ffbd019/drpcwire/writer.go#L65)
- [`AppendFrame`](https://github.com/storj/drpc/blob/f6e369438f636b47ee788095d3fc13062ffbd019/drpcwire/packet.go#L128)
- with finally the data race happening here:
```go
// AppendFrame appends a marshaled form of the frame to the provided buffer.
func AppendFrame(buf []byte, fr Frame) []byte {
...
out := buf
out = append(out, control). // <---------
```
This should be fine, since `dPRC` create this buffer, and is taking the byte buffer constructed from `MarshalError` and tacking a bunch of headers on it to create a proper frame.
Once `dRPC` is done writing, it _hangs onto the buffer and resets it here__: https://github.com/storj/drpc/blob/f6e369438f636b47ee788095d3fc13062ffbd019/drpcwire/writer.go#L73
However... the websocket implementation, once it gets the buffer, it runs a `statelessDeflate` [here](https://github.com/nhooyr/websocket/blob/8dee580a7f74cf1713400307b4eee514b927870f/write.go#L180), which compresses the buffer on the fly. This functionality actually [mutates the buffer in place](https://github.com/klauspost/compress/blob/a1a9cfc821f00faf2f5231beaa96244344d50391/flate/stateless.go#L94), which is where get our race.
In the case where the `byte[]` aren't being manipulated anywhere else, this compress-in-place operation would be safe, and that's probably the case for most over-the-wire usages. In this case, though, where we're plumbing `dRPC` -> websocket, they both are manipulating it (`dRPC` is reusing the buffer for the next `write`, and `websocket` is compressing on the fly).
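The ownership conflict can also be sidestepped by copying the frame before handing it to a mutating consumer. A minimal sketch - `send` and `sink` are hypothetical stand-ins for the dRPC -> websocket plumbing, not real API from either library:

```go
package main

import "fmt"

// send hands frame bytes to a sink that may mutate them in place (as
// the stateless deflate does). Copying first means the caller can keep
// reusing buf for its next write, the way dRPC's writer does.
func send(sink func([]byte), buf []byte) {
	frame := make([]byte, len(buf))
	copy(frame, buf)
	sink(frame) // sink owns the copy; buf stays untouched
}

func main() {
	buf := []byte("payload")
	var got []byte
	send(func(b []byte) {
		b[0] = 'X' // simulate a compress-in-place mutation
		got = b
	}, buf)
	fmt.Println(string(buf), string(got)) // prints "payload Xayload"
}
```

The copy costs an allocation per frame, which is the trade-off against simply disabling the in-place compression.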
### Why does cloning on `Read` fail?
We get a bunch of errors like:
```
2022/02/02 19:26:10 [WARN] yamux: frame for missing stream: Vsn:0 Type:0 Flags:0 StreamID:0 Length:0
2022/02/02 19:26:25 [ERR] yamux: Failed to read header: unexpected EOF
2022/02/02 19:26:25 [ERR] yamux: Failed to read header: unexpected EOF
2022/02/02 19:26:25 [WARN] yamux: frame for missing stream: Vsn:0 Type:0 Flags:0 StreamID:0 Length:0
```
# UPDATE:
We decided we could disable websocket compression, which would avoid the race because the in-place `deflate` operation would no longer be run. Trying that out now:
- Run 1: ✅
- Run 2: https://github.com/coder/coder/runs/5042645522?check_suite_focus=true#step:8:338
- Run 3: ✅
- Run 4: https://github.com/coder/coder/runs/5042988758?check_suite_focus=true#step:7:168
- Run 5: ✅
* fix: Remove race condition with acquiredJobDone channel (#148)
Found another data race while running the tests: https://github.com/coder/coder/runs/5044320845?check_suite_focus=true#step:7:83
__Issue:__ There is a race on the `p.acquiredJobDone` chan - in particular, there can be a case where we're waiting on the channel to finish (in close) with `<-p.acquiredJobDone`, but in parallel, an `acquireJob` could have started, which would create a new channel for `p.acquiredJobDone`. There is a similar race in `close(..)`-ing the channel, which also came up in test runs.
__Fix:__ Instead of recreating the channel every time, we can use `sync.WaitGroup` to accomplish the same functionality - a semaphore that makes close wait for the current job to wrap up.
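The shape of that fix can be sketched roughly as follows - a minimal, hypothetical `provisioner` showing only the acquire/close interplay, not the real provisionerd types:

```go
package main

import (
	"fmt"
	"sync"
)

// provisioner tracks in-flight jobs with a sync.WaitGroup instead of a
// recreated "done" channel, so Close can wait without racing acquireJob.
type provisioner struct {
	mu     sync.Mutex
	closed bool
	jobs   sync.WaitGroup
}

// acquireJob registers the job under the lock, so a concurrent Close
// either waits for it or rejects it entirely - never misses it.
func (p *provisioner) acquireJob(run func()) bool {
	p.mu.Lock()
	if p.closed {
		p.mu.Unlock()
		return false
	}
	p.jobs.Add(1)
	p.mu.Unlock()
	go func() {
		defer p.jobs.Done()
		run()
	}()
	return true
}

// Close waits for any in-flight job to wrap up - the semaphore role the
// recreated channel was trying to play.
func (p *provisioner) Close() {
	p.mu.Lock()
	p.closed = true
	p.mu.Unlock()
	p.jobs.Wait()
}

func main() {
	p := &provisioner{}
	p.acquireJob(func() { fmt.Println("job ran") })
	p.Close()
	fmt.Println("closed after job finished") // always prints second
}
```

The key point is that `jobs.Add(1)` happens under the same lock that `Close` uses to flip `closed`, so there is no window where a job sneaks in after `Close` has started waiting.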
* fix: Bump up workspace history timeout (#149)
This is an attempted fix for failures like: https://github.com/coder/coder/runs/5043435263?check_suite_focus=true#step:7:32
Looking at the timing of the test:
```
t.go:56: 2022-02-02 21:33:21.964 [DEBUG] (terraform-provisioner) <provision.go:139> ran apply
t.go:56: 2022-02-02 21:33:21.991 [DEBUG] (provisionerd) <provisionerd.go:162> skipping acquire; job is already running
t.go:56: 2022-02-02 21:33:22.050 [DEBUG] (provisionerd) <provisionerd.go:162> skipping acquire; job is already running
t.go:56: 2022-02-02 21:33:22.090 [DEBUG] (provisionerd) <provisionerd.go:162> skipping acquire; job is already running
t.go:56: 2022-02-02 21:33:22.140 [DEBUG] (provisionerd) <provisionerd.go:162> skipping acquire; job is already running
t.go:56: 2022-02-02 21:33:22.195 [DEBUG] (provisionerd) <provisionerd.go:162> skipping acquire; job is already running
t.go:56: 2022-02-02 21:33:22.240 [DEBUG] (provisionerd) <provisionerd.go:162> skipping acquire; job is already running
workspacehistory_test.go:122:
Error Trace: workspacehistory_test.go:122
Error: Condition never satisfied
Test: TestWorkspaceHistory/CreateHistory
```
It appears that the `terraform apply` job had just finished - with less than a second to spare before our `require.Eventually` timed out - but there was still work to be done (i.e., collecting the state files). So my suspicion is that terraform might, in some cases, exceed our 5s timeout.
Note that the setup for this test has a similar project history wait of 15s, so I borrowed that here.
In the future - we can look at potentially using a simple echo provider to exercise this in the unit test, in a way that is more reliable in terms of timing. I'll log an issue to track that.
Co-authored-by: Bryan <bryan@coder.com>
	go func() {
		err := echo.Serve(ctx, &provisionersdk.ServeOptions{
			Listener:      echoServer,
			WorkDirectory: workDir,
			Logger:        coderAPI.Logger.Named("echo").Leveled(slog.LevelDebug),
		})
		assert.NoError(t, err)
	}()
	daemon := provisionerd.New(func(ctx context.Context) (provisionerdproto.DRPCProvisionerDaemonClient, error) {
		return coderAPI.CreateInMemoryProvisionerDaemon(ctx)
	}, &provisionerd.Options{
		Logger:              coderAPI.Logger.Named("provisionerd").Leveled(slog.LevelDebug),
		UpdateInterval:      250 * time.Millisecond,
		ForceCancelInterval: 5 * time.Second,
		Connector: provisionerd.LocalProvisioners{
			string(database.ProvisionerTypeEcho): sdkproto.NewDRPCProvisionerClient(echoClient),
		},
	})
	closer := &provisionerdCloser{d: daemon}
	t.Cleanup(func() {
		_ = closer.Close()
	})
	return closer
}

func NewExternalProvisionerDaemon(t testing.TB, client *codersdk.Client, org uuid.UUID, tags map[string]string) io.Closer {
	echoClient, echoServer := provisionersdk.MemTransportPipe()
	ctx, cancelFunc := context.WithCancel(context.Background())
	serveDone := make(chan struct{})
	t.Cleanup(func() {
		_ = echoClient.Close()
		_ = echoServer.Close()
		cancelFunc()
		<-serveDone
	})
	go func() {
		defer close(serveDone)
		err := echo.Serve(ctx, &provisionersdk.ServeOptions{
			Listener:      echoServer,
			WorkDirectory: t.TempDir(),
		})
		assert.NoError(t, err)
	}()
	daemon := provisionerd.New(func(ctx context.Context) (provisionerdproto.DRPCProvisionerDaemonClient, error) {
		return client.ServeProvisionerDaemon(ctx, codersdk.ServeProvisionerDaemonRequest{
			Organization: org,
			Provisioners: []codersdk.ProvisionerType{codersdk.ProvisionerTypeEcho},
			Tags:         tags,
		})
	}, &provisionerd.Options{
		Logger:              slogtest.Make(t, nil).Named("provisionerd").Leveled(slog.LevelDebug),
		UpdateInterval:      250 * time.Millisecond,
		ForceCancelInterval: 5 * time.Second,
		Connector: provisionerd.LocalProvisioners{
			string(database.ProvisionerTypeEcho): sdkproto.NewDRPCProvisionerClient(echoClient),
		},
	})
	closer := &provisionerdCloser{d: daemon}
feat: Add provisionerdaemon to coderd (#141)
* feat: Add history middleware parameters
These will be used for streaming logs, checking status,
and other operations related to workspace and project
history.
* refactor: Move all HTTP routes to top-level struct
Nesting all structs behind their respective structures
is leaky, and promotes naming conflicts between handlers.
Our HTTP routes cannot have conflicts, so neither should
function naming.
* Add provisioner daemon routes
* Add periodic updates
* Skip pubsub if short
* Return jobs with WorkspaceHistory
* Add endpoints for extracting singular history
* The full end-to-end operation works
* fix: Disable compression for websocket dRPC transport (#145)
There is a race condition in the interop between the websocket and `dRPC`: https://github.com/coder/coder/runs/5038545709?check_suite_focus=true#step:7:117 - it seems both the websocket and dRPC feel like they own the `byte[]` being sent between them. This can lead to data races, in which both `dRPC` and the websocket are writing.
This is just tracking some experimentation to fix that race condition
## Run results: ##
- Run 1: peer test failure
- Run 2: peer test failure
- Run 3: `TestWorkspaceHistory/CreateHistory` - https://github.com/coder/coder/runs/5040858460?check_suite_focus=true#step:8:45
```
status code 412: The provided project history is running. Wait for it to complete importing!`
```
- Run 4: `TestWorkspaceHistory/CreateHistory` - https://github.com/coder/coder/runs/5040957999?check_suite_focus=true#step:7:176
```
workspacehistory_test.go:122:
Error Trace: workspacehistory_test.go:122
Error: Condition never satisfied
Test: TestWorkspaceHistory/CreateHistory
```
- Run 5: peer failure
- Run 6: Pass ✅
- Run 7: Peer failure
## Open Questions: ##
### Is `dRPC` or `websocket` at fault for the data race?
It looks like this condition is specifically happening when `dRPC` decides to [`SendError`]). This constructs a new byte payload from [`MarshalError`](https://github.com/storj/drpc/blob/f6e369438f636b47ee788095d3fc13062ffbd019/drpcwire/error.go#L15) - so `dRPC` has created this buffer and owns it.
From `dRPC`'s perspective, the callstack looks like this:
- [`sendPacket`](https://github.com/storj/drpc/blob/f6e369438f636b47ee788095d3fc13062ffbd019/drpcstream/stream.go#L253)
- [`writeFrame`](https://github.com/storj/drpc/blob/f6e369438f636b47ee788095d3fc13062ffbd019/drpcwire/writer.go#L65)
- [`AppendFrame`](https://github.com/storj/drpc/blob/f6e369438f636b47ee788095d3fc13062ffbd019/drpcwire/packet.go#L128)
- with finally the data race happening here:
```go
// AppendFrame appends a marshaled form of the frame to the provided buffer.
func AppendFrame(buf []byte, fr Frame) []byte {
...
out := buf
out = append(out, control). // <---------
```
This should be fine, since `dPRC` create this buffer, and is taking the byte buffer constructed from `MarshalError` and tacking a bunch of headers on it to create a proper frame.
Once `dRPC` is done writing, it _hangs onto the buffer and resets it here__: https://github.com/storj/drpc/blob/f6e369438f636b47ee788095d3fc13062ffbd019/drpcwire/writer.go#L73
However... once the websocket implementation gets the buffer, it runs a `statelessDeflate` [here](https://github.com/nhooyr/websocket/blob/8dee580a7f74cf1713400307b4eee514b927870f/write.go#L180), which compresses the buffer on the fly. This functionality actually [mutates the buffer in place](https://github.com/klauspost/compress/blob/a1a9cfc821f00faf2f5231beaa96244344d50391/flate/stateless.go#L94), which is where we get our race.
In the case where the `byte[]` isn't being manipulated anywhere else, this compress-in-place operation would be safe, and that's probably the case for most over-the-wire usages. In this case, though, where we're plumbing `dRPC` -> websocket, both are manipulating it (`dRPC` is reusing the buffer for the next `write`, and `websocket` is compressing on the fly).
### Why does cloning on `Read` fail?
We get a bunch of errors like:
```
2022/02/02 19:26:10 [WARN] yamux: frame for missing stream: Vsn:0 Type:0 Flags:0 StreamID:0 Length:0
2022/02/02 19:26:25 [ERR] yamux: Failed to read header: unexpected EOF
2022/02/02 19:26:25 [ERR] yamux: Failed to read header: unexpected EOF
2022/02/02 19:26:25 [WARN] yamux: frame for missing stream: Vsn:0 Type:0 Flags:0 StreamID:0 Length:0
```
# UPDATE:
We decided we could disable websocket compression, which would avoid the race because the in-place `deflate` operation would no longer be run. Trying that out now:
- Run 1: ✅
- Run 2: https://github.com/coder/coder/runs/5042645522?check_suite_focus=true#step:8:338
- Run 3: ✅
- Run 4: https://github.com/coder/coder/runs/5042988758?check_suite_focus=true#step:7:168
- Run 5: ✅
* fix: Remove race condition with acquiredJobDone channel (#148)
Found another data race while running the tests: https://github.com/coder/coder/runs/5044320845?check_suite_focus=true#step:7:83
__Issue:__ There is a race on the `p.acquiredJobDone` channel - in particular, there can be a case where we're waiting on the channel to finish (in `Close`) with `<-p.acquiredJobDone`, but in parallel, an `acquireJob` could've been started, which would create a new channel for `p.acquiredJobDone`. There is a similar race in `close(..)`-ing the channel, which also came up in test runs.
__Fix:__ Instead of recreating the channel every time, we can use `sync.WaitGroup` to accomplish the same functionality - a semaphore to make `Close` wait for the current job to wrap up.
* fix: Bump up workspace history timeout (#149)
This is an attempted fix for failures like: https://github.com/coder/coder/runs/5043435263?check_suite_focus=true#step:7:32
Looking at the timing of the test:
```
t.go:56: 2022-02-02 21:33:21.964 [DEBUG] (terraform-provisioner) <provision.go:139> ran apply
t.go:56: 2022-02-02 21:33:21.991 [DEBUG] (provisionerd) <provisionerd.go:162> skipping acquire; job is already running
t.go:56: 2022-02-02 21:33:22.050 [DEBUG] (provisionerd) <provisionerd.go:162> skipping acquire; job is already running
t.go:56: 2022-02-02 21:33:22.090 [DEBUG] (provisionerd) <provisionerd.go:162> skipping acquire; job is already running
t.go:56: 2022-02-02 21:33:22.140 [DEBUG] (provisionerd) <provisionerd.go:162> skipping acquire; job is already running
t.go:56: 2022-02-02 21:33:22.195 [DEBUG] (provisionerd) <provisionerd.go:162> skipping acquire; job is already running
t.go:56: 2022-02-02 21:33:22.240 [DEBUG] (provisionerd) <provisionerd.go:162> skipping acquire; job is already running
workspacehistory_test.go:122:
Error Trace: workspacehistory_test.go:122
Error: Condition never satisfied
Test: TestWorkspaceHistory/CreateHistory
```
It appears that the `terraform apply` job had just finished - with less than a second to spare before our `require.Eventually` gave up - but there's still work to be done (i.e., collecting the state files). So my suspicion is that terraform might, in some cases, exceed our 5s timeout.
Note that in the setup for this test there is a similar project history wait of 15s, so I borrowed that here.
In the future - we can look at potentially using a simple echo provider to exercise this in the unit test, in a way that is more reliable in terms of timing. I'll log an issue to track that.
Co-authored-by: Bryan <bryan@coder.com>
	t.Cleanup(func() {
		_ = closer.Close()
	})
	return closer
}

var FirstUserParams = codersdk.CreateFirstUserRequest{
	Email:    "testuser@coder.com",
	Username: "testuser",
	Password: "SomeSecurePassword!",
}

// CreateFirstUser creates a user with preset credentials and authenticates
// with the passed in codersdk client.
func CreateFirstUser(t testing.TB, client *codersdk.Client) codersdk.CreateFirstUserResponse {
	resp, err := client.CreateFirstUser(context.Background(), FirstUserParams)
	require.NoError(t, err)
	login, err := client.LoginWithPassword(context.Background(), codersdk.LoginWithPasswordRequest{
		Email:    FirstUserParams.Email,
		Password: FirstUserParams.Password,
	})
	require.NoError(t, err)

	client.SetSessionToken(login.SessionToken)
	return resp
}
// CreateAnotherUser creates and authenticates a new user.
func CreateAnotherUser(t testing.TB, client *codersdk.Client, organizationID uuid.UUID, roles ...string) (*codersdk.Client, codersdk.User) {
	return createAnotherUserRetry(t, client, organizationID, 5, roles)
}

func CreateAnotherUserMutators(t testing.TB, client *codersdk.Client, organizationID uuid.UUID, roles []string, mutators ...func(r *codersdk.CreateUserRequest)) (*codersdk.Client, codersdk.User) {
	return createAnotherUserRetry(t, client, organizationID, 5, roles, mutators...)
}
func createAnotherUserRetry(t testing.TB, client *codersdk.Client, organizationID uuid.UUID, retries int, roles []string, mutators ...func(r *codersdk.CreateUserRequest)) (*codersdk.Client, codersdk.User) {
	req := codersdk.CreateUserRequest{
		Email:          namesgenerator.GetRandomName(10) + "@coder.com",
		Username:       RandomUsername(t),
		Password:       "SomeSecurePassword!",
		OrganizationID: organizationID,
	}
	for _, m := range mutators {
		m(&req)
	}

	user, err := client.CreateUser(context.Background(), req)
	var apiError *codersdk.Error
	// If the user already exists by username or email conflict, try again up to "retries" times.
	if err != nil && retries >= 0 && xerrors.As(err, &apiError) {
		if apiError.StatusCode() == http.StatusConflict {
			retries--
			return createAnotherUserRetry(t, client, organizationID, retries, roles, mutators...)
		}
	}
	require.NoError(t, err)

	var sessionToken string
	if req.DisableLogin || req.UserLoginType == codersdk.LoginTypeNone {
		// Cannot log in with a disabled login user, so make an API key from
		// the client creating this user instead.
		token, err := client.CreateToken(context.Background(), user.ID.String(), codersdk.CreateTokenRequest{
			Lifetime:  time.Hour * 24,
			Scope:     codersdk.APIKeyScopeAll,
			TokenName: "no-password-user-token",
		})
		require.NoError(t, err)
		sessionToken = token.Key
	} else {
		login, err := client.LoginWithPassword(context.Background(), codersdk.LoginWithPasswordRequest{
			Email:    req.Email,
			Password: req.Password,
		})
		require.NoError(t, err)
		sessionToken = login.SessionToken
	}

	if user.Status == codersdk.UserStatusDormant {
		// Use the admin client so that the user's LastSeenAt is not updated.
		// In general we need to refresh the user status, which should
		// transition from "dormant" to "active".
		user, err = client.User(context.Background(), user.Username)
		require.NoError(t, err)
	}

	other := codersdk.New(client.URL)
	other.SetSessionToken(sessionToken)
	t.Cleanup(func() {
		other.HTTPClient.CloseIdleConnections()
	})

	if len(roles) > 0 {
		// Separate the org roles from the site-wide roles.
		orgRoles := make(map[string][]string)
		var siteRoles []string
		for _, roleName := range roles {
			roleName := roleName
			orgID, ok := rbac.IsOrgRole(roleName)
			if ok {
				orgRoles[orgID] = append(orgRoles[orgID], roleName)
			} else {
				siteRoles = append(siteRoles, roleName)
			}
		}
		// Keep the user's existing roles when updating.
		for _, r := range user.Roles {
			siteRoles = append(siteRoles, r.Name)
		}

		_, err := client.UpdateUserRoles(context.Background(), user.ID.String(), codersdk.UpdateRoles{Roles: siteRoles})
		require.NoError(t, err, "update site roles")
		// Update org roles
		for orgID, roles := range orgRoles {
			organizationID, err := uuid.Parse(orgID)
			require.NoError(t, err, fmt.Sprintf("parse org id %q", orgID))
			_, err = client.UpdateOrganizationMemberRoles(context.Background(), organizationID, user.ID.String(),
				codersdk.UpdateRoles{Roles: roles})
			require.NoError(t, err, "update org membership roles")
		}
	}

	return other, user
}
// CreateTemplateVersion creates a template import provisioner job
// with the responses provided. It uses the "echo" provisioner for compatibility
// with testing.
func CreateTemplateVersion(t testing.TB, client *codersdk.Client, organizationID uuid.UUID, res *echo.Responses, mutators ...func(*codersdk.CreateTemplateVersionRequest)) codersdk.TemplateVersion {
	t.Helper()
	data, err := echo.Tar(res)
	require.NoError(t, err)
	file, err := client.Upload(context.Background(), codersdk.ContentTypeTar, bytes.NewReader(data))
	require.NoError(t, err)

	req := codersdk.CreateTemplateVersionRequest{
		FileID:        file.ID,
		StorageMethod: codersdk.ProvisionerStorageMethodFile,
		Provisioner:   codersdk.ProvisionerTypeEcho,
	}
	for _, mut := range mutators {
		mut(&req)
	}
	templateVersion, err := client.CreateTemplateVersion(context.Background(), organizationID, req)
	require.NoError(t, err)
	return templateVersion
}
// CreateWorkspaceBuild creates a workspace build for the given workspace and transition.
func CreateWorkspaceBuild(
	t *testing.T,
	client *codersdk.Client,
	workspace codersdk.Workspace,
	transition database.WorkspaceTransition,
	mutators ...func(*codersdk.CreateWorkspaceBuildRequest),
) codersdk.WorkspaceBuild {
	req := codersdk.CreateWorkspaceBuildRequest{
		Transition: codersdk.WorkspaceTransition(transition),
	}
	for _, mut := range mutators {
		mut(&req)
	}
	build, err := client.CreateWorkspaceBuild(context.Background(), workspace.ID, req)
	require.NoError(t, err)
	return build
}
// CreateTemplate creates a template with the "echo" provisioner for
// compatibility with testing. The name assigned is randomly generated.
func CreateTemplate(t testing.TB, client *codersdk.Client, organization uuid.UUID, version uuid.UUID, mutators ...func(*codersdk.CreateTemplateRequest)) codersdk.Template {
	req := codersdk.CreateTemplateRequest{
		Name:      RandomUsername(t),
		VersionID: version,
	}
	for _, mut := range mutators {
		mut(&req)
	}
	template, err := client.CreateTemplate(context.Background(), organization, req)
	require.NoError(t, err)
	return template
}
// CreateGroup creates a group with the given name and members.
func CreateGroup(t testing.TB, client *codersdk.Client, organizationID uuid.UUID, name string, members ...codersdk.User) codersdk.Group {
	t.Helper()
	group, err := client.CreateGroup(context.Background(), organizationID, codersdk.CreateGroupRequest{
		Name: name,
	})
	require.NoError(t, err, "failed to create group")
	memberIDs := make([]string, 0)
	for _, member := range members {
		memberIDs = append(memberIDs, member.ID.String())
	}
	group, err = client.PatchGroup(context.Background(), group.ID, codersdk.PatchGroupRequest{
		AddUsers: memberIDs,
	})
	require.NoError(t, err, "failed to add members to group")
	return group
}
// UpdateTemplateVersion creates a new template version with the "echo" provisioner
// and associates it with the given templateID.
func UpdateTemplateVersion(t testing.TB, client *codersdk.Client, organizationID uuid.UUID, res *echo.Responses, templateID uuid.UUID) codersdk.TemplateVersion {
	ctx := context.Background()
	data, err := echo.Tar(res)
	require.NoError(t, err)
	file, err := client.Upload(ctx, codersdk.ContentTypeTar, bytes.NewReader(data))
	require.NoError(t, err)
	templateVersion, err := client.CreateTemplateVersion(ctx, organizationID, codersdk.CreateTemplateVersionRequest{
		TemplateID:    templateID,
		FileID:        file.ID,
		StorageMethod: codersdk.ProvisionerStorageMethodFile,
		Provisioner:   codersdk.ProvisionerTypeEcho,
	})
	require.NoError(t, err)
	return templateVersion
}
func UpdateActiveTemplateVersion(t testing.TB, client *codersdk.Client, templateID, versionID uuid.UUID) {
	err := client.UpdateActiveTemplateVersion(context.Background(), templateID, codersdk.UpdateActiveTemplateVersion{
		ID: versionID,
	})
	require.NoError(t, err)
}

// UpdateTemplateMeta updates the template meta for the given template.
func UpdateTemplateMeta(t testing.TB, client *codersdk.Client, templateID uuid.UUID, meta codersdk.UpdateTemplateMeta) codersdk.Template {
	t.Helper()
	updated, err := client.UpdateTemplateMeta(context.Background(), templateID, meta)
	require.NoError(t, err)
	return updated
}
// AwaitTemplateVersionJobRunning waits for the build to be picked up by a provisioner.
func AwaitTemplateVersionJobRunning(t testing.TB, client *codersdk.Client, version uuid.UUID) codersdk.TemplateVersion {
	t.Helper()
	ctx, cancel := context.WithTimeout(context.Background(), testutil.WaitShort)
	defer cancel()

	t.Logf("waiting for template version %s build job to start", version)
	var templateVersion codersdk.TemplateVersion
	require.Eventually(t, func() bool {
		var err error
		templateVersion, err = client.TemplateVersion(ctx, version)
		if err != nil {
			return false
		}
		t.Logf("template version job status: %s", templateVersion.Job.Status)
		switch templateVersion.Job.Status {
		case codersdk.ProvisionerJobPending:
			return false
		case codersdk.ProvisionerJobRunning:
			return true
		default:
			t.FailNow()
			return false
		}
	}, testutil.WaitShort, testutil.IntervalFast, "make sure you set `IncludeProvisionerDaemon`!")
	t.Logf("template version %s job has started", version)
	return templateVersion
}
// AwaitTemplateVersionJobCompleted waits for the build to be completed. This may result
// from cancelation, an error, or from completing successfully.
func AwaitTemplateVersionJobCompleted(t testing.TB, client *codersdk.Client, version uuid.UUID) codersdk.TemplateVersion {
	t.Helper()
	ctx, cancel := context.WithTimeout(context.Background(), testutil.WaitLong)
	defer cancel()

	t.Logf("waiting for template version %s build job to complete", version)
	var templateVersion codersdk.TemplateVersion
	require.Eventually(t, func() bool {
		var err error
		templateVersion, err = client.TemplateVersion(ctx, version)
		t.Logf("template version job status: %s", templateVersion.Job.Status)
		return assert.NoError(t, err) && templateVersion.Job.CompletedAt != nil
	}, testutil.WaitLong, testutil.IntervalMedium, "make sure you set `IncludeProvisionerDaemon`!")
	t.Logf("template version %s job has completed", version)
	return templateVersion
}
// AwaitWorkspaceBuildJobCompleted waits for a workspace provision job to reach completed status.
func AwaitWorkspaceBuildJobCompleted(t testing.TB, client *codersdk.Client, build uuid.UUID) codersdk.WorkspaceBuild {
	t.Helper()
	ctx, cancel := context.WithTimeout(context.Background(), testutil.WaitShort)
	defer cancel()

	t.Logf("waiting for workspace build job %s", build)
	var workspaceBuild codersdk.WorkspaceBuild
	require.Eventually(t, func() bool {
		var err error
		workspaceBuild, err = client.WorkspaceBuild(ctx, build)
		return assert.NoError(t, err) && workspaceBuild.Job.CompletedAt != nil
	}, testutil.WaitMedium, testutil.IntervalMedium)
	t.Logf("got workspace build job %s", build)
	return workspaceBuild
}
// AwaitWorkspaceAgents waits for all resources with agents to be connected. If
// specific agents are provided, it will wait for those agents to be connected
// but will not fail if other agents are not connected.
func AwaitWorkspaceAgents(t testing.TB, client *codersdk.Client, workspaceID uuid.UUID, agentNames ...string) []codersdk.WorkspaceResource {
	t.Helper()
	agentNamesMap := make(map[string]struct{}, len(agentNames))
	for _, name := range agentNames {
		agentNamesMap[name] = struct{}{}
	}

	ctx, cancel := context.WithTimeout(context.Background(), testutil.WaitLong)
	defer cancel()

	t.Logf("waiting for workspace agents (workspace %s)", workspaceID)
	var resources []codersdk.WorkspaceResource
	require.Eventually(t, func() bool {
		workspace, err := client.Workspace(ctx, workspaceID)
		if !assert.NoError(t, err) {
			return false
		}
		if workspace.LatestBuild.Job.CompletedAt == nil {
			return false
		}
		if workspace.LatestBuild.Job.CompletedAt.IsZero() {
			return false
		}
		for _, resource := range workspace.LatestBuild.Resources {
			for _, agent := range resource.Agents {
				if len(agentNames) > 0 {
					if _, ok := agentNamesMap[agent.Name]; !ok {
						continue
					}
				}
				if agent.Status != codersdk.WorkspaceAgentConnected {
					t.Logf("agent %s not connected yet", agent.Name)
					return false
				}
			}
		}
		resources = workspace.LatestBuild.Resources
		return true
	}, testutil.WaitLong, testutil.IntervalMedium)
	t.Logf("got workspace agents (workspace %s)", workspaceID)
	return resources
}
// CreateWorkspace creates a workspace for the user and template provided.
// A random name is generated for it.
// To customize the defaults, pass a mutator func.
func CreateWorkspace(t testing.TB, client *codersdk.Client, organization uuid.UUID, templateID uuid.UUID, mutators ...func(*codersdk.CreateWorkspaceRequest)) codersdk.Workspace {
	t.Helper()
	req := codersdk.CreateWorkspaceRequest{
		TemplateID:        templateID,
		Name:              RandomUsername(t),
		AutostartSchedule: ptr.Ref("CRON_TZ=US/Central 30 9 * * 1-5"),
		TTLMillis:         ptr.Ref((8 * time.Hour).Milliseconds()),
		AutomaticUpdates:  codersdk.AutomaticUpdatesNever,
	}
	for _, mutator := range mutators {
		mutator(&req)
	}
	workspace, err := client.CreateWorkspace(context.Background(), organization, codersdk.Me, req)
	require.NoError(t, err)
	return workspace
}

// MustTransitionWorkspace is a convenience method for transitioning a workspace
// from one state to another.
func MustTransitionWorkspace(t testing.TB, client *codersdk.Client, workspaceID uuid.UUID, from, to database.WorkspaceTransition, muts ...func(req *codersdk.CreateWorkspaceBuildRequest)) codersdk.Workspace {
	t.Helper()
	ctx := context.Background()
	workspace, err := client.Workspace(ctx, workspaceID)
	require.NoError(t, err, "unexpected error fetching workspace")
	require.Equal(t, workspace.LatestBuild.Transition, codersdk.WorkspaceTransition(from), "expected workspace state: %s got: %s", from, workspace.LatestBuild.Transition)

	req := codersdk.CreateWorkspaceBuildRequest{
		TemplateVersionID: workspace.LatestBuild.TemplateVersionID,
		Transition:        codersdk.WorkspaceTransition(to),
	}
	for _, mut := range muts {
		mut(&req)
	}

	build, err := client.CreateWorkspaceBuild(ctx, workspace.ID, req)
	require.NoError(t, err, "unexpected error transitioning workspace to %s", to)

	_ = AwaitWorkspaceBuildJobCompleted(t, client, build.ID)

	updated := MustWorkspace(t, client, workspace.ID)
	require.Equal(t, codersdk.WorkspaceTransition(to), updated.LatestBuild.Transition, "expected workspace to be in state %s but got %s", to, updated.LatestBuild.Transition)
	return updated
}
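
// Example usage (hypothetical; assumes ws is a workspace whose latest build
// was a start transition):
//
//	stopped := coderdtest.MustTransitionWorkspace(t, client, ws.ID,
//		database.WorkspaceTransitionStart, database.WorkspaceTransitionStop)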

// MustWorkspace is a convenience method for fetching a workspace that should exist.
func MustWorkspace(t testing.TB, client *codersdk.Client, workspaceID uuid.UUID) codersdk.Workspace {
	t.Helper()
	ctx := context.Background()
	ws, err := client.Workspace(ctx, workspaceID)
	if err != nil && strings.Contains(err.Error(), "status code 410") {
		ws, err = client.DeletedWorkspace(ctx, workspaceID)
	}
	require.NoError(t, err, "no workspace found with id %s", workspaceID)
	return ws
}

// RequestExternalAuthCallback makes a request with the proper OAuth2 state cookie
// to the external auth callback endpoint.
func RequestExternalAuthCallback(t testing.TB, providerID string, client *codersdk.Client) *http.Response {
	client.HTTPClient.CheckRedirect = func(req *http.Request, via []*http.Request) error {
		return http.ErrUseLastResponse
	}
	state := "somestate"
	oauthURL, err := client.URL.Parse(fmt.Sprintf("/external-auth/%s/callback?code=asd&state=%s", providerID, state))
	require.NoError(t, err)
	req, err := http.NewRequestWithContext(context.Background(), "GET", oauthURL.String(), nil)
	require.NoError(t, err)
	req.AddCookie(&http.Cookie{
		Name:  codersdk.OAuth2StateCookie,
		Value: state,
	})
	req.AddCookie(&http.Cookie{
		Name:  codersdk.SessionTokenCookie,
		Value: client.SessionToken(),
	})
	res, err := client.HTTPClient.Do(req)
	require.NoError(t, err)
	t.Cleanup(func() {
		_ = res.Body.Close()
	})
	return res
}

// NewGoogleInstanceIdentity returns a metadata client and ID token validator for faking
// instance authentication for Google Cloud.
// nolint:revive
func NewGoogleInstanceIdentity(t testing.TB, instanceID string, expired bool) (*idtoken.Validator, *metadata.Client) {
	keyID, err := cryptorand.String(12)
	require.NoError(t, err)
	claims := jwt.MapClaims{
		"google": map[string]interface{}{
			"compute_engine": map[string]string{
				"instance_id": instanceID,
			},
		},
	}
	if !expired {
		claims["exp"] = time.Now().AddDate(1, 0, 0).Unix()
	}
	token := jwt.NewWithClaims(jwt.SigningMethodRS256, claims)
	token.Header["kid"] = keyID
	privateKey, err := rsa.GenerateKey(rand.Reader, 2048)
	require.NoError(t, err)
	signedKey, err := token.SignedString(privateKey)
	require.NoError(t, err)

	// Taken from: https://github.com/googleapis/google-api-go-client/blob/4bb729045d611fa77bdbeb971f6a1204ba23161d/idtoken/validate.go#L57-L75
	type jwk struct {
		Kid string `json:"kid"`
		N   string `json:"n"`
		E   string `json:"e"`
	}
	type certResponse struct {
		Keys []jwk `json:"keys"`
	}

	validator, err := idtoken.NewValidator(context.Background(), option.WithHTTPClient(&http.Client{
		Transport: roundTripper(func(r *http.Request) (*http.Response, error) {
			data, err := json.Marshal(certResponse{
				Keys: []jwk{{
					Kid: keyID,
					N:   base64.RawURLEncoding.EncodeToString(privateKey.N.Bytes()),
					E:   base64.RawURLEncoding.EncodeToString(new(big.Int).SetInt64(int64(privateKey.E)).Bytes()),
				}},
			})
			require.NoError(t, err)
			return &http.Response{
				StatusCode: http.StatusOK,
				Body:       io.NopCloser(bytes.NewReader(data)),
				Header:     make(http.Header),
			}, nil
		}),
	}))
	require.NoError(t, err)

	return validator, metadata.NewClient(&http.Client{
		Transport: roundTripper(func(r *http.Request) (*http.Response, error) {
			return &http.Response{
				StatusCode: http.StatusOK,
				Body:       io.NopCloser(bytes.NewReader([]byte(signedKey))),
				Header:     make(http.Header),
			}, nil
		}),
	})
}

// NewAWSInstanceIdentity returns a metadata client and ID token validator for faking
// instance authentication for AWS.
func NewAWSInstanceIdentity(t testing.TB, instanceID string) (awsidentity.Certificates, *http.Client) {
	privateKey, err := rsa.GenerateKey(rand.Reader, 2048)
	require.NoError(t, err)

	document := []byte(`{"instanceId":"` + instanceID + `"}`)
	hashedDocument := sha256.Sum256(document)

	signatureRaw, err := rsa.SignPKCS1v15(rand.Reader, privateKey, crypto.SHA256, hashedDocument[:])
	require.NoError(t, err)
	signature := make([]byte, base64.StdEncoding.EncodedLen(len(signatureRaw)))
	base64.StdEncoding.Encode(signature, signatureRaw)

	certificate, err := x509.CreateCertificate(rand.Reader, &x509.Certificate{
		SerialNumber: big.NewInt(2022),
	}, &x509.Certificate{}, &privateKey.PublicKey, privateKey)
	require.NoError(t, err)

	certificatePEM := bytes.Buffer{}
	err = pem.Encode(&certificatePEM, &pem.Block{
		Type:  "CERTIFICATE",
		Bytes: certificate,
	})
	require.NoError(t, err)

	return awsidentity.Certificates{
		awsidentity.Other: certificatePEM.String(),
	}, &http.Client{
		Transport: roundTripper(func(r *http.Request) (*http.Response, error) {
			// Only handle metadata server requests.
			if r.URL.Host != "169.254.169.254" {
				return http.DefaultTransport.RoundTrip(r)
			}
			switch r.URL.Path {
			case "/latest/api/token":
				return &http.Response{
					StatusCode: http.StatusOK,
					Body:       io.NopCloser(bytes.NewReader([]byte("faketoken"))),
					Header:     make(http.Header),
				}, nil
			case "/latest/dynamic/instance-identity/signature":
				return &http.Response{
					StatusCode: http.StatusOK,
					Body:       io.NopCloser(bytes.NewReader(signature)),
					Header:     make(http.Header),
				}, nil
			case "/latest/dynamic/instance-identity/document":
				return &http.Response{
					StatusCode: http.StatusOK,
					Body:       io.NopCloser(bytes.NewReader(document)),
					Header:     make(http.Header),
				}, nil
			default:
				panic("unhandled route: " + r.URL.Path)
			}
		}),
	}
}

// NewAzureInstanceIdentity returns a metadata client and ID token validator for faking
// instance authentication for Azure.
func NewAzureInstanceIdentity(t testing.TB, instanceID string) (x509.VerifyOptions, *http.Client) {
	privateKey, err := rsa.GenerateKey(rand.Reader, 2048)
	require.NoError(t, err)

	rawCertificate, err := x509.CreateCertificate(rand.Reader, &x509.Certificate{
		SerialNumber: big.NewInt(2022),
		NotAfter:     time.Now().AddDate(1, 0, 0),
		Subject: pkix.Name{
			CommonName: "metadata.azure.com",
		},
	}, &x509.Certificate{}, &privateKey.PublicKey, privateKey)
	require.NoError(t, err)

	certificate, err := x509.ParseCertificate(rawCertificate)
	require.NoError(t, err)

	signed, err := pkcs7.NewSignedData([]byte(`{"vmId":"` + instanceID + `"}`))
	require.NoError(t, err)
	err = signed.AddSigner(certificate, privateKey, pkcs7.SignerInfoConfig{})
	require.NoError(t, err)
	signatureRaw, err := signed.Finish()
	require.NoError(t, err)
	signature := make([]byte, base64.StdEncoding.EncodedLen(len(signatureRaw)))
	base64.StdEncoding.Encode(signature, signatureRaw)

	payload, err := json.Marshal(agentsdk.AzureInstanceIdentityToken{
		Signature: string(signature),
		Encoding:  "pkcs7",
	})
	require.NoError(t, err)

	certPool := x509.NewCertPool()
	certPool.AddCert(certificate)

	return x509.VerifyOptions{
		Intermediates: certPool,
		Roots:         certPool,
	}, &http.Client{
		Transport: roundTripper(func(r *http.Request) (*http.Response, error) {
			// Only handle metadata server requests.
			if r.URL.Host != "169.254.169.254" {
				return http.DefaultTransport.RoundTrip(r)
			}
			switch r.URL.Path {
			case "/metadata/attested/document":
				return &http.Response{
					StatusCode: http.StatusOK,
					Body:       io.NopCloser(bytes.NewReader(payload)),
					Header:     make(http.Header),
				}, nil
			default:
				panic("unhandled route: " + r.URL.Path)
			}
		}),
	}
}

// RandomUsername returns a random, valid username truncated to at most
// 32 characters, with a random suffix to avoid collisions.
func RandomUsername(t testing.TB) string {
	suffix, err := cryptorand.String(3)
	require.NoError(t, err)
	suffix = "-" + suffix
	n := strings.ReplaceAll(namesgenerator.GetRandomName(10), "_", "-") + suffix
	if len(n) > 32 {
		n = n[:32-len(suffix)] + suffix
	}
	return n
}

// roundTripper is used to easily create an HTTP transport from a plain function.
type roundTripper func(req *http.Request) (*http.Response, error)

func (r roundTripper) RoundTrip(req *http.Request) (*http.Response, error) {
	return r(req)
}
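
// For example, a client whose transport returns a canned response without
// touching the network can be built like this (illustrative sketch only):
//
//	client := &http.Client{
//		Transport: roundTripper(func(r *http.Request) (*http.Response, error) {
//			return &http.Response{
//				StatusCode: http.StatusOK,
//				Body:       io.NopCloser(bytes.NewReader([]byte("ok"))),
//				Header:     make(http.Header),
//			}, nil
//		}),
//	}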

// nopcloser is an io.Closer whose Close is a no-op.
type nopcloser struct{}

func (nopcloser) Close() error { return nil }

// SDKError coerces err into an SDK error.
func SDKError(t testing.TB, err error) *codersdk.Error {
	var cerr *codersdk.Error
	require.True(t, errors.As(err, &cerr))
	return cerr
}
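
// Example usage (hypothetical; assumes the request fails and that the SDK
// error exposes its HTTP status via StatusCode):
//
//	_, err := client.Workspace(ctx, uuid.New())
//	cerr := coderdtest.SDKError(t, err)
//	require.Equal(t, http.StatusNotFound, cerr.StatusCode())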

// DeploymentValues returns a DeploymentValues with all options set to
// their defaults, for use in tests.
func DeploymentValues(t testing.TB) *codersdk.DeploymentValues {
	var cfg codersdk.DeploymentValues
	opts := cfg.Options()
	err := opts.SetDefaults()
	require.NoError(t, err)
	return &cfg
}
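
// Example usage (hypothetical; mutate the defaults before handing them to the
// test server — the Options field name is illustrative):
//
//	dv := coderdtest.DeploymentValues(t)
//	// ...override individual values on dv as the test requires, then:
//	client := coderdtest.New(t, &coderdtest.Options{DeploymentValues: dv})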