coder/docs/templates/troubleshooting.md

153 lines
7.0 KiB
Markdown

# Troubleshooting templates
Occasionally, you may run into scenarios where a workspace is created, but the
agent is either not connected or the
[startup script](https://registry.terraform.io/providers/coder/coder/latest/docs/resources/agent#startup_script)
has failed or timed out.
## Agent connection issues
If the agent is not connected, it means the agent or
[init script](https://github.com/coder/coder/tree/main/provisionersdk/scripts)
has failed on the resource.
```console
$ coder ssh myworkspace
⢄⡱ Waiting for connection from [agent]...
```
While troubleshooting steps vary by resource, here are some general best
practices:
- Ensure the resource has `curl` installed (alternatively, `wget` or `busybox`)
- Ensure the resource can `curl` your Coder
[access URL](../admin/configure.md#access-url)
- Manually connect to the resource and check the agent logs (e.g.,
`kubectl exec`, `docker exec` or AWS console)
- The Coder agent logs are typically stored in `/tmp/coder-agent.log`
- The Coder agent startup script logs are typically stored in
`/tmp/coder-startup-script.log`
- The Coder agent shutdown script logs are typically stored in
`/tmp/coder-shutdown-script.log`
- This can also happen if the websockets are not being forwarded correctly when
running Coder behind a reverse proxy.
[Read our reverse-proxy docs](../admin/configure.md#tls--reverse-proxy)
## Startup script issues
Depending on the contents of the
[startup script](https://registry.terraform.io/providers/coder/coder/latest/docs/resources/agent#startup_script),
and whether or not the
[startup script behavior](https://registry.terraform.io/providers/coder/coder/latest/docs/resources/agent#startup_script_behavior)
is set to blocking or non-blocking, you may notice issues related to the startup
script. In this section we will cover common scenarios and how to resolve them.
### Unable to access workspace, startup script is still running
If you're trying to access your workspace and are unable to because the
[startup script](https://registry.terraform.io/providers/coder/coder/latest/docs/resources/agent#startup_script)
is still running, it means the
[startup script behavior](https://registry.terraform.io/providers/coder/coder/latest/docs/resources/agent#startup_script_behavior)
option is set to blocking or you have enabled the `--wait=yes` option (for e.g.
`coder ssh` or `coder config-ssh`). In such an event, you can always access the
workspace by using the web terminal, or via SSH using the `--wait=no` option. If
the startup script is running longer than it should, or never completing, you
can try to [debug the startup script](#debugging-the-startup-script) to resolve
the issue. Alternatively, you can try to force the startup script to exit by
terminating processes started by it or terminating the startup script itself (on
Linux, `ps` and `kill` are useful tools).
For tips on how to write a startup script that doesn't run forever, see the
[`startup_script`](#startup_script) section. For more ways to override the
startup script behavior, see the
[`startup_script_behavior`](#startup_script_behavior) section.
Template authors can also set the
[startup script behavior](https://registry.terraform.io/providers/coder/coder/latest/docs/resources/agent#startup_script_behavior)
option to non-blocking, which will allow users to access the workspace while the
startup script is still running. Note that the workspace must be updated after
changing this option.
### Your workspace may be incomplete
If you see a warning that your workspace may be incomplete, it means you should
be aware that programs, files, or settings may be missing from your workspace.
This can happen if the
[startup script](https://registry.terraform.io/providers/coder/coder/latest/docs/resources/agent#startup_script)
is still running or has exited with a non-zero status (see
[startup script error](#startup-script-error)). No action is necessary, but you
may want to
[start a new shell session](#session-was-started-before-the-startup-script-finished-web-terminal)
after it has completed or check the
[startup script logs](#debugging-the-startup-script) to see if there are any
issues.
### Session was started before the startup script finished
The web terminal may show this message if it was started before the
[startup script](https://registry.terraform.io/providers/coder/coder/latest/docs/resources/agent#startup_script)
finished, but the startup script has since finished. This message can safely be
dismissed, however, be aware that your preferred shell or dotfiles may not yet
be activated for this shell session. You can either start a new session or
source your dotfiles manually. Note that starting a new session means that
commands running in the terminal will be terminated and you may lose unsaved
work.
Examples for activating your preferred shell or sourcing your dotfiles:
- `exec zsh -l`
- `source ~/.bashrc`
### Startup script exited with an error
When the
[startup script](https://registry.terraform.io/providers/coder/coder/latest/docs/resources/agent#startup_script)
exits with an error, it means the last command run by the script failed. When
`set -e` is used, this means that any failing command will immediately exit the
script and the remaining commands will not be executed. This also means that
[your workspace may be incomplete](#your-workspace-may-be-incomplete). If you
see this error, you can check the
[startup script logs](#debugging-the-startup-script) to figure out what the
issue is.
Common causes for startup script errors:
- A missing command or file
- A command that fails due to missing permissions
- Network issues (e.g., unable to reach a server)
### Debugging the startup script
The simplest way to debug the
[startup script](https://registry.terraform.io/providers/coder/coder/latest/docs/resources/agent#startup_script)
is to open the workspace in the Coder dashboard and click "Show startup log" (if
not already visible). This will show all the output from the script. Another
option is to view the log file inside the workspace (usually
`/tmp/coder-startup-script.log`). If the logs don't indicate what's going on or
going wrong, you can increase verbosity by adding `set -x` to the top of the
startup script (note that this will show all commands run and may output
sensitive information). Alternatively, you can add `echo` statements to show
what's going on.
Here's a short example of an informative startup script:
```shell
echo "Running startup script..."
echo "Run: long-running-command"
/path/to/long-running-command
status=$?
echo "Done: long-running-command, exit status: ${status}"
if [ $status -ne 0 ]; then
echo "Startup script failed, exiting..."
exit $status
fi
```
> **Note:** We don't use `set -x` here because we're manually echoing the
> commands. This protects against sensitive information being shown in the log.
This script tells us what command is being run and what the exit status is. If
the exit status is non-zero, it means the command failed and we exit the script.
Since we are manually checking the exit status here, we don't need `set -e` at
the top of the script to exit on error.