Fix: GitHub Actions Runner Failed to Start or Connect
Quick Answer
Fix GitHub Actions self-hosted runner failures including connection issues, version mismatches, and registration problems with step-by-step solutions.
The Error
You set up a self-hosted runner for GitHub Actions and see one of these messages:
Error: The self-hosted runner lost communication with the server.
Could not resolve host: github.com
Runner connect error: The HTTP request timed out after 00:01:00.
Or the runner appears offline in your repository’s Settings > Actions > Runners page, even though you believe it’s running.
Why This Happens
Self-hosted runners maintain a persistent long-poll HTTPS connection to GitHub’s job-dispatch service. The runner agent (Runner.Listener) keeps that socket open, waits for a job, leases it, then hands the work to a worker process (Runner.Worker). When the listener can’t reach GitHub, can’t authenticate, or can’t be matched to a queued job, the runner shows offline and your workflow sits in a “Waiting for a runner” state indefinitely. The error message you see is downstream of the failure — the root cause is almost always one of four categories: network reachability, runner agent state, label or group misrouting, or host resource exhaustion.
GitHub regularly updates the runner application and stops accepting connections from versions that fall outside the supported window. The agent has auto-update logic, but that update path itself depends on the runner being able to reach GitHub at the moment a new release ships. If your runner was offline when an update was pushed, it can be stuck on a version GitHub no longer accepts, which then prevents it from coming back online — a chicken-and-egg loop that requires a manual upgrade.
There’s also a class of failures that has nothing to do with the runner itself: the GitHub-side configuration can silently keep jobs from ever reaching the runner. Organization spending limits, restricted runner groups, missing labels, and disabled Actions at the repo or org level all produce the same surface symptom (“job pending, runner idle”) but require completely different fixes. The diagnostic timeline below walks through how to separate these cases in order of likelihood.
Diagnostic Timeline
Use this sequence the moment a self-hosted runner stops picking up jobs. Each step takes under a minute and rules out one root cause.
- Minute 0: Confirm the runner row in Settings > Actions > Runners. Green dot with “Idle” means the agent is connected and waiting for work; the issue is label/group routing or repo permissions, not the runner. Red dot with “Offline” means the agent itself can’t talk to GitHub; jump to the network and version checks.
- Minute 1: Compare the workflow’s runs-on: value against the runner’s labels. Open the failing workflow file, note the label(s), then click into the runner row and compare. A typo (self-hosted-linux vs self-hosted, linux) sends jobs to a phantom runner.
- Minute 2: Check organization Actions spending and runner groups. Org Settings > Billing > Plans and add-ons shows whether you’ve hit the Actions minutes cap (this affects GitHub-hosted runners but also queues self-hosted jobs that depend on workflow_run). Org Settings > Actions > Runner groups shows whether the runner’s group is restricted to specific repositories.
- Minute 3: On the runner host, check the listener process. Run ps aux | grep Runner.Listener (or Get-Process Runner.Listener on Windows). If the process is missing, the service crashed; if it’s running but the runner shows offline, the agent thinks it’s connected but GitHub disagrees, usually because of a version or token mismatch.
- Minute 4: Tail the diagnostic log. tail -50 _diag/Runner_*.log shows the exact handshake. A “401 Unauthorized” points at an expired or revoked registration; “Connect timeout” points at network or DNS; “Version not supported” points at an upgrade.
- Minute 5: Check disk and inode usage with df -h and df -i. A full disk silently kills the worker after job start, which looks identical to a connection drop in the UI.
- Minute 6: Restart the service interactively, not as a daemon: sudo ./svc.sh stop && ./run.sh. Running the agent in the foreground surfaces handshake errors that the systemd journal sometimes truncates.
If you reach minute 6 without a clear cause, you’re almost certainly looking at a corporate proxy doing TLS inspection, a DNS split-horizon issue, or an outbound firewall change that the runner host can’t see. Move to Fix 1.
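The Minute 4 log check can be scripted. The following is a minimal sketch, assuming the three message strings quoted above appear verbatim in the log; classify_runner_log is a hypothetical helper name, and real log wording may differ between runner versions.

```shell
# Hypothetical helper: reads runner log text on stdin and prints the first
# cause it recognizes, using the three strings from the Minute 4 step.
classify_runner_log() {
  awk '
    /401 Unauthorized/      { print "registration expired or revoked: re-register the runner"; found = 1; exit }
    /Connect timeout/       { print "network or DNS problem: check firewall and resolver"; found = 1; exit }
    /Version not supported/ { print "runner too old: upgrade the agent"; found = 1; exit }
    END { if (!found) print "no known pattern found" }
  '
}

# Example, run from the runner directory:
#   tail -50 _diag/Runner_*.log | classify_runner_log
```

Only the first match is reported, so a log with several problems still needs a manual read.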
Fix 1: Check Network Connectivity
The runner needs outbound HTTPS access to several GitHub domains. Test connectivity from the runner machine:
curl -v https://github.com
curl -v https://api.github.com
curl -v https://codeload.github.com
curl -v https://objects.githubusercontent.com
All must return HTTP 200 or 301. If any fail, check your firewall rules. The runner communicates exclusively over HTTPS (port 443).
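To run the same four checks in one pass, a loop like the one below may help. It is a sketch only: classify_status and check_github_hosts are hypothetical names, the 10-second timeout is an arbitrary choice, and the "reachable" state (an HTTP response other than 200 or 301) is an assumption added here to separate a blocked connection from an unexpected status code.

```shell
# Hosts taken from the curl commands above.
GITHUB_HOSTS="github.com api.github.com codeload.github.com objects.githubusercontent.com"

# Maps a curl status code to a verdict. curl prints 000 when no HTTP
# response arrives at all (DNS failure, timeout, blocked port).
classify_status() {
  case "$1" in
    200|301) echo "ok" ;;
    000)     echo "unreachable" ;;
    *)       echo "reachable" ;;   # assumption: a response came back, so the network path works
  esac
}

# Prints one verdict line per host. Not called automatically.
check_github_hosts() {
  for host in $GITHUB_HOSTS; do
    code=$(curl -s -o /dev/null -m 10 -w '%{http_code}' "https://$host")
    echo "$(classify_status "$code") $host ($code)"
  done
}
```

Call check_github_hosts on the runner machine; any "unreachable" line points at the firewall, proxy, or DNS.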
For runners behind a corporate proxy:
export https_proxy=http://proxy.company.com:8080
export http_proxy=http://proxy.company.com:8080
export no_proxy=localhost,127.0.0.1
Add these to the runner’s .env file (located in the runner directory) to persist across restarts.
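Editing .env by hand tends to leave duplicate entries after a few proxy changes. The sketch below shows one way to set a key idempotently; set_env_var is a hypothetical helper, and it assumes the file holds plain KEY=VALUE lines (no export keyword).

```shell
# Hypothetical helper: sets KEY=VALUE in an env file, replacing any earlier
# line for the same key. Arguments: file, key, value.
set_env_var() (
  file=$1 key=$2 value=$3
  touch "$file"
  # Keep every line that does not define this key, then append the new value.
  grep -v "^${key}=" "$file" > "$file.tmp" || true
  printf '%s=%s\n' "$key" "$value" >> "$file.tmp"
  mv "$file.tmp" "$file"
)

# Example, run from the runner directory:
#   set_env_var .env https_proxy http://proxy.company.com:8080
#   set_env_var .env no_proxy localhost,127.0.0.1
```

Restart the runner service afterwards so the agent rereads the file.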
Pro Tip: GitHub publishes its IP ranges via the meta API. Use the actions key to find the IP ranges your firewall needs to allow.
Fix 2: Update the Runner Version
GitHub requires runners to be within a certain version range. Check your current version:
./run.sh --version
Compare it with the latest release on GitHub. If your version is more than a few minor versions behind, update:
# Stop the runner
sudo ./svc.sh stop
# Download and extract the latest version
curl -o actions-runner-linux-x64.tar.gz -L \
https://github.com/actions/runner/releases/download/v2.XXX.X/actions-runner-linux-x64-2.XXX.X.tar.gz
tar xzf actions-runner-linux-x64.tar.gz
# Restart
sudo ./svc.sh start
The runner has auto-update capability, but it sometimes fails if the runner process isn’t running when an update is published.
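Comparing version strings by eye is error-prone (2.9.0 is older than 2.10.0, though it sorts later as text). A small sketch of a version comparison follows; version_lt is a hypothetical helper, it relies on sort -V from GNU coreutils, and the minimum version is a placeholder you would take from the releases page.

```shell
# Hypothetical helper: succeeds when version $1 is strictly older than $2.
# Relies on `sort -V` (version sort) being available.
version_lt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Example (MINIMUM is a placeholder, not a value published in this article):
#   version_lt "$(./run.sh --version)" "$MINIMUM" && echo "upgrade needed"
```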
Fix 3: Re-register the Runner
Registration tokens expire after 1 hour. If the runner was configured with an expired token, it won’t connect. Re-register:
# Remove existing registration
./config.sh remove --token YOUR_REMOVAL_TOKEN
# Generate a new token from:
# Settings > Actions > Runners > New self-hosted runner
# Re-configure
./config.sh --url https://github.com/OWNER/REPO --token NEW_TOKEN
For organization-level runners, use the organization settings page instead. You can also generate tokens via the GitHub API:
curl -X POST \
-H "Authorization: token YOUR_PAT" \
https://api.github.com/repos/OWNER/REPO/actions/runners/registration-token
Fix 4: Fix Label and Group Mismatches
Jobs target runners using labels. If your workflow specifies a label the runner doesn’t have, the job queues forever:
# Workflow expects this label
runs-on: self-hosted-gpu
# But runner was configured with
# ./config.sh --labels self-hosted,linux,x64
Check runner labels in Settings > Actions > Runners. Add missing labels:
# One way to change labels is to remove and re-register
# (custom labels can usually also be edited on the runner's settings page)
./config.sh remove --token TOKEN
./config.sh --url https://github.com/OWNER/REPO \
--token NEW_TOKEN \
--labels self-hosted,linux,x64,self-hosted-gpu
Common Mistake: Runner groups (enterprise/organization feature) can restrict which repositories a runner serves. If your runner is in a group that doesn’t include your repository, jobs won’t be routed to it. Check organization Settings > Actions > Runner groups.
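A job is only routed to a runner that carries every label in runs-on. The check below is a rough sketch of that rule; labels_match is a hypothetical helper, both arguments are comma-separated lists without spaces, and the exact, case-sensitive comparison is a simplification of however GitHub matches labels.

```shell
# Hypothetical helper: succeeds only if every label in $1 (the workflow's
# runs-on list) is present in $2 (the runner's labels).
labels_match() (
  wanted=$1 have=$2
  IFS=,                      # split the wanted list on commas (subshell only)
  for label in $wanted; do
    case ",$have," in
      *",$label,"*) ;;       # label found, keep going
      *) echo "missing label: $label"; exit 1 ;;
    esac
  done
)

# Example:
#   labels_match "self-hosted-gpu" "self-hosted,linux,x64" || echo "job will queue forever"
```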
Fix 5: Fix Docker-Based Runner Issues
If you run the GitHub Actions runner inside a Docker container, several issues can arise:
# Common mistake: running as root without --user
FROM ubuntu:22.04
# Runner refuses to run as root by default
The runner won’t start as root unless you set RUNNER_ALLOW_RUNASROOT=1:
docker run -e RUNNER_ALLOW_RUNASROOT=1 \
-v /var/run/docker.sock:/var/run/docker.sock \
your-runner-image
For Docker-in-Docker workflows, mount the Docker socket:
docker run -v /var/run/docker.sock:/var/run/docker.sock \
-v /tmp:/tmp \
your-runner-image
Make sure the runner container has enough disk space for workspace files and Docker layer caching.
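A container entrypoint can fail fast with a readable message instead of letting the runner exit on its own. The guard below is a sketch that mirrors the root rule described above; may_start_runner is a hypothetical helper, not the runner’s own code.

```shell
# Hypothetical entrypoint guard. Arguments: numeric uid, value of
# RUNNER_ALLOW_RUNASROOT (may be empty). Succeeds when starting is allowed.
may_start_runner() {
  if [ "$1" -eq 0 ] && [ -z "$2" ]; then
    echo "refusing to start as root; set RUNNER_ALLOW_RUNASROOT=1 or use a non-root user"
    return 1
  fi
  return 0
}

# Example entrypoint line:
#   may_start_runner "$(id -u)" "${RUNNER_ALLOW_RUNASROOT:-}" && exec ./run.sh
```

Creating a dedicated non-root user in the image is generally the safer choice; the override is mainly useful for throwaway containers.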
Fix 6: Address Resource Limits
The runner may crash or hang if the machine runs out of resources. Check:
# Memory
free -h
# Disk space
df -h
# CPU
top -bn1 | head -5
# Check if runner process is alive
ps aux | grep Runner.Listener
Common resource issues:
- Disk full: Old workflow artifacts and Docker images accumulate. Clean up with docker system prune -af and clear the runner’s _work directory.
- Memory exhaustion: The runner itself uses ~200MB, but your workflows may need much more. Run dmesg | grep -i oom to check for OOM kills.
- Too many concurrent jobs: By default, a runner processes one job at a time. Running multiple runners on the same machine requires enough resources for all concurrent jobs.
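The disk check is easy to automate, for example from a cron job or a pre-job hook. The sketch below parses df output and compares it against a threshold; the helper names and the 90 percent figure are assumptions, not values from the runner documentation.

```shell
# Reads `df -P` output on stdin and prints the capacity column of the first
# filesystem line as digits only (for example 91).
parse_used_pct() {
  awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Prints the usage percentage for the filesystem that holds path $1.
# `df -P` forces the one-line-per-filesystem POSIX format.
disk_used_pct() {
  df -P "$1" | parse_used_pct
}

# Succeeds when usage is at or above the threshold. Arguments: pct, threshold.
over_threshold() {
  [ "$1" -ge "$2" ]
}

# Example, run from the runner directory (90 is an arbitrary threshold):
#   over_threshold "$(disk_used_pct _work)" 90 && docker system prune -af
```

The same pattern works for inodes by feeding df -Pi into parse_used_pct.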
Fix 7: Fix GITHUB_TOKEN Permissions
The runner uses a GITHUB_TOKEN that’s automatically generated for each workflow run. If permissions are too restrictive, steps that interact with the repository may fail:
permissions:
  contents: read
  packages: write
  issues: write
For organization repositories with restrictive default permissions, set permissions explicitly in your workflow:
jobs:
  build:
    runs-on: self-hosted
    permissions:
      contents: write
      pull-requests: write
Check your organization settings under Settings > Actions > General > Workflow permissions. “Read repository contents” is the most restrictive default and may block operations like pushing commits or creating releases.
Fix 8: Debug Using Runner Logs
The runner writes detailed logs that reveal exactly why it can’t connect:
# Service logs (if installed as service)
journalctl -u actions.runner.OWNER-REPO.RUNNER_NAME -f
# Or check the log files directly
cat _diag/Runner_*.log | tail -100
cat _diag/Worker_*.log | tail -100
Look for these key messages:
- "Authentication failed": Token expired or invalid. Re-register.
- "Http response code: Unauthorized": PAT or app token lacks required scopes.
- "Connect timeout": Network issue. Check firewall and DNS.
- "Version not supported": Runner too old. Update.
- "No free disk space": Clean up the _work directory.
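Enable diagnostic logging by creating a .env file in the runner directory:
ACTIONS_RUNNER_DEBUG=true
ACTIONS_STEP_DEBUG=true
GitHub’s documentation describes these two names as repository secrets or variables (Settings > Secrets and variables > Actions), so set them there as well if the .env entries have no effect.
Still Not Working?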
Check if GitHub is down. Visit githubstatus.com before deep-diving into your configuration.
Verify DNS resolution. Run nslookup github.com from the runner machine. Corporate DNS servers sometimes block or redirect GitHub domains.
Check TLS certificates. Corporate proxies that perform SSL inspection can break the runner’s HTTPS connection. Add your corporate CA certificate to the runner’s trust store at the OS level so the .NET runtime that ships with the agent picks it up.
Try running interactively. Stop the service and run ./run.sh directly. This shows real-time errors that the service logs might not capture.
Check Docker image compatibility. If using a container-based runner, ensure the base image has all required dependencies (libicu, libssl, git).
Monitor the runner process. Use systemctl status actions.runner.* to check if the service is actually running or if it crashed silently.
Check organization spending limits. Even self-hosted runners can be blocked when an organization hits its monthly Actions storage or data transfer cap. Org Settings > Billing > Plans and add-ons shows usage. Bumping the limit immediately frees pending jobs without restarting anything.
Confirm Actions is enabled at every level. Repo Settings > Actions > General, then Org Settings > Actions > General. A “Disabled” setting at the org level overrides every repo and silently queues jobs. The runner stays “Idle” because it’s healthy; there just isn’t a job allowed to reach it.
Look for ephemeral runner exhaustion. If you registered the runner with --ephemeral, it accepts exactly one job and de-registers. A workflow that uses runs-on: self-hosted after that finds zero matching runners. Add a re-registration loop or switch to a non-ephemeral configuration if you need persistent capacity.
Audit _work/_temp ownership. A previous job that ran as root can leave files the runner user can’t delete on the next checkout. The next job fails before any of your steps execute. chown -R runner:runner _work resolves it.
Check for IPv6 surprises. Some runner hosts resolve github.com to an IPv6 address by default but only have IPv4 outbound through the corporate firewall. The TCP connection silently times out instead of failing fast. Either prefer IPv4 by setting precedence ::ffff:0:0/96 100 in /etc/gai.conf, or allow IPv6 egress on port 443 in the firewall.
Watch for clock skew. TLS handshakes fail with cryptic errors when the runner clock drifts more than five minutes from real time. Enable chronyd or systemd-timesyncd and confirm with timedatectl status. Hosts that have been suspended or paused (common with VM-based runners) frequently come back with a stale clock.
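Clock skew can be measured rather than guessed. The sketch below compares two Unix timestamps against the five-minute figure mentioned above; clock_skew and skew_too_large are hypothetical helpers, and the commented example assumes GNU date (for the -d option) and that the server’s HTTP Date header is a trustworthy time source.

```shell
# Prints the absolute difference between two Unix timestamps, in seconds.
clock_skew() {
  if [ "$1" -ge "$2" ]; then
    echo $(( $1 - $2 ))
  else
    echo $(( $2 - $1 ))
  fi
}

# Succeeds when the skew exceeds the limit in $3 (default 300 s = five minutes).
skew_too_large() {
  [ "$(clock_skew "$1" "$2")" -gt "${3:-300}" ]
}

# Example (assumes GNU date and a reachable github.com):
#   remote=$(date -d "$(curl -sI https://github.com | sed -n 's/^[Dd]ate: //p' | tr -d '\r')" +%s)
#   skew_too_large "$(date +%s)" "$remote" && echo "clock drift detected"
```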
Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.