How do I fix "Go Goroutine Leak — Goroutines That Never Exit"?

How to find and fix goroutine leaks in Go — detecting leaks with pprof and goleak, blocked channel patterns, context cancellation, and goroutine lifecycle management.

Fix: Go Goroutine Leak — Goroutines That Never Exit

The Problem

A Go service’s memory and goroutine count grow indefinitely:

# pprof output — goroutine count keeps climbing
goroutine profile: total 14382
# After 1 hour of traffic, this number should stabilize — instead it grows

Or in application logs, memory keeps increasing:

runtime.MemStats.NumGoroutine: 100      # On startup
runtime.MemStats.NumGoroutine: 1500     # After 10 minutes
runtime.MemStats.NumGoroutine: 8200     # After 1 hour

Or a test catches a leak:

--- FAIL: TestHandleRequest (0.12s)
    goroutine_leak_test.go:45: found unexpected goroutines:
        [Goroutine 18 in state chan receive, with main.processItems on top of the stack]

Or the service eventually OOM-crashes or becomes unresponsive after running for hours.

Why This Happens

A goroutine leak occurs when a goroutine is started but never exits. Unlike memory allocated with make or new, goroutines are not garbage collected when unreachable. They only exit when their function returns, when runtime.Goexit() is called, or when the program itself terminates. A single leaked goroutine costs a minimum of 2 KB of stack space, but a goroutine that is blocked on I/O or holding references to heap objects prevents those objects from being collected too. Over hours of traffic, leaked goroutines compound: 10 leaks per request at 100 req/s produces 3.6 million leaked goroutines in a single hour.

The most common causes fall into a few patterns. Blocked channel operations are the leading culprit. A goroutine waits on <-ch but no one ever sends to ch or closes it, so the goroutine blocks forever. The inverse is also common: a goroutine sends to an unbuffered channel, but the receiver has already exited. Missing context cancellation is the second most frequent cause. A goroutine running a long loop or waiting on I/O checks ctx.Done(), but the caller never cancels the context when it finishes, so the goroutine runs indefinitely.

Less obvious sources include goroutines started inside HTTP handlers that outlive the request, time.After called inside a loop (each iteration creates a new timer goroutine that persists until it fires), goroutines blocked on sync.Mutex that will never be unlocked, and goroutines spawned per event in a message consumer that block on downstream services. In all cases, the root issue is the same: the goroutine has no exit path.

Platform and Environment Differences

Goroutine leak behavior changes across operating systems, container runtimes, and CI environments in ways that make detection harder.

GOMAXPROCS and CPU detection. On bare-metal Linux and macOS, runtime.GOMAXPROCS defaults to the number of logical CPUs reported by the OS. Inside a Docker container with CPU limits (e.g., --cpus=2), Go versions before 1.19 still see all host CPUs and set GOMAXPROCS to 8 or 16, causing the scheduler to create more OS threads than the container can use. Go 1.19+ reads cgroup v2 limits on Linux and adjusts automatically, but cgroup v1 (still common on older Docker hosts and AWS ECS) is not auto-detected. The automaxprocs package from Uber fixes this. On macOS Docker Desktop, cgroup detection works because the Linux VM exposes cgroup v2. On Windows with WSL2-backed Docker, the behavior matches Linux cgroup v2.

Profiling tools per platform. net/http/pprof works everywhere, but the visual flamegraph experience differs. On macOS, go tool pprof -http=:8080 opens a browser-based UI that requires graphviz installed via Homebrew. On Linux, the same command works but the system clipboard integration and SVG rendering depend on the desktop environment. fgprof (a wall-clock profiler) reveals goroutines blocked on I/O that standard CPU profiling misses, but it adds overhead and should not run in production on resource-constrained containers. go tool trace captures goroutine scheduling events with nanosecond precision and is the best tool for diagnosing intermittent leaks, but trace files grow quickly and can exceed available memory in long-running CI jobs.

Docker CPU limit vs GOMAXPROCS mismatch. When a container has a 2-CPU limit but GOMAXPROCS is 8, the Go scheduler creates more runnable goroutines than the container can schedule, leading to high context-switch overhead that masks leak symptoms. The service appears slow rather than leaking. Monitor goroutine count separately from CPU usage inside containers. Use runtime.NumGoroutine() and export it as a Prometheus metric.

CI timeout hiding leaks. In CI pipelines (GitHub Actions, GitLab CI, Jenkins), test suites run with tight timeouts. A goroutine leak test using goleak.VerifyTestMain may pass if the leaked goroutine is still in its startup phase when the check runs. Increase the goleak poll interval or add explicit time.Sleep before verification in flaky CI environments. On GitHub Actions specifically, the default runner has 2 vCPUs, which changes goroutine scheduling compared to a developer’s 8-core laptop — leaks that manifest under concurrent load may not appear in CI at all.

Fix 1: Detect Leaks with pprof

The net/http/pprof package exposes goroutine stack traces over HTTP:

// main.go — add pprof endpoints
import (
    _ "net/http/pprof"   // Side-effect import registers handlers
    "net/http"
)

func main() {
    // pprof endpoints on a separate port (don't expose to public)
    go func() {
        http.ListenAndServe("localhost:6060", nil)
    }()
    // ... rest of your app
}

# View all running goroutines
go tool pprof http://localhost:6060/debug/pprof/goroutine

# Interactive mode
(pprof) top10        # Top 10 goroutine creators
(pprof) list main.   # Show goroutines with 'main.' in the stack

# Save and compare snapshots (detect growth)
curl http://localhost:6060/debug/pprof/goroutine > goroutines_before.pb
# ... run some requests ...
curl http://localhost:6060/debug/pprof/goroutine > goroutines_after.pb
go tool pprof -diff_base goroutines_before.pb goroutines_after.pb

# Quick text dump of all goroutines
curl http://localhost:6060/debug/pprof/goroutine?debug=2

Monitor goroutine count in production:

import (
    "runtime"
    "time"
    "log/slog"
)

func monitorGoroutines(interval time.Duration) {
    ticker := time.NewTicker(interval)
    defer ticker.Stop()
    for range ticker.C {
        count := runtime.NumGoroutine()
        slog.Info("goroutine count", "count", count)
        if count > 10000 {
            slog.Warn("goroutine count exceeds threshold — possible leak", "count", count)
        }
    }
}

Fix 2: Use goleak in Tests

The goleak package detects goroutine leaks in unit tests automatically:

go get go.uber.org/goleak

package mypackage_test

import (
    "testing"
    "go.uber.org/goleak"
)

func TestMain(m *testing.M) {
    // Verify no goroutines are leaked across all tests in the package
    goleak.VerifyTestMain(m)
}

func TestHandleRequest(t *testing.T) {
    defer goleak.VerifyNone(t)  // Verify no leaks after this specific test

    handler := NewRequestHandler()
    handler.Handle(context.Background(), testRequest())
    // goleak will fail the test if any goroutines spawned here are still running
}

goleak checks goroutine state at the end of each test. If goroutines started during the test are still running, the test fails with a stack trace showing where the leaked goroutine was created.

Fix 3: Fix Blocked Channel Patterns

The most common leak — goroutines waiting on channels that will never receive a value:

// LEAKY — goroutine blocks on receive forever if processItem never sends to results
func processItems(items []Item) {
    results := make(chan Result)  // Unbuffered channel

    for _, item := range items {
        go func(item Item) {
            result := process(item)
            results <- result   // If the receiver exits early, this goroutine blocks forever
        }(item)
    }

    // If this returns early (error, timeout), goroutines above are stuck trying to send
    for range items {
        result := <-results
        if err := handleResult(result); err != nil {
            return  // Returns here, but goroutines are still trying to send
        }
    }
}

// FIXED — use a done channel or context to signal goroutines to exit
func processItems(ctx context.Context, items []Item) ([]Result, error) {
    results := make(chan Result, len(items))  // Buffered — goroutines never block on send

    for _, item := range items {
        go func(item Item) {
            select {
            case <-ctx.Done():
                return  // Context cancelled — exit without sending
            case results <- process(item):
                // Sent successfully
            }
        }(item)
    }

    var collected []Result
    for range items {
        select {
        case <-ctx.Done():
            return nil, ctx.Err()
        case result := <-results:
            collected = append(collected, result)
        }
    }
    return collected, nil
}

Always close channels when done writing:

func producer(ch chan<- int) {
    defer close(ch)   // Closing unblocks all receivers waiting on <-ch

    for i := 0; i < 10; i++ {
        ch <- i
    }
}

func consumer(ch <-chan int) {
    for v := range ch {   // range exits when ch is closed
        fmt.Println(v)
    }
    // Goroutine exits cleanly after channel is closed
}

Fix 4: Use Context for Goroutine Lifecycle

Pass context to all goroutines that do I/O or long-running work. Cancel the context when the caller is done:

// LEAKY — goroutine runs forever because it has no exit signal
func startWorker() {
    go func() {
        for {
            msg := fetchMessage()  // Blocks until a message arrives
            process(msg)
            // No way to stop this goroutine
        }
    }()
}

// FIXED — goroutine exits when context is cancelled
func startWorker(ctx context.Context) {
    go func() {
        for {
            select {
            case <-ctx.Done():
                log.Println("Worker stopping:", ctx.Err())
                return  // Clean exit

            default:
                msg, err := fetchMessageWithContext(ctx)
                if err != nil {
                    if ctx.Err() != nil {
                        return  // Context cancelled during fetch — exit
                    }
                    log.Println("Fetch error:", err)
                    continue
                }
                process(msg)
            }
        }
    }()
}

// Caller controls the goroutine's lifetime
func main() {
    ctx, cancel := context.WithCancel(context.Background())
    defer cancel()  // Cancels the context (and stops the worker) when main exits

    startWorker(ctx)

    // ... rest of main
}

For HTTP handlers — the request context is automatically cancelled when the client disconnects or the request times out:

func handleRequest(w http.ResponseWriter, r *http.Request) {
    ctx := r.Context()   // Cancelled when handler returns or client disconnects

    // Pass ctx to goroutines — they'll stop when the request is done
    go func() {
        select {
        case <-ctx.Done():
            return  // Client disconnected — stop background work
        case result := <-doBackgroundWork(ctx):
            log.Println("Background work done:", result)
        }
    }()
}

Fix 5: Fix time.After Leaks in Loops

time.After creates a timer channel that’s garbage collected only after the timer fires — not when the surrounding function returns. In a loop, this creates a goroutine per iteration:

// LEAKY — creates a new timer (and goroutine) on every iteration
func processWithTimeout(items []Item) {
    for _, item := range items {
        select {
        case result := <-process(item):
            handle(result)
        case <-time.After(5 * time.Second):   // New timer goroutine each iteration
            log.Println("Timeout")
        }
    }
}

// FIXED — reuse a single timer
func processWithTimeout(items []Item) {
    timer := time.NewTimer(5 * time.Second)
    defer timer.Stop()   // Cancel the timer when done

    for _, item := range items {
        timer.Reset(5 * time.Second)   // Reset for each iteration

        select {
        case result := <-process(item):
            if !timer.Stop() {
                <-timer.C   // Drain the channel if Stop() returns false
            }
            handle(result)
        case <-timer.C:
            log.Println("Timeout processing item")
        }
    }
}

Common Mistake: Forgetting to drain timer.C after timer.Stop(). If Stop() returns false, the timer already fired and its channel has a value. The next Reset() won’t work correctly until the channel is drained.

Fix 6: Use sync.WaitGroup to Track and Wait for Goroutines

sync.WaitGroup ensures all goroutines finish before the parent function returns:

// LEAKY — goroutines continue after function returns
func processAll(items []Item) {
    for _, item := range items {
        go processItem(item)   // Fire and forget — goroutines outlive the function
    }
    // Function returns immediately — goroutines are orphaned
}

// FIXED — wait for all goroutines to finish
func processAll(ctx context.Context, items []Item) error {
    var wg sync.WaitGroup
    errCh := make(chan error, len(items))   // Buffered — goroutines don't block on send

    for _, item := range items {
        wg.Add(1)
        go func(item Item) {
            defer wg.Done()

            if err := processItem(ctx, item); err != nil {
                errCh <- err
            }
        }(item)
    }

    // Wait for all goroutines to finish
    wg.Wait()
    close(errCh)

    // Collect errors
    var errs []error
    for err := range errCh {
        errs = append(errs, err)
    }

    if len(errs) > 0 {
        return errors.Join(errs...)
    }
    return nil
}

With errgroup for cleaner error handling:

import "golang.org/x/sync/errgroup"

func processAll(ctx context.Context, items []Item) error {
    g, ctx := errgroup.WithContext(ctx)

    for _, item := range items {
        item := item   // Capture loop variable (Go < 1.22)
        g.Go(func() error {
            return processItem(ctx, item)
        })
    }

    return g.Wait()   // Waits for all goroutines; returns first non-nil error
}

errgroup.WithContext cancels the context when any goroutine returns an error, signalling all other goroutines to stop — preventing the leak when one goroutine fails.

Fix 7: Worker Pool Pattern to Bound Goroutine Count

Instead of spawning one goroutine per task (unbounded growth), use a fixed-size worker pool:

func processWithPool(ctx context.Context, items []Item, workerCount int) error {
    jobs := make(chan Item, len(items))
    results := make(chan error, len(items))

    // Start fixed number of workers
    var wg sync.WaitGroup
    for i := 0; i < workerCount; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for item := range jobs {   // Workers exit when jobs channel is closed
                select {
                case <-ctx.Done():
                    return
                default:
                    results <- processItem(ctx, item)
                }
            }
        }()
    }

    // Send all jobs
    for _, item := range items {
        jobs <- item
    }
    close(jobs)   // Signal workers there are no more jobs

    // Wait for workers to finish, then close results
    go func() {
        wg.Wait()
        close(results)
    }()

    // Collect results
    var errs []error
    for err := range results {
        if err != nil {
            errs = append(errs, err)
        }
    }

    if len(errs) > 0 {
        return errors.Join(errs...)
    }
    return nil
}

// Usage
err := processWithPool(ctx, items, runtime.NumCPU())

Still Not Working?

Check for goroutines blocked on mutex — a goroutine waiting on a locked sync.Mutex is harder to spot than a blocked channel. Use pprof’s mutex profile:

curl http://localhost:6060/debug/pprof/mutex?debug=1

Check for goroutines in syscall state — goroutines making blocking system calls (DNS resolution, file I/O without context) can block indefinitely:

curl http://localhost:6060/debug/pprof/goroutine?debug=2 | grep -A 5 "syscall"

Use context-aware versions of blocking operations: net.DefaultResolver.LookupHost(ctx, ...) instead of net.LookupHost(...).

Long-lived HTTP connections — http.Client connections stay open in the pool. If the pool grows unboundedly, set transport limits:

transport := &http.Transport{
    MaxIdleConns:        100,
    MaxIdleConnsPerHost: 10,
    IdleConnTimeout:     90 * time.Second,
}
client := &http.Client{Transport: transport}

Goroutines blocked on DNS resolution — on Linux, pure-Go DNS resolution (GODEBUG=netdns=go) uses goroutines for lookups. If /etc/resolv.conf points to a slow or unreachable DNS server, lookup goroutines accumulate. Switch to cgo resolver (GODEBUG=netdns=cgo) or fix the DNS server. On macOS, the cgo resolver is the default because the system DNS resolution path requires it.

Leaked goroutines inside third-party libraries — libraries that start background goroutines (gRPC health checkers, database connection pool managers, Kafka consumers) may leak if you do not call their Close() or Stop() method. Use defer client.Close() immediately after creation and verify with goleak in integration tests.