Fix: py-spy Not Working — Attach Permission, Empty Output, and Native Frame Errors
Quick Answer
How to fix py-spy errors — Operation not permitted ptrace, flamegraph blank, missing native code frames, top mode shows no Python frames, dump command empty, and subprocess inheritance.
The Error
You try to attach py-spy to a running process and get permission denied:
$ py-spy dump --pid 12345
Error: Operation not permitted: ptrace failed
Or the generated flamegraph is empty or shows only <idle>:
$ py-spy record -o profile.svg --pid 12345 --duration 30
# After 30 seconds, profile.svg shows almost nothing
Or you see “thread state was not Python” errors:
Error: Unable to get python interpreter state, can't profile
Or native (C extension) frames don’t appear:
$ py-spy record -o profile.svg python my_script.py
# Flamegraph shows numpy.zeros calls but no breakdown of what numpy does internally
Or subprocess profiling misses child processes:
$ py-spy record -o profile.svg python my_script.py
# Script spawns worker processes; py-spy only profiles the parent
py-spy is the gold-standard Python CPU profiler: Rust-based, sample-based (low overhead), and able to attach to already-running processes without a restart. It’s the right tool for “production is slow, why?” debugging, but ptrace permission issues, sampling-vs-deterministic confusion, and native frame handling produce specific failures. This guide covers each.
Why This Happens
py-spy works by attaching to a running Python process via ptrace (Linux/macOS) or similar system calls. It samples the interpreter’s call stack at a fixed rate (default 100 Hz). No instrumentation, no code changes — just observation. Because it’s sample-based, very fast functions may not appear in profiles; only frames captured at sample points show up.
ptrace is a privileged system call on Linux. Default kernel hardening (Yama LSM, ptrace_scope=1) lets a process trace only its own descendants, so attaching to an arbitrary process requires the CAP_SYS_PTRACE capability, root, or a relaxed kernel setting.
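As a pre-flight check, the Yama setting can be read and interpreted before you attach. The sketch below is a minimal illustration, assuming a Linux host that exposes /proc/sys/kernel/yama/ptrace_scope. The helper names describe_ptrace_scope and read_ptrace_scope are hypothetical and not part of py-spy.

```python
from pathlib import Path
from typing import Optional

# Meanings of the Yama ptrace_scope values (Linux kernel Yama LSM).
PTRACE_SCOPE_MEANINGS = {
    0: "classic: a process may trace other processes of the same user",
    1: "restricted: a process may trace only its descendants",
    2: "admin-only: tracing requires CAP_SYS_PTRACE",
    3: "disabled: no process may attach with ptrace",
}


def describe_ptrace_scope(value: int) -> str:
    """Return a human-readable meaning for a ptrace_scope value."""
    return PTRACE_SCOPE_MEANINGS.get(value, f"unknown value {value}")


def read_ptrace_scope(
    path: str = "/proc/sys/kernel/yama/ptrace_scope",
) -> Optional[int]:
    """Read the current setting; None if the file is absent or unreadable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    scope = read_ptrace_scope()
    if scope is None:
        print("Yama ptrace_scope is not available on this system")
    else:
        print(f"ptrace_scope={scope}: {describe_ptrace_scope(scope)}")
```

If the script reports 1 or higher and py-spy is run without sudo or CAP_SYS_PTRACE, an attach failure is the expected result rather than a py-spy bug.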
Fix 1: Installation and Basic Use
pip install py-spy
# Or via cargo for the latest
cargo install py-spy
Three main commands:
# Profile a new process — outputs flamegraph SVG
py-spy record -o profile.svg python my_script.py
# Live top view (like Linux top, but for Python)
py-spy top --pid 12345
# One-shot stack dump for all threads
py-spy dump --pid 12345
Common Mistake: Running py-spy record against a short one-shot script and expecting the flamegraph to show the script’s logic. py-spy samples, so a script that runs for 100 ms yields only about 10 samples at the default 100 Hz and produces a useless flamegraph. Always profile workloads that run for at least several seconds, ideally minutes.
Profile until script exits:
py-spy record -o profile.svg -- python my_script.py
# Note the `--` before python
Profile for a fixed duration:
py-spy record -o profile.svg --duration 60 --pid 12345
# Records 60 seconds of samples from PID 12345
Fix 2: Permission Denied on Attach
$ py-spy dump --pid 12345
Error: Permission denied: cannot read process memory
Or:
ptrace: Operation not permitted (os error 1)
Linux blocks ptrace to non-child processes by default for security.
Quick fix — sudo:
sudo py-spy dump --pid 12345
sudo py-spy record -o profile.svg --pid 12345 --duration 30
Permanent fix (relax ptrace scope):
# Check current setting
cat /proc/sys/kernel/yama/ptrace_scope
# 0 = ptrace any process
# 1 = ptrace only descendants (default on Ubuntu/Debian)
# 2 = require CAP_SYS_PTRACE
# 3 = ptrace disabled
# Temporarily allow ptrace any process
sudo sysctl kernel.yama.ptrace_scope=0
# Permanently
echo "kernel.yama.ptrace_scope = 0" | sudo tee /etc/sysctl.d/10-ptrace.conf
sudo sysctl -p /etc/sysctl.d/10-ptrace.conf
Reduce security risk by granting the capability to py-spy alone:
sudo setcap cap_sys_ptrace=eip $(which py-spy)
# Now py-spy can attach to any process without sudo
This is safer than ptrace_scope=0: only py-spy gets the elevated permission, not every process you run.
Docker containers need the capability added:
docker run --cap-add SYS_PTRACE ...
Or, for stricter security policies, use a sidecar container:
services:
  app:
    image: myapp
  py-spy:
    image: python:3.12  # py-spy is not preinstalled in this image; install it first
    pid: "service:app"  # Share PID namespace
    cap_add: [SYS_PTRACE]
    command: py-spy record -o /output/profile.svg --pid 1 --duration 60
Pro Tip: Set cap_sys_ptrace on the py-spy binary once via setcap. After that, profile any process without sudo or kernel config changes. The security risk is bounded to py-spy specifically, not your shell or all binaries.
Fix 3: Sampling Rate
py-spy record -o profile.svg --rate 250 --pid 12345
# Sample at 250 Hz instead of default 100
Higher rates = more detail but more overhead:
| Rate | Overhead | Use case |
|---|---|---|
| 1 Hz | 0.01% | Very long observations (hours), low overhead |
| 100 Hz (default) | ~1% | General profiling |
| 500 Hz | ~5% | Short bursts, fine-grained |
| 1000 Hz | ~10% | Very fast functions |
For production profiling where overhead matters, stick to 100 Hz or less. For debugging brief operations, 500-1000 Hz catches more.
Sampling vs deterministic profilers:
- Sampling (py-spy): periodic snapshots, low overhead, may miss fast events
- Deterministic (cProfile): records every call, high overhead, captures everything
py-spy’s sampling means very brief functions called rarely won’t show up. For “what’s slow in this hot loop?”, py-spy is perfect; for “did this function get called at all?”, cProfile is better.
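The trade-off can be put in numbers. As a rough model (an assumption for illustration, not a description of py-spy internals), treat each sample as an independent draw: a function that occupies a fraction p of the runtime is missed by all n samples with probability (1 - p)^n. The function names below are hypothetical.

```python
def expected_samples(duration_s: float, rate_hz: float) -> int:
    """Total number of samples taken over a run (rate x duration)."""
    return round(duration_s * rate_hz)


def prob_seen_at_least_once(time_fraction: float, n_samples: int) -> float:
    """Chance that a function appears in at least one sample.

    Simplifying assumption: samples are independent draws, and the function
    occupies `time_fraction` of the total runtime.
    """
    if not 0.0 <= time_fraction <= 1.0:
        raise ValueError("time_fraction must be between 0 and 1")
    return 1.0 - (1.0 - time_fraction) ** n_samples


if __name__ == "__main__":
    for duration_s in (0.1, 30.0):
        n = expected_samples(duration_s, rate_hz=100)
        # A function that accounts for 0.1% of the runtime
        p = prob_seen_at_least_once(0.001, n)
        print(f"{duration_s:>5}s at 100 Hz -> {n} samples, P(seen) = {p:.3f}")
```

Under this model, a longer recording helps a rarely running function as much as a higher rate does, which is why a long capture at the default rate is usually the cheaper choice in production.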
Fix 4: Native Frames (C Extensions)
py-spy record -o profile.svg --native --pid 12345
--native resolves C extension frames (NumPy, PyTorch, lxml, etc.) instead of showing them as opaque blocks.
Without --native:
my_func
numpy.dot
<native code> ← Just a black box
With --native:
my_func
numpy.dot
__dgemm_kernel_avx2 ← Actual BLAS function
matmul_internal
Common Mistake: Profiling NumPy/PyTorch code without --native and concluding the bottleneck is “matrix multiplication” when the real question is “which BLAS kernel” or “is GEMM blocking on memory bandwidth.” Native frames reveal the actual hot spots.
--native overhead — adds 10-50% to sampling overhead because it resolves symbols at each sample. For long-running production profiles, leave native off; for targeted hot-path investigations, turn it on.
For PyTorch profiling that benefits from native frames, see PyTorch not working.
Fix 5: Subprocess and Multi-Process Profiling
py-spy record -o profile.svg --subprocesses python my_script.py
--subprocesses follows child processes spawned via os.fork, subprocess.Popen, multiprocessing. Each subprocess gets its own sample stream in the same flamegraph.
Without --subprocesses:
# my_script.py
from multiprocessing import Pool

def slow_function(n):  # placeholder CPU-bound workload
    return sum(i * i for i in range(n * 1000))

if __name__ == "__main__":
    with Pool(4) as p:
        p.map(slow_function, range(1000))
py-spy record -o profile.svg python my_script.py
# Profile shows only the parent process; workers are invisible
With --subprocesses:
py-spy record -o profile.svg --subprocesses python my_script.py
# Profile includes all worker processes
For containerized workloads:
py-spy record -o profile.svg --pid 12345 --subprocesses --duration 60
For multiprocessing patterns that affect profiling, see Python multiprocessing not working.
Fix 6: Top Mode (Live)
py-spy top --pid 12345
Live TUI like Linux top but for Python. It shows which functions are using CPU right now:
%Own %Total Function (filename)
80.0% 80.0% slow_function (my_script.py)
15.0% 95.0% process_data (my_script.py)
5.0% 5.0% <built-in method>
Sort modes (press the key while top is running):
| Key | Sort by |
|---|---|
| 1 | %Own |
| 2 | %Total |
| 3 | OwnTime |
| 4 | TotalTime |
| Ctrl-C | Quit |
top is great for “is this still happening?” queries. Attach to a running process for a few seconds, see what’s hot, detach.
Pro Tip: Use py-spy top as a quick diagnostic before deeper analysis. If top shows your hot function consistently, you have a CPU problem worth profiling further. If top shows everything is idle but the process is slow, your bottleneck is I/O — switch to other tools (strace for syscalls, iotop for disk).
Fix 7: Dump for Stuck/Hung Processes
py-spy dump --pid 12345
Prints the current Python stack of every thread: an instant snapshot, no sampling. Perfect for hung processes:
Thread 0x7f8b2c19c700 (active+gil): "MainThread"
    fetch_data (my_module.py:42)
    main (my_module.py:78)
    <module> (my_script.py:5)
Thread 0x7f8b1a3fd700 (idle): "Thread-1"
    wait (threading.py:312)
    join (threading.py:355)
    main (my_module.py:80)
(active+gil) = currently executing Python; (idle) = waiting (I/O, lock, sleep).
Common Mistake: Using top on a hung process. py-spy top shows CPU usage — a deadlocked process shows 0% CPU and you learn nothing. Use dump for hangs; it shows where every thread is parked.
Dump a stuck process from a remote machine:
ssh prod-server "sudo py-spy dump --pid 12345"
Output is plain text, so it is easy to grep and share with teammates.
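Because the output is plain text, it can also be summarized by a short script. The sketch below assumes the dump layout shown in this section: a `Thread <id> (<state>)` header line followed by one line per frame, innermost frame first. The function name summarize_dump is hypothetical, and the regular expression may need adjusting for your py-spy version.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

# Header layout assumed from the sample dump: Thread 0x... (state): "name"
THREAD_HEADER = re.compile(r'^Thread\s+(\S+)\s+\(([^)]*)\)')


def summarize_dump(text: str) -> Tuple[Counter, Dict[str, List[str]]]:
    """Return (thread count per state, frames per thread id) from dump text."""
    states: Counter = Counter()
    stacks: Dict[str, List[str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = THREAD_HEADER.match(line)
        if match:
            current = match.group(1)
            states[match.group(2)] += 1
            stacks[current] = []
        elif current is not None:
            stacks[current].append(line)  # frame line, innermost first
    return states, stacks


SAMPLE = '''\
Thread 0x7f8b2c19c700 (active+gil): "MainThread"
    fetch_data (my_module.py:42)
    main (my_module.py:78)
    <module> (my_script.py:5)
Thread 0x7f8b1a3fd700 (idle): "Thread-1"
    wait (threading.py:312)
    join (threading.py:355)
    main (my_module.py:80)
'''

if __name__ == "__main__":
    states, stacks = summarize_dump(SAMPLE)
    print(dict(states))
    for thread_id, frames in stacks.items():
        print(thread_id, "->", frames[0] if frames else "(no frames)")
```

Counting threads per state and printing each thread's innermost frame is often enough to tell a lock convoy (many idle threads parked on the same frame) from a single busy thread holding the GIL.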
Fix 8: Reading Flamegraphs
py-spy generates SVG flamegraphs — interactive in any browser.
py-spy record -o profile.svg --pid 12345 --duration 60
open profile.svg # macOS: opens in default browser
Flamegraph anatomy:
- X-axis (width) = total sample time at that function
- Y-axis (depth) = call stack — deeper = nested calls
- Color = arbitrary, helps distinguish adjacent frames
- Click a frame = zoom in to that subtree
- Search box = highlight matching frames
To find bottlenecks:
- Look for wide blocks at the leaf end of the stacks (the frames farthest from the root): these are where the actual work is done
- Look for stacks that recur — same function called from many paths is a candidate for caching
- Ignore tall narrow towers — they’re deep but not consuming much time
Reverse flamegraph (icicle graph) — root at top, leaves at bottom:
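py-spy record -o profile.svg --flamegraph-direction down --pid 12345
Useful when you want to start from “what’s calling X?” rather than “where does X live?” If your py-spy build rejects --flamegraph-direction, run py-spy record --help to see which options it supports.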
Speedscope format for richer analysis:
py-spy record -o profile.json --format speedscope --pid 12345
Then open it in speedscope.app for an interactive timeline, multiple visualization modes, and filtering.
Production Incident Lens — Attaching Without Adding Risk
The reason teams reach for py-spy mid-incident is exactly the reason they hesitate: the box is on fire and you need to look inside without making it worse. Two failure modes show up repeatedly during real incidents.
Failure mode 1 — py-spy itself spikes CPU on attach. Attaching to a process with thousands of threads, deep stacks, or heavy native code can momentarily push CPU higher while symbols resolve. On a worker already running at 85% CPU, that initial burst can tip latency over the SLO. Mitigation: start with --rate 50 --duration 30 on a single worker behind a healthy load balancer, never the whole fleet, and never the leader of a quorum.
# Lowest-risk first attach
py-spy record -o /tmp/profile.svg --pid <one-worker-pid> --rate 50 --duration 30
Failure mode 2: CAP_SYS_PTRACE leaks across the blast radius. The convenient sudo sysctl kernel.yama.ptrace_scope=0 you set during the incident often gets forgotten, and if you also wrote it to /etc/sysctl.d/ it survives reboots until someone audits it months later. Every process on the box can now ptrace other processes owned by the same user, which is a real privilege-escalation accelerant if the host is later compromised. The correct pattern is setcap cap_sys_ptrace=eip on the py-spy binary itself, scoped to that binary, scoped to the operator.
Blast radius checklist before you attach to production:
- One worker, not the fleet — pull it out of rotation if possible.
- --rate 50, --duration 30 for the first sample. Increase only if results are too sparse.
- Run from a sidecar container with shared PID namespace, not inside the app container.
- Capture the SVG to a writable volume you can grab even if the worker is later replaced.
- Reset any kernel toggles you flipped: ptrace_scope should go back to 1 once the investigation closes.
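The rate and duration limits from the checklist can be baked into a small wrapper so nobody attaches at a high rate by accident. This is a hedged sketch: build_first_attach_cmd and its ceiling values are hypothetical choices, and it uses only the record flags already shown in this guide (-o, --pid, --rate, --duration).

```python
import shlex
from typing import List

MAX_RATE_HZ = 50      # ceiling for a first attach, taken from the checklist
MAX_DURATION_S = 30   # ceiling for a first attach, taken from the checklist


def build_first_attach_cmd(
    pid: int,
    output: str = "/tmp/profile.svg",
    rate_hz: int = MAX_RATE_HZ,
    duration_s: int = MAX_DURATION_S,
) -> List[str]:
    """Build a conservative `py-spy record` command line for one worker."""
    if pid <= 0:
        raise ValueError("pid must be a positive integer")
    if not 1 <= rate_hz <= MAX_RATE_HZ:
        raise ValueError(f"rate_hz must be between 1 and {MAX_RATE_HZ}")
    if not 1 <= duration_s <= MAX_DURATION_S:
        raise ValueError(f"duration_s must be between 1 and {MAX_DURATION_S}")
    return [
        "py-spy", "record", "-o", output,
        "--pid", str(pid),
        "--rate", str(rate_hz),
        "--duration", str(duration_s),
    ]


if __name__ == "__main__":
    cmd = build_first_attach_cmd(12345)
    # Print only; pass `cmd` to subprocess.run(cmd, check=True) to execute it.
    print(shlex.join(cmd))
```

Returning an argument list instead of a shell string keeps the PID and output path out of shell interpolation, which matters when the wrapper is driven by an incident bot or runbook tool.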
Real incident pattern — the “p99 climbed at 03:00 and we don’t know why” page. Metrics show CPU is up, logs show nothing unusual, and the obvious suspects (deploys, traffic spike) are clean. The right move is py-spy dump --pid <pid> first — instant snapshot, zero sampling overhead, tells you what every thread is doing right now. If most threads are in socket.recv or psycopg2, the answer is downstream and profiling further won’t help. If most threads are in CPU-bound Python, that’s when you switch to record for a flamegraph.
Pro Tip: Treat py-spy access like a sharp tool. Document who has CAP_SYS_PTRACE, in which environments, and audit it quarterly. The first time you reach for py-spy at 3am is not the time to also be debugging permissions.
Still Not Working?
py-spy vs scalene vs cProfile
- py-spy — Sampling, low overhead, attach to running processes. Best for production debugging.
- scalene — Sampling + CPU + memory + GPU. Slightly more overhead than py-spy. Best when you want everything in one tool.
- cProfile — Deterministic stdlib profiler, captures every call. Best for unit-test-style profiling of specific code.
- memray — Memory profiling specifically. See memray not working.
For “production is slow,” start with py-spy. For “this test is slow,” use cProfile or pytest-benchmark.
Profiling pytest Tests
py-spy record -o test-profile.svg -- pytest tests/slow_test.py
Or use pytest-py-spy:
pip install pytest-py-spy
pytest --py-spy tests/
This is the cleanest way to investigate a slow CI suite: record once, share the SVG in the PR comment, and let reviewers see exactly which test eats the budget.
Profiling Async Code
Async code is tricky to profile — many tasks share the event loop. py-spy handles asyncio reasonably:
py-spy record --pid 12345 -o asyncio-profile.svg
The flamegraph shows run_forever (asyncio/base_events.py) at the root, with coroutines underneath. Look for _run_once and the coroutines it calls. If _run_once itself is wide, the loop is saturated and you need more workers or to move blocking work off the loop. If _run_once is narrow but a single coroutine dominates, that coroutine is the culprit.
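When the flamegraph points at a blocking call inside a coroutine, one common remedy is to run that call in a worker thread. The following is a minimal sketch using the standard library's asyncio.to_thread (Python 3.9+); blocking_io, ticker, and the timings are hypothetical stand-ins for your own code.

```python
import asyncio
import time


def blocking_io() -> str:
    """Stand-in for a blocking call (sync HTTP client, DB driver, file read)."""
    time.sleep(0.2)
    return "done"


async def ticker(stop: asyncio.Event) -> int:
    """Count how often the loop gets to run while other work is in flight."""
    ticks = 0
    while not stop.is_set():
        ticks += 1
        await asyncio.sleep(0.01)
    return ticks


async def main() -> tuple:
    stop = asyncio.Event()
    tick_task = asyncio.create_task(ticker(stop))
    # Runs in a worker thread, so the event loop keeps servicing ticker().
    result = await asyncio.to_thread(blocking_io)
    stop.set()
    ticks = await tick_task
    return result, ticks


if __name__ == "__main__":
    print(asyncio.run(main()))
```

If blocking_io() were called directly inside main(), ticker() would not advance during the sleep. That starved-loop pattern is what a wide coroutine frame in the flamegraph usually indicates.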
Profiling Production with Minimal Risk
# 30 seconds of sampling at 50 Hz on a single worker
py-spy record -o profile.svg --pid <worker-pid> --duration 30 --rate 50
50 Hz with a 30-second duration: ~0.5% CPU overhead, low risk of affecting production traffic. For longer observations, lower the rate further.
Pro tip for Docker: Don’t run py-spy inside the same container as the app for production profiling. Use a sidecar container that shares the PID namespace — keeps py-spy off the production hot path.
Combining with logging / metrics
Production profiling complements logs and metrics — not replaces them. Use py-spy when:
- Metrics show CPU is high
- Logs don’t reveal the cause
- You need to see what code path is responsible
Structured logs answer “what happened”; metrics answer “how much”; py-spy answers “where in the code did it happen.” Reach for it once the first two say something is wrong but neither says where.
Output Formats
py-spy record -o profile.svg --pid 12345 # Flamegraph SVG
py-spy record -o profile.json --format speedscope --pid 12345 # Speedscope JSON
py-spy record -o profile.raw --format raw --pid 12345 # Raw sample data
Raw format is useful for custom analysis: feed it into your own scripts to compute custom metrics.
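As one example of such a script, the sketch below totals samples by leaf frame to approximate self time. It assumes the raw file uses the collapsed-stack layout consumed by flamegraph.pl (semicolon-separated frames, root first, then a space and a sample count); verify that against a file from your own py-spy version. The function name and the sample data are hypothetical.

```python
from collections import Counter
from typing import Iterable


def self_samples(lines: Iterable[str]) -> Counter:
    """Sum sample counts by leaf frame (the last frame in each stack)."""
    totals: Counter = Counter()
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        stack, _, count = line.rpartition(" ")
        if not stack or not count.isdigit():
            continue  # skip lines that do not match the assumed layout
        leaf = stack.split(";")[-1]
        totals[leaf] += int(count)
    return totals


# Hypothetical sample data in the assumed layout
SAMPLE = [
    "<module> (my_script.py:5);main (my_module.py:78);fetch_data (my_module.py:42) 70",
    "<module> (my_script.py:5);main (my_module.py:78);parse (my_module.py:60) 25",
    "<module> (my_script.py:5);main (my_module.py:78) 5",
]

if __name__ == "__main__":
    totals = self_samples(SAMPLE)
    grand_total = sum(totals.values())
    for frame, count in totals.most_common():
        print(f"{100 * count / grand_total:5.1f}%  {frame}")
```

To run it on a real capture, pass an open file object instead of SAMPLE, for example `with open("profile.raw") as f: totals = self_samples(f)`.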
Live Profiling for FastAPI / Uvicorn
# Find the worker PID
ps aux | grep uvicorn
# Profile
py-spy top --pid <worker-pid>
Watch the workers handle requests in real time. For Uvicorn worker configuration, see Uvicorn not working.
py-spy Hangs the Target Process on Attach
A rare but real failure: py-spy attaches, the target process freezes for several seconds, then resumes. The cause is usually that py-spy is walking a very large heap or many threads while symbol resolution blocks on disk I/O (cold pyc cache, network-mounted site-packages). Mitigations:
- Pre-warm the worker before profiling — issue a few requests so files are page-cached.
- Skip --native for the first sample. It is the most expensive flag.
- Avoid network-mounted Python installations in production. Local disk only.
If the hang persists, switch to dump (snapshot only, no sampling loop) and skip record for that host.
Kernel and Container Edge Cases
gVisor and some other sandboxed runtimes silently disable ptrace. py-spy attaches with no error but returns empty stacks. Verify the runtime supports ptrace before debugging “py-spy broken.” Check /proc/<pid>/status for TracerPid — if you can’t see the field, the kernel is hiding it from you and py-spy will fail.
For Kubernetes, securityContext.capabilities.add: [SYS_PTRACE] is required on the pod that runs py-spy. PSPs or Pod Security Standards at restricted level will block it. You need an exception namespace or a privileged debug pod.
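Both checks can be scripted from inside the container that will run py-spy. The sketch below is an illustration for Linux only: it parses the text of /proc/<pid>/status, reports TracerPid, and tests bit 19 of CapEff, which is CAP_SYS_PTRACE in linux/capability.h. The helper names are hypothetical, and the capability check covers the process running the script, not capabilities attached to the py-spy binary with setcap.

```python
from typing import Dict, Optional

CAP_SYS_PTRACE_BIT = 19  # capability number from linux/capability.h


def parse_proc_status(text: str) -> Dict[str, str]:
    """Turn the 'Key: value' lines of /proc/<pid>/status into a dict."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def tracer_pid(fields: Dict[str, str]) -> Optional[int]:
    """PID of the attached tracer, 0 if none, None if the field is missing."""
    value = fields.get("TracerPid")
    return int(value) if value is not None else None


def has_cap_sys_ptrace(fields: Dict[str, str]) -> bool:
    """True if the effective capability mask (CapEff) has CAP_SYS_PTRACE."""
    cap_eff = fields.get("CapEff")
    if cap_eff is None:
        return False
    return bool(int(cap_eff, 16) & (1 << CAP_SYS_PTRACE_BIT))


if __name__ == "__main__":
    try:
        with open("/proc/self/status") as handle:
            status = parse_proc_status(handle.read())
    except OSError:
        print("/proc/self/status is not readable on this system")
    else:
        print("TracerPid:", tracer_pid(status))
        print("CAP_SYS_PTRACE effective:", has_cap_sys_ptrace(status))
```

Running this in the debug pod before invoking py-spy separates "the capability was never granted" from "the runtime hides ptrace", which call for different fixes.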
Symbols Look Like Hex Addresses
Stack frames showing as 0x7f8b2c19c700 instead of function names mean symbol resolution failed. Common causes:
- Stripped Python binary (Alpine images often ship without debug symbols). Use python:3.12-slim or python:3.12 instead of python:3.12-alpine when you expect to profile.
- Stale debug info path. py-spy uses the same paths as gdb; if your debug info is in a separate /usr/lib/debug tree, mount it into the profiling container.
- The target process is a frozen binary (PyInstaller, Nuitka). py-spy may not resolve frames inside the bundled interpreter. Profile the development build instead.
Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.