Python is the language where reachability analysis is most likely to disappoint and most worth doing anyway. The dynamic nature of the runtime means a static call graph is always an approximation, and aggressive monkey-patching in libraries like Celery or pytest can shift call edges in ways no static tool can predict. Yet the false-positive reduction from running reachability against a typical Django or FastAPI service is still 60 to 75%, which is the difference between a usable alert queue and an ignored one.
This post walks through what Python reachability tools can and cannot do, where the abstractions leak, and how to get useful results out of a tool stack that includes Semgrep, Snyk Code, Endor Labs, and the open-source pip-audit. It is written for backend engineers who have been told their SCA is broken and want to know whether reachability fixes it.
What makes Python reachability fundamentally hard?
Python is duck-typed and lacks the mandatory static type information that makes Rust or Java reachability tractable. A call like obj.process(data) could dispatch to any class that defines process, and a static analyzer cannot know which one without inferring types across the entire program. Tools approximate this with heuristics: type checkers such as pytype and Pyright produce inferences that a reachability engine can consume, and tools like Semgrep build a coarser dataflow graph that over-approximates edges.
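To make the over-approximation concrete, here is a minimal sketch of the name-based fallback an analyzer can use when it has no type information: every class that defines the method becomes a candidate target. The sample classes and the candidate_targets helper are hypothetical and written for illustration; no particular tool is claimed to work exactly this way.

```python
import ast

# Sample program, kept as a string so it is parsed but never executed.
SOURCE = '''
class CsvJob:
    def process(self, data):
        return data.split(",")

class UnsafeJob:
    def process(self, data):
        return eval(data)  # stand-in for a vulnerable sink

def handle(obj, data):
    return obj.process(data)
'''


def candidate_targets(source: str, method: str) -> list[str]:
    """Return every Class.method an untyped `x.method(...)` call could dispatch to."""
    targets = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ClassDef):
            continue
        for item in node.body:
            if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)) and item.name == method:
                targets.append(f"{node.name}.{item.name}")
    return sorted(targets)


# Without a type for `obj`, both classes are plausible targets of obj.process(data).
print(candidate_targets(SOURCE, "process"))
```

If handle() only ever receives CsvJob instances, the edge to UnsafeJob.process is a false positive. Type inference exists to prune exactly that kind of edge.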
The second problem is monkey-patching. Libraries like gevent replace standard library functions at import time, and ORMs add methods to model classes dynamically. A reachability tool that does not model these patterns will produce both false positives and false negatives. The third problem is import-time side effects: code that runs during module import can reconfigure the call graph for the rest of the process. Python is the worst major language for static reachability for these reasons, and a tool that claims 99% precision on Python is selling you something.
How do popular Python frameworks affect reachability?
Django and FastAPI structure the call graph in predictable ways that good tools have learned to model. In Django, URL routing in urls.py is the canonical entry point, and tools that parse URLconf can use it as a reachability root. Middleware and signal handlers introduce additional edges that need explicit modeling. In FastAPI, the route decorators on path functions are the entry points, and tools that parse decorators correctly produce reasonable call graphs from there.
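As a rough sketch of the decorator parsing described above, the snippet below walks a module's syntax tree and collects functions decorated with calls such as @app.get("/path"). It assumes routes are declared with a literal path as the first positional argument; the route_entry_points helper and the sample module are illustrative, and real tools handle many more shapes (routers, include_router, dependencies).

```python
import ast

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}

# Sample module, parsed only; FastAPI does not need to be installed.
SOURCE = '''
from fastapi import FastAPI

app = FastAPI()

@app.get("/items/{item_id}")
async def read_item(item_id: int):
    return load(item_id)

@app.post("/items")
def create_item(item: dict):
    return save(item)

def load(item_id):
    return {"id": item_id}
'''


def route_entry_points(source: str) -> list[tuple[str, str, str]]:
    """Return (function name, HTTP method, path) for each route-decorated function."""
    roots = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        for dec in node.decorator_list:
            # Match the shape <object>.<http method>("<literal path>", ...).
            if (
                isinstance(dec, ast.Call)
                and isinstance(dec.func, ast.Attribute)
                and dec.func.attr in HTTP_METHODS
                and dec.args
                and isinstance(dec.args[0], ast.Constant)
                and isinstance(dec.args[0].value, str)
            ):
                roots.append((node.name, dec.func.attr.upper(), dec.args[0].value))
    return sorted(roots)


for name, method, path in route_entry_points(SOURCE):
    print(f"{method} {path} -> {name}")
```

The undecorated load() is not a root on its own. It becomes reachable only through the call from read_item, which is the property that lets a tool discard code no route ever touches.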
Celery is harder. Task functions are invoked by name from a message broker, and the static caller is a delay or apply_async call that does not look like a direct invocation. Tools that model Celery's task registration correctly will treat the task functions as reachable; tools that don't will mark them as dead code. Endor Labs and Snyk Code both handle Celery competently as of their 2026 releases. The practical advice is to test your reachability tool against a known-vulnerable function in a Celery task to confirm it produces the expected result before trusting its broader output.
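The task-registration modeling can be sketched along these lines: treat any function decorated with shared_task or task as a task, then turn each delay or apply_async call on a known task into a call edge from the enclosing function. This is a simplified illustration with hypothetical helper names, and it deliberately ignores tasks dispatched by string name, which need separate handling.

```python
import ast

# Sample module, parsed only; Celery does not need to be installed.
SOURCE = '''
from celery import shared_task

@shared_task
def resize_image(path):
    return path

@shared_task(bind=True)
def send_email(self, to):
    return to

def helper():
    return None

def upload_view(request):
    resize_image.delay(request.path)
    send_email.apply_async(args=["ops@example.com"])
    helper.delay()
'''

TASK_DECORATORS = {"shared_task", "task"}
DISPATCH_METHODS = {"delay", "apply_async"}
FUNCTION_NODES = (ast.FunctionDef, ast.AsyncFunctionDef)


def _decorator_name(dec):
    """Return the trailing name of a decorator such as @shared_task or @app.task(...)."""
    target = dec.func if isinstance(dec, ast.Call) else dec
    if isinstance(target, ast.Name):
        return target.id
    if isinstance(target, ast.Attribute):
        return target.attr
    return None


def celery_edges(source: str) -> list[tuple[str, str]]:
    """Return (caller, task) edges implied by task.delay() / task.apply_async() calls."""
    tree = ast.parse(source)
    tasks = {
        node.name
        for node in ast.walk(tree)
        if isinstance(node, FUNCTION_NODES)
        and any(_decorator_name(d) in TASK_DECORATORS for d in node.decorator_list)
    }
    edges = set()
    for func in ast.walk(tree):
        if not isinstance(func, FUNCTION_NODES):
            continue
        for call in ast.walk(func):
            if (
                isinstance(call, ast.Call)
                and isinstance(call.func, ast.Attribute)
                and call.func.attr in DISPATCH_METHODS
                and isinstance(call.func.value, ast.Name)
                and call.func.value.id in tasks
            ):
                edges.add((func.name, call.func.value.id))
    return sorted(edges)


print(celery_edges(SOURCE))
```

The helper.delay() call produces no edge because helper is not a registered task. A tool that skips this modeling sees no caller for resize_image or send_email at all, and that is how task code ends up mislabeled as dead.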
What about optional extras and pip's dependency resolution?
The extras mechanism (extras_require in setuptools, optional-dependencies in pyproject.toml) lets a package declare optional dependencies that pip installs only when explicitly requested. A vulnerable function in an extra that you never install is unambiguously unreachable, but a reachability tool that walks the full advertised dependency graph will list it as present. This was a common source of confusion with the requests[security] extra historically, and it persists today with packages like pandas[excel] or pydantic[email].
The right approach is to read the actual installed package set from pip freeze or the project's lock file, not the dependencies declared in setup.py or pyproject.toml. Tools that consume lock files from Poetry, PDM, or pip-tools get this right; tools that scan requirements.txt without resolution often get it wrong. The recent improvements in uv and rye, with their proper lockfile semantics, have made this easier for tool vendors to handle correctly. If your reachability tool does not consume a lockfile, treat its output with skepticism.
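For a sense of what "read the installed set" means in practice, the sketch below uses the standard library's importlib.metadata to enumerate what is installed in the current environment, then drops advisories for packages that are not present. The advisory records and their field names are hypothetical; real scanners have their own schemas.

```python
import re
from importlib import metadata


def normalize(name: str) -> str:
    """Normalize a project name so that Foo_Bar, foo.bar and foo-bar compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def installed_packages() -> dict[str, str]:
    """Map normalized project name to version for the environment running this code."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with broken metadata
            found[normalize(name)] = dist.version
    return found


def present_advisories(advisories, installed):
    """Keep only advisories whose package is actually installed."""
    return [a for a in advisories if normalize(a["package"]) in installed]


# Hypothetical advisory records and a stand-in for installed_packages().
advisories = [
    {"id": "ADV-1", "package": "Email_Validator"},
    {"id": "ADV-2", "package": "openpyxl"},
]
installed = {"email-validator": "0.0.0"}

print(present_advisories(advisories, installed))
```

In a real pipeline, installed would come from calling installed_packages() inside the production virtualenv or from parsing the lock file, so an extra that was never requested simply never appears.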
What does a real CVE look like with Python reachability?
Consider CVE-2024-3651, a denial-of-service issue in the idna package's encode() function that affected most Python web services through urllib3 and requests. The vulnerable code is reached only when parsing externally supplied internationalized hostnames, which is common in web crawlers but rare in typical API services. Reachability analysis on a Django REST API that makes outbound HTTP calls only to a handful of internal services will correctly mark this as unreachable, even though idna is present in the dependency tree.
Contrast with CVE-2022-23491, the certifi root CA issue. The problem there is the certificate bundle itself, which is data rather than a function, and almost every HTTPS-using Python service loads it as soon as it opens a TLS connection, so it is reachable in the technical sense nearly everywhere. Reachability cannot help you here, and the right response is to patch certifi directly. The lesson is that reachability is a filter, not a magic wand. About 70% of Python CVEs in our measured dataset are filterable; the other 30% require ordinary patching discipline.
How should Python teams operationalize this?
The pragmatic pattern is to run reachability in CI against the installed virtualenv, not the source tree alone. Build the production environment, run your reachability tool with that environment as input, and gate merges on reachable-critical CVEs. Use pip-audit as a baseline check that runs on every commit, and run the heavier reachability tool on a slower cadence, daily or per-PR depending on the cost.
A common mistake is to run reachability on a development environment that includes test dependencies. Test dependencies routinely include packages with known CVEs that never ship to production. Filter to the production dependency set only, or you will generate alerts on dev-only packages that no one needs to fix.
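Putting the gate and the production filter together, a merge check might look roughly like the sketch below. The finding records, their field names, and the package names are all hypothetical; adapt them to whatever your reachability tool actually emits.

```python
def blocking_findings(findings, production_packages):
    """Return findings that should fail the build: shipped, reachable, and critical."""
    return [
        f
        for f in findings
        if f["package"] in production_packages
        and f["reachable"]
        and f["severity"] == "critical"
    ]


# Hypothetical tool output; none of these refer to real packages or advisories.
findings = [
    {"id": "F-1", "package": "pkg-a", "severity": "critical", "reachable": False},
    {"id": "F-2", "package": "pkg-b", "severity": "critical", "reachable": True},
    {"id": "F-3", "package": "test-only-pkg", "severity": "critical", "reachable": True},
    {"id": "F-4", "package": "pkg-c", "severity": "medium", "reachable": True},
]

# In practice this set comes from the production lock file or installed environment.
production_packages = {"pkg-a", "pkg-b", "pkg-c"}

blockers = blocking_findings(findings, production_packages)
for finding in blockers:
    print(f"BLOCK {finding['id']}: {finding['package']}")

# A CI wrapper would pass this value to sys.exit() to fail the job.
exit_code = 1 if blockers else 0
print("exit code:", exit_code)
```

Only F-2 blocks the merge: F-1 is unreachable, F-3 is reachable but lives in a package that never ships, and F-4 is below the severity threshold. Those three still belong in a report, just not in the gate.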
How Safeguard Helps
Safeguard ingests Python dependency manifests from pip, Poetry, PDM, uv, and Pipenv, and runs reachability against installed environments using framework-aware call graphs for Django, FastAPI, Flask, and Celery. Griffin AI explains why a CVE is reachable in terms of the specific route handler or task that calls into the vulnerable code. SBOMs distinguish production from dev dependencies so policy gates only enforce on what actually ships. TPRM scores PyPI maintainers on response time and namespace history, and our zero-CVE Python base images give services a clean runtime foundation that survives audits.