Static analysis has always been a balancing act between depth and usability. CodeQL and Semgrep represent two different philosophies for solving this problem, and the choice between them shapes how your team approaches code security.
Fundamentals
CodeQL, originally developed by Semmle and maintained by GitHub since its acquisition of Semmle, treats code as data. It compiles source code into a relational database, then lets you query that database using a purpose-built query language called QL. This approach enables deep semantic analysis, including data flow tracking, taint analysis, and complex pattern matching across function boundaries.
Semgrep, from r2c (now Semgrep Inc.), takes a pattern-matching approach. You write rules that look like the code you want to find, with metavariables for the parts that vary. The syntax is deliberately close to the source language, which makes rules readable by developers who have never used Semgrep before.
The philosophical difference: CodeQL treats security analysis as a database query problem. Semgrep treats it as a pattern matching problem. Both are valid approaches, but they lead to very different user experiences and capabilities.
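To make the metavariable idea concrete, here is a toy matcher built on Python's standard ast module. It is a minimal sketch of the general technique, not Semgrep's implementation: the MV_ prefix is a hypothetical stand-in for Semgrep's $ sigil (Python identifiers cannot start with $), and the names matches, same, and find are made up for this illustration.

```python
import ast

MV_PREFIX = "MV_"  # hypothetical stand-in for Semgrep's "$" metavariable sigil


def matches(pattern, node, bindings):
    """Return True if `node` has the same shape as `pattern`.

    Names starting with MV_PREFIX match any expression and are recorded in
    `bindings`; a repeated metavariable must match the same source text.
    """
    if isinstance(pattern, ast.Name) and pattern.id.startswith(MV_PREFIX):
        text = ast.unparse(node)  # requires Python 3.9+
        return bindings.setdefault(pattern.id, text) == text
    if type(pattern) is not type(node):
        return False
    return all(
        same(getattr(pattern, field, None), getattr(node, field, None), bindings)
        for field in pattern._fields
    )


def same(p_val, n_val, bindings):
    """Compare one field value: a child node, a list of children, or a literal."""
    if isinstance(p_val, ast.AST):
        return isinstance(n_val, ast.AST) and matches(p_val, n_val, bindings)
    if isinstance(p_val, list):
        return (
            isinstance(n_val, list)
            and len(p_val) == len(n_val)
            and all(same(p, n, bindings) for p, n in zip(p_val, n_val))
        )
    return p_val == n_val


def find(pattern_src, code_src):
    """Return (line, bindings) for every expression in code_src matching the pattern."""
    pattern = ast.parse(pattern_src, mode="eval").body
    hits = []
    for node in ast.walk(ast.parse(code_src)):
        bindings = {}
        if matches(pattern, node, bindings):
            hits.append((getattr(node, "lineno", None), bindings))
    return sorted(hits, key=lambda hit: hit[0] or 0)


SAMPLE = '''
def lookup(cur, name):
    cur.execute("SELECT * FROM users WHERE name = '%s'" % name)
    cur.execute("SELECT * FROM users WHERE name = ?", (name,))
'''

for line, bound in find("MV_CURSOR.execute(MV_QUERY % MV_ARG)", SAMPLE):
    print(line, bound["MV_CURSOR"], bound["MV_ARG"])  # prints: 3 cur name
```

The pattern reads like the code it is looking for, which is the point: only the string-formatted call on line 3 matches, while the parameterized call on line 4 has a different shape and is ignored.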
Rule Authoring
This is where the tools diverge most dramatically.
Writing a CodeQL query requires learning QL, a Datalog-inspired query language. QL is powerful but has a genuine learning curve. A simple query to find SQL injection takes 15-20 lines of QL code and requires understanding the CodeQL library hierarchy for the target language. A complex data flow query can run to hundreds of lines.
Semgrep rules are YAML files with pattern fields that mirror the target language syntax. Finding SQL injection in Python looks roughly like writing the vulnerable code pattern with metavariables in place of specific values. A developer who has never seen Semgrep can read a rule and understand what it catches. Writing custom rules typically takes minutes rather than hours.
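As a rough illustration of how small such a rule is, the sketch below builds one as a Python dictionary and serializes it. The field names (id, languages, severity, message, pattern) follow Semgrep's rule schema as I understand it; the rule id, the message text, and the render helper are hypothetical. The output is JSON, which YAML parsers generally accept, though in practice you would normally write the rule directly as YAML.

```python
import json

# A Semgrep-style rule expressed as plain data. The id and message are
# made up for this example; "$CURSOR", "$QUERY", and "$ARGS" are metavariables.
RULE_FILE = {
    "rules": [
        {
            "id": "python-sqli-percent-format",
            "languages": ["python"],
            "severity": "ERROR",
            "message": "SQL built with % formatting; use a parameterized query.",
            "pattern": "$CURSOR.execute($QUERY % $ARGS)",
        }
    ]
}


def render(rule_file):
    """Serialize the rule file; JSON output is also readable by YAML parsers."""
    return json.dumps(rule_file, indent=2)


print(render(RULE_FILE))
```

Saved to a file, a rule like this would typically be run with something along the lines of semgrep --config followed by the file path and the directory to scan.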
For security teams that want to encode internal coding standards and organization-specific vulnerability patterns, Semgrep's approachability is a massive advantage. When any developer on the team can write and maintain rules, the rule set stays current and relevant.
For security researchers who need to trace data flows through complex codebases, CodeQL's query language is more expressive. The things you can find with a well-crafted CodeQL query (multi-step taint propagation through serialization boundaries, for instance) are difficult or impossible to express in Semgrep.
Analysis Depth
CodeQL performs whole-program analysis. It builds a database from the entire codebase and resolves function calls, type hierarchies, and data flows across files and modules. This means CodeQL can find vulnerabilities where tainted data passes through five function calls before reaching a dangerous sink.
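The core idea behind following tainted data through several calls can be sketched as a reachability search over a data flow graph. This is a deliberately simplified model, not how CodeQL is implemented: the graph, the function names in it, and find_taint_paths are all hypothetical, and a real engine derives the edges from the program rather than taking them as input.

```python
from collections import deque


def find_taint_paths(flows, sources, sinks, sanitizers=frozenset()):
    """Breadth-first search from each source to any sink.

    `flows` maps a node to the nodes its data flows into. Nodes listed in
    `sanitizers` stop propagation. Each node is visited once per source, so
    at most one (shortest) path per reachable sink is reported per source.
    """
    paths = []
    for source in sorted(sources):
        queue = deque([[source]])
        seen = {source}
        while queue:
            path = queue.popleft()
            current = path[-1]
            if current in sinks:
                paths.append(path)
                continue
            for nxt in flows.get(current, ()):
                if nxt in seen or nxt in sanitizers:
                    continue
                seen.add(nxt)
                queue.append(path + [nxt])
    return paths


# Hypothetical flow: user input passes through five functions before the sink.
FLOWS = {
    "request.args": ["parse_params"],
    "parse_params": ["UserController.show"],
    "UserController.show": ["UserService.find"],
    "UserService.find": ["UserRepo.by_name"],
    "UserRepo.by_name": ["build_query"],
    "build_query": ["cursor.execute"],
}

for path in find_taint_paths(FLOWS, {"request.args"}, {"cursor.execute"}):
    print(" -> ".join(path))
```

Building the edges is the expensive part. Once every call and assignment across the codebase is in one graph, a question like "does any request parameter reach a query sink?" becomes a search problem.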
Semgrep historically operated within single files. Inter-file analysis was added through Semgrep Pro (the commercial offering), which includes cross-file data flow and taint tracking. The free open source version is limited to single-file analysis, so it misses vulnerabilities whose source and sink live in different files.
This distinction matters more for some vulnerability classes than others. SQL injection where the query string is built in the same function that executes it? Both tools find it. SQL injection where user input passes through a controller, a service layer, and a data access layer? CodeQL finds it reliably. Semgrep needs the Pro tier, and even then the cross-file analysis is not as mature.
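The layered case looks something like the following sketch. All names here are hypothetical, and the three layers are shown in one block for brevity; imagine each living in its own module. The untrusted input enters in the controller, while the unsafe string formatting sits two calls away in the data access layer.

```python
import sqlite3


# --- data access layer (imagine dao.py) ---
def find_user_rows(conn, name):
    # Sink: the query is assembled with string formatting.
    query = "SELECT id, name FROM users WHERE name = '%s'" % name
    return conn.execute(query).fetchall()


# --- service layer (imagine service.py) ---
def find_user(conn, name):
    # Passes the value along; stripping whitespace does not sanitize it.
    return find_user_rows(conn, name.strip())


# --- controller layer (imagine controller.py) ---
def handle_request(conn, params):
    # Source: `params` stands in for untrusted request data.
    return find_user(conn, params["name"])


def make_db():
    """Create a throwaway in-memory database with two users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    return conn


conn = make_db()
print(handle_request(conn, {"name": "alice"}))         # one matching row
print(handle_request(conn, {"name": "x' OR '1'='1"}))  # every row in the table
```

Looking at the data access file alone, an analyzer can flag the formatting pattern but cannot tell whether name is ever attacker-controlled. Looking at the controller alone, nothing dangerous is visible. Connecting the two requires following the calls across files.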
Performance
Semgrep is fast. Scanning a large codebase (500K+ lines) typically takes 30-60 seconds. It starts quickly, processes files in parallel, and outputs results without a compilation step. In a CI pipeline, Semgrep adds minimal time to your build.
CodeQL is slow. The database creation step alone can take 5-30 minutes depending on language and codebase size. Running queries against the database adds another few minutes. Total analysis time for a medium codebase is 10-40 minutes. In CI, this is noticeable.
The performance gap is a consequence of the architectural differences. CodeQL's whole-program analysis requires building a complete code database. Semgrep's pattern matching can process files independently. You are trading speed for depth.
Language Support
CodeQL supports C, C++, C#, Go, Java, Kotlin, JavaScript, TypeScript, Python, Ruby, and Swift. Each language has a dedicated extractor and library, which means coverage quality varies. Java and JavaScript have the most mature CodeQL libraries.
Semgrep supports over 30 languages at varying levels of maturity. The broadly supported languages include Python, JavaScript, TypeScript, Java, Go, Ruby, C, C++, PHP, Rust, Scala, Kotlin, and Swift. Semgrep can also handle generic pattern matching for languages without full parsing support.
For polyglot organizations, Semgrep's language breadth is an advantage. For organizations standardized on a few well-supported languages, CodeQL's deeper analysis per language may be more valuable.
Rule Ecosystems
Both tools ship with extensive default rule sets covering OWASP Top 10 vulnerabilities and common security misconfigurations.
The Semgrep Registry contains thousands of community-contributed rules organized by language, framework, and vulnerability class. The quality varies, but the curated rules from the Semgrep team are solid. You can also find rules targeting specific frameworks like Django, Flask, Spring, and Express.
CodeQL's default queries are maintained by GitHub's security research team and are consistently high quality. The community query ecosystem is smaller but focused on security-relevant patterns. GitHub Code Scanning uses CodeQL queries under the hood, which means the default rule set gets extensive real-world testing across millions of repositories.
CI/CD Integration
Semgrep integrates through a simple CLI command or GitHub Action. The semgrep ci command is designed specifically for CI use, with diff-aware scanning that only reports findings in changed code. This prevents the "wall of findings" problem that kills developer trust in security tooling.
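One way to get a diff-aware effect is to compare the findings on the branch against the findings on its baseline and report only the difference. The sketch below illustrates that idea in isolation and makes no claim about how semgrep ci does it internally; the finding fields (rule_id, path, snippet) and the new_findings helper are assumptions made for this example.

```python
def finding_key(finding):
    """Identify a finding by rule, file, and matched code rather than line number,
    so that code shifting up or down does not make an old finding look new."""
    return (finding["rule_id"], finding["path"], finding["snippet"].strip())


def new_findings(baseline, head):
    """Return the findings present in `head` that were not already in `baseline`."""
    known = {finding_key(f) for f in baseline}
    return [f for f in head if finding_key(f) not in known]


# Hypothetical scan results before and after a pull request.
BASELINE = [
    {"rule_id": "sqli", "path": "dao.py", "line": 10, "snippet": "cur.execute(q % name)"},
]
HEAD = [
    # Same issue as before, pushed down two lines by unrelated edits.
    {"rule_id": "sqli", "path": "dao.py", "line": 12, "snippet": "cur.execute(q % name)"},
    # Introduced by the pull request.
    {"rule_id": "sqli", "path": "report.py", "line": 7, "snippet": "cur.execute(sql % day)"},
]

for finding in new_findings(BASELINE, HEAD):
    print(finding["path"], finding["line"], finding["rule_id"])  # prints: report.py 7 sqli
```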
CodeQL integrates through GitHub Code Scanning (native integration), GitHub Actions, or the CLI. On GitHub, the integration is seamless. Code Scanning results appear inline in pull requests, and the CodeQL Action handles database creation and query execution. Off GitHub, the setup is more involved.
Pricing
Semgrep's open source version is free and genuinely useful for single-file analysis. Semgrep Pro, which adds cross-file analysis, is priced per developer. Pricing is not public but runs in the range of $40-60 per developer per month.
CodeQL is free for open source repositories on GitHub. For private repositories, it requires GitHub Advanced Security, which costs $49 per active committer per month. If you are already on GitHub Enterprise, adding Advanced Security is the simplest path to CodeQL.
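For budgeting, the per-seat figures above translate into annual cost with simple arithmetic. The team size of 50 below is a hypothetical example, and the sketch assumes the number of Semgrep developers and the number of GitHub active committers are the same, which may not hold for your organization.

```python
def annual_cost(seats, price_per_seat_per_month):
    """Yearly cost in dollars for a per-seat, per-month price."""
    return seats * price_per_seat_per_month * 12


TEAM_SIZE = 50  # hypothetical team size

codeql_cost = annual_cost(TEAM_SIZE, 49)   # GitHub Advanced Security price quoted above
semgrep_low = annual_cost(TEAM_SIZE, 40)   # low end of the quoted Semgrep Pro range
semgrep_high = annual_cost(TEAM_SIZE, 60)  # high end of the quoted Semgrep Pro range

print(f"CodeQL via Advanced Security: ${codeql_cost:,}")      # $29,400
print(f"Semgrep Pro: ${semgrep_low:,} to ${semgrep_high:,}")  # $24,000 to $36,000
```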
Practical Recommendation
Use both if you can justify the pipeline time. Run Semgrep as the fast first pass that catches pattern-based issues, and CodeQL as the deep analysis that catches data flow vulnerabilities. The overlap in findings is surprisingly small; each tool catches things the other misses.
If you must choose one, pick Semgrep for teams that need fast feedback and want developers writing custom rules. Pick CodeQL for teams on GitHub that need deep taint analysis and are willing to accept longer scan times.
How Safeguard.sh Helps
Safeguard.sh aggregates findings from both Semgrep and CodeQL (along with other SAST tools) into a unified vulnerability management workflow. Instead of triaging findings in separate dashboards, your security team sees a deduplicated, prioritized list of issues across all scanners. Safeguard.sh also correlates SAST findings with SCA data, so you can see whether a vulnerable coding pattern is compounded by a vulnerable dependency, giving you a more accurate picture of actual risk.