Published on July 1, 2026 | 11 min read

CI/CD Pipeline Penetration Testing: What a Build-Chain Test Scopes That an App Pentest Misses

A CI/CD pipeline penetration test scopes the build chain itself: secrets, OIDC and cloud credentials, runners, the action and dependency supply chain, and deploy tokens. Here is how it differs from an application pentest.

Arafat Afzalzada

Founder

Web App Security

TL;DR

A CI/CD pipeline penetration test scopes the build and deploy chain as its own high-trust attack surface: pipeline secrets, OIDC and cloud credentials, self-hosted runners, the third-party action and dependency supply chain, and deploy tokens. An application pentest scopes the running product at its trust boundary. They overlap only at the edges. The tj-actions/changed-files compromise (CVE-2025-30066) put CI secrets at risk of exposure in the workflow logs of more than 23,000 repositories in March 2025, and the self-propagating Shai-Hulud npm worm backdoored packages across the ecosystem later that year. Those attacks live entirely inside the build chain, where a normal app test never looks. Snipe, Stingrai's autonomous web-app agent, narrows the remaining gap by gating pull requests and shipping AutoFix remediation directly in CI.

A CI/CD pipeline penetration test scopes the build and deploy chain itself as an attack surface: the secrets stored in the pipeline, the OIDC and cloud credentials it assumes, the runners that execute jobs, the third-party actions and dependencies it pulls, and the deploy tokens it uses to ship to production. An application penetration test scopes the running product at its trust boundary. The two overlap only at the edges. This is the single most useful distinction to hold onto: an app test asks "can an attacker abuse the software you shipped," while a pipeline test asks "can an attacker abuse the machinery that ships it."

That machinery has become one of the highest-value targets in modern security. In March 2025, attackers compromised the popular tj-actions/changed-files GitHub Action (CVE-2025-30066), which was used by more than 23,000 repositories, and made it dump CI secrets into workflow logs (Sysdig). None of those secrets lived in the application. They lived in the pipeline, which is exactly the surface a build-chain pentest is built to interrogate.

This guide explains the pipeline as a high-trust attack surface, gives a clear scope-comparison table between a pipeline pentest and an app pentest, and lays out what a build-chain test uniquely covers. It is written for DevSecOps and platform-security teams whose pipeline holds signing keys, cloud credentials, and deploy tokens. There are no exploit walkthroughs or payloads here, only synthetic examples kept at scope level.

What does a CI/CD pipeline penetration test cover, and how is it different from an application pentest?

Answer first, because this is the question most teams arrive with. A CI/CD pipeline penetration test covers the build and deploy infrastructure: how pipelines are triggered, what secrets and credentials they can reach, whether third-party actions and dependencies can be poisoned, whether runners can be compromised or reused across trust boundaries, and whether a low-privilege contributor can escalate a pull request into production access. An application penetration test covers the deployed application: authentication, authorization, input handling, business logic, and the data the app exposes to its users.

The difference matters because the two attack surfaces have different trust models. An application faces the internet and expects hostile input from anonymous users. A pipeline faces its own developers and expects trusted input from authenticated contributors, which is precisely why a poisoned commit or a malicious dependency travels so far so fast. Verizon's 2025 Data Breach Investigations Report found that the share of breaches involving a third party doubled to 30 percent (up from 15 percent the year before), a trend that reflects this class of inherited, trusted-path risk. Your app pentest does not model it because your app is not where it happens.

[Figure: CI/CD attack surface map]

The pipeline is a high-trust attack surface

The reason a pipeline is dangerous is that it is trusted by design. It holds the keys to production, it runs code from many sources, and it usually runs that code with more privilege than any single developer holds. Break the pipeline and you inherit all of it at once. Five asset classes make the build chain a distinct target, and one attack pattern, poisoned pipeline execution, ties them together.

Pipeline secrets

Pipelines store secrets so that jobs can authenticate to registries, cloud accounts, and internal services. Those secrets are frequently the crown jewels: registry credentials, database passwords, API keys, and signing material. The tj-actions compromise worked precisely because the payload dumped these secrets into logs that were, in many public repositories, world-readable. A pipeline pentest inventories where secrets are defined, which jobs can read them, whether they leak into logs or artifacts, and whether a fork or a pull request can reach them.
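The "leak into logs" part of that inventory can be illustrated with a small scan. This is a minimal sketch, not a complete detector: the three patterns and the function name `find_secret_like_strings` are assumptions chosen for illustration, and the log is synthetic (it uses the placeholder key ID from AWS documentation). A secret that is encoded before being printed would slip past plain patterns like these.

```python
import re

# Illustrative patterns only (an assumption of this sketch); a real
# inventory would rely on a maintained ruleset.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bgh[pousr]_[A-Za-z0-9]{36,}\b"),
    "private_key_header": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def find_secret_like_strings(log_text: str) -> list[tuple[int, str]]:
    """Return (line_number, pattern_name) for each secret-shaped match."""
    hits = []
    for line_number, line in enumerate(log_text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((line_number, name))
    return hits


# Synthetic log text; the key ID is a documentation placeholder.
sample_log = "step 1: checkout\nexport KEY=AKIAIOSFODNN7EXAMPLE\nstep 2: build"
print(find_secret_like_strings(sample_log))
```

Run against archived logs and build artifacts, a scan of this kind answers only one of the inventory questions. Which jobs can read a secret, and whether a fork can reach it, still has to come from the pipeline configuration itself.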

OIDC and cloud credentials

Modern pipelines increasingly assume short-lived cloud credentials through OpenID Connect (OIDC) rather than storing long-lived cloud keys. This is a security improvement, but it moves the risk into the trust policy: if the cloud role trusts the pipeline's OIDC issuer too broadly (any branch, any repository, any workflow), then a single poisoned workflow can assume a production cloud role. A build-chain test scopes the OIDC trust conditions, the subject claims, and the blast radius of each assumable role. An app pentest never sees this because it lives one layer below the application.
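To make "too broadly" concrete, the sketch below audits a subject pattern of the kind a cloud trust policy matches against the OIDC token. It assumes the GitHub Actions subject shape `repo:OWNER/REPO:ref:refs/heads/BRANCH`; the function name `audit_subject_pattern`, the finding strings, and the `example-org/app` repository are hypothetical, and other CI providers format subjects differently.

```python
def audit_subject_pattern(pattern: str) -> list[str]:
    """Flag ways a trust-policy subject pattern is broader than one repo and ref.

    Assumes the GitHub Actions subject shape repo:OWNER/REPO:<context>.
    """
    parts = pattern.split(":")
    if len(parts) < 2 or parts[0] != "repo":
        return ["unrecognized subject format"]

    findings = []
    owner_repo = parts[1]
    if "*" in owner_repo:
        findings.append("wildcard in repository: other repositories can match")

    # Everything after the repository is the ref or environment constraint.
    rest = ":".join(parts[2:])
    if rest in ("", "*"):
        findings.append("no ref constraint: any branch, tag, or pull request can match")
    elif "*" in rest:
        findings.append("wildcard in ref: more than one branch or tag can match")
    return findings


for candidate in (
    "repo:example-org/app:ref:refs/heads/main",  # tightly scoped
    "repo:example-org/app:*",                    # any ref in one repo
    "repo:example-org/*",                        # any repo in the org
):
    print(candidate, "->", audit_subject_pattern(candidate))
```

An empty result means only that the pattern pins one repository and one context. The permissions attached to the role still define the blast radius if that one workflow is poisoned.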

Runners

Runners are the machines that execute pipeline jobs. Self-hosted runners are especially sensitive: if a job from an untrusted pull request lands on a runner that also processes trusted jobs, or if a runner persists state between jobs, an attacker can pivot from a low-trust build into a high-trust one. A pipeline pentest looks at runner isolation, whether ephemeral runners are actually ephemeral, and whether public pull requests can reach self-hosted infrastructure.
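The "can public pull requests reach self-hosted infrastructure" question can be sketched as a check over a parsed workflow. This is a simplified illustration assuming GitHub Actions workflow keys (`on`, `jobs`, `runs-on`); the function name and the sample workflow are hypothetical. The dictionary is hand-built here, and a YAML loader may represent the `on` key differently, so a real audit would normalize its input first.

```python
# Triggers that can be fired by a pull request, including one from a fork.
UNTRUSTED_TRIGGERS = {"pull_request", "pull_request_target"}


def jobs_exposing_self_hosted_runners(workflow: dict) -> list[str]:
    """Return jobs that run on self-hosted runners in a PR-triggered workflow."""
    triggers = workflow.get("on", {})
    if isinstance(triggers, str):
        triggers = [triggers]
    # Works for both list-style and mapping-style trigger declarations.
    if not UNTRUSTED_TRIGGERS.intersection(triggers):
        return []

    exposed = []
    for job_name, job in workflow.get("jobs", {}).items():
        runs_on = job.get("runs-on", [])
        labels = [runs_on] if isinstance(runs_on, str) else list(runs_on)
        if "self-hosted" in labels:
            exposed.append(job_name)
    return exposed


# Synthetic workflow, already parsed into plain Python structures.
workflow = {
    "on": ["push", "pull_request"],
    "jobs": {
        "build": {"runs-on": ["self-hosted", "linux"]},
        "lint": {"runs-on": "ubuntu-latest"},
    },
}
print(jobs_exposing_self_hosted_runners(workflow))  # ['build']
```

A hit is a prompt for investigation rather than a finding in itself: whether the job is actually reachable from a fork also depends on repository visibility and approval settings, which this sketch does not model.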

The action and dependency supply chain

Every third-party action and dependency your pipeline pulls is code you execute with pipeline privilege. Sonatype's 10th annual State of the Software Supply Chain report logged 512,847 malicious open-source packages in a single year, a 156 percent year-over-year increase. The Shai-Hulud npm worm, first identified in September 2025, was self-propagating: it harvested npm tokens from each compromised environment to publish malicious versions of everything it could reach (CISA, Unit 42). A pipeline pentest assesses how actions and dependencies are pinned, whether floating tags allow retroactive tampering, and how far a single poisoned dependency could reach.
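The pinning question lends itself to a short check. The sketch below is a line-based approximation, assuming GitHub Actions `uses:` syntax: it treats a reference as pinned only when it ends in a full 40-character commit SHA. The function name `unpinned_actions` and the `example-org/example-action` reference are hypothetical, and local actions and container images are skipped rather than assessed.

```python
import re

USES_LINE = re.compile(r"^\s*(?:-\s*)?uses:\s*([^\s#]+)")
FULL_SHA = re.compile(r"^[0-9a-f]{40}$")


def unpinned_actions(workflow_text: str) -> list[str]:
    """Return third-party action references not pinned to a full commit SHA."""
    unpinned = []
    for line in workflow_text.splitlines():
        match = USES_LINE.match(line)
        if not match:
            continue
        reference = match.group(1).strip("'\"")
        # Local actions and container images are out of scope for this sketch.
        if reference.startswith("./") or reference.startswith("docker://"):
            continue
        _, _, ref = reference.partition("@")
        if not FULL_SHA.match(ref):
            unpinned.append(reference)
    return unpinned


# Synthetic workflow fragment.
sample_workflow = """\
steps:
  - uses: actions/checkout@v4
  - uses: example-org/example-action@0123456789abcdef0123456789abcdef01234567
  - name: Local helper
    uses: ./.github/actions/local
"""
print(unpinned_actions(sample_workflow))  # ['actions/checkout@v4']
```

A tag such as `v4` can be repointed by whoever controls the action's repository, which is why the check treats anything short of a full commit hash as floating.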

Deploy tokens and the path to production

The final asset is the ability to ship. Deploy tokens, environment approvals, and branch-protection rules are the last gate between a merged pull request and running production code. A build-chain test asks whether a contributor with only pull-request access can force code into a protected environment, and whether deploy credentials are scoped to the environments they actually serve.
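The scoping half of that question reduces to a set comparison. The sketch below is illustrative only: the credential names, the environment names, and the two input mappings are hypothetical, standing in for whatever inventory an engagement assembles from the deploy platform.

```python
def over_scoped_credentials(
    granted: dict[str, set[str]],
    needed: dict[str, set[str]],
) -> dict[str, set[str]]:
    """Map each credential to environments it can reach but does not serve."""
    excess = {}
    for credential, environments in granted.items():
        extra = environments - needed.get(credential, set())
        if extra:
            excess[credential] = extra
    return excess


# Hypothetical inventory: what each deploy credential can reach,
# versus the environments its jobs actually deploy to.
granted = {
    "staging-deployer": {"staging", "production"},
    "prod-deployer": {"production"},
}
needed = {
    "staging-deployer": {"staging"},
    "prod-deployer": {"production"},
}
print(over_scoped_credentials(granted, needed))
```

Here the staging credential can also reach production, so compromising a staging job would carry further than its purpose requires.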

Poisoned pipeline execution

Poisoned Pipeline Execution (PPE) ties several of these together. Catalogued as CICD-SEC-4 in the OWASP Top 10 CI/CD Security Risks, PPE abuses write access to a source repository to make the pipeline run attacker-controlled commands, either by editing the pipeline config directly or by tampering with a file the pipeline already trusts (a Makefile, a test script, a build tool config). Because the pipeline runs that code with its own privilege, a low-trust change becomes high-trust execution. This is the mechanism a pipeline pentest is designed to reproduce at scope level and then help you close.
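At scope level, the two routes described above can be told apart by which files a change touches. The sketch below is a rough triage aid under stated assumptions: the glob lists are illustrative guesses at what a given pipeline treats as configuration or trusted build input, and the "direct" and "indirect" labels simply mirror those two routes.

```python
from fnmatch import fnmatchcase

# Assumed, illustrative lists; a real engagement derives them from the
# pipeline definition and the files its jobs actually execute.
PIPELINE_CONFIG_GLOBS = [".github/workflows/*", ".gitlab-ci.yml", "Jenkinsfile"]
TRUSTED_BUILD_FILE_GLOBS = ["Makefile", "scripts/*", "*.gradle", "package.json"]


def classify_changed_files(changed_files: list[str]) -> dict[str, list[str]]:
    """Split changed paths into pipeline-config edits and trusted-file edits."""
    result = {"direct": [], "indirect": []}
    for path in changed_files:
        if any(fnmatchcase(path, glob) for glob in PIPELINE_CONFIG_GLOBS):
            result["direct"].append(path)
        elif any(fnmatchcase(path, glob) for glob in TRUSTED_BUILD_FILE_GLOBS):
            result["indirect"].append(path)
    return result


# Synthetic list of paths changed by a pull request.
changed = [".github/workflows/ci.yml", "Makefile", "src/app.py"]
print(classify_changed_files(changed))
```

A pull request with entries in either list is one whose contents the pipeline will execute, not just compile, so it is a natural candidate for stricter review or a lower-privilege job.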

Scope comparison: pipeline pentest vs app pentest

The clearest way to see the difference is side by side. Both tests are valuable, and mature teams run both, but they answer different questions and touch different assets.

Dimension | CI/CD pipeline penetration test | Application penetration test
--- | --- | ---
Primary question | Can an attacker abuse the machinery that builds and ships the software? | Can an attacker abuse the software that was shipped?
Trust model tested | Trusted-contributor and trusted-dependency paths | Anonymous or authenticated end-user input
Core assets in scope | Pipeline secrets, OIDC and cloud credentials, runners, third-party actions, dependencies, deploy tokens | Auth, session, authorization, input handling, business logic, data exposure
Signature threat | Poisoned pipeline execution, supply-chain tampering, secret exfiltration from logs | IDOR, broken access control, injection, business-logic abuse
Entry point | A pull request, a forked repo, a poisoned dependency or action | A login page, an API endpoint, a public form
Blast radius on success | Production credentials, signing keys, deploy access, every downstream consumer | The application's data and users
Where the compromise lives | Inside the build chain (never reached by an app test) | Inside the running product
Representative framework | OWASP Top 10 CI/CD Security Risks | OWASP Top 10, OWASP ASVS

[Figure: CI/CD scope comparison chart]

The overlap is real but narrow. An app pentest may notice a secret hard-coded in a client bundle, and a pipeline pentest may flag an application dependency with a known vulnerability. But the center of mass is different: the app test cannot assume a cloud role from a poisoned workflow, and the pipeline test does not fuzz your checkout flow. Scoping both, and scoping them as distinct engagements, is how you avoid the false comfort of a clean app report while your build chain sits wide open.

What a build-chain test uniquely scopes

Some findings only ever surface in a pipeline-focused engagement. These are the items a normal app test structurally cannot reach.

  • Trigger and flow control. Which events start a pipeline, and can an untrusted actor start a privileged one? OWASP catalogues this as CICD-SEC-1, Insufficient Flow Control Mechanisms. A pentest maps every trigger and checks whether a fork, a comment, or an unreviewed pull request can invoke a job that holds secrets.

  • Secret reachability from untrusted context. Not just where secrets live, but whether a pull request from a fork can read them, whether they leak into logs or artifacts, and whether they are scoped to the jobs that need them.

  • OIDC trust-policy blast radius. Whether the cloud roles a pipeline can assume are constrained to specific branches, repositories, and workflows, or whether the trust policy is broad enough that any workflow becomes a path to production cloud.

  • Runner isolation and reuse. Whether self-hosted runners are ephemeral, isolated from untrusted jobs, and prevented from carrying state or credentials between trust boundaries.

  • Action and dependency pinning. Whether third-party actions are pinned to immutable commit hashes rather than mutable tags, which is the exact control that would have limited the tj-actions retroactive-tag tampering.

  • Artifact integrity and the promotion path. Whether build artifacts are signed and verified before deploy (CICD-SEC-9, improper artifact integrity validation), and whether the path from a merged commit to production enforces the approvals it claims to.
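The artifact-integrity item in the list above can be sketched as a verification step before promotion. This is a deliberate simplification: it compares a SHA-256 digest recorded at build time against the artifact about to be deployed, whereas the signing and verification the list describes also proves who produced the artifact. The function name and the sample bytes are hypothetical.

```python
import hashlib
import hmac


def artifact_matches_digest(artifact_bytes: bytes, expected_sha256_hex: str) -> bool:
    """Return True when the artifact's SHA-256 equals the recorded digest."""
    actual = hashlib.sha256(artifact_bytes).hexdigest()
    # Constant-time comparison of the two hex strings.
    return hmac.compare_digest(actual, expected_sha256_hex.lower())


# Synthetic artifact: the digest is recorded at build time and
# checked again immediately before deploy.
built_artifact = b"release-1.0"
recorded_digest = hashlib.sha256(built_artifact).hexdigest()

print(artifact_matches_digest(built_artifact, recorded_digest))           # True
print(artifact_matches_digest(b"release-1.0-tampered", recorded_digest))  # False
```

The check is only as trustworthy as the place the recorded digest is stored. If the same job that builds the artifact can also rewrite the digest, the promotion path is not enforcing anything.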

A useful mental model: an app pentest reads the product, a pipeline pentest reads the factory. Both can find serious problems, but only the pipeline test can tell you whether an attacker who never touches your product can still ship code as you.

[Figure: CI/CD supply-chain incidents]

What this means for defenders

The data and the incidents point at a small set of moves that pay off quickly.

  • Scope the pipeline as its own engagement. A clean web application penetration test does not clear your build chain. If your pipeline holds signing keys, cloud credentials, or deploy tokens, it deserves a dedicated test against the OWASP Top 10 CI/CD Security Risks, not a footnote in an app report.

  • Pin actions and dependencies to immutable hashes. Mutable tags are what let the tj-actions attackers retroactively repoint historical releases to malicious code. Hash-pinning removes that entire class of retroactive tampering.

  • Constrain OIDC trust and scope every credential. Treat each assumable cloud role as a blast radius. Constrain trust to specific branches and workflows, and give deploy tokens only the environments they serve.

  • Isolate runners and keep ephemeral runners ephemeral. Never let a pull-request job from an untrusted fork share a runner, or persist state, with trusted jobs.

  • Move the trust boundary into the pull request. The cheapest place to stop a poisoned change is before it merges. A security check that runs on every pull request, blocks a vulnerable merge, and opens a fix is a control that operates exactly where PPE and supply-chain tampering try to enter.

That last point is where Stingrai's Snipe fits natively. Snipe is Stingrai's autonomous agent for web application penetration testing, purpose-built to hunt complex, high-impact classes such as IDOR, broken authorization, and business-logic flaws rather than only known-signature bugs. It runs both black-box dynamic testing and white-box code review, it can run as a PR-gating check on every pull request, and it ships AutoFix pull requests for what it finds. In a build chain, that combination is a trust-boundary control: it validates code at the pull request, blocks a vulnerable merge before it becomes a deployable artifact, and closes the loop with a remediation PR. It does not replace a full pipeline pentest, but it moves continuous assurance into the exact place the pipeline is most often attacked.

[Figure: CI/CD PR-gating flow]

Stingrai combines this automated, in-pipeline assurance with senior human penetration testing and red teaming, so the machinery that ships your software gets the same scrutiny as the software itself. Stingrai is a CREST-accredited penetration testing service provider, and its pentest evidence supports SOC 2, ISO 27001, and PCI DSS compliance programs.

Frequently Asked Questions

What does a CI/CD pipeline penetration test cover and how is it different from an application pentest?

A CI/CD pipeline penetration test covers the build and deploy chain itself: pipeline secrets, OIDC and cloud credentials, runners, the third-party action and dependency supply chain, deploy tokens, and the trigger and flow-control logic that decides which jobs run with which privilege. An application penetration test covers the running product at its trust boundary: authentication, authorization, input handling, business logic, and data exposure. They overlap only narrowly, so mature teams scope both as distinct engagements.

What is poisoned pipeline execution?

Poisoned Pipeline Execution (PPE) is an attack that abuses write access to a source repository to make the CI pipeline run attacker-controlled commands. Attackers either edit the pipeline configuration directly or tamper with a file the pipeline already trusts, such as a Makefile or test script. Because the pipeline runs that code with its own elevated privilege, a low-trust change becomes high-trust execution. OWASP catalogues it as CICD-SEC-4 in the Top 10 CI/CD Security Risks.

Why can't a normal application pentest catch supply-chain and pipeline attacks?

Because those attacks live in the build machinery, not in the shipped product. The tj-actions/changed-files compromise made an action used by more than 23,000 repositories dump CI secrets into workflow logs (Sysdig), and none of those secrets were in the application. An app pentest tests the trust boundary the product exposes to its users, while a pipeline test targets the trusted-contributor and trusted-dependency paths that a build chain exposes to its own developers.

What were the major CI/CD supply-chain incidents in 2025?

Two stand out. In March 2025, the tj-actions/changed-files GitHub Action, used by more than 23,000 repositories, was compromised (CVE-2025-30066) and made to leak CI secrets into workflow logs, with a linked compromise of reviewdog/action-setup (CVE-2025-30154), per CISA. Later in the year, the self-propagating Shai-Hulud npm worm harvested tokens and backdoored packages across the npm ecosystem (CISA).

How often are third parties involved in breaches?

Verizon's 2025 Data Breach Investigations Report found the share of breaches involving a third party doubled to 30 percent, up from 15 percent the prior year (Verizon). The same report measured a 34 percent increase in vulnerability exploitation as an initial access vector. Both trends point at inherited, trusted-path risk that lives outside the application an organization directly controls.

How can a team reduce CI/CD pipeline risk quickly?

Pin third-party actions and dependencies to immutable commit hashes rather than mutable tags, constrain OIDC cloud-role trust to specific branches and workflows, isolate self-hosted runners so untrusted pull-request jobs never share infrastructure with trusted jobs, and move a security check into the pull request so vulnerable code is blocked before it merges. Then validate the whole chain with a dedicated pipeline penetration test against the OWASP Top 10 CI/CD Security Risks.

Does Stingrai test CI/CD pipelines?

Yes. Stingrai's penetration testing and red teaming engagements can scope the build and deploy chain against the OWASP Top 10 CI/CD Security Risks, and Snipe, Stingrai's autonomous web-app agent, adds continuous assurance inside the pipeline by running as a PR-gating check and shipping AutoFix pull requests. Stingrai is a CREST-accredited penetration testing service provider whose pentest evidence supports SOC 2, ISO 27001, and PCI DSS compliance programs. Pricing is on the Stingrai pricing page.

References

  1. CISA. Supply Chain Compromise of Third-Party tj-actions/changed-files (CVE-2025-30066) and reviewdog/action-setup (CVE-2025-30154). March 18, 2025. https://www.cisa.gov/news-events/alerts/2025/03/18/supply-chain-compromise-third-party-tj-actionschanged-files-cve-2025-30066-and-reviewdogaction. Government advisory confirming the CVE identifiers and the secrets-in-logs exposure.

  2. Sysdig. Detecting and Mitigating the tj-actions/changed-files Supply Chain Attack (CVE-2025-30066). March 2025. https://www.sysdig.com/blog/detecting-and-mitigating-the-tj-actions-changed-files-supply-chain-attack-cve-2025-30066. Analysis confirming 23,000-plus affected repositories and the workflow-log secret disclosure.

  3. CISA. Widespread Supply Chain Compromise Impacting npm Ecosystem. September 23, 2025. https://www.cisa.gov/news-events/alerts/2025/09/23/widespread-supply-chain-compromise-impacting-npm-ecosystem. Advisory on the self-propagating npm worm and token harvesting.

  4. Palo Alto Networks Unit 42. Shai-Hulud Worm Compromises npm Ecosystem in Supply Chain Attack. 2025. https://unit42.paloaltonetworks.com/npm-supply-chain-attack/. Threat research documenting the self-replicating worm behaviour and scope.

  5. Sonatype. 10th Annual State of the Software Supply Chain. October 2024. https://www.sonatype.com/state-of-the-software-supply-chain/2024/introduction. Reports 512,847 malicious open-source packages in a single year, a 156 percent year-over-year increase.

  6. Verizon. 2025 Data Breach Investigations Report. 2025. https://www.verizon.com/about/news/2025-data-breach-investigations-report. Finds the share of breaches involving a third party doubled to 30 percent and a 34 percent rise in vulnerability exploitation.

  7. OWASP. Top 10 CI/CD Security Risks. 2022 to present. https://owasp.org/www-project-top-10-ci-cd-security-risks/. The canonical framework, including CICD-SEC-4 Poisoned Pipeline Execution and CICD-SEC-9 Improper Artifact Integrity Validation.
