---
title: "Automated Red Teaming for Continuous Vulnerability Discovery: Architectures, Methodologies, and Integration Strategies"
type: research
subtype: paper
tags: [security, red-teaming, vulnerability-discovery, ci-cd, automation, dast, fuzzing, ai-security, llm-security, penetration-testing, devops]
tools: [github-actions, gitlab-ci, metasploit, sliver, bloodhound, promptfoo, pyrit]
status: draft
created: 2026-03-20
updated: 2026-03-20
version: 1.1.0
related: [research-paper-bias-detection-prevention-mitigation.md, prompt-workflow-pre-mortem-planning.md]
source: Comprehensive analysis of automated offensive security methodologies, CI/CD integration patterns, adversary emulation toolchains, AI-driven vulnerability discovery, and outcome-driven security metrics
---

# Automated Red Teaming for Continuous Vulnerability Discovery: Architectures, Methodologies, and Integration Strategies

## Summary

This research examines the transition from manual, point-in-time penetration testing to continuous automated red teaming (CART) as a structural necessity for modern software security. The paper synthesizes five domains: methodological foundations of automated vulnerability discovery (reconnaissance, intelligent fuzzing, privilege escalation), architectural blueprints for embedding offensive automation within CI/CD pipelines, the toolchain ecosystem (C2 frameworks, AD exploitation, DAST), the dual role of AI as both offensive engine and novel attack surface, and outcome-driven metrics that replace vulnerability counting with adversarial disruption measurement. Sustainable software security requires algorithmic, always-on adversary emulation integrated directly into the software delivery lifecycle, measured by time-based eviction metrics and attack success rates rather than raw vulnerability counts.

## Context

The volume and velocity of software vulnerabilities have reached levels that overwhelm manual security practices. In the first half of 2025 alone, over 21,500 CVEs were disclosed — approximately 133 new vulnerabilities per day.^1 With cybercrime costs projected at $10.5 trillion annually by 2025, organizations face a structural mismatch between the rate of vulnerability introduction and the capacity of human-driven security assessment.^2

Traditional approaches occupy two poles. Vulnerability scanning is automated but shallow — limited by predefined heuristics, blind to business logic flaws, and generating high volumes of false positives.^3,4 Manual penetration testing leverages human expertise for deep context-sensitive discovery but is resource-intensive, slow (weeks per engagement, tens of thousands of dollars), and produces only point-in-time snapshots that degrade immediately in continuous deployment environments.^4

Red teaming transcends both by simulating the holistic tactics, techniques, and procedures (TTPs) of advanced persistent threats — focusing on multi-stage attack paths to compromise sensitive data, establish command and control (C2), or capture credentials, while simultaneously evaluating whether SOC teams, telemetry, and incident response playbooks can detect and eradicate an active adversary.^2,3,6

The automation of red teaming removes the human bottleneck from repetitive reconnaissance, payload delivery, and lateral movement, enabling continuous threat exposure management and freeing human operators for novel exploit development and strategic remediation.^5,8

## Hypothesis / Question

What architectural patterns, methodological frameworks, and toolchain ecosystems enable organizations to embed continuous automated adversary emulation into CI/CD pipelines, and what outcome-driven metrics accurately evaluate the resulting defensive readiness — including the unique challenges posed by AI/LLM attack surfaces?

## Method

The research employs a multidisciplinary synthesis of:

- **Offensive Methodology Analysis:** Codification of adversarial behavior into automated reconnaissance, intelligent fuzzing, privilege escalation, lateral movement, and evasion stages.^6,9,10,11
- **CI/CD Integration Architecture:** Evaluation of post-deployment staging scans, ephemeral per-PR environments, and shadow production endpoints as integration blueprints.^17,18
- **Pipeline Security Assessment:** Analysis of CI/CD-specific attack vectors including workflow injection, environment manipulation, and evasion techniques targeting GitHub Actions.^19
- **Toolchain Taxonomy:** Comparative analysis of C2 frameworks, AD exploitation utilities, and DAST engines across operational security, evasion capability, and automation fitness.^14,20,21
- **AI Red Teaming Framework Evaluation:** Assessment of LLM-specific vulnerability classes (prompt injection, jailbreaking, data poisoning, agentic orchestration flaws) and automated testing frameworks.^27,30,31
- **Maturity Model Mapping:** Five-tier organizational capability progression from ad-hoc testing to fully autonomous adversary emulation.^31
- **Metric Framework Design:** Time-based operational metrics (TTD, MTTA, TTC, TTE) and attack success rate (ASR) as industry-standard alternatives to vulnerability counting.^6,8,31

## Results

The results are organized in five arcs: automated methodology (sections 1-3), CI/CD integration (sections 4-5), toolchain ecosystem (sections 6-8), AI intersection (sections 9-11), and organizational maturity and metrics (sections 12-14).

### 1. Reconnaissance and Behavioral Attack Vector Selection

Automated red teaming initiates with continuous reconnaissance — deep-tree scanning, domain enumeration, and asset discovery to identify exposed APIs, shadow IT, leaked credentials, and misconfigured cloud storage.^9 Advanced platforms extend beyond standard OSINT to behavioral profiling: algorithmic analysis of patch management cadences, CI/CD deployment frequencies, and historical change management cycles.^6 These behavioral metrics enable intelligent selection of initial attack vectors with the highest statistical probability of success, mirroring the customized targeting of sophisticated human adversaries.^6

### 2. Intelligent Fuzzing and Automated Payload Generation

Automated frameworks deploy dynamically generated payloads designed to evade static signature detection and induce unhandled application states. Intelligent fuzzing has evolved via machine learning integration — hybrid approaches combining mutation-based fuzzing with generation-based strategies guided by Markov chains thoroughly explore protocol behaviors and buried application logic.^10

Deep reinforcement learning (DRL) and Long Short-Term Memory (LSTM) neural networks generate context-sensitive attack inputs. For automated SQL injection testing, Deep Q-Network (DQN) agents dynamically analyze back-end responses and iteratively construct obfuscated payloads, learning target-specific syntax and Web Application Firewall (WAF) filtering rules to synthesize novel bypass techniques with significantly higher success rates than static payload dictionaries.^11

### 3. Privilege Escalation, Lateral Movement, and Evasion

Upon establishing a foothold, automated agents interrogate local host configurations — enumerating permission sets, service misconfigurations, and AD group memberships to identify escalation pathways.^12 Lateral movement involves programmatic execution of identity-based attacks: Kerberoasting, NTLM relay, token impersonation, and credential extraction from system memory.^7

Graph theory and algorithmic pathfinding analyze network trust relationships, calculating the shortest and least-monitored path to domain dominance.^14 Evasion techniques include fileless implants, in-memory execution, and living-off-the-land tactics — abusing native tools like WMI and PowerShell to blend with legitimate administration traffic.^13 This validates whether internal telemetry and behavioral analytics can detect anomalous lateral pivoting before total compromise.

### 4. CI/CD Integration Blueprints

Embedding automated red teaming within CI/CD pipelines shifts security testing earlier in the development lifecycle at varying degrees of integration depth — security becomes an intrinsic component of the software assembly line rather than a reactive post-deployment audit.^15

Three architectural patterns emerge, progressing from latest-stage to earliest-stage integration:

| Pattern | Mechanism | Trade-offs |
| :---- | :---- | :---- |
| **Post-deployment staging scan** | Code deploys to a production-mirror staging environment; CI/CD triggers DAST, API exploitation, and red team scripts against the live environment.^17 | High fidelity; validates runtime configurations and business logic; slower feedback loop. |
| **Ephemeral per-PR environments** | CI/CD dynamically provisions isolated, disposable infrastructure for each pull request; full attack suite executes against the specific code delta; environment destroyed on completion.^17,18 | Truest shift-left pattern; parallelized and precisely scoped; higher infrastructure cost; may lack full production data parity and third-party integration coverage. |
| **Shadow production endpoints** | Automated attacks route to shadow endpoints within production that mirror public APIs but are restricted from end-users.^17 | Highest fidelity against real infrastructure; requires careful access control to prevent leakage. |

Security tools operate headlessly within containerized runners (GitHub Actions, GitLab CI, Azure DevOps), exporting machine-readable telemetry in JSON or SARIF for automated policy evaluation and deployment halting.^16,17

A minimal GitHub Actions integration for DAST scanning on pull requests:

```yaml
# .github/workflows/dast-scan.yml
name: DAST on PR
on: pull_request
jobs:
  dast:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@af513c7a016048ae468971c52ed77d9562c7c819
      - name: Deploy ephemeral environment
        run: docker compose up -d --wait
      - name: Run DAST scan
        run: |
          rapidast --config rapidast-config.yaml \
            --output results/
      - name: Evaluate policy gates
        run: |
          CRITICAL=$(jq '[.results[] | select(.severity == "critical")] | length' results/report.json)
          if [ "$CRITICAL" -gt 0 ]; then echo "::error::$CRITICAL critical vulnerabilities found"; exit 1; fi
      - name: Teardown
        if: always()
        run: docker compose down -v
```

### 5. Pipeline Infrastructure as Attack Surface

CI/CD platforms themselves are prime supply chain attack targets. Specialized red team frameworks scan workflow definitions (`.github/workflows/*.yml`) for unvalidated user-controlled parameters in shell `run:` blocks, unpinned third-party actions using mutable version tags instead of cryptographic commit SHAs, and externally imported scripts vulnerable to manipulation.^19

Evasion techniques targeting pipeline security tooling include:

- **Multi-Step Technique:** Splitting malicious commands across sequential workflow steps that individually appear innocuous, reassembling and executing in a final step — breaking static analysis that evaluates blocks in isolation.^19
- **Code Cave Technique:** Hiding execution logic within massive documentation comment blocks inside legitimate scripts; performance-optimized scanners truncate analysis of heavily commented sections.^19
- **Time-Based Pattern Bypass:** Payloads evaluate runtime context variables (e.g., `github.run_id`), executing only under specific mathematical conditions to prevent behavioral heuristics from recognizing consistent attack patterns.^19
- **Environment Manipulation:** Prepending attacker-controlled directories to `GITHUB_PATH` or `GITHUB_ENV` so subsequent legitimate steps invoke malicious binaries instead of intended utilities.^19

### 6. Command and Control (C2) Frameworks

C2 frameworks manage compromised hosts through stealthy implants, orchestrate covert communication, and facilitate post-exploitation. Key frameworks ranked by operational characteristics:

| Framework | Architecture |
| :---- | :---- |
| **Sliver** (Bishop Fox) | Open-source, cross-platform. Implants communicate over mTLS, HTTPS, and DNS for versatile, stealthy connectivity.^20 |
| **PoshC2** (Nettitude) | Python3-based, proxy-aware. Integrated AMSI bypass, ETW patching, auto-generated Apache Rewrite rules, encrypted channels.^20 |
| **Cobalt Strike** (Fortra) | Industry standard. Configurable "Beacon" implants with malleable C2 profiles, rich post-exploitation modules, custom scripting.^20 |
| **Nighthawk** (MDSec) | Maximum operational security. Proprietary EDR bypass techniques not distributed in public repositories.^20 |
| **Mythic** (Cody Thomas) | Collaborative, cross-platform. Comprehensive operator activity tracking, highly customizable agents, pipeline integration ready.^20 |
| **Empire** (BC Security) | Multi-language agents (PowerShell, Python3, C#). Modular obfuscation, customizable bypass, encrypted communications.^20 |
| **Metasploit** (Rapid7) | Massive pre-built exploit collection enabling rapid payload deployment and varied post-exploitation across any architecture.^20 |
| **Merlin** (Russel Van Tuyl) | Golang-based RAT. Communicates via HTTP/1.1, HTTP/2, and HTTP/3 (QUIC) to bypass traditional traffic inspection.^20 |

### 7. Active Directory and Network Exploitation

AD acts as the centralized identity system for most enterprises. Specialized tools map and exploit trust relationships:

| Tool | Capability |
| :---- | :---- |
| **BloodHound.py** | Cross-platform Python AD ingestor; collects relationship data without executing .NET binaries, reducing host-based detection.^20 |
| **NetExec** | Automated assessment of Windows/AD environments at enterprise scale via SMB, WMI, WinRM — credential validation and remote command execution.^20 |
| **GhostPack** | C# post-exploitation suite: Rubeus (Kerberos abuse), Certify (AD CS exploitation), Seatbelt (reconnaissance), SharpDPAPI (credential extraction).^20 |
| **Certipy** | AD Certificate Services enumeration and abuse — shadow credential injection, Golden Ticket generation.^20 |
| **Impacket** | Python library for low-level Windows protocol access (SMB, RPC, WMI) — custom network attacks, credential relay, fileless remote execution.^14 |
| **MSLDAP** | LDAP library and interactive client for auditing Microsoft AD; facilitates real-time programmatic exploration of AD tree structures to identify misconfigurations and attack vectors.^20 |
| **Octopwn** | Browser-based WebAssembly platform packaging Nmap, BloodHound, and Hashcat into a single agile interface.^20 |

### 8. Dynamic Application Security Testing (DAST)

DAST tools assess application-layer logic by interacting with running applications — evaluating how software responds to malformed inputs in a production-like state.^21 Unlike SAST (which analyzes uncompiled source code via theoretical execution paths), DAST reduces false-positive rates and identifies runtime vulnerabilities invisible to static analysis: insecure direct object references, broken authentication, and server misconfigurations.^21

Modern frameworks like RapiDAST automate this within CI/CD pipelines, providing broad coverage for both web interfaces and backend microservices through continuous monitoring and dynamic payload generation.^21

### 9. Agentic AI for Autonomous Vulnerability Discovery

Traditional fuzzing and static analysis struggle with deep semantic understanding and multi-step business logic vulnerabilities. Agentic AI — autonomous systems capable of reasoning, planning, and executing complex hacking tasks without human supervision — addresses this cognitive shortfall.^22

Multi-agent orchestration frameworks (e.g., Co-RedTeam) automate the full vulnerability discovery lifecycle: Analysis and Critique agents ingest codebases and security documentation to hypothesize vulnerabilities; Planning, Validation, and Execution agents interface with isolated environments to iteratively write and test exploit code based on runtime feedback.^23 Layered long-term memory of successful vulnerability patterns continuously improves exploitation success rates.^23

Hybrid AI models fusing symbolic execution with deep learning have produced massive increases in automated code coverage.^24 This was validated when AI agents (e.g., Google's Big Sleep) autonomously discovered and exploited previously unknown zero-day vulnerabilities in real-world database components.^25 The Model Context Protocol (MCP) — originally designed as a tool-integration standard for AI agents — can be abused to enable distributed agents to coordinate operations covertly, bypassing traditional network telemetry and C2 beaconing detection.^26

### 10. The Unique Attack Surface of Generative AI

LLMs embedded in production applications inherit a vastly expanded, probabilistic attack surface that traditional scanners cannot assess.^27,28,29 Unique vulnerability classes include:

- **Direct Prompt Injection:** Explicit override commands compelling the model to ignore safety training.^31
- **Indirect Prompt Injection:** Malicious instructions embedded in external content (webpages, PDFs) processed by the AI.^31
- **Cross-Plugin Injection / Tool Misuse:** Manipulating AI-tool integrations to execute unauthorized API calls — deleting records, exfiltrating keys.^31
- **Jailbreaking:** Role-playing personas, encoding prompts in Base64/hex, or language-switching to exploit weaker safety alignment in low-resource languages.^31
- **Data Poisoning / Model Extraction:** Injecting malicious training examples to create dormant backdoors, or query-based extraction to steal proprietary model architectures.^31
- **Agentic Orchestration Flaws:** Inter-agent exploitation (socially engineering one agent to attack another) and memory manipulation (corrupting persistent context to influence future actions).^31

### 11. Automated AI Security Assessment Frameworks

Specialized frameworks automate adversarial testing of LLMs at scale:

| Framework | Capabilities |
| :---- | :---- |
| **Promptfoo** (Open Source) | Adaptive attack generation via smart agents; multi-turn conversational testing; NIST/OWASP/EU AI Act compliance mapping; CI/CD integration.^30 |
| **PyRIT** (Microsoft) | Multi-turn conversation attack orchestration; media converters for multi-modal testing (audio, imagery); automated scoring engine.^30 |
| **Garak** (NVIDIA) | High-volume scanner testing 100+ attack vectors across dozens of platforms; HTML reports with z-score grading for hallucination and injection risk.^30 |
| **DeepEval** (Confident AI) | RAG pipeline and agent stress-testing; OWASP LLM Top 10 alignment; automated PII leakage, bias, and context manipulation tests.^31 |
| **Mindgard** (Mindgard) | Enterprise-grade continuous runtime assessment; automated reconnaissance; SIEM integration for active threat blocking.^12 |

Empirical evaluation across 214,000+ attack attempts demonstrates that automated algorithmic testing achieves a 69.5% attack success rate versus 47.6% for manual testing — showing clear superiority in systematic exploration and pattern matching.^27

A minimal Promptfoo configuration for AI red teaming in CI/CD:

```yaml
# promptfoo-redteam.yaml
redteam:
  purpose: "Customer service chatbot for order management"
  plugins:
    - prompt-injection
    - indirect-prompt-injection
    - pii-leak
    - tool-misuse
    - jailbreak:
        strategies: [role-play, encoding, language-switch]
  strategies:
    - crescendo
    - multi-turn
```

### 12. Red Team Maturity Model

Organizational progression from ad-hoc testing to continuous automation follows five tiers.^31 This model aligns with broader capability maturity frameworks (e.g., NIST CSF tiering, MITRE ATT&CK evaluations) and should be cross-referenced with organizational security maturity assessments:

1. **Ad Hoc** — Manual point-in-time pentests; no formalized processes; reactive posture; zero automation.
2. **Repeatable** — Basic scripting and low-level automation; standard operating procedures; regular manual testing cadence.
3. **Defined** — Continuous methodology; automation integrated into CI/CD; custom TTP implementations; version-controlled offensive code.
4. **Managed** — Metrics-driven; continuous red teaming with advanced infrastructure; automated unit testing for implants/payloads; risk-based remediation prioritization; on-demand defensive validation.
5. **Optimizing** — Fully autonomous adversary emulation; automated tradecraft variation based on threat intelligence; proactive threat hunting; routine discovery governed by autonomous systems.

### 13. The 90-Day Implementation Blueprint

Operationalizing automated red teaming follows a structured three-phase approach:^31

- **Days 1–30 (Foundation):** Attack surface mapping, crown-jewel asset inventory (data stores, systems, and processes whose compromise would cause maximum business impact), and threat modeling workshops. Output: baseline automated attack library (standard prompt injections, PII leakage tests, SQLi mutations).
- **Days 31–60 (Operationalization):** Attack libraries embedded in CI/CD for weekly regression testing. Severity-based SLAs for triage and remediation defined and enforced.
- **Days 61–90 (Scaling):** Expansion to complex multi-turn attack paths and agentic abuse scenarios. Monthly purple-team exercises to fine-tune SIEM detection logic. Quarterly security posture metrics published.

### 14. Outcome-Driven Metrics

Traditional vulnerability counting conflates adversary emulation with scanning and fails to answer: "Is the organization improving its ability to stop real adversaries?" Mature programs track time-based operational metrics — established industry-standard measures for SOC and red team performance evaluation:

| Metric | Definition |
| :---- | :---- |
| **Time-to-Detection (TTD/MTTD)** | Duration from automated attack execution to actionable alert generation. Low TTD indicates accurate detection; high TTD highlights blind spots in logging, EDR, and SIEM. |
| **Time-to-Acknowledge (MTTA)** | Gap between alert generation and human/automated triage response. Evaluates alert prioritization pipeline efficiency. |
| **Time-to-Containment (TTC)** | Time to halt adversary progression after detection — validated when C2 mechanisms are severed. |
| **Time-to-Eviction (TTE)** | Total duration to completely cleanse the environment — all persistence mechanisms, scheduled tasks, backup C2, and lateral footholds identified and eradicated. |

Attacks are contextualized by a 1-to-5 complexity scoring system (Red Team Levels) weighting exploit sophistication, operational security, and TTP complexity — enabling organizations to track resistance improvement against advanced adversaries rather than noisy scripts.^6,8

For AI systems, Attack Success Rate (ASR) measures the percentage of automated probes that bypass safety guardrails, tracked per vulnerability category. CI/CD release gates automatically fail builds if critical vulnerabilities remain unmitigated, ASR exceeds thresholds (thresholds must be calibrated to organizational risk profile — e.g., 5% for high-risk financial categories may be appropriate, while internal tools may tolerate higher rates), or new code introduces >20% regression in historical ASR baselines.^31

## Discussion

### For Development Teams

Continuous automated red teaming requires development teams to treat security findings as first-class CI/CD signals — on par with test failures. The shift from annual audits to per-PR attack suites means developers receive immediate, actionable feedback on exploitability within their normal workflow. This demands security literacy beyond "fix the CVE" toward understanding multi-stage attack paths and business logic abuse.

### For Security Organizations

The maturity model transition from Level 1 (ad-hoc) to Level 3+ (defined/managed) represents a fundamental organizational shift — from security as gatekeeping function to security as engineering discipline. The 90-day blueprint provides a pragmatic acceleration path, but success depends on cultural alignment: purple-team collaboration, shared metrics ownership, and risk-based remediation prioritization rather than vulnerability volume.

### For AI-Enabled Applications

The dual role of AI as both offensive tool and attack surface creates a dual mandate. Organizations embedding LLMs in production must simultaneously leverage agentic AI for autonomous vulnerability discovery in traditional software and deploy specialized frameworks (Promptfoo, PyRIT, Garak) to continuously assess the probabilistic, non-deterministic attack surfaces of AI systems themselves. The 69.5% vs 47.6% ASR disparity between automated and manual testing establishes automated AI red teaming as empirically superior for systematic coverage.

### Limitations

**Scope of Automation:** Automated red teaming excels at systematic, pattern-based attack execution but cannot replicate the creative, context-sensitive intuition of expert human red teamers for novel exploit chains and social engineering. The optimal model is hybrid — automation for breadth and continuous coverage, human operators for depth and novel attack synthesis.

**Legal and Authorization Requirements:** Automated red teaming techniques — including credential extraction, privilege escalation, and lateral movement — may violate computer fraud statutes or internal policies if executed without formal authorization. Organizations must establish rules of engagement, scoping agreements, and legal review processes before deploying automated adversary emulation, particularly for techniques targeting production-adjacent environments or third-party integrations.

**Tool Ecosystem Maturity:** C2 frameworks and AD exploitation tools are mature for enterprise Windows environments but less developed for cloud-native, containerized, and serverless architectures. Pipeline security testing (GitHub Actions injection) remains an emerging discipline with limited tooling standardization.

**AI Red Teaming Volatility:** LLM attack surfaces evolve with every model update, fine-tuning cycle, and prompt engineering change. Static attack libraries degrade rapidly; frameworks must continuously adapt their adversarial generation strategies to remain effective against moving targets.

**Metric Validity:** Time-based metrics (TTD, TTC, TTE) measure defensive operational efficiency but do not directly quantify residual risk exposure. ASR provides probabilistic coverage assessment but is sensitive to the representativeness of the automated attack corpus.

## Conclusions

The transition from manual penetration testing to continuous automated red teaming is a structural necessity driven by the compounding velocity of software development, attack surface complexity, and daily vulnerability disclosure rates. Sustainable software security requires:

1. **Methodological automation** — Codified reconnaissance, intelligent fuzzing, and algorithmic lateral movement executing multi-stage attack paths at scale.
2. **Pipeline integration** — Ephemeral environments, staging scans, or shadow endpoints embedding adversary emulation directly into CI/CD workflows with machine-readable policy enforcement.
3. **Dual AI mandate** — Leveraging agentic AI for autonomous zero-day discovery while simultaneously deploying specialized frameworks to assess the probabilistic attack surfaces of production LLMs.
4. **Maturity progression** — Structured organizational evolution from ad-hoc testing through metrics-driven continuous automation, guided by the 90-day operationalization blueprint.
5. **Outcome-driven measurement** — Replacing vulnerability counting with time-based eviction metrics (TTD, TTC, TTE) and attack success rates contextualized by complexity scoring.

The future of software security is algorithmic, continuous, and mathematically measurable — the static annual audit is structurally incompatible with the modern software delivery lifecycle.

## Open Questions

- What architectural patterns enable automated red teaming in serverless and edge-computing environments where traditional C2 infrastructure and lateral movement assumptions do not apply?
- How should organizations balance the security coverage benefits of per-PR ephemeral environment testing against the infrastructure cost and environmental impact of provisioning disposable stacks at scale?
- What governance frameworks are needed for agentic AI red teams that autonomously discover and exploit zero-day vulnerabilities — particularly regarding responsible disclosure timelines and operational containment?
- How can AI red teaming frameworks maintain effectiveness as model providers implement increasingly sophisticated safety alignment, and what is the equilibrium between attack automation and defensive adaptation?
- What quantitative thresholds for TTD, TTC, and ASR should trigger deployment halting across different organizational risk profiles and regulatory environments?

## References

1. "Vulnerabilities Statistics 2025: Record CVEs, Zero-Days & Exploits." DeepStrike. https://deepstrike.io/blog/vulnerability-statistics-2025
2. "Red teaming vs. penetration testing: A guide to comprehensive security testing." Bugcrowd. https://www.bugcrowd.com/blog/red-teaming-vs-penetration-testing-a-guide-to-comprehensive-security-testing/
3. "Red teaming vs. vulnerability scanning – which is right for you?" RedSentry. https://redsentry.com/resources/blog/red-teaming-vs.-vulnerability-scanning
4. "Top Automated Pentesting Tools (2025)." Security Boulevard. https://securityboulevard.com/2025/08/top-automated-pentesting-tools-2025/
5. "I replaced manual pen tests with automation. Here's what I learned." CSO Online. https://www.csoonline.com/article/4141544/i-replaced-manual-pen-tests-with-automation-heres-what-i-learned.html
6. "Red Teaming in 2026: The Bleeding Edge of Security Testing." CyCognito. https://www.cycognito.com/learn/red-teaming/
7. "Understanding Red Teaming Methodology: A Complete Guide." Appsecure Security. https://www.appsecure.security/blog/understanding-red-teaming-methodology-a-complete-guide
8. "Automated Red Teaming: Capabilities, Pros/Cons, and Latest Trends." Mend.io. https://www.mend.io/blog/automated-red-teaming-capabilities-pros-cons-and-latest-trends/
9. "Continuous Penetration Testing vs. Vulnerability Management." Evolve Security. https://www.evolvesecurity.com/blog-posts/continuous-penetration-testing-vs-vulnerability-management
10. "LLM-Assisted Model-Based Fuzzing of Protocol Implementations." arXiv. https://arxiv.org/html/2508.01750v1
11. "SmartInject: Automated SQL Injection Testing Using Deep Q-Learning and LSTM-Based Payload Generation." IEEE Xplore. https://ieeexplore.ieee.org/abstract/document/11398683/
12. "Red Teaming Tools: Key Features & Top 8 Tools to Know in 2025." CyCognito. https://www.cycognito.com/learn/red-teaming/red-teaming-tools/
13. "What Is Automated Red Teaming?" Picus Security. https://www.picussecurity.com/resource/glossary/what-is-automated-red-teaming
14. "Our Red Team's Favorite Penetration Testing Tools in 2025 (And How We Use Them)." OP-C. https://op-c.net/blog/penetration-testing-tools-2025/
15. "Continuous Pentesting in CI/CD: Automating Security Testing." Aikido. https://www.aikido.dev/blog/continuous-pentesting-ci-cd
16. "Automating DAST in CI/CD: Application Security Without the Bottlenecks." Invicti. https://www.invicti.com/blog/web-security/automating-dast-in-ci-cd
17. "Continuous DAST in CI/CD Pipelines: A Practical Guide." Astra Security. https://www.getastra.com/blog/dast/continuous-dast-in-cicd-pipelines/
18. "Anyone here actually doing 'continuous pentesting' instead of yearly audits?" Reddit r/Pentesting. https://www.reddit.com/r/Pentesting/comments/1ojx2uz/
19. "A Deep Dive into Pipeline Injection Attacks: Designing a Red Team Framework for GitHub Actions." Medium. https://medium.com/tom-tech/a-deep-dive-into-pipeline-injection-attacks-designing-a-red-team-framework-for-github-actions-22f9ddb25d19
20. "Top Red Team Tools & C2 Frameworks for 2025: Active Directory & Network Exploitation." Bishop Fox. https://bishopfox.com/blog/2025-red-team-tools-c2-frameworks-active-directory-network-exploitation
21. "Automate dynamic application security testing with RapiDAST." Red Hat Developer. https://developers.redhat.com/articles/2025/06/19/automate-dynamic-application-security-testing-rapidast
22. "RedTeamLLM: an Agentic AI framework for offensive security." arXiv. https://arxiv.org/html/2505.06913v1
23. "Co-RedTeam: Orchestrated Security Discovery and Exploitation with LLM Agents." arXiv. https://arxiv.org/html/2602.02164v2
24. "Modern Approaches to Software Vulnerability Detection: A Survey of Machine Learning, Deep Learning, and Large Language Models." MDPI. https://www.mdpi.com/2079-9292/14/22/4449
25. "Agentic Red Teams Are Here: Autonomous Vulnerability Discovery Ushers in a New Security Paradigm." Kodem. https://www.kodemsecurity.com/resources/agentic-red-teams-are-here-autonomous-vulnerability-discovery-ushers-in-a-new-security-paradigm
26. "Hiding in the AI Traffic: Abusing MCP for LLM-Powered Agentic Red Teaming." arXiv. https://arxiv.org/html/2511.15998v1
27. "The Automation Advantage in AI Red Teaming." arXiv. https://arxiv.org/html/2504.19855v1
28. "AI red teaming: Tools, frameworks, and attack strategies explained." Vectra AI. https://www.vectra.ai/topics/ai-red-teaming
29. "What Is AI Red Teaming? Why You Need It and How to Implement." Palo Alto Networks. https://www.paloaltonetworks.com/cyberpedia/what-is-ai-red-teaming
30. "Top Open Source AI Red-Teaming and Fuzzing Tools in 2025." Promptfoo. https://www.promptfoo.dev/blog/top-5-open-source-ai-red-teaming-tools-2025/
31. "AI-Red-Teaming-Guide." GitHub (requie). https://github.com/requie/AI-Red-Teaming-Guide