This is an automated analysis of the CI/CD pipeline and integration test coverage in this repository, with actionable recommendations for improving PR quality measurement.
📊 Current CI/CD Pipeline Status
The repository has a well-structured, multi-layered CI/CD pipeline with 40 YAML workflows and 21 agentic (.md) workflows — 61 total. The pipeline covers build verification, linting, type checking, unit tests, integration tests, security scanning, documentation, and end-to-end smoke testing.
Workflows running on pull_request events:
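| Workflow | Type | Purpose |
|---|---|---|
| build.yml | Static | Build verification (Node 20 + 22 matrix) + API proxy unit tests |
| lint.yml | Static | ESLint + Markdownlint |
| test-integration.yml | Static | Integration tests (4 parallel jobs: domain/network, protocol/security, container-ops, API proxy) |
| test-integration-suite.yml | Static | Integration tests (duplicate of above: same content, same name) |

Other workflows triggered on pull_request events: test-chroot.yml, test-examples.yml (runs the examples/*.sh scripts end-to-end), test-action.yml, test-coverage.yml, codeql.yml, dependency-audit.yml, container-scan.yml (only on containers/** path changes), pr-title.yml, docs-preview.yml, link-check.yml (only on *.md path changes), plus the agentic build-test.md, security-guard.md, smoke-claude.md, smoke-codex.md, smoke-copilot.md, and smoke-chroot.md.
✅ Existing Quality Gates
Code Quality
✅ Type checking (tsc --noEmit)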
✅ Semantic PR titles — Enforced via amannn/action-semantic-pull-request
Testing
✅ Unit tests with coverage — Jest with 38% statement threshold; reports coverage delta in PR comments
✅ Integration tests (30+ test files) — Domain filtering, DNS, protocols, container ops, API proxy, chroot languages, package managers
✅ Examples test — All example shell scripts verified end-to-end
✅ Setup action test — GitHub Action versioning and image pull tested
Security
✅ CodeQL SAST — JavaScript/TypeScript and Actions language analysis
✅ Dependency audit — npm audit --audit-level=high for main + docs-site
✅ Container scanning — Trivy (HIGH/CRITICAL) on agent and squid containers
✅ AI security guard — Claude reviews every PR for security boundary changes
✅ Secret diggers — Three hourly agentic workflows scanning for leaked secrets
Documentation
✅ Docs preview — Astro/Starlight site builds verified on doc changes
✅ Link checker — Lychee checks broken links on markdown changes
Smoke Tests
✅ Multi-agent smoke tests — Smoke tests for Claude, Codex, Copilot, and chroot (run on PRs but gated by emoji reactions)
🔍 Identified Gaps
🔴 High Priority
1. 7 Integration Test Files Not Executed in CI
Seven integration test files exist in tests/integration/ but do not match any --testPathPatterns in any CI workflow:
| Missing Test File | Security Relevance |
|---|---|
| api-target-allowlist.test.ts | Validates API targets are auto-added to domain allowlist |
| chroot-capsh-chain.test.ts | Validates capability dropping in chroot |
| chroot-copilot-home.test.ts | Validates whitelisted home directory isolation |
| gh-host-injection.test.ts | Tests GH_HOST injection prevention |
| ghes-auto-populate.test.ts | Tests GHES domain auto-population |
| skip-pull.test.ts | Tests --skip-pull flag behavior |
| workdir-tmpfs-hiding.test.ts | Tests workdir tmpfs isolation |
Several of these (chroot-capsh-chain, gh-host-injection, chroot-copilot-home) are security-critical tests that verify the firewall's isolation guarantees are not silently broken by code changes.
2. Critically Low Unit Test Coverage — Core Files at Near-Zero
From COVERAGE_SUMMARY.md:
| File | Statement Coverage | Priority |
|---|---|---|
| cli.ts | 0% | 🔴 Critical |
| docker-manager.ts | 18% | 🔴 Critical |
| host-iptables.ts | 83% | 🟡 Good |
cli.ts (entry point, signal handling, orchestration) and docker-manager.ts (all container lifecycle logic, compose generation, bind mount config) are the two most important files, yet cli.ts has no unit coverage at all and docker-manager.ts has only 18%. A refactor in either file could introduce regressions that slip through.
3. Coverage Thresholds Are Too Low to Be Meaningful
Current thresholds: Statements 38%, Branches 30%, Functions 35%, Lines 38%. Given that cli.ts is 0% and docker-manager.ts is 18%, these thresholds can pass while the most important code paths have no coverage at all. The thresholds do not enforce coverage on security-critical paths.
4. Container Security Scan Has a Path Filter Gap
container-scan.yml only triggers on containers/** path changes. Changes to src/docker-manager.ts or src/squid-config.ts that alter container configuration, mount points, or capabilities do not retrigger the container scan — even though those source changes directly affect runtime security posture.
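🟡 Medium Priority
5. Duplicate Workflow Definition (test-integration.yml = test-integration-suite.yml)
Both files have the name Integration Tests and identical content (4 parallel jobs: domain/network, protocol/security, container-ops, API proxy). This clutters the PR check list with two identically named checks and doubles the integration-test CI cost with no added value. One should be removed or differentiated.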
6. Smoke Tests Are Not Automatic — Require Emoji Reaction
The agentic smoke tests (smoke-claude.md, smoke-codex.md, smoke-copilot.md, smoke-chroot.md) run on PRs but only when a maintainer adds a specific emoji reaction (❤️, 🎉, 👀, 🚀). They do not run automatically. This means a PR that breaks the actual Claude/Copilot/Codex agent execution can merge without the smoke tests ever firing.
7. Performance Benchmarks Never Run on PRs
performance-monitor.yml only runs on a weekly schedule. A PR that introduces a 2× container startup regression would not be caught until the following week. No performance gate exists on the PR merge path.
8. api-proxy Container Not Scanned by Trivy
container-scan.yml scans awf-agent and awf-squid, but the API proxy sidecar (containers/api-proxy/) is not scanned. The API proxy handles real API credentials (OpenAI, Anthropic, Copilot tokens) and runs as a network-accessible service, so an unpatched CVE in its image would be especially high-impact.
9. No SBOM (Software Bill of Materials) Generation
No workflow generates or attaches an SBOM to releases. For a security tool distributed as a Docker image and npm binary, SBOM attestation is increasingly expected for supply chain transparency. This is especially relevant since the project publishes to GHCR.
10. No Coverage Enforcement Per File or Per Module
Coverage is enforced globally (38% statements project-wide) but not per-module. A contributor could add 1000 new lines with 0% coverage to docker-manager.ts and the global threshold would still pass, as long as other covered files compensate.
🟢 Low Priority
11. No License Compliance Check
No workflow scans dependencies for license compatibility. As a tool used in enterprise/CI environments and distributed on npm/GHCR, license drift (a dependency changing from MIT to GPL/AGPL) should be automatically detected.
12. No Spell Check on Documentation
The link checker (link-check.yml) validates URLs but there is no spell check or prose style linting on documentation. The docs site (docs-site/) targets enterprise users and engineers who may file issues for documentation errors.
13. Documentation Build Not Triggered by Code Changes
docs-preview.yml only builds the docs when docs-site/**, docs/**, or *.md files change. A change to src/ that adds a new CLI flag would not trigger a docs preview build. Manual verification is needed to confirm docs remain accurate after code changes.
14. No Commit Message Validation in CI
commitlint is configured (via commitlint.config.js + husky) as a local Git hook, but there is no CI enforcement. Commits created through the GitHub UI (including squash merges of PRs) or by automated tools bypass the hook entirely.
📋 Actionable Recommendations
R1: Add Missing Integration Tests to CI Matrix [High | Low Complexity]
Issue: 7 integration test files never run in CI. Fix: Add the missing test patterns to test-integration.yml:
```yaml
- name: Run security isolation tests
  run: |
    npm run test:integration -- \
      --testPathPatterns="(api-target-allowlist|chroot-capsh-chain|chroot-copilot-home|gh-host-injection|ghes-auto-populate|skip-pull|workdir-tmpfs-hiding)" \
      --verbose
```
Impact: Catches regressions in security-critical isolation paths that are currently invisible to CI.
R2: Increase Coverage Thresholds and Add Per-File Minimums [High | Medium Complexity]
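Issue: 38% global threshold allows critical files to have 0% coverage. Fix: Raise global thresholds incrementally and add per-file overrides in jest.config.js (Jest's coverageThreshold accepts file-path and glob keys alongside global), so that cli.ts and docker-manager.ts carry their own minimums. Impact: Forces test investment in the highest-risk files.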
R3: Expand Container Security Scan Trigger Paths [High | Low Complexity]
Issue: Container scan skips PRs that change container config in src/. Fix: Add src/** to the paths: filter in the container-scan.yml trigger. Impact: Ensures every code change that could affect container security posture triggers a Trivy scan.
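A minimal sketch of the trigger change, assuming container-scan.yml currently filters pull_request events with a paths: list (the exact shape of the existing trigger block is an assumption):

```yaml
on:
  pull_request:
    paths:
      - 'containers/**'
      # Added: source files that generate container config (compose, mounts, capabilities)
      - 'src/**'
```

A narrower alternative is to list only src/docker-manager.ts and src/squid-config.ts. That keeps scan frequency down, at the cost of having to update the list whenever container configuration moves into other files.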
R4: Add Trivy Scan for API Proxy Container [Medium | Low Complexity]
Issue: API proxy container is excluded from security scanning. Fix: Add a third scan-api-proxy job to container-scan.yml mirroring the existing scan-agent job with ./containers/api-proxy. Impact: Closes a CVE blind spot on the component that holds real API credentials.
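A sketch of the extra job, assuming the existing jobs build each image locally and scan it with aquasecurity/trivy-action. The job name, the awf-api-proxy:scan tag, and the step layout are hypothetical and should be aligned with the real scan-agent job (for example, if it uploads SARIF, mirror that too):

```yaml
jobs:
  # ...existing scan-agent and scan-squid jobs...
  scan-api-proxy:
    name: Scan API proxy container
    runs-on: ubuntu-latest
    permissions:
      contents: read
    steps:
      - uses: actions/checkout@v4
      # Build the sidecar image locally so Trivy can scan it (hypothetical tag)
      - name: Build api-proxy image
        run: docker build -t awf-api-proxy:scan ./containers/api-proxy
      - name: Scan api-proxy image with Trivy
        uses: aquasecurity/trivy-action@master  # pin to a release tag or SHA in practice
        with:
          image-ref: awf-api-proxy:scan
          severity: HIGH,CRITICAL
          ignore-unfixed: true
          exit-code: '1'  # fail the job on HIGH/CRITICAL findings
```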
R5: Remove Duplicate Integration Test Workflow [Medium | Low Complexity]
Issue: test-integration.yml and test-integration-suite.yml are identical. Fix: Delete one file; keep the one with better path filtering. Impact: Eliminates the redundant integration-test run (halving that part of CI runtime) and removes check-list confusion.
R6: Make Smoke Tests Automatically Run on PRs (Opt-Out Model) [Medium | Medium Complexity]
Issue: Smoke tests only run when a maintainer adds an emoji reaction. Fix: Run smoke tests automatically on PRs, restricted with roles: maintainer to avoid burning runner minutes on external contributor PRs. Alternatively, make the smoke test for a single agent (e.g., smoke-copilot.md) a required check so that it blocks merges. Impact: Prevents merging PRs that silently break the end-to-end agent execution flow.
R7: Add Performance Gate on PRs [Medium | Medium Complexity]
Issue: Performance regressions only detected weekly. Fix: Add a lightweight startup-time benchmark step (container up + simple command) to build.yml or a new PR-targeted workflow. Fail if time exceeds a 2× threshold vs. a stored baseline. Impact: Catches startup regressions before they reach users.
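One way to sketch the gate as a workflow step, assuming a committed baseline file holding a startup time in milliseconds (.github/perf-baseline-ms.txt is hypothetical) and a benchmark command supplied through BENCH_CMD. The awf invocation shown is illustrative only; its flags should be checked against the actual CLI:

```yaml
- name: Startup-time gate (sketch)
  env:
    BASELINE_FILE: .github/perf-baseline-ms.txt  # hypothetical: baseline startup time in ms
    BENCH_CMD: sudo awf --allow-domains github.com -- echo ok  # hypothetical invocation
  run: |
    start_ns=$(date +%s%N)                # GNU date: nanoseconds since epoch
    bash -c "$BENCH_CMD"                  # container up + simple command
    end_ns=$(date +%s%N)
    elapsed_ms=$(( (end_ns - start_ns) / 1000000 ))
    baseline_ms=$(cat "$BASELINE_FILE")
    limit_ms=$(( 2 * baseline_ms ))       # fail above 2x the stored baseline
    echo "Startup: ${elapsed_ms} ms (baseline ${baseline_ms} ms, limit ${limit_ms} ms)"
    if [ "$elapsed_ms" -gt "$limit_ms" ]; then
      echo "::error::Startup time ${elapsed_ms} ms exceeds 2x baseline (${limit_ms} ms)"
      exit 1
    fi
```

A single timed run on a shared runner is noisy, which is one reason a generous 2× limit is a sensible starting point; pre-pulling images before timing also keeps registry latency out of the measurement.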
R8: Add SBOM Generation to Release Workflow [Medium | Low Complexity]
Issue: No supply chain transparency for releases. Fix: Add anchore/sbom-action to release.yml and attach the SBOM to GitHub Release assets. Impact: Meets enterprise compliance requirements and improves supply chain security posture.
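A sketch of the release steps, assuming release.yml has already pushed the agent image to GHCR by this point; the image path and tag expression are assumptions. The upload-release-assets input asks the action to attach the generated files to the GitHub Release for the triggering ref:

```yaml
- name: Generate SBOM for the agent image
  uses: anchore/sbom-action@v0
  with:
    # Hypothetical image reference: substitute the image and tag release.yml actually publishes
    image: ghcr.io/${{ github.repository_owner }}/awf-agent:${{ github.ref_name }}
    format: spdx-json
    output-file: awf-agent.sbom.spdx.json
    upload-release-assets: true
- name: Generate SBOM for the npm package
  uses: anchore/sbom-action@v0
  with:
    path: .  # scan the repository's npm dependency tree
    format: spdx-json
    output-file: awf-npm.sbom.spdx.json
    upload-release-assets: true
```

Producing one SBOM per distributed artifact (container image and npm package) matches the two channels the gap description mentions.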
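R9: Add License Compliance Scanning [Low | Low Complexity]
Issue: No license drift detection. Fix: Add license-checker as a CI step in dependency-audit.yml: npx license-checker --onlyAllow 'MIT;Apache-2.0;BSD-2-Clause;BSD-3-Clause;ISC;CC0-1.0'. Impact: Prevents accidental introduction of copyleft dependencies.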
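Placed in dependency-audit.yml, the check could look like the steps below. This is a sketch: restricting the scan to runtime dependencies with --production is an optional choice not specified above, and the allowlist may need to grow if existing dependencies use other permissive licenses:

```yaml
- uses: actions/setup-node@v4
  with:
    node-version: 22
- run: npm ci  # license-checker reads the installed node_modules tree
- name: License compliance
  run: >
    npx --yes license-checker --production
    --onlyAllow 'MIT;Apache-2.0;BSD-2-Clause;BSD-3-Clause;ISC;CC0-1.0'
```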
R10: Enforce Commitlint in CI [Low | Low Complexity]
Issue: Commit message convention only enforced locally via husky. Fix: Add a step to lint.yml that runs commitlint on the PR's commits via npx commitlint --from origin/main --to HEAD. Impact: Ensures consistent commit history regardless of how commits are created.
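A sketch of the lint.yml addition, assuming @commitlint/cli is already a devDependency (implied by the husky setup). It lints the range between the pull request's base and head SHAs instead of origin/main..HEAD, because on pull_request events actions/checkout checks out a synthetic merge commit by default:

```yaml
- uses: actions/checkout@v4
  with:
    fetch-depth: 0  # full history so both ends of the commit range are present
- uses: actions/setup-node@v4
  with:
    node-version: 22
- run: npm ci
- name: Lint PR commit messages
  if: github.event_name == 'pull_request'
  run: >
    npx commitlint
    --from ${{ github.event.pull_request.base.sha }}
    --to ${{ github.event.pull_request.head.sha }}
    --verbose
```

Squash merges often take their final message from the PR title, so this complements the existing pr-title.yml check rather than replacing it.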
📈 Metrics Summary
| Metric | Value |
|---|---|
| Total workflow files | 61 (40 YAML + 21 agentic) |
| Workflows running on PRs | ~20 |
| Unit test files | 6 |
| Unit test count | 135 |
| Statement coverage | 38.39% (threshold: 38%) |
| Branch coverage | 31.78% (threshold: 30%) |
| Integration test files | 30 |
| Integration test files not in CI | 7 (23%) |
| Security scanning tools | CodeQL, Trivy, npm audit, AI security guard |
| cli.ts coverage | 0% |
| docker-manager.ts coverage | 18% |
| Recent PR Title Check failure rate | ~20% (non-conforming PR titles) |
| Containers scanned by Trivy | 2 of 3 (API proxy missing) |
Generated by automated CI/CD gap assessment workflow on 2026-03-18.