[CI/CD Assessment] CI/CD Pipelines and Integration Tests Gap Assessment #1387

2026-03-20T22:20:26Z

github-actions[bot]
bot Mar 20, 2026

📊 Current CI/CD Pipeline Status

The repository has a mature and layered CI/CD pipeline with 52 total GitHub Actions workflows (19 native .yml + 21 compiled agentic .lock.yml + 12 scheduled/event-triggered). Overall pipeline health is good, with multi-tiered testing from unit → integration → smoke tests, plus security scanning at multiple layers.

Pipeline Architecture (6 tiers):

| Tier | Workflows | PR Triggered |
| --- | --- | --- |
| Build & Static Analysis | `build.yml`, `lint.yml`, `test-integration.yml` (type-check), `pr-title.yml` | ✅ All PRs |
| Unit & Coverage | `test-coverage.yml` | ✅ All PRs |
| Integration Tests | `test-integration-suite.yml`, `test-chroot.yml`, `test-examples.yml`, `test-action.yml` | ✅ All PRs |
| Security | `codeql.yml`, `dependency-audit.yml`, `container-scan.yml`, `security-guard.lock.yml` | ✅ Most PRs |
| Smoke Tests | `smoke-claude.lock.yml`, `smoke-copilot.lock.yml`, `smoke-codex.lock.yml`, `smoke-chroot.lock.yml` | ⚠️ Reaction-gated |
| AI-Assisted | `build-test.lock.yml`, `security-guard.lock.yml` | ✅ All PRs |

✅ Existing Quality Gates

Code Quality:

ESLint (TypeScript) + markdownlint on all PRs
TypeScript strict type checking (tsc --noEmit)
Conventional commit PR title enforcement (feat/fix/docs/etc.)
Documentation link checking on Markdown file changes

Testing:

Unit tests with Jest + Istanbul coverage reporting
Coverage delta comparison (PR vs base branch) with automatic PR comments
Integration tests: Domain/Network filtering, Protocol/Security, Container Ops, API Proxy (4 parallel jobs, 45 min each)
Chroot integration tests: language runtimes (Python, Go, Java, .NET, Ruby, Rust), package managers, security properties
Examples test suite: real .sh example scripts tested end-to-end
Setup Action tests: action.yml tested with latest and pinned versions

Security:

GitHub CodeQL (JavaScript/TypeScript + Actions) on all PRs and weekly
Trivy container vulnerability scanning (CRITICAL/HIGH) for agent and squid images — results to Security tab
npm audit with SARIF upload for main and docs-site packages; fails on high/critical
AI-powered security review (Claude) on all PRs via security-guard.lock.yml
Scheduled daily/weekly dependency security monitoring, secret diggers

Documentation:

Documentation preview build on docs-file changes
Link checking on Markdown changes

Performance:

Weekly performance benchmarks with automated regression issue creation

🔍 Identified Gaps

🔴 High Priority

1. Coverage thresholds are critically low
Current enforced thresholds: Statements 38%, Branches 30%, Functions 35%, Lines 38%.
Critical files: cli.ts (the entry point) is at 0% coverage, and docker-manager.ts (core orchestration) is at 18%, with only 45 of 250 statements covered. The regression check only blocks decreases from this already low baseline, so a PR that leaves cli.ts at 0% passes without issue.
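One way to close this gap is Jest's `coverageThreshold` option, which accepts a `global` entry plus path or glob keys. The fragment below is only a sketch, not the repository's actual config: the global numbers are the incremental targets from the recommendations (45/35/40/45), while the `src/` paths and the per-file percentages are hypothetical placeholders.

```typescript
// Hypothetical jest.config.ts fragment. The local type mirrors the shape of
// Jest's `coverageThreshold` option so the sketch needs no imports.
type Threshold = {
  statements?: number;
  branches?: number;
  functions?: number;
  lines?: number;
};

const coverageThreshold: Record<string, Threshold> = {
  // Suite-wide floor: the incremental targets from the recommendations.
  global: { statements: 45, branches: 35, functions: 40, lines: 45 },
  // Per-file floors (paths and values are assumptions). A file-path key is
  // checked independently, so the file cannot hide behind the global average.
  "./src/docker-manager.ts": { statements: 18 },
  "./src/cli.ts": { statements: 10 },
};

// In a real jest.config.ts this object would be the default export.
const config = { coverageThreshold };
```

A floor set above a file's current coverage fails CI until tests are added, so the `cli.ts` value only makes sense when it lands together with new tests.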

2. Container scan skips source-code changes
container-scan.yml only triggers on paths: containers/** — changes to src/squid-config.ts, src/docker-manager.ts, or src/host-iptables.ts affect container behavior but don't trigger a rebuild and rescan. A security regression in these files won't surface in the Security tab until a container file also changes.

3. Performance benchmarks not on PRs
performance-monitor.yml runs weekly on schedule only. A PR that doubles container startup time (a critical user-visible metric for this tool) won't be flagged until the next Monday benchmark run, well after merge.
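A PR-time check could reduce to a single comparison against a stored baseline. The sketch below assumes the baseline is the median of recent weekly runs and uses the 2× factor suggested in the recommendations; the function names and the millisecond unit are illustrative, not taken from `performance-monitor.yml`.

```typescript
// Returns true when the PR measurement is at least `factor` times slower than
// the baseline. Names and the default factor are illustrative assumptions.
function isStartupRegression(baselineMs: number, currentMs: number, factor = 2): boolean {
  if (!(baselineMs > 0) || !(currentMs >= 0)) {
    throw new RangeError("baseline must be > 0 and current must be >= 0");
  }
  return currentMs >= baselineMs * factor;
}

// Median of the samples; less sensitive to one noisy run than the mean.
function median(samples: number[]): number {
  if (samples.length === 0) throw new RangeError("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}
```

A generous factor such as 2× keeps a single fast iteration from flagging ordinary runner noise, at the cost of missing smaller slowdowns that the weekly run would still catch.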

4. Integration test naming confusion (test-integration.yml ≠ integration tests)
Despite its filename, test-integration.yml runs only tsc --noEmit; its name: field is "TypeScript Type Check". The actual integration tests live in test-integration-suite.yml. The mismatch causes confusion in the status checks UI and when configuring PR merge requirements.

🟡 Medium Priority

5. Smoke tests require manual emoji reactions to run on PRs
The four smoke tests (smoke-claude, smoke-copilot, smoke-codex, smoke-chroot) require specific emoji reactions (❤️, 👀, 🎉, 🚀) from maintainers to trigger on PRs. For regular contributors, smoke tests only run on schedule (every 12h). A PR that breaks the Claude/Copilot/Codex agent execution path may merge before the next scheduled run validates it.

6. No Dockerfile linting (hadolint)
Container security is central to this project. The three Dockerfiles (containers/squid/, containers/agent/, containers/api-proxy/) are not linted with hadolint or equivalent. Best-practice violations (e.g., RUN apt-get without version pinning, missing --no-install-recommends, unnecessary COPY layers) won't be caught automatically.

7. No coverage tracking for the api-proxy Node.js package on PRs
The API proxy sidecar (containers/api-proxy/) has its own package.json and test suite, but the suite runs only in build.yml (Build Verification), on Node 20 and 22. Its test coverage is not measured or tracked alongside the main package. api-proxy handles real API credential injection, so bugs here have high security impact.

8. No npm audit blocking on containers/api-proxy
dependency-audit.yml audits the main package and docs-site but misses containers/api-proxy/package.json. The api-proxy's dependencies could have high/critical vulnerabilities without any automated gate.

9. Link checking not triggered on source code changes
link-check.yml only runs on paths: **/*.md. URLs embedded in TypeScript source files (e.g., documentation comments, error messages pointing to docs) are not validated.
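A custom script for this could start by pulling URLs out of the source text before handing them to a link checker. The sketch below is a deliberately simple regexp-based extractor; the function name is hypothetical, and the pattern is an assumption that will miss URLs built by string concatenation.

```typescript
// Collects unique http(s) URLs from source text, in order of first appearance.
function extractUrls(source: string): string[] {
  // A URL ends at whitespace, a quote, a backtick, an angle bracket,
  // or a closing parenthesis or square bracket.
  const pattern = /https?:\/\/[^\s"'`<>)\]]+/g;
  const seen = new Set<string>();
  for (const match of source.match(pattern) ?? []) {
    // Drop sentence punctuation that trails a URL inside a comment.
    seen.add(match.replace(/[.,;:!?]+$/, ""));
  }
  return Array.from(seen);
}
```

The resulting list could then be written to a file and passed to the same checker that `link-check.yml` already uses for Markdown.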

🟢 Low Priority

10. No artifact size regression check
No check on compiled output size (dist/ bundle) or Docker image size. A dependency change that bloats dist/cli.js or the agent container image by 100MB would go undetected.
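A size gate needs two pieces: a measurement of the build output and a comparison against a stored baseline. The sketch below assumes a Node.js script run after the build, with the +20% budget taken from the recommendations; the function names and the way the baseline is stored are assumptions.

```typescript
import { readdirSync, statSync } from "node:fs";
import { join } from "node:path";

// Total size in bytes of all regular files under `dir`, recursively.
function dirSizeBytes(dir: string): number {
  let total = 0;
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    const full = join(dir, entry.name);
    if (entry.isDirectory()) total += dirSizeBytes(full);
    else if (entry.isFile()) total += statSync(full).size;
  }
  return total;
}

// True when `currentBytes` grew by more than `maxGrowth` (0.2 = +20%)
// relative to `baselineBytes`.
function exceedsSizeBudget(baselineBytes: number, currentBytes: number, maxGrowth = 0.2): boolean {
  return currentBytes > baselineBytes * (1 + maxGrowth);
}
```

In CI, the script would call `dirSizeBytes("dist")`, compare the result with the baseline recorded on the base branch, and exit non-zero when `exceedsSizeBudget` returns true.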

11. Performance benchmarks use unpinned action versions
performance-monitor.yml uses actions/checkout@v4 and actions/setup-node@v4 without SHA-pinning, while all other workflows pin to full SHAs. This is inconsistent with the project's supply chain security practices.
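Drift like this can be caught mechanically by scanning workflow files for `uses:` references that do not end in a full commit SHA. The sketch below is a line-based heuristic rather than a YAML parser, and the function name is hypothetical.

```typescript
// Returns every `uses:` reference whose ref is not a full 40-character commit SHA.
function findUnpinnedActions(workflowYaml: string): string[] {
  // Captures the action reference, stopping before whitespace, quotes, or a comment.
  const usesLine = /^\s*(?:-\s*)?uses:\s*["']?([^\s"'#]+)/;
  const fullSha = /^[0-9a-f]{40}$/;
  const unpinned: string[] = [];
  for (const line of workflowYaml.split(/\r?\n/)) {
    const ref = usesLine.exec(line)?.[1];
    // Local actions (./path) and docker:// images are not pinned by git SHA.
    if (!ref || ref.startsWith("./") || ref.startsWith("docker://")) continue;
    const at = ref.lastIndexOf("@");
    if (at === -1 || !fullSha.test(ref.slice(at + 1))) unpinned.push(ref);
  }
  return unpinned;
}
```

Run over every file in `.github/workflows/`, a non-empty result could fail a lint job, which would keep new workflows aligned with the SHA-pinning convention.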

12. No mutation testing
With unit test coverage at 38% and critical modules at 0-18%, there's no mechanism to verify that tests are actually catching bugs (not just running code). Mutation testing (e.g., Stryker) would reveal whether the existing test suite has meaningful assertions.

13. No spell check on documentation or code comments
No automated spell checking runs. This is a low-impact quality gap but affects documentation credibility.

📋 Actionable Recommendations

| # | Gap | Recommendation | Complexity | Impact |
| --- | --- | --- | --- | --- |
| 1 | Low coverage thresholds | Raise thresholds incrementally (e.g., 45%/35%/40%/45%) and add per-file minimums for `docker-manager.ts` and `cli.ts` in Jest config. Set a 6-month roadmap to reach 70%. | Medium | High |
| 2 | Container scan misses src/ changes | Extend `container-scan.yml` `paths:` to include `src/**` so any TS change re-scans containers. Or add a separate job in `build.yml` that builds and scans containers on every PR. | Low | High |
| 3 | No PR performance benchmarks | Add a lightweight benchmark job to `build.yml` measuring container startup time with a 2× regression threshold. Full benchmarks remain weekly; PR check uses a single fast iteration. | Medium | High |
| 4 | Workflow naming confusion | Rename `test-integration.yml` → `type-check.yml` so the filename matches its existing `name:` field ("TypeScript Type Check") and what the workflow actually does. | Low | Medium |
| 5 | Smoke tests require reactions | Add a `smoke-fast` job to the integration suite that runs one minimal smoke scenario (e.g., `curl` through the firewall) on every PR without requiring a reaction. Keep full smoke tests as scheduled/reaction-triggered. | High | High |
| 6 | No Dockerfile linting | Add `hadolint` to `build.yml` or a dedicated `docker-lint.yml` checking all three Dockerfiles on PR. Use `DL3008` and related rules. | Low | Medium |
| 7 | api-proxy coverage not tracked | Add `cd containers/api-proxy && npm run test -- --coverage` to `test-coverage.yml` and upload results as a separate artifact. | Low | Medium |
| 8 | api-proxy not audited | Extend `dependency-audit.yml` to add a third `audit-api-proxy` job mirroring `audit-main`. | Low | High |
| 9 | No source-code link validation | Extend `link-check.yml` to also check links in `.ts` files using a regexp-based lychee config or custom script. | Medium | Low |
| 10 | No artifact size check | Add a step in `build.yml` that records `dist/` total size and fails if it exceeds a threshold (e.g., +20% from baseline stored as an artifact). | Medium | Low |
| 11 | Unpinned actions in perf monitor | Pin `actions/checkout` and `actions/setup-node` in `performance-monitor.yml` to full SHAs to match repo-wide convention. | Low | Medium |
| 12 | No mutation testing | Evaluate Stryker Mutator for `src/squid-config.ts` and `src/host-iptables.ts` (highest security value). Add as a weekly scheduled workflow initially. | High | Medium |

📈 Metrics Summary

| Metric | Value |
| --- | --- |
| Total workflows | 52 (19 native YAML + 21 agentic lock + 12 shared/other) |
| Workflows triggered on PRs | ~14 native + 2 agentic |
| Unit test coverage – Statements | 38.39% (threshold: 38%) |
| Unit test coverage – Branches | 31.78% (threshold: 30%) |
| Unit test coverage – Functions | 37.03% (threshold: 35%) |
| Largest coverage gap | `cli.ts` at 0%, `docker-manager.ts` at 18% |
| Integration test jobs on PRs | 4 parallel (domain/network, protocol/security, container ops, API proxy) |
| Chroot test jobs on PRs | 4 parallel (languages, package managers, security, shell features) |
| Security scans on PRs | CodeQL (JS/TS + Actions), npm audit, AI security review |
| Performance monitoring frequency | Weekly only (not on PRs) |
| Container scan frequency on PRs | Only when `containers/**` files change |

AI generated by CI/CD Pipelines and Integration Tests Gap Assessment

expires on Mar 27, 2026, 10:20 PM UTC
