[Pelis Agent Factory Advisor] Pelis Agent Factory Advisor - Agentic Workflow Maturity Report (2026-03-12) #1253
Closed
Replies: 3 comments
- The veiled auguries align: this smoke test agent has passed through, leaving a quiet mark in the ledger of signs.
- 🔮 The ancient spirits stir; the oracle records that the smoke test agent was here, and the omens are set in the ledger.
- This discussion was automatically closed because it expired on 2026-03-19T03:31:42.075Z.
📊 Executive Summary
`gh-aw-firewall` has a mature and sophisticated agentic workflow ecosystem with 21 compiled agentic workflows — well above most repositories. The security domain coverage is particularly strong, with three hourly secret-digger agents, a PR security guard, daily threat modeling, and dependency monitoring. The primary gaps are: automated issue triage (no labeling), a meta-agent layer to monitor workflow health, and a few high-ROI automation patterns (breaking change checker, PR fix command, changeset generator).
🎓 Patterns Learned from Pelis Agent Factory
From crawling the documentation at github.github.io/gh-aw and the series of "Meet the Workflows" posts, key patterns include:
- `/pr-fix`: on-demand CI repair
📋 Current Agentic Workflow Inventory
- build-test
- ci-doctor (triggered on `workflow_run: completed`)
- ci-cd-gaps-assessment
- cli-flag-consistency-checker
- dependency-security-monitor
- doc-maintainer
- issue-duplication-detector
- issue-monster
- pelis-agent-factory-advisor
- plan (`/plan` slash command for task breakdown)
- secret-digger-claude
- secret-digger-codex
- secret-digger-copilot
- security-guard
- security-review
- smoke-chroot
- smoke-claude
- smoke-codex
- smoke-copilot
- test-coverage-improver
- update-release-notes

Total: 21 agentic workflows (all compiled and active)
🚀 Actionable Recommendations
P0 — Implement Immediately
[P0] Issue Triage Agent
What: Automatically label incoming issues with `bug`, `feature`, `enhancement`, `security`, `documentation`, or `question` based on content analysis.
Why: Currently zero issues are being labeled automatically. The Pelis Agent Factory calls this the "hello world" of agentic workflows — practical, immediately useful, and simple. With `issue-monster` already dispatching issues to Copilot agents, good labeling will help it prioritize correctly. Security-relevant issues (iptables, container escape, credential exposure) should be labeled `security` automatically.
How: Add a new `issue-triage.md` workflow triggered on `issues: [opened, reopened]` with read-only permissions. Allow the safe outputs `add-labels` and `add-comment`. Include domain-specific categories: `security`, `firewall`, `docker`, `cli`, `documentation`, `bug`, `feature`.
Effort: Low — ~20 lines of YAML frontmatter plus natural-language instructions
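As a rough sketch of the scale involved, the frontmatter for such a workflow could look something like the following. The `safe-outputs`, `add-labels`, and `add-comment` field names are recalled from the gh-aw documentation and should be verified against the current schema at github.github.io/gh-aw before compiling:

```yaml
# issue-triage.md frontmatter (sketch; field names are assumptions, verify against gh-aw docs)
on:
  issues:
    types: [opened, reopened]
permissions:
  contents: read
  issues: read
safe-outputs:
  add-labels:
    allowed: [bug, feature, enhancement, security, documentation, question, firewall, docker, cli]
  add-comment:
```

The body of the markdown file below the frontmatter would carry the natural-language triage instructions (what signals map to which label).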
[P0] PR Fix Slash Command
What: Add a `/pr-fix` slash command that an agent can invoke on a failing PR to investigate CI failures and attempt fixes.
Why: Multiple PRs in the current backlog have failing CI (e.g., [WIP] Fix failing GitHub Actions workflow, smoke test failures). Developers currently have to investigate these manually. The PR Fix workflow from `githubnext/agentics` has proven very effective. Given that this repo runs 4 smoke engines and an integration test suite, a command to kick off automated repair is high-value.
How: Add `pr-fix.md` triggered by `slash_command: name: pr-fix` on `pull_request_review_comment` and `issue_comment`. The agent investigates the failing CI, reads logs, and proposes fixes as commits.
Effort: Low — can be adapted from `githubnext/agentics/workflows/pr-fix.md`
P1 — Plan for Near-Term
[P1] Breaking Change Checker
What: Monitor PRs and recent commits for backward-incompatible changes to the AWF public API (CLI flags, config file schema, container API, exit codes).
Why: AWF is used as infrastructure by other teams' agentic workflows. Breaking CLI flags (e.g., renaming `--allow-domains` or changing the config format) can silently break downstream workflows. A recent PR adding `--openai-api-target` and `--anthropic-api-target` is exactly the kind of addition to track. The Pelis Agent Factory's Breaking Change Checker has proven it catches real issues before users do.
How: Trigger on PRs modifying `src/cli.ts`, `src/types.ts`, or `action.yml`. Compare the current CLI interface against the last tagged release. Comment on the PR if backward-incompatible changes are detected (removed flags, changed semantics, modified env vars).
Effort: Medium — needs domain-specific knowledge of AWF's public surface area
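The core comparison can be sketched as simple set arithmetic over flag names. This is a minimal illustration, assuming the agent has already scraped the flag sets from the two versions (e.g., from `--help` output or by parsing `src/cli.ts`); the function name and record shape are hypothetical:

```python
# Sketch: classify CLI flag changes between two releases.
# Removed flags break downstream callers; new flags are backward compatible.
# (Changed semantics of an existing flag would need deeper diffing than this.)

def classify_flag_changes(old_flags: set[str], new_flags: set[str]) -> dict[str, list[str]]:
    return {
        "breaking": sorted(old_flags - new_flags),  # present before, gone now
        "additive": sorted(new_flags - old_flags),  # new, safe for existing callers
    }

changes = classify_flag_changes(
    {"--allow-domains", "--log-level"},
    {"--allow-domains", "--log-level", "--openai-api-target", "--anthropic-api-target"},
)
```

A real checker would also flag renamed flags (a removal plus an addition) as breaking rather than treating the two halves independently.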
[P1] Container Base Image Freshness Monitor
What: Daily check whether the `ubuntu/squid:latest` and `ubuntu:22.04` base images have newer security-patched versions available. Alert when images are stale.
Why: This is a domain-specific security gap unique to this repository. AWF's security posture depends on its container images being current. Unlike the npm dependencies (already monitored by `dependency-security-monitor`), the Docker base images are not monitored by any existing workflow. A compromised or outdated Squid image in the agent's security-critical container is a real risk.
How: Compare image digests, e.g. via `docker manifest inspect` (which resolves a tag's current digest without pulling) or the registry API. Create a GitHub issue when an image is more than N days old or its digest has changed, suggesting a rebuild and release.
Effort: Medium — requires Docker registry API calls or digest comparison logic
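The decision logic itself is small once the digests are in hand. A minimal sketch, assuming the current digest comes from `docker manifest inspect` (or a registry API call) and the previously seen digest and build date come from cache-memory; the function name and the 30-day threshold are illustrative:

```python
from datetime import date, timedelta

MAX_AGE = timedelta(days=30)  # the "N days" threshold; pick per policy

def needs_rebuild(last_seen_digest: str, current_digest: str,
                  image_built: date, today: date,
                  max_age: timedelta = MAX_AGE) -> bool:
    """Alert if upstream published a new digest, or our image is simply old."""
    digest_changed = last_seen_digest != current_digest
    too_old = (today - image_built) > max_age
    return digest_changed or too_old
```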
[P1] Workflow Health Manager
What: A meta-agent that monitors all other agentic workflows for signs of degraded health: stale last-run dates, high error rates, repeated failures, unexpected costs.
Why: With 21 workflows running, it's easy for one to silently start failing (wrong permissions, expired secrets, changed API, etc.). The Pelis Agent Factory's Workflow Health Manager created 40 issues and 34 PRs (14 merged) from monitoring the health of other workflows. Right now there's no agent watching the watchers.
How: Daily schedule. Use the `agentic-workflows` tool to get status and recent run history. Use `agenticworkflows-logs` to identify runs with high error rates or missed schedules. Create issues for unhealthy workflows. Can use `cache-memory` to track patterns over time.
Effort: Medium — uses existing tools, mainly needs good prompting
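The health heuristics can be expressed compactly. This sketch assumes run records shaped like `{"conclusion": ..., "started_at": ...}`; the actual schema returned by the log tooling may differ, and the thresholds are placeholders:

```python
from datetime import datetime, timedelta

def unhealthy(runs: list[dict], now: datetime,
              max_failure_rate: float = 0.5,
              max_staleness: timedelta = timedelta(days=7)) -> list[str]:
    """Return a list of reasons a workflow looks unhealthy (empty = healthy)."""
    if not runs:
        return ["never ran"]
    reasons = []
    failures = sum(1 for r in runs if r["conclusion"] == "failure")
    if failures / len(runs) > max_failure_rate:
        reasons.append("high failure rate")
    last_run = max(r["started_at"] for r in runs)
    if now - last_run > max_staleness:
        reasons.append("stale (missed schedule?)")
    return reasons
```

The agent would run this per workflow and open one issue per non-empty reason list, using cache-memory to avoid re-filing the same finding daily.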
[P1] Smoke Test Results Aggregator
What: Weekly summary report of all smoke test outcomes across all 4 engines (Claude, Codex, Copilot, Chroot) as a GitHub Discussion.
Why: Four smoke tests run every 12 hours, generating a lot of signal. Currently there's no consolidated view of smoke test health trends. A weekly aggregation showing pass/fail rates by engine, common failure patterns, and flakiness metrics would help identify reliability issues before they become critical.
How: Weekly schedule. Use `agenticworkflows-logs` to collect smoke test run history. Compute pass/fail rates. Post as a `[Smoke Test Report]` discussion. Use `cache-memory` to track trends across weeks.
Effort: Low-Medium — largely uses existing AWF log tooling
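The per-engine aggregation is a one-pass grouping. A minimal sketch, assuming run records carry an `engine` and a `conclusion` field (an assumption about what the log tooling returns):

```python
from collections import defaultdict

def pass_rates(runs: list[dict]) -> dict[str, float]:
    """Fraction of successful runs per engine."""
    totals: dict[str, int] = defaultdict(int)
    passes: dict[str, int] = defaultdict(int)
    for r in runs:
        totals[r["engine"]] += 1
        passes[r["engine"]] += r["conclusion"] == "success"
    return {engine: passes[engine] / totals[engine] for engine in totals}

runs = [
    {"engine": "claude", "conclusion": "success"},
    {"engine": "claude", "conclusion": "failure"},
    {"engine": "codex",  "conclusion": "success"},
]
```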
P2 — Consider for Roadmap
[P2] Changeset Generator
What: When a PR is merged to main, analyze the commit diff and propose a version bump (major/minor/patch) and changelog entry as a PR.
Why: The Pelis Agent Factory's Changeset workflow has a 78% merge rate (22 of 28 proposed PRs merged). Currently `update-release-notes` only runs at release time. Adding a changeset workflow would create a continuous changelog, so releases become simple "approve and tag" events. Given the active commit history (10+ open PRs with conventional commit messages), this would save significant release prep time.
Effort: Medium — needs to understand conventional commit semantics and semver rules
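The semver decision from conventional commits follows a simple precedence rule: any breaking change forces a major bump, any feature a minor bump, otherwise patch. A minimal sketch (the function name is hypothetical; a production version would also parse multi-line `BREAKING CHANGE:` footers properly):

```python
import re

def bump_for(commit_subjects: list[str]) -> str:
    """Conventional-commit subjects -> 'major' | 'minor' | 'patch'."""
    # `!` after the type/scope (e.g. "feat(cli)!:") marks a breaking change.
    if any(re.match(r"^\w+(\(.+\))?!:", c) or "BREAKING CHANGE" in c
           for c in commit_subjects):
        return "major"
    if any(c.startswith("feat") for c in commit_subjects):
        return "minor"
    return "patch"
```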
[P2] Audit Workflows / Agent Observability
What: A meta-agent that weekly audits all other agent runs for costs, error patterns, success rates, and identifies outliers (overly expensive runs, flaky agents, agents that never produce output).
Why: The Pelis Agent Factory's Audit Workflows became their most prolific discussion creator (93 discussions, 9 issues). With 21 workflows running, token costs and effectiveness vary widely. Some workflows may be consuming high token budgets with low impact. Having visibility into this helps prioritize which workflows to tune or disable.
Effort: Medium — uses the `agenticworkflows-logs` tool extensively
[P2] Issue Arborist
What: Periodically scan open issues for related ones and link them as sub-issues using GitHub's sub-issue feature, creating parent issues to group related work.
Why: This repository has active feature development with multiple related issues that share themes (e.g., multiple issues around API proxy, DNS handling, container security). The Issue Arborist pattern created 77 reports and 18 parent issues in the Pelis factory. With `issue-monster` dispatching issues to agents, organized sub-issues help agents work on related problems cohesively.
Effort: Low-Medium
[P2] CI Coach
What: Monthly analysis of CI pipeline performance — which workflows are slowest, which have the most failures, where there's redundancy — with optimization suggestions.
Why: The Pelis Agent Factory's CI Coach had a 100% merge rate (9 of 9 proposed PRs merged). This repo runs an extensive CI suite (integration tests, smoke tests for 4 engines, build tests, CodeQL, container scans). Identifying duplicated test runs, flaky tests that add noise, or test ordering improvements could meaningfully reduce CI time and cost.
Effort: Medium
P3 — Future Ideas
[P3] Daily Malicious Code Scan
What: Scan recent commits for suspicious patterns that might indicate supply chain compromise or malicious intent (unusual base64 blobs, curl|bash patterns, unexpected credential access, obfuscated code).
Why: The Pelis Agent Factory runs this as part of its security suite. For AWF specifically, since this is a security tool that handles iptables and credential management, any malicious code inserted into container scripts or the agent startup could have high-impact consequences. The existing `security-guard` focuses on newly opened PRs but doesn't do a retroactive daily scan.
Effort: Low — primarily uses the `bash` and `github` tools
[P3] Security Compliance SLA Tracker
What: Track security vulnerabilities (from `dependency-security-monitor`, CodeQL, and `security-review`) from detection through resolution, alerting on any CVEs exceeding SLA deadlines.
Why: The Pelis Factory's Security Compliance workflow "runs vulnerability campaigns with deadline tracking." Currently, security issues are opened by the dependency monitor, but there's no tracking of whether they're being resolved within acceptable timeframes.
Effort: Medium — needs SLA configuration and cache-memory state tracking
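The deadline check reduces to comparing each unresolved finding's age against its severity's window. A minimal sketch; the SLA windows below are illustrative placeholders, not an established policy for this repository, and the finding record shape is an assumption about what would be stored in cache-memory:

```python
from datetime import date, timedelta

SLA = {  # illustrative deadlines per severity
    "critical": timedelta(days=7),
    "high": timedelta(days=30),
    "medium": timedelta(days=90),
}

def overdue(findings: list[dict], today: date) -> list[str]:
    """IDs of unresolved findings that have blown past their SLA window."""
    return [
        f["id"] for f in findings
        if f.get("resolved") is None and today - f["detected"] > SLA[f["severity"]]
    ]

findings = [
    {"id": "CVE-2026-0001", "severity": "critical", "detected": date(2026, 3, 1), "resolved": None},
    {"id": "CVE-2026-0002", "severity": "high", "detected": date(2026, 3, 1), "resolved": None},
]
```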
[P3] Portfolio Analyst (Token Cost Optimizer)
What: Weekly analysis of which workflows are consuming the most tokens/compute, identifying over-engineered prompts or unnecessarily large context windows.
Why: With 21 workflows running on regular schedules, cost optimization matters. Secret diggers run every hour across 3 engines — understanding their per-run cost and whether their frequency is justified would be valuable.
Effort: Medium
📈 Maturity Assessment
Current Level: 4 / 5 — Advanced
The repository has an unusually mature agentic workflow ecosystem, particularly for security coverage. It is well ahead of most repositories in the ecosystem.
Target Level: 4.5 / 5
The primary gaps to close are meta-observability (watching the watchers), issue triage, and release automation polish.
🔄 Comparison with Pelis Agent Factory Best Practices
What This Repository Does Exceptionally Well
- The `security-guard` PR reviewer, `dependency-security-monitor`, and `security-review` are all specifically tuned for a security-critical codebase
- `issue-duplication-detector` correctly uses `cache-memory` for persistent state across runs — matching the Pelis pattern
What Could Improve
- `update-release-notes` is reactive (runs on release), while the Pelis approach (Changeset) is proactive — proposing version bumps as commits land
Unique Opportunities Given the Domain
The firewall/security domain creates unique automation opportunities not found in the Pelis factory.
Generated by Pelis Agent Factory Advisor • Run ID: 22985059821 • 2026-03-12