Troubleshooting
Symptoms grouped by where they tend to show up. Start with whichever page matches what you're seeing; cross-references take you to deeper context.
Pages
- Install — bootstrap or production install failed, services won't come up.
- Authentication / OIDC — can't sign in, sign-in loop, claims missing.
- Networking / OpenZiti — agent can't reach Gateway, .ziti hostname fails.
- Agents won't start — workload fails, init container errors, image pull issues.
- LLM calls fail — auth errors, rate limits, model not found.
- MCP tools fail — tool returns error, tool not visible to agent.
- Tracing gaps — spans missing, run timeline empty.
- FAQ — short answers to common questions.
Diagnostic mindset
When something doesn't work:
- Reproduce it. Confirm the failure is reliable. Intermittent failures usually point to rate limits, resource contention, or DNS, and call for a different approach than a failure that happens every time.
- Look at the run. If an agent is involved, open the Run Timeline. Half the time the issue is visible there (an LLM call returned an error, a tool returned bad data).
- Look at the logs. Filter by identity_id or trace_id across services. See Operate → Logging & audit.
- Check the obvious. Is the service running? Are credentials in the right Secret? Did the certificate expire? Did the OIDC provider rotate keys?
- Bisect by component. If you don't know which service is failing, walk through the request path (Chat app → Gateway → Threads → Notifications → Orchestrator → Runner → Pod → MCP).
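As a rough illustration of the log-filtering step, the sketch below keeps only the log records that carry a given trace_id or identity_id. It assumes the services emit one JSON object per line with top-level trace_id and identity_id fields; that format is an assumption, not something this page confirms, and filter_logs is a hypothetical helper name.

```python
import json
from typing import Dict, Iterable, Iterator, Optional


def filter_logs(
    lines: Iterable[str],
    trace_id: Optional[str] = None,
    identity_id: Optional[str] = None,
) -> Iterator[Dict]:
    """Yield parsed log records matching the given IDs.

    Assumes JSON-lines logs with top-level "trace_id" and
    "identity_id" fields (an assumption about the log format).
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            # Skip lines that are not JSON (startup banners, stack traces).
            continue
        if not isinstance(record, dict):
            continue
        if trace_id is not None and record.get("trace_id") != trace_id:
            continue
        if identity_id is not None and record.get("identity_id") != identity_id:
            continue
        yield record
```

Feeding it the concatenated logs of several services and printing each record's service name in order gives a quick view of how far a request got along the request path before it failed.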
Where to ask for help
- GitHub issues on the relevant agynio/* repository — see Reference → Service catalog.
- Community channels linked from the project README.
- For commercial support, your contract details apply.
When filing an issue, include:
- Agyn chart version (helm list -n agyn).
- Service version (look at the pod image).
- Reproduction steps.
- Relevant log snippets with trace_id.
- Any sanitized config (.tf files, Helm values).
Sanitize secrets, tokens, and personally-identifying information before sharing.
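A first pass at sanitizing can be scripted before a manual review. The sketch below redacts values whose key name looks sensitive, plus bearer tokens, in log or config text. The key-name patterns and the sanitize helper are illustrative assumptions, not part of Agyn; the sketch does not detect personally-identifying information and will miss secrets stored under unremarkable key names, so read the result before posting it.

```python
import re

# Key names that commonly hold secrets (an illustrative, incomplete list).
_KEY_VALUE = re.compile(
    r"(?i)([\w.-]*(?:password|secret|token|api[_-]?key)[\w.-]*)"  # key name
    r"(\s*[:=]\s*)"                                               # separator
    r"(\"[^\"]*\"|'[^']*'|\S+)"                                   # value
)

# HTTP-style bearer credentials, e.g. in an Authorization header.
_BEARER = re.compile(r"(?i)\b(bearer\s+)[A-Za-z0-9._~+/=-]+")


def sanitize(text: str) -> str:
    """Replace likely secret values with a placeholder."""
    text = _BEARER.sub(r"\1<REDACTED>", text)
    return _KEY_VALUE.sub(r"\1\2<REDACTED>", text)
```

For example, sanitize('client_secret = "abc123"') returns 'client_secret = <REDACTED>', while a line such as 'replicas: 3' passes through unchanged.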