Install issues

`./apply.sh` fails immediately

Terraform not installed. Bootstrap requires the terraform CLI. Install it before running.
Docker not running. The k8s stack creates a k3d cluster in Docker. Start Docker Desktop / dockerd.
Insufficient Docker resources. k3d + every platform service is heavy. Give Docker at least 6 vCPU and 12 GB RAM (Docker Desktop → Settings → Resources).
Port 2496 or 6443 already in use. Stop whatever holds them (lsof -i :2496) or set PORT=<free port> before re-running.

k3d cluster up, but pods stay `Pending` or restart

shkubectl describe pod -n <ns> <pod>
kubectl logs -n <ns> <pod> -c <container>

Read the Events at the bottom. Common causes:

Insufficient cpu / Insufficient memory → bump Docker resources.
MountVolume.SetUp failed → storage provisioning issue. terraform -chdir=stacks/k8s destroy && ./apply.sh to recreate the cluster fresh.
ImagePullBackOff → see Image pull failures below.
CrashLoopBackOff on a platform service → check the container logs; most often a downstream dependency (Postgres, OpenFGA) isn't healthy yet.

A stack fails partway through

apply.sh runs nine stacks in sequence and exits on the first failure. The failing stack's name is printed in the [TIMING] line just before exit.

Common failures:

deps waits forever for cert-manager / trust-manager / ziti-controller. They install via Argo CD. Check https://argocd.agyn.dev:2496/ (or kubectl get applications -n argocd). The failing application's events page tells you why.
ziti exits with "OpenZiti Management API did not become ready". The ziti-controller pod is still starting or unhealthy. kubectl -n ziti logs deploy/ziti-controller. Wait and re-run the ziti stack alone: terraform -chdir=stacks/ziti apply.
platform fails on a service-specific Argo CD app. apply.sh waits per-app; the failing app's name is logged plus pod state. Most often: image pull, DB connectivity, or a chart bug in a specific service version.

You can re-run a single stack after fixing the issue:

terraform -chdir=stacks/<stack> apply

Each stack picks up where the previous succeeded.

Argo CD applications show `OutOfSync`

kubectl get applications -n argocd

For any OutOfSync app: open Argo CD's UI (https://argocd.agyn.dev:2496/) and click Sync — or argocd app sync <name> if you have the CLI installed.

If sync keeps failing, open the app and look at the App Health section. The failing resource's status is usually self-explanatory.

Browser can't open `https://agyn.dev:2496/`

The CA cert install step was skipped or cancelled. Browsers warn on every URL. Re-run:
sh
```
./install-ca-cert.sh -y local-certs/ca-agyn-dev.pem
```
agyn.dev doesn't resolve to 127.0.0.1. Very rare — the domain is configured publicly. If your DNS resolver strips it (corporate networks sometimes do), set a custom DOMAIN and add it to /etc/hosts yourself.
You changed DOMAIN but didn't add it to your hosts file. Custom domains need a real resolution path. Add 127.0.0.1 <domain> *.<domain> to /etc/hosts or run a local DNS.
Ingress isn't routed. kubectl get gateway -A should show platform-gateway in istio-gateway. If missing, the routing stack didn't run.

Image pull failures

If any pod shows ImagePullBackOff:

kubectl describe pod -n <ns> <pod>

Look at the Events:

unauthorized: authentication required → you're pulling from a private registry. Set GHCR_USERNAME and GHCR_TOKEN before re-running apply.sh. The platform stack uses these to create the registry pull secret.
manifest unknown → the image tag in stacks/platform/variables.tf doesn't exist upstream. Either bump or pin a known-good version.
rate limited → Docker Hub. The k3d image and some upstream deps are on Docker Hub; authenticated pulls or a mirror avoid this.

For agent workloads (not the platform services themselves) pulling from your private registry: see Administer → Image pull secrets.

Database migrations stuck

If a service pod is stuck in Init running a migration:

kubectl logs -n <ns> <pod> -c <init-container>

The platform's per-service charts run migrations as Init containers. They are idempotent — delete the failing pod and let the Deployment recreate it:

kubectl delete pod -n <ns> <pod>

If the migration genuinely fails (constraint violation, missing column from a prior state), check the service's release notes for known migration issues. If none, file an issue against the service repository.

`gateway` returns 502 after install

Gateway depends on most services. If any are unhealthy, Gateway can fail readiness probes.

kubectl get pods -A | grep -v Running

Wait for everything to be Ready 1/1 before assuming Gateway is broken. First-time deploys take about 15 minutes total — Gateway is one of the last services to settle because it talks to everyone.

Bootstrap re-run after a failed apply

It's safe to re-run ./apply.sh after fixing an issue. Each stack's terraform apply is idempotent. Argo CD applications converge to their declared state.

If a partial install is genuinely broken and you want a clean slate, see Uninstall — quick reset.

Install issues

./apply.sh fails immediately

k3d cluster up, but pods stay Pending or restart

A stack fails partway through

Argo CD applications show OutOfSync

Browser can't open https://agyn.dev:2496/