Version: 0.9.20

Troubleshooting

This page covers the most common issues when deploying, updating, or running AlphaAgent Studio, and how to resolve them. The installer is resumable and idempotent — once you fix an underlying problem, you can re-run studioctl install and it continues from the first step that has not completed. The read-only studioctl doctor command checks configuration and health at any time.

Preflight failures

The preflight check runs first and is read-only. If it reports a problem, fix it before continuing; by default, the installer does not proceed past a failed preflight.

Symptom | Likely cause | Fix
Wrong account or region | Credentials point at the wrong target | Re-export the correct AWS credentials/profile and region; confirm the account/identity banner the installer prints.
Console not reachable | Network egress blocked from your workstation or account | Ensure outbound HTTPS to the Console is allowed; retry.
Conflicting existing resources | A previous deployment's retained resources are still present | See Name collisions below.
Bedrock model access denied / quota too low | Required models not enabled, or token quotas insufficient | Enable access to the required model families in the Bedrock console for your region, and request quota increases if needed. See Prerequisites.
Certificate not usable | ACM certificate not ISSUED, wrong region, or doesn't cover your domain | Provision/validate an ISSUED certificate in the deploy region that covers your domain.
Cognito prefix unavailable | The prefix is already taken (it must be globally unique) | Choose a different, globally-unique prefix.
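
For the "doesn't cover your domain" case, the following is a minimal sketch of how certificate name matching generally works, assuming the usual wildcard rule that *.example.com covers exactly one extra label. The function names and the example.com domains are illustrative and not part of studioctl; the certificate's ISSUED status and region still have to be confirmed separately.

```python
def covers(cert_name: str, host: str) -> bool:
    """Return True if one certificate name covers the given host name.

    A wildcard such as *.example.com matches exactly one extra label
    (studio.example.com), but not the apex (example.com) and not deeper
    names (a.b.example.com).
    """
    cert_name = cert_name.lower().rstrip(".")
    host = host.lower().rstrip(".")
    if cert_name.startswith("*."):
        suffix = cert_name[1:]            # e.g. ".example.com"
        prefix = host[: -len(suffix)]     # the part the wildcard must match
        return host.endswith(suffix) and prefix != "" and "." not in prefix
    return cert_name == host


def certificate_covers(cert_names: list[str], host: str) -> bool:
    """Check the host against every name listed on the certificate."""
    return any(covers(name, host) for name in cert_names)


if __name__ == "__main__":
    names = ["example.com", "*.example.com"]  # hypothetical certificate names
    for host in ("studio.example.com", "a.b.example.com"):
        print(host, certificate_covers(names, host))
```

A wildcard-only certificate does not cover the apex domain, so a certificate that lists only *.example.com would fail this check for example.com itself.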

Common deploy errors

Docker not running / image not found

The installer loads and pushes the Studio image early. If Docker isn't running or the bundle is incomplete, you'll see an error before any cloud resources are created. Start Docker and confirm you unzipped the complete bundle, then re-run.

Throttling / rate-limit errors

Fresh accounts sometimes hit AWS API rate limits during deploy. The installer automatically retries these with backoff, so transient throttling usually resolves itself. If a step ultimately fails on throttling, simply re-run — completed work is skipped.
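
As an illustration of the retry-with-backoff pattern described above, here is a minimal sketch using exponential backoff with random jitter. It is not the installer's actual code: ThrottledError, retry_with_backoff, and the default delays are hypothetical names and values chosen for the example.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a throttling / rate-limit error."""


def retry_with_backoff(call, max_attempts=5, base_delay=1.0,
                       max_delay=30.0, sleep=time.sleep):
    """Call `call()` and retry on ThrottledError with exponential backoff.

    The wait before each retry is a random value between 0 and
    min(max_delay, base_delay * 2**attempt). The last failure is re-raised.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))


if __name__ == "__main__":
    attempts = {"n": 0}

    def flaky_step():
        # Fails twice with throttling, then succeeds.
        attempts["n"] += 1
        if attempts["n"] < 3:
            raise ThrottledError("rate exceeded")
        return "done"

    print(retry_with_backoff(flaky_step, base_delay=0.1))
```

The jitter spreads retries out so that several throttled calls do not all retry at the same moment.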

Template upload / size errors

The installer uploads its templates to a small helper bucket it creates in your account. If this bucket can't be created (permissions, region constraints), the deploy can't proceed — confirm your credentials have administrator permissions in the target region.

Neo4j connectivity

If Studio can't reach Neo4j after deploy, the fix depends on your Neo4j mode:

Self-hosted (managed by Studio):

  • Run studioctl doctor to confirm the Neo4j service is healthy and its credentials resolved.
  • Check the Neo4j service in your ECS cluster. On first start it pulls the Neo4j Community image, so allow a minute or two for it to become healthy.
  • The service is reached internally at bolt://neo4j:7687 via Service Connect; you do not manage its networking.

External (bring your own):

  • Confirm your Neo4j is reachable from the deployment's network and its security group allows inbound on 7687 from the Studio services' security group.
  • Re-run studioctl neo4j --uri … --user … --password … to (re)validate the connection and write the credentials; it rolls the knowledge-base services so they pick up the change.
  • Confirm the URI, username, and password are correct — studioctl neo4j validates them before saving.
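
For the external mode, a quick way to separate network problems from credential problems is a plain TCP check on port 7687. The sketch below uses only the Python standard library; the function name and the host in the usage example are hypothetical. It must run from a machine inside the deployment's network to be meaningful, and it does not validate credentials (studioctl neo4j does that).

```python
import socket


def bolt_port_reachable(host: str, port: int = 7687, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    This only shows that routing and security groups allow the connection;
    it says nothing about the Neo4j username or password.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


if __name__ == "__main__":
    # Hypothetical host name; substitute the host from your own Neo4j URI.
    print(bolt_port_reachable("neo4j.internal.example.com"))
```

If this returns False from inside the deployment's network, look at routing and security groups before re-checking the URI, username, and password.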

Sign-in (SAML) problems

Symptom | Fix
Redirect loops back to your IdP, or access denied | Verify the ACS URL and SP entity ID registered in your IdP exactly match what the installer printed; confirm the user is assigned to the Studio app in your IdP.
User signs in but name/email are missing | Ensure email, given_name, and family_name are mapped and released by your IdP.
IdP certificate rotated | Re-run python3 studioctl.py saml with the updated metadata.

See Single Sign-On (SAML) for details.

Name collisions after a previous deploy

Teardown retains your data stores (DynamoDB, S3, ECR, ElastiCache, the license secret). Redeploying into the same account will collide with those retained resources, which preflight flags. You have two choices:

  • Resume with existing data — keep the retained resources; the activated license secret lets the new deployment resume the same license without re-activation.
  • Start completely fresh — perform the full manual cleanup in Teardown first, then redeploy.

License and deployment health

Health is visible on the Console license page. Use it to diagnose runtime issues.

Status | Meaning | What to check
Warning / Unhealthy | One or more recent heartbeats were missed | Outbound HTTPS from your deployment to the Console; recent deploy/update activity.
Deployment disabled (service returns "unavailable") | The deployment lost contact with the Console for an extended period, or the license was revoked | See below.
Frozen (billing) | An invoice is significantly past due | Pay the outstanding invoice; the deployment recovers automatically. See Billing.

A deployment that has disabled itself

If a deployment cannot reach the Console (or had its license revoked), it will eventually stop serving agent traffic as a safety measure and return an "unavailable" response. To recover:

  • If it's a connectivity issue, restore outbound access to the Console; the deployment resumes after a successful heartbeat.
  • Check the host clock. Large clock drift on the deployment can cause the Console to reject its signed requests, which looks similar to a revoked license. Confirm time sync (NTP) is healthy.
  • If the license was revoked, recovery is not possible for that key — revocation is permanent. Create a new license key and redeploy.
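
To estimate clock drift without extra tooling, you can compare the local clock with the Date header of an HTTPS response from a server you trust. This is a rough sketch under that assumption: the function name is hypothetical, the header value is a sample, and this document does not state how much drift the Console tolerates.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def clock_drift_seconds(http_date_header: str, local_now: datetime) -> float:
    """Return local time minus server time, in seconds.

    A positive value means the local clock is ahead of the server. Both
    times must be timezone-aware; HTTP Date headers are expressed in GMT.
    """
    server_time = parsedate_to_datetime(http_date_header)
    return (local_now - server_time).total_seconds()


if __name__ == "__main__":
    # Sample header value; in practice, copy the Date header from a real
    # HTTPS response received on the deployment host.
    header = "Wed, 21 Oct 2015 07:28:00 GMT"
    drift = clock_drift_seconds(header, datetime.now(timezone.utc))
    print(f"local clock differs from server by {drift:.0f} seconds")
```

HTTP dates have one-second resolution and the response takes time to arrive, so treat differences of a few seconds as noise. Drift of minutes or more points to a time sync problem.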

Operators can always reach Studio's diagnostic and re-activation endpoints even when agent traffic is disabled, so you can confirm the current state during recovery.

Getting help

When contacting support, include your License ID (the public aalk_… identifier, safe to share), the Studio version, and the deploy region. Do not share your License Token, Neo4j password, or AWS credentials.