Deployment¶
Deploy Gremia Labs across managed cloud services (recommended) or self-hosted with Docker Compose.
Architecture Overview¶
```mermaid
graph TB
    subgraph Managed Services
        V[Vercel<br/>Next.js Web App]
        R[Railway<br/>FastAPI Orchestrator]
        SB[Supabase<br/>PostgreSQL + Auth]
        UP[Upstash<br/>Redis]
        ST[Stripe<br/>Billing]
    end
    subgraph CI/CD
        GH[GitHub Actions]
    end
    subgraph Clients
        B[Browser] -->|HTTPS| V
        S[Shell Desktop App] -->|WSS + mTLS| R
    end
    V -->|REST| R
    R --> SB
    R --> UP
    R --> ST
    GH -->|deploy/dev| R
    GH -->|deploy/dev| V
```
Production uses managed services: Railway (backend), Vercel (frontend), Supabase (database), and Upstash (Redis). A Docker Compose option exists for fully self-hosted deployments.
1. Managed Deployment (Railway + Vercel)¶
This is the primary deployment method used by Gremia Labs.
1.1 Railway (Backend Orchestrator)¶
Railway hosts the FastAPI orchestrator (LangGraph agent execution, MCP coordination, billing webhooks).
| Setting | Production | Development |
|---|---|---|
| Branch | `main` | `deploy/dev` |
| Domain | `gremia-labs-production.up.railway.app` | `gremia-labs-development.up.railway.app` |
| Region | `europe-west4` | `europe-west4` |
| Dockerfile | `infra/docker/Dockerfile.orchestrator` | `infra/docker/Dockerfile.orchestrator` |
Railway auto-deploys when it detects a push to the configured branch. The production environment deploys directly from main; the development environment deploys from deploy/dev, which is updated by CI after all quality gates pass.
Dockerfile
The orchestrator uses infra/docker/Dockerfile.orchestrator at the repository root. Railway must be configured with dockerfilePath: "infra/docker/Dockerfile.orchestrator" to avoid Railpack auto-detection.
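One way to pin this is Railway's config-as-code file at the repository root. The following is a sketch only; the schema URL and `builder` value reflect Railway's documented config format, but verify it against your Railway project settings:

```json
{
  "$schema": "https://railway.app/railway.schema.json",
  "build": {
    "builder": "DOCKERFILE",
    "dockerfilePath": "infra/docker/Dockerfile.orchestrator"
  }
}
```

With this file committed, the dashboard setting becomes redundant and the Dockerfile path is versioned alongside the code.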
Key Railway environment variables (set via Railway dashboard or CLI):
- All variables from the Environment Variables table below
- `UVICORN_WORKERS=4` (adjust per plan)
- `LOG_LEVEL=INFO`
- `PYTHONUNBUFFERED=1`
1.2 Vercel (Frontend Web App)¶
Vercel hosts the Next.js Builder app (apps/web/).
| Setting | Production | Preview |
|---|---|---|
| Branch | `main` | `deploy/dev` |
| Framework | Next.js (auto-detected) | Next.js |
| Root Directory | `apps/web` | `apps/web` |
Vercel environment variables are split by target:
- Production target: points to Supabase PROD and Railway PROD URLs
- Preview + Development target: points to Supabase DEV and Railway DEV URLs
Required NEXT_PUBLIC_* variables:
| Variable | Description |
|---|---|
| `NEXT_PUBLIC_SUPABASE_URL` | Supabase project URL |
| `NEXT_PUBLIC_SUPABASE_ANON_KEY` | Supabase anonymous key |
| `NEXT_PUBLIC_ORCHESTRATOR_URL` | Railway backend URL (e.g., `https://gremia-labs-production.up.railway.app`) |
Ignored Build Step
Configure Vercel's "Ignored Build Step" to skip builds triggered by direct pushes to the development branch. Only deploy/dev (pushed by CI after quality gates pass) should trigger preview deployments.
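One way to express that branch policy is a small gate function. This is a sketch, not the project's actual configured command; `VERCEL_GIT_COMMIT_REF` is the system environment variable Vercel exposes for the branch being built:

```bash
# Branch gate matching the policy above: only main and deploy/dev build.
should_build() {
  case "$1" in
    main|deploy/dev) return 0 ;;  # CI-promoted branches: build
    *) return 1 ;;                # anything else (e.g. development): skip
  esac
}

# In Vercel's "Ignored Build Step" field this inverts to Vercel's
# convention (exit 0 = skip the build, exit 1 = proceed), e.g.:
#   if should_build "$VERCEL_GIT_COMMIT_REF"; then exit 1; else exit 0; fi
should_build "deploy/dev" && echo "deploy/dev builds"
```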
1.3 Supabase (Database)¶
Supabase provides managed PostgreSQL with Row-Level Security, Auth, and real-time subscriptions.
| Environment | Project Ref | Region |
|---|---|---|
| Production | `zscxhmfhrqrdzusrfqfl` | EU Frankfurt |
| Development | `vrsjiyufvqktzazyiynb` | EU Central |
All tables live under the gremia schema. Migrations are in infra/supabase/migrations/ (001 through 020). Apply via the Supabase SQL Editor or MCP tool, not the Supabase CLI.
Schema separation
The public schema belongs to a legacy project (CVAI). Never create or modify tables in public. All Gremia tables use the gremia schema exclusively.
1.4 Redis (Upstash)¶
Upstash provides serverless Redis with TLS.
- Connection: `REDIS_URL` uses the `rediss://` protocol (TLS-enabled)
- Usage: Rate limiting, execution cache, audit buffer, WebSocket relay
- Fallback: The application functions without Redis using in-memory fallbacks (suitable for development only)
2. CI/CD Pipeline¶
GitHub Actions runs quality gates on every push to development and orchestrates deployment to staging.
Pipeline Flow¶
```mermaid
graph LR
    A[Push to development] --> B[CI Workflow]
    B --> C{Quality Gates}
    C -->|Pass| D[Push to deploy/dev]
    C -->|Fail| E[Block - Fix Required]
    D --> F[Railway Dev Deploy]
    D --> G[Vercel Preview Deploy]
    H[Merge to main] --> I[Railway Prod Deploy]
    H --> J[Vercel Prod Deploy]
```
Quality Gate Jobs¶
The CI workflow (.github/workflows/ci.yml) runs these jobs in parallel:
| Job | What it checks |
|---|---|
| quality-js | turbo run lint typecheck test, i18n completeness |
| quality-python | ruff check, ruff format --check, mypy app, pytest with coverage |
| quality-rust | cargo check, cargo clippy -- -D warnings, cargo test --lib |
| audit-npm | npm audit --omit=dev --audit-level=high |
| audit-python | pip-audit --skip-editable |
| audit-rust | cargo audit |
| build | turbo run build (depends on quality jobs) |
| e2e | Playwright tests with Chromium |
Deploy Trigger¶
After all quality and build jobs pass, the deploy-dev job force-pushes the development branch to deploy/dev.
This uses a DEPLOY_TOKEN (GitHub PAT) to ensure the push triggers Railway and Vercel webhooks.
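A sketch of what such a job might look like in `.github/workflows/ci.yml`. The step names and wiring are assumptions; only the branch names, the job names from the table above, and `DEPLOY_TOKEN` come from this page:

```yaml
deploy-dev:
  needs: [quality-js, quality-python, quality-rust, build]
  if: github.ref == 'refs/heads/development'
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
      with:
        fetch-depth: 0
        token: ${{ secrets.DEPLOY_TOKEN }}   # PAT so the push fires webhooks
    - name: Force-push development to deploy/dev
      run: git push origin development:deploy/dev --force
```

Using the PAT in `actions/checkout` matters: pushes made with the default `GITHUB_TOKEN` do not trigger downstream deploy webhooks.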
Production deploys
Pushes and merges to main trigger Railway and Vercel production deployments directly; there is no CI gate on main. Ensure all changes reach main via reviewed pull requests.
3. Self-Hosted Deployment (Docker Compose)¶
For environments where managed services are not an option, a full Docker Compose stack is provided.
Prerequisites¶
| Tool | Version |
|---|---|
| Docker | 24.0+ |
| Docker Compose | 2.20+ |
| Domain name | With DNS A record pointing to your server |
| SSL certificate | Let's Encrypt or commercial (for Nginx) |
Quick Start¶
1. Clone the repository¶
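For example (the repository URL is a placeholder; substitute your actual remote):

```bash
git clone https://github.com/<your-org>/gremia-labs.git
cd gremia-labs
```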
2. Configure environment¶
```bash
cp .env.example .env.production
# Edit .env.production with your values (see Environment Variables below)
```
Security
Never commit .env.production to version control. Use a secrets manager
(Vault, AWS Secrets Manager, etc.) in production.
3. Set up SSL certificates¶
Place your TLS certificates in a location accessible to the certs Docker volume, or use Let's Encrypt (see SSL with Let's Encrypt below).
4. Build and start¶
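Assuming the compose file and env file names used elsewhere on this page, the build-and-start step might look like:

```bash
docker compose -f docker-compose.prod.yml --env-file .env.production build
docker compose -f docker-compose.prod.yml --env-file .env.production up -d
```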
This starts 6 services by default:
| Service | Image / Build | Internal Port | Description |
|---|---|---|---|
| redis | `redis:7-alpine` | 6379 | Caching, rate limiting, pub/sub |
| nginx | `nginx:1.27-alpine` | 80, 443 (exposed) | TLS termination, reverse proxy, mTLS |
| orchestrator | Built from `infra/docker/Dockerfile.orchestrator` | 8000 | FastAPI + LangGraph backend |
| web | Built from `infra/docker/Dockerfile.web` | 3000 | Next.js frontend |
| prometheus | `prom/prometheus:v2.51.0` | 9090 | Metrics collection (30-day retention) |
| grafana | `grafana/grafana:11.4.0` | 3001 | Dashboards and alerting |
An optional 7th service is available:
| Service | Profile | Description |
|---|---|---|
| supabase-db | `self-hosted` | Self-hosted PostgreSQL (replaces Supabase Cloud) |
To include the self-hosted database, start the stack with the `self-hosted` Compose profile enabled.
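With Compose profiles, that might look like (the profile name comes from the table above):

```bash
docker compose -f docker-compose.prod.yml --profile self-hosted up -d
```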
5. Verify¶
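A quick smoke test, assuming the health endpoints described under Monitoring and your own domain in place of the placeholder:

```bash
# All containers up and healthy?
docker compose -f docker-compose.prod.yml ps

# Backend health endpoint responds?
curl -fsS https://your-domain.com/health

# Frontend returns a 200?
curl -fsS -o /dev/null -w "%{http_code}\n" https://your-domain.com/
```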
Docker Compose Configuration¶
The compose file is at the repository root: docker-compose.prod.yml
```bash
# Key architectural decisions:
# - Only ports 80/443 exposed externally (via nginx)
# - All services on a dedicated 'gremia' bridge network
# - Healthchecks on every service
# - env_file: .env.production
# - Redis URL override: redis://redis:6379/0 (internal network)

# Scale the orchestrator horizontally:
docker compose -f docker-compose.prod.yml up -d --scale orchestrator=2
```
Nginx Configuration¶
The Nginx config at infra/nginx/nginx.conf provides:
- TLS 1.3 with modern cipher configuration
- Optional mTLS for Shell desktop client connections
- Rate limiting per endpoint type:
- General API: 30 req/s per IP (burst 60)
- Auth endpoints: 5 req/s per IP (burst 10)
- WebSocket handshakes: 2/s per IP (burst 5)
- Security headers: HSTS, X-Content-Type-Options, X-Frame-Options, CSP
- WebSocket upgrade for the Shell tunnel (`/api/v1/tunnel/ws`, 24h timeout)
- Metrics endpoint restricted to internal Docker network IPs
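The per-endpoint rate limits listed above would typically be declared as `limit_req_zone` directives at the `http` level. A sketch; the zone names and shared-memory sizes are assumptions, only the rates come from this page:

```nginx
# Zones keyed by client IP
limit_req_zone $binary_remote_addr zone=api_limit:10m  rate=30r/s;
limit_req_zone $binary_remote_addr zone=auth_limit:10m rate=5r/s;
limit_req_zone $binary_remote_addr zone=ws_limit:10m   rate=2r/s;

# Applied per location, e.g.:
#   location /api/ { limit_req zone=api_limit burst=60 nodelay; ... }
```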
Key proxy locations:
```nginx
# WebSocket: Shell tunnel (long-lived, 24h timeout)
location /api/v1/tunnel/ws {
    proxy_pass http://orchestrator;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection $connection_upgrade;
    proxy_set_header X-Client-Cert $ssl_client_cert;
    proxy_read_timeout 86400;
    proxy_send_timeout 86400;
}

# REST API (rate-limited)
location /api/ {
    proxy_pass http://orchestrator;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
}

# Next.js frontend (default)
location / {
    proxy_pass http://web;
}
```
SSL with Let's Encrypt¶
```bash
# Install certbot
apt install certbot python3-certbot-nginx

# Generate certificate
certbot --nginx -d your-domain.com

# Auto-renewal (crontab)
0 0 * * * certbot renew --quiet
```
4. Environment Variables¶
Backend (Orchestrator)¶
These variables are required for the FastAPI orchestrator, whether deployed on Railway or self-hosted.
| Variable | Required | Description |
|---|---|---|
| `SUPABASE_URL` | Yes | Supabase project URL |
| `SUPABASE_KEY` | Yes | Supabase anonymous (anon) key |
| `SUPABASE_SERVICE_ROLE_KEY` | Yes | Supabase service role key (bypasses RLS) |
| `JWT_SECRET` | Yes | Secret for JWT signing (min 32 chars) |
| `SUPABASE_JWT_SECRET` | Yes | Supabase JWT secret (for token verification) |
| `ANTHROPIC_API_KEY` | Yes | Anthropic API key for AI model calls |
| `ENCRYPTION_KEY` | Yes | AES-256 encryption key (hex-encoded, 64 chars) |
| `STRIPE_SECRET_KEY` | Yes | Stripe secret key |
| `STRIPE_WEBHOOK_SECRET` | Yes | Stripe webhook signing secret |
| `STRIPE_PRO_PRICE_ID` | Yes | Stripe price ID for Pro monthly plan |
| `STRIPE_PRO_ANNUAL_PRICE_ID` | Yes | Stripe price ID for Pro annual plan |
| `STRIPE_ENTERPRISE_PRICE_ID` | Yes | Stripe price ID for Enterprise monthly plan |
| `STRIPE_ENTERPRISE_ANNUAL_PRICE_ID` | Yes | Stripe price ID for Enterprise annual plan |
| `REDIS_URL` | No | Redis/Upstash connection URL (`rediss://` for TLS); falls back to in-memory if absent |
| `VOYAGE_API_KEY` | No | Voyage AI key for RAG embeddings |
| `ALLOWED_ORIGINS` | No | JSON array of allowed CORS origins |
| `LOG_LEVEL` | No | Logging level (default: `INFO`) |
| `UVICORN_WORKERS` | No | Number of Uvicorn workers (default: 4) |
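The table specifies `ENCRYPTION_KEY` as 64 hex characters (32 bytes, i.e. an AES-256 key). A key of that shape can be generated with OpenSSL:

```bash
# Generate a 32-byte key, hex-encoded to 64 characters.
ENCRYPTION_KEY="$(openssl rand -hex 32)"
echo "${#ENCRYPTION_KEY}"   # length: 64
```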
Frontend (Web App)¶
These are set in Vercel or in the web app's environment for self-hosted deployments.
| Variable | Required | Description |
|---|---|---|
| `NEXT_PUBLIC_SUPABASE_URL` | Yes | Supabase project URL (client-side) |
| `NEXT_PUBLIC_SUPABASE_ANON_KEY` | Yes | Supabase anonymous key (client-side) |
| `NEXT_PUBLIC_ORCHESTRATOR_URL` | Yes | Backend API URL |
Self-Hosted Only¶
| Variable | Required | Description |
|---|---|---|
| `POSTGRES_PASSWORD` | If using `supabase-db` | PostgreSQL password for self-hosted DB |
| `GRAFANA_ADMIN_USER` | No | Grafana admin username (default: `admin`) |
| `GRAFANA_ADMIN_PASSWORD` | No | Grafana admin password (default: `changeme`) |
Secrets management
Never commit secrets to version control. Use environment-specific secrets in Railway/Vercel dashboards, or a dedicated secrets manager (HashiCorp Vault, AWS Secrets Manager) for self-hosted deployments.
Railway environment variables
When setting ALLOWED_ORIGINS in Railway, use a single-line JSON array. Railway inserts newlines if the value contains line breaks, which breaks JSON parsing.
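For example, a single-line value of the required shape (the origins themselves are placeholders), with a quick sanity check that it parses before pasting it into the dashboard:

```bash
# Single-line JSON array -- no line breaks anywhere in the value.
ALLOWED_ORIGINS='["https://app.example.com","https://www.example.com"]'

# Verify it parses as a JSON list:
echo "$ALLOWED_ORIGINS" \
  | python3 -c 'import json,sys; assert isinstance(json.load(sys.stdin), list)' \
  && echo "valid JSON"
```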
5. Monitoring¶
Health Checks¶
All services expose health endpoints:
# Orchestrator (Railway or self-hosted)
curl https://your-backend-url/health
# Web app
curl https://your-frontend-url/
Prometheus Metrics¶
The orchestrator exposes a /metrics endpoint with the following metrics:
| Metric | Type | Description |
|---|---|---|
| `gremia_http_requests_total` | Counter | Total HTTP requests by method, path, status |
| `gremia_http_request_duration_seconds` | Histogram | Request latency distribution |
| `gremia_ws_connections_active` | Gauge | Currently active WebSocket connections |
| `gremia_ws_connections_total` | Counter | Total WebSocket connections over time |
| `gremia_executions_total` | Counter | Total task executions by status |
Managed deployment: Configure an external Prometheus instance to scrape the Railway backend URL's /metrics endpoint. Ensure the endpoint is protected (e.g., via a bearer token or IP allowlist on Railway).
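A sketch of such a scrape job in `prometheus.yml`; the job name and the bearer-token protection are assumptions to adapt to however you secure the endpoint:

```yaml
scrape_configs:
  - job_name: gremia-orchestrator
    scheme: https
    metrics_path: /metrics
    authorization:
      credentials: <your-bearer-token>
    static_configs:
      - targets: ["gremia-labs-production.up.railway.app"]
```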
Self-hosted deployment: Prometheus and Grafana are included in the Docker Compose stack. Prometheus scrapes the orchestrator on the internal gremia network. Grafana dashboards are provisioned automatically from infra/grafana/dashboards/.
Grafana¶
Access Grafana at port 3001 (internal in Docker Compose, proxied via Nginx in production).
- Default credentials: `admin` / `changeme` (change immediately)
- Pre-provisioned data source: Prometheus (`http://prometheus:9090`)
- Dashboard provisioning: `infra/grafana/provisioning/`
Logs¶
Railway: View logs in the Railway dashboard or with the Railway CLI (`railway logs`).
Self-hosted:
```bash
# Follow orchestrator logs
docker compose -f docker-compose.prod.yml logs -f orchestrator

# Last 100 lines of a specific service
docker compose -f docker-compose.prod.yml logs --tail=100 orchestrator
```
6. Scaling¶
Managed (Railway)¶
Railway supports vertical scaling (adjust CPU/RAM per service) and horizontal scaling (multiple instances). Configure in the Railway dashboard or via the Railway CLI.
WebSocket affinity
If running multiple orchestrator instances, WebSocket connections require sticky sessions. Configure your load balancer or Railway service to use source IP affinity for the /api/v1/tunnel/ws path.
Self-Hosted¶
Horizontal scaling¶
The orchestrator is stateless (state lives in Supabase + Redis), so multiple instances can run behind Nginx, e.g. with `--scale orchestrator=N` (see Docker Compose Configuration above). The Nginx upstream uses `ip_hash` for WebSocket session affinity.
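A sketch of what that upstream might look like, consistent with the orchestrator service name and internal port from the compose stack (the block itself is an assumption, not a quote from `infra/nginx/nginx.conf`):

```nginx
upstream orchestrator {
    ip_hash;                   # pin each client IP to the same replica
    server orchestrator:8000;  # Docker DNS name; nginx uses the A records
                               # resolved at startup, one per replica
}
```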
Resource limits¶
```yaml
services:
  orchestrator:
    deploy:
      resources:
        limits:
          cpus: "2.0"
          memory: 2G
        reservations:
          cpus: "0.5"
          memory: 512M
```
Redis¶
The default Redis configuration uses maxmemory 256mb with allkeys-lru eviction. Increase for high-traffic deployments:
```yaml
# In docker-compose.prod.yml, update the redis command:
command: redis-server --appendonly yes --maxmemory 512mb --maxmemory-policy allkeys-lru
```
7. Backup and Recovery¶
Database (Supabase)¶
Managed: Supabase provides automated daily backups on Pro plans and above. Point-in-time recovery is available on Team and Enterprise plans.
Manual backup: use `pg_dump` against your database connection string.
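A hypothetical `pg_dump` invocation scoped to the `gremia` schema (the password and project ref are placeholders; the host format follows Supabase's direct-connection convention):

```bash
pg_dump "postgresql://postgres:<password>@db.<project-ref>.supabase.co:5432/postgres" \
  --schema=gremia --format=custom --file="gremia-$(date +%Y%m%d).dump"
```

The custom format allows selective restores with `pg_restore`.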
Redis¶
Redis data is ephemeral (cache, rate-limit counters). No backup is required. On Upstash, persistence is handled automatically.
For self-hosted Redis, the appendonly flag is enabled. Data persists in the redis-data Docker volume.
Encryption Keys¶
Store the ENCRYPTION_KEY in a secure vault. If this key is lost, encrypted credential data cannot be recovered. Use the key rotation API to rotate keys periodically.
Key loss
There is no recovery mechanism for data encrypted with a lost ENCRYPTION_KEY. Store it in at least two independent secure locations.
Disaster Recovery Checklist¶
- Database: Restore from Supabase backup or manual `pg_dump`
- Environment variables: Re-apply from secrets manager
- SSL certificates: Re-issue via Let's Encrypt or restore from backup
- Docker volumes (self-hosted): `prometheus-data`, `grafana-data`, `supabase-data` should be on persistent storage with regular snapshots
Updating¶
Managed¶
- Backend: Push to `main` (or merge a PR). Railway auto-deploys.
- Frontend: Push to `main`. Vercel auto-deploys.
- Database: Apply new migrations via the Supabase SQL Editor.
Self-Hosted¶
```bash
cd gremia-labs
git pull origin main
docker compose -f docker-compose.prod.yml build orchestrator web
docker compose -f docker-compose.prod.yml up -d orchestrator web
```
Zero-downtime updates are supported when running multiple orchestrator replicas behind Nginx.