2026-04-27

This post shows how I set up a personal AI assistant that runs 24/7 on an AWS VPS, is reachable from anywhere via Telegram, and tries the cheapest available model first before falling back to paid ones. The core of it is OpenClaw: an open-source daemon (MIT, TypeScript, Node 24) that exposes a single assistant across many channels (Telegram, Slack, Discord, Signal, iMessage, Matrix, WeChat) and routes every request through a configurable provider/model stack with built-in failover.
The interesting bit: OpenClaw already implements a native fallback cascade. I configure free models first and it auto-escalates when a tier is exhausted. Claude stays gated at the end, triggered only by an explicit command.
                     [ your phone ]
                            |
                            |  Telegram DM
                            v
                    api.telegram.org
                            |
                            |  long-poll (outbound only)
                            v
+--------------------------------------------------------+
|  AWS VPS (Ubuntu 24.04, Lightsail $20/mo)              |
|                                                        |
|  systemd user unit: openclaw.service                   |
|    openclaw gateway (Node 24)   port 18789 (lo)        |
|      channel: telegram (grammY long-poll)              |
|      agent runtime (sandbox: Docker)                   |
|      model cascade -- HTTPS outbound ------+           |
|                                            |           |
|  ~/.openclaw/                              |           |
|    openclaw.json    config + chain         |           |
|    auth-profiles    OAuth tokens, API      |           |
|    workspace/       per-agent state        |           |
+--------------------------------------------|-----------+
                                             |
        +----------------+--------------+----+------------+
        v                v              v                 v
  openrouter.ai   api.minimax.io ChatGPT (Codex)  api.anthropic.com
  (free models)    (Coding Plan)   (your sub)          (gated)
Why this shape:
- The Gateway binds to 127.0.0.1:18789. Security group exposes SSH only.
- All state lives in one directory (~/.openclaw/). Easy to back up to S3.

The OpenClaw docs at docs.openclaw.ai/concepts/model-failover spell it out:
OpenClaw handles failures in two stages: (1) auth profile rotation within the current provider, then (2) model fallback to the next model in agents.defaults.model.fallbacks.
What this gives me for free:
| Requirement | How OpenClaw handles it |
|---|---|
| Try free model first | agents.defaults.model.primary = "openrouter/qwen/qwen3-coder:free" |
| Auto-escalate on rate limit / 429 | fallbacks[] advances on rate-limit, timeout, upstream errors |
| Don’t loop on a broken provider | Exponential cooldown (1m, 5m, 25m, 1h) on the failed profile |
| Stop at Codex unless permission given | Codex last in fallbacks; don’t include Claude in the auto-chain |
| Reach Claude only with permission | Use /model anthropic/claude-opus-4-7 in chat. A user /model is strict: it never silently falls back to another tier. |
The “user override is strict” property is the crucial bit: when I type /model anthropic/..., OpenClaw locks the session to Claude until I reset it, and a Claude failure bubbles up as an error rather than silently routing to a cheaper tier. That is exactly the “Claude requires permission” behavior I wanted.
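To make the cascade and cooldown rows above concrete, here is a toy model of the selection logic in bash. It is a sketch of the behavior described in the docs, not OpenClaw's implementation: the function names, the `model=ready_at` entry format, and the assumption that the cooldown multiplies by 5 per failure and caps at one hour are mine, inferred from the 1m, 5m, 25m, 1h schedule.

```bash
#!/usr/bin/env bash
# Toy model of the fallback cascade. Hypothetical helper names throughout.

# cooldown_seconds FAILURES
# Assumed schedule: 60s after the first failure, x5 per further failure, capped at 1h.
cooldown_seconds() {
  local failures=$1 secs=60 i
  for ((i = 1; i < failures; i++)); do
    secs=$((secs * 5))
  done
  if ((secs > 3600)); then secs=3600; fi
  echo "$secs"
}

# pick_model NOW MODEL=READY_AT...
# Prints the first model in chain order whose cooldown has expired.
# Returns 1 when every model is still cooling down.
pick_model() {
  local now=$1 entry model ready_at
  shift
  for entry in "$@"; do
    model=${entry%=*}
    ready_at=${entry##*=}
    if ((ready_at <= now)); then
      echo "$model"
      return 0
    fi
  done
  return 1
}

# Example: the primary has failed twice, so it is skipped for 300 seconds.
now=1000
chain=(
  "openrouter/qwen/qwen3-coder:free=$((now + $(cooldown_seconds 2)))"
  "openrouter/z-ai/glm-4.5-air:free=0"
  "minimax/MiniMax-M2.7=0"
)
pick_model "$now" "${chain[@]}"
```

Claude never appears in `chain`, which is the whole point: the only way to reach it is an explicit /model, and that path has no fallback list to walk.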
The Gateway is a single Node process plus an optional Docker sandbox. RAM is the binding constraint.
| Criterion | EC2 t3.medium | EC2 t4g.medium (ARM) | Lightsail $20 |
|---|---|---|---|
| vCPU | 2 burst | 2 burst | 2 |
| RAM | 4 GB | 4 GB | 4 GB |
| Storage | gp3, BYO | gp3, BYO | 80 GB SSD inc. |
| Network | EIP $3.60 unattached | same | 3 TB egress inc. |
| Sticker | ~$30/mo | ~$24/mo | $20/mo flat |
| 1-yr Savings Plan | ~$19/mo | ~$15/mo | n/a |
| Ops complexity | VPC, SG, EBS, EIP | same | one click |
Recommendation: Lightsail $20/mo. Predictable flat cost, includes egress, no EIP fees. Same Ubuntu 24.04 underneath, so every command in this guide works identically. Smaller sizes (Lightsail $10 / t3.small with 2 GB) are too tight: Node 24 plus the Docker sandbox peaks above 2 GB during heavy turns.
brew install awscli
aws configure
brew install --cask tailscale # optional but recommended
The IAM identity behind aws configure needs AmazonLightsailFullAccess.
REGION="sa-east-1"
aws lightsail create-instances \
--instance-names openclaw-1 \
--availability-zone "${REGION}a" \
--blueprint-id ubuntu_24_04 \
--bundle-id medium_3_0 \
--tags key=Name,value=openclaw
aws lightsail get-instance --instance-name openclaw-1 \
--query 'instance.state.name' --output text
# Lock SSH to the current IP
MY_IP=$(curl -s https://checkip.amazonaws.com)
aws lightsail put-instance-public-ports \
--instance-name openclaw-1 \
--port-infos "fromPort=22,toPort=22,protocol=TCP,cidrs=${MY_IP}/32"
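# Default SSH key for the region. Despite the field name, privateKeyBase64
# comes back as PEM text, so it is written out without base64 decoding.
aws lightsail download-default-key-pair --query 'privateKeyBase64' --output text \
  > ~/.ssh/lightsail-${REGION}.pem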
chmod 400 ~/.ssh/lightsail-${REGION}.pem
PUBLIC_IP=$(aws lightsail get-instance --instance-name openclaw-1 \
--query 'instance.publicIpAddress' --output text)
ssh -i ~/.ssh/lightsail-${REGION}.pem ubuntu@${PUBLIC_IP}
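The checkip call can return an empty string or an error page if the network hiccups, and whatever comes back goes straight into the firewall rule. A small guard, as a sketch; the function name `ssh_port_info` is hypothetical and only IPv4 is handled:

```bash
#!/usr/bin/env bash
# ssh_port_info IP
# Validates a dotted-quad IPv4 address and prints the --port-infos value
# that restricts port 22 to that single address.
ssh_port_info() {
  local ip=$1 octet
  local re='^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$'
  [[ $ip =~ $re ]] || { echo "not an IPv4 address: '$ip'" >&2; return 1; }
  for octet in "${BASH_REMATCH[@]:1}"; do
    # 10# forces base 10 so an octet like 08 is not read as octal.
    ((10#$octet <= 255)) || { echo "octet out of range: $octet" >&2; return 1; }
  done
  echo "fromPort=22,toPort=22,protocol=TCP,cidrs=${ip}/32"
}

# Example with a documentation address (203.0.113.0/24 is reserved for examples).
ssh_port_info 203.0.113.7
```

In the real command this would replace the inline string: `--port-infos "$(ssh_port_info "$MY_IP")"`, so a bad lookup aborts instead of writing a broken rule.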
Inside the box, hostname and timezone:
sudo hostnamectl set-hostname openclaw
sudo timedatectl set-timezone America/Sao_Paulo
sudo apt update && sudo apt upgrade -y
OpenClaw ships an install script that pulls Node 24 and the binary, then installs the systemd user daemon for me.
curl -fsSL https://deb.nodesource.com/setup_24.x | sudo -E bash -
sudo apt install -y nodejs
node --version # expect v24.x
npm --version
curl -fsSL https://openclaw.ai/install.sh | bash
which openclaw
openclaw --version
# Choose "skip" or just ANTHROPIC_API_KEY for now.
# We overwrite the model config in Step 3.
openclaw onboard --install-daemon
openclaw gateway status
# Expect: "running on 127.0.0.1:18789"
ls ~/.openclaw/
To peek at the Control UI from my laptop:
ssh -i ~/.ssh/lightsail-sa-east-1.pem -L 18789:127.0.0.1:18789 ubuntu@${PUBLIC_IP}
# Open http://127.0.0.1:18789 locally.
ssh -i ~/.ssh/... ubuntu@${PUBLIC_IP} 'grep TOKEN ~/.openclaw/.env'
The four key tutorials are in API key tutorials below. By the end I need:
- OPENROUTER_API_KEY (sk-or-v1-…)
- MINIMAX_API_KEY (or OAuth via openclaw onboard --auth-choice minimax-global-oauth)
- Codex OAuth via openclaw onboard --auth-choice openai-codex-oauth
- ANTHROPIC_API_KEY (sk-ant-…)

Write the keys to .env:
cat > ~/.openclaw/.env <<'EOF'
# Tier 1 -- OpenRouter (free models)
OPENROUTER_API_KEY=sk-or-v1-REPLACE_ME
# Tier 2 -- MiniMax Coding Plan ($20/mo). Comment out if you used OAuth.
MINIMAX_API_KEY=REPLACE_ME
# Tier 4 -- Anthropic API key (gated; only invoked via /model)
ANTHROPIC_API_KEY=sk-ant-REPLACE_ME
# Telegram (filled in Step 4)
# TELEGRAM_BOT_TOKEN=
EOF
chmod 600 ~/.openclaw/.env
Tier 3 (Codex) goes through OAuth, not env vars. The token lands in ~/.openclaw/agents/<id>/agent/auth-profiles.json.
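It is easy to restart the gateway with a placeholder still in place. A quick check before moving on, as a sketch; `env_placeholders` is a hypothetical helper, and it only looks for the REPLACE_ME marker used in the template above:

```bash
#!/usr/bin/env bash
# env_placeholders FILE
# Prints every active (non-comment) KEY=... line that still contains REPLACE_ME.
# Returns 0 when none remain, 1 otherwise.
env_placeholders() {
  local file=$1
  if grep -nE '^[A-Za-z_][A-Za-z0-9_]*=.*REPLACE_ME' "$file"; then
    return 1
  fi
  return 0
}

# Demo on a throwaway file: one real placeholder, one commented-out line.
demo=$(mktemp)
printf '%s\n' \
  'OPENROUTER_API_KEY=sk-or-v1-REPLACE_ME' \
  '# MINIMAX_API_KEY=REPLACE_ME' > "$demo"
env_placeholders "$demo" || echo "fill in the keys listed above"
rm -f "$demo"
```

On the VPS the call would be `env_placeholders ~/.openclaw/.env`. Commented-out lines are ignored on purpose, since the template tells you to comment out MINIMAX_API_KEY when using OAuth.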
Back up openclaw.json before editing it, then set the chain:
cp ~/.openclaw/openclaw.json ~/.openclaw/openclaw.json.bak
openclaw config set agents.defaults.model.primary 'openrouter/qwen/qwen3-coder:free'
openclaw config set --strict-json --merge agents.defaults.model.fallbacks '[
"openrouter/z-ai/glm-4.5-air:free",
"openrouter/openai/gpt-oss-120b:free",
"minimax/MiniMax-M2.7",
"openai-codex/gpt-5.5"
]'
# Allow Claude to be picked manually with /model. Not in the auto-chain.
openclaw config set --strict-json --merge agents.defaults.models '{
"openrouter/qwen/qwen3-coder:free": {"alias": "free-coder"},
"openrouter/z-ai/glm-4.5-air:free": {"alias": "free-glm"},
"openrouter/openai/gpt-oss-120b:free": {"alias": "free-oss"},
"minimax/MiniMax-M2.7": {"alias": "minimax"},
"openai-codex/gpt-5.5": {"alias": "codex"},
"anthropic/claude-opus-4-7": {"alias": "claude"}
}'
openclaw config get agents.defaults.model
openclaw config get agents.defaults.models
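Hand-written JSON inside single quotes is where I usually slip (a trailing comma is enough to fail a strict parser). One option is to keep the chain as a bash array and generate the JSON. A minimal sketch; `json_array` is a hypothetical helper and it assumes the items contain no quotes, backslashes, or control characters, which holds for the model refs used here:

```bash
#!/usr/bin/env bash
# json_array ITEM...
# Prints the arguments as a compact JSON array of strings.
# No escaping is done, so items must not contain quotes or backslashes.
json_array() {
  local out="" item
  for item in "$@"; do
    out+="${out:+,}\"${item}\""
  done
  echo "[${out}]"
}

fallbacks=(
  "openrouter/z-ai/glm-4.5-air:free"
  "openrouter/openai/gpt-oss-120b:free"
  "minimax/MiniMax-M2.7"
  "openai-codex/gpt-5.5"
)
json_array "${fallbacks[@]}"
```

The output can then be passed as the last argument of the config set command above, for example `"$(json_array "${fallbacks[@]}")"`. Reordering tiers becomes a matter of moving one array line.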
| Event | Result |
|---|---|
| Healthy free model | Everything served by qwen3-coder:free. Free. |
| Free model 429s | Rotates to glm-4.5-air:free, then gpt-oss-120b:free. |
| All free models exhausted | Steps to MiniMax (MiniMax-M2.7). $20/mo flat. |
| MiniMax window exhausted | Steps to Codex via your ChatGPT subscription. |
| Codex weekly cap hit | Reply errors out (Claude isn’t in the chain). I decide whether to escalate. |
| I type /model claude | Session locks to Anthropic Opus. Bills my Anthropic key. |
| I type /new or /reset | Returns to the cascade primary. Claude lock cleared. |
That’s exactly the behavior I wanted: free-to-Codex automatic; Claude only via explicit action.
Instead of typing /model every time, define a second agent:
openclaw config set --strict-json --merge agents.list '[
{
"id": "premium",
"model": { "primary": "anthropic/claude-opus-4-7", "fallbacks": [] },
"system": "You are the premium tier. Use sparingly."
}
]'
fallbacks: [] makes it strict: failure on Claude does not silently downgrade.
openclaw gateway restart
openclaw doctor
In the Telegram app on the phone:
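1. Open a chat with BotFather (@BotFather).
2. Send /newbot.
3. Choose a display name (e.g. My OpenClaw).
4. Choose a username ending in bot (e.g. myopenclaw_bot).
5. Copy the token BotFather returns. It looks like 123456789:ABCDef-GhIjKlMnOpQrStUv.
6. Send /setprivacy, pick the bot, Disable (lets the bot see group messages; only matters if you’ll add it to groups).
7. Send /setjoingroups, Disable (you don’t want randoms adding it).

# On the VPS, tail the log and DM your bot anything from your phone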
openclaw logs --follow
# Look for a line with "from.id": 123456789
That number is <your-telegram-id>.
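Spotting the ID by eye in a scrolling log is fiddly. A filter like the following pulls it out. This is a sketch: `extract_from_ids` is a hypothetical helper, and it assumes the log contains the literal `"from.id": <digits>` fragment shown above. The rest of the sample line is invented for the demo.

```bash
#!/usr/bin/env bash
# extract_from_ids
# Reads log lines on stdin and prints each distinct numeric from.id once.
extract_from_ids() {
  grep -oE '"from\.id": *[0-9]+' | grep -oE '[0-9]+$' | sort -u
}

# Demo with a made-up log line; only the "from.id" fragment mirrors the real log.
printf '%s\n' \
  '{"channel": "telegram", "from.id": 123456789, "text": "hello"}' \
  'gateway heartbeat ok' | extract_from_ids
```

Against saved output it would be used as `extract_from_ids < some-log-file`. If more than one ID shows up, someone else has already found the bot, which is a good reason to set the allowlist next.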
echo "TELEGRAM_BOT_TOKEN=123456789:ABC..." >> ~/.openclaw/.env
openclaw config set --strict-json --merge channels.telegram '{
"enabled": true,
"dmPolicy": "allowlist",
"allowFrom": ["telegram:<your-telegram-id>"]
}'
# Set yourself as command owner (lets you run /model, /reset, etc.)
openclaw config set --strict-json --merge commands.ownerAllowFrom '["telegram:<your-telegram-id>"]'
openclaw gateway restart
openclaw doctor
dmPolicy: "allowlist" blocks any other Telegram user the moment they DM the bot, even if the username leaks.
On Telegram, DM hello to the bot. Within a few seconds you get a reply served by qwen3-coder:free.
Manually escalating:
/model claude
write a Python decorator that retries with exponential backoff
/new
/model claude switches to Claude for that session; /new resets back to the cascade.
The threat model: a long-running daemon with credentials for ChatGPT, Claude, MiniMax, OpenRouter, and a Telegram bot that can run shell inside its sandbox. Both inbound and outbound surfaces need to stay tight.
The Lightsail/EC2 security group should expose only port 22, and only to your IP. The Gateway should be listening on loopback only, as the earlier status check showed (127.0.0.1:18789). Since gateway.bind defaults to "lan", verify it with openclaw config get gateway.bind and, if you don’t need LAN access, set it to "lo".
sudo ss -tlnp | grep -v 127.0.0.1
# Should only see :22 (sshd)
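The grep above still lets the header row and other loopback addresses such as 127.0.0.53 (systemd-resolved) through, so the output needs reading with care. A stricter filter, as a sketch; `public_listeners` is a hypothetical name and the sample rows are illustrative, not captured from a real host:

```bash
#!/usr/bin/env bash
# public_listeners
# Reads `ss -tln` output on stdin and prints the local address of every
# listening socket that is not bound to 127.0.0.0/8 or [::1].
public_listeners() {
  awk '$1 == "LISTEN" && $4 !~ /^(127\.|\[::1\])/ { print $4 }'
}

# Demo on sample rows in the ss column layout.
public_listeners <<'EOF'
State   Recv-Q  Send-Q  Local Address:Port  Peer Address:Port
LISTEN  0       4096    127.0.0.53%lo:53    0.0.0.0:*
LISTEN  0       128     0.0.0.0:22          0.0.0.0:*
LISTEN  0       511     127.0.0.1:18789     0.0.0.0:*
LISTEN  0       128     [::]:22             [::]:*
EOF
```

On the VPS the input would come from `sudo ss -tln | public_listeners`. Anything other than port 22 in the result deserves a look.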
Done in Step 4. Verify:
openclaw config get channels.telegram.dmPolicy # expect "allowlist"
openclaw config get channels.telegram.allowFrom # expect ["telegram:..."]
If you’ll let the agent run shell (which is half the point), turn on the Docker sandbox so it can’t touch the host filesystem outside ~/.openclaw/workspace/.
curl -fsSL https://get.docker.com | sudo bash
sudo usermod -aG docker ubuntu
newgrp docker
openclaw config set agents.defaults.sandbox.mode 'non-main'
openclaw config set agents.defaults.sandbox.backend 'docker'
openclaw gateway restart
openclaw doctor
The main session (your direct DMs) still has host access by default, useful for actually doing work on the box. Group/multi-user sessions are sandboxed.
chmod 600 ~/.openclaw/.env
chmod 700 ~/.openclaw/agents
find ~/.openclaw -name 'auth-profiles.json' -exec chmod 600 {} \;
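To confirm nothing else under the state directory is readable by group or others, a one-line audit helps. A sketch, assuming GNU find (the default on Ubuntu); `loose_files` is a hypothetical name:

```bash
#!/usr/bin/env bash
# loose_files DIR
# Lists regular files under DIR that grant any permission to group or others.
loose_files() {
  find "$1" -type f -perm /077
}

# Demo in a throwaway directory: one tight file, one world-readable file.
demo=$(mktemp -d)
touch "$demo/.env" "$demo/auth-profiles.json"
chmod 600 "$demo/.env"
chmod 644 "$demo/auth-profiles.json"
loose_files "$demo"
rm -rf "$demo"
```

Run as `loose_files ~/.openclaw` on the VPS. Workspace files will likely show up too, so the interesting hits are anything holding tokens or keys.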
If you don’t want SSH on the public internet at all:
curl -fsSL https://tailscale.com/install.sh | sudo bash
sudo tailscale up --ssh
# Then close port 22 in the Lightsail/EC2 security group entirely.
# SSH only via Tailscale MagicDNS: ssh ubuntu@openclaw
- Gateway not exposed publicly (gateway.bind = "lo" or "lan" only)
- dmPolicy: "allowlist" with your numeric ID
- commands.ownerAllowFrom set to your numeric ID
- ~/.openclaw/.env is 0600
- openclaw doctor is green

openclaw onboard --install-daemon already created the user unit. Just verify and tighten.
systemctl --user status openclaw
# If you don't see one:
openclaw daemon install --user
# Lingering so it survives logout
sudo loginctl enable-linger ubuntu
# Auto-start on boot
systemctl --user enable openclaw
# Logs
journalctl --user -u openclaw -f
# Restart / stop / start
systemctl --user restart openclaw
systemctl --user stop openclaw
systemctl --user start openclaw
The daemon runs as the ubuntu user (no root), reads ~/.openclaw/.env, and writes state under ~/.openclaw/. Restart-on-crash is built in.
Everything that matters lives in ~/.openclaw/. Nightly snapshot to S3:
sudo apt install -y awscli
aws configure # least-privilege IAM key with PutObject on one bucket
cat > ~/backup-openclaw.sh <<'EOF'
#!/bin/bash
set -euo pipefail
BUCKET="s3://your-bucket/openclaw-backups"
TS=$(date -u +%Y%m%d-%H%M%S)
cd ~
tar --exclude='.openclaw/workspace/*/node_modules' \
--exclude='.openclaw/agents/*/sessions/cache' \
-czf /tmp/openclaw-${TS}.tar.gz .openclaw
aws s3 cp /tmp/openclaw-${TS}.tar.gz "${BUCKET}/openclaw-${TS}.tar.gz" \
--storage-class STANDARD_IA
rm /tmp/openclaw-${TS}.tar.gz
# Keep last 14 days
aws s3 ls "${BUCKET}/" | sort | head -n -14 | awk '{print $4}' | while read f; do
aws s3 rm "${BUCKET}/${f}"
done
EOF
chmod +x ~/backup-openclaw.sh
( crontab -l 2>/dev/null; echo "0 3 * * * /home/ubuntu/backup-openclaw.sh" ) | crontab -
A Lightsail full-disk snapshot ($0.05/GB/mo) is a coarser, simpler alternative.
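The retention step in the backup script is the part most worth testing before trusting it with deletes. Pulled out as a function that only prints what it would remove, it can be checked against a fake listing. A sketch: `keys_to_prune` is a hypothetical name, it assumes GNU head, and it relies on the timestamped file names sorting chronologically.

```bash
#!/usr/bin/env bash
# keys_to_prune KEEP
# Reads `aws s3 ls` lines (date, time, size, key) on stdin and prints the keys
# that fall outside the newest KEEP entries. Prefix rows ("PRE dir/") are skipped.
keys_to_prune() {
  local keep=$1
  awk 'NF >= 4 { print $4 }' | sort | head -n "-${keep}"
}

# Demo: three nightly archives, keep the newest two.
keys_to_prune 2 <<'EOF'
2026-04-03 03:00:07    1048576 openclaw-20260403-030000.tar.gz
2026-04-01 03:00:05    1048576 openclaw-20260401-030000.tar.gz
2026-04-02 03:00:06    1048576 openclaw-20260402-030000.tar.gz
EOF
```

Note that this keeps the last 14 archives rather than 14 calendar days. With one run per night the two are the same, but a manual extra run shortens the window by a day.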
OpenRouter is a single API that proxies hundreds of LLMs. The free tier (:free suffix) gets you ~50 requests/day per account; bumps to 1,000/day if you load $10 in lifetime credits (those credits are not consumed by free models, they only act as a quota unlock).
1. Go to openrouter.ai. Sign in with Google, GitHub, or magic link. No phone, no KYC.
2. At openrouter.ai/keys, Create Key, optionally set a credit cap.
3. Copy the sk-or-v1-... token (shown once).
4. Optional: at openrouter.ai/credits load $10. This raises the free-model cap from 50 to 1,000 requests/day across all :free models combined.

Quick test:
export OPENROUTER_API_KEY=sk-or-v1-...
curl https://openrouter.ai/api/v1/chat/completions \
-H "Authorization: Bearer $OPENROUTER_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen/qwen3-coder:free",
"messages": [{"role":"user","content":"Say hi"}]
}'
Limits as of April 2026:
| Limit | Value |
|---|---|
| Free models, RPM | 20/min (account-wide) |
| Free models, RPD (no credits) | 50/day |
| Free models, RPD (>= $10 credits) | 1,000/day |
| Pool | All :free models share one daily counter |
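When probing a key by hand it helps to separate "key is wrong" from "quota is gone", since only the latter is something the cascade should absorb. A sketch of that triage; `classify_status` is a hypothetical helper and the mapping is illustrative, not OpenClaw's failover rule set:

```bash
#!/usr/bin/env bash
# classify_status HTTP_CODE
# Maps an HTTP status code to a coarse label for manual triage.
classify_status() {
  case $1 in
    2??)     echo ok ;;
    401|403) echo auth ;;
    429)     echo rate_limited ;;
    5??)     echo upstream_error ;;
    *)       echo other ;;
  esac
}

# Example: what a throttled free model would report.
classify_status 429

# With the quick test above, curl can supply the code directly:
#   code=$(curl -s -o /dev/null -w '%{http_code}' ... )
#   classify_status "$code"
```

An `auth` result means the key or account needs fixing; falling through to the next tier would only hide the problem.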
MiniMax’s Token Plan is a real monthly subscription, not credit top-up. The Plus tier gives 4,500 requests per rolling 5-hour window on MiniMax-M2.7.
- Go to platform.minimax.io. Sign up with email plus verification.
- Put MINIMAX_API_KEY=... in ~/.openclaw/.env and use minimax/MiniMax-M2.7. Desktop OAuth path: openclaw onboard --auth-choice minimax-global-oauth, using minimax-portal/MiniMax-M2.7.

| Plan | Price | M2.7 Quota |
|---|---|---|
| Starter | $10/mo | 1,500 req/5h |
| Plus | $20/mo | 4,500 req/5h |
| Max | $50/mo | 15,000 req/5h |
When the 5-hour window exhausts, the Token Plan API key returns 429 and the cascade auto-advances to Codex.
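For a feel of what those windows mean in practice, the quotas can be turned into a sustained rate. A back-of-envelope sketch using the plan numbers from the table; `per_minute` is a hypothetical helper and uses integer division:

```bash
#!/usr/bin/env bash
# per_minute QUOTA WINDOW_HOURS
# Sustained requests per minute if the quota is spread evenly over the window.
per_minute() {
  echo $(( $1 / ($2 * 60) ))
}

for plan in Starter:1500 Plus:4500 Max:15000; do
  printf '%-8s %s req/min\n' "${plan%%:*}" "$(per_minute "${plan##*:}" 5)"
done
```

Plus works out to 15 requests per minute sustained, so a single-user assistant is unlikely to exhaust it unless an agent loop runs away.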
Codex CLI is OpenAI’s official Rust-based coding agent. With a ChatGPT Plus/Pro/Business subscription it authenticates by OAuth, no separate API key, no per-token billing. Limits follow your ChatGPT plan (5h windows, weekly caps).
The OAuth callback hits localhost:1455, which is awkward on a headless VPS. Three paths.

Path A: SSH tunnel from the laptop
ssh -L 1455:localhost:1455 -i ~/.ssh/lightsail-...pem ubuntu@${PUBLIC_IP}
openclaw onboard --auth-choice openai-codex-oauth
1. The command prints an https://auth.openai.com/... URL. Open it in the laptop browser.
2. After login, the browser redirects to localhost:1455. Because of -L, that resolves on the VPS, completing the flow.
3. The token lands in ~/.openclaw/agents/<id>/agent/auth-profiles.json.

Path B: device code
openclaw onboard --auth-choice openai-codex-oauth --device-code
Prints a code; visit the URL in any browser, paste, approve. Personal accounts work; ChatGPT Business/Team workspaces require an admin to enable device-code login first.
Path C: copy auth.json from a desktop login
# On the Mac
brew install --cask codex
codex login
# Copy credentials over
ssh -i ~/.ssh/... ubuntu@${PUBLIC_IP} 'mkdir -p ~/.codex'
scp -i ~/.ssh/... ~/.codex/auth.json ubuntu@${PUBLIC_IP}:~/.codex/auth.json
# On the VPS
openclaw onboard --auth-choice openai-codex-oauth --import ~/.codex/auth.json
Caveat: Codex refresh tokens are effectively single-use. If both machines use the credentials and one refreshes, the other becomes invalid. Path A is the most stable for a long-lived deploy.
Claude is the gated tier. Don’t put it in the auto-cascade, only summon it via /model.
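1. Sign in at console.anthropic.com.
2. Create an API key named openclaw. Copy the sk-ant-... (shown once).
3. Add it to .env. If the file already has the sk-ant-REPLACE_ME placeholder from the template, edit that line instead of appending, so the key is not defined twice:
echo 'ANTHROPIC_API_KEY=sk-ant-...' >> ~/.openclaw/.env
chmod 600 ~/.openclaw/.env
4. Restart: systemctl --user restart openclaw.

Test on Telegram: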
/model anthropic/claude-opus-4-7
explain the difference between Promise.all and Promise.allSettled
/new
/new returns the session to the cascade primary so subsequent messages don’t burn Anthropic credit accidentally.
These are the three free models worth chaining, ranked. All accept tools, all reachable via <id>:free.
| Rank | Model ref | Why | Context | Best at |
|---|---|---|---|---|
| 1 | qwen/qwen3-coder:free | Coding-specialized MoE (480B/35B active), best free coder available | 262K | Code generation, agentic tool use |
| 2 | z-ai/glm-4.5-air:free | Strong agent-centric MoE near frontier on SWE-bench | 131K | Reasoning, long-form tasks |
| 3 | openai/gpt-oss-120b:free | OpenAI open-weight, Apache-2.0, different family (decorrelates outages) | 131K | General reasoning, fallback diversity |
Why three free models in a row instead of just one? Free endpoints throttle constantly during US business hours. A single free model means one 429 burst lands you on MiniMax (paid quota). Three diverse free models absorb brief throttles for free.
| Item | Cost | Notes |
|---|---|---|
| Lightsail medium_3_0 | $20.00/mo | Flat. Includes 3 TB egress. |
| OpenRouter credits load | $10.00 | One-time. Lifts free cap from 50 to 1,000/day. |
| MiniMax Coding Plan Plus | $20.00/mo | Tier 2. |
| ChatGPT Plus | $20.00/mo | Tier 3. |
| Anthropic API | $0-50/mo | Only when you /model claude. Monthly cap recommended. |
| S3 backup (~1 GB) | $0.02/mo | Standard-IA. |
| Recurring subtotal | ~$60.02/mo | Excludes opportunistic Claude spend. |
| Plan | Cost/mo |
|---|---|
| This cascade (free first, Claude on demand) | ~$60 + Claude on demand |
| Claude only via API, ~5M tokens/mo | ~$75 (input/output mix) |
| ChatGPT Pro standalone | $200 |
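The recurring subtotal is simple enough to check by hand, but keeping it as a snippet makes it easy to redo when a plan changes. A sketch with the amounts from the first table, in cents to avoid floating point in bash; `monthly_total` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# monthly_total CENTS...
# Sums amounts given in cents and prints them as dollars per month.
monthly_total() {
  local total=0 c
  for c in "$@"; do
    total=$((total + c))
  done
  printf '$%d.%02d/mo\n' $((total / 100)) $((total % 100))
}

# Lightsail, MiniMax Plus, ChatGPT Plus, S3 backup.
monthly_total 2000 2000 2000 2
```

The one-time $10 OpenRouter load and any Claude usage are left out on purpose, matching the table.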
~90 minutes end to end:
| Command | What it does |
|---|---|
| hello | Routes through the cascade, default model. |
| /model claude | Locks the current session to Claude (strict, no auto-failover). |
| /model codex | Force the Codex tier (alias from §3.3). |
| /model list | Show the allowlisted models. |
| /new | Fresh session. Returns to cascade primary. |
| /reset | Same as /new; clears all session state. |
| /status | Current model, channel, session, tool state. |
| /think high | Boost reasoning effort for the next reply. |
| /usage tokens | Token count for the session. |
# Logs
journalctl --user -u openclaw -f
# Hot-edit without restart
openclaw config set agents.defaults.model.primary 'minimax/MiniMax-M2.7'
openclaw gateway reload
# Provider auth health
openclaw doctor
openclaw models auth list
# Which model handled the last message
openclaw sessions list
openclaw sessions history --session <id>
# Update OpenClaw
npm install -g openclaw@latest
systemctl --user restart openclaw
Pull a tier out of rotation:
# Disable temporarily
openclaw models auth disable openrouter:default
# Re-enable
openclaw models auth enable openrouter:default
# Force a specific tier
openclaw config set agents.defaults.model.primary 'minimax/MiniMax-M2.7'
openclaw gateway reload
openclaw gateway status
openclaw config get channels.telegram
openclaw doctor
openclaw logs --follow
# Common culprit: dmPolicy=allowlist with the wrong numeric ID
openclaw config get channels.telegram.allowFrom
Means I used /model X where X isn’t in agents.defaults.models. Add it:
openclaw config set --strict-json --merge agents.defaults.models \
'{"openai/gpt-4o": {"alias": "gpt4o"}}'
openclaw logs --follow | grep -i 'failover\|429\|cooldown'
openclaw config get agents.defaults.model.fallbacks
If fallbacks is empty, re-apply the JSON from §3.3.
openclaw onboard --auth-choice openai-codex-oauth --reauth
Sandbox Docker is competing with Node for RAM on a 4 GB host. Either bump Lightsail to large_3_0 (8 GB, $40/mo), or set agents.defaults.sandbox.mode = "off" (less safe, more RAM headroom).
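Before resizing, it is worth measuring how close the box actually gets to the limit. A sketch that reads MemAvailable from /proc/meminfo; `mem_available_mb` is a hypothetical helper, and the optional file argument exists only so it can be exercised on a sample file:

```bash
#!/usr/bin/env bash
# mem_available_mb [MEMINFO_FILE]
# Prints the kernel's MemAvailable estimate in whole megabytes.
# Defaults to /proc/meminfo; a different file can be passed for testing.
mem_available_mb() {
  awk '$1 == "MemAvailable:" { print int($2 / 1024) }' "${1:-/proc/meminfo}"
}

# Only meaningful on Linux, so guard the live read.
if [ -r /proc/meminfo ]; then
  echo "$(mem_available_mb) MB available"
fi
```

Sampling this during a heavy agent turn (for example from a second SSH session) shows whether the sandbox really is the thing pushing the host over.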