Alternative LLM Backends¶
Besides Kiro, there are many other backends. This page compares them and walks through setup for each provider.
Quick comparison (May 2026)¶
| Provider | Cost | Quality | Setup difficulty | Best for |
|---|---|---|---|---|
| Kiro AI | FREE | ⭐⭐⭐⭐⭐ | Easy | Default best |
| OpenCode Free | FREE | ⭐⭐⭐⭐ | Easiest (no auth) | Quick start |
| Vertex AI | FREE ($300 credit) | ⭐⭐⭐⭐⭐ | Medium | High volume + quality |
| OpenRouter | Pay-per-use | ⭐⭐⭐⭐ | Easy | Flexibility |
| z.ai (GLM) | Cheap ($0.6/1M) | ⭐⭐⭐⭐ | Easy | Cost-effective |
| Kimi | Cheap | ⭐⭐⭐⭐ | Easy | Asian language tasks |
| DeepSeek | Cheap ($0.14/1M) | ⭐⭐⭐⭐ | Easy | Reasoning + coding |
| Groq | FREE tier + paid | ⭐⭐⭐⭐ | Easy | Speed (fastest) |
| OpenAI API | $$$ | ⭐⭐⭐⭐⭐ | Easiest | Mainstream |
| Anthropic API | $$$ | ⭐⭐⭐⭐⭐ | Easy | Claude direct |
| Self-host Llama | FREE (compute) | ⭐⭐⭐⭐ | Hard | Privacy |
OpenCode Free¶
The easiest to set up: no login, no auth, no API key.
Setup in 9Router¶
Dashboard → Providers → OpenCode Free → Connect → done.
Models are fetched automatically from https://opencode.ai/zen/v1/models. The list usually includes:
- oc/claude-3.5-sonnet
- oc/gpt-4o
- oc/llama-3.1-70b
- oc/gemini-2.0-flash
Use in the bot¶
Trade-off¶
- ✅ Fastest setup
- ✅ Models auto-update
- ❌ Rate limits are unclear
- ❌ Reliability depends on OpenCode's infrastructure
Good for: dev / staging, or as a backup for Kiro.
Vertex AI (Google Cloud)¶
A new GCP account comes with $300 in free credit, enough to run a personal agent for roughly 6-12 months.
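As a rough sanity check on the 6-12 month estimate, the runway is simply the credit divided by monthly spend. The sketch below assumes a flat monthly spend; the $25 and $50 figures are illustrative assumptions chosen to match that range, not measured usage.

```python
def months_of_runway(credit_usd: float, monthly_spend_usd: float) -> float:
    """Return how many months a fixed credit lasts at a flat monthly spend."""
    if monthly_spend_usd <= 0:
        raise ValueError("monthly_spend_usd must be positive")
    return credit_usd / monthly_spend_usd


# Hypothetical spend levels for a personal agent (assumptions, not measurements).
for spend in (25.0, 50.0):
    print(f"${spend:.0f}/month -> {months_of_runway(300.0, spend):.0f} months")
```

Actual runway depends on which models the agent calls and how often, so it is worth re-running the numbers against your own billing data.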
Setup¶
- Sign up for GCP: https://cloud.google.com/free
- Enable billing (the free credit is used first; nothing is charged until it runs out)
- Enable the Vertex AI API
- Create a service account:
  - IAM & Admin → Service Accounts → Create
  - Role: Vertex AI User
  - Download the JSON key
Setup in 9Router¶
Dashboard → Providers → Vertex AI → Upload JSON key → Select project ID.
Models available:
- vx/gemini-3-pro
- vx/gemini-3-flash
- vx/claude-sonnet-4 (via Vertex partnership)
- vx/glm-5
- vx/deepseek-v3
Use in the bot¶
Trade-off¶
- ✅ Premium models (Gemini 3, Claude via Vertex)
- ✅ $300 credit lasts long
- ✅ Stable & reliable
- ❌ Fiddly setup (GCP project, IAM, service account)
- ❌ Expensive once the credit runs out
OpenRouter¶
A hub for 100+ models across providers. One API key gives access to OpenAI, Anthropic, Google, and Mistral, plus a free tier (Llama 3, Mistral 7B, etc.).
Setup¶
- Sign up: https://openrouter.ai
- Dashboard → Keys → Create key (sk-or-v1-...)
- (Optional) Top up credit, or just use the free tier
Setup in 9Router¶
Dashboard → Providers → OpenRouter → Paste API key.
Direct use without 9Router¶
If you don't want to host 9Router, OpenRouter can serve as Kai's backend directly:
OPENAI_API_KEY=sk-or-v1-xxxxxxxx
OPENAI_BASE_URL=https://openrouter.ai/api/v1
OPENAI_MODEL=anthropic/claude-3.5-sonnet
Add headers (optional):
import os
from openai import OpenAI

# Reads the same OPENAI_API_KEY set in the .env above (the sk-or-v1-... key)
client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url="https://openrouter.ai/api/v1",
    default_headers={
        "HTTP-Referer": "https://yourdomain.com",  # optional, for attribution
        "X-Title": "Kai Personal Agent",
    },
)
Popular models on OpenRouter¶
| Model | Cost (per 1M tokens in/out) |
|---|---|
| anthropic/claude-3.5-sonnet | $3 / $15 |
| anthropic/claude-3-haiku | $0.25 / $1.25 |
| openai/gpt-4o-mini | $0.15 / $0.6 |
| openai/gpt-4o | $2.50 / $10 |
| google/gemini-2.0-flash-exp:free | FREE (rate-limited) |
| meta-llama/llama-3.1-70b-instruct:free | FREE (rate-limited) |
| mistralai/mistral-7b-instruct:free | FREE (rate-limited) |
Cost-saving strategy: use free models for dev and paid models for critical production workloads.
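To compare the paid options concretely, per-request cost can be estimated from token counts. This is a minimal sketch using the prices from the table above; the workload figures (200 messages a day, about 1,500 input and 300 output tokens each) are hypothetical.

```python
# Prices in USD per 1M tokens (input, output), copied from the table above.
PRICES = {
    "anthropic/claude-3.5-sonnet": (3.00, 15.00),
    "anthropic/claude-3-haiku": (0.25, 1.25),
    "openai/gpt-4o-mini": (0.15, 0.60),
    "openai/gpt-4o": (2.50, 10.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request from its token counts."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


# Hypothetical workload: 200 messages/day, ~1,500 input and ~300 output tokens each.
for model in PRICES:
    monthly = 30 * 200 * estimate_cost(model, 1_500, 300)
    print(f"{model}: ~${monthly:.2f}/month")
```

Prices change often, so treat the table values as a snapshot and update the dictionary before relying on the output.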
Trade-off¶
- ✅ Mainstream, with the most documentation
- ✅ 100+ models behind one API
- ✅ Free tier for Llama, Mistral, Gemini Flash
- ❌ Free tier is rate-limited (inconsistent)
- ❌ Paid tier is not the cheapest
z.ai (GLM by Zhipu)¶
China-based provider of the GLM model family. Cheap, with decent quality.
Setup¶
- Sign up: https://z.ai
- Top up credit (minimum $5)
- Generate API key
Direct integration¶
OPENAI_API_KEY=<glm-api-key>
OPENAI_BASE_URL=https://open.bigmodel.cn/api/paas/v4
OPENAI_MODEL=glm-4-plus
Setup in 9Router¶
Dashboard → Providers → GLM (Zhipu) → API key.
Models¶
- glm-5: flagship, comparable to GPT-4
- glm-4-plus: balance of speed and quality
- glm-4-flash: cheapest, for simple tasks
Cost¶
- glm-5: $0.6 / 1M tokens
- glm-4-plus: $0.3 / 1M tokens
- glm-4-flash: $0.1 / 1M tokens
About 10x cheaper than Claude/GPT-4.
Trade-off¶
- ✅ Very cheap
- ✅ Decent quality for most tasks
- ❌ China-hosted (a concern if data sovereignty matters)
- ❌ Performs better in Mandarin than in English
Kimi (Moonshot AI)¶
China-based, Kimi K2 model.
Setup¶
- Sign up: https://platform.moonshot.cn
- API key
Direct integration¶
OPENAI_API_KEY=<kimi-api-key>
OPENAI_BASE_URL=https://api.moonshot.cn/v1
OPENAI_MODEL=moonshot-v1-32k
Models¶
- moonshot-v1-8k: short context
- moonshot-v1-32k: medium
- moonshot-v1-128k: long context (most popular)
Trade-off¶
- ✅ Long context (128k tokens)
- ✅ Cheap
- ❌ Performance in Indonesian/English can be inconsistent
DeepSeek¶
China-based, strong at reasoning and coding.
Setup¶
- Sign up: https://platform.deepseek.com
- Top up $5 minimum
- API key
Direct integration¶
OPENAI_API_KEY=<deepseek-api-key>
OPENAI_BASE_URL=https://api.deepseek.com/v1
OPENAI_MODEL=deepseek-chat
Models¶
- deepseek-chat: general purpose
- deepseek-coder: specialized coding
- deepseek-reasoner: R1-style reasoning
Cost¶
- $0.14 / 1M input tokens
- $0.28 / 1M output tokens
Cheapest in the paid tier.
Trade-off¶
- ✅ Very cheap
- ✅ Competitive quality
- ✅ Strong at code & reasoning
- ❌ Variable latency
Groq (Speed king)¶
Cloud LPU inference, with inference speeds roughly 10x faster than standard GPUs.
Setup¶
- Sign up: https://console.groq.com
- API key free tier
Direct integration¶
OPENAI_API_KEY=<groq-api-key>
OPENAI_BASE_URL=https://api.groq.com/openai/v1
OPENAI_MODEL=llama-3.3-70b-versatile
Models¶
- llama-3.3-70b-versatile: fastest decent quality
- llama-3.1-8b-instant: sub-second response
- mixtral-8x7b-32768: long context
- gemma2-9b-it: Google open model
Cost¶
- Free tier: 30 req/min, 6000 req/day
- Paid: $0.59 / 1M tokens (Llama 70B)
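To stay inside the free-tier limits quoted above, a bot can throttle itself before calling the API. The following is a minimal sliding-window sketch, not Groq's own mechanism; the class name and the injectable clock are illustrative choices.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` calls within any `window_s`-second window."""

    def __init__(self, limit: int, window_s: float, clock=time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.calls = deque()  # timestamps of accepted calls, oldest first

    def try_acquire(self) -> bool:
        """Record a call and return True if it fits the window, else return False."""
        now = self.clock()
        # Drop timestamps that have fallen out of the window.
        while self.calls and now - self.calls[0] >= self.window_s:
            self.calls.popleft()
        if len(self.calls) < self.limit:
            self.calls.append(now)
            return True
        return False


# Per-minute limit from the free tier above: 30 requests per 60 seconds.
per_minute = SlidingWindowLimiter(limit=30, window_s=60)
print(per_minute.try_acquire())
```

A second instance with `limit=6000` and `window_s=86_400` could track the daily cap in the same way. The server still enforces its own limits, so the bot should also handle rate-limit errors from the API.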
Trade-off¶
- ✅ Fastest LLM inference available
- ✅ Free tier generous
- ✅ Llama 70B is competitive with Claude Haiku
- ❌ Open models only (Llama, Mixtral, Gemma)
- ❌ Not flagship quality
Good for: bots that need fast responses (< 1 second), real-time chat.
OpenAI API direct¶
The most mainstream option. Pay-as-you-go.
Setup¶
- Sign up: https://platform.openai.com
- Top up minimum $5
- API key
Integration¶
Models¶
- gpt-4o-mini: daily driver ($0.15 / $0.60 per 1M)
- gpt-4o: flagship ($2.50 / $10 per 1M)
- gpt-4-turbo: older flagship
- o1-mini / o1-preview: reasoning
Trade-off¶
- ✅ Mainstream, tons of documentation
- ✅ Reliable, scalable
- ✅ Better privacy (paid-tier policy)
- ❌ More expensive than the China-based providers
- ❌ Quality across OpenAI updates has been inconsistent lately
Anthropic API direct¶
Direct access to Claude, without a proxy.
Setup¶
- Sign up: https://console.anthropic.com
- Top up $5 minimum
- API key
Integration¶
⚠️ The Anthropic API format differs from OpenAI's. Use the anthropic SDK:
import os
from anthropic import Anthropic

client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system="You are Kai...",  # the system prompt is a top-level parameter
    messages=[{"role": "user", "content": "halo"}],
)
print(response.content[0].text)  # text of the first content block
Or go through OpenRouter/9Router, which already handle the OpenAI ↔ Anthropic translation.
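To give a feel for what that translation involves, here is a minimal sketch that converts OpenAI-style chat messages into keyword arguments for `client.messages.create`. It is not how OpenRouter or 9Router implement it; the function name is hypothetical, and it handles plain text messages only (no tool calls, images, or streaming).

```python
def to_anthropic_kwargs(openai_messages, model, max_tokens=1024):
    """Convert OpenAI-style chat messages into kwargs for Anthropic's messages.create."""
    system_parts = []
    messages = []
    for msg in openai_messages:
        if msg["role"] == "system":
            # Anthropic takes the system prompt as a top-level parameter,
            # not as an entry in the messages list.
            system_parts.append(msg["content"])
        else:
            messages.append({"role": msg["role"], "content": msg["content"]})
    kwargs = {"model": model, "max_tokens": max_tokens, "messages": messages}
    if system_parts:
        kwargs["system"] = "\n\n".join(system_parts)
    return kwargs


kwargs = to_anthropic_kwargs(
    [
        {"role": "system", "content": "You are Kai..."},
        {"role": "user", "content": "halo"},
    ],
    model="claude-3-5-sonnet-20241022",
)
print(kwargs)
```

The result can be passed as `client.messages.create(**kwargs)`. The response also needs mapping back to the OpenAI shape, which this sketch leaves out.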
Models¶
- claude-3-5-sonnet: flagship Claude 3.5
- claude-3-5-haiku: fast
- claude-3-opus: older flagship (to be deprecated soon)
- claude-4-sonnet-thinking (once released)
Cost¶
- Claude 3.5 Sonnet: $3 / $15 per 1M
- Claude 3.5 Haiku: $1 / $5 per 1M
Trade-off¶
- ✅ Best quality (Claude 4.5 via API tier)
- ✅ Decent privacy
- ❌ Most expensive
- ❌ Format is not OpenAI-compatible (needs an adapter / OpenRouter)
Self-host (advanced)¶
For when you're serious about privacy and want to own the whole stack.
Popular stacks¶
Ollama (the easiest):
curl -fsSL https://ollama.com/install.sh | sh
ollama pull llama3.1:8b
ollama serve # default port 11434
OpenAI-compatible endpoint:
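Ollama serves its OpenAI-compatible API under /v1 on the same port, so with the default setup the base URL would be http://localhost:11434/v1. The sketch below builds a chat request against it using only the standard library. The helper name is hypothetical, and the "ollama" API key is a placeholder, since a local Ollama server does not check it.

```python
import json
import urllib.request

OLLAMA_BASE_URL = "http://localhost:11434/v1"  # default Ollama port, OpenAI-style path


def build_chat_request(base_url, model, messages, api_key="ollama"):
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


req = build_chat_request(
    OLLAMA_BASE_URL, "llama3.1:8b", [{"role": "user", "content": "halo"}]
)
print(req.full_url)
# To actually send it (requires `ollama serve` to be running):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

For a bot that already uses the OpenAI SDK, setting OPENAI_BASE_URL to the same URL and OPENAI_MODEL to llama3.1:8b should be enough.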
vLLM (high-throughput):
pip install vllm
python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Llama-3.1-8B-Instruct \
    --port 8000
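llama.cpp (CPU-friendly, quantized):
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
# Recent llama.cpp versions build with CMake; the server binary is llama-server
cmake -B build
cmake --build build --config Release
./build/bin/llama-server -m models/llama-3.1-8b.Q4_K_M.gguf -c 4096 --port 8080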
Hardware requirement¶
| Model | Min RAM (Q4 quantized) | Min RAM (Full FP16) |
|---|---|---|
| Llama 3.1 8B | 6 GB | 16 GB |
| Llama 3.1 70B | 40 GB | 140 GB |
| Mistral 7B | 5 GB | 14 GB |
| Qwen 2.5 7B | 5 GB | 14 GB |
| DeepSeek-Coder 6.7B | 5 GB | 13 GB |
An Oracle VPS (24 GB RAM, ARM) can comfortably run a quantized Llama 3.1 8B.
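The FP16 column in the table follows from a simple rule of thumb: weights take roughly parameter count times bytes per weight. A small sketch of that arithmetic is below. It covers weights only, so the Q4 figures come out lower than the table's minimums, which presumably include runtime overhead such as the KV cache (an assumption, since the table does not say how it was derived).

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


for name, params in [("Llama 3.1 8B", 8), ("Llama 3.1 70B", 70), ("Mistral 7B", 7)]:
    fp16 = weights_gb(params, 16)
    q4 = weights_gb(params, 4)
    print(f"{name}: FP16 ~{fp16:.0f} GB, Q4 weights ~{q4:.1f} GB (plus runtime overhead)")
```

Longer context windows increase the overhead, so leave headroom beyond the weights-only number when sizing a VPS.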
Trade-off¶
- ✅ Full privacy, no outbound network calls
- ✅ Zero ongoing cost (after the VPS)
- ✅ Full control over model behavior
- ❌ Fiddly setup
- ❌ Quality below cloud frontier models
- ❌ Slow CPU inference (low tokens/second)
Recommendation per use case¶
"Personal bot, free, high quality"¶
→ Kiro AI via 9Router
"Personal bot, free, super simple"¶
→ OpenCode Free via 9Router
"Personal bot, needs privacy"¶
→ Self-host Llama 3.1 8B + Ollama, or Anthropic API (paid)
"Personal bot, needs extreme speed"¶
→ Groq (Llama 70B via LPU)
"Personal bot, cost-conscious but wants quality"¶
→ DeepSeek ($0.14/1M) or GLM ($0.6/1M)
"Production bot, mainstream support"¶
→ OpenAI API or Anthropic API
"Bot with many provider fallbacks"¶
→ 9Router + connect 3-5 providers, tier-based
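The fallback idea itself is simple: try providers in priority order and return the first successful reply. 9Router takes care of this routing for you, and how it does so internally is not covered here. The sketch below only illustrates the pattern in bot code, with hypothetical stand-in functions in place of real provider calls.

```python
from typing import Callable, Sequence


def complete_with_fallback(
    prompt: str,
    providers: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each (name, call) pair in order; return (name, reply) from the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real bot would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Hypothetical stand-ins for real provider calls.
def kiro(prompt: str) -> str:
    raise TimeoutError("rate limited")


def deepseek(prompt: str) -> str:
    return f"echo: {prompt}"


print(complete_with_fallback("halo", [("kiro", kiro), ("deepseek", deepseek)]))
```

Ordering the list from subscription to cheap to free tiers gives the tier-based behavior; collecting the error messages makes it easier to see which provider failed and why.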
Decision tree¶
Willing to pay?
├── No
│   ├── Want minimal setup? → OpenCode Free
│   ├── Want the best quality? → Kiro AI
│   ├── Want GCP credit? → Vertex AI ($300)
│   └── Want privacy? → Self-host Llama
│
└── Yes, what budget?
    ├── < $5/mo → DeepSeek / GLM via OpenRouter
    ├── $5-20/mo → OpenAI gpt-4o-mini / Anthropic Haiku
    └── $20+/mo → Claude Sonnet via Anthropic API
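The same tree can be written as a small function, which is handy if the bot should suggest a backend during onboarding. This is a sketch: the priority labels are hypothetical names for the free-branch questions, and treating exactly $20 as the middle tier is an assumption, since the tree leaves that boundary open.

```python
def recommend_backend(
    willing_to_pay: bool, priority: str = "", budget_usd: float = 0.0
) -> str:
    """Mirror the decision tree: free branch by priority, paid branch by monthly budget."""
    if not willing_to_pay:
        free_choices = {
            "minimal-setup": "OpenCode Free",
            "quality": "Kiro AI",
            "gcp-credit": "Vertex AI ($300)",
            "privacy": "Self-host Llama",
        }
        return free_choices[priority]
    if budget_usd < 5:
        return "DeepSeek / GLM via OpenRouter"
    if budget_usd <= 20:
        return "OpenAI gpt-4o-mini / Anthropic Haiku"
    return "Claude Sonnet via Anthropic API"


print(recommend_backend(False, priority="quality"))
print(recommend_backend(True, budget_usd=12))
```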
Switching backends in code¶
Your bot should support switching backends without code changes. Use env vars:
import os
from openai import OpenAI

# Only the API key is required; base URL and model fall back to OpenAI defaults
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]
OPENAI_BASE_URL = os.environ.get("OPENAI_BASE_URL", "https://api.openai.com/v1")
OPENAI_MODEL = os.environ.get("OPENAI_MODEL", "gpt-4o-mini")
client = OpenAI(api_key=OPENAI_API_KEY, base_url=OPENAI_BASE_URL)
Switching providers means updating .env and restarting the bot:
# Use Kiro via 9Router
sed -i 's|OPENAI_BASE_URL=.*|OPENAI_BASE_URL=http://127.0.0.1:20128/v1|' ~/agent/.env
sed -i 's|OPENAI_MODEL=.*|OPENAI_MODEL=kr/claude-sonnet-4.5|' ~/agent/.env
sudo systemctl restart kai-bot
# Use OpenAI directly
sed -i 's|OPENAI_BASE_URL=.*|OPENAI_BASE_URL=https://api.openai.com/v1|' ~/agent/.env
sed -i 's|OPENAI_MODEL=.*|OPENAI_MODEL=gpt-4o-mini|' ~/agent/.env
sudo systemctl restart kai-bot
Or add a /setmodel command to the bot for switching at runtime.
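One way such a command could work is sketched below. The class and handler names are hypothetical, the allow-list is an added safeguard rather than something the original setup requires, and wiring the handler into a specific chat framework is left out.

```python
import os


class BackendState:
    """Holds the active model; starts from OPENAI_MODEL like the env-var setup above."""

    def __init__(self, allowed_models):
        self.allowed = set(allowed_models)
        self.model = os.environ.get("OPENAI_MODEL", "gpt-4o-mini")


def handle_setmodel(state: BackendState, text: str) -> str:
    """Handle a '/setmodel <model>' message and return the reply to send back."""
    parts = text.split()
    if len(parts) != 2 or parts[0] != "/setmodel":
        return "Usage: /setmodel <model>"
    if parts[1] not in state.allowed:
        return f"Unknown model: {parts[1]}"
    state.model = parts[1]
    return f"Model set to {state.model}"


state = BackendState(["gpt-4o-mini", "kr/claude-sonnet-4.5"])
print(handle_setmodel(state, "/setmodel kr/claude-sonnet-4.5"))
```

Each chat request then passes `state.model` as the `model` argument. The change lives in memory only, so it resets to the .env value when the bot restarts.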
Final advice¶
For beginners:
1. Start with Kiro via 9Router (free, high quality)
2. Set up OpenCode Free as a fallback
3. Add paid Anthropic / OpenAI if you need privacy / reliability
For advanced users:
1. Multiple Kiro accounts to effectively double capacity
2. Tier-based routing via 9Router (Subscription → Cheap → Free)
3. Self-host Llama for privacy-sensitive tasks
For commercial production:
1. Anthropic API or OpenAI API directly
2. Monitoring + cost alerts
3. An SLA with the provider
Cost projection for a personal agent (May 2026):
- Pure free (Kiro + OpenCode): $0/month
- Hybrid (Kiro + DeepSeek backup): $1-3/month
- Production paid (Anthropic Claude Sonnet): $10-30/month