10. Resource Management
The pattern: start → use → stop.
An agent that isn't resource-aware is an agent that burns through your disk, memory, or cloud bill overnight. The resource management pillar defines the agent's housekeeping pattern.
What it must contain
## RESOURCE MANAGEMENT
- Working pattern: start → use → stop.
- After you are done with a dev server, container, or background
  process, stop it unless it is a long-lived service.
- Exceptions: the bot agent itself, production servers, and
  services that are meant to run continuously. Leave those alone.
- Disk is monitored (the VPS is limited). If space is needed, suggest
  cleanup to the user first.
Resource types
1. Compute (CPU/memory)
- Running processes
- Background scripts (cron, scheduler)
- Docker containers
- Browser sessions
2. Storage (disk)
- Log files (can balloon)
- Cache (npm, pip, puppeteer)
- Temporary files (/tmp)
- Build artifacts
3. Network (API quota)
- API calls with rate limits
- Cloud service quotas (S3, GPU, etc.)
- DNS queries
- Egress bandwidth (if the cloud provider charges for egress)
4. Money
- Cloud billing (AWS, GCP)
- API costs (OpenAI, etc.)
- Wallet gas fees
- Subscription services
The start → use → stop pattern
A good agent:
1. START: the agent decides it needs to spin up resource X
   - Confirm necessity (is it really needed?)
   - Confirm spec (size, region, etc.)
   - Log: "Starting X for purpose Y"
2. USE: the agent uses the resource
   - Track usage (logs, output)
   - Check idle time
3. STOP: the agent is done with the resource
   - Verify the task is complete
   - Cleanup: stop container, remove temp files, terminate session
   - Log: "Stopped X after Y minutes"
Example:
User: scrape data from 100 pages
Agent: [START playwright headless browser]
       [USE: navigate 100 pages, save data]
       [STOP: close browser, free 250MB RAM]
       Done. Data saved to scraped.json (2.3MB, 100 pages).
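The three steps map naturally onto a Python context manager, which guarantees the STOP step even when the USE step raises. The following is a minimal sketch, not the guide's own implementation: `managed_resource` and `FakeBrowser` are hypothetical names introduced here, and `FakeBrowser` merely stands in for a real headless browser.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("resource")


@contextmanager
def managed_resource(name, purpose, start, stop):
    """Start a resource, hand it to the caller, and always stop it afterwards."""
    logger.info("Starting %s for purpose %s", name, purpose)
    started_at = time.monotonic()
    resource = start()
    try:
        yield resource  # USE happens inside the caller's `with` body
    finally:
        # STOP runs even if the USE step raised an exception
        stop(resource)
        minutes = (time.monotonic() - started_at) / 60
        logger.info("Stopped %s after %.1f minutes", name, minutes)


class FakeBrowser:
    """Hypothetical stand-in for a real headless browser."""

    def __init__(self):
        self.open = True

    def close(self):
        self.open = False


with managed_resource("browser", "scraping", FakeBrowser, lambda b: b.close()) as browser:
    pages_scraped = 100  # placeholder for the real scraping work
```

Because the stop call lives in `finally`, a crash halfway through the scrape still closes the browser, which is exactly the failure mode that leaves resources running.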
Long-lived vs ephemeral
Identify which resources are ephemeral and which are long-lived:
| Resource | Type | Action |
|---|---|---|
| The bot agent itself | Long-lived | Leave running |
| Production database | Long-lived | Leave running |
| Production API | Long-lived | Leave running |
| 9router/local LLM proxy | Long-lived | Leave running |
| Test browser session | Ephemeral | Stop after use |
| Build container | Ephemeral | Stop after use |
| Local dev server | Ephemeral | Stop after debugging |
| Temporary files | Ephemeral | Clean up |
SOUL.md should list the long-lived services:
Long-lived services (leave running):
- kai-bot.service (the bot itself)
- 9router (local LLM proxy)
- nginx (if there is a web server)
- postgresql (if there is a DB)
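In code, the long-lived list works as an allowlist: anything not on it is treated as ephemeral and becomes a candidate for stopping. A minimal sketch, assuming service names are plain strings; `should_stop` and `plan_cleanup` are hypothetical helpers, and the ephemeral service names in the usage example are invented for illustration.

```python
# Allowlist mirroring the long-lived services listed in SOUL.md
LONG_LIVED = {"kai-bot.service", "9router", "nginx", "postgresql"}


def should_stop(service_name, long_lived=LONG_LIVED):
    """Return True only for services that are not on the long-lived allowlist."""
    return service_name not in long_lived


def plan_cleanup(running_services):
    """Return the sorted names of running services that are safe to stop."""
    return sorted(name for name in running_services if should_stop(name))


# Hypothetical snapshot of what is currently running
running = ["nginx", "vite-dev-server", "kai-bot.service", "playwright"]
to_stop = plan_cleanup(running)
```

An allowlist fails safe in one direction only: an unlisted production service would be flagged for stopping, so the list in SOUL.md has to be kept complete.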
Disk awareness
Free-tier VPSes usually come with small disks. The agent has to be aware of this:
import shutil

def check_disk_space():
    # Inspect the root filesystem and flag low free space
    total, used, free = shutil.disk_usage("/")
    free_pct = (free / total) * 100
    return {
        "free_gb": free / (1024**3),
        "free_pct": free_pct,
        "warning": free_pct < 20,  # below 20% free: suggest cleanup
    }
Inject it into the system prompt when disk space is critical:
def build_system_prompt():
    soul = load_soul()
    disk = check_disk_space()
    disk_warning = ""
    if disk["free_pct"] < 10:
        disk_warning = f"\n\n⚠️ DISK CRITICAL: {disk['free_gb']:.2f}GB free. "
        disk_warning += "Suggest cleanup to the user before any large install/clone."
    return f"{soul}{disk_warning}"
The agent should suggest cleanup, not just go ahead:
User: clone large repo A
Agent: Disk is already at 92%, with 500MB left. Repo A is reportedly 1.2GB.
       Want me to clean up first? These are safe to clean:
       - npm cache (~100MB)
       - pip cache (~50MB)
       - apt cache (~80MB)
       Or, if you need more space than that, resize the disk first?
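Before making that suggestion, the agent needs two numbers: how much each cache would free, and whether that is enough. The sketch below shows one way to get both; `dir_size_bytes` and `fits_after_cleanup` are hypothetical helpers, and the figures in the usage example are the ones from the conversation above.

```python
import os


def dir_size_bytes(path):
    """Sum the sizes of all regular files under path, skipping unreadable ones."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                # File vanished or is a broken symlink; ignore it
                pass
    return total


def fits_after_cleanup(free_bytes, reclaimable_bytes, needed_bytes):
    """Return True if the download fits once the reclaimable space is freed."""
    return free_bytes + reclaimable_bytes >= needed_bytes


MB = 1024**2
GB = 1024**3

# Numbers from the conversation: 500MB free, caches of 100 + 50 + 80 MB, 1.2GB repo
reclaimable = (100 + 50 + 80) * MB
enough = fits_after_cleanup(500 * MB, reclaimable, int(1.2 * GB))
```

With these figures `enough` is `False`: 500MB plus 230MB of caches is still short of 1.2GB, which is why the agent also offers a disk resize instead of promising that cleanup alone will do.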
Cost awareness
For paid cloud services, the agent tracks cost:
User: spin up a GPU instance for training
Agent: GPU instance (p3.2xlarge) = $3.06/hour. Estimated training time
       2-3 hours = $6-9. Confirm to proceed? Reply "yes".
The user knows the cost before anything is charged.
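The estimate in that reply is just the hourly rate multiplied by the expected duration range. A small sketch of the arithmetic, assuming the rate is supplied by the caller (it is not looked up from any pricing API); `estimate_cost` and `confirmation_message` are hypothetical names.

```python
def estimate_cost(hourly_rate_usd, min_hours, max_hours):
    """Return the (low, high) cost estimate in USD, rounded to cents."""
    low = round(hourly_rate_usd * min_hours, 2)
    high = round(hourly_rate_usd * max_hours, 2)
    return low, high


def confirmation_message(resource, hourly_rate_usd, min_hours, max_hours):
    """Build the message shown to the user before anything is started."""
    low, high = estimate_cost(hourly_rate_usd, min_hours, max_hours)
    return (
        f"{resource} = ${hourly_rate_usd:.2f}/hour. "
        f"Estimated {min_hours}-{max_hours} hours = ${low:.2f}-${high:.2f}. "
        "Confirm to proceed?"
    )


# Rate and duration taken from the example conversation
message = confirmation_message("GPU instance (p3.2xlarge)", 3.06, 2, 3)
```

Showing the range rather than a single figure keeps the user aware that the upper bound is what they are actually agreeing to.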
Cleanup checklist
When a task is done, the agent should check:
- Temp files deleted
- Build artifacts compressed/deleted
- Containers stopped
- Browser sessions closed
- Background processes killed
- Open file handles closed
For tasks that spin up many resources:
import logging

RESOURCES_TO_CLEANUP = []

def use_resource(resource):
    # Register the resource so cleanup_all() can release it later
    RESOURCES_TO_CLEANUP.append(resource)
    return resource

def cleanup_all():
    for r in RESOURCES_TO_CLEANUP:
        try:
            r.cleanup()
        except Exception as e:
            # Keep going so one failed cleanup does not leak the rest
            logging.warning(f"Failed cleanup {r}: {e}")
    RESOURCES_TO_CLEANUP.clear()

# Use:
try:
    browser = use_resource(spin_up_browser())
    # ... use browser
finally:
    cleanup_all()
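The standard library offers a close equivalent of this registry in `contextlib.ExitStack`, which runs registered callbacks in reverse order when the `with` block exits. This is an alternative sketch, not the approach used above; the `Resource` class and `cleanup_log` list are hypothetical and exist only to make the ordering visible.

```python
from contextlib import ExitStack

cleanup_log = []  # records the order in which resources were released


class Resource:
    """Hypothetical resource exposing the same cleanup() method as above."""

    def __init__(self, name):
        self.name = name

    def cleanup(self):
        cleanup_log.append(self.name)


with ExitStack() as stack:
    browser = Resource("browser")
    stack.callback(browser.cleanup)  # registered first, released last

    container = Resource("container")
    stack.callback(container.cleanup)  # registered last, released first

    # ... use browser and container here
```

Reverse-order release matters when resources depend on each other, for example a browser session running inside a container. Unlike `cleanup_all()`, `ExitStack` does not swallow exceptions raised by a callback: the remaining callbacks still run, but the error propagates afterwards.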
Periodic audit
Every week, the agent suggests an audit:
Agent (proactive, weekly):
"Weekly audit:
- Disk: 78% used (warning if >85%)
- Bot uptime: 6.4 days
- Memory: 240MB peak
- API calls today: 142 (rate limit OK)
- Background processes: 1 (the bot itself, OK)
Anything to clean up or review?"
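A report like this can be assembled from a handful of measurements. The sketch below covers only the formatting step and assumes the numbers are collected elsewhere; `format_audit` and its parameters are hypothetical, and the 85% threshold is the one quoted in the message above.

```python
def format_audit(used_pct, uptime_days, peak_memory_mb, api_calls_today,
                 background_processes, warn_above_pct=85):
    """Render the weekly audit message from already-collected measurements."""
    status = "WARNING" if used_pct > warn_above_pct else "OK"
    lines = [
        "Weekly audit:",
        f"- Disk: {used_pct:.0f}% used ({status}, warning if >{warn_above_pct}%)",
        f"- Bot uptime: {uptime_days:.1f} days",
        f"- Memory: {peak_memory_mb}MB peak",
        f"- API calls today: {api_calls_today}",
        f"- Background processes: {background_processes}",
        "Anything to clean up or review?",
    ]
    return "\n".join(lines)


# Values from the example message
report = format_audit(78, 6.4, 240, 142, 1)
```

Keeping collection and formatting separate makes it easy to reuse the same numbers for the system-prompt disk warning.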
Anti-patterns
❌ Spin up without stop
(Playwright still running, eating RAM)
❌ Cache that never gets cleaned up
(npm/pip cache balloons)
❌ Logs that never rotate
(disk full within a few months)
Use logrotate or truncate:
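import os
import shutil
from datetime import datetime

def rotate_log(log_file, max_size_mb=100):
    # Archive the log under a timestamped name once it exceeds max_size_mb
    if os.path.exists(log_file):
        size_mb = os.path.getsize(log_file) / (1024**2)
        if size_mb > max_size_mb:
            archive = f"{log_file}.{datetime.now().strftime('%Y%m%d-%H%M%S')}"
            shutil.move(log_file, archive)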
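If the bot writes its logs through Python's `logging` module, the standard `RotatingFileHandler` can do size-based rotation automatically and also caps the number of archives, so old logs cannot pile up. A minimal sketch under that assumption; `make_rotating_logger`, the logger naming scheme, and the default of 5 backups are choices made here, not taken from the guide.

```python
import logging
from logging.handlers import RotatingFileHandler


def make_rotating_logger(log_file, max_bytes=100 * 1024**2, backups=5):
    """Return a logger that rotates log_file at max_bytes and keeps `backups` archives."""
    handler = RotatingFileHandler(log_file, maxBytes=max_bytes, backupCount=backups)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))

    logger = logging.getLogger(f"rotating:{log_file}")
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False  # avoid duplicate lines via the root logger
    return logger
```

Calling `make_rotating_logger("bot.log")` would keep `bot.log` at roughly 100MB and retain at most `bot.log.1` through `bot.log.5`, which bounds total log usage at about 600MB.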
❌ Cloud resources nobody shut down
(GPU instance still running, $3/hour, 24/7)
Always run aws ec2 stop-instances or the equivalent.
Full SOUL.md template
## RESOURCE MANAGEMENT
Pattern: start → use → stop.
Long-lived services (leave running):
- kai-bot.service
- 9router (LLM proxy)
- nginx, postgresql, etc.
Ephemeral (stop after use):
- Browser sessions
- Build containers
- Dev servers
- Temp files
Periodic cleanup:
- Logs > 100MB → rotate
- Cache > 1GB → clean up
- /tmp > 7 days old → delete
Cost awareness:
- Cloud GPU/spot instances → confirm before spin up
- API quota → check before bulk operations
- Egress bandwidth → estimate before transfer
Disk awareness:
- Free < 20% → suggest cleanup
- Free < 10% → block large installs/clones until cleanup
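The thresholds in the template translate directly into small checks. The sketch below encodes the two disk-awareness rules and the 7-day /tmp rule; `disk_policy` and `stale_files` are hypothetical helpers, the returned labels are invented here, and `stale_files` only lists candidates rather than deleting them, in line with suggesting cleanup to the user first.

```python
import os
import time


def disk_policy(free_pct):
    """Map the free-disk percentage to the action named in the template."""
    if free_pct < 10:
        return "block_large_installs"
    if free_pct < 20:
        return "suggest_cleanup"
    return "ok"


def stale_files(directory, max_age_days=7, now=None):
    """List regular files in directory last modified more than max_age_days ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = []
    with os.scandir(directory) as entries:
        for entry in entries:
            # Look at the entry itself, never at a symlink target
            if not entry.is_file(follow_symlinks=False):
                continue
            if entry.stat(follow_symlinks=False).st_mtime < cutoff:
                stale.append(entry.path)
    return sorted(stale)
```

For example, `disk_policy(15)` returns `"suggest_cleanup"`, and `stale_files("/tmp")` returns the top-level files that the periodic cleanup rule would propose removing.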