10. Resource Management

The start → use → stop pattern.

An agent that is not resource-aware is an agent that burns through your disk, memory, or cloud bill overnight. The resource management pillar defines the agent's housekeeping pattern.

What it must contain

## RESOURCE MANAGEMENT

- Working pattern: start → use → stop.
- After you finish with a dev server, container, or background
  process, stop it unless it is a long-lived service.
- Exceptions: the bot agent itself, production servers, and
  services that really must keep running. Leave those alone.
- Disk is monitored (the VPS is limited). If you need space, suggest
  a cleanup to the user first.

Resource types

1. Compute (CPU/memory)

  • Running processes
  • Background scripts (cron, scheduler)
  • Docker containers
  • Browser sessions

2. Storage (disk)

  • Log files (can balloon)
  • Cache (npm, pip, puppeteer)
  • Temporary files (/tmp)
  • Build artifacts

3. Network (API quota)

  • Rate-limited API calls
  • Cloud service quotas (S3, GPU, etc)
  • DNS queries
  • Egress bandwidth (if the cloud provider charges for egress)

4. Money

  • Cloud billing (AWS, GCP)
  • API costs (OpenAI, etc.)
  • Wallet gas fees
  • Subscription services

The start → use → stop pattern

A good agent:

1. START: the agent decides it needs to spin up resource X
   - Confirm necessity (is it really needed?)
   - Confirm spec (size, region, etc.)
   - Log: "Starting X for purpose Y"

2. USE: the agent uses the resource
   - Track usage (logs, output)
   - Check idle time

3. STOP: the agent is done with the resource
   - Verify the task is complete
   - Cleanup: stop container, remove temp files, terminate session
   - Log: "Stopped X after Y minutes"
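The three steps above map naturally onto a Python context manager. This is a minimal sketch, not a prescribed implementation: `managed_resource` and its `start`/`stop` callables are hypothetical names introduced here for illustration.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("resource")

@contextmanager
def managed_resource(name, purpose, start, stop):
    """START the resource, hand it to the caller (USE), and always STOP it.

    `start` is a zero-argument callable that returns the resource;
    `stop` takes that resource and releases it. Both are assumptions
    of this sketch, not part of any specific library.
    """
    logger.info("Starting %s for purpose %s", name, purpose)
    started_at = time.monotonic()
    resource = start()
    try:
        yield resource
    finally:
        # Runs on success and on error, so the resource never leaks.
        stop(resource)
        minutes = (time.monotonic() - started_at) / 60
        logger.info("Stopped %s after %.1f minutes", name, minutes)
```

Putting STOP in the `finally` branch means it still runs when the USE step raises, which is exactly the case where an agent tends to forget cleanup.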

Example:

User: scrape data from 100 pages

Agent: [START playwright headless browser]
       [USE: navigate 100 pages, save data]
       [STOP: close browser, free 250MB RAM]
       Done. Data saved to scraped.json (2.3MB, 100 pages).

Long-lived vs ephemeral

Identify which resources are ephemeral and which are long-lived:

| Resource | Type | Action |
|---|---|---|
| The bot agent itself | Long-lived | Leave running |
| Production database | Long-lived | Leave running |
| Production API | Long-lived | Leave running |
| 9router / local LLM proxy | Long-lived | Leave running |
| Test browser session | Ephemeral | Stop after use |
| Build container | Ephemeral | Stop after use |
| Local dev server | Ephemeral | Stop after debugging |
| Temporary files | Ephemeral | Clean up |

SOUL.md must list the long-lived ones:

Long-lived services (leave running):
- kai-bot.service (the bot itself)
- 9router (local LLM proxy)
- nginx (if there is a web server)
- postgresql (if there is a DB)
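One way to enforce that list in code is an allowlist check before anything gets stopped. The sketch below assumes the service names from the SOUL.md example; `LONG_LIVED` and `stoppable` are hypothetical names, and how you obtain the list of running services is up to your environment.

```python
# Allowlist of services that must never be stopped by the agent.
# The names mirror the SOUL.md example; adjust them to your own setup.
LONG_LIVED = frozenset({"kai-bot.service", "9router", "nginx", "postgresql"})

def stoppable(running, long_lived=LONG_LIVED):
    """Return the running services that are safe to stop after use.

    Anything not explicitly listed as long-lived is treated as ephemeral.
    """
    return [name for name in running if name not in long_lived]
```

Treating everything unlisted as ephemeral is a design choice of this sketch. If a wrongly stopped service would hurt more than a leaked one, invert it and list the ephemeral resources instead.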

Disk awareness

Free-tier VPSes usually come with a small disk. The agent has to be aware of that:

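import shutil

def check_disk_space():
    # shutil.disk_usage reports total, used, and free space in bytes.
    total, used, free = shutil.disk_usage("/")
    free_pct = (free / total) * 100
    return {
        "free_gb": free / (1024**3),
        "free_pct": free_pct,
        "warning": free_pct < 20,  # below 20% free: suggest cleanup
    }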

Inject it into the system prompt when it is critical:

def build_system_prompt():
    # load_soul() is assumed to return the contents of SOUL.md as a string.
    soul = load_soul()
    disk = check_disk_space()

    disk_warning = ""
    if disk["free_pct"] < 10:
        disk_warning = f"\n\n⚠️ DISK CRITICAL: {disk['free_gb']:.2f}GB free. "
        disk_warning += "Suggest cleanup to the user before any large install/clone."

    return f"{soul}{disk_warning}"

The agent must suggest a cleanup, not just go ahead:

User: clone big repo A

Agent: Disk is already at 92%, 500MB left. Repo A is reportedly 1.2GB.
       Want me to clean up first? Safe to clean up:
       - npm cache (~100MB)
       - pip cache (~50MB)
       - apt cache (~80MB)
       Or do you need more space and want to resize the disk first?
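The decision behind that reply can be sketched as a small pure function. The name `plan_large_download` and its return shape are assumptions made for illustration; the sizes in the usage note come from the dialogue above.

```python
def plan_large_download(required_mb, free_mb, reclaimable_mb):
    """Decide what to suggest before a large clone or install.

    reclaimable_mb maps a cache name to its estimated size in MB.
    Returns the suggested action plus the space still missing.
    """
    if required_mb <= free_mb:
        return {"action": "proceed", "shortfall_mb": 0}
    shortfall = required_mb - free_mb
    if sum(reclaimable_mb.values()) >= shortfall:
        action = "suggest_cleanup"   # clearing caches would be enough
    else:
        action = "suggest_resize"    # caches alone cannot cover the gap
    return {"action": action, "shortfall_mb": shortfall}
```

With the numbers from the dialogue (1200MB needed, 500MB free, 230MB of caches) the shortfall is 700MB, so cache cleanup alone is not enough and the function returns `suggest_resize`. That is why the agent's reply also offers a disk resize.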

Cost awareness

For paid cloud services, the agent tracks cost:

User: spin up a GPU instance for training

Agent: GPU instance (p3.2xlarge) = $3.06/hour. Estimated training time
       2-3 hours = about $6-9. Confirm to proceed? Reply "yes".

The user knows the cost before being charged.
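The estimate in that reply is just the hourly rate multiplied by the expected duration. A minimal sketch, using the rate quoted in the example as input rather than as a current price; `cost_prompt` and `is_confirmed` are hypothetical helpers.

```python
def cost_prompt(instance_type, hourly_rate_usd, min_hours, max_hours):
    """Build the confirmation message shown before a paid resource starts."""
    low = hourly_rate_usd * min_hours
    high = hourly_rate_usd * max_hours
    return (
        f"{instance_type} = ${hourly_rate_usd:.2f}/hour. "
        f"Estimated {min_hours}-{max_hours} hours = ${low:.2f}-${high:.2f}. "
        'Confirm to proceed? Reply "yes".'
    )

def is_confirmed(reply):
    """Only an explicit yes counts; anything else means do not spin up."""
    return reply.strip().lower() == "yes"
```

Requiring an exact "yes" is deliberately strict. An ambiguous reply such as "ok maybe" should not trigger a charge.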

Cleanup checklist

When a task finishes, the agent must check:

  • Temp files deleted
  • Build artifacts compressed or deleted
  • Containers stopped
  • Browser sessions closed
  • Background processes killed
  • Open file handles closed

For tasks that spin up many resources:

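import logging

RESOURCES_TO_CLEANUP = []

def use_resource(resource):
    # Register the resource so cleanup_all() can release it later.
    RESOURCES_TO_CLEANUP.append(resource)
    return resource

def cleanup_all():
    for r in RESOURCES_TO_CLEANUP:
        try:
            r.cleanup()
        except Exception as e:
            # One failed cleanup must not block the remaining ones.
            logging.error(f"Failed cleanup {r}: {e}")
    RESOURCES_TO_CLEANUP.clear()

# Usage (spin_up_browser is a placeholder for your own factory):
try:
    browser = use_resource(spin_up_browser())
    # ... use browser
finally:
    cleanup_all()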
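The standard library's `contextlib.ExitStack` offers the same register-then-release behaviour without a module-level list. The sketch below is an alternative, not the approach the text prescribes; `run_with_cleanup` is a hypothetical helper and resources are assumed to expose a `cleanup()` method as in the code above.

```python
from contextlib import ExitStack

def run_with_cleanup(factories, work):
    """Create each resource, run work(resources), then release everything.

    factories: zero-argument callables, each returning an object with
    a cleanup() method (same assumption as the registry version).
    """
    with ExitStack() as stack:
        resources = []
        for make in factories:
            resource = make()
            # Registered callbacks run in reverse order when the block exits,
            # whether work() returns normally or raises.
            stack.callback(resource.cleanup)
            resources.append(resource)
        return work(resources)
```

One difference worth knowing: `ExitStack` re-raises an exception thrown by a cleanup callback after running the remaining ones, while the registry version logs it and moves on.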

Periodic audit

Every week, the agent suggests an audit:

Agent (proactive, weekly):
"Weekly audit:
- Disk: 78% used (warning if >85%)
- Bot uptime: 6.4 days
- Memory: 240MB peak
- API calls today: 142 (rate limit OK)
- Background processes: 1 (the bot itself, OK)

Anything to clean up or review?"
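A report like this can be assembled from a handful of measurements. The following is a rough sketch: `format_weekly_audit` and its parameters are hypothetical, and only the disk figure is measured here. Uptime, memory, and API counts are assumed to come from whatever monitoring you already have.

```python
import shutil

def disk_used_pct(path="/"):
    """Percentage of the filesystem at `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100

def format_weekly_audit(used_pct, uptime_days, peak_memory_mb,
                        api_calls_today, background_processes,
                        warn_above_pct=85):
    """Render the weekly audit message from already-collected numbers."""
    disk_flag = "WARNING" if used_pct > warn_above_pct else "OK"
    lines = [
        "Weekly audit:",
        f"- Disk: {used_pct:.0f}% used ({disk_flag}, warning above {warn_above_pct}%)",
        f"- Bot uptime: {uptime_days:.1f} days",
        f"- Memory: {peak_memory_mb:.0f}MB peak",
        f"- API calls today: {api_calls_today}",
        f"- Background processes: {background_processes}",
        "",
        "Anything to clean up or review?",
    ]
    return "\n".join(lines)
```

Keeping measurement (`disk_used_pct`) separate from formatting makes the report easy to test with fixed numbers.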

Anti-patterns

❌ Spin up without stop

Agent: [start playwright]
       [scrape]
       Done.

(playwright is still running, eating RAM)

❌ Cache that never gets cleaned up

Agent: [clone repo, install, build] (5x in 1 hour)

(npm/pip cache balloons)

❌ Logs that never rotate

Agent: [save log to /var/log/agent.log, append forever]

(disk fills up within a few months)

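Use logrotate, or archive the file yourself once it passes a size limit:

import os
import shutil
from datetime import datetime

def rotate_log(log_file, max_size_mb=100):
    # Move the log aside under a timestamped name once it exceeds max_size_mb.
    if os.path.exists(log_file):
        size_mb = os.path.getsize(log_file) / (1024**2)
        if size_mb > max_size_mb:
            archive = f"{log_file}.{datetime.now().strftime('%Y%m%d-%H%M%S')}"
            shutil.move(log_file, archive)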
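If the agent writes its log through Python's `logging` module, the standard `RotatingFileHandler` can do the rotation for you and also caps how many old files are kept. A minimal sketch; `build_rotating_logger` and its defaults are hypothetical, chosen to mirror the 100MB limit above.

```python
import logging
from logging.handlers import RotatingFileHandler

def build_rotating_logger(log_file, name="agent",
                          max_bytes=100 * 1024 * 1024, backups=5):
    """Return a logger whose file is rotated once it reaches max_bytes.

    Old files are kept as log_file.1 ... log_file.<backups>; anything
    older is deleted, so total disk use stays bounded.
    Call this once per logger name, since every call adds a handler.
    """
    handler = RotatingFileHandler(log_file, maxBytes=max_bytes,
                                  backupCount=backups)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger
```

Unlike a manual move, the handler reopens the file after rotating, so the running process keeps writing to the fresh log rather than to the archived one.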

❌ Cloud resources you forget to shut down

Agent: [start EC2 GPU instance for training]
       [training done]
       Done.

(GPU instance is still running, $3/hour, 24/7)

Running aws ec2 stop-instances or an equivalent is mandatory.
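To make the stop step hard to forget, the job can be wrapped so the stop command runs even when training fails. This sketch shells out to the AWS CLI and assumes it is installed and configured with credentials; `run_then_stop` and the injectable `run` parameter are hypothetical, added so the wrapper can be exercised without touching a real instance.

```python
import subprocess

def stop_instance_command(instance_id):
    """Command line for stopping one EC2 instance via the AWS CLI."""
    return ["aws", "ec2", "stop-instances", "--instance-ids", instance_id]

def run_then_stop(instance_id, job, run=subprocess.run):
    """Run job() and stop the instance afterwards, even if job() raises.

    `run` defaults to subprocess.run; pass a stub to dry-run the wrapper.
    """
    try:
        return job()
    finally:
        # check=True surfaces a failed stop instead of silently ignoring it.
        run(stop_instance_command(instance_id), check=True)
```

A failed stop is raised rather than swallowed on purpose, because a GPU instance that silently keeps running is the exact failure this anti-pattern describes.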

Complete SOUL.md template

## RESOURCE MANAGEMENT

Pola: start → use → stop.

Long-lived services (leave running):
- kai-bot.service
- 9router (LLM proxy)
- nginx, postgresql, etc.

Ephemeral (stop after use):
- Browser sessions
- Build containers
- Dev servers
- Temp files

Periodic cleanup:
- Logs > 100MB → rotate
- Cache > 1GB → cleanup
- /tmp > 7 days old → delete

Cost awareness:
- Cloud GPU/spot instances → confirm before spinning up
- API quota → check before bulk operations
- Egress bandwidth → estimate before transfer

Disk awareness:
- Free < 20% → suggest cleanup
- Free < 10% → block large installs/clones until cleanup
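The thresholds in the template translate into a couple of small checks. A sketch under the template's own numbers (20%, 10%, 7 days); `disk_policy` and `stale_files` are hypothetical names, and `stale_files` only lists candidates, it does not delete anything.

```python
import os
import time

def disk_policy(free_pct):
    """Map free disk percentage to the action the template asks for."""
    if free_pct < 10:
        return "block_large_installs"
    if free_pct < 20:
        return "suggest_cleanup"
    return "ok"

def stale_files(directory, max_age_days=7, now=None):
    """List regular files directly under `directory` older than max_age_days.

    Age is based on modification time. Subdirectories and symlinks are
    skipped, and nothing is removed: the caller decides what to delete.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if (entry.is_file(follow_symlinks=False)
                    and entry.stat(follow_symlinks=False).st_mtime < cutoff):
                stale.append(entry.path)
    return sorted(stale)
```

Returning a list instead of deleting keeps the agent consistent with the rule above: suggest a cleanup to the user first, then act.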