Update & Maintain

Daily and long-term maintenance.

Daily checklist

Check these every day (or set up monitoring):

  • Service status: sudo systemctl is-active kai-bot
  • Disk space: df -h (alert if above 80%)
  • Memory: free -h
  • Recent errors: sudo journalctl -u kai-bot -p err --since today
  • Bot responsive (send it a message on Telegram)

Cron script for automatic checks:

cat > ~/agent/daily-check.sh << 'EOF'
#!/bin/bash
# Run via cron daily, alert via Telegram if issues

source ~/agent/.env

ISSUES=()

# Service check
if ! systemctl is-active --quiet kai-bot; then
    ISSUES+=("Service DOWN")
fi

# Disk check
DISK_PCT=$(df / | tail -1 | awk '{print $5}' | tr -d '%')
if [ "$DISK_PCT" -gt 80 ]; then
    ISSUES+=("Disk ${DISK_PCT}% full")
fi

# Memory check
MEM_PCT=$(free | grep Mem | awk '{print int($3/$2 * 100)}')
if [ "$MEM_PCT" -gt 80 ]; then
    ISSUES+=("RAM ${MEM_PCT}% used")
fi

# Recent errors
# -q drops journalctl's informational lines so only real entries are counted
ERROR_COUNT=$(journalctl -u kai-bot -p err --since "24 hours ago" -q --no-pager | wc -l)
if [ "$ERROR_COUNT" -gt 10 ]; then
    ISSUES+=("$ERROR_COUNT errors last 24h")
fi

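# Alert if any
if [ ${#ISSUES[@]} -gt 0 ]; then
    # Use real newlines and let curl URL-encode the text, so the literal "%"
    # in messages like "Disk 85% full" is not misread as an escape sequence.
    MSG="Daily check issues:"
    for issue in "${ISSUES[@]}"; do
        MSG+=$'\n'"- ${issue}"
    done
    curl -s -X POST "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/sendMessage" \
        -d "chat_id=${OWNER_TELEGRAM_ID}" \
        --data-urlencode "text=${MSG}"
fi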
EOF

chmod +x ~/agent/daily-check.sh
(crontab -l 2>/dev/null; echo "0 9 * * * /home/ubuntu/agent/daily-check.sh") | crontab -
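
If you would rather keep the threshold logic in Python next to the bot, the same checks can be sketched as below. This is only a sketch: the function names (disk_used_pct, collect_issues, format_alert) are made up for illustration and are not part of the original script. Note that shutil.disk_usage may report a slightly different percentage than df, which computes usage over the space available to unprivileged users.

```python
import shutil


def disk_used_pct(path="/"):
    """Return used disk space for `path` as an integer percentage."""
    usage = shutil.disk_usage(path)
    return int(usage.used / usage.total * 100)


def collect_issues(service_active, disk_pct, mem_pct, error_count,
                   pct_limit=80, error_limit=10):
    """Apply the same thresholds as daily-check.sh and return issue strings."""
    issues = []
    if not service_active:
        issues.append("Service DOWN")
    if disk_pct > pct_limit:
        issues.append(f"Disk {disk_pct}% full")
    if mem_pct > pct_limit:
        issues.append(f"RAM {mem_pct}% used")
    if error_count > error_limit:
        issues.append(f"{error_count} errors last 24h")
    return issues


def format_alert(issues):
    """Build the alert text, one issue per line; empty string if all is fine."""
    if not issues:
        return ""
    return "Daily check issues:\n" + "\n".join(f"- {issue}" for issue in issues)
```

Keeping the thresholds in one pure function makes them easy to test without touching the real service or disk.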

Weekly maintenance

Every week:

1. System update

sudo apt update && sudo apt upgrade -y
sudo apt autoremove -y

2. Cleanup cache

sudo apt-get clean
sudo journalctl --vacuum-time=7d
rm -rf ~/.cache/{pip,npm}

3. Disk audit

# Largest directories under your home directory
du -h ~ --max-depth=2 | sort -rh | head -10

# Largest directories under /
sudo du -h / --max-depth=2 2>/dev/null | sort -rh | head -10

4. Manual log rotation

If your log files have grown large (custom logs, not the journal):

ls -lh ~/agent/*.log
# if it is large:
mkdir -p ~/agent/logs-archive   # make sure the archive directory exists
mv ~/agent/agent.log ~/agent/logs-archive/agent-$(date +%Y%m%d).log
gzip ~/agent/logs-archive/agent-$(date +%Y%m%d).log
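
If the custom log is written by the bot through Python's logging module (an assumption, the source does not say how agent.log is produced), rotation can be handled inside the bot instead of by hand. A minimal sketch using the standard library's RotatingFileHandler; build_logger and its defaults are hypothetical:

```python
import logging
from logging.handlers import RotatingFileHandler


def build_logger(log_path, max_bytes=5_000_000, backup_count=5, name="agent"):
    """Return a logger that rotates log_path once it reaches max_bytes.

    Old files are kept as log_path.1 ... log_path.<backup_count>;
    anything older is deleted automatically.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = RotatingFileHandler(
        log_path, maxBytes=max_bytes, backupCount=backup_count
    )
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger
```

Call build_logger once at startup; calling it repeatedly with the same name would attach duplicate handlers.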

5. Backup verify

# Test backup readable
aws s3 ls s3://my-backups/ | tail -5

# Test restore (dry run)
aws s3 cp s3://my-backups/agent-latest.tar.gz /tmp/test-restore.tar.gz
tar -tzf /tmp/test-restore.tar.gz | head -5  # list files without extracting
rm /tmp/test-restore.tar.gz

Monthly maintenance

1. Review memory.json

cat ~/agent/data/memory.json | jq '.notes' | less

Remove notes that are stale, redundant, or look like prompt injection. Edit the file directly.
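
A small script can pre-flag notes worth a closer look before you read through everything. The sketch below assumes memory.json has a top-level "notes" list of strings (or small JSON objects); the phrase list and the function names are illustrative guesses, not a real injection detector, so treat the output as hints only.

```python
import json

# Illustrative phrases only; extend with whatever looks suspicious to you.
SUSPICIOUS_PHRASES = (
    "ignore previous",
    "ignore all previous",
    "system prompt",
    "you are now",
)


def review_notes(notes):
    """Return (index, reason) pairs for notes that deserve a manual look."""
    flagged = []
    seen = set()
    for i, note in enumerate(notes):
        text = note if isinstance(note, str) else json.dumps(note)
        # Lowercase and collapse whitespace so near-identical notes match.
        normalized = " ".join(text.lower().split())
        if normalized in seen:
            flagged.append((i, "duplicate"))
        seen.add(normalized)
        if any(phrase in normalized for phrase in SUSPICIOUS_PHRASES):
            flagged.append((i, "possible prompt injection"))
    return flagged


def review_file(path):
    """Load memory.json and review its notes (assumes a 'notes' list)."""
    with open(path, encoding="utf-8") as f:
        return review_notes(json.load(f)["notes"])
```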

2. Archive history

Per-chat history can pile up. Archive the old files:

cd ~/agent/data/history
# Move files older than 30 days into archive/
mkdir -p archive
find . -maxdepth 1 -name "*.json" -mtime +30 -exec mv {} archive/ \;
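
The same archiving step can be sketched in Python, for example to call it from the bot or from a scheduled job. archive_old_history is a hypothetical helper; it compares modification times against a cutoff, which is roughly (not exactly) what find -mtime +30 does.

```python
import shutil
import time
from pathlib import Path


def archive_old_history(history_dir, max_age_days=30, now=None):
    """Move *.json files older than max_age_days into history_dir/archive.

    Returns the names of the files that were moved.
    """
    history_dir = Path(history_dir)
    archive_dir = history_dir / "archive"
    archive_dir.mkdir(exist_ok=True)
    current = time.time() if now is None else now
    cutoff = current - max_age_days * 86400
    moved = []
    # glob("*.json") is not recursive, so files already in archive/ are skipped.
    for path in sorted(history_dir.glob("*.json")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            shutil.move(str(path), str(archive_dir / path.name))
            moved.append(path.name)
    return moved
```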

3. Audit credentials

ls -la ~/agent/credentials/
  • Which tokens are still active?
  • Which ones are no longer used?
  • Which ones need rotation (older than 90 days)?

Rotate the tokens that are in use:

  1. Generate a new token at the provider
  2. Update ~/agent/credentials/<platform>.env
  3. Test that the agent still runs
  4. Revoke the old token
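
To answer the "older than 90 days" question quickly, you could list credential files by age. This sketch uses the file's modification time as a stand-in for the rotation date, which only holds if the file was last edited when the token was rotated; stale_credentials is a hypothetical helper.

```python
import time
from pathlib import Path


def stale_credentials(cred_dir, max_age_days=90, now=None):
    """Return (filename, age_in_days) for files not modified in max_age_days."""
    current = time.time() if now is None else now
    stale = []
    for path in sorted(Path(cred_dir).iterdir()):
        if path.is_file():
            age_days = int((current - path.stat().st_mtime) // 86400)
            if age_days > max_age_days:
                stale.append((path.name, age_days))
    return stale
```

For example, stale_credentials(Path.home() / "agent" / "credentials") returns the files that are due for rotation under this assumption.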

4. SOUL.md review

Open ~/agent/SOUL.md and read it slowly:

  • Is the identity still accurate?
  • Are the capabilities up to date with the tools that exist?
  • Did the bot show any odd patterns over the last month that should be fixed in SOUL.md?
  • Remove sections that are no longer relevant and add new ones.

After editing, reset the history so old patterns do not carry over, by sending this in Telegram:

/reset

5. Cost audit

If you use paid services:

  • LLM API: how many tokens did you use this month? Check the provider dashboard.
  • VPS: if it is not on a free tier, how much were you charged this month?
  • Database / storage: how many GB are used?

If you are over budget, optimize:

  • Switch to a cheaper LLM model (gpt-4o-mini, claude-haiku, llama via Groq)
  • Reduce the history window in the system prompt
  • Clean up old data
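
Two of these levers are easy to reason about in code. The sketch below shows one way to cap the history window and a back-of-the-envelope cost estimate. Both functions are hypothetical, and the per-million-token prices are parameters you fill in from your provider's pricing page, not real figures.

```python
def trim_history(messages, max_messages=20):
    """Keep only the most recent messages before sending them to the LLM."""
    if max_messages <= 0:
        return []
    return messages[-max_messages:]


def estimate_cost(input_tokens, output_tokens,
                  input_price_per_mtok, output_price_per_mtok):
    """Estimate spend from token counts and prices per million tokens."""
    return (input_tokens / 1_000_000 * input_price_per_mtok
            + output_tokens / 1_000_000 * output_price_per_mtok)
```

With made-up prices of 1.0 per million input tokens and 4.0 per million output tokens, 2,000,000 input tokens and 500,000 output tokens come to 2.0 + 2.0 = 4.0. Halving the history window cuts roughly the input side of that sum, since the history is resent on every request.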

Quarterly (every 3 months)

1. Provider review

Are you still using the most cost-effective provider?

  • AWS Free Tier ends after month 12 → migrate to Oracle / Hetzner
  • Provider X raises prices → consider an alternative

2. Security audit

# Check SSH attempts
sudo journalctl _COMM=sshd --since "30 days ago" | grep "Failed password" | wc -l

# Check logged-in users and recent logins
who
last -n 20

# Check listening ports (ss ships with Ubuntu; netstat needs the net-tools package)
sudo ss -tlnp
# Only the ports you expect (22 for SSH, plus whatever the bot needs)?

# Check credential file permissions
ls -la ~/agent/credentials/
# Are all files 600?

# Check SOUL.md
ls -la ~/agent/SOUL.md
# 644 or 600?
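
The "are all files 600?" check is easy to automate. A minimal sketch, assuming the credentials live as plain files in one directory; too_permissive is a hypothetical helper that flags any file with group or other permission bits set.

```python
import stat
from pathlib import Path


def too_permissive(cred_dir):
    """Return (filename, mode) for files accessible by group or others."""
    findings = []
    for path in sorted(Path(cred_dir).iterdir()):
        if path.is_file():
            mode = stat.S_IMODE(path.stat().st_mode)
            # 0o077 masks the group and other bits; 600 and 400 pass.
            if mode & 0o077:
                findings.append((path.name, oct(mode)))
    return findings
```

An empty list means every file is readable and writable by the owner only. Anything returned can be fixed with chmod 600.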

3. Backup restore test

A backup is useless if the restore does not work. Test it:

# Download the latest backup
aws s3 cp s3://my-backups/agent-latest.tar.gz /tmp/

# Extract into a temporary folder
mkdir -p /tmp/restore-test
tar -xzf /tmp/agent-latest.tar.gz -C /tmp/restore-test

# Verify the contents
ls -la /tmp/restore-test/data/
cat /tmp/restore-test/data/memory.json | jq '.notes | length'

# Cleanup
rm -rf /tmp/restore-test /tmp/agent-latest.tar.gz

If anything errors out, fix the backup pipeline.
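
The verification step can also be scripted so it fails loudly when the archive is corrupt or memory.json is missing. This sketch reads the archive in place with the standard tarfile module. verify_backup is a hypothetical helper, and it assumes the archive contains a data/memory.json member with a top-level "notes" list, as in the commands above.

```python
import json
import tarfile


def verify_backup(archive_path, member_suffix="data/memory.json"):
    """Read memory.json straight from the archive and return the note count.

    Raises FileNotFoundError if no member ends with member_suffix, and
    lets tarfile/json errors propagate if the archive or file is damaged.
    """
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            if member.isfile() and member.name.endswith(member_suffix):
                with tar.extractfile(member) as f:
                    memory = json.load(f)
                return len(memory["notes"])
    raise FileNotFoundError(f"{member_suffix} not found in {archive_path}")
```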

4. Disaster recovery drill

Scenario: your VPS is suspended and you have to migrate to a new VPS within 1 hour. Can you do it?

Steps:

  1. Sign up for a new VPS
  2. SSH into the new VPS
  3. Clone the repo: git clone https://github.com/user/agent.git
  4. Set up the environment: python3 -m venv venv && source venv/bin/activate && pip install -r requirements.txt
  5. Restore data: aws s3 cp s3://my-backups/agent-latest.tar.gz . && tar -xzf agent-latest.tar.gz
  6. Set up systemd
  7. Start the service
  8. Update DNS / the Telegram webhook if needed

If you cannot do it within 1 hour, improve your documentation and scripts.

Yearly

1. Major upgrades

  • Ubuntu LTS upgrade (22.04 → 24.04 → 26.04 when available)
  • Python major version (3.12 → 3.13)
  • Library major versions

Test first on a staging or cloned VPS.

2. SOUL.md philosophical review

SOUL.md has been active for a year. Questions to ask:

  • Does the core philosophy still fit? (you may have matured and need to adjust it)
  • Is the identity still relevant? (you may want to rename it or change the tone)
  • Are the boundaries strict enough, or loose enough?
  • Are the autonomy levels still appropriate?

Rewrite if needed. Your bot may have evolved away from its original use case.

3. Stack evaluation

  • Is your LLM model still SOTA?
  • Are your Python libraries still maintained?
  • Is Telegram still the platform you use? (Discord might be a better fit now)

Weigh the switching cost against the benefit. If the benefit exceeds the cost, plan a migration.

Update best practices

Always backup before update

# Before git pull / pip upgrade / etc.
tar -czf ~/agent-pre-update-$(date +%Y%m%d).tar.gz ~/agent

Update sequence

# 1. Save state
sudo systemctl stop kai-bot

# 2. Backup
tar -czf ~/agent-backup-$(date +%Y%m%d).tar.gz ~/agent/data ~/agent/credentials

# 3. Update code
cd ~/agent
git pull

# 4. Update deps
source venv/bin/activate
pip install -r requirements.txt --upgrade

# 5. Run tests (if any)
# pytest

# 6. Start service
sudo systemctl start kai-bot

# 7. Verify
sleep 5
sudo systemctl is-active kai-bot
sudo journalctl -u kai-bot -n 20 --no-pager

Rollback if needed

# Stop current
sudo systemctl stop kai-bot

# Restore from backup
# (tar stored the paths relative to /, e.g. home/ubuntu/agent/data, so extract at /)
tar -xzf ~/agent-backup-20260514.tar.gz -C /

# Revert code
cd ~/agent
git log --oneline -5
git checkout <previous-commit>

# Start
sudo systemctl start kai-bot

Anti-patterns

❌ Updating without a backup

git pull  # ← can break, and there is no rollback

Always back up first.

❌ Updating without testing

git pull && sudo systemctl restart kai-bot
# the bot crashes, and you only find out after users complain

Test manually first (run python main.py in the foreground), then switch back to systemd.

❌ Forgetting to monitor after an update

The bot can fail silently (running but not responding correctly). Check it several times after an update.

❌ Editing SOUL.md without version control

nano ~/agent/SOUL.md
# if it breaks, there is no way to roll back

Use git for SOUL.md:

cd ~/agent
git add SOUL.md
git commit -m "soul: tweak communication tone"

Rolling back is easy:

git log SOUL.md
git checkout <commit> -- SOUL.md

Simple monitoring dashboard

Create a command that summarizes the status:

cat > ~/agent/status.sh << 'EOF'
#!/bin/bash
echo "=== Kai Agent Status ==="
echo ""
echo "Service:"
systemctl is-active kai-bot
systemctl is-enabled kai-bot
echo ""
echo "Memory:"
free -h | head -2
echo ""
echo "Disk:"
df -h / | tail -1
echo ""
echo "Recent errors (24h):"
journalctl -u kai-bot -p err --since "24 hours ago" -q --no-pager | wc -l
echo ""
echo "Uptime:"
uptime
EOF

chmod +x ~/agent/status.sh
~/agent/status.sh

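Or make it available as a Telegram command:

async def status_cmd(update, ctx):
    """Reply with the output of status.sh; only the owner may use it."""
    # is_owner and run_shell are assumed to be helpers defined elsewhere in the bot.
    if not is_owner(update.effective_user.id):
        return
    result = await run_shell("/home/ubuntu/agent/status.sh")
    # Telegram caps messages at 4096 characters, so truncate; never send an empty text.
    await update.message.reply_text(result[:3000] or "(no output)")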
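
The handler above relies on two helpers that are not shown. If your bot does not have them yet, they could look roughly like this. Both are assumptions for illustration: is_owner compares against the OWNER_TELEGRAM_ID value from .env, and run_shell runs a program with asyncio and returns its combined output, with a timeout so a hung script cannot block the bot.

```python
import asyncio
import os


def is_owner(user_id):
    """True only if user_id matches OWNER_TELEGRAM_ID from the environment."""
    owner = os.environ.get("OWNER_TELEGRAM_ID", "")
    return owner != "" and str(user_id) == owner


async def run_shell(*cmd, timeout=30):
    """Run a program (no shell involved) and return stdout and stderr as text."""
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,
    )
    try:
        out, _ = await asyncio.wait_for(proc.communicate(), timeout=timeout)
    except asyncio.TimeoutError:
        # Do not leave a stuck process behind.
        proc.kill()
        await proc.wait()
        return f"timed out after {timeout}s"
    return out.decode(errors="replace")
```

Passing the command as separate arguments instead of one shell string avoids shell injection if any part of it ever comes from user input.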