
7. Verification

An agent must not say "done" without evidence.

One of the most common problems with AI agents is hallucinated success: the model says "it's finished" even though the command failed. The verification pillar forces the agent to verify the result before it claims success.

What must be there

## VERIFICATION

After running any command, VERIFY the result before saying "done":
- After mkdir → ls to confirm the folder exists
- After git commit → git status to confirm the working tree is clean
- After git push → check the remote output
- After deploy → curl the endpoint or check the service status
- After a wallet transfer → check the tx hash in an explorer
- After a social media post → check that the post URL exists

If the output is an error → read the error, try to fix it, and only then report to the user.
If verification fails → say "verification failed, [reason]"; do not claim success.

Verification patterns per action type

File operations

mkdir foo/  →  ls -la foo/
touch f.txt →  ls -la f.txt
echo > f.txt → cat f.txt
rm f.txt →   ls f.txt 2>&1 (expect "No such file")
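
The same action-then-read pairs can be expressed in Python instead of shell. This is a minimal sketch, assuming the agent's tools are plain Python functions; `make_dir_verified` and `remove_file_verified` are hypothetical helper names, not part of any framework.

```python
from pathlib import Path


def make_dir_verified(path: str) -> bool:
    """Create a directory, then confirm it exists (the mkdir -> ls pair)."""
    target = Path(path)
    target.mkdir(parents=True, exist_ok=True)
    # Verification is a separate read of the filesystem,
    # not merely the absence of an exception above.
    return target.is_dir()


def remove_file_verified(path: str) -> bool:
    """Delete a file, then confirm it is gone (the rm -> ls pair)."""
    target = Path(path)
    target.unlink(missing_ok=True)  # missing_ok needs Python 3.8+
    return not target.exists()
```

Both helpers return the result of the check rather than of the action, so the caller reports what was observed.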

Git

git commit -m "..."  →  git log -1 --oneline (expect the new commit), then git status (expect "nothing to commit, working tree clean" if everything was staged)
git push origin x →  git log origin/x --oneline -1 (expect the latest commit to be visible)
git pull → git log --oneline -5 (expect the latest commits to be present)
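
A commit check can also be scripted. The sketch below assumes `git` is on the PATH and uses `git status --porcelain`, which prints one line per changed path and nothing for a clean tree. `run_git` and `commit_is_verified` are hypothetical names; the `git` parameter exists only so the check can be exercised without a real repository.

```python
import subprocess
from typing import Callable, Sequence


def run_git(args: Sequence[str], cwd: str = ".") -> str:
    """Run a git command and return its stdout; raises if git exits non-zero."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, capture_output=True, text=True, check=True
    )
    return result.stdout


def commit_is_verified(
    expected_subject: str, git: Callable[[Sequence[str]], str] = run_git
) -> bool:
    """Check that HEAD carries the expected subject and nothing is left uncommitted."""
    head_subject = git(["log", "-1", "--format=%s"]).strip()
    # Empty porcelain output means no staged, unstaged, or untracked changes.
    pending = git(["status", "--porcelain"]).strip()
    return head_subject == expected_subject and pending == ""
```

Comparing the subject line of HEAD guards against the case where the tree is clean only because the commit never happened.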

Service management

systemctl restart x → systemctl is-active x (expect "active")
systemctl stop x → systemctl is-active x (expect "inactive")
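
The service checks follow the same shape. A rough sketch, assuming a systemd host; `service_state` and `verify_service` are hypothetical helpers, and `get_state` is injectable so the comparison can be tried without systemd.

```python
import subprocess
from typing import Callable


def service_state(name: str) -> str:
    """Return the state word printed by `systemctl is-active`, e.g. "active"."""
    # No check=True here: is-active exits non-zero for every state except "active".
    result = subprocess.run(
        ["systemctl", "is-active", name], capture_output=True, text=True
    )
    return result.stdout.strip()


def verify_service(
    name: str, expected: str, get_state: Callable[[str], str] = service_state
) -> bool:
    """Compare the observed state with the one the action should have produced."""
    return get_state(name) == expected
```

After a restart the caller would pass `expected="active"`, after a stop `expected="inactive"`.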

Network/HTTP

curl POST → check status code 200/201
deploy app → curl http://localhost:port (expect specific response)

Wallet/crypto

send tx → check the tx hash in a blockchain explorer
check confirmation → expect >= 1 confirmation
verify amount → expect to_address balance increased by amount

Social media

post tweet → check the post URL (https://twitter.com/user/status/...)
delete tweet → check for a 404
follow user → check that the following list is updated

Verification flow in code

The agent gets its verification mechanism through the tool_use chain:

Step 1: User asks "create folder test"
Step 2: Agent outputs <tool_use>mkdir test</tool_use>
Step 3: System executes, returns "" (mkdir is silent on success)
Step 4: Agent SHOULDN'T jump to "done"; it has to verify first
Step 5: Agent outputs <tool_use>ls -la test</tool_use>
Step 6: System executes, returns "total 0 ..."
Step 7: Agent can NOW say "Done. Folder test created."
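
A harness can enforce this ordering instead of relying on the prompt alone. The sketch below is one possible guard, not a feature of any agent framework: `ToolCall`, `STATE_CHANGING`, and `may_say_done` are hypothetical names, and the prefix list is an assumption about which commands change state. It is deliberately crude, since it treats any later tool call as the verification step.

```python
from dataclasses import dataclass
from typing import List

# Assumption: commands starting with these prefixes change state.
STATE_CHANGING = ("mkdir", "rm", "touch", "git commit", "git push", "systemctl restart")


@dataclass
class ToolCall:
    command: str
    output: str


def may_say_done(history: List[ToolCall]) -> bool:
    """Allow a final "done" only if the last state-changing call was followed by another call."""
    last_change = -1
    for index, call in enumerate(history):
        if call.command.startswith(STATE_CHANGING):
            last_change = index
    # Either nothing changed state, or at least one later call acted as verification.
    return last_change == -1 or last_change < len(history) - 1
```

With the flow above, `may_say_done` is false after Step 3 and becomes true after Step 6.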

To encourage this in SOUL.md:

DO NOT say "done" / "finished" / "complete" without an explicit 
verification step in the tool_use loop. Every state-changing action 
must be followed by a verification tool_use before the final answer.

Verifications that are often skipped

1. Side effect verification

A command can "succeed" (exit code 0) while its effect differs from what you expected:

git push origin main
# Exit code 0, but the remote may still be lagging
# Real verification: check that the remote has the new commit via gh api

2. Persistence verification

State in memory != state on disk:

echo "new content" > config.json
# The write may still sit in the OS cache, not yet flushed to disk
# Real verification: cat config.json to confirm the content, and run sync when the data must survive a crash
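
Reading a file back confirms its content; making sure the bytes reached storage is a separate step. A minimal Python sketch of both, where `write_durably` is a hypothetical helper name:

```python
import os


def write_durably(path: str, content: str) -> bool:
    """Write text, push it to stable storage, then read it back and compare."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(content)
        handle.flush()             # Python's buffer -> operating system
        os.fsync(handle.fileno())  # operating system cache -> storage device
    with open(path, encoding="utf-8") as handle:
        return handle.read() == content
```

The read-back catches wrong or truncated content, while `os.fsync` addresses durability, which a plain read cannot confirm.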

3. Cross-system verification

An action in system A has its effect in system B:

Buy crypto on Binance
# Order ID 12345 (on Binance)
# Real verification: check that the wallet balance increased (on the blockchain)

Multi-step verification

For complex tasks, verify each step, not just the end result:

Task: deploy new feature

Step 1: git pull origin main → verify: git log latest commit
Step 2: pip install -r requirements.txt → verify: pip list | grep new-package
Step 3: pytest → verify: tests passed (exit code 0)
Step 4: systemctl restart api → verify: systemctl is-active api → expect "active"
Step 5: curl localhost:8000/health → verify: response "ok"

If verification fails at step 2, stop. Do not continue to step 3.
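
The stop-on-first-failure rule can be sketched as a small runner. This is an illustration under simple assumptions: each step is a `(name, action, verify)` tuple of plain callables, and `run_steps` is a hypothetical name.

```python
from typing import Callable, List, Optional, Tuple

# (name, action, verify): the action changes state, verify reads it back.
Step = Tuple[str, Callable[[], None], Callable[[], bool]]


def run_steps(steps: List[Step]) -> Optional[str]:
    """Run each step and its check in order; return the name of the first failed step."""
    for name, action, verify in steps:
        action()
        if not verify():
            return name  # stop here: later steps depend on this one
    return None  # every step was verified
```

Returning the name of the failed step gives the agent something concrete to report instead of a generic failure.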

Verification with retry

For async/network operations, verification sometimes needs retries:

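import time

import requests


def verify_deploy(url, max_retries=10, delay=2):
    """Poll url until it answers 200; give up after max_retries attempts."""
    for _ in range(max_retries):
        try:
            r = requests.get(url, timeout=5)
            if r.status_code == 200:
                return True
        except requests.RequestException:
            # Connection refused or timeout: the service may still be starting.
            pass
        time.sleep(delay)
    return False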

For blockchain transactions:

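import time

from web3.exceptions import TransactionNotFound


def verify_tx(w3, tx_hash, max_retries=30, delay=10):
    """w3 is a connected web3.Web3 instance; True only once the tx is mined with status 1."""
    for _ in range(max_retries):
        try:
            receipt = w3.eth.get_transaction_receipt(tx_hash)
            return receipt["status"] == 1  # 0 means reverted: retrying will not help
        except TransactionNotFound:
            time.sleep(delay)  # not mined yet (web3.py v5+ raises instead of returning None)
    return False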

SOUL.md note for async:

For async actions (blockchain, deploy, external API), 
verification may need retries. Wait until the state is observable. 
If it still has not appeared after X minutes, treat it as failed and report.
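
The "wait until observable, then give up" rule can be sketched as a generic poller with a deadline. `wait_until` is a hypothetical helper; the timeout and interval values are left to the caller, since the right limit depends on the system being checked.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout_s: float, interval_s: float = 1.0) -> bool:
    """Poll `check` until it returns True or `timeout_s` seconds have elapsed."""
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        if time.monotonic() + interval_s > deadline:
            return False  # the next poll would land past the deadline
        time.sleep(interval_s)
```

A deadline expressed in seconds is easier to map onto a time limit than a retry count, because slow checks do not silently stretch the total wait.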

Anti-patterns

❌ Trust without verify

Agent: [run command]
Agent: Done. Folder created.

vs

Agent: [run mkdir test]
Agent: [run ls -la test]
Agent: Done. Folder test created. Empty, owner ubuntu.

❌ Irrelevant verification

Action: post tweet
Verification: ping twitter.com

A ping does not confirm that the tweet was posted. Verification has to be specific to the action.

❌ Skipping verification because "it already looks right"

Agent: [git push] (exit code 0)
Agent: Push success.

But exit 0 does not always mean the remote accepted it. Verify with gh api repos/x/y/commits/main or an equivalent.

❌ Verifying with a tool that shares the same root cause

If mkdir foo fails because the disk is full, ls foo will fail too, but its error message will not say why. Use independent diagnostic tools:

mkdir foo  →  check df -h first (is the disk full?)
              →  ls foo (does the folder exist?)
              →  stat foo (detailed info)
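
The three independent checks map onto standard-library calls. A small sketch, with `diagnose_dir` as a hypothetical helper that gathers the facts in one place:

```python
import os
import shutil


def diagnose_dir(path: str) -> dict:
    """Collect independent facts about a directory that was supposed to be created."""
    parent = os.path.dirname(os.path.abspath(path))
    usage = shutil.disk_usage(parent)  # like df: measured on the parent, which should exist
    exists = os.path.isdir(path)       # like ls
    return {
        "free_bytes": usage.free,
        "exists": exists,
        # like stat: permission bits, only available when the directory exists
        "mode": oct(os.stat(path).st_mode & 0o777) if exists else None,
    }
```

Because free space is measured on the parent directory, the disk check still works when the target itself was never created.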

SOUL.md template

## VERIFICATION

After a state-changing action, output a verification tool_use 
before the final reply.

Patterns:
- Create file/folder → ls (read)
- Modify file → cat (read)
- Delete → ls expect "No such file"
- Service restart → systemctl is-active expect "active"
- Git commit → git status expect "clean"
- Git push → gh api commits expect latest
- Deploy → curl endpoint expect specific response
- Wallet tx → check the tx hash in an explorer
- Social media post → check the post URL

If verification fails:
1. Read the error message
2. Retry or apply a fix (max 3 attempts total)
3. If it still fails, report to the user with the error details

DO NOT say "done" without explicit verification.
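
On the harness side, the failure-handling rules in the template could look roughly like this. It is a sketch under assumptions: `act_and_verify` is a hypothetical name, the action and check are plain callables, and the report strings are illustrative rather than a fixed format.

```python
from typing import Callable, Tuple


def act_and_verify(
    action: Callable[[], None], verify: Callable[[], bool], max_attempts: int = 3
) -> Tuple[bool, str]:
    """Run action + verification up to max_attempts times and build the report for the user."""
    last_error = "verification returned False"
    for attempt in range(1, max_attempts + 1):
        try:
            action()
            if verify():
                return True, f"done (verified on attempt {attempt})"
            last_error = "verification returned False"
        except Exception as exc:  # keep the message so it can be reported
            last_error = f"{type(exc).__name__}: {exc}"
    return False, f"verification failed after {max_attempts} attempts: {last_error}"
```

The success message is only produced on the branch where `verify()` returned True, so "done" cannot be emitted without a passing check.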