7. Verification¶
An agent must not say "done" without proof.
One of the most common problems with AI agents is hallucinated success: the model says "all done" even though the command failed. The verification pillar forces the agent to verify first before claiming success.
What must be there¶
## VERIFICATION
After running any command, VERIFY the result before saying "done":
- After mkdir → ls to confirm the folder exists
- After git commit → git status to confirm the working tree is clean
- After git push → check the remote output
- After deploy → curl the endpoint or check the service status
- After a wallet transfer → check the tx hash in the explorer
- After a social media post → check that the post URL exists
If the output is an error → read the error, try to fix it, then report to the user.
If verification fails → say "verify failed, [reason]", don't claim success.
Verification patterns per action type¶
File operations¶
mkdir foo/ → ls -la foo/
touch f.txt → ls -la f.txt
echo > f.txt → cat f.txt
rm f.txt → ls f.txt 2>&1 (expect "No such file")
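The same file checks can be done in-process instead of shelling out. This is a minimal sketch, assuming the agent's tools run inside a Python process; the helper names (`verify_dir_created`, `verify_file_content`, `verify_removed`) are hypothetical and not part of any framework:

```python
from pathlib import Path


def verify_dir_created(path: str) -> bool:
    """Counterpart of `ls -la foo/`: the path exists and is a directory."""
    return Path(path).is_dir()


def verify_file_content(path: str, expected: str) -> bool:
    """Counterpart of `cat f.txt`: the file exists and holds the expected text."""
    p = Path(path)
    return p.is_file() and p.read_text(encoding="utf-8") == expected


def verify_removed(path: str) -> bool:
    """Counterpart of expecting "No such file" from `ls f.txt`."""
    return not Path(path).exists()
```

Each helper returns a plain boolean, so the agent loop can refuse to report success when any of them returns False.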
Git¶
git commit -m "..." → git status (expect "nothing to commit, working tree clean")
git push origin x → git log origin/x --oneline -1 (expect latest commit visible)
git pull → git log --oneline -5 (expect the latest commits to be present)
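One caveat on the push check: `git log origin/x` reads the local remote-tracking ref, while `git ls-remote` asks the remote itself. The sketch below wraps both ideas. It assumes `git` is on PATH and that the remote is reachable; the function names are illustrative only:

```python
import subprocess


def git(*args: str, cwd: str = ".") -> str:
    """Run a git command and return its stdout; raises if git exits non-zero."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def working_tree_clean(porcelain_output: str) -> bool:
    """`git status --porcelain` prints nothing when there is nothing to commit."""
    return porcelain_output.strip() == ""


def remote_has_commit(ls_remote_output: str, local_sha: str) -> bool:
    """`git ls-remote` prints one "<sha>\\t<ref>" line per matching ref."""
    return any(
        line.split("\t")[0] == local_sha
        for line in ls_remote_output.splitlines()
        if line
    )


def verify_commit(cwd: str = ".") -> bool:
    return working_tree_clean(git("status", "--porcelain", cwd=cwd))


def verify_push(branch: str, remote: str = "origin", cwd: str = ".") -> bool:
    local_sha = git("rev-parse", branch, cwd=cwd)
    remote_out = git("ls-remote", remote, f"refs/heads/{branch}", cwd=cwd)
    return remote_has_commit(remote_out, local_sha)
```

The parsing is kept in separate pure functions so it can be checked without a real repository.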
Service management¶
systemctl restart x → systemctl is-active x (expect "active")
systemctl stop x → systemctl is-active x (expect "inactive")
Network/HTTP¶
curl POST → check status code 200/201
deploy app → curl http://localhost:port (expect specific response)
Wallet/crypto¶
send tx → check the tx hash in a blockchain explorer
check confirmation → expect >= 1 confirmation
verify amount → expect to_address balance increased by amount
Social media¶
post tweet → check the post URL (https://twitter.com/user/status/...)
delete tweet → check for a 404
follow user → check that the following list is updated
Verification flow in code¶
The agent gets its verification mechanism via a tool_use chain:
Step 1: User asks "create folder test"
Step 2: Agent outputs <tool_use>mkdir test</tool_use>
Step 3: System executes, returns "" (mkdir is silent on success)
Step 4: Agent SHOULDN'T jump to "done", it must verify first
Step 5: Agent outputs <tool_use>ls -la test</tool_use>
Step 6: System executes, returns "total 0 ..."
Step 7: Agent NOW says "Done. Folder test created."
To encourage this in SOUL.md:
NEVER say "done" / "finished" / "completed" without an explicit
verification step in the tool_use loop. Every state-changing action
must be followed by a verification tool_use before the final answer.
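A prompt rule can be backed by a check in the loop itself. The sketch below is one possible guard, not how any particular agent framework does it; the class name, the prefix list, and the message format are all assumptions made for illustration:

```python
# Illustrative list of command prefixes treated as state-changing (an assumption).
STATE_CHANGING_PREFIXES = (
    "mkdir", "touch", "rm ", "git commit", "git push",
    "systemctl restart", "systemctl stop",
)


class VerificationGuard:
    """Tracks state-changing commands that have no passing verification yet."""

    def __init__(self):
        self._unverified = []

    def record_action(self, command):
        # Read-only commands such as `ls` are ignored.
        if command.startswith(STATE_CHANGING_PREFIXES):
            self._unverified.append(command)

    def record_verification(self, action, passed):
        # Only a passing check clears the action it belongs to.
        if passed and action in self._unverified:
            self._unverified.remove(action)

    def final_answer(self, text):
        # Block a success claim while anything is still unverified.
        if self._unverified:
            pending = ", ".join(self._unverified)
            return f"verify failed, no passing check for: {pending}"
        return text
```

With this in place, the seven-step flow maps to `record_action("mkdir test")`, then `record_verification("mkdir test", passed=True)` after the `ls`, and only then does `final_answer("Done. Folder test created.")` return the text unchanged.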
Verification that often gets skipped¶
1. Side effect verification¶
A command can "succeed" (exit code 0) while its actual effect differs:
git push origin main
# Exit code 0, but the remote may still be lagging
# Real verification: check that the remote has the new commit via gh api
2. Persistence verification¶
State in memory != state on disk:
echo "new content" > config.json
# May still be buffered, not yet flushed to disk
# Real verification: cat config.json confirms the content is readable.
# The read may be served from the page cache, so run sync first if durability matters.
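When the write happens in Python rather than in the shell, the flush can be made explicit. A minimal sketch, with `write_and_verify` as a hypothetical helper name: it flushes and fsyncs the file, then reads it back through a fresh handle. Reading back does not bypass the OS page cache, so the fsync is what addresses durability and the read-back only confirms content.

```python
import os


def write_and_verify(path: str, content: str) -> bool:
    """Write content, push it to disk, then confirm it reads back unchanged."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)
        f.flush()              # Python buffer -> OS
        os.fsync(f.fileno())   # OS buffer -> storage device
    with open(path, "r", encoding="utf-8") as f:
        return f.read() == content
```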
3. Cross-system verification¶
Action in system A, effect in system B:
Buy crypto on Binance
# Order ID 12345 (on Binance)
# Real verification: check that the wallet balance increased (on the blockchain)
Multi-step verification¶
For complex tasks, verify after each step, not only at the end:
Task: deploy new feature
Step 1: git pull origin main → verify: git log latest commit
Step 2: pip install -r requirements.txt → verify: pip list | grep new-package
Step 3: pytest → verify: tests passed (exit code 0)
Step 4: systemctl restart api → verify: systemctl is-active api → expect "active"
Step 5: curl localhost:8000/health → verify: response "ok"
If verification fails at step 2, stop. Don't continue to step 3.
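The stop-at-first-failure rule can be sketched as a small runner. This is an illustration under assumptions: `run_pipeline` and the `(name, action, verify)` step shape are made up here, and a real version would wrap the shell commands from the task list in the callables.

```python
from typing import Callable, List, Optional, Tuple

# (step name, action to run, check that returns True when the step worked)
Step = Tuple[str, Callable[[], None], Callable[[], bool]]


def run_pipeline(steps: List[Step]) -> Tuple[List[str], Optional[str]]:
    """Run steps in order; stop at the first step whose verification fails.

    Returns (names of verified steps, name of the failed step or None).
    """
    completed: List[str] = []
    for name, action, verify in steps:
        action()
        if not verify():
            return completed, name  # later steps are never started
        completed.append(name)
    return completed, None
```

The return value gives the agent exactly what it needs for the report: which steps are confirmed and where it stopped.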
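Verification with retry¶
For async/network operations, verification sometimes needs retries:
import time
import requests

def verify_deploy(url, max_retries=10, delay=2):
    # Poll the URL until it returns HTTP 200 or the retries run out
    for _ in range(max_retries):
        try:
            r = requests.get(url, timeout=5)
            if r.status_code == 200:
                return True
        except requests.RequestException:
            pass  # service not reachable yet, retry
        time.sleep(delay)
    return False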
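For blockchain transactions:
import time
from web3.exceptions import TransactionNotFound

def verify_tx(w3, tx_hash, max_retries=30, delay=10):
    # w3 is a connected web3.Web3 instance
    for _ in range(max_retries):
        try:
            receipt = w3.eth.get_transaction_receipt(tx_hash)
            # status 1 = success, 0 = reverted (retrying won't change it)
            return receipt["status"] == 1
        except TransactionNotFound:
            time.sleep(delay)  # not mined yet, retry
    return False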
SOUL.md note for async:
For async actions (blockchain, deploy, external APIs),
verification may need retries. Wait until the state is observable.
If it hasn't shown up after X minutes, treat it as failed and report.
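The "give up after X minutes" rule is a deadline rather than a retry count. A minimal sketch of a deadline-based poller follows; `wait_until` and its parameters are hypothetical names, and the timeout is left to the caller because the note above does not fix a value. The `clock` and `sleep` arguments exist only so the helper can be exercised without real waiting.

```python
import time
from typing import Callable


def wait_until(
    check: Callable[[], bool],
    timeout_s: float,
    interval_s: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll check() until it returns True or timeout_s seconds have passed."""
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False  # state never became observable: treat as failed
        sleep(min(interval_s, remaining))
```

A call such as `wait_until(lambda: verify_once(url), timeout_s=120)` would cover a deploy check, where `verify_once` stands for any single-shot check that returns a boolean.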
Anti-patterns¶
❌ Trust without verify¶
Instead, verify before answering:
Agent: [run mkdir test]
Agent: [run ls -la test]
Agent: Done. Folder test created. Empty, owner ubuntu.
❌ Irrelevant verification¶
A ping doesn't confirm that the tweet was posted. Verification must be specific to the action.
❌ Skipping verification because "it already looks right"¶
Exit code 0 alone doesn't always mean the remote accepted the change. Verify with gh api repos/x/y/commits/main or an equivalent.
❌ Verifying with a tool that shares the same root cause¶
If mkdir foo fails because the disk is full, ls foo fails too, but its error message won't tell you why. Use independent diagnostic tools:
mkdir foo → check df -h first (is the disk full?)
→ ls foo (does the folder exist?)
→ stat foo (detailed info)
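The three independent checks can be gathered into one report. This is a sketch only: `diagnose_mkdir` is a hypothetical helper, it assumes the parent directory exists, and it uses the standard library calls `shutil.disk_usage`, `os.access` and `os.stat` in place of `df`, `ls` and `stat`.

```python
import os
import shutil


def diagnose_mkdir(path: str) -> dict:
    """Collect independent facts about why a directory may be missing."""
    parent = os.path.dirname(os.path.abspath(path))
    usage = shutil.disk_usage(parent)  # like `df`: raises if parent is missing
    report = {
        "free_bytes": usage.free,
        "parent_writable": os.access(parent, os.W_OK),
        "exists": os.path.isdir(path),  # like `ls foo`
    }
    if report["exists"]:
        st = os.stat(path)  # like `stat foo`
        report["mode"] = oct(st.st_mode & 0o777)
    return report
```

A report with `exists` False and very low `free_bytes` points at the disk, while `parent_writable` False points at permissions.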
SOUL.md template¶
## VERIFICATION
After any state-changing action, emit a verification tool_use
before the final reply.
Patterns:
- Create file/folder → ls (read)
- Modify file → cat (read)
- Delete → ls, expect "not found"
- Service restart → systemctl is-active, expect "active"
- Git commit → git status, expect "clean"
- Git push → gh api commits, expect latest
- Deploy → curl endpoint, expect specific response
- Wallet tx → check the tx hash in the explorer
- Social media post → check the post URL
If verification fails:
1. Read the error message
2. Retry or fix (max 3 attempts total)
3. If it still fails, report to the user with the error details
NEVER say "done" without explicit verification.
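The failure policy in the template (read the error, retry up to three attempts, then report) can also be enforced outside the prompt. A minimal sketch, with `act_and_verify` and its result shape as assumptions:

```python
from typing import Callable, Dict


def act_and_verify(
    action: Callable[[], None],
    verify: Callable[[], bool],
    max_attempts: int = 3,
) -> Dict[str, object]:
    """Run action, then verify; retry until verified or attempts run out."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            action()
            if verify():
                return {"ok": True, "attempts": attempt, "error": None}
            last_error = "verification failed"
        except Exception as exc:  # keep the message for the user report
            last_error = str(exc)
    return {"ok": False, "attempts": max_attempts, "error": last_error}
```

Re-running the action is only safe when it is idempotent. For something like a wallet transfer, a safer variant would retry only the `verify` call and never repeat the action itself.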