# Examples

## Manual LLM-style verification
```python
from agentproof import AgentResponse, ChallengeSpec, generate_challenge, verify_response

challenge = generate_challenge(
    ChallengeSpec(
        challenge_type="obfuscated_text_lock",
        difficulty=2,
        options={"template": "amber_sort"},
    )
)

# Send challenge.to_dict() to the client.
response = AgentResponse(
    challenge_id=challenge.challenge_id,
    challenge_type=challenge.challenge_type,
    payload={"answer": str(challenge.private_data["expected_answer"])},
)

result = verify_response(challenge, response)
assert result.ok
```
## Harder multi-pass example
```python
from agentproof import AgentResponse, ChallengeSpec, generate_challenge, verify_response

challenge = generate_challenge(
    ChallengeSpec(
        challenge_type="multi_pass_lock",
        difficulty=2,
        options={"template": "warm_reverse_length"},
    )
)

response = AgentResponse(
    challenge_id=challenge.challenge_id,
    challenge_type=challenge.challenge_type,
    payload={"answer": str(challenge.private_data["expected_answer"])},
)

result = verify_response(challenge, response)
assert result.ok
```
## Public challenge JSON

The `public_payload` does not include the expected answer; the `internal_payload` does.
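To make the split concrete, here is a minimal stdlib sketch of the pattern. The field names (`prompt`, `expected_answer`, and the sample values) are hypothetical, not agentproof's actual schema; the point is only that the public payload is the internal payload minus the secret fields.

```python
import json

# Hypothetical internal payload; agentproof's real fields may differ.
internal_payload = {
    "challenge_id": "abc123",
    "challenge_type": "obfuscated_text_lock",
    "prompt": "Sort the amber tokens.",
    "expected_answer": "amber ember umber",
}

# The public payload is the internal one with the secret fields stripped.
SECRET_FIELDS = {"expected_answer"}
public_payload = {k: v for k, v in internal_payload.items() if k not in SECRET_FIELDS}

print(json.dumps(public_payload, indent=2))
assert "expected_answer" not in public_payload
```

Only the public form is sent to the client; the server keeps the internal form to verify the response.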
## CLI generation for the obfuscated family
```bash
agentproof generate obfuscated_text_lock \
  --difficulty 2 \
  --template amber_sort \
  --output challenge.internal.json \
  --public-output challenge.public.json
```
## CLI generation for the harder family
```bash
agentproof generate multi_pass_lock \
  --difficulty 2 \
  --template warm_reverse_length \
  --output challenge.internal.json \
  --public-output challenge.public.json
```
## Benchmark report
```python
from agentproof import run_benchmark

report = run_benchmark(
    challenge_type="obfuscated_text_lock",
    iterations=10,
    difficulty=2,
    template="amber_sort",
)
print(report.to_dict())
```
## Baseline family with a bundled solver
```python
from agentproof import ChallengeSpec, generate_challenge, solve_challenge, verify_response

challenge = generate_challenge(
    ChallengeSpec(challenge_type="proof_of_work", difficulty=16, ttl_seconds=120)
)
response = solve_challenge(challenge)
result = verify_response(challenge, response)
assert result.ok
```
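For intuition, proof-of-work solvers of this kind typically search for a nonce whose hash falls below a target determined by the difficulty (here, 16 leading zero bits). This is a generic stdlib sketch of that search, not agentproof's bundled solver; the `"challenge-seed"` prefix is a made-up stand-in for whatever the challenge actually supplies.

```python
import hashlib
from itertools import count

def solve_pow(prefix: str, difficulty_bits: int) -> int:
    """Find a nonce such that sha256(prefix + nonce) has `difficulty_bits` leading zero bits."""
    target = 1 << (256 - difficulty_bits)
    for nonce in count():
        digest = hashlib.sha256(f"{prefix}{nonce}".encode()).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce

nonce = solve_pow("challenge-seed", 16)
```

At difficulty 16 this takes about 2^16 hash evaluations on average, which is why it makes a cheap baseline family.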
## Failure example

If the client sends the wrong format for either LLM family, verification fails and `result.ok` is `False`.
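A minimal sketch of how such a format check can behave. The `verify_payload` helper and `VerifyResult` shape here are hypothetical illustrations, not agentproof's actual verifier or result type:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class VerifyResult:
    ok: bool
    reason: Optional[str] = None

def verify_payload(payload: dict, expected_answer: str) -> VerifyResult:
    """Illustrative only: reject malformed payloads before comparing answers."""
    if not isinstance(payload.get("answer"), str):
        return VerifyResult(ok=False, reason="payload must contain a string 'answer'")
    if payload["answer"] != expected_answer:
        return VerifyResult(ok=False, reason="wrong answer")
    return VerifyResult(ok=True)

print(verify_payload({"answer": 42}, "7"))  # fails: answer is not a string
print(verify_payload({"answer": "7"}, "7"))  # passes
```

Checking the payload shape before the answer keeps format errors distinguishable from wrong answers.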
## Local demo

The repository ships a runnable local demo in the `demo/` directory on GitHub. The demo lets you:
- generate public LLM-family challenges
- paste a manual LLM response into the editor
- use the bundled solver for the baseline families
- inspect the raw JSON and failure modes