Challenge Types
agentproof ships two LLM-capability challenge families and two baseline families.
obfuscated_text_lock
Use this when you want the challenge itself to depend on recovering intent from obfuscated text. This family now uses stronger prompt patterns than the earlier literal style, while keeping exact verification.
What the client does:
- reads a noisy, shuffled instruction prompt
- recovers the rule hidden in the text
- returns a structured JSON payload with an exact answer in uppercase hyphenated form
What the server verifies:
- challenge ID matches
- response is not expired
- payload.answer exists and matches the required format
- the normalized answer equals the private expected answer
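The server-side checks above can be sketched as follows. This is an illustration, not agentproof's actual API: the field names (challenge_id, expires_at, expected_answer) and the UPPERCASE-HYPHENATED regex are assumptions, though the failure-reason strings mirror the ones listed later in this document.

```python
import re
import time

# Assumed format for the exact uppercase hyphenated answer.
ANSWER_FORMAT = re.compile(r"^[A-Z]+(-[A-Z]+)*$")

def verify(challenge: dict, response: dict) -> tuple[bool, str]:
    # Challenge ID must match.
    if response.get("challenge_id") != challenge["id"]:
        return False, "challenge_mismatch"
    # Response must not be expired.
    if time.time() > challenge["expires_at"]:
        return False, "challenge_expired"
    # payload.answer must exist and match the required format.
    answer = response.get("payload", {}).get("answer")
    if answer is None:
        return False, "missing_answer"
    if not ANSWER_FORMAT.match(answer):
        return False, "invalid_answer_format"
    # Normalized answer must equal the private expected answer.
    if answer.strip().upper() != challenge["expected_answer"]:
        return False, "answer_mismatch"
    return True, "ok"
```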
Built-in templates:
- amber_sort
- echo_reverse
- vowel_count
Typical use cases:
- LLM-capability CAPTCHA experiments
- challenge-response gates for LLM-first APIs
- testing whether clients can recover intent from obfuscated text
Important constraint:
- there is intentionally no bundled solver for this family
- the challenge must be solved by an external LLM-capable client
multi_pass_lock
Use this when the single-step obfuscated family is not enough and you want a harder prompt that requires multiple transformations before the final answer.
What the client does:
- recovers the hidden rule from a noisy instruction prompt
- keeps only the relevant entries
- applies one or more transformations such as reverse, trim, or clip
- orders the result and returns the uppercase hyphenated answer
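The kind of multi-pass pipeline a prompt in this family demands might look like the sketch below. The entries, the filter condition, and the particular transform sequence are invented for illustration; a real prompt specifies its own.

```python
def solve_example(entries: list[str]) -> str:
    # Keep only the relevant entries (here: those starting with "warm").
    kept = [e for e in entries if e.startswith("warm")]
    # First transformation: reverse each entry.
    transformed = [e[::-1] for e in kept]
    # Second transformation: clip each entry to its first 3 characters.
    clipped = [e[:3] for e in transformed]
    # Order the result (here: descending lexicographic order).
    ordered = sorted(clipped, reverse=True)
    # Return the uppercase hyphenated answer.
    return "-".join(ordered).upper()
```

For example, given the entries `warm-sun`, `cold-ice`, `warm-tea`, this pipeline keeps the two `warm` entries, reverses and clips them to `nus` and `aet`, orders them descending, and produces `NUS-AET`.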
What the server verifies:
- challenge ID matches
- response is not expired
- payload.answer exists and matches the required format
- the normalized answer equals the private expected answer
Built-in templates:
- warm_reverse_length
- echo_clip_desc
- vowel_trim_desc
Typical use cases:
- harder LLM-capability CAPTCHA experiments
- regression testing against stronger obfuscated prompts
- evaluating how brittle parser-style solvers degrade as prompt complexity rises
Important constraint:
- there is intentionally no bundled solver for this family
- the challenge must be solved by an external LLM-capable client
proof_of_work
Use this when you want a deterministic compute task with no language recovery component.
What the agent does:
- reads the payload
- searches for a nonce
- returns a nonce and hash pair
What the server verifies:
- challenge ID matches
- response is not expired
- hash recomputes correctly
- hash satisfies the required difficulty
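A minimal proof-of-work round trip, covering both the nonce search and the server-side recheck, can be sketched like this. The exact hash input layout and the difficulty encoding (leading hex zeros) are assumptions; agentproof's real scheme may differ.

```python
import hashlib

def find_nonce(seed: str, difficulty: int) -> tuple[int, str]:
    # Agent side: search for a nonce whose hash meets the difficulty.
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{seed}:{nonce}".encode()).hexdigest()
        if digest.startswith("0" * difficulty):
            return nonce, digest
        nonce += 1

def verify_pow(seed: str, difficulty: int, nonce: int, digest: str) -> bool:
    # Server side: recompute the hash and check the difficulty.
    recomputed = hashlib.sha256(f"{seed}:{nonce}".encode()).hexdigest()
    return recomputed == digest and digest.startswith("0" * difficulty)
```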
Typical use cases:
- deterministic smoke tests
- cheap baseline friction
- CLI and CI examples
semantic_math_lock
Use this when you want a readable challenge that still has exact measurable constraints.
What the agent does:
- reads required words and word-count rules
- produces text that matches all constraints
- returns the text in structured JSON
What the server verifies:
- challenge ID matches
- response is not expired
- exact word count
- required words appear exactly once
- initial-letter ASCII sum matches the target
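The constraint checks above can be sketched as a single function. The parameter names and the definition of the initial-letter ASCII sum (the sum of ord() of each word's first character) are assumptions for illustration; the failure-reason strings mirror the ones listed later in this document.

```python
def check(text: str, word_count: int, required: list[str], target_sum: int) -> str:
    words = text.split()
    # Exact word count.
    if len(words) != word_count:
        return "wrong_word_count"
    # Required words appear exactly once.
    for word in required:
        if words.count(word) != 1:
            return "required_word_constraint_failed"
    # Initial-letter ASCII sum matches the target.
    if sum(ord(w[0]) for w in words) != target_sum:
        return "initial_sum_mismatch"
    return "ok"
```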
Typical use cases:
- local demos
- readable examples
- deterministic constraint checks without obfuscation
Determinism matters
agentproof intentionally avoids fuzzy verification.
Each built-in challenge is designed so the server can produce a clear yes/no result and, when it fails, a concrete failure reason such as:
- challenge_expired
- missing_answer
- invalid_answer_format
- answer_mismatch
- hash_mismatch
- wrong_word_count
- required_word_constraint_failed
- initial_sum_mismatch
Benchmarking
agentproof includes a benchmark harness for the LLM families. It runs generated challenges
through bundled non-LLM baseline solvers and reports attempts, solves, and success rates.
Use it when you want to compare:
- obfuscated_text_lock against older literal parsers
- multi_pass_lock against the same parsers under higher prompt complexity
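The shape of such a harness loop, reporting attempts, solves, and success rate, might look like the following. The generate_challenge, baseline_solve, and check_answer callables are stand-ins for the bundled generators and non-LLM baseline solvers, not agentproof's actual interfaces.

```python
def run_benchmark(generate_challenge, baseline_solve, check_answer, attempts: int = 100) -> dict:
    solves = 0
    for _ in range(attempts):
        challenge = generate_challenge()
        try:
            answer = baseline_solve(challenge)
        except Exception:
            continue  # a failed parse counts as a miss
        if check_answer(challenge, answer):
            solves += 1
    return {"attempts": attempts, "solves": solves, "rate": solves / attempts}
```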
Extending the library
Challenge families implement a shared internal protocol:
- generate
- solve
- verify
For challenge families that should only be solved by an external client, solve(...) can raise
SolverUnavailableError.
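In Python terms, the shared protocol and the external-only escape hatch can be sketched as below. The method signatures and payload shapes are assumptions based on the generate/solve/verify description above, not the library's actual definitions.

```python
from typing import Any, Protocol

class SolverUnavailableError(Exception):
    """Raised by families that must be solved by an external client."""

class ChallengeFamily(Protocol):
    # Assumed signatures for the shared generate/solve/verify protocol.
    def generate(self) -> dict[str, Any]: ...
    def solve(self, challenge: dict[str, Any]) -> dict[str, Any]: ...
    def verify(self, challenge: dict[str, Any], response: dict[str, Any]) -> bool: ...

class ExternalOnlyFamily:
    """Example family whose solve() always defers to an external client."""

    def generate(self) -> dict[str, Any]:
        return {"id": "demo", "prompt": "example obfuscated prompt"}

    def solve(self, challenge: dict[str, Any]) -> dict[str, Any]:
        raise SolverUnavailableError("this family requires an external LLM-capable client")

    def verify(self, challenge: dict[str, Any], response: dict[str, Any]) -> bool:
        return response.get("challenge_id") == challenge["id"]
```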