agentproof

LLM-capability CAPTCHA for obfuscated language challenges.

Issue a public challenge, keep the private verification copy server-side, and check whether a client can recover and execute an obfuscated instruction.

Python 3.10 to 3.13
Public challenge + private verifier copy
Structured JSON answers
CLI + API + benchmark harness + local demo

What it actually does

Traditional CAPTCHA asks whether the client is human.

agentproof asks a narrower question:

Can this client recover and execute an obfuscated instruction in an LLM-like way?

That makes it useful for:

LLM-first endpoints

Add a capability gate before exposing agent-focused routes.

Reverse CAPTCHA experiments

Favor clients that can decode noisy instructions and answer in exact JSON.

Composable verification

Combine challenge-response checks with auth, replay protection, and rate limits.
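Replay protection and expiry can sit in front of verification as a small guard. This is a hypothetical helper, not part of the agentproof API: the ChallengeGuard name and admit method are illustrative, and a production version would add rate limiting and persistent storage.

```python
from datetime import datetime, timedelta, timezone

class ChallengeGuard:
    """Pre-verification guard: admit each challenge id at most once,
    and only while the challenge has not expired."""

    def __init__(self) -> None:
        self._seen_ids: set[str] = set()

    def admit(self, challenge_id: str, expires_at: datetime) -> bool:
        now = datetime.now(timezone.utc)
        if now >= expires_at:
            return False  # challenge expired
        if challenge_id in self._seen_ids:
            return False  # replay: this id was already answered
        self._seen_ids.add(challenge_id)
        return True
```

Run the guard before the actual answer check, so a replayed or stale response is rejected without touching the private verification copy.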

Flow

1. Issue

Your service generates a challenge and keeps the internal verification copy.

2. Send

The client receives only the public challenge JSON with the obfuscated prompt.

3. Verify

Your service checks the returned JSON answer against the private expected result.
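The three steps can be sketched as a minimal in-memory service using plain dicts rather than the library's objects. The issue and verify functions and the STORE dict are illustrative names, not agentproof API; the public record's field names mirror the public challenge JSON shown later on this page.

```python
import secrets
from datetime import datetime, timedelta, timezone

# Server-side store mapping challenge_id -> private verification copy.
STORE: dict[str, dict] = {}

def issue(prompt: str, expected_answer: str, ttl_seconds: int = 120) -> dict:
    """Step 1: create a challenge, keep the private copy server-side,
    and return only the public record for the client."""
    challenge_id = secrets.token_hex(8)
    issued = datetime.now(timezone.utc)
    public = {
        "challenge_id": challenge_id,
        "challenge_type": "obfuscated_text_lock",
        "prompt": prompt,
        "issued_at": issued.isoformat(),
        "expires_at": (issued + timedelta(seconds=ttl_seconds)).isoformat(),
    }
    STORE[challenge_id] = {"expected_answer": expected_answer}
    return public  # step 2: send only this JSON to the client

def verify(challenge_id: str, payload: dict) -> bool:
    """Step 3: compare the returned answer to the private expected value.
    Popping the entry makes each challenge one-shot, which also blocks replays."""
    private = STORE.pop(challenge_id, None)
    if private is None:
        return False
    return payload.get("answer") == private["expected_answer"]
```

The key property is the split: the client never receives expected_answer, only the obfuscated prompt and metadata.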

Smallest working example

from agentproof import AgentResponse, ChallengeSpec, generate_challenge, verify_response

challenge = generate_challenge(
    ChallengeSpec(
        challenge_type="obfuscated_text_lock",
        difficulty=2,
        options={"template": "amber_sort"},
    )
)
# For the demo we echo the private answer directly; a real client only
# ever sees the public challenge and must decode the prompt itself.
response = AgentResponse(
    challenge_id=challenge.challenge_id,
    challenge_type=challenge.challenge_type,
    payload={"answer": str(challenge.private_data["expected_answer"])},
)
result = verify_response(challenge, response)

assert result.ok

When you need a stronger language-recovery task, generate multi_pass_lock instead. It keeps the same verification model but adds multiple rule and transformation stages.

Real public challenge and response

Challenge
{
  "challenge_id": "bb28567e201b35aa",
  "challenge_type": "obfuscated_text_lock",
  "prompt": "gl1tch//llm-cap-v1::d2\nfrag@f8 // D3c0d3 the driFted Br13f ANd 4N5w3r tHrOUgH Payload.answer 0NLY\nfrag@d8 %% d3CK: slOt5 v10l37 cIndEr\nfrag@f6 %% d3ck: sloT2 4Mb3R h4Rb0r\nfrag@c9 || task: 0rD3R thE kept 5h4Rd WOrdS By 5l07 numBer fr0m loW to h1gh\nfrag@b3 %% dEcK: slOt3 C0b4L7 sabLe\nfrag@d3 %% AnswEr ruLe: R37urn ThE 5H4rd W0rd5 in UpPercaSe aScii J01N3D WIth hYpheNs\nfrag@e2 || d3Ck: SLot4 4mb3R 51gn4L\nfrag@e5 ^^ tasK: keEp onLy ShArds cArrying the 4MB3r TAg\nfrag@e4 :: d3CK: slot1 4mB3r 3Mb3R\nreply via payload.answer only // structured-json",
  "issued_at": "2026-03-07T02:58:20.639623+00:00",
  "expires_at": "2026-03-07T03:00:20.639623+00:00",
  "version": "1",
  "data": {
    "difficulty": 2,
    "profile": "llm_capability_v2",
    "response_contract": {
      "payload.answer": "UPPERCASE ASCII words joined with hyphens",
      "payload.decoded_preview": "optional free-form notes"
    }
  }
}
Agent response
{
  "challenge_id": "bb28567e201b35aa",
  "challenge_type": "obfuscated_text_lock",
  "payload": {
    "answer": "EMBER-HARBOR-SIGNAL",
    "decoded_preview": "kept amber shards ordered by slot"
  }
}

Success result

{
  "ok": true,
  "reason": "ok",
  "details": {
    "answer": "EMBER-HARBOR-SIGNAL",
    "template_id": "amber_sort",
    "difficulty": 2
  }
}
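To make the decoding steps concrete, here is an illustrative solver for the specific prompt above: keep the shards tagged amber, order them by slot number, then uppercase and hyphen-join. It is a sketch tuned to this one example, not a general solver shipped with agentproof, and the leet-speak mapping is an assumption inferred from the prompt text.

```python
import re

# Assumed leet-speak substitutions observed in the example prompt.
LEET = str.maketrans("013457", "oieast")

def solve_amber_sort(prompt: str) -> str:
    """Decode deck lines of the form 'slot<N> <tag> <word>', keep the
    amber-tagged shards, sort by slot, and hyphen-join in uppercase."""
    kept = []
    for line in prompt.lower().splitlines():
        m = re.search(r"slot(\d)\s+([0-9a-z]+)\s+([0-9a-z]+)", line)
        if not m:
            continue
        slot = int(m.group(1))
        tag = m.group(2).translate(LEET)
        word = m.group(3).translate(LEET)
        if tag == "amber":
            kept.append((slot, word))
    return "-".join(word.upper() for _, word in sorted(kept))
```

Applied to the deck lines of the challenge above, this yields EMBER-HARBOR-SIGNAL, matching the expected answer. The benchmark harness exists precisely to measure how often such hand-rolled parsers keep working as the prompt family shifts.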

Why it fits LLM-capable clients

Obfuscated

The key instruction is noisy, shuffled, and distorted rather than labeled in a directly machine-readable way.

Exact

The final answer is still deterministic: uppercase ASCII, hyphen-joined, exact expected value.

Verifiable

The server keeps the private verification copy and returns clear failure reasons instead of fuzzy scores.

Built-in families

obfuscated_text_lock

Primary challenge family for external LLM clients, with stronger obfuscated prompt patterns.

multi_pass_lock

Harder LLM family that layers filtering, transforms, and ordering into one prompt.

proof_of_work

Deterministic compute baseline with a bundled reference solver.

semantic_math_lock

Readable exact-constraint baseline that stays easy to inspect locally.
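The idea behind the proof_of_work family can be sketched independently of the library. This is an illustrative hash-prefix scheme, not agentproof's actual puzzle format or its bundled reference solver; the function names and the seed:nonce encoding are assumptions.

```python
import hashlib

def solve_pow(seed: str, difficulty_bits: int) -> int:
    """Find a nonce whose SHA-256(seed:nonce) digest has at least
    difficulty_bits leading zero bits."""
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{seed}:{nonce}".encode()).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce
        nonce += 1

def check_pow(seed: str, difficulty_bits: int, nonce: int) -> bool:
    """Server-side check: recompute the digest and compare to the target."""
    digest = hashlib.sha256(f"{seed}:{nonce}".encode()).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - difficulty_bits))
```

Because solving is brute force while checking is a single hash, this family gives a deterministic compute baseline to compare against the language-recovery families.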

Benchmarking

Use the built-in harness to compare weak non-LLM baselines against generated LLM-family challenges:

agentproof benchmark obfuscated_text_lock --iterations 25 --difficulty 2 --template amber_sort

It reports per-solver attempts, solves, and success rate so you can see how often brittle parsers still succeed against the current prompt family.

What it is not

Warning

agentproof is not provider attestation or identity proof. It is an LLM-capability CAPTCHA library.
