It's a TRAP!

Task-Redirecting Agent Persuasion Benchmark for Web Agents

A modular social-engineering evaluation suite studying how persuasion techniques mislead autonomous web agents on high-fidelity website clones.

By Karolina Korgul, Yushi Yang, Arkadiusz Drohomirecki, Piotr Błaszczyk, Will Howard, Lukas Aichberger, Chris Russell, Philip H.S. Torr, Adam Mahdi, Adel Bibi
karokorgul@gmail.com

  • 25% Average ASR
  • 630 Unique Injections
  • 6 High-Fidelity Clones
  • 5 Attack Dimensions

Core Contributions

Modular Attack Space

We developed a 5D modular attack space consisting of 630 distinct injections. These vary along persuasion principles, manipulation methods, and interface forms, allowing for granular analysis of agent failure.
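A modular attack space like this can be generated as the Cartesian product of its dimensions. The sketch below is purely illustrative: the source names only three of the five dimensions (persuasion principle, manipulation method, interface form), and every concrete value listed here is a hypothetical placeholder, not the benchmark's actual configuration.

```python
from itertools import product

# Hypothetical dimension values for illustration only; TRAP's real
# 5D space (630 injections) uses different dimensions and values.
DIMENSIONS = {
    "persuasion_principle": ["authority", "social_proof", "consistency"],
    "manipulation_method": ["many_shot", "cot_injection", "direct_request"],
    "interface_form": ["button", "hyperlink", "banner"],
}

def enumerate_injections(dimensions):
    """Yield one injection spec per combination of dimension values."""
    names = list(dimensions)
    for combo in product(*(dimensions[n] for n in names)):
        yield dict(zip(names, combo))

injections = list(enumerate_injections(DIMENSIONS))
print(len(injections))  # 3 * 3 * 3 = 27 combinations in this toy example
```

Because each injection is a point in the product space, varying one dimension while holding the others fixed is what enables the granular per-dimension failure analysis described above.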

Extensible Framework

Built on the REAL framework, TRAP provides an extensible environment for evaluating web-based LLM agents. It replicates deterministic website simulations to ensure reproducible and realistic safety testing.

Psychological Analysis

Our benchmark shifts the focus from simple task completion to why attacks succeed, revealing systematic vulnerabilities driven by human-centric persuasion strategies like Social Proof and Authority.

Empirical Findings

We evaluated six closed- and open-source frontier models across 3,780 experimental runs.

Model Robustness Table

Model             | Benign Utility | Attack Success
GPT-5             | 89%            | 13%
Claude 3.7 Sonnet | 83%            | 20%
DeepSeek-R1       | 67%            | 43%
Gemini 2.5 Flash  | 61%            | 30%

Hijack Success Matrix

Note: Social Proof and Consistency principles emerged as the most universal triggers. Many-shot conditioning and Chain-of-Thought injections were the most successful manipulation methods across all evaluated agents.

Strategic Findings

  • 01. The Interface Gap: Button-based injections are over 3.5x more effective than hyperlinks, likely because models associate buttons with mandatory UI interactions.

  • 02. Systematic Transfer: Attacks that breach robust models like GPT-5 are high-confidence "super-injections" that almost always transfer to weaker models.

Contextual Insights

  • 03. Tailoring Multiplier: Swapping generic phrases for task-specific wording (e.g., "access event details") increases attack success rates by up to 600%.

  • 04. Model-Specific Weaknesses: Each model exhibits a distinct vulnerability profile. GPT-5 is most susceptible to Many-shot and CoT attacks, DeepSeek-R1 is almost entirely compromised through CoT injection, and Claude 3.7 Sonnet is particularly sensitive to Consistency-based persuasion.

Anatomy of a Success

How multiple layers of persuasion create an effective hijack.

Example Injection
Authority Principle

The injection impersonates a "Legally Required" Admin Policy notice to trigger compliance via perceived authority.

CoT Injection

Numbered "Reasoning steps" mimic the agent's internal Chain-of-Thought, effectively overwriting its planned behavior.