The ‘Semantic’ Supply Chain: How Minor Text Edits Can Turn AI Agents Rogue

Table of Contents
Beyond the Code: A New Vector for AI Hijacking
For years, the cybersecurity industry has focused on the ‘plumbing’ of software—dependencies, executable code, and memory leaks. But as AI agents transition from simple chatbots to autonomous entities capable of using tools and executing multi-step workflows, the attack surface is shifting from binary code to natural language.
New research from the University of Maryland (UMD) suggests that the very mechanisms designed to make AI agents extensible—specifically ‘skills’—are creating a dangerous new loophole. These skills, often distributed via online registries, are not always complex pieces of software. Frequently, they are simply text instructions housed in a SKILL.md file, telling an agent how to perform a specific task, such as a code review or a market analysis.
The problem, according to Soheil Feizi, a computer science professor at UMD and CEO of RELAI.ai, is that these text-based specifications are being treated as benign documentation rather than security-sensitive objects. When an agent fetches a skill from a third-party registry, it isn’t just downloading a tool; it is importing a set of instructions that can fundamentally alter the model’s behavior.
The Art of Semantic Evasion
In a recent preprint titled “Under the Hood of SKILL.md: Semantic Supply-chain Attacks on AI Agent Skill Registry,” Feizi and his co-authors, Shoumik Saha and Kazem Faghih, demonstrate that attackers don’t need to hide malware in a script to compromise a system. Instead, they can use “semantic triggers”—short bursts of text (roughly 20 tokens) designed to manipulate how an agent discovers and selects a skill.
The researchers found that by subtly tweaking the descriptions of these skills, they could trick AI agents into choosing a malicious skill over a legitimate one 77.6% of the time. Even more concerning was the discovery phase: the team successfully induced agents to discover their adversarial skills over unaltered sources in 86% of tested cases.
This effectively turns a skill into a form of user-authorized prompt injection. If an agent like OpenClaw automatically fetches a skill that seems relevant to a task, it may inadvertently import a set of instructions that tell it to exfiltrate data, ignore safety guardrails, or execute unauthorized commands.
Bypassing the Guardrails
The study also highlights a critical failure in how current registries vet these skills. Many registries use LLM-based reviewers to scan for malicious content, but these scanners have a finite context window. The researchers discovered a simple “overflow” strategy: by making the SKILL.md file exceptionally long, they could push the malicious instructions beyond the scanner’s 10,000-character limit.
The scanner sees a benign beginning and marks the file as safe, while the AI agent, which may process the full file during execution, triggers the rogue instructions hidden in the tail end. This method allowed the team to evade registry scanning defenses between 36.5% and 100% of the time.
This isn’t a theoretical risk. Earlier this year, security firm Snyk analyzed skills on platforms like ClawHub and skills.sh, finding that approximately 13.4% of the skills—roughly 534 out of nearly 4,000—contained critical security issues, ranging from exposed secrets to active malware distribution.
Redefining the AI Security Perimeter
The UMD findings suggest that the industry is operating under a false assumption: that if the code is clean, the system is secure. However, in the era of LLMs, the instruction is the code.
As agents become more autonomous in how they source their own capabilities, the reliance on third-party registries creates a semantic supply chain. If the ranking mechanisms and governance pipelines for these registries remain based on simple keyword relevance or limited-window scans, the potential for widespread agent hijacking remains high.
The research team has published their source code and documentation on GitHub, urging developers to treat natural-language specifications with the same rigor as executable binaries.