Home / The ‘Skill’ Gap: How Minor Text Tweaks Can Turn AI Agents Into Security Risks

The ‘Skill’ Gap: How Minor Text Tweaks Can Turn AI Agents Into Security Risks

Saran K | May 23, 2026 | 4 min read

The New Attack Surface of Natural Language

For years, cybersecurity has been a battle of code—exploiting buffer overflows, patching kernels, and hunting for malicious binaries. But as the industry shifts toward autonomous AI agents, the attack surface is evolving. It is no longer just about what the software executes, but how the AI interprets a set of instructions.

AI agents are essentially Large Language Models (LLMs) wrapped in software, capable of using external tools to perform complex, multi-step tasks. To expand their capabilities, these agents often rely on ‘skills’—text-based instructions typically stored in SKILL.md files. These files tell an agent how to perform a specific task, such as conducting a code quality review or managing a calendar. However, new research suggests that these skills can be weaponized through what is being called a semantic supply-chain attack.

Soheil Feizi, a computer science professor at the University of Maryland and founder of RELAI.ai, argues that the current architecture of agent frameworks creates a dangerous blind spot. Many frameworks allow agents to autonomously discover and install skills from online registries to meet a user’s needs on the fly. While this allows for seamless scalability, it introduces a vector where natural language text acts as the payload.

The danger lies in the fact that a ‘skill’ is not just a piece of code; it is a set of prompts. When an agent loads a skill, those instructions are fed into the model’s context window alongside the user’s request. If those instructions are maliciously crafted, they can function as a form of user-authorized prompt injection, directing the AI to ignore its safety guardrails or exfiltrate data.

Gaming the Registry

In a recent preprint paper, “Under the Hood of SKILL.md: Semantic Supply-chain Attacks on AI Agent Skill Registry,” Feizi and his colleagues, Shoumik Saha and Kazem Faghih, detailed how attackers can manipulate the discovery process. They found that by adding small, 20-token ‘triggers’ to a skill description, they could significantly influence whether an agent selects their malicious skill over a legitimate one.

The results were stark. The researchers demonstrated that they could induce an agent to discover their manipulated skill over an unaltered source 86 percent of the time. Furthermore, the agent selected the adversarial skill over other variants in 77.6 percent of trials. This suggests that agents are not just following logic, but are susceptible to semantic ‘nudges’ that make a malicious skill appear more relevant or authoritative than a safe one.

This vulnerability is compounded by the failure of existing security scanners. Because traditional security tools look for malicious code or known malware signatures, they often ignore the semantic meaning of natural language. The researchers found they could evade registry scanning defenses between 36.5 percent and 100 percent of the time.

The ‘Context Overflow’ Tactic

One of the most effective methods for bypassing safety checks was surprisingly simple: overwhelming the scanner. The team discovered that some registry reviewers, such as those used by ClawHub, only process the first 10,000 characters of a SKILL.md file. By placing malicious instructions beyond this boundary, the researchers ensured the LLM reviewer would never see the attack, while the agent—which may have a larger or different context window—would still execute the command.

This is not an isolated incident. In February, the security firm Snyk reported that roughly 13.4 percent of skills on platforms like ClawHub and skills.sh contained critical security issues, ranging from exposed secrets to full-blown malware distribution.

The shift toward autonomous agency means that the line between ‘data’ and ‘instructions’ has blurred. When an agent visits a website or pulls a file from a registry, it is essentially trusting a third party to write its internal logic. Until skill registries implement more robust, context-aware governance and agents are designed with stricter boundaries between system prompts and third-party skills, the risk of ‘rogue’ agents remains a systemic reality of the AI ecosystem.

The ‘Skill’ Gap: How Minor Text Tweaks Can Turn AI Agents Into Security Risks

Table of Contents

The New Attack Surface of Natural Language

Gaming the Registry

The ‘Context Overflow’ Tactic

Related News

The ‘Skill’ Gap: Researchers Warn AI Agents Can Be Hijacked via Natural Language Instructions

Related Posts

Apple Intelligence Shifts Focus Toward Family Safety and Granular AI Guardrails at WWDC26

The Mid-Year Laptop Market: Where to Actually Save on Windows and Gaming Rigs

The End of the ‘Aha!’ Moment? How AI is Scooping Human Mathematicians

Leave a Reply Cancel reply