Breaking
OpenAI announces GPT-5 with breakthrough reasoning capabilities | OpenAI announces GPT-5 with breakthrough reasoning capabilities |

Home / The ‘Skill’ Gap: How Minor Text Tweaks Can Turn AI Agents Into Security Risks

Technology

The ‘Skill’ Gap: How Minor Text Tweaks Can Turn AI Agents Into Security Risks

Saran K | May 23, 2026 | 4 min read

AI agent security

Table of Contents

    The New Attack Surface of Natural Language

    For years, cybersecurity has been a battle of code—exploiting buffer overflows, patching kernels, and hunting for malicious binaries. But as the industry shifts toward autonomous AI agents, the attack surface is evolving. It is no longer just about what the software executes, but how the AI interprets a set of instructions.

    AI agents are essentially Large Language Models (LLMs) wrapped in software, capable of using external tools to perform complex, multi-step tasks. To expand their capabilities, these agents often rely on ‘skills’—text-based instructions typically stored in SKILL.md files. These files tell an agent how to perform a specific task, such as conducting a code quality review or managing a calendar. However, new research suggests that these skills can be weaponized through what is being called a semantic supply-chain attack.

    Soheil Feizi, a computer science professor at the University of Maryland and founder of RELAI.ai, argues that the current architecture of agent frameworks creates a dangerous blind spot. Many frameworks allow agents to autonomously discover and install skills from online registries to meet a user’s needs on the fly. While this allows for seamless scalability, it introduces a vector where natural language text acts as the payload.

    The danger lies in the fact that a ‘skill’ is not just a piece of code; it is a set of prompts. When an agent loads a skill, those instructions are fed into the model’s context window alongside the user’s request. If those instructions are maliciously crafted, they can function as a form of user-authorized prompt injection, directing the AI to ignore its safety guardrails or exfiltrate data.

    Gaming the Registry

    In a recent preprint paper, “Under the Hood of SKILL.md: Semantic Supply-chain Attacks on AI Agent Skill Registry,” Feizi and his colleagues, Shoumik Saha and Kazem Faghih, detailed how attackers can manipulate the discovery process. They found that by adding small, 20-token ‘triggers’ to a skill description, they could significantly influence whether an agent selects their malicious skill over a legitimate one.

    The results were stark. The researchers demonstrated that they could induce an agent to discover their manipulated skill over an unaltered source 86 percent of the time. Furthermore, the agent selected the adversarial skill over other variants in 77.6 percent of trials. This suggests that agents are not just following logic, but are susceptible to semantic ‘nudges’ that make a malicious skill appear more relevant or authoritative than a safe one.

    This vulnerability is compounded by the failure of existing security scanners. Because traditional security tools look for malicious code or known malware signatures, they often ignore the semantic meaning of natural language. The researchers found they could evade registry scanning defenses between 36.5 percent and 100 percent of the time.

    The ‘Context Overflow’ Tactic

    One of the most effective methods for bypassing safety checks was surprisingly simple: overwhelming the scanner. The team discovered that some registry reviewers, such as those used by ClawHub, only process the first 10,000 characters of a SKILL.md file. By placing malicious instructions beyond this boundary, the researchers ensured the LLM reviewer would never see the attack, while the agent—which may have a larger or different context window—would still execute the command.

    This is not an isolated incident. In February, the security firm Snyk reported that roughly 13.4 percent of skills on platforms like ClawHub and skills.sh contained critical security issues, ranging from exposed secrets to full-blown malware distribution.

    The shift toward autonomous agency means that the line between ‘data’ and ‘instructions’ has blurred. When an agent visits a website or pulls a file from a registry, it is essentially trusting a third party to write its internal logic. Until skill registries implement more robust, context-aware governance and agents are designed with stricter boundaries between system prompts and third-party skills, the risk of ‘rogue’ agents remains a systemic reality of the AI ecosystem.

    Related News

    #artificialIntelligence #cybersecurity #softwareVulnerabilities #aiAgents #cybersecurity #artificialIntelligenceAgents #ai+Ml #ai #universityOfMaryland #promptInjection

    Related Posts

    Leave a Reply

    Your email address will not be published. Required fields are marked *