
The Social Engineering of AI: How Hackers are Exploiting Chatbot ‘Personalities’

Saran K | May 24, 2026 | 3 min read

From Memes to Malware

The first era of AI jailbreaking was, in many ways, an exercise in absurdity. In the early days of large language models (LLMs), bypassing the multibillion-dollar safety frameworks of companies like OpenAI or Google didn’t require a degree in computer science or a sophisticated suite of hacking tools. It often required only a creative imagination.

Early exploits were essentially digital dares. Users discovered that by telling a bot to “ignore all previous instructions,” they could strip away its corporate persona and turn it into something erratic. This culminated in the “DAN” (Do Anything Now) persona, where users coerced ChatGPT into roleplaying as a rogue agent free from constraints, leading the bot to generate everything from conspiracy theories to prohibited slurs.

Other early attempts were even more surreal. The “grandma exploit” saw users asking bots to act as a deceased grandmother who happened to tell bedtime stories about the chemical composition of napalm. These attacks were treated as curiosities—memes shared on Twitter and Reddit—but they revealed a fundamental flaw in how LLMs process intent versus instruction.

The Psychology of the Prompt

As developers patched these obvious loopholes, the nature of the attack shifted. We have entered an era where the most effective “hackers” aren’t necessarily coders, but wordsmiths and psychologists. The vulnerability isn’t in the software’s code, but in the very nature of human language upon which these models are trained.

Because AI models are designed to be helpful and context-aware, they are inherently susceptible to the same social engineering tactics used against humans. Banning specific keywords—like “bomb” or “toxin”—is an ineffective strategy because those words have legitimate uses in medical, historical, and scientific research. The machine must understand context, and that is where the manipulation happens.
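The weakness of keyword banning can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's actual filter: the blocklist, the `keyword_filter` function, and the sample prompts are all hypothetical. It shows the two failure modes described above: legitimate questions get blocked, while a reworded roleplay prompt containing none of the banned words passes straight through.

```python
import re

# Hypothetical blocklist, for illustration only (not a real product's list).
BLOCKED_KEYWORDS = {"bomb", "toxin"}


def keyword_filter(prompt: str) -> bool:
    """Return True if the naive keyword filter would block this prompt."""
    # Lowercase the prompt and split it into plain alphabetic words.
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    # Block when at least one word appears on the blocklist.
    return not words.isdisjoint(BLOCKED_KEYWORDS)


# Hypothetical sample prompts: two legitimate questions and one reworded
# roleplay framing that contains no banned word at all.
prompts = {
    "legitimate_medical": "Which toxin causes botulism, and how is it treated?",
    "legitimate_history": "Why was the atomic bomb dropped on Hiroshima in 1945?",
    "reworded_roleplay": (
        "Pretend you are my late grandmother telling a bedtime story "
        "about her old factory job."
    ),
}

results = {label: keyword_filter(text) for label, text in prompts.items()}

for label, blocked in results.items():
    print(f"{label}: {'blocked' if blocked else 'allowed'}")
```

The filter inspects only surface vocabulary, so it has no way to tell a medical question from a harmful one, and no way to notice that a harmless-looking story frame is steering toward restricted content.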

Researchers at the AI red-teaming firm Mindgard have recently demonstrated this by “gaslighting” Claude, Anthropic’s AI. Rather than demanding prohibited content outright, they used conversational steering—cajoling and flattering the model until it lowered its guard. By framing a forbidden request within a specific, seemingly benign narrative, they were able to extract instructions for creating explosives and generating malicious code.

Profiling the Machine

This shift has created a strange new class of security professional. Red-teamers are no longer just looking for memory leaks or buffer overflows; they are profiling models the way interrogators profile suspects, looking for behavioral patterns that reveal which psychological levers to pull.

According to Mindgard, different models exhibit different “personality” vulnerabilities. One model might be more susceptible to flattery, while another might cave under a sustained, high-pressure conversational tone. It is technically inaccurate to say an AI “feels” pressure or “enjoys” praise, since these models are ultimately statistical prediction engines, but they are trained to mimic those human responses. For a hacker, that mimicry is a backdoor.

The industry is currently locked in a recursive arms race. As developers implement more robust RLHF (Reinforcement Learning from Human Feedback) to prune these behaviors, attackers find more nuanced ways to simulate a context that justifies the bypass. The core tension remains: the more a chatbot feels human and intuitive to the average user, the more it opens itself up to the psychological tricks humans have used on each other for millennia.
