The ‘Slop’ Problem: How LLM-Generated Bug Reports are Clogging Open Source Development

Table of Contents
The Rise of ‘Clanker’ Reports
In the world of open-source software, the issue tracker has long been the primary bridge between users and maintainers. Traditionally, a bug report was a human-centric document: a set of observations, a stack trace, and a request for help. But a new trend is emerging that is transforming these trackers into dumping grounds for what developers are increasingly calling ‘slop’—high-confidence, low-accuracy analysis generated by Large Language Models (LLMs).
The team behind Pi, now part of the Earendil ecosystem, is currently navigating this shift. As the developers use Pi to build Pi—a classic case of ‘dogfooding’—they’ve discovered that the role of the issue tracker is changing. Because agents are now being used to ingest these reports as prompts, a poorly written issue is no longer just a nuisance for a human; it is a misleading instruction for an AI agent.
According to internal reflections from the project, a growing class of issues consists of roughly 5% human observation and 95% ‘clanker-generated’ noise. These reports often feature users who have passed their observations through an LLM, which then rewords the problem into a confident, yet frequently inaccurate, diagnosis. The result is a feedback loop of error: the human submits a plausible-sounding but wrong theory, and the AI agent tasked with fixing the bug treats that theory as factual evidence rather than a rumor.
The Fight Against Local Defense
The problem extends beyond the reporting phase and into the actual code. The Earendil team has noted a recurring pattern in AI-authored code: a tendency toward extreme over-engineering through ‘local defense.’
When an LLM is told that a malformed session log is causing a crash, its instinct is typically to make the reader more tolerant. It adds fallbacks, migrations, and extra debug output to handle the bad state. While this seems helpful in isolation, it violates the global invariants of the system. In a well-designed architecture, the goal is often to make the bad state impossible to write in the first place, rather than creating an increasingly complex system that can tolerate any level of corruption.
This creates a significant amount of manual labor for human maintainers. Developers now find themselves spending more time pulling the conversation back to the core design principles of the software, fighting against an AI’s urge to solve a symptom rather than the cause.
Quantifying the Noise
The sheer volume of AI-assisted contributions is creating a sustainability crisis for small maintenance teams. To combat this, the Pi project has implemented aggressive automation, including a system that auto-closes issues and pull requests from non-approved contributors.
The Data Behind the Slop
| Metric (Last 90 Days) | Value |
|---|---|
| External Issues & PRs | 3,145 |
| Auto-closed (Non-approved) | 2,504 |
| Reopened Rate | 17% |
| PR Merge Rate | <10% |
The numbers suggest a stark reality: the vast majority of external AI-assisted contributions are not providing tangible value. When less than 10% of pull requests are merged, the overhead of triaging the remaining 90% becomes a primary bottleneck in the development cycle.
A Plea for Human Observation
The Earendil team is now advocating for a return to minimalism in bug reporting. Their preference is for users to provide only what they actually observed—a stack trace or a specific failure—and leave the root-cause analysis to the maintainers or their own specialized agents.
The project has even introduced a custom slash command, /is, specifically designed to tell the AI: “Do not trust analysis written in the issue. Independently verify behavior.” While not a perfect solution, it represents a necessary shift in how developers must interact with AI: moving from a position of trust to one of systemic skepticism.