Nvidia’s SCADA Storage Servers: Solving the AI Data Bottleneck with PCIe 6.0 and 2.9 Petabytes of Scale

Saran K | June 15, 2026 | 8 min read

The Architectural Shift in AI Data Access

For years, the primary challenge in AI scaling has been not only raw compute power but also the ‘data starvation’ problem. While GPUs like the Blackwell series can process information at staggering speeds, they are often left idle, waiting for data to travel from storage, through the CPU, and into GPU memory. At Computex 2026, Wiwynn provided a glimpse of a solution: the Nvidia SCADA (Scaled Accelerated Data Access) server.

Key Takeaways

Bypassing the CPU: SCADA allows GPUs to initiate and control storage I/O directly, removing the central processor bottleneck.
Massive Scale: Using Micron 9650 Pro drives, the Wiwynn implementation reaches 2.949 petabytes of storage in a 6RU form factor.
Extreme Throughput: The system uses PCIe 6.0 to reach an aggregate random read rate of 528 million 4K IOPS.
Thermal Management: To maintain these speeds, the server utilizes a comprehensive liquid-cooling loop for all 96 E3.S SSDs.

The introduction of SCADA represents a fundamental shift in how we think about data center hierarchy. Historically, storage was a passive repository. With SCADA, storage becomes an active, accelerated layer that behaves less like a hard drive and more like an extension of the GPU’s own high-bandwidth memory (HBM).

Decoding SCADA: What is Scaled Accelerated Data Access?

Nvidia SCADA is a specialized storage architecture designed to enable GPUs to manage data movement from solid-state drives (SSDs) without relying on the host CPU to orchestrate every single request. In traditional server architectures, even with technologies like GPUDirect Storage, the CPU still maintains the ‘control path’—meaning it must tell the system where the data is and how to move it.

In the world of Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG), this control path becomes a massive bottleneck. When a system needs to perform a vector search across billions of embeddings or retrieve specific KV-cache data, it isn’t asking for one giant file; it is asking for millions of tiny 4KB blocks of data scattered across different drives. A CPU attempting to manage these millions of tiny requests creates a queue that slows the entire AI pipeline to a crawl.

SCADA solves this by handing the keys to the GPUs. By allowing the GPUs to initiate their own storage I/O operations, Nvidia has effectively removed the middleman. The GPUs now own the control path as well as the data path, which allows for the extreme parallelism required by thousands of concurrent GPU threads.

The Hardware Breakdown: Wiwynn’s 2.9 Petabyte Powerhouse

The server showcased by Wiwynn isn’t just a proof of concept; it is a high-density industrial machine built for the most demanding AI clusters. To understand the scale, we have to look at the underlying specifications.

Storage Density and Capacity

The system is designed around the E3.S form factor, the modern standard for enterprise NVMe storage. The Wiwynn unit supports up to 96 of these drives. When populated with 30.72 TB Micron 9650 Pro SSDs, the math is simple but staggering: 96 drives x 30.72 TB equals roughly 2.949 petabytes of raw, flash-based storage in a single 6RU chassis.
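The capacity figure can be checked in a few lines of Python. This is a minimal sketch that uses only the numbers quoted above; it assumes decimal units (1 PB = 1,000 TB), and the variable names are illustrative.

```python
# Figures quoted in the article.
DRIVE_COUNT = 96
DRIVE_CAPACITY_TB = 30.72  # Micron 9650 Pro capacity used in the Wiwynn unit
CHASSIS_RU = 6

# Assumption: decimal units, so 1 PB = 1,000 TB.
total_tb = DRIVE_COUNT * DRIVE_CAPACITY_TB
total_pb = total_tb / 1000
density_tb_per_ru = total_tb / CHASSIS_RU

print(f"Raw capacity: {total_tb:.2f} TB ({total_pb:.3f} PB)")
print(f"Density: {density_tb_per_ru:.2f} TB per rack unit")
```

This is raw capacity only; usable capacity after any redundancy or over-provisioning is not stated in the source.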

The Compute and Interconnect Core

This isn’t a storage server in the traditional sense—it is more of a ‘storage processor.’ The architecture is built on:

Nvidia Vera CPU: Serving as the management layer rather than the primary data orchestrator.
4x RTX Pro 6000 Blackwell GPUs: These don’t just run models; they act as the intelligent controllers for the storage transactions.
4x PCIe 6.x Switches: Providing the massive bandwidth required to move petabytes of data without congestion.
4x ConnectX-9 SuperNICs: Ensuring that once the data is retrieved from the SSDs, it can be blasted across the network to other compute nodes with minimal latency.

Thermal Engineering: The Role of Liquid Cooling

Pushing 96 PCIe 6.0 SSDs to their limit generates an immense amount of heat. If these drives were air-cooled, they would quickly hit thermal throttling limits, causing a ‘performance cliff’ where read/write speeds plummet to protect the hardware. To prevent this, Wiwynn integrated six separate cold plate modules into a unified liquid cooling loop. The loop delivers coolant to all 96 SSDs, keeping temperatures consistent across the entire 2.9PB array so that the quoted 528 million 4K IOPS can hold up under sustained full load.

Solving the AI Inference Bottleneck

To understand why this matters, we have to distinguish between AI training and AI inference. During training, the system typically handles large sequential transfers—reading a massive dataset from start to finish. Current storage solutions handle this reasonably well.

Inference is a different beast. When a user interacts with a RAG-based AI, the system must perform ‘fine-grained random accesses.’ It searches a vector database, finds a specific piece of context, and retrieves it. This involves data blocks often smaller than 4KB. When you scale this to thousands of simultaneous users, you get a ‘random read storm.’

Traditional CPU-centric I/O cannot survive a random read storm. The CPU becomes overwhelmed by the sheer number of requests, not the volume of data. By leveraging the Blackwell GPUs to manage these requests, SCADA allows the system to handle millions of these small requests in parallel, feeding the compute servers fast enough to keep their GPUs busy instead of waiting on storage.
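The ‘number of requests, not volume’ point is easy to see with a rough count. The sketch below reads the same amount of data in two block sizes and compares how many I/O requests each needs. The 1 GiB payload and the 1 MiB block size for the sequential case are illustrative assumptions, not figures from Wiwynn or Nvidia.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def requests_needed(total_bytes: int, block_bytes: int) -> int:
    """Number of I/O requests needed to read total_bytes in block_bytes chunks."""
    return -(-total_bytes // block_bytes)  # ceiling division


payload = 1 * GIB      # same data volume in both cases (illustrative)
large_block = 1 * MIB  # assumed block size for a sequential, training-style read
small_block = 4 * KIB  # fine-grained, inference-style read

sequential_requests = requests_needed(payload, large_block)
random_requests = requests_needed(payload, small_block)

print(f"1 MiB blocks: {sequential_requests:,} requests")
print(f"4 KiB blocks: {random_requests:,} requests")
print(f"Ratio: {random_requests // sequential_requests}x more requests")
```

The data volume is identical in both cases; only the request count changes, and that count is what the control path has to absorb.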

What This Means for the AI Industry

The emergence of SCADA indicates that we are entering the era of ‘Tier 3.5’ storage. In the hierarchy used here, Tier 1 is HBM/GPU memory, Tier 2 is local NVMe, and Tier 4 is remote HDD/cloud storage. SCADA sits between the local and remote tiers: it is too fast to be considered remote storage, but too large to be considered local cache.

For Enterprise AI Developers: This means the ‘memory wall’ is starting to crumble. Developers can now build models with datasets that far exceed the onboard HBM of a GPU without suffering a catastrophic performance hit. RAG systems will become significantly more responsive, as the latency between ‘finding’ a piece of data in a petabyte-scale database and ‘processing’ it in the GPU will be slashed.

For Data Center Operators: The power requirements are the new challenge. With a maximum power consumption of 9 kW per 6RU unit, these servers require specialized power delivery and liquid cooling infrastructure. The days of simply plugging a server into a standard rack are over; we are moving toward integrated ‘compute-storage pods.’
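For rack planning, the quoted power figure can be turned into density numbers. This is a back-of-the-envelope sketch that uses only the 9 kW, 6RU, and 96 x 30.72 TB figures from this article; it treats 9 kW as the worst case and says nothing about typical draw.

```python
# Figures quoted in the article.
MAX_POWER_KW = 9.0
CHASSIS_RU = 6
RAW_CAPACITY_PB = 96 * 30.72 / 1000  # decimal units: 1 PB = 1,000 TB

kw_per_ru = MAX_POWER_KW / CHASSIS_RU
kw_per_pb = MAX_POWER_KW / RAW_CAPACITY_PB

print(f"Peak power density: {kw_per_ru:.2f} kW per rack unit")
print(f"Peak power per raw petabyte: {kw_per_pb:.2f} kW/PB")
```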

The Strategic Context: Nvidia’s ‘Storage Next’ Vision

SCADA is not a standalone product but a piece of Nvidia’s broader ‘Storage Next’ strategy. This vision seeks to blur the line between memory and storage. In an ideal world, a GPU shouldn’t ‘know’ if a piece of data is in its own HBM or on a SCADA-managed SSD; it should just be able to access it with near-instantaneous speed.

By partnering with hardware vendors like Broadcom (for PCIe switches) and Micron (for high-density NAND), Nvidia is building an ecosystem where the entire data path—from the NAND cell in the SSD to the CUDA core in the GPU—is optimized for a single purpose: accelerating AI.

Performance Comparison: Traditional vs. SCADA

Feature	Traditional CPU-Centric I/O	Nvidia SCADA Architecture
Control Path	CPU Managed (Bottleneck)	GPU Managed (Parallel)
Data Access	Sequential/Large Block	Random/Fine-Grained (4KB)
Max Throughput	Limited by CPU Interrupts	Limited by PCIe 6.0 Fabric
Thermal Strategy	Air/Fan Cooling (Throttling)	Integrated Liquid Cooling
Storage Role	Passive Repository	Active Storage Processor

Frequently Asked Questions

What is the main difference between SCADA and GPUDirect Storage?

While GPUDirect Storage allows data to move directly from the SSD to the GPU memory, the CPU still handles the ‘handshake’—it tells the system where the data is and manages the request. SCADA allows the GPU to take over the control path entirely, initiating and managing the I/O requests itself, which is critical for high-parallelism random reads.

Why is PCIe 6.0 necessary for these servers?

PCIe 6.0 doubles the bandwidth of PCIe 5.0. When you are dealing with 96 high-speed SSDs and 528 million IOPS, the interconnect becomes the primary bottleneck. Without PCIe 6.0, the GPUs would be waiting for the bus to clear, negating the benefits of the SCADA architecture.
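A rough per-drive calculation shows why the link generation matters. The sketch below divides the quoted 528 million IOPS evenly across 96 drives and compares the resulting 4K read bandwidth with raw link rates. Two assumptions are not from the source: that ‘4K’ means 4,096 bytes, and that each E3.S SSD connects over an x4 link. The per-lane rates (32 GT/s for PCIe 5.0, 64 GT/s for PCIe 6.0) are raw figures before encoding and protocol overhead, so real usable bandwidth is somewhat lower.

```python
# Figures quoted in the article.
TOTAL_IOPS = 528_000_000
DRIVE_COUNT = 96

# Assumptions (not from the source).
BLOCK_BYTES = 4096   # "4K" taken to mean 4,096 bytes
LANES_PER_DRIVE = 4  # each E3.S SSD assumed to use an x4 link

# Raw per-lane signalling rates in GT/s.
RAW_GT_PER_LANE = {"PCIe 5.0": 32, "PCIe 6.0": 64}


def raw_link_gb_per_s(gt_per_lane: int, lanes: int) -> float:
    """Raw one-direction link bandwidth in GB/s, ignoring all overhead."""
    return gt_per_lane * lanes / 8  # GT/s treated as Gbit/s per lane


per_drive_iops = TOTAL_IOPS / DRIVE_COUNT
per_drive_gb_per_s = per_drive_iops * BLOCK_BYTES / 1e9
aggregate_tb_per_s = TOTAL_IOPS * BLOCK_BYTES / 1e12

print(f"Per drive: {per_drive_iops:,.0f} IOPS, {per_drive_gb_per_s:.2f} GB/s")
print(f"Aggregate: {aggregate_tb_per_s:.2f} TB/s")
for generation, rate in RAW_GT_PER_LANE.items():
    link = raw_link_gb_per_s(rate, LANES_PER_DRIVE)
    verdict = "fits" if per_drive_gb_per_s <= link else "does not fit"
    print(f"{generation} x{LANES_PER_DRIVE}: {link:.0f} GB/s raw, {verdict}")
```

Under these assumptions, the per-drive read rate exceeds what a PCIe 5.0 x4 link can carry even before overhead, while a PCIe 6.0 x4 link has headroom.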

Can any server be converted to a SCADA server?

No. SCADA requires a specific hardware ecosystem, including PCIe 6.x switches and compatible GPUs (like the Blackwell series) that can handle the storage control logic. It is a specialized architectural design, not a software update.

How does liquid cooling affect the performance of the SSDs?

Enterprise SSDs generate significant heat during sustained random read/write operations. When they overheat, they engage in ‘thermal throttling,’ which drastically reduces speed. Liquid cooling keeps the drives at a steady, lower temperature, so they can hold peak performance under sustained load instead of throttling.

What is the cost of a SCADA-based server?

Wiwynn has not disclosed official pricing. However, given the inclusion of four RTX Pro 6000 Blackwell GPUs, a Vera CPU, and nearly 3PB of high-end Micron PCIe 6.0 storage, these systems are targeted at hyperscalers and enterprise AI labs, likely costing hundreds of thousands of dollars per unit.

Final Reporting Notes

The Wiwynn SCADA server is a signal that the AI industry is moving away from general-purpose computing. We are seeing the birth of ‘AI-native hardware’ where every component—from the cooling system to the PCIe switch—is designed specifically for the erratic, high-speed, and massive-scale demands of neural networks. While SCADA is currently a niche solution for the highest tier of AI labs, it sets the blueprint for how all high-performance data centers will likely operate by the end of the decade.
