Norway is building a ‘Sovereign AI’ to save its language from the English-centric LLM surge

Saran K | May 26, 2026

    The fight for linguistic sovereignty

    For most of the world, the current AI revolution feels like an English-language project. While models from OpenAI, Google, and Anthropic can translate dozens of tongues, they often struggle with the nuance, historical context, and cultural idiosyncrasies of smaller languages. Norway has decided that relying on commercial providers is a strategic risk.

    The National Library of Norway (Nasjonalbiblioteket) is currently developing a sovereign Large Language Model (LLM) designed specifically to understand and reflect the Norwegian language. The project isn’t just a technical exercise; it is a cultural safeguard. According to Marius Husnes, Head of IT Platform at the library, any nation that fails to develop its own sovereign AI risks having its history, news, and culture filtered through the lens of a globally trained, English-speaking model.

    The mandate came from Norway’s Ministry of Culture, which tasked the library with the project due to its unique position as the country’s primary digital archive. Through its legal deposit mandate, the library possesses the largest digital collection of Norwegian books, newspapers, and web pages in existence—a dataset that no private company could legally or physically replicate.

    Solving the ‘Data Bottleneck’

    Building a model of this scale requires more than just raw compute; it requires a massive, high-speed data pipeline. The library has been digitizing its archives since 2005, amassing roughly 20 PB of unique data (60 PB once the three copies required by the 3-2-1 backup rule are counted), but getting that data into a usable format for AI is a different challenge entirely.
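
    The 3-2-1 rule means keeping at least three copies of the data, on at least two different types of storage media, with at least one copy off-site; three full copies of 20 PB account for the 60 PB figure. The sketch below encodes the rule as a simple check plus the footprint arithmetic. The `Copy` type and the media labels in the example layout are hypothetical: the article does not say which media the library actually uses.

```python
from dataclasses import dataclass


@dataclass
class Copy:
    media: str     # storage medium type; labels here are illustrative only
    offsite: bool  # True if this copy lives at a different site


def satisfies_3_2_1(copies: list[Copy]) -> bool:
    """Check the 3-2-1 rule: >= 3 copies, on >= 2 media types, >= 1 off-site."""
    return (
        len(copies) >= 3
        and len({c.media for c in copies}) >= 2
        and any(c.offsite for c in copies)
    )


def raw_footprint_pb(unique_pb: float, copies: list[Copy]) -> float:
    """Total stored capacity when every copy holds the full unique dataset."""
    return unique_pb * len(copies)


# Hypothetical layout; the article does not describe the real one.
layout = [Copy("disk", False), Copy("tape", False), Copy("tape", True)]
print(satisfies_3_2_1(layout))         # True
print(raw_footprint_pb(20.0, layout))  # 60.0
```

    The footprint function deliberately ignores compression and deduplication across copies, which is why it reproduces the plain 3 × 20 PB arithmetic.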

    Speaking at Huawei’s ID Forum 2026 in Paris, Husnes noted that the primary bottleneck wasn’t the GPUs, but rather data quality, cleaning, and throughput. To bridge the gap between long-term preservation and active training, the library implemented a hybrid storage architecture. The bulk of the archive resides in a high-durability, high-latency system optimized for cost and longevity. To feed the AI pipeline, however, the library deployed 2 PB of Huawei OceanStor Dorado all-flash storage.
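
    A 2 PB flash tier cannot hold a 20 PB archive at once, so whatever portion is selected for training must pass through it in rounds whenever that selection exceeds the flash capacity. The article does not say how the library schedules these transfers. As a hypothetical illustration, the sketch below computes the lower bound on rounds if the entire archive were staged, and uses simple next-fit batching to group items into waves that each fit the staging tier. Both function names are invented for this example.

```python
import math


def min_waves(archive_pb: float, staging_pb: float) -> int:
    """Lower bound on staging rounds, assuming each round fills the flash tier completely."""
    return math.ceil(archive_pb / staging_pb)


def plan_waves(item_sizes_tb: list[float], staging_capacity_tb: float) -> list[list[int]]:
    """Next-fit batching: group items (by index, in archive order) into waves that fit the staging tier.

    A new wave starts whenever the next item would overflow the current one.
    """
    waves: list[list[int]] = []
    current: list[int] = []
    used = 0.0
    for i, size in enumerate(item_sizes_tb):
        if size > staging_capacity_tb:
            raise ValueError(f"item {i} ({size} TB) is larger than the staging tier")
        if used + size > staging_capacity_tb:
            waves.append(current)  # close the full wave
            current, used = [], 0.0
        current.append(i)
        used += size
    if current:
        waves.append(current)  # flush the last, partially filled wave
    return waves


print(min_waves(20, 2))  # 10
# Toy sizes in TB with a 2 TB staging tier, just to show the grouping.
print(plan_waves([1.5, 0.8, 1.0, 0.4], staging_capacity_tb=2.0))  # [[0], [1, 2], [3]]
```

    Next-fit keeps archive order and needs no lookahead, which suits streaming reads from a high-latency tier; it can use more waves than the `min_waves` bound, which no packing can beat.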

    This flash layer acts as a low-latency staging area where raw text, sound, and images undergo ingestion, deduplication, and format normalization. This processed data is then fed into an Nvidia DGX H200 system and a 384-core CPU cluster before being sent to the actual training site: Norway’s national supercomputer, the Sigma2 Olivia system.
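
    The article names ingestion, deduplication, and format normalization as the staging steps but gives no implementation details. As an illustration only, the sketch below shows one common approach for text: Unicode NFC normalization (Norwegian “å” can be encoded either as one code point or as “a” plus a combining ring) with whitespace cleanup, followed by exact deduplication on the SHA-256 hash of the normalized text. The function names are hypothetical; real pipelines typically add near-duplicate detection and separate handling for sound and images, none of which is shown here.

```python
import hashlib
import re
import unicodedata


def normalize_text(raw: str) -> str:
    """Canonical form: Unicode NFC (so 'å' has one encoding), whitespace runs collapsed."""
    text = unicodedata.normalize("NFC", raw)
    return re.sub(r"\s+", " ", text).strip()


def dedupe(docs: list[str]) -> list[str]:
    """Exact dedup: keep the first document for each SHA-256 digest of its normalized text."""
    seen: set[str] = set()
    kept: list[str] = []
    for doc in docs:
        norm = normalize_text(doc)
        digest = hashlib.sha256(norm.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(norm)
    return kept


docs = [
    "Hei  verden",        # double space
    "Hei verden\n",       # trailing newline
    "bl\u00e5b\u00e6r",   # "blåbær" with precomposed å (U+00E5)
    "bla\u030ab\u00e6r",  # same word, å written as "a" + combining ring (U+030A)
]
print(dedupe(docs))  # ['Hei verden', 'blåbær']
```

    Storing fixed-size digests rather than full documents keeps the “seen” set small, which matters when the corpus runs to petabytes.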

    The machinery of a national model

    The Sigma2 Olivia system, an HPE Cray Supercomputing EX, does the heavy lifting for the actual training runs. It has 448 GPUs and 64,512 CPU cores and is backed by a 5.3 PB Cray ClusterStor E1000 storage system.

    The transition from a ‘cold’ archive to a ‘hot’ training pipeline proved to be one of the project’s steepest learning curves. Husnes pointed out that there is very little industry documentation on the specifics of moving petabyte-scale datasets from a preservation archive into an AI pipeline. His team essentially had to map this territory from scratch, balancing the extreme durability the archive requires against the parallel data I/O the training phase requires.
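
    Husnes did not describe how the team parallelized reads. One generic pattern, sketched here under that caveat, is to stripe the list of training files round-robin across workers so each reads a disjoint subset with nearly equal file counts, all concurrently. The `read` callable and the in-memory file map are hypothetical stand-ins for the real storage system.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def assign_shards(paths: list[str], num_workers: int) -> list[list[str]]:
    """Round-robin striping: worker k gets paths k, k+n, k+2n, ... (file counts differ by at most one)."""
    return [paths[k::num_workers] for k in range(num_workers)]


def parallel_read(paths: list[str], num_workers: int, read: Callable[[str], bytes]) -> int:
    """Read all shards concurrently, one thread per shard; return the total number of bytes read."""
    shards = assign_shards(paths, num_workers)

    def read_shard(shard: list[str]) -> int:
        # Each worker reads only its own disjoint subset of files.
        return sum(len(read(p)) for p in shard)

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return sum(pool.map(read_shard, shards))


# Hypothetical in-memory "filesystem": ten files of 1..10 bytes.
fake_fs = {f"part-{i:04d}.jsonl": b"x" * (i + 1) for i in range(10)}
files = sorted(fake_fs)
print(assign_shards(files, 3)[0])  # ['part-0000.jsonl', 'part-0003.jsonl', 'part-0006.jsonl', 'part-0009.jsonl']
print(parallel_read(files, 3, fake_fs.__getitem__))  # 55
```

    Threads are a reasonable fit for this I/O-bound case because CPython releases the GIL during blocking I/O; injecting `read` lets the same code run against a parallel filesystem mount or a test stub.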

    Beyond the technology

    The project also navigated a complex legal landscape. Through specific agreements with Norwegian newspapers, the library was able to train the model on copyrighted content, a privilege that puts it in a stronger position than commercial AI startups currently fighting copyright lawsuits in the US and EU.

    As training continues, the Norwegian experiment serves as a blueprint for other non-English speaking nations. The project highlights a growing trend toward ‘AI sovereignty,’ where states treat their linguistic data as a critical national resource. As Husnes framed it, AI needs custodians, not just builders; without a dedicated effort to preserve local context, the digital future may be written in a language that forgets the specifics of the past.

    #artificialIntelligence #bigData #norway #huawei #supercomputing #llm #flash
