
Welsh digital history archived and searchable with MirrorWeb

Philip Clegg

Welsh Government’s dual-language websites and Twitter accounts preserved for generations to come

Manchester tech firm MirrorWeb has been awarded a three-year contract by the Welsh Government to digitally archive the Welsh nation’s online presence. The archive will be fully accessible to the Welsh Government’s Information and Archive Services team.

The cloud-native web and social media archiving company will preserve the Welsh Government’s web-published content in both English and Welsh, including llyw.cymru/gov.wales and all websites of historic and national significance.

MirrorWeb will also preserve a selection of key Welsh Government Twitter accounts, including: @WelshGovernment, @LlywodraethCym and @FMWales.

MirrorWeb has developed robust and highly scalable cloud-based archiving and monitoring tools that enable frequent archiving of web and social media assets for private sector businesses and public sector bodies. The platform allows billions of documents to be indexed at speed, making archives fully usable and searchable.

The decision by the Welsh Government to archive the dual-language websites and Twitter accounts will digitally preserve the Internet heritage of the Welsh nation. It will mean that the preserved digital history is fully indexed and searchable for generations to come.

National archives around the world have been collecting data for decades, and are only now beginning to realise that the archives of the future will be born out of web and social media content. Safely capturing and storing this information is the only way to prevent it being lost completely. Modern Big Data tools and the emergence of cloud computing now enable archives to index the data and derive real value from it.

The project is currently in its early stages. MirrorWeb has received the Welsh Government’s existing historical archive (around 1.8 million pages, amounting to 20TB of data), which has now been transferred seamlessly and cost-efficiently to the cloud. The pages were harvested over the last three years, and have now been captured and indexed by MirrorWeb’s proprietary platform in a matter of hours.

MirrorWeb will now perform crawls of the sites and social media channels, harvesting and preserving the up-to-date data and publishing it using its state-of-the-art technology, providing a comprehensive archive for future generations to access and use.

Philip Clegg, Chief Technical Officer at MirrorWeb, added: “We’re proud to be preserving a future-proof digital record of the Welsh Government’s online activity. Our website and social media archiving and monitoring platform is built on the cloud and provides the essential infrastructure and capacity to meet the size and complexity of the Welsh Government’s archive. This will modernise how the content is captured and stored, and provide the Welsh Government with a reliable, comprehensive and intuitive search service of Wales’ digital history.”

The announcement follows the recent contract award for indexing and archiving the UK Central Government’s online presence to the cloud. The UK’s National Archives engaged MirrorWeb to capture and transfer a 120TB web archive encompassing billions of web pages in less than two weeks.
