“MirrorWeb have brought some outstanding technical capabilities - in particular with data migration, cloud computing, search and new ways of harvesting and crawling content, as well as new ways of presenting that content and making it available.”

- John Sheridan, Digital Director, The National Archives

Introduction.

“The home to 1000 years of British history”. The National Archives is the archive of the UK government and the sector lead for all archives across the UK.

Watch our video interview with John Sheridan (Digital Director, The National Archives) to find out how MirrorWeb enables the long-term preservation of all UK government web and social media communications.

“What I’ve been most impressed with MirrorWeb is their creative use of cloud computing technologies. For example, to index the entirety of our 120TB collection by spinning 1,000 node plus cluster of computers to process that collection in just a couple of days has been hugely impressive.”

 

The National Archives’ Challenge

 


The way the government uses the web is changing. As the size and complexity of the UK Government Web Archive (UKGWA) have grown, so have the expectations of its users, who now demand a reliable, comprehensive and intuitive search service as well as access to social media records.

In 2016, The National Archives (TNA) realised they needed to update their web archiving and social archiving provision. They were looking for someone to:

  • Take their existing archives and modernise how they captured and stored content.
  • Help them capture new digital content including the government's social media channels.

 


MirrorWeb’s Challenge

 

First, we needed to collect and move the archive from the previous supplier. The data was stored on 72 hard drives of 2TB each, so we required two custom-built machines to connect the drives simultaneously and ingest the data. This was accomplished within two weeks.

The next phase was to develop a public-facing web archive that was searchable and allowed archived content to be replayed, whilst serving over 75 million visitors per month. We chose Elasticsearch as our primary search technology for the following reasons:

  • Scalability - we spun up a very large cluster of more than 1,000 nodes to do the initial ingest of data and then scaled it down to an affordable level when we deployed live.
  • We could integrate it into the Amazon environment and monitor it with CloudWatch.

What we achieved

 

MirrorWeb’s capabilities have given TNA, for the first time, the ability to index the whole of the web archive, which has significantly improved searchability for users. A wide range of digital content is now indexed by the search facility, and users can narrow their search to a particular archived site.
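As a rough illustration of narrowing a search to one archived site, the sketch below builds an Elasticsearch query body that combines a full-text match with a site filter. It is not MirrorWeb's actual implementation: the field names `content` and `site`, the function name `build_site_search` and the example domain are all hypothetical, and the `term` filter assumes `site` is mapped as a keyword field.

```python
import json


def build_site_search(query_text, site=None, size=10):
    """Build an Elasticsearch query DSL body for a full-text search,
    optionally restricted to a single archived site.

    The field names "content" and "site" are assumptions for illustration.
    """
    body = {
        "size": size,
        "query": {
            "bool": {
                # Full-text relevance match on the page text.
                "must": [{"match": {"content": query_text}}],
            }
        },
    }
    if site is not None:
        # Exact-match filter; does not affect relevance scoring.
        body["query"]["bool"]["filter"] = [{"term": {"site": site}}]
    return body


if __name__ == "__main__":
    # Hypothetical search, limited to one hypothetical archived site.
    print(json.dumps(build_site_search("budget statement", site="example.gov.uk"), indent=2))
```

Putting the site restriction in `filter` rather than `must` keeps it out of the relevance score, so results within the chosen site are still ranked purely by how well they match the search text.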

John Sheridan, Digital Director of The National Archives, said:

“MirrorWeb have brought some outstanding technical capabilities - in particular with data migration, cloud computing, search and new ways of harvesting and crawling content, as well as new ways of presenting that content and making it available. Improving search for users has been one of the biggest things that MirrorWeb have been able to achieve.”


The key features and benefits.

ISO-Certified & Time-Stamped Archives

Every archived file is time-stamped, immutable and stored in an ISO-compliant format to ensure authenticity and legal acceptance.

Automated Archiving

Standard daily, weekly or monthly crawls for all website URLs and daily crawls for social accounts. 

Replayable Web & Social Content

Fully indexed and searchable WARC files. Users can replay content and archived metadata at any time and curate collections in line with Dublin Core.

Long-Term Preservation Guaranteed

All digital information is preserved and protected, ensuring digital content is never lost or made obsolete.

Compliance Requirements Met

Our digital archiving technology answers the regulators' requirements, meaning archives are source-proof, tamperproof and immutable.

Scalable Cloud Technology

Our cloud-based platform can manage huge datasets and is light touch, requiring no infrastructure costs or extra resource burdens on customers.

A Single Searchable Archive

All digital assets are fully indexed and searchable in the platform, making it easier than ever to find online records and content.

Discovery Support

All archived website content can be made available to academic professionals, researchers, government bodies and other third parties for required purposes.

Data Sovereignty

The National Archives are in total control of when, where and how their data is archived, and the archive complies with ISO standards.
