“The home to 1000 years of British history”. The National Archives is the archive of the UK government and the sector lead for all archives across the UK.
Watch our video interview with John Sheridan (Digital Director, The National Archives) to find out how MirrorWeb enables the long-term preservation of all UK government web and social media communications.
“What I’ve been most impressed with at MirrorWeb is their creative use of cloud computing technologies. For example, indexing the entirety of our 120TB collection by spinning up a 1,000-plus-node cluster of computers to process that collection in just a couple of days has been hugely impressive.”
The way the government uses the web is changing. As the UK Government Web Archive (UKGWA) has grown in size and complexity, so have the expectations of its users, who now demand a reliable, comprehensive and intuitive search service as well as access to social media records.
In 2016, The National Archives (TNA) realised they needed to update their web archiving and social media archiving provision, and began looking for a new supplier.
First, we needed to collect the archive from the previous supplier and move it to our platform. The data was stored on 72 hard drives of 2TB each, so we required two custom-built machines that could connect all the drives simultaneously and ingest the data; this was accomplished within two weeks.
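A drive-by-drive migration like this can be sketched in Python, assuming each drive is mounted as a directory on the ingest machine. The function names, the SHA-256 manifest and the thread-pool approach below are illustrative assumptions, not MirrorWeb's actual tooling: several drives are copied concurrently, and every copied file is checked against the checksum of its source before it is recorded as ingested.

```python
import hashlib
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large files never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ingest_drive(mount: Path, dest: Path) -> dict[str, str]:
    """Copy every file from one mounted drive into dest/<drive name>, verifying each copy."""
    manifest = {}
    for src in sorted(p for p in mount.rglob("*") if p.is_file()):
        target = dest / mount.name / src.relative_to(mount)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, target)
        src_hash = sha256_of(src)
        # Refuse to record a file whose copy does not match the original bytes.
        if sha256_of(target) != src_hash:
            raise OSError(f"checksum mismatch for {src}")
        manifest[target.relative_to(dest).as_posix()] = src_hash
    return manifest


def ingest_all(mounts: list[Path], dest: Path, workers: int = 8) -> dict[str, str]:
    """Ingest several drives concurrently and merge their per-file manifests."""
    combined = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for manifest in pool.map(lambda m: ingest_drive(m, dest), mounts):
            combined.update(manifest)
    return combined
```

Threads suit this job because copying and hashing are dominated by disk I/O; the resulting manifest doubles as a record of exactly what was migrated from each drive.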
The next phase was to develop a public-facing web archive that was searchable and allowed archived pages to be replayed, whilst serving over 75 million visitors per month. We chose Elasticsearch as our primary search technology.
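Before archived content can be searched, each WARC record has to be turned into a document a search engine can index. The sketch below is a simplified illustration, not MirrorWeb's pipeline: it assumes a single uncompressed record, matches header names case-sensitively (the WARC specification treats them case-insensitively), and does not strip the HTTP headers that `response` records carry. `WARC-Type`, `WARC-Target-URI`, `WARC-Date` and `Content-Length` are real WARC header fields; the output field names are hypothetical.

```python
from urllib.parse import urlsplit


def parse_warc_record(record: bytes) -> tuple[dict[str, str], bytes]:
    """Split one uncompressed WARC record into its named headers and content block."""
    head, _, rest = record.partition(b"\r\n\r\n")
    lines = head.decode("utf-8").split("\r\n")
    if not lines[0].startswith("WARC/"):
        raise ValueError("not a WARC record")
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only: values such as URIs and dates contain colons.
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    length = int(headers["Content-Length"])
    return headers, rest[:length]


def to_search_doc(record: bytes) -> dict[str, str]:
    """Build a flat document (hypothetical field names) ready to send to a search index."""
    headers, block = parse_warc_record(record)
    uri = headers["WARC-Target-URI"]
    return {
        "url": uri,
        "site": urlsplit(uri).hostname or "",
        "crawl_date": headers["WARC-Date"],
        "record_type": headers["WARC-Type"],
        "content": block.decode("utf-8", errors="replace"),
    }
```

A production indexer would more likely rely on an established WARC-reading library and extract text from HTML before indexing.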
MirrorWeb’s capabilities have given TNA, for the first time, the ability to index the entire web archive, which has significantly improved searchability for users. A whole raft of digital content is now indexed by the search facility, and users can narrow a search to a single archived site.
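Narrowing a full-text search to one archived site maps naturally onto an Elasticsearch `bool` query with a `must` clause for the text and a `filter` clause for the site. The sketch below only builds the Query DSL body; the field names `content` and `site` are hypothetical, and `site` is assumed to be mapped as a `keyword` field so that a `term` filter matches hostnames exactly.

```python
from typing import Optional


def site_search_query(text: str, site: Optional[str] = None, size: int = 10) -> dict:
    """Build an Elasticsearch Query DSL body: full-text match, optionally limited to one site."""
    query: dict = {"bool": {"must": [{"match": {"content": text}}]}}
    if site:
        # Clauses in filter context restrict the hits but do not affect relevance scores.
        query["bool"]["filter"] = [{"term": {"site": site}}]
    return {"query": query, "size": size}
```

The returned dict can be serialised with `json.dumps` and sent as the body of a `_search` request; putting the site restriction in filter context also lets Elasticsearch cache it across queries.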
John Sheridan, Digital Director of The National Archives, said:
“MirrorWeb have brought some outstanding technical capabilities - in particular with data migration, cloud computing, search and new ways of harvesting and crawling content, as well as new ways of presenting that content and making it available. Improving search for users has been one of the biggest things that MirrorWeb have been able to achieve.”
Every archived file is time-stamped, immutable and stored in an ISO-compliant format (WARC, ISO 28500) to ensure authenticity and legal acceptance.
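One common way to make an archived file's authenticity checkable is to record a cryptographic digest alongside its capture time, then re-hash the file later. This is a minimal sketch of that general technique, not a description of how MirrorWeb implements it; the function names and record layout are hypothetical.

```python
import hashlib
from datetime import datetime, timezone
from typing import Optional


def fixity_record(data: bytes, captured_at: Optional[datetime] = None) -> dict[str, str]:
    """Pair a SHA-256 digest of the captured bytes with a UTC capture timestamp."""
    when = captured_at or datetime.now(timezone.utc)
    return {
        "sha256": hashlib.sha256(data).hexdigest(),
        "captured_at": when.astimezone(timezone.utc).isoformat(),
    }


def verify_fixity(data: bytes, record: dict[str, str]) -> bool:
    """Re-hash the stored bytes; any change to the content changes the digest."""
    return hashlib.sha256(data).hexdigest() == record["sha256"]
```

The digest only detects tampering if the record itself is kept somewhere the file's editor cannot also rewrite, for example write-once storage; that storage layer is not shown here.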
Standard daily, weekly or monthly crawls for all website URLs, and daily crawls for social media accounts.
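A crawl scheduler needs to work out when each target is next due. The sketch below assumes only the three frequencies named above; the function name is hypothetical, and clamping monthly crawls to the last day of a shorter month (so 31 January is followed by 29 February in a leap year) is one design choice among several.

```python
import calendar
from datetime import date, timedelta


def next_crawl(last: date, frequency: str) -> date:
    """Return the next due date for a target crawled daily, weekly or monthly."""
    if frequency == "daily":
        return last + timedelta(days=1)
    if frequency == "weekly":
        return last + timedelta(weeks=1)
    if frequency == "monthly":
        # Roll December over into January of the following year.
        year = last.year + last.month // 12
        month = last.month % 12 + 1
        # Clamp to the last day of the target month (e.g. 31 Jan -> 28/29 Feb).
        day = min(last.day, calendar.monthrange(year, month)[1])
        return date(year, month, day)
    raise ValueError(f"unknown frequency: {frequency!r}")
```

Under this scheme a website on a monthly schedule and a social media account on a daily one share the same code path; only the frequency string differs.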
Fully indexed and searchable WARCs. Users can replay content and archived metadata at any time and curate collections using Dublin Core metadata.
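Curating a collection "in line with Dublin Core" means describing items with the fifteen elements of the Dublin Core Metadata Element Set 1.1 (title, date, identifier and so on). The element names and the namespace URI below are real; the `record` wrapper element and the function are hypothetical, and this is only a sketch of one way to serialise such a description.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The fifteen elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def dublin_core_record(fields: dict[str, str]) -> str:
    """Serialise simple Dublin Core elements as XML under a hypothetical <record> root."""
    root = ET.Element("record")
    for name, value in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")
```

Rejecting unknown keys keeps collection metadata limited to the standard element set, which makes it easier to exchange with other archives.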
All digital information is preserved and protected, ensuring digital content is never lost or made obsolete.
Our digital archiving technology meets regulators' requirements: archived records are source-proof, tamperproof and immutable.
Our cloud-based platform can manage huge data-sets and is light touch, requiring no infrastructure costs or extra resource burdens on customers.
All digital assets are fully indexed and searchable in the platform, making it easier than ever to find online records and content.
All archived website content can be made available to academic professionals, researchers, government bodies and other third parties for required purposes.
The National Archives are in total control of when, where and how their data is archived, and the archive complies with ISO standards.
MirrorWeb Limited
Kenworthys Buildings / 83 Bridge Street
Manchester / M3 2RF / United Kingdom
Registered in England / Registration No. 08072284
251 Little Falls Drive / Wilmington
New Castle / Delaware / 19808 / United States
0800 222 9200
info@mirrorweb.com