6 Drivers for Digital Archiving in the Museums & Higher Education Markets

Marketing Team

“If you’re not capturing web & social content now, to do it retrospectively in 12 months’ time could be virtually impossible.” David Clee, MirrorWeb.

Following the recent announcement of the MirrorWeb and Arkivum partnership that delivers a hybrid, end-to-end digital archiving solution, we sat down with both organisations to discuss the drivers for website and social media archiving in the heritage, museums and higher education markets.

Joining us were MirrorWeb’s CEO, David Clee, along with Arkivum’s Head of Marketing, Becks Hicks, and VP in Higher Education, Archives, Libraries & Heritage, Paula Keogh. We would also like to thank the DPC and Sean Rippington, Digital Archives Officer at the University of St Andrews, for their support and contributions to this article.

Read on to learn more about the drivers for website and social media archiving in the heritage, museums and higher education markets.

1. Digital Data is at Risk

Many organisations still rely on legacy technologies or formats in danger of becoming obsolete, and this is putting digital content at risk of being lost forever.

David Clee, MirrorWeb, on how capturing web & social content in 12 months' time could be virtually impossible

Then consider the sheer volume of content created digitally on a daily basis: 2.5 quintillion bytes of data are created online every single day. To conceptualise that, if you laid out 2.5 quintillion one pence coins, they would cover the surface area of Earth five times over.

We are currently in the eye of a perfect storm. The consequences of not capturing content today are hard to comprehend, given the potential impact on each future generation’s ability to access the archives.

The DPC’s ‘Bit List’ of Digitally Endangered Species highlights this by listing digital materials most at risk, demonstrating the need to safeguard digital materials that could be lost to future generations. As MirrorWeb CEO, David Clee, states, “If you’re not capturing web and social content now, to do it retrospectively in 12 months’ time could be virtually impossible.” 

He went on to explain that the average lifespan of a web page is believed to be 90 days, with modern CMS platforms allowing webmasters to change content on the fly with no focus or best practice in place for capturing older versions of content. This is not intentional but a result of digital evolution, where historically the technology and tools to do so have not been openly available. It is only now that marketing and commercial teams are making sense of the need to look back to move forward.

The future use of the digital archive is only just being explored, but as AI and big data tools start to improve the interrogation and analysis of the archived website and social content, we will truly be able to make archives open to everyone, from anywhere, and usable in areas we as a community would not have thought possible 5 years ago.

2. Preserve Website Content of Commercial, Cultural and Historical Importance 

Sean Rippington from the University of St Andrews says websites now represent a key record of what is happening at any given time. This is also reflected in the wider higher education, libraries and heritage sectors.

For example, many university publications have been replaced by web publications over the past 20 years. This means websites are now a key repository for official documents and much information only exists online. This includes university course material, research outputs, blogs, video and audio content, etc.

This online information is vital to capture for legal reasons. For example, although the CMA's Advice on Consumer Protection Law for UK Higher Education Providers does not explicitly state that HEPs need to use a web archiving solution, one would prove invaluable in demonstrating compliance by maintaining a permanent record of what students had been given. It is equally vital for preserving content of commercial, cultural and historical importance.

A poignant example of the importance of preserving website content is how, within 24 hours of President Trump getting into office, the official US Government climate change website was removed. It is only through its digital preservation by archivists that it was saved before it disappeared. And it is this archive that will influence historical discourse and how future historians, researchers, social commentators, etc. see what happened and why.

We must also consider transient websites for short-term exhibitions, sporting events such as the Olympics and Commonwealth Games, theatre productions, etc. They get shut down again and again, and unlike posters and programmes won’t get put in a box for storage. Unless we do something now, they will literally disappear.


3. Preserve Social Media Content and Communications 


Social media platforms now form an essential part of any organisation’s communications, according to Arkivum’s Paula Keogh: “There has been a shift in how we communicate, and if you think about some of the online conversations and the history that is happening online, whether it’s via Twitter, LinkedIn or Facebook, the realisation that social is fast becoming part of the heritage, museums and HE markets’ daily role is both exciting and daunting in equal measure.”

We rarely write letters to each other anymore, and so how will people in the future studying these areas gain a rounded view of public opinion if the main conversation happened on Twitter? 

Arkivum's Paula Keogh - Change in how heritage, museums and HE communicate via social media

Pliny the Younger wrote down his eyewitness account of the destruction of Pompeii in 79 AD, and that is the only reason we know what happened. In many ways, Twitter is our modern-day papyrus, and of equal importance.

This is particularly pertinent for disasters such as the Manchester bombing, Grenfell Tower fire and Westminster terror attack, where a huge amount of highly important data on how the public reacted at these historically significant times exists on social media.

Social media platforms now also act as key sites of record for student societies, sports teams and wider university communications. This includes communications such as photos and annual reports. But, because much of this data now only exists on social media, it is at risk of being lost if something isn't done now.

Preserving social media content is also important as a source of research data. For example, Twitter datasets around a hashtag at a particular time, or vast troves of social data harvested from many platforms, can support data-validated research and show impact. Being able to track and archive tweets and posts about a project or programme, and to find relevant content to include in reports, has also become paramount for research within these markets.

4. Preserve Digital Data for Future Insights

Website and social media archiving is still at an early stage in the grand scheme of things, as Paula Keogh says: “The move from analogue and physical archive records, which we’ve had a long time to get used to and find strategies for keeping and accessing in the future, is a luxury of time we don’t have with digital data.”

David Clee, MirrorWeb, on why we need to capture and archive content now

But, even if we’re not sure what use this archived digital data will provide in the future, it’s better to have it than not to have it.

As explained by MirrorWeb’s David Clee: “We do have to capture stuff today because we don’t really understand how we’re going to use that or what we’re going to do with it in the future. By having content archived now, this means that as we do learn, evolve and develop, we can make sure information is never lost and organisations can access it in the future.”


5. Preserve Research Data and Output


Researcher databases need preserving, and research outputs need to be made available for reference by future generations. In universities, for example, researchers produce websites that need preserving as part of their research data output. This is to ensure compliance with open data initiatives, funder requirements and the Research Excellence Framework (REF).

Archiving content from external websites, such as those of research institutions, government bodies, policy makers and corporate leaders, is also important to support the REF. It can provide evidence of university research outputs being used or praised by external parties.

Most universities also encourage their researchers to deposit large web and social datasets with specialist data centres. This makes the data more discoverable to the research community who might reuse them.

6. Maintain Best Practice in Record Keeping, Protect Investments and Reuse Archived Content


Maintaining best practice in record keeping will, or should, mean that website and social media content in the heritage, museums and higher education markets is archived to the same degree as other forms of corporate records. For example, some higher education institutions have used GDPR compliance to garner support for better management of digital records such as website and social media data.

Website and social media archiving also helps to preserve an organisation’s investments in digital communications, such as professionally produced videos and blog content that would otherwise be at risk of being lost. Sean Rippington from the University of St Andrews also highlights the use of preserved website and social media content in outreach and alumni relations work, and to support marketing and other publicity activities.

Get Research Access to MirrorWeb and Arkivum's Hybrid, End-to-End Digital Archiving Solution


Arkivum had increasingly been asked about website and social media archiving by their existing customers, but were unable to provide a satisfactory solution. This is because the concept was still cutting-edge and the technology was still in conception and development within the community at large.

Then, in 2016, MirrorWeb emerged into the market in conjunction with The UK National Archives, when they launched their new web and social media crawler tech stack, built on AWS cloud technology and with automated QA features.

Taking a lead from other successful technology sectors, both companies identified that concentrating on what they do well and specialise in is best practice - but collaborating to bring the specialisms together to improve and satisfy the customer need is the way forward. And now MirrorWeb and Arkivum’s hybrid, end-to-end digital archiving solution is the most comprehensive data lifecycle management solution and service for website archiving and social media archiving in museum and higher education markets.

The portal provided by MirrorWeb is user-friendly and light-touch, needing minimal user input to set up, crawl and replay web and social media archives in high fidelity. It is also cost-effective, as it harnesses the power of the cloud. This means cost is no longer a barrier to capturing and archiving the digital content we all need today, and will need tomorrow once we understand how to use it and where it fits in the archivist’s overarching asset bank.

The WARCs created are passed seamlessly and automatically via an API to Arkivum’s Perpetua system, preserving and future-proofing the web and social media content for all time.
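For readers unfamiliar with the format, a WARC (Web ARChive, standardised as ISO 28500) file is a sequence of records, each made up of a version line, a set of named header fields, a blank line and the captured content. The sketch below is purely illustrative and is not MirrorWeb's or Arkivum's implementation: the function names, the example URL and the example date are all assumptions, and the hand-off API itself is not shown because it is not described here. It builds one simplified record in memory, reads it back, and computes a SHA-256 checksum of the kind a preservation system might use to confirm that content has not changed.

```python
import hashlib
import io
import uuid

CRLF = b"\r\n"


def build_record(target_uri: str, payload: bytes, date: str) -> bytes:
    """Serialise one simplified WARC 'resource' record (illustrative only)."""
    record_id = f"<urn:uuid:{uuid.uuid4()}>"
    header_lines = [
        b"WARC/1.0",
        b"WARC-Type: resource",
        b"WARC-Record-ID: " + record_id.encode("ascii"),
        b"WARC-Target-URI: " + target_uri.encode("utf-8"),
        b"WARC-Date: " + date.encode("ascii"),
        b"Content-Length: " + str(len(payload)).encode("ascii"),
    ]
    # Headers, a blank line, the content block, then two CRLFs to end the record.
    return CRLF.join(header_lines) + CRLF + CRLF + payload + CRLF + CRLF


def read_record(stream):
    """Read one record from a binary stream; return (headers, payload)."""
    version = stream.readline().strip()
    if not version.startswith(b"WARC/"):
        raise ValueError("stream does not start with a WARC record")
    headers = {}
    while True:
        line = stream.readline().strip()
        if not line:  # a blank line ends the header section
            break
        name, _, value = line.partition(b":")
        headers[name.decode("utf-8").strip()] = value.decode("utf-8").strip()
    payload = stream.read(int(headers["Content-Length"]))
    return headers, payload


def fixity(payload: bytes) -> str:
    """Return a SHA-256 hex digest that can be re-checked after transfer."""
    return hashlib.sha256(payload).hexdigest()


# Hypothetical capture of a single page (example values only).
record = build_record(
    "https://example.org/", b"<html>hello</html>", "2018-06-01T12:00:00Z"
)
headers, payload = read_record(io.BytesIO(record))
print(headers["WARC-Target-URI"], fixity(payload))
```

In practice WARC files are usually compressed and produced by crawler tooling rather than written by hand, so this is only meant to show what travels between the capture and preservation stages.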

The partnership now covers safeguarding, digital preservation, compliances, records management and integration, and data discovery, filling a major gap for institutions seeking digital preservation as a comprehensive service with open-standards and a no-vendor-lock-in philosophy.

To try the hybrid, end-to-end digital archiving solution, and see how it can benefit your organisation, request a demo of our website and social media archive here. 

