“If you’re not capturing web & social content now, to do it retrospectively in 12 months’ time could be virtually impossible.” David Clee, MirrorWeb.
Following the recent announcement of the MirrorWeb and Arkivum partnership that delivers a hybrid, end-to-end digital archiving solution, we sat down with both organisations to discuss the drivers for website and social media archiving in the heritage, museums and higher education markets.
Joining us were MirrorWeb’s CEO, David Clee, along with Arkivum’s Head of Marketing, Becks Hicks, and VP in Higher Education, Archives, Libraries & Heritage, Paula Keogh. We would also like to thank the DPC and Sean Rippington, Digital Archives Officer at the University of St Andrews, for their support and contributions to this article.
Read on for the six drivers they identified.
1. Digital Data is at Risk
Many organisations still rely on legacy technologies or formats in danger of becoming obsolete, and this is putting digital content at risk of being lost forever.
The risk grows when you consider the sheer volume of content created digitally on a daily basis: 2.5 quintillion bytes of data are created online every single day. To conceptualise that, if you laid out 2.5 quintillion one pence coins, they would cover the surface area of Earth five times over.
We are currently in the eye of a perfect storm. The consequences of not capturing content today are almost incomprehensible in their potential impact on each future generation’s ability to access the archives.
The DPC’s ‘Bit List’ of Digitally Endangered Species highlights this by listing digital materials most at risk, demonstrating the need to safeguard digital materials that could be lost to future generations. As MirrorWeb CEO, David Clee, states, “If you’re not capturing web and social content now, to do it retrospectively in 12 months’ time could be virtually impossible.”
He went on to explain that the average lifespan of a web page is believed to be 90 days, with modern CMS platforms allowing webmasters to change content on the fly with no focus or best practice in place for capturing older versions of content. This is not intentional but a result of digital evolution, where historically the technology and tools to do so have not been openly available. It is only now that marketing and commercial teams are making sense of the need to look back to move forward.
The future use of the digital archive is only just being explored, but as AI and big data tools start to improve the interrogation and analysis of the archived website and social content, we will truly be able to make archives open to everyone, from anywhere, and usable in areas we as a community would not have thought possible 5 years ago.
2. Preserve Website Content of Commercial, Cultural and Historical Importance
Sean Rippington from the University of St Andrews says websites now represent a key record of what is happening at any given time. This is also reflected in the wider higher education, libraries and heritage sectors.
For example, many university publications have been replaced by web publications over the past 20 years. This means websites are now a key repository for official documents and much information only exists online. This includes university course material, research outputs, blogs, video and audio content, etc.
This online information is vital to capture, both for legal reasons and to preserve content of commercial, cultural and historical importance. For example, although the CMA's Advice on Consumer Protection Law for UK Higher Education Providers does not explicitly state that HEPs need to use a web archiving solution, one would prove invaluable in demonstrating compliance by maintaining a permanent record of what students had been given.
A poignant example of the importance of preserving website content is how, within 24 hours of President Trump taking office, the official US Government climate change website was removed. It is only through its digital preservation by archivists that it was saved before it disappeared. And it is this archive that will influence historical discourse and how future historians, researchers, social commentators, etc. see what happened and why.
We must also consider transient websites for short-term exhibitions, sporting events such as the Olympics and Commonwealth Games, theatre productions, etc. They get shut down again and again and, unlike posters and programmes, won’t get put in a box for storage. Unless we do something now, they will literally disappear.
3. Preserve Social Media Content and Communications
Social media platforms now form an essential part of any organisation’s communications, according to Arkivum’s Paula Keogh: “There has been a shift in how we communicate, and if you think about some of the online conversations and the history that is happening online, whether it’s via Twitter, LinkedIn or Facebook, the realisation that social is fast becoming part of the heritage, museums and HE markets’ daily role is both exciting and daunting in equal measure.”
We rarely write letters to each other anymore, and so how will people in the future studying these areas gain a rounded view of public opinion if the main conversation happened on Twitter?
Pliny the Younger wrote down his eyewitness account of the destruction of Pompeii in 79 AD, and that is the only reason we know what happened. In many ways, Twitter is our modern-day papyrus, and of equal importance.
This is particularly pertinent for disasters such as the Manchester bombing, Grenfell Tower fire and Westminster terror attack, for which social media holds masses of highly important data on how the public reacted at these historically significant times.
Social media platforms now also act as key sites of record for student societies, sports teams and wider university communications. This includes communications such as photos and annual reports. But, because much of this data now only exists on social media, it is at risk of being lost if something isn't done now.
Preserving social media content is also important as a research data resource. For example, Twitter datasets around a hashtag at a particular time, or vast troves of social data harvested from many platforms, can support data-validated research and show impact. Being able to track and archive tweets and posts about a project or programme, and to find relevant content to include in reports, has also become paramount for research within these markets.
4. Preserve Digital Data for Future Insights
Website and social media archiving is still at an early stage in the grand scheme of things, as Paula Keogh says: “The move from analogue and physical archive records, which we’ve had a long time to get used to and find strategies for keeping and accessing in the future, is a luxury of time we don’t have with digital data.”
But, even if we’re not sure what use this archived digital data will provide in the future, it’s better to have it than not to have it.
As explained by MirrorWeb’s David Clee: “We do have to capture stuff today because we don’t really understand how we’re going to use that or what we’re going to do with it in the future. By having content archived now, this means that as we do learn, evolve and develop, we can make sure information is never lost and organisations can access it in the future.”
5. Preserve Research Data and Output
Researcher databases need preserving, and research outputs need to be made available for reference by future generations. In universities, for example, researchers produce websites that need preserving as part of research data output. This is to ensure compliance with open data initiatives, funder requirements and the Research Excellence Framework (REF).
Archiving content from external websites, such as those of research institutions, government bodies, policy makers, corporate leaders, etc., is also important to support the REF. This would also provide evidence of university research outputs used or praised by external parties.
Most universities also encourage their researchers to deposit large web and social datasets with specialist data centres. This makes the data more discoverable to the research community who might reuse them.
6. Maintain Best Practice in Record Keeping, Protect Investments and Reuse Archived Content
Maintaining best practice in record keeping will, or should, mean that website and social media content within the heritage, museums and higher education markets is archived to the same degree as other forms of corporate records. For example, some higher education institutions have leveraged GDPR compliance to garner support for better management of digital records such as website and social media data.
Website and social media archiving also helps to preserve an organisation’s investments in digital communications, such as professionally produced videos and blog content that would otherwise be at risk of being lost. Sean Rippington from the University of St Andrews also highlights the use of preserved website and social media content in outreach and alumni relations work, and to support marketing and other publicity activities.
Get Research Access to MirrorWeb and Arkivum's Hybrid, End-to-End Digital Archiving Solution
Arkivum had increasingly been asked about website and social media archiving by their existing customers, but were unable to provide a satisfactory solution. This is because the concept was still cutting-edge and the technology was still in conception and development within the community at large.
Then, back in 2016, MirrorWeb emerged into the market in conjunction with The UK National Archives when they launched their new web and social media crawler tech stack, with automated QA features, built on AWS cloud technology.
Taking a lead from other successful technology sectors, both companies identified that concentrating on what they do well and specialise in is best practice, but that collaborating to bring those specialisms together to satisfy the customer need is the way forward. And now MirrorWeb and Arkivum’s hybrid, end-to-end digital archiving solution is the most comprehensive data lifecycle management solution and service for website archiving and social media archiving in museum and higher education markets.
The portal provided by MirrorWeb is user-friendly and light-touch, needing minimal user input to set up, crawl and replay web and social media archives in high fidelity. It is also cost-effective because it harnesses the power of the cloud, meaning cost is no longer a barrier to capturing and archiving the digital content we all need today, and which we will need tomorrow once we understand how to use it and where it fits in the overarching archivist’s asset bank.
The WARCs created are passed seamlessly and automatically via an API to Arkivum’s Perpetua system, preserving and future-proofing the web and social media content for all time.
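For readers unfamiliar with the format, a WARC (Web ARChive) file is a sequence of records, each made up of plain-text headers followed by the captured content. The sketch below is purely illustrative and is not MirrorWeb's crawler or Arkivum's API: the function names `build_warc_record` and `verify_warc_record` are hypothetical, and the choice of a SHA-256 block digest is an assumption. It shows roughly what a single record looks like and how a receiving system might check that the content arrived intact.

```python
import hashlib
import uuid
from datetime import datetime, timezone


def build_warc_record(target_uri: str, payload: bytes, content_type: str) -> bytes:
    """Build one WARC/1.0 'resource' record (hypothetical helper, for illustration)."""
    digest = hashlib.sha256(payload).hexdigest()
    timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    headers = [
        "WARC/1.0",
        "WARC-Type: resource",
        f"WARC-Target-URI: {target_uri}",
        f"WARC-Date: {timestamp}",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Block-Digest: sha256:{digest}",
        f"Content-Type: {content_type}",
        f"Content-Length: {len(payload)}",
    ]
    # Headers end with a blank line; the record ends with two CRLFs.
    head = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    return head + payload + b"\r\n\r\n"


def verify_warc_record(record: bytes) -> bool:
    """Recompute the block digest and compare it with the header value."""
    head, _, rest = record.partition(b"\r\n\r\n")
    fields = {}
    # Skip the first line ("WARC/1.0"); the rest are "Name: value" pairs.
    for line in head.decode("utf-8").split("\r\n")[1:]:
        name, _, value = line.partition(":")
        fields[name.strip().lower()] = value.strip()
    block = rest[: int(fields["content-length"])]
    algorithm, _, expected = fields["warc-block-digest"].partition(":")
    return hashlib.new(algorithm, block).hexdigest() == expected


record = build_warc_record("https://example.org/", b"<html>hello</html>", "text/html")
print(verify_warc_record(record))
```

A fixity check of this kind is one common way for a preservation system to confirm that an archive file has not been altered or corrupted in transit or in storage. Real crawlers write many such records per file, usually compressed, and typically use an established WARC library, not hand-built records.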
The partnership now covers safeguarding, digital preservation, compliance, records management and integration, and data discovery, filling a major gap for institutions seeking digital preservation as a comprehensive service with open standards and a no-vendor-lock-in philosophy.
To try the hybrid, end-to-end digital archiving solution, and see how it can benefit your organisation, request a demo of our website and social media archive here.