A Guide to Website Archiving 2023

Sean Stapleton

Across all sectors, organizations are steadily publishing more and more digital content. 

The trend has accelerated since the pandemic, as people and organizations moved online to work around the physical restrictions of lockdowns.

Websites are used to sell products and services to clients, to publish and share sales and marketing materials and, for organizations ranging from small businesses to state governments, as a primary channel of communication with the general public.

The proliferation of digital content presents a new set of challenges. Financial firms, for instance, must fulfill a range of compliance obligations in order to use digital channels (including websites) to advertise to clients. Elsewhere, organizations may need to keep records of their website content for dispute resolution purposes, and others might wish to preserve website pages simply for their cultural or historical significance, as a matter of public record, or as a point of reference for future marketing campaigns.

Website archiving is the only way to preserve website content so that it can be revisited exactly as it was at a particular point in time, creating and maintaining a stable, time-stamped, verifiably authentic and independent version of that content. Because an archive is independent, it is completely separate from the original website architecture and includes only the elements that were live at the time of capture, reproducing the site as close to its original form as possible.

There are many reasons for an organization to archive its website, but in every case its archives must be complete, secure and legally admissible.

In this guide, we look at some of the specific challenges around website archiving in three different sectors - financial services, the public sector and brands - and how the MirrorWeb solution can help.

The Role of Website Archiving

Website archiving allows organizations in financial services, the public sector and retail to store immutable records of their web pages. This helps ensure the following:

Compliance

Regulated companies and firms worldwide must record and retain all electronic communications under MiFID II (EU), FCA (UK), SEC and FINRA (US), and ASIC (AU) rules.

MiFID II, for example, states that the recorded electronic communications must be:

Complete - An organization must be aware of all types of electronic communications that are used and by whom. In addition to this, they must have a system and processes in place designed to capture and retain all records of those communications.

Accurate - An organization must be fully confident in the content and metadata of its recorded electronic communications, including metadata that demonstrates the exact dates and times that anything took place.

High Quality - An organization should be able to reproduce records of electronic communications in as close to their “original form” as possible. 

Regulators worldwide are tightening requirements. Since November 2022, the SEC's Marketing Rule has applied an expanded definition of 'advertisement' that includes website content, which must now be captured and archived in its entirety. Meanwhile, the Australian Securities and Investments Commission (ASIC) has begun requesting records of historical website price promises from insurance firms, and penalizing those that are unable to provide them.

Legal admissibility

Companies may be required to provide authenticated electronic evidence in court. The records must demonstrate both that the data has been stored in an unalterable format and when it was archived. These requirements are covered in the following rules and regulations:

US Federal Rules of Evidence (Rule 901) - the requirement to authenticate or identify an item of evidence.

BS 10008:2014, Evidential Weight and Legal Admissibility of Electronic Information - ensuring the authenticity and integrity of electronic information.

SEC Rule 17a-4 - requires broker-dealers to preserve electronic business communications in a non-rewritable, non-erasable (WORM) format.

eDiscovery requests - archiving website data for dispute resolution and eDiscovery purposes helps ensure that the records are irrefutable and a true reproduction of the content at that time.

Protection of IP and brand assets

Brands have a clear incentive to keep a long-term record of their activity to inform future campaigns. However, as more business activity moves online, they are continuously publishing large amounts of digital content at a pace that can be difficult to keep track of. Website archiving can run on a regular schedule and, with unlimited cloud storage and the capacity to archive large data sets, can ensure that nothing of value is lost.

Preservation of records of cultural and historical significance

Public-sector organizations and archivists may need to preserve culturally important website content as part of the historical public record, while keeping it instantly accessible. Website archiving is well suited to preserving large amounts of data and storing it in an unalterable WORM (write once, read many) format.

Who Benefits From Website Archiving?

Financial Services

Financial services firms are under constant pressure to adopt new online channels if they are to keep up with the evolving digital landscape. However, their use of these channels must be balanced against stringent compliance regulation from the likes of the SEC, ASIC, FINRA and others.

Any solution must be able to demonstrate compliance with such regulations as well as with any potential data sovereignty and GDPR requirements.

Public Sector 

Numerous national archives, libraries, governments and universities now archive website data to preserve all records of cultural and historical significance. This is mostly driven by legislation such as the UK Public Records Act 1958 and, more recently, the Freedom of Information Act 2000.

As the public sector undertakes more activity online, organizations are looking for ways to evolve their website archiving provisions in order to take advantage of new technologies such as:

  • The cloud - To allow for efficient and flexible storage of large data sets.
  • Indexing and search - To make the data useful to researchers, civil servants, students and members of the public (including public-facing portals such as The UK National Archives).
  • WARC - The move from the older ARC file format to the ISO-standard WARC format (ISO 28500), which can store born-digital and digitized materials.
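For readers curious about what a WARC file actually contains, here is a minimal sketch of a single WARC/1.0 "response" record assembled by hand with Python's standard library. It is a simplified illustration of the ISO 28500 record layout, not MirrorWeb's implementation: the function name and example URL are hypothetical, and production tools also add payload digests, request and metadata records, and gzip compression.

```python
import uuid
from datetime import datetime, timezone


def build_warc_response_record(target_uri: str, http_response: bytes,
                               captured_at: datetime) -> bytes:
    """Serialize one simplified WARC/1.0 'response' record (hypothetical helper)."""
    headers = [
        "WARC/1.0",
        "WARC-Type: response",
        # Every record gets a globally unique identifier
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        # Capture time as a UTC timestamp; this is what makes the record time-stamped
        f"WARC-Date: {captured_at.strftime('%Y-%m-%dT%H:%M:%SZ')}",
        f"WARC-Target-URI: {target_uri}",
        "Content-Type: application/http; msgtype=response",
        # Length in bytes of the block that follows (the raw HTTP response)
        f"Content-Length: {len(http_response)}",
    ]
    # Header lines end in CRLF, a blank line separates headers from the block,
    # and each record is terminated by two CRLFs.
    head = ("\r\n".join(headers) + "\r\n\r\n").encode("utf-8")
    return head + http_response + b"\r\n\r\n"


if __name__ == "__main__":
    raw = (b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n"
           b"<html><body>Rates from 4.5%</body></html>")
    record = build_warc_response_record(
        "https://www.example.com/rates",  # hypothetical page
        raw,
        datetime(2023, 1, 15, 9, 30, tzinfo=timezone.utc),
    )
    print(record.decode("utf-8"))
```

Because the original HTTP response is stored byte for byte alongside its capture metadata, a replay tool can later reconstruct the page as it was served at that moment.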

Brands 

FMCG brands are creating more and more online content in addition to traditional brand assets. This content can easily be altered, corrupted or lost without planning and foresight. 

Keeping a record of online brand activity and customer communications can help…

  • Inform future brand direction (through monitoring of performance)
  • Inspire future campaigns
  • Ensure legally admissible records of all communications are kept, for cases of dispute resolution. 

How Do You Archive a Website?

There are several ways to archive your website content. Free online tools such as the Wayback Machine are one option, but they require users to save every page manually. This simply isn't feasible for most firms, given how often captures must take place to satisfy regulators: every time a change is made.
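As a rough illustration of what "capture on every change" means in practice, the sketch below keeps a SHA-256 digest of each page's last archived content and flags a page for capture only when that digest changes. The ChangeMonitor class and needs_capture method are hypothetical names, fetching the page is left out, and a real crawler would first strip dynamic elements (timestamps, session tokens) that would otherwise make every visit look like a change.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ChangeMonitor:
    """Hypothetical helper: remembers the last-archived digest for each URL."""
    last_digest: dict[str, str] = field(default_factory=dict)

    @staticmethod
    def digest(body: bytes) -> str:
        # A content hash changes whenever any byte of the page changes
        return hashlib.sha256(body).hexdigest()

    def needs_capture(self, url: str, body: bytes) -> bool:
        """Return True (and record the new digest) if the page is new or changed."""
        current = self.digest(body)
        if self.last_digest.get(url) == current:
            return False
        self.last_digest[url] = current
        return True


if __name__ == "__main__":
    monitor = ChangeMonitor()
    url = "https://www.example.com/rates"  # hypothetical page
    print(monitor.needs_capture(url, b"Rates from 4.5%"))  # True: first visit
    print(monitor.needs_capture(url, b"Rates from 4.5%"))  # False: unchanged
    print(monitor.needs_capture(url, b"Rates from 4.9%"))  # True: content changed
```

Running a check like this on a schedule is what turns manual, page-by-page saving into automated capture: only pages that actually changed generate new archive records.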

While some businesses rely on Content Management System (CMS) backups for record-keeping purposes, there are some major differences between a backup and an archive:

  • Digital signatures and metadata: Most importantly, data taken from a CMS backup won't carry a digital signature, so it can't be authenticated or admitted in court. CMS backups also don't let legal teams easily export a record with all of its crucial metadata.
  • Full-text search: CMS backups don't provide full-text search. Auditors can request information urgently and at a moment's notice, and without search, the sheer volume of captured data makes it difficult to locate specific records quickly.
  • Compliant Data Storage: For regulated industries with specific recordkeeping rules (such as the public sector and financial services), a CMS backup does not meet requirements.
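To illustrate why a signature matters, the sketch below seals an archived record together with its capture time and later verifies that neither has been altered. Note the substitution: it uses an HMAC-SHA256 tag from Python's standard library as a stand-in for a true digital signature, which would use an asymmetric scheme (such as Ed25519) so that third parties can verify a record without holding the secret key. The function names and key are hypothetical.

```python
import hashlib
import hmac


def seal_record(record: bytes, captured_at: str, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag binding the record bytes to its capture time."""
    message = captured_at.encode("utf-8") + b"\n" + record
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_record(record: bytes, captured_at: str, key: bytes, tag: str) -> bool:
    """Recompute the tag and compare in constant time; any edit makes this False."""
    expected = seal_record(record, captured_at, key)
    return hmac.compare_digest(expected, tag)


if __name__ == "__main__":
    key = b"demo-key"  # hypothetical; real keys belong in a key management service
    captured_at = "2023-01-15T09:30:00Z"
    page = b"<html><body>Rates from 4.5%</body></html>"
    tag = seal_record(page, captured_at, key)
    print(verify_record(page, captured_at, key, tag))  # True
    print(verify_record(page.replace(b"4.5", b"3.5"), captured_at, key, tag))  # False
```

Because the capture time is part of the sealed message, changing either the content or the timestamp invalidates the tag, which is the property a CMS backup lacks.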

Alternatively, an automated website archiving service allows businesses to keep a complete record of their website content, while relieving the manual burden and ensuring legal admissibility.

How MirrorWeb Can Help

MirrorWeb delivers cloud-based archiving and monitoring solutions for the information-driven enterprise. Trusted by clients ranging from global asset managers to state governments, our website archiving platform allows organizations to meet their requirements for capturing immutable, historical records of their websites.

Our solution improves operational efficiency and compliance with value-added features such as:

  • Complete archives - We archive all website content, including content from internal and external sources, images, video, metadata and even social media channels.
  • Cloud-native solution - MirrorWeb partners with AWS to deliver a turnkey, scalable and future-proof solution in a fully secure AWS S3 environment.
  • Close to original format - Every page is captured in near real time, as it appeared when it was published.
  • Full text search - With Elasticsearch technology, all archives are indexed and searchable.
  • Sophisticated user portal - The MirrorWeb portal lets users search and replay content and set up and manage crawl frequency and parameters. Because it is cloud-based, the data is accessible from anywhere, offering flexibility and scalability, with user access controls that manage functionality.
  • Public portal - Where archives need to be publicly accessible, as with government and national archives, MirrorWeb has developed a proprietary portal that integrates with the user portal to deliver high-fidelity access and a contemporary user experience for sharing records of cultural and historical significance with the general public.
  • Meet compliance requirements - All archived records are stored in the ISO-standard WARC format, including date and time stamps. The MirrorWeb portal provides tools to monitor compliance risks, and records can be made available to eDiscovery professionals as required.
  • Local territories - Archives are stored in local territories, in ISO 9001 and ISO 27001-certified, GDPR-compliant environments.
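Full-text search engines such as Elasticsearch are built on an inverted index, which maps each term to the documents that contain it. The toy sketch below shows the idea for archived captures; the class and capture IDs are hypothetical, and a real engine adds relevance ranking, phrase queries and language-aware tokenization.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase and split on anything that isn't a letter or digit
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Hypothetical toy index: maps each term to the capture IDs containing it."""

    def __init__(self) -> None:
        self.postings: defaultdict[str, set[str]] = defaultdict(set)

    def add(self, capture_id: str, text: str) -> None:
        for term in tokenize(text):
            self.postings[term].add(capture_id)

    def search(self, query: str) -> set[str]:
        """Return captures containing every query term (AND semantics)."""
        terms = tokenize(query)
        if not terms:
            return set()
        # .get() avoids inserting empty entries for unknown terms
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


if __name__ == "__main__":
    index = InvertedIndex()
    index.add("capture-001", "Savings rates from 4.5% APY")
    index.add("capture-002", "Mortgage rates from 6.1%")
    print(sorted(index.search("rates")))   # ['capture-001', 'capture-002']
    print(index.search("savings RATES"))   # {'capture-001'}
```

Lookups only touch the posting sets for the query terms, which is why an indexed archive can answer an auditor's request quickly even when it holds millions of captures.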

Book a demo so we can establish your requirements, take you through the platform, and discuss how MirrorWeb can help your business!

