The Ultimate Guide to Website Archiving

Learn what website archiving is, the different archiving methods available, how to compare web archiving providers and what service levels to expect.

Welcome to The Ultimate Guide to Website Archiving. This is a resource for those looking to understand exactly what website archiving is, the benefits archiving offers and how you can go about archiving your website. This resource will be useful to a variety of business professionals across digital, compliance, marketing and information governance.

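After reading this guide, you'll understand:

What website archiving is and why businesses archive their websites
The options for archiving your website, and the tools and functionality to look for
How to compare web archiving providers
What to expect during onboarding
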

 


Getting started with website archiving

What is website archiving?

Website archiving is the process of automatically collecting websites and the information they contain and preserving them in a lasting archive. Using special ‘crawl’ technology, entire websites are scoured to create 100% accurate replicas. These can be replayed and navigated just as when they were live, offering a snapshot of your website at a specific date and time.

(NOTE: Archives are not the same as backups, as we explain below)
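
To make the idea of a 'crawl' more concrete, here is a minimal sketch in Python of how a crawler can walk a site by following its internal links. It is an illustration only, not how any particular provider's crawler works: the `crawl` function, the `LinkExtractor` class and the three-page `SITE` dictionary are all hypothetical, and the pages are held in memory so the example runs without a network connection. A real archiving crawler would also capture images, scripts, stylesheets and response headers.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch):
    """Breadth-first crawl of one site; returns {url: html} for each page reached.

    `fetch` is any callable that returns the HTML for a URL, or None if the
    page cannot be retrieved.
    """
    host = urlparse(start_url).netloc
    snapshot = {}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if url in snapshot:
            continue  # already captured
        html = fetch(url)
        if html is None:
            continue  # page not available
        snapshot[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            target = urljoin(url, href)  # resolve relative links
            # Stay on the same site; ignore links to other hosts.
            if urlparse(target).netloc == host and target not in snapshot:
                queue.append(target)
    return snapshot


# Hypothetical three-page site held in memory so the sketch runs offline.
SITE = {
    "https://example.com/": '<a href="/about">About</a> <a href="https://other.example/">Ext</a>',
    "https://example.com/about": '<a href="/">Home</a> <a href="/contact">Contact</a>',
    "https://example.com/contact": "<p>Contact us</p>",
}

archive = crawl("https://example.com/", SITE.get)
```

Here `archive` ends up holding the three internal pages, while the link to the external host is skipped.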

 

Why do businesses archive their websites?

Websites contain huge amounts of data including brand assets, logos, marketing messages, etc. So much investment is put into websites, which have increasingly become the ‘face’ for most businesses, that it makes sense to create thorough records of these. In fact, a recent IGI survey found 98% of organisations had online data they needed (or wanted) to keep for more than 10 years. As well as preserving valuable brands, website archiving can also be extremely useful for meeting regulatory and legal requirements and protecting content of historical or cultural significance.

 

What are the main challenges when archiving a website?

Size is typically the biggest challenge, as websites are growing every day. According to Netcraft, there are around 1.2 billion active websites and over 4.2 billion webpages. Companies everywhere are continually growing their websites, creating new webpages and publishing more and more content. These webpages are becoming ever more complex with hyperlinks, rich media and personalised content via the use of cookies. In particular, some regulated businesses have to maintain several versions of their websites, which multiplies the work required (for instance, an investment firm creating both retail and professional versions of its webpages). Unsurprisingly, the average size of a webpage is now over 3MB!

This means archiving a modern website is a big challenge. Not only is there an immense amount of data to accurately record and replicate, but a thorough web archiving solution should allow for users to navigate and replay the website at a later date. Creating such a perfect and usable record, that is continually updated, is not easy.

 


Deciding how to archive your website

Knowing your options

Backups

Backups are a great way of securing your website. Should you suffer an outage or your systems go down, being able to quickly return to a backup means your website can be back up and running in no time. Backups also provide a valuable safeguard when it comes to protecting data, ensuring nothing is lost in a worst case scenario.

However, as an archiving solution, backups do not tick every box. A backup stores the data needed for operational recovery, so it has a very different purpose to an archive. Unlike a purpose-built archive, a backup can also be altered, so it does not meet the requirements for legal admissibility.

 

Screenshots

Taking a screenshot is a quick way of creating an image of your website, which can then be easily shared with colleagues. We’ve all become accustomed to taking screenshots of content we find interesting and it can be a lot quicker than sharing a link. This is why many small businesses regularly use screenshots of their websites for record-keeping.

Screenshots may be accurate in the content they show but they fall short of being a complete archiving solution. Screenshots are standalone static images which cannot be interacted with. At the same time, they aren’t immune to tampering and therefore fail to satisfy compliance standards from many regulators. And do you have the time to repeatedly screenshot a continually growing website, specifically allowing for personalised and localised pages?

 

DIY Archiving

Website archiving has surged in popularity as it provides a full and legally admissible solution to regulatory record-keeping requirements. Such archives aren’t just a compliance solution: these 100% accurate (and interactive) archives have also become known for preserving valuable brand assets and providing protection in times of dispute. In principle, a business can build and run the underlying crawl technology in-house, scouring its entire digital estate to create replicas that satisfy compliance requirements.

That said, there is a reason large multinationals turn to specialists like Pagefreezer, Hanzo, Smarsh and MirrorWeb for their website archiving. Developing and implementing this crawl technology takes considerable expertise and effort, which can quickly become a hefty expense for companies and explains why many prefer to outsource this compliance headache.

 

Specialist Web Archiving Provider

Due to the extensive technology requirements that go into website archiving, a number of vendors now operate in this space and provide archiving solutions to businesses that want to get on top of their digital record-keeping. The differences between these providers can be more significant than you think, in how their technology works, the functionality they offer, their pricing models, etc. Finding the solution that suits your business is critical.

 

What tools and functionality does your web archive need?

Search

Web archives are popular because of the audit trail they provide. A comprehensive web archiving solution will store all your archives in a secure and ordered fashion. However, you may often need to find a specific piece of information from a certain archive. Search functionality can be extremely useful in finding the archives you need, especially when responding to a time-sensitive request (from a regulator, for example).
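
As a rough illustration of what archive search involves, the Python sketch below builds a simple keyword index over a handful of captures and looks up the captures that contain every word in a query. It is an assumption-laden toy, not any provider's implementation: the `build_index` and `search` functions and the `CAPTURES` data are hypothetical, and a production search would handle far larger volumes, phrase matching and ranking.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into simple alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(captures):
    """Map each word to the set of (url, capture_date) keys whose text contains it."""
    index = defaultdict(set)
    for key, text in captures.items():
        for word in tokenize(text):
            index[word].add(key)
    return index


def search(index, query):
    """Return the captures that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical captures: (url, capture date) -> extracted page text.
CAPTURES = {
    ("https://example.com/rates", "2023-01-05"): "Fixed rate savings account",
    ("https://example.com/rates", "2023-06-05"): "Variable rate savings account",
    ("https://example.com/about", "2023-06-05"): "About our company",
}

index = build_index(CAPTURES)
fixed_rate_hits = search(index, "fixed rate")
savings_hits = search(index, "savings")
```

In this example, the query "fixed rate" matches only the January capture of the rates page, which is the kind of date-specific answer a regulator's request typically calls for.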

 

Replay

The really great thing about archives is they are so accurate they are essentially working versions of the webpages they were created from. Unlike a static screenshot, users can return to past archives and click hyperlinks, navigate pages, watch videos and live media, etc. This replay functionality can be used in countless ways, making archives more than just compliance assets.

 

Download

Web archives are extremely helpful for internal use, but many archiving providers understand that at times these will need to be shared with third parties (e.g. for regulatory and legal purposes). The ability to download and share your archives can be very useful, depending on your business’s needs, and is something to ask your potential website archiving provider about.

 

Compare

Being able to show what has changed on a website (e.g. a new banner, logo or piece of content) is one of the core reasons companies use website archiving. However, if the changes are minimal they can easily be overlooked. Some archiving providers are now exploring new functionality that allows users to compare different archives side by side to spot the difference!
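
The idea behind a compare feature can be sketched in a few lines of Python using the standard library's `difflib` module. This is only an illustration of line-level comparison between two captures of the same page; the `changed_lines` function and the sample page text are hypothetical, and a real comparison tool would also deal with layout, images and markup.

```python
import difflib


def changed_lines(old_text, new_text):
    """Return (removed, added) lines between two captures of the same page."""
    old = old_text.splitlines()
    new = new_text.splitlines()
    removed, added = [], []
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed.extend(old[i1:i2])  # lines only in the older capture
        if tag in ("replace", "insert"):
            added.extend(new[j1:j2])  # lines only in the newer capture
    return removed, added


# Hypothetical text extracted from two captures of one page.
january = "Welcome\nRates from 3.9% APR\nContact us"
june = "Welcome\nRates from 4.4% APR\nContact us"

removed, added = changed_lines(january, june)
```

Here `removed` holds the old rate line and `added` holds the new one, so a one-character change that is easy to miss by eye is surfaced directly.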

 

Legal Admissibility

Legal admissibility is often the main thing that sets web archiving apart from other solutions (like backups, screenshots, etc). Website archives are created with compliance and legal admissibility in mind, so they will include a full digital signature with a timestamp that provides proof of authenticity. At the same time, the fact that website archives are tamperproof further adds to their credibility.
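
To show how a timestamped signature can make tampering detectable, here is a minimal Python sketch using only the standard library. It is a simplified stand-in, not a description of how any provider signs its archives: it uses a keyed hash (HMAC) with a shared secret, whereas production systems typically rely on asymmetric digital signatures and trusted timestamping. The `seal` and `verify` functions, the demo key and the sample page are all hypothetical.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone


def seal(content: bytes, key: bytes, captured_at: str) -> dict:
    """Create a record binding the content hash to its capture time."""
    record = {
        "sha256": hashlib.sha256(content).hexdigest(),
        "captured_at": captured_at,
    }
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    # The MAC covers both the hash and the timestamp.
    record["mac"] = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return record


def verify(content: bytes, key: bytes, record: dict) -> bool:
    """Return True only if the content and timestamp both match the record."""
    expected = seal(content, key, record["captured_at"])
    return (
        hmac.compare_digest(expected["sha256"], record["sha256"])
        and hmac.compare_digest(expected["mac"], record["mac"])
    )


# Hypothetical key and page, for illustration only.
key = b"demo-key-not-for-production"
page = b"<h1>Rates from 3.9% APR</h1>"

record = seal(page, key, datetime.now(timezone.utc).isoformat())
untouched_ok = verify(page, key, record)
edited_ok = verify(page.replace(b"3.9", b"2.9"), key, record)
backdated_ok = verify(page, key, dict(record, captured_at="2000-01-01T00:00:00+00:00"))
```

Verification succeeds for the untouched capture and fails if either the page content or the recorded capture time is changed afterwards.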

 

What is your goal?

Time Saving

Records can be made of websites manually, but this can be time-consuming and prone to human error. Archiving solutions have become popular because they create 100% accurate records within minutes. The time this gives back to compliance teams can’t be overlooked.

 

Cost Reduction

The time spent on manual record-keeping can mean a significant cost for a business (e.g. deploying entire teams to capture screenshots and log records). Furthermore, going through handmade records can make even the lightest audit an extremely time-consuming and expensive process! A thorough web archiving solution greatly reduces the money wasted on temporary fixes, with cloud-based software allowing for true scalability. This also allows for competitive pricing that potentially offers direct savings.

 

Regulatory Compliance

As well as providing 100% accurate depictions of how your website appeared at a given date and time, a website archive is an extremely useful compliance tool because of its authenticity. A digital signature means every archive, created in a tamperproof, ISO-accredited WORM format, is time-stamped and legally admissible.

 


Comparing web archiving providers

 

Don’t know where to start? Website archiving is niche so we get it. To help, here are some factors beyond basic tools and functionality to consider during your company’s procurement process.

Value

Pricing structures vary among web archiving providers and are often based on your archiving needs (number of pages, storage size, frequency of archives, etc). If you’re talking with an archiving provider, ask them up front about their pricing model. This should be clear and easy to understand, so remember to ask about things like additional storage or download charges. Ideally, a provider will offer pricing per website and not per page.

 

Service

Archiving providers can offer very different levels of service so it’s definitely worth double checking this and seeing how much support you can expect. For example, would you be assigned a dedicated account manager? Is there a helpline? Are there any response SLAs?

 

Flexibility

Your chosen solution has to keep up with the website being archived. Helpfully, some providers can tweak and tailor their solutions based on the company they are working for. Speak to your potential archiving provider about this: is their solution uniform for everyone or can they tailor this to your bespoke requirements?

 

Innovation

Archiving hasn’t been around for long so most website archiving platforms are at the cutting edge of tech innovation. However, some solutions on the market are white label propositions. Drill your potential archiving provider and ask them about the tech team they have in-house. Are there developments in the pipeline or is it a static product?

 

Experience

In industries like financial services, there can be very specific regulations to satisfy. So, it’s very important to assess how much relevant experience your potential archiving provider has. For instance, who are their current clients? Do they have experience assisting similar businesses?

 

Specialisation

At the same time, assess how specialised the vendor is. Many platforms talk a lot about the volume of clients they look after and the number of industries they service. If that’s what you’re looking for then great, but it might be worthwhile ensuring their website archiving tools don’t suffer from a lack of focus.

 


Onboarding: what to expect

 

Demo

This should be a brief (and free) rundown of how a website archiving provider’s solution works in practice. They should get in touch with you beforehand to ask about what you’re looking for so the demo can be tailored to any specific needs you have. Whoever is talking you through this (a salesperson, tech developer, etc.) should be able to answer all your questions.

 

FAQs

How long does a demo usually last?

We’d advise making space in your schedule for around 30 minutes or so.


Can the demo be tailored to your site?

Yes, the archiving provider should be able to use the demo to create an archive of the webpages you have in mind to give you a first-hand look at how the technology works. For a bespoke demo, you can provide a chosen website URL to the archiving provider you’re meeting with. They will then initiate a crawl and prepare an archive specifically for you!

 

Why MirrorWeb?

MirrorWeb is a comprehensive archiving solution, popular across several industries for brand preservation, regulatory compliance and many more specific needs.

We aim to offer a no-headache solution (our custom-built crawler captures your site with pixel-for-pixel accuracy) and customers benefit from a dedicated account manager, which keeps everything streamlined and human!

Most of all, we have a fantastic in-house tech development team who are always working hard to hone the solutions we offer while continually innovating new functionalities.



To take a look at the platform, simply request a demo and a member of the team will be in touch shortly!

