Protecting Your Organisation’s Digital History by Archiving in the Cloud
August 16, 2018 • 6 min read
Your digital history is at risk, and the cloud is going to help you protect it. That may sound like scaremongering, but stick with us.
Organisations are creating and amending online content all the time. They use social media to share business updates and engage with their customers, and websites to publish information and resources.
But what happens to it all if we don't do something to protect it now?
- To all the hundreds of thousands of Facebook posts per minute?
- To the thousands of Tweets per second?
- To web pages with an average lifetime of 90 days?
- To the billions of gigabytes of data created online each day?
Why is my organisation's digital history at risk?
There is more awareness amongst organisations of the commercial, cultural and historical value of digital content, but without the right planning and foresight, this digital content is at risk for several reasons:
- Organisations are relying on technologies or formats in danger of becoming obsolete.
- There are issues with third-party platforms which can harm data integrity.
- Organisations are relying on content management systems and backups that provide only short- or medium-term security.
An eye-opening example is the once world-leading Myspace. At its peak, the social media platform was attracting around 100 million monthly users. At its end, former users lost pages, messages and photos they could not get back.
Finding older versions of websites and web pages is also difficult. After all, search engines like Google only index the now and show the latest content.
Most people take it as given that their content on the internet is safe, but it is not. Organisations need to act now to protect their digital communications and ensure they are accessible in the future.
The cloud-based solution to protecting your digital history
Digital communications will form the basis of the legacy of 2018. To protect and preserve this evolving history requires a web and social archive that:
- Securely stores the data and keeps it usable, since usable data is the most valuable.
- Delivers a snapshot of a site or social media communications at a specific time.
- Creates a permanent, unalterable record of an organisation’s digital communications at any given time.
Doing so benefits both private and public sector organisations. It allows them to preserve content of legacy and historical significance, and allows companies in regulated sectors to demonstrate compliance.
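The snapshot requirement listed above, retrieving a site as it stood at a specific time, can be sketched as a lookup over timestamped captures. This is only a minimal illustration in Python under stated assumptions, not a description of any particular archiving product: the `Capture` type, the `snapshot_at` function and the sample data are all hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class Capture:
    """One archived copy of a page (hypothetical structure for illustration)."""
    url: str
    captured_at: datetime
    content: str


def snapshot_at(captures: List[Capture], url: str, when: datetime) -> Optional[Capture]:
    """Return the most recent capture of `url` taken at or before `when`."""
    history = sorted(
        (c for c in captures if c.url == url),
        key=lambda c: c.captured_at,
    )
    times = [c.captured_at for c in history]
    # bisect_right gives the number of captures with a timestamp <= when.
    position = bisect_right(times, when)
    return history[position - 1] if position else None


# Hypothetical sample data.
captures = [
    Capture("https://example.org/", datetime(2018, 1, 10), "January home page"),
    Capture("https://example.org/", datetime(2018, 6, 2), "June home page"),
    Capture("https://example.org/about", datetime(2018, 3, 5), "About page"),
]

found = snapshot_at(captures, "https://example.org/", datetime(2018, 3, 1))
print(found.content if found else "no capture")
```

A request for a date earlier than the first capture returns `None`, which mirrors the "black hole" problem: nothing can be replayed for a period in which nothing was archived.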
Why is the cloud the best method of archiving your digital history?
Traditional archiving methods using physical hardware have their benefits but their limits too. As your data storage needs increase, physical hardware is likely to require continuous investment in infrastructure.
Cloud storage is scalable and has almost unlimited capacity, so you can add more storage when you need it.
Physical hardware such as hard drives and servers can become overloaded or fail due to inherent risks. The cloud provides a higher level of redundancy (e.g. duplicate copies of data).
If your hard drive, server or data centre goes down, you can be confident that your service will resume with minimal disruption.
A cloud-native, ISO-certified web archiving solution provides assurance that an organisation is in complete control of access to its archives.
This has always been a key consideration but has grown in importance due to data protection. Public sector organisations, for example, will have sensitive information of national importance that needs securing.
The strong safeguards in cloud data centres and their scalable capacity make a cloud-based solution a secure option no matter the size of your dataset.
Companies and firms that need to meet compliance requirements, such as MiFID II and GDPR, must record, monitor and keep all electronic communications.
The cloud achieves this level of compliance by providing essential scalability and future-proofing to ensure the permanent storage of data in an unalterable format.
Cloud-based archiving solutions reduce the overheads of maintaining infrastructure, including spending on space for local servers, power and hardware upgrade cycles.
This level of flexibility lets you focus on improving the archive itself, for example by improving the user interface or implementing advanced capabilities that make data accessible for large-scale research projects.
Usability is important for internal staff, students, librarians, researchers, and other users. But it is not easy making an archive searchable.
This is because web archives store data in the WARC file format, and playback of these files requires an index. The index is essentially a list of all assets within a web archive, such as PDFs and HTML data.
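To make the idea of an index concrete, here is a rough Python sketch that walks a WARC byte string and lists each record's position, address, date and size. It rests on several assumptions: the data is uncompressed (real WARC files are usually gzip-compressed), the sample records are simplified, and the `index_warc` and `make_record` helpers are hypothetical names invented for this illustration. A production archive would use a dedicated WARC library rather than hand-rolled parsing.

```python
import uuid
from typing import Dict, Iterator


def index_warc(data: bytes) -> Iterator[Dict[str, object]]:
    """Yield one index entry per record in an uncompressed WARC byte string."""
    offset = 0
    while offset < len(data):
        # The header block ends at the first blank line (CRLF CRLF).
        header_end = data.index(b"\r\n\r\n", offset)
        lines = data[offset:header_end].decode("utf-8").split("\r\n")
        headers = {}
        for line in lines[1:]:  # lines[0] is the version line, e.g. "WARC/1.0"
            name, _, value = line.partition(":")
            headers[name.strip().lower()] = value.strip()
        length = int(headers["content-length"])
        yield {
            "offset": offset,
            "uri": headers.get("warc-target-uri"),
            "date": headers.get("warc-date"),
            "length": length,
        }
        # Skip the blank line, the content block and the trailing CRLF CRLF.
        offset = header_end + 4 + length + 4


def make_record(uri: str, date: str, body: bytes) -> bytes:
    """Build a simplified 'resource' record for the demo (hypothetical helper)."""
    head = (
        "WARC/1.0\r\n"
        "WARC-Type: resource\r\n"
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>\r\n"
        f"WARC-Target-URI: {uri}\r\n"
        f"WARC-Date: {date}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("utf-8")
    return head + body + b"\r\n\r\n"


# Two sample records: an HTML page and a PDF-like asset.
archive = make_record(
    "https://example.org/", "2018-08-16T09:00:00Z", b"<html>home</html>"
) + make_record(
    "https://example.org/report.pdf", "2018-08-16T09:00:05Z", b"%PDF-1.4 sample"
)

index = list(index_warc(archive))
for entry in index:
    print(entry["offset"], entry["uri"], entry["date"], entry["length"])
```

Because each entry records a byte offset, a playback tool can jump straight to the record it needs instead of scanning the whole archive, which is why the index matters so much at scale.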
Providing this search functionality for big data archives can be a challenge with a traditional model, because the index will need to cover billions of very small items.
This is why a flexible cloud-based solution is so beneficial for processing this type of data. You can scale up or down based on your search functionality requirements, and the cloud can also help to improve data quality by deduplicating pages.
It’s something MirrorWeb have had great success with by managing to process 1.4 billion documents for The UK National Archives in just 10 hours.
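The deduplication mentioned above can be illustrated with content hashing, one common approach. The article does not say which method is actually used, so treat the following Python as a sketch under that assumption; the `deduplicate` function and the sample pages are hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple


def deduplicate(pages: List[Tuple[str, bytes]]) -> Tuple[Dict[str, bytes], Dict[str, str]]:
    """Store each distinct payload once and map every URL to its payload digest."""
    store: Dict[str, bytes] = {}   # digest -> payload, kept only once
    pointers: Dict[str, str] = {}  # url -> digest of the payload it served
    for url, payload in pages:
        digest = hashlib.sha256(payload).hexdigest()
        store.setdefault(digest, payload)
        pointers[url] = digest
    return store, pointers


# Hypothetical crawl output: two addresses serve byte-identical content.
pages = [
    ("https://example.org/", b"<html>home</html>"),
    ("https://example.org/index.html", b"<html>home</html>"),
    ("https://example.org/about", b"<html>about</html>"),
]

store, pointers = deduplicate(pages)
print(len(pages), "pages captured,", len(store), "payloads stored")
```

Every URL still resolves to its content through `pointers`, so nothing is lost from the record; only the redundant copies are dropped.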
Protect your organisation's digital history
Since the dawn of the internet, organisations and companies have not been archiving their websites, so there’s already likely to be a 20+ year black hole.
This represents a loss of valuable historical data we can’t retrieve: organisations are unable to look back at the contents of their first website, their digital presence on social media, and so on.
In 2018, the world is collectively estimated to spend one billion years' worth of time online. To avoid losing this developing history, organisations need to wake up to the urgent need for digital archiving to protect and preserve their legacy.
The MirrorWeb cloud-based digital archiving solution
MirrorWeb delivers a cloud-native, ISO-compliant web archiving solution. This allows organisations to create permanent, unalterable records of all online communications.
We have chosen to archive in the cloud because it is a scalable solution, provides speed and reliability, has near-unlimited capacity to meet your big data needs, offers complete control over data storage, and results in real cost-savings for our customers.
By partnering with MirrorWeb, you’ll be using a trusted and secure archiving service provider. We have extensive experience in understanding your requirements, and offer your organisation many benefits in addition to those of the cloud itself:
State-of-the-Art: Offering support for web and social media data at scale, as well as indexing for search and big data initiatives.
ISO-Compliant: We are ISO 9001 and ISO 27001 certified, and archive our data in the secure, date- and time-stamped ISO 28500 standard WARC file format.
UK-Based: We offer UK-based support 24/7/365 and store all archives in local territories to meet data protection, compliance and regulatory requirements.
User-Friendly: Our best-in-class user portal puts users in control of their data, allowing them to control archiving frequency, search and replay content, and view reports and notifications.
Cost-Competitive: We give full, uncapped access to the MirrorWeb portal at all times, leveraging cloud economies of scale with no hidden seat fees and no setup or maintenance fees.