MirrorWeb Modifies Crawling Process to Reduce Carbon Footprint

Sean Stapleton

Leading communications surveillance platform, MirrorWeb, has announced wholesale changes to its web crawling technology in order to become more energy efficient.

MirrorWeb operates Amazon Web Services (AWS) accounts in London, Ohio, Virginia and Frankfurt, which are utilized depending on its clients’ preferences. Over any given 24-hour period, each of these accounts runs thousands of web crawls, a vital element of the digital archiving service that MirrorWeb provides.

In recent months, the company has been making the transition from Intel-based crawl servers to ARM (Advanced RISC Machine) based crawl servers. ARM processors were originally developed by Acorn Computers, and later by a joint venture that included Apple, and provide a low-power, energy-efficient alternative to their Intel counterparts.

Graviton, AWS’ own ARM-based processor, uses up to 60% less energy for the same performance than comparable EC2 instances, such as those running on Intel processors. Thanks to the performance gains achieved with Graviton, MirrorWeb was able to reduce the size of its crawl servers by half.

Additionally, MirrorWeb is introducing the practice of ‘upload on rotation’. Traditionally, for each crawl, archival data would be stored on the crawl server for the duration of the web crawl. Because it was unclear how large the crawl would end up being, the storage capacity would need to expand as the crawl progressed, with additional space being repeatedly requested from Amazon. At the end of the crawl, the archive would then take some time to upload to the cloud, depending on its size.

With the new ‘upload on rotation’ process, every time an archive file is completed, a new file is created and the previous file is uploaded right away. This avoids the energy wasted on repeatedly growing the storage and removes the long upload period at the end of the crawl, further increasing energy efficiency.
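The idea behind ‘upload on rotation’ can be illustrated with a minimal Python sketch. This is not MirrorWeb’s implementation: the class name, the file naming scheme, the size-based rotation threshold and the `upload` callback are all hypothetical, chosen only to show a finished file being shipped and removed as soon as the next one is opened, so local storage never has to grow with the crawl.

```python
import os
import tempfile
from typing import Callable, List


class RotatingArchiveWriter:
    """Illustrative only: writes records to size-capped files and
    uploads each file as soon as it is complete."""

    def __init__(self, directory: str, max_bytes: int,
                 upload: Callable[[str], None]):
        self.directory = directory
        self.max_bytes = max_bytes  # assumed rotation threshold
        self.upload = upload        # assumed uploader, e.g. a cloud SDK wrapper
        self.index = 0
        self._open_next()

    def _open_next(self) -> None:
        # Hypothetical naming scheme for the archive files.
        name = f"archive-{self.index:05d}.bin"
        self.path = os.path.join(self.directory, name)
        self.file = open(self.path, "wb")
        self.index += 1

    def _ship_current(self) -> None:
        # Close the finished file, upload it, then free the local disk space.
        self.file.close()
        self.upload(self.path)
        os.remove(self.path)

    def write(self, record: bytes) -> None:
        # Rotate first if this record would push the current file past the cap.
        size = self.file.tell()
        if size > 0 and size + len(record) > self.max_bytes:
            self._ship_current()
            self._open_next()
        self.file.write(record)

    def close(self) -> None:
        # At the end of the crawl only the last, partial file is left to ship.
        if self.file.tell() > 0:
            self._ship_current()
        else:
            self.file.close()
            os.remove(self.path)


def demo() -> List[str]:
    """Simulates a small crawl and returns the uploaded file names in order."""
    uploaded: List[str] = []
    with tempfile.TemporaryDirectory() as tmp:
        writer = RotatingArchiveWriter(
            tmp,
            max_bytes=10,
            upload=lambda path: uploaded.append(os.path.basename(path)),
        )
        for _ in range(5):
            writer.write(b"12345678")  # 8 bytes each, so one record per file
        writer.close()
    return uploaded
```

In this sketch at most one archive file sits on the crawl server at any time, and the only upload remaining at the end of the crawl is the final file.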

Philip Clegg, Chief Technology Officer of MirrorWeb, said: “The changes that we’ve made have been on the agenda for a while now, and we’re very happy to make the transition over to ARM processors. The performance benefits are remarkable, and we can use up to 60% less energy to get the same results. From an environmental perspective, it’s a no-brainer.

“Further tweaks to our crawling process should complement that perfectly. ‘Upload on rotation’ saves energy on every one of our crawls. It shows our commitment to honing our processes while embracing our responsibilities.”

For more information about MirrorWeb web archiving, visit https://www.mirrorweb.com/solutions/capabilities/website-archiving
