
MirrorWeb Modifies Crawling Process to Reduce Carbon Footprint

Sean Stapleton

Leading communications surveillance platform MirrorWeb has announced wholesale changes to its web crawling technology to make it more energy efficient.

MirrorWeb operates Amazon Web Services (AWS) accounts in London, Ohio, Virginia and Frankfurt, each used according to its clients’ preferences. In any given 24-hour period, each of these accounts runs thousands of web crawls, a vital element of the digital archiving service that MirrorWeb provides.

In recent months, the company has been moving from Intel-based crawl servers to ARM-based (Advanced RISC Machine) crawl servers. The ARM architecture was originally developed by Acorn Computers and later by Arm, a company founded jointly by Acorn, Apple and VLSI Technology. ARM processors provide a low-power, energy-efficient alternative to their Intel counterparts.

Graviton, AWS’s own ARM-based processor, uses up to 60% less energy for the same performance than comparable EC2 instances, such as Intel-based ones. The performance gains achieved with Graviton also allowed MirrorWeb to halve the size of its crawl servers.

Additionally, MirrorWeb is introducing the practice of ‘upload on rotation’. Previously, the archival data for each crawl was stored on the crawl server for the whole duration of the crawl. Because there was no way of knowing in advance how large a crawl would become, storage had to grow as the crawl progressed, with additional space repeatedly requested from Amazon. Once the crawl finished, the data then had to be uploaded to the cloud, which could take some time depending on the size of the crawl.

Under the new ‘upload on rotation’ process, every time an archive file is completed, a new file is started and the completed file is uploaded right away. Because finished files leave the crawl server as they are produced, this avoids the energy spent on repeatedly growing storage and on the long upload period at the end of the crawl, further increasing energy efficiency.
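The ‘upload on rotation’ idea can be illustrated with a short Python sketch. This is not MirrorWeb’s implementation: the class name `RotatingArchiveWriter`, the file naming, the size threshold `max_bytes`, the `upload` callback (standing in for an upload to cloud storage) and the deletion of each local copy after upload are all assumptions made for the example.

```python
import os
import tempfile
from typing import Callable, List


class RotatingArchiveWriter:
    """Hypothetical writer illustrating 'upload on rotation'.

    Records are appended to the current archive file. Once that file reaches
    max_bytes, it is closed, a new file is opened, and the completed file is
    handed to `upload` immediately instead of waiting for the crawl to end.
    """

    def __init__(self, directory: str, max_bytes: int,
                 upload: Callable[[str], None]) -> None:
        self.directory = directory
        self.max_bytes = max_bytes  # assumed rotation threshold
        self.upload = upload        # hypothetical stand-in for a cloud upload
        self.index = 0
        self._open_new_file()

    def _open_new_file(self) -> None:
        self.path = os.path.join(self.directory, f"archive-{self.index:05d}.dat")
        self.index += 1
        self.handle = open(self.path, "wb")
        self.size = 0

    def write_record(self, record: bytes) -> None:
        self.handle.write(record)
        self.size += len(record)
        if self.size >= self.max_bytes:
            self._rotate()

    def _rotate(self) -> None:
        completed = self.path
        self.handle.close()
        self._open_new_file()   # start the next file first,
        self.upload(completed)  # then upload the finished one right away
        os.remove(completed)    # assumed: local copy deleted once uploaded

    def close(self) -> None:
        # At the end of the crawl only the final, partially filled file remains.
        self.handle.close()
        if self.size > 0:
            self.upload(self.path)
        os.remove(self.path)


# Usage: 4-byte records with a 10-byte threshold produce two uploads.
uploaded: List[str] = []
with tempfile.TemporaryDirectory() as tmp:
    writer = RotatingArchiveWriter(
        tmp, max_bytes=10, upload=lambda p: uploaded.append(os.path.basename(p)))
    for _ in range(5):
        writer.write_record(b"abcd")
    writer.close()
print(uploaded)  # ['archive-00000.dat', 'archive-00001.dat']
```

Because each completed file is uploaded as soon as it is rotated out, local storage in this sketch only ever holds the file currently being written, and the only work left at the end of the crawl is the final, partial file.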

Philip Clegg, Chief Technology Officer of MirrorWeb, said: “The changes that we’ve made have been on the agenda for a while now, and we’re very happy to make the transition over to ARM processors. The performance benefits are remarkable, and we can use up to 60% less energy to get the same results. From an environmental perspective, it’s a no-brainer.

“Further tweaks to our crawling process should complement that perfectly. ‘Upload on rotation’ saves energy on every one of our crawls. It shows our commitment to honing our processes while embracing our responsibilities.”

For more information about MirrorWeb web archiving, visit
