How are websites able to remove their content from the Wayback Machine?

If you click on this link (from the Wayback Machine), you'll notice that the web content successfully loads.

But then, something else loads, and the content is destroyed. (perhaps robots.txt?)

What is loading that makes this 404 happen? Somehow, the page is loading something externally, and that is causing the 404.

Is there a way to prevent this?

Maybe a Chrome extension that loads a website step by step--such that the user can decide what to load and what not to load.

Anyway, I'm just looking for ideas, because it seems like a lot of websites are unhappy that their content is on the Wayback Machine, and they are sabotaging the archive--so that a 404 replaces the content.



I found one way to do it, but it would be nice if there were a more elegant solution--maybe a Chrome extension.



2 Solutions

For your particular link (The Economist), I was able to hit View Page Source at that instant and got the HTML, but no CSS.

To remove a site from the Wayback Machine, place a robots.txt file at the top level of your site (e.g. and then submit your site below.

The robots.txt file will do two things:

 1.   It will remove all documents from your domain from the Wayback Machine.

 2.   It will tell the Internet Archive's crawler not to crawl your site in the future.

To exclude the Internet Archive's crawler (and remove documents from the Wayback Machine) while allowing all other robots to crawl your site, your robots.txt file should say:

                   User-agent: ia_archiver

                   Disallow: /

Robots.txt is the most widely used method for controlling the behavior of automated robots on your site (all major robots, including those of Google, AltaVista, etc., respect these exclusions). It can be used to block access to the whole domain, or to any file or directory within it. There are a large number of resources for webmasters and site owners describing this method and how to use it. Here are a few:




Once you have put a robots.txt file up, submit your site ( on the form on

The robots.txt file must be placed at the root of your domain ( If you cannot put a robots.txt file up, submit a request to
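The exclusion rules quoted above can be sanity-checked with Python's standard-library robots.txt parser. This is only an illustrative sketch: the two-line rule set is the example from this answer, not any real site's file.

```python
import urllib.robotparser

# The exclusion rules quoted above, exactly as they would be served
# from the site's /robots.txt.
rules = [
    "User-agent: ia_archiver",
    "Disallow: /",
]

parser = urllib.robotparser.RobotFileParser()
parser.parse(rules)

# The Internet Archive's crawler is disallowed everywhere...
print(parser.can_fetch("ia_archiver", "/any/article.html"))  # False

# ...while other crawlers remain unaffected.
print(parser.can_fetch("Googlebot", "/any/article.html"))    # True
```

This confirms the claim above: the `ia_archiver` rules block only the Internet Archive's crawler while leaving all other robots free to crawl the site.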

Right, but my goal is to see the content at the above link from The Economist. E.g. if you click the link and watch closely, you'll notice that the webpage content actually does load successfully, but then it is wiped away a second later. Is there any way to halt this? Any way to prevent The Economist from wiping their content at the Wayback Machine?
tonloc 2 years ago
ST2-EV 2 years ago
Hi. Thanks. But what method did you use to get this HTML? E.g. my goal is to be able to do this on future links when I stumble upon them in the Wayback Machine. E.g. there should be some sort of way to download content from a web source and "press PAUSE" to prevent further downloading.
tonloc 2 years ago
Update: hmm, I found one method to do it. It'd be nice if there were a more reliable way. Video in original post.
tonloc 2 years ago
I just hit View Page Source at the instant the page loads.
ST2-EV 2 years ago
Winning solution

Another solution is to go to Chrome > Settings > Advanced > Site Settings > JavaScript and Add a new blocking URL:


This should disable JavaScript when loading a Web Archive page and thus allow the full Economist page to load without getting the 404 message.
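The same idea works offline: since the redirect is triggered by client-side JavaScript, stripping the script elements out of a saved copy of the archived page leaves the article markup intact. A minimal sketch using Python's standard-library HTML parser (the sample page string is made up for illustration):

```python
from html.parser import HTMLParser

class ScriptStripper(HTMLParser):
    """Re-emit HTML while dropping every <script> element entirely."""

    def __init__(self):
        super().__init__()
        self.out = []
        self.in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True  # suppress the tag and its contents
            return
        attr_text = "".join(f' {k}="{v}"' for k, v in attrs if v is not None)
        self.out.append(f"<{tag}{attr_text}>")

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.in_script:
            self.out.append(data)

def strip_scripts(html: str) -> str:
    stripper = ScriptStripper()
    stripper.feed(html)
    return "".join(stripper.out)

# Hypothetical archived-page fragment for illustration.
page = '<html><body><p>Article text</p><script src="0d74ccc6.js"></script></body></html>'
print(strip_scripts(page))  # <html><body><p>Article text</p></body></html>
```

With the script elements gone, opening the saved file in a browser shows the article without the 404 redirect ever firing. (CSS and images would still need their own handling, as noted in the other answer.)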

Video walkthrough:

Hi, thanks. So in your video, what you did was prevent Chrome from executing JavaScript at So, what this did was prevent The Economist's "destroyer script" from firing after the website loaded?
tonloc 2 years ago
There is a script loaded in the page (filename: 0d74ccc6.js) that causes the 404 redirect. If you can prevent this script from executing, you can probably browse through all the archived pages without redirections. You can use a Chrome extension such as Resources Override ( to specify a specific file to be excluded from execution, e.g., the aforementioned one. When loading the archived page through the Resources Override extension with the script blocked, you can see the article loading just fine.
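A local equivalent of blocking just that one file is to remove only the script tags whose src references the 0d74ccc6.js filename named above, leaving every other script untouched. A sketch over a hypothetical page fragment:

```python
import re

# Regex matching only <script> elements whose src contains the
# "destroyer" filename named in the comment above (0d74ccc6.js).
BLOCK_PATTERN = re.compile(
    r'<script\b[^>]*src="[^"]*0d74ccc6\.js[^"]*"[^>]*>.*?</script>',
    re.IGNORECASE | re.DOTALL,
)

def block_scripts(html: str) -> str:
    """Drop the blocked <script> elements; leave all other scripts intact."""
    return BLOCK_PATTERN.sub("", html)

# Hypothetical page fragment for illustration.
page = '<p>ok</p><script src="/static/0d74ccc6.js"></script><script src="app.js"></script>'
print(block_scripts(page))  # <p>ok</p><script src="app.js"></script>
```

Unlike disabling JavaScript wholesale, this selective approach keeps the archive's own scripts (navigation toolbar, rewritten links) working while only the suspected redirect script is dropped.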
kostasx 2 years ago
I am not sure if it is The Economist that is blocking archived pages, or a conflict with some of the scripts used to load the archived pages. Some further research is needed to see what's happening under the hood.
kostasx 2 years ago