Monitor website for changes

Dear Freelancers,

I need to be notified by email, within a few hours, about changes on several websites.
I'm looking for a web service (or any other solution) that would monitor a whole website (not a single webpage) for changes and send me an email with:
- a link to the modified page;
- the information before the change;
- the information after the change.

I've found software called Website Watcher at http://aignes.com/
But I'm still looking for a service that only watches for changes and notifies me via email.

Thanks

Some time later, I found a fully working solution:

The online service I was looking for is http://NeoWatcher.com
This service has all the functions I need.
For now, though, NeoWatcher is available only in Russian.

It is exactly what I needed: monitoring a site for changes.
It tracks and shows exactly where web pages were edited.
It can even take screenshots and send reports by email.
It also has analytics.

What type of sites?
eji001 almost 7 years ago
@BCECJIAB, is this for monitoring a website that you control (do you have access to the web application's code)? Or do you want to monitor any arbitrary website every few hours, and receive an email when the content changes?
bevan almost 7 years ago
Actually, I have access to the code, but I'm looking for changes in the frontend. So it's more like I want to monitor an arbitrary website every few hours.
BCECJIAB almost 7 years ago
Perhaps explaining why you want to do this would make your question easier to answer. It is very likely that there is a better way to do what you want than just "looking for changes in a website"... For example, if you are looking for changes to a specific piece of information, I could easily write you a Python script (using Beautiful Soup) that scrapes the site every hour and emails you the change to that piece of info (if it changed).
slang almost 7 years ago
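A minimal sketch of the kind of script described above, assuming the value of interest can be located with a CSS selector; the URL, selector, and email addresses are placeholders. Run it hourly from cron:

```python
import smtplib
import urllib.request
from email.mime.text import MIMEText

from bs4 import BeautifulSoup  # pip install beautifulsoup4

URL = "http://example.com/support"  # placeholder: page to watch
SELECTOR = ".support-phone"         # placeholder: element to watch
STATE_FILE = "last_value.txt"       # holds the previously seen value

def fetch_value():
    html = urllib.request.urlopen(URL).read()
    node = BeautifulSoup(html, "html.parser").select_one(SELECTOR)
    return node.get_text(strip=True) if node else ""

def notify(old, new):
    msg = MIMEText(f"Page: {URL}\nBefore: {old}\nAfter: {new}")
    msg["Subject"] = "Website change detected"
    msg["From"] = "monitor@example.com"  # placeholder
    msg["To"] = "you@example.com"        # placeholder
    with smtplib.SMTP("localhost") as smtp:  # assumes a local SMTP server
        smtp.send_message(msg)

current = fetch_value()
try:
    previous = open(STATE_FILE).read()
except FileNotFoundError:
    previous = current  # first run: nothing to compare yet
if current != previous:
    notify(previous, current)
open(STATE_FILE, "w").write(current)
```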
My company supports over 10 websites, and I need to know when someone changes any more-or-less important information on them. For example, if some developer, accidentally or not, changes the support phone number, I have to get a notification as soon as possible. Otherwise, the next day some of my clients will call me and say: "My customers could not contact support. There is a wrong phone number on my site. What am I paying you for?" So I need to monitor whole websites for frontend changes.
BCECJIAB almost 7 years ago
Are you already using version control? If not, the answer that tarraschk gave is something you should definitely look into.
slang almost 7 years ago
Yep, we are already using SVN, but only to monitor code changes.
BCECJIAB almost 7 years ago
How about a script that downloads the site, runs a git diff comparing the downloaded site to the last time it was downloaded, and emails you the results? You would be emailed all of the changes to the HTML, which is pretty much the same as if you just kept a compiled version of the site (whatever your server generates) under version control... In fact, I would probably just use a git repo to hold all of the downloaded HTML and commit each time the new download doesn't match the old one.
slang almost 7 years ago
The best idea so far. But if the main menu or footer is changed, it will change on all pages (10,000, for example), and making/sending/reading the report would be a disaster.
BCECJIAB almost 7 years ago
Actually, it would still be just one email, and it would only show the lines that changed... it would just include a lot of repetitive diffs for all the files that changed. I'll start writing this script unless there is something better that you would want. What OS will this be running on? You would need a server constantly running to check the site for changes (probably a VPS).
slang almost 7 years ago


9 Solutions


Hello,

If you are looking for changes in your code, you might be looking for a solution based on Git [1].

You could easily deploy Git on your server(s) to follow any changes, and set up some "hooks" (scripts triggered when changes happen) to send you emails [2] with the modified pages [3], the information before the changes, and the information after the changes.
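As a rough illustration of [2] and [3], a post-receive hook could email the list of modified files plus the full diff for each push. This is only a sketch, assuming Python 3 and a local SMTP server; the addresses are placeholders:

```python
#!/usr/bin/env python3
# .git/hooks/post-receive: email the diff for every pushed ref.
import smtplib
import subprocess
import sys
from email.mime.text import MIMEText

# git feeds the hook lines of "<old-rev> <new-rev> <ref-name>" on stdin.
# (An all-zeros old rev, i.e. a brand-new branch, would need special handling.)
for line in sys.stdin:
    old, new, ref = line.split()
    files = subprocess.check_output(
        ["git", "diff", "--name-only", old, new], text=True)
    diff = subprocess.check_output(["git", "diff", old, new], text=True)
    msg = MIMEText(f"Ref: {ref}\n\nModified pages:\n{files}\n{diff}")
    msg["Subject"] = f"Changes pushed to {ref}"
    msg["From"] = "git@example.com"  # placeholder
    msg["To"] = "you@example.com"    # placeholder
    with smtplib.SMTP("localhost") as smtp:  # assumes a local SMTP server
        smtp.send_message(msg)
```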

I hope it helps!

For more information:

[1] Git : http://git-scm.com/

[2] Emails with hooks : http://stackoverflow.com/questions/552360/git-push-email-notification

[3] Get a list of modified files : http://stackoverflow.com/questions/10162695/git-pre-commit-hook-getting-list-of-changed-files

We have tools to monitor changes in the code. The question is how to monitor changes in the frontend: the HTML generated by the server.
BCECJIAB almost 7 years ago
Then you could use a script that downloads your websites and runs git diff (as slang800 said). It would download the HTML generated by your server, compare it with the last available version, and email you all the changes.
tarraschk almost 7 years ago
See comment to slang800's message.
BCECJIAB almost 7 years ago

You can create your own website monitor with Google Apps Script:
https://developers.google.com/apps-script/
Here is a simple example:
http://ctrlq.org/code/19030-google-apps-script-website-monitor

As far as I can see, it only checks whether my site is up or down, so it's not to spec.
BCECJIAB almost 7 years ago
If you write a more elaborate script than the sample, it will solve your problem.
mrMakaronka almost 7 years ago
I need to monitor a whole website (not a separate webpage).
BCECJIAB almost 7 years ago
Re "10 Free Services to Monitor Your Site's Uptime": there is no need to monitor uptime.
BCECJIAB almost 7 years ago
Winning solution

Make a script that downloads the HTML from the sites (using wget), runs a git diff comparing the newly downloaded site to the last time it was downloaded, and emails you the results. You would be emailed all of the changes to the HTML (each line that changed), which is pretty much the same as if you just kept a compiled version of the site (whatever your server generates) under version control... In fact, it would probably be best to use a git repo to hold all of the downloaded HTML and commit each time the new download doesn't match the old one.

(figured I should move this from a comment to an actual answer)

...I would be willing to write this; however, it would need to run on a Linux-based OS (preferably Ubuntu), because Windows doesn't have any of the tools needed to do this easily, and IMO it is terrible for programming on.
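A minimal sketch of this approach, assuming wget and git are installed and the mirror repo has been initialised with an initial commit; the site URL and email addresses are placeholders:

```python
#!/usr/bin/env python3
# Mirror a site with wget, keep the HTML in a git repo, and email the
# diff whenever the new download differs from the previous one.
import smtplib
import subprocess
from email.mime.text import MIMEText

SITE = "http://example.com"  # placeholder: site to monitor
REPO = "mirror"              # git repo holding the downloaded HTML

def git(*args):
    return subprocess.run(["git", *args], cwd=REPO,
                          capture_output=True, text=True)

# Download the site into the repo (recursive, stay on this host).
subprocess.run(["wget", "--mirror", "--no-parent",
                "--directory-prefix", REPO, SITE])

git("add", "--all")
diff = git("diff", "--cached").stdout  # staged changes vs the last commit
if diff:
    git("commit", "-m", "site changed")
    msg = MIMEText(diff)  # every changed line: before and after
    msg["Subject"] = f"Changes detected on {SITE}"
    msg["From"] = "monitor@example.com"  # placeholder
    msg["To"] = "you@example.com"        # placeholder
    with smtplib.SMTP("localhost") as smtp:  # assumes a local SMTP server
        smtp.send_message(msg)
```

Run it from cron every hour; since git keeps the full history, the "before" version of any page stays recoverable.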

"(figured I should move this from a comment to an actual answer)" Good idea.
We are discussing this solution for now. I will inform you about our decision in a day or two.
BCECJIAB almost 7 years ago
Just a piece of advice: wget will not retrieve AJAX content. See other solutions here: http://ubuntuincident.wordpress.com/2011/04/15/scraping-ajax-web-pages/
Slava almost 7 years ago
True, but typically AJAX content will follow a specific URL scheme that could easily be followed by a custom script, especially if it is being generated by the server... but without knowing what these sites are, it is impossible to tell for certain. Also, data like phone numbers would probably not be kept in AJAX-ified content because search engines would have the same issue with scraping this content... and if it is, then it is probably included in a sitemap so Google can index it. However, if there is AJAX content on these sites, and it uses a complex (unpredictable) URL pattern, and it is not included in a sitemap, and it is not just used for trivial enhancement to the sites, then I could look into a headless browser for scraping, like PhantomJS or Crowbar... but this is probably overkill, especially because BCECJIAB has access to the code of these sites (so (s)he probably knows how the URLs are formatted and if the sites use AJAX).
slang almost 7 years ago
Exactly. Good luck implementing all of that for $5. :-) You should really consider selling it later as a web service.
Slava almost 7 years ago
The $5 award is just for the idea.
BCECJIAB almost 7 years ago
No offense meant. Everyone was aware of the amount up front. Just teasing.
Slava almost 7 years ago
lol, yeah... implementing that would take at least a day of programming, so I would charge well over $5.
slang almost 7 years ago
The decision is to test Website Watcher. Thanks for the help.
BCECJIAB almost 7 years ago

There is a good collection of links with reviews here:
www.rba.co.uk/sources/monitor.htm

P.S. If you find this link useful, I am happy for the bounty to go to charity, unless Karen Blakeman is here. :-)

I need to monitor a whole website (not a separate webpage).
BCECJIAB almost 7 years ago
Well, to monitor the entire website you need to have all the links to feed to the monitor. If it's your server, create your own list; otherwise you need a crawler, like this one: http://blog.oneortheother.info/scripts/generate-url-list/index.html A crawler is not the best solution here, because a website can have some "deep" pages.
Slava almost 7 years ago
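A minimal sketch of such a URL-list crawler, using only the Python standard library; the start URL is a placeholder, and (as noted above) it will miss "deep" pages that nothing links to:

```python
#!/usr/bin/env python3
# Breadth-first crawl that collects same-host URLs to feed a page monitor.
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

START = "http://example.com/"  # placeholder: site root

class LinkParser(HTMLParser):
    """Collects absolute link targets from <a href> tags."""
    def __init__(self, base):
        super().__init__()
        self.base, self.links = base, set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.add(urljoin(self.base, value))

seen, queue = set(), [START]
while queue:
    url = queue.pop(0)
    if url in seen or urlparse(url).netloc != urlparse(START).netloc:
        continue
    seen.add(url)
    try:
        html = urllib.request.urlopen(url).read().decode("utf-8", "replace")
    except Exception:
        continue  # skip pages that fail to download or decode
    parser = LinkParser(url)
    parser.feed(html)
    queue.extend(parser.links)

print("\n".join(sorted(seen)))  # the URL list for the monitor
```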
We have over 200,000 pages... some of them could be really "deep".
BCECJIAB almost 7 years ago
I seriously doubt all 200k pages are interlinked in such a way that a crawler would be able to find them all. Hence your developers would have to provide the list of pages to monitor, based on their knowledge of the architecture. Then you can run any of the page monitors over this list.
Slava almost 7 years ago
Sites are usually designed to be interlinked, with either links between pages or (for larger sites like Stack Overflow) a massive sitemap... without something like this, Google would not be able to index the unlinked pages, and users would not be able to find them without a specific link. Also, if BCECJIAB is trying to monitor a site the size of Stack Overflow, he should probably look into revision control for his database (used to generate the pages) rather than scraping the actual pages.
slang almost 7 years ago
I do not think BCECJIAB is looking to monitor changes on something like Stack Overflow, where all the content is generated by users. Imagine the size of the email message with just one day's change log! :-) It has got to be something rather static, like a knowledge base or a catalogue. Those things are often designed to actually prohibit conventional crawlers from dumping the contents. Thus I think a crawler would be of little help, and the list of pages should be provided by the developers.
Slava almost 7 years ago
Yep. Just need to be notified when something like "James Bond was here" appeared on any page.
BCECJIAB almost 7 years ago
Yeah, Stack Overflow is just an example of a site that cannot have all pages interlinked (Google couldn't crawl over all the questions posted there), so they use a sitemap. It is also a site that uses a database (of some type) to hold questions, rather than just having static pages, so revision control for the database would be much better suited to a site like it. If it is a catalogue, then it would probably be better monitored with revision control at the database level, which would be much faster than downloading the pages. Also, if the site is, for whatever reason, designed to prevent crawlers from indexing it, then a list of pages would be best; however, this seems like a pain for the developers to make, and it would be better to use existing links or a sitemap to get the list automatically. Oh! Another idea: if this is a site managed with a CMS (which would make sense, given the number of pages), you may just want to have the changes recorded by the CMS itself and emailed to you every hour. So, when a user changes something, that change gets added to a queue, which at the end of the hour gets emptied into an email and sent to you. Also, if you are having untrusted users input info into your sites, you might want to add some type of spam detection or flagging to keep changes that match certain spam heuristics from being published until you see them (like the type of detection http://akismet.com/ uses).
slang almost 7 years ago
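A minimal sketch of that CMS-level idea; the hook names here are hypothetical, and a real CMS would call record_change() from its own save handler:

```python
#!/usr/bin/env python3
# Record each content change in a queue and email the batch once an hour.
import smtplib
from email.mime.text import MIMEText

queue = []

def record_change(page, before, after):
    """Hypothetical hook: the CMS calls this whenever a user saves a page."""
    queue.append(f"{page}\n  before: {before}\n  after:  {after}")

def flush_queue():
    """Run hourly (e.g. from cron or a scheduler) to email queued changes."""
    if not queue:
        return
    msg = MIMEText("\n\n".join(queue))
    msg["Subject"] = f"{len(queue)} content change(s) this hour"
    msg["From"] = "cms@example.com"  # placeholder
    msg["To"] = "you@example.com"    # placeholder
    with smtplib.SMTP("localhost") as smtp:  # assumes a local SMTP server
        smtp.send_message(msg)
    queue.clear()
```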

You can't monitor a whole website as such. A website consists of individual pages, and you can only monitor separate webpages.

My solution is:

  1. Create a list of webpages (a URL list).

  2. Define the elements to monitor on each webpage (for example, using CSS selectors).

  3. Make a script that downloads the HTML for each URL in the list and compares the defined blocks with the previous version.
    There is a PHP solution for selecting HTML elements (see phpQuery, a jQuery port to PHP).

  4. For example, you can compare not the whole text but its md5 hash, or use a git diff comparison.

  5. As a result, you can produce a report containing:

a) a list of modified pages (with links)

b) a copy of the original page (before the changes)

c) the actual page (the information after the changes)

The description above is a general algorithm that can be implemented in almost any programming language; a sketch follows.
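For illustration, here is a minimal sketch in Python, using BeautifulSoup's CSS selectors in place of the phpQuery suggestion above and comparing md5 hashes of the selected blocks (step 4); the page list and selectors are placeholders:

```python
import hashlib
import json
import urllib.request

from bs4 import BeautifulSoup  # pip install beautifulsoup4

# Placeholders: the pages to watch and the CSS selector to check on each.
PAGES = {
    "http://example.com/contacts": ".support-phone",
    "http://example.com/about": "#footer",
}
STATE_FILE = "hashes.json"  # md5 hashes from the previous run

def block_hash(url, selector):
    """md5 of the selected block's text (step 4 of the algorithm)."""
    html = urllib.request.urlopen(url).read()
    node = BeautifulSoup(html, "html.parser").select_one(selector)
    text = node.get_text(strip=True) if node else ""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

try:
    old = json.load(open(STATE_FILE))
except FileNotFoundError:
    old = {}  # first run: no previous hashes

new, modified = {}, []
for url, selector in PAGES.items():
    new[url] = block_hash(url, selector)
    if old.get(url) not in (None, new[url]):
        modified.append(url)  # step 5a: the list of modified pages

json.dump(new, open(STATE_FILE, "w"))
print("Modified pages:", modified or "none")
```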

This is the third copy of the same solution, which was suggested by slang800 and tarraschk earlier on this page.
BCECJIAB almost 7 years ago
Yes, but there is a significant difference: I propose tracking blocks on the page, not the whole page. This way you avoid duplicate records if, for example, the phone number is changed across the whole site.
maxim-dev almost 7 years ago

The SaaS product http://neowatcher.com/ is a good online service for Russian speakers to monitor a website for changes.


Maybe too late, but the quite universal http://www.wachete.com is still worth a try.
It can watch part of a page, a full page, or a full page with subpages, with email notifications and mobile apps.
