Breeze-bot
Breeze‑bot is an application designed for crawling URLs within a specified root domain; it is not a free-roaming crawler.
This document is written for webmasters. It describes the purpose of Breeze‑bot and how to control its behaviour on a website.
Identification
Breeze‑bot can be identified by its user agent, breeze‑bot - http://www.getme.co.uk/breeze, and by the IP address 217.10.147.71.
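As an illustration, a webmaster could recognise Breeze‑bot requests in server-side code with a check along the following lines. This is a minimal sketch, not an official tool: the function name is hypothetical, and it assumes the user agent is sent with a plain ASCII hyphen and that a case-insensitive substring match is sufficient.

```python
# Values quoted from the Identification section above.
BREEZE_BOT_IP = "217.10.147.71"
# Assumption: the token appears in the user agent with an ASCII hyphen.
BREEZE_BOT_TOKEN = "breeze-bot"


def is_breeze_bot(user_agent: str, remote_ip: str) -> bool:
    """Return True only when both the user agent and the source IP match."""
    # Treat a non-breaking hyphen (U+2011) as an ordinary hyphen and
    # compare case-insensitively.
    normalised = user_agent.replace("\u2011", "-").lower()
    return BREEZE_BOT_TOKEN in normalised and remote_ip.strip() == BREEZE_BOT_IP


if __name__ == "__main__":
    ua = "breeze-bot - http://www.getme.co.uk/breeze"
    print(is_breeze_bot(ua, "217.10.147.71"))
```

Checking the IP address as well as the user agent guards against other clients that merely copy the user agent string.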
Purpose
Breeze‑bot is used to parse some or all of the content within a web domain in order to analyse it and produce reports (see Atmos reports).
Unlike most crawlers, Breeze‑bot is restricted to parsing within a single domain. You are only likely to be visited by Breeze‑bot if you are an existing Getme client, or if you have requested a report.
Breeze‑bot may check that URLs outside the root domain exist, but it will not attempt to parse or store their content.
Frequency
Breeze‑bot attempts to parse a domain in one go. It may be run as a one-off or on a regular basis (typically monthly or quarterly), depending on the client's requirements for reports. Breeze is a threaded application, but it is typically limited to 5 threads, each of which can make at most one request every 5 seconds.
Breeze‑bot is usually run during a website's period of lowest activity, and on large websites it may be stopped and restarted to help balance server load.
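The typical limits above imply a peak of roughly one request per second across all threads. The short sketch below works through that arithmetic and estimates the shortest time a crawl could take. The function names are hypothetical, and the figures assume the typical limits apply and ignore response times, pauses, and restarts.

```python
# Typical limits quoted in the Frequency section above.
MAX_THREADS = 5
SECONDS_PER_REQUEST = 5  # each thread makes at most one request per 5 seconds


def peak_requests_per_second(threads: int = MAX_THREADS,
                             interval: float = SECONDS_PER_REQUEST) -> float:
    """Upper bound on the combined request rate of all threads."""
    return threads / interval


def min_crawl_seconds(url_count: int,
                      threads: int = MAX_THREADS,
                      interval: float = SECONDS_PER_REQUEST) -> float:
    """Shortest possible crawl time for url_count URLs at the peak rate."""
    return url_count / peak_requests_per_second(threads, interval)


if __name__ == "__main__":
    print(peak_requests_per_second())       # 5 threads / 5 s = 1.0 request per second
    print(min_crawl_seconds(3600) / 3600)   # hours needed for 3600 URLs at the peak rate
```

Under these assumptions a site of 3600 URLs would take at least an hour to crawl, so a server should see no more than about 3600 Breeze‑bot requests in any hour.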
Modifying behaviour
Breeze‑bot obeys both robots.txt and robot-specific meta tags (see http://www.robotstxt.org). These are the standard mechanisms by which webmasters tell web robots which parts of a site they are welcome to access.
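To show how such rules might look, the sketch below holds a sample robots.txt in a string and checks it with Python's standard urllib.robotparser module. It is illustrative only: the user agent token breeze-bot, the example.com host, and the disallowed paths are assumptions, and the exact token that Breeze‑bot matches against is not stated here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keeps the crawler out of two areas and leaves
# every other robot unrestricted. Token and paths are assumptions.
ROBOTS_TXT = """\
User-agent: breeze-bot
Disallow: /private/
Disallow: /search

User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/index.html", "/private/report.html", "/search?q=test"):
        url = "http://www.example.com" + path
        print(path, is_allowed(ROBOTS_TXT, "breeze-bot", url))
```

Testing rules this way before publishing them helps confirm that only the intended parts of the site are excluded.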
