Breeze-bot

Breeze‑bot is an application designed for crawling URLs within a specified root domain; it is not a free-roaming crawler.

This document is written with webmasters in mind. It describes the purpose of Breeze‑bot and how to control its behaviour on a website.

Identification

Breeze‑bot can be identified by its user agent string, "breeze‑bot - http://www.getme.co.uk/breeze", and by the IP address 217.10.147.71.
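For webmasters who want to pick Breeze‑bot requests out of their logs or request handlers, the check below is a minimal sketch rather than an official tool. The function name is_breeze_bot is hypothetical, and the code assumes the user agent header contains the token "breeze-bot" with an ordinary hyphen; the exact bytes sent by the crawler are not documented here.

```python
# Values quoted in the Identification section of this document.
BREEZE_BOT_TOKEN = "breeze-bot"   # assumed ASCII form of the user agent token
BREEZE_BOT_IP = "217.10.147.71"


def is_breeze_bot(user_agent: str, remote_ip: str) -> bool:
    """Return True only if both the user agent and the IP address match."""
    # Normalise non-breaking hyphens (U+2011) to ASCII hyphens. This is an
    # assumption, made because the exact header bytes are not documented.
    normalised = user_agent.replace("\u2011", "-").lower()
    return BREEZE_BOT_TOKEN in normalised and remote_ip.strip() == BREEZE_BOT_IP
```

Requiring both values to match is a deliberate choice: a user agent string on its own is easy to spoof, so the published IP address acts as a second check.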

Purpose

Breeze‑bot is used to parse part or all of the content within a web domain for the purpose of analysing it and producing reports (see Atmos reports).

Unlike most crawlers, Breeze‑bot is restricted to parsing within a single domain. You are only likely to be visited by Breeze‑bot if you are an existing Getme client, or if you have requested a report.

Breeze‑bot may attempt to verify that URLs external to the root domain exist, but it will not attempt to parse or store their content.

Frequency

Breeze‑bot attempts to parse a domain in one go. It may be run as a one-off, or on a regular basis (typically monthly or quarterly), depending on the client's requirements for reports. Breeze is a threaded application but is typically limited to 5 threads, each of which can make a maximum of one request every 5 seconds.
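To put the typical limits into numbers, the following is a rough worked calculation, not a description of how Breeze‑bot is implemented. The function names are hypothetical, and the figures are only the typical values quoted above, so actual crawls may differ.

```python
# Typical limits quoted in the Frequency section; actual runs may differ.
THREADS = 5
SECONDS_BETWEEN_REQUESTS_PER_THREAD = 5


def max_requests_per_second(threads: int = THREADS,
                            interval: float = SECONDS_BETWEEN_REQUESTS_PER_THREAD) -> float:
    """Upper bound on the combined request rate across all threads."""
    return threads / interval


def min_crawl_seconds(url_count: int) -> float:
    """Shortest possible crawl time for url_count URLs at the peak rate."""
    return url_count / max_requests_per_second()


print(max_requests_per_second())   # 1.0 request per second at most
print(min_crawl_seconds(3600))     # 3600.0 seconds, i.e. at least one hour
```

In other words, at the typical settings a site should see no more than about one request per second from Breeze‑bot in total.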

Breeze‑bot is usually run during a website's period of lowest activity, and on large websites it may be stopped and restarted to help balance server load.

Modifying behaviour

Breeze‑bot obeys both robots.txt and robot-specific meta tags (see http://www.robotstxt.org). These are the standard mechanisms for webmasters to tell web robots which portions of a site they are welcome to access.
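As an illustration, the sketch below shows what a robots.txt rule aimed at Breeze‑bot might look like and checks it with Python's standard urllib.robotparser module. The user agent token "breeze-bot", the /private/ path and the example.com URLs are assumptions chosen for the example, and the helper name breeze_bot_may_fetch is hypothetical. The result shows how a standards-following parser reads the rules; it does not guarantee how Breeze‑bot itself interprets them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep Breeze-bot out of /private/, allow all else.
ROBOTS_TXT = """\
User-agent: breeze-bot
Disallow: /private/

User-agent: *
Disallow:
"""


def breeze_bot_may_fetch(robots_txt: str, url: str,
                         agent: str = "breeze-bot") -> bool:
    """Check whether the given agent may fetch url under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


print(breeze_bot_may_fetch(ROBOTS_TXT, "http://example.com/private/report.html"))  # False
print(breeze_bot_may_fetch(ROBOTS_TXT, "http://example.com/index.html"))           # True
```

Running a check like this before publishing a robots.txt change is a quick way to confirm that a rule blocks only the paths intended.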


Getting in touch

Getme Ltd, P.O. Box 765, Great Witley, Worcs. (UK) WR6 6TU

t +44 (0) 1905 670032
f +44 (0) 871 5594457

enquires@getme.co.uk