Events and callbacks

Péter Bencze edited this page Jun 16, 2018 · 6 revisions

How the framework handles events

When an event occurs, the framework calls the corresponding callback to handle it. By default, each callback simply logs the event. By overriding the callbacks, you define the logic of the crawler.
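The override mechanism can be illustrated with a minimal, framework-independent sketch. All names here (`BaseCrawler`, `TitleCrawler`, `on_page_load`, `on_request_error`) are hypothetical and are not the framework's actual API; the sketch only shows a base class whose callbacks log by default, and a subclass that replaces one of them with its own logic.

```python
import logging

logger = logging.getLogger("crawler")


class BaseCrawler:
    """Hypothetical base class: every callback only logs the event by default."""

    def on_page_load(self, url, html):
        logger.info("Page loaded: %s", url)

    def on_request_error(self, url):
        logger.info("Request error: %s", url)


class TitleCrawler(BaseCrawler):
    """Overrides one callback; the other keeps the default logging behavior."""

    def __init__(self):
        self.titles = {}

    def on_page_load(self, url, html):
        # Custom logic: extract the text between <title> and </title>.
        start = html.find("<title>")
        end = html.find("</title>")
        if start != -1 and end > start:
            self.titles[url] = html[start + len("<title>"):end].strip()


crawler = TitleCrawler()
crawler.on_page_load("https://example.com", "<html><title> Example </title></html>")
crawler.on_request_error("https://example.com/missing")  # falls back to logging
```

Callbacks that are not overridden keep their default behavior, so a crawler only needs to implement the events it cares about.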

The following callbacks are available:

Callback which gets called when the crawler is started.

Typically used to perform initialization (create handles/connections) before the crawling starts.

Callback which gets called when the browser loads the page.

Typically used to find URLs to follow and extract specific data from the HTML source.

Callback which gets called when the page does not load in the browser within the timeout period.

Callback which gets called when the content type is not HTML.

Gives you the opportunity to download the non-HTML resource directly.

Callback which gets called when a request is redirected.

Callback which gets called when a request error occurs.

Callback which gets called when the crawler is stopped.

Typically used to perform resource cleanup (close handles/connections) after the run.
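The lifecycle described by the callbacks above can be sketched as a toy dispatcher. This is an illustration under stated assumptions, not the framework's implementation: the class, the method names, the outcome labels, and the `run` function are all hypothetical. The sketch also assumes that the stop callback runs even if a handler raises.

```python
class RecordingCrawler:
    """Hypothetical crawler with one callback per event described above."""

    def __init__(self):
        self.events = []
        self.output = None

    def on_start(self):
        self.output = []  # stand-in for opening a file handle or connection
        self.events.append("start")

    def on_page_load(self, url):
        self.output.append(url)  # e.g. store data extracted from the page
        self.events.append("page_load")

    def on_page_load_timeout(self, url):
        self.events.append("page_load_timeout")

    def on_non_html_response(self, url):
        self.events.append("non_html_response")

    def on_request_redirect(self, url):
        self.events.append("request_redirect")

    def on_request_error(self, url):
        self.events.append("request_error")

    def on_stop(self):
        self.output = None  # stand-in for closing the handle or connection
        self.events.append("stop")


def run(crawler, outcomes):
    """Toy dispatcher: calls exactly one callback per (url, outcome) pair."""
    handlers = {
        "html": crawler.on_page_load,
        "timeout": crawler.on_page_load_timeout,
        "non_html": crawler.on_non_html_response,
        "redirect": crawler.on_request_redirect,
        "error": crawler.on_request_error,
    }
    crawler.on_start()
    try:
        for url, outcome in outcomes:
            handlers[outcome](url)
    finally:
        crawler.on_stop()  # assumption of this sketch: cleanup always runs


crawler = RecordingCrawler()
run(crawler, [
    ("https://example.com/", "html"),
    ("https://example.com/report.pdf", "non_html"),
    ("https://example.com/slow", "timeout"),
])
```

Initialization and cleanup bracket the per-request callbacks, which is why handles and connections are typically opened in the start callback and closed in the stop callback.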