Crawler
- The crawler crawls "random" elements from the DDB
- Random elements are chosen by searching for "*" and applying a random result offset and limit
- The public domain status is calculated for these random items, and statistics about outcomes and errors are collected
- All exceptions and results are logged to the console by the crawler
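The random selection described above can be sketched as follows. This is a minimal illustration, not the pdc implementation: it assumes the DDB search for "*" reports a total result count, from which a random offset for a window of `limit` items is drawn. All names here are hypothetical.

```java
import java.util.Random;

// Sketch of drawing "random" items from a "*" search: pick a random
// offset into the total result count so that a window of `limit`
// items still fits inside the result set. Illustrative only.
public class RandomOffsetSketch {
    public static int randomOffset(int totalResults, int limit, Random rng) {
        // Keep the fetched window fully inside the result set.
        int maxOffset = Math.max(0, totalResults - limit);
        return rng.nextInt(maxOffset + 1);
    }

    public static void main(String[] args) {
        Random rng = new Random();
        int offset = randomOffset(1_000_000, 10, rng);
        System.out.println("search=* offset=" + offset + " limit=10");
    }
}
```

With the offset and limit chosen this way, repeated requests sample different slices of the full "*" result set.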
To run the crawler, start the server application with the command line parameter -c or --crawler:
java -jar pdc-0.1-SNAPSHOT.jar -c
or java -jar pdc-0.1-SNAPSHOT.jar --crawler
The crawler will start a search for the search term * to fetch items from the DDB-API.
Additional parameters:
--maxDepth 50
will run the crawler for 50 DDB items and then stop. Defaults to 1000 items. Alternative: --depth
--fetchSize 10
will fetch 10 results in a single request to the DDB-API and calculate their public domain status, then fetch the next 10 results, and so on, until maxDepth is reached. Larger values speed up the crawler but put more load on the DDB-API web service. Defaults to 100
--timeout 1000
the pause in milliseconds between the calculations of individual DDB items
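Taken together, the three parameters imply a simple batching loop: requests of fetchSize items are issued until maxDepth items have been processed, with a timeout pause between items. The sketch below only illustrates that arithmetic; the method name is hypothetical and not part of pdc.

```java
// Sketch of the crawl budget implied by --maxDepth and --fetchSize:
// how many requests are needed to cover maxDepth items in batches
// of fetchSize (ceiling division). Illustrative only.
public class CrawlLoopSketch {
    public static int plannedBatches(int maxDepth, int fetchSize) {
        return (maxDepth + fetchSize - 1) / fetchSize;
    }

    public static void main(String[] args) {
        int maxDepth = 50, fetchSize = 10;
        int batches = plannedBatches(maxDepth, fetchSize);
        System.out.println(batches + " requests of " + fetchSize + " items each");
        // Between individual item calculations the crawler would pause
        // for the configured --timeout, e.g. Thread.sleep(1000).
    }
}
```

For example, a run with all flags set might look like: java -jar pdc-0.1-SNAPSHOT.jar -c --maxDepth 50 --fetchSize 10 --timeout 1000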