The application will follow every link of each visited page, up to the specified link depth limit.
It will then return the list of URLs matching the search query, ordered by relevance.
Pages are visited by integrating JPPF with the Smart and Simple Web Crawler project.
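Conceptually, the crawl is a depth-limited traversal that only follows links pointing back to the start URL's server. The following is a minimal, single-threaded sketch of that idea in plain Java; it is only a stand-in for what the Smart and Simple Web Crawler does in the sample (real HTML parsing and filtering are more involved), and the class and method names are made up for the illustration.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified, single-threaded stand-in for the crawling step of this sample.
public class SimpleCrawl {

  private static final Pattern HREF = Pattern.compile("href=\"(http[^\"]+)\"");

  // Follows links starting from startUrl, staying on the same server, up to maxDepth levels.
  public static Set<String> crawl(String startUrl, int maxDepth) throws Exception {
    Set<String> visited = new HashSet<String>();
    String host = new URL(startUrl).getHost();
    visit(startUrl, host, maxDepth, visited);
    return visited;
  }

  private static void visit(String url, String host, int depth, Set<String> visited) {
    if (depth < 0 || visited.contains(url)) return;
    try {
      URL u = new URL(url);
      if (!host.equals(u.getHost())) return; // same-server filter, as in the sample
      visited.add(url);
      StringBuilder page = new StringBuilder();
      try (BufferedReader in = new BufferedReader(
          new InputStreamReader(u.openStream(), StandardCharsets.UTF_8))) {
        String line;
        while ((line = in.readLine()) != null) page.append(line).append('\n');
      }
      Matcher m = HREF.matcher(page);
      while (m.find()) visit(m.group(1), host, depth - 1, visited); // follow one level deeper
    } catch (Exception e) {
      // ignore unreachable or non-HTML pages in this sketch
    }
  }
}

In the sample itself, this per-level link gathering is what gets packaged into JPPF tasks (see the workflow description further down).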
The search and page indexing are implemented with the Lucene project from the Apache Software Foundation.

Before running this sample application, you must have a JPPF server and at least one node running.
For information on how to set up a node and server, please refer to the JPPF documentation.
Once you have a server and node, from a command prompt, type: "ant run"
The GUI is split into two main parts: the search parameters at the top and the search results at the bottom.
The "Compute" button submits the web search for processing by JPPF.
The "Reset defaults" button restores the start URL, search query and link depth to their original values.
The web search relies on 3 parameters:
- start URL: this is the URL of the web page from which links will be followed recursively. To limit the scope (and length) of the search, a filter is set so that only links to the same server are followed.
- search query: this is what to search for in the visited pages, interpreted as Lucene query syntax (see the sketch after this list).
- search depth: this parameter also limits the scope of the search, by restricting the depth of the chains of links that can be followed. For example, if the depth is set to 1, only the links found on the start page will be followed; if it is set to 2, the links found in the pages that the start page links to will also be followed.
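Because the search query is interpreted as Lucene query syntax, it can contain terms, phrases and operators such as AND, OR and wildcards. The sketch below shows what the index-and-search step looks like in isolation, assuming a recent Lucene release (the demo itself targets the Lucene version it ships with); the field names "url" and "contents" and the example documents are made up for the illustration.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

// Indexes page contents in memory and runs a Lucene-syntax query against them.
public class PageSearch {
  public static void main(String[] args) throws Exception {
    StandardAnalyzer analyzer = new StandardAnalyzer();
    Directory dir = new ByteBuffersDirectory();

    // Index two fake "pages"; in the sample, the text comes from the crawled URLs.
    try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
      writer.addDocument(page("http://example.com/a", "JPPF distributes work to grid nodes"));
      writer.addDocument(page("http://example.com/b", "Lucene provides indexing and search"));
    }

    // "grid AND nodes" is plain Lucene query syntax, like the GUI's search query field.
    try (DirectoryReader reader = DirectoryReader.open(dir)) {
      IndexSearcher searcher = new IndexSearcher(reader);
      ScoreDoc[] hits = searcher.search(
          new QueryParser("contents", analyzer).parse("grid AND nodes"), 10).scoreDocs;
      for (ScoreDoc hit : hits) {
        Document doc = searcher.doc(hit.doc);
        System.out.println(hit.score + "  " + doc.get("url")); // URLs ordered by relevance
      }
    }
  }

  private static Document page(String url, String contents) {
    Document doc = new Document();
    doc.add(new StringField("url", url, Store.YES));
    doc.add(new TextField("contents", contents, Store.YES));
    return doc;
  }
}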
- Integration with other Java-based open source projects.
Among other things, you will notice that the 3rd-party libraries (including Lucene and Smart and Simple Web Crawler) remain in the classpath of the client application.
- Using JPPF as part of a workflow.
In this sample the search is realized through multiple invocations of JPPF (outlined in the code sketch after this list):
- once for every search depth level, to perform the link navigation and gather the links for the next level
- once for the actual indexing and searching of the pages, once all links have been gathered
- Integration with a graphical user interface.
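The sketch below outlines that two-phase workflow: one blocking JPPF job per depth level, then a final job for indexing and searching. It assumes the older JPPF client API, where tasks extend org.jppf.server.protocol.JPPFTask and JPPFClient.submit() blocks until the job completes (newer JPPF releases use a different task API); CrawlLevelTask and IndexAndSearchTask are invented placeholders, not the demo's actual task classes.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.jppf.client.JPPFClient;
import org.jppf.client.JPPFJob;
import org.jppf.server.protocol.JPPFTask;

// Outline of the two-phase workflow: one JPPF job per depth level, then one job for the search.
public class SearchWorkflow {

  // Hypothetical task: visits one page and returns the links found on it.
  public static class CrawlLevelTask extends JPPFTask {
    private final String url;
    public CrawlLevelTask(String url) { this.url = url; }
    @Override public void run() {
      // In the demo, this is where the web crawler visits the page and collects its links.
      setResult(Collections.<String>emptyList());
    }
  }

  // Hypothetical task: indexes the gathered pages with Lucene and runs the query.
  public static class IndexAndSearchTask extends JPPFTask {
    private final List<String> urls;
    private final String query;
    public IndexAndSearchTask(List<String> urls, String query) { this.urls = urls; this.query = query; }
    @Override public void run() {
      // In the demo, this is where Lucene indexes the pages and evaluates the query.
      setResult(Collections.<String>emptyList());
    }
  }

  public static void main(String[] args) throws Exception {
    String startUrl = "http://www.jppf.org"; // start URL
    String query = "grid AND computing";     // search query (Lucene syntax)
    int maxDepth = 2;                        // search depth

    JPPFClient client = new JPPFClient();
    try {
      // Phase 1: one JPPF invocation per depth level, gathering the links for the next level.
      List<String> toVisit = new ArrayList<>();
      toVisit.add(startUrl);
      List<String> allPages = new ArrayList<>(toVisit);
      for (int depth = 0; depth < maxDepth; depth++) {
        JPPFJob job = new JPPFJob();
        for (String url : toVisit) job.addTask(new CrawlLevelTask(url));
        List<JPPFTask> results = client.submit(job); // blocking submit
        toVisit = new ArrayList<>();
        for (JPPFTask task : results) {
          @SuppressWarnings("unchecked")
          List<String> links = (List<String>) task.getResult();
          toVisit.addAll(links);
          allPages.addAll(links);
        }
      }

      // Phase 2: a single JPPF invocation that indexes all gathered pages and runs the query.
      JPPFJob searchJob = new JPPFJob();
      searchJob.addTask(new IndexAndSearchTask(allPages, query));
      List<JPPFTask> searchResults = client.submit(searchJob);
      System.out.println("matching URLs: " + searchResults.get(0).getResult());
    } finally {
      client.close();
    }
  }
}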
To generate the Javadoc, from a command prompt, type: "ant javadoc"
If you need more insight into the code of this demo, you can consult the source, or have a look at the API documentation.
In addition, there are 2 places you can go to: