Search Engine demo

What does the sample do?

This sample application acts as a web search engine: it takes a search query and performs the search starting from a specified URL.
The application follows every link of each visited page, up to the specified link depth limit.
It then returns the list of URLs matching the search query, ordered by relevance.
The pages are visited by integrating JPPF with the Smart and Simple Web Crawler project.
Page indexing and searching are implemented with the Lucene project from the Apache Software Foundation.
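
For illustration, here is a minimal, JDK-only sketch of that crawl strategy: breadth-first link following, restricted to the start server and limited to a maximum link depth. This is not the demo's code (the demo delegates crawling to the Smart and Simple Web Crawler); error handling is omitted and the start URL is just an example.

import java.net.URI;
import java.util.*;
import java.util.regex.*;

// Breadth-first crawl sketch: follow links level by level, up to a depth limit,
// keeping only links that point to the same server as the start URL.
public class CrawlStrategySketch {
  private static final Pattern HREF = Pattern.compile("href=\"([^\"]+)\"");

  public static void main(String[] args) throws Exception {
    URI start = new URI("http://www.jppf.org/");
    int maxDepth = 2;
    Set<URI> visited = new LinkedHashSet<>();
    List<URI> currentLevel = Collections.singletonList(start);
    for (int depth = 0; depth <= maxDepth && !currentLevel.isEmpty(); depth++) {
      List<URI> nextLevel = new ArrayList<>();
      for (URI page : currentLevel) {
        if (!visited.add(page)) continue;           // skip pages already seen
        Scanner in = new Scanner(page.toURL().openStream(), "UTF-8").useDelimiter("\\A");
        String html = in.hasNext() ? in.next() : "";
        in.close();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
          URI link = page.resolve(m.group(1));
          // same-server filter: only follow links hosted on the start URL's server
          if (start.getHost().equals(link.getHost())) nextLevel.add(link);
        }
      }
      currentLevel = nextLevel;                     // move on to the next depth level
    }
    visited.forEach(System.out::println);
  }
}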

How do I run it?

Before running this sample application, you must have a JPPF server and at least one node running.
For information on how to set up a node and server, please refer to the JPPF documentation.
Once you have a server and node, from a command prompt, type: "ant run"

How do I use it?

The GUI is split into two main parts: the search parameters at the top and the search results at the bottom.

The "Compute" button submits the web search for processing by JPPF.

The "Reset defaults" button restores the start url, search query and link depth to their original values

The web search relies on three parameters:

  • start URL: this is the URL of the web page from which links will be followed recursively. To limit the scope (and duration) of the search, a filter ensures that only links to the same server are followed
  • search query: this is what to search for on the visited pages; it is interpreted using the Lucene query syntax (see the example below)
  • search depth: this parameter also limits the scope of the search, by restricting the depth of the chains of links that can be followed. For example, if the depth is set to 1, only the links found on the start page will be followed; if it is set to 2, the links found in the pages referenced by the start page's links will also be followed
When the search is over, the results are displayed in the search results panel, in descending order of relevance.
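
To make the search query and the relevance ordering more concrete, here is a minimal Lucene sketch that indexes a single page and runs a query against it, printing the matches by descending score. It assumes a recent Lucene release (the Lucene version bundled with this demo may expose slightly different APIs), and the field names, URL and page text below are placeholders rather than the demo's actual index structure.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.*;
import org.apache.lucene.index.*;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.*;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class QuerySketch {
  public static void main(String[] args) throws Exception {
    StandardAnalyzer analyzer = new StandardAnalyzer();
    Directory dir = new ByteBuffersDirectory();     // in-memory index

    // index one visited page: store its URL, analyze its text content
    try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
      Document doc = new Document();
      doc.add(new StringField("url", "http://www.jppf.org/doc/", Field.Store.YES));
      doc.add(new TextField("contents", "JPPF makes it easy to parallelize computing ...", Field.Store.NO));
      writer.addDocument(doc);
    }

    // parse a query expressed in the Lucene query syntax, then list matches by descending relevance
    Query query = new QueryParser("contents", analyzer).parse("parallelize AND computing");
    try (DirectoryReader reader = DirectoryReader.open(dir)) {
      IndexSearcher searcher = new IndexSearcher(reader);
      for (ScoreDoc hit : searcher.search(query, 10).scoreDocs) {
        System.out.println(searcher.doc(hit.doc).get("url") + "  (score " + hit.score + ")");
      }
    }
  }
}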

What integration features of JPPF are demonstrated?

  • Integration with other Java-based open source projects.
    Among other things, you will notice that the 3rd-party libraries (including Lucene and Smart and Simple Web Crawler) remain in the classpath of the client application.
  • Using JPPF as part of a workflow.
    In this sample, the search is realized through multiple invocations of JPPF (see the sketch after this list):
    • once for every search depth level, to perform the link navigation and gather the links for the next level
    • once for the actual indexing and searching of the pages, after all links have been gathered
    This design explains why the search can take a long time: contrary to search engines such as Yahoo or Google, the indexes are recomputed during the search.
  • Integration with a graphical user interface.
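
As a sketch of that two-phase workflow, the outline below submits one JPPF job per depth level to gather links, then a final job to index and search the pages. It assumes a JPPF 5.x-style client API; LinkCrawlTask and IndexSearchTask are hypothetical stand-ins for the demo's real task classes (their run() bodies are stubbed out here), so consult the sample's source for the actual implementation.

import java.util.*;
import org.jppf.client.JPPFClient;
import org.jppf.client.JPPFJob;
import org.jppf.node.protocol.AbstractTask;
import org.jppf.node.protocol.Task;

public class SearchWorkflowSketch {

  // hypothetical stand-in for the demo's crawling task: visit one page, return its links
  public static class LinkCrawlTask extends AbstractTask<Collection<String>> {
    private final String url;
    public LinkCrawlTask(String url) { this.url = url; }
    @Override public void run() {
      // the real demo uses the Smart and Simple Web Crawler to visit this.url here
      setResult(Collections.<String>emptyList());
    }
  }

  // hypothetical stand-in for the demo's indexing/search task: index the pages, run the query
  public static class IndexSearchTask extends AbstractTask<List<String>> {
    private final Collection<String> urls;
    private final String query;
    public IndexSearchTask(Collection<String> urls, String query) { this.urls = urls; this.query = query; }
    @Override public void run() {
      // the real demo builds a Lucene index of the pages and searches it here
      setResult(new ArrayList<String>());
    }
  }

  @SuppressWarnings("unchecked")
  public static void main(String[] args) throws Exception {
    String startUrl = "http://www.jppf.org/", query = "grid AND computing";
    int maxDepth = 2;
    JPPFClient client = new JPPFClient();
    try {
      Set<String> currentLevel = new HashSet<>(Arrays.asList(startUrl));
      Set<String> allUrls = new HashSet<>(currentLevel);
      // phase 1: one JPPF job per depth level, gathering the links to follow at the next level
      for (int depth = 1; depth <= maxDepth; depth++) {
        JPPFJob job = new JPPFJob();
        job.setName("link gathering, level " + depth);
        for (String url : currentLevel) job.add(new LinkCrawlTask(url));
        Set<String> nextLevel = new HashSet<>();
        for (Task<?> task : client.submitJob(job)) nextLevel.addAll((Collection<String>) task.getResult());
        nextLevel.removeAll(allUrls);
        allUrls.addAll(nextLevel);
        currentLevel = nextLevel;
      }
      // phase 2: a single job that indexes all gathered pages and runs the search query
      JPPFJob searchJob = new JPPFJob();
      searchJob.setName("index and search");
      searchJob.add(new IndexSearchTask(allUrls, query));
      for (Task<?> task : client.submitJob(searchJob)) System.out.println(task.getResult());
    } finally {
      client.close();
    }
  }
}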

How can I build the sample?

To compile the source code, from a command prompt, type: "ant compile"
To generate the Javadoc, from a command prompt, type: "ant javadoc"

I have additional questions and comments, where can I go?

If you need more insight into the code of this demo, you can consult the source, or have a look at the API documentation.

In addition, there are two preferred places you can go to: