
Development

Giacomo Stelluti Scala edited this page Jan 12, 2020 · 42 revisions

Manifesto

As the project motto says, PickAll is both modular and extensible: you can develop services in the form of searchers or post processors.

Prerequisites

To develop a PickAll service you can use any .NET language.

Searchers are meant to scrape the web, which is why they have access to an AngleSharp IBrowsingContext through the ActiveContext property. ActiveContext belongs to the SearchContext bound to your service, which in turn is available through the service's Context property (in code: Context.ActiveContext).

A basic knowledge of AngleSharp is required, including the fundamentals of the HTML DOM and, ideally, CSS selectors.

This blog post of mine on AngleSharp may be of some help.
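If AngleSharp is new to you, the following standalone sketch shows the two building blocks a searcher relies on: a browsing context that yields a document, and CSS selectors to query it. It assumes the AngleSharp NuGet package (version 0.10 or later, where the HTML interfaces live in AngleSharp.Html.Dom). The markup and the AngleSharpPrimer/ListLinksAsync names are made up for illustration; a real searcher would use Context.ActiveContext and a live URL instead.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;
using AngleSharp;          // BrowsingContext, Configuration, OpenAsync extensions
using AngleSharp.Dom;      // QuerySelectorAll<T> extension
using AngleSharp.Html.Dom; // IHtmlAnchorElement

public static class AngleSharpPrimer
{
    // Made-up markup standing in for a search results page.
    private const string Html =
        "<ul>" +
        "<li><a href='https://example.org/a'>First</a></li>" +
        "<li><a href='/settings'>Settings</a></li>" +
        "</ul>";

    public static async Task<string[]> ListLinksAsync()
    {
        // The default configuration has no loader; that is fine here because the
        // response content is supplied directly instead of being downloaded.
        var context = BrowsingContext.New(Configuration.Default);
        using (var document = await context.OpenAsync(req => req.Content(Html)))
        {
            // The CSS selector "li a" matches anchors nested inside list items.
            return document.QuerySelectorAll<IHtmlAnchorElement>("li a")
                .Select(a => $"{a.GetAttribute("href")} ({a.TextContent})")
                .ToArray();
        }
    }

    public static async Task Main()
    {
        foreach (var line in await ListLinksAsync())
            Console.WriteLine(line);
    }
}
```

Supplying the content in memory keeps experiments fast and repeatable while you work out the right selectors for a page.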

Searcher

To be a PickAll searcher, a type must inherit from the Searcher base class. The library comes with built-in searchers that you can use as examples.

A searcher must override the async Task<IEnumerable<ResultInfo>> SearchAsync(string) method and provide a constructor that accepts a System.Object parameter. The latter lets the searcher receive an instance of its configuration settings.

Here is an example:

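using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using AngleSharp;          // OpenAsync extension on IBrowsingContext
using AngleSharp.Dom;      // QuerySelector / QuerySelectorAll extensions
using AngleSharp.Html.Dom; // IHtmlFormElement, IHtmlInputElement, IHtmlAnchorElement
using PickAll;             // Searcher, ResultInfo (namespace assumed)

public class MySearcher : Searcher
{
    public MySearcher(object settings) : base(settings)
    {
    }

    public override async Task<IEnumerable<ResultInfo>> SearchAsync(string query)
    {
        // Load the (fake) search engine home page through the shared browsing context.
        using (var document = await Context.ActiveContext.OpenAsync("https://www.somewhere.com/")) {
            var form = document.QuerySelector<IHtmlFormElement>("#some_form");
            ((IHtmlInputElement)form["query_field"]).Value = query;
            using (var result = await form.SubmitAsync(form)) {
                // Keep only absolute http(s) links; anchors without href are skipped.
                var links = from link in result.QuerySelectorAll<IHtmlAnchorElement>("li a")
                            let href = link.GetAttribute("href")
                            where href != null && href.StartsWith(
                                "http",
                                StringComparison.OrdinalIgnoreCase)
                            select link;
                // ToArray() materializes the results before the documents are disposed.
                return links.Select((link, index) =>
                    CreateResult((ushort)index, link.GetAttribute("href"), link.Text)).ToArray();
            }
        }
    }
}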

In this example we scrape the fake Somewhere search engine, injecting the query into some_form through query_field. Then we keep only the URLs that start with the HTTP scheme, discarding relative links. This is essentially what the Bing searcher does to avoid returning UI links.
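The filtering and ranking step can be tried without a network connection. The sketch below is a stand-in under stated assumptions: it replaces AngleSharp anchors with plain (Href, Text) tuples and uses a hypothetical RankedLink type in place of whatever CreateResult returns. It keeps only links whose href starts with "http" (which also matches "https") and numbers the survivors from zero, as the Select((link, index) => ...) overload does.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the values passed to CreateResult.
public sealed class RankedLink
{
    public RankedLink(ushort index, string url, string description)
    {
        Index = index;
        Url = url;
        Description = description;
    }

    public ushort Index { get; }
    public string Url { get; }
    public string Description { get; }
}

public static class LinkFilter
{
    public static RankedLink[] Rank(IEnumerable<(string Href, string Text)> anchors)
    {
        return anchors
            // Relative links (e.g. "/settings") usually point to the engine's own UI.
            .Where(a => a.Href != null &&
                        a.Href.StartsWith("http", StringComparison.OrdinalIgnoreCase))
            // The index is assigned after filtering, so ranks have no gaps.
            .Select((a, index) => new RankedLink((ushort)index, a.Href, a.Text))
            // Materialize now, in case the source is disposed afterwards.
            .ToArray();
    }

    public static void Main()
    {
        var anchors = new[]
        {
            ("https://example.org/a", "First"),
            ("/settings", "Settings"),
            ("HTTP://example.org/b", "Second"),
        };
        foreach (var link in Rank(anchors))
            Console.WriteLine($"{link.Index}: {link.Url} {link.Description}");
    }
}
```

Because indexes are assigned after the Where clause, ranks stay contiguous even when UI links are dropped.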

Settings Type

Although the PickAll kernel can pass any object as settings, it's recommended to follow the same design as the built-in types.

  • Source
    • As with the Data Type (discussed later), the type should be defined in the same source file as the service.
  • Name
    • The name of the service followed by the Settings suffix. Following the previous example, it will be MySearcherSettings.
  • Value Type
    • Should be defined as a struct or, at least, as a sealed class.
  • Content
    • May contain all the auto-implemented properties you need. Here is an example of the declaration and of how the settings should be accessed within your searcher service:
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using PickAll; // Searcher, ResultInfo (namespace assumed)

public struct MySearcherSettings
{
    public bool DeepSearch { get; set; } // whatever it means
}

public class MySearcher : Searcher
{
    private readonly MySearcherSettings _settings;

    public MySearcher(object settings) : base(settings)
    {
        // Reject anything that is not the expected settings type.
        if (!(settings is MySearcherSettings)) {
            throw new NotSupportedException(
                $"{nameof(settings)} must be of {nameof(MySearcherSettings)} type");
        }
        _settings = (MySearcherSettings)Settings;
    }

    public override async Task<IEnumerable<ResultInfo>> SearchAsync(string query)
    {
        var results = new List<ResultInfo>();
        if (_settings.DeepSearch) {
            // Perform deep search, adding to results
            // ...
        }
        else {
            // Perform normal search, adding to results
            // ...
        }
        return results;
    }
}

Please follow this design; if you prefer a different approach, choose something meaningful. For example, don't require callers to pass a bare bool (the value of the deep search setting): a dictionary, a dynamic, or an anonymous type are all better alternatives.
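If you opt for a dictionary rather than a dedicated settings struct, the constructor can still validate what it receives. The standalone sketch below is hypothetical (DictionarySettingsReader, ReadFlag and the "DeepSearch" key are not part of PickAll) and shows how an object parameter can be checked and a named flag read with a default value.

```csharp
using System;
using System.Collections.Generic;

public static class DictionarySettingsReader
{
    // Reads a bool setting from an object expected to be a string-keyed dictionary.
    // Returns defaultValue when the key is missing.
    public static bool ReadFlag(object settings, string key, bool defaultValue = false)
    {
        if (!(settings is IDictionary<string, object> dictionary))
            throw new NotSupportedException(
                $"{nameof(settings)} must be an IDictionary<string, object>");

        if (!dictionary.TryGetValue(key, out var value))
            return defaultValue;

        if (value is bool flag)
            return flag;

        throw new ArgumentException($"Setting '{key}' must be a bool.", nameof(settings));
    }

    public static void Main()
    {
        object settings = new Dictionary<string, object> { ["DeepSearch"] = true };
        Console.WriteLine(ReadFlag(settings, "DeepSearch")); // True
        Console.WriteLine(ReadFlag(settings, "SafeSearch")); // False (default)
    }
}
```

Unlike a bare bool, the key names the setting, so new options can be added later without changing the constructor signature.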

Data Type

A PickAll Data Type is used to attach additional data to each result. It's normally produced by a post processor, but nothing forbids producing additional data directly in a searcher service. Its main purpose is to encapsulate the data attached to a result, avoiding the bad practice of populating ResultInfo.Data directly.

  • Source
    • As with the Settings Type, the type should be defined in the same source file as the service.
  • Name
    • The name of the service followed by the Data suffix. Following the previous example, it will be MySearcherData.
  • Value Type
    • Should be defined as a struct or, at least, as a sealed class.
  • Content
    • May contain all the auto-implemented properties you need, but normally one is sufficient.
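Following these conventions, a Data Type for the running example might look like the minimal sketch below. The PageNumber property is hypothetical, chosen only to show a single auto-implemented property on a struct; how the instance is then attached to a result depends on the PickAll API and is not shown here.

```csharp
using System;

// Hypothetical Data Type for the MySearcher example: service name + "Data" suffix,
// declared as a (readonly) struct with a single auto-implemented property.
public readonly struct MySearcherData
{
    public MySearcherData(int pageNumber)
    {
        PageNumber = pageNumber;
    }

    // Hypothetical: the results page on which the link was found.
    public int PageNumber { get; }
}

public static class DataTypeDemo
{
    public static void Main()
    {
        var data = new MySearcherData(2);
        Console.WriteLine(data.PageNumber); // 2
    }
}
```

Declaring the struct as readonly is an optional refinement: it makes the data immutable once created, which suits values that describe an already scraped result.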