Development
As the project motto says, PickAll is both modular and extensible: you can develop services in the form of searchers or post processors.
To develop a PickAll service you can use any .NET language.
Searchers are meant to scrape the web, which is why you have access to the IBrowsingContext from the AngleSharp library via the ActiveContext property (a property of the SearchContext bound to your service, which is in turn available via the Context property).
A minimal knowledge of AngleSharp is required (including basics of HTML DOM and possibly CSS Selectors).
My blog post on AngleSharp may be of some help.
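For readers new to AngleSharp, the following is a minimal sketch of the DOM and CSS selector basics mentioned above, independent of PickAll. The LinkScraper class and ExtractLinksAsync method are hypothetical names introduced only for illustration, and the snippet assumes the AngleSharp package is referenced.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using AngleSharp;

// Hypothetical helper, not part of PickAll or AngleSharp
public static class LinkScraper
{
    // Parses an HTML string and returns the href of every anchor
    // inside a list item whose value starts with "http".
    public static async Task<List<string>> ExtractLinksAsync(string html)
    {
        var context = BrowsingContext.New(Configuration.Default);
        // Load the document from an in-memory string instead of the network
        using (var document = await context.OpenAsync(request => request.Content(html)))
        {
            return document.QuerySelectorAll("li a")          // CSS selector
                .Select(anchor => anchor.GetAttribute("href")) // raw attribute value
                .Where(href => href != null &&
                    href.StartsWith("http", StringComparison.OrdinalIgnoreCase))
                .ToList(); // materialize before the document is disposed
        }
    }
}
```

Inside a searcher you would not create a browsing context yourself, since one is already exposed through Context.ActiveContext.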
To be a PickAll searcher, a type must inherit from the Searcher base class. The library comes with built-in searchers, which you can use as examples.
A searcher must implement the Task<IEnumerable<ResultInfo>> SearchAsync(string) method, marking it async, and supply a constructor that accepts a System.Object parameter. The latter allows the searcher to consume an instance of configuration settings.
An example follows:
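// Namespaces assume AngleSharp 0.10 or later
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using AngleSharp;
using AngleSharp.Dom;
using AngleSharp.Html.Dom;
using PickAll; // Searcher and ResultInfo (namespace assumed)

public class MySearcher : Searcher
{
    public MySearcher(object settings) : base(settings)
    {
    }

    public override async Task<IEnumerable<ResultInfo>> SearchAsync(string query)
    {
        // Open the search page through the browsing context bound to this service
        using (var document = await Context.ActiveContext.OpenAsync("https://www.somewhere.com/")) {
            // Fill in the search form and submit it
            var form = document.QuerySelector<IHtmlFormElement>("#some_form");
            ((IHtmlInputElement)form["query_field"]).Value = query;
            using (var result = await form.SubmitAsync(form)) {
                // Keep only anchors whose href starts with "http"
                var links = from link in result.QuerySelectorAll<IHtmlAnchorElement>("li a")
                            where link.Attributes["href"].Value.StartsWith(
                                "http",
                                StringComparison.OrdinalIgnoreCase)
                            select link;
                // Materialize the results before the document is disposed
                return links.Select((link, index) =>
                    CreateResult((ushort)index, link.Attributes["href"].Value, link.Text)).ToList();
            }
        }
    }
}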
In this example we scrape the fictional Somewhere search engine, injecting a query into some_form via query_field. Then we keep only the URLs that start with the HTTP scheme, which excludes relative links. This is essentially what the Bing searcher does to avoid returning UI links.
A PickAll Settings Type carries the configuration of a service. Although the PickAll kernel can pass any object to your constructor, it's recommended to follow the same design as the built-in types.
- Source
  - As with the Data Type (discussed later), the class should be defined in the same source file as the service.
- Name
  - The name of the service with the Settings suffix. Following the previous example, it will be MySearcherSettings.
- Value Type
  - Should be defined as a struct or at least as a sealed class.
- Content
  - May contain all the automatic properties you need. The following example shows a declaration and how it should be accessed within your searcher service:
public struct MySearcherSettings
{
    public bool DeepSearch { get; set; } // whatever it means
}

public class MySearcher : Searcher
{
    private readonly MySearcherSettings _settings;

    public MySearcher(object settings) : base(settings)
    {
        // Reject anything that is not the expected settings type
        if (!(settings is MySearcherSettings)) {
            throw new NotSupportedException(
                $"{nameof(settings)} must be of {nameof(MySearcherSettings)} type");
        }
        _settings = (MySearcherSettings)Settings;
    }

    public override async Task<IEnumerable<ResultInfo>> SearchAsync(string query)
    {
        if (_settings.DeepSearch) {
            // Perform deep search
            // ...
        }
        else {
            // Perform normal search
            // ...
        }
        // Placeholder so the sketch compiles; return the scraped results here
        return Enumerable.Empty<ResultInfo>();
    }
}
Please follow this design, but if you prefer to do it a different way, choose something meaningful. E.g.: don't require a bare bool (the value of the deep search setting) to be passed; a dictionary, a dynamic, or an anonymous type are all better alternatives.
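As an illustration of the dictionary alternative, here is a minimal sketch of how a service could read a named setting from the object it receives. DictionarySettings and GetOrDefault are hypothetical names introduced for this example; they are not part of PickAll.

```csharp
using System.Collections.Generic;

// Hypothetical helper, not part of PickAll
public static class DictionarySettings
{
    // Returns the value stored under key when settings is a dictionary
    // holding a value of type T for that key; otherwise returns fallback.
    public static T GetOrDefault<T>(object settings, string key, T fallback)
    {
        var map = settings as IDictionary<string, object>;
        object value;
        if (map != null && map.TryGetValue(key, out value) && value is T) {
            return (T)value;
        }
        return fallback;
    }
}
```

In a searcher constructor this could be used as DictionarySettings.GetOrDefault(settings, "DeepSearch", false), so a missing or mistyped entry falls back to a normal search instead of throwing.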
A PickAll Data Type is used to attach additional data to each result. It's normally produced by a post processor, but nothing forbids producing additional data directly in a searcher service. Its main purpose is to encapsulate that data, avoiding the bad practice of populating ResultInfo.Data directly.
- Source
  - As with the Settings Type, the class should be defined in the same source file as the searcher.
- Name
  - The name of the service with the Data suffix. Following the previous example, it will be MySearcherData.
- Value Type
  - Should be defined as a struct or at least as a sealed class.
- Content
  - May contain all the automatic properties you need, but normally one suffices.
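The guidelines above can be sketched as follows. This is only an illustration: the RelatedLinks property and the MySearcherDataReader helper are hypothetical, and the snippet deliberately works on a plain object so it does not assume how PickAll attaches the value to ResultInfo.Data.

```csharp
// Hypothetical data type for the MySearcher example
public struct MySearcherData
{
    // Illustrative property: number of related links found for a result
    public int RelatedLinks { get; set; }
}

// Hypothetical consumer, not part of PickAll
public static class MySearcherDataReader
{
    // Describes the additional data when it has the expected type
    public static string Describe(object data)
    {
        return data is MySearcherData
            ? $"{((MySearcherData)data).RelatedLinks} related links"
            : "no additional data";
    }
}
```

A consumer would pass the value found in ResultInfo.Data to such a method and get a typed view of it, rather than guessing what a loose value means.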