How to write a Search Engine Module
Look at some of the already implemented search engine modules to get an idea of how a module should look. A very simple example is mojeek (src/engines/mojeek). The duckduckgo module is also worth studying, as it is unique in multiple ways.
The folder for a search engine can contain the following files:
- <name>.go (e.g. qwant.go) - the main Go file for the module
- options.go - a Go file with:
  - var Info engines.Info
  - var Support engines.SupportedSettings
  - optionally var dompaths engines.DOMPaths
  - optionally var timings config.Timings (this may be moved soon)
- optionally a <name>.md (e.g. qwant.md) markdown file, explaining anything worth noting
- optionally a json_response.go Go file that has the structures and functions necessary for parsing JSON responses, if the module receives them
- for anything less standard (for search engines that require a more unique implementation), try to refer to previous implementations

- Two things to keep in mind when creating a module:
  - It should be as fast as possible.
  - It should lower the chances of being rate limited as much as possible.
    - This is achieved by emulating user interaction as closely as possible. For example, the request for the first page should not include the page URL parameter (e.g. &s= in mojeek).
- Clean up the retrieved URL by passing it to parse.ParseURL.
- Different search engines have different formats for the locale, device, safeSearch (and similar) parameters, while the format is standardized in engines.Options. Converting between the two should be done in module functions like getLocale, getDevice, getSafeSearch (and similar). Refer to qwant.
- If the search engine has a "Load More" functionality (like yep), the page value for all results should be 1.
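The guidelines about the first-page request and the per-engine parameter formats could be sketched roughly as follows. This is only an illustration under assumptions: the Options and SafeSearch types are hypothetical stand-ins (the real engines.Options may look different), and the base URL, parameter names and values belong to an imaginary engine, not to mojeek or qwant.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
	"strings"
)

// SafeSearch is a hypothetical stand-in for the standardized setting.
type SafeSearch int

const (
	SafeSearchOff SafeSearch = iota
	SafeSearchModerate
	SafeSearchStrict
)

// Options is a hypothetical, trimmed-down stand-in for engines.Options.
type Options struct {
	Locale     string // assumed standardized form, e.g. "en_US"
	SafeSearch SafeSearch
}

// getLocale converts the standardized locale into the format the
// imaginary engine expects ("en_US" becomes "en-us").
func getLocale(opts Options) string {
	return strings.ToLower(strings.ReplaceAll(opts.Locale, "_", "-"))
}

// getSafeSearch maps the standardized setting to the imaginary engine's values.
func getSafeSearch(opts Options) string {
	switch opts.SafeSearch {
	case SafeSearchOff:
		return "0"
	case SafeSearchStrict:
		return "2"
	default:
		return "1"
	}
}

// buildURL builds the request URL for a 1-indexed page. The first page omits
// the offset parameter, the way a user's first search would.
func buildURL(base, query string, page, perPage int, opts Options) string {
	params := url.Values{}
	params.Set("q", query)
	params.Set("locale", getLocale(opts))
	params.Set("safe", getSafeSearch(opts))
	if page > 1 {
		params.Set("s", strconv.Itoa((page-1)*perPage+1))
	}
	return base + "?" + params.Encode() // Encode sorts parameters by key
}

func main() {
	opts := Options{Locale: "en_US", SafeSearch: SafeSearchModerate}
	for page := 1; page <= 2; page++ {
		fmt.Println(buildURL("https://search.example/search", "golang", page, 10, opts))
	}
}
```

Keeping the conversion in small get* functions means the URL-building code never needs to know the standardized format, only the engine-specific strings.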
And the modules that solve them.
Pass a context with the page number (1-indexed). Almost every module does this, example: mojeek.
Refer to var pageRankCounter []int from mojeek. This works because the matches to the OnHTML function are called in order and synchronously.
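The counter idea can be illustrated without colly. The sketch below is a simplified stand-in, not the mojeek code: the Result type and the newOnMatch helper are hypothetical, and the closure plays the role of an OnHTML callback that receives the 1-indexed page of the request that produced the match.

```go
package main

import "fmt"

// Result is a hypothetical result type used only for this illustration.
type Result struct {
	URL        string
	Page       int
	OnPageRank int
}

// newOnMatch returns a callback that assigns each match its rank on its page.
// It relies on the matches of one page arriving in order and synchronously,
// so a plain slice indexed by page number is enough.
func newOnMatch(pages int) func(page int, url string) Result {
	pageRankCounter := make([]int, pages+1) // index 0 unused, pages are 1-indexed
	return func(page int, url string) Result {
		pageRankCounter[page]++
		return Result{URL: url, Page: page, OnPageRank: pageRankCounter[page]}
	}
}

func main() {
	onMatch := newOnMatch(2)
	// Matches of different pages may interleave; each page keeps its own count.
	fmt.Println(onMatch(1, "https://example.org/a"))
	fmt.Println(onMatch(2, "https://example.org/b"))
	fmt.Println(onMatch(1, "https://example.org/c"))
}
```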
It's okay to hardcode some elements (instead of putting them in dompaths). Refer to descText from brave.
Various methods may be necessary, refer to duckduckgo and etools.
Cookies received through the Set-Cookie header are saved and passed in subsequent requests by colly automatically. We do need to wait for a response that actually sets the cookie, though (can't have everything async). Refer to etools.
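The ordering constraint can be shown with the standard library alone. This is an analogy, not the etools implementation and not colly code: net/http/cookiejar stands in for colly's cookie handling, and the local test server, its /start and /results paths, and the session cookie are all made up for the example.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/cookiejar"
	"net/http/httptest"
)

// fetchWithSession waits for the first response (which sets the cookie)
// before sending the request that depends on it.
func fetchWithSession(baseURL string) (string, error) {
	jar, err := cookiejar.New(nil)
	if err != nil {
		return "", err
	}
	client := &http.Client{Jar: jar}

	// Blocking request: the jar only holds the cookie once this returns.
	first, err := client.Get(baseURL + "/start")
	if err != nil {
		return "", err
	}
	first.Body.Close()

	// The jar attaches the stored cookie to this request automatically.
	second, err := client.Get(baseURL + "/results")
	if err != nil {
		return "", err
	}
	defer second.Body.Close()
	body, err := io.ReadAll(second.Body)
	return string(body), err
}

// newTestServer returns a local server standing in for a search engine that
// refuses to serve results without a session cookie.
func newTestServer() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/start", func(w http.ResponseWriter, r *http.Request) {
		http.SetCookie(w, &http.Cookie{Name: "session", Value: "abc123", Path: "/"})
	})
	mux.HandleFunc("/results", func(w http.ResponseWriter, r *http.Request) {
		c, err := r.Cookie("session")
		if err != nil {
			http.Error(w, "no cookie", http.StatusForbidden)
			return
		}
		fmt.Fprint(w, "session="+c.Value)
	})
	return httptest.NewServer(mux)
}

func main() {
	srv := newTestServer()
	defer srv.Close()
	body, err := fetchWithSession(srv.URL)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(body)
}
```

Only the first request has to be synchronous; once the cookie is stored, later page requests can go back to being asynchronous.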
Through unmarshalling. Refer to qwant and swisscows. If the JSON has an array that doesn't have consistent objects, refer to yep.
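For an array whose elements are not all the same shape, one possible approach is to decode the elements lazily with json.RawMessage and keep only those that fit. The JSON layout, field names and type values below are invented for illustration and are not the actual yep, qwant or swisscows response formats.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// response keeps every array element raw so that each one can be
// inspected on its own.
type response struct {
	Results []json.RawMessage `json:"results"`
}

// organic is the (hypothetical) shape of the elements we care about.
type organic struct {
	Type    string `json:"type"`
	URL     string `json:"url"`
	Title   string `json:"title"`
	Snippet string `json:"snippet"`
}

// parseResults returns only the elements that decode into an organic result.
func parseResults(body []byte) ([]organic, error) {
	var resp response
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	var out []organic
	for _, raw := range resp.Results {
		var item organic
		if err := json.Unmarshal(raw, &item); err != nil {
			continue // not an object of the expected shape (e.g. a bare string)
		}
		if item.Type != "organic" || item.URL == "" {
			continue // an object, but not a search result
		}
		out = append(out, item)
	}
	return out, nil
}

func main() {
	body := []byte(`{"results":[
		{"type":"organic","url":"https://example.org","title":"Example","snippet":"An example."},
		"separator",
		{"type":"ad","url":"https://ads.example"},
		{"type":"organic","url":"https://example.com","title":"Second","snippet":"Another."}
	]}`)
	results, err := parseResults(body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, r := range results {
		fmt.Println(r.Title, r.URL)
	}
}
```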
Good luck. swisscows uses a nonce + signature. You may also refer to metager (not implemented at the time of writing).
Good luck. yandex is an example (not implemented at the time of writing).