Palomar is a backend search service for atproto, specifically the bsky.app
post and profile record types. It works by consuming a repo event stream ("firehose") and upating an OpenSearch cluster (fork of Elasticsearch) with docs.
Almost all the code for this service is actually in the search/
directory at the top of this repo.
In September 2023, this service was substantially re-written. It no longer stores records in a local database, returns only "skelton" results (list of ATURIs or DIDs) via the HTTP API, and defines index mappings.
Currently only a simple query string syntax is supported. Double-quotes can surround phrases, -
prefix negates a single keyword, and the following initial filters are supported:
from:<handle>
will filter to results from that account, based on current (cached) identity resolution- entire DIDs as an un-quoted keyword will result in filtering to results from that account
Palomar uses environment variables for configuration.
ATP_BGS_HOST
: URL of firehose to subscribe to, either global BGS or individual PDS (default:wss://bsky.social
)ATP_PLC_HOST
: PLC directory for identity lookups (default:https://plc.directory
)DATABASE_URL
: connection string for database to persist firehose cursor subscription statePALOMAR_BIND
: IP/port to have HTTP API listen on (default::3999
)ES_USERNAME
: Elasticsearch username (default:admin
)ES_PASSWORD
: Password for Elasticsearch authenticationES_CERT_FILE
: Optional, for TLS connectionsES_HOSTS
: Comma-separated list of Elasticsearch endpointsES_POST_INDEX
: name of index for post docs (default:palomar_post
)ES_PROFILE_INDEX
: name of index for profile docs (default:palomar_profile
)PALOMAR_READONLY
: Set this if the instance should act as a readonly HTTP server (no indexing)
HTTP Query Params:
q
: query string, requiredlimit
: integer, default 25cursor
: string, for partial pagination (uses offset, not a scroll)
Response:
posts
: array of AT-URI stringshits_total
: integer; optional number of search hits (may not be populated for large result sets, eg over 10k hits)cursor
: string; optionally included if there are more results that can be paginated
HTTP Query Params:
q
: query string, requiredlimit
: integer, default 25cursor
: string, for partial pagination (uses offset, not a scroll)typeahead
: boolean, for typeahead behavior (vs. full search)
Response:
actors
: array of AT-URI stringshits_total
: integer; optional number of search hits (may not be populated for large result sets, eg over 10k hits)cursor
: string; optionally included if there are more results that can be paginated
Run an ephemeral opensearch instance on local port 9200, with SSL disabled, and the analysis-icu
plugin installed, using docker:
docker build -f Dockerfile.opensearch . -t opensearch-palomar
docker run -p 9200:9200 -p 9600:9600 -e "discovery.type=single-node" -e "plugins.security.disabled=true" opensearch-palomar
See README.opensearch.md for more Opensearch operational tips.
From the top level of the repository:
# run combined indexing and search service
make run-dev-search
# run just the search service
READONLY=true make run-dev-search
You'll need to get some content in to the index. An easy way to do this is to have palomar consume from the public production firehose.
You can run test queries from the top level of the repository:
go run ./cmd/palomar search-post "hello"
go run ./cmd/palomar search-profile "hello"
go run ./cmd/palomar search-profile -typeahead "h"
For more commands and args:
go run ./cmd/palomar --help