Reads Automated Surface Observing System (ASOS) data from disk on the NSIDC archive to create a temporally and geospatially indexed database for quickly searching events.
> [!NOTE]
> TODO: Is this data available publicly and documented? How is it produced? Links!
Part of the AROSS Stations project.
To get started quickly, install Docker.
> [!IMPORTANT]
> Instructions that follow presume the current working directory is the root of this repository unless otherwise stated.
Dev quickstart
‼️ Don't worry about this unless you intend to change the code!
View the contributing docs for more details!
Set up the development compose configuration to be automatically loaded:

```sh
ln -s compose.dev.yml compose.override.yml
```
You will need local tooling like Nox and pre-commit to do development. Use whatever Python version management tool you prefer (Conda, VirtualEnv, PyEnv, ...) to create a virtual environment, then install this package and its dev dependencies:

```sh
pip install --editable ".[dev]"
```
> [!IMPORTANT]
> Do this step before starting the stack in dev mode, or you may encounter an error (in which case, see the troubleshooting section for explanation!).
You may wish to run the API process from an attached shell for interactive debugging. You can set up the relevant container to "sleep" in `compose.dev.yml`:

```yaml
api:
  <<: *dev-common
  entrypoint: "sleep"
  command: ["9999999"]
  # command: ["dev", "--host", "0.0.0.0", "./src/aross_stations_db/api"]
```
Then you can manually run the dev server interactively:

```sh
docker compose exec api fastapi dev --host 0.0.0.0 ./src/aross_stations_db/api
```

From here, you can interactively pause at any `breakpoint()` calls in the Python code.
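For illustration, a breakpoint dropped into a route handler might look like the sketch below. The route and names here are made up for the example, not part of this project's code; place the call wherever you want execution to pause.

```python
# Illustrative only: a hypothetical FastAPI route containing a breakpoint() call.
# With the dev server running from an attached shell (as above), execution pauses
# here and drops you into the pdb debugger.
from fastapi import FastAPI

app = FastAPI()


@app.get("/debug-example")
def debug_example() -> dict[str, str]:
    breakpoint()  # inspect variables here, then enter `c` to continue
    return {"status": "ok"}
```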
The instructions below specify starting the stack with the `--profile ui` option. If you wish to develop in the user interface code repository, omit that flag and follow the instructions in the UI repo instead.
Create a `.env` file or otherwise export the required envvars. You can use our sample environment file as a starting point, and modify as you see fit:

```sh
cp .env.sample .env
```
> [!IMPORTANT]
> `$AROSS_DATA_BASEDIR` should be Andy's data directory containing the expected "metadata" and "events" subdirectories. TODO: Document how that data is created! How can the public access it?
> [!NOTE]
> The connection string shown here is for connecting within the Docker network to a container with the hostname `db`.
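As a rough sketch of what that connection looks like in practice, the snippet below connects to the database from another container on the Docker network. It assumes SQLAlchemy and a psycopg driver are available, that PostgreSQL listens on its default port (5432), and it uses a made-up environment variable name for the password:

```python
# Minimal sketch: connect to the stack's PostGIS database over the Docker network.
# The hostname "db", username "aross", and database "aross" come from this README;
# the port and the POSTGRES_PASSWORD variable name are assumptions.
import os

from sqlalchemy import create_engine, text

password = os.environ["POSTGRES_PASSWORD"]  # hypothetical variable name
engine = create_engine(f"postgresql+psycopg://aross:{password}@db:5432/aross")

with engine.connect() as conn:
    print(conn.execute(text("SELECT postgis_version()")).scalar())
```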
The stack is configured within `compose.yml` and includes these containers:

- `aross-stations-db`: A PostGIS database for quickly storing and accessing event records.
- `aross-stations-admin`: An Adminer container for inspecting the database in the browser.
- `aross-stations-api`: An HTTP API for accessing data in the database.
Start the stack:

```sh
docker compose --profile ui up --pull=always --detach
```
> [!IMPORTANT]
> If you've pulled the images before, you may need to fetch new ones! Bring down the running containers:
>
> ```sh
> docker compose down --remove-orphans
> ```
>
> ...then run the "up" command again.
You can use the included Adminer container for quick inspection. Navigate in your browser to http://localhost:8080 and enter:
| Field    | Value                                              |
| -------- | -------------------------------------------------- |
| System   | PostgreSQL                                         |
| Server   | aross-stations-db                                  |
| Username | aross                                              |
| Password | Whatever you specified in the environment variable |
| Database | aross                                              |
> [!NOTE]
> At this point, the database is empty. We're just verifying we can connect. Continue to ingest next!
Ingest the data into the database:

```sh
docker compose run cli init
```
From a fast disk, this should take under 2 minutes.
Now, you can use Adminer's SQL Query menu to select some data:
Example SQL query
This query returns 13 results at the time of this writing, but it may return more in the future.
```sql
select event.*
from event
join station on event.station_id = station.id
where
    ST_Within(
        station.location,
        ST_SetSRID(
            ST_GeomFromText('POLYGON ((-159.32130625160698 69.56469019745796, -159.32130625160698 68.08208920517862, -150.17196253090276 68.08208920517862, -150.17196253090276 69.56469019745796, -159.32130625160698 69.56469019745796))'),
            4326
        )
    )
    AND event.time_start > '2023-01-01'::date
    AND event.time_end < '2023-06-01'::date
    AND event.snow_on_ground
    AND event.rain_hours >= 1
;
```
Or you can check out the API docs in your browser at http://localhost:8000/docs or submit an HTTP query:
Example HTTP query
```
http://localhost:8000/v1/stations?start=2023-01-01&end=2023-06-01&polygon=POLYGON%20((-159.32130625160698%2069.56469019745796,%20-159.32130625160698%2068.08208920517862,%20-150.17196253090276%2068.08208920517862,%20-150.17196253090276%2069.56469019745796,%20-159.32130625160698%2069.56469019745796))
```
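For a programmatic version of the same query, here is a minimal sketch assuming the `requests` package is installed and the API is running locally on port 8000:

```python
# Query the stations endpoint with the same time range and polygon as the
# example URL above; requests handles URL-encoding the parameters.
import requests

params = {
    "start": "2023-01-01",
    "end": "2023-06-01",
    "polygon": (
        "POLYGON ((-159.32130625160698 69.56469019745796,"
        " -159.32130625160698 68.08208920517862,"
        " -150.17196253090276 68.08208920517862,"
        " -150.17196253090276 69.56469019745796,"
        " -159.32130625160698 69.56469019745796))"
    ),
}
response = requests.get("http://localhost:8000/v1/stations", params=params)
response.raise_for_status()
print(response.json())  # assumes the endpoint returns JSON
```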
In this example, we view and follow logs for the `api` service:

```sh
docker compose logs --follow api
```
You can replace `api` with any other service name, or omit it to view logs for all services.
To view the user interface, navigate to http://localhost:80.
This repository provides a demo notebook to experiment with the API. In your browser, navigate to http://localhost:8888. The password is the same as the database password you set earlier.
When you're done, bring down the stack:

```sh
docker compose down
```
There is no need to remove the `_data/` directory to start over with a fresh database; the `init` CLI command will do that for you! However, if you want to completely remove the database to save space on your system, you may want to delete the `_data/` directory.
```sh
# Bring down containers, even if a service name has changed
docker compose down --remove-orphans

# Clean up all unused images aggressively
docker system prune -af
```
When this error occurs, the webserver still responds to queries, but hot-reloading doesn't work. You may need to grant read access to the `_data/` directory if you're running locally.
The problem is that FastAPI's hot-reloading functionality in dev needs to watch the current directory for changes, and I don't know of a way to make it ignore this directory, which is usually not readable. The directory is likely owned by root, assuming it was created automatically by Docker, so you may need to use `sudo`:

```sh
sudo chmod -R ugo+r _data
```
Unfortunately, this project doesn't work perfectly with Docker for development yet. This is because our project configuration (`pyproject.toml`) is set up to dynamically generate version numbers from source control at build time:

```toml
[tool.hatch]
version.source = "vcs"
build.hooks.vcs.version-file = "src/aross_stations_db/_version.py"
```
If you freshly clone this project and immediately start up the Docker containers in dev mode, the dynamically generated version module, `_version.py`, won't exist yet in the source directory (because it is git-ignored). The source directory will be mounted into the Docker container, overwriting the pre-built source directory in the image that does (well, it did until it was overwritten 😉) include `_version.py`.
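For context, the generated module is just a small file recording the package version; with hatch-vcs it looks roughly like this (illustrative contents, not the exact generated file):

```python
# src/aross_stations_db/_version.py -- written at build time by hatch-vcs.
# Shown for illustration only; the real file is generated and git-ignored.
__version__ = version = "0.1.0"
```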
If you plan to do development, it's very important to complete the initial setup step of creating a local environment and installing the package with its development dependencies; installing the package regenerates `_version.py`, and also gives you Nox and pre-commit for automating development tasks.