update docs
pmhalvor committed Oct 19, 2024
1 parent a154ec2 commit 2c4458d
Showing 7 changed files with 239 additions and 14 deletions.
82 changes: 82 additions & 0 deletions README.md
@@ -9,6 +9,13 @@ Derived from <a href="https://docs.mbari.org/pacific-sound/notebooks/humpbackwha
## Getting started

### Install

Create a virtual environment and install the required packages.
We'll use conda for this, but you can use any package manager you prefer.

Since we're developing on an M1 machine, we'll need to specify the `CONDA_SUBDIR` to `osx-arm64`.
This step should be adapted based on the virtual environment you're using.

#### M1:
```bash
CONDA_SUBDIR=osx-arm64 conda create -n whale-speech python=3.11
@@ -24,11 +31,86 @@ conda activate whale-speech
pip install -r requirements.txt
```

### Google Cloud SDK
To run the pipeline on Google Cloud Dataflow, you'll need to install the Google Cloud SDK.
You can find the installation instructions [here](https://cloud.google.com/sdk/docs/install).

Make sure you authenticate with the account you're using and initialize the project you want to work in.
```bash
gcloud auth login
gcloud init
```
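
If you prefer not to go through the interactive `gcloud init` flow, you can also set the active project directly (the project ID below is a placeholder):
```bash
gcloud config set project your_project_id
```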

For newly created projects, each of the services used will need to be enabled.
This can be easily done in the console, or via the command line.
For example:
```bash
gcloud services enable bigquery.googleapis.com
gcloud services enable dataflow.googleapis.com
gcloud services enable storage-api.googleapis.com
gcloud services enable run.googleapis.com
```
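
You can verify what is already enabled with:
```bash
gcloud services list --enabled | grep -E "bigquery|dataflow|storage|run"
```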

### Run locally
To run the pipeline and model server locally, you can use the `make` target `local-run`.

```bash
make local-run
```

This target starts by killing any model servers that might still be running (needed when a previous pipeline run failed without tearing down the server, which would otherwise leave the old call hanging).
Then it starts the model server in the background and runs the pipeline.
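
For reference, the commands behind the `local-run` target (taken from the makefile in this commit) are roughly:
```bash
bash scripts/kill_model_server.sh                       # stop any lingering model server
python3 src/model_server.py & python3 src/pipeline.py   # start the server in the background, then run the pipeline
bash scripts/kill_model_server.sh                       # tear the server down again afterwards
python3 src/gcp.py --deduplicate                        # post-run cleanup step from the same target
```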


### Build and push the model server
To build and push the model server to your model registry (whose URL is stored in an environment variable), you can use the following `make` target.

```bash
make build-push-model-server
```
This target builds the model server image and pushes it to the registry specified in the `env.sh` file.
The tag is a combination of the version set in the makefile and the last git commit hash.
This helps keep track of what is included in the image, and allows for easy rollback if needed.
The target fails if there are any uncommitted changes in the git repository.
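
As a sketch of how the tag is composed (variable names follow the makefile in this commit; the version value here is hypothetical):
```bash
VERSION=1.0.0                                    # hypothetical value; the real one is set in the makefile
GIT_SHA=$(git rev-parse --short HEAD)            # e.g. 2c4458d
MODEL_SERVER_IMAGE_NAME="whale-speech/model-server:${VERSION}-${GIT_SHA}"
docker build -t "$MODEL_SERVER_IMAGE_NAME" --platform linux/amd64 -f Dockerfile.model-server .
docker tag  "$MODEL_SERVER_IMAGE_NAME" "$MODEL_REGISTERY/$MODEL_SERVER_IMAGE_NAME"
docker push "$MODEL_REGISTERY/$MODEL_SERVER_IMAGE_NAME"
```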

The `latest` tag is only added to images deployed via GHA.

### Run pipeline with Dataflow
To run the pipeline on Google Cloud Dataflow, you can use the following `make` target.

```bash
make run-dataflow
```
Logging in the terminal will tell you the status of the pipeline, and you can follow the progress in the [Dataflow console](https://console.cloud.google.com/dataflow/jobs).

In addition to providing the inference URL and the filesystem to store outputs on, the definition of the above target also shows how a user can pass additional arguments to the pipeline run and request different resources for it.

**Pipeline specific parameters**
You can configure all the parameters set in the config files directly when running the pipeline.
The most important here are probably the start and end times for the initial search.

```bash
--start "2024-07-11" \
--end "2024-07-11" \
--offset 0 \
--margin 1800 \
--batch_duration 60
```

Note that when a parameter name appears under multiple config sections, a command-line override only updates the one in the last of those sections.
Also, since these argparse parameters are added automatically, boolean flags may behave unexpectedly (any supplied value is parsed as true).
<!-- TODO fix behavior of boolean in-line parameters -->
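
To illustrate the pitfall (this is just Python's truthiness behavior, which `argparse` with `type=bool` relies on, not the project's parser code):
```bash
# Any non-empty string is truthy, so even "False" and "0" become True.
python3 -c 'print(bool("False"), bool(""), bool("0"))'
# prints: True False True
```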

**Compute resources**
The default compute resources are quite small and slow. To speed things up, you can request more workers and a larger machine type. For more on Dataflow resources, check out [the docs](https://cloud.google.com/dataflow/docs/reference/pipeline-options#worker-level_options).
```bash
--worker_machine_type=n1-highmem-8 \
--disk_size_gb=100 \
--num_workers=8 \
--max_num_workers=8 \
```
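
Putting the two groups of flags together, a full invocation might look roughly like the sketch below. The entrypoint `src/pipeline.py` comes from the makefile; the `--runner`, `--project`, `--region`, and `--temp_location` options are standard Beam/Dataflow options, and how the `run-dataflow` target actually wires them up is an assumption here:
```bash
python3 src/pipeline.py \
    --runner DataflowRunner \
    --project your_project_id \
    --region us-central1 \
    --temp_location gs://your-bucket/temp \
    --start "2024-07-11" \
    --end "2024-07-11" \
    --offset 0 \
    --margin 1800 \
    --batch_duration 60 \
    --worker_machine_type=n1-highmem-8 \
    --disk_size_gb=100 \
    --num_workers=8 \
    --max_num_workers=8
```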


## Pipeline description

Stages:
67 changes: 67 additions & 0 deletions docs/howto/model-server-as-cloud-run.md
@@ -0,0 +1,67 @@
# Model server as Cloud Run
In this guide, we will deploy the model server as a [Cloud Run](https://cloud.google.com/run/) service.

Cloud Run is a serverless compute platform that allows you to run prebuilt containers triggered via HTTP requests.
Our model server component is a perfect example of a service that can be deployed on Cloud Run, since it is a REST API listening for POST requests on a specified port and endpoint.

## Prerequisites
- A Google Cloud Platform (GCP) account and [project](https://cloud.google.com/resource-manager/docs/creating-managing-projects) with [billing enabled](https://cloud.google.com/billing/docs/how-to/modify-project).
- [Docker](https://github.com/docker/docker-install?tab=readme-ov-file#usage) installed on your local machine.
- This code locally cloned (`git clone https://github.com/pmhalvor/whale-speech`).

## Steps

### 0. (Optional) Set up Artifact Registry
If you want to store your Docker images in Google Cloud, you can use [Artifact Registry](https://cloud.google.com/artifact-registry/docs/overview).

You'll likely need to enable the service, create a repository, and then grant your local environment permission to push to that repository.
See more on this authentication process [here](https://cloud.google.com/artifact-registry/docs/docker/pushing-and-pulling#auth).
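
A minimal sketch of that setup, assuming a Docker repository named `whale-speech` in `us-central1`:
```bash
gcloud services enable artifactregistry.googleapis.com
gcloud artifacts repositories create whale-speech \
    --repository-format=docker \
    --location=us-central1
gcloud auth configure-docker us-central1-docker.pkg.dev
```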

### 1. Build the Docker image and push to Google Container Registry
Navigate to the project directory in a terminal, build and tag your model-server image, and push to your model registry.

If you are using the Google Artifact Registry, you'll need to tag your image with the registry URL and a zone, something like `us-central1-docker.pkg.dev/your_project/whale-speech/model-server:x.y.z`.
If you prefer to use the free Docker hub registry, you can use your public Docker ID as a prefix to your image tag, something like `your_docker_id/whale-speech:model-server-x.y.z`.

This guide will only document the Google Artifact Registry method. The Docker Hub method is similar, though naming might be different.

```bash
cd whale-speech
docker build -f Dockerfile.model-server -t us-central1-docker.pkg.dev/your_project/whale-speech/model-server:x.y.z .
docker push us-central1-docker.pkg.dev/your_project/whale-speech/model-server:x.y.z
```

The `Dockerfile.model-server` file defines the image used to host the model server.
You can find this file in the `whale-speech` directory.
Note there is no need to expose a port in the Dockerfile, as this will be done in the Cloud Run deployment.


### 2. Deploy image as Cloud Run service
Navigate to the [Cloud Run](https://console.cloud.google.com/run) page in the GCP console.

- Select **Deploy container** and then **Service**, since we want the container to be served behind an endpoint.
- Add the container image URL you pushed in the step above (`us-central1-docker.pkg.dev/your_project/whale-speech/model-server:x.y.z`).
- Name your service (ex. `whale-speech-model-server`) and select a region (`us-central1` is a good default).
- Open the section for **Container(s), Volumes, Networking, Security**.
- Add the port your model server is listening on (default is `5000`) as the container port. This will be added as an environment variable when running the container.
- Update memory and CPU count as needed. I noticed that 4 GiB and 2 vCPUs worked fine with batch durations of 60 seconds. These values can be adjusted through revisions later.
- I'd recommend lowering the maximum number of requests per container to 1-5, since the inputs will be larger for each request.
- You may need to adjust the min and max number of instances, depending on your expected traffic and quotas.

- Click **Create**.
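
If you prefer the CLI over the console, a roughly equivalent deployment can be sketched with `gcloud run deploy`. The flag values below mirror the console settings described above and are examples, not settings taken from this repo:
```bash
gcloud run deploy whale-speech-model-server \
    --image us-central1-docker.pkg.dev/your_project/whale-speech/model-server:x.y.z \
    --region us-central1 \
    --port 5000 \
    --memory 4Gi \
    --cpu 2 \
    --concurrency 5 \
    --min-instances 0 \
    --max-instances 3
# add --allow-unauthenticated if the pipeline should call the service without an identity token
```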

### 3. Test the service
Once the service is deployed, you can test it by sending a POST request to the service's endpoint.
The URL should be available at the top of the service details page. It'll look something like `https://whale-speech-model-server-xxxx.a.run.app`.

In the `whale-speech` directory, you can run the following command to test the service:
```bash
export INFERENCE_URL="https://whale-speech-model-server-xxxx.a.run.app"
python3 examples/test_model_server.py
```
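
For a quick check without Python, the same payload shape used in `examples/test_server.py` can be sent with `curl`. The `/predict` path below is an assumption; check `src/model_server.py` for the actual route:
```bash
curl -X POST "$INFERENCE_URL/predict" \
    -H "Content-Type: application/json" \
    -d '{"key": "test", "batch": [[0.37], [0.95], [0.73], [0.59], [0.15], [0.15]]}'
```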

The expected response should be a JSON object with a `prediction` key and a list of floats as the value.

I'd recommend saving the `export INFERENCE_URL="https://whale-speech-model-server-xxxx.a.run.app"` command to an `env.sh` file in the `whale-speech` directory, so you can easily run the test script in the future. This filename is in the `.gitignore`, so it won't be pushed to the repository.

In the same file, I export a MODEL_REGISTRY variable, which is the URL of the registry (here, Google Artifact Registry) where the model server image is stored. This is used in `make` targets like `build-model-server` and `push-model-server`, which build the image and push it to that registry.
4 changes: 3 additions & 1 deletion docs/howto/spin-up-model-server.md
@@ -1,4 +1,6 @@
# How to spin up model server
# How to spin up model server VM

NOTE: A VM for model serving is no longer used in this project. See [Model server as Cloud Run](model-server-as-cloud-run.md) for the current model serving method. This doc on VMs is kept for reference.

Our pipeline requires a publicly accessible model server for classifications.
While this code is included in this repo, users will still need to spin up their own server.
Expand Down
71 changes: 71 additions & 0 deletions docs/ladr/LADR_0006_model_server_deployment.md
@@ -0,0 +1,71 @@
# Model Server Deployment

Need to decide how to host the model server.

When running locally, I've been starting a Flask server in a separate thread.
For my production environment, I will likely need a more robust solution.

The desired solution should:
- Be fast, scalable, and stable
- Be cost-effective
- Be easy to deploy and maintain
- Have some version control

## Options

### 1. Compute Engine VM
**Pros**:
- Full control over the environment
- Easy to debug by ssh-ing into the VM
- Manually install and update needed dependencies
- Very similar to local development
- Can host multiple services on the same VM
- Ex. if inference server and pipeline triggers were on the same VM

**Cons**:
- Requires more setup and maintenance
- Networking Firewall rules in GCP
- Monitoring and logging not built-in
- Not as scalable as other options
- Persistent servers would likely be more expensive than serverless options

### 2. Cloud Run
**Pros**:
- Serverless
- Only pay for what you use
- Scales automatically
- Easy to deploy
- Can deploy and revise directly from `gcloud` or in the GCP console
- Built-in monitoring and logging
- Built-in version control (using image registry and/or tags)
- Exposes a public endpoint that can be triggered by HTTP requests

**Cons**:
- Can only serve one container per service. Other services would need to be deployed separately.
- Haven't figured out how to scale up (to receive large input requests)

### 3. Kubernetes Engine
**Pros**:
- Full control over the environment
- Scalable
- Can host multiple services on the same cluster

**Cons**:
- Takes a (relatively) long time to start and scale up
- Requires more setup and maintenance
- Not as cost-effective as serverless options
- Probably overkill for this project


## Decision
For this project, I'll use Cloud Run.
I tried a VM first, but realized it cost too much over time, and I missed the ability to scale easily.

Cloud Run worked pretty much out of the box, and I was able to deploy the model server in a few minutes.
Figuring out the correct PORT configuration was a bit cumbersome, though.

I think the stateless nature will be the cheapest option for the end goal of this project.
During times of high activity, we can keep the minimum instance count at 1, to ensure faster response times.
Otherwise, we can scale down to 0 instances, and only pay for the storage of the container image (if using Artifact Registry).
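
A minimal sketch of toggling that behavior with `gcloud` (service name, region, and limits are placeholders):
```bash
# keep one instance warm during high-activity periods
gcloud run services update whale-speech-model-server --region us-central1 --min-instances 1
# allow scale-to-zero again when traffic dies down
gcloud run services update whale-speech-model-server --region us-central1 --min-instances 0
```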

I just need to figure out how to scale up the instances to handle larger requests.
2 changes: 1 addition & 1 deletion examples/test_server.py
@@ -8,7 +8,7 @@
"key": "encouter1",
"batch": [
[0.3745401203632355], [0.9507142901420593], [0.7319939136505127], [0.5986585021018982], [0.15601864044243652], [0.15599452033620265]
]*10_000*10*1, # (6 samples * 10_000 = 6 seconds )* 10 = 60 seconds
]*10_000*10, # (6 samples * 10_000 = 6 seconds )* 10 = 60 seconds
}

response = requests.post(inference_url, json=data)
19 changes: 11 additions & 8 deletions makefile
@@ -3,14 +3,12 @@ GIT_SHA := $(shell git rev-parse --short HEAD)
PIPELINE_IMAGE_NAME := whale-speech/pipeline:$(VERSION)-$(GIT_SHA)
MODEL_SERVER_IMAGE_NAME := whale-speech/model-server:$(VERSION)-$(GIT_SHA)
PIPELINE_WORKER_IMAGE_NAME := whale-speech/pipeline-worker:$(VERSION)-$(GIT_SHA)
MODEL_REGISTERY := us-central1-docker.pkg.dev/bioacoustics-2024
ENV_LOCATION := .env

local-run:
bash scripts/kill_model_server.sh
python3 src/model_server.py & python3 src/pipeline.py
bash scripts/kill_model_server.sh
python3 src/gcp.py --deduplicate

run-pipeline:
python3 src/pipeline.py
@@ -36,26 +34,31 @@ setup:
run:
$(ENV_LOCATION)/bin/python3 src/pipeline.py

build:
check-uncommited:
git diff-index --quiet HEAD

build: check-uncommited
docker build -t $(PIPELINE_IMAGE_NAME) --platform linux/amd64 .

push:
push: check-uncommited
docker tag $(PIPELINE_IMAGE_NAME) $(MODEL_REGISTERY)/$(PIPELINE_IMAGE_NAME)
docker push $(MODEL_REGISTERY)/$(PIPELINE_IMAGE_NAME)

build-push: build push

build-model-server:
build-model-server: check-uncommited
docker build -t $(MODEL_SERVER_IMAGE_NAME) --platform linux/amd64 -f Dockerfile.model-server .

push-model-server:
push-model-server: check-uncommited
docker tag $(MODEL_SERVER_IMAGE_NAME) $(MODEL_REGISTERY)/$(MODEL_SERVER_IMAGE_NAME)
docker push $(MODEL_REGISTERY)/$(MODEL_SERVER_IMAGE_NAME)

build-pipeline-worker:
build-push-model-server: build-model-server push-model-server

build-pipeline-worker: check-uncommited
docker build -t $(PIPELINE_WORKER_IMAGE_NAME) --platform linux/amd64 -f Dockerfile.pipeline-worker .

push-pipeline-worker:
push-pipeline-worker: check-uncommited
docker tag $(PIPELINE_WORKER_IMAGE_NAME) $(MODEL_REGISTERY)/$(PIPELINE_WORKER_IMAGE_NAME)
docker push $(MODEL_REGISTERY)/$(PIPELINE_WORKER_IMAGE_NAME)

8 changes: 4 additions & 4 deletions src/config/common.yaml
@@ -72,11 +72,11 @@ pipeline:
url_template: "https://pacific-sound-16khz.s3.amazonaws.com/{year}/{month:02}/{filename}"
filename_template: "MARS-{year}{month:02}{day:02}T000000Z-16kHz.wav"
source_sample_rate: 16000
margin: 30 # TODO set to 900 # seconds
offset: 13 # TODO set to 0 # hours
margin: 1800 # seconds
offset: 0 # hours - only used for cherry picking during development
output_array_path_template: "data/audio/raw/key={key}/{filename}"
output_table_path_template: "data/table/{table_id}/metadata.json"
skip_existing: false # if true, skip downstream processing of existing audio files (false during development)
skip_existing: true # if true, skip downstream processing of existing audio files (false during development)
audio_table_id: "raw_audio"
store_audio: true
audio_table_schema:
@@ -135,7 +135,7 @@ pipeline:
inference_retries: 3
med_filter_size: 3

plot_scores: false # TODO write plot of final results to GCS
plot_scores: false
hydrophone_sensitivity: -168.8
plot_path_template: "data/plots/results/{params}/{plot_name}.png"
output_array_path_template: "data/classifications/{params}/{key}.npy"
