# Model server as Cloud Run

In this guide, we will deploy the model server as a [Cloud Run](https://cloud.google.com/run/) service.

Cloud Run is a serverless compute platform that allows you to run prebuilt containers triggered via HTTP requests.
Our model server component is a perfect fit for Cloud Run, since it is a REST API listening for POST requests on a specified port and endpoint.

## Prerequisites

- A Google Cloud Platform (GCP) account and [project](https://cloud.google.com/resource-manager/docs/creating-managing-projects) with [billing enabled](https://cloud.google.com/billing/docs/how-to/modify-project).
- [Docker](https://github.com/docker/docker-install?tab=readme-ov-file#usage) installed on your local machine.
- This repository cloned locally (`git clone https://github.com/pmhalvor/whale-speech`).

## Steps

### 0. (Optional) Set up Artifact Registry

If you want to store your Docker images in Google Cloud, you can use [Artifact Registry](https://cloud.google.com/artifact-registry/docs/overview).

You'll likely need to enable the Artifact Registry API, create a repository, and then grant your local environment permission to push to it.
See more on this authentication process [here](https://cloud.google.com/artifact-registry/docs/docker/pushing-and-pulling#auth).
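A minimal sketch of that setup with the `gcloud` CLI. The repository name `whale-speech` and the region `us-central1` are assumptions; adjust them to your project:

```bash
# Enable the Artifact Registry API on your project
gcloud services enable artifactregistry.googleapis.com

# Create a Docker-format repository (name and region are placeholders)
gcloud artifacts repositories create whale-speech \
    --repository-format=docker \
    --location=us-central1

# Let your local Docker client authenticate against the registry
gcloud auth configure-docker us-central1-docker.pkg.dev
```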
### 1. Build the Docker image and push to Google Artifact Registry

Navigate to the project directory in a terminal, build and tag your model-server image, and push it to your model registry.

If you are using Google Artifact Registry, you'll need to tag your image with the registry URL and a region, something like `us-central1-docker.pkg.dev/your_project/whale-speech/model-server:x.y.z`.
If you prefer the free Docker Hub registry, you can use your public Docker ID as a prefix to your image tag, something like `your_docker_id/whale-speech:model-server-x.y.z`.

This guide only documents the Google Artifact Registry method. The Docker Hub method is similar, though the naming differs.

```bash
cd whale-speech
docker build -f Dockerfile.model-server -t us-central1-docker.pkg.dev/your_project/whale-speech/model-server:x.y.z .
docker push us-central1-docker.pkg.dev/your_project/whale-speech/model-server:x.y.z
```

`Dockerfile.model-server` is the Dockerfile written for hosting the model server; you can find it in the `whale-speech` directory.
Note there is no need to expose a port in the Dockerfile, as the port is configured in the Cloud Run deployment.
### 2. Deploy image as Cloud Run service

Navigate to the [Cloud Run](https://console.cloud.google.com/run) page in the GCP console.

- Select **Deploy container** and then **Service**, since we want the container served behind an endpoint.
- Add the container image URL you pushed in the step above (`us-central1-docker.pkg.dev/your_project/whale-speech/model-server:x.y.z`).
- Name your service (e.g. `whale-speech-model-server`) and select a region (`us-central1` is a good default).
- Open the section for **Container(s), Volumes, Networking, Security**.
  - Add the port your model server is listening on (default is `5000`) as the container port. This is passed to the container as an environment variable at runtime.
  - Update memory and CPU count as needed. I noticed that 4 GiB and 2 vCPUs worked fine with batch durations of 60 seconds. This value can be adjusted through revisioning later.
  - I'd recommend lowering the max number of requests per container to 1-5, since the inputs will be large for each request.
  - You may need to adjust the min and max number of instances, depending on your expected traffic and quotas.
- Click **Create**.
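If you prefer the command line, a roughly equivalent deployment with `gcloud` might look like the following sketch. The service name, region, and resource settings mirror the assumptions above (the max instance count is a placeholder; tune it to your traffic and quotas):

```bash
# Deploy the pushed image as a Cloud Run service
gcloud run deploy whale-speech-model-server \
    --image us-central1-docker.pkg.dev/your_project/whale-speech/model-server:x.y.z \
    --region us-central1 \
    --port 5000 \
    --memory 4Gi \
    --cpu 2 \
    --concurrency 5 \
    --min-instances 0 \
    --max-instances 3 \
    --allow-unauthenticated
```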
### 3. Test the service

Once the service is deployed, you can test it by sending a POST request to the service's endpoint.
The URL should be available at the top of the service details page. It'll look something like `https://whale-speech-model-server-xxxx.a.run.app`.

In the `whale-speech` directory, you can run the following command to test the service:

```bash
export INFERENCE_URL="https://whale-speech-model-server-xxxx.a.run.app"
python3 examples/test_model_server.py
```

The expected response should be a JSON object with a `prediction` key and a list of floats as the value.
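To poke the endpoint without the helper script, a hedged `curl` equivalent could look like this. The `/predict` path and the payload shape are assumptions, not the repository's documented contract; check `examples/test_model_server.py` for the real request format:

```bash
# POST a small dummy payload; expect a JSON body with a "prediction" key
curl -X POST "$INFERENCE_URL/predict" \
    -H "Content-Type: application/json" \
    -d '{"audio": [0.0, 0.1, -0.2, 0.3]}'
```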
I'd recommend saving the `export INFERENCE_URL="https://whale-speech-model-server-xxxx.a.run.app"` command to an `env.sh` file in the `whale-speech` directory, so you can easily run the test script in the future. This filename is in the `.gitignore`, so it won't be pushed to the repository.

In the same file, I export a `MODEL_REGISTRY` variable, which holds the URL of the model server image in the Google Artifact Registry. It is used in the `make` targets, like `build-model-server`, which builds the image tagged for the registry.
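Putting those together, a minimal `env.sh` might look like this (both values are placeholders for your own URLs):

```bash
# env.sh -- local-only config, listed in .gitignore
export INFERENCE_URL="https://whale-speech-model-server-xxxx.a.run.app"
export MODEL_REGISTRY="us-central1-docker.pkg.dev/your_project/whale-speech"
```

Load it with `source env.sh` before running the test script or the `make` targets.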
# Model Server Deployment

Need to decide how to host the model server.

When running locally, I've been starting a Flask server in a separate thread.
For my production environment, I will likely need a more robust solution.

The desired solution should:
- Be fast, scalable, and stable
- Be cost-effective
- Be easy to deploy and maintain
- Have some version control
## Options

### 1. Compute Engine VM

**Pros**:
- Full control over the environment
- Easy to debug by ssh-ing into the VM
  - Manually install and update needed dependencies
- Very similar to local development
- Can host multiple services on the same VM
  - Ex. if inference server and pipeline triggers were on the same VM

**Cons**:
- Requires more setup and maintenance
  - Networking/firewall rules in GCP
  - Monitoring and logging not built-in
- Not as scalable as other options
- Persistent servers would likely be more expensive than serverless options
### 2. Cloud Run

**Pros**:
- Serverless
  - Only pay for what you use
  - Scales automatically
- Easy to deploy
  - Can deploy and revise directly from `gcloud` or in the GCP console
- Built-in monitoring and logging
- Built-in version control (using image registry and/or tags)
- Exposes a public endpoint that can be triggered by HTTP requests

**Cons**:
- Can only serve one container per service. Other services would need to be deployed separately.
- Haven't figured out how to scale up (to receive large input requests)
### 3. Kubernetes Engine

**Pros**:
- Full control over the environment
- Scalable
- Can host multiple services on the same cluster

**Cons**:
- Takes a (relatively) long time to start and scale up
- Requires more setup and maintenance
- Not as cost-effective as serverless options
- Probably overkill for this project
## Decision

For this project, I'll use Cloud Run.
I tried a VM first, but realized it costs too much over time, and I missed the ability to scale easily.

Cloud Run worked pretty much out of the box, and I was able to deploy the model server in a few minutes.
Figuring out the correct PORT configuration was a bit cumbersome, though.
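One way to shake out PORT issues before deploying is to run the image locally with the variable set, the way Cloud Run would inject it. The image tag below is the placeholder from the deployment guide, and this assumes the server reads `PORT` from its environment:

```bash
# Cloud Run injects PORT at runtime; verify the container honors it locally
docker run --rm -e PORT=5000 -p 5000:5000 \
    us-central1-docker.pkg.dev/your_project/whale-speech/model-server:x.y.z

# In a second terminal, confirm the server answers on that port
curl -i http://localhost:5000
```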
I think the stateless nature will make this the cheapest option for the end goal of this project.
During times of high activity, we can keep the minimum instance count at 1, to ensure faster response times.
Otherwise, we can scale down to 0 instances, and only pay for the storage of the container image (if using Artifact Registry).
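Toggling that minimum instance count is a single `gcloud` command per direction (service name and region follow the assumptions used throughout):

```bash
# Keep one warm instance during periods of high activity
gcloud run services update whale-speech-model-server \
    --min-instances=1 --region=us-central1

# Scale back down to zero when traffic dies off
gcloud run services update whale-speech-model-server \
    --min-instances=0 --region=us-central1
```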
I just need to figure out how to scale up the instances to handle larger requests.